CN110597998A - Military scenario entity relationship extraction method and device combined with syntactic analysis - Google Patents

Info

Publication number
CN110597998A
Authority
CN
China
Prior art keywords
entity
corpus
entity relationship
military
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910653287.0A
Other languages
Chinese (zh)
Inventor
杨若鹏
卢稳新
鲁义威
刘乾
蒋序平
张建军
温鸿鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201910653287.0A
Publication of CN110597998A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention discloses a military scenario entity relationship extraction method and device combined with syntactic analysis. The method comprises the following steps: 1. predefining the target relationship types of the military scenario entity relationship extraction task; 2. constructing a training data set and a test data set for the entity relationship extraction model; 3. parsing the corpus entry by entry and filtering out sentence components that do not contribute to entity relationship extraction; 4. converting the sentence components retained after syntactic parsing into vectorized word embeddings using a pre-trained word embedding matrix; 5. training the entity relationship extraction model with the vectorized training data; 6. extracting entity relationships from the military scenario texts to be processed. By combining syntactic analysis, the method can effectively improve both the computational efficiency and the accuracy of entity relationship extraction.

Description

Military scenario entity relationship extraction method and device combined with syntactic analysis
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to an entity relationship extraction method and device for military scenarios.
Background
A military scenario is divided into a basic scenario and supplementary scenarios. It is an exercise document that hypothesizes the intentions, situations, and possible developments of both sides of an engagement according to the training topic, and it is the basic document for organizing and directing military exercises and operations. Military scenario entity relationships are the basic information elements of military scenario data and the basis for extracting, processing, and analyzing that data. Military scenario entity relationship extraction aims to discover the entity relationships hidden in unstructured military scenario text and to extract them by suitable means.
At present, entity relationship extraction methods in the open domain mainly include rule-based methods, kernel-function-based methods, and deep-learning-based methods. Rule-based methods rely heavily on expert knowledge and on manual induction over the domain knowledge related to the corpus to be processed, so they are costly, port poorly, and are difficult to apply widely. Kernel-function-based methods extract entity relationships by computing the similarity of syntactic structure trees, so training and testing are too slow for large-scale data. Deep-learning-based methods use a deep neural network to extract high-level sentence features automatically and offer strong portability and high extraction accuracy, but for closed-domain text such as military scenarios their performance is constrained by the lack of large-scale manually labeled corpora.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a military scenario entity relationship extraction method and device combined with syntactic analysis.
To achieve this purpose, the invention adopts the following technical scheme:
A military scenario entity relationship extraction method based on syntactic analysis and a deep neural network comprises the following steps:
S1, corpus construction: predefine the target relationship types for entity relationship extraction, label the original military scenario texts, and construct the training and test data sets for the entity relationship extraction model, specifically comprising:
S1.1, entity relationship predefinition: analyze the military concepts in authoritative domain dictionaries and, with reference to the principles and methods of the Semantic Evaluation conference for defining entity relationship types, predefine the entity relationship types to be extracted;
Authoritative domain dictionaries include, but are not limited to, the Chinese Military Encyclopedia, the Military Dictionary, and the Concise Military Dictionary;
S1.2, entity relationship corpus construction: manually label the original military scenario texts according to the predefined entity relationship types to generate an entity relationship extraction corpus, in which each entry is stored in the form $(e_1, e_2, r, s)$, where $e_1$ and $e_2$ denote the head entity and the tail entity respectively, $r$ denotes the semantic relationship between the two entities, and $s$ denotes the sentence stating that $e_1$ and $e_2$ have the semantic relationship $r$;
S1.3, data set division: divide the corpus obtained in step S1.2 into a training data set and a test data set according to a specific ratio;
The division ratio of the training data set to the test data set is 2:1.
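As an illustration of steps S1.2 and S1.3, the following minimal sketch stores each labeled entry as an $(e_1, e_2, r, s)$ tuple and splits the corpus 2:1. The names `CorpusEntry` and `split_corpus`, the toy entries, the relation label, and the shuffling with a fixed seed are all assumptions made for this example; the source only specifies the storage form and the ratio.

```python
import random
from typing import List, NamedTuple, Tuple


class CorpusEntry(NamedTuple):
    """One labeled entry (e1, e2, r, s) in the form given in step S1.2."""
    head: str      # e1, head entity
    tail: str      # e2, tail entity
    relation: str  # r, one of the predefined relationship types
    sentence: str  # s, sentence stating the relation between e1 and e2


def split_corpus(
    entries: List[CorpusEntry],
    train_parts: int = 2,
    test_parts: int = 1,
    seed: int = 0,
) -> Tuple[List[CorpusEntry], List[CorpusEntry]]:
    """Shuffle (an assumption) and split at the ratio train_parts:test_parts."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy entries; the relation label is illustrative only.
corpus = [
    CorpusEntry(f"unit_{i}", f"area_{i}", "located_in",
                f"unit_{i} is deployed in area_{i}")
    for i in range(9)
]
train_set, test_set = split_corpus(corpus)
print(len(train_set), len(test_set))  # 6 3
```

With 9 entries the 2:1 ratio yields 6 training and 3 test entries; for sizes not divisible by 3 the integer division rounds the training share down.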
S2, syntactic parsing: parse the sentence $s$ of each entry in the corpus and filter out the sentence components that do not contribute to entity relationship extraction, specifically comprising:
S2.1, syntax tree generation: parse the sentence $s$ of each corpus entry with an open-source syntactic parsing tool to generate a syntax tree;
Open-source syntactic parsing tools include, but are not limited to, the Stanford Parser;
S2.2, parse tree pruning: prune from the syntax tree the sentence components irrelevant to the entity triple $(e_1, e_2, r)$ to generate a syntactic parse subtree;
S2.3, subtree recombination: recombine the syntactic parse subtree into a text sequence without changing the original order of the words.
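One possible reading of steps S2.2 and S2.3 is sketched below. It assumes the parse is available as a dependency-style list of head indices (hand-written here rather than produced by a real parser), keeps only the tokens on the paths from the entity words to the root, and rejoins them in their original order. The function names and the pruning criterion are assumptions for illustration, not the pruning rule of the invention.

```python
from typing import List, Set


def path_to_root(index: int, heads: List[int]) -> List[int]:
    """Return token indices from `index` up to the root (whose head is -1)."""
    path = [index]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def prune_and_recombine(
    tokens: List[str],
    heads: List[int],
    e1_span: List[int],
    e2_span: List[int],
) -> List[str]:
    """Keep entity tokens and their ancestors, preserving word order."""
    keep: Set[int] = set()
    for index in e1_span + e2_span:
        keep.update(path_to_root(index, heads))
    return [tok for i, tok in enumerate(tokens) if i in keep]


# Hand-written toy parse: heads[i] is the index of token i's parent.
tokens = ["Yesterday", "the", "Red", "brigade", "quickly",
          "attacked", "the", "Blue", "battalion"]
heads = [5, 3, 3, 5, 5, -1, 8, 8, 5]
pruned = prune_and_recombine(tokens, heads, e1_span=[2, 3], e2_span=[7, 8])
print(pruned)  # ['Red', 'brigade', 'attacked', 'Blue', 'battalion']
```

The adverbial and determiner tokens are dropped, while the two entities and the verb that links them survive in their original order.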
S3, data vectorization: convert the recombined sequence generated in step S2.3 into a set of word embeddings expressed as distributed vectors, specifically comprising:
S3.1, training text vectorization: using the authoritative domain dictionaries, convert the currently input recombined sequence $s_i$ into one-hot vectors word by word, where $s_i$ denotes the sentence of the $i$-th input corpus entry;
S3.2, word embedding generation: convert the set of one-hot vectors obtained in step S3.1, word by word, into low-dimensional real-valued word embeddings using an open-source word vector conversion tool;
Open-source word vector conversion tools include, but are not limited to, word2vec.
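The relation between the one-hot vectors of step S3.1 and the embeddings of step S3.2 can be sketched as a matrix product: multiplying a one-hot vector by a $|V| \times k$ embedding matrix selects one row. In the sketch below the vocabulary is built from a toy sentence and the matrix holds made-up values; in the described method the vocabulary would come from the domain dictionaries and the matrix from a tool such as word2vec. All names are hypothetical.

```python
from typing import Dict, List


def build_vocab(words: List[str]) -> Dict[str, int]:
    """Assign each distinct word an index in order of first appearance."""
    vocab: Dict[str, int] = {}
    for word in words:
        vocab.setdefault(word, len(vocab))
    return vocab


def one_hot(word: str, vocab: Dict[str, int]) -> List[int]:
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


def embed(one_hot_vec: List[int], matrix: List[List[float]]) -> List[float]:
    """Multiply a one-hot row vector by the |V| x k embedding matrix."""
    k = len(matrix[0])
    return [
        sum(one_hot_vec[v] * matrix[v][d] for v in range(len(matrix)))
        for d in range(k)
    ]


sentence = ["Red", "brigade", "attacked", "Blue", "battalion"]
vocab = build_vocab(sentence)
# Toy k=2 embedding matrix standing in for pre-trained vectors.
matrix = [[float(v), v / 2.0] for v in range(len(vocab))]
vector = embed(one_hot("attacked", vocab), matrix)
print(vector)  # [2.0, 1.0]
```

In practice the product is replaced by a direct row lookup, which gives the same result without materializing the one-hot vector.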
S4, model training: train the deep-neural-network-based entity relationship extraction model on the vectorized entity relationship extraction training data set, specifically comprising:
S4.1, semantic feature extraction: select a specific neural network as the basic relationship extractor and extract the high-level semantic features of the current sentence from the set of vectors output in step S3.2; the model uses a bidirectional neural network so that the contextual semantic information of the entity pair $e_1$, $e_2$ is extracted at the same time, which improves the recognition accuracy of the entity relationship. The feature representation of the $j$-th word of the $i$-th corpus entry is:
$$h_{ij} = [\overrightarrow{h}_{ij}, \overleftarrow{h}_{ij}]$$
where $h_{ij}$ denotes the combination of the forward channel output and the backward channel output, $[\cdot]$ denotes that the bracketed terms form one vector, $\overrightarrow{h}_{ij}$ denotes the semantic feature of the $j$-th word of the $i$-th corpus entry output by the forward channel, and $\overleftarrow{h}_{ij}$ denotes the semantic feature of the $j$-th word of the $i$-th corpus entry output by the backward channel;
The specific neural network includes, but is not limited to, a Long Short-Term Memory network (LSTM);
The bidirectional neural network includes, but is not limited to, a bidirectional long short-term memory network (BLSTM);
S4.2, entity relationship prediction: process the feature vectors output in step S4.1 with a classifier to compute, for the current corpus entry $(e_1, e_2, r, s)$, the estimated probability $\hat{p}(y_n \mid s_i)$ that the relationship $r$ is the relationship $y_n$ ($n \in [1, 8]$) in the predefined entity relationship type set $Y = [y_1, y_2, \dots, y_8]$:
$$\hat{p}(y \mid s_i) = \mathrm{softmax}(W h_i + b)$$
where $\mathrm{softmax}(\cdot)$ denotes the softmax classifier operation, $W$ denotes the weight matrix of the classifier network, $s_i$ denotes the sentence of the $i$-th corpus entry, $h_i$ denotes the combination of the feature vectors of all words of the sentence of the $i$-th corpus entry, and $b$ denotes the bias of the classifier network;
The relationship type corresponding to the maximum estimated probability is the predicted result for the relationship $r$ in the current corpus entry, denoted $\hat{y}$:
$$\hat{y} = \arg\max_{y_n} \hat{p}(y_n \mid s_i)$$
where $\arg\max$ denotes the maximum-taking operation, $\hat{p}(y_n \mid s_i)$ denotes the conditional probability that the entity relationship type described by the sentence $s_i$ of the $i$-th corpus entry is $y_n$, $y_n$ denotes the $n$-th predefined entity relationship type, and $s_i$ denotes the sentence of the $i$-th corpus entry;
The classifier includes, but is not limited to, a softmax classifier;
S4.3, cost function optimization: taking the negative log-likelihood of the true label $y$ gives the following cost function of the deep neural network:
$$J(\theta) = -\sum_{n=1}^{m} t_n \log \hat{p}(y_n \mid s_i) + \lambda \lVert \theta \rVert_2^2$$
where $t_n$ denotes the $n$-th component of the one-hot vector of the true label, $\hat{p}(y_n \mid s_i)$ denotes the estimated probability of each predefined relationship type output by the softmax classifier, $m$ denotes the number of predefined relationship types (8 here), $\lambda$ denotes the L2 regularization hyperparameter, $\theta$ denotes the independent parameters of the entity relationship extraction model, and $\lVert \cdot \rVert$ denotes the norm. Model training is completed by continuously adjusting the model hyperparameters so as to minimize the cost function $J(\theta)$.
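The prediction and cost computation of steps S4.2 and S4.3 can be sketched as follows, assuming the cost takes the common form of a negative log-likelihood on the true label plus a squared L2 penalty. The weights, the 2-dimensional sentence feature, and the function names are toy assumptions; only the softmax classifier, the 8 relationship types, and the L2 term come from the text.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(W: List[List[float]], b: List[float], h: List[float]) -> List[float]:
    """Return softmax(W h + b), one probability per relationship type."""
    scores = [
        sum(w * x for w, x in zip(row, h)) + bias
        for row, bias in zip(W, b)
    ]
    return softmax(scores)


def cost(probs: List[float], true_index: int,
         params: List[float], lam: float) -> float:
    """Negative log-likelihood of the true label plus lam * ||params||^2."""
    nll = -math.log(probs[true_index])
    l2 = lam * sum(p * p for p in params)
    return nll + l2


m = 8                                           # predefined relationship types
W = [[0.1 * n, -0.05 * n] for n in range(m)]    # toy classifier weights
b = [0.0] * m                                   # toy classifier bias
h = [1.0, 0.5]                                  # toy sentence feature vector

probs = classify(W, b, h)
predicted = max(range(m), key=probs.__getitem__)   # arg max over types
params = [w for row in W for w in row] + b
loss = cost(probs, true_index=predicted, params=params, lam=1e-3)
print(predicted, round(sum(probs), 6))
```

Because the true label is one-hot, the sum over all $m$ types reduces to the single term for the true type, which is why `cost` indexes `probs` directly.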
S5, entity relationship extraction: use the trained model to extract entity relationships from the military scenario text to be processed, specifically comprising:
S5.1, test text vectorization: vectorize the original military scenario text to be processed sentence by sentence using the processing of step S3;
S5.2, entity relationship prediction: use the model trained in step S4 to predict, sentence by sentence, the semantic relationships of the vectorized military scenario output in step S5.1, and store the results.
The military scenario entity relationship extraction method combined with syntactic analysis adopted by the invention has the following advantages:
1. Through in-depth analysis of authoritative domain dictionaries such as the Chinese Military Encyclopedia, the target requirements of military scenario entity relationship extraction are clarified. On this basis, with reference to the principles and methods of the Semantic Evaluation conference for defining entity relationship types, 8 target relationship types for military scenario entity relationship extraction are predefined, and a military scenario entity relationship extraction training/test corpus containing 11236 entries is constructed;
2. The language of military scenarios is highly standardized and formulaic. Before relationship extraction, a syntactic parser is first used to parse and prune each sentence, filtering out the sentence components that do not contribute to entity relationship extraction, which raises the utilization of effective information and reduces the computational overhead of the model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of an embodiment of the military scenario entity relationship extraction method combined with syntactic analysis of the present invention;
FIG. 2 is a block diagram of the component architecture of the present invention;
FIG. 3 is a diagram of an entity relationship extraction model based on a deep neural network applied in the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, a flow chart of the military scenario entity relationship extraction method combined with syntactic analysis is shown, which specifically includes the following steps:
S1, corpus construction: predefine the target relationship types for entity relationship extraction, label the original military scenario texts, and construct the training and test data sets for the entity relationship extraction model, specifically comprising:
S1.1, entity relationship predefinition: analyze the military concepts in authoritative domain dictionaries such as the Chinese Military Encyclopedia, the Military Dictionary, and the Concise Military Dictionary and, with reference to the principles and methods of the Semantic Evaluation conference for defining entity relationship types, predefine the entity relationship types to be extracted;
S1.2, entity relationship corpus construction: manually label the original military scenario texts according to the predefined entity relationship types to generate an entity relationship extraction corpus, in which each entry is stored in the form $(e_1, e_2, r, s)$, where $e_1$ and $e_2$ denote the head entity and the tail entity respectively; $r$ denotes the semantic relationship between the two entities; and $s$ denotes the sentence stating that $e_1$ and $e_2$ have the semantic relationship $r$;
S1.3, data set division: divide the corpus obtained in step S1.2 into a training data set and a test data set according to a specific ratio;
S2, syntactic parsing: parse the sentence $s$ of each entry in the corpus and filter out the sentence components that do not contribute to entity relationship extraction, specifically comprising:
S2.1, syntax tree generation: parse the sentence $s$ of each corpus entry with an open-source tool such as the Stanford Parser to generate a syntax tree;
S2.2, parse tree pruning: prune from the syntax tree the sentence components irrelevant to the entity triple $(e_1, e_2, r)$ to generate a syntactic parse subtree;
S2.3, subtree recombination: recombine the syntactic parse subtree into a text sequence without changing the original order of the words.
S3, data vectorization: convert the recombined sequence generated in step S2.3 into a set of word embeddings expressed as distributed vectors, specifically comprising:
S3.1, training text vectorization: using the authoritative domain dictionaries, convert the currently input recombined sequence $s_i$ into one-hot vectors word by word, where $s_i$ denotes the sentence of the $i$-th input corpus entry;
S3.2, word embedding generation: convert the set of one-hot vectors obtained in step S3.1, word by word, into low-dimensional real-valued word embeddings using an open-source tool such as word2vec, that is, convert the $j$-th word $x_{ij}$ of the $i$-th sentence into a $k$-dimensional vector.
S4, model training: train the deep-neural-network-based entity relationship extraction model on the vectorized entity relationship extraction training data set, specifically comprising:
S4.1, semantic feature extraction: select a network such as a Long Short-Term Memory network (LSTM) as the basic relationship extractor and extract the high-level semantic features of the current sentence from the set of vectors output in step S3.2; the model uses a network such as a bidirectional long short-term memory network (BLSTM) so that the contextual semantic information of the entity pair $e_1$, $e_2$ is extracted at the same time, which improves the recognition accuracy of the entity relationship. The feature representation of the $j$-th word of the $i$-th corpus entry is:
$$h_{ij} = [\overrightarrow{h}_{ij}, \overleftarrow{h}_{ij}]$$
where $h_{ij}$ denotes the combination of the forward channel output and the backward channel output, $[\cdot]$ denotes that the bracketed terms form one vector, $\overrightarrow{h}_{ij}$ denotes the semantic feature of the $j$-th word of the $i$-th corpus entry output by the forward channel, and $\overleftarrow{h}_{ij}$ denotes the semantic feature of the $j$-th word of the $i$-th corpus entry output by the backward channel;
S4.2, entity relationship prediction: process the feature vectors output in step S4.1 with a classifier such as softmax to compute, for the current corpus entry $(e_1, e_2, r, s)$, the estimated probability $\hat{p}(y_n \mid s_i)$ that the relationship $r$ is the relationship $y_n$ ($n \in [1, 8]$) in the predefined entity relationship type set $Y = [y_1, y_2, \dots, y_8]$:
$$\hat{p}(y \mid s_i) = \mathrm{softmax}(W h_i + b)$$
where $\mathrm{softmax}(\cdot)$ denotes the softmax classifier operation, $W$ denotes the weight matrix of the classifier network, $s_i$ denotes the sentence of the $i$-th corpus entry, $h_i$ denotes the combination of the feature vectors of all words of the sentence of the $i$-th corpus entry, and $b$ denotes the bias of the classifier network;
The relationship type corresponding to the maximum estimated probability is the predicted result for the relationship $r$ in the current corpus entry, denoted $\hat{y}$:
$$\hat{y} = \arg\max_{y_n} \hat{p}(y_n \mid s_i)$$
where $\arg\max$ denotes the maximum-taking operation, $\hat{p}(y_n \mid s_i)$ denotes the conditional probability that the entity relationship type described by the sentence $s_i$ of the $i$-th corpus entry is $y_n$, $y_n$ denotes the $n$-th predefined entity relationship type, and $s_i$ denotes the sentence of the $i$-th corpus entry;
S4.3, cost function optimization: taking the negative log-likelihood of the true label $y$ gives the following cost function of the deep neural network:
$$J(\theta) = -\sum_{n=1}^{m} t_n \log \hat{p}(y_n \mid s_i) + \lambda \lVert \theta \rVert_2^2$$
where $t_n$ denotes the $n$-th component of the one-hot vector of the true label, $\hat{p}(y_n \mid s_i)$ denotes the estimated probability of each predefined relationship type output by the softmax classifier in step S4.2, $m$ denotes the number of predefined relationship types (8 here), $\lambda$ denotes the L2 regularization hyperparameter, $\theta$ denotes the independent parameters of the entity relationship extraction model, and $\lVert \cdot \rVert$ denotes the norm. Model training is completed by continuously adjusting the model hyperparameters so as to minimize the cost function $J(\theta)$.
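The forward/backward combination of step S4.1 can be illustrated with a deliberately simplified sketch. A plain element-wise tanh recurrence stands in for the LSTM cell here (it is not an LSTM), and the scalar weights are arbitrary toy values; the point is only how the per-word outputs of the two directions are concatenated into one feature vector. All names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def run_direction(inputs: List[Vector], w_x: float, w_h: float) -> List[Vector]:
    """Plain tanh recurrence over the sequence (stand-in for an LSTM cell)."""
    state = [0.0] * len(inputs[0])
    outputs = []
    for x in inputs:
        state = [math.tanh(w_x * xi + w_h * si) for xi, si in zip(x, state)]
        outputs.append(state)
    return outputs


def bidirectional_features(inputs: List[Vector]) -> List[Vector]:
    """Concatenate forward and backward outputs for each word position."""
    forward = run_direction(inputs, w_x=0.5, w_h=0.3)
    # Run over the reversed sequence, then restore the original word order.
    backward = run_direction(inputs[::-1], w_x=0.4, w_h=0.2)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Toy word embeddings for a three-word sentence (k = 2).
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
features = bidirectional_features(embeddings)
print(len(features), len(features[0]))  # 3 4
```

Each word ends up with a feature of dimension $2k$: the first half has seen the words to its left, the second half the words to its right.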
S5, entity relationship extraction: use the trained model to extract entity relationships from the military scenario text to be processed, specifically comprising:
S5.1, test text vectorization: vectorize the original military scenario text to be processed sentence by sentence using the processing of step S3;
S5.2, entity relationship prediction: use the model trained in step S4 to predict, sentence by sentence, the semantic relationships of the vectorized military scenario output in step S5.1, and store the results.
Referring to FIG. 2, the composition structure of the device of the present invention is shown, which specifically includes:
The corpus construction module 100, configured to predefine the target relationship types for entity relationship extraction, label the original military scenario texts, and construct the training and test data sets for the entity relationship extraction model, specifically including:
the entity relationship predefining unit 101, configured to analyze the military concepts in authoritative domain dictionaries such as the Chinese Military Encyclopedia, the Military Dictionary, and the Concise Military Dictionary and, with reference to the principles and methods of the Semantic Evaluation conference for defining entity relationship types, predefine the entity relationship types to be extracted;
the entity relationship corpus construction unit 102, configured to manually label the original military scenario texts according to the predefined entity relationship types to generate an entity relationship extraction corpus;
the data set dividing unit 103, configured to divide the corpus obtained by the entity relationship corpus construction unit 102 into a training data set and a test data set according to a specific ratio;
The syntactic parsing module 200, configured to parse the sentence of each entry in the corpus and filter out the sentence components that do not contribute to entity relationship extraction, specifically including:
the syntax tree generating unit 201, configured to parse the sentence of each corpus entry with an open-source tool to generate a syntax tree;
the syntax tree pruning unit 202, configured to prune the branches and leaves of the syntax tree other than the entities and their root node to generate a syntactic parse subtree;
the subtree recombination unit 203, configured to recombine the syntactic parse subtree into a text sequence without changing the original order of the words.
The data vectorization module 300, configured to convert the recombined sequences generated by the subtree recombination unit 203 into sets of word embeddings expressed as distributed vectors, specifically including:
the training text vectorization unit 301, configured to segment the recombined sequence of the currently input corpus entry into words to obtain a word set of T words, and to convert the words in the set into one-hot vectors based on the authoritative domain dictionaries;
the word embedding generating unit 302, configured to convert the set of one-hot vectors obtained by the training text vectorization unit 301, word by word, into low-dimensional real-valued word embeddings using an open-source tool.
The model training module 400, configured to train the deep-neural-network-based entity relationship extraction model on the vectorized entity relationship extraction training data set, specifically including:
the semantic feature extraction unit 401, configured to select a specific neural network as the basic relationship extractor and extract the high-level semantic features of the current sentence from the set of vectors output by the word embedding generating unit 302, where the model uses a bidirectional neural network so that the contextual semantic information of the entity pair $e_1$, $e_2$ is extracted at the same time, which improves the recognition accuracy of the entity relationship;
the entity relationship prediction unit 402, configured to process the feature vectors output by the semantic feature extraction unit 401 with a classifier;
the cost function optimization unit 403, configured to obtain the cost function of the deep neural network from the negative log-likelihood of the true label $y$, and to complete model training by continuously adjusting the model hyperparameters so as to minimize the cost function.
The entity relationship extraction module 500, configured to extract entity relationships from the military scenario text to be processed using the trained model, specifically including:
the test text vectorization unit 501, configured to vectorize the original military scenario text to be processed sentence by sentence using the processing of the data vectorization module 300;
the entity relationship prediction unit 502, configured to use the model trained by the model training module 400 to predict, sentence by sentence, the semantic relationships of the vectorized military scenario output by the test text vectorization unit 501, and to store the results.
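How the modules hand data to one another at extraction time might be sketched as a small pipeline object. The class name `ExtractionPipeline`, its fields, and the trivial stand-in callables are assumptions for illustration; they show only the order of the stages (parsing and pruning, vectorization, prediction), not the logic inside modules 200 to 500.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ExtractionPipeline:
    """Chains the stages that modules 200, 300 and 400/500 would provide."""
    parse_and_prune: Callable[[str], List[str]]           # cf. module 200
    vectorize: Callable[[List[str]], List[List[float]]]   # cf. module 300
    predict: Callable[[List[List[float]]], str]           # trained model

    def extract(self, sentences: Sequence[str]) -> List[str]:
        """Process sentence by sentence and collect the predicted relations."""
        results = []
        for sentence in sentences:
            tokens = self.parse_and_prune(sentence)
            vectors = self.vectorize(tokens)
            results.append(self.predict(vectors))
        return results


# Trivial stand-ins so the sketch runs; none of them is the patent's logic.
pipeline = ExtractionPipeline(
    parse_and_prune=lambda s: s.split(),
    vectorize=lambda toks: [[float(len(t))] for t in toks],
    predict=lambda vecs: "relation_%d" % (len(vecs) % 8),
)
labels = pipeline.extract(["Red brigade attacked Blue battalion"])
print(labels)  # ['relation_5']
```

Passing the stages in as callables keeps the training-time components (modules 100 and 400) separate from the extraction-time flow, which only needs the trained predictor.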

Claims (9)

1. A military scenario entity relationship extraction method combined with syntactic analysis, characterized by comprising the following steps:
S1, corpus construction: predefining the target relationship types for entity relationship extraction, labeling the original military scenario texts, and constructing the training data set and the test data set for the entity relationship extraction model, specifically comprising:
S1.1, entity relationship predefinition: predefining the entity relationship types to be extracted by adopting the principles and methods of the Semantic Evaluation conference for defining entity relationship types;
S1.2, entity relationship corpus construction: manually labeling the original military scenario texts according to the predefined entity relationship types to generate an entity relationship extraction corpus, in which each entry is stored in the form $(e_1, e_2, r, s)$, where $e_1$ and $e_2$ denote the head entity and the tail entity respectively, $r$ denotes the semantic relationship between the two entities, and $s$ denotes the sentence stating that $e_1$ and $e_2$ have the semantic relationship $r$;
S1.3, data set division: dividing the corpus obtained in step S1.2 into a training data set and a test data set according to a specific ratio;
S2, syntactic parsing: parsing the sentence $s$ of each entry in the corpus and filtering out the sentence components that do not contribute to entity relationship extraction, specifically comprising:
S2.1, syntax tree generation: parsing the sentence $s$ of each corpus entry with an open-source syntactic parsing tool to generate a syntax tree;
S2.2, parse tree pruning: pruning from the syntax tree the sentence components irrelevant to the entity triple $(e_1, e_2, r)$ to generate a syntactic parse subtree;
S2.3, subtree recombination: recombining the syntactic parse subtree into a text sequence without changing the original order of the words;
S3, data vectorization: converting the recombined sequence generated in step S2.3 into a set of word embeddings expressed as distributed vectors, specifically comprising:
S3.1, training text vectorization: using the authoritative domain dictionaries, converting the currently input recombined sequence $s_i$ into one-hot vectors word by word, where $s_i$ denotes the sentence of the $i$-th input corpus entry;
S3.2, word embedding generation: converting the set of one-hot vectors obtained in step S3.1, word by word, into low-dimensional real-valued word embeddings using an open-source word vector conversion tool;
s4, model training: the method for training the entity relationship extraction model based on the deep neural network by utilizing the datamation entity relationship extraction training data set specifically comprises the following steps:
s4.1, semantic feature extraction: selecting a specific neural network as a basic relation extractor, extracting high-level semantic features of the current sentence from the vector set output in the step S3.4, and simultaneously extracting an entity pair e by adopting a bidirectional neural network in the model1、e2The context semantic information of the ith corpus is used for improving the recognition accuracy of the entity relationship, and the characteristic expression of the jth word of the ith corpus is shown as the following formula:
in the formula (I), the compound is shown in the specification,a combination of a forward path output and a reverse path output]The representation is shown with a vector in parentheses,representing semantic features of the jth word in the ith corpus output from the forward channel,representing semantic features of a jth word in an ith corpus output by a backward channel;
S4.2, entity relation prediction: processing the feature vectors output in step S4.1 with a classifier to calculate, for the current corpus entry $(e_1, e_2, r, s)$, the estimated probability $\hat{p}(y_n \mid s_i)$ that the relation r is the relation $y_n$ ($n \in [1, 8]$) in the predefined entity relationship type set $Y = [y_1, y_2, \ldots, y_8]$:
$$\hat{p}(y \mid s_i) = \mathrm{softmax}(W h_i + b)$$
where $\mathrm{softmax}(\cdot)$ denotes the softmax classifier operation, W denotes the weight matrix of the classifier network, $s_i$ denotes the sentence in the i-th corpus entry, $h_i$ denotes the combination of the feature vectors of all the words of the sentence in the i-th corpus entry, b denotes the bias of the classifier network, and $\hat{p}(y_n \mid s_i)$ is the n-th component of the output;
the relationship type $\hat{y}$ corresponding to the maximum of the estimated probabilities is the prediction result for the relation r in the current corpus entry, expressed as:
$$\hat{y} = \arg\max_{y_n} \hat{p}(y_n \mid s_i)$$
where $\arg\max$ denotes the operation of taking the maximum, $\hat{p}(y_n \mid s_i)$ denotes the conditional probability that the entity relationship type described by the sentence $s_i$ in the i-th corpus entry is $y_n$, $y_n$ denotes the n-th predefined entity relationship type, and $s_i$ denotes the sentence in the i-th corpus entry;
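As an illustrative sketch only, the classification in step S4.2 amounts to a linear layer, a softmax, and an argmax over the eight predefined relationship types. The sentence vector `h`, the weight matrix `W`, and the bias `b` below are hypothetical toy values, and the labels `y1` to `y8` are placeholders for the actual relationship types.

```python
import math
from typing import List, Tuple

RELATION_TYPES = [f"y{n}" for n in range(1, 9)]  # placeholder labels y1..y8


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(h: List[float], W: List[List[float]], b: List[float]) -> Tuple[List[float], str]:
    """Return the estimated probabilities and the most probable relationship type."""
    scores = [
        sum(w_k * h_k for w_k, h_k in zip(row, h)) + bias
        for row, bias in zip(W, b)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return probs, RELATION_TYPES[best]


# Hypothetical sentence feature vector and classifier parameters.
h = [1.0, 2.0]
W = [[0.0, 0.0] for _ in range(8)]
W[3] = [1.0, 1.0]  # make the fourth relationship type score highest
b = [0.0] * 8

probs, label = predict(h, W, b)
```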
S4.3, cost function optimization: by calculating the negative log-likelihood of the real label y, the cost function of the deep neural network is obtained as follows:
$$J(\theta) = -\sum_{n=1}^{m} t_n \log \hat{p}(y_n \mid s_i) + \lambda \lVert \theta \rVert^2$$
where $t_n$ denotes the n-th component of the one-hot vector of the real label, $\hat{p}(y_n \mid s_i)$ denotes the estimated probability of each predefined relationship type output by the softmax classifier in step S4.2, m denotes the number of predefined relationship types (here 8), $\lambda$ denotes the L2 regularization hyper-parameter, $\theta$ denotes the independent parameters of the entity relationship extraction model, and $\lVert \cdot \rVert$ denotes the norm; the model hyper-parameters are continuously adjusted by minimizing the cost function $J(\theta)$ to complete model training;
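As an illustrative sketch only, the cost in step S4.3 can be computed as a negative log-likelihood term plus an L2 penalty. Using the squared norm and applying no averaging factor are assumptions made for this example; the probabilities, parameters, and value of `lam` are hypothetical.

```python
import math
from typing import List


def cost(t: List[int], p: List[float], theta: List[float], lam: float) -> float:
    """Negative log-likelihood of the true label plus an L2 penalty on the parameters."""
    # Only the component where t_n == 1 contributes to the sum.
    nll = -sum(t_n * math.log(p_n) for t_n, p_n in zip(t, p) if t_n)
    l2 = lam * sum(w * w for w in theta)  # squared L2 norm (assumption)
    return nll + l2


# Hypothetical example with m = 8 relationship types; the true label is the third type.
t = [0, 0, 1, 0, 0, 0, 0, 0]
p = [0.1, 0.1, 0.5, 0.1, 0.05, 0.05, 0.05, 0.05]
theta = [1.0, 2.0]
lam = 0.01

value = cost(t, p, theta, lam)
```

A higher estimated probability for the true type lowers the first term, while the penalty discourages large parameter values.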
S5, entity relationship extraction: extracting entity relationships from the military scenario text to be processed by using the trained model, specifically comprising the following steps:
S5.1, test text vectorization: vectorizing the military scenario original text to be processed, sentence by sentence, using the processing procedure of step S3;
S5.2, entity relation prediction: using the model trained in step S4, performing semantic relation prediction, sentence by sentence, on the vectorized military scenario text output in step S5.1, and storing the results.
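As an illustrative sketch only, step S5 can be read as a sentence-by-sentence loop that reuses the vectorization of step S3 and the model trained in step S4. The callables `vectorize` and `predict_relation` are hypothetical placeholders for those two components; the stubs below exist only so the loop can be run.

```python
from typing import Callable, List, Sequence, Tuple


def extract_relations(
    sentences: Sequence[str],
    vectorize: Callable[[str], List[float]],
    predict_relation: Callable[[List[float]], str],
) -> List[Tuple[str, str]]:
    """Vectorize each sentence, predict its relationship type, and store the result."""
    results = []
    for sentence in sentences:
        vector = vectorize(sentence)
        results.append((sentence, predict_relation(vector)))
    return results


# Hypothetical stubs standing in for the S3 vectorizer and the S4 model.
def stub_vectorize(sentence: str) -> List[float]:
    return [float(len(word)) for word in sentence.split()]


def stub_predict(vector: List[float]) -> str:
    return "y1" if sum(vector) > 10 else "y2"


stored = extract_relations(
    ["brigade attacks hill", "unit holds"],
    stub_vectorize,
    stub_predict,
)
```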
2. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the domain authority dictionary comprises the Military Encyclopedia of China, the Military Dictionary, and the Concise Military Dictionary.
3. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the corpus is divided into the training data set and the test data set in a ratio of 2:1.
4. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the open-source syntactic analysis tool is the Stanford Parser.
5. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the open-source word vector conversion tool is word2vec.
6. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the specific neural network is a long short-term memory network.
7. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the bidirectional neural network is a bidirectional long short-term memory network.
8. The military scenario entity relationship extraction method combined with syntactic analysis according to claim 1, wherein the classifier comprises a softmax classifier.
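As an illustrative sketch only, the 2:1 division of the corpus into a training data set and a test data set recited in claim 3 could be carried out as follows. Shuffling before the split and the fixed random seed are assumptions added for the example; the claim itself only specifies the ratio.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_corpus(corpus: Sequence[T],
                 train_parts: int = 2,
                 test_parts: int = 1,
                 seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the corpus entries and split them train:test in the given ratio."""
    items = list(corpus)
    random.Random(seed).shuffle(items)  # shuffling is an assumption, not stated in the claim
    cut = len(items) * train_parts // (train_parts + test_parts)
    return items[:cut], items[cut:]


# Hypothetical corpus of nine labeled entries, represented here by their indices.
train_set, test_set = split_corpus(range(9))
```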
9. A military scenario entity relationship extraction device combined with syntactic analysis, the device comprising:
corpus construction module 100: predefining the target relationship types for entity relationship extraction, labeling military scenario original texts, and constructing a training data set and a test data set for the entity relationship extraction model, specifically comprising:
entity relationship predefining unit 101: predefining the entity relationship types to be extracted according to the principles and methods used by the Semantic Evaluation conference to define entity relationship types;
entity relationship corpus construction unit 102: manually labeling military scenario original texts according to the predefined entity relationship types to generate an entity relationship extraction corpus;
data set dividing unit 103: dividing the corpus obtained by the entity relationship corpus construction unit 102 into a training data set and a test data set according to a specific ratio;
syntactic analysis module 200: performing syntactic analysis on the sentence in each corpus entry of the corpus and filtering out sentence components that do not contribute to entity relationship extraction, specifically comprising:
syntax tree generation unit 201: performing syntactic analysis on the sentence in each corpus entry of the corpus by using an open-source tool to generate a syntax tree;
syntax tree pruning unit 202: pruning the branches and leaves of the syntax tree other than the entities and their root nodes to generate a syntactic analysis subtree;
subtree recombination unit 203: recombining the syntactic analysis subtree into a text sequence without changing the original order of the words;
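As an illustrative sketch only, the pruning and recombination performed by units 202 and 203 can be shown on a hand-written dependency-style tree. The example assumes that "the entities and their root nodes" means the entity words plus every node on the path from each entity up to the root; that reading, the `heads` encoding (parent index per token, -1 for the root), and the sample sentence are all hypothetical, and no parser is actually invoked.

```python
from typing import List, Sequence


def prune_and_recombine(tokens: Sequence[str],
                        heads: Sequence[int],
                        entity_indices: Sequence[int]) -> List[str]:
    """Keep entity tokens and their ancestors, then restore the original word order."""
    keep = set()
    for idx in entity_indices:
        node = idx
        # Walk up to the root, stopping early if the path is already kept.
        while node != -1 and node not in keep:
            keep.add(node)
            node = heads[node]
    return [tokens[i] for i in sorted(keep)]


# Hypothetical parse: heads[i] is the parent of token i, -1 marks the root ("attacks").
tokens = ["The", "armored", "brigade", "quickly", "attacks", "the", "northern", "hill"]
heads = [2, 2, 4, 4, -1, 7, 7, 4]
entities = [2, 7]  # "brigade" and "hill"

recombined = prune_and_recombine(tokens, heads, entities)
```

Sorting the kept indices is what preserves the original word order when the subtree is turned back into a text sequence.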
data vectorization module 300: converting the recombined sequences generated by the subtree recombination unit 203 into word embedding sets expressed as distributed vectors, specifically comprising:
training original text vectorization unit 301: segmenting the recombined sequence of the currently input corpus entry into words to obtain a word set consisting of T words, and converting the words in the set into one-hot vectors based on the domain authority dictionary;
word embedding generation unit 302: converting the one-hot vector set obtained by the training original text vectorization unit 301, word by word, into low-dimensional real-valued word embeddings by using an open-source tool;
model training module 400: training a deep-neural-network-based entity relationship extraction model using the vectorized entity relationship extraction training data set, specifically comprising:
semantic feature extraction unit 401: selecting a specific neural network as the basic relation extractor and extracting high-level semantic features of the current sentence from the vector set output by the word embedding generation unit 302, wherein the model adopts a bidirectional neural network so that the context semantic information of the entity pair $e_1$, $e_2$ is extracted at the same time, improving the recognition accuracy of the entity relationship;
entity relationship prediction unit 402: processing the feature vectors output by the semantic feature extraction unit 401 with a classifier;
cost function optimization unit 403: obtaining the cost function of the deep neural network by calculating the negative log-likelihood of the real label y, and continuously adjusting the model hyper-parameters by minimizing the cost function to complete model training;
entity relationship extraction module 500: extracting entity relationships from the military scenario text to be processed by using the trained model, specifically comprising:
test text vectorization unit 501: vectorizing the military scenario original text to be processed, sentence by sentence, using the processing procedure of the data vectorization module 300;
entity relationship prediction unit 502: using the model trained by the model training module 400, performing semantic relation prediction, sentence by sentence, on the vectorized military scenario text output by the test text vectorization unit 501, and storing the results.
CN201910653287.0A 2019-07-19 2019-07-19 Military scenario entity relationship extraction method and device combined with syntactic analysis Pending CN110597998A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910653287.0A CN110597998A (en) 2019-07-19 2019-07-19 Military scenario entity relationship extraction method and device combined with syntactic analysis

Publications (1)

Publication Number Publication Date
CN110597998A true CN110597998A (en) 2019-12-20

Family

ID=68852960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910653287.0A Pending CN110597998A (en) 2019-07-19 2019-07-19 Military scenario entity relationship extraction method and device combined with syntactic analysis

Country Status (1)

Country Link
CN (1) CN110597998A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177383A (en) * 2019-12-24 2020-05-19 上海大学 Text entity relation automatic classification method fusing text syntactic structure and semantic information
CN111309925A (en) * 2020-02-10 2020-06-19 同方知网(北京)技术有限公司 Knowledge graph construction method of military equipment
CN111476035A (en) * 2020-05-06 2020-07-31 中国人民解放军国防科技大学 Chinese open relation prediction method and device, computer equipment and storage medium
CN111738000A (en) * 2020-07-22 2020-10-02 腾讯科技(深圳)有限公司 Phrase recommendation method and related device
CN112149423A (en) * 2020-10-16 2020-12-29 中国农业科学院农业信息研究所 Corpus labeling method and system for domain-oriented entity relationship joint extraction
CN112487206A (en) * 2020-12-09 2021-03-12 中国电子科技集团公司第三十研究所 Entity relationship extraction method for automatically constructing data set
CN112685513A (en) * 2021-01-07 2021-04-20 昆明理工大学 Al-Si alloy material entity relation extraction method based on text mining
CN113076421A (en) * 2021-04-02 2021-07-06 西安交通大学 Social noise text entity relation extraction optimization method and system
CN113076396A (en) * 2021-03-29 2021-07-06 中国医学科学院医学信息研究所 Entity relationship processing method and system oriented to man-machine cooperation
CN112214610B (en) * 2020-09-25 2023-09-08 中国人民解放军国防科技大学 Entity relationship joint extraction method based on span and knowledge enhancement
CN117332761A (en) * 2023-11-30 2024-01-02 北京一标数字科技有限公司 PDF document intelligent identification marking system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5930746A (en) * 1996-03-20 1999-07-27 The Government Of Singapore Parsing and translating natural language sentences automatically
CN104933027A (en) * 2015-06-12 2015-09-23 华东师范大学 Open Chinese entity relation extraction method using dependency analysis
CN109165385A (en) * 2018-08-29 2019-01-08 中国人民解放军国防科技大学 Multi-triple extraction method based on entity relationship joint extraction model
CN109241538A (en) * 2018-09-26 2019-01-18 上海德拓信息技术股份有限公司 Based on the interdependent Chinese entity relation extraction method of keyword and verb
CN109902301A (en) * 2019-02-26 2019-06-18 广东工业大学 Relation inference method, device and equipment based on deep neural network

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
LI ZHEN 等: ""Research on Entity Semantic Relation Extraction in Fusion Domain"", 《2018 2ND INTERNATIONAL CONFERENCE ON DATA SCIENCE AND BUSINESS ANALYTICS (ICDSBA)》 *
YUANFEI DAI 等: ""Relation Classification via LSTMs Based on Sequence and Tree Structure"", 《IEEE ACCESS》 *
单赫源 等: ""结合词语规则和SVM模型的军事命名实体关系抽取方法"", 《指挥控制与仿真》 *
唐弘毅: ""基于深度学习的实体关系抽取的研究"", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 *
唐敏: ""基于深度学习的中文实体关系抽取方法研究"", 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 *
庄成龙 等: ""基于树核函数的实体语义关系抽取方法研究"", 《中文信息学报》 *
朱珊珊 等: ""基于BiLSTM_Att的军事领域实体关系抽取研究"", 《智能计算机与应用》 *
李枫林 等: ""基于深度学习框架的实体关系抽取研究进展"", 《情报科学》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177383B (en) * 2019-12-24 2024-01-16 上海大学 Text entity relation automatic classification method integrating text grammar structure and semantic information
CN111177383A (en) * 2019-12-24 2020-05-19 上海大学 Text entity relation automatic classification method fusing text syntactic structure and semantic information
CN111309925B (en) * 2020-02-10 2023-06-30 同方知网数字出版技术股份有限公司 Knowledge graph construction method for military equipment
CN111309925A (en) * 2020-02-10 2020-06-19 同方知网(北京)技术有限公司 Knowledge graph construction method of military equipment
CN111476035A (en) * 2020-05-06 2020-07-31 中国人民解放军国防科技大学 Chinese open relation prediction method and device, computer equipment and storage medium
CN111476035B (en) * 2020-05-06 2023-09-05 中国人民解放军国防科技大学 Chinese open relation prediction method, device, computer equipment and storage medium
CN111738000B (en) * 2020-07-22 2020-11-24 腾讯科技(深圳)有限公司 Phrase recommendation method and related device
CN111738000A (en) * 2020-07-22 2020-10-02 腾讯科技(深圳)有限公司 Phrase recommendation method and related device
CN112214610B (en) * 2020-09-25 2023-09-08 中国人民解放军国防科技大学 Entity relationship joint extraction method based on span and knowledge enhancement
CN112149423A (en) * 2020-10-16 2020-12-29 中国农业科学院农业信息研究所 Corpus labeling method and system for domain-oriented entity relationship joint extraction
CN112149423B (en) * 2020-10-16 2024-01-26 中国农业科学院农业信息研究所 Corpus labeling method and system for domain entity relation joint extraction
CN112487206A (en) * 2020-12-09 2021-03-12 中国电子科技集团公司第三十研究所 Entity relationship extraction method for automatically constructing data set
CN112685513A (en) * 2021-01-07 2021-04-20 昆明理工大学 Al-Si alloy material entity relation extraction method based on text mining
CN113076396A (en) * 2021-03-29 2021-07-06 中国医学科学院医学信息研究所 Entity relationship processing method and system oriented to man-machine cooperation
CN113076396B (en) * 2021-03-29 2023-05-16 中国医学科学院医学信息研究所 Entity relationship processing method and system for man-machine cooperation
CN113076421A (en) * 2021-04-02 2021-07-06 西安交通大学 Social noise text entity relation extraction optimization method and system
CN113076421B (en) * 2021-04-02 2023-03-28 西安交通大学 Social noise text entity relationship extraction optimization method and system
CN117332761A (en) * 2023-11-30 2024-01-02 北京一标数字科技有限公司 PDF document intelligent identification marking system
CN117332761B (en) * 2023-11-30 2024-02-09 北京一标数字科技有限公司 PDF document intelligent identification marking system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191220)