CN115238690A - Military field composite named entity identification method based on BERT - Google Patents

Military field composite named entity identification method based on BERT

Info

Publication number
CN115238690A
Authority
CN
China
Prior art keywords
word
model
layer
attention
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111408527.4A
Other languages
Chinese (zh)
Inventor
周焕来
张博阳
乔磊崖
高源�
郭健煜
唐小龙
贾海涛
王俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yituo Communications Group Co ltd
Original Assignee
Yituo Communications Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yituo Communications Group Co ltd filed Critical Yituo Communications Group Co ltd
Priority to CN202111408527.4A priority Critical patent/CN115238690A/en
Publication of CN115238690A publication Critical patent/CN115238690A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/268Morphological analysis

Abstract

The invention provides a BERT-based method for recognizing composite named entities in the military field, which comprises the following steps: at the input layer, character vectors are represented with a BERT pre-training model and word vectors with Word2Vec, the two are used jointly for word embedding, and a data-enhancement operation is introduced; at the vector representation layer the character and word vectors are concatenated to enrich the original input and construct the initial sentence vector; at the encoding layer, Bi-On-LSTM captures the global semantic information of the text; an attention layer is introduced to update the semantic weights; at the decoding layer, an LSTM-Unit long short-term memory network extracts nested named entities, so the number of output labels is not limited to a single label while the strengths of the traditional On-LSTM in extracting sentence-level information are inherited; softmax produces the predicted output, and a CRF layer ensures that the sequence-labeling result satisfies the dependencies between labels, completing the domain nested-entity extraction task. Because the decoding layer uses the LSTM-Unit decoding scheme and introduces a CRF layer to guarantee the sequence dependencies, problems such as entity nesting are handled well and the accuracy of nested-entity extraction in the military field is improved.

Description

Military field composite named entity identification method based on BERT
Technical Field
The invention belongs to the field of natural language processing.
Background
In recent years, increasingly mature computers and related information-processing technologies have provided effective means for further improving military command efficiency, and the informatization and intelligentization of our military is advancing steadily. As an emerging technology, the knowledge graph can integrate complex, massive data and link it through mined relations, giving it strong data-description capability and rich semantic relations. Entities are the key language units that carry information in text and the core elements of a knowledge graph. Entity recognition and extraction underpin subsequent work such as attribute and relation extraction, event extraction and knowledge-graph construction; the main task is to identify and classify entities with specific meaning in text (for example, the general domain usually takes names of people, places and organizations as extraction targets). The subject of the present invention is target entities in the military field, that is, high-value named entities in unstructured military text such as military figures, weaponry and military events, which often carry rich military knowledge. Correctly and efficiently recognizing military named entities supports subsequent operations such as battlefield information acquisition, information retrieval, information filtering, information association and semantic search, improves the efficiency of reconnaissance, command decision-making, and organization and implementation, and thereby raises the automation and intelligence of military combat command.
Named entity recognition is an important research direction in the field of information extraction: it extracts the entities mentioned in unstructured text. Composite named entity recognition is an optimization of named entity recognition for a specific situation; its core objective is to extract from the text all elements that may be entities and to distinguish the nesting relationships among them.
At present, researchers pay more attention to ordinary named entity recognition, and research on composite named entity recognition is limited. Most existing methods can therefore only perform ordinary named entity recognition and do not perform well on composite entities, so the models lack portability and can hardly complete open-domain entity extraction. Unlike most scholars, the present invention adopts, at the entity-extraction stage, a model entirely different from ordinary named entity recognition. Given the good performance of the encoder-decoder framework and the attention mechanism on other natural language processing tasks, and because the task studied here is fundamentally a sequence-labeling task, the invention builds a model for extracting composite entities among event elements by combining the attention mechanism with an encoder-decoder framework.
Disclosure of Invention
The invention provides a composite-entity extraction method based on an encoding-decoding model, with the aim of improving the accuracy and efficiency of composite-entity extraction. The method comprises the following steps:
(1) Select features at the input layer to construct the initial sentence vector.
(2) Capture hierarchical structure information and sequence information at the coding layer.
(3) Capture word-to-word information within the sentence at the attention layer and compute the corresponding weights.
(4) Combine the features from the preceding layers at the decoding layer and extract further abstract features.
(5) Obtain the element recognition result at the output layer using a softmax function.
Drawings
FIG. 1 is a block diagram of a codec framework employed in the present invention.
FIG. 2 is a schematic diagram of the features of the constructed text vector employed in the present invention.
FIG. 3 defines military nested entity types for the present invention.
FIG. 4 shows the joint character-word embedding representation at the input layer of the present invention.
Fig. 5 is a schematic diagram of the EDA data enhancement added after the representation layer.
FIG. 6 is a view showing the structure of On-LSTM.
Fig. 7 illustrates the core concept of the attention mechanism used in the present invention.
Fig. 8 is a schematic diagram of a composite entity tag.
Fig. 9 is a general configuration diagram of a decoding layer.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention.
As shown in fig. 1, the event-element extraction method is based on an encoder-decoder framework combined with an attention mechanism and consists of five parts: an input layer (Input Layer), an encoding layer (Encoder Layer), an attention layer (Attention Layer), a decoding layer (Decoder Layer) and an output layer (Output Layer). The specific implementation is as follows:
the method comprises the following steps: input layer
In other named-entity extraction models, the entity-boundary and part-of-speech features obtained by Chinese word segmentation are the two features most commonly used. In practice, however, when nested entities are the target, obtaining segmentation and part-of-speech information with a natural language processing tool (such as HIT's LTP) during text preprocessing can propagate errors and degrade model performance. Therefore, to guarantee the accuracy and efficiency of the entity recognition model, only raw features that can be extracted directly from the text are selected.
In the invention, word embedding maps the text sequence into a high-dimensional vector space. Strategies that have proved effective in existing research are fused: Chinese character embeddings containing full-text information are obtained from the pre-trained semantic model BERT (Bidirectional Encoder Representations from Transformers), and the language rules and semantic knowledge carried in these vectors assist the recognition of nested entities in the military field; at the same time, the Word2Vec technique is fused in to introduce the boundary and part-of-speech features of Chinese words and improve entity-recognition performance.
The invention adds data enhancement between the output of the vector representation layer and the encoder to generate negative examples of the sentence-vector representation and strengthen model training.
As shown in FIG. 2, for the military-field nested entity recognition task the invention selects BERT character vectors (position, sentence and character features) and W2V word vectors (part-of-speech and boundary features). In FIG. 2 the nested noun "cross-border business contract negotiation" belongs to the target event entity category. A sentence is denoted by L = {w_1, w_2, ..., w_n}, where n is the sentence length and w_i is the i-th word; each word w_i is converted into a vector x_i composed of three parts.
The input layer of the invention comprises 3 steps as follows:
1. BERT character vector representation
The input to the BERT layer is the superposition of the word embedding (token embedding), sentence embedding (segment embedding) and position embedding of each character. Since the experimental corpus is fed in as single sentences, the sentence (segment) embedding represents the sentence information, and the sentence sequence is marked by inserting the symbols [CLS] and [SEP] at its head and tail. The BERT layer maps each character in the sentence to a low-dimensional dense character vector.
In the invention, BERT is trained with bidirectional Transformer computation and maps each character in the sentence into a low-dimensional dense character vector.
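For illustration only, the following is a minimal sketch of obtaining per-character BERT vectors for a Chinese sentence; the use of the HuggingFace transformers library and the public bert-base-chinese checkpoint is an assumption made for this example, not part of the patented implementation.

```python
# Illustrative sketch only (not the patented implementation): obtain per-character
# BERT vectors for a Chinese sentence. The HuggingFace transformers library and the
# public "bert-base-chinese" checkpoint are assumptions made for this example.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")
model.eval()

sentence = "跨境商务合同谈判"  # nested-entity example phrase mentioned for Fig. 2
# Chinese BERT tokenizes essentially character by character;
# [CLS] and [SEP] are inserted at the head and tail automatically.
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (1, seq_len, 768); dropping [CLS]/[SEP]
# leaves one low-dimensional dense vector per character.
char_vectors = outputs.last_hidden_state[0, 1:-1]
print(char_vectors.shape)  # torch.Size([8, 768]) for this 8-character sentence
```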
2. Word2Vec word vector representation
Let e(w_i) denote the word vector of the word w_i. Word2Vec from the Python Gensim topic-model package is used to train the word vectors, and the Skip-gram model is adopted.
On top of the character features extracted by BERT, part-of-speech and boundary semantic features are introduced as one of the representation features for the subsequent neural network. Some parts of speech of the original general domain are removed and a part-of-speech tag set with military characteristics is introduced, covering the parts of speech and named-entity types relevant to the military field. The military-domain nested entity types defined by the model are shown in fig. 3.
For each part of speech in the segmented and POS-tagged corpus, a table lookup yields the corresponding part-of-speech vector P_i (POS embedding); this is then spliced and fused in a suitable way with the character vector C_i obtained by passing the character sequence through the BERT pre-training model, giving the mixed feature vector representation H_i.
This exploits the latent information of Chinese text, makes better use of Chinese part-of-speech rules, and introduces the word-boundary information ignored by a character-level model, fusing lexical information and entity-boundary information in the model's embedding layer. It thereby helps the model judge entity type and boundary during recognition and improves its recognition ability. After the two vector representations are obtained from BERT and W2V respectively, they are combined by concatenation to fuse the joint text features, as shown in fig. 4.
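As a minimal sketch (not the patent's implementation), the following trains a Skip-gram Word2Vec model with Gensim and concatenates a word's vector with a BERT character vector; the toy corpus, the vector dimensions and the way a word vector is attached to each of its characters are illustrative assumptions.

```python
# Illustrative sketch (assumes Gensim 4.x; not the patent's code): train a Skip-gram
# Word2Vec model on pre-segmented text and concatenate a word's vector with a BERT
# character vector. The toy corpus and dimensions are placeholders.
import numpy as np
from gensim.models import Word2Vec

segmented_corpus = [
    ["跨境", "商务", "合同", "谈判"],
    ["军事", "演习", "区域"],
]  # each sentence is a list of already-segmented words

w2v = Word2Vec(sentences=segmented_corpus, vector_size=100, sg=1, window=5, min_count=1)

def joint_feature(char_vec: np.ndarray, containing_word: str) -> np.ndarray:
    """Concatenate a BERT character vector with the Word2Vec vector of the word
    that contains this character (the splicing/fusion described above)."""
    if containing_word in w2v.wv:
        word_vec = w2v.wv[containing_word]
    else:
        word_vec = np.zeros(w2v.wv.vector_size, dtype=np.float32)
    return np.concatenate([char_vec, word_vec])

# e.g. a 768-d BERT character vector + 100-d word vector -> 868-d joint vector
print(joint_feature(np.zeros(768, dtype=np.float32), "合同").shape)  # (868,)
```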
3. Data enhancement
Data enhancement techniques (EDA, Easy Data Augmentation) were first proposed in image processing and have since become standard in that field, where flipping, rotation, mirroring, Gaussian white noise and similar operations augment the data and improve the robustness of classifiers in image recognition. In natural language processing, data-enhancement techniques have produced variants for different tasks, such as text classification and part-of-speech tagging. So-called adversarial training can in fact be regarded as a way of improving a model's generalization by expanding limited data: data-enhancement techniques improve model performance by generating perturbations that the classifier easily mistakes for false instances.
The model's training data come from a limited set of cases; to improve the performance of the military-field nested entity recognition model, prevent overfitting and improve generalization, four data-enhancement operations are introduced.
1) Synonym replacement (SR, Synonym Replace): ignoring stopwords, randomly select n words from the sentence and replace each with a synonym drawn at random from a synonym dictionary.
2) Random insertion (RI, Random Insert): ignoring stopwords, randomly pick a word, randomly choose one of its synonyms and insert it at a random position in the original sentence; this process may be repeated n times.
3) Random swap (RS, Random Swap): randomly choose two words in the sentence and exchange their positions; this process may be repeated n times.
4) Random deletion (RD, Random Delete): delete each word in the sentence with probability p. The original input information is enhanced by adding the data-enhancement operation on top of the concatenated word-vector representation layer, as shown in fig. 5.
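A simplified sketch of these four operations on a word-segmented sentence is given below; the stopword list and synonym dictionary are hypothetical placeholders, and a real system would plug in a domain lexicon.

```python
# Simplified sketch of the four EDA operations above on a word-segmented sentence.
# The stopword list and synonym dictionary are hypothetical placeholders.
import random

STOPWORDS = {"的", "了", "在"}
SYNONYMS = {"谈判": ["磋商", "会谈"], "区域": ["地域"]}

def synonym_replace(words, n=1):                      # SR
    candidates = [i for i, w in enumerate(words) if w not in STOPWORDS and w in SYNONYMS]
    for i in random.sample(candidates, min(n, len(candidates))):
        words[i] = random.choice(SYNONYMS[words[i]])
    return words

def random_insert(words, n=1):                        # RI
    for _ in range(n):
        candidates = [w for w in words if w not in STOPWORDS and w in SYNONYMS]
        if not candidates:
            break
        synonym = random.choice(SYNONYMS[random.choice(candidates)])
        words.insert(random.randrange(len(words) + 1), synonym)
    return words

def random_swap(words, n=1):                          # RS
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_delete(words, p=0.1):                      # RD
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

sentence = ["跨境", "商务", "合同", "谈判"]
print(random_swap(sentence[:], n=1))
```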
The input representation layer therefore comprises the word embedding that fuses BERT character vectors with W2V word vectors, together with the data-enhancement operation realized by adding a small perturbation to the training data, as shown in Equation 1.
η_adv = argmax_{η, ‖η‖≤ε} L(ω + η; θ̂)    (1)

That is, the worst-case enhancement perturbation η_adv is added to the original embedding vector ω so as to maximize the loss function, where θ̂ is a copy of the current model parameters. The original examples and the generated enhanced-data sentences are then trained jointly, so the final loss is shown in Equation 2:

L_final = L(ω; θ) + L(ω + η_adv; θ)    (2)

In summary, the real-valued vector x_i of the word w_i can be expressed as in Equation 3:

x_i = C_i ⊕ P_i ⊕ e(w_i)    (3)

where x_i ∈ R^d is of dimension d and ⊕ denotes concatenation of vectors. X = {x_1, x_2, ..., x_n} then represents an event sentence L of length n, where X ∈ R^{n×d} is of dimension n × d and x_i is the real-valued vector of the i-th word w_i.
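By way of illustration, one common way to realize the worst-case perturbation of Equation 1 and the joint objective of Equation 2 is an FGM-style step on the embedded input; the patent text does not name a specific algorithm, so the sketch below (PyTorch) is an assumption.

```python
# Illustrative sketch of an FGM-style realization of Eq. 1/Eq. 2 in PyTorch.
# The patent does not name a specific algorithm, so this is an assumption.
import torch

def adversarial_joint_loss(model, embeddings, labels, loss_fn, epsilon=1.0):
    """embeddings: embedded input with requires_grad=True; model consumes embeddings
    directly (an assumption). Returns clean loss + adversarial loss (Eq. 2)."""
    loss_clean = loss_fn(model(embeddings), labels)
    grad = torch.autograd.grad(loss_clean, embeddings, retain_graph=True)[0]
    # approximate worst-case perturbation within an L2 ball of radius epsilon (Eq. 1)
    eta_adv = epsilon * grad / (grad.norm() + 1e-12)
    loss_adv = loss_fn(model(embeddings + eta_adv.detach()), labels)
    return loss_clean + loss_adv
```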
Step two: coding layer
Different combinations of encoding and decoding layers can be chosen for different tasks; on image-processing tasks a convolutional neural network usually forms the encoding layer, whereas for a natural-language task such as event-element extraction a recurrent neural network is usually chosen. Because a sentence can be represented as a hierarchical structure, and the neurons of a conventional recurrent network such as LSTM are unordered, such networks cannot extract the hierarchical information of a sentence. The invention therefore selects a bidirectional ordered long short-term memory network (Bi-On-LSTM) as the basic structure of the coding layer. The forward computation of On-LSTM is given in Equation 4, and FIG. 6 is a schematic structural diagram of the On-LSTM unit.
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
ĉ_t = tanh(W_c x_t + U_c h_{t-1} + b_c)
f̃_t = cumax(W_f̃ x_t + U_f̃ h_{t-1} + b_f̃)
ĩ_t = 1 − cumax(W_ĩ x_t + U_ĩ h_{t-1} + b_ĩ)
ω_t = f̃_t ∘ ĩ_t
c_t = (f_t ∘ ω_t + f̃_t − ω_t) ∘ c_{t-1} + (i_t ∘ ω_t + ĩ_t − ω_t) ∘ ĉ_t
h_t = o_t ∘ tanh(c_t)    (4)

The main additions of ON-LSTM are the new master forget gate f̃_t and master input gate ĩ_t, which are obtained with the rightward and leftward cumulative-softmax (cumsum) operations respectively. The forward On-LSTM computes the left-context state h_t^L of the word x_t at time t, and the backward On-LSTM computes its right-context state h_t^R; the output of the coding layer at time t is the concatenation h_t = [h_t^L ; h_t^R].
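As an illustration of the master gates mentioned above, the following minimal sketch implements the cumulative-softmax (cumax) operation and the master forget/input gates; the weight shapes and the surrounding LSTM plumbing are omitted, and this is an assumption rather than the patent's code.

```python
# Minimal sketch of the ON-LSTM master gates via cumulative softmax (cumax);
# weight shapes and the rest of the LSTM cell are omitted and assumed.
import torch
import torch.nn.functional as F

def cumax(x, dim=-1, reverse=False):
    """Cumulative sum of a softmax; the rightward/leftward 'cumsum' mentioned above."""
    probs = F.softmax(x, dim=dim)
    if reverse:
        probs = torch.flip(probs, dims=[dim])
        return torch.flip(probs.cumsum(dim), dims=[dim])
    return probs.cumsum(dim)

def master_gates(pre_forget, pre_input):
    """pre_forget / pre_input: pre-activations of the master forget / input gates."""
    f_master = cumax(pre_forget)           # monotonically increasing in [0, 1]
    i_master = 1.0 - cumax(pre_input)      # monotonically decreasing in [0, 1]
    omega = f_master * i_master            # overlap region used to mix the plain gates
    return f_master, i_master, omega

f_m, i_m, w = master_gates(torch.randn(1, 8), torch.randn(1, 8))
```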
Step three: attention layer
In short, the attention mechanism ignores unimportant features among a large number of features while strengthening the focus on useful ones. Attention mechanisms are divided into the Soft-Attention model and the Self-Attention model; FIG. 7 shows the core idea of the attention mechanism used in the present invention.
In the Soft-Attention model, the items of the input sequence S are first abstracted into data pairs of the form <Key, Value>. For a given item Query in the target sequence T, the weight coefficient of each Key in the input sequence is obtained by computing the correlation between that Key and the Query, and the final attention value (Attention Value) is obtained by weighting and summing the Values corresponding to all Keys, as shown in Equation 5:

Attention(Query, S) = Σ_{j=1}^{L_x} Similarity(Query, Key_j) · Value_j    (5)

where L_x is the length of the input sequence S.
The Self-Attention model, adopted by Google's latest machine translation models, is also called the self-attention mechanism. In the Soft-Attention model, attention mainly acts between each word of the input sequence S and the Query of the target sequence T; in the Self-Attention model, attention mainly acts among the words inside the input sequence S or the target sequence T itself. The computation of attention is similar to that of the Soft-Attention model; only the objects of the computation differ.
For natural language processing tasks, the Self-Attention model is strong at capturing semantic features between the words of a sentence. For recurrent neural networks and gated recurrent unit networks, adding the Self-Attention model lets related information between words, which would otherwise require many computation steps, be linked directly in a single step, so word dependencies can be captured well even in long sentences.
Unlike most previous natural language processing work, the model of the invention does not take the weighted sum of the encoder hidden states h_t directly as the context vector c_t of the attention layer. Instead, the hidden state h_t is ignored when computing c_t, and when predicting the final result y_t the decoding-layer output s_t and the coding-layer h_t are used together as features. This is because, at time t, the coding-layer h_t represents the semantic information of the current candidate event element and is the most effective information for predicting y_t, whereas c_t represents the influence of the other words of the sentence on the candidate event element. The context vector c_t is therefore computed as in Equation 6:

c_t = Σ_{j=1, j≠t}^{n} a_{t,j} h_j    (6)

where the hidden state of the coding layer is denoted h_j and the attention weight is denoted a_{t,j}, computed as in Equation 7:

a_{t,j} = exp(e_{t,j}) / Σ_k exp(e_{t,k})    (7)

where the attention score is denoted e_{t,j}; as e_{t,j} increases the attention weight a_{t,j} grows, the influence of the coding-layer encoding h_j on the context vector c_t increases, and so does its influence on the final element type.

For different deep-learning tasks, the attention score e_{t,j} can be computed by dot product (dot), multiplication (general) or addition (concat), where s_t^T is the transpose of s_t and W_a is the weight matrix of the attention layer. Extensive experiments show that for natural language processing tasks the general form yields better final results, so it is chosen for the event-element extraction task studied by the invention, as shown in Equation 8:

e_{t,j} = s_t^T W_a h_j    (8)
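For illustration, the following sketch computes the general attention score of Equation 8, the softmax weights of Equation 7, and a context vector that skips the candidate position itself as in Equation 6; the tensor dimensions are illustrative assumptions.

```python
# Illustrative sketch of Eqs. 6-8: the "general" score e_{t,j} = s_t^T W_a h_j,
# softmax weights a_{t,j}, and a context vector that skips position t itself.
# Dimensions are illustrative assumptions.
import torch

def general_attention(s_t, H, W_a, t):
    """s_t: (d_s,) decoding-layer state; H: (n, d_h) coding-layer states; W_a: (d_s, d_h)."""
    scores = H @ (W_a.T @ s_t)                 # e_{t,j} for every position j
    scores[t] = float("-inf")                  # exclude the candidate word itself
    weights = torch.softmax(scores, dim=0)     # a_{t,j}
    return weights @ H                         # context vector c_t of shape (d_h,)

n, d_s, d_h = 6, 128, 128
c_t = general_attention(torch.randn(d_s), torch.randn(n, d_h), torch.randn(d_s, d_h), t=2)
```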
The core of the model designed by the invention is the attention mechanism, and the core of the attention mechanism is how the attention weights are allocated, which greatly affects the final result of the event-element model. When judging the role type of an event element, the attention mechanism assigns each word in the sentence a weight, and during recognition and classification the model pays more attention to words with larger weights.
Step four: decoding layer
The function of the decoding layer of the invention is to assign each token labels, not limited to a single one, based on the token and the other information output by the coding layer and the attention layer.
The decoding layer adopts an LSTM-Unit long short-term memory network. As in other models that use the encoder-decoder framework, the first hidden state of the decoding layer is computed from the last hidden state of the coding layer; this state can be initialized and then updated during training. Each label l_i ∈ L corresponds to a representation g_i. At each step the layer also has the hidden output h_{t-1} of the unit at the previous time step, the model output o_{t-1} of the previous time step, and the representation x_t of the token input at the current time step; the task of this layer is to compute, from these quantities, the current hidden output h_t and the current model output o_t. The procedure is as follows:
1) The model output o_{t-1} of the previous time step is mapped onto the interval [0, 1] by softmax, and all labels whose probability exceeds a preset threshold T are collected. Taking fig. 8 as an example, if the model behaves as expected, then for the current token "his" it should find two labels that meet the threshold requirement, U_PER and B_PER. Note that the label O itself is not being predicted at this point; rather, to effectively model the probability of each new entity beginning, every token must carry at least one O label, i.e. the model ultimately needs to predict O, U_PER and B_PER.
2) Since the model predicted three possible labels at the previous time step, at the current step we must examine which label each of them may lead to. The current x_t is therefore copied three times, and each copy computes a current hidden output through the LSTM unit. Specifically, for a possible label k of the previous time step, the hidden result of the current time step is given by Equation 9:

h_t^k = LSTM(x_t ⊕ g_k, h_{t-1})    (9)
3) Thus, for each prediction of the previous time step, a hidden representation of the corresponding current time step is obtained. The three hidden representations are then averaged, as shown in Equation 10:

h_t = (1 / |G_{t-1}|) Σ_{k ∈ G_{t-1}} h_t^k    (10)
where |G_{t-1}| is the number of labels meeting the threshold at the previous time step, equal to 3 in the present example;
4) The output of the current time step is obtained as o_t = U h_t + b, where U and b are the weight matrix and bias term of the feed-forward network (FFN).
The above describes, in a Decoder-like fashion, how all possible labels of each token are output one by one; if the description is hard to follow directly, it can be understood with the help of the model structure in fig. 9.
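As an illustrative sketch of steps 1) to 4) above (the shapes, the label-embedding table and the fallback behaviour are assumptions, not the patent's code):

```python
# Illustrative sketch of one decoding time step following 1)-4) above: keep every
# label above threshold T from the previous step, run the LSTM cell once per kept
# label (token representation concatenated with the label embedding), average the
# hidden states (Eq. 10) and project to the current output. Shapes, the label
# embedding table and the fallback to the best label are assumptions.
import torch
import torch.nn as nn

def decode_step(cell: nn.LSTMCell, o_prev, label_emb, x_t, h_prev, c_prev, U, b, T=0.5):
    probs = torch.softmax(o_prev, dim=-1)                 # map o_{t-1} into [0, 1]
    kept = (probs > T).nonzero(as_tuple=True)[0]
    if kept.numel() == 0:
        kept = probs.argmax().unsqueeze(0)
    h_list = []
    for k in kept:                                        # one LSTM step per kept label
        inp = torch.cat([x_t, label_emb[k]]).unsqueeze(0)
        h_k, _ = cell(inp, (h_prev.unsqueeze(0), c_prev.unsqueeze(0)))
        h_list.append(h_k.squeeze(0))
    h_t = torch.stack(h_list).mean(dim=0)                 # average over |G_{t-1}| labels
    o_t = U @ h_t + b                                     # o_t = U h_t + b
    return h_t, o_t
```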
The loss function based on the above idea is given in Equation 11:

Loss = − Σ_{t=1}^{n} Σ_{l ∈ L} y_{t,l} log ŷ_{t,l}    (11)

where y_{t,l} is the gold indicator of label l at position t and ŷ_{t,l} is the predicted probability.
This is essentially a multi-class cross-entropy loss; the output of each training pass is corrected according to it, and training can end once the model's performance reaches the required standard. Because the number of labels output by the model is not limited to a single one, while the strengths of the traditional On-LSTM in extracting sentence-level information are inherited, this is the core procedure of the composite named entity recognition decoding structure.
Step five: output layer
The output layer processes the output of the decoding layer with a softmax function to obtain the classification result. In the task studied by the invention, what most affects the classification result is whether the current semantics (the candidate event element) is a simple entity or a nested entity. Therefore, when computing the attention weights for c_t, only the influence of the other words in the sentence on the candidate event element is considered, while the semantic information h_t obtained after the candidate event element passes through the coding layer is left out of c_t, so as to preserve as much information about the word itself as possible. The output y_t of the output layer contains both c_t and h_t, as shown in Equation 12:
y t =softmax(w h h t +w c c t +b) (12)
where the weight matrices w_h and w_c are both randomly initialized, b is a bias vector, and y_t is the prediction result for the current word.
Let the number of training samples be T, let the i-th sample be (x_i, y_i), and let θ denote the model parameters; the target loss function of the model is then given by Equation 13:

J(θ) = − (1/T) Σ_{i=1}^{T} log p(y_i | x_i; θ)    (13)
During training, the target loss function is optimized with Adam, while Dropout is used to prevent overfitting.
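A generic sketch of this training configuration (Adam plus Dropout) applied to the output head of Equation 12 is shown below; the layer sizes, dropout rate and learning rate are illustrative assumptions, not values given in the patent.

```python
# Generic sketch of the training configuration mentioned above (Adam + Dropout)
# applied to the output head of Eq. 12; sizes and hyperparameters are illustrative.
import torch
import torch.nn as nn

class OutputHead(nn.Module):
    def __init__(self, d_h, d_c, num_labels, p_drop=0.5):
        super().__init__()
        self.drop = nn.Dropout(p_drop)
        self.w_h = nn.Linear(d_h, num_labels, bias=False)
        self.w_c = nn.Linear(d_c, num_labels, bias=True)  # its bias plays the role of b

    def forward(self, h_t, c_t):
        # y_t = softmax(w_h h_t + w_c c_t + b), with Dropout for regularization
        return torch.softmax(self.w_h(self.drop(h_t)) + self.w_c(self.drop(c_t)), dim=-1)

head = OutputHead(d_h=256, d_c=256, num_labels=9)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
```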
Step six: CRF layer
y_i is the probability matrix of the labels corresponding to x_i, and the final output is the label with the largest value in y_i. However, the chosen label may violate the constraint rules, for example the label PER-B being followed by the label ORG-E. To ensure that the labeling result of the whole sequence satisfies the dependencies between labels, a transition matrix T is introduced whose element T_ij denotes the probability of transitioning from label i to label j. This layer applies the Viterbi algorithm; the result is computed jointly from the output matrix Y of the upper layer and the transition matrix T, and the predicted output of the whole sequence is given by Equation 14:

y* = argmax_y Σ_{i=1}^{n} ( Y_{i, y_i} + T_{y_{i−1}, y_i} )    (14)
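For illustration, a minimal NumPy sketch of Viterbi decoding over the emission matrix Y and the transition matrix T of Equation 14 (not the patent's implementation):

```python
# Minimal NumPy sketch (not the patent's implementation) of Viterbi decoding over
# the emission matrix Y and the label transition matrix T, as in Eq. 14.
import numpy as np

def viterbi(Y, T):
    """Y: (n, L) per-position label scores; T: (L, L) transition scores from label i to j."""
    n, L = Y.shape
    score = Y[0].copy()
    back = np.zeros((n, L), dtype=int)
    for t in range(1, n):
        total = score[:, None] + T + Y[t][None, :]   # rows: previous label, cols: current label
        back[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):                    # backtrack the best label sequence
        path.append(int(back[t][path[-1]]))
    return path[::-1]

print(viterbi(np.random.rand(5, 4), np.random.rand(4, 4)))
```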
although illustrative embodiments of the present invention have been described above to facilitate the understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited in scope to the specific embodiments. Such variations are obvious and all inventions utilizing the inventive concept are intended to be protected.

Claims (6)

1. A military field composite named entity recognition method based on BERT, characterized in that the method recognizes and classifies simple entities and nested entities in military-field event sentences and comprises the following steps:
Step 1: select features at the input layer to construct character vectors and word vectors, and perform data enhancement;
Step 2: capture hierarchical structure information and sequence information at the coding layer;
Step 3: capture word-to-word information within the sentence at the attention layer and compute the corresponding weights;
Step 4: combine the features from the preceding layers at the decoding layer and further extract abstract features;
Step 5: obtain the element recognition result at the output layer using the softmax function and the CRF algorithm.
2. The BERT-based military field composite named entity recognition method of claim 1, wherein in step 1 selecting features at the input layer to construct character vectors and word vectors and performing data enhancement specifically means: in the military-field composite named entity recognition task, two kinds of features, character vectors and word vectors, are selected, and a sentence is represented in the form L = {w_1, w_2, ..., w_n};
where the sentence length is n and the i-th word in the sentence is w_i; each word w_i is converted into a vector x_i;
Step 1.1: training the character vectors
The input to the BERT layer is the superposition of the word embedding (token embedding), sentence embedding (segment embedding) and position embedding of each character. Since the experimental corpus is fed in as single sentences, the sentence (segment) embedding represents the sentence information, and the sentence sequence is marked by inserting the symbols [CLS] and [SEP] at its head and tail. The BERT layer maps each character in the sentence to a low-dimensional dense character vector.
In the invention, BERT is trained with bidirectional Transformer computation and maps each character in the sentence into a low-dimensional dense character vector.
Step 1.2: training the word vectors
Let e(w_i) denote the word vector of the word w_i. Word2Vec from the Python Gensim topic-model package is used to train the word vectors, and the Skip-gram model is adopted.
On top of the character features extracted by BERT, part-of-speech and boundary semantic features are introduced as one of the representation features for the subsequent neural network. Some parts of speech of the original general domain are removed and a part-of-speech tag set with military characteristics is introduced, covering the parts of speech and named-entity types relevant to the military field. The military-domain nested entity types defined by the model are shown in fig. 3.
For each part of speech in the segmented and POS-tagged corpus, a table lookup yields the corresponding part-of-speech vector P_i (POS embedding); this is then spliced and fused in a suitable way with the character vector C_i obtained by passing the character sequence through the BERT pre-training model, giving the mixed feature vector representation H_i.
This exploits the latent information of Chinese text, makes better use of Chinese part-of-speech rules, and introduces the word-boundary information ignored by a character-level model, fusing lexical information and entity-boundary information in the model's embedding layer. It thereby helps the model judge entity type and boundary during recognition and improves its recognition ability. After the two vector representations are obtained from BERT and W2V respectively, they are combined by concatenation to fuse the joint text features, as shown in fig. 4.
Step 1.3: data enhancement
The model's training data come from a limited set of cases; to improve the performance of the military-field nested entity recognition model, prevent overfitting and improve generalization, four data-enhancement operations are introduced.
1) Synonym replacement (SR, Synonym Replace): ignoring stopwords, randomly select n words from the sentence and replace each with a synonym drawn at random from a synonym dictionary.
2) Random insertion (RI, Random Insert): ignoring stopwords, randomly pick a word, randomly choose one of its synonyms and insert it at a random position in the original sentence; this process may be repeated n times.
3) Random swap (RS, Random Swap): randomly choose two words in the sentence and exchange their positions; this process may be repeated n times.
4) Random deletion (RD, Random Delete): delete each word in the sentence with probability p.
The original input information is enhanced by adding the data-enhancement operation on top of the concatenated word-vector representation layer, as shown in fig. 5.
The input representation layer comprises the word embedding that fuses BERT character vectors with W2V word vectors, together with the data-enhancement operation realized by adding a small perturbation to the training data, as shown in Equation 1:

η_adv = argmax_{η, ‖η‖≤ε} L(ω + η; θ̂)    (1)

That is, the worst-case enhancement perturbation η_adv is added to the original embedding vector ω so as to maximize the loss function, where θ̂ is a copy of the current model parameters. The original examples and the generated enhanced-data sentences are then trained jointly, so the final loss is shown in Equation 2:

L_final = L(ω; θ) + L(ω + η_adv; θ)    (2)

In summary, the real-valued vector x_i of the word w_i can be expressed as in Equation 3:

x_i = C_i ⊕ P_i ⊕ e(w_i)    (3)

where x_i ∈ R^d is of dimension d and ⊕ denotes concatenation of vectors. X = {x_1, x_2, ..., x_n} then represents an event sentence L of length n, where X ∈ R^{n×d} is of dimension n × d and x_i is the real-valued vector of the i-th word w_i.
3. The method for extracting event element entity relationship based on coding and decoding model as claimed in claim 2, wherein the capturing hierarchical structure information and sequence information at the coding layer in step 2 specifically refers to:
Different combinations of encoding and decoding layers can be chosen for different tasks; on image-processing tasks a convolutional neural network usually forms the encoding layer, whereas for a natural-language task such as event-element extraction a recurrent neural network is usually chosen; because a sentence can be represented as a hierarchical structure, and the neurons of a conventional recurrent network such as LSTM are unordered, such networks cannot extract the hierarchical information of a sentence; the invention therefore selects a bidirectional ordered long short-term memory network (Bi-On-LSTM) as the basic structure of the coding layer; the forward computation of On-LSTM is shown in Equation 4:

f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
ĉ_t = tanh(W_c x_t + U_c h_{t-1} + b_c)
f̃_t = cumax(W_f̃ x_t + U_f̃ h_{t-1} + b_f̃)
ĩ_t = 1 − cumax(W_ĩ x_t + U_ĩ h_{t-1} + b_ĩ)
ω_t = f̃_t ∘ ĩ_t
c_t = (f_t ∘ ω_t + f̃_t − ω_t) ∘ c_{t-1} + (i_t ∘ ω_t + ĩ_t − ω_t) ∘ ĉ_t
h_t = o_t ∘ tanh(c_t)    (4)

where the main additions of ON-LSTM are the new master forget gate f̃_t and master input gate ĩ_t, obtained with the rightward and leftward cumulative-softmax (cumsum) operations respectively; the forward On-LSTM computes the left-context state h_t^L of the word x_t at time t, the backward On-LSTM computes its right-context state h_t^R, and the output of the coding layer at time t is the concatenation h_t = [h_t^L ; h_t^R].
4. The method for extracting event element entity relationship based on coding and decoding model as claimed in claim 3, wherein said capturing word-word information in sentence at attention level and calculating corresponding weight in step 3 specifically means:
The attention layer is a core part of the model of the present invention. In short, the attention mechanism ignores unimportant features among a large number of features while strengthening the focus on useful ones. Attention mechanisms are divided into the Soft-Attention model and the Self-Attention model; FIG. 7 shows the core idea of the attention mechanism used in the present invention.
In the Soft-Attention model, the items of the input sequence S are first abstracted into data pairs of the form <Key, Value>. For a given item Query in the target sequence T, the weight coefficient of each Key in the input sequence is obtained by computing the correlation between that Key and the Query, and the final attention value (Attention Value) is obtained by weighting and summing the Values corresponding to all Keys, as shown in Equation 5:

Attention(Query, S) = Σ_{j=1}^{L_x} Similarity(Query, Key_j) · Value_j    (5)

where L_x is the length of the input sequence S.
The Self-Attention model, adopted by Google's latest machine translation models, is also called the self-attention mechanism. In the Soft-Attention model, attention mainly acts between each word of the input sequence S and the Query of the target sequence T; in the Self-Attention model, attention mainly acts among the words inside the input sequence S or the target sequence T itself. The computation of attention is similar to that of the Soft-Attention model; only the objects of the computation differ.
For natural language processing tasks, the Self-Attention model is strong at capturing semantic features between the words of a sentence. For recurrent neural networks and gated recurrent unit networks, adding the Self-Attention model lets related information between words, which would otherwise require many computation steps, be linked directly in a single step, so word dependencies can be captured well even in long sentences.
Unlike most previous natural language processing work, the model of the invention does not take the weighted sum of the encoder hidden states h_t directly as the context vector c_t of the attention layer. Instead, the hidden state h_t is ignored when computing c_t, and when predicting the final result y_t the decoding-layer output s_t and the coding-layer h_t are used together as features. This is because, at time t, the coding-layer h_t represents the semantic information of the current candidate event element and is the most effective information for predicting y_t, whereas c_t represents the influence of the other words of the sentence on the candidate event element. The context vector c_t is therefore computed as in Equation 6:

c_t = Σ_{j=1, j≠t}^{n} a_{t,j} h_j    (6)

where the hidden state of the coding layer is denoted h_j and the attention weight is denoted a_{t,j}, computed as in Equation 7:

a_{t,j} = exp(e_{t,j}) / Σ_k exp(e_{t,k})    (7)

where the attention score is denoted e_{t,j}; as e_{t,j} increases the attention weight a_{t,j} grows, the influence of the coding-layer encoding h_j on the context vector c_t increases, and so does its influence on the final element type.
For different deep-learning tasks, the attention score e_{t,j} can be computed by dot product (dot), multiplication (general) or addition (concat), where s_t^T is the transpose of s_t and W_a is the weight matrix of the attention layer. Extensive experiments show that for natural language processing tasks the general form yields better final results, so it is chosen for the event-element extraction task studied by the invention, as shown in Equation 8:

e_{t,j} = s_t^T W_a h_j    (8)

The core of the model designed by the invention is the attention mechanism, and the core of the attention mechanism is how the attention weights are allocated, which greatly affects the final result of the event-element model. When judging the role type of an event element, the attention mechanism assigns each word in the sentence a weight, and during recognition and classification the model pays more attention to words with larger weights.
5. The method for extracting event element entity relationship based on coding and decoding model as claimed in claim 4, wherein combining the features from the preceding layers at the decoding layer and further extracting abstract features in step 4 specifically means:
1) The model output o_{t-1} of the previous time step is mapped onto the interval [0, 1] by softmax, and all labels whose probability exceeds a preset threshold T are collected. Taking fig. 8 as an example, if the model behaves as expected, then for the current token "his" it should find two labels that meet the threshold requirement, U_PER and B_PER. Note that the label O itself is not being predicted at this point; rather, to effectively model the probability of each new entity beginning, every token must carry at least one O label, i.e. the model ultimately needs to predict O, U_PER and B_PER.
2) Since the model predicted three possible labels at the previous time step, at the current step we must examine which label each of them may lead to. The current x_t is therefore copied three times, and each copy computes a current hidden output through the LSTM unit. Specifically, for a possible label k of the previous time step, the hidden result of the current time step is given by Equation 9:

h_t^k = LSTM(x_t ⊕ g_k, h_{t-1})    (9)

3) Thus, for each prediction of the previous time step, a hidden representation of the corresponding current time step is obtained. The three hidden representations are then averaged, as shown in Equation 10:

h_t = (1 / |G_{t-1}|) Σ_{k ∈ G_{t-1}} h_t^k    (10)

where |G_{t-1}| is the number of labels meeting the threshold at the previous time step, equal to 3 in the present example;
4) The output of the current time step is obtained as o_t = U h_t + b, where U and b are the weight matrix and bias term of the feed-forward network (FFN).
The above describes, in a Decoder-like fashion, how all possible labels of each token are output one by one; if the description is hard to follow directly, it can be understood with the help of the model structure in fig. 9.
The loss function based on the above idea is given in Equation 11:

Loss = − Σ_{t=1}^{n} Σ_{l ∈ L} y_{t,l} log ŷ_{t,l}    (11)

where y_{t,l} is the gold indicator of label l at position t and ŷ_{t,l} is the predicted probability. This is essentially a multi-class cross-entropy loss; the output of each training pass is corrected according to it, and training can end once the model's performance reaches the required standard. Because the number of labels output by the model is not limited to a single one, while the strengths of the traditional On-LSTM in extracting sentence-level information are inherited, this is the core procedure of the composite named entity recognition decoding structure.
6. The method for extracting event element entity relationship based on coding and decoding model as claimed in claim 5, wherein obtaining the element recognition result at the output layer using the softmax function in step 5 specifically means: the extracted features are passed through a classifier to identify the event elements in the sentence and classify their roles;
step 6.1: softmax function
The output layer processes the output of the decoding layer with a softmax function to obtain the classification result; in the task studied by the invention, what most affects the classification result is the semantic information of the current word (the candidate event element) itself, and therefore, when computing the attention weights for c_t, only the influence of the other words in the sentence on the candidate event element is considered, while the semantic information h_t obtained after the candidate event element passes through the coding layer is left out of c_t, so as to preserve as much information about the word itself as possible; the output y_t of the output layer contains both c_t and h_t, as shown in Equation 12;
y t =softmax(w h h t +w c c t +b) (12)
where the weight matrices w_h and w_c are both randomly initialized, b is a bias vector, and y_t is the prediction result for the current word.
Let the number of training samples be T, let the i-th sample be (x_i, y_i), and let θ denote the model parameters; the target loss function of the model is then given by Equation 13;

J(θ) = − (1/T) Σ_{i=1}^{T} log p(y_i | x_i; θ)    (13)
During training, the target loss function is optimized with Adam, while Dropout is used to prevent overfitting.
Step 6.2: CRF prediction
y_i is the probability matrix of the labels corresponding to x_i, and the final output is the label with the largest value in y_i. However, the chosen label may violate the constraint rules, for example the label PER-B being followed by the label ORG-E. To ensure that the labeling result of the whole sequence satisfies the dependencies between labels, a transition matrix T is introduced whose element T_ij denotes the probability of transitioning from label i to label j. This layer applies the Viterbi algorithm; the result is computed jointly from the output matrix Y of the upper layer and the transition matrix T, and the prediction output of the whole sequence is:

y* = argmax_y Σ_{i=1}^{n} ( Y_{i, y_i} + T_{y_{i−1}, y_i} )
CN202111408527.4A 2021-11-26 2021-11-26 Military field composite named entity identification method based on BERT Pending CN115238690A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111408527.4A CN115238690A (en) 2021-11-26 2021-11-26 Military field composite named entity identification method based on BERT

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111408527.4A CN115238690A (en) 2021-11-26 2021-11-26 Military field composite named entity identification method based on BERT

Publications (1)

Publication Number Publication Date
CN115238690A true CN115238690A (en) 2022-10-25

Family

ID=83665821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111408527.4A Pending CN115238690A (en) 2021-11-26 2021-11-26 Military field composite named entity identification method based on BERT

Country Status (1)

Country Link
CN (1) CN115238690A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115860002A (en) * 2022-12-27 2023-03-28 中国人民解放军国防科技大学 Combat task generation method and system based on event extraction
CN115860002B (en) * 2022-12-27 2024-04-05 中国人民解放军国防科技大学 Combat task generation method and system based on event extraction
CN115879421A (en) * 2023-02-16 2023-03-31 之江实验室 Sentence ordering method and device for enhancing BART pre-training task
CN115879421B (en) * 2023-02-16 2024-01-09 之江实验室 Sentence ordering method and device for enhancing BART pre-training task
CN117236338A (en) * 2023-08-29 2023-12-15 北京工商大学 Named entity recognition model of dense entity text and training method thereof
CN117669574A (en) * 2024-02-01 2024-03-08 浙江大学 Artificial intelligence field entity identification method and system based on multi-semantic feature fusion
CN117669574B (en) * 2024-02-01 2024-05-17 浙江大学 Artificial intelligence field entity identification method and system based on multi-semantic feature fusion
CN117786092A (en) * 2024-02-27 2024-03-29 成都晓多科技有限公司 Commodity comment key phrase extraction method and system
CN117786092B (en) * 2024-02-27 2024-05-14 成都晓多科技有限公司 Commodity comment key phrase extraction method and system

Similar Documents

Publication Publication Date Title
CN111581961B (en) Automatic description method for image content constructed by Chinese visual vocabulary
CN109753566B (en) Model training method for cross-domain emotion analysis based on convolutional neural network
CN109800437B (en) Named entity recognition method based on feature fusion
CN112100351A (en) Method and equipment for constructing intelligent question-answering system through question generation data set
CN112733541A (en) Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism
CN111259127B (en) Long text answer selection method based on transfer learning sentence vector
CN115238690A (en) Military field composite named entity identification method based on BERT
CN112989834A (en) Named entity identification method and system based on flat grid enhanced linear converter
Shi et al. Deep adaptively-enhanced hashing with discriminative similarity guidance for unsupervised cross-modal retrieval
CN112417097B (en) Multi-modal data feature extraction and association method for public opinion analysis
CN108509521B (en) Image retrieval method for automatically generating text index
CN113377897B (en) Multi-language medical term standard standardization system and method based on deep confrontation learning
CN117076653B (en) Knowledge base question-answering method based on thinking chain and visual lifting context learning
Xiao et al. A new attention-based LSTM for image captioning
CN114218922A (en) Aspect emotion analysis method based on dual-channel graph convolution network
CN116010553A (en) Viewpoint retrieval system based on two-way coding and accurate matching signals
CN111985548A (en) Label-guided cross-modal deep hashing method
CN116662591A (en) Robust visual question-answering model training method based on contrast learning
CN115730232A (en) Topic-correlation-based heterogeneous graph neural network cross-language text classification method
CN115169429A (en) Lightweight aspect-level text emotion analysis method
Zhu et al. ZH-NER: Chinese named entity recognition with adversarial multi-task learning and self-attentions
Liao et al. The sg-cim entity linking method based on bert and entity name embeddings
Pingili et al. Target-based sentiment analysis using a bert embedded model
CN113392649A (en) Identification method, device, equipment and storage medium
CN111737507A (en) Single-mode image Hash retrieval method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination