CN114780677A - Chinese event extraction method based on feature fusion

Info

Publication number: CN114780677A
Application number: CN202210354653.4A
Authority: CN (China)
Legal status: Pending
Prior art keywords: network, event, input, word, layer
Other languages: Chinese (zh)
Inventors: 柯欣飞 (Ke Xinfei), 姬红兵 (Ji Hongbing), 张文博 (Zhang Wenbo)
Assignees: Shaanxi Fangcun Jihui Intelligent Technology Co., Ltd.; Xidian University
Application filed 2022-04-06 by Shaanxi Fangcun Jihui Intelligent Technology Co., Ltd. and Xidian University; priority to CN202210354653.4A

Classifications

    • G06F16/3344 — Information retrieval: query execution using natural language analysis
    • G06F16/35 — Information retrieval: clustering; classification of unstructured textual data
    • G06F40/279 — Handling natural language data: recognition of textual entities
    • G06F40/289 — Handling natural language data: phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 — Handling natural language data: semantic analysis
    • G06N3/045 — Neural networks: combinations of networks
    • G06N3/08 — Neural networks: learning methods


Abstract

The invention discloses a Chinese event extraction method based on feature fusion, which comprises the following steps: 1) construct the Chinese event extraction network BERT-FF; 2) construct a training data set; 3) download a pre-training parameter file and optimize it with a contrastive learning method; 4) load the optimized pre-training parameter file into the character-level feature extraction network by transfer learning; 5) train on the training data set to obtain the trained Chinese event extraction network BERT-FF; 6) crawl event-describing texts from the open network, input them as a test data set into the trained network for event extraction, and output structured event information, i.e. the event extraction result. By fusing character-level and word-level features, the method enhances the semantic representation capability of the model and improves Chinese event extraction performance; it can be used in news and public opinion analysis, information processing, financial risk assessment, and other fields.

Description

Chinese event extraction method based on feature fusion
Technical Field
The invention belongs to the technical field of artificial intelligence, relates to natural language processing, and particularly relates to a Chinese event extraction method based on feature fusion, which can be used for public opinion analysis and information processing.
Background
The main objective of the event extraction task is to extract structured event information from unstructured text, reducing the difficulty for users of acquiring and processing information. This typically includes determining the event types contained in the text, identifying the event arguments, and determining the role each argument plays in the event. The technology can be applied to news and public opinion analysis, information processing, financial risk assessment, and other fields. Event extraction has been studied for more than a decade and many experts and scholars have obtained effective research results, but most of this research targets English corpora; research on Chinese event extraction is comparatively immature, lacking high-quality data sets, and its model performance lags behind that of English event extraction.
Traditional event extraction methods convert event detection and event argument extraction into classification problems by collecting lexical, syntactic, semantic, and other features of event trigger words and event arguments; their core is feature engineering and classifier construction. Although such features can all serve as classifier inputs, constructing higher-level linguistic features such as part-of-speech tags and syntactic dependency parses requires expert knowledge of linguistics and related fields, which limits the adaptability and generality of these classifiers.
With the development of neural networks, deep learning has achieved remarkable results in both event detection and event argument extraction. In a neural network, the input layer can take a simple representation of the raw data as input, and each layer transforms the input from the shallower layer into more abstract and complex features before passing it to the deeper layer, until the output features of the deepest layer are used for classification. Compared with traditional methods, deep learning greatly reduces the difficulty of feature engineering, and its detection and classification accuracy is superior. In October 2018, the Google AI research institute released BERT, a pre-trained language model based on the Transformer encoder structure, which spurred research on applying pre-trained language models to the event extraction task. Before then, the mainstream event detection approach was to detect event trigger words in the text and judge the event type from them; with the development of event extraction models based on pre-trained networks and their strong semantic feature representation capability, event detection based on the full text has become the mainstream.
Disclosure of Invention
In order to overcome the above drawbacks of the prior art, the present invention provides a Chinese event extraction method based on feature fusion, so as to improve the precision and recall of Chinese event extraction.
To achieve this purpose, the invention adopts the following technical scheme:
The Chinese event extraction method based on feature fusion comprises the following steps:
step 1, constructing a Chinese event extraction network BERT-FF
The Chinese event extraction network BERT-FF comprises a word level feature extraction network, a feature fusion network and a back-end classification network;
the word level feature extraction network is based on a BERT pre-training language model and is used for extracting word level features of an input text; the word level feature extraction network is used for extracting word level features of an input text; the feature fusion network fuses the extracted word-level features and the word-level features through an attention mechanism so as to enhance the semantic representation capability of the model and obtain fusion feature vectors; the back-end classification network is used for respectively inputting the fusion feature vectors into the event detection back-end network and the event argument extraction back-end network to obtain a final event extraction result;
step 2, constructing a training data set
The training data set consists of event-describing texts crawled from the open network, together with annotation files in one-to-one correspondence with the texts;
step 3, training the Chinese event extraction network BERT-FF
Step 4, crawl event-describing texts from the open network, input them as a test data set into the trained Chinese event extraction network BERT-FF for event extraction, output structured event information as the event extraction result, and compute the precision and recall of event extraction.
Compared with the prior art, the invention has the beneficial effects that:
First, the invention uses transfer learning to load the BERT pre-training parameter file into the character-level feature extraction network constructed in step (1a), so that this network quickly acquires character-level semantic features of the text, which accelerates the overall convergence of the model and reduces the time cost of event extraction.
Second, the invention optimizes the BERT pre-training parameters with a contrastive learning method, relieving the anisotropy of BERT's semantic feature vector space, and then uses the optimized model in place of the original one as the feature extraction network, which improves the event detection performance of the model.
Third, targeting the characteristics of Chinese text data, the invention uses an attention-based feature fusion method to fuse the extracted character-level and word-level feature information, which enhances the contextual semantic representation capability of the model and improves its precision and recall in both event detection and event argument extraction.
Drawings
FIG. 1 is a schematic structural diagram of a Chinese event extraction network BERT-FF.
FIG. 2 is a schematic diagram of the structure of the BERT pre-training model.
FIG. 3 is a schematic diagram of a multi-head attention mechanism.
Fig. 4 is an exemplary diagram of event extraction.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples.
The invention relates to a Chinese event extraction method based on feature fusion, which comprises the following implementation steps.
Step 1, referring to fig. 1, a Chinese event extraction network BERT-FF is constructed.
The Chinese event extraction network BERT-FF comprises: the system comprises a word level feature extraction network, a feature fusion network and a back-end classification network.
The character-level feature extraction network is based on the BERT pre-trained language model and extracts character-level features of the input text; the word-level feature extraction network extracts word-level features of the input text; the feature fusion network fuses the extracted character-level and word-level features through an attention mechanism to enhance the semantic representation capability of the model, yielding fused feature vectors; the back-end classification network feeds the fused feature vectors into an event detection back-end network and an event argument extraction back-end network respectively to obtain the final event extraction result.
The key steps are as follows:
1.1) building a character level feature extraction network of a Chinese event extraction network BERT-FF:
referring to fig. 2, the structural relationship is: input layer → word embedding layer → position coding → N concatenated semantic coders → output layer.
The specific parameters and implementation modes of the modules are as follows:
the input of the input layer is a token sequence obtained after text word segmentation, in order to avoid the problem of OOM (out of memory) of display memory, the maximum length of the token sequence during training is set to be 128, and if the maximum length exceeds 128, truncation is carried out. In order to Batch the input text data, the token sequence length in each Batch (Batch) must be kept equal, and if not, the token sequence length Padding is the longest in the Batch. Let token sequence length be sequence length1
The embedding dimension embed_size of the word embedding layer is 768, meeting BERT's input dimension requirement; that is, the word vector of each token is a 768-dimensional column vector.
The position coding adopts Sinusoid position encoding, as shown in formula (1) and formula (2):

$PE_{(pos,\,2i)} = \sin\left(pos \,/\, 10000^{2i/d}\right)$  (1)

$PE_{(pos,\,2i+1)} = \cos\left(pos \,/\, 10000^{2i/d}\right)$  (2)

where pos is the position of the current token in the sequence, with value range [0, sequence_length₁); i is the dimension index of the word vector, with value range [0, embed_size/2), i.e. the position code has the same dimension as the word vector; and d is the dimension of the word vector. Formulas (1) and (2) give the position code for even and odd word-vector dimension indices respectively, so that different dimensions vary with different periods. As i increases, the frequency of the periodic variation becomes lower and lower, so that each position finally receives a unique pattern encoding its location. Adding this pattern to the word vectors as a position code allows the model to learn the dependencies between positions and the sequential nature of natural language.
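A minimal PyTorch sketch of this Sinusoid position encoding; even dimensions take the sine branch of formula (1), odd dimensions the cosine branch of formula (2):

```python
import torch

def sinusoid_position_encoding(seq_len: int, d: int) -> torch.Tensor:
    """Return a (seq_len, d) table of the position codes of formulas (1)-(2)."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    i = torch.arange(d // 2, dtype=torch.float32)                  # dimension pairs
    angle = pos / torch.pow(10000.0, 2.0 * i / d)                  # (seq_len, d/2)
    pe = torch.zeros(seq_len, d)
    pe[:, 0::2] = torch.sin(angle)   # even dimension index 2i   -> formula (1)
    pe[:, 1::2] = torch.cos(angle)   # odd dimension index 2i+1  -> formula (2)
    return pe                        # added elementwise to the word vectors

pe = sinusoid_position_encoding(seq_len=128, d=768)  # embed_size = 768
```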
The main body of the character-level feature extraction network is N cascaded semantic encoders, where N = 12. Each semantic encoder consists of two parts: a multi-head self-attention (Multi-Head Self-Attention) module with a residual network and a forward propagation (Feed Forward) module with a residual network.
The multi-head self-attention module with residual network is formed by joining a multi-head attention module and a residual module; for the multi-head attention module, refer to fig. 3. The inputs input₁ and input₂ of the multi-head attention module are both word-vector sequences combined with position codes (for self-attention, input₁ and input₂ are the same sequence). The three inputs of each attention head (Attention Head) are a query (Query) vector sequence Q, a key (Key) vector sequence K, and a value (Value) vector sequence V, and the number of attention heads is h = 12. Q is obtained from input₁ through a fully connected layer, and K and V are obtained from input₂ through fully connected layers. In practice, the dimension d of the input word vectors is usually large, and multiple groups of attention weights must be computed; to avoid an excessive number of network parameters, mapping matrices W ∈ R^(d×d/h) are generally used for dimension reduction. That is, in each attention head the original d-dimensional word vectors are projected to d/h dimensions, so each fully connected mapping matrix has dimension 768 × 64. Each attention head generates one group of Q, K, and V, and each group is fed into a scaled dot-product attention module to compute context feature vectors. Finally, the context feature vectors of all heads are concatenated and passed through a fully connected layer, whose mapping matrix has dimension 768 × 768, to produce the output of the multi-head attention module. The residual module adds the input and output of its preceding module and performs layer normalization.
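A minimal PyTorch sketch of this multi-head attention module follows; it covers both the self-attention case (input₁ = input₂) used here and the cross-attention case used later by the feature fusion network. The class and variable names are assumptions for illustration.

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """h parallel attention heads: d-dim vectors projected to d/h dims per head."""

    def __init__(self, d: int = 768, h: int = 12):
        super().__init__()
        assert d % h == 0
        self.h, self.dk = h, d // h   # 12 heads of 768 / 12 = 64 dims each
        self.wq = nn.Linear(d, d)     # h stacked 768 x 64 query projections
        self.wk = nn.Linear(d, d)
        self.wv = nn.Linear(d, d)
        self.wo = nn.Linear(d, d)     # the 768 x 768 output projection

    def forward(self, input1: torch.Tensor, input2: torch.Tensor) -> torch.Tensor:
        B, L1, _ = input1.shape       # Q comes from input1
        L2 = input2.shape[1]          # K and V come from input2
        q = self.wq(input1).view(B, L1, self.h, self.dk).transpose(1, 2)
        k = self.wk(input2).view(B, L2, self.h, self.dk).transpose(1, 2)
        v = self.wv(input2).view(B, L2, self.h, self.dk).transpose(1, 2)
        # Scaled dot-product attention, computed per head.
        att = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.dk), dim=-1)
        ctx = (att @ v).transpose(1, 2).reshape(B, L1, self.h * self.dk)  # concat heads
        return self.wo(ctx)
```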
The forward propagation module with residual network is formed by joining two fully connected layers, a GeLU activation function, and a residual module, with the GeLU activation between the two fully connected layers. The first fully connected layer's mapping matrix has dimension 768 × 2048 and the second's 2048 × 768. The residual module adds the input and output of its preceding module and performs layer normalization. The GeLU activation function is expressed as:

$\mathrm{GeLU}(x) = \frac{x}{2}\left(1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right)$  (3)

where x represents the output of the preceding module and erf(·) is the Gaussian error function.
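Formula (3) translates directly into code via the error function, as in this short sketch:

```python
import math
import torch

def gelu(x: torch.Tensor) -> torch.Tensor:
    # Exact GeLU of formula (3), using the Gaussian error function erf.
    return 0.5 * x * (1.0 + torch.erf(x / math.sqrt(2.0)))
```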
1.2) building a word level feature extraction network of a Chinese event extraction network BERT-FF:
the structural relationship is as follows: input layer → word embedding layer → position coding → fully connected layer → output layer.
The specific parameters and implementation of each module are as follows:
the input of the input layer is a token sequence obtained after the word segmentation of the text,in order to avoid the video on-board OOM problem, the maximum length of the token sequence during training is set to be 128, and if the maximum length exceeds 128, truncation is carried out. In order to Batch the input text data, the token sequence length in each Batch (Batch) must be kept equal, and if not, the token sequence length Padding is the longest in the Batch. Let token sequence length be sequence length2
The embedding dimension embed_size of the word embedding layer is 128, i.e. the word vector of each token is a 128-dimensional column vector.
The position coding adopts Sinusoid position encoding.
The fully connected layer's mapping matrix has dimension 128 × 768, so that the dimension of the word-level feature vectors equals that of the character-level feature vectors, facilitating subsequent computation.
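A sketch of this word-level branch, reusing the sinusoid_position_encoding helper sketched above; vocab_size is an assumed parameter, since the patent does not state the word vocabulary size:

```python
import torch
import torch.nn as nn

class WordLevelEncoder(nn.Module):
    """128-dim word embeddings + position codes, projected to 768 dims."""

    def __init__(self, vocab_size: int, embed_size: int = 128, d: int = 768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.proj = nn.Linear(embed_size, d)   # the 128 x 768 mapping matrix

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)              # (B, sequence_length_2, 128)
        pe = sinusoid_position_encoding(x.shape[1], x.shape[2]).to(x.device)
        return self.proj(x + pe)               # (B, sequence_length_2, 768)
```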
1.3) constructing a feature fusion network of a Chinese event extraction network BERT-FF:
the structural relationship is as follows: input layer → multi-headed attention module → output layer.
The specific parameters and implementation of each module are as follows:
the input of the input layer is composed of two parts, namely a word-level feature vector input1And word-level feature vector input2,input1Has a dimension of sequence length1×768,input2Has a dimension of sequence length2×768。
The three inputs of each attention head of the multi-head attention module are a query vector sequence Q, a key vector sequence K, and a value vector sequence V; experimental tuning showed the model performs best with 24 attention heads. Q is obtained from input₁ through a fully connected layer, and K and V are obtained from input₂ through fully connected layers; each fully connected mapping matrix has dimension 768 × 32. Each attention head generates one group of Q, K, and V, and each group is fed into a scaled dot-product attention module to compute context feature vectors. Finally, the context feature vectors of all heads are concatenated and passed through a fully connected layer, whose mapping matrix has dimension 768 × 768, to produce the output of the multi-head attention module.
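The fusion network is therefore a single cross-attention layer in which character-level features attend over word-level features. A sketch reusing the MultiHeadAttention class above (with h = 24, giving 768 / 24 = 32-dimensional heads):

```python
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Fuse character-level and word-level features by cross-attention."""

    def __init__(self, d: int = 768, h: int = 24):
        super().__init__()
        self.cross_attention = MultiHeadAttention(d=d, h=h)  # sketched above

    def forward(self, char_feats, word_feats):
        # char_feats: (B, sequence_length_1, 768) -> queries Q
        # word_feats: (B, sequence_length_2, 768) -> keys K and values V
        return self.cross_attention(char_feats, word_feats)  # fused vectors
```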
1.4) building a rear-end classification network of a Chinese event extraction network BERT-FF:
the back-end classification network of the Chinese event extraction network BERT-FF is divided into two parts, namely an event detection back-end network and an event argument extraction back-end network.
The structural relationship of the event detection back-end network is as follows in sequence: input layer → fully connected layer → multi-label classifier → output layer.
The specific parameters and implementation modes of the modules are as follows:
The input of the input layer is the feature vector of the [CLS] token in the fused feature vectors, with dimension 1 × 768.
The fully connected layer's mapping matrix has dimension 768 × n_events, where n_events is the total number of event types.
The multi-label classifier consists of n_events Sigmoid functions, and its final output is the probability of each event type being present in the current input text. If a probability is greater than 0.5, the text is considered to contain the corresponding event type; otherwise it is not.
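A sketch of this event detection back end; the class name is an assumption:

```python
import torch
import torch.nn as nn

class EventDetectionHead(nn.Module):
    """Multi-label event typing from the fused [CLS] vector."""

    def __init__(self, n_events: int, d: int = 768):
        super().__init__()
        self.fc = nn.Linear(d, n_events)   # the 768 x n_events mapping matrix

    def forward(self, cls_vector: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.fc(cls_vector))  # (B, n_events) probabilities
        return probs > 0.5                          # event types present in the text
```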
The structure relationship of the event argument extraction back-end network is as follows: input layer → fully connected layer → conditional random field → output layer.
The specific parameters and implementation of each module are as follows:
The input of the input layer is the fused feature vectors, with dimension sequence_length₁ × 768.
The fully connected layer's mapping matrix has dimension 768 × n_labels, where n_labels is the total number of sequence labels for event arguments; the labeling scheme is BIO.
The conditional random field is a linear-chain conditional random field, which finally outputs the sequence labeling result for the event arguments. A conditional random field (CRF) is a discriminative probabilistic model: a probabilistic undirected graph model of random variables Y conditioned on random variables X. The special conditional random field defined on a linear chain, called a linear-chain CRF, is defined as follows:

Let $X = (X_1, X_2, \ldots, X_n)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$ be random-variable sequences represented by linear chains. If, given X, the conditional probability distribution P(Y|X) of Y satisfies the Markov property

$P(Y_i \mid X, Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n) = P(Y_i \mid X, Y_{i-1}, Y_{i+1}), \quad i = 1, 2, \ldots, n$  (4)

then P(Y|X) is called a linear-chain conditional random field.
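A sketch of the complete argument extraction back end. The linear-chain CRF here comes from the pytorch-crf package (torchcrf), which is an assumed implementation choice; the patent only specifies a linear-chain conditional random field:

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf (assumed dependency)

class ArgumentExtractionHead(nn.Module):
    """Emission layer (768 x n_labels) followed by a linear-chain CRF."""

    def __init__(self, n_labels: int, d: int = 768):
        super().__init__()
        self.fc = nn.Linear(d, n_labels)
        self.crf = CRF(n_labels, batch_first=True)

    def forward(self, fused, tags=None, mask=None):
        emissions = self.fc(fused)  # (B, sequence_length_1, n_labels)
        if tags is not None:
            return -self.crf(emissions, tags, mask=mask)   # training loss (NLL)
        return self.crf.decode(emissions, mask=mask)       # decoded BIO sequences
```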
Step 2, construct the training data set. The training data set consists of event-describing texts crawled from the open network, together with annotation files in one-to-one correspondence with the texts.
In the embodiment of the invention, the construction process is as follows:
2.1) crawling at least 5000 texts describing the events from the open network.
2.2) Manually label the event types, event arguments, and argument roles contained in each text, generating annotation files in one-to-one correspondence with the crawled texts. The event type is the type of event contained in the text; event arguments are the event-related elements in the text, usually named entities; an argument role is the role an event argument plays in the event.
Take the news text "A Yemeni security official said on the 25th that a landmine explosion occurred that day in Mocha, a port city in the southwest of the country, killing at least 4 civilians." as an example. According to the predefined event schema, the event extraction system should recognize that the text contains the two events "disaster/accident-explosion" and "life-death", and extract that the "time" of the "disaster/accident-explosion" is "that day", its "location" is "Mocha, a port city in the southwest of the country", and the "number of dead" it caused is "at least 4"; and that the "time" of the "life-death" event is "that day", its "location" is "Mocha, a port city in the southwest of the country", and the "dead" are "civilians".

In the above example, "disaster/accident-explosion" and "life-death" are the event types; "that day", "Mocha, a port city in the southwest of the country", "at least 4", and "civilians" are the event arguments; and "time", "location", "number of dead", and "dead" are the corresponding argument roles.
2.3) composing the crawled text and the annotation files into a training data set.
Step 3, train the event extraction network BERT-FF.
3.1) Configure the operating environment of the event extraction network BERT-FF, including CUDA 11.0, cuDNN 8.0.4, Python 3.7, PyTorch 1.7.0, and other software.
3.2) Download the pre-training parameter file (Chinese_BERT_wwm.bin) of the pre-trained language model BERT to the local hard disk.
3.3) Optimize the BERT pre-training parameters using a contrastive learning method to obtain an optimized pre-training parameter file.
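The patent does not spell out the contrastive objective. One common way to relieve the anisotropy of BERT's feature space, assumed in the sketch below, is a SimCSE-style scheme: encode each sentence twice so that dropout produces two slightly different views, treat the pair as positives, and pull them together with an InfoNCE loss.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(encoder, batch, temperature: float = 0.05) -> torch.Tensor:
    """SimCSE-style InfoNCE loss; `encoder` is a HuggingFace BertModel kept in
    train mode so that dropout makes the two encodings of `batch` differ."""
    z1 = encoder(**batch).last_hidden_state[:, 0]  # [CLS] view 1, (B, 768)
    z2 = encoder(**batch).last_hidden_state[:, 0]  # [CLS] view 2, (B, 768)
    # Cosine similarity of every view-1 vector against every view-2 vector.
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1)
    labels = torch.arange(sim.size(0), device=sim.device)  # diagonal = positives
    return F.cross_entropy(sim / temperature, labels)
```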
3.4) Load the optimized pre-training parameter file into the character-level feature extraction network using transfer learning, obtaining the loaded character-level feature extraction network. That is, model parameters with strong semantic representation capability, obtained on a related pre-training task, are loaded into the character-level feature extraction network, which accelerates convergence during model training, saves training time, and reduces the risks of under-fitting and over-fitting.
3.5) Train on the training data set using the BERT-FF with the loaded character-level feature extraction network to obtain the trained Chinese event extraction network BERT-FF. Training is configured as follows:

The number of training iterations (epochs) is set to 50 so that the network weights iterate sufficiently and converge to the optimum. After each epoch the model is tested on the test samples, and the best test result and network weights are saved, with the F1 score of event argument extraction as the criterion. The batch size during training is set to 24 to make full use of the GPU's computational resources. The network optimizer is the adaptive momentum estimation (Adam) algorithm. A layered learning rate is used: the learning rate of the character-level feature extraction network is set to 0.00002, that of the conditional random field to 0.002, and that of the other parts of the model to 0.0002. The character-level feature extraction network is based on a pre-trained language model and only needs fine-tuning to converge, so its learning rate is small; the other parts of the model must be fully trained from scratch; and the conditional random field, based on a probabilistic graphical model, converges with difficulty, so its learning rate is set larger.
Step 4, crawl event-describing texts from the open network, input them as a test data set into the trained Chinese event extraction network BERT-FF for event extraction, output structured event information, i.e. the event extraction result, and compute the precision and recall of event extraction. Illustratively, to ensure validity, at least 500 texts are crawled.
Taking as input the news text "A Yemeni security official said on the 25th that a landmine explosion occurred that day in Mocha, a port city in the southwest of the country, killing at least 4 civilians." as an example, the final event extraction result can be organized in a structured form of key-value pairs; see FIG. 4. This structured form can conveniently be stored in a json file or database for users to obtain event information or for visual presentation.
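Since FIG. 4 is not reproduced here, the following is a hedged illustration of what such a key-value structure could look like for this example; the exact field names are assumptions:

```python
import json

# Hypothetical structured output for the Mocha example above.
result = {
    "events": [
        {
            "event_type": "disaster/accident-explosion",
            "arguments": {
                "time": "that day",
                "location": "Mocha, a port city in the southwest of the country",
                "number of dead": "at least 4",
            },
        },
        {
            "event_type": "life-death",
            "arguments": {
                "time": "that day",
                "location": "Mocha, a port city in the southwest of the country",
                "dead": "civilians",
            },
        },
    ],
}
print(json.dumps(result, ensure_ascii=False, indent=2))  # store or visualize
```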
The effect of the present invention will be further described with reference to simulation experiments.
1. Experimental conditions

The hardware platform of the simulation experiment is: CPU Intel(R) Xeon(R) Silver 4310 with a main frequency of 2.1 GHz and 32 GB of memory; GPU a single NVIDIA GeForce RTX 3080 Ti. The software platform is the Ubuntu 18.04 system.
2. Experimental content and analysis of results
In the simulation experiment, the constructed Chinese event extraction network BERT-FF is trained on the constructed event extraction training data set using the method of the invention. The trained network then performs event extraction on the input test texts, outputs structured event information as the event extraction result, and the precision and recall of event extraction are computed.
Table 1 shows the performance of event detection on a test set after training a constructed data set using the method of the present invention.
TABLE 1 comparison of event detection Performance for different methods
Model            Precision    Recall    F1
LSTM-CRF         0.8315       0.7266    0.7755
BERT (baseline)  0.8863       0.9270    0.9062
BERT-FF          0.9356       0.9203    0.9279
Table 1 shows the event detection performance of the method of the invention on the test set, with two other methods as control groups. LSTM-CRF denotes the classical event extraction method based on LSTM-CRF sequence labeling; BERT denotes an event extraction method without the feature fusion of the invention; BERT-FF denotes the method of the invention. Precision, Recall, and F1 denote precision, recall, and F1 score respectively.
Comparing against BERT, it can be observed that after applying the feature fusion method of the invention, BERT-FF improves event detection precision over BERT by 5.56%, while recall drops by less than 1%. On the resulting F1 score, BERT-FF improves over BERT by 2.39%. On balance, the event detection performance of BERT-FF is considerably better than that of BERT.
Table 2 shows the performance of event argument extraction on the test set after training the constructed data set by the method of the present invention.
TABLE 2 comparison of event argument extraction Performance for different methods
Model            Precision    Recall    F1
LSTM-CRF         0.6826       0.4989    0.5765
BERT (baseline)  0.7446       0.7465    0.7456
BERT-FF          0.7570       0.7801    0.7684
Table 2 shows the event argument extraction performance of the method of the invention on the test set, with the same two methods as control groups.

Compared with BERT, BERT-FF with the proposed feature fusion method improves on every index: precision by 1.67%, recall by 4.50%, and the final F1 score by 3.04%. The event argument extraction performance of BERT-FF is thus substantially better than that of BERT.
In conclusion, after the feature fusion method is added, both the event detection performance and the event argument extraction performance of the model improve, showing that fusing character-level and word-level features indeed helps the semantic representation capability of the model and demonstrating the effectiveness of the feature fusion method proposed by the invention for the event extraction task.
The above is only a specific example of the present invention, given for the convenience of those skilled in the art in understanding the invention, and the invention is not limited to the scope of this specific example. To those skilled in the art, various changes are possible as long as they remain within the spirit and scope of the invention as defined and determined by the appended claims; all inventions utilizing the inventive concept are under protection.

Claims (10)

1. The Chinese event extraction method based on feature fusion is characterized by comprising the following steps of:
step 1, constructing a Chinese event extraction network BERT-FF
The Chinese event extraction network BERT-FF comprises a word level feature extraction network, a feature fusion network and a back-end classification network;
the word level feature extraction network is based on a BERT pre-training language model and is used for extracting word level features of an input text; the word level feature extraction network is used for extracting word level features of an input text; the feature fusion network fuses the extracted word-level features and the word-level features through an attention mechanism so as to enhance the semantic representation capability of the model and obtain fusion feature vectors; the back-end classification network is used for respectively inputting the fusion feature vector into an event detection back-end network and an event argument extraction back-end network to obtain a final event extraction result;
step 2, constructing a training data set
the training data set consists of event-describing texts crawled from the open network, together with annotation files in one-to-one correspondence with the texts;
step 3, training the Chinese event extraction network BERT-FF
step 4, crawl event-describing texts from the open network, input them as a test data set into the trained Chinese event extraction network BERT-FF for event extraction, output structured event information as the event extraction result, and compute the precision and recall of event extraction.
2. The method for extracting Chinese events based on feature fusion according to claim 1, wherein the character-level feature extraction network has the following structure in sequence: input layer → word embedding layer → position coding → N cascaded semantic encoders → output layer;

the input of the input layer is the token sequence obtained after tokenizing the text; the maximum token sequence length during training is set to 128, and longer sequences are truncated; the token sequences within each batch are kept equal in length, shorter sequences being padded to the length of the longest sequence in the batch; the resulting token sequence length is denoted sequence_length₁;
the embedding dimension embed_size of the word embedding layer is 768, i.e. the word vector of each token is a 768-dimensional column vector;
the position coding method adopts Sinusoid position coding, and is shown as a formula (1) and a formula (2):
Figure FDA0003582374310000021
Figure FDA0003582374310000022
wherein pos refers to the position of the current token in the sequence, and the numeric area is [0, sequence length ]1) I refers to the dimension serial number of the word vector, the value range is [0, embedded size/2), namely the dimension of the position code is consistent with the dimension of the word vector, and d refers to the dimension of the word vector; the formula (1) and the formula (2) are respectively calculation formulas of position coding when the word vector dimension serial number i is even number and odd number, so that different periodic changes are generated; with the increase of i, the frequency of periodic variation is lower and lower, and finally, a unique texture containing position information is generated at each different position and is used as a position code to be added into a word vector, so that a model can learn the dependency relationship between the positions and the time sequence characteristic of a natural language;
the N cascaded semantic encoders are main bodies of a word-level feature extraction network, and each semantic encoder consists of two parts: a multi-headed self-attention module containing a residual network and a forward propagation module containing a residual network.
3. The feature fusion-based Chinese event extraction method of claim 2, wherein the multi-head self-attention module with residual network is formed by joining a multi-head attention module and a residual module; the inputs input₁ and input₂ of the multi-head attention module are both word vectors combined with position codes; the three inputs of each attention head are a query vector sequence Q, a key vector sequence K, and a value vector sequence V, and the number of attention heads is 12; Q is obtained from input₁ through a fully connected layer, K and V are obtained from input₂ through fully connected layers, and each fully connected mapping matrix has dimension 768 × 64; each attention head generates one group of Q, K, and V, and each group is fed into a scaled dot-product attention module to compute context feature vectors; finally, the context feature vectors of all heads are concatenated and passed through a fully connected layer, whose mapping matrix has dimension 768 × 768, to obtain the output of the multi-head attention module; the residual module adds the input and output of its preceding module and performs layer normalization;
the forward propagation module comprising the residual error network is formed by splicing two full connection layers, a GeLU activation function and a residual error module, wherein the GeLU activation function is positioned between the two full connection layers, the dimensionality of a mapping matrix of the first full connection layer is 768 multiplied by 2048, the dimensionality of a mapping matrix of the second full connection layer is 2048 multiplied by 768, and the residual error module adds the input and the output of a front-end module and performs layer normalization; the GeLU activation function is expressed as:
Figure FDA0003582374310000031
where x represents the output of the pre-stage block and erf (—) represents the gaussian error computation function.
4. The method for extracting Chinese events based on feature fusion as claimed in claim 2, wherein the word-level feature extraction network has the following structure: input layer → word embedding layer → position coding → fully connected layer → output layer;

the input of the input layer is the token sequence obtained after word segmentation of the text; the maximum token sequence length during training is set to 128, and longer sequences are truncated; the token sequences within each batch are kept equal in length, shorter sequences being padded to the length of the longest sequence in the batch; the resulting token sequence length is denoted sequence_length₂;

the embedding dimension embed_size of the word embedding layer is 128, i.e. the word vector of each token is a 128-dimensional column vector;

the position coding adopts Sinusoid position encoding;

the fully connected layer's mapping matrix has dimension 128 × 768.
5. The method for extracting Chinese events based on feature fusion as claimed in claim 4, wherein the feature fusion network has the following structural relationships: input layer → multi-headed attention module → output layer;
the input of the input layer is composed of two parts, namely a word-level feature vector input1And word-level feature vector input2,input1Has a dimension of sequence length1×768,input2Has a dimension of sequence length2×768;
The three inputs of each Attention Head of the multi-Head Attention module are respectively a query vector sequence Q, a key vector sequence K and a value vector sequence V, the number of the Attention heads is 24, and Q is input by input1Obtained through a full connection layer, K and V are input2The method comprises the steps that all-connection layers are obtained, the dimensionality of each all-connection layer mapping matrix is 768 x 32, each Attention Head generates a group of Q, K and V, each group of Q, K and V is input into a scaling dot product Attention module to be calculated to obtain context eigenvectors, finally, the obtained multiple groups of context eigenvectors are spliced, then the obtained multiple groups of context eigenvectors are input into all-connection layers to obtain the output of a multi-Head Attention module, and the dimensionality of the all-connection layer mapping matrix is 768 x 768.
6. The feature fusion-based Chinese event extraction method according to claim 5, wherein the back-end classification network is divided into two parts: an event detection back-end network and an event argument extraction back-end network;

the structure of the event detection back-end network is, in sequence: input layer → fully connected layer → multi-label classifier → output layer;

the input of the input layer is the feature vector of the [CLS] token in the fused feature vectors, with dimension 1 × 768;

the fully connected layer's mapping matrix has dimension 768 × n_events, where n_events is the total number of event types;

the multi-label classifier consists of n_events Sigmoid functions, and its final output is the probability of each event type being present in the current input text; if a probability is greater than 0.5, the text is considered to contain the corresponding event type, otherwise not;

the structure of the event argument extraction back-end network is, in sequence: input layer → fully connected layer → conditional random field → output layer;

the input of the input layer is the fused feature vectors, with dimension sequence_length₁ × 768;

the fully connected layer's mapping matrix has dimension 768 × n_labels, where n_labels is the total number of sequence labels for event arguments, and the labeling scheme is BIO;

the conditional random field is a linear-chain conditional random field, which finally outputs the sequence labeling result for the event arguments.
7. The method for extracting Chinese events based on feature fusion according to claim 1, wherein in the step 2, at least 5000 texts describing the events are crawled from the open network, the event types, the event arguments and the argument roles contained in each text are manually labeled, and annotation files corresponding to the crawled texts one to one are generated.
8. The method for extracting chinese events based on feature fusion according to claim 1, wherein the step 3 comprises:
step 3a, configure the operating environment of the event extraction network BERT-FF;

step 3b, download the pre-training parameter file (Chinese_BERT_wwm.bin) of the pre-trained language model BERT;

step 3c, optimize the BERT pre-training parameters using a contrastive learning method to obtain an optimized pre-training parameter file;

step 3d, load the optimized pre-training parameter file into the character-level feature extraction network using transfer learning to obtain the loaded character-level feature extraction network;

step 3e, train on the training data set using the BERT-FF with the loaded character-level feature extraction network to obtain the trained Chinese event extraction network BERT-FF.
9. The feature fusion-based Chinese event extraction method of claim 8, wherein in step 3d, model parameters with strong semantic representation capability, obtained on a related pre-training task, are loaded into the character-level feature extraction network, so as to accelerate convergence during model training, save training time, and reduce the risks of under-fitting and over-fitting.
10. The method for extracting Chinese events based on feature fusion of claim 8, wherein in step 3e, the number of training iterations is set to 50 rounds; testing is performed on the test samples after each round of training, and the best test result and network weights are stored with the F1 score of event argument extraction as the criterion; the batch size during training is set to 24; the network optimizer is the adaptive momentum estimation algorithm; and a layered learning rate is used: the learning rate of the character-level feature extraction network is set to 0.00002, that of the conditional random field to 0.002, and that of the other parts of the model to 0.0002.
CN202210354653.4A (filed 2022-04-06, priority 2022-04-06) — Chinese event extraction method based on feature fusion — status: Pending

Priority Applications (1)

Application Number: CN202210354653.4A — Priority / Filing Date: 2022-04-06 — Title: Chinese event extraction method based on feature fusion

Publications (1)

Publication Number: CN114780677A — Publication Date: 2022-07-22

Family ID: 82427172

Family Applications (1): CN202210354653.4A — filed 2022-04-06 — Chinese event extraction method based on feature fusion (pending)

Country Status (1): CN — CN114780677A


Cited By (4)

* Cited by examiner, † Cited by third party

CN116861901A * — priority 2023-07-04, published 2023-10-10 — 广东外语外贸大学 (Guangdong University of Foreign Studies) — Chinese event detection method and system based on multi-task learning, and electronic equipment
CN116861901B * — priority 2023-07-04, published 2024-04-09 — 广东外语外贸大学 — Chinese event detection method and system based on multi-task learning, and electronic equipment
CN117236338A * — priority 2023-08-29, published 2023-12-15 — 北京工商大学 (Beijing Technology and Business University) — Named entity recognition model for dense-entity text and its training method
CN117236338B * — priority 2023-08-29, published 2024-05-28 — 北京工商大学 — Named entity recognition model for dense-entity text and its training method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination