CN113901813A - Event extraction method based on topic features and implicit sentence structure - Google Patents

Event extraction method based on topic features and implicit sentence structure

Info

Publication number
CN113901813A
CN113901813A
Authority
CN
China
Prior art keywords
word
event
sentence
sequence
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111178364.5A
Other languages
Chinese (zh)
Inventor
黄婉华
漆桂林
高桓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202111178364.5A priority Critical patent/CN113901813A/en
Publication of CN113901813A publication Critical patent/CN113901813A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis


Abstract

The invention discloses an event extraction method based on topic features and implicit sentence structure, which presents unstructured text containing event information in a structured form and has wide application in automatic summarization, automatic question answering, information retrieval, and other fields. The method first introduces document-level topic information into a sentence-level event extraction model, obtaining the topic features of each document by combining BERT and LDA. Second, it extracts the syntactic information hidden in BERT word embeddings and jointly models this extraction process with event extraction, thereby introducing important syntactic information for event extraction while avoiding the problem of error accumulation. Finally, using a sequence labeling method based on Bi-LSTM and a cascaded CRF, the model extracts multiple trigger words in a single sentence and extracts the element roles an entity plays in multiple events.

Description

Event extraction method based on topic features and implicit sentence structure
Technical Field
The invention belongs to the field of information extraction, and relates to an event extraction method based on topic features and an implicit sentence structure.
Background
With the development and popularization of the internet, millions of data sources are published every day in the form of news articles, blogs, papers, and the like, and more and more empirical knowledge is stored in documents. Because traditional ways of storing knowledge make retrieval inefficient, how to manage and exploit these data has gradually become a core problem in natural language processing. Research has shown that structured storage can effectively improve people's ability to retrieve and collect empirical knowledge. For machines to better understand human language, the techniques for automatically organizing and processing data studied under the information extraction task have become indispensable. The basic goal of information extraction is to automatically extract information from unstructured or semi-structured machine-readable documents and other electronic sources and store it in a structured form, enabling the organization, management, and analysis of the large amount of textual information on the internet.
Event extraction is one of the core tasks of information extraction. Its main aim is to extract structured event information from unstructured text, and it plays an important role in information retrieval and in the construction of event graphs. Existing event extraction methods can be broadly divided into pipeline methods and joint methods. Pipeline methods suffer from error accumulation, so most recent work adopts joint methods. However, most sentence-level joint methods lack the overall information of the text and therefore cannot handle the ambiguity of trigger words well, while document-level joint methods suffer from complicated modeling. In addition, because the relationships between event trigger words and event elements within a sentence are close, the event extraction task depends on syntactic features; yet only a few methods introduce syntactic information into event extraction, and their reliance on pre-trained parsing tools still causes error accumulation. Finally, in the relevant data sets and in real-world applications it is quite common for one sentence to contain multiple events or for event elements to overlap, but most methods consider only a single event and a single element role, losing a large amount of event information.
To address these problems, the invention proposes a joint event extraction method based on topic features and implicit sentence structure. The method first introduces document-level topic information into a sentence-level event extraction model by combining BERT and LDA, alleviating the ambiguity of trigger words. Second, it extracts the syntactic information hidden in BERT word embeddings and jointly models this extraction process with event extraction, which introduces important syntactic information while avoiding error accumulation. Finally, the model can extract multiple trigger words in a single sentence and extract the element roles an entity plays in multiple events, solving the problems of multiple events and overlapping event elements. With the advantages of topic features, implicit syntactic features, and joint modeling, the constructed event extraction method avoids error accumulation while introducing topic features and implicit sentence structure information, can effectively improve the quality of event extraction, and is of great research significance.
Disclosure of Invention
The invention provides a joint event extraction method. For the ambiguity problem of trigger words, semantic structure information of the trigger word is obtained from the representation of its sentence on one hand, and a topic distribution representation is obtained through topic modeling on the other, introducing the overall context information of the document into event extraction to disambiguate trigger words. For the error accumulation that introducing syntactic features may cause, a method is studied for extracting the sentence structure information hidden in BERT word embeddings and building a joint model with event extraction, so that syntactic information is introduced while the influence of error accumulation is avoided. For the problems of multiple events and overlapping event elements, the model of the invention can identify multiple events in a single sentence and determine the element role a candidate entity plays in multiple events. Together these methods address the above challenges and enhance the effectiveness of event extraction.
The invention uses the pre-trained language model BERT to extract implicit sentence structure features and applies them in joint extraction with the sub-tasks of event extraction. First, the sentence structure information implied in the BERT output is extracted; next, event trigger words are extracted in cascade with a CRF model; then, the implicit sentence structure information is introduced into event element extraction with a Bi-LSTM model; finally, a loss function for joint training is defined and all tasks are optimized jointly to learn the optimal parameters of the model.
An event extraction method based on topic features and implicit sentence structure comprises the following steps:
1) data processing and topic feature extraction: reconstruct the original data set into a format suitable for the model of the invention, extract the topic features of each sample document in the read data set, and then segment the sample documents into sentences with the sentence segmentation tool in the NLTK package to obtain sample sentences;
2) implicit sentence structure extraction: for each sample sentence, first use the language model BERT to obtain the word embeddings in the sentence as its contextual features; then, using a masking mechanism, compute from the word embedding sequence the degree of mutual influence between all components of the sentence, as the implicit sentence structure features for the subsequent joint event extraction method;
3) event trigger word extraction based on a cascaded CRF: a cascaded sequence labeling method decomposes the extraction task into two sub-tasks, boundary labeling and type judgment;
4) event element extraction with syntactic information fused into a Bi-LSTM: the data of the influence matrix are introduced into the forward and backward recursions, and corresponding links are established between the current word node and strongly related word nodes, so that the syntactic information can be transmitted between LSTM nodes and is finally fused into the vector representations of words;
5) joint training: compute the losses of the event trigger word extraction module and the event element extraction module with a cross-entropy loss function, and train event trigger word extraction and event element extraction jointly to avoid error accumulation; so that the loss terms of the two sub-tasks converge simultaneously, the final loss is represented by the sum of the losses of the two sub-tasks.
In a preferred embodiment of the topic feature extraction of the invention, in step 1) the topic features are extracted as follows:
1-1) obtain a context representation carrying contextual semantic information for each document using a Sentence-Transformer oriented to long-sentence encoding;
1-2) then obtain the topic distribution information of each document using the topic model LDA;
1-3) train an auto-encoder with the two vectors to fuse them, taking the output of the encoder as the topic feature of each document.
In a preferred embodiment of the implicit sentence structure extraction of the invention, in step 2) the implicit sentence structure is extracted as follows:
2-1) replace any word x_i of the input sequence with the masking character [MASK] to obtain a new input sequence; input this sequence into BERT to obtain the result h_i, and take h_i as the representation of x_i;
2-2) to obtain the influence of another component x_j of the sentence on x_i, further replace x_j in the input sequence with [MASK] as well, then input the sequence into BERT to obtain a new representation H_ij of x_i;
2-3) compute the distance f(x_i, x_j) between H_ij and h_i in semantic space using the Euclidean distance, finally obtaining the pairwise influence degree matrix F between the components of the sentence; the matrix F is the implicit sentence structure information and can represent the degree of mutual influence between any two sentence components.
in the preferred embodiment of the event trigger extraction of the present invention, in the step 3), the event trigger is extracted according to the following specific steps:
3-1) using a BERT model to perform word segmentation and vectorization on the input sequence, aligning the input sequence with the original label sequence, removing special representation of the BERT such as 'CLS', 'SEP', and using the aligned sequence as the input of CRF.
3-2) performing sequence labeling on the word embedding sequence obtained by using BERT, and labeling whether the words in the input sequence are the beginning ('B') or the internal part ('I') of the trigger word by using CRF only when introducing a BIO labeling method into the task of the chapterOr is independent of the trigger word ("O"). Then the input sequence is labeled by a CRF model to obtain a labeled sequence Ci=[c1,...,ci,…,cn]Wherein c isi∈{B,I,O};
3-3) obtaining a CRF labeling sequence Ci=[c1,...,ci,…,cn]Then for c thereiniWord w for e { B, I }iOr phrase gi=[wp,...,wq]The word w is found from the results of BERTiOr phrase giVector representation of (1), wherein the phrase gi=[wp,...,wq]And taking the average value of word embedding of each word in the phrase as a vector representation of the phrase. The resulting vector is then fed to a fully-connected neural network to make a determination of the particular event type for the word or phrase.
In a preferred embodiment of the event element extraction of the invention, in step 4) the event elements are extracted according to the following specific steps:
4-1) after tokenizing and vectorizing the input sequence with the BERT model, align the sequence with the original tag sequence and remove BERT's special tokens such as "[CLS]" and "[SEP]".
4-2) for the input at the current time step, look up in the syntactic influence matrix the degree to which the other components of the corresponding sentence influence the current input, and add the syntactic influence matrix into the computation of the node; the same computation is applied in the backward LSTM pass, fusing the contextual syntactic influence information into the vector representation of the whole sentence.
4-3) through the forward and backward computations, a new vector representation sequence and a representation of the whole sentence are obtained. For any pair of a candidate event trigger word and a candidate event element entity, the corresponding word vectors are found in the new representation sequence, concatenated with the event type, and input into a fully-connected classifier to classify the element role.
The joint event extraction method provided by the invention computes the losses of the event trigger word extraction module and the event element extraction module respectively with a cross-entropy loss function and trains event trigger word extraction and event element extraction jointly to avoid the problem of error accumulation; so that the loss terms of the two sub-tasks converge simultaneously, the final loss is represented by the sum of the losses of the two sub-tasks. Meanwhile, appropriate penalty factors γ_t and γ_a are introduced into the loss function used in joint training and tuned to obtain the most suitable loss function. The loss of the final joint model is:
L = γ_t^k · L_t + γ_a^(1−k) · L_a
where the first term represents the loss of the event trigger word extraction module and the second term the loss of the event element extraction module (the specific parameter meanings are explained in the corresponding sections); γ_t and γ_a correspond respectively to the two main error cases, event trigger word extraction errors and event element extraction errors: if the event trigger word is in error, i.e. k = 1, the loss of the event trigger word extraction module is multiplied by the penalty factor γ_t; if only the event element role classification is wrong, i.e. k = 0, the loss of the event element extraction module is multiplied by the penalty factor γ_a. The parameters of the joint loss function are learned with the AdamW optimizer.
Compared with the prior art, the invention has the following advantages:
1) Compared with most current joint event extraction methods, the joint method based on topic features and implicit sentence structure addresses three challenges faced by event extraction tasks. For the ambiguity of event trigger words, the BERT vector representation carrying sentence context semantics is combined with the LDA representation carrying topic distribution information to obtain the topic representation of the document, which is introduced as a feature into the event extraction modeling process and disambiguates trigger words to a certain extent.
2) For the error accumulation that syntactic analysis, an upstream task quite important for event extraction, may cause, the invention extracts the syntactic information hidden in the BERT word embedding results and trains this process jointly with the two event extraction sub-tasks under joint optimization, introducing syntactic information while avoiding the problem of error accumulation.
3) Meanwhile, the two methods allow the model to label multiple event trigger words in one sentence, the trigger words being assumed by default to belong to different events, which answers the challenge of the multi-event problem. In addition, each entity in the candidate entity set of a sample is matched pairwise with the candidate trigger words and the relationship (element role) between them is then determined; that is, the model allows one entity to serve as an event element in multiple events, solving the problem of overlapping event elements. Experiments prove that the method effectively solves these three problems, outperforms other methods in recall, precision, and F1 score, and can construct an efficient, high-performance joint event extraction model.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a general framework of the present invention;
FIG. 3 is a complete flow chart of the event extraction algorithm of the present invention.
Detailed Description
In order to enhance the understanding and appreciation of the present invention, the following detailed description of the invention is provided in connection with the examples. Example 1: referring to fig. 1-3, an event extraction method based on topic features and implicit sentence structures includes the following 5 steps:
Step 1): first preprocess the data, extracting the topic features during preprocessing; then process the sample data into sentence-level form and extract the contextual features of each sentence. The specific steps are as follows:
(1) Document topic feature extraction
For all documents in the data set, obtain the document context features S = [s_1, s_2, ..., s_n] based on Sentence-Transformers and the LDA topic distribution features L = [l_1, l_2, ..., l_n]. Because the dimension of a topic distribution vector l_i is the preset number of topics while the document context feature vector s_i has up to 768 dimensions, directly concatenating the two would lose the document topic distribution features; the model therefore needs to fully fuse the document information from these two different angles without losing the topic distribution information. To this end, the invention uses an auto-encoder to fuse the two feature vectors effectively: for each document D_i, the context vector representation and the topic distribution vector representation are concatenated with an importance factor γ to obtain a high-dimensional vector representation v_i = [s_i, γ·l_i], and this high-dimensional vector v_i is then used to train an auto-encoder that reduces its dimensionality so as to fuse the information of the topic distribution feature l_i and the context feature s_i.
Through self-supervised learning on the original high-dimensional concatenated vectors v_i, the auto-encoder trains an encoder that maps v_i to a low-dimensional latent-space representation and a decoder that maps the low-dimensional vector back to the high-dimensional concatenated vector v_i, where v_i = [s_i, γ·l_i] and γ is the importance factor.
Finally, the trained encoder is used to obtain the final topic feature vector representation T_i of the i-th document:
T_i = σ(W_e([s_i, γ·l_i]) + b_e)
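As an illustration of this fusion, the following is a minimal Python sketch of training such an auto-encoder on the concatenated vectors; the layer sizes, the single linear encoder and decoder, the sigmoid activation, and the dummy inputs are assumptions of the sketch, not values fixed by the disclosure.

```python
# Hypothetical sketch of the fusion auto-encoder; dimensions and activations
# are illustrative assumptions.
import torch
import torch.nn as nn

class FusionAutoEncoder(nn.Module):
    def __init__(self, context_dim=768, topic_dim=50, latent_dim=128):
        super().__init__()
        in_dim = context_dim + topic_dim
        self.encoder = nn.Linear(in_dim, latent_dim)      # W_e, b_e
        self.decoder = nn.Linear(latent_dim, in_dim)

    def forward(self, s_i, l_i, gamma=0.5):
        v_i = torch.cat([s_i, gamma * l_i], dim=-1)       # v_i = [s_i, gamma * l_i]
        t_i = torch.sigmoid(self.encoder(v_i))            # T_i = sigma(W_e(v_i) + b_e)
        return t_i, self.decoder(t_i), v_i

model = FusionAutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()
s = torch.randn(32, 768)      # Sentence-Transformer document vectors (dummy)
l = torch.rand(32, 50)        # LDA topic distributions (dummy)
for _ in range(100):          # self-supervised reconstruction training
    t_i, recon, v_i = model(s, l)
    loss = mse(recon, v_i)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After training, only the encoder output t_i is kept, serving as the topic feature T_i of each document.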
(2) Sentence context feature extraction
For sentence context features, the invention uses BERT to obtain the word embedding information of the input sequence. BERT is a Transformer-based multi-layer bidirectional language representation model that derives deep representations with contextual information by learning the left and right context of each word.
Specifically, BERT consists of N identical Transformer encoder modules. Denoting a Transformer encoder module as Trans(x), the encoding operations are:
h_0 = S·W_s + W_p
h_α = Trans(h_{α−1}), α ∈ [1, N]
where S is the one-hot encoding of each word in the input sentence, W_s is the word embedding matrix, W_p is the position embedding matrix indexed by the position p of the current word in the input sequence, h_α is the hidden state vector representing the context representation of the input sentence at layer α, and N is the number of Transformer encoder modules. Considering the effective position encoding length of BERT and the size of the model actually trained, the invention sets the maximum sequence length to maxLength = 200.
For each input sentence W = [w_1, w_2, ..., w_n], BERT encoding yields the context features H_i = [h_1, h_2, ..., h_n].
After the context features H_i = [h_1, h_2, ..., h_n] of a sample sentence and the topic features T_i of its document are obtained, and because both are high-dimensional vectors whose direct concatenation would burden subsequent modules, the invention reduces the dimension of the concatenated features through a fully-connected neural network:
x_j = σ(W_f([h_j, T_i]) + b_f)
This yields the final feature representation X = [x_1, x_2, ..., x_n] of the sentence, which is fed into the subsequent modules for the event extraction tasks.
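A minimal sketch of this encoding and dimension-reduction step, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint; the 128-dimensional topic vector (matching the latent size of the previous sketch) and the 256-dimensional output are illustrative choices:

```python
# Hypothetical sketch: encode a sentence with BERT, attach the document topic
# feature T_i to every word vector, and reduce the dimension with a
# fully-connected layer, i.e. x_j = sigma(W_f([h_j, T_i]) + b_f).
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

reduce = nn.Sequential(nn.Linear(768 + 128, 256), nn.Sigmoid())  # W_f, b_f

enc = tokenizer("An explosion occurred near the station.",
                return_tensors="pt", max_length=200,
                truncation=True, padding="max_length")   # maxLength = 200
with torch.no_grad():
    H = bert(**enc).last_hidden_state                    # [1, 200, 768]: h_1 .. h_n

T_i = torch.randn(1, 128)                                # topic feature from the auto-encoder
T_rep = T_i.unsqueeze(1).expand(-1, H.size(1), -1)       # repeat T_i for every word
X = reduce(torch.cat([H, T_rep], dim=-1))                # final features x_1 .. x_n
```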
Step 2) extract the sentence structure information hidden in the word embedding sequence of each sentence:
(1) Replace any word x_i of the input sequence W = [x_1, ..., x_i, ..., x_n] with the masking character [MASK] to obtain a new input sequence W' = [x_1, ..., [MASK], ..., x_n]; input this sequence into BERT to obtain the result h_i, and take h_i as the representation of x_i.
(2) To obtain the influence of another component x_j on x_i, further replace x_j in W' = [x_1, ..., [MASK], ..., x_n] with [MASK] as well, then input the sequence into BERT to obtain a new representation H_ij of x_i.
(3) Compute the value of f(x_i, x_j), which describes how the representation of x_i by BERT changes when the context word x_j is absent from the sentence. The invention computes the distance between H_ij and h_i in semantic space to characterize this influence, using the Euclidean distance:
f(x_i, x_j) = ||H_ij − h_i||_2
Because of the particularity of BERT's tokenization mechanism, some words may be split into several sub-words; when performing the masking operation, the mask is therefore applied to all of BERT's sub-word sequences of a word or text span as a unit. Meanwhile, considering the gold entity set given by ACE05, this section represents a sentence as a sequence W = [x_1, ..., x_i, ..., x_n] of entity text spans, where x_i = [w_p, ..., w_q] denotes the i-th entity text span consisting of the p-th through q-th words. When computing the influence degree of a multi-word entity mention, the method uses the head-word label given for multi-word entities in the ACE2005 data set and introduces an importance factor k: after the influence degree of each word in the multi-word entity is computed separately, the influence degree of the head word is multiplied by k, and the average influence degree over all words in the span is then taken as the overall influence degree of the span.
For any pair of text spans <x_i, x_j> in a sentence, repeating the above steps and computing f(x_i, x_j) yields an N×N influence matrix F ∈ R^{N×N} with F_ij = f(x_i, x_j), where N is the length of the input sequence W = [x_1, ..., x_i, ..., x_n]. The matrix F is the extracted sentence structure information: it represents the degree of mutual influence between any two sentence components and thus illustrates the association relationships among them. The specific algorithm flow is as follows:
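The original presents the algorithm flow as a figure; the following is a minimal Python sketch of the masking procedure, assuming the Hugging Face transformers library with a fast tokenizer, whole words standing in for sentence components, mean-pooled sub-word vectors, and the span merging and head-word factor k omitted:

```python
# Hypothetical sketch of the influence-matrix construction via double masking.
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

def encode(words):
    """Return one vector per word, mean-pooling BERT sub-word vectors."""
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        out = bert(**enc).last_hidden_state[0]
    ids = enc.word_ids(0)                     # map sub-word positions to words
    return torch.stack([
        out[[k for k, wid in enumerate(ids) if wid == w]].mean(dim=0)
        for w in range(len(words))
    ])

def influence_matrix(words):
    n = len(words)
    F = torch.zeros(n, n)
    for i in range(n):
        masked_i = words[:i] + ["[MASK]"] + words[i + 1:]
        h_i = encode(masked_i)[i]             # representation of x_i (x_i masked)
        for j in range(n):
            if j == i:
                continue
            masked_ij = list(masked_i)
            masked_ij[j] = "[MASK]"           # additionally mask x_j
            H_ij = encode(masked_ij)[i]       # new representation of x_i
            F[i, j] = torch.dist(H_ij, h_i)   # f(x_i, x_j) = ||H_ij - h_i||_2
    return F

F = influence_matrix(["He", "fired", "the", "manager"])
```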
Step 3) annotate the event trigger word sequence with the cascaded CRF:
(1) An input sequence W = [w_1, ..., w_i, ..., w_n] is tokenized and vectorized by the BERT model into H_i = [h_1, ..., h_i, ..., h_n]; this sequence is aligned with the original tag sequence, which includes removing BERT's special tokens such as "[CLS]" and "[SEP]", and the aligned sequence serves as the input of the CRF.
(2) For sequence labeling of the word embedding sequence obtained from BERT, the BIO labeling scheme is introduced for this task, and the CRF is used only to label each word of the input sequence as the beginning of a trigger word ("B"), an internal part of a trigger word ("I"), or independent of any trigger word ("O"). Thus H_i = [h_1, h_2, ..., h_n], after labeling by the CRF model, yields the label sequence C_i = [c_1, ..., c_i, ..., c_n] with c_i ∈ {B, I, O}.
(3) After the CRF label sequence C_i = [c_1, ..., c_i, ..., c_n] is obtained, for each word w_i or phrase g_i = [w_p, ..., w_q] whose label c_i ∈ {B, I}, the vector representation of the word w_i or phrase g_i is looked up in the BERT output; the vector representation of a phrase g_i = [w_p, ..., w_q] is the average of the word embeddings of the words it contains. The resulting vector is then fed to a fully-connected neural network to judge the specific event type of the word or phrase:
p_i = softmax(W_t · g_i + b_t)
This finally yields the trigger word label sequence Y^t of the sentence. The event trigger word extraction module applies the following cross-entropy loss:
L_t = −Σ_{i=1}^{N} y_i · log(p_i)
where N is the length of the input sequence W, y_i is the event type label of the i-th word in W, and p_i denotes the predicted event type distribution of the i-th word as an event trigger.
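A hypothetical sketch of the two-stage cascade follows, assuming the third-party pytorch-crf package (not named in the patent) for the boundary tagger and a plain linear layer for the type judgment; the tag ids, the 33 event types (the ACE05 subtype count), and the dummy inputs are illustrative:

```python
import torch
import torch.nn as nn
from torchcrf import CRF   # pip install pytorch-crf (assumed dependency)

B, I, O = 0, 1, 2                                # BIO tag ids (illustrative)
NUM_EVENT_TYPES = 33                             # ACE05 subtype count, illustrative

crf = CRF(num_tags=3, batch_first=True)          # stage 1: boundary labeling
emit = nn.Linear(768, 3)                         # emission scores from BERT vectors
type_clf = nn.Linear(768, NUM_EVENT_TYPES)       # stage 2: event type judgment

H = torch.randn(1, 10, 768)                      # BERT word embeddings (dummy)
tags = crf.decode(emit(H))[0]                    # BIO sequence, e.g. [O, B, I, ...]

# Group B/I labels into trigger spans.
spans, cur = [], None
for pos, tag in enumerate(tags):
    if tag == B:                                 # a new trigger begins
        if cur is not None:
            spans.append(cur)
        cur = [pos]
    elif tag == I and cur is not None:           # the current trigger continues
        cur.append(pos)
    else:                                        # "O" closes any open span
        if cur is not None:
            spans.append(cur)
        cur = None
if cur is not None:
    spans.append(cur)

# Stage 2: average the word vectors of each span and judge its event type.
for span in spans:
    g = H[0, span].mean(dim=0)                   # phrase vector = mean of words
    event_type = type_clf(g).softmax(-1).argmax().item()
```

During training, the CRF negative log-likelihood `-crf(emit(H), gold_tags)` and the cross-entropy of `type_clf` would play the roles of the boundary and type losses respectively.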
Step 4) introduce the implicit sentence structure information with a Bi-LSTM to extract event elements:
A Bi-LSTM network is used, with the data of the influence matrix introduced into the forward and backward recursions so that corresponding links are established between the current word node and strongly related word nodes; the syntactic information can thus be transmitted between LSTM nodes and is finally fused into the vector representations of words. The event element extraction module proceeds in three main steps:
(1) An input sequence W = [w_1, w_2, ..., w_n] is tokenized and vectorized by the BERT model, then aligned with the original tag sequence, including removal of BERT's special tokens "[CLS]" and "[SEP]", to obtain H = [h_1, h_2, ..., h_n].
(2) At time step t, the forward LSTM unit computes as follows: for the current input h_t, look up in the syntactic influence matrix F the degree to which the other components of the corresponding sentence influence h_t. Since the influence matrix F describes the influence degrees among the components of the sentence, when constructing F the words inside a multi-word entity span are merged into one sentence component according to the entity set labels given by the data set, and all words inside a multi-word entity in the input word vector sequence H = [h_1, h_2, ..., h_n] use the influence-matrix data of the entity span they belong to. Meanwhile, the invention sets a threshold π: only when another component h_j occurs before time step t and its influence degree F_jt on h_t exceeds the threshold π is the information of h_j introduced into the computation of h_t, through a fully-connected network that is introduced so as not to disturb the LSTM computation and that fuses the information of h_t and h_j:
d_t = σ(W_d([h_t, h_j]) + b_d)
where the fused vector d_t carries the information of both h_t and h_j into the LSTM computation. Applying the same computation in the backward LSTM pass fuses the contextual syntactic influence information into the vector representation of the whole sentence.
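The source text does not reproduce the exact fusion formula, so the following sketch is one illustrative reading of the mechanism, under the assumption that each earlier component whose influence on the current input exceeds π is fused into that input through a fully-connected gate before the LSTM cell runs; all names and sizes are assumptions:

```python
# Hypothetical sketch of the syntax-aware forward LSTM pass.
import torch
import torch.nn as nn

class SyntaxLSTM(nn.Module):
    def __init__(self, dim=256, pi=0.5):
        super().__init__()
        self.cell = nn.LSTMCell(dim, dim)
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())  # d_t
        self.pi = pi

    def forward(self, H, F):
        """H: [n, dim] word vectors; F: [n, n] influence matrix."""
        n, dim = H.shape
        h = torch.zeros(dim)
        c = torch.zeros(dim)
        outs = []
        for t in range(n):
            x_t = H[t]
            for j in range(t):                    # components before step t
                if F[j, t] > self.pi:             # strongly related component
                    x_t = self.fuse(torch.cat([x_t, H[j]]))
            h, c = self.cell(x_t.unsqueeze(0), (h.unsqueeze(0), c.unsqueeze(0)))
            h, c = h[0], c[0]
            outs.append(h)
        return torch.stack(outs)                  # forward half of the Bi-LSTM

H = torch.randn(6, 256)                           # fused word features (dummy)
F = torch.rand(6, 6)                              # influence matrix from step 2
forward_states = SyntaxLSTM()(H, F)               # run again on the reversed input
                                                  # for the backward half, then concat
```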
(3) Through the forward and backward computations, a new vector representation sequence H^LSTM = [h_1^LSTM, ..., h_n^LSTM] and a representation o^LSTM of the entire sentence are obtained. For any pair <trigger_i, entity_j> composed of a candidate event trigger word and a candidate event element entity, the corresponding word vectors h_i^t and h_j^e are found in the new representation sequence H^LSTM; if the trigger word or the entity spans multiple words, the average of the vectors of all words in the span is taken as the overall representation h_i or h_j. The two vectors are concatenated together with the event type and input into a fully-connected classifier to classify the element role:
p_ij^a = softmax(W_a([h_i^t, type_i, h_j^e]) + b_a)
where p_ij^a denotes the distribution over the element roles that the j-th entity plays in the event represented by the i-th trigger word, h_i^t is the vector representation of the i-th trigger word predicted by the event trigger word extraction module, type_i is the event type corresponding to that trigger word, and h_j^e is the vector representation of the j-th entity in the entity sequence.
After the final event element role label sequence Y^a is obtained, the loss function of the event element extraction module again adopts the cross-entropy loss:
L_a = −Σ_{i=1}^{M} y_i^a · log(p_i^a)
where M is the number of <trigger word, entity> pairs; y_i^a is the element role played by the entity in the i-th <trigger word, entity> pair; p_i^a denotes the predicted distribution over element roles for the entity in the i-th <trigger word, entity> pair.
Step 5) event extraction joint modeling method:
the model classifies the pairwise matching of event trigger words and event elements, and multiple events can be used for the same event<Event trigger word tiEvent type eiEvent element aiElement role ri>And (4) representing by a quadruple. There may be multiple cases if there is an error in a quad, and since the classification of event elements in the previous work is generally not good enough on the ACE05 event extraction dataset, this section mainly discusses the case of event element role error. There may be two cases of event element role errors: an event element extraction module obtains wrong global information in the joint modeling process of shared information for event trigger word detection errors or event type discrimination errors; the other is that the event trigger word extraction is correct, and the event element extraction is wrong, which is also divided into two cases: if the event element role r is not contained in the event element role set predefined by the event type e, the event element extraction module still cannot well judge the element type under the condition of giving prior of the event type; if the event element role r is contained in the event element role set predefined by the event type e but is not the correct role corresponding to the current event element, it is indicated that the event element extraction module can effectively utilize prior information brought by the event trigger word and the event type, but the role type cannot be correctly determined under the condition that the number of the element roles is reduced. For model optimization, solving the above three conditions can bring more model lifting, so the loss generated by the above conditions should be increased, and the model can be trained better.
For the above cases, introducing appropriate penalty factor gamma for the loss function used in the joint trainingtAnd gammaaThe most suitable loss function is obtained through adjustment, and the loss of the final combined model is as follows:
Figure BDA0003296278900000111
wherein the first item represents the loss of the event trigger word extraction module, and the second item represents the eventThe loss of the element extraction module, and the meaning of the specific parameters refers to the corresponding chapters; gamma raytAnd gammaaThe two main error cases are respectively corresponded to: if the event trigger word has an error, i.e. k equals 1, the loss of the event trigger word extraction module is multiplied by a penalty factor γtIf the event element role classification is wrong only, i.e., k is 0, the loss of the event element extraction module is multiplied by a penalty coefficient γa
Parameters were learned using an AdamW optimizer for the loss function of the joint model.
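Under the reading above, the joint objective can be sketched as follows; the concrete γ values, the dummy losses, and the learning rate are illustrative assumptions:

```python
# Sketch of the joint objective L = gamma_t**k * L_t + gamma_a**(1-k) * L_a.
import torch

def joint_loss(l_t, l_a, k, gamma_t=2.0, gamma_a=1.5):
    """k = 1: the trigger word was wrong; k = 0: only the element role was wrong."""
    return (gamma_t ** k) * l_t + (gamma_a ** (1 - k)) * l_a

w = torch.nn.Parameter(torch.randn(4))        # stand-in for the model parameters
opt = torch.optim.AdamW([w], lr=2e-5)         # AdamW, as stated in the text
l_t = (w[:2] ** 2).sum()                      # dummy trigger-extraction loss
l_a = (w[2:] ** 2).sum()                      # dummy element-extraction loss
opt.zero_grad()
joint_loss(l_t, l_a, k=1).backward()          # trigger error: amplify the trigger loss
opt.step()
```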
It should be noted that the above embodiments are only preferred embodiments of the present invention and are not intended to limit the scope of the present invention; all equivalent substitutions or modifications made on the basis of the above technical solutions fall within the scope of the present invention.

Claims (5)

1. An event extraction method based on topic features and implicit sentence structures is characterized by comprising the following steps:
1) data processing and topic feature extraction: reconstruct the original data set into JSON format, extract the topic features of each sample document in the read data set, and then segment the sample documents into sentences with the sentence segmentation tool in the NLTK package to obtain sample sentences;
2) implicit sentence structure extraction: for each sample sentence, first use the language model BERT to obtain the word embeddings in the sentence as its contextual features; then, using a masking mechanism, compute from the word embedding sequence the degree of mutual influence between all components of the sentence, as the implicit sentence structure features for the subsequent joint event extraction method;
3) an event trigger word extraction module based on a cascaded CRF (conditional random field) adopts a cascaded sequence labeling method to decompose the extraction task into two sub-tasks, boundary labeling and type judgment: the boundary of an event trigger word is labeled first, and the corresponding event type is then judged;
4) an event element extraction module fusing syntactic information into a Bi-LSTM introduces the data of the influence matrix into the forward and backward recursions and establishes corresponding links between the current word node and strongly related word nodes, so that the syntactic information can be transmitted between LSTM nodes and is finally fused into the vector representations of words;
5) joint training: compute the losses of the event trigger word extraction module and the event element extraction module with a cross-entropy loss function, and train event trigger word extraction and event element extraction jointly to avoid the problem of error accumulation; so that the loss terms of the two sub-tasks converge simultaneously, the final loss is represented by the sum of the losses of the two sub-tasks.
2. The event extraction method based on topic features and implicit sentence structure according to claim 1, wherein in step 1) the topic features are extracted as follows:
1-1) obtain a context representation carrying contextual semantic information for each document using a Sentence-Transformer oriented to long-sentence encoding;
1-2) then obtain the topic distribution information of each document using the topic model LDA;
1-3) train an auto-encoder with the two vectors to fuse them, taking the output of the encoder as the topic feature of each document.
3. The event extraction method based on topic features and implicit sentence structure according to claim 1, wherein in step 2) the implicit sentence structure is extracted as follows:
2-1) replace any word x_i of the input sequence with the masking character [MASK] to obtain a new input sequence; input this sequence into BERT to obtain the result h_i, and take h_i as the representation of x_i;
2-2) to obtain the influence of another component x_j of the sentence on x_i, further replace x_j in the input sequence with [MASK] as well, then input the sequence into BERT to obtain a new representation H_ij of x_i;
2-3) compute the distance f(x_i, x_j) between H_ij and h_i in semantic space using the Euclidean distance, finally obtaining the pairwise influence degree matrix F between the components of the sentence; the matrix F is the implicit sentence structure information and can represent the degree of mutual influence between any two sentence components.
4. The event extraction method based on topic features and implicit sentence structure according to claim 1, wherein step 3) comprises the following specific steps:
3-1) tokenize and vectorize the input sequence with the BERT model, align it with the original tag sequence, remove BERT's special tokens such as "[CLS]" and "[SEP]", and take the aligned sequence as the input of the CRF;
3-2) perform sequence labeling on the word embedding sequence obtained from BERT; with the BIO labeling scheme introduced for this task, the CRF only labels each word of the input sequence as the beginning of a trigger word ("B"), an internal part of a trigger word ("I"), or unrelated to any trigger word ("O"), so that after labeling by the CRF model the input sequence yields the label sequence C_i = [c_1, ..., c_i, ..., c_n], where c_i ∈ {B, I, O};
3-3) after obtaining the CRF label sequence C_i = [c_1, ..., c_i, ..., c_n], for each word w_i or phrase g_i = [w_p, ..., w_q] whose label c_i ∈ {B, I}, look up the vector representation of the word w_i or phrase g_i in the BERT output, the vector representation of a phrase g_i = [w_p, ..., w_q] being the average of the word embeddings of the words it contains; then feed the resulting vector to a fully-connected neural network to judge the specific event type of the word or phrase.
5. The event extraction method based on topic features and implicit sentence structure according to claim 1, wherein the event elements in step 4) are extracted according to the following specific steps:
4-1) after tokenizing and vectorizing the input sequence with the BERT model, align the sequence with the original tag sequence and remove BERT's special tokens such as "[CLS]" and "[SEP]";
4-2) for the input at the current time step, look up in the syntactic influence matrix the degree to which the other components of the corresponding sentence influence the current input, and add the syntactic influence matrix into the computation of the node; apply the same computation in the backward LSTM pass to fuse the contextual syntactic influence information into the vector representation of the whole sentence;
4-3) obtain a new vector representation sequence and a representation of the whole sentence through the forward and backward computations; for any pair of a candidate event trigger word and a candidate event element entity, find the corresponding word vectors in the new representation sequence, concatenate them with the event type, and input the result into a fully-connected classifier to classify the element role.
CN202111178364.5A 2021-10-09 2021-10-09 Event extraction method based on topic features and implicit sentence structure Pending CN113901813A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111178364.5A CN113901813A (en) 2021-10-09 2021-10-09 Event extraction method based on topic features and implicit sentence structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111178364.5A CN113901813A (en) 2021-10-09 2021-10-09 Event extraction method based on topic features and implicit sentence structure

Publications (1)

Publication Number Publication Date
CN113901813A true CN113901813A (en) 2022-01-07

Family

ID=79190805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111178364.5A Pending CN113901813A (en) 2021-10-09 2021-10-09 Event extraction method based on topic features and implicit sentence structure

Country Status (1)

Country Link
CN (1) CN113901813A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115860002A (en) * 2022-12-27 2023-03-28 中国人民解放军国防科技大学 Combat task generation method and system based on event extraction
CN115860002B (en) * 2022-12-27 2024-04-05 中国人民解放军国防科技大学 Combat task generation method and system based on event extraction
CN117828075A (en) * 2023-12-14 2024-04-05 北京市农林科学院信息技术研究中心 Agricultural condition data classification method, agricultural condition data classification device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination