CN106599032B - Text event extraction method combining sparse coding and a structured perceptron - Google Patents
- Publication number
- CN106599032B CN106599032B CN201610955220.9A CN201610955220A CN106599032B CN 106599032 B CN106599032 B CN 106599032B CN 201610955220 A CN201610955220 A CN 201610955220A CN 106599032 B CN106599032 B CN 106599032B
- Authority
- CN
- China
- Prior art keywords
- event
- word
- text
- training
- extracting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses a text event extraction method combining sparse coding and a structured perceptron. The method comprises the following steps: 1) labeling text data and constructing training samples according to the ACE (Automatic Content Extraction) or Rich ERE standard; 2) taking extracted entities as candidates for event trigger words and event parameters, and extracting text features; 3) further extracting distributed word vector features of the text and learning sparse coding features; 4) training a structured perceptron classifier with the training samples and the extracted text features to identify event-related trigger words and parameters in the text; 5) feeding new text data, after the processing of step 1), into the structured perceptron classifier to extract text event information. The method strengthens the text features with a sparse coding representation of neural-network-based distributed word vector features, and at the same time uses the structured perceptron model to jointly learn the recognition of event trigger words and event participants, thereby obtaining a better event extraction effect.
Description
Technical Field
The invention relates to event extraction, and in particular to a text event extraction method combining sparse coding and a structured perceptron.
Background
An event is something that occurs or happens; it involves entities (people, items, etc.) that participate in or are affected by it, as well as aspects of time and space. Understanding the events described in text data is very important, and event extraction is often a key component of applications such as machine reading, news summarization, information retrieval, and knowledge base construction.
Generally, the goal of the event extraction task is to extract event-related trigger words and participants (people or things) from text. Current state-of-the-art methods for event extraction generally involve three steps: first, entities such as persons, organizations, and locations are extracted from the text with a pre-trained named entity recognition tool; then trigger word recognition and classification, and event parameter recognition and classification, are completed step by step. An obvious drawback of this pipelined approach is that errors made upstream accumulate and propagate downstream, and downstream steps cannot correct upstream mistakes. Research has therefore turned to casting event extraction as a structured prediction problem, so that event trigger words and parameters are identified and classified simultaneously; the idea is similar to methods used in other common natural language processing tasks such as POS tagging and chunking.
As in other machine learning applications, natural language processing tasks usually require extracting text features for model training and testing. These features can largely be divided into two categories: lexical features and contextual features. Lexical features mainly refer to part-of-speech tags, entity information, morphology (stems, nominalized verb forms), and the like, and capture the semantic information of words. Contextual features mainly refer to features from syntactic dependency analysis and semantic role labeling, which preserve the grammatical and semantic structure of the text. However, most such features require manual intervention, and the extraction process is time-consuming and not general. In recent years, neural networks and deep learning have become research hotspots, and unsupervised distributed word vector methods are increasingly common in NLP. Learning distributed word vectors is simple and general and needs no manually labeled data, but the resulting vectors lack the interpretability and flexibility of the sparse feature representations common in traditional NLP. This motivates research on converting distributed word vectors into a sparse representation that is convenient to use in traditional NLP problems.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a text event extraction method combining sparse coding and a structured perceptron.
The text event extraction method combining sparse coding and a structured perceptron comprises the following steps:
1) labeling text data and constructing training samples according to the Automatic Content Extraction and/or Rich ERE event specification;
2) taking the extracted entity as a candidate entity of an event trigger word and an event parameter, and extracting text characteristics;
3) further extracting text distributed word vector characteristics and learning sparse coding characteristics;
4) training a structure perceptron classifier by using training samples and extracted text characteristics, and identifying trigger words and parameters related to events in the text;
5) feeding new text data, after the processing of step 1), into the structured perceptron classifier, and extracting text event information.
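The five steps above can be sketched as a minimal pipeline. This is an illustrative skeleton only: every function name and the trivial bodies are assumptions, standing in for the annotation tools, feature extractors, and perceptron described in the embodiments below.

```python
# Minimal sketch of the five-step pipeline; all names and bodies are
# illustrative stand-ins, not the patent's actual implementation.

def build_training_samples(docs):
    # Step 1: label text data per the ACE / Rich ERE specification.
    return [{"sentence": d, "entities": [], "triggers": [], "params": []}
            for d in docs]

def extract_text_features(sample):
    # Step 2: entities become trigger/parameter candidates; extract
    # lexical features (stems, POS tags, dependencies, ...).
    return {"tokens": sample["sentence"].split()}

def sparse_code_features(features):
    # Step 3: distributed word vectors -> sparse codes (placeholder).
    return {tok: [1.0] for tok in features["tokens"]}

def train_structured_perceptron(samples):
    # Step 4: jointly learn trigger and parameter recognition.
    feats = [sparse_code_features(extract_text_features(s)) for s in samples]
    return {"weights": feats}  # stand-in for the learned model

def extract_events(model, new_doc):
    # Step 5: preprocess new text as in step 1, then decode with the model.
    sample = build_training_samples([new_doc])[0]
    return {"triggers": [], "params": [],
            "tokens": extract_text_features(sample)["tokens"]}

model = train_structured_perceptron(
    build_training_samples(["Troops attacked the city"]))
result = extract_events(model, "A bomb exploded downtown")
```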
The steps of the invention may preferably be implemented as follows:
The sub-steps of step 1) are:
1.1) after preprocessing, feeding the original English corpus into a manual or rule-based annotation system, and labeling a certain number of training samples according to the Automatic Content Extraction and/or Rich ERE event specification; if training samples corresponding to the original corpus already exist, this step is skipped; the preprocessing comprises removing stop words and modal particles;
1.2) according to the Automatic Content Extraction and/or Rich ERE event specification, the event extraction task comprises extracting event trigger words, predicting the type of each event, and extracting event participants, where each event participant corresponds to a certain event role; for each document, all event mentions contained in it constitute a set of training samples, each training sample consisting of a set of entities, one or more trigger words, and a set of event parameters, denoted {{e_1, …, e_s}, {t_1, …, t_r}, {a_1, …, a_n}}, where e_1, …, e_s are the 1st to s-th entities, t_1, …, t_r are the 1st to r-th trigger words, and a_1, …, a_n are the 1st to n-th event parameters; the role corresponding to each parameter is represented by the pairing relationship between trigger words and parameters.
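For illustration, the sample structure {{e_1, …, e_s}, {t_1, …, t_r}, {a_1, …, a_n}} could be held in a small data class. The field names, role labels, and the choice to encode the trigger-parameter pairing as (entity, trigger, role) triples are hypothetical, chosen only to make the structure concrete.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class EventSample:
    # One training sample per sentence, per the ACE / Rich ERE labeling.
    # Field names are illustrative, not from the patent.
    entities: List[str] = field(default_factory=list)   # e_1 .. e_s
    triggers: List[str] = field(default_factory=list)   # t_1 .. t_r
    # Each parameter is paired with (trigger, role), encoding the
    # trigger-parameter pairing that determines the event role:
    params: List[Tuple[str, str, str]] = field(default_factory=list)

sample = EventSample(
    entities=["troops", "city"],
    triggers=["attacked"],
    params=[("troops", "attacked", "Attacker"),
            ("city", "attacked", "Target")],
)
```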
The sub-steps of step 2) are:
2.1) each training sample corresponds to a sentence S in a document; each document C_j corresponds to a set of training samples {S_1, …, S_i}; for each training sample S_i, perform word segmentation (tokenization) to obtain a corresponding group of words {T_1, …, T_k}; extract text features for each word (in English terms, each token, which is not limited to a single word and may be a phrase);
2.2) for each word, first extract basic features including the stem and the nominalized form, and use them to roughly predict, according to pre-constructed rules, the event type the word may correspond to;
2.3) then extract for each word a part-of-speech tag, WordNet synonyms, and the Brown cluster category;
2.4) perform syntactic dependency analysis on each sentence with the Stanford Parser, and take the word's dependencies in the syntactic dependency tree as features, namely its parent and child nodes in the dependency tree; the dependency relations in the tree also serve as features of the dependency between event trigger words and event parameters;
2.5) if the word corresponds to an entity, use information such as the entity's type as word features.
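A self-contained sketch of per-token feature extraction in the spirit of sub-steps 2.2–2.5. A real system would call a stemmer, WordNet, Brown clusters, and the Stanford Parser; here those are replaced by deliberately trivial stand-ins (crude suffix rules, a hand-made entity dictionary) so the sketch runs without external tools.

```python
# Toy feature extraction for one token; the "stemmer" and "POS tagger"
# are crude stand-ins for the real tools named in the description.

def token_features(tokens, i, entities):
    tok = tokens[i]
    feats = {
        "stem": tok.lower().rstrip("s"),           # stand-in for stemming
        "pos_guess": "VERB" if tok.endswith("ed") else "OTHER",
        "prev": tokens[i - 1] if i > 0 else "<S>",   # local context
        "next": tokens[i + 1] if i < len(tokens) - 1 else "</S>",
    }
    if tok.lower() in entities:                    # sub-step 2.5: entity type
        feats["entity_type"] = entities[tok.lower()]
    return feats

tokens = "Troops attacked the city".split()
feats = token_features(tokens, 1, {"troops": "ORG", "city": "GPE"})
```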
The sub-steps of step 3) are:
3.1) construct a language model with a neural network, take all documents as training corpus, and train the language model to obtain the distributed word vector representation x_i corresponding to each word;
3.2) convert each distributed word vector representation x_i into a sparse representation y_i by sparse coding; the transformation requires optimizing the objective function in equation (2):
where D is a randomly initialized model parameter and A is the matrix composed of all y_i; the latter two terms in equation (2) are regularization terms;
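Equation (2) itself is not reproduced in this text. A standard sparse-coding objective consistent with the surrounding description (a reconstruction term followed by two regularization terms over A and D) would take the following form, where V (vocabulary size) and the second regularization weight τ are assumed symbols, and λ matches the hyper-parameter named in sub-step 3.3:

```latex
\min_{D,\,A}\;\sum_{i=1}^{V}\left\lVert x_i - D\,y_i \right\rVert_2^2
\;+\;\lambda \sum_{i=1}^{V}\lVert y_i \rVert_1
\;+\;\tau \lVert D \rVert_2^2 \tag{2}
```

The ℓ1 term drives each y_i toward sparsity, while the term on D keeps the dictionary bounded; both match the text's remark that "the latter two terms are regularization terms".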
3.3) optimize the objective function in equation (2) with the AdaGrad stochastic gradient descent algorithm, defining:
where g_{t,i,j} is the gradient, η_t is the learning rate at time t, and λ is a hyper-parameter of the model; the parameters are updated as follows:
where y_{t+1,i,j} denotes the updated value of the j-th element of the sparse vector representation y_i at time t.
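A minimal numeric sketch of the AdaGrad update in sub-step 3.3, assuming the standard AdaGrad rule (per-element learning rate η/√(Σ g²), with g_{t,i,j} the accumulated gradients); the proximal step for the ℓ1 regularizer and the dictionary update are omitted for brevity, so this is a sketch of the optimizer, not the full sparse-coding trainer.

```python
import numpy as np

def adagrad_step(y, grad, accum, eta=0.05, eps=1e-8):
    # accum holds the running sum of squared gradients g_{t,i,j};
    # each element of y thus gets its own effective learning rate.
    accum += grad ** 2
    y -= eta * grad / (np.sqrt(accum) + eps)
    return y, accum

# Toy example: fit y so that D @ y reconstructs a target x.
rng = np.random.default_rng(0)
D = rng.normal(size=(4, 6))                 # stand-in dictionary
x = D @ np.array([1.0, 0, 0, 0.5, 0, 0])    # target built from a sparse code
y = np.zeros(6)
accum = np.zeros(6)
for _ in range(200):
    grad = 2 * D.T @ (D @ y - x)            # gradient of ||x - D y||^2
    y, accum = adagrad_step(y, grad, accum)
loss = float(np.sum((x - D @ y) ** 2))
```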
The sub-steps of step 4) are:
4.1) for each training sample, i.e. sentence instance s = S_i, take the entities conforming to the event parameter types as candidate event parameter values, and convert the prediction process of the structured perceptron into a decoding problem: finding the optimal configuration z ∈ Y(s) corresponding to the model parameters w,
z = argmax_{z′ ∈ Y(s)} w · f(s, z′)   (5)
where f(s, z′) denotes the feature vector of instance s under configuration z′, and Y(s) denotes the set of all possible configurations for instance s; a configuration describes the assignment of event trigger words and event parameters in a sentence instance.
4.2) for each training sample (s, y′), each iteration of the training process finds the optimal configuration for s according to equation (5); if the found configuration does not match the ground truth, the parameters are updated by the rule:
w = w + f(s, y′) − f(s, z)   (6)
The decoding problem is solved with an early-update beam-search strategy. The model decoding process comprises two sub-steps: first, enumerate the possible trigger word labels for the words in the sentence, compute the score w · f(s, z′) of each possible configuration z′ according to equation (5), and keep the top p highest-scoring configurations, where p is the beam size; then traverse each configuration in the beam; once a word s_i matching a trigger word label is found, search the roles that the entities {e_1, …, e_s} may play in the event, recompute the configuration scores, and select the p best results to join the beam.
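A toy sketch of the beam-search pass over trigger labels under a linear score w · f(s, z′). The label set, the indicator features, and the weights are invented for illustration, and the second pass (expanding argument roles for found triggers) is omitted, so this shows only the first sub-step of the decoding described above.

```python
# Toy beam search over trigger-label sequences for one sentence.
# "Configurations" are partial label sequences scored by a linear model.

TRIGGER_LABELS = ["O", "Attack", "Transport"]   # invented label set

def score(weights, tokens, labels):
    # w . f(s, z'): sum of per-token (word, label) indicator weights
    return sum(weights.get((tok, lab), 0.0)
               for tok, lab in zip(tokens, labels))

def beam_decode(weights, tokens, beam_size=2):
    beam = [()]                                  # start: empty configuration
    for i in range(len(tokens)):
        # expand every beam entry with every possible label for token i
        expanded = [cfg + (lab,) for cfg in beam for lab in TRIGGER_LABELS]
        expanded.sort(key=lambda cfg: score(weights, tokens[:len(cfg)], cfg),
                      reverse=True)
        beam = expanded[:beam_size]              # keep top-p configurations
    return beam[0]

w = {("attacked", "Attack"): 2.0, ("attacked", "O"): 0.5,
     ("troops", "O"): 1.0, ("city", "O"): 1.0}
best = beam_decode(w, ["troops", "attacked", "city"])
```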
The sub-step of step 5) comprises: first extract the entities contained in the document (entity detection), then extract text features as in steps 1)-4), and feed them into the trained structured perceptron model to obtain the extracted events.
Compared with the prior art, the invention has the following beneficial effects:
1) Traditional pipeline-based event extraction classifies the trigger words of a text and extracts the event parameters step by step; the method of the invention instead extracts trigger words and event parameters simultaneously with a structured perceptron model, avoiding the error propagation of pipeline methods, in which errors from an earlier step are passed on to later steps and the information obtained later cannot correct them.
2) The invention uses rich text features: not only traditional common text features such as stems, nominalizations, and part-of-speech tags, but also expert resources such as synonyms and hypernyms/hyponyms from WordNet, as well as sentence structure information extracted by syntactic dependency analysis, which is combined with the structured prediction formulation of event extraction; in addition, the invention uses the currently popular neural-network approach to extract distributed vector representations of words and trains a sparse coding model to improve the usability and interpretability of the word vectors, learning sparse vector features of the words.
Drawings
FIG. 1 is a schematic diagram of sparse representation learning based on word2vec.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
The method strengthens the text features with a sparse coding representation of neural-network-based distributed word vector features, and at the same time uses a structured perceptron model to jointly learn the recognition of event trigger words and event participants, thereby realizing event extraction. The text event extraction method combining sparse coding and a structured perceptron comprises the following steps:
1) labeling text data and constructing training samples according to the Automatic Content Extraction (ACE) or Rich ERE (Rich Entities, Relations, and Events) specification;
2) taking the extracted entities as candidates for event trigger words and event parameters, and extracting text features (part-of-speech tags, dependency syntactic analysis, etc.);
3) further extracting distributed word vector features of the text and learning sparse coding features;
4) training a structure perceptron classifier by using training samples and extracted text characteristics, and identifying trigger words and parameters related to events in the text;
5) feeding new text data, after the processing of step 1), into the structured perceptron classifier, and extracting text event information.
The step 1) comprises the following steps:
1.1) after preprocessing such as removing stop words and modal particles, feeding the original English corpus into a manual or rule-based annotation system (such as the JET system), and labeling a certain number of training samples according to the ACE/Rich ERE standard; if training samples corresponding to the original corpus already exist, this step can be omitted;
1.2) according to the ACE/Rich ERE specification, the event extraction task includes extracting event trigger words (generally verbs) while predicting the type of the event and extracting event participants (i.e., event parameters), each event participant corresponding to a certain event role. Thus, for each document, all event mentions contained in it constitute a set of training samples, each training sample consisting of a set of entities, one or more trigger words, and a set of event parameters (arguments), denoted {{e_1, …, e_s}, {t_1, …, t_r}, {a_1, …, a_n}}, where e_1, …, e_s are the 1st to s-th entities, t_1, …, t_r are the 1st to r-th trigger words, and a_1, …, a_n are the 1st to n-th event parameters; the role corresponding to each parameter is represented by the pairing relationship between trigger words and parameters.
The step 2) comprises the following steps:
2.1) each training sample corresponds to a sentence S in a document; each document C_j corresponds to a set of training samples {S_1, …, S_i}; for each training sample S_i, perform word segmentation to obtain a corresponding group of tokens {T_1, …, T_k}; extract text features for each token;
2.2) for each token, first extract basic features such as the stem and the nominalized form, and use them to roughly predict, according to pre-constructed rules, the event type the token may correspond to;
2.3) then extract for each token text features such as the part-of-speech (POS) tag, WordNet synonyms, and the Brown cluster category;
2.4) perform syntactic dependency analysis on each sentence with the Stanford Parser, and take the token's dependencies in the syntactic dependency tree as features, namely its parent and child nodes in the dependency tree; these relations are easy to obtain from the Universal Dependencies format output by the Stanford Parser; the dependency relations in the tree can also serve as features of the dependency between event trigger words and event parameters;
2.5) if the token corresponds to an entity, take information such as the entity's type as token features.
The step 3) comprises the following steps:
3.1) construct a language model with a neural network, take all documents as training corpus, and train the language model to obtain the distributed word vector representation x_i corresponding to each word. The language model may take the form of equation (1); other existing forms can also be adopted, as long as x_i can be obtained.
3.2) since distributed word vectors lack the convenience and interpretability of the sparse features used in common natural language processing tasks, they can be converted into sparse representation features. Each distributed word vector x_i is converted into a sparse representation y_i by sparse coding; the transformation requires optimizing the objective function in equation (2):
where D is a randomly initialized model parameter and A is the matrix composed of all y_i; the last two terms in equation (2) are regularization terms to prevent overfitting of the model.
3.3) optimize the objective function in equation (2) with the AdaGrad stochastic gradient descent algorithm, defining:
where g_{t,i,j} is the gradient, η_t is the learning rate at time t, and λ is a hyper-parameter of the model; the parameters are updated as follows:
where y_{t+1,i,j} denotes the updated value of the j-th element of the sparse vector representation y_i at time t.
The step 4) comprises the following steps:
4.1) the structured perceptron is an extension of the standard linear perceptron for structured prediction. The idea of extracting event trigger words and event parameters simultaneously with a structured perceptron is as follows: for each training sample, i.e. sentence instance s = S_i, together with the candidate event parameter values (entities conforming to the event parameter types), the prediction process of the structured perceptron is transformed into a decoding problem, i.e. finding the optimal configuration z ∈ Y(s) corresponding to the model parameters w,
z=argmaxz′∈γ(s)w·f(s,z′) (5)
where f(s, z′) denotes the feature vector of instance s under configuration z′, and Y(s) denotes the set of all possible configurations for instance s; a configuration describes the assignment of event trigger words and event parameters in a sentence instance.
4.2) training process: the training of the structured perceptron can be performed online; for each training sample (s, y′), each iteration of the training process finds the optimal configuration for s according to equation (5); if the found configuration does not match the ground truth, the parameters are updated by the rule:
w = w + f(s, y′) − f(s, z)   (6)
The most critical step in both the training and testing of the model is finding the optimal configuration for a sentence instance under the current parameters; the invention solves this decoding problem with an early-update beam-search strategy. The process is summarized as Algorithm 1:
Algorithm 1: structured perceptron training algorithm
Input: training set D, number of rounds T
Output: model parameters w
Step 1: initialize the model parameters w to 0;
Step 2: repeat steps 3-6 for T rounds;
Step 3: execute steps 4-6 for each training sample in the set D;
Step 4: use beam-search to find the optimal configuration z for the current sentence instance;
Step 5: if z ≠ y, update the model parameters:
w = w + f(s, y[1:p]) − f(s, z)   (7)
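Algorithm 1 can be sketched as a short training loop. The helper `greedy_decode` is a hypothetical stand-in for the beam-search of Step 4, the indicator features and two-label tag set are invented, and the early-update prefix `y[1:p]` is simplified to a full update, so this illustrates the perceptron update rule rather than the exact algorithm.

```python
# Structured perceptron training loop (Algorithm 1, simplified sketch).

LABELS = ("O", "T")                 # invented tag set: non-trigger / trigger

def feat(s, y):
    # f(s, z): sparse count of (token, label) indicator features
    f = {}
    for tok, lab in zip(s, y):
        f[(tok, lab)] = f.get((tok, lab), 0.0) + 1.0
    return f

def greedy_decode(w, s):
    # stand-in for beam-search: per-token argmax under current weights
    return tuple(max(LABELS, key=lambda l: w.get((tok, l), 0.0)) for tok in s)

def perceptron_train(data, T=3):
    w = {}                          # Step 1: w = 0 (sparse dict)
    def add(vec, scale):
        for k, v in vec.items():
            w[k] = w.get(k, 0.0) + scale * v
    for _ in range(T):              # Step 2: T rounds
        for s, y in data:           # Step 3: each sample in D
            z = greedy_decode(w, s)         # Step 4: best configuration
            if z != y:                      # Step 5: update on mistakes
                add(feat(s, y), +1.0)       # w += f(s, y)
                add(feat(s, z), -1.0)       # w -= f(s, z)
    return w

data = [(("troops", "attacked"), ("O", "T")),
        (("they", "fled"), ("O", "T"))]
w = perceptron_train(data)
pred = greedy_decode(w, ("troops", "attacked"))
```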
4.3) define s = ⟨(s_1, s_2, …, s_n), ε⟩ as a training instance, where s_i is the i-th token of the sentence s and ε is the set of candidate event parameter entities; the ground-truth configuration corresponding to the instance is represented as:
y′ = (t_1, a_{1,1}, …, a_{1,m}, …, t_n, a_{n,1}, …, a_{n,m})   (8)
where t_i denotes the event trigger assignment of token s_i (i.e., whether the token is a trigger, and the corresponding type), and a_{i,k} denotes the event role relationship between s_i and candidate entity e_k.
The model decoding process comprises two sub-steps: enumerate all possible trigger word labels (representing event types) for the current token in the sentence, compute the score w · f(s, z′) of each possible configuration z′ according to equation (5), and keep the top p highest-scoring configurations, where p is the beam size; then traverse each configuration in the beam; once a word s_i matching a trigger word label is found, search the roles that the entities {e_1, …, e_s} may play in the event, recompute the configuration scores, and select the p best results to join the beam.
The step 5) comprises: first extract the entities contained in the document (entity detection), then extract text features as in the steps above, and feed them into the trained structured perceptron model to obtain the extracted events.
To verify the effect of the invention, the proposed method was tested on the ACE 2005 corpus. Following related research on ACE 2005, 40 English news articles in the corpus were used as the test set (672 sentences in total), 30 further documents were randomly selected as the validation set, and the remaining documents were used as the training set (14840 sentences in total). The tests use precision (P), recall (R), and F-measure (F1) as evaluation indices. A trigger word is considered correctly identified when its type and subtype match the ground truth; an event parameter is considered correctly identified when its type and subtype match the ground truth, and correctly classified when, in addition, its corresponding event role is correctly identified. Because the ACE 2005 corpus contains manual annotations of entities and events, step 1) of the invention was not executed during testing.
On the other hand, to verify the effect of the invention on other data sets, the method was also applied to the TAC 2016 Event Argument Extraction and Linking task test set, which comprises 30k English documents. The training data is still ACE 2005; due to the difference between the test data and the training data, the experimental results are considerably worse than those obtained directly on the ACE 2005 data set. The test results are shown in the table below.
| % | Precision | Recall | F1 |
| --- | --- | --- | --- |
| ACE 2005 | 64.7 | 44.4 | 52.7 |
| TAC 2016 | 26.6 | 5.2 | 8.7 |
Claims (5)
1. A text event extraction method combining sparse coding and a structured perceptron, characterized by comprising the following steps:
1) labeling text data and constructing training samples according to the Automatic Content Extraction and/or Rich ERE event specification;
2) taking the extracted entity as a candidate entity of an event trigger word and an event parameter, and extracting text characteristics;
3) further extracting text distributed word vector characteristics and learning sparse coding characteristics;
4) training a structure perceptron classifier by using training samples and extracted text characteristics, and identifying trigger words and parameters related to events in the text;
5) feeding new text data, after the processing of step 1), into the structured perceptron classifier, and extracting text event information;
the step 3) comprises the following steps:
3.1) constructing a language model with a neural network, taking all documents as training corpus, and training the language model to obtain the distributed word vector representation x_i corresponding to each word;
3.2) converting each distributed word vector representation x_i into a sparse representation y_i by sparse coding, the transformation requiring optimization of the objective function in equation (2):
where D is a randomly initialized model parameter and A is the matrix composed of all y_i; the latter two terms in equation (2) are regularization terms;
3.3) optimizing the objective function in equation (2) with the AdaGrad stochastic gradient descent algorithm, defining:
where g_{t,i,j} is the gradient, η_t is the learning rate at time t, and λ is a hyper-parameter of the model, the parameters being updated as follows:
where y_{t+1,i,j} denotes the updated value of the j-th element of the sparse vector representation y_i at time t.
2. The text event extraction method combining sparse coding and a structured perceptron according to claim 1, wherein the step 1) comprises:
1.1) after preprocessing, feeding the original English corpus into a manual or rule-based annotation system, and labeling a certain number of training samples according to the Automatic Content Extraction and/or Rich ERE event specification; if training samples corresponding to the original corpus already exist, skipping this step; the preprocessing comprising removing stop words and modal particles;
1.2) according to the Automatic Content Extraction and/or Rich ERE event specification, the event extraction task comprising extracting event trigger words, predicting the type of each event, and extracting event participants, each event participant corresponding to a certain event role; for each document, all event mentions contained in it constituting a set of training samples, each training sample consisting of a set of entities, one or more trigger words, and a set of event parameters, denoted {{e_1, …, e_s}, {t_1, …, t_r}, {a_1, …, a_n}}, where e_1, …, e_s are the 1st to s-th entities, t_1, …, t_r are the 1st to r-th trigger words, and a_1, …, a_n are the 1st to n-th event parameters; the role corresponding to each parameter being represented by the pairing relationship between trigger words and parameters.
3. The text event extraction method combining sparse coding and a structured perceptron according to claim 1, wherein step 2) comprises:
2.1) each training sample in document C corresponds to a sentence in the document, so each document C_j corresponds to a set of training samples {S_1,…,S_i}; each training sample S_i is segmented into a corresponding set of words {T_1,…,T_k}, and text features are extracted for each word;
2.2) for each word, basic features are first extracted, including the stem and the noun–verb conversion form; using these basic features, the event types the word may correspond to are roughly predicted according to pre-constructed rules;
2.3) for each word in turn, the part-of-speech tag, WordNet synonyms, and Brown cluster category are extracted;
2.4) each sentence is parsed for syntactic dependencies with the Stanford parser, and the word's position in the syntactic dependency tree is used as a feature, namely its parent and child nodes in the tree; the dependency relations in the tree also serve as features of the dependency between event trigger words and event parameters;
2.5) if the word corresponds to an entity, information including the entity's type is taken as a word feature.
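Steps 2.2)–2.5) assemble a per-word feature set; a toy sketch follows, with stand-in lookup dictionaries in place of a real stemmer, POS tagger, the Stanford parser, and an entity recognizer (the crude suffix-stripping "stemmer" is ours and only illustrative).

```python
def word_features(word, pos_tags, dep_parents, entity_types):
    """Collect the feature classes of steps 2.2-2.5 for one word.

    pos_tags, dep_parents, entity_types are placeholder dicts standing in
    for the outputs of real NLP components."""
    feats = {"stem": word.lower().rstrip("s")}   # step 2.2: crude stand-in stem
    feats["pos"] = pos_tags.get(word, "UNK")     # step 2.3: part-of-speech tag
    feats["dep_parent"] = dep_parents.get(word)  # step 2.4: head in dependency tree
    if word in entity_types:                     # step 2.5: entity type, if any
        feats["entity_type"] = entity_types[word]
    return feats
```

In a real pipeline the same dictionary of features would also carry WordNet synonyms and Brown cluster IDs per step 2.3); they are omitted here to keep the sketch dependency-free.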
4. The text event extraction method combining sparse coding and a structured perceptron according to claim 1, wherein step 4) comprises:
4.1) for each training sample, i.e. sentence instance s = s_i, the entities conforming to the event parameter types are taken as candidate event parameter values, and the prediction process of the structured perceptron is converted into the decoding problem of finding the optimal configuration z ∈ Y(s) for the model parameters w:

z = argmax_{z′ ∈ Y(s)} w · f(s, z′)  (5)

where f(s, z′) denotes the feature vector of instance s under configuration z′; Y(s) denotes the set of all possible configurations for instance s; a configuration describes the assignment of event trigger words and event parameters in the sentence instance;
4.2) for each training sample (s, y′), the optimal configuration for s is found according to equation (5) in each training iteration; if the found optimal configuration does not match the ground truth, the parameters are updated by the following rule:

w = w + f(s, y′) − f(s, z)  (6)
the decoding problem is solved with a beam-search strategy based on early update, and the model's decoding process comprises two sub-steps: first, all possible trigger word labels for the current word in the sentence are enumerated, the score w · f(s, z′) of each possible configuration z′ is computed according to equation (5), and the top p configurations with the highest scores are retained, p being the beam size; then each configuration in the beam is traversed, and once a trigger word label matching the sample word s_i is found, the roles that the entities {e_1,…,e_s} may play in the event are searched, at which point the configuration score is recomputed and the p best results are selected to join the beam.
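The decoding and update loop of step 4.2) can be sketched as follows. This is a simplified stand-in: role assignment and the early-update mechanism are omitted, `score` is any user-supplied function standing in for w · f(s, z′), and sparse {feature: value} dicts stand in for the vectors of equation (6).

```python
def beam_search(words, labels, score, p=4):
    """Keep the p highest-scoring partial label sequences per position."""
    beam = [([], 0.0)]                           # (partial label sequence, score)
    for i in range(len(words)):
        candidates = [(seq + [lab], s + score(words, i, seq + [lab]))
                      for seq, s in beam        # expand every beam entry
                      for lab in labels]        # with every trigger label
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:p]                    # retain top-p (the beam size)
    return beam[0][0]

def perceptron_update(w, gold_feats, pred_feats):
    """Equation (6): w = w + f(s, y') - f(s, z), with sparse feature dicts."""
    for k, v in gold_feats.items():
        w[k] = w.get(k, 0.0) + v
    for k, v in pred_feats.items():
        w[k] = w.get(k, 0.0) - v
    return w
```

With a scoring function that rewards labeling "attacked" as a trigger and everything else as O, the decoder recovers the intended sequence; after a wrong prediction, `perceptron_update` shifts weight from the predicted configuration's features to the gold one's.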
5. The text event extraction method combining sparse coding and a structured perceptron according to claim 1, wherein step 5) comprises: first extracting the entities contained in the document, then extracting text features as in steps 1)–4), and feeding them into the trained structured perceptron classifier to obtain the event extraction result.
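The end-to-end flow of claim 5 might be glued together as below; every component here is a hypothetical stub standing in for the corresponding stage of the pipeline (entity extraction, feature extraction per steps 1)–4), and the trained classifier), so only the orchestration is meaningful.

```python
def extract_events(document, entity_recognizer, featurizer, classifier):
    """Claim 5 pipeline: entities first, then features, then classification."""
    entities = entity_recognizer(document)                 # extract entities
    sentences = [s for s in document.split(".") if s.strip()]
    features = [featurizer(sent, entities) for sent in sentences]
    return [classifier(f) for f in features]               # one result per sentence
```

Swapping in real components (a named-entity recognizer, the feature pipeline of claim 3, and the trained structured perceptron) yields the full extractor without changing this glue.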
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610955220.9A CN106599032B (en) | 2016-10-27 | 2016-10-27 | Text event extraction method combining sparse coding and structure sensing machine |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610955220.9A CN106599032B (en) | 2016-10-27 | 2016-10-27 | Text event extraction method combining sparse coding and structure sensing machine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106599032A CN106599032A (en) | 2017-04-26 |
CN106599032B true CN106599032B (en) | 2020-01-14 |
Family
ID=58590466
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610955220.9A Active CN106599032B (en) | 2016-10-27 | 2016-10-27 | Text event extraction method combining sparse coding and structure sensing machine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106599032B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107908642B (en) * | 2017-09-29 | 2021-11-12 | 江苏华通晟云科技有限公司 | Industry text entity extraction method based on distributed platform |
CN107818141B (en) * | 2017-10-10 | 2020-07-14 | 大连理工大学 | Biomedical event extraction method integrated with structured element recognition |
CN110309256A (en) * | 2018-03-09 | 2019-10-08 | 北京国双科技有限公司 | The acquisition methods and device of event data in a kind of text |
CN110309168B (en) * | 2018-03-09 | 2021-08-17 | 北京国双科技有限公司 | Judgment document searching method and device |
CN108647525B (en) * | 2018-05-09 | 2022-02-01 | 西安电子科技大学 | Verifiable privacy protection single-layer perceptron batch training method |
CN110135457B (en) * | 2019-04-11 | 2021-04-06 | 中国科学院计算技术研究所 | Event trigger word extraction method and system based on self-encoder fusion document information |
CN110609896B (en) * | 2019-07-19 | 2022-03-22 | 中国人民解放军国防科技大学 | Military scenario text event information extraction method and device based on secondary decoding |
CN111581954B (en) * | 2020-05-15 | 2023-06-09 | 中国人民解放军国防科技大学 | Text event extraction method and device based on grammar dependency information |
CN112069819A (en) * | 2020-09-10 | 2020-12-11 | 杭州中奥科技有限公司 | Model training method, model training device, and event extraction method |
CN112183030A (en) * | 2020-10-10 | 2021-01-05 | 深圳壹账通智能科技有限公司 | Event extraction method and device based on preset neural network, computer equipment and storage medium |
CN112597366B (en) * | 2020-11-25 | 2022-03-18 | 中国电子科技网络信息安全有限公司 | Encoder-Decoder-based event extraction method |
CN112612871B (en) * | 2020-12-17 | 2023-09-15 | 浙江大学 | Multi-event detection method based on sequence generation model |
CN112906391B (en) * | 2021-03-16 | 2024-05-31 | 合肥讯飞数码科技有限公司 | Meta event extraction method, meta event extraction device, electronic equipment and storage medium |
CN113987163B (en) * | 2021-09-27 | 2024-06-07 | 浙江大学 | Lifelong event extraction method based on ontology guidance |
US20240143633A1 (en) * | 2021-09-28 | 2024-05-02 | Zhejiang University | Generative event extraction method based on ontology guidance |
CN114510928B (en) * | 2022-01-12 | 2022-09-23 | 中国科学院软件研究所 | Universal information extraction method and system based on unified structure generation |
CN114677749A (en) * | 2022-05-05 | 2022-06-28 | 南京大学 | Face recognition countermeasure sample generation method based on limited search space |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104965819A (en) * | 2015-07-12 | 2015-10-07 | 大连理工大学 | Biomedical event trigger word identification method based on syntactic word vector |
CN105512209A (en) * | 2015-11-28 | 2016-04-20 | 大连理工大学 | Biomedicine event trigger word identification method based on characteristic automatic learning |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7261695B2 (en) * | 2004-03-09 | 2007-08-28 | General Electric Company | Trigger extraction from ultrasound doppler signals |
- 2016-10-27: CN201610955220.9A filed; patent granted as CN106599032B (status: Active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104965819A (en) * | 2015-07-12 | 2015-10-07 | 大连理工大学 | Biomedical event trigger word identification method based on syntactic word vector |
CN105512209A (en) * | 2015-11-28 | 2016-04-20 | 大连理工大学 | Biomedicine event trigger word identification method based on characteristic automatic learning |
Also Published As
Publication number | Publication date |
---|---|
CN106599032A (en) | 2017-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106599032B (en) | Text event extraction method combining sparse coding and structure sensing machine | |
CN113011533B (en) | Text classification method, apparatus, computer device and storage medium | |
CN108363790B (en) | Method, device, equipment and storage medium for evaluating comments | |
Jung | Semantic vector learning for natural language understanding | |
CN111931506B (en) | Entity relationship extraction method based on graph information enhancement | |
CN108628828B (en) | Combined extraction method based on self-attention viewpoint and holder thereof | |
CN108549658B (en) | Deep learning video question-answering method and system based on attention mechanism on syntax analysis tree | |
CN109684642B (en) | Abstract extraction method combining page parsing rule and NLP text vectorization | |
CN111353306B (en) | Entity relationship and dependency Tree-LSTM-based combined event extraction method | |
CN107180026B (en) | Event phrase learning method and device based on word embedding semantic mapping | |
CN109101490B (en) | Factual implicit emotion recognition method and system based on fusion feature representation | |
CN110457690A (en) | A kind of judgment method of patent creativeness | |
CN107818173B (en) | Vector space model-based Chinese false comment filtering method | |
CN112818118A (en) | Reverse translation-based Chinese humor classification model | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
CN117236338B (en) | Named entity recognition model of dense entity text and training method thereof | |
CN111339772B (en) | Russian text emotion analysis method, electronic device and storage medium | |
CN114997288A (en) | Design resource association method | |
CN115510230A (en) | Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism | |
CN117251524A (en) | Short text classification method based on multi-strategy fusion | |
CN117474703B (en) | Topic intelligent recommendation method based on social network | |
Yan et al. | Implicit emotional tendency recognition based on disconnected recurrent neural networks | |
CN114282592A (en) | Deep learning-based industry text matching model method and device | |
Hathout | Acquisition of morphological families and derivational series from a machine readable dictionary | |
Tolegen et al. | Voted-perceptron approach for Kazakh morphological disambiguation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20170426; Assignee: TONGDUN HOLDINGS Co.,Ltd.; Assignor: ZHEJIANG University; Contract record no.: X2021990000612; Denomination of invention: A text event extraction method combining sparse coding and structure aware machine; Granted publication date: 20200114; License type: Common License; Record date: 20211012 ||