CN114880427A - Model based on multi-level attention mechanism, event argument extraction method and system - Google Patents

Model based on multi-level attention mechanism, event argument extraction method and system Download PDF

Info

Publication number
CN114880427A
CN114880427A
Authority
CN
China
Prior art keywords
argument
event
role
attention
event type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210416103.0A
Other languages
Chinese (zh)
Inventor
吴昆�
丁国栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mairong Intelligent Technology Shanghai Co ltd
Original Assignee
Mairong Intelligent Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mairong Intelligent Technology Shanghai Co ltd filed Critical Mairong Intelligent Technology Shanghai Co ltd
Priority to CN202210416103.0A priority Critical patent/CN114880427A/en
Publication of CN114880427A publication Critical patent/CN114880427A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/313Selection or weighting of terms for indexing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Evolutionary Computation (AREA)
  • Computational Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a model based on a multi-level attention mechanism, together with an event argument extraction method and system. The method first preprocesses an input comprising an event type and a text describing the event, and encodes the text in the data set with a pre-trained language model to obtain the model's initial text representation. Next, the event type is fed into the multi-level attention model to obtain the event type-argument role hierarchical attention features and the argument role-argument role hierarchical attention features. The text representation is then fed into a biaffine layer and fused with these two attention features to obtain the final fused classification features. Finally, the fused classification features serve as the input of the final classification layer, which predicts the head and tail indexes of the event arguments of each role type in a 0/1 labeling format; iterative training yields the optimal model. The scheme effectively improves the extraction of event arguments from documents.

Description

Model based on multi-level attention mechanism, event argument extraction method and system
Technical Field
The invention belongs to the field of event argument extraction research of information extraction in natural language processing, and particularly relates to a model based on a multi-level attention mechanism, and an event argument extraction method and system.
Background
The internet has fully entered the big data era, and everyone is immersed in massive amounts of data. Data is used widely across industries, and text is one of the most important carriers of information. Faced with vast and disorderly text data, quickly obtaining the information people need is critical; the task of information extraction was born of this need.
Event extraction is a core task in the field of information extraction. It aims to extract events from natural-language text and express them in a structured, table-like form. A complete event consists of an event trigger word, which determines the event type, and the several arguments involved in the event; both the trigger word and the arguments are entities. Divided by extraction stage, the event extraction task decomposes into two subtasks: event trigger word extraction and event argument extraction. Event argument extraction extracts all arguments involved in an event, given the known event trigger words and event types.
Conventional event argument extraction methods usually make only shallow use of the known information: when introducing event type information, they merely concatenate the event type's category vector to the text representation, ignoring the hierarchical relationship between event types and event arguments defined in the event template. In addition, the same sentence usually contains multiple arguments, and there are domain and semantic associations among them, yet such dependency information is seldom included in the modeling process. How to use the known event type information and the dependency information among arguments reasonably and efficiently to assist event argument extraction therefore has significant research value.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a model based on a multi-level attention mechanism, together with an event argument extraction method and system, that address the low classification accuracy of event argument extraction algorithms in the prior art.
The invention adopts the following technical scheme for solving the technical problems:
a model construction method of a multi-level attention mechanism comprises the steps of firstly, constructing an event type-argument role hierarchical relationship and an argument role-argument role hierarchical relationship, and respectively representing by using two-dimensional matrixes; secondly, inputting the pre-obtained text characteristics and the event type-event argument hierarchical relation matrix into an event type-event argument attention module, and calculating event type-argument role hierarchical attention characteristics; inputting a pre-obtained text characteristic and an event argument-event argument hierarchical relation matrix into an event argument-event argument attention module, and calculating argument role-argument role hierarchical attention characteristics; and finally, taking the event type-argument role hierarchy attention characteristic and the argument role-argument role hierarchy attention characteristic as the output of the model.
The specific process of calculating the event type-argument role level attention characteristics is as follows:
According to the official event template, the membership relationship between event types and argument roles is analyzed and represented with a two-dimensional matrix. The event template specifies, when defining an event, the arguments a specific type of event contains; a two-dimensional relation matrix is constructed with the event types as the abscissa and the argument roles as the ordinate, and if an event of a given type contains a given argument, the corresponding entry of the matrix is set to 1, and otherwise to 0.
The specific process of computing the argument role-argument role hierarchy attention features is as follows:
The dependency relationships among argument roles are analyzed and represented with a two-dimensional matrix. The value attributes contained in argument roles are abstracted into upper-layer concepts, each expressing one dimension of an argument role's attributes; a two-dimensional relation matrix is constructed with the argument role types as the abscissa and the upper-layer concept types as the ordinate, and if an argument role has a given attribute, the corresponding entry of the matrix is set to 1, and otherwise to 0.
The text representation is obtained by encoding the text in the original data set by applying a pre-training language model.
An event argument extraction method based on a multi-level attention mechanism comprises the following steps:
step 1, preprocessing an input text containing an event type and describing the event, and coding the text in a data set by using a pre-training language model to obtain an initial text representation of the model;
step 2, inputting the event type in the step 1 into a model of a multi-level attention mechanism, and obtaining event type-argument role level attention characteristics and argument role-argument role level attention characteristics;
step 3, inputting the text representation obtained in step 1 into a biaffine layer, and fusing it with the event type-argument role hierarchical attention features and the argument role-argument role hierarchical attention features to obtain the final fused classification features;
step 4, taking the fused classification features as the input of the final classification layer, predicting the head and tail indexes of the event arguments of each role type in a 0/1 labeling format, and performing iterative training to obtain the optimal model.
The specific process of the step 1 is as follows:
The data is divided into a training set and a test set; each long document is split into a set of sentences of fixed maximum length 200 characters, with one sentence corresponding to one sample in the data set, and word embedding representation with the pre-trained language model BERT yields the initial text representation h.
For each sample, the known event type e is used to look up the event type-argument role two-dimensional relation matrix to obtain the association vector between the event type and its argument roles; a lookup in a randomly initialized event type-argument role parameter matrix then yields the semantic feature e_uc of the argument roles corresponding to the event type. Assuming the event may contain k arguments, the text representation obtained in step 1 is fused with e_uc, and a softmax function yields the attention score s_e of the event type over the argument roles.

For each sample, the argument role-argument role hierarchical attention two-dimensional matrix is used to look up, in a randomly initialized argument role-argument role parameter matrix, the semantic features r_uc of the association information among all arguments. The text representation obtained in step 1 is fused with r_uc, and a softmax function computes the upper-concept-based attention score s_r between argument roles and the sample's argument-argument hierarchy features e_r.

e_r is concatenated with the text representation h obtained in step 1, a probability matrix p_r of attention scores between argument roles is computed for each token in the text, and for each candidate argument a max function selects the other argument with maximum correlation, yielding the argument-argument feature matrix h_r used for final classification.
The specific process of the step 3 is as follows:
and (3) embedding the text representation obtained in the step (1) into an input double affine layer, mapping the text representation to a vector p for calculating probability of each argument role by using a feedforward neural network, and fusing the vector p with the event type-event argument level attention feature and the event argument-event argument level attention feature of the multi-level attention mechanism model to obtain a final fused classification feature.
The specific process of the step 4 is as follows:
and (3) taking the multi-level attention mechanism fusion feature representation as the input of a final classification layer, classifying the vector p obtained in the step (3) by using a plurality of two classifiers, predicting the head and tail indexes of the event arguments of each role type by adopting an 0/1 labeling format, and performing iterative training to obtain an optimal model.
The event argument extraction system based on the multi-level attention mechanism comprises a pre-trained language model, a span extraction module, a multi-level attention mechanism model, a feature fusion module and an argument extraction module; wherein:
the pre-trained language model is used for receiving an external input comprising an event type and a passage of text describing the event, and encoding it to obtain the event text representation;
the span extraction module is used for processing the received text representation to obtain the initial classification features;
the multi-level attention mechanism model is used for receiving the event type and obtaining the two hierarchical features;
the feature fusion module is used for fusing the two hierarchical features with the initial classification features to obtain the final fused classification features;
and the argument extraction module is used for performing binary classification on the fused classification features to obtain the head and tail positions of argument entities and extract the event arguments.
Compared with the prior art, the invention has the following beneficial effects:
1. The scheme first explores the influence of the event type introduced by the template on the event roles and builds event type-argument type contextual hierarchy features with an attention mechanism; second, it explores the upper-layer conceptual correlations among arguments and builds argument type-argument type contextual hierarchy features with a hierarchical attention mechanism; finally, fusing the multi-level attention features improves the extraction of event arguments from documents.
2. The relationship between the event type and the event arguments and the relationships among the event arguments are modeled separately through attention mechanisms, and the resulting features are fused with the text representation for the final event argument classification task, producing more accurate event argument extraction results.
3. The model is a self-contained component that can be reused in related fields, effectively improving the performance of event argument extraction.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of an event argument extraction method based on a multi-level attention mechanism according to the present invention.
FIG. 2 is an abstract view of an event-argument hierarchy used in the present invention.
FIG. 3 is an abstract view of an argument-argument hierarchical relationship used in the present invention.
FIG. 4 is a diagram illustrating the overall structure of the sentence-level event argument extraction task according to the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
In order to better explain the embodiment, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention.
It should be noted that the concept of attention weight mentioned in this document may also be called attention score or attention feature in the art; the three terms as used in this document carry the same meaning and follow common usage in the field, so no ambiguity of presentation or of term correspondence arises.
A model construction method of a multi-level attention mechanism comprises the following steps: first, constructing the event type-argument role hierarchical relationship and the argument role-argument role hierarchical relationship, each represented by a two-dimensional matrix; second, inputting the pre-obtained text features and the event type-event argument hierarchical relation matrix into an event type-event argument attention module and computing the event type-argument role hierarchical attention features, and inputting the pre-obtained text features and the event argument-event argument hierarchical relation matrix into an event argument-event argument attention module and computing the argument role-argument role hierarchical attention features; finally, taking the event type-argument role hierarchical attention features and the argument role-argument role hierarchical attention features as the output of the model.
The specific process of calculating the event type-argument role level attention characteristics is as follows:
According to the official event template, the membership relationship between event types and argument roles is analyzed and represented with a two-dimensional matrix. The event template specifies, when defining an event, the arguments a specific type of event contains; a two-dimensional relation matrix is constructed with the event types as the abscissa and the argument roles as the ordinate, and if an event of a given type contains a given argument, the corresponding entry of the matrix is set to 1, and otherwise to 0.
The specific process of computing the argument role-argument role hierarchy attention features is as follows:
The dependency relationships among argument roles are analyzed and represented with a two-dimensional matrix. The value attributes contained in argument roles are abstracted into upper-layer concepts, each expressing one dimension of an argument role's attributes; a two-dimensional relation matrix is constructed with the argument role types as the abscissa and the upper-layer concept types as the ordinate, and if an argument role has a given attribute, the corresponding entry of the matrix is set to 1, and otherwise to 0.
The text representation is obtained by encoding the text in the original data set by applying a pre-training language model.
The event argument extraction method based on the multi-level attention mechanism comprises the following steps:
step 1, preprocessing an input text containing an event type and describing the event, and coding the text in a data set by using a pre-training language model to obtain an initial text representation of the model;
step 2, inputting the event type in the step 1 into a model of a multi-level attention mechanism, and obtaining event type-argument role level attention characteristics and argument role-argument role level attention characteristics;
step 3, inputting the text representation obtained in step 1 into a biaffine layer, and fusing it with the event type-argument role hierarchical attention features and the argument role-argument role hierarchical attention features to obtain the final fused classification features;
step 4, taking the fused classification features as the input of the final classification layer, predicting the head and tail indexes of the event arguments of each role type in a 0/1 labeling format, and performing iterative training to obtain the optimal model.
In a specific embodiment, as shown in figure 1,
the event argument extraction method based on the multi-level attention mechanism comprises the following steps:
S1: preprocessing an input comprising an event type and a text describing the event, and encoding the text in the data set with a pre-trained language model to obtain the model's initial text representation;
S2: constructing the event type-argument role hierarchical relationship from the existing official event template and representing it with a two-dimensional matrix;
S3: abstracting the superordinate attributes of arguments from empirical knowledge, constructing the argument role-argument role hierarchical relationship through these attributes, and representing it with a two-dimensional matrix;
S4: inputting the event type and text representation from step S1 and the event type-argument role hierarchical relation matrix from step S2 into the event type-event argument attention module, and computing the event type-argument role hierarchical attention features;
S5: inputting the text representation from step S1 and the argument role-argument role hierarchical relation matrix from step S3 into the event argument-event argument attention module, and computing the argument role-argument role hierarchical attention features;
S6: inputting the text representation from step S1 into the biaffine layer and fusing it with the event type-argument role hierarchical attention features from step S4 and the argument role-argument role hierarchical attention features from step S5 to obtain the final classification features;
S7: taking the fused classification feature representation from step S6 as the input of the final classification layer, predicting the head and tail position indexes of the event arguments of each role type in a 0/1 labeling format, and performing iterative training to obtain the optimal model.
The specific process of step S1 is as follows:
The data set used to train the model is divided into a training set and a test set; each document is split into a set of sentences of at most 200 characters, arguments are extracted sentence by sentence, and one sentence corresponds to one sample in the data set. The sentences are encoded with the pre-trained language model BERT, which maps each token to a fixed dimension d_h, yielding the general semantic embedded text representation h:

h = [h_1, ..., h_tri, ..., h_N]

where h_i is the word embedding representation corresponding to each token, tri indicates the position of the event trigger word, and N indicates the length of the text sequence; the size of the text representation h is N × d_h.
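As a minimal sketch of this encoding step, the snippet below uses the Hugging Face transformers API; the bert-base-chinese checkpoint and the naive 200-character split are assumptions, since the patent names BERT but no specific variant or splitting routine.

```python
# Minimal sketch of step S1, assuming the Hugging Face `transformers` API
# and the `bert-base-chinese` checkpoint (the patent does not name one).
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

def encode_document(document: str, max_len: int = 200) -> torch.Tensor:
    """Split a document into <=200-character sentences and encode each with BERT."""
    sentences = [document[i:i + max_len] for i in range(0, len(document), max_len)]
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    # h: (num_sentences, N, d_h) token embeddings; d_h = 768 for BERT-base.
    return out.last_hidden_state
```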
The specific process of step S2 is as follows:
an event type-argument role two-dimensional relation matrix is constructed by using an official event template, the official event template presets subordinate argument roles for each type of event, and a schematic diagram is shown in FIG. 2, namely the true argument role set of the event is a subset of the preset argument role set of the template. Based on the theory, the hierarchical relationship between the event type and the argument role is expressed by using a two-dimensional matrix; the abscissa of the two-dimensional relationship matrix is 33 event types, the ordinate is 35 argument roles, if a certain argument role belongs to a certain event type, the corresponding position of the argument role is set to be 1 in the two-dimensional matrix, otherwise, the corresponding position is 0.
The specific process of step S3 is as follows:
step S300: and constructing an argument role-argument role two-dimensional relation matrix according to field leading edge research.
Step S301: argument roles often do not exist independently, and different argument roles have correlation in some dimension, and this final correlation helps to facilitate the co-extraction of arguments. Based on the theory, the upper concept is abstracted into 8 large classes (Person, Behavior, Entity, Good, Place, Org, Time, NA) according to expert design, and a schematic diagram is shown in fig. 3.
Step S302: designing an argument role-argument role two-dimensional relation matrix; the abscissa of the two-dimensional relationship matrix is 35 argument roles, the ordinate is 8 upper-layer concepts, if a certain argument role contains a certain upper-layer concept attribute, the corresponding position of the argument role in the two-dimensional relationship matrix is set to be 1, otherwise, the corresponding position is 0.
The specific process of step S4 is as follows:
step S400: and searching in a two-dimensional relationship matrix of the event type-argument role level attention features obtained by using the known event type e of each sample to obtain an association vector of the event type and the argument role, and calculating the event type-argument role level attention features.
Step S401: randomly initializing a size num e (number of event types) × d h Two-dimensional query vector E (of the same dimension as the text representation h in step S1) e
Step S402: querying vector E according to known event types e In a first dimension to obtain a size of N num r *d h Corresponding to the semantic feature vector e of the argument role uc Wherein num r Is the number of argument classes.
Step S403: expanding the text token h from step S1 in a second dimension to obtain an AND vector e uc Vectors of the same size
Figure BDA0003606058130000071
Then splicing the two and obtaining a characteristic vector h through a full connection layer e The size of the particles is N x num r
h e =tanh(W ae [h;e uc ])
Step S404: calculating the attention weight s of the event text to the argument role by using softmax function e
Figure BDA0003606058130000072
Where i indicates the current argument type, k r Representing the number of arguments a current event type has in the template.
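As an illustration only, the following PyTorch sketch condenses steps S401-S404; broadcasting the event query over the role slots and masking with the event-role matrix row are assumptions, since the figure images carrying the exact tensor layouts are not preserved in this text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EventRoleAttention(nn.Module):
    """Sketch of steps S401-S404: event type -> argument role attention."""
    def __init__(self, num_events: int, num_roles: int, d_h: int):
        super().__init__()
        self.E_e = nn.Parameter(torch.randn(num_events, d_h))  # S401 query matrix
        self.W_ae = nn.Linear(2 * d_h, 1)                      # S403 fusion layer

    def forward(self, h, event_id, role_mask):
        # h: (N, d_h) text representation; role_mask: (num_roles,) binary row
        # of the event-role matrix M_er for the known event type.
        N, d_h = h.shape
        num_roles = role_mask.numel()
        # S402: broadcast the event query to every role slot -> (N, num_roles, d_h).
        e_uc = self.E_e[event_id].expand(N, num_roles, d_h)
        # S403: expand h, concatenate, fully connected + tanh -> h_e: (N, num_roles).
        h_exp = h.unsqueeze(1).expand(N, num_roles, d_h)
        h_e = torch.tanh(self.W_ae(torch.cat([h_exp, e_uc], dim=-1))).squeeze(-1)
        # S404: softmax over the k_r roles the event actually has in the template.
        h_e = h_e.masked_fill(role_mask == 0, float("-inf"))
        return F.softmax(h_e, dim=-1)  # s_e: (N, num_roles)
```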
The specific process of step S5 is as follows:
step S500: and (4) computing an argument role-argument role hierarchical attention feature.
Step S501: for each sample, a two-dimensional matrix of argument role-argument role hierarchy attention features is used, and a size num is randomly initialized c (upper layer concept number) d h Two-dimensional query vector E (of the same dimension as the text representation h in step S1) r
Step S502: the relation between all arguments and upper concepts is arranged in a query vector E c In the query and get the size num in the extended dimension c *N*d h Semantic feature vector r of the association information between all arguments of uc
Step S503: expanding the text token h from step S1 in a second dimension to obtain an AND vector r uc Vectors of the same size
Figure BDA0003606058130000081
Then splicing the two and obtaining a characteristic vector through a full connection layer
Figure BDA0003606058130000082
Its size is num c *N。
Here, the
Figure BDA0003606058130000083
Step S504: computing attention weight of argument role association upper-level concept using softmax function
Figure BDA0003606058130000084
Size num c *N。
Figure BDA0003606058130000085
Where i represents the current position index and n represents the length of the current text sequence.
Step S505: for each argument, a weighted average attention score s of all its associated upper-level concepts is calculated r The size of the vector after dimension expansion is N num r (number of argument roles).
Figure BDA0003606058130000086
Where i denotes the current position index, k c Representing the number of the upper-layer concept attributes contained in the current argument role, and being marked as c 1 ,c 2 ,...,c k
Step S506: expanding the text token h from step S1 in a second dimension to obtain a vector
Figure BDA0003606058130000087
The vector s obtained in the last step is added r Expanding on the second dimension to obtain a vector
Figure BDA0003606058130000088
Calculating the Hadamard product of the two to obtain the size of N num r *d h Argument-argument hierarchy feature vector e of r
Figure BDA0003606058130000089
Step S507: vector obtained by carrying out dimension expansion on text representation h from step S1
Figure BDA00036060581300000810
And the vector e obtained in the above step r Vector obtained through dimension expansion
Figure BDA00036060581300000811
Splicing and obtaining the characteristic vector through the full connection layer
Figure BDA00036060581300000812
The vector provides a probability matrix of attention scores of all argument roles to each other for each feature word (token)
Figure BDA0003606058130000091
The size of which is N num r *num r
Figure BDA0003606058130000092
Step S508: aiming at each candidate argument, screening out another argument with the maximum correlation with each candidate argument by using a max function to obtain a role attention score matrix h with the highest relevance with each argument r Size N x num r
Figure BDA0003606058130000093
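Similarly, a condensed PyTorch sketch of steps S501-S508 is given below; the mean over a role's concepts in S505 and the softmax scoring layer in S507 are assumed reconstructions of formulas whose images are lost, so treat this as a sketch of the data flow rather than the exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoleRoleAttention(nn.Module):
    """Sketch of steps S501-S508: argument role <-> argument role attention."""
    def __init__(self, num_concepts: int, num_roles: int, d_h: int):
        super().__init__()
        self.E_c = nn.Parameter(torch.randn(num_concepts, d_h))  # S501 concept queries
        self.W_ac = nn.Linear(2 * d_h, 1)                        # S503 fusion layer
        self.W_ar = nn.Linear(2 * d_h, num_roles)                # S507 scoring layer

    def forward(self, h, M_rc):
        # h: (N, d_h); M_rc: (num_roles, num_concepts) role-concept matrix (step S3).
        N, d_h = h.shape
        num_concepts = self.E_c.shape[0]
        # S502: concept semantics broadcast over positions -> r_uc: (num_concepts, N, d_h).
        r_uc = self.E_c.unsqueeze(1).expand(num_concepts, N, d_h)
        # S503: expand h, concatenate, fully connected -> h_c: (num_concepts, N).
        h_exp = h.unsqueeze(0).expand(num_concepts, N, d_h)
        h_c = torch.tanh(self.W_ac(torch.cat([h_exp, r_uc], dim=-1))).squeeze(-1)
        # S504: softmax over positions -> s_c: (num_concepts, N).
        s_c = F.softmax(h_c, dim=-1)
        # S505: average each role's associated concepts -> s_r: (N, num_roles).
        k_c = M_rc.sum(-1).clamp(min=1)                 # concepts per role
        s_r = (M_rc @ s_c / k_c.unsqueeze(-1)).transpose(0, 1)
        # S506: Hadamard product with expanded h -> e_r: (N, num_roles, d_h).
        e_r = h.unsqueeze(1) * s_r.unsqueeze(-1)
        # S507: concatenate expanded h and e_r, score role-role attention p_r.
        num_roles = e_r.shape[1]
        h_exp2 = h.unsqueeze(1).expand(N, num_roles, d_h)
        p_r = F.softmax(self.W_ar(torch.cat([h_exp2, e_r], dim=-1)), dim=-1)
        # S508: keep, for each candidate role, its most correlated other role.
        h_r, _ = p_r.max(dim=-1)                        # (N, num_roles)
        return s_r, h_r
```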
The specific process of step S6 is as follows:
step S600: the multi-feature fusion is detailed with reference to fig. 4, and finally, the classification features are obtained.
Step S601: the text tokens h from step S1 are separately input into two dual affine layers, which are mapped to a vector p ' that computes probabilities for each argument role using a feed forward neural network, resulting in a probability matrix p ' corresponding to the head-to-tail index ' s/e ,p′ s Indicate Start (start) index, p' e Indicating an end index (end), both of size N num r *2。
p′=W 1 (tanh(W 2 ·h+b 2 ))+b 1
Step S602: the event type-argument role feature vector S obtained in the step S4 e And argument role-argument role feature vector h obtained in the step five r And fusing the vector p to obtain the final fusion classification feature probability.
p=h r *(λ·s e +p′)
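The fusion in steps S601-S602 could look like the following sketch; the two-layer feed-forward scorers follow the formula for p', while λ is a scalar hyperparameter whose value the patent does not fix, and the head/tail output shape N × num_r × 2 follows the description above.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Sketch of steps S601-S602: head/tail scoring plus multi-level fusion."""
    def __init__(self, d_h: int, num_roles: int, lam: float = 0.5):
        super().__init__()
        # p' = W_1(tanh(W_2 h + b_2)) + b_1, one scorer each for start and end.
        self.start_ffn = nn.Sequential(nn.Linear(d_h, d_h), nn.Tanh(),
                                       nn.Linear(d_h, num_roles * 2))
        self.end_ffn = nn.Sequential(nn.Linear(d_h, d_h), nn.Tanh(),
                                     nn.Linear(d_h, num_roles * 2))
        self.lam = lam  # λ, an assumed scalar fusion weight

    def forward(self, h, s_e, h_r):
        # h: (N, d_h); s_e, h_r: (N, num_roles) from the two attention modules.
        N = h.shape[0]
        p_start = self.start_ffn(h).view(N, -1, 2)  # (N, num_roles, 2)
        p_end = self.end_ffn(h).view(N, -1, 2)
        # S602: p = h_r * (λ · s_e + p'), broadcast over the 0/1 label dimension.
        fuse = lambda p: h_r.unsqueeze(-1) * (self.lam * s_e.unsqueeze(-1) + p)
        return fuse(p_start), fuse(p_end)
```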
The specific process of step S7 is as follows:
the multi-level attention mechanism fused feature representation is used as an input of a final classification layer, and 0/1 labels are distributed to the head-tail position index of each argument role by the obtained vector p through a plurality of binary classifiers.
y s/e =argmax(p s/e )
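For step S7, the sketch below shows one plausible decoding of the 0/1 start/end predictions into argument spans; the argmax follows the formula above, while pairing each start with the nearest following end of the same role is an assumed heuristic not specified by the patent.

```python
import torch

def decode_spans(p_start, p_end):
    """Decode 0/1 start/end labels (step S7) into (role, head, tail) spans.

    p_start, p_end: (N, num_roles, 2) fused probabilities; label 1 marks a boundary.
    """
    y_start = p_start.argmax(dim=-1)  # y_s = argmax(p_s): (N, num_roles)
    y_end = p_end.argmax(dim=-1)      # y_e = argmax(p_e)
    spans = []
    for role in range(y_start.shape[1]):
        starts = torch.nonzero(y_start[:, role]).flatten().tolist()
        ends = torch.nonzero(y_end[:, role]).flatten().tolist()
        for s in starts:
            # Assumed heuristic: pair each start with the nearest following end.
            tail = next((e for e in ends if e >= s), None)
            if tail is not None:
                spans.append((role, s, tail))
    return spans
```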
This completes event argument extraction based on the multi-level attention mechanism. Through this scheme, the prior information contained in the event is fully mined and used: the attention mechanism is applied during encoding, the guidance of the event type toward the argument roles and the correlation information between argument roles are fully fused, the semantic features are enhanced, and the accuracy and performance of event argument extraction are improved.
The event argument extraction system based on the multilevel attention mechanism comprises a pre-training language model, a span extraction module, a multilevel attention mechanism model, a feature fusion module and an argument extraction module; wherein the content of the first and second substances,
the pre-training language model is used for receiving an external input including an event type and a section of text describing the event to perform pre-training and acquiring an event text representation;
the span extraction module is used for processing the received text representation to obtain initial classification characteristics;
the multi-level attention mechanism model is used for receiving the event type and acquiring two level characteristics;
the feature fusion module is used for fusing the two hierarchical features and the initial classification feature to obtain a final fusion classification feature;
and the argument extraction module is used for carrying out secondary classification on the fusion classification features to obtain the head and tail positions of argument entities and extracting event argument parameters.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A model construction method of a multi-level attention mechanism, characterized by comprising the following steps: first, constructing the event type-argument role hierarchical relationship and the argument role-argument role hierarchical relationship, each represented by a two-dimensional matrix; second, inputting the pre-obtained text features and the event type-event argument hierarchical relation matrix into an event type-event argument attention module and computing the event type-argument role hierarchical attention features, and inputting the pre-obtained text features and the event argument-event argument hierarchical relation matrix into an event argument-event argument attention module and computing the argument role-argument role hierarchical attention features; and finally, taking the event type-argument role hierarchical attention features and the argument role-argument role hierarchical attention features as the output of the model.
2. The model building method of a multi-level attention mechanism of claim 1, wherein: the specific process of calculating the event type-argument role level attention characteristics is as follows:
According to the official event template, the membership relationship between event types and argument roles is analyzed and represented with a two-dimensional matrix. The event template specifies, when defining an event, the arguments a specific type of event contains; a two-dimensional relation matrix is constructed with the event types as the abscissa and the argument roles as the ordinate, and if an event of a given type contains a given argument, the corresponding entry of the matrix is set to 1, and otherwise to 0.
3. The model building method of a multi-level attention mechanism of claim 1, wherein: the specific process of computing the argument role-argument role hierarchy attention features is as follows:
The dependency relationships among argument roles are analyzed and represented with a two-dimensional matrix. The value attributes contained in argument roles are abstracted into upper-layer concepts, each expressing one dimension of an argument role's attributes; a two-dimensional relation matrix is constructed with the argument role types as the abscissa and the upper-layer concept types as the ordinate, and if an argument role has a given attribute, the corresponding entry of the matrix is set to 1, and otherwise to 0.
4. The model building method of a multi-level attention mechanism of claim 1, wherein: the text representation is obtained by encoding the text in the original data set by applying a pre-training language model.
5. An event argument extraction method based on a multi-level attention mechanism is characterized by comprising the following steps: the method comprises the following steps:
step 1, preprocessing an input text containing an event type and describing the event, and coding the text in a data set by using a pre-training language model to obtain an initial text representation of the model;
step 2, inputting the event type in the step 1 into the model of the multi-level attention mechanism of any one of claims 1 to 4, and obtaining event type-argument role level attention characteristics and argument role-argument role level attention characteristics;
step 3, inputting the text representation obtained in step 1 into a biaffine layer, and fusing it with the event type-argument role hierarchical attention features and the argument role-argument role hierarchical attention features to obtain the final fused classification features;
step 4, taking the fused classification features as the input of the final classification layer, predicting the head and tail indexes of the event arguments of each role type in a 0/1 labeling format, and performing iterative training to obtain the optimal model.
6. The method of claim 5 for event argument extraction based on multi-tier attention mechanism, wherein: the specific process of the step 1 is as follows:
and dividing a training set and a testing set, dividing a long document in the document into a sentence set with a fixed length of 200 characters, wherein one sentence corresponds to one sample in a data set, and performing word embedding representation by using a pre-training language model BERT to obtain an initial text representation h.
7. The event argument extraction method based on a multi-level attention mechanism of claim 6, wherein: for each sample, the known event type e is used to look up the event type-argument role two-dimensional relation matrix to obtain the association vector between the event type and its argument roles, and a lookup in a randomly initialized event type-argument role parameter matrix yields the semantic feature e_uc of the argument roles corresponding to the event type; assuming the event may contain k arguments, the text representation obtained in step 1 is fused with e_uc, and a softmax function yields the attention score s_e of the event type over the argument roles;

for each sample, the argument role-argument role hierarchical attention two-dimensional matrix is used to look up, in a randomly initialized argument role-argument role parameter matrix, the semantic features r_uc of the association information among all arguments; the text representation obtained in step 1 is fused with r_uc, and a softmax function computes the upper-concept-based attention score s_r between argument roles and the sample's argument-argument hierarchy features e_r;

e_r is concatenated with the text representation h obtained in step 1, a probability matrix p_r of attention scores between argument roles is computed for each token in the text, and for each candidate argument a max function selects the other argument with maximum correlation, yielding the argument-argument feature matrix h_r used for final classification.
8. The method of claim 5 for event argument extraction based on multi-tier attention mechanism, wherein: the specific process of the step 3 is as follows:
and (3) embedding the text representation obtained in the step (1) into an input double affine layer, mapping the text representation to a vector p for calculating probability of each argument role by using a feedforward neural network, and fusing the vector p with the event type-event argument level attention feature and the event argument-event argument level attention feature of the multi-level attention mechanism model to obtain a final fused classification feature.
9. The method of claim 5 for event argument extraction based on multi-tier attention mechanism, wherein: the specific process of the step 4 is as follows:
and (3) taking the multi-level attention mechanism fusion feature representation as the input of a final classification layer, classifying the vector p obtained in the step (3) by using a plurality of two classifiers, predicting the head and tail indexes of the event arguments of each role type by adopting an 0/1 labeling format, and performing iterative training to obtain an optimal model.
10. An event argument extraction system based on a multi-level attention mechanism, characterized in that: the system comprises a pre-trained language model, a span extraction module, a multi-level attention mechanism model, a feature fusion module and an argument extraction module; wherein:
the pre-trained language model is used for receiving an external input comprising an event type and a passage of text describing the event, and encoding it to obtain the event text representation;
the span extraction module is used for processing the received text representation to obtain the initial classification features;
the multi-level attention mechanism model is used for receiving the event type and obtaining the two hierarchical features;
the feature fusion module is used for fusing the two hierarchical features with the initial classification features to obtain the final fused classification features;
and the argument extraction module is used for performing binary classification on the fused classification features to obtain the head and tail positions of argument entities and extract the event arguments.
CN202210416103.0A 2022-04-20 2022-04-20 Model based on multi-level attention mechanism, event argument extraction method and system Pending CN114880427A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210416103.0A CN114880427A (en) 2022-04-20 2022-04-20 Model based on multi-level attention mechanism, event argument extraction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210416103.0A CN114880427A (en) 2022-04-20 2022-04-20 Model based on multi-level attention mechanism, event argument extraction method and system

Publications (1)

Publication Number Publication Date
CN114880427A true CN114880427A (en) 2022-08-09

Family

ID=82670994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210416103.0A Pending CN114880427A (en) 2022-04-20 2022-04-20 Model based on multi-level attention mechanism, event argument extraction method and system

Country Status (1)

Country Link
CN (1) CN114880427A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049884A (en) * 2022-08-15 2022-09-13 菲特(天津)检测技术有限公司 Broad-sense few-sample target detection method and system based on fast RCNN
CN116049345A (en) * 2023-03-31 2023-05-02 江西财经大学 Document-level event joint extraction method and system based on bidirectional event complete graph
CN116049345B (en) * 2023-03-31 2023-10-10 江西财经大学 Document-level event joint extraction method and system based on bidirectional event complete graph

Similar Documents

Publication Publication Date Title
CN110162593B (en) Search result processing and similarity model training method and device
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
Zou et al. A lexicon-based supervised attention model for neural sentiment analysis
CN111324696B (en) Entity extraction method, entity extraction model training method, device and equipment
CN114880427A (en) Model based on multi-level attention mechanism, event argument extraction method and system
CN114547298A (en) Biomedical relation extraction method, device and medium based on combination of multi-head attention and graph convolution network and R-Drop mechanism
CN116304748B (en) Text similarity calculation method, system, equipment and medium
CN116204674B (en) Image description method based on visual concept word association structural modeling
CN113254581A (en) Financial text formula extraction method and device based on neural semantic analysis
CN115860006A (en) Aspect level emotion prediction method and device based on semantic syntax
CN110569355B (en) Viewpoint target extraction and target emotion classification combined method and system based on word blocks
Che et al. Tensor factorization with sparse and graph regularization for fake news detection on social networks
Sun et al. Graph force learning
Kokane et al. Word sense disambiguation: a supervised semantic similarity based complex network approach
CN112287119B (en) Knowledge graph generation method for extracting relevant information of online resources
CN114706989A (en) Intelligent recommendation method based on technical innovation assets as knowledge base
CN113128237A (en) Semantic representation model construction method for service resources
CN116719999A (en) Text similarity detection method and device, electronic equipment and storage medium
Barik et al. Analysis of customer reviews with an improved VADER lexicon classifier
CN113516094B (en) System and method for matching and evaluating expert for document
Zhang et al. An attentive memory network integrated with aspect dependency for document-level multi-aspect sentiment classification
CN110852066A (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
KR102330190B1 (en) Apparatus and method for embedding multi-vector document using semantic decomposition of complex documents
CN113449517A (en) Entity relationship extraction method based on BERT (belief propagation) gating multi-window attention network model
Hu et al. CGNN: Caption-assisted graph neural network for image-text retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination