CN114880427A - Model based on multi-level attention mechanism, event argument extraction method and system - Google Patents

Model based on multi-level attention mechanism, event argument extraction method and system Download PDF

Info

Publication number
CN114880427A
CN114880427A
Authority
CN
China
Prior art keywords
argument
event
role
attention
event type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210416103.0A
Other languages
Chinese (zh)
Inventor
吴昆�
丁国栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mairong Intelligent Technology Shanghai Co ltd
Original Assignee
Mairong Intelligent Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mairong Intelligent Technology Shanghai Co ltd filed Critical Mairong Intelligent Technology Shanghai Co ltd
Priority to CN202210416103.0A priority Critical patent/CN114880427A/en
Publication of CN114880427A publication Critical patent/CN114880427A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/313Selection or weighting of terms for indexing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Evolutionary Computation (AREA)
  • Computational Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a model based on a multi-level attention mechanism, together with an event argument extraction method and system. The method first preprocesses an input comprising an event type and a text describing the event, and encodes the text in the data set with a pre-trained language model to obtain the model's initial text representation. Next, the event type is fed into the multi-level attention model to obtain the event type-argument role hierarchical attention features and the argument role-argument role hierarchical attention features. The text representation is then fed into a biaffine layer and fused with these two attention features to obtain the final fused classification features. Finally, the fused classification features serve as the input of the final classification layer, which predicts the head and tail indexes of the event arguments of each role type in a 0/1 labeling format; iterative training yields the optimal model. The scheme effectively improves the extraction of event arguments from documents.

Description

Model based on multi-level attention mechanism, event argument extraction method and system
Technical Field
The invention belongs to the field of event argument extraction research of information extraction in natural language processing, and particularly relates to a model based on a multi-level attention mechanism, and an event argument extraction method and system.
Background
The internet has fully entered the big data era, and everyone is immersed in massive amounts of data. Data is used widely across industries, and text is one of the most important carriers of information. Faced with vast and disorderly text data, quickly obtaining the information people need is critical; the task of information extraction was born of this need.
Event extraction is a core task in the field of information extraction. It aims to extract events from natural-language text and express them in a structured, table-like form. A complete event consists of an event trigger word, which determines the event type, and the several arguments involved in the event; both the trigger word and the arguments are entities. Divided by extraction stage, the event extraction task decomposes into two subtasks: event trigger word extraction and event argument extraction. Event argument extraction extracts all arguments involved in an event, given the known event trigger words and event types.
Conventional event argument extraction methods usually make only shallow use of the known information: when introducing event type information, they merely concatenate the event type's category vector to the text representation, ignoring the hierarchical relationship between event types and event arguments defined in the event template. In addition, the same sentence usually contains multiple arguments, and there are domain and semantic associations among them, yet such dependency information is seldom included in the modeling process. How to use the known event type information and the dependency information among arguments reasonably and efficiently to assist event argument extraction therefore has significant research value.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a model based on a multi-level attention mechanism, together with an event argument extraction method and system, that address the low classification accuracy of event argument extraction algorithms in the prior art.
The invention adopts the following technical scheme for solving the technical problems:
a model construction method of a multi-level attention mechanism comprises the steps of firstly, constructing an event type-argument role hierarchical relationship and an argument role-argument role hierarchical relationship, and respectively representing by using two-dimensional matrixes; secondly, inputting the pre-obtained text characteristics and the event type-event argument hierarchical relation matrix into an event type-event argument attention module, and calculating event type-argument role hierarchical attention characteristics; inputting a pre-obtained text characteristic and an event argument-event argument hierarchical relation matrix into an event argument-event argument attention module, and calculating argument role-argument role hierarchical attention characteristics; and finally, taking the event type-argument role hierarchy attention characteristic and the argument role-argument role hierarchy attention characteristic as the output of the model.
The specific process of calculating the event type-argument role level attention characteristics is as follows:
According to the official event template, the membership relationship between event types and argument roles is analyzed and represented with a two-dimensional matrix. The event template specifies, when defining an event, the arguments a specific type of event contains; a two-dimensional relation matrix is constructed with the event types as the abscissa and the argument roles as the ordinate, and if an event of a given type contains a given argument, the corresponding entry of the matrix is set to 1, and otherwise to 0.
The specific process of computing the argument role-argument role hierarchy attention features is as follows:
The dependency relationships among argument roles are analyzed and represented with a two-dimensional matrix. The value attributes contained in argument roles are abstracted into upper-layer concepts, each expressing one dimension of an argument role's attributes; a two-dimensional relation matrix is constructed with the argument role types as the abscissa and the upper-layer concept types as the ordinate, and if an argument role has a given attribute, the corresponding entry of the matrix is set to 1, and otherwise to 0.
The text representation is obtained by encoding the text in the original data set by applying a pre-training language model.
An event argument extraction method based on a multi-level attention mechanism comprises the following steps:
step 1, preprocessing an input text containing an event type and describing the event, and coding the text in a data set by using a pre-training language model to obtain an initial text representation of the model;
step 2, inputting the event type in the step 1 into a model of a multi-level attention mechanism, and obtaining event type-argument role level attention characteristics and argument role-argument role level attention characteristics;
step 3, inputting the text representation obtained in step 1 into a biaffine layer, and fusing it with the event type-argument role hierarchical attention features and the argument role-argument role hierarchical attention features to obtain the final fused classification features;
step 4, taking the fused classification features as the input of the final classification layer, predicting the head and tail indexes of the event arguments of each role type in a 0/1 labeling format, and performing iterative training to obtain the optimal model.
The specific process of the step 1 is as follows:
The data is divided into a training set and a test set; each long document is split into a set of sentences of fixed maximum length 200 characters, with one sentence corresponding to one sample in the data set, and word embedding representation with the pre-trained language model BERT yields the initial text representation h.
For each sample, the known event type e is used to look up the event type-argument role two-dimensional relation matrix to obtain the association vector between the event type and its argument roles; a lookup in a randomly initialized event type-argument role parameter matrix then yields the semantic feature e_uc of the argument roles corresponding to the event type. Assuming the event may contain k arguments, the text representation obtained in step 1 is fused with e_uc, and a softmax function yields the attention score s_e of the event type over the argument roles.

For each sample, the argument role-argument role hierarchical attention two-dimensional matrix is used to look up, in a randomly initialized argument role-argument role parameter matrix, the semantic features r_uc of the association information among all arguments. The text representation obtained in step 1 is fused with r_uc, and a softmax function computes the upper-concept-based attention score s_r between argument roles and the sample's argument-argument hierarchy features e_r.

e_r is concatenated with the text representation h obtained in step 1, a probability matrix p_r of attention scores between argument roles is computed for each token in the text, and for each candidate argument a max function selects the other argument with maximum correlation, yielding the argument-argument feature matrix h_r used for final classification.
The specific process of the step 3 is as follows:
and (3) embedding the text representation obtained in the step (1) into an input double affine layer, mapping the text representation to a vector p for calculating probability of each argument role by using a feedforward neural network, and fusing the vector p with the event type-event argument level attention feature and the event argument-event argument level attention feature of the multi-level attention mechanism model to obtain a final fused classification feature.
The specific process of the step 4 is as follows:
and (3) taking the multi-level attention mechanism fusion feature representation as the input of a final classification layer, classifying the vector p obtained in the step (3) by using a plurality of two classifiers, predicting the head and tail indexes of the event arguments of each role type by adopting an 0/1 labeling format, and performing iterative training to obtain an optimal model.
The event argument extraction system based on the multi-level attention mechanism comprises a pre-trained language model, a span extraction module, a multi-level attention mechanism model, a feature fusion module and an argument extraction module; wherein:
the pre-trained language model is used for receiving an external input comprising an event type and a passage of text describing the event, and encoding it to obtain the event text representation;
the span extraction module is used for processing the received text representation to obtain the initial classification features;
the multi-level attention mechanism model is used for receiving the event type and obtaining the two hierarchical features;
the feature fusion module is used for fusing the two hierarchical features with the initial classification features to obtain the final fused classification features;
and the argument extraction module is used for performing binary classification on the fused classification features to obtain the head and tail positions of argument entities and extract the event arguments.
Compared with the prior art, the invention has the following beneficial effects:
1. The scheme first explores the influence of the event type introduced by the template on the event roles and builds event type-argument type contextual hierarchy features with an attention mechanism; second, it explores the upper-layer conceptual correlations among arguments and builds argument type-argument type contextual hierarchy features with a hierarchical attention mechanism; finally, fusing the multi-level attention features improves the extraction of event arguments from documents.
2. The relationship between the event type and the event arguments and the relationships among the event arguments are modeled separately through attention mechanisms, and the resulting features are fused with the text representation for the final event argument classification task, producing more accurate event argument extraction results.
3. The model is a self-contained component that can be reused in related fields, effectively improving the performance of event argument extraction.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of an event argument extraction method based on a multi-level attention mechanism according to the present invention.
FIG. 2 is an abstract view of an event-argument hierarchy used in the present invention.
FIG. 3 is an abstract view of an argument-argument hierarchical relationship used in the present invention.
FIG. 4 is a diagram illustrating the overall structure of the sentence-level event argument extraction task according to the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
In order to better explain the embodiment, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention.
It should be noted that the concept of attention weight mentioned in this document may also be called attention score or attention feature in the art; the three terms as used in this document carry the same meaning and follow common usage in the field, so no ambiguity of presentation or of term correspondence arises.
A model construction method of a multi-level attention mechanism comprises the following steps: first, constructing the event type-argument role hierarchical relationship and the argument role-argument role hierarchical relationship, each represented by a two-dimensional matrix; second, inputting the pre-obtained text features and the event type-event argument hierarchical relation matrix into an event type-event argument attention module and computing the event type-argument role hierarchical attention features, and inputting the pre-obtained text features and the event argument-event argument hierarchical relation matrix into an event argument-event argument attention module and computing the argument role-argument role hierarchical attention features; finally, taking the event type-argument role hierarchical attention features and the argument role-argument role hierarchical attention features as the output of the model.
The specific process of calculating the event type-argument role level attention characteristics is as follows:
According to the official event template, the membership relationship between event types and argument roles is analyzed and represented with a two-dimensional matrix. The event template specifies, when defining an event, the arguments a specific type of event contains; a two-dimensional relation matrix is constructed with the event types as the abscissa and the argument roles as the ordinate, and if an event of a given type contains a given argument, the corresponding entry of the matrix is set to 1, and otherwise to 0.
The specific process of computing the argument role-argument role hierarchy attention features is as follows:
The dependency relationships among argument roles are analyzed and represented with a two-dimensional matrix. The value attributes contained in argument roles are abstracted into upper-layer concepts, each expressing one dimension of an argument role's attributes; a two-dimensional relation matrix is constructed with the argument role types as the abscissa and the upper-layer concept types as the ordinate, and if an argument role has a given attribute, the corresponding entry of the matrix is set to 1, and otherwise to 0.
The text representation is obtained by encoding the text in the original data set by applying a pre-training language model.
The event argument extraction method based on the multi-level attention mechanism comprises the following steps:
step 1, preprocessing an input text containing an event type and describing the event, and coding the text in a data set by using a pre-training language model to obtain an initial text representation of the model;
step 2, inputting the event type in the step 1 into a model of a multi-level attention mechanism, and obtaining event type-argument role level attention characteristics and argument role-argument role level attention characteristics;
step 3, inputting the text representation obtained in step 1 into a biaffine layer, and fusing it with the event type-argument role hierarchical attention features and the argument role-argument role hierarchical attention features to obtain the final fused classification features;
step 4, taking the fused classification features as the input of the final classification layer, predicting the head and tail indexes of the event arguments of each role type in a 0/1 labeling format, and performing iterative training to obtain the optimal model.
In a specific embodiment, as shown in figure 1,
the event argument extraction method based on the multi-level attention mechanism comprises the following steps:
S1: preprocessing an input comprising an event type and a text describing the event, and encoding the text in the data set with a pre-trained language model to obtain the model's initial text representation;
S2: constructing the event type-argument role hierarchical relationship from the existing official event template and representing it with a two-dimensional matrix;
S3: abstracting the superordinate attributes of arguments from empirical knowledge, constructing the argument role-argument role hierarchical relationship through these attributes, and representing it with a two-dimensional matrix;
S4: inputting the event type and text representation from step S1 and the event type-argument role hierarchical relation matrix from step S2 into the event type-event argument attention module, and computing the event type-argument role hierarchical attention features;
S5: inputting the text representation from step S1 and the argument role-argument role hierarchical relation matrix from step S3 into the event argument-event argument attention module, and computing the argument role-argument role hierarchical attention features;
S6: inputting the text representation from step S1 into the biaffine layer and fusing it with the event type-argument role hierarchical attention features from step S4 and the argument role-argument role hierarchical attention features from step S5 to obtain the final classification features;
S7: taking the fused classification feature representation from step S6 as the input of the final classification layer, predicting the head and tail position indexes of the event arguments of each role type in a 0/1 labeling format, and performing iterative training to obtain the optimal model.
The specific process of step S1 is as follows:
The data set used to train the model is divided into a training set and a test set; each document is split into a set of sentences of at most 200 characters, arguments are extracted sentence by sentence, and one sentence corresponds to one sample in the data set. The sentences are encoded with the pre-trained language model BERT, which maps each token to a fixed dimension d_h, yielding the general semantic embedded text representation h:

h = [h_1, ..., h_tri, ..., h_N]

where h_i is the word embedding representation corresponding to each token, tri indicates the position of the event trigger word, and N indicates the length of the text sequence; the size of the text representation h is N × d_h.
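As a minimal sketch of this encoding step, the snippet below uses the Hugging Face transformers API; the bert-base-chinese checkpoint and the naive 200-character split are assumptions, since the patent names BERT but no specific variant or splitting routine.

```python
# Minimal sketch of step S1, assuming the Hugging Face `transformers` API
# and the `bert-base-chinese` checkpoint (the patent does not name one).
import torch
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

def encode_document(document: str, max_len: int = 200) -> torch.Tensor:
    """Split a document into <=200-character sentences and encode each with BERT."""
    sentences = [document[i:i + max_len] for i in range(0, len(document), max_len)]
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    # h: (num_sentences, N, d_h) token embeddings; d_h = 768 for BERT-base.
    return out.last_hidden_state
```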
The specific process of step S2 is as follows:
an event type-argument role two-dimensional relation matrix is constructed by using an official event template, the official event template presets subordinate argument roles for each type of event, and a schematic diagram is shown in FIG. 2, namely the true argument role set of the event is a subset of the preset argument role set of the template. Based on the theory, the hierarchical relationship between the event type and the argument role is expressed by using a two-dimensional matrix; the abscissa of the two-dimensional relationship matrix is 33 event types, the ordinate is 35 argument roles, if a certain argument role belongs to a certain event type, the corresponding position of the argument role is set to be 1 in the two-dimensional matrix, otherwise, the corresponding position is 0.
The specific process of step S3 is as follows:
step S300: and constructing an argument role-argument role two-dimensional relation matrix according to field leading edge research.
Step S301: argument roles often do not exist independently, and different argument roles have correlation in some dimension, and this final correlation helps to facilitate the co-extraction of arguments. Based on the theory, the upper concept is abstracted into 8 large classes (Person, Behavior, Entity, Good, Place, Org, Time, NA) according to expert design, and a schematic diagram is shown in fig. 3.
Step S302: designing an argument role-argument role two-dimensional relation matrix; the abscissa of the two-dimensional relationship matrix is 35 argument roles, the ordinate is 8 upper-layer concepts, if a certain argument role contains a certain upper-layer concept attribute, the corresponding position of the argument role in the two-dimensional relationship matrix is set to be 1, otherwise, the corresponding position is 0.
The specific process of step S4 is as follows:
step S400: and searching in a two-dimensional relationship matrix of the event type-argument role level attention features obtained by using the known event type e of each sample to obtain an association vector of the event type and the argument role, and calculating the event type-argument role level attention features.
Step S401: randomly initializing a size num e (number of event types) × d h Two-dimensional query vector E (of the same dimension as the text representation h in step S1) e
Step S402: querying vector E according to known event types e In a first dimension to obtain a size of N num r *d h Corresponding to the semantic feature vector e of the argument role uc Wherein num r Is the number of argument classes.
Step S403: expanding the text token h from step S1 in a second dimension to obtain an AND vector e uc Vectors of the same size
Figure BDA0003606058130000071
Then splicing the two and obtaining a characteristic vector h through a full connection layer e The size of the particles is N x num r
h e =tanh(W ae [h;e uc ])
Step S404: calculating the attention weight s of the event text to the argument role by using softmax function e
Figure BDA0003606058130000072
Where i indicates the current argument type, k r Representing the number of arguments a current event type has in the template.
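As an illustration only, the following PyTorch sketch condenses steps S401-S404; broadcasting the event query over the role slots and masking with the event-role matrix row are assumptions, since the figure images carrying the exact tensor layouts are not preserved in this text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EventRoleAttention(nn.Module):
    """Sketch of steps S401-S404: event type -> argument role attention."""
    def __init__(self, num_events: int, num_roles: int, d_h: int):
        super().__init__()
        self.E_e = nn.Parameter(torch.randn(num_events, d_h))  # S401 query matrix
        self.W_ae = nn.Linear(2 * d_h, 1)                      # S403 fusion layer

    def forward(self, h, event_id, role_mask):
        # h: (N, d_h) text representation; role_mask: (num_roles,) binary row
        # of the event-role matrix M_er for the known event type.
        N, d_h = h.shape
        num_roles = role_mask.numel()
        # S402: broadcast the event query to every role slot -> (N, num_roles, d_h).
        e_uc = self.E_e[event_id].expand(N, num_roles, d_h)
        # S403: expand h, concatenate, fully connected + tanh -> h_e: (N, num_roles).
        h_exp = h.unsqueeze(1).expand(N, num_roles, d_h)
        h_e = torch.tanh(self.W_ae(torch.cat([h_exp, e_uc], dim=-1))).squeeze(-1)
        # S404: softmax over the k_r roles the event actually has in the template.
        h_e = h_e.masked_fill(role_mask == 0, float("-inf"))
        return F.softmax(h_e, dim=-1)  # s_e: (N, num_roles)
```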
The specific process of step S5 is as follows:
step S500: and (4) computing an argument role-argument role hierarchical attention feature.
Step S501: for each sample, a two-dimensional matrix of argument role-argument role hierarchy attention features is used, and a size num is randomly initialized c (upper layer concept number) d h Two-dimensional query vector E (of the same dimension as the text representation h in step S1) r
Step S502: the relation between all arguments and upper concepts is arranged in a query vector E c In the query and get the size num in the extended dimension c *N*d h Semantic feature vector r of the association information between all arguments of uc
Step S503: expanding the text token h from step S1 in a second dimension to obtain an AND vector r uc Vectors of the same size
Figure BDA0003606058130000081
Then splicing the two and obtaining a characteristic vector through a full connection layer
Figure BDA0003606058130000082
Its size is num c *N。
Here, the
Figure BDA0003606058130000083
Step S504: computing attention weight of argument role association upper-level concept using softmax function
Figure BDA0003606058130000084
Size num c *N。
Figure BDA0003606058130000085
Where i represents the current position index and n represents the length of the current text sequence.
Step S505: for each argument, a weighted average attention score s of all its associated upper-level concepts is calculated r The size of the vector after dimension expansion is N num r (number of argument roles).
Figure BDA0003606058130000086
Where i denotes the current position index, k c Representing the number of the upper-layer concept attributes contained in the current argument role, and being marked as c 1 ,c 2 ,...,c k
Step S506: expanding the text token h from step S1 in a second dimension to obtain a vector
Figure BDA0003606058130000087
The vector s obtained in the last step is added r Expanding on the second dimension to obtain a vector
Figure BDA0003606058130000088
Calculating the Hadamard product of the two to obtain the size of N num r *d h Argument-argument hierarchy feature vector e of r
Figure BDA0003606058130000089
Step S507: vector obtained by carrying out dimension expansion on text representation h from step S1
Figure BDA00036060581300000810
And the vector e obtained in the above step r Vector obtained through dimension expansion
Figure BDA00036060581300000811
Splicing and obtaining the characteristic vector through the full connection layer
Figure BDA00036060581300000812
The vector provides a probability matrix of attention scores of all argument roles to each other for each feature word (token)
Figure BDA0003606058130000091
The size of which is N num r *num r
Figure BDA0003606058130000092
Step S508: aiming at each candidate argument, screening out another argument with the maximum correlation with each candidate argument by using a max function to obtain a role attention score matrix h with the highest relevance with each argument r Size N x num r
Figure BDA0003606058130000093
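Similarly, a condensed PyTorch sketch of steps S501-S508 is given below; the mean over a role's concepts in S505 and the softmax scoring layer in S507 are assumed reconstructions of formulas whose images are lost, so treat this as a sketch of the data flow rather than the exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoleRoleAttention(nn.Module):
    """Sketch of steps S501-S508: argument role <-> argument role attention."""
    def __init__(self, num_concepts: int, num_roles: int, d_h: int):
        super().__init__()
        self.E_c = nn.Parameter(torch.randn(num_concepts, d_h))  # S501 concept queries
        self.W_ac = nn.Linear(2 * d_h, 1)                        # S503 fusion layer
        self.W_ar = nn.Linear(2 * d_h, num_roles)                # S507 scoring layer

    def forward(self, h, M_rc):
        # h: (N, d_h); M_rc: (num_roles, num_concepts) role-concept matrix (step S3).
        N, d_h = h.shape
        num_concepts = self.E_c.shape[0]
        # S502: concept semantics broadcast over positions -> r_uc: (num_concepts, N, d_h).
        r_uc = self.E_c.unsqueeze(1).expand(num_concepts, N, d_h)
        # S503: expand h, concatenate, fully connected -> h_c: (num_concepts, N).
        h_exp = h.unsqueeze(0).expand(num_concepts, N, d_h)
        h_c = torch.tanh(self.W_ac(torch.cat([h_exp, r_uc], dim=-1))).squeeze(-1)
        # S504: softmax over positions -> s_c: (num_concepts, N).
        s_c = F.softmax(h_c, dim=-1)
        # S505: average each role's associated concepts -> s_r: (N, num_roles).
        k_c = M_rc.sum(-1).clamp(min=1)                 # concepts per role
        s_r = (M_rc @ s_c / k_c.unsqueeze(-1)).transpose(0, 1)
        # S506: Hadamard product with expanded h -> e_r: (N, num_roles, d_h).
        e_r = h.unsqueeze(1) * s_r.unsqueeze(-1)
        # S507: concatenate expanded h and e_r, score role-role attention p_r.
        num_roles = e_r.shape[1]
        h_exp2 = h.unsqueeze(1).expand(N, num_roles, d_h)
        p_r = F.softmax(self.W_ar(torch.cat([h_exp2, e_r], dim=-1)), dim=-1)
        # S508: keep, for each candidate role, its most correlated other role.
        h_r, _ = p_r.max(dim=-1)                        # (N, num_roles)
        return s_r, h_r
```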
The specific process of step S6 is as follows:
step S600: the multi-feature fusion is detailed with reference to fig. 4, and finally, the classification features are obtained.
Step S601: the text tokens h from step S1 are separately input into two dual affine layers, which are mapped to a vector p ' that computes probabilities for each argument role using a feed forward neural network, resulting in a probability matrix p ' corresponding to the head-to-tail index ' s/e ,p′ s Indicate Start (start) index, p' e Indicating an end index (end), both of size N num r *2。
p′=W 1 (tanh(W 2 ·h+b 2 ))+b 1
Step S602: the event type-argument role feature vector S obtained in the step S4 e And argument role-argument role feature vector h obtained in the step five r And fusing the vector p to obtain the final fusion classification feature probability.
p=h r *(λ·s e +p′)
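The fusion in steps S601-S602 could look like the following sketch; the two-layer feed-forward scorers follow the formula for p', while λ is a scalar hyperparameter whose value the patent does not fix, and the head/tail output shape N × num_r × 2 follows the description above.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Sketch of steps S601-S602: head/tail scoring plus multi-level fusion."""
    def __init__(self, d_h: int, num_roles: int, lam: float = 0.5):
        super().__init__()
        # p' = W_1(tanh(W_2 h + b_2)) + b_1, one scorer each for start and end.
        self.start_ffn = nn.Sequential(nn.Linear(d_h, d_h), nn.Tanh(),
                                       nn.Linear(d_h, num_roles * 2))
        self.end_ffn = nn.Sequential(nn.Linear(d_h, d_h), nn.Tanh(),
                                     nn.Linear(d_h, num_roles * 2))
        self.lam = lam  # λ, an assumed scalar fusion weight

    def forward(self, h, s_e, h_r):
        # h: (N, d_h); s_e, h_r: (N, num_roles) from the two attention modules.
        N = h.shape[0]
        p_start = self.start_ffn(h).view(N, -1, 2)  # (N, num_roles, 2)
        p_end = self.end_ffn(h).view(N, -1, 2)
        # S602: p = h_r * (λ · s_e + p'), broadcast over the 0/1 label dimension.
        fuse = lambda p: h_r.unsqueeze(-1) * (self.lam * s_e.unsqueeze(-1) + p)
        return fuse(p_start), fuse(p_end)
```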
The specific process of step S7 is as follows:
the multi-level attention mechanism fused feature representation is used as an input of a final classification layer, and 0/1 labels are distributed to the head-tail position index of each argument role by the obtained vector p through a plurality of binary classifiers.
y s/e =argmax(p s/e )
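For step S7, the sketch below shows one plausible decoding of the 0/1 start/end predictions into argument spans; the argmax follows the formula above, while pairing each start with the nearest following end of the same role is an assumed heuristic not specified by the patent.

```python
import torch

def decode_spans(p_start, p_end):
    """Decode 0/1 start/end labels (step S7) into (role, head, tail) spans.

    p_start, p_end: (N, num_roles, 2) fused probabilities; label 1 marks a boundary.
    """
    y_start = p_start.argmax(dim=-1)  # y_s = argmax(p_s): (N, num_roles)
    y_end = p_end.argmax(dim=-1)      # y_e = argmax(p_e)
    spans = []
    for role in range(y_start.shape[1]):
        starts = torch.nonzero(y_start[:, role]).flatten().tolist()
        ends = torch.nonzero(y_end[:, role]).flatten().tolist()
        for s in starts:
            # Assumed heuristic: pair each start with the nearest following end.
            tail = next((e for e in ends if e >= s), None)
            if tail is not None:
                spans.append((role, s, tail))
    return spans
```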
This completes event argument extraction based on the multi-level attention mechanism. Through this scheme, the prior information contained in the event is fully mined and used: the attention mechanism is applied during encoding, the guidance of the event type toward the argument roles and the correlation information between argument roles are fully fused, the semantic features are enhanced, and the accuracy and performance of event argument extraction are improved.
The event argument extraction system based on the multilevel attention mechanism comprises a pre-training language model, a span extraction module, a multilevel attention mechanism model, a feature fusion module and an argument extraction module; wherein the content of the first and second substances,
the pre-training language model is used for receiving an external input including an event type and a section of text describing the event to perform pre-training and acquiring an event text representation;
the span extraction module is used for processing the received text representation to obtain initial classification characteristics;
the multi-level attention mechanism model is used for receiving the event type and acquiring two level characteristics;
the feature fusion module is used for fusing the two hierarchical features and the initial classification feature to obtain a final fusion classification feature;
and the argument extraction module is used for carrying out secondary classification on the fusion classification features to obtain the head and tail positions of argument entities and extracting event argument parameters.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A model construction method of a multi-level attention mechanism, characterized by comprising the following steps: first, constructing the event type-argument role hierarchical relationship and the argument role-argument role hierarchical relationship, each represented by a two-dimensional matrix; second, inputting the pre-obtained text features and the event type-event argument hierarchical relation matrix into an event type-event argument attention module and computing the event type-argument role hierarchical attention features, and inputting the pre-obtained text features and the event argument-event argument hierarchical relation matrix into an event argument-event argument attention module and computing the argument role-argument role hierarchical attention features; and finally, taking the event type-argument role hierarchical attention features and the argument role-argument role hierarchical attention features as the output of the model.
2. The model building method of a multi-level attention mechanism of claim 1, wherein: the specific process of calculating the event type-argument role level attention characteristics is as follows:
According to the official event template, the membership relationship between event types and argument roles is analyzed and represented with a two-dimensional matrix. The event template specifies, when defining an event, the arguments a specific type of event contains; a two-dimensional relation matrix is constructed with the event types as the abscissa and the argument roles as the ordinate, and if an event of a given type contains a given argument, the corresponding entry of the matrix is set to 1, and otherwise to 0.
3. The model building method of a multi-level attention mechanism of claim 1, wherein: the specific process of computing the argument role-argument role hierarchy attention features is as follows:
The dependency relationships among argument roles are analyzed and represented with a two-dimensional matrix. The value attributes contained in argument roles are abstracted into upper-layer concepts, each expressing one dimension of an argument role's attributes; a two-dimensional relation matrix is constructed with the argument role types as the abscissa and the upper-layer concept types as the ordinate, and if an argument role has a given attribute, the corresponding entry of the matrix is set to 1, and otherwise to 0.
4. The model building method of a multi-level attention mechanism of claim 1, wherein: the text representation is obtained by encoding the text in the original data set by applying a pre-training language model.
5. An event argument extraction method based on a multi-level attention mechanism is characterized by comprising the following steps: the method comprises the following steps:
step 1, preprocessing an input text containing an event type and describing the event, and coding the text in a data set by using a pre-training language model to obtain an initial text representation of the model;
step 2, inputting the event type in the step 1 into the model of the multi-level attention mechanism of any one of claims 1 to 4, and obtaining event type-argument role level attention characteristics and argument role-argument role level attention characteristics;
step 3, inputting the text representation obtained in step 1 into a biaffine layer, and fusing it with the event type-argument role hierarchical attention features and the argument role-argument role hierarchical attention features to obtain the final fused classification features;
step 4, taking the fused classification features as the input of the final classification layer, predicting the head and tail indexes of the event arguments of each role type in a 0/1 labeling format, and performing iterative training to obtain the optimal model.
6. The method of claim 5 for event argument extraction based on multi-tier attention mechanism, wherein: the specific process of the step 1 is as follows:
and dividing a training set and a testing set, dividing a long document in the document into a sentence set with a fixed length of 200 characters, wherein one sentence corresponds to one sample in a data set, and performing word embedding representation by using a pre-training language model BERT to obtain an initial text representation h.
7. The event argument extraction method based on a multi-level attention mechanism of claim 6, wherein: for each sample, the known event type e is used to look up the event type-argument role two-dimensional relation matrix to obtain the association vector between the event type and its argument roles, and a lookup in a randomly initialized event type-argument role parameter matrix yields the semantic feature e_uc of the argument roles corresponding to the event type; assuming the event may contain k arguments, the text representation obtained in step 1 is fused with e_uc, and a softmax function yields the attention score s_e of the event type over the argument roles;

for each sample, the argument role-argument role hierarchical attention two-dimensional matrix is used to look up, in a randomly initialized argument role-argument role parameter matrix, the semantic features r_uc of the association information among all arguments; the text representation obtained in step 1 is fused with r_uc, and a softmax function computes the upper-concept-based attention score s_r between argument roles and the sample's argument-argument hierarchy features e_r;

e_r is concatenated with the text representation h obtained in step 1, a probability matrix p_r of attention scores between argument roles is computed for each token in the text, and for each candidate argument a max function selects the other argument with maximum correlation, yielding the argument-argument feature matrix h_r used for final classification.
8. The method of claim 5 for event argument extraction based on multi-tier attention mechanism, wherein: the specific process of the step 3 is as follows:
and (3) embedding the text representation obtained in the step (1) into an input double affine layer, mapping the text representation to a vector p for calculating probability of each argument role by using a feedforward neural network, and fusing the vector p with the event type-event argument level attention feature and the event argument-event argument level attention feature of the multi-level attention mechanism model to obtain a final fused classification feature.
9. The method of claim 5 for event argument extraction based on multi-tier attention mechanism, wherein: the specific process of the step 4 is as follows:
and (3) taking the multi-level attention mechanism fusion feature representation as the input of a final classification layer, classifying the vector p obtained in the step (3) by using a plurality of two classifiers, predicting the head and tail indexes of the event arguments of each role type by adopting an 0/1 labeling format, and performing iterative training to obtain an optimal model.
10. An event argument extraction system based on a multi-level attention mechanism, characterized in that: the system comprises a pre-trained language model, a span extraction module, a multi-level attention mechanism model, a feature fusion module and an argument extraction module; wherein:
the pre-trained language model is used for receiving an external input comprising an event type and a passage of text describing the event, and encoding it to obtain the event text representation;
the span extraction module is used for processing the received text representation to obtain the initial classification features;
the multi-level attention mechanism model is used for receiving the event type and obtaining the two hierarchical features;
the feature fusion module is used for fusing the two hierarchical features with the initial classification features to obtain the final fused classification features;
and the argument extraction module is used for performing binary classification on the fused classification features to obtain the head and tail positions of argument entities and extract the event arguments.
CN202210416103.0A 2022-04-20 2022-04-20 Model based on multi-level attention mechanism, event argument extraction method and system Pending CN114880427A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210416103.0A CN114880427A (en) 2022-04-20 2022-04-20 Model based on multi-level attention mechanism, event argument extraction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210416103.0A CN114880427A (en) 2022-04-20 2022-04-20 Model based on multi-level attention mechanism, event argument extraction method and system

Publications (1)

Publication Number Publication Date
CN114880427A true CN114880427A (en) 2022-08-09

Family

ID=82670994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210416103.0A Pending CN114880427A (en) 2022-04-20 2022-04-20 Model based on multi-level attention mechanism, event argument extraction method and system

Country Status (1)

Country Link
CN (1) CN114880427A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049884A (en) * 2022-08-15 2022-09-13 菲特(天津)检测技术有限公司 Broad-sense few-sample target detection method and system based on fast RCNN
CN116049345A (en) * 2023-03-31 2023-05-02 江西财经大学 Document-level event joint extraction method and system based on bidirectional event complete graph
CN116049345B (en) * 2023-03-31 2023-10-10 江西财经大学 Document-level event joint extraction method and system based on bidirectional event complete graph

Similar Documents

Publication Publication Date Title
CN110162593B (en) Search result processing and similarity model training method and device
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
Zou et al. A lexicon-based supervised attention model for neural sentiment analysis
CN111324696B (en) Entity extraction method, entity extraction model training method, device and equipment
CN114880427A (en) Model based on multi-level attention mechanism, event argument extraction method and system
CN114547298A (en) Biomedical relation extraction method, device and medium based on combination of multi-head attention and graph convolution network and R-Drop mechanism
CN116304748B (en) Text similarity calculation method, system, equipment and medium
CN116204674B (en) Image description method based on visual concept word association structural modeling
CN113254581A (en) Financial text formula extraction method and device based on neural semantic analysis
CN115860006A (en) Aspect level emotion prediction method and device based on semantic syntax
CN110569355B (en) Viewpoint target extraction and target emotion classification combined method and system based on word blocks
Che et al. Tensor factorization with sparse and graph regularization for fake news detection on social networks
Sun et al. Graph force learning
Kokane et al. Word sense disambiguation: a supervised semantic similarity based complex network approach
CN112287119B (en) Knowledge graph generation method for extracting relevant information of online resources
CN114706989A (en) Intelligent recommendation method based on technical innovation assets as knowledge base
CN113128237A (en) Semantic representation model construction method for service resources
CN116719999A (en) Text similarity detection method and device, electronic equipment and storage medium
Barik et al. Analysis of customer reviews with an improved VADER lexicon classifier
CN113516094B (en) System and method for matching and evaluating expert for document
Zhang et al. An attentive memory network integrated with aspect dependency for document-level multi-aspect sentiment classification
CN110852066A (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
KR102330190B1 (en) Apparatus and method for embedding multi-vector document using semantic decomposition of complex documents
CN113449517A (en) Entity relationship extraction method based on BERT (belief propagation) gating multi-window attention network model
Hu et al. CGNN: Caption-assisted graph neural network for image-text retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination