CN117149940A - Event argument extraction method and device - Google Patents


Info

Publication number
CN117149940A
CN117149940A
Authority
CN
China
Prior art keywords
argument
probability
representation
extraction
context semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310942975.5A
Other languages
Chinese (zh)
Inventor
靳小龙
郭嘉丰
程学旗
黄林萌
官赛萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202310942975.5A priority Critical patent/CN117149940A/en
Publication of CN117149940A publication Critical patent/CN117149940A/en
Pending legal-status Critical Current


Classifications

    • G06F16/313 Selection or weighting of terms for indexing
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/353 Clustering; Classification into predefined classes
    • G06F40/126 Character encoding
    • G06F40/30 Semantic analysis
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The application provides an event argument extraction method and device. The method comprises the following steps: encoding training data and event types separately to obtain a context semantic representation of the trigger word and representations of the event types, and letting the two interact to obtain a trigger word representation containing event type information, from which the event type is predicted; generating an argument extraction question for the corresponding event type, concatenating the text to be extracted with the question, and encoding the result to obtain a context semantic representation of the label, a context semantic representation of each word of the sentence to be extracted, and a context semantic representation of the argument role; concatenating the context semantic representation of the label, and that of each word of the sentence to be extracted, with the context semantic representation of the argument role to be extracted, and feeding the results into a discrimination network to obtain a discrimination probability and a labeling probability, respectively; and determining the final extraction result for the argument role by combining the discrimination probability and the labeling probability. The method improves event extraction performance.

Description

Event argument extraction method and device
Technical Field
The application relates to the technical field of event extraction, in particular to a sentence-level event argument extraction method and device.
Background
Event extraction is one of the important tasks in information extraction; its goal is to extract structured event information, including event trigger words, event types, event arguments, and argument roles, from given unstructured text. Depending on the scope of the text, event extraction can be divided into sentence-level and document-level extraction; this application focuses on sentence-level event extraction, hereinafter simply called event extraction where no confusion arises. Current event extraction work generally adopts deep-learning methods, which fall into two categories according to how the results are obtained: classification-based methods and generation-based methods. The former treats event extraction as a word-level multi-class classification task and obtains the final extraction result by assigning labels to all words in the input text; the latter adopts an end-to-end form, directly generates the trigger words and argument contents of the event, and obtains the final structured event result by aligning the generated content with the original text.
Existing event extraction methods perform well on both subtasks; however, they ignore the imbalance present in the training data, which causes a partial loss of model performance during training. This training-data imbalance has two aspects. On the one hand, in event detection the numbers of training samples for different event types differ greatly. Existing methods generally recognize all event types with a single model; although this simplifies the implementation and management of the algorithm, the single model's performance then varies across event types because of the possible differences in data volume between them. On the other hand, in event argument extraction, the presence or absence of an argument role in the original text causes a quantity imbalance, yet existing methods assume that every sentence contains all argument roles. In practice, a sentence often does not contain all argument roles, so far too many negative training samples are constructed, resulting in a very low recall rate on the test set.
Disclosure of Invention
To solve these problems, the application provides an event argument extraction method and device, realized as a two-stage event extraction method with enhanced event type representations, which addresses the lack of attention to training-data imbalance in event extraction.
In order to achieve the above object, an aspect of the present application provides an event argument extraction method, including:
respectively encoding training data and event types to obtain context semantic representation of trigger words and representation of event types;
the context semantic representation of the trigger word is interacted with the representation of the event type to obtain the trigger word representation containing the event type information, the trigger word representation is classified, and the event type is predicted;
designing an argument extraction template according to the specific trigger word and the predicted event type, generating an argument extraction question for the corresponding event type, concatenating the text to be extracted with the argument extraction question, and encoding the result to obtain a context semantic representation of the label, a context semantic representation of each word of the sentence to be extracted, and a context semantic representation of the argument role;
after concatenating the context semantic representation of the label with the context semantic representation of the argument role to be extracted, inputting the result into a discrimination network to obtain a discrimination probability;
after concatenating the context semantic representation of each word in the sentence to be extracted with the context semantic representation of the argument role to be extracted, inputting the result into the discrimination network to obtain a labeling probability;
and determining the final extraction result for the argument role by combining the discrimination probability and the labeling probability.
Optionally, encoding the training data to obtain the context semantic representation of the trigger word includes:
preprocessing the training data;
pre-encoding the preprocessed training data with a BERT pre-trained language model to obtain the distributed semantic representation of each word produced by the BERT encoder;
and aggregating the distributed semantic representations corresponding to the trigger word to obtain the context semantic representation of the trigger word.
Optionally, encoding the event types to obtain the representations of the event types includes:
constructing a graph neural network according to the hierarchical relationship of the event types, where the graph nodes are the label nodes of the event types and the sub-event types, and an edge connects two nodes when the sub-event type belongs to the event type;
and passing information among the graph nodes to obtain the representation of each event type.
Optionally, the interaction between the context semantic representation of the trigger word and the representation of the event type to obtain a trigger word representation containing the event type information includes:
performing attention calculation on the context semantic representation of the trigger word and the representation of the event type to obtain trigger word characteristics containing the event type information;
and obtaining the trigger word representation according to the weighted sum of the trigger word characteristics and the trigger word context semantic representation.
Optionally, designing an argument extraction template according to the trigger word of the event and the predicted event type, and generating an argument extraction question for the corresponding event type, includes:
concatenating the trigger word, the predicted event type and the argument extraction template to obtain the argument extraction question;
wherein the event type part provides the definition of the given event type, the trigger word description specifies the trigger word whose arguments are to be extracted, and the argument extraction template represents the structure of the given event type.
Optionally, concatenating the text to be extracted with the argument extraction question and then encoding the result to obtain the context semantic representation of the label, the context semantic representation of each word of the sentence to be extracted, and the context semantic representation of the argument role, includes:
concatenating the text to be extracted with the argument extraction question, preprocessing the result, and inputting the preprocessed text into a BERT pre-trained language model for pre-encoding, so as to obtain the context semantic representation of the label, the context semantic representations of the individual words of the sentence to be extracted, and the context semantic representations of the fragments corresponding to the argument roles;
and aggregating the context semantic representations of the fragments corresponding to each argument role to obtain the context semantic representation of the argument role to be extracted.
Optionally, after concatenating the context semantic representation of the label with the context semantic representation of the argument role to be extracted, inputting the result into the discrimination network to obtain the discrimination probability includes:
concatenating the context semantic representation of the selected label with the context semantic representation of the argument role to be extracted to obtain the discrimination feature of the argument role in the text;
inputting the discrimination feature into a two-layer discrimination network for feature modeling;
modeling, through a softmax function, the probability that the argument role is extractable from the text, and determining the discrimination probability;
where p_na denotes the discrimination probability that the argument role has no answer in the text, and p_ans denotes the discrimination probability that the argument role has an answer in the text.
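As an illustrative sketch of such a two-layer discrimination head (plain Python with toy dimensions; the tanh activation and the hand-picked weights are assumptions, not the patent's parameters):

```python
import math

def two_layer_head(feature, w1, b1, w2, b2):
    """Concatenated discrimination feature -> hidden layer (tanh) -> two logits
    -> softmax over {no answer, answer}."""
    hidden = [math.tanh(sum(w * f for w, f in zip(row, feature)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    p_na, p_ans = (e / s for e in exps)   # the two discrimination probabilities
    return p_na, p_ans

# toy 2-dimensional feature and hand-picked weights
p_na, p_ans = two_layer_head(
    feature=[1.0, -1.0],
    w1=[[0.5, 0.5], [1.0, 0.0]], b1=[0.0, 0.0],
    w2=[[1.0, 0.0], [0.0, 1.0]], b2=[0.0, 0.0],
)
```

The softmax guarantees p_na + p_ans = 1, so the head outputs a proper probability that the argument role can be extracted.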
Optionally, after concatenating the context semantic representation of each word in the sentence to be extracted with the context semantic representation of the argument role to be extracted, inputting the result into the discrimination network to obtain the labeling probability includes:
concatenating the context semantic representation of each word in the sentence to be extracted with the context semantic representation of the argument role to be extracted to obtain the extraction feature of each word for that argument role;
inputting the extraction features into three two-layer discrimination networks respectively, to model the word's start-tag feature, end-tag feature and BIO-tag feature as a reference of the argument role;
where Z_start, Z_end and Z_BIO denote the start-tag feature, end-tag feature and BIO-tag feature respectively, E denotes the extraction feature, and the weights and biases of the three discrimination networks are all learnable parameters;
modeling, through a softmax function and according to the start-tag, end-tag and BIO-tag features of the word, the probabilities that the word is the start, the end, or a BIO tag of the reference of the argument role, and determining the labeling probability;
where the labeling probability comprises the start-tag probability, the end-tag probability and the BIO-tag probability: p_start^0 and p_end^0 denote the probabilities that the word is not the start or the end of the argument reference, p_start^1 and p_end^1 denote the probabilities that the word is the start or the end of the argument reference, p_O denotes the probability that the token is not part of an argument reference, p_B denotes the probability that the token is the starting position of the argument reference, and p_I denotes the probability that the token is intermediate content of the argument reference.
Optionally, determining the final extraction result for the argument role by combining the discrimination probability and the labeling probability includes:
obtaining a discrimination score for the answer according to the discrimination probability that the argument role has an answer in the text;
obtaining a target extraction score for the answer according to the labeling probability, which includes:
obtaining a first extraction score according to the weighted sum of the probability that the word is the start of the argument reference and the probability that it is the end of the argument reference;
obtaining a second extraction score according to the weighted sum of the probability that the token is the starting position of the argument reference and the probability that the token is intermediate content;
obtaining the target extraction score according to the weighted sum of the first extraction score and the second extraction score;
obtaining a comprehensive score for taking the extracted fragment as the answer to the question according to the weighted sum of the discrimination score and the target extraction score;
and determining the final extraction result for the argument role according to the comprehensive score, which includes:
when the comprehensive score exceeds a score threshold, the question is considered answerable and the extracted fragment is added to the set of references for the argument role;
and when the comprehensive score does not exceed the score threshold, the question is considered unanswerable and the extracted fragment is discarded.
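The fusion described above can be sketched as follows (a minimal plain-Python illustration; the weights w1, w2, alpha and the threshold are illustrative assumptions, not values fixed by the application):

```python
def fuse_scores(p_ans, p_start, p_end, p_b, p_i,
                w1=0.5, w2=0.5, alpha=0.5, threshold=0.5):
    """Combine discrimination and labeling probabilities for one candidate span.

    p_ans          : probability that the argument role has an answer in the text
    p_start, p_end : start-tag / end-tag probabilities at the span boundaries
    p_b, p_i       : B-tag and I-tag probabilities inside the span
    """
    score_1 = w1 * p_start + (1 - w1) * p_end   # boundary-based extraction score
    score_2 = w2 * p_b + (1 - w2) * p_i         # BIO-based extraction score
    extraction = alpha * score_1 + (1 - alpha) * score_2
    combined = 0.5 * p_ans + 0.5 * extraction   # fuse with the discrimination score
    return combined, combined > threshold       # keep the span only above threshold

span_score, keep = fuse_scores(p_ans=0.9, p_start=0.8, p_end=0.7,
                               p_b=0.85, p_i=0.6)
```

A span is added to the argument role's result set only when `keep` is true, which is how the answerability judgment suppresses spurious extractions for roles absent from the sentence.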
The application also provides an event argument extraction device that adopts the above event argument extraction method and comprises at least:
the text and event type coding module is used for respectively coding training data and event types to obtain context semantic representation of trigger words and representation of event types;
the interaction and prediction module is used for interacting the context semantic representation of the trigger word with the representation of the event type to obtain the trigger word representation containing the event type information, classifying the trigger word representation and predicting the event type;
the question encoding module, configured to design an argument extraction template according to the specific trigger word and the predicted event type, generate an argument extraction question for the corresponding event type, concatenate the text to be extracted with the argument extraction question, and encode the result to obtain the context semantic representation of the label, the context semantic representation of each word of the sentence to be extracted, and the context semantic representation of the argument role;
the argument role discrimination module, configured to concatenate the context semantic representation of the label with the context semantic representation of the argument role to be extracted and input the result into the discrimination network to obtain the discrimination probability;
the argument role extraction module, configured to concatenate the context semantic representation of each word in the sentence to be extracted with the context semantic representation of the argument role to be extracted and input the result into the discrimination network to obtain the labeling probability;
and the argument role decoding module, configured to determine the final extraction result for the argument role by combining the discrimination probability and the labeling probability.
The advantages of the application are as follows:
The event argument extraction method provided by the application performs sentence-level event argument extraction and mainly comprises two stages: event detection and argument extraction. In the event detection stage, the training data and the event types are encoded separately to obtain the context semantic representation of the trigger word and the representations of the event types; the two representations then interact to obtain a trigger word representation containing event type information, which is classified to predict the event type. In the argument extraction stage, an argument extraction question is generated for the corresponding event type, and the text to be extracted is concatenated with the question and encoded to obtain the context semantic representation of the label, of each word of the sentence to be extracted, and of the argument role. The label representation and each word representation are then concatenated with the representation of the argument role to be extracted and fed into the discrimination network to obtain the discrimination probability and the labeling probability, respectively; finally, the two probabilities are combined to determine the final extraction result for the argument role. Introducing a discrimination network into the event argument extraction task effectively alleviates the sample-imbalance problem through the fusion of the two stages' results, at a small additional computational cost.
Meanwhile, combining the discrimination probability and the labeling probability for argument role extraction further improves argument extraction performance.
Drawings
Fig. 1 is a flow chart illustrating a method for extracting event arguments according to an embodiment of the present application;
FIG. 2 is a diagram showing the details of the event type structure encoding in the present application;
FIG. 3 is a block diagram showing the event detection process of steps S1-S2 in the present application;
FIG. 4 is a diagram showing the structure of the question constructed during corpus preprocessing in step S3;
FIG. 5 is a detailed block diagram of the event argument extraction module of steps S4-S6 of the present application;
FIG. 6 shows a schematic diagram of an event argument extraction apparatus;
wherein,
200-event argument extraction means;
201-text and event type encoding module;
202-an interaction and prediction module;
203-a question encoding module;
204-an argument role discrimination module;
205-an argument role extraction module;
206-an argument role decoding module.
Detailed Description
In order to make the above features and effects of the present application more clearly understood, the following specific examples are given with reference to the accompanying drawings.
The application aims to solve the problem that the prior art ignores the imbalance of training data, and provides a two-stage event extraction method with enhanced event type representations. The hierarchical relationship of the event types is introduced into the event detection task, and a discriminative method for enhancing the event type representations is designed and implemented: the hierarchy is used for knowledge transfer, in particular from event types with many samples to event types with few samples, thereby enhancing the representations of all event types and improving the event detection task. The argument extraction task is converted into a reading comprehension task, so that whether an argument role is present becomes whether a reading comprehension question has an answer: the method first judges whether an answer exists, then extracts and determines the start and end positions of the answer, and finally obtains the argument extraction result by fusing the answerability judgments of the two stages. The application is described in detail below.
As shown in fig. 1, fig. 1 shows a flow chart of an event argument extraction method provided by an embodiment of the present application, which specifically includes the following steps:
an event argument extraction method, comprising:
s1, respectively encoding training data and event types to obtain context semantic representation of trigger words and representation of event types.
In this embodiment, the context semantic representation of the trigger word is obtained by encoding the training data, and the representations of the event types are obtained by encoding the event types. Specifically, for encoding the training data, the training data is first preprocessed: the text of the training data may be tokenized with the WordPiece tokenizer in the transformers library, and the inputs of the same batch are padded to the same length according to the longest text in the batch. Then, the word sequence of the preprocessed training data is pre-encoded with a BERT language model pre-trained on a large-scale corpus, so as to obtain the distributed semantic representation of each word; compared with traditional static word vectors, this yields richer dynamic semantic representations:
X = BERT([CLS], t_1, t_2, …, t_n, [SEP]) = {x_CLS, x_1, x_2, …, x_n, x_SEP}, where [CLS] and [SEP] are special labels and x_CLS, x_1, x_2, …, x_n, x_SEP denote the semantic representations of the individual tokens.
Then, the distributed semantic representations corresponding to the trigger word in the text are aggregated, i.e. the semantic vectors of the trigger word are averaged with weights, to obtain the context semantic representation x_tri of the trigger word.
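A minimal sketch of this aggregation step (plain Python with toy 3-dimensional vectors; a uniform average is used here, although the embodiment allows a weighted average):

```python
def trigger_representation(token_vectors, trigger_span):
    """Average the encoder outputs of the trigger word's tokens.

    token_vectors : list of per-token vectors (the x_1 ... x_n above)
    trigger_span  : (start, end) token indices of the trigger word, end exclusive
    """
    start, end = trigger_span
    span = token_vectors[start:end]
    dim = len(span[0])
    # uniform average over the trigger's sub-word vectors
    return [sum(v[d] for v in span) / len(span) for d in range(dim)]

# two sub-word tokens of a trigger word, 3-dimensional toy vectors
x = [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [0.0, 0.0, 0.0]]
x_tri = trigger_representation(x, (1, 3))
```

The resulting x_tri stands in for the trigger word in all subsequent interaction and classification steps.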
In addition, for the encoding of the event types (the details are shown in fig. 2), a graph neural network is constructed according to the hierarchical relationship of the event types: the graph nodes are the label nodes of the event types and the sub-event types, and an edge connects two nodes when the sub-event type belongs to the event type; information is then passed among the graph nodes to obtain the representation of each event type.
Specifically, assume that a parent node v_i and a child node v_j are connected by a hierarchical path e_{i,j}; the feature f(e_{i,j}) of this path is represented by the prior probabilities P(U_j|U_i) and P(U_i|U_j), with
P(U_j|U_i) = P(U_j ∩ U_i) / P(U_i),
where U_k denotes the occurrences of v_k, k ∈ {i, j}; P(U_j|U_i) denotes the probability that v_j occurs when v_i occurs; P(U_j ∩ U_i) denotes the probability that v_j and v_i occur simultaneously; and N_k denotes the number of occurrences of U_k in the training set, used to estimate these probabilities. Since the corresponding parent event type must appear whenever one of its sub-event types appears, the probability value for j ∈ child(i) is set to 1.0 in the application. The weighted adjacency matrix is represented by a global edge feature matrix F = {a_{0,0}, a_{0,1}, …, a_{C-1,C-1}}: for node i, the self-loop edge a_{i,i} = 1; the weights of the edges between hierarchy neighbours use the edge features f(e_{i,j}); for non-neighbour nodes a_{i,j} = 0; and for a parent node with respect to its child nodes a_{i,j} = 1. Then, information is passed among the graph nodes and the node states are updated,
where the transformation weights are learnable parameters and h_k is the resulting representation of the event type.
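Building such a weighted adjacency matrix can be sketched as follows (plain Python; the example hierarchy, the occurrence counts and the count-ratio estimate of P(child | parent) are illustrative assumptions):

```python
def build_adjacency(num_nodes, parent_of, counts):
    """Weighted adjacency matrix for the event-type hierarchy graph.

    parent_of : dict mapping a child node index to its parent node index
    counts    : dict of per-node occurrence counts in the training set (the N_k)
    """
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0                        # self-loop edge a_ii = 1
    for child, parent in parent_of.items():
        a[child][parent] = 1.0               # the parent type always occurs with its child
        a[parent][child] = counts[child] / counts[parent]  # P(child | parent) from counts
    return a

# a root event type (node 0) with two sub-event types (nodes 1 and 2)
adj = build_adjacency(3, parent_of={1: 0, 2: 0}, counts={0: 100, 1: 60, 2: 40})
```

With this matrix, message passing moves information from data-rich types toward data-poor types along the hierarchy edges.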
In this embodiment, the hierarchical relationship of the event types is introduced into the event detection task to construct a graph neural network for event type encoding. This hierarchy exists inherently in the given dataset and requires no manual design, which avoids the noise introduced by extra knowledge from manual design or third-party tools.
S2, interacting the context semantic representation of the trigger word with the representation of the event type to obtain the trigger word representation containing the event type information, classifying the trigger word representation, and predicting the event type.
In a specific implementation, as shown in fig. 3, which gives a block diagram of the event detection process of steps S1-S2, attention is computed between the trigger word context semantic representation determined in step S1 and the event type representations, yielding the trigger word feature v_evt that contains event type information.
Then, the trigger word representation is obtained as a weighted sum of the trigger word feature v_evt and the trigger word context semantic representation x_tri.
After the trigger word representation is obtained, it is classified to predict the event type. Specifically, the trigger word representation is input into a two-layer linear network to model the feature Z = {z_0, z_1, …, z_{M-1}}:
H = W_1 · h_tri + b_1,
Z = W_2 · H + b_2,
where W_1, b_1, W_2 and b_2 are learnable parameters and h_tri denotes the trigger word representation.
Then, the probability distribution of the trigger word over the sub-event type labels is modeled through a softmax function: p = softmax(Z),
where p_k denotes the probability that the trigger word belongs to the k-th sub-event type, i.e. the model's probability estimate for the trigger word belonging to that sub-event type. Meanwhile, according to the true probability y_k that the trigger word belongs to a certain sub-event type, the following cross-entropy loss is computed: L = -Σ_k y_k · log p_k.
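The softmax classification and the cross-entropy loss can be illustrated in a few lines of plain Python (the logits Z are toy values, not outputs of a trained network):

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, gold_index):
    """L = -sum_k y_k * log p_k with a one-hot gold label y."""
    return -math.log(probs[gold_index])

logits = [2.0, 0.5, -1.0]        # Z, one logit per sub-event type
p = softmax(logits)               # probability distribution over sub-event types
loss = cross_entropy(p, gold_index=0)
```

Minimizing this loss pushes the probability mass toward the true sub-event type of the trigger word.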
s3, designing an argument extraction template according to specific trigger words and predicted event types, generating an argument extraction problem of a corresponding event type, splicing a text to be extracted and the argument extraction problem, and encoding to obtain a context semantic representation of a label, a context semantic representation of each word of a sentence to be extracted and a context semantic representation of an argument role.
As shown in fig. 5, fig. 5 shows a structural diagram of an argument extraction process of steps S3-S6.
Specifically, for the problem encoding process, in this embodiment the trigger word, the predicted event type, and the argument extraction template are spliced by [SEP] to obtain the argument extraction problem, which guides the learning of the model. The structure of the problem is shown in FIG. 4: the event type provides the definition of the given event type, the trigger word description of the event specifies the trigger word whose arguments are to be extracted, and the argument extraction template represents the structure of the given event type.
Then, the text to be extracted T = {t_1, t_2, …, t_n} and the argument extraction problem Q = {q_1, q_2, …, q_m} are spliced by [SEP] to obtain the input text, which is then preprocessed: specifically, the input text can be segmented by the WordPiece tokenizer in the transformers library, and the inputs of the same batch are padded to the same length according to the longest text in the batch. The preprocessed input text is input into the BERT pre-trained language model for encoding, obtaining the context semantic representation of the tag, the context semantic representation of each word of the sentence to be extracted, and the context semantic representation of the segment corresponding to each argument role.
obtaining the context semantic representation of the argument character needing to be extracted by aggregating the context semantic representations of the fragments corresponding to the argument characterWherein arg j Representing the j-th argument role.
S4, after the context semantic representation of the tag and the context semantic representation of the argument role to be extracted are spliced, the discrimination probability is obtained by inputting them into the discrimination network.
For the argument role discrimination process, in this embodiment the context semantic representation of the [CLS] tag is spliced with the context semantic representation x_arg of the argument role to be extracted, obtaining the discrimination feature of the argument role in the text.
Then, the obtained discrimination feature is input into a two-layer discrimination network for feature modeling, obtaining the feature vector Z:

H = W_1 · h + b_1
Z = W_2 · H + b_2

wherein W_1, b_1, W_2, b_2 are learnable parameters and h denotes the discrimination feature.
Then, based on the feature vector Z, the probability that the argument role is extractable in the text is modeled through a softmax function, determining the discrimination probability:

p̂ = softmax(Z)

wherein p̂_0 denotes the discrimination probability that the argument role has no answer in the text, and p̂_1 denotes the discrimination probability that the argument role has an answer in the text. This probability characterizes the model's estimate of whether the argument role has an answer in the text. Meanwhile, according to the true probability y of whether the argument role has an answer in the text, the following cross-entropy loss is calculated:

L_dis = -(y_0 · log p̂_0 + y_1 · log p̂_1)

wherein y_0 represents the true probability that the argument role has no answer in the text, and y_1 represents the true probability that the argument role has an answer in the text.
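The discrimination step can be sketched as follows: the [CLS] representation and the argument role representation are concatenated, passed through a two-layer network, and a softmax yields the no-answer/has-answer probabilities. Weight shapes and names are illustrative assumptions.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def discriminate(x_cls, x_arg, W1, b1, W2, b2):
    """Concatenate the [CLS] representation with the argument role
    representation, run the two-layer discrimination network, and
    return (p_no_answer, p_has_answer)."""
    h = x_cls + x_arg  # feature concatenation (list concat)
    H = [sum(w * v for w, v in zip(row, h)) + bi
         for row, bi in zip(W1, b1)]
    Z = [sum(w * v for w, v in zip(row, H)) + bi
         for row, bi in zip(W2, b2)]
    return softmax(Z)
```

The two outputs always sum to one, so the head directly gives the discrimination probabilities used later in decoding.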
S5, after the context semantic representation of each word in the sentence to be extracted and the context semantic representation of the argument role to be extracted are spliced, the labeling probability is obtained by inputting them into the discrimination network.
Specifically, for the argument role extraction process, in this embodiment the context semantic representation x_i of each word in the sentence to be extracted is spliced with the context semantic representation x_arg of the argument role to be extracted, obtaining the extraction feature of each word for that argument role.
The extraction features are respectively input into three two-layer discrimination networks, which respectively model each word's start tag feature, end tag feature, and BIO tag feature as a mention of the role's corresponding argument, wherein Z_start, Z_end, Z_BIO respectively denote the start tag feature, the end tag feature, and the BIO tag feature, and the weights and biases of the three networks are all learnable parameters.
Then, through a softmax function, the start tag probability, end tag probability, and BIO tag probability of each word as a mention of the role's corresponding argument are modeled from the start tag feature, end tag feature, and BIO tag feature respectively, determining the labeling probability. The labeling probability comprises the start tag probability, the end tag probability, and the BIO tag probability: the start tag probability and end tag probability indicate whether the word is the start or end of the argument mention (with complementary probabilities for it not being so); for the BIO tags, the probability of tag O represents the probability that the token is not part of an argument mention, the probability of tag B represents the probability that the token is the starting position of the argument mention, and the probability of tag I represents the probability that the token is intermediate content of the argument mention.
Meanwhile, according to the true probabilities Y_start, Y_end, Y_BIO of whether each word is part of the argument mention, the corresponding cross-entropy losses are calculated, wherein Y_start and Y_end respectively represent the true probabilities that the word is the start tag and the end tag of the mention, and Y_BIO represents the true probability that the word's tag is B/I/O.
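The three tagging heads can be sketched as one function that applies an independent two-layer network plus softmax per head. Head names, weight shapes, and the dictionary interface are illustrative assumptions; in the patent the BIO head has three output labels.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def tag_probs(token_feats, heads):
    """For each token's extraction feature, apply the three independent
    two-layer heads (start / end / BIO) and softmax their outputs.
    `heads` maps head name -> (W1, b1, W2, b2)."""
    def two_layer(x, W1, b1, W2, b2):
        H = [sum(w * v for w, v in zip(r, x)) + b for r, b in zip(W1, b1)]
        return [sum(w * v for w, v in zip(r, H)) + b for r, b in zip(W2, b2)]
    out = {}
    for name, (W1, b1, W2, b2) in heads.items():
        out[name] = [softmax(two_layer(x, W1, b1, W2, b2))
                     for x in token_feats]
    return out
```

Each head produces one probability distribution per token, so the start, end, and BIO labeling probabilities of step S5 fall out of a single pass over the token features.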
In this embodiment, a discrimination network is introduced into the event argument extraction tasks of steps S4 and S5; with only a small increase in computational cost, the problem of sample imbalance is effectively alleviated by fusing the results of the two stages. Meanwhile, combining the discrimination probability and the labeling probability for argument role extraction further improves extraction performance.
S6, combining the discrimination probability and the labeling probability to determine the extraction result corresponding to the final argument role.
In a specific implementation, further referring to FIG. 5, for the argument role decoding process, first, the discrimination score for having an answer, score_ans, is obtained according to the discrimination probability that the argument role has an answer in the text.
Then, obtaining a target extraction score with an answer according to the labeling probability, wherein the target extraction score specifically comprises:
The text is traversed with respect to the start and end tags to extract preliminary argument contents; for an overlong candidate argument, the end position of the argument mention is considered not to have been successfully determined, so that argument mention is instead obtained using the BIO tags in the text.
For an argument mention extracted using the start and end tags, the first extraction score is obtained by averaging the weighted sum of the start tag probability and the end tag probability of the mention, wherein start and end respectively represent the start and end positions, in the text, of the argument mention extracted using the start and end tags.
For an argument mention extracted using the BIO tags, the second extraction score is obtained by averaging the weighted sum of the probability that the token is the starting position of the mention and the probabilities that the tokens are intermediate content of the mention, wherein start and end respectively represent the start and end positions, in the text, of the argument mention extracted using the BIO tags.
Then, the target extraction score with an answer, score_has, is obtained according to the weighted sum of the first extraction score and the second extraction score.
Meanwhile, after determining the discrimination score score_ans and the target extraction score score_has, the combined score for extracting the segment as the answer to the question is obtained according to their weighted sum: score_all = α·score_ans + β·score_has, wherein α, β represent weighting coefficients, score_ans denotes the discrimination score, score_has denotes the target extraction score, and score_all denotes the combined score.
Finally, the extraction result corresponding to the final argument role is determined according to the combined score: when the combined score exceeds a score threshold, the question is considered answerable, and the extracted segment is added to the result of the role's corresponding argument mention; when the combined score does not exceed the score threshold, the question is considered unanswerable, and the extracted segment is discarded.
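The decoding combination of step S6 can be sketched as follows. The equal weighting of the two span probabilities, the default values of alpha, beta, and the threshold are illustrative assumptions; the patent leaves the exact weighting coefficients open.

```python
def combine_scores(p_has_answer, score_start, score_end,
                   alpha=0.5, beta=0.5, threshold=0.5):
    """score_ans comes from the discrimination probability of having an
    answer; score_has averages the start and end tag probabilities of
    the candidate span; the combined score is their weighted sum, and
    the span is kept only if it exceeds the threshold."""
    score_ans = p_has_answer
    score_has = (score_start + score_end) / 2
    score_all = alpha * score_ans + beta * score_has
    return score_all, score_all > threshold
```

A span with a confident discrimination head but weak span probabilities (or vice versa) can thus still be accepted or rejected by the fused score, which is how the two stages compensate for each other.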
In summary, the event argument extraction method provided by the application extracts event arguments at the sentence level and is mainly divided into two processes: event detection and argument extraction. In the event detection process, training data and event types are respectively encoded to obtain the context semantic representation of the trigger word and the representation of the event type; the context semantic representation of the trigger word is interacted with the representation of the event type to obtain a trigger word representation containing the event type information, which is then classified to predict the event type. In the argument extraction process, the argument extraction problem of the corresponding event type is generated, and the text to be extracted is spliced with the argument extraction problem and encoded, obtaining the context semantic representation of the tag, the context semantic representation of each word of the sentence to be extracted, and the context semantic representation of each argument role; then, after the context semantic representation of the tag and the context semantic representation of the argument role to be extracted are spliced, the discrimination probability is obtained by inputting them into the discrimination network; after the context semantic representation of each word in the sentence to be extracted and the context semantic representation of the argument role to be extracted are spliced, the labeling probability is obtained by inputting them into the discrimination network; finally, the extraction result corresponding to the final argument role is determined by combining the discrimination probability and the labeling probability.
The discrimination network is introduced into the event argument extraction task; with only a small increase in computational cost, the problem of sample imbalance is effectively alleviated by fusing the results of the two stages. Meanwhile, combining the discrimination probability and the labeling probability for argument role extraction further improves extraction performance. In testing on the ACE public test set, the event detection F1 score reaches 77.4% and the event argument extraction F1 score reaches 75.9%.
In addition, the above embodiment of the present application may be applied to a terminal device having a function of extracting event arguments at the sentence level, where the terminal device may include a personal terminal, an upper computer terminal, and the like; the embodiment of the present application is not limited thereto. The terminal can support Windows, Android, iOS, Windows Phone, and other operating systems.
The event argument extraction device 200 is applicable to a sentence-level-based event argument extraction method, which can be applied to a personal terminal and an upper computer terminal device, and can realize various processes realized by the event argument extraction method shown in fig. 1. Fig. 6 shows a schematic diagram of the event argument extraction apparatus 200.
An event argument extraction apparatus 200, at least comprising:
the text and event type coding module 201 is configured to code training data and event types respectively, so as to obtain context semantic representation of a trigger word and representation of an event type;
the interaction and prediction module 202 is configured to interact the context semantic representation of the trigger word with the representation of the event type, obtain a trigger word representation containing event type information, classify the trigger word representation, and predict the event type;
the problem coding module 203 is configured to design an argument extraction template according to a specific trigger word and a predicted event type, generate an argument extraction problem corresponding to the event type, splice a text to be extracted with the argument extraction problem, and code the result to obtain a context semantic representation of a tag, a context semantic representation of each word of a sentence to be extracted, and a context semantic representation of an argument role;
the argument character judging module 204 is configured to splice the context semantic representation of the tag with the context semantic representation of the argument character to be extracted, and then input the context semantic representation into a judging network to obtain a judging probability;
the argument character extraction module 205 is configured to splice a context semantic representation of each word in the sentence to be extracted with a context semantic representation of an argument character to be extracted, and then input a discrimination network to obtain a labeling probability;
and the argument character decoding module 206 is configured to determine an extraction result corresponding to the final argument character by combining the discrimination probability and the labeling probability.
Furthermore, it should be understood that in the event argument extraction apparatus 200 according to the embodiment of the present application, only the above-described division of each functional module is illustrated, and in practical applications, the above-described allocation of functions may be performed by different functional modules according to needs, i.e., the event argument extraction apparatus 200 may be divided into functional modules different from the above-illustrated modules to perform all or part of the above-described functions.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in an opposite order depending on the functions involved, e.g., the described methods may be performed in an order different from that described, and various steps may also be applied, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.

Claims (10)

1. A method for extracting event arguments, comprising:
respectively encoding training data and event types to obtain context semantic representation of trigger words and representation of event types;
the context semantic representation of the trigger word is interacted with the representation of the event type to obtain the trigger word representation containing the event type information, the trigger word representation is classified, and the event type is predicted;
according to specific trigger words and predicted event types, designing an argument extraction template, generating an argument extraction problem of a corresponding event type, splicing a text to be extracted with the argument extraction problem, and encoding to obtain a context semantic representation of a label, a context semantic representation of each word of a sentence to be extracted and a context semantic representation of an argument role;
after splicing the context semantic representation of the label and the context semantic representation of the argument character to be extracted, inputting a discrimination network to obtain discrimination probability;
after the context semantic representation of each word in the sentence to be extracted and the context semantic representation of the argument character to be extracted are spliced, inputting a discrimination network to obtain labeling probability;
and determining the extraction result corresponding to the final argument role by combining the discrimination probability and the labeling probability.
2. The method of claim 1, wherein encoding training data to obtain trigger context semantic representations comprises:
the training data is pre-processed and the data is pre-processed,
pre-coding the preprocessed training data by using a BERT pre-training language model to obtain a distributed semantic expression of each word after being coded by the BERT pre-training model;
and aggregating the distributed semantic expressions corresponding to the trigger words to obtain the context semantic representation of the trigger words.
3. The method of claim 1, wherein encoding the event type to obtain a representation of the event type comprises:
constructing a graph neural network according to the hierarchical relationship of the event types, wherein the graph nodes are label nodes of the event types and the sub-event types, and when the sub-event types belong to the event types, connecting edges appear between the corresponding nodes;
and information transmission is carried out among the graph nodes, so that the representation of the event type is obtained.
4. The method of claim 1, wherein interacting the context semantic representation of the trigger word with the representation of the event type to obtain the trigger word representation containing the event type information comprises:
performing attention calculation on the context semantic representation of the trigger word and the representation of the event type to obtain trigger word characteristics containing the event type information;
and obtaining the trigger word representation according to the weighted sum of the trigger word characteristics and the trigger word context semantic representation.
5. The method of claim 1, wherein designing an argument extraction template based on trigger words of events and predicted event types, generates an argument extraction question for the corresponding event type, comprising:
splicing the trigger words, the predicted event types and the argument extraction templates to obtain an argument extraction problem;
the event type provides definition of a given event type, the trigger word description of the event designates a corresponding trigger word of an argument to be extracted, and the argument extraction template represents a structure of the given event type.
6. The method of claim 5, wherein splicing the text to be extracted with the argument extraction problem and encoding to obtain a contextual semantic representation of a tag, a contextual semantic representation of each word of a sentence to be extracted, and a contextual semantic representation of an argument character comprises:
the text to be extracted and the argument extraction problem are spliced, preprocessed and input into a BERT pre-training language model for pre-coding, so that the context semantic representation of the label, the context semantic representation of each word of the sentence to be extracted and the context semantic representation of the corresponding segment of the argument role are obtained;
and aggregating the context semantic representations of the fragments corresponding to the argument roles to obtain the context semantic representations of the argument roles needing to be extracted.
7. The method of claim 6, wherein after the context semantic representation of the tag and the context semantic representation of the argument role to be extracted are spliced, obtaining the discrimination probability by inputting them into a discrimination network comprises:
the context semantic representation of the selection tag is spliced with the context semantic representation of the argument character to be extracted, so that the distinguishing characteristics of the argument character in the text are obtained;
inputting the distinguishing characteristics into a two-layer distinguishing network to perform characteristic modeling;
modeling the extractable probability of the argument role in the text through a softmax function, and determining the discrimination probability,
wherein p̂_0 denotes the discrimination probability that the argument role has no answer in the text, and p̂_1 denotes the discrimination probability that the argument role has an answer in the text.
8. The method of claim 7, wherein concatenating the contextual semantic representations of the words in the sentence to be extracted with the contextual semantic representations of the argument characters to be extracted, inputting the contextual semantic representations into a discrimination network to obtain labeling probabilities, comprises:
splicing the context semantic representation of each word in the sentence to be extracted with the context semantic representation of the argument character to be extracted to obtain the extraction feature of each word for the argument character;
respectively inputting the extraction features into three two-layer discrimination networks, and respectively modeling each word's start tag feature, end tag feature, and BIO tag feature as a mention of the role's corresponding argument,
wherein Z_start, Z_end, Z_BIO respectively denote the start tag feature, the end tag feature, and the BIO tag feature, and the weights and biases of the three networks are all learnable parameters;
modeling, through a softmax function, each word's start tag probability, end tag probability, and BIO tag probability as a mention of the role's corresponding argument according to the start tag feature, end tag feature, and BIO tag feature, and determining the labeling probability,
wherein the labeling probability comprises the start tag probability, the end tag probability, and the BIO tag probability: the start tag probability and end tag probability indicate whether the word is the start or end of the argument mention, the probability of tag O represents the probability that the token is not part of the argument mention, the probability of tag B represents the probability that the token is the starting position of the argument mention, and the probability of tag I represents the probability that the token is intermediate content of the argument mention.
9. The method of claim 8, wherein the determining the extraction result corresponding to the final argument role by combining the discrimination probability and the labeling probability comprises:
obtaining a discrimination score of the answer according to the discrimination probability of the answer in the text of the argument character;
obtaining a target extraction score with an answer according to the labeling probability, wherein the target extraction score comprises:
obtaining a first extraction score according to the weighted sum of the start tag probability as the argument reference and the end tag probability as the argument reference;
obtaining a second extraction score according to the weighted sum of the probability of the token as the starting position of the argument and the probability of the token as the intermediate content;
obtaining a target extraction score with an answer according to the weighted sum of the first extraction score and the second extraction score;
obtaining a comprehensive score of the extraction segment as a question answer according to the weighted sum of the discrimination score and the target extraction score;
and determining an extraction result corresponding to the final argument role according to the comprehensive score, wherein the extraction result comprises:
when the comprehensive score exceeds a score threshold, representing that the question can be answered, and adding the extracted segment into the result of the role's corresponding argument mention;
and when the integrated score does not exceed a score threshold, indicating that the question is not answerable, and discarding the extracted segment.
10. Event argument extraction apparatus, characterized in that it adopts the event argument extraction method of any one of claims 1-9, comprising at least:
the text and event type coding module is used for respectively coding training data and event types to obtain context semantic representation of trigger words and representation of event types;
the interaction and prediction module is used for interacting the context semantic representation of the trigger word with the representation of the event type to obtain the trigger word representation containing the event type information, classifying the trigger word representation and predicting the event type;
the problem coding module is used for designing an argument extraction template according to specific trigger words and predicted event types, generating an argument extraction problem corresponding to the event types, splicing a text to be extracted with the argument extraction problem, and coding to obtain a context semantic representation of a label, a context semantic representation of each word of a sentence to be extracted and a context semantic representation of an argument role;
the argument character judging module is used for inputting the judging network to obtain judging probability after splicing the context semantic representation of the label and the context semantic representation of the argument character to be extracted;
the argument character extraction module is used for inputting the judgment network to obtain labeling probability after splicing the context semantic representation of each word in the sentence to be extracted and the context semantic representation of the argument character to be extracted;
and the argument role decoding module is used for determining an extraction result corresponding to the final argument role by combining the discrimination probability and the labeling probability.
CN202310942975.5A 2023-07-28 2023-07-28 Event argument extraction method and device Pending CN117149940A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310942975.5A CN117149940A (en) 2023-07-28 2023-07-28 Event argument extraction method and device


Publications (1)

Publication Number Publication Date
CN117149940A true CN117149940A (en) 2023-12-01

Family

ID=88883210


Country Status (1)

Country Link
CN (1) CN117149940A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117910473A (en) * 2024-03-19 2024-04-19 北京邮电大学 Event argument extraction method integrating entity type information and related equipment


Similar Documents

Publication Publication Date Title
CN111708882B (en) Transformer-based Chinese text information missing completion method
CN112487149B (en) Text auditing method, model, equipment and storage medium
CN113051929A (en) Entity relationship extraction method based on fine-grained semantic information enhancement
CN111339260A (en) BERT and QA thought-based fine-grained emotion analysis method
CN112070138A (en) Multi-label mixed classification model construction method, news classification method and system
CN115292463B (en) Information extraction-based method for joint multi-intention detection and overlapping slot filling
CN113468333B (en) Event detection method and system fusing hierarchical category information
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN117149940A (en) Event argument extraction method and device
CN114492460B (en) Event causal relationship extraction method based on derivative prompt learning
CN111858878A (en) Method, system and storage medium for automatically extracting answer from natural language text
CN112528658A (en) Hierarchical classification method and device, electronic equipment and storage medium
CN115408525A (en) Petition text classification method, device, equipment and medium based on multi-level label
CN113239694B (en) Argument role identification method based on argument phrase
CN115098673A (en) Business document information extraction method based on variant attention and hierarchical structure
CN114881043A (en) Deep learning model-based legal document semantic similarity evaluation method and system
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN116304064A (en) Text classification method based on extraction
CN113342982B (en) Enterprise industry classification method integrating Roberta and external knowledge base
CN115563278A (en) Question classification processing method and device for sentence text
CN114297408A (en) Relation triple extraction method based on cascade binary labeling framework
CN112883183B (en) Method for constructing multi-classification model, intelligent customer service method, and related device and system
CN115563253A (en) Multi-task event extraction method and device based on question answering
CN113657118A (en) Semantic analysis method, device and system based on call text
CN113535946A (en) Text identification method, device and equipment based on deep learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination