CN113222119A - Argument extraction method for multi-view encoder by using topological dependency relationship - Google Patents

Argument extraction method for multi-view encoder by using topological dependency relationship

Info

Publication number
CN113222119A
Authority
CN
China
Prior art keywords
candidate
argument
node
arguments
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110594279.0A
Other languages
Chinese (zh)
Other versions
CN113222119B (en)
Inventor
罗森林
祁佳俊
吴舟婷
周妍汝
董勃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110594279.0A priority Critical patent/CN113222119B/en
Publication of CN113222119A publication Critical patent/CN113222119A/en
Application granted granted Critical
Publication of CN113222119B publication Critical patent/CN113222119B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/288Entity relationship models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an argument extraction method for a multi-view graph encoder using topological dependency relationships, and belongs to the fields of natural language processing and machine learning. The method mainly aims to solve the problem that, when arguments are extracted with single-type feature modeling, the feature representations of arguments in multiple roles are easily interfered with by candidate arguments carrying no semantic association information, so that the feature representation of multi-role arguments is inaccurate. First, text embedding is performed on the data set with a BERT pre-training model to obtain a text embedding vector, a trigger word category embedding vector, and an entity category embedding vector; then the topological relations among the candidate arguments, the entity categories, and the trigger words are modeled to construct a multi-view graph information network; finally, the multi-view graphs are separately encoded with a graph convolution network, the results are aggregated into a candidate-argument multi-view graph embedding vector, and event arguments are classified and extracted from the candidate arguments through a Softmax fully-connected layer. Experiments on the ACE2005 English corpus show that the method achieves a good argument extraction effect.

Description

Argument extraction method for multi-view encoder by using topological dependency relationship
Technical Field
The invention relates to an argument extraction method for a multi-view graph encoder using topological dependency relationships, and belongs to the fields of natural language processing and machine learning.
Background
Argument extraction aims at extracting the corresponding argument entities in event sentences and labeling them with argument roles such as time, place, and person, thereby turning unstructured text that contains event information into structured output.
In the argument extraction task, one trigger word category corresponds to arguments of several specific roles, a given argument is expressed by an entity of a specific category, and candidate arguments are connected to one another through the syntactic structure, so a certain topological relation exists among the candidate arguments, the entity categories, and the trigger words.
The multiple types of features in a candidate sentence, such as the trigger word category, the entity category, and the candidate arguments, can effectively guide accurate argument extraction. According to how the features are constructed, existing argument extraction methods fall mainly into vector-splicing-based methods, sequence-modeling-based methods, and topological-structure-based methods.
1. Vector splicing-based method
The argument extraction method based on vector splicing constructs multiple types of features by splicing different types of feature vectors. However, such methods usually build features by directly introducing or directly computing single-type feature vectors; this construction does not consider the guiding effect of the syntactic structure on argument distribution, so the argument entities of an event are difficult to locate directly and accurately, which leads to inaccurate role labeling of candidate entities.
2. Method based on sequence modeling
The argument extraction method based on sequence modeling fuses multiple types of features by constructing a sequence model. It builds features in the same way as the vector-splicing method, by directly computing single-type feature vectors, so the argument entities of an event are still difficult to locate directly and accurately, which again leads to inaccurate entity role labeling.
3. Method based on topological structure
The argument extraction method based on topological structure construction mainly studies how to build topological structures among different types of features. It recognizes that topological structure information effectively guides argument extraction, but it only constructs the syntactic relations among candidate arguments and does not construct the topological relations between candidate arguments and trigger word categories or between candidate arguments and entity categories. This limits the accuracy of the candidate argument feature representation, so the guidance information is underused when the candidate arguments of various roles are feature-modeled, and argument identification and classification accuracy is low.
In summary, existing methods usually consider only the dependency relations among candidate arguments; trigger word category and entity category information is introduced through vector splicing or sequence modeling, and the topological relations between candidate arguments and trigger word categories and between candidate arguments and entity categories are not constructed. As a result, when an argument extraction method with single-type feature modeling extracts arguments from candidate sentences containing the same co-occurring words, the feature representations of arguments in multiple roles are easily interfered with by candidate arguments that carry no semantic association information, the feature representation of multi-role arguments becomes inaccurate, and the argument extraction effect suffers.
Disclosure of Invention
The invention aims to provide an argument extraction method for a multi-view graph encoder using topological dependency relationships, addressing the problem that, when arguments are extracted with single-type feature modeling, the feature representations of arguments in multiple roles are easily interfered with by candidate arguments without semantic association information, which makes the feature representation of multi-role arguments inaccurate.
The design principle of the invention is as follows: first, text embedding is performed on the data set with a BERT pre-training model; second, the correlations among the three types of features, namely candidate arguments, trigger word categories, and entity categories, are modeled by constructing a multi-view graph; then the three graphs, constructed from different viewpoints, are encoded with a Graph Convolution Network (GCN) to obtain a multi-view graph embedding vector for each candidate argument; finally, the event arguments are classified and extracted through a Softmax fully-connected layer.
The technical scheme of the invention is realized by the following steps:
step 1, text embedding is realized on a data set ACE2005 by using a BERT pre-training model.
Step 1.1, regarding the sentence as a sequence formed by words, dividing the words into a group of limited public subword units to obtain word block embedded vectors, and setting [ CLS ] and [ SEP ] labels at the beginning and the end of the sentence respectively.
And step 1.2, coding the position information of the words into a feature vector, wherein the position coding mode is the same as that used in a transform model, and a position embedding vector is obtained by calculating a sine and cosine function.
And 1.3, setting different characteristic values for two different sentences, and distinguishing the two different sentences by setting the different characteristic values to obtain the segmentation embedded vector.
And step 1.4, inputting the word block embedding vector, the position embedding vector and the segmentation embedding vector obtained in the steps 1.1, 1.2 and 1.3 into a BERT model to obtain a text embedding vector.
Step 1.5, embedding 34 trigger word categories defined by the ACE2005 data set by searching a randomly initialized trigger word category vector table to obtain trigger word category embedded vectors.
Step 1.6, the entity scope is labeled by using a BIO labeling strategy, and at the same time, 45 entity categories defined in the ACE2005 are embedded by searching a randomly initialized entity category vector table to obtain entity category embedded vectors.
Step 2, the topological relations among the candidate arguments, the entity categories and the trigger words are modeled to construct the multi-view graph.
Step 2.1, a candidate argument node-candidate argument node view information network graph is constructed according to the dependency syntax relations among the candidate arguments, and the corresponding adjacency matrix is obtained.
Step 2.2, a candidate argument node-trigger word category node view information network graph is constructed according to the topological relation between the candidate arguments and the trigger word categories, and the corresponding adjacency matrix is obtained.
Step 2.3, a candidate argument node-entity category node view information network graph is constructed according to the topological relation between the candidate arguments and the entity categories, and the corresponding adjacency matrix is obtained.
Step 3, the multi-view graphs are encoded separately with a Graph Convolution Network (GCN) and aggregated to obtain the multi-view graph embedding vectors of the candidate arguments.
Step 3.1, the candidate argument node-candidate argument node graph, the candidate argument node-trigger word category node graph and the candidate argument node-entity category node graph are each encoded with the GCN to obtain the three corresponding network embedding vectors.
Step 3.2, the network embedding vectors of the three views are aggregated with a bidirectional gated recurrent unit (BiGRU) to obtain the multi-view graph embedding vector of each candidate argument.
Step 4, the event arguments are classified and extracted from the candidate arguments through a Softmax fully-connected layer.
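As an illustration of how the four steps fit together, the following is a minimal PyTorch sketch assuming the HuggingFace transformers library. All class names, dimensions and inputs are choices of this sketch rather than details fixed by the invention; in particular, for brevity each view is treated as a square graph over the token nodes, whereas the trigger word category and entity category views of the invention are bipartite graphs whose category nodes (steps 1.5 and 1.6) are omitted here.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class MultiViewArgumentExtractor(nn.Module):
    """Sketch of steps 1-4: BERT text embedding -> one GCN per view -> BiGRU aggregation -> role logits."""

    def __init__(self, n_roles: int = 36, graph_dim: int = 200):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")   # step 1: text embedding
        hidden = self.bert.config.hidden_size
        self.gcn_ww = nn.Linear(hidden, graph_dim)                   # step 3.1: one weight matrix per view
        self.gcn_wt = nn.Linear(hidden, graph_dim)
        self.gcn_we = nn.Linear(hidden, graph_dim)
        self.bigru = nn.GRU(graph_dim, graph_dim, bidirectional=True, batch_first=True)  # step 3.2
        self.classifier = nn.Linear(2 * graph_dim, n_roles)          # step 4: fully-connected layer

    @staticmethod
    def propagate(adj: torch.Tensor, h: torch.Tensor, linear: nn.Linear) -> torch.Tensor:
        """One GCN layer: sigma(M^-1/2 A' M^-1/2 H W) with A' = A + I."""
        a_prime = adj + torch.eye(adj.size(0))
        m_inv_sqrt = torch.diag(a_prime.sum(dim=1).pow(-0.5))
        return torch.relu(m_inv_sqrt @ a_prime @ m_inv_sqrt @ linear(h))

    def forward(self, input_ids, attention_mask, adj_ww, adj_wt, adj_we):
        # step 1: contextual text embedding (single sentence, batch size 1 for brevity)
        x = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state[0]
        # step 3.1: encode each view of the multi-view graph
        h_ww = self.propagate(adj_ww, x, self.gcn_ww)
        h_wt = self.propagate(adj_wt, x, self.gcn_wt)
        h_we = self.propagate(adj_we, x, self.gcn_we)
        # step 3.2: aggregate the three view embeddings with a bidirectional GRU
        views = torch.stack([h_ww, h_wt, h_we], dim=1)               # (n_tokens, 3, graph_dim)
        _, h_n = self.bigru(views)
        h_mpge = torch.cat([h_n[0], h_n[1]], dim=-1)                 # multi-view graph embedding
        # step 4: role logits per candidate argument (Softmax applied at prediction/loss time)
        return self.classifier(h_mpge)
```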
Advantageous effects
Compared with argument extraction methods based on topological structure construction, the proposed multi-view graph construction uses not only the dependency syntax relations among candidate arguments but also the topological dependency relations between candidate arguments and trigger word categories and between candidate arguments and entity categories. This resolves the underuse of guidance information when the candidate arguments of various roles are feature-modeled and improves the accuracy of argument identification and classification.
Compared with argument extraction methods based on vector splicing or sequence modeling, the proposed method takes into account the guiding effect of the syntactic structure on argument distribution, thereby avoiding inaccurate role labeling of candidate entities.
Drawings
FIG. 1 is a schematic diagram of the argument extraction method for a multi-view graph encoder using topological dependency relationships according to the present invention.
Detailed Description
In order to better illustrate the objects and advantages of the present invention, embodiments of the method of the present invention are described in further detail below with reference to examples.
The experimental data is the ACE2005 English open corpus, the mainstream data set for the event extraction task. It contains 599 documents in total and covers 6 different domains: broadcast conversation (bc), broadcast news (bn), conversational telephone speech (cts), newswire (nw), usenet (un) and weblogs (wl). The original data set is divided into a test set, a validation set and a training set; the data division is shown in Table 1.
TABLE 1 data set partitioning
In the experiments, the entity category embedding dimension and the trigger word category embedding dimension are both set to 50, and the multi-view graph embedding dimension is set to 200. The Graph Convolution Network (GCN) has 2 layers with a dropout of 0.2; the dropout of BERT is 0.5. The learning rate of the model is 2e-5.
The experiments use Precision, Recall and the F1 value to evaluate the argument extraction effect. Precision is the proportion of entities with the correct role label among all entities predicted to be arguments, given that the trigger words are correctly classified; Recall is the proportion of entities with the correct role label among all argument entities, given that the trigger words are correctly classified; F1 is the harmonic mean of Precision and Recall. The calculation formulas are given in Equation 1, Equation 2 and Equation 3.
Precision = TP / (TP + FP)    (Equation 1)
Recall = TP / (TP + FN)    (Equation 2)
F1 = 2 × Precision × Recall / (Precision + Recall)    (Equation 3)
Here, TP is the number of samples whose true category is a certain argument role and which are also labeled with that argument role in the prediction; FP is the number of samples whose true category is non-argument but which are labeled with some argument role in the prediction; FN is the number of samples whose true category is a certain argument role but which are labeled as non-argument in the prediction; TN is the number of samples whose true category is non-argument and which are also predicted as non-argument.
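As a concrete illustration of Equations 1 to 3, the sketch below computes Precision, Recall and F1 from gold and predicted role labels under the TP/FP/FN definitions above; representing roles as strings with "None" for non-argument is an assumption of this example.

```python
def argument_prf(gold_roles, pred_roles, non_argument="None"):
    """Precision/Recall/F1 over argument role labels (Equations 1-3)."""
    tp = sum(1 for g, p in zip(gold_roles, pred_roles)
             if g != non_argument and p == g)            # true role, predicted with the same role
    fp = sum(1 for g, p in zip(gold_roles, pred_roles)
             if g == non_argument and p != non_argument)  # true non-argument, predicted as a role
    fn = sum(1 for g, p in zip(gold_roles, pred_roles)
             if g != non_argument and p == non_argument)  # true role, predicted as non-argument
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# toy usage with illustrative role labels
print(argument_prf(["Victim", "None", "Place"], ["Victim", "Place", "None"]))  # (0.5, 0.5, 0.5)
```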
The experiments are run on a computer and a server. The computer is configured with an Intel i7-6700 CPU at 2.40 GHz, 4 GB of memory and a 64-bit Windows 7 operating system; the server is configured with an E7-4820 v4 CPU, 256 GB of RAM and a 64-bit Linux Ubuntu operating system.
The specific process of the experiment is as follows:
step 1, text embedding is realized on a data set ACE2005 by using a BERT pre-training model.
Step 1.1, the sentence is regarded as a sequence composed of N_w words. Each word is divided into a limited group of common sub-word units to obtain the word-block embedding vectors, and [CLS] and [SEP] labels are set at the beginning and the end of the sentence, respectively.
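A minimal sketch of step 1.1 using the HuggingFace BertTokenizer, which performs the WordPiece split into common sub-word units and adds the [CLS] and [SEP] labels; the model name and the example sentence are assumptions of this sketch.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

sentence = "The attacker fired at the convoy near Baghdad"
encoding = tokenizer(sentence, return_tensors="pt")

# WordPiece sub-word units with [CLS] at the start and [SEP] at the end
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"][0]))
# e.g. ['[CLS]', 'the', 'attacker', 'fired', 'at', 'the', 'convoy', 'near', 'baghdad', '[SEP]']
```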
Step 1.2, the position information of each word is encoded into a feature vector; the position encoding is the same as that used in the Transformer model, and the position embedding vector is obtained from sine and cosine functions.
Step 1.3, two different sentences are distinguished by assigning them different feature values, which yields the segmentation embedding vector.
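The sine and cosine position encoding of step 1.2 and the segment values of step 1.3 can be sketched as follows; the even/odd dimension split and the 10000 base follow the Transformer paper, and the function names are illustrative.

```python
import numpy as np

def sinusoidal_position_embedding(seq_len: int, dim: int) -> np.ndarray:
    """Position embedding from sine and cosine functions, as in the Transformer (step 1.2)."""
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    div = np.power(10000.0, np.arange(0, dim, 2) / dim)      # (dim/2,), assumes even dim
    pe = np.zeros((seq_len, dim))
    pe[:, 0::2] = np.sin(positions / div)                    # even dimensions: sine
    pe[:, 1::2] = np.cos(positions / div)                    # odd dimensions: cosine
    return pe

def segment_embedding_ids(len_a: int, len_b: int) -> np.ndarray:
    """Segmentation values distinguishing two sentences (step 1.3): 0 for sentence A, 1 for sentence B."""
    return np.concatenate([np.zeros(len_a, dtype=int), np.ones(len_b, dtype=int)])

print(sinusoidal_position_embedding(seq_len=6, dim=8).shape)  # (6, 8)
print(segment_embedding_ids(4, 3))                            # [0 0 0 0 1 1 1]
```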
Step 1.4, the word-block embedding vector, the position embedding vector and the segmentation embedding vector obtained in steps 1.1, 1.2 and 1.3 are input into the BERT model to obtain the text embedding vector.
Step 1.5, the 34 trigger word categories defined by the ACE2005 data set are embedded by looking up a randomly initialized trigger word category vector table to obtain the trigger word category embedding vector.
Step 1.6, the entity scope is labeled with a BIO labeling strategy, and at the same time the 45 entity categories defined in ACE2005 are embedded by looking up a randomly initialized entity category vector table to obtain the entity category embedding vector.
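A small sketch of steps 1.5 and 1.6 assuming PyTorch: randomly initialized lookup tables for the 34 trigger word categories and the 45 entity categories, and a toy BIO labeling whose entity types and index assignments are illustrative.

```python
import torch
import torch.nn as nn

EMB_DIM = 50                                   # category embedding dimension used in the experiments
trigger_table = nn.Embedding(34, EMB_DIM)      # randomly initialized trigger word category vector table
entity_table = nn.Embedding(45, EMB_DIM)       # randomly initialized entity category vector table

# toy BIO labeling of entity spans (tags and categories are illustrative)
tokens = ["The", "attacker", "fired", "at", "the", "convoy", "near", "Baghdad"]
bio_tags = ["O", "B-PER", "O", "O", "O", "B-VEH", "O", "B-GPE"]
entity_type_to_id = {"PER": 0, "VEH": 1, "GPE": 2}   # illustrative subset of the 45 categories

# look up the category embedding for every token that carries a B/I tag
for tok, tag in zip(tokens, bio_tags):
    if tag != "O":
        vec = entity_table(torch.tensor(entity_type_to_id[tag[2:]]))
        print(tok, tag, vec.shape)               # torch.Size([50])
```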
Step 2, the topological relations among the candidate arguments, the entity categories and the trigger words are modeled to construct the multi-view graph.
Step 2.1, a candidate argument node-candidate argument node view information network graph ζ_ww = (υ_ww, ε_ww) is constructed according to the dependency syntax relations among the candidate arguments, where υ_ww is the node set and ε_ww is the edge set. First, the dependency syntax tree of the candidate sentence is generated with the Stanford Parser dependency syntax analysis tool; for each dependency syntax relation R(w_i, w_j) at the candidate argument layer, the edge (w_i, w_j) is built, and the reverse edge (w_j, w_i) and the self-loop edge (w_i, w_i) are added as well, so the candidate argument-candidate argument edges are computed as
A_ww[i][j] = 1 if R(w_i, w_j) or R(w_j, w_i) holds or i = j, and A_ww[i][j] = 0 otherwise.
Finally, the adjacency matrix A_ww ∈ R^{n_w × n_w} of the dependency relations between candidate argument nodes is obtained, where n_w is the number of candidate word-layer nodes.
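As an illustration of step 2.1, the sketch below builds the candidate argument-candidate argument adjacency matrix from a list of dependency arcs, adding the forward edge, the reverse edge and the self-loop; the arcs are supplied directly here instead of coming from the Stanford Parser, which is an assumption of this example.

```python
import numpy as np

def build_word_word_adjacency(n_words: int, dependency_arcs) -> np.ndarray:
    """A_ww in R^{n_w x n_w}: forward edge, reverse edge and self-loop per dependency arc (step 2.1)."""
    adj = np.zeros((n_words, n_words), dtype=np.float32)
    for head, dependent in dependency_arcs:
        adj[head, dependent] = 1.0        # edge (w_i, w_j) from the dependency relation
        adj[dependent, head] = 1.0        # reverse edge (w_j, w_i)
    np.fill_diagonal(adj, 1.0)            # self-loop edges (w_i, w_i)
    return adj

# toy example: arcs of "attacker fired convoy" as (head index, dependent index) pairs
arcs = [(1, 0), (1, 2)]
print(build_word_word_adjacency(3, arcs))
```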
Step 2.2, a candidate argument node-trigger word category node view information network graph ζ_wt = (υ_wt, ε_wt) is constructed according to the topological relation between the candidate arguments and the trigger word categories, where υ_wt is the node set and ε_wt is the edge set. The edge construction rule is to judge whether the current candidate argument is a trigger word; if it is, an edge is built between the candidate argument and the trigger word category node to which it belongs. From the graph ζ_wt, the adjacency matrix A_wt ∈ R^{n_w × n_t} of the dependency relations between candidate argument nodes and trigger word category nodes is obtained, where n_w is the number of candidate word-layer nodes and n_t is the number of trigger word category layer nodes.
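A sketch of step 2.2, assuming the trigger word detection result is already available as a mapping from word index to trigger word category index; only words that are triggers receive an edge to their category node.

```python
import numpy as np

def build_word_trigger_adjacency(n_words: int, n_trigger_types: int,
                                 trigger_of_word: dict) -> np.ndarray:
    """A_wt in R^{n_w x n_t}: edge between a candidate argument that is a trigger word
    and the trigger word category node it belongs to (step 2.2)."""
    adj = np.zeros((n_words, n_trigger_types), dtype=np.float32)
    for word_idx, trigger_type_idx in trigger_of_word.items():
        adj[word_idx, trigger_type_idx] = 1.0
    return adj

# toy example: word 1 is a trigger of category 5 (both indices are illustrative)
print(build_word_trigger_adjacency(n_words=3, n_trigger_types=34, trigger_of_word={1: 5}))
```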
Step 2.3, a candidate argument node-entity category node view information network graph ζ_we = (υ_we, ε_we) is constructed according to the topological relation between the candidate arguments and the entity categories, where υ_we is the node set and ε_we is the edge set. The edge construction rule is to judge whether the current candidate argument belongs to a certain entity category; if the affiliation exists, an edge is built between the candidate argument node and the entity category node to which it belongs according to the B/I labels. From the graph ζ_we, the adjacency matrix A_we ∈ R^{n_w × n_e} of the dependency relations between candidate argument nodes and entity category nodes is obtained, where n_w is the number of candidate word-layer nodes and n_e is the number of entity category layer nodes.
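A sketch of step 2.3, assuming the BIO entity labels from step 1.6 are available; every word carrying a B or I label receives an edge to its entity category node.

```python
import numpy as np

def build_word_entity_adjacency(bio_tags, entity_type_to_id: dict) -> np.ndarray:
    """A_we in R^{n_w x n_e}: edge between a candidate argument node and the entity
    category node it belongs to, according to the B/I labels (step 2.3)."""
    n_words, n_entity_types = len(bio_tags), len(entity_type_to_id)
    adj = np.zeros((n_words, n_entity_types), dtype=np.float32)
    for word_idx, tag in enumerate(bio_tags):
        if tag != "O":                                        # B-xxx or I-xxx
            adj[word_idx, entity_type_to_id[tag[2:]]] = 1.0
    return adj

# toy example with an illustrative three-category table
print(build_word_entity_adjacency(["O", "B-PER", "O", "B-GPE", "I-GPE"],
                                  {"PER": 0, "VEH": 1, "GPE": 2}))
```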
Step 3, the multi-view graphs are encoded separately with a Graph Convolution Network (GCN) and aggregated to obtain the multi-view graph embedding vectors of the candidate arguments.
Step 3.1, the candidate argument node-candidate argument node graph, the candidate argument node-trigger word category node graph and the candidate argument node-entity category node graph are each encoded with the GCN. The encoding process is
H^(l+1) = σ(M^{-1/2} A' M^{-1/2} · H^(l) · W^(l)),
where A' = A + I, A is one of the adjacency matrices A_ww, A_wt and A_we derived from the multi-view graph, I is the identity matrix representing self-connections, W^(l) is the weight matrix of the l-th layer, σ(·) is the activation function, and H^(l) is the hidden representation of the nodes at layer l, initialized as H^(0) = X. M^{-1/2} A' M^{-1/2} is the regularized Laplacian matrix, where M is the degree matrix of A', computed as
M_ii = Σ_j A'_ij.
Applying the GCN encoding to the adjacency matrices of the multi-view graph yields the three corresponding network embedding vectors H_ww, H_wt and H_we.
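To make the propagation rule concrete, the sketch below evaluates one layer H^(l+1) = σ(M^{-1/2} A' M^{-1/2} H^(l) W^(l)) with NumPy on a tiny graph; the ReLU activation and the random initialization are assumptions of this example.

```python
import numpy as np

def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN layer: sigma(M^-1/2 A' M^-1/2 . H . W), with A' = A + I and M_ii = sum_j A'_ij."""
    a_prime = adj + np.eye(adj.shape[0])                # add self-connections
    m_inv_sqrt = np.diag(1.0 / np.sqrt(a_prime.sum(axis=1)))
    normalized = m_inv_sqrt @ a_prime @ m_inv_sqrt      # regularized Laplacian term
    return np.maximum(0.0, normalized @ h @ w)          # ReLU activation

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # 3-node chain graph
h0 = rng.normal(size=(3, 4))                                      # H^(0) = X, 4-dim node features
w0 = rng.normal(size=(4, 2))                                      # W^(0)
print(gcn_layer(adj, h0, w0).shape)                               # (3, 2)
```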
Step 3.2, the network embedding vectors of the three views are aggregated with a bidirectional gated recurrent unit (BiGRU):
H_mpge = BiGRU(H_ww, H_wt, H_we),
where the three view embeddings are fed to the BiGRU in sequence. Finally, the multi-view graph embedding vector H_mpge of the candidate arguments is obtained.
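A sketch of the step 3.2 aggregation assuming PyTorch: the three view embeddings of each candidate argument are treated as a length-3 sequence, passed through a bidirectional GRU, and the final forward and backward hidden states are concatenated as the multi-view graph embedding; the ordering of the views and this pooling choice are assumptions of the sketch.

```python
import torch
import torch.nn as nn

graph_dim, n_words = 200, 6
bigru = nn.GRU(input_size=graph_dim, hidden_size=graph_dim,
               bidirectional=True, batch_first=True)

# view embeddings produced by the per-view GCN encoders (random stand-ins here)
h_ww = torch.randn(n_words, graph_dim)
h_wt = torch.randn(n_words, graph_dim)
h_we = torch.randn(n_words, graph_dim)

views = torch.stack([h_ww, h_wt, h_we], dim=1)      # (n_words, 3, graph_dim): 3 views per word
_, h_n = bigru(views)                               # h_n: (2, n_words, graph_dim)
h_mpge = torch.cat([h_n[0], h_n[1]], dim=-1)        # multi-view graph embedding, (n_words, 2*graph_dim)
print(h_mpge.shape)
```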
Step 4, the event arguments are classified and extracted from the candidate arguments through a Softmax fully-connected layer.
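Finally, a sketch of step 4 assuming PyTorch: a fully-connected layer followed by Softmax maps each candidate argument's multi-view graph embedding to a distribution over argument roles, and the role with the highest probability is extracted; the number of role labels and the cross-entropy training objective are assumptions of this example, since the patent does not name the loss.

```python
import torch
import torch.nn as nn

n_candidates, graph_dim, n_roles = 6, 200, 36          # n_roles is an illustrative label count
classifier = nn.Linear(2 * graph_dim, n_roles)         # fully-connected layer over H_mpge

h_mpge = torch.randn(n_candidates, 2 * graph_dim)      # multi-view graph embeddings (random stand-in)
logits = classifier(h_mpge)
role_probs = torch.softmax(logits, dim=-1)             # Softmax over argument roles
predicted_roles = role_probs.argmax(dim=-1)            # extracted role per candidate argument

# training objective (an assumption of this sketch)
gold_roles = torch.randint(0, n_roles, (n_candidates,))
loss = nn.CrossEntropyLoss()(logits, gold_roles)
print(predicted_roles, loss.item())
```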
Test results: with the proposed argument extraction method for the topological-dependency multi-view graph encoder, event argument extraction on the ACE2005 English corpus reaches a precision of 61.4%, a recall of 62.6% and an F1 value of 62%; the recall and F1 are 4.5% and 0.4% higher, respectively, than those of the PLMEE model, showing an improved event argument extraction effect.
The above detailed description is intended to illustrate the objects, aspects and advantages of the present invention, and it should be understood that the above detailed description is only exemplary of the present invention and is not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (4)

1. An argument extraction method for a multi-view graph encoder using topological dependency relationships, the method comprising the steps of:
step 1, performing text embedding on the ACE2005 data set with a BERT pre-training model: the sentence is regarded as a sequence composed of N_w words; each word is divided into a limited group of common sub-word units to obtain the word-block embedding vectors, and [CLS] and [SEP] labels are set at the beginning and the end of the sentence, respectively; the position information of each word is encoded into a feature vector with the same position encoding as used in the Transformer model, and the position embedding vector is obtained from sine and cosine functions; two different sentences are distinguished by assigning them different feature values, which yields the segmentation embedding vector; the obtained word-block embedding vector, position embedding vector and segmentation embedding vector are then input into the BERT model to obtain the text embedding vector; finally, the 34 trigger word categories defined by the ACE2005 data set are embedded by looking up a randomly initialized trigger word category vector table to obtain the trigger word category embedding vector, and at the same time the entity scope is labeled with a BIO labeling strategy and the 45 entity categories defined in ACE2005 are embedded by looking up a randomly initialized entity category vector table to obtain the entity category embedding vector;
step 2, modeling the topological relations among the candidate arguments, the entity categories and the trigger words, and constructing a multi-view graph: a candidate argument node-candidate argument node view information network graph is constructed according to the dependency syntax relations among the candidate arguments; then a candidate argument node-trigger word category node view information network graph is constructed according to the topological relation between the candidate arguments and the trigger word categories; and finally a candidate argument node-entity category node view information network graph is constructed according to the topological relation between the candidate arguments and the entity categories;
step 3, respectively encoding the multi-view map by using a Graph Convolution Network (GCN) to obtain multi-view map embedded vectors of candidate arguments, firstly, respectively encoding candidate argument nodes-candidate argument node maps, candidate argument nodes-trigger word class node maps and selected argument nodes-entity class node maps by using the GCN, wherein the encoding process is as follows: h(l+1)=σ(M-1/2A′M-1/2·H(l)·W(l)) Where a' is a + I, a being an adjacency matrix derived from a multi-view map
Figure FDA0003090593940000016
And
Figure FDA0003090593940000017
i is an identity matrix representing self-connection, W(l)Is the weight matrix of the l-th layer, σ (-) represents the activation function, H(l)Initializing H for hidden layer representation of layer I node(0)=X,
Figure FDA0003090593940000018
For the regularized Laplacian matrix, adjacent matrixes of the multi-view image are respectively subjected to GCN coding to obtain three corresponding network embedded vectors
Figure FDA0003090593940000019
And
Figure FDA00030905939400000110
then, aggregating the network embedded vectors of three views by using a bidirectional gating cycle unit (BiGRU), wherein the formula is as follows:
Figure FDA00030905939400000111
Figure FDA00030905939400000112
wherein
Figure FDA00030905939400000113
Finally, obtaining a multi-view map embedding vector H of candidate argumentmpge
And 4, classifying and extracting the event arguments from the candidate arguments through a Softmax full-connection layer.
2. The argument extraction method for a multi-view graph encoder using topological dependency relationships according to claim 1, wherein: in step 2, the dependency syntax tree of the candidate sentence is generated with the Stanford Parser dependency syntax analysis tool; for each dependency syntax relation R(w_i, w_j) at the candidate argument layer, the edge (w_i, w_j) is built, and the reverse edge (w_j, w_i) and the self-loop edge (w_i, w_i) are added as well, thereby constructing the candidate argument node-candidate argument node view information network graph ζ_ww = (υ_ww, ε_ww), where υ_ww is the node set and ε_ww is the edge set; the edges between candidate arguments are computed accordingly, and finally the adjacency matrix A_ww ∈ R^{n_w × n_w} of the dependency relations between candidate argument nodes is obtained, where n_w is the number of candidate word-layer nodes.
3. The argument extraction method for a multi-view graph encoder using topological dependency relationships according to claim 1, wherein: in step 2, the candidate argument node-trigger word category node view information network graph ζ_wt = (υ_wt, ε_wt) is constructed with the edge construction rule of judging whether the current candidate argument is a trigger word and, if so, building an edge between the candidate argument and the trigger word category node to which it belongs, where υ_wt is the node set and ε_wt is the edge set; from the graph ζ_wt, the adjacency matrix A_wt ∈ R^{n_w × n_t} of the dependency relations between candidate argument nodes and trigger word category nodes is obtained, where n_w is the number of candidate word-layer nodes and n_t is the number of trigger word category layer nodes.
4. The argument extraction method for a multi-view graph encoder using topological dependency relationships according to claim 1, wherein: in step 2, the candidate argument node-entity category node view information network graph ζ_we = (υ_we, ε_we) is constructed with the edge construction rule of judging whether the current candidate argument belongs to a certain entity category and, if the affiliation exists, building an edge between the candidate argument node and the entity category node to which it belongs according to the B/I labels, where υ_we is the node set and ε_we is the edge set; from the graph ζ_we, the adjacency matrix A_we ∈ R^{n_w × n_e} of the dependency relations between candidate argument nodes and entity category nodes is obtained, where n_w is the number of candidate word-layer nodes and n_e is the number of entity category layer nodes.
CN202110594279.0A 2021-05-28 2021-05-28 Argument extraction method for multi-view encoder by using topological dependency relationship Active CN113222119B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110594279.0A CN113222119B (en) 2021-05-28 2021-05-28 Argument extraction method for multi-view encoder by using topological dependency relationship

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110594279.0A CN113222119B (en) 2021-05-28 2021-05-28 Argument extraction method for multi-view encoder by using topological dependency relationship

Publications (2)

Publication Number Publication Date
CN113222119A true CN113222119A (en) 2021-08-06
CN113222119B CN113222119B (en) 2022-09-20

Family

ID=77099220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110594279.0A Active CN113222119B (en) 2021-05-28 2021-05-28 Argument extraction method for multi-view encoder by using topological dependency relationship

Country Status (1)

Country Link
CN (1) CN113222119B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530281A (en) * 2013-10-15 2014-01-22 苏州大学 Argument extraction method and system
CN110134757A (en) * 2019-04-19 2019-08-16 杭州电子科技大学 A kind of event argument roles abstracting method based on bull attention mechanism
CN111897908A (en) * 2020-05-12 2020-11-06 中国科学院计算技术研究所 Event extraction method and system fusing dependency information and pre-training language model
CN112084746A (en) * 2020-09-11 2020-12-15 广东电网有限责任公司 Entity identification method, system, storage medium and equipment
CN112084381A (en) * 2020-09-11 2020-12-15 广东电网有限责任公司 Event extraction method, system, storage medium and equipment
CN112163416A (en) * 2020-10-09 2021-01-01 北京理工大学 Event joint extraction method for merging syntactic and entity relation graph convolution network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530281A (en) * 2013-10-15 2014-01-22 苏州大学 Argument extraction method and system
CN110134757A (en) * 2019-04-19 2019-08-16 杭州电子科技大学 A kind of event argument roles abstracting method based on bull attention mechanism
CN111897908A (en) * 2020-05-12 2020-11-06 中国科学院计算技术研究所 Event extraction method and system fusing dependency information and pre-training language model
CN112084746A (en) * 2020-09-11 2020-12-15 广东电网有限责任公司 Entity identification method, system, storage medium and equipment
CN112084381A (en) * 2020-09-11 2020-12-15 广东电网有限责任公司 Event extraction method, system, storage medium and equipment
CN112163416A (en) * 2020-10-09 2021-01-01 北京理工大学 Event joint extraction method for merging syntactic and entity relation graph convolution network

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
SHIYAO CUI等: "Edge-Enhanced Graph Convolution Networks for Event Detection with Syntactic Relation", 《ARXIV》 *
XIANGBIN MENG等: "Multi-Graph Convolution Network with Jump Connection for Event Detection", 《2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI)》 *
潘丽敏 et al.: "Dual-channel GAN event detection method fusing multi-level semantic features", 《Transactions of Beijing Institute of Technology》 *
程思伟 et al.: "BGCN: Trigger word detection based on BERT and graph convolutional network", 《Computer Science》 *
黄媛 et al.: "A semantic-based Chinese event argument extraction method", 《Computer Science》 *

Also Published As

Publication number Publication date
CN113222119B (en) 2022-09-20

Similar Documents

Publication Publication Date Title
Du et al. Stance classification with target-specific neural attention networks
Tian et al. Improving Chinese word segmentation with wordhood memory networks
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN109255047A (en) Based on the complementary semantic mutual search method of image-text being aligned and symmetrically retrieve
CN111160471A (en) Method and device for processing point of interest data, electronic equipment and storage medium
CN115357719B (en) Power audit text classification method and device based on improved BERT model
CN112541355A (en) Few-sample named entity identification method and system with entity boundary class decoupling
CN111563384A (en) Evaluation object identification method and device for E-commerce products and storage medium
CN115081437B (en) Machine-generated text detection method and system based on linguistic feature contrast learning
CN113191148A (en) Rail transit entity identification method based on semi-supervised learning and clustering
CN113360582B (en) Relation classification method and system based on BERT model fusion multi-entity information
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN112580330A (en) Vietnamese news event detection method based on Chinese trigger word guidance
CN114997288A (en) Design resource association method
CN113988075A (en) Network security field text data entity relation extraction method based on multi-task learning
CN112699685A (en) Named entity recognition method based on label-guided word fusion
CN113705222B (en) Training method and device for slot identification model and slot filling method and device
CN113947087B (en) Label-based relation construction method and device, electronic equipment and storage medium
CN113901224A (en) Knowledge distillation-based secret-related text recognition model training method, system and device
CN113901813A (en) Event extraction method based on topic features and implicit sentence structure
CN113222119B (en) Argument extraction method for multi-view encoder by using topological dependency relationship
Niu et al. Word embedding based edit distance
CN112507707A (en) Correlation degree analysis and judgment method for innovative technologies in different fields of power internet of things
Jiang et al. A Discourse Coherence Analysis Method Combining Sentence Embedding and Dimension Grid
CN110909547A (en) Judicial entity identification method based on improved deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant