CN112836504A - Event extraction method and device based on hierarchical policy network

Info

Publication number: CN112836504A
Application number: CN202110022760.2A
Other versions: CN112836504B (granted publication)
Authority: CN (China)
Prior art keywords: event, argument, level, network, role
Legal status: Granted; currently active
Inventors: 赵翔, 黄培馨, 谭真, 胡升泽, 肖卫东, 胡艳丽, 张军, 李硕豪
Original and current assignee: National University of Defense Technology
Application filed by National University of Defense Technology, with priority to CN202110022760.2A

Classifications

    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/04 Neural networks; architecture, e.g. interconnection topology
    • G06N3/08 Neural networks; learning methods

Abstract

The invention discloses an event extraction method and device based on a hierarchical policy network, wherein the method comprises the following steps: constructing a hierarchical policy network; while scanning the sentence from beginning to end, the event-level policy network checks each token for a trigger word and classifies the event type of any trigger word it detects; once a specific event is detected, the argument-level policy network is triggered to scan the sentence from beginning to end to detect the participating arguments of the current event; once an argument is identified, the role-level policy network is triggered to predict the role this argument plays under the current event; when the role prediction is completed, the argument-level policy network resumes scanning the sentence from the position of the current argument to detect further arguments of the event, until the end of the sentence is reached; the event-level policy network then resumes scanning the sentence from the position of the current trigger word to detect other events contained in the sentence, until the end of the sentence is reached.

Description

Event extraction method and device based on hierarchical policy network
Technical Field
The invention relates to the technical field of text event extraction in natural language processing, in particular to an event extraction method and device based on a hierarchical policy network.
Background
Event Extraction (EE) plays an important role in many downstream natural language processing applications such as information retrieval and news summarization. The purpose of event extraction is to discover events triggered by specific trigger words, together with the arguments of those events. Generally, event extraction involves several subtasks: trigger word identification, trigger word classification, event argument identification and argument role classification.
Some existing event extraction works employ a pipeline method to process these subtasks, i.e., they perform event detection (including event trigger word identification and classification) and event argument classification in separate stages. These methods generally assume that the entity information in the text has already been labeled (non-patent literature: McClosky et al., 2011; Chen et al., 2015; Yang et al., 2019). However, these staged extraction models have no mechanism to fully utilize the information interaction between the subtasks, so the event extraction subtasks cannot pass information to each other to improve their decisions. Although some joint models that perform event extraction by constructing joint extractors are available (non-patent literature: Yang and Mitchell, 2016; Nguyen and Nguyen, 2019; Zhang et al., 2019), these models essentially still follow a pipelined framework: they first identify entities and trigger words jointly, and then examine each entity-event pair to identify arguments and argument roles. In addition, the policy gradient method (Sutton et al., 1999) and the REINFORCE algorithm (Williams, 1992) can be used in the prior art to optimize the parameters of the event detection model.
One problem these models face is that they all produce redundant entity-event pairs and therefore also introduce possible errors; another is that when a sentence contains multiple events, arguments may be mismatched with trigger words, which degrades the performance of event extraction.
Consider, for example, the following sentence: "In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel." In this sentence, "cameraman" is not only the Victim argument of the event Die (trigger "died") but also the Target argument of the event Attack (trigger "fired"). However, since "cameraman" is relatively far from the trigger word "fired" in the text, an event extractor is very likely to fail to recognize "cameraman" as an argument of the event Attack.
Details of non-patent literature:
David McClosky, Mihai Surdeanu, and Christopher D. Manning. 2011. Event extraction as dependency parsing. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June 2011, Portland, Oregon, USA, pages 1626–1635.
Teruko Mitamura, Zhengzhong Liu, and Eduard H. Hovy. 2015. Overview of TAC KBP 2015 event nugget track. In Proceedings of the 2015 Text Analysis Conference, TAC 2015, Gaithersburg, Maryland, USA, November 16-17, 2015.
Trung Minh Nguyen and Thien Huu Nguyen. 2019. One for all: Neural joint modeling of entities and events. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pages 6851–6858.
Bishan Yang and Tom M. Mitchell. 2016. Joint extraction of events and entities within a document context. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA, June 12-17, 2016, pages 289–299.
Junchi Zhang, Yanxia Qin, Yue Zhang, Mengchi Liu, and Donghong Ji. 2019. Extracting entities and events as a single task using a transition-based neural model. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 5422–5428.
Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour. 1999. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, [NIPS Conference, Denver, Colorado, USA, November 29 - December 4, 1999], pages 1057–1063.
Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256.
Disclosure of the Invention
The present invention is directed to solving at least one of the problems of the prior art. Therefore, the invention discloses an event extraction method and device based on a hierarchical policy network. The method provides a multi-layer policy network (MPNet) to jointly perform the subtasks of event extraction. MPNet comprises an event-level policy network, an argument-level policy network and a role-level policy network, which solve the tasks of event detection, event argument identification and argument role classification in the three layers, respectively.
The technical scheme of the invention is an event extraction method based on a hierarchical policy network, comprising the following steps:
step 1, constructing a hierarchical policy network, wherein the hierarchical policy network comprises an event-level policy network, an argument-level policy network and a role-level policy network;
step 2, while scanning from the beginning of a sentence to the end of the sentence, the event-level policy network detects a trigger word at each token and classifies the event type of the detected trigger word;
step 3, once a specific event is detected, the argument-level policy network is triggered to scan the sentence from beginning to end to detect the participating arguments of the current event;
step 4, once an argument is identified, the role-level policy network is triggered to predict the role the argument plays under the current event;
step 5, when the role classification of the role-level policy network is finished, the argument-level policy network continues scanning the sentence from the position of the classified argument to find the next argument; once the argument detection of the argument-level policy network under the current event is finished, the event-level policy network continues scanning the sentence from the token position of the current trigger word to detect other events contained in the sentence, until the end of the sentence is reached.
Furthermore, an agent is adopted to perform the above steps 2-5. In step 2, as the agent scans the sentence sequentially from beginning to end, the event-level policy network samples a selection according to its policy at each time step; an event-level selection is either the non-trigger label or a specific event type from the predefined event type set.
In step 3, once a specific event is detected, the agent transfers to the argument-level policy network; while the sentence is scanned from beginning to end, an action is selected according to the policy at each time step, and an argument-level action assigns a specific argument label to the current token.
In step 4, once a specific argument is detected, the agent transfers to the role-level policy network to sample a selection for the current argument according to its policy; a role-level selection is drawn from the set of role types.
In step 5, after the role classification of the argument is completed, the agent transfers back to the argument-level policy network and continues to scan the remaining tokens of the sentence to identify the remaining arguments of the event; once the agent has finished detecting the participating arguments of the current event, it transfers back to the event-level policy network and continues to scan the remaining sentence to identify other events.
In steps 2-5, once a selection or action is sampled, a reward is returned.
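By way of illustration only, the following Python sketch shows one possible way to organize this hierarchical scanning loop. The three policy functions (event_policy, argument_policy, role_policy) are hypothetical placeholders standing in for the trained event-, argument- and role-level policy networks described below, and the label conventions follow the selection and action sets defined later in this description.

```python
# A minimal, illustrative sketch of the hierarchical scanning procedure.
# event_policy / argument_policy / role_policy are hypothetical stand-ins
# for the trained event-, argument- and role-level policy networks.
def extract_events(tokens, event_policy, argument_policy, role_policy):
    events = []
    for t, token in enumerate(tokens):                 # event-level scan
        event_type = event_policy(tokens, t)           # 'NE' or an event type
        if event_type == "NE":
            continue
        event = {"trigger": token, "type": event_type, "arguments": []}
        for t2, token2 in enumerate(tokens):           # argument-level scan
            arg_label = argument_policy(tokens, t2, event_type)  # B/I/O/E/S/N
            if arg_label in ("E", "S"):                # a complete argument found
                role = role_policy(tokens, t2, event_type)       # role-level step
                event["arguments"].append((token2, arg_label, role))
        events.append(event)
    return events

# Toy usage with trivial placeholder policies:
if __name__ == "__main__":
    toks = ["a", "cameraman", "died"]
    ev = lambda s, t: "Die" if s[t] == "died" else "NE"
    arg = lambda s, t, e: "S" if s[t] == "cameraman" else "N"
    role = lambda s, t, e: "Victim"
    print(extract_events(toks, ev, arg, role))
```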
Specifically, given the input text $S = \{w_1, w_2, \ldots, w_L\}$, the purpose of the event-level policy network is to detect each trigger word $w_i$ and the event type it triggers. At the current token, i.e. time step $t$, the event-level policy network adopts a stochastic policy $\mu$ to determine its selection, and the reward obtained afterwards is used to guide the policy learning of the network.
A selection $o_t^e$ of the event-level policy network is sampled from the selection set $O^e = \{NE\} \cup \mathcal{E}$, where $NE$ denotes a token that is not a trigger word, and $\mathcal{E}$ is the set of event types predefined in the data set, used to indicate the event type triggered by the current trigger word.
The state $s_t^e$ of the event-level process is related to past time steps: it encodes not only the current input but also the previous environment state. $s_t^e$ is built from the concatenation of three vectors: 1) the state of the last time step $s_{t-1}$, where $s_{t-1} = s_{t-1}^e$ if the agent was running the event-level policy process at time step $t-1$ and $s_{t-1} = s_{t-1}^r$ otherwise, $s_{t-1}^e$ denoting the environment state of the event-level policy network at time step $t-1$ and $s_{t-1}^r$ the environment state of the role-level policy network at time step $t-1$; 2) the event type vector $v_t^e$, which is learned from the last selection satisfying $o^e \neq NE$; and 3) the hidden state vector $h_t$ over the current input word vector $w_t$, obtained by processing the token sequence with a Bi-LSTM:

$h_t = \mathrm{Bi\text{-}LSTM}(w_1, w_2, \ldots, w_L)_t$    (1)

In this way, the state is expressed as

$s_t^e = f^e([\,s_{t-1};\ v_t^e;\ h_t\,])$    (2)

where the function $f^e$, implemented as a multi-layer perceptron (MLP), maps the concatenation to a continuous real-valued state vector $s_t^e$.
The stochastic policy in the event-level policy network, i.e. the policy for making a selection, is $\mu: S^e \to O^e$; it samples a selection $o_t^e$ according to the probability distribution

$\mu(o_t^e \mid s_t^e) = \mathrm{softmax}(W^e s_t^e + b^e)$    (3)

where $W^e$ and $b^e$ are parameters and $s_t^e$ is the state representation vector.
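As one illustration, a minimal PyTorch-style sketch of such an event-level policy step is given below; the module layout and tensor dimensions (for example, giving the previous state and the event type vector the same dimension) are assumptions made for readability, and the Bi-LSTM, MLP and softmax correspond to Equations (1)-(3) above.

```python
import torch
import torch.nn as nn

class EventLevelPolicy(nn.Module):
    """Sketch of the event-level policy: state = MLP([s_prev; v_event; h_t]),
    selection probabilities = softmax(W_e * state + b_e)."""
    def __init__(self, word_dim, hidden_dim, state_dim, num_event_types):
        super().__init__()
        self.encoder = nn.LSTM(word_dim, hidden_dim, bidirectional=True,
                               batch_first=True)                    # Eq. (1)
        self.state_mlp = nn.Sequential(                              # Eq. (2)
            nn.Linear(state_dim + state_dim + 2 * hidden_dim, state_dim),
            nn.Tanh())
        self.selector = nn.Linear(state_dim, num_event_types + 1)    # +1 for NE, Eq. (3)

    def forward(self, word_vectors, s_prev, v_event, t):
        h_all, _ = self.encoder(word_vectors)        # (batch, L, 2*hidden_dim)
        h_t = h_all[:, t, :]                          # hidden state of token t
        s_t = self.state_mlp(torch.cat([s_prev, v_event, h_t], dim=-1))
        probs = torch.softmax(self.selector(s_t), dim=-1)
        return s_t, probs
```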
The ultimate purpose of the reward of the event-level policy network is to identify and classify events; whether a single trigger word is correct is only an intermediate result. Once an event-level selection $o_t^e$ is sampled, the agent receives an immediate reward that reflects the quality of this selection. This short-term reward is obtained by comparing $o_t^e$ with the gold annotation $y_t^e$ of the event type in sentence $S$:

$r_t^e = I(NE) \cdot \mathrm{sgn}(o_t^e = y_t^e)$

where $\mathrm{sgn}(\cdot)$ is the sign function and $I(NE)$ is a switch function that distinguishes the reward of trigger words from that of non-trigger words:

$I(NE) = \alpha$ if $o_t^e = NE$, and $I(NE) = 1$ otherwise,

where $\alpha$ is a bias weight with $\alpha < 1$. The smaller $\alpha$ is, the smaller the reward obtained for identifying a non-trigger word; this prevents the model from learning the trivial policy of predicting every word as $NE$, i.e. as a non-trigger word.
When the event-level policy network has sampled selections up to the last word in sentence $S$ and the agent has finished all event-level selections, a final reward $r_{fin}^e$ is obtained. The delayed reward of this final state is defined by the sentence-level event detection performance:

$r_{fin}^e = F_1(\mathcal{E}_S)$

where $F_1(\cdot)$ denotes the $F_1$ score of the sentence-level event detection result, i.e. the harmonic mean of the sentence-level precision and recall.
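For illustration, the two event-level rewards described above might be computed as in the following sketch; the concrete α value and the F1 helper are assumptions, and the gold labels are taken from the sentence-level annotation.

```python
def event_step_reward(selection, gold, alpha=0.1):
    """Immediate event-level reward: sign of correctness, scaled by alpha
    when the selection is the non-trigger label NE."""
    sign = 1.0 if selection == gold else -1.0
    scale = alpha if selection == "NE" else 1.0
    return scale * sign

def event_final_reward(predicted_events, gold_events):
    """Delayed reward: F1 of sentence-level event detection
    (harmonic mean of precision and recall)."""
    pred, gold = set(predicted_events), set(gold_events)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```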
Specifically, in step 3, when a specific event is detected at time step $t'$, i.e. $o_{t'}^e \in \mathcal{E}$, the agent transfers to the argument-level policy network to predict each argument participating in the event triggered at $t'$. At each token, i.e. time step $t$, the argument-level policy network adopts a stochastic policy $\pi$ to select an action, and uses rewards to guide the learning of arguments under the current event. In order to pass finer-grained event information down to assist the argument decisions, the event-level selection $o_{t'}^e$ and the state representation $s_{t'}^e$ from the event-level process are used as additional input throughout the argument-level process.
An action $a_t^n$ of the argument-level policy network assigns a specific argument label to the current token. $a_t^n$ is sampled from the action space $A^n = \{B, I, O, E, S\} \cup \{N\}$, where B/I/E indicate the position of the current token inside an argument (B the beginning, I an intermediate position, E the end), O marks a token belonging to an argument irrelevant to the current event, S marks an argument consisting of a single token, and N marks a non-argument token. The same argument may therefore be given different labels at different time steps, depending on the event type; in this way, the multiple-event and mismatch problems can be solved quite naturally.
The state $s_t^n$ of the argument-level process is related to past time steps; it encodes not only the current input but also the previous environment state and the environment information of the initiating event type. $s_t^n$ is built from the concatenation of four vectors: 1) the state of the last time step $s_{t-1}$, which may come from the event-level, argument-level or role-level policy network; 2) the argument label vector $v_t^n$, which is learned from the last action $a^n$; 3) the event state representation $s_{t'}^e$; and 4) the hidden state vector $h_t$, obtained from the same Bi-LSTM treatment as in Equation (1). The state is thus expressed as

$s_t^n = f^n([\,s_{t-1};\ v_t^n;\ s_{t'}^e;\ h_t\,])$

where $f^n$, implemented as a multi-layer perceptron (MLP), maps the concatenation to a continuous real-valued state vector $s_t^n$.
Taking the event type $o_{t'}^e$ as an additional input, the stochastic policy for argument detection, i.e. the policy for taking an action, is $\pi: S^n \to A^n$; it selects an action $a_t^n$ according to the probability distribution

$\pi(a_t^n \mid s_t^n, o_{t'}^e) = \mathrm{softmax}(W^n [\,s_t^n;\ v_{t'}^e\,] + b^n)$,  with  $v_{t'}^e = W^{\mu}\, o_{t'}^e$

where $W^n$ and $b^n$ are parameters, $s_t^n$ is the argument-level state representation vector, $v_{t'}^e$ is the representation of the event $o_{t'}^e$, and $W^{\mu}$ is an embedding matrix over the event type set $\mathcal{E}$ through which the event $o_{t'}^e$ is mapped to obtain its event representation vector.
Once an argument-level action $a_t^n$ is selected, the agent receives an immediate reward $r_t^n$. This reward is obtained by comparing the action with the gold argument annotation $y_t^n$ under the predicted event type $o_{t'}^e$, and is calculated as follows:

$r_t^n = I(N) \cdot \mathrm{sgn}(a_t^n = y_t^n)$

where $I(N)$ is a switch function that distinguishes the rewards of argument and non-argument tokens:

$I(N) = \beta$ if $a_t^n = N$, and $I(N) = 1$ otherwise,

where $\beta$ is a bias weight with $\beta < 1$; the smaller $\beta$ is, the smaller the reward obtained for non-argument tokens, which prevents the agent from learning the trivial policy of setting every action to $N$.
The agent continues to select an action for each token until the action of the last token. When the agent has finished all argument-level actions under the current event $o_{t'}^e$, it obtains a final reward $r_{fin}^n$, which, analogously to the event level, is defined by the argument detection performance under the current event.
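By way of illustration, a minimal sketch of the argument-level action selection conditioned on the event type might look as follows; the class layout and dimensions are assumptions for readability, with the event embedding playing the role of the mapping through $W^{\mu}$ described above.

```python
import torch
import torch.nn as nn

class ArgumentLevelPolicy(nn.Module):
    """Sketch: state = MLP([s_prev; v_arg_label; s_event; h_t]),
    action probabilities = softmax(W_n * [state; event_embedding] + b_n)."""
    def __init__(self, state_dim, hidden_dim, num_event_types, num_arg_labels=6):
        super().__init__()
        self.event_embed = nn.Embedding(num_event_types, state_dim)   # W_mu
        self.state_mlp = nn.Sequential(
            nn.Linear(3 * state_dim + 2 * hidden_dim, state_dim), nn.Tanh())
        self.selector = nn.Linear(2 * state_dim, num_arg_labels)      # B, I, O, E, S, N

    def forward(self, s_prev, v_arg_label, s_event, h_t, event_type_id):
        s_t = self.state_mlp(torch.cat([s_prev, v_arg_label, s_event, h_t], dim=-1))
        v_event = self.event_embed(event_type_id)            # event representation vector
        probs = torch.softmax(self.selector(torch.cat([s_t, v_event], dim=-1)), dim=-1)
        return s_t, probs
```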
Specifically, in step 4, when a participating argument is detected at time step $t$, i.e. $a_t^n \in \{E, S\}$, the agent transfers to the role-level policy network to predict the role this argument plays in the event triggered at $t'$. At each token, i.e. time step $t$, the role-level policy network adopts a stochastic policy to make a selection, and uses rewards to guide the learning of the roles of arguments participating in the current event. In order to pass finer-grained event and argument information down to assist the role decision, the event-level selection $o_{t'}^e$ and the argument-level action $a_t^n$ are used as additional input throughout the role-level process.
A selection $o_t^r$ of the role-level policy network classifies an argument role for the current argument; it is drawn from the argument role set, i.e. $O^r = R$, where $R$ is the predefined set of argument roles.
The state $s_t^r$ of the role-level process is likewise related to past time steps: it encodes not only the current input but also the previous environment state. $s_t^r$ is built from the concatenation of three vectors: 1) the state of the last time step $s_{t-1}$; 2) the argument role vector $v_t^r$, which is learned from the last selection $o^r$; and 3) the hidden state vector $h_t$, obtained from the same Bi-LSTM treatment as in Equation (1). The state is thus expressed as

$s_t^r = f^r([\,s_{t-1};\ v_t^r;\ h_t\,])$

where $f^r$, implemented as a multi-layer perceptron (MLP), maps the concatenation to a continuous real-valued state vector $s_t^r$.
To define the strategy for role set, the scores for all argument roles are first computed:
Figure BDA0002889173730000101
Figure BDA0002889173730000102
wherein, WrIs a parameter that is a function of,
Figure BDA0002889173730000103
is an argument level state representation vector
Figure BDA0002889173730000104
Is the representative vector of the current argument, hπIs a hidden state vector on the input word vector;
therefore, a matrix M epsilon {0,1} based on the event architecture is designed|ε|*|R|Wherein M [ e ]][r]Using this matrix to filter out argument roles that are unlikely to participate in the current event if and only if the event e has a role r in the event framework information;
then, the random strategy for role detection is μ Sr→OrIt selects a selection ot rProbability distribution according to:
Figure BDA0002889173730000105
Wrand brIs a parameter;
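As an illustration of the event-schema mask, the following sketch shows how such a matrix M could be built from schema information and used to zero out impossible roles before normalizing the scores; the schema contents and role names shown are placeholders, not the data set's actual schema.

```python
import torch

# Hypothetical schema: for each event type, the roles allowed by the event schema.
EVENT_TYPES = ["Die", "Attack"]
ROLES = ["Victim", "Target", "Attacker", "Place"]
SCHEMA = {"Die": {"Victim", "Place"}, "Attack": {"Target", "Attacker", "Place"}}

# M[e][r] = 1 iff event type e has role r in the event schema.
M = torch.zeros(len(EVENT_TYPES), len(ROLES))
for i, e in enumerate(EVENT_TYPES):
    for j, r in enumerate(ROLES):
        if r in SCHEMA[e]:
            M[i, j] = 1.0

def masked_role_distribution(scores, event_index):
    """Filter out roles that cannot participate in the current event,
    then renormalize to a probability distribution."""
    masked = scores * M[event_index]
    return masked / masked.sum()

# Example: non-negative scores over all roles for an argument of an Attack event.
scores = torch.tensor([0.2, 0.5, 0.1, 0.2])
print(masked_role_distribution(scores, EVENT_TYPES.index("Attack")))
```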
Once a role-level selection $o_t^r$ is executed, the agent receives an immediate reward $r_t^r$. This reward is obtained by comparing the selection with the gold role annotation $y_t^r$ under the current event type, and is calculated as

$r_t^r = \mathrm{sgn}(o_t^r = y_t^r)$

Because the role-level selection is performed in only one step after an argument-level action, this immediate reward also serves as the final reward $r_{fin}^r$ of the role-level process.
Furthermore, during training the event-level policy network samples its selection from the probability distribution in Equation (3); during testing, the most probable selection of the event-level policy network is chosen, i.e. $o_t^e = \arg\max_{o} \mu(o \mid s_t^e)$. The actions of the argument-level policy network and the selections of the role-level policy network are sampled in the same manner during training and testing.
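For example, the switch between sampling during training and greedy selection during testing can be written as in this short sketch, where probs is assumed to be the probability distribution output by Equation (3):

```python
import torch

def select(probs, training):
    """Sample a selection during training; take the argmax during testing."""
    if training:
        return torch.multinomial(probs, num_samples=1).item()
    return torch.argmax(probs).item()
```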
Still further, the transition of the event-level policy network depends on the selection $o_t^e$. If at a certain time step $o_t^e = NE$, the agent simply continues with a new event-level state; otherwise, a specific event has been detected and the agent launches a new subtask, switching to the argument-level policy network to detect the arguments participating in the current event. Thereafter the agent performs argument-level selections and does not switch back to the event-level policy network until all argument-level selections under the current event $o_{t'}^e$ have been sampled; the event-level policy network then continues sampling selections until the last word in sentence $S$.
The transition of the argument-level policy network depends on the action $a_t^n$. If at a certain time step $a_t^n \in \{E, S\}$, a participating argument of the current event has been identified and the agent transfers to the role-level policy network to classify its argument role; otherwise the agent continues with the argument-level policy network. If the argument-level policy network reaches the end of the sentence, the agent transitions back to the event-level policy network to continue identifying the remaining events.
Still further, in order to optimize the event-level, argument-level and role-level policy networks, the training goal of the hierarchical policy network is to maximize the expected cumulative discounted reward that the agent obtains from the three processes by sampling selections and actions according to its policies at each time step $t$. The expected cumulative discounted reward is calculated as

$J = \mathbb{E}\Big[\sum_{k=0}^{T^e} \gamma^k r_k^e + \sum_{k=0}^{T^n} \gamma^k r_k^n + \sum_{k=0}^{T^r} \gamma^k r_k^r\Big]$

where $\mathbb{E}$ denotes the expectation of the reward under the policy networks, $\gamma \in [0,1]$ is the discount rate, $T^e$ is the total number of time steps elapsed before the event-level process ends, $T^n$ is the end time step of the argument-level process, $T^r$ is the number of time steps elapsed before the role-level process ends, and $r_k^{*}$ is the reward obtained by the corresponding process at time step $k$.
The cumulative reward is then decomposed into Bellman equations relating each selection to the sub-process it launches, and these can be optimized with the REINFORCE algorithm. In the decomposition, the value of an event-level selection consists of its immediate reward, the discounted rewards collected by the argument-level (and nested role-level) sub-process it launches, and the discounted value of the next event-level selection $o_{t+N}$, where $N$ is the number of time steps that the argument-level process lasts under the selection $o_t^e$; if $o_t^e = NE$, then $N = 1$. Since the role-level policy network performs only a single role classification step after an argument-level action $a_t^n$, the exponent of the discount rate $\gamma$ for this step within the argument-level process is 1; and since there is no further step under the role-level policy network, the exponent of the discount rate $\gamma$ there is 0. $R$ denotes the final reward ultimately obtained by each layer of the policy network, and $r$ denotes the immediate reward.
The Bellman equations obtained from this decomposition are optimized with the policy gradient method and the REINFORCE algorithm, yielding stochastic gradients of the standard REINFORCE form

$\nabla_{\Theta} J = \mathbb{E}\big[\, R_t \, \nabla_{\Theta} \log \pi_{\Theta}(a_t \mid s_t) \,\big]$

which are used to update the parameters $\Theta$ of the three policy networks.
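A minimal sketch of such a REINFORCE-style update for one policy network is shown below; the optimizer setup and the way returns are accumulated are assumptions, since the description does not fix these details.

```python
import torch

def reinforce_update(log_probs, rewards, optimizer, gamma=0.95):
    """REINFORCE: accumulate discounted returns and follow the gradient of
    sum_t return_t * log pi(a_t | s_t); log_probs are the log-probabilities
    of the selections/actions actually taken along one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):              # discounted return from each step
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```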
the invention also discloses an electronic device, comprising:
a processor;
and a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the above-described event extraction method via execution of the executable instructions.
Compared with the prior art, the method has the following advantages. First, a hierarchical policy network is applied and a deep reinforcement learning method is used for event extraction. A three-layer hierarchical network, MPNet, is designed to realize joint event extraction: the event-level policy network is used for event detection, the argument-level policy network is used for argument extraction, and the role-level policy network is used for argument role identification. Owing to the hierarchical structural design, MPNet is adept at exploiting the deep information interaction among the subtasks and stands out when processing sentences that contain multiple events. Therefore, the event extraction method has better performance.
Drawings
FIG. 1 shows a schematic flow diagram of an embodiment of the invention;
FIG. 2 shows an algorithm flow diagram of an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example one
Fig. 1 shows a schematic flow chart of the first embodiment of the present invention. The technical scheme of the invention is an event extraction method based on a hierarchical policy network, comprising the following steps:
step 1, constructing a hierarchical policy network, wherein the hierarchical policy network comprises an event-level policy network, an argument-level policy network and a role-level policy network;
step 2, while scanning from the beginning of a sentence to the end of the sentence, the event-level policy network detects a trigger word at each token and classifies the event type of the detected trigger word;
step 3, once a specific event is detected, the argument-level policy network is triggered to scan the sentence from beginning to end to detect the participating arguments of the current event;
step 4, once an argument is identified, the role-level policy network is triggered to predict the role the argument plays under the current event;
step 5, when the role classification of the role-level policy network is finished, the argument-level policy network continues scanning the sentence from the position of the classified argument to find the next argument; once the argument detection of the argument-level policy network under the current event is finished, the event-level policy network continues scanning the sentence from the token position of the current trigger word to detect other events contained in the sentence, until the end of the sentence is reached.
The specific algorithm flow is shown in fig. 2.
Furthermore, an agent is adopted to perform the above steps 2-5. The specific design of the event-level, argument-level and role-level policy networks in this embodiment, including their selections and actions, state representations, policies, rewards, transitions between levels, and the hierarchical training objective with its optimization by the policy gradient method and the REINFORCE algorithm, is the same as described above in the Disclosure of the Invention and is not repeated here.
example two
The invention also discloses an electronic device, comprising:
a processor;
and a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the event extraction method of the first embodiment by executing the executable instructions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (8)

1. An event extraction method based on a hierarchical policy network is characterized by comprising the following steps:
step 1, constructing a hierarchical policy network, wherein the hierarchical policy network comprises an event-level policy network, an argument-level policy network and a role-level policy network;
step 2, while scanning from the beginning of a sentence to the end of the sentence, the event-level policy network detects a trigger word at each token and classifies the event type of the detected trigger word;
step 3, once a specific event is detected, the argument-level policy network is triggered to scan the sentence from beginning to end to detect the participating arguments of the current event;
step 4, once an argument is identified, the role-level policy network is triggered to predict the role the argument plays under the current event;
step 5, when the role classification of the role-level policy network is finished, the argument-level policy network continues scanning the sentence from the position of the classified argument to find the next argument; once the argument detection of the argument-level policy network under the current event is finished, the event-level policy network continues scanning the sentence from the token position of the current trigger word to detect other events contained in the sentence, until the end of the sentence is reached.
2. The event extraction method based on a hierarchical policy network according to claim 1, wherein an agent performs the above steps 2-5; in step 2, as the agent scans the sentence sequentially from beginning to end, the event-level policy network samples a selection according to its policy at each time step, and an event-level selection is either the non-trigger label or a specific event type from the predefined event type set;
in step 3, once a specific event is detected, the agent transfers to the argument-level policy network; while the sentence is scanned from beginning to end, an action is selected according to the policy at each time step, and an argument-level action assigns a specific argument label to the current token;
in step 4, once a specific argument is detected, the agent transfers to the role-level policy network to sample a selection for the current argument according to its policy, and a role-level selection is drawn from the set of role types;
in step 5, after the role classification of the argument is completed, the agent transfers back to the argument-level policy network and continues to scan the remaining tokens of the sentence to identify the remaining arguments of the event; once the agent has finished detecting the participating arguments of the current event, it transfers back to the event-level policy network and continues to scan the remaining sentence to identify other events;
in steps 2-5, once a selection or action is sampled, a reward is returned.
3. The event extraction method based on a hierarchical policy network according to claim 2, wherein given the input text $S = \{w_1, w_2, \ldots, w_L\}$, the purpose of the event-level policy network is to detect each trigger word $w_i$ and the event type it triggers; at the current token, i.e. time step $t$, the event-level policy network adopts a stochastic policy $\mu$ to determine its selection, and the reward obtained afterwards is used to guide the policy learning of the network;
a selection $o_t^e$ of the event-level policy network is sampled from the selection set $O^e = \{NE\} \cup \mathcal{E}$, where $NE$ denotes a token that is not a trigger word, and $\mathcal{E}$ is the set of event types predefined in the data set, used to indicate the event type triggered by the current trigger word;
the state $s_t^e$ of the event-level process is related to past time steps and encodes not only the current input but also the previous environment state; $s_t^e$ is built from the concatenation of three vectors: 1) the state of the last time step $s_{t-1}$, where $s_{t-1} = s_{t-1}^e$ if the agent was running the event-level policy process at time step $t-1$ and $s_{t-1} = s_{t-1}^r$ otherwise, $s_{t-1}^e$ denoting the environment state of the event-level policy network at time step $t-1$ and $s_{t-1}^r$ the environment state of the role-level policy network at time step $t-1$; 2) the event type vector $v_t^e$, learned from the last selection satisfying $o^e \neq NE$; and 3) the hidden state vector $h_t$ over the current input word vector $w_t$, obtained by processing the token sequence with a Bi-LSTM, $h_t = \mathrm{Bi\text{-}LSTM}(w_1, \ldots, w_L)_t$; finally, a multi-layer perceptron MLP maps the concatenation $[\,s_{t-1};\ v_t^e;\ h_t\,]$ to a continuous real-valued state vector $s_t^e$;
the stochastic policy in the event-level policy network, i.e. the policy for making a selection, is $\mu: S^e \to O^e$; it samples a selection $o_t^e$ according to the probability distribution $\mu(o_t^e \mid s_t^e) = \mathrm{softmax}(W^e s_t^e + b^e)$, where $W^e$ and $b^e$ are parameters and $s_t^e$ is the state representation vector;
the ultimate purpose of the reward of the event-level policy network is to identify and classify events, and whether a single trigger word is correct is only an intermediate result; once an event-level selection $o_t^e$ is sampled, the agent receives an immediate reward reflecting the quality of the selection, obtained by comparing $o_t^e$ with the gold annotation $y_t^e$ of the event type in sentence $S$: $r_t^e = I(NE) \cdot \mathrm{sgn}(o_t^e = y_t^e)$, where $\mathrm{sgn}(\cdot)$ is the sign function and $I(NE)$ is a switch function distinguishing the reward of trigger words from that of non-trigger words, with $I(NE) = \alpha$ if $o_t^e = NE$ and $I(NE) = 1$ otherwise, where $\alpha$ is a bias weight, $\alpha < 1$; the smaller $\alpha$ is, the smaller the reward obtained for identifying a non-trigger word, which prevents the model from learning the trivial policy of predicting every word as $NE$, i.e. as a non-trigger word;
when the event-level policy network has sampled selections up to the last word in sentence $S$ and the agent has finished all event-level selections, a final reward $r_{fin}^e$ is obtained; the delayed reward of this final state is defined by the sentence-level event detection performance, $r_{fin}^e = F_1(\mathcal{E}_S)$, where $F_1(\cdot)$ denotes the $F_1$ score of the sentence-level event detection result, i.e. the harmonic mean of the sentence-level precision and recall.
4. The hierarchical-policy-network-based event extraction method according to claim 3, wherein in step 3, when a specific event o_t^e is detected at time step t, the agent transfers to the argument-level policy network to predict each argument participating in the event e_{t'}; the argument-level policy network adopts a stochastic policy π to select an action at each word/time step t and uses rewards to guide the learning of arguments under the current event; in order to deliver finer-grained event information to assist the argument decisions, the selection o_{t'}^e and the state representation s_{t'}^e from the event-level process are used as additional inputs by the entire argument-level process;
the action a_t^n of the argument-level policy network assigns a particular argument label to the current word; a_t^n is taken from the action space A^n = {B, I, O, E, S} ∪ {N}, where B/I/E denote the position of the current word within an argument (B a start position, I an intermediate position, E an end position), O marks an argument that is irrelevant to the current event, S denotes a single-word argument, and N denotes a non-argument word; because the event types differ at different time steps, the same argument may be given different labels; in this way, the multiple-event and mismatch problems are handled quite naturally;
the state s_t^n of the argument-level policy network process is related to past time steps, encoding not only the current input but also the previous environment state and the environment information of the initiating event type; s_t^n is the concatenation of four vectors: 1) the state s_{t-1} of the last time step, where s_{t-1} may come from the event-level, argument-level or role-level policy network; 2) the argument label vector v_t^n, which is learned from the action a_{t-1}^n; 3) the event state representation s_{t'}^e; 4) the hidden state vector h_t, obtained by the same Bi-LSTM processing as in Equation 1; in this way, s_t^n is expressed as the concatenation [s_{t-1}; v_t^n; s_{t'}^e; h_t], and finally a multi-layer perceptron MLP is used to represent the state as a continuous real-valued vector s_t^n;
with the event type e_{t'} as an additional input, the stochastic policy for argument detection, i.e. the policy that takes an action, π: S^n → A^n, selects an action a_t^n according to the probability distribution
π(a_t^n | s_t^n) = softmax(W^n [s_t^n; v_{e_{t'}}] + b^n)
v_{e_{t'}} = W^μ e_{t'}
where W^n and b^n are parameters, s_t^n is the argument-level state representation vector, v_{e_{t'}} is the representation of the event e_{t'}, and W^μ is an embedding matrix over the event type set ε through which the event e_{t'} is mapped to obtain the event representation vector;
once an argument-level action a_t^n is selected, the agent receives an immediate reward r_t^n; this reward is related to the predicted event type e_{t'}: the action is compared with the gold argument label y_t^n under that event, and the reward is computed as follows:
r_t^n = I(N) · sgn(a_t^n = y_t^n)
where I(N) is a switch function that distinguishes the rewards of argument words and non-argument words:
I(N) = β if the gold label is N, and 1 otherwise
where β is a bias weight with β < 1; the smaller β is, the smaller the reward obtained for non-argument words, which prevents the agent from learning the trivial policy of setting all actions to N;
the agent continues to select an action for each word until the action for the last word; when the agent has finished all argument-level action selections under the current event e_{t'}, a final reward R^n is obtained, which is defined by the argument-detection performance under the current event;
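The argument-level tagging scheme and reward of claim 4 can be illustrated with the following sketch; the label set follows the claim, while the span decoder, the toy sentence, and the value of β are illustrative assumptions:

    # Hypothetical span decoder for the argument-level action space A^n = {B, I, O, E, S} ∪ {N};
    # tag sequences and event names below are illustrative only.
    LABELS = ["B", "I", "O", "E", "S", "N"]

    def decode_arguments(tags):
        """Turn a per-word tag sequence into (start, end) argument spans.
        'O' and 'N' words never open a span; 'S' is a single-word argument."""
        spans, start = [], None
        for i, tag in enumerate(tags):
            if tag == "S":
                spans.append((i, i)); start = None
            elif tag == "B":
                start = i
            elif tag == "E" and start is not None:
                spans.append((start, i)); start = None
            elif tag in ("O", "N"):
                start = None
        return spans

    def argument_reward(action, gold, beta=0.2):
        """Immediate reward r_t^n with the bias weight beta applied to non-argument (N) words."""
        weight = beta if gold == "N" else 1.0
        return weight * (1.0 if action == gold else -1.0)

    # The same sentence can receive different tag sequences under different events,
    # which is how the multiple-event / mismatch problem is handled:
    tags_under_attack_event    = ["N", "S", "N", "B", "E", "N"]
    tags_under_transport_event = ["N", "N", "N", "S", "N", "N"]
    print(decode_arguments(tags_under_attack_event))     # [(1, 1), (3, 4)]
    print(decode_arguments(tags_under_transport_event))  # [(3, 3)]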
5. The method according to claim 4, wherein in step 4, when a participating argument is detected at time step t, i.e. when the argument-level action a_t^n completes an argument, the agent transfers to the role-level policy network to predict the role that this argument plays in the event e_{t'}; specifically, at each word/time step t the role-level policy network adopts a stochastic policy μ to make a selection and uses rewards to guide the learning of the roles of the participating arguments under the current event; in order to deliver finer-grained event and argument information to assist the role decision, the selection o_{t'}^e and the action a_t^n are used as additional inputs by the entire role-level process;
the selection o_t^r of the role-level policy network classifies an argument role for the current argument, and selections are taken from the argument role set, i.e. O^r = R, where R is the predefined set of argument roles;
the state s_t^r of the role-level process is also related to past time steps, encoding not only the current input but also the previous environment state; s_t^r is the concatenation of three vectors: 1) the state s_{t-1} of the last time step; 2) the argument role vector v_t^r, which is learned from the selected role-level selection o^r; 3) the hidden state vector h_t, obtained by the same Bi-LSTM processing as in Equation 1; in this way, s_t^r is expressed as the concatenation [s_{t-1}; v_t^r; h_t], and finally a multi-layer perceptron MLP is used to represent the state as a continuous real-valued vector s_t^r;
in order to define the policy over the role set, scores for all argument roles are first computed from the role-level state representation s_t^r, the representation vector of the current argument, and the hidden state vector h_π over the input word vectors;
in addition, a matrix M ∈ {0,1}^{|ε|×|R|} based on the event schema is designed, where M[e][r] = 1 if and only if event e has the role r in the event schema information; this matrix is used to filter out argument roles that are unlikely to participate in the current event;
then, the stochastic policy for role detection, μ: S^r → O^r, makes a selection o_t^r according to the probability distribution obtained by a softmax over the role scores masked by M[e_{t'}], where W^r and b^r are parameters;
once a role-level selection o_t^r has been executed, the agent receives an immediate reward r_t^r; this reward is obtained by comparing the selection with the gold role label y_t^r under the current event type, and is computed as follows:
r_t^r = sgn(o_t^r = y_t^r)
since the role-level selection is performed in only one step under the argument-level action a_t^n, its final reward coincides with this immediate reward;
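The schema-constrained role selection of claim 5 can be illustrated with the following sketch; the event and role inventories, and the use of an explicit -inf mask before the softmax, are illustrative assumptions rather than the claimed formulation:

    import torch

    # Illustrative event and role inventories (not taken from the patent).
    EVENTS = ["Attack", "Transport"]
    ROLES = ["Attacker", "Target", "Instrument", "Artifact", "Destination"]

    # M[e][r] = 1 iff role r is defined for event e in the event schema.
    M = torch.tensor([[1, 1, 1, 0, 0],    # Attack
                      [0, 0, 0, 1, 1]],   # Transport
                     dtype=torch.float)

    def role_distribution(scores, event_index):
        """Mask the role scores with the schema row of the current event, then softmax."""
        mask = M[event_index]
        masked = scores.masked_fill(mask == 0, float("-inf"))
        return torch.softmax(masked, dim=-1)       # μ(o_t^r | s_t^r), zero mass on invalid roles

    scores = torch.randn(len(ROLES))               # stand-in for the computed role scores
    probs = role_distribution(scores, EVENTS.index("Attack"))
    print(probs)                                   # Artifact/Destination get probability 0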
6. The hierarchical-policy-network-based event extraction method according to any one of claims 3 to 5, wherein during training the event-level policy network samples its selection according to the probability distribution in Equation 3, while during testing the most probable selection of the event-level policy network is taken, i.e. o_t^e = argmax_{o ∈ O^e} μ(o | s_t^e); during training and testing of the argument-level and role-level policy networks, the actions and selections are obtained by sampling in the same manner as in the event-level policy network;
the transition of the event-level policy network depends on the selection o_t^e: if o_t^e = NE at a certain time step, the agent continues with a new event-level policy network state; otherwise, meaning that a particular event has been detected, the agent launches a new subtask and switches to the argument-level policy network to detect the arguments participating in the current event; thereafter the agent performs argument-level selections and does not switch back to the event-level policy network until all argument-level selections under the current event e_{t'} have been sampled, after which the event-level policy network continues sampling selections until the last word of sentence S;
the transition of the argument-level policy network depends on the action a_t^n: if the action at a certain time step identifies a participating argument of the current event, the agent transfers to the role-level policy network to classify the argument role; otherwise the agent continues with the argument-level policy network; if the argument-level policy network executes to the end of the sentence, the agent transitions back to the event-level policy network to continue identifying the remaining events.
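A minimal control-flow sketch of the transitions just described; the three sampling stubs stand in for the trained policy networks and are purely illustrative assumptions, so only the switching logic mirrors claim 6:

    import random

    EVENT_TYPES = ["NE", "Attack", "Transport"]
    ARG_ACTIONS = ["B", "I", "O", "E", "S", "N"]
    ROLES = ["Attacker", "Target", "Artifact"]

    def sample_event_option(word):           return random.choice(EVENT_TYPES)
    def sample_argument_action(word, event): return random.choice(ARG_ACTIONS)
    def sample_role_option(span, event):     return random.choice(ROLES)

    def extract(sentence):
        results = []
        for t, word in enumerate(sentence):                  # event-level pass over the sentence
            option = sample_event_option(word)
            if option == "NE":
                continue                                     # stay at the event level
            # a specific event was detected: launch the argument-level subtask
            start = None
            for i, w in enumerate(sentence):                 # argument-level pass under this event
                action = sample_argument_action(w, option)
                if action == "B":
                    start = i
                elif action == "S" or (action == "E" and start is not None):
                    span = (i, i) if action == "S" else (start, i)
                    role = sample_role_option(span, option)  # one-step role-level subtask
                    results.append((option, span, role))
                    start = None
                elif action in ("O", "N"):
                    start = None
            # argument-level subtask finished: control returns to the event level
        return results

    print(extract("the troops attacked the city yesterday".split()))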
7. The method according to claim 6, wherein, in order to optimize the event-level policy network, the argument-level policy network and the role-level policy network, the objective of the hierarchical training of the hierarchical policy network is to maximize the expected cumulative discounted reward of the three stages that the agent obtains at each time step t when sampling selections and actions according to the policies; the expectation of the cumulative discounted reward is computed over the rewards of the event-level, argument-level and role-level processes, where E[·] denotes the expectation of the reward under the policy networks, γ ∈ [0,1] is the discount rate, T^e is the total number of time steps elapsed before the event-level process ends, T^n is the end time step of the argument-level process, T^r is the number of time steps elapsed before the role-level process ends, and r_k is the reward obtained by the corresponding process at time step k;
the cumulative reward is then decomposed into Bellman equations, which can subsequently be optimized with the REINFORCE algorithm; in the decomposed Bellman equations, N is the number of time steps that the argument-level process lasts under the selection o_t^e, so the next selection of the agent is o_{t+N}; if o_t^e = NE, then N = 1; since the role-level selection involves only a one-step role classification under the argument-level action a_t^n, the exponent of the discount rate γ in the argument-level process is 1, and since no further step follows under the role-level policy network, the exponent of the discount rate γ there is 0; R denotes the final reward ultimately obtained by each level of the policy network, and r denotes the immediate reward;
the Bellman equations obtained by the decomposition are optimized with the policy-gradient method and the REINFORCE algorithm, yielding stochastic gradients for updating the parameters, in which the gradient of the log-probability of each sampled selection or action is weighted by the corresponding cumulative discounted reward.
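A bare-bones REINFORCE update for a single level of the hierarchy, written as a sketch; the toy two-option policy, learning rate, discount rate, and reward are illustrative assumptions, not the patent's training configuration:

    import torch

    def reinforce_loss(log_probs, rewards, gamma=0.95):
        """Negative expected discounted return; minimizing it ascends the policy gradient.
        log_probs are log μ(o_t | s_t) of the sampled selections, rewards the per-step rewards
        (with any delayed final reward added at the last step)."""
        returns, g = [], 0.0
        for r in reversed(rewards):                 # cumulative discounted return G_t
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        return -(torch.stack(log_probs) * returns).sum()

    # toy usage with a two-parameter "policy" over two options
    logits = torch.zeros(2, requires_grad=True)
    opt = torch.optim.SGD([logits], lr=0.1)
    dist = torch.distributions.Categorical(logits=logits)
    actions = [dist.sample() for _ in range(3)]
    log_probs = [dist.log_prob(a) for a in actions]
    rewards = [1.0 if a.item() == 0 else -1.0 for a in actions]   # illustrative reward signal
    loss = reinforce_loss(log_probs, rewards)
    opt.zero_grad(); loss.backward(); opt.step()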
8. An event extraction electronic device based on a hierarchical policy network, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform, via execution of the executable instructions, the hierarchical-policy-network-based event extraction method of any one of claims 1 to 7.
CN202110022760.2A 2021-01-08 2021-01-08 Event extraction method and device based on hierarchical policy network Active CN112836504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110022760.2A CN112836504B (en) 2021-01-08 2021-01-08 Event extraction method and device based on hierarchical policy network

Publications (2)

Publication Number Publication Date
CN112836504A true CN112836504A (en) 2021-05-25
CN112836504B CN112836504B (en) 2024-02-02

Family

ID=75928654

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110022760.2A Active CN112836504B (en) 2021-01-08 2021-01-08 Event extraction method and device based on hierarchical policy network

Country Status (1)

Country Link
CN (1) CN112836504B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200380210A1 (en) * 2018-07-03 2020-12-03 Tencent Technology (Shenzhen) Company Limited Event Recognition Method and Apparatus, Model Training Method and Apparatus, and Storage Medium
CN109582949A (en) * 2018-09-14 2019-04-05 阿里巴巴集团控股有限公司 Event element abstracting method, calculates equipment and storage medium at device
CN110704598A (en) * 2019-09-29 2020-01-17 北京明略软件系统有限公司 Statement information extraction method, extraction device and readable storage medium
CN111382575A (en) * 2020-03-19 2020-07-07 电子科技大学 Event extraction method based on joint labeling and entity semantic information
CN112183030A (en) * 2020-10-10 2021-01-05 深圳壹账通智能科技有限公司 Event extraction method and device based on preset neural network, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG Peixin; ZHAO Xiang; FANG Yang; ZHU Huiming; XIAO Weidong: "End-to-end joint extraction of knowledge triples incorporating adversarial training", Journal of Computer Research and Development, no. 12, pages 2536-2548 *

Also Published As

Publication number Publication date
CN112836504B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
Lu et al. Transfer learning using computational intelligence: A survey
CN111143576A (en) Event-oriented dynamic knowledge graph construction method and device
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
CN108536784B (en) Comment information sentiment analysis method and device, computer storage medium and server
Wang et al. Exploiting topic-based adversarial neural network for cross-domain keyphrase extraction
CN114756687A (en) Self-learning entity relationship combined extraction-based steel production line equipment diagnosis method
CN115131613B (en) Small sample image classification method based on multidirectional knowledge migration
CN111582506A (en) Multi-label learning method based on global and local label relation
CN118170668A (en) Test case generation method, device, storage medium and equipment
Barbhuiya et al. Gesture recognition from RGB images using convolutional neural network‐attention based system
CN114419394A (en) Method and device for recognizing semantic soft label image with limited and unbalanced data
CN114048361A (en) Crowdsourcing software developer recommendation method based on deep learning
Shen et al. Progress-aware online action segmentation for egocentric procedural task videos
Shen et al. Active learning for event extraction with memory-based loss prediction model
CN116630708A (en) Image classification method, system, equipment and medium based on active domain self-adaption
CN112836504A (en) Event extraction method and device based on hierarchical policy network
Li et al. Variance tolerance factors for interpreting all neural networks
CN115132280A (en) Causal network local structure discovery system based on weak prior knowledge
Li Textual Data Mining for Financial Fraud Detection: A Deep Learning Approach
CN114842246B (en) Social media pressure type detection method and device
CN116610783B (en) Service optimization method based on artificial intelligent decision and digital online page system
US20230306769A1 (en) Model Generation System and Model Generation Method
Schöner Detecting Uncertainty in Text Classifications: A Sequence to Sequence Approach using Bayesian RNNs
Sharma et al. Optimizing Text Data in Deep Learning: An Experimental Approach
Laurelli Adaptive Meta-Domain Transfer Learning (AMDTL): A Novel Approach for Knowledge Transfer in AI

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant