CN112836504B - Event extraction method and device based on hierarchical policy network - Google Patents
- Publication number
- CN112836504B (grant), application CN202110022760.2A
- Authority
- CN
- China
- Prior art keywords
- event
- argument
- level
- policy network
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F40/284 — Handling natural language data; natural language analysis; recognition of textual entities; lexical analysis, e.g. tokenisation or collocates
- G06F18/2415 — Pattern recognition; classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/04 — Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
Abstract
The invention discloses an event extraction method and device based on a hierarchical policy network, the method comprising the following steps: a hierarchical policy network is constructed; while scanning a sentence from its first word to its last, the event-level policy network detects trigger words at each word position and classifies the event type of each detected trigger word; once a specific event is detected, the argument-level policy network is triggered and scans the sentence from beginning to end to detect the participating arguments of the current event; once an argument is identified, the role-level policy network is triggered to predict the role this argument plays in the current event; when the role prediction is completed, the argument-level policy network continues scanning the sentence from the position of the current argument to detect the remaining arguments of the event, until the end of the sentence is reached; the event-level policy network then continues scanning the sentence from the trigger-word position of the current event to detect the other events contained in the sentence, until the end of the sentence is reached.
Description
Technical Field
The present invention relates to the field of text event extraction technology in natural language processing, and in particular, to an event extraction method and apparatus based on a hierarchical policy network.
Background
Event Extraction (EE) plays a very important role in many downstream natural language processing applications such as information retrieval and news summarization. The purpose of event extraction is to discover events triggered by specific trigger words, together with the arguments of those events. Generally, event extraction comprises several subtasks: trigger word recognition, trigger word classification, event argument recognition and argument role classification.
Some existing event extraction works employ pipelined methods to handle these subtasks, namely event detection (including event trigger word recognition and classification) and event argument classification in separate stages. These methods generally assume that the entity information in the text has already been annotated (non-patent literature: McClosky et al., 2011; Chen et al., 2015; Yang et al., 2019). However, such staged extraction models have no mechanism for fully exploiting the information interaction between subtasks, and the event extraction subtasks cannot exchange information to promote each other's decisions. Although some joint models perform event extraction by building a joint extractor (non-patent literature: Yang and Mitchell, 2016; Nguyen and Nguyen, 2019; Zhang et al., 2019), these models still follow a pipelined framework in essence: they first jointly identify entities and trigger words, and then examine each entity-event pair to identify arguments and argument roles. In addition, in the prior art the policy gradient method (Sutton et al., 1999) and the REINFORCE algorithm (Williams, 1992) can be used for parameter optimization of the event detection model.
One problem these models face is that they all produce redundant entity-event pair information and thus introduce possible errors; another is that, when a sentence contains multiple events, arguments may be matched to the wrong trigger words, degrading the quality of event extraction.
Consider, for example, the following sentence: "In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel." In this sentence, "cameraman" is not only the Victim argument of the Die event (trigger word "died"), but also the Target argument of the Attack event (trigger word "fired"). However, since "cameraman" is far from the trigger word "fired" in the text, an event extractor is likely to fail to recognize "cameraman" as an argument of the Attack event.
Details of non-patent literature:
David McClosky, Mihai Surdeanu, and Christopher D. Manning. 2011. Event extraction as dependency parsing. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June 2011, Portland, Oregon, USA, pages 1626–1635.
Teruko Mitamura, Zhengzhong Liu, and Eduard H. Hovy. 2015. Overview of TAC KBP 2015 event nugget track. In Proceedings of the 2015 Text Analysis Conference, TAC 2015, Gaithersburg, Maryland, USA, November 16-17, 2015.
Trung Minh Nguyen and Thien Huu Nguyen. 2019. One for all: Neural joint modeling of entities and events. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, Honolulu, Hawaii, USA, January 27-February 1, 2019, pages 6851–6858.
Bishan Yang and Tom M. Mitchell. 2016. Joint extraction of events and entities within a document context. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA, June 12-17, 2016, pages 289–299.
Junchi Zhang, Yanxia Qin, Yue Zhang, Mengchi Liu, and Donghong Ji. 2019. Extracting entities and events as a single task using a transition-based neural model. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 5422–5428.
Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour. 1999. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, NIPS 1999, Denver, Colorado, USA, pages 1057–1063.
Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256.
Disclosure of the Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention discloses an event extraction method and device based on a hierarchical policy network. The method proposes a hierarchical policy network (Multi-layer Policy Network, MPNet) to jointly model the subtasks of event extraction. MPNet contains an event-level policy network, an argument-level policy network and a role-level policy network, which address the event detection, event argument identification and argument role classification tasks, respectively.
The technical solution of the invention is as follows. An event extraction method based on a hierarchical policy network comprises the following steps:
Step 1, constructing a hierarchical policy network, wherein the hierarchical policy network comprises an event-level policy network, an argument-level policy network and a role-level policy network;
Step 2, while scanning a sentence from its first word to its last, the event-level policy network detects trigger words at each word position and classifies the event type of each detected trigger word;
Step 3, once a specific event is detected, the argument-level policy network is triggered and starts scanning the sentence from beginning to end to detect the participating arguments of the current event;
Step 4, once an argument is identified, the role-level policy network is triggered to predict the role played by this argument in the current event;
Step 5, when the role classification of the role-level policy network is completed, the argument-level policy network continues scanning the sentence to search for the next argument; once the argument-level policy network under the current event has completed argument detection, the event-level policy network continues scanning the sentence from the trigger-word position of the current event to detect the other events contained in the sentence, until the end of the sentence is reached.
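The cooperation of the three policy networks in steps 2-5 amounts to a nested scan over the sentence. The following Python sketch illustrates only this control flow; the callables event_option, argument_action and role_option are hypothetical stand-ins for the three trained policy networks and are not part of the patented implementation.

```python
def extract_events(tokens, event_option, argument_action, role_option):
    """Nested scan of a sentence by the three-level policy hierarchy (steps 2-5).

    event_option(i)           -> event type triggered at token i, or None for NE
    argument_action(j, event) -> argument tag for token j ('N' = non-argument word)
    role_option(span, event)  -> role played by the argument span in the event
    All three callables are hypothetical stand-ins for the trained policy networks.
    """
    events = []
    for i in range(len(tokens)):                       # event-level scan (step 2)
        event_type = event_option(i)
        if event_type is None:                         # NE: no trigger at this word
            continue
        arguments = []
        for j in range(len(tokens)):                   # argument-level scan (step 3)
            tag = argument_action(j, event_type)
            if tag == 'N':                             # non-argument word
                continue
            role = role_option((j, tag), event_type)   # role-level prediction (step 4)
            arguments.append(((j, tag), role))         # then resume the argument scan (step 5)
        events.append({'trigger': i, 'type': event_type, 'arguments': arguments})
    return events
```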
Further, steps 2-5 are executed by an agent. In step 2, as the agent scans the sentence from beginning to end, the event-level policy network keeps sampling selections according to its policy at every time step; the event-level selection space consists of the non-trigger label and the set of specific event types for trigger words.
In step 3, when a specific event has been detected, the agent transfers to the argument-level policy network; while the sentence is scanned from beginning to end, an action is selected according to the policy at every time step, and an argument-level action assigns a specific argument tag to the current word.
In step 4, when a specific argument has been detected, the agent transfers to the role-level policy network and samples a selection for the current argument according to its policy; the role-level selection space is the set of role types.
In step 5, after the role classification of the argument is completed, the agent transfers back to the argument-level policy network to continue scanning the rest of the sentence and identify the remaining arguments of the event; once the agent has finished detecting the participating arguments of the current event, it transfers back to the event-level policy network to continue scanning the rest of the sentence and identify other events.
In steps 2-5, a reward is returned whenever a selection or action is sampled.
Specifically, given the input text S = w_1, w_2, ..., w_L, the purpose of the event-level policy network is to detect the event type triggered by a trigger word w_i. At the current word, i.e. time step t, the event-level policy network takes a stochastic policy μ to determine the selection, and the obtained rewards are then used to guide the policy learning of the policy network.
The selection o_t^e of the event-level policy network is sampled from the selection set O^e = {NE} ∪ ε, where NE denotes a word that is not a trigger word and ε is the predefined set of event types in the data set, used to indicate the event type triggered by the current trigger word.
The state s_t^e of the event-level policy network process depends on the past time steps: it encodes not only the current input but also the previous environment state. s_t^e is built from the concatenation of the following three vectors: 1) the state s_{t-1} of the previous time step, where, if the agent ran the event-level policy process at time step t-1, s_{t-1} = s_{t-1}^e, and otherwise s_{t-1} = s_{t-1}^r; here s_{t-1}^e denotes the environment state of the event-level policy network at time step t-1 and s_{t-1}^r denotes the environment state of the role-level policy network at time step t-1; 2) the event type vector v_{t'}^e, which is learned from the latest event-level selection; 3) the hidden state vector h_t for the current input word vector w_t, obtained by running a Bi-LSTM over the word vectors of the text:
h_t = Bi-LSTM(w_1, ..., w_L)_t   (Formula 1)
The raw state is therefore the concatenation [s_{t-1}; v_{t'}^e; h_t], and a multi-layer perceptron MLP finally maps it to a continuous real-valued state representation vector:
s_t^e = MLP([s_{t-1}; v_{t'}^e; h_t])   (Formula 2)
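As a concrete illustration of the state construction described above, the following sketch (using PyTorch) encodes the sentence with a Bi-LSTM and builds the event-level state by concatenating the previous state, the embedding of the latest event-level selection and the current hidden vector, then mapping the result through an MLP. All dimensions, module names and the tanh activation are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn

class EventLevelStateEncoder(nn.Module):
    """Sketch of the event-level state: s_t^e = MLP([s_{t-1}; v_o; h_t])."""

    def __init__(self, word_dim=100, hidden_dim=128, state_dim=128, num_options=34):
        super().__init__()
        # Bi-LSTM over the word vectors of the sentence (Formula 1)
        self.bilstm = nn.LSTM(word_dim, hidden_dim // 2,
                              bidirectional=True, batch_first=True)
        # v_o: a learned embedding for every selection in O^e = {NE} ∪ ε (num_options is assumed)
        self.option_emb = nn.Embedding(num_options, state_dim)
        # MLP mapping the concatenation to the state representation (Formula 2)
        self.mlp = nn.Sequential(
            nn.Linear(state_dim + state_dim + hidden_dim, state_dim), nn.Tanh())

    def encode_sentence(self, word_vectors):             # word_vectors: (L, word_dim)
        h, _ = self.bilstm(word_vectors.unsqueeze(0))
        return h.squeeze(0)                               # (L, hidden_dim): h_t per position

    def forward(self, prev_state, last_option, h_t):      # last_option: LongTensor index
        v_o = self.option_emb(last_option)                 # vector of the latest selection
        return self.mlp(torch.cat([prev_state, v_o, h_t], dim=-1))  # s_t^e
```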
The stochastic policy of the event-level policy network, i.e. the policy for making a selection, is μ: S^e → O^e; it samples a selection o_t^e according to the following probability distribution:
μ(o_t^e | s_t^e) = softmax(W_e · s_t^e + b_e)   (Formula 3)
where W_e and b_e are parameters and s_t^e is the state representation vector.
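A minimal sketch of the stochastic event-level policy just defined: the state vector is projected with the parameters W_e and b_e, normalised by a softmax, and a selection is sampled from the resulting distribution. The NumPy names and shapes are assumptions made for illustration.

```python
import numpy as np

def sample_event_selection(state, W_e, b_e, rng=np.random.default_rng()):
    """Sample o_t^e ~ softmax(W_e @ s_t^e + b_e)  (Formula 3).

    state : (state_dim,) event-level state vector s_t^e
    W_e   : (|O^e|, state_dim) weight matrix, b_e : (|O^e|,) bias
    Returns the index of the sampled selection in O^e = {NE} ∪ ε.
    """
    logits = W_e @ state + b_e
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```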
The ultimate purpose of rewarding the event-level policy network is to identify and classify events; whether a trigger word is correct is only an intermediate result. Once an event-level selection o_t^e has been sampled, the agent receives an immediate reward that reflects the quality of the selection. This immediate reward is computed from the gold annotation y_t^e as:
r_t^e = α^{I(NE)} · sgn(o_t^e = y_t^e)
where sgn(·) is the sign function, equal to +1 when the selection matches the annotation and -1 otherwise, and I(NE) is a switching function that distinguishes the rewards of trigger words from those of non-trigger words:
I(NE) = 1 if y_t^e = NE, and 0 otherwise
where α is a bias weight with α < 1; the smaller α is, the smaller the reward obtained for identifying a non-trigger word, which prevents the model from learning the trivial strategy of predicting every word as NE, i.e. as a non-trigger word.
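The immediate event-level reward described above can be computed as in the following sketch; the encoding of the gold label y_t^e, the value of α and the string "NE" are illustrative assumptions.

```python
def event_level_reward(selection, gold, alpha=0.1, NE="NE"):
    """Immediate event-level reward: +1 / -1 for matching / missing the gold label,
    scaled down by alpha < 1 when the gold label is the non-trigger class NE."""
    base = 1.0 if selection == gold else -1.0     # sgn(o_t^e = y_t^e)
    return alpha * base if gold == NE else base   # I(NE) switches the alpha scaling on
```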
When the event-level policy network has sampled selections up to the last word of the sentence S, i.e. has finished all event-level selections, the agent receives a final reward r_fin^e. This delayed reward of the final state is defined by the sentence-level event detection performance:
r_fin^e = F_1(sentence-level event detection)
where F_1(·) denotes the F1 score of the sentence-level event detection result, i.e. the harmonic mean of the sentence-level precision and recall.
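The delayed final reward is the sentence-level F1 of event detection, i.e. the harmonic mean of precision and recall over the predicted (trigger position, event type) pairs; a straightforward sketch, assuming predictions and gold annotations are given as sets of such pairs:

```python
def sentence_level_f1(predicted, gold):
    """F1 of sentence-level event detection.

    predicted, gold : sets of (trigger_position, event_type) pairs.
    """
    if not predicted or not gold:
        return 0.0
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)   # harmonic mean
```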
Specifically, in step 3, when a specific event is detected at time step t', i.e. o_{t'}^e ∈ ε, the agent transfers to the argument-level policy network to predict each argument participating in the event o_{t'}^e. At each word, i.e. time step t, the argument-level policy network takes a stochastic policy π to choose an action, and rewards are used to guide the learning of the participating arguments of the current event. To assist argument decisions by conveying more fine-grained event information, the selection o_{t'}^e and the state representation s_{t'}^e from the event-level process are used as additional inputs throughout the argument-level process.
An action a_t^n of the argument-level policy network assigns a specific argument tag to the current word; a_t^n is selected from the action space A^n = {B, I, O, E, S} ∪ {N}, where B/I/E indicate the position of the current word inside an argument (B the start position, I an intermediate position, E the end position), O marks a word of an argument that is unrelated to the current event, S marks a single-word argument and N marks a non-argument word. At different time steps the same argument may be given different labels because the event types differ; in this way the multiple-event and mismatch problems are solved very naturally.
The state s_t^n of the argument-level policy network process also depends on the past time steps: it encodes not only the current input but also the previous environment state and the environment information of the initiating event type. s_t^n is built from the concatenation of the following four vectors: 1) the state s_{t-1} of the previous time step, which may come from the event-level, the argument-level or the role-level policy network; 2) the argument tag vector v_t^n, learned from the latest argument-level action; 3) the event state representation s_{t'}^e; 4) the hidden state vector h_t, obtained from the same Bi-LSTM as in Formula 1. A multi-layer perceptron MLP finally maps the concatenation to a continuous real-valued state representation vector:
s_t^n = MLP([s_{t-1}; v_t^n; s_{t'}^e; h_t])
With the event type o_{t'}^e as an additional input, the stochastic policy for argument detection, i.e. the policy for taking an action, is π: S^n → A^n; it selects an action a_t^n according to the following probability distribution:
π(a_t^n | s_t^n, o_{t'}^e) = softmax(W_n · s_t^n + W_μ[o_{t'}^e] + b_n)
where W_n and b_n are parameters, s_t^n is the argument-level state representation vector, W_μ is a matrix with |ε| rows, and W_μ[o_{t'}^e] is the representation vector of the event o_{t'}^e, obtained by mapping the event through this matrix.
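The argument-level policy additionally conditions on the detected event type: in the sketch below the event index is mapped through the matrix W_μ to an event representation that is added to the projected state before the softmax. Whether the event representation enters additively in this way is an assumption for illustration, as are the NumPy names and shapes.

```python
import numpy as np

def sample_argument_action(state, event_index, W_n, b_n, W_mu,
                           rng=np.random.default_rng()):
    """Sample an argument-level action a_t^n over A^n = {B, I, O, E, S, N}.

    state       : (state_dim,) argument-level state vector s_t^n
    event_index : index of the detected event type o_{t'}^e in ε
    W_n         : (|A^n|, state_dim), b_n : (|A^n|,)
    W_mu        : (|ε|, |A^n|); W_mu[event_index] is the event representation
                  used here as an additional (additive) input to the policy.
    """
    logits = W_n @ state + W_mu[event_index] + b_n
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```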
Once an argument-level action a_t^n has been selected, the agent receives an immediate reward r_t^n. This reward is computed by comparing the action with the gold annotation y_t^n of the word under the predicted event type o_{t'}^e:
r_t^n = β^{I(N)} · sgn(a_t^n = y_t^n)
where I(N) is a switching function that distinguishes the rewards of argument words from those of non-argument words:
I(N) = 1 if y_t^n = N, and 0 otherwise
where β is a bias weight with β < 1; a smaller β means that non-argument words yield smaller rewards, which prevents the agent from learning the trivial strategy of setting every action to N.
The agent keeps selecting an action for every word until the last word of the sentence; when the agent has finished selecting the argument-level actions for all arguments of the current event o_{t'}^e, it receives a final reward r_fin^n.
Specifically, in step 4, when a participating argument is detected at time step t, i.e. the action a_t^n indicates an argument, the agent transfers to the role-level policy network to predict the role played by this argument in the event o_{t'}^e. At the current word, i.e. time step t, the role-level policy network takes a stochastic policy μ to make a selection, and rewards are used to guide the learning of the argument's role in the current event. To assist the argument-role decision by conveying more fine-grained event and argument information, the selection o_{t'}^e and the action a_t^n are used as additional inputs throughout the role-level process.
The selection o_t^r of the role-level policy network classifies the current argument into an argument role; the selection space is the set of argument roles, i.e. O^r = R, where R is the predefined set of argument roles.
The state s_t^r of the role-level process also depends on the past time steps: it encodes not only the current input but also the previous environment state. s_t^r is built from the concatenation of the following three vectors: 1) the state s_{t-1} of the previous time step; 2) the argument role vector v_t^r, learned from the latest role-level selection; 3) the hidden state vector h_t, obtained from the same Bi-LSTM as in Formula 1. A multi-layer perceptron MLP finally maps the concatenation to a continuous real-valued state representation vector:
s_t^r = MLP([s_{t-1}; v_t^r; h_t])
To define the policy over the set of roles, the scores of all argument roles are first calculated:
f_t = W_r · [s_t^r; u_a; h_π]
where W_r is a parameter, s_t^r is the state representation vector, u_a is the representation vector of the current argument, and h_π is a hidden state vector over the input word vectors.
Furthermore, a matrix M ∈ {0,1}^{|ε|×|R|} based on the event schema is designed, where M[e][r] = 1 if and only if event type e has the role r in the event schema information; this matrix is used to filter out the argument roles that cannot participate in the current event.
The stochastic policy for role detection is then μ: S^r → O^r; it makes a selection o_t^r according to the following probability distribution, taken over the roles r permitted by M for the current event, i.e. the roles with M[o_{t'}^e][r] = 1:
μ(o_t^r = r | s_t^r) = softmax(f_t[r] + b_r)
where W_r and b_r are parameters.
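A sketch of the event-schema mask and the role-level selection it constrains: M is built from a (hypothetical) schema dictionary mapping each event type to its admissible roles, and roles outside the current event's row of M are excluded before sampling. The exact masking and normalisation details are assumptions for illustration.

```python
import numpy as np

def build_role_mask(event_schema, event_types, role_types):
    """Build M in {0,1}^(|ε|x|R|) with M[e][r] = 1 iff role r belongs to event type e."""
    M = np.zeros((len(event_types), len(role_types)))
    for e, event in enumerate(event_types):
        for r, role in enumerate(role_types):
            if role in event_schema.get(event, set()):
                M[e, r] = 1.0
    return M

def sample_role_selection(scores, event_index, M, rng=np.random.default_rng()):
    """Sample o_t^r after masking out roles impossible for the current event type."""
    masked = np.where(M[event_index] > 0, scores, -np.inf)   # filter impossible roles
    probs = np.exp(masked - masked.max())                    # exp(-inf) -> 0
    probs /= probs.sum()                                     # assumes >= 1 permitted role
    return int(rng.choice(len(probs), p=probs))
```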
Once a role-level selection o_t^r has been executed, the agent receives an immediate reward r_t^r, computed by comparing the selection with the gold annotation y_t^r:
r_t^r = sgn(o_t^r = y_t^r)
Since the role-level selection executes only one step under an argument-level action, the final role-level reward r_fin^r is simply this one-step reward.
Further, during training the event-level policy network samples a selection according to the probability distribution of Formula 3; during testing it chooses the most probable selection, i.e. o_t^e = argmax_{o ∈ O^e} μ(o | s_t^e). The actions and selections of the argument-level policy network and the role-level policy network are sampled in an analogous way during training and testing.
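The difference between training-time sampling and test-time greedy selection described above, as a short sketch:

```python
import numpy as np

def select_option(probs, training, rng=np.random.default_rng()):
    """Training: sample from the policy distribution (exploration).
    Testing: pick the most probable option (greedy decoding)."""
    if training:
        return int(rng.choice(len(probs), p=probs))
    return int(np.argmax(probs))
```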
Still further, the transition of the event-level policy network depends on the selection o_t^e. If o_t^e = NE, the agent simply continues from a new event-level state; otherwise, meaning that a specific event has been detected, the agent launches a new subtask and turns to the argument-level policy network to detect the participating arguments of the current event. From then on the agent makes argument-level selections and does not return to the event-level policy network until all argument-level selections under the current event o_{t'}^e have been sampled; the event-level policy network then continues sampling selections until the last word of the sentence S.
The transition of the argument-level policy network depends on the action a_t^n. If a_t^n indicates that a participating argument of the current event has been identified, the agent transfers to the role-level policy network to classify the argument's role; otherwise the agent continues with the argument-level policy network. If the argument-level policy network has executed up to the end of the sentence, the agent transitions back to the event-level policy network to continue identifying the remaining events.
Still further, in order to optimize the event-level, argument-level and role-level policy networks, the hierarchical training goal of the hierarchical policy network is to maximize the expected cumulative discounted rewards of the three stages, which the agent obtains at each time step t from the selections and actions sampled according to the policies. The expected cumulative discounted reward is calculated as:
J = E[ Σ_{k=t}^{T} γ^{k-t} · r_k ]
where E[·] denotes the expectation of the rewards under the corresponding policy network, γ ∈ [0,1] is the discount rate, and T stands for T_e, T_n or T_r: T_e is the total number of time steps taken by the event-level process before it ends, T_n is the ending time step of the argument-level process, and T_r is the time step at which the role-level process ends; r_k is the reward obtained by the process at time step k.
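The expected cumulative discounted reward is estimated from sampled trajectories; the discounted return of one sampled stage can be computed as below (a generic Monte-Carlo sketch, with the treatment of the final reward as an assumption):

```python
def discounted_return(rewards, final_reward, gamma=0.95):
    """Cumulative discounted reward of one stage:
    R = r_0 + gamma * r_1 + ... + gamma^(T-1) * r_(T-1) + gamma^T * final_reward."""
    R = 0.0
    for k, r in enumerate(rewards):
        R += (gamma ** k) * r
    return R + (gamma ** len(rewards)) * final_reward
```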
The cumulative reward is then decomposed into Bellman equations, which can subsequently be optimized with the REINFORCE algorithm. The decomposed Bellman equations take the following form:
R^e(s_t^e) = r_t^e + γ^N · R^e(s_{t+N}^e)
R^n(s_t^n) = r_t^n + γ · R^n(s_{t+1}^n)
R^r(s_t^r) = r_t^r
where N is the number of time steps that the argument-level process lasts under the selection o_t^e, so that the agent's next event-level selection is o_{t+N}^e; if o_t^e = NE, then N = 1. Because the role-level policy network performs only a single role-classification step under the action a_t^n, the exponent of the discount rate γ in the argument-level process is 1; and since no further step is executed under the role-level selection, the exponent of the discount rate γ there is 0. R denotes the cumulative reward that each layer of the policy network eventually obtains, including the final reward, and r denotes the immediate reward.
The Bellman equations obtained by this decomposition are optimized with the policy gradient method and the REINFORCE algorithm, yielding stochastic gradients of the following form for updating the parameters of the three policy networks:
∇_θ J ≈ Σ_t R_t · ∇_θ log μ(o_t | s_t; θ)
where R_t is the cumulative discounted reward following the selection o_t sampled in state s_t; the gradients for the argument-level policy π and the role-level policy take the same form.
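A minimal REINFORCE-style update consistent with the optimization just described: the log-probability of each sampled selection is weighted by the discounted return of that step, and the parameters are moved along the resulting gradient. The PyTorch sketch assumes that each policy network exposes the log-probabilities of its sampled selections; it is not the patented implementation.

```python
import torch

def reinforce_update(log_probs, returns, optimizer):
    """One REINFORCE update for a single policy network.

    log_probs : list of 0-dim tensors, log μ(o_t | s_t) of the sampled selections
    returns   : list of floats, the discounted returns R_t aligned with log_probs
    """
    optimizer.zero_grad()
    loss = -torch.stack([lp * R for lp, R in zip(log_probs, returns)]).sum()
    loss.backward()      # gradient of -Σ R_t · log μ, i.e. ascent on expected return
    optimizer.step()
```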
the invention also discloses an electronic device, comprising:
a processor;
and a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the event extraction method described above via execution of the executable instructions.
Compared with the prior art, the invention has the following advantages: a hierarchical policy network is applied for the first time to extract events with a deep reinforcement learning method; a three-layer hierarchical network MPNet is designed to realize joint event extraction, with the event-level policy network used for event detection, the argument-level policy network used for argument extraction and the role-level policy network used for argument role recognition. Thanks to this hierarchical design, MPNet is good at exploiting the deep information interaction among the subtasks and excels at processing sentences that contain multiple events; at the same time, the hierarchical structure naturally avoids the mismatch problem and better handles the multiple-event situation. Therefore, the invention achieves better performance for event extraction.
Drawings
FIG. 1 shows a schematic flow diagram of an embodiment of the present invention;
fig. 2 shows a schematic algorithm flow diagram of an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
Fig. 1 shows a schematic flow chart of the first embodiment of the present invention. The technical solution of the invention is as follows. An event extraction method based on a hierarchical policy network comprises the following steps:
Step 1, constructing a hierarchical policy network, wherein the hierarchical policy network comprises an event-level policy network, an argument-level policy network and a role-level policy network;
Step 2, while scanning a sentence from its first word to its last, the event-level policy network detects trigger words at each word position and classifies the event type of each detected trigger word;
Step 3, once a specific event is detected, the argument-level policy network is triggered and starts scanning the sentence from beginning to end to detect the participating arguments of the current event;
Step 4, once an argument is identified, the role-level policy network is triggered to predict the role played by this argument in the current event;
Step 5, when the role classification of the role-level policy network is completed, the argument-level policy network continues scanning the sentence to search for the next argument; once the argument-level policy network under the current event has completed argument detection, the event-level policy network continues scanning the sentence from the trigger-word position of the current event to detect the other events contained in the sentence, until the end of the sentence is reached.
The specific algorithm flow is shown in fig. 2.
Further, steps 2-5 are executed by an agent. In step 2, as the agent scans the sentence from beginning to end, the event-level policy network keeps sampling selections according to its policy at every time step; the event-level selection space consists of the non-trigger label and the set of specific event types for trigger words.
In step 3, when a specific event has been detected, the agent transfers to the argument-level policy network; while the sentence is scanned from beginning to end, an action is selected according to the policy at every time step, and an argument-level action assigns a specific argument tag to the current word.
In step 4, when a specific argument has been detected, the agent transfers to the role-level policy network and samples a selection for the current argument according to its policy; the role-level selection space is the set of role types.
In step 5, after the role classification of the argument is completed, the agent transfers back to the argument-level policy network to continue scanning the rest of the sentence and identify the remaining arguments of the event; once the agent has finished detecting the participating arguments of the current event, it transfers back to the event-level policy network to continue scanning the rest of the sentence and identify other events.
In steps 2-5, a reward is returned whenever a selection or action is sampled.
Specifically, given the input text S = w_1, w_2, ..., w_L, the purpose of the event-level policy network is to detect the event type triggered by a trigger word w_i. At the current word, i.e. time step t, the event-level policy network takes a stochastic policy μ to determine the selection, and the obtained rewards are then used to guide the policy learning of the policy network.
The selection o_t^e of the event-level policy network is sampled from the selection set O^e = {NE} ∪ ε, where NE denotes a word that is not a trigger word and ε is the predefined set of event types in the data set, used to indicate the event type triggered by the current trigger word.
The state s_t^e of the event-level policy network process depends on the past time steps: it encodes not only the current input but also the previous environment state. s_t^e is built from the concatenation of the following three vectors: 1) the state s_{t-1} of the previous time step, where, if the agent ran the event-level policy process at time step t-1, s_{t-1} = s_{t-1}^e, and otherwise s_{t-1} = s_{t-1}^r; here s_{t-1}^e denotes the environment state of the event-level policy network at time step t-1 and s_{t-1}^r denotes the environment state of the role-level policy network at time step t-1; 2) the event type vector v_{t'}^e, which is learned from the latest event-level selection; 3) the hidden state vector h_t for the current input word vector w_t, obtained by running a Bi-LSTM over the word vectors of the text:
h_t = Bi-LSTM(w_1, ..., w_L)_t   (Formula 1)
The raw state is therefore the concatenation [s_{t-1}; v_{t'}^e; h_t], and a multi-layer perceptron MLP finally maps it to a continuous real-valued state representation vector:
s_t^e = MLP([s_{t-1}; v_{t'}^e; h_t])   (Formula 2)
The stochastic policy of the event-level policy network, i.e. the policy for making a selection, is μ: S^e → O^e; it samples a selection o_t^e according to the following probability distribution:
μ(o_t^e | s_t^e) = softmax(W_e · s_t^e + b_e)   (Formula 3)
where W_e and b_e are parameters and s_t^e is the state representation vector.
The ultimate purpose of rewarding the event-level policy network is to identify and classify events; whether a trigger word is correct is only an intermediate result. Once an event-level selection o_t^e has been sampled, the agent receives an immediate reward that reflects the quality of the selection. This immediate reward is computed from the gold annotation y_t^e as:
r_t^e = α^{I(NE)} · sgn(o_t^e = y_t^e)
where sgn(·) is the sign function, equal to +1 when the selection matches the annotation and -1 otherwise, and I(NE) is a switching function that distinguishes the rewards of trigger words from those of non-trigger words:
I(NE) = 1 if y_t^e = NE, and 0 otherwise
where α is a bias weight with α < 1; the smaller α is, the smaller the reward obtained for identifying a non-trigger word, which prevents the model from learning the trivial strategy of predicting every word as NE, i.e. as a non-trigger word.
When the event-level policy network has sampled selections up to the last word of the sentence S, i.e. has finished all event-level selections, the agent receives a final reward r_fin^e. This delayed reward of the final state is defined by the sentence-level event detection performance:
r_fin^e = F_1(sentence-level event detection)
where F_1(·) denotes the F1 score of the sentence-level event detection result, i.e. the harmonic mean of the sentence-level precision and recall.
Specifically, in step 3, when a specific event is detected at time step t', i.e. o_{t'}^e ∈ ε, the agent transfers to the argument-level policy network to predict each argument participating in the event o_{t'}^e. At each word, i.e. time step t, the argument-level policy network takes a stochastic policy π to choose an action, and rewards are used to guide the learning of the participating arguments of the current event. To assist argument decisions by conveying more fine-grained event information, the selection o_{t'}^e and the state representation s_{t'}^e from the event-level process are used as additional inputs throughout the argument-level process.
An action a_t^n of the argument-level policy network assigns a specific argument tag to the current word; a_t^n is selected from the action space A^n = {B, I, O, E, S} ∪ {N}, where B/I/E indicate the position of the current word inside an argument (B the start position, I an intermediate position, E the end position), O marks a word of an argument that is unrelated to the current event, S marks a single-word argument and N marks a non-argument word. At different time steps the same argument may be given different labels because the event types differ; in this way the multiple-event and mismatch problems are solved very naturally.
The state s_t^n of the argument-level policy network process also depends on the past time steps: it encodes not only the current input but also the previous environment state and the environment information of the initiating event type. s_t^n is built from the concatenation of the following four vectors: 1) the state s_{t-1} of the previous time step, which may come from the event-level, the argument-level or the role-level policy network; 2) the argument tag vector v_t^n, learned from the latest argument-level action; 3) the event state representation s_{t'}^e; 4) the hidden state vector h_t, obtained from the same Bi-LSTM as in Formula 1. A multi-layer perceptron MLP finally maps the concatenation to a continuous real-valued state representation vector:
s_t^n = MLP([s_{t-1}; v_t^n; s_{t'}^e; h_t])
With the event type o_{t'}^e as an additional input, the stochastic policy for argument detection, i.e. the policy for taking an action, is π: S^n → A^n; it selects an action a_t^n according to the following probability distribution:
π(a_t^n | s_t^n, o_{t'}^e) = softmax(W_n · s_t^n + W_μ[o_{t'}^e] + b_n)
where W_n and b_n are parameters, s_t^n is the argument-level state representation vector, W_μ is a matrix with |ε| rows, and W_μ[o_{t'}^e] is the representation vector of the event o_{t'}^e, obtained by mapping the event through this matrix.
Once an argument-level action a_t^n has been selected, the agent receives an immediate reward r_t^n. This reward is computed by comparing the action with the gold annotation y_t^n of the word under the predicted event type o_{t'}^e:
r_t^n = β^{I(N)} · sgn(a_t^n = y_t^n)
where I(N) is a switching function that distinguishes the rewards of argument words from those of non-argument words:
I(N) = 1 if y_t^n = N, and 0 otherwise
where β is a bias weight with β < 1; a smaller β means that non-argument words yield smaller rewards, which prevents the agent from learning the trivial strategy of setting every action to N.
The agent keeps selecting an action for every word until the last word of the sentence; when the agent has finished selecting the argument-level actions for all arguments of the current event o_{t'}^e, it receives a final reward r_fin^n.
Specifically, in step 4, when a participating argument is detected at time step t, i.e. the action a_t^n indicates an argument, the agent transfers to the role-level policy network to predict the role played by this argument in the event o_{t'}^e. At the current word, i.e. time step t, the role-level policy network takes a stochastic policy μ to make a selection, and rewards are used to guide the learning of the argument's role in the current event. To assist the argument-role decision by conveying more fine-grained event and argument information, the selection o_{t'}^e and the action a_t^n are used as additional inputs throughout the role-level process.
The selection o_t^r of the role-level policy network classifies the current argument into an argument role; the selection space is the set of argument roles, i.e. O^r = R, where R is the predefined set of argument roles.
The state s_t^r of the role-level process also depends on the past time steps: it encodes not only the current input but also the previous environment state. s_t^r is built from the concatenation of the following three vectors: 1) the state s_{t-1} of the previous time step; 2) the argument role vector v_t^r, learned from the latest role-level selection; 3) the hidden state vector h_t, obtained from the same Bi-LSTM as in Formula 1. A multi-layer perceptron MLP finally maps the concatenation to a continuous real-valued state representation vector:
s_t^r = MLP([s_{t-1}; v_t^r; h_t])
To define the policy over the set of roles, the scores of all argument roles are first calculated:
f_t = W_r · [s_t^r; u_a; h_π]
where W_r is a parameter, s_t^r is the state representation vector, u_a is the representation vector of the current argument, and h_π is a hidden state vector over the input word vectors.
Furthermore, a matrix M ∈ {0,1}^{|ε|×|R|} based on the event schema is designed, where M[e][r] = 1 if and only if event type e has the role r in the event schema information; this matrix is used to filter out the argument roles that cannot participate in the current event.
The stochastic policy for role detection is then μ: S^r → O^r; it makes a selection o_t^r according to the following probability distribution, taken over the roles r permitted by M for the current event, i.e. the roles with M[o_{t'}^e][r] = 1:
μ(o_t^r = r | s_t^r) = softmax(f_t[r] + b_r)
where W_r and b_r are parameters.
Once a role-level selection o_t^r has been executed, the agent receives an immediate reward r_t^r, computed by comparing the selection with the gold annotation y_t^r:
r_t^r = sgn(o_t^r = y_t^r)
Since the role-level selection executes only one step under an argument-level action, the final role-level reward r_fin^r is simply this one-step reward.
Further, during training the event-level policy network samples a selection according to the probability distribution of Formula 3; during testing it chooses the most probable selection, i.e. o_t^e = argmax_{o ∈ O^e} μ(o | s_t^e). The actions and selections of the argument-level policy network and the role-level policy network are sampled in an analogous way during training and testing.
Still further, the transition of the event-level policy network depends on the selection o_t^e. If o_t^e = NE, the agent simply continues from a new event-level state; otherwise, meaning that a specific event has been detected, the agent launches a new subtask and turns to the argument-level policy network to detect the participating arguments of the current event. From then on the agent makes argument-level selections and does not return to the event-level policy network until all argument-level selections under the current event o_{t'}^e have been sampled; the event-level policy network then continues sampling selections until the last word of the sentence S.
The transition of the argument-level policy network depends on the action a_t^n. If a_t^n indicates that a participating argument of the current event has been identified, the agent transfers to the role-level policy network to classify the argument's role; otherwise the agent continues with the argument-level policy network. If the argument-level policy network has executed up to the end of the sentence, the agent transitions back to the event-level policy network to continue identifying the remaining events.
Still further, in order to optimize the event-level, argument-level and role-level policy networks, the hierarchical training goal of the hierarchical policy network is to maximize the expected cumulative discounted rewards of the three stages, which the agent obtains at each time step t from the selections and actions sampled according to the policies. The expected cumulative discounted reward is calculated as:
J = E[ Σ_{k=t}^{T} γ^{k-t} · r_k ]
where E[·] denotes the expectation of the rewards under the corresponding policy network, γ ∈ [0,1] is the discount rate, and T stands for T_e, T_n or T_r: T_e is the total number of time steps taken by the event-level process before it ends, T_n is the ending time step of the argument-level process, and T_r is the time step at which the role-level process ends; r_k is the reward obtained by the process at time step k.
The cumulative reward is then decomposed into Bellman equations, which can subsequently be optimized with the REINFORCE algorithm. The decomposed Bellman equations take the following form:
R^e(s_t^e) = r_t^e + γ^N · R^e(s_{t+N}^e)
R^n(s_t^n) = r_t^n + γ · R^n(s_{t+1}^n)
R^r(s_t^r) = r_t^r
where N is the number of time steps that the argument-level process lasts under the selection o_t^e, so that the agent's next event-level selection is o_{t+N}^e; if o_t^e = NE, then N = 1. Because the role-level policy network performs only a single role-classification step under the action a_t^n, the exponent of the discount rate γ in the argument-level process is 1; and since no further step is executed under the role-level selection, the exponent of the discount rate γ there is 0. R denotes the cumulative reward that each layer of the policy network eventually obtains, including the final reward, and r denotes the immediate reward.
The Bellman equations obtained by this decomposition are optimized with the policy gradient method and the REINFORCE algorithm, yielding stochastic gradients of the following form for updating the parameters of the three policy networks:
∇_θ J ≈ Σ_t R_t · ∇_θ log μ(o_t | s_t; θ)
where R_t is the cumulative discounted reward following the selection o_t sampled in state s_t; the gradients for the argument-level policy π and the role-level policy take the same form.
example two
The invention also discloses an electronic device, comprising:
a processor;
and a memory for storing executable instructions of the processor;
wherein the processor is configured to perform, via execution of the executable instructions, the event extraction method described in the first embodiment.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Claims (8)
1. An event extraction method based on a hierarchical policy network, characterized by comprising the following steps:
Step 1, constructing a hierarchical policy network, wherein the hierarchical policy network comprises an event-level policy network, an argument-level policy network and a role-level policy network;
Step 2, while scanning a sentence from its first word to its last, the event-level policy network detects trigger words at each word position and classifies the event type of each detected trigger word;
Step 3, once a specific event is detected, the argument-level policy network is triggered and starts scanning the sentence from beginning to end to detect the participating arguments of the current event;
Step 4, once an argument is identified, the role-level policy network is triggered to predict the role played by this argument in the current event;
Step 5, when the role classification of the role-level policy network is completed, the argument-level policy network continues scanning the sentence to search for the next argument; once the argument-level policy network under the current event has completed argument detection, the event-level policy network continues scanning the sentence from the trigger-word position of the current event to detect the other events contained in the sentence, until the end of the sentence is reached.
2. The event extraction method based on a hierarchical policy network according to claim 1, wherein steps 2-5 are executed by an agent; in step 2, as the agent scans the sentence from beginning to end, the event-level policy network keeps sampling selections according to its policy at every time step, the event-level selection space consisting of the non-trigger label and the set of specific event types for trigger words;
in step 3, when a specific event has been detected, the agent transfers to the argument-level policy network; while the sentence is scanned from beginning to end, an action is selected according to the policy at every time step, and an argument-level action assigns a specific argument tag to the current word;
in step 4, when a specific argument has been detected, the agent transfers to the role-level policy network and samples a selection for the current argument according to its policy, the role-level selection space being the set of role types;
in step 5, after the role classification of the argument is completed, the agent transfers back to the argument-level policy network to continue scanning the rest of the sentence and identify the remaining arguments of the event; once the agent has finished detecting the participating arguments of the current event, it transfers back to the event-level policy network to continue scanning the rest of the sentence and identify other events;
in steps 2-5, a reward is returned whenever a selection or action is sampled.
3. The event extraction method based on a hierarchical policy network according to claim 2, wherein, given the input text S = w_1, w_2, ..., w_L, the purpose of the event-level policy network is to detect the event type triggered by a trigger word w_i; at the current word, i.e. time step t, the event-level policy network takes a stochastic policy μ to determine the selection, and the obtained rewards are then used to guide the policy learning of the policy network;
the selection o_t^e of the event-level policy network is sampled from the selection set O^e = {NE} ∪ ε, where NE denotes a word that is not a trigger word and ε is the predefined set of event types in the data set, used to indicate the event type triggered by the current trigger word;
the state s_t^e of the event-level policy network process depends on the past time steps, encoding not only the current input but also the previous environment state; s_t^e is built from the concatenation of the following three vectors: 1) the state s_{t-1} of the previous time step, where, if the agent ran the event-level policy process at time step t-1, s_{t-1} = s_{t-1}^e, and otherwise s_{t-1} = s_{t-1}^r, s_{t-1}^e denoting the environment state of the event-level policy network at time step t-1 and s_{t-1}^r denoting the environment state of the role-level policy network at time step t-1; 2) the event type vector v_{t'}^e, learned from the latest event-level selection; 3) the hidden state vector h_t for the current input word vector w_t, obtained by running a Bi-LSTM over the word vectors of the text:
h_t = Bi-LSTM(w_1, ..., w_L)_t   (Formula 1)
the raw state is therefore the concatenation [s_{t-1}; v_{t'}^e; h_t], and a multi-layer perceptron MLP finally maps it to a continuous real-valued state representation vector:
s_t^e = MLP([s_{t-1}; v_{t'}^e; h_t])   (Formula 2)
the stochastic policy of the event-level policy network, i.e. the policy for making a selection, is μ: S^e → O^e; it samples a selection o_t^e according to the following probability distribution:
μ(o_t^e | s_t^e) = softmax(W_e · s_t^e + b_e)   (Formula 3)
where W_e and b_e are parameters and s_t^e is the state representation vector;
the ultimate purpose of rewarding the event-level policy network is to identify and classify events, whether a trigger word is correct being only an intermediate result; once an event-level selection o_t^e has been sampled, the agent receives an immediate reward that reflects the quality of the selection, computed from the gold annotation y_t^e as:
r_t^e = α^{I(NE)} · sgn(o_t^e = y_t^e)
where sgn(·) is the sign function, equal to +1 when the selection matches the annotation and -1 otherwise, and I(NE) is a switching function that distinguishes the rewards of trigger words from those of non-trigger words:
I(NE) = 1 if y_t^e = NE, and 0 otherwise
where α is a bias weight with α < 1, a smaller α meaning a smaller reward for identifying a non-trigger word, which prevents the model from learning the trivial strategy of predicting every word as NE, i.e. as a non-trigger word;
when the event-level policy network has sampled selections up to the last word of the sentence S, i.e. has finished all event-level selections, the agent receives a final reward r_fin^e, a delayed reward of the final state defined by the sentence-level event detection performance:
r_fin^e = F_1(sentence-level event detection)
where F_1(·) denotes the F1 score of the sentence-level event detection result, i.e. the harmonic mean of the sentence-level precision and recall.
4. The event extraction method based on a hierarchical policy network according to claim 3, wherein, in step 3, when a specific event is detected at time step t', i.e. o_{t'}^e ∈ ε, the agent transfers to the argument-level policy network to predict each argument participating in the event o_{t'}^e; at each word, i.e. time step t, the argument-level policy network takes a stochastic policy π to choose an action, and rewards are used to guide the learning of the participating arguments of the current event; to assist argument decisions by conveying more fine-grained event information, the selection o_{t'}^e and the state representation s_{t'}^e from the event-level process are used as additional inputs throughout the argument-level process;
an action a_t^n of the argument-level policy network assigns a specific argument tag to the current word; a_t^n is selected from the action space A^n = {B, I, O, E, S} ∪ {N}, where B/I/E indicate the position of the current word inside an argument (B the start position, I an intermediate position, E the end position), O marks a word of an argument that is unrelated to the current event, S marks a single-word argument and N marks a non-argument word; at different time steps the same argument may be given different labels because the event types differ, so that the multiple-event and mismatch problems are solved very naturally;
the state s_t^n of the argument-level policy network process also depends on the past time steps, encoding not only the current input but also the previous environment state and the environment information of the initiating event type; s_t^n is built from the concatenation of the following four vectors: 1) the state s_{t-1} of the previous time step, which may come from the event-level, the argument-level or the role-level policy network; 2) the argument tag vector v_t^n, learned from the latest argument-level action; 3) the event state representation s_{t'}^e; 4) the hidden state vector h_t, obtained from the same Bi-LSTM as in Formula 1; a multi-layer perceptron MLP finally maps the concatenation to a continuous real-valued state representation vector:
s_t^n = MLP([s_{t-1}; v_t^n; s_{t'}^e; h_t])
with the event type o_{t'}^e as an additional input, the stochastic policy for argument detection, i.e. the policy for taking an action, is π: S^n → A^n; it selects an action a_t^n according to the following probability distribution:
π(a_t^n | s_t^n, o_{t'}^e) = softmax(W_n · s_t^n + W_μ[o_{t'}^e] + b_n)
where W_n and b_n are parameters, s_t^n is the argument-level state representation vector, W_μ is a matrix with |ε| rows, and W_μ[o_{t'}^e] is the representation vector of the event o_{t'}^e, obtained by mapping the event through this matrix;
once an argument-level action a_t^n has been selected, the agent receives an immediate reward r_t^n, computed by comparing the action with the gold annotation y_t^n of the word under the predicted event type o_{t'}^e:
r_t^n = β^{I(N)} · sgn(a_t^n = y_t^n)
where I(N) is a switching function that distinguishes the rewards of argument words from those of non-argument words:
I(N) = 1 if y_t^n = N, and 0 otherwise
where β is a bias weight with β < 1, a smaller β meaning that non-argument words yield smaller rewards, which prevents the agent from learning the trivial strategy of setting every action to N;
the agent keeps selecting an action for every word until the last word of the sentence; when the agent has finished selecting the argument-level actions for all arguments of the current event o_{t'}^e, it receives a final reward r_fin^n.
5. According to claim 4An event extraction method based on a hierarchical policy network, characterized in that in step 4, a participation argument is detected at time step t, i.e.The agent will transfer to the role level policy network to predict the argument +. >In event->In particular, at each word/time step t, the role-level policy network takes a random policy μ to choose the choice and uses rewards to guide the argument role learning of the argument in the current event, and to assist in the decision of argument roles by delivering more fine-grained event information and argument information, choose->Action->The entire argument-level process is used as an additional input;
role level policy networkIs to classify the current argument into an argument character, and the selection space is an argument character set, namely O r R, where R is a predefined argument role set;
status of role-level proceduresAlso in relation to the past time steps, not only the current input but also the previous environmental state,/->Is a join of the following three vectors: 1) Status of last time step->2) Argument character vector->It is selected from->Learning from Chinese and->3) Hidden layer state vector h t It is obtained from a similar Bi-LSTM treatment in formula 2,/I>Expressed as:
finally, using multi-layer perceptron MLP to represent states as a continuous real-valued vector
To define policies for a set of roles, the scores for all argument roles are first calculated:
wherein, Is the representation vector of the current argument, h π Is a hidden layer state vector on the input word vector;
thus, a matrix M E {0,1} based on the event architecture is designed |ε|*|R| Wherein M [ e ]][r]=1 if and only if event e has a role r in the event architecture information, using this matrix to filter out the argument roles that are not likely to participate in the current event;
then, the stochastic policy for role detection is μ: S^r → O^r; it selects a role o_t^r according to the following probability distribution:
where W^r and b^r are parameters;
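The masking-before-softmax step can be sketched as follows; setting masked scores to negative infinity (rather than, say, multiplying probabilities by the mask) is an assumed implementation choice:

```python
import torch

def role_policy_distribution(role_scores, role_mask):
    """Applies the event-architecture mask before the softmax so that roles
    ruled out for the current event receive zero probability."""
    masked = role_scores.masked_fill(role_mask == 0, float("-inf"))
    return torch.softmax(masked, dim=-1)
```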
once a role-level selection o_t^r is executed, the agent receives an immediate reward r_t^r; this reward is calculated by comparing the selection with the gold-standard role annotation, as follows:
since the role-level selection performs only one step under an argument-level action, the final reward equals this immediate reward;
6. The event extraction method based on a hierarchical policy network according to any one of claims 3 to 5, characterized in that, during training, the event-level policy network selects its choice by probability sampling according to formula 3; during testing, the event-level policy network selects the most probable choice; when the argument-level policy network and the role-level policy network are trained and tested, actions and selections are obtained by sampling in a manner similar to that of the event-level policy network;
the transition of the event-level policy network depends on the selection o_t: if o_t indicates that no event is triggered, the agent continues from a new event-level policy network state; otherwise, meaning that a particular event has been detected, the agent initiates a new subtask and turns to the argument-level policy network to detect the participating arguments of the current event; the agent then starts argument-level selection and does not migrate back to the event-level policy network until all argument-level selections under the current event e_t have been completed by sampling; the event-level policy network then continues sampling selections until the last word in the sentence S;
the transition of the argument-level policy network depends on the action a_t^n: if a_t^n ≠ N, a participating argument of the current event has been identified, and the agent transfers to the role-level policy network to classify the argument role; otherwise, the agent continues with the argument-level policy network; if the argument-level policy network has executed to the end of the sentence, the agent transitions to the event-level policy network to continue identifying the remaining events.
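The three-level control flow described in the two paragraphs above can be summarized by the sketch below. The sentinel values, the policies' sample() interface, and the choice of re-scanning the whole sentence in the argument-level subtask are assumptions, not part of the claims:

```python
NO_EVENT = "NONE"       # assumed sentinel for "no event triggered"
NON_ARGUMENT = "N"      # non-argument label from the claims

def hierarchical_extraction(sentence, event_policy, argument_policy, role_policy):
    """Walks the sentence with the event-level policy; when an event is
    triggered it launches an argument-level subtask, and each detected
    argument triggers a one-step role-level classification."""
    results = []
    for word in sentence:
        option = event_policy.sample(word)                  # event-level selection o_t
        if option == NO_EVENT:
            continue                                        # stay at the event level
        arguments = []
        for inner_word in sentence:                         # argument-level subtask
            action = argument_policy.sample(inner_word, option)
            if action != NON_ARGUMENT:                      # participating argument found
                role = role_policy.sample(inner_word, option, action)
                arguments.append((inner_word, action, role))
        results.append((option, arguments))
    return results
```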
7. The event extraction method based on a hierarchical policy network according to claim 6, characterized in that, in order to optimize the event-level, argument-level, and role-level policy networks, the hierarchical training of the hierarchical policy network aims to maximize the expected cumulative discounted reward obtained by the agent over the three stages when sampling selections and actions according to the policies at each time step t; the expected cumulative discounted reward is calculated as follows:
where E[·] denotes the expectation of rewards under the policy networks, γ ∈ [0,1] is the discount rate, T^e is the total number of time steps elapsed before the event-level process ends, T^n is the ending time step of the argument-level process, T^r is the number of time steps elapsed before the role-level process ends, and r_k is the reward obtained at step k of the corresponding process;
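Since the claimed objective itself is given by a formula not reproduced here, only the generic discounted-return computation that each of the three processes relies on is sketched below; γ = 0.95 is an assumed value:

```python
def discounted_return(rewards, gamma=0.95):
    """Sums gamma**k * r_k over the rewards collected during one process
    (event-level, argument-level, or role-level); gamma is the discount rate."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))
```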
the cumulative reward is then decomposed into Bellman equations, which can be optimized using the REINFORCE algorithm; the decomposed Bellman equations are as follows:
where N is the number of time steps that the argument-level process lasts under the selection o_t, so the agent's next choice is o_{t+N}; if the selection does not trigger an event, N = 1; because the role-level policy network performs only a one-step role classification under an argument-level action a_t^n, the exponent of the discount rate γ in the argument-level process is 1; when no further step is performed under the argument-level policy network, the exponent of the discount rate γ is 0; R is the final reward ultimately obtained by each level of policy network, and r is the immediate reward;
the Bellman equations obtained from the decomposition are optimized using the policy gradient method and the REINFORCE algorithm, yielding the following stochastic gradients for updating the parameters:
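A minimal sketch of how a REINFORCE stochastic gradient can be realized as a surrogate loss is shown below; the exact gradient expressions of the claim are not reproduced, and the absence of a baseline term is an assumption:

```python
import torch

def reinforce_loss(log_probs, returns):
    """Surrogate loss whose gradient matches the REINFORCE estimator
    grad log pi(a_t | s_t) * G_t summed over the trajectory (no baseline)."""
    return -(torch.stack(log_probs) * torch.tensor(returns)).sum()

# Usage (assumed training loop): loss = reinforce_loss(log_probs, returns),
# followed by loss.backward() and optimizer.step().
```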
8. An event extraction electronic device based on a hierarchical policy network, comprising:
A processor;
and a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the event extraction method based on a hierarchical policy network according to any one of claims 1 to 7 via execution of the executable instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110022760.2A CN112836504B (en) | 2021-01-08 | 2021-01-08 | Event extraction method and device based on hierarchical policy network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112836504A CN112836504A (en) | 2021-05-25 |
CN112836504B true CN112836504B (en) | 2024-02-02 |
Family
ID=75928654
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110022760.2A Active CN112836504B (en) | 2021-01-08 | 2021-01-08 | Event extraction method and device based on hierarchical policy network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112836504B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109582949A (en) * | 2018-09-14 | 2019-04-05 | 阿里巴巴集团控股有限公司 | Event element abstracting method, calculates equipment and storage medium at device |
CN110704598A (en) * | 2019-09-29 | 2020-01-17 | 北京明略软件系统有限公司 | Statement information extraction method, extraction device and readable storage medium |
CN111382575A (en) * | 2020-03-19 | 2020-07-07 | 电子科技大学 | Event extraction method based on joint labeling and entity semantic information |
CN112183030A (en) * | 2020-10-10 | 2021-01-05 | 深圳壹账通智能科技有限公司 | Event extraction method and device based on preset neural network, computer equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110209807A (en) * | 2018-07-03 | 2019-09-06 | 腾讯科技(深圳)有限公司 | A kind of method of event recognition, the method for model training, equipment and storage medium |
- 2021-01-08 CN CN202110022760.2A patent/CN112836504B/en active Active
Non-Patent Citations (1)
Title |
---|
End-to-End Joint Extraction of Knowledge Triples with Adversarial Training; 黄培馨; 赵翔; 方阳; 朱慧明; 肖卫东; 计算机研究与发展 (Journal of Computer Research and Development), Issue 12; 2536-2548 *
Also Published As
Publication number | Publication date |
---|---|
CN112836504A (en) | 2021-05-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Pham et al. | Autodropout: Learning dropout patterns to regularize deep networks | |
Ye et al. | ECG generation with sequence generative adversarial nets optimized by policy gradient | |
CN114417839A (en) | Entity relation joint extraction method based on global pointer network | |
CN112270240B (en) | Signal processing method, device, electronic equipment and storage medium | |
CN114757432A (en) | Future execution activity and time prediction method and system based on flow log and multi-task learning | |
CN112948155B (en) | Model training method, state prediction method, device, equipment and storage medium | |
Binnig et al. | Towards interactive curation & automatic tuning of ml pipelines | |
Bärmann et al. | Where did i leave my keys?-episodic-memory-based question answering on egocentric videos | |
CN113779996A (en) | Standard entity text determination method and device based on BilSTM model and storage medium | |
Wei et al. | Sequence-to-segment networks for segment detection | |
CN117350898A (en) | Intelligent early warning system and method for annual patent fee | |
CN114048361A (en) | Crowdsourcing software developer recommendation method based on deep learning | |
Lai et al. | Workload-Aware Query Recommendation Using Deep Learning. | |
CN112836504B (en) | Event extraction method and device based on hierarchical policy network | |
CN118171653A (en) | Health physical examination text treatment method based on deep neural network | |
CN116630708A (en) | Image classification method, system, equipment and medium based on active domain self-adaption | |
Islam et al. | How certain are tansformers in image classification: uncertainty analysis with Monte Carlo dropout | |
CN114155913B (en) | Gene regulation network construction method based on higher-order dynamic Bayes | |
CN116843995A (en) | Method and device for constructing cytographic pre-training model | |
CN116523136A (en) | Mineral resource space intelligent prediction method and device based on multi-model integrated learning | |
Ma et al. | Long-Term Credit Assignment via Model-based Temporal Shortcuts | |
Li et al. | Variance tolerance factors for interpreting all neural networks | |
CN115098681A (en) | Open service intention detection method based on supervised contrast learning | |
Duan et al. | MHLAT: Multi-Hop Label-Wise Attention Model for Automatic ICD Coding | |
CN115116614A (en) | Health state evaluation method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |