CN112836504A - Event extraction method and device based on hierarchical policy network

Info

Publication number: CN112836504A
Application number: CN202110022760.2A
Other versions: CN112836504B (granted publication)
Authority: CN (China)
Prior art keywords: event, argument, level, network, role
Legal status: Granted; currently active
Inventors: 赵翔, 黄培馨, 谭真, 胡升泽, 肖卫东, 胡艳丽, 张军, 李硕豪
Original and current assignee: National University of Defense Technology
Application filed by National University of Defense Technology, with priority to CN202110022760.2A

Classifications

    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/04 Neural networks; architecture, e.g. interconnection topology
    • G06N3/08 Neural networks; learning methods

Abstract

The invention discloses an event extraction method and device based on a hierarchical policy network, wherein the method comprises the following steps: constructing a hierarchical policy network; while scanning the sentence from beginning to end, the event-level policy network checks each token for a trigger word and classifies the event type of any trigger word it detects; once a specific event is detected, the argument-level policy network is triggered to scan the sentence from beginning to end to detect the participating arguments of the current event; once an argument is identified, the role-level policy network is triggered to predict the role this argument plays under the current event; when the role prediction is completed, the argument-level policy network resumes scanning the sentence from the position of the current argument to detect further arguments of the event, until the end of the sentence is reached; the event-level policy network then resumes scanning the sentence from the position of the current trigger word to detect other events contained in the sentence, until the end of the sentence is reached.

Description

Event extraction method and device based on hierarchical policy network
Technical Field
The invention relates to the technical field of text event extraction in natural language processing, in particular to an event extraction method and device based on a hierarchical policy network.
Background
Event Extraction (EE) plays an important role in many downstream natural language processing applications such as information retrieval and news summarization. The purpose of event extraction is to discover events triggered by specific trigger words, together with the arguments of those events. Generally, event extraction involves several subtasks: trigger word identification, trigger word classification, event argument identification and argument role classification.
Some existing event extraction works employ a pipeline method to process these subtasks, i.e., they perform event detection (including event trigger word identification and classification) and event argument classification in separate stages. These methods generally assume that the entity information in the text has already been labeled (non-patent literature: McClosky et al., 2011; Chen et al., 2015; Yang et al., 2019). However, these staged extraction models have no mechanism to fully utilize the information interaction between the subtasks, so the event extraction subtasks cannot pass information to each other to improve their decisions. Although some joint models that perform event extraction by constructing joint extractors are available (non-patent literature: Yang and Mitchell, 2016; Nguyen and Nguyen, 2019; Zhang et al., 2019), these models essentially still follow a pipelined framework: they first identify entities and trigger words jointly, and then examine each entity-event pair to identify arguments and argument roles. In addition, the policy gradient method (Sutton et al., 1999) and the REINFORCE algorithm (Williams, 1992) can be used in the prior art to optimize the parameters of the event detection model.
One problem these models face is that they all produce redundant entity-event pairs and therefore also introduce possible errors; another is that when a sentence contains multiple events, arguments may be mismatched with trigger words, which degrades the performance of event extraction.
Consider, for example, the following sentence: "In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel." In this sentence, "cameraman" is not only the Victim argument of the event Die (trigger "died") but also the Target argument of the event Attack (trigger "fired"). However, since "cameraman" is relatively far from the trigger word "fired" in the text, an event extractor is very likely to fail to recognize "cameraman" as an argument of the event Attack.
Details of non-patent literature:
David McClosky, Mihai Surdeanu, and Christopher D. Manning. 2011. Event extraction as dependency parsing. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June 2011, Portland, Oregon, USA, pages 1626–1635.
Teruko Mitamura, Zhengzhong Liu, and Eduard H. Hovy. 2015. Overview of TAC KBP 2015 event nugget track. In Proceedings of the 2015 Text Analysis Conference, TAC 2015, Gaithersburg, Maryland, USA, November 16-17, 2015.
Trung Minh Nguyen and Thien Huu Nguyen. 2019. One for all: Neural joint modeling of entities and events. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pages 6851–6858.
Bishan Yang and Tom M. Mitchell. 2016. Joint extraction of events and entities within a document context. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, USA, June 12-17, 2016, pages 289–299.
Junchi Zhang, Yanxia Qin, Yue Zhang, Mengchi Liu, and Donghong Ji. 2019. Extracting entities and events as a single task using a transition-based neural model. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 5422–5428.
Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour. 1999. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, [NIPS Conference, Denver, Colorado, USA, November 29 - December 4, 1999], pages 1057–1063.
Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229–256.
Disclosure of the Invention
The present invention is directed to solving at least one of the problems of the prior art. Therefore, the invention discloses an event extraction method and device based on a hierarchical policy network. The method provides a multi-layer policy network (MPNet) to jointly perform the subtasks of event extraction. MPNet comprises an event-level policy network, an argument-level policy network and a role-level policy network, which solve the tasks of event detection, event argument identification and argument role classification in the three layers, respectively.
The technical scheme of the invention is an event extraction method based on a hierarchical policy network, comprising the following steps:
step 1, constructing a hierarchical policy network, wherein the hierarchical policy network comprises an event-level policy network, an argument-level policy network and a role-level policy network;
step 2, while scanning from the beginning of a sentence to the end of the sentence, the event-level policy network detects a trigger word at each token and classifies the event type of the detected trigger word;
step 3, once a specific event is detected, the argument-level policy network is triggered to scan the sentence from beginning to end to detect the participating arguments of the current event;
step 4, once an argument is identified, the role-level policy network is triggered to predict the role the argument plays under the current event;
step 5, when the role classification of the role-level policy network is finished, the argument-level policy network continues scanning the sentence from the position of the classified argument to find the next argument; once the argument detection of the argument-level policy network under the current event is finished, the event-level policy network continues scanning the sentence from the token position of the current trigger word to detect other events contained in the sentence, until the end of the sentence is reached.
Furthermore, an agent is adopted to perform the above steps 2-5. In step 2, as the agent scans the sentence sequentially from beginning to end, the event-level policy network samples a selection according to its policy at each time step; an event-level selection is either the non-trigger label or a specific event type from the predefined event type set.
In step 3, once a specific event is detected, the agent transfers to the argument-level policy network; while the sentence is scanned from beginning to end, an action is selected according to the policy at each time step, and an argument-level action assigns a specific argument label to the current token.
In step 4, once a specific argument is detected, the agent transfers to the role-level policy network to sample a selection for the current argument according to its policy; a role-level selection is drawn from the set of role types.
In step 5, after the role classification of the argument is completed, the agent transfers back to the argument-level policy network and continues to scan the remaining tokens of the sentence to identify the remaining arguments of the event; once the agent has finished detecting the participating arguments of the current event, it transfers back to the event-level policy network and continues to scan the remaining sentence to identify other events.
In steps 2-5, once a selection or action is sampled, a reward is returned.
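By way of illustration only, the following Python sketch shows one possible way to organize this hierarchical scanning loop. The three policy functions (event_policy, argument_policy, role_policy) are hypothetical placeholders standing in for the trained event-, argument- and role-level policy networks described below, and the label conventions follow the selection and action sets defined later in this description.

```python
# A minimal, illustrative sketch of the hierarchical scanning procedure.
# event_policy / argument_policy / role_policy are hypothetical stand-ins
# for the trained event-, argument- and role-level policy networks.
def extract_events(tokens, event_policy, argument_policy, role_policy):
    events = []
    for t, token in enumerate(tokens):                 # event-level scan
        event_type = event_policy(tokens, t)           # 'NE' or an event type
        if event_type == "NE":
            continue
        event = {"trigger": token, "type": event_type, "arguments": []}
        for t2, token2 in enumerate(tokens):           # argument-level scan
            arg_label = argument_policy(tokens, t2, event_type)  # B/I/O/E/S/N
            if arg_label in ("E", "S"):                # a complete argument found
                role = role_policy(tokens, t2, event_type)       # role-level step
                event["arguments"].append((token2, arg_label, role))
        events.append(event)
    return events

# Toy usage with trivial placeholder policies:
if __name__ == "__main__":
    toks = ["a", "cameraman", "died"]
    ev = lambda s, t: "Die" if s[t] == "died" else "NE"
    arg = lambda s, t, e: "S" if s[t] == "cameraman" else "N"
    role = lambda s, t, e: "Victim"
    print(extract_events(toks, ev, arg, role))
```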
Specifically, given the input text $S = \{w_1, w_2, \ldots, w_L\}$, the purpose of the event-level policy network is to detect each trigger word $w_i$ and the event type it triggers. At the current token, i.e. time step $t$, the event-level policy network adopts a stochastic policy $\mu$ to determine its selection, and the reward obtained afterwards is used to guide the policy learning of the network.
A selection $o_t^e$ of the event-level policy network is sampled from the selection set $O^e = \{NE\} \cup \mathcal{E}$, where $NE$ denotes a token that is not a trigger word, and $\mathcal{E}$ is the set of event types predefined in the data set, used to indicate the event type triggered by the current trigger word.
The state $s_t^e$ of the event-level process is related to past time steps: it encodes not only the current input but also the previous environment state. $s_t^e$ is built from the concatenation of three vectors: 1) the state of the last time step $s_{t-1}$, where $s_{t-1} = s_{t-1}^e$ if the agent was running the event-level policy process at time step $t-1$ and $s_{t-1} = s_{t-1}^r$ otherwise, $s_{t-1}^e$ denoting the environment state of the event-level policy network at time step $t-1$ and $s_{t-1}^r$ the environment state of the role-level policy network at time step $t-1$; 2) the event type vector $v_t^e$, which is learned from the last selection satisfying $o^e \neq NE$; and 3) the hidden state vector $h_t$ over the current input word vector $w_t$, obtained by processing the token sequence with a Bi-LSTM:

$h_t = \mathrm{Bi\text{-}LSTM}(w_1, w_2, \ldots, w_L)_t$    (1)

In this way, the state is expressed as

$s_t^e = f^e([\,s_{t-1};\ v_t^e;\ h_t\,])$    (2)

where the function $f^e$, implemented as a multi-layer perceptron (MLP), maps the concatenation to a continuous real-valued state vector $s_t^e$.
The stochastic policy in the event-level policy network, i.e. the policy for making a selection, is $\mu: S^e \to O^e$; it samples a selection $o_t^e$ according to the probability distribution

$\mu(o_t^e \mid s_t^e) = \mathrm{softmax}(W^e s_t^e + b^e)$    (3)

where $W^e$ and $b^e$ are parameters and $s_t^e$ is the state representation vector.
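As one illustration, a minimal PyTorch-style sketch of such an event-level policy step is given below; the module layout and tensor dimensions (for example, giving the previous state and the event type vector the same dimension) are assumptions made for readability, and the Bi-LSTM, MLP and softmax correspond to Equations (1)-(3) above.

```python
import torch
import torch.nn as nn

class EventLevelPolicy(nn.Module):
    """Sketch of the event-level policy: state = MLP([s_prev; v_event; h_t]),
    selection probabilities = softmax(W_e * state + b_e)."""
    def __init__(self, word_dim, hidden_dim, state_dim, num_event_types):
        super().__init__()
        self.encoder = nn.LSTM(word_dim, hidden_dim, bidirectional=True,
                               batch_first=True)                    # Eq. (1)
        self.state_mlp = nn.Sequential(                              # Eq. (2)
            nn.Linear(state_dim + state_dim + 2 * hidden_dim, state_dim),
            nn.Tanh())
        self.selector = nn.Linear(state_dim, num_event_types + 1)    # +1 for NE, Eq. (3)

    def forward(self, word_vectors, s_prev, v_event, t):
        h_all, _ = self.encoder(word_vectors)        # (batch, L, 2*hidden_dim)
        h_t = h_all[:, t, :]                          # hidden state of token t
        s_t = self.state_mlp(torch.cat([s_prev, v_event, h_t], dim=-1))
        probs = torch.softmax(self.selector(s_t), dim=-1)
        return s_t, probs
```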
The ultimate purpose of the reward of the event-level policy network is to identify and classify events; whether a single trigger word is correct is only an intermediate result. Once an event-level selection $o_t^e$ is sampled, the agent receives an immediate reward that reflects the quality of this selection. This short-term reward is obtained by comparing $o_t^e$ with the gold annotation $y_t^e$ of the event type in sentence $S$:

$r_t^e = I(NE) \cdot \mathrm{sgn}(o_t^e = y_t^e)$

where $\mathrm{sgn}(\cdot)$ is the sign function and $I(NE)$ is a switch function that distinguishes the reward of trigger words from that of non-trigger words:

$I(NE) = \alpha$ if $o_t^e = NE$, and $I(NE) = 1$ otherwise,

where $\alpha$ is a bias weight with $\alpha < 1$. The smaller $\alpha$ is, the smaller the reward obtained for identifying a non-trigger word; this prevents the model from learning the trivial policy of predicting every word as $NE$, i.e. as a non-trigger word.
When the event-level policy network has sampled selections up to the last word in sentence $S$ and the agent has finished all event-level selections, a final reward $r_{fin}^e$ is obtained. The delayed reward of this final state is defined by the sentence-level event detection performance:

$r_{fin}^e = F_1(\mathcal{E}_S)$

where $F_1(\cdot)$ denotes the $F_1$ score of the sentence-level event detection result, i.e. the harmonic mean of the sentence-level precision and recall.
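For illustration, the two event-level rewards described above might be computed as in the following sketch; the concrete α value and the F1 helper are assumptions, and the gold labels are taken from the sentence-level annotation.

```python
def event_step_reward(selection, gold, alpha=0.1):
    """Immediate event-level reward: sign of correctness, scaled by alpha
    when the selection is the non-trigger label NE."""
    sign = 1.0 if selection == gold else -1.0
    scale = alpha if selection == "NE" else 1.0
    return scale * sign

def event_final_reward(predicted_events, gold_events):
    """Delayed reward: F1 of sentence-level event detection
    (harmonic mean of precision and recall)."""
    pred, gold = set(predicted_events), set(gold_events)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```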
Specifically, in step 3, when a specific event is detected at time step $t'$, i.e. $o_{t'}^e \in \mathcal{E}$, the agent transfers to the argument-level policy network to predict each argument participating in the event triggered at $t'$. At each token, i.e. time step $t$, the argument-level policy network adopts a stochastic policy $\pi$ to select an action, and uses rewards to guide the learning of arguments under the current event. In order to pass finer-grained event information down to assist the argument decisions, the event-level selection $o_{t'}^e$ and the state representation $s_{t'}^e$ from the event-level process are used as additional input throughout the argument-level process.
An action $a_t^n$ of the argument-level policy network assigns a specific argument label to the current token. $a_t^n$ is sampled from the action space $A^n = \{B, I, O, E, S\} \cup \{N\}$, where B/I/E indicate the position of the current token inside an argument (B the beginning, I an intermediate position, E the end), O marks a token belonging to an argument irrelevant to the current event, S marks an argument consisting of a single token, and N marks a non-argument token. The same argument may therefore be given different labels at different time steps, depending on the event type; in this way, the multiple-event and mismatch problems can be solved quite naturally.
The state $s_t^n$ of the argument-level process is related to past time steps; it encodes not only the current input but also the previous environment state and the environment information of the initiating event type. $s_t^n$ is built from the concatenation of four vectors: 1) the state of the last time step $s_{t-1}$, which may come from the event-level, argument-level or role-level policy network; 2) the argument label vector $v_t^n$, which is learned from the last action $a^n$; 3) the event state representation $s_{t'}^e$; and 4) the hidden state vector $h_t$, obtained from the same Bi-LSTM treatment as in Equation (1). The state is thus expressed as

$s_t^n = f^n([\,s_{t-1};\ v_t^n;\ s_{t'}^e;\ h_t\,])$

where $f^n$, implemented as a multi-layer perceptron (MLP), maps the concatenation to a continuous real-valued state vector $s_t^n$.
Taking the event type $o_{t'}^e$ as an additional input, the stochastic policy for argument detection, i.e. the policy for taking an action, is $\pi: S^n \to A^n$; it selects an action $a_t^n$ according to the probability distribution

$\pi(a_t^n \mid s_t^n, o_{t'}^e) = \mathrm{softmax}(W^n [\,s_t^n;\ v_{t'}^e\,] + b^n)$,  with  $v_{t'}^e = W^{\mu}\, o_{t'}^e$

where $W^n$ and $b^n$ are parameters, $s_t^n$ is the argument-level state representation vector, $v_{t'}^e$ is the representation of the event $o_{t'}^e$, and $W^{\mu}$ is an embedding matrix over the event type set $\mathcal{E}$ through which the event $o_{t'}^e$ is mapped to obtain its event representation vector.
Once an argument-level action $a_t^n$ is selected, the agent receives an immediate reward $r_t^n$. This reward is obtained by comparing the action with the gold argument annotation $y_t^n$ under the predicted event type $o_{t'}^e$, and is calculated as follows:

$r_t^n = I(N) \cdot \mathrm{sgn}(a_t^n = y_t^n)$

where $I(N)$ is a switch function that distinguishes the rewards of argument and non-argument tokens:

$I(N) = \beta$ if $a_t^n = N$, and $I(N) = 1$ otherwise,

where $\beta$ is a bias weight with $\beta < 1$; the smaller $\beta$ is, the smaller the reward obtained for non-argument tokens, which prevents the agent from learning the trivial policy of setting every action to $N$.
The agent continues to select an action for each token until the action of the last token. When the agent has finished all argument-level actions under the current event $o_{t'}^e$, it obtains a final reward $r_{fin}^n$, which, analogously to the event level, is defined by the argument detection performance under the current event.
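By way of illustration, a minimal sketch of the argument-level action selection conditioned on the event type might look as follows; the class layout and dimensions are assumptions for readability, with the event embedding playing the role of the mapping through $W^{\mu}$ described above.

```python
import torch
import torch.nn as nn

class ArgumentLevelPolicy(nn.Module):
    """Sketch: state = MLP([s_prev; v_arg_label; s_event; h_t]),
    action probabilities = softmax(W_n * [state; event_embedding] + b_n)."""
    def __init__(self, state_dim, hidden_dim, num_event_types, num_arg_labels=6):
        super().__init__()
        self.event_embed = nn.Embedding(num_event_types, state_dim)   # W_mu
        self.state_mlp = nn.Sequential(
            nn.Linear(3 * state_dim + 2 * hidden_dim, state_dim), nn.Tanh())
        self.selector = nn.Linear(2 * state_dim, num_arg_labels)      # B, I, O, E, S, N

    def forward(self, s_prev, v_arg_label, s_event, h_t, event_type_id):
        s_t = self.state_mlp(torch.cat([s_prev, v_arg_label, s_event, h_t], dim=-1))
        v_event = self.event_embed(event_type_id)            # event representation vector
        probs = torch.softmax(self.selector(torch.cat([s_t, v_event], dim=-1)), dim=-1)
        return s_t, probs
```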
Specifically, in step 4, when a participating argument is detected at time step $t$, i.e. $a_t^n \in \{E, S\}$, the agent transfers to the role-level policy network to predict the role this argument plays in the event triggered at $t'$. At each token, i.e. time step $t$, the role-level policy network adopts a stochastic policy to make a selection, and uses rewards to guide the learning of the roles of arguments participating in the current event. In order to pass finer-grained event and argument information down to assist the role decision, the event-level selection $o_{t'}^e$ and the argument-level action $a_t^n$ are used as additional input throughout the role-level process.
A selection $o_t^r$ of the role-level policy network classifies an argument role for the current argument; it is drawn from the argument role set, i.e. $O^r = R$, where $R$ is the predefined set of argument roles.
The state $s_t^r$ of the role-level process is likewise related to past time steps: it encodes not only the current input but also the previous environment state. $s_t^r$ is built from the concatenation of three vectors: 1) the state of the last time step $s_{t-1}$; 2) the argument role vector $v_t^r$, which is learned from the last selection $o^r$; and 3) the hidden state vector $h_t$, obtained from the same Bi-LSTM treatment as in Equation (1). The state is thus expressed as

$s_t^r = f^r([\,s_{t-1};\ v_t^r;\ h_t\,])$

where $f^r$, implemented as a multi-layer perceptron (MLP), maps the concatenation to a continuous real-valued state vector $s_t^r$.
To define the strategy for role set, the scores for all argument roles are first computed:
Figure BDA0002889173730000101
Figure BDA0002889173730000102
wherein, WrIs a parameter that is a function of,
Figure BDA0002889173730000103
is an argument level state representation vector
Figure BDA0002889173730000104
Is the representative vector of the current argument, hπIs a hidden state vector on the input word vector;
therefore, a matrix M epsilon {0,1} based on the event architecture is designed|ε|*|R|Wherein M [ e ]][r]Using this matrix to filter out argument roles that are unlikely to participate in the current event if and only if the event e has a role r in the event framework information;
then, the random strategy for role detection is μ Sr→OrIt selects a selection ot rProbability distribution according to:
Figure BDA0002889173730000105
Wrand brIs a parameter;
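As an illustration of the event-schema mask, the following sketch shows how such a matrix M could be built from schema information and used to zero out impossible roles before normalizing the scores; the schema contents and role names shown are placeholders, not the data set's actual schema.

```python
import torch

# Hypothetical schema: for each event type, the roles allowed by the event schema.
EVENT_TYPES = ["Die", "Attack"]
ROLES = ["Victim", "Target", "Attacker", "Place"]
SCHEMA = {"Die": {"Victim", "Place"}, "Attack": {"Target", "Attacker", "Place"}}

# M[e][r] = 1 iff event type e has role r in the event schema.
M = torch.zeros(len(EVENT_TYPES), len(ROLES))
for i, e in enumerate(EVENT_TYPES):
    for j, r in enumerate(ROLES):
        if r in SCHEMA[e]:
            M[i, j] = 1.0

def masked_role_distribution(scores, event_index):
    """Filter out roles that cannot participate in the current event,
    then renormalize to a probability distribution."""
    masked = scores * M[event_index]
    return masked / masked.sum()

# Example: non-negative scores over all roles for an argument of an Attack event.
scores = torch.tensor([0.2, 0.5, 0.1, 0.2])
print(masked_role_distribution(scores, EVENT_TYPES.index("Attack")))
```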
Once a role-level selection $o_t^r$ is executed, the agent receives an immediate reward $r_t^r$. This reward is obtained by comparing the selection with the gold role annotation $y_t^r$ under the current event type, and is calculated as

$r_t^r = \mathrm{sgn}(o_t^r = y_t^r)$

Because the role-level selection is performed in only one step after an argument-level action, this immediate reward also serves as the final reward $r_{fin}^r$ of the role-level process.
Furthermore, during training the event-level policy network samples its selection from the probability distribution in Equation (3); during testing, the most probable selection of the event-level policy network is chosen, i.e. $o_t^e = \arg\max_{o} \mu(o \mid s_t^e)$. The actions of the argument-level policy network and the selections of the role-level policy network are sampled in the same manner during training and testing.
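For example, the switch between sampling during training and greedy selection during testing can be written as in this short sketch, where probs is assumed to be the probability distribution output by Equation (3):

```python
import torch

def select(probs, training):
    """Sample a selection during training; take the argmax during testing."""
    if training:
        return torch.multinomial(probs, num_samples=1).item()
    return torch.argmax(probs).item()
```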
Still further, the transition of the event-level policy network depends on the selection $o_t^e$. If at a certain time step $o_t^e = NE$, the agent simply continues with a new event-level state; otherwise, a specific event has been detected and the agent launches a new subtask, switching to the argument-level policy network to detect the arguments participating in the current event. Thereafter the agent performs argument-level selections and does not switch back to the event-level policy network until all argument-level selections under the current event $o_{t'}^e$ have been sampled; the event-level policy network then continues sampling selections until the last word in sentence $S$.
The transition of the argument-level policy network depends on the action $a_t^n$. If at a certain time step $a_t^n \in \{E, S\}$, a participating argument of the current event has been identified and the agent transfers to the role-level policy network to classify its argument role; otherwise the agent continues with the argument-level policy network. If the argument-level policy network reaches the end of the sentence, the agent transitions back to the event-level policy network to continue identifying the remaining events.
Still further, in order to optimize the event-level, argument-level and role-level policy networks, the training goal of the hierarchical policy network is to maximize the expected cumulative discounted reward that the agent obtains from the three processes by sampling selections and actions according to its policies at each time step $t$. The expected cumulative discounted reward is calculated as

$J = \mathbb{E}\Big[\sum_{k=0}^{T^e} \gamma^k r_k^e + \sum_{k=0}^{T^n} \gamma^k r_k^n + \sum_{k=0}^{T^r} \gamma^k r_k^r\Big]$

where $\mathbb{E}$ denotes the expectation of the reward under the policy networks, $\gamma \in [0,1]$ is the discount rate, $T^e$ is the total number of time steps elapsed before the event-level process ends, $T^n$ is the end time step of the argument-level process, $T^r$ is the number of time steps elapsed before the role-level process ends, and $r_k^{*}$ is the reward obtained by the corresponding process at time step $k$.
The cumulative reward is then decomposed into Bellman equations relating each selection to the sub-process it launches, and these can be optimized with the REINFORCE algorithm. In the decomposition, the value of an event-level selection consists of its immediate reward, the discounted rewards collected by the argument-level (and nested role-level) sub-process it launches, and the discounted value of the next event-level selection $o_{t+N}$, where $N$ is the number of time steps that the argument-level process lasts under the selection $o_t^e$; if $o_t^e = NE$, then $N = 1$. Since the role-level policy network performs only a single role classification step after an argument-level action $a_t^n$, the exponent of the discount rate $\gamma$ for this step within the argument-level process is 1; and since there is no further step under the role-level policy network, the exponent of the discount rate $\gamma$ there is 0. $R$ denotes the final reward ultimately obtained by each layer of the policy network, and $r$ denotes the immediate reward.
The Bellman equations obtained from this decomposition are optimized with the policy gradient method and the REINFORCE algorithm, yielding stochastic gradients of the standard REINFORCE form

$\nabla_{\Theta} J = \mathbb{E}\big[\, R_t \, \nabla_{\Theta} \log \pi_{\Theta}(a_t \mid s_t) \,\big]$

which are used to update the parameters $\Theta$ of the three policy networks.
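A minimal sketch of such a REINFORCE-style update for one policy network is shown below; the optimizer setup and the way returns are accumulated are assumptions, since the description does not fix these details.

```python
import torch

def reinforce_update(log_probs, rewards, optimizer, gamma=0.95):
    """REINFORCE: accumulate discounted returns and follow the gradient of
    sum_t return_t * log pi(a_t | s_t); log_probs are the log-probabilities
    of the selections/actions actually taken along one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):              # discounted return from each step
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```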
the invention also discloses an electronic device, comprising:
a processor;
and a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the above-described event extraction method via execution of the executable instructions.
Compared with the prior art, the method has the following advantages. First, a hierarchical policy network is applied and a deep reinforcement learning method is used for event extraction. A three-layer hierarchical network, MPNet, is designed to realize joint event extraction: the event-level policy network is used for event detection, the argument-level policy network is used for argument extraction, and the role-level policy network is used for argument role identification. Owing to the hierarchical structural design, MPNet is adept at exploiting the deep information interaction among the subtasks and stands out when processing sentences that contain multiple events. Therefore, the event extraction method has better performance.
Drawings
FIG. 1 shows a schematic flow diagram of an embodiment of the invention;
FIG. 2 shows an algorithm flow diagram of an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example one
Fig. 1 shows a schematic flow chart of the first embodiment of the present invention. The technical scheme of the invention is an event extraction method based on a hierarchical policy network, comprising the following steps:
step 1, constructing a hierarchical policy network, wherein the hierarchical policy network comprises an event-level policy network, an argument-level policy network and a role-level policy network;
step 2, while scanning from the beginning of a sentence to the end of the sentence, the event-level policy network detects a trigger word at each token and classifies the event type of the detected trigger word;
step 3, once a specific event is detected, the argument-level policy network is triggered to scan the sentence from beginning to end to detect the participating arguments of the current event;
step 4, once an argument is identified, the role-level policy network is triggered to predict the role the argument plays under the current event;
step 5, when the role classification of the role-level policy network is finished, the argument-level policy network continues scanning the sentence from the position of the classified argument to find the next argument; once the argument detection of the argument-level policy network under the current event is finished, the event-level policy network continues scanning the sentence from the token position of the current trigger word to detect other events contained in the sentence, until the end of the sentence is reached.
The specific algorithm flow is shown in fig. 2.
Furthermore, an agent is adopted to perform the above steps 2-5. The specific design of the event-level, argument-level and role-level policy networks in this embodiment, including their selections and actions, state representations, policies, rewards, transitions between levels, and the hierarchical training objective with its optimization by the policy gradient method and the REINFORCE algorithm, is the same as described above in the Disclosure of the Invention and is not repeated here.
example two
The invention also discloses an electronic device, comprising:
a processor;
and a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the event extraction method of the first embodiment by executing the executable instructions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims (8)

1. An event extraction method based on a hierarchical policy network is characterized by comprising the following steps:
step 1, constructing a hierarchical policy network, wherein the hierarchical policy network comprises an event-level policy network, an argument-level policy network and a role-level policy network;
step 2, while scanning from the beginning of a sentence to the end of the sentence, the event-level policy network detects a trigger word at each token and classifies the event type of the detected trigger word;
step 3, once a specific event is detected, the argument-level policy network is triggered to scan the sentence from beginning to end to detect the participating arguments of the current event;
step 4, once an argument is identified, the role-level policy network is triggered to predict the role the argument plays under the current event;
step 5, when the role classification of the role-level policy network is finished, the argument-level policy network continues scanning the sentence from the position of the classified argument to find the next argument; once the argument detection of the argument-level policy network under the current event is finished, the event-level policy network continues scanning the sentence from the token position of the current trigger word to detect other events contained in the sentence, until the end of the sentence is reached.
2. The event extraction method based on a hierarchical policy network according to claim 1, wherein an agent performs the above steps 2-5; in step 2, as the agent scans the sentence sequentially from beginning to end, the event-level policy network samples a selection according to its policy at each time step, and an event-level selection is either the non-trigger label or a specific event type from the predefined event type set;
in step 3, once a specific event is detected, the agent transfers to the argument-level policy network; while the sentence is scanned from beginning to end, an action is selected according to the policy at each time step, and an argument-level action assigns a specific argument label to the current token;
in step 4, once a specific argument is detected, the agent transfers to the role-level policy network to sample a selection for the current argument according to its policy, and a role-level selection is drawn from the set of role types;
in step 5, after the role classification of the argument is completed, the agent transfers back to the argument-level policy network and continues to scan the remaining tokens of the sentence to identify the remaining arguments of the event; once the agent has finished detecting the participating arguments of the current event, it transfers back to the event-level policy network and continues to scan the remaining sentence to identify other events;
in steps 2-5, once a selection or action is sampled, a reward is returned.
3. The event extraction method based on a hierarchical policy network according to claim 2, wherein given the input text $S = \{w_1, w_2, \ldots, w_L\}$, the purpose of the event-level policy network is to detect each trigger word $w_i$ and the event type it triggers; at the current token, i.e. time step $t$, the event-level policy network adopts a stochastic policy $\mu$ to determine its selection, and the reward obtained afterwards is used to guide the policy learning of the network;
a selection $o_t^e$ of the event-level policy network is sampled from the selection set $O^e = \{NE\} \cup \mathcal{E}$, where $NE$ denotes a token that is not a trigger word, and $\mathcal{E}$ is the set of event types predefined in the data set, used to indicate the event type triggered by the current trigger word;
the state $s_t^e$ of the event-level process is related to past time steps and encodes not only the current input but also the previous environment state; $s_t^e$ is built from the concatenation of three vectors: 1) the state of the last time step $s_{t-1}$, where $s_{t-1} = s_{t-1}^e$ if the agent was running the event-level policy process at time step $t-1$ and $s_{t-1} = s_{t-1}^r$ otherwise, $s_{t-1}^e$ denoting the environment state of the event-level policy network at time step $t-1$ and $s_{t-1}^r$ the environment state of the role-level policy network at time step $t-1$; 2) the event type vector $v_t^e$, learned from the last selection satisfying $o^e \neq NE$; and 3) the hidden state vector $h_t$ over the current input word vector $w_t$, obtained by processing the token sequence with a Bi-LSTM, $h_t = \mathrm{Bi\text{-}LSTM}(w_1, \ldots, w_L)_t$; finally, a multi-layer perceptron MLP maps the concatenation $[\,s_{t-1};\ v_t^e;\ h_t\,]$ to a continuous real-valued state vector $s_t^e$;
the stochastic policy in the event-level policy network, i.e. the policy for making a selection, is $\mu: S^e \to O^e$; it samples a selection $o_t^e$ according to the probability distribution $\mu(o_t^e \mid s_t^e) = \mathrm{softmax}(W^e s_t^e + b^e)$, where $W^e$ and $b^e$ are parameters and $s_t^e$ is the state representation vector;
the ultimate purpose of the reward of the event-level policy network is to identify and classify events, and whether a single trigger word is correct is only an intermediate result; once an event-level selection $o_t^e$ is sampled, the agent receives an immediate reward reflecting the quality of the selection, obtained by comparing $o_t^e$ with the gold annotation $y_t^e$ of the event type in sentence $S$: $r_t^e = I(NE) \cdot \mathrm{sgn}(o_t^e = y_t^e)$, where $\mathrm{sgn}(\cdot)$ is the sign function and $I(NE)$ is a switch function distinguishing the reward of trigger words from that of non-trigger words, with $I(NE) = \alpha$ if $o_t^e = NE$ and $I(NE) = 1$ otherwise, where $\alpha$ is a bias weight, $\alpha < 1$; the smaller $\alpha$ is, the smaller the reward obtained for identifying a non-trigger word, which prevents the model from learning the trivial policy of predicting every word as $NE$, i.e. as a non-trigger word;
when the event-level policy network has sampled selections up to the last word in sentence $S$ and the agent has finished all event-level selections, a final reward $r_{fin}^e$ is obtained; the delayed reward of this final state is defined by the sentence-level event detection performance, $r_{fin}^e = F_1(\mathcal{E}_S)$, where $F_1(\cdot)$ denotes the $F_1$ score of the sentence-level event detection result, i.e. the harmonic mean of the sentence-level precision and recall.
4. The hierarchical-policy-network-based event extraction method according to claim 3, wherein in step 3, when a specific event o_t^e is detected at time step t, the agent transfers to the argument-level policy network to predict each argument participating in the event e_{t'}; the argument-level policy network adopts a stochastic policy π to select an action at each word/time step t and uses rewards to guide the learning of arguments under the current event; in order to deliver finer-grained event information to assist the argument decisions, the selection o_{t'}^e and the state representation s_{t'}^e from the event-level process are used as additional inputs by the entire argument-level process;
the action a_t^n of the argument-level policy network assigns a particular argument label to the current word; a_t^n is taken from the action space A^n = {B, I, O, E, S} ∪ {N}, where B/I/E denote the position of the current word within an argument (B a start position, I an intermediate position, E an end position), O marks an argument that is irrelevant to the current event, S denotes a single-word argument, and N denotes a non-argument word; because the event types differ at different time steps, the same argument may be given different labels; in this way, the multiple-event and mismatch problems are handled quite naturally;
the state s_t^n of the argument-level policy network process is related to past time steps, encoding not only the current input but also the previous environment state and the environment information of the initiating event type; s_t^n is the concatenation of four vectors: 1) the state s_{t-1} of the last time step, where s_{t-1} may come from the event-level, argument-level or role-level policy network; 2) the argument label vector v_t^n, which is learned from the action a_{t-1}^n; 3) the event state representation s_{t'}^e; 4) the hidden state vector h_t, obtained by the same Bi-LSTM processing as in Equation 1; in this way, s_t^n is expressed as the concatenation [s_{t-1}; v_t^n; s_{t'}^e; h_t], and finally a multi-layer perceptron MLP is used to represent the state as a continuous real-valued vector s_t^n;
with the event type e_{t'} as an additional input, the stochastic policy for argument detection, i.e. the policy that takes an action, π: S^n → A^n, selects an action a_t^n according to the probability distribution
π(a_t^n | s_t^n) = softmax(W^n [s_t^n; v_{e_{t'}}] + b^n)
v_{e_{t'}} = W^μ e_{t'}
where W^n and b^n are parameters, s_t^n is the argument-level state representation vector, v_{e_{t'}} is the representation of the event e_{t'}, and W^μ is an embedding matrix over the event type set ε through which the event e_{t'} is mapped to obtain the event representation vector;
once an argument-level action a_t^n is selected, the agent receives an immediate reward r_t^n; this reward is related to the predicted event type e_{t'}: the action is compared with the gold argument label y_t^n under that event, and the reward is computed as follows:
r_t^n = I(N) · sgn(a_t^n = y_t^n)
where I(N) is a switch function that distinguishes the rewards of argument words and non-argument words:
I(N) = β if the gold label is N, and 1 otherwise
where β is a bias weight with β < 1; the smaller β is, the smaller the reward obtained for non-argument words, which prevents the agent from learning the trivial policy of setting all actions to N;
the agent continues to select an action for each word until the action for the last word; when the agent has finished all argument-level action selections under the current event e_{t'}, a final reward R^n is obtained, which is defined by the argument-detection performance under the current event;
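The argument-level tagging scheme and reward of claim 4 can be illustrated with the following sketch; the label set follows the claim, while the span decoder, the toy sentence, and the value of β are illustrative assumptions:

    # Hypothetical span decoder for the argument-level action space A^n = {B, I, O, E, S} ∪ {N};
    # tag sequences and event names below are illustrative only.
    LABELS = ["B", "I", "O", "E", "S", "N"]

    def decode_arguments(tags):
        """Turn a per-word tag sequence into (start, end) argument spans.
        'O' and 'N' words never open a span; 'S' is a single-word argument."""
        spans, start = [], None
        for i, tag in enumerate(tags):
            if tag == "S":
                spans.append((i, i)); start = None
            elif tag == "B":
                start = i
            elif tag == "E" and start is not None:
                spans.append((start, i)); start = None
            elif tag in ("O", "N"):
                start = None
        return spans

    def argument_reward(action, gold, beta=0.2):
        """Immediate reward r_t^n with the bias weight beta applied to non-argument (N) words."""
        weight = beta if gold == "N" else 1.0
        return weight * (1.0 if action == gold else -1.0)

    # The same sentence can receive different tag sequences under different events,
    # which is how the multiple-event / mismatch problem is handled:
    tags_under_attack_event    = ["N", "S", "N", "B", "E", "N"]
    tags_under_transport_event = ["N", "N", "N", "S", "N", "N"]
    print(decode_arguments(tags_under_attack_event))     # [(1, 1), (3, 4)]
    print(decode_arguments(tags_under_transport_event))  # [(3, 3)]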
5. The method according to claim 4, wherein in step 4, when a participating argument is detected at time step t, i.e. when the argument-level action a_t^n completes an argument, the agent transfers to the role-level policy network to predict the role that this argument plays in the event e_{t'}; specifically, at each word/time step t the role-level policy network adopts a stochastic policy μ to make a selection and uses rewards to guide the learning of the roles of the participating arguments under the current event; in order to deliver finer-grained event and argument information to assist the role decision, the selection o_{t'}^e and the action a_t^n are used as additional inputs by the entire role-level process;
the selection o_t^r of the role-level policy network classifies an argument role for the current argument, and selections are taken from the argument role set, i.e. O^r = R, where R is the predefined set of argument roles;
the state s_t^r of the role-level process is also related to past time steps, encoding not only the current input but also the previous environment state; s_t^r is the concatenation of three vectors: 1) the state s_{t-1} of the last time step; 2) the argument role vector v_t^r, which is learned from the selected role-level selection o^r; 3) the hidden state vector h_t, obtained by the same Bi-LSTM processing as in Equation 1; in this way, s_t^r is expressed as the concatenation [s_{t-1}; v_t^r; h_t], and finally a multi-layer perceptron MLP is used to represent the state as a continuous real-valued vector s_t^r;
in order to define the policy over the role set, scores for all argument roles are first computed from the role-level state representation s_t^r, the representation vector of the current argument, and the hidden state vector h_π over the input word vectors;
in addition, a matrix M ∈ {0,1}^{|ε|×|R|} based on the event schema is designed, where M[e][r] = 1 if and only if event e has the role r in the event schema information; this matrix is used to filter out argument roles that are unlikely to participate in the current event;
then, the stochastic policy for role detection, μ: S^r → O^r, makes a selection o_t^r according to the probability distribution obtained by a softmax over the role scores masked by M[e_{t'}], where W^r and b^r are parameters;
once a role-level selection o_t^r has been executed, the agent receives an immediate reward r_t^r; this reward is obtained by comparing the selection with the gold role label y_t^r under the current event type, and is computed as follows:
r_t^r = sgn(o_t^r = y_t^r)
since the role-level selection is performed in only one step under the argument-level action a_t^n, its final reward coincides with this immediate reward;
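The schema-constrained role selection of claim 5 can be illustrated with the following sketch; the event and role inventories, and the use of an explicit -inf mask before the softmax, are illustrative assumptions rather than the claimed formulation:

    import torch

    # Illustrative event and role inventories (not taken from the patent).
    EVENTS = ["Attack", "Transport"]
    ROLES = ["Attacker", "Target", "Instrument", "Artifact", "Destination"]

    # M[e][r] = 1 iff role r is defined for event e in the event schema.
    M = torch.tensor([[1, 1, 1, 0, 0],    # Attack
                      [0, 0, 0, 1, 1]],   # Transport
                     dtype=torch.float)

    def role_distribution(scores, event_index):
        """Mask the role scores with the schema row of the current event, then softmax."""
        mask = M[event_index]
        masked = scores.masked_fill(mask == 0, float("-inf"))
        return torch.softmax(masked, dim=-1)       # μ(o_t^r | s_t^r), zero mass on invalid roles

    scores = torch.randn(len(ROLES))               # stand-in for the computed role scores
    probs = role_distribution(scores, EVENTS.index("Attack"))
    print(probs)                                   # Artifact/Destination get probability 0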
6. The hierarchical-policy-network-based event extraction method according to any one of claims 3 to 5, wherein during training the event-level policy network samples its selection according to the probability distribution in Equation 3, while during testing the most probable selection of the event-level policy network is taken, i.e. o_t^e = argmax_{o ∈ O^e} μ(o | s_t^e); during training and testing of the argument-level and role-level policy networks, the actions and selections are obtained by sampling in the same manner as in the event-level policy network;
the transition of the event-level policy network depends on the selection o_t^e: if o_t^e = NE at a certain time step, the agent continues with a new event-level policy network state; otherwise, meaning that a particular event has been detected, the agent launches a new subtask and switches to the argument-level policy network to detect the arguments participating in the current event; thereafter the agent performs argument-level selections and does not switch back to the event-level policy network until all argument-level selections under the current event e_{t'} have been sampled, after which the event-level policy network continues sampling selections until the last word of sentence S;
the transition of the argument-level policy network depends on the action a_t^n: if the action at a certain time step identifies a participating argument of the current event, the agent transfers to the role-level policy network to classify the argument role; otherwise the agent continues with the argument-level policy network; if the argument-level policy network executes to the end of the sentence, the agent transitions back to the event-level policy network to continue identifying the remaining events.
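A minimal control-flow sketch of the transitions just described; the three sampling stubs stand in for the trained policy networks and are purely illustrative assumptions, so only the switching logic mirrors claim 6:

    import random

    EVENT_TYPES = ["NE", "Attack", "Transport"]
    ARG_ACTIONS = ["B", "I", "O", "E", "S", "N"]
    ROLES = ["Attacker", "Target", "Artifact"]

    def sample_event_option(word):           return random.choice(EVENT_TYPES)
    def sample_argument_action(word, event): return random.choice(ARG_ACTIONS)
    def sample_role_option(span, event):     return random.choice(ROLES)

    def extract(sentence):
        results = []
        for t, word in enumerate(sentence):                  # event-level pass over the sentence
            option = sample_event_option(word)
            if option == "NE":
                continue                                     # stay at the event level
            # a specific event was detected: launch the argument-level subtask
            start = None
            for i, w in enumerate(sentence):                 # argument-level pass under this event
                action = sample_argument_action(w, option)
                if action == "B":
                    start = i
                elif action == "S" or (action == "E" and start is not None):
                    span = (i, i) if action == "S" else (start, i)
                    role = sample_role_option(span, option)  # one-step role-level subtask
                    results.append((option, span, role))
                    start = None
                elif action in ("O", "N"):
                    start = None
            # argument-level subtask finished: control returns to the event level
        return results

    print(extract("the troops attacked the city yesterday".split()))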
7. The method according to claim 6, wherein, in order to optimize the event-level policy network, the argument-level policy network and the role-level policy network, the objective of the hierarchical training of the hierarchical policy network is to maximize the expected cumulative discounted reward of the three stages that the agent obtains at each time step t when sampling selections and actions according to the policies; the expectation of the cumulative discounted reward is computed over the rewards of the event-level, argument-level and role-level processes, where E[·] denotes the expectation of the reward under the policy networks, γ ∈ [0,1] is the discount rate, T^e is the total number of time steps elapsed before the event-level process ends, T^n is the end time step of the argument-level process, T^r is the number of time steps elapsed before the role-level process ends, and r_k is the reward obtained by the corresponding process at time step k;
the cumulative reward is then decomposed into Bellman equations, which can subsequently be optimized with the REINFORCE algorithm; in the decomposed Bellman equations, N is the number of time steps that the argument-level process lasts under the selection o_t^e, so the next selection of the agent is o_{t+N}; if o_t^e = NE, then N = 1; since the role-level selection involves only a one-step role classification under the argument-level action a_t^n, the exponent of the discount rate γ in the argument-level process is 1, and since no further step follows under the role-level policy network, the exponent of the discount rate γ there is 0; R denotes the final reward ultimately obtained by each level of the policy network, and r denotes the immediate reward;
the Bellman equations obtained by the decomposition are optimized with the policy-gradient method and the REINFORCE algorithm, yielding stochastic gradients for updating the parameters, in which the gradient of the log-probability of each sampled selection or action is weighted by the corresponding cumulative discounted reward.
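A bare-bones REINFORCE update for a single level of the hierarchy, written as a sketch; the toy two-option policy, learning rate, discount rate, and reward are illustrative assumptions, not the patent's training configuration:

    import torch

    def reinforce_loss(log_probs, rewards, gamma=0.95):
        """Negative expected discounted return; minimizing it ascends the policy gradient.
        log_probs are log μ(o_t | s_t) of the sampled selections, rewards the per-step rewards
        (with any delayed final reward added at the last step)."""
        returns, g = [], 0.0
        for r in reversed(rewards):                 # cumulative discounted return G_t
            g = r + gamma * g
            returns.insert(0, g)
        returns = torch.tensor(returns)
        return -(torch.stack(log_probs) * returns).sum()

    # toy usage with a two-parameter "policy" over two options
    logits = torch.zeros(2, requires_grad=True)
    opt = torch.optim.SGD([logits], lr=0.1)
    dist = torch.distributions.Categorical(logits=logits)
    actions = [dist.sample() for _ in range(3)]
    log_probs = [dist.log_prob(a) for a in actions]
    rewards = [1.0 if a.item() == 0 else -1.0 for a in actions]   # illustrative reward signal
    loss = reinforce_loss(log_probs, rewards)
    opt.zero_grad(); loss.backward(); opt.step()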
8. An event extraction electronic device based on a hierarchical policy network, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform, via execution of the executable instructions, the hierarchical-policy-network-based event extraction method of any one of claims 1 to 7.
CN202110022760.2A 2021-01-08 2021-01-08 Event extraction method and device based on hierarchical policy network Active CN112836504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110022760.2A CN112836504B (en) 2021-01-08 2021-01-08 Event extraction method and device based on hierarchical policy network

Publications (2)

Publication Number Publication Date
CN112836504A true CN112836504A (en) 2021-05-25
CN112836504B CN112836504B (en) 2024-02-02

Family

ID=75928654

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110022760.2A Active CN112836504B (en) 2021-01-08 2021-01-08 Event extraction method and device based on hierarchical policy network

Country Status (1)

Country Link
CN (1) CN112836504B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200380210A1 (en) * 2018-07-03 2020-12-03 Tencent Technology (Shenzhen) Company Limited Event Recognition Method and Apparatus, Model Training Method and Apparatus, and Storage Medium
CN109582949A (en) * 2018-09-14 2019-04-05 阿里巴巴集团控股有限公司 Event element abstracting method, calculates equipment and storage medium at device
CN110704598A (en) * 2019-09-29 2020-01-17 北京明略软件系统有限公司 Statement information extraction method, extraction device and readable storage medium
CN111382575A (en) * 2020-03-19 2020-07-07 电子科技大学 Event extraction method based on joint labeling and entity semantic information
CN112183030A (en) * 2020-10-10 2021-01-05 深圳壹账通智能科技有限公司 Event extraction method and device based on preset neural network, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG Peixin; ZHAO Xiang; FANG Yang; ZHU Huiming; XIAO Weidong: "End-to-end joint extraction of knowledge triples incorporating adversarial training", Journal of Computer Research and Development, no. 12, pages 2536-2548 *

Also Published As

Publication number Publication date
CN112836504B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
Lu et al. Transfer learning using computational intelligence: A survey
CN111143576A (en) Event-oriented dynamic knowledge graph construction method and device
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
CN108536784B (en) Comment information sentiment analysis method and device, computer storage medium and server
Wang et al. Exploiting topic-based adversarial neural network for cross-domain keyphrase extraction
CN114756687A (en) Self-learning entity relationship combined extraction-based steel production line equipment diagnosis method
CN115131613B (en) Small sample image classification method based on multidirectional knowledge migration
CN111582506A (en) Multi-label learning method based on global and local label relation
CN118170668A (en) Test case generation method, device, storage medium and equipment
Barbhuiya et al. Gesture recognition from RGB images using convolutional neural network‐attention based system
CN114419394A (en) Method and device for recognizing semantic soft label image with limited and unbalanced data
CN114048361A (en) Crowdsourcing software developer recommendation method based on deep learning
Shen et al. Progress-aware online action segmentation for egocentric procedural task videos
Shen et al. Active learning for event extraction with memory-based loss prediction model
CN116630708A (en) Image classification method, system, equipment and medium based on active domain self-adaption
CN112836504A (en) Event extraction method and device based on hierarchical policy network
Li et al. Variance tolerance factors for interpreting all neural networks
CN115132280A (en) Causal network local structure discovery system based on weak prior knowledge
Li Textual Data Mining for Financial Fraud Detection: A Deep Learning Approach
CN114842246B (en) Social media pressure type detection method and device
CN116610783B (en) Service optimization method based on artificial intelligent decision and digital online page system
US20230306769A1 (en) Model Generation System and Model Generation Method
Schöner Detecting Uncertainty in Text Classifications: A Sequence to Sequence Approach using Bayesian RNNs
Sharma et al. Optimizing Text Data in Deep Learning: An Experimental Approach
Laurelli Adaptive Meta-Domain Transfer Learning (AMDTL): A Novel Approach for Knowledge Transfer in AI

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant