CN116306581A - Event extraction method and device - Google Patents


Info

Publication number
CN116306581A
Authority
CN
China
Prior art keywords
entity
event
target
argument
target sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310505237.4A
Other languages
Chinese (zh)
Inventor
汤伟
郭行飞
刘永丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongxin Kuanwei Media Technology Co ltd
Original Assignee
Zhongxin Kuanwei Media Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongxin Kuanwei Media Technology Co ltd
Priority to CN202310505237.4A
Publication of CN116306581A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an event extraction method and device. The method comprises the following steps: acquiring an event type set, a role type set and a target sentence of an event to be extracted; determining each event type in the event type set and each argument role in the role type set as an entity set; based on the entity set, performing nested entity recognition on the target sentence to obtain a target entity corresponding to the target sentence; generating an argument relation graph corresponding to the target sentence based on the target entity; searching for a complete subgraph in the argument relation graph by using a preset search algorithm; and generating event information corresponding to the target sentence based on the complete subgraph. The invention can improve the event extraction effect.

Description

Event extraction method and device
Technical Field
The invention relates to the technical field of natural language processing, in particular to an event extraction method and an event extraction device.
Background
At present, the goal of the event extraction task is, given a set of target event types, a set of role types and a sentence, to identify all events of the target event types in the sentence and to extract the arguments of each event according to the set of argument roles.
In practice, it has been found that current event extraction approaches generally require performing four sub-tasks in sequence: trigger word detection, event/trigger word type recognition, event argument detection, and argument role recognition. Such approaches must detect the trigger words first and then carry out all further processing based on them. If trigger word detection goes wrong, the subsequent processing is difficult to carry out. Therefore, existing event extraction methods suffer from a poor event extraction effect.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides an event extraction method and an event extraction device, which are used for at least improving the event extraction effect.
According to an aspect of an embodiment of the present invention, there is provided an event extraction method, including: acquiring an event type set, a role type set and a target sentence of an event to be extracted; determining each event type in the event type set and each argument role in the role type set as an entity set; based on the entity set, performing nested entity recognition on the target sentence to obtain a target entity corresponding to the target sentence; generating an argument relation graph corresponding to the target sentence based on the target entity; searching for a complete subgraph in the argument relation graph by using a preset search algorithm; and generating event information corresponding to the target sentence based on the complete subgraph.
As an optional implementation manner, based on the entity set, performing nested entity recognition on the target sentence to obtain a target entity corresponding to the target sentence, including: encoding the target sentence to obtain a target vector sequence; converting the target vector sequence into a first vector sequence and a second vector sequence; scoring each entity in the entity set by using the first vector sequence and the second vector sequence to obtain entity scoring information; and determining the target entity corresponding to the target sentence based on the entity scoring information.
As an optional implementation manner, based on the target entity, generating an argument relation graph corresponding to the target sentence includes: determining each target entity as an argument corresponding to the target sentence; and constructing the argument relation graph among the arguments.
As an alternative embodiment, searching for the complete subgraph in the argument relation graph using a preset search algorithm includes: determining all node pairs in the argument relation graph; and if all node pairs are adjacent, determining the argument relation graph to be the complete subgraph.
As an alternative embodiment, the method further comprises: if the node pairs are not adjacent, candidate sub-graph construction operation is carried out on the non-adjacent node pairs, and candidate sub-graphs corresponding to each node are obtained; if all node pairs in the candidate subgraph are adjacent, determining the candidate subgraph as the complete subgraph; and if the node pairs in the candidate subgraph are not adjacent, repeatedly executing the candidate subgraph construction operation on the non-adjacent node pairs to obtain updated candidate subgraphs until all the node pairs in the updated candidate subgraphs are adjacent.
As an alternative embodiment, the candidate subgraph construction operation is: for each node in a non-adjacent node pair, determining the set of nodes adjacent to that node; and determining the candidate subgraph based on the node and its set of adjacent nodes.
According to another aspect of the embodiment of the present invention, there is also provided an event extraction apparatus, including: the data acquisition unit is used for acquiring an event type set, a role type set and a target sentence of an event to be extracted; the entity construction unit is used for determining each event type in the event type set and each argument role in the role type set as an entity set; the entity identification unit is used for carrying out nested entity identification on the target sentences based on the entity set to obtain target entities corresponding to the target sentences; an argument relation generating unit, configured to generate an argument relation graph corresponding to the target sentence based on the target entity; the subgraph searching unit is used for searching the complete subgraph in the argument relation graph by utilizing a preset searching algorithm; and the event generation unit is used for generating event information corresponding to the target sentence based on the complete subgraph.
As an optional implementation manner, the entity identification unit is specifically configured to: encoding the target sentence to obtain a target vector sequence; converting the target vector sequence into a first vector sequence and a second vector sequence; scoring each entity in the entity set by using the first vector sequence and the second vector sequence to obtain entity scoring information; and determining the target entity corresponding to the target sentence based on the entity scoring information.
As an alternative embodiment, the argument relation generating unit is specifically configured to: determine each target entity as an argument corresponding to the target sentence; and construct the argument relation graph among the arguments.
As an alternative embodiment, the sub-graph search unit is specifically configured to: determine all node pairs in the argument relation graph; and if all node pairs are adjacent, determine the argument relation graph to be the complete subgraph.
As an alternative embodiment, the sub-graph search unit is specifically configured to: if the node pairs are not adjacent, candidate sub-graph construction operation is carried out on the non-adjacent node pairs, and candidate sub-graphs corresponding to each node are obtained; if all node pairs in the candidate subgraph are adjacent, determining the candidate subgraph as the complete subgraph; and if the node pairs in the candidate subgraph are not adjacent, repeatedly executing the candidate subgraph construction operation on the non-adjacent node pairs to obtain updated candidate subgraphs until all the node pairs in the updated candidate subgraphs are adjacent.
As an alternative embodiment, the candidate subgraph construction operation is: for each node in a non-adjacent node pair, determining the set of nodes adjacent to that node; and determining the candidate subgraph based on the node and its set of adjacent nodes.
According to a further aspect of embodiments of the present invention, there is also provided a computer-readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the above-described event extraction method at run-time.
According to still another aspect of the embodiments of the present invention, there is further provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the event extraction method described above through the computer program.
In the embodiment of the invention, each event type and each role type are combined into one large class to form an entity set, which converts the event extraction problem into a named entity recognition problem. In particular, since entities may be nested, nested entity recognition is used to determine the target entities in the target sentence (that is, the arguments of the events that make up the target sentence). An argument relation graph is then generated and searched for complete subgraphs, and the event information corresponding to the target sentence is determined from them; it can be understood that one complete subgraph corresponds to one event. With the scheme of the embodiment of the invention, the event types and argument roles are treated directly as entities without considering trigger words, nested entity recognition is then performed, and, combined with the complete subgraph search, the event extraction effect can be improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow chart of an alternative event extraction method according to an embodiment of the invention;
FIG. 2 is a flow chart of another alternative event extraction method according to an embodiment of the invention;
FIG. 3 is an alternative event extraction schematic according to an embodiment of the invention;
FIG. 4 is an alternative complete sub-diagram schematic according to an embodiment of the invention;
FIG. 5 is a schematic diagram of an alternative event extraction device according to an embodiment of the invention;
FIG. 6 is a schematic structural diagram of an alternative electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention provides an optional event extraction method, as shown in fig. 1, which comprises the following steps:
S101, acquiring an event type set, a role type set and a target sentence of an event to be extracted.
In this embodiment, the execution body may be an electronic device such as a terminal device or a server.
In this embodiment, when performing event extraction, the execution body may first acquire a preset event type set, a preset role type set, and a target sentence on which event extraction needs to be performed. Then, from the event types and argument roles specified by the event type set and the role type set, the execution body may select the event types and argument roles that fit the target sentence, and identify in the sentence each argument corresponding to an argument role together with the event to which that argument belongs.
The event type set may include a plurality of event types, and the event types may include, but are not limited to, "win or lose", "championship win", and the like, which is not limited in this embodiment.
The role type set may include a plurality of argument roles, and the argument roles may include, but are not limited to, "time", "loser", "winner", "champion", "event", and the like, which is not limited in this embodiment.
The target sentence of the event to be extracted may be a sentence on which event extraction needs to be performed, and the target sentence may include one event or multiple events, which is not limited in this embodiment.
S102, determining each event type in the event type set and each argument role in the role type set as an entity set.
In this embodiment, the execution body may use all event types in the event type set and all argument roles in the role type set as entities to obtain an entity set. With this way of constructing the entity set, trigger words do not need to be annotated separately: the trigger word is also treated as an argument role of the event, and the subsequent event extraction only needs to perform argument identification and event division, which improves the event extraction effect.
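The construction of the entity set in S102 can be sketched as follows. This is a minimal illustration under one possible reading of the text, in which every (event type, argument role) pair becomes one entity label and the trigger word is included as an ordinary role. The function name `build_entity_set`, the role name `"trigger"`, and the sample values are assumptions made for illustration and do not come from the original.

```python
from itertools import product


def build_entity_set(event_types, role_types):
    """Combine every event type with every argument role into one label class.

    Assumption: each (event type, role) pair is treated as one entity label.
    """
    return list(product(event_types, role_types))


# Hypothetical sample sets; "trigger" is listed as an ordinary argument role.
event_types = ["win or lose", "championship win"]
role_types = ["trigger", "time", "winner", "loser"]

entity_set = build_entity_set(event_types, role_types)
# Map each label to an integer id so a recognizer can score one channel per label.
label2id = {label: idx for idx, label in enumerate(entity_set)}
```

A real schema would normally restrict each event type to its own roles; the full cross product is used here only to keep the sketch short.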
S103, based on the entity set, performing nested entity recognition on the target sentence to obtain a target entity corresponding to the target sentence.
In this embodiment, different target entities may be nested within one another, so nested entity recognition needs to be performed on the target sentence based on the entity set. Nested entity recognition is preferably accomplished using a GlobalPointer model (a way of performing named entity recognition based on the idea of global normalization).
Specifically, by performing nested entity recognition on the target sentence, a target entity corresponding to the target sentence can be obtained, where the target entity may be an argument in an event corresponding to the target sentence.
S104, generating an argument relation graph corresponding to the target sentence based on the target entity.
In this embodiment, the execution body may use an undirected graph or a directed graph to establish a connection relationship between target entities, so as to form an argument relationship graph corresponding to the target sentence. Wherein, the nodes of any two arguments of the same event can be connected with one edge to form adjacent nodes, and if the two arguments never appear in the same event, the corresponding nodes have no edge.
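The edge rule described above (two arguments are adjacent exactly when they appear in the same event) can be sketched as follows for the undirected case. The sketch assumes the grouping of arguments into events is already known, as it would be for annotated training data; at prediction time the edges would instead come from a relation model. The function name `build_argument_graph` and the node names are hypothetical.

```python
from itertools import combinations


def build_argument_graph(events):
    """Build an undirected adjacency map from lists of co-occurring arguments.

    events: iterable of argument lists, one list per event (assumed known).
    Returns {node: set of adjacent nodes}.
    """
    adj = {}
    for args in events:
        args = list(dict.fromkeys(args))  # drop duplicates, keep order
        for a in args:
            adj.setdefault(a, set())  # keep single-argument events as isolated nodes
        for u, v in combinations(args, 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


# Hypothetical example: argument D is shared by two events.
graph = build_argument_graph([["A", "B", "D"], ["D", "E"]])
```

Here `D` is adjacent to `A`, `B` and `E`, while `A` and `E` share no edge because they never appear in the same event.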
S105, searching the complete subgraph in the argument relation graph by using a preset searching algorithm.
In this embodiment, a complete subgraph is a subgraph in which any two nodes are adjacent, that is, belong to the same event; one complete subgraph in the argument relation graph corresponds to one event.
S106, generating event information corresponding to the target sentence based on the complete subgraph.
In this embodiment, all arguments in each complete sub-graph may be determined as arguments constituting one event, and the argument information corresponding to each event may be determined as the event information described above.
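As a rough illustration of S106, the sketch below turns one complete subgraph into one event record. It assumes that each node is a hypothetical tuple `(event_type, role, text)` and that all nodes of a complete subgraph carry the same event type; neither the node layout nor the output format is specified in the original.

```python
def clique_to_event(clique):
    """Turn the nodes of one complete subgraph into one event record.

    Assumption: every node is (event_type, role, text) and all nodes in the
    clique share the same event type.
    """
    nodes = sorted(clique)
    return {
        "event_type": nodes[0][0],
        "arguments": [{"role": role, "text": text} for _, role, text in nodes],
    }


# Hypothetical clique with two arguments of a "win or lose" event.
event = clique_to_event({
    ("win or lose", "winner", "athlete X"),
    ("win or lose", "loser", "athlete Y"),
})
```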
Referring also to FIG. 2, which is a flowchart of another alternative event extraction method according to an embodiment of the present invention: in this embodiment, when event extraction is performed, each event type and each argument role may be combined into one large class, so that the event extraction problem is converted into an entity recognition problem. A GlobalPointer model for nested entity recognition is then used to perform nested entity recognition on the target sentence, obtaining each argument of the target sentence. The determination of which arguments belong to which event can then be converted into a complete subgraph search problem. Specifically, the arguments of the target sentence can be combined into an argument relation graph, and the graph is then searched with a recursive search algorithm to obtain the complete subgraphs in it. Finally, the event arguments corresponding to each event in the target sentence are determined based on the complete subgraphs, and the event information corresponding to the target sentence is generated.
Referring also to FIG. 3, which is an optional event extraction schematic diagram according to an embodiment of the present invention: for the sentence requiring event extraction, "On the morning of September 9, athlete X defeated athlete Y 3:1 and won the singles championship of the tennis open", the existing event extraction method first needs to determine the trigger words "defeated" and "won" in the sentence, and then determine the event types "win or lose" and "championship win" according to the trigger words. It then determines the argument roles matching the event types: "time", "event name", "loser", "winner", "championship event", and "champion". Finally, it determines from the sentence the argument corresponding to each argument role, namely "morning of September 9", "tennis open", "athlete X", and "athlete Y", thereby completing the whole event extraction process. In this embodiment, by contrast, the event types and argument roles in FIG. 3 can be combined into one large class to obtain an entity set, and nested entity recognition is then performed on the target sentence based on the entity set to determine each argument in the sentence, thereby realizing the whole event extraction.
As an optional implementation manner, based on the entity set, performing nested entity recognition on the target sentence to obtain a target entity corresponding to the target sentence, including: encoding the target sentence to obtain a target vector sequence; converting the target vector sequence into a first vector sequence and a second vector sequence; scoring each entity in the entity set by using the first vector sequence and the second vector sequence to obtain entity scoring information; and determining the target entity corresponding to the target sentence based on the entity scoring information.
In this embodiment, the manner of using GlobalPointer to perform nested entity recognition on the target sentence based on the entity set, to obtain the target entity corresponding to the target sentence, may specifically be: encoding the target sentence to obtain a target vector sequence $[h_1, h_2, \dots, h_n]$. The target vector sequence is then converted into a first vector sequence $[q_{1,\alpha}, q_{2,\alpha}, \dots, q_{n,\alpha}]$ and a second vector sequence $[k_{1,\alpha}, k_{2,\alpha}, \dots, k_{n,\alpha}]$, where $n$ refers to the length of the target sentence and $\alpha$ refers to the entity type.
The specific way of converting the target vector sequence into the first vector sequence and the second vector sequence is as follows: the first vector sequence and the second vector sequence are obtained from the target vector sequence by $q_{i,\alpha} = W_{q,\alpha} h_i + b_{q,\alpha}$ and $k_{i,\alpha} = W_{k,\alpha} h_i + b_{k,\alpha}$, where $i$ refers to a position index in the target sentence; $W$ and $b$ are the corresponding weight coefficients; and $q$ and $k$ refer to the vector sequences obtained by the conversion for entity type $\alpha$.
The basic idea of GlobalPointer is as follows. Assume that the text sequence to be recognized has length $n$. Assume first that only one type of entity needs to be identified, that each entity to be identified is a contiguous segment of the sequence, and that entities may be nested within each other. A sequence of length $n$ then has $n(n+1)/2$ different contiguous subsequences, so the actual entities need to be selected from these $n(n+1)/2$ "candidate entities". If there are $m$ entity types to be identified, the task becomes $m$ multi-label classification problems, each selecting the actual entities from the $n(n+1)/2$ candidates. Define $s_\alpha(i,j) = q_{i,\alpha}^{\top} k_{j,\alpha}$ as the score for the contiguous segment from $i$ to $j$ being an entity of type $\alpha$. That is, the inner product of $q_{i,\alpha}$ and $k_{j,\alpha}$ is taken as the score for the segment $t_{[i:j]}$ being an entity of type $\alpha$, where $t_{[i:j]}$ denotes the contiguous substring from the $i$-th element through the $j$-th element of the sequence $t$.
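The span scoring described above can be sketched in NumPy as follows, assuming the bilinear form in which the score of span $(i, j)$ for type $\alpha$ is the inner product of a per-type "start" projection of position $i$ and a per-type "end" projection of position $j$. The function names, the array shapes, the head dimension, and the decoding threshold of 0 are assumptions made for illustration; the encoder itself is not shown and its output is replaced by random vectors.

```python
import numpy as np


def span_scores(h, Wq, bq, Wk, bk):
    """Score every span (i, j) for every entity type.

    h:  (n, d) encoder outputs for the n positions of the sentence.
    Wq, Wk: (m, d_head, d) per-type projection weights; bq, bk: (m, d_head).
    Returns s of shape (m, n, n) with s[a, i, j] = q[a, i] . k[a, j];
    spans with i > j are set to -inf because they are not valid segments.
    """
    q = np.einsum("mhd,nd->mnh", Wq, h) + bq[:, None, :]
    k = np.einsum("mhd,nd->mnh", Wk, h) + bk[:, None, :]
    s = np.einsum("mih,mjh->mij", q, k)
    n = h.shape[0]
    invalid = np.tril(np.ones((n, n), dtype=bool), k=-1)  # True where i > j
    return np.where(invalid[None], -np.inf, s)


def decode_spans(s, threshold=0.0):
    """Return (type, start, end) for spans scoring above the assumed threshold."""
    return [(int(a), int(i), int(j)) for a, i, j in zip(*np.where(s > threshold))]


# Stand-in inputs: random vectors instead of a real encoder and trained weights.
rng = np.random.default_rng(0)
n, d, m, d_head = 4, 8, 3, 5
h = rng.normal(size=(n, d))
Wq, Wk = rng.normal(size=(m, d_head, d)), rng.normal(size=(m, d_head, d))
bq, bk = rng.normal(size=(m, d_head)), rng.normal(size=(m, d_head))
scores = span_scores(h, Wq, bq, Wk, bk)
spans = decode_spans(scores)
```

Because each type has its own score matrix, overlapping or nested spans can be selected independently, which is what allows nested entities to be recognized.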
The loss function of GlobalPointer may be:
$$\log\Big(1 + \sum_{(i,j)\in P_\alpha} e^{-s_\alpha(i,j)}\Big) + \log\Big(1 + \sum_{(i,j)\in Q_\alpha} e^{s_\alpha(i,j)}\Big)$$
where $P_\alpha$ is the set of head-tail index pairs of all entities of type $\alpha$ in the sample, and $Q_\alpha$ is the set of head-tail index pairs of all spans in the sample that are non-entities or whose type is not $\alpha$. Only combinations with $i \le j$ need to be considered, i.e.
$$\Omega = \{(i,j) \mid 1 \le i \le j \le n\}$$
$$P_\alpha = \{(i,j) \mid t_{[i:j]} \text{ is an entity of type } \alpha\}, \qquad Q_\alpha = \Omega - P_\alpha$$
where $\Omega$ represents the set of position index pairs satisfying the condition, and $i$, $j$ represent indices into the text sequence.
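For a single entity type, a loss of this kind (a log-sum-exp over the spans that are entities plus a log-sum-exp over all other valid spans) can be sketched as below. This is a plain NumPy illustration using 0-based indices and no numerical stabilisation; the function name and argument layout are assumptions, and a training implementation would normally use a framework with automatic differentiation.

```python
import numpy as np


def global_pointer_loss(scores, positive_spans):
    """Loss for one entity type.

    scores: (n, n) array, scores[i, j] is the score of span (i, j), 0-based.
    positive_spans: iterable of (i, j) pairs that are entities of this type.
    Only spans with i <= j are considered; all other valid spans are negatives.
    """
    n = scores.shape[0]
    positives = set(positive_spans)
    omega = [(i, j) for i in range(n) for j in range(i, n)]
    pos_terms = [-scores[i, j] for (i, j) in omega if (i, j) in positives]
    neg_terms = [scores[i, j] for (i, j) in omega if (i, j) not in positives]
    # log(1 + sum exp(-s_pos)) + log(1 + sum exp(s_neg))
    return float(np.log1p(np.sum(np.exp(pos_terms))) + np.log1p(np.sum(np.exp(neg_terms))))


# With all scores at zero, n = 2 and one positive span, there is 1 positive
# and 2 negatives, so the loss is log(1 + 1) + log(1 + 2).
loss = global_pointer_loss(np.zeros((2, 2)), [(0, 0)])
```

The loss falls as positive spans score above zero and negative spans score below zero, which is consistent with decoding spans by a zero threshold.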
As an optional implementation manner, based on the target entity, generating an argument relation graph corresponding to the target sentence includes: determining each target entity as an argument corresponding to the target sentence; and constructing the argument relation graph among the arguments.
In this embodiment, each target entity may be used as an argument of the target sentence, and the connection relationships between the arguments may be established, so as to obtain the argument relation graph.
As an alternative embodiment, searching for the complete subgraph in the argument relation graph using a preset search algorithm includes: determining all node pairs in the argument relation graph; and if all node pairs are adjacent, determining the argument relation graph to be the complete subgraph.
As an alternative embodiment, the method further comprises: if the node pairs are not adjacent, candidate sub-graph construction operation is carried out on the non-adjacent node pairs, and candidate sub-graphs corresponding to each node are obtained; if all node pairs in the candidate subgraph are adjacent, determining the candidate subgraph as the complete subgraph; and if the node pairs in the candidate subgraph are not adjacent, repeatedly executing the candidate subgraph construction operation on the non-adjacent node pairs to obtain updated candidate subgraphs until all the node pairs in the updated candidate subgraphs are adjacent.
As an alternative embodiment, the candidate subgraph construction operation is: for each node in a non-adjacent node pair, determining the set of nodes adjacent to that node; and determining the candidate subgraph based on the node and its set of adjacent nodes.
In this embodiment, it is assumed that a directed graph describing the argument relations already exists and that the nodes in the graph are reusable, meaning that the same entity may simultaneously be an argument of several different events. Referring also to FIG. 4, which is a schematic diagram of an alternative complete subgraph: two complete subgraphs can be found among the 8 nodes in the graph, and node D appears in both subgraphs, which means that the two resulting events share the common argument D. The recursive search algorithm is as follows:
Step 1: enumerate all node pairs in the graph. If all node pairs are adjacent, the graph is a complete graph and is returned directly; if there are non-adjacent node pairs, perform step 2.
Step 2: for each pair of non-adjacent nodes, find, for each of the two nodes, the set of all nodes adjacent to it (including the node itself) to form a subgraph, and then perform step 1 on each such subgraph.
Taking FIG. 4 as an example, one can find that [Figure SMS_29] is a pair of non-adjacent nodes, and their adjacent sets can then be found as [Figure SMS_30] and [Figure SMS_31]. Continuing to look for non-adjacent node pairs within [Figure SMS_32] and [Figure SMS_33], none are found, so [Figure SMS_34] and [Figure SMS_35] are both complete subgraphs. It should be noted that the result does not depend on the order of the non-adjacent node pairs, since the same operation is performed for all non-adjacent node pairs.
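The two-step recursive search can be sketched as follows for an undirected graph stored as an adjacency map. This is a minimal illustration rather than the implementation used in the embodiment; the function name `clique_search`, the data layout, and the six-node example graph (which is hypothetical and is not the graph of FIG. 4) are assumptions.

```python
from itertools import combinations


def clique_search(nodes, adj):
    """Recursively split a node set until every remaining set is complete.

    nodes: iterable of node names; adj: {node: set of adjacent nodes}, symmetric.
    Returns a set of frozensets, each a complete subgraph.
    """
    nodes = frozenset(nodes)
    candidates = set()
    # Step 1: enumerate all node pairs and collect the non-adjacent ones.
    for u, v in combinations(sorted(nodes), 2):
        if v not in adj[u]:
            # Step 2: each node of the pair, together with its neighbours
            # inside the current node set, forms a candidate subgraph.
            candidates.add(frozenset({u}) | (adj[u] & nodes))
            candidates.add(frozenset({v}) | (adj[v] & nodes))
    if not candidates:
        return {nodes}  # all pairs adjacent: the node set is complete
    results = set()
    for sub in candidates:
        results |= clique_search(sub, adj)  # repeat step 1 on each candidate
    return results


# Hypothetical graph: {A, B, C, D} and {D, E, F} are fully connected, D is shared.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "E"), ("D", "F"), ("E", "F")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

cliques = clique_search(adj.keys(), adj)
```

Each candidate drops at least the other node of its non-adjacent pair, so every recursive call works on a strictly smaller node set and the search terminates. Collecting candidates in a set also removes the duplicates that arise when several non-adjacent pairs lead to the same subgraph.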
Alternatively, the present invention may use a nested entity recognition model to identify the arguments, and then use a "head-to-head" matching model and a "tail-to-tail" matching model, respectively, to construct the relations between arguments. The DuEE Chinese event extraction dataset and the DuEE-fin financial-domain document-level event extraction dataset may be selected as datasets.
In the embodiment of the invention, each event type and each role type are combined into one large class to form an entity set, which converts the event extraction problem into a named entity recognition problem. In particular, since entities may be nested, nested entity recognition is used to determine the target entities in the target sentence (that is, the arguments of the events that make up the target sentence). An argument relation graph is then generated and searched for complete subgraphs, and the event information corresponding to the target sentence is determined from them; it can be understood that one complete subgraph corresponds to one event. With the scheme of the embodiment of the invention, the event types and argument roles are treated directly as entities without considering trigger words, nested entity recognition is then performed, and, combined with the complete subgraph search, the event extraction effect can be improved.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
Further, an embodiment of the present invention provides an optional event extraction apparatus, as shown in fig. 5, where the event extraction apparatus includes:
the data acquisition unit 501 is configured to acquire a set of event types, a set of role types, and a target sentence of an event to be extracted.
In this embodiment, when event extraction is performed, a preset event type set, a preset role type set and a target sentence on which event extraction needs to be performed may be acquired first. Then, from the event types and argument roles specified by the event type set and the role type set, the event types and argument roles that fit the target sentence may be selected, and each argument corresponding to an argument role, together with the event to which that argument belongs, may be identified in the sentence.
The event type set may include a plurality of event types, and the event types may include, but are not limited to, "win or lose", "championship win", and the like, which is not limited in this embodiment.
The role type set may include a plurality of argument roles, and the argument roles may include, but are not limited to, "time", "loser", "winner", "champion", "event", and the like, which is not limited in this embodiment.
The target sentence of the event to be extracted may be a sentence on which event extraction needs to be performed, and the target sentence may include one event or multiple events, which is not limited in this embodiment.
An entity construction unit 502, configured to determine each event type in the event type set and each argument role in the role type set as an entity set.
In this embodiment, all event types in the event type set and all argument roles in the role type set may be used as entities to obtain an entity set. With this way of constructing the entity set, trigger words do not need to be annotated separately: the trigger word is also treated as an argument role of the event, and the subsequent event extraction only needs to perform argument identification and event division, which improves the event extraction effect.
An entity recognition unit 503, configured to perform nested entity recognition on the target sentence based on the entity set, so as to obtain a target entity corresponding to the target sentence.
In this embodiment, different target entities may be nested within one another, so nested entity recognition needs to be performed on the target sentence based on the entity set. Nested entity recognition is preferably accomplished using a GlobalPointer model (a way of performing named entity recognition based on the idea of global normalization).
Specifically, by performing nested entity recognition on the target sentence, a target entity corresponding to the target sentence can be obtained, where the target entity may be an argument in an event corresponding to the target sentence.
An argument relation generating unit 504, configured to generate an argument relation graph corresponding to the target sentence based on the target entity.
In this embodiment, connection relationships between target entities may be established by using an undirected graph or a directed graph, so as to form the argument relation graph corresponding to the target sentence. The nodes of any two arguments of the same event are connected by an edge and are therefore adjacent nodes; if two arguments never appear in the same event, there is no edge between the corresponding nodes.
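As an illustration only, the undirected variant of such a graph can be held as an adjacency mapping. The sketch below assumes that the pairs of arguments judged to belong to the same event are already available as input. How those pairs are predicted is not described in this passage, and the names build_argument_graph and linked_pairs are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def build_argument_graph(
    arguments: Iterable[Hashable],
    linked_pairs: Iterable[Tuple[Hashable, Hashable]],
) -> Dict[Hashable, Set[Hashable]]:
    """Return an undirected adjacency mapping over the recognized arguments."""
    adjacency: Dict[Hashable, Set[Hashable]] = {a: set() for a in arguments}
    for a, b in linked_pairs:
        # Ignore self-loops and pairs that mention unknown arguments.
        if a != b and a in adjacency and b in adjacency:
            adjacency[a].add(b)  # store the edge in both directions
            adjacency[b].add(a)
    return adjacency
```

An argument with an empty neighbor set is an isolated node, which later forms a one-node complete subgraph.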
A subgraph search unit 505, configured to search for complete subgraphs in the argument relation graph by using a preset search algorithm.
In this embodiment, a complete subgraph is a subgraph in which any two nodes are adjacent, that is, all of its nodes belong to the same event; one complete subgraph in the argument relation graph corresponds to one event.
An event generation unit 506, configured to generate event information corresponding to the target sentence based on the complete subgraph.
In this embodiment, all arguments in each complete sub-graph may be determined as arguments constituting one event, and the argument information corresponding to each event may be determined as the event information described above.
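A minimal sketch of this step follows, assuming each node of a complete subgraph is a tuple of (event type, argument role, argument text). That node layout and the output dictionary keys are assumptions made for illustration and are not specified in the source.

```python
from typing import Dict, Iterable, List, Tuple

Node = Tuple[str, str, str]  # (event type, argument role, argument text); assumed layout


def subgraphs_to_events(subgraphs: Iterable[Iterable[Node]]) -> List[Dict]:
    """Turn each complete subgraph into one event record per event type it holds."""
    events: List[Dict] = []
    for subgraph in subgraphs:
        by_type: Dict[str, List[Dict[str, str]]] = {}
        for event_type, role, text in sorted(subgraph):
            by_type.setdefault(event_type, []).append(
                {"role": role, "argument": text}
            )
        for event_type, arguments in by_type.items():
            events.append({"event_type": event_type, "arguments": arguments})
    return events
```

Grouping by event type inside a subgraph is a design choice of this sketch: it keeps the output well formed even if a subgraph happens to mix labels from different event types.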
As an optional implementation manner, the entity identification unit is specifically configured to: encoding the target sentence to obtain a target vector sequence; converting the target vector sequence into a first vector sequence and a second vector sequence; scoring each entity in the entity set by using the first vector sequence and the second vector sequence to obtain entity scoring information; and determining the target entity corresponding to the target sentence based on the entity scoring information.
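The four steps above can be sketched in plain Python. The sketch assumes that the first and second vector sequences are obtained by two linear projections per entity label and that a span (start, end) is scored by the dot product of the first vector at start and the second vector at end, as in GlobalPointer-style span scoring. The encoder and the rotary position encoding used by the published GlobalPointer model are omitted, and all names (project, score_spans, projections) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project(sequence: Sequence[Vector], weight: Matrix) -> List[List[float]]:
    """Apply one linear map (rows of `weight`) to every vector in the sequence."""
    return [[dot(row, vec) for row in weight] for vec in sequence]


def score_spans(
    target_vectors: Sequence[Vector],
    projections: Dict[str, Tuple[Matrix, Matrix]],
    threshold: float = 0.0,
) -> List[Tuple[str, int, int, float]]:
    """Score every span for every entity label and keep spans above the threshold."""
    found = []
    n = len(target_vectors)
    for label, (w_first, w_second) in projections.items():
        first = project(target_vectors, w_first)    # first vector sequence
        second = project(target_vectors, w_second)  # second vector sequence
        for start in range(n):
            for end in range(start, n):  # only spans with start <= end
                score = dot(first[start], second[end])
                if score > threshold:
                    found.append((label, start, end, score))
    return found
```

Because every (start, end) pair is scored independently for every label, overlapping and nested spans can all be returned, which is what makes this style of scoring suitable for nested entities.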
As an alternative embodiment, the argument relation generating unit is specifically configured to: determining each target entity as each argument corresponding to the target sentence; the argument relationship graph between the various arguments is constructed.
In this embodiment, each target entity may be taken as an argument of the target sentence, and connection relationships between the arguments may be established, so as to obtain the argument relation graph.
As an alternative embodiment, the subgraph search unit is specifically configured to: determine all node pairs in the argument relation graph; and if all node pairs are adjacent, determine the argument relation graph itself to be the complete subgraph.
As an alternative embodiment, the subgraph search unit is specifically configured to: if there are node pairs that are not adjacent, perform a candidate subgraph construction operation on each non-adjacent node pair to obtain a candidate subgraph corresponding to each node of the pair; if all node pairs in a candidate subgraph are adjacent, determine the candidate subgraph to be a complete subgraph; and if there are node pairs in the candidate subgraph that are not adjacent, repeat the candidate subgraph construction operation on those non-adjacent node pairs to obtain updated candidate subgraphs, until all node pairs in the updated candidate subgraphs are adjacent.
As an alternative embodiment, the candidate sub-graph construction operation is: for each node in a non-adjacent node pair, determining a set of adjacent nodes for the node; the candidate subgraph is determined based on the node and a set of neighboring nodes to the node.
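The search described in the three paragraphs above can be sketched as a recursive function. This is a minimal sketch under stated assumptions: the graph is undirected and given as a symmetric adjacency mapping, node identifiers are sortable, and the function name search_complete_subgraphs is hypothetical. It is not presented as the exact preset search algorithm of the embodiment.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def search_complete_subgraphs(
    nodes: Iterable[str], adjacency: Dict[str, Set[str]]
) -> List[FrozenSet[str]]:
    """Recursively split the node set until every remaining set is fully connected."""
    node_set = frozenset(nodes)
    ordered = sorted(node_set)
    candidates: List[FrozenSet[str]] = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b in adjacency.get(a, set()):
                continue  # adjacent pair, nothing to split
            # Candidate subgraph construction: each node of the non-adjacent
            # pair together with its neighbors inside the current node set.
            for node in (a, b):
                candidate = frozenset(
                    {node} | (set(adjacency.get(node, set())) & node_set)
                )
                if candidate not in candidates:
                    candidates.append(candidate)
    if not candidates:
        return [node_set]  # all pairs adjacent: this set is a complete subgraph
    results: List[FrozenSet[str]] = []
    for candidate in candidates:
        for subgraph in search_complete_subgraphs(candidate, adjacency):
            if subgraph not in results:
                results.append(subgraph)
    return results
```

The recursion terminates because each candidate omits the other node of its non-adjacent pair and is therefore strictly smaller than the set it was built from. For a graph with edges A-B, A-C, B-C and C-D, the sketch returns {A, B, C} and {C, D}, so the shared argument C takes part in two events. The sketch does not filter out a returned set that happens to be contained in another.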
In the embodiment of the invention, each event type and each role type are combined into one large class to form the entity set, which converts the event extraction problem into a named entity recognition problem. Because entities may be nested, nested entity recognition is used to determine the target entities in the target sentence (that is, the arguments of each event contained in the target sentence). An argument relation graph is then generated and searched for complete subgraphs, from which the event information corresponding to the target sentence is determined; each complete subgraph corresponds to one event. With the scheme of this embodiment, event types and argument roles are treated directly as entities without handling trigger words separately, nested entity recognition is then performed, and, combined with the complete subgraph search, the event extraction effect can be improved.
Further, according to still another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the event extraction method described above, as shown in fig. 6, the electronic device including a memory 602 and a processor 604, the memory 602 storing a computer program, the processor 604 being configured to execute the steps of any of the method embodiments described above by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one of a plurality of network devices of a computer network.
Optionally, in this embodiment, the above-described processor may be configured to execute the following steps by means of the computer program:
S1, acquiring an event type set, a role type set and a target sentence of an event to be extracted;
S2, determining each event type in the event type set and each argument role in the role type set as an entity set;
S3, based on the entity set, performing nested entity recognition on the target sentence to obtain a target entity corresponding to the target sentence;
S4, generating an argument relation graph corresponding to the target sentence based on the target entity;
S5, searching for a complete subgraph in the argument relation graph by using a preset search algorithm;
S6, generating event information corresponding to the target sentence based on the complete subgraph.
Optionally, it will be understood by those skilled in the art that the structure shown in fig. 6 is only schematic, and the electronic device may also be a terminal device such as a smart phone (e.g. an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a mobile internet device (Mobile Internet Devices, MID), or a PAD. Fig. 6 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 6, or have a different configuration than shown in fig. 6.
The memory 602 may be used to store software programs and modules, such as program instructions/modules corresponding to the event extraction method in the embodiment of the present invention, and the processor 604 executes the software programs and modules stored in the memory 602 to perform various functional applications and data processing, i.e., implement the event extraction method described above. The memory 602 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 602 may further include memory located remotely from processor 604, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 602 may be used to store, but is not limited to storing, information such as operation instructions. As an example, as shown in fig. 6, the memory 602 may include, but is not limited to, the various units of the event extraction device described above.
Optionally, the transmission device 606 is used to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 606 includes a network adapter (Network Interface Controller, NIC) that may be connected to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 606 is a Radio Frequency (RF) module for communicating wirelessly with the internet.
In addition, the electronic device further includes: a display 608 and a connection bus 610.
According to a further aspect of embodiments of the present invention there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Optionally, in this embodiment, the above-described storage medium may be configured to store a computer program for performing the following steps:
S1, acquiring an event type set, a role type set and a target sentence of an event to be extracted;
S2, determining each event type in the event type set and each argument role in the role type set as an entity set;
S3, based on the entity set, performing nested entity recognition on the target sentence to obtain a target entity corresponding to the target sentence;
S4, generating an argument relation graph corresponding to the target sentence based on the target entity;
S5, searching for a complete subgraph in the argument relation graph by using a preset search algorithm;
S6, generating event information corresponding to the target sentence based on the complete subgraph.
Optionally, in this embodiment, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be completed by a program instructing the hardware of a terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a read-only memory (Read-Only Memory, ROM), a random-access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, and the like.
The serial numbers of the foregoing embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the methods of the various embodiments of the present invention.
In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis; for a part that is not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary; the division into units is merely a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and refinements without departing from the principles of the present invention, and such modifications and refinements are also to be regarded as falling within the scope of protection of the present invention.

Claims (10)

1. An event extraction method, comprising:
acquiring an event type set, a role type set and a target sentence of an event to be extracted;
determining each event type in the event type set and each argument role in the role type set as an entity set; wherein the entity set contains trigger words;
based on the entity set, performing nested entity recognition on the target sentence by using GlobalPointer to obtain a target entity corresponding to the target sentence; wherein the target entity is an argument in an event corresponding to the target sentence;
generating an argument relation diagram corresponding to the target sentence based on the target entity;
searching for a complete subgraph in the argument relation graph by using a preset search algorithm; wherein a complete subgraph is one in which any two nodes of the same event are adjacent, and one complete subgraph corresponds to one event;
generating event information corresponding to the target sentence based on the complete subgraph; wherein the event information is the argument information corresponding to the event formed by each complete subgraph.
2. The method of claim 1, wherein performing nested entity recognition on the target sentence by using GlobalPointer based on the entity set to obtain a target entity corresponding to the target sentence comprises:
encoding the target sentence to obtain a target vector sequence;
converting the target vector sequence into a first vector sequence and a second vector sequence;
scoring each entity in the entity set by using the first vector sequence and the second vector sequence to obtain entity scoring information;
and determining the target entity corresponding to the target sentence based on the entity scoring information.
3. The method of claim 1, wherein generating an argument relationship graph corresponding to the target sentence based on the target entity comprises:
determining each target entity as each argument corresponding to the target sentence;
the argument relationship graph between the various arguments is constructed.
4. The method according to claim 1, wherein searching for a complete subgraph in the argument relation graph using a preset search algorithm comprises:
determining all node pairs in the argument relationship graph;
if all node pairs are adjacent, the argument relation graph is determined to be the complete subgraph.
5. The method according to claim 4, wherein the method further comprises:
if the node pairs are not adjacent, candidate sub-graph construction operation is carried out on the non-adjacent node pairs, and candidate sub-graphs corresponding to each node are obtained;
if all node pairs in the candidate subgraph are adjacent, determining the candidate subgraph as the complete subgraph;
and if the node pairs in the candidate subgraph are not adjacent, repeatedly executing the candidate subgraph construction operation on the non-adjacent node pairs to obtain updated candidate subgraphs until all the node pairs in the updated candidate subgraphs are adjacent.
6. The method of claim 5, wherein the candidate subgraph construction operation is:
for each node in a non-adjacent node pair, determining a set of adjacent nodes for the node;
the candidate subgraph is determined based on the node and a set of neighboring nodes to the node.
7. An event extraction device, comprising:
the data acquisition unit is used for acquiring an event type set, a role type set and a target sentence of an event to be extracted;
the entity construction unit is used for determining each event type in the event type set and each argument role in the role type set as an entity set; wherein the entity set contains trigger words;
the entity recognition unit is used for performing nested entity recognition on the target sentence by using GlobalPointer based on the entity set to obtain a target entity corresponding to the target sentence; wherein the target entity is an argument in an event corresponding to the target sentence;
an argument relation generating unit, configured to generate an argument relation graph corresponding to the target sentence based on the target entity;
the subgraph search unit is used for searching for a complete subgraph in the argument relation graph by using a preset search algorithm; wherein a complete subgraph is one in which any two nodes of the same event are adjacent, and one complete subgraph corresponds to one event;
the event generation unit is used for generating event information corresponding to the target sentence based on the complete subgraph; wherein the event information is the argument information corresponding to the event formed by each complete subgraph.
8. The apparatus according to claim 7, wherein the entity identification unit is specifically configured to:
encoding the target sentence to obtain a target vector sequence;
converting the target vector sequence into a first vector sequence and a second vector sequence;
scoring each entity in the entity set by using the first vector sequence and the second vector sequence to obtain entity scoring information;
and determining the target entity corresponding to the target sentence based on the entity scoring information.
9. The apparatus according to claim 7, wherein the argument relation generating unit is specifically configured to:
determining each target entity as each argument corresponding to the target sentence;
the argument relationship graph between the various arguments is constructed.
10. The apparatus according to claim 7, wherein the sub-graph search unit is specifically configured to:
determining all node pairs in the argument relationship graph;
if all node pairs are adjacent, the argument relation graph is determined to be the complete subgraph.
CN202310505237.4A 2023-05-08 2023-05-08 Event extraction method and device Pending CN116306581A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310505237.4A CN116306581A (en) 2023-05-08 2023-05-08 Event extraction method and device

Publications (1)

Publication Number Publication Date
CN116306581A true CN116306581A (en) 2023-06-23

Family

ID=86803398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310505237.4A Pending CN116306581A (en) 2023-05-08 2023-05-08 Event extraction method and device

Country Status (1)

Country Link
CN (1) CN116306581A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103823824A (en) * 2013-11-12 2014-05-28 哈尔滨工业大学深圳研究生院 Method and system for automatically constructing text classification corpus by aid of internet
CN112906391A (en) * 2021-03-16 2021-06-04 合肥讯飞数码科技有限公司 Meta-event extraction method and device, electronic equipment and storage medium
CN113032636A (en) * 2019-12-25 2021-06-25 中移动信息技术有限公司 Complete subgraph data searching method, device, equipment and medium
CN113268569A (en) * 2021-07-19 2021-08-17 中国电子科技集团公司第十五研究所 Semantic-based related word searching method and device, electronic equipment and storage medium
CN113468328A (en) * 2021-06-18 2021-10-01 浙江工业大学 Multi-attribute matter relation extraction and visual analysis method
CN114547301A (en) * 2022-02-21 2022-05-27 北京百度网讯科技有限公司 Document processing method, document processing device, recognition model training equipment and storage medium
CN115329746A (en) * 2022-08-05 2022-11-11 杭州海康威视数字技术股份有限公司 Event extraction method, device and equipment
CN115858814A (en) * 2022-12-20 2023-03-28 上海大学 Text structured information extraction method based on global pointer decoding method
CN115983274A (en) * 2022-12-20 2023-04-18 东南大学 Noise event extraction method based on two-stage label correction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
世界划水锦标赛冠军: "GlobalPointer: handling nested and non-nested NER in a unified way", Retrieved from the Internet <URL:https://blog.csdn.net/qq_41898761/article/details/125209437> *
苏剑林 (Su Jianlin): "GPLinker: joint event extraction based on GlobalPointer", pages 1 - 2, Retrieved from the Internet <URL:https://spaces.ac.cn/archives/8926> *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20230623)