CN112861527A - Event extraction method, device, equipment and storage medium - Google Patents


Info

Publication number
CN112861527A
Authority
CN
China
Legal status
Granted
Application number
CN202110286434.2A
Other languages
Chinese (zh)
Other versions
CN112861527B (en)
Inventor
王玉杰
吴飞
张浩宇
方四安
柳林
徐承
Current Assignee
Hefei Ustc Iflytek Co ltd
Original Assignee
Hefei Ustc Iflytek Co ltd
Application filed by Hefei Ustc Iflytek Co ltd
Priority to CN202110286434.2A
Publication of CN112861527A
Application granted
Publication of CN112861527B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The application provides an event extraction method, device, equipment and storage medium. The method comprises: inputting a text to be extracted into a pre-trained event extraction model to obtain the category label of each text unit in the text as the event extraction result. The event extraction model is trained with text sequences as training samples and, as first type sample labels, the event trigger word position label of each text sequence together with the category label of each of its text units; the category label of a text unit indicates the event trigger word type(s) and event argument type(s) to which the unit belongs. The method realizes event extraction and can comprehensively identify the multiple roles a piece of text plays in an event, thereby ensuring the completeness of event extraction.

Description

Event extraction method, device, equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to an event extraction method, apparatus, device, and storage medium.
Background
Event extraction is an important component of information extraction and intelligence analysis. An event represents the occurrence of an action or a change of state. In an event, the verb or noun expressing the action or state change serves as the event trigger, and the event also includes the main roles participating in the action (called arguments, such as time, place, and person).
Extracting event trigger words and event arguments from text is therefore the key to event extraction. Conventional methods typically extract the trigger words and the arguments from the text in separate, sequential steps to obtain the event extraction result, which makes the processing flow complex and event extraction inefficient.
Disclosure of Invention
In view of the above, the application provides an event extraction method, device, equipment and storage medium that can extract event trigger words and event arguments from a text, thereby realizing event extraction.
The technical scheme provided by the application is as follows:
an event extraction method, comprising:
inputting a text to be extracted into a pre-trained event extraction model to obtain a category label of each text unit in the text to be extracted as an event extraction result;
the event extraction model is trained with a text sequence as a training sample and, as first type sample labels, the event trigger word position label of the text sequence and the category label of each text unit of the text sequence, wherein the category label of a text unit comprises labels of the event trigger word type and the event argument type to which the text unit belongs.
Optionally, the category label of a text unit is a category label sequence containing one sequence element for each event trigger word type and each event argument type; in the category label sequence of a text unit, the elements corresponding to the event trigger word types and event argument types to which the unit belongs are set to a preset valid value.
Optionally, when the event extraction model is trained, the sequence labeling result of the event trigger word and the argument in the text sequence is further used as a second type sample label.
Optionally, the training process of the event extraction model includes:
inputting a training sample into the event extraction model to obtain an event extraction result output by the event extraction model; the event extraction result output by the event extraction model comprises a first type result and a second type result, the first type result comprises a labeling result of the position of a trigger word of the training sample and a labeling result of a category label of each text unit of the training sample, and the second type result comprises a sequence labeling result of the event trigger word and an event argument of the training sample;
calculating and determining an event extraction loss value of the event extraction model according to an event extraction result output by the event extraction model, the first type sample label and the second type sample label;
adjusting the operation parameters of the event extraction model according to the event extraction loss value of the event extraction model;
and repeating the above steps until the event extraction loss value of the event extraction model is less than a preset value.
Optionally, the calculating and determining a loss value of the event extraction model according to the event extraction result output by the event extraction model, and the first type sample label and the second type sample label includes:
calculating to obtain a first event extraction loss value of the event extraction model according to an event extraction result output by the event extraction model and the first type sample label;
calculating a second event extraction loss value of the event extraction model according to the event extraction result output by the event extraction model and the second type sample label;
and calculating and determining an event extraction loss value of the event extraction model according to the first event extraction loss value and the second event extraction loss value.
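The text does not specify the loss functions or how the first and second event extraction loss values are combined. The sketch below is an illustration only and assumes binary cross-entropy for the multi-hot first type labels (trigger position label and category label sequences, flattened into one list), categorical cross-entropy for the one-tag-per-unit second type labels, and a weighted sum with a hypothetical weight `alpha`.

```python
import math
from typing import Sequence


def binary_cross_entropy(pred: Sequence[float], target: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy for multi-hot (first type) labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(target)


def cross_entropy(probs: Sequence[Sequence[float]], gold: Sequence[int], eps: float = 1e-7) -> float:
    """Mean categorical cross-entropy for one-tag-per-unit (second type) labels."""
    return sum(-math.log(max(row[g], eps)) for row, g in zip(probs, gold)) / len(gold)


def event_extraction_loss(first_pred, first_label, second_probs, second_gold, alpha: float = 0.5) -> float:
    """Hypothetical combination: weighted sum of the first and second event extraction losses."""
    loss1 = binary_cross_entropy(first_pred, first_label)  # first event extraction loss value
    loss2 = cross_entropy(second_probs, second_gold)       # second event extraction loss value
    return alpha * loss1 + (1.0 - alpha) * loss2
```

In a real system these would be framework losses over tensors; a weighted sum is only one common way of combining two training objectives.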
Optionally, the event trigger word position label is a coding sequence whose elements correspond one-to-one to the text units of the text sequence; the elements corresponding to the event trigger words of the text sequence take a preset value, and the remaining elements take a value other than the preset value.
Optionally, processing the text to be extracted by the event extraction model to obtain the category label of each text unit in the text to be extracted comprises:
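A minimal sketch of building the event trigger word position label, assuming word-level text units, trigger spans given as hypothetical `(start, end)` index pairs with `end` exclusive, and 1/0 as the set and non-set values:

```python
from typing import List, Sequence, Tuple


def trigger_position_label(
    units: Sequence[str],
    trigger_spans: Sequence[Tuple[int, int]],
    set_value: int = 1,
    unset_value: int = 0,
) -> List[int]:
    """Coding sequence with one element per text unit; trigger-word units get set_value."""
    label = [unset_value] * len(units)
    for start, end in trigger_spans:  # end index is exclusive
        for i in range(start, end):
            label[i] = set_value
    return label


units = ["A", "suicide", "bombing", "occurred", "in", "the", "United", "States"]
print(trigger_position_label(units, [(2, 3)]))  # [0, 0, 1, 0, 0, 0, 0, 0]
```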
performing trigger word recognition processing on the text to be extracted, and determining the position of a trigger word in the text to be extracted;
and determining the category label of each text unit in the text to be extracted according to the position of the trigger word in the text to be extracted, wherein the category label of the text unit comprises the label of the event trigger word type and the event argument type to which the text unit belongs.
Optionally, the performing trigger word recognition processing on the text to be extracted to determine the position of the trigger word in the text to be extracted includes:
generating an initial coding sequence according to the text to be extracted, wherein sequence elements of the initial coding sequence correspond to each text unit of the text to be extracted one by one, and the values of the sequence elements of the initial coding sequence are set initial values;
inputting the text to be extracted and the initial coding sequence into a trigger word extraction module so that the trigger word extraction module identifies the trigger words in the text to be extracted and outputs a trigger word mark coding sequence;
the trigger word mark coding sequence is obtained by setting the value of an element corresponding to an event trigger word in the text to be extracted in the initial coding sequence to a set value by the trigger word extraction module.
Optionally, the determining the category label of each text unit in the text to be extracted according to the position of the trigger word in the text to be extracted includes:
and inputting the text to be extracted and the trigger word mark coding sequence into an argument extraction module to determine the category label of each text unit in the text to be extracted.
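The two-stage flow above (initial coding sequence, trigger word extraction module, argument extraction module) can be sketched as follows. The module interfaces and the rule-based toy modules are hypothetical stand-ins for the trained modules of the model; they exist only to show how the trigger word mark coding sequence is passed from one stage to the next. The toy category label layout assumes M = 1 trigger word type ("Attack") and N = 2 argument types ("Place", "Victim").

```python
from typing import Callable, List, Sequence

# Hypothetical module interfaces; in the text both are parts of one trained model.
TriggerModule = Callable[[Sequence[str], List[int]], List[int]]
ArgumentModule = Callable[[Sequence[str], List[int]], List[List[int]]]


def extract_events(units: Sequence[str], trigger_module: TriggerModule,
                   argument_module: ArgumentModule, initial_value: int = 0) -> List[List[int]]:
    initial_codes = [initial_value] * len(units)          # one element per text unit
    trigger_marks = trigger_module(units, initial_codes)  # trigger positions set to the set value
    return argument_module(units, trigger_marks)          # category label sequence per unit


# Toy rule-based stand-ins, for illustration only.
def toy_trigger_module(units: Sequence[str], codes: List[int]) -> List[int]:
    return [1 if u == "bombing" else c for u, c in zip(units, codes)]


def toy_argument_module(units: Sequence[str], trigger_marks: List[int]) -> List[List[int]]:
    has_trigger = 1 in trigger_marks  # arguments are only extracted relative to a trigger
    labels = []
    for u, mark in zip(units, trigger_marks):
        seq = [mark, 0, 0]  # element 0: trigger type "Attack"
        if has_trigger and u == "US":
            seq[1] = 1      # argument type "Place"
        if has_trigger and u == "civilians":
            seq[2] = 1      # argument type "Victim"
        labels.append(seq)
    return labels


units = ["bombing", "in", "US", "kills", "civilians"]
print(extract_events(units, toy_trigger_module, toy_argument_module))
# [[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 1]]
```

With no trigger found, the toy argument module returns all-zero label sequences, mirroring the idea that argument extraction is conditioned on the trigger word.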
An event extraction device comprising:
the event extraction processing unit is used for inputting the text to be extracted into a pre-trained event extraction model to obtain the category labels of all text units in the text to be extracted as an event extraction result;
the event extraction model is trained with a text sequence as a training sample and, as first type sample labels, the event trigger word position label of the text sequence and the category label of each text unit of the text sequence, wherein the category label of a text unit comprises labels of the event trigger word type and the event argument type to which the text unit belongs.
Event extraction equipment, comprising:
a memory and a processor;
wherein the memory is used for storing programs;
the processor is used for realizing the event extraction method by running the program in the memory.
A storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described event extraction method.
According to the event extraction method provided by the embodiments of the application, a pre-trained event extraction model performs event extraction on the text to be extracted and labels the category label of each text unit, that is, the event trigger word type(s) and event argument type(s) to which each text unit belongs. The event trigger words and event arguments are thereby determined from the text, and the event extraction result is obtained.
Furthermore, owing to the specific way the event extraction model is trained, the model extracts event trigger words and event arguments end to end, which reduces model size and resource consumption and improves extraction efficiency; it also uses the extracted trigger words to assist argument extraction, which preserves the correspondence between trigger words and arguments and improves extraction accuracy. Moreover, the model can comprehensively identify the multiple roles a piece of text plays in an event, ensuring the completeness of event extraction.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flowchart of an event extraction method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a category label sequence according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a sequence labeling result of a text sequence according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the training process of an event extraction model according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an event extraction model according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an event extraction device according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of event extraction equipment according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the application are applicable to scenarios in which events are extracted from text: event trigger words and event arguments are determined from the text, thereby realizing event extraction. In addition, the embodiments can determine the event trigger words and the event arguments simultaneously in a single pass of information extraction, giving higher event extraction efficiency.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
An embodiment of the present application provides an event extraction method, as shown in fig. 1, the method includes:
S101, acquiring a text to be extracted.
The text to be extracted is any text from which events are to be extracted; it may be text in any language, from any domain, and of any length.
S102, inputting the text to be extracted into a pre-trained event extraction model to obtain the category label of each text unit in the text to be extracted as an event extraction result.
The event extraction model is trained with text sequences as training samples and, as first type sample labels, the event trigger word position label of each text sequence and the category label of each of its text units.
The category labels of the text units comprise labels of event trigger word types and event argument types to which the text units belong.
Specifically, the embodiments of the application use the category label of each text unit in the text as the event extraction result; this label covers both the event trigger word type(s) and the event argument type(s) to which the text unit belongs.
A text unit is any piece of text that can serve as an independent content unit, such as a character, word, phrase, or segment. For example, verbs, nouns, person names, times, and locations in the text can each be a text unit. Such a text unit may act as an event trigger word or as an event argument, so an event can be composed of the text units acting as its trigger word and the text units acting as its arguments.
Once the category label of each text unit in the text has been determined, it is known whether each text unit is an event trigger word or an event argument, together with its trigger word type and argument type. The event trigger words and event arguments in the text are thereby identified, yielding the event extraction result.
Correspondingly, in the event extraction model training process, the text sequence is taken as a training sample, and the category label of each text unit of the text sequence is taken as a training sample label for training.
The category label of a text unit comprises the event trigger word type label(s) and the event argument type label(s) to which the text unit belongs.
The event trigger word type of a text unit is the type the unit has when it serves as an event trigger word, for example a verb or a noun. The event argument type of a text unit is the argument type the unit has when it serves as an event argument, for example time, place, or person.
In some cases, a certain text unit in a piece of text may serve as both an event trigger word and an event argument, or both an event argument of type a and an event argument of type B.
For example, in the text "On November 24, a suicide bombing occurred in the United States, killing three suicide bombers and two civilians," the "suicide bombers" fill both the attacker argument role and the victim argument role. Their event argument type label should therefore include both the attacker argument label and the victim argument label.
To represent the event roles of text units comprehensively and accurately, the embodiments of the application label each text unit of a text sequence with both the event trigger word type label(s) and the event argument type label(s) to which it belongs. That is, the category label of a text unit has two parts: one indicating the event trigger word type(s) of the unit and one indicating its event argument type(s). The number of labels in each part is determined by how many event roles of that kind the text unit plays.
Therefore, in the embodiments of the application, the category label of one text unit may include several event trigger word type labels and several event argument type labels, so that when a text unit plays multiple event roles simultaneously (trigger word and/or argument), its category label represents all of those roles accurately and comprehensively.
For example, if a text unit serves only as an event trigger word in the text sequence, it is labeled with event trigger word type label(s), one or more depending on the specific trigger word type(s) it has, and its event argument type label is left empty. If a text unit serves only as an event argument, it is labeled with event argument type label(s), one or more depending on the specific argument type(s) it has, and its event trigger word type label is left empty. If a text unit serves as both an event trigger word and an event argument, it is labeled with both event trigger word type label(s) and event argument type label(s), whose numbers are determined by its specific trigger word type(s) and argument type(s) respectively; each may be one or more.
Generally, event trigger words and event arguments correspond to each other. For example, some action trigger words denote actions that only a person can perform, so such a trigger word should correspond to a person argument role rather than an animal or object argument role. Likewise, some action trigger words can only take an object as the target of the action, so such a trigger word should correspond to an object argument role as its target, not to a person argument role.
Moreover, events are usually driven by trigger words, so extracting the event trigger word should be the primary task of event extraction; once the trigger word is determined, the argument roles corresponding to it can be extracted on that basis to obtain the event extraction result. Using the event trigger word as the reference for extracting event arguments allows the arguments to be determined more accurately, ensures the correct correspondence between the extracted trigger words and arguments, and improves the accuracy and efficiency of event extraction.
However, under the labeling scheme of the embodiments of the application, several text units in one text sequence may all be labeled as event trigger words, or one text unit may be labeled as an event trigger word of several different types, whereas only one text unit may ultimately be selected as the event trigger word of the text sequence.
Therefore, to identify which text unit of a text sequence ultimately serves as the event trigger word, so that it can be used for event argument recognition, the embodiments of the application label the position of the event trigger word in the text sequence and use this event trigger word position label as part of the training sample labels.
During model training, when the event extraction model performs event extraction on a text sequence sample, it first extracts the event trigger word, that is, it identifies and labels the trigger words in the text sequence. It then checks whether the text unit at the trigger word position given in the sample label has been marked as an event trigger word. If not, trigger word extraction is deemed to have failed, and the parameters are corrected before extracting the trigger word again. If so, event extraction continues: taking the text unit at the labeled trigger word position as the event trigger word, the model extracts event arguments from the text sequence, so that the correct arguments corresponding to that trigger word are obtained.
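One possible reading of this training-time check is sketched below; it is not the authors' code. The check passes only if every unit marked in the event trigger word position label is also marked by the trigger word extraction step, and argument extraction is then conditioned on the labeled position. The function names and the callable module interfaces are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def triggers_marked(predicted_marks: Sequence[int], position_label: Sequence[int], set_value: int = 1) -> bool:
    """True if every unit marked as a trigger in the position label is also marked in the prediction."""
    return all(p == set_value for p, g in zip(predicted_marks, position_label) if g == set_value)


def training_forward(
    units: Sequence[str],
    position_label: Sequence[int],
    trigger_module: Callable[[Sequence[str], List[int]], List[int]],
    argument_module: Callable[[Sequence[str], List[int]], List[List[int]]],
) -> Optional[List[List[int]]]:
    predicted = trigger_module(units, [0] * len(units))
    if not triggers_marked(predicted, position_label):
        return None  # trigger extraction failed: parameters are corrected and extraction retried
    # Arguments are extracted with respect to the trigger at the labeled position.
    return argument_module(units, list(position_label))
```

In gradient-based training the "correct and retry" step would normally be an ordinary parameter update; returning `None` here only marks where that branch occurs.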
As can be seen, when the event extraction model is trained in the embodiments of the application, the event trigger word position label of a text sequence and the category label of each of its text units serve as the sample labels of the text sequence; these are referred to here as first type sample labels. That is, the first type sample labels include the event trigger word position label of a text sequence sample and the category label of each text unit of that sample, where the category label of a text unit comprises labels of the event trigger word type and the event argument type to which it belongs.
On the one hand, the event extraction model outputs the labeling results of the event trigger words and event arguments in a text sequence at once, that is, a single model performs end-to-end text event extraction with high efficiency. Moreover, because of the way the model labels the category labels of text units, when the same text unit plays multiple event roles, the model labels all of those roles in one pass; event trigger words and event arguments are thus identified more comprehensively, ensuring the completeness of event extraction.
On the other hand, when extracting event trigger words and event arguments from a text sequence, the model uses the trigger words as the basis for extracting the arguments, that is, trigger word extraction assists argument extraction. This preserves the correspondence between trigger words and arguments and improves the accuracy and efficiency of event extraction.
On this basis, to perform event extraction on a text to be extracted, the text is input into the event extraction model trained as above, yielding the category label of each text unit, which is the event extraction result of the text.
Based on the above, in the embodiments of the application, a pre-trained event extraction model performs event extraction on the text to be extracted and labels the category label of each text unit, that is, the event trigger word type(s) and event argument type(s) to which each unit belongs, so that the event trigger words and event arguments are determined from the text and the event extraction result is obtained.
Furthermore, owing to the specific way the event extraction model is trained, the model not only extracts event trigger words and event arguments end to end, reducing model size and resource consumption, but also uses trigger word extraction to assist argument extraction, which preserves the correspondence between trigger words and arguments and improves extraction accuracy. Moreover, the model can comprehensively identify the multiple roles a piece of text plays in an event, ensuring the completeness of event extraction.
Illustratively, in the embodiment of the present application, a category label is labeled for each text unit of a text sequence in a manner of a category label sequence.
Referring to Fig. 2, the elements of the category label sequence in the embodiments of the application correspond respectively to all event trigger word types and event argument types; that is, the category label sequence contains one sequence element for each event trigger word type and each event argument type.
Assuming that the event extraction domain has M event trigger word types and N event argument types in total, the category label sequence of a text unit contains M + N sequence elements $I_0, I_1, \ldots, I_{M+N-1}$, where the first M elements $I_0 \sim I_{M-1}$ correspond one-to-one to the M event trigger word types, and the next N elements $I_M \sim I_{M+N-1}$ correspond one-to-one to the N event argument types.
With the category label sequence defined in this way, in the category label sequence of a given text unit, the elements corresponding to the event trigger word types and event argument types to which the unit belongs take the preset valid value, and the elements corresponding to all other event trigger word types and event argument types do not.
For example, in the category label sequence of a text unit, the elements corresponding to the event trigger word types and event argument types to which the unit belongs have the value 1, and the remaining elements have the value 0.
Suppose a text unit serves only as an event trigger word in the text sequence, of the 3rd trigger word type; then the 3rd element $I_2$ of its category label sequence is 1 and all other elements are 0. Suppose the text unit serves only as an event argument, of the 2nd argument type; then the (M+2)th element $I_{M+1}$ is 1 and all other elements are 0. Suppose the text unit serves in the text sequence as an event argument of both the 2nd and the 3rd types; then the (M+2)th element $I_{M+1}$ and the (M+3)th element $I_{M+2}$ are 1 and all other elements are 0.
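The indexing rule above (trigger word type m maps to element $I_{m-1}$, argument type n maps to element $I_{M+n-1}$) can be sketched as a small multi-hot encoder. The type counts M = 4 and N = 5 are hypothetical values chosen only to reproduce the three cases just described; 1 is used as the valid value.

```python
from typing import Iterable, List


def category_label_sequence(trigger_types: Iterable[int], argument_types: Iterable[int],
                            M: int, N: int, valid: int = 1) -> List[int]:
    """Multi-hot sequence I_0..I_{M+N-1}; type numbers are 1-based as in the text."""
    seq = [0] * (M + N)
    for m in trigger_types:
        if not 1 <= m <= M:
            raise ValueError(f"trigger type {m} out of range 1..{M}")
        seq[m - 1] = valid        # element I_{m-1}
    for n in argument_types:
        if not 1 <= n <= N:
            raise ValueError(f"argument type {n} out of range 1..{N}")
        seq[M + n - 1] = valid    # element I_{M+n-1}
    return seq


M, N = 4, 5  # hypothetical type counts
print(category_label_sequence([3], [], M, N))     # [0, 0, 1, 0, 0, 0, 0, 0, 0]
print(category_label_sequence([], [2], M, N))     # [0, 0, 0, 0, 0, 1, 0, 0, 0]
print(category_label_sequence([], [2, 3], M, N))  # [0, 0, 0, 0, 0, 1, 1, 0, 0]
```

Because the label is multi-hot rather than a single class index, one text unit can carry several trigger word types and several argument types at once, which is what allows a unit such as the "suicide bombers" to be both attacker and victim.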
Thus, with this category label sequence, every text unit is represented in the same category label format regardless of the event roles it plays in the text sequence. For a text sequence, combining the category label sequences of all its text units yields a category label matrix, in which each column is the category label sequence of one text unit and each row corresponds to one event trigger word type or event argument type.
Accordingly, when the trained event extraction model performs event extraction on a text to be extracted, its result also takes the form of such a matrix: each column is the category label sequence of a text unit of the text, and each row corresponds to an event trigger word type or event argument type. When an element of the matrix takes the preset valid value (e.g., 1), the corresponding text unit is determined to be an event trigger word or event argument of the type corresponding to that element's row.
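A minimal sketch of stacking per-unit category label sequences into the matrix described above (rows are types, columns are text units) and reading trigger words and arguments back out of it. It assumes the model output has already been binarized (for example by thresholding probabilities, which the text does not specify), that the first M rows are trigger word types, and uses hypothetical type names.

```python
from typing import List, Sequence, Tuple


def label_matrix(label_seqs: Sequence[Sequence[int]]) -> List[List[int]]:
    """Stack per-unit label sequences as columns: rows = types, columns = text units."""
    return [list(row) for row in zip(*label_seqs)]


def decode(matrix: Sequence[Sequence[int]], units: Sequence[str], type_names: Sequence[str],
           M: int, valid: int = 1) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    triggers: List[Tuple[str, str]] = []
    arguments: List[Tuple[str, str]] = []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value == valid:
                target = triggers if r < M else arguments  # first M rows are trigger word types
                target.append((units[c], type_names[r]))
    return triggers, arguments


units = ["bombers", "bombing"]
type_names = ["Attack", "Attacker", "Victim"]  # hypothetical: M = 1, N = 2
matrix = label_matrix([[0, 1, 1], [1, 0, 0]])
print(matrix)  # [[0, 1], [1, 0], [1, 0]]
print(decode(matrix, units, type_names, M=1))
# ([('bombing', 'Attack')], [('bombers', 'Attacker'), ('bombers', 'Victim')])
```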
As a preferred training mode, the embodiment of the present application trains the event extraction model using not only the first type sample label but also, simultaneously, a second type sample label.
Specifically, the second type sample label is the sequence labeling result of the event trigger words and event arguments in the training sample, that is, in the text sequence used as the training sample.
The sequence labeling result for a text sequence is shown in fig. 3. To produce it, a unique event role, i.e., event trigger word or event argument, is determined for each text unit of the text sequence, and the event role labels of the text units are arranged in the order in which the units appear in the text sequence. Each element of the sequence labeling result can be represented by a preset numerical value or symbol.
In the sequence labeling result of a text sequence, each text unit has exactly one category label: it is labeled either as an event trigger word or as an event argument, and its specific trigger word type or argument type is also uniquely determined. The sequence labeling result therefore delimits the event trigger words and event arguments in the text sequence unambiguously, which helps distinguish them within the text sequence.
Training with the second type sample label therefore teaches the model to accurately distinguish the boundaries between trigger words, between trigger words and arguments, and between arguments in the text, so that event trigger words and event arguments are identified from the text more accurately.
Based on the first type sample labels and the second type sample labels, as shown in fig. 4, the training process of the event extraction model according to the embodiment of the present application is as follows:
S401, obtaining a training sample together with its first type sample label and second type sample label.
The training sample is a text sequence containing events. The first type sample label of the training sample consists of the labeled category label of each text unit of the training sample and the labeled position of the event trigger word in the training sample; for the specific content of the category labels, refer to the description of the above embodiment. The second type sample label of the training sample is the sequence labeling result of the event trigger words and event arguments in the training sample, whose specific content is likewise described in the above embodiment.
S402, inputting the training sample into the event extraction model to obtain an event extraction result output by the event extraction model.
The event extraction result output by the event extraction model comprises two results: a first type result and a second type result. The first type result is the category label labeling result for each text unit in the input training sample; the category label of a text unit covers both the event trigger word labeling result and the event argument labeling result, whose specific content and form are described in the above embodiments.
Meanwhile, because the event extraction model identifies every text unit in the training sample that could serve as an event trigger word, several text units may receive event trigger word category labels. An event, however, usually has only one event trigger word, i.e., usually only one text unit in the text sequence ultimately serves as the trigger word. So that the model can fully recognize candidate trigger words in the training sample while still determining which text unit ultimately serves as the event trigger word, the first type result further includes a labeling result of the event trigger word position, i.e., of the position of the text unit that ultimately serves as the event trigger word. The first type result thus has the same form and content type as the first type sample label.
The second type result is the sequence labeling result of the event trigger words and event arguments in the input training sample, and its content type is the same as that of the second type sample label.
As an optional model configuration, the event extraction model may be obtained by training a pre-trained language model. Specifically, so that event trigger word extraction and event argument extraction can both be performed within one model while the difference between the two tasks is still effectively recognized, the embodiment of the application trains a pre-trained language model based on BERT or one of its derivatives to obtain the event extraction model.
It can be understood that, when performing event extraction, the event extraction model determines whether a text unit is an event trigger word or an event argument by recognizing the features of the text units in the input text. Model training mainly teaches the model to grasp text unit features accurately across many kinds of text, so that event extraction is accurate. The model's output is the result of the model extracting and recognizing features of the text, organized into the output form of an event extraction result. Thus, although the first type result and the second type result output by the event extraction model differ in specific content and form, both are derived from the same text features extracted and recognized by the model. Training the model on either type of result therefore improves its performance and achieves a training effect.
Because each type of result contributes something distinct to event extraction, the event extraction model outputs both types of event extraction result simultaneously and is trained on both. The event extraction results output by the model thus combine the advantages of both result types, and the model reaches a good training effect more quickly, i.e., it quickly attains the ability to extract events comprehensively and accurately.
Based on the above idea, after inputting the training sample into the above event extraction model and obtaining the event extraction result output by the event extraction model, the following step S403 is executed:
S403, calculating and determining an event extraction loss value of the event extraction model according to the event extraction result output by the event extraction model, the first type sample label, and the second type sample label.
Specifically, the event extraction loss value of the event extraction model indicates the difference between the model's event extraction result on the training sample and the events actually contained in the training sample; this difference reflects the event extraction accuracy of the model.
Since the event extraction result output by the event extraction model includes two results, the embodiment of the present application calculates the event extraction loss value of the model from both of them.
First, a first event extraction loss value of the event extraction model is calculated according to the event extraction result output by the model and the first type sample label.
Specifically, a difference value between a first type result output by the event extraction model and a first type sample label of the training sample is calculated and used as a first event extraction loss value of the event extraction model.
Then, a second event extraction loss value of the event extraction model is calculated according to the event extraction result output by the model and the second type sample label.
Specifically, a difference value between a second type result output by the event extraction model and a second type sample label of the training sample is calculated and used as a second event extraction loss value of the event extraction model.
Finally, the event extraction loss value of the event extraction model is determined from the first event extraction loss value and the second event extraction loss value.
Specifically, the first event extraction loss value and the second event extraction loss value are summed, either directly or with weights, and the result is used as the event extraction loss value of the event extraction model.
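The text only states that each loss is a "difference value" and that the two are summed or weighted. As one plausible instantiation, and purely as an assumption, the sketch below uses binary cross-entropy for the multi-hot first type result (category labels plus trigger word position) and cross-entropy for the single-label-per-unit second type result, then combines them with weights `w1` and `w2`. All names are illustrative.

```python
import math
from typing import Sequence

def binary_cross_entropy(probs: Sequence[float], targets: Sequence[int],
                         eps: float = 1e-7) -> float:
    """Mean element-wise BCE: one plausible loss for the multi-hot first type result."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)           # clip so log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(targets)

def cross_entropy(prob_rows: Sequence[Sequence[float]], gold: Sequence[int],
                  eps: float = 1e-7) -> float:
    """Mean negative log-likelihood: one plausible loss for the one-label-per-unit second type result."""
    return sum(-math.log(max(row[g], eps)) for row, g in zip(prob_rows, gold)) / len(gold)

def event_extraction_loss(first_probs, first_targets, second_probs, second_gold,
                          w1: float = 1.0, w2: float = 1.0) -> float:
    loss1 = binary_cross_entropy(first_probs, first_targets)   # first event extraction loss
    loss2 = cross_entropy(second_probs, second_gold)           # second event extraction loss
    return w1 * loss1 + w2 * loss2                             # plain sum when w1 = w2 = 1
```

In a real BERT-based model these terms would normally be computed by the deep learning framework's own loss functions on logits; the pure-Python version only shows how the two losses are combined.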
S404, judging whether the event extraction loss value of the event extraction model is smaller than a set value or not.
If the loss value is less than the set value, step S405 is executed and the training process ends. At this point, an event extraction model capable of accurate event extraction has been obtained.
If the loss value is not less than the set value, step S406 is executed: the operation parameters of the event extraction model are adjusted according to its event extraction loss value.
The process then returns to step S401 and training is executed again, i.e., training samples and their corresponding sample labels are obtained again and used to train the event extraction model with the adjusted parameters. This training process is repeated until, in step S404, the event extraction loss value of the event extraction model is judged to be smaller than the set value, at which point training of the event extraction model is complete.
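The control flow of steps S401 to S406 can be sketched as a loop. The callbacks `next_batch`, `compute_loss`, and `update` are hypothetical hooks standing in for data loading, the joint loss computation, and the parameter update; the `max_rounds` cap is an added safeguard not described in the text, and the one-parameter toy model exists only to make the sketch runnable.

```python
from typing import Callable, Tuple

def train_until_converged(next_batch: Callable[[], object],
                          compute_loss: Callable[[object], float],
                          update: Callable[[float], None],
                          set_value: float,
                          max_rounds: int = 10_000) -> Tuple[int, float]:
    """Repeat S401-S406 until the loss drops below set_value."""
    for round_idx in range(1, max_rounds + 1):
        batch = next_batch()            # S401: sample plus first/second type labels
        loss = compute_loss(batch)      # S402-S403: forward pass and joint loss
        if loss < set_value:            # S404: compare with the set value
            return round_idx, loss      # S405: training ends
        update(loss)                    # S406: adjust parameters, then back to S401
    raise RuntimeError("loss never fell below the set value")  # safety cap (assumption)

# Toy stand-in for the model: one parameter w with loss (w - 3)^2.
state = {"w": 0.0}

def toy_loss(_batch) -> float:
    return (state["w"] - 3.0) ** 2

def toy_update(_loss: float) -> None:
    state["w"] -= 0.1 * 2.0 * (state["w"] - 3.0)   # one gradient-descent step

rounds, final_loss = train_until_converged(lambda: None, toy_loss, toy_update, set_value=1e-4)
```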
It can be understood that the training process of the event extraction model in the embodiment of the present application combines the model's sequence labeling result for text events with its category label labeling result for text units: a loss is calculated from each of the two event extraction results and used to correct the model parameters. As a result, the event extraction model can both accurately distinguish event trigger words from event arguments and recognize them comprehensively, and it performs well when one text unit plays several event roles simultaneously.
As an exemplary implementation, the embodiment of the present application uses a coding sequence as the event trigger word position label.
When the coding sequence serves as the event trigger word position label of a text sequence, its elements correspond one-to-one to the text units of the text sequence, i.e., the coding sequence has the same length as the text sequence. Elements corresponding to event trigger words in the text sequence take a set value and all other elements take a non-set value, so the position of the event trigger word in the text sequence can be determined by locating the set-value elements in the coding sequence.
For example, denote the coding sequence as segment_id. Suppose the text sequence is "Army A's 2 fighter jets parked at location B", which contains 5 text units: "Army A", "2", "fighter jets", "parked at", and "location B", where the 4th text unit "parked at" is the event trigger word. The segment_id for this text sequence then has 5 elements, and the 4th element corresponds to the event trigger word, so its value is the set value, assumed here to be 1; the remaining elements take the non-set value, assumed to be 0. The segment_id is therefore 00010. Conversely, since the 4th element of this segment_id has the set value, the 4th text unit of the corresponding text sequence can be identified as the event trigger word.
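The coding sequence can be built from, and decoded back into, trigger word positions in a few lines. This sketch uses the example's set value 1 and non-set value 0 and an English rendering of its five text units; the function names are illustrative, and positions are 0-based in the code while the prose counts from 1.

```python
from typing import List

SET_VALUE, NON_SET_VALUE = 1, 0   # values assumed in the example

def make_segment_id(num_units: int, trigger_positions: List[int]) -> List[int]:
    """Position label: set value at each trigger word, non-set value elsewhere."""
    segment_id = [NON_SET_VALUE] * num_units      # same length as the text sequence
    for pos in trigger_positions:                 # 0-based text unit indices
        segment_id[pos] = SET_VALUE
    return segment_id

def find_trigger_positions(segment_id: List[int]) -> List[int]:
    """Recover trigger word positions by locating set-value elements."""
    return [i for i, value in enumerate(segment_id) if value == SET_VALUE]

units = ["Army A", "2", "fighter jets", "parked at", "location B"]
segment_id = make_segment_id(len(units), [3])     # 4th unit is the trigger word
print("".join(str(v) for v in segment_id))        # 00010
print([units[i] for i in find_trigger_positions(segment_id)])  # ['parked at']
```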
It can be understood that, because the elements of the coding sequence correspond one-to-one to the text units of the text sequence, the set-value elements in the coding sequence reflect the positions of the event trigger words in the text sequence intuitively and accurately.
Optionally, in this embodiment of the present application, the event extraction model adopts a pipeline structure, for example, as shown in fig. 5.
After text data is input into the event extraction model, the trigger word extraction module first extracts the event trigger words; the trigger word extraction result and the text data are then passed to the argument extraction module for argument extraction, and the argument extraction module outputs the complete event extraction result, i.e., the category label labeling result of each text unit of the text data.
Based on this model structure, the event extraction model processes the text to be extracted to obtain the category label of each text unit in two steps, A1 and A2:
A1, performing trigger word recognition processing on the text to be extracted, and determining the position of the trigger word in the text to be extracted.
Specifically, the trigger word extraction module of the event extraction model can be used for recognizing the event trigger word in the text to be extracted, so that the position of the trigger word in the text to be extracted is determined.
The processing procedure for determining the position of the trigger word in the text to be extracted comprises the following two steps A11 and A12:
A11, generating an initial coding sequence according to the text to be extracted.
The initial coding sequence is used for reflecting the position of the event trigger word in the text to be extracted. The sequence elements of the initial coding sequence correspond to the text units of the text to be extracted one by one, that is, the length of the initial coding sequence is the same as that of the text to be extracted. The values of all sequence elements of the initial coding sequence are set initial values.
For example, the initial coding sequence may be denoted as segment_id; its sequence elements correspond one-to-one to the text units in the text to be extracted, and each sequence element takes an initial value, for example, X.
And A12, inputting the text to be extracted and the initial coding sequence into a trigger word extraction module, so that the trigger word extraction module identifies the trigger words in the text to be extracted and outputs a trigger word mark coding sequence.
After receiving the text to be extracted, the trigger word extraction module identifies the event trigger words in the text to be extracted, that is, identifies the text units serving as the event trigger words from the text units in the text to be extracted.
Then, the trigger word extraction module sets the value of the sequence element corresponding to the event trigger word in the text to be extracted in the received initial coding sequence as a set value, sets the values of other sequence elements in the initial coding sequence as non-set values, and outputs the obtained new sequence as a trigger word mark coding sequence.
A2, determining the category label of each text unit in the text to be extracted according to the position of the trigger word in the text to be extracted.
Specifically, the text to be extracted and the trigger word mark coding sequence are input into an argument extraction module, which uses the trigger word mark coding sequence to identify and extract the event arguments in the text to be extracted, i.e., to determine whether each text unit in the text to be extracted is an event argument and, if so, its specific event argument type. Finally, the argument extraction module combines the trigger word extraction result determined from the trigger word mark coding sequence with the argument extraction result, and outputs the category label of each text unit in the text to be extracted.
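The two-stage flow (A11, A12, then A2) can be sketched as below. The trained trigger word extraction and argument extraction modules are replaced by hypothetical lexicon lookups purely so the data flow (initial coding sequence, trigger word mark coding sequence, per-unit category labels) can run. The initial value X is represented as -1 by assumption, and the toy argument module labels arguments only once a trigger word has been marked, mirroring the use of the trigger position as the reference for argument extraction.

```python
from typing import Dict, List, Set, Tuple

INITIAL_VALUE, SET_VALUE, NON_SET_VALUE = -1, 1, 0   # hypothetical encodings (X = -1)

# Hypothetical lexicons standing in for the trained neural modules.
TRIGGER_LEXICON: Dict[str, int] = {"parked at": 1}   # unit -> trigger word type
ARGUMENT_LEXICON: Dict[str, int] = {"Army A": 1, "fighter jets": 2, "location B": 3}  # unit -> argument type

Label = Tuple[Set[int], Set[int]]   # (trigger word types, argument types) of one unit

def toy_trigger_module(units: List[str], initial_seq: List[int]) -> List[int]:
    # A12: rewrite every element of the initial sequence to a set / non-set value
    return [SET_VALUE if unit in TRIGGER_LEXICON else NON_SET_VALUE
            for unit, _ in zip(units, initial_seq)]

def toy_argument_module(units: List[str], marked_seq: List[int]) -> List[Label]:
    # A2: arguments are only extracted relative to a located trigger word
    if SET_VALUE not in marked_seq:
        return [(set(), set()) for _ in units]
    labels: List[Label] = []
    for unit, mark in zip(units, marked_seq):
        triggers = {TRIGGER_LEXICON[unit]} if mark == SET_VALUE else set()
        arguments = {ARGUMENT_LEXICON[unit]} if unit in ARGUMENT_LEXICON else set()
        labels.append((triggers, arguments))
    return labels

def run_pipeline(units: List[str]) -> List[Label]:
    initial_seq = [INITIAL_VALUE] * len(units)            # A11
    marked_seq = toy_trigger_module(units, initial_seq)   # A12 (completes A1)
    return toy_argument_module(units, marked_seq)         # A2

labels = run_pipeline(["Army A", "2", "fighter jets", "parked at", "location B"])
```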
As described above, when the event extraction model provided in the embodiment of the present application extracts events from a text to be extracted, the event trigger words in the text serve as the reference for extracting event arguments, so that the correspondence between event trigger words and event arguments is maintained during extraction, which safeguards event extraction accuracy.
An event extraction device is further provided in an embodiment of the present application, and as shown in fig. 6, the event extraction device includes:
the event extraction processing unit is used for inputting the text to be extracted into a pre-trained event extraction model to obtain the category labels of all text units in the text to be extracted as an event extraction result;
the event extraction model is obtained by training with a text sequence as the training sample, and with the event trigger word position label of the text sequence together with the category label of each text unit of the text sequence as the first type sample label, wherein the category label of a text unit comprises labels of the event trigger word type and the event argument type to which the text unit belongs.
The event extraction device provided by the embodiment of the application performs event extraction on the text to be extracted by means of the pre-trained event extraction model, which labels the category label of each text unit in the text to be extracted, i.e., the event trigger word type and event argument type to which each text unit belongs. The event trigger words and event arguments are thereby determined from the text to be extracted, yielding the event extraction result.
Furthermore, owing to the specific training mode of the event extraction model, the model not only performs end-to-end extraction of event trigger words and event arguments, reducing model size and resource consumption, but also uses the extracted event trigger words to assist event argument extraction, which maintains the correspondence between trigger words and arguments during event extraction and improves extraction accuracy. Moreover, the event extraction model can comprehensively identify the multiple roles a text unit plays in an event, ensuring the completeness of event extraction.
As an optional implementation manner, the category label of the text unit is a category label sequence, where the category label sequence includes sequence elements corresponding to all event trigger word types and event argument types, and in the category label sequence of the text unit, values of the sequence elements corresponding to the event trigger word type and the event argument type to which the text unit belongs are set valid values.
As an optional implementation manner, when the event extraction model is trained, the sequence labeling result of the event trigger words and arguments in the text sequence is further used as a second type sample label.
As an alternative implementation, the training process of the event extraction model includes:
inputting a training sample into the event extraction model to obtain an event extraction result output by the event extraction model; the event extraction result output by the event extraction model comprises a first type result and a second type result, the first type result comprises a labeling result of the position of a trigger word of the training sample and a labeling result of a category label of each text unit of the training sample, and the second type result comprises a sequence labeling result of the event trigger word and an event argument of the training sample;
calculating and determining an event extraction loss value of the event extraction model according to an event extraction result output by the event extraction model, the first type sample label and the second type sample label;
adjusting the operation parameters of the event extraction model according to the event extraction loss value of the event extraction model;
and repeating the above processing until the event extraction loss value of the event extraction model is less than the set value.
As an optional implementation manner, the calculating and determining a loss value of the event extraction model according to the event extraction result output by the event extraction model and the first type sample label and the second type sample label includes:
calculating to obtain a first event extraction loss value of the event extraction model according to an event extraction result output by the event extraction model and the first type sample label;
calculating a second event extraction loss value of the event extraction model according to the event extraction result output by the event extraction model and the second type sample label;
and calculating and determining an event extraction loss value of the event extraction model according to the first event extraction loss value and the second event extraction loss value.
As an optional implementation manner, the event trigger word position tag is a coding sequence, elements of the coding sequence correspond to each text unit of the text sequence one to one, where values of coding sequence elements corresponding to event trigger words of the text sequence are set values, and values of the other coding sequence elements are non-set values.
As an optional implementation manner, the processing, by the event extraction model, of the text to be extracted to obtain the category label of each text unit in the text to be extracted includes:
performing trigger word recognition processing on the text to be extracted, and determining the position of a trigger word in the text to be extracted;
and determining the category label of each text unit in the text to be extracted according to the position of the trigger word in the text to be extracted, wherein the category label of the text unit comprises the label of the event trigger word type and the event argument type to which the text unit belongs.
As an optional implementation manner, the performing trigger word recognition processing on the text to be extracted to determine the position of the trigger word in the text to be extracted includes:
generating an initial coding sequence according to the text to be extracted, wherein sequence elements of the initial coding sequence correspond to each text unit of the text to be extracted one by one, and the values of the sequence elements of the initial coding sequence are set initial values;
inputting the text to be extracted and the initial coding sequence into a trigger word extraction module so that the trigger word extraction module identifies the trigger words in the text to be extracted and outputs a trigger word mark coding sequence;
the trigger word mark coding sequence is obtained by setting the value of an element corresponding to an event trigger word in the text to be extracted in the initial coding sequence to a set value by the trigger word extraction module.
As an optional implementation manner, the determining, according to the position of the trigger word in the text to be extracted, the category label of each text unit in the text to be extracted includes:
inputting the text to be extracted and the trigger word mark coding sequence into an argument extraction module, and determining the category label of each text unit in the text to be extracted.
Specifically, please refer to the contents of the above method embodiments for the specific working contents of each unit of the event extraction device, which are not described herein again.
Another embodiment of the present application further provides an event extraction device, as shown in fig. 7, the event extraction device including:
a memory 200 and a processor 210;
wherein, the memory 200 is connected to the processor 210 for storing programs;
the processor 210 is configured to implement the event extraction method disclosed in any of the above embodiments by running the program stored in the memory 200.
Specifically, the event extraction device may further include: a bus, a communication interface 220, an input device 230, and an output device 240.
The processor 210, the memory 200, the communication interface 220, the input device 230, and the output device 240 are connected to each other through a bus. Wherein:
a bus may include a path that transfers information between components of a computer system.
The processor 210 may be a general-purpose processor, such as a general-purpose central processing unit (CPU) or a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs in accordance with the present invention. It may also be a digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components.
The processor 210 may include a main processor and may also include a baseband chip, modem, and the like.
The memory 200 stores programs for executing the technical solution of the present invention, and may also store an operating system and other key services. In particular, the program may include program code comprising computer operating instructions. More specifically, the memory 200 may include read-only memory (ROM), other types of static storage devices that can store static information and instructions, random access memory (RAM), other types of dynamic storage devices that can store information and instructions, disk storage, flash memory, and so forth.
The input device 230 may include a means for receiving data and information input by a user, such as a keyboard, mouse, camera, scanner, light pen, voice input device, touch screen, pedometer, or gravity sensor, among others.
Output device 240 may include equipment that allows output of information to a user, such as a display screen, a printer, speakers, and the like.
Communication interface 220 may include any device that uses any transceiver or the like to communicate with other devices or communication networks, such as an ethernet network, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), etc.
The processor 210 executes the programs stored in the memory 200 and invokes the other devices, thereby implementing the steps of the event extraction method provided by the embodiments of the present application.
Another embodiment of the present application further provides a storage medium, where a computer program is stored on the storage medium, and when the computer program is executed by a processor, the computer program implements the steps of the event extraction method provided in any of the above embodiments.
Specifically, the specific working contents of each part of the event extraction device and the specific processing contents of the computer program on the storage medium when being executed by the processor can refer to the contents of each embodiment of the event extraction method, and are not described herein again.
While, for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combinations of acts, those skilled in the art will appreciate that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by this application.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to each other. Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant points, refer to the corresponding description of the method embodiment.
The steps in the method of each embodiment of the present application may be sequentially adjusted, combined, and deleted according to actual needs, and technical features described in each embodiment may be replaced or combined.
The modules and sub-modules in the device and the terminal in the embodiments of the application can be combined, divided and deleted according to actual needs.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus and method may be implemented in other manners. For example, the above-described terminal embodiments are merely illustrative, and for example, the division of a module or a sub-module is only one logical division, and there may be other divisions when the terminal is actually implemented, for example, a plurality of sub-modules or modules may be combined or integrated into another module, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules or sub-modules described as separate parts may or may not be physically separate, and parts that are modules or sub-modules may or may not be physical modules or sub-modules, may be located in one place, or may be distributed over a plurality of network modules or sub-modules. Some or all of the modules or sub-modules can be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, each functional module or sub-module in the embodiments of the present application may be integrated into one processing module, or each module or sub-module may exist alone physically, or two or more modules or sub-modules may be integrated into one module. The integrated modules or sub-modules may be implemented in the form of hardware, or may be implemented in the form of software functional modules or sub-modules.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. An event extraction method, comprising:
inputting a text to be extracted into a pre-trained event extraction model to obtain a category label of each text unit in the text to be extracted as an event extraction result;
the event extraction model is trained with a text sequence as a training sample, and with an event trigger word position label of the text sequence and a category label of each text unit of the text sequence as a first type sample label, wherein the category label of a text unit comprises labels of the event trigger word type and the event argument type to which the text unit belongs.
2. The method according to claim 1, wherein the category label of a text unit is a category label sequence that includes one sequence element for each event trigger word type and each event argument type, and in the category label sequence of a text unit, the sequence elements corresponding to the event trigger word type and the event argument type to which the text unit belongs are set to a valid value.
3. The method of claim 1, wherein, in training the event extraction model, sequence labeling results of the event trigger words and event arguments in the text sequence are further used as a second type sample label.
4. The method of claim 3, wherein the training process for the event extraction model comprises:
inputting a training sample into the event extraction model to obtain an event extraction result output by the event extraction model; the event extraction result output by the event extraction model comprises a first type result and a second type result, the first type result comprises a labeling result of the position of a trigger word of the training sample and a labeling result of a category label of each text unit of the training sample, and the second type result comprises a sequence labeling result of the event trigger word and an event argument of the training sample;
calculating and determining an event extraction loss value of the event extraction model according to an event extraction result output by the event extraction model, the first type sample label and the second type sample label;
adjusting the operation parameters of the event extraction model according to the event extraction loss value of the event extraction model;
and repeating the above steps until the event extraction loss value of the event extraction model is less than a set value.
5. The method according to claim 4, wherein calculating and determining the event extraction loss value of the event extraction model according to the event extraction result output by the event extraction model, the first type sample label and the second type sample label comprises:
calculating a first event extraction loss value of the event extraction model according to the event extraction result output by the event extraction model and the first type sample label;
calculating a second event extraction loss value of the event extraction model according to the event extraction result output by the event extraction model and the second type sample label;
and determining the event extraction loss value of the event extraction model according to the first event extraction loss value and the second event extraction loss value.
6. The method according to claim 1, wherein the event trigger word position label is a coding sequence whose elements correspond one-to-one to the text units of the text sequence; the elements corresponding to the event trigger words of the text sequence take a set value, and the remaining elements take a value other than the set value.
7. The method according to claim 1, wherein the processing of the text to be extracted by the event extraction model to obtain the category label of each text unit in the text to be extracted comprises:
performing trigger word recognition processing on the text to be extracted, and determining the position of a trigger word in the text to be extracted;
and determining the category label of each text unit in the text to be extracted according to the position of the trigger word in the text to be extracted, wherein the category label of the text unit comprises the label of the event trigger word type and the event argument type to which the text unit belongs.
8. The method according to claim 7, wherein performing trigger word recognition processing on the text to be extracted and determining the position of the trigger word in the text to be extracted comprises:
generating an initial coding sequence according to the text to be extracted, wherein sequence elements of the initial coding sequence correspond to each text unit of the text to be extracted one by one, and the values of the sequence elements of the initial coding sequence are set initial values;
inputting the text to be extracted and the initial coding sequence into a trigger word extraction module so that the trigger word extraction module identifies the trigger words in the text to be extracted and outputs a trigger word mark coding sequence;
the trigger word mark coding sequence is obtained by the trigger word extraction module setting, in the initial coding sequence, the values of the elements corresponding to the event trigger words in the text to be extracted to a set value.
9. The method according to claim 8, wherein determining the category label of each text unit in the text to be extracted according to the position of the trigger word in the text to be extracted comprises:
inputting the text to be extracted and the trigger word mark coding sequence into an argument extraction module to determine the category label of each text unit in the text to be extracted.
10. An event extraction device, comprising:
the event extraction processing unit is used for inputting the text to be extracted into a pre-trained event extraction model to obtain the category labels of all text units in the text to be extracted as an event extraction result;
the event extraction model is trained with a text sequence as a training sample, and with an event trigger word position label of the text sequence and a category label of each text unit of the text sequence as a first type sample label, wherein the category label of a text unit comprises labels of the event trigger word type and the event argument type to which the text unit belongs.
11. Event extraction equipment, comprising:
a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to implement the event extraction method according to any one of claims 1 to 9 by executing a program in the memory.
12. A storage medium, having stored thereon a computer program which, when executed by a processor, implements an event extraction method as claimed in any one of claims 1 to 9.
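The label formats of claims 2, 3 and 6 can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: the type inventory (`Attack`, `Attacker`, `Target`, `Place`), the use of English words as text units, 1 and 0 as the "set"/"valid" and other values, and BIO tagging as the form of the second type sequence label are all assumptions, since the claims fix none of them. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical type inventory: the claims do not enumerate any types.
TRIGGER_TYPES = ["Attack"]
ARGUMENT_TYPES = ["Attacker", "Target", "Place"]
ALL_TYPES = TRIGGER_TYPES + ARGUMENT_TYPES

Span = Tuple[int, int]        # half-open [start, end) range of text-unit indices
TypedSpan = Tuple[Span, str]  # a span together with its trigger or argument type


def trigger_position_label(n_units: int, trigger_spans: Sequence[Span],
                           set_value: int = 1, other_value: int = 0) -> List[int]:
    """Claim 6: one element per text unit; trigger units take the set value."""
    code = [other_value] * n_units
    for start, end in trigger_spans:
        for i in range(start, end):
            code[i] = set_value
    return code


def category_label_sequences(n_units: int,
                             typed_spans: Sequence[TypedSpan]) -> List[List[int]]:
    """Claim 2: a multi-hot vector per text unit over all trigger and argument types."""
    index = {name: k for k, name in enumerate(ALL_TYPES)}
    labels = [[0] * len(ALL_TYPES) for _ in range(n_units)]
    for (start, end), type_name in typed_spans:
        for i in range(start, end):
            labels[i][index[type_name]] = 1  # 1 stands in for the "valid value"
    return labels


def bio_tags(n_units: int, typed_spans: Sequence[TypedSpan]) -> List[str]:
    """Claim 3 second type label, assumed here to be BIO sequence tags.

    A unit covered by several spans keeps only the last tag written,
    which the multi-hot labels above do not suffer from.
    """
    tags = ["O"] * n_units
    for (start, end), type_name in typed_spans:
        tags[start] = "B-" + type_name
        for i in range(start + 1, end):
            tags[i] = "I-" + type_name
    return tags


if __name__ == "__main__":
    units = ["Rebels", "attacked", "the", "base", "in", "Aleppo"]
    spans = [((1, 2), "Attack"), ((0, 1), "Attacker"),
             ((2, 4), "Target"), ((5, 6), "Place")]
    print(trigger_position_label(len(units), [(1, 2)]))    # [0, 1, 0, 0, 0, 0]
    print(category_label_sequences(len(units), spans)[3])  # [0, 0, 1, 0]
    print(bio_tags(len(units), spans))
```

In this sketch a text unit that belongs to two types at once simply carries two 1s in its category label sequence, which a single BIO tag per unit cannot express; that is one plausible reason for training with both label types, though the claims do not say so.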
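Claims 4 and 5 describe a multi-task training loop: compute a first loss against the first type sample label, a second loss against the second type sample label, combine them, update the parameters, and repeat until the combined loss is below a set value. The minimal sketch below assumes binary cross-entropy for the first type outputs (trigger positions and multi-hot category labels), token-level cross-entropy for the second type sequence labels, and a weighted sum with a hypothetical weight `alpha` as the combination rule; claim 5 only says the total is determined from the two losses. The `max_iters` guard is also an addition, and `make_toy_step` is a hypothetical stand-in for a real model's forward pass and parameter update.

```python
import math
from typing import Callable, List, Sequence


def binary_cross_entropy(probs: Sequence[float], targets: Sequence[int],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy, e.g. over trigger positions or multi-hot labels."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def cross_entropy(dists: Sequence[Sequence[float]], gold: Sequence[int],
                  eps: float = 1e-12) -> float:
    """Mean token-level cross-entropy for the sequence-labeling output."""
    return -sum(math.log(max(d[g], eps)) for d, g in zip(dists, gold)) / len(gold)


def event_extraction_loss(first_loss: float, second_loss: float,
                          alpha: float = 0.5) -> float:
    """Claim 5: combine the two losses; the weighted sum is an assumption."""
    return alpha * first_loss + (1.0 - alpha) * second_loss


def train_until(step: Callable[[], float], set_value: float,
                max_iters: int = 10_000) -> List[float]:
    """Claim 4: repeat training steps until the loss falls below set_value.

    `step` must run the model, compute the combined loss, update the
    parameters and return that loss. `max_iters` is an added safety guard.
    """
    history: List[float] = []
    for _ in range(max_iters):
        loss = step()
        history.append(loss)
        if loss < set_value:
            break
    return history


def make_toy_step(lr: float = 0.1) -> Callable[[], float]:
    """Toy stand-in for a model: a single weight w fitted to minimise (w - 3)^2."""
    state = {"w": 0.0}

    def step() -> float:
        loss = (state["w"] - 3.0) ** 2               # compute the loss
        state["w"] -= lr * 2.0 * (state["w"] - 3.0)  # gradient-descent update
        return loss

    return step


if __name__ == "__main__":
    history = train_until(make_toy_step(), set_value=1e-3)
    print(len(history), history[-1])
```

Without the iteration cap, the claimed loop would never terminate if the set value were chosen below the lowest loss the model can reach, which is why the sketch adds one.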
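The two-stage inference of claims 7 to 9 (build an initial coding sequence, let a trigger word extraction module mark trigger positions, then pass the text and the trigger word mark coding sequence to an argument extraction module) can be sketched as follows. This is a sketch under assumptions, not the claimed model: `toy_trigger_module` (a one-word lexicon) and `toy_argument_module` (a small gazetteer) are hypothetical placeholders for the trained modules, and the initial value 0, the set value 1 and the label inventory are assumed.

```python
from typing import Callable, List, Sequence

INITIAL_VALUE = 0  # hypothetical "set initial value"
SET_VALUE = 1      # hypothetical "set value" marking trigger units
TYPES = ["Attack", "Attacker", "Place"]  # hypothetical label inventory

TriggerModule = Callable[[Sequence[str], List[int]], List[int]]
ArgumentModule = Callable[[Sequence[str], List[int]], List[List[int]]]


def initial_coding_sequence(units: Sequence[str]) -> List[int]:
    """Claim 8: one element per text unit, every element at the initial value."""
    return [INITIAL_VALUE] * len(units)


def extract_events(units: Sequence[str], trigger_module: TriggerModule,
                   argument_module: ArgumentModule) -> List[List[int]]:
    """Claims 7 to 9: locate triggers first, then label each unit given them."""
    initial = initial_coding_sequence(units)
    trigger_marks = trigger_module(units, initial)  # trigger word mark coding sequence
    return argument_module(units, trigger_marks)    # per-unit category labels


def toy_trigger_module(units: Sequence[str], code: List[int]) -> List[int]:
    """Placeholder for a trained trigger extractor: a one-word lexicon."""
    marked = list(code)  # copy so the caller's initial sequence is untouched
    for i, unit in enumerate(units):
        if unit.lower() == "attacked":
            marked[i] = SET_VALUE
    return marked


def toy_argument_module(units: Sequence[str], marks: List[int]) -> List[List[int]]:
    """Placeholder for a trained argument extractor: a small gazetteer."""
    gazetteer = {"rebels": "Attacker", "aleppo": "Place"}
    labels = [[0] * len(TYPES) for _ in units]
    if SET_VALUE not in marks:
        return labels  # no trigger found, so no event and no arguments
    for i, unit in enumerate(units):
        if marks[i] == SET_VALUE:
            labels[i][TYPES.index("Attack")] = 1
        elif unit.lower() in gazetteer:
            labels[i][TYPES.index(gazetteer[unit.lower()])] = 1
    return labels


if __name__ == "__main__":
    sentence = ["Rebels", "attacked", "a", "base", "in", "Aleppo"]
    labels = extract_events(sentence, toy_trigger_module, toy_argument_module)
    for unit, label in zip(sentence, labels):
        print(unit, label)
```

The point the sketch tries to show is the ordering: the argument module receives the trigger word mark coding sequence as an extra input, so every category label, including the trigger type itself, can be conditioned on where the triggers were found.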

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110286434.2A CN112861527B (en) 2021-03-17 2021-03-17 Event extraction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110286434.2A CN112861527B (en) 2021-03-17 2021-03-17 Event extraction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112861527A (en) 2021-05-28
CN112861527B (en) 2024-08-30

Family

ID=75995068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110286434.2A Active CN112861527B (en) 2021-03-17 2021-03-17 Event extraction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112861527B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120036130A1 (en) * 2007-12-21 2012-02-09 Marc Noel Light Systems, methods, software and interfaces for entity extraction and resolution and tagging
US20180060831A1 (en) * 2016-08-25 2018-03-01 Crown Equipment Corporation Observation based event tracking
CN109325228A (en) * 2018-09-19 2019-02-12 苏州大学 English event trigger word abstracting method and system
CN112287672A (en) * 2019-11-28 2021-01-29 北京京东尚科信息技术有限公司 Text intention recognition method and device, electronic equipment and storage medium
CN111460830A (en) * 2020-03-11 2020-07-28 北京交通大学 Method and system for extracting economic events in judicial texts
CN111967268A (en) * 2020-06-30 2020-11-20 北京百度网讯科技有限公司 Method and device for extracting events in text, electronic equipment and storage medium
CN112116075A (en) * 2020-09-18 2020-12-22 厦门安胜网络科技有限公司 Event extraction model generation method and device and text event extraction method and device

Non-Patent Citations (4)

Title
NING DING ET AL: "Event Detection with Trigger-Aware Lattice Neural Network", PROCEEDINGS OF THE 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 7 December 2019 (2019-12-07), pages 347 - 356 *
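胡博磊 et al.: "基于条件随机域的中文事件类型识别" (Chinese event type recognition based on conditional random fields), 模式识别与人工智能 (Pattern Recognition and Artificial Intelligence), no. 03, 15 June 2012 (2012-06-15) *
贺瑞芳 et al.: "基于多任务学习的中文事件抽取联合模型" (A joint model for Chinese event extraction based on multi-task learning), 软件学报 (Journal of Software), vol. 30, no. 4, 15 April 2019 (2019-04-15), pages 1015 - 1030 *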

Cited By (12)

Publication number Priority date Publication date Assignee Title
WO2022262080A1 (en) * 2021-06-17 2022-12-22 腾讯云计算(北京)有限责任公司 Dialogue relationship processing method, computer and readable storage medium
CN113468433A (en) * 2021-09-02 2021-10-01 中科雨辰科技有限公司 Target event extraction data processing system
CN113722462A (en) * 2021-09-02 2021-11-30 中科雨辰科技有限公司 Target argument information extraction data processing system
CN113468433B (en) * 2021-09-02 2021-12-07 中科雨辰科技有限公司 Target event extraction data processing system
CN113722462B (en) * 2021-09-02 2022-03-04 中科雨辰科技有限公司 Target argument information extraction data processing system
WO2023035330A1 (en) * 2021-09-13 2023-03-16 深圳前海环融联易信息科技服务有限公司 Long text event extraction method and apparatus, and computer device and storage medium
CN113987104A (en) * 2021-09-28 2022-01-28 浙江大学 Ontology guidance-based generating type event extraction method
CN114741516A (en) * 2021-12-08 2022-07-12 商汤国际私人有限公司 Event extraction method and device, electronic equipment and storage medium
CN114861677A (en) * 2022-05-30 2022-08-05 北京百度网讯科技有限公司 Information extraction method, information extraction device, electronic equipment and storage medium
CN115983274A (en) * 2022-12-20 2023-04-18 东南大学 Noise event extraction method based on two-stage label correction
CN115983274B (en) * 2022-12-20 2023-11-28 东南大学 Noise event extraction method based on two-stage label correction
CN115701862A (en) * 2023-01-10 2023-02-14 中国电子信息产业集团有限公司第六研究所 Event element determination method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112861527B (en) 2024-08-30

Similar Documents

Publication Publication Date Title
CN112861527B (en) Event extraction method, device, equipment and storage medium
CN102662930B (en) Corpus tagging method and corpus tagging device
CN109147767B (en) Method, device, computer equipment and storage medium for recognizing numbers in voice
CN109582772B (en) Contract information extraction method, contract information extraction device, computer equipment and storage medium
CN110276023B (en) POI transition event discovery method, device, computing equipment and medium
CN111178077B (en) Corpus generation method, corpus generation device and intelligent equipment
CN107423278B (en) Evaluation element identification method, device and system
CN111352907A (en) Method and device for analyzing pipeline file, computer equipment and storage medium
CN112434131B (en) Text error detection method and device based on artificial intelligence and computer equipment
US20210390370A1 (en) Data processing method and apparatus, storage medium and electronic device
CN103870000A (en) Method and device for sorting candidate items generated by input method
WO2020199600A1 (en) Sentiment polarity analysis method and related device
CN110674255A (en) Text content auditing method and device
CN108932218A (en) A kind of example extended method, device, equipment and medium
CN110442871A (en) Text message processing method, device and equipment
CN114090794A (en) Event map construction method based on artificial intelligence and related equipment
CN110929520A (en) Non-named entity object extraction method and device, electronic equipment and storage medium
CN112906361A (en) Text data labeling method and device, electronic equipment and storage medium
CN112559725A (en) Text matching method, device, terminal and storage medium
CN116796730A (en) Text error correction method, device, equipment and storage medium based on artificial intelligence
CN115392235A (en) Character matching method and device, electronic equipment and readable storage medium
CN111090970A (en) Text standardization processing method after speech recognition
CN111354354B (en) Training method, training device and terminal equipment based on semantic recognition
CN111563212A (en) Inner chain adding method and device
CN110970030A (en) Voice recognition conversion method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant