CN113255322B - Event extraction method and device, computer equipment and computer-readable storage medium - Google Patents


Info

Publication number
CN113255322B
Authority
CN
China
Prior art keywords
role
target
clause
event
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110649268.8A
Other languages
Chinese (zh)
Other versions
CN113255322A (en)
Inventor
孙俊
黄继青
刘云峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhuiyi Technology Co Ltd
Original Assignee
Shenzhen Zhuiyi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Zhuiyi Technology Co Ltd filed Critical Shenzhen Zhuiyi Technology Co Ltd
Priority to CN202110649268.8A priority Critical patent/CN113255322B/en
Publication of CN113255322A publication Critical patent/CN113255322A/en
Application granted granted Critical
Publication of CN113255322B publication Critical patent/CN113255322B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Abstract

The application relates to an event extraction method, an event extraction device, computer equipment and a computer-readable storage medium. The method comprises: acquiring a target text, and performing clause processing on the target text to obtain a plurality of clause texts; performing at least two different role extraction processes on the plurality of clause texts to obtain a role extraction result corresponding to each role extraction process, and combining the role extraction results to obtain a target role extraction result corresponding to the target text, wherein the target role extraction result comprises a plurality of role elements corresponding to the target text; and performing event extraction processing on the target role extraction result to obtain at least one piece of event information, wherein the event information comprises an event type and a target role element corresponding to the event type. The method can improve the accuracy of event extraction.

Description

Event extraction method and device, computer equipment and computer-readable storage medium
Technical Field
The present invention relates to the field of artificial intelligence technology, and in particular, to an event extraction method, an event extraction device, a computer device, and a computer-readable storage medium.
Background
Event extraction (EE) is a classic information extraction task in the field of natural language processing (NLP), and is widely applied in fields such as commerce and the military. Event extraction refers to extracting event information in a structured form from unstructured text.
Currently, the related art also combines event extraction with automatic speech recognition (ASR): ASR conversion is performed on the user's speech to obtain unstructured text, and event extraction is then performed on that text.
However, event extraction methods based on the ASR scenario suffer from low extraction accuracy.
Disclosure of Invention
In view of the above, it is necessary to provide an event extraction method, an event extraction device, a computer device, and a computer-readable storage medium, which can improve the accuracy of event extraction.
In a first aspect, an embodiment of the present application provides an event extraction method, where the method includes:
acquiring a target text, and performing clause processing on the target text to obtain a plurality of clause texts;
performing at least two different role extraction processes on the plurality of clause texts to obtain a role extraction result corresponding to each role extraction process, and combining the role extraction results to obtain a target role extraction result corresponding to the target text, wherein the target role extraction result comprises a plurality of role elements corresponding to the target text;
and performing event extraction processing on the target role extraction result to obtain at least one piece of event information, wherein the event information comprises an event type and a target role element corresponding to the event type.
In one embodiment, the performing at least two different role extraction processes on the plurality of clause texts to obtain a role extraction result corresponding to each role extraction process includes:
for each role extraction processing process, dividing the multiple clause texts into multiple clause groups according to the arrangement sequence of the clause texts in the target text, wherein the clause groups comprise the same number of clause texts, and the number of the clause groups corresponding to different role extraction processing processes is different;
and performing role extraction processing on the clause text corresponding to each clause group to obtain a role extraction result corresponding to the current role extraction processing process.
In one embodiment, the performing role extraction processing on the clause text corresponding to each clause group to obtain the role extraction result corresponding to the current role extraction processing process includes:
entity recognition is carried out on the clause texts corresponding to the clause groups, and a plurality of keywords are obtained;
and performing role matching on each keyword based on a preset role element database, and taking the keyword which is successfully matched as an initial role element to obtain a role extraction result.
In one embodiment, the combining the role extraction results to obtain the target role extraction result corresponding to the target text includes:
based on each role extraction result, if the number of initial role elements corresponding to a target role label is multiple, determining the role element corresponding to the target role label from the multiple initial role elements according to the confidence degrees corresponding to the multiple initial role elements;
and in each role extraction result, replacing the plurality of initial role elements corresponding to the target role labels with the role elements corresponding to the target role labels to obtain the target role extraction result.
In one embodiment, the determining, according to the confidence degrees corresponding to the plurality of initial role elements, a role element corresponding to the target role label from the plurality of initial role elements includes:
determining a maximum confidence level according to the confidence levels corresponding to the plurality of initial role elements;
and determining the initial role element corresponding to the maximum confidence coefficient as the role element corresponding to the target role label.
In one embodiment, before determining the maximum confidence level according to the confidence levels corresponding to the plurality of initial role elements, the method further includes:
detecting whether text overlapping exists in a plurality of initial role elements;
correspondingly, the determining the maximum confidence level according to the confidence levels corresponding to the plurality of initial role elements includes:
and if the initial role elements have text overlap, determining the maximum confidence level according to the confidence levels corresponding to the initial role elements.
In one embodiment, the performing event extraction processing on the target role extraction result to obtain at least one piece of event information includes:
determining role elements corresponding to the clause texts according to the target role extraction result;
for each clause text, if the role element corresponding to the clause text comprises a trigger role element corresponding to a trigger role, determining the event type of the role element corresponding to the clause text according to the trigger role element;
and acquiring the at least one piece of event information according to the role elements and the event types corresponding to the clause texts.
In one embodiment, the determining, according to the trigger role element, the event type of the role element corresponding to the clause text includes:
determining an event template corresponding to the trigger role element, wherein the event template comprises a plurality of event role elements;
and if the role element corresponding to the clause text is matched with the event role elements, taking the event type corresponding to the event template as the event type of the role element corresponding to the clause text.
In one embodiment, before the acquiring the at least one piece of event information according to the role element and the event type corresponding to each clause text, the method further includes:
and for each clause text, if the role element corresponding to the clause text does not comprise the trigger role element, taking the trigger role element of the adjacent clause text of the clause text as the trigger role element corresponding to the clause text.
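By way of illustration only, the trigger inheritance described above might be sketched as follows. The `fill_triggers` name, the dictionary layout, the `"trigger"` label and the choice to look at the preceding clause before the following one are all assumptions made for this sketch; the embodiment only states that the trigger role element of an adjacent clause text is used.

```python
def fill_triggers(clause_roles, trigger_label="trigger"):
    """Give clauses without a trigger the trigger of an adjacent clause.

    clause_roles: list of dicts (role label -> text), one per clause text,
    in the order the clause texts appear in the target text (assumed layout).
    """
    filled = [dict(roles) for roles in clause_roles]  # leave the input untouched
    for i, roles in enumerate(filled):
        if trigger_label in roles:
            continue
        # Assumed preference: preceding clause first, then the following clause.
        # Lookups use the original input, so inherited triggers do not cascade.
        for j in (i - 1, i + 1):
            if 0 <= j < len(clause_roles) and trigger_label in clause_roles[j]:
                roles[trigger_label] = clause_roles[j][trigger_label]
                break
    return filled


clauses = [
    {"trigger": "admitted", "date": "May 3"},
    {"ward": "ward 5"},
    {"doctor": "Dr. Li"},
]
filled = fill_triggers(clauses)
```

In this sketch the second clause inherits "admitted" from the first, while the third clause stays without a trigger because neither of its original neighbours carried one.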
In one embodiment, the obtaining the at least one piece of event information according to the role element and the event type corresponding to each clause text includes:
fusing the role elements with the same event type according to the role elements corresponding to the clause texts and the event type to obtain the target role element corresponding to each event type;
and obtaining event information corresponding to each event type according to the target role element corresponding to each event type.
In one embodiment, the obtaining event information corresponding to each event type according to the target role element corresponding to each event type includes:
for each event type, performing duplicate removal processing on the target role element corresponding to the event type to obtain a duplicate-removed target role element;
and filtering the target role element after the deduplication processing according to the role label and the confidence coefficient corresponding to the target role element after the deduplication processing to obtain event information corresponding to the event type.
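As a minimal sketch of the fusion, deduplication and filtering just described, assuming each role element is a `(label, text, confidence)` tuple: the `build_events` name, the confidence threshold and the rule of keeping the highest-confidence element per role label are assumptions for illustration, since the embodiment does not fix the filtering criterion.

```python
from collections import defaultdict


def build_events(clause_results, min_confidence=0.5):
    """clause_results: list of (event_type, [(label, text, confidence), ...])."""
    fused = defaultdict(list)
    for event_type, elements in clause_results:
        fused[event_type].extend(elements)       # fuse elements of the same event type

    events = {}
    for event_type, elements in fused.items():
        unique = list(dict.fromkeys(elements))   # drop exact duplicates, keep order
        best = {}
        for label, text, conf in unique:
            # Assumed filter: discard low-confidence elements and keep the
            # highest-confidence element for each role label.
            if conf >= min_confidence and (label not in best or conf > best[label][1]):
                best[label] = (text, conf)
        events[event_type] = {label: text for label, (text, _) in best.items()}
    return events


clause_results = [
    ("admission", [("name", "Zhang San", 0.9), ("date", "May 3", 0.8)]),
    ("admission", [("name", "Zhang San", 0.9), ("date", "May 13", 0.6), ("ward", "5", 0.3)]),
    ("discharge", [("date", "May 20", 0.7)]),
]
events = build_events(clause_results)
```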
In one embodiment, the method further comprises:
and determining target event information from the at least one event information according to a preset table filling strategy, and filling the target event information into a preset table.
In one embodiment, the determining, according to a preset table filling policy, target event information from the at least one event information, and filling the target event information into a preset table includes:
for each table, determining the target event information corresponding to the target event type from the at least one event information according to the target event type corresponding to the table;
and selecting a corresponding target role element from the target event information according to a preset mapping relation between the header and the role element, and filling the target role element into a position frame indicated by the corresponding header, wherein the mapping relation comprises a mapping relation between one role element and one header, or the mapping relation comprises a mapping relation between a plurality of role elements and one header.
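The table filling strategy above could be sketched as follows, purely as an illustration. The `fill_table` name, the event dictionary layout, the header names and the choice to join several role elements with a space are hypothetical; the embodiment only requires that a header may map to one role element or to several.

```python
def fill_table(target_event_type, header_to_roles, events):
    """Fill one table row from the first event matching the table's event type.

    header_to_roles: header -> list of role labels (one or several per header).
    events: list of {"event_type": str, "roles": {label: text}} (assumed layout).
    """
    event = next((e for e in events if e["event_type"] == target_event_type), None)
    if event is None:
        return {}
    row = {}
    for header, labels in header_to_roles.items():
        values = [event["roles"][label] for label in labels if label in event["roles"]]
        row[header] = " ".join(values)           # assumed way to merge several elements
    return row


events = [
    {"event_type": "admission",
     "roles": {"name": "Zhang San", "province": "Guangdong", "city": "Shenzhen"}},
]
mapping = {"Patient": ["name"], "Address": ["province", "city"]}
row = fill_table("admission", mapping, events)
```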
In a second aspect, an embodiment of the present application provides an event extraction apparatus, including:
the acquisition module is used for acquiring a target text and performing clause processing on the target text to obtain a plurality of clause texts;
the role extraction module is used for performing at least two different role extraction processes on the multiple clause texts to obtain role extraction results corresponding to each role extraction process, combining the role extraction results to obtain target role extraction results corresponding to the target texts, and the target role extraction results comprise multiple role elements corresponding to the target texts;
and the event extraction module is used for performing event extraction processing on the target role extraction result to obtain at least one piece of event information, wherein the event information comprises an event type and a target role element corresponding to the event type.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the method according to the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the method according to the first aspect as described above.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
A target text is obtained and clause processing is performed on it to obtain a plurality of clause texts; at least two different role extraction processes are then performed on the plurality of clause texts to obtain a role extraction result corresponding to each role extraction process; the role extraction results are combined to obtain a target role extraction result comprising a plurality of role elements corresponding to the target text; and event extraction processing is performed on the target role extraction result to obtain at least one piece of event information, wherein the event information comprises an event type and a target role element corresponding to the event type. Because the target role extraction result is obtained by performing at least two different role extraction processes on the plurality of clause texts and combining the results, the adverse effect of an extraction error in any single role extraction process on the overall role extraction result can be reduced, and the accuracy of the target role extraction result can be improved. For example, in an ASR scenario, if the target text contains colloquial expressions, unnatural sentence breaks and the like, a single role extraction process alone may produce large extraction errors; in the embodiments of the present application, by contrast, the optimal combination of role elements can be selected from the results of the at least two different role extraction processes when those results are combined. This improves the accuracy of the target role extraction result and, in turn, the accuracy of event extraction based on the target role extraction result.
Drawings
FIG. 1-a is a diagram illustrating an application environment of an event extraction method according to an embodiment;
FIG. 1-b is a diagram illustrating an application environment of an event extraction method according to another embodiment;
FIG. 2 is a flowchart illustrating an event extraction method according to an embodiment;
FIG. 3 is a schematic flowchart of performing at least two different role extraction processes on a plurality of clause texts to obtain a role extraction result corresponding to each role extraction process according to another embodiment;
FIG. 4 is a flowchart of step 302 provided in another embodiment;
FIG. 5 is a schematic flowchart of combining the role extraction results to obtain a target role extraction result according to another embodiment;
FIG. 6 is a schematic flowchart of determining a role element corresponding to a target role label from a plurality of initial role elements according to another embodiment;
FIG. 7 is a schematic flowchart of step 203 provided in another embodiment;
FIG. 8 is a schematic flowchart of determining, according to trigger role elements, the event types of the role elements corresponding to clause texts according to another embodiment;
FIG. 9 is a flowchart of step 703 provided in another embodiment;
FIG. 10 is a flowchart illustrating an event extraction method according to another embodiment;
FIG. 11 is a schematic illustration of an exemplary form according to another embodiment;
FIG. 12 is a flowchart of step 204 provided in another embodiment;
FIG. 13 is a block diagram illustrating an event extraction apparatus according to an embodiment;
FIG. 14 is an internal structural diagram of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The event extraction method, the event extraction device, the computer equipment and the computer readable storage medium provided by the embodiment of the application aim to solve the technical problem that in the traditional technology, the event extraction method based on an ASR scene is low in accuracy of event extraction. The following describes in detail the technical solutions of the present application and how the technical solutions of the present application solve the above technical problems by embodiments and with reference to the drawings. The following specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
The following describes technical solutions related to the embodiments of the present application with reference to a scenario in which the embodiments of the present application are applied.
Fig. 1-a is a schematic diagram of an implementation environment related to the event extraction method provided in the embodiment of the present application, as shown in fig. 1-a, the implementation environment may include a terminal 101, and the terminal 101 may be a smart robot, a smart phone, a tablet computer, a personal computer, a notebook computer, a wearable device, a vehicle-mounted device, and the like.
In the implementation environment shown in FIG. 1-a, the terminal 101 may obtain a target text and perform clause processing on the target text to obtain a plurality of clause texts; the terminal 101 may perform at least two different role extraction processes on the plurality of clause texts to obtain a role extraction result corresponding to each role extraction process, and combine the role extraction results to obtain a target role extraction result corresponding to the target text, where the target role extraction result includes a plurality of role elements corresponding to the target text; the terminal 101 may then perform event extraction processing on the target role extraction result to obtain at least one piece of event information, where the event information includes an event type and a target role element corresponding to the event type.
Optionally, the implementation environment related to the event extraction method provided by the embodiment of the present application may further include a terminal and a server. As shown in fig. 1-b, the implementation environment may further include a terminal 101 and a server 102, and the terminal 101 and the server 102 may communicate with each other through a wired network or a wireless network. The terminal 101 may be an intelligent robot, a smart phone, a tablet computer, a personal computer, a notebook computer, a wearable device, a vehicle-mounted device, or the like; the server 102 may be one server or a server cluster including a plurality of servers.
In the implementation environment shown in FIG. 1-b, the terminal 101 may send the target text to the server 102, and the server 102 obtains the target text and performs clause processing on it to obtain a plurality of clause texts; the server 102 may perform at least two different role extraction processes on the plurality of clause texts to obtain a role extraction result corresponding to each role extraction process, and combine the role extraction results to obtain a target role extraction result corresponding to the target text, where the target role extraction result includes a plurality of role elements corresponding to the target text; the server 102 may then perform event extraction processing on the target role extraction result to obtain at least one piece of event information, where the event information includes an event type and a target role element corresponding to the event type. Optionally, the server 102 may send the event information to the terminal 101.
In one embodiment, as shown in fig. 2, an event extraction method is provided, which is described by taking the method as an example applied to the terminal 101 in fig. 1-a, and includes the following steps:
Step 201, the terminal obtains a target text and performs clause processing on the target text to obtain a plurality of clause texts.
The target text refers to the text on which event extraction is to be performed. In this embodiment of the application, the target text may be text data obtained by performing ASR (automatic speech recognition) on the user's voice audio data. Optionally, after collecting the user's voice audio data through a sound pickup component, the terminal inputs the voice audio data into a pre-trained ASR model and obtains the text data output by the ASR model, that is, the target text.
Of course, in other embodiments, the target text may also be text data from a non-ASR scenario; for example, the target text may be obtained by the terminal performing OCR (optical character recognition) on an image that includes text information. The manner of obtaining the target text is not specifically limited here.
After acquiring the target text, in order to avoid the adverse effect of an excessively long target text on the subsequent extraction process, the terminal may detect whether the text length of the target text is greater than a preset text length threshold; if so, the terminal may perform clause processing on the target text based on the punctuation marks in the target text to obtain a plurality of clause texts.
Optionally, the terminal may perform the clause splitting based on punctuation marks that are not within symmetric punctuation (e.g., "!" and the like), where the symmetric punctuation may be, for example, "()" and the like.
Therefore, the terminal can obtain a plurality of clause texts based on the target text.
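The clause processing of step 201 might be sketched as follows, under one reading of the passage above: the text is split only when it exceeds a length threshold, and only at splitting marks that are not enclosed in paired punctuation. The threshold value, the set of splitting marks and the set of paired marks are assumptions chosen for illustration; the embodiment does not enumerate them.

```python
# Assumed values: the embodiment does not fix the threshold or the mark sets.
MAX_LEN = 20
SPLIT_MARKS = set(".!?;。！？；")
PAIRS = {"(": ")", "（": "）", "“": "”", "《": "》"}


def split_clauses(text, max_len=MAX_LEN):
    """Split text at splitting marks that are not inside paired punctuation."""
    if len(text) <= max_len:
        return [text]                      # short texts are left as a single clause
    clauses, current, closers = [], [], []
    for ch in text:
        current.append(ch)
        if closers and ch == closers[-1]:
            closers.pop()                  # leaving a paired span
        elif ch in PAIRS:
            closers.append(PAIRS[ch])      # entering a paired span
        elif ch in SPLIT_MARKS and not closers:
            clauses.append("".join(current).strip())
            current = []
    tail = "".join(current).strip()
    if tail:
        clauses.append(tail)               # text after the last splitting mark
    return [c for c in clauses if c]


text = "My name is Zhang San (I moved here! last year). I am 37 years old! I have a cough."
clauses = split_clauses(text)
```

Here the "!" inside the parentheses does not end a clause, so the sample text yields three clause texts.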
Step 202, the terminal performs at least two different role extraction processes on the plurality of clause texts to obtain a role extraction result corresponding to each role extraction process, and combines the role extraction results to obtain a target role extraction result corresponding to the target text.
The target role extraction result comprises a plurality of role elements corresponding to the target text.
Role extraction means extracting the role elements in the clause texts; the role elements include, for example, "name" = "Zhang San", "sex" = "male", "age" = "37 years", and the like.
In the embodiment of the application, the terminal performs at least two different role extraction processes on the plurality of clause texts. Taking role extraction processing mode A and role extraction processing mode B as an example of the at least two different role extraction processes: on the one hand, the terminal performs one complete pass of role extraction over the plurality of clause texts using mode A to obtain the role extraction result corresponding to mode A, which may include each role element extracted by mode A; on the other hand, the terminal performs one complete pass of role extraction over the plurality of clause texts using mode B to obtain the role extraction result corresponding to mode B, which may include each role element extracted by mode B.
After obtaining the role extraction results corresponding to mode A and mode B, the terminal combines the two role extraction results and selects the optimal combination of role elements from them to obtain the target role extraction result.
The manner in which the terminal selects the optimal role element from the two role extraction results is described below by way of an example.
It is assumed that, for a certain role element a, the value obtained by role extraction processing mode A is "age" = "37 years" and the value obtained by role extraction processing mode B is "age" = "7 years"; the terminal then selects, from the two role extraction results, the role element a with the highest confidence, for example "age" = "37 years".
In this embodiment of the application, optionally, role extraction processing mode A may be, for example, a pattern matching approach such as regular-expression matching, and role extraction processing mode B may be, for example, a machine learning model approach. Optionally, modes A and B may both be pattern matching approaches that each perform role extraction on the plurality of clause texts over multiple rounds, where the number of clause texts handled in each round of mode A differs from the number handled in each round of mode B; and so on. The at least two different role extraction processes are not specifically limited here.
In this way, the terminal combines the role extraction results to obtain the target role extraction result corresponding to the target text.
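The combination in step 202 can be illustrated with the "age" example above. The sketch below is only one possible realization: the `RoleElement` fields (in particular the character offsets), the confidence values, and the rule that two elements compete only when they share a role label and their text spans overlap are assumptions, the last one borrowed from the text-overlap embodiment described in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleElement:
    label: str          # role label, e.g. "age"
    text: str           # extracted text
    start: int          # start offset in the target text (assumed field)
    end: int            # end offset, exclusive (assumed field)
    confidence: float   # confidence reported by the extraction process


def overlaps(a, b):
    """True if the two elements cover at least one common character."""
    return a.start < b.end and b.start < a.end


def merge_results(*results):
    """Combine extraction results, keeping the most confident of competing elements."""
    merged = []
    for elem in (e for result in results for e in result):
        rival = next((m for m in merged
                      if m.label == elem.label and overlaps(m, elem)), None)
        if rival is None:
            merged.append(elem)
        elif elem.confidence > rival.confidence:
            merged[merged.index(rival)] = elem   # replace the weaker element
    return merged


# Target text: "My name is Zhang San. I am 37 years old"
result_a = [RoleElement("name", "Zhang San", 11, 20, 0.95),
            RoleElement("age", "37 years", 27, 35, 0.90)]
result_b = [RoleElement("age", "7 years", 28, 35, 0.40)]
target = merge_results(result_a, result_b)
```

For brevity the sketch compares a new element only with the first competing element already kept; a fuller implementation would need to handle an element that overlaps several kept elements.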
Step 203, the terminal performs event extraction processing on the target role extraction result to obtain at least one piece of event information.
The event information comprises an event type and a target role element corresponding to the event type.
The terminal matches the role elements against preset event templates to determine their event types; if the matching succeeds, the event type and the role elements matched with the event type are taken as event information. The target role element corresponding to an event type is the role element matched with that event type.
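A minimal sketch of this template matching is given below. The template contents, the `extract_events` name and the matching rule (a template matches only when every role label it lists is present) are hypothetical; the embodiment does not state what counts as a successful match.

```python
# Hypothetical templates: event type -> role labels the template expects.
EVENT_TEMPLATES = {
    "registration": {"name", "sex", "age"},
    "symptom_report": {"symptom", "duration"},
}


def extract_events(role_elements, templates=EVENT_TEMPLATES):
    """role_elements: dict mapping role label -> extracted text."""
    events = []
    for event_type, required in templates.items():
        # Assumed rule: the match succeeds only if all template roles are present.
        if required.issubset(role_elements):
            events.append({
                "event_type": event_type,
                "roles": {label: role_elements[label] for label in sorted(required)},
            })
    return events


roles = {"name": "Zhang San", "sex": "male", "age": "37 years"}
events = extract_events(roles)
```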
In the embodiment of the application, at least one event information can be used for form filling by a terminal or a server, so that the event information extracted from an unstructured text (target text) can be filled into a structured form for data management and presentation.
The event extraction method is not limited by the text length of the target text, and effectively avoids the influence of the unnatural sentence breaks that arise in an ASR scenario. Clause processing is performed on the target text to obtain a plurality of clause texts; at least two different role extraction processes are then performed on the plurality of clause texts to obtain a role extraction result corresponding to each role extraction process; the role extraction results are combined to obtain a target role extraction result comprising a plurality of role elements corresponding to the target text; and event extraction processing is performed on the target role extraction result to obtain at least one piece of event information, wherein the event information comprises an event type and a target role element corresponding to the event type. Because the target role extraction result is obtained by performing at least two different role extraction processes on the plurality of clause texts and combining the results, the adverse effect of an extraction error in any single role extraction process on the overall role extraction result can be reduced, and the accuracy of the target role extraction result can be improved. For example, in an ASR scenario, if the target text contains colloquial expressions, unnatural sentence breaks and the like, a single role extraction process alone may produce large extraction errors; in the embodiments of the present application, by contrast, the optimal combination of role elements can be selected from the results of the at least two different role extraction processes when those results are combined. This improves the accuracy of the target role extraction result and, in turn, the accuracy of event extraction based on the target role extraction result.
In an embodiment, based on the embodiment shown in fig. 2, referring to fig. 3, this embodiment relates to a process of how a terminal performs at least two different character extraction processes on a plurality of clause texts to obtain a character extraction result corresponding to each character extraction process. As shown in fig. 3, the process may include steps 301 and 302:
step 301, the terminal divides the multiple clause texts into multiple clause groups according to the arrangement sequence of the clause texts in the target text for each role extraction processing process.
For each role extraction process, the terminal may set the number of clause texts that each clause group is to include in that process. The terminal then divides the plurality of clause texts into a plurality of clause groups according to the arrangement order of the clause texts in the target text (for example, from front to back), where each clause group includes the number of clause texts set by the terminal. For example, each clause group may include one clause text, or each clause group may include two clause texts, and so on.
In the embodiment of the application, within a single role extraction process every clause group includes the same number of clause texts, but the number of clause groups differs between role extraction processes. Continuing with the example in which the at least two different role extraction processes are role extraction processing mode A and role extraction processing mode B: mode A divides the clause texts obtained from the target text into a plurality of clause groups that each include the same number of clause texts; mode B likewise divides the clause texts into a plurality of clause groups that each include the same number of clause texts; however, the number of clause groups produced by mode A differs from the number of clause groups produced by mode B.
The following example describes how the terminal divides the plurality of clause texts into a plurality of clause groups according to the arrangement order of the clause texts in the target text.
Suppose that the terminal performs clause processing on a target text and obtains 10 clause texts which, in order from front to back in the target text, are clause text 1, clause text 2, ..., clause text 10.
Continuing with role extraction processing mode A and role extraction processing mode B as the example: for mode A, the number of clause texts to be included in each clause group is set to 1, so the terminal takes each clause text as a clause group, that is, clause text 1 as one clause group, clause text 2 as one clause group, and so on, obtaining 10 clause groups. For mode B, the number of clause texts to be included in each clause group is set to 2, so the terminal takes clause text 1 and clause text 2 as one clause group, clause text 2 and clause text 3 as one clause group, clause text 3 and clause text 4 as one clause group, and so on, obtaining 9 clause groups.
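The grouping in this example can be sketched as a sliding window over the ordered clause texts. This is a minimal illustration only; the function name `group_clauses` and the parameter `group_size` are hypothetical and do not come from the application.

```python
from typing import List


def group_clauses(clauses: List[str], group_size: int) -> List[List[str]]:
    """Split ordered clauses into overlapping groups of consecutive clauses."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    # A window of size k over n clauses yields n - k + 1 groups (none if k > n).
    return [clauses[i:i + group_size]
            for i in range(len(clauses) - group_size + 1)]


clauses = [f"clause text {n}" for n in range(1, 11)]
groups_a = group_clauses(clauses, 1)  # mode A: one clause text per group
groups_b = group_clauses(clauses, 2)  # mode B: two adjacent clause texts per group
print(len(groups_a), len(groups_b))   # prints: 10 9
```

With 10 clause texts, a window of size 1 gives 10 groups and a window of size 2 gives 9 groups, matching the counts stated for mode A and mode B.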
Step 302, the terminal performs role extraction processing on the clause text corresponding to each clause group to obtain the role extraction result corresponding to the current role extraction process.
For the current role extraction process, after dividing the plurality of clause texts into a plurality of clause groups, the terminal may use regular-expression matching to perform role extraction on the clause texts of each clause group as a whole, so as to obtain the role extraction result corresponding to the current role extraction process.
In a possible implementation of step 302, referring to fig. 4, the terminal may execute step 3021 and step 3022 shown in fig. 4 to implement step 302:
Step 3021, the terminal performs entity recognition on the clause text corresponding to the clause group to obtain a plurality of keywords.
Entity recognition means acquiring entity data such as person names, place names, and times from text. In the embodiment of the application, to facilitate role extraction, for each clause group in the current role extraction process the terminal performs entity recognition on the clause text in the clause group. For example, based on a preset entity database, the terminal may match the clause text against the entities in the entity database and take each successfully matched field in the clause text as a keyword, that is, an entity. Through entity recognition, the terminal obtains a plurality of keywords corresponding to each clause group.
Step 3022, the terminal performs role matching on each keyword based on a preset role element database, and takes each successfully matched keyword as an initial role element to obtain the role extraction result.
In this embodiment of the present application, the role element database may include a plurality of standard role elements and role labels corresponding to the standard role elements, where the role labels are used to characterize roles represented by the standard role elements.
For each keyword obtained by entity recognition, the terminal matches the keyword against the standard role elements in the role element database. If the matching succeeds, the keyword is taken as an initial role element. The role extraction result obtained in this way comprises a plurality of initial role elements.
Optionally, when matching each keyword against the standard role elements in the role element database, the terminal may calculate a confidence between the keyword obtained by entity recognition and the standard role element. The confidence represents the degree of matching, and if the confidence is greater than a preset threshold, the matching is determined to be successful.
In the embodiment of the application, the terminal may also attach the corresponding role label and confidence to each initial role element obtained by matching, so that the terminal can later combine the role extraction results and perform event extraction.
In this way, the terminal obtains the role extraction results corresponding to the role extraction processes through the above embodiment, and each role extraction result may include a plurality of initial role elements, the role labels corresponding to the initial role elements, and the confidence degrees corresponding to the initial role elements.
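Steps 3021 and 3022 can be sketched as follows. The database contents, the threshold value 0.8, and the use of `difflib.SequenceMatcher` as the confidence measure are all assumptions made for illustration; the application does not specify how the confidence is computed.

```python
from difflib import SequenceMatcher
from typing import Dict, List, NamedTuple


class RoleElement(NamedTuple):
    text: str          # initial role element (the matched keyword)
    label: str         # role label taken from the role element database
    confidence: float  # matching degree in [0, 1]


# Hypothetical databases; the application does not list their contents.
ENTITY_DB = ["Zhang Mei", "annuity insurance", "married"]
ROLE_DB: Dict[str, str] = {  # standard role element -> role label
    "Zhang Mei": "name",
    "annuity insurance": "product",
    "married": "marital status",
}


def recognize_entities(text: str) -> List[str]:
    """Step 3021: keep every entity-database entry that occurs in the text."""
    return [entity for entity in ENTITY_DB if entity in text]


def match_roles(keywords: List[str], threshold: float = 0.8) -> List[RoleElement]:
    """Step 3022: keep keywords whose best match exceeds the threshold."""
    results = []
    for keyword in keywords:
        best_label, best_score = "", 0.0
        for standard, label in ROLE_DB.items():
            # Assumed confidence: string similarity ratio between the two texts.
            score = SequenceMatcher(None, keyword, standard).ratio()
            if score > best_score:
                best_label, best_score = label, score
        if best_score > threshold:
            results.append(RoleElement(keyword, best_label, best_score))
    return results


group_text = "the applicant is Zhang Mei, she wants annuity insurance"
roles = match_roles(recognize_entities(group_text))
```

Each returned element carries its role label and confidence, which is the information the later combination step relies on.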
In another possible implementation of step 302, the terminal may instead input the clause text corresponding to each clause group into a pre-trained role recognition model, which may be a machine learning model, to obtain the role extraction result corresponding to the current role extraction process.
By performing at least two different role extraction processes on the plurality of clause texts, this embodiment avoids the unsatisfactory role extraction that results from unnatural sentence breaks caused by the user's spoken expressions and pauses in an ASR scenario. Performing at least two different role extraction processes on the plurality of clause texts and then combining the role extraction results improves the accuracy of the target role extraction result and therefore the accuracy of event extraction.
In an embodiment, based on the embodiment shown in fig. 2 and referring to fig. 5, this embodiment relates to how the terminal combines the role extraction results to obtain the target role extraction result corresponding to the target text. As shown in fig. 5, the process may include steps 501 and 502:
Step 501, based on the role extraction results, if a target role label corresponds to a plurality of initial role elements, the terminal determines the role element corresponding to the target role label from the plurality of initial role elements according to the confidences corresponding to those initial role elements.
In the embodiment of the present application, the role extraction result corresponding to each role extraction processing procedure may include a plurality of initial role elements, a role label corresponding to each initial role element, and a confidence corresponding to each initial role element.
It should be noted that, in the embodiment of the present application, the number of clause groups corresponding to different role extraction processing procedures is different, that is, the text length of the clause group is different in different role extraction processing procedures.
For example, the terminal performs clause processing on a target text to obtain 10 clause texts. Role extraction processing mode A takes each clause text as one clause group and obtains 10 clause groups; role extraction processing mode B takes clause text 1 and clause text 2 as one clause group, clause text 2 and clause text 3 as one clause group, and so on, and obtains 9 clause groups. It can be seen that the text length of a clause group in mode A differs from the text length of a clause group in mode B.
In this way, when the role extraction results are combined, the role extraction result of mode A includes the role extraction results of 10 clause groups, and the role extraction result of mode B includes the role extraction results of 9 clause groups. For each role extraction result obtained by one of the modes (taking mode B as an example), the terminal may compare it in a loop with each role extraction result obtained by mode A in order to combine them.
While combining the role extraction results, if a target role label in a clause text corresponds to a plurality of initial role elements, the terminal may select, according to the confidences corresponding to those initial role elements, the one with the highest confidence as the final role element of the target role label.
The following describes how the terminal determines the role element corresponding to the target role label from the plurality of initial role elements according to their confidences.
In a possible implementation, the terminal may sort the initial role elements in descending order of confidence to obtain a sorted sequence, and then determine the first initial role element in the sorted sequence as the role element corresponding to the target role label.
In another possible implementation, referring to fig. 6, the process may include steps 601 and 602 shown in fig. 6:
step 601, the terminal determines the maximum confidence level according to the confidence levels corresponding to the plurality of initial role elements.
Step 602, the terminal determines the initial role element corresponding to the maximum confidence as the role element corresponding to the target role label.
The terminal firstly screens out the maximum confidence coefficient from the confidence coefficients corresponding to the initial role elements, and then determines the initial role element corresponding to the maximum confidence coefficient as the role element corresponding to the target role label.
Optionally, before step 601, the terminal may further detect whether the plurality of initial role elements overlap in text. If they overlap, the terminal executes step 601; if they do not overlap, each initial role element is retained.
For example, suppose the target role label is marital status, the initial role element corresponding to the target role label in one role extraction result is "not married", and the initial role element corresponding to the target role label in another role extraction result is "married". If the terminal detects that the two initial role elements overlap in text, the terminal determines the maximum confidence from the confidences of the two initial role elements and determines the initial role element with the maximum confidence as the role element corresponding to the target role label.
In this way, according to the above embodiment, the terminal filters the cases in which one role label corresponds to a plurality of initial role elements, and determines the final role element corresponding to each role label in the clause text.
Step 502, in the role extraction results, the terminal replaces the plurality of initial role elements corresponding to the target role label with the role element corresponding to the target role label to obtain the target role extraction result.
The terminal may replace a plurality of initial role elements corresponding to the target role label with the role element corresponding to the target role label, that is, only the role element with the highest confidence coefficient is retained, and obtain a target role extraction result after replacement.
For example, continuing with the case in which the terminal performs clause processing on the target text to obtain 10 clause texts, the terminal combines the role extraction results to obtain the role extraction results corresponding to clause texts 1 to 10, and these together form the target role extraction result. The role elements in the target role extraction result may be sorted according to the order of the corresponding clause texts in the target text.
In this way, repeated initial role elements in the role extraction results are filtered according to their confidences and then recombined. If a role extraction process performs poorly on an unnatural sentence break caused by spoken expressions and pauses, that is, the confidence is low, the corresponding initial role elements can be discarded and only the role element with the highest confidence retained, which improves the reliability of role extraction.
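Steps 501 and 502, together with the optional overlap check, can be sketched for the role elements of a single clause text as follows. The overlap test used here (one text contains the other) is an assumption, since the application does not define how text overlap is detected, and the sketch simplifies by reducing all candidates of a label to one whenever any pair overlaps.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class RoleElement(NamedTuple):
    text: str
    label: str
    confidence: float


def texts_overlap(a: str, b: str) -> bool:
    """Assumed overlap test: one element's text contains the other's."""
    return a in b or b in a


def combine(results: List[List[RoleElement]]) -> List[RoleElement]:
    """Merge the results of several role extraction processes for one clause."""
    by_label: Dict[str, List[RoleElement]] = defaultdict(list)
    for result in results:
        for element in result:
            by_label[element.label].append(element)

    target = []
    for candidates in by_label.values():
        overlapping = any(
            texts_overlap(x.text, y.text)
            for i, x in enumerate(candidates)
            for y in candidates[i + 1:]
        )
        if overlapping:
            # Steps 601 and 602: keep only the candidate with maximum confidence.
            target.append(max(candidates, key=lambda e: e.confidence))
        else:
            # No text overlap: every initial role element is retained.
            target.extend(candidates)
    return target


mode_a = [RoleElement("married", "marital status", 0.6)]
mode_b = [RoleElement("not married", "marital status", 0.9),
          RoleElement("Zhang Mei", "name", 0.95)]
merged = combine([mode_a, mode_b])
```

Here the two marital-status candidates overlap in text, so only the higher-confidence one survives, while the single name element passes through unchanged.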
In an embodiment, based on the embodiment shown in fig. 2 and referring to fig. 7, this embodiment relates to how the terminal performs event extraction processing on the target role extraction result to obtain at least one piece of event information. As shown in fig. 7, the terminal may execute step 701, step 702, and step 703 shown in fig. 7 to implement step 203:
Step 701, the terminal determines the role elements corresponding to each clause text according to the target role extraction result.
In the target role extraction result, the role elements may be sorted according to the order of the corresponding clause texts in the target text. For example, if clause text 1 corresponds to 1 role element (role element 1), clause text 2 corresponds to 3 role elements (role element 2, role element 3, and role element 4), and so on, then the target role extraction result is role element 1, role element 2, role element 3, role element 4, and so on.
In this way, the terminal determines the role elements corresponding to the clause texts according to the sequence of the role elements.
Step 702, for each clause text, if the role element corresponding to the clause text comprises a trigger role element corresponding to a trigger role, the terminal determines the event type of the role element corresponding to the clause text according to the trigger role element.
The trigger role represents the core word of the event occurrence or the description subject of the current event, such as "create", "applicant", "customer", etc.
In the embodiment of the application, the role element database may further include a tag indicating whether each standard role element is a trigger role element, so that in the process of performing role identification by the terminal through pattern matching, if a certain role element in the clause text is matched as a trigger role element, the terminal may add a corresponding trigger role tag to the role element.
And for each clause text, the terminal can determine whether the role elements corresponding to the clause text comprise trigger role elements according to the trigger role labels of the role elements corresponding to the clause text.
Optionally, for each clause text, if the role element corresponding to the clause text includes a trigger role element, the terminal determines the event type of the role element corresponding to the clause text according to the trigger role element.
Optionally, because there is a principle of context correlation when the language is expressed, for each clause text, if the role element corresponding to the clause text does not include the trigger role element, the terminal uses the trigger role element of the adjacent clause text of the clause text as the trigger role element corresponding to the clause text. Namely, the trigger role element can be extracted from the current clause text or inherited from the previous clause text.
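The inheritance rule can be sketched as a single pass over the clause texts in order. The sketch assumes that inheritance chains, meaning that a clause without a trigger takes whatever trigger the previous clause ended up with, and that a missing trigger is represented by `None`; both are illustrative choices not stated in the application.

```python
from typing import List, Optional


def resolve_triggers(clause_triggers: List[Optional[str]]) -> List[Optional[str]]:
    """Fill missing triggers with the trigger of the previous clause text."""
    resolved: List[Optional[str]] = []
    previous: Optional[str] = None
    for trigger in clause_triggers:
        if trigger is None:
            trigger = previous  # inherit from the preceding clause text
        resolved.append(trigger)
        previous = trigger
    return resolved


resolved = resolve_triggers(["applicant", None, None, "customer", None])
print(resolved)
# prints: ['applicant', 'applicant', 'applicant', 'customer', 'customer']
```

A leading clause with no trigger and no predecessor stays `None` in this sketch, so a caller would need another fallback for that case.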
Optionally, the trigger role element corresponding to the clause text may also be predicted by a machine learning model, and a manner in which the terminal determines the trigger role element corresponding to the clause text is not specifically limited herein.
And after the terminal determines the trigger role element corresponding to the clause text, determining the event type of the role element corresponding to the clause text according to the trigger role element.
In a possible implementation manner, referring to fig. 8, the terminal may execute steps 801 and 802 shown in fig. 8 to implement a process of determining an event type of a role element corresponding to a clause text according to a trigger role element:
step 801, the terminal determines an event template corresponding to the trigger role element.
In the embodiment of the application, the terminal can preset event templates corresponding to the trigger role elements. After determining the trigger role elements corresponding to the clause text, the terminal screens an event template corresponding to the trigger role elements according to the trigger role elements, wherein the event template comprises a plurality of event role elements, and the event role elements can be keywords for describing events.
The number of event templates corresponding to the trigger role elements obtained by the terminal screening may be one or more.
Step 802, if the role element corresponding to the clause text matches with a plurality of event role elements, the terminal takes the event type corresponding to the event template as the event type of the role element corresponding to the clause text.
The terminal matches the role elements corresponding to the clause text against the event role elements of each event template. If the role elements corresponding to the clause text match the event role elements of an event template, the terminal takes the event type corresponding to that event template as the event type of the role elements corresponding to the clause text.
Note that a role element that does not match any event role element is discarded as redundant information.
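Steps 801 and 802 can be sketched as a lookup followed by a set-membership match. The template contents are hypothetical, and choosing the template with the most matched role labels is an assumed tie-breaking rule for the case where several templates correspond to one trigger role element; the application does not state how such a choice is made.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class EventTemplate(NamedTuple):
    event_type: str
    event_roles: Set[str]  # event role elements, represented here by role labels


# Hypothetical templates keyed by trigger role element.
TEMPLATES: Dict[str, List[EventTemplate]] = {
    "applicant": [
        EventTemplate("applicant information", {"name", "age", "marital status"}),
        EventTemplate("applicant health", {"name", "disease"}),
    ],
}


def match_event(trigger: str, role_labels: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return the event type of the best template and the role labels it keeps."""
    best_type: Optional[str] = None
    best_kept: List[str] = []
    for template in TEMPLATES.get(trigger, []):  # step 801: templates for the trigger
        # Step 802: role labels outside the template are dropped as redundant.
        kept = [label for label in role_labels if label in template.event_roles]
        if len(kept) > len(best_kept):
            best_type, best_kept = template.event_type, kept
    return best_type, best_kept


event_type, kept_labels = match_event("applicant", ["name", "age", "hobby"])
```

In this run the first template matches two labels and the second only one, so the clause is typed as "applicant information" and the unmatched "hobby" label is discarded.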
The role extraction and the event extraction are based on template matching, so that the requirement for initializing data volume is lowered, a large amount of data does not need to be collected to train a machine learning model, the implementation difficulty is lowered, the efficiency is improved, and the event extraction effect can be quickly achieved when a project is cold started.
Step 703, the terminal acquires at least one piece of event information according to the role elements and the event type corresponding to each clause text.
After determining the event type of the role elements corresponding to the clause text, the terminal adds a corresponding event type label to each successfully matched role element in the clause text. The terminal then takes each event type together with the role elements corresponding to that event type as one piece of event information, obtaining the final event extraction result.
In the embodiment of the application, the event template is determined by the trigger role element, and the event type corresponding to the clause text can be obtained quickly by template matching, so the event extraction process is simple, easy to implement, and highly practical.
In an embodiment, based on the embodiment shown in fig. 7, referring to fig. 9, this embodiment relates to a process of how the terminal obtains at least one piece of event information according to the role element and the event type corresponding to each clause text. As shown in fig. 9, step 703 may include steps 901 and 902 shown in fig. 9:
Step 901, according to the role elements and event types corresponding to the clause texts, the terminal fuses the role elements having the same event type to obtain the target role elements corresponding to each event type.
After determining the event type of the role elements corresponding to the clause text, the terminal adds a corresponding event type label to each successfully matched role element.
According to the event type label of each role element, the terminal merges the role elements that carry the same event type label and takes them as the target role elements of the event type indicated by that label.
Step 902, the terminal obtains the event information corresponding to each event type according to the target role elements corresponding to that event type.
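The fusion in step 901 amounts to grouping role elements by their event type label. The following sketch assumes each role element already carries that label in a field named `event_type`; the type and field names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class TaggedRole(NamedTuple):
    text: str
    label: str
    confidence: float
    event_type: str  # event type label added after template matching


def fuse_by_event_type(elements: List[TaggedRole]) -> Dict[str, List[TaggedRole]]:
    """Group role elements that carry the same event type label."""
    fused: Dict[str, List[TaggedRole]] = defaultdict(list)
    for element in elements:
        fused[element.event_type].append(element)
    return dict(fused)


elements = [
    TaggedRole("Zhang Mei", "name", 0.9, "applicant information"),
    TaggedRole("annuity insurance", "product", 0.8, "purchase intention"),
    TaggedRole("25", "age", 0.8, "applicant information"),
]
fused = fuse_by_event_type(elements)
```

Role elements from different clause texts end up under one event type, which is what allows an event described across several clauses to be assembled.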
After the fusion in step 901, the terminal may further perform deduplication and disambiguation for each event type. In a possible implementation, the terminal may perform the following steps A1 and A2 to implement step 902 with deduplication and disambiguation:
Step A1, the terminal performs deduplication on the target role elements corresponding to each event type to obtain the deduplicated target role elements.
The terminal checks the event type labels of the target role elements. If several target role elements have the same event type label and are themselves identical, the terminal retains only one of them under that event type label.
Step A2, the terminal filters the deduplicated target role elements according to their role labels and confidences to obtain the event information corresponding to the event type.
The confidence may be a confidence in the character extraction process, for example, in the process of matching each keyword with a standard character element in the character element database, the terminal may calculate a confidence between the keyword recognized by the entity and the standard character element, and the confidence may represent the degree of matching.
The terminal sorts the ambiguous target role elements by confidence and retains the target role element with the maximum confidence, obtaining the final event information corresponding to the event type. Ambiguous target role elements are target role elements of one event type that differ from one another but share the same role label; through disambiguation, only the most reliable target role element for that role label is retained.
When the terminal acquires at least one piece of event information according to the role elements and event types corresponding to the clause texts, it fuses the role elements having the same event type and then performs deduplication and disambiguation for each event type to obtain the final event information, which improves the data accuracy of the event information.
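Steps A1 and A2 can be sketched for the target role elements of one event type as follows. Treating two elements as identical when both their role label and text are equal, and keeping the first copy, are assumptions made for this illustration.

```python
from typing import Dict, List, NamedTuple


class RoleElement(NamedTuple):
    text: str
    label: str
    confidence: float


def deduplicate(elements: List[RoleElement]) -> List[RoleElement]:
    """Step A1: keep one copy of elements with the same label and text."""
    seen = set()
    unique = []
    for element in elements:
        key = (element.label, element.text)
        if key not in seen:
            seen.add(key)
            unique.append(element)
    return unique


def disambiguate(elements: List[RoleElement]) -> List[RoleElement]:
    """Step A2: per role label, keep the element with the highest confidence."""
    best: Dict[str, RoleElement] = {}
    for element in elements:
        current = best.get(element.label)
        if current is None or element.confidence > current.confidence:
            best[element.label] = element
    return list(best.values())


applicant_info = [
    RoleElement("Zhang Mei", "name", 0.9),
    RoleElement("Zhang Mei", "name", 0.9),  # exact duplicate, removed in A1
    RoleElement("Mei", "name", 0.6),        # ambiguous with "Zhang Mei", removed in A2
    RoleElement("25", "age", 0.8),
]
event_info = disambiguate(deduplicate(applicant_info))
```

After both steps, each role label of the event type is left with exactly one target role element.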
In an embodiment, based on the embodiment shown in fig. 2, referring to fig. 10, the embodiment relates to a process of filling event information into a form after the terminal acquires the event information. As shown in fig. 10, the event extraction method of this embodiment further includes step 204:
Step 204, the terminal determines target event information from the at least one piece of event information according to a preset form filling strategy, and fills the target event information into a preset form.
In the embodiment of the application, event extraction may yield a plurality of pieces of event information. If certain event information needs to be displayed in tabular form, the terminal determines the target event information from the at least one piece of event information and fills it into a preset form. Optionally, the terminal may present the filled-in form in a preset area of the screen.
Referring to fig. 11, fig. 11 is a schematic diagram of an exemplary form. As shown in fig. 11, the target text is "The applicant is named Zhang Mei. She is Aunt Zhao's daughter, about twenty-five or twenty-six years old, and had a baby just last year; the family of three lives happily. Xiao Mei had gestational diabetes while pregnant last year and has recovered well recently. She has recently been wanting to buy insurance and is inclined to choose annuity insurance. Xiao Mei's husband works at a law firm and has a good income, and both husband and wife have social insurance and commercial insurance."
The terminal performs clause processing on the target text to obtain a plurality of clause texts through the implementation mode of the embodiment, performs at least two different role extraction processing on the plurality of clause texts to obtain role extraction results corresponding to each role extraction processing process, and combines the role extraction results to obtain a target role extraction result corresponding to the target text; and the terminal performs event extraction processing on the target role extraction result to obtain at least one piece of event information.
If the preset form shown in fig. 11 needs to be filled in the actual service, the terminal selects the target event information "applicant information" from at least one event information, and fills the corresponding form item content in the position frame corresponding to each form header in the form based on the preset form filling strategy.
In the embodiment of the application, the table item content is a subset of at least one event information obtained by event extraction, so that the table filling requirements of different event types can be met through one-time event extraction, and the computing resources are saved.
Optionally, referring to fig. 12, the terminal may execute step 2041 and step 2042 shown in fig. 12 to implement step 204:
Step 2041, for each form, the terminal determines the target event information corresponding to the target event type from the at least one piece of event information according to the target event type corresponding to the form.
Through event extraction the terminal obtains at least one piece of event information, and each piece of event information comprises an event type and the target role elements corresponding to that event type.
For each form, the terminal determines, from the plurality of pieces of event information, the target event information whose event type is the target event type corresponding to the form.
Step 2042, the terminal selects the corresponding target role element from the target event information according to the preset mapping relationship between the header and the role element, and fills the corresponding target role element in the position frame indicated by the header.
The mapping relationship comprises a mapping relationship between one role element and one header, or the mapping relationship comprises a mapping relationship between a plurality of role elements and one header.
The mapping relation between one role element and one header refers to the situation that the content expressed by the role element and the header is substantially the same but the expression mode is different. For example, the mapping may include a mapping between "name" and "name", a mapping between "age" and "age", a mapping between "mother" and "mom", and so on.
The mapping relationship between a plurality of role elements and a header means that the role elements jointly determine a table item corresponding to the header. For example, the character element "married" and the character element "two children" may together determine the contents of the table entry corresponding to the "number of family members" in the header, i.e., the table entry is filled with 3.
The mapping relation of the embodiment of the application comprises the mapping relation between one role element and one header, or the mapping relation comprises the mapping relation between a plurality of role elements and one header, the table filling can be flexibly carried out on the basis of the extracted event information, and the rendering flexibility is high.
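Steps 2041 and 2042 can be sketched with a rule table that maps each header to one or more role labels plus a function that turns their values into the table item. All names, the rule format, and the `family_size` rule are hypothetical; in particular, the way family size is derived from marital status and number of children is an invented example of a many-to-one mapping, not the application's rule.

```python
from typing import Callable, Dict, List, Tuple

EventInfo = Dict[str, str]                         # role label -> role element text
HeaderRule = Tuple[List[str], Callable[..., str]]  # source role labels, combiner


def fill_form(events: Dict[str, EventInfo],
              target_event_type: str,
              rules: Dict[str, HeaderRule]) -> Dict[str, str]:
    """Pick the event of the form's target type and fill each header's cell."""
    info = events.get(target_event_type, {})       # step 2041
    form = {}
    for header, (labels, combiner) in rules.items():  # step 2042
        values = [info.get(label) for label in labels]
        # Leave the cell empty when any required role element is missing.
        form[header] = combiner(*values) if None not in values else ""
    return form


def family_size(marital_status: str, children: str) -> str:
    """Hypothetical many-to-one rule: adults plus children."""
    adults = 2 if marital_status == "married" else 1
    return str(adults + int(children))


events = {
    "applicant information": {"name": "Zhang Mei", "marital status": "married",
                              "children": "1"},
    "purchase intention": {"product": "annuity insurance"},
}
rules = {
    "Name": (["name"], lambda value: value),                        # one-to-one
    "Number of family members": (["marital status", "children"], family_size),
    "Occupation": (["occupation"], lambda value: value),            # not extracted
}
form = fill_form(events, "applicant information", rules)
```

Because the events are extracted once and the rules are per form, the same extraction result can serve several forms with different target event types.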
It should be understood that, although the steps in the above flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 13, there is provided an event extraction device including:
the acquiring module 10 is configured to acquire a target text and perform clause processing on the target text to obtain a plurality of clause texts;
the role extraction module 20 is configured to perform at least two different role extraction processes on the multiple clause texts to obtain role extraction results corresponding to the respective role extraction processes, and combine the respective role extraction results to obtain target role extraction results corresponding to the target text, where the target role extraction results include multiple role elements corresponding to the target text;
and the event extraction module 30 is configured to perform event extraction processing on the target role extraction result to obtain at least one piece of event information, where the event information includes an event type and a target role element corresponding to the event type.
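The three-module structure can be sketched as a small pipeline object whose stages are pluggable callables. The class name, field names, and the stub stages below are placeholders chosen for illustration and do not reflect the actual implementation of modules 10, 20, and 30.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EventExtractionDevice:
    acquire: Callable[[str], List[str]]           # acquiring module 10
    extract_roles: Callable[[List[str]], list]    # role extraction module 20
    extract_events: Callable[[list], List[dict]]  # event extraction module 30

    def run(self, target_text: str) -> List[dict]:
        clauses = self.acquire(target_text)       # clause processing
        roles = self.extract_roles(clauses)       # target role extraction result
        return self.extract_events(roles)         # at least one piece of event info


# Stub stages that only demonstrate the data flow between the modules.
device = EventExtractionDevice(
    acquire=lambda text: [c.strip() for c in text.split(",") if c.strip()],
    extract_roles=lambda clauses: [{"clause": i, "text": c}
                                   for i, c in enumerate(clauses)],
    extract_events=lambda roles: [{"event_type": "demo", "roles": roles}],
)
events = device.run("clause one, clause two")
```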
In one embodiment, the role extraction module 20 includes:
a grouping unit, configured to, for each role extraction processing procedure, divide the multiple clause texts into multiple clause groups according to an arrangement order of the clause texts in the target text, where the clause groups include the same number of clause texts, and the number of clause groups differs between different role extraction processing procedures;
and the role extraction unit is used for carrying out role extraction processing on the clause text corresponding to each clause group to obtain the role extraction result corresponding to the current role extraction processing process.
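The grouping performed by the grouping unit can be illustrated with a minimal Python sketch. The function names `group_clauses` and `multi_pass_groups` and the group sizes 1 and 2 are assumptions made for illustration only; the sketch also assumes that the last group may be shorter when the number of clause texts is not a multiple of the group size, a case the embodiment does not address.

```python
from typing import Dict, List, Sequence


def group_clauses(clauses: List[str], group_size: int) -> List[List[str]]:
    """Split clause texts, in document order, into consecutive groups.

    Every group holds `group_size` clauses; the final group may be
    shorter (an assumption, see the lead-in).
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    return [clauses[i:i + group_size] for i in range(0, len(clauses), group_size)]


def multi_pass_groups(
    clauses: List[str], group_sizes: Sequence[int] = (1, 2)
) -> Dict[int, List[List[str]]]:
    """Build one grouping per role extraction pass.

    A different group size per pass yields a different number of
    clause groups per pass.
    """
    return {size: group_clauses(clauses, size) for size in group_sizes}


clauses = ["clause A", "clause B", "clause C", "clause D"]
passes = multi_pass_groups(clauses, (1, 2))
# Pass with size 1 has four groups; pass with size 2 has two groups.
```

Each group would then be handed to the role extraction unit as a whole, so that a pass with larger groups sees more context per extraction call.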
In one embodiment, the role extraction unit is specifically configured to perform entity identification on a clause text corresponding to the clause group to obtain a plurality of keywords; and performing role matching on each keyword based on a preset role element database, and taking the keyword which is successfully matched as an initial role element to obtain a role extraction result.
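As a rough illustration of the two stages described above (entity recognition, then role matching against a preset role element database), consider the following Python sketch. The database contents, the role labels, and the regex-based `recognize_entities` stand-in are all hypothetical; the embodiment does not specify the entity recognition model or the database schema.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical role element database: keyword -> role label.
ROLE_DB: Dict[str, str] = {
    "Alice": "patient",
    "hospitalized": "trigger",
    "2021-03-01": "date",
}


def recognize_entities(text: str) -> List[str]:
    """Stand-in for entity recognition: returns ISO dates and words as keywords."""
    return re.findall(r"\d{4}-\d{2}-\d{2}|[A-Za-z]+", text)


def extract_roles(clause_group: List[str]) -> List[Tuple[str, str]]:
    """Treat the clause group as one text and keep keywords found in ROLE_DB."""
    text = " ".join(clause_group)
    keywords = recognize_entities(text)
    # A keyword that matches the database becomes an initial role element.
    return [(kw, ROLE_DB[kw]) for kw in keywords if kw in ROLE_DB]


result = extract_roles(["Alice was hospitalized", "on 2021-03-01"])
```

In a real system the recognizer would presumably also return a confidence for each keyword, which later steps use when combining results.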
In one embodiment, the role extraction result includes a plurality of initial role elements, a role label corresponding to each of the initial role elements, and a confidence corresponding to each of the initial role elements, and the role extraction module 20 further includes:
a first determining unit, configured to, based on each of the role extraction results, if there are multiple initial role elements corresponding to a target role label, determine the role element corresponding to the target role label from the multiple initial role elements according to the confidences corresponding to the multiple initial role elements;
and the combining unit is used for replacing the plurality of initial role elements corresponding to the target role labels with the role elements corresponding to the target role labels in each role extraction result to obtain the target role extraction result.
Optionally, the first determining unit is specifically configured to determine a maximum confidence level according to the confidence levels corresponding to the multiple initial role elements; and determining the initial role element corresponding to the maximum confidence coefficient as the role element corresponding to the target role label.
In one embodiment, the role extraction module 20 further includes:
a detecting unit, configured to detect whether text overlap exists among the plurality of initial role elements;
the first determining unit is specifically configured to determine, if text overlapping exists in the multiple initial role elements, a maximum confidence level according to the confidence levels corresponding to the multiple initial role elements.
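The combination of several role extraction results by confidence, restricted to overlapping candidates, might be sketched as follows. This is only one possible reading: the `RoleElement` type is hypothetical, "text overlap" is assumed to mean that one candidate's text contains the other's, and non-overlapping candidates with the same label are assumed to be kept.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RoleElement:
    text: str
    label: str
    confidence: float


def texts_overlap(a: str, b: str) -> bool:
    """Assumed overlap test: one text is contained in the other."""
    return a in b or b in a


def merge_results(results: List[List[RoleElement]]) -> List[RoleElement]:
    """Combine the results of all passes into one target result."""
    by_label: Dict[str, List[RoleElement]] = {}
    for result in results:
        for elem in result:
            by_label.setdefault(elem.label, []).append(elem)

    merged: List[RoleElement] = []
    for candidates in by_label.values():
        kept: List[RoleElement] = []
        # Visit candidates from highest to lowest confidence and drop any
        # candidate that overlaps one already kept.
        for cand in sorted(candidates, key=lambda e: e.confidence, reverse=True):
            if not any(texts_overlap(cand.text, k.text) for k in kept):
                kept.append(cand)
        merged.extend(kept)
    return merged


pass_1 = [RoleElement("Zhang", "name", 0.6), RoleElement("fracture", "disease", 0.9)]
pass_2 = [RoleElement("Zhang San", "name", 0.8)]
merged = merge_results([pass_1, pass_2])
```

Here the truncated name "Zhang" from the first pass is replaced by the higher-confidence "Zhang San" from the second pass, which is the kind of boundary error that running passes with different group sizes is meant to correct.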
In one embodiment, the event extraction module 30 includes:
a second determining unit, configured to determine, according to the target role extraction result, a role element corresponding to each clause text;
a third determining unit, configured to determine, for each clause text, an event type of a role element corresponding to the clause text according to a trigger role element if the role element corresponding to the clause text includes the trigger role element corresponding to the trigger role;
and the obtaining unit is used for obtaining the at least one piece of event information according to the role elements and the event types corresponding to the clause texts.
Optionally, the third determining unit is specifically configured to determine an event template corresponding to the trigger role element, where the event template includes multiple event role elements; and if the role element corresponding to the clause text is matched with the event role elements, taking the event type corresponding to the event template as the event type of the role element corresponding to the clause text.
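A minimal sketch of trigger-based event typing with event templates is given below. The template table, the role labels, and the matching rule are assumptions: the embodiment only states that the role elements of the clause text must "match" the event role elements of the template, and the sketch interprets this as every non-trigger role label of the clause being allowed by the template.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical templates: trigger text -> (event type, allowed role labels).
EVENT_TEMPLATES: Dict[str, Tuple[str, Set[str]]] = {
    "hospitalized": ("hospitalization", {"patient", "hospital", "date"}),
    "paid": ("payment", {"payer", "amount", "date"}),
}


def event_type_for_clause(roles: List[Tuple[str, str]]) -> Optional[str]:
    """roles holds (text, label) pairs; the label 'trigger' marks a trigger element."""
    triggers = [text for text, label in roles if label == "trigger"]
    other_labels = {label for _, label in roles if label != "trigger"}
    for trigger in triggers:
        template = EVENT_TEMPLATES.get(trigger)
        if template is None:
            continue
        event_type, allowed_labels = template
        # Assumed matching rule: all non-trigger labels fit the template.
        if other_labels <= allowed_labels:
            return event_type
    return None


clause_roles = [("Alice", "patient"), ("hospitalized", "trigger"), ("2021-03-01", "date")]
event_type = event_type_for_clause(clause_roles)
```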
In one embodiment, the event extraction module 30 further includes:
and a fourth determining unit, configured to, for each clause text, if the role element corresponding to the clause text does not include the trigger role element, use the trigger role element of an adjacent clause text of the clause text as the trigger role element corresponding to the clause text.
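Borrowing a trigger from an adjacent clause text could look like the following sketch. The embodiment does not say which neighbor takes priority, so the sketch assumes the preceding clause is tried first and the following clause second; the function name `fill_missing_triggers` is hypothetical.

```python
from typing import List, Optional


def fill_missing_triggers(triggers: List[Optional[str]]) -> List[Optional[str]]:
    """Give each clause without a trigger the trigger of an adjacent clause.

    `triggers[i]` is the trigger of clause i, or None if the clause has none.
    Only triggers that were originally present are borrowed, so a trigger
    is never passed along a chain of trigger-less clauses.
    """
    filled = list(triggers)
    for i, trig in enumerate(triggers):
        if trig is not None:
            continue
        prev_trig = triggers[i - 1] if i > 0 else None
        next_trig = triggers[i + 1] if i + 1 < len(triggers) else None
        filled[i] = prev_trig if prev_trig is not None else next_trig
    return filled


filled = fill_missing_triggers([None, "hospitalized", None, None, "paid"])
```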
In one embodiment, the obtaining unit is specifically configured to fuse, according to the role elements corresponding to the clause texts and the event types, the role elements of the same event type to obtain the target role element corresponding to each event type; and obtaining event information corresponding to each event type according to the target role element corresponding to each event type.
In an embodiment, the obtaining unit is specifically configured to, for each event type, perform deduplication processing on the target role element corresponding to the event type to obtain a deduplicated target role element; and filtering the target role element after the deduplication processing according to the role label and the confidence coefficient corresponding to the target role element after the deduplication processing to obtain event information corresponding to the event type.
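The fusion, deduplication, and filtering performed by the obtaining unit can be sketched as below. The concrete rules are assumptions: duplicates are taken to be role elements with identical text and role label (the highest confidence is kept), and filtering is taken to mean discarding elements below a hypothetical `min_confidence` threshold.

```python
from typing import Dict, List, Tuple

Role = Tuple[str, str, float]  # (text, role label, confidence)


def build_events(
    clauses: List[Tuple[str, List[Role]]], min_confidence: float = 0.5
) -> Dict[str, List[Role]]:
    """clauses holds one (event type, role elements) pair per clause text."""
    # Fuse: collect the role elements of all clauses sharing an event type.
    fused: Dict[str, List[Role]] = {}
    for event_type, roles in clauses:
        fused.setdefault(event_type, []).extend(roles)

    events: Dict[str, List[Role]] = {}
    for event_type, roles in fused.items():
        # Deduplicate: keep the best confidence per (text, label) pair.
        best: Dict[Tuple[str, str], float] = {}
        for text, label, conf in roles:
            key = (text, label)
            best[key] = max(conf, best.get(key, 0.0))
        # Filter: drop low-confidence elements (assumed rule).
        events[event_type] = [
            (text, label, conf)
            for (text, label), conf in best.items()
            if conf >= min_confidence
        ]
    return events


clause_results = [
    ("hospitalization", [("Alice", "patient", 0.9), ("City Hospital", "hospital", 0.4)]),
    ("hospitalization", [("Alice", "patient", 0.7), ("2021-03-01", "date", 0.8)]),
]
events = build_events(clause_results)
```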
In one embodiment, the apparatus further comprises:
a filling module, configured to determine target event information from the at least one piece of event information according to a preset table filling strategy, and to fill the target event information into a preset table.
Optionally, the fill module includes:
a fifth determining unit, configured to determine, for each table, the target event information corresponding to a target event type from the at least one piece of event information according to the target event type corresponding to the table;
and the filling-in unit is used for selecting a corresponding target role element from the target event information according to a preset mapping relation between the header and the role element and filling the corresponding target role element into a position frame indicated by the corresponding header, wherein the mapping relation comprises a mapping relation between one role element and one header, or the mapping relation comprises a mapping relation between a plurality of role elements and one header.
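A minimal sketch of the table filling step follows, covering both the one-to-one and the many-to-one mapping between role elements and headers. The data layout (event information as a dictionary from role label to text, tables described by a target event type plus a header mapping) and the space separator used to join several role elements are assumptions for illustration.

```python
from typing import Dict, List, Tuple

HeaderMap = Dict[str, List[str]]  # header -> one or more role labels


def fill_row(event: Dict[str, str], header_map: HeaderMap, sep: str = " ") -> Dict[str, str]:
    """Fill the cell under each header with its mapped role elements."""
    row: Dict[str, str] = {}
    for header, labels in header_map.items():
        values = [event[label] for label in labels if label in event]
        row[header] = sep.join(values)
    return row


def fill_tables(
    events: Dict[str, Dict[str, str]],
    tables: Dict[str, Tuple[str, HeaderMap]],
) -> Dict[str, Dict[str, str]]:
    """For each table, pick the event of its target type and fill one row."""
    filled: Dict[str, Dict[str, str]] = {}
    for table_name, (target_event_type, header_map) in tables.items():
        event = events.get(target_event_type)
        if event is not None:
            filled[table_name] = fill_row(event, header_map)
    return filled


events = {
    "hospitalization": {
        "patient": "Alice",
        "hospital": "City Hospital",
        "department": "Orthopedics",
    }
}
tables = {
    "claim_form": (
        "hospitalization",
        {"Name": ["patient"], "Treated at": ["hospital", "department"]},
    )
}
filled_tables = fill_tables(events, tables)
```

The header "Treated at" shows the many-to-one case, where two role elements are written into a single position frame.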
The event extraction device provided in this embodiment can implement the above event extraction method embodiments; its implementation principle and technical effect are similar and are not repeated here. For the specific limitations of the event extraction device, reference may be made to the above limitations of the event extraction method. The modules in the event extraction device can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is also provided. The computer device may be a terminal, and its internal structure may be as shown in fig. 14. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by the processor to implement an event extraction method. The display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device can be a touch layer covering the display screen, a key, a trackball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, mouse or the like.
Those skilled in the art will appreciate that the architecture shown in fig. 14 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution is applied; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring a target text, and performing clause processing on the target text to obtain a plurality of clause texts;
performing at least two different role extraction processes on the multiple clause texts to obtain a role extraction result corresponding to each role extraction process, and combining the role extraction results to obtain a target role extraction result corresponding to the target text, wherein the target role extraction result comprises a plurality of role elements corresponding to the target text;
and performing event extraction processing on the target role extraction result to obtain at least one piece of event information, wherein the event information comprises an event type and a target role element corresponding to the event type.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
for each role extraction processing process, dividing the multiple clause texts into multiple clause groups according to the arrangement sequence of the clause texts in the target text, wherein the clause groups comprise the same number of clause texts, and the number of the clause groups corresponding to different role extraction processing processes is different;
and performing role extraction processing on the clause text corresponding to each clause group to obtain a role extraction result corresponding to the current role extraction processing process.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
entity recognition is carried out on the clause texts corresponding to the clause groups, and a plurality of keywords are obtained;
and performing role matching on each keyword based on a preset role element database, and taking the keyword which is successfully matched as an initial role element to obtain a role extraction result.
In one embodiment, the role extraction result includes a plurality of initial role elements, a role label corresponding to each of the initial role elements, and a confidence corresponding to each of the initial role elements, and the processor, when executing the computer program, further implements the following steps:
based on each role extraction result, if the number of initial role elements corresponding to a target role label is multiple, determining the role element corresponding to the target role label from the multiple initial role elements according to the confidence degrees corresponding to the multiple initial role elements;
and in each role extraction result, replacing the plurality of initial role elements corresponding to the target role labels with the role elements corresponding to the target role labels to obtain the target role extraction result.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
determining a maximum confidence level according to the confidence levels corresponding to the plurality of initial role elements;
and determining the initial role element corresponding to the maximum confidence coefficient as the role element corresponding to the target role label.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
detecting whether text overlapping exists in a plurality of initial role elements;
correspondingly, the determining the maximum confidence level according to the confidence levels corresponding to the plurality of initial role elements includes:
and if the initial role elements have text overlap, determining the maximum confidence level according to the confidence levels corresponding to the initial role elements.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
determining role elements corresponding to the clause texts according to the target role extraction result;
for each clause text, if the role element corresponding to the clause text comprises a trigger role element corresponding to a trigger role, determining the event type of the role element corresponding to the clause text according to the trigger role element;
and acquiring the at least one piece of event information according to the role elements and the event types corresponding to the clause texts.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
determining an event template corresponding to the trigger role element, wherein the event template comprises a plurality of event role elements;
and if the role element corresponding to the clause text is matched with the event role elements, taking the event type corresponding to the event template as the event type of the role element corresponding to the clause text.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
and for each clause text, if the role element corresponding to the clause text does not comprise the trigger role element, taking the trigger role element of the adjacent clause text of the clause text as the trigger role element corresponding to the clause text.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
fusing the role elements with the same event type according to the role elements corresponding to the clause texts and the event type to obtain the target role element corresponding to each event type;
and obtaining event information corresponding to each event type according to the target role element corresponding to each event type.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
for each event type, performing duplicate removal processing on the target role element corresponding to the event type to obtain a duplicate-removed target role element;
and filtering the target role element after the deduplication processing according to the role label and the confidence coefficient corresponding to the target role element after the deduplication processing to obtain event information corresponding to the event type.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
and determining target event information from the at least one piece of event information according to a preset table filling strategy, and filling the target event information into a preset table.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
for each table, determining the target event information corresponding to the target event type from the at least one piece of event information according to the target event type corresponding to the table;
and selecting a corresponding target role element from the target event information according to a preset mapping relation between the header and the role element, and filling the target role element into a position frame indicated by the corresponding header, wherein the mapping relation comprises a mapping relation between one role element and one header, or the mapping relation comprises a mapping relation between a plurality of role elements and one header.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a target text, and performing clause processing on the target text to obtain a plurality of clause texts;
performing at least two different role extraction processes on the multiple clause texts to obtain a role extraction result corresponding to each role extraction process, and combining the role extraction results to obtain a target role extraction result corresponding to the target text, wherein the target role extraction result comprises a plurality of role elements corresponding to the target text;
and performing event extraction processing on the target role extraction result to obtain at least one piece of event information, wherein the event information comprises an event type and a target role element corresponding to the event type.
In one embodiment, the computer program when executed by the processor further performs the steps of:
for each role extraction processing process, dividing the multiple clause texts into multiple clause groups according to the arrangement sequence of the clause texts in the target text, wherein the clause groups comprise the same number of clause texts, and the number of the clause groups corresponding to different role extraction processing processes is different;
and performing role extraction processing on the clause text corresponding to each clause group to obtain a role extraction result corresponding to the current role extraction processing process.
In one embodiment, the computer program when executed by the processor further performs the steps of:
entity recognition is carried out on the clause texts corresponding to the clause groups, and a plurality of keywords are obtained;
and performing role matching on each keyword based on a preset role element database, and taking the keyword which is successfully matched as an initial role element to obtain a role extraction result.
In one embodiment, the role extraction result includes a plurality of initial role elements, a role label corresponding to each of the initial role elements, and a confidence corresponding to each of the initial role elements, and the computer program when executed by the processor further implements the following steps:
based on each role extraction result, if the number of initial role elements corresponding to a target role label is multiple, determining the role element corresponding to the target role label from the multiple initial role elements according to the confidence degrees corresponding to the multiple initial role elements;
and in each role extraction result, replacing the plurality of initial role elements corresponding to the target role labels with the role elements corresponding to the target role labels to obtain the target role extraction result.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a maximum confidence level according to the confidence levels corresponding to the plurality of initial role elements;
and determining the initial role element corresponding to the maximum confidence coefficient as the role element corresponding to the target role label.
In one embodiment, the computer program when executed by the processor further performs the steps of:
detecting whether text overlapping exists in a plurality of initial role elements;
correspondingly, the determining the maximum confidence level according to the confidence levels corresponding to the plurality of initial role elements includes:
and if the initial role elements have text overlap, determining the maximum confidence level according to the confidence levels corresponding to the initial role elements.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining role elements corresponding to the clause texts according to the target role extraction result;
for each clause text, if the role element corresponding to the clause text comprises a trigger role element corresponding to a trigger role, determining the event type of the role element corresponding to the clause text according to the trigger role element;
and acquiring the at least one piece of event information according to the role elements and the event types corresponding to the clause texts.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining an event template corresponding to the trigger role element, wherein the event template comprises a plurality of event role elements;
and if the role element corresponding to the clause text is matched with the event role elements, taking the event type corresponding to the event template as the event type of the role element corresponding to the clause text.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and for each clause text, if the role element corresponding to the clause text does not comprise the trigger role element, taking the trigger role element of the adjacent clause text of the clause text as the trigger role element corresponding to the clause text.
In one embodiment, the computer program when executed by the processor further performs the steps of:
fusing the role elements with the same event type according to the role elements corresponding to the clause texts and the event type to obtain the target role element corresponding to each event type;
and obtaining event information corresponding to each event type according to the target role element corresponding to each event type.
In one embodiment, the computer program when executed by the processor further performs the steps of:
for each event type, performing duplicate removal processing on the target role element corresponding to the event type to obtain a duplicate-removed target role element;
and filtering the target role element after the deduplication processing according to the role label and the confidence coefficient corresponding to the target role element after the deduplication processing to obtain event information corresponding to the event type.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and determining target event information from the at least one piece of event information according to a preset table filling strategy, and filling the target event information into a preset table.
In one embodiment, the computer program when executed by the processor further performs the steps of:
for each table, determining the target event information corresponding to the target event type from the at least one piece of event information according to the target event type corresponding to the table;
and selecting a corresponding target role element from the target event information according to a preset mapping relation between the header and the role element, and filling the target role element into a position frame indicated by the corresponding header, wherein the mapping relation comprises a mapping relation between one role element and one header, or the mapping relation comprises a mapping relation between a plurality of role elements and one header.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The above examples show only some embodiments of the present invention, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of protection of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (15)

1. An event extraction method, the method comprising:
acquiring a target text, and performing clause processing on the target text to obtain a plurality of clause texts;
for each role extraction processing process, dividing the multiple clause texts into multiple clause groups according to the arrangement sequence of the clause texts in the target text, wherein the clause groups comprise the same number of clause texts, and the number of the clause groups corresponding to different role extraction processing processes is different;
taking the clause text corresponding to each clause group as a whole to perform role extraction processing to obtain a role extraction result corresponding to the current role extraction processing process;
combining the role extraction results to obtain a target role extraction result corresponding to the target text, wherein the target role extraction result comprises a plurality of role elements corresponding to the target text;
and performing event extraction processing on the target role extraction result to obtain at least one piece of event information, wherein the event information comprises an event type and a target role element corresponding to the event type.
2. The method according to claim 1, wherein the taking the clause text corresponding to each clause group as a whole to perform role extraction processing to obtain a role extraction result corresponding to the current role extraction processing process comprises:
entity recognition is carried out on the clause texts corresponding to the clause groups, and a plurality of keywords are obtained;
and performing role matching on each keyword based on a preset role element database, and taking the keyword which is successfully matched as an initial role element to obtain a role extraction result.
3. The method of claim 1, wherein the role extraction result includes a plurality of initial role elements, a role label corresponding to each of the initial role elements, and a confidence corresponding to each of the initial role elements, and wherein the combining the role extraction results to obtain a target role extraction result corresponding to the target text comprises:
based on each role extraction result, if the number of initial role elements corresponding to a target role label is multiple, determining the role element corresponding to the target role label from the multiple initial role elements according to the confidence degrees corresponding to the multiple initial role elements;
and in each role extraction result, replacing the plurality of initial role elements corresponding to the target role labels with the role elements corresponding to the target role labels to obtain the target role extraction result.
4. The method of claim 3, wherein determining the role element corresponding to the target role label from the plurality of initial role elements according to the confidence degrees corresponding to the plurality of initial role elements comprises:
determining a maximum confidence level according to the confidence levels corresponding to the plurality of initial role elements;
and determining the initial role element corresponding to the maximum confidence coefficient as the role element corresponding to the target role label.
5. The method of claim 4, wherein before determining a maximum confidence level according to the confidence levels corresponding to the plurality of initial role elements, the method further comprises:
detecting whether text overlapping exists in a plurality of initial role elements;
correspondingly, the determining the maximum confidence level according to the confidence levels corresponding to the plurality of initial role elements includes:
and if the initial role elements have text overlap, determining the maximum confidence level according to the confidence levels corresponding to the initial role elements.
6. The method according to claim 1, wherein the performing event extraction processing on the target role extraction result to obtain at least one piece of event information comprises:
determining role elements corresponding to the clause texts according to the target role extraction result;
for each clause text, if the role element corresponding to the clause text comprises a trigger role element corresponding to a trigger role, determining the event type of the role element corresponding to the clause text according to the trigger role element;
and acquiring the at least one piece of event information according to the role elements and the event types corresponding to the clause texts.
7. The method of claim 6, wherein the determining the event type of the role element corresponding to the clause text according to the trigger role element comprises:
determining an event template corresponding to the trigger role element, wherein the event template comprises a plurality of event role elements;
and if the role element corresponding to the clause text is matched with the event role elements, taking the event type corresponding to the event template as the event type of the role element corresponding to the clause text.
8. The method according to claim 6, wherein before the obtaining the at least one piece of event information according to the role element and the event type corresponding to each clause text, the method further comprises:
and for each clause text, if the role element corresponding to the clause text does not comprise the trigger role element, taking the trigger role element of the adjacent clause text of the clause text as the trigger role element corresponding to the clause text.
9. The method according to claim 6, wherein the obtaining the at least one piece of event information according to the role element and the event type corresponding to each clause text comprises:
fusing the role elements with the same event type according to the role elements corresponding to the clause texts and the event type to obtain the target role element corresponding to each event type;
and obtaining event information corresponding to each event type according to the target role element corresponding to each event type.
10. The method according to claim 9, wherein obtaining event information corresponding to each of the event types according to the target role element corresponding to each of the event types comprises:
for each event type, performing deduplication processing on the target role elements corresponding to the event type to obtain deduplicated target role elements;
and filtering the deduplicated target role elements according to the role label and the confidence corresponding to each deduplicated target role element to obtain the event information corresponding to the event type.
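By way of non-limiting illustration, the deduplication and filtering of claim 10 can be sketched as follows. The claim does not specify the filtering rule, so this sketch assumes that identical (label, text) pairs are duplicates and that, for each role label, only the highest-confidence element at or above a threshold is kept. The `RoleElement` type, the `min_confidence` parameter, and its default value are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class RoleElement(NamedTuple):
    label: str
    text: str
    confidence: float


def dedupe_and_filter(
    elements: List[RoleElement], min_confidence: float = 0.5
) -> Dict[str, RoleElement]:
    """Drop duplicate elements, then keep the best element per role label."""
    # Deduplication: identical (label, text) pairs collapse to the most confident one.
    unique: Dict[Tuple[str, str], RoleElement] = {}
    for el in elements:
        key = (el.label, el.text)
        if key not in unique or el.confidence > unique[key].confidence:
            unique[key] = el
    # Filtering (assumed rule): per label, keep the most confident element
    # whose confidence is at least min_confidence.
    best: Dict[str, RoleElement] = {}
    for el in unique.values():
        if el.confidence < min_confidence:
            continue
        if el.label not in best or el.confidence > best[el.label].confidence:
            best[el.label] = el
    return best
```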
11. The method of claim 1, further comprising:
and determining target event information from the at least one piece of event information according to a preset table filling strategy, and filling the target event information into a preset table.
12. The method according to claim 11, wherein the determining target event information from the at least one piece of event information according to a preset table filling strategy and filling the target event information into a preset table comprises:
for each table, determining the target event information corresponding to the target event type from the at least one piece of event information according to the target event type corresponding to the table;
and selecting a corresponding target role element from the target event information according to a preset mapping relation between the header and the role element, and filling the target role element into a position frame indicated by the corresponding header, wherein the mapping relation comprises a mapping relation between one role element and one header, or the mapping relation comprises a mapping relation between a plurality of role elements and one header.
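By way of non-limiting illustration, the table filling of claim 12 can be sketched as follows. The mapping is represented as header to list of role labels, so that both the one-to-one and the many-to-one cases are covered; joining several role elements with a separator, the event dictionary layout, and the name `fill_table` are assumptions of this sketch.

```python
from typing import Dict, List, Optional


def fill_table(
    table_event_type: str,
    header_to_roles: Dict[str, List[str]],
    events: List[dict],
    separator: str = " ",
) -> Optional[Dict[str, str]]:
    """Fill one table row from the first event whose type matches the table."""
    for event in events:
        if event["event_type"] != table_event_type:
            continue
        roles: Dict[str, str] = event["roles"]
        row: Dict[str, str] = {}
        for header, labels in header_to_roles.items():
            # One header may draw on one role element or on several.
            values = [roles[label] for label in labels if label in roles]
            row[header] = separator.join(values)
        return row
    return None  # no event of the table's target event type was extracted
```

A header mapped to a single label reproduces that role element unchanged, while a header mapped to several labels receives their values joined in the listed order.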
13. An event extraction device, the device comprising:
the acquisition module is used for acquiring a target text and performing clause processing on the target text to obtain a plurality of clause texts;
the role extraction module is used for performing at least two different role extraction processes on the multiple clause texts to obtain role extraction results corresponding to each role extraction process, combining the role extraction results to obtain target role extraction results corresponding to the target texts, and the target role extraction results comprise multiple role elements corresponding to the target texts;
the event extraction module is used for performing event extraction processing on the target role extraction result to obtain at least one piece of event information, wherein the event information comprises an event type and a target role element corresponding to the event type;
wherein, the role extraction module comprises:
a grouping unit, configured to, for each role extraction processing procedure, divide the multiple clause texts into multiple clause groups according to the arrangement order of the clause texts in the target text, wherein the clause groups within one role extraction processing procedure each include the same number of clause texts, and the clause groups corresponding to different role extraction processing procedures include different numbers of clause texts;
and the role extraction unit is used for carrying out role extraction processing on the clause texts corresponding to each clause group as a whole to obtain the role extraction result corresponding to the current role extraction processing process.
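By way of non-limiting illustration, the grouping unit and the role extraction unit of claim 13 can be sketched as follows. The extractor is passed in as a hypothetical callable, since the claim does not name a model. Joining the clause texts of a group with a space, merging results by set union, and allowing a shorter final group when the clause count is not a multiple of the group size are all assumptions of this sketch.

```python
from typing import Callable, List, Set, Tuple

RoleElement = Tuple[str, str]  # (role label, extracted text); hypothetical layout


def group_clauses(clauses: List[str], group_size: int) -> List[List[str]]:
    """Split clauses, in document order, into consecutive groups of group_size."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    # Assumption: the final group may be shorter than group_size.
    return [clauses[i:i + group_size] for i in range(0, len(clauses), group_size)]


def multi_granularity_extract(
    clauses: List[str],
    group_sizes: List[int],
    extract: Callable[[str], Set[RoleElement]],
) -> Set[RoleElement]:
    """Run one extraction pass per group size and merge the results."""
    if len(set(group_sizes)) < 2:
        raise ValueError("at least two different group sizes are required")
    merged: Set[RoleElement] = set()
    for size in group_sizes:
        for group in group_clauses(clauses, size):
            # Each clause group is handed to the extractor as one text.
            merged |= extract(" ".join(group))
    return merged
```

Running passes with group sizes such as 1 and 2 lets the extractor see each clause both in isolation and together with its neighbour, which is one way to read the purpose of using differently sized groups.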
14. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any of claims 1 to 12.
15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 12.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110649268.8A CN113255322B (en) 2021-06-10 2021-06-10 Event extraction method and device, computer equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110649268.8A CN113255322B (en) 2021-06-10 2021-06-10 Event extraction method and device, computer equipment and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN113255322A (en) 2021-08-13
CN113255322B (en) 2021-10-01

Family

ID=77187464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110649268.8A Active CN113255322B (en) 2021-06-10 2021-06-10 Event extraction method and device, computer equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN113255322B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113468433B (en) * 2021-09-02 2021-12-07 中科雨辰科技有限公司 Target event extraction data processing system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107562772A (en) * 2017-07-03 2018-01-09 南京柯基数据科技有限公司 Event extraction method, apparatus, system and storage medium
CN110008463A (en) * 2018-11-15 2019-07-12 阿里巴巴集团控股有限公司 Method, apparatus and computer-readable medium for event extraction

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9886665B2 (en) * 2014-12-08 2018-02-06 International Business Machines Corporation Event detection using roles and relationships of entities
CN111597817B (en) * 2020-05-27 2023-12-08 北京明略软件系统有限公司 Event information extraction method and device
CN111967268B (en) * 2020-06-30 2024-03-19 北京百度网讯科技有限公司 Event extraction method and device in text, electronic equipment and storage medium
CN112084381A (en) * 2020-09-11 2020-12-15 广东电网有限责任公司 Event extraction method, system, storage medium and equipment
CN112434535B (en) * 2020-11-24 2023-05-02 上海浦东发展银行股份有限公司 Element extraction method, device, equipment and storage medium based on multiple models

Also Published As

Publication number Publication date
CN113255322A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN110059320B (en) Entity relationship extraction method and device, computer equipment and storage medium
CN111859960B (en) Semantic matching method, device, computer equipment and medium based on knowledge distillation
WO2021027533A1 (en) Text semantic recognition method and apparatus, computer device, and storage medium
CN111444723B (en) Information extraction method, computer device, and storage medium
CN108595695B (en) Data processing method, data processing device, computer equipment and storage medium
CN109446302A (en) Question and answer data processing method, device and computer equipment based on machine learning
CN111222305B (en) Information structuring method and device
CN110569500A (en) Text semantic recognition method and device, computer equipment and storage medium
CN109753653B (en) Entity name recognition method, entity name recognition device, computer equipment and storage medium
CN109766072B (en) Information verification input method and device, computer equipment and storage medium
CN111680634B (en) Document file processing method, device, computer equipment and storage medium
CN111444349B (en) Information extraction method, information extraction device, computer equipment and storage medium
CN112270196A (en) Entity relationship identification method and device and electronic equipment
CN109033427B (en) Stock screening method and device, computer equipment and readable storage medium
CN110738262A (en) Text recognition method and related product
CN113255322B (en) Event extraction method and device, computer equipment and computer-readable storage medium
CN114241499A (en) Table picture identification method, device and equipment and readable storage medium
CN111552527A (en) Method, device and system for translating characters in user interface and storage medium
CN112183513B (en) Method and device for recognizing characters in image, electronic equipment and storage medium
CN112270184A (en) Natural language processing method, device and storage medium
CN115309862A (en) Causal relationship identification method and device based on graph convolution network and contrast learning
CN111078984B (en) Network model issuing method, device, computer equipment and storage medium
CN106815592B (en) Text data processing method and device and wrong word recognition methods and device
CN114638229A (en) Entity identification method, device, medium and equipment of record data
CN115526176A (en) Text recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant