CN112860852A - Information analysis method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN112860852A
Authority
CN
China
Prior art keywords
information
event
text
argument
prediction
Prior art date
Legal status
Granted
Application number
CN202110104560.1A
Other languages
Chinese (zh)
Other versions
CN112860852B (en)
Inventor
刘文强
Current Assignee
Beijing Jindi Technology Co Ltd
Original Assignee
Beijing Jindi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jindi Technology Co Ltd
Priority to CN202110104560.1A
Publication of CN112860852A
Application granted
Publication of CN112860852B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the present disclosure provide an information analysis method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: acquiring an information text; adding, based on a trigger word in the information text, event prior information corresponding to the trigger word to the information text to obtain a text to be analyzed; predicting event information and argument information in the text to be analyzed by using a first extraction model to obtain a prediction result, where the first extraction model is trained in advance on a plurality of first training corpora that are labeled with event labeling information and argument role labeling information; and determining event information and argument information corresponding to the information text based on the prediction result. The technical solution can improve the accuracy of extracting events and argument values from public opinion news.

Description

Information analysis method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to an information analysis method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Public opinion refers to the social attitudes that the public, as the subject, holds toward social managers, enterprises, individuals, and other organizations as objects, and toward their political, social, and moral conduct, as social events occur, develop, and change within a given social space.
With the rapid development of internet technology, the development and flexibility of the network have made it one of the main carriers of public opinion. By extracting information from public opinion news about enterprises and storing it in a structured form, a user can conveniently obtain comprehensive public opinion information about an enterprise of interest, analyze that information, judge the development trend of the enterprise more accurately, and further generate public opinion reports and various statistical reports to support decision making.
In the prior art, events and argument values are extracted from enterprise public opinion news in a pipeline manner: the event (for example, a purchase) is identified first, and once the event has been identified, the argument values of that event (for example, the time and the purchased object) are extracted. In the course of implementing the present disclosure, the inventor found that this pipeline approach suffers from error propagation: if the preceding event identification is inaccurate, the subsequent argument value identification is also inaccurate, so the information extracted from the enterprise's public opinion news contains errors.
Disclosure of Invention
The present disclosure aims to provide an information analysis method and apparatus, an electronic device, and a computer-readable storage medium, so as to improve, at least to some extent, the accuracy of extracting events and argument values from public opinion news.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided an information analysis method including:
acquiring an information text;
adding event prior information corresponding to the trigger word to the information text based on the trigger word in the information text to obtain a text to be analyzed;
predicting event information and argument information in the text to be analyzed by using a first extraction model to obtain a prediction result; the first extraction model is obtained by training in advance based on a plurality of first training corpuses, and the first training corpuses are marked with event marking information and argument role marking information;
and determining event information and argument information corresponding to the information text based on the prediction result.
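The four steps above can be sketched as a small orchestration function. This is a minimal sketch under stated assumptions, not the disclosed implementation: the prior-adding step and the first extraction model are passed in as plain callables (`add_prior` and `first_model` are hypothetical names), and the prediction is assumed to be a list of `(label, span)` pairs.

```python
from typing import Callable, Dict, List, Tuple

# Assumed prediction format: list of (label, span) pairs, e.g. ("acquirer", "Company A").
Prediction = List[Tuple[str, str]]


def analyze_information_text(
    info_text: str,
    add_prior: Callable[[str], str],          # hypothetical: step 2, add event prior information
    first_model: Callable[[str], Prediction],  # hypothetical: step 3, first extraction model
) -> Dict[str, List[str]]:
    """Run steps 2-4 of the method; step 1 (acquiring info_text) is done by the caller."""
    text_to_analyze = add_prior(info_text)
    prediction = first_model(text_to_analyze)
    # Step 4: group the predicted spans by label to form event and argument information.
    result: Dict[str, List[str]] = {}
    for label, span in prediction:
        result.setdefault(label, []).append(span)
    return result
```

In practice `first_model` would wrap a trained sequence-labeling model; keeping it as a callable lets the surrounding steps be tested with a stub.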
Optionally, in an exemplary embodiment of the present disclosure, the adding, to the information text, event prior information corresponding to a trigger word based on the trigger word in the information text includes:
detecting a trigger word in the information text;
determining event prior information corresponding to the trigger word;
and adding the event prior information both before and after the information text, only before the information text, or only after the information text.
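A minimal sketch of trigger detection and the three placement options follows. It assumes a hypothetical lexicon (`TRIGGER_LEXICON`) that maps trigger words to bracketed event prior information and uses simple substring matching; the disclosure does not specify how trigger words are detected or how the prior information is formatted.

```python
from typing import Dict, List

# Hypothetical lexicon: trigger word -> event prior information.
# A real system might use a curated dictionary or a trigger-detection model instead.
TRIGGER_LEXICON: Dict[str, str] = {
    "acquired": "acquisition",
    "went bankrupt": "bankruptcy",
    "raised": "financing",
}


def detect_triggers(info_text: str) -> List[str]:
    """Return lexicon trigger words found in the text (case-insensitive substring match)."""
    lowered = info_text.lower()
    return [word for word in TRIGGER_LEXICON if word in lowered]


def add_event_prior(info_text: str, position: str = "both") -> str:
    """Add the prior information before, after, or on both sides of the information text."""
    triggers = detect_triggers(info_text)
    if not triggers:
        return info_text  # no trigger word: the text is analyzed unchanged
    prior = "[" + ";".join(TRIGGER_LEXICON[w] for w in triggers) + "]"
    if position == "before":
        return prior + info_text
    if position == "after":
        return info_text + prior
    if position == "both":
        return prior + info_text + prior
    raise ValueError(f"unknown position: {position}")
```

Substring matching is only a placeholder; it would, for example, also match "raised" inside "praised".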
Optionally, in an exemplary embodiment of the present disclosure, the event prior information includes: prior information of the event category;
the first training corpus is also marked with trigger word marking information;
the predicting event information and argument information in the text to be analyzed by using the first extraction model to obtain a prediction result includes:
predicting event information, argument information and trigger word information in the text to be analyzed by using the first extraction model to obtain a prediction result, wherein the prediction result comprises event category prediction information, argument role prediction information and trigger word prediction information;
the determining event information and argument information corresponding to the information text based on the prediction result includes:
determining an event category corresponding to the information text based on the event category prediction information, determining an event type corresponding to the information text based on the event category prediction information and the trigger word prediction information, and determining an argument role included in the information text and an argument value of the argument role based on the argument role prediction information.
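One way to combine the predicted event category with the predicted trigger word into an event type is a lookup in an event taxonomy. The table below (`EVENT_TYPE_TABLE`) and its entries are hypothetical; the disclosure states only that the event type is determined from both pieces of prediction information.

```python
from typing import Dict, Optional, Tuple

# Hypothetical taxonomy: (event category, trigger word) -> event type.
EVENT_TYPE_TABLE: Dict[Tuple[str, str], str] = {
    ("investment", "acquired"): "acquisition",
    ("investment", "raised"): "financing",
    ("risk", "went bankrupt"): "bankruptcy",
}


def resolve_event_type(category: str, trigger: str) -> Optional[str]:
    """Return the event type for a predicted category and trigger word, or None if unknown."""
    return EVENT_TYPE_TABLE.get((category, trigger.lower()))
```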
Optionally, in an exemplary embodiment of the present disclosure, the event prior information includes: event type and prior information of event category to which the event type belongs;
the prediction result includes event prediction information and argument role prediction information, and the event prediction information includes event type prediction information and event category prediction information;
the determining event information and argument information corresponding to the information text based on the prediction result includes:
determining an event type corresponding to the information text based on the event type prediction information, and determining an argument role included in the information text and an argument value of the argument role based on the argument role prediction information.
Optionally, in an exemplary embodiment of the present disclosure, the predicting, by using the first extraction model, event information and argument information in the text to be analyzed to obtain a prediction result includes:
performing BIO labeling of event information on event prior information in the text to be analyzed by using the first extraction model, and performing BIO labeling of argument information on argument values in the text to be analyzed to obtain a prediction result, wherein the prediction result comprises the text to be analyzed carrying the BIO labeling information of the event information and the BIO labeling information of the argument information;
the determining event information and argument information corresponding to the information text based on the prediction result includes:
determining event information corresponding to the information text based on the text to be analyzed and the BIO labeling information of the event information; and determining an argument role and an argument value of the argument role included in the information text based on the text to be analyzed and the BIO labeling information of the argument information.
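Recovering event information and argument roles from BIO labels amounts to collecting each `B-` tag together with the `I-` tags that continue it. The sketch below assumes tag strings of the form `B-<label>` / `I-<label>` / `O`; the label names are illustrative, and for character-level Chinese input `sep=""` would be used.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    """Collect (label, span) pairs from BIO tags; an I- tag that does not continue
    the currently open span is treated like O and dropped."""
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    buf: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if label is not None:
                spans.append((label, sep.join(buf)))  # close the previous span
            label, buf = tag[2:], [tok]
        elif tag.startswith("I-") and label == tag[2:]:
            buf.append(tok)  # continue the open span
        else:
            if label is not None:
                spans.append((label, sep.join(buf)))
            label, buf = None, []
    if label is not None:
        spans.append((label, sep.join(buf)))  # close a span that runs to the end
    return spans
```

Here the event label is read from the span covering the added event prior information, while the argument spans give the argument roles and their values.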
Optionally, in an exemplary embodiment of the present disclosure, the predicting, by using the first extraction model, event information and argument information in the text to be analyzed to obtain a prediction result further includes:
marking the trigger words in the text to be analyzed by using the first extraction model, wherein the prediction result further comprises BIO marking information of the trigger words;
the determining the event information and the argument information corresponding to the information text based on the prediction result further includes:
and determining the trigger word in the information text based on the text to be analyzed and the BIO labeling information of the trigger word.
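Because the event prior information is added around the information text, trigger positions predicted on the text to be analyzed must be shifted back to the original information text. This sketch assumes character-level BIO tags with a hypothetical label `trigger` and a known prefix length; spans that fall inside the added prior information are discarded.

```python
from typing import List, Optional, Tuple


def trigger_spans_in_info_text(tags: List[str], prefix_len: int, info_len: int) -> List[Tuple[int, int]]:
    """Return [start, end) offsets of trigger words in the original information text,
    given character-level BIO tags over prefix + information text (+ optional suffix)."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if start is not None and tag != "I-trigger":
            spans.append((start, i))
            start = None
        if tag == "B-trigger":
            start = i
    # Keep only spans inside the original text and shift them by the prefix length.
    return [(s - prefix_len, e - prefix_len) for s, e in spans
            if s >= prefix_len and e <= prefix_len + info_len]
```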
Optionally, in an exemplary embodiment of the present disclosure, the first extraction model includes a pre-trained language model.
Optionally, in an exemplary embodiment of the present disclosure, obtaining the first extraction model by training in advance on a plurality of first training corpora includes:
adding event prior information to each of a plurality of initial corpora, and labeling each initial corpus to which the event prior information has been added with event labeling information and argument role labeling information to obtain the first training corpora;
and inputting the first training corpora into the first extraction model respectively, so that the first extraction model learns the event information and argument information in the first training corpora and the constraint relationship between the event labeling information and the argument role labeling information.
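Building a first training corpus from an initial corpus can be sketched as prepending the event prior information, labeling that segment as the event, and shifting the existing argument-role spans by the prefix length before converting everything to character-level BIO tags. The `Sample` structure, the `EVENT` label, and prepend-only placement are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sample:
    text: str
    # (start, end, label) character spans; end is exclusive.
    spans: List[Tuple[int, int, str]] = field(default_factory=list)


def make_first_training_sample(initial: Sample, event_prior: str) -> Sample:
    """Prepend event prior information, label it as the event, and shift argument spans."""
    offset = len(event_prior)
    shifted = [(s + offset, e + offset, lab) for s, e, lab in initial.spans]
    return Sample(event_prior + initial.text, [(0, offset, "EVENT")] + shifted)


def to_char_bio(sample: Sample) -> List[str]:
    """Convert labeled spans into character-level BIO tags for sequence-labeling training."""
    tags = ["O"] * len(sample.text)
    for s, e, lab in sample.spans:
        tags[s] = f"B-{lab}"
        for i in range(s + 1, e):
            tags[i] = f"I-{lab}"
    return tags
```

Labeling the event and its arguments in one tag sequence is what allows a single model to learn both jointly, rather than in a pipeline.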
Optionally, in an exemplary embodiment of the present disclosure, obtaining the first extraction model by training in advance on a plurality of first training corpora further includes:
labeling the initial corpora to which the event prior information has been added with trigger word labeling information;
the inputting the first training corpora into the first extraction model respectively, so that the first extraction model learns the event information and argument information in the first training corpora and the constraint relationship between the event labeling information and the argument role labeling information, includes:
and inputting the first training corpora into the first extraction model respectively, so that the first extraction model learns the event information, argument information and trigger word information in the first training corpora and the constraint relationship between the event labeling information and the argument role labeling information.
Optionally, in an exemplary embodiment of the present disclosure, the method further includes:
extracting, by using a second extraction model, triple information about the event information, the argument information, and the relationship between them from the information text to obtain a second extraction result; the second extraction model is trained in advance on a plurality of second training corpora, the second training corpora are labeled with subject entity labeling information, object entity labeling information, and labeling information of the relationship between the subject entity and the object entity, the subject entity includes an event, and the object entity includes an argument role;
determining event information and argument information corresponding to the information text based on the second extraction result to obtain a first determination result;
determining an analysis result of the information text based on the first determination result and the second determination result according to a preset rule; and the second determination result is event information and argument information corresponding to the information text determined based on the prediction result.
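To compare the two results, the triples from the second extraction model can be regrouped into the same event-to-arguments structure as the first model's result. This sketch assumes each triple has the form (event, argument role, argument value); the disclosure states only that the subject entity includes an event and the object entity includes an argument role, so the exact triple layout is an assumption.

```python
from typing import Dict, List, Tuple

# Assumed triple layout: (event, argument role, argument value).
Triple = Tuple[str, str, str]


def triples_to_result(triples: List[Triple]) -> Dict[str, Dict[str, str]]:
    """Group triples by event so they can be compared with the prediction-based result."""
    result: Dict[str, Dict[str, str]] = {}
    for event, role, value in triples:
        result.setdefault(event, {})[role] = value
    return result
```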
Optionally, in an exemplary embodiment of the present disclosure, the determining, according to a preset rule, an analysis result of the information text based on the first determination result and the second determination result includes:
if the first determination result is consistent with the second determination result, taking either of them as the analysis result of the information text, and outputting the analysis result of the information text, or storing the analysis result of the information text in a structured database in a structured information storage manner; or,
if the first determination result is inconsistent with the second determination result, taking whichever of the first determination result and the second determination result is selected according to the preset rule as the analysis result of the information text, and outputting the analysis result of the information text, or storing the analysis result of the information text in a structured database in a structured information storage manner; or,
and if the first determination result is inconsistent with the second determination result, determining that the analysis result of the information text is not obtained.
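The three alternatives above can be expressed as a single function parameterized by the preset rule. The policy names (`prefer_first`, `prefer_second`, `strict`) are hypothetical labels for the three options; returning `None` stands for "no analysis result is obtained".

```python
from typing import Dict, Optional

Result = Dict[str, Dict[str, str]]


def merge_results(first: Result, second: Result, policy: str = "prefer_second") -> Optional[Result]:
    """Apply a preset rule to the first (triple-based) and second (prediction-based) results."""
    if first == second:
        return second  # consistent: either result may be used
    if policy == "prefer_first":
        return first
    if policy == "prefer_second":
        return second
    if policy == "strict":
        return None  # inconsistent: no analysis result is obtained
    raise ValueError(f"unknown policy: {policy}")
```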
Optionally, in an exemplary embodiment of the present disclosure, the method further includes:
identifying, by using a recognition model, whether a negative word exists in the information text and whether the negative word acts on a trigger word in the information text, where the recognition model is obtained in advance by joint training on negative words and trigger words;
if a negative word exists in the information text and acts on a trigger word in the information text, correcting event information and argument information corresponding to the information text based on the negative word; or discarding the event information and the argument information corresponding to the information text.
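The disclosure uses a trained recognition model for this step. The sketch below swaps in a much simpler rule-based stand-in, a hypothetical negative-word pattern checked within a fixed character window before the trigger word, only to illustrate the two follow-up options: correcting the result (here by adding a hypothetical `polarity` field) or discarding it.

```python
import re
from typing import Dict, Optional

# Hypothetical negative-word pattern; word boundaries avoid matching "not" inside "another".
NEGATIVE_PATTERN = re.compile(r"\b(?:not|never|denied|no longer)\b", re.IGNORECASE)


def negation_acts_on_trigger(text: str, trigger: str, window: int = 20) -> bool:
    """Rule-based stand-in: a negative word appears within `window` characters before the trigger."""
    match = re.search(re.escape(trigger), text, re.IGNORECASE)
    if match is None:
        return False
    context = text[max(0, match.start() - window):match.start()]
    return NEGATIVE_PATTERN.search(context) is not None


def apply_negation(text: str, trigger: str, result: Dict[str, str],
                   mode: str = "discard") -> Optional[Dict[str, str]]:
    """Keep, correct, or discard the extracted result depending on negation."""
    if not negation_acts_on_trigger(text, trigger):
        return result
    if mode == "discard":
        return None
    if mode == "correct":
        corrected = dict(result)
        corrected["polarity"] = "negative"  # hypothetical correction field
        return corrected
    raise ValueError(f"unknown mode: {mode}")
```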
According to a second aspect of the present disclosure, there is provided an information analysis apparatus including:
the acquisition module is used for acquiring the information text;
the adding module is used for adding event prior information corresponding to the trigger word to the information text based on the trigger word in the information text to obtain a text to be analyzed;
the prediction module is used for predicting the event information and the argument information in the text to be analyzed by utilizing a first extraction model to obtain a prediction result; the first extraction model is obtained by training in advance based on a plurality of first training corpuses, and the first training corpuses are marked with event marking information and argument role marking information;
and the first determining module is used for determining the event information and the argument information corresponding to the information text based on the prediction result.
Optionally, in an exemplary embodiment of the present disclosure, the adding module includes:
the detection unit is used for detecting the trigger words in the information text;
the determining unit is used for determining event prior information corresponding to the trigger word;
and the adding unit is configured to add the event prior information both before and after the information text, only before the information text, or only after the information text.
Optionally, in an exemplary embodiment of the present disclosure, the event prior information includes: prior information of the event category;
the first training corpus is also marked with trigger word marking information;
the prediction module is specifically configured to predict event information, argument information and trigger word information in the text to be analyzed by using the first extraction model to obtain a prediction result, where the prediction result includes event category prediction information, argument role prediction information and trigger word prediction information;
the first determining module is specifically configured to determine an event category corresponding to the information text based on the event category prediction information, determine an event type corresponding to the information text based on the event category prediction information and the trigger word prediction information, and determine an argument role included in the information text and an argument value of the argument role based on the argument role prediction information.
Optionally, in an exemplary embodiment of the present disclosure, the event prior information includes: event type and prior information of event category to which the event type belongs;
the prediction result includes event prediction information and argument role prediction information, and the event prediction information includes event type prediction information and event category prediction information;
the first determining module is specifically configured to determine an event type corresponding to the information text based on the event type prediction information, and determine an argument role included in the information text and an argument value of the argument role based on the argument role prediction information.
Optionally, in an exemplary embodiment of the present disclosure, the prediction module is specifically configured to perform, by using the first extraction model, BIO labeling of event information on event prior information in the text to be analyzed, and perform BIO labeling of argument information on argument values in the text to be analyzed to obtain a prediction result, where the prediction result includes a text to be analyzed that carries the BIO labeling information of the event information and the BIO labeling information of the argument information;
the first determining module is specifically configured to determine event information corresponding to the information text based on the text to be analyzed and the BIO labeling information of the event information; and determining an argument role and an argument value of the argument role included in the information text based on the text to be analyzed and the BIO labeling information of the argument information.
Optionally, in an exemplary embodiment of the present disclosure, the prediction module is specifically further configured to label, by using the first extraction model, a trigger word in the text to be analyzed, where the prediction result further includes BIO labeling information of the trigger word;
the first determining module is specifically further configured to determine a trigger word in the information text based on the text to be analyzed and the BIO labeling information of the trigger word.
Optionally, in an exemplary embodiment of the present disclosure, the first extraction model includes a pre-trained language model.
Optionally, in an exemplary embodiment of the present disclosure, the apparatus further includes:
the preprocessing module is used for adding event prior information to each initial corpus in the plurality of initial corpuses respectively, and labeling event labeling information and argument role labeling information to the initial corpuses to which the event prior information is added to obtain a first training corpus;
and the training module is configured to input the first training corpora into the first extraction model respectively, so that the first extraction model learns the event information and argument information in the first training corpora and the constraint relationship between the event labeling information and the argument role labeling information.
Optionally, in an exemplary embodiment of the present disclosure, the preprocessing module is further configured to label trigger word labeling information for the initial corpus after adding the event prior information;
the training module is specifically configured to input the first training corpora into the first extraction model, so that the first extraction model learns the event information, argument information and trigger word information in the first training corpora and the constraint relationship between the event labeling information and the argument role labeling information.
Optionally, in an exemplary embodiment of the present disclosure, the apparatus further includes:
the extraction module is configured to extract, by using a second extraction model, triple information about the event information, the argument information, and the relationship between them from the information text to obtain a second extraction result; the second extraction model is trained in advance on a plurality of second training corpora, the second training corpora are labeled with subject entity labeling information, object entity labeling information, and labeling information of the relationship between the subject entity and the object entity, the subject entity includes an event, and the object entity includes an argument role;
the second determining module is used for determining the event information and the argument information corresponding to the information text based on the second extraction result to obtain a first determining result;
the third determining module is used for determining an analysis result of the information text based on the first determining result and the second determining result according to a preset rule; and the second determination result is event information and argument information corresponding to the information text determined based on the prediction result.
Optionally, in an exemplary embodiment of the present disclosure, the third determining module is specifically configured to:
if the first determination result is consistent with the second determination result, take either of them as the analysis result of the information text, and output the analysis result of the information text, or store the analysis result of the information text in a structured database in a structured information storage manner; or,
if the first determination result is inconsistent with the second determination result, take whichever of the first determination result and the second determination result is selected according to the preset rule as the analysis result of the information text, and output the analysis result of the information text, or store the analysis result of the information text in a structured database in a structured information storage manner; or,
and if the first determination result is inconsistent with the second determination result, determining that the analysis result of the information text is not obtained.
Optionally, in an exemplary embodiment of the present disclosure, the apparatus further includes:
the recognition module, configured to recognize, by using a recognition model, whether a negative word exists in the information text and whether the negative word acts on a trigger word in the information text, where the recognition model is obtained in advance by joint training on negative words and trigger words;
a result processing module, configured to, according to the recognition result of the recognition module, if a negative word exists in the information text and acts on a trigger word in the information text, correct the event information and argument information corresponding to the information text based on the negative word, or discard the event information and argument information corresponding to the information text.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the above-described information analysis method via execution of the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described information analysis method.
According to a fifth aspect of the present disclosure, there is provided a computer program comprising computer-readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the above information analysis method.
As can be seen from the foregoing technical solutions, the information analysis method and apparatus, the electronic device, and the computer-readable storage medium in the exemplary embodiments of the present disclosure have at least the following advantages and positive effects:
According to the information analysis method and apparatus, the electronic device, and the computer-readable storage medium in the embodiments of the present disclosure, after the information text is acquired, event prior information corresponding to a trigger word in the information text is added to the information text to obtain a text to be analyzed; event information and argument information in the text to be analyzed are then predicted by using a first extraction model to obtain a prediction result; and the event information and argument information corresponding to the information text are determined based on the prediction result. Because the first extraction model is trained in advance on a plurality of first training corpora, each labeled with event labeling information and argument role labeling information, it can learn from a large number of training corpora various kinds of event information and argument information as well as the constraint relationship between them. It can therefore accurately predict the event information and argument information in the text to be analyzed, so that the event information and argument information corresponding to the information text are accurately determined based on the prediction result and the accuracy of extracting events and argument values from the information text is improved. Moreover, because events and arguments are predicted together rather than one after the other, the method avoids the error propagation problem of the existing pipeline approach to extracting events and argument values from public opinion news, and thus avoids the errors that this problem causes in the information extracted from enterprise public opinion news.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely some embodiments of the disclosure, and that a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 illustrates a system architecture diagram to which embodiments of the present disclosure may be applied;
FIG. 2 shows a schematic flow chart of an information analysis method in a first exemplary embodiment of the present disclosure;
FIG. 3 shows a schematic flow chart of an information analysis method in a second exemplary embodiment of the present disclosure;
FIG. 4 shows a schematic flow chart of an information analysis method in a third exemplary embodiment of the present disclosure;
FIG. 5 shows a schematic flow chart of an information analysis method in a fourth exemplary embodiment of the present disclosure;
FIG. 6 shows a schematic flow chart of an information analysis method in a fifth exemplary embodiment of the present disclosure;
FIG. 7 shows a schematic flow chart of an information analysis method in a sixth exemplary embodiment of the present disclosure;
FIG. 8 shows a schematic flow chart of an information analysis method in a seventh exemplary embodiment of the present disclosure;
FIG. 9 shows a block diagram of an information analysis apparatus in a first exemplary embodiment of the present disclosure;
FIG. 10 shows a block diagram of an information analysis apparatus in a second exemplary embodiment of the present disclosure;
FIG. 11 shows a block diagram of an electronic device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, apparatus, steps, etc. In other instances, well-known structures, methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present disclosure, "a plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. The symbol "/" generally indicates that the former and latter associated objects are in an "or" relationship.
In the present disclosure, unless otherwise expressly specified or limited, the term "connected" and similar terms are to be construed broadly. For example, they may mean electrically connected or communicatively connected, and connected either directly or indirectly through an intermediate medium. Those of ordinary skill in the art can understand the specific meaning of these terms in the present disclosure according to the specific situation.
Fig. 1 shows a system architecture diagram to which embodiments of the present disclosure may be applied. As shown in Fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation. For example, server 105 may be a server cluster composed of multiple servers.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages. The terminal devices 101, 102, 103 may be various electronic devices having display screens, including but not limited to smartphones, tablets, portable and desktop computers, and digital cinema projectors.
The server 105 may be a server that provides various information texts, such as a server of a website, a self-media platform, or a database. For example, the user uses terminal device 103 (or terminal device 101 or 102) to obtain information texts from server 105 in real time or periodically. The terminal device then executes the information analysis method of the embodiments of the present disclosure to obtain the event information and argument information corresponding to each information text. This information is stored as structured records in a structured database for subsequent analysis.
Fig. 2 shows a flow chart of an information analysis method in an exemplary embodiment of the present disclosure. This embodiment can be applied to an electronic device. As shown in Fig. 2, the information analysis method of this embodiment includes the following steps:
Step 201, obtaining an information text.
The characters in the information text in the embodiment of the present disclosure may be chinese characters, english characters, or characters of any type such as numbers. In addition, the information text in the embodiments of the present disclosure may be a text in any field, and the contents and the fields of the information text are not limited in the embodiments of the present disclosure.
In some embodiments, the information text in the embodiment of the disclosure may be a public opinion news text of an enterprise, the public opinion news text may be an original public opinion news text, or a public opinion news text after preprocessing the original public opinion news text, where the preprocessing may be, for example, removing emoticons, wrong punctuation marks, and the like in the original public opinion news text, and the embodiment of the disclosure does not limit specific content and representation of the public opinion news text, whether to preprocess and a specific manner of preprocessing. For example, the public opinion news text of a business may be "the stock in stock is going to buy HB group 51% equity".
The information text in the embodiments of the present disclosure, such as the public opinion news text of an enterprise, is unstructured information.
In the embodiments of the present disclosure, the information text may be acquired from websites, forums, self-media platforms, and the like, either in real time or periodically. Alternatively, it may be received as input from a user.
Step 202, adding event prior information corresponding to a trigger word to an information text based on the trigger word in the information text to obtain a text to be analyzed.
An event is a specific occurrence involving participants and can often be described as a change of state. An event consists of an event trigger word (Event Trigger) and event arguments (Event Argument) that describe the event structure; together, they describe the event completely. The event trigger word, or trigger word for short, is the word that signals that an event has occurred. It is the key feature word that determines the event category and event type. Event arguments, or arguments for short, are the constituent elements of the event in its description, such as time, place, participants, and other content related to the event. Each argument corresponds to an argument role.
In the embodiments of the present disclosure, an event category is the domain to which an event belongs according to its content. Examples are finance, transaction, entertainment, science and technology, and health.
In the embodiments of the present disclosure, events within each event category may be further divided into several event types according to their content. For example, the transaction category may include event types such as purchase and sale. Because an event type is a subcategory of an event category, it may also be called an event subcategory. The embodiments of the present disclosure do not limit the number of event types under a category or how they are divided.
Step 203, predicting event information and argument information in the text to be analyzed by using a first extraction model to obtain a prediction result.
The first extraction model is trained in advance on a plurality of first training corpora, each annotated with event annotation information and argument role annotation information. Because the first extraction model predicts event information and argument information jointly, it may also be called a joint extraction model.
The event annotation information may be event category annotation information only, or may include both event category and event type annotation information. Each event category and event type has its own argument roles. For example, the sale/acquisition event type under the transaction event category has the argument roles time, seller, traded item, sale price, and acquirer.
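The relationship among event category, event type, and argument roles can be held in a small nested mapping. The sketch below is hypothetical: `EVENT_SCHEMA`, `roles_for`, and `unknown_roles` are illustrative names, and only the transaction entry comes from the text. Checking predicted roles against the schema is an assumed use, not something the disclosure specifies.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical event system: event category -> event type -> argument roles.
# Only the transaction entry is taken from the text; other categories are omitted.
EVENT_SCHEMA: Dict[str, Dict[str, List[str]]] = {
    "transaction": {
        "sale/acquisition": ["time", "seller", "traded item", "sale price", "acquirer"],
    },
}


def roles_for(category: str, event_type: str) -> List[str]:
    """Return the argument roles defined for an event type (KeyError if unknown)."""
    return EVENT_SCHEMA[category][event_type]


def unknown_roles(category: str, event_type: str, roles: Iterable[str]) -> Set[str]:
    """Return the predicted roles that the schema does not define for this event type."""
    return set(roles) - set(roles_for(category, event_type))


print(roles_for("transaction", "sale/acquisition"))
```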
Step 204, determining the event information and argument information corresponding to the information text based on the prediction result.
The event information may include at least one of the specific event category and the event type under that category. The argument information may include argument roles and their argument values. An argument value is the specific content in the information text that fills an argument role. For example, in the information text "Zhongchu Shares plans to acquire 51% of the equity of HB Group", the argument roles include seller, traded item, and acquirer. Their argument values are HB Group, 51% equity, and Zhongchu Shares, respectively.
In this embodiment, after an information text is obtained, event prior information corresponding to a trigger word in the text is added to obtain a text to be analyzed. A first extraction model then predicts the event information and argument information in the text to be analyzed, and the event information and argument information of the information text are determined from the prediction result. The first extraction model is trained in advance on a large number of first training corpora annotated with event annotation information and argument role annotation information. It therefore learns a variety of event and argument information and the constraint relationship between them, and can predict the event information and argument information accurately. This improves the accuracy of extracting events and argument values from information texts. It also avoids the error propagation of existing pipeline methods, which extract the event first and the arguments second from news and public opinion text, together with the errors that such propagation introduces into the information extracted from enterprise public opinion.
Fig. 3 shows a schematic flow chart of an information analysis method in another exemplary embodiment of the present disclosure. As shown in Fig. 3, based on the embodiment shown in Fig. 2, step 202 may include the following steps:
Step 2021, detecting the trigger word in the information text.
Optionally, in some embodiments, a trigger word library may be built in advance. Trigger words from the library that appear in the information text may then be detected by regular expression matching.
Alternatively, in other embodiments, the trigger word in the information text may be detected by a detection model. The model is trained in advance on the trigger words in the trigger word library, so that it can detect trigger words in a text.
In the embodiments of the present disclosure, trigger words related to each event category, or to each event type under each event category, may be collected in advance to build the trigger word library. The library may be updated according to actual requirements.
Step 2022, determining event prior information corresponding to the trigger word.
The event prior information may be prior information used for representing an event type, or may be prior information of an event type and an event type to which the event type belongs.
Step 2023, adding corresponding event prior information before and after the information text, or adding corresponding event prior information before the information text, or adding corresponding event prior information after the information text, to obtain a text to be analyzed.
For example, for the information text "the stock shares are planned to buy HB group with 51% stock right", the trigger word "buy" in the information text is detected, the event category corresponding to the event trigger word "buy" is used as a transaction, and prior information of the event category is added before and after the information text, so as to obtain: in the transaction, 51% of right to stock of HB group is planned to be purchased for stock, and event prior information corresponding to a trigger word is added, so that convention information in event information and argument information in an information text can be reduced.
In this embodiment, after the trigger word in the information text is detected, the corresponding event prior information can be added before and/or after the information text. The first extraction model can then use the event prior information to jointly extract the event information and argument information of the text to be analyzed. Exploiting the constraint relationship between event information and argument information in this way improves the accuracy of extracting events and argument values.
Fig. 4 shows a flow diagram of an information analysis method in yet another exemplary embodiment of the present disclosure. As shown in fig. 4, the information analysis method of the present embodiment includes the following steps:
Step 301, obtaining an information text.
Step 302, adding prior information of event categories corresponding to trigger words to the information text based on the trigger words in the information text to obtain a text to be analyzed.
Step 303, predicting event information, argument information, and trigger word information in the text to be analyzed by using the first extraction model to obtain a prediction result.
The prediction result comprises event category prediction information, argument role prediction information and trigger word prediction information.
The first extraction model is obtained by training in advance based on a plurality of first training corpuses, the first training corpuses are marked with event marking information, trigger word marking information and argument role marking information, and the event marking information is event category marking information.
Step 304, determining an event category corresponding to the information text based on the event category prediction information, determining an event type corresponding to the information text based on the event category prediction information and the trigger word prediction information, and determining an argument role included in the information text and an argument value of the argument role based on the argument role prediction information.
In some embodiments, the event category may be extracted from the event category prediction information and the trigger word from the trigger word prediction information. The extracted event category and trigger word are together taken as the event type. The argument roles and the argument value of each argument role may be extracted from the argument role prediction information.
In other embodiments, the event category prediction information may be used directly as the event category. The event category prediction information together with the trigger word prediction information may be used as the event type. The argument role prediction information may be used as the argument roles and the argument value of each argument role.
Alternatively, the event category, event type, argument roles, and argument values may be determined in other ways from the event category prediction information, the trigger word prediction information, and the argument role prediction information. The present disclosure does not limit these ways.
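When the event type is formed from the predicted event category and trigger word, a lookup with a fallback is one simple way to do it. The sketch below is hypothetical: `EVENT_TYPE_TABLE` and `resolve_event_type` are illustrative names. The normalized type name "acquisition" and the "category/trigger" fallback format are assumptions.

```python
from typing import Dict, Tuple

# Hypothetical normalisation table: (event category, trigger word) -> event type.
EVENT_TYPE_TABLE: Dict[Tuple[str, str], str] = {
    ("transaction", "收购"): "acquisition",
}


def resolve_event_type(category: str, trigger: str) -> str:
    """Combine a predicted event category and trigger word into an event type.

    Unknown pairs fall back to "category/trigger", i.e. the pair itself is the type.
    """
    return EVENT_TYPE_TABLE.get((category, trigger), f"{category}/{trigger}")


print(resolve_event_type("transaction", "收购"))
```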
For example, for the above information text "Zhongchu Shares plans to acquire 51% of the equity of HB Group", the following results are obtained:
Event category: transaction; event type: acquisition;
Argument roles and argument values: acquirer: Zhongchu Shares; seller: HB Group; traded item: 51% equity.
In this embodiment, after a trigger word in an information text is detected, prior information of the event category corresponding to the trigger word is added to the information text to obtain a text to be analyzed. The first extraction model jointly predicts event category prediction information, argument role prediction information, and trigger word prediction information for this text. The event category of the information text is determined from the event category prediction information. The specific event type is determined from the event category prediction information together with the trigger word prediction information. The argument roles and their argument values are determined from the argument role prediction information. The event category, event type, and argument values of the information text are thus predicted accurately, which improves the accuracy of extracting events and argument values.
Fig. 5 shows a flow chart of an information analysis method in still another exemplary embodiment of the present disclosure. As shown in fig. 5, the information analysis method of the present embodiment includes the following steps:
Step 401, obtaining an information text.
Step 402, based on a trigger word in the information text, adding prior information to the information text to obtain a text to be analyzed. The prior information covers the event type corresponding to the trigger word and the event category to which that event type belongs.
Step 403, predicting event information and argument information in the text to be analyzed by using the first extraction model to obtain a prediction result.
The prediction result includes event prediction information and argument role prediction information, where the event prediction information includes event category prediction information and event type prediction information.
The first extraction model is trained in advance on a plurality of first training corpora annotated with event annotation information and argument role annotation information, where the event annotation information is event category and event type annotation information.
Step 404, determining the event category and event type corresponding to the information text based on the event prediction information, and determining the argument roles included in the information text and their argument values based on the argument role prediction information.
In some embodiments, the event category may be extracted from the event category prediction information and the event type from the event type prediction information. The argument roles and the argument value of each argument role may be extracted from the argument role prediction information.
In other embodiments, the event category prediction information may be used directly as the event category and the event type prediction information as the event type. The argument role prediction information may be used as the argument roles and the argument value of each argument role.
Alternatively, the event category and event type of the information text may be determined from the event prediction information in other ways, and likewise the argument roles and their argument values from the argument role prediction information. The embodiments of the present disclosure do not limit these ways.
For example, for the above information text "Zhongchu Shares plans to acquire 51% of the equity of HB Group", the following results are obtained:
Event category: transaction; event type: acquisition;
Argument roles and argument values: acquirer: Zhongchu Shares; seller: HB Group; traded item: 51% equity.
In this embodiment, after a trigger word in an information text is detected, prior information is added to the information text to obtain a text to be analyzed. This prior information covers the event type corresponding to the trigger word and the event category to which that type belongs. The first extraction model jointly predicts event category prediction information, event type prediction information, and argument role prediction information for this text. The event category of the information text is determined from the event category prediction information. The specific event type is determined from the event type prediction information. The argument roles and their argument values are determined from the argument role prediction information. The event category, event type, and argument values of the information text are thus predicted accurately, which improves the accuracy of extracting events and argument values.
Optionally, in some embodiments, the first extraction model in the above embodiments of the present disclosure may be a pre-trained language model. Examples include a BERT, RoBERTa, or ERNIE model, or another large pre-trained language model.
A pre-trained language model models prior semantic knowledge, such as entity concepts, in massive data. In doing so, it can learn semantic representations of complete concepts, so that its representations of semantic knowledge units are closer to the real world. Besides modeling character-level input features, it directly models prior semantic knowledge units, which gives it strong semantic representation capability.
Fig. 6 shows a schematic flow chart of an information analysis method in a fifth exemplary embodiment of the present disclosure. As shown in Fig. 6, the first extraction model may be trained in advance on a plurality of first training corpora as follows:
Step 501, adding event prior information to each of a plurality of initial corpora, and annotating each resulting corpus with event annotation information and argument role annotation information, to obtain the first training corpora.
Step 502, inputting each of the plurality of first training corpora into the first extraction model. The first extraction model thereby learns the event information and argument information in the first training corpora, and the constraint relationship between the event annotation information and the argument role annotation information.
Optionally, in some embodiments, the first extraction model may be trained in an unsupervised manner. Training stops once one of two conditions is met. The first is that the model has learned, from a preset number of first training corpora, the event information and argument information and the constraint relationship between the event annotation information and the argument role annotation information. The second is that the number of training iterations reaches a preset number. The embodiments of the present disclosure do not limit the specific training method of the first extraction model.
Alternatively, in other embodiments, the first extraction model may be trained in a supervised manner until the difference between its prediction result and the annotation information of the first training corpora is smaller than a preset threshold.
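The two stopping conditions described above can be expressed as a small training driver. One condition is a preset number of training iterations; the other is a loss below a preset threshold. This is a framework-agnostic sketch under stated assumptions: `train_until` is hypothetical, and `step` stands for one optimization step that returns the current loss.

```python
from typing import Callable, List


def train_until(step: Callable[[int], float], max_steps: int, loss_threshold: float) -> List[float]:
    """Call step(i) (one training step returning its loss) until the loss drops
    below loss_threshold or max_steps steps have run; return the loss history."""
    losses: List[float] = []
    for i in range(max_steps):
        loss = step(i)
        losses.append(loss)
        if loss < loss_threshold:  # supervised criterion: predictions close enough to labels
            break
    return losses  # if no break occurred, the preset step count was reached


history = train_until(lambda i: 1.0 / (i + 1), max_steps=100, loss_threshold=0.2)
print(len(history))
```

With a step whose loss is 1/(i+1), the driver stops after six steps. The sixth loss (about 0.167) is the first one strictly below 0.2.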
In this embodiment, the first extraction model is trained on a large number of first training corpora. It thereby fully learns the event information and argument information in each corpus, and the constraint relationship between the event annotation information and the argument role annotation information. Once trained, it can jointly predict event information and argument information using this constraint relationship, achieving accurate prediction of both.
Optionally, in some embodiments, in step 501, the initial corpus with the event prior information added may also be annotated with trigger word annotation information. Accordingly, in step 502, each first training corpus is input into the first extraction model. The model thereby learns the event information, argument information, and trigger word information in the first training corpora, as well as the constraint relationship between the event annotation information and the argument role annotation information.
In this embodiment, the first extraction model is trained on a large number of first training corpora. It thereby fully learns, in each corpus, the event information, the argument information, and the trigger word annotation information. It also learns the constraint relationships among the trigger word annotation information, the event annotation information, and the argument role annotation information. Once trained, it can jointly predict trigger words, event information, and argument information using these constraint relationships, achieving accurate prediction of the event information and argument information.
In the embodiments of the present disclosure, an event system may be designed that specifies the argument roles corresponding to each event category. For example, for the finance and transaction event categories with the trigger word "acquire", the argument roles include time, seller, traded item, sale price, and acquirer.
Optionally, in some embodiments, a large pre-trained language model such as BERT, RoBERTa, or ERNIE is used as the first extraction model. It is then fine-tuned on a large number of first training corpora.
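Fine-tuning a pre-trained model for BIO tagging usually means adding a token-classification head whose label set is the BIO tag vocabulary. The sketch below only builds that vocabulary. `build_label_maps` is hypothetical, and the English tag names are illustrative glosses of the roles used in this description.

```python
from typing import Dict, List, Tuple


def build_label_maps(tag_types: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Build BIO label <-> id maps for a token classification head.

    tag_types lists event categories, argument roles and "Trigger"; "O" gets id 0.
    """
    labels = ["O"]
    for tag_type in tag_types:
        labels += [f"B-{tag_type}", f"I-{tag_type}"]
    id2label = dict(enumerate(labels))
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id


id2label, label2id = build_label_maps(["Transaction", "Acquirer", "Seller", "TradedItem", "Trigger"])
print(id2label)
```

Hugging Face Transformers is one possible framework; this is an assumption, since the disclosure names none. With it, the maps could be passed when loading a checkpoint before fine-tuning on the first training corpora, for example `AutoModelForTokenClassification.from_pretrained("bert-base-chinese", num_labels=len(id2label), id2label=id2label, label2id=label2id)`.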
Optionally, in some embodiments, the event tagging information, the trigger word tagging information, and the argument role tagging information in the foregoing embodiments may be BIO tagging information, for example, BIO tagging information of an event category, BIO tagging information of an event type, BIO tagging information of a trigger word, and BIO tagging information of an argument role.
In BIO tagging, each element is labeled B-X, I-X, or O:
B-X indicates that the segment containing the element belongs to type X and the element is at the beginning of the segment.
I-X indicates that the segment containing the element belongs to type X and the element is inside the segment, not at its beginning.
O indicates that the element does not belong to any type.
For example, if X denotes a noun phrase (NP), the three BIO tags are: B-NP, the beginning of a noun phrase; I-NP, inside a noun phrase; O, not part of a noun phrase.
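The BIO scheme can be illustrated with a small encoder that turns labelled spans into per-element tags, using the noun phrase example. `spans_to_bio` is a hypothetical helper; spans are assumed to be non-overlapping, with exclusive end offsets.

```python
from typing import List, Tuple


def spans_to_bio(length: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Encode non-overlapping spans (start, end_exclusive, type X) as BIO tags.

    The first element of a span gets B-X, the remaining elements get I-X,
    and every element outside all spans gets O.
    """
    tags = ["O"] * length
    for start, end, span_type in spans:
        tags[start] = f"B-{span_type}"
        for i in range(start + 1, end):
            tags[i] = f"I-{span_type}"
    return tags


tokens = ["the", "big", "dog", "barks"]
print(list(zip(tokens, spans_to_bio(len(tokens), [(0, 3, "NP")]))))
```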
For example, take the initial corpus "Dalian property market July monthly report: two residential plots successfully transferred" (大连楼市7月月报：两宗住宅地块成功出让). The trigger word is "transfer" (出让), and the event category corresponding to the trigger word is finance (财经). After the event prior information is added, the corpus becomes "<财经>大连楼市7月月报：两宗住宅地块成功出让</财经>". Adding the BIO tags of the event category and of the argument roles gives the following first training corpus:
< (B-Finance), 财 (I-Finance), 经 (I-Finance), > (I-Finance), 大 (O), 连 (O), 楼 (O), 市 (O), 7 (B-Time), 月 (I-Time), 月 (O), 报 (O), ： (O), 两 (B-TradedItem), 宗 (I-TradedItem), 住 (I-TradedItem), 宅 (I-TradedItem), 地 (I-TradedItem), 块 (I-TradedItem), 成 (O), 功 (O), 出 (B-Trigger), 让 (I-Trigger), < (B-Finance), / (I-Finance), 财 (I-Finance), 经 (I-Finance), > (I-Finance).
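A first training corpus like the one in the example can be constructed in two steps: wrap the initial corpus with the event prior markers, then tag both the markers and the argument and trigger spans. A sketch of this follows. `build_training_example` is hypothetical. The span offsets and the English label names (`Finance`, `Time`, `TradedItem`, `Trigger`) are assumptions chosen to reproduce the example.

```python
from typing import List, Optional, Tuple


def build_training_example(
    text: str,
    category: str,
    spans: List[Tuple[int, int, str]],
    category_label: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Wrap text with <category>...</category> and return (character, BIO tag) pairs.

    spans are (start, end_exclusive, label) offsets into the unwrapped text
    (argument roles and "Trigger"); both prior markers are tagged with
    category_label, which defaults to the category string itself.
    """
    label = category_label or category
    open_marker, close_marker = f"<{category}>", f"</{category}>"
    wrapped = open_marker + text + close_marker
    offset = len(open_marker)
    # The prior markers form two spans of their own, tagged with the category.
    all_spans = [(0, offset, label), (offset + len(text), len(wrapped), label)]
    # Argument and trigger spans are shifted past the opening marker.
    all_spans += [(s + offset, e + offset, lab) for s, e, lab in spans]
    tags = ["O"] * len(wrapped)
    for start, end, span_label in all_spans:
        tags[start] = f"B-{span_label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{span_label}"
    return list(zip(wrapped, tags))


corpus = build_training_example(
    "大连楼市7月月报：两宗住宅地块成功出让",
    "财经",
    [(4, 6, "Time"), (9, 15, "TradedItem"), (17, 19, "Trigger")],
    category_label="Finance",
)
print(corpus[:10])
```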
Optionally, in some embodiments, in step 203, 303, or 403, the first extraction model may apply BIO tags of event information to the event prior information in the text to be analyzed. It may also apply BIO tags of argument information to the argument values in that text. The resulting prediction result is the text to be analyzed carrying the BIO tagging information of both the event information and the argument information. Correspondingly, in step 204, 304, or 404, the event information corresponding to the information text may be determined from the text to be analyzed and the BIO tagging information of the event information. The argument roles included in the information text and their argument values may be determined from the text to be analyzed and the BIO tagging information of the argument information.
For example, for the text to be analyzed of "the HB group 51% stock right to be purchased for the stock shares" in the information text, the trigger word is "purchase", and the prediction result obtained by using the first extraction model is: < B-transaction, deal I-transaction, trade I-transaction, > I-transaction, medium B-acquirer, store I-acquirer, stock I-acquirer, share I-acquirer, quasi, receive B-Trigger, buy I-Trigger, H B-seller, B I-seller, aggregate I-seller, group I-seller, 5B-transaction, 1I-transaction,% I-transaction, stock I-transaction, right I-transaction, < B-transaction,/I-transaction, deal I-transaction, trade I-transaction, > I-transaction. The event category corresponding to the information text can be determined to be "transaction" based on the BIO annotation information "< B-transaction, deal I-transaction, transaction I-transaction, > I-transaction" and "< B-transaction,/I-transaction, deal I-transaction, transaction I-transaction, > I-transaction" of the event information; the argument roles and corresponding argument values were determined based on "Zhong B-acquirer, store I-acquirer, stock I-acquirer, share I-acquirer", "H B-seller, B I-seller, Collection I-seller, group I-seller" and "5B-trader, 1I-trader,% I-trader, stock I-trader, weight I-trader" as follows: and (3) acquisition party: storing the stock shares; and (3) selling the product: HB group; transaction article: and 51% stock right. The following results were obtained:
Event category: Transaction; trigger word: acquire (收购);
Argument roles and argument values: acquirer: Zhongchu Shares; seller: HB Group; transaction object: 51% equity.
Optionally, in another embodiment, in step 203, 303, or 403, the first extraction model may be used to apply BIO labels for event information to the event prior information in the text to be analyzed, apply BIO labels for argument information to the argument values in the text to be analyzed, and also label the trigger word in the text to be analyzed, to obtain a prediction result. The prediction result is the text to be analyzed carrying the BIO labeling information of the event information, of the trigger word, and of the argument information. Correspondingly, in step 204, 304, or 404, the event information corresponding to the information text may be determined from the text to be analyzed and the BIO labeling information of the event information; the argument roles included in the information text and their argument values may be determined from the text to be analyzed and the BIO labeling information of the argument information; and the trigger word in the information text may be determined from the text to be analyzed and the BIO labeling information of the trigger word.
For example, for the text to be analyzed of "the HB group 51% stock right to be purchased for the stock shares" in the information text, the trigger word is "purchase", and the prediction result obtained by using the first extraction model is: < B-transaction, deal I-transaction, trade I-transaction, > I-transaction, medium B-acquirer, store I-acquirer, stock I-acquirer, share I-acquirer, quasi, receive B-Trigger, buy I-Trigger, H B-seller, B I-seller, aggregate I-seller, group I-seller, 5B-transaction, 1I-transaction,% I-transaction, stock I-transaction, right I-transaction, < B-transaction,/I-transaction, deal I-transaction, trade I-transaction, > I-transaction. The event category corresponding to the information text can be determined to be "transaction" based on the BIO annotation information "< B-transaction, deal I-transaction, transaction I-transaction, > I-transaction" and "< B-transaction,/I-transaction, deal I-transaction, transaction I-transaction, > I-transaction" of the event information; determining a Trigger word 'acquisition' based on 'acquisition B-Trigger and acquisition I-Trigger'; the argument roles and corresponding argument values were determined based on "Zhong B-acquirer, store I-acquirer, stock I-acquirer, share I-acquirer", "H B-seller, B I-seller, Collection I-seller, group I-seller" and "5B-trader, 1I-trader,% I-trader, stock I-trader, weight I-trader" as follows: and (3) acquisition party: storing the stock shares; and (3) selling the product: HB group; transaction article: and 51% stock right.
For the information text "Zhongchu Shares plans to acquire 51% equity of HB Group", the following result is obtained:
Event category: Transaction; trigger word: acquire (收购);
Argument roles and argument values: acquirer: Zhongchu Shares; seller: HB Group; transaction object: 51% equity.
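Once the tags are decoded into spans, the event category, the trigger word, and the argument roles can be separated by label. The sketch below assumes that the labels placed on the added prior name the event category and that `Trigger` marks the trigger word; `build_event_record`, `EVENT_CATEGORY_LABELS`, and all label names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical label conventions: labels on the added event prior name the
# event category, and "Trigger" marks the trigger word.
EVENT_CATEGORY_LABELS = {"Transaction"}
TRIGGER_LABEL = "Trigger"


def build_event_record(spans: List[Tuple[str, str]]) -> Dict[str, object]:
    """Turn decoded (label, text) spans into category, trigger, and arguments."""
    category: Optional[str] = None
    trigger: Optional[str] = None
    arguments: Dict[str, str] = {}
    for label, text in spans:
        if label in EVENT_CATEGORY_LABELS:
            category = label             # category comes from the tagged prior
        elif label == TRIGGER_LABEL:
            trigger = trigger or text    # keep the first trigger found
        else:
            arguments.setdefault(label, text)  # keep the first value per role
    return {"event_category": category, "trigger": trigger, "arguments": arguments}
```

For instance, the spans decoded from the example yield category "Transaction", trigger "收购", and the three arguments.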
Fig. 7 shows a flowchart of an information analysis method in a sixth exemplary embodiment of the present disclosure. As shown in Fig. 7, on the basis of the foregoing embodiments, after the information text is acquired, the information analysis method of this embodiment may further include the following steps:
Step 601: use a second extraction model to extract triple information about the event information, the argument information, and the relationships between them from the information text, to obtain a second extraction result.
The second extraction model is trained in advance on a plurality of second training corpora. Each second training corpus is annotated with subject entity labels, object entity labels, and labels for the relations between subject and object entities, where the subject entities include events and the object entities include argument roles. Because the second extraction model extracts triples linking events to their arguments, it may also be called a relation extraction model.
Step 602: based on the second extraction result, determine the event information and argument information corresponding to the information text, to obtain a first determination result.
Step 603: determine an analysis result of the information text from the first determination result and a second determination result according to a preset rule.
The second determination result is the event information and argument information corresponding to the information text that are determined from the prediction result in the foregoing embodiments.
In this embodiment, the second extraction model extracts triples covering the event information, the argument information, and the relationships between them, and the resulting second extraction result is used to determine the event information and argument information of the information text. Combining this with the event information and argument information determined from the prediction result allows more events and arguments to be identified from several angles, which improves the recall of event and argument extraction and makes the extracted information more complete and accurate.
In the embodiments of the present disclosure, the event and argument schema is converted into triple form. For example, the schema with event category: Transaction; event type: Acquisition; argument roles: time, seller, transaction object, sale price, acquirer can be converted into the following triples:
Object type: time_is; relation: time; subject type: Transaction;
Object type: seller_is; relation: seller; subject type: Transaction;
Object type: acquirer_is; relation: acquirer; subject type: Transaction;
Object type: transaction_object_is; relation: transaction object; subject type: Transaction;
Object type: sale_price_is; relation: sale price; subject type: Transaction;
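The schema conversion can be sketched as a direct mapping from each argument role to one triple, following the `<role>_is` object-type naming in the example. The function `schema_to_triples` and the underscore spellings of multi-word roles are assumptions made for illustration.

```python
from typing import Dict, List


def schema_to_triples(event_category: str, roles: List[str]) -> List[Dict[str, str]]:
    """Convert one event schema into (object type, relation, subject type) triples.

    Each argument role becomes a relation whose subject type is the event
    category and whose object type is "<role>_is", mirroring the example.
    """
    return [
        {"object_type": f"{role}_is", "relation": role, "subject_type": event_category}
        for role in roles
    ]


# Hypothetical role spellings for the Transaction schema in the example.
transaction_schema = schema_to_triples(
    "Transaction", ["time", "seller", "acquirer", "transaction_object", "sale_price"]
)
```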
Using the second extraction model to extract triples from the information text "Zhongchu Shares plans to acquire 51% equity of HB Group" gives the following second extraction result:
Subject: acquire; subject type: Transaction; relation: acquirer; object: Zhongchu Shares; object type: acquirer_is;
Subject: acquire; subject type: Transaction; relation: seller; object: HB Group; object type: seller_is;
Subject: acquire; subject type: Transaction; relation: transaction object; object: 51% equity; object type: transaction_object_is;
Based on the second extraction result, the event information and argument information corresponding to the information text are determined, giving the following first determination result:
Event type: Acquisition; event category: Transaction;
Argument roles and argument values: acquirer: Zhongchu Shares; seller: HB Group; transaction object: 51% equity.
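Step 602 can be pictured as grouping the extracted triples by subject. In this sketch the function name `triples_to_events` and the parameter `type_of` are hypothetical: the event type defaults to the subject text, and an optional mapping can turn a trigger such as "acquire" into an event type such as "Acquisition".

```python
from typing import Any, Dict, List, Optional, Tuple


def triples_to_events(
    triples: List[Dict[str, str]], type_of: Optional[Dict[str, str]] = None
) -> List[Dict[str, Any]]:
    """Group extracted triples by (subject, subject type) into event records.

    type_of optionally maps a subject (trigger) to an event type; without it,
    the subject text itself is used as the event type.
    """
    type_of = type_of or {}
    events: Dict[Tuple[str, str], Dict[str, Any]] = {}
    for t in triples:
        key = (t["subject"], t["subject_type"])
        event = events.setdefault(key, {
            "event_type": type_of.get(t["subject"], t["subject"]),
            "event_category": t["subject_type"],
            "arguments": {},
        })
        event["arguments"][t["relation"]] = t["object"]  # role -> argument value
    return list(events.values())


second_extraction_result = [
    {"subject": "acquire", "subject_type": "Transaction", "relation": "acquirer",
     "object": "Zhongchu Shares", "object_type": "acquirer_is"},
    {"subject": "acquire", "subject_type": "Transaction", "relation": "seller",
     "object": "HB Group", "object_type": "seller_is"},
    {"subject": "acquire", "subject_type": "Transaction", "relation": "transaction_object",
     "object": "51% equity", "object_type": "transaction_object_is"},
]
first_determination = triples_to_events(second_extraction_result, {"acquire": "Acquisition"})
```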
In the embodiments of the present disclosure, the event and argument schema is converted into triple form and events are extracted with the second extraction model. This gives event trigger words better generalization, so more diverse expressions of events and trigger words can be captured, more results can be identified, and recall is improved.
Optionally, in some embodiments, the second extraction model in the foregoing embodiments of the present disclosure may be a pre-trained language model, for example BERT, RoBERTa, ERNIE, or another large pre-trained language model.
Optionally, in some embodiments, step 603 may be handled as follows. If the first determination result is consistent with the second determination result, either one of them may be used as the analysis result of the information text, and the analysis result is output or stored in a structured database as structured information. If the two results are inconsistent, the first or the second determination result, selected according to the preset rule, may be used as the analysis result of the information text and likewise output or stored in the structured database. Alternatively, if the two results are inconsistent, it may be determined that no analysis result of the information text is obtained, and no analysis result is output.
Based on this embodiment, when the first and second determination results are consistent, the shared result is used directly as the analysis result of the information text. When they are inconsistent, either one result is selected according to the preset rule and stored or output for public opinion analysis, or no analysis result is output at all, which avoids basing public opinion analysis on an erroneous result.
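The patent leaves the preset rule open. A sketch of the three alternatives above, assuming that "consistent" means the two results are equal; the function `merge_results` and the policy names `prefer_first`, `prefer_second`, and `discard` are hypothetical.

```python
from typing import Optional

Result = Optional[dict]


def merge_results(first: Result, second: Result, policy: str) -> Result:
    """Combine the first and second determination results (step 603).

    policy is a hypothetical stand-in for the unspecified preset rule:
    "prefer_first", "prefer_second", or "discard".
    """
    if first == second:
        return first          # consistent: either one may be used
    if policy == "prefer_first":
        return first
    if policy == "prefer_second":
        return second
    if policy == "discard":
        return None           # no analysis result is output
    raise ValueError(f"unknown policy: {policy!r}")
```

Requiring the caller to name a policy keeps the sketch from implying a default the patent does not state.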
Fig. 8 shows a flowchart of an information analysis method in a seventh exemplary embodiment of the present disclosure. As shown in fig. 8, on the basis of the foregoing embodiment, the information analysis method of this embodiment may further include:
Step 701: use a recognition model to identify whether a negation word exists in the information text and whether that negation word acts on a trigger word in the information text.
A negation word is a word with a negative meaning (negation).
If a negation word exists in the information text and acts on a trigger word in the information text, step 702 is executed. Otherwise, if the information text contains no negation word, or contains one that does not act on a trigger word, the subsequent steps of this embodiment are not executed.
The recognition model is trained in advance jointly on negation words and trigger words.
Step 702: modify, based on the negation word, the event information and argument information corresponding to the information text obtained in the foregoing embodiments, for example by adding the negation word before the trigger word; or discard the event information and argument information corresponding to the information text.
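Step 702 offers two treatments of a negated event. A minimal sketch, assuming an event is represented as a dict with a `trigger` field; the function `apply_negation` and the mode names `prefix` and `discard` are hypothetical.

```python
from typing import Any, Dict, Optional


def apply_negation(event: Dict[str, Any], negation_word: str, mode: str) -> Optional[Dict[str, Any]]:
    """Modify or discard an event whose trigger word is negated (step 702).

    mode="prefix" adds the negation word before the trigger word;
    mode="discard" drops the event and its arguments entirely.
    """
    if mode == "discard":
        return None
    if mode == "prefix":
        updated = dict(event)  # shallow copy; the input event is left unchanged
        updated["trigger"] = negation_word + event["trigger"]
        return updated
    raise ValueError(f"unknown mode: {mode!r}")
```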
The event information and argument information obtained in the foregoing embodiments may be those determined from the prediction result (the second determination result), those determined from the second extraction result (the first determination result), or the analysis result determined from the first and second determination results according to the preset rule. Optionally, in some embodiments, the recognition model may be used to recognize whether a negation word and a trigger word are both present in the information text. If both are recognized, the event in the information text is regarded as a negative event; if only one of them, or neither, is recognized, the event is not regarded as a negative event.
Because the recognition model recognizes whether an event in the information text is a negative event, it may also be called a negative event recognition model.
Specifically, a third training corpus can be obtained by BIO-labeling the negation words and trigger words in a plurality of initial corpora, and the recognition model is trained on the third training corpus so that, once trained, it can recognize negation words and trigger words in text.
For example, for the initial corpus "三维丝回复问询函：祥盛环保无业绩下滑风险" ("Sanweisi replies to inquiry letter: Xiangsheng Environmental has no risk of a performance decline"), BIO-labeling the negation word and the trigger word it acts on gives the following third training corpus: every character of "三维丝回复问询函：祥盛环保" is tagged O; "无" (no) is tagged B-NEGATION; "业绩下滑" (performance decline) is tagged B-TRIGGER, I-TRIGGER, I-TRIGGER, I-TRIGGER; and "风险" (risk) is tagged O, O. The negation word is "无" (no) and the trigger word is "业绩下滑" (performance decline), so the recognition model is trained jointly on negation words and trigger words.
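Building the third training corpus is a character-level BIO labeling of annotated spans. A sketch assuming the annotations arrive as (start, end, label) character offsets with an exclusive end; `bio_tag` is a hypothetical helper, not part of the patent.

```python
from typing import List, Tuple


def bio_tag(text: str, spans: List[Tuple[int, int, str]]) -> List[Tuple[str, str]]:
    """Tag each character of text with B-/I-/O from (start, end, label) spans."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"bad span {(start, end)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError("spans must not overlap")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return list(zip(text, tags))


corpus_text = "三维丝回复问询函：祥盛环保无业绩下滑风险"
neg = corpus_text.index("无")        # negation word
trig = corpus_text.index("业绩下滑")  # trigger word it acts on
third_corpus_example = bio_tag(
    corpus_text, [(neg, neg + 1, "NEGATION"), (trig, trig + 4, "TRIGGER")]
)
```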
When the recognition model is used for prediction, for example with the input information text "中兴否认一季度裁员4000人并无任何裁员计划" ("ZTE denies laying off 4,000 people in the first quarter and has no layoff plan"), the recognition model outputs: "中兴" (ZTE): O, O; "否认" (deny): B-NEGATION, I-NEGATION; "一季度" (first quarter): O, O, O; "裁员" (lay off): B-TRIGGER, I-TRIGGER; "4000人" (4,000 people): O for each character; "并无" (has no): B-NEGATION, I-NEGATION; "任何" (any): O, O; "裁员" (layoff): B-TRIGGER, I-TRIGGER; "计划" (plan): O, O. From this output, the first negation word and the trigger word it acts on are "否认" (deny) and "裁员" (lay off), and the second negation word and its trigger word are "并无" (has no) and "裁员" (layoff).
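To read negation/trigger pairs off decoded model output, one simple assumption is that each negation word acts on the first trigger word that follows it. The patent's recognition model learns this association jointly, so the nearest-following-trigger rule in `pair_negations` (a hypothetical name) is only a heuristic illustration operating on already-decoded spans.

```python
from typing import List, Optional, Tuple


def pair_negations(spans: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair each NEGATION span with the first TRIGGER span that follows it.

    Heuristic sketch: a trigger with no pending negation stays unpaired,
    i.e. it is not treated as a negative event.
    """
    pairs: List[Tuple[str, str]] = []
    pending: Optional[str] = None
    for label, text in spans:
        if label == "NEGATION":
            pending = text
        elif label == "TRIGGER" and pending is not None:
            pairs.append((pending, text))
            pending = None
    return pairs


# Decoded spans for the ZTE example (O-tagged text omitted).
zte_spans = [("NEGATION", "否认"), ("TRIGGER", "裁员"), ("NEGATION", "并无"), ("TRIGGER", "裁员")]
zte_pairs = pair_negations(zte_spans)
```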
Based on this embodiment, the recognition model recognizes negative events, which prevents a negated event from being mistakenly recognized as an event that occurred; for example, "a company did not go bankrupt" is distinguished from "a company went bankrupt". Errors caused by semantic non-events are eliminated, further improving the accuracy of event information extraction and event recognition. In addition, removing negative events that did not occur prevents incorrect public opinion information from being identified.
Optionally, in some embodiments, the recognition model in the foregoing embodiments of the present disclosure may be a pre-trained language model, for example BERT, RoBERTa, ERNIE, or another large pre-trained language model.
Based on the embodiments of the present disclosure, basic information about events and arguments, such as the event type, the event participants, and the time and place of occurrence, can be extracted from unstructured natural language text and presented in structured form for use in various downstream applications.
Based on the embodiments of the present disclosure, the analysis result of the information text can be used in various applications once it is obtained. For example, the analysis result can be displayed in structured form so that users can understand an event quickly; in a financial scenario, effective risk control can be performed based on the analysis results of company information texts. This embodiment does not limit the application scenarios of the analysis result.
Fig. 9 shows a block diagram of an information analysis apparatus in the first exemplary embodiment of the present disclosure. The information analysis device provided in any embodiment of the present disclosure may be used to implement the information analysis method in the above-described embodiment of the present disclosure. The information analysis apparatus provided in any embodiment of the present disclosure may be disposed on a terminal device, may also be disposed on a server, or may be partially disposed on a terminal device and partially disposed on a server, for example, may be disposed on the server 105 in fig. 1, but the present disclosure is not limited thereto.
As shown in Fig. 9, the information analysis apparatus of this embodiment includes an obtaining module 801, an adding module 802, a prediction module 803, and a first determining module 804, where:
an obtaining module 801, configured to obtain an information text.
An adding module 802, configured to add, based on a trigger word in the information text, event prior information corresponding to the trigger word to the information text, to obtain a text to be analyzed.
The prediction module 803 is configured to use the first extraction model to predict event information and argument information in the text to be analyzed, to obtain a prediction result. The first extraction model is trained in advance on a plurality of first training corpora, each annotated with event labeling information and argument role labeling information.
A first determining module 804, configured to determine the event information and argument information corresponding to the information text based on the prediction result.
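The module structure of Fig. 9 can be illustrated as a small composition of callables. This is a sketch rather than the patent's implementation: the class `InformationAnalyzer` and its field names are hypothetical, and each module is injected as a function so that any trigger detector or extraction model can be plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Spans = List[Tuple[str, str]]


@dataclass
class InformationAnalyzer:
    """Hypothetical wiring of the Fig. 9 modules as injected callables."""
    add_prior: Callable[[str], str]               # adding module 802
    predict: Callable[[str], Spans]               # prediction module 803
    determine: Callable[[Spans], Dict[str, Any]]  # first determining module 804

    def analyze(self, information_text: str) -> Dict[str, Any]:
        # The obtaining module 801 corresponds to receiving information_text.
        text_to_analyze = self.add_prior(information_text)
        prediction = self.predict(text_to_analyze)
        return self.determine(prediction)
```

Keeping each module as a plain callable mirrors the note below that the module division is not mandatory: several callables can be merged or split without changing `analyze`.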
Based on the embodiments of the present disclosure, after the information text is obtained, event prior information corresponding to a trigger word in the information text is added to the information text to obtain a text to be analyzed; the first extraction model then predicts the event information and argument information in the text to be analyzed to obtain a prediction result, from which the event information and argument information corresponding to the information text are determined. Because the first extraction model is trained in advance on a large number of first training corpora, each annotated with event labeling information and argument role labeling information, it can learn various kinds of event information and argument information and the constraint relationships between them. It can therefore accurately predict the event information and argument information in the text to be analyzed, so that the event information and argument information of the information text can be accurately determined from the prediction result, improving the accuracy of extracting events and argument values from information text. This also avoids the error propagation of existing pipeline methods, which first extract events and then argument values from news and public opinion text, and the resulting errors in information extracted from enterprise public opinion.
Optionally, in some embodiments, the adding module 802 may include: a detection unit configured to detect the trigger word in the information text; a determining unit configured to determine the event prior information corresponding to the trigger word; and an adding unit configured to add the event prior information both before and after the information text, only before the information text, or only after the information text.
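The three units of the adding module can be sketched as follows, assuming trigger words are detected by lookup in a trigger-to-category lexicon and that the prior is rendered as `<category>` and `</category>`, as in the "<交易>…</交易>" example earlier in this description. The lexicon `TRIGGER_TO_CATEGORY` and the function `add_event_prior` are hypothetical; the patent does not specify how triggers are detected or which marker is used when the prior is added only after the text.

```python
from typing import Dict

# Hypothetical trigger-word lexicon mapping a trigger to its event category.
TRIGGER_TO_CATEGORY: Dict[str, str] = {"收购": "交易"}


def add_event_prior(text: str, position: str = "both",
                    lexicon: Dict[str, str] = TRIGGER_TO_CATEGORY) -> str:
    """Detect a trigger word and add its event prior information to the text."""
    # Detection unit: the first lexicon trigger found in the text wins.
    category = next((cat for trig, cat in lexicon.items() if trig in text), None)
    if category is None:
        return text  # no trigger detected: nothing to add
    # Determining unit: render the prior as an opening and a closing marker.
    opening, closing = f"<{category}>", f"</{category}>"
    # Adding unit: before and after, before only, or after only (assumed markers).
    if position == "both":
        return opening + text + closing
    if position == "before":
        return opening + text
    if position == "after":
        return text + closing
    raise ValueError(f"unknown position: {position!r}")
```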
Optionally, in some embodiments, the event prior information includes prior information of the event category, and the first training corpora are also annotated with trigger word labeling information. Correspondingly, the prediction module 803 is specifically configured to use the first extraction model to predict the event information, argument information, and trigger word information in the text to be analyzed, to obtain a prediction result that includes event category prediction information, argument role prediction information, and trigger word prediction information. The first determining module 804 is specifically configured to determine the event category of the information text based on the event category prediction information, determine the event type of the information text based on the event category prediction information and the trigger word prediction information, and determine the argument roles included in the information text and their argument values based on the argument role prediction information.
Optionally, in other embodiments, the event prior information includes prior information of the event type and of the event category to which the event type belongs, and the prediction result includes event prediction information and argument role prediction information, where the event prediction information includes event category prediction information and event type prediction information. Correspondingly, the first determining module 804 is specifically configured to determine the event category of the information text based on the event category prediction information, determine the event type of the information text based on the event type prediction information, and determine the argument roles included in the information text and their argument values based on the argument role prediction information.
Optionally, in some embodiments, the prediction module 803 is specifically configured to use the first extraction model to apply BIO labels for event information to the event prior information in the text to be analyzed and BIO labels for argument information to the argument values in the text to be analyzed, to obtain a prediction result that includes the text to be analyzed carrying the BIO labeling information of the event information and of the argument information. Correspondingly, the first determining module 804 is specifically configured to determine the event information of the information text based on the text to be analyzed and the BIO labeling information of the event information, and to determine the argument roles included in the information text and their argument values based on the text to be analyzed and the BIO labeling information of the argument information.
Optionally, in some embodiments, the prediction module 803 is further configured to use the first extraction model to BIO-label the trigger word in the text to be analyzed, so that the prediction result also includes the BIO labeling information of the trigger word. Correspondingly, the first determining module 804 is further configured to determine the trigger word in the information text based on the text to be analyzed and the BIO labeling information of the trigger word.
Optionally, in some embodiments, the first extraction model includes a pre-trained language model such as BERT, RoBERTa, ERNIE, or another large pre-trained language model; the embodiments of the present disclosure do not limit which language model is used.
Fig. 10 shows a block diagram of an information analysis apparatus in a second exemplary embodiment of the present disclosure. As shown in fig. 10, on the basis of the embodiment shown in fig. 9, the information analysis apparatus of this embodiment may further include: a preprocessing module 805 and a training module 806. Wherein:
The preprocessing module 805 is configured to add event prior information to each of a plurality of initial corpora and to annotate each initial corpus, after the event prior information is added, with event labeling information and argument role labeling information, to obtain the first training corpora.
The training module 806 is configured to input the first training corpora into the first extraction model, so that the first extraction model learns the event information and argument information in the first training corpora and the constraint relationships between the event labeling information and the argument role labeling information.
Optionally, in some embodiments, the preprocessing module 805 is further configured to annotate the initial corpora with trigger word labeling information after the event prior information is added. Correspondingly, the training module 806 is specifically configured to input the plurality of first training corpora into the first extraction model, so that the first extraction model learns the event information, argument information, and trigger word information in the first training corpora and the constraint relationships between the event labeling information and the argument role labeling information.
Optionally, referring to fig. 10 again, on the basis of the foregoing embodiments, the information analysis apparatus may further include: an extraction module 807, a second determination module 808, and a third determination module 809. Wherein:
The extracting module 807 is configured to use the second extraction model to extract triple information about the event information, the argument information, and the relationships between them from the information text obtained by the obtaining module 801, to obtain a second extraction result. The second extraction model is trained in advance on a plurality of second training corpora annotated with subject entity labels, object entity labels, and labels for the relations between subject and object entities, where the subject entities include events and the object entities include argument roles.
A second determining module 808, configured to determine, based on the second extraction result, the event information and argument information corresponding to the information text, to obtain a first determination result.
The third determining module 809 is configured to determine an analysis result of the information text based on the first determining result and the second determining result according to a preset rule. The second determination result is event information and argument information corresponding to the information text determined by the first determination module 804 based on the prediction result.
Optionally, in some embodiments, the third determining module 809 is specifically configured to: if the first determination result is consistent with the second determination result, use either of them as the analysis result of the information text, and output the analysis result or store it in a structured database as structured information; or, if the first determination result is inconsistent with the second determination result, use the first or the second determination result, selected according to the preset rule, as the analysis result of the information text, and output it or store it in the structured database as structured information; or, if the first determination result is inconsistent with the second determination result, determine that no analysis result of the information text is obtained.
Optionally, referring to Fig. 10 again, on the basis of the foregoing embodiments, the information analysis apparatus may further include a recognition module 810 and a result processing module 811, where:
The recognition module 810 is configured to use the recognition model to identify whether a negation word exists in the information text and whether the negation word acts on a trigger word in the information text. The recognition model is trained in advance jointly on negation words and trigger words.
The result processing module 811 is configured to, when the recognition result of the recognition module 810 shows that a negation word exists in the information text and acts on a trigger word in the information text, modify the event information and argument information corresponding to the information text based on the negation word, or discard the event information and argument information corresponding to the information text.
The specific implementation of each module, unit, and subunit in the information analysis apparatus provided in the embodiment of the present disclosure may refer to the content in the information analysis method, and is not described herein again.
It should be noted that although several modules, units, and sub-units of the apparatus for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more modules, units, or sub-units described above may be embodied in a single module, unit, or sub-unit. Conversely, the features and functions of one module, unit, or sub-unit described above may be further divided among a plurality of modules, units, or sub-units.
An embodiment of the present disclosure further provides an electronic device, including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the information analysis method of any of the above embodiments via execution of the executable instructions.
Fig. 11 shows a block diagram of an electronic device in an exemplary embodiment of the disclosure.
As shown in Fig. 11, the example electronic device 90 includes a processor 901 for executing software routines. Although a single processor is shown for clarity, the electronic device 90 may also include a multi-processor system. The processor 901 is connected to a communication infrastructure 902 for communicating with other components of the electronic device 90. The communication infrastructure 902 may include, for example, a communication bus, a crossbar, or a network.
The electronic device 90 also includes memory, such as random access memory (RAM), which may include a main memory 903 and a secondary memory 910. The secondary memory 910 may include, for example, a hard disk drive 911 and/or a removable storage drive 912, which may include a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. The removable storage drive 912 reads from and/or writes to a removable storage unit 913 in a conventional manner. The removable storage unit 913 may comprise a floppy disk, magnetic tape, optical disk, etc., which is read from and written to by the removable storage drive 912. As will be appreciated by those skilled in the relevant art(s), the removable storage unit 913 includes a computer-readable storage medium having stored thereon computer-executable program code instructions and/or data.
In an alternative embodiment, the secondary memory 910 may additionally or alternatively include other similar means for allowing computer programs or other instructions to be loaded into the electronic device 90. Such means may include, for example, a removable storage unit 921 and an interface 920. Examples of the removable storage unit 921 and the interface 920 include a program cartridge and cartridge interface (such as those found in video game console devices), a removable memory chip (such as an EPROM or PROM) and associated socket, and other removable storage units 921 and interfaces 920 that allow software and data to be transferred from the removable storage unit 921 to the electronic device 90.
The electronic device 90 also includes at least one communication interface 940. The communication interface 940 allows software and data to be transferred between the electronic device 90 and external devices via a communication path 941. In various embodiments of the invention, the communication interface 940 allows data to be transferred between the electronic device 90 and a data communication network, such as a public or private data communication network. The communication interface 940 may be used to exchange data between different electronic devices 90 that form part of an interconnected computer network. Examples of the communication interface 940 may include a modem, a network interface (such as an Ethernet card), a communication port, an antenna with associated circuitry, and so forth. The communication interface 940 may be wired or wireless. Software and data transferred via the communication interface 940 are in the form of signals, which may be electronic, magnetic, optical or other signals capable of being received by the communication interface 940. These signals are provided to the communication interface 940 via the communication path 941.
As shown in fig. 11, the electronic device 90 further includes a display interface 931 to perform operations for rendering images to an associated display 930, and an audio interface 932 to perform operations for playing audio content through an associated speaker 933.
In this disclosure, the term "computer program product" may refer, in part, to: a removable storage unit 913, a removable storage unit 921, a hard disk installed in the hard disk drive 911, or a carrier wave carrying software over a communication path 941 (wireless link or cable) to the communication interface 940. Computer-readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to electronic device 90 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROMs, DVDs, Blu-ray (TM) disks, hard disk drives, ROMs, or integrated circuits, USB memory, magneto-optical disks, or a computer-readable card, such as a PCMCIA card, etc., whether internal or external to the electronic device 90. Transitory or non-tangible computer-readable transmission media may also participate in providing software, applications, instructions, and/or data to the electronic device 90, examples of such transmission media including radio or infrared transmission channels, network connections to another computer or another networked device, and the internet or intranet including e-mail transmissions and information recorded on websites and the like.
Computer programs (also called computer program code) are stored in the main memory 903 and/or the secondary memory 910. Computer programs may also be received via the communication interface 940. Such computer programs, when executed, enable the electronic device 90 to perform one or more features of the embodiments discussed herein. In various embodiments, the computer programs, when executed, enable the processor 901 to perform the features of the embodiments described above. Accordingly, such computer programs represent controllers of the electronic device 90.
The software may be stored in a computer program product and loaded into the electronic device 90 using the removable storage drive 912, the hard disk drive 911 or the interface 920. Alternatively, the computer program product may be downloaded to the electronic device 90 via the communication path 941. The software, when executed by the processor 901, causes the electronic device 90 to perform the functions of the embodiments described herein.
It should be understood that the embodiment of fig. 11 is given by way of example only. Accordingly, in some embodiments, one or more features of the electronic device 90 may be omitted. Also, in some embodiments, one or more features of the electronic device 90 may be combined together. Additionally, in some embodiments, one or more features of the electronic device 90 may be separated into one or more components.
It will be appreciated that the elements shown in fig. 11 function to provide a means for performing the various functions and operations of the server described in the above embodiments.
In one embodiment, a server may be generally described as a physical device including at least one processor and at least one memory including computer program code. The at least one memory and the computer program code are configured to, with the at least one processor, cause the physical device to perform necessary operations.
The disclosed embodiments also provide a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the functions of the information analysis method shown in fig. 2-8.
The disclosed embodiments also provide a computer program comprising computer-readable code which, when run on a device, causes a processor in the device to perform the functions of the information analysis method shown in Figs. 2-8.
Computer-readable media, including both persistent and non-persistent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by an electronic device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
From the above description of the embodiments, it is clear to those skilled in the art that the embodiments of the present disclosure can be implemented by software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present specification.
The basic principles of the present invention have been described above with reference to specific embodiments. It should be noted, however, that the advantages, effects, etc. mentioned in the present invention are merely examples and not limitations, and must not be regarded as necessarily possessed by every embodiment of the present invention. Furthermore, the specific details disclosed above are for the purpose of illustration and description only and are not limiting; the invention is not limited to being implemented with those specific details.
In the present specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be cross-referenced. Since the system embodiment basically corresponds to the method embodiment, its description is relatively brief; for relevant points, reference may be made to the corresponding description of the method embodiment.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by program instructions directing the relevant hardware.
The method and apparatus of the present invention may be implemented in a number of ways. For example, the methods and apparatus of the present invention may be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustrative purposes only, and the steps of the method of the present invention are not limited to the order specifically described above unless specifically indicated otherwise. Furthermore, in some embodiments, the present invention may also be embodied as a program recorded in a recording medium, the program including machine-readable instructions for implementing a method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (16)

1. An information analysis method, comprising:
acquiring an information text;
adding event prior information corresponding to the trigger word to the information text based on the trigger word in the information text to obtain a text to be analyzed;
predicting event information and argument information in the text to be analyzed by using a first extraction model to obtain a prediction result; the first extraction model is obtained by training in advance based on a plurality of first training corpuses, and the first training corpuses are marked with event marking information and argument role marking information;
and determining event information and argument information corresponding to the information text based on the prediction result.
2. The method according to claim 1, wherein adding event prior information corresponding to a trigger word to the information text based on the trigger word in the information text comprises:
detecting a trigger word in the information text;
determining event prior information corresponding to the trigger word;
and respectively adding the event prior information before and after the information text, or adding the event prior information before the information text, or adding the event prior information after the information text.
3. The method of claim 1 or 2, wherein the event prior information comprises: prior information of the event category;
the first training corpus is also marked with trigger word marking information;
the predicting the event information and the argument information in the text to be analyzed by using the first extraction model to obtain a prediction result comprises:
predicting event information, argument information and trigger word information in the text to be analyzed by using the first extraction model to obtain a prediction result, wherein the prediction result comprises event category prediction information, argument role prediction information and trigger word prediction information;
the determining event information and argument information corresponding to the information text based on the prediction result includes:
determining an event category corresponding to the information text based on the event category prediction information, determining an event type corresponding to the information text based on the event category prediction information and the trigger word prediction information, and determining an argument role included in the information text and an argument value of the argument role based on the argument role prediction information.
4. The method of claim 1 or 2, wherein the event prior information comprises: event type and prior information of event category to which the event type belongs;
the prediction result comprises event prediction information and argument role prediction information; the event prediction information comprises event type prediction information and event category prediction information;
the determining event information and argument information corresponding to the information text based on the prediction result includes:
determining an event type corresponding to the information text based on the event type prediction information, and determining an argument role included in the information text and an argument value of the argument role based on the argument role prediction information.
5. The method according to any one of claims 1 to 4, wherein the predicting, by using the first extraction model, event information and argument information in the text to be analyzed to obtain a prediction result comprises:
performing BIO labeling of event information on event prior information in the text to be analyzed by using the first extraction model, and performing BIO labeling of argument information on argument values in the text to be analyzed to obtain a prediction result, wherein the prediction result comprises the text to be analyzed carrying the BIO labeling information of the event information and the BIO labeling information of the argument information;
the determining event information and argument information corresponding to the information text based on the prediction result includes:
determining event information corresponding to the information text based on the text to be analyzed and the BIO labeling information of the event information; and determining an argument role and an argument value of the argument role included in the information text based on the text to be analyzed and the BIO labeling information of the argument information.
6. The method according to claim 5, wherein the predicting, by using the first extraction model, event information and argument information in the text to be analyzed to obtain a prediction result, further comprises:
performing BIO labeling of the trigger words in the text to be analyzed by using the first extraction model, wherein the prediction result further comprises BIO labeling information of the trigger words;
the determining the event information and the argument information corresponding to the information text based on the prediction result further includes:
and determining the trigger word in the information text based on the text to be analyzed and the BIO labeling information of the trigger word.
7. The method of any of claims 1-6, wherein the first extraction model comprises a pre-trained language model.
8. The method according to any one of claims 1 to 7, wherein training in advance based on the plurality of first training corpora to obtain the first extraction model comprises:
adding event prior information to each of a plurality of initial corpora, and annotating event labeling information and argument role labeling information on each initial corpus to which the event prior information has been added, to obtain the first training corpora;
and inputting the first training corpora into the first extraction model respectively, so that the first extraction model learns the event information and argument information in the first training corpora and the specification relationship between the event labeling information and the argument role labeling information.
9. The method according to claim 8, wherein training in advance based on the plurality of first training corpora to obtain the first extraction model further comprises:
annotating trigger word labeling information on the initial corpora to which the event prior information has been added;
the inputting the first training corpora into the first extraction model respectively, so that the first extraction model learns the event information and argument information in the first training corpora and the specification relationship between the event labeling information and the argument role labeling information, comprises:
inputting the first training corpora into the first extraction model respectively, so that the first extraction model learns the event information, argument information and trigger word information in the first training corpora and the specification relationship between the event labeling information and the argument role labeling information.
10. The method of any of claims 1-9, further comprising:
extracting, by using a second extraction model, triple information from the information text, the triple comprising the event information, the argument information, and the relationship between the event information and the argument information, to obtain a second extraction result; the second extraction model is obtained in advance by training based on a plurality of second training corpora, the second training corpora are annotated with subject entity labeling information, object entity labeling information and labeling information of the relationship between the subject entity and the object entity, the subject entity comprises an event, and the object entity comprises an argument role;
determining event information and argument information corresponding to the information text based on the second extraction result to obtain a first determination result;
determining an analysis result of the information text based on the first determination result and the second determination result according to a preset rule; and the second determination result is event information and argument information corresponding to the information text determined based on the prediction result.
11. The method according to claim 10, wherein the determining the analysis result of the information text based on the first determination result and the second determination result according to a preset rule comprises:
if the first determination result is consistent with the second determination result, taking either the first determination result or the second determination result as the analysis result of the information text, and outputting the analysis result of the information text or storing it in a structured database as structured information; or
if the first determination result is inconsistent with the second determination result, taking whichever of the first determination result and the second determination result is selected according to the preset rule as the analysis result of the information text, and outputting the analysis result of the information text or storing it in a structured database as structured information; or
and if the first determination result is inconsistent with the second determination result, determining that the analysis result of the information text is not obtained.
12. The method of any of claims 1-11, further comprising:
identifying, by using an identification model, whether a negative word exists in the information text and whether the negative word acts on a trigger word in the information text; the identification model is obtained in advance by joint training on negative words and trigger words;
if a negative word exists in the information text and acts on a trigger word in the information text, correcting event information and argument information corresponding to the information text based on the negative word; or discarding the event information and the argument information corresponding to the information text.
13. An information analysis apparatus, characterized by comprising:
the acquisition module is used for acquiring the information text;
the adding module is used for adding event prior information corresponding to the trigger word to the information text based on the trigger word in the information text to obtain a text to be analyzed;
the prediction module is used for predicting the event information and the argument information in the text to be analyzed by utilizing a first extraction model to obtain a prediction result; the first extraction model is obtained by training in advance based on a plurality of first training corpuses, and the first training corpuses are marked with event marking information and argument role marking information;
and the first determining module is used for determining the event information and the argument information corresponding to the information text based on the prediction result.
14. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the information analysis method of any one of claims 1-12 via execution of the executable instructions.
15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the information analysis method of any one of claims 1 to 12.
16. A computer program comprising computer-readable code, characterized in that, when the computer-readable code is run on a device, a processor in the device executes instructions for implementing the information analysis method according to any one of claims 1-12.
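To make the claimed pipeline more concrete, the following Python sketch illustrates three of the steps recited above: adding event prior information around the information text (claim 2), converting BIO labels into event and argument information (claim 5), and reconciling the first and second determination results (claim 11). It is a minimal sketch under stated assumptions, not the patented implementation: the lexicon `PRIOR_TABLE`, the labels `EVENT`, `ACQUIRER` and `TARGET`, whitespace tokenization, and the `prefer` parameter (standing in for the unspecified "preset rule") are all hypothetical, and the hard-coded tag sequence replaces the first extraction model, which is not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical trigger-word -> event prior information lexicon (illustrative;
# the patent does not publish the table it uses).
PRIOR_TABLE: Dict[str, List[str]] = {
    "acquired": ["acquisition"],
    "sued": ["litigation"],
}


def add_prior_info(tokens: List[str], position: str = "both") -> List[str]:
    """Claim 2 sketch: detect a trigger word and add its event prior
    information before and/or after the information text."""
    for trigger, prior in PRIOR_TABLE.items():
        if trigger in tokens:  # first matching trigger only (an assumption)
            if position == "before":
                return prior + tokens
            if position == "after":
                return tokens + prior
            return prior + tokens + prior
    return list(tokens)  # no trigger detected: text left unchanged


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Claim 5 sketch: turn per-token BIO tags into (label, span text) pairs."""
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    current: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if label is not None:  # close the span that is still open
                spans.append((label, " ".join(current)))
            label, current = tag[2:], [token]
        elif tag.startswith("I-") and tag[2:] == label:
            current.append(token)  # continue the open span
        else:
            # "O", or an I- tag that does not continue the open span:
            # close any open span and skip this token.
            if label is not None:
                spans.append((label, " ".join(current)))
            label, current = None, []
    if label is not None:
        spans.append((label, " ".join(current)))
    return spans


def to_result(spans: List[Tuple[str, str]]) -> Dict[str, object]:
    """Collect the EVENT span and the argument role -> argument value map."""
    events = [text for lab, text in spans if lab == "EVENT"]
    arguments = {lab: text for lab, text in spans if lab != "EVENT"}
    return {"event": events[0] if events else None, "arguments": arguments}


def merge_results(first: Dict[str, object], second: Dict[str, object],
                  prefer: Optional[str] = "second") -> Optional[Dict[str, object]]:
    """Claim 11 sketch: reconcile the two determination results. `prefer`
    stands in for the unspecified preset rule; None means no result."""
    if first == second:
        return second  # consistent: either result may be used
    if prefer == "first":
        return first
    if prefer == "second":
        return second
    return None  # inconsistent and no preference: no analysis result


if __name__ == "__main__":
    tokens = add_prior_info("Alpha Corp acquired Beta Ltd".split())
    # Stand-in for the first extraction model's BIO output (hypothetical).
    tags = ["B-EVENT", "B-ACQUIRER", "I-ACQUIRER", "O",
            "B-TARGET", "I-TARGET", "O"]
    print(to_result(decode_bio(tokens, tags)))
    # {'event': 'acquisition', 'arguments': {'ACQUIRER': 'Alpha Corp', 'TARGET': 'Beta Ltd'}}
```

Placing the prior information both before and after the text corresponds to the first option in claim 2; the `position` argument selects the other two options. In a real system the tags would come from the trained first extraction model (for example, the pre-trained language model of claim 7) rather than being written by hand.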
CN202110104560.1A 2021-01-26 2021-01-26 Information analysis method and device, electronic equipment and computer readable storage medium Active CN112860852B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110104560.1A CN112860852B (en) 2021-01-26 2021-01-26 Information analysis method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN112860852A 2021-05-28
CN112860852B 2024-03-08

Family

ID=76009273

Country Status (1)

Country Link
CN (1) CN112860852B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582949A (en) * 2018-09-14 2019-04-05 阿里巴巴集团控股有限公司 Event element abstracting method, calculates equipment and storage medium at device
CN110704598A (en) * 2019-09-29 2020-01-17 北京明略软件系统有限公司 Statement information extraction method, extraction device and readable storage medium
CN111222305A (en) * 2019-12-17 2020-06-02 共道网络科技有限公司 Information structuring method and device
CN111597817A (en) * 2020-05-27 2020-08-28 北京明略软件系统有限公司 Event information extraction method and device
CN111967268A (en) * 2020-06-30 2020-11-20 北京百度网讯科技有限公司 Method and device for extracting events in text, electronic equipment and storage medium
WO2020247616A1 (en) * 2019-06-07 2020-12-10 Raytheon Bbn Technologies Corp. Linguistically rich cross-lingual text event embeddings
CN112116075A (en) * 2020-09-18 2020-12-22 厦门安胜网络科技有限公司 Event extraction model generation method and device and text event extraction method and device
CN112149386A (en) * 2020-09-25 2020-12-29 杭州中软安人网络通信股份有限公司 Event extraction method, storage medium and server

Non-Patent Citations (2)

Title
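Liu Zhen (刘振): "Research on Event Extraction Based on Network Science and Technology Information", Information Science (情报科学), no. 09, pages 117-119 *
Li Peifeng, Zhou Guodong, Zhu Qiaoming (李培峰, 周国栋, 朱巧明): "A Semantics-Based Joint Model for Chinese Event Trigger Extraction", Journal of Software (软件学报), no. 02, pages 90-104 *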

Cited By (10)

Publication number Priority date Publication date Assignee Title
WO2022262080A1 (en) * 2021-06-17 2022-12-22 腾讯云计算(北京)有限责任公司 Dialogue relationship processing method, computer and readable storage medium
CN113434631A (en) * 2021-06-25 2021-09-24 平安科技(深圳)有限公司 Emotion analysis method and device based on event, computer equipment and storage medium
WO2022267460A1 (en) * 2021-06-25 2022-12-29 平安科技(深圳)有限公司 Event-based sentiment analysis method and apparatus, and computer device and storage medium
CN113434631B (en) * 2021-06-25 2023-10-13 平安科技(深圳)有限公司 Emotion analysis method and device based on event, computer equipment and storage medium
CN113761875A (en) * 2021-07-21 2021-12-07 中国科学院自动化研究所 Event extraction method and device, electronic equipment and storage medium
CN113609391A (en) * 2021-08-06 2021-11-05 北京金堤征信服务有限公司 Event recognition method and apparatus, electronic device, medium, and program
CN113609391B (en) * 2021-08-06 2024-04-19 北京金堤征信服务有限公司 Event recognition method and device, electronic equipment, medium and program
CN113761936A (en) * 2021-08-19 2021-12-07 哈尔滨工业大学(威海) Multi-task chapter-level event extraction method based on multi-head self-attention mechanism
CN113779159A (en) * 2021-08-19 2021-12-10 北京三快在线科技有限公司 Model training method, argument detecting device, electronic equipment and storage medium
CN114065763A (en) * 2021-11-24 2022-02-18 深圳前海环融联易信息科技服务有限公司 Event extraction-based public opinion analysis method and device and related components

Similar Documents

Publication Publication Date Title
CN112860852B (en) Information analysis method and device, electronic equipment and computer readable storage medium
US11334635B2 (en) Domain specific natural language understanding of customer intent in self-help
CN108536852B (en) Question-answer interaction method and device, computer equipment and computer readable storage medium
US20170308523A1 (en) A method and system for sentiment classification and emotion classification
US9710829B1 (en) Methods, systems, and articles of manufacture for analyzing social media with trained intelligent systems to enhance direct marketing opportunities
CN110287405B (en) Emotion analysis method, emotion analysis device and storage medium
CN108388650B (en) Search processing method and device based on requirements and intelligent equipment
CN111680159A (en) Data processing method and device and electronic equipment
CN113032520A (en) Information analysis method and device, electronic equipment and computer readable storage medium
CN112686022A (en) Method and device for detecting illegal corpus, computer equipment and storage medium
WO2020199600A1 (en) Sentiment polarity analysis method and related device
CN110399473B (en) Method and device for determining answers to user questions
KR20200041199A (en) Method, apparatus and computer-readable medium for operating chatbot
CN108268602A Method, apparatus and device for analyzing topic points of text, and computer storage medium
KR102098003B1 (en) Method, apparatus and computer-readable medium for operating chatbot
Das A multimodal approach to sarcasm detection on social media
CN113705207A (en) Grammar error recognition method and device
CN116756281A (en) Knowledge question-answering method, device, equipment and medium
Samuel et al. The dark side of sentiment analysis: An exploratory review using lexicons, dictionaries, and a statistical monkey and chimp
CN110705308A (en) Method and device for recognizing field of voice information, storage medium and electronic equipment
CN113392213B (en) Event extraction method, electronic equipment and storage device
CN115719058A (en) Content analysis method, electronic equipment and storage medium
CN112784015B (en) Information identification method and device, apparatus, medium, and program
CN114579876A (en) False information detection method, device, equipment and medium
CN113722487A (en) User emotion analysis method, device and equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant