CN113609391A - Event recognition method and apparatus, electronic device, medium, and program - Google Patents

Event recognition method and apparatus, electronic device, medium, and program

Info

Publication number
CN113609391A
Authority
CN
China
Prior art keywords
event type
event
information text
training
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110902349.4A
Other languages
Chinese (zh)
Other versions
CN113609391B (en)
Inventor
刘文强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jindi Credit Service Co ltd
Original Assignee
Beijing Jindi Credit Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jindi Credit Service Co ltd
Priority to CN202110902349.4A
Publication of CN113609391A
Application granted
Publication of CN113609391B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present disclosure provides an event recognition method and apparatus, an electronic device, a medium, and a program. The method includes: acquiring an information text; predicting, by using a pre-trained event type recognition model, the probability value of whether the information text is of each event type in an event type list, to obtain a first prediction result, where the event type list comprises a plurality of preset event types; and determining the event type corresponding to the information text based on the first prediction result. With this technical scheme, event recognition can be performed effectively on the information text, and the accuracy of recognizing the event type corresponding to the information text is improved.

Description

Event recognition method and apparatus, electronic device, medium, and program
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to an event recognition method and apparatus, an electronic device, a medium, and a program.
Background
Public opinion refers to the social attitudes held by the public, as the subject, toward social managers, enterprises, individuals, and other organizations, as the object, and toward their political, social, and moral orientations, formed around the occurrence, development, and change of social events within a certain social space.
With the rapid development of internet technology, the growth and flexibility of the network have made it one of the main carriers of public opinion. By extracting information from an enterprise's public opinion news and storing it in structured form, users can conveniently obtain comprehensive public opinion information about the enterprises they follow, analyze that information, judge the development trend of an enterprise accurately, and further generate public opinion reports and various statistical reports to support decision making.
In the prior art, when event recognition is performed on the public opinion information of an enterprise, preset event keywords found in the public opinion information are directly extracted as events. In the process of implementing the present disclosure, the inventor found through research that, because the set of preset event keywords is limited, some public opinion information contains none of the preset event keywords, in which case effective event recognition cannot be performed on that public opinion information.
Disclosure of Invention
An object of the present disclosure is to provide an event recognition method and apparatus, an electronic device, a medium, and a program, thereby improving accuracy of event recognition on public opinion information at least to a certain extent.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided an event recognition method, including:
acquiring an information text;
predicting, by using a pre-trained event type recognition model, the probability value of whether the information text is of each event type in an event type list, to obtain a first prediction result, wherein the event type list comprises a plurality of preset event types;
and determining the event type corresponding to the information text based on the first prediction result.
Optionally, in an exemplary embodiment of the present disclosure, the first prediction result includes: for each event type in the event type list, the probability value that the information text is of that event type and the probability value that the information text is not of that event type;
determining the event type corresponding to the information text based on the first prediction result comprises:
acquiring, as the event type corresponding to the information text, each event type in the first prediction result for which the probability value that the information text is of that event type is greater than the probability value that it is not; or,
acquiring, as the event type corresponding to the information text, each event type in the first prediction result for which the probability value that the information text is of that event type is greater than a first preset threshold; or,
acquiring, as the event types corresponding to the information text, the N event types in the first prediction result with the highest probability values that the information text is of those event types, wherein N is an integer greater than 0.
Optionally, in an exemplary embodiment of the present disclosure, after determining an event type corresponding to the information text, the method further includes:
determining an event category corresponding to the information text according to a preset correspondence between the event types in the event type list and the event categories in an event category list, wherein the event category list comprises a plurality of preset event categories.
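The lookup from event type to event category described above can be sketched as a dictionary access. This is a minimal illustration only: the category names and the entries of `TYPE_TO_CATEGORY` are hypothetical, since the source does not list the actual correspondence, and `categories_for` is an illustrative helper name.

```python
# Hypothetical correspondence between event types and event categories.
# The real preset lists are not given in the source.
TYPE_TO_CATEGORY = {
    "violation": "legal risk",
    "legal dispute": "legal risk",
    "product promotion": "business operation",
}


def categories_for(event_types, mapping=TYPE_TO_CATEGORY):
    """Return the distinct categories of the given event types, in first-seen order."""
    seen = []
    for event_type in event_types:
        category = mapping.get(event_type)
        # Skip types without a preset category and avoid duplicates.
        if category is not None and category not in seen:
            seen.append(category)
    return seen


print(categories_for(["violation", "legal dispute"]))  # ['legal risk']
```

Several event types may share one category, which is why the helper de-duplicates its result.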
Optionally, in an exemplary embodiment of the present disclosure, the method further includes:
predicting, by using a pre-trained event category recognition model, the probability value of whether the information text is of each event category in an event category list, to obtain a second prediction result, wherein the event category list comprises a plurality of preset event categories;
determining an event category corresponding to the information text based on the second prediction result;
determining whether the event type and the event category corresponding to the information text conform to a preset correspondence between the event types in the event type list and the event categories in the event category list;
and if they conform to the correspondence, outputting the event type and/or the event category corresponding to the information text.
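The cross-check between the two models' outputs can be sketched as follows. This is an illustration under stated assumptions: the mapping entries and category names are hypothetical, and `consistent_pairs` is an illustrative helper that keeps a predicted event type only when its preset category was also predicted by the category model.

```python
# Hypothetical preset correspondence; real entries are not given in the source.
TYPE_TO_CATEGORY = {
    "violation": "legal risk",
    "safety accident": "operational risk",
}


def consistent_pairs(event_types, event_categories, mapping=TYPE_TO_CATEGORY):
    """Keep only (type, category) pairs that agree with the preset correspondence."""
    predicted_categories = set(event_categories)
    return [
        (event_type, mapping[event_type])
        for event_type in event_types
        if event_type in mapping and mapping[event_type] in predicted_categories
    ]


# The type model predicted two types, the category model one category.
print(consistent_pairs(["violation", "safety accident"], ["legal risk"]))
# [('violation', 'legal risk')]
```

An empty result means the two predictions disagree, in which case nothing would be output under this sketch.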
Optionally, in an exemplary embodiment of the present disclosure, after the obtaining the information text, the method further includes:
identifying whether the length of the information text is greater than a preset length;
if the length of the information text is greater than the preset length, dividing the information text into a plurality of text segments in units of the preset length;
in this case, predicting, by using the pre-trained event type recognition model, the probability value of whether the information text is of each event type in the event type list, to obtain the first prediction result, comprises:
predicting, for each text segment by using the pre-trained event type recognition model, the probability value of whether the text segment is of each event type in the event type list;
obtaining the first prediction result based on the probability values of whether each text segment is of each event type in the event type list;
otherwise, if the length of the information text is less than or equal to the preset length, directly performing the operation of predicting, by using the pre-trained event type recognition model, the probability value of whether the information text is of each event type in the event type list, to obtain the first prediction result.
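The segmentation step can be sketched as below. The splitting follows the text; how per-segment probabilities are combined into the first prediction result is not specified in the source, so taking the maximum per event type in `merge_segment_probs` is an assumption made only for illustration. Both function names are hypothetical.

```python
def split_text(text, preset_length):
    """Split text into consecutive segments of at most preset_length characters."""
    if preset_length <= 0:
        raise ValueError("preset_length must be positive")
    if len(text) <= preset_length:
        # Short texts are passed to the model unchanged.
        return [text]
    return [text[i:i + preset_length] for i in range(0, len(text), preset_length)]


def merge_segment_probs(segment_probs):
    """Combine per-segment probabilities by taking the maximum per event type.

    The merge rule is an assumption; the source only says the first
    prediction result is obtained "based on" the per-segment values.
    """
    merged = {}
    for probs in segment_probs:
        for event_type, p in probs.items():
            merged[event_type] = max(p, merged.get(event_type, 0.0))
    return merged


print(split_text("abcdefg", 3))  # ['abc', 'def', 'g']
print(merge_segment_probs([{"violation": 0.2}, {"violation": 0.7}]))  # {'violation': 0.7}
```

A maximum favours recall: an event mentioned in only one segment still surfaces in the merged result. An average would be a more conservative alternative.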
Optionally, in an exemplary embodiment of the present disclosure, after the obtaining the information text, the method further includes:
identifying whether the information text contains words from a preset garbage corpus word set;
if the information text contains words from the preset garbage corpus word set, either not performing the subsequent operations, or filtering out those words and then, for the filtered information text, performing the operation of predicting, by using the pre-trained event type recognition model, the probability value of whether the information text is of each event type in the event type list, to obtain the first prediction result;
otherwise, if the information text contains no words from the preset garbage corpus word set, performing the operation of predicting, by using the pre-trained event type recognition model, the probability value of whether the information text is of each event type in the event type list, to obtain the first prediction result.
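Both branches of the garbage-word handling (discard the text, or filter the words and continue) can be sketched with plain substring matching. The function name, the `drop_text` flag, and the sample garbage words are hypothetical; a production system for Chinese text would more likely match on tokenized words.

```python
def filter_garbage(text, garbage_words, drop_text=False):
    """Handle garbage words in an information text.

    Returns None when drop_text is True and a garbage word is present
    (no further processing), otherwise the text with garbage words removed.
    """
    hits = [word for word in garbage_words if word and word in text]
    if not hits:
        return text
    if drop_text:
        return None
    # Remove longer entries first so overlapping entries behave predictably.
    for word in sorted(hits, key=len, reverse=True):
        text = text.replace(word, "")
    return text


print(filter_garbage("Click to subscribe. Company A was fined.", ["Click to subscribe. "]))
# Company A was fined.
```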
Optionally, in an exemplary embodiment of the present disclosure, after the obtaining the information text, the method further includes:
performing entity recognition on the information text;
performing relevance analysis between each recognized entity and the information text to obtain a relevance analysis result;
in this case, predicting, by using the pre-trained event type recognition model, the probability value of whether the information text is of each event type in the event type list, to obtain the first prediction result, comprises:
according to the relevance analysis result, performing, for an entity with high relevance, the operation of predicting, by using the pre-trained event type recognition model, the probability value of whether the information text is of each event type in the event type list, to obtain the first prediction result.
Optionally, in an exemplary embodiment of the present disclosure, high relevance includes any one or more of the following: the entity occurs with high frequency; the entity occurs many times in the first-person perspective.
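The frequency criterion for relevance can be sketched as a mention count. The sketch assumes the entities have already been recognized by an upstream step that is not shown, uses a hypothetical `min_count` cutoff to stand in for "high frequency", and does not attempt the first-person-perspective criterion.

```python
def most_relevant_entities(text, entities, min_count=2):
    """Return entities mentioned at least min_count times, most frequent first."""
    # str.count counts non-overlapping occurrences of each entity name.
    counts = {entity: text.count(entity) for entity in entities if entity}
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [entity for entity, count in ranked if count >= min_count]


text = "Company A released the A10. Company A is ready. Company B watched."
print(most_relevant_entities(text, ["Company A", "Company B"]))  # ['Company A']
```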
Optionally, in an exemplary embodiment of the present disclosure, the training of the event type recognition model includes:
inputting each of a plurality of event type pre-training corpora, together with its event type label information, into the event type recognition model, so that the event type recognition model learns the event type information corresponding to each event type pre-training corpus;
inputting each of a plurality of first training corpora, together with its event type label information, into the event type recognition model, and outputting, through the event type recognition model, the probability value of whether each first training corpus is of each event type in the event type list;
and training the event type recognition model based on the probability values of whether the plurality of first training corpora are of each event type in the event type list and the probability values corresponding to the respective event type label information.
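The source does not name the loss function used to compare predicted probability values with the labelled values. One common choice for independent per-type predictions is binary cross-entropy, sketched below as an assumption rather than as the method of the disclosure; `binary_cross_entropy` is an illustrative name.

```python
import math


def binary_cross_entropy(predicted, labels, eps=1e-12):
    """Mean binary cross-entropy over event types; each label is 0 or 1."""
    if not predicted or len(predicted) != len(labels):
        raise ValueError("predicted and labels must be non-empty and equally long")
    total = 0.0
    for p, y in zip(predicted, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(predicted)


# One training corpus labelled "violation" (1) and not "legal dispute" (0).
loss = binary_cross_entropy([0.78, 0.04], [1, 0])
print(round(loss, 4))
```

Because each event type contributes its own term, a corpus can carry several positive labels at once, which matches the non-exclusive treatment of event types in the text.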
Optionally, in an exemplary embodiment of the present disclosure, the training of the event category recognition model includes:
inputting each of a plurality of event category pre-training corpora, together with its event category label information, into the event category recognition model, so that the event category recognition model learns the event category information corresponding to each event category pre-training corpus;
inputting each of a plurality of second training corpora, together with its event category label information, into the event category recognition model, and outputting, through the event category recognition model, the probability value of whether each second training corpus is of each event category in the event category list;
and training the event category recognition model based on the probability values of whether the plurality of second training corpora are of each event category in the event category list and the probability values corresponding to the respective event category label information.
According to a second aspect of the present disclosure, there is provided an event recognition apparatus comprising:
the text acquisition module is used for acquiring the information text;
the first prediction module is used for predicting, by using a pre-trained event type recognition model, the probability value of whether the information text is of each event type in an event type list, to obtain a first prediction result, wherein the event type list comprises a plurality of preset event types;
and the first determining module is used for determining the event type corresponding to the information text based on the first prediction result.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the above-described event recognition method via execution of executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described event recognition method.
According to a fifth aspect of the present disclosure, there is provided a computer program comprising computer readable code which, when run on a device, is executed by a processor in the device for implementing the above-mentioned event recognition method.
As can be seen from the foregoing technical solutions, the event recognition method and apparatus, the electronic device, the computer-readable storage medium, and the computer program in the exemplary embodiments of the present disclosure have at least the following advantages and positive effects:
The event recognition method and apparatus, the electronic device, the medium, and the program in the embodiments of the present disclosure acquire an information text; predict, by using a pre-trained event type recognition model, the probability value of whether the information text is of each event type in an event type list, to obtain a first prediction result, where the event type list includes a plurality of preset event types; and then determine the event type corresponding to the information text based on the first prediction result. The embodiments of the present disclosure can thus perform effective event recognition on information texts (such as public opinion information). Because the probability value of whether the information text is of each event type in the event type list can be accurately predicted, the event type corresponding to the information text can be accurately determined, which improves the accuracy of recognizing the event type corresponding to the information text. In addition, because a probability value is predicted separately for each event type in the event type list, rather than a single event type being predicted directly, the event types are independent of one another and not mutually exclusive; this enables recognition of multiple event types as well as recognition of the event types of the complete information in the information text.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 illustrates a system architecture diagram to which embodiments of the present disclosure may be applied;
FIG. 2 shows a schematic flow chart of an event recognition method in a first exemplary embodiment of the present disclosure;
FIG. 3 shows a flow diagram of an event recognition method in a second exemplary embodiment of the present disclosure;
FIG. 4 shows a flow diagram of an event recognition method in a third exemplary embodiment of the present disclosure;
FIG. 5 shows a schematic flow chart of an event recognition method in a fourth exemplary embodiment of the present disclosure;
FIG. 6 shows a schematic flow chart of an event recognition method in a fifth exemplary embodiment of the present disclosure;
FIG. 7 shows a block diagram of an event recognition device in a first exemplary embodiment of the present disclosure;
FIG. 8a shows a block diagram of an event recognition device in a second exemplary embodiment of the present disclosure;
FIG. 8b shows a block diagram of an event recognition device in a third exemplary embodiment of the present disclosure;
FIG. 9 shows a block diagram of an electronic device in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, apparatus, steps, etc. In other instances, well-known structures, methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present disclosure, "a plurality" means at least two, e.g., two, three, etc., unless explicitly specifically limited otherwise. The symbol "/" generally indicates that the former and latter associated objects are in an "or" relationship.
In the present disclosure, unless otherwise expressly specified or limited, the terms "connected" and the like are to be construed broadly, e.g., as meaning electrically connected or in communication with each other; may be directly connected or indirectly connected through an intermediate. The specific meaning of the above terms in the present disclosure can be understood by those of ordinary skill in the art as appropriate.
FIG. 1 shows a system architecture diagram to which embodiments of the present disclosure may be applied. As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The terminal devices 101, 102, 103 may be various electronic devices having display screens including, but not limited to, smart phones, tablets, portable computers, desktop computers, digital cinema projectors, and the like.
The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as: a wired, wireless communication link, or fiber optic cable, etc.
The server 105 may be a server that provides various information texts, such as a server of a website, a self-media platform, or a database. In some embodiments, the user uses the terminal device 103 (or the terminal device 101 or 102) to obtain the information text from the server 105 in real time or periodically, executes the event recognition method of the embodiments of the present disclosure to obtain the event type corresponding to the information text, and stores the event type in a database as structured information for subsequent analysis and use.
Fig. 2 shows a flowchart of an event recognition method in a first exemplary embodiment of the present disclosure. The present embodiment can be applied to an electronic device, and as shown in fig. 2, the event identification method of the present embodiment includes the following steps:
Step 201, acquiring an information text.
The characters in the information text in the embodiments of the present disclosure may be Chinese characters, English characters, digits, or characters of any other type. In addition, the information text may be a text in any field; the embodiments of the present disclosure do not limit the content or the field of the information text.
In some embodiments, the information text may be a public opinion news text of an enterprise. The public opinion news text may be an original public opinion news text or a text obtained by preprocessing the original, where the preprocessing may be, for example, removing emoticons, erroneous punctuation marks, and the like from the original text. The embodiments of the present disclosure do not limit the specific content or representation of the public opinion news text, whether it is preprocessed, or the specific manner of preprocessing. For example, an enterprise's public opinion news text may be: "A10 series released today: expected specifications, price in Europe, and how to watch live. Company A is well prepared and is now ready to launch the promising A10 series worldwide. The A10 series would replace the A9 series and likely include A10, A10Pro, and A10 ProMax. It is expected that these devices will have new designs, improved cameras, upgraded specifications, etc."
The information text in the embodiment of the present disclosure, for example, the public opinion news text of the above-mentioned enterprise, is unstructured information.
In the embodiment of the present disclosure, the information text may be acquired from each website, forum, self-media platform, or the like in real time or periodically, or the information text input by the user may also be received.
Step 202, predicting, by using a pre-trained event type recognition model, the probability value of whether the information text is of each event type in the event type list, to obtain a first prediction result.
The event type list includes a plurality of preset event types.
Here, an event is a specific occurrence involving participants and can often be described as a change of state.
Step 203, determining the event type corresponding to the information text based on the first prediction result.
In this embodiment, an information text is acquired; the probability value of whether the information text is of each event type in the event type list is predicted by using a pre-trained event type recognition model, to obtain a first prediction result, where the event type list comprises a plurality of preset event types; and the event type corresponding to the information text is then determined based on the first prediction result. The embodiments of the present disclosure can thus perform effective event recognition on information texts (such as public opinion information). Because the probability value of whether the information text is of each event type in the event type list can be accurately predicted, the event type corresponding to the information text can be accurately determined, which improves the accuracy of recognizing the event type corresponding to the information text.
In addition, because a probability value is predicted separately for each event type in the event type list, rather than a single event type being predicted directly, the event types are independent of one another and not mutually exclusive; this enables recognition of multiple event types as well as recognition of the event types of the complete information in the information text.
Optionally, in some of these embodiments, the first prediction result may include: the information text is the probability value of each event type in the event type list and the probability value of each event type in the event type list. For example, in the first prediction result, the information text is a probability value of an event type of "violation" in the event type list and a probability value of an event type of "violation" in the event type list.
Optionally, in some embodiments, in step 203, for each event type in the event type list, an event type in the first prediction result, where the probability value of the event type in the event type list is greater than the probability value of the event type in the event type list, may be obtained as the event type corresponding to the information text. For example, in the first prediction result, the probability value of the event type that the information text is "violation" in the event type list is 0.7839257717132568, and the probability value of the event type that the information text is not "violation" is 0.21607419848442078; the probability value of the event type that is "legal dispute" in the event type list is 0.04098828509449959, and the probability value of the event type that is not "legal dispute" is 0.9590117335319519. Since the probability value that the information text is the event type of 'illegal violation' in the event type list is greater than the probability value that the information text is not the event type of 'illegal violation' in the event type list, the event type corresponding to the information text is 'illegal violation'.
Or, in another embodiment, in step 203, for each event type in the event type list, an event type in the first prediction result, where the probability value of the event type in the event type list is greater than a first preset threshold (for example, the first preset threshold is 0.5), may be obtained as the event type corresponding to the information text. For example, assuming that the first preset threshold is 0.52, in the first prediction result, the probability value of the event type that the information text is "violation" in the event type list is 0.7839257717132568, and the probability value of the event type that the information text is not "violation" is 0.21607419848442078; the probability value of the event type that is "legal dispute" in the event type list is 0.04098828509449959, and the probability value of the event type that is not "legal dispute" is 0.9590117335319519. Since the probability value of the event type that the information text is "violation" in the event type list is greater than the first preset threshold value of 0.52, the event type corresponding to the information text is "violation".
Or, in another embodiment, in step 203, the N event types in the event type list with the highest probability values in the first prediction result may be acquired as the event types corresponding to the information text, where N is an integer greater than 0. For example, assume that the event type list includes four event types, namely "violation", "high-level management change", "enterprise officer" and "safety accident", and that N is 2. If, in the first prediction result, the probability values that the information text is of these four event types are 0.7839257717132568, 0.21607419848442078, 0.04098828509449959 and 0.9590117335319519 respectively, then the 2 event types with the highest probability values are "safety accident" and "violation", and these are the event types corresponding to the information text.
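As a minimal, non-limiting sketch, the three alternative ways of determining the event type in step 203 might be written as follows. The dictionary layout (event type mapped to a pair of "is" and "is not" probability values) and all function names are assumptions made for illustration only; the numeric values are those of the two-type example above.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: event type -> (p_is, p_is_not).
first_prediction: Dict[str, Tuple[float, float]] = {
    "violation": (0.7839257717132568, 0.21607419848442078),
    "legal dispute": (0.04098828509449959, 0.9590117335319519),
}


def select_by_comparison(pred: Dict[str, Tuple[float, float]]) -> List[str]:
    """Keep event types whose "is" probability exceeds their "is not" probability."""
    return [t for t, (p_is, p_not) in pred.items() if p_is > p_not]


def select_by_threshold(pred: Dict[str, Tuple[float, float]],
                        threshold: float = 0.5) -> List[str]:
    """Keep event types whose "is" probability exceeds the first preset threshold."""
    return [t for t, (p_is, _) in pred.items() if p_is > threshold]


def select_top_n(pred: Dict[str, Tuple[float, float]], n: int) -> List[str]:
    """Keep the N event types with the highest "is" probability (N > 0)."""
    if n <= 0:
        raise ValueError("n must be an integer greater than 0")
    return sorted(pred, key=lambda t: pred[t][0], reverse=True)[:n]


print(select_by_comparison(first_prediction))
print(select_by_threshold(first_prediction, 0.52))
print(select_top_n(first_prediction, 1))
```

With these values, all three calls select only "violation"; the strategies can diverge when several event types have similar probability values.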
For example, in one specific example, the information text is: "The A10 series is released today: expected specifications, price in Europe, and how to watch the live broadcast. Company A is well prepared and is now ready to launch the promising A10 series worldwide. The A10 series will replace the A9 series and will likely include the A10, A10 Pro, and A10 Pro Max. These devices are expected to have new designs, improved cameras, upgraded specifications, etc."
Inputting the information text into the event type recognition model yields the following first prediction result: {'multi_pred': 'product promotion', 'multi_pred_probs': [[0.4144411087036133, 0.5855588912963867]]}, where 'multi_pred' represents an event type and 'multi_pred_probs' represents the probability values of whether the information text is of the event type "product promotion": 0.4144411087036133 represents the probability value that the information text is of the event type "product promotion", and 0.5855588912963867 represents the probability value that it is not. This example uses a single event type for illustration; if the event type list includes M event types, the first prediction result includes the probability values of whether the information text is of each of the M event types, where M is an integer greater than 0. For example, when the event type list includes the 2 event types "violation" and "legal dispute", the first prediction result may be: {'multi_pred': 'violation|legal dispute', 'multi_pred_probs': [[0.21607419848442078, 0.7839257717132568], [0.04098828509449959, 0.9590117335319519]]}.
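As an illustrative sketch only, a prediction result in the above format might be unpacked into one probability value per event type as follows. The function name and the `is_index` parameter are hypothetical; which position of each pair holds the "is" probability is an assumption taken from the description of the single-type example and should be checked against the actual model output.

```python
from typing import Dict


def parse_first_prediction(result: dict, is_index: int = 0) -> Dict[str, float]:
    """Map each event type name to the probability stored at `is_index` of its pair.

    `is_index` is an assumption: it marks which slot of each pair is treated as
    the "is this event type" probability.
    """
    names = [name.strip() for name in result["multi_pred"].split("|")]
    pairs = result["multi_pred_probs"]
    if len(names) != len(pairs):
        raise ValueError("one probability pair is expected per event type")
    return {name: pair[is_index] for name, pair in zip(names, pairs)}


result = {
    "multi_pred": "product promotion",
    "multi_pred_probs": [[0.4144411087036133, 0.5855588912963867]],
}
print(parse_first_prediction(result))
```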
Optionally, in some embodiments, after step 203, the event category corresponding to the information text may be further determined according to a correspondence between each event category in a preset event category list and each event type in an event type list. The event category list comprises a plurality of preset event categories.
In the embodiment of the present disclosure, the event categories refer to the fields to which events belong according to their content. For example, events may be divided into legal action, enterprise executives, enterprise operation, supervision related, investment financing, extreme events, product related, and non-event. The number of event categories and the specific manner of division are not limited in the embodiment of the present disclosure.
In the embodiment of the present disclosure, each event category may be further divided into a plurality of event types according to the content of the event. For example, the event category "legal action" may specifically include a "violation" event type and a "legal dispute" event type. An event type is a sub-category of an event category and may therefore also be referred to as an event sub-category. Additionally, the event category may also be referred to as a primary event label and the event type as a secondary event label. The number of event types under each event category and the specific manner of division are not limited in the embodiment of the present disclosure.
For example, in a specific application, the event category list may be preset to include 8 event categories (primary event labels), and a total of 22 event types (secondary event labels) may be set under the event categories in the event category list, as shown in the following table.
Primary event label | Secondary event labels
Legal action | violation, legal dispute
Enterprise executives | executive change, executive negative news
Enterprise operation | enterprise referee, enterprise profit and loss, reaching cooperation, market competition
Supervision related | regulatory negotiations, warning penalties, spot checks, regulatory policies, protocol investigations
Investment financing | external investment, financing process
Extreme events | sudden death from suicide, security accident, thunderstorm event
Product related | product promotion, customer complaints, off-shelf/recall
Non-event | non-event
In the above table, the primary event label "non-event" does not belong to the other 7 event categories, and the secondary event label "non-event" does not belong to the 21 event types under the other 7 event categories.
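As a non-limiting sketch, the correspondence between event categories and event types might be held in a lookup table so that the event category can be derived from a determined event type. Only an excerpt of the table is shown, and the variable and function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Excerpt of the correspondence table: primary event label -> secondary event labels.
CATEGORY_TO_TYPES: Dict[str, List[str]] = {
    "legal action": ["violation", "legal dispute"],
    "investment financing": ["external investment", "financing process"],
    "product related": ["product promotion", "customer complaints", "off-shelf/recall"],
    "non-event": ["non-event"],
}

# Inverted view: secondary event label -> primary event label.
TYPE_TO_CATEGORY: Dict[str, str] = {
    event_type: category
    for category, event_types in CATEGORY_TO_TYPES.items()
    for event_type in event_types
}


def categories_for(event_types: Iterable[str]) -> List[str]:
    """Return the categories of the given event types, de-duplicated, in order."""
    categories: List[str] = []
    for event_type in event_types:
        category = TYPE_TO_CATEGORY[event_type]
        if category not in categories:
            categories.append(category)
    return categories


print(categories_for(["violation", "legal dispute"]))
```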
Fig. 3 shows a flow chart of an event recognition method in a second exemplary embodiment of the present disclosure. As shown in fig. 3, on the basis of the embodiment shown in fig. 2, the method may further include:
Step 301, predicting, by using an event category recognition model obtained by pre-training, a probability value of whether the information text is of each event category in the event category list, to obtain a second prediction result.
The event category list comprises a plurality of preset event categories.
Step 302, determining the event category corresponding to the information text based on the second prediction result.
Step 303, determining whether the event category corresponding to the information text and the event type corresponding to the information text conform to the correspondence between each event category in the preset event category list and each event type in the event type list.
If the correspondence is met, step 304 is executed; otherwise, if the correspondence is not met, the subsequent operation is not executed.
Step 304, outputting the event category and/or the event type corresponding to the information text.
Specifically, since the correspondence between each event category in the event category list and each event type in the event type list is fixed, once the event type is determined, the corresponding event category can be determined according to the correspondence. For example, if the event type determined based on the first prediction result is "violation", the event category corresponding to "violation" according to the correspondence is "legal action". If the event category determined based on the second prediction result is also "legal action", the correspondence is met and the event recognition result (including the event category and the event type) is regarded as accurately predicted; otherwise, if the correspondence is not met, the event recognition result is regarded as inaccurately predicted. Based on this embodiment, inaccurate event recognition results can be filtered out, which improves the accuracy of the event recognition results that are output.
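The check of steps 303 and 304 might be sketched as follows. This is an illustration under stated assumptions: the function name and the mapping excerpt are hypothetical, and the handling of several predicted event types (keeping each type whose category was also predicted) is one possible reading rather than a behaviour specified above.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical excerpt of the fixed correspondence: event type -> event category.
TYPE_TO_CATEGORY: Dict[str, str] = {
    "violation": "legal action",
    "legal dispute": "legal action",
    "product promotion": "product related",
}


def filter_consistent(predicted_types: Iterable[str],
                      predicted_categories: Iterable[str],
                      type_to_category: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (category, type) pairs that conform to the correspondence.

    An empty list means the correspondence is not met and nothing is output.
    """
    categories = set(predicted_categories)
    return [
        (type_to_category[t], t)
        for t in predicted_types
        if type_to_category.get(t) in categories
    ]


print(filter_consistent(["violation"], ["legal action"], TYPE_TO_CATEGORY))
print(filter_consistent(["violation"], ["product related"], TYPE_TO_CATEGORY))
```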
Fig. 4 shows a flowchart of an event recognition method in a third exemplary embodiment of the present disclosure. As shown in fig. 4, on the basis of the embodiment shown in fig. 2 or fig. 3, after step 201, the method may further include:
Step 401, identifying whether the length of the information text is greater than a preset length.
The preset length is less than or equal to the text length supported by the event type recognition model.
If the length of the information text is greater than the preset length, step 402 is executed; otherwise, if the length of the information text is less than or equal to the preset length, step 202 is executed directly.
Step 402, dividing the information text into a plurality of text segments by using the preset length as a unit.
Correspondingly, in step 202, the event type recognition model obtained through pre-training may be used to predict, for each of the plurality of text segments, a probability value of whether the text segment is of each event type in the event type list, and the first prediction result is obtained based on the probability values of the individual text segments.
In this embodiment, an information text longer than the preset length is divided into a plurality of text segments in units of the preset length, the pre-trained event type recognition model predicts the probability values for each text segment separately, and the first prediction result is obtained from the per-segment probability values. In this way, event recognition can be performed on information texts of various lengths.
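A minimal sketch of steps 401 and 402 and the per-segment prediction is given below. The embodiment does not specify how the per-segment probability values are combined, so taking the maximum over segments is purely an assumption; `toy_predict` is a hypothetical stand-in for the event type recognition model.

```python
from typing import Callable, Dict, List


def split_text(text: str, preset_length: int) -> List[str]:
    """Divide the text into consecutive segments of at most `preset_length` characters."""
    if preset_length <= 0:
        raise ValueError("preset_length must be positive")
    return [text[i:i + preset_length] for i in range(0, len(text), preset_length)]


def predict_text(text: str, preset_length: int,
                 predict_segment: Callable[[str], Dict[str, float]]) -> Dict[str, float]:
    """Predict directly for short texts; otherwise predict per segment and merge."""
    segments = split_text(text, preset_length) if len(text) > preset_length else [text]
    merged: Dict[str, float] = {}
    for segment in segments:
        for event_type, prob in predict_segment(segment).items():
            # Assumed merge rule: keep the highest probability seen in any segment.
            merged[event_type] = max(prob, merged.get(event_type, 0.0))
    return merged


def toy_predict(segment: str) -> Dict[str, float]:
    """Hypothetical stand-in for the model: reacts to a single keyword."""
    return {"violation": 0.9 if "fine" in segment else 0.1}


print(split_text("abcdefg", 3))
print(predict_text("xxxxfine", 4, toy_predict))
```

Fixed-length splitting can cut a sentence or keyword at a segment boundary, which is one reason the merge rule is left open here.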
Optionally, in some embodiments, after step 201, it may be further identified whether any word in a preset garbage corpus word set exists in the information text. If a word in the preset garbage corpus word set exists in the information text, either the subsequent operation is not executed, or the words belonging to the preset garbage corpus word set are filtered out and the operation of step 202 is executed on the filtered information text. Otherwise, if no word in the preset garbage corpus word set exists in the information text, the operation of step 202 is executed directly.
Based on this embodiment, the quality of the information text can be judged in advance by means of the preset garbage corpus word set, so that garbage information texts, or low-quality words within an information text, are filtered out. This ensures the quality of the information text used for event recognition and the effectiveness of event recognition, and improves the accuracy of event recognition.
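The two alternatives described above (discarding the information text, or removing the garbage words and continuing) might be sketched as follows. The function name, the `drop` flag and the sample word set are hypothetical.

```python
from typing import Iterable, Optional


def preprocess(text: str, garbage_words: Iterable[str], drop: bool = False) -> Optional[str]:
    """Return the text to pass to step 202, or None if it should be skipped.

    drop=True  -> a text containing any garbage word is discarded entirely.
    drop=False -> garbage words are removed and the remaining text is kept.
    """
    words = sorted(garbage_words, key=len, reverse=True)  # remove longer phrases first
    if not any(word in text for word in words):
        return text
    if drop:
        return None
    for word in words:
        text = text.replace(word, "")
    return text


GARBAGE = {"click to subscribe"}
print(preprocess("Company A releases a phone. click to subscribe", GARBAGE))
```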
Optionally, in some embodiments, after step 201, entity recognition may be further performed on the information text, and the correlation between each recognized entity and the information text may be analyzed to obtain a correlation analysis result, where an entity refers to an enterprise subject. Accordingly, in step 202, according to the correlation analysis result, the event type recognition model obtained by pre-training is used, for the entities with high correlation, to predict a probability value of whether the information text is of each event type in the event type list, so as to obtain the first prediction result. High correlation may include, but is not limited to, any one or more of the following: a high frequency of occurrence, a large number of occurrences in the first-person perspective, and so on.
Specifically, the enterprises in the information text can be identified by means of inner chain typing to obtain all enterprise subjects appearing in the information text. The correlation between each enterprise subject and the information text is then analyzed, for example the frequency with which the enterprise subject appears and the number of times it appears in the first-person perspective, and the enterprise subjects with high correlation (that is, those that appear frequently, appear many times in the first-person perspective, and so on) are determined and subjected to event recognition. This ensures that event recognition is performed for the key entities in the information text and improves the effect of event recognition. For example, a news item mentions company A and company A's mobile phones and also mentions other mobile phones such as those of BB and CC, but it is mainly about company A and does not discuss the other enterprises in detail. As a further example, in the information text "the mobile phone of company AA sells better than that of company BB", company AA is in the first-person perspective and company BB is not, so the enterprise subject with high correlation is company AA.
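As a rough sketch of the frequency-based part of the correlation analysis only, the candidate enterprise subjects might be ranked by how often they appear in the information text. The function name and the `min_count` cut-off are hypothetical, plain substring counting is a simplification, and the first-person-perspective criterion is not modeled here.

```python
from collections import Counter
from typing import Iterable, List


def high_correlation_entities(text: str, entities: Iterable[str],
                              min_count: int = 2) -> List[str]:
    """Return entities mentioned at least `min_count` times, most frequent first."""
    counts = Counter({entity: text.count(entity) for entity in entities})
    return [entity for entity, count in counts.most_common() if count >= min_count]


news = ("Company AA released a new phone today. Company AA said the phone "
        "outsold the model of Company BB.")
print(high_correlation_entities(news, ["Company AA", "Company BB"]))
```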
In a specific application, entities in some information texts appear in abbreviated form. Accurate entity recognition can be carried out through a pre-trained entity recognition model based on the context information in the information text, and ambiguous entities or entities that are not enterprise subjects are filtered out. For example, suppose the word "Xiaomi" (which literally means "millet") is mentioned in an information text. If it can be determined from the context that the word refers to the company Xiaomi, then "Xiaomi" in the information text can be determined to be an entity. If it can be determined from the context that the word refers to the grain, then it is not an entity, and the information text can be filtered out, which avoids meaningless event recognition and improves the effect of event recognition.
Optionally, in some embodiments, the event type recognition model and the event category recognition model in the above embodiments of the present disclosure may be pre-trained language models, such as a BERT model, a RoBERTa model, an ERNIE model, and other large pre-trained language models.
A pre-trained language model can learn semantic representations of complete concepts by modeling prior semantic knowledge, such as entity concepts, in massive data, so that its representations of semantic knowledge units are closer to the real world. Such a model takes character features as input while directly modeling prior semantic knowledge units, and therefore has strong semantic representation capability.
Fig. 5 is a flowchart illustrating an event recognition method according to a fourth exemplary embodiment of the present disclosure. As shown in fig. 5, the event type recognition model can be trained as follows:
Step 501, inputting each event type pre-training corpus of a plurality of event type pre-training corpora, together with its event type annotation information, into the event type recognition model, so that the event type recognition model learns the event type information corresponding to each event type pre-training corpus.
Optionally, in some embodiments, the event type recognition model may be pre-trained in an unsupervised training manner until the event type recognition model has learned the event type information corresponding to a preset number of event type pre-training corpora, or until the number of pre-training iterations of the event type recognition model reaches a preset number.
Step 502, inputting each first training corpus of a plurality of first training corpora, together with its event type annotation information, into the event type recognition model, and outputting, through the event type recognition model, probability values of whether each first training corpus is of each event type in the event type list.
Step 503, training the event type recognition model based on the probability values of whether each of the plurality of first training corpora is of each event type in the event type list and the probability values corresponding to the corresponding event type annotation information.
Optionally, in some embodiments, a sigmoid activation function may be used instead of the softmax activation function. For each first training corpus, the binary cross entropy (loss) between the probability values of whether the first training corpus is of each event type in the event type list and the probability values corresponding to the corresponding event type annotation information is calculated. The binary cross entropies calculated for the plurality of first training corpora are then summed, and the event type recognition model is trained so as to minimize the sum of the binary cross entropies.
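The loss described above might be sketched in plain Python as follows. This is an illustration only: the function names are hypothetical, the raw model outputs (logits) and labels are made-up values, and an actual implementation would normally rely on a deep-learning framework.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Map a raw score to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int],
                         eps: float = 1e-12) -> float:
    """Sum of binary cross entropies over all event types for one corpus."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total


def summed_loss(batch_logits: List[Sequence[float]],
                batch_labels: List[Sequence[int]]) -> float:
    """Sum of the per-corpus binary cross entropies; training minimizes this value."""
    return sum(
        binary_cross_entropy([sigmoid(z) for z in logits], labels)
        for logits, labels in zip(batch_logits, batch_labels)
    )


labels = [[1, 1, 0]]                       # multi-hot annotation for one corpus
print(summed_loss([[3.0, 2.0, -3.0]], labels))   # scores agree with the labels
print(summed_loss([[-3.0, -2.0, 3.0]], labels))  # scores contradict the labels
```

The first call prints a smaller value than the second, since its scores agree with the annotation.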
The softmax activation function treats labels as mutually exclusive, because the probability values it produces over all labels sum to 1. When a plurality of event types need to be identified, for example when 5 of the 22 event types occur in one information text, the probability values of these 5 event types may each be 0.2 while the probability values of the remaining event types are all 0. The threshold then becomes critical for determining which event types occur in the information text, an accurate threshold is difficult to set, and an inaccurately set threshold may lead to a wrong event recognition result. For example, if the threshold is set to 0.2, these 5 event types are determined to occur in the information text; if the threshold is set to 0.3, no event type is determined to occur. By contrast, the sigmoid activation function treats labels as independent and not mutually exclusive. To identify a plurality of event types, binary classification is performed separately for each of the 22 event types in the event type list, that is, 22 binary classifications are performed on whether a given information text is of each of the 22 event types. The threshold is set to 0.5, and a probability value greater than 0.5 indicates that the event type occurs, which can improve the accuracy of multi-event prediction.
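The contrast between the two activation functions can be illustrated numerically. In the sketch below, the raw scores are made-up values chosen so that 5 of 22 event types are strongly indicated; with softmax, each of the 5 receives a probability close to 0.2, whereas with sigmoid each of them independently exceeds 0.5.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Probabilities over all labels that sum to 1 (labels compete with each other)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Independent probability for a single label."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical raw scores: 5 event types strongly present, 17 strongly absent.
logits = [4.0] * 5 + [-4.0] * 17

softmax_probs = softmax(logits)
sigmoid_probs = [sigmoid(z) for z in logits]

print(sum(p > 0.5 for p in softmax_probs))  # event types above 0.5 under softmax
print(sum(p > 0.5 for p in sigmoid_probs))  # event types above 0.5 under sigmoid
```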
Optionally, in some embodiments, the event type recognition model may be trained in a supervised training manner: steps 502 and 503 are performed iteratively to train the event type recognition model until a first preset training completion condition is met. For example, the condition may be that the number of training iterations of the event type recognition model reaches a preset number, and/or that the value of the loss function, calculated from the probability values of whether the plurality of first training corpora are of each event type in the event type list and the probability values corresponding to the corresponding event type annotation information, is smaller than a preset threshold. The embodiment of the present disclosure does not limit the preset training completion condition.
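The training completion condition might be sketched as a loop that stops when either criterion is met. The names below are hypothetical, and `step_fn` stands in for one iteration of steps 502 and 503 that returns the current loss value.

```python
from typing import Callable, Tuple


def train_until_done(step_fn: Callable[[], float], max_iterations: int,
                     loss_threshold: float) -> Tuple[int, float]:
    """Iterate training steps until the loss drops below the threshold
    or the preset number of iterations is reached."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    loss = float("inf")
    for iteration in range(1, max_iterations + 1):
        loss = step_fn()  # one pass of steps 502 and 503
        if loss < loss_threshold:
            return iteration, loss
    return max_iterations, loss


# Made-up loss values standing in for a real training run.
fake_losses = iter([1.0, 0.5, 0.05, 0.01])
print(train_until_done(lambda: next(fake_losses), max_iterations=10, loss_threshold=0.1))
```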
For example, a first training corpus is {"text": "Due to a dispute over infringement of the right of dissemination of a work via information networks, the court ordered company A to pay company B 30,000 yuan. On March 22, the first-instance civil judgment in the dispute between company B and company A over infringement of the right of dissemination of a work via information networks was made public, ordering company A to compensate company B 30,000 yuan. According to the judgment, the plaintiff, company B, claimed that it had been granted the exclusive right of dissemination via information networks for the work involved, the TV series DDDD, and that it found that the defendant, company A, had provided an online playing service for the work involved, without authorization, on the video software A (Android mobile phone client) operated by the defendant. The plaintiff argued that the defendant's act infringed the plaintiff's right of dissemination via information networks, and therefore filed the lawsuit, requesting the court to support its claims so as to combat the infringement and protect the plaintiff's legitimate rights and interests.", "event": {'multi_pred': 'violation|legal dispute'}}, where "text" represents the information text, "event" represents the event type annotation information, and 'multi_pred' represents the specifically annotated event types. An information text and a plurality of event types of the information text can be annotated in this format, so that the event type recognition model can learn the association between the information text and its event types.
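As an illustrative sketch, an annotated corpus in the above format might be converted into one binary label per event type before training. The function name and the three-entry excerpt of the event type list are hypothetical, and the sample text is abbreviated.

```python
from typing import Dict, List, Sequence

# Hypothetical excerpt of the event type list; its order fixes the label positions.
EVENT_TYPES = ["violation", "legal dispute", "product promotion"]


def to_multi_hot(sample: Dict, event_types: Sequence[str]) -> List[int]:
    """Turn the '|'-separated annotation into one 0/1 label per event type."""
    annotated = {name.strip() for name in sample["event"]["multi_pred"].split("|")
                 if name.strip()}
    unknown = annotated - set(event_types)
    if unknown:
        raise ValueError(f"annotation contains unknown event types: {sorted(unknown)}")
    return [1 if event_type in annotated else 0 for event_type in event_types]


sample = {
    "text": "The court ordered company A to pay company B 30,000 yuan ...",
    "event": {"multi_pred": "violation|legal dispute"},
}
print(to_multi_hot(sample, EVENT_TYPES))
```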
Optionally, in some embodiments, a large pre-trained language model such as a BERT model, a RoBERTa model, or an ERNIE model is used as the event type recognition model obtained in step 501, and the event type recognition model is then fine-tuned by iteratively performing steps 502 and 503 with a large number of first training corpora, so as to train the event type recognition model.
Based on this embodiment, the event type recognition model can be pre-trained with a large number of event type pre-training corpora and then trained with a large number of first training corpora, so that it fully learns the event type information in each first training corpus. After training is completed, the event type recognition model can therefore predict event type information accurately.
Fig. 6 shows a flowchart of an event recognition method in a fifth exemplary embodiment of the present disclosure. As shown in fig. 6, the event category identification model may be trained as follows:
Step 601, inputting each event category pre-training corpus of a plurality of event category pre-training corpora, together with its event category annotation information, into the event category recognition model, so that the event category recognition model learns the event category information corresponding to each event category pre-training corpus.
Optionally, in some embodiments, the event category recognition model may be pre-trained in an unsupervised training manner until the event category recognition model has learned the event category information corresponding to a preset number of event category pre-training corpora, or until the number of pre-training iterations of the event category recognition model reaches a preset number. The embodiment of the present disclosure does not limit the manner in which the event category recognition model is pre-trained.
Step 602, inputting each second training corpus of a plurality of second training corpora, together with its event category annotation information, into the event category recognition model, and outputting, through the event category recognition model, probability values of whether each second training corpus is of each event category in the event category list.
Step 603, training the event category recognition model based on the probability values of whether each of the plurality of second training corpora is of each event category in the event category list and the probability values corresponding to the corresponding event category annotation information.
Optionally, in some embodiments, a sigmoid activation function may be used instead of the softmax activation function. For each second training corpus, the binary cross entropy (loss) between the probability values of whether the second training corpus is of each event category in the event category list and the probability values corresponding to the corresponding event category annotation information is calculated. The binary cross entropies calculated for the plurality of second training corpora are then summed, and the event category recognition model is trained so as to minimize the sum of the binary cross entropies.
The softmax activation function treats labels as mutually exclusive, because the probability values it produces over all labels sum to 1. When a plurality of event categories need to be identified, for example when 5 of the 8 event categories occur, the probability values of these 5 event categories may each be 0.2 while the rest are all 0. The threshold then becomes critical for determining which event categories occur in the information text, an accurate threshold is difficult to set, and an inaccurately set threshold may lead to a wrong event recognition result. For example, if the threshold is set to 0.2, these 5 event categories are determined to occur in the information text; if the threshold is set to 0.3, no event category is determined to occur. By contrast, the sigmoid activation function treats labels as independent and not mutually exclusive. To identify a plurality of event categories, binary classification is performed separately for each of the 8 event categories in the event category list, that is, 8 binary classifications are performed on whether a given information text is of each of the 8 event categories. The threshold is set to 0.5, and a probability value greater than 0.5 indicates that the event category occurs, which can improve the accuracy of multi-event prediction.
Optionally, in some embodiments, the event category recognition model may be trained in a supervised training manner: steps 602 and 603 are performed iteratively to train the event category recognition model until a second preset training completion condition is met. For example, the condition may be that the number of training iterations of the event category recognition model reaches a preset number, and/or that the value of the loss function, calculated from the probability values of whether the plurality of second training corpora are of each event category in the event category list and the probability values corresponding to the corresponding event category annotation information, is smaller than a preset threshold. The embodiment of the present disclosure does not limit the preset training completion condition.
Optionally, in some embodiments, a large pre-trained language model such as a BERT model, a RoBERTa model, or an ERNIE model is used as the event category recognition model obtained in step 601, and the event category recognition model is then fine-tuned by iteratively performing steps 602 and 603 with a large number of second training corpora, so as to train the event category recognition model.
Based on this embodiment, the event category recognition model can be pre-trained with a large number of event category pre-training corpora and then trained with a large number of second training corpora, so that it fully learns the event category information in each second training corpus. After training is completed, the event category recognition model can therefore predict event category information accurately.
Fig. 7 shows a block diagram of an event recognition apparatus in a first exemplary embodiment of the present disclosure. The event recognition apparatus provided by any embodiment of the present disclosure can be used to implement the event recognition method in the above embodiments of the present disclosure. The event recognition apparatus provided in any embodiment of the present disclosure may be disposed on a terminal device, or on a server, or partially on a terminal device and partially on a server; for example, it may be disposed on the server 105 in fig. 1, but the present disclosure is not limited thereto.
As shown in fig. 7, the event recognition apparatus of this embodiment includes: a text acquisition module 701, a first prediction module 702, and a first determination module 703. The text acquisition module 701 is configured to acquire an information text; the first prediction module 702 is configured to predict, by using an event type recognition model obtained through pre-training, a probability value of whether the information text is of each event type in an event type list, so as to obtain a first prediction result, where the event type list includes a plurality of preset event types; and the first determination module 703 is configured to determine, based on the first prediction result, an event type corresponding to the information text.
Based on this embodiment, an information text is acquired, an event type recognition model obtained by pre-training is used to predict a probability value of whether the information text is of each event type in an event type list, where the event type list includes a plurality of preset event types, so as to obtain a first prediction result, and the event type corresponding to the information text is then determined based on the first prediction result. The embodiment of the disclosure can thus realize effective event recognition for information texts (such as public opinion information). Because the probability value of whether the information text is of each event type in the event type list can be accurately predicted, the event type corresponding to the information text can be accurately determined, which improves the accuracy of identifying the event type corresponding to the information text. In addition, because the probability value for each event type in the event type list is predicted separately rather than a single event type being predicted directly, the event types are independent and not mutually exclusive, so that multiple event types can be identified and the event types of all the information in the information text can be recognized.
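Purely as an illustration of how the three modules of fig. 7 cooperate, the apparatus might be sketched as a small class whose fields stand for modules 701 to 703. The class name, field names and the lambda stand-ins are hypothetical; a real apparatus would plug in an actual text source and the pre-trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EventRecognitionApparatus:
    acquire_text: Callable[[], str]                            # text acquisition module 701
    predict_types: Callable[[str], Dict[str, float]]           # first prediction module 702
    determine_types: Callable[[Dict[str, float]], List[str]]   # first determination module 703

    def run(self) -> List[str]:
        """Acquire a text, predict per-type probabilities, then determine event types."""
        text = self.acquire_text()
        first_prediction = self.predict_types(text)
        return self.determine_types(first_prediction)


apparatus = EventRecognitionApparatus(
    acquire_text=lambda: "Company A was fined by the regulator.",
    predict_types=lambda text: {"violation": 0.78, "legal dispute": 0.04},  # stand-in model
    determine_types=lambda pred: [t for t, p in pred.items() if p > 0.5],
)
print(apparatus.run())
```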
Optionally, in some of these embodiments, the first prediction result may include: the probability value that the information text is of each event type in the event type list and the probability value that the information text is not of each event type in the event type list. Accordingly, the first determining module 703 is specifically configured to: for each event type in the event type list, acquire an event type for which, in the first prediction result, the probability value that the information text is of that event type is greater than the probability value that it is not, as an event type corresponding to the information text; or, for each event type in the event type list, acquire an event type whose probability value in the first prediction result is greater than a first preset threshold as an event type corresponding to the information text; or acquire the N event types with the highest probability values in the first prediction result as the event types corresponding to the information text, where N is an integer greater than 0.
Fig. 8a shows a block diagram of an event recognition device in a second exemplary embodiment of the present disclosure. As shown in fig. 8a, on the basis of the embodiment shown in fig. 7, the event recognition apparatus of this embodiment further includes: a second determining module 801, configured to determine an event category corresponding to the information text according to a correspondence between each event category in the preset event category list and each event type in the event type list. The event category list comprises a plurality of preset event categories.
Fig. 8b shows a block diagram of an event recognition device in a third exemplary embodiment of the present disclosure. As shown in fig. 8b, on the basis of the embodiment shown in fig. 8a, the event recognition apparatus of this embodiment further includes: the second prediction module 802 is configured to predict whether the information text has a probability value of each event category in the event category list by using an event category identification model obtained through pre-training, so as to obtain a second prediction result; the event category list comprises a plurality of preset event categories; a third determining module 803, configured to determine, based on the second prediction result, an event category corresponding to the information text; a fourth determining module 804, configured to determine whether the event category corresponding to the information text and the event type corresponding to the information text meet a correspondence between each event category in a preset event category list and each event type in the event type list; and an output module 805, configured to output the event type and/or event type corresponding to the information text if the correspondence is met.
In addition, referring to fig. 8b again, the event recognition apparatus of the above embodiment may further include: a first identification module 806, configured to identify whether the length of the information text is greater than a preset length; and a dividing module 807, configured to divide the information text into a plurality of text segments in units of the preset length if the length of the information text is greater than the preset length. Correspondingly, the first prediction module 702 is specifically configured to: if the length of the information text is greater than the preset length, predict, for each text segment by using the event type recognition model obtained through pre-training, a probability value of whether the text segment is each event type in the event type list, and obtain the first prediction result based on those probability values; and if the length of the information text is less than or equal to the preset length, predict, by using the event type recognition model obtained through pre-training, a probability value of whether the information text is each event type in the event type list, so as to obtain the first prediction result.
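A minimal sketch of the length check, segmentation, and per-segment prediction follows. Several points are assumptions: the text is split by character count, the per-segment probabilities are merged by taking the maximum for each event type (this passage does not say how the segment results are combined), and `toy_predict` is a hypothetical keyword-based stand-in for the pre-trained event type recognition model.

```python
from typing import Callable, Dict, List

Predictor = Callable[[str], Dict[str, float]]


def split_text(text: str, max_len: int) -> List[str]:
    """Cut the text into consecutive segments of at most max_len characters."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


def predict_long_text(text: str, max_len: int, predict: Predictor) -> Dict[str, float]:
    """Predict directly for short texts; otherwise merge per-segment predictions."""
    if len(text) <= max_len:
        return predict(text)
    merged: Dict[str, float] = {}
    for segment in split_text(text, max_len):
        for event_type, p in predict(segment).items():
            # Assumption: keep the highest probability seen in any segment.
            merged[event_type] = max(p, merged.get(event_type, 0.0))
    return merged


def toy_predict(segment: str) -> Dict[str, float]:
    """Hypothetical stand-in for the pre-trained event type recognition model."""
    return {
        "lawsuit": 0.9 if "sued" in segment else 0.1,
        "financing": 0.8 if "raised" in segment else 0.2,
    }


text = "Company A was sued. Company A raised funds."
print(predict_long_text(text, 20, toy_predict))
```

Fixed-length splitting can cut a word or sentence at a segment boundary; a real system might split on sentence boundaries or overlap segments instead.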
In addition, referring to fig. 8b again, the event recognition apparatus of the above embodiment may further include: a second identifying module 808, configured to identify whether the information text contains any word in a preset garbage corpus word set. Correspondingly, the first prediction module 702 is specifically configured to: if the information text contains a word in the preset garbage corpus word set, either perform no subsequent operation, or filter out the words in the preset garbage corpus word set and, for the filtered information text, perform the operation of predicting, by using the event type recognition model obtained through pre-training, a probability value of whether the information text is each event type in the event type list, so as to obtain the first prediction result; and if the information text contains no word in the preset garbage corpus word set, perform that prediction operation on the information text directly to obtain the first prediction result.
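The two ways of handling garbage corpus words described above (skip the text entirely, or filter the words and continue) can be sketched as a small preprocessing step. The function name, the `drop_text` flag, and the sample word set are hypothetical.

```python
from typing import Optional, Set


def preprocess(text: str, garbage_words: Set[str], drop_text: bool) -> Optional[str]:
    """Return the text to pass to the event type model, or None to skip it.

    drop_text=True  -> skip any text that contains a garbage word.
    drop_text=False -> remove the garbage words and keep the rest.
    """
    hits = [w for w in garbage_words if w in text]
    if not hits:
        return text
    if drop_text:
        return None
    # Remove longer entries first so a short entry cannot break up a longer one.
    for word in sorted(hits, key=len, reverse=True):
        text = text.replace(word, "")
    return text


garbage = {"Click here! "}
print(preprocess("Click here! Company A was fined.", garbage, drop_text=False))
print(preprocess("Click here! Company A was fined.", garbage, drop_text=True))
```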
In addition, referring to fig. 8b again, the event recognition apparatus of the above embodiment may further include: a third identification module 809, configured to perform entity recognition on the information text; and an analysis module 810, configured to analyze the correlation between each recognized entity and the information text to obtain a correlation analysis result. Accordingly, the first prediction module 702 is specifically configured to perform, for an entity with high relevance according to the correlation analysis result, the operation of predicting, by using the event type recognition model obtained through pre-training, a probability value of whether the information text is each event type in the event type list, so as to obtain the first prediction result.
Optionally, in an exemplary embodiment of the present disclosure, the high relevance may include, for example and without limitation, any one or more of the following: high frequency of occurrence, high number of occurrences in first-person perspective, and so on.
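As one possible reading of the frequency criterion, the sketch below counts how often each already-recognized entity appears in the information text and keeps those at or above a cutoff. Entity recognition itself is assumed to have been done elsewhere; the `min_count` cutoff and the sample entities are hypothetical, and the first-person criterion is not modeled here.

```python
from typing import Dict, List


def high_relevance_entities(text: str, entities: List[str], min_count: int = 2) -> List[str]:
    """Keep entities that occur in the text at least min_count times."""
    counts: Dict[str, int] = {e: text.count(e) for e in entities}
    return [e for e, c in counts.items() if c >= min_count]


text = "Acme sued Beta. Acme says Acme will appeal."
print(high_relevance_entities(text, ["Acme", "Beta"]))
```

Event type prediction would then be run only for the entities returned by this step.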
In addition, the event recognition apparatus of the above embodiment may further include: a first training module (not shown), configured to: input each event type pre-training corpus of a plurality of event type pre-training corpora, together with its event type labeling information, into the event type recognition model, so that the event type recognition model learns the event type information corresponding to each event type pre-training corpus; input each first training corpus of a plurality of first training corpora, together with its event type labeling information, into the event type recognition model, and output, through the event type recognition model, a probability value of whether each first training corpus is each event type in the event type list; and train the event type recognition model based on the probability values of whether the plurality of first training corpora are each event type in the event type list and the probability values corresponding to the corresponding event type labeling information.
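The passage above says only that training compares the predicted probability values with the probability values implied by the labeling information; it does not name a loss function. As an assumption, the sketch below uses the mean binary cross-entropy over event types, a common choice for this kind of multi-label setup. The function name and sample values are hypothetical.

```python
import math
from typing import Dict


def multilabel_bce(pred: Dict[str, float], label: Dict[str, float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between predicted and labeled probabilities."""
    total = 0.0
    for event_type, y in label.items():
        # Clamp to avoid log(0) for predictions of exactly 0 or 1.
        p = min(max(pred[event_type], eps), 1.0 - eps)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(label)


label = {"lawsuit": 1.0, "financing": 0.0}
good = {"lawsuit": 0.9, "financing": 0.1}
poor = {"lawsuit": 0.4, "financing": 0.6}
print(multilabel_bce(good, label) < multilabel_bce(poor, label))
```

A training loop would adjust the model parameters to reduce this value over the first training corpora.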
In addition, the event recognition apparatus of the above embodiment may further include: a second training module (not shown), configured to: input each event category pre-training corpus of a plurality of event category pre-training corpora, together with its event category labeling information, into the event category identification model, so that the event category identification model learns the event category information corresponding to each event category pre-training corpus; input each second training corpus of a plurality of second training corpora, together with its event category labeling information, into the event category identification model, and output, through the event category identification model, a probability value of whether each second training corpus is each event category in the event category list; and train the event category identification model based on the probability values of whether the plurality of second training corpora are each event category in the event category list and the probability values corresponding to the corresponding event category labeling information.
The specific implementation of each module, unit and subunit in the event identification apparatus provided in the embodiment of the present disclosure may refer to the content in the event identification method, and is not described herein again.
It should be noted that although several modules, units and sub-units of the apparatus for action execution are mentioned in the above detailed description, such division is not mandatory. Indeed, the features and functionality of two or more modules, units and sub-units described above may be embodied in one module, unit or sub-unit, in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module, unit or sub-unit described above may be further divided among, and embodied by, a plurality of modules, units and sub-units.
An embodiment of the present disclosure further provides an electronic device, including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the event recognition method of any of the above embodiments via execution of executable instructions.
Fig. 9 shows a block diagram of an electronic device in an exemplary embodiment of the disclosure. As shown in fig. 9, the example electronic device 90 includes a processor 901 for executing software routines. Although a single processor is shown for clarity, the electronic device 90 may also include a multi-processor system. The processor 901 is connected to an infrastructure 902 for communicating with other components of the electronic device 90. The infrastructure 902 may include, for example, a communications bus, a crossbar, or a network.
Electronic device 90 also includes memory, such as random access memory (RAM), which may include main memory 903 and secondary memory 910. The secondary memory 910 may include, for example, a hard disk drive 911 and/or a removable storage drive 912, which may include a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. Removable storage drive 912 reads from and/or writes to a removable storage unit 913 in a conventional manner. Removable storage unit 913 may comprise a floppy disk, magnetic tape, optical disk, etc., which is read by and written to by removable storage drive 912. As will be appreciated by those skilled in the relevant art(s), the removable storage unit 913 includes a computer-readable storage medium having stored thereon computer-executable program code instructions and/or data.
In an alternative embodiment, the secondary memory 910 may additionally or alternatively include other similar means for allowing computer programs or other instructions to be loaded into the electronic device 90. Such means may include, for example, a removable storage unit 921 and an interface 920. Examples of removable storage unit 921 and interface 920 include: a program cartridge and cartridge interface (such as that found in video game console devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 921 and interfaces 920 which allow software and data to be transferred from the removable storage unit 921 to electronic device 90.
The electronic device 90 also includes at least one communication interface 940. Communications interface 940 allows software and data to be transferred between electronic device 90 and external devices via communications path 941. In various embodiments of the invention, the communication interface 940 allows data to be transferred between the electronic device 90 and a data communication network, such as a public data or private data communication network. The communication interface 940 may be used to exchange data between different electronic devices 90, which electronic devices 90 form part of an interconnected computer network. Examples of communication interface 940 may include a modem, a network interface (such as an ethernet card), a communication port, an antenna with associated circuitry, and so forth. The communication interface 940 may be wired or may be wireless. Software and data transferred via communications interface 940 are in the form of signals which may be electronic, magnetic, optical or other signals capable of being received by communications interface 940. These signals are provided to a communications interface via communications path 941.
As shown in fig. 9, the electronic device 90 further includes a display interface 931 to perform operations for rendering images to an associated display 930, and an audio interface 932 to perform operations for playing audio content through an associated speaker 933.
In this disclosure, the term "computer program product" may refer, in part, to: a removable storage unit 913, a removable storage unit 921, a hard disk installed in the hard disk drive 911, or a carrier wave carrying software over a communication path 941 (wireless link or cable) to the communication interface 940. Computer-readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to electronic device 90 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROMs, DVDs, Blu-ray (TM) disks, hard disk drives, ROMs, or integrated circuits, USB memory, magneto-optical disks, or a computer-readable card, such as a PCMCIA card, etc., whether internal or external to the electronic device 90. Transitory or non-tangible computer-readable transmission media may also participate in providing software, applications, instructions, and/or data to the electronic device 90, examples of such transmission media including radio or infrared transmission channels, network connections to another computer or another networked device, and the internet or intranet including e-mail transmissions and information recorded on websites and the like.
Computer programs (also called computer program code) are stored in the main memory 903 and/or the secondary memory 910. Computer programs may also be received via communications interface 940. Such computer programs, when executed, enable the electronic device 90 to perform one or more features of embodiments discussed herein. In various embodiments, the computer programs, when executed, enable the processor 901 to perform the features of the embodiments described above. Accordingly, such computer programs represent controllers of the electronic device 90.
The software may be stored in a computer program product and loaded into electronic device 90 using removable storage drive 912, hard disk drive 911 or interface 920. Alternatively, the computer program product may be downloaded to electronic device 90 via communications path 941. The software, when executed by the processor 901, causes the electronic device 90 to perform the functions of the embodiments described herein.
It should be understood that the embodiment of fig. 9 is given by way of example only. Accordingly, in some embodiments, one or more features of the electronic device 90 may be omitted. Also, in some embodiments, one or more features of the electronic device 90 may be combined together. Additionally, in some embodiments, one or more features of the electronic device 90 may be separated into one or more components.
It will be appreciated that the elements shown in fig. 9 serve to provide a means for performing the various functions and operations of the server described in the above embodiments.
In one embodiment, a server may be generally described as a physical device including at least one processor and at least one memory including computer program code. The at least one memory and the computer program code are configured to, with the at least one processor, cause the physical device to perform necessary operations.
The disclosed embodiments also provide a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the functions of the event recognition method shown in fig. 2-6.
The disclosed embodiments also provide a computer program comprising computer readable code which, when run on a device, causes a processor in the device to implement the functions of the event recognition method shown in fig. 2-6.
Computer-readable media, including both persistent and non-persistent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by an electronic device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
From the above description of the embodiments, it is clear to those skilled in the art that the embodiments of the present disclosure can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments of the present specification.
The basic principles of the present invention have been described above with reference to specific embodiments, but it should be noted that the advantages, effects, etc. mentioned in the present invention are only examples and are not limiting, and must not be considered as necessarily possessed by every embodiment of the present invention. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the invention is not limited to the specific details described above.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions.
The method and apparatus of the present invention may be implemented in a number of ways. For example, the methods and apparatus of the present invention may be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustrative purposes only, and the steps of the method of the present invention are not limited to the order specifically described above unless specifically indicated otherwise. Furthermore, in some embodiments, the present invention may also be embodied as a program recorded in a recording medium, the program including machine-readable instructions for implementing a method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (14)

1. An event recognition method, comprising:
acquiring an information text;
predicting, by using an event type recognition model obtained by pre-training, a probability value of whether the information text is each event type in an event type list, to obtain a first prediction result; the event type list comprises a plurality of preset event types;
and determining the event type corresponding to the information text based on the first prediction result.
2. The method of claim 1, wherein the first prediction result comprises: a probability value that the information text is each event type in the event type list and a probability value that the information text is not each event type in the event type list;
the determining the event type corresponding to the information text based on the first prediction result comprises:
acquiring, as an event type corresponding to the information text, an event type whose probability value of being that event type in the first prediction result is greater than its probability value of not being that event type; or,
acquiring, as an event type corresponding to the information text, an event type whose probability value in the first prediction result is greater than a first preset threshold; or,
acquiring, as the event types corresponding to the information text, the N event types with the largest probability values in the first prediction result, wherein N is an integer greater than 0.
3. The method according to claim 1, wherein after determining the event type corresponding to the information text, further comprising:
determining the event category corresponding to the information text according to the correspondence between each event category in a preset event category list and each event type in the event type list; the event category list comprises a plurality of preset event categories.
4. The method of claim 1, further comprising:
predicting, by using an event category identification model obtained by pre-training, a probability value of whether the information text is each event category in an event category list, to obtain a second prediction result; the event category list comprises a plurality of preset event categories;
determining an event category corresponding to the information text based on the second prediction result;
determining whether the event category corresponding to the information text and the event type corresponding to the information text conform to the correspondence between each event category in a preset event category list and each event type in the event type list;
and if the correspondence is met, outputting the event category and/or the event type corresponding to the information text.
5. The method according to any one of claims 1-4, wherein after obtaining the information text, further comprising:
identifying whether the length of the information text is greater than a preset length;
if the length of the information text is larger than a preset length, dividing the information text into a plurality of text segments by taking the preset length as a unit;
the predicting, by using the event type recognition model obtained by pre-training, the probability value of whether the information text is each event type in the event type list to obtain a first prediction result comprises:
predicting, for each text segment by using the event type recognition model obtained by pre-training, a probability value of whether the text segment is each event type in the event type list;
obtaining the first prediction result based on the probability values of whether the text segments are each event type in the event type list;
otherwise, if the length of the information text is less than or equal to the preset length, performing the operation of predicting, by using the event type recognition model obtained by pre-training, the probability value of whether the information text is each event type in the event type list, to obtain the first prediction result.
6. The method according to any one of claims 1-4, wherein after obtaining the information text, further comprising:
identifying whether words in a preset garbage corpus word set exist in the information text or not;
if words in the preset garbage corpus word set exist in the information text, performing no subsequent operation, or filtering out the words in the preset garbage corpus word set and, for the filtered information text, performing the operation of predicting, by using the event type recognition model obtained by pre-training, the probability value of whether the information text is each event type in the event type list, to obtain the first prediction result;
otherwise, if no word in the preset garbage corpus word set exists in the information text, performing the operation of predicting, by using the event type recognition model obtained by pre-training, the probability value of whether the information text is each event type in the event type list, to obtain the first prediction result.
7. The method according to any one of claims 1-4, wherein after obtaining the information text, further comprising:
carrying out entity identification on the information text;
carrying out correlation analysis between the entity and the information text on the information text to obtain a correlation analysis result;
the predicting, by using the event type recognition model obtained by pre-training, the probability value of whether the information text is each event type in the event type list to obtain a first prediction result comprises:
according to the correlation analysis result, for an entity with high correlation, performing the operation of predicting, by using the event type recognition model obtained by pre-training, the probability value of whether the information text is each event type in the event type list, to obtain the first prediction result.
8. The method of claim 7, wherein the high correlation comprises any one or more of: a high frequency of occurrence, and a high number of occurrences in the first-person perspective.
9. The method according to any one of claims 1-4, wherein the training of the event type recognition model comprises:
inputting each event type pre-training corpus of a plurality of event type pre-training corpora, together with its event type labeling information, into the event type recognition model, so that the event type recognition model learns the event type information corresponding to each event type pre-training corpus;
inputting each first training corpus of a plurality of first training corpora, together with its event type labeling information, into the event type recognition model, and outputting, through the event type recognition model, a probability value of whether each first training corpus is each event type in the event type list;
and training the event type recognition model based on the probability values of whether the plurality of first training corpora are each event type in the event type list and the probability values corresponding to the corresponding event type labeling information.
10. The method of claim 4, wherein the training of the event category identification model comprises:
inputting each event category pre-training corpus of a plurality of event category pre-training corpora, together with its event category labeling information, into the event category identification model, so that the event category identification model learns the event category information corresponding to each event category pre-training corpus;
inputting each second training corpus of a plurality of second training corpora, together with its event category labeling information, into the event category identification model, and outputting, through the event category identification model, a probability value of whether each second training corpus is each event category in the event category list;
and training the event category identification model based on the probability values of whether the plurality of second training corpora are each event category in the event category list and the probability values corresponding to the corresponding event category labeling information.
11. An event recognition apparatus, comprising:
the text acquisition module is used for acquiring the information text;
the first prediction module is used for predicting, by using an event type recognition model obtained by pre-training, a probability value of whether the information text is each event type in an event type list, to obtain a first prediction result; the event type list comprises a plurality of preset event types;
and the first determining module is used for determining the event type corresponding to the information text based on the first prediction result.
12. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the event recognition method of any one of claims 1-10 via execution of the executable instructions.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the event recognition method of any one of claims 1 to 10.
14. A computer program comprising computer readable code, characterized in that when the computer readable code is run on a device, a processor in the device executes a method for event recognition according to any of claims 1-10.
CN202110902349.4A 2021-08-06 2021-08-06 Event recognition method and device, electronic equipment, medium and program Active CN113609391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110902349.4A CN113609391B (en) 2021-08-06 2021-08-06 Event recognition method and device, electronic equipment, medium and program

Publications (2)

Publication Number Publication Date
CN113609391A true CN113609391A (en) 2021-11-05
CN113609391B CN113609391B (en) 2024-04-19

Family

ID=78307494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110902349.4A Active CN113609391B (en) 2021-08-06 2021-08-06 Event recognition method and device, electronic equipment, medium and program

Country Status (1)

Country Link
CN (1) CN113609391B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080133515A1 (en) * 2006-12-01 2008-06-05 Industrial Technology Research Institute Method and system for executing correlative services
WO2018023981A1 (en) * 2016-08-03 2018-02-08 平安科技(深圳)有限公司 Public opinion analysis method, device, apparatus and computer readable storage medium
CN108320255A (en) * 2017-01-16 2018-07-24 软通动力信息技术(集团)有限公司 A kind of information processing method and device
CN108563655A (en) * 2017-12-28 2018-09-21 北京百度网讯科技有限公司 Text based event recognition method and device
US10769223B1 (en) * 2017-05-16 2020-09-08 State Farm Mutual Automobile Insurance Company Systems and methods for identification and classification of social media
CN112036168A (en) * 2020-09-02 2020-12-04 深圳前海微众银行股份有限公司 Event subject recognition model optimization method, device and equipment and readable storage medium
CN112597366A (en) * 2020-11-25 2021-04-02 中国电子科技网络信息安全有限公司 Encoder-Decoder-based event extraction method
CN112750028A (en) * 2020-12-30 2021-05-04 北京知因智慧科技有限公司 Risk early warning method and device of event text based on entity extraction
CN112860852A (en) * 2021-01-26 2021-05-28 北京金堤科技有限公司 Information analysis method and device, electronic equipment and computer readable storage medium
CN112967144A (en) * 2021-03-09 2021-06-15 华泰证券股份有限公司 Financial credit risk event extraction method, readable storage medium and device
CN113032520A (en) * 2021-02-26 2021-06-25 北京金堤征信服务有限公司 Information analysis method and device, electronic equipment and computer readable storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080133515A1 (en) * 2006-12-01 2008-06-05 Industrial Technology Research Institute Method and system for executing correlative services
WO2018023981A1 (en) * 2016-08-03 2018-02-08 平安科技(深圳)有限公司 Public opinion analysis method, device, apparatus and computer readable storage medium
CN108320255A (en) * 2017-01-16 2018-07-24 软通动力信息技术(集团)有限公司 A kind of information processing method and device
US10769223B1 (en) * 2017-05-16 2020-09-08 State Farm Mutual Automobile Insurance Company Systems and methods for identification and classification of social media
CN108563655A (en) * 2017-12-28 2018-09-21 北京百度网讯科技有限公司 Text based event recognition method and device
CN112036168A (en) * 2020-09-02 2020-12-04 深圳前海微众银行股份有限公司 Event subject recognition model optimization method, device and equipment and readable storage medium
CN112597366A (en) * 2020-11-25 2021-04-02 中国电子科技网络信息安全有限公司 Encoder-Decoder-based event extraction method
CN112750028A (en) * 2020-12-30 2021-05-04 北京知因智慧科技有限公司 Risk early warning method and device of event text based on entity extraction
CN112860852A (en) * 2021-01-26 2021-05-28 北京金堤科技有限公司 Information analysis method and device, electronic equipment and computer readable storage medium
CN113032520A (en) * 2021-02-26 2021-06-25 北京金堤征信服务有限公司 Information analysis method and device, electronic equipment and computer readable storage medium
CN112967144A (en) * 2021-03-09 2021-06-15 华泰证券股份有限公司 Financial credit risk event extraction method, readable storage medium and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
C. KEROGLOU et al.: "Bound on the probability of HMM misclassification", MEDITERRANEAN CONFERENCE ON CONTROL & AUTOMATION (MED), vol. 2011, 11 August 2011 (2011-08-11), pages 449 - 454 *
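巩前胜: "Research on key technologies of scenario recognition in 'scenario-response' emergency decision-making", China Doctoral Dissertations Full-text Database (Engineering Science and Technology I), vol. 2018, no. 12, 15 December 2018 (2018-12-15), pages 021 - 36 *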

Also Published As

Publication number Publication date
CN113609391B (en) 2024-04-19

Similar Documents

Publication Publication Date Title
CN107463605B (en) Method and device for identifying low-quality news resource, computer equipment and readable medium
CN110598157B (en) Target information identification method, device, equipment and storage medium
CN112231484B (en) News comment auditing method, system, device and storage medium
JP2022535165A (en) Data classification using information aggregated from many classification modules
US20200226510A1 (en) Method and System for Determining Risk Score for a Contract Document
US10637826B1 (en) Policy compliance verification using semantic distance and nearest neighbor search of labeled content
CN112860852B (en) Information analysis method and device, electronic equipment and computer readable storage medium
CN113010638B (en) Entity recognition model generation method and device and entity extraction method and device
CN112686022A (en) Method and device for detecting illegal corpus, computer equipment and storage medium
CN111984792A (en) Website classification method and device, computer equipment and storage medium
CN112507167A (en) Method and device for identifying video collection, electronic equipment and storage medium
CN113032520A (en) Information analysis method and device, electronic equipment and computer readable storage medium
CN105512300B (en) Information filtering method and system
CN111695357A (en) Text labeling method and related product
CN117520503A (en) Financial customer service dialogue generation method, device, equipment and medium based on LLM model
CN107545505A (en) Method and system for recognizing insurance and finance product information
CN113609390A (en) Information analysis method and device, electronic equipment and computer readable storage medium
CN113887191A (en) Method and device for detecting similarity of articles
CN111784360B (en) Anti-fraud prediction method and system based on network link backtracking
CN116089732B (en) User preference identification method and system based on advertisement click data
CN112685618A (en) User feature identification method and device, computing equipment and computer storage medium
CN113609391B (en) Event recognition method and device, electronic equipment, medium and program
KR20100090178A (en) Apparatus and method for refining keywords, and content searching system and method
CN113591467B (en) Event main body recognition method and device, electronic equipment and medium
CN112256836A (en) Recording data processing method and device and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant