CN110705597B - Network early event detection method and system based on event cause and effect extraction - Google Patents


Info

Publication number
CN110705597B
Authority
CN
China
Prior art keywords
event
seedling
events
causal
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910833900.7A
Other languages
Chinese (zh)
Other versions
CN110705597A (en)
Inventor
史存会
程学旗
王俊
张瑾
俞晓明
刘悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN201910833900.7A
Publication of CN110705597A
Application granted
Publication of CN110705597B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and system for detecting early network events based on event causality extraction. The cause events in early causal event pairs are taken as early events and stored in an early event sample library, and the data in this library are used as a training set to train a first, machine-learning-based early event classifier. The causal relations of the early causal event pairs are taken as early event judgment rules and stored in an early event judgment rule library, which is used to construct a second, rule-based early event classifier. Events are extracted from a designated network platform to obtain a plurality of structured events; structured events that refer to the same event are unified into a coreference event, and the coreference event is generalized to obtain an abstract event of the network platform. The abstract event is processed by the first and the second early event classifiers respectively, and their results are integrated to obtain the early event detection result for the network platform.

Description

Network early event detection method and system based on event cause and effect extraction
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a method and system for detecting early network events based on event causality extraction.
Background
At present, news, microblogs, forums, social media, and other Internet sources produce information that is huge in volume, rapidly updated, and highly time-sensitive; this information is generated continuously and forms network streaming data. Network streaming data contains a large number of real-world events whose content spans every area of social life, which makes it an important data source for online public opinion analysis.
Events are the focus of online public opinion monitoring. An early event is a leading event of a major event that may occur (for example, one involving social and livelihood issues, politically sensitive issues, or emergencies), and it can show signs of the major event before it happens. Discovering early events and tracking and analyzing their development helps people grasp the occurrence and development trends of major events in time, so that response plans can be formulated promptly to contain events at the germination stage and minimize their negative impact. For example, if an event is known to be an early event of social panic, that event is likely to lead to social panic; once it occurs, a public opinion system can respond in time and formulate a response plan to prevent the panic. It is therefore important to detect early events quickly and accurately from large volumes of event stream data.
Existing early event detection methods fall into two categories. The first is rule-based, for example detecting early events by their keywords; the second is based on machine learning, for example identifying early events with an SVM-based text classifier.
The first method detects early events with high precision, but because it is limited to predefined judgment rules, it cannot recognize newly emerging early events and therefore generalizes poorly. The second method also achieves high precision and has some generalization ability, but its classifier is relatively static: after the initial training on the training set, it is either never updated or updated manually at intervals. The model therefore maintains its detection accuracy and generalization ability for a period of time, but its effectiveness degrades afterwards because updates are missing or late.
Disclosure of Invention
The invention aims to solve the problems that prior-art early event discriminators generalize poorly and their models are relatively static, and provides an early event detection method and system based on event causality extraction.
Specifically, the application provides a network early event detection method based on event causality extraction, comprising:
Step 1: reading historical events within a time period from a historical event library in which abstract events are stored, extracting causal event pairs, each consisting of a cause event and an effect event, from those historical events, and filtering all causal event pairs to select early causal event pairs;
Step 2: taking the cause events in the early causal event pairs as early events and storing them in an early event sample library; using the data in the early event sample library as a training set to train a first, machine-learning-based early event classifier; taking the causal relations of the early causal event pairs as early event judgment rules and storing them in an early event judgment rule library; and constructing a second, rule-based early event classifier from the early event judgment rule library;
Step 3: extracting events from a designated network platform to obtain a plurality of structured events; unifying the structured events that refer to the same event into a coreference event; generalizing the coreference event to obtain an abstract event of the network platform; processing the abstract event with the first and the second early event classifiers respectively; and integrating the results of the two classifiers to obtain the early event detection result for the network platform.
In the above method, the input of the first early event classifier is an abstract event, and the first classifier judges whether the event is an early event according to the event features of the abstract event; the input of the second early event classifier is also an abstract event, and the second classifier matches the abstract event against the rules in the early event judgment rule library, judging the event to be an early event when it satisfies a rule.
In the above method, extracting the causal event pairs in step 1 comprises:
Step 11: judging whether a cause event and an effect event form a candidate causal pair according to the probability that the effect event occurs within a period of time after the cause event occurs;
Step 12: comparing all candidate causal pairs of the same effect event and selecting the top k candidate causal pairs with the greatest correlation as the causal event pairs of that effect event.
In the above method, the coreference event comprises event participants, event trigger words, event elements, and event locations.
The above method further comprises:
a dynamic updating step, in which steps 1 and 2 are repeated periodically at a preset interval, so that historical events within a time period are read from the historical event library and the first and second early event classifiers are dynamically updated, preserving the early event detection performance of step 3.
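One pass of the dynamic updating step (steps 1 and 2) can be sketched in Python as below. This is a minimal sketch under stated assumptions: events are hypothetical `(timestamp, abstract_event)` tuples, `extract_pairs` and `is_major` are caller-supplied stand-ins for the causality extraction and the major event check, `adjacent_pairs` is a hypothetical toy extractor, and retraining the classifiers from the updated libraries is left to the caller.

```python
from typing import Callable, List, Set, Tuple

Event = Tuple[float, str]  # hypothetical: (timestamp, abstract event label)
Pair = Tuple[str, str]     # (cause event, effect event)


def update_cycle(history: List[Event], now: float, window: float,
                 extract_pairs: Callable[[List[Event]], List[Pair]],
                 is_major: Callable[[str], bool],
                 sample_lib: Set[str], rule_lib: Set[Pair]) -> List[Pair]:
    """Run steps 1 and 2 once; a scheduler would call this every preset period."""
    # Step 1: read the historical events that fall inside the look-back window.
    recent = [ev for ev in history if now - window <= ev[0] <= now]
    # Step 1: extract causal pairs and keep only those whose effect is a major event.
    early_pairs = [(c, e) for c, e in extract_pairs(recent) if is_major(e)]
    # Step 2: cause events become early event samples, pairs become judgment rules.
    for cause, effect in early_pairs:
        sample_lib.add(cause)
        rule_lib.add((cause, effect))
    return early_pairs


def adjacent_pairs(events: List[Event]) -> List[Pair]:
    """Hypothetical toy extractor: treat each event as a cause of the next one."""
    ordered = sorted(events)
    return [(a[1], b[1]) for a, b in zip(ordered, ordered[1:])]
```

Calling `update_cycle` again in the next period with a shifted window adds newly discovered early events and rules without manual labeling, which is the point of the dynamic updating step.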
The invention also provides a network early event detection system based on event causality extraction, comprising:
Module 1, which reads historical events within a time period from a historical event library in which abstract events are stored, extracts causal event pairs, each consisting of a cause event and an effect event, from those historical events, and filters all causal event pairs to select early causal event pairs;
Module 2, which takes the cause events in the early causal event pairs as early events and stores them in an early event sample library, uses the data in the early event sample library as a training set to train a first, machine-learning-based early event classifier, takes the causal relations of the early causal event pairs as early event judgment rules and stores them in an early event judgment rule library, and constructs a second, rule-based early event classifier from the early event judgment rule library;
Module 3, which extracts events from a designated network platform to obtain a plurality of structured events, unifies the structured events that refer to the same event into a coreference event, generalizes the coreference event to obtain an abstract event of the network platform, processes the abstract event with the first and the second early event classifiers respectively, and integrates the results of the two classifiers to obtain the early event detection result for the network platform.
In the above system, the input of the first early event classifier is an abstract event, and the first classifier judges whether the event is an early event according to the event features of the abstract event; the input of the second early event classifier is also an abstract event, and the second classifier matches the abstract event against the rules in the early event judgment rule library, judging the event to be an early event when it satisfies a rule.
In the above system, module 1 extracts the causal event pairs through:
Module 11, which judges whether a cause event and an effect event form a candidate causal pair according to the probability that the effect event occurs within a period of time after the cause event occurs;
Module 12, which compares all candidate causal pairs of the same effect event and selects the top k candidate causal pairs with the greatest correlation as the causal event pairs of that effect event.
In the above system, the coreference event comprises event participants, event trigger words, event elements, and event locations.
The above system further comprises:
a dynamic updating module, which repeats modules 1 and 2 periodically at a preset interval, so that historical events within a time period are read from the historical event library and the first and second early event classifiers are dynamically updated, preserving the early event detection performance of module 3.
According to the above scheme, the invention has the following advantages:
Compared with the prior art, the event preprocessing module performs coreference resolution and generalization on events, which makes event matching more accurate and simpler and alleviates event sparsity. The early event detection module combines the two mainstream early event discrimination methods so that their strengths complement each other, improving the overall discriminator. The early event detection model updating module extracts new data from causal events and updates the models without manually labeling new data or manually writing new rules, which greatly reduces labor cost. At the same time, dynamic updating lets the models handle real-time data and newly emerging early events, reducing false and missed early event judgments and improving early event monitoring.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In the prior art, features of early events are generally learned from known early events and then used to judge new events. For example, rule-based methods rely mainly on features such as keywords that humans learn from known early events in order to formulate judgment rules, and machine-learning-based methods mainly train an early event discriminator with existing early events as the training set. These methods consider only known early events, so discrimination is confined to a narrow range and unknown potential early events are not discovered. The solution of the invention is as follows: potential causal relations between events are mined from large event streams (for example, with APT logic); among the mined causal pairs, the cause events whose effect event is a major event (an event of interest) are stored as early events in an early event library, and the early event discriminator is then updated.
To achieve the above purpose, the invention is divided into three modules:
1. The event preprocessing module preprocesses input events through two processes: event coreference resolution and event generalization. Event coreference resolution unifies multiple events (with different expressions) that refer to the same event into one coreference event; event generalization abstracts and generalizes coreference events, turning specific events into abstract ones.
The information of an input event comprises: event ID, event participants, event trigger word, event elements, event occurrence time, event location, and event description.
Event coreference resolution: for the input events, a machine learning model judges whether multiple events refer to the same event, and the events that refer to the same event are unified into a single coreference event.
An event after coreference resolution is called a coreference event, and its information comprises: coreference event ID, event participants, event trigger word, event elements, event occurrence time, event location, and a list of source event IDs. The event trigger word is a word or phrase that denotes the event; an event element is a pair consisting of an entity participating in the event and an attribute of that entity.
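The coreference event record and the merging of co-referring input events can be sketched as follows. The field names mirror the lists above; `naive_same_event` is a hypothetical placeholder for the machine learning model that decides whether two events co-refer, and the greedy clustering, union of participants, and use of the first mention as representative are illustrative assumptions, not the procedure specified here.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class InputEvent:
    event_id: str
    participants: List[str]
    trigger: str
    elements: List[Tuple[str, str]]  # (entity, attribute) pairs
    time: str
    place: str
    description: str = ""


@dataclass
class CorefEvent:
    coref_id: str
    participants: List[str]
    trigger: str
    elements: List[Tuple[str, str]]
    time: str
    place: str
    source_ids: List[str] = field(default_factory=list)


def resolve_coreference(events: List[InputEvent],
                        same_event: Callable[[InputEvent, InputEvent], bool]) -> List[CorefEvent]:
    """Greedy single pass: each event joins the first cluster whose first member co-refers."""
    clusters: List[List[InputEvent]] = []
    for ev in events:
        for cluster in clusters:
            if same_event(cluster[0], ev):
                cluster.append(ev)
                break
        else:
            clusters.append([ev])
    merged = []
    for i, cluster in enumerate(clusters):
        rep = cluster[0]  # representative mention (assumption)
        participants = sorted({p for ev in cluster for p in ev.participants})
        merged.append(CorefEvent(f"coref-{i}", participants, rep.trigger, rep.elements,
                                 rep.time, rep.place, [ev.event_id for ev in cluster]))
    return merged


def naive_same_event(a: InputEvent, b: InputEvent) -> bool:
    """Hypothetical stand-in for the ML coreference model."""
    return (a.trigger, a.time, a.place) == (b.trigger, b.time, b.place)
```

The `source_ids` list keeps the link back to the original mentions, matching the "list of source event IDs" field of a coreference event.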
Event generalization: coreference events are abstracted and generalized. Specifically, the event participants, trigger words, elements, and locations are abstracted by converting the words that describe these elements into their hypernyms using WordNet (a lexical database grounded in cognitive linguistics that links words by meaning into a graph structure). For example, the event "Zhang San murders Li Si" can be generalized into the abstract event "someone murders someone".
Event generalization unifies words with the same or similar meanings across events, which exposes similarities between different events and alleviates event sparsity in event matching (for example, the events "Zhang San kills Li Si" and "Wang Wu kills Qian Liu" can both be generalized into the abstract event "someone kills someone"). This facilitates the subsequent event causality extraction.
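The effect of generalization on matching can be sketched with a hypothetical lookup table standing in for WordNet. With NLTK, hypernyms would come from `wordnet.synsets(word)` and `Synset.hypernyms()`; person names are not in WordNet, so the toy table `TOY_HYPERNYMS` (an assumption for illustration) also maps them to "someone".

```python
from typing import Dict, Tuple

# Hypothetical toy hypernym table standing in for WordNet lookups.
TOY_HYPERNYMS: Dict[str, str] = {
    "Zhang San": "someone",
    "Li Si": "someone",
    "Wang Wu": "someone",
    "Qian Liu": "someone",
    "murder": "kill",
}


def hypernym(word: str) -> str:
    """Return the generalized form of a word, or the word itself if it is unknown."""
    return TOY_HYPERNYMS.get(word, word)


def generalize_event(subject: str, trigger: str, obj: str,
                     place: str = "") -> Tuple[str, str, str, str]:
    """Generalize each descriptive slot of a (simplified) event independently."""
    return (hypernym(subject), hypernym(trigger), hypernym(obj), hypernym(place))
```

Two events that share no words before generalization, such as ("Zhang San", "kill", "Li Si") and ("Wang Wu", "murder", "Qian Liu"), map to the same abstract tuple afterwards, which is what makes later frequency counting over abstract events less sparse.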
2. The early event detection module monitors input abstract events and judges whether they are early events. It contains two early event classifiers, a machine-learning-based classifier and a rule-based classifier, whose results are integrated to produce the module's output.
Machine-learning-based classifier: its input is an abstract event (with the event participants, trigger word, elements, occurrence time, and location as event features), and the classifier model judges from these features whether the event is an early event. The classifier is a neural network model trained on the early events in the early event sample library. The data in the sample library can be updated in two ways: manually, or automatically by the early event detection model updating module.
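As a rough illustration of training on the early event sample library, the sketch below swaps the unspecified neural network for a logistic regression (equivalent to a single-layer network) over one-hot `field=value` features, trained by stochastic gradient ascent in pure Python. The dictionary event format, feature encoding, learning rate, and epoch count are all assumptions chosen for toy data.

```python
import math
from typing import Dict, List, Tuple

AbstractEvent = Dict[str, str]  # hypothetical format: field name -> abstract value


def featurize(event: AbstractEvent) -> List[str]:
    """One-hot style features such as 'trigger=gather'."""
    return [f"{name}={value}" for name, value in sorted(event.items())]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[Tuple[AbstractEvent, int]], epochs: int = 200, lr: float = 0.5):
    """Logistic regression fitted by stochastic gradient ascent on the log-likelihood."""
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for event, label in samples:
            feats = featurize(event)
            p = sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))
            grad = label - p  # derivative of the log-likelihood with respect to z
            bias += lr * grad
            for f in feats:
                weights[f] = weights.get(f, 0.0) + lr * grad
    return weights, bias


def predict_proba(model, event: AbstractEvent) -> float:
    """f1: estimated probability that the abstract event is an early event."""
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in featurize(event)))
```

Features unseen during training get weight zero, so the model still produces a score for newly generalized events; retraining on an updated sample library simply means calling `train` again.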
Rule-based early event classifier: its input is an abstract event (with the event participants, trigger word, elements, occurrence time, and location as event features), and the classifier matches the abstract event against the rules in the early event judgment rule library. If the abstract event satisfies a rule, the classifier judges it to be an early event; otherwise it is a non-early event. The rule library can likewise be updated in two ways: manually, or automatically by the early event detection model updating module.
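Rule matching can be sketched as below. The representation is an assumption: each rule is a hypothetical dictionary of required abstract field values (the cause side of an early causal pair), `"*"` marks a field that matches anything, and the "matching degree" is taken to be the fraction of required fields that match, with the best-matching rule deciding the output.

```python
from typing import Dict, List, Tuple

AbstractEvent = Dict[str, str]  # hypothetical format: field name -> abstract value
Rule = Dict[str, str]           # required field values; "*" means "any value"


def match_degree(event: AbstractEvent, rule: Rule) -> float:
    """Fraction of the rule's required fields that the event satisfies."""
    required = {k: v for k, v in rule.items() if v != "*"}
    if not required:
        return 1.0
    hits = sum(1 for k, v in required.items() if event.get(k) == v)
    return hits / len(required)


def rule_classify(event: AbstractEvent, rules: List[Rule],
                  threshold: float = 1.0) -> Tuple[float, bool]:
    """Return (f2, is_early): the best matching degree and whether it reaches the threshold."""
    score = max((match_degree(event, r) for r in rules), default=0.0)
    return score, score >= threshold
```

With the default threshold of 1.0, an event counts as early only when it satisfies every required field of some rule, matching the "satisfies the rule" wording; a lower threshold would allow partial matches.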
The final output of the early event detection module integrates the results of the two classifiers as follows:
$$f = \alpha f_1 + (1 - \alpha) f_2$$
where $f$ is the integrated output of the early event detection module, $f_1$ is the output of the machine-learning-based classifier, $f_2$ is the output of the rule-based classifier, and $\alpha$ is the weight of $f_1$ in the integrated output, 0.75 by default.
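The weighted integration is a convex combination of the two classifier outputs. The sketch below implements it directly; the decision threshold of 0.5 in the hypothetical helper `is_early_event` is an assumption, since the text gives only the weighting formula.

```python
def integrate(f1: float, f2: float, alpha: float = 0.75) -> float:
    """f = alpha * f1 + (1 - alpha) * f2, with alpha = 0.75 as the default weight."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * f1 + (1.0 - alpha) * f2


def is_early_event(f1: float, f2: float, alpha: float = 0.75, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag the event when the integrated score reaches threshold."""
    return integrate(f1, f2, alpha) >= threshold
```

With the default weight, a rule match alone ($f_2 = 1$) contributes at most 0.25, so the machine-learning score dominates; for example, $f_1 = 0.8, f_2 = 1.0$ gives $f = 0.85$.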
3. The early event detection model updating module updates the two classification models used by the early event detection module. It comprises four steps: extracting causal relations from historical events, filtering early causal events, updating the early event sample library and the early event judgment rule library, and updating the two classifiers of the early event detection module with the updated data.
Extracting causal relations from historical events: historical events within a period of time (represented as abstract events) are read from the historical event library, and the causal event pairs (cause event + effect event) among them are extracted with an APT logic algorithm.
APT logic algorithm: APT (annotated probabilistic temporal) logic addresses reasoning problems of the form "the probability that event G occurs within t time units after event F occurs is between l% and u%".
Specifically, the causality extraction algorithm consists of an event causal pair extraction algorithm and an event causal pair screening algorithm.
The event causal pair extraction algorithm computes, for a known cause event $c$, the probability that an effect event $e$ occurs within $\Delta t$ after $c$, and decides whether the causal relation of the event pair holds by checking whether this probability exceeds a threshold and whether the support of the cause event exceeds a threshold. Formally:
$$p_{c \to e} = \frac{\left|\{\, t \mid c \in E(t) \wedge e \in \mathrm{SumE}(t, t+\Delta t) \,\}\right|}{\left|\{\, t \mid c \in E(t) \,\}\right|}$$
$$s_c = \left|\{\, t \mid c \in E(t) \,\}\right|$$
where $c \to e$ denotes that event $e$ occurs within $\Delta t$ after event $c$ occurs, i.e. there is a causal event pair (cause event $c$, effect event $e$); $p_{c \to e}$ is the probability that $c$ and $e$ form a causal pair; $E(t)$ is the set of events occurring at time $t$, so $c \in E(t)$ means that $c$ occurs at time $t$; $\mathrm{SumE}(t, t+\Delta t)$ is the set of events occurring in the time period $(t, t+\Delta t]$, so $e \in \mathrm{SumE}(t, t+\Delta t)$ means that $e$ occurs in that period; and $s_c$, the support of the cause event $c$, is the number of time points at which $c$ occurs and characterizes how often it occurs.
A candidate causal pair $c \to e$ is extracted when both $p_{c \to e}$ and $s_c$ exceed their thresholds, i.e.
$$p_{c \to e} > \mathrm{MinProb}, \qquad s_c > \mathrm{MinSup}$$
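The extraction test can be sketched directly from the definitions of $E(t)$, $\mathrm{SumE}$, $p_{c \to e}$, and $s_c$. The sketch assumes a hypothetical discretized stream, a dict from integer time step to the set of abstract events observed then; self-pairs $c \to c$ are skipped, which is an additional assumption.

```python
from typing import Dict, List, Set, Tuple

Stream = Dict[int, Set[str]]  # hypothetical: time step t -> E(t)


def sum_events(stream: Stream, t: int, dt: int) -> Set[str]:
    """SumE(t, t+dt): union of E(t') for t < t' <= t + dt."""
    found: Set[str] = set()
    for t2 in range(t + 1, t + dt + 1):
        found |= stream.get(t2, set())
    return found


def extract_candidates(stream: Stream, dt: int, min_prob: float,
                       min_sup: int) -> List[Tuple[str, str, float]]:
    """Return (c, e, p_c->e) for every pair with p_c->e > min_prob and s_c > min_sup."""
    all_events = sorted(set().union(*stream.values())) if stream else []
    candidates = []
    for c in all_events:
        times_c = [t for t, evs in stream.items() if c in evs]
        s_c = len(times_c)  # support of the cause event
        if s_c <= min_sup:
            continue
        for e in all_events:
            if e == c:
                continue
            hits = sum(1 for t in times_c if e in sum_events(stream, t, dt))
            p = hits / s_c  # p_{c -> e}
            if p > min_prob:
                candidates.append((c, e, p))
    return candidates
```

For example, if A occurs at steps 0, 2, and 5 and B follows it within one step twice, then $s_A = 3$ and $p_{A \to B} = 2/3$, so A → B passes thresholds of MinProb = 0.6 and MinSup = 2.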
The event causal pair screening algorithm compares the candidate causal pairs of the same effect event and selects the top k candidates with the greatest relevance as the predicted causal pairs of that effect event. Specifically, suppose the effect event $e$ has two candidate causal pairs $c_1 \to e$ and $c_2 \to e$. Compute the probability that $e$ follows when both causes occur together, and the probability that $e$ follows when the latter occurs without the former:
$$p_{c_1 \wedge c_2 \to e} = \frac{\left|\{\, t \mid c_1 \in E(t) \wedge c_2 \in E(t) \wedge e \in \mathrm{SumE}(t, t+\Delta t) \,\}\right|}{\left|\{\, t \mid c_1 \in E(t) \wedge c_2 \in E(t) \,\}\right|}$$
$$p_{\neg c_1 \wedge c_2 \to e} = \frac{\left|\{\, t \mid c_1 \notin E(t) \wedge c_2 \in E(t) \wedge e \in \mathrm{SumE}(t, t+\Delta t) \,\}\right|}{\left|\{\, t \mid c_1 \notin E(t) \wedge c_2 \in E(t) \,\}\right|}$$
where $c_1 \wedge c_2 \to e$ denotes that events $c_1$ and $c_2$ occur at the same time and the effect event $e$ occurs within the following $\Delta t$, and $\neg c_1 \wedge c_2 \to e$ denotes that $c_2$ occurs alone ($c_1$ does not occur) and $e$ then occurs within $\Delta t$.
Compute
$$\varepsilon_{c_2}(c_1, e) = p_{c_1 \wedge c_2 \to e} - p_{\neg c_1 \wedge c_2 \to e}$$
If $\varepsilon_{c_2}(c_1, e)$ is greater than zero, the probability of $c_1 \wedge c_2 \to e$ is greater than that of $\neg c_1 \wedge c_2 \to e$, so compared with $c_2$ occurring alone, the presence of $c_1$ raises the probability that the effect event $e$ occurs. Hence $\varepsilon_{c_2}(c_1, e)$ can serve as an importance indicator of the candidate causal pair $c_1 \to e$. Comparing $c_1 \to e$ in this way against all other candidate causal pairs of $e$ yields a comprehensive evaluation score for $c_1 \to e$. Thus, for the set $R(e)$ of candidate causal pairs of an effect event $e$, the comprehensive evaluation score of any causal pair $c \to e$ is:
$$\mathrm{score}_{c \to e} = \frac{1}{|R(e)| - 1} \sum_{(c' \to e) \in R(e),\; c' \neq c} \left( p_{c \wedge c' \to e} - p_{\neg c \wedge c' \to e} \right)$$
The top k candidate causal pairs with the largest $\mathrm{score}_{c \to e}$ are selected as the predicted causal pairs of the effect event $e$.
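The screening score can be sketched on the same hypothetical stream format as the extraction step (integer time step to the set of events). One assumption beyond the text: a comparison whose conditioning set is empty (for example, $c'$ never occurs without $c$) is undefined, so it is skipped and the average is taken over the remaining comparisons.

```python
from typing import Callable, Dict, List, Optional, Set

Stream = Dict[int, Set[str]]  # hypothetical: time step t -> E(t)


def sum_events(stream: Stream, t: int, dt: int) -> Set[str]:
    """SumE(t, t+dt): union of E(t') for t < t' <= t + dt."""
    found: Set[str] = set()
    for t2 in range(t + 1, t + dt + 1):
        found |= stream.get(t2, set())
    return found


def cond_prob(stream: Stream, dt: int, effect: str,
              condition: Callable[[Set[str]], bool]) -> Optional[float]:
    """P(effect within dt | condition holds on E(t)); None if the condition never holds."""
    times = [t for t, evs in stream.items() if condition(evs)]
    if not times:
        return None
    hits = sum(1 for t in times if effect in sum_events(stream, t, dt))
    return hits / len(times)


def score(stream: Stream, dt: int, cause: str, effect: str, candidates: List[str]) -> float:
    """Average of p(c and c' -> e) - p(not c and c' -> e) over the other candidate causes c'."""
    diffs = []
    for other in candidates:
        if other == cause:
            continue
        p_both = cond_prob(stream, dt, effect, lambda evs: cause in evs and other in evs)
        p_other_only = cond_prob(stream, dt, effect, lambda evs: cause not in evs and other in evs)
        if p_both is None or p_other_only is None:
            continue  # undefined comparison on this data; skipped (assumption)
        diffs.append(p_both - p_other_only)
    return sum(diffs) / len(diffs) if diffs else 0.0


def top_k_causes(stream: Stream, dt: int, effect: str, candidates: List[str], k: int) -> List[str]:
    """Rank candidate causes of `effect` by score and keep the best k."""
    ranked = sorted(candidates, key=lambda c: score(stream, dt, c, effect, candidates), reverse=True)
    return ranked[:k]
```

Because every pair of the same effect event is divided by the same $|R(e)| - 1$, averaging and summing give the same top-k ranking when no comparison is skipped; the average is used here only to keep scores on the $[-1, 1]$ scale.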
Early causal event filtering: all causal event pairs obtained by causality extraction are filtered to select early causal event pairs, i.e. pairs whose effect event is a major event (an event of interest), in the form (early event + major event). Major events are defined by the specific business need and are generally events with significant social impact, such as mass incidents and terrorist attacks. They are identified by matching against a manually curated set of major events or against major event discrimination rules, which can change as the focus of the system changes.
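The filtering step reduces to keeping the pairs whose effect is recognized as major. In the sketch below, the major event set and the rule predicates are hypothetical inputs standing in for the manually curated set and the major event discrimination rules; the example rule is purely illustrative.

```python
from typing import Callable, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (cause event, effect event)


def filter_early_pairs(pairs: Iterable[Pair], major_events: Set[str],
                       major_rules: Iterable[Callable[[str], bool]] = ()) -> List[Pair]:
    """Keep (early event, major event) pairs: the effect is in the set or matches a rule."""
    rules = list(major_rules)  # allow any iterable and reuse it for every pair

    def is_major(effect: str) -> bool:
        return effect in major_events or any(rule(effect) for rule in rules)

    return [(cause, effect) for cause, effect in pairs if is_major(effect)]
```

Because the set and the rules are plain arguments, changing the system's focus only means passing a different set or rule list; no retraining is involved at this stage.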
Updating the early event sample library and the early event judgment rule library: the cause events of the obtained early causal event pairs are stored as early events in the early event sample library, and the causal relations of these pairs are stored as early event judgment rules in the early event judgment rule library.
Updating the two classifiers of the early event detection module: after the early event sample library and the early event judgment rule library are updated, the machine-learning-based classifier is retrained with the data in the early event sample library as the training set. The early event judgment rule library serves the following functions:
1. It stores the judgment rules for early events; rules can be added manually and also include causal relations discovered by the causal event discovery model;
2. Its rules are used by the rule-based early event classifier to judge early events;
3. Because the rule-based classifier needs no training data, it classifies the events to be judged with the rules in the rule library and decides whether an event is an early event according to the degree of rule matching. Updating the underlying rule library is therefore sufficient to update the classifier;
4. The rule-based classifier complements the machine-learning-based classifier, and the rule library also provides an interface for manually specifying particular early event judgment rules.
Early events are identified through event causality extraction, with the cause events of major events taken as early events. Technical effect: newly emerging early events are identified from historical event data in an unsupervised manner, and the early event classifiers are updated accordingly, so that the early event detection model is updated dynamically and the risk of it failing completely on new types of events is reduced.
Potential causal event pairs are discovered from the temporal relations in historical event data with an APT logic algorithm. Technical effect: large volumes of historical data are used effectively to discover valuable information that improves the early event detection model.
Event generalization preprocessing generalizes event representations and unifies events that are similar but expressed differently. Technical effect: the probability of successful event matching increases, event sparsity is alleviated, and higher-level causal relations between events become easier to discover.
The early event detection result integrates a machine-learning-based classifier and a rule-based classifier. Technical effect: each classifier has its own strengths and weaknesses; combining them by weight lets their strengths complement each other and makes early event detection more accurate.
In order to make the aforementioned features and advantages of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
Referring to the attached FIG. 1, the method of the present invention comprises the following steps:
The event preprocessing module preprocesses the input events. It comprises two processes: event coreference resolution and event generalization.
Each input event contains the following information: event ID, event participants, event trigger word, event elements, event occurrence time, event occurrence place and event description.
Event coreference resolution: a machine learning model judges whether several of the input events refer to the same event, and the events that refer to the same event are unified into a single coreference event.
An event after coreference resolution is called a coreference event; its information comprises: coreference event ID, event participants, event trigger word, event elements, event time, event place and a list of source event IDs.
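The event records above can be sketched as Python data classes. This is a minimal sketch under stated assumptions: the field types are guesses, and because the patent does not specify its machine learning coreference model, the hypothetical `coreference_key` function substitutes a simple exact-match rule (same participants, trigger, time and place) purely to show how input events collapse into coreference events that keep a list of source event IDs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class InputEvent:
    """Fields listed for an input event; the types are assumptions."""
    event_id: str
    participants: Tuple[str, ...]
    trigger: str
    elements: Tuple[str, ...]
    time: str          # occurrence time, assumed to be an ISO date string
    place: str
    description: str


@dataclass
class CoreferenceEvent:
    """A coreference event that keeps the IDs of the source events it unifies."""
    coref_id: str
    participants: Tuple[str, ...]
    trigger: str
    elements: Tuple[str, ...]
    time: str
    place: str
    source_event_ids: List[str] = field(default_factory=list)


def coreference_key(ev: InputEvent) -> Tuple:
    # Hypothetical stand-in for the machine learning coreference model:
    # events with the same participants, trigger, time and place count as one event.
    return (ev.participants, ev.trigger, ev.time, ev.place)


def resolve_coreference(events: List[InputEvent]) -> List[CoreferenceEvent]:
    groups: Dict[Tuple, CoreferenceEvent] = {}
    for ev in events:
        key = coreference_key(ev)
        if key not in groups:
            # The first event of a group supplies the fields of the coreference event.
            groups[key] = CoreferenceEvent(
                coref_id=f"C{len(groups) + 1}", participants=ev.participants,
                trigger=ev.trigger, elements=ev.elements, time=ev.time, place=ev.place)
        groups[key].source_event_ids.append(ev.event_id)
    return list(groups.values())  # dicts keep insertion order (Python 3.7+)


events = [
    InputEvent("e1", ("Zhang San", "Li Si"), "kill", (), "2019-09-01", "Beijing", "report A"),
    InputEvent("e2", ("Zhang San", "Li Si"), "kill", (), "2019-09-01", "Beijing", "report B"),
    InputEvent("e3", ("Wang Wu",), "protest", (), "2019-09-02", "Shanghai", "report C"),
]
corefs = resolve_coreference(events)
```

Here the two reports of the same killing merge into one coreference event whose source list is `["e1", "e2"]`, while the protest stays separate.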
Event generalization: the coreference events are abstracted and generalized. Specifically, the event participants, event trigger words, event elements and event occurrence places are abstracted, i.e. the words describing these elements are converted into their hypernyms using WordNet (a lexical database based on cognitive linguistics, in which words are linked by their semantic relations into a graph structure); for example, the event "Zhang San conspires to kill Li Si" can be generalized to the abstract event "someone conspires to kill someone". The purpose of event generalization is to unify words with the same or similar meanings across events, so as to reveal the similarity between different events, address event sparsity in event matching (for example, the events "Zhang San kills Li Si" and "Wang Wu kills Qian Liu" can both be generalized to the abstract event "someone kills someone"), and facilitate the subsequent extraction of causal relationships between events.
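A minimal sketch of the generalization step follows. It assumes a toy, hypothetical `HYPERNYMS` table in place of real WordNet queries, and shows how two events modeled on the examples above collapse into the same abstract event once participants, trigger words, elements and places are replaced by their hypernyms.

```python
from typing import Dict, Tuple

# Hypothetical toy hypernym table standing in for WordNet lookups.
# Person names are not in WordNet; a real system would first map them to a class
# such as "person" (e.g. via named-entity recognition) before generalizing.
HYPERNYMS: Dict[str, str] = {
    "Zhang San": "someone", "Li Si": "someone",
    "Wang Wu": "someone", "Qian Liu": "someone",
    "murder": "kill", "assassinate": "kill",
    "Beijing": "city", "Shanghai": "city",
}


def generalize_word(word: str) -> str:
    """Replace a word by its hypernym if one is known; otherwise keep it."""
    return HYPERNYMS.get(word, word)


def generalize_event(participants: Tuple[str, ...], trigger: str,
                     elements: Tuple[str, ...], place: str) -> Tuple:
    """Abstract the participants, trigger word, elements and place of a coreference event."""
    return (tuple(generalize_word(p) for p in participants),
            generalize_word(trigger),
            tuple(generalize_word(x) for x in elements),
            generalize_word(place))


a = generalize_event(("Zhang San", "Li Si"), "murder", (), "Beijing")
b = generalize_event(("Wang Wu", "Qian Liu"), "assassinate", (), "Shanghai")
```

Both calls yield the same abstract event, which is exactly what lets sparse, differently worded events be matched against each other.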
The seedling head event detection module monitors the input abstract events and judges whether they are seedling head events. It comprises two seedling head event classifiers, a machine-learning-based classifier and a rule-based classifier, whose results are integrated to give the output of the module.
The machine-learning-based seedling head event classifier: its input is an abstract event (the event participants, trigger words, elements, occurrence time and occurrence place serve as event features), and the classifier model judges from these features whether the event is a seedling head event. The classifier is a neural network model trained on the seedling head events in the seedling head event sample library. The sample library can be updated in two ways: manually, or automatically by the seedling head event detection model updating module.
The rule-based seedling head event classifier: its input is an abstract event (with the same event features), and the classifier matches the abstract event against the rules in the seedling head event judgment rule library; when the abstract event satisfies a rule, the classifier judges it to be a seedling head event, otherwise a non-seedling head event. The rule library can likewise be updated in two ways: manually, or automatically by the seedling head event detection model updating module.
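The rule matching can be sketched as follows. The patent does not fix a rule format, so the `JudgmentRule` structure (an abstract cause-event pattern plus the major event it leads to), the fraction-of-fields matching degree and the default threshold of 1.0 are all assumptions. Returning the best matching degree alongside the decision gives a numeric score that the integration step described next can combine with the machine learning output.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class JudgmentRule:
    # Hypothetical rule form: the abstract cause-event pattern of a seedling head
    # causal pair, plus the major event it leads to.
    cause_pattern: Tuple[Tuple[str, str], ...]   # (feature name, required value) pairs
    effect: str


def match_degree(event: Dict[str, str], rule: JudgmentRule) -> float:
    """Fraction of the rule's pattern fields that the abstract event satisfies."""
    hits = sum(1 for k, v in rule.cause_pattern if event.get(k) == v)
    return hits / len(rule.cause_pattern) if rule.cause_pattern else 0.0


def rule_classify(event: Dict[str, str], rules: List[JudgmentRule],
                  threshold: float = 1.0) -> Tuple[bool, float]:
    """Return (is_seedling_head_event, best matching degree).
    threshold=1.0 (assumed default) requires every field of some rule to match."""
    best = max((match_degree(event, r) for r in rules), default=0.0)
    return best >= threshold, best


rules = [JudgmentRule((("trigger", "gather"), ("participant", "crowd"), ("place", "square")),
                      "mass incident")]
event = {"trigger": "gather", "participant": "crowd", "place": "square", "time": "2019-09-01"}
decision, degree = rule_classify(event, rules)
```

Because the classifier only reads the rule list, adding or replacing rules updates it immediately without any training step.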
The final output of the seedling head event detection module integrates the results of the two classifiers according to the following formula:
$$f = \alpha f_1 + (1 - \alpha) f_2$$
where $f$ is the integrated output of the seedling head event detection module, $f_1$ is the output of the machine-learning-based classifier, $f_2$ is the output of the rule-based classifier, and the parameter $\alpha$ is the weight of $f_1$ in the integrated output, 0.75 by default.
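The integration formula is a convex combination of the two classifier outputs. The sketch below assumes both outputs are scores in [0, 1]; the decision threshold of 0.5 in the hypothetical `is_seedling_head` helper is an assumption, since the source specifies only the weighting and the default α = 0.75.

```python
def integrate(f1: float, f2: float, alpha: float = 0.75) -> float:
    """Weighted combination f = alpha * f1 + (1 - alpha) * f2 of the two classifier outputs."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * f1 + (1.0 - alpha) * f2


def is_seedling_head(f1: float, f2: float, alpha: float = 0.75,
                     threshold: float = 0.5) -> bool:
    # The 0.5 decision threshold is an assumption; the source only defines the weighting.
    return integrate(f1, f2, alpha) >= threshold
```

For example, with α = 0.75, a machine learning score of 0.4 and a full rule match (f_2 = 1) give f = 0.75 × 0.4 + 0.25 × 1 = 0.55, so a confident rule match can lift an uncertain machine learning score over the threshold.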
The seedling head event detection model updating module updates the two classification models used by the seedling head event detection module. It performs four steps: extracting causal relationships from historical events, filtering seedling head causal event pairs, updating the seedling head event sample library and the seedling head event judgment rule library, and updating the two classifiers used by the seedling head event detection module with the updated data.
Extracting the causal relationships of historical events: historical events within a period of time are read from a historical event library (in which historical events are represented as abstract events), and the causal event pairs (cause event + effect event) among these events are extracted using an APT logic algorithm.
The APT (annotated probabilistic temporal) logic algorithm addresses reasoning problems of the form "the probability that event G occurs within t time units after event F occurs lies between l% and u%".
Specifically, the causal extraction algorithm of the invention is divided into an event causal pair extraction algorithm and an event causal pair screening algorithm.
The event causal pair extraction algorithm computes, for a known cause event (denoted $c$), the probability that an effect event (denoted $e$) occurs within $\Delta t$ after $c$, and decides whether the causal relationship of the event pair holds by checking whether this probability exceeds a threshold and whether the support (occurrence frequency) of the cause event exceeds a threshold. Formally:
$$p_{c \to e} = \frac{\left|\{\, t \mid c \in E(t) \wedge e \in SumE(t, t+\Delta t) \,\}\right|}{\left|\{\, t \mid c \in E(t) \,\}\right|}$$
$$s_c = \left|\{\, t \mid c \in E(t) \,\}\right|$$
where $c \to e$ indicates that event $e$ occurs within $\Delta t$ after event $c$ occurs, i.e. $(c, e)$ is a causal event pair with cause event $c$ and effect event $e$; $p_{c \to e}$ denotes the probability that events $c$ and $e$ constitute a causal event pair; $E(t)$ denotes the set of events that occur at time $t$, and $c \in E(t)$ denotes that event $c$ occurs at time $t$; $SumE(t, t+\Delta t)$ denotes the set of events that occur in the time period $(t, t+\Delta t]$, and $e \in SumE(t, t+\Delta t)$ indicates that event $e$ occurs in that period. $s_c$ is the support of the cause event $c$ and describes how often the cause event occurs.
A candidate causal pair $c \to e$ is extracted when both $p_{c \to e}$ and $s_c$ exceed their thresholds, i.e.
$$p_{c \to e} > MinProb$$
$$s_c > MinSup$$
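The extraction criterion can be sketched directly from the definitions of E(t), SumE(t, t+Δt), p_{c→e} and s_c. Assumptions: time is discretized into integer steps, the timeline is a dictionary from each step t to the set E(t), Δt is counted in steps, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Timeline = Dict[int, Set[str]]   # discrete time step t -> E(t), the events occurring at t


def sum_e(timeline: Timeline, t: int, dt: int) -> Set[str]:
    """SumE(t, t+dt): union of E(t') for t < t' <= t + dt."""
    out: Set[str] = set()
    for step in range(t + 1, t + dt + 1):
        out |= timeline.get(step, set())
    return out


def support(timeline: Timeline, c: str) -> int:
    """s_c = |{t | c in E(t)}|, the number of time points at which c occurs."""
    return sum(1 for events in timeline.values() if c in events)


def p_cause_effect(timeline: Timeline, c: str, e: str, dt: int) -> float:
    """p_{c->e}: share of the time points with c that are followed by e within dt."""
    times_c = [t for t, events in timeline.items() if c in events]
    if not times_c:
        return 0.0
    hits = sum(1 for t in times_c if e in sum_e(timeline, t, dt))
    return hits / len(times_c)


def extract_candidates(timeline: Timeline, dt: int, min_prob: float,
                       min_sup: int) -> List[Tuple[str, str, float]]:
    """Candidate pairs (c, e, p) with p_{c->e} > MinProb and s_c > MinSup."""
    all_events: Set[str] = set().union(*timeline.values())
    candidates = []
    for c in sorted(all_events):
        if support(timeline, c) <= min_sup:
            continue
        for e in sorted(all_events - {c}):
            p = p_cause_effect(timeline, c, e, dt)
            if p > min_prob:
                candidates.append((c, e, p))
    return candidates


tl = {1: {"rumor"}, 2: {"protest"}, 3: set(), 4: {"rumor"},
      5: {"protest"}, 6: {"rumor"}, 7: set()}
found = extract_candidates(tl, dt=1, min_prob=0.5, min_sup=1)
```

In this small timeline, "rumor" occurs at steps 1, 4 and 6 and is followed by "protest" within one step twice, so p = 2/3 and s = 3; the reverse pair reaches only 0.5 and fails the strict MinProb test.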
The event causal pair screening algorithm compares the candidate causal pairs that share the same effect event and selects the top k candidate pairs with the greatest relevance as the predicted causal pairs of that effect event. Specifically, suppose that event $e$ has two candidate causal pairs $c_1 \to e$ and $c_2 \to e$. First, the probability that $e$ follows when both cause events occur simultaneously is computed:
$$p_{c_1 \wedge c_2 \to e} = \frac{\left|\{\, t \mid c_1 \in E(t) \wedge c_2 \in E(t) \wedge e \in SumE(t, t+\Delta t) \,\}\right|}{\left|\{\, t \mid c_1 \in E(t) \wedge c_2 \in E(t) \,\}\right|}$$
together with the probability that $e$ follows when the latter cause event occurs without the former:
$$p_{\neg c_1 \wedge c_2 \to e} = \frac{\left|\{\, t \mid c_1 \notin E(t) \wedge c_2 \in E(t) \wedge e \in SumE(t, t+\Delta t) \,\}\right|}{\left|\{\, t \mid c_1 \notin E(t) \wedge c_2 \in E(t) \,\}\right|}$$
where $c_1 \wedge c_2 \to e$ denotes that events $c_1$ and $c_2$ occur simultaneously and the effect event $e$ then occurs within $\Delta t$, and $\neg c_1 \wedge c_2 \to e$ denotes that event $c_2$ occurs alone (event $c_1$ does not occur) and the effect event $e$ then occurs within $\Delta t$.
Next, the difference
$$p_{c_1 \wedge c_2 \to e} - p_{\neg c_1 \wedge c_2 \to e}$$
is computed. If it is greater than zero, the probability of $c_1 \wedge c_2 \to e$ is greater than that of $\neg c_1 \wedge c_2 \to e$, from which it can be inferred that, compared with $c_2$ occurring alone, the occurrence of $c_1$ increases the probability of the effect event $e$. The difference can therefore be regarded as an importance indicator of the candidate causal pair $c_1 \to e$. Comparing $c_1 \to e$ in this way with every other candidate causal pair of the effect event $e$ and averaging the resulting differences gives a comprehensive evaluation score for $c_1 \to e$. Thus, for the set $R(e)$ of candidate causal pairs of an effect event $e$, the comprehensive evaluation score of any causal pair $c \to e$ is:
$$score_{c \to e} = \frac{1}{|R(e)| - 1} \sum_{(x \to e) \in R(e),\; x \neq c} \left( p_{c \wedge x \to e} - p_{\neg c \wedge x \to e} \right)$$
The k candidate causal pairs with the largest $score_{c \to e}$ are selected as the predicted causal pairs of the effect event $e$.
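The screening score can be sketched in the same discrete-time setting. This is a minimal sketch, assuming integer time steps, candidate causes given as a list, hypothetical function names, and two conventions the source does not state: a conditional probability with no supporting time points is taken as 0, and a cause with no competing candidates scores 0.

```python
from typing import Dict, List, Set, Tuple

Timeline = Dict[int, Set[str]]   # discrete time step t -> E(t)


def follows(timeline: Timeline, t: int, e: str, dt: int) -> bool:
    """True if e is in SumE(t, t+dt), i.e. e occurs in (t, t+dt]."""
    return any(e in timeline.get(s, set()) for s in range(t + 1, t + dt + 1))


def cond_prob(timeline: Timeline, e: str, dt: int,
              present: List[str], absent: List[str]) -> float:
    """Share of time points where all `present` events occur and no `absent` event
    occurs that are followed by e within dt; 0.0 if there are no such points (assumed)."""
    times = [t for t, events in timeline.items()
             if all(x in events for x in present) and not any(x in events for x in absent)]
    if not times:
        return 0.0
    return sum(follows(timeline, t, e, dt) for t in times) / len(times)


def score(timeline: Timeline, c: str, e: str, causes: List[str], dt: int) -> float:
    """Average of p(c AND x -> e) - p(NOT c AND x -> e) over the other candidate causes x."""
    others = [x for x in causes if x != c]
    if not others:
        return 0.0   # assumed convention for a single candidate
    diffs = [cond_prob(timeline, e, dt, [c, x], []) - cond_prob(timeline, e, dt, [x], [c])
             for x in others]
    return sum(diffs) / len(others)


def top_k(timeline: Timeline, e: str, causes: List[str], dt: int,
          k: int) -> List[Tuple[str, float]]:
    """Rank the candidate causes of e by score and keep the k best."""
    ranked = sorted(((c, score(timeline, c, e, causes, dt)) for c in causes),
                    key=lambda item: item[1], reverse=True)
    return ranked[:k]


# c2 co-occurs with c1 but never produces e on its own.
tl = {1: {"c1", "c2"}, 2: {"e"}, 3: {"c2"}, 4: set(), 5: {"c1"}, 6: {"e"},
      7: {"c2"}, 8: set(), 9: {"c1", "c2"}, 10: {"e"}}
best = top_k(tl, "e", ["c1", "c2"], dt=1, k=1)
```

Here c1 scores 1.0 because e never follows c2 alone, whereas c2 scores 0.0 because e follows c1 equally often with or without c2; the score thus separates a genuine cause from one that merely co-occurs with it.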
Filtering seedling head causal event pairs: all causal event pairs obtained by causal extraction are filtered to select the seedling head causal event pairs, i.e. causal event pairs whose effect event is a major event (an event of interest), in the form (seedling head event + major event). Major events are defined by the specific business application; they are generally events with significant social impact, such as mass incidents and terrorist attacks. Major events are identified by matching against a manually added major event set or major event discrimination rules, which can change as the focus of the system changes.
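Filtering can be sketched as a membership test on the effect event. The string encoding of abstract events and the use of Python predicates for the major event discrimination rules are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Set, Tuple

CausalPair = Tuple[str, str]   # (cause event, effect event), abstract events as strings


def filter_seedling_pairs(pairs: Iterable[CausalPair], major_events: Set[str],
                          major_rules: Iterable[Callable[[str], bool]] = ()) -> List[CausalPair]:
    """Keep the pairs whose effect event is a major event: either listed in the manually
    maintained set or accepted by one of the (hypothetical) discrimination rules."""
    rules = list(major_rules)   # materialize once so a generator is not exhausted
    return [(c, e) for c, e in pairs
            if e in major_events or any(rule(e) for rule in rules)]


pairs = [("someone spreads rumor", "crowd gathers"),
         ("someone posts photo", "someone comments"),
         ("someone buys weapon", "someone attacks city")]
kept = filter_seedling_pairs(pairs, {"crowd gathers"}, [lambda e: "attack" in e])
```

Keeping the major event set and the rules as plain arguments lets the system change its focus without touching the filtering logic.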
Updating the seedling head event sample library and the seedling head event judgment rule library: the cause events in the obtained seedling head causal event pairs are stored as seedling head events in the seedling head event sample library, and the causal relationships of the seedling head causal event pairs are stored as seedling head event judgment rules in the seedling head event judgment rule library.
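The library update can be sketched as follows, assuming (hypothetically) that the sample library is a set of abstract cause events and the rule library maps each cause event to the set of major events it leads to. Using sets means that re-running the update on overlapping historical windows does not duplicate entries.

```python
from typing import Dict, Iterable, Set, Tuple


def update_libraries(seedling_pairs: Iterable[Tuple[str, str]],
                     sample_library: Set[str],
                     rule_library: Dict[str, Set[str]]) -> None:
    """Store each cause event as a seedling head event sample and record the
    causal relationship cause -> major event as a judgment rule (updated in place)."""
    for cause, effect in seedling_pairs:
        sample_library.add(cause)
        rule_library.setdefault(cause, set()).add(effect)


samples: Set[str] = set()
rules: Dict[str, Set[str]] = {}
update_libraries([("someone spreads rumor", "mass incident"),
                  ("someone spreads rumor", "riot"),
                  ("someone buys weapon", "terrorist attack")], samples, rules)
```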
Updating the two classifiers used by the seedling head event detection module: after the seedling head event sample library and the seedling head event judgment rule library are updated, the machine-learning-based classifier is retrained with the data of the seedling head event sample library as the training set, while the rule-based classifier uses the updated rule library directly.
The following is a system embodiment corresponding to the above method embodiment, and it can be implemented in cooperation with the above embodiment. The technical details mentioned in the above embodiment remain valid here and are not repeated, to avoid redundancy. Correspondingly, the technical details mentioned in this embodiment also apply to the above embodiment.
The invention also provides a network early event detection system based on event cause and effect extraction, comprising:
a module 1, for reading historical events within a time period from a historical event library storing abstract events, extracting causal event pairs, each consisting of a cause event and an effect event, from these historical events using an APT (annotated probabilistic temporal) logic algorithm, filtering all the causal event pairs, and screening out the seedling head causal event pairs;
a module 2, for taking the cause events in the seedling head causal event pairs as seedling head events and storing them in a seedling head event sample library, training a first, machine-learning-based seedling head event classifier with the data of the sample library as a training set, taking the causal relationships of the seedling head causal event pairs as seedling head event judgment rules and storing them in a seedling head event judgment rule library, and constructing a second, rule-based seedling head event classifier from the rule library;
a module 3, for extracting events from a specified network platform to obtain a plurality of structured events, unifying the structured events that refer to the same event into a coreference event, generalizing the coreference event to obtain abstract events of the network platform, processing the abstract events with the first and the second seedling head event classifiers respectively, and integrating the results of the two classifiers as the seedling head event detection result for the network platform.
In the network early event detection system based on event causal relationship extraction, the input of the first seedling head event classifier is an abstract event, and the first seedling head event classifier judges whether the event is a seedling head event according to the event features of the abstract event; the input of the second seedling head event classifier is an abstract event, the second seedling head event classifier matches the abstract event against the rules in the seedling head event judgment rule library, and when the abstract event satisfies the rules, the classifier judges that the event is a seedling head event.
In the network early event detection system based on event causal relationship extraction, the process of extracting the causal event pairs in module 1 comprises:
a module 11, for judging whether a cause event and an effect event form a candidate causal pair according to the probability that the effect event occurs within a period of time after the cause event occurs;
a module 12, for comparing all candidate causal pairs of the same effect event and selecting the top k candidate causal pairs with the greatest relevance as the causal event pairs of that effect event.
The network early event detection system based on event cause-and-effect extraction is characterized in that the coreference event comprises event participants, event trigger words, event elements and event occurrence places.
The network early event detection system based on event cause and effect extraction further comprises:
a dynamic updating module, for periodically and repeatedly executing module 1 and module 2 according to a preset period, so as to read historical events within a time period from the historical event library and dynamically update the first and second seedling head event classifiers, thereby ensuring the seedling head event detection performance of module 3.

Claims (10)

1. A network early event detection method based on event causal relationship extraction is characterized by comprising the following steps:
step 1, reading historical events within a time period from a historical event library storing abstract events, extracting causal event pairs, each consisting of a cause event and an effect event, from these historical events using an APT (annotated probabilistic temporal) logic algorithm, filtering all the causal event pairs, and screening out the seedling head causal event pairs;
step 2, taking the cause events in the seedling head causal event pairs as seedling head events and storing them in a seedling head event sample library, training a first, machine-learning-based seedling head event classifier with the data of the sample library as a training set, taking the causal relationships of the seedling head causal event pairs as seedling head event judgment rules and storing them in a seedling head event judgment rule library, and constructing a second, rule-based seedling head event classifier from the rule library;
step 3, extracting events from a specified network platform to obtain a plurality of structured events, unifying the structured events that refer to the same event into a coreference event, generalizing the coreference event to obtain abstract events of the network platform, processing the abstract events with the first and the second seedling head event classifiers respectively, and integrating the results of the two classifiers to obtain the seedling head event detection result for the network platform.
2. The network early event detection method based on event causal relationship extraction as claimed in claim 1, wherein the input of the first seedling head event classifier is an abstract event, and the first seedling head event classifier judges whether the event is a seedling head event according to the event features of the abstract event; the input of the second seedling head event classifier is an abstract event, the second seedling head event classifier matches the abstract event against the rules in the seedling head event judgment rule library, and when the abstract event satisfies the rules, the classifier judges that the event is a seedling head event.
3. The method for detecting the network early event based on event causal relationship extraction as claimed in claim 1, wherein the process of extracting the causal event pair in step 1 comprises:
step 11, judging whether a cause event and an effect event form a candidate causal pair according to the probability that the effect event occurs within a period of time after the cause event occurs;
step 12, comparing all candidate causal pairs of the same effect event, and selecting the top k candidate causal pairs with the greatest relevance as the causal event pairs of that effect event.
4. The method as claimed in claim 1, wherein the co-reference events include event participants, event triggers, event elements and event occurrence locations.
5. The method for detecting network early events based on event causal relationship extraction as claimed in claim 1, further comprising:
a dynamic updating step, in which step 1 and step 2 are executed periodically and repeatedly according to a preset period, so that historical events within a time period are read from the historical event library and the first and second seedling head event classifiers are dynamically updated, thereby ensuring the seedling head event detection performance of step 3.
6. A network early event detection system based on event cause and effect extraction is characterized by comprising:
a module 1, for reading historical events within a time period from a historical event library storing abstract events, extracting causal event pairs, each consisting of a cause event and an effect event, from these historical events using an APT (annotated probabilistic temporal) logic algorithm, filtering all the causal event pairs, and screening out the seedling head causal event pairs;
a module 2, for taking the cause events in the seedling head causal event pairs as seedling head events and storing them in a seedling head event sample library, training a first, machine-learning-based seedling head event classifier with the data of the sample library as a training set, taking the causal relationships of the seedling head causal event pairs as seedling head event judgment rules and storing them in a seedling head event judgment rule library, and constructing a second, rule-based seedling head event classifier from the rule library;
a module 3, for extracting events from a specified network platform to obtain a plurality of structured events, unifying the structured events that refer to the same event into a coreference event, generalizing the coreference event to obtain abstract events of the network platform, processing the abstract events with the first and the second seedling head event classifiers respectively, and integrating the results of the two classifiers to obtain the seedling head event detection result for the network platform.
7. The network early event detection system based on event causal relationship extraction as claimed in claim 6, wherein the input of the first seedling head event classifier is an abstract event, and the first seedling head event classifier judges whether the event is a seedling head event according to the event features of the abstract event; the input of the second seedling head event classifier is an abstract event, the second seedling head event classifier matches the abstract event against the rules in the seedling head event judgment rule library, and when the abstract event satisfies the rules, the classifier judges that the event is a seedling head event.
8. The system for detecting the network early event based on event causal relationship extraction as claimed in claim 6, wherein the process of extracting the causal event pair in the module 1 comprises:
the module 11 judges whether the cause event and the result event are candidate causal pairs according to the probability of the occurrence of the result event within a period of time after the cause event occurs;
the module 12 selects the first k candidate causal pairs of which the correlation is the greatest as the causal event pair of the effect event by comparing all candidate causal pairs of the same effect event.
9. The system as claimed in claim 6, wherein the co-reference events include event participants, event triggers, event elements and event occurrence locations.
10. The network early event detection system based on event causal relationship extraction as claimed in claim 6, further comprising:
a dynamic updating module, for periodically and repeatedly executing module 1 and module 2 according to a preset period, so as to read historical events within a time period from the historical event library and dynamically update the first and second seedling head event classifiers, thereby ensuring the seedling head event detection performance of module 3.
CN201910833900.7A 2019-09-04 2019-09-04 Network early event detection method and system based on event cause and effect extraction Active CN110705597B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910833900.7A CN110705597B (en) 2019-09-04 2019-09-04 Network early event detection method and system based on event cause and effect extraction

Publications (2)

Publication Number Publication Date
CN110705597A CN110705597A (en) 2020-01-17
CN110705597B CN110705597B (en) 2022-11-11

Family

ID=69193675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910833900.7A Active CN110705597B (en) 2019-09-04 2019-09-04 Network early event detection method and system based on event cause and effect extraction

Country Status (1)

Country Link
CN (1) CN110705597B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967601B (en) * 2020-06-30 2024-02-20 Beijing Baidu Netcom Science and Technology Co., Ltd. Event relation generation method, event relation rule generation method and device

Citations (2)

Publication number Priority date Publication date Assignee Title
CN106507315A (en) * 2016-11-24 2017-03-15 西安交通大学 (Xi'an Jiaotong University) Urban traffic accident prediction method and system based on online social media data
CN108052576A (en) * 2017-12-08 2018-05-18 国家计算机网络与信息安全管理中心 (National Computer Network and Information Security Management Center) Reason knowledge graph construction method and system

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN103793589B (en) * 2012-10-31 2017-01-18 中国科学院软件研究所 High-speed train fault handling method
CN103279479A (en) * 2013-04-19 2013-09-04 中国科学院计算技术研究所 (Institute of Computing Technology, CAS) Emerging topic detection method and system for microblog platform text streams
CN104765733B (en) * 2014-01-02 2018-06-15 华为技术有限公司 (Huawei Technologies Co., Ltd.) Method and apparatus for social network event analysis
JP6515937B2 (en) * 2017-02-08 2019-05-22 Yokogawa Electric Corporation Event analysis apparatus, event analysis system, event analysis method, event analysis program, and recording medium
CN109471932A (en) * 2018-11-26 2019-03-15 国家计算机网络与信息安全管理中心 Rumour detection method, system and storage medium based on learning model

Non-Patent Citations (2)

Title
Mining for Causal Relationships: A Data-Driven Study of the Islamic State; Andrew Stanton, et al.; Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2015-08-10; pp. 1-10 *
Research on Military Event Extraction Based on Deep Learning; You Fei; China Master's Theses Full-text Database (Social Sciences I); 2019-05-15; G112-30 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant