CN109101530A - Effective sequence of events pattern mining algorithm - Google Patents

Info

Publication number
CN109101530A
Authority
CN
China
Prior art keywords
mode
effective
value
sequence
utility
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810650504.6A
Other languages
Chinese (zh)
Other versions
CN109101530B (en)
Inventor
张春慨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201810650504.6A
Publication of CN109101530A
Application granted
Publication of CN109101530B
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a high-utility event sequence pattern mining algorithm comprising the following steps: S1, defining security events; S2, dividing the events into a transaction database; S3, incrementally mining high-utility security event sequences; S4, parallelizing the incremental mining of high-utility security event sequences. The beneficial effects of the invention are that parallelization shortens the mining time, makes better use of hardware resources, realizes the mining of high-utility event sequence patterns, and speeds up data mining.

Description

High-utility event sequence pattern mining algorithm
Technical field
The present invention relates to data mining, and in particular to a high-utility event sequence pattern mining algorithm.
Background art
Current correlation analysis techniques for network security events mainly include analysis methods based on the probabilistic similarity between security events, correlation analysis methods based on the causal relationship between the consequences of a behaviour and its prerequisites, methods based on attack graphs, and methods based on data mining and machine learning. Of these, the methods based on data mining and machine learning are the most fundamental and also the most effective. Association rule mining, a typical data mining method, has been widely used in correlation analysis models for network security events. With the arrival of the big data era, however, the range of problems that traditional association rule mining handles well has become narrower and narrower, and many researchers have therefore proposed improved association rule mining algorithms.
Most of these improvements target the uses to which traditional association rule mining is put: they go beyond mining rules over sets of goods in transaction data, so that the technique can be applied under more complex conditions.
In research on association rule algorithms for time series, R. J. Povinelli et al. proposed a time-series-based data mining framework (TSDM, Time Series Data Mining framework), which they call time series data mining. Zeng Haiquan, building on the inter-associative successor tree model, proposed techniques for time series mining and similarity search. Lu Shan proposed phase-space reconstruction of nonlinear time series and a financial time series forecasting technique based on nonlinear dynamics. Surveying the state of research, Hou Pengman holds that time series data mining can be defined more generally as follows: data mining based on one or more time series is called time series data mining (TSDM); it extracts rules from within the series that can be used to analyse and predict the values, periods and trends of the series.
D. Gasp et al. proposed a method for discovering rules from time series. First, the sliding window method (moving windows method) proposed by Baltzersen is used to standardize and preprocess the time series data, converting the time series into a sequence of time samples and completing the discretization and symbolization of the data. Second, the standardized set of time series samples is clustered. Third, the original time series data are reconstructed from the resulting clusters. Finally, rules are mined from the reconstructed time series data set. However, this method merely transplants a data mining procedure mechanically into time series analysis: it takes into account neither the temporal characteristics of time series nor the relevant background knowledge, and it offers no sound theoretical justification. J. Han et al. used data mining techniques to study periodic segments and partially periodic segments of the series in time series databases, with the aim of discovering periodic patterns (patterns that recur regularly at fixed intervals).
At present, association rule mining is carried out on an existing data set, that is, on a given set of transactions. In mining based on event sequences, the event sequence must first be converted into a set of transactions containing events. Most such conversions currently rely on a sliding window, but existing methods use the number of events contained as a fixed window size. This is clearly unreasonable for event sequences: even if two events are far apart in time, they can still be placed in the same transaction, although two events separated by a large time interval are only weakly correlated or not correlated at all. Such a division ignores this fact and forcibly introduces a transaction containing both events. The division method therefore needs to be improved.
In addition, because events are generated continuously, the transaction set produced by division also changes dynamically. When a new transaction set is added to the original one, the traditional approach merges the two into one large transaction set and mines it again with the previous method. This has a drawback: as the transaction set keeps growing, the mining time keeps growing as well, until mining becomes prohibitively expensive or cannot be completed at all. The approach does not reuse the patterns that have already been mined but mines from scratch every time, which is clearly unreasonable in practice.
Most sequential pattern mining algorithms are serial, yet in most of them the patterns mined earlier and later have no essential dependence on each other; that is, the mining of some sequential patterns does not depend on other patterns. How to use parallelization to shorten the mining time and make better use of hardware resources is therefore a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
In order to solve the problems in the prior art, the present invention provides a high-utility event sequence pattern mining algorithm.
The high-utility event sequence pattern mining algorithm provided by the present invention comprises the following steps:
S1, defining security events;
S2, dividing the events into a transaction database;
S3, incrementally mining high-utility security event sequences;
S4, parallelizing the incremental mining of high-utility security event sequences.
As a further improvement of the present invention, in step S1, different events are identified by their attack type label. To account for the influence of the remaining attributes on an event, a utility value is computed for each attribute value, and the utility values of the attributes are summed to give the final utility value of the event. The utility value corresponding to each attribute value is specified manually, so events at different locations or with different IPs can be given different degrees of importance by changing the utility values.
As a further improvement of the present invention, in step S2, the event set is divided into a transaction set using a sliding window. A window represents the period from t_s to t_e, and every window has the same time span, i.e. (t_e - t_s) is the same for all windows.
As a further improvement of the present invention, in step S2, the events are sorted by time and the original events are divided using sliding windows of equal time interval. The events within one window form one transaction sequence, and each time the window slides to the next time point. If several events are very close together in time, they are treated as concurrent and their order is not considered. When merged events form the first item of the current window, the original utility value multiplied by the number of merged events is used as the new utility value, to compensate for the effect of merging.
As a further improvement of the present invention, in step S3, let the original data set be D1 and the newly added data set be D2; let the set of high-utility security event sequence patterns in D1 be HUSEP1 and that in D2 be HUSEP2. By definition, the minimum utility value in HUSEP1 is greater than or equal to δ × u(D1), and the minimum utility value in HUSEP2 is greater than or equal to δ × u(D2). The database formed by merging D1 and D2 is denoted D3, and its set of high-utility security event sequence patterns is HUSEP3. Clearly, the minimum utility value in HUSEP3 is greater than or equal to δ × u(D3) = δ × u(D1) + δ × u(D2).
As a further improvement of the present invention, in step S3, HUSEP3 is a subset of HUSEP1 ∪ HUSEP2, i.e. every pattern in HUSEP3 appears in at least one of HUSEP1 and HUSEP2. Suppose a pattern in HUSEP3 appears in neither HUSEP1 nor HUSEP2, and let its utility values in D1, D2 and D3 be u1, u2 and u3 respectively. By definition, u1 < δ × u(D1) and u2 < δ × u(D2), which gives u3 = u1 + u2 < δ × u(D1) + δ × u(D2) = δ × u(D3). The pattern therefore should not be in HUSEP3, which is a contradiction. Hence HUSEP3 is a subset of HUSEP1 ∪ HUSEP2.
As a further improvement of the present invention, in step S3, for the patterns in HUSEP1 and the patterns in HUSEP2 there are four cases:
1) the pattern is a high-utility pattern in neither D1 nor D2;
2) the pattern is a high-utility pattern in both D1 and D2;
3) the pattern is a high-utility pattern in D1 but not in D2;
4) the pattern is not a high-utility pattern in D1 but is one in D2.
In case 1), the pattern is not a high-utility pattern in D3.
In case 2), the pattern is a high-utility pattern in D3. By analogy with case 1), u1 ≥ δ × u(D1) and u2 ≥ δ × u(D2), therefore
u3 = u1 + u2 ≥ δ × u(D1) + δ × u(D2) = δ × u(D3).
In cases 3) and 4), it cannot be inferred directly whether the pattern is a high-utility pattern in D3; the utility value of the pattern in D3 must be computed before a decision can be made.
In case 3), the pattern is already a high-utility pattern in D1, so only its utility value in D2 needs to be computed before the decision is made.
In case 4), the pattern is already a high-utility pattern in D2, so only its utility value in D1 needs to be computed before the decision is made.
As a further improvement of the present invention, in step S4, when mining with the HUSP-Miner algorithm, the candidate 1-itemsets whose utility upper bound is greater than the threshold are found first; on that basis, (k+1)-itemsets are generated from k-itemsets by sequence growth, and a pruning strategy is used to reduce the search space.
As a further improvement of the present invention, the pruning strategy reduces the search space by continually shrinking the database. The database is first read into memory; as a pattern grows, the set of transactions containing that pattern keeps shrinking, i.e. the corresponding projected database keeps getting smaller. Because the database itself is not modified during mining, once the database is given the mining of each pattern can be regarded as proceeding independently.
As a further improvement of the present invention, in step S4, parallel mining is carried out with multiple threads:
1) when thread I finishes its mining task, it enters a waiting state;
2) if thread J has not yet finished mining at that point, it hands the pattern it currently needs to process to thread I, and thread J goes on to its next pending pattern.
The beneficial effects of the present invention are that, through the above scheme, parallelization shortens the mining time and makes better use of hardware resources, the mining of high-utility event sequence patterns is realized, and data mining is accelerated.
Brief description of the drawings
Fig. 1 is a schematic diagram of the time-based division of security events in the high-utility event sequence pattern mining algorithm of the present invention.
Fig. 2 shows the results of Experiment 1.
Fig. 3 shows the results of Experiment 2.
Fig. 4 shows the results of Experiment 3.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific embodiments.
A high-utility event sequence pattern mining algorithm comprises the following steps:
I. Definition of security events
Before pattern mining for network security events can be studied, a definition of a network security event, i.e. its attributes, must first be given. Based on past experience, the present invention extracts the following typical attributes to define a network security event. Table 1 gives the definition of a network security event, and Table 2 lists the common types of network attack.
Table 1: Security event attributes
Table 2: Common attack types
Different events are identified by their attack type label. To account for the influence of the remaining attributes on an event, a utility value is computed for each attribute value, and the utility values of the attributes are summed to give the final utility value of the event. The utility value corresponding to each attribute value is specified manually, so events at different locations or with different IPs can be given different degrees of importance by changing the utility values.
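The attribute-sum rule can be illustrated with a minimal Python sketch. The utility tables, the fallback value `DEFAULT_UTILITY` and the `SecurityEvent` field names are hypothetical; the text only states that each attribute value has a manually assigned utility and that the event utility is their sum.

```python
from dataclasses import dataclass

# Hypothetical, manually assigned utility tables (one per attribute).
LOCATION_UTILITY = {"l1": 3.0, "l2": 1.0}
SOURCE_IP_UTILITY = {"s1": 2.0, "s2": 0.5}
DEST_IP_UTILITY = {"d1": 4.0, "d2": 1.5}
DEFAULT_UTILITY = 1.0  # assumed fallback for attribute values without an entry


@dataclass(frozen=True)
class SecurityEvent:
    attack_type: str   # the label that identifies the event
    time: float        # timestamp in seconds
    location: str
    source_ip: str
    dest_ip: str


def event_utility(event: SecurityEvent) -> float:
    """Sum the utilities of the non-identifying attribute values."""
    return (
        LOCATION_UTILITY.get(event.location, DEFAULT_UTILITY)
        + SOURCE_IP_UTILITY.get(event.source_ip, DEFAULT_UTILITY)
        + DEST_IP_UTILITY.get(event.dest_ip, DEFAULT_UTILITY)
    )
```

Raising the entry for one location or IP in these tables raises the utility of every event that carries it, which is how different degrees of importance would be assigned in this sketch.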
II. Division into a transaction database
What has been obtained so far is a set of events, on which traditional pattern mining algorithms cannot be run directly; it must first be converted into a transaction set suitable for such algorithms. Note that the form of a transaction in traditional high-utility sequential pattern mining differs slightly from that in high-utility sequential pattern mining based on security events. In high-utility pattern mining, the utility of an item is determined by its external utility and its internal utility. Internal utility generally refers to quantity, and external utility generally refers to the profit of the item; for a given item the external utility is always the same. A security event, by contrast, is influenced by its other attributes, so different occurrences of the same event may have different utility values. This is why the transactions differ in form. For example, consider the transaction <[(e1:ue1)],[(e2:ue2)]>. In security event mining, ue1 and ue2 are the utility values of the corresponding events in the transaction, whereas in traditional high-utility sequential pattern mining ue1 and ue2 would be the numbers of times e1 and e2 occur in their respective itemsets. Despite this difference in form, traditional high-utility sequential pattern mining algorithms can still be applied to a transaction database generated from security events: internal and external utility serve only to compute the utility value of an item, and beyond that they have no substantive effect on the mining process.
Here the event set is divided into a transaction set using a sliding window. A window represents the period from t_s to t_e, and every window has the same time span, i.e. (t_e - t_s) is the same for all windows. The detailed steps of event division are illustrated by the following example. Consider the security event set D1 shown in Table 3.
Table 3: Security event set D1
Attack ID | Time (s) | Location | Source IP | Destination IP
e1 | t1 | l1 | s1 | d1
e2 | t2 | l2 | s2 | d2
e3 | t3 | l1 | s7 | d2
e4 | t4 | l1 | s2 | d1
e5 | t5 | l2 | s5 | d1
e6 | t6 | l2 | s5 | d1
The events sorted by time are shown in Fig. 1. The original events are divided using sliding windows of equal time interval. The events within one window form one transaction sequence, and each time the window slides to the next time point. If several events are very close together in time, they are treated as concurrent, i.e. their order is not considered. When merged events form the first item of the current window, the original utility value multiplied by the number of merged events is used as the new utility value, to compensate for the effect of merging. Take e4 and e5 in Fig. 1 as an example: because the two are close together, they can be regarded as one event. The transaction marked off by the sliding window is <[(e4:ue4)(e5:ue5)],[(e6:ue6)]>. To account for the effect of merging, the original utility values are multiplied by the number of merged events, so the processed transaction is <[(e4:2*ue4)(e5:2*ue5)],[(e6:2*ue6)]>.
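The division step can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not in the text: windows do not overlap and each one starts at the first event not yet assigned, `window_span` and `merge_gap` are hypothetical parameters, and the scaling rule follows the worked example above, where every utility in the transaction is multiplied by the number of events merged into its first item.

```python
from typing import List, Tuple

Event = Tuple[str, float, float]    # (event id, timestamp, utility)
Itemset = List[Tuple[str, float]]   # concurrent events as (event id, utility)


def divide_events(events: List[Event], window_span: float,
                  merge_gap: float) -> List[List[Itemset]]:
    """Split events into transactions that all cover the same time span."""
    if window_span <= 0:
        raise ValueError("window_span must be positive")
    ordered = sorted(events, key=lambda e: e[1])
    transactions: List[List[Itemset]] = []
    i = 0
    while i < len(ordered):
        # Collect every event that falls inside the current window.
        window_start = ordered[i][1]
        window = []
        while i < len(ordered) and ordered[i][1] - window_start < window_span:
            window.append(ordered[i])
            i += 1
        # Events closer together than merge_gap are treated as concurrent.
        itemsets: List[Itemset] = []
        last_time = None
        for event_id, timestamp, utility in window:
            if last_time is not None and timestamp - last_time <= merge_gap:
                itemsets[-1].append((event_id, utility))
            else:
                itemsets.append([(event_id, utility)])
            last_time = timestamp
        # Compensate for merging when the first item holds merged events.
        factor = len(itemsets[0])
        if factor > 1:
            itemsets = [[(eid, u * factor) for eid, u in itemset]
                        for itemset in itemsets]
        transactions.append(itemsets)
    return transactions
```

With e4 and e5 half a second apart and e6 a few seconds later, the sketch yields one transaction whose first item holds e4 and e5 and whose utilities are all doubled, matching the example in the text.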
Table 4: Transaction set after division
After the security events have been divided into a transaction database, an existing high-utility sequential pattern mining algorithm can be used to mine it. The HUSP-Miner algorithm is used here and is not described in detail.
III. Incremental mining of high-utility security event sequences
In practical applications, security events are generated in real time, so the database formed by division also grows dynamically. If such a database were mined from scratch every time its content is updated, a great deal of resources would be consumed. Moreover, as the database keeps growing, the original mining algorithm may even fail to produce a result because the scale is too large. The relationship between the original data set and the newly added data set therefore needs to be found in order to simplify the mining process.
Let the original data set be D1 and the newly added data set be D2; let the set of high-utility security event sequence patterns in D1 be HUSEP1 and that in D2 be HUSEP2. By definition, the minimum utility value in HUSEP1 is greater than or equal to δ × u(D1), and the minimum utility value in HUSEP2 is greater than or equal to δ × u(D2). The database formed by merging D1 and D2 is denoted D3, and its set of high-utility patterns is HUSEP3. Since u(D3) = u(D1) + u(D2), the minimum utility value in HUSEP3 is greater than or equal to δ × u(D3) = δ × u(D1) + δ × u(D2).
It follows that HUSEP3 is a subset of HUSEP1 ∪ HUSEP2, i.e. every pattern in HUSEP3 appears in at least one of HUSEP1 and HUSEP2. Suppose a pattern in HUSEP3 appears in neither HUSEP1 nor HUSEP2, and let its utility values in D1, D2 and D3 be u1, u2 and u3 respectively. By definition, u1 < δ × u(D1) and u2 < δ × u(D2), which gives
u3 = u1 + u2 < δ × u(D1) + δ × u(D2) = δ × u(D3).
The pattern therefore should not be in HUSEP3, which is a contradiction. Hence HUSEP3 is a subset of HUSEP1 ∪ HUSEP2.
For the patterns in HUSEP1 and the patterns in HUSEP2 there are four cases:
1) the pattern is a high-utility pattern in neither D1 nor D2;
2) the pattern is a high-utility pattern in both D1 and D2;
3) the pattern is a high-utility pattern in D1 but not in D2;
4) the pattern is not a high-utility pattern in D1 but is one in D2.
In case 1), the pattern is certainly not a high-utility pattern in D3. The proof is similar to the proof that HUSEP3 is a subset of HUSEP1 ∪ HUSEP2; since that proof has already been given above, it is not repeated here.
In case 2), the pattern is certainly a high-utility pattern in D3. By analogy with case 1), u1 ≥ δ × u(D1) and u2 ≥ δ × u(D2), therefore
u3 = u1 + u2 ≥ δ × u(D1) + δ × u(D2) = δ × u(D3).
In cases 3) and 4), it cannot be inferred directly whether the pattern is a high-utility pattern in D3; the utility value of the pattern in D3 must be computed before a decision can be made.
In case 3), the pattern is already a high-utility pattern in D1, so its utility value in D1 is known and only its utility value in D2 needs to be computed before the decision is made.
Similarly, in case 4), the pattern is already a high-utility pattern in D2, so only its utility value in D1 needs to be computed before the decision is made.
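The four cases suggest a merge step along the following lines. This is a sketch under stated assumptions rather than the patent's own procedure: patterns are represented as hashable keys mapped to their known utilities, and `utility_in_d1` and `utility_in_d2` are hypothetical callbacks that compute the utility of a pattern in the data set where it was not high-utility.

```python
from typing import Callable, Dict, Hashable

Pattern = Hashable


def merge_high_utility(
    husep1: Dict[Pattern, float],                 # pattern -> utility in D1
    husep2: Dict[Pattern, float],                 # pattern -> utility in D2
    utility_in_d1: Callable[[Pattern], float],    # hypothetical lookup in D1
    utility_in_d2: Callable[[Pattern], float],    # hypothetical lookup in D2
    u_d1: float,                                  # u(D1)
    u_d2: float,                                  # u(D2)
    delta: float,
) -> Dict[Pattern, float]:
    """Build HUSEP3 for D3 = D1 + D2 from the two existing results."""
    threshold = delta * (u_d1 + u_d2)             # delta * u(D3)
    husep3: Dict[Pattern, float] = {}
    # Case 1 never appears here: only HUSEP1 and HUSEP2 are candidates.
    for pattern in husep1.keys() | husep2.keys():
        if pattern in husep1 and pattern in husep2:
            # Case 2: high-utility in both, so high-utility in D3.
            husep3[pattern] = husep1[pattern] + husep2[pattern]
        elif pattern in husep1:
            # Case 3: only the utility in D2 has to be computed.
            u3 = husep1[pattern] + utility_in_d2(pattern)
            if u3 >= threshold:
                husep3[pattern] = u3
        else:
            # Case 4: only the utility in D1 has to be computed.
            u3 = utility_in_d1(pattern) + husep2[pattern]
            if u3 >= threshold:
                husep3[pattern] = u3
    return husep3
```

Only patterns in cases 3) and 4) trigger a database scan, which is where the saving over re-mining D3 from scratch would come from.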
Definition 1: The utility of item $i_j$ in q-itemset $v$ is defined as
$$u(i_j, v) = q(i_j, v) \times pr(i_j)$$
where $q(i_j, v)$ is the quantity of $i_j$ in $v$ and $pr(i_j)$ is the profit of $i_j$.
Definition 2: The utility of a q-itemset $v$ is defined as
$$u(v) = \sum_{i_j \in v} u(i_j, v)$$
Definition 3: The utility of a q-sequence $s = \langle v_1, v_2, \ldots, v_d \rangle$ is defined as
$$u(s) = \sum_{k=1}^{d} u(v_k)$$
Definition 4: Given a q-sequence $s = \langle v_1, v_2, \ldots, v_d \rangle$ and a sequence $t = \langle w_1, w_2, \ldots, w_r \rangle$, if $d = r$ and the items of $v_k$ are the same as those of $w_k$ for every $1 \le k \le d$, then $s$ is a match of $t$, written $s \sim t$.
Definition 5: The utility of a sequence $t$ in a q-sequence $s$ is defined as
$$u(t, s) = \max \{\, u(s_k) \mid t \sim s_k \wedge s_k \subseteq s \,\}$$
where $t \sim s_k$ indicates that $s_k$ is a match of $t$, and $s_k$ ranges over the q-subsequences of $s$.
Definition 6: The utility of a sequence $t$ in a quantitative database $D$ is
$$u(t) = \sum_{s \in D} u(t, s)$$
Definition 7: The utility of a quantitative database $D$ is defined as
$$u(D) = \sum_{s \in D} u(s)$$
Definition 8: If the utility of a sequence $t$ in the quantitative database $D$ is not less than the user-defined minimum threshold $\delta \times u(D)$, then $t$ is a high-utility sequential pattern (HUSP), written
$$\mathrm{HUSP} \leftarrow \{\, t \mid u(t) \ge \delta \times u(D) \,\}$$
Based on the above definitions, high-utility sequential pattern mining can be defined as follows: given a quantitative sequence database D and a minimum utility threshold δ (a decimal between 0 and 1), find all sequential patterns whose utility value is not less than δ × u(D).
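To make the definitions concrete, the following Python sketch evaluates the utility of a sequence by direct computation. It assumes that the utility of a sequence in a q-sequence is the maximum over all of its matches, as in the usual high-utility sequential pattern formulation, and it uses a hypothetical representation: a q-sequence is a list of dicts mapping each item to its utility in that itemset, and a sequence is a list of item sets. It is not the HUSP-Miner algorithm, only a brute-force check of the threshold condition.

```python
from typing import Dict, List, Set

QItemset = Dict[str, float]     # item -> utility of the item in this itemset
QSequence = List[QItemset]
Sequence = List[Set[str]]


def utility_in_qsequence(t: Sequence, s: QSequence) -> float:
    """Largest utility of any match of t inside s, or 0.0 if t does not occur."""
    neg_inf = float("-inf")
    # best[k]: best utility of matching the first k itemsets of t so far.
    best = [0.0] + [neg_inf] * len(t)
    for v in s:
        # Go backwards so each itemset of s is used at most once per match.
        for k in range(len(t), 0, -1):
            w = t[k - 1]
            if best[k - 1] != neg_inf and all(item in v for item in w):
                candidate = best[k - 1] + sum(v[item] for item in w)
                best[k] = max(best[k], candidate)
    return best[len(t)] if best[len(t)] != neg_inf else 0.0


def sequence_utility(t: Sequence, database: List[QSequence]) -> float:
    """Utility of t in the database: sum over all q-sequences."""
    return sum(utility_in_qsequence(t, s) for s in database)


def database_utility(database: List[QSequence]) -> float:
    """Utility of the database: sum of all item utilities."""
    return sum(sum(v.values()) for s in database for v in s)


def is_high_utility(t: Sequence, database: List[QSequence], delta: float) -> bool:
    """Threshold test: u(t) >= delta * u(D)."""
    return sequence_utility(t, database) >= delta * database_utility(database)
```

For a database of two q-sequences, `[{"a": 2, "b": 3}, {"a": 5}, {"c": 1}]` and `[{"a": 1}, {"c": 4}]`, the sequence `[{"a"}, {"c"}]` takes the larger of its two matches in the first q-sequence, so the choice of the maximum matters as soon as an item repeats.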
IV. Parallel incremental mining of high-utility security event sequences
When mining with the HUSP-Miner algorithm, all candidate 1-itemsets whose utility upper bound is greater than the threshold are found first. On that basis, (k+1)-itemsets are generated from k-itemsets by sequence growth (of which there are two kinds), and suitable pruning strategies are used to reduce the search space. One of the pruning strategies reduces the search space by continually shrinking the database: the algorithm first reads the database into memory; as a pattern grows, the set of transactions containing that pattern keeps shrinking, i.e. the corresponding projected database keeps getting smaller. Because the database itself is not modified during mining, once the database is given the mining of each pattern can be regarded as proceeding independently.
Therefore, after all candidate 1-itemsets have been found, they can be divided among several threads and mined in parallel. Note that the candidate 1-itemsets do not all generate the same number of high-utility patterns, so some threads may finish early while others run much longer. Because of these differences in execution time between threads, the total running time may end up far from what was expected. To address this, the following improvement is made:
1) when thread I finishes its mining task, it enters a waiting state;
2) if thread J has not yet finished mining at that point, it hands the pattern it currently needs to process to thread I and goes on to its own next pending pattern.
With this strategy, the load is kept relatively balanced across threads, which effectively reduces the mining time.
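The load-balancing idea can be approximated in Python with a shared work queue. Note that this swaps in a different mechanism from the one described: instead of a busy thread handing a pattern directly to a waiting thread, every thread pushes newly grown patterns onto one queue and idle threads pull from it. The function name `mine_parallel` and the `expand` callback (which returns whether a pattern is high-utility, together with its grown child patterns) are hypothetical.

```python
import queue
import threading
from typing import Callable, Hashable, Iterable, List, Tuple

Pattern = Hashable


def mine_parallel(
    seeds: Iterable[Pattern],
    expand: Callable[[Pattern], Tuple[bool, List[Pattern]]],
    n_threads: int = 4,
) -> List[Pattern]:
    """Grow patterns from the seeds with n_threads workers sharing one queue."""
    tasks: "queue.Queue" = queue.Queue()
    results: List[Pattern] = []
    results_lock = threading.Lock()
    for seed in seeds:
        tasks.put(seed)

    def worker() -> None:
        while True:
            pattern = tasks.get()
            if pattern is None:          # sentinel: no more work
                tasks.task_done()
                return
            try:
                keep, children = expand(pattern)
                if keep:
                    with results_lock:
                        results.append(pattern)
                # Children go to the shared queue, so any idle thread can
                # pick them up instead of waiting for this thread.
                for child in children:
                    tasks.put(child)
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    tasks.join()                         # every queued pattern has been processed
    for _ in threads:
        tasks.put(None)                  # tell each worker to stop
    for thread in threads:
        thread.join()
    return results
```

In CPython, threads do not run CPU-bound Python code in parallel, so a real speed-up for this kind of workload would likely need processes or a runtime with true parallel threads; the sketch only shows how the work can be shared so that no thread sits idle while patterns remain.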
Experimental results and analysis
The experimental data set is obtained by applying the division method to a randomly generated event set. The divided sequence set contains 9752 transactions in total, with 1000 distinct event types. The experiments fall into three parts. As shown in Fig. 2, Experiment 1 varies the size of the data set while δ is held fixed; as shown in Fig. 3, Experiment 2 varies δ while the data set is held fixed; as shown in Fig. 4, Experiment 3 compares the multithreaded mining algorithm with the single-threaded one.
Experiments 1 and 2 test the incremental database. Increment 1 mines the original data set and the newly added data set separately and then merges the two mining results using the method described above. Increment 2 merges the result already mined from the original database with the result of mining the newly added data set. In Experiment 1 the newly added data set is fixed and is the previously generated test data set; the original data set is built by splicing copies of the generated data (i.e. the original transaction set is copied several times and concatenated, without merging duplicate transactions), at 1, 2, 3 and 4 times the size of the generated data set.
In Experiment 2, when δ is 0.0005 the Increment 1 method is slower than the original method. This is because when δ is small, more high-utility patterns are mined, and since a fairly ordinary merging method is used here, merging may take some time. The results of Experiments 1 and 2 show that, when the original mining result is known, applying the incremental mining algorithm to the newly added data set is much faster than merging the old and new data sets and mining again.
Experiment 3 compares the multithreaded mining algorithm with the single-threaded one; four threads are used. The data set is held fixed, and the two methods are compared by varying δ. As the figure shows, the smaller δ is, the more pronounced the difference between the two. This is because a smaller δ means more patterns to consider and more computation, so the advantage of multithreading shows more clearly. When the amount of data is large, multithreading can therefore be considered as a way to speed up mining.
The following table shows part of the mining results. The data set was formed by splicing four copies of the original event set; δ is 0.0008 and the corresponding threshold is 7273. The results are the high-utility sequential patterns whose utility value is greater than the specified threshold, where each number corresponds to an event ID. Taking the third row of the table as an example, the mined pattern is [(132)], [(577)], [(936)], [(825)], [(531)], [(646)], [(24)], [(505) (644)], [(710)], which indicates that there may be some relationship among these events. Note that events 505 and 644 are concurrent: there may be no connection between the two of them, but each may be connected with the preceding and following events. The high-utility event sequence patterns obtained can be studied further to uncover potential associations between these events.
Table 5: Partial mining results
The above is a further detailed description of the present invention in conjunction with specific preferred embodiments, and the specific implementation of the invention shall not be regarded as limited to these descriptions. For those of ordinary skill in the art to which the invention belongs, a number of simple deductions or substitutions may be made without departing from the concept of the invention, and all of them shall be regarded as falling within the protection scope of the invention.

Claims (10)

1. a kind of effective sequence of events pattern mining algorithm, which comprises the following steps:
S1, security incident definition;
The division of S2, transaction database;
S3, increment type effective security incident sequential mining;
S4, parallelization increment type effective security incident sequential mining.
2. effective sequence of events pattern mining algorithm according to claim 1, it is characterised in that: in step sl, no Same event is using attack type label as mark, in order to consider influence of the remaining attribute to event, by for attribute value meter Its value of utility is calculated, then by the cumulative value of utility as final event of the value of utility of each attribute;Corresponding to attribute value Value of utility, thus can be different to assign to the event of different location, different IP by changing value of utility by manually providing Significance level.
3. effective sequence of events pattern mining algorithm according to claim 1, it is characterised in that: in step s 2, adopt Event set is divided into affairs set with the mode of sliding window, window is indicated from tsTo teA period of time, each window when Between span it is identical, i.e. (te-ts) identical.
4. effective sequence of events pattern mining algorithm according to claim 1, it is characterised in that: in step s 2, press According to the event after time-sequencing, original event is divided using the identical sliding window of time interval;Same Event in window will form a transaction sequence, and window will slide into the time point of next time every time, if event several times It is separated by relatively closely, is construed as concurrent, i.e., precedence is not considered to it;Work as front window when combined event is located at Mouthful first item when, in order to make up event merge generate influence, using original value of utility multiplied by combined number as newly Value of utility.
5. The effective sequence of events pattern mining algorithm according to claim 1, characterized in that: in step S3, let the original data set be D1 and the newly added data set be D2; the set of high-utility (effective) security event sequence patterns in the original data set D1 is HUSEP1, and the set of high-utility security event sequence patterns in the newly added data set D2 is HUSEP2; by definition, the minimum utility value in HUSEP1 is greater than or equal to the user-defined minimum threshold δ×u(D1), and the minimum utility value in HUSEP2 is greater than or equal to the user-defined minimum threshold δ×u(D2); the database formed by merging D1 with D2 is denoted D3, and its set of high-utility security event sequence patterns is HUSEP3; clearly, the minimum utility value in HUSEP3 is greater than or equal to the user-defined minimum threshold δ×u(D3) = δ×u(D1) + δ×u(D2).
6. The effective sequence of events pattern mining algorithm according to claim 5, characterized in that: in step S3, HUSEP3 is a subset of HUSEP1 ∪ HUSEP2, i.e. every pattern in HUSEP3 must appear in at least one of HUSEP1 and HUSEP2; suppose instead that a pattern in HUSEP3 appears in neither HUSEP1 nor HUSEP2, and let its utility values in D1, D2 and D3 be u1, u2 and u3 respectively; by definition, u1 < δ×u(D1) and u2 < δ×u(D2), from which it follows that u3 = u1 + u2 < δ×u(D1) + δ×u(D2) = δ×u(D3); the pattern therefore should not be in HUSEP3, which is a contradiction; hence HUSEP3 is a subset of HUSEP1 ∪ HUSEP2.
7. The effective sequence of events pattern mining algorithm according to claim 5, characterized in that: in step S3,
For the patterns in HUSEP1 and the patterns in HUSEP2, there are four cases in total:
1) the pattern is a high-utility (effective) pattern in neither D1 nor D2;
2) the pattern is a high-utility pattern in both D1 and D2;
3) the pattern is a high-utility pattern in D1 but not in D2;
4) the pattern is not a high-utility pattern in D1 but is one in D2;
For case 1), the pattern is not a high-utility pattern in D3;
For case 2), the pattern is a high-utility pattern in D3:
with u1, u2 and u3 denoting the utility values of the pattern in D1, D2 and D3, and by analogy with case 1), u1 ≥ δ×u(D1) and u2 ≥ δ×u(D2), therefore
u3 = u1 + u2 ≥ δ×u(D1) + δ×u(D2) = δ×u(D3);
For cases 3) and 4), whether the pattern is a high-utility pattern in D3 cannot be inferred directly, and the utility value of the pattern in D3 must be calculated in order to decide.
For case 3), since the pattern is already known to be a high-utility pattern in D1, only its utility value in D2 needs to be calculated before deciding;
For case 4), since the pattern is already known to be a high-utility pattern in D2, only its utility value in D1 needs to be calculated before deciding.
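The four cases can be turned into a short Python sketch of the incremental update. It assumes each pattern set is stored as a mapping from pattern to utility value, and that two lookup functions (`utility_in_d1`, `utility_in_d2`, both hypothetical) can compute the utility of a given pattern in one data set. The function and parameter names are illustrative and not taken from the patent.

```python
def merge_high_utility_patterns(husep1, husep2, u_d1, u_d2, delta,
                                utility_in_d1, utility_in_d2):
    """Derive HUSEP3 for D3 = D1 + D2 without re-mining D3 from scratch.

    husep1, husep2 -- dicts mapping pattern -> utility in D1 / D2
    u_d1, u_d2     -- total utilities u(D1) and u(D2)
    delta          -- user-defined minimum utility ratio
    utility_in_d1, utility_in_d2 -- hypothetical callables returning the
                      utility of a pattern in D1 / D2
    """
    threshold = delta * (u_d1 + u_d2)      # delta * u(D3)
    husep3 = {}
    # Case 1 patterns are in neither set, so they are never enumerated.
    for pattern in husep1.keys() | husep2.keys():
        if pattern in husep1 and pattern in husep2:
            # Case 2: high-utility in both, so it is high-utility in D3.
            husep3[pattern] = husep1[pattern] + husep2[pattern]
            continue
        if pattern in husep1:
            # Case 3: only the utility in D2 is still unknown.
            u3 = husep1[pattern] + utility_in_d2(pattern)
        else:
            # Case 4: only the utility in D1 is still unknown.
            u3 = utility_in_d1(pattern) + husep2[pattern]
        if u3 >= threshold:
            husep3[pattern] = u3
    return husep3


# Illustrative data: u(D1) = 20, u(D2) = 10, delta = 0.5.
d1_utilities = {("a",): 14, ("a", "b"): 12, ("c",): 2}
d2_utilities = {("a",): 6, ("a", "b"): 4, ("c",): 5}
husep1 = {("a",): 14, ("a", "b"): 12}   # utility >= 0.5 * 20
husep2 = {("a",): 6, ("c",): 5}         # utility >= 0.5 * 10

husep3 = merge_high_utility_patterns(
    husep1, husep2, u_d1=20, u_d2=10, delta=0.5,
    utility_in_d1=lambda p: d1_utilities.get(p, 0),
    utility_in_d2=lambda p: d2_utilities.get(p, 0),
)
```

In this example `("a",)` falls under case 2, `("a", "b")` under case 3 and passes the merged threshold of 15, and `("c",)` falls under case 4 and is rejected because its combined utility is 7.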
8. The effective sequence of events pattern mining algorithm according to claim 1, characterized in that: in step S4, during mining with the HUSP-Miner algorithm, the candidate 1-itemsets whose high-utility upper bound is greater than the threshold are found first; on this basis, (k+1)-itemsets are then generated from k-itemsets by sequence growth, and a pruning strategy is used to reduce the search space.
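The first step of this claim, selecting candidate single items by an upper bound, can be sketched as follows. The claim does not say which upper bound is used; this sketch assumes sequence-weighted utilization (the total utility of every sequence that contains the item), a bound commonly used in high-utility sequential pattern mining. It is not the HUSP-Miner implementation, and the comparison uses `>=` so that an item whose bound equals the threshold is kept.

```python
from collections import defaultdict


def candidate_items(sequences, threshold):
    """Return items whose sequence-weighted utilization reaches the threshold.

    sequences -- list of sequences; a sequence is a list of itemsets and an
                 itemset is a dict mapping item -> utility in that itemset
    threshold -- minimum utility a pattern must reach

    An item whose bound is below the threshold cannot be part of any
    high-utility pattern, so it can be dropped before sequence growth.
    """
    bound = defaultdict(int)
    for sequence in sequences:
        sequence_utility = sum(
            utility for itemset in sequence for utility in itemset.values()
        )
        items_present = {item for itemset in sequence for item in itemset}
        for item in items_present:
            bound[item] += sequence_utility
    return {item for item, value in bound.items() if value >= threshold}


sample_sequences = [
    [{"a": 2, "b": 3}, {"c": 1}],   # sequence utility 6
    [{"a": 4}],                     # sequence utility 4
    [{"c": 2}, {"d": 1}],           # sequence utility 3
]
kept_items = candidate_items(sample_sequences, threshold=6)
```

Here the bounds are 10 for `a`, 6 for `b`, 9 for `c` and 3 for `d`, so only `d` is pruned before longer patterns are grown.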
9. The effective sequence of events pattern mining algorithm according to claim 8, characterized in that: the pruning strategy continuously shrinks the search space by shrinking the database: the database is first read into memory; as a pattern grows, the set of transactions containing the pattern keeps shrinking, i.e. the corresponding projected database keeps getting smaller; since the database is not modified during mining, the mining of each pattern can be regarded as proceeding independently once the database is given.
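The shrinking projected database can be illustrated with a small Python sketch. It uses a simplified prefix projection over sequences of single items, in the style of PrefixSpan; utilities and itemsets are left out, and the function name `project` is hypothetical.

```python
def project(database, item):
    """Return the projected database for extending a pattern with `item`.

    database -- list of sequences (tuples of items)
    Only sequences containing `item` survive, and each is cut down to the
    suffix after the first occurrence of `item`. The input is not modified,
    so projections for different patterns can be built independently.
    """
    projected = []
    for sequence in database:
        if item in sequence:
            position = sequence.index(item)
            projected.append(sequence[position + 1:])
    return projected


sample_db = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b")]
after_a = project(sample_db, "a")    # sequences that contain <a>
after_ab = project(after_a, "b")     # sequences that contain <a, b>
```

The database shrinks from four sequences to three and then to two as the pattern grows from empty to `<a>` to `<a, b>`, while `sample_db` itself is left unchanged.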
10. The effective sequence of events pattern mining algorithm according to claim 9, characterized in that: in step S4, parallel mining is carried out using multiple threads:
1) when thread I completes its mining task, it enters a waiting state;
2) if thread J has not yet finished mining at this time, it hands the pattern it currently needs to process over to thread I, and thread J goes on to execute its next pattern to be processed.
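A minimal Python sketch of this work-sharing idea is given below. It is a simplification: instead of thread J handing a pattern directly to a waiting thread I, every pending pattern goes into one shared queue, so whichever thread is idle picks it up. The `expand` callback, which returns the extensions of a pattern, stands in for the actual mining step and is hypothetical.

```python
import queue
import threading


def parallel_mine(seed_patterns, expand, num_threads=4):
    """Process patterns with several threads that share pending work.

    seed_patterns -- initial patterns to process
    expand        -- hypothetical callable: pattern -> list of longer patterns
    Returns every pattern that was processed (order is not deterministic).
    """
    pending = queue.Queue()
    for pattern in seed_patterns:
        pending.put(pattern)

    processed = []
    lock = threading.Lock()

    def worker():
        while True:
            pattern = pending.get()          # an idle thread waits here
            if pattern is None:              # sentinel: no more work
                pending.task_done()
                return
            try:
                with lock:
                    processed.append(pattern)
                for child in expand(pattern):
                    pending.put(child)       # offered to any waiting thread
            finally:
                pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    pending.join()                           # every queued pattern is done
    for _ in threads:
        pending.put(None)
    for thread in threads:
        thread.join()
    return processed


def grow(pattern):
    """Toy expansion: extend with 'a' or 'b' up to length 2."""
    return [pattern + (item,) for item in "ab"] if len(pattern) < 2 else []


mined = parallel_mine([("a",), ("b",)], grow, num_threads=3)
```

Children are queued before the parent is marked done, so the queue's unfinished-task count cannot reach zero while work remains. Because the database is only read during mining, the threads need no synchronization beyond the shared queue and the result list.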
CN201810650504.6A 2018-06-22 2018-06-22 High-utility event sequence pattern mining method Active CN109101530B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810650504.6A CN109101530B (en) 2018-06-22 2018-06-22 High-utility event sequence pattern mining method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810650504.6A CN109101530B (en) 2018-06-22 2018-06-22 High-utility event sequence pattern mining method

Publications (2)

Publication Number Publication Date
CN109101530A true CN109101530A (en) 2018-12-28
CN109101530B CN109101530B (en) 2021-09-21

Family

ID=64844854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810650504.6A Active CN109101530B (en) 2018-06-22 2018-06-22 High-utility event sequence pattern mining method

Country Status (1)

Country Link
CN (1) CN109101530B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777182A (en) * 2016-12-23 2017-05-31 陕西理工学院 A kind of data flow effective item set mining algorithm for reducing candidate

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Jerry Chun-Wei Lin et al.: "Efficiently updating the discovered high average-utility itemsets with transaction insertion", Engineering Applications of Artificial Intelligence *
Morteza Zihayat et al.: "Distributed and Parallel High Utility Sequential Pattern Mining", 2016 IEEE International Conference on Big Data (Big Data) *
Xu Qianfang: "Research on network fault alarm correlation based on data mining", China Doctoral Dissertations Full-text Database, Information Science and Technology (monthly) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109857758A (en) * 2018-12-29 2019-06-07 天津南大通用数据技术股份有限公司 A kind of association analysis method and system based on neighbours' window
CN113886396A (en) * 2021-10-20 2022-01-04 电子科技大学 Power system fault detection method and system based on high-utility frequent pattern mining
CN113886396B (en) * 2021-10-20 2022-03-29 电子科技大学 Power system fault detection method and system based on high-utility frequent pattern mining
CN115964415A (en) * 2023-03-16 2023-04-14 山东科技大学 Pre-HUSPM-based database sequence insertion processing method

Also Published As

Publication number Publication date
CN109101530B (en) 2021-09-21

Similar Documents

Publication Publication Date Title
Duong et al. Efficient high utility itemset mining using buffered utility-lists
Mai et al. A lattice-based approach for mining high utility association rules
Tong et al. Discovering threshold-based frequent closed itemsets over probabilistic data
CN103838863B (en) A kind of big data clustering algorithm based on cloud computing platform
Duong et al. An efficient method for mining frequent itemsets with double constraints
CN109101530A (en) Effective sequence of events pattern mining algorithm
Nam et al. Efficient approach for damped window-based high utility pattern mining with list structure
Lin et al. Efficient chain structure for high-utility sequential pattern mining
CN101727391B (en) Method for extracting operation sequence of software vulnerability characteristics
Liu et al. Incremental mining of high utility patterns in one phase by absence and legacy-based pruning
Jiang et al. Mining weighted negative association rules from infrequent itemsets based on multiple supports
Song et al. Parallel incremental association rule mining framework for public opinion analysis
Abbasghorbani et al. Survey on sequential pattern mining algorithms
Guo et al. High utility episode mining made practical and fast
Liu et al. SAPNSP: Select actionable positive and negative sequential patterns based on a contribution metric
Vu et al. FTKHUIM: A Fast and Efficient Method for Mining Top-K High-Utility Itemsets
Lin et al. Mining of high average-utility patterns with item-level thresholds
Bailey et al. Efficient incremental mining of contrast patterns in changing data
CN106021401A (en) Extensible entity analysis algorithm based on reverse indices
CN109325092A (en) Merge the nonparametric parallelization level Di Li Cray process topic model system of phrase information
CN109857817A (en) The whole network domain electronic mutual inductor frequent continuous data is screened and data processing method
Nguyen et al. An improved algorithm for mining frequent Inter-transaction patterns
Nguyen et al. An N-list-based approach for mining frequent inter-transaction patterns
Yang et al. Frequent pattern mining algorithm for uncertain data streams based on sliding window
Maragatham et al. A strategy for mining utility based temporal association rules

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant