CN109101530A - Effective sequence of events pattern mining algorithm - Google Patents
- Publication number
- CN109101530A (application number CN201810650504.6A)
- Authority
- CN
- China
- Prior art keywords
- mode
- effective
- value
- sequence
- utility
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a high-utility ("effective") event sequence pattern mining algorithm comprising the following steps: S1, definition of security events; S2, division of the transaction database; S3, incremental mining of high-utility security event sequences; S4, parallel incremental mining of high-utility security event sequences. The beneficial effects of the invention are that parallelization shortens the mining time, hardware resources are better utilized, high-utility event sequence patterns are mined, and data mining is accelerated.
Description
Technical field
The present invention relates to data mining, and more particularly to a high-utility event sequence pattern mining algorithm.
Background technique
Current correlation analysis techniques for network security events mainly include analysis methods based on the probabilistic similarity between security events, association analysis methods based on the causal relationship between the prerequisites and the results of behaviors, methods based on attack graphs, and methods based on data mining and machine learning. Of these, the methods based on data mining and machine learning are the most fundamental and the most effective. Association rule mining, a typical data mining method, has been widely used in correlation analysis models for network security events. With the arrival of the big-data era, however, the range of applications of traditional association rule mining methods has grown ever narrower, and many scholars have proposed improved association rule mining algorithms.
Most improvements to traditional association rule mining algorithms target the way the algorithms are used: they break with the tradition of mining rules only over sets of goods in transaction data, so that the algorithms can be applied in settings with more complex conditions.
In research on association rule algorithms for time series, R. J. Povinelli et al. proposed a time-series-based data mining framework (TSDM, Time Series Data Mining framework), which they called time series data mining. Zeng Haiquan, building on the inter-associative successor tree model, proposed techniques for time series mining and similarity search. Lu Shan proposed techniques for phase-space reconstruction of nonlinear time series and for forecasting financial time series with nonlinear dynamics. Surveying current research on time series data mining, Hou Pengman argued that it can be defined more generally as follows: data mining based on one or more time series is called time series data mining (TSDM), and it can extract rules from within a time series for analyzing and predicting the values, periods, and trends of the series. D. Gasp et al. proposed a method for discovering rules from time series. First, the sliding window (moving window) method proposed by Baltzersen is used to standardize the time series data, converting the time series into a sequence of time-ordered samples and thereby discretizing and symbolizing the data. Second, the standardized sample set is clustered. Third, the original time series is reconstructed from the resulting clusters. Finally, rules are mined from the reconstructed time series data set. However, this method merely transplants data mining techniques into time series analysis; it takes no account of the temporal characteristics or the background knowledge of the time series, and it offers no sound theoretical justification. J. Han et al. used data mining techniques to study periodic segments and partial periodic segments in time series databases, with the aim of discovering periodic patterns (patterns that recur at regular intervals).
Current association rule mining always works on an existing data set, that is, on a given set of transactions. In mining based on event sequences, the event sequence must first be converted into a set of transactions containing events. Most such conversions are currently done with a sliding window, and existing methods use a fixed number of events as the window size. This is clearly unreasonable for event sequences: two events separated by a long time interval can still be placed in the same transaction, even though events far apart in time are only weakly associated or not associated at all. Such a division ignores this fact and forces transactions containing both events into the data set. The division method therefore needs to be improved.
In addition, because events are generated continuously, the transaction set produced by the division also changes dynamically. When a new transaction set is added to the original one, the traditional approach merges the two into one large transaction set and mines it again with the earlier method. The drawback is that mining time keeps growing as the transaction set grows, until mining becomes prohibitively expensive or cannot be completed at all. This approach mines from scratch every time and ignores the patterns already mined, which is clearly unreasonable in practice.
Most sequential pattern mining algorithms are serial, yet in most of them the sequential patterns mined earlier and later are essentially unrelated; that is, the mining of some sequential patterns does not depend on that of others. How to use parallelization to shorten mining time and make better use of hardware resources is therefore a technical problem that those skilled in the art urgently need to solve.
Summary of the invention
To solve the problems in the prior art, the present invention provides a high-utility event sequence pattern mining algorithm.
The algorithm comprises the following steps:
S1, definition of security events;
S2, division of the transaction database;
S3, incremental mining of high-utility security event sequences;
S4, parallel incremental mining of high-utility security event sequences.
As a further improvement of the invention, in step S1 different events are identified by their attack type label. To account for the influence of the remaining attributes on an event, a utility value is computed for each attribute value, and the utility values of all attributes are summed to give the final utility value of the event. The utility value corresponding to each attribute value is assigned manually, so changing the utility values assigns different degrees of importance to events at different locations and with different IP addresses.
As a further improvement of the invention, in step S2 the event set is divided into a transaction set by means of a sliding window. A window represents the period from t_s to t_e, and every window has the same time span, i.e. (t_e - t_s) is the same.
As a further improvement of the invention, in step S2 the events are sorted by time and divided using sliding windows of equal time span. The events in one window form a transaction sequence, and each time the window slides to the time point of the next event. Events that occur close together in time are treated as concurrent, that is, their order is not considered. When merged events form the first item of the current window, the original utility values are multiplied by the number of merged events to compensate for the effect of merging, and the products are used as the new utility values.
As a further improvement of the invention, in step S3 let the original data set be D1 and the newly added data set be D2. The set of high-utility security event sequence patterns in D1 is HUSEP1, and that in D2 is HUSEP2. By definition, every pattern in HUSEP1 has a utility value of at least δ × u(D1), and every pattern in HUSEP2 has a utility value of at least δ × u(D2). The database formed by merging D1 and D2 is denoted D3, and its set of high-utility security event sequence patterns is HUSEP3. Every pattern in HUSEP3 has a utility value of at least δ × u(D3) = δ × u(D1) + δ × u(D2).
As a further improvement of the invention, in step S3 HUSEP3 is a subset of HUSEP1 ∪ HUSEP2; that is, every pattern in HUSEP3 occurs in at least one of HUSEP1 and HUSEP2. Suppose some pattern in HUSEP3 occurs in neither, and let its utility values in D1, D2 and D3 be u1, u2 and u3. By definition u1 < δ × u(D1) and u2 < δ × u(D2), so u3 = u1 + u2 < δ × u(D1) + δ × u(D2) = δ × u(D3). The pattern therefore cannot be in HUSEP3, which is a contradiction. Hence HUSEP3 is a subset of HUSEP1 ∪ HUSEP2.
As a further improvement of the invention, in step S3,
a pattern falls into one of 4 cases with respect to D1 and D2:
1) the pattern is not a high-utility pattern in either D1 or D2;
2) the pattern is a high-utility pattern in both D1 and D2;
3) the pattern is a high-utility pattern in D1 but not in D2;
4) the pattern is not a high-utility pattern in D1 but is one in D2.
In case 1), the pattern is not a high-utility pattern in D3.
In case 2), the pattern is a high-utility pattern in D3: by analogy with case 1), u1 ≥ δ × u(D1) and u2 ≥ δ × u(D2), so
u3 = u1 + u2 ≥ δ × u(D1) + δ × u(D2) = δ × u(D3).
In cases 3) and 4), whether the pattern is a high-utility pattern in D3 cannot be inferred directly; its utility value in D3 must be computed to decide.
In case 3), because the pattern is already a high-utility pattern in D1, only its utility value in D2 needs to be computed before deciding.
In case 4), because the pattern is already a high-utility pattern in D2, only its utility value in D1 needs to be computed before deciding.
As a further improvement of the invention, in step S4, when mining with the HUSP-Miner algorithm, the candidate 1-itemsets whose utility upper bound exceeds the threshold are found first; on this basis, (k+1)-itemsets are generated from k-itemsets by sequence growth, and a pruning strategy is used to reduce the search space.
As a further improvement of the invention, the pruning strategy reduces the search space by continually shrinking the database: the database is first read into memory; as a pattern grows, the set of transactions containing it keeps shrinking, that is, the corresponding projected database keeps getting smaller. Because the database is not modified during mining, the mining of each pattern can be regarded as independent once the database is given.
As a further improvement of the invention, in step S4 parallel mining is carried out with multiple threads:
1) when thread I finishes its mining task, it enters the waiting state;
2) if thread J has not yet finished mining at that moment, it hands the pattern it currently has to process to thread I and moves on to its next pending pattern.
The beneficial effects of the invention are as follows: with the above scheme, parallelization shortens the mining time, hardware resources are better utilized, high-utility event sequence patterns are mined, and data mining is accelerated.
Description of the drawings
Fig. 1 is a schematic diagram of the time-based division of security events in the high-utility event sequence pattern mining algorithm of the invention.
Fig. 2 shows the results of experiment one.
Fig. 3 shows the results of experiment two.
Fig. 4 shows the results of experiment three.
Specific embodiment
The invention is further described below with reference to the drawings and specific embodiments.
A high-utility event sequence pattern mining algorithm comprises the following steps:
One, definition of security events
Before pattern mining for network security events can be studied, a definition of a network security event, that is, its attributes, must first be given. Based on prior experience, the invention extracts the following typical attributes to define a network security event. Table 1 gives the definition of a network security event, and Table 2 lists common types of network attack.
Table 1: Security event attributes
Table 2: Common attack types
Different events are identified by their attack type label. To account for the influence of the remaining attributes on an event, a utility value is computed for each attribute value, and the utility values of all attributes are summed to give the final utility value of the event. The utility value corresponding to each attribute value is assigned manually, so changing the utility values assigns different degrees of importance to events at different locations and with different IP addresses.
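The summation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the attribute names, the table `ATTRIBUTE_UTILITY`, and every numeric utility value are hypothetical stand-ins for the manually assigned values.

```python
from typing import Dict

# Hypothetical, manually assigned utility value for each attribute value.
# None of these numbers come from the patent.
ATTRIBUTE_UTILITY: Dict[str, Dict[str, float]] = {
    "location": {"l1": 3.0, "l2": 1.0},
    "source_ip": {"s1": 1.0, "s2": 2.0},
    "destination_ip": {"d1": 2.0, "d2": 4.0},
}


def event_utility(event: Dict[str, str],
                  table: Dict[str, Dict[str, float]] = ATTRIBUTE_UTILITY,
                  default: float = 0.0) -> float:
    """Sum the utility of every scored attribute of one event.

    Attribute values missing from the table contribute `default`
    (an assumption; the patent does not say how they are handled).
    """
    return sum(values.get(event.get(attr, ""), default)
               for attr, values in table.items())


# Event e1 with the attribute values of Table 3 (l1, s1, d1).
e1 = {"attack_id": "e1", "location": "l1",
      "source_ip": "s1", "destination_ip": "d1"}
```

Raising the entry for a location or an IP address in the table raises the utility of every event that carries that value, which is how different degrees of importance are assigned.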
Two, division of the transaction database
What is obtained is a set of events, to which traditional pattern mining algorithms cannot be applied directly; it must first be converted into a transaction set suitable for pattern mining. Note that the form of a transaction in traditional high-utility sequential pattern mining differs slightly from that in high-utility sequential pattern mining based on security events. In high-utility pattern mining, the utility of an item is determined by its external utility and its internal utility. Internal utility generally refers to quantity, and external utility generally refers to the profit of the item; for a given item, the external utility is always the same. Because each event is influenced by its other attributes, different occurrences of the same event may have different utility values. This makes transactions in security-event-based mining differ in form from those in traditional high-utility sequence mining. Consider, for example, the transaction <[(e1:ue1)],[(e2:ue2)]>. In security event mining, ue1 and ue2 are the utility values of the corresponding events in the transaction, whereas in traditional high-utility sequential pattern mining they would be the numbers of times e1 and e2 occur within their respective itemsets. Despite this difference in form, traditional high-utility sequential pattern mining algorithms can still be applied to the transaction database generated from security events: internal and external utility serve only to compute the utility value of an item and otherwise have no substantive influence on the mining process.
The event set is divided into a transaction set by means of a sliding window. A window represents the period from t_s to t_e, and every window has the same time span, i.e. (t_e - t_s) is the same. The division is illustrated step by step below. Let the security event set be D1, as shown in Table 3.
Attack ID | Time(s) | Location | Source IP | Destination IP |
e1 | t1 | l1 | s1 | d1 |
e2 | t2 | l2 | s2 | d2 |
e3 | t3 | l1 | s7 | d2 |
e4 | t4 | l1 | s2 | d1 |
e5 | t5 | l2 | s5 | d1 |
e6 | t6 | l2 | s5 | d1 |
Table 3: Security event set D1
The events sorted by time are shown in Fig. 1. The original events are divided using sliding windows of equal time span. The events in one window form a transaction sequence, and each time the window slides to the time point of the next event. Events that occur close together are treated as concurrent, that is, their order is not considered. When merged events form the first item of the current window, the original utility values are multiplied by the number of merged events to compensate for the effect of merging. Take e4 and e5 in Fig. 1 as an example: because they are close together, they can be regarded as a single event. The transaction marked off by the sliding window is <[(e4:ue4)(e5:ue5)],[(e6:ue6)]>; taking the effect of merging into account, the original utility values are multiplied by the number of merged events, so the processed transaction is <[(e4:2*ue4)(e5:2*ue5)],[(e6:2*ue6)]>.
Table 4: Transaction set after division
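The division step can be sketched as below, under several assumptions that the text leaves open: the window starts at each event's timestamp and covers `span` time units, two consecutive events count as concurrent when their gap is at most `merge_gap`, and, following the worked example with e4, e5 and e6, every utility in a transaction is multiplied by the size of a merged first itemset. The names `Event`, `divide`, `span` and `merge_gap` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    attack_id: str
    time: float
    utility: float


Transaction = List[List[Tuple[str, float]]]


def divide(events: List[Event], span: float, merge_gap: float) -> List[Transaction]:
    """Turn a list of events into transactions with a time-based sliding window."""
    ordered = sorted(events, key=lambda e: e.time)
    transactions: List[Transaction] = []
    for i, first in enumerate(ordered):
        # Window [t_s, t_e] with t_s = first.time and t_e - t_s = span.
        window = [e for e in ordered[i:] if e.time - first.time <= span]
        # Group events that are close together into one unordered itemset.
        itemsets = [[window[0]]]
        for prev, cur in zip(window, window[1:]):
            if cur.time - prev.time <= merge_gap:
                itemsets[-1].append(cur)
            else:
                itemsets.append([cur])
        # Assumption taken from the worked example: when the first itemset
        # holds merged events, scale all utilities by the number merged.
        factor = len(itemsets[0])
        transactions.append(
            [[(e.attack_id, e.utility * factor) for e in group] for group in itemsets]
        )
    return transactions


# Hypothetical timestamps and utilities for e4, e5 and e6.
demo = [Event("e4", 10, 1.0), Event("e5", 11, 2.0), Event("e6", 15, 3.0)]
result = divide(demo, span=6, merge_gap=1)
```

With these made-up timestamps the first window reproduces the shape of the example transaction, with e4 and e5 merged and all three utilities doubled; the later windows start at e5 and e6 and contain no merged first item.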
After the security events have been divided into a transaction database, an existing high-utility sequential pattern mining algorithm can be used to mine it. The HUSP-Miner algorithm is used here and is not described in detail.
Three, incremental mining of high-utility security event sequences
In practice, security events are generated in real time, so the database formed by the division also grows dynamically. If such a database were mined again from scratch after every update, a large amount of resources would be consumed. Moreover, as the database keeps growing, the original mining algorithm may fail to produce a result at all because the scale is too large. The relationship between the original data set and the newly added data set must therefore be identified in order to simplify the mining process.
Let the original data set be D1 and the newly added data set be D2. The set of high-utility security event sequence patterns in D1 is HUSEP1, and that in D2 is HUSEP2. By definition, every pattern in HUSEP1 has a utility value of at least δ × u(D1), and every pattern in HUSEP2 has a utility value of at least δ × u(D2). The database formed by merging D1 and D2 is denoted D3, and every pattern in its pattern set HUSEP3 has a utility value of at least δ × u(D3) = δ × u(D1) + δ × u(D2).
It follows that HUSEP3 is a subset of HUSEP1 ∪ HUSEP2; that is, every pattern in HUSEP3 occurs in at least one of HUSEP1 and HUSEP2. Suppose some pattern in HUSEP3 occurs in neither, and let its utility values in D1, D2 and D3 be u1, u2 and u3. By definition u1 < δ × u(D1) and u2 < δ × u(D2), so
u3 = u1 + u2 < δ × u(D1) + δ × u(D2) = δ × u(D3).
The pattern therefore cannot be in HUSEP3, which is a contradiction. Hence HUSEP3 is a subset of HUSEP1 ∪ HUSEP2.
A pattern falls into one of 4 cases with respect to D1 and D2:
1) the pattern is not a high-utility pattern in either D1 or D2;
2) the pattern is a high-utility pattern in both D1 and D2;
3) the pattern is a high-utility pattern in D1 but not in D2;
4) the pattern is not a high-utility pattern in D1 but is one in D2.
In case 1), the pattern is certainly not a high-utility pattern in D3. The proof is the same as the proof that HUSEP3 is a subset of HUSEP1 ∪ HUSEP2 given above and is not repeated here.
In case 2), the pattern is certainly a high-utility pattern in D3: by analogy with case 1), u1 ≥ δ × u(D1) and u2 ≥ δ × u(D2), so
u3 = u1 + u2 ≥ δ × u(D1) + δ × u(D2) = δ × u(D3).
In cases 3) and 4), whether the pattern is a high-utility pattern in D3 cannot be inferred directly; its utility value in D3 must be computed to decide.
In case 3), because the pattern is already a high-utility pattern in D1, only its utility value in D2 needs to be computed before deciding.
Similarly, in case 4), because the pattern is already a high-utility pattern in D2, only its utility value in D1 needs to be computed before deciding.
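The four cases can be sketched as a merge routine. This is an illustrative sketch, not the patent's code: `merge_incremental`, its parameter names, and the two callbacks `utility_in_d1` and `utility_in_d2` (which stand for a scan of the corresponding data set) are all assumed names. Patterns of case 1) never enter the loop because they appear in neither input set.

```python
from typing import Callable, Dict, Hashable


def merge_incremental(
    husep1: Dict[Hashable, float],   # pattern -> utility in D1 (known)
    husep2: Dict[Hashable, float],   # pattern -> utility in D2 (known)
    u_d1: float,                     # u(D1)
    u_d2: float,                     # u(D2)
    delta: float,                    # minimum utility threshold ratio
    utility_in_d1: Callable[[Hashable], float],
    utility_in_d2: Callable[[Hashable], float],
) -> Dict[Hashable, float]:
    """Return HUSEP3 for D3 = D1 merged with D2, without re-mining D3."""
    threshold = delta * (u_d1 + u_d2)        # delta * u(D3)
    husep3: Dict[Hashable, float] = {}
    for pattern in husep1.keys() | husep2.keys():
        if pattern in husep1 and pattern in husep2:
            # Case 2: high-utility in both; u3 always reaches the threshold.
            u3 = husep1[pattern] + husep2[pattern]
        elif pattern in husep1:
            # Case 3: only the utility in D2 has to be computed.
            u3 = husep1[pattern] + utility_in_d2(pattern)
        else:
            # Case 4: only the utility in D1 has to be computed.
            u3 = utility_in_d1(pattern) + husep2[pattern]
        if u3 >= threshold:
            husep3[pattern] = u3
    return husep3


# Made-up numbers: u(D1) = 20, u(D2) = 10, delta = 0.5, so the thresholds
# are 10 in D1, 5 in D2 and 15 in D3.
merged = merge_incremental(
    husep1={"A": 12.0, "B": 30.0},
    husep2={"B": 6.0, "C": 5.0},
    u_d1=20.0, u_d2=10.0, delta=0.5,
    utility_in_d1=lambda p: {"C": 9.0}[p],
    utility_in_d2=lambda p: {"A": 4.0}[p],
)
```

In this toy run A is kept (12 + 4 = 16), B is kept (30 + 6 = 36), and C is dropped (9 + 5 = 14, below 15).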
Definition 1: the utility value of item i_j in a q-itemset v is defined as
u(i_j, v) = q(i_j, v) × pr(i_j)
where q(i_j, v) is the quantity of i_j in v and pr(i_j) is the profit of i_j.
Definition 2: the utility value of a q-itemset v is defined as
u(v) = Σ_{i_j ∈ v} u(i_j, v)
Definition 3: the utility value of a q-sequence s = <v_1, v_2, ..., v_d> is defined as
u(s) = Σ_{k=1..d} u(v_k)
Definition 4: given a q-sequence s = <v_1, v_2, ..., v_d> and a sequence t = <w_1, w_2, ..., w_r>, if d = r and the items of v_k are identical to those of w_k for every 1 ≤ k ≤ d, then s matches t, written s ~ t.
Definition 5: the utility value of a sequence t in a q-sequence s is defined as
u(t, s) = max{ u(s_k) | s_k ⊆ s, t ~ s_k }
where t ~ s_k means that s_k matches t.
Definition 6: the utility value of a sequence t in a quantitative database D is
u(t) = Σ_{s ∈ D} u(t, s)
Definition 7: the utility value of a quantitative database D is defined as
u(D) = Σ_{s ∈ D} u(s)
Definition 8: if the utility value of a sequence t in the quantitative database D is not less than the user-defined minimum threshold δ × u(D), then t is a high-utility sequential pattern (HUSP), written
HUSP ← {t | u(t) ≥ u(D) × δ}
Based on the above definitions, high-utility sequential pattern mining can be defined as follows: given a quantitative sequence database D and a minimum utility threshold δ (a decimal between 0 and 1), find all sequential patterns whose utility value is not less than δ × u(D).
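The definitions above can be checked on a small example. The sketch below is a brute-force evaluation, not HUSP-Miner; the representation (a q-itemset as a dict from item to quantity, a pattern as a list of item tuples), the function names, and the sample data `PROFIT` and `DB` are assumptions made for illustration. The utility of a pattern in a q-sequence is taken as the maximum over all matching subsequences, which is the usual reading of Definition 5.

```python
from typing import Dict, List, Sequence, Tuple

QItemset = Dict[str, int]            # item -> quantity q(i, v)
QSequence = List[QItemset]
Pattern = Sequence[Tuple[str, ...]]  # sequence of itemsets without quantities


def itemset_utility(v: QItemset, profit: Dict[str, float]) -> float:
    """Definitions 1 and 2: sum of quantity * profit over the items of v."""
    return sum(q * profit[i] for i, q in v.items())


def sequence_utility(s: QSequence, profit: Dict[str, float]) -> float:
    """Definition 3: sum of the itemset utilities of s."""
    return sum(itemset_utility(v, profit) for v in s)


def pattern_utility_in(t: Pattern, s: QSequence, profit: Dict[str, float]) -> float:
    """Definition 5: best utility over all occurrences of t in s (0 if none)."""
    neg = float("-inf")
    best = [0.0] + [neg] * len(t)    # best[j]: first j itemsets of t matched
    for v in s:
        # Descending j so that one itemset of s is used at most once per match.
        for j in range(len(t), 0, -1):
            w = t[j - 1]
            if best[j - 1] != neg and all(i in v for i in w):
                gain = sum(v[i] * profit[i] for i in w)
                best[j] = max(best[j], best[j - 1] + gain)
    return best[len(t)] if best[len(t)] != neg else 0.0


def pattern_utility(t: Pattern, db: List[QSequence], profit: Dict[str, float]) -> float:
    """Definition 6: utility of t in the database."""
    return sum(pattern_utility_in(t, s, profit) for s in db)


def database_utility(db: List[QSequence], profit: Dict[str, float]) -> float:
    """Definition 7: total utility of the database."""
    return sum(sequence_utility(s, profit) for s in db)


def is_high_utility(t: Pattern, db: List[QSequence],
                    profit: Dict[str, float], delta: float) -> bool:
    """Definition 8: u(t) >= delta * u(D)."""
    return pattern_utility(t, db, profit) >= delta * database_utility(db, profit)


# Made-up sample data.
PROFIT = {"a": 2.0, "b": 5.0, "c": 1.0}
DB = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 4}],
    [{"a": 2}, {"c": 1}],
]
```

For this sample, u(D) is 23 + 5 = 28, and the pattern <(a),(c)> has utility 5 in each sequence, so u(t) = 10 and the pattern passes a threshold of δ = 0.3.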
Four, parallel incremental mining of high-utility security event sequences
When mining with the HUSP-Miner algorithm, all candidate 1-itemsets whose utility upper bound exceeds the threshold are found first. On this basis, (k+1)-itemsets are generated from k-itemsets by sequence growth (there are two growth modes), and suitable pruning strategies are used to reduce the search space. One pruning strategy reduces the search space by continually shrinking the database: the algorithm first reads the database into memory; as a pattern grows, the set of transactions containing it keeps shrinking, that is, the corresponding projected database keeps getting smaller. Because the database is not modified during mining, the mining of each pattern can be regarded as independent once the database is given.
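The text does not name the two growth modes. A common choice in sequential pattern mining is to extend either the last itemset of a pattern or the sequence itself, and the sketch below assumes that reading; the names `i_extend`, `s_extend` and `grow` are hypothetical, and no utility upper bound or pruning is applied here.

```python
from typing import Iterable, List, Tuple

Pattern = Tuple[Tuple[str, ...], ...]   # e.g. (("a", "b"), ("c",))


def i_extend(pattern: Pattern, item: str) -> Pattern:
    """Assumed growth mode 1: add `item` to the last itemset of the pattern."""
    *head, last = pattern
    return (*head, tuple(sorted(last + (item,))))


def s_extend(pattern: Pattern, item: str) -> Pattern:
    """Assumed growth mode 2: append a new itemset holding only `item`."""
    return pattern + ((item,),)


def grow(pattern: Pattern, items: Iterable[str]) -> List[Pattern]:
    """Generate all patterns that are one item longer than `pattern`."""
    items = list(items)
    last_item = pattern[-1][-1]
    # Only items greater than the last one are added to the last itemset,
    # so that each itemset is produced exactly once.
    children = [i_extend(pattern, i) for i in items if i > last_item]
    children += [s_extend(pattern, i) for i in items]
    return children


children = grow((("a",),), ["a", "b"])
```

Each child would then be checked against the utility upper bound before it is grown further.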
Therefore, once all candidate 1-itemsets have been found, they can be partitioned and mined in parallel with multiple threads. Note that the candidate 1-itemsets differ in the number of high-utility patterns they eventually generate, so some threads may finish early while others run much longer. Because of these differences in execution time, the total running time may fall far short of expectations. To address this, the following improvement is made:
1) when thread I finishes its mining task, it enters the waiting state;
2) if thread J has not yet finished mining at that moment, it hands the pattern it currently has to process to thread I and moves on to its next pending pattern.
This strategy keeps the load relatively balanced across threads and effectively reduces mining time.
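One way to obtain this kind of load balancing is a shared work queue: every pending pattern is placed on the queue, so an idle thread picks up work that a busy thread would otherwise process later. The sketch below uses that substitute rather than the direct hand-over between threads I and J described above. `parallel_mine`, `expand` and `is_high_utility` are hypothetical names, and in CPython the threads illustrate the scheduling only, since the global interpreter lock prevents CPU-bound Python code from running in parallel.

```python
import queue
import threading
from typing import Callable, Hashable, Iterable, List


def parallel_mine(
    seeds: Iterable[Hashable],
    expand: Callable[[Hashable], Iterable[Hashable]],
    is_high_utility: Callable[[Hashable], bool],
    n_threads: int = 4,
) -> List[Hashable]:
    """Mine patterns with worker threads that share one queue of pending patterns."""
    tasks: "queue.Queue" = queue.Queue()
    for seed in seeds:
        tasks.put(seed)
    results: List[Hashable] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            pattern = tasks.get()
            if pattern is None:          # sentinel: no more work
                tasks.task_done()
                return
            try:
                if is_high_utility(pattern):
                    with lock:
                        results.append(pattern)
                # Children go back on the shared queue, so any idle
                # thread can take them over.
                for child in expand(pattern):
                    tasks.put(child)
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    tasks.join()                         # all patterns, including children, done
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return results


# Toy search space: tuples over {"a", "b"} of length at most 2.
def toy_expand(p):
    return [p + (x,) for x in "ab"] if len(p) < 2 else []


found = sorted(parallel_mine([("a",), ("b",)], toy_expand, lambda p: len(p) == 2))
```

Because the database is only read during mining, the workers need no synchronization beyond the queue and the lock that protects the result list.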
Experimental results and analysis
The experimental data set was obtained by applying the division method to a randomly generated event set. The divided sequence set contains 9752 transactions and 1000 distinct event types. The experiments fall into three parts. Experiment one (Fig. 2) varies the size of the data set with δ fixed; experiment two (Fig. 3) varies δ with the data set fixed; experiment three (Fig. 4) compares the multithreaded mining algorithm with the single-threaded one.
Experiments one and two test the incremental database. Increment 1 mines the original data set and the newly added data set separately and then merges the two results using the method described above. Increment 2 merges the already available mining result of the original database with the result of mining the newly added data set. In experiment one the newly added data set is fixed and is the previously generated test data set; the original data set is built by splicing the generated data (multiple copies of the original transaction set are concatenated without merging duplicate transactions), and its size is 1, 2, 3 and 4 times that of the generated data set.
In experiment two, when δ is 0.0005 the increment 1 method is slower than the original method. This is because a small δ yields a large number of high-utility patterns, and a fairly ordinary merging method is used here, so merging may take some time. The results of experiments one and two show that, when the original mining result is known, applying the incremental mining algorithm to the newly added data set is much faster than merging the old and new data sets and mining again.
Experiment three compares the multithreaded mining algorithm, here with four threads, against the single-threaded one. The data set is fixed and the two methods are compared while δ is varied. As Fig. 4 shows, the smaller δ is, the more pronounced the difference between the two. A smaller δ means more patterns to consider and more computation, so the advantage of multithreading becomes more apparent. When the amount of data is large, multithreading can therefore be considered to speed up mining.
The table below shows part of the mining results. The data set was formed by splicing four copies of the original event set; δ is 0.0008 and the corresponding threshold is 7273. The results are high-utility sequential patterns whose utility value exceeds the specified threshold, and each number in a pattern is an event ID. Taking the third entry in the table as an example, the mined pattern is [(132)], [(577)], [(936)], [(825)], [(531)], [(646)], [(24)], [(505) (644)], [(710)]. This suggests that there may be some relationship among these events. Note that the events with IDs 505 and 644 are concurrent: there may be no connection between the two, yet each may be connected to the events that precede and follow it. The mined high-utility event sequence patterns can be studied further to uncover potential associations between these events.
Table 5: Partial mining results
The above is a further detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention is not to be regarded as limited to this description. A person of ordinary skill in the art to which the invention belongs may make a number of simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the scope of protection of the invention.
Claims (10)
1. A high-utility event sequence pattern mining algorithm, characterized by comprising the following steps:
S1, definition of security events;
S2, division of the transaction database;
S3, incremental mining of high-utility security event sequences;
S4, parallel incremental mining of high-utility security event sequences.
2. The high-utility event sequence pattern mining algorithm according to claim 1, characterized in that: in step S1, different events are identified by their attack type label; to account for the influence of the remaining attributes on an event, a utility value is computed for each attribute value, and the utility values of all attributes are summed to give the final utility value of the event; the utility value corresponding to each attribute value is assigned manually, so that changing the utility values assigns different degrees of importance to events at different locations and with different IP addresses.
3. The high-utility event sequence pattern mining algorithm according to claim 1, characterized in that: in step S2, the event set is divided into a transaction set by means of a sliding window; a window represents the period from t_s to t_e, and every window has the same time span, i.e. (t_e - t_s) is the same.
4. The high-utility event sequence pattern mining algorithm according to claim 1, characterized in that: in step S2, the events are sorted by time and divided using sliding windows of equal time span; the events in one window form a transaction sequence, and each time the window slides to the time point of the next event; events that occur close together are treated as concurrent, that is, their order is not considered; when merged events form the first item of the current window, the original utility values are multiplied by the number of merged events to compensate for the effect of merging, and the products are used as the new utility values.
5. The high-utility event sequence pattern mining algorithm according to claim 1, characterized in that: in step S3, the original data set is D1 and the newly added data set is D2; the set of high-utility security event sequence patterns in D1 is HUSEP1, and that in D2 is HUSEP2; by definition, every pattern in HUSEP1 has a utility value of at least the user-defined minimum threshold δ × u(D1), and every pattern in HUSEP2 has a utility value of at least the user-defined minimum threshold δ × u(D2); the database formed by merging D1 and D2 is denoted D3, and its set of high-utility security event sequence patterns is HUSEP3; every pattern in HUSEP3 has a utility value of at least the user-defined minimum threshold δ × u(D3) = δ × u(D1) + δ × u(D2).
6. The effective sequence of events pattern mining algorithm according to claim 5, characterized in that: in step S3, HUSEP3 is a subset of HUSEP1 ∪ HUSEP2, i.e. every pattern in HUSEP3 occurs in at least one of HUSEP1 and HUSEP2; suppose a pattern in HUSEP3 occurs in neither HUSEP1 nor HUSEP2, and let its utility values in D1, D2 and D3 be u1, u2 and u3 respectively; by definition, u1 < δ × u(D1) and u2 < δ × u(D2), from which it follows that u3 = u1 + u2 < δ × u(D1) + δ × u(D2) = δ × u(D3); the pattern therefore should not be in HUSEP3, which is a contradiction; hence HUSEP3 is a subset of HUSEP1 ∪ HUSEP2.
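The subset property can be checked on toy data. The sketch below assumes that the utility of each pattern in D1 and D2 is already known and stored in a dictionary; the pattern names, utility values and helper names are invented for illustration.

```python
from typing import Dict, Set


def effective_patterns(utilities: Dict[str, int], total: int, delta: float) -> Set[str]:
    """Patterns whose utility reaches the threshold delta * u(D)."""
    return {p for p, u in utilities.items() if u >= delta * total}


def merged_utilities(u1: Dict[str, int], u2: Dict[str, int]) -> Dict[str, int]:
    """Utility in D3 is the sum of the utilities in D1 and D2 (u3 = u1 + u2)."""
    return {p: u1.get(p, 0) + u2.get(p, 0) for p in set(u1) | set(u2)}


delta = 0.25
u_d1, total_d1 = {"ab": 8, "ac": 1, "bc": 2}, 20   # threshold 5.0
u_d2, total_d2 = {"ab": 1, "ac": 5, "bc": 1}, 10   # threshold 2.5

husep1 = effective_patterns(u_d1, total_d1, delta)
husep2 = effective_patterns(u_d2, total_d2, delta)
husep3 = effective_patterns(
    merged_utilities(u_d1, u_d2), total_d1 + total_d2, delta
)
```

Here `husep3` is contained in `husep1 | husep2`, so an incremental update only has to re-examine patterns already found in one of the two partial results.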
7. The effective sequence of events pattern mining algorithm according to claim 5, characterized in that: in step S3, for the patterns in HUSEP1 and the patterns in HUSEP2, there are four cases in total:
1) the pattern is not an effective pattern in either D1 or D2;
2) the pattern is an effective pattern in both D1 and D2;
3) the pattern is an effective pattern in D1 but not in D2;
4) the pattern is not an effective pattern in D1 but is an effective pattern in D2;
For case 1), the pattern is not an effective pattern in D3;
For case 2), the pattern is an effective pattern in D3: by analogy with case 1), u1 ≥ δ × u(D1) and u2 ≥ δ × u(D2), therefore u3 = u1 + u2 ≥ δ × u(D1) + δ × u(D2) = δ × u(D3);
For cases 3) and 4), whether the pattern is an effective pattern in D3 cannot be inferred directly; the utility value of the pattern in D3 must be calculated in order to decide.
For case 3), since the pattern is already an effective pattern in D1, only its utility value in D2 needs to be calculated before making the judgment;
For case 4), since the pattern is already an effective pattern in D2, only its utility value in D1 needs to be calculated before making the judgment.
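The four cases can be sketched as a small decision function. This is an illustration under assumptions: the function name `classify` and its signature are hypothetical, and u1 and u2 are passed in directly, whereas in cases 3) and 4) the claim implies that the missing one of the two values still has to be computed from the corresponding data set.

```python
from typing import Tuple


def classify(
    u1: float, u2: float, total1: float, total2: float, delta: float
) -> Tuple[int, bool]:
    """Return (case number, is the pattern effective in D3 = D1 + D2).

    u1, u2         utility of the pattern in D1 and D2
    total1, total2 total utilities u(D1) and u(D2)
    delta          user-defined ratio for the minimum threshold
    """
    in_d1 = u1 >= delta * total1
    in_d2 = u2 >= delta * total2
    if not in_d1 and not in_d2:
        return 1, False          # case 1: never effective in D3
    if in_d1 and in_d2:
        return 2, True           # case 2: always effective in D3
    case = 3 if in_d1 else 4
    # Cases 3 and 4: decide from u3 = u1 + u2 against delta * u(D3).
    return case, (u1 + u2) >= delta * (total1 + total2)
```

Only cases 3) and 4) require any further utility computation, which is where the incremental approach saves work compared with mining D3 from scratch.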
8. The effective sequence of events pattern mining algorithm according to claim 1, characterized in that: in step S4, when mining with the HUSP-Miner algorithm, the candidate 1-itemsets whose utility upper bound is greater than the threshold are found first; then, on this basis, (k+1)-itemsets are generated from k-itemsets by sequence growth, and the search space is reduced by means of a pruning strategy.
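Level-wise sequence growth with upper-bound pruning can be sketched as below. The claim does not state which upper bound HUSP-Miner uses, so this example substitutes sequence-weighted utility (the total utility of every sequence that contains the pattern), a common anti-monotone bound in high-utility sequential pattern mining. All names are illustrative, and the comparison uses `>=` so that a candidate exactly at the threshold is kept.

```python
from typing import List, Tuple

Sequence = List[Tuple[str, int]]  # (item, utility) pairs in time order
Pattern = Tuple[str, ...]


def contains(seq: Sequence, pattern: Pattern) -> bool:
    """True if `pattern` occurs in `seq` as an order-preserving subsequence."""
    remaining = iter(item for item, _ in seq)
    return all(p in remaining for p in pattern)  # `in` consumes the iterator


def upper_bound(db: List[Sequence], pattern: Pattern) -> int:
    """Sequence-weighted utility: an assumed stand-in for the real bound."""
    return sum(sum(u for _, u in seq) for seq in db if contains(seq, pattern))


def grow_candidates(db: List[Sequence], threshold: int) -> List[Pattern]:
    """Generate candidates level by level, pruning by the upper bound."""
    items = sorted({item for seq in db for item, _ in seq})
    frontier = [(i,) for i in items if upper_bound(db, (i,)) >= threshold]
    promising = [p[0] for p in frontier]      # items that survived level 1
    candidates = list(frontier)
    while frontier:
        next_frontier = []
        for pattern in frontier:              # extend each k-pattern by one item
            for item in promising:
                grown = pattern + (item,)
                if upper_bound(db, grown) >= threshold:
                    next_frontier.append(grown)
        candidates.extend(next_frontier)
        frontier = next_frontier
    return candidates
```

The surviving candidates would still need their exact utility computed; the sketch covers only candidate generation and pruning.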
9. The effective sequence of events pattern mining algorithm according to claim 8, characterized in that: the pruning strategy continually reduces the search space by shrinking the database: the database is first read into memory; as a pattern grows, the set of transactions containing that pattern keeps shrinking, i.e. the corresponding projected database keeps getting smaller; since the database itself is not modified during mining, the mining of each pattern can be regarded as proceeding independently once the database is given.
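The shrinking projected database can be illustrated with a simple prefix projection. This sketch ignores utilities and represents each transaction as a plain list of item names; `project` is a hypothetical helper, not a function of any named library.

```python
from typing import List


def project(db: List[List[str]], item: str) -> List[List[str]]:
    """Return the projected database for extending a pattern by `item`.

    Only sequences that contain `item` are kept, and each is cut down to
    the part after the first occurrence. New lists are returned, so the
    database read into memory is never modified.
    """
    projected = []
    for seq in db:
        if item in seq:
            projected.append(seq[seq.index(item) + 1:])
    return projected


database = [["a", "b", "c"], ["b", "c"], ["a", "c"]]
after_a = project(database, "a")     # sequences that can extend pattern <a>
after_ab = project(after_a, "b")     # sequences that can extend pattern <a, b>
```

Each projection is no larger than the one it was derived from, and because `database` is left untouched, different patterns can be grown from it independently.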
10. The effective sequence of events pattern mining algorithm according to claim 9, characterized in that: in step S4, parallel mining is carried out by means of multithreading:
1) when thread I has completed its mining task, it enters a waiting state;
2) if thread J has not yet finished mining at that moment, it hands the pattern it currently needs to process over to thread I for processing, and thread J goes on to its next pattern to be processed.
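A shared work queue gives a similar effect to the hand-off described in claim 10: a thread that has finished its pattern picks up the next pending one instead of idling. The sketch below is an approximation under that assumption, not the claimed scheduling itself; `mine_in_parallel` and the `mine_one` callback are hypothetical names, and `mine_one` stands for whatever routine mines a single pattern.

```python
import queue
import threading
from typing import Callable, Dict, Hashable, Iterable


def mine_in_parallel(
    patterns: Iterable[Hashable],
    mine_one: Callable[[Hashable], object],
    num_threads: int = 2,
) -> Dict[Hashable, object]:
    """Mine each pattern independently using a pool of worker threads."""
    tasks: "queue.Queue" = queue.Queue()
    for pattern in patterns:
        tasks.put(pattern)

    results: Dict[Hashable, object] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                pattern = tasks.get_nowait()   # take over a pending pattern
            except queue.Empty:
                return                         # nothing left to mine
            outcome = mine_one(pattern)
            with lock:                         # protect the shared result map
                results[pattern] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

This relies on the mining of each pattern being independent once the database is given, as stated in claim 9, so the workers need no coordination beyond the queue and the result lock.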
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810650504.6A CN109101530B (en) | 2018-06-22 | 2018-06-22 | High-utility event sequence pattern mining method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109101530A (en) | 2018-12-28 |
CN109101530B (en) | 2021-09-21 |
Family
ID=64844854
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810650504.6A, Active, CN109101530B (en) | 2018-06-22 | 2018-06-22 | High-utility event sequence pattern mining method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109101530B (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106777182A (en) * | 2016-12-23 | 2017-05-31 | 陕西理工学院 | A kind of data flow effective item set mining algorithm for reducing candidate |
Non-Patent Citations (3)
Title |
---|
JERRY CHUN-WEI LIN et al.: "Efficiently updating the discovered high average-utility itemsets with transaction insertion", ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE * |
MORTEZA ZIHAYAT et al.: "Distributed and Parallel High Utility Sequential Pattern Mining", 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) * |
XU QIANFANG (徐前方): "Research on network fault alarm correlation based on data mining" (基于数据挖掘的网络故障告警相关性研究), China Doctoral Dissertations Full-text Database, Information Science and Technology (monthly) * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109857758A (en) * | 2018-12-29 | 2019-06-07 | 天津南大通用数据技术股份有限公司 | A kind of association analysis method and system based on neighbours' window |
CN113886396A (en) * | 2021-10-20 | 2022-01-04 | 电子科技大学 | Power system fault detection method and system based on high-utility frequent pattern mining |
CN113886396B (en) * | 2021-10-20 | 2022-03-29 | 电子科技大学 | Power system fault detection method and system based on high-utility frequent pattern mining |
CN115964415A (en) * | 2023-03-16 | 2023-04-14 | 山东科技大学 | Pre-HUSPM-based database sequence insertion processing method |
Also Published As
Publication number | Publication date |
---|---|
CN109101530B (en) | 2021-09-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||