CN109101530B

CN109101530B - High-utility event sequence pattern mining method

Info

Publication number: CN109101530B
Application number: CN201810650504.6A
Authority: CN
Inventors: 张春慨
Original assignee: Shenzhen Graduate School Harbin Institute of Technology
Current assignee: Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2018-06-22
Filing date: 2018-06-22
Publication date: 2021-09-21
Anticipated expiration: 2038-06-22
Also published as: CN109101530A

Abstract

The invention provides a high-utility event sequence pattern mining method comprising the following steps: S1, defining security events; S2, partitioning a transaction database; S3, mining incremental high-utility security event sequences; and S4, mining parallelized incremental high-utility security event sequences. The beneficial effects of the invention are that parallelization shortens the mining time and makes better use of hardware resources, high-utility event sequence patterns are mined, and data mining is accelerated.

Description

High-utility event sequence pattern mining method

Technical Field

The invention relates to data mining, in particular to a high-utility event sequence pattern mining method.

Background

Current techniques for correlation analysis of network security events mainly include methods based on the probabilistic similarity between security events, methods based on the causal relationship between the results and prerequisites of behaviors, attack-graph-based methods, and methods based on data mining and machine learning. Of these, the data mining and machine learning methods are the most fundamental and the most effective. Association rule mining, a typical data mining method, is widely applied in correlation analysis models of network security events. With the arrival of the big data era, however, the range of applications that traditional association rule mining can handle has become narrower and narrower, and many scholars have therefore proposed improved association rule mining algorithms.

At present, improvements to traditional association rule mining mainly target its purpose: they break through the limitation of mining rules only over sets of commodities in transaction data and apply rule mining to applications with more complex conditions.

In research on association rule algorithms for time series, Povinelli et al. proposed a framework called Time Series Data Mining (TSDM). Zenghaiquan provided a time series mining and similarity search technique based on a mutual-association successor tree model. Lushan proposed a nonlinear-dynamics technique for predicting financial time series based on phase-space reconstruction of nonlinear time series. Surveying current research, Montmann stated that TSDM may be defined more generally as extracting the internal rules of a time series from the series itself, for use in analyzing and predicting its values, periods and trends. Gasp et al. proposed a method for discovering rules from a time series. The method first applies the sliding window method proposed by Baltzersen to standardize and preprocess the time series data, converting the series into time series samples and completing the discretization and symbolization of the data; second, it clusters the standardized sample set; third, it reconstructs the original time series data from the resulting classes; and finally, it mines rules from the reconstructed data set. However, this method applies data mining techniques to time series analysis mechanically: it considers neither the temporal characteristics of the series nor background knowledge, and it offers no reasonable theoretical explanation. Han et al. used data mining techniques to study periodic and partially periodic segments of time series in a time series database in order to discover periodic patterns, that is, patterns that recur regularly at fixed time intervals.

At present, association rule mining operates on an existing data set, that is, a given set of transactions. Mining based on event sequences, however, first requires converting the event sequence into a transaction set containing events. Most such conversions are currently based on a sliding window whose size is a fixed number of events. This is clearly unreasonable for event sequences: two events are placed in the same transaction even when the time interval between them is large, although events far apart in time are only weakly associated or not associated at all. The partitioning method ignores this fact and forcibly introduces a transaction containing both events. The partitioning method therefore needs to be improved.

In addition, because events are generated continuously, the transaction set produced by partitioning also changes dynamically. When a new transaction set is added to the original one, the traditional approach is to combine the two into one large transaction set and mine it again from scratch with the earlier method. The disadvantage is that the mining time keeps growing as the transaction set expands, until it becomes enormous or the mining cannot complete at all. This approach ignores the patterns already mined and re-mines everything each time, which is clearly unreasonable in practical applications.

Most sequence pattern mining algorithms are serial, yet in most of them the patterns mined earlier and later are not fundamentally related; that is, mining some sequence patterns does not depend on other patterns. How to use parallelization to shorten the mining time and make better use of hardware resources is therefore a technical problem to be solved urgently by those skilled in the art.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a high-utility event sequence pattern mining method.

The invention provides a high-utility event sequence pattern mining method, which comprises the following steps:

S1, defining security events;

S2, partitioning a transaction database;

S3, mining incremental high-utility security event sequences;

and S4, mining parallelized incremental high-utility security event sequences.

As a further improvement of the present invention, in step S1, different events are labeled by their attack type; to take into account the influence of the remaining attributes on an event, a utility value is determined for each attribute value, and the attribute utility values are summed to give the final utility value of the event; the utility value corresponding to each attribute value is assigned manually, so that events at different locations and from different IPs can be given different degrees of importance by changing the utility values.

As a further improvement of the present invention, in step S2, the event set is divided into transaction sets by means of a sliding window; a window represents the set of events from t_s to t_e, and the time span of every window is the same, i.e. (t_e - t_s) is the same.

As a further improvement of the present invention, in step S2, the time-sorted original events are divided using sliding windows with the same time interval; events in the same window form a transaction sequence, and the window slides to the next time point each time; if several events are close together in time, they are regarded as simultaneous events, that is, their order is not considered; and when a merged event is the first item of the current window, the original utility value is multiplied by the number of merged events to obtain a new utility value, in order to compensate for the effect of merging.

As a further improvement of the present invention, in step S3, let the original data set be D₁ and the newly added data set be D₂; the set of high-utility security event sequence patterns in D₁ is HUSEP1, and the set of high-utility security event sequence patterns in D₂ is HUSEP2; by definition, the minimum utility value of HUSEP1 is δ×u(D₁) and the minimum utility value of HUSEP2 is δ×u(D₂); the database formed by merging D₁ and D₂ is denoted D₃, and its set of high-utility security event sequence patterns is HUSEP3; obviously, the minimum utility value of HUSEP3 is greater than or equal to δ×u(D₃)＝δ×u(D₁)+δ×u(D₂).

As a further improvement of the present invention, in step S3, HUSEP3 is a subset of HUSEP1 ∪ HUSEP2, that is, every pattern in HUSEP3 appears in at least one of HUSEP1 and HUSEP2. Suppose a pattern in HUSEP3 appears in neither HUSEP1 nor HUSEP2, and let its utility values in D₁, D₂ and D₃ be u₁, u₂ and u₃ respectively; by definition, u₁＜δ×u(D₁) and u₂＜δ×u(D₂), from which it follows that u₃＝u₁+u₂＜δ×u(D₁)+δ×u(D₂)＝δ×u(D₃), so the pattern should not be in HUSEP3, which is a contradiction; therefore HUSEP3 is a subset of HUSEP1 ∪ HUSEP2.

As a further improvement of the present invention, in step S3,

for a pattern with respect to HUSEP1 and HUSEP2, there are 4 cases:

1) the pattern is a high-utility pattern in neither D1 nor D2;

2) the pattern is a high-utility pattern in both D1 and D2;

3) the pattern is a high-utility pattern in D1 but not in D2;

4) the pattern is a high-utility pattern in D2 but not in D1;

for case 1), the pattern is not a high-utility pattern in D3;

for case 2), the pattern is a high-utility pattern in D3: by analogy with case 1), u₁≥δ×u(D₁) and u₂≥δ×u(D₂), so u₃＝u₁+u₂≥δ×u(D₁)+δ×u(D₂)＝δ×u(D₃);

for cases 3) and 4), it cannot be directly deduced whether the pattern is a high-utility pattern in D3, and the utility value of the pattern in D3 needs to be calculated for judgment;

for case 3), since the pattern is already a high-utility pattern in D1 and its utility value there is known, only its utility value in D2 needs to be calculated for judgment;

for case 4), since the pattern is already a high-utility pattern in D2 and its utility value there is known, only its utility value in D1 needs to be calculated for judgment.

As a further improvement of the present invention, in step S4, when mining with the HUSP-Miner algorithm, the candidate 1-itemsets whose utility upper bound is greater than the threshold are found first; on this basis, (k+1)-itemsets are generated from k-itemsets by sequence growth, and a pruning strategy is used to reduce the search space.

As a further improvement of the present invention, the pruning strategy reduces the search space by continuously shrinking the database: the database is first read into memory, and as a pattern grows, the set of transactions containing the pattern keeps shrinking, i.e. the corresponding projected database keeps getting smaller; since the database is not modified during mining, the mining of each pattern can be regarded as independent once the database is given.

As a further improvement of the present invention, in step S4, parallel mining is performed in a multi-thread manner:

1) when thread I finishes its mining task, thread I enters a waiting state;

2) if thread J has not finished mining at this time, the pattern that thread J currently needs to process is handed over to thread I, and thread J moves on to the next pattern to be processed.

The beneficial effects of the invention are as follows: with this scheme, parallelization shortens the mining time and makes better use of hardware resources, high-utility event sequence patterns are mined, and data mining is accelerated.

Drawings

Fig. 1 is a schematic diagram of dividing security events based on time in the high-utility event sequence pattern mining method of the present invention.

FIG. 2 is a graph showing the results of experiment one.

FIG. 3 is a graph showing the results of experiment two.

FIG. 4 is a graph showing the results of experiment three.

Detailed Description

The invention is further described with reference to the following description and embodiments in conjunction with the accompanying drawings.

A high-utility event sequence pattern mining method comprises the following steps:

I. Definition of security events

Before studying pattern mining of network security events, the definition of a network security event, namely its attributes, is given first. Based on past experience, the present invention extracts the following typical attributes to define a network security event. Table 1 gives the definition of a network security event and Table 2 lists common types of network attacks.

Table 1 security event attributes

TABLE 2 common attack types

Different events are labeled by their attack type. To take into account the influence of the remaining attributes on an event, a utility value is determined for each attribute value, and the attribute utility values are summed to give the final utility value of the event. The utility value corresponding to each attribute value is assigned manually, so that events at different locations and from different IPs can be given different degrees of importance by changing the utility values.
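The accumulation of attribute utilities into an event utility can be sketched as follows. This is a minimal illustration only: the attribute names, the utility tables and the fallback value `DEFAULT_UTILITY` are hypothetical and do not come from Table 1, whose contents are not reproduced in this text.

```python
# Hypothetical, manually assigned utility tables (illustrative values only).
LOCATION_UTILITY = {"l1": 3.0, "l2": 1.0}
SOURCE_IP_UTILITY = {"s1": 2.0, "s2": 5.0}
DEST_IP_UTILITY = {"d1": 4.0, "d2": 1.0}
DEFAULT_UTILITY = 1.0  # assumed fallback for values with no assigned utility


def event_utility(event: dict) -> float:
    """Sum the utility of each non-label attribute of one security event."""
    return (
        LOCATION_UTILITY.get(event["location"], DEFAULT_UTILITY)
        + SOURCE_IP_UTILITY.get(event["source_ip"], DEFAULT_UTILITY)
        + DEST_IP_UTILITY.get(event["dest_ip"], DEFAULT_UTILITY)
    )


# The attack type is only the label; it does not contribute to the utility.
event = {"attack_type": "e1", "location": "l1", "source_ip": "s1", "dest_ip": "d1"}
print(event["attack_type"], event_utility(event))  # e1 9.0
```

Raising an entry in one of the tables raises the utility of every event carrying that attribute value, which is how different locations or IPs would be given more weight.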

II. Partitioning of the transaction database

What is obtained is a set of events, which cannot be mined directly with a traditional pattern mining algorithm; it must first be converted into a transaction set suitable for such an algorithm. Note that a transaction in security-event-based high-utility sequence pattern mining is represented slightly differently from one in traditional high-utility sequence pattern mining. In high-utility pattern mining, the utility of an item is determined by both its internal and external utility. Internal utility generally refers to quantity, external utility generally refers to the profit of the item, and the external utility is the same for all occurrences of the same item. Because each event is affected by its other attributes, events with the same label may have different utility values. This makes transactions in security-event-based high-utility sequence mining formally different from those in traditional high-utility sequence mining. For example, consider the transaction <[(e1:u_e1)],[(e2:u_e2)]>. In security event mining, u_e1 and u_e2 are the utility values of the corresponding events, whereas in traditional high-utility sequence pattern mining they are the numbers of times the items e1 and e2 occur in the corresponding itemsets. Despite this formal difference, traditional high-utility sequence pattern mining algorithms can be applied to the transaction database generated from security events: the internal and external utilities are used only to calculate the utility value of an item and otherwise have no substantial influence on the mining process.

The event set is divided into transaction sets by means of a sliding window. A window represents the set of events from t_s to t_e, and the time span of every window is the same, i.e. (t_e - t_s) is the same. The detailed steps of event partitioning are illustrated below. Consider the set of security events D1 shown in Table 3.

Attack ID	Time(s)	Location	Source IP	Destination IP
e₁	t₁	l₁	s₁	d₁
e₂	t₂	l₂	s₂	d₂
e₃	t₃	l₁	s₇	d₂
e₄	t₄	l₁	s₂	d₁
e₅	t₅	l₂	s₅	d₁
e₆	t₆	l₂	s₅	d₁

Table 3 set of security events D1

The time-sorted events are shown in Fig. 1, and the original events are divided using sliding windows with the same time interval. Events in the same window form a transaction sequence, and the window slides to the next time point each time. If several events are close together in time, they are regarded as simultaneous events, that is, their order is not considered. When a merged event is the first item of the current window, the original utility values are multiplied by the number of merged events to obtain new utility values, in order to compensate for the effect of merging. Take e4 and e5 in Fig. 1 as an example: because they are close together, they can be regarded as one event. The transaction produced by the sliding window is <[(e4:u_e4)(e5:u_e5)],[(e6:u_e6)]>. Taking the merging effect into account, the original utility values are multiplied by the number of merged events, and the processed transaction is <[(e4:2*u_e4)(e5:2*u_e5)],[(e6:2*u_e6)]>.
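The partitioning described above can be sketched as follows. Several details are assumptions made for illustration, not statements of the patented method: the closeness threshold `eps`, the choice to start each window at the time of the next (merged) event, and the reading of the e4/e5 example in which every utility in the transaction is scaled when the first itemset is a merged one. The timestamps and utilities are invented.

```python
from typing import List, Tuple

Event = Tuple[str, float, float]  # (label, timestamp, utility)


def merge_close_events(events: List[Event], eps: float) -> List[List[Event]]:
    """Group time-sorted events; an event within eps of the previous one joins its group."""
    groups: List[List[Event]] = []
    for ev in sorted(events, key=lambda e: e[1]):
        if groups and ev[1] - groups[-1][-1][1] <= eps:
            groups[-1].append(ev)  # treated as simultaneous, order ignored
        else:
            groups.append([ev])
    return groups


def partition(events: List[Event], span: float, eps: float):
    """Build one transaction per window; each window has the same time span."""
    groups = merge_close_events(events, eps)
    transactions = []
    for i, first in enumerate(groups):
        start = first[0][1]
        window = [g for g in groups[i:] if g[0][1] < start + span]
        # Assumed reading of the e4/e5 example: a merged first itemset scales
        # every utility in the transaction by the number of merged events.
        factor = len(first) if len(first) > 1 else 1
        transactions.append([[(lab, factor * u) for lab, _, u in g] for g in window])
    return transactions


events = [("e4", 10.0, 3.0), ("e5", 10.2, 4.0), ("e6", 12.0, 5.0)]
for transaction in partition(events, span=5.0, eps=0.5):
    print(transaction)
```

With these invented values the first transaction corresponds to <[(e4:2*u_e4)(e5:2*u_e5)],[(e6:2*u_e6)]> from the text, and the second window, which starts at e6, contains only e6 with its original utility.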

TABLE 4 partitioned transaction sets

After the security events have been partitioned into a transaction database, the database can be mined with an existing high-utility sequence pattern mining algorithm, here the HUSP-Miner algorithm, which is not described in further detail.

III. Incremental high-utility security event sequence mining

In practical applications, security events are generated in real time, so the partitioned database also grows dynamically. If such a database is re-mined every time its contents are updated, a great deal of resources is consumed. Moreover, as the database keeps growing, the original mining algorithm may fail to produce results at all because the database is too large. The relationship between the original data set and the new data set therefore needs to be found in order to simplify the mining process.

Let the original data set be D₁ and the newly added data set be D₂; the set of high-utility security event sequence patterns in D₁ is HUSEP1, and the set of high-utility security event sequence patterns in D₂ is HUSEP2. By definition, the minimum utility value of HUSEP1 is δ×u(D₁) and the minimum utility value of HUSEP2 is δ×u(D₂). The database formed by merging D₁ and D₂ is denoted D₃, and its set of high-utility security event sequence patterns is HUSEP3. Obviously, the minimum utility value of HUSEP3 is greater than or equal to δ×u(D₃)＝δ×u(D₁)+δ×u(D₂).

It can further be derived that HUSEP3 is a subset of HUSEP1 ∪ HUSEP2, that is, every pattern in HUSEP3 appears in at least one of HUSEP1 and HUSEP2. Suppose a pattern in HUSEP3 appears in neither HUSEP1 nor HUSEP2, and let its utility values in D₁, D₂ and D₃ be u₁, u₂ and u₃ respectively. By definition, u₁＜δ×u(D₁) and u₂＜δ×u(D₂), from which it follows that

u₃＝u₁+u₂＜δ×u(D₁)+δ×u(D₂)＝δ×u(D₃), so the pattern should not be in HUSEP3, which is a contradiction. Thus HUSEP3 is a subset of HUSEP1 ∪ HUSEP2.

For a pattern with respect to HUSEP1 and HUSEP2, there are 4 cases:

1) The pattern is a high-utility pattern in neither D1 nor D2

2) The pattern is a high-utility pattern in both D1 and D2

3) The pattern is a high-utility pattern in D1 but not in D2

4) The pattern is a high-utility pattern in D2 but not in D1

For case 1), the pattern is certainly not a high-utility pattern in D3. The proof is similar to the proof given above that HUSEP3 is a subset of HUSEP1 ∪ HUSEP2 and is not repeated here.

For case 2), the pattern must be a high-utility pattern in D3. By analogy with case 1), u₁≥δ×u(D₁) and u₂≥δ×u(D₂), and therefore

u₃＝u₁+u₂≥δ×u(D₁)+δ×u(D₂)＝δ×u(D₃).

For case 3), since the pattern is already a high utility pattern in D1, it is only necessary to calculate the utility value of the pattern in D2 for judgment.

Similarly, for case 4), since the pattern is already a high utility pattern in D2, it is only necessary to calculate the utility value of the pattern in D1 for judgment.
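The four cases can be combined into a merge routine along the following lines. This is a sketch under stated assumptions: HUSEP1 and HUSEP2 are represented as dictionaries from pattern to utility, and `utility_in_d1` and `utility_in_d2` are hypothetical callbacks that compute the utility of a pattern in the other data set when it is not already known. Case 1) patterns never appear in either dictionary, so they are never examined.

```python
from typing import Callable, Dict, Tuple

Pattern = Tuple[str, ...]


def merge_husep(
    husep1: Dict[Pattern, float],
    husep2: Dict[Pattern, float],
    u_d1: float,
    u_d2: float,
    delta: float,
    utility_in_d1: Callable[[Pattern], float],
    utility_in_d2: Callable[[Pattern], float],
) -> Dict[Pattern, float]:
    """Derive HUSEP3 from HUSEP1 and HUSEP2 without re-mining D3 from scratch."""
    threshold3 = delta * (u_d1 + u_d2)  # delta x u(D3)
    husep3: Dict[Pattern, float] = {}
    for p in set(husep1) | set(husep2):
        if p in husep1 and p in husep2:   # case 2): always high-utility in D3
            u3 = husep1[p] + husep2[p]
        elif p in husep1:                 # case 3): only u2 must be computed
            u3 = husep1[p] + utility_in_d2(p)
        else:                             # case 4): only u1 must be computed
            u3 = utility_in_d1(p) + husep2[p]
        if u3 >= threshold3:
            husep3[p] = u3
    return husep3


# Invented example: u(D1) = 100, u(D2) = 50, delta = 0.1.
husep1 = {("a",): 12, ("a", "b"): 10}
husep2 = {("a",): 6, ("c",): 9}
merged = merge_husep(
    husep1, husep2, 100.0, 50.0, 0.1,
    utility_in_d1=lambda p: {("c",): 7}.get(p, 0),
    utility_in_d2=lambda p: {("a", "b"): 2}.get(p, 0),
)
print(sorted(merged.items()))
```

In the invented example the pattern ("a", "b") is dropped because its combined utility of 12 is below the merged threshold of 15, while ("c",) is kept with a combined utility of 16.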

Definition 1: the utility value of item i_j in the q-itemset v is defined as

u(i_j,v)＝q(i_j,v)×pr(i_j)

where q(i_j,v) is the quantity of i_j in v and pr(i_j) is the profit of i_j.

Definition 2: the utility value of the q-term set v is defined as

Definition 3: the utility value of the q-sequence s is defined as

Definition 4: given the q-sequence s＝<v₁,v₂,...,v_d> and the sequence t＝<w₁,w₂,...,w_r>, if d≤r and v_k and w_k are identical for 1≤k≤d, then s is a match of t, denoted s~t.

Definition 5: the utility value of the sequence t in the q-sequence s is defined as

where t~s_k denotes that s_k is a match of t.

Definition 6: the utility value of the sequence t in the quantitative database D is defined as

Definition 7: the utility value of the quantitative database D is defined as

Definition 8: if the utility value of the sequence t in the quantitative database D is not lower than the user-defined minimum threshold δ×u(D), then t is a high-utility sequence pattern (HUSP), denoted

HUSP←{t|u(t)≥u(D)×δ}

Based on the above definitions, high-utility sequence pattern mining can be defined as follows: given a quantitative sequence database D and a minimum utility threshold δ (a decimal between 0 and 1), find all sequence patterns whose utility value is not lower than δ×u(D).
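A small worked example may make the definitions concrete. The formulas for Definitions 2, 3, 5, 6 and 7 are not reproduced in this text, so the sketch below assumes the forms commonly used in high-utility sequential pattern mining: the utility of a q-itemset or q-sequence is the sum of its item utilities, the utility of t in a q-sequence s is the maximum over all matches of t in s, and database-level utilities are sums over the sequences. These are assumptions, as are the data structures and the invented profit table.

```python
def itemset_utility(items, q_itemset, profit):
    """Sum of quantity x profit (Definition 1) over the given items."""
    return sum(q_itemset[i] * profit[i] for i in items)


def q_sequence_utility(s, profit):
    """u(s): assumed to be the sum over every item of every q-itemset."""
    return sum(q * profit[i] for v in s for i, q in v.items())


def utility_in_sequence(t, s, profit):
    """u(t, s): assumed to be the maximum utility over all matches of t in s."""
    def best(k, start):
        if k == len(t):
            return 0
        top = None
        for j in range(start, len(s)):
            if all(i in s[j] for i in t[k]):
                rest = best(k + 1, j + 1)
                if rest is not None:
                    cand = itemset_utility(t[k], s[j], profit) + rest
                    top = cand if top is None else max(top, cand)
        return top

    found = best(0, 0)
    return 0 if found is None else found


def is_husp(t, database, profit, delta):
    """Definition 8: u(t) >= delta x u(D)."""
    u_t = sum(utility_in_sequence(t, s, profit) for s in database)
    u_db = sum(q_sequence_utility(s, profit) for s in database)
    return u_t >= delta * u_db


profit = {"a": 2, "b": 3, "c": 1}
database = [
    [{"a": 1, "b": 2}, {"c": 3}],
    [{"a": 2}, {"c": 1}, {"a": 1, "c": 4}],
]
t = [["a"], ["c"]]
print(sum(q_sequence_utility(s, profit) for s in database))      # u(D)
print(sum(utility_in_sequence(t, s, profit) for s in database))  # u(t)
print(is_husp(t, database, profit, delta=0.5))
```

Here u(D) is 22 and u(t) is 5 + 8 = 13, so with δ = 0.5 the threshold is 11 and t qualifies as a high-utility sequence pattern under the assumed definitions.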

IV. Parallelized incremental high-utility security event sequence mining

When mining with the HUSP-Miner algorithm, the candidate 1-itemsets whose utility upper bound is greater than the threshold are found first; on this basis, (k+1)-itemsets are generated from k-itemsets by sequence growth (there are two growth modes), and a suitable pruning strategy is adopted to reduce the search space. One pruning strategy reduces the search space by continuously shrinking the database: the algorithm first reads the database into memory, and as a pattern grows, the set of transactions containing the pattern keeps shrinking, i.e. the corresponding projected database keeps getting smaller. Since the database is not modified during mining, the mining of each pattern can be regarded as independent once the database is given.
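The shrinking of the projected database as a pattern grows can be illustrated with a simplified sketch. It is not the HUSP-Miner projection itself, which also tracks utilities and upper bounds; it only filters the transactions that still contain the growing pattern. The helper names and the toy database are hypothetical.

```python
def contains(transaction, pattern):
    """True if the pattern's itemsets occur, in order, within the transaction."""
    pos = 0
    for itemset in transaction:
        if pos < len(pattern) and pattern[pos] <= itemset:  # subset test
            pos += 1
    return pos == len(pattern)


def project(database, pattern):
    """Keep only the transactions that contain the pattern; the input is not modified."""
    return [t for t in database if contains(t, pattern)]


database = [
    [{"a"}, {"b"}, {"c"}],
    [{"a"}, {"c"}],
    [{"b"}, {"c"}],
    [{"a", "b"}, {"c"}],
]

projected = database
sizes = [len(projected)]
for pattern in ([{"a"}], [{"a"}, {"b"}]):
    # A grown pattern only needs to be checked against the previous projection.
    projected = project(projected, pattern)
    sizes.append(len(projected))
print(sizes)  # [4, 3, 1]
```

Because `project` returns a new list and leaves `database` untouched, projections for different starting patterns do not interfere with one another, which is the property that makes independent mining per pattern possible.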

Therefore, after all candidate 1-itemsets have been found, they can be divided among threads and mined in parallel. Note that because the number of high-utility patterns that each 1-itemset can generate differs, some threads may finish early while others run much longer. As a result, the total running time may be far from the expected running time because of the differences in execution time between threads. To address this, the following improvements are made:

1) When thread I finishes its mining task, thread I enters a waiting state.

2) If thread J has not finished mining at this time, the pattern that thread J currently needs to process is handed over to thread I, and thread J moves on to the next pattern to be processed.

Through the strategy, the loads among the threads can be relatively balanced, and the mining time is effectively reduced.
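One simple way to approximate this load balancing is a shared task pool, in which an idle thread picks up the next pending 1-itemset instead of waiting. The sketch below uses Python's standard `ThreadPoolExecutor` for that purpose; it is a substitute technique rather than the exact hand-off between threads I and J described above, and `mine_from_seed` is a hypothetical placeholder for pattern growth from one 1-itemset.

```python
from concurrent.futures import ThreadPoolExecutor


def mine_from_seed(seed, database):
    """Hypothetical stand-in for growing patterns from one 1-itemset.

    Here it only counts the transactions containing the seed item.
    """
    return [(seed, sum(1 for t in database if seed in t))]


def parallel_mine(seeds, database, workers=4):
    """Mine each seed as a separate task; idle workers take the next pending seed."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The database is only read, never modified, so tasks are independent.
        for patterns in pool.map(lambda s: mine_from_seed(s, database), seeds):
            results.extend(patterns)
    return results


database = [["a", "b"], ["a", "c"], ["b"]]
print(parallel_mine(["a", "b", "c"], database))
```

Submitting one task per seed keeps all workers busy until the queue is empty, so a thread that finishes a cheap seed does not sit idle. In CPython, CPU-bound work in threads is limited by the global interpreter lock, so a process pool may be needed to see a real speed-up there.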

Results and analysis of the experiments

The experimental data set is a set of events generated randomly and partitioned according to the method given above. The partitioned sequence set contains 9752 transactions and 1000 distinct kinds of events. The experiments are divided into three parts: as shown in Fig. 2, experiment one varies the size of the data set while δ is held constant; as shown in Fig. 3, experiment two varies δ while the data set is held constant; and as shown in Fig. 4, experiment three compares the multi-threaded mining algorithm with the single-threaded one.

Experiments one and two test the incremental database. Increment 1 means that the original data set and the newly added data set are each mined, and the two mining results are merged using the method introduced above. Increment 2 means that the result already obtained by mining the original database is merged with the result of mining the new data set. In experiment one, the newly added data set is fixed and is the data set generated earlier, while the original data set is spliced together from the generated data (that is, the original transaction set is copied into multiple copies which are then spliced together; the original transaction set itself is not spliced repeatedly), and the resulting sizes are 1, 2, 3 and 4 times that of the generated data set respectively.

In experiment two, when δ is 0.0005, the increment 1 method is slower than the original method, because when δ is small the number of high-utility patterns mined is large. A fairly ordinary merging method is used here, so merging may take a certain amount of time. The results of experiments one and two show that, when the original mining result is known, the incremental mining algorithm is much faster for a newly added data set than merging the old and new data sets and mining them together.

Experiment three compares the multi-threaded mining algorithm with the single-threaded one; four threads are used. The data set is held constant and the two methods are compared by varying δ. As can be seen from the figure, the smaller δ is, the more pronounced the difference between the two. This is because a smaller δ means more patterns to consider and more computation, so the advantage of multithreading shows more clearly. Therefore, when the data volume is large, multithreading can be considered to speed up mining.

The following table shows part of the mining results. The data set is formed by splicing four original event sets, δ is 0.0008, and the corresponding threshold is 7273. The mining result consists of the high-utility sequence patterns whose utility values are greater than the specified threshold, where each entry corresponds to an event ID. Taking the third entry in the table as an example, the mined pattern is [(132)], [(577)], [(936)], [(825)], [(531)], [(646)], [(24)], [(505)(644)], [(710)], which indicates that these events may be related to one another. Note that the two events with IDs 505 and 644 are concurrent; they may not be related to each other, but they may be related to the subsequent events. The high-utility event sequence patterns obtained by mining can be studied further to discover potential associations between the events.

Table 5 partial mining results

The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims

1. A high-utility event sequence pattern mining method is characterized by comprising the following steps:

S1, defining security events;

S2, partitioning a transaction database;

S3, mining incremental high-utility security event sequences;

S4, mining parallelized incremental high-utility security event sequences;

in step S1, different events are labeled by their attack type; to take into account the influence of the remaining attributes on an event, a utility value is determined for each attribute value, and the attribute utility values are summed to give the final utility value of the event; the utility value corresponding to each attribute value is assigned manually, so that events at different locations and from different IPs can be given different degrees of importance by changing the utility values;

in step S2, the event set is divided into transaction sets by means of a sliding window, where a window represents the set of events from t_s to t_e, and the time span of every window is the same, i.e. (t_e - t_s) is the same;

in step S2, the time-sorted original events are divided using sliding windows with the same time interval; events in the same window form a transaction sequence, and the window slides to the next time point each time; if several events are close together in time, they are regarded as simultaneous events, that is, their order is not considered; when a merged event is the first item of the current window, the original utility value is multiplied by the number of merged events to obtain a new utility value, in order to compensate for the effect of merging;

in step S3, let the original data set be D₁ and the newly added data set be D₂; the set of high-utility security event sequence patterns in D₁ is HUSEP1, and the set of high-utility security event sequence patterns in D₂ is HUSEP2; by definition, the minimum utility value of HUSEP1 is greater than or equal to the user-defined minimum threshold δ×u(D₁), and the minimum utility value of HUSEP2 is greater than or equal to the user-defined minimum threshold δ×u(D₂); the database formed by merging D₁ and D₂ is denoted D₃, and its set of high-utility security event sequence patterns is HUSEP3; obviously, the minimum utility value of HUSEP3 is greater than or equal to the user-defined minimum threshold δ×u(D₃)＝δ×u(D₁)+δ×u(D₂);

in step S3, HUSEP3 is a subset of HUSEP1 ∪ HUSEP2, that is, every pattern in HUSEP3 appears in at least one of HUSEP1 and HUSEP2; suppose a pattern in HUSEP3 appears in neither HUSEP1 nor HUSEP2, and let its utility values in D₁, D₂ and D₃ be u₁, u₂ and u₃ respectively; by definition, u₁＜δ×u(D₁) and u₂＜δ×u(D₂), from which it follows that u₃＝u₁+u₂＜δ×u(D₁)+δ×u(D₂)＝δ×u(D₃), so the pattern should not be in HUSEP3, which is a contradiction; therefore HUSEP3 is a subset of HUSEP1 ∪ HUSEP2;

in step S3,

for a pattern with respect to HUSEP1 and HUSEP2, there are 4 cases:

1) the pattern is a high-utility pattern in neither D1 nor D2;

2) the pattern is a high-utility pattern in both D1 and D2;

3) the pattern is a high-utility pattern in D1 but not in D2;

4) the pattern is a high-utility pattern in D2 but not in D1;

for case 1), the pattern is not a high-utility pattern in D3;

for case 2), the pattern is a high-utility pattern in D3;

for cases 3) and 4), it cannot be directly deduced whether the pattern is a high-utility pattern in D3, and the utility value of the pattern in D3 needs to be calculated for judgment;

2. The high-utility event sequence pattern mining method according to claim 1, characterized in that: in step S4, when mining with the HUSP-Miner algorithm, the candidate 1-itemsets whose utility upper bound is greater than the threshold are found first; on this basis, (k+1)-itemsets are generated from k-itemsets by sequence growth, and a pruning strategy is used to reduce the search space.

3. The high-utility event sequence pattern mining method according to claim 2, characterized in that: the pruning strategy reduces the search space by continuously shrinking the database: the database is first read into memory, and as a pattern grows, the set of transactions containing the pattern keeps shrinking, i.e. the corresponding projected database keeps getting smaller; since the database is not modified during mining, the mining of each pattern can be regarded as independent once the database is given.

4. The high-utility event sequence pattern mining method according to claim 3, characterized in that: in step S4, parallel mining is performed in a multithread manner: