CN104008185A - Frequent close scenario mining method based on same node table and scenario tree - Google Patents

Frequent close scenario mining method based on same node table and scenario tree

Publication number
CN104008185A
Authority
CN
China
Prior art keywords
plot
node
frequent
tree
scenario
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410256954.9A
Other languages
Chinese (zh)
Inventor
杜承烈
杨凯
尤涛
钟冬
吴其蔓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-06-11
Filing date
2014-06-11
Publication date
2014-08-27
Application filed by Northwestern Polytechnical University
Priority to CN201410256954.9A
Publication of CN104008185A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees

Abstract

The invention discloses a method for mining frequent closed episodes (rendered "frequent close scenarios" in the title) based on a same-node table and an episode tree. It addresses the poor practicality of existing frequent episode mining methods. Building on the frequent episode tree, the method mines the frequent closed episodes of an event sequence with the help of a same-node table: the episode tree is constructed and pruned, and the frequent closed episodes are then extracted from it. Because no candidate episodes are generated, space efficiency improves. Because the closedness check is implicit in the construction of the episode tree and the pruning of same nodes, no separate closedness check is needed, which improves time efficiency. The method therefore finds the frequent closed episodes of an event sequence more effectively. For the same mining task, time efficiency improves by 20% on average and space efficiency by 60% on average. A further advantage is that time and space costs do not grow substantially as the support threshold and the sequence length increase.

Description

Method for mining frequent closed episodes based on a same-node table and a frequent episode tree
Technical field
The present invention relates to a frequent episode mining method, and in particular to a method for mining frequent closed episodes based on a same-node table and a frequent episode tree.
Background art
An episode describes the succession relationship between event types and is commonly used to characterize the behavior patterns of users or systems in real-world applications. How to discover frequent episodes has become one of the active problems in data mining.
The document "Frequent episode mining algorithm for event sequences based on a generalized suffix tree; Journal of University of Science and Technology Beijing; 2006; Vol. 28(5); pp. 490-497" discloses a frequent episode mining method based on minimal occurrences. The method uses the concept of a generalized suffix and employs a generalized suffix tree to find and store frequent episodes, so that shared prefixes are compressed. Because a traditional generalized suffix tree cannot find non-contiguous episodes, the method extends the generalized suffix tree to find episodes that contain gaps, and uses a maximum interval between adjacent events as the constraint for finding episodes of arbitrary length. Because a complete generalized suffix tree consumes considerable time and space, the method propagates positions layer by layer: it uses the position lists of frequent episodes to partition the sequence layer by layer and counts episodes on the corresponding projected sub-database, which shrinks the search space. In the episode mining phase, however, the method can only mine frequent episodes. It cannot mine frequent closed episodes, which better reflect the minimal closure of episodes, and is therefore limited. The method also adopts the minimal-occurrence definition of an episode, which causes under-counting during mining, so it cannot accurately reflect how often episodes occur. In summary, existing frequent episode mining methods cannot mine frequent closed episodes, and the minimal-occurrence definition under-counts and so does not accurately reflect episode occurrences.
Summary of the invention
To overcome the poor practicality of existing frequent episode mining methods, the invention provides a method for mining frequent closed episodes based on a same-node table and a frequent episode tree. Building on the frequent episode tree and using a same-node table, the method mines frequent closed episodes on an event sequence by constructing and pruning the episode tree and then extracting the frequent closed episodes. The method does not need to generate candidate episodes, which improves space efficiency. The closedness check is implicit in the construction of the episode tree and the pruning of same nodes, so no separate closedness check is required, which improves time efficiency.
The technical solution adopted by the invention to solve this technical problem is a method for mining frequent closed episodes based on a same-node table and a frequent episode tree, characterized by the following steps:
Step 1: episode tree construction and pruning.
(1) Scan the sequence once to find all frequent 1-episodes, and add them as the first-layer nodes of the episode tree.
(2) For every frequent n-episode, probe the sequence for successor events within the maximum event interval. If a successor event meets the minimum support requirement and has not been used by another frequent episode, create a child node for the current node and set the event's used flag to true.
(3) Add the new node to the same-node table. Repeat until no new node can be constructed. No candidate episode set needs to be generated in this process.
(4) For all nodes grouped as the same node in the same-node table, retain only the one at the lowest level of the tree; prune all the other nodes together with their child nodes.
Step 2: frequent closed episode extraction.
(1) Starting from a first-layer node, record the starting point and probe the support of the next-layer node. If its support equals that of the current-layer node, continue probing downward.
(2) If the support differs, the events from the starting node to the current-layer node form one frequent closed episode. Continue probing from the next-layer node.
(3) Traverse the whole pruned episode tree depth-first. When the traversal is complete, all frequent closed episodes in the sequence have been extracted.
The beneficial effects of the invention are as follows. Building on the frequent episode tree and using a same-node table, the method mines frequent closed episodes on an event sequence by constructing and pruning the episode tree and then extracting the frequent closed episodes. It does not need to generate candidate episodes, which improves space efficiency. The closedness check is implicit in the construction of the episode tree and the pruning of same nodes, so no separate closedness check is required, which improves time efficiency. Theoretical analysis and experimental evaluation show that the method finds frequent closed episodes on event sequences more effectively: for the same mining task, time efficiency improves by 20% on average and space efficiency by 60% on average, and time and space costs do not increase markedly as the support threshold and the sequence length increase.
The invention is described in detail below with reference to the drawings and a specific embodiment.
Brief description of the drawings
Fig. 1 is a flowchart of episode tree construction and pruning in the method of the invention.
Fig. 2 is a flowchart of frequent closed episode extraction in the method of the invention.
Fig. 3 shows the example event sequence of the embodiment.
Fig. 4 shows the frequent episode tree of the embodiment.
Fig. 5 shows the pruning of the frequent episode tree using the same-node table.
Fig. 6 shows the frequent closed episode tree of the embodiment.
Detailed description of the embodiment
Refer to Figs. 1-6. The specific steps of the method for mining frequent closed episodes based on a same-node table and a frequent episode tree are as follows.
The invention relies on the following definitions:
Event sequence: a sequence in which events from a given event type set E occur one after another in strict temporal order, denoted ES = <E1-t1, E2-t2, ..., Ek-tk>.
Episode: a partially ordered set of events on an event sequence, denoted a = {E1, ..., Ek}.
Support: the number of all non-overlapping occurrences of episode a on event sequence ES is called the support of a, denoted a.supp.
Frequent episode and frequent closed episode: if the support of episode a is greater than or equal to the specified minimum support, a is a frequent episode. Given a frequent episode a, if no proper super-episode of a has a support equal to that of a, then a is a frequent closed episode.
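The closedness definition can be illustrated with a direct, definition-based check, which is the separate check that the method itself avoids. The following is only a sketch under stated assumptions: episodes are treated as serial (tuples of event types), a super-episode is taken to be a supersequence, and the function names and the support values are hypothetical.

```python
from typing import Dict, Tuple

Episode = Tuple[str, ...]


def is_subepisode(a: Episode, b: Episode) -> bool:
    """Return True if a is a subsequence of b (assumed meaning of 'sub-episode')."""
    it = iter(b)
    return all(x in it for x in a)  # each 'in' consumes the iterator up to the match


def closed_episodes(supports: Dict[Episode, int]) -> Dict[Episode, int]:
    """Keep the episodes that have no proper super-episode with equal support."""
    return {
        a: s
        for a, s in supports.items()
        if not any(
            len(b) > len(a) and sb == s and is_subepisode(a, b)
            for b, sb in supports.items()
        )
    }


# Hypothetical supports of some frequent episodes.
supports = {("B",): 3, ("B", "A"): 2, ("B", "A", "D"): 2, ("D",): 3}
closed = closed_episodes(supports)
```

Here `("B", "A")` is not closed, because its super-episode `("B", "A", "D")` has the same support.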
Occurrence: given an episode a = {E1, ..., Ek}, if event sequence ES has a subsequence ES' = <E1-t1, E2-t2, ..., Ek-tk> in which t(i+1) - t(i) is less than the maximum time interval for every pair of adjacent events, then ES' is an occurrence of a on ES.
Minimal occurrence: let [ts, te] be an occurrence of episode a on event sequence ES. If ES contains no other occurrence [ts', te'] of a such that ts < ts' and te ≤ te', or ts ≤ ts' and te < te', then [ts, te] is a minimal occurrence of a on ES.
Non-overlapping occurrences: let [ts, te] and [ts', te'] be two occurrences of episode a on event sequence ES. If te < ts' or te' < ts, then [ts, te] and [ts', te'] are non-overlapping occurrences of a on ES.
Non-overlapping occurrences, stated by position: let [ts, te] be one occurrence of episode a on event sequence ES and [ts', te'] another. If no event in [ts, te] occupies the same position on ES as an event in [ts', te'], then [ts, te] is a non-overlapping occurrence.
In addition, to find frequent episodes, the span over which the events of an episode occur must be bounded. One approach uses a window to define the maximum duration of an episode and takes the frequency with which the episode appears across all windows as its support; its drawback is that it cannot find frequent episodes longer than the window width. The other approach limits the maximum time interval between adjacent events, so that an episode of k events spans a window of at most (k-1) × the maximum time interval. The invention adopts the second approach.
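The gap constraint and the support definition can be sketched in Python. This is a minimal illustration, not the patent's counting procedure: it assumes serial episodes, a time-sorted list of (event type, timestamp) pairs, the strict inequality from the occurrence definition, and a greedy left-to-right count of occurrences that do not overlap in time. The names `count_support` and `max_window_width` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Event = Tuple[str, int]  # (event type, timestamp), sorted by strictly increasing time


def max_window_width(k: int, maxgap: int) -> int:
    """Upper bound on the span of a k-event episode: (k - 1) * maxgap."""
    return (k - 1) * maxgap


def _match_from(seq: Sequence[Event], episode: Sequence[str],
                maxgap: int, idx: int, k: int) -> Optional[int]:
    """Try to complete an occurrence whose k-th event is seq[idx].

    Returns the index of the occurrence's last event, or None if none exists.
    """
    if k == len(episode) - 1:
        return idx
    t = seq[idx][1]
    for j in range(idx + 1, len(seq)):
        if seq[j][1] - t >= maxgap:  # adjacent events must be less than maxgap apart
            break
        if seq[j][0] == episode[k + 1]:
            end = _match_from(seq, episode, maxgap, j, k + 1)
            if end is not None:
                return end
    return None


def count_support(seq: Sequence[Event], episode: Sequence[str], maxgap: int) -> int:
    """Greedy count: take an occurrence, then resume scanning after its last event."""
    support, i = 0, 0
    while i < len(seq):
        if seq[i][0] == episode[0]:
            end = _match_from(seq, episode, maxgap, i, 0)
            if end is not None:
                support += 1
                i = end + 1  # the next occurrence starts after this one ends
                continue
        i += 1
    return support


# Hypothetical event sequence, not the one in Fig. 3.
seq = [("A", 1), ("B", 2), ("A", 3), ("C", 4), ("B", 5), ("A", 6), ("B", 7)]
```

Because every counted occurrence starts after the previous one ends, the counted occurrences satisfy te < ts' pairwise. The greedy scan is a simplification and is not claimed to return the maximum possible number of non-overlapping occurrences.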
Frequent episode tree: a frequent episode tree is a rooted tree in which each node represents an event and may have 0 to n child nodes. The path from the root node to a given node represents an episode made up of the events along that path. Each node maintains the following attributes: the event type, the support of the node, and the positions at which the event occurs. The tree grows layer by layer, and the positions stored in each node are used to partition the event sequence layer by layer, which shrinks the search space.
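A node of such a tree might be represented as follows. This is a sketch only: the class name `EpisodeNode`, its field names, and the choice to leave the root implicit (first-layer nodes have no parent) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # compare nodes by identity, not by field values
class EpisodeNode:
    event_type: str                      # event represented by this node
    support: int                         # support of the episode ending at this node
    positions: Tuple[int, ...]           # positions at which the event occurs
    parent: Optional["EpisodeNode"] = None
    children: List["EpisodeNode"] = field(default_factory=list)

    def add_child(self, event_type: str, support: int,
                  positions: Tuple[int, ...]) -> "EpisodeNode":
        """Attach a child node and return it."""
        child = EpisodeNode(event_type, support, tuple(positions), parent=self)
        self.children.append(child)
        return child

    def episode(self) -> List[str]:
        """Event types on the path from the first layer down to this node."""
        node, path = self, []
        while node is not None:
            path.append(node.event_type)
            node = node.parent
        return path[::-1]


# Hypothetical positions; only the shape of the structure matters here.
b = EpisodeNode("B", 3, (1, 4, 7))
a = b.add_child("A", 2, (2, 5))
d = a.add_child("D", 2, (3, 6))
```

Calling `d.episode()` walks the parent links and yields the episode B, A, D represented by the path to that node.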
Same-node table: nodes at different positions in the frequent episode tree are called same nodes if they have the same event type, the same support, and the same event occurrence positions. In the frequent episode tree, same nodes have identical child nodes. The table is a two-level index. For each node newly added to the tree, all positions at which its event occurs are added to a hash table keyed by the node's event type; this is the first-level index. The second level records references to every node added under the corresponding index entry.
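One possible layout for such a two-level index is sketched below. The patent text does not fix the exact layout, so the choices here are assumptions: the first level is a hash table keyed by event type, the second level is keyed by (support, positions), and each second-level entry holds references to the tree nodes that share that signature. The class and method names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass(eq=False)  # nodes are compared by identity
class Node:
    event_type: str
    support: int
    positions: Tuple[int, ...]
    depth: int


Signature = Tuple[int, Tuple[int, ...]]  # (support, positions)


class SameNodeTable:
    def __init__(self) -> None:
        # first level: event type -> second level: signature -> node references
        self._index: Dict[str, Dict[Signature, List[Node]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def add(self, node: Node) -> None:
        """Register a newly created tree node."""
        self._index[node.event_type][(node.support, node.positions)].append(node)

    def same_nodes(self, node: Node) -> List[Node]:
        """All registered nodes with the same event type, support and positions."""
        by_signature = self._index.get(node.event_type, {})
        return list(by_signature.get((node.support, node.positions), []))

    def groups(self) -> Iterator[List[Node]]:
        """Yield each group that contains more than one same node."""
        for by_signature in self._index.values():
            for nodes in by_signature.values():
                if len(nodes) > 1:
                    yield nodes


# Hypothetical nodes: n1 and n2 are same nodes, n3 differs in support and positions.
table = SameNodeTable()
n1 = Node("D", 2, (3, 6), depth=2)
n2 = Node("D", 2, (3, 6), depth=3)
n3 = Node("D", 3, (3, 6, 9), depth=1)
for n in (n1, n2, n3):
    table.add(n)
```

Keying the second level by the full signature makes the lookup of same nodes a pair of hash accesses, which is what lets pruning find duplicates without comparing every pair of tree nodes.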
The composite event matching process of a content-based publish/subscribe system is divided into three phases: adding subscriptions, deleting subscriptions, and event matching. Adding and deleting subscriptions each update the data structure that maintains the event matching structure, and event matching uses the concept of a semantic partial order to speed up matching. The specific steps are as follows:
The method of the invention has two main steps: episode tree construction and pruning, and frequent closed episode extraction. They are described below.
With reference to Fig. 1, episode tree construction and pruning proceeds as follows:
Step 1: scan the sequence once to find all frequent 1-episodes, and add them as the first-layer nodes of the episode tree.
Step 2: for every frequent n-episode, probe the sequence for successor events within the maximum event interval. If a successor event meets the minimum support requirement and has not been used by another frequent episode, create a child node for the current node and set the event's used flag to true.
Step 3: add the new node to the same-node table. Repeat until no new node can be constructed. No candidate episode set needs to be generated in this process.
Step 4: for all nodes grouped as the same node in the same-node table, retain only the one at the lowest level of the tree; prune all the other nodes together with their child nodes.
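The four steps above can be sketched in Python. This is a simplified illustration under several assumptions, not the patented procedure itself: the sequence is a list of event types whose indices serve as timestamps; adjacent events must be less than `maxgap` apart; the "used flag" is approximated by never reusing a position for two occurrences of the same child; "lowest level" is read as the deepest node; and full non-overlap of occurrences is not enforced. All names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)  # identity comparison; avoids recursing through parent/children
class Node:
    event: str
    positions: Tuple[int, ...]  # index of the last event of each occurrence
    depth: int
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    @property
    def support(self) -> int:
        return len(self.positions)


Table = Dict[Tuple[str, Tuple[int, ...]], List[Node]]


def build_tree(seq: List[str], minsup: int, maxgap: int) -> Tuple[List[Node], Table]:
    by_type: Dict[str, List[int]] = defaultdict(list)
    for i, e in enumerate(seq):
        by_type[e].append(i)
    # Step 1: frequent 1-episodes become the first-layer nodes.
    roots = [Node(e, tuple(p), 1) for e, p in sorted(by_type.items()) if len(p) >= minsup]
    table: Table = defaultdict(list)
    for n in roots:
        table[(n.event, n.positions)].append(n)
    frontier = list(roots)
    while frontier:  # Steps 2 and 3: grow layer by layer, with no candidate set
        next_frontier = []
        for node in frontier:
            ext: Dict[str, List[int]] = defaultdict(list)
            for p in node.positions:
                taken = set()  # event types already extended from this position
                for q in range(p + 1, min(p + maxgap, len(seq))):
                    e = seq[q]
                    if e in taken or q in ext[e]:  # q already used for this child
                        continue
                    ext[e].append(q)
                    taken.add(e)
            for e, qs in sorted(ext.items()):
                if len(qs) >= minsup:
                    child = Node(e, tuple(qs), node.depth + 1, node)
                    node.children.append(child)
                    table[(e, child.positions)].append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return roots, table


def prune_same_nodes(roots: List[Node], table: Table) -> List[Node]:
    # Step 4: keep the deepest node of each group, detach the others with their subtrees.
    for group in table.values():
        keep = max(group, key=lambda n: n.depth)
        for n in group:
            if n is not keep:
                siblings = n.parent.children if n.parent else roots
                if n in siblings:
                    siblings.remove(n)
    return roots


def paths(roots: List[Node]) -> List[Tuple[str, int]]:
    """Root-to-leaf paths with the leaf support, for inspection."""
    out: List[Tuple[str, int]] = []

    def walk(n: Node, prefix: str) -> None:
        prefix += n.event
        if not n.children:
            out.append((prefix, n.support))
        for c in n.children:
            walk(c, prefix)

    for r in roots:
        walk(r, "")
    return out


# Hypothetical sequence (not the one in Fig. 3).
roots, table = build_tree(list("ABCABC"), minsup=2, maxgap=2)
before = paths(roots)
after = paths(prune_same_nodes(roots, table))
```

In this small example the first-layer nodes B and C duplicate nodes deeper in the tree, so pruning leaves the single chain A, B, C with support 2 at each node.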
With reference to Fig. 2, frequent closed episode extraction proceeds as follows:
Step 1: starting from a first-layer node, record the starting point and probe the support of the next-layer node. If its support equals that of the current-layer node, continue probing downward.
Step 2: if the support differs, the events from the starting node to the current-layer node form one frequent closed episode. Continue probing from the next-layer node.
Step 3: traverse the whole pruned episode tree depth-first. When the traversal is complete, all frequent closed episodes in the sequence have been extracted.
Take the event sequence given in Fig. 3, with the support threshold and the scan interval set to minsup = 2 and maxgap = 2.
Step 1: scan for all frequent 1-episodes. The result is A, B, C, D, G.
Step 2: iteratively grow the frequent n-episodes. The result is shown in Fig. 4.
Step 3 (see Fig. 5): add the nodes to the same-node table to obtain the same nodes in the frequent episode tree.
Step 4 (see Fig. 6): prune using the same-node table. Among the nodes grouped as the same node in the table, retain only the one at the lowest level of the tree. The result is the frequent closed episode tree.
Frequent closed episodes are then extracted from the pruned episode tree shown in Fig. 6.
Step 1: starting from a first-layer node, record the starting point and probe the support of the next-layer node. If its support equals that of the current-layer node, continue probing downward. The result is D: sup=3, GDC: sup=2.
Step 2: if the support differs, the events from the starting node to the current-layer node form one frequent closed episode. Continue probing from the next-layer node. The result is B: sup=3, BAD: sup=2.
Step 3: traverse the whole pruned episode tree depth-first. When the traversal is complete, all frequent closed episodes in the sequence have been extracted. The result is B: sup=3, BAD: sup=2, D: sup=3, GDC: sup=2.
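The extraction steps can be sketched in Python. Fig. 6 is not reproduced in this text, so the pruned tree below is a hypothetical one, chosen only because it is consistent with the result reported above; the `Node` class and the `extract_closed` function are illustrative names, and treating a node as the end of a closed episode when none of its children has the same support is an interpretation of the steps.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    event: str
    support: int
    children: List["Node"] = field(default_factory=list)


def extract_closed(roots: List[Node]) -> List[Tuple[str, int]]:
    """Depth-first traversal that emits the path to a node when the support changes."""
    result: List[Tuple[str, int]] = []

    def walk(node: Node, path: str) -> None:
        path += node.event
        # If no child keeps the same support, the path so far cannot be extended
        # without losing support, so it is reported as a frequent closed episode.
        if not any(c.support == node.support for c in node.children):
            result.append((path, node.support))
        for c in node.children:
            walk(c, path)

    for r in roots:
        walk(r, "")
    return result


# Hypothetical pruned tree consistent with the reported result (not taken from Fig. 6).
pruned = [
    Node("B", 3, [Node("A", 2, [Node("D", 2)])]),
    Node("D", 3),
    Node("G", 2, [Node("D", 2, [Node("C", 2)])]),
]
closed = extract_closed(pruned)
```

On this tree the traversal reports B and D with support 3 and BAD and GDC with support 2. The prefixes BA, G and GD are skipped because a child carries the same support.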

Claims (1)

  1. A method for mining frequent closed episodes based on a same-node table and a frequent episode tree, characterized by comprising the following steps:
    Step 1: episode tree construction and pruning;
    (1) scan the sequence once to find all frequent 1-episodes, and add them as the first-layer nodes of the episode tree;
    (2) for every frequent n-episode, probe the sequence for successor events within the maximum event interval; if a successor event meets the minimum support requirement and has not been used by another frequent episode, create a child node for the current node and set the event's used flag to true;
    (3) add the new node to the same-node table; repeat until no new node can be constructed; no candidate episode set needs to be generated in this process;
    (4) for all nodes grouped as the same node in the same-node table, retain only the one at the lowest level of the tree; prune all the other nodes together with their child nodes;
    Step 2: frequent closed episode extraction;
    (1) starting from a first-layer node, record the starting point and probe the support of the next-layer node; if its support equals that of the current-layer node, continue probing downward;
    (2) if the support differs, the events from the starting node to the current-layer node form one frequent closed episode; continue probing from the next-layer node;
    (3) traverse the whole pruned episode tree depth-first; when the traversal is complete, all frequent closed episodes in the sequence have been extracted.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410256954.9A CN104008185A (en) 2014-06-11 2014-06-11 Frequent close scenario mining method based on same node table and scenario tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410256954.9A CN104008185A (en) 2014-06-11 2014-06-11 Frequent close scenario mining method based on same node table and scenario tree

Publications (1)

Publication Number Publication Date
CN104008185A true CN104008185A (en) 2014-08-27

Family

ID=51368842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410256954.9A Pending CN104008185A (en) 2014-06-11 2014-06-11 Frequent close scenario mining method based on same node table and scenario tree

Country Status (1)

Country Link
CN (1) CN104008185A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7496592B2 (en) * 2005-01-31 2009-02-24 International Business Machines Corporation Systems and methods for maintaining closed frequent itemsets over a data stream sliding window
US20090112863A1 (en) * 2007-10-26 2009-04-30 Industry-Academic Cooperation Foundation, Yonsei University Method and apparatus for finding maximal frequent itmesets over data streams
CN102073732A (en) * 2011-01-18 2011-05-25 东北大学 Method for mining frequency episode from event sequence by using same node chains and Hash chains
CN102622447A (en) * 2012-03-19 2012-08-01 南京大学 Hadoop-based frequent closed itemset mining method
CN103324712A (en) * 2013-06-19 2013-09-25 西北工业大学 Extraction method for non-redundancy plot rule

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
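Wang Jinmiao et al.: "A method for mining maximal frequent itemsets in uncertain data", Journal of Shandong University of Technology *
Yuan Hongjuan: "GFExtractor: an algorithm for efficiently mining non-redundant episode rules on event sequences", Computer Engineering and Applications *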

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108563757A (en) * 2018-04-16 2018-09-21 泰州学院 Pervasive sequence of events Frequent Episodes Mining
CN108563757B (en) * 2018-04-16 2021-05-28 泰州学院 Universal event sequence frequent plot mining method
CN115859132A (en) * 2023-02-27 2023-03-28 广州帝隆科技股份有限公司 Big data risk management and control method and system based on neural network model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140827
