GB2568558A - Minimum non-reduction association rule mining method based on item subset example tree


Publication number
GB2568558A
Authority
GB
United Kingdom
Prior art keywords
item
case
subset
closed
association rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1801845.7A
Other versions
GB201801845D0 (en
Inventor
Pei Zheng
Li Bo
Zhou Bin
Kong Mingming
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xihua University
Original Assignee
Xihua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xihua University
Publication of GB201801845D0
Publication of GB2568558A

Classifications

    • G06F16/9027 Trees
    • G06F16/2246 Trees, e.g. B+trees
    • G06F16/2272 Indexing structures; Management thereof
    • G06F16/2453 Query optimisation
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06N5/025 Extracting rules from data
    • G06F2216/03 Data mining

Abstract

A minimum non-reduction association rule mining method based on an item subset example tree. The method comprises the following steps: generating an item subset in an example item database by using a closed itemset generated from a single item and a union operation of a set, the set being a proper subset of a power set of an itemset; constructing an item subset example tree structure of the example item database by using the generated item subset; mining a closed frequent itemset and a minimum generation element thereof in the item subset example tree and rapidly generating a minimum non-reduction association rule according to the mined closed frequent itemset and the minimum generation element thereof. By using the closed itemset generated from a single item, a plurality of item subsets can be obtained, the item subset example tree can be constructed, and hierarchical relationships between the item subsets and corresponding support levels thereof can be described, thus effectively reducing the number of retrievals between examples and items. The storage space is also effectively reduced at the same time, thereby increasing the speed and efficiency of mining the minimum non-reduction association rule.

Description

The present invention relates to the field of data mining and knowledge acquisition. It proposes a method for mining tiny (minimal) non-reduction (non-redundant) association rules from a large case item database based on an item subset case tree, thereby obtaining a non-redundant knowledge base of the database.
TECHNICAL BACKGROUND OF THE INVENTION
In a large case item database, an association rule describes the co-occurrence of items: a number of cases in the database meet certain items at the same time, and an association rule between items is formed by taking some of these items as the former piece (antecedent) and the remaining items as the latter piece (consequent). For example, in a large supermarket transaction database, each transaction is a case and each commodity involved in a transaction is an item. The mined association rules describe which commodities occur together in transactions, and this knowledge can be used for commodity placement, purchase quantity management and other aspects of supermarket commodity management. Theoretically, any item subset that is met by a non-empty set of cases can be used for mining association rules. On the one hand, association rule mining therefore operates over the power set of the item set, which is an NP-hard problem in computer science. On the other hand, because association rules describe reasonable, scientific and useful knowledge in a large case item database, association rule mining has been widely used in computer science, management science, economics, social science and other fields to obtain such knowledge from the corresponding databases.
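The two quantities attached to an association rule, its support degree and its confidence, can be illustrated with a minimal sketch. The transactions and the helper names `support` and `confidence` below are hypothetical and serve only to illustrate the supermarket example; they do not come from the patent.

```python
# Hypothetical transactions (cases); each is the set of items it contains.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, cases):
    """Number of cases that contain every item of `itemset`."""
    return sum(1 for case in cases if itemset <= case)

def confidence(antecedent, consequent, cases):
    """Fraction of cases meeting the antecedent that also meet the consequent."""
    return support(antecedent | consequent, cases) / support(antecedent, cases)

# Rule "bread -> milk": former piece {bread}, latter piece {milk}.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

Here two of the four transactions contain both bread and milk, and two of the three transactions containing bread also contain milk.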
The association rules that are usually mined are so numerous that they exceed what people can comprehend. In view of practical applications, a variety of extended or improved methods for mining association rules have therefore been proposed. In general, these methods comprise the following two main steps:
1. generating the frequent item sets or the closed frequent item sets;
2. mining various association rules from the frequent item sets or the closed frequent item sets.
In practical applications, on the one hand, very many frequent item sets or closed frequent item sets are generated. Huge (maximal) frequent item sets, generalized item sets, free item sets, disjunction-free item sets and so on have therefore been proposed to restrict the number of item sets from which association rules are generated, or to obtain association rules for a special need. On the other hand, the association rules mined from the frequent item sets or the closed frequent item sets contain redundant information. Tiny-huge (min-max) association rules, irreducible association rules, tiny non-reduction association rules, weighted association rules and so on have therefore been proposed to restrict the form of the association rules and to reduce the generation of redundant rules.
In terms of how the association rules are generated, the methods in the prior art can be divided into two categories. The first category comprises the methods derived from the Apriori method, which was the first association rule mining method to be proposed. Its core idea is to construct an Apriori generation function that generates item subsets by adding items successively according to the support degree of each item. The generated item subsets are stored in a hash-tree structure, through which associated item subsets are quickly retrieved and taken as the former piece and the latter piece, thereby quickly generating the association rules. The Apriori method was subsequently extended and improved extensively. The second category comprises the methods derived from the frequent-pattern (FP) tree. Unlike the hash-tree structure of the Apriori method, the FP-tree is a compact representation of the relevant frequent item subsets, in which each branch stores a frequent item subset in descending order of support degree. To construct the FP-tree, the items are first arranged in descending order of support degree, and the case set and the item set are then traversed, so that the frequent item subsets are built layer by layer in descending order of support degree. The FP-tree can then be used to generate the association rules quickly. The FP-tree method was subsequently extended and improved extensively as well.
It can be seen that the common feature of these methods is that the frequent item subsets are generated from single items by adding one item at a time. During generation, the items are ordered by descending support degree, so the frequent item subsets are also generated in descending order of support degree. The hash-tree storage structure grows from the single items, and the case set and the item set must be traversed many times to generate the frequent item subsets; in a large case item database, the amount of calculation and the storage space increase exponentially. In the FP-tree storage structure, the items are arranged in a list in descending order of support degree, and two traversals of the case set and the item set suffice to construct the branches of the FP-tree, in which the frequent item subsets are arranged in descending order of support degree. Since the frequent item subsets are still generated by adding items one at a time, the methods derived from the FP-tree still face the problems of calculation amount and storage space when generating the frequent item subsets and the corresponding association rules in a large case item database. In general, generating the frequent item subsets from the support degree of single items and by successive addition of items has the following defects:
1. Successive addition of items essentially traverses and searches the item set one item at a time, so the number of generated frequent item subsets is very large. In particular, the number of frequent item subsets in a large case item database grows exponentially, which is not conducive to rapidly mining tiny-huge association rules, tiny non-reduction association rules and so on. In fact, the items in a large case item database are correlated: the occurrence of one item inevitably entails the occurrence of others. Successive addition of items does not exploit this correlation.
2. Successive addition of items requires a large amount of calculation when generating the frequent item subsets and generates many redundant frequent item subsets. This enlarges the search scope for the closed frequent item sets and for the generators of item subsets, causes problems of both calculation and storage, and is not conducive to rapid mining of association rules. In fact, the relationships between the items in a large case item database can be used to effectively reduce the number of redundant frequent item subsets generated.
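For contrast with the invention, the successive item addition criticised above can be sketched as a generic level-wise search in the style of Apriori. This is only an illustrative sketch, not the prior-art implementation: it omits the hash-tree, the function name `levelwise_frequent` is hypothetical, and the six cases are reconstructed from the case sets quoted in Embodiment 1 (an assumption, since Table 1 is not reproduced in this text).

```python
from itertools import combinations

# Reconstructed from the case sets quoted in Embodiment 1 (an assumption:
# Table 1 itself is not reproduced in this text). Integers 1..5 are a1..a5.
CASES = {1: {1, 5}, 2: {1, 3}, 3: {3, 5}, 4: {2, 3},
         5: {1, 2, 3, 4, 5}, 6: {3, 4, 5}}

def support(itemset, cases):
    """Number of cases that contain every item of `itemset`."""
    return sum(1 for items in cases.values() if itemset <= items)

def levelwise_frequent(cases, min_count):
    """Generic level-wise search: grow item subsets one item at a time."""
    items = sorted(set().union(*cases.values()))
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), cases) >= min_count]
    frequent = list(level)
    while level:
        known = set(level)
        size = len(level[0]) + 1
        # Join step: unions of two frequent sets that differ in one item.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in known
                             for s in combinations(c, size - 1))}
        # Every surviving candidate costs another pass over the case set.
        level = [c for c in candidates if support(c, cases) >= min_count]
        frequent.extend(level)
    return frequent

frequent_sets = levelwise_frequent(CASES, min_count=2)
```

Even on this small database, every candidate requires a pass over the case set, which is the repeated traversal the text identifies as the bottleneck.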
SUMMARY OF THE PRESENT INVENTION
In order to overcome the shortcomings of successive item addition in association rule mining, the invention uses the correlation between the items in a large case item database to generate the frequent item subsets, provides a method for constructing an item subset case tree, and provides a method for quickly mining the closed frequent item sets, their tiny generators, and the tiny non-reduction association rules in the item subset case tree.
To realize the above purpose, the present invention adopts the following technical solution:
A method for mining tiny non-reduction association rules based on an item subset case tree comprises the following steps:
generating the closed item set corresponding to each item according to the closure operation between cases and items in a case item database, wherein the support degree of the closed item set is the same as that of the corresponding item;
sorting the generated closed item sets in descending order of the number of elements they contain, and generating each item subset via the union operation on sets; generating, via the intersection operation on sets, the case set meeting each item subset (whose size is the support degree of the item subset), and constructing the item subset case tree structure in the order of generation;
mining the closed frequent item sets and their tiny generators in the item subset case tree, and further generating the tiny non-reduction association rules.
Specifically, let a case item database be $D=(U,A)$, where $U=\{u_1,u_2,\ldots,u_n\}$ is the case set and $A=\{a_1,a_2,\ldots,a_m\}$ is the item set. Each case $u_i$ $(i=1,2,\ldots,n)$ is an item subset; for example, $u_1=\{a_1,a_2,a_3\}$ is a subset of $A$, which indicates that $u_1$ meets items $a_1$, $a_2$ and $a_3$. The invention uses the following two mappings to describe the two operations between cases and items. For any $a_j\in A$, $j=1,2,\ldots,m$,
$$\tau(a_j)=\{u_i \mid u_i\in U \text{ and } a_j\in u_i\}$$
Intuitively, $\tau(a_j)$ is the case subset consisting of all cases that meet item $a_j$. In the case item database, the support degree of item $a_j$ is therefore the number of elements of $\tau(a_j)$, that is, $\mathrm{sup}(a_j)=|\tau(a_j)|$. Naturally, for any item subset $A_k\subseteq A$,
$$\tau(A_k)=\{u_i \mid u_i\in U \text{ and } A_k\subseteq u_i\}$$
Intuitively, $\tau(A_k)$ is the case subset consisting of the cases that meet every item in $A_k$ at the same time. The support degree of the item subset $A_k$ is therefore the number of elements of $\tau(A_k)$, that is, $|\tau(A_k)|$.
For any case subset $U_l\subseteq U$, the item subset met by $U_l$ is as follows:
$$\iota(U_l)=\bigcap_{u_i\in U_l}u_i$$
Based on the above mappings, the method for mining tiny non-reduction association rules based on the item subset case tree of the present invention is described in detail as follows:
1. Generating the closed item set corresponding to each item
For any item $a_j\in A$, the two mappings $\tau$ and $\iota$ above are used, and the closed item set generated by item $a_j$ is as follows:
$$C(a_j)=\iota(\tau(a_j))=\bigcap_{u_i\in\tau(a_j)}u_i$$
According to the definitions of the mappings $\tau$ and $\iota$, $\tau(a_j)$ is the set of all cases meeting item $a_j$, while the item subset met by $\tau(a_j)$ is $\iota(\tau(a_j))$. The case subset meeting the item subset $C(a_j)$ is therefore the same as the case subset meeting item $a_j$, that is, the support degree of $C(a_j)$ equals the support degree of item $a_j$. Many useful properties of the mappings $\tau$ and $\iota$ are known, and from these properties $C(a_j)$ is easily proved to be a closed item set. Intuitively, the closed item set $C(a_j)$ captures the items correlated with item $a_j$: every case that meets item $a_j$ also meets every item in $C(a_j)$, so whenever item $a_j$ occurs, the other items in $C(a_j)$ inevitably occur as well.
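The two mappings and the resulting closed item sets can be sketched in a few lines. The function names `tau`, `iota` and `closure` are hypothetical labels for the two mappings and their composition, and the six cases are reconstructed from the case sets quoted in Embodiment 1 (an assumption, since Table 1 is not reproduced in this text).

```python
# Case item database reconstructed from the case sets quoted in Embodiment 1
# (an assumption: Table 1 itself is not reproduced in this text).
# Cases are numbered 1..6 and the integers 1..5 stand for items a1..a5.
CASES = {1: {1, 5}, 2: {1, 3}, 3: {3, 5}, 4: {2, 3},
         5: {1, 2, 3, 4, 5}, 6: {3, 4, 5}}
ITEMS = {1, 2, 3, 4, 5}

def tau(itemset):
    """Cases that meet every item of `itemset` (first mapping)."""
    return {u for u, items in CASES.items() if set(itemset) <= items}

def iota(case_subset):
    """Items met by every case of `case_subset` (second mapping)."""
    common = set(ITEMS)
    for u in case_subset:
        common &= CASES[u]
    return common

def closure(item):
    """Closed item set generated by a single item: iota(tau(item))."""
    return iota(tau({item}))

closed_sets = {item: closure(item) for item in sorted(ITEMS)}
supports = {item: len(tau({item})) for item in sorted(ITEMS)}
```

For item a1 this gives the closed item set {a1} with support degree 3, the values worked out by hand in Embodiment 1. The closed item set of a4 comes out as {a3, a4, a5}: every case containing a4 also contains a3 and a5.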
2. Building the item subset case tree
Rather than adding single items step by step to generate the frequent item sets, the present invention uses the closed item sets $C(a_j)$ of the single items to generate the item subsets. That is, $B=\{C(a_1),C(a_2),\ldots,C(a_m)\}$ is regarded as a generator basis, and the item subsets are generated by applying the union operation on sets to several elements of $B$; for example, $C(a_1)\cup C(a_2)\cup C(a_m)$ generates one item subset. Formally, let $A'$ be one generated item subset; then, for some non-empty index set $J\subseteq\{1,2,\ldots,m\}$,
$$A'=\bigcup_{j\in J}C(a_j)$$
Many useful properties of the closed item sets $C(a_j)$ are known. From these properties it is easy to prove that every closed item set of the case item database is contained among the item subsets generated by the generator basis $B=\{C(a_1),C(a_2),\ldots,C(a_m)\}$. According to this conclusion, all the item subsets can first be generated from the generator basis $B$, and the required closed frequent item sets can then be mined from the generated item subsets. Since each $C(a_j)$ is itself a closed item set, on the one hand, the item subsets generated by the generator basis $B$ differ from those generated by adding single items successively; on the other hand, the family of item subsets generated by $B$ is a proper subset of the power set of the item set. The number of item subsets generated by the generator basis $B$ is smaller than the number generated by adding single items successively, which means that the search range for the closed frequent item sets is smaller. Formally, the case set meeting the item subset $A'$ generated by the generator basis $B$ can be represented as:
$$\tau(A')=\bigcap_{j\in J}\tau(C(a_j))$$
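The reduction claimed here can be checked on a small instance by brute force. The sketch below enumerates every union of elements of a generator basis and compares the count with the size of the power set. The basis values are derived from the case sets quoted in Embodiment 1 (an assumption, since Table 2 is not reproduced in this text), and `generated_subsets` is a hypothetical helper; the exhaustive enumeration is for illustration only and is not the tree procedure of the invention.

```python
from itertools import combinations

# Closed item sets of the single items, derived from the case sets quoted in
# Embodiment 1 (an assumption: Table 2 is not reproduced in this text).
# Integers 1..5 stand for items a1..a5.
BASIS = [frozenset({1}), frozenset({2, 3}), frozenset({3}),
         frozenset({3, 4, 5}), frozenset({5})]
ITEMS = frozenset({1, 2, 3, 4, 5})

def generated_subsets(basis):
    """All distinct unions of one or more elements of the generator basis."""
    result = set()
    for r in range(1, len(basis) + 1):
        for chosen in combinations(basis, r):
            result.add(frozenset().union(*chosen))
    return result

generated = generated_subsets(BASIS)
nonempty_power_set_size = 2 ** len(ITEMS) - 1
```

On this basis only 15 distinct item subsets arise, against 31 non-empty subsets of the five items. A subset such as {a2} can never be generated, because a2 only enters a union together with a3.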
The following procedure is used to build the item subset case tree and to rapidly generate all of the above item subsets together with the case sets meeting them:
(1) Each node of the item subset case tree is represented as:
$$A'\times\tau(A')$$
where $A'$ is an item subset generated by $B$, and $\tau(A')$ is the case set meeting $A'$.
(2) The root node of the item subset case tree is represented as:
$$\emptyset\times U$$
(3) Each sub-node of the root node is represented as:
$$C(a_j)\times\tau(C(a_j))$$
where the sub-nodes are arranged from left to right in descending order of the number of items contained in $C(a_j)$. That is, from left to right, the first sub-node is the one whose $C(a_j)$ contains the most items and the last sub-node is the one whose $C(a_j)$ contains the fewest; sub-nodes with the same number of items are arranged by serial number.
(4) The sub-nodes of each sub-node $C(a_j)\times\tau(C(a_j))$ are generated as follows. Let $C(a_1)\times\tau(C(a_1))$, $C(a_2)\times\tau(C(a_2))$, ..., $C(a_m)\times\tau(C(a_m))$ be the sub-nodes arranged as required in (3). For each sub-node $C(a_j)\times\tau(C(a_j))$, the first sub-node is
$$(C(a_j)\cup C(a_{j+1}))\times(\tau(C(a_j))\cap\tau(C(a_{j+1})))$$
provided that $C(a_j)\cup C(a_{j+1})\subseteq A$, $C(a_{j+1})\not\subset C(a_j)$ and $\tau(C(a_j))\cap\tau(C(a_{j+1}))\neq\emptyset$. The other sub-nodes are generated in the same way, successively, from $(C(a_j)\cup C(a_{j+2}))\times(\tau(C(a_j))\cap\tau(C(a_{j+2})))$, ..., $(C(a_j)\cup C(a_m))\times(\tau(C(a_j))\cap\tau(C(a_m)))$.
(5) For any node $A'\times\tau(A')$, suppose that $A'=A''\cup C(a_j)$. Then the first sub-node of $A'\times\tau(A')$ is
$$(A'\cup C(a_{j+1}))\times(\tau(A')\cap\tau(C(a_{j+1})))$$
provided that $A'\cup C(a_{j+1})\subseteq A$, $C(a_{j+1})\not\subset A'$ and $\tau(A')\cap\tau(C(a_{j+1}))\neq\emptyset$. The other sub-nodes are generated in the same way, successively, from $(A'\cup C(a_{j+2}))\times(\tau(A')\cap\tau(C(a_{j+2})))$, ..., $(A'\cup C(a_m))\times(\tau(A')\cap\tau(C(a_m)))$.
(6) When frequent item subsets are to be generated, it is only necessary to add the minimum support constraint $\alpha$ to each node generation step, that is, for any node $A'\times\tau(A')$, to add the constraint:
$$|\tau(A')|\geq\alpha$$
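Steps (1) to (6) can be sketched as a short recursive procedure. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `build_tree`, the tuple layout of a node and the `min_support` parameter are hypothetical, the non-containment condition is read as "the closed item set is not a subset of the current item subset", and the input values are derived from the case sets quoted in Embodiment 1 (Tables 1 and 2 are not reproduced in this text).

```python
# Closed item sets C(a_j) and their case sets tau(C(a_j)), derived from the
# case sets quoted in Embodiment 1 (an assumption: Tables 1 and 2 are not
# reproduced in this text). Integers 1..5 stand for items a1..a5.
BASIS = {1: ({1}, {1, 2, 5}), 2: ({2, 3}, {4, 5}), 3: ({3}, {2, 3, 4, 5, 6}),
         4: ({3, 4, 5}, {5, 6}), 5: ({5}, {1, 3, 5, 6})}

def build_tree(basis, all_cases, min_support=1):
    """Return the non-root nodes as (level, item_subset, case_set) tuples.

    The root (empty item subset, all cases) is implicit.
    """
    # Step (3): larger closed item sets first, ties broken by item number.
    order = sorted(basis, key=lambda j: (-len(basis[j][0]), j))
    ordered = [(frozenset(basis[j][0]), frozenset(basis[j][1])) for j in order]
    nodes = []

    def expand(items, cases, last, level):
        # Steps (4) and (5): try each later basis element in turn.
        for k in range(last + 1, len(ordered)):
            c_items, c_cases = ordered[k]
            common = cases & c_cases
            # Skip if nothing new is added, or (step (6)) support is too low.
            if c_items <= items or len(common) < min_support:
                continue
            child = (level, items | c_items, common)
            nodes.append(child)
            expand(child[1], common, k, level + 1)

    expand(frozenset(), frozenset(all_cases), -1, 1)
    return nodes

tree = build_tree(BASIS, all_cases=range(1, 7))
```

With `min_support=1` (a non-empty case set), the sketch yields 15 nodes in three layers below the root. The node for C(a4) gets exactly the two sub-nodes described in Embodiment 1, both with case set {5}. Raising `min_support` applies the constraint of step (6) during generation.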
3. Mining the closed frequent item sets and their tiny generators, and generating the tiny non-reduction association rules
In the item subset case tree, each node consists of an item subset and the case set meeting that item subset. Based on the case sets, the following equivalence relation $\approx$ is defined on the nodes of the item subset case tree: for any two nodes $A'\times\tau(A')$ and $A''\times\tau(A'')$,
$$A'\times\tau(A')\approx A''\times\tau(A'')\ \text{if and only if}\ \tau(A')=\tau(A'')$$
According to the equivalence relation $\approx$, equivalent nodes can be merged as:
$$[A']\times\tau(A')$$
where $[A']$ is the set consisting of the item subsets of all nodes equivalent to the node $A'\times\tau(A')$ in the item subset case tree; that is, the case set meeting every item subset in $[A']$ is $\tau(A')$. For convenience of description, the invention adopts the following conventions:
(1) $\max[A']$ is the largest element of $[A']$ with respect to set inclusion.
(2) $\min[A']$ is the set of generators of the largest element in $[A']$.
Based on the above conventions, the closed frequent item set and its tiny generators are as follows:
• $\max[A']$ is the closed frequent item set whose support degree is $|\tau(A')|$.
• Let $A_1\in\min[A']$. If the case set meeting a subset of $A_1$ is $\tau(A')$, and no proper subset of that subset has the case set $\tau(A')$, then the subset is a tiny generator of the closed frequent item set $\max[A']$. $G\min[A']$ denotes the set of all tiny generators of $\max[A']$ obtained from $\min[A']$.
According to the closed frequent item sets and their tiny generators, the tiny non-reduction association rules are generated as follows:
• The tiny non-reduction association rules with confidence 1.
For any equivalence class $[A']$, let $A_1\in G\min[A']$; then
$$A_1\rightarrow(\max[A']-A_1)$$
is a tiny non-reduction association rule. Its support degree is $\mathrm{sup}(A_1\rightarrow(\max[A']-A_1))=|\tau(A')|$, and its confidence is $\mathrm{conf}(A_1\rightarrow(\max[A']-A_1))=|\tau(A')|/|\tau(A_1)|=1$.
• The tiny non-reduction association rules with confidence $\beta$.
For any equivalence class $[A']$ and the equivalence class $[A'']$ of its father (parent) node, that is, $A''$ is the father node of $A'$ in the item subset case tree and $\tau(A'')\neq\tau(A')$, let $A_1\in G\min[A'']$; then
$$A_1\rightarrow(\max[A']-A_1)$$
is a tiny non-reduction association rule. Its support degree is $\mathrm{sup}(A_1\rightarrow(\max[A']-A_1))=|\tau(A')|$, and its confidence is $\beta=\mathrm{conf}(A_1\rightarrow(\max[A']-A_1))=|\tau(A')|/|\tau(A_1)|<1$.
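The merging of nodes and the generation of the confidence-1 rules can be sketched as follows. Several points are assumptions rather than the patent's procedure: all function names are hypothetical, the data are reconstructed from the case sets quoted in Embodiment 1, the largest element of a class is taken to be its member with the most items, and the tiny generators are found by scanning all subsets of that largest element in order of size, which simplifies the search within min[A'] described in the text.

```python
from itertools import combinations

# Reconstructed Embodiment 1 data (assumption); integers 1..5 are items a1..a5.
CASES = {1: {1, 5}, 2: {1, 3}, 3: {3, 5}, 4: {2, 3},
         5: {1, 2, 3, 4, 5}, 6: {3, 4, 5}}

def tau(itemset):
    """Case set meeting every item of `itemset`."""
    return frozenset(u for u, items in CASES.items() if set(itemset) <= items)

def tree_item_subsets():
    """Item subsets of all non-root nodes of the item subset case tree."""
    items = sorted(set().union(*CASES.values()))
    closed = [frozenset.intersection(*(frozenset(CASES[u]) for u in tau({a})))
              for a in items]
    closed.sort(key=lambda c: -len(c))      # stable: ties keep item order
    found = []

    def expand(current, last):
        for k in range(last + 1, len(closed)):
            if not closed[k] <= current and tau(current | closed[k]):
                found.append(current | closed[k])
                expand(current | closed[k], k)

    expand(frozenset(), -1)
    return found

def merge_by_case_set(subsets):
    """Equivalence classes: item subsets grouped by their case set."""
    classes = {}
    for s in subsets:
        classes.setdefault(tau(s), set()).add(s)
    return classes

def tiny_generators(closed_set, cases):
    """Inclusion-minimal subsets of `closed_set` whose case set is `cases`."""
    gens = []
    for r in range(1, len(closed_set) + 1):
        for cand in map(frozenset, combinations(sorted(closed_set), r)):
            if tau(cand) == cases and not any(g < cand for g in gens):
                gens.append(cand)
    return gens

def rules_with_confidence_one(classes):
    """(antecedent, consequent, support) triples with confidence 1."""
    rules = []
    for cases, members in classes.items():
        largest = max(members, key=len)
        for g in tiny_generators(largest, cases):
            if largest - g:                 # skip rules with empty consequent
                rules.append((g, largest - g, len(cases)))
    return rules

classes = merge_by_case_set(tree_item_subsets())
rules = rules_with_confidence_one(classes)
```

For the class whose case set is {5}, the sketch merges seven item subsets and produces the same five confidence-1 rules that Embodiment 1 lists. Rules with confidence below 1 would pair the generators of a parent class with the largest element of a child class; they are left out to keep the sketch short.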
Compared with the prior art, the invention has the following beneficial effects. The present invention is a method for mining tiny non-reduction association rules based on an item subset case tree, and uses the closed item sets of the single items to generate the item subset case tree. Compared with methods that enumerate single items to generate item subsets, the present invention generates fewer item subsets, thereby effectively avoiding the generation of redundant item subsets. Meanwhile, the search for the closed frequent item sets and their tiny generators is confined to the item subset case tree, which effectively reduces the search scope. In addition, by using the equivalence classes and the hierarchical relation in the item subset case tree, the tiny non-reduction association rules are mined quickly, thereby effectively avoiding repeated calculation between the item set and the case set.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows the calculation of the closed item set corresponding to each item in one embodiment of the present invention;
Figure 2 shows the generation of the item subset case tree in one embodiment of the present invention;
Figure 3 shows the specific item subset case tree generated in one embodiment of the present invention;
Figure 4 shows the IT-tree generated by the CHARM-L algorithm in one embodiment of the present invention;
Figure 5 shows the method for mining the tiny non-reduction association rules in one embodiment of the present invention;
Figure 6 shows the running time curves of the algorithm proposed in the present invention and the Apriori algorithm;
Figure 7 shows the memory usage curves of the algorithm proposed in the present invention and the Apriori algorithm;
Figure 8 is a flow chart of the algorithm of the present invention;
Table 1 shows a case item database of six cases and five items;
Table 2 shows the closed item sets and their support degrees;
Table 3 shows the merged nodes, the closed item sets and their tiny generators in the item subset case tree shown in Figure 3;
Table 4 shows the tiny non-reduction association rules with a confidence threshold of 0.9;
Table 5 shows the running time and memory usage in Embodiment 2.
EMBODIMENTS OF THE PRESENT INVENTION
The present invention is described in further detail below with reference to embodiments. It should not be understood that the scope of the above subject matter of the present invention is limited to the following embodiments; all techniques implemented on the basis of the present invention fall within the scope of the present invention.
Embodiment 1
Figure 1 illustrates the method for mining tiny non-reduction association rules based on an item subset case tree according to one embodiment of the present invention. Figure 1 concerns calculating the closed item set corresponding to each item, and comprises the following steps:
providing, as an example, a table of a case item database $D=(U,A)$ of six cases and five items, and providing the case set meeting each item and the item set met by that case set in Embodiment 1, for calculating the closed item set corresponding to each item.
Specifically, Table 1 describes the case item database $D=(U,A)$ of six cases and five items. With reference to Table 1, the case set meeting each item and the item set met by that case set are as follows:
$$\tau(a_j)=\{i \mid i\in U \text{ and } a_j\in i\},\qquad \iota(\tau(a_j))=\bigcap_{i\in\tau(a_j)}i$$
wherein $i=1,2,\ldots,6$ and $j=1,2,3,4,5$. Accordingly, the closed item set corresponding to each item is as follows:
$$C(a_j)=\iota(\tau(a_j))$$
and its support degree is as follows:
$$\mathrm{Sup}(C(a_j))=|\tau(a_j)|$$
The closed item sets corresponding to the items according to Embodiment 1 are $B=\{C(a_1),C(a_2),C(a_3),C(a_4),C(a_5)\}$.
Figure 2 illustrates the method for mining tiny non-reduction association rules based on an item subset case tree according to one embodiment of the present invention. Figure 2 concerns generating the item subset case tree from the closed item sets corresponding to the items obtained according to Figure 1, and comprises the following steps:
Generating the node of layer $L_0$, that is, the root node:
$$L_0:\ \emptyset\times U$$
Generating the nodes of layer $L_1$, that is, the sub-nodes of the root node:
$$L_1:\ C(a_1)\times\tau(C(a_1)),\ C(a_2)\times\tau(C(a_2)),\ \ldots,\ C(a_5)\times\tau(C(a_5))$$
wherein $C(a_j)$ is the closed item set ranked $j$-th in descending order of the number of items it contains.
Assuming that layer $L_{r-1}$ has been generated, the nodes of layer $L_r$ consist of the sub-nodes of each node in layer $L_{r-1}$. If a node $A_j'\times\tau(A_j')$ of layer $L_{r-1}$ satisfies $A_j'=A_j''\cup C(a_k)$, its sub-nodes are generated as follows:
$$(A_j'\cup C(a_{k+1}))\times(\tau(A_j')\cap\tau(C(a_{k+1}))),\ \ldots,\ (A_j'\cup C(a_5))\times(\tau(A_j')\cap\tau(C(a_5)))$$
where each sub-node must satisfy $A_j'\cup C(a_i)\subseteq A$, $C(a_i)\not\subset A_j'$ and $\tau(A_j')\cap\tau(C(a_i))\neq\emptyset$, $i=k+1,\ldots,5$.
Figure 3 illustrates the method for mining tiny non-reduction association rules based on an item subset case tree according to one embodiment of the present invention. Figure 3 concerns mining the tiny non-reduction association rules based on the item subset case tree generated according to Figure 2, and comprises the following steps:
Using the following equivalence relation on nodes to merge the nodes in the item subset case tree: for any two nodes $A'\times\tau(A')$ and $A''\times\tau(A'')$,
$$A'\times\tau(A')\approx A''\times\tau(A'')\ \text{if and only if}\ \tau(A')=\tau(A'')$$
Accordingly, the nodes with the same case set can be merged as:
$$[A']\times\tau(A')$$
wherein the case sets meeting the item subsets in the item subset equivalence class $[A']$ are all $\tau(A')$. According to the inclusion relation on sets, the largest element and the generator set of $[A']$ are as follows:
$$\max[A'],\qquad \min[A']$$
$\max[A']$ is the closed item set generated by $A'$, while the tiny generators of $\max[A']$ are searched for in $\min[A']$. That is, for any $A_1\in\min[A']$, if the case set meeting a subset of $A_1$ is $\tau(A')$, and no proper subset of that subset has the case set $\tau(A')$, then the subset is a tiny generator of the closed item set $\max[A']$. $G\min[A']$ denotes the set of all tiny generators of $\max[A']$ obtained from $\min[A']$.
Accordingly, the tiny non-reduction association rules are generated as follows:
For any equivalence class $[A']$, let $A_1\in G\min[A']$; then
$$A_1\rightarrow(\max[A']-A_1)$$
Its support degree is $\mathrm{sup}(A_1\rightarrow(\max[A']-A_1))=|\tau(A')|$, while its confidence is $\mathrm{conf}(A_1\rightarrow(\max[A']-A_1))=|\tau(A')|/|\tau(A_1)|=1$.
For any equivalence class $[A']$ and the equivalence class $[A'']$ of its father (parent) node, that is, $A''$ is the father node of $A'$ in the item subset case tree and $\tau(A'')\neq\tau(A')$, let $A_1\in G\min[A'']$; then
$$A_1\rightarrow(\max[A']-A_1)$$
Its support degree is $\mathrm{sup}(A_1\rightarrow(\max[A']-A_1))=|\tau(A')|$, while its confidence is $\beta=\mathrm{conf}(A_1\rightarrow(\max[A']-A_1))=|\tau(A')|/|\tau(A_1)|$.
Embodiment 1:
The case item database is $D=(U,A)=(\{1,2,3,4,5,6\},\{a_1,a_2,a_3,a_4,a_5\})$, as shown in Table 1.
According to Table 1 and Figure 1, the case set meeting $a_1$ is as follows:
$$\tau(a_1)=\{i \mid i\in U \text{ and } a_1\in i\}=\{1,2,5\}$$
The item subset met by the case set $\{1,2,5\}$ is as follows:
$$\iota(\tau(a_1))=\bigcap_{i\in\tau(a_1)}i=\{a_1,a_5\}\cap\{a_1,a_3\}\cap\{a_1,a_2,a_3,a_4,a_5\}=\{a_1\}$$
Accordingly, the closed item set corresponding to item $a_1$ is as follows:
$$C(a_1)=\iota(\tau(a_1))=\{a_1\}$$
Its support degree is as follows:
$$\mathrm{Sup}(C(a_1))=|\tau(a_1)|=|\{1,2,5\}|=3$$
Similarly, the closed item sets and their support degrees corresponding to $a_2$, $a_3$, $a_4$ and $a_5$ can be obtained. The results of Embodiment 1 are shown in Table 2.
As shown in Table 2, the closed item sets sorted by the number of items they contain are $C(a_4)$, $C(a_2)$, $C(a_1)$, $C(a_3)$, $C(a_5)$. The sub-nodes of the root node $\emptyset\times U$ therefore constitute layer $L_1$ and are, from left to right:
$$C(a_4)\times\{5,6\},\ C(a_2)\times\{4,5\},\ C(a_1)\times\{1,2,5\},\ C(a_3)\times\{2,3,4,5,6\},\ C(a_5)\times\{1,3,5,6\}$$
The sub-nodes of the nodes of layer $L_1$ constitute layer $L_2$, wherein the sub-nodes of $C(a_4)\times\{5,6\}$ are as follows:
$$(C(a_4)\cup C(a_2))\times(\{5,6\}\cap\{4,5\}),\quad (C(a_4)\cup C(a_1))\times(\{5,6\}\cap\{1,2,5\})$$
$C(a_4)\cup C(a_3)$ and $C(a_4)\cup C(a_5)$ do not generate nodes, because the condition $C(a_i)\not\subset A_j'$ is not met. The other sub-nodes are generated similarly. The sub-nodes of the nodes of layer $L_2$ constitute layer $L_3$, wherein the sub-node of $(C(a_4)\cup C(a_2))\times\{5\}$ is as follows:
$$((C(a_4)\cup C(a_2))\cup C(a_1))\times(\{5\}\cap\{1,2,5\})$$
$(C(a_4)\cup C(a_2))\cup C(a_3)$ and $(C(a_4)\cup C(a_2))\cup C(a_5)$ do not generate nodes, because the condition $C(a_i)\not\subset A_j'$ is not met. The other sub-nodes are generated similarly.
Figure 3 shows the specific item subset case tree generated in Embodiment 1, wherein a3a4a5 represents the item subset $\{a_3,a_4,a_5\}$ and 56 represents the case subset $\{5,6\}$. Figure 4 shows the IT-tree generated by the CHARM-L algorithm in Embodiment 1, wherein the item subsets and the case subsets are represented in the same way as in Figure 3. Compared with Figure 4, the item subset case tree in Figure 3 has fewer layers and fewer nodes than the IT-tree. The search scope for the closed frequent item sets and their tiny generators is therefore smaller than in the IT-tree, and the tiny non-reduction association rules can be generated rapidly in the item subset case tree.
According to the item subset case tree shown in Figure 3, the nodes having equal case sets are incorporated (merged), for example [a1a2a3]×5, wherein
[a1a2a3]={a1a2a3, a1a3a5, a2a3a5, a1a2a3a5, a1a3a4a5, a2a3a4a5, a1a2a3a4a5},
max[a1a2a3]=a1a2a3a4a5,
min[a1a2a3]={a1a2a3, a1a3a5, a2a3a5, a1a2a3a5, a1a3a4a5, a2a3a4a5},
Gmin[a1a2a3]={a1a2, a1a4, a2a4, a2a5, a1a3a5}.
The generated tiny non-reduction association rule with the confidence of 1 is as follows:
a1a2→a3a4a5, a1a4→a2a3a5, a2a4→a1a3a5, a2a5→a1a3a4, a1a3a5→a2a4
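The rules above can be reproduced with a short Python sketch. It assumes the item-to-case sets of the layer-L1 nodes of this embodiment and finds the tiny generators by a brute-force subset search, which is a stand-in chosen for brevity and is not the procedure of the invention; the helper names are hypothetical.

```python
from itertools import combinations

T = {
    "a1": {1, 2, 5},
    "a2": {4, 5},
    "a3": {2, 3, 4, 5, 6},
    "a4": {5, 6},
    "a5": {1, 3, 5, 6},
}


def cases_of(itemset):
    """Cases meeting every item in `itemset` (intersection of case sets)."""
    return set.intersection(*(T[i] for i in itemset))


def minimal_generators(closed, target_cases):
    """Brute force: minimal subsets of `closed` whose case set is target_cases."""
    found = []
    for size in range(1, len(closed) + 1):
        for cand in combinations(sorted(closed), size):
            s = set(cand)
            # Keep s only if no smaller generator is already contained in it.
            if cases_of(s) == target_cases and not any(g < s for g in found):
                found.append(s)
    return found


closed = set(T)                      # max[a1a2a3] = a1a2a3a4a5
gens = minimal_generators(closed, {5})

# Former piece = tiny generator, latter piece = closed item set minus generator.
rules = [(sorted(g), sorted(closed - g)) for g in gens if closed - g]
for lhs, rhs in rules:
    print("".join(lhs), "->", "".join(rhs))
```

Because each generator has the same case set {5} as the closed item set, every rule produced this way has a confidence of 1.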
Table 3 shows the incorporated nodes of the item subset case tree shown in Figure 3, together with their closed item sets and tiny generators; Table 4 shows the tiny non-reduction association rules with a confidence threshold of 0.9.
Embodiment 2
Embodiment 2 uses the EXTENDED BAKERY data set, which contains a total of 75,000 sales records covering the purchase of 40 kinds of bread (numbered 1 to 40) and 10 kinds of drinks (numbered 41 to 50). The excavated attribute association rules reflect the relationship between the purchased breads and drinks. For the attribute association rules excavated by using the present invention, the threshold value of the support degree is set to 0.01 and the threshold value of the confidence is set to 0, and a total of 112 attribute association rules are generated. The method of the present invention is compared with the classical Apriori algorithm in terms of the number of attribute association rules (352 for Apriori), the running time and the memory used, wherein the number of the attribute association rules and the content of the former piece and the latter piece of the rules are completely identical. The running time and the memory used are shown in Table 5. In the comparative experiment, the 75,000 original records are copied and doubled (multiplied by 2) seven times, giving 8 data sets in total; the number, the support degree and the confidence of the obtained rules are unchanged, while the running time and the memory used change. Figure 6 shows the running time curves of the algorithm proposed in the present invention and the Apriori algorithm. Figure 7 shows the memory usage curves of the algorithm proposed in the present invention and the Apriori algorithm.
The 112 attribute association rules generated by this method are all contained in the 352 attribute association rules generated by the Apriori algorithm, and all of them are Min-Max rules.
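The observation in Embodiment 2 that copying the records leaves the support degree and the confidence unchanged can be checked with a small Python sketch. The four transactions and the item names below are hypothetical toy data, not records from the EXTENDED BAKERY data set, and the two functions use the usual relative definitions of support and confidence.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Support of (lhs and rhs together) divided by support of lhs."""
    return support(transactions, lhs | rhs) / support(transactions, lhs)


# Hypothetical toy records: one bread item and one drink item per name.
data = [
    {"bread_1", "drink_41"},
    {"bread_1"},
    {"bread_2", "drink_41"},
    {"bread_1", "drink_41"},
]
doubled = data * 2  # every record copied once, as in one doubling step

lhs, rhs = {"bread_1"}, {"drink_41"}
print(support(data, lhs | rhs), support(doubled, lhs | rhs))
print(confidence(data, lhs, rhs), confidence(doubled, lhs, rhs))
```

Both the numerator and the denominator of each ratio are multiplied by the same factor when the data are doubled, so only the running time and the memory used are expected to grow.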
The above are only embodiments of the present invention, and the scope of the present invention is not limited thereto. Any variation or substitution that a person skilled in the art can readily conceive within the technical scope of the present invention shall be covered by the scope of the present invention.

Claims (5)

1. A method for excavating a tiny non-reduction association rule based on an item subset case tree, comprising the following steps:
step 1: generating a closed item set corresponding to each item according to a closed operation between a case and an item in a case item database, wherein the closed item set meets the requirement that the support degree thereof is the same as that of the corresponding item;
step 2: sequencing the generated closed item set from the large to the small according to the number of elements in the set to generate each item subset via a union operation of the set;
step 3: generating, via the union operation of the set, a case set of which each item subset is met, and constructing an item subset case tree structure based on the generated case set according to a generated order;
step 4: excavating a closed frequent item set and a tiny generator thereof in the item subset case tree, and utilizing the excavated closed frequent item set and the tiny generator thereof to generate the tiny non-reduction association rule.
2. The method for excavating the tiny non-reduction association rule based on the item subset case tree, characterized in that step 1 comprises the following steps:
step 1.1: constituting, by a case meeting an item and an item meeting a case, the closed operation between a pair of the case and the item;
step 1.2: using the closed operation to be able to generate the item subset commonly met by the case meeting an item, that is, the closed item set determined by the case meeting an item.
3. The method for excavating the tiny non-reduction association rule based on the item subset case tree, characterized in that step 2 comprises the following steps:
step 2.1: sequencing the closed item set determined by the case met by each item from the large to the small based on the number of the items comprised therein;
step 2.2: generating a new item subset for the generated item subset and the selected closed item set via the union operation of the set according to the sequencing order again.
4. The method for excavating the tiny non-reduction association rule based on the item subset case tree, characterized in that step 3: calculating, via an intersection operation of the set, the case set of which each item subset is met, and constructing the item subset case tree structure based on the generated order of the case set.
5. The method for excavating the tiny non-reduction association rule based on the item subset case tree, characterized in that step 4 comprises the following steps:
step 4.1: selecting the item subset having the same case set in the item subset case tree;
step 4.2: the largest element in the item subset of the same case set is the closed item set according to inclusive relationship, wherein the generator is used for obtaining the tiny generator of the closed item set;
step 4.3: the tiny generator is taken as a former piece while the closed item set subtracts the tiny generator to become a latter piece and to generate the tiny non-reduction association rule.
GB1801845.7A 2016-05-27 2016-09-13 Minimum non-reduction association rule mining method based on item subset example tree Withdrawn GB2568558A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610365087.1A CN106021546A (en) 2016-05-27 2016-05-27 Minimum non-reduction association rule mining method based on item subset example tree
PCT/CN2016/098788 WO2017201920A1 (en) 2016-05-27 2016-09-13 Minimum non-reduction association rule mining method based on item subset example tree

Publications (2)

Publication Number Publication Date
GB201801845D0 GB201801845D0 (en) 2018-03-21
GB2568558A true GB2568558A (en) 2019-05-22

Family

ID=57092299

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1801845.7A Withdrawn GB2568558A (en) 2016-05-27 2016-09-13 Minimum non-reduction association rule mining method based on item subset example tree

Country Status (3)

Country Link
CN (1) CN106021546A (en)
GB (1) GB2568558A (en)
WO (1) WO2017201920A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019178733A1 (en) * 2018-03-20 2019-09-26 深圳大学 Method and apparatus for mining frequent item sets of large-scale data set, device, and medium
CN112733915B (en) * 2020-12-31 2023-11-07 大连大学 Situation estimation method based on improved D-S evidence theory
CN112861008B (en) * 2021-03-01 2022-08-09 山东大学 Restaurant ordering recommendation method and system based on multi-user information fusion and entropy
CN117114116A (en) * 2023-08-04 2023-11-24 北京杰成合力科技有限公司 Root cause analysis method, medium and equipment based on machine learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5615341A (en) * 1995-05-08 1997-03-25 International Business Machines Corporation System and method for mining generalized association rules in databases
CN101996102A (en) * 2009-08-31 2011-03-30 中国移动通信集团公司 Method and system for mining data association rule
CN105335785A (en) * 2015-10-30 2016-02-17 西华大学 Association rule mining method based on vector operation
CN105589908A (en) * 2014-12-31 2016-05-18 中国银联股份有限公司 Association rule computing method for transaction set

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEI, CHANGHUA et al., "Mining Minimal Non-Redundant Association Rules Based on Galois Connection", COMPUTER ENGINEERING AND SCIENCE, (20070228), vol. 29, no. 2, ISSN 1007-130X, pages 94 - 95 *

Also Published As

Publication number Publication date
WO2017201920A1 (en) 2017-11-30
CN106021546A (en) 2016-10-12
GB201801845D0 (en) 2018-03-21


Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)