Summary of the invention
The present invention aims to provide a method for the fast detection of frequent itemsets, in order to solve the problem of inefficient frequent itemset mining in the prior art described above.
The invention discloses a method for the fast detection of frequent itemsets, comprising: scanning a transaction database and, according to the records in the transaction database, obtaining all 1-itemsets in the transaction database;
calculating the support of each 1-itemset, and obtaining the frequent 1-itemsets whose support is not less than a minimum support threshold;
merging the frequent k-itemsets with the frequent 1-itemsets without duplication, to generate the frequent (k+1)-itemsets whose support is not less than the minimum support threshold;
wherein k is an integer greater than 0.
Preferably, the method further comprises:
each 1-itemset corresponds to a Boolean array whose length is the total number of records in the transaction database, and the positions of the Boolean array correspond one-to-one to the records of the transaction database, in the order in which the records appear in the transaction database;
if a record in the transaction database contains the item of the 1-itemset, the logical value at the position corresponding to that record is set to 1; otherwise it is set to 0;
calculating the support of all the 1-itemsets, and rejecting the 1-itemsets whose support is less than the minimum support threshold, to obtain the frequent 1-itemsets;
wherein the ratio of the number of 1s in a Boolean array to the length of the Boolean array is taken as the support.
Preferably, the method further comprises:
the candidate frequent (k+1)-itemsets and their corresponding Boolean arrays are obtained by merging, without duplication, the frequent k-itemsets and their Boolean arrays with the frequent 1-itemsets and their Boolean arrays;
during the duplicate-free merging, a logical AND is applied to the logical values at the same positions of the Boolean array of the frequent k-itemset and the Boolean array of the frequent 1-itemset, yielding the Boolean array of the candidate frequent (k+1)-itemset;
calculating the support of all the candidate frequent (k+1)-itemsets, and rejecting the (k+1)-itemsets whose support is less than the minimum support threshold, to obtain the frequent (k+1)-itemsets.
Preferably, the method further comprises:
during the duplicate-free merging,
ending the mining process when the set of frequent (k+1)-itemsets is determined to be empty.
Preferably, the duplicate-free merging comprises:
if the candidate frequent (k+1)-itemset obtained by a merge has not appeared before, marking this (k+1)-itemset as "merged"; in subsequent merging, any merge that would yield an identical itemset is skipped.
The fast association rule mining method of the present invention has the following advantages:
1. The method of the present invention for searching for and detecting frequent itemsets scans the transaction database D only once, when generating the 1-itemset table. Compared with the classical Apriori algorithm and most other association rule algorithms, which read the transaction database repeatedly, this greatly reduces the I/O overhead caused by reading the transaction database;
2. When frequent itemsets are generated, candidate itemsets need not be produced first: the frequent k-itemsets are generated directly from the frequent 1-itemsets and the frequent (k-1)-itemsets. Compared with the FP-growth method, which likewise needs only a single scan of the transaction database but must compress the transaction database into a frequent pattern tree (FP-tree), the method consumes less memory;
3. The dominant computation in this method is the logical AND operation, which matches the low-level operations of a computer; software designed in this way not only runs fast but is also economical in its use of CPU and memory.
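As a rough illustration of advantage 3, the sketch below packs a Boolean array into a single integer so that one bitwise AND processes every record at once. The integer packing and the helper names `to_mask` and `support_count` are assumptions made for this example; the text itself only specifies Boolean arrays and a logical AND. The two arrays are those given for items A and B in the embodiment.

```python
# Illustrative sketch: packing the Boolean arrays into Python integers is an
# assumption of this example, not something the method prescribes.

def to_mask(bits: str) -> int:
    """Parse a Boolean array written as a '0'/'1' string into an integer."""
    return int(bits, 2)


def support_count(mask: int) -> int:
    """Count the 1 bits, i.e. the number of records containing the itemset."""
    return bin(mask).count("1")


a = to_mask("100110111")  # item A over the 9 records
b = to_mask("111101011")  # item B over the 9 records

ab = a & b                    # a single bitwise AND covers all 9 records
ab_bits = format(ab, "09b")   # back to a 9-position '0'/'1' string
ab_count = support_count(ab)  # number of records containing both A and B

print(ab_bits, ab_count)
```

With this representation the support of {A, B} is `ab_count` divided by the number of records, 9.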
Embodiment
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
The invention discloses a method for the fast detection of frequent itemsets, comprising:
S11, scanning a transaction database and, according to the records in the transaction database, obtaining all 1-itemsets in the transaction database;
S12, calculating the support of each 1-itemset, and obtaining the frequent 1-itemsets whose support is not less than a minimum support threshold;
S13, merging the frequent k-itemsets with the frequent 1-itemsets without duplication, to generate the frequent (k+1)-itemsets whose support is not less than the minimum support threshold;
wherein k is an integer greater than 0.
Further, the method also comprises:
each 1-itemset corresponds to a Boolean array whose length is the total number of records in the transaction database, and the positions of the Boolean array correspond one-to-one to the records of the transaction database, in the order in which the records appear in the transaction database;
if a record in the transaction database contains the item of the 1-itemset, the logical value at the position corresponding to that record is set to 1; otherwise it is set to 0;
calculating the support of all the 1-itemsets, and rejecting the 1-itemsets whose support is less than the minimum support threshold, to obtain the frequent 1-itemsets;
wherein the ratio of the number of 1s in a Boolean array to the length of the Boolean array is taken as the support.
Further, the method also comprises:
the candidate frequent (k+1)-itemsets and their corresponding Boolean arrays are obtained by merging, without duplication, the frequent k-itemsets and their Boolean arrays with the frequent 1-itemsets and their Boolean arrays;
during the duplicate-free merging, a logical AND is applied to the logical values at the same positions of the Boolean array of the frequent k-itemset and the Boolean array of the frequent 1-itemset, yielding the Boolean array of the candidate frequent (k+1)-itemset;
calculating the support of all the candidate frequent (k+1)-itemsets, and rejecting the (k+1)-itemsets whose support is less than the minimum support threshold, to obtain the frequent (k+1)-itemsets.
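The position-wise AND described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names `and_arrays` and `support` are hypothetical, and the input arrays are taken from the embodiment (the frequent 2-itemset {A, B} and the frequent 1-itemset {C}).

```python
from fractions import Fraction


def and_arrays(x, y):
    """Position-wise logical AND of two Boolean arrays of equal length."""
    if len(x) != len(y):
        raise ValueError("both arrays must cover the same records")
    return [p & q for p, q in zip(x, y)]


def support(arr):
    """Ratio of the number of 1s to the length of the array."""
    return Fraction(sum(arr), len(arr))


ab = [1, 0, 0, 1, 0, 0, 0, 1, 1]  # Boolean array of the 2-itemset {A, B}
c = [0, 0, 1, 0, 1, 1, 1, 1, 1]   # Boolean array of the 1-itemset {C}

abc = and_arrays(ab, c)           # candidate 3-itemset {A, B, C}
abc_support = support(abc)

print(abc, abc_support)
```

A 1 survives only at positions where both input arrays hold a 1, so the result marks exactly the records that contain every item of the merged itemset.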
Further, during the duplicate-free merging,
the mining process ends when the set of frequent (k+1)-itemsets is determined to be empty.
Preferably, the duplicate-free merging comprises:
if the candidate frequent (k+1)-itemset obtained by a merge has not appeared before, marking this (k+1)-itemset as "merged"; in subsequent merging, any merge that would yield an identical itemset is skipped.
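One possible reading of the "merged" marking is a set of itemsets that have already been produced, as in the sketch below. The use of `frozenset` keys and the name `merge_without_duplicates` are assumptions of this example; the text does not prescribe a data structure for the marking.

```python
def merge_without_duplicates(frequent_k, frequent_1):
    """Return each distinct (k+1)-item candidate once, in first-seen order."""
    merged = set()    # itemsets already marked "merged"
    candidates = []
    for k_items in frequent_k:
        for item in frequent_1:
            if item in k_items:
                continue  # the merge would not yield k+1 items
            candidate = frozenset(k_items) | {item}
            if candidate in merged:
                continue  # identical itemset seen before: skip the merge
            merged.add(candidate)
            candidates.append(candidate)
    return candidates


# Hypothetical input: three 2-itemsets that all lead to the same 3-itemset.
frequent_2 = [("A", "B"), ("A", "C"), ("B", "C")]
frequent_1 = ["A", "B", "C"]

result = merge_without_duplicates(frequent_2, frequent_1)
print(result)
```

Here {A, B} + C, {A, C} + B and {B, C} + A all give {A, B, C}; only the first merge is kept, so the AND operation and support count are performed once for that itemset.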
Further, a preferred embodiment of the method is disclosed as follows:
Table 1: Transaction database D
TID | Itemset
T001 | A, B, E
T002 | B, D
T003 | B, C
T004 | A, B, D
T005 | A, C
T006 | B, C
T007 | A, C
T008 | A, B, C, E
T009 | A, B, C
The following 1-itemset table (Table 2) is built according to the following rules.
Scan the entire transaction database D and build a table of 1-itemsets from all the items that appear in D.
Because the database contains the items A, B, C, D and E, the table has 5 rows. The table has 3 columns: the first is the sequence number, the second is the item name, and the third is a Boolean array. The array is built as follows: its length is 9 (the number of records n in D), and each element takes its value thus: if the corresponding item is present in the i-th record (1 ≤ i ≤ n) of the transaction database D, the i-th element of the array is set to the true value 1; otherwise it is set to 0.
Table 2: 1-itemset table
Sequence number | Item name | Boolean array
1 | A | 100110111
2 | B | 111101011
3 | C | 001011111
4 | D | 010100000
5 | E | 100000010
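The construction of Table 2 from Table 1 in a single pass over the records can be sketched as below. The names `transactions` and `build_item_table`, and the choice to write each Boolean array as a string of '0' and '1' characters, are assumptions of this example.

```python
# Table 1, in record order; the order fixes the position of each bit.
transactions = [
    ("T001", {"A", "B", "E"}),
    ("T002", {"B", "D"}),
    ("T003", {"B", "C"}),
    ("T004", {"A", "B", "D"}),
    ("T005", {"A", "C"}),
    ("T006", {"B", "C"}),
    ("T007", {"A", "C"}),
    ("T008", {"A", "B", "C", "E"}),
    ("T009", {"A", "B", "C"}),
]


def build_item_table(records):
    """Map each item to its Boolean array, one position per record."""
    items = sorted(set().union(*(itemset for _, itemset in records)))
    return {
        item: "".join("1" if item in itemset else "0" for _, itemset in records)
        for item in items
    }


item_table = build_item_table(transactions)
for number, (item, bits) in enumerate(item_table.items(), start=1):
    print(number, item, bits)
```

After this step the transaction database is no longer needed: every later support value is computed from the Boolean arrays alone.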
The following frequent 1-itemset table (Table 3) is built according to the following rules.
Count the true values in the Boolean array of the first record of the 1-itemset table and divide this count by 9, the number of records in the transaction database D, to obtain the support of that 1-itemset;
Set the minimum support threshold to 2/9;
If the support is not less than the given minimum support threshold, mark the 1-itemset as a frequent itemset;
Apply the above process to all records in the 1-itemset table to obtain the frequent 1-itemset table.
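Table 3: Frequent 1-itemset table
Sequence number | 1-itemset name | Boolean array | Support
1 | A | 100110111 | 6/9
2 | B | 111101011 | 7/9
3 | C | 001011111 | 6/9
4 | D | 010100000 | 2/9
5 | E | 100000010 | 2/9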
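The filtering step for frequent 1-itemsets can be sketched as follows. The comparison is taken here as "not less than" the threshold, in line with the statement of the method, so that D and E (support exactly 2/9) are retained as they are in Table 6. The variable names and the use of `fractions.Fraction` for exact arithmetic are assumptions of this example.

```python
from fractions import Fraction

# Boolean arrays of the 1-itemsets, as listed in Table 2.
item_table = {
    "A": "100110111",
    "B": "111101011",
    "C": "001011111",
    "D": "010100000",
    "E": "100000010",
}

MIN_SUPPORT = Fraction(2, 9)


def support(bits):
    """Number of 1s divided by the length of the Boolean array."""
    return Fraction(bits.count("1"), len(bits))


# Keep every 1-itemset whose support is not less than the threshold.
frequent_1 = {
    item: support(bits)
    for item, bits in item_table.items()
    if support(bits) >= MIN_SUPPORT
}

# For contrast only: a strict comparison would discard D and E.
strict_only = [i for i, bits in item_table.items() if support(bits) > MIN_SUPPORT]

print(frequent_1)
print(strict_only)
```

Exact fractions avoid any rounding doubt when a support value sits exactly on the threshold, as it does for D and E.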
The following frequent 2-itemset table (Table 4) is built according to the following rules.
Apply a logical AND to the corresponding elements of the Boolean arrays of the i-th record and the j-th record in the frequent 1-itemset table (1 ≤ i ≤ 5, 1 ≤ j ≤ 5 and i ≠ j) to obtain a new Boolean array;
Count the true values in this Boolean array and divide this count by 9, the number of records in the transaction database D, to obtain the support of this 2-itemset;
If the support is not less than the given minimum support threshold 2/9, mark the 2-itemset formed by the items of the i-th and j-th records of the frequent 1-itemset table as a frequent itemset;
After the loops over i and j are complete, the table of all frequent 2-itemsets is obtained.
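Table 4: Frequent 2-itemset table
Sequence number | 2-itemset name | Boolean array | Support
1 | A, B | 100100011 | 4/9
2 | A, C | 000010111 | 4/9
3 | A, E | 100000010 | 2/9
4 | B, C | 001001011 | 4/9
5 | B, D | 010100000 | 2/9
6 | B, E | 100000010 | 2/9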
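The generation of frequent 2-itemsets can be sketched as below. As a simplification, this example uses `itertools.combinations` to visit each unordered pair once, instead of looping over all i ≠ j and discarding repeats; that shortcut, like the names `and_bits` and `frequent_2`, is an assumption of the sketch.

```python
from fractions import Fraction
from itertools import combinations

# Boolean arrays of the frequent 1-itemsets.
frequent_1 = {
    "A": "100110111",
    "B": "111101011",
    "C": "001011111",
    "D": "010100000",
    "E": "100000010",
}

MIN_SUPPORT = Fraction(2, 9)


def and_bits(x, y):
    """Position-wise logical AND of two '0'/'1' strings of equal length."""
    return "".join("1" if p == q == "1" else "0" for p, q in zip(x, y))


frequent_2 = {}
for (i, bits_i), (j, bits_j) in combinations(frequent_1.items(), 2):
    bits = and_bits(bits_i, bits_j)
    # Keep the pair when its support is not less than the threshold.
    if Fraction(bits.count("1"), len(bits)) >= MIN_SUPPORT:
        frequent_2[(i, j)] = bits

for pair, bits in frequent_2.items():
    print(pair, bits, f"{bits.count('1')}/9")
```

Of the ten possible pairs, four ({A, D}, {C, D}, {C, E}, {D, E}) fall below 2/9 and are rejected.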
The following frequent 3-itemset table (Table 5) is built according to the following rules.
Given the frequent 1-itemsets and the frequent k-itemsets, the frequent (k+1)-itemsets (k ≥ 2) can be generated as follows:
Consider the result of merging the items of the i-th record in the frequent k-itemset table with the item of the j-th record in the frequent 1-itemset table (for k = 2, 1 ≤ i ≤ 6 and 1 ≤ j ≤ 5):
If the merged result is a (k+1)-itemset that has not been merged before, mark this (k+1)-itemset as "merged"; in later merging, an identical itemset is not processed again.
Apply a logical AND to the Boolean array of the i-th record in the frequent k-itemset table and the Boolean array of the j-th record in the frequent 1-itemset table to obtain a new Boolean array;
Count the true values in this Boolean array and divide this count by 9, the number of records in the transaction database D, to obtain the support of this (k+1)-itemset;
If the support is not less than the given minimum support threshold 2/9, mark this (k+1)-itemset as a frequent itemset;
After the loops over i and j are complete, the table of all frequent (k+1)-itemsets is obtained.
Table 5: Frequent 3-itemset table
Sequence number | 3-itemset name | Boolean array | Support
1 | A, B, C | 000000011 | 2/9
2 | A, B, E | 100000010 | 2/9
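The step from frequent k-itemsets to frequent (k+1)-itemsets can be sketched as a single function. The name `next_level`, the `frozenset` keys and the string form of the Boolean arrays are assumptions of this example; the inputs are the frequent 1-itemsets and frequent 2-itemsets of the embodiment.

```python
from fractions import Fraction


def and_bits(x, y):
    """Position-wise logical AND of two '0'/'1' strings of equal length."""
    return "".join("1" if p == q == "1" else "0" for p, q in zip(x, y))


def next_level(frequent_k, frequent_1, min_support):
    """Merge frequent k-itemsets with frequent 1-itemsets, without duplication."""
    merged = set()   # (k+1)-itemsets already marked "merged"
    result = {}
    for k_items, k_bits in frequent_k.items():
        for item, item_bits in frequent_1.items():
            if item in k_items:
                continue  # the merge would not yield k+1 items
            candidate = k_items | {item}
            if candidate in merged:
                continue  # identical itemset already processed
            merged.add(candidate)
            bits = and_bits(k_bits, item_bits)
            if Fraction(bits.count("1"), len(bits)) >= min_support:
                result[candidate] = bits
    return result


frequent_1 = {"A": "100110111", "B": "111101011", "C": "001011111",
              "D": "010100000", "E": "100000010"}
frequent_2 = {frozenset("AB"): "100100011", frozenset("AC"): "000010111",
              frozenset("AE"): "100000010", frozenset("BC"): "001001011",
              frozenset("BD"): "010100000", frozenset("BE"): "100000010"}

frequent_3 = next_level(frequent_2, frequent_1, Fraction(2, 9))
frequent_4 = next_level(frequent_3, frequent_1, Fraction(2, 9))
print(frequent_3, frequent_4)
```

Applying the same function to the frequent 3-itemsets returns an empty result, which is the stopping condition of the method.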
According to the above rules, no frequent 4-itemset can be generated from the frequent 1-itemsets and the frequent 3-itemsets, so the recursion stops. Combining Tables 3, 4 and 5 gives all the frequent itemsets searched for and detected in the transaction database D, namely the frequent itemset table (Table 6).
Table 6: Frequent itemset table
Sequence number | Itemset name | Boolean array | Support
1 | A | 100110111 | 6/9
2 | B | 111101011 | 7/9
3 | C | 001011111 | 6/9
4 | D | 010100000 | 2/9
5 | E | 100000010 | 2/9
6 | A, B | 100100011 | 4/9
7 | A, C | 000010111 | 4/9
8 | A, E | 100000010 | 2/9
9 | B, C | 001001011 | 4/9
10 | B, D | 010100000 | 2/9
11 | B, E | 100000010 | 2/9
12 | A, B, C | 000000011 | 2/9
13 | A, B, E | 100000010 | 2/9
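The whole procedure of the embodiment, from the single scan of D to the empty level that stops the loop, can be put together as in the sketch below. It is offered as one possible arrangement under stated assumptions, not as the definitive implementation: the function name `mine`, the `frozenset` keys and the list form of the Boolean arrays are all choices made for this example.

```python
from collections import Counter
from fractions import Fraction


def mine(records, min_support):
    """Return {itemset: Boolean array} for every frequent itemset."""
    n = len(records)

    def is_frequent(arr):
        return Fraction(sum(arr), n) >= min_support

    # Single scan of the database: one Boolean array per item.
    items = sorted(set().union(*records))
    level_1 = {frozenset([i]): [int(i in r) for r in records] for i in items}
    level_1 = {s: a for s, a in level_1.items() if is_frequent(a)}

    all_frequent = dict(level_1)
    current = level_1
    while current:  # stop when a level comes back empty
        merged, following = set(), {}
        for k_set, k_arr in current.items():
            for one_set, one_arr in level_1.items():
                candidate = k_set | one_set
                if len(candidate) != len(k_set) + 1 or candidate in merged:
                    continue  # not a new (k+1)-itemset
                merged.add(candidate)
                arr = [p & q for p, q in zip(k_arr, one_arr)]
                if is_frequent(arr):
                    following[candidate] = arr
        all_frequent.update(following)
        current = following
    return all_frequent


# Transaction database D from Table 1, in record order.
D = [{"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
     {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}]

frequent = mine(D, Fraction(2, 9))
sizes = Counter(len(s) for s in frequent)
print(len(frequent), dict(sizes))
```

On the database of Table 1 with threshold 2/9 this yields the same 13 itemsets as Table 6: five 1-itemsets, six 2-itemsets and two 3-itemsets.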