CN103678530A

CN103678530A - Rapid detection method of frequent item sets

Info

Publication number: CN103678530A
Application number: CN201310632561.9A
Authority: CN
Inventors: 江潮
Original assignee: WUHAN TRANSN INFORMATION TECHNOLOGY Co Ltd
Current assignee: WUHAN TRANSN INFORMATION TECHNOLOGY Co Ltd
Priority date: 2013-11-30
Filing date: 2013-11-30
Publication date: 2014-03-26

Abstract

The invention discloses a method for rapid detection of frequent item sets. The method comprises: scanning a transaction database and obtaining all 1-item sets from its records; calculating the support of each 1-item set and keeping, as frequent 1-item sets, those whose support is not less than a minimum support threshold; and merging the frequent k-item sets with the frequent 1-item sets without repetition to generate the frequent (k+1)-item sets whose support is not less than the minimum support threshold, where k is an integer greater than 0. The method reduces the amount of data a computer must process when mining association rules and greatly improves processing efficiency.

Description

A method for rapid detection of frequent item sets

Technical field

The present invention relates to the field of computing, and in particular to a method for rapid detection of frequent item sets.

Background technology

Data association is an important class of discoverable knowledge in databases. If some regularity exists between the values of two or more variables, it is called an association. Association rule mining finds interesting associations or correlations between item sets in large volumes of data, and is an important problem in data mining. Association rules focus on determining the relationships between different fields in the data and on finding interesting connections within a given data set. The association function of the data in a database is often unknown or uncertain; association rule mining is a self-learning process through which unknown but useful rules can be found.

At present, association rule mining is carried out mainly by computer. In the computation involved, the dominant cost is mining the frequent item sets. Mining frequent item sets with the general Apriori algorithm requires scanning the whole transaction database repeatedly, so mining efficiency is low when the volume of data is very large. Improving the efficiency of frequent item set mining for association rules and reducing the amount of data the computer must process therefore remain a focus of research.

Summary of the invention

The present invention aims to provide a method for rapid detection of frequent item sets, to solve the problem of inefficient frequent item set mining in the prior art described above.

The invention discloses a method for rapid detection of frequent item sets, comprising: scanning a transaction database and, according to the records in the transaction database, obtaining all 1-item sets in the transaction database;

calculating the support of each 1-item set to obtain the frequent 1-item sets, whose support is not less than a minimum support threshold;

performing a non-repeating merge of the frequent k-item sets with the frequent 1-item sets to generate the frequent (k+1)-item sets, whose support is not less than the minimum support threshold;

wherein k is an integer greater than 0.

Preferably, the method further comprises:

Each 1-item set corresponds to a Boolean array whose length is the total number of records in the transaction database; the positions of the Boolean array correspond one-to-one to the records of the transaction database, in the order in which the records appear;

if a record in the transaction database contains the item of the 1-item set, the logical value at the position corresponding to that record is set to 1; otherwise it is set to 0;

calculating the support of all the 1-item sets and rejecting those whose support is less than the minimum support threshold, to obtain the frequent 1-item sets;

wherein the support is the ratio of the number of 1s in the Boolean array to the length of the Boolean array.
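The Boolean-array bookkeeping described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `build_boolean_arrays`, `support` and `frequent_1_item_sets`, the use of lists of 0/1 values, and the three-record toy database are all invented for the example.

```python
from fractions import Fraction


def build_boolean_arrays(transactions):
    """Scan the database once and return {item: [0/1 per record]}."""
    n = len(transactions)
    arrays = {}
    for pos, record in enumerate(transactions):
        for item in record:
            # Create the array on first sight of the item, then set this record's position.
            arrays.setdefault(item, [0] * n)[pos] = 1
    return arrays


def support(bits):
    """Support = number of 1s divided by the array length."""
    return Fraction(sum(bits), len(bits))


def frequent_1_item_sets(transactions, min_support):
    """Keep the 1-item sets whose support is not less than min_support."""
    arrays = build_boolean_arrays(transactions)
    return {item: bits for item, bits in arrays.items()
            if support(bits) >= min_support}


# Hypothetical toy database with three records (not from the patent).
toy_db = [{"x", "y"}, {"y"}, {"y", "z"}]
toy_arrays = build_boolean_arrays(toy_db)
toy_frequent = frequent_1_item_sets(toy_db, Fraction(2, 3))
```

With a threshold of 2/3, only `y` survives in this toy database, because `x` and `z` each appear in one record out of three. Exact fractions are used so that a support equal to the threshold is compared without rounding error.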

Preferably, the method further comprises:

The candidate frequent (k+1)-item sets and their Boolean arrays are obtained by a non-repeating merge of the frequent k-item sets and their Boolean arrays with the frequent 1-item sets and their Boolean arrays;

during the non-repeating merge, a logical AND is applied to the values at the same position in the Boolean array of the frequent k-item set and the Boolean array of the frequent 1-item set, giving the Boolean array of the candidate frequent (k+1)-item set;

calculating the support of all the candidate frequent (k+1)-item sets, and rejecting those (k+1)-item sets whose support is less than the minimum support threshold, to obtain the frequent (k+1)-item sets.
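The merge step just described can be sketched as follows. The helper names `and_arrays` and `extend` are hypothetical; the two Boolean arrays are those of items A and B from Table 2 of the embodiment.

```python
from fractions import Fraction


def and_arrays(a, b):
    """Position-wise logical AND of two equal-length 0/1 arrays."""
    return [x & y for x, y in zip(a, b)]


def support(bits):
    """Support = number of 1s divided by the array length."""
    return Fraction(sum(bits), len(bits))


def extend(k_item_set, k_bits, item, item_bits):
    """Merge a frequent k-item set with a frequent 1-item set.

    Returns (candidate (k+1)-item set, its Boolean array), or None when
    the item is already in the k-item set and nothing new would be added.
    """
    if item in k_item_set:
        return None
    candidate = frozenset(k_item_set) | {item}
    return candidate, and_arrays(k_bits, item_bits)


# Boolean arrays of A and B as listed in Table 2 of the embodiment.
bits_a = [1, 0, 0, 1, 1, 0, 1, 1, 1]
bits_b = [1, 1, 1, 1, 0, 1, 0, 1, 1]
candidate, bits_ab = extend({"A"}, bits_a, "B", bits_b)
```

Here `bits_ab` has a 1 exactly where a record contains both A and B, so its support (4/9) is read off the array without going back to the transaction database.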

Preferably, the method further comprises:

during the non-repeating merge,

if the resulting set of frequent (k+1)-item sets is found to be empty, ending the mining process.

Preferably, the non-repeating merge comprises:

if the candidate frequent (k+1)-item set obtained by a merge has not appeared before, marking this (k+1)-item set as "merged"; in subsequent merging, any merge that would yield an identical item set is skipped.
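One way to realise the "merged" mark is a set of item sets that have already been produced. The sketch below assumes this representation; the generator name `non_repeating_merge` is invented for the example, and support checking is left out so that only the de-duplication is shown.

```python
def non_repeating_merge(frequent_k, frequent_1):
    """Yield each candidate (k+1)-item set exactly once.

    frequent_k: iterable of frozensets (the frequent k-item sets)
    frequent_1: iterable of single items (the frequent 1-item sets)
    """
    merged = set()  # item sets already marked as "merged"
    for k_set in frequent_k:
        for item in frequent_1:
            if item in k_set:
                continue  # the merge would not add a new item
            candidate = k_set | {item}
            if candidate in merged:
                continue  # identical item set reached by another route
            merged.add(candidate)
            yield candidate


pairs = list(non_repeating_merge([frozenset("A"), frozenset("B")],
                                 ["A", "B", "C"]))
```

In this small run, {A, B} is produced when A is merged with B and is then skipped when B is merged with A, so `pairs` holds {A, B}, {A, C} and {B, C} once each.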

The method for rapid association rule mining in the present invention has the following advantages:

1. The method for searching for and detecting frequent item sets scans the transaction database D only once, when generating the 1-item set table. Compared with the classical Apriori algorithm and most other association rule algorithms, which read the transaction database repeatedly, this greatly reduces the I/O overhead of reading the transaction database.

2. When generating frequent item sets, candidate items need not be produced first; the frequent k-item sets are generated directly from the frequent 1-item sets and the frequent (k-1)-item sets. Compared with the FP-growth method, which likewise needs only a single scan of the transaction database but must compress the transaction database into a frequent pattern tree (FP-tree), the method consumes less memory.

3. The dominant computational cost in this method is the logical AND operation, which matches the low-level operations of a computer; software designed in this way not only runs fast but is also economical in CPU and memory use.
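To illustrate advantage 3, each Boolean array can be packed into a single integer so that one bitwise AND handles every record at once. The packing scheme and the helper names `pack` and `count_ones` are assumptions for this sketch; the patent does not prescribe a storage layout. The two arrays are those of items A and C in Table 2.

```python
def pack(bit_string):
    """Pack a string such as '100110111' into an integer bit mask."""
    return int(bit_string, 2)


def count_ones(mask):
    """Number of set bits in the mask."""
    return bin(mask).count("1")


a = pack("100110111")  # Boolean array of A in Table 2
c = pack("001011111")  # Boolean array of C in Table 2
ac = a & c             # a single bitwise AND covers all nine records
ac_string = format(ac, "09b")  # back to a nine-character bit string
```

The result has four set bits, which corresponds to the support 4/9 listed for {A, C} in Table 6.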

Description of the drawings

The accompanying drawing described here provides a further understanding of the present invention and forms part of this application. The illustrative embodiment of the invention and its description serve to explain the invention and do not unduly limit it. In the drawings:

Fig. 1 shows the flowchart of the embodiment.

Embodiment

The present invention is described in detail below with reference to the accompanying drawing and in conjunction with the embodiment.

The invention discloses a method for rapid detection of frequent item sets, comprising:

S11: scanning a transaction database and, according to the records in the transaction database, obtaining all 1-item sets in the transaction database;

S12: calculating the support of each 1-item set to obtain the frequent 1-item sets, whose support is not less than a minimum support threshold;

S13: performing a non-repeating merge of the frequent k-item sets with the frequent 1-item sets to generate the frequent (k+1)-item sets, whose support is not less than the minimum support threshold;

wherein k is an integer greater than 0.

Further, the method also comprises:

Further, during the non-repeating merge,

Preferably, the non-repeating merge comprises:

Further, a preferred embodiment of the method is disclosed, as follows:

Table 1: Transaction database D

TID	Item set
T001	A, B, E
T002	B, D
T003	B, C
T004	A, B, D
T005	A, C
T006	B, C
T007	A, C
T008	A, B, C, E
T009	A, B, C

The 1-item set table (Table 2) is built according to the following rules.

Scan the whole transaction database D and build a 1-item set table from all the items in D.

Because the item set contains the items A, B, C, D and E, the table has 5 rows. The table has 3 columns: the first is the sequence number, the second is the item name, and the third is a Boolean array. The array is built as follows: its length is 9 (n, the number of records in D), and if the corresponding item is present in the i-th record (1 ≤ i ≤ n) of transaction database D, the i-th element of the array is set to the true value 1; otherwise it is set to 0.

Table 2: 1-item set table

Sequence number	Item name	Boolean array
1	A	100110111
2	B	111101011
3	C	001011111
4	D	010100000
5	E	100000010
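As a check on the table above, the Boolean arrays can be recomputed from the records of Table 1 in a single pass. The dictionary layout and the function name `one_item_table` are assumptions made for this sketch.

```python
# Transaction database D from Table 1 (TID -> items, one letter per item).
D = {
    "T001": "ABE", "T002": "BD", "T003": "BC",
    "T004": "ABD", "T005": "AC", "T006": "BC",
    "T007": "AC", "T008": "ABCE", "T009": "ABC",
}


def one_item_table(db):
    """Return {item: bit string}, one character per record in TID order."""
    records = [db[tid] for tid in sorted(db)]
    items = sorted(set("".join(records)))
    return {item: "".join("1" if item in rec else "0" for rec in records)
            for item in items}


table2 = one_item_table(D)
for number, (item, bits) in enumerate(table2.items(), start=1):
    print(number, item, bits, sep="\t")
```

The i-th character of each bit string is 1 when record i contains the item, which matches the construction rule stated for the third column.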

The frequent 1-item set table (Table 3) is built by the following rules.

Count the true values in the Boolean array of the first record of the 1-item set table and divide this count by 9, the number of records in transaction database D; the result is the support of that 1-item set;

Set the minimum support threshold to 2/9;

If the support is not less than the given minimum support threshold, mark the 1-item set as a frequent item set;

Carry out the above process for every record in the 1-item set table to obtain the frequent 1-item set table.

Table 3: Frequent 1-item set table

The frequent 2-item set table (Table 4) is built by the following rules.

Apply a logical AND to the corresponding elements of the Boolean arrays of the i-th and j-th records of the frequent 1-item set table (1 ≤ i ≤ 5, 1 ≤ j ≤ 5, i ≠ j) to obtain a new Boolean array;

Count the true values in this Boolean array and divide the count by 9, the number of records in transaction database D, to obtain the support of the 2-item set;

If the support is not less than the given minimum support threshold 2/9, mark the 2-item set formed from the items of the i-th and j-th records of the frequent 1-item set table as a frequent item set;

After the loop over i and j completes, the frequent 2-item set table is obtained.
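The pairwise procedure above can be sketched as follows, using the Boolean arrays of Table 2 and the threshold 2/9. One simplification is assumed: instead of looping over all ordered pairs with i ≠ j, the sketch visits each unordered pair once via `itertools.combinations`, which yields the same item sets. The names `ONE_ITEM`, `and_bits` and `frequent_2` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Boolean arrays of the 1-item sets (Table 2); every one of them has
# support of at least 2/9, so all five are frequent.
ONE_ITEM = {
    "A": "100110111", "B": "111101011", "C": "001011111",
    "D": "010100000", "E": "100000010",
}
MIN_SUPPORT = Fraction(2, 9)


def and_bits(x, y):
    """Position-wise logical AND of two equal-length bit strings."""
    return "".join("1" if p == q == "1" else "0" for p, q in zip(x, y))


frequent_2 = {}
for i, j in combinations(sorted(ONE_ITEM), 2):  # each unordered pair once
    bits = and_bits(ONE_ITEM[i], ONE_ITEM[j])
    sup = Fraction(bits.count("1"), len(bits))
    if sup >= MIN_SUPPORT:  # "not less than" the threshold
        frequent_2[i + j] = (bits, sup)
```

Of the ten possible pairs, six pass the threshold: AB, AC, AE, BC, BD and BE. These are the 2-item sets that appear in Table 6.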

Table 4: Frequent 2-item set table

The frequent 3-item set table (Table 5) is built by the following rules.

Given the frequent 1-item sets and the frequent k-item sets, the frequent (k+1)-item sets can be generated as follows (k ≥ 2):

Consider the result of merging the items of the i-th record of the frequent k-item set table with the item of the j-th record of the frequent 1-item set table (1 ≤ i ≤ 6, 1 ≤ j ≤ 5):

If the result is a (k+1)-item set that has not been merged before, mark this (k+1)-item set as "merged"; in later merges, an identical item set is not processed again.

Apply a logical AND to the Boolean array of the i-th record of the frequent k-item set table and the Boolean array of the j-th record of the frequent 1-item set table to obtain a new Boolean array;

Count the true values in this Boolean array and divide the count by 9, the number of records in transaction database D, to obtain the support of this (k+1)-item set;

If the support is not less than the given minimum support threshold 2/9, mark this (k+1)-item set as a frequent item set;

After the loop over i and j completes, the frequent (k+1)-item set table is obtained.

Table 5: Frequent 3-item set table

Sequence number	3-item set	Boolean array	Support
1	A, B, C	000000011	2/9
2	A, B, E	100000010	2/9

By the above rules, no frequent 4-item set can be generated from the frequent 1-item sets and the frequent 3-item sets, so the recursion stops. Combining Tables 3, 4 and 5 gives all the frequent item sets found in transaction database D, listed in the frequent item set table (Table 6).

Table 6: Frequent item set table

Sequence number	Item set	Boolean array	Support
1	A	100110111	6/9
2	B	111101011	7/9
3	C	001011111	6/9
4	D	010100000	2/9
5	E	100000010	2/9
6	A, B	100100011	4/9
7	A, C	000010111	4/9
8	A, E	100000010	2/9
9	B, C	001001011	4/9
10	B, D	010100000	2/9
11	B, E	100000010	2/9
12	A, B, C	000000011	2/9
13	A, B, E	100000010	2/9
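The whole embodiment can be replayed with a short end-to-end sketch. It is an illustration under assumptions rather than the patented implementation: the function `mine`, the integer bit-mask representation and the variable names are invented here. The data is Table 1 and the threshold is 2/9.

```python
from fractions import Fraction

# Transaction database D from Table 1, in TID order.
D = [set("ABE"), set("BD"), set("BC"), set("ABD"), set("AC"),
     set("BC"), set("AC"), set("ABCE"), set("ABC")]


def count_ones(mask):
    return bin(mask).count("1")


def mine(db, min_support):
    """Return {frozenset: bit mask} for every frequent item set."""
    n = len(db)
    need = min_support * n  # minimum number of 1s required
    # Single scan: one bit mask per item, leftmost bit = first record.
    masks = {}
    for pos, record in enumerate(db):
        for item in record:
            masks[item] = masks.get(item, 0) | (1 << (n - 1 - pos))
    ones = {i: m for i, m in masks.items() if count_ones(m) >= need}
    level = {frozenset([i]): m for i, m in ones.items()}
    result = dict(level)
    while level:  # ends when no frequent (k+1)-item set is produced
        merged, nxt = set(), {}
        for k_set, k_mask in level.items():
            for item, mask in ones.items():
                cand = k_set | {item}
                if item in k_set or cand in merged:
                    continue  # nothing new, or already marked "merged"
                merged.add(cand)
                m = k_mask & mask  # logical AND of the two arrays
                if count_ones(m) >= need:
                    nxt[cand] = m
        result.update(nxt)
        level = nxt
    return result


found = mine(D, Fraction(2, 9))
for s, m in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(", ".join(sorted(s)), format(m, f"0{len(D)}b"),
          f"{count_ones(m)}/{len(D)}", sep="\t")
```

On this data the sketch finds 13 frequent item sets and stops at the 4-item level, because the only remaining candidate that shares a record, {A, B, C, E}, occurs in a single record. The database is read only in the first loop; every later level works on the bit masks alone.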

Claims

1. A method for rapid detection of frequent item sets, characterized by comprising: scanning a transaction database and, according to the records in the transaction database, obtaining all 1-item sets in the transaction database;

wherein k is an integer greater than 0.

2. The method according to claim 1, characterized by further comprising:

calculating the support of all the 1-item sets, and rejecting those 1-item sets whose support is less than the minimum support threshold, to obtain the frequent 1-item sets;

3. The method according to claim 2, characterized by further comprising:

4. The method according to claim 1, characterized by further comprising:

during the non-repeating merge,

5. The method according to claim 1, characterized in that the non-repeating merge comprises: