CN103678530A - Rapid detection method of frequent item sets - Google Patents

Publication number
CN103678530A
Authority
CN
China
Prior art keywords
frequent
collection
item
array
boolean
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310632561.9A
Other languages
Chinese (zh)
Inventor
江潮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUHAN TRANSN INFORMATION TECHNOLOGY Co Ltd
Original Assignee
WUHAN TRANSN INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WUHAN TRANSN INFORMATION TECHNOLOGY Co Ltd
Priority to CN201310632561.9A
Publication of CN103678530A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for rapid detection of frequent item sets. The method comprises: scanning a transaction database and obtaining all 1-item sets in it from its records; calculating the support of each 1-item set to obtain the frequent 1-item sets whose support is not less than a minimum support threshold; and merging frequent k-item sets with the frequent 1-item sets without repetition to generate the frequent (k+1)-item sets whose support is not less than the minimum support threshold, where k is an integer greater than 0. The method reduces the amount of data a computer must process when deriving association rules and greatly improves processing efficiency.

Description

A method for rapid detection of frequent item sets
Technical field
The present invention relates to the field of computing, and in particular to a method for rapid detection of frequent item sets.
Background art
Data association is an important class of discoverable knowledge in databases. If some regularity exists between the values of two or more variables, it is called an association. Association rule mining looks for interesting associations or correlations between item sets in large volumes of data, and it is an important problem in data mining. Association rules focus on determining the relationships between different fields in the data and on finding interesting connections in a given data set. The association function of the data in a database is often unknown and uncertain, and association rule mining is a self-learning process through which unknown, useful rules can be discovered.
At present, association rules are mined mainly by computer. The dominant computation in association rule mining is the mining of frequent item sets. Mining frequent item sets with the general Apriori algorithm requires scanning the entire transaction database repeatedly, so mining efficiency is low when the volume of data is very large. Improving the efficiency of frequent item set mining for association rules and reducing the amount of data the computer must process therefore remain a focus of research.
Summary of the invention
The present invention aims to provide a method for rapid detection of frequent item sets, to solve the problem of inefficient frequent item set mining in the prior art described above.
The invention discloses a method for rapid detection of frequent item sets, comprising: scanning a transaction database and, according to the records in the transaction database, obtaining all 1-item sets in the transaction database;
calculating the support of each 1-item set to obtain the frequent 1-item sets whose support is not less than the minimum support threshold;
merging frequent k-item sets with frequent 1-item sets without repetition to generate the frequent (k+1)-item sets whose support is not less than the minimum support threshold;
wherein k is an integer greater than 0.
Preferably, the method further comprises:
each 1-item set corresponds to a Boolean array whose length is the total number of records in the transaction database, and the positions of the Boolean array correspond one-to-one to the records of the transaction database, in the order of the records in the transaction database;
if a record in the transaction database contains the item of this 1-item set, the logical value at the position corresponding to that record is set to 1; otherwise it is set to 0;
calculating the support of all the 1-item sets and rejecting the 1-item sets whose support is less than the minimum support threshold, to obtain the frequent 1-item sets;
wherein the ratio of the number of 1s in the Boolean array to the length of the Boolean array is taken as the support.
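The Boolean array and the support ratio defined above can be sketched in a few lines of Python. This is only an illustration: the function names boolean_array and support and the three-record database are assumptions made for the example, not part of the patent text.

```python
from fractions import Fraction

def boolean_array(item, records):
    """One position per record, in record order: 1 if the record contains item."""
    return [1 if item in record else 0 for record in records]

def support(bits):
    """Ratio of the number of 1s to the length of the Boolean array."""
    return Fraction(sum(bits), len(bits))

# Hypothetical three-record database, for illustration only.
records = [{"x", "y"}, {"y"}, {"x", "y", "z"}]
x_bits = boolean_array("x", records)
x_support = support(x_bits)
print(x_bits, x_support)
```

Using Fraction keeps the support exact, so a comparison against a threshold such as 2/9 does not depend on floating-point rounding.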
Preferably, the method further comprises:
the candidate frequent (k+1)-item sets and their corresponding Boolean arrays are obtained by merging, without repetition, the frequent k-item sets and their Boolean arrays with the frequent 1-item sets and their Boolean arrays;
during the merging without repetition, a logical AND is applied to the logical values at the same positions of the Boolean array of the frequent k-item set and the Boolean array of the frequent 1-item set, yielding the Boolean array of the candidate frequent (k+1)-item set;
calculating the support of all the candidate frequent (k+1)-item sets, and rejecting the (k+1)-item sets whose support is less than the minimum support threshold, to obtain the frequent (k+1)-item sets.
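A minimal sketch of the position-wise AND and the subsequent pruning might look as follows. The helper names and_merge and prune, the item names p and q, and the four-record arrays are hypothetical and serve only to make the step concrete.

```python
def and_merge(k_bits, one_bits):
    """Position-wise logical AND of two equal-length Boolean arrays."""
    if len(k_bits) != len(one_bits):
        raise ValueError("Boolean arrays must cover the same records")
    return [a & b for a, b in zip(k_bits, one_bits)]

def prune(candidates, min_support):
    """Keep candidates whose share of 1s is not less than min_support."""
    return {items: bits for items, bits in candidates.items()
            if sum(bits) / len(bits) >= min_support}

# Hypothetical Boolean arrays over four records.
p_bits = [1, 1, 0, 1]
q_bits = [1, 0, 0, 1]
candidates = {frozenset("pq"): and_merge(p_bits, q_bits)}
frequent = prune(candidates, min_support=0.5)
print(frequent)
```

A position of the merged array is 1 only when the same record contains every item of both operands, which is why the AND result is the Boolean array of the merged item set.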
Preferably, the method further comprises:
during the merging without repetition,
ending the mining process when the set of frequent (k+1)-item sets obtained is determined to be empty.
Preferably, the merging without repetition comprises:
if the candidate frequent (k+1)-item set obtained after a merge has not appeared before, marking this (k+1)-item set as "merged"; in subsequent merging, an item set identical to it is not merged again.
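One way to realise the "merged" mark is a set of item sets already produced. The sketch below assumes that representation; the generator name merge_without_repetition and the sample items are illustrative only.

```python
def merge_without_repetition(frequent_k, frequent_1):
    """Yield each distinct (k+1)-item set once, however many merges produce it."""
    merged = set()                      # item sets already marked "merged"
    for k_items in frequent_k:
        for item in frequent_1:
            if item in k_items:         # merge would not add a new item
                continue
            candidate = frozenset(k_items) | {item}
            if candidate in merged:     # identical item set seen before: skip
                continue
            merged.add(candidate)
            yield candidate

# Hypothetical frequent 2-item sets and frequent 1-item sets.
pairs = [("p", "q"), ("p", "r"), ("q", "r")]
singles = ["p", "q", "r"]
result = list(merge_without_repetition(pairs, singles))
print(result)
```

Here the three pairs could each be extended to the same 3-item set, but it is emitted only for the first merge that produces it.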
The method of the present invention for rapidly mining association rules has the following advantages:
1. The method of the present invention for searching for and detecting frequent item sets needs to scan the transaction database D only once, when the 1-item set table is generated. Compared with the classic Apriori algorithm and most other association rule algorithms, which read the transaction database repeatedly, this greatly reduces the I/O overhead incurred by reading the transaction database;
2. When frequent item sets are generated, candidate items need not be produced first; the frequent k-item sets are generated directly from the frequent 1-item sets and the frequent (k-1)-item sets. Compared with the FP-growth method, which likewise needs only a single scan of the transaction database but must compress the transaction database into a frequent pattern tree (FP-tree), the method consumes less memory;
3. The most expensive computation in this method is the logical AND operation, which matches the low-level operation mode of the computer; software designed on this basis therefore not only runs fast but is also economical in its use of CPU and memory.
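To illustrate why the logical AND maps well onto machine operations, a Boolean array can be packed into a single integer so that one bitwise AND covers every record at once. The packing scheme, the helper names pack and count_ones, and the two nine-position arrays below are assumptions chosen for the example; the patent does not prescribe a storage layout.

```python
def pack(bits):
    """Pack a list of 0/1 values into one integer, first record as the top bit."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

def count_ones(mask):
    """Number of set bits, i.e. the number of records supporting the item set."""
    return bin(mask).count("1")

# Two illustrative nine-position Boolean arrays.
x_mask = pack([1, 0, 0, 1, 1, 0, 1, 1, 1])
y_mask = pack([1, 1, 1, 1, 0, 1, 0, 1, 1])
xy_mask = x_mask & y_mask            # a single AND covers all nine records
xy_bits = format(xy_mask, "09b")
xy_count = count_ones(xy_mask)
print(xy_bits, xy_count)
```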
Brief description of the drawings
The drawings described here provide a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 shows the flow chart of the embodiment.
Embodiment
The invention is described in detail below with reference to the drawings and in conjunction with the embodiments.
The invention discloses a method for rapid detection of frequent item sets, comprising:
S11, scanning a transaction database and, according to the records in the transaction database, obtaining all 1-item sets in the transaction database;
S12, calculating the support of each 1-item set to obtain the frequent 1-item sets whose support is not less than the minimum support threshold;
S13, merging frequent k-item sets with frequent 1-item sets without repetition to generate the frequent (k+1)-item sets whose support is not less than the minimum support threshold;
wherein k is an integer greater than 0.
Further, the method also comprises:
each 1-item set corresponds to a Boolean array whose length is the total number of records in the transaction database, and the positions of the Boolean array correspond one-to-one to the records of the transaction database, in the order of the records in the transaction database;
if a record in the transaction database contains the item of this 1-item set, the logical value at the position corresponding to that record is set to 1; otherwise it is set to 0;
calculating the support of all the 1-item sets and rejecting the 1-item sets whose support is less than the minimum support threshold, to obtain the frequent 1-item sets;
wherein the ratio of the number of 1s in the Boolean array to the length of the Boolean array is taken as the support.
Further, the method also comprises:
the candidate frequent (k+1)-item sets and their corresponding Boolean arrays are obtained by merging, without repetition, the frequent k-item sets and their Boolean arrays with the frequent 1-item sets and their Boolean arrays;
during the merging without repetition, a logical AND is applied to the logical values at the same positions of the Boolean array of the frequent k-item set and the Boolean array of the frequent 1-item set, yielding the Boolean array of the candidate frequent (k+1)-item set;
calculating the support of all the candidate frequent (k+1)-item sets, and rejecting the (k+1)-item sets whose support is less than the minimum support threshold, to obtain the frequent (k+1)-item sets.
Further, during the merging without repetition,
the mining process ends when the set of frequent (k+1)-item sets obtained is determined to be empty.
Preferably, the merging without repetition comprises:
if the candidate frequent (k+1)-item set obtained after a merge has not appeared before, marking this (k+1)-item set as "merged"; in subsequent merging, an item set identical to it is not merged again.
Further, a preferred embodiment of the method is disclosed as follows:
Table 1: transaction database D
TID    Item set
T001   A, B, E
T002   B, D
T003   B, C
T004   A, B, D
T005   A, C
T006   B, C
T007   A, C
T008   A, B, C, E
T009   A, B, C
The 1-item set table (Table 2) is built according to the following rules.
Scan the entire transaction database D and build a 1-item set table based on all the items in D.
Because D contains the items A, B, C, D and E, the table has length 5. The table has 3 columns: the first column is the sequence number, the second is the item name, and the third is a Boolean array. The array is built as follows: its length is 9, and each element takes its value by this rule: if the corresponding item is present in the i-th record (1 ≤ i ≤ n) of the transaction database D, the i-th element of the array is assigned the logical value 1; otherwise it is 0.
Table 2: 1-item set table
No.   Item   Boolean array
1     A      100110111
2     B      111101011
3     C      001011111
4     D      010100000
5     E      100000010
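As a check on Table 2, the Boolean arrays can be rebuilt from Table 1 in a single pass over the records. The sketch below is one possible way to do this; the dictionary layout and the function name one_item_table are assumptions made for the example.

```python
# Table 1, keyed by transaction ID (Python dicts preserve insertion order).
D = {
    "T001": "ABE", "T002": "BD", "T003": "BC",
    "T004": "ABD", "T005": "AC", "T006": "BC",
    "T007": "AC", "T008": "ABCE", "T009": "ABC",
}

def one_item_table(db):
    """Single pass over the records; returns {item: bit string}."""
    table = {}
    for position, record in enumerate(db.values()):
        for item in record:
            if item not in table:
                table[item] = ["0"] * len(db)   # new item: start with all zeros
            table[item][position] = "1"
    return {item: "".join(table[item]) for item in sorted(table)}

table2 = one_item_table(D)
for number, (item, bits) in enumerate(table2.items(), start=1):
    print(number, item, bits)
```

The i-th character of each bit string corresponds to the i-th record of D, matching the rule stated for the third column of the table.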
The frequent 1-item set table (Table 3) is built by the following rules.
Count the true values in the Boolean array of the first record of the 1-item set table and divide this count by 9, the number of records in the transaction database D, to obtain the support of that 1-item set;
Set the minimum support threshold to 2/9;
If the support is not less than the given minimum support threshold, mark this 1-item set as a frequent item set;
Apply the above process to all records in the 1-item set table to obtain the frequent 1-item set table.
Table 3: frequent 1-item set table
[Table 3 is supplied as an image in the original publication: Figure BDA0000427072410000081]
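The support calculation and filtering for the 1-item sets can be sketched as below, starting from the Boolean arrays of the 1-item set table. The comparison is "not less than", as in the claims: items D and E sit exactly at 2/9 and are listed as frequent in Table 6. The names MIN_SUPPORT, support and frequent_1 are illustrative.

```python
from fractions import Fraction

MIN_SUPPORT = Fraction(2, 9)

# Boolean arrays of the five 1-item sets, as bit strings.
table2 = {
    "A": "100110111", "B": "111101011", "C": "001011111",
    "D": "010100000", "E": "100000010",
}

def support(bits):
    """Number of 1s divided by the number of records."""
    return Fraction(bits.count("1"), len(bits))

frequent_1 = {item: bits for item, bits in table2.items()
              if support(bits) >= MIN_SUPPORT}
supports = {item: support(bits) for item, bits in frequent_1.items()}
for item, value in supports.items():
    print(item, frequent_1[item], value)
```

Note that Fraction reduces 6/9 to 2/3 when printing; the value is the same as the 6/9 shown in Table 6.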
The frequent 2-item set table (Table 4) is built by the following rules.
Apply a logical AND to the corresponding elements of the Boolean arrays of the i-th record and the j-th record in the frequent 1-item set table (1 ≤ i ≤ 5, 1 ≤ j ≤ 5 and i ≠ j) to obtain a new Boolean array;
Count the true values in this Boolean array and divide this count by 9, the number of records in the transaction database D, to obtain the support of this 2-item set;
If the support is not less than the given minimum support threshold 2/9, mark the 2-item set formed from the items of the i-th record and the j-th record of the frequent 1-item set table as a frequent item set;
After the loops over i and j complete, the table of all frequent 2-item sets is obtained.
Table 4: frequent 2-item set table
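The pairwise step can be sketched as follows, with the Boolean arrays held as integers. As a simplification assumed here, itertools.combinations visits each unordered pair once instead of looping over all i ≠ j and discarding repeats; the threshold 2/9 is expressed as a record count of 2. The result agrees with rows 6 to 11 of Table 6.

```python
from itertools import combinations

# Boolean arrays of the frequent 1-item sets, written as binary literals.
frequent_1 = {
    "A": 0b100110111, "B": 0b111101011, "C": 0b001011111,
    "D": 0b010100000, "E": 0b100000010,
}
N_RECORDS = 9
MIN_COUNT = 2   # support 2/9 expressed as a number of records

frequent_2 = {}
for (x, x_mask), (y, y_mask) in combinations(frequent_1.items(), 2):
    mask = x_mask & y_mask                     # records containing both items
    if bin(mask).count("1") >= MIN_COUNT:
        frequent_2[x + y] = format(mask, f"0{N_RECORDS}b")

for name, bits in frequent_2.items():
    print(name, bits, f"{bits.count('1')}/{N_RECORDS}")
```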
The frequent 3-item set table (Table 5) is built by the following rules.
Given the frequent 1-item sets and the frequent k-item sets, the frequent (k+1)-item sets can be generated as follows (k ≥ 2):
Examine the result of merging the items of the i-th record in the frequent k-item set table with the item of the j-th record in the frequent 1-item set table (1 ≤ i ≤ 6, 1 ≤ j ≤ 5):
If the result of the merge is a (k+1)-item set and this (k+1)-item set has not been merged before, mark it as "merged"; in subsequent merging, an identical item set is not processed again.
Apply a logical AND to the Boolean array of the i-th record in the frequent k-item set table and the Boolean array of the j-th record in the frequent 1-item set table to obtain a new Boolean array;
Count the true values in this Boolean array and divide this count by 9, the number of records in the transaction database D, to obtain the support of this (k+1)-item set;
If the support is not less than the given minimum support threshold 2/9, mark this (k+1)-item set as a frequent item set;
After the loops over i and j complete, the table of all frequent (k+1)-item sets is obtained.
Table 5: frequent 3-item set table
No.   3-item set   Boolean array   Support
1     A, B, C      000000011       2/9
2     A, B, E      100000010       2/9
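The general k to k+1 step, including the "merged" mark, can be sketched as below and applied to the frequent 2-item sets of the example. The function name next_level, the integer representation of the Boolean arrays, and the record-count threshold are assumptions made for the illustration.

```python
frequent_1 = {
    "A": 0b100110111, "B": 0b111101011, "C": 0b001011111,
    "D": 0b010100000, "E": 0b100000010,
}
frequent_2 = {
    "AB": 0b100100011, "AC": 0b000010111, "AE": 0b100000010,
    "BC": 0b001001011, "BD": 0b010100000, "BE": 0b100000010,
}
MIN_COUNT = 2   # support 2/9 expressed as a number of records

def next_level(frequent_k, frequent_1, min_count):
    """Merge every k-item set with every 1-item set, each result only once."""
    merged = set()                      # item sets already marked "merged"
    result = {}
    for k_items, k_mask in frequent_k.items():
        for item, item_mask in frequent_1.items():
            candidate = frozenset(k_items) | {item}
            if len(candidate) != len(k_items) + 1 or candidate in merged:
                continue                # not a (k+1)-item set, or seen before
            merged.add(candidate)
            mask = k_mask & item_mask
            if bin(mask).count("1") >= min_count:
                result["".join(sorted(candidate))] = mask
    return result

frequent_3 = next_level(frequent_2, frequent_1, MIN_COUNT)
table5 = {name: format(mask, "09b") for name, mask in frequent_3.items()}
frequent_4 = next_level(frequent_3, frequent_1, MIN_COUNT)
print(table5, frequent_4)
```

Applying the same function to the 3-item sets returns an empty result, which is the stopping condition described in the text.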
By the above rules, no frequent 4-item set can be generated from the frequent 1-item sets and the frequent 3-item sets, so the recursion stops. Combining Tables 3, 4 and 5 gives all the frequent item sets found and detected in the transaction database D, listed in the frequent item set table (Table 6).
Table 6: frequent item set table
No.   Item set   Boolean array   Support
1     A          100110111       6/9
2     B          111101011       7/9
3     C          001011111       6/9
4     D          010100000       2/9
5     E          100000010       2/9
6     A, B       100100011       4/9
7     A, C       000010111       4/9
8     A, E       100000010       2/9
9     B, C       001001011       4/9
10    B, D       010100000       2/9
11    B, E       100000010       2/9
12    A, B, C    000000011       2/9
13    A, B, E    100000010       2/9
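Putting the steps together, the whole flow from one scan of D to the final table can be sketched as a single function. This is an illustrative reading of the described procedure, not the applicant's implementation; the function name mine, the integer bitmask representation and the record-count threshold are assumptions.

```python
def mine(db, min_count):
    """Return {frozenset(items): bitmask} for every frequent item set in db."""
    n = len(db)
    masks = {}
    for position, record in enumerate(db):        # the single scan of the database
        for item in record:
            masks[item] = masks.get(item, 0) | (1 << (n - 1 - position))
    ones = {item: m for item, m in sorted(masks.items())
            if bin(m).count("1") >= min_count}    # frequent 1-item sets
    level = {frozenset([item]): m for item, m in ones.items()}
    found = dict(level)
    while level:                                  # stop when a level is empty
        merged, nxt = set(), {}
        for k_items, k_mask in level.items():
            for item, item_mask in ones.items():
                candidate = k_items | {item}
                if item in k_items or candidate in merged:
                    continue
                merged.add(candidate)             # mark as "merged"
                mask = k_mask & item_mask
                if bin(mask).count("1") >= min_count:
                    nxt[candidate] = mask
        found.update(nxt)
        level = nxt
    return found

# Records of Table 1, in order T001 to T009.
D = ["ABE", "BD", "BC", "ABD", "AC", "BC", "AC", "ABCE", "ABC"]
result = mine(D, min_count=2)
for items, mask in result.items():
    print("".join(sorted(items)), format(mask, "09b"), f"{bin(mask).count('1')}/9")
```

Run on the example database with a threshold of 2 records, the sketch yields the same 13 item sets and Boolean arrays as Table 6.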

Claims (5)

1. A method for rapid detection of frequent item sets, characterized by comprising: scanning a transaction database and, according to the records in the transaction database, obtaining all 1-item sets in the transaction database;
calculating the support of each 1-item set to obtain the frequent 1-item sets whose support is not less than the minimum support threshold;
merging frequent k-item sets with frequent 1-item sets without repetition to generate the frequent (k+1)-item sets whose support is not less than the minimum support threshold;
wherein k is an integer greater than 0.
2. The method according to claim 1, characterized by further comprising:
each 1-item set corresponds to a Boolean array whose length is the total number of records in the transaction database, and the positions of the Boolean array correspond one-to-one to the records of the transaction database, in the order of the records in the transaction database;
if a record in the transaction database contains the item of this 1-item set, the logical value at the position corresponding to that record is set to 1; otherwise it is set to 0;
calculating the support of all the 1-item sets and rejecting the 1-item sets whose support is less than the minimum support threshold, to obtain the frequent 1-item sets;
wherein the ratio of the number of 1s in the Boolean array to the length of the Boolean array is taken as the support.
3. The method according to claim 2, characterized by further comprising:
the candidate frequent (k+1)-item sets and their corresponding Boolean arrays are obtained by merging, without repetition, the frequent k-item sets and their Boolean arrays with the frequent 1-item sets and their Boolean arrays;
during the merging without repetition, a logical AND is applied to the logical values at the same positions of the Boolean array of the frequent k-item set and the Boolean array of the frequent 1-item set, yielding the Boolean array of the candidate frequent (k+1)-item set;
calculating the support of all the candidate frequent (k+1)-item sets, and rejecting the (k+1)-item sets whose support is less than the minimum support threshold, to obtain the frequent (k+1)-item sets.
4. The method according to claim 1, characterized by further comprising:
during the merging without repetition,
ending the mining process when the set of frequent (k+1)-item sets obtained is determined to be empty.
5. The method according to claim 1, characterized in that the merging without repetition comprises:
if the candidate frequent (k+1)-item set obtained after a merge has not appeared before, marking this (k+1)-item set as "merged"; in subsequent merging, an item set identical to it is not merged again.
CN201310632561.9A 2013-11-30 2013-11-30 Rapid detection method of frequent item sets Pending CN103678530A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310632561.9A CN103678530A (en) 2013-11-30 2013-11-30 Rapid detection method of frequent item sets

Publications (1)

Publication Number Publication Date
CN103678530A true CN103678530A (en) 2014-03-26

Family

ID=50316075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310632561.9A Pending CN103678530A (en) 2013-11-30 2013-11-30 Rapid detection method of frequent item sets

Country Status (1)

Country Link
CN (1) CN103678530A (en)

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 430070 East Lake Hubei Development Zone, Optics Valley Software Park, a phase of the west, South Lake Road South, Optics Valley Software Park, No. 2, No. 5, layer 205, six

Applicant after: Language network (Wuhan) Information Technology Co., Ltd.

Address before: 430073 East Lake Hubei Development Zone, Optics Valley Software Park, a phase of the west, South Lake Road South, Optics Valley Software Park, No. 2, No. 5, layer 205, six

Applicant before: Wuhan Transn Information Technology Co., Ltd.

COR Change of bibliographic data
RJ01 Rejection of invention patent application after publication

Application publication date: 20140326