CN105719155A

CN105719155A - Association rule algorithm based on Apriori improved algorithm

Info

Publication number: CN105719155A
Application number: CN201510583227.8A
Authority: CN
Inventors: 高曾荣; 何新; 应国力; 王建宇; 周宇浩; 袁侃辉
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2015-09-14
Filing date: 2015-09-14
Publication date: 2016-06-29

Abstract

The invention discloses an association rule algorithm based on an improved Apriori algorithm. First, a transaction database D is preprocessed: the data records are simplified and then read entirely into memory. The step in which candidate sets are generated by joining and pruning frequent itemsets is improved so that the candidate itemsets are generated directly. After the candidate sets are obtained, the database is scanned to calculate support. Because both the candidate sets and the transaction database D are sorted, when a candidate is searched for in each transaction T (that is, each record), the search of that transaction stops as soon as a value greater than the candidate item being sought is found. The improved Apriori algorithm is applied to a pharmacy management system, and the results show that its performance is clearly better than that of the conventional algorithm, that it is simple to operate, and that it better meets practical needs.

Description

An association rule algorithm based on an improved Apriori algorithm

Technical field

The present invention relates to the fields of data mining and algorithm analysis, and in particular to data mining technology.

Background art

In China, with the development and spread of information technology, data warehouse technology has also entered a period of rapid development, but a large gap remains compared with developed countries. Competition in the domestic retail market is fierce, and competition in the commodity market is harsher still. Although some Chinese enterprises have gradually moved to computerized warehouse management, there is at present no mature domestic data warehouse technology. The result is a situation of "massive data, poor information": large volumes of data cannot be used reasonably and effectively, which is a great loss for enterprises. More importantly, domestic data warehouse applications have not been deployed enterprise-wide, and no large system providing a unified, coordinated global information environment has been established. Some domestic retail chain enterprises have a growing demand for data warehouses, and their complete information management systems and massive stored data provide the basis for creating one. Some enterprises have introduced advanced foreign technology, and others develop data warehouses independently. More and more managers recognize that only through a data warehouse can they truly and fully understand company information and make reasonable plans and decisions. Designing a data mining analysis scheme that improves the management efficiency and economic benefit of retailers is therefore particularly urgent.

Among current data mining analysis schemes, the classic Apriori association rule algorithm (R. Agrawal, T. Imielinski, A. Swami. Mining association rules between sets of items in large databases [M]. In SIGMOD '93, Washington, D.C., 1993: 207-216) generates frequent itemsets by iteratively searching the database. It plays an important role in exploring the internal relationships of transactions and is an important branch of data mining research. However, as data volumes grow, the demands on algorithm performance rise, and the shortcomings of Apriori have gradually been exposed. The traditional Apriori algorithm has two bottlenecks: first, repeated database scans create a heavy I/O load; second, a huge candidate set is produced. Both are great challenges to the time and space consumed by the algorithm.

Summary of the invention

The object of the invention is to provide an association rule algorithm based on an improved Apriori algorithm, which uses the improved Apriori algorithm to manage and analyze the large volume of commodity data held by retail stores and retailer warehouses. Starting from an analysis of the classic Apriori association rule algorithm, an improved algorithm based on candidate-set medicine association is designed in order to improve the performance of the algorithm.

The technical scheme that achieves the object of the invention is an association rule algorithm based on an improved Apriori algorithm. A transaction database D is first preprocessed, and the data records are simplified and then read entirely into memory. The step in which candidate sets are generated by joining and pruning frequent itemsets is improved so that candidate itemsets are generated directly. After the candidate sets are obtained, the database is scanned to calculate support. Because both the candidate sets and the transaction database D are sorted, when a candidate is searched for in each transaction T (that is, each record), the search of that transaction can stop as soon as a value greater than the candidate item is found.

Compared with the traditional Apriori algorithm, the notable advantages of the invention are that the algorithm reduces the number of database scans, shortens the time needed to generate candidate sets, and improves running efficiency. The invention can perform association-rule-based data mining on a retailer's sales data and give enterprise managers and suppliers an effective basis for decisions. It helps them analyze medicine demand in different regions and dynamically adjust the types and quantities of goods for different locations such as commercial districts, residential areas and office areas, so as to meet growing demand for personalized customization, improve the retailer's working efficiency and cut operating costs.

Brief description of the drawings

Fig. 1 is a flow chart of mining based on the improved Apriori algorithm in the present invention.

Fig. 2 shows the prescription table database in the present invention.

Fig. 3 shows the numbering of medicines during data mining in the present invention.

Fig. 4 shows the numbering of prescriptions during data mining in the present invention.

Fig. 5 is an example of medicine storage-location guidance in the present invention.

Detailed description of the invention

The present invention is described in further detail below with reference to the accompanying drawings.

1. Bottleneck analysis of the Apriori algorithm

The classic Apriori association rule algorithm generates frequent itemsets by iteratively searching the database. It plays an important role in exploring the internal relationships of transactions and is an important branch of data mining research. However, as data volumes grow, the demands on algorithm performance rise, and the shortcomings of Apriori have gradually been exposed.

The Apriori algorithm has the following two performance bottlenecks:

(1) Repeated database scans produce a heavy I/O load

First, in the k-th iteration, for every candidate k-itemset produced by the join, the frequent (k-1)-itemsets produced in the previous iteration must be scanned for each of its (k-1)-item subsets to judge whether that subset is frequent. Second, every element of the candidate set requires a database scan to calculate its support and determine whether it meets the minimum support, so that the frequent itemsets can be determined. If a candidate set contains N elements, the database must be scanned at least N times, which inevitably produces a heavy I/O load.

(2) A huge candidate set is produced

The number of itemsets grows exponentially when frequent (k-1)-itemsets are joined to produce candidate k-itemsets. For example, joining 10^4 frequent 1-itemsets produces on the order of 10^7 candidate 2-itemsets. A candidate set of this size is a great challenge to both the time and the space of the algorithm.
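As a quick check of the scale described in bottleneck (2), the number of pairs that can be formed from n frequent 1-itemsets is C(n, 2). The following Python sketch evaluates it for n = 10^4; the function name candidate_pairs is illustrative only and is not part of the invention.

```python
import math


def candidate_pairs(n: int) -> int:
    """Number of 2-itemsets obtained by pairing n frequent 1-itemsets."""
    return math.comb(n, 2)


pairs = candidate_pairs(10**4)
print(pairs)  # 49995000, i.e. on the order of 10^7
```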

2. Improvement of the Apriori algorithm

Because these two bottlenecks limit the performance of Apriori, many researchers have improved the algorithm. Their algorithms all build on the basic theory and ideas of Apriori and introduce related techniques to raise its running efficiency and execution speed. The performance improvement described here addresses both of the problems above.

To reduce the number of database scans, the data can first be preprocessed: the data records are simplified and then read entirely into memory, which reduces or even eliminates I/O overhead during computation. The support of every item in the transaction database is counted, and the items are sorted by support from high to low and numbered (1, 2, ..., n). At the same time, the items below the minimum support are removed from the transaction database D, and each transaction T is rearranged by number to generate a new database. The sorting of the data plays an important role in several later steps.

In Apriori, generating the candidate set by joining and pruning frequent itemsets is one of the most time-consuming parts of the whole algorithm, so this step is improved. The improved algorithm does not first generate C_k' and then prune it against the frequent (k-1)-itemsets to obtain C_k; it produces C_k directly. Every non-empty subset of a frequent itemset is itself frequent, so whether all subsets of a candidate are frequent can be judged at the moment the candidate is generated, and a table makes this judgment fast. When candidate k-itemsets are generated, the frequent (k-1)-itemsets that share the same first (k-2) items are merged into one row: the row number is the first (k-2) items, and the row set consists of the last item of each (k-1)-itemset. The items in a row set are combined in order from front to back, giving C_n^2 combinations (n is the number of items in the set). For each combination, the preceding item is combined in turn with the items of the current row number to find the corresponding row numbers (if the row number contains only one item, no combining is needed), and the consequent item is checked against each of those row sets. If it is present, the itemset is a candidate k-itemset; if not, it is discarded immediately. The candidate set is thus obtained directly.
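The row-table construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: itemsets are assumed to be ascending tuples of item numbers, the function name generate_candidates and the dictionary layout are hypothetical, and the small input used in the usage lines is made up for illustration.

```python
from itertools import combinations


def generate_candidates(frequent_prev):
    """Build candidate k-itemsets directly from frequent (k-1)-itemsets.

    frequent_prev: iterable of ascending tuples, all of length k-1.
    """
    # Row table: row number (first k-2 items) -> row set (last items).
    rows = {}
    for itemset in frequent_prev:
        rows.setdefault(itemset[:-1], set()).add(itemset[-1])

    candidates = []
    for prefix in sorted(rows):
        # Combine the items of the row set in order: C(n, 2) pairs with a < b.
        for a, b in combinations(sorted(rows[prefix]), 2):
            keep = True
            # Replace each item of the row number by the preceding item `a`
            # and check that the consequent `b` is in that row's set.
            for i in range(len(prefix)):
                row_key = prefix[:i] + prefix[i + 1:] + (a,)
                if b not in rows.get(row_key, ()):
                    keep = False
                    break
            if keep:
                candidates.append(prefix + (a, b))
    return candidates


# Hypothetical frequent 2-itemsets: row 1 holds {2, 3, 4}, row 2 holds {3, 4}.
example_l2 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)]
print(generate_candidates(example_l2))  # [(1, 2, 3), (1, 2, 4)]
```

In this illustrative input, {1,3,4} and {2,3,4} are rejected because there is no row 3, that is, {3,4} is not frequent. Every (k-1)-subset of an accepted candidate is checked through a dictionary lookup, so no separate pruning pass over C_k' is needed.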

For example, candidate 3-itemsets are generated from frequent 2-itemsets. The frequent 2-itemsets are {{1,2}, {1,3}, {1,4}, {1,5}, {2,3}, {2,4}, {3,5}, {4,5}}. The itemsets whose first (k-2) items (here, the first item) are identical are tabulated together; the candidate combination judgment table is shown in Table 1:

Table 1 Candidate set combination table

Combination starts from the set of row 1. The first group is {2,3}, {2,4}, {2,5}: the preceding item "2" locates row "2"; the consequents "3" and "4" are in its set but "5" is not, so {1,2,3} and {1,2,4} are in the candidate set. The second group is {3,4}, {3,5}: the preceding item "3" locates row "3"; the consequent "5" is in its set, so {1,3,5} is in the candidate set. The third group is {4,5}: the preceding item "4" locates row "4"; the consequent "5" is in its set, so {1,4,5} is in the candidate set. Rows 2, 3 and 4 are handled in the same way, and the candidate set C_k = {{1,2,3}, {1,2,4}, {1,3,5}, {1,4,5}, {2,3,5}, {2,4,5}} is finally obtained directly. Because the row numbers and row sets are sorted, the search can stop as soon as a value greater than the one being sought appears, which saves time.

After the candidate set is obtained, the database must be scanned to calculate support. Because both the candidate set and the transaction database are sorted, when a candidate is searched for in a transaction T, finding a value in T greater than the candidate item being sought means that T does not contain the candidate, and the search of that transaction can stop. For example, when the 3-itemset {1,2,3} is searched for in the transaction T = {1,2,4,5,7}, "1" is found, "2" is found, and while searching for "3" the value "4" is met in T, which is greater than "3". T therefore does not contain this 3-itemset and the search stops, which greatly reduces database search time.
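The early-termination search can be sketched as a single forward pass over the sorted transaction. The Python below is an illustrative sketch only; the function name contains_sorted is an assumption, and both arguments are assumed to be ascending lists of item numbers.

```python
def contains_sorted(transaction, candidate):
    """Return True if every item of `candidate` occurs in `transaction`.

    Both sequences must be sorted in ascending order.
    """
    i = 0
    for item in candidate:
        # Skip transaction values smaller than the item being sought.
        while i < len(transaction) and transaction[i] < item:
            i += 1
        # Transaction exhausted, or a larger value met: the item is absent.
        if i == len(transaction) or transaction[i] > item:
            return False
        i += 1
    return True


print(contains_sorted([1, 2, 4, 5, 7], [1, 2, 3]))  # False: "4" is met while seeking "3"
print(contains_sorted([1, 2, 4, 5, 7], [1, 2, 4]))  # True
```

The index i only ever moves forward, so each transaction is traversed at most once per candidate, and the traversal is cut short at the first value that exceeds the item being sought.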

Algorithm steps:

(1) Scan the database D and preprocess the data: write all data into memory, calculate and sort the support of all items, and number the data items in that order.

(2) Replace the data in the transaction database with their numbers, remove the items whose support is below the minimum support, sort, and set up a new database.

(3) Generate the candidate k-itemsets according to the improved algorithm.

(4) Search for the candidates in the sorted transaction database and calculate their support; remove the itemsets whose support is below the configured minimum support to obtain the frequent k-itemsets.

(5) Repeat steps (2) to (4) to produce the maximal frequent itemsets.

(6) Obtain all frequent itemsets and generate the association rules.

(7) Map the association rules back to the original data.

Algorithm example: let the transaction database D be as shown in Table 2. It has 10 transactions and 7 items. With a minimum support of 30%, the minimum support count is 10 × 30% = 3. The database is mined with the improved algorithm:

Table 2 Transaction database D

TID	Items
1	B,C,A
2	A,C,E,F
3	B,A,E,C,D
4	A,C,E,D,F
5	B,C,E,F
6	C,D,E,F,G
7	A,E,F,B
8	A,C,D,E,G
9	A,B,C,E
10	D,F,E

(1) Scan the database and preprocess the data: calculate and sort the support of all items, and number the data items in that order, as shown in Table 3.

Table 3 Numbering of the transaction database

(2) Replace the data in the transaction database with their numbers, remove the items whose support count is below 3, sort, and set up a new database, as shown in Table 4.

Table 4 New database

TID	Items
1	2,3,5
2	1,2,3,4
3	1,2,3,5,6
4	1,2,3,4,6
5	1,2,4,5
6	1,2,4,6
7	1,3,4,5
8	1,2,3,6
9	1,2,3,5
10	1,4,6

(3) Generation of L₂: candidate 2-itemsets C₂ are generated from the frequent 1-itemsets, and support is calculated to obtain the frequent 2-itemsets L₂, as shown in Tables 5 and 6.

(4) Generation of L₃: following the improved algorithm, candidate 3-itemsets C₃ are generated from the frequent 2-itemsets L₂, and the frequent 3-itemsets L₃ are those whose support count is greater than or equal to 3.

Table 7 Candidate 3-itemset combinations

As shown in Table 7, the itemsets with the same first item are merged into one row of the table, with the first item as the row number and the second items as the set contents. Combination starts from the first row. The first group, with "2" as the preceding item, gives {2,3}, {2,4}, {2,5}, {2,6}; the consequents are all in the set of row "2", so all are in the candidate set. The second group, with "3" as the preceding item, gives {3,4}, {3,5}, {3,6}; the consequents are all in the set of row "3", so all are in the candidate set. The third group, with "4" as the preceding item, gives {4,5}, {4,6}; "5" is not in the set of row "4", so that combination is deleted. Rows two, three and four are combined and judged in the same way, giving the candidate 3-itemsets shown in the table below. Support is then calculated and the itemsets whose support count is below 3 are removed, giving the frequent 3-itemsets L₃, as shown in Tables 8 and 9.

(5) Generation of L₄:

Table 10 Candidate 4-itemset combinations

Combining candidate 4-itemsets is more complex, but the method is the same as for 3-itemsets. As shown in Table 10, the itemsets with the same first two items are merged into one row of the table, with the first two items as the row number and the third items forming the set. Combination starts from the first row. The first group, with "3" as the preceding item, gives {3,4}, {3,5}, {3,6}. To judge the consequents, the preceding item must be combined with each item of the row number to find the corresponding rows, here {1,3} and {2,3}. The consequent "4" is in the set of row {1,3} but not in the set of row {2,3}, so it is deleted; the consequents "5" and "6" are in the sets of both rows {1,3} and {2,3}, so they are in the candidate set. All rows are combined and judged in the same way to obtain the complete set of candidate 4-itemsets C₄. Support is then calculated and the itemsets whose support count is below 3 are removed, giving the frequent 4-itemsets L₄, as shown in Tables 11 and 12.

The frequent 4-itemset finally obtained is {1,2,3,6}. Mapping it back to the original database through the numbering table gives the frequent 4-itemset {A, C, D, E}.
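The worked example can be replayed end to end with a short Python sketch that strings together preprocessing, direct candidate generation and the early-termination support count. It is a hedged illustration rather than the implementation described in this document (which was written in C++Builder): all function names are hypothetical, and ties in support are broken alphabetically, which is an assumption.

```python
from collections import Counter
from itertools import combinations


def preprocess(transactions, min_count):
    """Number frequent items 1..n by descending support and rewrite each transaction."""
    counts = Counter(item for t in transactions for item in set(t))
    ranked = sorted((i for i in counts if counts[i] >= min_count),
                    key=lambda i: (-counts[i], i))  # alphabetical tie-break: an assumption
    number = {item: n for n, item in enumerate(ranked, start=1)}
    db = [sorted(number[i] for i in set(t) if i in number) for t in transactions]
    return number, db


def generate_candidates(frequent_prev):
    """Row table: first k-2 items -> set of last items; candidates are built directly."""
    rows = {}
    for itemset in frequent_prev:
        rows.setdefault(itemset[:-1], set()).add(itemset[-1])
    out = []
    for prefix in sorted(rows):
        for a, b in combinations(sorted(rows[prefix]), 2):
            keys = (prefix[:i] + prefix[i + 1:] + (a,) for i in range(len(prefix)))
            if all(b in rows.get(key, ()) for key in keys):
                out.append(prefix + (a, b))
    return out


def contains_sorted(transaction, candidate):
    """Sorted subset test that stops at the first value larger than the item sought."""
    i = 0
    for item in candidate:
        while i < len(transaction) and transaction[i] < item:
            i += 1
        if i == len(transaction) or transaction[i] > item:
            return False
        i += 1
    return True


def mine(transactions, min_count):
    """Return {k: frequent k-itemsets}, mapped back to the original item names."""
    number, db = preprocess(transactions, min_count)
    name = {n: item for item, n in number.items()}
    frequent = [(n,) for n in sorted(name)]  # frequent 1-itemsets
    result, k = {}, 1
    while frequent:
        result[k] = [tuple(sorted(name[n] for n in s)) for s in frequent]
        frequent = [c for c in generate_candidates(frequent)
                    if sum(contains_sorted(t, c) for t in db) >= min_count]
        k += 1
    return result


# Transaction database D of Table 2, minimum support count 3.
table2 = [s.split(",") for s in [
    "B,C,A", "A,C,E,F", "B,A,E,C,D", "A,C,E,D,F", "B,C,E,F",
    "C,D,E,F,G", "A,E,F,B", "A,C,D,E,G", "A,B,C,E", "D,F,E"]]
result = mine(table2, min_count=3)
print(result[4])  # [('A', 'C', 'D', 'E')]
```

Because the database is held in memory as sorted lists of small integers, each pass over it is a plain list traversal; the only frequent 4-itemset found for Table 2 is {A, C, D, E}, which agrees with the result stated above.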

3. Performance evaluation

In support-based association rule algorithms, the choice of support determines the scale of the frequent itemsets. Tests are first run below at various prescription scales to select a support suitable for this system, and the improved algorithm is then compared with the Apriori algorithm at the same support. Traditional Chinese medicine prescriptions have complex compositions, and a single prescription generally contains ten or more herbal ingredients. This example uses 25,000 historical prescription records from Xiamen Traditional Chinese Medicine Hospital as the data sample; the sample contains 497 medicines, and the average prescription length is 12.

(1) Support selection

If the support is set too high, too few frequent itemsets are mined and many association rules cannot be found. If the support is set too low, too many frequent 2-itemsets and 3-itemsets are mined, the number of frequent 4-itemsets becomes excessive, computation takes too long, and association rules may not be found at all. Tests were therefore run on 2,000, 5,000, 12,000 and 20,000 prescriptions to determine a suitable support; the results are shown in Tables 13 to 16.

Table 13 Support comparison at a scale of 2,000 prescriptions

Table 14 Support comparison at a scale of 5,000 prescriptions

Table 15 Support comparison at a scale of 12,000 prescriptions

Table 16 Support comparison at a scale of 20,000 prescriptions

Table 17 Comparison results for different supports at a scale of 20,000 prescriptions

The association rules obtained by this system are used to configure storage locations for warehoused goods. The medicines must be strongly correlated, so the support should not be too low; but a support that is too high yields too few association rules to guide storage-location configuration well. Analysis of the four groups of data shows that at a support of 1.8% the number of frequent 4-itemsets obtained from the sample data is appropriate and the running time is short. The support is therefore set to 1.8% when processing larger data.

Table 18 Comparison results for different data sizes at a support of 1.8%

The data above also show that as the data volume increases, the number of frequent itemsets obtained tends to decrease. Comparing the frequent 4-itemsets obtained at the same support from 12,000 and from 20,000 prescriptions shows that the 4-itemsets from the 20,000-prescription sample are essentially contained in those from the 12,000-prescription sample. This indicates that as the sample grows the association rules are further refined and become more accurate, and it also demonstrates the validity of the prescription association rules.

(2) Comparative analysis of the improved algorithm and the Apriori algorithm

To verify the performance of the improved algorithm, the improved algorithm and classic Apriori are first compared at the same prescription scale. A larger prescription sample is chosen so that the performance of the improved algorithm in computing itemsets of each size shows more clearly.

Table 17 gives the running times of the improved algorithm and classic Apriori for itemsets of each size at a scale of 20,000 prescriptions. As Table 17 shows, the improved algorithm achieves a performance gain for frequent 2-, 3- and 4-itemsets at every support. In computing 4-itemsets in particular, performance improves by a factor of 4 to 6, and the smaller the support, the more pronounced the speed advantage of the improved algorithm. In the implementation used here, all data are written to a text file and read into memory once at program start, which greatly reduces the number of I/O operations the program needs; the traditional Apriori implementation used here is therefore also comparatively fast.

To verify the performance advantage of the improved algorithm further, the running times of the improved algorithm and classic Apriori are compared at different prescription scales and the same support. Table 18 gives the experimental results at a support of 1.8% for prescription scales of 2,000, 5,000, 12,000 and 20,000.

As Table 18 shows, at a scale of 5,000 prescriptions the computation speed for 2-, 3- and 4-itemsets improves by 20%, 50% and 80% respectively, and at scales of 2,000, 12,000 and 20,000 prescriptions the itemsets of each size show essentially the same gains. The algorithm is therefore stable and adapts to different data scales. In addition, the gain rises markedly from 2-itemsets to 3-itemsets to 4-itemsets, which indicates that the larger the itemset size k, the more pronounced the overall performance advantage of the algorithm.

The improved association rule algorithm is used to analyze medicine prescriptions and obtain medicine association rules, the association rules being 4-itemsets. With a support of 1.8% and a sample of 20,000 prescriptions, 35 medicine association rules are obtained, including {Radix Glycyrrhizae, Poria, Radix Paeoniae Alba, Radix Bupleuri}, {Radix Glycyrrhizae, Poria, Radix Paeoniae Alba, Rhizoma Atractylodis Macrocephalae (parched with bran)}, {Radix Glycyrrhizae, Poria, Pericarpium Citri Reticulatae, Rhizoma Atractylodis Macrocephalae (parched with bran)}, {Radix Glycyrrhizae, Poria, Pericarpium Citri Reticulatae, Rhizoma Pinelliae Preparatum} and {Radix Glycyrrhizae, Radix Scutellariae, Radix Rehmanniae, Radix Bupleuri}. The storage locations in the warehouse are adjusted according to the association rules: medicines that sell well and have high support are placed in locations closer to the exit to ease outbound handling, and medicines in the same rule are placed in adjacent locations so that they can be picked more quickly. In addition, when a store runs short of a medicine, the medicines associated with it by a rule can be delivered together, which saves the manpower and material resources spent on delivery.

4. Application of the algorithm in a pharmacy management system

Warehouse management uses the association rule data mining algorithm. The improved Apriori algorithm analyzes the medicine information of each store and of the warehouse to obtain the association rules between medicines, gives optimization guidance for configuring storage locations in the stores and the warehouse, and also provides some guidance on combining deliveries.

The software for generating association rules was developed in C++Builder, which offers a rich set of controls with well-encapsulated functions, so the program is concise and convenient to write. The warehouse data management system can perform association rule analysis for three stores and the central warehouse; the 25,000 prescription records of the central warehouse are used here as the example. Fig. 2 shows the prescription table database.

The HCFB table in the HIS database contains many fields, including prescription ID, patient name, medicine list and time. Association rule mining does not need all of these attributes; only the medicines in the medicine list are analyzed, so the data are preprocessed first. Because the medicine IDs used in the medicine list are strings of five numeric characters, sorting and scanning them would consume too much time and increase the time and space complexity of the algorithm. All data are therefore first written into memory, the database is scanned once, the support of every medicine is calculated, and the medicines are numbered from 1 in descending order of support and written to a file, as shown in Figs. 3 and 4.

In Fig. 3, the first column is the new medicine number after sorting, the second column is the original medicine ID, and the third column is the support of the medicine, that is, the number of times it appears in the 25,000 prescriptions. Using the numbering in Fig. 3, the original medicine IDs in the prescriptions are replaced with the new numbers and each prescription is sorted by new number, giving the prescription data in Fig. 4, where each row is one prescription with its medicine numbers in ascending order. After preprocessing, the data are mined with the improved Apriori algorithm at a support of 1.8%. Frequent 2-itemsets are produced first: 539 in total, in 9621 ms. Frequent 3-itemsets follow: 231 in total, in 970 ms. Finally, frequent 4-itemsets are produced: 33 in total, in 44 ms, from which the association rules are obtained. The total running time is 10635 ms, so processing 25,000 prescription records takes about 10 s, which is fast and efficient. The medicine association rules obtained for each store can guide the placement of medicines in that store: strongly associated medicines are placed in adjacent or nearby positions and high-support medicines are placed where they are easy to reach, which helps pharmacists pick medicines quickly and saves time.

The storage locations of the warehouse are configured according to the medicine association rules produced. Medicines with high support and strong association are placed where outbound handling is convenient, so that they can be dispatched quickly; during replenishment, strongly associated medicines are sent to the pharmacy stores together, which reduces the number of deliveries. Using association rules for warehouse management saves manpower and material resources, shortens picking and delivery time, improves working efficiency and gives the enterprise effective decision support.

The storage-location guidance interface is shown in Fig. 5; the screenshot shows the storage-location guidance for store 2.

Different association rules are obtained from the prescription data of each store. The store manager can adjust storage locations according to the rules provided and place strongly associated medicines in adjacent positions for convenient picking. For example, store 1 can place Radix Glycyrrhizae, Radix Scutellariae, Radix Rehmanniae, Cortex Moutan and Radix Paeoniae Rubra close together, while store 2 can place Radix Glycyrrhizae, Poria, Radix Paeoniae Alba and Cortex Moutan close together. Combining the association rules with the pharmacist's personal habits to configure storage locations sensibly allows medicines to be picked and dispensed more quickly and improves working efficiency. In the warehouse, medicines that are strongly associated and frequently used can be placed next to one another in storage locations close to the warehouse exit to ease outbound handling, and strongly associated medicines can be delivered together during replenishment to reduce the number of deliveries.

Claims

1. An association rule algorithm based on an improved Apriori algorithm, characterized in that: a transaction database D is first preprocessed, and the data records are simplified and then read entirely into memory; the step in which candidate sets are generated by joining and pruning frequent itemsets is improved so that candidate itemsets are generated directly; after the candidate sets are obtained, the database is scanned to calculate support; because both the candidate sets and the transaction database D are sorted, when a candidate is searched for in each transaction T, that is, each record, the search of that transaction can stop as soon as a value greater than the candidate item is found.

2. The association rule algorithm based on an improved Apriori algorithm according to claim 1, characterized in that:

the support of an itemset is the proportion of records in the data set that contain the itemset;

a minimum support minSup and an estimated number of itemsets are set according to the correlation of the itemsets in the transaction database, minSup lying in the range 0 to 100%; minSup is first set to an estimated value and the frequent itemsets of the transaction database D are calculated; if the number of frequent itemsets is greater than the estimated number, the value of minSup is increased; if the number of frequent itemsets is less than the estimated number, the value of minSup is reduced; these steps are repeated until the minimum support minSup whose result is closest to the estimated number is obtained.

3. The association rule algorithm based on an improved Apriori algorithm according to claim 1, characterized in that the specific steps are as follows:

(1) the size of k is set; the transaction database D is scanned and the data are preprocessed: the data records are simplified, that is, redundant data are removed, and all data are written into memory; the support of every item in the transaction database D is counted, and the items are sorted from high to low and numbered (1, 2, ..., n) to obtain the candidate 1-itemsets; the itemsets below the minimum support minSup are removed from the candidate 1-itemsets to obtain the frequent 1-itemsets;

(2) the items below the minimum support minSup are removed from the transaction database D, every item in D is replaced with its corresponding number, and each transaction T is rearranged by number in ascending order to generate a new transaction database D;

(3) following the traditional Apriori algorithm, the frequent 1-itemsets are used to generate the candidate 2-itemsets, and support is calculated to obtain the frequent 2-itemsets;

(4) candidate m-itemsets are generated, m being greater than or equal to 3, as follows: the frequent (m-1)-itemsets whose first (m-2) items are identical are merged into one row, the row number being the first (m-2) items and the row set being the last item of each (m-1)-itemset; the items in the row set are combined in order from front to back, giving C_n^2 combinations, n being the number of items; the preceding item of each combination is combined in turn with the items of the current row number to find the corresponding row numbers (if the row number contains only one item, no combining is needed), and it is judged whether the consequent item of the combination is in those row sets; if it is, the itemset is a candidate m-itemset; if not, it is deleted immediately; the candidate m-itemsets are thus obtained directly;

(5) after the candidate m-itemsets are obtained, the new transaction database D is scanned and the support of each candidate among the candidate m-itemsets is calculated; because both the candidate m-itemsets and the transaction database D are sorted, when a candidate is searched for in a transaction T, finding a value in T greater than the candidate item means that T does not contain the candidate, and the search of that transaction T can stop; after all candidate m-itemsets have been counted, those whose support is below the minimum support minSup are removed to obtain the frequent m-itemsets;

(6) steps (4) and (5) are repeated until the value of m reaches the configured size of k, giving the frequent k-itemsets;

(7) all items of the frequent k-itemsets are listed, and association rules are generated according to the traditional Apriori algorithm;

(8) the association rules are mapped back to the source data items according to the numbering generated in step (1).