CN105719155A - Association rule algorithm based on Apriori improved algorithm - Google Patents


Info

Publication number
CN105719155A
Authority
CN
China
Prior art keywords
item
candidate
algorithm
collection
frequent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510583227.8A
Other languages
Chinese (zh)
Inventor
高曾荣
何新
应国力
王建宇
周宇浩
袁侃辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an association rule algorithm based on an improved Apriori algorithm. First, the transaction database D is preprocessed: the data records are simplified and then read entirely into memory. Next, the step in which candidate sets are generated by joining and pruning frequent itemsets is improved so that the candidate itemsets are generated directly. After the candidate sets are obtained, the database is scanned to calculate support. Because both the candidate sets and the transaction database D are sorted, when a candidate itemset is searched for in a transaction T (that is, a record), the search of that transaction stops as soon as a value greater than the candidate item being sought is encountered. The improved Apriori algorithm was applied to a pharmacy management system; the results indicate that it performs markedly better than the conventional algorithm, is simple to operate, and better meets practical needs.

Description

An association rule algorithm based on an improved Apriori algorithm
Technical field
The present invention relates to the fields of data mining and algorithm analysis, and in particular to data mining technology.
Background art
In China, with the development and spread of information technology, data warehouse technology has entered a period of rapid development, but a large gap remains compared with developed countries. Competition in the domestic retail market is fierce, and in the commodity market it is harsher still. Although some Chinese enterprises are gradually adopting computerized warehouse management, there is currently no mature domestic data warehouse technology. The result is a situation of "massive data, poor information": large volumes of data cannot be used reasonably and effectively, which is a great loss for enterprises. More importantly, domestic data warehouse applications have not been deployed enterprise-wide, and no large system with a unified, coordinated global information environment has been established. Some domestic retail chains have a growing demand for data warehouses, and their complete information management systems and massive stored data provide the foundation for building them. Some enterprises import advanced technology from abroad, others develop data warehouses independently, and more and more managers recognize that only a data warehouse gives them a true and full understanding of company information with which to make reasonable plans and decisions. Designing a data mining analysis scheme to improve the management efficiency and economic benefit of retailers is therefore particularly urgent.
Among current data mining analysis schemes, the classic Apriori association rule algorithm (R. Agrawal, T. Imielinski, A. Swami. Mining association rules between sets of items in large databases [M]. In SIGMOD '93, Washington, D.C., 1993: 207-216) creates frequent itemsets by iteratively scanning the database. It plays an important role in exploring the internal relationships among transactions and is an important branch of data mining research. However, as data volumes grow and performance requirements rise, the defects of the Apriori algorithm have gradually been exposed. The traditional Apriori algorithm has two bottlenecks: first, scanning the database multiple times creates a huge I/O load; second, it generates huge candidate sets. Both pose great challenges to the running time and memory footprint of the algorithm.
Summary of the invention
The object of the present invention is to provide an association rule algorithm based on an improved Apriori algorithm, which uses the improved Apriori algorithm to manage and analyze the large volume of commodity data in retail stores and retailer warehouses. Building on an analysis of the classic Apriori association rule algorithm, an improved, candidate-set-based algorithm for drug association mining is designed to improve performance.
The technical solution that achieves this object is as follows: an association rule algorithm based on an improved Apriori algorithm, in which the transaction database D is first preprocessed, and the data records are simplified and read entirely into memory; the step of generating candidate sets by joining and pruning frequent itemsets is improved so that candidate itemsets are generated directly; after the candidate sets are obtained, the database is scanned to calculate support; and, because both the candidate sets and the transaction database D are sorted, when a candidate itemset is searched for in each transaction T (each record), the search of that transaction can stop as soon as a value greater than the candidate item is found.
Compared with the traditional Apriori algorithm, the significant advantages of the present invention are that it reduces the number of database scans, shortens the time needed to generate candidate sets, and improves running efficiency. The invention can perform association-rule-based data mining on a retailer's sales data and give enterprise managers and suppliers an effective basis for decisions. It helps them analyze drug demand in different regions and dynamically adjust the types and quantities of goods in different locations such as business districts, residential areas, and office areas, so as to meet growing demand for personalization, improve the retailer's work efficiency, and cut operating costs.
Brief description of the drawings
Fig. 1 is a flow chart of mining with the improved Apriori algorithm of the present invention.
Fig. 2 shows the prescription table database of the present invention.
Fig. 3 shows the numbering of drugs during data mining in the present invention.
Fig. 4 shows the numbering of prescriptions during data mining in the present invention.
Fig. 5 is an example of storage location allocation guidance for drugs in the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
1. Bottleneck analysis of the Apriori algorithm
The classic Apriori association rule algorithm creates frequent itemsets by iteratively scanning the database. It plays an important role in exploring the internal relationships among transactions and is an important branch of data mining research. However, as data volumes grow and performance requirements rise, the defects of the Apriori algorithm have gradually been exposed.
The Apriori algorithm has the following two performance bottlenecks:
(1) Multiple database scans produce a huge I/O load
First, in the k-th iteration, for each candidate k-itemset produced by the join step, every (k-1)-item subset must be checked against the frequent (k-1)-itemsets produced in the previous iteration to decide whether the candidate can be frequent. Second, every element of the candidate set requires a database scan to calculate its support and determine whether it meets the minimum support, and thus whether it is a frequent itemset. If a candidate set contains N elements, the database must be scanned at least N times, which inevitably produces a huge I/O load.
(2) Huge candidate sets are produced
The number of itemsets grows explosively when frequent (k-1)-itemsets are joined to produce candidate k-itemsets. For example, 10^4 frequent 1-itemsets can produce on the order of 10^7 candidate 2-itemsets. Candidate sets this large are a great challenge for both the running time and the memory of the algorithm.
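The scale of this growth can be checked with a short calculation, assuming every pair of distinct frequent 1-itemsets is joined into one candidate 2-itemset. The symbol n for the number of frequent 1-itemsets is introduced here only for illustration:

```latex
\[
  |C_2| = \binom{n}{2} = \frac{n(n-1)}{2},
  \qquad
  n = 10^{4} \;\Rightarrow\; |C_2| = \frac{10^{4}\,(10^{4}-1)}{2} = 49\,995\,000 \approx 5 \times 10^{7}.
\]
```

Ten thousand frequent items therefore lead to tens of millions of candidate 2-itemsets before any support has been counted.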
2. Improving the Apriori algorithm
Because these two bottlenecks limit the performance of the Apriori algorithm, many researchers have improved it. Their algorithms all build on the basic theory and ideas of Apriori and introduce related techniques to raise its running efficiency and execution speed. The improved algorithm described here addresses both problems above.
To reduce the number of database scans, the data are first preprocessed: the data records are simplified and then read entirely into memory, which reduces or even eliminates I/O overhead during computation. The support of every item in the transaction database is counted, and the items are sorted by support from high to low and numbered (1, 2, ..., n). At the same time, items whose support is below the minimum support are removed from the transaction database D, and each transaction T is rearranged by item number to generate a new database. This ordering of the data plays an important role in several later steps.
In the Apriori algorithm, generating candidate sets by joining and pruning frequent itemsets is one of the most time-consuming steps, so this step is improved. The improved algorithm does not first generate C_k' and then prune it against the frequent (k-1)-itemsets to obtain C_k; instead it produces C_k directly. Since every non-empty subset of a frequent itemset is itself frequent, whether all subsets of a candidate are frequent can be checked at the moment the candidate is generated, and a table makes this check fast.
To generate candidate k-itemsets, the frequent (k-1)-itemsets that share the same first (k-2) items are merged into one row. The row number is those first (k-2) items, and the row set consists of the last item of each such (k-1)-itemset. The items in a row set are then paired from front to back in sorted order, giving C(n, 2) combinations, where n is the number of items in the row set. For each combination, the front item is combined with each item of the current row number in turn to find the corresponding rows (if the row number contains only one item, no combining is needed and the front item itself identifies the row), and the back item is checked for membership in those row sets. If it is present, the itemset is a candidate k-itemset; if not, it is discarded immediately. The candidate set is thus obtained directly.
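The direct candidate generation described above can be sketched as follows. This is an illustrative reading of the row-table method rather than the patent's own implementation: the function name `generate_candidates`, the use of sorted tuples for itemsets, and the small input in the usage line are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def generate_candidates(frequent_prev):
    """Build candidate k-itemsets directly from frequent (k-1)-itemsets.

    frequent_prev: iterable of tuples of item numbers, each sorted ascending
    and all of the same length k-1 (hypothetical representation).
    """
    # Row table: row number = first (k-2) items, row set = last items.
    rows = defaultdict(set)
    for itemset in frequent_prev:
        rows[itemset[:-1]].add(itemset[-1])

    candidates = []
    for prefix in sorted(rows):
        # Pair the items of the row set from front to back in sorted order.
        for front, back in combinations(sorted(rows[prefix]), 2):
            # Replace each item of the row number in turn with `front`
            # to find the rows that must contain `back`.
            # For k = 3 the row number has one item, so the row is (front,).
            lookups = [prefix[:i] + prefix[i + 1:] + (front,)
                       for i in range(len(prefix))]
            if all(back in rows.get(key, ()) for key in lookups):
                candidates.append(prefix + (front, back))
    return candidates


# Hypothetical input: four frequent 2-itemsets.
print(generate_candidates([(1, 2), (1, 3), (2, 3), (1, 4)]))  # -> [(1, 2, 3)]
```

Because each lookup is a dictionary access followed by a set membership test, the subset check happens while the candidate is being formed, so no separate pruning pass over an intermediate candidate list is needed.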
For example, candidate 3-itemsets are generated from frequent 2-itemsets as follows. Let the frequent 2-itemsets be {{1,2}, {1,3}, {1,4}, {1,5}, {2,3}, {2,4}, {3,5}, {4,5}}. Itemsets with the same first (k-2) items (here, the same first item) are tabulated together; the candidate combination table is shown in Table 1:
Table 1 Candidate set combination table
Row number Row set
1 2,3,4,5
2 3,4
3 5
4 5
Combination starts from the set in row 1. The first group is {2,3}, {2,4}, {2,5}: the front item "2" leads to row "2", whose set contains the back items "3" and "4" but not "5", so {1,2,3} and {1,2,4} are candidates. The second group is {3,4}, {3,5}: the front item "3" leads to row "3", whose set contains the back item "5" but not "4", so {1,3,5} is a candidate. The third group is {4,5}: the front item "4" leads to row "4", whose set contains the back item "5", so {1,4,5} is a candidate. Rows 2, 3, and 4 are processed in the same way: in row 2 the only combination is {3,4}, and "4" is not in the set of row 3, while rows 3 and 4 each hold a single item and yield no combinations. The candidate set C_k obtained directly is therefore {{1,2,3}, {1,2,4}, {1,3,5}, {1,4,5}}. Because the row numbers and row sets are sorted, a lookup can stop as soon as it meets a value greater than the one sought, which saves time.
After the candidate set is obtained, the database is scanned to calculate support. Because both the candidate set and the transaction database are sorted, when a candidate is searched for in a transaction T, finding a value in T greater than the candidate item being sought means that T does not contain that item, and the search of the transaction can stop. For example, when the 3-itemset {1,2,3} is searched for in the transaction T = {1,2,4,5,7}, "1" is found, then "2" is found, and while searching for "3" the value "4" is met in T. Since "4" is greater than "3", T does not contain this 3-itemset and the search stops. This greatly reduces database search time.
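The early-termination search just described can be sketched as a single forward pass over a sorted transaction. The function name `contains_sorted` and the list representation are assumptions for illustration only.

```python
def contains_sorted(transaction, candidate):
    """Return True if every item of `candidate` occurs in `transaction`.

    Both arguments are sequences of item numbers sorted in ascending order.
    """
    pos = 0
    for item in candidate:
        # Skip transaction values smaller than the item being sought.
        while pos < len(transaction) and transaction[pos] < item:
            pos += 1
        # Transaction exhausted, or a larger value was met: because the
        # transaction is sorted, `item` cannot appear later, so stop here.
        if pos == len(transaction) or transaction[pos] > item:
            return False
        pos += 1
    return True


print(contains_sorted([1, 2, 4, 5, 7], [1, 2, 3]))  # -> False (stops at 4)
print(contains_sorted([1, 2, 4, 5, 7], [1, 2, 4]))  # -> True
```

The position in the transaction only moves forward, so one membership test costs at most one pass over the transaction, and a miss is usually detected before the end is reached.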
Algorithm steps:
(1) Scan the database D and preprocess the data: write all data into memory, calculate the support of every item, sort the items, and number them in that order.
(2) Replace the items in the transaction database with their numbers, remove items whose support is below the minimum support, sort each transaction, and build the new database.
(3) Generate the candidate k-itemsets with the improved algorithm.
(4) Search for the candidates in the sorted transaction database and calculate their support; remove those whose support is below the configured minimum support to obtain the frequent k-itemsets.
(5) Repeat steps (2) to (4) until the maximal frequent itemsets are produced.
(6) Obtain all frequent itemsets and generate the association rules.
(7) Map the association rules back to the original data.
Algorithm example: let the transaction database D be as in Table 2. It holds 10 transactions over 7 items. With a minimum support of 30%, the minimum support count is the number of transactions times the minimum support, 10 × 30% = 3. The improved algorithm mines this database as follows:
Table 2 Transaction database D
TID Items
1 B,C,A
2 A,C,E,F
3 B,A,E,C,D
4 A,C,E,D,F
5 B,C,E,F
6 C,D,E,F,G
7 A,E,F,B
8 A,C,D,E,G
9 A,B,C,E
10 D,F,E
(1) Scan the database and preprocess the data: calculate the support of every item, sort the items, and number them in that order, as shown in Table 3.
Table 3 Numbering of the transaction database
(2) Replace the items in the transaction database with their numbers, remove items whose support count is below 3, sort each transaction, and build the new database, as shown in Table 4.
Table 4 new database
TID Items
1 2,3,5
2 1,2,3,4
3 1,2,3,5,6
4 1,2,3,4,6
5 1,2,4,5
6 1,2,4,6
7 1,3,4,5
8 1,2,3,6
9 1,2,3,5
10 1,4,6
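The preprocessing in steps (1) and (2) can be sketched on the data of Table 2 as follows. The function name `preprocess`, the string encoding of transactions, and the alphabetical tie-break between items of equal support are assumptions; the tie-break was chosen because it agrees with the mapping of {1,2,3,6} to {A,C,D,E} given later in the example.

```python
from collections import Counter


def preprocess(transactions, min_count):
    """Number items by descending support and rewrite each transaction."""
    # Support count: each item is counted once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    # Drop items below the minimum support count, then rank the rest.
    # Assumption: ties in support are broken by item name.
    ranked = sorted((i for i in counts if counts[i] >= min_count),
                    key=lambda i: (-counts[i], i))
    number = {item: n for n, item in enumerate(ranked, start=1)}
    # Replace items by their numbers and sort each transaction ascending.
    new_db = [sorted(number[i] for i in set(t) if i in number)
              for t in transactions]
    return number, new_db


# Table 2, one string of item letters per transaction (TID 1 to 10).
TABLE_2 = ["BCA", "ACEF", "BAECD", "ACEDF", "BCEF",
           "CDEFG", "AEFB", "ACDEG", "ABCE", "DFE"]

number, new_db = preprocess(TABLE_2, min_count=3)
print(number)  # -> {'E': 1, 'C': 2, 'A': 3, 'F': 4, 'B': 5, 'D': 6}
for tid, items in enumerate(new_db, start=1):
    print(tid, items)
```

Item G occurs in only two transactions, so it falls below the support count of 3 and is removed before numbering.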
(3) Generation of L2: candidate 2-itemsets C2 are generated from the frequent 1-itemsets, and support is calculated to obtain the frequent 2-itemsets L2, as shown in Tables 5 and 6.
(4) Generation of L3: with the improved algorithm, candidate 3-itemsets C3 are generated from the frequent 2-itemsets L2, and the itemsets whose support count is at least 3 form the frequent 3-itemsets L3.
Table 7 Candidate 3-itemset combinations
As shown in Table 7, itemsets with the same first item are merged into one row, with the first item as the row number and the second items as the row set. Combination starts from the first row. The first group, with "2" as the front item, gives {2,3}, {2,4}, {2,5}, {2,6}; the back items are all in the set of row "2", so all four are in the candidate set. The second group, with "3" as the front item, gives {3,4}, {3,5}, {3,6}; the back items are all in the set of row "3", so all three are in the candidate set. The third group, with "4" as the front item, gives {4,5}, {4,6}; "5" is not in the set of row "4", so {4,5} is deleted. Rows 2, 3, and 4 are combined and checked in the same way to obtain the candidate 3-itemsets. The support of each itemset is then calculated and itemsets with a support count below 3 are removed, giving the frequent 3-itemsets L3, as shown in Tables 8 and 9.
(5) Generation of L4:
Table 10 Candidate 4-itemset combinations
Combining candidate 4-itemsets is more involved, but the method is the same as for 3-itemsets. As shown in Table 10, itemsets with the same first two items are merged into one row, with the first two items as the row number and the third items as the row set. Combination starts from the first row. The first group, with "3" as the front item, gives {3,4}, {3,5}, {3,6}. To check the back items, the front item is combined with each item of the row number in turn to find the corresponding rows, here {1,3} and {2,3}. The back item "4" is in the set of row {1,3} but not in that of row {2,3}, so it is deleted; the back items "5" and "6" are in the sets of both {1,3} and {2,3}, so they are in the candidate set. Combining and checking all rows in this way yields the full candidate 4-itemsets C4. Support is calculated and itemsets with a support count below 3 are removed, giving the frequent 4-itemsets L4, as shown in Tables 11 and 12.
The final frequent 4-itemset is {1,2,3,6}. Mapping it back to the original database through the numbering table gives the frequent 4-itemset {A,C,D,E}.
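The whole worked example can be replayed with a compact end-to-end sketch. It is an illustrative reading of the steps above, not the patent's C++Builder program: all function names are hypothetical, transactions are encoded as strings of item letters, and ties in support are broken alphabetically by assumption.

```python
from collections import Counter, defaultdict
from itertools import combinations


def preprocess(transactions, min_count):
    # Number items by descending support (ties by name: an assumption).
    counts = Counter(i for t in transactions for i in set(t))
    ranked = sorted((i for i in counts if counts[i] >= min_count),
                    key=lambda i: (-counts[i], i))
    number = {item: n for n, item in enumerate(ranked, start=1)}
    new_db = [sorted(number[i] for i in set(t) if i in number)
              for t in transactions]
    return number, new_db


def generate_candidates(frequent_prev):
    # Row table: first (k-2) items -> set of last items.
    rows = defaultdict(set)
    for s in frequent_prev:
        rows[s[:-1]].add(s[-1])
    out = []
    for prefix in sorted(rows):
        for front, back in combinations(sorted(rows[prefix]), 2):
            keys = [prefix[:i] + prefix[i + 1:] + (front,)
                    for i in range(len(prefix))]
            if all(back in rows.get(k, ()) for k in keys):
                out.append(prefix + (front, back))
    return out


def contains_sorted(transaction, candidate):
    # Forward scan that stops once a value larger than the item is met.
    pos = 0
    for item in candidate:
        while pos < len(transaction) and transaction[pos] < item:
            pos += 1
        if pos == len(transaction) or transaction[pos] > item:
            return False
        pos += 1
    return True


def mine(transactions, min_count):
    """Return the frequent itemsets per level, mapped back to item names."""
    number, new_db = preprocess(transactions, min_count)
    level = [(n,) for n in sorted(number.values())]  # frequent 1-itemsets
    levels = []
    while level:
        levels.append(level)
        level = [c for c in generate_candidates(level)
                 if sum(contains_sorted(t, c) for t in new_db) >= min_count]
    name = {n: item for item, n in number.items()}
    return [[tuple(sorted(name[n] for n in s)) for s in lvl] for lvl in levels]


TABLE_2 = ["BCA", "ACEF", "BAECD", "ACEDF", "BCEF",
           "CDEFG", "AEFB", "ACDEG", "ABCE", "DFE"]
levels = mine(TABLE_2, min_count=3)
print(levels[-1])  # -> [('A', 'C', 'D', 'E')]
```

On the data of Table 2 with a support count of 3, the last non-empty level holds the single 4-itemset {A,C,D,E}, which matches the result stated above.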
3. Performance analysis
In a support-based association rule algorithm, the choice of support determines the scale of the frequent itemsets. The tests below first vary the prescription count to select a support suited to this system, and then compare the improved algorithm with the Apriori algorithm at the same support. Traditional Chinese medicine prescriptions have complex compositions, and a single prescription typically contains ten or more herbal ingredients. This example uses 25,000 historical prescription records from Xiamen Traditional Chinese Medicine Hospital as the data sample; the sample contains 497 drugs, and the average prescription length is 12.
(1) Support selection
If the support is set too high, very few frequent itemsets are mined and many association rules cannot be found. If it is set too low, too many frequent 2-itemsets and 3-itemsets are mined, the number of frequent 4-itemsets becomes excessive, computation takes too long, and association rules may not be found at all. Tests were therefore run on 2,000, 5,000, 12,000, and 20,000 prescriptions to determine a suitable support; the results are shown in Tables 13 to 16.
Table 13 Support comparison at a scale of 2,000 prescriptions
Table 14 Support comparison at a scale of 5,000 prescriptions
Table 15 Support comparison at a scale of 12,000 prescriptions
Table 16 Support comparison at a scale of 20,000 prescriptions
Table 17 Comparison results for different supports at a scale of 20,000 prescriptions
The association rules obtained by this system are used to configure where goods are stored in the warehouse. The drugs in a rule need to be strongly correlated, so the support should not be too low; but too high a support yields too few rules to guide storage allocation well. Analysis of the 4 groups of data shows that at a support of 1.8% the sample yields a suitable number of frequent 4-itemsets with a short running time. The support is therefore set to 1.8% when processing larger data sets.
Table 18 Comparison results for different data scales at a support of 1.8%
The data above also show that the number of frequent itemsets decreases as the data volume grows. Comparing the frequent 4-itemsets obtained at the same support from 12,000 and from 20,000 prescriptions shows that those from the 20,000-prescription sample are largely contained in those from the 12,000-prescription sample. This indicates that as the sample grows the association rules are further refined and become more accurate, and it also supports the validity of the prescription association rules.
(2) Comparison of the improved algorithm with the Apriori algorithm
To verify the performance of the improved algorithm, it is first compared with the classic Apriori algorithm at the same prescription scale; a larger prescription sample is chosen to bring out its performance in computing each level of itemsets.
Table 17 lists the running times of the improved algorithm and classic Apriori for each level of itemsets at a scale of 20,000 prescriptions. As Table 17 shows, the improved algorithm gains performance on the frequent 2-, 3-, and 4-itemsets at every support. In computing the 4-itemsets in particular it is 4 to 6 times faster, and the smaller the support, the more pronounced its speed advantage. In the implementation used here, all data are written to a text file and read into memory once when the program starts, which greatly reduces the I/O the program needs; the traditional Apriori algorithm implemented here therefore also runs relatively fast.
To verify the advantage further, the running times of the improved algorithm and classic Apriori are compared at different prescription scales and the same support. Table 18 gives the experimental results at a support of 1.8% for prescription scales of 2,000, 5,000, 12,000, and 20,000:
As Table 18 shows, at a scale of 5,000 prescriptions the computation speed for 2-, 3-, and 4-itemsets improves by 20%, 50%, and 80% respectively, and at scales of 2,000, 12,000, and 20,000 each level shows essentially the same improvement. The algorithm is thus stable and adapts to different data scales. In addition, the improvement rises clearly from 2- to 3- to 4-itemsets, which shows that the larger k is, the more pronounced the overall performance advantage.
The improved association rule algorithm is applied to drug prescriptions to obtain drug association rules in the form of 4-itemsets. With a support of 1.8% and a sample of 20,000 prescriptions, 35 drug association rules are obtained, including {Radix Glycyrrhizae, Poria, Radix Paeoniae Alba, Radix Bupleuri}, {Radix Glycyrrhizae, Poria, Radix Paeoniae Alba, Rhizoma Atractylodis Macrocephalae (parched with bran)}, {Radix Glycyrrhizae, Poria, Pericarpium Citri Reticulatae, Rhizoma Atractylodis Macrocephalae (parched with bran)}, {Radix Glycyrrhizae, Poria, Pericarpium Citri Reticulatae, Rhizoma Pinelliae Preparatum}, and {Radix Glycyrrhizae, Radix Scutellariae, Radix Rehmanniae, Radix Bupleuri}. According to the rules, storage locations in the warehouse are adjusted: drugs that sell well and have high support are placed near the exit for easy retrieval, and drugs in the same rule are placed in adjacent locations so that they can be picked faster. In addition, when a store runs short of stock, drugs linked by a rule can be delivered together, saving the labor and materials spent on delivery.
4. Application of the algorithm in the sales management system
Warehouse management uses the association rule data mining algorithm. The improved Apriori algorithm analyzes the drug information of each store and of the warehouse to obtain the association rules between drugs; these guide the optimization of storage location configuration in stores and the warehouse and also offer guidance on combining deliveries.
The association rules are generated with software developed in C++Builder. C++Builder provides a rich set of controls with well-encapsulated functionality, so the program is concise and convenient to write. The data warehouse management system can run association rule analysis for three stores and the central warehouse; the 25,000 prescription records of the central warehouse are used here as the example. Fig. 2 shows the prescription table database.
The HCFB table in the HIS database contains many fields, including prescription ID, patient name, drug list, and time. Association rule mining does not need all of these attributes; only the drugs in the drug list are analyzed, so the data are preprocessed first. Drug IDs in the drug list are strings of 5 numeric characters, which cost too much time in sorting and scanning and raise the time and space complexity of the algorithm; this is a further reason to preprocess. All data are first written into memory and the database is scanned once to calculate the support of every drug. The drugs are then numbered from high to low support, starting from 1, and the numbering is written to a file, as shown in Figs. 3 and 4.
In Fig. 3 the first column is the new number of the drug after sorting, the second column is the drug's original ID, and the third column is the drug's support, that is, the number of times the drug occurs in the 25,000 prescriptions. Using the numbering in Fig. 3, the original drug IDs in each prescription are replaced with the new numbers and each prescription is sorted by new number, producing the prescription data in Fig. 4, where each row is one prescription with its drug numbers in ascending order. After preprocessing, the data are mined with the improved Apriori algorithm at a support of 1.8%. First the frequent 2-itemsets are produced: 539 in total, in 9621 ms. Then the frequent 3-itemsets: 231 in total, in 970 ms. Finally the frequent 4-itemsets: 33 in total, in 44 ms, from which the association rules are obtained. The total running time is 10635 ms, so processing 25,000 prescriptions takes about 10 s, which is fast and efficient. The drug association rules obtained for each store guide how drugs are arranged there: strongly associated drugs are placed in adjacent or nearby positions, and high-support drugs are placed where they are easy to reach, which helps pharmacists pick drugs quickly.
Warehouse storage locations are configured according to the drug association rules: drugs with high support and strong association are placed where outbound handling is convenient, for fast delivery. During replenishment, strongly associated drugs are sent to the pharmacy stores together, reducing the number of deliveries. Using association rules in warehouse management saves labor and materials, shortens picking and delivery times, improves work efficiency, and provides effective decision support for the enterprise.
The storage allocation guidance interface is shown in Fig. 5; the figure shows an excerpt of the guidance for store 2.
Different association rules are obtained from the prescription data of each store, and store managers can adjust storage locations according to the rules provided, placing strongly associated drugs next to each other for convenient picking. For example, store 1 can place Radix Glycyrrhizae, Radix Scutellariae, Radix Rehmanniae, Cortex Moutan, and Radix Paeoniae Rubra close together, while store 2 can place Radix Glycyrrhizae, Poria, Radix Paeoniae Alba, and Cortex Moutan close together. Combining the rules with pharmacists' personal habits to configure locations sensibly makes picking and dispensing faster and improves work efficiency. In the warehouse, drugs that are strongly associated and frequently used can be stored adjacently in locations near the warehouse exit for convenient outbound handling, and strongly associated drugs can be delivered together during replenishment to reduce the number of deliveries.

Claims (3)

1. the association rule algorithm based on Apriori innovatory algorithm, it is characterized in that: first transaction database D is carried out pretreatment, internal memory is all read after being simplified by data record, connect at frequent item set, beta pruning generates in the process of Candidate Set, the process generating Candidate Set is improved, directly generate candidate, after obtaining Candidate Set, scan database calculates support, owing to Candidate Set and transaction database D all sort, when each affairs T and every record search for Candidate Set respectively, once search the value more than candidate item, the search of these affairs can be stopped.
2. the association rule algorithm based on Apriori innovatory algorithm according to claim 1, it is characterised in that:
The support of described setting option is the ratio shared by record comprising this collection among data set;
Dependency according to transaction database middle term collection arranges minimum support minSup and estimates a number, arranging interval is 0 ~ 100%, first arranging minSup is an estimated value, calculate the frequent item set of transaction database D, if frequent item set is more than estimating a number, reduce the value of minSup, if frequent item set is less than estimating a number, increasing the value of minSup, repeating above step until obtaining the minimum support minSup closest to discreet value.
3. The association rule algorithm based on an improved Apriori algorithm according to claim 1, characterized in that the specific steps are as follows:
(1) Set the size of k; scan the transaction database D and preprocess the data, simplifying the data records by removing redundant data; write all data into memory; count the support of every item in the transaction database D, sort the items by support from high to low and number them (1, 2, ..., n) to obtain the candidate 1-itemsets; remove from the candidate 1-itemsets those whose support is below the minimum support minSup to obtain the frequent 1-itemsets;
(2) Remove from the transaction database D the items whose support is below the minimum support minSup, replace every remaining item in D with its corresponding number, and rearrange each transaction T in ascending order of number to generate a new transaction database D;
(3) Following the traditional Apriori algorithm, use the frequent 1-itemsets to generate the candidate 2-itemsets, and compute support to obtain the frequent 2-itemsets;
(4) Generate the candidate m-itemsets, where m is greater than or equal to 3. Specifically, the frequent (m-1)-itemsets whose first (m-2) items are identical are grouped into the same row; the row key is the first (m-2) items, and the row set consists of the last item of each of those (m-1)-itemsets. The items in the row set are combined pairwise in order from front to back, giving $C_n^2$ combinations, where n is the number of items in the row set. The first item of each combination is joined in turn with the items of the current row key to find the corresponding row keys (if the row key contains only one item, no joining is needed); it is then checked whether the second item of the combination is in the corresponding row set; if it is, the itemset is a candidate m-itemset; if it is not, the itemset is deleted directly; in this way the candidate m-itemsets are obtained directly;
(5) After the candidate m-itemsets are obtained, scan the new transaction database D and compute the support of each candidate in the candidate m-itemsets; because both the candidate m-itemsets and the transaction database D are sorted, when a transaction T is searched for a candidate, finding a value in T greater than the candidate item being sought means that T does not contain that candidate, and the search of T can stop; after all candidate m-itemsets have been counted, those whose support is below the minimum support minSup are removed to obtain the frequent m-itemsets;
(6) Repeat steps (4) to (5) until the value of m reaches the set size k, obtaining the frequent k-itemsets;
(7) List all items of the frequent k-itemsets and generate association rules according to the traditional Apriori algorithm;
(8) Using the numbering generated in step (1), map the association rules back to the source data items.
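Steps (1), (2) and (4) of claim 3 can be illustrated with the sketch below. It is one reading of the translated claim, not a definitive implementation: the function names `renumber` and `generate_candidates`, the alphabetical tie-break between items of equal support, and the interpretation of the row lookup in step (4) as the usual Apriori subset check are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def renumber(transactions, min_sup):
    """Steps (1)-(2): number items by descending support, drop infrequent
    items, and rewrite each transaction as an ascending list of numbers."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    # Assumed tie-break: items with equal support are ordered alphabetically.
    ordered = sorted(counts, key=lambda i: (-counts[i], str(i)))
    frequent = [item for item in ordered if counts[item] / n >= min_sup]
    number = {item: idx + 1 for idx, item in enumerate(frequent)}
    new_db = [sorted(number[i] for i in set(t) if i in number)
              for t in transactions]
    return number, new_db


def generate_candidates(frequent_prev):
    """Step (4): build candidate m-itemsets (m >= 3) from the frequent
    (m-1)-itemsets, given as ascending tuples of item numbers."""
    rows = defaultdict(list)  # row key: first (m-2) items; row set: last items
    for itemset in sorted(frequent_prev):
        rows[itemset[:-1]].append(itemset[-1])
    candidates = []
    for key, row_set in rows.items():
        for a, b in combinations(row_set, 2):  # pairwise, front to back
            # Drop each key item in turn and append `a` to get another row
            # key; `b` must be in that row set, otherwise the candidate has
            # an infrequent (m-1)-subset and is discarded.
            if all(b in rows.get(key[:i] + key[i + 1:] + (a,), ())
                   for i in range(len(key))):
                candidates.append(key + (a, b))
    return candidates
```

With the frequent 2-itemsets (1,2), (1,3), (1,4), (2,3), (2,4), row 1 holds {2, 3, 4} and row 2 holds {3, 4}. The combination (3, 4) from row 1 is discarded because there is no row 3 containing 4, while (1,2,3) and (1,2,4) are kept. The mapping returned by `renumber` can be inverted to carry out step (8).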
CN201510583227.8A 2015-09-14 2015-09-14 Association rule algorithm based on Apriori improved algorithm Pending CN105719155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510583227.8A CN105719155A (en) 2015-09-14 2015-09-14 Association rule algorithm based on Apriori improved algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510583227.8A CN105719155A (en) 2015-09-14 2015-09-14 Association rule algorithm based on Apriori improved algorithm

Publications (1)

Publication Number Publication Date
CN105719155A true CN105719155A (en) 2016-06-29

Family

ID=56144875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510583227.8A Pending CN105719155A (en) 2015-09-14 2015-09-14 Association rule algorithm based on Apriori improved algorithm

Country Status (1)

Country Link
CN (1) CN105719155A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106302450A (en) * 2016-08-15 2017-01-04 广州华多网络科技有限公司 A kind of based on the malice detection method of address and device in DDOS attack
CN107844514A (en) * 2017-09-22 2018-03-27 深圳市易成自动驾驶技术有限公司 Data digging method, device and computer-readable recording medium
CN107967199A (en) * 2017-12-05 2018-04-27 广东电网有限责任公司东莞供电局 Power equipment temperature early warning analysis method based on association rule mining
CN108664642A (en) * 2018-05-16 2018-10-16 句容市茂润苗木有限公司 Rules for Part of Speech Tagging automatic obtaining method based on Apriori algorithm
CN109697361A (en) * 2017-10-20 2019-04-30 北京理工大学 A kind of wooden horse classification method based on Trojan characteristics
CN109767617A (en) * 2018-12-20 2019-05-17 北京航空航天大学 A kind of public security traffic control service exception data analysis method based on Apriori
CN110399876A (en) * 2019-08-30 2019-11-01 东北大学 A kind of Target Searching Method based on relation inference
CN110442609A (en) * 2019-08-02 2019-11-12 云南电网有限责任公司电力科学研究院 A kind of recognition methods of secondary equipment of intelligent converting station association defect
CN110597889A (en) * 2019-10-08 2019-12-20 四川长虹电器股份有限公司 Machine tool fault prediction method based on improved Apriori algorithm
CN110874786A (en) * 2019-10-11 2020-03-10 支付宝(杭州)信息技术有限公司 False transaction group identification method, equipment and computer readable medium
CN111125183A (en) * 2019-11-07 2020-05-08 北京科技大学 Tuple measurement method and system based on CFI-Apriori algorithm in fog environment
CN111159545A (en) * 2019-12-24 2020-05-15 贝壳技术有限公司 Recommended house source determining method and device and house source recommending method and device
CN111914163A (en) * 2020-06-20 2020-11-10 武汉海云健康科技股份有限公司 Medicine combination recommendation method and device, electronic equipment and storage medium
CN114791924A (en) * 2022-03-24 2022-07-26 三峡大学 ACT-Apriori algorithm based on self-coding technology
CN117971921A (en) * 2024-01-15 2024-05-03 兵器装备集团财务有限责任公司 Method and system for detecting abnormal operation of client based on apriori algorithm

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120116989A1 (en) * 2010-11-04 2012-05-10 Kuei-Kuie Lai Method and system for evaluating/analyzing patent portfolio using patent priority approach
CN104573106A (en) * 2015-01-30 2015-04-29 浙江大学城市学院 Intelligent urban construction examining and approving method based on case-based reasoning technology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
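Ma Liwei: "Research on association rule algorithms and their application in traditional Chinese medicine data mining", China Master's Theses Full-text Database *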

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106302450B (en) * 2016-08-15 2019-08-30 广州华多网络科技有限公司 A kind of detection method and device based on malice address in DDOS attack
CN106302450A (en) * 2016-08-15 2017-01-04 广州华多网络科技有限公司 A kind of based on the malice detection method of address and device in DDOS attack
CN107844514A (en) * 2017-09-22 2018-03-27 深圳市易成自动驾驶技术有限公司 Data digging method, device and computer-readable recording medium
CN109697361A (en) * 2017-10-20 2019-04-30 北京理工大学 A kind of wooden horse classification method based on Trojan characteristics
CN107967199A (en) * 2017-12-05 2018-04-27 广东电网有限责任公司东莞供电局 Power equipment temperature early warning analysis method based on association rule mining
CN107967199B (en) * 2017-12-05 2019-11-08 广东电网有限责任公司东莞供电局 Power equipment temperature early warning analysis method based on association rule mining
CN108664642A (en) * 2018-05-16 2018-10-16 句容市茂润苗木有限公司 Rules for Part of Speech Tagging automatic obtaining method based on Apriori algorithm
CN109767617A (en) * 2018-12-20 2019-05-17 北京航空航天大学 A kind of public security traffic control service exception data analysis method based on Apriori
CN110442609A (en) * 2019-08-02 2019-11-12 云南电网有限责任公司电力科学研究院 A kind of recognition methods of secondary equipment of intelligent converting station association defect
CN110442609B (en) * 2019-08-02 2023-05-16 云南电网有限责任公司电力科学研究院 Identification method for association defects of secondary equipment of intelligent substation
CN110399876B (en) * 2019-08-30 2022-11-15 东北大学 Target searching method based on relational reasoning
CN110399876A (en) * 2019-08-30 2019-11-01 东北大学 A kind of Target Searching Method based on relation inference
CN110597889A (en) * 2019-10-08 2019-12-20 四川长虹电器股份有限公司 Machine tool fault prediction method based on improved Apriori algorithm
CN110874786B (en) * 2019-10-11 2022-10-18 支付宝(杭州)信息技术有限公司 False transaction group identification method, device and computer readable medium
CN110874786A (en) * 2019-10-11 2020-03-10 支付宝(杭州)信息技术有限公司 False transaction group identification method, equipment and computer readable medium
CN111125183A (en) * 2019-11-07 2020-05-08 北京科技大学 Tuple measurement method and system based on CFI-Apriori algorithm in fog environment
CN111125183B (en) * 2019-11-07 2023-06-23 北京科技大学 Tuple measurement method and system based on CFI-Apriori algorithm in fog environment
CN111159545A (en) * 2019-12-24 2020-05-15 贝壳技术有限公司 Recommended house source determining method and device and house source recommending method and device
CN111914163A (en) * 2020-06-20 2020-11-10 武汉海云健康科技股份有限公司 Medicine combination recommendation method and device, electronic equipment and storage medium
CN114791924A (en) * 2022-03-24 2022-07-26 三峡大学 ACT-Apriori algorithm based on self-coding technology
CN117971921A (en) * 2024-01-15 2024-05-03 兵器装备集团财务有限责任公司 Method and system for detecting abnormal operation of client based on apriori algorithm
CN117971921B (en) * 2024-01-15 2024-06-21 兵器装备集团财务有限责任公司 Method and system for detecting abnormal operation of client based on apriori algorithm

Similar Documents

Publication Publication Date Title
CN105719155A (en) Association rule algorithm based on Apriori improved algorithm
Sagin et al. Determination of association rules with market basket analysis: application in the retail sector
CN104866474B (en) Individuation data searching method and device
Heer et al. Interactive analysis of big data
US6832216B2 (en) Method and system for mining association rules with negative items
US6564212B2 (en) Method of processing queries in a database system, and database system and software product for implementing such method
US7003504B1 (en) Data processing system
US8484252B2 (en) Generation of a multidimensional dataset from an associative database
Dehdouh et al. Columnar nosql star schema benchmark
Fisher Incremental, approximate database queries and uncertainty for exploratory visualization
CN107016001A (en) A kind of data query method and device
CN104077407B (en) A kind of intelligent data search system and method
CN105159971B (en) A kind of cloud platform data retrieval method
CA2662936A1 (en) Adaptive processing of top-k queries in nested-structure arbitrary markup language such as xml
JP2006526840A (en) Computer System and Method for Multidimensional Database Query and Visualization The present invention relates to the Ministry of Energy ASCI Level 1 Partnership LLL-B523835 with Stanford University ("Center for Integrated Turbulence Simulation"). As well as by the Defense Advanced Research Projects Agency (“Visualization of Complex Systems and Environments”), APRA order number E269. The US government may have rights in the invention.
MXPA01000123A (en) Value-instance-connectivity computer-implemented database.
CN102779143A (en) Visualizing method for knowledge genealogy
Zhang et al. A survey of key technologies for high utility patterns mining
Karim et al. An efficient distributed programming model for mining useful patterns in big datasets
Arvanitis et al. Efficient influence-based processing of market research queries
CN102231158B (en) Data set recommendation method and system
WO2009006028A2 (en) Explaining changes in measures thru data mining
CN106682173A (en) Social security big data OLAP pre-processing method and on-line analysis and query method
EP1890243A2 (en) Adaptive processing of top-k queries in nested structure arbitrary markup language such as XML
Rao et al. Efficient iceberg query evaluation using set representation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160629