CN106777182A - A data stream high-utility itemset mining algorithm that reduces candidate itemsets

Info

Publication number
CN106777182A
Authority
CN
China
Prior art keywords
utility
item
transaction
candidate
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611202991.7A
Other languages
Chinese (zh)
Inventor
陈涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi University of Technology
Original Assignee
Shaanxi University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaanxi University of Technology
Priority to CN201611202991.7A
Publication of CN106777182A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465: Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data stream high-utility itemset mining algorithm that reduces candidate itemsets. First, a global tree is built in a single scan of the current window of the data stream, and the redundant utility values of the header-table entries and nodes in the global tree are reduced. Next, candidate patterns are generated from the global tree, and the candidate itemset utilities of the local trees are reduced using a growth-based algorithm. Within the candidate itemset utilities, following the order of the transaction set, the transaction-weighted utilities of item i_j in the k-th transaction are added up to give the total transaction-weighted utility of node i_j, and sub-frequent (pre-large) utility itemsets are processed by adding the sub-frequent utility items to the tree. A high transaction utility threshold and a low transaction utility threshold are then introduced, and PTUV_D stores the sub-frequent utility itemsets of the data set. Finally, the actual utilities are computed to determine the final high-utility itemsets. Experimental results on real data streams indicate that the time and space efficiency and the memory usage of the invention are better than those of other data stream high-utility pattern mining algorithms.

Description

A data stream high-utility itemset mining algorithm that reduces candidate itemsets
Technical field
The invention belongs to the technical field of data mining, and more particularly relates to a data stream high-utility itemset mining algorithm that reduces candidate itemsets.
Background
With the rapid development of cloud computing, big data, and the Internet, nearly every aspect of daily life depends on computer technology to store, mine, and analyze data. The data we receive is no longer limited to small-scale data within a single system; it is a vast body of information interconnected across industries, and extracting knowledge from such large-scale data is a major challenge. In traditional information systems, operations such as inserting, deleting, querying, updating, and counting records are increasingly outdated. Research that meets current needs asks which techniques can mine massive stored data, discover the latent information in it quickly and effectively, and use the mined information to support the predictions of managers and decision makers and improve resource utilization. Research on knowledge discovery in databases and data mining therefore emerged and developed quickly. Data mining (DM) is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random real-world application data. DM techniques are already used in many fields, including manufacturing, retail, finance, health care, engineering, and science, and are widely applied in behavior recommendation and online public-opinion monitoring systems.
Association rule mining, an important research branch of data mining, has been studied extensively. It mines the degree of association between itemsets, and its core is frequent itemset mining. In 1993, Agrawal et al. first proposed the concept of association rules after studying Wal-Mart supermarket shopping-basket data, and the concept was subsequently applied in many industries. On online shopping platforms such as Tmall and Dangdang, for example, mined association rules can predict customers' purchasing patterns and preferences, so that each customer can be offered a personalized shopping experience. Association rule mining, however, analyzes only the strength of association between goods and ignores other factors such as the quantity and profit of items, so itemsets that occur rarely but have high utility are overlooked. To solve this problem, researchers proposed high utility itemset mining, which adds item quantity and profit to the association rule model: an itemset is called a high-utility itemset when its total utility is greater than a previously given utility threshold.
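The utility definition in the preceding paragraph can be illustrated with a minimal Python sketch. The toy profits, quantities, and the function names `itemset_utility` and `is_high_utility` are hypothetical and are not taken from the patent; the strict "greater than" comparison follows the wording of the text.

```python
from typing import Dict, Iterable, List

# Hypothetical toy data: per-unit profit (external utility) of each item.
PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 1}

# Each transaction maps an item to its purchased quantity (internal utility).
TRANSACTIONS: List[Dict[str, int]] = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1},
]


def itemset_utility(itemset: Iterable[str],
                    transactions: List[Dict[str, int]],
                    profit: Dict[str, int]) -> int:
    """Sum quantity * profit of the itemset over every transaction containing it."""
    items = set(itemset)
    total = 0
    for t in transactions:
        if items.issubset(t):  # the transaction contains every item of the itemset
            total += sum(t[i] * profit[i] for i in items)
    return total


def is_high_utility(itemset: Iterable[str],
                    transactions: List[Dict[str, int]],
                    profit: Dict[str, int],
                    min_util: int) -> bool:
    """An itemset is high-utility when its total utility exceeds the threshold."""
    return itemset_utility(itemset, transactions, profit) > min_util
```

In this toy data, item `a` occurs in only two transactions yet carries a utility of 15, which is the kind of infrequent but valuable itemset that support-based mining would overlook.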
However, with the rapid development of database and network technology and the large increase in stored data volume, data is no longer static but accumulates and changes continuously; examples include the sales data of online platforms, the call records of China Unicom and China Mobile, and real-time traffic monitoring data. Unlike in traditional association rule mining, the data in the transaction set changes over time, and updated data is more important than earlier data. Correctly accounting for these changes while quickly and effectively mining genuinely valuable knowledge and information places stricter requirements on association rule mining. Traditional batch frequent itemset mining algorithms can produce new associated itemsets only by rescanning the updated database. The FUP algorithm proposed in the prior art addresses the need to scan the updated database repeatedly when the newly added transaction set is smaller than the original transaction set.
Combining the concept of sub-frequent (pre-large) itemsets with the FP-tree led to the prelarge-tree structure, which performs incremental mining efficiently. The concepts of decremental mining and modification mining were proposed afterwards. Later, utility values were taken into account on top of incremental association rule mining: using the downward closure property of transaction-weighted utility (TWU), the FUP algorithm and the sub-frequent itemset concept were improved step by step. For example, Lin et al. proposed the FUP-HU algorithm, which performs incremental high-utility mining based on the FUP algorithm; however, when an itemset has low-frequency utility in the original data set but high-frequency utility in the updated data set, the updated database must still be rescanned. In view of this, the Pre-HU algorithm introduces the Two-Phase algorithm and the pre-large concept into utility mining and uses the downward closure property of transactions to reduce the time spent scanning the database.
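The downward closure property of TWU mentioned above can be sketched as follows. This is a generic illustration of TWU-based pruning under assumed toy data, not the FUP-HU or Pre-HU algorithm itself; the function names and the `>=` threshold convention are assumptions.

```python
from typing import Dict, Iterable, List

# Hypothetical toy data (per-unit profit and item quantities per transaction).
PROFIT: Dict[str, int] = {"a": 5, "b": 2, "c": 1}
TRANSACTIONS: List[Dict[str, int]] = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1},
]


def transaction_utility(t: Dict[str, int], profit: Dict[str, int]) -> int:
    """Utility of a whole transaction: sum of quantity * profit over its items."""
    return sum(q * profit[i] for i, q in t.items())


def twu(itemset: Iterable[str],
        transactions: List[Dict[str, int]],
        profit: Dict[str, int]) -> int:
    """Transaction-weighted utility: total utility of the transactions containing the itemset."""
    items = set(itemset)
    return sum(transaction_utility(t, profit)
               for t in transactions if items.issubset(t))


def promising_items(transactions: List[Dict[str, int]],
                    profit: Dict[str, int],
                    min_util: int) -> List[str]:
    """Keep the single items whose TWU reaches the threshold.

    Because TWU can only shrink when an itemset grows, any superset of a
    pruned item can be discarded without being evaluated.
    """
    items = {i for t in transactions for i in t}
    return sorted(i for i in items if twu({i}, transactions, profit) >= min_util)
```

With a threshold of 15, item `b` (TWU 12) is pruned together with all of its supersets. Item `c` survives with a TWU of 16 even though its actual utility is only 4, which shows why TWU-based methods produce candidates that must be verified later.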
Although these incremental high-utility algorithms improve update efficiency and reduce the number of scans of the original database, they still generate a large number of useless candidate frequent items and are suited only to insertions into the transaction database; when the itemsets in the original database's transactions change (through deletion, modification, and so on), the updated database must still be rescanned. The present invention reduces the number of candidate frequent items, handles both insertions into and changes to the transaction set, and completes dynamic utility mining tasks efficiently, which matches the current requirements of utility mining.
Summary of the invention
The purpose of the invention is to overcome the shortcomings of the prior art by providing a data stream high-utility itemset mining algorithm that reduces candidate itemsets.
To achieve the above purpose, the invention provides the following technical solution:
A data stream high-utility itemset mining algorithm that reduces candidate itemsets, comprising the following steps:
S1. First, build a global tree in a single scan of the current window of the data stream, and reduce the redundant utility values of the header-table entries and nodes in the global tree;
S2. Then generate candidate patterns from the global tree, and reduce the candidate itemset utilities of the local trees using a growth-based algorithm;
S3. Within the candidate itemset utilities, following the order of the transaction set, successively add up the transaction-weighted utilities of item i_j in the k-th transaction to obtain the total transaction-weighted utility of node i_j; at the same time, add the prefix of item i_j to the prefix itemset linked list of node i_j; process the sub-frequent utility itemsets by adding the sub-frequent utility items to the tree;
S4. Then introduce a high transaction utility threshold and a low transaction utility threshold to divide the transaction-weighted utility range into three tiers, which are handled tier by tier in the original transaction set and the newly added transaction set, with HTWU_D storing the high-frequency utility itemsets of the data set and PTUV_D storing its sub-frequent utility itemsets;
S5. Finally, compute the actual utilities to determine the final high-utility itemsets.
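The three-tier division of step S4 can be sketched in Python as below. The names `tier` and `partition`, the absolute threshold values, and the inclusive boundary handling are assumptions made for illustration; only the roles of HTWU_D and PTUV_D come from the text.

```python
from typing import Dict, Tuple


def tier(twu_value: float, upper: float, lower: float) -> str:
    """Place a TWU value in one of three tiers bounded by the two thresholds."""
    if lower > upper:
        raise ValueError("the low threshold must not exceed the high threshold")
    if twu_value >= upper:
        return "high"
    if twu_value >= lower:
        return "sub-frequent"
    return "low"


def partition(twu_table: Dict[str, float],
              upper: float,
              lower: float) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split a TWU table into the two stores that step S4 names.

    htwu plays the role of HTWU_D (high-frequency utility items) and
    ptuv plays the role of PTUV_D (sub-frequent utility items);
    low-tier items are not stored.
    """
    htwu: Dict[str, float] = {}
    ptuv: Dict[str, float] = {}
    for item, value in twu_table.items():
        kind = tier(value, upper, lower)
        if kind == "high":
            htwu[item] = value
        elif kind == "sub-frequent":
            ptuv[item] = value
    return htwu, ptuv
```

Keeping the middle tier is what lets an update promote a sub-frequent item to the high tier without going back to the original data.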
Preferably, the global tree is built as follows:
A. First compute the change in transaction-weighted utility of each itemset in the changed transactions;
B. Then, according to item frequency in the original database, divide the items into high-frequency utility, sub-frequent utility, and low-frequency utility items in order to construct the PreHU-tree;
C. Finally, determine the frequent n-items directly by searching the transaction-weighted utility and the prefix itemset linked list of each PreHU-tree node;
D. Combine the itemset supports in the prefix itemset linked lists with the external utilities of the items to mine the changed high-utility itemsets.
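Step A above, computing the change in transaction-weighted utility caused by changed transactions, might look like the following sketch. It works at the single-item level only, treats a modified transaction as a deletion followed by an insertion, and uses hypothetical function names; the PreHU-tree itself is not modelled.

```python
from collections import defaultdict
from typing import Dict, List

Transaction = Dict[str, int]  # item -> quantity


def transaction_utility(t: Transaction, profit: Dict[str, int]) -> int:
    """Utility of a whole transaction: sum of quantity * profit over its items."""
    return sum(q * profit[i] for i, q in t.items())


def twu_delta(inserted: List[Transaction],
              deleted: List[Transaction],
              profit: Dict[str, int]) -> Dict[str, int]:
    """Per-item TWU change: inserted transactions add their utility, deleted ones subtract it."""
    delta: Dict[str, int] = defaultdict(int)
    for t in inserted:
        tu = transaction_utility(t, profit)
        for item in t:
            delta[item] += tu
    for t in deleted:
        tu = transaction_utility(t, profit)
        for item in t:
            delta[item] -= tu
    return dict(delta)


def apply_delta(twu_table: Dict[str, int], delta: Dict[str, int]) -> Dict[str, int]:
    """Return an updated copy of the TWU table without rescanning unchanged transactions."""
    updated = dict(twu_table)
    for item, change in delta.items():
        updated[item] = updated.get(item, 0) + change
    return updated
```

After the update, each item can be re-assigned to the high-frequency, sub-frequent, or low-frequency tier from its new TWU value alone.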
Preferably, the redundant utility reduction algorithm is as follows:
A. A conditional pattern base is built for each item in the header table of a global HUS tree. Each divided search space does not include the information of every item in the header table, so when candidate patterns are generated from a conditional pattern base, the utility information of the items that come later need not be included;
B. Suppose S = {i_1 < i_2 < ... < i_m} is the current order, where i_1 and i_m are the top and bottom items of the global tree's header table. Suppose an item i_p is selected from the header table for mining and a conditional pattern base is built for it. The conditional pattern base contains only the items that precede i_p in the order, i_1, i_2, ..., i_{p-1}, so the utilities of the later items need not be added to the utility of the candidate patterns.
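One way to read the reduction described above is sketched below: when the conditional pattern base of i_p is built, each prefix path carries only the utility of the items up to and including i_p rather than the full transaction utility. This is an interpretation under assumed toy data, operating directly on transactions instead of on the HUS tree, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Transaction = Dict[str, int]  # item -> quantity


def conditional_pattern_base(transactions: List[Transaction],
                             profit: Dict[str, int],
                             order: List[str],
                             target: str) -> List[Tuple[Tuple[str, ...], int]]:
    """Return (prefix path, reduced utility) pairs for the target item.

    order is the header-table order i_1 < i_2 < ... < i_m, and every item in
    the transactions is assumed to appear in it. Items ranked after the
    target are dropped from the path and from the utility.
    """
    rank = {item: r for r, item in enumerate(order)}
    base = []
    for t in transactions:
        if target not in t:
            continue
        # Items of this transaction up to and including the target, in header order.
        kept = [i for i in sorted(t, key=rank.__getitem__) if rank[i] <= rank[target]]
        reduced = sum(t[i] * profit[i] for i in kept)
        base.append((tuple(kept[:-1]), reduced))  # the target itself is the last kept item
    return base
```

For a transaction `{"a": 1, "b": 1, "c": 2}` with profits 5, 2, and 1 and target `b`, the path `("a",)` carries a utility of 7 instead of the full transaction utility of 9, so the candidate bound becomes tighter.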
Technical effects and advantages of the invention: in the data stream high-utility itemset mining algorithm that reduces candidate itemsets provided by the invention, a global tree is first built in a single scan of the current window of the data stream, and the redundant utility values of the header-table entries and nodes in the global tree are reduced; candidate patterns are then generated from the global tree, and the candidate itemset utilities of the local trees are reduced using a growth-based algorithm; finally, high-utility patterns are selected from the candidate patterns. Experimental results on real data streams indicate that the time and space efficiency and the memory usage of the invention are better than those of other data stream high-utility pattern mining algorithms.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to specific embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
A data stream high-utility itemset mining algorithm that reduces candidate itemsets, comprising the following steps:
S1. First, build a global tree in a single scan of the current window of the data stream, and reduce the redundant utility values of the header-table entries and nodes in the global tree;
S2. Then generate candidate patterns from the global tree, and reduce the candidate itemset utilities of the local trees using a growth-based algorithm;
S3. Within the candidate itemset utilities, following the order of the transaction set, successively add up the transaction-weighted utilities of item i_j in the k-th transaction to obtain the total transaction-weighted utility of node i_j; at the same time, add the prefix of item i_j to the prefix itemset linked list of node i_j; process the sub-frequent utility itemsets by adding the sub-frequent utility items to the tree;
S4. Then introduce a high transaction utility threshold and a low transaction utility threshold to divide the transaction-weighted utility range into three tiers, which are handled tier by tier in the original transaction set and the newly added transaction set, with HTWU_D storing the high-frequency utility itemsets of the data set and PTUV_D storing its sub-frequent utility itemsets;
S5. Finally, compute the actual utilities to determine the final high-utility itemsets.
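Step S1 can be pictured with the following sketch, which keeps the most recent transactions of a stream in a fixed-size window and inserts each of them once into a prefix tree with a header table. The class names, the fixed item order, and the choice to store the accumulated prefix utility in each node are all assumptions; the patent does not specify the node layout.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional

Transaction = Dict[str, int]  # item -> quantity


class Node:
    """Tree node holding an item and the utility accumulated along its prefix."""

    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item = item
        self.parent = parent
        self.children: Dict[str, "Node"] = {}
        self.utility = 0


class GlobalTree:
    """Prefix tree plus a header table mapping each item to its nodes."""

    def __init__(self, order: List[str]) -> None:
        self.rank = {item: r for r, item in enumerate(order)}
        self.root = Node(None, None)
        self.header: Dict[str, List[Node]] = {}

    def insert(self, transaction: Transaction, profit: Dict[str, int]) -> None:
        node = self.root
        running = 0  # utility of the items seen so far on this path
        for item in sorted(transaction, key=self.rank.__getitem__):
            running += transaction[item] * profit[item]
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                self.header.setdefault(item, []).append(child)
            child.utility += running  # excludes the utility of items further down the path
            node = child


def build_from_window(stream: Iterable[Transaction], window_size: int,
                      order: List[str], profit: Dict[str, int]) -> GlobalTree:
    """Keep the last window_size transactions and insert each into the tree once."""
    window = deque(stream, maxlen=window_size)
    tree = GlobalTree(order)
    for t in window:
        tree.insert(t, profit)
    return tree
```

Because every node stores only the utility of its own prefix, a node near the top of the tree never carries the utility of the items below it, which is one possible realization of reducing redundant utility values at build time.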
Specifically, the global tree is built as follows:
A. First compute the change in transaction-weighted utility of each itemset in the changed transactions;
B. Then, according to item frequency in the original database, divide the items into high-frequency utility, sub-frequent utility, and low-frequency utility items in order to construct the PreHU-tree;
C. Finally, determine the frequent n-items directly by searching the transaction-weighted utility and the prefix itemset linked list of each PreHU-tree node;
D. Combine the itemset supports in the prefix itemset linked lists with the external utilities of the items to mine the changed high-utility itemsets.
Specifically, the redundant utility reduction algorithm is as follows:
A. A conditional pattern base is built for each item in the header table of a global HUS tree. Each divided search space does not include the information of every item in the header table, so when candidate patterns are generated from a conditional pattern base, the utility information of the items that come later need not be included;
B. Suppose S = {i_1 < i_2 < ... < i_m} is the current order, where i_1 and i_m are the top and bottom items of the global tree's header table. Suppose an item i_p is selected from the header table for mining and a conditional pattern base is built for it. The conditional pattern base contains only the items that precede i_p in the order, i_1, i_2, ..., i_{p-1}, so the utilities of the later items need not be added to the utility of the candidate patterns.
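The quantities involved can be written out as follows. The notation is an assumption added for illustration (the patent gives no formulas): q(i, T) is the quantity of item i in transaction T, p(i) is its unit profit, and utilities are assumed to be non-negative.

```latex
\begin{align}
u(i, T) &= q(i, T)\, p(i), \\
tu(T) &= \sum_{i \in T} u(i, T), \\
TWU(X) &= \sum_{T \supseteq X} tu(T), \\
ru(i_p, T) &= \sum_{\substack{i_q \in T \\ q \le p}} u(i_q, T) \;\le\; tu(T).
\end{align}
```

Here ru(i_p, T) is a hypothetical name for the reduced utility that a transaction T contributes to the conditional pattern base of i_p. Since that base contains only i_1, ..., i_{p-1}, replacing tu(T) with ru(i_p, T) still bounds the utility of every pattern grown from it while excluding the utilities of the later items.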
In summary: in the data stream high-utility itemset mining algorithm that reduces candidate itemsets provided by the invention, a global tree is first built in a single scan of the current window of the data stream, and the redundant utility values of the header-table entries and nodes in the global tree are reduced; candidate patterns are then generated from the global tree, and the candidate itemset utilities of the local trees are reduced using a growth-based algorithm; finally, high-utility patterns are selected from the candidate patterns. Experimental results on real data streams indicate that the time and space efficiency and the memory usage of the invention are better than those of other data stream high-utility pattern mining algorithms.
Finally, it should be noted that the above describes only preferred embodiments of the invention and is not intended to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, a person skilled in the art may still modify the technical solutions described in them or make equivalent substitutions for some of their technical features. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (3)

1. A data stream high-utility itemset mining algorithm that reduces candidate itemsets, characterized by comprising the following steps:
S1. First, build a global tree in a single scan of the current window of the data stream, and reduce the redundant utility values of the header-table entries and nodes in the global tree;
S2. Then generate candidate patterns from the global tree, and reduce the candidate itemset utilities of the local trees using a growth-based algorithm;
S3. Within the candidate itemset utilities, following the order of the transaction set, successively add up the transaction-weighted utilities of item i_j in the k-th transaction to obtain the total transaction-weighted utility of node i_j; at the same time, add the prefix of item i_j to the prefix itemset linked list of node i_j; process the sub-frequent utility itemsets by adding the sub-frequent utility items to the tree;
S4. Then introduce a high transaction utility threshold and a low transaction utility threshold to divide the transaction-weighted utility range into three tiers, which are handled tier by tier in the original transaction set and the newly added transaction set, with HTWU_D storing the high-frequency utility itemsets of the data set and PTUV_D storing its sub-frequent utility itemsets;
S5. Finally, compute the actual utilities to determine the final high-utility itemsets.
2. The data stream high-utility itemset mining algorithm that reduces candidate itemsets according to claim 1, characterized in that the global tree is built as follows:
A. First compute the change in transaction-weighted utility of each itemset in the changed transactions;
B. Then, according to item frequency in the original database, divide the items into high-frequency utility, sub-frequent utility, and low-frequency utility items in order to construct the PreHU-tree;
C. Finally, determine the frequent n-items directly by searching the transaction-weighted utility and the prefix itemset linked list of each PreHU-tree node;
D. Combine the itemset supports in the prefix itemset linked lists with the external utilities of the items to mine the changed high-utility itemsets.
3. The data stream high-utility itemset mining algorithm that reduces candidate itemsets according to claim 1, characterized in that the redundant utility reduction algorithm is as follows:
A. A conditional pattern base is built for each item in the header table of a global HUS tree. Each divided search space does not include the information of every item in the header table, so when candidate patterns are generated from a conditional pattern base, the utility information of the items that come later need not be included;
B. Suppose S = {i_1 < i_2 < ... < i_m} is the current order, where i_1 and i_m are the top and bottom items of the global tree's header table. Suppose an item i_p is selected from the header table for mining and a conditional pattern base is built for it. The conditional pattern base contains only the items that precede i_p in the order, i_1, i_2, ..., i_{p-1}, so the utilities of the later items need not be added to the utility of the candidate patterns.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611202991.7A CN106777182A (en) 2016-12-23 2016-12-23 A data stream high-utility itemset mining algorithm that reduces candidate itemsets

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611202991.7A CN106777182A (en) 2016-12-23 2016-12-23 A data stream high-utility itemset mining algorithm that reduces candidate itemsets

Publications (1)

Publication Number Publication Date
CN106777182A 2017-05-31

Family

ID=58897578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611202991.7A Pending CN106777182A (en) 2016-12-23 2016-12-23 A data stream high-utility itemset mining algorithm that reduces candidate itemsets

Country Status (1)

Country Link
CN (1) CN106777182A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766461A (en) * 2017-09-28 2018-03-06 新乡学院 A kind of campus Student Information Management System
CN109101530A (en) * 2018-06-22 2018-12-28 哈尔滨工业大学(深圳) Effective sequence of events pattern mining algorithm
CN109101530B (en) * 2018-06-22 2021-09-21 哈尔滨工业大学(深圳) High-utility event sequence pattern mining method
CN109408563B (en) * 2018-11-07 2021-06-22 哈尔滨工业大学(深圳) High average utility item set mining method and device and computer equipment
CN109408563A (en) * 2018-11-07 2019-03-01 哈尔滨工业大学(深圳) High average utility item set mining method, apparatus and computer equipment
CN110413660A (en) * 2019-07-26 2019-11-05 哈尔滨工业大学(深圳) Excavate the method, apparatus and computer readable storage medium of global effective item collection
CN110413660B (en) * 2019-07-26 2024-05-14 哈尔滨工业大学(深圳) Method, apparatus and computer readable storage medium for mining global efficient item sets
CN110471960A (en) * 2019-08-21 2019-11-19 桂林电子科技大学 A kind of effective item set mining method containing disutility
CN110471960B (en) * 2019-08-21 2022-04-05 桂林电子科技大学 High-utility item set mining method containing negative utility
CN112801793A (en) * 2021-01-31 2021-05-14 哈尔滨工业大学(威海) Method for mining high-profit commodities in e-commerce transaction data
CN112801793B (en) * 2021-01-31 2022-04-15 哈尔滨工业大学(威海) Method for mining high-profit commodities in e-commerce transaction data
CN113792099A (en) * 2021-08-12 2021-12-14 上海熙业信息科技有限公司 Data flow high-utility item set mining system based on historical effective table pruning
CN113792099B (en) * 2021-08-12 2023-08-25 上海熙业信息科技有限公司 Data flow high-utility item set mining system based on historical utility table pruning
CN115964415A (en) * 2023-03-16 2023-04-14 山东科技大学 Pre-HUSPM-based database sequence insertion processing method
CN115964415B (en) * 2023-03-16 2023-05-26 山东科技大学 Pre-HUSPM-based database sequence insertion processing method

Similar Documents

Publication Publication Date Title
CN106777182A (en) A data stream high-utility itemset mining algorithm that reduces candidate itemsets
Bergmeir et al. Bagging exponential smoothing methods using STL decomposition and Box–Cox transformation
CN104182474B (en) A kind of pre- recognition methods for being lost in user
Uddin et al. Population age structure and savings rate impacts on economic growth: Evidence from Australia
Edeme et al. Infrastructural development, sustainable agricultural output and employment in ECOWAS countries
Zhao et al. Intuitionistic fuzzy set approach to multi-objective evolutionary clustering with multiple spatial information for image segmentation
CN105760443B (en) Item recommendation system, project recommendation device and item recommendation method
De Queiroz et al. Sharing cuts under aggregated forecasts when decomposing multi-stage stochastic programs
CN107895038A (en) A kind of link prediction relation recommends method and device
Kang et al. Forecast with forecasts: Diversity matters
CN104850577A (en) Data flow maximal frequent item set mining method based on ordered composite tree structure
Liu et al. A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge
CN109559156A (en) Client's intention based on client properties and marketing data has monitoring forecast method
Mirhashemi et al. Extracting association rules from changes in aquifer drawdown in irrigation areas of Qazvin plain, Iran
CN107578165A (en) Marketing of bank management method and system based on brief algorithm in rough set
CN109754106A (en) A kind of prediction technique of shopping center shop distribution planning
CN107220742A (en) A kind of development of information system common support method analyzed based on system vulnerability and platform
CN108509531B (en) Spark platform-based uncertain data set frequent item mining method
Meagher et al. Applied general equilibrium modelling and labour market forecasting
Chaudhari et al. Advance privacy preserving in association rule mining
Khalili Araghi et al. Effective Factors on the Growth of Provinces of Iran: A Spatial Panel Approach
Shangguan et al. Enhancing data quality and real-time sharing performance in water informatics through decision tree mining algorithm
Billa et al. Efficient frequent pattern mining algorithm based on node sets in cloud computing environment
Singh et al. Education and unemployment in rural and urban Kerala
Surawase et al. High Utility Itemset Mining From Transaction Database Using Up-Growth And Up-Growth+ Algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170531