CN106777182A - A data stream high-utility itemset mining algorithm that reduces candidate itemsets
- Publication number
- CN106777182A (application number CN201611202991.7A)
- Authority
- CN
- China
- Prior art keywords
- utility
- item
- transaction
- candidate
- tree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Fuzzy Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data stream high-utility itemset mining algorithm that reduces candidate itemsets. First, a global tree is built by a single scan of the current window of the data stream, and the redundant utility values of the header table entries and nodes in the global tree are reduced. Then, candidate patterns are generated from the global tree, and the candidate itemset utilities of the local trees are reduced by a growth-based algorithm. For the candidate itemset utilities, following the order of the transaction set, the transaction-weighted utility (TWU) of item i_j in the k-th transaction is added in turn to give the total transaction-weighted utility of node i_j, and pre-large itemsets are processed by adding the pre-large items to the tree. Next, an upper transaction utility threshold and a lower transaction utility threshold are introduced, and PTWU_D stores the pre-large itemsets of the data set. Finally, actual utilities are calculated to determine the final high-utility itemsets. Experimental results on real data streams indicate that the time and space efficiency and the memory footprint of the invention are better than those of other data stream high-utility pattern mining algorithms.
Description
Technical field
The invention belongs to the technical field of data mining and, more specifically, relates to a data stream
high-utility itemset mining algorithm that reduces candidate itemsets.
Background
With the rapid development of cloud computing, big data and the Internet, every aspect of daily life depends on
computer technology to store, mine and analyze data. The data we receive is no longer a small data set inside a
single system but a vast body of interconnected information spanning many industries, and obtaining knowledge
and information from such large-scale data is a huge challenge. In traditional information systems, operations
such as inserting, deleting, querying, updating and counting data are becoming outdated. What meets the
requirements of the times is research into techniques that can mine huge volumes of stored data, discover the
latent information among the data quickly and effectively, and use the mined information to give managers or
decision makers predictive knowledge, thereby improving the utilization of resources. Research on knowledge
discovery in databases and on its supporting technology, data mining, therefore emerged and developed quickly.
Data mining is the process of extracting implicit, previously unknown but potentially useful information and
knowledge from large, incomplete, noisy, fuzzy and random real application data. Many fields now apply data
mining (DM) technology, including manufacturing, retail, finance, health care, engineering and science. It is
also widely applied in areas such as behavior recommendation and online public opinion monitoring systems.
Association rule mining, a very important research branch of data mining, has been studied extensively. It
mainly mines the degree of association between itemsets, and its core is frequent itemset mining. In 1993
Agrawal et al. first proposed the concept of association rules after studying the shopping basket data of
Wal-Mart supermarkets, and the concept has since been applied in many industries. On online shopping platforms
(such as Tmall and Dangdang), the association rules obtained by mining can predict the purchasing patterns and
preferences of customers, so that each customer can be offered a personalized shopping experience. However,
association rule mining analyzes only the degree of association between goods and ignores other factors such as
the quantity and profit of items, so itemsets that occur rarely but have high utility are overlooked. To solve
this problem, scholars proposed high-utility itemset mining, which adds the quantity and profit value of items
to the association rule model: when the total utility value of an itemset exceeds a previously given utility
threshold, the itemset is called a high-utility itemset.
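The definition above can be illustrated with a minimal Python sketch. The transactions, profit table and function names are hypothetical and are not taken from the patent, and the comparison is written as "at least the threshold", which is an assumed convention.

```python
from typing import Dict, Iterable, List

# Hypothetical toy data: each transaction maps an item to its purchased quantity.
transactions: List[Dict[str, int]] = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
]
# Hypothetical unit profit (external utility) of each item.
profit: Dict[str, int] = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset: Iterable[str],
                    transactions: List[Dict[str, int]],
                    profit: Dict[str, int]) -> int:
    """Sum quantity * profit over every transaction containing the whole itemset."""
    items = list(itemset)
    total = 0
    for t in transactions:
        if all(i in t for i in items):
            total += sum(t[i] * profit[i] for i in items)
    return total


def is_high_utility(itemset, transactions, profit, min_util: int) -> bool:
    """Assumed convention: an itemset qualifies when its utility is >= min_util."""
    return itemset_utility(itemset, transactions, profit) >= min_util
```

With this data, the itemset {a, b} appears in the first and third transactions, so its utility is (1*5 + 2*2) + (1*5 + 1*2) = 16.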
However, with the rapid development of database and network technology, the sharp increase in stored data volume
means that data is no longer static but accumulates and changes. Examples are the sales data of online
platforms, the call records of China Unicom and China Mobile, and real-time traffic monitoring data. Unlike in
traditional association rule mining, the data in the transaction set changes over time, and newer data is more
important than older data. How to account for these changes correctly and mine truly valuable knowledge and
information quickly and effectively poses stricter requirements and challenges for association rule mining.
Traditional batch frequent itemset mining algorithms can produce new association itemsets only by rescanning the
updated database. The prior art proposed the FUP algorithm, which addresses the need to rescan the updated
database frequently when the newly added transaction set is smaller than the original transaction set. The
concept of pre-large itemsets was then combined with the FP-tree to design the prelarge-tree structure for
efficient incremental mining. The concepts of decremental mining and modification mining were proposed
afterwards. Later, utility values were taken into account on the basis of incremental association rule mining,
using the downward closure property of transaction-weighted utility (TWU), with continuous improvements built on
the FUP algorithm and the pre-large itemset concept. For example, Lin et al. proposed the FUP-HUI algorithm,
which performs incremental high-utility mining based on the FUP algorithm; but when an itemset has low
transaction-weighted utility in the original data set and high transaction-weighted utility in the updated data
set, the updated database must still be rescanned. In view of this, the Pre-HUI algorithm brings the Two-Phase
algorithm and the pre-large concept into utility mining and uses the downward closure property of
transaction-weighted utility to reduce the time spent scanning the database.
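The downward closure property of transaction-weighted utility mentioned above can be sketched as follows. This is a minimal illustration with hypothetical data and function names, not the FUP-based or Two-Phase implementations themselves: the TWU of an itemset is the sum of the utilities of all transactions that contain it, so it never increases when the itemset grows, and any item whose TWU is below the threshold can be discarded before candidates are generated.

```python
from typing import Dict, Iterable, List

# Hypothetical toy data (item -> quantity) and unit profits.
transactions: List[Dict[str, int]] = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
]
profit: Dict[str, int] = {"a": 5, "b": 2, "c": 1}


def transaction_utility(t: Dict[str, int], profit: Dict[str, int]) -> int:
    """Total utility of one transaction."""
    return sum(q * profit[i] for i, q in t.items())


def twu(itemset: Iterable[str], transactions, profit) -> int:
    """Transaction-weighted utility: sum of the utilities of the transactions containing the itemset."""
    items = list(itemset)
    return sum(transaction_utility(t, profit)
               for t in transactions if all(i in t for i in items))


def prune_items_by_twu(transactions, profit, min_util: int) -> List[str]:
    """Keep only items whose TWU reaches min_util (assumed '>=' convention)."""
    items = sorted({i for t in transactions for i in t})
    return [i for i in items if twu([i], transactions, profit) >= min_util]
```

Here the transaction utilities are 9, 11 and 10, so TWU(a) = 30, TWU(b) = 19 and TWU(c) = 21. With a threshold of 20, item b and every superset of it can be skipped.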
Although these incremental high-utility algorithms improve update efficiency and reduce the number of scans of
the original database, they still produce a large number of useless candidate itemsets and are suited only to
the insertion of transactions; when itemsets in the original transaction database change (deletion,
modification, etc.), the updated database must still be rescanned. The present invention effectively reduces the
number of candidate itemsets, handles both the insertion and the modification of transactions, and completes
dynamic utility mining tasks efficiently, which meets the current demand for utility mining.
Summary of the invention
The purpose of the invention is to overcome the shortcomings of the prior art by proposing a data stream
high-utility itemset mining algorithm that reduces candidate itemsets.
To achieve the above purpose, the invention provides the following technical scheme:
A data stream high-utility itemset mining algorithm that reduces candidate itemsets, comprising the following steps:
S1. First, a global tree is built by a single scan of the current window of the data stream, and the redundant
utility values of the header table entries and nodes in the global tree are reduced;
S2. Then, candidate patterns are generated from the global tree, and the candidate itemset utilities of the
local trees are reduced by a growth-based algorithm;
S3. For the candidate itemset utilities, following the order of the transaction set, the transaction-weighted
utility of item i_j in the k-th transaction is added in turn to give the total transaction-weighted utility of
node i_j; at the same time, the prefix of item i_j is added to the prefix itemset linked list of node i_j;
pre-large itemsets are processed by adding the pre-large items to the tree;
S4. Then, by introducing an upper transaction utility threshold and a lower transaction utility threshold, the
transaction-weighted utility range is divided into three tiers, which are processed tier by tier in the original
transaction set and the newly added transaction set, with HTWU_D storing the high transaction-weighted utility
itemsets of the data set and PTWU_D storing the pre-large itemsets of the data set;
S5. Finally, actual utilities are calculated to determine the final high-utility itemsets.
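The three-tier division in step S4 can be sketched in Python as below. This is only an illustration under stated assumptions: the thresholds are treated as absolute utility values, the comparisons use ">=", and the function name and the plain dictionaries standing in for the two stores are hypothetical.

```python
from typing import Dict, Tuple


def partition_by_twu(twu_by_item: Dict[str, int],
                     upper: int,
                     lower: int) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split items into a high-TWU store and a pre-large store.

    Items with TWU >= upper go to the high store, items with
    lower <= TWU < upper go to the pre-large store, and the rest
    (the low tier) are not stored at all.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    high: Dict[str, int] = {}
    pre_large: Dict[str, int] = {}
    for item, value in twu_by_item.items():
        if value >= upper:
            high[item] = value
        elif value >= lower:
            pre_large[item] = value
    return high, pre_large


# Hypothetical TWU values for three items.
high_store, pre_large_store = partition_by_twu({"a": 30, "b": 19, "c": 21},
                                               upper=25, lower=20)
```

Keeping the middle tier is what lets an item be promoted to the high tier after later insertions without rescanning the original transactions, since its stored TWU only needs to be adjusted by the change.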
Preferably, the global tree is built as follows:
A. First, the change in transaction-weighted utility of each itemset in the changed transactions is calculated;
B. Then, according to their status in the original database, the itemsets are divided into high
transaction-weighted utility, pre-large and low transaction-weighted utility itemsets in order to construct the
PreHU-tree;
C. Finally, the n-itemsets are determined directly by searching the transaction-weighted utility and the prefix
itemset linked list of each node of the PreHU-tree;
D. The high-utility itemsets of the changed database are mined by combining the itemset supports in the prefix
itemset linked lists with the external utilities of the items.
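Step A above, computing the change in transaction-weighted utility caused by changed transactions, might look like the following sketch. The data, the function names and the use of a sign argument to distinguish inserted from deleted transactions are assumptions made for illustration; the PreHU-tree itself is not modeled here.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical unit profits.
profit: Dict[str, int] = {"a": 5, "b": 2, "c": 1}


def twu_delta(changed: List[Dict[str, int]],
              profit: Dict[str, int],
              sign: int = 1) -> Counter:
    """TWU change per item caused by a set of changed transactions.

    sign=+1 models inserted transactions, sign=-1 models deleted ones.
    """
    delta: Counter = Counter()
    for t in changed:
        tu = sum(q * profit[i] for i, q in t.items())  # utility of this transaction
        for item in t:
            delta[item] += sign * tu
    return delta


def apply_delta(stored: Dict[str, int], delta: Counter) -> Dict[str, int]:
    """Return a new TWU table with the change applied; unseen items start at 0."""
    updated = dict(stored)
    for item, d in delta.items():
        updated[item] = updated.get(item, 0) + d
    return updated


stored_twu = {"a": 30, "b": 19, "c": 21}          # hypothetical stored values
after_insert = apply_delta(stored_twu, twu_delta([{"b": 3, "c": 1}], profit))
after_delete = apply_delta(stored_twu, twu_delta([{"a": 1, "b": 2}], profit, sign=-1))
```

Only the changed transactions are scanned: the inserted transaction has utility 7, so the TWU of b and c each rise by 7, while the deleted transaction has utility 9, so the TWU of a and b each fall by 9.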
Preferably, the redundant utility reduction algorithm is as follows:
A. A conditional pattern base is built for each item in the header table of a global HUS tree. Each partition of
the search space excludes the information of the items that follow it in the header table, so when candidate
patterns are produced from a conditional pattern base, the utility information of the following items need not
be included;
B. Suppose S = {i_1 < i_2 < ... < i_m} is the current ordering, where i_1 and i_m are the top and bottom items of
the header table of the global tree. Suppose an item i_p is selected from the header table for mining and its
conditional pattern base is built. The conditional pattern base contains only the preceding items of the
ordering, i_1, i_2, ..., i_{p-1}, so the utilities of the following items need not be added.
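The effect of leaving out the utilities of the following items can be sketched directly on transactions, without building a tree. This is an illustrative reading of the reduction, not the patent's tree-based procedure; the data, the item ordering and the function name are hypothetical.

```python
from typing import Dict, List

# Hypothetical toy data (item -> quantity), unit profits, and header-table order.
transactions: List[Dict[str, int]] = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
]
profit: Dict[str, int] = {"a": 5, "b": 2, "c": 1}
order: List[str] = ["a", "c", "b"]  # i_1 ... i_m, top to bottom of the header table


def reduced_upper_bound(item: str,
                        order: List[str],
                        transactions: List[Dict[str, int]],
                        profit: Dict[str, int]) -> int:
    """Upper bound for patterns mined from the conditional pattern base of `item`.

    Every such pattern consists of `item` plus items that precede it in
    `order`, so in each transaction only those items can contribute utility;
    the utilities of the items that follow `item` are left out.
    """
    rank = {x: r for r, x in enumerate(order)}
    bound = 0
    for t in transactions:
        if item in t:
            bound += sum(q * profit[i] for i, q in t.items()
                         if rank[i] <= rank[item])
    return bound
```

For item c the bound is 11 + 8 = 19, compared with a transaction-weighted utility of 21, because the utility of b in the third transaction is excluded. A tighter bound means fewer candidates survive a given threshold.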
Technical effects and advantages of the invention: in the data stream high-utility itemset mining algorithm that
reduces candidate itemsets provided by the invention, a global tree is first built by a single scan of the
current window of the data stream, and the redundant utility values of the header table entries and nodes in the
global tree are reduced; then candidate patterns are generated from the global tree, and the candidate itemset
utilities of the local trees are reduced by a growth-based algorithm; finally, the high-utility patterns are
selected from the candidate patterns. Experimental results on real data streams indicate that the time and space
efficiency and the memory footprint of the invention are better than those of other data stream high-utility
pattern mining algorithms.
Detailed description of the embodiments
To make the purpose, technical scheme and advantages of the invention clearer, the invention is described in
further detail below with reference to specific embodiments. It should be understood that the specific
embodiments described here merely illustrate the invention and do not limit it. All other embodiments obtained
by those of ordinary skill in the art on the basis of the embodiments of the invention, without creative work,
fall within the scope of protection of the invention.
A data stream high-utility itemset mining algorithm that reduces candidate itemsets, comprising the following steps:
S1. First, a global tree is built by a single scan of the current window of the data stream, and the redundant
utility values of the header table entries and nodes in the global tree are reduced;
S2. Then, candidate patterns are generated from the global tree, and the candidate itemset utilities of the
local trees are reduced by a growth-based algorithm;
S3. For the candidate itemset utilities, following the order of the transaction set, the transaction-weighted
utility of item i_j in the k-th transaction is added in turn to give the total transaction-weighted utility of
node i_j; at the same time, the prefix of item i_j is added to the prefix itemset linked list of node i_j;
pre-large itemsets are processed by adding the pre-large items to the tree;
S4. Then, by introducing an upper transaction utility threshold and a lower transaction utility threshold, the
transaction-weighted utility range is divided into three tiers, which are processed tier by tier in the original
transaction set and the newly added transaction set, with HTWU_D storing the high transaction-weighted utility
itemsets of the data set and PTWU_D storing the pre-large itemsets of the data set;
S5. Finally, actual utilities are calculated to determine the final high-utility itemsets.
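The "current window" of step S1 can be pictured with a small sliding-window sketch that keeps per-item transaction-weighted utility up to date as batches arrive and expire. The batch-based window, the class name and the data are assumptions made for illustration; the patent does not specify how the window is represented, and no tree is built here.

```python
from collections import Counter, deque
from typing import Deque, Dict, List


class SlidingWindowTWU:
    """Maintain per-item TWU over the most recent `window_size` batches (hypothetical helper)."""

    def __init__(self, window_size: int, profit: Dict[str, int]) -> None:
        self.window_size = window_size
        self.profit = profit
        self.window: Deque[Counter] = deque()  # per-batch TWU contributions
        self.twu: Counter = Counter()          # TWU over the current window

    def _batch_twu(self, batch: List[Dict[str, int]]) -> Counter:
        contribution: Counter = Counter()
        for t in batch:
            tu = sum(q * self.profit[i] for i, q in t.items())
            for item in t:
                contribution[item] += tu
        return contribution

    def add_batch(self, batch: List[Dict[str, int]]) -> None:
        """Scan the new batch once, then expire the oldest batch if the window is full."""
        contribution = self._batch_twu(batch)
        self.window.append(contribution)
        self.twu.update(contribution)
        if len(self.window) > self.window_size:
            self.twu.subtract(self.window.popleft())
            self.twu = +self.twu  # drop items whose TWU fell to zero


w = SlidingWindowTWU(window_size=2, profit={"a": 5, "b": 2, "c": 1})
w.add_batch([{"a": 1, "b": 2}])
w.add_batch([{"a": 2, "c": 1}])
w.add_batch([{"a": 1, "b": 1, "c": 3}])  # the first batch now leaves the window
```

Each batch is scanned only once; expiring a batch subtracts its stored contribution rather than rescanning its transactions.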
Specifically, the global tree is built as follows:
A. First, the change in transaction-weighted utility of each itemset in the changed transactions is calculated;
B. Then, according to their status in the original database, the itemsets are divided into high
transaction-weighted utility, pre-large and low transaction-weighted utility itemsets in order to construct the
PreHU-tree;
C. Finally, the n-itemsets are determined directly by searching the transaction-weighted utility and the prefix
itemset linked list of each node of the PreHU-tree;
D. The high-utility itemsets of the changed database are mined by combining the itemset supports in the prefix
itemset linked lists with the external utilities of the items.
Specifically, the redundant utility reduction algorithm is as follows:
A. A conditional pattern base is built for each item in the header table of a global HUS tree. Each partition of
the search space excludes the information of the items that follow it in the header table, so when candidate
patterns are produced from a conditional pattern base, the utility information of the following items need not
be included;
B. Suppose S = {i_1 < i_2 < ... < i_m} is the current ordering, where i_1 and i_m are the top and bottom items of
the header table of the global tree. Suppose an item i_p is selected from the header table for mining and its
conditional pattern base is built. The conditional pattern base contains only the preceding items of the
ordering, i_1, i_2, ..., i_{p-1}, so the utilities of the following items need not be added.
In summary: in the data stream high-utility itemset mining algorithm that reduces candidate itemsets provided by
the invention, a global tree is first built by a single scan of the current window of the data stream, and the
redundant utility values of the header table entries and nodes in the global tree are reduced; then candidate
patterns are generated from the global tree, and the candidate itemset utilities of the local trees are reduced
by a growth-based algorithm; finally, the high-utility patterns are selected from the candidate patterns.
Experimental results on real data streams indicate that the time and space efficiency and the memory footprint
of the invention are better than those of other data stream high-utility pattern mining algorithms.
Finally, it should be noted that the foregoing describes only preferred embodiments of the invention and is not
intended to limit the invention. Although the invention has been described in detail with reference to the
foregoing embodiments, a person skilled in the art may still modify the technical schemes described in those
embodiments or replace some of their technical features with equivalents. Any modification, equivalent
substitution or improvement made within the spirit and principles of the invention shall fall within the scope
of protection of the invention.
Claims (3)
1. A data stream high-utility itemset mining algorithm that reduces candidate itemsets, characterized by comprising the following steps:
S1. First, a global tree is built by a single scan of the current window of the data stream, and the redundant
utility values of the header table entries and nodes in the global tree are reduced;
S2. Then, candidate patterns are generated from the global tree, and the candidate itemset utilities of the
local trees are reduced by a growth-based algorithm;
S3. For the candidate itemset utilities, following the order of the transaction set, the transaction-weighted
utility of item i_j in the k-th transaction is added in turn to give the total transaction-weighted utility of
node i_j; at the same time, the prefix of item i_j is added to the prefix itemset linked list of node i_j;
pre-large itemsets are processed by adding the pre-large items to the tree;
S4. Then, by introducing an upper transaction utility threshold and a lower transaction utility threshold, the
transaction-weighted utility range is divided into three tiers, which are processed tier by tier in the original
transaction set and the newly added transaction set, with HTWU_D storing the high transaction-weighted utility
itemsets of the data set and PTWU_D storing the pre-large itemsets of the data set;
S5. Finally, actual utilities are calculated to determine the final high-utility itemsets.
2. The data stream high-utility itemset mining algorithm that reduces candidate itemsets according to claim 1,
characterized in that the global tree is built as follows:
A. First, the change in transaction-weighted utility of each itemset in the changed transactions is calculated;
B. Then, according to their status in the original database, the itemsets are divided into high
transaction-weighted utility, pre-large and low transaction-weighted utility itemsets in order to construct the
PreHU-tree;
C. Finally, the n-itemsets are determined directly by searching the transaction-weighted utility and the prefix itemset linked list of each node of the PreHU-tree;
D. The high-utility itemsets of the changed database are mined by combining the itemset supports in the prefix itemset linked lists with the external utilities of the items.
3. The data stream high-utility itemset mining algorithm that reduces candidate itemsets according to claim 1,
characterized in that the redundant utility reduction algorithm is as follows:
A. A conditional pattern base is built for each item in the header table of a global HUS tree. Each partition of
the search space excludes the information of the items that follow it in the header table, so when candidate
patterns are produced from a conditional pattern base, the utility information of the following items need not
be included;
B. Suppose S = {i_1 < i_2 < ... < i_m} is the current ordering, where i_1 and i_m are the top and bottom items of
the header table of the global tree. Suppose an item i_p is selected from the header table for mining and its
conditional pattern base is built. The conditional pattern base contains only the preceding items of the
ordering, i_1, i_2, ..., i_{p-1}, so the utilities of the following items need not be added.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611202991.7A CN106777182A (en) | 2016-12-23 | 2016-12-23 | A data stream high-utility itemset mining algorithm that reduces candidate itemsets |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611202991.7A CN106777182A (en) | 2016-12-23 | 2016-12-23 | A data stream high-utility itemset mining algorithm that reduces candidate itemsets |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106777182A true CN106777182A (en) | 2017-05-31 |
Family
ID=58897578
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611202991.7A Pending CN106777182A (en) | 2016-12-23 | 2016-12-23 | A data stream high-utility itemset mining algorithm that reduces candidate itemsets |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106777182A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766461A (en) * | 2017-09-28 | 2018-03-06 | 新乡学院 | A kind of campus Student Information Management System |
CN109101530A (en) * | 2018-06-22 | 2018-12-28 | 哈尔滨工业大学(深圳) | Effective sequence of events pattern mining algorithm |
CN109408563A (en) * | 2018-11-07 | 2019-03-01 | 哈尔滨工业大学(深圳) | High average utility item set mining method, apparatus and computer equipment |
CN110413660A (en) * | 2019-07-26 | 2019-11-05 | 哈尔滨工业大学(深圳) | Excavate the method, apparatus and computer readable storage medium of global effective item collection |
CN110471960A (en) * | 2019-08-21 | 2019-11-19 | 桂林电子科技大学 | A kind of effective item set mining method containing disutility |
CN112801793A (en) * | 2021-01-31 | 2021-05-14 | 哈尔滨工业大学(威海) | Method for mining high-profit commodities in e-commerce transaction data |
CN113792099A (en) * | 2021-08-12 | 2021-12-14 | 上海熙业信息科技有限公司 | Data flow high-utility item set mining system based on historical effective table pruning |
CN115964415A (en) * | 2023-03-16 | 2023-04-14 | 山东科技大学 | Pre-HUSPM-based database sequence insertion processing method |
-
2016
- 2016-12-23 CN CN201611202991.7A patent/CN106777182A/en active Pending
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107766461A (en) * | 2017-09-28 | 2018-03-06 | 新乡学院 | A kind of campus Student Information Management System |
CN109101530A (en) * | 2018-06-22 | 2018-12-28 | 哈尔滨工业大学(深圳) | Effective sequence of events pattern mining algorithm |
CN109101530B (en) * | 2018-06-22 | 2021-09-21 | 哈尔滨工业大学(深圳) | High-utility event sequence pattern mining method |
CN109408563B (en) * | 2018-11-07 | 2021-06-22 | 哈尔滨工业大学(深圳) | High average utility item set mining method and device and computer equipment |
CN109408563A (en) * | 2018-11-07 | 2019-03-01 | 哈尔滨工业大学(深圳) | High average utility item set mining method, apparatus and computer equipment |
CN110413660A (en) * | 2019-07-26 | 2019-11-05 | 哈尔滨工业大学(深圳) | Excavate the method, apparatus and computer readable storage medium of global effective item collection |
CN110413660B (en) * | 2019-07-26 | 2024-05-14 | 哈尔滨工业大学(深圳) | Method, apparatus and computer readable storage medium for mining global efficient item sets |
CN110471960A (en) * | 2019-08-21 | 2019-11-19 | 桂林电子科技大学 | A kind of effective item set mining method containing disutility |
CN110471960B (en) * | 2019-08-21 | 2022-04-05 | 桂林电子科技大学 | High-utility item set mining method containing negative utility |
CN112801793A (en) * | 2021-01-31 | 2021-05-14 | 哈尔滨工业大学(威海) | Method for mining high-profit commodities in e-commerce transaction data |
CN112801793B (en) * | 2021-01-31 | 2022-04-15 | 哈尔滨工业大学(威海) | Method for mining high-profit commodities in e-commerce transaction data |
CN113792099A (en) * | 2021-08-12 | 2021-12-14 | 上海熙业信息科技有限公司 | Data flow high-utility item set mining system based on historical effective table pruning |
CN113792099B (en) * | 2021-08-12 | 2023-08-25 | 上海熙业信息科技有限公司 | Data flow high-utility item set mining system based on historical utility table pruning |
CN115964415A (en) * | 2023-03-16 | 2023-04-14 | 山东科技大学 | Pre-HUSPM-based database sequence insertion processing method |
CN115964415B (en) * | 2023-03-16 | 2023-05-26 | 山东科技大学 | Pre-HUSPM-based database sequence insertion processing method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106777182A (en) | A data stream high-utility itemset mining algorithm that reduces candidate itemsets | |
Bergmeir et al. | Bagging exponential smoothing methods using STL decomposition and Box–Cox transformation | |
CN104182474B (en) | A kind of pre- recognition methods for being lost in user | |
Uddin et al. | Population age structure and savings rate impacts on economic growth: Evidence from Australia | |
Edeme et al. | Infrastructural development, sustainable agricultural output and employment in ECOWAS countries | |
Zhao et al. | Intuitionistic fuzzy set approach to multi-objective evolutionary clustering with multiple spatial information for image segmentation | |
CN105760443B (en) | Item recommendation system, project recommendation device and item recommendation method | |
De Queiroz et al. | Sharing cuts under aggregated forecasts when decomposing multi-stage stochastic programs | |
CN107895038A (en) | A kind of link prediction relation recommends method and device | |
Kang et al. | Forecast with forecasts: Diversity matters | |
CN104850577A (en) | Data flow maximal frequent item set mining method based on ordered composite tree structure | |
Liu et al. | A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge | |
CN109559156A (en) | Client's intention based on client properties and marketing data has monitoring forecast method | |
Mirhashemi et al. | Extracting association rules from changes in aquifer drawdown in irrigation areas of Qazvin plain, Iran | |
CN107578165A (en) | Marketing of bank management method and system based on brief algorithm in rough set | |
CN109754106A (en) | A kind of prediction technique of shopping center shop distribution planning | |
CN107220742A (en) | A kind of development of information system common support method analyzed based on system vulnerability and platform | |
CN108509531B (en) | Spark platform-based uncertain data set frequent item mining method | |
Meagher et al. | Applied general equilibrium modelling and labour market forecasting | |
Chaudhari et al. | Advance privacy preserving in association rule mining | |
Khalili Araghi et al. | Effective Factors on the Growth of Provinces of Iran: A Spatial Panel Approach | |
Shangguan et al. | Enhancing data quality and real-time sharing performance in water informatics through decision tree mining algorithm | |
Billa et al. | Efficient frequent pattern mining algorithm based on node sets in cloud computing environment | |
Singh et al. | Education and unemployment in rural and urban Kerala | |
Surawase et al. | High Utility Itemset Mining From Transaction Database Using Up-Growth And Up-Growth+ Algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170531 |