CN104834733A - Big data mining and analyzing method - Google Patents

Publication number
CN104834733A
Authority
CN
China
Prior art keywords
item sets
value
candidate
data
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510254391.4A
Other languages
Chinese (zh)
Inventor
高爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Boyuan Technology Co Ltd
Original Assignee
Chengdu Boyuan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Boyuan Technology Co Ltd filed Critical Chengdu Boyuan Technology Co Ltd
Priority to CN201510254391.4A priority Critical patent/CN104834733A/en
Publication of CN104834733A publication Critical patent/CN104834733A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems

Abstract

The invention provides a big data mining and analysis method. The method first finds, by iteration, all itemsets in a database whose support count is not lower than a user-set threshold; the itemsets so obtained are then used to construct rules that meet the minimum confidence. The association rule generation process is implemented with MapReduce and ported to a cloud computing platform, where it is used for cloud-based data analysis and processing. The method improves the execution efficiency of such analysis and processing, and the effect is particularly pronounced on large datasets.

Description

A big data mining and analysis method
Technical field
The present invention relates to big data processing, and in particular to a big data mining and analysis method.
Background art
Cloud computing provides cheap, scalable, distributed and dynamic computing power over the Internet. It makes it possible to connect things to things and networks to networks. As large numbers of information-sensing devices communicate with one another, they generate massive amounts of data, so extracting useful information from this data quickly and effectively is vital. The shortcoming of traditional data processing methods is that they scan the whole database many times during execution and produce huge candidate sets, wasting both time and space.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes a big data mining and analysis method, comprising: first finding, by iterative search, all itemsets in the database whose support count is not less than a user-set threshold (the frequent itemsets); then using the itemsets so obtained to construct rules that meet the minimum confidence; and implementing the association rule generation process with MapReduce and porting it to a cloud computing platform for cloud-based data analysis and processing.
Preferably, the association rule generation process further comprises:
(1) the transaction database D is divided horizontally and evenly into n subsets, using a 16 MB block of data as the unit of distribution, and the subsets are sent to m worker nodes;
(2) the accumulated support count of a candidate itemset X is denoted cs(X), and the initial value of each cs(X) is set to 1; each worker node scans the subset assigned to it and produces a set, denoted CP, containing the candidate 1-itemsets through the candidate K-itemsets;
(3) a partition function is defined; the candidate 1-itemsets through candidate K-itemsets generated by the m worker nodes are divided into r different partitions and sent, together with their cs values, to r nodes; each node accumulates the cs values of identical itemsets to obtain the final cs of each itemset, compares it with the preset minimum support count Smin, deletes the itemsets whose support count is less than Smin, and thereby determines a local frequent itemset collection $L_p$;
(4) the results $L_p$ of all r nodes are merged to generate the global frequent itemset collection L;
(5) the frequent itemsets are traversed according to the set minimum confidence cm to obtain the strong association rules, and the process ends.
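Step (5) can be illustrated with a minimal Python sketch. It assumes the collection L is available as a dictionary from itemsets to support counts and that it contains every non-empty subset of each itemset; the function name `strong_rules` and the sample data are hypothetical and are not taken from the patent.

```python
from itertools import combinations

def strong_rules(support, cm):
    """Return (antecedent, consequent, confidence) for rules with confidence >= cm.

    support: dict mapping frozenset itemsets to support counts. It is assumed
    to contain every non-empty subset of each itemset it holds.
    """
    rules = []
    for itemset, count in support.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                confidence = count / support[a]  # conf(A => B) = cs(A u B) / cs(A)
                if confidence >= cm:
                    rules.append((a, itemset - a, confidence))
    return rules

# Hypothetical frequent itemsets with their support counts.
L = {
    frozenset({"bread"}): 4,
    frozenset({"milk"}): 3,
    frozenset({"bread", "milk"}): 3,
}
rules = strong_rules(L, cm=0.8)
```

With these counts, the rule milk => bread has confidence 3/3 and is kept, while bread => milk has confidence 3/4 and is dropped.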
Preferably, implementing the association rule generation process with MapReduce further comprises:
(1) MapReduce divides the transaction database D horizontally into n blocks, the size of each block being set by a parameter; the n data subsets are sent to m nodes that execute Map tasks; the master program is responsible for scheduling and assigns the tasks to workers in the idle list;
(2) the n data subsets are formatted to produce (ID, Value) key-value pairs, where ID is a transaction ID in D and Value is the list of values corresponding to that transaction ID;
(3) the Map operation scans each input (ID, Value) key-value pair and generates the set CP of local candidate 1-itemsets through candidate k-itemsets; the initial cs value of each candidate itemset is set to 1, and the Map operation outputs intermediate (Itemset, 1) key-value pairs, where Itemset is a candidate itemset in CP;
(4) an optional partition function is added on each worker that executes the Map function; it merges the intermediate results produced by the Map operation and outputs intermediate key-value pairs (Itemset, s), where s is the accumulated cs value of Itemset within the data subset; the following hash function is then applied:
$$\mathrm{hash}(m_1, m_2, m_3, \ldots, m_k) = \sum_{j=1}^{k} 10^{k-j} m_j \bmod r$$
where $m_1, m_2, m_3, \ldots, m_k$ are the indices, arranged in ascending order, that the items of the k-itemset have in the item set of D, and r is the number of partitions; the (Itemset, s) pairs produced by the partition function are divided into r partitions, and the master program assigns each partition to the corresponding Reduce function;
(5) the Reduce node reads the (Itemset, s) key-value pairs submitted by the partition function, sorts and merges them into (Itemset, list(s)), and then performs the Reduce operation to obtain the actual accumulated support count of each candidate itemset in D; all candidate itemsets whose count is greater than or equal to the minimum support count Smin are retained and form the local frequent itemset collection $L_p$; the itemsets output by the Reduce functions of the r partitions are merged to obtain the final frequent itemset collection L;
(6) after all Map and Reduce operations have completed, the master program wakes up the user program, and the MapReduce call returns to its call site.
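The hash function in step (4) can be checked with a small worked example. The sketch below assumes that the modulo applies to the whole sum, which keeps the result in the range 0 to r-1; the name `partition_hash` is hypothetical.

```python
def partition_hash(indices, r):
    """hash(m_1..m_k) = (sum over j of 10^(k-j) * m_j) mod r.

    indices: item indices of one k-itemset; r: number of partitions.
    Taking the whole sum modulo r is an assumption about the formula.
    """
    ordered = sorted(indices)  # the text requires ascending order
    k = len(ordered)
    total = sum(10 ** (k - j) * m for j, m in enumerate(ordered, start=1))
    return total % r

# Items with indices 3, 1 and 2 give 1*100 + 2*10 + 3 = 123, and 123 mod 4 = 3.
example = partition_hash((3, 1, 2), r=4)
```

For small indices the sum simply reads the sorted indices as decimal digits, so identical itemsets always land in the same partition regardless of the worker that produced them.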
Compared with the prior art, the present invention has the following advantage:
The method of the present invention improves the execution efficiency of cloud-based data analysis and processing, and the effect is particularly pronounced on large datasets.
Brief description of the drawings
Fig. 1 is a flowchart of the big data mining and analysis method according to an embodiment of the present invention.
Detailed description of the embodiments
A detailed description of one or more embodiments of the present invention is provided below together with the accompanying drawing that illustrates the principle of the invention. The invention is described in connection with these embodiments, but it is not limited to any embodiment. The scope of the invention is defined only by the claims, and the invention covers many alternatives, modifications and equivalents. Many specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided for exemplary purposes, and the invention may be practiced according to the claims without some or all of them.
One aspect of the present invention provides a big data mining and analysis method. Fig. 1 is a flowchart of the big data mining and analysis method according to an embodiment of the present invention. The cloud-based data analysis and processing system consists of a data storage module, a data analysis module and a transaction module. The invention uses the Hadoop platform as the computing environment, and the development tool is the MapReduce plug-in that ships with Hadoop. MapReduce is a distributed programming model for parallel computation over big data. Processing is divided into two main steps, Map and Reduce. The Map operation processes input key-value pairs of the form (key, value) and generates a set of intermediate key-value pairs. The Reduce operation merges the intermediate key-value pairs output by the Map operation.
The present invention improves the traditional association rule generation method on a cloud computing platform. It first finds, by iteration, all itemsets in the database whose support count is not less than a user-set threshold (the frequent itemsets), and then uses the itemsets so obtained to construct rules that meet the minimum confidence. With the corresponding improvements, the method needs only a single scan of the transaction database to generate all frequent itemsets. Using the distributed parallel computing capability of cloud computing, the improved association rule generation method is implemented with MapReduce and ported to a cloud computing platform, where it can be applied to cloud-based data analysis and processing.
The preferred association rule generation process of the present invention comprises:
(1) To obtain good load balancing, the transaction database D is divided horizontally and evenly into n subsets, using a 16 MB block of data as the unit of distribution, and the subsets are sent to m worker nodes.
(2) The accumulated support count of a candidate itemset X is denoted cs(X), and the initial value of each cs(X) is set to 1. Each worker node scans the subset assigned to it and produces a set, denoted CP, containing the candidate 1-itemsets through the candidate K-itemsets.
(3) A partition function is defined. The candidate 1-itemsets through candidate K-itemsets generated by the m worker nodes are divided into r different partitions and sent, together with their cs values, to r nodes. Each node accumulates the cs values of identical itemsets to obtain the final cs of each itemset, compares it with the preset minimum support count Smin, deletes the itemsets whose support count is less than Smin, and thereby determines a local frequent itemset collection $L_p$.
(4) The results of all nodes are merged to generate the global frequent itemset collection L.
(5) The frequent itemsets are traversed according to the set minimum confidence cm to obtain the strong association rules, and the process ends.
An association rule generation method improved along these lines needs to scan the transaction database only once to find all frequent itemsets.
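Steps (1) to (4) can be sketched on a single machine as follows. This is an illustration under stated assumptions rather than the patented implementation: the round-robin split stands in for the 16 MB block distribution, the workers are simulated by a loop, and the names `local_candidates` and `frequent_itemsets` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def local_candidates(subset, K):
    """Count every candidate 1..K-itemset seen in one worker's subset (the set CP)."""
    cp = Counter()
    for transaction in subset:
        items = sorted(set(transaction))
        for k in range(1, K + 1):
            for candidate in combinations(items, k):
                cp[candidate] += 1  # first sighting gives cs = 1
    return cp

def frequent_itemsets(D, n, K, smin):
    """Split D into n subsets, count locally, merge the counts, filter by smin."""
    subsets = [D[i::n] for i in range(n)]  # simple even horizontal split (assumption)
    total = Counter()
    for subset in subsets:  # each iteration stands in for one worker node
        total.update(local_candidates(subset, K))
    return {itemset: cs for itemset, cs in total.items() if cs >= smin}

# Hypothetical transaction database with five transactions.
D = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
L = frequent_itemsets(D, n=2, K=2, smin=3)
```

Each transaction is read exactly once, which is the single-scan property the text describes. The price is that every itemset of up to K items is enumerated per transaction, so K must stay small for wide transactions.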
In the present invention, data is converted into database files by the data storage module and saved in HDFS. The data analysis module uses the distributed parallel computing capability of the cloud computing platform to implement the improved association rule generation method with MapReduce and run it on the platform; the master program creates, manages and controls the tasks and, according to the user's request, dispatches the algorithm to the appropriate nodes for computation. In the transaction module, the master program schedules all data analysis tasks and returns the final result to the user. The MapReduce programming model exposes only the relevant interfaces to the upper-layer modules and hides the tedious low-level implementation details, which reduces programming difficulty.
The improved association rule generation method described above can be implemented with the MapReduce programming model. The concrete procedure is as follows:
(1) MapReduce divides the transaction database D horizontally into n blocks. The size of each block is set by a parameter; in the present invention it is set to 16 MB. The n data subsets are sent to m nodes that execute Map tasks. The master program is responsible for scheduling and assigns the tasks to workers in the idle list.
(2) The n data subsets are formatted to produce (ID, Value) pairs, where ID is a transaction ID in D and Value is the list of values corresponding to that transaction ID.
(3) The Map function scans each input (ID, Value) pair and generates the set CP of local candidate 1-itemsets through candidate k-itemsets. The initial cs value of each candidate itemset is set to 1. The Map function outputs intermediate (Itemset, 1) key-value pairs, where Itemset is a candidate itemset in CP.
(4) First, an optional partition function is added on each worker that executes the Map function. It merges the intermediate results produced by the Map function and outputs intermediate key-value pairs (Itemset, s), where s is the accumulated cs value of Itemset within the data subset. The following hash function is then applied:
$$\mathrm{hash}(m_1, m_2, m_3, \ldots, m_k) = \sum_{j=1}^{k} 10^{k-j} m_j \bmod r$$
where $m_1, m_2, m_3, \ldots, m_k$ are the indices, arranged in ascending order, that the items of the k-itemset have in the item set of D, and r is the number of partitions. The (Itemset, s) pairs produced by the partition function are divided into r partitions, and the master program assigns each partition to the corresponding Reduce function.
(5) The Reduce node reads the (Itemset, s) key-value pairs submitted by the partition function, sorts and merges them into (Itemset, list(s)), and then performs the Reduce operation to obtain the actual accumulated support count of each candidate itemset in D. All candidate itemsets whose count is greater than or equal to the minimum support count Smin are retained and form the local frequent itemset collection $L_p$. The itemsets output by the Reduce functions of the r partitions are merged to obtain the final frequent itemset collection L.
(6) After all Map and Reduce operations have completed, the master program wakes up the user program, and the MapReduce call returns to its call site.
Implemented with the MapReduce programming model, the improved association rule generation method needs to scan the transaction database only once to obtain the complete frequent itemset collection L, which speeds up parallel processing and greatly improves execution efficiency.
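The six steps above can be simulated in plain Python without a Hadoop cluster. The sketch below is an illustration under several assumptions: the local merge that the text attributes to the optional partition function is modeled as a separate `combine` step (in Hadoop terms this role is usually played by a combiner), the modulo is applied to the whole sum, and all function names, the `item_index` mapping and the sample blocks are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def map_phase(records, K):
    """Emit (itemset, 1) for each candidate 1..K-itemset of each (ID, Value) pair."""
    for _tid, items in records:
        ordered = sorted(set(items))
        for k in range(1, K + 1):
            for itemset in combinations(ordered, k):
                yield itemset, 1

def combine(pairs):
    """Local merge on the map worker: many (itemset, 1) become one (itemset, s)."""
    local = Counter()
    for itemset, one in pairs:
        local[itemset] += one
    return local

def partition(itemset, item_index, r):
    """Hash from the text, taking the whole sum modulo r (an assumption)."""
    ms = sorted(item_index[i] for i in itemset)
    k = len(ms)
    return sum(10 ** (k - j) * m for j, m in enumerate(ms, 1)) % r

def reduce_phase(values_by_itemset, smin):
    """Sum list(s) per itemset and keep those reaching smin (one local L_p)."""
    return {i: sum(v) for i, v in values_by_itemset.items() if sum(v) >= smin}

def mine(blocks, K, r, smin, item_index):
    shuffled = [defaultdict(list) for _ in range(r)]
    for block in blocks:  # one simulated map task per block
        for itemset, s in combine(map_phase(block, K)).items():
            shuffled[partition(itemset, item_index, r)][itemset].append(s)
    L = {}
    for part in shuffled:  # one simulated reduce task per partition
        L.update(reduce_phase(part, smin))
    return L

item_index = {"a": 1, "b": 2, "c": 3}  # index of each item in the item set of D
blocks = [
    [(1, ["a", "b", "c"]), (2, ["a", "b"])],
    [(3, ["a", "c"]), (4, ["b", "c"]), (5, ["a", "b", "c"])],
]
L = mine(blocks, K=2, r=2, smin=4, item_index=item_index)
```

Because the partition depends only on the itemset, all partial counts of one itemset reach the same reduce task, so each local result $L_p$ is already final for its itemsets and the merge in step (5) is a plain union.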
According to another aspect of the present invention, a method for analyzing association rules in big data is provided. The core of association rule analysis is obtaining the frequent itemsets by counting items. However, as the volume of big data keeps growing, reaching the TB or even PB level, traditional single-node serial algorithms can no longer cope with the sharp increase in data volume; moreover, as the dataset grows dynamically, the association rules hidden in it change as well.
The present invention addresses the low efficiency of serial association rule analysis and the problem of re-analyzing big data after updates, and proposes a cloud-based association rule update algorithm. (1) An association rule update method for a single-node environment is proposed, which effectively solves small-scale incremental association rule analysis. (2) By designing a pair of Map and Reduce functions, the association rule update method is parallelized, yielding the cloud-based association rule update algorithm. A cloud computing framework for association rule updating is also proposed, which can be extended to analysis applications for other data types.
Cloud computing technology is closely related to big data processing, and using cloud computing for large-scale tree data analysis is a direction with development potential. In terms of storage, the ability of a cloud computing platform to store and maintain tree data is beyond that of a traditional database. Massive tree data may reach hundreds of GB or even the TB level, and storing it in a traditional database makes system maintenance expensive. A cloud computing platform instead provides a distributed storage module that pools the storage and computing capacity of many commodity computers and gives big data sufficient space. The cloud environment also provides data backup, concurrency control, consistency maintenance and reliability strategies, which give big data a reliable guarantee. In terms of processing, a cloud computing platform provides distributed processing, which allows the analysis to run in parallel and significantly improves big data analysis capability.
In terms of flexibility and scalability, cloud computing platforms perform well and are well suited to processing massive tree data whose volume varies widely. A cloud computing platform can add nodes to an existing cloud to increase computing resources and storage capacity.
Implementations of the MapReduce model mainly include Hadoop and the HOP system, and the present invention uses the MapReduce model to process massive data. The stages of executing a MapReduce job on the Hadoop platform are as follows:
(1) Input files: the MapReduce library splits the large input data file into several independent pieces and makes copies of the program and data on different machines.
(2) Assign tasks: the master program node in MapReduce assigns the subtasks and submits them to idle worker nodes.
(3) Generate key-value pairs: a worker node that has been assigned a subtask reads the input file, parses key/value pairs from it, calls the user-written Map function to process them, and generates intermediate key-value pairs.
(4) Send messages: the partition function divides the intermediate data into several regions; the location of each region on disk is sent to the master program, which forwards it to the Reduce subtask nodes.
(5) Fetch intermediate data: after receiving the location information forwarded by the master program, a Reduce subtask node reads the intermediate data from disk accordingly and sorts it by intermediate key; entries with the same key are merged.
(6) Execute the Reduce function: the Reduce subtask node traverses the sorted intermediate data and passes it to the user-defined Reduce function. The result is written to the final output file.
(7) Output results: after all Reduce subtasks have completed, the master program node returns all the data to the user program, which aggregates the data and outputs the final result.
The workflow of a MapReduce algorithm on the Hadoop platform is simple: the design only needs to consider the task allocation strategy and the Map and Reduce function pair, while the other complexities of parallel computing, such as scheduling, fault tolerance, distributed storage and network communication, are left to the Hadoop platform. The present invention therefore designs an association rule update algorithm based on the Hadoop platform to improve the efficiency of update analysis on big data.
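The division of labor described above, where the user supplies only a Map and a Reduce function and the framework handles the rest, can be sketched with a toy in-process driver. This is not the Hadoop API: `run_mapreduce` and its parameters are hypothetical, and scheduling, disk storage and fault tolerance are omitted.

```python
from collections import defaultdict

def run_mapreduce(splits, map_fn, reduce_fn, num_reducers):
    """Toy driver covering stages (3) to (7) of the workflow in one process."""
    # (3) each map task parses its split and emits intermediate pairs;
    # (4) a partition function assigns every intermediate key to a region.
    regions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for record in split:
            for key, value in map_fn(record):
                regions[hash(key) % num_reducers][key].append(value)
    # (5) each reduce task sorts its region by key; equal keys are already merged;
    # (6) the user-defined reduce function is applied once per key.
    output = {}
    for region in regions:
        for key in sorted(region):
            output[key] = reduce_fn(key, region[key])
    # (7) the collected results are returned to the user program.
    return output

# User-supplied pair of functions: count how often each item occurs.
splits = [["a b", "b c"], ["a b c"]]
counts = run_mapreduce(
    splits,
    map_fn=lambda line: [(item, 1) for item in line.split()],
    reduce_fn=lambda key, values: sum(values),
    num_reducers=2,
)
```

Python's built-in `hash` for strings varies between runs, so the assignment of keys to regions may differ from run to run, but the final counts do not depend on it.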
To improve the execution efficiency of the algorithm, the candidate k-itemsets can be pruned using the property that every non-empty subset of a frequent itemset is also frequent. When the dataset is updated, however, the traditional association rule generation method can no longer meet the new demand: it can only rescan the database and analyze the itemsets again, which greatly increases the analysis time and consumes system resources. The present invention therefore first proposes an association rule update method for a single computing node. The algorithm is described as follows:
(1) Obtain the original database TDB, its frequent k-itemsets $L_k$, the newly added database tdb and the minimum support s. For every $X \in L_k$, scan the newly added dataset tdb to obtain the support count of X in TDB ∪ tdb, denoted s(TDB ∪ tdb). If s(TDB ∪ tdb) < s × (|TDB| + |tdb|), where |TDB| and |tdb| are the numbers of transactions, delete X from $L_k$.
(2) Find all candidate k-itemsets $C_k$ in tdb. For every $X \in C_k$, scan tdb and compute the support count of each candidate itemset. If the support count is less than s × |tdb|, remove X from $C_k$, which yields a further reduced candidate set $C'_k$.
(3) Scan the original database TDB, update the support counts of all candidate itemsets in $C'_k$, and find the itemsets that are newly frequent in TDB ∪ tdb. These new itemsets, together with the updated $L_k$ above, form the frequent itemsets $L_k^*$ of the new database.
During execution of the association rule update method, each iteration needs to scan the whole database only once. Newly produced itemsets are first pruned according to the support count of the candidates in the newly added database tdb and only then checked against the whole database. This greatly reduces the number of database scans, so when updates occur the method executes more efficiently than rerunning the association rule generation method.
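The three steps can be sketched for a fixed k as follows. The sketch assumes that the old frequent itemsets are stored together with their support counts in TDB, that s is a ratio, and that the sizes in the thresholds are transaction counts; the names `count_k` and `update_frequent` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def count_k(db, k, only=None):
    """Support counts of k-itemsets in db, optionally restricted to the set `only`."""
    counts = Counter()
    for t in db:
        for c in combinations(sorted(set(t)), k):
            if only is None or c in only:
                counts[c] += 1
    return counts

def update_frequent(TDB, tdb, old_Lk, k, s):
    """old_Lk maps the frequent k-itemsets of TDB to their support counts in TDB."""
    need = s * (len(TDB) + len(tdb))
    inc = count_k(tdb, k)  # a single scan of the increment tdb
    # Step 1: old frequent itemsets survive only if still frequent in TDB u tdb.
    kept = {x: c + inc[x] for x, c in old_Lk.items() if c + inc[x] >= need}
    # Step 2: new candidates must at least be frequent within tdb itself.
    pruned = {x for x, c in inc.items() if x not in old_Lk and c >= s * len(tdb)}
    # Step 3: a single scan of TDB completes the counts of the pruned candidates.
    old = count_k(TDB, k, only=pruned)
    new = {x: old[x] + inc[x] for x in pruned if old[x] + inc[x] >= need}
    return {**kept, **new}

TDB = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c"]]
tdb = [["a", "c"], ["a", "c"]]
old_L2 = {("a", "b"): 2}  # frequent 2-itemsets of TDB at s = 0.5
new_L2 = update_frequent(TDB, tdb, old_L2, k=2, s=0.5)
```

In this example (a, b) drops out because its count of 2 falls below 0.5 × 6 = 3, while (a, c) becomes frequent with one occurrence in TDB and two in tdb. The pruning in step 2 is safe because an itemset that is infrequent in both TDB and tdb cannot be frequent in their union.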
However, when the database or the update is large, the association rule update method becomes less efficient because the amount of computation increases sharply. An association rule update algorithm based on cloud computing is therefore designed to solve the big data analysis problem. When the dataset is updated and the data volume exceeds a predefined threshold, the cloud computing platform uses the MapReduce model to run the association rule update in parallel on multiple distributed nodes; otherwise the update is performed on a single node.
A master program is designed for the cloud-based association rule update method. The master program first mines the newly added database tdb to obtain all of its frequent itemsets L(tdb). The original frequent itemsets L(TDB) are compared with L(tdb); their common part is placed in the final frequent itemset collection $L^*$, and the remaining itemsets of L(TDB) and L(tdb) are denoted $C_r$. The MapReduce operations are then carried out as follows:
Map operation: scan the original database and the newly added database in parallel and, according to the original frequent itemsets and $C_r$, format the data into key-value pairs $\langle T_{num}, L_k \rangle$; pass all key-value pairs to the Reduce operation as intermediate data.
Reduce operation: scan the intermediate result set and sort the intermediate key-value pairs in ascending order. Scan the database in turn and check whether the condition $X \in L_k$ holds; if it does, delete the key-value pair. Otherwise traverse tdb and compute the support count of the candidate itemset in tdb; if s(TDB ∪ tdb) < s × (|TDB| + |tdb|), delete the itemset. Finally, traverse TDB + tdb, compute the support count of each itemset, and check whether it reaches the user-preset support count threshold. The k-itemsets of the new database consist of the itemsets remaining in the former $L_k$ together with the newly produced itemsets: $L_k^* = (L_k - L_{delete}) \cup L_{new}$.
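The master step and the subsequent recount can be sketched in a simplified form. The sketch below does not reproduce the key-value format of the text; it only shows the idea of accepting the common itemsets directly and recounting the remainder $C_r$ over blocks of both databases. All names and the sample data are hypothetical, and the map tasks are simulated by a list comprehension.

```python
from collections import Counter

def split_common(L_TDB, L_tdb):
    """Master step: common itemsets go straight to L*, the remainder forms C_r."""
    common = L_TDB & L_tdb
    return common, (L_TDB | L_tdb) - common

def map_count(block, candidates):
    """Map: count each candidate contained in the transactions of one block."""
    counts = Counter()
    for t in block:
        items = set(t)
        for c in candidates:
            if set(c) <= items:
                counts[c] += 1
    return counts

def reduce_filter(partials, threshold):
    """Reduce: add up the partial counts and keep candidates reaching the threshold."""
    total = Counter()
    for p in partials:
        total.update(p)
    return {c for c, n in total.items() if n >= threshold}

def parallel_update(blocks_TDB, blocks_tdb, L_TDB, L_tdb, s):
    L_star, C_r = split_common(L_TDB, L_tdb)
    blocks = blocks_TDB + blocks_tdb
    size = sum(len(b) for b in blocks)
    partials = [map_count(b, C_r) for b in blocks]  # independent, parallel in principle
    return L_star | reduce_filter(partials, s * size)

blocks_TDB = [[["a", "b"], ["a", "b"]], [["a", "c"], ["b", "c"]]]
blocks_tdb = [[["a", "c"], ["a", "c"]]]
L_TDB = {("a",), ("b",), ("c",), ("a", "b")}  # frequent in TDB at s = 0.5
L_tdb = {("a",), ("c",), ("a", "c")}          # frequent in tdb at s = 0.5
L_new = parallel_update(blocks_TDB, blocks_tdb, L_TDB, L_tdb, s=0.5)
```

The common part needs no recount: an itemset with at least s × |TDB| occurrences in TDB and at least s × |tdb| occurrences in tdb has at least s × (|TDB| + |tdb|) occurrences in the union.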
In summary, the method of the present invention improves the execution efficiency of cloud-based data analysis and processing, and the effect is particularly pronounced on large datasets.
Those skilled in the art will appreciate that the modules and steps of the present invention described above can be implemented with general-purpose computing systems. They can be concentrated on a single computing system or distributed over a network of multiple computing systems. Optionally, they can be implemented as program code executable by a computing system, so that they can be stored in a storage system and executed by the computing system. The present invention is therefore not restricted to any specific combination of hardware and software.
It should be understood that the above embodiments of the present invention serve only to illustrate or explain the principle of the invention and do not limit it. Any modification, equivalent replacement or improvement made without departing from the spirit and scope of the present invention shall therefore fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims or the equivalents of such scope and boundaries.

Claims (3)

1. A big data mining and analysis method for performing data analysis with a cloud-based data analysis and processing system, characterized by comprising:
first finding, by iterative search, all itemsets in the database whose support count is not less than a user-set threshold (the frequent itemsets); then using the itemsets so obtained to construct rules that meet the minimum confidence; and implementing the association rule generation process with MapReduce and porting it to a cloud computing platform for cloud-based data analysis and processing.
2. The method according to claim 1, characterized in that the association rule generation process further comprises:
(1) the transaction database D is divided horizontally and evenly into n subsets, using a 16 MB block of data as the unit of distribution, and the subsets are sent to m worker nodes;
(2) the accumulated support count of a candidate itemset X is denoted cs(X), and the initial value of each cs(X) is set to 1; each worker node scans the subset assigned to it and produces a set, denoted CP, containing the candidate 1-itemsets through the candidate K-itemsets;
(3) a partition function is defined; the candidate 1-itemsets through candidate K-itemsets generated by the m worker nodes are divided into r different partitions and sent, together with their cs values, to r nodes; each node accumulates the cs values of identical itemsets to obtain the final cs of each itemset, compares it with the preset minimum support count Smin, deletes the itemsets whose support count is less than Smin, and thereby determines a local frequent itemset collection $L_p$;
(4) the results $L_p$ of all r nodes are merged to generate the global frequent itemset collection L;
(5) the frequent itemsets are traversed according to the set minimum confidence cm to obtain the strong association rules, and the process ends.
3. The method according to claim 2, characterized in that implementing the association rule generation process with MapReduce further comprises:
(1) MapReduce divides the transaction database D horizontally into n blocks, the size of each block being set by a parameter; the n data subsets are sent to m nodes that execute Map tasks; the master program is responsible for scheduling and assigns the tasks to workers in the idle list;
(2) the n data subsets are formatted to produce (ID, Value) key-value pairs, where ID is a transaction ID in D and Value is the list of values corresponding to that transaction ID;
(3) the Map operation scans each input (ID, Value) key-value pair and generates the set CP of local candidate 1-itemsets through candidate k-itemsets; the initial cs value of each candidate itemset is set to 1, and the Map operation outputs intermediate (Itemset, 1) key-value pairs, where Itemset is a candidate itemset in CP;
(4) an optional partition function is added on each worker that executes the Map function; it merges the intermediate results produced by the Map operation and outputs intermediate key-value pairs (Itemset, s), where s is the accumulated cs value of Itemset within the data subset; the following hash function is then applied:
$$\mathrm{hash}(m_1, m_2, m_3, \ldots, m_k) = \sum_{j=1}^{k} 10^{k-j} m_j \bmod r$$
where $m_1, m_2, m_3, \ldots, m_k$ are the indices, arranged in ascending order, that the items of the k-itemset have in the item set of D, and r is the number of partitions; the (Itemset, s) pairs produced by the partition function are divided into r partitions, and the master program assigns each partition to the corresponding Reduce function;
(5) the Reduce node reads the (Itemset, s) key-value pairs submitted by the partition function, sorts and merges them into (Itemset, list(s)), and then performs the Reduce operation to obtain the actual accumulated support count of each candidate itemset in D; all candidate itemsets whose count is greater than or equal to the minimum support count Smin are retained and form the local frequent itemset collection $L_p$; the itemsets output by the Reduce functions of the r partitions are merged to obtain the final frequent itemset collection L;
(6) after all Map and Reduce operations have completed, the master program wakes up the user program, and the MapReduce call returns to its call site.
CN201510254391.4A 2015-05-18 2015-05-18 Big data mining and analyzing method Pending CN104834733A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510254391.4A CN104834733A (en) 2015-05-18 2015-05-18 Big data mining and analyzing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510254391.4A CN104834733A (en) 2015-05-18 2015-05-18 Big data mining and analyzing method

Publications (1)

Publication Number Publication Date
CN104834733A true CN104834733A (en) 2015-08-12

Family

ID=53812619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510254391.4A Pending CN104834733A (en) 2015-05-18 2015-05-18 Big data mining and analyzing method

Country Status (1)

Country Link
CN (1) CN104834733A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101799810A (en) * 2009-02-06 2010-08-11 中国移动通信集团公司 Association rule mining method and system thereof
CN103258049A (en) * 2013-05-27 2013-08-21 重庆邮电大学 Association rule mining method based on mass data

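Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Bing Liu, translated by Yu Yong et al.: "Web Data Mining (2nd Edition)", 30 April 2009 *
Xia Chunyan: "Data Mining Technology and Applications", 31 August 2014 *
Sun Fenfen: "Research on parallel mining techniques for massive data", China Master's Theses Full-text Database, Information Science and Technology *
Li Lingjuan et al.: "Research on association rule mining algorithms in cloud computing environments", Computer Technology and Development *
Yang Xinyue: "Research on association rule algorithms in cloud computing environments", China Master's Theses Full-text Database, Information Science and Technology *
Yang Zemin: "Incremental updating method for association rules in the cloud computing model", Computer Engineering and Design *
Hu Keyun et al.: "Data Mining Theory and Applications", 30 April 2008 *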

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205158A (en) * 2015-09-29 2015-12-30 成都四象联创科技有限公司 Big data retrieval method based on cloud computing
CN107766442A (en) * 2017-09-21 2018-03-06 深圳金融电子结算中心有限公司 A kind of mass data association rule mining method and system
CN107766442B (en) * 2017-09-21 2019-02-01 深圳金融电子结算中心有限公司 A kind of mass data association rule mining method and system
CN108628954A (en) * 2018-04-10 2018-10-09 北京京东尚科信息技术有限公司 A kind of mass data self-service query method and apparatus
CN108628954B (en) * 2018-04-10 2021-05-25 北京京东尚科信息技术有限公司 Mass data self-service query method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150812
