CN102737126B - Classification rule mining method under cloud computing environment - Google Patents

Classification rule mining method under cloud computing environment

Info

Publication number
CN102737126B
Authority
CN
China
Prior art keywords
key
value
control center
cloud computing
computing environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210203816.5A
Other languages
Chinese (zh)
Other versions
CN102737126A
Inventor
杨善林
丁静
罗贺
丁帅
徐达宇
范雯娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2012-06-19
Filing date
2012-06-19
Publication date
2014-03-12
Application filed by Hefei University of Technology
Priority to CN201210203816.5A
Publication of CN102737126A
Application granted
Publication of CN102737126B
Expired - Fee Related (current legal status)
Anticipated expiration


Abstract

The invention discloses a classification rule mining method under a cloud computing environment. The method adopts a master-slave organizational structure consisting of a control center and a plurality of slave servers, and comprises the following steps: the control center divides the data set to be classified into a training sample and a test sample, divides the training sample evenly into data blocks of the same size, and distributes each data block to one processing unit; the processing units train on their data blocks with a genetic algorithm to obtain atomic rules for classification; finally, classifiers perform redundancy reduction on the atomic rules, and the reduced results that meet the classification accuracy requirement are selected as the final result of classification rule mining. The method is suitable for classifying data held in distributed storage in a cloud computing environment, supports distributed parallel processing of data classification tasks in that environment, and helps address the classification of massive data under cloud computing.

Description

Classification rule mining method under cloud computing environment
Technical field
The invention belongs to the field of data analysis in cloud computing environments, and specifically relates to a classification rule mining method under a cloud computing environment.
Background
Research on classification techniques is an important area of data analysis and management in cloud computing environments. On the one hand, classification is an important type of data mining task. Data in a cloud computing environment are massive, distributed and dynamic, and these characteristics challenge data management in such environments; analyzing the data through classification rule mining helps improve the efficiency of analyzing and managing massive data in the cloud. On the other hand, a cloud environment offers very large-scale storage and computing power, its resources and structure can scale elastically, and it provides on-demand services through virtualization technology and huge resource pools, which makes efficient data classification possible. Classification rule mining is a key task in data analysis and management: it helps users better understand the massive data in a cloud computing environment and supports operational decision-making there, while the high-performance computing and storage capacity of cloud computing allows classification rule mining to run efficiently. Classification rule mining is therefore an important technology for data analysis and processing in cloud computing environments, and research on its theory and applications is of great significance.
Researchers in China and abroad have proposed many solutions for classification rule mining, including statistical methods represented by Bayesian methods, learning methods represented by decision trees and rule induction, and neural network methods. These methods achieve high classification accuracy when applied to small, static data sets. However, they still have bottlenecks: they need to scan and sort the data set repeatedly, which makes the algorithms inefficient; they are sensitive to noise and to particular data and are prone to overfitting; and they do not scale well to large training sets. In a cloud computing environment in particular, the large scale and dynamic nature of distributed massive data sets cause the search space and dimensionality of the classification process to grow sharply, which increases the computational complexity of classification and lowers the efficiency of traditional classification methods. Existing classification rule mining methods therefore cannot be applied directly in cloud computing environments.
Summary of the invention
To solve the above problems, the present invention overcomes the limitations of the prior art and provides a classification rule mining method under a cloud computing environment. The invention is suitable for classifying data held in distributed storage in a cloud computing environment, supports distributed parallel processing of data classification tasks there, and helps address the classification of massive data in such environments. It exploits the scale effect of the large number of computing nodes available in a cloud computing environment to improve the efficiency of classification rule mining on massive data, and it achieves a distributed implementation of classification rule mining through a master-slave organizational structure and a rule training process based on a genetic algorithm.
The present invention adopts the following technical solution to solve the technical problem:
The classification rule mining method under a cloud computing environment of the present invention is characterized as follows:
The cloud computing environment consists of a plurality of distributed servers. Classification rule mining in this environment uses a master-slave organizational structure in which one server is designated as the control center and the other servers are slave servers. The control center arranges the execution of the whole mining task, manages and schedules it, and coordinates the operation of the slave servers; each slave server is a concrete execution unit for the task. The classification rule mining method proceeds as follows:
a. The control center divides the data set to be classified into a training sample and a test sample and divides the training sample evenly into data blocks of the same size. For each data block it designates one slave server that performs the classification mining task as a processing unit, and assigns each data block to its processing unit;
b. Each processing unit uses a genetic algorithm to train atomic rules for classification on the data block assigned to it, and stores the trained atomic rules in a buffer;
c. The control center partitions the atomic rules in the buffer and selects idle slave servers as classifiers. The classifiers perform redundancy reduction on the atomic rules and test the classification accuracy of the reduced results; the reduced results that meet the classification accuracy requirement are selected as the final result of classification rule mining.
The classification rule mining method under a cloud computing environment of the present invention is further characterized as follows:
Step a is implemented as follows:
After receiving a classification mining task request, the control center divides the data set to be classified into a training sample and a test sample. According to the requirements of the classification mining task specified by the user and the characteristics of the training sample, it searches for suitable slave servers to act as processing units and divides the training sample evenly into data blocks of the same size. If N processing units satisfy the conditions and the training sample has size M, each data block has size M/N.
A suitable slave server must satisfy the following conditions: its storage space is not less than M/N, and its response time does not exceed the latest completion time required by the user.
The control center accesses the training sample, copies each resulting data block to its processing unit, and sends the processing unit an instruction to train atomic rules with the genetic algorithm.
The atomic rules in step b are trained as follows:
Each processing unit genetically encodes every data record in its data block and generates atomic rules through iterated genetic operations. It stores the atomic rules in the buffer as <key, value> pairs, where key is a class label and value is an atomic rule under that class label.
The control center periodically reads the <key, value> pairs in the buffer, generates a list of <key, value list> pairs, and stores it in the buffer; in each <key, value list> pair, key is a class label and value list is the list of atomic rules under that class label.
After a processing unit has finished operating on all data records in its data block, it sends the control center a message that its operation is complete.
Step c proceeds as follows:
The control center searches for idle slave servers to act as classifiers. The number of classifiers equals the number of keys in the <key, value list> list, and each classifier corresponds to one key. The control center sends each classifier the atomic rule list of the corresponding <key, value list> pair together with the test sample records that have the same class label, and sends the classifier instructions to perform redundancy reduction and classification accuracy testing.
Each classifier keeps only one copy of identical atomic rules under the same class label and deletes the redundant ones, obtaining the reduced atomic rules.
The classifier uses the reduced atomic rules to classify the test sample and checks whether the classification results agree with the class labels of the test sample. Suppose that among the test sample records classified by atomic rule a, Y records have class labels consistent with the classification result and N records have class labels inconsistent with it; the classification accuracy of atomic rule a is then Y/(Y+N). If the classification accuracy requirement in the user's classification mining task is α, all atomic rules with classification accuracy not less than α are sent to the control center as the final result of classification rule mining.
The control center aggregates the final results generated by all classifiers and returns the aggregated result to the requester of the classification rule mining task.
Compared with existing data classification methods, the beneficial effects of the present invention are:
1. The invention divides the classification rule mining task for massive data into multiple subtasks and assigns them to a large-scale server cluster in the cloud computing environment for processing. This reduces the computational complexity of each individual task and, by exploiting the scale of the cloud server cluster, significantly improves the efficiency of the overall classification rule mining task;
2. The master-slave organizational structure of the invention implements task distribution, scheduling and management in a cloud computing environment and provides a distributed implementation mechanism for classification rule mining. In addition, the genetic algorithm used for rule training is inherently well suited to parallel execution, which overcomes the poor parallelism of general classification techniques in distributed environments.
Description of the drawings
Fig. 1 is a schematic diagram of the classification rule mining method under a cloud computing environment of the present invention;
Fig. 2 is a flow chart of the generation of atomic rules through iterated genetic operations in the present invention.
Embodiment
In the classification rule mining method under a cloud computing environment of this embodiment:
The cloud computing environment consists of a plurality of distributed servers. Classification rule mining in this environment uses a master-slave organizational structure in which one server is designated as the control center and the other servers are slave servers. The control center arranges the execution of the whole mining task, manages and schedules it, and coordinates the operation of the slave servers; each slave server is a concrete execution unit for the task. As shown in Fig. 1, the classification rule mining method proceeds as follows:
1. The control center divides the data set to be classified into a training sample and a test sample and divides the training sample evenly into data blocks of the same size. For each data block it designates one slave server that performs the classification mining task as a processing unit, and assigns each data block to its processing unit;
2. Each processing unit uses a genetic algorithm to train atomic rules for classification on the data block assigned to it, and stores the trained atomic rules in a buffer;
3. The control center partitions the atomic rules in the buffer and selects idle slave servers as classifiers. The classifiers perform redundancy reduction on the atomic rules and test the classification accuracy of the reduced results; the reduced results that meet the classification accuracy requirement are selected as the final result of classification rule mining.
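The data flow of the three steps can be pictured with the following minimal single-machine sketch. It assumes that Python threads stand in for the slave servers and uses a hypothetical placeholder `train_block` instead of the genetic-algorithm training (it simply turns each record into one atomic rule); the block split is a round-robin interleave chosen for brevity. It is an illustration of the master-slave flow under these assumptions, not the patented implementation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# A record is a tuple (c_1, ..., c_{k-1}, c_k); the last element is the class label.

def train_block(block):
    """Hypothetical stand-in for a processing unit's GA training.

    Emits one <key, value> pair per record: key = class label,
    value = the condition part of the record, used as an atomic rule.
    """
    return [(record[-1], record[:-1]) for record in block]

def control_center(training_sample, n_units):
    # Step 1: split the training sample into n_units blocks of (almost) equal size.
    blocks = [training_sample[i::n_units] for i in range(n_units)]
    # Step 2: each simulated processing unit trains on its own block in parallel.
    with ThreadPoolExecutor(max_workers=n_units) as pool:
        results = list(pool.map(train_block, blocks))
    # The control center groups <key, value> pairs into <key, value list> pairs.
    buffer = defaultdict(list)
    for pairs in results:
        for key, value in pairs:
            buffer[key].append(value)
    # Step 3 would hand each key's rule list to one idle classifier.
    return dict(buffer)
```

For example, `control_center([("a", "x", "yes"), ("b", "y", "no"), ("a", "x", "yes")], 2)` groups the two "yes" rules under one key and the "no" rule under another, which is the shape of input each classifier receives in step 3.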
Assume the data set is represented by the relation schema $R(a_1, a_2, \ldots, a_k)$, where $a_i$ ($i = 1, 2, \ldots, k$) are attributes. The attribute set $\langle a_1, a_2, \ldots, a_k \rangle$ is divided into two parts, $k-1$ condition attributes and one class label: attributes $a_1, a_2, \ldots, a_{k-1}$ are condition attributes and attribute $a_k$ is the class label. Each data record in the data set is a $k$-dimensional vector $[c_1, c_2, \ldots, c_k]$, where $c_i$ is the value of attribute $a_i$ in that record.
An atomic rule has the form: If $(a_1 = c_1) \wedge (a_2 = c_2) \wedge \cdots \wedge (a_{k-1} = c_{k-1})$, Then $a_k = c_k$.
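As a minimal sketch of this rule form, assuming attribute values are plain strings, an atomic rule can be held as its antecedent values $c_1, \ldots, c_{k-1}$ plus the class label $c_k$. The class name `AtomicRule` and its methods are hypothetical illustrations, not names from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

@dataclass(frozen=True)
class AtomicRule:
    """If (a_1 = c_1) AND ... AND (a_{k-1} = c_{k-1}) Then a_k = c_k."""
    conditions: Tuple[str, ...]  # (c_1, ..., c_{k-1})
    label: str                   # c_k

    @classmethod
    def from_record(cls, record: Sequence[str]) -> "AtomicRule":
        # A record [c_1, ..., c_k] yields the rule whose antecedent is c_1..c_{k-1}.
        return cls(tuple(record[:-1]), record[-1])

    def covers(self, record: Sequence[str]) -> bool:
        # True when every condition attribute of the record matches the antecedent.
        return tuple(record[:-1]) == self.conditions

    def predict(self, record: Sequence[str]) -> Optional[str]:
        # Returns the class label c_k if the rule fires, otherwise None.
        return self.label if self.covers(record) else None
```

Making the rule frozen (hashable) is a design choice that lets identical rules be deduplicated with a set or dict, which is what redundancy reduction in step 3 needs.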
Step 1 is implemented as follows:
After receiving a classification mining task request, the control center divides the data set to be classified into a training sample and a test sample. According to the requirements of the classification mining task specified by the user and the characteristics of the training sample, it searches for suitable slave servers to act as processing units and divides the training sample evenly into data blocks of the same size. If N processing units satisfy the conditions and the training sample has size M, each data block has size M/N.
A suitable slave server must satisfy the following conditions: its storage space is not less than M/N, and its response time does not exceed the latest completion time required by the user.
The control center accesses the training sample, copies each resulting data block to its processing unit, and sends the processing unit an instruction to train atomic rules with the genetic algorithm.
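The two server conditions depend on N, and N depends on which servers qualify. The sketch below resolves this circularity in one possible way, which is an assumption rather than something the text specifies: keep only servers that meet the deadline, sort them by free storage, and take the largest N for which the N-th largest storage still holds a block of size ⌈M/N⌉. The `SlaveServer` fields and their units are hypothetical, and when M is not divisible by N the blocks differ in size by at most one record.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass
class SlaveServer:
    name: str
    free_storage: int      # capacity in records (hypothetical unit)
    response_time: float   # expected response time (hypothetical unit)

def split_evenly(records: Sequence, n: int) -> List[list]:
    """Split records into n contiguous blocks whose sizes differ by at most 1."""
    q, r = divmod(len(records), n)
    blocks, start = [], 0
    for i in range(n):
        size = q + (1 if i < r else 0)
        blocks.append(list(records[start:start + size]))
        start += size
    return blocks

def plan_step1(training: Sequence, servers: Sequence[SlaveServer],
               deadline: float) -> Tuple[List[SlaveServer], List[list]]:
    """Pick N processing units with storage >= M/N and response time <= deadline."""
    m = len(training)
    fast = sorted((s for s in servers if s.response_time <= deadline),
                  key=lambda s: s.free_storage, reverse=True)
    # N and M/N depend on each other, so try the largest N first.
    for n in range(len(fast), 0, -1):
        block_size = math.ceil(m / n)
        if fast[n - 1].free_storage >= block_size:  # smallest of the top n servers
            return fast[:n], split_evenly(training, n)
    return [], []  # no feasible assignment
```

With ten records, servers A (storage 5), B (4), C (4) that meet the deadline and a fast-enough check failing for a fourth server D, the sketch selects A, B and C and produces blocks of 4, 3 and 3 records.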
The atomic rules in step 2 are trained as follows:
Each processing unit genetically encodes every data record in its data block and generates atomic rules through iterated genetic operations. It stores the atomic rules in the buffer as <key, value> pairs, where key is a class label and value is an atomic rule under that class label.
The genetic encoding of a data record is as follows: each data record is represented as one chromosome, whose gene values consist of the binary codes of its k attribute values. A discrete attribute value is binary-encoded directly; a continuous attribute is first discretized and then binary-encoded. The binary encoding works as follows: if attribute $a_i$ has s categorical values $v_1, v_2, \ldots, v_s$, the value of $a_i$ is represented by a binary number with s code bits. If the value of $a_i$ is $v_j$, the j-th bit of the binary code is 1 and all other bits are 0. For example, the attribute sex has two values, "male" and "female"; if the value is "male", the attribute is encoded as "01", and if it is "female", it is encoded as "10". In the genetic algorithm each chromosome is one genetic individual, all genetic individuals form the population, and the population size is denoted by the number of individuals n. The population obtained from the genetic encoding process is the initial population, and each binary code bit corresponds to one gene position on the chromosome.
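The encoding described above is a one-hot code per attribute, concatenated into a chromosome. The sketch below assumes the attributes are already discretized and that each attribute's value list (its domain) is supplied in a fixed order; the order `("female", "male")` is chosen here only so that "male" encodes as "01" as in the example. Function names are hypothetical.

```python
from typing import List, Sequence

def one_hot(value: str, domain: Sequence[str]) -> List[int]:
    """s code bits for an attribute with s values; bit j is 1 iff value == v_j."""
    if value not in domain:
        raise ValueError(f"{value!r} is not in the attribute domain {list(domain)}")
    return [1 if v == value else 0 for v in domain]

def encode_record(record: Sequence[str], domains: Sequence[Sequence[str]]) -> List[int]:
    """Concatenate the binary codes of all k attribute values into one chromosome."""
    chromosome: List[int] = []
    for value, domain in zip(record, domains):
        chromosome.extend(one_hot(value, domain))
    return chromosome

def initial_population(block: Sequence[Sequence[str]],
                       domains: Sequence[Sequence[str]]) -> List[List[int]]:
    # Every record of the data block becomes one individual, so n == len(block).
    return [encode_record(r, domains) for r in block]
```

Each gene position of the resulting chromosome is one code bit, so crossover and mutation can operate on plain lists of 0/1 values.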
The iterated genetic operations are shown in Fig. 2. In the t-th iteration, the fitness of the t-th generation population P(t) is first evaluated, and then the algorithm decides whether to stop iterating. If the stopping condition is met, iteration stops and P(t) is output as the atomic rules; otherwise selection, crossover and mutation are performed to generate the (t+1)-th generation population P(t+1), t is set to t+1, and the next iteration begins.
Iteration stops when either of the following two conditions holds:
1. the number of iterations t > 100;
2. fitness > 0.75.
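The loop of Fig. 2 can be sketched as follows. The text does not say whose fitness is compared with 0.75; this sketch assumes iteration stops as soon as any individual in P(t) exceeds it. The genetic operators are passed in as a hypothetical `step` callable, and the population must be non-empty.

```python
from typing import Callable, List, Sequence, Tuple

Individual = List[int]

def evolve(initial: Sequence[Individual],
           fitness: Callable[[Individual], float],
           step: Callable[[List[Individual], List[float]], List[Individual]],
           max_iter: int = 100,
           target: float = 0.75) -> Tuple[List[Individual], int]:
    """Evaluate P(t), test the stop conditions, otherwise breed P(t+1).

    Stops when t > max_iter or when any individual's fitness exceeds target
    (ASSUMPTION: the text does not say whose fitness is compared with 0.75).
    """
    population = [list(ind) for ind in initial]
    t = 0
    while True:
        scores = [fitness(ind) for ind in population]
        if t > max_iter or max(scores) > target:
            return population, t           # P(t) is output as the atomic rules
        population = step(population, scores)  # selection, crossover, mutation
        t += 1
```

Keeping the operators outside the loop makes it easy to swap in fixed or adaptive crossover and mutation probabilities, both of which the description allows.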
Fitness evaluation: let the data record corresponding to a genetic individual be $[c_1, c_2, \ldots, c_k]$. The fitness of the individual is $\mathrm{fitness} = TT/n + TT/(TT+TF)$, where TT is the number of records in the data block that satisfy $(a_1=c_1) \wedge (a_2=c_2) \wedge \cdots \wedge (a_k=c_k)$; TF is the number of records in the data block that satisfy $(a_1=c_1) \wedge \cdots \wedge (a_{k-1}=c_{k-1}) \wedge (a_k \neq c_k)$; FT is the number of records in the data block that satisfy $\neg[(a_1=c_1) \wedge \cdots \wedge (a_{k-1}=c_{k-1})] \wedge (a_k=c_k)$; and FF is the number of records in the data block that satisfy $\neg[(a_1=c_1) \wedge \cdots \wedge (a_{k-1}=c_{k-1})] \wedge (a_k \neq c_k)$. Because the initial population is formed by encoding every record of the data block, the population size n equals the number of records in the block, so TT/n is the support of the rule and TT/(TT+TF) is its confidence.
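A minimal sketch of this fitness on encoded chromosomes, assuming the last `class_bits` bits of every chromosome encode the class label $a_k$ (at least one bit). Comparing bit segments rather than decoded values means an individual whose one-hot code was broken by mutation simply matches no record. Returning 0 for the confidence term when TT+TF = 0 is an assumption; the text leaves that case undefined.

```python
from typing import Sequence

def fitness(individual: Sequence[int], block: Sequence[Sequence[int]],
            class_bits: int) -> float:
    """fitness = TT/n + TT/(TT+TF) for one encoded individual.

    block holds the encoded records of the data block; the last class_bits
    bits of every chromosome are the class-label code.
    """
    cond, label = list(individual[:-class_bits]), list(individual[-class_bits:])
    tt = tf = 0
    for rec in block:
        if list(rec[:-class_bits]) == cond:        # antecedent a_1..a_{k-1} matches
            if list(rec[-class_bits:]) == label:
                tt += 1                             # ... and a_k = c_k
            else:
                tf += 1                             # ... but a_k != c_k
    n = len(block)  # equals the population size when every record is an individual
    support = tt / n if n else 0.0
    confidence = tt / (tt + tf) if (tt + tf) else 0.0  # ASSUMPTION: 0 when undefined
    return support + confidence
```

For a block of four records in which the individual's full pattern occurs twice and its antecedent occurs once more with a different label, the fitness is 2/4 + 2/3.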
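Selection: the selection probability $p_i$ of genetic individual $X_i$ is given by a formula that appears only as an image in the original publication (Figure BDA00001786606100051), in which $\mathrm{fitness}(X_i)$ denotes the fitness value of $X_i$. $p_i \cdot n$ copies of $X_i$ are placed in the population for the next round of genetic operations.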
Crossover: two genetic individuals $X_i$ and $X_j$ are selected at random according to the crossover probability $p_c$, a gene position w on the chromosome is chosen at random, and the gene segments of $X_i$ and $X_j$ after position w are exchanged to form two new individuals for the population of the next round of genetic operations. The crossover probability $p_c$ is a value in [0.4, 0.9]; an adaptive crossover probability may also be used.
Mutation: a genetic individual $X_i$ is selected at random according to the mutation probability $p_m$, a gene position on $X_i$ is chosen at random, and the binary code at that position is inverted. The mutation probability $p_m$ is a value in [0.01, 0.1]; an adaptive mutation probability may also be used.
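The three operators can be sketched as below. The selection formula itself is only an image in the source, so this sketch ASSUMES the common fitness-proportional form $p_i = \mathrm{fitness}(X_i) / \sum_j \mathrm{fitness}(X_j)$ and rounds $p_i \cdot n$ to whole copies, which means the population size can drift slightly. Applying crossover to every random disjoint pair (rather than to a single pair) is also an assumption. Chromosomes need at least two gene positions for the cut point.

```python
import random
from typing import List, Sequence

Individual = List[int]

def select(population: Sequence[Individual], fitnesses: Sequence[float]) -> List[Individual]:
    """Copy X_i round(p_i * n) times, ASSUMING p_i = fitness(X_i) / sum of fitnesses."""
    n = len(population)
    total = sum(fitnesses)
    if total == 0:  # nothing to favour: keep the population unchanged
        return [list(ind) for ind in population]
    nxt: List[Individual] = []
    for ind, f in zip(population, fitnesses):
        nxt.extend(list(ind) for _ in range(round(f / total * n)))
    return nxt

def crossover(population: Sequence[Individual], p_c: float,
              rng: random.Random) -> List[Individual]:
    """Each random disjoint pair swaps the genes after a random position w with prob. p_c."""
    pop = [list(ind) for ind in population]
    order = list(range(len(pop)))
    rng.shuffle(order)
    for a, b in zip(order[::2], order[1::2]):
        if rng.random() < p_c:
            w = rng.randrange(1, len(pop[a]))  # cut point inside the chromosome
            pop[a][w:], pop[b][w:] = pop[b][w:], pop[a][w:]
    return pop

def mutate(population: Sequence[Individual], p_m: float,
           rng: random.Random) -> List[Individual]:
    """With probability p_m per individual, invert one randomly chosen gene bit."""
    pop = [list(ind) for ind in population]
    for ind in pop:
        if rng.random() < p_m:
            g = rng.randrange(len(ind))
            ind[g] = 1 - ind[g]
    return pop
```

One generation could then be produced as `mutate(crossover(select(pop, scores), 0.6, rng), 0.05, rng)` with `rng = random.Random(42)`, using values inside the stated ranges for $p_c$ and $p_m$.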
The control center periodically reads the <key, value> pairs in the buffer, generates a list of <key, value list> pairs, and stores it in the buffer; in each <key, value list> pair, key is a class label and value list is the list of atomic rules under that class label.
After a processing unit has finished operating on all data records in its data block, it sends the control center a message that its operation is complete.
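The periodic read by the control center can be sketched as draining a buffer of <key, value> pairs into a <key, value list> table. Modelling the buffer as an in-process `queue.Queue` is an assumption made for illustration; the description does not specify how the buffer is implemented. The function name `drain` is hypothetical.

```python
import queue
from typing import Dict, List

def drain(buffer: "queue.Queue", table: Dict[str, List[object]]) -> Dict[str, List[object]]:
    """One periodic read by the control center.

    Moves every <key, value> pair currently in the buffer (key = class label,
    value = atomic rule) into the <key, value list> table and returns the table.
    """
    while True:
        try:
            key, value = buffer.get_nowait()
        except queue.Empty:
            return table
        table.setdefault(key, []).append(value)
```

Processing units would call `buffer.put((label, rule))`; because each read only appends, the control center can call `drain` repeatedly while training is still running and stop once every processing unit has reported completion.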
Step 3 proceeds as follows:
The control center searches for idle slave servers to act as classifiers. The number of classifiers equals the number of keys in the <key, value list> list, and each classifier corresponds to one key. The control center sends each classifier the atomic rule list of the corresponding <key, value list> pair together with the test sample records that have the same class label, and sends the classifier instructions to perform redundancy reduction and classification accuracy testing.
Each classifier keeps only one copy of identical atomic rules under the same class label and deletes the redundant ones, obtaining the reduced atomic rules.
The classifier uses the reduced atomic rules to classify the test sample and checks whether the classification results agree with the class labels of the test sample. Suppose that among the test sample records classified by atomic rule a, Y records have class labels consistent with the classification result and N records have class labels inconsistent with it; the classification accuracy of atomic rule a is then Y/(Y+N). If the classification accuracy requirement in the user's classification mining task is α, all atomic rules with classification accuracy not less than α are sent to the control center as the final result of classification rule mining.
The control center aggregates the final results generated by all classifiers and returns the aggregated result to the requester of the classification rule mining task.
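A minimal sketch of one classifier's work in step 3, assuming a rule is a pair (condition values, class label) and a test record is a tuple whose last element is its class label. A rule "classifies" a record when its antecedent matches; Y and N are then counted as in the text. Treating a rule that fires on no test record as having accuracy 0 is an assumption. The sketch evaluates rules on whatever test records it is handed.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[Tuple[str, ...], str]  # (conditions c_1..c_{k-1}, class label c_k)

def reduce_rules(rules: Sequence[Rule]) -> List[Rule]:
    """Redundancy reduction: keep one copy of identical rules, preserving order."""
    return list(dict.fromkeys(rules))

def accuracy(rule: Rule, test_records: Sequence[Tuple[str, ...]]) -> float:
    """Y / (Y + N) over the test records the rule classifies (antecedent matches)."""
    conditions, label = rule
    y = n = 0
    for rec in test_records:
        if tuple(rec[:-1]) == conditions:
            if rec[-1] == label:
                y += 1   # class label consistent with the rule's result
            else:
                n += 1   # class label inconsistent with the rule's result
    return y / (y + n) if (y + n) else 0.0  # ASSUMPTION: 0 if the rule never fires

def classifier(rules: Sequence[Rule], test_records: Sequence[Tuple[str, ...]],
               alpha: float) -> List[Rule]:
    """Keep the reduced rules whose classification accuracy is not less than alpha."""
    return [r for r in reduce_rules(rules) if accuracy(r, test_records) >= alpha]
```

`dict.fromkeys` is used for deduplication because it keeps the first occurrence of each rule in order, which makes the classifier's output deterministic.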

Claims (3)

1. A classification rule mining method under a cloud computing environment, characterized in that:
The cloud computing environment consists of a plurality of distributed servers. Classification rule mining in the cloud computing environment uses a master-slave organizational structure in which one server is designated as the control center and the other servers are slave servers. The control center arranges the execution of the whole mining task, manages and schedules it, and coordinates the operation of the slave servers; each slave server is a concrete execution unit for the task. The classification rule mining method proceeds as follows:
a. The control center divides the data set to be classified into a training sample and a test sample and divides the training sample evenly into data blocks of the same size; for each data block it designates one slave server that performs the classification mining task as a processing unit, and assigns each data block to its processing unit;
b. Each processing unit uses a genetic algorithm to train atomic rules for classification on the data block assigned to it and stores the trained atomic rules in a buffer. The atomic rules are trained as follows:
Each processing unit genetically encodes every data record in its data block and generates atomic rules through iterated genetic operations. It stores the atomic rules in the buffer as <key, value> pairs, where key is a class label and value is an atomic rule under that class label;
The control center periodically reads the <key, value> pairs in the buffer, generates a list of <key, value list> pairs, and stores it in the buffer; in each <key, value list> pair, key is a class label and value list is the list of atomic rules under that class label;
After a processing unit has finished operating on all data records in its data block, it sends the control center a message that its operation is complete;
c. The control center partitions the atomic rules in the buffer and selects idle slave servers as classifiers. The classifiers perform redundancy reduction on the atomic rules and test the classification accuracy of the reduced results; the reduced results that meet the classification accuracy requirement are selected as the final result of classification rule mining.
2. The classification rule mining method under a cloud computing environment according to claim 1, characterized in that step a is implemented as follows:
After receiving a classification mining task request, the control center divides the data set to be classified into a training sample and a test sample. According to the requirements of the classification mining task specified by the user and the characteristics of the training sample, it searches for suitable slave servers to act as processing units and divides the training sample evenly into data blocks of the same size. If N processing units satisfy the conditions and the training sample has size M, each data block has size M/N;
A suitable slave server satisfies the following conditions: its storage space is not less than M/N, and its response time does not exceed the latest completion time required by the user;
The control center accesses the training sample, copies each resulting data block to its processing unit, and sends the processing unit an instruction to train atomic rules with the genetic algorithm.
3. The classification rule mining method under a cloud computing environment according to claim 1, characterized in that step c proceeds as follows:
The control center searches for idle slave servers to act as classifiers. The number of classifiers equals the number of keys in the <key, value list> list, and each classifier corresponds to one key. The control center sends each classifier the atomic rule list of the corresponding <key, value list> pair together with the test sample records that have the same class label, and sends the classifier instructions to perform redundancy reduction and classification accuracy testing;
Each classifier keeps only one copy of identical atomic rules under the same class label and deletes the redundant ones, obtaining the reduced atomic rules;
The classifier uses the reduced atomic rules to classify the test sample and checks whether the classification results agree with the class labels of the test sample. Suppose that among the test sample records classified by atomic rule a, Y records have class labels consistent with the classification result and N records have class labels inconsistent with it; the classification accuracy of atomic rule a is then Y/(Y+N). If the classification accuracy requirement in the user's classification mining task is α, all atomic rules with classification accuracy not less than α are sent to the control center as the final result of classification rule mining;
The control center aggregates the final results generated by all classifiers and returns the aggregated result to the requester of the classification rule mining task.
CN201210203816.5A 2012-06-19 2012-06-19 Classification rule mining method under cloud computing environment Expired - Fee Related CN102737126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210203816.5A CN102737126B (en) 2012-06-19 2012-06-19 Classification rule mining method under cloud computing environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210203816.5A CN102737126B (en) 2012-06-19 2012-06-19 Classification rule mining method under cloud computing environment

Publications (2)

Publication Number Publication Date
CN102737126A CN102737126A (en) 2012-10-17
CN102737126B true CN102737126B (en) 2014-03-12

Family

ID=46992627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210203816.5A Expired - Fee Related CN102737126B (en) 2012-06-19 2012-06-19 Classification rule mining method under cloud computing environment

Country Status (1)

Country Link
CN (1) CN102737126B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104113516A (en) * 2013-04-19 2014-10-22 中国移动通信集团设计院有限公司 Method and terminal for recognizing rule conflicts of firewalls
EP2988230A4 (en) * 2014-06-27 2016-10-19 Huawei Tech Co Ltd Data processing method and computer system
CN105900064B (en) * 2014-11-19 2019-05-03 华为技术有限公司 The method and apparatus for dispatching data flow task
CN104468239B (en) * 2014-12-22 2018-08-21 上海大唐移动通信设备有限公司 A kind of rule-based data processing method and device
US10430429B2 (en) 2015-09-01 2019-10-01 Cognizant Technology Solutions U.S. Corporation Data mining management server
CN107153630B (en) 2016-03-04 2020-11-06 阿里巴巴集团控股有限公司 Training method and training system of machine learning system
CN105912674A (en) * 2016-04-13 2016-08-31 精硕世纪科技(北京)有限公司 Method, device and system for noise reduction and classification of data
CN105975331A (en) * 2016-04-26 2016-09-28 浪潮(北京)电子信息产业有限公司 Data parallel processing method and apparatus
CN106372185B (en) * 2016-08-31 2017-07-04 广东京奥信息科技有限公司 A kind of data preprocessing method of heterogeneous data source
CN107169513B (en) * 2017-05-05 2019-10-18 第四范式(北京)技术有限公司 Control distributed machines learning system and its method that data use sequence
CN107590263B (en) * 2017-09-22 2020-07-07 辽宁工程技术大学 Distributed big data classification method based on multivariate decision tree model
CN108763952B (en) * 2018-05-03 2022-04-05 创新先进技术有限公司 Data classification method and device and electronic equipment
CN110109753A (en) * 2019-04-25 2019-08-09 成都信息工程大学 Resource regulating method and system based on various dimensions constraint genetic algorithm
TWI739229B (en) 2019-12-03 2021-09-11 財團法人工業技術研究院 Method and device for screening out dispatching rules
CN111814882B (en) * 2020-07-10 2021-06-22 辽东学院 Data classification system based on computer big data

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1282106C (en) * 2004-11-29 2006-10-25 南京大学 Rapid study classifying method
KR101600303B1 (en) * 2007-11-08 2016-03-07 센티언트 테크놀로지스 (바베이도스) 리미티드 Distributed network for performing complex algorithms
CN101556650B (en) * 2009-04-01 2011-07-06 东北大学 Distributed self-adapting pulmonary nodule computer detection method and system thereof
US9275048B2 (en) * 2009-09-09 2016-03-01 Tapicu, Inc. System and methods for solving information retrieval problem sets
CN102143022B (en) * 2011-03-16 2013-09-25 北京邮电大学 Cloud measurement device and method for IP network

Also Published As

Publication number Publication date
CN102737126A (en) 2012-10-17

Similar Documents

Publication Publication Date Title
CN102737126B (en) Classification rule mining method under cloud computing environment
Bifet et al. Extremely fast decision tree mining for evolving data streams
Xiao et al. SMK-means: an improved mini batch k-means algorithm based on mapreduce with big data
Gao et al. Classifying data streams with skewed class distributions and concept drifts
Luo et al. A parallel dbscan algorithm based on spark
Lin et al. A K-means clustering with optimized initial center based on Hadoop platform
Elagib et al. Big data analysis solutions using MapReduce framework
Zhang et al. Multi-source big data dynamic compressive sensing and optimization method for water resources based on IoT
Liu High performance latent dirichlet allocation for text mining
Xu Research and implementation of improved random forest algorithm based on Spark
Li et al. Scalable random forests for massive data
CN111159406A (en) Big data text clustering method and system based on parallel improved K-means algorithm
Gong et al. Evolutionary computation in China: A literature survey
Al_Zyadat et al. Securitizing big data characteristics used tall array and mapreduce
Wang et al. Association rules mining in parallel conditional tree based on grid computing inspired partition algorithm
Saxena et al. A framework for multi-sensor data fusion in the context of IoT smart city parking data
Xu et al. Applying an improved elephant herding optimization algorithm with spark-based parallelization to feature selection for intrusion detection
Sun et al. Key nodes discovery in large-scale logistics network based on MapReduce
Zhu et al. Efficient Gaussian Kernel Microcluster Real-Time Clustering Method for Industrial Internet of Things (IIoT) Streams
Jia et al. An improved FP-growth algorithm based on SOM partition
Wang et al. Sknn algorithm for filling missing oil data based on knn
Gao et al. Dynamic community detection using nonnegative matrix factorization
Huang et al. Optimization of K-means Algorithm Base on MapReduce
Zeng A clustering method of combining grid and genetic algorithm in wireless sensor networks
Song et al. HTME: A data streams processing strategy based on Hoeffding tree in MapReduce environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140312

Termination date: 20210619
