CN102737126A - Classification rule mining method under cloud computing environment - Google Patents
Classification rule mining method under cloud computing environment
- Publication number
- CN102737126A (application CN201210203816.5)
- Authority
- CN
- China
- Prior art keywords
- key
- value
- cloud computing
- atomic rule
- computing environment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a classification rule mining method under a cloud computing environment. The classification rule mining method is characterized in that a master-slave type organization structure consisting of a control center and a plurality of slave servers is adopted, and comprises the following steps of: dividing, by the control center, a data set to be classified into a training sample and a test sample, and distributing each of data blocks of the same size which are formed by uniformly dividing the training sample to one processing unit; training, by the processing units, the data blocks, by using a genetic algorithm to obtain an atomic rule for classification; and finally, reducing the atomic rule by a classifier, and selecting a reduction result which meets classification accuracy requirement as a final result of the classification rule mining. The classification rule mining method is suitable for data classification on distributed data storage under the cloud computing environment; distributed parallel processing of data classification tasks under the cloud computing environment can be performed; and the classification rule mining method has a positive effect on the classification processing problem of mass data under the cloud computing environment.
Description
Technical field
The invention belongs to the technical field of data analysis in cloud computing environments, and specifically relates to a classification rule mining method under a cloud computing environment.
Background art
Classification is an important research area in data analysis and management under cloud computing environments. On the one hand, classification is an important type of data mining task. Data in a cloud computing environment is massive, distributed, and dynamic, and these characteristics pose challenges for data management; analyzing the data through classification rule mining helps improve the efficiency of analyzing and managing massive data in the cloud. On the other hand, a cloud environment has very large-scale storage and computing capacity, its resources and structure scale elastically, and it provides services on demand through virtualization technology and a huge resource pool, all of which makes efficient data classification possible. Classification rule mining is a vital task in data analysis and management: it helps in understanding the massive data in a cloud computing environment and supports operational decisions there. At the same time, the high-performance computing and storage capabilities of cloud computing guarantee that classification rule mining can run efficiently. Classification rule mining is therefore an important technique for data analysis in cloud computing environments, and research on its theory and application is significant.
In research on classification rule mining, scholars in China and abroad have proposed many solutions, including statistical methods represented by Bayesian methods, machine learning methods represented by decision trees and rule induction, and neural network methods. Applied to classification rule mining on small, static data sets, these methods achieve high classification accuracy. They nevertheless have bottlenecks: they scan and sort the data set repeatedly, which makes the algorithms inefficient; they are sensitive to noise and missing data and are prone to overfitting; and they scale poorly to large training sets. In a cloud computing environment in particular, the scale and dynamics of distributed massive data sets cause the search space and dimensionality of the classification process to surge, which increases the computational complexity of classification and reduces the efficiency of traditional classification methods. Existing classification rule mining methods therefore cannot be applied directly in a cloud computing environment.
Summary of the invention
To solve the above problems, the present invention overcomes the limitations of the prior art and provides a classification rule mining method under a cloud computing environment. The invention is suited to classifying data held in distributed storage in a cloud computing environment, supports distributed parallel processing of data classification tasks, and has a positive effect on the problem of classifying massive data in the cloud. It exploits the scale effect of the large number of computing nodes in a cloud computing environment to improve the efficiency of classification rule mining on massive data, and it uses a master-slave organization structure together with a rule training process based on a genetic algorithm to realize classification rule mining in a distributed manner.
The present invention adopts the following technical solution to solve the technical problem:
The classification rule mining method under a cloud computing environment of the present invention is characterized as follows:
The cloud computing environment consists of a plurality of distributed servers. When classification rule mining is carried out in the cloud computing environment, a master-slave organization structure is adopted: one server is set as the control center and the other servers are slave servers. The control center arranges the execution of the whole mining task, schedules and manages it, and coordinates the operation of the slave servers; each slave server is a concrete execution unit of the task. The classification rule mining method proceeds as follows:
a. The control center divides the data set to be classified into a training sample and a test sample. The training sample is divided evenly into data blocks of equal size; for each data block, a slave server that will carry out the classification mining task is designated as a processing unit, and each data block is distributed to its corresponding processing unit;
b. Each processing unit uses a genetic algorithm to train, on the data block distributed to it, atomic rules for classification, and stores the atomic rules obtained by training in a buffer;
c. The control center partitions the atomic rules in the buffer and selects idle slave servers as classifiers. The classifiers perform redundancy reduction on the atomic rules and measure the classification accuracy of the reduction results; the reduction results that meet the classification accuracy requirement are selected as the final result of classification rule mining.
The classification rule mining method under a cloud computing environment of the present invention is further characterized as follows:
Step a is implemented as follows:
After receiving a request for a classification mining task, the control center divides the data set to be classified into a training sample and a test sample. Based on the requirements the user places on the classification mining task and on the characteristics of the training sample, it searches for suitable slave servers to act as processing units and divides the training sample evenly into data blocks of equal size. If the number of processing units that satisfy the conditions is N and the size of the training sample is M, the size of each data block is M/N;
A suitable slave server satisfies the following conditions: its storage space is not less than M/N, and its response time is not greater than the latest time required by the user;
The control center accesses the training sample, copies each data block obtained by the division to the corresponding processing unit, and sends the processing unit the operation instruction to train atomic rules with the genetic algorithm.
The training process of the atomic rules in step b is:
The processing unit genetically encodes each data record of its data block and generates atomic rules through loop iteration of genetic operations. The atomic rules are stored in the buffer as <key, value> key-value pairs, in which key is a class label and value is an atomic rule under that class label;
The control center periodically reads the <key, value> key-value pairs in the buffer, generates a list of <key, value list> key-value pairs, and stores it in the buffer. In the list of <key, value list> key-value pairs, key is a class label and value list is the list of atomic rules under that class label;
After the processing unit has finished processing all data records in its data block, it sends the control center a message that its operation has finished.
Step c is carried out as follows:
The control center searches for idle slave servers to act as classifiers. The number of classifiers equals the number of key values in the list of <key, value list> key-value pairs, and each classifier corresponds to one key value. The control center sends each classifier the atomic rule list from the corresponding <key, value list> key-value pair together with the records in the test sample that carry the same class label, and sends the classifier the operation instructions for redundancy reduction and classification accuracy measurement;
The classifier records identical atomic rules under the same class label only once and deletes the redundant atomic rules, yielding the reduced atomic rules;
The classifier uses the reduced atomic rules to classify the test sample and checks whether each classification result agrees with the class label of the test record. Suppose that, among the test records classified by atomic rule a, Y records have a class label that agrees with the classification result and N records have a class label that does not; the classification accuracy of atomic rule a is then Y/(Y+N). Suppose the classification accuracy requirement in the user's classification mining task is α; all atomic rules whose classification accuracy is not less than α are sent to the control center as the final result of classification rule mining;
The control center aggregates the final results produced by all classifiers and returns the aggregated result to the requester of the classification rule mining task.
Compared with existing data classification methods, the beneficial effects of the present invention are:
1. The invention divides the classification rule mining task for massive data into multiple subtasks and assigns them to the large-scale server cluster in the cloud computing environment. This reduces the computational complexity of each individual task and exploits the scale effect of the cloud server cluster, significantly improving the efficiency of the overall classification rule mining task;
2. The master-slave organization structure realizes the distribution, scheduling, and management of tasks in the cloud computing environment and provides a distributed implementation mechanism for classification rule mining. In addition, the genetic algorithm used for rule training is inherently parallel, which addresses the poor parallelism of conventional classification techniques in distributed environments.
Description of drawings
Fig. 1 is a schematic diagram of the classification rule mining method under a cloud computing environment of the present invention.
Fig. 2 is a flowchart of atomic rule generation based on loop iteration of genetic operations in the present invention.
Embodiment
In the classification rule mining method under a cloud computing environment of this embodiment:
The cloud computing environment consists of a plurality of distributed servers. When classification rule mining is carried out in the cloud computing environment, a master-slave organization structure is adopted: one server is set as the control center and the other servers are slave servers. The control center arranges the execution of the whole mining task, schedules and manages it, and coordinates the operation of the slave servers; each slave server is a concrete execution unit of the task. The classification rule mining method is shown in Fig. 1 and proceeds as follows:
1. The control center divides the data set to be classified into a training sample and a test sample. The training sample is divided evenly into data blocks of equal size; for each data block, a slave server that will carry out the classification mining task is designated as a processing unit, and each data block is distributed to its corresponding processing unit;
2. Each processing unit uses a genetic algorithm to train, on the data block distributed to it, atomic rules for classification, and stores the atomic rules obtained by training in a buffer;
3. The control center partitions the atomic rules in the buffer and selects idle slave servers as classifiers. The classifiers perform redundancy reduction on the atomic rules and measure the classification accuracy of the reduction results; the reduction results that meet the classification accuracy requirement are selected as the final result of classification rule mining.
Assume the data set is represented by the relational schema $R(a_1, a_2, \ldots, a_k)$, where $a_i$ ($i = 1, 2, \ldots, k$) are attributes. The attribute set $\langle a_1, a_2, \ldots, a_k \rangle$ is divided into two parts, comprising $k-1$ condition attributes and 1 class label: attributes $a_1, a_2, \ldots, a_{k-1}$ are the condition attributes and attribute $a_k$ is the class label. Each data record in the data set is a $k$-dimensional vector $[c_1, c_2, \ldots, c_k]$, where $c_i$ is the value of attribute $a_i$ in that record.
An atomic rule has the form: If $(a_1 = c_1) \wedge (a_2 = c_2) \wedge \cdots \wedge (a_{k-1} = c_{k-1})$, Then $a_k = c_k$.
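As a minimal sketch of this rule form, the snippet below models an atomic rule as the condition values plus a class label and derives one from a data record. The names `AtomicRule`, `rule_from_record`, and `matches` are illustrative and do not come from the patent; the sketch assumes the class label is the last field of each record.

```python
from typing import NamedTuple, Tuple

class AtomicRule(NamedTuple):
    conditions: Tuple[object, ...]  # (c_1, ..., c_{k-1}), one value per condition attribute
    label: object                   # c_k, the class label

def rule_from_record(record):
    """Build 'If a_1=c_1 and ... and a_{k-1}=c_{k-1}, Then a_k=c_k' from one record."""
    return AtomicRule(tuple(record[:-1]), record[-1])

def matches(rule, record):
    """True when the record satisfies every condition of the rule (label is ignored)."""
    return tuple(record[:-1]) == rule.conditions

rule = rule_from_record(["sunny", "hot", "no"])
print(rule.conditions, "->", rule.label)
```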
Step 1 is implemented as follows:
After receiving a request for a classification mining task, the control center divides the data set to be classified into a training sample and a test sample. Based on the requirements the user places on the classification mining task and on the characteristics of the training sample, it searches for suitable slave servers to act as processing units and divides the training sample evenly into data blocks of equal size. If the number of processing units that satisfy the conditions is N and the size of the training sample is M, the size of each data block is M/N;
A suitable slave server must satisfy the following conditions: its storage space is not less than M/N, and its response time is not greater than the latest time required by the user.
The control center accesses the training sample, copies each data block obtained by the division to the corresponding processing unit, and sends the processing unit the operation instruction to train atomic rules with the genetic algorithm.
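The server selection and even division can be sketched as follows. Because the storage condition depends on M/N while N depends on which servers qualify, this sketch assumes a simple fixed-point search: it filters by response time, then repeatedly drops servers whose storage is below the current block size. That search, the tuple layout `(name, free_storage, response_time)`, and the function names are assumptions for illustration, not details given in the patent.

```python
import math

def choose_processing_units(servers, m, deadline):
    """Return (eligible servers, block size) for a training sample of size m.

    servers: list of (name, free_storage, response_time) tuples (hypothetical layout).
    """
    eligible = [s for s in servers if s[2] <= deadline]
    while eligible:
        block = math.ceil(m / len(eligible))          # block size M/N, rounded up
        kept = [s for s in eligible if s[1] >= block]  # storage must hold one block
        if len(kept) == len(eligible):
            return kept, block
        eligible = kept                                # fewer servers means larger blocks
    return [], 0

def split_evenly(records, n):
    """Split records into n contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(len(records), n)
    blocks, start = [], 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)
        blocks.append(records[start:end])
        start = end
    return blocks

servers = [("s1", 50, 1.0), ("s2", 10, 1.0), ("s3", 60, 5.0), ("s4", 45, 2.0)]
units, block = choose_processing_units(servers, m=90, deadline=2.0)
blocks = split_evenly(list(range(90)), len(units))
```

When M is not divisible by N, the sketch lets block sizes differ by one record; the patent text only describes the exactly even case.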
The training process of the atomic rules in step 2 is:
The processing unit genetically encodes each data record of its data block and generates atomic rules through loop iteration of genetic operations. The atomic rules are stored in the buffer as <key, value> key-value pairs, in which key is a class label and value is an atomic rule under that class label;
The genetic encoding of a data record is as follows: each data record is represented as one chromosome, and the gene values of the chromosome consist of the binary codes of the $k$ attribute values. If an attribute value is discrete, it can be binary-encoded directly; if it is continuous, it is first discretized and then binary-encoded. The binary encoding works as follows: suppose attribute $a_i$ has $s$ discrete values $v_1, v_2, \ldots, v_s$; the value of $a_i$ is then represented by a binary number with $s$ code bits. If the value of $a_i$ is $v_j$, the $j$-th bit of the binary code is 1 and the remaining bits are 0. For example, the sex attribute has two values, "man" and "woman"; if the attribute value is "man", the binary code of the attribute is "01"; if the attribute value is "woman", the code is "10". In the genetic algorithm, each chromosome is one genetic individual, all genetic individuals form a population, and the population size is the number $n$ of genetic individuals. The population obtained by the genetic encoding process is the initial population, and each binary code bit corresponds to one gene position of the chromosome.
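The encoding described above can be sketched in a few lines. The sketch assumes that bits are counted from the right, which is the reading that reproduces the "man" gives "01" example when "man" is taken as the first value; the function names and the `domains` argument are illustrative only.

```python
def encode_attribute(value, domain):
    """One-hot code with s bits for a discrete attribute.

    domain: ordered list [v_1, ..., v_s] of the attribute's discrete values.
    Assumption: the j-th bit is counted from the right-hand end of the code.
    """
    j = domain.index(value)              # 0-based index of v_j
    bits = ["0"] * len(domain)
    bits[len(domain) - 1 - j] = "1"
    return "".join(bits)

def encode_record(record, domains):
    """Concatenate the attribute codes of one record into a chromosome string."""
    return "".join(encode_attribute(v, d) for v, d in zip(record, domains))

sex = ["man", "woman"]
print(encode_attribute("man", sex), encode_attribute("woman", sex))
```

Continuous attributes would need to be discretized before calling `encode_attribute`; the patent does not specify a discretization scheme, so none is shown here.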
The loop iteration of the genetic operations is shown in Fig. 2. In the t-th iteration, the fitness of the t-th generation population P(t) is evaluated first, and then it is decided whether to stop iterating. If the stopping condition is met, iteration stops and P(t) is output as the atomic rules. Otherwise, selection, crossover, and mutation are performed to generate the (t+1)-th generation population P(t+1), t is set to t+1, and the next iteration begins.
The iteration stops when either of the following two conditions holds:
1. the iteration count t > 100;
2. the fitness > 0.75.
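The control flow of this loop can be sketched as below. The text does not say whose fitness is compared with 0.75, so the sketch assumes the best individual of the current population; `evolve`, `fitness`, and `next_generation` are hypothetical names, with `next_generation` standing in for selection, crossover, and mutation.

```python
def evolve(population, fitness, next_generation, max_iter=100, threshold=0.75):
    """Iterate until t > max_iter or the best fitness exceeds threshold.

    Returns the final population P(t) and the iteration counter t.
    """
    t = 0
    while True:
        scores = [fitness(ind) for ind in population]   # evaluate P(t)
        if t > max_iter or max(scores) > threshold:     # either stopping condition
            return population, t
        population = next_generation(population, scores)  # build P(t+1)
        t += 1

# Toy usage: integers stand in for individuals, and each generation adds 1.
final, generations = evolve([0], lambda x: x / 10, lambda pop, s: [x + 1 for x in pop])
```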
Fitness is evaluated as follows. Let the data record corresponding to a genetic individual be $[c_1, c_2, \ldots, c_k]$. The fitness of this genetic individual is $\mathrm{fitness} = TT/n + TT/(TT + TF)$, where TT is the number of data records in the data block that satisfy $(a_1 = c_1) \wedge (a_2 = c_2) \wedge \cdots \wedge (a_k = c_k)$; TF is the number of data records in the data block that satisfy $(a_1 = c_1) \wedge (a_2 = c_2) \wedge \cdots \wedge (a_{k-1} = c_{k-1}) \wedge (a_k \neq c_k)$; FT is the number of data records in the data block that satisfy $\neg[(a_1 = c_1) \wedge (a_2 = c_2) \wedge \cdots \wedge (a_{k-1} = c_{k-1})] \wedge (a_k = c_k)$; and FF is the number of data records in the data block that satisfy $\neg[(a_1 = c_1) \wedge (a_2 = c_2) \wedge \cdots \wedge (a_{k-1} = c_{k-1})] \wedge (a_k \neq c_k)$.
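A minimal sketch of this fitness computation on decoded records follows. It assumes records are sequences whose last field is the class label, and it returns 0.0 when no record matches the conditions, a case the text does not address; the function name and arguments are illustrative.

```python
def fitness(individual, block, n):
    """fitness = TT/n + TT/(TT+TF) for one individual against a data block.

    individual: decoded record [c_1, ..., c_k]; n: population size.
    """
    cond, label = tuple(individual[:-1]), individual[-1]
    # TT: conditions match and class label matches
    tt = sum(1 for r in block if tuple(r[:-1]) == cond and r[-1] == label)
    # TF: conditions match but class label differs
    tf = sum(1 for r in block if tuple(r[:-1]) == cond and r[-1] != label)
    if tt + tf == 0:
        return 0.0  # assumption: an individual that covers no record scores zero
    return tt / n + tt / (tt + tf)

block = [("a", "x", "yes"), ("a", "x", "yes"), ("a", "x", "no"), ("b", "x", "yes")]
score = fitness(("a", "x", "yes"), block, n=len(block))
```

In this example TT = 2 and TF = 1, so the score is 2/4 + 2/3.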
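The selection operation is as follows: the selection probability of genetic individual $X_i$ is $p_i$ (the formula for $p_i$ is not reproduced in this text); $p_i \times n$ copies of $X_i$ are made as individuals of the population for the next genetic operation, where $\mathrm{fitness}(X_i)$ is the fitness value of $X_i$.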
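Since the formula for the selection probability is missing from this text, the sketch below assumes ordinary fitness-proportional selection, $p_i = \mathrm{fitness}(X_i) / \sum_j \mathrm{fitness}(X_j)$, and rounds $p_i \times n$ to the nearest integer number of copies. Both choices are assumptions and may differ from the patent's original formula.

```python
def select(population, scores):
    """Copy each individual round(p_i * n) times, with p_i proportional to fitness.

    Assumption: p_i = score_i / sum(scores); the source formula is not available.
    """
    n = len(population)
    total = sum(scores)
    if total == 0:
        return list(population)  # no selection pressure when all scores are zero
    next_population = []
    for individual, score in zip(population, scores):
        p_i = score / total
        next_population.extend([individual] * round(p_i * n))
    return next_population

print(select(["A", "B", "C", "D"], [2, 1, 1, 0]))
```

Because of rounding, the returned population is not guaranteed to contain exactly n individuals for arbitrary scores.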
The crossover operation is as follows: according to the crossover probability $p_c$, two genetic individuals $X_i$ and $X_j$ are selected at random, a gene position $w$ on the chromosome is selected at random, and the gene segments of $X_i$ and $X_j$ after gene position $w$ are exchanged, forming two new individuals that serve as individuals of the population for the next genetic operation. The crossover probability $p_c$ is a value in [0.4, 0.9]; an adaptive crossover probability may also be used.
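A sketch of this single-point crossover on bit-string chromosomes is shown below. The default `p_c=0.6` is just one value from the stated range, and the function names and the `rng` parameter are illustrative; chromosomes are assumed to have equal length of at least two bits.

```python
import random

def crossover(x_i, x_j, rng=random):
    """Swap the gene segments of two equal-length chromosomes after position w."""
    w = rng.randrange(1, len(x_i))  # 1 <= w < length, so both segments are non-empty
    return x_i[:w] + x_j[w:], x_j[:w] + x_i[w:]

def maybe_crossover(x_i, x_j, p_c=0.6, rng=random):
    """Apply crossover with probability p_c; otherwise return the parents unchanged."""
    if rng.random() < p_c:
        return crossover(x_i, x_j, rng)
    return x_i, x_j

child_a, child_b = crossover("0000", "1111", random.Random(0))
```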
The mutation operation is as follows: according to the mutation probability $p_m$, a genetic individual $X_i$ is selected at random, a gene position on $X_i$ is selected at random, and the binary code at that gene position is inverted. The mutation probability $p_m$ is a value in [0.01, 0.1]; an adaptive mutation probability may also be used.
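The bit-flip mutation can be sketched as follows. The sketch assumes that $p_m$ is applied independently to each individual and that exactly one bit is flipped per selected individual; the names and the default `p_m=0.05` (one value from the stated range) are illustrative.

```python
import random

def mutate(x, rng=random):
    """Invert the bit at one randomly chosen gene position of chromosome x."""
    pos = rng.randrange(len(x))
    flipped = "1" if x[pos] == "0" else "0"
    return x[:pos] + flipped + x[pos + 1:]

def mutate_population(population, p_m=0.05, rng=random):
    """Mutate each individual with probability p_m; leave the others unchanged."""
    return [mutate(x, rng) if rng.random() < p_m else x for x in population]

mutant = mutate("0000", random.Random(1))
```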
The control center periodically reads the <key, value> key-value pairs in the buffer, generates a list of <key, value list> key-value pairs, and stores it in the buffer. In the list of <key, value list> key-value pairs, key is a class label and value list is the list of atomic rules under that class label;
After the processing unit has finished processing all data records in its data block, it sends the control center a message that its operation has finished.
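The conversion from <key, value> pairs to <key, value list> entries amounts to grouping rules by class label, as in this minimal sketch. The in-memory list of pairs stands in for the buffer, and the function name is illustrative.

```python
from collections import defaultdict

def group_by_label(pairs):
    """Turn (class_label, atomic_rule) pairs into {class_label: [atomic_rule, ...]}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)

buffer_pairs = [("yes", "rule1"), ("no", "rule2"), ("yes", "rule3")]
grouped = group_by_label(buffer_pairs)
```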
Step 3 is carried out as follows:
The control center searches for idle slave servers to act as classifiers. The number of classifiers equals the number of key values in the list of <key, value list> key-value pairs, and each classifier corresponds to one key value. The control center sends each classifier the atomic rule list from the corresponding <key, value list> key-value pair together with the records in the test sample that carry the same class label, and sends the classifier the operation instructions for redundancy reduction and classification accuracy measurement;
The classifier records identical atomic rules under the same class label only once and deletes the redundant atomic rules, yielding the reduced atomic rules;
The classifier uses the reduced atomic rules to classify the test sample and checks whether each classification result agrees with the class label of the test record. Suppose that, among the test records classified by atomic rule a, Y records have a class label that agrees with the classification result and N records have a class label that does not; the classification accuracy of atomic rule a is then Y/(Y+N). If the classification accuracy requirement in the user's classification mining task is α, all atomic rules whose classification accuracy is not less than α are sent to the control center as the final result of classification rule mining;
The control center aggregates the final results produced by all classifiers and returns the aggregated result to the requester of the classification rule mining task.
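The classifier's work in step 3 can be sketched as three small functions: deduplicate the rules, compute Y/(Y+N) over the test records a rule covers, and keep the rules that reach α. Rules and records are assumed to be tuples whose last field is the class label, and a rule that covers no test record is treated as rejected; both are assumptions, as are the function names.

```python
def reduce_rules(rules):
    """Redundancy reduction: keep one copy of each identical rule, preserving order."""
    return list(dict.fromkeys(rules))

def accuracy(rule, test_records):
    """Y / (Y + N) over the test records whose condition attributes match the rule."""
    cond, label = tuple(rule[:-1]), rule[-1]
    covered = [r for r in test_records if tuple(r[:-1]) == cond]
    if not covered:
        return None  # assumption: accuracy is undefined when nothing is covered
    y = sum(1 for r in covered if r[-1] == label)
    return y / len(covered)

def final_rules(rules, test_records, alpha):
    """Return the reduced rules whose classification accuracy is at least alpha."""
    result = []
    for rule in reduce_rules(rules):
        acc = accuracy(rule, test_records)
        if acc is not None and acc >= alpha:
            result.append(rule)
    return result

rules = [("a", "x", "yes"), ("a", "x", "yes"), ("b", "x", "no")]
test_records = [("a", "x", "yes"), ("a", "x", "no"), ("a", "x", "yes"), ("b", "x", "no")]
kept = final_rules(rules, test_records, alpha=0.7)
```

Here the rule for ("a", "x") covers three test records with accuracy 2/3 and is dropped at α = 0.7, while the rule for ("b", "x") has accuracy 1 and is kept.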
Claims (4)
1. A classification rule mining method under a cloud computing environment, characterized in that:
Said cloud computing environment consists of a plurality of distributed servers; when classification rule mining is carried out in said cloud computing environment, a master-slave organization structure is adopted, in which one server is set as a control center and the other servers are slave servers; said control center arranges the execution of the whole mining task, schedules and manages it, and coordinates the operation of each slave server; each said slave server is a concrete execution unit of the task; and said classification rule mining method proceeds as follows:
a. the control center divides the data set to be classified into a training sample and a test sample; said training sample is divided evenly into data blocks of equal size; for each data block, a slave server that will carry out the classification mining task is designated as a processing unit, and each said data block is distributed to its corresponding processing unit;
b. said processing unit uses a genetic algorithm to train, on the data block distributed to it, atomic rules for classification, and stores the atomic rules obtained by training in a buffer;
c. said control center partitions the atomic rules in the buffer and selects idle slave servers as classifiers; the classifiers perform redundancy reduction on the atomic rules and measure the classification accuracy of the reduction results; and the reduction results that meet the classification accuracy requirement are selected as the final result of classification rule mining.
2. The classification rule mining method under a cloud computing environment according to claim 1, characterized in that said step a is implemented as follows:
After receiving a request for a classification mining task, the control center divides the data set to be classified into a training sample and a test sample; based on the requirements the user places on the classification mining task and on the characteristics of the training sample, it searches for suitable slave servers to act as processing units and divides the training sample evenly into data blocks of equal size; if the number of processing units that satisfy the conditions is N and the size of the training sample is M, the size of each data block is M/N;
Said suitable slave server satisfies the following conditions: its storage space is not less than M/N, and its response time is not greater than the latest time required by the user;
The control center accesses said training sample, copies each data block obtained by the division to the corresponding processing unit, and sends the processing unit the operation instruction to train atomic rules with the genetic algorithm.
3. The classification rule mining method under a cloud computing environment according to claim 1, characterized in that the training process of the atomic rules in said step b is:
The processing unit genetically encodes each data record of its data block and generates atomic rules through loop iteration of genetic operations; said atomic rules are stored in the buffer as <key, value> key-value pairs, in which key is a class label and value is an atomic rule under that class label;
The control center periodically reads the <key, value> key-value pairs in the buffer, generates a list of <key, value list> key-value pairs, and stores it in the buffer; in said list of <key, value list> key-value pairs, key is a class label and value list is the list of atomic rules under that class label;
After the processing unit has finished processing all data records in its data block, it sends the control center a message that its operation has finished.
4. The classification rule mining method under a cloud computing environment according to claim 1, characterized in that said step c is carried out as follows:
The control center searches for idle slave servers to act as classifiers; the number of classifiers equals the number of key values in the list of <key, value list> key-value pairs, and each classifier corresponds to one key value; the control center sends each classifier the atomic rule list from the corresponding <key, value list> key-value pair together with the records in the test sample that carry the same class label, and sends the classifier the operation instructions for redundancy reduction and classification accuracy measurement;
The classifier records identical atomic rules under the same class label only once and deletes the redundant atomic rules, yielding the reduced atomic rules;
The classifier uses the reduced atomic rules to classify the test sample and checks whether each classification result agrees with the class label of the test record; suppose that, among the test records classified by atomic rule a, Y records have a class label that agrees with the classification result and N records have a class label that does not; the classification accuracy of atomic rule a is then Y/(Y+N); suppose the classification accuracy requirement in the user's classification mining task is α; all atomic rules whose classification accuracy is not less than α are sent to the control center as the final result of classification rule mining;
The control center aggregates the final results produced by all classifiers and returns the aggregated result to the requester of the classification rule mining task.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210203816.5A CN102737126B (en) | 2012-06-19 | 2012-06-19 | Classification rule mining method under cloud computing environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102737126A (en) | 2012-10-17 |
CN102737126B (en) | 2014-03-12 |
Family
ID=46992627
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210203816.5A Expired - Fee Related CN102737126B (en) | 2012-06-19 | 2012-06-19 | Classification rule mining method under cloud computing environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102737126B (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104113516A (en) * | 2013-04-19 | 2014-10-22 | 中国移动通信集团设计院有限公司 | Method and terminal for recognizing rule conflicts of firewalls |
CN104468239A (en) * | 2014-12-22 | 2015-03-25 | 上海大唐移动通信设备有限公司 | Data processing method and device based on rules |
WO2015196476A1 (en) * | 2014-06-27 | 2015-12-30 | 华为技术有限公司 | Data processing method and computer system |
WO2016078008A1 (en) * | 2014-11-19 | 2016-05-26 | 华为技术有限公司 | Method and apparatus for scheduling data flow task |
CN105912674A (en) * | 2016-04-13 | 2016-08-31 | 精硕世纪科技(北京)有限公司 | Method, device and system for noise reduction and classification of data |
CN105975331A (en) * | 2016-04-26 | 2016-09-28 | 浪潮(北京)电子信息产业有限公司 | Data parallel processing method and apparatus |
CN106372185A (en) * | 2016-08-31 | 2017-02-01 | 广东京奥信息科技有限公司 | Data preprocessing method for heterogeneous data sources |
WO2017148314A1 (en) * | 2016-03-04 | 2017-09-08 | 阿里巴巴集团控股有限公司 | Method of training machine learning system, and training system |
CN107169513A (en) * | 2017-05-05 | 2017-09-15 | 第四范式(北京)技术有限公司 | Control data uses the distributed machines learning system and its method of order |
CN107590263A (en) * | 2017-09-22 | 2018-01-16 | 辽宁工程技术大学 | A kind of distributed big data sorting technique based on multi-variable decision tree-model |
CN108763952A (en) * | 2018-05-03 | 2018-11-06 | 阿里巴巴集团控股有限公司 | A kind of data classification method, device and electronic equipment |
CN110109753A (en) * | 2019-04-25 | 2019-08-09 | 成都信息工程大学 | Resource regulating method and system based on various dimensions constraint genetic algorithm |
US10430429B2 (en) | 2015-09-01 | 2019-10-01 | Cognizant Technology Solutions U.S. Corporation | Data mining management server |
CN111814882A (en) * | 2020-07-10 | 2020-10-23 | 辽东学院 | Data classification system based on computer big data |
US11762376B2 (en) | 2019-12-03 | 2023-09-19 | Industrial Technology Research Institute | Quick dispatching rule screening method and apparatus |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1614630A (en) * | 2004-11-29 | 2005-05-11 | 南京大学 | Rapid study classifying method |
CN101556650A (en) * | 2009-04-01 | 2009-10-14 | 东北大学 | Distributed self-adapting pulmonary nodule computer detection method and system thereof |
CN101939727A (en) * | 2007-11-08 | 2011-01-05 | 遗传学金融(巴巴多斯)有限公司 | Distributed network for performing complex algorithms |
CN102143022A (en) * | 2011-03-16 | 2011-08-03 | 北京邮电大学 | Cloud measurement device and method for IP network |
US20110191322A1 (en) * | 2009-09-09 | 2011-08-04 | Tapicu, Inc. | Stochastic optimization techniques of evolutionary computation search strategies for an information sharing system |
-
2012
- 2012-06-19 CN CN201210203816.5A patent/CN102737126B/en not_active Expired - Fee Related
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104113516A (en) * | 2013-04-19 | 2014-10-22 | 中国移动通信集团设计院有限公司 | Method and terminal for recognizing rule conflicts of firewalls |
US9984336B2 (en) | 2014-06-27 | 2018-05-29 | Huawei Technologies Co., Ltd. | Classification rule sets creation and application to decision making |
WO2015196476A1 (en) * | 2014-06-27 | 2015-12-30 | 华为技术有限公司 | Data processing method and computer system |
WO2016078008A1 (en) * | 2014-11-19 | 2016-05-26 | 华为技术有限公司 | Method and apparatus for scheduling data flow task |
CN105900064A (en) * | 2014-11-19 | 2016-08-24 | 华为技术有限公司 | Method and apparatus for scheduling data flow task |
US10558498B2 (en) | 2014-11-19 | 2020-02-11 | Huawei Technologies Co., Ltd. | Method for scheduling data flow task and apparatus |
CN105900064B (en) * | 2014-11-19 | 2019-05-03 | 华为技术有限公司 | Method and apparatus for scheduling data flow task |
CN104468239A (en) * | 2014-12-22 | 2015-03-25 | 上海大唐移动通信设备有限公司 | Data processing method and device based on rules |
CN104468239B (en) * | 2014-12-22 | 2018-08-21 | 上海大唐移动通信设备有限公司 | Rule-based data processing method and device |
US10430429B2 (en) | 2015-09-01 | 2019-10-01 | Cognizant Technology Solutions U.S. Corporation | Data mining management server |
US11151147B1 (en) | 2015-09-01 | 2021-10-19 | Cognizant Technology Solutions U.S. Corporation | Data mining management server |
US11257005B2 (en) | 2016-03-04 | 2022-02-22 | Alibaba Group Holding Limited | Training method and training system for machine learning system |
WO2017148314A1 (en) * | 2016-03-04 | 2017-09-08 | 阿里巴巴集团控股有限公司 | Method of training machine learning system, and training system |
TWI740896B (en) * | 2016-03-04 | 2021-10-01 | 香港商阿里巴巴集團服務有限公司 | Training method and training system of machine learning system |
CN105912674A (en) * | 2016-04-13 | 2016-08-31 | 精硕世纪科技(北京)有限公司 | Method, device and system for noise reduction and classification of data |
CN105975331A (en) * | 2016-04-26 | 2016-09-28 | 浪潮(北京)电子信息产业有限公司 | Data parallel processing method and apparatus |
CN106372185A (en) * | 2016-08-31 | 2017-02-01 | 广东京奥信息科技有限公司 | Data preprocessing method for heterogeneous data sources |
CN107169513A (en) * | 2017-05-05 | 2017-09-15 | 第四范式(北京)技术有限公司 | Distributed machine learning system for controlling data usage order and method thereof |
CN107590263B (en) * | 2017-09-22 | 2020-07-07 | 辽宁工程技术大学 | Distributed big data classification method based on multivariate decision tree model |
CN107590263A (en) * | 2017-09-22 | 2018-01-16 | 辽宁工程技术大学 | Distributed big data classification method based on multivariate decision tree model |
CN108763952A (en) * | 2018-05-03 | 2018-11-06 | 阿里巴巴集团控股有限公司 | Data classification method, device and electronic equipment |
CN110109753A (en) * | 2019-04-25 | 2019-08-09 | 成都信息工程大学 | Resource scheduling method and system based on multi-dimensional constraint genetic algorithm |
US11762376B2 (en) | 2019-12-03 | 2023-09-19 | Industrial Technology Research Institute | Quick dispatching rule screening method and apparatus |
CN111814882A (en) * | 2020-07-10 | 2020-10-23 | 辽东学院 | Data classification system based on computer big data |
CN111814882B (en) * | 2020-07-10 | 2021-06-22 | 辽东学院 | Data classification system based on computer big data |
Also Published As
Publication number | Publication date |
---|---|
CN102737126B (en) | 2014-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102737126B (en) | Classification rule mining method under cloud computing environment | |
Bifet et al. | Extremely fast decision tree mining for evolving data streams | |
Panda et al. | Planet: massively parallel learning of tree ensembles with mapreduce | |
Luo et al. | A parallel dbscan algorithm based on spark | |
Jia et al. | A grid and density-based clustering algorithm for processing data stream | |
Liu et al. | SP-Partitioner: A novel partition method to handle intermediate data skew in spark streaming | |
CN104809244B (en) | Data mining method and device in a big data environment | |
CN104834557B (en) | Data analysis method based on Hadoop | |
Lin et al. | A K-means clustering with optimized initial center based on Hadoop platform | |
Elagib et al. | Big data analysis solutions using MapReduce framework | |
Han et al. | Efficient mining of high-speed uncertain data streams | |
Jiang et al. | SKCompress: compressing sparse and nonuniform gradient in distributed machine learning | |
Xu | Research and implementation of improved random forest algorithm based on Spark | |
Han et al. | SlimML: Removing non-critical input data in large-scale iterative machine learning | |
Wu et al. | Mining of high-utility patterns in big IoT-based databases | |
Mao et al. | A MapReduce-based K-means clustering algorithm | |
Wang et al. | Association rules mining in parallel conditional tree based on grid computing inspired partition algorithm | |
Sun et al. | Key nodes discovery in large-scale logistics network based on MapReduce | |
Xu et al. | Applying an improved elephant herding optimization algorithm with spark-based parallelization to feature selection for intrusion detection | |
Huang et al. | Optimization of K-means Algorithm Base on MapReduce | |
Zhu et al. | Efficient Gaussian Kernel Microcluster Real-Time Clustering Method for Industrial Internet of Things (IIoT) Streams | |
CN113157915A (en) | Naive Bayes text classification method based on cluster environment | |
Zeng | A clustering method of combining grid and genetic algorithm in wireless sensor networks | |
Hao et al. | An association-oriented partitioning approach for streaming graph query | |
Zhang et al. | Emotion-based social computing platform for streaming big-data: Architecture and application |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20140312; Termination date: 20210619 |