CN101782976B - Automatic selection method for machine learning in cloud computing environment - Google Patents

Automatic selection method for machine learning in cloud computing environment

Info

Publication number
CN101782976B
Authority
CN
China
Prior art keywords
cloud
machine learning
algorithm
data
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201010017918
Other languages
Chinese (zh)
Other versions
CN101782976A (en)
Inventor
王汝传
孔强
任勋益
付雄
邓松
易侃
杨明慧
蒋凌云
邓勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN 201010017918
Publication of CN101782976A
Application granted
Publication of CN101782976B
Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to an automatic selection method for machine learning in a cloud computing environment. By using a cloud computing platform, a user can automatically and intelligently build a machine learning mathematical model that fits the actual problem. The user does not need to set up a machine learning runtime environment, select a machine learning algorithm, or tune complicated machine learning functions and their parameters; the user only uploads sample data through the Web. The method frees the use of machine learning from environmental constraints and exploits the advantages of the cloud computing platform, so that model building is transparent to the user and the threshold for using machine learning is lowered as far as possible. The method addresses the drawbacks met when machine learning is applied in practice, such as the unpredictability of model selection, the reliance on manual experience for parameter tuning, and the difficulty ordinary users have in learning the techniques.

Description

An automatic selection method for machine learning in a cloud computing environment
Technical field
The present invention is an automatic selection method for machine learning based on a cloud computing environment. By using a cloud computing platform, the user does not need to build a machine learning runtime environment, learn machine learning algorithms, or tune numerous complicated machine learning functions and their parameters. The user only needs to upload training data and prediction test data through the Web on the cloud computing platform and specify a small amount of information, such as the usable range and the expectation domain, in order to obtain the required machine learning models together with their detailed descriptions and so solve practical problems.
Background technology
Machine learning is another important area of artificial intelligence application after expert systems, and it is also one of the core research topics of artificial intelligence. Its purpose is to enable computers to simulate or realize human learning behavior, so as to acquire knowledge or skills and to keep improving performance as new information arrives. The ability to learn is a very important feature. H. A. Simon holds that learning is an adaptive change made by a system, so that the system performs the same or a similar task better the next time. R. S. Michalski holds that learning is the construction or modification of representations of what is experienced. People engaged in expert system development hold that learning is the acquisition of knowledge. These views have different emphases: the first stresses the external behavioral effect of learning, the second stresses the internal process of learning, and the third mainly reflects the practical standpoint of knowledge engineering.
Research on machine learning draws on the understanding of human learning mechanisms from physiology, psychology and cognitive science. It builds computational or cognitive models of the human learning process, develops various learning theories and learning methods, and establishes task-oriented learning systems for specific applications. These research goals influence and promote one another. Since the first machine learning workshop was held at Carnegie Mellon University in 1980, machine learning has developed very quickly and has become one of the central topics.
The history of machine learning can be divided into four stages: (1) the enthusiastic period from the mid-1950s to the mid-1960s; (2) the calm period from the mid-1960s to the mid-1970s; (3) the revival period from the mid-1970s to the mid-1980s; and (4) the latest stage, which began in 1986. The outstanding features of the present period are that machine learning has developed into an emerging interdisciplinary subject, that it integrates a variety of learning methods, that its range of application keeps growing, and that related academic activity is very lively.
At its present stage machine learning is very widely used and has produced many excellent algorithms. These can basically be grouped into symbol-based learning and non-symbolic learning, that is, connectionist learning. Symbol-based learning generally includes rote learning, learning by instruction, learning from examples, matching test learning and explanation-based learning.
Common algorithms include the decision tree algorithm, the genetic algorithm, the Bayesian statistics algorithm, the artificial neural network algorithm, the support vector machine algorithm and the association rule algorithm. The method designed here carries modeling modules for these common algorithms and uses the EM algorithm to perform maximum likelihood estimation of parameters.
However, using machine learning techniques to handle specific tasks faces three main problems. (1) Building a machine learning model for a particular task is time-consuming and laborious. Because the details of each task differ, it is difficult to reuse system models that others have already built, and the choice has to rely on personal experience. (2) Even when a machine learning algorithm that fits the nature of a given task has been selected correctly, setting its complicated parameters remains a problem. The parameters must be obtained from experience or from long computation on the user's own machine, and the computing power of a single user can hardly solve the problem quickly. (3) The user needs to learn and use specific machine learning software. Machine learning algorithms are numerous and complicated, self-study takes a great deal of time, and the particular algorithms a user has learned are not necessarily suited to every task the user needs to solve.
The newly emerged cloud computing technology can solve the above problems, making it convenient to apply machine learning in practice and to create value faster and better.
Cloud computing is a new computing model proposed on the basis of developments such as distributed systems and grid computing. It is an emerging way of sharing infrastructure that faces ultra-large-scale distributed environments, and its core is the provision of data storage and network services. It refers to a mode of delivering and using services in which the required service is obtained over the network on demand and in an easily scalable way. The service may be related to IT, software or the Internet, or it may be any other service. Cloud computing provides highly reliable and secure data storage centers, so users no longer need to worry about problems such as data loss or virus intrusion, and it keeps the requirements on user-side equipment to a minimum. The "cloud" in cloud computing is a set of virtual computing resources that can maintain and manage themselves, usually large server clusters that include computing servers, storage servers and bandwidth resources. Cloud computing gathers all computing resources together by providing various clouds and manages them automatically through software, without human intervention. Application providers therefore need not worry about tedious details and can concentrate on their own business, which favors innovation and reduces cost. Applied to machine learning, cloud computing can set up machine learning models and related modules for the user to select, so that the user can quickly benefit from machine learning techniques to solve problems.
Existing cloud computing platforms are basically built on theoretical computing and storage services, and no cloud computing method designed specifically for machine learning has been found. The present invention combines the advantages and characteristics of machine learning and cloud computing to provide a feasible implementation.
Summary of the invention
Technical problem: the purpose of the invention is to provide an automatic selection method for machine learning in a cloud computing environment. By using a cloud computing platform, it solves the inconvenience of machine learning modeling and offers a convenient way of combining cloud computing and machine learning techniques to handle real problems. The user thus need not worry about tedious details and can concentrate on his or her own business, which favors innovation and reduces cost.
Technical solution: the invention frees the use of machine learning from environmental constraints, gives full play to the efficient computing power and transparency of the cloud computing platform, and lowers the threshold for using machine learning as far as possible. The user no longer has to search among many machine learning methods for a suitable one through repeated trials. The invention thereby addresses the shortcomings met when machine learning is applied in practice, such as the unpredictability of model selection, the reliance on manual experience for parameter tuning, and the difficulty ordinary users have in learning the techniques.
The object of the invention is a method for building a cloud computing platform that provides machine learning services. On the cloud computing platform, the system is built from three parts. The first part is the set of machine learning clouds, each made up of a large number of computers organized as a cloud: the decision tree algorithm cloud, genetic algorithm cloud, Bayesian statistics algorithm cloud, artificial neural network algorithm cloud, support vector machine algorithm cloud and association rule algorithm cloud, so that the platform carries the common machine learning algorithms by default. The second part, likewise made up of computer clusters, comprises the initial modeling cloud, search-space estimation cloud, method discovery cloud, EM algorithm support cloud, evaluation function cloud, computing cloud and machine learning algorithm extension cloud. These embody the advantage of the cloud by using large computing resources to compute suitable machine learning parameters that an ordinary user could not compute, or could compute only after a long time. The third part consists of the modules needed for interaction between the platform and the user and for supporting operation of the platform: the Web interactive interface, the machine learning input/output module and the cloud management module.
Step 1) Under the unified scheduling of the cloud management module, first obtain through the Web interactive interface a rough description of the problem the user needs to solve, including the problem type. The user selects the broad category to which the problem belongs from: expert systems, cognitive simulation, planning and problem solving, data mining, network information services, pattern recognition, fault diagnosis, natural language understanding, robotics and games, and other.
Step 2) Enable the initial modeling cloud. According to the broad category given by the user in step 1, enter the corresponding subclass interface and fill in more detailed information, including uploading samples, selecting the representation method, and determining the result analysis method, the usable range and the expectation domain.
Step 3) Start the method discovery cloud, which compares the information provided by the user with typical historical examples and determines which machine learning algorithm or algorithms should be used. This cloud module keeps running alongside the subsequent steps and adjusts continually according to the results computed at each stage.
Step 4) Pass the information entered by the user in step 2 to the machine learning input/output module. The data must be unified and digitized, and then undergo in turn missing value handling, noise handling, data cleaning, data integration, data transformation and data reduction, in order to obtain an intermediate result that the general algorithms can use.
Step 5) Start the evaluation function cloud and build an evaluation function from the information the user entered in step 2. This prepares for judging the quality of the machine learning solutions and allows the performance of a specific algorithm to be predicted.
Step 6) At the same time call the EM algorithm support cloud to perform maximum likelihood estimation over the solution space and calculate the approximate position of the optimal or a better solution in that space, which increases search efficiency.
Step 7) Reaching this step means that the preparatory work is finished and the training process of machine learning is about to begin. Based on the automatic decisions of the preceding steps, call one or several specific machine learning cloud modules to carry out learning, chosen from the decision tree algorithm cloud, genetic algorithm cloud, Bayesian statistics algorithm cloud, artificial neural network algorithm cloud, support vector machine algorithm cloud and association rule algorithm cloud. If the user has defined a custom machine learning algorithm extension cloud, call the extension cloud first.
Step 8) Start the one or several algorithm clouds selected by the preceding steps, and at the same time feed information back to the user through the Web interactive interface, including the calculation steps being run, the intermediate results obtained and changes in the current best solution.
Step 9) During the repeated iterations of the EM algorithm support cloud, the process keeps returning to steps 6 and 7 for calculation and checks at the same time whether a termination condition has been reached. If so, it jumps to step 10; otherwise the performance prediction set up in step 5 is used to judge how good the solution is. This step needs large computing resources and therefore relies on the computing advantage of cloud computing to find as good a solution as possible.
Step 10) When a termination condition is satisfied, for example the computation time has been reached, no better solution is found, or the algorithm's own iteration has finished, the machine learning input/output module converts the calculation result into readable information, which is returned to the client through the Web interactive interface together with detailed data for download. The machine learning result is saved at the same time so that it can be reused and repeated calculation avoided.
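The iterate-and-terminate logic of steps 7 to 10 can be sketched as follows. This is a minimal illustration and not the patented implementation: `run_selection`, its parameters, and the representation of each algorithm cloud as a plain callable are all assumptions made for the example.

```python
import time

def run_selection(candidates, evaluate, max_seconds=5.0, max_rounds=50, patience=5):
    """Train candidate learners repeatedly until a termination condition holds.

    candidates: dict mapping a (hypothetical) algorithm cloud name to a
                zero-argument callable that runs one training round.
    evaluate:   callable(model) -> float, higher is better (role of step 5).
    """
    start = time.monotonic()
    best_name, best_model, best_score = None, None, float("-inf")
    stale_rounds = 0

    for _ in range(max_rounds):                       # the iteration itself ends
        if time.monotonic() - start > max_seconds:    # computation time reached
            break
        improved = False
        for name, train_once in candidates.items():   # step 7: call each selected cloud
            model = train_once()
            score = evaluate(model)                    # step 9: judge the solution
            if score > best_score:
                best_name, best_model, best_score = name, model, score
                improved = True
        stale_rounds = 0 if improved else stale_rounds + 1
        if stale_rounds >= patience:                   # no better solution found
            break

    return best_name, best_model, best_score           # step 10: hand back the result
```

The three exits of the loop correspond to the three termination conditions named in step 10; how a real platform would schedule the clouds in parallel is outside the scope of this sketch.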
1. Architecture
The whole scheme comprises the decision tree algorithm cloud, genetic algorithm cloud, Bayesian statistics algorithm cloud, artificial neural network algorithm cloud, support vector machine algorithm cloud, association rule algorithm cloud, initial modeling cloud, search-space estimation cloud, method discovery cloud, EM algorithm support cloud, evaluation function cloud, computing cloud and machine learning algorithm extension cloud, together with the Web interactive interface module, the machine learning input/output module and the cloud management module. Their relationships are shown in Figure 2.
The individual modules are explained below:
Decision tree algorithm cloud: the main function of this cloud is to provide decision tree modeling and prediction services. A decision tree is a predictive modeling method for classification, clustering and prediction. It follows a divide-and-conquer approach: the search space is divided into several subsets, from which the decision tree is built. The decision tree is one of the most widely used inductive algorithms at present; it approximates discrete-valued functions, is example-based, and is commonly used as a classifier. The basic decision tree algorithm is a greedy algorithm, and the algorithms in large-scale use today are improvements and functional extensions of it. This algorithm cloud likewise includes the following common improved decision tree algorithms: C4.5, CART, SLIQ, SPRINT and others.
The C4.5 algorithm adds handling of continuous attributes and missing attribute values and also improves the pruning of the decision tree. The CART algorithm uses binary recursive partitioning, splitting the samples into two subsets so that every non-leaf node of the decision tree has two branches, which yields a structurally simple binary decision tree. The SLIQ algorithm mainly uses a breadth-first strategy to build the decision tree. The SPRINT algorithm removes the memory size restriction well, can handle very large training sets to which other algorithms cannot be applied, and builds deep decision trees effectively.
The decision tree algorithm cloud on the one hand provides basic decision tree modeling, training and prediction services, and on the other hand intelligently selects a specific improved decision tree algorithm for processing according to the characteristics of the input data.
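The greedy character of the basic decision tree algorithm can be illustrated with the information-gain criterion used by ID3-style trees (C4.5 refines it into a gain ratio). The sketch below only chooses the attribute for a single split; the function names and the row format (one dict per sample) are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the samples on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attributes):
    """Greedy choice: the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))
```

A full tree builder would apply `best_attribute` recursively to each resulting subset, which is the divide-and-conquer step described above.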
Genetic algorithm cloud: the main function of this cloud is to provide genetic algorithm modeling and prediction services. A genetic algorithm is an algorithm designed by simulating the evolutionary process of "survival of the fittest" in nature. Bagley and Rosenberg first proposed the concept of the genetic algorithm in their doctoral dissertations in 1967. The monograph published by Holland in 1975 laid the theoretical foundation of genetic algorithms. Today genetic algorithms not only have a clear algorithmic description but have also produced some quantitative analysis results, and they are widely applied in many fields. This algorithm cloud can be used when the problem the user describes resembles production task planning, communications, the TSP, the knapsack problem, image processing or signal processing.
The main purpose of the genetic algorithm cloud is to apply the idea of genetic algorithms and reach a final solution by gradually simulating evolution toward the user's target. Starting from initial solutions, it iterates, continually producing new solutions from old ones according to certain rules and expecting the new solutions to be better than the old. The higher the value the evaluation function assigns to a new solution, the greater its chance of being retained. A genetic algorithm represents the problem only through an encoding and relies on the values given by the evaluation function; it does not require an explicit analytical expression and can therefore be applied to highly nonlinear optimization problems. It also combines easily with other algorithms, drawing on their strengths to obtain better results.
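A minimal sketch of this iteration on bit-string encodings is given below. The selection scheme (keep the fitter half), one-point crossover and the 10% mutation rate are illustrative assumptions, not choices stated in the patent; `genetic_search` is a hypothetical name.

```python
import random

def genetic_search(fitness, n_bits=12, pop_size=20, generations=40, seed=0):
    """Evolve bit strings toward higher fitness; returns (best, score, initial best score)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(map(fitness, population))
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]       # fitter solutions are retained
        children = []
        while len(survivors) + len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_bits)            # one-point crossover
            child = mother[:cut] + father[cut:]
            if rng.random() < 0.1:                    # occasional single-bit mutation
                child[rng.randrange(n_bits)] ^= 1
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    return best, fitness(best), initial_best
```

Because the fitter half always survives unchanged, the best score can never decrease from one generation to the next; only the fitness function is needed, not an analytical expression of the problem.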
Bayesian statistics algorithm cloud: the main function of this cloud is to provide Bayesian modeling and prediction services. A Bayesian algorithm is a statistical algorithm that can predict the likelihood of class membership, for example the probability that a given sample belongs to a certain class. Bayesian algorithms are based mainly on Bayes' theorem and classify by calculating the probability that a sample belongs to a particular class. Compared with other methods, the Bayesian approach can combine sample information with prior probabilities and is particularly suitable when samples are hard to obtain. Because prior probabilities must be computed, computation time rises markedly as the number of samples grows, so the approach is better suited to small-scale machine learning. This algorithm cloud can be used when the user's problem description resembles areas such as information recovery and diagnosis, economics, automatic classification or production quality control.
The main purpose of the Bayesian statistics algorithm cloud is to use Bayesian statistical methods to calculate the probability that an object belongs to each class; the class with the highest probability is taken as the class of the object. The objects it handles may be discrete, continuous or mixed. Common Bayesian methods are the naive Bayes method and the Bayesian network method. This algorithm cloud includes the common Bayesian model-building methods and extensions that are updated continually.
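As an illustration of "the class with the highest probability wins", here is a small naive Bayes classifier for categorical features. The add-one (Laplace) smoothing and the function names are assumptions of this sketch; the patent does not specify them.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)   # (class, feature index) -> value counts
    values = defaultdict(set)               # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def predict(model, row):
    """Return the class with the largest prior times smoothed likelihood."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        p = count / total                                    # prior probability
        for i, value in enumerate(row):
            p *= (feature_counts[(label, i)][value] + 1) / (count + len(values[i]))
        scores[label] = p
    return max(scores, key=scores.get)
```

The sketch multiplies raw probabilities for readability; with many features, summing logarithms is the usual way to avoid underflow.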
Artificial neural network algorithm cloud: the main function of this cloud is to provide artificial neural network modeling and prediction services. An artificial neural network uses computer technology to simulate the intelligent activity of the human brain, modeling the structure and information conduction of biological neural networks and expressing them in mathematical form. The artificial neural network is a basic technology of current intelligent science; its connectionist mechanism and the symbolic reasoning mechanism of artificial intelligence form the two major camps of the field. An artificial neural network simulates the anatomical and physiological features of the human brain with many simple neurons working in parallel. Connected in a certain topology, the neurons accept external information and stimulate one another, achieving distributed storage, associative memory, feedback refinement, black-box mapping, weight balancing, dynamic approximation, holographic storage and fault tolerance. Because the simulated neurons are interconnected, once their number reaches a certain level the network can develop powerful self-learning, self-adaptation, self-organization, self-diagnosis and self-repair abilities. Through continual feedback between nodes it can simulate the logical reasoning of the human brain to some extent, so it has a wide range of applications. This algorithm cloud can be given priority particularly in pattern recognition, function approximation and loan risk evaluation.
The main purpose of the artificial neural network algorithm cloud is to use artificial neural network technology, which imitates the neural network of the brain, to construct a neural network that realizes a given function. This algorithm cloud can produce a mathematical model of the human neural network, that is, a model built by imitating the structure and function of the brain's neural network. The model is a complex network formed by interconnecting a large number of simple components, the neurons. It is highly nonlinear and can carry out complicated logical operations and realize nonlinear relationships. Its range of use is very wide.
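The claim that interconnected simple neurons can realize a nonlinear logical operation can be made concrete with a two-layer network that computes XOR. The weights below are set by hand for illustration (no training is performed), and the threshold activation and function names are assumptions of this sketch.

```python
def step(z):
    """Threshold activation: the neuron fires (1) when its net input is positive."""
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    """A single neuron: weighted sum of the inputs plus a bias, then the activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_network(x1, x2):
    """Two hidden neurons (OR and AND) feed one output neuron (OR and not AND)."""
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)
    return neuron((h_or, h_and), (1.0, -1.0), -0.5)
```

No single threshold neuron can compute XOR, so the example shows why the interconnection of neurons, rather than any one unit, provides the nonlinearity.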
Support vector machine algorithm cloud: the main function of this cloud is to provide support vector machine modeling and prediction services. The support vector machine (Support Vector Machine) is a pattern recognition method developed in recent years on the basis of statistical learning theory, and it shows many distinctive advantages in solving small-sample, nonlinear and high-dimensional pattern recognition problems. Its main application areas are pattern recognition, function approximation and probability density estimation, and this algorithm cloud can be given priority in these areas.
The main purpose of the support vector machine algorithm cloud is to build a model that maps the sample vectors into a high-dimensional space and constructs the optimal separating surface, obtaining a linear optimal decision function in that space. Overfitting is suppressed by controlling the margin of the hyperplane, and the use of a kernel function neatly solves the dimensionality problem, so that the computational complexity of the learning method does not depend directly on the dimension of the samples. The optimal separating hyperplane classifies the samples without error while making the classification margin as large as possible. It thus keeps the empirical risk minimal and also minimizes the confidence interval in the generalization bound, so that the actual risk is minimal.
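The role of the kernel function can be sketched with the usual kernel form of the decision function, f(x) = sum_i alpha_i * y_i * K(x_i, x) + b. In the example below the support vectors and multipliers are chosen by hand to show a nonlinear (XOR-like) separation; a real support vector machine would obtain them by solving the margin-maximization problem, which is not shown. All names are assumptions of this sketch.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel: similarity that decays with squared distance."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def decision(x, support_vectors, labels, alphas, bias=0.0, kernel=rbf_kernel):
    """Kernel decision function; the high-dimensional mapping is never computed explicitly."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + bias

def classify(x, support_vectors, labels, alphas, bias=0.0):
    """Class +1 or -1 according to the sign of the decision function."""
    return 1 if decision(x, support_vectors, labels, alphas, bias) >= 0 else -1
```

Only kernel evaluations between sample vectors appear in the computation, which is why the cost does not depend directly on the dimension of the implicit feature space.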
Association rule algorithm cloud: the main function of this cloud is to provide association rule modeling and prediction services. Association rule algorithms aim to solve the class of problems that involve mining associations between itemsets in large transaction data sets. Association rule analysis is a major category of task in machine learning. It originated in the analysis of binary variables, expressing in the form of rules the relationship between two binary variables and among several binary variables. Later development extended association rules beyond binary variables to multi-valued categorical variables and continuous variables. Association rules can therefore be regarded as a method of analyzing the relationships among variables and expressing those relationships as rules that are very easy to interpret. Association rule analysis makes no assumption about the distribution of the data, and its results are based entirely on the data without any subjective assumption, so they reflect the nature of the data objectively and are highly convincing. The results of association rule analysis can be regarded as a summary of the regularities among the variables in the data. Since they were proposed, association rules have therefore been applied widely in many industries. This algorithm cloud can be given priority particularly for machine learning modeling in market analysis, credit assessment, commodity price analysis, intrusion detection, and fields with extremely large amounts of information such as astronomy, meteorology and biology.
The main purpose of the association rule algorithm cloud is to build models that solve the following problems: describing the patterns or forms of association between different objects; increasing the speed of association computation and reducing storage space; and performing association analysis on massive data.
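Association rules are usually scored by two standard measures, support and confidence. The sketch below computes both on a toy transaction set; the function names and the data are illustrative only.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```

Algorithms such as Apriori avoid scanning every possible itemset by pruning candidates whose support is already too low, which addresses the speed and storage concerns mentioned above.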
Initial modeling cloud: this cloud initializes the uploaded samples, basic representation method, chosen result analysis method, usable range, expectation domain and other information provided by the user, and obtains an initial model.
Search-space estimation cloud: the main function of this cloud is to estimate the positions of feasible and good solutions, that is, to obtain the search range in the possible space that matches the problem description and to exclude as far as possible the regions that cannot contain a good solution, thereby improving search efficiency and reducing the amount of computation.
Method discovery cloud: the main function of this cloud is to select suitable machine learning algorithms for building the model. Through general estimation and initial calculation, it predictively selects one or several machine learning algorithm clouds.
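One simple way to picture the comparison with typical historical examples (step 3) is nearest-case matching on descriptive tags. Everything in this sketch is hypothetical: the `HISTORY` table, the tag vocabulary and the Jaccard similarity are stand-ins chosen for illustration, and the patent does not state how the comparison is actually performed.

```python
# Hypothetical historical cases: descriptive tags paired with the algorithm cloud that worked.
HISTORY = [
    ({"pattern recognition", "small sample", "nonlinear"}, "support vector machine"),
    ({"market analysis", "transactions"}, "association rules"),
    ({"diagnosis", "few samples", "prior knowledge"}, "Bayesian statistics"),
]

def jaccard(a, b):
    """Overlap of two tag sets: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def discover_methods(description, history=HISTORY, top_k=1):
    """Rank historical cases by similarity and return the top algorithm clouds."""
    ranked = sorted(history, key=lambda case: jaccard(description, case[0]), reverse=True)
    return [algorithm for _, algorithm in ranked[:top_k]]
```

Setting `top_k` above 1 corresponds to selecting several algorithm clouds at once, which the later steps can then compare using the evaluation function.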
EM algorithm support cloud: the main function of this cloud is to provide EM-based parameter estimation services for model building and prediction. Many machine learning algorithms have to estimate model parameters, that is, perform maximum likelihood estimation or maximum a posteriori estimation. When the variables in the model can all be observed directly, maximum likelihood or maximum a posteriori estimation is straightforward. When some variables are hidden, however, maximum likelihood estimation becomes very complicated and is difficult to obtain directly. There are many methods for estimating model parameters in the presence of latent variables; a popular maximum likelihood method is the Expectation-Maximization algorithm, usually abbreviated as the EM algorithm. Instead of maximizing or simulating the complicated posterior distribution directly, it adds some latent data to the observed data, which simplifies the calculation into a series of simple maximizations or simulations. The EM algorithm is a method for obtaining maximum likelihood estimates of model parameters from incomplete data. Incomplete data generally arise in two situations. In the first, the data obtained are incomplete because of limitations or errors in the observation process itself, such as human error or quantities that are hard to measure. In the second, the likelihood function of the parameters is very difficult to optimize directly, and additional parameters, such as implicit or missing ones, are introduced. The optimization method therefore defines the original observed data plus the additional data as the "complete data", and the original observed data naturally become the "incomplete data".
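A standard textbook instance of this idea is fitting a mixture of two one-dimensional Gaussians, where the hidden variable is which component produced each point. The sketch below is a minimal version under simplifying assumptions (exactly two components, a crude initialization at the data extremes, a fixed number of iterations, no safeguards for degenerate inputs); it is not taken from the patent.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, iterations=50):
    """Estimate means, variances and mixing weights of a two-component mixture."""
    means = [min(data), max(data)]       # crude initial guess
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point (the latent data).
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: simple weighted maximum likelihood updates.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(spread, 1e-6)   # keep the variance strictly positive
            weights[k] = nk / len(data)
    return means, variances, weights
```

Each iteration replaces one hard maximization over the incomplete data with an expectation over the latent assignments followed by closed-form updates, which is the simplification described above.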
Evaluation function cloud: the main function of this cloud is to reflect how well a machine learning model matches the target while it is being built, and to assess the finished model. On the one hand, the evaluation function is specified according to the characteristics of each algorithm and historical experience; on the other hand, the model is checked for good performance on the training data and then tested on independent test data. The test data must be kept out of the model-building algorithm and take part only in prediction and judgment.
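The separation between training data and independent test data can be sketched with a simple holdout evaluation. Accuracy as the evaluation function, the fixed split point and the function names are assumptions of this example.

```python
def accuracy(model, rows, labels):
    """Fraction of samples for which the model's prediction matches the label."""
    hits = sum(model(row) == label for row, label in zip(rows, labels))
    return hits / len(labels)

def holdout_split(rows, labels, test_fraction=0.25):
    """Reserve the last part of the data as an independent test set."""
    cut = int(len(rows) * (1 - test_fraction))
    return (rows[:cut], labels[:cut]), (rows[cut:], labels[cut:])

def assess(fit, rows, labels, test_fraction=0.25):
    """Return (training accuracy, test accuracy); the test data never reach `fit`."""
    (train_x, train_y), (test_x, test_y) = holdout_split(rows, labels, test_fraction)
    model = fit(train_x, train_y)
    return accuracy(model, train_x, train_y), accuracy(model, test_x, test_y)
```

A large gap between the two numbers indicates that good performance on the training data alone is not a reliable judgment of the model; in practice the data would normally be shuffled before splitting.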
Computing cloud: the main function of this cloud is to exploit the advantages of cloud computing: in an ultra-large-scale distributed environment, it uses the available computing performance, data storage and network services to carry out the massive computation that machine learning requires. Cloud computing combines the computational advantages of parallel computing, distributed computing and grid computing, and can provide computing services well.
Machine learning algorithm extension cloud: the main function of this cloud is to provide an interface, reserved for user-defined algorithms or for upgrades of the platform itself, for cases where the available machine learning algorithms cannot meet the user's needs. On the one hand, this cloud constructs new learning algorithms according to certain rules; on the other hand, it is responsible for connecting to the other clouds and modules, so that a newly constructed learning algorithm can be used in full.
Web interactive interface module: the main function of this module is to provide the interactive interface. Cloud computing lets users obtain application services from any location and from a variety of terminals. The requested resources come from the "cloud" rather than from a fixed, tangible entity. An application runs somewhere in the "cloud", but the user does not need to know, or worry about, exactly where it runs; its location is transparent to the user. A notebook computer or a mobile phone is enough to obtain everything the user needs through network services, even tasks such as supercomputing. Interaction through a Web interface is therefore the best approach: the user need not be concerned with the operations and computation carried out in the background, only with the information entered and the results returned. This module handles interaction with the user.
Machine learning input/output module: the main function of this module is to provide the model construction and prediction services of the SVM service. It supplies feasible input samples and parameter descriptions for the mathematical modeling of machine learning, which in practice also involves extensive and complicated preprocessing. As far as possible, data with different record formats, different conventions, different time periods, different locations, different data sets and different irregularities are gathered, integrated and cleaned. The data are usually unified, converted to numerical form and converted in format, and then undergo operations such as missing-value handling, noise handling, data cleaning, data integration, data transformation and data reduction.
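Two of the preprocessing operations named above, missing-value handling and data transformation, can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say which imputation or scaling rule is used, so column-mean imputation and min-max scaling are stand-ins, and the function names `impute_missing` and `min_max_scale` are hypothetical.

```python
from statistics import mean


def impute_missing(rows):
    """Replace None entries in each column with that column's mean.

    Assumption: mean imputation is one possible missing-value rule.
    """
    columns = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


if __name__ == "__main__":
    raw = [[1.0, None], [3.0, 4.0], [None, 8.0]]
    print(min_max_scale(impute_missing(raw)))
```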
Cloud management module: the main function of this module is to manage the startup, execution and monitored state of every module. Because of its very large scale, a cloud computing installation generally has hundreds or thousands of servers, and a large enterprise may have hundreds of thousands; since all of this is transparent to the user, a large amount of management is needed to keep every module running in an orderly way, to schedule and allocate tasks, and to make reasonable use of storage, computing and bandwidth resources.
II. Method flow
1. Setup and operating procedure
1. The user first installs and starts the cloud management module, and then uses it to add, in turn, the decision tree algorithm cloud, genetic algorithm cloud, Bayesian statistics algorithm cloud, artificial neural network algorithm cloud, support vector machine algorithm cloud, association rule algorithm cloud, initial modeling cloud, search space estimation cloud, method discovery cloud, EM algorithm support cloud, evaluation function cloud, computing cloud and machine learning algorithm extension cloud, followed by the Web interactive interface module and the machine learning input/output module.
2. The Web interactive interface module is started and waits for users. When a user submits a request to build a machine learning model through the Web interactive interface, the cloud management module starts and calls the other cloud modules to carry out the mathematical modeling for machine learning.
3. The initial modeling cloud is called. It performs initialization from what the user provides, namely the uploaded samples, the basic representation method, the chosen method of result interpretation, the range of use, the expected domain and so on, and obtains an initial model.
4. The method discovery cloud is run. It compares the information provided by the user with historical typical cases and determines which machine learning algorithm or algorithms should be used. This cloud module keeps running alongside the subsequent steps and continually adjusts its choice according to the results of each stage.
5. The cloud management module calls the machine learning input/output module. The data entered by the user through the Web interactive interface are unified and converted to numerical form, and then undergo, in turn, missing-value handling, noise handling, data cleaning, data integration, data transformation, data reduction and similar operations, yielding an intermediate result that general algorithms can use.
6. The evaluation function cloud is started in preparation for judging the quality of the machine learning solution. This step mainly determines the specific algorithm for estimating performance, including whether to use validation methods such as cross-validation, leave-one-out and the bootstrap.
7. The EM algorithm support cloud is called to perform maximum likelihood estimation over the solution space and to compute the approximate location of the optimal or a near-optimal solution in that space, which improves search efficiency.
The basic principle of the EM algorithm can be stated as follows. The observable data are y; the complete data are x = (y, z), where z is a hidden variable representing the missing data; and θ is the model parameter. The posterior distribution p(θ|y) of θ is very complicated, which makes the various statistical calculations difficult. If z were known, a simple augmented posterior distribution p(θ|y, z) of θ could be obtained, and its simplicity could be used to carry out those calculations. The assumption about z can then be checked and improved in turn, so that a complicated maximization or sampling problem is reduced to simpler ones.
It can be seen that the EM algorithm is an iterative method, used mainly to find the mode of the posterior distribution.
The specific steps are as follows. Suppose y is an incomplete observed data set obeying some distribution, and there exists a complete data set x = (y, z). The density function of x is then p(x|θ) = p(y, z|θ) = p(z|y, θ) p(y|θ). From this it can be seen that the density function p(x|θ) is determined by the marginal density function p(y|θ), the assumption about the hidden variable z, the initial estimate of the parameter θ, and the relationship between the hidden variable z and the observed variable y.
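The alternation between guessing the hidden variable z and re-estimating θ can be made concrete with a small sketch. The example below is an illustration, not the patent's implementation: it assumes a one-dimensional mixture of two Gaussians, where y are the observed values, z is the unobserved component that generated each value, and θ consists of the two means, two standard deviations and a mixing weight. The function names and default starting values are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_gaussians(y, mu=(-1.0, 1.0), sigma=(1.0, 1.0), weight=0.5, iterations=50):
    """Fit a two-component 1-D Gaussian mixture to observations y by EM.

    Returns ((mu1, mu2), (sigma1, sigma2), weight_of_component_1).
    """
    mu1, mu2 = mu
    s1, s2 = sigma
    w = weight
    n = len(y)
    for _ in range(iterations):
        # E-step: posterior probability that each y came from component 1.
        resp = []
        for v in y:
            a = w * normal_pdf(v, mu1, s1)
            b = (1.0 - w) * normal_pdf(v, mu2, s2)
            total = a + b
            resp.append(a / total if total > 0.0 else 0.5)
        # M-step: re-estimate the parameters from the weighted data.
        n1 = sum(resp)
        n2 = n - n1
        if n1 == 0.0 or n2 == 0.0:
            break  # one component vanished; stop rather than divide by zero
        mu1 = sum(r * v for r, v in zip(resp, y)) / n1
        mu2 = sum((1.0 - r) * v for r, v in zip(resp, y)) / n2
        # Floor sigma so a collapsed component cannot cause division by zero.
        s1 = max(math.sqrt(sum(r * (v - mu1) ** 2 for r, v in zip(resp, y)) / n1), 1e-3)
        s2 = max(math.sqrt(sum((1.0 - r) * (v - mu2) ** 2 for r, v in zip(resp, y)) / n2), 1e-3)
        w = n1 / n
    return (mu1, mu2), (s1, s2), w


if __name__ == "__main__":
    data = [-2.1, -1.9, -2.0, -1.8, -2.2, 3.0, 3.2, 2.8, 3.1, 2.9]
    print(em_two_gaussians(data))
```

The E-step plays the role of "supposing z", and the M-step is the simple maximization that becomes available once z is supposed known.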
8. After the preparatory work is finished, the machine learning modeling process is carried out. Based on the automatic decision made in the steps above, one or more specific machine learning cloud modules are called to perform learning; these include the decision tree algorithm cloud, genetic algorithm cloud, Bayesian statistics algorithm cloud, artificial neural network algorithm cloud, support vector machine algorithm cloud and association rule algorithm cloud. If the user has defined a custom algorithm in the machine learning algorithm extension cloud, the extension cloud is called preferentially.
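The calling rule in step 8 can be sketched as a small selection function. Everything here is hypothetical: the patent gives no interface for the cloud modules, so the module names are plain strings, and "called preferentially" is interpreted, as an assumption, to mean that a user-defined extension is placed first in the call order.

```python
def modules_to_call(recommended, registered, extension=None):
    """Return the algorithm modules to invoke, in call order.

    recommended: names proposed by the automatic decision step
    registered:  names of the algorithm clouds currently available
    extension:   name of a user-defined extension, or None
    """
    order = [name for name in recommended if name in registered]
    if extension is not None:
        order.insert(0, extension)  # assumption: the extension is called first
    return order


if __name__ == "__main__":
    available = {"decision_tree", "svm", "ann"}
    print(modules_to_call(["svm", "ann"], available))
    print(modules_to_call(["svm", "ann"], available, extension="user_defined"))
```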
2. Machine learning modeling procedure
1. Decision tree algorithm modeling
A decision tree can be regarded as a tree-shaped prediction model. It classifies an instance by sorting it from the root node down to some leaf node, and the leaf node is the class to which the instance belongs, as shown in Fig. 3, the basic form of a decision tree. The core problems of decision trees are selecting the splitting attribute and pruning the tree. There are many decision tree algorithms, including ID3, C4.5 and CART. These algorithms all use a top-down greedy strategy: at each node, the attribute with the best classification effect is selected to split the node into two or more child nodes, and the process continues until the tree classifies the training set exactly or all attributes have been used. Among them, the classification and regression tree (CART) is a classification and regression algorithm in machine learning. Let the training sample set be $L = \{X_1, X_2, X_3, \ldots, X_n, Y\}$, where $X_i$ ($i = 1, 2, 3, \ldots, n$) are called attribute vectors and Y is called the label vector or class vector. When Y is an ordered quantitative value, the tree is called a regression tree; when Y is a discrete value, it is called a classification tree. At the root node of the tree, the problem set (the data space) is searched to find the optimal splitting variable and the corresponding splitting threshold, namely those that give the largest decrease in the impurity of the data sets in the next generation of child nodes.
Here impurity is measured by the Gini index, defined as:
$$i(t) = \sum_{i \neq j} p(i \mid t)\, p(j \mid t)$$
where $i(t)$ is the Gini index of node t, $p(i \mid t)$ is the proportion of samples in node t that belong to class i, and $p(j \mid t)$ is the proportion of samples in node t that belong to class j. Using this splitting variable and splitting threshold, the root node $t_1$ is split into $t_2$ and $t_3$. If at some node $t_i$ no further significant reduction in impurity is possible, the node $t_i$ becomes a leaf node; otherwise the search for its optimal splitting variable and splitting threshold continues and the node is split. For a classification problem, if a leaf node contains only one class, that class is the class of the leaf node; if samples of several classes are present, the class of the node is the class with the most samples in the leaf node. For a regression problem, the mean of the quantitative values is taken. Clearly, a very large tree may overfit the data, while a small tree may fail to capture important structure. The best tree size is a tuning parameter that controls model complexity, and it should be chosen adaptively from the data. One desirable strategy is to grow a large tree $t_0$ and stop splitting only when a minimum node size (for example 3) is reached. The tree is then pruned with a pruning strategy combined with 5-fold or 10-fold cross-validation, which removes some noise and interfering data and yields the optimal tree. This establishes the mathematical model of the decision tree.
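The search for a splitting threshold that maximizes the decrease in impurity can be sketched as follows. This is a minimal illustration for a single numeric attribute, not the patent's implementation. It uses the usual CART form of the Gini index, $1 - \sum_i p(i \mid t)^2$, which equals the sum of $p(i \mid t)\,p(j \mid t)$ over all pairs $i \neq j$; the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 - sum_i p(i|t)^2 (0.0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, impurity_decrease) for one numeric attribute.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, 0.0) when no split reduces the impurity.
    """
    n = len(values)
    parent = gini(labels)
    best = (None, 0.0)
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        decrease = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        if decrease > best[1]:
            best = (threshold, decrease)
    return best


if __name__ == "__main__":
    print(best_split([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

A full tree builder would apply `best_split` to every attribute at every node and recurse until the stopping rule (for example a minimum node size) is met.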
2. Genetic algorithm modeling
For a small search space, classical exhaustive search is sufficient; for a large space, special artificial intelligence techniques are needed. The genetic algorithm (Genetic Algorithm) is one of these techniques. It is a global optimization algorithm, derived by simulating the process of biological evolution, that is built from three basic operators: selection, crossover and mutation. Starting from an initial population, it uses the selection operator to choose parents with good traits, the crossover operator to recombine them and the mutation operator to introduce small variations, and in this way searches the model space randomly under the control of certain probability rules. The population evolves generation after generation until the error function value of the final solution meets the set requirement.
At the t-th iteration, the genetic algorithm maintains a population of potential solutions $P(t) = \{x_1^t, \ldots, x_n^t\}$. Each solution $x_i^t$ is evaluated with the evaluation function obtained from the evaluation function cloud. A new population (iteration t+1) is then formed by selecting the fitter individuals. Members of the new population are transformed by crossover and mutation to form new solutions. Crossover combines the features of two parent chromosomes (the binary coding strings of the parameters to be found) and forms two similar offspring by exchanging corresponding segments of the parents. For example, if the parent chromosomes are $(a_1, b_1, c_1, d_1, e_1)$ and $(a_2, b_2, c_2, d_2, e_2)$ and crossover takes place after the second gene, the offspring are $(a_1, b_1, c_2, d_2, e_2)$ and $(a_2, b_2, c_1, d_1, e_1)$. The purpose of the crossover operator is to exchange information between different potential solutions. Mutation randomly changes one or more selected genes (bits of the chromosome) with a probability equal to the mutation rate. The intent of the mutation operator is to introduce some extra variability into the population. The modeling process is shown in Fig. 4, the basic process of a genetic algorithm. This process establishes the genetic algorithm mathematical model.
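The crossover example above, together with bit-flip mutation, can be written out as a short sketch. The function names are hypothetical, and a seeded `random.Random` is passed in explicitly so that the behaviour is reproducible; selection and the evaluation function are left out.

```python
import random


def crossover(parent_a, parent_b, point):
    """Single-point crossover: swap the tails of two parents after `point` genes."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


if __name__ == "__main__":
    a = ["a1", "b1", "c1", "d1", "e1"]
    b = ["a2", "b2", "c2", "d2", "e2"]
    # Crossover after the second gene, as in the text.
    print(crossover(a, b, 2))
    print(mutate([0, 1, 1, 0, 1], 0.2, random.Random(0)))
```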
3. Bayesian statistical modeling
Bayesian statistical modeling is a pattern classification method for the case where the prior probabilities and the class-conditional probabilities are known. The classification result for a sample to be classified depends on all the samples in each class domain.
Suppose the training sample set is divided into M classes, denoted $C = \{c_1, c_2, \ldots, c_i, \ldots, c_M\}$, and the prior probability of each class is $P(c_i)$, $i = 1, 2, \ldots, M$. When the sample set is very large, one can take $P(c_i)$ = (number of samples in class $c_i$) / (total number of samples). For a sample X to be classified, the class-conditional probability of X given class $c_i$ is $P(X \mid c_i)$. By Bayes' theorem, the posterior probability of class $c_i$ is then
$$P(c_i \mid X) = \frac{P(X \mid c_i)\, P(c_i)}{P(X)}$$
If $P(c_i \mid X) = \max_j P(c_j \mid X)$, with $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, M$, then $X \in c_i$; this is the maximum a posteriori decision rule. The Bayes classification method is well founded in theory and is also very widely applied. The overall probability distribution and the probability distribution function (or density function) of each class of samples are usually unknown, and obtaining them requires sufficiently large samples. In addition, when used for text classification, the Bayes method requires the terms that represent a text to be mutually independent, a condition that real text generally does not satisfy, so the method often falls short of its theoretical optimum in practice. A Bayesian statistical model can thus be built from the prior probabilities and class-conditional probabilities obtained by statistics.
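The maximum a posteriori decision rule can be sketched in a few lines. The class names and probability values below are invented for illustration only; the functions simply apply Bayes' theorem to priors $P(c_i)$ and class-conditional probabilities $P(X \mid c_i)$ that are assumed to be already known.

```python
def posterior(priors, likelihoods):
    """Return {class: P(c|X)} from {class: P(c)} and {class: P(X|c)}."""
    evidence = sum(priors[c] * likelihoods[c] for c in priors)  # P(X)
    return {c: priors[c] * likelihoods[c] / evidence for c in priors}


def map_decision(priors, likelihoods):
    """Assign X to the class with the largest posterior probability."""
    post = posterior(priors, likelihoods)
    return max(post, key=post.get)


if __name__ == "__main__":
    priors = {"c1": 0.6, "c2": 0.4}       # illustrative values
    likelihoods = {"c1": 0.2, "c2": 0.5}  # illustrative values
    print(posterior(priors, likelihoods), map_decision(priors, likelihoods))
```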
4. Artificial neural network algorithm modeling
An artificial neural network (Artificial Neural Network, ANN for short) is a neural network, constructed artificially on the basis of human understanding of the brain's own neural network, that can realize certain functions. It is a theoretical mathematical model of the human brain's neural network, an information processing system built by imitating the structure and function of that network. It is in fact a complex network formed by interconnecting a large number of simple elements; it is highly nonlinear and is a system capable of complex logical operations and of realizing nonlinear relationships.
An artificial neural network is organized in layers. Each layer consists of several artificial neurons; neurons within a layer are not connected to each other, while adjacent layers are connected by connecting lines. An artificial neural network can have a single layer or multiple layers; one-layer, two-layer and three-layer networks are the most common at present.
According to how data flow through the artificial neurons, artificial neural networks have two structures: the forward (feedforward) type and the feedback type. If the data flow in one direction from input to output without feedback, the network is of the forward type; if there is feedback (whether to the same neuron or to other neurons of the same layer), it is of the feedback type.
The artificial neuron is the basic unit of an artificial neural network. Several neuron models exist, but one basic model is the most common. It is composed as follows:
I. Input: an artificial neuron can have several inputs.
II. Output: an artificial neuron has exactly one output.
III. Internal structure: an adder sums the inputs, a bias value is added, and an activation function is applied to the result; the value computed is the output of the neuron.
A. Adder: sums the inputs linearly; more precisely, it sums the products of the inputs and their corresponding weights.
B. Bias value: the value produced by the adder is often shifted by external disturbances and influences, so a bias is needed to adjust it; $\theta_k$ generally denotes the bias value of the k-th neuron.
C. Activation function: limits the range of the neuron's output value, generally to between -1 and +1 or between 0 and 1.
Commonly used activation functions include the Logistic and Sigmoid functions.
Artificial neurons are linked by connecting lines, and every connecting line has a weight; as described above, the adder inside the neuron at the target end of the line uses these weights when summing. $\omega_{ij}$ denotes the weight of the connecting line from the i-th neuron to the j-th neuron.
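The basic neuron described above (adder, bias, activation function) can be sketched as a single function. This is an illustration only; the Logistic function is used as the activation, and the names `logistic` and `neuron_output` are hypothetical.

```python
import math


def logistic(x):
    """Logistic activation; maps any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs (adder), plus bias, passed through the activation."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return logistic(net)


if __name__ == "__main__":
    # Net input is 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0.
    print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.0))
```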
An artificial neural network can learn, and learning means training it with real data samples. A data sample has input data and output data. The input data are fed to the artificial neural network, the output of the network is compared with the output of the sample, and the parameters of the network (the weights of the connecting lines and the bias values of the neurons) are adjusted so that the difference between the two is 0 or within an acceptable range.
A trained artificial neural network has a certain capacity for judgment and inference and can make certain predictions and decisions. The back-propagation model (BP, Back Propagation) is the most common artificial neural network model; more than half of all applications use it. It has a multilayer forward structure and consists of the following three parts:
I. Input layer: there is only one such layer, made up of m neurons, which receives m external inputs $x_i$ ($i = 1, 2, \ldots, m$); each input is connected to one neuron. The neurons of this layer are not basic neurons: they have no internal structure, and their output value is simply the input value.
II. Hidden layer: there can be several hidden layers; each consists of n neurons, and these are the basic neurons introduced above.
III. Output layer: there is only one such layer, made up of p neurons, which are also basic neurons.
Neurons in adjacent layers (including between successive hidden layers) are connected many-to-many, while the input and output layers are connected one-to-one with the outside world, as shown in Fig. 5, the basic structure of an artificial neural network.
The activation function of the basic neurons is the Logistic function, whose expression is:
$$O_j = \frac{1}{1 + e^{-I_j}}$$
The algorithm consists of the following steps:
1) Compute the input value of each neuron j in the hidden layers and the output layer, and from it the output value:
A) Input:
$$I_j = \sum_i \omega_{ij} O_i + \theta_j$$
where i ranges over all neurons of the previous layer that are connected to neuron j.
B) Output: the output value is computed with the Logistic function.
Compute the error of each output layer neuron j:
$$Err_j = O_j (1 - O_j)(T_j - O_j)$$
where $T_j$ is the class label of the sample.
Compute the error of each hidden layer neuron j:
$$Err_j = O_j (1 - O_j) \sum_k Err_k\, \omega_{jk}$$
where k ranges over all neurons of the following layer that are connected to neuron j, and $Err_k$ is the error of each of those neurons.
Compute the correction of each connecting line weight $\omega_{ij}$ in the network:
$$\Delta\omega_{ij} = (l)\, Err_j\, O_i$$
where (l) is the learning rate of the algorithm, a value set by the trainer. The choice of learning rate helps in finding the globally optimal weights: if it is too small, learning proceeds very slowly; if it is too large, the weights may oscillate between unsuitable solutions. A constant in the interval (0, 1) is generally chosen; a commonly used empirical value is 1/t, where t is the number of iterations.
Then compute the new weight of the connecting line and update it:
$$\omega_{ij} = \omega_{ij} + \Delta\omega_{ij}$$
Compute the correction of the bias value of each neuron in the hidden layers and the output layer:
$$\Delta\theta_j = (l)\, Err_j$$
Then compute the new bias value of the neuron:
$$\theta_j = \theta_j + \Delta\theta_j$$
2) Check the termination conditions, which are generally of the following kinds:
A) all $\Delta\omega_{ij}$ and $\Delta\theta_j$ are sufficiently small, that is, below a specified value;
B) the number of iterations has reached a specified count.
This process establishes the mathematical model of the artificial neural network. With the trained neural network model, an input sample can be evaluated to obtain a predicted value.
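One update of the back-propagation procedure can be sketched for a network with a single hidden layer. This is a minimal illustration under stated assumptions, not the patent's implementation: the weight layout (`w_hidden[i][j]` from input i to hidden neuron j, `w_out[j][k]` from hidden neuron j to output neuron k), the function name `train_step` and the sample values are all hypothetical. The errors are those of the Logistic activation, with each weight corrected by the learning rate times the error of the receiving neuron times the output of the sending neuron.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(x, target, w_hidden, b_hidden, w_out, b_out, rate):
    """Apply one back-propagation update in place; return the pre-update outputs."""
    n_in, n_hidden, n_out = len(x), len(b_hidden), len(b_out)
    # Forward pass: net input, then Logistic output, layer by layer.
    hidden = [
        logistic(sum(x[i] * w_hidden[i][j] for i in range(n_in)) + b_hidden[j])
        for j in range(n_hidden)
    ]
    out = [
        logistic(sum(hidden[j] * w_out[j][k] for j in range(n_hidden)) + b_out[k])
        for k in range(n_out)
    ]
    # Errors: output layer first, then propagated back to the hidden layer.
    err_out = [out[k] * (1 - out[k]) * (target[k] - out[k]) for k in range(n_out)]
    err_hidden = [
        hidden[j] * (1 - hidden[j]) * sum(err_out[k] * w_out[j][k] for k in range(n_out))
        for j in range(n_hidden)
    ]
    # Weight and bias corrections, computed from the errors above.
    for j in range(n_hidden):
        for k in range(n_out):
            w_out[j][k] += rate * err_out[k] * hidden[j]
    for k in range(n_out):
        b_out[k] += rate * err_out[k]
    for i in range(n_in):
        for j in range(n_hidden):
            w_hidden[i][j] += rate * err_hidden[j] * x[i]
    for j in range(n_hidden):
        b_hidden[j] += rate * err_hidden[j]
    return out


if __name__ == "__main__":
    w_h, b_h = [[0.1, -0.2], [0.3, 0.4]], [0.0, 0.0]
    w_o, b_o = [[0.2], [-0.1]], [0.0]
    for _ in range(200):
        y = train_step([1.0, 0.0], [1.0], w_h, b_h, w_o, b_o, rate=0.5)
    print(y)
```

A real training loop would iterate over many samples and stop on one of the termination conditions listed above.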
5. Support vector machine algorithm modeling
The original idea of the support vector machine is how to find the optimal separating surface for a linearly separable problem. For a problem that is linearly separable in feature space, the optimal separating surface is the interface with the largest margin γ; according to the analysis of kernel theory, it is indeed the interface with the best generalization ability, provided that the samples are classified correctly. For a problem that is not linearly separable in feature space, a penalty factor can be used to balance the influence of the margin and of the slack factors.
Consider the problem of classifying a given training data set into two classes with a hyperplane in feature space. The given sample points are $(x_1, y_1), \ldots, (x_l, y_l)$, with $x_i \in R^n$ and $y_i \in \{-1, +1\}$, where the vector $x_i$ may be constructed directly from features extracted from the object sample set, or may be the image of an original vector mapped into kernel space by some kernel function. In feature space, construct the separating plane
$$(w \cdot x) + b = 0$$
such that
$$\begin{cases} (w \cdot x_i) + b \ge 1, & y_i = 1 \\ (w \cdot x_i) + b \le -1, & y_i = -1 \end{cases} \iff y_i\,[(w \cdot x_i) + b] \ge 1 \quad (i = 1, 2, \ldots, l)$$
It can be computed that the margin between the training data set and a given separating plane is:
$$p(w, b) = \min_{\{x_i \mid y_i = 1\}} \frac{w \cdot x_i + b}{|w|} - \max_{\{x_i \mid y_i = -1\}} \frac{w \cdot x_i + b}{|w|} = \frac{2}{|w|}$$
According to the SVM definition of the optimal separating plane, solving for this plane reduces to the following: subject to the constraint condition (formula (3)), compute the normal vector w and the offset b of the separating plane that maximize p(w, b). Vapnik and others proved that the normal vector $w_0$ of the separating hyperplane is a linear combination of all the training set vectors. That is, $w_0$ can be written as:
$$w_0 = \sum_{i=1}^{l} \alpha_i\, y_i\, x_i, \quad \alpha_i \ge 0$$
Define the discriminant function $f(x) = w_0 \cdot x + b_0$. The classification function for the test set can then be written as: $label(x) = \mathrm{sgn}(f(x)) = \mathrm{sgn}(w_0 \cdot x + b_0)$.
In the linearly separable case, all training samples satisfy $|f(x)| \ge 1$. In what follows, the region satisfying $|f(x)| < 1$ is called the boundary region of the separating hyperplane.
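The discriminant function and the resulting margin can be sketched as follows. The sketch assumes the usual expansion of the normal vector over the training vectors, $w_0 = \sum_i \alpha_i y_i x_i$, and takes the coefficients $\alpha_i$ and the offset as already solved for; it does not solve the optimization problem itself. The function names and the two-point data set are hypothetical.

```python
import math


def weight_vector(alphas, labels, vectors):
    """Normal vector as a linear combination of training vectors: sum_i a_i * y_i * x_i."""
    dim = len(vectors[0])
    return [sum(a * y * x[d] for a, y, x in zip(alphas, labels, vectors)) for d in range(dim)]


def discriminant(x, w, b):
    """f(x) = w . x + b"""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(x, w, b):
    """label(x) = sgn(f(x)), with f(x) = 0 assigned to +1 by convention here."""
    return 1 if discriminant(x, w, b) >= 0 else -1


def margin(w):
    """Geometric margin 2 / |w| of the separating plane."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


if __name__ == "__main__":
    # Two training points; alphas of 0.25 each give w = (0.5, 0.5), b = 0.
    xs, ys, alphas = [[1.0, 1.0], [-1.0, -1.0]], [1, -1], [0.25, 0.25]
    w = weight_vector(alphas, ys, xs)
    print(w, classify([2.0, 0.0], w, 0.0), margin(w))
```

In this toy case both training points satisfy $|f(x)| = 1$, so both lie exactly on the edge of the boundary region.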
Solving for the optimal separating plane is equivalent to maximizing the following under the original constraints:
Figure G2010100179189D00124
Introduce the Lagrange multipliers $\alpha_i$, $i = 1, 2, \ldots, l$, and define
$$w(\alpha) = \sum_{i} \alpha_i\, y_i\, x_i$$
Using the Wolfe duality theorem, the above problem is converted into its dual problem:
$$\max W(\alpha) = \sum_i \alpha_i - \frac{1}{2}\, w(\alpha) \cdot w(\alpha)$$
$$\text{subject to } \alpha_i \ge 0, \quad \sum_i \alpha_i y_i = 0$$
For a training set that is not linearly separable, slack variables $\xi_i$ can be introduced and the problem rewritten as:
$$\min \left( \frac{1}{2} \|w\|^2 + C \sum_i \xi_i \right)$$
$$\text{subject to } y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$
The corresponding dual problem is obtained in the same way:
$$\max W(\alpha) = \sum_i \alpha_i - \frac{1}{2}\, w(\alpha) \cdot w(\alpha)$$
$$\text{subject to } 0 \le \alpha_i \le C, \quad \sum_i \alpha_i y_i = 0$$
Solving a problem of this form is a typical constrained quadratic optimization problem, for which many mature algorithms exist. In recent years, a series of works by V. Vapnik, C. Burges, E. Osuna, T. Joachims, J. Platt and others has made it possible to implement support vector machine algorithms on large-scale training data sets.
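The role of the slack variables and the penalty factor C in the soft-margin objective can be checked numerically. The sketch below does not solve the optimization problem; for a given (w, b) it computes the smallest slack values that satisfy the constraints, $\xi_i = \max(0, 1 - y_i (w \cdot x_i + b))$, and then evaluates $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$. The function name and the data are hypothetical.

```python
def soft_margin_objective(w, b, samples, labels, c):
    """Return (objective, slacks) for the soft-margin primal at a given (w, b)."""
    slacks = []
    for x, y in zip(samples, labels):
        functional_margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Smallest xi with y (w . x + b) >= 1 - xi and xi >= 0.
        slacks.append(max(0.0, 1.0 - functional_margin))
    objective = 0.5 * sum(wi * wi for wi in w) + c * sum(slacks)
    return objective, slacks


if __name__ == "__main__":
    samples = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
    labels = [1, 1, -1]
    print(soft_margin_objective([1.0, 0.0], 0.0, samples, labels, c=2.0))
```

Only the second sample falls inside the boundary region, so it is the only one with non-zero slack; a larger C penalizes that violation more heavily relative to the margin term.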
The mathematical model built as described above can automatically find the support vectors that discriminate best between classes, and the classifier constructed from them maximizes the margin between classes, giving good adaptability and a high discrimination rate. The method needs only the classes of the boundary samples of each class domain to decide the final classification result. This establishes the support vector machine mathematical model.
6. Association rule algorithm modeling
Association rule mining is a class of problems concerned with mining the associations between itemsets in large transaction data sets. Association rule analysis is a major category of task in machine learning. It originated in the analysis of binary variables, expressing in the form of rules the relationship between two binary variables and the relationships among several binary variables. Later developments mean that association rules are no longer limited to binary variables and can also analyze multi-category variables and continuous variables. Association rules can therefore be regarded as a method for analyzing the relationships between variables and expressing those relationships as rules that are very easy to interpret.
The association rule analysis method makes no requirement on the distribution of the data, and its results are based entirely on the data, without any subjective assumptions; they reflect the nature of the data objectively and are highly convincing. The results that association rules obtain from data analysis can be regarded as a summary of the regularities between variables in the data. Since they were proposed, association rules have therefore been widely applied in all kinds of industries.
An association rule algorithm is the process of solving from input to output. Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of m different items, whose elements are called items (Item). Let D be a set of transactions T (Transaction), where each transaction T is a set of items with $T \subseteq I$. Each transaction has a unique identifier, such as a transaction number, denoted TID. An association rule is an implication of the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$ and $X \cap Y = \emptyset$. X is called the antecedent of the rule and Y the consequent. The support (Support) of the rule $X \Rightarrow Y$ in the transaction set D is the ratio of the number of transactions containing both X and Y to the total number of transactions, denoted $support(X \Rightarrow Y)$, that is
$$support(X \Rightarrow Y) = \frac{|\{T : X \cup Y \subseteq T,\ T \in D\}|}{|D|}$$
The confidence (Confidence) of the rule $X \Rightarrow Y$ in the transaction set D is the ratio of the number of transactions containing both X and Y to the number of transactions containing X, that is
$$confidence(X \Rightarrow Y) = \frac{|\{T : X \cup Y \subseteq T,\ T \in D\}|}{|\{T : X \subseteq T,\ T \in D\}|}$$
Given a transaction set D, the problem of mining association rules is to produce the association rules whose support and confidence are greater than the user-specified minimum support (Minsupp) and minimum confidence (Minconf), respectively; such rules are called strong rules.
The task of association rule mining is to mine all the strong rules in the data set D. The itemset $(X \cup Y)$ corresponding to a strong rule $X \Rightarrow Y$ must be a frequent itemset, and the confidence of the association rule $X \Rightarrow Y$ derived from the frequent itemset $(X \cup Y)$ can be computed from the supports of the frequent itemsets X and $(X \cup Y)$.
The mathematical model containing descriptive rules that is obtained through the above process is the association rule model.
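Support, confidence and the strong-rule test can be sketched directly from their definitions. This is an illustration only: the transactions are invented, the function names are hypothetical, and no frequent-itemset search (such as Apriori) is attempted.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """support(X u Y) / support(X) for the rule X => Y."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


def is_strong(transactions, x, y, min_supp, min_conf):
    """A rule is strong if support and confidence both exceed the thresholds."""
    return (
        support(transactions, set(x) | set(y)) > min_supp
        and confidence(transactions, x, y) > min_conf
    )


if __name__ == "__main__":
    d = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
    print(support(d, {"bread", "milk"}), confidence(d, {"bread"}, {"milk"}))
    print(is_strong(d, {"bread"}, {"milk"}, min_supp=0.4, min_conf=0.6))
```

As the text notes, the confidence is obtained from two support values, so a miner only needs the supports of the frequent itemsets to score every rule derived from them.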
Beneficial effects: with the development of networks, information is growing explosively. How to use this information effectively, and how to use it to raise productivity, has become an urgent problem. At present only a small part of the large amount of information that can be obtained is used correctly; information that consumed substantial resources to collect not only goes unused, but also becomes harder to exploit because useful information is buried ever deeper in useless information. Machine learning is one of the effective ways to solve this kind of problem. As machine learning research deepens and its applications expand, a large demand for machine learning modeling tasks has arisen. At the same time, because there are many kinds of machine learning, a concrete problem needs a suitable machine learning algorithm before a mathematical model can be built that captures the essential features of a complex problem, and a machine learning model found after much time often still fails to reflect objective reality well.
Building a machine-learning-based model for a specific task is time-consuming and laborious. Because tasks differ in their details, machine learning models that others have built are hard to reuse directly, and the choice has to rely on personal experience. Even when a machine learning algorithm that fits the problem reasonably well has been chosen, setting its complicated parameters still depends on experience or on long computation on the user's own machine, and the computing power of a single user is insufficient to solve the problem quickly. In addition, the user has to learn and use specific machine learning software; machine learning algorithms are numerous and complicated, self-study takes a great deal of time, and the algorithms a user has learned are not necessarily suited to each task the user needs to solve.
The solution provided here, on the one hand, makes full use of the strong computing power of the cloud computing platform to solve the complex computation problems of machine learning; on the other hand, it uses the ease of use and transparency that cloud computing offers users to solve the problem that ordinary users find it hard to select a machine learning algorithm that fits reality, so that a machine learning model capable of solving practical problems can be built quickly and suitable parameters found automatically as far as possible.
Description of drawings
Fig. 1 is a flowchart of machine learning modeling in the cloud computing environment.
Fig. 2 is a module relationship diagram.
Fig. 3 is a diagram of the basic form of a decision tree.
Fig. 4 is a diagram of the basic process of a genetic algorithm.
Fig. 5 is a diagram of the basic structure of an artificial neural network.
Embodiment
The present invention is a method for the automatic selection of machine learning based on a cloud computing environment. By using a cloud computing platform, the user does not need to set up a running environment for machine learning, does not need to select a machine learning algorithm, and does not need to tune the numerous and complicated machine learning functions and their parameters; the user only needs to upload sample data through the Web, and a machine learning mathematical model that fits the actual problem is built automatically and intelligently. The invention frees the use of machine learning from the constraints of the environment and exploits the advantages of the cloud computing platform, so that machine learning modeling is transparent to the user and the threshold for using machine learning is lowered as far as possible. It addresses the shortcomings encountered when machine learning is applied in practice, such as the unpredictability of model selection, the reliance on human experience for parameter tuning, and the difficulty for ordinary users. The resulting platform can draw fully on the advantages of cloud computing, pool all computing resources and manage them automatically through software. During data analysis it integrates historical data with current data, making the collected information more accurate, and can provide intelligent services for machine learning. Users no longer need to be concerned with how to purchase servers and machine learning software for their business needs; according to their own requirements they can obtain machine learning results, namely a machine learning mathematical model, through the cloud computing platform and use it to solve practical problems.
The specific steps are:
1. Under the unified scheduling of the cloud management module, a rough description of the problem the user needs to solve is first obtained through the Web interactive interface. This includes the problem type, i.e. the broad category to which the problem belongs, selected from: expert systems, cognitive simulation, planning and problem solving, data mining, network information services, pattern recognition, fault diagnosis, natural language understanding, robotics and games, and other.
2. The initial modeling cloud is enabled. According to the broad category the user provided in step 1, the user enters the corresponding subcategory interface and fills in more detailed information, including uploading samples, selecting the representation method, and determining the result interpretation method, the scope of use, the expected domain, and so on.
3. The method discovery cloud is started. It compares the information provided by the user with typical historical cases and determines which machine learning algorithm or algorithms should be used. This cloud module keeps running alongside the subsequent steps and continually adjusts according to the computation results of each stage.
4. The information the user entered in step 2 is then fed into the machine learning input/output module. After it has been unified and digitized, missing-value handling, noisy-data handling, data cleaning, data integration, data transformation, and data reduction are carried out in turn, so as to obtain an intermediate result that general algorithms can use.
5. The evaluation function cloud is started. An evaluation function is built from the information the user entered in step 2, in preparation for judging the quality of machine learning solutions. This step mainly determines the concrete algorithm for estimating performance, and whether to use validation methods such as cross-validation, leave-one-out, and the bootstrap.
6. At the same time, the EM algorithm support cloud is called to perform maximum likelihood estimation over the solution space and compute the approximate location of the optimal solution, or of a better solution, within it, which improves search efficiency.
7. At this point the preparatory work is complete and the training process of machine learning begins. Based on the automatic decisions of the preceding steps, one or several specific machine learning cloud modules are called to learn, drawn from the decision tree algorithm cloud, genetic algorithm cloud, Bayesian statistics algorithm cloud, artificial neural network algorithm cloud, support vector machine algorithm cloud, and association rule algorithm cloud. If the user has defined a machine learning algorithm extension cloud, that extension cloud is called preferentially.
8. After the decision tree algorithm cloud, genetic algorithm cloud, Bayesian statistics algorithm cloud, artificial neural network algorithm cloud, support vector machine algorithm cloud, association rule algorithm cloud, and machine learning algorithm extension cloud have started, they continually feed results back to the cloud management module, the computing cloud, and the EM algorithm support cloud and obtain intermediate results from them, thereby automatically adjusting their own strategies to approach a good solution. At the same time, information is fed back to the user through the Web interactive interface, including the computation steps being run, the intermediate results obtained, changes in the current optimal solution, and so on.
9. During the iterative process of the EM algorithm support cloud, the computation repeatedly returns to steps 6 and 7. The evaluation function formulated in step 5 is used to obtain a performance assessment, so that the prediction algorithm can judge how good a solution is. This step requires a large amount of computing resources, so the computing advantage of cloud computing must be used to compute as good a solution as possible.
10. When a termination condition is met, for example the computation time limit is reached, several generations of iteration yield no better solution, or the algorithm's own iteration finishes, the computation result is converted by the machine learning input/output module into readable information and returned to the client through the Web interactive interface, with the detailed data offered for download. The machine learning result is also saved so that it can be reused and repeated computation avoided.
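The evaluation and selection parts of the steps above (steps 5, 7, 9, and 10) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the functions `k_fold_indices`, `cross_val_accuracy`, and `select_algorithm`, the time budget parameter, and the two toy learners, which merely stand in for the algorithm clouds. Accuracy under k-fold cross-validation is used as one possible evaluation function; the method discovery cloud and the EM algorithm support cloud are not modeled.

```python
import time
from collections import Counter
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def majority_class(train_x, train_y):
    """Toy learner (hypothetical stand-in): always predict the most common label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbour(train_x, train_y):
    """Toy learner (hypothetical stand-in): 1-nearest-neighbour, squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_x]
        return train_y[dists.index(min(dists))]
    return predict


def cross_val_accuracy(fit, xs, ys, k=5):
    """Evaluation function (step 5): mean accuracy over k folds."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        hits = sum(model(xs[i]) == ys[i] for i in test)
        scores.append(hits / len(test))
    return mean(scores)


def select_algorithm(candidates, xs, ys, time_budget_s=5.0, k=5):
    """Score each candidate (steps 7 and 9) and stop once the time budget
    is used up or all candidates have been tried (step 10)."""
    deadline = time.monotonic() + time_budget_s
    best_name, best_score = None, float("-inf")
    for name, fit in candidates.items():
        if time.monotonic() > deadline:
            break  # termination condition: computation time reached
        score = cross_val_accuracy(fit, xs, ys, k)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Tiny made-up sample set: one feature, two well-separated classes.
xs = [(0.0,), (0.1,), (0.2,), (0.3,), (1.0,), (1.1,), (1.2,), (1.3,)]
ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
CANDIDATES = {"majority_class": majority_class, "nearest_neighbour": nearest_neighbour}

best_name, best_score = select_algorithm(CANDIDATES, xs, ys, k=4)
print(best_name, best_score)
```

In a real deployment each candidate would presumably be a call to one of the algorithm clouds rather than a local function, and the loop would also stop when several rounds bring no improvement, as step 10 describes.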

Claims (1)

1. An automatic selection method for machine learning in a cloud computing environment, characterized in that the method comprises the following steps:
Step 1) Under the unified scheduling of the cloud management module, a rough description of the problem the user needs to solve is first obtained through the Web interactive interface, including the problem type, i.e. the broad category to which the problem belongs, selected from expert systems, cognitive simulation, planning and problem solving, data mining, network information services, pattern recognition, fault diagnosis, natural language understanding, robotics and games, and other,
Step 2) The initial modeling cloud is enabled; according to the broad category the user provided in step 1), the user enters the corresponding subcategory interface and fills in more detailed information, including uploading samples, selecting the representation method, and determining the result interpretation method, the scope of use, and the expected domain,
Step 3) The method discovery cloud is started; it compares the information provided by the user with typical historical cases and determines which machine learning algorithm or algorithms to use; this cloud module keeps running alongside the subsequent steps and continually adjusts according to the computation results of each stage,
Step 4) The information the user entered in step 2) is then fed into the machine learning input/output module; after it has been unified and digitized, missing-value handling, noisy-data handling, data cleaning, data integration, data transformation, and data reduction are carried out in turn, so as to obtain an intermediate result that general algorithms can use,
Step 5) The evaluation function cloud is started and an evaluation function is built from the information the user entered in step 2), in preparation for judging the quality of machine learning solutions and thereby predicting the performance of specific algorithms,
Step 6) The EM algorithm support cloud is called to perform maximum likelihood estimation over the solution space and compute the approximate location of the optimal solution, or of a better solution, within it, which improves search efficiency,
Step 7) This step uses the computation results of the preceding steps to carry out the training process of machine learning; based on the automatic decisions of the preceding steps, one or several specific machine learning cloud modules are called to learn, drawn from the decision tree algorithm cloud, genetic algorithm cloud, Bayesian statistics algorithm cloud, artificial neural network algorithm cloud, support vector machine algorithm cloud, and association rule algorithm cloud; if the user has defined a machine learning algorithm extension cloud, that extension cloud is called preferentially,
Step 8) Based on the computation of the preceding steps, one or several algorithm clouds are selected and started; at the same time, information is fed back to the user through the Web interactive interface, including the computation steps being run, the intermediate results obtained, and changes in the current optimal solution,
Step 9) The iterative process of the EM algorithm begins at this point, i.e. step 6), step 7), and step 8) are carried out in turn while checking whether a termination condition has been reached; if so, the method jumps to step 10); otherwise the performance prediction algorithm formulated in step 5) is used to judge how good the solution is; this step requires a large amount of computing resources, so the computing advantage of cloud computing is used to compute as good a solution as possible,
Step 10) When a termination condition is met, i.e. the computation time limit is reached, no better solution is found, or the algorithm's own iteration finishes, the computation result is converted by the machine learning input/output module into readable information and returned to the client through the Web interactive interface, with the detailed data offered for download; the machine learning result is also saved so that it can be reused and repeated computation avoided.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010017918 CN101782976B (en) 2010-01-15 2010-01-15 Automatic selection method for machine learning in cloud computing environment

Publications (2)

Publication Number Publication Date
CN101782976A (en) 2010-07-21
CN101782976B (en) 2013-04-10

Family

ID=42522964

Country Status (1)

Country Link
CN (1) CN101782976B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Assignee: Jiangsu Jiqun Information Industry Co., Ltd.
Assignor: Nanjing Post & Telecommunication Univ.
Contract record no.: 2012320000280
Denomination of invention: Automatic selection method for machine learning in cloud computing environment
License type: Exclusive License
Open date: 20100721
Record date: 20120322

C14 Grant of patent or utility model
GR01 Patent grant