A machine learning method, device, and big data platform
Technical field
The present invention relates to the field of big data technology, and in particular to a machine learning method, a machine learning device, and a big data platform based on the machine learning device.
Background art
Spark is an open-source big data processing engine that originated at the AMPLab of the University of California, Berkeley, was open-sourced in 2010, and became a top-level Apache project in 2014; its creators later founded Databricks. Its core abstraction is the resilient distributed dataset (RDD). It provides a richer programming model than Hadoop MapReduce, can iterate over datasets quickly in memory, and supports complex machine learning algorithms and graph algorithms.
A prior-art machine learning method proceeds as follows.
First, step 1), collection of raw data: data producers generate many types of data, such as log files, image data, and text data. Improper user operations or system problems introduce a great deal of noisy data, so erroneous data is hard to avoid. Multimedia data such as text and images additionally requires special tools for data loading.
Next, step 2), data preprocessing: the data collected in step 1) contains a great deal of dirty data, invalid data, and multimedia data, and must be processed by a series of measures. Hive, MR (MapReduce), and the preprocessing modules of Spark are commonly used to clean dirty data, fill in missing values, and so on. For multimedia data, third-party toolkits such as OpenCV and Word2vec convert it into data that a computer can process.
Next, step 3), feature engineering: feature engineering mainly comprises further processing of the preprocessed data, data formatting, sampling, data transformation, and feature design and selection. MR, the feature modules of Spark, and some specialized third-party tools are commonly used to process the data. The feature data output by this step is used for model training.
Next, step 4), model training: model training mainly models the actual business problem by data-driven methods. Spark MLlib, Mahout, and third-party packages such as sklearn are commonly used to build business models from the data, and the trained model is persisted.
Next, step 5), model deployment: model deployment mainly connects the trained model to online data and provides users with common machine learning services such as classification, regression, and recommendation.
Finally, framework dependencies: the entire Spark-based machine learning solution depends on big data file systems such as HDFS and S3, and on various big data processing and deployment tools, including Hadoop and Spark.
It can be seen that the overall architecture supporting prior-art Spark-based machine learning on big data is broad in scope and makes business work difficult. Prior-art technical solutions for Spark-based machine learning models are therefore very complex: the technology involved spans distributed computing, framework deployment, model computation, data mining, and more, and completing it takes substantial manpower and material resources. Every stage must frequently exchange data with the file system, which greatly reduces the performance of the whole system and in turn the reliability of modeling, prediction, and application. More importantly, programming flexibility, maintainability, and the reusability of code or components suffer considerably, which results in a poor user experience.
Summary of the invention
The object of the present invention is to disclose a machine learning device, a machine learning method based on the machine learning device, and a big data platform using the machine learning device and its machine learning method, so as to run the data mining and machine learning involved in standardized big data mining efficiently, simplify the development process for standardized big data, improve the development and deployment efficiency of standardized big data, and provide a highly unified interface.
To achieve the first object above, the invention provides a machine learning device, comprising:
a user-defined process module, a configuration module, and a database; and
an event server; wherein
the user-defined process module contains logic that can receive the executable file included in a user-initiated request and that is called by the event server; and
the database binds a front-end application to the executable file by means of a configuration file written by the configuration module.
As a further improvement of the invention, the user-defined process module includes an interface module, a business logic module, a service module, and a performance evaluation module.
As a further improvement of the invention, the business logic module contains at least one rule by which the executable file performs logical operations; the rules include machine learning algorithm rules, text data processing rules, and graphical user interface processing rules.
As a further improvement of the invention, the performance evaluation module obtains a machine learning algorithm model from the executable file included in the user-initiated request, according to the rules contained in the business logic module, and performs hyperparameter tuning on specified model parameters according to the user's scenario, so as to obtain the machine learning algorithm model parameters.
As a further improvement of the invention, the business logic module contains data preprocessing logic, feature engineering logic, model algorithm logic, and model deployment logic; wherein the model deployment logic includes a RESTful API and a database storage index.
As a further improvement of the invention, the machine learning device further includes an encryption module, which binds the RESTful API by means of an access key, so as to bind the configuration file to the executable file.
As a further improvement of the invention, the access key includes an Access key or a Secret key.
As a further improvement of the invention, the database is divided, by creating different data tables and according to the service request time type, into a first database, a second database, and a third database; wherein
the first database is used to store metadata;
the second database is used to store event types, configuration parameters, and model training parameters; and
the third database is used to store trained models.
As a further improvement of the invention, the database supports an HBase interaction mode, an Elasticsearch interaction mode, or a MySQL interaction mode.
As a further improvement of the invention, the event server includes an import engine, a processing engine, a model training engine, and a service provision engine.
As a further improvement of the invention, the executable file includes an executable program, a computer component, a system plug-in, a visual interface application, or a computer-executable document.
To achieve the second object above, the invention further provides a machine learning method, comprising the following steps:
S1: receiving, by the user-defined process module, the executable file included in a user-initiated request;
S2: calling the executable file into the event server;
S3: building a configuration file according to the user's environment variables;
S4: binding, in the database and according to the content of the configuration file, a front-end application to the executable file.
To achieve the third object above, the invention further provides a big data platform, including any of the machine learning devices described above and at least one platform engine, the platform engine including a Spark engine, a TensorFlow engine, or an MXNet engine.
Compared with the prior art, the invention has the following beneficial effects. Building the business logic from user-defined templates achieves adaptability and generality across application scenarios; runs the data mining and machine learning involved in standardized big data mining efficiently; simplifies the development process for standardized big data; improves its development and deployment efficiency; and provides a highly unified interface. Algorithm development, application development, and framework development can thus proceed as separate modules, which greatly improves both the deployment efficiency of the big data platform and the efficiency of data mining.
Brief description of the drawings
Fig. 1 is a structural diagram of a machine learning device of the invention;
Fig. 2 is a structural diagram of the event server in the machine learning device of Fig. 1;
Fig. 3 is a structural diagram of a machine learning device of the invention in a variant embodiment;
Fig. 4 is a flow chart of a machine learning method of the invention.
Detailed description of the embodiments
The invention is described in detail below with reference to the embodiments shown in the accompanying drawings. It should be noted that these embodiments do not limit the invention; equivalent transformations or substitutions of function, method, or structure made by those of ordinary skill in the art on the basis of these embodiments all fall within the scope of protection of the invention.
Embodiment one:
Refer to Fig. 1 and Fig. 2, which show one embodiment of a machine learning device of the invention.
In this embodiment, a machine learning device comprises a user-defined process module 1, a configuration module 4, a database 3, and an event server 2. User-defined process module 1 contains logic that can receive the executable file included in a user-initiated request and that is called by event server 2. Database 3 binds a front-end application to the executable file by means of a configuration file written by configuration module 4. Specifically, the executable file includes an executable program, a computer component, a system plug-in, a visual interface application, or a computer-executable document.
User-defined process module 1 includes an interface module 11, a business logic module 12, a service module 13, and a performance evaluation module 14. Specifically, as shown in Fig. 2, in this embodiment event server 2 includes an import engine 21, a processing engine 22, a model training engine 23, and a service provision engine 24. Import engine 21 is responsible for configuring data source parameters, for basic handling of pending data such as read and write operations, and for supporting data interaction with database 3. Processing engine 22 is responsible for performing text data processing on the pending data. Model training engine 23 is responsible for modeling the actual business problem by data-driven methods; tools such as Spark MLlib, Mahout, and third-party packages such as sklearn are commonly used to build business models from the data, and the trained model is persisted and saved to second database 32. Specifically, model training engine 23 supports both online and offline model training, which makes big data deployment more convenient for the user. Service provision engine 24 accepts the model in service module 13 and provides online services directly to the user over the network.
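By way of illustration only, the division of event server 2 into four engines can be sketched in Python as follows. The class and method names (`EventServer`, `import_data`, `process`, `train`, `serve`) are hypothetical and do not appear in this specification, and a simple word-frequency table stands in for a real model trained with Spark MLlib or sklearn.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EventServer:
    """Hypothetical stand-in for event server 2 and its four engines."""

    # A plain dict plays the role of the database that stores trained models.
    model_store: dict = field(default_factory=dict)

    def import_data(self, source):
        # Import engine 21: read the pending data from a configured source.
        return list(source)

    def process(self, records):
        # Processing engine 22: basic text processing (lowercase, tokenize).
        return [record.lower().split() for record in records]

    def train(self, name, token_lists):
        # Model training engine 23: fit a toy model and persist it.
        model = Counter(token for tokens in token_lists for token in tokens)
        self.model_store[name] = model
        return model

    def serve(self, name, word):
        # Service provision engine 24: answer a query from the stored model.
        return self.model_store[name][word]


server = EventServer()
records = server.import_data(["Spark MLlib", "spark streaming"])
server.train("word_counts", server.process(records))
print(server.serve("word_counts", "spark"))  # prints 2
```

Each engine is a separate method, so one stage can be replaced (for example, swapping the toy counter for a real training call) without touching the others.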
Business logic module 12 contains at least one rule by which the executable file performs logical operations; the rules include machine learning algorithm rules, text data processing rules, and graphical user interface processing rules. In this embodiment, adding the four kinds of logic mentioned above to business logic module 12 gives the whole user-defined process module 1 centralized business-processing logic.
Performance evaluation module 14 obtains a machine learning algorithm model from the executable file included in the user-initiated request, according to the rules contained in the business logic module, and performs hyperparameter tuning on specified model parameters according to the user's scenario, so as to obtain the machine learning algorithm model parameters. Performance evaluation module 14 allows the user to input rules on demand or according to self-defined rules, and to deploy the model online.
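The specification does not state which tuning strategy performance evaluation module 14 uses. As a minimal sketch under that caveat, an exhaustive grid search over user-specified parameters could look like the following; the function names and the toy scoring function are assumptions made purely for illustration.

```python
from itertools import product


def grid_search(score, param_grid):
    """Return the best-scoring parameter combination in param_grid."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    # Try every combination of the candidate values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params):
    # Hypothetical validation score that peaks at learning_rate=0.1, depth=3.
    return -abs(params["learning_rate"] - 0.1) - abs(params["depth"] - 3)


best_params, best_score = grid_search(
    toy_score,
    {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 3, 4]},
)
print(best_params)
```

In practice `toy_score` would be replaced by a function that trains the model with the given parameters and returns a validation metric for the user's scenario.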
Business logic module 12 contains data preprocessing logic 121, feature engineering logic 122, model algorithm logic 123, and model deployment logic 124; wherein model deployment logic 124 includes a RESTful API and a database storage index. A RESTful API is an interface that follows the REST software architecture style, which provides a set of design principles and constraints. It is mainly used for software in which clients interact with servers.
Database 3 is divided, by creating different data tables and according to the service request time type, into a first database 31, a second database 32, and a third database 33; wherein first database 31 is used to store metadata; second database 32 is used to store event types, configuration parameters, and model training parameters; and third database 33 is used to store trained models.
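The three-way division of database 3 can be sketched with three tables in a single store. The example below uses an in-memory SQLite database purely as a convenient stand-in; the table names, column names, and sample values are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metadata (app_id TEXT PRIMARY KEY, info TEXT);       -- first database 31
    CREATE TABLE events (app_id TEXT, event_type TEXT, params TEXT);  -- second database 32
    CREATE TABLE models (app_id TEXT, model TEXT);                    -- third database 33
""")

# Metadata about the app goes to the first table.
conn.execute(
    "INSERT INTO metadata VALUES (?, ?)",
    ("app-1", json.dumps({"name": "demo"})),
)
# Event type and training parameters go to the second table.
conn.execute(
    "INSERT INTO events VALUES (?, ?, ?)",
    ("app-1", "train", json.dumps({"max_iter": 10})),
)
# The trained model (serialized as JSON here) goes to the third table.
conn.execute(
    "INSERT INTO models VALUES (?, ?)",
    ("app-1", json.dumps({"weights": [0.5, 1.5]})),
)

row = conn.execute(
    "SELECT model FROM models WHERE app_id = ?", ("app-1",)
).fetchone()
model = json.loads(row[0])
print(model["weights"])
```

Keeping the three kinds of record in separate tables lets each be queried and scaled independently, which is the point of the division described above.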
Through the RESTful API, the executable file generated by user-defined process module 1 can be run, data query requests sent by users or administrators can be received, and the models, services, or applications output by event server 2 can be saved in second database 32.
When building the big data platform, a user or administrator can have any computer-readable data, such as apps, system plug-ins, programs, graphical user interfaces (GUI), and text data, captured by user-defined process module 1 to form a user-defined template. Using the configuration file imported by configuration module 4, the user-defined template can be packaged in database 3 together with a front-end web application, app, or service and stored in third database 33, so as to provide integrated big data services for subsequent services or applications. The front-end web application, app, or service can be written in languages such as Java, Python, PHP, or Ruby.
Specifically, in this embodiment, database 3 supports an HBase interaction mode, an Elasticsearch interaction mode, or a MySQL interaction mode, with the Elasticsearch interaction mode preferred. Elasticsearch is a search server based on Lucene. It provides a distributed, multi-user full-text search engine with a RESTful web interface. Elasticsearch is developed in Java, was released as open source under the terms of the Apache License, and can implement distributed full-text search.
In this embodiment, building the business logic from the user-defined template achieves adaptability and generality across application scenarios; runs the data mining and machine learning involved in standardized big data mining efficiently; simplifies the development process for standardized big data; improves its development and deployment efficiency; and provides a highly unified interface. Algorithm development, application development, and framework development can thus proceed as separate modules, which greatly improves both the deployment efficiency of the big data platform and the efficiency of data mining.
At the same time, the back-end business logic can be adapted, designed, and configured, and a request to create a new app can be submitted to event server 2 to determine information such as the app ID, app NAME, and Access Key. Once this information has been determined, the template can be built and deployed directly in database 3 to complete the binding of the template to the app. Event server 2 then compiles and trains the built and deployed template in turn, thereby generating the template. The generated template can be bound via the Access Key to computer-executable files developed by the front end, such as applications, programs, and plug-ins, so that business is separated from the framework.
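The app-creation and template-binding flow described above can be sketched as follows. This is an illustrative assumption rather than the implementation of the invention: `create_app` and `bind_template` are hypothetical names, and two in-memory dictionaries stand in for database 3.

```python
import secrets
import uuid

apps = {}       # Access Key -> app record (stand-in for database 3)
templates = {}  # app ID -> name of the bound template


def create_app(name):
    """Register a new app and issue its app ID and Access Key."""
    app = {
        "id": uuid.uuid4().hex,
        "name": name,
        "access_key": secrets.token_urlsafe(32),
    }
    apps[app["access_key"]] = app
    return app


def bind_template(access_key, template_name):
    """Bind a template to the app identified by access_key."""
    app = apps.get(access_key)
    if app is None:
        raise KeyError("unknown access key")
    templates[app["id"]] = template_name
    return app["id"]


app = create_app("recommendation-demo")
app_id = bind_template(app["access_key"], "recommendation-template")
print(templates[app_id])  # prints recommendation-template
```

Because the front end only ever holds the Access Key, the template behind it can be rebuilt or retrained without any change on the front-end side.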
During service deployment and development, business is separated from the framework. Algorithm engineers need only attend to algorithm logic and are responsible for developing the algorithm logic templates. Application developers need only take part in app and web development, providing the data access logic and data presentation logic. Framework developers need only attend to framework details. All business is triggered by events, and the events are defined by business staff; all model-related work is controlled by the configuration file that configuration module 4 writes to database 3. The division of labor across the whole big data platform is clear and deployment is simple. With the help of Spark's underlying computation and the big data ecosystem tools, the problems of redundant data storage, poor performance, and complex development processes found in conventional machine learning devices can be greatly reduced.
Embodiment two:
With reference to Fig. 3, this embodiment differs from embodiment one mainly in that the machine learning device further includes an encryption module 5, which binds the RESTful API by means of an access key, so as to bind the configuration file to the executable file. Preferably, the access key is an Access key; it may also be a Secret key. By binding the Access Key to the RESTful API service, the front-end application interacts with model time server 6 to complete data query services.
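As a minimal sketch of how a query service guarded by an access key might behave, the following function checks the key before returning data. The function name, the key value, and the stored predictions are all hypothetical; a real deployment would expose this logic behind an HTTP endpoint of the RESTful API.

```python
import hmac
import json

# Hypothetical key issued to one front-end application.
REGISTERED_KEYS = {"demo-app": "s3cr3t-access-key"}
# Toy model output keyed by user.
PREDICTIONS = {"user-42": ["item-7", "item-9"]}


def handle_query(app_name, access_key, user_id):
    """Return a (status, JSON body) pair for a query guarded by an access key."""
    expected = REGISTERED_KEYS.get(app_name, "")
    # compare_digest avoids leaking key contents through timing differences.
    if not expected or not hmac.compare_digest(
        expected.encode(), access_key.encode()
    ):
        return 401, json.dumps({"error": "invalid access key"})
    items = PREDICTIONS.get(user_id, [])
    return 200, json.dumps({"user": user_id, "items": items})


status, body = handle_query("demo-app", "s3cr3t-access-key", "user-42")
print(status, body)
```

A request with a missing or wrong key receives status 401 and never reaches the model output.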
For the technical features that this embodiment shares with embodiment one, refer to the description of embodiment one; they are not repeated here.
Embodiment three:
With reference to Fig. 4, this embodiment discloses a machine learning method, comprising the following steps:
S1: receiving, by the user-defined process module, the executable file included in a user-initiated request;
S2: calling the executable file into the event server;
S3: building a configuration file according to the user's environment variables;
S4: binding, in the database and according to the content of the configuration file, a front-end application to the executable file.
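Steps S1 to S4 can be illustrated with the following sketch, which reflects one possible reading of the method. Every function name, the environment variable names `ML_APP_NAME` and `ML_ENGINE`, and the use of a Python callable as the "executable file" are assumptions made for the example.

```python
def receive_executable(request):
    # S1: the user-defined process module extracts the executable from the request.
    return request["executable"]


def call_in_event_server(executable, payload):
    # S2: the event server invokes the executable on some input.
    return executable(payload)


def build_config(environ):
    # S3: build a configuration from the user's environment variables.
    return {
        "app_name": environ.get("ML_APP_NAME", "default-app"),
        "engine": environ.get("ML_ENGINE", "spark"),
    }


def bind(database, config, executable):
    # S4: record the binding between the front-end app and the executable.
    database[config["app_name"]] = {"config": config, "executable": executable}
    return database[config["app_name"]]


database = {}
request = {"executable": str.upper}  # a callable stands in for the executable file
executable = receive_executable(request)
result = call_in_event_server(executable, "hello")
config = build_config({"ML_APP_NAME": "demo"})
binding = bind(database, config, executable)
print(result, binding["config"])
```

In a real deployment `build_config` would typically be given `os.environ`, and the binding would be written to database 3 rather than to a dictionary.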
Embodiment four:
This embodiment discloses a big data platform, which includes one or more machine learning devices and at least one platform engine, the platform engine including a Spark engine, a TensorFlow engine, or an MXNet engine. Specifically, the platform selects a computing engine according to business needs: for scenarios based on multimedia data services such as images and video, a TensorFlow or MXNet engine is provided as the platform's computing framework; for business scenarios based on structured data, Spark is used as the platform's computing engine.
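The engine selection described above amounts to a lookup from workload type to candidate engines, as in the sketch below. The mapping keys and the function name are hypothetical, and the engine names are plain strings rather than handles to real engine instances.

```python
# Candidate engines per kind of workload, following the pairing described above.
ENGINES_BY_WORKLOAD = {
    "image": ("tensorflow", "mxnet"),
    "video": ("tensorflow", "mxnet"),
    "structured": ("spark",),
}


def select_engines(workload):
    """Return the candidate computing engines for a workload kind."""
    try:
        return ENGINES_BY_WORKLOAD[workload]
    except KeyError:
        raise ValueError(f"no engine configured for workload {workload!r}") from None


print(select_engines("structured"))  # prints ('spark',)
```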
For the machine learning device in this embodiment, refer to the description in embodiment one and/or embodiment two of this specification.
Spark is an open-source data processing platform built from a set of powerful, high-level libraries, currently Spark SQL, Spark Streaming, MLlib, and GraphX. It supports API calls from Scala, Java, Python, and R, and integrates well with the Hadoop ecosystem and its data sources.
Spark mainly comprises a structured data query and analysis engine (Spark SQL), a distributed machine learning library (MLlib), a parallel graph computation framework (GraphX), a stream computation framework (Spark Streaming), and third-party sub-projects (such as BlinkDB, Tachyon, and Mesos). MLlib is the Spark component responsible for machine learning and has multiple machine learning algorithms built in. Its commonly used modules include Classification, Regression, Clustering, Collaborative filtering, and Frequent pattern mining, along with common data preprocessing and feature engineering modules.
MXNet and TensorFlow are deep learning computing tools, commonly used to build deep learning frameworks based on multimedia data. Their advantage is that features need not be engineered by hand: given a business requirement, a multilayer neural network is built, and the multimedia data that the user is interested in is mined through a large number of iterative computations. Common business scenarios include security, checkpoint anomaly detection, and target recognition.
The detailed descriptions listed above are specific illustrations of feasible embodiments of the invention only. They are not intended to limit the scope of protection of the invention; all equivalent embodiments or changes made without departing from the technical spirit of the invention shall fall within its scope of protection.
It is obvious to those skilled in the art that the invention is not limited to the details of the exemplary embodiments above, and that the invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of the invention is defined by the appended claims rather than by the description above, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced by the invention. No reference sign in a claim should be construed as limiting the claim concerned.
Moreover, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way for clarity only. Those skilled in the art should treat the specification as a whole; the technical solutions of the embodiments may be suitably combined to form other embodiments that those skilled in the art can understand.