CN109828751A

CN109828751A - Integrated machine learning algorithm library and unified programming framework

Info

Publication number: CN109828751A
Application number: CN201910116872.7A
Authority: CN
Inventors: 郭昆; 郭文忠; 陈羽中; 郭鸿清
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2019-02-15
Filing date: 2019-02-15
Publication date: 2019-05-31

Abstract

The present invention relates to an integrated machine learning algorithm library and unified programming framework, comprising a model learning component, a model update component and a learning strategy component. Based on Batch Processing, the model learning component implements the logic of building an algorithm model from a fixed-size batch of data using a machine learning algorithm. Based on Timely Processing, the model update component implements the logic of updating a specific algorithm model with a dynamic data stream. The learning strategy component has a built-in strategy for judging whether the input data is batch data or stream data, and logic for scheduling the model learning component and the model update component to carry out dynamic learning. The invention overcomes the single-processing-mode limitation of conventional machine learning systems and facilitates the application of machine learning.

Description

Integrated machine learning algorithm library and unified programming framework

Technical field

The present invention relates to the field of computer technology, and in particular to an integrated machine learning algorithm library and unified programming framework.

Background Art

Machine learning has entered our lives extensively and is applied ever more frequently in daily life. In fact, machine learning is all around us: it sends us notification messages, such as trending microblog topics and the latest posts from people we follow; it finds traffic routes in mobile map apps; it filters spam; and it even safeguards the security of bank information. In recent years, machine learning has changed fundamentally, and machines can now learn with minimal human intervention. However, because machine learning involves many processing methods and complex workflows, practical applications must design and maintain two separate code bases, one for batch processing and one for stream processing, and there is no unified programming model that adapts to diverse concrete scenarios. This hinders the large-scale application of machine learning.

Summary of the invention

In view of this, the object of the present invention is to provide an integrated machine learning algorithm library and unified programming framework that overcomes the single-processing-mode limitation of conventional machine learning systems and facilitates the application of machine learning.

To achieve the above object, the present invention adopts the following technical scheme:

An integrated machine learning algorithm library and unified programming framework comprises a model learning component, a model update component and a learning strategy component. Based on Batch Processing, the model learning component implements the logic of building an algorithm model from a fixed-size batch of data using a machine learning algorithm. Based on Timely Processing, the model update component implements the logic of updating a specific algorithm model with a dynamic data stream. The learning strategy component has a built-in strategy for judging whether the input data is batch data or stream data, and logic for scheduling the model learning component and the model update component to carry out dynamic learning.
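The division of labor among the three components can be sketched as follows. This is a minimal, single-process Python illustration under stated assumptions, not the patented implementation: the class and method names (`ModelLearningComponent.fit`, `ModelUpdateComponent.update`, `LearningStrategyComponent.run`), the toy "model" (a mean) and the exponential-moving-average update rule are all hypothetical, and the batch/stream decision is passed in as `is_batch` rather than detected.

```python
from abc import ABC, abstractmethod
from itertools import islice
from typing import Iterable, List


class ModelLearningComponent(ABC):
    """Batch side: builds a model from a fixed-size batch of data."""

    @abstractmethod
    def fit(self, batch: List[float]) -> float: ...


class ModelUpdateComponent(ABC):
    """Stream side: incrementally updates an existing model with one record."""

    @abstractmethod
    def update(self, model: float, record: float) -> float: ...


class MeanLearner(ModelLearningComponent):
    # Hypothetical toy model: the "model" is just the batch mean.
    def fit(self, batch):
        if not batch:
            raise ValueError("cannot fit on an empty batch")
        return sum(batch) / len(batch)


class EmaUpdater(ModelUpdateComponent):
    # Hypothetical update rule: exponential moving average of the stream.
    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha

    def update(self, model, record):
        return (1 - self.alpha) * model + self.alpha * record


class LearningStrategyComponent:
    """Dispatches the same model type to the batch path or the stream path."""

    def __init__(self, learner, updater, warmup: int = 1):
        self.learner = learner
        self.updater = updater
        self.warmup = warmup  # records used to bootstrap a model in stream mode

    def run(self, data: Iterable[float], is_batch: bool) -> float:
        if is_batch:
            return self.learner.fit(list(data))
        it = iter(data)
        model = self.learner.fit(list(islice(it, self.warmup)))
        for record in it:
            model = self.updater.update(model, record)
        return model
```

In a real deployment the batch path would run on a batch platform and the stream path on a streaming engine; the point of the sketch is only that both paths share one model type, so one program can serve either input mode.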

Further, the model learning component includes a set of data conversion adapters.

Further, in use, the model learning component performs data preprocessing and algorithm model construction on the data set on the Hadoop, Spark or Flink platform. During preprocessing, the data conversion adapters pass data seamlessly between different platforms, enabling cross-platform model construction; streaming updates of the algorithm model are then performed on a stream data processing platform such as Spark Structured Streaming or Flink.
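The patent does not describe the adapter's interface. The sketch below assumes one possible shape: a registry of converter functions keyed by (source format, target format), with two hypothetical in-memory formats, row-oriented records and column-oriented arrays, standing in for platform-specific dataset representations. Real Hadoop, Spark or Flink data types are not modeled.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[Dict[str, object]]      # hypothetical "row-oriented" platform format
Columns = Dict[str, List[object]]   # hypothetical "column-oriented" platform format


def rows_to_columns(rows: Rows) -> Columns:
    # Assumes every row shares the schema of the first row.
    if not rows:
        return {}
    return {key: [row[key] for row in rows] for key in rows[0]}


def columns_to_rows(cols: Columns) -> Rows:
    keys = list(cols)
    n = len(cols[keys[0]]) if keys else 0
    return [{key: cols[key][i] for key in keys} for i in range(n)]


class DataConversionAdapter:
    """Registry of converters keyed by (source format, target format)."""

    def __init__(self) -> None:
        self._converters: Dict[Tuple[str, str], Callable] = {}

    def register(self, src: str, dst: str, fn: Callable) -> None:
        self._converters[(src, dst)] = fn

    def convert(self, data, src: str, dst: str):
        if src == dst:
            return data
        fn = self._converters.get((src, dst))
        if fn is None:
            raise ValueError(f"no converter registered from {src!r} to {dst!r}")
        return fn(data)


adapter = DataConversionAdapter()
adapter.register("rows", "columns", rows_to_columns)
adapter.register("columns", "rows", columns_to_rows)
```

Keying converters by format pairs keeps each platform's representation private to its own converter, so adding a platform means registering new pairs rather than changing the learning code.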

Further, the model update component is implemented on the Spark Structured Streaming or Flink streaming computation framework, with the execution engine specified at run time. The algorithm library inside the component provides, in stream processing mode, model update strategies for machine learning algorithms such as classification, regression and clustering.
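As one example of a stream-mode update strategy for clustering, the sketch below uses sequential (online) k-means, a standard technique swapped in here for illustration; the patent does not name the algorithms in its library. Each arriving point moves only its nearest centroid toward itself by a step of 1/n, where n is the number of points that centroid has absorbed, so each centroid stays the running mean of its assigned points. The class name `SequentialKMeans` and the choice to count each seed centroid as one point are assumptions.

```python
from typing import List, Sequence


class SequentialKMeans:
    """Online k-means: each arriving point updates only its nearest centroid."""

    def __init__(self, seeds: Sequence[Sequence[float]]):
        self.centroids: List[List[float]] = [list(c) for c in seeds]
        self.counts: List[int] = [1] * len(self.centroids)  # each seed counts as one point

    def nearest(self, x: Sequence[float]) -> int:
        def sq_dist(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(c, x))
        return min(range(len(self.centroids)), key=lambda i: sq_dist(self.centroids[i]))

    def update(self, x: Sequence[float]) -> int:
        """Fold one streaming point into the model; return its cluster index."""
        i = self.nearest(x)
        self.counts[i] += 1
        c = self.centroids[i]
        for d in range(len(c)):
            c[d] += (x[d] - c[d]) / self.counts[i]  # keeps c the mean of its points
        return i
```

Classification and regression would need analogous per-record rules, for example a stochastic-gradient step on a linear model.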

Further, the learning strategy component uses a workflow engine to schedule the model learning component and the model update component to complete batch machine learning or streaming machine learning tasks.

Further, the learning strategy component is provided with a monitor and a scheduler.

A real-time learning control method for the integrated machine learning algorithm library and unified programming framework comprises the following steps:

Step S1: the scheduler of the learning strategy component sends a start command to the model update component;

Step S2: the model update component begins listening to the stream data source and blocks itself after accumulating a predetermined amount of data;

Step S3: the scheduler of the learning strategy component starts the monitor, and the monitor sends a start command to the model learning component to begin learning;

Step S4: when the model learning component completes training of the algorithm model, it notifies the monitor that algorithm model training is complete; the monitor notifies the model update component to reactivate itself and continue listening to the stream data source, updating the model with the arriving data.
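Steps S1 to S4 can be simulated in a single thread to show the order of messages. This is a sketch under assumptions: blocking and reactivation are modeled as function returns and calls rather than real threads or a workflow engine, the toy model is a mean, and all class names and the `warmup` parameter (the "predetermined amount" of Step S2) are hypothetical.

```python
from itertools import islice
from typing import Iterable, List, Tuple


class ModelLearning:
    def train(self, batch: List[float]) -> float:
        if not batch:
            raise ValueError("no data accumulated for initial training")
        return sum(batch) / len(batch)  # hypothetical model: the mean


class ModelUpdate:
    def __init__(self, stream: Iterable[float], warmup: int):
        self.source = iter(stream)
        self.warmup = warmup
        self.seen = 0

    def start_listening(self) -> List[float]:
        # S2: accumulate a predetermined amount of data, then block (return).
        batch = list(islice(self.source, self.warmup))
        self.seen = len(batch)
        return batch

    def activate(self, model: float) -> float:
        # S4: resume listening and fold each arriving record into the model.
        for x in self.source:
            self.seen += 1
            model += (x - model) / self.seen  # incremental mean update
        return model


class Monitor:
    def __init__(self, learner: ModelLearning, updater: ModelUpdate, log: List[str]):
        self.learner, self.updater, self.log = learner, updater, log

    def start(self, batch: List[float]) -> float:
        self.log.append("monitor -> learner: start")          # S3
        model = self.learner.train(batch)
        self.log.append("learner -> monitor: training done")  # S4
        self.log.append("monitor -> updater: activate")
        return self.updater.activate(model)


def real_time_learning(stream: Iterable[float], warmup: int) -> Tuple[float, List[str]]:
    log: List[str] = []
    learner, updater = ModelLearning(), ModelUpdate(stream, warmup)
    log.append("scheduler -> updater: start")  # S1
    batch = updater.start_listening()          # S2
    monitor = Monitor(learner, updater, log)   # S3: scheduler starts the monitor
    return monitor.start(batch), log
```

Because the toy stream update is an incremental mean, the final model equals the mean of every record seen, which makes it easy to check that the batch-trained model and the stream updates compose correctly.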

A batch learning control method for the integrated machine learning algorithm library and unified programming framework comprises the following steps:

Step S1: the scheduler of the learning strategy component sends a start-listening command to the model update component;

Step S2: the model update component begins listening to the model data source of the model learning component;

Step S3: the scheduler of the learning strategy component starts the monitor, and the monitor sends a start command to the model learning component to begin learning;

Step S4: when the model update component detects that training of an algorithm model is complete, it notifies the monitor to start the next round of learning;

Step S5: after a preset number of learning rounds, when the monitor reaches the threshold on the number of learning rounds, it notifies the model update component that learning has ended and notifies the model learning component to integrate the final algorithm model.
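Steps S1 to S5 can likewise be simulated in a single thread. The sketch assumes that each round draws a random sample without replacement from the fixed-size batch, trains a toy per-round model (the sample mean), and that "integrating" the final model means averaging the per-round models; none of these choices, nor the names `batch_learning`, `sample_size` or `threshold`, come from the patent.

```python
import random
from typing import List, Sequence


class ModelLearning:
    def __init__(self, data: Sequence[float], sample_size: int, seed: int = 0):
        self.data = list(data)          # the fixed-size batch
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.models: List[float] = []   # model output source watched by the updater

    def learn_one_round(self) -> None:
        sample = self.rng.sample(self.data, self.sample_size)
        self.models.append(sum(sample) / len(sample))  # hypothetical per-round model

    def integrate(self) -> float:
        return sum(self.models) / len(self.models)  # hypothetical integration: average


class ModelUpdate:
    def __init__(self, learner: ModelLearning):
        self.learner = learner
        self.seen = 0
        self.listening = False

    def start_listening(self) -> None:                 # S1-S2
        self.listening = True

    def training_completed(self) -> bool:              # S4: poll the model output
        if self.listening and len(self.learner.models) > self.seen:
            self.seen = len(self.learner.models)
            return True
        return False

    def stop(self) -> None:                            # S5
        self.listening = False


def batch_learning(data: Sequence[float], sample_size: int,
                   threshold: int, seed: int = 0) -> float:
    learner = ModelLearning(data, sample_size, seed)
    updater = ModelUpdate(learner)
    updater.start_listening()            # S1-S2: scheduler -> updater
    rounds = 0                           # S3: scheduler starts the monitor
    while rounds < threshold:            # "rounds exceed threshold?" is "No"
        learner.learn_one_round()        # monitor -> learner: learn one round
        if updater.training_completed():  # updater -> monitor: training done
            rounds += 1
    updater.stop()                       # S5: monitor -> updater: learning ended
    return learner.integrate()           # monitor -> learner: integrate
```

Sampling with a fixed seed keeps runs reproducible; a real system would more likely run each round as a separate batch job and collect the resulting models from shared storage.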

Compared with the prior art, the invention has the following beneficial effects:

The integrated machine learning algorithm library and unified programming framework of the present invention overcomes the single-processing-mode limitation of conventional machine learning systems and facilitates the application of machine learning.

Description of the Drawings

Fig. 1 is a data flow diagram of the present invention;

Fig. 2 is a diagram of the logical interaction among components under real-time learning control in the present invention;

Fig. 3 is a diagram of the logical interaction among components under batch learning control in the present invention.

Detailed Description of the Embodiments

The present invention will be further described with reference to the accompanying drawings and embodiments.

Referring to Fig. 1, the present invention provides an integrated machine learning algorithm library and unified programming framework. The unified programming framework converts Batch Processing into a form of Timely Processing that can terminate: a fixed-size batch of data is sampled several times, a single algorithm model is learned and built immediately as each sample is completed, and the final algorithm model is obtained by integrating these models. In a specific implementation, the unified programming framework comprises a model learning component, a model update component and a learning strategy component. Based on Batch Processing, the model learning component implements the logic of building an algorithm model from a fixed-size batch of data using a machine learning algorithm. Based on Timely Processing, the model update component implements the logic of updating a specific algorithm model with a dynamic data stream. The learning strategy component has a built-in strategy for judging whether the input data is batch data or stream data, and logic for scheduling the model learning component and the model update component to carry out dynamic learning.
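The sample-then-integrate scheme above can be written compactly. The notation is introduced here, and the integration rule is an assumption, since the text does not say how the single models are combined: D is the fixed-size batch, K the number of sampling rounds, S_k the k-th sample of size m, A the learning algorithm and F the integrated model, using averaging for regression or majority voting for classification.

```latex
\begin{aligned}
&S_k \subset D,\quad |S_k| = m,\quad f_k = \mathcal{A}(S_k), \qquad k = 1,\dots,K,\\
&F(x) = \frac{1}{K}\sum_{k=1}^{K} f_k(x) \quad \text{(regression)}, \qquad
F(x) = \arg\max_{y} \sum_{k=1}^{K} \mathbf{1}\!\left[f_k(x) = y\right] \quad \text{(classification)}.
\end{aligned}
```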

In this embodiment, the model learning component includes a set of data conversion adapters. In use, the model learning component performs data preprocessing and algorithm model construction on the data set on the Hadoop, Spark or Flink platform. During preprocessing, the data conversion adapters pass data seamlessly between different platforms, enabling cross-platform model construction; streaming updates of the algorithm model are then performed on a stream data processing platform such as Spark Structured Streaming or Flink.

In this embodiment, the model update component is implemented on the Spark Structured Streaming or Flink streaming computation framework, with the execution engine specified at run time. The algorithm library inside the component provides, in stream processing mode, model update strategies for machine learning algorithms such as classification, regression and clustering.

In this embodiment, the learning strategy component is provided with a monitor and a scheduler.

Based on a Workflow, the learning strategy component completes batch machine learning or streaming machine learning tasks by scheduling the model learning component and the model update component. The scheduling process is as follows:

At the start, the learning strategy component starts a Learning Process, sets a flag Attribute "can terminate?", and starts a self-timed Activity. If the data in the data source does not change during the activity period, the flag "can terminate?" is set to "can terminate"; otherwise the input data is a changing data stream, and the flag is set to "cannot terminate".
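The terminability check can be sketched by fingerprinting the data source before and after the timed Activity. This is an illustration under assumptions: `read_source` is a hypothetical callable that returns a byte snapshot of the source, SHA-256 stands in for whatever change detection a real system would use, and the `sleep` parameter exists only so the wait can be skipped in tests.

```python
import hashlib
import time
from typing import Callable


def fingerprint(snapshot: bytes) -> str:
    """Digest of a data-source snapshot, used to detect change."""
    return hashlib.sha256(snapshot).hexdigest()


def detect_end_flag(read_source: Callable[[], bytes], window_s: float,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Set the "can terminate?" flag from a self-timed observation window."""
    before = fingerprint(read_source())
    sleep(window_s)  # the self-timed Activity
    after = fingerprint(read_source())
    # Unchanged source -> finite batch data; changed source -> live data stream.
    return "can terminate" if before == after else "cannot terminate"
```

Under this sketch, "can terminate" would route the job to the batch learning sub-process and "cannot terminate" to the real-time learning sub-process.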

If the flag "can terminate?" is "can terminate", the learning strategy component starts a Batch Learning Process (batch learning sub-process) and sends a Notification to the model update component, instructing it to listen to the model output source of the model learning component. It then starts a second automatic Activity, which starts the monitor and sets the flag "learning rounds exceed threshold?" to "No"; the monitor notifies the model learning component to perform sampling-based learning. When the model update component detects that model training is complete, it sends a Message to the monitor; on receipt, the monitor returns the flag "learning rounds exceed threshold?" as the Respond (response) and notifies the model learning component to perform the next round of sampling-based learning. When the monitor detects that the number of learning rounds has reached the set threshold, it sets "learning rounds exceed threshold?" to "Yes", and the model update component stops after receiving this response. Finally, the monitor notifies the model learning component to integrate all generated algorithm models, and the learning process ends.

If the flag "can terminate?" is "cannot terminate", the learning strategy component starts a Real-time Learning Process (real-time learning sub-process) and sends a Notification to the model update component, which starts listening to the stream data source. It then starts a second automatic Activity, which starts the monitor and sets the flag "start learning?" to "No". After accumulating a certain amount of data, the model update component sends a Message to the monitor and waits for a reply. On receiving the message, the monitor notifies the model learning component to perform the first round of learning. When the monitor detects that algorithm model training is complete, it sets "start learning?" to "Yes" and returns it to the model update component as the Respond. After receiving the response, the model update component continuously reads stream data and updates the algorithm model.

Referring to Fig. 2, a real-time learning control method for the integrated machine learning algorithm library and unified programming framework comprises the following steps:

Step S1: the scheduler of the learning strategy component sends a start command to the model update component;

Step S2: the model update component begins listening to the stream data source and blocks itself after accumulating a predetermined amount of data;

Step S3: the scheduler of the learning strategy component starts the monitor, and the monitor sends a start command to the model learning component to begin learning;

Step S4: when the model learning component completes training of the algorithm model, it notifies the monitor that model training is complete; the monitor notifies the model update component to reactivate itself and continue listening to the stream data source, updating the algorithm model with the arriving data.

Referring to Fig. 3, a batch learning control method for the integrated machine learning algorithm library and unified programming framework comprises the following steps:

Step S1: the scheduler of the learning strategy component sends a start-listening command to the model update component;

Step S2: the model update component begins listening to the model data source of the model learning component;

Step S3: the scheduler of the learning strategy component starts the monitor, and the monitor sends a start command to the model learning component to begin learning;

Step S4: when the model update component detects that training of an algorithm model is complete, it notifies the monitor to start the next round of learning;

Step S5: after a preset number of learning rounds, when the monitor reaches the threshold on the number of learning rounds, it notifies the model update component that learning has ended and notifies the model learning component to integrate the final algorithm model.

The foregoing are merely preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.

Claims

1. An integrated machine learning algorithm library and unified programming framework, characterized by comprising a model learning component, a model update component and a learning strategy component; based on Batch Processing, the model learning component implements the logic of building an algorithm model from a fixed-size batch of data using a machine learning algorithm; based on Timely Processing, the model update component implements the logic of updating a specific algorithm model with a dynamic data stream; and the learning strategy component has a built-in strategy for judging whether the input data is batch data or stream data, and logic for scheduling the model learning component and the model update component to carry out dynamic learning.

2. The integrated machine learning algorithm library and unified programming framework according to claim 1, characterized in that the model learning component comprises a set of data conversion adapters.

3. The integrated machine learning algorithm library and unified programming framework according to claim 2, characterized in that, in use, the model learning component performs data preprocessing and algorithm model construction on the data set on the Hadoop, Spark or Flink platform; during preprocessing, the data conversion adapters transfer the data set seamlessly between different platforms, enabling cross-platform model construction; and streaming updates of the algorithm model are then performed on the Spark Structured Streaming or Flink stream data processing platform.

4. The integrated machine learning algorithm library and unified programming framework according to claim 1, characterized in that the model update component is implemented on the Spark Structured Streaming or Flink streaming computation framework, with the execution engine specified at run time, and the algorithm library inside the component provides, in stream processing mode, model update strategies for classification, regression and clustering machine learning algorithms.

5. The integrated machine learning algorithm library and unified programming framework according to claim 1, characterized in that the learning strategy component, based on a workflow engine, completes batch machine learning and streaming machine learning tasks by scheduling the model learning component and the model update component.

6. The integrated machine learning algorithm library and unified programming framework according to claim 1, characterized in that the learning strategy component is provided with a monitor and a scheduler.

7. A real-time learning control method for an integrated machine learning algorithm library and unified programming framework, characterized by comprising the following steps:

Step S4: when the model learning component completes training of the algorithm model, it notifies the monitor that algorithm model training is complete; the monitor notifies the model update component to reactivate itself and continue listening to the stream data source, updating the algorithm model with the arriving data.

8. A batch learning control method for an integrated machine learning algorithm library and unified programming framework, characterized by comprising the following steps: