CN105956021B - Automated task parallelization method and system for distributed machine learning - Google Patents

Automated task parallelization method and system for distributed machine learning

Info

Publication number
CN105956021B
CN105956021B CN201610255970.5A CN201610255970A
Authority
CN
China
Prior art keywords
node
module
stage
key
variable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610255970.5A
Other languages
Chinese (zh)
Other versions
CN105956021A (en)
Inventor
廖小飞
曹镇山
郭人通
刘海坤
金海
陆枫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201610255970.5A
Publication of CN105956021A
Application granted
Publication of CN105956021B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24532Query optimisation of parallel queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2323Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Discrete Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Multi Processors (AREA)

Abstract

The present invention provides an automated task parallelization method and system for distributed machine learning, addressing a defect of existing distributed machine learning programming interfaces: because only a key-value read/write interface is provided, the system's data access behavior is tightly coupled with the application logic. This defect aggravates contention for network bandwidth in distributed clusters and makes it difficult for programmers to parallelize tasks. The system of the present invention comprises a worker node module, a service node module, a master node module, a tensor module, a scheduler module, a message tracking module, a stage module, a stage group module, and an execution engine module. By providing a higher-level programming abstraction, the present invention decouples read/write access behavior from the application logic; the runtime system first performs dynamic task division according to the load on the service nodes, and then automatically executes the machine learning tasks in parallel, significantly reducing the burden on programmers of writing highly concurrent machine learning applications.

Description

Automated task parallelization method and system for distributed machine learning
Technical field
The invention belongs to the intersecting field of distributed computing and machine learning, and in particular relates to an automated task parallelization method and system for distributed machine learning.
Background art
As a conventional method of mining value from data, machine learning algorithms are widely used in fields such as natural language processing, text analysis, speech recognition, autonomous driving, and bioinformatics. With the arrival of the big data era, the value of data, especially the commercial value it contains, has become increasingly prominent, and machine learning has therefore received growing attention. However, as the scale of the data and of the model parameters to be learned grows, a single compute node, owing to the finiteness of its memory, computing resources, and memory bandwidth, can no longer satisfy the demands of large-scale machine learning. Distributing traditional single-node machine learning has become a new and necessary trend. Once machine learning is distributed, more compute nodes can be used to process larger data sets, the time required to train a model is shortened, and the accuracy of the learned model is improved. Distributed machine learning has received widespread attention in both industry and academia. For example, Google trained a cat-face recognition model with its distributed system DistBelief, the Apache Software Foundation developed Mahout, an open-source machine learning framework based on Hadoop, and the UC Berkeley AMP Lab developed Spark, a distributed computing system applicable to machine learning algorithms.
Most distributed machine learning algorithms are iterative in nature: the training process terminates after the iteration has run a predetermined number of times or the model parameters have converged to a stable state. Traditional distributed frameworks such as MapReduce, owing to the defects of their synchronization mechanisms, perform poorly on iterative computations, so their performance falls short of expectations.
The novel distributed architecture for machine learning is the parameter server architecture. The parameters referred to here are the key-value pairs (key, value), two-dimensional matrices, or multi-dimensional matrices used to describe model parameters in machine learning; a multi-dimensional matrix is also called a tensor. In the parameter server architecture, the compute nodes in the cluster are divided into two classes: one class is called worker nodes, the other service nodes. The service nodes are responsible for maintaining the global model parameters, including answering worker-node queries on and updates to the model parameters. A worker node loads a partition of the global training data set into local memory, uses the algorithm specified by the application logic to determine which model parameters are needed, initiates a query to the service nodes, and transfers the required model parameters over the network into local memory; it then uses the application-specified algorithm and the retrieved parameters to compute a new model parameter w or a parameter update Δw. After one round of iteration, the worker node initiates an update to the service nodes to synchronize the global model parameters. The behavior of a worker node in one complete iteration of distributed machine learning can be summarized in the following steps:
1. The worker node loads its partition of the data set;
2. The worker node computes which model parameters it needs and obtains them through the model access interface provided by the underlying layer;
3. The worker node computes the new model parameter w or the parameter update Δw according to the application logic;
4. The worker node pushes the newly computed model parameter w or update Δw to the service nodes for parameter update and synchronization.
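For concreteness, the following is a minimal, self-contained Python sketch of this worker loop. FakeServer stands in for the service nodes, and all names (pull, push, worker_loop, and the toy squared-error update) are illustrative assumptions rather than the interface of any particular parameter server:

    # A toy stand-in for the service nodes; in a real system pull/push
    # would be network calls to remote parameter servers.
    class FakeServer:
        def __init__(self, params):
            self.params = dict(params)            # global model: key -> value

        def pull(self, keys):                     # step 2: fetch needed slice
            return {k: self.params[k] for k in keys}

        def push(self, delta):                    # step 4: apply updates Δw
            for k, dw in delta.items():
                self.params[k] += dw

    def worker_loop(server, local_data, max_iters, lr=0.1):
        for _ in range(max_iters):
            needed = {k for k, _ in local_data}   # keys this round touches
            w = server.pull(needed)
            # step 3: application logic computes the update Δw
            # (here: gradient descent pulling each w[k] toward a target)
            delta = {k: -lr * (w[k] - t) for k, t in local_data}
            server.push(delta)

    server = FakeServer({"w0": 0.0, "w1": 0.0})
    worker_loop(server, [("w0", 1.0), ("w1", -2.0)], max_iters=50)
    print(server.params)                          # w0 -> ~1.0, w1 -> ~-2.0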
Steps 2, 3, and 4 above are the key steps of the iterative computation: obtaining the model parameters needed for computation through the global model read/write access interface, and pushing the newly computed parameters or parameter updates to the service nodes, are the main sources of network traffic in the system.
For step 2, because the model parameters are huge, the resulting network traffic is also huge. With a fixed amount of network bandwidth, the network transmission delay within an iteration can, for a single worker node, exceed the computation time, lengthening the total model training time; when multiple worker nodes trigger network transfers simultaneously, bandwidth contention makes the delay even longer. The parameter access behavior triggered by a worker node is closely tied to the upper-layer application logic. The low-level interface provided by current parameter server architectures is a unified interface for global parameter access, so the system's global-parameter access behavior is tightly coupled with the application logic, which is unfavorable for optimization at the system level.
For step 3, computing the model parameters on the worker node is a compute-intensive operation. In the current many-core and multi-core era, maximizing the parallelism of this computation is crucial for improving system concurrency. Current distributed machine learning systems do not provide a corresponding parallelization programming interface; they only provide a global model read/write access interface, so only programmers with parallel programming experience can write highly concurrent machine learning applications.
For step 4, two kinds of solutions exist for the network bottleneck in parameter synchronization. One is to change the synchronization model, i.e., to allow the iteration progress of different worker nodes to differ within a bound, and to perform bulk synchronization (BSP, Bulk Synchronous Parallel) only after the difference in progress reaches a threshold; this alleviates network bandwidth contention to some extent. The other is to control the resource occupancy of the parameter server by choosing different synchronization intervals for different worker nodes so as to avoid bursts of requests, while guaranteeing that the chosen intervals both reduce communication frequency and preserve training accuracy.
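As an illustration of the first solution, the staleness check could look like the following sketch; the function name and threshold semantics are assumptions made for exposition:

    # A worker may run ahead of the others, but must stop and bulk-
    # synchronize once its iteration count leads the slowest worker
    # by more than `threshold` iterations (hypothetical check).
    def may_proceed(my_iter, all_iters, threshold):
        return my_iter - min(all_iters) <= threshold

    print(may_proceed(my_iter=7, all_iters=[5, 6, 7], threshold=3))  # True
    print(may_proceed(my_iter=9, all_iters=[5, 9, 9], threshold=3))  # False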
Summary of the invention
In view of the above drawbacks of the prior art and the need for improvement, the present invention provides an automated task parallelization method and system for distributed machine learning. First, by decoupling the model-parameter access interface from the application logic, the system's access behavior for model parameters becomes adjustable at run time, which lays the foundation for optimizations such as network transmission and system-level parallelization. Second, the application is logically decomposed into several stages, from which a directed acyclic graph (DAG) is constructed to describe the dependencies between the computation stages; the runtime system then automatically divides tasks according to the DAG and executes them in parallel, improving system concurrency. The above method and system effectively solve the network transmission bottleneck in existing distributed machine learning systems and improve system concurrency, thereby improving overall system performance.
To achieve the above goals, according to one aspect of the present invention, an automated task parallelization method and system for distributed machine learning is provided, specifically comprising a worker node module, a service node module, a master node module, a tensor module, a scheduler module, a message tracking module, a stage module, a stage group module, and an execution engine module. The stage module and the scheduler module are both connected with the tensor module; the stage module is connected with the stage group module; the execution engine module is connected with the stage module; the scheduler module, the tensor module, and the stage group module are all connected with the message tracking module.
The worker node module and the service node module are abstract descriptions of the behavior of worker nodes and parameter service nodes, respectively; both modules are transparent to the machine learning programmer.
The master node module is an abstract description of the master node. The role of the master node is to coordinate the workflow of the whole system, such as system initialization and system shutdown. Of the system modules mentioned above, all modules other than the worker node module, the service node module, and the master node module are present on all nodes.
The tensor module is used to describe the key-value pairs (key, value) of model parameters in machine learning. An application needs several tensor objects to describe the model parameters required for training; each tensor object has a tensor_id attribute as its unique identifier. Tensor objects come in three types: globally shared (global shared), globally unique (global unique), and local (local). Globally shared means the tensor object is maintained by distributed nodes and the data maintained by different nodes may intersect; globally unique means the tensor object is maintained by distributed nodes and the data maintained by different nodes do not intersect; local means the tensor object exists only on a single node. Tensor objects provide operation interfaces such as load, pull, and push for the programmer to use.
The stage module is used to describe a section of program logic in the application. The present invention decomposes the overall logic of the application into different stages, and each stage object contains a stage_id attribute as its unique identifier. Dependencies between stage objects are set through the dependency function set_dependency. A stage object takes several tensor objects as its input and one optional output. A stage's inputs come in two types: one is called the primary variable primary_variable, the other the auxiliary variable secondary_variable. The (key, value) pairs of a primary variable have no dependencies between keys, whereas the (key, value) pairs of an auxiliary variable do have dependencies between keys. For a stage, the programmer must provide a kernel function kernel_function as the core logic of the stage. The programmer must also provide a mapping function key_projection between the keys of the primary variable and the keys of the auxiliary variable; the runtime system automatically derives the keys of the auxiliary variable from the keys of the primary variable and the key_projection function. Each primary and auxiliary variable of a stage has a corresponding variable called update_variable, which is used to update the corresponding variable; its update logic is defined by a user-provided update_function.
The stage group module is used to describe a group of stages that are closely connected. A stage group has a group_id attribute as its unique identifier and two interfaces, run and set_barrier. The run method takes an optional integer parameter num_run that specifies how many times the stage group is executed. The set_barrier interface sets a synchronization operation: after the current stage group has finished executing, all worker nodes must enter a barrier wait state, and execution continues only after the stage group has finished on all worker nodes.
The scheduler module decides, for a given tensor object, the set of keys key_set that a worker node is to process in the next stage. The scheduler module on each service node periodically broadcasts the bandwidth information of its node; the scheduler module on a worker node, according to the service-node bandwidth information it has obtained, decides the set of model-parameter keys to assign to the worker node for the next round of processing.
The execution engine module describes the stages of a stage group and their dependencies as a directed acyclic graph (DAG). In this DAG, the nodes represent stages and the edges represent dependencies between stages; the stage at the tail of an edge must execute before the stage at its head.
The message tracking module records the messages submitted at run time by the tensor module, the scheduler module, the stage group module, the worker node module, the service node module, and the master node module. When a message is submitted to the message tracking module, it is responsible for delivering the message to the recipient; after the recipient returns a receipt, the message tracking module notifies the original sender and delivers the acknowledgement message.
Correspondingly, the present invention also provides an automated task parallelization method for distributed machine learning, used for automatic task division and automatic parallel execution in distributed machine learning scenarios, comprising a system initialization step, a parallel training step, and a system finishing step, in which:
(1) System initialization step: initialize the node topology information and the application logic, specifically comprising the following sub-steps:
(1.1) All nodes start running and read the configuration file to determine their own role, the role being worker node, service node, or master node; go to sub-step (1.2);
(1.2) The worker nodes and service nodes communicate with the master node and report their node information; the master node broadcasts the collected node information to all other nodes; go to step (1.3);
(1.3) After the worker nodes and service nodes receive the node information sent by the master node, they initialize the node topology information for subsequent inter-node communication; go to step (1.4);
(1.4) The worker nodes and service nodes initialize the application logic: the runtime system determines the execution order of the stage groups from the order in which they appear in the program code, and constructs the DAG corresponding to each stage group; go to step (2);
(2) Parallel training step: the master node and service nodes skip the specific training logic and proceed to step (3); each worker node enters the model training state and performs iterative parallel training on its input training data subset until the predefined iteration termination condition is met. The behavior of a worker node specifically comprises the following sub-steps:
(2.1) For each stage group whose order has been determined, the runtime system of the worker node topologically sorts the nodes of its DAG to determine the execution order of all stages within the stage group; go to step (2.2);
(2.2) Call the stage group that has not yet been executed next_group; set next_group to the stage group that appears first in the machine learning application logic; go to step (2.3), or go to step (2.6) if there is currently no unexecuted stage group;
(2.3) Execute the run method of the stage group denoted by next_group num_run times (num_run is a parameter supplied by the user when starting the program). For a single invocation of the run method, the runtime creates a batch of threads and, following the stage execution order within the stage group determined in step (2.1), executes the run methods of all stages in turn: stages with smaller numbers execute first, and stages with the same number execute in a pipelined fashion. After the stage group has executed num_run times, go to step (2.4);
(2.4) The runtime system of the worker node checks whether the stage group that has just executed num_run times has set_barrier set; if so, it performs a barrier synchronization; go to step (2.5);
(2.5) If there is a stage group that has not yet executed, set next_group to that stage group and go to step (2.3); otherwise go to step (2.6);
(2.6) The runtime system of the worker node checks whether the iteration termination condition has been reached; if so, go to step (3), otherwise go to step (2.1);
(3) System finishing step: the worker nodes inform the master node that their work is complete; after the master node detects that all worker nodes have finished, it notifies all nodes to exit the program in a coordinated way, specifically comprising the following sub-steps:
(3.1) Every worker node sends a job_done message to the master node; after the master node has received the job_done messages of all worker nodes, it sends a sys_exit message to all worker nodes and service nodes; go to step (3.2);
(3.2) After the worker nodes and service nodes receive the sys_exit message, they send a sys_exit_ack message to the master node; go to step (3.3);
(3.3) The master node receives the sys_exit_ack messages sent by all worker nodes and service nodes; go to step (3.4);
(3.4) All nodes terminate the program.
In step (2.1) above, the process of determining the execution order of the stages within a stage group specifically comprises the following sub-steps:
(2.1.1) Set the current unassigned number order to 0, and set the set nodes of nodes with current in-degree 0 to empty; go to step (2.1.2);
(2.1.2) Add the nodes whose in-degree in the current DAG is 0 to the set nodes; number all nodes in nodes with order, then increment order by 1; remove the nodes in nodes and all their outgoing edges from the DAG, and set nodes back to empty; go to step (2.1.3);
(2.1.3) If the current DAG is empty, go to step (2.2); otherwise go to step (2.1.2).
The run method of a stage described in step (2.3) above specifically comprises the following sub-steps:
(2.3.1) The runtime system of the worker node calls the prepare_variables method of the stage to determine the key set primary_key_set of the primary variable primary_variable to be processed in the current stage. Specifically, according to the service-node load conditions (L1, L2, ..., Ln) obtained from the scheduler module and the distribution of the keys of primary_variable across the service nodes, the portion of keys maintained on the least network-loaded service node that has not yet been processed by the worker node is assigned to the worker node as the next key set primary_key_set to process; go to sub-step (2.3.2);
(2.3.2) From the user-provided key_projection function and the primary-variable key set primary_key_set determined in (2.3.1), the runtime system derives the auxiliary-variable key set secondary_key_set, and calls the pull methods of the tensor objects such as the primary and auxiliary variables to pull the required model parameters; go to step (2.3.3);
(2.3.3) Execute the kernel function kernel_function of the stage: the runtime system automatically divides the key set key_set of the primary variable into num_threads parts and creates num_threads threads to execute the kernel function in parallel, where num_threads is a user-provided parameter; go to step (2.3.4);
(2.3.4) Running the kernel function kernel_function produces the update variable v_update; the runtime system updates the corresponding variable v according to the user-provided update_function. If the type of variable v is globally shared or globally unique, the runtime calls the push function of variable v: the (key, value) pairs to be updated are serialized, and the serialized data is sent to all service nodes that maintain that key range; after a service node receives the update data, it updates the data it maintains.
The pull method described in step (2.3.2) above has the following sub-steps:
(2.3.2.1) Serialize the key set key_set to be pulled for the tensor object and send the serialized data to the service nodes that maintain that key range; go to step (2.3.2.2);
(2.3.2.2) After a service node receives the pull message, it serializes the (key, value) pairs corresponding to key_set and returns the serialized data to the requester.
Through the above method, the technical scheme conceived by the present invention has, in general, the following advantages and technical effects compared with the prior art:
(1) The present invention provides programming modules at a higher level of abstraction than the global read/write access interface. These modules decouple the read/write access behavior from the application logic, which on the one hand greatly simplifies writing application programs, and on the other hand lays the foundation for system-level optimization;
(2) The present invention realizes automated parallel execution of machine learning tasks, which significantly reduces the burden on application programmers of writing highly concurrent machine learning applications;
(3) The runtime system developed by the present invention automatically divides tasks dynamically according to the load of each service node, making full use of network bandwidth resources.
Brief description of the drawings
Fig. 1 is a module block diagram of the automated task parallelization system of the present invention;
Fig. 2 is the overall workflow diagram of the automated task parallelization method of the present invention;
Fig. 3 is the system-initialization sub-workflow diagram of the automated task parallelization method of the present invention;
Fig. 4 is the parallel-training sub-workflow diagram of the automated task parallelization method of the present invention;
Fig. 5 is the system-finishing sub-workflow diagram of the automated task parallelization method of the present invention;
Specific embodiments
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific examples described herein are only used to explain the present invention and are not intended to limit it. In addition, the technical features involved in the various embodiments of the present invention described below may be combined with each other as long as they do not conflict.
Fig. 1 is the module block diagram of the automated task parallel execution system of the present invention. As shown in Fig. 1, the automated task parallelization system of the invention specifically comprises a worker node module, a service node module, a master node module, a tensor module, a scheduler module, a message tracking module, a stage module, a stage group module, and an execution engine module. The stage module and the scheduler module are both connected with the tensor module; the stage module is connected with the stage group module; the execution engine module is connected with the stage module; the scheduler module, the tensor module, and the stage group module are all connected with the message tracking module.
The worker node module and the service node module are abstract descriptions of the behavior of worker nodes and parameter service nodes, respectively; both modules are transparent to the machine learning programmer.
The master node module is an abstract description of the master node. The role of the master node is to coordinate the workflow of the whole system, such as system initialization and system shutdown. Of the system modules mentioned above, all modules other than the worker node module, the service node module, and the master node module are present on all nodes.
The tensor module is used to describe the key-value pairs (key, value) of model parameters in machine learning. An application needs several tensor objects to describe the model parameters required for training; each tensor object has a tensor_id attribute as its unique identifier. Tensor objects come in three types: globally shared (global shared), globally unique (global unique), and local (local). Globally shared means the tensor object is maintained by distributed nodes and the data maintained by different nodes may intersect; globally unique means the tensor object is maintained by distributed nodes and the data maintained by different nodes do not intersect; local means the tensor object exists only on a single node. Tensor objects provide operation interfaces such as load, pull, and push for the programmer to use.
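In Python, the interface of a tensor object might be sketched as follows; the class, the enum, and the in-memory method bodies are assumptions made for exposition, not the patent's actual implementation:

    from enum import Enum

    class TensorType(Enum):
        GLOBAL_SHARED = 1   # maintained by distributed nodes, data may overlap
        GLOBAL_UNIQUE = 2   # maintained by distributed nodes, no overlap
        LOCAL = 3           # exists only on a single node

    class Tensor:
        """Illustrative sketch of the tensor module's object interface."""
        def __init__(self, tensor_id, ttype):
            self.tensor_id = tensor_id          # unique identifier
            self.type = ttype
            self.data = {}                      # (key, value) pairs

        def load(self, pairs):
            """Load (key, value) pairs, e.g. initial model parameters."""
            self.data.update(pairs)

        def pull(self, key_set):
            """Fetch values for key_set (locally here; over the network
            from the maintaining service nodes in the real system)."""
            return {k: self.data[k] for k in key_set if k in self.data}

        def push(self, updates):
            """Send updated (key, value) pairs to their maintainers."""
            self.data.update(updates)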
The stage module is used to describe a section of program logic in the application. The present invention decomposes the overall logic of the application into different stages, and each stage object contains a stage_id attribute as its unique identifier. Dependencies between stage objects can be set through the dependency function set_dependency. A stage object takes several tensor objects as its input and one optional output. A stage's inputs come in two types: one is called the primary variable primary_variable, the other the auxiliary variable secondary_variable. The (key, value) pairs of a primary variable have no dependencies between keys, whereas the (key, value) pairs of an auxiliary variable do have dependencies between keys. For a stage, the programmer must provide a kernel function kernel_function as the core logic of the stage. The programmer must also provide a mapping function key_projection between the keys of the primary variable and the keys of the auxiliary variable; the runtime system automatically derives the keys of the auxiliary variable from the keys of the primary variable and the key_projection function. Each primary and auxiliary variable of a stage has a corresponding variable called update_variable, which is used to update the corresponding variable; its update logic is defined by a user-provided update_function.
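A hypothetical declaration of a stage object is sketched below; beyond the names given in the text (stage_id, set_dependency, kernel_function, key_projection, update_function), the constructor signature and field names are assumptions:

    class Stage:
        """Illustrative sketch of a stage object."""
        def __init__(self, stage_id, primary_variable, secondary_variable,
                     kernel_function, key_projection, update_function):
            self.stage_id = stage_id
            self.primary = primary_variable         # keys are independent
            self.secondary = secondary_variable     # keys carry dependencies
            self.kernel = kernel_function           # core logic of this stage
            self.key_projection = key_projection    # primary keys -> secondary keys
            self.update_function = update_function  # applies update_variable
            self.deps = []

        def set_dependency(self, other):
            """Declare that this stage depends on `other` (a DAG edge)."""
            self.deps.append(other)

A stage that, say, aggregates neighbour values would pass a key_projection mapping each primary key to the keys of its neighbours, and the runtime would derive secondary_key_set from it automatically.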
The stage group module is used to describe a group of stages that are closely connected. A stage group has a group_id attribute as its unique identifier and two interfaces, run and set_barrier. The run method takes an optional integer parameter num_run that specifies how many times the stage group is executed. The set_barrier interface sets a synchronization operation: after the current stage group has finished executing, all worker nodes must enter a barrier wait state, and execution continues only after the stage group has finished on all worker nodes.
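The two interfaces of a stage group might be sketched as follows; stage.run() and the cluster-wide barrier are assumed primitives and are stubbed out here:

    class StageGroup:
        """Illustrative sketch; the member stages are assumed to be
        already ordered by the topological numbering of step (2.1)."""
        def __init__(self, group_id, stages):
            self.group_id = group_id
            self.stages = stages
            self.barrier = False

        def set_barrier(self):
            self.barrier = True

        def run(self, num_run=1):
            for _ in range(num_run):
                for stage in self.stages:   # dependency order
                    stage.run()             # assumed per-stage entry point
            if self.barrier:
                # In the real system every worker blocks here until all
                # workers have finished this group; stubbed out locally.
                pass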
The scheduler module decides, for a given tensor object, the set of keys key_set that a worker node is to process in the next stage. The scheduler module on each service node periodically broadcasts the bandwidth information of its node; the scheduler module on a worker node, according to the service-node bandwidth information it has obtained, decides the set of model-parameter keys to assign to the worker node for the next round of processing.
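The scheduler's decision can be illustrated with the following runnable sketch, where the load figures and key layout are made-up inputs:

    # Assign the worker the keys maintained by the least-loaded service
    # node that this worker has not yet processed (illustrative only).
    def next_key_set(loads, keys_by_server, processed):
        """loads: {server: reported load}; keys_by_server: {server: keys
        it maintains}; processed: keys this worker already handled."""
        lightest = min(loads, key=loads.get)
        return keys_by_server[lightest] - processed

    loads = {"s1": 0.9, "s2": 0.3}
    keys_by_server = {"s1": {1, 2, 3}, "s2": {4, 5, 6}}
    print(next_key_set(loads, keys_by_server, processed={4}))   # {5, 6}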
The execution engine module describes the stages of a stage group and their dependencies as a directed acyclic graph (DAG). In this DAG, the nodes represent stages and the edges represent dependencies between stages; the stage at the tail of an edge must execute before the stage at its head.
The message tracking module records the messages submitted at run time by the tensor module, the scheduler module, the stage group module, the worker node module, the service node module, and the master node module. When a message is submitted to the message tracking module, it is responsible for delivering the message to the recipient; after the recipient returns a receipt, the message tracking module notifies the original sender and delivers the acknowledgement message.
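The submit/deliver/receipt/acknowledge flow of the message tracking module can be illustrated with the following sketch; the class and method names are assumptions:

    class Endpoint:
        """A toy node-side endpoint (illustrative)."""
        def __init__(self, name):
            self.name = name

        def deliver(self, msg):               # recipient side
            print(f"{self.name} received: {msg}")
            return f"ack:{msg}"               # message receipt

        def on_ack(self, msg, receipt):       # original sender side
            print(f"{self.name} notified, receipt={receipt}")

    class MessageTracker:
        def submit(self, msg, sender, recipient):
            receipt = recipient.deliver(msg)  # forward to the recipient
            sender.on_ack(msg, receipt)       # return the acknowledgement

    MessageTracker().submit("sys_exit", Endpoint("master"), Endpoint("worker1"))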
Fig. 2 is the overall workflow diagram of the automated task parallelization method of the present invention. As shown in Fig. 2, the overall workflow of the automated task parallelization method of the invention comprises the following steps:
(1) System initialization step: initialize the node topology information and the application logic;
(2) Parallel training step: the master node and service nodes skip the specific training logic and proceed to step (3); each worker node enters the model training state and performs iterative parallel training on its input training data subset until the predefined iteration termination condition is met;
(3) System finishing step: the worker nodes inform the master node that their work is complete; after the master node detects that all worker nodes have finished, it notifies all nodes to exit the program in a coordinated way.
Fig. 3 is the system-initialization sub-workflow diagram of the automated task parallel execution method of the present invention. As shown in Fig. 3, the system initialization workflow of the method comprises the following steps:
(1.1) All nodes start running and read the configuration file to determine their own role, the role being worker node, service node, or master node; go to sub-step (1.2);
(1.2) The worker nodes and service nodes communicate with the master node and report their node information; the master node broadcasts the collected node information to all other nodes; go to step (1.3);
(1.3) After the worker nodes and service nodes receive the node information sent by the master node, they initialize the node topology information for subsequent inter-node communication; go to step (1.4);
(1.4) The worker nodes and service nodes initialize the application logic: the runtime system determines the execution order of the stage groups from the order in which they appear in the program code, and constructs the DAG corresponding to each stage group; go to step (2).
Fig. 4 is the parallel-training sub-workflow diagram of the automated task parallelization method of the present invention. As shown in Fig. 4, the parallel-training sub-workflow of a worker node comprises the following steps:
(2) Parallel training step: the master node and service nodes skip the specific training logic and proceed to step (3); each worker node enters the model training state and performs iterative parallel training on its input training data subset until the predefined iteration termination condition is met. The behavior of a worker node specifically comprises the following sub-steps:
(2.1) For each stage group whose order has been determined, the runtime system of the worker node topologically sorts the nodes of its DAG to determine the execution order of all stages within the stage group; go to step (2.2);
(2.2) Call the stage group that has not yet been executed next_group; set next_group to the stage group that appears first in the machine learning application logic; go to step (2.3), or go to step (2.6) if there is currently no unexecuted stage group;
(2.3) Execute the run method of the stage group denoted by next_group num_run times (num_run is a parameter supplied by the user when starting the program). For a single invocation of the run method, the runtime creates a batch of threads and, following the stage execution order within the stage group determined in step (2.1), executes the run methods of all stages in turn: stages with smaller numbers execute first, and stages with the same number execute in a pipelined fashion. After the stage group has executed num_run times, go to step (2.4);
(2.4) The runtime system of the worker node checks whether the stage group that has just executed num_run times has set_barrier set; if so, it performs a barrier synchronization; go to step (2.5);
(2.5) If there is a stage group that has not yet executed, set next_group to that stage group and go to step (2.3); otherwise go to step (2.6);
(2.6) The runtime system of the worker node checks whether the iteration termination condition has been reached; if so, go to step (3), otherwise go to step (2.1).
Fig. 5 is the system-finishing sub-workflow diagram of the automated task parallelization method of the present invention. As shown in Fig. 5, the system-finishing sub-workflow of the automated task parallelization method of the invention comprises the following steps:
(3.1) Every worker node sends a job_done message to the master node; after the master node has received the job_done messages of all worker nodes, it sends a sys_exit message to all worker nodes and service nodes; go to step (3.2);
(3.2) After the worker nodes and service nodes receive the sys_exit message, they send a sys_exit_ack message to the master node; go to step (3.3);
(3.3) The master node receives the sys_exit_ack messages sent by all worker nodes and service nodes; go to step (3.4);
(3.4) All nodes terminate the program.
Further, the process in step (2.1) of determining the execution order of the stages within a stage group specifically comprises the following sub-steps:
(2.1.1) Set the current unassigned number order to 0, and set the set nodes of nodes with current in-degree 0 to empty; go to step (2.1.2);
(2.1.2) Add the nodes whose in-degree in the current DAG is 0 to the set nodes; number all nodes in nodes with order, then increment order by 1; remove the nodes in nodes and all their outgoing edges from the DAG, and set nodes back to empty; go to step (2.1.3);
(2.1.3) If the current DAG is empty, go to step (2.2); otherwise go to step (2.1.2).
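Sub-steps (2.1.1)-(2.1.3) amount to a layered, Kahn-style topological numbering. A minimal sketch under assumed input types (stages as hashable ids, edges as a successor dict):

    # Layered topological numbering per sub-steps (2.1.1)-(2.1.3): all
    # nodes whose current in-degree is 0 get the number `order`, are
    # removed with their outgoing edges, and the process repeats.
    def topo_number(nodes, edges):
        """nodes: iterable of stage ids; edges: {node: set of successors}.
        Returns {node: order}; raises if the graph is not a DAG."""
        indeg = {n: 0 for n in nodes}
        for succs in edges.values():
            for s in succs:
                indeg[s] += 1
        number, order = {}, 0
        while indeg:
            layer = [n for n, d in indeg.items() if d == 0]
            if not layer:
                raise ValueError("cycle detected: not a DAG")
            for n in layer:
                number[n] = order
                for s in edges.get(n, ()):
                    indeg[s] -= 1
                del indeg[n]
            order += 1
        return number

    # Stages with smaller numbers run first; equal numbers may pipeline.
    print(topo_number(["a", "b", "c", "d"],
                      {"a": {"c"}, "b": {"c"}, "c": {"d"}}))
    # -> {'a': 0, 'b': 0, 'c': 1, 'd': 2}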
Further, the run method of a stage described in step (2.3) specifically comprises the following sub-steps:
(2.3.1) The runtime system of the worker node calls the prepare_variables method of the stage to determine the key set primary_key_set of the primary variable primary_variable to be processed in the current stage. Specifically, according to the service-node load conditions (L1, L2, ..., Ln) obtained from the scheduler module and the distribution of the keys of primary_variable across the service nodes, the portion of keys maintained on the least network-loaded service node that has not yet been processed by the worker node is assigned to the worker node as the next key set primary_key_set to process; go to sub-step (2.3.2);
(2.3.2) From the user-provided key_projection function and the primary-variable key set primary_key_set determined in (2.3.1), the runtime system derives the auxiliary-variable key set secondary_key_set, and calls the pull methods of the tensor objects such as the primary and auxiliary variables to pull the required model parameters; go to step (2.3.3);
(2.3.3) Execute the kernel function kernel_function of the stage: the runtime system automatically divides the key set key_set of the primary variable into num_threads parts and creates num_threads threads to execute the kernel function in parallel, where num_threads is a user-provided parameter; go to step (2.3.4);
(2.3.4) Running the kernel function kernel_function produces the update variable v_update; the runtime system updates the corresponding variable v according to the user-provided update_function. If the type of variable v is globally shared or globally unique, the runtime calls the push function of variable v: the (key, value) pairs to be updated are serialized, and the serialized data is sent to all service nodes that maintain that key range; after a service node receives the update data, it updates the data it maintains.
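Putting sub-steps (2.3.1)-(2.3.4) together, the run method of a stage amounts to the pipeline sketched below. The scheduler, tensor, and callback objects are the ones introduced above, but their exact signatures here (including the string representation of the tensor type) are assumptions; threading uses Python's standard library:

    from concurrent.futures import ThreadPoolExecutor

    def run_stage(stage, scheduler, num_threads):
        # (2.3.1) choose the next primary key set, favouring the
        # least-loaded service node (assumed scheduler interface)
        primary_keys = scheduler.next_key_set(stage.primary)

        # (2.3.2) derive auxiliary keys and pull both parameter slices
        secondary_keys = stage.key_projection(primary_keys)
        w_primary = stage.primary.pull(primary_keys)
        w_secondary = stage.secondary.pull(secondary_keys)

        # (2.3.3) split the key set into num_threads parts and run the
        # kernel function on one thread per part
        parts = [list(primary_keys)[i::num_threads] for i in range(num_threads)]
        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            results = list(pool.map(
                lambda ks: stage.kernel(ks, w_primary, w_secondary), parts))

        # (2.3.4) merge per-thread updates, apply update_function, and
        # push globally visible variables back to their service nodes
        v_update = {k: v for part in results for k, v in part.items()}
        stage.update_function(v_update)
        if stage.primary.type in ("global_shared", "global_unique"):  # not local
            stage.primary.push(v_update)   # serialized and sent on the wire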
Further, the pull method described in step (2.3.2) has the following sub-steps:
(2.3.2.1) Serialize the key set key_set to be pulled for the tensor object and send the serialized data to the service nodes that maintain that key range; go to step (2.3.2.2);
(2.3.2.2) After a service node receives the pull message, it serializes the (key, value) pairs corresponding to key_set and returns the serialized data to the requester.
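A minimal sketch of this request/response exchange, using Python's pickle for the serialization steps purely as an example (the patent does not specify a wire format):

    import pickle

    # Worker side, (2.3.2.1): serialize the key set to be pulled.
    def make_pull_request(key_set):
        return pickle.dumps(sorted(key_set))

    # Service node side, (2.3.2.2): serialize the (key, value) pairs
    # corresponding to the requested keys and return them.
    def answer_pull(request_bytes, store):
        keys = pickle.loads(request_bytes)
        return pickle.dumps({k: store[k] for k in keys if k in store})

    store = {1: 0.5, 2: -1.0, 3: 2.5}          # parameters on one server
    reply = answer_pull(make_pull_request({1, 3}), store)
    print(pickle.loads(reply))                  # {1: 0.5, 3: 2.5}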
Those skilled in the art will readily understand that the foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (5)

1. An automated task parallelization system for distributed machine learning, characterized by comprising a worker node module, a service node module, a master node module, a tensor module, a scheduler module, a message tracking module, a stage module, a stage group module, and an execution engine module; wherein the stage module and the scheduler module are both connected with the tensor module; the stage module is connected with the stage group module; the execution engine module is connected with the stage module; and the scheduler module, the tensor module, and the stage group module are all connected with the message tracking module;
the worker node module and the service node module are respectively used to abstractly describe the behavior of worker nodes and parameter service nodes;
the master node module is used to abstractly describe the master node, the master node coordinating the workflow of the whole system, including system initialization and system shutdown;
the tensor module is used to describe the key-value pairs (key, value) of model parameters in machine learning; an application needs several tensor objects to describe the model parameters required for training, and each tensor object has a tensor_id attribute as its unique identifier; tensor objects come in three types: globally shared (global shared), globally unique (global unique), and local (local); globally shared means the tensor object is maintained by distributed nodes and the data maintained by different nodes may intersect; globally unique means the tensor object is maintained by distributed nodes and the data maintained by different nodes do not intersect; local means the tensor object exists only on a single node; tensor objects provide operation interfaces such as load, pull, and push for the programmer to use;
the stage module is used to describe a section of program logic in the application; the overall logic of the application is decomposed into different stages, and each stage object contains a stage_id attribute as its unique identifier; dependencies between stage objects can be set through the dependency function set_dependency; a stage object takes several tensor objects as its input and one optional output; a stage object's inputs come in two types, one called the primary variable primary_variable and the other the auxiliary variable secondary_variable; the (key, value) pairs of a primary variable have no dependencies between keys, whereas the (key, value) pairs of an auxiliary variable have dependencies between keys; for a stage, the programmer must provide a kernel function kernel_function as the core logic of the stage, and must also provide a mapping function key_projection between the keys of the primary variable and the keys of the auxiliary variable; the runtime system automatically derives the keys of the auxiliary variable from the keys of the primary variable and the key_projection function; each primary and auxiliary variable of a stage has a corresponding variable called update_variable, which is used to update the corresponding variable, its update logic being defined by a user-provided update_function;
the stage group module is used to describe a group of closely connected stages; a stage group has a group_id attribute as its unique identifier and two interfaces, run and set_barrier; the run method takes an optional integer parameter num_run that specifies how many times the stage group is executed; the set_barrier interface sets a synchronization operation, meaning that after the current stage group has finished executing, all worker nodes must enter a barrier wait state, and execution continues only after the stage group has finished on all worker nodes;
the scheduler module decides, for a given tensor object, the set of keys key_set that a worker node is to process in the next stage; the scheduler on each service node periodically broadcasts the bandwidth information of its node, and the scheduler on a worker node, according to the service-node bandwidth information it has obtained, decides the set of model-parameter keys assigned to the worker node for the next round of processing;
the execution engine module describes the stages of a stage group and their dependencies as a directed acyclic graph (DAG); in the DAG, the nodes represent stages and the edges represent dependencies between stages, and the stage at the tail of an edge executes before the stage at its head;
the message tracking module records the messages submitted at run time by the tensor module, the scheduler module, the stage group module, the worker node module, the service node module, and the master node module; when a message is submitted to the message tracking module, it is responsible for delivering the message to the recipient, and after the recipient returns a receipt, the message tracking module notifies the original sender and delivers the acknowledgement message.
2. An automated task parallelization method for distributed machine learning, characterized by comprising a system initialization step, a parallel training step, and a system finishing step, in which:
(1) System initialization step: initialize the node topology information and the application logic, specifically comprising the following sub-steps:
(1.1) All nodes start running and read the configuration file to determine their own role, the role being worker node, service node, or master node; go to sub-step (1.2);
(1.2) The worker nodes and service nodes communicate with the master node and report their node information; the master node broadcasts the collected node information to all other nodes; go to step (1.3);
(1.3) After the worker nodes and service nodes receive the node information sent by the master node, they initialize the node topology information for subsequent inter-node communication; go to step (1.4);
(1.4) The worker nodes and service nodes initialize the application logic: the runtime system determines the execution order of the stage groups from the order in which they appear in the program code, and constructs the DAG corresponding to each stage group; go to step (2);
(2) Parallel training step: the master node and service nodes skip the specific training logic and proceed to step (3); each worker node enters the model training state and performs iterative parallel training on its input training data subset until the predefined iteration termination condition is met; the behavior of a worker node specifically comprises the following sub-steps:
(2.1) For each stage group whose order has been determined, the runtime system of the worker node topologically sorts the nodes of its DAG to determine the execution order of all stages within the stage group; go to step (2.2);
(2.2) Call the stage group that has not yet been executed next_group; set next_group to the stage group that appears first in the machine learning application logic; go to step (2.3), or go to step (2.6) if there is currently no unexecuted stage group;
(2.3) Execute the run method of the stage group denoted by next_group num_run times, num_run being a parameter supplied by the user when starting the program; for a single invocation of the run method, the runtime creates a batch of threads and, following the stage execution order within the stage group determined in step (2.1), executes the run methods of all stages in turn, stages with smaller numbers executing first and stages with the same number executing in a pipelined fashion; after the stage group has executed num_run times, go to step (2.4);
(2.4) The runtime system of the worker node checks whether the stage group that has just executed num_run times has set_barrier set; if so, it performs a barrier synchronization; go to step (2.5); the set_barrier interface sets a synchronization operation, meaning that after the current stage group has finished executing, all worker nodes must enter a barrier wait state, and execution continues only after the stage group has finished on all worker nodes;
(2.5) If there is a stage group that has not yet executed, set next_group to that stage group and go to step (2.3); otherwise go to step (2.6);
(2.6) The runtime system of the worker node checks whether the iteration termination condition has been reached; if so, go to step (3), otherwise go to step (2.1);
(3) System finishing step: the worker nodes inform the master node that their work is complete; after the master node detects that all worker nodes have finished, it notifies all nodes to exit the program in a coordinated way, specifically comprising the following sub-steps:
(3.1) Every worker node sends a job_done message to the master node; after the master node has received the job_done messages of all worker nodes, it sends a sys_exit message to all worker nodes and service nodes; go to step (3.2);
(3.2) After the worker nodes and service nodes receive the sys_exit message, they send a sys_exit_ack message to the master node; go to step (3.3);
(3.3) The master node receives the sys_exit_ack messages sent by all worker nodes and service nodes; go to step (3.4);
(3.4) All nodes terminate the program.
3. The automated task parallelization method for distributed machine learning of claim 2, characterized in that the process in step (2.1) of determining the execution order of the stages within a stage group specifically comprises the following sub-steps:
(2.1.1) Set the current unassigned DAG node number order to 0, and set the set nodes of nodes with current in-degree 0 to empty; go to step (2.1.2);
(2.1.2) Add the nodes whose in-degree in the current DAG is 0 to the set nodes; number all nodes in nodes with order, then increment order by 1; remove the nodes in nodes and all their outgoing edges from the DAG, and set nodes back to empty; go to step (2.1.3);
(2.1.3) If the current DAG is empty, go to step (2.2); otherwise go to step (2.1.2).
4. The automated task parallelization method for distributed machine learning of claim 2, characterized in that the run method of a stage described in step (2.3) specifically comprises the following sub-steps:
(2.3.1) The runtime system of the worker node calls the prepare_variables method of the stage to determine the key set primary_key_set of the primary variable primary_variable to be processed in the current stage; specifically, according to the service-node load conditions (L1, L2, ..., Ln) obtained from the scheduler module and the distribution of the keys of primary_variable across the service nodes, the portion of keys maintained on the least network-loaded service node that has not yet been processed by the worker node is assigned to the worker node as the next key set primary_key_set to process; go to sub-step (2.3.2);
(2.3.2) From the user-provided key_projection function and the primary-variable key set primary_key_set determined in (2.3.1), the runtime system derives the auxiliary-variable key set secondary_key_set, and calls the pull methods of the tensor objects such as the primary and auxiliary variables to pull the required model parameters; go to step (2.3.3);
(2.3.3) Execute the kernel function kernel_function of the stage: the runtime system automatically divides the key set key_set of the primary variable into num_threads parts and creates num_threads threads to execute the kernel function in parallel, num_threads being a user-provided parameter; go to step (2.3.4);
(2.3.4) Running the kernel function kernel_function produces the update variable v_update; the runtime system updates the corresponding variable v according to the user-provided update_function; if the type of variable v is globally shared or globally unique, the runtime calls the push function of variable v, which serializes the (key, value) pairs to be updated and sends the serialized data to all service nodes that maintain that key range; after a service node receives the update data, it updates the data it maintains.
5. The automated task parallelization method for distributed machine learning of claim 4, characterized in that the pull method described in step (2.3.2) has the following sub-steps:
(2.3.2.1) Serialize the key set key_set to be pulled for the tensor object and send the serialized data to the service nodes that maintain that key range; go to step (2.3.2.2);
(2.3.2.2) After a service node receives the pull message, it serializes the (key, value) pairs corresponding to key_set and returns the serialized data to the requester.
CN201610255970.5A 2016-04-22 2016-04-22 Automated task parallelization method and system for distributed machine learning Active CN105956021B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610255970.5A CN105956021B (en) 2016-04-22 2016-04-22 Automated task parallelization method and system for distributed machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610255970.5A CN105956021B (en) 2016-04-22 2016-04-22 Automated task parallelization method and system for distributed machine learning

Publications (2)

Publication Number Publication Date
CN105956021A (en) 2016-09-21
CN105956021B (en) 2019-05-21

Family

ID=56915367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610255970.5A Active CN105956021B (en) Automated task parallelization method and system for distributed machine learning

Country Status (1)

Country Link
CN (1) CN105956021B (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009642B (en) * 2016-10-31 2021-12-14 腾讯科技(深圳)有限公司 Distributed machine learning method and system
CN108229686B (en) * 2016-12-14 2022-07-05 阿里巴巴集团控股有限公司 Model training and predicting method and device, electronic equipment and machine learning platform
EP3376441B1 (en) * 2017-03-15 2021-07-14 Siemens Aktiengesellschaft A method for execution of a machine learning model on memory restricted industrial device
CN108733461B (en) * 2017-04-18 2021-09-14 北京京东尚科信息技术有限公司 Distributed task scheduling method and device
CN107231558B * 2017-05-23 2019-10-22 江苏火米互动科技有限公司 Implementation method of an H.264 parallel encoder based on CUDA
CN111597187B (en) * 2017-08-30 2023-09-01 第四范式(北京)技术有限公司 Distributed system for performing machine learning and method thereof
CN109447274B (en) * 2017-08-30 2021-02-09 第四范式(北京)技术有限公司 Distributed system for performing machine learning and method thereof
CN111079942B (en) * 2017-08-30 2023-03-24 第四范式(北京)技术有限公司 Distributed system for performing machine learning and method thereof
CN109814986B (en) * 2017-11-20 2021-01-05 上海寒武纪信息科技有限公司 Task parallel processing method, storage medium, computer equipment, device and system
CN107944566B (en) * 2017-11-28 2020-12-22 杭州云脑科技有限公司 Machine learning method, main node, working node and system
CN109960570B (en) * 2017-12-14 2021-09-03 北京图森智途科技有限公司 Multi-module scheduling method, device and system
CN108681777B (en) * 2018-05-07 2021-07-20 北京京东尚科信息技术有限公司 Method and device for running machine learning program based on distributed system
CN109871958B (en) * 2019-02-01 2023-07-28 东软医疗系统股份有限公司 Method, device and equipment for training model
WO2020243973A1 (en) * 2019-06-06 2020-12-10 华为技术有限公司 Model-based signal inference method and apparatus
US11907770B2 (en) 2019-09-19 2024-02-20 Huawei Cloud Computing Technologies Co., Ltd. Method and apparatus for vectorized resource scheduling in distributed computing systems using tensors
CN110990059B * 2019-11-28 2021-11-19 中国科学院计算技术研究所 Method and system for running a stream computing engine for skewed data
TWI780382B (en) * 2019-12-05 2022-10-11 新唐科技股份有限公司 Microcontroller updating system and method
CN111506402B (en) * 2020-03-31 2023-06-27 上海氪信信息技术有限公司 Computer task scheduling method, device, equipment and medium for machine learning modeling
CN111580970B (en) * 2020-05-07 2023-02-03 电子科技大学 Transmission scheduling method for model distribution and aggregation of federated learning
CN111753997B (en) * 2020-06-28 2021-08-27 北京百度网讯科技有限公司 Distributed training method, system, device and storage medium
US11954611B2 (en) 2020-08-27 2024-04-09 International Business Machines Corporation Tensor comparison across a distributed machine learning environment
CN112214256B (en) * 2020-09-30 2024-02-02 招商局金融科技有限公司 Machine learning operation control method and device, electronic equipment and storage medium
CN113157413B (en) * 2021-04-16 2022-04-26 上海交通大学 Deep learning task resource optimization configuration method and system based on service quality requirement
CN113703980B (en) * 2021-08-31 2024-09-06 西安电子科技大学 Distributed machine learning system and communication scheduling method suitable for same
CN114461392B (en) * 2022-01-25 2023-03-31 西南交通大学 Bandwidth-aware selective data multicast method
CN115314397B (en) * 2022-08-05 2023-07-21 中科计算技术西部研究院 Network simulation method, system, device and storage medium for distributed training
CN116483580B (en) * 2022-09-29 2024-05-28 陕西震旦纪信息技术有限公司 System and method for scheduling server computing resources based on Kubernetes
CN116662039B (en) * 2023-07-25 2024-01-23 菲特(天津)检测技术有限公司 Industrial information parallel detection method, device and medium based on shared memory

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9563670B2 (en) * 2013-03-14 2017-02-07 Leidos, Inc. Data analytics system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8027938B1 (en) * 2007-03-26 2011-09-27 Google Inc. Discriminative training in machine learning
CN102546247A (en) * 2011-12-29 2012-07-04 华中科技大学 Massive data continuous analysis system suitable for stream processing
CN103763378A * 2014-01-24 2014-04-30 中国联合网络通信集团有限公司 Task processing method, system and nodes based on a distributed computing system
CN104360903A * 2014-11-18 2015-02-18 北京美琦华悦通讯科技有限公司 Method for realizing task data decoupling in a Spark job scheduling system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud; Yucheng Low et al.; Proceedings of the VLDB Endowment; 2012-05-31; full text
Petuum: A Framework for Iterative-Convergent Distributed ML; Wei Dai et al.; Proceedings of Advances in Neural Information Processing Systems; 2013-12-31; full text
Scaling Distributed Machine Learning with the Parameter Server; Mu Li et al.; Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation; 2014-12-31; full text
Cache coherence maintenance mechanism in a distributed programming environment based on transactional memory; Yu Linchen et al.; Microelectronics & Computer; 2013-03-31; full text

Also Published As

Publication number Publication date
CN105956021A (en) 2016-09-21

Similar Documents

Publication Publication Date Title
CN105956021B (en) Automated task parallelization method and system for distributed machine learning
Kim et al. Strads: A distributed framework for scheduled model parallel machine learning
CN105117286B (en) Task scheduling and pipelined execution method in MapReduce
Ward et al. Colmena: Scalable machine-learning-based steering of ensemble simulations for high performance computing
US10754709B2 (en) Scalable task scheduling systems and methods for cyclic interdependent tasks using semantic analysis
US20240111586A1 (en) Multi-policy intelligent scheduling method and apparatus oriented to heterogeneous computing power
Yu et al. Automated runtime-aware scheduling for multi-tenant dnn inference on gpu
CN109891438B (en) Numerical quantum experiment method and system
CN112416585B (en) Deep learning-oriented GPU resource management and intelligent scheduling method
CN112764893B (en) Data processing method and data processing system
CN104243617A (en) Task scheduling method and system for mixed workloads in heterogeneous clusters
US20210390405A1 (en) Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof
CN112948066A (en) Spark task scheduling method based on heterogeneous resources
CN112052081A (en) Task scheduling method and device and electronic equipment
Rocha et al. Pipetune: Pipeline parallelism of hyper and system parameters tuning for deep learning clusters
CN117009038B (en) Graph computing platform based on cloud native technology
Feljan et al. Task allocation optimization for multicore embedded systems
CN106844024B (en) GPU/CPU scheduling method and system based on a self-learning runtime prediction model
US20240193721A1 (en) System and method for adaptive graph-to-stream scheduling
US10719903B2 (en) On-the fly scheduling of execution of dynamic hardware behaviors
CN114925591A (en) Automatic parallel strategy search method based on polyhedral model modeling, and related equipment
US20100131740A1 (en) Data processing system and data processing method
CN116991878A (en) Method and system for generating distributed execution plan based on Q-learning
Zhou et al. Scheduling-efficient framework for neural network on heterogeneous distributed systems and mobile edge computing systems
CN113902567B (en) Method and device for executing tasks and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant