CN105956021B - Automated task parallelization method and system for distributed machine learning - Google Patents
Automated task parallelization method and system for distributed machine learning
- Publication number
- CN105956021B CN201610255970.5A CN201610255970A CN 105956021 B
- Authority
- CN
- China
- Prior art keywords
- node
- module
- stage
- key
- variable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2323—Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Discrete Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Multi Processors (AREA)
Abstract
The present invention provides an automated task parallelization method and system for distributed machine learning, which addresses a defect of existing distributed machine learning programming interfaces: because only a key-value read/write interface is provided, the system's data-access behavior is tightly coupled with the application logic. This defect aggravates contention for network bandwidth in a distributed cluster and makes it difficult for programmers to parallelize tasks. The system of the present invention comprises a worker node module, a service node module, a master node module, a tensor module, a scheduler module, a message tracking module, a stage module, a stage group module, and an execution engine module. By providing higher-level programming abstractions, the present invention decouples read/write access behavior from the application logic: the runtime system first partitions tasks dynamically according to the load of the service nodes, and then executes the machine learning task in parallel automatically, significantly reducing the burden on programmers of writing highly concurrent machine learning applications.
Description
Technical field
The present invention belongs to the field at the intersection of distributed computing and machine learning, and relates in particular to an automated task parallelization method and system for distributed machine learning.
Background art
Machine learning algorithms, as a conventional means of mining the value of data, are widely used in fields such as natural language processing, text analysis, speech recognition, autonomous driving, and bioinformatics. With the arrival of the big data era, the value of data, and especially the commercial value it contains, has become increasingly prominent, and machine learning has therefore received growing attention. However, as the scale of the data and of the model parameters to be learned grows, a single compute node can no longer satisfy the demands of large-scale machine learning because of the limits of its memory, computing resources, and memory bandwidth. Distributing traditional single-node machine learning has therefore become a new and necessary trend. Once machine learning is distributed, more compute nodes can be used to process larger data sets, the time needed to train the resulting model is shortened, and the accuracy of the learned model is improved. Distributed machine learning has received wide attention in both industry and academia; for example, Google trained a cat-face recognition model with its distributed system DistBelief, the Apache Software Foundation developed the Hadoop-based open-source machine learning framework Mahout, and the AMP Lab at UC Berkeley developed Spark, a distributed computing system suitable for machine learning algorithms.
Most distributed machine learning algorithms are iterative in nature: training ends only after a predetermined number of iterations has been run or the model parameters have converged to a stable state. Traditional distributed frameworks such as MapReduce, because of defects in their synchronization mechanisms, perform poorly on such iterative computations.
A newer type of distributed machine learning system is the parameter server architecture. The parameters referred to here are the key-value pairs (key, value), two-dimensional matrices, or multi-dimensional matrices used to describe model parameters in machine learning; a multi-dimensional matrix is also called a tensor. In the parameter server architecture, the compute nodes in the cluster are divided into two classes: one class is called worker nodes and the other is called service nodes. The service nodes are responsible for maintaining the global model parameters, including responding to worker nodes' queries on and updates to the model parameters. A worker node loads a partial data set from the global training data into local memory, uses the algorithm specified by the application logic to work out which model parameters its computation needs, issues a query to the service nodes, and transfers the required model parameters into local memory over the network; it then uses the application-specified algorithm and the fetched model parameters to compute the new model parameters w or the parameter updates Δw. After one round of iterative computation, the worker node issues update and synchronization operations on the global model parameters to the service nodes. The behavior of a worker node within one complete iteration of distributed machine learning can be summarized as the following steps:
1. The worker node loads its partial data set;
2. The worker node works out which model parameters it needs and obtains them through the model access interface provided by the underlying system;
3. The application logic computes the new model parameters w or the parameter updates Δw;
4. The worker node pushes the newly computed model parameters w or parameter updates Δw to the service nodes for parameter update and synchronization.
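To make these four steps concrete, the following Python sketch shows one worker's training loop against a toy in-memory parameter server. The ToyServer class, the sparse linear model, and the gradient rule are illustrative assumptions for this example only, not the system described by the present invention.

```python
# Minimal, self-contained sketch of the four worker steps in a parameter-server setup:
# load data, pull the needed parameters, compute an update, and push the update back.

class ToyServer:
    """Stands in for the service nodes that maintain the global model parameters."""
    def __init__(self, dim):
        self.params = {k: 0.0 for k in range(dim)}    # (key, value) model parameters

    def pull(self, keys):
        return {k: self.params[k] for k in keys}      # answer a worker's query

    def push(self, delta):
        for k, v in delta.items():                    # merge a worker's pushed update
            self.params[k] += v

def worker(server, samples, lr=0.1, iterations=5):
    for _ in range(iterations):
        keys = {k for x, _ in samples for k in x}     # step 2: keys this worker needs
        w = server.pull(keys)                         # pull the required parameters
        delta = {k: 0.0 for k in keys}
        for x, y in samples:                          # step 3: gradient of squared error
            pred = sum(w[k] * v for k, v in x.items())
            for k, v in x.items():
                delta[k] += -lr * (pred - y) * v
        server.push(delta)                            # step 4: push the update Δw

server = ToyServer(dim=4)
data = [({0: 1.0, 2: 0.5}, 1.0), ({1: 1.0, 3: 2.0}, 0.0)]  # sparse (features, label) pairs
worker(server, data)
print(server.params)
```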
Steps 2, 3, and 4 above are the key steps of the iterative computation. Obtaining the model parameters needed for the computation through the global model read/write interface, and pushing the newly computed model parameters or parameter updates to the service nodes, are the main sources of network traffic in the system.
Regarding step 2: because the model parameters are huge, the resulting volume of network traffic is also huge. With a fixed amount of network bandwidth, the network transmission delay within an iteration can exceed the computation time for a single worker node, which lengthens the overall model training time; when multiple worker nodes trigger network transfers at the same time, contention for bandwidth arises and the transmission delay becomes even longer. The behavior with which worker nodes trigger access to model parameters is closely tied to the upper-layer application logic. The bottom-layer interface provided by current parameter server architectures is a unified interface for global parameter access, so the system's global-parameter access behavior is tightly coupled with the application logic, which makes optimization from the system level difficult.
Regarding step 3: computing the model parameters on the worker node is a compute-intensive operation. In the current many-core and multi-core era, how to maximally parallelize this computation is crucial for improving system concurrency. Current distributed machine learning systems provide no corresponding parallelization programming interface and offer only a global model read/write interface, so a programmer needs parallel programming experience in order to write a highly concurrent machine learning application.
Regarding step 4: for the network transmission bottleneck in parameter synchronization there are two existing solutions. One is to change the synchronization model, i.e. to allow the iteration progress of different worker nodes to differ to a certain extent and to perform bulk synchronization (BSP, Bulk Synchronous Parallel) only after the difference in progress reaches a threshold; this scheme alleviates contention for network bandwidth to some degree. The other solution is to control the resource occupancy of the parameter server and to choose different synchronization intervals for different worker nodes so as to avoid bursts of requests, while ensuring that the chosen intervals both reduce communication frequency and preserve training accuracy.
Summary of the invention
In view of the above drawbacks of, or needs for improvement in, the prior art, the present invention provides an automated task parallelization method and system for distributed machine learning. First, the model parameter access interface is decoupled from the application logic, so that the system's access behavior for model parameters becomes adjustable at run time, which lays the foundation for optimizations such as network transmission and system-level parallelization. Second, the application is logically decomposed into several stages, from which a directed acyclic graph (DAG) is constructed to describe the dependencies between the computation stages; the runtime system uses the DAG to divide tasks automatically and execute them in parallel, improving the degree of system concurrency. The above method and system effectively resolve the network transmission bottleneck in existing distributed machine learning systems and improve system concurrency, thereby improving overall system performance.
To achieve the above goals, according to one aspect of the present invention, an automated task parallelization method and system for distributed machine learning is provided, which specifically includes a worker node module, a service node module, a master node module, a tensor module, a scheduler module, a message tracking module, a stage module, a stage group module, and an execution engine module. The stage module and the scheduler module are both connected to the tensor module; the stage module is connected to the stage group module; the execution engine module is connected to the stage module; and the scheduler module, the tensor module, and the stage group module are all connected to the message tracking module.
The worker node module and the service node module are abstract descriptions of the behavior of the worker nodes and the parameter service nodes respectively, and both modules are transparent to the machine learning programmer.
The master node module is an abstract description of the master node. The role of the master node is to coordinate the workflow of the whole system, for example the initialization and the termination of the system. Of the system modules mentioned above, all modules other than the worker node module, the service node module, and the master node module are present on every node.
The tensor module is used to describe the key-value pairs (key, value) of model parameters in machine learning. An application needs multiple tensor objects to describe the model parameters required for training, and each tensor object has a tensor_id attribute as its unique identifier. There are three types of tensor object: globally shared (global shared), globally unique (global unique), and local (local). Globally shared means the tensor object is maintained by distributed nodes and the data maintained by different nodes may intersect; globally unique means the tensor object is maintained by distributed nodes and the data maintained by different nodes do not intersect; local means the tensor object exists only on a single node. A tensor object provides operation interfaces such as load (load), pull (pull), and push (push) for the programmer to use.
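The Python sketch below illustrates the tensor abstraction just described: a unique tensor_id, one of three sharing types, and load/pull/push interfaces. The class and the duck-typed server argument are simplifying assumptions made for this example, not the actual implementation of the invention.

```python
# Illustrative sketch of a tensor object with tensor_id, sharing type, and
# load/pull/push operation interfaces.

from enum import Enum

class TensorType(Enum):
    GLOBAL_SHARED = "global_shared"   # maintained by many nodes, key sets may overlap
    GLOBAL_UNIQUE = "global_unique"   # maintained by many nodes, key sets are disjoint
    LOCAL = "local"                   # exists only on one node

class Tensor:
    def __init__(self, tensor_id, ttype):
        self.tensor_id = tensor_id    # unique identifier of the tensor object
        self.type = ttype
        self.data = {}                # (key, value) pairs of model parameters

    def load(self, pairs):
        """Load an initial set of (key, value) pairs into local memory."""
        self.data.update(pairs)

    def pull(self, keys, server):
        """Fetch the listed keys from the node(s) that maintain them."""
        self.data.update(server.pull(keys))
        return {k: self.data[k] for k in keys}

    def push(self, updates, server):
        """Send locally computed updates back to the maintaining node(s)."""
        server.push(updates)
```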
The stage module is used to describe a segment of program logic in the application. The present invention decomposes the overall logic of the application into different stages, and each stage object contains a stage_id attribute as its unique identifier. Dependencies between stage objects are declared by calling the dependency-setting function set_dependency. A stage object takes several tensor objects as its input and an optional output. A stage's inputs are of two types: one is called the primary variable primary_variable, the other is called the secondary variable secondary_variable. The (key, value) pairs of a primary variable have no dependencies between their keys, whereas the (key, value) pairs of a secondary variable do have dependencies between their keys. For each stage the programmer must provide a kernel function kernel_function, which is the core logic of that stage. The programmer must also provide a mapping function between the keys of the primary variable and the keys of the secondary variable (the key_projection function); the runtime system derives the keys of the secondary variable automatically from the keys of the primary variable and the key_projection function. Each primary variable and secondary variable of a stage has a corresponding variable called its update_variable. The update_variable is used to update the corresponding variable, and the update logic is defined by a user-provided update_function.
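The following sketch shows how a stage could bundle these pieces: a stage_id, primary and secondary variables, the user-supplied kernel_function, the key_projection mapping, the update_function, and set_dependency edges. The class itself is a simplified assumption that mirrors the names above; it is not the invention's actual code.

```python
# Illustrative sketch of the stage abstraction and its user-supplied functions.

class Stage:
    def __init__(self, stage_id, primary_variable, secondary_variable,
                 kernel_function, key_projection, update_function):
        self.stage_id = stage_id
        self.primary_variable = primary_variable        # keys have no mutual dependence
        self.secondary_variable = secondary_variable    # keys derived from primary keys
        self.kernel_function = kernel_function          # core logic of this stage
        self.key_projection = key_projection            # maps a primary key to secondary key(s)
        self.update_function = update_function          # merges an update_variable into its variable
        self.dependencies = []

    def set_dependency(self, other_stage):
        """Declare that this stage must run after other_stage (an edge in the DAG)."""
        self.dependencies.append(other_stage)

    def secondary_keys(self, primary_keys):
        """Derive the secondary key set from the primary key set, as the runtime does."""
        keys = set()
        for k in primary_keys:
            keys.update(self.key_projection(k))
        return keys
```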
The stage group module is used to describe a group of stages. The stages represented by a stage group are closely related. A stage group has a group_id attribute as its unique identifier and exposes two interfaces, run and set_barrier. The optional parameter of the run method is an integer num_run that specifies how many times the stage group is executed. The set_barrier interface sets a synchronization operation: it indicates that, after the current stage group has finished executing, all worker nodes must enter a barrier wait state and may continue only after the stage group has finished on every worker node.
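A minimal sketch of the stage group interfaces follows. The wait_at_barrier stub and the assumption that each stage exposes its own run() method are illustrative; in a real system the barrier would block until every worker node arrives.

```python
# Illustrative sketch of a stage group with run(num_run) and set_barrier().

def wait_at_barrier():
    """Stub for the cluster-wide barrier; a real system would block until all workers arrive."""
    pass

class StageGroup:
    def __init__(self, group_id, stages):
        self.group_id = group_id
        self.stages = stages          # closely related stages, already in topological order
        self.barrier = False

    def set_barrier(self):
        """Require a global barrier after this group completes on every worker node."""
        self.barrier = True

    def run(self, num_run=1):
        """Execute every stage in the group num_run times, in dependency order."""
        for _ in range(num_run):
            for stage in self.stages:
                stage.run()           # assumes each stage exposes its own run() method
        if self.barrier:
            wait_at_barrier()
```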
The scheduler module decides, for a given tensor object, the set of keys key_set that the worker node will process in the next phase. The scheduler module on a service node periodically broadcasts the bandwidth information of its node; the scheduler module on a worker node, based on the service-node bandwidth information it has obtained, decides the key set of model parameters assigned to the worker node for the next round of processing.
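The sketch below illustrates the scheduling decision: given the load reported by each service node and the distribution of a tensor's keys across those nodes, the worker is assigned its next key set from the still-unprocessed keys held by the least-loaded node. Function and variable names are assumptions made for the example.

```python
# Illustrative sketch of load-aware key assignment by the scheduler module.

def next_key_set(server_loads, keys_by_server, processed_keys):
    """server_loads: {server_id: load}; keys_by_server: {server_id: set of keys}."""
    # Consider only servers that still hold unprocessed keys for this tensor.
    candidates = {s: keys_by_server[s] - processed_keys
                  for s in keys_by_server if keys_by_server[s] - processed_keys}
    if not candidates:
        return set()
    least_loaded = min(candidates, key=lambda s: server_loads[s])
    return candidates[least_loaded]

loads = {"s1": 0.8, "s2": 0.2}
keys = {"s1": {1, 2, 3}, "s2": {4, 5, 6}}
print(next_key_set(loads, keys, processed_keys={4}))   # -> {5, 6}, from the lighter node s2
```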
The execution engine module describes the stages of a stage group and their dependency relations as a directed acyclic graph (DAG). In this DAG, a node represents a stage and an edge represents a dependency between stages: the stage at the tail of an edge must execute before the stage at its head.
The message tracking module records, while the program is running, the messages submitted by the tensor module, the scheduler module, the stage group module, the worker node module, the service node module, and the master node module. When a message is submitted to the message tracking module, the module is responsible for delivering it to the recipient; after the recipient returns a message receipt, the message tracking module notifies the original sender of the message and delivers the acknowledgement.
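The sketch below illustrates that submit / deliver / receipt / acknowledge cycle with simplified in-process classes; the direct method calls stand in for real network messaging and are assumptions for the example only.

```python
# Illustrative sketch of the message tracking behaviour: submit, deliver, receipt, acknowledge.

class MessageTracker:
    def __init__(self):
        self.log = []                                    # record of every submitted message

    def submit(self, sender, recipient, payload):
        self.log.append((sender.name, recipient.name, payload))
        receipt = recipient.receive(payload)             # deliver and wait for the receipt
        sender.acknowledge(receipt)                      # notify the original sender

class Node:
    def __init__(self, name):
        self.name = name

    def receive(self, payload):
        return f"{self.name} received: {payload}"

    def acknowledge(self, receipt):
        print(f"{self.name} got ack: {receipt}")

tracker = MessageTracker()
tracker.submit(Node("worker-0"), Node("master"), "job_done")
```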
Correspondingly, the present invention also provides an automated task parallelization method for distributed machine learning, used for automatic task division and automatic parallel execution in distributed machine learning scenarios. It includes a system initialization step, a parallel training step, and a system termination step, wherein:
(1) System initialization step: initialize the node topology information and the application logic, which specifically includes the following sub-steps:
(1.1) All nodes start running and each reads the configuration file to determine its own role, the role being worker node, service node, or master node; go to sub-step (1.2);
(1.2) The worker nodes and service nodes each communicate with the master node to report their node information, and the master node broadcasts the collected node information to all other nodes; go to step (1.3);
(1.3) After the worker nodes and service nodes receive the node information sent by the master node, they initialize the node topology information, which is used for subsequent inter-node communication; go to step (1.4);
(1.4) The worker nodes and service nodes initialize the application logic; the runtime system determines the execution order of the stage groups from the order in which they appear in the program code, and constructs the DAG corresponding to each stage group; go to step (2);
(2) Parallel training step: the master node and the service nodes skip the specific training logic and proceed to step (3); each worker node enters the model training state and performs iterative parallel training on its input training data subset until the predefined iteration termination condition is met. The behavior of a worker node specifically includes the following sub-steps:
(2.1) For each stage group whose order has been determined, the runtime system of the worker node topologically sorts the nodes of its DAG to determine the execution order of all stages within each stage group; go to step (2.2);
(2.2) Call the stage group that has not yet been executed next_group; set next_group to the stage group that appears first in the machine learning application logic; go to step (2.3), or go to step (2.6) if there is currently no unexecuted stage group;
(2.3) Execute the run method of the stage group indicated by next_group num_run times (num_run is a parameter supplied by the user when the program starts). For a single invocation of the run method, the runtime creates a batch of threads and, following the stage execution order within the stage group determined in step (2.1), executes the run methods of all stages in turn: stages with smaller numbers execute first, and stages with the same number execute in a pipelined fashion. After the stage group has executed num_run times, go to step (2.4);
(2.4) The runtime system of the worker node checks whether set_barrier is set on the stage group that has just executed num_run times; if set_barrier is set, it performs the barrier synchronization operation; go to step (2.5);
(2.5) If there is still a stage group that has not been executed, set next_group to that stage group and go to step (2.3); otherwise go to step (2.6);
(2.6) The runtime system of the worker node checks whether the iteration termination condition has been reached; if so, go to step (3), otherwise go to step (2.1);
(3) System termination step: the worker nodes inform the master node that their work is complete; after the master node detects that the work of all worker nodes is complete, it notifies all nodes to exit the program in a coordinated way, which specifically includes the following sub-steps:
(3.1) All worker nodes send a job_done message to the master node; after the master node has received the job_done messages of all worker nodes, it sends a sys_exit message to all worker nodes and service nodes; go to step (3.2);
(3.2) After the worker nodes and service nodes receive the sys_exit message, they send a sys_exit_ack message to the master node; go to step (3.3);
(3.3) The master node receives the sys_exit_ack messages sent by all worker nodes and service nodes; go to step (3.4);
(3.4) All nodes terminate the program.
In step (2.1) above, the process of determining the execution order of the stages within a stage group specifically includes the following sub-steps:
(2.1.1) Set the current unassigned number order to 0 and set the node set nodes of nodes with in-degree 0 to empty; go to step (2.1.2);
(2.1.2) Add the nodes whose in-degree in the current DAG is 0 to the set nodes; number all nodes in the set nodes with order, and increment order by 1; remove the nodes in the set nodes and all of their outgoing edges from the DAG, and set the set nodes back to empty; go to step (2.1.3);
(2.1.3) Check whether the current DAG is empty; if it is empty, go to step (2.2), otherwise go to step (2.1.2).
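The sketch below illustrates sub-steps (2.1.1)–(2.1.3) as a layered topological numbering of a stage-group DAG: stages with the same number have no mutual dependence and may be pipelined, and smaller numbers execute first. The edge representation and example stage names are assumptions made for the example.

```python
# Illustrative sketch of layered topological numbering over a stage-group DAG.
# edges: set of (u, v) meaning stage u must finish before stage v.

def layered_topological_order(edges, stages):
    order, numbering = 0, {}
    remaining = set(stages)
    while remaining:                                           # (2.1.3): loop until the DAG is empty
        indeg0 = {s for s in remaining
                  if not any(u in remaining and v == s for u, v in edges)}
        for s in indeg0:                                       # (2.1.2): number the in-degree-0 layer
            numbering[s] = order
        order += 1
        remaining -= indeg0                                    # remove the layer and its out-edges
    return numbering

edges = {("load", "compute"), ("compute", "push"), ("load", "log")}
print(layered_topological_order(edges, ["load", "compute", "push", "log"]))
# e.g. {'load': 0, 'compute': 1, 'log': 1, 'push': 2}
```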
The run method of a stage described in step (2.3) above specifically includes the following sub-steps:
(2.3.1) The runtime system of the worker node calls the stage's prepare_variables method to determine the key set primary_key_set of the current stage's primary variable primary_variable to be processed. Specifically, based on the service node loads (L1, L2, ..., Ln) obtained from the scheduler module and the distribution of the keys of the primary variable primary_variable across the service nodes, the portion of keys maintained on the service node with the lowest network load that has not yet been processed by the worker node is assigned to the worker node as the key set primary_key_set to be processed next; go to sub-step (2.3.2);
(2.3.2) From the key_projection function provided by the user and the primary-variable key set primary_key_set determined in (2.3.1), the runtime system derives the secondary-variable key set secondary_key_set, and calls the pull methods of the tensor objects such as the primary variable and the secondary variable to pull the required model parameters; go to step (2.3.3);
(2.3.3) Execute the stage's kernel function kernel_function: the runtime system automatically divides the key set key_set of the primary variable into num_threads parts and creates num_threads threads to execute the kernel function in parallel, where num_threads is a parameter provided by the user; go to step (2.3.4);
(2.3.4) Running the kernel function kernel_function produces the update variable v_update; the runtime system updates the corresponding variable v according to the update_function provided by the user. If the type of the variable v is globally shared or globally unique, the runtime calls the push function of the variable v to propagate the update: the (key, value) pairs to be updated are serialized, and the serialized data are sent to all service nodes that maintain this segment of keys; after a service node receives the update data, it updates the data it maintains.
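The sketch below strings sub-steps (2.3.1)–(2.3.4) together: choose the primary key set, derive the secondary key set via key_projection, pull both variables, split the primary keys across num_threads threads that each run kernel_function, then merge the partial updates with update_function and push them. All names are stand-ins for the interfaces described above, and the assumption that kernel_function returns a dictionary of partial updates and that update_function takes (old, new) is made only for this example.

```python
# Illustrative sketch of a stage's run method on a worker node.

from concurrent.futures import ThreadPoolExecutor

def run_stage(stage, scheduler, server, num_threads):
    primary_keys = scheduler.next_key_set(stage.primary_variable)                  # (2.3.1)
    secondary_keys = {k2 for k in primary_keys for k2 in stage.key_projection(k)}  # (2.3.2)
    primary = server.pull(primary_keys)
    secondary = server.pull(secondary_keys)

    # (2.3.3): split the primary key set into num_threads parts and run the kernel in parallel.
    chunks = [list(primary_keys)[i::num_threads] for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partial_updates = list(pool.map(
            lambda ks: stage.kernel_function(ks, primary, secondary), chunks))

    # (2.3.4): merge the partial updates via update_function and push the result.
    update = {}
    for part in partial_updates:
        for key, value in part.items():
            update[key] = stage.update_function(update.get(key), value)
    server.push(update)                                   # serialized and sent to the service nodes
```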
The pull method described in step (2.3.2) above has the following sub-steps:
(2.3.2.1) The key set key_set that the tensor object is to pull is serialized, and the serialized data are sent to the service node that maintains this segment of keys; go to step (2.3.2.2);
(2.3.2.2) After the service node receives the pull message, it serializes the (key, value) two-tuple data corresponding to key_set and returns the serialized data to the requester.
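As an illustration of this request/response exchange, the sketch below serializes the key set on the worker side, looks it up on the service-node side, and returns the serialized (key, value) pairs. The use of JSON and a direct function call in place of real network transport is an assumption made purely for the example.

```python
# Illustrative sketch of the pull protocol: serialize keys, look them up, return serialized pairs.

import json

def worker_pull(key_set, service_node_params):
    request = json.dumps(sorted(key_set))                # (2.3.2.1): serialize the key set
    reply = service_node_handle_pull(service_node_params, request)
    return {int(k): v for k, v in json.loads(reply).items()}

def service_node_handle_pull(parameters, request):
    keys = json.loads(request)                           # (2.3.2.2): look up the requested keys
    return json.dumps({k: parameters[k] for k in keys if k in parameters})

params = {1: 0.5, 2: -1.2, 3: 0.0}
print(worker_pull({1, 3}, params))                       # -> {1: 0.5, 3: 0.0}
```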
Through the above method, the technical solution conceived by the present invention has, in general, the following advantages and technical effects compared with the prior art:
(1) The present invention provides programming modules whose level of abstraction is higher than that of a global read/write access interface; these modules decouple read/write access behavior from the application logic, which on the one hand greatly facilitates the programmer in writing applications and on the other hand lays the foundation for system-level optimization;
(2) The present invention realizes automated parallel execution of machine learning tasks, which greatly reduces the burden on application programmers of writing highly concurrent machine learning applications;
(3) The runtime system developed by the present invention automatically divides tasks dynamically according to the load of each service node, making full use of the network bandwidth resources.
Brief description of the drawings
Fig. 1 is the module block diagram of the automated task parallelization system of the present invention;
Fig. 2 is the overall workflow diagram of the automated task parallelization method of the present invention;
Fig. 3 is the system-initialization sub-workflow diagram of the automated task parallelization method of the present invention;
Fig. 4 is the parallel-training sub-workflow diagram of the automated task parallelization method of the present invention;
Fig. 5 is the system-termination sub-workflow diagram of the automated task parallelization method of the present invention.
Detailed description of the embodiments
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the present invention and are not intended to limit it. In addition, the technical features involved in the various embodiments of the present invention described below can be combined with each other as long as they do not conflict.
Fig. 1 is the module block diagram of the automated task parallel execution system of the present invention. As shown in Fig. 1, the automated task parallelization system of the present invention specifically includes a worker node module, a service node module, a master node module, a tensor module, a scheduler module, a message tracking module, a stage module, a stage group module, and an execution engine module. The stage module and the scheduler module are both connected to the tensor module; the stage module is connected to the stage group module; the execution engine module is connected to the stage module; and the scheduler module, the tensor module, and the stage group module are all connected to the message tracking module.
The worker node module and the service node module are abstract descriptions of the behavior of the worker nodes and the parameter service nodes respectively, and both modules are transparent to the machine learning programmer.
The master node module is an abstract description of the master node. The role of the master node is to coordinate the workflow of the whole system, for example the initialization and the termination of the system. Of the system modules mentioned above, all modules other than the worker node module, the service node module, and the master node module are present on every node.
The tensor module is used to describe the key-value pairs (key, value) of model parameters in machine learning. An application needs multiple tensor objects to describe the model parameters required for training, and each tensor object has a tensor_id attribute as its unique identifier. There are three types of tensor object: globally shared (global shared), globally unique (global unique), and local (local). Globally shared means the tensor object is maintained by distributed nodes and the data maintained by different nodes may intersect; globally unique means the tensor object is maintained by distributed nodes and the data maintained by different nodes do not intersect; local means the tensor object exists only on a single node. A tensor object provides operation interfaces such as load (load), pull (pull), and push (push) for the programmer to use.
The stage module is used to describe a segment of program logic in the application. The present invention decomposes the overall logic of the application into different stages, and each stage object contains a stage_id attribute as its unique identifier. Dependencies between stage objects can be declared by calling the dependency-setting function set_dependency. A stage object takes several tensor objects as its input and an optional output. A stage's inputs are of two types: one is called the primary variable primary_variable, the other is called the secondary variable secondary_variable. The (key, value) pairs of a primary variable have no dependencies between their keys, whereas the (key, value) pairs of a secondary variable do have dependencies between their keys. For each stage the programmer must provide a kernel function kernel_function, which is the core logic of that stage. The programmer must also provide the mapping function key_projection between the keys of the primary variable and the keys of the secondary variable; the runtime system derives the keys of the secondary variable automatically from the keys of the primary variable and the key_projection function. Each primary variable and secondary variable of a stage has a corresponding variable called its update_variable. The update_variable is used to update the corresponding variable, and the update logic is defined by a user-provided update_function.
The stage group module is used to describe a group of stages. The stages represented by a stage group are closely related. A stage group has a group_id attribute as its unique identifier and exposes two interfaces, run and set_barrier. The optional parameter of the run method is an integer num_run that specifies how many times the stage group is executed. The set_barrier interface sets a synchronization operation: it indicates that, after the current stage group has finished executing, all worker nodes must enter a barrier wait state and may continue only after the stage group has finished on every worker node.
The scheduler module decides, for a given tensor object, the set of keys key_set that the worker node will process in the next phase. The scheduler module on a service node periodically broadcasts the bandwidth information of its node; the scheduler module on a worker node, based on the service-node bandwidth information it has obtained, decides the key set of model parameters assigned to the worker node for the next round of processing.
The execution engine module describes the stages of a stage group and their dependency relations as a directed acyclic graph (DAG). In this DAG, a node represents a stage and an edge represents a dependency between stages: the stage at the tail of an edge must execute before the stage at its head.
The message tracking module records, while the program is running, the messages submitted by the tensor module, the scheduler module, the stage group module, the worker node module, the service node module, and the master node module. When a message is submitted to the message tracking module, the module is responsible for delivering it to the recipient; after the recipient returns a message receipt, the message tracking module notifies the original sender of the message and delivers the acknowledgement.
Fig. 2 is the overall workflow diagram of the automated task parallelization method of the present invention. As shown in Fig. 2, the overall workflow of the automated task parallelization method of the present invention includes the following steps:
(1) System initialization step: initialize the node topology information and the application logic;
(2) Parallel training step: the master node and the service nodes skip the specific training logic and proceed to step (3); each worker node enters the model training state and performs iterative parallel training on its input training data subset until the predefined iteration termination condition is met;
(3) System termination step: the worker nodes inform the master node that their work is complete; after the master node detects that the work of all worker nodes is complete, it notifies all nodes to exit the program in a coordinated way.
Fig. 3 is the system-initialization sub-workflow diagram of the automated task parallel execution method of the present invention. As shown in Fig. 3, the system-initialization workflow of the automated task parallel execution method of the present invention includes the following steps:
(1.1) All nodes start running and each reads the configuration file to determine its own role, the role being worker node, service node, or master node; go to sub-step (1.2);
(1.2) The worker nodes and service nodes each communicate with the master node to report their node information, and the master node broadcasts the collected node information to all other nodes; go to step (1.3);
(1.3) After the worker nodes and service nodes receive the node information sent by the master node, they initialize the node topology information, which is used for subsequent inter-node communication; go to step (1.4);
(1.4) The worker nodes and service nodes initialize the application logic; the runtime system determines the execution order of the stage groups from the order in which they appear in the program code, and constructs the DAG corresponding to each stage group; go to step (2).
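A minimal sketch of this initialization flow is shown below: every node reads the configuration to learn its role, workers and service nodes report to the master, and the master broadcasts the collected topology. The configuration format and the stubbed communication helpers are assumptions made for the example, not the invention's actual protocol.

```python
# Illustrative sketch of the initialization workflow in Fig. 3 with stubbed communication.

def initialize(config):
    """config: {'role': 'worker'|'server'|'master', 'nodes': [...], 'master_address': ...}."""
    role = config["role"]                                    # (1.1): determine own role
    if role == "master":
        topology = collect_node_info(config["nodes"])        # (1.2): gather node info
        broadcast(topology, config["nodes"])                 # broadcast it to all other nodes
    else:
        report_to_master(config["master_address"], role)     # (1.2): report own info
        topology = wait_for_topology()                       # (1.3): receive the broadcast
    return topology                                          # used for later communication

# Stubs standing in for real network communication.
def collect_node_info(nodes):
    return {n: "info" for n in nodes}

def broadcast(topology, nodes):
    pass

def report_to_master(address, role):
    pass

def wait_for_topology():
    return {}
```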
Fig. 4 is the parallel-training sub-workflow diagram of the automated task parallelization method of the present invention. As shown in Fig. 4, the parallel-training sub-workflow of a worker node includes the following steps:
(2) Parallel training step: the master node and the service nodes skip the specific training logic and proceed to step (3); each worker node enters the model training state and performs iterative parallel training on its input training data subset until the predefined iteration termination condition is met. The behavior of a worker node specifically includes the following sub-steps:
(2.1) For each stage group whose order has been determined, the runtime system of the worker node topologically sorts the nodes of its DAG to determine the execution order of all stages within each stage group; go to step (2.2);
(2.2) Call the stage group that has not yet been executed next_group; set next_group to the stage group that appears first in the machine learning application logic; go to step (2.3), or go to step (2.6) if there is currently no unexecuted stage group;
(2.3) Execute the run method of the stage group indicated by next_group num_run times (num_run is a parameter supplied by the user when the program starts). For a single invocation of the run method, the runtime creates a batch of threads and, following the stage execution order within the stage group determined in step (2.1), executes the run methods of all stages in turn: stages with smaller numbers execute first, and stages with the same number execute in a pipelined fashion. After the stage group has executed num_run times, go to step (2.4);
(2.4) The runtime system of the worker node checks whether set_barrier is set on the stage group that has just executed num_run times; if set_barrier is set, it performs the barrier synchronization operation; go to step (2.5);
(2.5) If there is still a stage group that has not been executed, set next_group to that stage group and go to step (2.3); otherwise go to step (2.6);
(2.6) The runtime system of the worker node checks whether the iteration termination condition has been reached; if so, go to step (3), otherwise go to step (2.1).
Fig. 5 is the system-termination sub-workflow diagram of the automated task parallelization method of the present invention. As shown in Fig. 5, the system-termination sub-workflow of the automated task parallelization method of the present invention includes the following steps:
(3.1) All worker nodes send a job_done message to the master node; after the master node has received the job_done messages of all worker nodes, it sends a sys_exit message to all worker nodes and service nodes; go to step (3.2);
(3.2) After the worker nodes and service nodes receive the sys_exit message, they send a sys_exit_ack message to the master node; go to step (3.3);
(3.3) The master node receives the sys_exit_ack messages sent by all worker nodes and service nodes; go to step (3.4);
(3.4) All nodes terminate the program.
Further, the process of determining the execution order of the stages within a stage group in step (2.1) specifically includes the following sub-steps:
(2.1.1) Set the current unassigned number order to 0 and set the node set nodes of nodes with in-degree 0 to empty; go to step (2.1.2);
(2.1.2) Add the nodes whose in-degree in the current DAG is 0 to the set nodes; number all nodes in the set nodes with order, and increment order by 1; remove the nodes in the set nodes and all of their outgoing edges from the DAG, and set the set nodes back to empty; go to step (2.1.3);
(2.1.3) Check whether the current DAG is empty; if it is empty, go to step (2.2), otherwise go to step (2.1.2).
Further, the run method of a stage described in step (2.3) specifically includes the following sub-steps:
(2.3.1) The runtime system of the worker node calls the stage's prepare_variables method to determine the key set primary_key_set of the current stage's primary variable primary_variable to be processed. Specifically, based on the service node loads (L1, L2, ..., Ln) obtained from the scheduler module and the distribution of the keys of the primary variable primary_variable across the service nodes, the portion of keys maintained on the service node with the lowest network load that has not yet been processed by the worker node is assigned to the worker node as the key set to be processed next; go to sub-step (2.3.2);
(2.3.2) From the key_projection function provided by the user and the primary-variable key set primary_key_set determined in (2.3.1), the runtime system derives the secondary-variable key set secondary_key_set, and calls the pull methods of the tensor objects such as the primary variable and the secondary variable to pull the required model parameters; go to step (2.3.3);
(2.3.3) Execute the stage's kernel function kernel_function: the runtime system automatically divides the key set key_set of the primary variable into num_threads parts and creates num_threads threads to execute the kernel function in parallel, where num_threads is a parameter provided by the user; go to step (2.3.4);
(2.3.4) Running the kernel function kernel_function produces the update variable v_update; the runtime system updates the corresponding variable v according to the update_function provided by the user. If the type of the variable v is globally shared or globally unique, the runtime calls the push function of the variable v to propagate the update: the (key, value) pairs to be updated are serialized, and the serialized data are sent to all service nodes that maintain this segment of keys; after a service node receives the update data, it updates the data it maintains.
Further, the pull method described in step (2.3.2) has the following sub-steps:
(2.3.2.1) The key set key_set that the tensor object is to pull is serialized, and the serialized data are sent to the service node that maintains this segment of keys; go to step (2.3.2.2);
(2.3.2.2) After the service node receives the pull message, it serializes the (key, value) two-tuple data corresponding to key_set and returns the serialized data to the requester.
As will be readily understood by those skilled in the art, the foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (5)
1. An automated task parallelization system for distributed machine learning, characterized in that it comprises a worker node module, a service node module, a master node module, a tensor module, a scheduler module, a message tracking module, a stage module, a stage group module, and an execution engine module; wherein the stage module and the scheduler module are both connected to the tensor module, the stage module is connected to the stage group module, the execution engine module is connected to the stage module, and the scheduler module, the tensor module, and the stage group module are all connected to the message tracking module;
the worker node module and the service node module are used respectively to describe abstractly the behavior of the worker nodes and the parameter service nodes;
the master node module is an abstract description of the master node; the master node coordinates the workflow of the whole system, including the initialization and the termination of the system;
the tensor module is used to describe the key-value pairs (key, value) of model parameters in machine learning; an application needs multiple tensor objects to describe the model parameters required for training, and each tensor object has a tensor_id attribute as its unique identifier; there are three types of tensor object: globally shared (global shared), globally unique (global unique), and local (local); globally shared means the tensor object is maintained by distributed nodes and the data maintained by different nodes may intersect; globally unique means the tensor object is maintained by distributed nodes and the data maintained by different nodes do not intersect; local means the tensor object exists only on a single node; a tensor object provides operation interfaces such as load (load), pull (pull), and push (push) for the programmer to use;
the stage module is used to describe a segment of program logic in the application; the overall logic of the application is decomposed into different stages, and each stage object contains a stage_id attribute as its unique identifier; dependencies between stage objects can be declared by calling the dependency-setting function set_dependency; a stage object takes several tensor objects as its input and an optional output; the inputs of a stage object are of two types: one is called the primary variable primary_variable, the other is called the secondary variable secondary_variable; the (key, value) pairs of a primary variable have no dependencies between their keys, whereas the (key, value) pairs of a secondary variable do have dependencies between their keys; for each stage the programmer must provide a kernel function kernel_function as the core logic of that stage; the programmer must also provide the mapping function key_projection between the keys of the primary variable and the keys of the secondary variable, and the runtime system derives the keys of the secondary variable automatically from the keys of the primary variable and the key_projection function; each primary variable and secondary variable of a stage has a corresponding variable called its update_variable; the update_variable is used to update the corresponding variable, and the update logic is defined by a user-provided update_function;
the stage group module is used to describe a group of stages; the stages represented by a stage group are closely related; a stage group has a group_id attribute as its unique identifier and exposes two interfaces, run and set_barrier; the optional parameter of the run method is an integer num_run that specifies how many times the stage group is executed; the set_barrier interface sets a synchronization operation, indicating that after the current stage group has finished executing, all worker nodes must enter a barrier wait state and may continue only after the stage group has finished on every worker node;
the scheduler module decides, for a given tensor object, the set of keys key_set that the worker node will process in the next phase; the scheduler on a service node periodically broadcasts the bandwidth information of its node, and the scheduler on a worker node, based on the service-node bandwidth information it has obtained, decides the key set of model parameters assigned to the worker node for the next round of processing;
the execution engine module is used to describe the stages of a stage group and their dependency relations as a directed acyclic graph (DAG); in this DAG, a node represents a stage and an edge represents a dependency between stages, the stage at the tail of an edge executing before the stage at its head;
the message tracking module is used to record, while the program is running, the messages submitted by the tensor module, the scheduler module, the stage group module, the worker node module, the service node module, and the master node module; when a message is submitted to the message tracking module, the module is responsible for delivering it to the recipient; after the recipient returns a message receipt, the message tracking module notifies the original sender of the message and delivers the acknowledgement.
2. An automated task parallelization method for distributed machine learning, characterized in that it comprises a system initialization step, a parallel training step, and a system termination step, wherein:
(1) System initialization step: initialize the node topology information and the application logic, which specifically includes the following sub-steps:
(1.1) All nodes start running and each reads the configuration file to determine its own role, the role being worker node, service node, or master node; go to sub-step (1.2);
(1.2) The worker nodes and service nodes each communicate with the master node to report their node information, and the master node broadcasts the collected node information to all other nodes; go to step (1.3);
(1.3) After the worker nodes and service nodes receive the node information sent by the master node, they initialize the node topology information, which is used for subsequent inter-node communication; go to step (1.4);
(1.4) The worker nodes and service nodes initialize the application logic; the runtime system determines the execution order of the stage groups from the order in which they appear in the program code, and constructs the DAG corresponding to each stage group; go to step (2);
(2) Parallel training step: the master node and the service nodes skip the specific training logic and proceed to step (3); each worker node enters the model training state and performs iterative parallel training on its input training data subset until the predefined iteration termination condition is met; the behavior of a worker node specifically includes the following sub-steps:
(2.1) For each stage group whose order has been determined, the runtime system of the worker node topologically sorts the nodes of its DAG to determine the execution order of all stages within each stage group; go to step (2.2);
(2.2) Call the stage group that has not yet been executed next_group; set next_group to the stage group that appears first in the machine learning application logic; go to step (2.3), or go to step (2.6) if there is currently no unexecuted stage group;
(2.3) Execute the run method of the stage group indicated by next_group num_run times (num_run is a parameter supplied by the user when the program starts); for a single invocation of the run method, the runtime creates a batch of threads and, following the stage execution order within the stage group determined in step (2.1), executes the run methods of all stages in turn, stages with smaller numbers executing first and stages with the same number executing in a pipelined fashion; after the stage group has executed num_run times, go to step (2.4);
(2.4) The runtime system of the worker node checks whether set_barrier is set on the stage group that has just executed num_run times; if set_barrier is set, it performs the barrier synchronization operation; go to step (2.5); the set_barrier interface sets a synchronization operation, indicating that after the current stage group has finished executing, all worker nodes must enter a barrier wait state and may continue only after the stage group has finished on every worker node;
(2.5) If there is still a stage group that has not been executed, set next_group to that stage group and go to step (2.3); otherwise go to step (2.6);
(2.6) The runtime system of the worker node checks whether the iteration termination condition has been reached; if so, go to step (3), otherwise go to step (2.1);
(3) System termination step: the worker nodes inform the master node that their work is complete; after the master node detects that the work of all worker nodes is complete, it notifies all nodes to exit the program in a coordinated way, which specifically includes the following sub-steps:
(3.1) All worker nodes send a job_done message to the master node; after the master node has received the job_done messages of all worker nodes, it sends a sys_exit message to all worker nodes and service nodes; go to step (3.2);
(3.2) After the worker nodes and service nodes receive the sys_exit message, they send a sys_exit_ack message to the master node; go to step (3.3);
(3.3) The master node receives the sys_exit_ack messages sent by all worker nodes and service nodes; go to step (3.4);
(3.4) All nodes terminate the program.
3. The automated task parallelization method for distributed machine learning according to claim 2, characterized in that the process of determining the execution order of the stages within a stage group in step (2.1) specifically includes the following sub-steps:
(2.1.1) Set the current unassigned DAG node number order to 0 and set the node set nodes of nodes with in-degree 0 to empty; go to step (2.1.2);
(2.1.2) Add the nodes whose in-degree in the current DAG is 0 to the set nodes; number all nodes in the set nodes with order, and increment order by 1; remove the nodes in the set nodes and all of their outgoing edges from the DAG, and set the set nodes back to empty; go to step (2.1.3);
(2.1.3) Check whether the current DAG is empty; if it is empty, go to step (2.2), otherwise go to step (2.1.2).
4. The automated task parallelization method for distributed machine learning according to claim 2, characterized in that the run method of a stage described in step (2.3) specifically includes the following sub-steps:
(2.3.1) The runtime system of the worker node calls the stage's prepare_variables method to determine the key set primary_key_set of the current stage's primary variable primary_variable to be processed; specifically, based on the service node loads (L1, L2, ..., Ln) obtained from the scheduler module and the distribution of the keys of the primary variable primary_variable across the service nodes, the portion of keys maintained on the service node with the lowest network load that has not yet been processed by the worker node is assigned to the worker node as the key set primary_key_set to be processed next; go to sub-step (2.3.2);
(2.3.2) From the key_projection function provided by the user and the primary-variable key set primary_key_set determined in (2.3.1), the runtime system derives the secondary-variable key set secondary_key_set, and calls the pull methods of the tensor objects such as the primary variable and the secondary variable to pull the required model parameters; go to step (2.3.3);
(2.3.3) Execute the stage's kernel function kernel_function: the runtime system automatically divides the key set key_set of the primary variable into num_threads parts and creates num_threads threads to execute the kernel function in parallel, where num_threads is a parameter provided by the user; go to step (2.3.4);
(2.3.4) Running the kernel function kernel_function produces the update variable v_update; the runtime system updates the corresponding variable v according to the update_function provided by the user; if the type of the variable v is globally shared or globally unique, the runtime calls the push function of the variable v to propagate the update: the (key, value) pairs to be updated are serialized, and the serialized data are sent to all service nodes that maintain this segment of keys; after a service node receives the update data, it updates the data it maintains.
5. The automated task parallelization method for distributed machine learning according to claim 4, characterized in that the pull method described in step (2.3.2) has the following sub-steps:
(2.3.2.1) The key set key_set that the tensor object is to pull is serialized, and the serialized data are sent to the service node that maintains this segment of keys; go to step (2.3.2.2);
(2.3.2.2) After the service node receives the pull message, it serializes the (key, value) two-tuple data corresponding to key_set and returns the serialized data to the requester.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610255970.5A CN105956021B (en) | 2016-04-22 | 2016-04-22 | Automated task parallelization method and system for distributed machine learning
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610255970.5A CN105956021B (en) | 2016-04-22 | 2016-04-22 | Automated task parallelization method and system for distributed machine learning
Publications (2)
Publication Number | Publication Date |
---|---|
CN105956021A CN105956021A (en) | 2016-09-21 |
CN105956021B true CN105956021B (en) | 2019-05-21 |
Family
ID=56915367
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610255970.5A Active CN105956021B (en) | 2016-04-22 | 2016-04-22 | Automated task parallelization method and system for distributed machine learning
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105956021B (en) |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108009642B (en) * | 2016-10-31 | 2021-12-14 | 腾讯科技(深圳)有限公司 | Distributed machine learning method and system |
CN108229686B (en) * | 2016-12-14 | 2022-07-05 | 阿里巴巴集团控股有限公司 | Model training and predicting method and device, electronic equipment and machine learning platform |
EP3376441B1 (en) * | 2017-03-15 | 2021-07-14 | Siemens Aktiengesellschaft | A method for execution of a machine learning model on memory restricted industrial device |
CN108733461B (en) * | 2017-04-18 | 2021-09-14 | 北京京东尚科信息技术有限公司 | Distributed task scheduling method and device |
CN107231558B (en) * | 2017-05-23 | 2019-10-22 | 江苏火米互动科技有限公司 | A kind of implementation method of the H.264 parallel encoder based on CUDA |
CN111597187B (en) * | 2017-08-30 | 2023-09-01 | 第四范式(北京)技术有限公司 | Distributed system for performing machine learning and method thereof |
CN109447274B (en) * | 2017-08-30 | 2021-02-09 | 第四范式(北京)技术有限公司 | Distributed system for performing machine learning and method thereof |
CN111079942B (en) * | 2017-08-30 | 2023-03-24 | 第四范式(北京)技术有限公司 | Distributed system for performing machine learning and method thereof |
CN109814986B (en) * | 2017-11-20 | 2021-01-05 | 上海寒武纪信息科技有限公司 | Task parallel processing method, storage medium, computer equipment, device and system |
CN107944566B (en) * | 2017-11-28 | 2020-12-22 | 杭州云脑科技有限公司 | Machine learning method, main node, working node and system |
CN109960570B (en) * | 2017-12-14 | 2021-09-03 | 北京图森智途科技有限公司 | Multi-module scheduling method, device and system |
CN108681777B (en) * | 2018-05-07 | 2021-07-20 | 北京京东尚科信息技术有限公司 | Method and device for running machine learning program based on distributed system |
CN109871958B (en) * | 2019-02-01 | 2023-07-28 | 东软医疗系统股份有限公司 | Method, device and equipment for training model |
WO2020243973A1 (en) * | 2019-06-06 | 2020-12-10 | 华为技术有限公司 | Model-based signal inference method and apparatus |
US11907770B2 (en) | 2019-09-19 | 2024-02-20 | Huawei Cloud Computing Technologies Co., Ltd. | Method and apparatus for vectorized resource scheduling in distributed computing systems using tensors |
CN110990059B (en) * | 2019-11-28 | 2021-11-19 | 中国科学院计算技术研究所 | Stream type calculation engine operation method and system for tilt data |
TWI780382B (en) * | 2019-12-05 | 2022-10-11 | 新唐科技股份有限公司 | Microcontroller updating system and method |
CN111506402B (en) * | 2020-03-31 | 2023-06-27 | 上海氪信信息技术有限公司 | Computer task scheduling method, device, equipment and medium for machine learning modeling |
CN111580970B (en) * | 2020-05-07 | 2023-02-03 | 电子科技大学 | Transmission scheduling method for model distribution and aggregation of federated learning |
CN111753997B (en) * | 2020-06-28 | 2021-08-27 | 北京百度网讯科技有限公司 | Distributed training method, system, device and storage medium |
US11954611B2 (en) | 2020-08-27 | 2024-04-09 | International Business Machines Corporation | Tensor comparison across a distributed machine learning environment |
CN112214256B (en) * | 2020-09-30 | 2024-02-02 | 招商局金融科技有限公司 | Machine learning operation control method and device, electronic equipment and storage medium |
CN113157413B (en) * | 2021-04-16 | 2022-04-26 | 上海交通大学 | Deep learning task resource optimization configuration method and system based on service quality requirement |
CN113703980B (en) * | 2021-08-31 | 2024-09-06 | 西安电子科技大学 | Distributed machine learning system and communication scheduling method suitable for same |
CN114461392B (en) * | 2022-01-25 | 2023-03-31 | 西南交通大学 | Bandwidth-aware selective data multicast method |
CN115314397B (en) * | 2022-08-05 | 2023-07-21 | 中科计算技术西部研究院 | Network simulation method, system, device and storage medium for distributed training |
CN116483580B (en) * | 2022-09-29 | 2024-05-28 | 陕西震旦纪信息技术有限公司 | System and method for scheduling server computing resources based on Kubernetes |
CN116662039B (en) * | 2023-07-25 | 2024-01-23 | 菲特(天津)检测技术有限公司 | Industrial information parallel detection method, device and medium based on shared memory |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8027938B1 (en) * | 2007-03-26 | 2011-09-27 | Google Inc. | Discriminative training in machine learning |
CN102546247A (en) * | 2011-12-29 | 2012-07-04 | 华中科技大学 | Massive data continuous analysis system suitable for stream processing |
CN103763378A (en) * | 2014-01-24 | 2014-04-30 | 中国联合网络通信集团有限公司 | Task processing method and system and nodes based on distributive type calculation system |
CN104360903A (en) * | 2014-11-18 | 2015-02-18 | 北京美琦华悦通讯科技有限公司 | Method for realizing task data decoupling in spark operation scheduling system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9563670B2 (en) * | 2013-03-14 | 2017-02-07 | Leidos, Inc. | Data analytics system |
2016-04-22: Application CN201610255970.5A filed in China (CN); patent CN105956021B; current status: Active.
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8027938B1 (en) * | 2007-03-26 | 2011-09-27 | Google Inc. | Discriminative training in machine learning |
CN102546247A (en) * | 2011-12-29 | 2012-07-04 | 华中科技大学 | Massive data continuous analysis system suitable for stream processing |
CN103763378A (en) * | 2014-01-24 | 2014-04-30 | 中国联合网络通信集团有限公司 | Task processing method and system and nodes based on distributive type calculation system |
CN104360903A (en) * | 2014-11-18 | 2015-02-18 | 北京美琦华悦通讯科技有限公司 | Method for realizing task data decoupling in spark operation scheduling system |
Non-Patent Citations (4)
Title |
---|
Distributed GraphLab: A Framework for Machine Learning; Yucheng Low; Proceedings of the VLDB Endowment; 2012-05-31; full text |
Petuum: A Framework for Iterative-Convergent; Wei Dai; Proceedings of Advances in Neural Information Processing Systems; 2013-12-31; full text |
Scaling Distributed Machine Learning; Mu Li; Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation; 2014-12-31; full text |
Cache coherence maintenance mechanism in a transactional-memory-based distributed programming environment; Yu Linchen et al.; Microelectronics & Computer; 2013-03-31; full text |
Also Published As
Publication number | Publication date |
---|---|
CN105956021A (en) | 2016-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105956021B (en) | A kind of automation task suitable for distributed machines study parallel method and its system | |
Kim et al. | Strads: A distributed framework for scheduled model parallel machine learning | |
CN105117286B (en) | The dispatching method of task and streamlined perform method in MapReduce | |
Ward et al. | Colmena: Scalable machine-learning-based steering of ensemble simulations for high performance computing | |
US10754709B2 (en) | Scalable task scheduling systems and methods for cyclic interdependent tasks using semantic analysis | |
US20240111586A1 (en) | Multi-policy intelligent scheduling method and apparatus oriented to heterogeneous computing power | |
Yu et al. | Automated runtime-aware scheduling for multi-tenant dnn inference on gpu | |
CN109891438B (en) | Numerical quantum experiment method and system | |
CN112416585B (en) | Deep learning-oriented GPU resource management and intelligent scheduling method | |
CN112764893B (en) | Data processing method and data processing system | |
CN104243617A (en) | Task scheduling method and system facing mixed load in heterogeneous cluster | |
US20210390405A1 (en) | Microservice-based training systems in heterogeneous graphic processor unit (gpu) cluster and operating method thereof | |
CN112948066A (en) | Spark task scheduling method based on heterogeneous resources | |
CN112052081A (en) | Task scheduling method and device and electronic equipment | |
Rocha et al. | Pipetune: Pipeline parallelism of hyper and system parameters tuning for deep learning clusters | |
CN117009038B (en) | Graph computing platform based on cloud native technology | |
Feljan et al. | Task allocation optimization for multicore embedded systems | |
CN106844024B (en) | GPU/CPU scheduling method and system of self-learning running time prediction model | |
US20240193721A1 (en) | System and method for adaptive graph-to-stream scheduling | |
US10719903B2 (en) | On-the fly scheduling of execution of dynamic hardware behaviors | |
CN114925591A (en) | Automatic parallel strategy searching method based on polyhedron model modeling and related equipment | |
US20100131740A1 (en) | Data processing system and data processing method | |
CN116991878A (en) | Method and system for generating distributed execution plan based on Q-learning | |
Zhou et al. | Scheduling-efficient framework for neural network on heterogeneous distributed systems and mobile edge computing systems | |
CN113902567B (en) | Method and device for executing tasks and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |