CN112448899A - Flow scheduling-based multitask training cluster network optimization method - Google Patents
- Publication number
- CN112448899A (application CN201910819132.XA)
- Authority
- CN
- China
- Prior art keywords
- priority
- task
- training
- communication
- queue
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/50—Queue scheduling
- H04L47/62—Queue scheduling characterised by scheduling criteria
- H04L47/6215—Individual queue per QOS, rate or priority
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention provides a multitask training cluster network optimization method based on traffic scheduling. The method determines the traffic priority of each training task from the task's characteristics: a task that is still within its first I iterations after training starts is given the highest priority, while a task that has passed its I-th iteration is given one of the remaining priorities according to the total amount of data it has sent in all previous iterations. Communication queues are then constructed, the traffic of each task is mapped into a queue according to its priority, and communication is carried out on the basis of these queues, which improves communication efficiency.
Description
Technical Field
The invention relates to network communication in multitask machine learning training clusters, and in particular to a multitask training cluster network optimization method based on traffic scheduling.
Background
Deep learning (DL) has achieved wide success in services driven by artificial intelligence and is at the core of basic products in many related fields. Because training a deep neural network (DNN) is computationally very expensive, timely training requires exploiting the parallel computing capability of distributed systems. Leading IT companies such as Microsoft, Facebook and Google have therefore begun running distributed deep learning training (DLT) tasks on production clusters of hundreds or thousands of servers. As a compute-intensive workload, DLT calls for focused effort on efficient scheduling of cluster computing resources. Meanwhile, as GPUs become faster and models become larger, the performance bottleneck of the cluster is shifting from computation to communication. However, network optimization for DLT in production environments is still at an early stage, and existing parameter exchange mechanisms have serious shortcomings.
It should be noted in particular that deep learning training clusters (DL clusters) in production environments are full of uncertainty. When several, tens or even hundreds of training tasks run simultaneously on a large cluster, the tasks (especially different tasks scheduled onto the same computing node) have to share the cluster network. As a result, there is strong competition for network resources between the traffic of different training tasks, and between long-lived elephant flows and delay-sensitive mouse flows within that traffic.
Disclosure of Invention
In view of this, the present invention provides a method for optimizing a multitask training cluster network based on traffic scheduling.
In one aspect, an embodiment of the present invention provides a traffic scheduling method based on a task priority queue.
The traffic scheduling method comprises the following steps:
constructing K ready queues (K is a positive integer not less than 2), wherein each queue corresponds to one priority, the first queue has the highest priority, and the priority decreases from one queue to the next;
placing the traffic of each training task in the queue corresponding to its priority and scheduling it by priority, wherein:
the traffic of a task that is within its first I iterations after training starts is given the highest priority;
the traffic of a task that has passed its I-th iteration after training starts is mapped to one of the priority queues other than the highest-priority queue according to the total amount of data the task has sent in all previous iterations; the larger the total amount sent, the lower the priority of the task;
wherein I is a positive integer whose value is generally set according to experience, the model, and similar conditions.
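The mapping rule above can be illustrated with a minimal Python sketch. The patent does not say how the total amount sent is divided among the lower queues, so the byte thresholds used here, together with the names `assign_priority`, `warmup_iters` and `thresholds`, are assumptions made only for illustration; iterations are assumed to be counted from 1.

```python
from bisect import bisect_left


def assign_priority(iteration, total_sent_bytes, warmup_iters, thresholds):
    """Return a queue index, where 0 is the highest priority.

    iteration        -- current iteration of the task, counted from 1 (assumption)
    total_sent_bytes -- bytes the task has sent in all previous iterations
    warmup_iters     -- the value I from the text
    thresholds       -- ascending byte limits separating queues 1..K-1
                        (hypothetical; K = len(thresholds) + 2)
    """
    if iteration <= warmup_iters:
        # Tasks still within their first I iterations get the top queue.
        return 0
    # Each threshold that the total strictly exceeds pushes the task one
    # queue lower, so a larger total gives a lower priority.
    return 1 + bisect_left(thresholds, total_sent_bytes)


# Example with K = 4 queues and I = 5.
limits = [10**9, 10**10]
print(assign_priority(3, 10**12, 5, limits))     # early task: queue 0
print(assign_priority(6, 5 * 10**8, 5, limits))  # small sender: queue 1
print(assign_priority(6, 5 * 10**10, 5, limits)) # large sender: queue 3
```

Using `bisect_left` means a task whose total equals a threshold exactly stays in the higher queue; only a total that exceeds the threshold moves it down.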
In another aspect, an embodiment of the present invention provides a method for optimizing a multi-task training cluster network.
Building on the traffic scheduling method of the first aspect, the method for optimizing a multitask training cluster network includes:
acquiring task characteristics and determining the traffic priority of each training task according to the task characteristics, wherein:
a task that is within its first I iterations after training starts is given the highest priority;
a task that has passed its I-th iteration after training starts is given one of the priorities other than the highest according to the total amount of data the task has sent in all previous iterations; the larger the total amount sent, the lower the priority of the task;
wherein I is a positive integer whose value is generally set according to experience, the model, and similar conditions;
scheduling, by the traffic scheduling method of the first aspect, the traffic of each training task on each computing node of the cluster and controlling the traffic communication of each training task, so that the average completion time of the training tasks is minimized.
In another aspect, an embodiment of the present invention provides a traffic scheduling module based on a task priority queue.
Corresponding to the first aspect, the traffic scheduling module includes:
a priority component for obtaining or receiving the communication priority of each task;
and a communication queue component for constructing K ready queues (K is a positive integer not less than 2), wherein each queue corresponds to one priority, the first queue has the highest priority, and the priority decreases from one queue to the next;
the communication queue component maps the traffic of each task to the corresponding ready queue according to the obtained task communication priority and schedules the communication.
In another aspect, an embodiment of the present invention provides a multitask training cluster network system based on traffic scheduling.
With reference to the second and third aspects, the multitask training cluster network system includes:
a communication management unit and a traffic scheduling unit;
the communication management unit is used for determining the communication priority of each task; specifically, after obtaining the task characteristics, the communication management unit determines the traffic priority of each training task according to the number of training iterations and other information in the task characteristics, wherein:
a task that is within its first I iterations after training starts is given the highest priority;
a task that has passed its I-th iteration after training starts is given one of the priorities other than the highest according to the total amount of data the task has sent in all previous iterations; the larger the total amount sent, the lower the priority of the task;
wherein I is a positive integer whose value is generally set according to experience, the model, and similar conditions;
the traffic scheduling unit includes the traffic scheduling module of the third aspect, and is configured to obtain the task communication priority determined by the communication management unit and to schedule communication according to that priority.
The traffic scheduling method and module based on task priority queues, and the multitask training cluster network optimization method and system built on them, improve communication efficiency by determining the communication priority of each task, constructing communication queues, mapping the traffic of each task into a queue according to its priority, and communicating on the basis of these queues.
The technical solution of the present invention is further described with reference to the accompanying drawings and specific embodiments.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings related to a part of the embodiments of the present invention or the description in the prior art will be briefly introduced below.
Fig. 1 is a flowchart illustrating a method for optimizing a multi-task training cluster network based on traffic scheduling according to some embodiments of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention is clearly and completely described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of a portion of the invention and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
The following are some preferred embodiments of the invention.
Some preferred embodiments provide a traffic scheduling method based on task priority queues. The traffic scheduling method comprises the following steps:
at a host serving as a computing node, the system of the host constructs K ready queues (K is a positive integer not less than 2) for traffic scheduling, wherein each queue corresponds to one priority, the first queue has the highest priority, and the priority decreases from one queue to the next;
the traffic of each training task running on the same host enters the queue corresponding to the priority of that task and is scheduled by priority, wherein:
the traffic of a task that is within its first I iterations after training starts is given the highest priority, so that early feedback is obtained to predict and guide subsequent training; the traffic of a task that has passed its I-th iteration after training starts is mapped to one of the priority queues other than the highest-priority queue according to the total amount of data the task has sent in all previous iterations; the larger the total amount sent, the lower the priority of the task; wherein I is a positive integer whose value is generally set according to experience, the model, and similar conditions.
In some of the preferred embodiments above, the traffic scheduling method based on task priority queues also accounts for the fact that the amount of traffic sent by each task changes dynamically during communication; in short, the data volume of any task's traffic may change before it is sent. To address this, in these embodiments the task priority is changed dynamically:
for any task, when the amount of data sent by the task (for example, the number of bytes) exceeds a preset threshold, the priority of the task is lowered and the traffic of the task is moved to a lower queue.
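The demotion rule can be sketched as a small stateful tracker. This is only an illustration under stated assumptions: the patent mentions a preset threshold but not whether each queue has its own limit, so the per-queue list `demotion_thresholds` and the names `TaskFlow` and `record_send` are hypothetical.

```python
class TaskFlow:
    """Tracks the bytes sent by one task and demotes it as limits are crossed."""

    def __init__(self, task_id, priority, demotion_thresholds):
        self.task_id = task_id
        self.priority = priority  # 0 is the highest-priority queue
        self.sent_bytes = 0
        # demotion_thresholds[p] is the byte limit for staying in queue p
        # (assumed layout); the lowest queue has no limit.
        self.demotion_thresholds = demotion_thresholds

    def record_send(self, nbytes):
        """Account for nbytes just sent and return the resulting queue index."""
        self.sent_bytes += nbytes
        lowest = len(self.demotion_thresholds)
        # Move down one queue for every limit the running total now exceeds.
        while (self.priority < lowest
               and self.sent_bytes > self.demotion_thresholds[self.priority]):
            self.priority += 1
        return self.priority


flow = TaskFlow("task-a", priority=0, demotion_thresholds=[100, 1000])
print(flow.record_send(50))    # 50 bytes sent, still in queue 0
print(flow.record_send(60))    # 110 bytes sent, demoted to queue 1
print(flow.record_send(2000))  # 2110 bytes sent, demoted to queue 2
```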
Some of the preferred embodiments above also address waiting in the lowest queue. When a task has been placed in the lowest-priority queue and the traffic of other tasks keeps arriving and being served, forming a steady communication stream, the traffic of that task is left waiting. To address this, the priority of low-priority traffic that has been waiting for a long time is raised.
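One way to picture this promotion of long-waiting traffic is the sketch below. The patent does not specify how long traffic must wait or how far it is promoted, so the `max_wait` limit, the single-level promotion, the reset of the waiting clock, and the name `promote_waiting_flows` are all assumptions.

```python
def promote_waiting_flows(queues, now, max_wait):
    """Promote flows that waited longer than max_wait by one queue level.

    queues -- list of lists of (task_id, enqueued_at); index 0 is highest
    now    -- current time, in the same unit as enqueued_at
    """
    for level in range(1, len(queues)):
        still_waiting = []
        for task_id, enqueued_at in queues[level]:
            if now - enqueued_at > max_wait:
                # Move up one level and restart the waiting clock, so a
                # flow is promoted at most once per call.
                queues[level - 1].append((task_id, now))
            else:
                still_waiting.append((task_id, enqueued_at))
        queues[level] = still_waiting
    return queues


queues = [[], [("a", 0.0)], [("b", 0.0), ("c", 9.0)]]
promote_waiting_flows(queues, now=10.0, max_wait=5.0)
print(queues)  # "a" and "b" each moved up one level; "c" stays
```

Whether promoted traffic should be allowed into the highest queue, which the text reserves for tasks in their first I iterations, is a design choice the source leaves open; this sketch allows it.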
Other embodiments of the present invention provide a method for optimizing a multitask training cluster network based on traffic scheduling. As shown in Fig. 1, the method includes:
acquiring and analyzing task characteristics, and determining the traffic priority of each training task according to the number of training iterations, the amount of data previously sent, and similar information, wherein:
a task that is within its first I iterations after training starts is given the highest priority;
a task that has passed its I-th iteration after training starts is given one of the priorities other than the highest according to the total amount of data the task has sent in all previous iterations; the larger the total amount sent, the lower the priority of the task;
wherein I is a positive integer whose value is generally set according to experience, the model, and similar conditions;
scheduling, by the traffic scheduling method of any of the embodiments above, the traffic of each training task on each computing node of the cluster and controlling the traffic communication of each training task, so that the average completion time of the training tasks is minimized.
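A small worked example may help show why giving lower priority to tasks that send more data tends to reduce average completion time. It is a toy model, not the patent's method: it assumes a single shared link, transfers served one after another, and made-up transfer sizes; `average_completion_time` is a hypothetical helper.

```python
def average_completion_time(sizes, rate=1.0):
    """Average finish time when transfers are served back to back in order."""
    elapsed = 0.0
    total = 0.0
    for size in sizes:
        elapsed += size / rate  # this transfer finishes after all earlier ones
        total += elapsed
    return total / len(sizes)


sizes = [10, 2, 4]  # illustrative transfer sizes in arbitrary units
arrival_order = average_completion_time(sizes)
smallest_first = average_completion_time(sorted(sizes))
print(arrival_order)   # finish times 10, 12, 16
print(smallest_first)  # finish times 2, 6, 16
assert smallest_first < arrival_order
```

In arrival order the three transfers finish at 10, 12 and 16, an average of 38/3; served smallest first they finish at 2, 6 and 16, an average of 8. The last transfer finishes at the same time in both cases, but the small transfers no longer wait behind the large one.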
In conventional traffic priority control, the priority of a flow can easily be changed by dynamically modifying the DSCP (Differentiated Services Code Point) value of that flow. However, in some of the preferred embodiments above, a high-speed network (for example, an RDMA-based network) is used to transmit the training data in order to improve communication efficiency, and such a network usually bypasses the operating system kernel to reduce or avoid CPU usage and so achieve high-speed communication. For this reason the DSCP of a flow cannot be modified dynamically, and the conventional approach cannot be applied directly when a high-speed network is used. Therefore, in the multitask training cluster network optimization method based on traffic scheduling provided in these embodiments, a unique DSCP is allocated to each task, and traffic priority scheduling is implemented by periodically adjusting the DSCP-to-priority mapping (on the end hosts and the switches).
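The fixed-DSCP, adjustable-mapping idea can be sketched as follows. The patent does not describe how the mapping is stored or pushed to hosts and switches, so the dictionary layout and the name `build_dscp_priority_map` are assumptions; the only external fact relied on is that DSCP is a 6-bit field, so values range from 0 to 63.

```python
def build_dscp_priority_map(task_dscp, task_priority):
    """Build the DSCP -> queue-index table for the current priorities.

    task_dscp     -- task_id -> DSCP value, fixed for the task's lifetime
    task_priority -- task_id -> current queue index (0 is highest)
    """
    values = list(task_dscp.values())
    if len(set(values)) != len(values):
        raise ValueError("each task needs its own DSCP value")
    if any(not 0 <= v <= 63 for v in values):
        raise ValueError("DSCP is a 6-bit field (0..63)")
    return {task_dscp[t]: task_priority[t] for t in task_dscp}


dscp = {"task-a": 10, "task-b": 12}
# First period: task-a is still in its early iterations.
print(build_dscp_priority_map(dscp, {"task-a": 0, "task-b": 2}))
# A later period: task-a has been demoted, its DSCP value is unchanged.
print(build_dscp_priority_map(dscp, {"task-a": 1, "task-b": 2}))
```

The packets of a task keep the same DSCP marking throughout; only the table that end hosts and switches use to choose a queue is regenerated each period. Installing that table on real devices is outside the scope of this sketch.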
Still other embodiments of the present invention provide a traffic scheduling module based on task priority queues. The traffic scheduling module comprises:
a priority component for obtaining or receiving the communication priority of each task; specifically, when the priority component is called, it obtains from the communication management unit the traffic communication priorities of the cluster training tasks determined by that unit;
a communication queue component for constructing K ready queues (K is a positive integer not less than 2), wherein each queue corresponds to one priority, the first queue has the highest priority, and the priority decreases from one queue to the next;
the communication queue component maps the traffic of each task to the corresponding ready queue according to the obtained task communication priority and schedules the communication.
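A communication queue component of this kind might look like the sketch below. The text only says that traffic is scheduled according to priority; serving the queues in strict priority order is an assumption, and the class name `ReadyQueues` and its methods are hypothetical.

```python
from collections import deque


class ReadyQueues:
    """K FIFO ready queues; queue 0 has the highest priority."""

    def __init__(self, k):
        if k < 2:
            raise ValueError("K must be at least 2")
        self.queues = [deque() for _ in range(k)]

    def enqueue(self, priority, message):
        """Place a message in the queue matching its task's priority."""
        self.queues[priority].append(message)

    def dequeue(self):
        """Return the next message to send, or None if all queues are empty."""
        # Strict priority (assumed policy): always serve the first
        # non-empty queue, in FIFO order within that queue.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


rq = ReadyQueues(3)
rq.enqueue(2, "gradients of task-c")
rq.enqueue(0, "gradients of task-a")
rq.enqueue(1, "gradients of task-b")
print(rq.dequeue())  # task-a is sent first
print(rq.dequeue())  # then task-b
print(rq.dequeue())  # then task-c
```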
In the traffic scheduling module provided by some of the preferred embodiments above, the amount of traffic sent by each task changes dynamically during communication; in short, the data volume of any task's traffic may change before it is sent. To address this, in these embodiments the communication queue component dynamically changes the task priority according to the amount of data sent:
for any task, when the amount of data sent by the task (for example, the number of bytes) exceeds a preset threshold, the priority of the task is lowered.
The traffic scheduling module provided by some of the preferred embodiments above raises the priority of low-priority traffic that has been waiting for a long time.
Still other embodiments of the present invention provide a multitask training cluster network system based on traffic scheduling. The system comprises a communication management unit and a traffic scheduling unit, wherein:
the communication management unit runs on one node of the cluster and is used for determining the communication priority of each task; specifically, after obtaining the task characteristics, the communication management unit determines the traffic priority of each training task according to the number of training iterations and other information in the task characteristics, wherein:
a task that is within its first I iterations after training starts is given the highest priority, so that early feedback is obtained;
a task that has passed its I-th iteration after training starts is given one of the priorities other than the highest according to the total amount of data the task has sent in all previous iterations; the larger the total amount sent, the lower the priority of the task;
wherein I is a positive integer whose value is generally set according to experience, the model, and similar conditions;
the traffic scheduling unit runs on each computing node of the cluster and is used for traffic scheduling during the parameter exchange communication of each training task; specifically, the traffic scheduling unit includes the traffic scheduling module of the embodiments above, and is configured to obtain the task communication priority determined by the communication management unit and to schedule communication according to that priority.
In some of the preferred embodiments above, the multitask training cluster network system based on traffic scheduling uses an efficient high-speed network to improve communication efficiency; therefore, in the system provided by these embodiments, a unique DSCP is allocated to each task, and traffic priority scheduling is implemented by periodically adjusting the DSCP-to-priority mapping (on the end hosts and the switches).
The above description is only a specific embodiment of the present invention, but the scope of the present invention is not limited thereto.
Claims (10)
1. A traffic scheduling method based on task priority queues, characterized by comprising the following steps:
constructing K ready queues, wherein each queue corresponds to one priority, the first queue has the highest priority, and the priority decreases from one queue to the next;
placing the traffic of each training task in the queue corresponding to its priority and scheduling it by priority, wherein:
the traffic of a task that is within its first I iterations after training starts is given the highest priority;
the traffic of a task that has passed its I-th iteration after training starts is mapped to one of the priority queues other than the highest-priority queue according to the total amount of data the task has sent in all previous iterations;
wherein K is a positive integer not less than 2, and I is a positive integer.
2. The traffic scheduling method according to claim 1, wherein
the task priority is changed dynamically:
for any task, when the amount of data sent by the task exceeds a preset threshold, the priority of the task is lowered.
3. The traffic scheduling method according to claim 1, wherein
the priority of low-priority traffic that has been waiting for a long time is raised.
4. A method for optimizing a multitask training cluster network, characterized by comprising the following steps:
acquiring task characteristics of each training task, and determining the traffic priority according to the task characteristics, wherein:
a task that is within its first I iterations after training starts is given the highest priority;
a task that has passed its I-th iteration after training starts is given one of the priorities other than the highest according to the total amount of data the task has sent in all previous iterations; wherein I is a positive integer;
scheduling, by the traffic scheduling method according to any one of claims 1 to 3, the traffic of each training task on each computing node of the cluster, so as to control its traffic communication.
5. The method for optimizing a multitask training cluster network according to claim 4, wherein
parameter exchange is performed over a high-speed network;
and a unique DSCP is allocated to each task, and traffic priority scheduling is implemented by periodically adjusting the DSCP-to-priority mapping.
6. A traffic scheduling module based on task priority queues, comprising:
a priority component for obtaining or receiving the communication priority of each task;
and a communication queue component for constructing K ready queues, wherein each queue corresponds to one priority, the first queue has the highest priority, and the priority decreases from one queue to the next; wherein K is a positive integer not less than 2;
the communication queue component maps the traffic of each task to the corresponding ready queue according to the obtained task communication priority and schedules the communication.
7. The traffic scheduling module of claim 6, wherein
the task priority is changed dynamically:
for any task, when the amount of data sent by the task exceeds a preset threshold, the priority of the task is lowered.
8. The traffic scheduling module of claim 6, wherein
the priority of low-priority traffic that has been waiting for a long time is raised.
9. A multitask training cluster network system based on traffic scheduling, comprising:
a communication management unit and a traffic scheduling unit, wherein:
the communication management unit is used for determining the communication priority of each task; the communication management unit determines the traffic priority of each training task according to the task characteristics, wherein:
a task that is within its first I iterations after training starts is given the highest priority;
a task that has passed its I-th iteration after training starts is given one of the priorities other than the highest according to the total amount of data the task has sent in all previous iterations; wherein I is a positive integer;
the traffic scheduling unit comprises the traffic scheduling module of any of claims 6 to 8, and is configured to obtain the task communication priority determined by the communication management unit and to schedule communication according to that priority.
10. The multitask training cluster network system of claim 9, wherein
the system uses a high-speed network for parameter exchange;
and a unique DSCP is allocated to each task, and traffic priority scheduling is implemented by periodically adjusting the DSCP-to-priority mapping.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910819132.XA CN112448899A (en) | 2019-08-31 | 2019-08-31 | Flow scheduling-based multitask training cluster network optimization method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112448899A true CN112448899A (en) | 2021-03-05 |
Family
ID=74733938
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910819132.XA Pending CN112448899A (en) | 2019-08-31 | 2019-08-31 | Flow scheduling-based multitask training cluster network optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112448899A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101572718A (en) * | 2008-04-30 | 2009-11-04 | 张文 | IP QoS unified strategic system based on oriented application and method thereof |
CN103136056A (en) * | 2013-03-04 | 2013-06-05 | 浪潮电子信息产业股份有限公司 | Cloud computing platform scheduling method |
CN107025205A (en) * | 2016-01-30 | 2017-08-08 | 华为技术有限公司 | A kind of method and apparatus of training pattern in distributed system |
CN106533981A (en) * | 2016-12-19 | 2017-03-22 | 北京邮电大学 | Multi-attribute based big data flow scheduling method and device |
US20180183684A1 (en) * | 2016-12-28 | 2018-06-28 | Google Inc. | Auto-prioritization of device traffic across local network |
CN108762896A (en) * | 2018-03-26 | 2018-11-06 | 福建星瑞格软件有限公司 | One kind being based on Hadoop cluster tasks dispatching method and computer equipment |
CN108694090A (en) * | 2018-04-16 | 2018-10-23 | 江苏润和软件股份有限公司 | A kind of cloud computing resource scheduling method of Based on Distributed machine learning |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114900472A (en) * | 2022-07-12 | 2022-08-12 | 之江实验室 | Method and system for realizing cooperative flow scheduling by control surface facing to multiple tasks |
CN114900472B (en) * | 2022-07-12 | 2022-11-08 | 之江实验室 | Method and system for realizing cooperative flow scheduling by control surface facing to multiple tasks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||