CN112448899A

CN112448899A - Flow scheduling-based multitask training cluster network optimization method

Info

Publication number: CN112448899A
Application number: CN201910819132.XA
Authority: CN
Inventors: 孙军欢; 胡水海
Original assignee: Shenzhen Zhixing Technology Co Ltd
Current assignee: Shenzhen Zhixing Technology Co Ltd
Priority date: 2019-08-31
Filing date: 2019-08-31
Publication date: 2021-03-05

Abstract

The invention provides a multi-task training cluster network optimization method based on traffic scheduling. The method determines the traffic priority of each training task according to its task characteristics: a task within its first I iterations after training starts is given the highest priority, and a task past its I-th iteration is assigned one of the other priorities according to the total amount of data it has sent in all previous iteration rounds. The method then constructs communication queues, maps the traffic of each task into the queues according to its priority, and performs communication based on the queues, thereby improving communication efficiency.

Description

Flow scheduling-based multitask training cluster network optimization method

Technical Field

The invention relates to the field of network communication for multi-task machine learning training clusters, and in particular to a multi-task training cluster network optimization method based on traffic scheduling.

Background

Deep Learning (DL) has achieved wide success in AI-driven services and is at the core of basic products in many related fields. Because training a Deep Neural Network (DNN) is computationally very expensive, timely training requires exploiting the parallelism of distributed systems. Industry-leading IT companies such as Microsoft, Facebook and Google have therefore begun running distributed Deep Learning Training (DLT) tasks on production clusters of hundreds or thousands of servers. DLT, as a compute-intensive task, calls for focused effort on efficient scheduling of cluster computing resources. Meanwhile, as GPUs become faster and models larger, the performance bottleneck of the cluster is shifting from computation to communication. However, network optimization for DLT in production environments is still at an early stage, and the existing parameter exchange mechanisms have significant shortcomings.

It should be noted in particular that deep learning training clusters (DL clusters) in production environments are full of uncertainty. When several, tens or even hundreds of training tasks run simultaneously on a larger cluster, they (especially different tasks scheduled onto the same compute node) have to share the cluster network. As a result, there is strong competition for network resources among the traffic of different training tasks, and between long-lived elephant flows and delay-sensitive mice flows within that traffic.

Disclosure of Invention

In view of this, the present invention provides a method for optimizing a multitask training cluster network based on traffic scheduling.

In one aspect, an embodiment of the present invention provides a traffic scheduling method based on a task priority queue.

The traffic scheduling method comprises the following steps:

constructing K ready queues (K is a positive integer not less than 2), wherein each queue corresponds to a priority; the first queue has the highest priority, and the priorities of the subsequent queues decrease in order;

making the traffic of each training task enter the corresponding queue according to its priority, and scheduling it according to priority:

determining the traffic of tasks within the first I iterations after training starts as the highest priority;

mapping the traffic of tasks past the I-th iteration after training starts to the priority queues other than the highest-priority queue, according to the total amount of data each task has sent in all previous iteration rounds; the larger a task's total sent amount, the lower its priority;

wherein I is a positive integer; the value of I is generally set according to conditions such as experience and the model.
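
By way of non-limiting illustration, the priority rule above can be sketched in Python as follows. The function name `assign_priority`, the parameter `warmup_iters` (standing in for I), and the byte `thresholds` that split the lower-priority queues are assumptions made for this sketch; the text specifies only that a larger total sent amount yields a lower priority, not how the cut-off points are chosen.

```python
from bisect import bisect_right
from typing import Sequence


def assign_priority(iteration: int, total_bytes_sent: int,
                    warmup_iters: int, thresholds: Sequence[int]) -> int:
    """Return a queue index, where 0 is the highest-priority queue.

    iteration        -- 1-based index of the task's current iteration
    total_bytes_sent -- bytes the task sent in all previous iterations
    warmup_iters     -- the value I (hypothetical parameter name)
    thresholds       -- ascending byte cut-offs (assumed); K - 2 cut-offs
                        split the K - 1 lower-priority queues
    """
    if iteration <= warmup_iters:
        # Tasks still within their first I iterations get the top queue.
        return 0
    # Later iterations: the more the task has sent, the lower its queue.
    return 1 + bisect_right(thresholds, total_bytes_sent)
```

With K = 4 and two cut-offs, a task in its third iteration (I = 5) lands in queue 0 regardless of volume, while a task past its fifth iteration lands in queue 1, 2 or 3 depending on how much it has sent so far.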

In another aspect, an embodiment of the present invention provides a method for optimizing a multi-task training cluster network.

With reference to the first aspect, and based on the traffic scheduling method of the first aspect, the method for optimizing a multi-task training cluster network includes:

acquiring task characteristics, and determining the traffic priority of each training task according to the task characteristics:

determining the tasks within the first I iterations after training starts as the highest priority;

determining the tasks past the I-th iteration after training starts as priorities other than the highest priority, according to the total amount of data each task has sent in all previous iteration rounds; the larger a task's total sent amount, the lower its priority;

wherein I is a positive integer; the value of I is generally set according to conditions such as experience and the model;

scheduling the traffic of each training task on each computing node of the cluster according to the traffic scheduling method of the first aspect, and controlling the traffic communication of each training task, so that the average completion time of the training tasks is minimized.

In another aspect, an embodiment of the present invention provides a traffic scheduling module based on a task priority queue.

With reference to the first aspect, correspondingly, the traffic scheduling module includes:

a priority component for obtaining/receiving task communication priority;

and a communication queue component for constructing K ready queues (K being a positive integer not less than 2): each queue corresponds to a priority; the first queue has the highest priority, and the priorities of the subsequent queues decrease in order;

and mapping the flow corresponding to each task to a corresponding ready queue according to the acquired task communication priority, and carrying out scheduling communication.

In another aspect, an embodiment of the present invention provides a flow scheduling-based multitask training cluster network system.

With reference to the second and third aspects, the above multi-task training cluster network system includes:

a communication management unit and a flow scheduling unit;

the communication management unit is used for determining the communication priority of each task; specifically, after obtaining the task characteristics, the communication management unit determines the traffic priority of each training task according to the number of training iteration rounds and other task characteristics:

determining the tasks within the first I iterations after training starts as the highest priority;

the traffic scheduling unit includes the traffic scheduling module of the third aspect, and is configured to obtain the task communication priority determined by the communication management unit and to schedule communication according to that priority.

The traffic scheduling method and module based on task priority queues, and the multi-task training cluster network optimization method and system built on them, improve communication efficiency by determining task communication priorities, constructing communication queues, mapping the traffic of each task into the queues according to priority, and communicating based on the queues.

The technical solution of the present invention is further described with reference to the accompanying drawings and specific embodiments.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in some of the embodiments or in the description of the prior art are briefly introduced below.

Fig. 1 is a flowchart illustrating a method for optimizing a multi-task training cluster network based on traffic scheduling according to some embodiments of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. It is to be understood that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can obtain from the embodiments of the present invention without any inventive step fall within the scope of the present invention.

The following are some preferred embodiments of the invention.

Some of these preferred embodiments provide a traffic scheduling method based on task priority queues. The traffic scheduling method comprises the following steps:

at a host serving as a computing node, constructing K ready queues (K is a positive integer not less than 2) by the host's system for traffic scheduling; wherein each queue corresponds to a priority; the first queue has the highest priority, and the priorities of the subsequent queues decrease in order;

the traffic of each training task running on the same host enters the corresponding queue according to the task's priority, and is scheduled according to priority:

determining the traffic of tasks within the first I iterations after training starts as the highest priority, so as to obtain early feedback that can be used to predict and guide subsequent training; mapping the traffic of tasks past the I-th iteration after training starts to the priority queues other than the highest-priority queue, according to the total amount of data each task has sent in all previous iteration rounds; the larger a task's total sent amount, the lower its priority; wherein I is a positive integer; the value of I is generally set according to conditions such as experience and the model.
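
As a minimal sketch of the K ready queues on one host, the following Python class serves the queues in strict priority order. Strict priority is an assumption of this sketch (the text says only that traffic is "scheduled according to priority"), and the class name `ReadyQueues` and its methods are hypothetical.

```python
from collections import deque


class ReadyQueues:
    """K FIFO queues; index 0 is the highest priority (illustrative only)."""

    def __init__(self, k: int):
        if k < 2:
            raise ValueError("K must be a positive integer not less than 2")
        self.queues = [deque() for _ in range(k)]

    def enqueue(self, priority: int, message) -> None:
        # Place a task's pending message into the queue for its priority.
        self.queues[priority].append(message)

    def dequeue(self):
        # Assumed policy: always serve the highest-priority non-empty queue.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None  # nothing is waiting in any queue
```

Under this policy a message in queue 0 is always sent before anything waiting in queues 1 to K - 1, which is what lets tasks in their early iterations obtain fast feedback.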

In some of the above preferred embodiments of the traffic scheduling method based on task priority queues, the traffic sent by each task also changes dynamically during communication; in short, the data belonging to any task's traffic may change before it is sent. To address this, these embodiments change task priority dynamically:

for any task, when the amount of data the task has sent (for example, the number of bytes) exceeds a preset threshold, the priority of the task is lowered and the task's traffic is moved to a lower-priority queue.
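
The demotion rule can be illustrated with the short Python sketch below. It assumes one byte budget per queue level (`thresholds[p]` for queue `p`), which is one possible reading of the "preset threshold"; the function name `demote` and the per-level budgets are hypothetical.

```python
from typing import Sequence


def demote(priority: int, bytes_sent: int, thresholds: Sequence[int]) -> int:
    """Return the queue index a task should occupy after sending data.

    priority   -- current queue index (0 is the highest priority)
    bytes_sent -- bytes the task has sent so far
    thresholds -- assumed byte budget for each queue except the lowest,
                  so len(thresholds) == K - 1
    """
    lowest = len(thresholds)  # index of the lowest-priority queue
    # Move down one queue for every budget the task has exceeded.
    while priority < lowest and bytes_sent > thresholds[priority]:
        priority += 1
    return priority
```

The function never raises a task's priority; it only moves the task downward, and it stops at the lowest queue.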

In some of the traffic scheduling methods based on task priority queues provided in the above preferred embodiments, a task may be placed in the lowest-priority queue; if traffic from other tasks keeps arriving afterwards and forms a steady stream of communication, the traffic of that task is kept waiting. Therefore, the priority of low-priority traffic that has been waiting for a long time is raised.
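
A minimal sketch of this anti-starvation step, assuming a fixed waiting limit and promotion by one queue level at a time, might look as follows in Python. The names `WaitingFlow`, `promote_starved` and `max_wait`, as well as the one-level promotion, are assumptions; the text states only that the priority of long-waiting low-priority traffic is increased.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WaitingFlow:
    task_id: str
    priority: int         # 0 is the highest priority
    waiting_since: float  # time at which the flow started waiting


def promote_starved(flows: List[WaitingFlow], now: float,
                    max_wait: float) -> List[str]:
    """Raise by one level every flow that has waited longer than max_wait."""
    promoted = []
    for flow in flows:
        if flow.priority > 0 and now - flow.waiting_since > max_wait:
            flow.priority -= 1        # a smaller index is a higher priority
            flow.waiting_since = now  # restart the waiting clock
            promoted.append(flow.task_id)
    return promoted
```

Calling `promote_starved` periodically bounds how long a flow can remain stuck behind a steady stream of higher-priority traffic.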

Other embodiments of the present invention provide a method for optimizing a multi-task training cluster network based on traffic scheduling. As shown in Fig. 1, the method includes:

acquiring task characteristics, analyzing them, and determining the traffic priority of each training task according to the number of training iteration rounds, the amount of data previously sent, and the like;

scheduling the traffic of each training task on each computing node of the cluster according to the traffic scheduling method of any of the above embodiments, and controlling the traffic communication of each training task, so that the average completion time of the training tasks is minimized.
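
The text does not derive why lowering the priority of heavy senders helps the average completion time. A toy calculation, under the simplifying assumption that tasks share a single link and their transfers are served one after another, illustrates the intuition; the function name `average_completion_time` and the numbers are purely illustrative.

```python
from itertools import accumulate
from typing import Sequence


def average_completion_time(transfer_times: Sequence[float]) -> float:
    """Average finish time when transfers run back to back in the given order."""
    finish_times = list(accumulate(transfer_times))  # running total per task
    return sum(finish_times) / len(finish_times)


# Two tasks need 1 and 4 time units of link time respectively.
small_first = average_completion_time([1, 4])  # finish times 1 and 5
large_first = average_completion_time([4, 1])  # finish times 4 and 5
```

Serving the small transfer first gives an average of 3.0 time units, against 4.5 when the large transfer goes first, even though the last task finishes at the same time in both orders.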

In conventional traffic priority control, the priority of a flow can easily be changed by dynamically modifying the flow's DSCP (Differentiated Services Code Point). However, in some of the above preferred embodiments, a high-performance network (for example, an RDMA-based network) is used as the transport network for training data in order to improve communication efficiency. Such high-speed networks usually bypass the operating system kernel to reduce or avoid CPU usage and thereby achieve high-speed communication. For this reason the DSCP of a flow cannot be modified dynamically, so the conventional approach cannot be used directly to change priority when a high-speed network is used. Therefore, in the traffic-scheduling-based multi-task training cluster network optimization method provided in these embodiments, a unique DSCP is allocated to each task, and traffic priority scheduling is implemented by periodically adjusting the DSCP-to-priority mapping (on the end hosts and the switches).
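
The idea of fixing one DSCP per task and changing only the DSCP-to-priority table can be sketched in Python as below. The helper names `allocate_dscp` and `build_dscp_priority_map` are hypothetical, and the sketch only computes the table; how the table is pushed to end hosts and switches is not described in the text and is not modelled here. The limit of 64 values follows from the DSCP field being 6 bits wide.

```python
from typing import Dict, Sequence


def allocate_dscp(task_ids: Sequence[str]) -> Dict[str, int]:
    """Give every task its own DSCP value, fixed for the task's lifetime."""
    if len(task_ids) > 64:
        raise ValueError("the 6-bit DSCP field offers only 64 distinct values")
    return {task_id: dscp for dscp, task_id in enumerate(task_ids)}


def build_dscp_priority_map(task_dscp: Dict[str, int],
                            task_priority: Dict[str, int]) -> Dict[int, int]:
    """Recompute the DSCP -> priority queue table from current priorities.

    Intended to be called once per adjustment period; packets keep their
    DSCP marking, only the table interpreting it changes.
    """
    return {task_dscp[task]: task_priority[task] for task in task_dscp}
```

When a task's priority changes, for example because it has exceeded a sending threshold, only the entry for its DSCP in the table changes at the next adjustment; nothing about the flow's packets needs to be rewritten.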

Still other embodiments of the present invention provide a task priority queue based traffic scheduling module. The traffic scheduling module comprises:

a priority component for obtaining/receiving task communication priorities; specifically, when called, the priority component obtains or receives from the communication management unit the communication priorities of the cluster's training task traffic;

a communication queue component for constructing K ready queues (K is a positive integer not less than 2): each queue corresponds to a priority; the first queue has the highest priority, and the priorities of the subsequent queues decrease in order;

In the traffic scheduling module provided by some of the above preferred embodiments, the traffic sent by each task also changes dynamically during communication; in short, the data belonging to any task's traffic may change before it is sent. To address this, in these embodiments the communication queue component changes task priority dynamically according to the amount of data sent:

for any task, when the amount of data the task has sent (for example, the number of bytes) exceeds a preset threshold, the priority of the task is lowered.

Some of the traffic scheduling modules provided in the above preferred embodiments raise the priority of low-priority traffic that has been waiting for a long time.

Still other embodiments of the present invention provide a multi-task training cluster network system based on flow scheduling. The system comprises a communication management unit and a traffic scheduling unit, wherein:

the communication management unit runs on one node of the cluster and is used for determining the communication priority of each task; specifically, after acquiring the task characteristics, the communication management unit determines the traffic priority of each training task according to the number of training iteration rounds and other task characteristics:

determining tasks within the first I iterations after training starts as the highest priority, so as to obtain early feedback;

the traffic scheduling unit runs on each computing node of the cluster and is used for traffic scheduling during the parameter exchange communication of each training task; specifically, the traffic scheduling unit includes the traffic scheduling module of the above embodiments, and is configured to obtain the task communication priority determined by the communication management unit and to schedule communication according to that priority.

In some of the above preferred embodiments, a high-performance network is used in the flow-scheduling-based multi-task training cluster network system in order to improve communication efficiency; therefore, in the system provided by these embodiments, a unique DSCP is allocated to each task, and traffic priority scheduling is implemented by periodically adjusting the DSCP-to-priority mapping (on the end hosts and the switches).

The above description is only a specific embodiment of the present invention, but the scope of the present invention is not limited thereto.

Claims

1. A flow scheduling method based on task priority queues is characterized by comprising the following steps:

constructing K ready queues, wherein each queue corresponds to a priority; the first queue has the highest priority, and the priorities of the subsequent queues decrease in order;

respectively mapping the flow of the tasks after the I-th iteration after the training is started to other priority queues except the highest priority queue according to the total sending quantity of the tasks in all previous iteration rounds;

wherein K is a positive integer not less than 2, and I is a positive integer.

2. The traffic scheduling method according to claim 1,

dynamically changing task priority:

for any task, when the sending data volume of the task exceeds a preset threshold value, the priority of the task is reduced.

3. The traffic scheduling method according to claim 1,

for low-priority traffic that has been waiting for a long time, the priority is raised.

4. A method for optimizing a multitask training cluster network is characterized by comprising the following steps:

acquiring task characteristics of each training task, and determining the flow priority according to the task characteristics:

determining the tasks after the I-th iteration after the training is started as other priorities except the highest priority according to the total sending quantity of the tasks in all previous iteration rounds; wherein I is a positive integer;

scheduling the traffic of each training task on each computing node of the cluster according to the traffic scheduling method of any one of claims 1 to 3, so as to control its traffic communication.

5. The method for multitask training cluster network optimization according to claim 4,

performing parameter exchange by using a high-speed network;

and allocating a unique DSCP for each task, and realizing the flow priority scheduling by regularly adjusting the DSCP-priority mapping relation.

6. A task priority queue based traffic scheduling module, comprising:

a priority component for obtaining/receiving task communication priority;

and a communication queue component for constructing K ready queues: each queue corresponds to a priority; the first queue has the highest priority, and the priorities of the subsequent queues decrease in order; wherein K is a positive integer not less than 2;

7. The traffic scheduling module of claim 6,

dynamically changing task priority:

8. The traffic scheduling module of claim 6,

9. A flow scheduling based multitask training cluster network system, comprising:

a communication management unit and a traffic scheduling unit; wherein:

the communication management unit is used for determining the communication priority of the task; the communication management unit determines the flow priority of each training task according to the task characteristics:

a traffic scheduling unit comprising the traffic scheduling module of any of claims 6 to 8, configured to obtain the task communication priority determined by the communication management unit, and schedule communication according to the task communication priority.

10. The multitasking training cluster network system of claim 9,

the system uses a high-speed network for parameter exchange;