CN108768876B - Traffic scheduling method facing machine learning framework - Google Patents


Info

Publication number
CN108768876B
Authority
CN
China
Prior art keywords
flow
group
machine learning
priority
flows
Prior art date
Legal status
Active
Application number
CN201810569876.6A
Other languages
Chinese (zh)
Other versions
CN108768876A (en)
Inventor
江勇
李清
杨光
Current Assignee
Shenzhen Graduate School Tsinghua University
Original Assignee
Shenzhen Graduate School Tsinghua University
Priority date: 2018-06-05
Filing date: 2018-06-05
Publication date: 2022-01-11
Application filed by Shenzhen Graduate School Tsinghua University filed Critical Shenzhen Graduate School Tsinghua University
Priority to CN201810569876.6A
Publication of CN108768876A
Application granted
Publication of CN108768876B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/14: Network analysis or design
    • H04L 41/145: Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L 47/00: Traffic control in data switching networks
    • H04L 47/10: Flow control; Congestion control
    • H04L 47/50: Queue scheduling
    • H04L 47/62: Queue scheduling characterised by scheduling criteria
    • H04L 47/625: Queue scheduling characterised by scheduling criteria for service slots or service orders
    • H04L 47/6275: Queue scheduling characterised by scheduling criteria for service slots or service orders based on priority

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a traffic scheduling method facing a machine learning framework: an efficient traffic scheduling mechanism for a distributed machine learning framework in a data center that exploits the self-similarity of machine learning traffic to realize an efficient scheduling policy at the group-flow level, even when application flow information cannot be obtained in advance. The mechanism organically combines per-flow rate control with traffic scheduling: timely rate control helps complete effective flow information inference during flow transmission, and the scheduling policy based on the inferred results in turn guides the rate control of flows under different network conditions.

Description

Traffic scheduling method facing machine learning framework
Technical Field
The invention relates to a scheme for improving the traffic scheduling performance of machine learning applications in a data center network, and belongs to the field of computer networks.
Background
Technological breakthroughs in the field of machine learning in recent years have led more and more large companies to increase investment in the research and development of artificial intelligence applications. To speed up development, these companies have launched different machine learning frameworks that exploit the computing resources of physical computer clusters. Scheduling cluster resources is essential for completing machine learning tasks efficiently, and the allocation of network resources is particularly critical. A machine learning task generates a large amount of traffic, and this traffic entering the cluster network (the data center network) easily causes congestion, which prolongs task completion time. There are two main reasons for network congestion: (1) lacking a mechanism that perceives application semantics, the traditional data center network cannot distinguish the differing requirements that different applications place on the network, and its fair-sharing service turns the network into a bottleneck for application performance; (2) traditional transport-layer rate control is ill-suited to data center networks, where the traffic convergence pattern easily causes packet loss and degrades transmission performance.
Before machine learning tasks came to occupy a large share of cluster workloads, group flows had already proven to be an effective model for improving the network performance of distributed computing frameworks in the data center. Group-flow-based scheduling outperforms traditional flow-based schemes because a group flow carries the real-time requirements that the distributed application places on the network. For example, multiple flows from the same distributed application arrive at the same receiver over different links, and the application can only enter its next computation stage after all of these flows have completed. When the link carrying one of the flows becomes congested, a flow-based scheme lets the congestion signal affect only that flow's own rate control, whereas a group-flow scheme, which logically treats the flows as one group flow, appropriately reduces the rates of the other flows to avoid unnecessary bandwidth occupation.
Therefore, implementing traffic scheduling at the group-flow level of the distributed machine learning framework can be expected to bring a large improvement in network performance.
Unfortunately, the benefit of a group-flow scheme is limited when flow information is unavailable. Once traffic scheduling at the group-flow level of the distributed machine learning framework is in place, the group-flow scheduling policy determines scheduling performance under congestion. Unlike flow-based scheduling, the definition of a group flow itself dictates its optimal scheduling policy: because the completion time of a group flow depends on its last finishing flow, the priority of a group flow is determined by its slowest flow. The prior art lacks an effective group-flow scheme when flow information is missing: it has to predict flow sizes from the number of packets a flow has already sent and place flows into different priority queues accordingly, so scheduling performance hinges on prediction accuracy, and the rate of each flow cannot be controlled explicitly.
Disclosure of Invention
The invention aims to solve the problem faced by group-flow schemes that lack flow information, and provides a traffic scheduling method facing a machine learning framework.
In the traffic scheduling method facing the machine learning framework, machine learning obtains different machine learning models over large-scale data sets through a data-parallel model; the large-scale data set is partitioned across a plurality of distributed nodes for storage; a working instance running on a distributed node trains on its local part of the data set to obtain gradient values of the model parameters and sends them to a hyper-parameter server to update the model; the hyper-parameter server aggregates multiple groups of gradient values to train the model and sends the updated model parameters back to the working instances. The method is characterized in that the flows sent by multiple working instances to the same hyper-parameter server are organized into one group flow, and the flows sent by the hyper-parameter server to multiple instances are organized into another group flow, thereby realizing traffic scheduling of the distributed machine learning framework at the group-flow level.
Further, the method also comprises a group-flow information inference mechanism for detecting the potential congestion capability of a group flow as quickly as possible, the mechanism comprising the following steps: S1, after the machine learning task starts, obtain the number n of flows in the group flow by counting the active flows under the group-flow scheduling framework; S2, randomly select one flow in the group flow as a probe flow and ensure that it completes the transmission of its data packets as soon as possible, thereby obtaining the flow size f; S3, in combination with the self-similarity of machine learning group flows, derive the size of the group flow as n × f; the size of the group flow is equated with its potential congestion capability at the shared forwarding node and used as the basis for determining the priority of the group flow; S4, update the priorities.
Further, group flows from a newly generated machine learning task must undergo information inference before being added to the active group-flow set; during information inference, one of the several group flows of the new task is randomly selected according to the edge switch to which its receiving-end hyper-parameter server is attached; within the selected group flow, one flow is randomly selected as the probe flow according to the physical host on which its sending-end working instance runs.
Further, packets from the probe flow are marked into the probe queue and enjoy the highest priority; the probe flow is assigned randomly within the randomly selected group flow, i.e., probe-flow selection adopts a double-random design.
Further, the highest priority of the probe flow is combined with the elastic rate control algorithm to ensure a rapid increase of the probe flow's sending rate.
Further, before the priority update starts, the non-probe flows of the new task enter the active queue, ensuring that they are not starved and can still send packets at the lowest sending rate allowed by the current network configuration.
Further, when the flow priority update timer times out, all non-probe flows, including those of unfinished group flows, enter the corresponding priority queues according to the probe result of the probe flow they belong to; the probe flow's size determines the priority of the non-probe flows through its effect on the inferred group-flow size, and the priority update enforces a shortest-job-first scheduling policy.
Further, the elastic rate control algorithm comprises: when available bandwidth is sufficient, the rate of the flow is increased multiplicatively to quickly occupy the available bandwidth and raise link utilization.
Further, the RTT of the links traversed by each flow in the group flow is collected at the end system to estimate the available bandwidth, and rate control is realized at the end system; the scheme stays compatible with the existing TCP/IP protocol by setting the send window size.
Further, for a flow f, the target rate, i.e. the number of data packets expected to be transmitted in the next period, is calculated by the following formula:
elastic_gain * f.max_rtt / f.min_rtt
The preset value elastic_gain determines the lowest sending rate of the flow; f.max_rtt represents the maximum queue length accepted in the link, whose size depends on the network configuration; f.min_rtt represents the queue length observed in the link during the last measurement period.
The invention also comprises a traffic scheduling system facing the machine learning framework, used for realizing the above method. The system comprises a controller that receives the latest flow information periodically reported by the sending ends and realizes the group-flow semantic analysis function; the controller comprises a group-flow information collection module, a communication pattern matching module and a group-flow size inference module. The group-flow information collection module organizes flows from the machine learning task into group flows according to the available flow information and identifies group flows from different training periods. The communication pattern matching module records completed group flows in order to match currently unfinished group flows; a successfully matched group flow does not need to enter the group-flow size inference module, and the decision derived from the matched result is issued directly to the receiving end for its flows. The group-flow size inference module implements the group-flow information inference algorithm, divides the flows of new group flows into probe flows and non-probe flows, and updates the priorities of the non-probe flows according to the results of the group-flow information collection module.
Further, the controller also periodically sends scheduling decisions to the receiving ends. Modules at the end hosts realize the received group-flow policy through elastic flow scheduling, and comprise a priority marking module, a measurement update module and a transport-layer rate control module. After receiving from the controller the priority update information for each flow of a group flow, the priority marking module marks the corresponding priority on each flow's packets and ensures that the packets enter the multi-level feedback priority queues of the operating system, realizing the shortest-job-first policy. The measurement update module collects the RTT of the end-to-end link and the numbers of packets sent and received in the previous period, providing the data basis for the transport-layer rate control module. The transport-layer rate control module calculates the number of packets to send in the next period and, with the help of a rate limiter, ensures that these packets leave the network card normally.
Compared with the prior art, the invention provides an efficient traffic scheduling mechanism for a distributed machine learning framework in a data center and realizes an efficient scheduling policy at the group-flow level even when application flow information cannot be acquired.
Furthermore, the mechanism organically combines per-flow rate control with traffic scheduling: timely rate control helps complete effective flow information inference during flow transmission, and the scheduling policy based on the inference results in turn guides the rate control of flows under different network environments.
Drawings
FIG. 1 is a schematic diagram of the deployment of a distributed machine learning framework in a data center and a typical topology of a data center network;
FIG. 2 is a schematic diagram of an example of a group flow in a distributed machine learning framework in one embodiment of the invention;
FIG. 3 is a flow chart of group-flow information inference according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a link queue based rate control strategy according to an embodiment of the present invention;
FIG. 5 is a flow chart illustrating elastic rate control according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a model framework according to an embodiment of the invention;
fig. 7a-7c are exemplary diagrams of a group flow scheduling algorithm according to an embodiment of the present invention.
Reference numerals: Tn: the edge switch numbered n; Hn-m: physical host number m attached to the nth edge switch; VM1: virtual host No. 1; D1: working instance No. 1 based on Hadoop; S1: working instance No. 1 based on Spark.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings. Wherein like reference numerals refer to like parts unless otherwise specified. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
The deployment of the distributed machine learning framework in a data center and the data center network topology are shown in FIG. 1: a physical host H typically runs multiple distributed working instances in the form of virtual machines VM (such as D from the computing framework Hadoop, or S from Spark); multiple working instances on the same physical host share physical resources such as CPU, memory, disk and network bandwidth; each physical host accesses the internal network through an edge switch T (the data center internal network refers to the network built from multiple high-performance switches and reached through the edge switches), and the edge switches distribute traffic arriving from the internal network to the physical hosts attached to them.
The internal network is generally considered free of congestion; congestion mainly occurs in the part of the data center network formed by the edge switches and the physical hosts, i.e., the external network. Common data center network congestion can be divided into sender congestion and edge-switch congestion. The former occurs when several working instances running on a single host contend for bandwidth so that packets cannot leave the host and enter the network; the latter occurs when traffic from the internal network exceeds the forwarding capability of an edge switch, so that packets are dropped at the last hop of the routing path.
In combination with the above example of group flows, it can be seen that if we can organize flows from multiple working instances of the same physical host into one group flow, or organize multiple flows arriving at the same edge switch into one group flow, the scheduling advantage of the group flow can be fully utilized.
Fortunately, the computational semantics of the distributed machine learning framework support the group-flow scheduling scheme. Most mainstream distributed machine learning frameworks are built on the hyper-parameter server model shown in FIG. 2. In these systems, the goal of the machine learning task is to obtain different machine learning models over large-scale data sets through a data-parallel model. The large-scale data set is partitioned across a number of distributed nodes for storage; a working instance running on a distributed node trains on its local part of the data set to obtain gradient values of the model parameters and sends them to a hyper-parameter server to update the model; the hyper-parameter server aggregates multiple groups of gradient values to train the model and sends the updated model parameters back to the working instances. Because the hyper-parameter server must wait for a certain number of gradient results before starting model training, and a working instance must wait for the hyper-parameter server's latest model parameters before starting a new round of gradient computation, the flows sent by multiple working instances to the same hyper-parameter server can be organized into one group flow (the group flow from task 1), and the flows sent by the hyper-parameter server to multiple instances can be organized into another group flow (the group flow from task 2). Comparing the congestion types in the data center network mentioned above, the former helps relieve edge-switch congestion on the uplink toward the hyper-parameter server, while the latter helps relieve sending-end congestion on the downlink from the hyper-parameter server.
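To make the grouping rule concrete, the following minimal sketch (Python) shows how the flows of one training period could be organized into the two kinds of group flows described above. The Flow fields, the "push"/"pull" labels and the group_flows helper are illustrative assumptions, not the patented implementation.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    task_id: str     # machine learning task the flow belongs to
    src: str         # sending host (working instance or hyper-parameter server)
    dst: str         # receiving host
    direction: str   # "push" (worker -> server) or "pull" (server -> worker)

def group_flows(flows):
    """Push flows sharing the same receiving hyper-parameter server form one
    group flow; pull flows sharing the same sending server form another."""
    groups = defaultdict(list)
    for f in flows:
        anchor = f.dst if f.direction == "push" else f.src
        groups[(f.task_id, f.direction, anchor)].append(f)
    return groups

flows = [
    Flow("task1", "worker1", "ps1", "push"),
    Flow("task1", "worker2", "ps1", "push"),
    Flow("task1", "ps1", "worker1", "pull"),
    Flow("task1", "ps1", "worker2", "pull"),
]
for key, members in group_flows(flows).items():
    print(key, len(members), "flows")
```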
The inventors find that, unlike the group flows of other distributed applications, machine learning group flows have the following characteristics of great potential value: 1. unlike other multi-stage application tasks, the machine learning task has only two stages, which means the endpoints of a group flow do not migrate between stages, ensuring a stable group-flow structure (the sending end and receiving end of each flow); 2. the application semantics of different training periods are the same and the data blocks into which one machine learning task is divided have a fixed size, so flow sizes are consistent across training periods; 3. in the hyper-parameter server model, the model parameters are divided equally among the hyper-parameter servers, so the same-type group flows corresponding to each hyper-parameter server are similar to one another. The above characteristics are collectively referred to as the self-similarity of machine learning group flows.
The inventors have discovered that the self-similarity of machine learning group flows helps simplify the design of group-flow policies. The embodiments of the invention use this finding to construct an efficient traffic scheduling mechanism for a distributed machine learning framework in a data center, described in detail as follows:
1. Group-flow information inference mechanism
The purpose of the group-flow information inference mechanism is to detect the potential congestion capability of a group flow as quickly as possible. Considering that shortest-job-first is the best currently known policy for shortening task completion time, we treat the largest group flow as the one most likely to cause congestion, under the premise that flow sending rates are stable.
How to determine the longest flow completion time within a group flow is the core problem in designing an efficient group-flow scheduling scheme. Conventional group-flow schemes require the size of every flow in the group flow to be known, compute a target from those sizes and the available bandwidth, and then update the rate of every flow accordingly. Such a scheme is hard to adapt to scenarios where flow information is unknown, depends on the accuracy of bandwidth detection, and incurs significant resource overhead when flow sending rates must be updated at large scale.
Because the group-flow structure in the machine learning framework is stable, the number n of flows in a group flow can be obtained by counting the active flows within the existing group-flow scheduling framework once the machine learning task starts. If we randomly choose one flow in the group flow as a probe flow and ensure that it completes transmitting its data packets as soon as possible, thereby obtaining its size f, then by the self-similarity of machine learning group flows the size of the group flow can be obtained as n × f. Since the size of a group flow is equated with its potential congestion capability at the shared forwarding node, the group-flow size is used as the primary basis for determining the priority of the group flow.
Information inference and priority updating are the two core parts of the group-flow information inference algorithm, as shown in FIG. 3. Group flows from a newly generated machine learning task must go through information inference before being added to the active group-flow set. During information inference, one of the several group flows of the new task is randomly selected according to the edge switch to which its receiving end (a hyper-parameter server) is attached. Within the selected group flow, one flow is randomly selected as the probe flow according to the physical host on which its sending end (a working instance) runs. Packets from the probe flow are marked into the probe queue with the highest priority. The probe flow is thus assigned randomly within a randomly selected group flow (see steps 2 and 3 in FIG. 3); this double-random design of probe-flow selection avoids, as far as possible, probe flows colliding and competing for bandwidth. The highest priority of the probe flow ensures that the link round-trip time (RTT) measured by its packets is the lowest, and the elastic rate control algorithm ensures that the probe flow's sending rate grows rapidly.
Before the priority update starts, the non-probe flows of the new task enter the active queue, which ensures that they are not starved and can still transmit packets at the lowest sending rate allowed by the current network configuration. When the flow priority update timer times out, all non-probe flows (including those of unfinished group flows) enter the corresponding priority queues according to the probe result of the probe flow they belong to. Since the probe flow's size determines the priority of the non-probe flows through the inferred group-flow size, non-probe flows whose probe flow is smaller receive higher priority, in accordance with the shortest-job-first scheduling principle. The priority update ensures that the shortest-job-first policy is enforced, because as the probe flow keeps growing, the priority of the non-probe flows decided by it is gradually adjusted downward (see step 9 in FIG. 3).
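The following sketch illustrates the two parts just described: double-random probe-flow selection and the timer-driven priority update that estimates the group-flow size as n × f and maps it to a priority queue. The queue identifiers, size thresholds and dictionary-based flow records are assumptions made for illustration only.

```python
import random

PROBE_QUEUE = 0            # highest priority, reserved for probe flows
ACTIVE_QUEUE = 1           # new non-probe flows wait here before the first update
# assumed size thresholds (bytes) separating the lower priority queues
PRIORITY_THRESHOLDS = [1 << 20, 10 << 20, 100 << 20]

def pick_probe_flow(group_flows):
    """Double random selection: first one group flow of the new task, then one
    of its member flows; the chosen flow is marked into the probe queue."""
    group = random.choice(group_flows)
    probe = random.choice(group["flows"])
    probe["queue"] = PROBE_QUEUE
    group["probe"] = probe
    return group

def update_priorities(group):
    """On priority-update timer expiry: estimate the group-flow size as n * f
    from the probe flow, then move every non-probe flow into the matching
    priority queue (smaller estimate -> higher priority, shortest job first)."""
    estimated_size = len(group["flows"]) * group["probe"]["bytes_sent"]
    queue = ACTIVE_QUEUE + 1
    for threshold in PRIORITY_THRESHOLDS:
        if estimated_size <= threshold:
            break
        queue += 1
    for flow in group["flows"]:
        if flow is not group["probe"]:
            flow["queue"] = queue
    return estimated_size

# Toy usage: one new group flow with three flows of unknown size.
flows = [{"bytes_sent": 0, "queue": ACTIVE_QUEUE} for _ in range(3)]
group = pick_probe_flow([{"flows": flows}])
group["probe"]["bytes_sent"] = 4 << 20        # probe observed to send 4 MB
print(update_priorities(group))               # estimated group-flow size: 12 MB
```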
The group-flow inference mechanism allows the group-flow scheduling scheme to implement an efficient scheduling policy. Compared with the conventional method (LAS) of determining group-flow priority by the amount of data already transmitted, the probe result provided by the probe flow can achieve a theoretically better shortest-job-first policy (SJF), as shown in FIG. 7a. In the example there are two simultaneous tasks J1 and J2, where J1 includes two group flows C1 and C2, and C1 consists of three flows of size 1 sharing one receiving end H1; the other group-flow structures are shown in FIG. 7a and are not repeated here. Assuming a link processes one unit of data per time unit, the proposed SJF scheduling scheme (FIG. 7c) outperforms the conventional LAS policy (FIG. 7b) in terms of group-flow completion time. This is because LAS can only begin to distinguish which task's group flows to serve at t = 3, whereas SJF, through the probe flows from J1 to H1 and from J2 to H3, already learns at t = 2 that the group flow of J2 is larger than that of J1.
2. Elastic rate control algorithm
Probe flows and non-probe flows place different demands on the rate control strategy; an ideal strategy is shown in FIG. 4. The probe flow carries the mission of group-flow size inference, so the traditional additive increase of rate control is clearly less effective for it than a multiplicative increase. Therefore, when there is sufficient available bandwidth, the probe flow's rate should be increased multiplicatively to occupy the available bandwidth quickly and raise link utilization. As shown in the bandwidth detection part of FIG. 4, when the m packets sent in the previous detection period suffer no loss, the number of packets sent in the next period can be increased to 2 × m. Meanwhile, this higher rate should not be slashed by random network fluctuations such as a small number of dropped packets, which would make the rate oscillate widely; a reasonable strategy is to slow down moderately once a small queue appears in the link, so as to preserve the current rate level. As shown in the rate protection part of FIG. 4, when a flow loses n packets in one period, the number of packets sent in the next period can be adjusted to m - n. In contrast, the goal of a non-probe flow is to match its rate as closely as possible to the available bandwidth in the link; a non-probe flow can accept large rate fluctuations in order to avoid congestion, especially when probe flows occupying large amounts of bandwidth are present on the link. As shown in the congestion avoidance part of FIG. 4, when each of the m packets sent by the flow in one period waits, on average, behind n packets, the number of packets sent in the next period can be adjusted to m / n.
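A minimal sketch of these three per-period adjustments follows. The function signature and the way loss and queuing are reported are assumptions; only the m to 2m, m - n and m / n rules come from the description above.

```python
def next_period_packets(is_probe_flow: bool, sent: int, lost: int,
                        avg_packets_queued_ahead: float) -> int:
    """Packets to send in the next period for a flow that sent `sent` packets
    and lost `lost` of them in the previous period."""
    if is_probe_flow:
        if lost == 0:
            # bandwidth detection: no loss, so double the rate (m -> 2m)
            return 2 * sent
        # rate protection: back off only by the number of lost packets (m -> m - n)
        return max(sent - lost, 1)
    # non-probe flow, congestion avoidance: if each packet waited behind n
    # packets on average, shrink the rate accordingly (m -> m / n)
    if avg_packets_queued_ahead > 1:
        return max(int(sent / avg_packets_queued_ahead), 1)
    return sent

print(next_period_packets(True, 100, 0, 0.0))    # 200: probe flow, no loss
print(next_period_packets(True, 100, 5, 0.0))    # 95: probe flow with 5 losses
print(next_period_packets(False, 100, 0, 4.0))   # 25: non-probe flow sharing a queue
```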
The elastic rate control algorithm proposed in this patent meets the above requirements as far as possible with low overhead, as shown in FIG. 5. Since machine learning group flows share the same receiving end (or sending end), the RTT of the links traversed by each flow in the group flow can be collected at the end system to estimate the available bandwidth. End-system-based rate control avoids the computation and communication overhead of centralized rate calculation, and it remains compatible with the existing TCP/IP protocol by setting the send window size, which reduces development cost. Note that the upper limit of a flow's sending rate is not enforced as a hard limit here.
The elastic rate control algorithm treats any complex path as a direct link with the same RTT and bottleneck bandwidth. For a flow f, f.max_rtt represents the maximum queue length acceptable in the link, whose size depends on the network configuration; f.min_rtt represents the queue length observed in the link during the last measurement period, since in modern high-speed data center networks the forwarding, processing and transmission delays of a packet are negligible compared with its queuing delay. The current sending rate is logically equivalent to the number of packets sent by the sender in the last period (the period length is fixed, so more packets sent means a higher rate); the current receiving rate is the number of packets received by the receiver in the last period and is used to realize the rate protection strategy; the target rate is the number of packets expected to be sent in the next period and is calculated by the formula in FIG. 5 (see step 1 in FIG. 5):
elastic_gain * f.max_rtt / f.min_rtt
The preset value elastic_gain determines the lowest sending rate of the flow, because when the queue in the link reaches its upper limit, f.max_rtt / f.min_rtt equals 1; the resulting sending rate is the number of packets actually to be sent in the next period.
Through the group-flow information inference mechanism, the method avoids the group-flow scheduling scheme's dependence on acquiring group-flow information in advance; by exploiting the self-similarity of machine learning group flows it reduces the design complexity of the scheduling scheme, simplifies the existing group-flow scheduling framework and eases deployment; and through elastic rate control it improves link utilization while optimizing the performance of the scheduling policy, shortening the average completion time of machine learning group flows.
In this scheme, the existing centralized group-flow scheduling framework is simplified and some of its modules are migrated to the receiving end, as shown in FIG. 6.
The algorithm shown in FIG. 3 resides in the group-flow semantic analysis module of FIG. 6, and the algorithm shown in FIG. 5 resides in the elastic flow scheduling module of FIG. 6. The controller mainly realizes the group-flow semantic analysis function and comprises a group-flow information collection module, a communication pattern matching module and a group-flow size inference module. The sending end periodically reports the latest flow information to the controller, and the group-flow information collection module organizes the flows of the machine learning task into group flows according to the available flow information and identifies group flows from different training periods. The communication pattern matching module records completed group flows in order to match the currently unfinished group flows; a successfully matched group flow does not need to enter the group-flow size inference module, and the decision derived from the matched result is issued directly to the receiving end for its flows. The group-flow size inference module mainly implements the group-flow information inference algorithm described above, divides the flows of new group flows into probe flows and non-probe flows, and updates the priorities of the non-probe flows according to the results of the group-flow information collection module.
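The communication pattern matching module can be pictured as a small cache keyed by the stable group-flow structure; the class name, key layout and dictionary records below are illustrative assumptions, not the patented data structures.

```python
class PatternMatcher:
    def __init__(self):
        self._completed = {}   # structure key -> measured group-flow size

    @staticmethod
    def key(group_flow):
        # Group-flow structure is stable across training periods: same task,
        # same direction, same (sender, receiver) endpoints for every flow.
        endpoints = tuple(sorted((f["src"], f["dst"]) for f in group_flow["flows"]))
        return (group_flow["task_id"], group_flow["direction"], endpoints)

    def record_completed(self, group_flow, measured_size):
        self._completed[self.key(group_flow)] = measured_size

    def match(self, group_flow):
        """Cached size for a structurally identical group flow, or None if it
        still has to go through the group-flow size inference module."""
        return self._completed.get(self.key(group_flow))

pm = PatternMatcher()
done = {"task_id": "task1", "direction": "push",
        "flows": [{"src": "worker1", "dst": "ps1"}, {"src": "worker2", "dst": "ps1"}]}
pm.record_completed(done, measured_size=32 << 20)
print(pm.match(done))   # a structurally identical later group flow reuses the result
```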
The controller periodically sends scheduling decisions to the receiving ends. Modules at the end hosts realize the received group-flow policy through elastic flow scheduling and comprise a priority marking module, a measurement update module and a transport-layer rate control module. After receiving from the controller the priority update information for each flow of a group flow, the priority marking module marks the corresponding priority on each flow's packets and ensures that the packets enter the multi-level feedback priority queues of the operating system, realizing the shortest-job-first policy. The measurement update module collects the RTT of the end-to-end link and the numbers of packets sent and received in the previous period, providing the data basis for the transport-layer rate control module. The transport-layer rate control module mainly implements the algorithm shown in FIG. 5: it calculates the number of packets to send in the next period and, with the help of a rate limiter, ensures that these packets leave the network card normally.
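The division of work among the three end-host modules could look roughly like the sketch below. The Linux-specific SO_PRIORITY socket option, the rate_limiter interface and the per-flow dictionaries are assumptions chosen for illustration; the patent only requires that packets carry the controller's priority and that the computed packet budget is enforced before the network card.

```python
import socket

def end_host_period(flows, decisions, rate_limiter):
    """One scheduling period on the end host (illustrative structure only)."""
    for flow in flows:
        # Priority marking: apply the controller's latest decision.  Using the
        # Linux SO_PRIORITY option to reach the kernel's priority queues is an
        # assumption; any multi-level feedback queue mechanism would do.
        prio = decisions.get(flow["id"], flow["priority"])
        flow["priority"] = prio
        flow["socket"].setsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY, prio)

        # Measurement update: RTT of the end-to-end link plus the packet
        # counters of the period that just ended.
        flow["min_rtt"] = min(flow["rtt_samples"], default=flow["max_rtt"])
        flow["rtt_samples"].clear()

        # Transport-layer rate control: packet budget for the next period,
        # handed to an (assumed) rate limiter in front of the network card.
        budget = max(int(flow["elastic_gain"] * flow["max_rtt"] / flow["min_rtt"]),
                     flow["elastic_gain"])
        rate_limiter.set_budget(flow["id"], budget)
```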
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments; the invention is not limited to these specific details. Those skilled in the art to which the invention pertains may make various equivalent substitutions or obvious modifications without departing from the spirit of the invention, and all such modifications shall be regarded as falling within the scope of protection of the invention.

Claims (4)

1. A traffic scheduling method facing a machine learning framework, wherein machine learning obtains different machine learning models over large-scale data sets through a data-parallel model; in the machine learning, a large-scale data set is partitioned across a plurality of distributed nodes for storage; a working instance running on a distributed node trains on its local part of the data set to obtain gradient values of the model parameters and sends them to a hyper-parameter server to update the model; the hyper-parameter server aggregates a plurality of groups of gradient values to train the model and sends the updated model parameters back to the working instances; the method is characterized in that flows sent by a plurality of working instances to the same hyper-parameter server are organized into one group flow, and flows sent by the hyper-parameter server to a plurality of instances are organized into another group flow, thereby realizing traffic scheduling of the distributed machine learning framework at the group-flow level;
the method comprises a group-flow information inference mechanism for detecting the potential congestion capability of a group flow as quickly as possible, the mechanism comprising the following steps:
S1, after the machine learning task starts, obtaining the number n of flows in the group flow by counting the active flows under the group-flow scheduling framework;
S2, randomly selecting one flow in the group flow as a probe flow and ensuring that the probe flow completes the transmission of its data packets as soon as possible, thereby obtaining the flow size f;
S3, in combination with the self-similarity of machine learning group flows, deriving the size of the group flow as n × f; the size of the group flow is equated with its potential congestion capability at the shared forwarding node and is used as the basis for determining the priority of the group flow;
S4, updating the priorities;
before the priority update starts, the non-probe flows of the new task enter an active queue, ensuring that they are not starved and can still transmit at the lowest sending rate allowed by the current network configuration;
when the flow priority update timer times out, all non-probe flows, including those of unfinished group flows, enter the corresponding priority queues according to the probe result of the probe flow they belong to; the probe flow's size determines the priority of the non-probe flows through the inferred group-flow size, and the priority update enforces a shortest-job-first scheduling policy;
group flows from a newly generated machine learning task must undergo information inference before being added to the active group-flow set; during information inference, one of the plurality of group flows of the new task is randomly selected according to the edge switch to which its receiving-end hyper-parameter server is attached; within the selected group flow, one flow is randomly selected as the probe flow according to the physical host on which its sending-end working instance runs;
data packets from the probe flow are marked into a probe queue and enjoy the highest priority; the probe flow is assigned randomly within the randomly selected group flow, i.e., the selection of the probe flow adopts a double-random design;
the highest priority of the probe flow is combined with an elastic rate control algorithm to ensure a rapid increase of the probe flow's sending rate;
the elastic rate control algorithm comprises: when available bandwidth is sufficient, the rate of the flow is increased multiplicatively to quickly occupy the available bandwidth and raise link utilization.
2. The traffic scheduling method facing the machine learning framework according to claim 1, wherein: the RTT of the links traversed by each flow in the group flow is collected at an end system to estimate the available bandwidth, and rate control is realized at the end system; compatibility with the existing TCP/IP protocol is achieved by setting the send window size.
3. The traffic scheduling method facing the machine learning framework according to claim 2, wherein for a flow f, the target rate, i.e. the number of data packets expected to be sent in the next period, is calculated by the following formula:
elastic_gain * f.max_rtt / f.min_rtt
wherein the preset value elastic_gain determines the lowest sending rate of the flow; f.max_rtt represents the maximum queue length accepted in the link, whose size depends on the network configuration; and f.min_rtt represents the queue length observed in the link during the last measurement period.
4. A traffic scheduling system facing a machine learning framework, used for implementing the method of any one of claims 1 to 3, characterized by comprising a controller which receives the latest flow information periodically reported by the sending ends and realizes a group-flow semantic analysis function; the controller comprises a group-flow information collection module, a communication pattern matching module and a group-flow size inference module;
the group-flow information collection module is used for organizing flows from the machine learning task into group flows according to the available flow information and for identifying group flows from different training periods;
the communication pattern matching module is used for recording completed group flows in order to match the currently unfinished group flows; a successfully matched group flow does not need to enter the group-flow size inference module, and the decision derived from the matched result is issued directly to the receiving end for its flows;
the group-flow size inference module is used for implementing the group-flow information inference algorithm, dividing the flows of new group flows into probe flows and non-probe flows, and updating the priorities of the non-probe flows according to the results of the group-flow information collection module;
the controller is also used for periodically sending scheduling decisions to the receiving ends; modules at the end hosts realize the received group-flow policy through elastic flow scheduling, the modules comprising a priority marking module, a measurement update module and a transport-layer rate control module;
after receiving from the controller the priority update information for each flow of a group flow, the priority marking module marks the corresponding priority on each flow's data packets and ensures that the packets enter the multi-level feedback priority queues of the operating system, realizing a shortest-job-first policy;
the measurement update module collects the RTT of the end-to-end link and the numbers of packets sent and received in the previous period, providing the data basis for the transport-layer rate control module;
the transport-layer rate control module is used for calculating the number of packets to send in the next period and for ensuring, with the help of a rate limiter, that these packets are sent out normally from the network card.
CN201810569876.6A 2018-06-05 2018-06-05 Traffic scheduling method facing machine learning framework Active CN108768876B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810569876.6A | 2018-06-05 | 2018-06-05 | Traffic scheduling method facing machine learning framework

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810569876.6A | 2018-06-05 | 2018-06-05 | Traffic scheduling method facing machine learning framework

Publications (2)

Publication Number | Publication Date
CN108768876A (en) | 2018-11-06
CN108768876B (en) | 2022-01-11

Family

ID=63999879

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201810569876.6A (Active, CN108768876B) | Traffic scheduling method facing machine learning framework | 2018-06-05 | 2018-06-05

Country Status (1)

Country Link
CN (1) CN108768876B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110958187B (en) * 2019-12-17 2021-05-18 电子科技大学 Distributed machine learning parameter-oriented synchronous differential data transmission method
CN111078659B (en) * 2019-12-20 2023-04-21 腾讯科技(深圳)有限公司 Model updating method, device, computer readable storage medium and computer equipment
CN111612155B (en) * 2020-05-15 2023-05-05 湖南大学 Distributed machine learning system and communication scheduling method suitable for same
CN111628940B (en) * 2020-05-15 2022-12-27 清华大学深圳国际研究生院 Flow scheduling method, device, system, switch and computer storage medium
CN113839884B (en) * 2020-06-24 2023-08-22 华为技术有限公司 Flow control method and device
CN113194086B (en) * 2021-04-27 2022-05-27 新华三信息安全技术有限公司 Anti-attack method and device
CN115499306B (en) * 2022-07-29 2024-03-12 天翼云科技有限公司 Method and device for constructing flow scheduling model, electronic equipment and storage medium
CN115062771B (en) * 2022-08-16 2022-11-25 之江实验室 Distributed machine learning gradient convergence method and device and model training method
CN115271102B (en) * 2022-09-26 2022-12-16 太极计算机股份有限公司 Task-oriented priority method and system for machine learning engine

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101692657A (en) * 2009-10-22 2010-04-07 北京交通大学 Differentiated service core router and data forwarding method thereof
CN104852887A (en) * 2014-02-17 2015-08-19 上海宽带技术及应用工程研究中心 Network flow tracing system and method based on OpenFlow technology
EP3200410A1 (en) * 2016-01-28 2017-08-02 Alcatel Lucent Method and system for queueing packets in communication networks

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3436871B2 (en) * 1997-10-23 2003-08-18 株式会社東芝 Communication resource management method and node device
RU2009144127A (en) * 2008-12-01 2011-06-10 РАЗУМ, Инк. (US) SERVICE QUALITY MANAGEMENT BASED ON MONITORING THE STATE OF THE STREAM STREAM WITHOUT USER SIGNALING

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101692657A (en) * 2009-10-22 2010-04-07 北京交通大学 Differentiated service core router and data forwarding method thereof
CN104852887A (en) * 2014-02-17 2015-08-19 上海宽带技术及应用工程研究中心 Network flow tracing system and method based on OpenFlow technology
EP3200410A1 (en) * 2016-01-28 2017-08-02 Alcatel Lucent Method and system for queueing packets in communication networks

Also Published As

Publication number Publication date
CN108768876A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN108768876B (en) Traffic scheduling method facing machine learning framework
CN109104373B (en) Method, device and system for processing network congestion
KR100748187B1 (en) Node availability prediction-based grid network congestion control device and method therefor
CN107579922B (en) Network load balancing device and method
CN110932989B (en) Elephant flow path monitoring and scheduling method based on SDN data center network
CN107454017B (en) Mixed data stream cooperative scheduling method in cloud data center network
Wang et al. Freeway: Adaptively isolating the elephant and mice flows on different transmission paths
CN105022717A (en) Network on chip resource arbitration method and arbitration unit of additional request number priority
CN102724103A (en) Proxy server, hierarchical network system and distributed workload management method
Zhang et al. Tuning the aggressive TCP behavior for highly concurrent HTTP connections in intra-datacenter
CN114268537B (en) Deterministic network-oriented network slice generation and dynamic configuration system and method
CN112804157A (en) Programmable congestion control
Li et al. OPTAS: Decentralized flow monitoring and scheduling for tiny tasks
De Pellegrini et al. Blind, adaptive and robust flow segmentation in datacenters
CN114500354A (en) Switch control method, device, control equipment and storage medium
CN116302578B (en) QoS (quality of service) constraint stream application delay ensuring method and system
CN112714081B (en) Data processing method and device
Balman et al. Dynamic adaptation of parallelism level in data transfer scheduling
CN109298932B (en) OpenFlow-based resource scheduling method, scheduler and system
US20200296044A1 (en) Data Scheduling Method and Tor Switch
Wei et al. Coflow scheduling with unknown prior information in data center networks
Wang et al. Efficient and fair: Information-agnostic online coflow scheduling by combining limited multiplexing with DRL
Yang et al. Cross-layer self-similar coflow scheduling for machine learning clusters
He et al. ShuttleBus: Dense packet assembling with QUIC stream multiplexing for massive IoT
Shreshta et al. INT Based Network-Aware Task Scheduling for Edge Computing

Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant