CN105553879A - Server-based flow scheduling method - Google Patents

Server-based flow scheduling method

Info

Publication number
CN105553879A
Authority
CN
China
Prior art keywords
stream
recipient
priority
packet
window
Prior art date
Legal status
Pending
Application number
CN201510957601.6A
Other languages
Chinese (zh)
Inventor
张大方
张洁
黄昆
Current Assignee
Hunan University
Original Assignee
Hunan University
Priority date: 2015-12-18
Filing date: 2015-12-18
Publication date: 2016-05-04
Application filed by Hunan University
Priority to CN201510957601.6A
Publication of CN105553879A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/50 Queue scheduling
    • H04L 47/62 Queue scheduling characterised by scheduling criteria
    • H04L 47/625 Queue scheduling characterised by scheduling criteria for service slots or service orders
    • H04L 47/6275 Queue scheduling characterised by scheduling criteria for service slots or service orders based on priority
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/10 Flow control; Congestion control
    • H04L 47/12 Avoiding congestion; Recovering from congestion
    • H04L 47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering


Abstract

The invention discloses a server-based flow scheduling method. Flow control and flow scheduling are combined so that the output-port queues of the switches in the network stay short, which allows flow scheduling to be realized entirely on the servers. With the method, each server in the data center keeps its own highest-priority flow active and suspends its other, lower-priority flows, so all flows complete in priority order and the flow completion time is minimized. The method comprises two novel techniques: a bidirectional flow scheduling technique lets each server send or receive only its currently highest-priority flow, and a hottest-flow coordination technique handles the problem of the sender and the receiver disagreeing about a flow's priority. Experimental results show that the method accelerates flow transmission in the data center network; compared with DCTCP (Data Center TCP), a scheme that is likewise server-based, small flows complete up to four times faster.

Description

A server-based flow scheduling method
Technical field
The present invention relates to TCP flow scheduling and congestion control techniques in data center networks.
Background art
Minimizing flow completion times (FCT) is a very important problem in data center networks (DCN). Cloud services often generate a large number of TCP flows in the DCN, and how fast these flows complete largely determines the completion time of the overall task. Any flow that fails to complete in time can affect the final result of the task, degrading user experience or causing direct economic loss. In current DCNs, however, flows often need a long time to complete, sometimes more than ten times their theoretical completion time. The main reason is that packets form very long queues at the output ports of congested switches in the network and therefore experience excessive queuing delay during transmission.
Researchers have proposed many schemes for minimizing FCT, which fall into two classes: rate control and flow scheduling. Rate control schemes (such as DCTCP, D2TCP and HULL) sense the congestion state of the network at the sender and continuously adjust the sending rate to keep the switch output-port queues as short as possible, thereby reducing the queuing delay packets experience in transit. Such work reduces queuing delay to some extent and is easy to deploy, because it is entirely server-based; however, it lets multiple flows transmit simultaneously and share the bandwidth, so it cannot minimize FCT. Recent research shows that minimizing FCT requires flow scheduling (as in PDQ, pFabric, PASE and PIAS), i.e., letting flows complete one by one in decreasing priority order (usually with a smallest-flow-first rule). pFabric, a switch-based flow scheduling scheme, performs best among the flow scheduling schemes: it releases packets from the switch output-port queue according to their priority, so that almost all small flows, including the slowest one, complete within their theoretical transmission time. However, because pFabric essentially changes the first-in-first-out packet release order of the switch output ports, implementing it requires hardware changes to the switches; a DCN contains thousands of switches, so deploying pFabric may incur a large cost.
Summary of the invention
The technical problem to be solved by the invention is to provide, in view of the deficiencies of the prior art, a server-based flow scheduling method (SFS) that is both easy to deploy and achieves an effect on minimizing FCT similar to that of pFabric.
To solve the above technical problem, the technical solution adopted by the invention is a server-based flow scheduling method comprising the following steps:
1) At the sender, a TCP flow obtains its priority from the application layer and embeds the priority into the SYN packet and data packets it sends; at the receiver, the priority is copied into the corresponding ACK packets. The TCP congestion window is fixed at the BDP (bandwidth-delay product), the timeout value is fixed at 500 µs, and TCP congestion avoidance and fast retransmission are disabled. The network is a FatTree network with a 1:1 oversubscription ratio, and packet spraying is used as the routing scheme to balance the load over the multiple equal-cost paths.
2) A reverse scheduler is added between the TCP/IP protocol stack and the network interface card (NIC) of the receiver. By releasing or holding ACK packets, the reverse scheduler activates or suspends the flows the receiver is receiving to realize reverse flow scheduling, and at the same time controls the total amount of in-flight TCP data packets in the network through the ACK packets it releases. The reverse scheduler comprises a flow table and a flow window: ACK packets arriving from the TCP/IP stack are pushed into the flow table, the flows in the flow table are ordered by priority, the flow window limits the number of active flows, and the NIC extracts ACK packets only from the flow-table entries inside the flow window and sends them into the network.
3) A forward scheduler is added between the TCP/IP protocol stack and the NIC of the sender. By releasing or holding SYN packets and data packets, the forward scheduler activates or suspends the flows the sender is sending to realize forward flow scheduling. The forward scheduler comprises a flow table: SYN packets and data packets arriving from the TCP/IP stack are pushed into the flow table, the flows in the flow table are ordered by priority, and the NIC always extracts data packets from the highest-priority entry that currently has data packets and sends them into the network.
4) A coordination module is added to the reverse scheduler and the forward scheduler. When the sender and the receiver disagree about the priority of a flow, the coordination module coordinates the sender and the receiver so as to fill the link bandwidth.
The coordination module adopts a hottest-flow coordination method comprising the following three steps. First, when the flow the receiver is currently receiving is interrupted by its sender, the sender notifies the receiver that the current transmission has been interrupted. Then, the receiver activates the hottest flow to fill the link. Finally, the receiver gradually turns activating the hottest flow back into activating the highest-priority flow.
The receiver identifies the hottest flow by the temperature of each flow. The temperature of a flow measures how long the receiver has waited since it last received a data packet of that flow; the hottest flow is the flow from which a packet was received most recently.
The sender of a flow suspended by the receiver can refresh the temperature of that suspended flow at its receiver by sending heartbeat packets. A heartbeat packet is the retransmission packet produced by a TCP timeout event. If the suspended flow has sufficient priority at the sender, the heartbeat packet of the suspended flow is released and refreshes the temperature of the flow at the receiver; otherwise the temperature of the suspended flow at the receiver drops to zero over time.
If, when the receiver gets a transmission-interrupted notification from the sender, the receiver learns that a flow inside its flow window has been suspended by its sender, the receiver additionally opens a candidate window to activate the hottest flow.
The receiver moves the candidate window to transform activating the hottest flow into activating the highest-priority flow, because activating the hottest flow effectively fills the network link but cannot minimize flow completion time. The moving process is: at any time, if the receiver finds that a flow in an entry to the left of the candidate window has just had its temperature refreshed to the maximum value, the receiver moves the candidate window to that flow, suspending the original flow and activating the flow with the same temperature but higher priority.
When the flow inside the flow window whose sender had interrupted it starts transmitting again, the receiver closes the candidate window.
Compared with the prior art, the beneficial effects of the invention are: flow scheduling can be realized in software; the invention effectively accelerates flow transmission in data center networks, and compared with DCTCP, a scheme that is likewise server-based, small flows complete up to four times faster.
Brief description of the drawings
Fig. 1 is the basic framework of SFS;
Fig. 2 is an example of receiver-side flow scheduling;
Fig. 3 is a schematic diagram of the receiver's window-based flow scheduling module;
Fig. 4 is an example of sender-side flow scheduling;
Fig. 5 is a schematic diagram of the sender's priority-based flow scheduling module;
Fig. 6(a) is the initial state before a priority inconsistency; Fig. 6(b) is a schematic diagram of a priority inconsistency event occurring;
Fig. 7(a) is the initial state of the hottest-flow coordination technique; Fig. 7(b) shows the flow temperatures decreasing; Fig. 7(c) shows the candidate window opening when a priority inconsistency occurs; Fig. 7(d) shows the activated flow receiving a response from its sender; Fig. 7(e) shows the candidate window opening again when another priority inconsistency event occurs; Fig. 7(f) shows the newly activated flow receiving a response from its sender; Fig. 7(g) shows the candidate window moving; Fig. 7(h) shows the candidate window closing;
Fig. 8(a) is the average FCT of small flows for SFS, DCTCP and pFabric under the web search workload in the FatTree; Fig. 8(b) is the FCT of the slowest small flow under the same workload; Fig. 8(c) is the FCT of large flows under the same workload;
Fig. 9(a) is the average FCT of small flows for SFS, DCTCP and pFabric under the data mining workload in the FatTree; Fig. 9(b) is the FCT of the slowest small flow under the same workload; Fig. 9(c) is the FCT of large flows under the same workload;
Figure 10(a) is the maximum queue length of the downlinks in the network when SFS is used under heavy load; Figure 10(b) is the average queue length of the downlinks in the network when SFS is used under heavy load.
Detailed description of the embodiments
1) The server and the network are configured, covering the following four aspects:
A) Priority policy: at the sender, a flow obtains its current priority from the application layer, and this priority is attached to the SYN packet and the data packets so that the sender can schedule the flow. At the receiver, the priority is copied from the corresponding SYN packet or data packet into the ACK packet, so the receiver also knows the priority of the flow and can schedule it. We adopt a priority setting similar to pFabric: the priority of every packet of a flow is the number of packets of that flow that have not yet been transmitted, and the scheduling policy is likewise smallest-remaining-flow first. A large flow has many untransmitted packets, so its priority is very low; a small flow has few untransmitted packets, so its priority is very high. As transmission proceeds, a large flow gradually becomes a small flow and its priority rises accordingly (a sketch of this rule follows after item D below).
B) Topology: we adopt a FatTree with a 1:1 oversubscription ratio, because it is currently the most widely used topology.
C) Load balancing: we use packet spraying to spread the packets of each flow evenly over the multiple equal-cost paths.
D) TCP: we use one-way TCP with its congestion window fixed at the BDP. To eliminate the impact of packet reordering, we disable congestion avoidance and fast retransmission. The TCP timeout value is fixed at 500 µs.
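The following minimal Python sketch illustrates the priority policy of item A): a flow's priority value is the number of its packets not yet transmitted, stamped into outgoing SYN and data packets at the sender and copied into the corresponding ACK at the receiver. The class and field names are illustrative assumptions, not code from the patent.

# Minimal sketch of the remaining-packets priority policy described in item A);
# packets are modelled as plain dicts, which is an assumption for illustration.

BDP_PACKETS = 8     # congestion window fixed at the bandwidth-delay product
TIMEOUT_US = 500    # fixed TCP retransmission timeout

class Flow:
    def __init__(self, flow_id, total_packets):
        self.flow_id = flow_id
        self.total_packets = total_packets
        self.sent_packets = 0

    def priority(self):
        """Priority value = packets not yet transmitted (smaller means more urgent)."""
        return self.total_packets - self.sent_packets

def stamp_priority(packet, flow):
    """Sender side: attach the flow's current priority to a SYN or data packet."""
    packet["priority"] = flow.priority()
    return packet

def copy_priority_to_ack(data_packet, ack_packet):
    """Receiver side: copy the priority of the arriving packet into its ACK."""
    ack_packet["priority"] = data_packet["priority"]
    return ack_packet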
2) A scheduler is added between the TCP/IP protocol stack and the NIC of the receiver. The receiver uses a window-based flow scheduling module to reverse-schedule the flows it receives, so that flows complete in priority order. The module activates the highest-priority flows and suspends the other, lower-priority flows. Here we exploit TCP's three-way handshake and self-clocking: in the three-way handshake the sender must receive the first ACK packet before the TCP connection can be established, and during transmission the sender must keep receiving the ACK packets fed back by the receiver before it can keep sending data packets. The module uses these two mechanisms, releasing or holding ACK packets to activate or suspend flows. Fig. 2 is an example of receiver-side reverse scheduling: although server A and server D both send data packets to server B, the window-based flow scheduling module in server B suspends the low-priority flow (shown dashed), so the medium-priority flow can monopolize the bandwidth. The module comprises a flow table and a flow window. As shown in Fig. 3, ACK packets from the TCP/IP stack are pushed into the flow table. The flows in the flow table are ordered from high to low priority. The receiver uses a flow window to control the number of active flows. The NIC can only extract ACK packets from the flow-table entries inside the flow window; flows outside the flow window are suspended because no ACKs are fed back to their senders.
A) Flow table: the receiver extracts the priority carried in ACK packets to build the flow table. Flows in the table are ordered from high to low priority. Each entry contains the flow ID, the flow's temperature, its priority and a pointer to an ACK packet. When a new flow arrives, the receiver registers an entry for it and inserts it into the flow table. When a flow finishes transmitting, the receiver deletes its entry. Note that the number of flow-table entries can be fixed (e.g. at 100): as long as TCP flows are given a suitable timeout, a new flow that cannot register because the flow table is full will retransmit its SYN packet after a timeout and register then.
B) Flow window: inspired by the TCP sliding window, we design a flow window to limit the number of active flows at the receiver. Flows inside the flow window are activated; flows outside it are suspended. Just as TCP slides its window to keep transmitting continuously, our flow window also slides so that the flows in the flow table complete one after another in priority order. The size of the flow window is not fixed: if the flows in the flow table are all very small, the flow window must be large enough to activate enough flows to fill the link together; conversely, if the flows are all very large, the flow window must be 1 so that the highest-priority flow can monopolize the bandwidth. We dynamically control the size of the flow window by estimating the number of in-flight packets in the network. As mentioned above, we use the number of remaining untransmitted packets of a flow as its current priority; this priority is carried by data packets and copied into ACK packets, and the congestion window of each TCP flow is fixed at the BDP, so we know that an activated flow i currently contributes Min(Priority_i, BDP) in-flight packets. Likewise, we can compute the total number of in-flight data packets contributed by all active flows in the flow window. We carefully keep the amount of in-flight data caused by each receiver within [0.5 BDP, 1.5 BDP], so the flow window always stays at a suitable size: it can fill the link while still letting high-priority flows monopolize the bandwidth (a sketch of this sizing rule follows below).
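As a small illustration of the sizing rule in item B), the sketch below keeps the receiver's estimated in-flight data within [0.5 BDP, 1.5 BDP], assuming (as stated above) that the flow table is sorted with the highest priority first, that a flow's priority value equals its remaining packets, and that the congestion window is fixed at the BDP. Function and field names are assumptions for illustration.

BDP = 8  # packets; bandwidth-delay product used as the fixed congestion window

def inflight_estimate(flow_table, window_size):
    """Each active flow i contributes min(priority_i, BDP) in-flight packets."""
    active = flow_table[:window_size]        # table entries sorted, highest priority first
    return sum(min(entry["priority"], BDP) for entry in active)

def adjust_flow_window(flow_table, window_size):
    """Grow or shrink the flow window so in-flight data stays in [0.5*BDP, 1.5*BDP]."""
    inflight = inflight_estimate(flow_table, window_size)
    if inflight < 0.5 * BDP and window_size < len(flow_table):
        window_size += 1                     # too little in flight: activate one more flow
    elif inflight > 1.5 * BDP and window_size > 1:
        window_size -= 1                     # too much in flight: suspend the last active flow
    return window_size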
3) A scheduler is added between the TCP/IP protocol stack and the NIC of the sender. Forward scheduling at the sender is realized by a priority-based flow scheduling module. Fig. 4 is an example of sender-side scheduling: server D sends data to server B and server C at the same time; the priority-based flow scheduling module on server D suspends the low-priority flow (shown dashed), so the medium-priority flow can monopolize the bandwidth. As shown in Fig. 5, the sender's priority-based flow scheduling module has a structure similar to the receiver's window-based module. SYN packets and data packets from the TCP/IP stack are pushed into the flow table. Flows in the flow table are ordered from high to low priority. Each flow-table entry contains a first-in-first-out queue. The NIC always extracts packets from the highest-priority flow that currently has packets and releases them into the network. For example, in Fig. 5 the NIC first extracts packets of flow A; if it finds that flow A's queue is empty, it tries to extract packets of flow B. Compared with the receiver's flow scheduling module, the sender's module has two differences (a sketch of the extraction logic follows after them).
a) The sender's scheduling module does not need a flow window to limit the number of active flows. The receiver needs the flow window to control the number of active flows in order to avoid congestion at the last hop under many-to-one traffic patterns; a sender, however, cannot congest its own uplink no matter how many receivers it is connected to.
b) Each entry of the sender's flow table contains a first-in-first-out queue that stores pointers to packets. The receiver does not need such a queue, because TCP uses cumulative acknowledgement: the receiver only has to keep, for each flow, the ACK packet with the highest sequence number, and the other ACK packets can simply be discarded to save bandwidth.
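The sketch below is a simplified Python model of the sender-side extraction just described: per-flow FIFO queues in a flow table kept in priority order, with the NIC always pulling from the highest-priority non-empty queue. It is an illustrative model, not an actual NIC or driver implementation; the class and field names are assumptions.

from collections import deque

class ForwardScheduler:
    """Sender-side flow table: one FIFO queue of packet pointers per flow."""

    def __init__(self):
        self.flow_table = []   # entries: {"flow_id", "priority", "queue"}

    def push(self, flow_id, priority, packet):
        """The TCP/IP stack pushes SYN and data packets into the per-flow queue."""
        for entry in self.flow_table:
            if entry["flow_id"] == flow_id:
                entry["priority"] = priority          # remaining packets of the flow
                entry["queue"].append(packet)
                break
        else:
            self.flow_table.append({"flow_id": flow_id, "priority": priority,
                                    "queue": deque([packet])})
        # keep highest priority first (smallest remaining-packet count first)
        self.flow_table.sort(key=lambda e: e["priority"])

    def extract(self):
        """The NIC pulls from the highest-priority entry that currently has packets."""
        for entry in self.flow_table:
            if entry["queue"]:
                return entry["queue"].popleft()
        return None            # every queue is empty: nothing to send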
Scheduling flows only at the sender and the receiver is still not enough: SFS introduces the priority inconsistency problem, which interrupts transmission and increases FCT. Fig. 6 is an example. In Fig. 6(a), for server B the medium-priority flow from server D is its current highest-priority flow, and the low-priority flow from server A is suspended by it. In Fig. 6(b), server D starts sending a high-priority flow, so the original medium-priority flow is interrupted. But at server B, the interrupted medium-priority flow is still its highest-priority flow. Now server B cannot receive the medium-priority flow, and it is not receiving the low-priority flow either, so the link sits idle. The root cause is that SFS is a distributed, end-host scheduling scheme and has no global view of flow priorities; of course, this also means that SFS scales well.
4) To solve the priority inconsistency problem, we design the hottest-flow coordination technique, which implements the following three functions. First, when a priority inconsistency occurs, the sender notifies the receiver. As in Fig. 6(b), when the high-priority flow appears, server D tells server B "I am interrupting the medium-priority flow I was sending you." Second, the receiver selects a flow to activate in order to fill the link, and it always picks the currently hottest flow. The hottest flow is the flow from which the receiver most recently received a data packet; among all candidate flows it has the highest probability of receiving a response from its sender after being activated. Third, the receiver gradually transforms the scheduling policy back to smallest-remaining-flow first to minimize FCT. In the second step, activating the hottest flow is in effect a hottest-flow-first scheduling policy; it activates flows effectively to fill the link, but it cannot minimize FCT, so the receiver turns the policy back into smallest-remaining-flow first as soon as possible. The hottest-flow coordination technique comprises the following six mechanisms, implemented in the sender's forward scheduler and the receiver's reverse scheduler; they monitor the state of the flow scheduling modules and adjust scheduling to realize the three functions above.
A) Sender notification mechanism: this mechanism lets the sender notify the receiver when a priority inconsistency occurs. The sender detects a priority inconsistency event by monitoring changes in its current output flow; if an event occurs, it sends a notification. For example, in Fig. 6(b), when data packets of the high-priority flow appear, the current output flow changes from the medium-priority flow to the high-priority flow, and at that moment server D notifies server B that a priority inconsistency has occurred. Note that only data packets trigger the sender notification mechanism. For example, in Fig. 6(b), when the SYN packet of the high-priority flow appears, server D registers an entry for the high-priority flow and sends the SYN out, but does not notify server B, because at this moment server D does not yet know whether this high-priority flow has sufficient priority at its receiver. If its priority there is high enough, its receiver will return an ACK for it; if not, its receiver will not return an ACK, and server D will naturally never produce data packets for this flow. Conversely, once data packets of the high-priority flow appear, server D can confirm that this flow also has sufficiently high priority at its receiver, so it promptly notifies server B that the transmission has been interrupted. To send the notification, the sender retransmits one data packet of the original (interrupted) flow and marks it to inform the receiver that a priority inconsistency event has occurred.
B) Flow temperature: just as temperature in storage systems indicates recently used file blocks, flow temperature lets the receiver perceive which flows have transmitted recently. A hot flow means the receiver recently received a data packet of that flow, which also means the flow has sufficient priority at its sender to be sent out. A cold flow means the receiver has not received a data packet of that flow for a while, which means the flow's priority at its sender is too low for it to be sent. The hottest flow is the flow that has just transmitted, and it is the most likely to receive a response from its sender once activated. We set the range of flow temperature to [0, 7]. Every time a packet of a flow is received, that flow's temperature is refreshed to 7. Every 100 µs, the temperature of every flow in the flow table drops by 1. In the basic settings above we fixed the TCP timeout at 500 µs, so a TCP timeout event produces a packet every 500 µs. If the flow has sufficient priority at the sender, this packet is sent out and reaches the receiver, refreshing the flow's temperature back to 7 before it drops to 0. If the flow's priority at the sender is insufficient, the packet produced by the timeout event cannot be sent, the flow's temperature at the receiver eventually drops to 0, and the receiver thereby knows that activating this flow would not elicit a response from its sender. Note that when the receiver gets a data packet marked by the sender as a priority inconsistency notification, the corresponding flow is immediately frozen (its temperature is set to 0). The combined sketch after item F) below models this temperature bookkeeping.
C) Heartbeat packets: as can be seen from the description above, the packets produced by TCP timeout events are in fact treated as heartbeat packets in SFS; they tell the receiver whether each suspended flow is still active at its sender, much as the Master in a Hadoop system learns the current state of each Worker through the heartbeats it receives. One difference is that the packets produced by TCP timeout events carry payload. To save bandwidth, we let the sender strip the payload of heartbeat packets and keep only the header. pFabric has a similar design, using empty packets to probe the state of the network. Note that heartbeat packets do not trigger the sender notification mechanism, and they are still scheduled by the sender: if a flow's priority is insufficient, its heartbeat packets cannot be sent out, and if it experiences multiple timeout events and produces multiple heartbeat packets, the sender keeps only the one with the highest sequence number and simply discards the others to save bandwidth.
D) Candidate window opening: the receiver uses this mechanism to activate the hottest flow to fill the link. As mentioned above, the NIC can only extract ACK packets from entries inside the flow window. Here we revise this setting: the NIC can now extract ACK packets from two places, the flow window and the candidate window. Normally the candidate window is closed, because the flows in the flow window are enough to fill the link. When a priority inconsistency event occurs, the flow in the flow window has been interrupted, and the receiver opens the candidate window to activate a flow to fill the link. The opening of the candidate window is triggered by the NIC's extraction action: every time the NIC extracts an ACK packet, the receiver checks whether the current flow is frozen (temperature 0). If a frozen flow is found and the candidate window is not currently open, the candidate window is opened to activate the currently hottest flow. A flow inside the candidate window may itself suffer a priority inconsistency event and be frozen; in that case the NIC's extraction action triggers the candidate-window opening event again, and the receiver again finds the currently hottest flow and opens the candidate window for it. Figs. 7(a)-(d) illustrate the opening of the candidate window. In Fig. 7(a), the receiver's flow table holds five TCP connections from different senders. Flow A is active, its packets keep arriving, and its temperature keeps being refreshed to 7. The other flows B-E are suspended. In Fig. 7(b), as time passes, the temperature of every flow has dropped by one. In Fig. 7(c), flow A suffers a priority inconsistency event and its temperature becomes 0. The receiver finds the flow with the highest temperature among B-E, which is E, and opens a candidate window to activate it (releasing its ACK packet). In Fig. 7(d), the receiver gets a data packet of the newly activated flow E, whose temperature is refreshed to 7. Figs. 7(e)-(f) illustrate the reopening of the candidate window. In Fig. 7(e), flow E also suffers a priority inconsistency event and its temperature becomes 0. The receiver finds the flow with the highest current temperature among B-D, which is D, and opens a candidate window for it. In Fig. 7(f), the receiver gets the data packet sent by flow D's sender, and D's temperature is refreshed to 7.
E) Candidate window moving: the receiver uses this mechanism to move the candidate window, transforming the hottest-flow-first policy back into the smallest-remaining-flow-first policy. Opening the candidate window in effect adopts hottest-flow first, which is very effective for filling the link but cannot minimize FCT. The flows in the flow table are arranged from left to right in decreasing priority, so moving the candidate window to the left means turning the policy back toward smallest-remaining-flow first. The moving mechanism works as follows: while the candidate window is open, every time the receiver gets a packet of a flow that is not in the candidate window, it checks the current position of that flow. If the flow is to the left of the candidate window, the receiver moves the candidate window to that flow's position to activate it; otherwise the candidate window does not move. Fig. 7(g) is an example of candidate window moving. Flow C receives a heartbeat packet and its temperature is refreshed to 7. Although flow C has the same temperature as flow D, flow C's priority is higher, so the candidate window is moved from flow D to flow C. Flow D is suspended and flow C is activated.
F) Candidate window closing: this mechanism closes the candidate window to suspend the candidate flow. While the candidate window is open, receiving a data packet of a flow inside the flow window means the priority inconsistency event has ended. The receiver then closes the candidate window and reactivates the flow in the flow window. Fig. 7(h) is an example of candidate window closing: flow A receives a data packet from its sender, its temperature becomes 7 and it is reactivated, and the candidate window is closed to suspend the candidate flow C.
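The combined Python sketch below models the flow-temperature bookkeeping of item B) together with the candidate-window opening, moving and closing rules of items D), E) and F). It is a simplified model of the receiver logic under the parameters stated above (temperature range [0, 7], one-degree decay every 100 µs, refresh on packet arrival); the class names and the tick-based timer are assumptions, not the patent's implementation.

MAX_TEMP = 7            # temperature range is [0, 7]
DECAY_PERIOD_US = 100   # every 100 us all temperatures drop by one degree

class TemperatureTable:
    def __init__(self):
        self.temp = {}                      # flow_id -> temperature

    def on_packet(self, flow_id, inconsistency_mark=False):
        # A marked packet is the sender's "transmission interrupted" notice: freeze the flow.
        self.temp[flow_id] = 0 if inconsistency_mark else MAX_TEMP

    def on_tick(self):
        """Called every DECAY_PERIOD_US: every flow cools down by one degree."""
        for flow_id in self.temp:
            self.temp[flow_id] = max(0, self.temp[flow_id] - 1)

    def hottest(self, candidates):
        """The hottest flow among the candidates (highest temperature)."""
        return max(candidates, key=lambda f: self.temp.get(f, 0), default=None)

class CandidateWindow:
    def __init__(self, temps, flow_order):
        self.temps = temps                  # TemperatureTable above
        self.flow_order = flow_order        # flow ids sorted from highest to lowest priority
        self.candidate = None               # flow activated by the candidate window, if any

    def on_nic_extract(self, flow_window):
        """Open (or reopen): triggered by NIC extraction when the active flow is frozen."""
        if self.candidate is not None and self.temps.temp.get(self.candidate, 0) == 0:
            self.candidate = None           # the candidate itself has been frozen
        frozen = [f for f in flow_window if self.temps.temp.get(f, 0) == 0]
        if frozen and self.candidate is None:
            suspended = [f for f in self.flow_order if f not in flow_window]
            self.candidate = self.temps.hottest(suspended)

    def on_packet(self, flow_id, flow_window):
        """Move left toward higher priority, or close when the in-window flow resumes."""
        if flow_id in flow_window:
            self.candidate = None           # close: the interrupted flow is transmitting again
        elif self.candidate is not None and flow_id != self.candidate:
            if self.flow_order.index(flow_id) < self.flow_order.index(self.candidate):
                self.candidate = flow_id    # the packet's flow sits to the left: move the window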
We evaluate the performance of SFS with the packet-level simulator ns-2 and compare SFS with two schemes, DCTCP and pFabric. DCTCP is configured as follows: the ECN marking threshold at the switches is set to 15 packets, and the TCP timeout is fixed at 1000 µs. pFabric is configured as follows: the TCP customized for pFabric starts sending data packets directly after sending the SYN packet, without waiting for the receiver to return the first ACK, and the TCP timeout is fixed at 500 µs. SFS is configured as follows: the TCP congestion window is fixed at the BDP (8 packets), and the timeout is fixed at 500 µs to produce heartbeat packets.
The test topology is a FatTree network with a 1:1 oversubscription ratio built from 8-port switches. The network contains 128 servers and 16 core switches. The bandwidth and propagation delay of every link are 1 Gbps and 1 µs respectively. For a 1500 B data packet, the RTT across pods is 86 µs and the RTT within a pod is 28.6 µs; for a SYN packet, the RTT across pods is 16 µs and the RTT within a pod is 4.6 µs.
We generate traffic with web search and data mining workloads that match the flow size distributions measured in real data center networks. In the web search workload, more than 95% of the traffic comes from the 30% of flows whose sizes lie between 1 and 20 MB. In the data mining workload, more than 95% of the traffic comes from the 3.6% of flows larger than 15 MB, while more than 80% of the flows are smaller than 10 KB. We use these two workloads to draw flow sizes in the FatTree described above. The source and destination servers of each flow are chosen at random, and flow arrivals follow a Poisson process. We adjust the flow arrival rate to produce different load levels and test the performance of SFS comprehensively (a small sketch of this flow generation follows).
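As a small illustration of this traffic setup, the sketch below generates flows with Poisson arrivals between randomly chosen server pairs, drawing sizes from an empirical distribution supplied as a CDF. The CDF passed in would come from the web search or data mining measurements; the values and names here are placeholders, not the actual load generator used in the experiments.

import random

def sample_flow_size(cdf):
    """cdf: list of (size_in_packets, cumulative_probability) pairs, ascending."""
    u = random.random()
    for size, prob in cdf:
        if u <= prob:
            return size
    return cdf[-1][0]

def generate_flows(num_flows, num_servers, arrival_rate, cdf):
    """Flow arrivals follow a Poisson process with rate arrival_rate (flows per second)."""
    t, flows = 0.0, []
    for _ in range(num_flows):
        t += random.expovariate(arrival_rate)           # exponential inter-arrival times
        src, dst = random.sample(range(num_servers), 2)  # random distinct source and destination
        flows.append({"time": t, "src": src, "dst": dst, "size": sample_flow_size(cdf)})
    return flows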
Fig. 8 shows the FCT of SFS, DCTCP and pFabric under the web search workload in the FatTree. Note that the FCT of each flow in the figure is normalized (measured completion time divided by theoretical completion time). SFS performs better than DCTCP and comes very close to pFabric. From Figs. 8(a) and 8(b), compared with DCTCP, SFS reduces the average FCT of small flows by up to 63%, and the slowest small flow completes 3 to 4 times faster. As expected, SFS cannot surpass pFabric: the average FCT of pFabric's small flows is 1.3 times better than that of SFS, and the FCT of its slowest small flow is 1.6 times better. There are two main reasons. First, the TCP customized for pFabric starts sending data packets without waiting for the receiver to return the first ACK, so many flows smaller than 8 packets can complete within one RTT; in SFS, TCP must wait for the receiver's first ACK before sending data packets, so for a flow of the same size SFS spends at least one more RTT than pFabric. Second, we know that minimizing FCT requires the smallest-remaining-flow-first policy, which is exactly what pFabric always uses, whereas SFS temporarily uses the hottest-flow-first policy when a priority inconsistency event occurs and only then transforms back to smallest-remaining-flow first, so naturally SFS cannot match pFabric on the FCT of small flows. However, matching pFabric requires hardware changes, and deploying pFabric in a DCN with thousands of commodity switches may cost a great deal, whereas SFS can be implemented in software, so its benefit is more practical. From Fig. 8(c), for large flows the performance of SFS is worse than DCTCP and pFabric. DCTCP never suspends large flows, so its large flows perform best. pFabric suspends large flows to let small flows go first, so its large flows perform worse than DCTCP. SFS not only suspends large flows but also leaves the link briefly idle when a priority inconsistency event occurs, so its large flows perform worse than pFabric.
Fig. 9 shows the FCT of SFS, DCTCP and pFabric under the data mining workload in the FatTree. All schemes now perform better than under the web search workload, because the data mining workload generates fewer flows in the network and congestion is lighter. From Fig. 9(a), compared with DCTCP, SFS reduces the average FCT of small flows by up to 43% and the FCT of the slowest small flow by up to 59%. pFabric still performs best: the average FCT of its small flows is 1.4 times better than that of SFS, and the FCT of its slowest small flow is 1.7 times better. Comparing Fig. 9(a) with Fig. 8(a), the gap between the pFabric and SFS curves is now larger; this is the effect of the different flow size distributions. As mentioned above, for any flow SFS always spends at least one more RTT than pFabric, and in the data mining workload there are very many small flows of fewer than 8 packets, which pFabric can complete within one RTT, substantially reducing its small-flow FCT. From Fig. 9(c), for large flows SFS and pFabric now perform better than DCTCP, exactly the opposite of Fig. 8(c). This too reflects the impact of the flow size distribution on flow scheduling schemes: in the web search traffic, flows smaller than 10 MB account for 65% of the total bytes, and every large flow that meets them must be suspended; in the data mining traffic, flows smaller than 10 MB account for only 2% of the total bytes and have almost no effect on large flows.
To test whether SFS achieves near-zero queues, we repeat the web search experiment at 0.9 load and measure the queue lengths of all downlinks, i.e. the links from the core layer to the aggregation layer, from the aggregation layer to the edge layer, and from the edge layer to the servers. In the experimental topology each layer has 128 such links, 384 in total. Figure 10(a) shows that the maximum queue length at the core and aggregation layers never exceeds 5 KB; congestion occurs only between the edge layer and the servers. This is because several candidate-window moving events sometimes happen in quick succession, several flows are activated one after another, and the data packets they release form a brief incast at the last hop, increasing the queue length. From Figure 10(b) we can see that the average queue length never exceeds 2.5 KB, and at the core and aggregation layers it is even below 0.5 KB, i.e. less than one packet. This shows that even under heavy load SFS keeps the queues very short over the long term. In addition, we can also see the effect of packet spraying: the curves for the core and aggregation layers are clearly flatter than the curve for the edge layer; the maximum queue lengths at the core layer are roughly the same; and the aggregation-layer links toward the same rack have similar average queue lengths.

Claims (7)

1. A server-based flow scheduling method, characterized in that it comprises the following steps:
1) at the sender, a TCP flow obtains its priority from the application layer and embeds the priority into the SYN packet and data packets it sends; at the receiver, the priority is copied into the corresponding ACK packets; the TCP congestion window is fixed at the BDP, the timeout value is fixed at 500 µs, and TCP congestion avoidance and fast retransmission are disabled; the network is a FatTree network with a 1:1 oversubscription ratio, and packet spraying is used as the routing scheme to balance the load over the multiple equal-cost paths;
2) a reverse scheduler is added between the TCP/IP protocol stack and the network interface card of the receiver; by releasing or holding ACK packets, the reverse scheduler activates or suspends the flows the receiver receives to realize reverse flow scheduling, and at the same time controls the total amount of in-flight TCP data packets in the network through the ACK packets it releases; the reverse scheduler comprises a flow table and a flow window, the ACK packets arriving from the TCP/IP protocol stack are pushed into the flow table, the flows in the flow table are ordered by priority, the flow window limits the number of active flows, and the network interface card extracts ACK packets from the flow-table entries inside the flow window and sends them into the network;
3) a forward scheduler is added between the TCP/IP protocol stack and the network interface card of the sender; by releasing or holding SYN packets and data packets, the forward scheduler activates or suspends the flows the sender sends to realize forward flow scheduling; the forward scheduler comprises a flow table, the SYN packets and data packets arriving from the TCP/IP protocol stack are pushed into the flow table, the flows in the flow table are ordered by priority, and the network interface card always extracts data packets from the highest-priority entry that currently has data packets and sends them into the network;
4) a coordination module is added to the reverse scheduler and the forward scheduler; when the sender and the receiver disagree about the priority of a flow, the coordination module coordinates the sender and the receiver so as to fill the link bandwidth.
2. The server-based flow scheduling method according to claim 1, characterized in that the coordination module adopts a hottest-flow coordination method comprising the following three steps: first, when the flow the receiver is currently receiving is interrupted by its sender, the sender notifies the receiver that the current transmission has been interrupted; then, the receiver activates the hottest flow to fill the link; finally, the receiver turns activating the hottest flow back into activating the highest-priority flow.
3. The server-based flow scheduling method according to claim 2, characterized in that the receiver identifies the hottest flow by the temperature of each flow, the temperature of a flow measuring how long the receiver has waited since it last received a data packet of that flow, the hottest flow being the flow from which a packet was received most recently.
4. The server-based flow scheduling method according to claim 3, characterized in that the sender of a flow suspended by the receiver refreshes the temperature of that suspended flow at its receiver by sending heartbeat packets, a heartbeat packet being the retransmission packet produced by a TCP timeout event; if the suspended flow has sufficient priority at the sender, the heartbeat packet of the suspended flow is released and refreshes the temperature of the flow at the receiver; otherwise the temperature of the suspended flow at the receiver drops to zero over time.
5. The server-based flow scheduling method according to claim 4, characterized in that if, when the receiver gets the transmission-interrupted notification from the sender, the receiver learns that the flow in its flow window has been suspended by its sender, the receiver additionally opens a candidate window to activate the hottest flow.
6. The server-based flow scheduling method according to claim 5, characterized in that the receiver moves the candidate window to transform activating the hottest flow into activating the highest-priority flow; the moving process is: at any time, if the receiver finds that the flow of an entry to the left of the candidate window has just had its temperature refreshed to the maximum value, the receiver moves the candidate window to that flow, suspending the original flow and activating the flow with the same temperature but higher priority.
7. The server-based flow scheduling method according to claim 6, characterized in that when the flow in the flow window whose sender had interrupted it starts transmitting again, the receiver closes the candidate window.
CN201510957601.6A 2015-12-18 2015-12-18 Server-based flow scheduling method Pending CN105553879A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510957601.6A CN105553879A (en) 2015-12-18 2015-12-18 Server-based flow scheduling method


Publications (1)

Publication Number Publication Date
CN105553879A true CN105553879A (en) 2016-05-04

Family

ID=55832816

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510957601.6A Pending CN105553879A (en) 2015-12-18 2015-12-18 Server-based flow scheduling method

Country Status (1)

Country Link
CN (1) CN105553879A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1993944A (en) * 2004-06-02 2007-07-04 高通股份有限公司 Method and apparatus for scheduling in a wireless network
US20050283529A1 (en) * 2004-06-22 2005-12-22 Wan-Yen Hsu Method and apparatus for providing redundant connection services
CN101616098A (en) * 2009-08-12 2009-12-30 杭州华三通信技术有限公司 The dispatching method and the equipment of tcp data stream
CN103795643A (en) * 2014-01-28 2014-05-14 广西大学 Method for processing synchronous priority bursty flow in data center network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张洁 et al.: "Minimizing datacenter flow completion times with server-based flow scheduling", Computer Networks *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107566318B (en) * 2016-06-30 2021-08-03 联芯科技有限公司 Streaming media data repairing method and device
CN107566318A (en) * 2016-06-30 2018-01-09 联芯科技有限公司 The restorative procedure and device of stream medium data
CN106656858B (en) * 2016-08-10 2019-10-29 广州市香港科大霍英东研究院 Dispatching method and server based on concurrent
CN106656858A (en) * 2016-08-10 2017-05-10 广州市香港科大霍英东研究院 Scheduling method based on co-current flow, and server
CN106533970B (en) * 2016-11-02 2019-06-07 重庆大学 Difference towards cloud computing data center network spreads transmission control method and device
CN106533970A (en) * 2016-11-02 2017-03-22 重庆大学 Differential flow control method and device for cloud computing data center network
CN107070620A (en) * 2016-12-09 2017-08-18 深圳信息职业技术学院 A kind of wireless communication system resource allocation methods and device
CN107948103A (en) * 2017-11-29 2018-04-20 南京大学 A kind of interchanger PFC control methods and control system based on prediction
CN107948103B (en) * 2017-11-29 2020-06-30 南京大学 Switch PFC control method and control system based on prediction
CN109120544A (en) * 2018-09-30 2019-01-01 华中科技大学 The transfer control method of Intrusion Detection based on host end flow scheduling in a kind of data center network
CN112948097A (en) * 2021-04-15 2021-06-11 哈工大机器人(合肥)国际创新研究院 Method and device for executing and scheduling function block of IEC61499
CN112948097B (en) * 2021-04-15 2022-10-14 哈工大机器人(合肥)国际创新研究院 Method and device for executing and scheduling function block of IEC61499
CN114363260A (en) * 2021-11-09 2022-04-15 天津大学 Data flow scheduling method for data center network
CN114363260B (en) * 2021-11-09 2023-10-17 天津大学 Data flow scheduling system for data center network


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160504