CN111526036B - Short flow real-time optimization method, system and network transmission terminal

Info

Publication number: CN111526036B
Application number: CN202010202795.XA
Authority: CN
Inventors: 沈玉龙; 赵迪; 张志为; 何昶辉; 王博; 祝幸辉; 崔志浩; 何怡
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2020-03-20
Filing date: 2020-03-20
Publication date: 2021-11-23
Anticipated expiration: 2040-03-20
Also published as: CN111526036A

Abstract

The invention belongs to the technical field of network transmission flow scheduling in cloud data centers, and discloses a short flow real-time optimization method, a system and a network transmission terminal. The method constructs a reinforcement learning framework for short flow real-time optimization in a data center; acquires the network flow characteristics of the data center by monitoring the flows inside it and generating a flow size distribution; initializes the multi-stage queue degradation thresholds by calculating them from the distribution characteristics of the data center's internal flows; and dynamically adjusts the thresholds at fine granularity to adapt to changes in data center flows, so that data packets are transmitted according to priority. Based on reinforcement learning, the invention adaptively matches the thresholds to the sizes of the flows being transmitted, reduces the transmission delay of interactive real-time short flows, mitigates the queueing delay associated with long flows that have high bandwidth requirements, and improves the real-time performance of network transmission.

Description

Short flow real-time optimization method, system and network transmission terminal

Technical Field

The invention belongs to the technical field of network transmission flow scheduling of cloud data centers, and particularly relates to a short flow real-time optimization method, a short flow real-time optimization system and a network transmission terminal.

Background

At present, optimizing data center network traffic transmission is one of the core research problems for cloud data centers and a hotspot of both industrial and academic research. Because network traffic inside a data center is differentiated and changes dynamically, optimizing its transmission performance is very challenging. Research on network traffic transmission aims to reduce flow completion time and improve real-time performance. Cloud applications such as web search, social networking and service recommendation exist in the data center in the form of short flows, which place high demands on real-time transmission and underpin user experience. To reduce transmission delay and improve user experience, a great deal of research on data center traffic transmission control has been carried out at home and abroad; its directions can be divided into network transport protocol optimization, priority-based traffic transmission methods and the like. Network transport protocol optimization generally uses flow control to reduce the sending rate of flows, but does not isolate traffic according to differentiated flow requirements, so it cannot improve the transmission delay of interactive real-time applications.

The article "Data Center TCP (DCTCP)" uses switches to cooperate with end-host transport in congestion control, which effectively improves data center network transmission performance. However, it does not distinguish delay-sensitive short flows from long flows with high bandwidth requirements, so short flow delay cannot be reduced, switch queues fill with packets of long flows that are not delay-sensitive, and the user experience of cloud services and applications is not effectively improved. The article "Better never than late: Meeting deadlines in datacenter networks" uses switches to control flow sending rates according to prior knowledge such as flow size and deadline, and uses explicit congestion notification to dynamically control the packet sending rate of each flow, which reduces the completion time of interactive real-time short flows in a cloud data center to a certain extent. Priority-based transmission methods, on the other hand, distinguish flows by type, assign priorities and schedule traffic over multiple queues, so that the scheduling mechanism meets the differentiated requirements of traffic types and achieves performance isolation. The article "pFabric: Minimal near-optimal datacenter transport" builds on a flow priority scheduling mechanism, combines data center flow scheduling with rate control, and exploits switch support for multi-queue scheduling to achieve fine-grained flow scheduling at both the sender and the switch.
The article "PIAS: Practical information-agnostic flow scheduling for commodity data centers" proposes PIAS, a flow scheduling method based on multi-level feedback queues. A degradation (demotion) threshold is set for each queue level and compared against the cumulative number of bytes a flow has sent so far, which separates long flows from short flows and gives short flows priority. This reduces short flow transmission delay, improves user experience and real-time performance, and is easy to deploy. In the prior art, however, because data center traffic changes dynamically and is bursty, the degradation thresholds of the queue levels become mismatched with the traffic, the optimization period is long, and the thresholds cannot adapt automatically to traffic changes. When the thresholds are mismatched with the traffic, short flow priority transmission is inefficient and short flows are still delayed by long flows.
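As an illustrative, non-limiting sketch of the multi-level feedback queue mechanism described above, the following Python fragment maps a flow's cumulative bytes sent to a priority queue. The function name `queue_for_flow` and the threshold values are hypothetical and chosen only for illustration; they are not taken from the cited article or from the invention.

```python
from bisect import bisect_right

def queue_for_flow(bytes_sent, thresholds):
    """Return the 1-based priority queue index for a flow.

    thresholds holds the ascending degradation thresholds
    (alpha_1 .. alpha_{M-1}) in bytes. Queue 1 has the highest priority;
    a flow is demoted by one level each time its cumulative bytes sent
    reaches the next threshold, so queue j covers [alpha_{j-1}, alpha_j).
    """
    return bisect_right(thresholds, bytes_sent) + 1

# Hypothetical thresholds for M = 3 queues.
thresholds = [100_000, 1_000_000]
print(queue_for_flow(20_000, thresholds))     # 1: still a short flow
print(queue_for_flow(100_000, thresholds))    # 2: reached the first threshold
print(queue_for_flow(5_000_000, thresholds))  # 3: long flow, lowest priority
```

If the first threshold is set too low for the current traffic, short flows are demoted early and queue behind long flows, which is the mismatch the paragraph above describes.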

Through the above analysis, the problems and defects of the prior art are as follows: the degradation thresholds of the queue levels are mismatched with the flow sizes, the optimization period is long, and the thresholds cannot adapt automatically to traffic changes. When the thresholds are mismatched with the flow sizes, short flow priority transmission is inefficient and short flows are still delayed by long flows.

The difficulty in solving the above problems and defects is how to adjust the multi-level queue degradation thresholds at fine granularity so that they adapt to the dynamic changes of data center traffic and improve the real-time performance of short flow transmission.

The significance of solving the above problems and defects is as follows: in the short flow real-time optimization method, the multi-stage queue degradation thresholds are adaptively matched to the dynamically changing network traffic of the data center, which reduces the short flow transmission delay of interactive real-time applications and effectively improves user experience for most interactive applications in a cloud environment.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a short flow real-time optimization method, a short flow real-time optimization system and a network transmission terminal.

The invention is realized in such a way that a short flow real-time optimization method comprises the following steps:

step one, constructing a reinforcement learning framework for the short flow real-time optimization method of a data center;

step two, acquiring the network flow characteristics of the data center, monitoring the flow characteristics in the data center, and generating a flow size distribution;

step three, initializing the multi-stage queue degradation thresholds, and calculating the multi-stage queue degradation thresholds according to the distribution characteristics of the internal flows of the data center;

and step four, dynamically adjusting the thresholds at fine granularity so as to adapt to changes in the flow sizes of the data center, and transmitting data packets according to priority.

Further, the architecture includes:

(1) State: a state space S is established, in which each state component represents the threshold of the i-th queue at time step t. The initial thresholds of the flow scheduling method based on multi-level priority queues are defined as the initial state, and the thresholds are adjusted by simple actions. The state space is expressed as:

S＝(thres₁,thres₂,…,thres_M)；

where q_j is the j-th queue, j ∈ {1, 2, …, M};

(2) Action space: represents the behavior of selecting the corresponding threshold according to link requirements. When short flows are larger than the threshold, the threshold is increased appropriately so that short flows are transmitted in a higher-priority queue; when short flows are far smaller than the threshold, the threshold is decreased appropriately so that long flows are transmitted in a lower-priority queue. The actions are expressed as:

A＝(inc_thres₁,dec_thres₁,inc_thres₂,dec_thres₂,…,inc_thres_n,dec_thres_n)；

(3) Reward: used to evaluate the quality of the actions performed by the machine during threshold training. The ratio of the short flow average completion times of two iterations is used as the reward of the reinforcement learning model, and the reward function is expressed as:

Further, the flow characteristic acquisition includes: generating bandwidth requirements according to the flow size distribution, specifying the source IP address and destination IP address of the data packets, sending different flows to the server, collecting flow characteristic information such as the transmission completion time and sending rate of all flows at the server, and feeding this information back to the client program for display.

Further, the multi-stage queue demotion threshold initialization module includes: calculating the initial threshold of the multistage queue degradation by using the distribution of network load and flow, wherein the calculation formula is as follows:

where there are M priority queues Q_i (1 ≤ i ≤ M), of which Q₁ is the highest-priority queue. The degradation threshold from queue j-1 to queue j is α_{j-1} (2 ≤ j ≤ M), with α₀＝0 and α_M＝∞, so that the largest flows are known to end up in the M-th queue. Each flow i, for i from 1 to N, has size x_i. The cumulative distribution function of the flow size distribution is defined as F(x), the probability that a flow's size is no larger than x. θ_j is defined as the probability that a flow's size falls in the threshold interval [α_{j-1}, α_j), i.e. θ_j＝F(α_j)-F(α_{j-1}). In a two-stage queue, the relationship between the flow completion time and the threshold is as follows:

Further, the dynamic threshold adjustment comprises the following steps:

firstly, initializing all function values in the state-action Q function value table, setting the initial state thresholds, and defining all initial state-action Q function values as 0;

secondly, looking up the initial state s (thres) of the multi-queue threshold optimization model in the Q function value table;

thirdly, the agent executes the action, namely the threshold variation delta thres, adjusts the threshold state to s '(delta thres'), sends the latest thresholds to the PIAS module, obtains the flow completion time under the new thresholds, and calculates the reward r of the state-action pair (s, delta thres);

fourthly, checking whether all iteration steps of the algorithm have been completed;

and fifthly, when the state does not exist in the Q function value table, selecting an action delta thres from the action space A according to the action selection policy pi, and going to the third step.

Further, in the second step, the initial state s (thres) of the multi-queue threshold optimization model is looked up in the Q function value table; if the state exists, an action delta thres is selected according to the action selection policy as the threshold variation under the current model state s; if the state does not exist, the method jumps to the fifth step.

Further, the fourth step checks whether all iteration steps of the algorithm have been completed; if not, the algorithm takes the new threshold state s '(delta thres') as the initial state of the next step, and returns to step one, executing through step four; if all are completed, learning for the current round ends.

It is another object of the present invention to provide a program storage medium for receiving user input, the stored computer program causing an electronic device to perform the steps comprising:

Another object of the present invention is to provide a short-flow real-time optimization system for implementing the short-flow real-time optimization method, the short-flow real-time optimization system comprising:

the architecture construction module is used for constructing a short flow real-time optimization method architecture;

the flow size distribution generation module is used for monitoring flow characteristics in the data center and generating flow size distribution;

the multi-stage queue degradation threshold calculation module is used for realizing the initialization of the multi-stage queue degradation threshold and calculating the multi-stage queue degradation threshold according to the distribution characteristics of the internal flow of the data center;

and the data packet dividing module is used for realizing dynamic threshold adjustment and dynamically adjusting the threshold size in a fine-grained manner so as to adapt to the change of the flow size of the data center and realize the priority transmission of the data packet division.

Another object of the present invention is to provide a network transmission terminal, which carries the short traffic real-time optimization system.

Combining all the above technical solutions, the advantages and positive effects of the invention are as follows. To solve the problem that the multi-stage queue thresholds are mismatched with flow sizes, and given that prior knowledge such as data center flow sizes and transmission deadlines is unknown, the invention proposes and designs a threshold optimization method and a network transmission decision model based on reinforcement learning, and studies flow scheduling methods and techniques for network multi-stage feedback queues. Based on reinforcement learning, it adaptively matches the thresholds to the sizes of the flows being transmitted, reduces the transmission delay of interactive real-time short flows, mitigates the queueing delay associated with long flows that have high bandwidth requirements, and improves the real-time performance of network transmission.

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a short-flow real-time optimization method according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of a short-flow real-time optimization system provided in an embodiment of the present invention.

In the figure: 1. architecture construction module; 2. flow size distribution generation module; 3. multi-level queue degradation threshold calculation module; 4. data packet dividing module.

Fig. 3 is an architecture diagram of a short-flow real-time optimization method according to an embodiment of the present invention.

Fig. 4 is a flowchart of a multi-stage queue threshold dynamic adjustment algorithm according to an embodiment of the present invention.

Fig. 5 is a line graph of experimental results provided by an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In view of the problems in the prior art, the present invention provides a short traffic real-time optimization method, system and network transmission terminal, which are described in detail below with reference to the accompanying drawings.

As shown in fig. 1, the short flow real-time optimization method provided by the present invention includes the following steps:

s101: constructing a short flow real-time optimization method framework;

s102: monitoring flow characteristics in a data center and generating flow size distribution;

s103: initializing a multi-stage queue degradation threshold, and calculating the multi-stage queue degradation threshold according to the distribution characteristics of the internal flow of the data center;

s104: dynamically adjusting the thresholds at fine granularity so as to adapt to changes in data center flows, and transmitting data packets according to priority.

As shown in fig. 2, the short flow real-time optimization system provided by the present invention includes:

The architecture construction module 1 is used for constructing the short flow real-time optimization method architecture.

The flow size distribution generation module 2 is used for monitoring the flow characteristics in the data center and generating the flow size distribution.

The multi-stage queue degradation threshold calculation module 3 is used for initializing the multi-stage queue degradation thresholds and calculating them according to the distribution characteristics of the internal flows of the data center.

The data packet dividing module 4 is used for dynamic threshold adjustment, adjusting the thresholds at fine granularity so as to adapt to changes in data center flow sizes and transmitting data packets according to priority.
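Purely as a non-limiting illustration of how the four modules above might pass data to one another, the following Python sketch wires them together as callables. The class name `ShortFlowOptimizer`, its field names and the stub functions are hypothetical and are not part of the described system.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class ShortFlowOptimizer:
    # Each field stands in for one module of Fig. 2 (names are hypothetical).
    build_framework: Callable[[], dict]                         # module 1
    measure_flow_sizes: Callable[[], List[int]]                 # module 2
    init_thresholds: Callable[[Sequence[int]], List[int]]       # module 3
    adjust_thresholds: Callable[[dict, List[int]], List[int]]   # module 4

    def run(self) -> List[int]:
        framework = self.build_framework()        # reinforcement learning setup
        sizes = self.measure_flow_sizes()         # flow size distribution
        thresholds = self.init_thresholds(sizes)  # initial degradation thresholds
        return self.adjust_thresholds(framework, thresholds)

# Stub modules, for illustration only.
optimizer = ShortFlowOptimizer(
    build_framework=lambda: {"actions": ["inc", "dec"]},
    measure_flow_sizes=lambda: [5_000, 20_000, 900_000],
    init_thresholds=lambda sizes: [sorted(sizes)[len(sizes) // 2]],
    adjust_thresholds=lambda framework, thresholds: thresholds,
)
print(optimizer.run())  # [20000]
```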

The technical solution of the present invention is further described below with reference to the accompanying drawings.

As shown in fig. 3 and 4, the short flow real-time optimization provided by the present invention includes the following steps:

1. Constructing the short flow real-time optimization method framework.

State: a state space S is established, in which each state component represents the threshold of the i-th queue at time step t. Because the queue degradation thresholds are continuous, they are discretized to accelerate convergence of the method. The initial thresholds of the flow scheduling method based on multi-level priority queues are defined as the initial state, and the thresholds are adjusted by simple actions. Formally, the state space is represented as:

S＝(thres₁,thres₂,…,thres_M)；

where q_j is the j-th queue, j ∈ {1, 2, …, M}.

Action space: represents the behavior of selecting the corresponding threshold according to link requirements. When short flows are larger than the threshold, the threshold is increased appropriately so that short flows are transmitted in a higher-priority queue; when short flows are far smaller than the threshold, the threshold is decreased appropriately so that long flows are transmitted in a lower-priority queue. The actions are represented as:

A＝(inc_thres₁,dec_thres₁,inc_thres₂,dec_thres₂,…,inc_thres_n,dec_thres_n)；

Reward: used to evaluate the quality of the actions performed by the machine during threshold training. The ratio of the short flow average completion times of two iterations is used as the reward of the reinforcement learning model. The reward function is represented as:
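The state, action space and reward described above can be illustrated by the following minimal Python sketch. It is an example under stated assumptions, not the claimed implementation: the step size `STEP`, the rule that rejects moves breaking the ascending order of thresholds, and the direction of the reward ratio (previous average completion time divided by the new one, so that values above 1 indicate improvement) are all assumptions introduced for illustration.

```python
from typing import List, Tuple

State = Tuple[int, ...]   # (thres_1, ..., thres_K), discretized, in bytes
Action = Tuple[int, int]  # (threshold index, +1 for inc_thres / -1 for dec_thres)

STEP = 10_000  # hypothetical discretization step in bytes

def action_space(num_thresholds: int) -> List[Action]:
    """Enumerate (inc_thres_i, dec_thres_i) for every threshold."""
    return [(i, d) for i in range(num_thresholds) for d in (+1, -1)]

def apply_action(state: State, action: Action) -> State:
    """Move one threshold by one step; keep thresholds positive and ascending."""
    i, direction = action
    new = list(state)
    new[i] += direction * STEP
    lower = new[i - 1] if i > 0 else 0
    upper = new[i + 1] if i + 1 < len(new) else float("inf")
    if not (lower < new[i] < upper):
        return state  # assumed rule: reject moves that break the ordering
    return tuple(new)

def reward(prev_avg_fct: float, new_avg_fct: float) -> float:
    """Ratio of short flow average completion times of two iterations
    (direction assumed: > 1 means the new thresholds are better)."""
    return prev_avg_fct / new_avg_fct

state = (20_000, 50_000)
print(action_space(len(state)))      # [(0, 1), (0, -1), (1, 1), (1, -1)]
print(apply_action(state, (0, +1)))  # (30000, 50000)
print(reward(2.0, 1.0))              # 2.0
```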

2. Acquiring flow characteristics in the data center: bandwidth requirements are generated according to the flow size distribution, the source IP address and destination IP address of the data packets are specified, different flows are sent to the server, flow characteristic information such as the transmission completion time and sending rate of all flows is collected at the server, and this information is fed back to the client program for display;
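As a hedged illustration of the flow characteristic acquisition step, the following Python sketch draws flow sizes from a flow size distribution and summarizes completion times and sending rates. The distribution `CDF`, the IP addresses and the function names are hypothetical; actual sending of flows over the network is outside the scope of the sketch.

```python
import random
from bisect import bisect_left

# Hypothetical empirical distribution: (flow size in bytes, cumulative probability).
CDF = [(10_000, 0.5), (100_000, 0.8), (1_000_000, 0.95), (10_000_000, 1.0)]

def sample_flow_size(rng: random.Random) -> int:
    """Draw one flow size by inverting the step-shaped CDF."""
    u = rng.random()  # uniform in [0, 1)
    probs = [p for _, p in CDF]
    return CDF[bisect_left(probs, u)][0]

def make_flows(n, src_ip, dst_ip, seed=0):
    """Build n flow requests between a fixed source and destination."""
    rng = random.Random(seed)
    return [{"src": src_ip, "dst": dst_ip, "size": sample_flow_size(rng)}
            for _ in range(n)]

def summarize(records):
    """records: list of (size_bytes, completion_time_s) measured at the server.
    Returns the average completion time and the per-flow rate in bytes/s."""
    avg_fct = sum(fct for _, fct in records) / len(records)
    rates = [size / fct for size, fct in records]
    return avg_fct, rates

flows = make_flows(5, "10.0.0.1", "10.0.0.2")
print(len(flows))                               # 5
print(summarize([(100, 1.0), (300, 3.0)]))      # (2.0, [100.0, 100.0])
```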

3. Initializing the thresholds: the initial multi-stage queue degradation thresholds are calculated from the network load and the flow distribution, and the calculation formula is as follows:

where there are M priority queues Q_i (1 ≤ i ≤ M), of which Q₁ is the highest-priority queue. The degradation threshold from queue j-1 to queue j is α_{j-1} (2 ≤ j ≤ M), with α₀＝0 and α_M＝∞, so that the largest flows are known to end up in the M-th queue. Each flow i, for i from 1 to N, has size x_i. The cumulative distribution function of the flow size distribution is defined as F(x), the probability that a flow's size is no larger than x. θ_j is defined as the probability that a flow's size falls in the threshold interval [α_{j-1}, α_j), i.e. θ_j＝F(α_j)-F(α_{j-1}). For example, in a two-level queue, the flow completion time is related to the threshold by:
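The quantity θ_j defined above can be illustrated numerically. The following Python sketch is an example only; it covers the definition of θ_j and not the two-level completion time relationship. The flow sizes and thresholds are hypothetical, and the empirical F is evaluated with a strict "less than" so that differences of F correspond exactly to the half-open interval [α_{j-1}, α_j); for a continuous distribution this coincides with the usual definition.

```python
import math
from bisect import bisect_left

def make_cdf(sizes):
    """Build an empirical F from observed flow sizes."""
    data = sorted(sizes)
    n = len(data)

    def F(x):
        if math.isinf(x):
            return 1.0
        # Fraction of flows with size < x (assumption, see lead-in).
        return bisect_left(data, x) / n

    return F

def theta(alphas, F):
    """alphas = [alpha_0 = 0, alpha_1, ..., alpha_M = inf].
    Returns [theta_1, ..., theta_M] with theta_j = F(alpha_j) - F(alpha_{j-1})."""
    return [F(alphas[j]) - F(alphas[j - 1]) for j in range(1, len(alphas))]

sizes = [5, 8, 20, 40, 70, 200, 900, 5000]  # hypothetical flow sizes in KB
alphas = [0, 10, 100, math.inf]             # hypothetical thresholds, M = 3
print(theta(alphas, make_cdf(sizes)))       # [0.25, 0.375, 0.375]
```

Here two of the eight flows fall below the first threshold, three fall between the two thresholds and three exceed the second, so the θ_j sum to 1.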

4. Dynamically adjusting the thresholds;

Step one: initialize all function values in the state-action Q function value table and set the initial state thresholds. The initial Q values can be chosen arbitrarily; to accelerate convergence of the algorithm, they are usually assigned according to prior knowledge related to the method, so that they better match how the multiple queue thresholds change and improve the real-time performance of flow transmission. Since the method of the present invention operates without prior knowledge of flow sizes and deadlines, all initial state-action Q function values are defined as 0.

Step two: look up the initial state s (thres) of the multi-queue threshold optimization model in the Q function value table. If the state exists, select an action delta thres according to the action selection policy as the threshold variation under the current model state s; if the state does not exist, jump to step five.

Step three: the agent executes the action, namely the threshold variation delta thres, adjusts the threshold state to s '(delta thres'), sends the latest thresholds to the PIAS module, obtains the flow completion time under the new thresholds, and calculates the reward r of the state-action pair (s, delta thres).

Step four: check whether all iteration steps of the algorithm have been completed. If not, the algorithm takes the new threshold state s '(delta thres') as the initial state of the next step, returns to step two and executes through step four; if all are completed, learning for the current round ends.

Step five: select an action delta thres from the action space A according to the action selection policy pi, and go to step three.
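Steps one to five above can be sketched as a tabular loop in Python. This is a minimal example under stated assumptions, not the claimed implementation: the standard Q-learning update rule, the epsilon-greedy selection policy, the parameters `alpha`, `gamma` and `eps`, the step size `STEP`, and the callable `measure_fct`, which stands in for sending thresholds to the PIAS module and reading back the short flow average completion time, are all introduced for illustration and are not specified in this description.

```python
import random
from collections import defaultdict

STEP = 10_000  # hypothetical threshold step in bytes

def actions(k):
    """(index, +1) increases threshold index, (index, -1) decreases it."""
    return [(i, d) for i in range(k) for d in (+1, -1)]

def step(state, action):
    """Apply delta thres; keep thresholds positive and strictly ascending."""
    i, d = action
    new = list(state)
    new[i] += d * STEP
    lo = new[i - 1] if i > 0 else 0
    hi = new[i + 1] if i + 1 < len(new) else float("inf")
    return tuple(new) if lo < new[i] < hi else state

def train(init_state, measure_fct, iters=200, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    acts = actions(len(init_state))
    q = defaultdict(float)   # step one: every Q value starts at 0
    known = set()            # states already present in the table
    state, prev_fct = init_state, measure_fct(init_state)
    for _ in range(iters):   # step four: fixed iteration budget
        if state in known and rng.random() >= eps:
            # Step two: the state exists, choose by the selection policy.
            action = max(acts, key=lambda a: q[(state, a)])
        else:
            # Step five: unseen state (or exploration), pick from A.
            action = rng.choice(acts)
        known.add(state)
        nxt = step(state, action)   # step three: apply delta thres
        fct = measure_fct(nxt)      # stand-in for the PIAS measurement
        r = prev_fct / fct          # reward ratio (direction assumed)
        best_next = max(q[(nxt, a)] for a in acts)
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        state, prev_fct = nxt, fct
    return state, q

# Toy stand-in for a measurement: completion time is lowest at 50,000 bytes.
toy_fct = lambda s: 1.0 + abs(s[0] - 50_000) / 100_000
final_state, q_table = train((20_000,), toy_fct, iters=50)
print(final_state, len(q_table))
```

Because the measurement here is a toy function, the printed thresholds carry no experimental meaning; the sketch only shows how the table lookup, action selection, reward calculation and iteration check fit together.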

The technical effects of the present invention will be described in detail with reference to experiments.

Fig. 5 shows the experimental data of the present invention. The results show that the short flow real-time optimization method of this embodiment (RL-QTO) effectively reduces short flow transmission delay and improves the real-time performance of short flow transmission: the short flow average completion time of RL-QTO is 60.2% lower than that of the conventional TCP transmission process and 32.9% lower than that of PIAS.
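To make the two percentages concrete, the following worked arithmetic expresses them as ratios. The symbols T_TCP, T_PIAS and T_RL (short flow average completion times of TCP, PIAS and RL-QTO) are introduced here for illustration only, and the last line additionally assumes that both percentages were measured on the same workload.

```latex
\begin{aligned}
T_{\mathrm{RL}} &= (1 - 0.602)\,T_{\mathrm{TCP}} = 0.398\,T_{\mathrm{TCP}},\\
T_{\mathrm{RL}} &= (1 - 0.329)\,T_{\mathrm{PIAS}} = 0.671\,T_{\mathrm{PIAS}},\\
\Rightarrow\quad T_{\mathrm{PIAS}} &= \frac{0.398}{0.671}\,T_{\mathrm{TCP}} \approx 0.593\,T_{\mathrm{TCP}}.
\end{aligned}
```

Under that assumption, the figures would imply that PIAS with fixed thresholds already completes short flows in roughly 59% of the TCP time, and that the adaptive thresholds account for the remaining reduction.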

It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer-executable instructions and/or embodied in processor control code, as software executed by various types of processors, or as a combination of hardware circuitry and software, such as firmware.

The above description covers only specific embodiments of the present invention, and the scope of protection of the invention is not limited thereto. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection defined by the appended claims.

Claims

1. A short flow real-time optimization method is characterized by comprising the following steps:

step four, dynamically adjusting a threshold value, wherein the threshold value is used for dynamically adjusting the threshold value size in a fine granularity manner so as to adapt to the change of the flow size of a data center, and realizing the transmission of data packets by dividing priorities;

the architecture comprises:

(1) the state is as follows: establishing a state space S, each state

S＝(thres₁,thres₂,…,thres_M)；

wherein q_j is the j-th queue, j ∈ {1, 2, …, M};

(2) an action space: representing the selection of the corresponding threshold behavior according to link requirements, the actions are represented as follows:

(3) reward: used to evaluate the quality of the actions performed by the machine during threshold training; the ratio of the short flow average completion times of two iterations is used as the reward of the reinforcement learning framework, and the reward function is expressed as:

when the short flow is larger than the threshold value, the threshold value is properly increased, and the short flow is transmitted in the queue with higher priority; when the short flow is smaller than the threshold value, the threshold value is properly reduced, and the long flow is transmitted in the lower priority queue;

the flow characteristic acquisition includes: generating bandwidth requirements according to the flow size distribution, specifying a source IP address and a destination IP address of a data packet, sending different flows to a server, collecting all flow transmission completion time and sending rate flow characteristic information at the server, and feeding back the information to a client program for display;

the multi-level queue destage threshold initialization comprises: calculating the initial threshold of the multistage queue degradation by using the distribution of network load and flow, wherein the calculation formula is as follows:

wherein there are M priority queues Q_i (1 ≤ i ≤ M), of which Q₁ is the highest-priority queue; the degradation threshold from queue j-1 to queue j is α_{j-1} (2 ≤ j ≤ M), with α₀＝0 and α_M＝∞, so that the largest flows are known to be in the M-th queue; each flow i, for i from 1 to N, has size x_i; the cumulative distribution function of the flow size distribution is defined as F(x), the probability that a flow's size is no larger than x; θ_j is the probability that a flow's size falls in the threshold interval [α_{j-1}, α_j), i.e. θ_j＝F(α_j)-F(α_{j-1}); in a two-stage queue, the relationship between the flow completion time and the threshold is as follows:

the dynamic threshold adjustment comprises the following steps:

secondly, inquiring in a Q function value table aiming at the initial state s of a plurality of queue threshold optimization models;

thirdly, the intelligent agent executes the action threshold value variation delta thres, adjusts the threshold value state to be s', sends the latest threshold value to the PIAS module, obtains the flow completion time according to the new threshold value, and calculates the reward r of the state-action value (s, delta thres);

fifthly, selecting an action delta thres in the action space A according to an action selection strategy pi because the iteration state value does not exist in the Q function value table, and turning to the third algorithm step;

and in the second step, inquiring the initial state s of the queue threshold optimization models in a Q function value table, if the state exists, executing the third step and the fourth step, and if the state does not exist, jumping to the fifth step.

2. The short-flow real-time optimization method of claim 1, wherein the fourth step checks whether all iteration steps of the algorithm have been completed; if not, the algorithm takes the new threshold state s' as the initial state of the next step, and returns to step one, executing through step four; if all are completed, learning for the current round ends.

3. A short-flow real-time optimization system for implementing the short-flow real-time optimization method according to any one of claims 1 to 2, wherein the short-flow real-time optimization system comprises:

the architecture construction module is used for constructing a reinforcement learning architecture of the short flow real-time optimization method;

the multi-stage queue degradation threshold calculation module is used for realizing the initialization of the multi-stage queue degradation threshold and calculating the multi-stage queue degradation threshold according to the distribution characteristics of the internal flow of the data center; the multi-level queue destage threshold initialization comprises: calculating the initial threshold of the multistage queue degradation by using the distribution of network load and flow, wherein the calculation formula is as follows:

wherein there are M priority queues Q_i (1 ≤ i ≤ M), of which Q₁ is the highest-priority queue; the degradation threshold from queue j-1 to queue j is α_{j-1} (2 ≤ j ≤ M), with α₀＝0 and α_M＝∞, so that the largest flows are known to be in the M-th queue; each flow i, for i from 1 to N, has size x_i; the cumulative distribution function of the flow size distribution is defined as F(x), the probability that a flow's size is no larger than x; θ_j is the probability that a flow's size falls in the threshold interval [α_{j-1}, α_j), i.e. θ_j＝F(α_j)-F(α_{j-1}); in a two-stage queue, the relationship between the flow completion time and the threshold is as follows:

the threshold dynamic adjustment module is used for dynamically adjusting the threshold size in a fine-grained manner so as to adapt to the change of the flow of the data center and realize the priority transmission of the data packets; the dynamic threshold adjustment comprises the following steps:

4. A network transmission terminal, characterized in that the network transmission terminal is equipped with the short traffic real-time optimization system of claim 3.