CN113315716A

CN113315716A - Method and equipment for training congestion control model and method and equipment for congestion control

Info

Publication number: CN113315716A
Application number: CN202110592772.9A
Authority: CN
Inventors: 周超; 陈艳姣; 夏振厂
Original assignee: Wuhan University WHU; Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Wuhan University WHU; Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2021-05-28
Filing date: 2021-05-28
Publication date: 2021-08-27
Anticipated expiration: 2041-05-28
Also published as: CN113315716B

Abstract

The disclosure provides a training method and equipment of a congestion control model and a congestion control method and equipment. The congestion control method comprises the following steps: acquiring current first network state information and the preference of current application on network transmission performance; inputting the acquired first network state information and the preference into a congestion control model to obtain a predicted action to be executed for adjusting the size of a congestion window; the predicted action is performed to reset the congestion window.

Description

Method and equipment for training congestion control model and method and equipment for congestion control

Technical Field

The present disclosure relates generally to the field of communications technologies, and in particular, to a method and an apparatus for training a congestion control model, and a method and an apparatus for congestion control.

Background

In recent years, in order to solve the problem of network congestion and improve network performance, many congestion control protocols have been proposed, including heuristic protocols and learning-based protocols.

The learning-based congestion control protocols PCC and PCC Vivace learn the relationship between rate control behavior and observed performance in an online manner. To avoid the hard mapping between states and actions collected in the traditional TCP variants, they select the optimal sending rate by employing online learning techniques, which continually try to modify the sending rate within a small range to approach better utility function performance. Although PCC and PCC Vivace are able to achieve good performance. The learning-based congestion control protocol learns a congestion control policy by interacting with the environment, which may select appropriate actions to control the sending rate or congestion window depending on the state of the network. However, learning-based congestion control protocols drive performance through pre-designed reward or objective functions, which are fixed, and when new applications appear, these protocols cannot meet the performance requirements of these applications, thus requiring redesign of the objective function and retraining of new models.

Disclosure of Invention

Exemplary embodiments of the present disclosure are directed to a method and apparatus for training a congestion control model, and a method and apparatus for congestion control, which at least solve the problems of the related art described above, and may not solve any of the problems described above.

According to a first aspect of the embodiments of the present disclosure, there is provided a method for training a congestion control model, including: initializing a communication network environment used by the current training round; inputting the preference of the training round on the network transmission performance and the current first network state information into a congestion control model to obtain predicted actions to be executed for adjusting the size of a congestion window; executing the predicted action to reset the congestion window and controlling the sending end to send the data packet to the receiving end under the currently set congestion window; when a sending end receives an ACK message fed back by a receiving end, calculating a loss function of the congestion control model according to the action, first network state information before the action is executed, first network state information after the action is executed and the preference; and training the congestion control model by adjusting the model parameters of the congestion control model according to the loss function, and determining whether to finish the training round, wherein when the training round is not finished, the steps of inputting the preference of the training round on the network transmission performance and the current first network state information into the congestion control model are returned to be executed, so that the predicted action which needs to be executed and is used for adjusting the size of the congestion window are obtained.

Optionally, the first network status information comprises at least one of: the size of the congestion window, delay, packet acknowledgement rate, and sending rate; wherein the delay, the packet acknowledgement rate, and the transmission rate are determined based on the ACK message fed back by the receiving end.

Optionally, the preference for network transmission performance comprises a degree of preference for at least one of: throughput, packet loss rate, and latency.

Optionally, the step of determining whether to end the training round comprises: and determining whether to finish the training round according to the change condition of the second network state information.

Optionally, the step of determining whether to end the training round according to the change of the second network state information includes: when the second network state information after the action is executed meets a first preset condition, determining the action as a winning action; when the second network state information after the action is executed meets a second preset condition, determining the action as a failed action; when the continuous times of the winning actions reach a first preset time, determining to finish the training round; when the continuous times of the failed actions reach a second preset time, determining to finish the training round; and when the total times of executing the actions reach a third preset time, determining to finish the training round.

Optionally, the second network status information includes: throughput and latency; the first preset condition is as follows: the throughput is 90% -110% of the bandwidth, and the delay is less than or equal to 0.7 multiplied by the overtime threshold; the second preset condition is as follows: the throughput is 50% -70% of the bandwidth and the delay is greater than or equal to 0.7 x the timeout threshold.

Optionally, the method further comprises: initializing the size of a congestion window; wherein the step of initializing the size of the congestion window comprises: the bandwidth of the communication network is estimated, and an initial size of a congestion window is determined based on the estimated bandwidth.

Optionally, the step of predicting the bandwidth of the communication network comprises: determining the total quantity of ACK messages fed back by a receiving end aiming at N data packets sent by a sending end; and determining the bandwidth of the communication network according to the average value obtained by dividing the total number by N.

Optionally, when the sending end receives an ACK message fed back by the receiving end, the step of calculating the loss function of the congestion control model according to the action, the first network state information before the action is executed, the first network state information after the action is executed, and the preference includes: and when the sending end receives the ACK message fed back by the receiving end, calculating a loss function of the congestion control model according to the action, the first network state information before the action is executed, the first network state information after the action is executed, the reward function of the action and the preference.

Optionally, the reward function of the action is calculated based on the preference and third network state information after the action is performed; wherein the third network state information comprises at least one of: packet loss rate, throughput, and delay.

Optionally, the congestion control model is constructed based on a reinforcement learning algorithm; wherein the value function in the reinforcement learning algorithm is a value function regarding an action, first network state information, and a preference for network transmission performance.

Optionally, the congestion control model predicts an action, the probability with e is an action randomly selected from the action set, and the probability with 1-e is an optimal action obtained using a value function.

Optionally, the loss function of the congestion control model is based on: loss function L for making value function closer to maximum reward function^S(θ) And an auxiliary loss function L^T(theta) calculated.

Optionally, the loss function of the congestion control model is represented as: (1-. epsilon.) L^S(θ)+ε·L^T(θ); wherein epsilon is a trade-off index, the more later an action is predicted in a training round, the larger the value of epsilon is when calculating the loss function of the congestion control model aiming at the action, and epsilon is more than or equal to 0 and less than or equal to 1.

Optionally, the objective function of the congestion control model is a composite objective function with respect to: a reward function, a value function, first network state information after performing an action, first network state information before performing an action, a preference of the current training round for network transmission performance, and a preference for the best under the current network environment.

Optionally, the method further comprises: when determining to end the training return, determining whether to end the training process of the congestion control model; and when the training process of the congestion control model is determined not to be ended, returning to the step of executing the communication network environment used for initializing the current training round so as to enter the next training round.

According to a second aspect of the embodiments of the present disclosure, there is provided a congestion control method, including: acquiring current first network state information and the preference of current application on network transmission performance; inputting the acquired first network state information and the preference into a congestion control model to obtain a predicted action to be executed for adjusting the size of a congestion window; the predicted action is performed to reset the congestion window.

Optionally, the method further comprises: initializing the size of a congestion window; wherein the step of initializing the size of the congestion window comprises: bandwidth of the communication network is estimated, and an initial size of the congestion window is determined based on the estimated bandwidth.

Optionally, the step of predicting the bandwidth of the communication network comprises: determining the total quantity of ACK messages fed back by a receiving end aiming at the N sent data packets; and determining the bandwidth of the communication network according to the average value obtained by dividing the total number by N.

Optionally, the congestion control model is trained using a training method as described above.

According to a third aspect of embodiments of the present disclosure, there is provided a training apparatus of a congestion control model, including: an environment initialization unit configured to initialize a communication network environment used by a current training round; the prediction unit is configured to input the preference of the training round on the network transmission performance and the current first network state information into the congestion control model, and obtain predicted actions which need to be executed and are used for adjusting the size of the congestion window; a congestion window setting unit configured to perform a predicted action to reset a congestion window and control a transmitting end to transmit a data packet to a receiving end under the currently set congestion window; a loss function calculation unit configured to calculate a loss function of the congestion control model according to the action, the first network state information before the action is executed, the first network state information after the action is executed, and the preference when the sender receives an ACK message fed back by a receiver; a training unit configured to train the congestion control model by adjusting model parameters of the congestion control model according to the loss function; and the predicting unit inputs the preference of the training round to the network transmission performance and the current first network state information into the congestion control model when determining not to end the training round, and obtains predicted actions required to be executed for adjusting the size of the congestion window.

Optionally, the round end determining unit is configured to determine whether to end the training round according to a change of the second network state information.

Optionally, the round end determining unit is configured to determine that the action is a winning action when the second network state information after the action is performed satisfies a first preset condition; when the second network state information after the action is executed meets a second preset condition, determining the action as a failed action; when the continuous times of the winning actions reach a first preset time, determining to finish the training round; when the continuous times of the failed actions reach a second preset time, determining to finish the training round; and when the total times of executing the actions reach a third preset time, determining to finish the training round.

Optionally, the apparatus further comprises: a window initialization unit configured to initialize a size of a congestion window; wherein the window initialization unit is configured to predict a bandwidth of the communication network and determine an initial size of a congestion window based on the predicted bandwidth.

Optionally, the window initialization unit is configured to determine the total number of ACK messages fed back by the receiving end for the N data packets sent by the sending end; and determining the bandwidth of the communication network according to an average value obtained by dividing the total number by N.

Optionally, the loss function calculating unit is configured to calculate, when the sending end receives an ACK message fed back by the receiving end, a loss function of the congestion control model according to the action, the first network state information before the action is performed, the first network state information after the action is performed, a reward function of the action, and the preference.

Optionally, the loss function of the congestion control model is based on: loss function L for making value function closer to maximum reward function^S(theta) and an auxiliary loss function L^T(theta) calculated.

Optionally, the apparatus further comprises: a training end determining unit configured to determine whether to end a training process of the congestion control model when it is determined to end the present training round, wherein the environment initializing unit initializes a communication network environment used by the present training round to enter a next training round when it is determined not to end the training process of the congestion control model.

According to a fourth aspect of the embodiments of the present disclosure, there is provided a congestion control device including: the acquisition unit is configured to acquire current first network state information and the preference of the current application on the network transmission performance; a prediction unit configured to input the obtained first network state information and the preference into a congestion control model, and obtain a predicted action to be executed for adjusting the size of a congestion window; a congestion window setting unit configured to perform the predicted action to reset the congestion window.

Optionally, the apparatus further comprises: a window initialization unit configured to initialize a size of a congestion window; wherein the window initialization unit is configured to predict a bandwidth of the communication network and determine an initial size of the congestion window based on the predicted bandwidth.

Optionally, the window initialization unit is configured to determine the total number of ACK messages fed back by the receiving end for the N transmitted data packets; and determining the bandwidth of the communication network according to an average value obtained by dividing the total number by N.

Optionally, the congestion control model is trained using a training apparatus as described above.

According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a training method for a congestion control model as described above and/or a congestion control method as described above.

According to a sixth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by at least one processor, cause the at least one processor to perform a training method of a congestion control model as described above and/or a congestion control method as described above.

According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by at least one processor, implement the training method of the congestion control model as described above and/or the congestion control method as described above.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

according to the congestion control model disclosed by the exemplary embodiment of the disclosure, the optimal congestion control strategy can be selected according to the preference of the application to the transmission performance, so that the transmission performance requirements of different applications can be met without redesigning an objective function and a training model; according to the congestion control method of the exemplary embodiment of the disclosure, the congestion control method is suitable for congestion control of various types of applications, can realize the balance among throughput, time delay and packet loss, and can meet the transmission performance requirements of different types of applications;

the multi-objective reinforcement learning network of the congestion control model of the exemplary embodiment of the disclosure can optimize the whole preference space of congestion control, so that the trained model can generate an optimal strategy for any given preference, which fundamentally changes the design that the objective function or utility function of the existing protocol is fixed, and has great advantages in meeting different types of applications;

by setting different initial congestion window values aiming at different network bandwidths, network convergence can be effectively accelerated;

by the method for finishing the training round according to the change of the network environment, the training round can be finished at a proper time according to the change of the network bandwidth utilization rate, the time delay and the throughput, so that the training efficiency of the model is improved;

in addition, a method for interrupting a training round by win-lose-tie to improve the training quality of the congestion control model is provided, and the problem that pseudo-interruption of training occurs when multi-target reinforcement learning is applied to the congestion control problem is solved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

Fig. 1 is a schematic diagram illustrating an implementation scenario of a congestion control method and apparatus according to an exemplary embodiment of the present disclosure;

fig. 2 shows a flow chart of a training method of a congestion control model according to an exemplary embodiment of the present disclosure;

fig. 3 shows a flow chart of a congestion control method according to an exemplary embodiment of the present disclosure;

fig. 4 shows a schematic diagram of a training method of a congestion control model and a congestion control method according to an exemplary embodiment of the present disclosure;

fig. 5 shows a block diagram of a training apparatus of a congestion control model according to an exemplary embodiment of the present disclosure;

fig. 6 shows a block diagram of a congestion control device according to an exemplary embodiment of the present disclosure;

fig. 7 illustrates a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

In this case, the expression "at least one of the items" in the present disclosure means a case where three types of parallel expressions "any one of the items", "a combination of any plural ones of the items", and "the entirety of the items" are included. For example, "include at least one of a and B" includes the following three cases in parallel: (1) comprises A; (2) comprises B; (3) including a and B. For another example, "at least one of the first step and the second step is performed", which means that the following three cases are juxtaposed: (1) executing the step one; (2) executing the step two; (3) and executing the step one and the step two.

With the rapid development of mobile internet technology and the increasing number of terminals, many terminal devices are equipped with different types of applications, including, for example, delay-sensitive applications and throughput-sensitive applications, which may not benefit from higher bandwidth, requiring transmission delays as low as several milliseconds for delay-sensitive applications such as internet phones or cloud games; for throughput sensitive applications, such as video streaming or file sharing applications, high bandwidth is often required for better performance. Depending on the type of application and its requirements on network transmission performance (e.g., high throughput, low delay, and low packet loss), the congestion control method may need to follow a completely different policy. As shown in fig. 1, if an application is a file transfer type application sensitive to throughput, the throughput is critical, and the application has high requirements on the throughput of file transfer and relatively low requirements on delay; if an application is a delay sensitive real-time streaming application for which minimizing delay is crucial, it requires low transmission delay to reduce video stutter and relatively low packet loss requirements and relatively low throughput requirements. As the most important protocol of a network transport layer, a computer network congestion control protocol needs to provide high-quality network services for applications with different network performance requirements, that is, the transport layer not only needs to adapt to variable network conditions, but also needs to adapt to different application requirements, thereby meeting different requirements of users and improving the quality of experience of the users.

The method can be suitable for different types of applications by utilizing multi-target reinforcement learning and preference, and particularly can input the preference of the application to the network transmission performance and the current network state information into the congestion control model, and the congestion control model can provide the optimal action for adjusting the size of a congestion window according to the application performance requirement.

It should be understood that the congestion control method and/or congestion control apparatus according to the present disclosure may be applied not only to the above-described scenario but also to other suitable scenarios, and the present disclosure is not limited thereto.

Fig. 2 shows a flowchart of a training method of a congestion control model according to an exemplary embodiment of the present disclosure.

Referring to fig. 2, in step S101, a communication network environment used by the current training round is initialized.

The communication network environment is used for data transmission by the transmitting end and the receiving end used in the training round.

In addition, as an example, before the training round starts, a transmitting end and a receiving end used in the training round may be initialized, and the initialized transmitting end and the initialized receiving end handshake so that in the training round, the transmitting end is controlled to continuously transmit a data packet to the receiving end in the communication network environment, and the receiving end transmits a response ACK message to the transmitting end, and a current network state may be monitored by analyzing a determination message of the receiving end on the data packet.

In step S102, the preference of the training round for the network transmission performance and the current first network state information S are input into the congestion control model, and the predicted action a to be executed for adjusting the congestion window size is obtained.

As an example, the first network status information may comprise at least one of: size of congestion window, Delay, packet acknowledgement rate ACK _ rate, and transmission rate Sending _ rate. For example, the delay, the packet acknowledgement rate, and the transmission rate may be determined based on the ACK message (i.e., the acknowledgement message) fed back by the receiving end. For example, the sending end may be controlled to send a data packet to the receiving end under the currently set congestion window size, and when the sending end receives an ACK message fed back by the receiving end, the sending end may determine the current delay, packet acknowledgement rate, and sending rate based on the ACK message, so as to obtain the current first network state information.

As an example, the preference for network transmission performance may include a degree of preference for at least one of: throughput, packet loss rate, and latency.

As an example, a sample may be selected from the preference set as the preference of the present training round for the network transmission performance. For example, one sample in the set of preferences may be: preference for throughput is 0.7, preference for packet loss rate is 0.2, preference for latency is 0.1; another sample in the set of preferences may be: preference for throughput is 0.5, preference for packet loss rate is 0.1, preference for latency is 0.4. It should be appreciated that the set of preferences may be set according to various requirements of different types of applications for network transmission performance.

In step S103, the predicted action is performed to reset the congestion window, and the transmitting end is controlled to transmit the data packet to the receiving end under the currently set congestion window.

As an example, a predicted action may be performed on the current congestion window size to get the size of the congestion window that needs to be set and set.

As an example, the action predicted by the congestion control model may be one action in a set of actions, as an example, the set of actions may be { × 0.5, -50, -10.0, +0.0, +10.0, + 2.0, +50}, for example, × 0.5 represents the size of the congestion window to be set after the current congestion window size × 0.5; 10.0 represents that the current congestion window size is taken as the size of the congestion window needing to be set after being plus 10.0; 10.0 represents the size of the congestion window to be set after the current congestion window size is 10.0.

In step S104, when the sending end receives an ACK message fed back by the receiving end, a loss function of the congestion control model is calculated according to the action a, the first network state information before the action is executed (i.e., the first network state information S input to the congestion control model to predict the action a), the first network state information S' after the action is executed, and the preference of the training round for the network transmission performance.

Here, the first network state information S' after the action is performed is the first network state information determined based on the ACK message fed back by the receiving end with respect to the packet transmitted to the receiving end in step S103.

As an example, the loss function of the congestion control model may be calculated according to the action a, the first network state information S before the action is performed, the first network state information S' after the action is performed, the reward function r of the action, and the preference of the current training round for network transmission performance.

The Reward function for an action is a Reward function used to measure the revenue of the action predicted by the congestion control model. As an example, the reward function of the action may be calculated based on the preference of the present training round for the network transmission performance and the third network state information after performing the action. For example, the third network status information may include at least one of: packet loss rate, throughput, and delay. It should be understood that the third network state information after performing the action is the third network state information determined based on the ACK message fed back by the receiving end for the data packet transmitted to the receiving end in step S103.

As an example, the reward function for an action [ L (Throughput (t)), L (Loss _ rate (t)), L (delay (t)) may be determined based on the following triplets]For example, the reward function for an action may be a weighted sum of these three quantities, with the weight of each quantity being related to the preference of the present training round for network transmission performance. Wherein t represents time; l (x) denotes an activation function, e.g., L (x) ═ 10)^-x+ 1; throughput (t) may represent the result of normalizing throughput, e.g., by dividing throughput by bandwidth; delay (t) may represent the result of normalizing the delay, e.g., the delay may be normalized by dividing the delay by the timeout threshold timeout; loss _ rate (t) may represent the packet Loss rate itself, i.e., no normalization of the packet Loss rate is required.

As an example, in step S103, the actions predicted by the congestion control model are performed to reset the congestion window, control the sending end to send the data packet to the receiving end under the newly set congestion window, and wait for the acknowledgement messages ACKs returned by the receiving end. After receiving the ACK, the reward function r of Step of the training round and the observed current first network state information S' (i.e. the first network state information after performing the action) can be obtained by calculating rtt, comparing the number of packets with the number of acknowledgement messages, etc. in Step S104. As an example, the state S before this step can be taken, thisOne-step action a, and the reward r obtained after the action is executed and the new state S' transferred to are saved in a replay cache in a vector form

In (1). Accordingly, as an example, the playback buffer may be initialized before the beginning of each training round

In addition, it should be understood that the first network state information after the action is executed in this step can be used as the first network state information before the action is executed in the next step.

As an example, the congestion control model may be constructed based on a reinforcement learning algorithm DQN. As an example, when the preference for the network transmission performance includes a preference level for a plurality of network transmission performances, the congestion control model is a multi-objective model. As an example, the congestion control model may be a multi-objective reinforcement learning model.

As an example, the value function (Q-function) in the reinforcement learning algorithm may be a value function with respect to actions, first network state information, and preferences for network transmission performance.

As an example, the congestion control model may sample an action as a predicted action a using an e-greedily policy_tAn action can be sampled using equation (1), specifically, the sampled action has a probability of e being a randomly selected action from the action set A, a probability of 1-e being the optimal action obtained using the Q function,

wherein A represents action set, omega represents preference of training round to network transmission performance, theta represents parameter of Q function, and s_tRepresenting the current first network state information.

As an example, the congestion control model may be a multi-objective model, in order to model congestionThe plug control problem is mathematically represented assuming that there are multiple objectives, each of which can be expressed in the form of an objective function, to achieve a different objective function m_i(O) maximization of the ensemble:

s.t.g_i(O)≤0,i＝1,…,a_g

wherein m is_i(O) represents an objective function of the ith target, i is 1, …, m_f,g_i(O) represents a constraint function of the congestion control problem.

As an example, the objective function of the congestion control model may be a composite objective function with respect to: a reward function, a value function, first network state information after performing an action, first network state information before performing an action, a preference of the current training round for network transmission performance, and a preference for the best under the current network environment.

As an example, the objective function of the congestion control model may be a complex objective function TQ (s, a, ω), which may be expressed as:

wherein the content of the first and second substances,

r () represents a reward function, γ represents a weight coefficient, Q () represents a value function, s 'represents first network state information after an action is performed, s represents first network state information before the action is performed, a represents an action, ω represents a preference of the present training round for network transmission performance, ω' represents an optimal preference under the current network environment,

representing a set of actions and omega a set of preferences.

As an example, the loss function of the congestion control model may be based on: loss function L for making value function closer to maximum reward function^S(theta) and an auxiliary loss function L^T(theta) calculated.

Here, the auxiliary loss function is proposed in consideration of the fact that a large number of discrete solutions are included in the optimal boundary, which causes the curve of the loss function to become unsmooth.

As an example, the loss function of the congestion control model may be expressed as: (1-. epsilon.) L^S(θ)+ε·L^T(θ); wherein epsilon is a trade-off index, the more later an action is predicted in a training round, the larger the value of epsilon is when calculating the loss function of the congestion control model aiming at the action, and epsilon is more than or equal to 0 and less than or equal to 1.

In other words, in each training round, ε has an initial value of 0, and gradually increases from 0 to 1 as the number of steps increases, so that the loss function goes from L^SIn the direction of (theta) L^T(θ) migration.

As an example, the loss function L^S(θ) can be expressed as:

as an example, the auxiliary loss function L^T(θ) can be expressed as:

wherein the content of the first and second substances,

r denotes the reward function, gamma denotes the weight coefficient, theta denotes the model parameter, theta_kRepresenting parameters of the K-th model, Q () representing a value function, s 'representing first network state information after the action is executed, s representing first network state information before the action is executed, a representing the action, omega representing the preference of the training round to the network transmission performance, and omega' representing the optimal preference under the current network environment.

In step S105, the congestion control model is trained by adjusting model parameters of the congestion control model according to the loss function.

As an example, the parameter θ of the Q-function of the congestion control model may be adjusted according to the loss function.

As an example, the parameter θ of the Q function may be randomly graded down using equation (3) to update the Q function of the model, wherein,

the amount of gradient of the parameter theta is represented,

in step S106, it is determined whether to end the present training round, wherein when it is determined that the present training round is not ended, execution returns to step S102.

Further, as an example, the training method of the congestion control model according to an exemplary embodiment of the present disclosure may further include: when determining to end the training return, determining whether to end the training process of the congestion control model; when it is determined that the training process of the congestion control model is not ended, execution returns to step S101 to enter the next training round, that is, prepare the next training round. When the training process of the congestion control model is determined to be finished, stopping training the congestion control model, namely, finishing the training of the congestion control model. For example, whether to end the training process of the congestion control model may be determined according to the predicted effect or the total training duration of the congestion control model, and the like. Further, it should be understood that the initial communication network environment for different training rounds may be the same or different, and the preferences of different training rounds for network transmission performance may be the same or different.

As an example, whether to end the training round may be determined according to a change of the second network state information.

As an example, the second network status information may be after the action is performedWhen the first preset condition is met, determining the action as a winning action; when the second network state information after the action is executed meets a second preset condition, determining the action as a failed action; continuous times Win of winning action_NumWhen the first preset times are reached, determining to finish the training round; continuous number of times of action when failure_NumWhen reaching the second preset times, determining to finish the training round; and when the total times of executing the actions reach a third preset time, determining to finish the training round. For example, the first preset number M may be set to 50, the second preset number L may be set to 50, and the third preset number X may be set to 200.

As an example, the second network state information may include: throughput and latency.

As an example, the first preset condition may be: the throughput is 90% -110% of the bandwidth, and the delay is less than or equal to 0.7 multiplied by the overtime threshold; the second preset condition may be: the throughput is 50% -70% of the bandwidth and the delay is greater than or equal to 0.7 x the timeout threshold.

The method is characterized in that the method comprises the steps of determining the end point of a training round according to the number of steps or fixed time, determining the end point of the training round according to the number of steps or the fixed time, and determining the end point of the training round according to the end point of the training round. The present disclosure proposes a method that can end a training round in the most appropriate way. A training round is a complete training process of the reinforcement learning algorithm, in the process, a series of actions are sequentially selected at each moment according to the network state and the congestion control strategy, and the length of the training round has great influence on whether the optimal model can be learned. As an example, in each training round, Win_NumAnd Lose_NumAre all 0, Win is determined every time a predicted action is determined to be a winning action_NumIs +1, and will Lose_NumIs reset to 0 each time a prediction is madeDetermining the action as a failed action, then Lose is determined_NumAnd Win is equal to +1_NumReset to 0; if the number of successive wins Win of the training round_NumIf the number of successive failures is less than or equal to M, the training round is stopped_Num≧ L, the training round is stopped in failure, and if the current training round has taken X steps, the training round is ended in a tie case, in other words Win before the current training round has taken X steps_NumLess than M, Lose_NumAlso, L is not reached.

In addition, the present disclosure considers that an initial congestion window (init-cwnd, which is the size of a congestion window set when a transmitting end starts transmitting data) has a significant influence on a model convergence speed, but in the related art, the init-cwnd is generally set to a fixed value, and thus, there is a problem that rapid convergence cannot be achieved due to different link capacities in different network scenarios. According to the method, the problem that rapid convergence of the congestion control method cannot be achieved under different network scenes due to large difference of different links is considered, an initial congestion window can be designed according to the link bandwidth dynamic state, and therefore the convergence speed of the congestion control method is improved.

As an example, the training method of the congestion control model according to an exemplary embodiment of the present disclosure may further include: initializing the size of a congestion window; the step of initializing the size of the congestion window may include: the bandwidth of the communication network is estimated, and an initial size of a congestion window is determined based on the estimated bandwidth.

As an example, the step of predicting the bandwidth of the communication network may comprise: determining the total number Num of ACK messages fed back by a receiving end aiming at N data packets sent by a sending end_ack(ii) a According to the total number Num_ackAverage Num obtained by dividing by N_aveDetermining a bandwidth bw of the communication network_i。

As an example, sum Num can be found from a predefined Bandwidth combination Bandwidth by equation (4)_aveCorresponding predicted network bandwidth bw_i：

Wherein the content of the first and second substances,

representing a set of characteristic functions with respect to the receiving rate defined in equation (5), (c)_j-1,c_j) Which represents a set of reception rates for the signal,

as an example, the bandwidth bw may be estimated based on equation (6)_iDetermining an initial size W of a congestion window_init-cwndWhere b is a coefficient, for example, b may be set to 2.5 based on experimental data to achieve better fitting rate and learning effect,

W_init-cwnd＝b*bw_i (6)

as an example, the size of the congestion window may be initialized before starting the present training round to use W when the transmitting end transmits a data packet for the first time in the present training round_init-cwnd. As another example, the size of the congestion window may be initialized based on the network state information of the first N steps of the present training round.

It should be appreciated that if the communication network environment is the same for multiple training rounds, the training rounds may share the same W_init-cwnd。

Fig. 3 shows a flowchart of a congestion control method according to an exemplary embodiment of the present disclosure.

Referring to fig. 3, in step S201, current first network state information and a preference of a current application for network transmission performance are acquired.

Here, the application is an application that performs data transmission using the congestion control method.

In step S202, the obtained first network status information and the preference are input to a congestion control model, and a predicted action to be performed for adjusting the size of the congestion window is obtained.

In step S203, the predicted action is performed to reset the congestion window. It should be understood that the congestion control method according to the exemplary embodiment of the present disclosure may be repeatedly performed to adjust the congestion window size in real time according to the network status.

As an example, the first network status information may comprise at least one of: size of congestion window, delay, packet acknowledgement rate, and sending rate. For example, the delay, the packet acknowledgement rate, and the transmission rate may be determined based on the ACK message fed back by the receiving end.

As an example, the congestion control model may be constructed based on a reinforcement learning algorithm; wherein the value function in the reinforcement learning algorithm may be a value function regarding the action, the first network state information, and a preference for network transmission performance.

As an example, the congestion control model may be trained using the training method described in the above exemplary embodiment.

As an example, the congestion control method according to an exemplary embodiment of the present disclosure may further include: initializing the size of a congestion window; the step of initializing the size of the congestion window may include: bandwidth of the communication network is estimated, and an initial size of the congestion window is determined based on the estimated bandwidth. As an example, the size of the congestion window may be initialized when the application starts data transmission with the receiving end.

As an example, the step of predicting the bandwidth of the communication network may comprise: determining the total quantity of ACK messages fed back by a receiving end aiming at the N sent data packets; and determining the bandwidth of the communication network according to the average value obtained by dividing the total number by N.

The specific processing in the congestion control method according to the exemplary embodiment of the present disclosure has been described in detail in the embodiment of the training method of the congestion control model described above, and will not be elaborated here.

Fig. 4 illustrates a training method of a congestion control model and a schematic diagram of a congestion control method according to an exemplary embodiment of the present disclosure.

As shown in fig. 4, when training the congestion control model and starting a training loop, the sending end may send data to the receiving end based on the initial congestion window size set by the bandwidth, so as to improve the convergence speed of the network; training a congestion control model using the multi-target reinforcement learning DQN network by using training data generated by network environment interaction; in addition, a training round interruption algorithm can be adopted to solve the problem of false interruption of the training round. The congestion control method is realized by using the trained congestion control model, and an optimal congestion control strategy can be generated according to any specified preference, so that the requirements of different types of applications can be met.

According to the training method of the congestion control model, the performance of different network indexes can be improved according to different preference settings without resetting a target function or a reward function, so that the transmission performance requirements of different types of applications can be met, and the training time and the training cost of the congestion control model are reduced;

in addition, in order to improve the convergence of the model and solve the problem that the convergence speed of the current congestion control model is relatively low, a method for dynamically initializing a congestion window is further provided according to the exemplary embodiment of the disclosure, and different sizes of the initialized congestion window are set for different network bandwidths, so that the convergence speed of the congestion control model in different network environments can be improved;

in addition, in order to solve the problem of pseudo training interruption when multi-target reinforcement learning is applied to the congestion control problem, a win-loss-tie interruption round algorithm is further provided according to the exemplary embodiment of the disclosure to improve the training quality of the congestion control model, so that the multi-target reinforcement learning can be applied to the congestion control problem, and the training efficiency of the algorithm is improved;

in addition, the congestion control method according to the exemplary embodiment of the present disclosure has superior experimental performance. The method realizes the balance between high throughput and low time delay, and correspondingly shows excellent congestion control capability in different network environments of 12Mbps and 50 Mbps. For different cellular network scenarios, by setting different preferences, the congestion control method according to the exemplary embodiment of the present disclosure can meet transmission performance requirements of different types of applications, and achieve optimal trade-off between different performance indexes, that is, can meet performance requirements of different types of applications in a dynamic network scenario by setting different preferences.

Fig. 5 illustrates a block diagram of a training apparatus of a congestion control model according to an exemplary embodiment of the present disclosure.

As shown in fig. 5, the training apparatus 10 of the congestion control model according to the exemplary embodiment of the present disclosure includes: an environment initialization unit 101, a prediction unit 102, a congestion window setting unit 103, a loss function calculation unit 104, a training unit 105, and an end-of-round determination unit 106.

In particular, the environment initialization unit 101 is configured to initialize the communication network environment used by the current training round.

The prediction unit 102 is configured to input the preference of the present training round for the network transmission performance and the current first network state information into the congestion control model, resulting in predicted actions to be performed for adjusting the congestion window size.

The congestion window setting unit 103 is configured to perform a predicted action to reset the congestion window and control the transmitting end to transmit the data packet to the receiving end under the currently set congestion window.

The loss function calculation unit 104 is configured to, when the sender receives an ACK message fed back by the receiver, calculate a loss function of the congestion control model according to the action, the first network state information before the action is executed, the first network state information after the action is executed, and the preference.

The training unit 105 is configured to train the congestion control model by adjusting model parameters of the congestion control model according to the loss function.

The round end determination unit 106 is configured to determine whether to end the present training round, wherein when determining not to end the present training round, the prediction unit 102 inputs the preference of the present training round for the network transmission performance and the current first network state information to the congestion control model, resulting in predicted actions to be performed for adjusting the congestion window size.

As an example, the first network status information may comprise at least one of: the size of the congestion window, delay, packet acknowledgement rate, and sending rate; wherein the delay, the packet acknowledgement rate, and the transmission rate are determined based on the ACK message fed back by the receiving end.

As an example, the round end determination unit 106 may be configured to determine whether to end the present training round according to a variation of the second network state information.

As an example, the round end determination unit 106 may be configured to determine that the action is a winning action when the second network state information after performing the action satisfies a first preset condition; when the second network state information after the action is executed meets a second preset condition, determining the action as a failed action; when the continuous times of the winning actions reach a first preset time, determining to finish the training round; when the continuous times of the failed actions reach a second preset time, determining to finish the training round; and when the total times of executing the actions reach a third preset time, determining to finish the training round.

As an example, the second network state information may include: throughput and latency; the first preset condition is as follows: the throughput is 90% -110% of the bandwidth, and the delay is less than or equal to 0.7 multiplied by the overtime threshold; the second preset condition is as follows: the throughput is 50% -70% of the bandwidth and the delay is greater than or equal to 0.7 x the timeout threshold.

As an example, the training apparatus 10 of the congestion control model may further include: a window initializing unit (not shown) configured to initialize a size of the congestion window; wherein the window initialization unit is configured to predict a bandwidth of the communication network and determine an initial size of a congestion window based on the predicted bandwidth.

As an example, the window initialization unit may be configured to determine a total number of ACK messages fed back by the receiving end for N data packets sent by the sending end; and determining the bandwidth of the communication network according to an average value obtained by dividing the total number by N.

As an example, the loss function calculation unit 104 may be configured to calculate the loss function of the congestion control model according to the action, the first network state information before the action is performed, the first network state information after the action is performed, the reward function of the action, and the preference when the sender receives an ACK message fed back by the receiver.

As an example, the reward function for the action may be calculated based on the preference and third network state information after performing the action; wherein the third network state information comprises at least one of: packet loss rate, throughput, and delay.

As an example, the congestion control model may be constructed based on a reinforcement learning algorithm; wherein the value function in the reinforcement learning algorithm is a value function regarding an action, first network state information, and a preference for network transmission performance.

As an example, the action predicted by the congestion control model, the probability of having ∈ is one action randomly selected from the action set, and the probability of having 1 ∈ is the optimal action obtained using a value function.

As an example, the loss function of the congestion control model may be expressed as: (1-. epsilon.) L^S(θ)+ε·L^T(θ); where ε is a trade-off index, one training roundThe more recent predicted action is, the greater the value of epsilon is when calculating the loss function of the congestion control model for the action, and epsilon is greater than or equal to 0 and less than or equal to 1.

As an example, the loss function L^S(θ) can be expressed as:

and/or

Auxiliary loss function L^T(θ) is expressed as:

wherein the content of the first and second substances,

As an example, the objective function of the congestion control model may be a complex objective function TQ (s, a, ω), which is expressed as:

wherein the content of the first and second substances,

r () represents the reward function, γRepresents a weight coefficient, Q () represents a value function, s 'represents first network state information after an action is performed, s represents first network state information before the action is performed, a represents an action, ω represents a preference of the present training round for network transmission performance, ω' represents an optimal preference under the current network environment,

representing a set of actions and omega a set of preferences.

As an example, the training apparatus 10 of the congestion control model according to an exemplary embodiment of the present disclosure may further include: a training end determination unit (not shown) configured to determine whether to end the training process of the congestion control model when determining to end the present training round, wherein the environment initialization unit 101 initializes the communication network environment used by the current training round to enter the next training round when determining not to end the training process of the congestion control model.

Fig. 6 shows a block diagram of a congestion control apparatus according to an exemplary embodiment of the present disclosure.

As shown in fig. 6, the congestion control apparatus 20 according to the exemplary embodiment of the present disclosure includes: an acquisition unit 201, a prediction unit 202, and a congestion window setting unit 203.

Specifically, the obtaining unit 201 is configured to obtain current first network state information and a preference of a current application for network transmission performance.

The prediction unit 202 is configured to input the obtained first network status information and the preference to the congestion control model, resulting in a predicted action to be performed for adjusting the congestion window size.

The congestion window setting unit 203 is configured to perform a predicted action to reset the congestion window.

As an example, the congestion control device 20 may further include: a window initializing unit (not shown) configured to initialize a size of the congestion window; wherein the window initialization unit is configured to predict a bandwidth of the communication network and determine an initial size of the congestion window based on the predicted bandwidth.

As an example, the window initialization unit may be configured to determine a total number of ACK messages fed back by the receiving end for the transmitted N data packets; and determining the bandwidth of the communication network according to an average value obtained by dividing the total number by N.

As an example, the congestion control model may be trained using the training apparatus 10 of the exemplary embodiment described above.

With regard to the apparatus in the above-described embodiment, the specific manner in which the respective units perform operations has been described in detail in the embodiment related to the method, and will not be elaborated upon here.

Further, it should be understood that the respective units in the training apparatus 10 and the congestion control apparatus 20 of the congestion control model according to the exemplary embodiments of the present disclosure may be implemented as hardware components and/or software components. The individual units may be implemented, for example, using Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs), depending on the processing performed by the individual units as defined by the skilled person.

Fig. 7 illustrates a block diagram of an electronic device according to an exemplary embodiment of the present disclosure. Referring to fig. 7, the electronic device 30 includes: at least one memory 301 and at least one processor 302, the at least one memory 301 having stored therein a set of computer-executable instructions that, when executed by the at least one processor 302, perform a method of training a congestion control model and/or a method of congestion control as described in the above exemplary embodiments.

By way of example, the electronic device 30 may be a PC computer, tablet device, personal digital assistant, smart phone, or other device capable of executing the set of instructions described above. Here, the electronic device 30 need not be a single electronic device, but can be any collection of devices or circuits that can execute the above instructions (or sets of instructions) individually or in combination. The electronic device 30 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces with local or remote (e.g., via wireless transmission).

In the electronic device 30, the processor 302 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processor 302 may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.

The processor 302 may execute instructions or code stored in the memory 301, wherein the memory 301 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.

The memory 301 may be integrated with the processor 302, for example, by having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 301 may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 301 and the processor 302 may be operatively coupled or may communicate with each other, e.g., through I/O ports, network connections, etc., such that the processor 302 is able to read files stored in the memory.

In addition, the electronic device 30 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 30 may be connected to each other via a bus and/or a network.

According to an exemplary embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions, which, when executed by at least one processor, cause the at least one processor to perform the training method of the congestion control model and/or the congestion control method as described in the above exemplary embodiments. Examples of the computer-readable storage medium herein include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD + R, CD-RW, CD + RW, DVD-ROM, DVD-R, DVD + R, DVD-RW, DVD + RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or compact disc memory, Hard Disk Drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card or a extreme digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a magnetic tape, a magneto-optical data storage device, a hard disk, a magnetic tape, a magnetic data storage device, a magnetic tape, a magnetic data storage device, a magnetic tape, a magnetic data storage device, a magnetic tape, a magnetic data storage device, a magnetic tape, a magnetic data storage device, A solid state disk, and any other device configured to store and provide a computer program and any associated data, data files, and data structures to a processor or computer in a non-transitory manner such that the processor or computer can execute the computer program. The computer program in the computer-readable storage medium described above can be run in an environment deployed in a computer apparatus, such as a client, a host, a proxy device, a server, and the like, and further, in one example, the computer program and any associated data, data files, and data structures are distributed across a networked computer system such that the computer program and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by one or more processors or computers.

According to an exemplary embodiment of the present disclosure, a computer program product may also be provided, in which instructions are executable by at least one processor to perform a method of training a congestion control model and/or a method of congestion control as described in the above exemplary embodiments.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for training a congestion control model, comprising:

initializing a communication network environment used by the current training round;

inputting the preference of the training round on the network transmission performance and the current first network state information into a congestion control model to obtain predicted actions to be executed for adjusting the size of a congestion window;

executing the predicted action to reset the congestion window and controlling the sending end to send the data packet to the receiving end under the currently set congestion window;

when a sending end receives an ACK message fed back by a receiving end, calculating a loss function of the congestion control model according to the action, first network state information before the action is executed, first network state information after the action is executed and the preference;

and training the congestion control model by adjusting the model parameters of the congestion control model according to the loss function, and determining whether to finish the training round, wherein when the training round is not finished, the steps of inputting the preference of the training round on the network transmission performance and the current first network state information into the congestion control model are returned to be executed, so that the predicted action which needs to be executed and is used for adjusting the size of the congestion window are obtained.

2. The method of claim 1, wherein the first network state information comprises at least one of: the size of the congestion window, delay, packet acknowledgement rate, and sending rate;

wherein the delay, the packet acknowledgement rate, and the transmission rate are determined based on the ACK message fed back by the receiving end.

3. The method of claim 1, wherein the preference for network transmission performance comprises a degree of preference for at least one of: throughput, packet loss rate, and latency.

4. The method of claim 1, wherein the step of determining whether to end the training round comprises:

and determining whether to finish the training round according to the change condition of the second network state information.

5. A congestion control method, comprising:

acquiring current first network state information and the preference of current application on network transmission performance;

inputting the acquired first network state information and the preference into a congestion control model to obtain a predicted action to be executed for adjusting the size of a congestion window;

the predicted action is performed to reset the congestion window.

6. Training equipment of a congestion control model, characterized by comprising:

an environment initialization unit configured to initialize a communication network environment used by a current training round;

the prediction unit is configured to input the preference of the training round on the network transmission performance and the current first network state information into the congestion control model, and obtain predicted actions which need to be executed and are used for adjusting the size of the congestion window;

a congestion window setting unit configured to perform a predicted action to reset a congestion window and control a transmitting end to transmit a data packet to a receiving end under the currently set congestion window;

a loss function calculation unit configured to calculate a loss function of the congestion control model according to the action, the first network state information before the action is executed, the first network state information after the action is executed, and the preference when the sender receives an ACK message fed back by a receiver;

a training unit configured to train the congestion control model by adjusting model parameters of the congestion control model according to the loss function;

and the predicting unit inputs the preference of the training round to the network transmission performance and the current first network state information into the congestion control model when determining not to end the training round, and obtains predicted actions required to be executed for adjusting the size of the congestion window.

7. A congestion control apparatus, characterized by comprising:

the acquisition unit is configured to acquire current first network state information and the preference of the current application on the network transmission performance;

a prediction unit configured to input the obtained first network state information and the preference into a congestion control model, and obtain a predicted action to be executed for adjusting the size of a congestion window;

a congestion window setting unit configured to perform the predicted action to reset the congestion window.

8. An electronic device, comprising:

at least one processor;

at least one memory storing computer-executable instructions,

wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a training method of a congestion control model according to any one of claims 1 to 4 and/or a congestion control method according to claim 5.

9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform a method of training a congestion control model according to any one of claims 1 to 4 and/or a method of congestion control according to claim 5.

10. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by at least one processor, implement a training method of a congestion control model according to any of claims 1 to 4 and/or a congestion control method according to claim 5.