CN115865806A

CN115865806A - Congestion control method and device, electronic equipment and storage medium

Info

Publication number: CN115865806A
Application number: CN202211487825.1A
Authority: CN
Inventors: 王玲; 吕磊; 程诚; 程博锋
Original assignee: New H3C Technologies Co Ltd
Current assignee: New H3C Technologies Co Ltd
Priority date: 2022-11-25
Filing date: 2022-11-25
Publication date: 2023-03-28

Abstract

The embodiments of the application provide a congestion control method, a congestion control device, electronic equipment and a storage medium, wherein the method comprises the following steps: when a preset acquisition period is reached, acquiring network state data of the current acquisition period as first network state data; calculating the reward value of the current acquisition period based on the first network state data to serve as a first reward value; inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy; and adjusting the current congestion window according to the target adjustment strategy. Across different network environments, there is no need to distinguish whether a congestion signal is caused by network congestion: the reward value can be calculated from the first network state data, and the congestion window is then adjusted accordingly. Therefore, the congestion control method can be applied to complex network environments, and the effectiveness of congestion control can be improved.

Description

Congestion control method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of computer network technologies, and in particular, to a congestion control method and apparatus, an electronic device, and a storage medium.

Background

With the development of network technology, the current network environment is more and more complex, and factors influencing the network transmission efficiency are numerous. In order to avoid network congestion and ensure network stability and efficient data transmission, the data sending rate of the data sending end can be adjusted in a congestion control mode.

For example, in the related art, a heuristic congestion control method uses the packet loss rate and the delay as congestion signals and dynamically controls the transmission rate or the congestion window to avoid network congestion. However, as the network environment becomes more complex, such a method cannot effectively distinguish whether a congestion signal is actually caused by network congestion, and it is difficult to adapt to the current complex network environment, resulting in low effectiveness of congestion control.

Disclosure of Invention

An object of the embodiments of the present application is to provide a congestion control method and apparatus, an electronic device, and a storage medium, so as to be applicable to a complex network environment and improve the effectiveness of congestion control. The specific technical scheme is as follows:

according to a first aspect of embodiments of the present application, there is provided a congestion control method, the method including:

when a preset acquisition period is reached, acquiring network state data of the current acquisition period as first network state data;

the first network state data comprises a sending rate and a receiving rate of the current acquisition period; the sending rate of the current acquisition period is determined based on the sending rate acquired at each specified time in the current acquisition period; the receiving rate of the current acquisition period is determined based on the receiving rate acquired at each specified time in the current acquisition period; the specified time comprises a first time when an ACK message is received; the receiving rate at each first time represents: the rate at which data is received between the time at which the last ACK message was received before the data packet corresponding to the first time was sent, and the first time; the data packet corresponding to the first time represents the data packet acknowledged by the ACK message received at the first time; the sending rate at each first time represents: the rate at which data is sent between the sending time of the data packet acknowledged by the last ACK message received before the data packet corresponding to the first time was sent, and the sending time of the data packet corresponding to the first time;

calculating the reward value of the current acquisition period based on the first network state data to serve as a first reward value;

inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy; the adjustment strategy prediction network model is obtained by training based on a reinforcement learning algorithm;

and adjusting the current congestion window according to the target adjustment strategy.

Optionally, the first reward value is positively correlated with the receiving rate of the current acquisition period, and is negatively correlated with the rate difference of the current acquisition period; the rate difference of the current acquisition period represents the difference between the sending rate of the current acquisition period and the receiving rate of the current acquisition period.

Optionally, the first network status data further includes at least one of:

the minimum round-trip time of the current acquisition period, which represents: the minimum value of the round-trip times collected at each specified time in the current acquisition period;

average round trip time for the current acquisition period, representing: average value of round trip time collected at each appointed time in current collection period;

the average time delay of the current acquisition period represents: average value of time delay collected at each appointed moment in the current collection period;

the average congestion window size for the current acquisition period, represents: the average value of the sizes of the congestion windows collected at each specified moment in the current collection period;

the average in-flight data size for the current acquisition period represents: average value of the size of the in-flight data acquired at each designated moment in the current acquisition period; the in-flight data size collected at a given time represents: the size of the data packet which has been sent at the specified time and has not received the corresponding ACK message;

the size of the data sent in the current acquisition period;

the size of a data packet responded by the ACK message received in the current acquisition period;

the size of the data packet lost in the current acquisition period;

the number of explicit congestion signals in the ACK messages received in the current acquisition period.
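The per-period statistics listed above can be sketched as a simple reduction over the samples taken at each specified time. This is only an illustration: the `Sample` layout, the field names, and the units are assumptions, not something the application defines.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Sample:
    """One measurement taken at a specified time (hypothetical layout)."""
    rtt: float       # round-trip time, in seconds
    delay: float     # time delay, in seconds
    cwnd: int        # congestion window size, in bytes
    in_flight: int   # bytes sent but not yet acknowledged


def summarize_period(samples: List[Sample]) -> Dict[str, float]:
    """Reduce the samples of one acquisition period to per-period statistics."""
    if not samples:
        raise ValueError("an acquisition period needs at least one sample")
    return {
        "min_rtt": min(s.rtt for s in samples),
        "avg_rtt": mean(s.rtt for s in samples),
        "avg_delay": mean(s.delay for s in samples),
        "avg_cwnd": mean(s.cwnd for s in samples),
        "avg_in_flight": mean(s.in_flight for s in samples),
    }
```

The byte counters in the list (data sent, data acknowledged, data lost) and the explicit congestion signal count would be plain running sums over the period and are omitted here for brevity.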

Optionally, the specified time further includes a second time when a packet loss event is detected;

the round-trip time collected at each second time represents: the round-trip time collected when the last ACK message before the second time was received;

the time delay collected at each second time represents: the time delay collected when the last ACK message before the second time was received;

the sending rate acquired at each second time represents: the sending rate acquired when the last ACK message before the second time was received;

the receiving rate acquired at each second time represents: the receiving rate acquired when the last ACK message before the second time was received.

Optionally, the first reward value is negatively correlated with the average time delay of the current acquisition period.

Optionally, the calculating, based on the first network state data, a reward value of the current acquisition period as a first reward value includes:

judging whether the average time delay of the current acquisition period is smaller than a first threshold value or not;

if yes, determining the first reward value of the current acquisition period as the receiving rate of the current acquisition period;

if not, calculating the reward value of the current acquisition period as a first reward value based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average time delay of the current acquisition period and the minimum round-trip time of the current acquisition period.

Optionally, the first threshold is positively correlated to the minimum round trip time of the current acquisition period.

Optionally, the first threshold is calculated based on a first formula;

the first formula is:

S = ε × MinRtt + ρ

wherein S represents the first threshold, MinRtt represents the minimum round-trip time of the current acquisition period, ε represents a first preset parameter, and ρ represents a second preset parameter;

the calculating, based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average time delay of the current acquisition period, and the minimum round-trip time of the current acquisition period, a reward value of the current acquisition period as a first reward value includes:

calculating a reward value of the current acquisition period according to a second formula and based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average time delay of the current acquisition period and the minimum round-trip time of the current acquisition period, wherein the reward value is used as the first reward value; wherein the second formula is:

wherein reward represents the first reward value, AR represents the receiving rate of the current acquisition period, D represents the average time delay of the current acquisition period, MinRtt represents the minimum round-trip time of the current acquisition period, SR represents the sending rate of the current acquisition period, and δ represents a third preset parameter.
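The threshold test described above can be sketched as follows. The first formula is taken from the text; the second formula is not reproduced in this text, so the sketch does not guess at it and instead accepts it as a caller-supplied function. The function and parameter names are illustrative assumptions.

```python
from typing import Callable

# Hypothetical signature for the second formula: (SR, AR, D, MinRtt) -> reward.
SecondFormula = Callable[[float, float, float, float], float]


def first_threshold(min_rtt: float, epsilon: float, rho: float) -> float:
    """First formula: S = epsilon * MinRtt + rho."""
    return epsilon * min_rtt + rho


def first_reward(send_rate: float, recv_rate: float, avg_delay: float,
                 min_rtt: float, epsilon: float, rho: float,
                 second_formula: SecondFormula) -> float:
    """Return the receiving rate while the delay is below the threshold,
    otherwise defer to the supplied second formula."""
    if avg_delay < first_threshold(min_rtt, epsilon, rho):
        return recv_rate
    return second_formula(send_rate, recv_rate, avg_delay, min_rtt)
```

Because the threshold grows with MinRtt, a path with a long base round-trip time tolerates a proportionally larger average delay before the reward switches away from the plain receiving rate.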

Optionally, the sending rate of the current acquisition period is the sending rate acquired at the last designated time in the current acquisition period; and the receiving rate of the current acquisition period is the receiving rate acquired at the last appointed moment in the current acquisition period.

Optionally, the target adjustment policy includes more than two first specified adjustment multiples greater than 1, more than two second specified adjustment multiples reciprocal to the more than two first specified adjustment multiples, and a probability corresponding to each specified adjustment multiple;

the adjusting the current congestion window according to the target adjustment strategy includes:

and adjusting the current congestion window according to the specified adjustment multiple with the maximum probability.
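A minimal sketch of this selection step is shown below, assuming the model output is available as a list of probabilities aligned with a list of adjustment multiples. The concrete multiples (2.0 and 1.25 with their reciprocals) are illustrative values, not values given in the application.

```python
from typing import List, Sequence


def build_multiples(first_multiples: Sequence[float]) -> List[float]:
    """Pair each multiple greater than 1 with its reciprocal."""
    if any(m <= 1.0 for m in first_multiples):
        raise ValueError("first specified adjustment multiples must exceed 1")
    return list(first_multiples) + [1.0 / m for m in first_multiples]


def adjust_cwnd(cwnd: float, multiples: Sequence[float],
                probabilities: Sequence[float]) -> float:
    """Scale the congestion window by the multiple with the highest probability."""
    if len(multiples) != len(probabilities) or not multiples:
        raise ValueError("need one probability per adjustment multiple")
    best = max(range(len(multiples)), key=lambda i: probabilities[i])
    return cwnd * multiples[best]


# Illustrative action set: grow by 2.0 or 1.25, shrink by 0.5 or 0.8.
MULTIPLES = build_multiples([2.0, 1.25])
```

Using reciprocal pairs means any increase can be undone exactly by the matching decrease, so the window can return to a previous size in one step.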

Optionally, the method further includes:

and when a preset adjustment period is reached, sending data packets at a rate lower than the current receiving rate for a first duration.

Optionally, before the first network state data and the first reward value are input to a pre-trained adjustment policy prediction network model to obtain a target adjustment policy, the method further includes:

acquiring network state data of a preset number of historical periods before the current acquisition period as second network state data;

the inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy comprises:

and inputting the first network state data, the second network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy.
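One way to assemble such an input is a fixed-length buffer of past period states. The sketch below is an assumption about layout only: the class name, the flat feature vector, and the zero padding used before enough history exists are not specified in the application.

```python
from collections import deque
from typing import Deque, List, Sequence


class StateHistory:
    """Keeps the states of the last `history_len` acquisition periods."""

    def __init__(self, history_len: int, state_dim: int) -> None:
        self.history_len = history_len
        self.state_dim = state_dim
        self._past: Deque[List[float]] = deque(maxlen=history_len)

    def build_input(self, current_state: Sequence[float],
                    reward: float) -> List[float]:
        """Concatenate history (second network state data), the current state
        (first network state data) and the reward, then record the state."""
        if len(current_state) != self.state_dim:
            raise ValueError("unexpected state dimension")
        missing = self.history_len - len(self._past)
        features = [0.0] * (missing * self.state_dim)  # assumed zero padding
        for state in self._past:
            features.extend(state)
        features.extend(current_state)
        features.append(reward)
        self._past.append(list(current_state))
        return features
```

The returned vector always has `(history_len + 1) * state_dim + 1` entries, which keeps the model input size constant from the first period onward.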

Optionally, the training process of the adjustment strategy prediction network model includes the following steps:

acquiring network state data of a sample period as sample network state data;

wherein the sample network state data comprises: the sending rate and the receiving rate of the sample period; the sending rate of the sample period is determined based on the sending rate acquired at each specified time in the sample period; the receiving rate of the sample period is determined based on the receiving rate acquired at each specified time in the sample period; the specified time comprises a first time when an ACK message is received; the receiving rate at each first time represents: the rate at which data is received between the time at which the last ACK message was received before the data packet corresponding to the first time was sent, and the first time; the data packet corresponding to the first time represents the data packet acknowledged by the ACK message received at the first time; the sending rate at each first time represents: the rate at which data is sent between the sending time of the data packet acknowledged by the last ACK message received before the data packet corresponding to the first time was sent, and the sending time of the data packet corresponding to the first time;

calculating a second reward value for a sample period based on the sample network state data; wherein the second reward value is positively correlated with the receiving rate of the sample period and negatively correlated with the rate difference of the sample period; the rate difference represents a difference between a sending rate of a sample period and a receiving rate of the sample period;

inputting the sample network state data and the second reward value into an adjustment strategy prediction network model of an initial parameter to obtain a sample adjustment strategy and a strategy score value;

adjusting the current congestion window according to the sample adjustment strategy;

and adjusting the model parameters of the adjustment strategy prediction network model with initial parameters based on the strategy score value and the second reward value until a convergence condition is reached.

According to a second aspect of embodiments of the present application, there is provided a congestion control apparatus, the apparatus including:

the first network state acquisition module is used for acquiring the network state data of the current acquisition period as first network state data when the preset acquisition period is reached;

the first network state data comprises a sending rate and a receiving rate of the current acquisition period; the sending rate of the current acquisition period is determined based on the sending rate acquired at each specified time in the current acquisition period; the receiving rate of the current acquisition period is determined based on the receiving rate acquired at each specified time in the current acquisition period; the specified time comprises a first time when an ACK message is received; the receiving rate at each first time represents: the rate at which data is received between the time at which the last ACK message was received before the data packet corresponding to the first time was sent, and the first time; the data packet corresponding to the first time represents the data packet acknowledged by the ACK message received at the first time; the sending rate at each first time represents: the rate at which data is sent between the sending time of the data packet acknowledged by the last ACK message received before the data packet corresponding to the first time was sent, and the sending time of the data packet corresponding to the first time;

the first reward value calculation module is used for calculating a reward value of the current acquisition period based on the first network state data to serve as a first reward value;

the target adjustment strategy acquisition module is used for inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy; the adjustment strategy prediction network model is obtained by training based on a reinforcement learning algorithm;

and the congestion window adjusting module is used for adjusting the current congestion window according to the target adjusting strategy.

Optionally, the first network status data further includes at least one of:

the minimum round-trip time of the current acquisition period, which represents: the minimum value of the round-trip times collected at each specified time in the current acquisition period;

the average congestion window size for the current acquisition period, represents: the average value of the sizes of congestion windows collected at each specified moment in the current collection period;

the average in-flight data size for the current acquisition period, representing: average value of the size of the in-flight data acquired at each designated moment in the current acquisition period; the in-flight data size collected at a given time represents: the size of the data packet which has been sent at the specified time and has not received the corresponding ACK message;

the size of the data sent in the current acquisition period;

the size of the data packet lost in the current acquisition period;

the reception rate acquired at each second instant represents: the reception rate collected for the last received ACK message before the second time.

Optionally, the first reward value is negatively correlated with the average time delay of the current collection period.

Optionally, the first reward value calculation module includes:

the first threshold judgment submodule is used for judging whether the average time delay of the current acquisition period is smaller than a first threshold; if so, the first reward value calculation submodule is triggered, and if not, the second reward value calculation submodule is triggered;

the first reward value calculation submodule is used for determining the first reward value of the current acquisition period as the receiving rate of the current acquisition period;

and the second reward value calculation submodule is used for calculating the reward value of the current acquisition cycle as the first reward value based on the sending rate of the current acquisition cycle, the receiving rate of the current acquisition cycle, the average time delay of the current acquisition cycle and the minimum round trip time of the current acquisition cycle.

Optionally, the first threshold is calculated based on a first formula;

the first formula is:

S = ε × MinRtt + ρ

the second reward value calculation submodule is used for calculating a reward value of the current acquisition period as the first reward value according to a second formula based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average time delay of the current acquisition period and the minimum round-trip time of the current acquisition period; wherein the second formula is:

Optionally, the sending rate of the current acquisition period is the sending rate acquired at the last specified time in the current acquisition period; and the receiving rate of the current acquisition period is the receiving rate acquired at the last appointed moment in the current acquisition period.

the congestion window adjusting module is specifically configured to adjust the current congestion window according to the specified adjustment multiple with the largest corresponding probability.

Optionally, the apparatus further comprises:

and the sending rate adjusting module is used for sending data packets at a rate lower than the current receiving rate for a first duration when a preset adjustment period is reached.

Optionally, the apparatus further comprises:

the second network state acquisition module is used for acquiring network state data of a preset number of historical periods before the current acquisition period as second network state data before inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy;

the target adjustment strategy acquisition module is configured to input the first network state data, the second network state data, and the first reward value to a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy.

Optionally, the apparatus further comprises:

the training module is used for acquiring network state data of a sample period as sample network state data;

wherein the sample network state data comprises: the sending rate and the receiving rate of the sample period; the sending rate of the sample period is determined based on the sending rate acquired at each specified time in the sample period; the receiving rate of the sample period is determined based on the receiving rate acquired at each specified time in the sample period; the specified time comprises a first time when an ACK message is received; the receiving rate at each first time represents: the rate at which data is received between the time at which the last ACK message was received before the data packet corresponding to the first time was sent, and the first time; the data packet corresponding to the first time represents the data packet acknowledged by the ACK message received at the first time; the sending rate at each first time represents: the rate at which data is sent between the sending time of the data packet acknowledged by the last ACK message received before the data packet corresponding to the first time was sent, and the sending time of the data packet corresponding to the first time;

According to a third aspect of the embodiments of the present application, there is provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;

a memory for storing a computer program;

a processor for implementing any of the above method steps when executing a program stored in the memory.

According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium having stored therein a computer program which, when executed by a processor, performs any of the method steps described above.

According to the congestion control method provided by the embodiment of the application, aiming at different network environments, the sending rate and the receiving rate of the current acquisition period can be obtained, and the sending rate and the receiving rate can effectively reflect the current network state, so that congestion control can be effectively realized based on the obtained information. The congestion window can be adjusted by acquiring the sending rate and the receiving rate of the current acquisition period without distinguishing whether the congestion signal is caused by network congestion. Therefore, the congestion control method can be applied to complex network environments, and the effectiveness of congestion control can be improved. In addition, because the reward value is related to the sending rate and the receiving rate of the current acquisition period, the receiving rate and the sending rate are equal to each other as much as possible through a reward mechanism. Therefore, the congestion window is adjusted according to the target adjustment strategy obtained based on the reward value, a larger bandwidth utilization rate can be obtained, and the problem of oversending caused by the fact that the sending rate is larger than the receiving rate can be avoided.

Of course, it is not necessary for any product or method of the present application to achieve all of the above-described advantages at the same time.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other embodiments from these drawings.

Fig. 1 is a flowchart of a congestion control method according to an embodiment of the present application;

fig. 2 is a schematic diagram of calculating a sending rate and a receiving rate at a given time according to an embodiment of the present application;

fig. 3 is a schematic diagram of a congestion control method according to an embodiment of the present application;

fig. 4 is a training flowchart of an adjustment strategy prediction network model in the congestion control method according to the embodiment of the present application;

fig. 5 is a schematic diagram of generating a tuning strategy based on a tuning strategy prediction network model according to an embodiment of the present application;

fig. 6 is a schematic structural diagram of a congestion control apparatus according to an embodiment of the present disclosure;

fig. 7 is a schematic structural diagram of a congestion control apparatus according to an embodiment of the present application;

fig. 8 is a schematic structural diagram of a congestion control apparatus according to an embodiment of the present application;

fig. 9 is a schematic structural diagram of a congestion control apparatus according to an embodiment of the present application;

fig. 10 is a block diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the description herein are intended to be within the scope of the present disclosure.

In order to be applicable to a complex network environment and improve the effectiveness of congestion control, an embodiment of the present application provides a congestion control method, and referring to fig. 1, fig. 1 is a flowchart of the congestion control method provided in the embodiment of the present application, where the method may include the following steps:

step S101: and when the preset acquisition period is reached, acquiring the network state data of the current acquisition period as first network state data.

The first network state data comprises a sending rate and a receiving rate of the current acquisition period; the sending rate of the current acquisition period is determined based on the sending rate acquired at each specified time in the current acquisition period; the receiving rate of the current acquisition period is determined based on the receiving rate acquired at each specified time in the current acquisition period; the specified time includes a first time when an ACK (acknowledgement) message is received; the receiving rate at each first time represents: the rate at which data is received between the time at which the last ACK message was received before the data packet corresponding to the first time was sent, and the first time; the data packet corresponding to the first time represents the data packet acknowledged by the ACK message received at the first time; the sending rate at each first time represents: the rate at which data is sent between the sending time of the data packet acknowledged by the last ACK message received before the data packet corresponding to the first time was sent, and the sending time of the data packet corresponding to the first time.

Step S102: based on the first network state data, a reward value for the current acquisition period is calculated as a first reward value.

Step S103: and inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy.

The adjustment strategy prediction network model is obtained by training based on a reinforcement learning algorithm.

Step S104: and adjusting the current congestion window according to the target adjustment strategy.

By applying the congestion control method provided by the embodiment of the application, the sending rate and the receiving rate of the current acquisition period can be obtained for different network environments, and the sending rate and the receiving rate can effectively reflect the current network state, so that the congestion control can be effectively realized based on the obtained information. The congestion window can be adjusted by acquiring the sending rate and the receiving rate of the current acquisition period without distinguishing whether the congestion signal is caused by network congestion. Therefore, the congestion control method can be applied to complex network environments, and the effectiveness of congestion control can be improved. In addition, because the reward value is related to the sending rate and the receiving rate of the current acquisition period, the receiving rate and the sending rate are equal to each other as much as possible through a reward mechanism. Therefore, the congestion window is adjusted according to the target adjustment strategy obtained based on the reward value, so that a larger bandwidth utilization rate can be obtained, and the problem of oversending caused by the fact that the sending rate is greater than the receiving rate can be avoided.

For step S101, the period duration of the preset acquisition period may be a fixed duration. Alternatively, the period duration of the preset acquisition period may also be determined based on the RTT (Round Trip Time) of the current data packet. For example, at the end of one acquisition cycle, the RTT of the data packet at the current time may be determined, and the cycle duration of the next acquisition cycle may be determined as the RTT.

In the present application, an event occurring at a specified time may be referred to as a specified event. For example, the specified events include the event that an ACK message is received. In each acquisition period, when a specified event occurs, the current network state data may be acquired, including the current sending rate and receiving rate, that is, the sending rate and receiving rate acquired at the current specified time. The network state data of one acquisition period is determined based on the network state data acquired at each specified time within that acquisition period.

The sending rate of the current acquisition period is determined based on the sending rate acquired at each designated moment in the current acquisition period. For example, the sending rate of the current acquisition period may be the sending rate acquired at the last designated time in the current acquisition period, may also be the sending rates acquired at other designated times in the current acquisition period, and may also be an average value of the sending rates acquired at each designated time in the current acquisition period.

The receiving rate of the current acquisition period is determined based on the receiving rate acquired at each specified time in the current acquisition period. For example, the receiving rate of the current acquisition period may be the receiving rate acquired at the last specified time in the current acquisition period, may also be the receiving rate acquired at another specified time in the current acquisition period, and may also be the average value of the receiving rates acquired at each specified time in the current acquisition period.
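The two most concrete options mentioned above (take the last sample, or average all samples) can be sketched with one helper that applies equally to sending and receiving rates. The function name and the `mode` parameter are illustrative assumptions.

```python
from statistics import mean
from typing import Sequence


def period_rate(samples: Sequence[float], mode: str = "last") -> float:
    """Derive the per-period rate from the rates sampled at each specified time.

    mode="last" uses the rate from the last specified time in the period;
    mode="mean" averages the rates from all specified times in the period.
    """
    if not samples:
        raise ValueError("no rate samples in this acquisition period")
    if mode == "last":
        return samples[-1]
    if mode == "mean":
        return mean(samples)
    raise ValueError(f"unknown mode: {mode!r}")
```

Taking the last sample reacts fastest to a change in the path, while the mean smooths out short bursts within the period.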

In network communication, a sending end may send a data packet to a receiving end, and then, after receiving the data packet, the receiving end may send an ACK message to the sending end in response to the received data packet. The sending end receives the ACK message, and can determine the data packet to which the ACK message responds. A data packet corresponding to a first time indicates a data packet to which the ACK message received at the first time responds, that is, an ACK message received at the first time is used to respond to the data packet corresponding to the first time.

The sending time of the data packet can be recorded when the sending end sends the data packet, and the corresponding receiving time can be recorded when the sending end receives the ACK message sent by the receiving end. The reception rate at the first time can represent the rate at which data is received during one historical period of time (which may be referred to as a first period of time) prior to the first time, and correspondingly, the transmission rate at the first time can represent the rate at which data is transmitted during another historical period of time (which may be referred to as a second period of time). The ACK message received during the first time period corresponds to the data packet sent during the second time period.

Referring to fig. 2, fig. 2 is a schematic diagram for calculating a sending rate and a receiving rate at a specified time according to an embodiment of the present application. The abscissa represents time, the black rectangular boxes represent transmitted data packets, and the white rectangular boxes represent received ACK messages. The time when the data packet A is transmitted is T1, the time when the ACK message A1 is received is T2, the time when the data packet B is transmitted is T3, and the time when the ACK message B1 is received is T4. The data packet to which the ACK message B1 responds is the data packet B, the last ACK message received before the data packet B is sent is the ACK message A1, and the data packet to which the ACK message A1 responds is the data packet A.

When the first time is time T4, it can be seen that the sending rate at time T4 is a ratio of the size of the data packet sent between time T3 and time T1 to the time interval between time T3 and time T1. The receiving rate at the time T4 is a ratio of a size of a data packet responded by the ACK message received between the time T4 and the time T2 to a time interval between the time T4 and the time T2.

The transmission rate at time T4 can be calculated based on equation (1):

send_rate＝send/(T3-T1) (1)

wherein send_rate represents the transmission rate at time T4, send represents the size of the data packets transmitted between time T1 and time T3, and T3-T1 represents the time interval between time T1 and time T3.

The reception rate at time T4 can be calculated based on equation (2):

acked_rate＝acked/(T4-T2) (2)

wherein acked_rate represents the reception rate at time T4, acked represents the size of the data packets to which the ACK messages received between time T2 and time T4 respond, and T4-T2 represents the time interval between time T2 and time T4.
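The two rate definitions illustrated by fig. 2 can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the class name AckSample, its field names and the positive-interval check are assumptions introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class AckSample:
    """Timestamps and sizes around one ACK arrival (all names are illustrative)."""
    t1: float           # send time of the packet answered by the previous ACK (packet A)
    t2: float           # arrival time of the last ACK received before B was sent (A1)
    t3: float           # send time of the packet answered by this ACK (packet B)
    t4: float           # arrival time of this ACK (B1)
    sent_bytes: float   # size of the data sent between T1 and T3
    acked_bytes: float  # size of the data acknowledged by ACKs received between T2 and T4


def _rate(size: float, interval: float) -> float:
    # Assumption: a non-positive interval is treated as invalid input.
    if interval <= 0:
        raise ValueError("time interval must be positive")
    return size / interval


def send_rate(s: AckSample) -> float:
    """Sending rate at T4: size sent between T1 and T3 divided by (T3 - T1)."""
    return _rate(s.sent_bytes, s.t3 - s.t1)


def acked_rate(s: AckSample) -> float:
    """Receiving rate at T4: size acknowledged between T2 and T4 divided by (T4 - T2)."""
    return _rate(s.acked_bytes, s.t4 - s.t2)
```

For instance, with T1 = 0, T2 = 0.25, T3 = 0.5, T4 = 1.25 and 3000 bytes both sent and acknowledged, the sketch yields a sending rate of 6000 bytes per second and a receiving rate of 3000 bytes per second.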

For step S102, in one embodiment, the first reward value is positively correlated with the receiving rate of the current acquisition period and negatively correlated with the rate difference of the current acquisition period; the rate difference of the current acquisition period represents the difference between the sending rate of the current acquisition period and the receiving rate of the current acquisition period. Since the first reward value is positively correlated with the receiving rate of the current acquisition period, the higher the receiving rate of the current acquisition period is, the larger the first reward value is; in order to obtain a larger reward value, the target adjustment strategy therefore tends to increase the receiving rate. Since the first reward value is negatively correlated with the rate difference of the current acquisition period, the smaller the difference between the sending rate and the receiving rate of the current acquisition period is, the larger the first reward value is. Therefore, in order to obtain a larger reward value, the target adjustment strategy produced by the adjustment strategy prediction network model reduces the rate difference of the current acquisition period and makes the receiving rate as close as possible to the sending rate, so that a higher bandwidth utilization is obtained and the problem of over-sending is avoided.

For step S103 and step S104, the first network state data includes network state data representing the current network state, such as the sending rate of the current acquisition period and the receiving rate of the current acquisition period. The adjustment strategy prediction network model can perceive the network state of the current acquisition period based on the first network state data, and then obtain a target adjustment strategy in combination with the reward value. The target adjustment strategy comprises adjustment actions for adjusting the current congestion window and the probability corresponding to each adjustment action. For example, an adjustment action may indicate that the current congestion window is to be increased, such as by multiplying it by a preset multiple or by adding a fixed value. An adjustment action may also indicate that the current congestion window is to be decreased, such as by dividing it by a preset multiple or by subtracting a fixed value. Alternatively, an adjustment action may indicate that the size of the congestion window is to be kept unchanged. Further, the adjustment action with the highest probability can be selected according to the probability corresponding to each adjustment action, and the size of the current congestion window is adjusted accordingly; data packets are then sent according to the adjusted congestion window, so that congestion control is realized.

In one embodiment, the sending rate of the current acquisition period is the sending rate acquired at the last specified time in the current acquisition period, and the receiving rate of the current acquisition period is the receiving rate acquired at the last specified time in the current acquisition period. Using the sending rate and the receiving rate acquired at the last specified time in the current acquisition period as the sending rate and the receiving rate of the current acquisition period to adjust the congestion window can improve the effectiveness of the adjustment.

In one embodiment, the first network status data may further comprise at least one of:

The minimum round trip time of the current acquisition period, representing: the minimum value of the round trip times acquired at the specified times reached so far within the current acquisition period.

The average round trip time of the current acquisition period, representing: the average value of the round trip times acquired at each specified time within the current acquisition period.

The average time delay of the current acquisition period, representing: the average value of the time delays acquired at each specified time within the current acquisition period. The time delay of a specified time may represent the difference between the round trip time of that specified time and the minimum value of the round trip times acquired up to that specified time.

The average congestion window size of the current acquisition period, representing: the average value of the congestion window sizes acquired at each specified time within the current acquisition period.

The average in-flight data size of the current acquisition period, representing: the average value of the in-flight data sizes acquired at each specified time within the current acquisition period. The in-flight data size acquired at a specified time represents: the size of the data packets that have been sent by the specified time and for which the corresponding ACK messages have not yet been received.

The size of the data sent in the current acquisition period.

The size of the data packet to which the ACK message received in the current acquisition period is responded.

The size of the data packets lost during the current acquisition period.

In one example, a minimum round trip time may be recorded for each acquisition period, and the minimum round trip time may be updated during the acquisition period. During the acquisition period, when each specified time is reached, the round trip time of the specified time is acquired. If the round-trip time acquired at the specified moment is less than the minimum round-trip time of the current acquisition period recorded currently, updating the minimum round-trip time of the current acquisition period to the round-trip time acquired at the specified moment, otherwise, not updating the minimum round-trip time of the current acquisition period. That is, the minimum round trip time of the current acquisition period represents the minimum value among the round trip times of the respective specified times that have been acquired when each of the specified times is reached within the current acquisition period.
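The update rule for the minimum round trip time, together with the time delay derived from it, can be sketched as follows. This is a minimal sketch under stated assumptions: the class name RttTracker is hypothetical, and the minimum is assumed to include the sample acquired at the current specified time, so the first sample always has a time delay of zero.

```python
class RttTracker:
    """Tracks the minimum RTT and per-sample time delay within one acquisition period."""

    def __init__(self) -> None:
        self.min_rtt = None   # no round trip time recorded yet
        self.rtts = []        # round trip times acquired in this period
        self.delays = []      # time delays acquired in this period

    def on_sample(self, rtt: float) -> float:
        """Record one RTT sample and return its time delay (rtt - min_rtt)."""
        # Update the recorded minimum only if the new sample is smaller.
        if self.min_rtt is None or rtt < self.min_rtt:
            self.min_rtt = rtt
        delay = rtt - self.min_rtt
        self.rtts.append(rtt)
        self.delays.append(delay)
        return delay

    def average_rtt(self) -> float:
        return sum(self.rtts) / len(self.rtts) if self.rtts else 0.0

    def average_delay(self) -> float:
        return sum(self.delays) / len(self.delays) if self.delays else 0.0
```

With the samples 0.5, 0.25 and 0.75 (in seconds), the recorded minimum ends at 0.25 and the per-sample time delays are 0, 0 and 0.5.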

The sending time of the data packet can be recorded when the sending end sends the data packet, and the corresponding receiving time can be recorded when the sending end receives the ACK message corresponding to the data packet sent by the receiving end. Based on the reception time of the ACK message and the transmission time of the corresponding data packet, a round trip time may be calculated.

For each specified time in the current acquisition period, the size of the data packets sent so far and the size of the data packets acknowledged by the ACK messages received so far may be recorded at the specified time. Further, the in-flight data size at the specified time may be calculated based on the recorded data. For example, the difference between the size of the data packets sent so far and the size of the data packets acknowledged so far at the specified time may be calculated as the in-flight data size.

For each specified time in the current acquisition period, the size of the data packets sent so far may be recorded at the specified time. Furthermore, the size recorded at the last specified time of the acquisition period and the size recorded at the last specified time of the previous acquisition period may be obtained, and the difference between the two values is calculated to obtain the size of the data sent in the acquisition period.

For each specified time in the current acquisition period, the size of the data packets acknowledged by the ACK messages received so far may be recorded at the specified time. Further, the size recorded at the last specified time of the acquisition period and the size recorded at the last specified time of the previous acquisition period may be obtained, and the difference between the two values is calculated to obtain the size of the data packets to which the ACK messages received in the acquisition period respond.

Due to network congestion and other reasons, a situation that a data packet is lost may occur, and therefore, the size of the lost data packet can also reflect the network state in the acquisition period. For each specified time in the current acquisition cycle, the size of the currently lost data packet may be recorded at the specified time. Furthermore, the size of the data packet lost at the last designated time of the acquisition period and the size of the data packet lost at the last designated time of the previous acquisition period can be obtained, and then the difference value of the two obtained values is calculated to obtain the size of the data packet lost in the acquisition period.
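The bookkeeping described above, in which per-period sizes are obtained as differences between counters recorded at the last specified time of consecutive acquisition periods, can be sketched as follows. The class name Counters and the assumption that the counters are cumulative byte totals are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counters:
    """Cumulative totals recorded at one specified time (assumed to be in bytes)."""
    sent: int    # size of the data sent so far
    acked: int   # size of the data acknowledged so far
    lost: int    # size of the data detected as lost so far


def in_flight(c: Counters) -> int:
    """In-flight data size: sent size minus acknowledged size."""
    return c.sent - c.acked


def period_delta(prev_end: Counters, cur_end: Counters) -> Counters:
    """Sizes sent, acknowledged and lost within one acquisition period.

    prev_end: counters at the last specified time of the previous period.
    cur_end:  counters at the last specified time of the current period.
    """
    return Counters(
        sent=cur_end.sent - prev_end.sent,
        acked=cur_end.acked - prev_end.acked,
        lost=cur_end.lost - prev_end.lost,
    )
```

Keeping cumulative counters and differencing them at period boundaries avoids resetting state at every period and matches the way the three per-period sizes are described.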

When congestion occurs in the network, a router supporting ECN (Explicit Congestion Notification) may set a flag in the data packet, i.e., a congestion indication signal. After receiving a data packet carrying the congestion indication signal, the receiving end sends a corresponding ACK message carrying the congestion indication signal to the sending end. Therefore, the number of congestion indication signals in the ACK messages received in one acquisition period can also reflect the network state in the acquisition period.

The first network state data may include at least one item of data capable of describing the network state of the current acquisition period, and each item has a definite physical meaning. The first network state data therefore describes the network state more meaningfully, so that the network state of the current acquisition period can be better perceived and the network congestion situation in the current acquisition period can be clearly known. This improves the adaptability of the congestion control method provided by the embodiment of the present application to the network environment and achieves a better control effect.

In one embodiment, the specified time further includes a second time at which a packet loss event is detected.

The round trip time collected at each second instant represents: round trip time collected at the last received ACK message before the second time.

The time delay collected at each second moment represents: the time delay collected during the last received ACK message before the second time.

The transmission rate acquired at each second instant represents: the transmission rate collected at the time of the last received ACK message before the second time.

When a packet loss event occurs, the sending end does not receive the ACK message, and the round trip time, the time delay, the sending rate and the receiving rate need to be obtained based on the ACK message. Therefore, these network status data cannot be collected when a packet loss event is detected. In order to describe the network state in the event of packet loss, the network state data that cannot be acquired and corresponds to the packet loss event may be determined based on the relevant data acquired when the ACK message that is received last before the packet loss event is detected.

The minimum round trip time, the congestion window size, the in-flight data size and the size of the transmitted data are acquired at the second time in a manner similar to that used at the first time.

If the ACK message corresponding to a certain data packet is not received in a certain time, it is determined that a packet loss event occurs, and when the packet loss event occurs, network congestion is likely to occur. That is, the network state data at the time of the occurrence of the packet loss event can also effectively represent the state of the network, and therefore, congestion control can also be performed based on the network state data at the time of the occurrence of the packet loss event.
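One way to build a network state sample at a second time (packet loss event) is sketched below: the values that depend on an ACK message are carried over from the last ACK sample, while values that can be read locally are refreshed. The names EventSample and sample_on_loss, and the exact set of fields, are assumptions made for this illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EventSample:
    """Network state acquired at one specified time (field set is illustrative)."""
    rtt: float          # round trip time
    delay: float        # time delay relative to the minimum RTT
    send_rate: float    # sending rate
    acked_rate: float   # receiving rate
    cwnd: float         # congestion window size
    in_flight: float    # in-flight data size
    is_loss: bool = False


def sample_on_loss(last_ack: EventSample, cwnd: float, in_flight: float) -> EventSample:
    """Build the sample for a packet loss event.

    RTT, time delay, sending rate and receiving rate cannot be measured
    without an ACK, so they are copied from the last ACK sample; the
    congestion window and in-flight size are read at the loss event itself.
    """
    return replace(last_ack, cwnd=cwnd, in_flight=in_flight, is_loss=True)
```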

In one embodiment, the first reward value is negatively correlated with the average time delay of the current acquisition period. The first reward value decreases as the average time delay of the current acquisition period increases; that is, a larger average time delay is penalized through the first reward value. This discourages an increase in time delay, further avoids the problem of buffer bloat, and achieves a better congestion control effect.

In one implementation, the first reward value may be calculated according to equation (3):

reward＝α×acked_rate-β×(send_rate-acked_rate)-γ×delay (3)

wherein reward represents the first reward value, acked_rate is the receiving rate of the current acquisition period, send_rate is the sending rate of the current acquisition period, delay is the average time delay of the current acquisition period, and α, β, and γ represent preset coefficients. For example, α and β may be 1 or 2, and γ may be 0.05 or 0.04.
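Equation (3) can be written directly as a small function. The sketch below is illustrative only; the default coefficients are taken from the example values given in the text (α = β = 1, γ = 0.05), and the units of the rates and of the time delay are left to the caller.

```python
def reward(acked_rate: float, send_rate: float, delay: float,
           alpha: float = 1.0, beta: float = 1.0, gamma: float = 0.05) -> float:
    """Equation (3): alpha*acked_rate - beta*(send_rate - acked_rate) - gamma*delay."""
    rate_difference = send_rate - acked_rate
    return alpha * acked_rate - beta * rate_difference - gamma * delay
```

With a receiving rate of 10, a sending rate of 12 and an average time delay of 20, the default coefficients give 10 - 2 - 1 = 7. Raising the receiving rate raises the reward, while a larger rate difference or a larger time delay lowers it.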

In one embodiment, calculating a reward value for a current acquisition period as a first reward value based on the first network status data (S102) includes:

Step one: judging whether the average time delay of the current acquisition period is smaller than a first threshold; if yes, executing step two; if not, executing step three.

Step two: the reward value for the current acquisition period is determined as the reception rate for the current acquisition period.

Step three: and calculating the reward value of the current acquisition period based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average time delay of the current acquisition period and the minimum round-trip time of the current acquisition period.

If the average time delay of the current acquisition period is smaller than the first threshold, the reward value of the current acquisition period is determined as the receiving rate of the current acquisition period, so that a larger receiving rate of the current acquisition period yields a larger reward value. Through this reward mechanism, the receiving rate is brought as close as possible to the sending rate, so that a higher bandwidth utilization is obtained.

If the average time delay of the current acquisition period is not smaller than the first threshold, the reward value of the current acquisition period is calculated based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average time delay of the current acquisition period and the minimum round trip time of the current acquisition period. Because the reward value is related to both the sending rate and the receiving rate of the current acquisition period, the reward mechanism brings the receiving rate as close as possible to the sending rate, so that a higher bandwidth utilization is obtained.

Based on the above processing, if the first threshold is set to a smaller value, the average time delay of the current acquisition period is kept lower, thereby reducing data queuing; if the first threshold is set to a larger value, a larger reward value can still be obtained while the average time delay of the current acquisition period stays within a wider range, so that the amount of data sent can be increased and the ability to compete for buffer space is improved.

For example, the first threshold may be a fixed value set in advance.

Alternatively, the first threshold may be positively correlated with the minimum round trip time of the current acquisition period. Correspondingly, when the minimum round trip time of the current acquisition period is larger, the first threshold is also larger; a larger average time delay can then still be rewarded, so that the sending rate and the in-flight data size are increased, more data packets are kept in the queue, and the ability to compete for buffer space is improved. When the minimum round trip time of the current acquisition period is smaller, the first threshold is also smaller, so that the average time delay of the current acquisition period is kept lower and data queuing is reduced.

For example, the first threshold is calculated based on equation (4):

S＝εMinRtt+ρ (4)

wherein S represents the first threshold, MinRtt represents the minimum round trip time of the current acquisition period, ε represents a first preset parameter, and ρ represents a second preset parameter.
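Equation (4) and the branch between step two and step three can be sketched as follows. Because formula (5) is not reproduced in this text, the step-three calculation is deliberately left as a caller-supplied function; the parameter name step_three and the function names are assumptions made for this illustration.

```python
from typing import Callable


def first_threshold(min_rtt: float, epsilon: float, rho: float) -> float:
    """Equation (4): S = epsilon * MinRtt + rho."""
    return epsilon * min_rtt + rho


def first_reward(send_rate: float, acked_rate: float, avg_delay: float,
                 min_rtt: float, epsilon: float, rho: float,
                 step_three: Callable[[float, float, float, float], float]) -> float:
    """Steps one to three of the reward calculation.

    step_three stands in for formula (5), which is not given here; it
    receives (send_rate, acked_rate, avg_delay, min_rtt).
    """
    # Step one: compare the average time delay with the first threshold.
    if avg_delay < first_threshold(min_rtt, epsilon, rho):
        # Step two: the reward is the receiving rate itself.
        return acked_rate
    # Step three: delegate to the caller-supplied calculation.
    return step_three(send_rate, acked_rate, avg_delay, min_rtt)
```

Passing the step-three calculation in as a function keeps the threshold logic testable on its own and avoids guessing at the form of formula (5).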

Step three, comprising:

calculating the reward value of the current acquisition period according to formula (5) based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average time delay of the current acquisition period and the minimum round trip time of the current acquisition period, and taking the reward value as the first reward value:

When the values of ε and ρ are larger, that is, when the first threshold is set to a larger value, a larger average time delay can still be rewarded, so that the sending rate and the in-flight data size are increased, more data packets are kept in the queue, and the ability to compete for buffer space is improved. When the values of ε and ρ are smaller, that is, when the first threshold is set to a smaller value, the average time delay of the current acquisition period is kept lower, thereby reducing data queuing.

In the prior art, the reward value of congestion control based on reinforcement learning is calculated based on a linear combination of throughput, average delay and size of lost data packet in a period of time, as shown in formula (6):

reward＝α×T+β×D+γ×L (6)

wherein reward represents the reward value of the period of time, T represents the throughput of the period of time, D represents the average time delay of the period of time, L represents the size of the data packets lost in the period of time, and α, β, and γ represent preset coefficients. However, in the above method for calculating the reward value, there is only a simple linear relationship between the reward value and the throughput, the average time delay and the size of the lost data packets in the period of time. Therefore, this method cannot be adapted to all network environments through coefficient adjustment, and the network environments to which it is suited are relatively limited.

In the method for calculating the reward value provided by the embodiment of the application, the relationship between the reward value and the network state data is nonlinear, so that the calculated reward value can be suitable for different network environments by adjusting the preset parameters.

In one embodiment, the target adjustment strategy includes two or more first specified adjustment multiples greater than 1, two or more second specified adjustment multiples reciprocal to the two or more first specified adjustment multiples, and a probability corresponding to each of the specified adjustment multiples. Correspondingly, according to the target adjustment strategy, adjusting the current congestion window (S104), including:

Step S1041: adjusting the current congestion window according to the specified adjustment multiple with the maximum probability.

The specified adjustment multiples in the target adjustment strategy correspond to adjustment actions of different amplitudes for adjusting the size of the congestion window. For example, the first specified adjustment multiples may include 2.89, 1.25 and 1.05, which respectively represent adjusting the congestion window size to 2.89, 1.25 and 1.05 times the current congestion window; correspondingly, the second specified adjustment multiples include 1/2.89, 1/1.25 and 1/1.05, which respectively represent adjusting the congestion window size to 1/2.89, 1/1.25 and 1/1.05 of the current congestion window. Each specified adjustment multiple has a corresponding probability, and the congestion window is adjusted according to the specified adjustment multiple with the maximum corresponding probability.
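Step S1041 can be sketched as an argmax over the probabilities followed by a multiplication. The example multiples are the ones given in the text; the function name adjust_cwnd and the lower bound min_cwnd are assumptions added for this illustration and are not stated in the application.

```python
FIRST_MULTIPLES = (2.89, 1.25, 1.05)
# Second specified adjustment multiples are the reciprocals of the first ones.
MULTIPLES = FIRST_MULTIPLES + tuple(1.0 / m for m in FIRST_MULTIPLES)


def adjust_cwnd(cwnd: float, probabilities, multiples=MULTIPLES,
                min_cwnd: float = 1.0) -> float:
    """Scale cwnd by the specified adjustment multiple with the maximum probability."""
    if len(probabilities) != len(multiples):
        raise ValueError("one probability is required per adjustment multiple")
    # Index of the most probable adjustment action.
    best = max(range(len(multiples)), key=lambda i: probabilities[i])
    # Assumption: the window is never allowed to fall below min_cwnd.
    return max(min_cwnd, cwnd * multiples[best])
```

Since every action is a multiple other than 1 and no two multiples are equal, two different actions applied to the same window always produce different window sizes.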

In addition, in the prior art, the adjustment actions include: keeping the size of the congestion window unchanged, adjusting the size of the congestion window to 1/2 of the current congestion window, decreasing the size of the congestion window by 10, increasing the size of the congestion window by 10, and adjusting the size of the congestion window to 2 times the current congestion window. The determined adjustment action may be the same for two networks of different bandwidths, and the same adjustment action may have different effects on the two networks. Moreover, in this method, different adjustment actions may yield the same adjusted congestion window size. For example, if the size of the congestion window before adjustment is 10, the size after adding 10 to the current congestion window is 20, and the size after doubling the current congestion window is also 20. That is, even if different adjustment actions are determined based on different network states, the same adjustment result may be obtained.

In the embodiment provided by the present application, the size of the current congestion window is adjusted according to a specified adjustment multiple, so that the problem of different adjustment actions yielding the same adjustment result can be avoided. In addition, since no specified adjustment multiple is equal to 1, the size of the congestion window keeps changing, which causes the data sending rate to change, so that the influence of the rate change on the network state can be perceived and the effectiveness of congestion control is improved. Selecting a larger first specified adjustment multiple (e.g., 2.89) causes the data sending rate to rise rapidly; correspondingly, selecting the second specified adjustment multiple corresponding to it (e.g., 1/2.89) causes the data sending rate to fall rapidly, so that the link is drained quickly. By adjusting the size of the congestion window through a variety of specified adjustment multiples, changes in the network environment can be detected, and the congestion control method provided by the embodiment of the present application can respond quickly to changes in the network environment.

In one embodiment, the method may further comprise: when a preset adjustment period is reached, sending data packets at a rate smaller than the current receiving rate within a first duration.

The preset adjustment period may also be referred to as an RTT probing phase. For example, when the preset adjustment period is reached, data packets are sent within the first duration at a rate equal to a preset multiple of the current receiving rate, where the preset multiple is smaller than 1. For example, the duration of the preset adjustment period may be 10 s or 11 s, the first duration may be 190 ms or 200 ms, and the preset multiple may be 0.5 or 0.6. The preset adjustment period, the first duration and the preset multiple can be set according to the actual application and are not particularly limited.

The current receiving rate may be a receiving rate acquired when the ACK message is last received before the current time.

As the network state may change, the current actual minimum round trip time may differ considerably from the minimum round trip time recorded for the current acquisition period. Therefore, in order to make the recorded minimum round trip time more accurate, when the preset adjustment period is reached, data packets are sent at a rate smaller than the current receiving rate and this rate is maintained for the first duration, which reduces the in-flight data size and drains the link. The round trip time measured on the network then falls toward its actual minimum, and the currently recorded minimum round trip time becomes more accurate, so that the congestion control method provided by the embodiment of the present application is suitable for links with dynamic delay.

In addition, when the first time length is reached, data can continue to be transmitted according to the size of the congestion window before adjustment.
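The RTT probing phase can be sketched as a small scheduler that returns the rate to use at a given time. This is a simplified, rate-based illustration under several assumptions: the class name RttProbe is hypothetical, the reduced rate is fixed when the probe starts, the next probe is scheduled one period after the previous probe ends, and the normal sending behaviour is abstracted as a normal_rate argument rather than a congestion window.

```python
class RttProbe:
    """Periodically lowers the sending rate for a short duration to drain the link."""

    def __init__(self, period: float = 10.0, first_duration: float = 0.2,
                 multiple: float = 0.5) -> None:
        if not 0.0 < multiple < 1.0:
            raise ValueError("the preset multiple must be between 0 and 1")
        self.period = period                  # preset adjustment period, seconds
        self.first_duration = first_duration  # first duration, seconds
        self.multiple = multiple              # preset multiple (< 1)
        self.next_probe = period              # time at which the next probe starts
        self.probe_end = None                 # end time of the probe in progress
        self.probe_rate = 0.0                 # reduced rate used during the probe

    def rate(self, now: float, normal_rate: float, current_acked_rate: float) -> float:
        """Return the sending rate to use at time `now`."""
        if self.probe_end is None and now >= self.next_probe:
            # Start a probe: send below the current receiving rate.
            self.probe_end = now + self.first_duration
            self.probe_rate = self.multiple * current_acked_rate
        if self.probe_end is not None:
            if now < self.probe_end:
                return self.probe_rate
            # Probe finished: resume normal sending and schedule the next probe.
            self.probe_end = None
            self.next_probe = now + self.period
        return normal_rate
```

With the default values, a probe begins 10 s after start-up, holds half of the receiving rate observed at that moment for 200 ms, and then hands control back to the normal rate.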

In one embodiment, referring to fig. 3, fig. 3 is a schematic diagram of a congestion control method provided by an embodiment of the present application. The sending end exchanges data with the receiving end and implements asynchronous congestion control through an agent decision module and a data sending module. The agent decision module is an RL Agent (Reinforcement Learning Agent). The data sending module comprises an Event State Sampler and a Round-Trip State Sampler, wherein the Event State Sampler collects network state data when an ACK message is received or a packet loss event is detected. When the timing module determines that the preset acquisition period is reached, the Round-Trip State Sampler integrates the network state data acquired by the Event State Sampler at each specified time within the current acquisition period to obtain the first network state data of the current acquisition period.

The data sending module calculates a first reward value based on the first network state data and transmits the first network state data and the first reward value to the agent decision module. For example, the data sending module may further include a computation submodule to which the round trip status sampler may send first network status data, and accordingly, the computation submodule may compute a first reward value based on the first network status data and pass the first network status data and the first reward value to the agent decision module.

The agent decision module obtains a target adjustment strategy based on the first network state data and the first reward value, selects a specified adjustment multiple (i.e., an adjustment action) from the target adjustment strategy, and sends the adjustment action to the data sending module. The data sending module then adjusts the size of the congestion window according to the adjustment action. For example, the agent decision module may process the first network state data and the first reward value based on the adjustment strategy prediction network model shown in fig. 5 to obtain the target adjustment strategy.

When the RTT detection stage is reached, namely a preset adjustment period is reached, the data sending module sends a data packet according to a rate smaller than the current receiving rate within a first time length. For example, the data sending module may further include a sending adjustment submodule, that is, when the preset adjustment period is reached, the sending adjustment submodule may send the data packet at a rate smaller than the current receiving rate within the first duration.

Because the first reward value is calculated based on the sending rate, the receiving rate and the time delay, the reward calculation has a definite meaning and can effectively guide the agent decision module (i.e., the reinforcement learning agent) to converge and make good control decisions.

In one embodiment, before inputting the first network state data and the first reward value into the pre-trained adjustment strategy prediction network model to obtain the target adjustment strategy (S103), the method further includes: acquiring network state data of a preset number of historical periods before the current acquisition period as second network state data.

Correspondingly, inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy (S103), including:

Step S1031: inputting the first network state data, the second network state data and the first reward value into the pre-trained adjustment strategy prediction network model to obtain the target adjustment strategy.

The network state data of the preset number of historical periods can reflect the network state at the historical moment, and further, the network state data of the preset number of historical periods and the network state data of the current acquisition period are combined, so that the change condition of the network state in a certain time period can be reflected. Furthermore, based on the data, the adjustment strategy prediction network model can make more reasonable adjustment strategy selection, and an adjustment strategy more suitable for the current network state is determined, so that a better congestion control effect can be obtained.
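Combining the second network state data (a preset number of historical periods) with the first network state data and the first reward value can be sketched as a fixed-length window. The class name StateHistory, the flat-list input layout and the zero padding used before enough history exists are all assumptions made for this illustration; the application does not specify how the model input is assembled.

```python
from collections import deque


class StateHistory:
    """Keeps the network state data of the most recent historical periods."""

    def __init__(self, history_len: int, feature_dim: int) -> None:
        self.feature_dim = feature_dim
        # Assumption: missing history is padded with zeros.
        self._history = deque(
            ([0.0] * feature_dim for _ in range(history_len)),
            maxlen=history_len,
        )

    def model_input(self, current, reward: float):
        """Return [history..., current..., reward] and then store `current`."""
        current = list(current)
        if len(current) != self.feature_dim:
            raise ValueError("unexpected number of state features")
        flat_history = [x for period in self._history for x in period]
        vector = flat_history + current + [reward]
        # The current period becomes history for the next acquisition period.
        self._history.append(current)
        return vector
```

A bounded deque drops the oldest period automatically, so the model input keeps a constant length from the first acquisition period onward.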

Fig. 4 is a training flowchart of an adjustment policy prediction network model in the congestion control method according to the embodiment of the present application, and referring to fig. 4, a training process of the adjustment policy prediction network model includes the following steps:

Step S401: acquiring network state data of the sample period as sample network state data.

Wherein the sample network state data comprises: the sending rate and the receiving rate of the sample period; the sending rate of the sample period is determined based on the sending rate collected at each specified time in the sample period; the receiving rate of the sample period is determined based on the receiving rate collected at each specified time in the sample period; the specified time comprises a first time at which an ACK message is received; the receiving rate at each first time represents: the rate at which data is received between the time at which the last ACK message was received before the data packet corresponding to the first time was sent, and the first time; the data packet corresponding to the first time represents the data packet acknowledged by the ACK message received at the first time; the sending rate at each first time represents: the rate at which data is sent between the sending time of the data packet acknowledged by the last ACK message received before the data packet corresponding to the first time was sent, and the sending time of the data packet corresponding to the first time.
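The per-ACK sending rate and receiving rate defined above may be sketched with cumulative byte counters that are snapshotted when each data packet is sent. The names RateSampler, SentPacket, on_send and on_ack are hypothetical, and the sketch assumes in-order ACK messages, no packet loss, and one ACK message per data packet.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SentPacket:
    size: int
    send_time: float
    sent_before: int        # bytes sent before this packet
    prior_delivered: int    # bytes acknowledged when this packet was sent
    prior_ack_time: float   # time of the last ACK received before sending
    prior_send_time: float  # send time of the packet that ACK acknowledged
    prior_sent_before: int  # bytes sent before that acknowledged packet


class RateSampler:
    def __init__(self, start_time: float = 0.0):
        self.sent_bytes = 0
        self.delivered = 0
        self.last_ack_time = start_time
        self.last_acked_send_time = start_time
        self.last_acked_sent_before = 0
        self.in_flight: Dict[int, SentPacket] = {}

    def on_send(self, seq: int, size: int, now: float) -> None:
        # Snapshot the counters as they stand when the packet is sent.
        self.in_flight[seq] = SentPacket(
            size, now, self.sent_bytes, self.delivered, self.last_ack_time,
            self.last_acked_send_time, self.last_acked_sent_before)
        self.sent_bytes += size

    def on_ack(self, seq: int, now: float
               ) -> Tuple[Optional[float], Optional[float]]:
        """Return (sending rate, receiving rate) at this first time.

        A rate is None when its interval has zero length.
        """
        pkt = self.in_flight.pop(seq)
        self.delivered += pkt.size
        recv_interval = now - pkt.prior_ack_time
        send_interval = pkt.send_time - pkt.prior_send_time
        recv_rate = ((self.delivered - pkt.prior_delivered) / recv_interval
                     if recv_interval > 0 else None)
        send_rate = ((pkt.sent_before - pkt.prior_sent_before) / send_interval
                     if send_interval > 0 else None)
        # This ACK becomes the "last received ACK" for later packets.
        self.last_ack_time = now
        self.last_acked_send_time = pkt.send_time
        self.last_acked_sent_before = pkt.sent_before
        return send_rate, recv_rate
```

Rates are expressed in bytes per second when sizes are given in bytes and times in seconds.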

Step S402: based on the sample network state data, a second reward value for the sample period is calculated.

Wherein the second reward value is positively correlated with the receiving rate of the sample period and negatively correlated with the rate difference; the rate difference represents the difference between the transmission rate of the sample period and the reception rate of the sample period.

Step S403: inputting the sample network state data and the second reward value into the adjustment strategy prediction network model with initial parameters to obtain a sample adjustment strategy and a strategy score value.

Step S404: adjusting the current congestion window according to the sample adjustment strategy.

Step S405: adjusting the model parameters of the adjustment strategy prediction network model with initial parameters based on the strategy score value and the second reward value until a convergence condition is reached.
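The parameter adjustment of step S405 is not spelled out in detail here. One common actor-critic form, given purely as an assumption, samples an adjustment action from the strategy and combines a policy term, a value term and an entropy bonus into a single loss. The function names, the coefficients 0.5 and 0.01, and the one-step advantage (reward minus score, without bootstrapping) are all hypothetical choices of this sketch.

```python
import math
import random
from typing import Sequence


def sample_action(probs: Sequence[float], rng: random.Random) -> int:
    """Pick an adjustment action according to its probability (training)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def actor_critic_loss(probs: Sequence[float],
                      action: int,
                      value: float,
                      reward: float,
                      value_coef: float = 0.5,
                      entropy_coef: float = 0.01) -> float:
    """Loss for one sample period under the stated assumptions.

    probs  : probabilities of the adjustment actions (sample strategy)
    action : index of the action actually applied to the congestion window
    value  : strategy score value output by the model
    reward : second reward value of the sample period
    """
    advantage = reward - value
    policy_loss = -math.log(probs[action]) * advantage
    value_loss = advantage ** 2
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

In a full training loop, the gradient of this loss with respect to the model parameters would be applied repeatedly until the convergence condition is reached.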

The adjustment strategy prediction network model can be realized based on a deep reinforcement learning network model.

Referring to fig. 5, fig. 5 is a schematic diagram of generating an adjustment strategy based on the adjustment strategy prediction network model according to an embodiment of the present application. The network model contains two fully connected layers (fully connected layer 1 and fully connected layer 2), two activation layers (activation layer 1 and activation layer 2), and a target network layer. Each fully connected layer may contain 512 neuron nodes. The network state data is input into the network model to obtain the feature data output by activation layer 2. The feature data and the reward value are then input together into the target network layer to obtain a strategy and an evaluation value. The strategy includes adjustment actions for adjusting the congestion window and the probability corresponding to each adjustment action, and the evaluation value represents a score for the strategy. The target network layer may be an LSTM (Long Short-Term Memory) layer or a fully connected layer. During training, the congestion window is adjusted according to the probabilities corresponding to the adjustment actions in the obtained strategy, a loss value is calculated based on the strategy and the evaluation value, and the model parameters of the adjustment strategy prediction network model are adjusted based on the loss value. Correspondingly, in the deployment stage, the congestion window can be adjusted directly according to the strategy output by the target network layer.
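The forward pass of such a model may be sketched in plain Python as follows. The sketch assumes the fully connected variant of the target network layer, split into a policy head and a value head; the ReLU activation, the softmax over actions, the random initialization and the class name PolicyNet are assumptions, since the embodiment does not name them. The weights are untrained.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    weights, biases = layer
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weights, biases)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def softmax(x: Sequence[float]) -> List[float]:
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [v / total for v in exps]


class PolicyNet:
    def __init__(self, state_dim: int, num_actions: int,
                 hidden: int = 512, seed: int = 0):
        rng = random.Random(seed)
        self.fc1 = init_layer(state_dim, hidden, rng)  # fully connected layer 1
        self.fc2 = init_layer(hidden, hidden, rng)     # fully connected layer 2
        # Target network layer: takes the features plus the reward value.
        self.policy_head = init_layer(hidden + 1, num_actions, rng)
        self.value_head = init_layer(hidden + 1, 1, rng)

    def forward(self, state: Sequence[float],
                reward: float) -> Tuple[List[float], float]:
        features = relu(dense(state, self.fc1))     # activation layer 1
        features = relu(dense(features, self.fc2))  # activation layer 2
        joined = features + [reward]
        probs = softmax(dense(joined, self.policy_head))  # strategy
        value = dense(joined, self.value_head)[0]         # evaluation value
        return probs, value
```

Calling forward with a state vector and a reward value returns one probability per adjustment action together with a scalar score.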

With respect to the above method embodiment, an embodiment of the present application further provides a congestion control device, referring to fig. 6, where fig. 6 is a schematic structural diagram of the congestion control device provided in the embodiment of the present application, and the congestion control device may include:

the first network state acquisition module 601 is configured to acquire network state data of a current acquisition period as first network state data when a preset acquisition period is reached;

a first reward value calculation module 602, configured to calculate a reward value of a current acquisition period as a first reward value based on the first network state data;

a target adjustment policy obtaining module 603, configured to input the first network state data and the first reward value to a pre-trained adjustment policy prediction network model to obtain a target adjustment policy; the adjustment strategy prediction network model is obtained by training based on a reinforcement learning algorithm;

and a congestion window adjusting module 604, configured to adjust the current congestion window according to the target adjustment policy.

By applying the congestion control device provided by the embodiment of the application, the sending rate and the receiving rate of the current acquisition period can be obtained in different network environments. Because the sending rate and the receiving rate effectively reflect the current network state, congestion control can be effectively realized based on them, and the congestion window can be adjusted without distinguishing whether a congestion signal is caused by network congestion. Therefore, the congestion control device of the present application can be applied to complex network environments, and the effectiveness of congestion control can be improved. In addition, because the reward value is related to the sending rate and the receiving rate of the current acquisition period, the reward mechanism drives the receiving rate and the sending rate to be as close to equal as possible. Therefore, adjusting the congestion window according to the target adjustment strategy obtained based on the reward value can achieve a higher bandwidth utilization rate and avoid the over-sending problem caused by the sending rate exceeding the receiving rate.

In one embodiment, the first reward value is positively correlated with the receiving rate of the current acquisition period and negatively correlated with the rate difference of the current acquisition period; the rate difference for the current acquisition period represents the difference between the sending rate for the current acquisition period and the receiving rate for the current acquisition period.

In one embodiment, the first network status data further comprises at least one of:

the minimum round trip time of the current acquisition period, representing: the minimum of the round trip times collected at the specified times that have been reached in the current acquisition period;

the average time delay of the current acquisition period, representing: the average value of the time delays collected at each specified time in the current acquisition period; the time delay collected at a specified time represents the difference between the round trip time collected at that specified time and the minimum of the round trip times collected up to that specified time;

the average congestion window size for the current acquisition period, representing: the average value of the sizes of the congestion windows collected at each specified moment in the current collection period;

the size of the data sent in the current acquisition period;

the size of the data packet lost in the current acquisition period;
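The minimum round trip time and the average time delay listed above may be computed as in the following sketch. The function name period_delay_stats and the optional carry-over of a minimum round trip time from earlier periods are assumptions; the text does not state whether the minimum is reset at the start of each acquisition period.

```python
from typing import Sequence, Tuple


def period_delay_stats(rtts: Sequence[float],
                       min_rtt_so_far: float = float("inf")
                       ) -> Tuple[float, float]:
    """Return (minimum round trip time, average time delay) for one period.

    rtts           : round trip times collected at the specified times,
                     in collection order
    min_rtt_so_far : optional minimum carried over from earlier periods
                     (hypothetical; defaults to no carry-over)
    """
    running_min = min_rtt_so_far
    delays = []
    for rtt in rtts:
        # Minimum of the round trip times collected up to this time.
        running_min = min(running_min, rtt)
        # Time delay at this time: RTT minus that running minimum.
        delays.append(rtt - running_min)
    avg_delay = sum(delays) / len(delays) if delays else 0.0
    return running_min, avg_delay
```

For round trip times of 50 ms, 70 ms, 40 ms and 60 ms, the running minimum ends at 40 ms and the average time delay is 10 ms.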

In one embodiment, the specified time further includes a second time when the packet loss event is detected;

In one embodiment, the first reward value is inversely related to the average time delay of the current acquisition period.

In an embodiment, referring to fig. 7, fig. 7 is another structural schematic diagram of a congestion control apparatus provided in an embodiment of the present application, and the first reward value calculation module 602 includes:

a first threshold judgment submodule 6021, configured to judge whether the average time delay of the current acquisition period is smaller than a first threshold, and if so, trigger the first reward value calculation submodule 6022, and if not, trigger the second reward value calculation submodule 6023;

a first reward value calculation submodule 6022, configured to determine the receiving rate of the current acquisition period as the first reward value of the current acquisition period;

and a second reward value calculation submodule 6023, configured to calculate a reward value of the current acquisition period based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average time delay of the current acquisition period, and the minimum round trip time of the current acquisition period, and use the calculated reward value as the first reward value.

In one embodiment, the first threshold is positively correlated with the minimum round trip time for the current acquisition period.

In one embodiment, the first threshold is calculated based on equation (4) above;

and the second reward value calculation submodule 6023 is configured to calculate, based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average time delay of the current acquisition period, and the minimum round trip time of the current acquisition period, a reward value of the current acquisition period according to the above formula (5), and use the calculated reward value as the first reward value.
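The two-branch reward calculation may be sketched as follows. The threshold uses the form S = ε·MinRtt + ρ recited in the claims as the first formula. The second branch is NOT formula (5): it is an illustrative stand-in that only respects the stated correlations (positive with the receiving rate, negative with the rate difference and with the average time delay). The values of ε, ρ, alpha and beta and all function names are hypothetical.

```python
def first_threshold(min_rtt: float,
                    epsilon: float = 0.1,
                    rho: float = 0.002) -> float:
    """First threshold S = epsilon * MinRtt + rho (values are assumed)."""
    return epsilon * min_rtt + rho


def first_reward(send_rate: float,
                 recv_rate: float,
                 avg_delay: float,
                 min_rtt: float,
                 alpha: float = 1.0,
                 beta: float = 1.0) -> float:
    """Reward value of the current acquisition period (illustrative)."""
    if avg_delay < first_threshold(min_rtt):
        # Low delay: the reward is simply the receiving rate.
        return recv_rate
    # Stand-in for formula (5): penalize over-sending and queuing delay.
    rate_diff = abs(send_rate - recv_rate)
    delay_penalty = beta * recv_rate * (avg_delay / min_rtt)
    return recv_rate - alpha * rate_diff - delay_penalty
```

With a minimum round trip time of 50 ms, the assumed threshold is 7 ms, so an average time delay of 1 ms yields the receiving rate itself as the reward, while larger delays are penalized.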

In one embodiment, the sending rate of the current acquisition period is the sending rate acquired at the last designated moment in the current acquisition period; the receiving rate of the current acquisition period is the receiving rate acquired at the last appointed moment in the current acquisition period.

In one embodiment, the target adjustment strategy includes two or more first specified adjustment multiples greater than 1, two or more second specified adjustment multiples reciprocal to the two or more first specified adjustment multiples, and a probability corresponding to each of the specified adjustment multiples;

the congestion window adjusting module 604 is specifically configured to adjust the current congestion window according to the specified adjustment multiple with the largest corresponding probability.
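Selecting the specified adjustment multiple with the largest probability may be sketched as follows. The concrete multiples 1.25 and 2.0, the ordering of the actions (increase multiples first, then their reciprocals), the lower bound on the congestion window and the function name adjust_cwnd are assumptions of this sketch.

```python
from typing import Sequence


def adjust_cwnd(cwnd: float,
                probs: Sequence[float],
                increase_multiples: Sequence[float] = (1.25, 2.0),
                min_cwnd: float = 1.0) -> float:
    """Scale the congestion window by the most probable multiple.

    probs must hold one probability per multiple, in the order:
    the increase multiples, followed by their reciprocals.
    """
    if any(m <= 1.0 for m in increase_multiples):
        raise ValueError("first specified multiples must be greater than 1")
    multiples = list(increase_multiples)
    multiples += [1.0 / m for m in increase_multiples]  # second multiples
    if len(probs) != len(multiples):
        raise ValueError("one probability per adjustment multiple expected")
    # Index of the multiple with the largest probability.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return max(min_cwnd, cwnd * multiples[best])
```

Because every decrease multiple is the reciprocal of an increase multiple, an increase followed by the matching decrease returns the congestion window to its previous size.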

In an embodiment, referring to fig. 8, fig. 8 is another schematic structural diagram of a congestion control device provided in the embodiment of the present application, where the congestion control device further includes:

a sending rate adjusting module 605, configured to send a data packet at a rate smaller than the current receiving rate within the first duration when a preset adjusting period is reached.

In this embodiment, the sending rate adjustment module 605 and other modules in the congestion control device may be processed asynchronously.

In one embodiment, the apparatus further comprises:

a second network state acquisition module, configured to acquire network state data of a preset number of historical periods before the current acquisition period as second network state data, before the first network state data and the first reward value are input into the pre-trained adjustment strategy prediction network model to obtain the target adjustment strategy;

and the target adjustment strategy acquisition module is specifically configured to input the first network state data, the second network state data and the first reward value into the pre-trained adjustment strategy prediction network model to obtain the target adjustment strategy.

In an embodiment, referring to fig. 9, fig. 9 is another schematic structural diagram of a congestion control device provided in an embodiment of the present application, where the congestion control device further includes:

a training module 606, configured to obtain network state data of a sample period as sample network state data;

wherein the sample network state data comprises: the sending rate and the receiving rate of the sample period; the sending rate of the sample period is determined based on the sending rate collected at each specified time in the sample period; the receiving rate of the sample period is determined based on the receiving rate collected at each specified time in the sample period; the specified time comprises a first time at which an ACK message is received; the receiving rate at each first time represents: the rate at which data is received between the time at which the last ACK message was received before the data packet corresponding to the first time was sent, and the first time; the data packet corresponding to the first time represents the data packet acknowledged by the ACK message received at the first time; the sending rate at each first time represents: the rate at which data is sent between the sending time of the data packet acknowledged by the last ACK message received before the data packet corresponding to the first time was sent, and the sending time of the data packet corresponding to the first time;

The embodiment of the present application further provides an electronic device, as shown in fig. 10, which includes a processor 1001, a communication interface 1002, a memory 1003 and a communication bus 1004, wherein the processor 1001, the communication interface 1002 and the memory 1003 complete mutual communication through the communication bus 1004,

a memory 1003 for storing a computer program;

the processor 1001 is configured to implement the following steps when executing the program stored in the memory 1003:

when a preset acquisition period is reached, acquiring network state data of the current acquisition period as first network state data;

wherein the first network state data comprises: the sending rate and the receiving rate of the current acquisition period; the sending rate of the current acquisition period is determined based on the sending rate acquired at each appointed moment in the current acquisition period; the receiving rate of the current acquisition period is determined based on the receiving rate acquired at each appointed moment in the current acquisition period; the designated time comprises a first time when the ACK message is received; the reception rate at each first time instant represents: receiving data rate between the time of the last received ACK message before sending the data packet corresponding to the first time and the first time; the data packet corresponding to the first time represents the data packet responded by the ACK message received at the first time; the transmission rate at each first time instant represents: the rate of sending data between the sending time of the data packet responded by the ACK message received last before sending the data packet corresponding to the first time and the sending time of the data packet corresponding to the first time;

inputting the first network state data and the first reward value into a pre-trained adjustment strategy prediction network model to obtain a target adjustment strategy; the adjustment strategy prediction network model is obtained by training based on a reinforcement learning algorithm;

The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the congestion control methods described above.

In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the congestion control methods of the above embodiments.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in a process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

The above description is only for the preferred embodiment of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the scope of protection of the present application.

Claims

1. A method of congestion control, the method comprising:

2. The method of claim 1, wherein the first reward value is positively correlated with a receiving rate of a current acquisition period and negatively correlated with a rate difference of the current acquisition period; and the rate difference of the current acquisition period represents the difference between the sending rate of the current acquisition period and the receiving rate of the current acquisition period.

3. The method of claim 1, wherein the first network state data further comprises at least one of:

the average time delay of the current acquisition period represents: the average value of the time delay collected at each appointed moment in the current collection period;

the size of the data sent in the current acquisition period;

the size of the data packet lost in the current acquisition period;

4. The method of claim 3, wherein the specified time further comprises a second time when a packet loss event is detected;

the transmission rate acquired at each second instant represents: the sending rate acquired when the ACK message is received last before the second moment;

5. The method of claim 3, wherein the first reward value is negatively correlated with an average time delay of a current acquisition period.

6. The method of claim 3, wherein calculating the reward value for the current acquisition period as the first reward value based on the first network status data comprises:

if not, calculating the reward value of the current acquisition cycle as a first reward value based on the sending rate of the current acquisition cycle, the receiving rate of the current acquisition cycle, the average time delay of the current acquisition cycle and the minimum round-trip time of the current acquisition cycle.

7. The method of claim 6, wherein the first threshold is positively correlated with a minimum round trip time for a current acquisition cycle.

8. The method of claim 7, wherein the first threshold is calculated based on a first formula;

the first formula is:

S = ε · MinRtt + ρ

9. The method of claim 1, wherein the target adjustment strategy comprises two or more first specified adjustment multiples greater than 1, two or more second specified adjustment multiples reciprocal to the two or more first specified adjustment multiples, and a probability corresponding to each of the specified adjustment multiples;

10. The method of claim 1, further comprising:

11. The method of claim 1, wherein the training process of the adjustment strategy prediction network model comprises the steps of:

acquiring network state data of a sample period as sample network state data;

wherein the sample network state data comprises: the sending rate and the receiving rate of the sample period; the sending rate of the sample period is determined based on the sending rate collected at each specified time in the sample period; the receiving rate of the sample period is determined based on the receiving rate collected at each specified time in the sample period; the specified time comprises a first time at which an ACK message is received; the receiving rate at each first time represents: the rate at which data is received between the time at which the last ACK message was received before the data packet corresponding to the first time was sent, and the first time; the data packet corresponding to the first time represents the data packet acknowledged by the ACK message received at the first time; the sending rate at each first time represents: the rate at which data is sent between the sending time of the data packet acknowledged by the last ACK message received before the data packet corresponding to the first time was sent, and the sending time of the data packet corresponding to the first time;

12. A congestion control apparatus, characterized in that the apparatus comprises:

13. The apparatus of claim 12, wherein the first reward value is positively correlated with a receiving rate of a current acquisition period and negatively correlated with a rate difference of the current acquisition period; and the rate difference of the current acquisition period represents the difference between the sending rate of the current acquisition period and the receiving rate of the current acquisition period.

14. The apparatus of claim 12, wherein the first network status data further comprises at least one of:

average round trip time for the current acquisition cycle, representing: average value of round trip time collected at each appointed time in current collection period;

the average in-flight data size for the current acquisition period represents: the average value of the in-flight data acquired at each designated moment in the current acquisition period; the in-flight data size collected at a given time represents: the size of the data packet which has been sent at the specified time and has not received the corresponding ACK message;

the size of the data sent in the current acquisition period;

the size of the data packet lost in the current acquisition period;

15. The apparatus of claim 12, wherein the specified time further comprises a second time when a packet loss event is detected;

the round trip time collected at each second instant represents: round trip time collected when the ACK message was last received before the second time;

16. The apparatus of claim 12, wherein the first reward value is negatively correlated with an average time delay of a current acquisition period.

17. The apparatus of claim 14, wherein the first reward value calculation module comprises:

and the second reward value calculation submodule is used for calculating the reward value of the current acquisition period as the first reward value based on the sending rate of the current acquisition period, the receiving rate of the current acquisition period, the average time delay of the current acquisition period and the minimum round-trip time of the current acquisition period.

18. The apparatus of claim 17, wherein the first threshold is positively correlated with a minimum round trip time for a current acquisition period.

19. The apparatus of claim 18, wherein the first threshold is calculated based on a first formula;

the first formula is:

S = ε · MinRtt + ρ

20. The apparatus of claim 12, wherein the target adjustment strategy comprises two or more first specified adjustment multiples greater than 1, two or more second specified adjustment multiples reciprocal to the two or more first specified adjustment multiples, and a probability corresponding to each of the specified adjustment multiples;

21. The apparatus of claim 12, further comprising:

22. The apparatus of claim 12, further comprising:

23. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;

a memory for storing a computer program;

a processor for implementing the method steps of any of claims 1 to 11 when executing a program stored in the memory.

24. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-11.