CN112822718B - Packet transmission method and system based on reinforcement learning and stream coding driving - Google Patents


Info

Publication number
CN112822718B
CN112822718B (application CN202011620034.2A)
Authority
CN
China
Prior art keywords
packet
sending
action
packets
value
Prior art date
Legal status
Active
Application number
CN202011620034.2A
Other languages
Chinese (zh)
Other versions
CN112822718A (en)
Inventor
张非凡
李业
Current Assignee
Nantong University
Original Assignee
Nantong University
Priority date
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN202011620034.2A priority Critical patent/CN112822718B/en
Publication of CN112822718A publication Critical patent/CN112822718A/en
Application granted granted Critical
Publication of CN112822718B publication Critical patent/CN112822718B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 28/00 Network traffic management; Network resource management
    • H04W 28/02 Traffic management, e.g. flow control or congestion control
    • H04W 28/0289 Congestion control
    • H04W 28/06 Optimizing the usage of the radio link, e.g. header compression, information sizing, discarding information
    • H04W 72/00 Local resource management
    • H04W 72/12 Wireless traffic scheduling

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a packet transmission method and system based on reinforcement learning and stream coding driving. The packet transmission method specifically comprises the following steps: first, the relevant stream coding parameters are initialized; the sending end then estimates the congestion state of the network and the in-order packet reception progress of the receiving end from the feedback of the receiving end, and uses this series of states as feature vectors for real-time learning of a model; the current action is then selected according to a reward function; finally, online training of the sending end's sending actions is realized during the packet sending process. The packet transmission system comprises a sending end, a receiving end, a state space unit, a reward function unit, a value fitting unit and an action selection unit. The invention dynamically adjusts the packet sending interval and intelligently selects the type of packet to send according to the current network condition and packet loss rate, realizes joint optimization of stream coding rate control and congestion control, improves network throughput, reduces data transmission delay, and can adapt to changing link conditions.

Description

Packet transmission method and system based on reinforcement learning and stream coding driving
Technical Field
The invention belongs to the technical field of wireless communication, and particularly relates to a packet transmission method and system based on reinforcement learning and stream coding driving, in particular to such a method and system oriented to wireless links with a large delay-bandwidth product.
Background
The wireless long fat link, i.e., the wireless link with a large delay-bandwidth product, is an important component of the future air-space-ground integrated network. At present, TCP (Transmission Control Protocol), on which such links conventionally rely, generally suffers from low bandwidth utilization. Most TCP variants treat data packet loss as a congestion signal and therefore reduce the transmission rate. In wireless links, however, data packet loss may be caused by random link errors rather than congestion, and this behavior can lead to unnecessary slowdowns. In many new air-space-ground integrated network scenarios, link-layer automatic repeat request (ARQ) cannot be used because of the large propagation delay, so packet loss caused by link errors inevitably occurs and the problem becomes particularly serious. Secondly, to avoid congestion, TCP increases its sending rate only gradually at the beginning of a transmission (so-called slow start). On a long fat link, where both bandwidth and propagation delay are large, it may take a long time to fill the link with data; especially for connections carrying small amounts of data, this causes a severe drop in link bandwidth utilization.
A number of TCP congestion control variants have been proposed in the art to address these problems; typical examples include TCP Westwood+ and Google's BBR. However, such rule-based congestion control schemes are not sufficient to cope with the highly heterogeneous and dynamic characteristics of the future air-space-ground integrated network, whose heterogeneous, large-scale wireless networks demand higher flexibility and more stringent throughput/delay guarantees. Recently, the Quick UDP Internet Connections (QUIC) protocol proposed by Google has been widely regarded as an alternative to TCP for packet transmission in future networks. QUIC is built entirely on UDP: it exploits the connectionless nature of UDP to reduce the three-way handshake delay of TCP connection establishment, exploits the out-of-order delivery of UDP to multiplex HTTP streams more efficiently, and the lightweight nature of UDP also gives great flexibility in deployment.
However, for UDP-based transport to provide a reliable, in-order application interface like TCP, congestion control and reliability mechanisms still have to be added. Current QUIC designs nevertheless mainly adopt the existing congestion control and retransmission mechanisms of TCP, so on long fat wireless links the original problems of TCP persist.
Disclosure of Invention
In view of the above, the present invention aims to provide a packet transmission method based on reinforcement learning and stream coding driving, so as to solve the problem of low bandwidth utilization of packet transmission over long fat wireless links in the existing TCP and QUIC technologies.
The invention provides a packet transmission method based on reinforcement learning and stream coding driving, which comprises the following steps:
s1, setting stream coding parameters;
s2, a sending end sends a packet, wherein the packet is an uncoded source packet or a coded repair packet;
s3, the receiving end decodes and recovers the received packets and orderly transmits the packets to an upper layer application, and simultaneously sends feedback information to the sending end, wherein the feedback information comprises decoding progress, the number and the type of the latest received packets, the number of the received source packets and the number of the received repair packets;
s4, the sending end processes the feedback information, determines system state information, calculates reward and punishment values according to a reward function, estimates available bandwidth of a link, determines interval time of sending actions of the sending end according to the available bandwidth of the link, and then conducts reinforcement learning;
the reinforcement learning is executed based on a reinforcement learning model, and the reinforcement learning method comprises the following steps:
s41, outputting a value function after weight updating and the value of each sending action according to the system state information and the reward and punishment values;
s42, selecting an optimal sending action according to the value of each sending action, wherein the optimal sending action is used as the sending action with the maximum value in the current state;
the system state information comprises the ratio of the current packet round-trip delay to the minimum packet round-trip delay, the ratio of the current sending packet action number to the total action number, and the ratio of the current sending source packet number to the total packet number; the sending action is one of sending source packet, sending repair packet and abandoning sending; the reward function is determined according to an optimization objective of packet transmission that maximizes its throughput for each user stream while minimizing latency;
s43, the sending end realizes sending action according to the optimal sending action selected in the step S42;
and S5, repeating the steps S3 and S4 to realize congestion control and stream coding rate control.
Further, the repair packet is a linear combination of previously transmitted source packets, as shown in the following equation:

c_k = Σ_{i = w_s}^{i_seq} g_{k,i} · s_i

where c_k denotes the repair packet numbered k, k = 0, 1, 2, 3, …; g_{k,i} are stream coding coefficients selected from a finite field F_q; w_s is the number of the oldest source packet in the current transmit queue, with initial value 0, and the value of w_s is continuously updated according to the feedback information; i_seq denotes the number of the most recently transmitted source packet.
Further, the reward function R(s, a) is defined piecewise: it takes a positive value when the utility function U_n increases after an action; a negative value, whose magnitude decreases as gp/inp approaches 1, when U_n decreases and RTT_ratio ≥ τ; and zero otherwise; wherein:
R(s, a) denotes the reward/punishment value when the system state information is s and the sending action is a; gp is the goodput, i.e., the number of in-order source packets received by the receiving end divided by the elapsed time; inp is the number of all packets sent by the sending end divided by the elapsed time; U_n is the utility function, U_n = log(gp) - δ·log(RTT), where RTT is a smoothed estimate of the round-trip delay; RTT_ratio is the ratio of the currently smoothed-estimated RTT to the minimum RTT; τ is a preset hyper-parameter.
Further, the value function is obtained by the following steps:
mapping the system state information, by means of tile coding, into a feature vector containing only the discrete values 0 and 1, and fitting a linear function of this feature vector in combination with the reward/punishment values to obtain the value function.
Further, selecting the optimal sending action according to the value of each sending action specifically comprises: selecting the optimal sending action using an ε-greedy strategy.
The invention also provides a packet transmission system based on reinforcement learning and stream coding driving, which comprises:
a transmitting end, configured to transmit a packet, where the packet is an uncoded source packet or a coded repair packet;
the receiving end, configured to decode and recover the received packets and deliver them in order to the upper-layer application, while sending feedback information to the sending end, wherein the feedback information comprises the decoding progress, the number and type of the most recently received packet, the number of received source packets, and the number of received repair packets;
the state space unit is arranged at the sending end and used for processing the feedback information and determining system state information; the system state information comprises the ratio of the current packet round-trip delay to the minimum packet round-trip delay, the ratio of the current sending packet action number to the total action number, and the ratio of the current sending source packet number to the total packet number;
a reward function unit, configured to calculate and output a reward/punishment value according to a reward function R(s, a), which is defined piecewise: it takes a positive value when the utility function U_n increases after an action; a negative value, whose magnitude decreases as gp/inp approaches 1, when U_n decreases and RTT_ratio ≥ τ; and zero otherwise; wherein:
R(s, a) denotes the reward/punishment value when the system state information is s and the sending action is a; gp is the goodput, i.e., the number of in-order source packets received by the receiving end divided by the elapsed time; inp is the number of all packets sent by the sending end divided by the elapsed time; U_n is the utility function, U_n = log(gp) - δ·log(RTT), where RTT is a smoothed estimate of the round-trip delay; RTT_ratio is the ratio of the currently smoothed-estimated RTT to the minimum RTT; τ is a preset hyper-parameter;
a value fitting unit, configured to map the system state information, using tile coding, into a feature vector containing only the discrete values 0 and 1, obtain a value function by fitting a linear function of this feature vector in combination with the reward/punishment values, and output the value of each sending action;
and an action selection unit, configured to select, according to the value of each sending action output by the value fitting unit, the sending action with the maximum value using an ε-greedy strategy, the selected action being performed by the sending end.
Compared with the prior art, the invention has the following beneficial effects:
1. On the one hand, the technical scheme of the invention adopts stream coding to realize packet loss recovery and provide a reliability mechanism for UDP, which yields higher throughput than retransmission schemes and smaller decoding delay than block codes. On the other hand, based on a reinforcement learning model, the invention learns online from the current network condition and packet loss rate, dynamically adjusts the packet sending interval, intelligently selects the type of packet to send, and realizes joint optimization of stream coding rate control (the proportion between the two actions of sending source packets and sending repair packets) and congestion control, improving network throughput, reducing data transmission delay, and adapting to changing link conditions.
2. No large amount of sample data is needed; online training of the self-learning model requires only information from the external environment (the current network congestion condition and the in-order packet reception progress of the receiving end), and relies very little on human experience or external data.
3. The sending end can learn and make decisions online according to the network condition, making packet sending more intelligent.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be noted that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained by those skilled in the art without inventive exercise.
Fig. 1 is a block diagram of a packet transmission system according to the present invention.
Fig. 2 is a block diagram illustrating a structure of a reinforcement learning model in the packet transmission system according to the present invention.
Fig. 3 is a graph comparing throughput of the transmission method of the present invention with other methods.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it should be noted that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a packet transmission method based on reinforcement learning and stream coding driving, which specifically comprises the following steps:
s1, setting stream coding parameters;
the stream coding parameters are seeds of a pseudo-random number generator used to obtain the stream coding coefficients.
S2, a sending end sends a packet, wherein the packet is an uncoded source packet or a coded repair packet;
s3, the receiving end decodes and recovers the received packets and orderly transmits the packets to an upper layer application, and simultaneously sends feedback information to the sending end, wherein the feedback information comprises decoding progress, the number and the type of the latest received packets, the number of the received source packets and the number of the received repair packets;
the transmitting end may send two packets, one being an uncoded source packet and the other being a coded repair packet. Let iseqNumber indicating the most recently transmitted uncoded source packet, initialize iseq-1, i after each transmission of a source packetseqAnd adding 1. The repair packet is represented as
Figure GDA0003245036390000031
Which is a linear combination of source packets that have been previously transmitted. In the formula (1), ckDenotes a repair packet numbered k, gk,iIs from a finite field
Figure GDA0003245036390000032
Where k is 0, and 1,2 … is the number of the repair packet. w is asCorresponding to the number of the oldest (old) source packet in the current transmit queue. Initialization wsAt 0, the original packet acknowledged as received will be removed from the queue according to the feedback from the receiving end, at which time wsAn update will be made. Let we=iseq,[ws,we]Referred to as the coding window of the current repair packet.
The receiving end decodes and recovers the received packets and delivers them to the upper-layer application in order. Let i_ord denote the number of the latest in-order delivered packet, initialized to i_ord = -1; the decoder starts in an ordered state. If the next packet received by the decoder is neither the source packet numbered i_ord + 1 nor a repair packet whose coding window satisfies w_e ≤ i_ord + 1, in-order delivery is interrupted and the decoder enters an out-of-order state, in which it buffers the received packets and attempts decoding. The buffered packets are out-of-order source packets (with numbers greater than i_ord + 1) or repair packets (with w_e > i_ord + 1). Let w_e^max denote the largest coding-window upper bound among the buffered repair packets; [i_ord + 1, w_e^max] is called the decoder's current decoding window. As more packets are buffered, the window may expand (i.e., w_e^max grows). The decoder decodes by Gaussian elimination: it dynamically constructs a linear system of equations A·S = B and performs forward elimination online, where the rows of A and B are, respectively, the coding coefficients of the buffered packets (out-of-order source packets are treated as special repair packets whose coefficient vector has a single non-zero element equal to 1) and the coded information symbols. When decoding succeeds, all decoded source packets in the decoding window are delivered to the upper-layer application, the decoder returns to the ordered state, i_ord is updated to the top of the decoded window, and the process restarts.
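A simplified sketch of this receiver-side solving step is given below. It reuses gf_mul from the encoder sketch, adds a brute-force GF(2^8) inverse, assumes equal-length payloads and coefficient indices restricted to the decoding window, and uses a dense Gauss-Jordan solve; all of these are illustrative simplifications, not the patent's exact implementation.

```python
# Illustrative sketch only: attempt to solve A*S = B for the buffered packets in
# the decoding window [i_ord + 1, w_e_max]; returns recovered source payloads or
# None if the system is not yet solvable.
def gf_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^8), brute force (adequate for a sketch)."""
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def try_decode(rows, i_ord, w_e_max):
    # rows: list of (coeffs, payload); coeffs maps source number -> coefficient.
    # An out-of-order source packet i is the special row ({i: 1}, payload).
    window = list(range(i_ord + 1, w_e_max + 1))
    n = len(window)
    if len(rows) < n:
        return None                                     # not enough packets yet
    A = [[c.get(i, 0) for i in window] for c, _ in rows]
    B = [bytearray(p) for _, p in rows]                 # equal payload lengths assumed
    for col in range(n):                                # Gauss-Jordan elimination
        pivot = next((r for r in range(col, len(A)) if A[r][col]), None)
        if pivot is None:
            return None                                 # rank deficient: keep buffering
        A[col], A[pivot] = A[pivot], A[col]
        B[col], B[pivot] = B[pivot], B[col]
        inv = gf_inv(A[col][col])
        A[col] = [gf_mul(inv, a) for a in A[col]]
        B[col] = bytearray(gf_mul(inv, b) for b in B[col])
        for r in range(len(A)):
            if r != col and A[r][col]:
                f = A[r][col]
                A[r] = [a ^ gf_mul(f, p) for a, p in zip(A[r], A[col])]
                B[r] = bytearray(b ^ gf_mul(f, p) for b, p in zip(B[r], B[col]))
    return {window[j]: bytes(B[j]) for j in range(n)}   # s_{i_ord+1} .. s_{w_e_max}
```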
And S4, the sending end processes the feedback information, determines system state information, calculates reward and punishment values according to a reward function, estimates the available bandwidth of the link, determines the interval time of sending actions of the sending end according to the available bandwidth of the link, and then executes a learning process based on a reinforcement learning model.
In the invention, the system state information is used to represent the network condition and specifically comprises the ratio of the current packet round-trip delay to the minimum packet round-trip delay, the ratio of the number of packet-sending actions to the total number of actions, and the ratio of the number of sent source packets to the total number of packets. The sending action is one of sending a source packet, sending a repair packet, and abandoning sending (backoff). The interval between sending actions is set to 2/3 of the packet size divided by the available link bandwidth.
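A small sketch of how the three state ratios and the interval between sending actions could be computed is shown below; the argument names and units (bytes for packet size, bits per second for bandwidth) are assumptions of this sketch.

```python
# Illustrative sketch only: the three-component state vector and the sending interval.
def build_state(rtt, rtt_min, send_actions, total_actions, source_sent, packets_sent):
    return (
        rtt / rtt_min,                          # current RTT / minimum RTT
        send_actions / max(total_actions, 1),   # packet-sending actions / all actions
        source_sent / max(packets_sent, 1),     # source packets / all packets sent
    )

def send_interval(packet_size_bytes, link_bandwidth_bps):
    # interval = (2/3 of the packet size) / available link bandwidth
    return (2.0 / 3.0) * packet_size_bytes * 8 / link_bandwidth_bps
```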
In the invention, the reward function is determined according to the optimization objective of packet transmission, which is set to maximize the throughput of each user stream while minimizing latency. Specifically, the embodiment of the present invention designs the reward function R(s, a) as a piecewise function: it is positive when the utility function increases after an action, negative when the utility function decreases and RTT_ratio ≥ τ, and zero otherwise.
R(s, a) denotes the reward/punishment value when the system state information is s and the sending action is a; gp is the goodput, i.e., the number of in-order source packets received by the receiving end divided by the time taken for the transmission so far; inp is the number of all packets sent by the sending end divided by the time taken for the transmission so far; U_n is the utility function, U_n = log(gp) - δ·log(RTT), where RTT is a smoothed estimate of the round-trip delay; RTT_ratio is the ratio of the currently smoothed-estimated RTT to the minimum RTT; τ is a preset hyper-parameter. In the embodiment of the present invention, τ is set to 1.2. The function emphasizes that each user stream should maximize its throughput while minimizing delay, and the log terms ensure that the network allocates bandwidth resources fairly when multiple users compete for the same bottleneck link.
After one action, if the utility function value increases, a positive reward value is obtained. If the utility function value decreases and RTT_ratio ≥ τ (since RTT_ratio is the ratio of the currently smoothed-estimated RTT to the minimum RTT, RTT_ratio ≥ τ indicates congestion), the reward value is negative, and the closer gp/inp is to 1, the smaller the penalty. In all other cases, the reward value is zero.
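The sketch below reproduces this reward behaviour. The sign structure and the zero case follow the description above; the particular magnitudes used (+1 for an improvement, -(1 - gp/inp) for the congestion case) and δ = 1 are assumptions, since the published text fixes only the signs and the trend in gp/inp.

```python
# Illustrative sketch only: utility and piecewise reward as described in the text.
import math

DELTA, TAU = 1.0, 1.2          # delta weights the RTT term; tau = 1.2 per the embodiment

def utility(gp, rtt):
    return math.log(gp) - DELTA * math.log(rtt)      # U_n = log(gp) - delta*log(RTT)

def reward(u_prev, u_now, gp, inp, rtt_ratio):
    if u_now > u_prev:
        return 1.0                         # utility improved: positive reward (magnitude assumed)
    if u_now < u_prev and rtt_ratio >= TAU:
        return -(1.0 - gp / inp)           # congestion: penalty shrinks as gp/inp -> 1
    return 0.0                             # all other cases
```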
The continuous-valued system state is mapped, by means of tile coding, into a feature vector containing only the discrete values 0 and 1. A value function reflecting the value of each sending action is then fitted as a linear function of this feature vector. The learning process of reinforcement learning consists in obtaining the weights of the value function for each sending action.
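A minimal sketch of such a tile-coded linear value model and its weight update follows; the number of tilings and tiles, the state bounds, the learning rate, the discount factor, and the use of a semi-gradient Q-learning-style target are assumptions of the sketch, since the text does not fix these hyper-parameters.

```python
# Illustrative sketch only: tile coding of the 3-dimensional state into a 0/1
# feature vector and a linear value function with one weight vector per action.
import numpy as np

class TileCodedQ:
    def __init__(self, n_actions=3, tilings=8, tiles=8,
                 lows=(1.0, 0.0, 0.0), highs=(3.0, 1.0, 1.0),
                 alpha=0.1, gamma=0.9):
        self.tilings, self.tiles = tilings, tiles
        self.lows, self.highs = np.array(lows), np.array(highs)
        self.dim = tilings * tiles ** len(lows)
        self.w = np.zeros((n_actions, self.dim))      # weights of the value function
        self.alpha, self.gamma = alpha / tilings, gamma

    def features(self, state):
        """Map a continuous state to a feature vector containing only 0s and 1s."""
        x = np.zeros(self.dim)
        s = (np.array(state) - self.lows) / (self.highs - self.lows)
        for t in range(self.tilings):                 # each tiling is offset slightly
            offset = t / (self.tilings * self.tiles)
            idx = np.clip(((s + offset) * self.tiles).astype(int), 0, self.tiles - 1)
            flat = t * self.tiles ** len(idx) + int(np.ravel_multi_index(idx, (self.tiles,) * len(idx)))
            x[flat] = 1.0                             # one active tile per tiling
        return x

    def value(self, state, action):
        return float(self.w[action] @ self.features(state))

    def update(self, s, a, r, s_next):
        """Semi-gradient one-step update of the weights for action a."""
        target = r + self.gamma * max(self.value(s_next, b) for b in range(len(self.w)))
        self.w[a] += self.alpha * (target - self.value(s, a)) * self.features(s)
```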
Specifically, the reinforcement learning process of the invention comprises the following steps:
s41, outputting a value function after weight updating and the value of each sending action according to the system state information and the reward and punishment values;
s42, selecting the sending action with the maximum value (namely the optimal sending action) in the current state according to the value of each sending action;
and S43, the sending end realizes the sending action according to the optimal sending action selected in the step S42.
Specifically, when the moment for a sending action arrives, the sending end first decides whether to send a packet at all; if it decides to send, it further decides whether to send a new source packet or to generate a repair packet from previously transmitted source packets.
In the embodiment of the invention, the optimal sending action is selected using an ε-greedy strategy, which works as follows: if a randomly drawn probability is lower than ε, an action is selected at random; otherwise, the action with the highest value in the current state is selected. Selecting actions according to the ε-greedy strategy realizes the joint optimization of stream coding rate control and congestion control; the new source packets or repair packets to be sent are stored in order in the UDP transmission buffer, waiting to be sent.
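A sketch of this ε-greedy selection over the three sending actions is given below, reusing the TileCodedQ model from the previous sketch; the value of ε and the action encoding are assumptions.

```python
# Illustrative sketch only: epsilon-greedy choice among the three sending actions.
import random

ACTIONS = ("send_source", "send_repair", "back_off")     # action indices 0, 1, 2

def choose_action(q, state, epsilon=0.1):
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))             # explore: random action
    values = [q.value(state, a) for a in range(len(ACTIONS))]
    return max(range(len(ACTIONS)), key=values.__getitem__)   # exploit: highest value
```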
And S5, continuously repeating steps S3 and S4: according to the current network condition and packet loss rate, the packet sending interval is dynamically adjusted and the type of packet to send is intelligently selected, realizing the joint optimization of stream coding rate control and congestion control.
The code rate of the stream coding is the proportion of source-packet sending actions among all packet-sending actions: R = a/(a + b), where a is the number of transmitted source packets and b is the number of transmitted repair packets. The technical scheme of the invention therefore controls the code rate by controlling the proportion between the actions of sending source packets and sending repair packets.
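Putting the pieces together, the sketch below shows one way the sending-end loop of steps S3 to S5 could be wired up from the earlier sketches; the feedback field names, the encoder interface (send_source, send_repair) and the overall control flow are assumptions of this illustration, not a definitive implementation.

```python
# Illustrative sketch only: the repeated feedback -> state -> reward -> update ->
# action -> send cycle of steps S3-S5, reusing the helper sketches above.
import time

def sender_loop(q, feedback_queue, encoder, seed):
    sent_source = sent_repair = 0
    u_prev, state, last_action = None, (1.0, 0.0, 0.0), None
    while True:
        fb = feedback_queue.get()                 # blocking wait for receiver feedback
        next_state = build_state(fb.rtt, fb.rtt_min, fb.send_actions, fb.total_actions,
                                 fb.source_received, fb.packets_received)
        u_now = utility(fb.goodput, fb.rtt)
        if u_prev is not None:                    # update the value model (step S41)
            r = reward(u_prev, u_now, fb.goodput, fb.inp, fb.rtt / fb.rtt_min)
            q.update(state, last_action, r, next_state)
        u_prev, state = u_now, next_state
        last_action = choose_action(q, state)     # step S42: epsilon-greedy selection
        if ACTIONS[last_action] == "send_source":
            encoder.send_source(); sent_source += 1
        elif ACTIONS[last_action] == "send_repair":
            encoder.send_repair(seed); sent_repair += 1
        # stream coding rate R = a / (a + b), controlled through the action proportion
        code_rate = sent_source / max(sent_source + sent_repair, 1)
        time.sleep(send_interval(fb.packet_size, fb.bandwidth))
```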
As shown in fig. 1, the present invention further provides a packet transmission system based on reinforcement learning and stream coding driving, where the packet transmission system includes a transmitting end, a receiving end, a state space unit, a reward function unit, a value fitting unit, and an action selection unit. Wherein the state space unit, the reward function unit, the value fitting unit and the action selection unit constitute a reinforcement learning model as shown in fig. 2.
The system comprises a sending end and a receiving end, wherein the sending end is provided with an encoder, and the encoder sends an uncoded source packet or a coded repair packet;
and the receiving end, configured to decode and recover the received packets and deliver them in order to the upper-layer application, while sending feedback information to the sending end, wherein the feedback information comprises the decoding progress, the number and type of the most recently received packet, the number of received source packets, and the number of received repair packets.
The state space unit is arranged at the sending end and used for processing the feedback information sent by the receiving end and determining the system state information; the system state information includes the ratio of the current packet round trip delay to the minimum packet round trip delay, the ratio of the number of currently transmitted packet actions to the total number of actions, and the ratio of the number of currently transmitted source packets to the total number of packets.
And the reward function unit is used for calculating an output reward punishment value according to the reward function.
And the value fitting unit is used for mapping the system state information into a feature vector only containing discrete values 0 and 1 in a tile coding mode, then fitting the feature vector in a linear function form by combining the reward and punishment values to obtain a value function, and outputting the value of each sending action.
And the action selection unit is configured to select, according to the value of each sending action output by the value fitting unit, the sending action with the maximum value using an ε-greedy strategy, the selected action being performed by the sending end. The new source packets or repair packets to be sent are stored in order in the UDP transmission buffer, waiting to be sent.
In a specific application, the sending end sends stream-coded packets, the receiving-end decoder decodes them, and the receiving end continuously feeds back decoding and reception progress information and congestion indicators to the sending end. The sending end abstracts the state information from this feedback, calculates a reward value according to the reward function, inputs the reward value and the state information into the value fitting unit to obtain the value of each action and update the related fitting parameters, and finally selects the optimal action through the action selection unit. The reinforcement learning process is a process in which the agent's value fitting function is continuously iterated and updated, driven by the feedback of the receiving end. The model keeps learning as packet transmission proceeds, realizing joint optimization of congestion control and stream coding rate control. Network simulation results show that, under wireless long fat link conditions, the throughput obtained by the method of the invention far exceeds that of other methods. As shown in fig. 3, the goodput (gp) obtained by the scheme of the invention on a long fat wireless link with 1% packet loss rate, 100 ms delay, and 20 Mbps bandwidth is much higher than that of existing schemes such as QUIC, TCP BBR, and TCP CUBIC.
Although the present invention has been described in terms of a preferred embodiment, the invention is not limited to that embodiment. Any equivalent changes or modifications made without departing from the spirit and scope of the present invention also fall within the protection scope of the present invention. The scope of the invention should therefore be determined with reference to the appended claims.

Claims (5)

1. A packet transmission method based on reinforcement learning and stream coding driving is characterized by comprising the following steps:
s1, setting stream coding parameters;
s2, a sending end sends a packet, wherein the packet is an uncoded source packet or a coded repair packet;
s3, the receiving end decodes and recovers the received packets and orderly transmits the packets to an upper layer application, and simultaneously sends feedback information to the sending end, wherein the feedback information comprises decoding progress, the number and the type of the latest received packets, the number of the received source packets and the number of the received repair packets;
s4, the sending end processes the feedback information, determines system state information, calculates reward and punishment values according to a reward function, estimates available bandwidth of a link, determines interval time of sending actions of the sending end according to the available bandwidth of the link, and then conducts reinforcement learning;
the reinforcement learning is executed based on a reinforcement learning model, and the reinforcement learning method comprises the following steps:
s41, outputting a value function after weight updating and the value of each sending action according to the system state information and the reward and punishment values;
s42, selecting an optimal sending action according to the value of each sending action, wherein the optimal sending action is used as the sending action with the maximum value in the current state;
the system state information comprises the ratio of the current packet round-trip delay to the minimum packet round-trip delay, the ratio of the current sending packet action number to the total action number, and the ratio of the current sending source packet number to the total packet number; the sending action is one of sending source packet, sending repair packet and abandoning sending; the reward function is determined according to an optimization objective of packet transmission that maximizes its throughput for each user stream while minimizing latency;
s43, the sending end realizes sending action according to the optimal sending action selected in the step S42;
and S5, repeating the steps S3 and S4 to realize congestion control and stream coding rate control.
2. The packet transmission method according to claim 1, wherein the repair packet is a linear combination of previously transmitted source packets s_i, specifically expressed by the following formula:

c_k = Σ_{i = w_s}^{i_seq} g_{k,i} · s_i

where c_k denotes the repair packet numbered k, k = 0, 1, 2, 3, …; g_{k,i} are stream coding coefficients selected from a finite field F_q; w_s is the number of the oldest source packet in the current transmit queue, with initial value 0, and the value of w_s is continuously updated according to the feedback information; i_seq denotes the number of the most recently transmitted source packet.
3. The packet transmission method according to claim 1, wherein the reward function R(s, a) is defined piecewise: it takes a positive value when the utility function U_n increases after an action; a negative value, whose magnitude decreases as gp/inp approaches 1, when U_n decreases and RTT_ratio ≥ τ; and zero otherwise; wherein:
R(s, a) denotes the reward/punishment value when the system state information is s and the sending action is a; gp is the goodput, i.e., the number of in-order source packets received by the receiving end divided by the elapsed time; inp is the number of all packets sent by the sending end divided by the elapsed time; U_n is the utility function, U_n = log(gp) - δ·log(RTT), where RTT is a smoothed estimate of the round-trip delay; RTT_ratio is the ratio of the currently smoothed-estimated RTT to the minimum RTT; τ is a preset hyper-parameter.
4. The packet transmission method according to claim 1, wherein the value function is obtained by:
mapping the system state information, by means of tile coding, into a feature vector containing only the discrete values 0 and 1, and fitting a linear function of this feature vector in combination with the reward/punishment values to obtain the value function.
5. The packet transmission method according to claim 1, wherein selecting the optimal sending action according to the value of each sending action specifically comprises: selecting the optimal sending action using an ε-greedy strategy.
CN202011620034.2A 2020-12-31 2020-12-31 Packet transmission method and system based on reinforcement learning and stream coding driving Active CN112822718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011620034.2A CN112822718B (en) 2020-12-31 2020-12-31 Packet transmission method and system based on reinforcement learning and stream coding driving

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011620034.2A CN112822718B (en) 2020-12-31 2020-12-31 Packet transmission method and system based on reinforcement learning and stream coding driving

Publications (2)

Publication Number Publication Date
CN112822718A (en) 2021-05-18
CN112822718B (en) 2021-10-12

Family

ID=75855909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011620034.2A Active CN112822718B (en) 2020-12-31 2020-12-31 Packet transmission method and system based on reinforcement learning and stream coding driving

Country Status (1)

Country Link
CN (1) CN112822718B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110581808A (en) * 2019-08-22 2019-12-17 武汉大学 Congestion control method and system based on deep reinforcement learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599965B (en) * 2009-07-02 2012-01-25 电子科技大学 Self-adaption high-speed information transmission method based on measurement
CN102137023B (en) * 2011-04-14 2014-01-29 中国人民解放军空军工程大学 Multicast congestion control method based on available bandwidth prediction
US8793557B2 (en) * 2011-05-19 2014-07-29 Cambrige Silicon Radio Limited Method and apparatus for real-time multidimensional adaptation of an audio coding system
CN109217977A (en) * 2017-06-30 2019-01-15 株式会社Ntt都科摩 Data transmission method for uplink, device and storage medium
CN107911242A (en) * 2017-11-15 2018-04-13 北京工业大学 A kind of cognitive radio based on industry wireless network and edge calculations method
CN110958078B (en) * 2019-11-01 2022-06-24 南通先进通信技术研究院有限公司 Low-delay stream code packet transmission method for high-loss link

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110581808A (en) * 2019-08-22 2019-12-17 武汉大学 Congestion control method and system based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Q-learning-based HTTP adaptive streaming bitrate control methods; 熊丽荣 et al.; 《通信学报》 (Journal on Communications); 2017-09-25 (No. 09); full text *

Also Published As

Publication number Publication date
CN112822718A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
US6934251B2 (en) Packet size control technique
CN107171842B (en) Multipath transmission protocol congestion control method based on reinforcement learning
WO2012174763A1 (en) Tcp-based adaptive network control transmission method and system
US6097697A (en) Congestion control
CN100588177C (en) Data transferring method, and communication system and program applied with the method
US10834368B2 (en) Kind of partially reliable transmission method based on hidden Markov model
CN111314022B (en) Screen updating transmission method based on reinforcement learning and fountain codes
CN105827537A (en) Congestion relieving method based on QUIC protocol
JP5009009B2 (en) Method and apparatus for controlling parameters of wireless data streaming system
US7376737B2 (en) Optimised receiver-initiated sending rate increment
RU2018117504A (en) METHOD FOR ADMINISTRATION OF ADAPTIVE AND JOINT IMPLEMENTATION OF ROUTING POLICY AND REDIAL TRANSFER POLICY AT A UNIT IN A UNDERWATER NETWORK, AND A MEANS FOR ITS IMPLEMENTATION
EP1251661A1 (en) Data flow control method
WO2006065008A1 (en) Apparatus for arq controlling in wireless portable internet system and method thereof
CN105450357A (en) Adjustment method of encoding parameters, adjustment device of encoding parameters, processing method of feedback information and processing device of feedback information
CN111818570A (en) Intelligent congestion control method and system for real network environment
US20130039209A1 (en) Data transfer
CN101588597A (en) Control method of wireless streaming media self-adapting mixing FEC/ARQ based on Kalman filtering
CN107070802A (en) Wireless sensor network Research of Congestion Control Techniques based on PID controller
CN113162850A (en) Artificial intelligence-based heterogeneous network multi-path scheduling method and system
Jarvinen et al. FASOR retransmission timeout and congestion control mechanism for CoAP
CN112822718B (en) Packet transmission method and system based on reinforcement learning and stream coding driving
CN109039541B (en) Link self-adaptive optimization method based on AOS communication system packet loss rate minimization
CN104980365A (en) TCP transmission acceleration method based on continuous packet losing congestion judgment
CN108337167B (en) Video multi-channel parallel transmission and distribution method and system based on ant colony algorithm
KR100419280B1 (en) Indirect acknowledgement method in snoop protocol in accordance with the status of a wireless link and packet transmission apparatus in an integrated network using the same method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant