CN112564712B - Intelligent network coding method and equipment based on deep reinforcement learning - Google Patents

Intelligent network coding method and equipment based on deep reinforcement learning

Info

Publication number
CN112564712B
CN112564712B (application CN202011344089.5A)
Authority
CN
China
Prior art keywords
node
coding
network
intermediate node
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011344089.5A
Other languages
Chinese (zh)
Other versions
CN112564712A (en)
Inventor
王琪
刘建敏
徐勇军
王永庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202011344089.5A priority Critical patent/CN112564712B/en
Publication of CN112564712A publication Critical patent/CN112564712A/en
Priority to PCT/CN2021/118099 priority patent/WO2022110980A1/en
Application granted granted Critical
Publication of CN112564712B publication Critical patent/CN112564712B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40 Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a network coding method based on deep reinforcement learning, which comprises the following steps: a source node divides the information to be transmitted into K pieces, determines the coding coefficient of each piece according to a source node coding model, and generates and transmits an encoded packet to the next-hop node; an intermediate node receives the encoded packets sent by the previous node, re-encodes the received encoded packets, determines the coding coefficients according to an intermediate node coding model, and generates and transmits an encoded packet to the next-hop node, wherein the source node coding model and the intermediate node coding model are obtained by training a DQN network. The invention can adaptively adjust the coding coefficients according to the dynamic changes of the network and improves the decoding efficiency; it has good model generalization capability and can generalize to networks of different scales and different link qualities; and by executing the coding coefficient optimization models of the source node and the intermediate nodes in a distributed manner on those nodes, it simplifies the implementation of coding coefficient optimization and improves the stability of DQN training.

Description

Intelligent network coding method and equipment based on deep reinforcement learning
Technical Field
The invention relates to the technical field of information, in particular to a network coding method.
Background
Linear network coding is a type of network coding in which data is linearly combined by coding coefficients selected from a finite field. Linear network coding has lower complexity and simpler model than nonlinear network coding using nonlinear combining functions, and thus has been intensively studied and widely used.
The basic idea of linear network coding is that nodes in the network linearly encode original data by selecting coding coefficients from a finite field to form new encoded data and forward it, and the receiving node can recover the original data by corresponding decoding operations. Linear network coding methods mainly comprise deterministic network coding algorithms and random linear network coding algorithms. Deterministic network coding algorithms can guarantee that the target node decodes successfully, but they need global information such as the network topology and link capacities. In reality there is a wide variety of topologies, and it is impractical to design a specific coding method for each type of network. Furthermore, deterministic coding is not suitable for dynamic networks, because collecting global information from distributed nodes in real time is very complex and cannot be applied at large scale. In random linear network coding, nodes linearly combine the data to be transmitted using coding coefficients selected independently and at random from a certain finite field. Related studies have shown that, as long as the finite field is large enough, random linear network coding ensures that each receiving node can finish decoding with high probability, i.e. the global coding coefficient matrix corresponding to the receiving node is full rank. Since the main feature of random linear network coding is the random selection of the coefficients of the linear combinations, it is suitable for networks of unknown or varying topology and can easily be implemented in a distributed manner. For example, a node with encoding capability that has three packets X, Y, Z to transmit can randomly choose coding coefficients a_1, a_2, a_3, b_1, b_2, b_3, c_1, c_2, c_3, combine the packets into a_1·X + a_2·Y + a_3·Z, b_1·X + b_2·Y + b_3·Z, c_1·X + c_2·Y + c_3·Z using these coefficients, and send the combinations out. After receiving the 3 coded combinations, the receiving node can solve for the original packets X, Y, Z by linear operations whenever the coefficient matrix [[a_1, a_2, a_3], [b_1, b_2, b_3], [c_1, c_2, c_3]] is invertible (full rank).
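As an illustration of this combining-and-decoding idea, the following is a minimal Python sketch assuming coding over GF(2), so that coefficients are 0/1 and linear combination reduces to a bytewise XOR; the packet contents and field size are illustrative and not taken from the patent:

    import random

    def combine(packets, coeffs):
        """Linear combination over GF(2): XOR together the packets whose coefficient is 1."""
        out = bytearray(len(packets[0]))
        for c, p in zip(coeffs, packets):
            if c:
                out = bytearray(a ^ b for a, b in zip(out, p))
        return bytes(out)

    def rank_gf2(rows):
        """Rank of a 0/1 coefficient matrix over GF(2) by Gaussian elimination."""
        m = [list(r) for r in rows]
        rank = 0
        for col in range(len(m[0])):
            pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
            if pivot is None:
                continue
            m[rank], m[pivot] = m[pivot], m[rank]
            for r in range(len(m)):
                if r != rank and m[r][col]:
                    m[r] = [a ^ b for a, b in zip(m[r], m[rank])]
            rank += 1
        return rank

    X, Y, Z = b"pktX", b"pktY", b"pktZ"
    coeff_matrix = [[random.randint(0, 1) for _ in range(3)] for _ in range(3)]
    coded = [combine([X, Y, Z], row) for row in coeff_matrix]
    # The receiver can recover X, Y, Z by Gaussian elimination iff the matrix is full rank.
    print("decodable:", rank_gf2(coeff_matrix) == 3)

Over a larger Galois field the XOR would be replaced by finite-field multiplication and addition, but the full-rank condition for decodability is the same.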
Decoding failure may be caused by various factors: not only by linearly dependent coefficients drawn at the intermediate nodes, but also by packet loss caused by network instability, so that some packets needed for decoding never arrive. In random linear network coding, the coefficients are drawn from a Galois field with equal probability. This coding method therefore cannot adjust the coding coefficients according to the dynamic changes of the network (including changes of the network link quality and of the number of intermediate nodes), which leads to low decoding efficiency.
Disclosure of Invention
The present invention addresses the above-mentioned problems by providing, according to a first aspect of the present invention, a network coding method, the network including a source node and an intermediate node, the method comprising:
the source node divides the information to be transmitted into K pieces x_1, x_2, …, x_K, K being an integer greater than 1, determines the coding coefficients g(x_1), g(x_2), …, g(x_K) according to a source node coding model, encodes the K pieces to generate an encoded packet P_S, and transmits the encoded packet P_S to the next-hop node, wherein the source node coding model is obtained by training a DQN network, wherein the environmental state of each step, ss_k = (x_k, P_S(1), …, P_S(M)), is used as the training input; ss_k is the environmental state of the k-th step, x_k is the k-th piece of the packet, and P_S(1), …, P_S(M) are the M most recently received encoded packets stored in the buffer of the source node's next-hop intermediate node, M being an integer greater than 1;
the intermediate node receives the coded packet sent by the previous node and sends the received coded packet P j Coding M times, determining coding coefficient g (P j (1)),g(P j (2)),…g(P j (M)) to generate the encoded packet P new And transmits the encoded packet P to the next hop node new Wherein the intermediate node coding model is obtained by training a DQN network, wherein each step of environmental state is usedS as training input k P being the environmental state of the kth step new For the current encoded packet, P j (k) For the kth encoded packet in the intermediate node buffer +.>The recently received M encoded packets stored in the buffer for the intermediate node next hop node z.
In one embodiment of the present invention, the source node coding model includes a target network N_s and an execution network N_snode, and the training of the source node coding model comprises the following steps:
Step 110: training N_s by randomly sampling experience from the experience replay memory M_s;
Step 120: sending the trained DQN parameters of N_s to the source node to update N_snode; and/or
Step 130: at the source node, taking the environmental state ss_k as the input of the DQN model of N_snode, outputting the Q value corresponding to each behavior, selecting a behavior with greedy-strategy probability ε to determine the coding coefficients of the K slices of the original information, collecting the experience of the source node interacting with the environment after execution, and storing the experience in the experience replay memory M_s.
In one embodiment of the invention, the intermediate node coding model comprises a target network N_R and an execution network N_Rnode, and the training of the intermediate node coding model includes:
Step 210: training N_R by randomly sampling experience from the experience replay memory M_R;
Step 220: sending the trained DQN parameters of N_R to each intermediate node to update N_Rnode; and/or
Step 230: at each intermediate node, taking the environmental state s_k as the input of the DQN model of N_Rnode, outputting the Q value corresponding to each behavior, selecting behaviors with greedy-strategy probability ε to determine the coding coefficients of the M packets in the intermediate node buffer, collecting the experience of the intermediate node interacting with the environment after execution, and storing the experience in the experience replay memory M_R.
In one embodiment of the invention, the training of N_s includes:
taking the coding network environmental state ss_k as the input of N_s and training the neural network by minimizing the loss function L(θ_k) = (Q_target − Q(ss_k, a_k; θ_k))², where k takes values 1 … K and Q_target is the target Q value calculated by N_s;
a_k represents the behavior of the k-th step;
r_k represents the reward after taking the action in the k-th step;
θ_k represents the network parameters of the DQN at step k.
In one embodiment of the present invention, the training of N_R includes:
taking the coding network environmental state s_k as the input of N_R and training the neural network by minimizing the loss function L(θ_k) = (Q_target − Q(s_k, a_k; θ_k))², where k takes values 1 … M;
Q_target is the target Q value calculated by N_R;
a_k represents the behavior of the k-th step;
r_k represents the reward after taking the action in the k-th step;
θ_k represents the network parameters of the DQN at step k.
In one embodiment of the present invention, for N_s:
a_k is the coding coefficient of the k-th slice x_k of the information, a_k ∈ A_S, where A_S = {0, 1, …, (q−1)} and q is the field size of the Galois field;
r_k is 1 when the encoded packet sent by the source node can increase the rank of the linear system formed by the encoded packets in the source node's next-hop intermediate node buffer; otherwise, r_k is 0.
In one embodiment of the invention, for N_R:
a_k is the coding coefficient of the k-th packet, a_k ∈ A_R, where A_R = {0, 1, …, (q−1)} and q is the field size of the Galois field;
r_k is 1 when the encoded packet sent by the intermediate node increases the rank of the linear system formed by the encoded packets in the intermediate node's next-hop node buffer; otherwise, r_k is 0.
In one embodiment of the invention, if the source node does not receive an ACK, the buffered-packet component P_S(1), …, P_S(M) of the source node's ss_k remains unchanged; if the intermediate node does not receive an ACK, the buffered-packet component P_i(1), …, P_i(M) of the intermediate node's s_k remains unchanged.
In one embodiment of the present invention, the source node generates the encoded packet P_S by:
P_S = G_S · X, where X = [x_1, x_2, …, x_K] and G_S = [g(x_1), g(x_2), …, g(x_K)].
In one embodiment of the present invention, the k-th of the M encodings of the intermediate node comprises:
when k = 1, P_new = P_j ⊕ (g(P_j(k)) · P_j(k)),
when k > 1, P_new = P_new ⊕ (g(P_j(k)) · P_j(k)),
where P_j(k) is the k-th encoded packet in the buffer of the intermediate node and k takes values 1 … M.
According to a second aspect of the present invention there is provided a computer readable storage medium having stored therein one or more computer programs which when executed are adapted to carry out the network coding method of the present invention.
According to a third aspect of the present invention there is provided a network coded computing system comprising a storage device and one or more processors; wherein the storage device is configured to store one or more computer programs which, when executed by the processor, implement the network coding method of the invention.
Compared with the prior art, the invention has the following advantages:
1. compared with the prior art, the method can adaptively adjust the coding coefficient according to the dynamic change of the network (including the change of the quality of network links and the number of intermediate nodes) so as to adapt to the network environment with high dynamic change and improve the decoding efficiency.
2. The present invention uses a Markov Decision Process (MDP) to formulate a coding coefficient optimization problem, wherein network changes can be automatically and continuously represented as MDP state transitions. In addition, the invention has good model generalization capability and can generalize the network under different network scales and different link qualities, so that the invention can adapt to the dynamic change of the network.
3. The invention realizes a distributed coding coefficient optimization mechanism: the coding coefficient optimization model networks based on a Deep Q-learning Network (DQN) are trained centrally by preset optimizers, while the DQN-based coding coefficient optimization models of the source node and the intermediate nodes are executed in a distributed manner on the source node and the intermediate nodes respectively, thereby simplifying the implementation of coding coefficient optimization and improving the stability of DQN training.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
FIG. 1 illustrates a source node network encoding flow diagram according to an embodiment of the invention;
FIG. 2 illustrates an intermediate node network encoding flow diagram according to an embodiment of the invention;
FIG. 3 illustrates a functional configuration block diagram of an apparatus for deep reinforcement learning intelligent network coding in accordance with an embodiment of the present invention;
FIG. 4 illustrates a multi-hop linear network topology according to an embodiment of the invention;
FIG. 5 illustrates a multi-hop parallel network topology according to an embodiment of the invention;
FIG. 6 shows a simulation experiment result diagram of the multi-hop linear network according to an embodiment of the present invention;
FIG. 7 shows a diagram of simulation experiment results of a multi-hop parallel network according to an embodiment of the present invention;
FIG. 8 is a graph showing simulation experiment results of generalization capability on different network scales according to an embodiment of the present invention;
fig. 9 is a diagram showing simulation experiment results of generalization capability on different link qualities according to an embodiment of the present invention;
FIG. 10 shows a comparison of the simulation results of three methods (the present invention, the reference coding algorithm, and RL-aided SNC) with the results from a real experimental platform.
Detailed Description
The present invention provides a network coding method based on deep reinforcement learning, and the method is described in detail below with reference to the accompanying drawings and specific embodiments.
In general terms, in the present invention, a network includes a source node, an intermediate node, and a destination node that receives information. The information is generated in the source node, sent out by the source node, passes through the intermediate node and is finally received by the destination node. The source node divides the information into a plurality of slices, determines the coding coefficient of each slice, codes the slices, generates a coded packet, and transmits the coded packet to the next hop node. The intermediate node receives the encoded packets, determines the encoding coefficient of each packet for the received encoded packets, encodes the plurality of encoded packets again, generates a new encoded packet, and transmits the new encoded packet to the next hop node.
The invention adopts the deep reinforcement learning method DQN to determine the coding coefficients. The DQN model comprises a number of steps and a number of environmental states, and in each environmental state several behaviors can be taken, each behavior corresponding to a different reward. In the present invention, each step corresponds to determining the coding coefficient of one slice or one packet; the behavior is the chosen coding coefficient, and the environmental state is the relevant slice or packets. The DQN evaluates each behavior by its Q value: among the behaviors available in an environmental state, the behavior that maximizes the Q value is the best one, that is, the one that should be taken in that state. DQN seeks the best overall solution, so the best behavior is evaluated over the whole sequence of behaviors, i.e. it is the behavior that maximizes the cumulative reward over all steps in the current environmental situation.
The calculation of the Q value is based on the reward, using the formula Q_k = r_k + γ·max Q_{k+1}, where k is a positive integer. The Q value Q_k of the k-th step depends on the Q values of step k+1, specifically on the maximum, max Q_{k+1}, over the Q values of all behaviors of step k+1; γ is the discount factor, 0 < γ ≤ 1; r_k is the reward of the k-th step; and the Q value of the last step equals its reward.
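A minimal numeric illustration of this recursion (the reward and Q values below are made up for the example):

    GAMMA = 0.9                                  # discount factor, 0 < gamma <= 1

    def q_value(reward_k, next_step_q_values):
        """Q_k = r_k + gamma * max(Q_{k+1}); the Q value of the last step equals its reward."""
        if not next_step_q_values:               # last decision step
            return reward_k
        return reward_k + GAMMA * max(next_step_q_values)

    print(q_value(1, [0.2, 0.7]))                # 1 + 0.9 * 0.7 = 1.63
    print(q_value(0, []))                        # last step: 0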
The DQN trains a neural network so that it can calculate the Q value of each behavior for each environmental state. The training method of DQN is to collect inputs and outputs from the real environment, where the input is an environmental state and the output is the Q value of a behavior; the Q value is calculated after the environmental state is fed into a convolutional neural network (CNN), and the error between the calculated target Q value and the actual Q value is expressed by a loss function, which is then minimized. In actual execution, although the behavior with the largest Q value is the best behavior, new behaviors are also tried in order to balance learning and exploration; for example, an ε-greedy strategy is adopted, i.e. an unexplored behavior is selected with a small probability ε (ε < 1), and the behavior with the largest Q value learned so far is selected with probability 1 − ε.
The existing DQN further comprises a sample replay buffer (Replay Buffer), also known as experience replay (Experience Replay), and a Target Network. In order to alleviate the influence of these problems, the training and execution parts are decoupled as far as possible: the invention introduces a new network, still named the Target Network, while the original target network is renamed the execution network (Behavior Network).
At the beginning of training, both network models use exactly the same parameters. During execution, the Behavior Network is responsible for interacting with the environment to obtain interaction samples. During training, the target Q value of Q-learning is calculated by the Target Network; it is then compared with the Q value obtained by the Behavior Network in its interaction with the environment to obtain an error, the Target Network is trained by reducing this error, its model is continuously updated, and the updated model is synchronized to the Behavior Network to update the Behavior Network's model.
Each time training has run for a certain number of iterations, the experience of the Behavior Network is synchronized to the Target Network so that the next stage of training can be carried out. By using the Target Network, the model that calculates the Q value stays fixed for a period of time, which alleviates fluctuations of the model.
The Target Network of the present invention comprises two neural networks, N_s and N_R. N_s is for the source node and is trained by a preset optimizer O_s; N_R is for all intermediate nodes and is trained by a preset optimizer O_R. O_s and O_R each have a memory for storing experience, including the environmental state, behavior and reward of each step; the memory of O_s is M_s and the memory of O_R is M_R. The Behavior Network of the present invention comprises a neural network N_snode deployed on the source node and a set of neural networks N_Rnode deployed on each of the intermediate nodes. N_snode is a copy of N_s and N_Rnode is a copy of N_R. N_snode and N_Rnode are not trained; on their nodes they only take the environmental state as input and output the Q value corresponding to each behavior.
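As a concrete illustration, the target networks and their node-side copies could be set up as in the following sketch using tf.keras from TensorFlow 1.15 (the layer sizes, the state dimension and the action-space size are assumptions made for the example; the patent only requires a neural network that maps an environmental state to one Q value per behavior):

    import tensorflow as tf

    STATE_DIM = 11      # e.g. one slice/packet plus M = 10 buffered packets (assumed featurization)
    NUM_ACTIONS = 2     # q = 2: candidate coding coefficients {0, 1} (illustrative)

    def build_q_network():
        """A small fully connected network mapping a state vector to a Q value for every behavior."""
        return tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(STATE_DIM,)),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(NUM_ACTIONS),
        ])

    N_s = build_q_network()        # target network for the source node, trained by optimizer O_s
    N_snode = build_q_network()    # execution (Behavior) network deployed on the source node
    N_snode.set_weights(N_s.get_weights())   # N_snode is a copy of N_s and is never trained itself
    # N_R and N_Rnode for the intermediate nodes are built in exactly the same way.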
The network coding method based on deep reinforcement learning comprises two parts: a centralized training process and a distributed execution process. In the centralized training process, the DQN-based coding coefficient optimization model networks are trained in the preset optimizers. In the distributed execution process, the DQN-based coding coefficient optimization models of the source node and the intermediate nodes are executed in a distributed manner on the source node and the intermediate nodes respectively, and the experience generated by execution is sent back to the optimizers for training; executing and training at the same time speeds up DQN training.
(1) In the centralized training process, the source node optimizer O_s trains the source node network N_s by randomly sampling experience from the experience replay memory M_s. The input of N_s is the source node environmental state ss_k (the details of the source node environmental state are described later); N_s is trained by minimizing a preset loss function, and its output is the optimal cumulative reward value Q_k obtained after selecting behavior a_k in environmental state ss_k. The loss function is L(θ_k) = (Q_target − Q(ss_k, a_k; θ_k))², where Q_target is the target Q value calculated by N_s, Q(ss_k, a_k; θ_k) is the Q value, known from experience, obtained after selecting behavior a_k in environmental state ss_k, and θ_k represents the network parameters of the DQN at the current decision step k.
Likewise, the intermediate node optimizer O_R trains the intermediate node network N_R by randomly sampling experience from the experience replay memory M_R. The input of N_R is the intermediate node environmental state s_k (the details of the intermediate node environmental state are described later); N_R is trained by minimizing a preset loss function, and its output is the optimal cumulative reward value Q_k obtained after selecting behavior a_k in environmental state s_k. The loss function is L(θ_k) = (Q_target − Q(s_k, a_k; θ_k))², where Q_target is the target Q value calculated by N_R, Q(s_k, a_k; θ_k) is the Q value, known from experience, obtained after selecting behavior a_k in environmental state s_k, and θ_k represents the network parameters of the DQN at the current decision step k.
Once the parameters of the DQN are updated, the centralized optimizers O_s and O_R send the updated DQN parameters to the source node and to each intermediate node in the network. The source node and the intermediate nodes use the received DQN parameters to update the DQN parameters of the neural networks N_snode and N_Rnode deployed on them.
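A minimal sketch of one centralized update step is given below; it assumes the target network is a compiled tf.keras model with a mean-squared-error loss (as in the architecture sketch above), and the batch size and discount factor are illustrative choices not fixed by the patent:

    import random
    import numpy as np

    GAMMA, BATCH_SIZE = 0.9, 32    # illustrative hyper-parameters

    def train_step(target_net, replay_memory):
        """One DQN update in the optimizer from randomly sampled experience tuples
        (state, action, reward, next_state) stored by the execution networks."""
        batch = random.sample(replay_memory, BATCH_SIZE)
        states = np.array([e[0] for e in batch], dtype=np.float32)
        actions = np.array([e[1] for e in batch], dtype=np.int32)
        rewards = np.array([e[2] for e in batch], dtype=np.float32)
        next_states = np.array([e[3] for e in batch], dtype=np.float32)

        # Q_target = r_k + gamma * max_a Q(next_state, a)
        q_target = rewards + GAMMA * target_net.predict(next_states).max(axis=1)

        # Minimizing the MSE between q_target and Q(state, action) realizes the loss in the text.
        q_values = target_net.predict(states)
        q_values[np.arange(BATCH_SIZE), actions] = q_target
        return target_net.train_on_batch(states, q_values)

    # After the update, the new parameters are pushed to the node-side copies, e.g.
    # N_snode.set_weights(N_s.get_weights()) and N_Rnode.set_weights(N_R.get_weights()).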
(2) In the distributed execution process, for the source node, according to the observed current environmental state ss_k, ss_k is taken as the input of the DQN model of N_snode on the source node, the Q value corresponding to each behavior is output, and one behavior is selected with greedy-strategy probability ε (e.g. ε = 0.1) to determine the coding coefficient of the k-th slice of the original information. After a behavior a_k is executed, the source node obtains a reward value r_k; the optimizer O_s collects the experience (ss_k, a_k, r_k, ss_{k+1}) of the source node interacting with the environment and stores it in the experience replay memory M_s. For intermediate node i, according to its observed environmental state s_k, s_k is taken as the input of the DQN model of N_Rnode on the intermediate node, the Q value corresponding to each behavior is output, and one behavior is selected with greedy-strategy probability ε (e.g. ε = 0.1) to determine the coding coefficient of the k-th packet in the intermediate node buffer. After a behavior a_k is executed, the intermediate node obtains a reward value r_k; the optimizer O_R collects the experience (s_k, a_k, r_k, s_{k+1}) of the intermediate node interacting with the environment and stores it in the experience replay memory M_R.
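The node-side execution can be sketched as follows (a minimal illustration; the env object standing for the abstract node/next-hop-network environment and its step method are hypothetical placeholders, and the Q-network is the node-side copy described above):

    import random
    import numpy as np

    EPSILON = 0.1     # greedy-strategy probability, as in the text

    def select_action(q_network, state, num_actions):
        """Epsilon-greedy selection: explore with probability EPSILON, otherwise pick argmax Q."""
        if random.random() < EPSILON:
            return random.randrange(num_actions)
        q_values = q_network.predict(np.array([state], dtype=np.float32))[0]
        return int(np.argmax(q_values))

    def run_decision_step(q_network, state, num_actions, env):
        """Choose the coding coefficient for the current slice/packet and record the experience."""
        action = select_action(q_network, state, num_actions)
        next_state, reward = env.step(action)            # hypothetical environment interface
        return (state, action, reward, next_state)       # sent back to the optimizer's replay memory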
The following describes specific methods of source node and intermediate node encoding and their corresponding environmental states, behaviors, and rewards in connection with embodiments of the invention.
Encoding of source nodes and their corresponding environmental status, behavior and rewards
FIG. 1 shows the encoding process of a source node based on deep reinforcement learning: the information X (X = [x_1, x_2, …, x_K]) is divided into K slices, K being an integer greater than 1. The coding coefficient optimization process of the K slices is regarded as a Markov decision process (MDP) comprising K decision steps; in the k-th (k = 1, 2, …, K) decision step, the coding coefficient of the k-th slice x_k is determined.
specifically, two major modules of a deep reinforcement learning agent and a network environment in a source node coding coefficient optimization model based on deep reinforcement learning are designed as follows:
(1) The source node is regarded as a deep reinforcement learning agent;
(2) An abstract environment is a network formed by a source node and all next-hop intermediate nodes of the source node, including the source node, all next-hop intermediate nodes of the source node, and links formed by the source node and all next-hop intermediate nodes of the source node.
(3) The deep reinforcement learning agent observes the environmental state ss_k of the current decision step k and, according to the environmental state ss_k, takes an action a_k acting on the environment; the environment feeds back a reward r_k to the deep reinforcement learning agent, thereby realizing the interaction between the deep reinforcement learning agent and the environment.
According to one embodiment of the invention, at the current decision step k, the environmental state ss_k observed by the source node is as follows:
the environmental state ss_k comprises the k-th slice x_k of one packet and the M (e.g. M = 10) most recently received encoded packets stored in the buffer of the source node's next-hop intermediate node, M being an integer greater than 1, i.e. ss_k = (x_k, P_S(1), …, P_S(M)), where P_S(l) is the l-th encoded packet in the source node's next-hop intermediate node buffer.
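How the tuple ss_k is turned into a numeric input vector for the DQN is not specified in the patent; the following sketch shows one plausible featurization, concatenating the byte values of the slice and of the M buffered packets, zero-padded to a fixed length (the packet length is an assumption for the example):

    import numpy as np

    M = 10            # number of buffered packets observed, as in the text
    PKT_LEN = 32      # assumed fixed slice/packet length in bytes (illustrative)

    def to_vector(data, length=PKT_LEN):
        """Represent a slice/packet as a fixed-length float vector of its byte values."""
        padded = bytes(data[:length]).ljust(length, b"\x00")
        return np.frombuffer(padded, dtype=np.uint8).astype(np.float32)

    def build_source_state(x_k, next_hop_buffer):
        """ss_k = (x_k, P_S(1), ..., P_S(M)) flattened into a single vector for the DQN input."""
        buffered = list(next_hop_buffer)[-M:]            # the M most recently received packets
        buffered += [b""] * (M - len(buffered))          # pad if fewer than M are known yet
        parts = [to_vector(x_k)] + [to_vector(p) for p in buffered]
        return np.concatenate(parts)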
Specifically, in the current environmental state ss_k, the source node performs action a_k:
at each decision step k, the source node selects an action a_k ∈ A_S to determine the coding coefficient g(x_k) of the k-th slice x_k of the packet, g(x_k) = a_k, where A_S = {0, 1, …, (q−1)} and q is the field size of the Galois field; in one embodiment q = 2, in another embodiment q is a positive integer.
According to one embodiment of the invention, the reward r_k received from the environment after the source node performs action a_k in the current environmental state ss_k is as follows:
when the encoded packet sent by the source node can increase the rank of the linear system formed by the encoded packets in the source node's next-hop intermediate node buffer, r_k = 1; otherwise, r_k = 0.
After the K decision steps, the coding coefficients of the K slices of one packet have been determined; the source node then encodes the K slices using the determined coding coefficients and transmits the encoded packet P_S, with P_S = G_S · X, where X = [x_1, x_2, …, x_K] and G_S = [g(x_1), g(x_2), …, g(x_K)].
In one embodiment, the source node keeps a copy of the encoded packets addressed to the next-hop node, forming a mirror of the next-hop intermediate node buffer P_S(1), …, P_S(M) on the source node, and confirms whether the next-hop node has received an encoded packet through the ACK fed back by the next-hop node. If the node does not receive an ACK, meaning that the next-hop node did not receive the encoded packet, P_S(1), …, P_S(M) does not change, i.e. the buffered-packet component of the state ss_k when the source node sends the next encoded packet does not change relative to sending the current packet. If the node receives an ACK, meaning that the next-hop node successfully received the encoded packet, P_S(1), …, P_S(M) changes, i.e. the buffered-packet component of the state ss_k when the source node sends the next encoded packet changes relative to sending the current packet. It follows that whether an ACK is received is determined by the link quality, which in turn affects the encoded packets stored in the buffer P_S(1), …, P_S(M); the coding model of the source node can therefore adaptively adjust the coding coefficients according to changes in network link quality.
In one embodiment, after all K steps have been performed and the encoded packet has been sent to the next-hop node, the rewards of steps 1 through K are determined; the rewards of the K steps are the same. Since the node retains in its buffer mirror P_S(1), …, P_S(M) the encoded packets accepted by the next-hop node, the node can evaluate the behavior according to whether the transmitted encoded packet changes the rank of the linear system formed by the encoded packets in P_S(1), …, P_S(M), regardless of whether an ACK is received.
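The reward check amounts to testing whether a newly accepted encoded packet is innovative, i.e. whether its coding vector increases the rank of the linear system already held in the buffer mirror. A minimal sketch over GF(2), maintaining an incrementally reduced basis of coding vectors (the data layout is an assumption for illustration):

    def reward_for_packet(basis, coding_vector):
        """r_k = 1 if the delivered coding vector (a 0/1 list over GF(2)) increases the rank of
        the linear system formed by the packets already in the next-hop buffer, else r_k = 0."""
        v = list(coding_vector)
        for row in basis:
            pivot = next(i for i, bit in enumerate(row) if bit)   # leading 1 of the basis row
            if v[pivot]:
                v = [a ^ b for a, b in zip(v, row)]               # eliminate over GF(2)
        if any(v):
            basis.append(v)      # innovative packet: rank grows, keep the reduced vector
            return 1
        return 0

    # Example with K = 3 slices: a repeated packet is linearly dependent and earns no reward.
    basis = []
    print(reward_for_packet(basis, [1, 0, 1]))   # 1: rank grows to 1
    print(reward_for_packet(basis, [1, 0, 1]))   # 0: dependent, rank unchanged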
Coding of intermediate nodes and their corresponding environmental states, behaviors and rewards
FIG. 2 shows the intermediate node encoding process based on deep reinforcement learning. The process by which the current intermediate node i re-encodes a packet P_j received from its previous-hop node j is regarded as a Markov decision process (MDP) comprising M (e.g. M = 10) decision steps; in the k-th decision step, intermediate node i decides the coding coefficient of the k-th encoded packet in its buffer and XORs that packet with the current encoded packet P_new. In the first decision step, i.e. when k = 1, P_new = P_j.
According to one embodiment of the invention, two large modules of a deep reinforcement learning agent and a network environment in an intermediate node coding coefficient optimization model based on deep reinforcement learning are designed as follows:
(1) The intermediate node is regarded as a deep reinforcement learning agent;
(2) The abstract environment is a network formed by a current intermediate node i and a next-hop node of the intermediate node i, and comprises the intermediate node i, the next-hop node of the intermediate node i and a link formed by the intermediate node i and the next-hop node z of the intermediate node i;
(3) The deep reinforcement learning agent observes the environmental state s_k of the current decision step k and, according to the environmental state s_k, takes an action a_k acting on the environment; the environment feeds back a reward r_k to the deep reinforcement learning agent, thereby realizing the interaction between the deep reinforcement learning agent and the environment.
According to one embodiment of the invention, the environmental state s_k observed by intermediate node i at the current decision step k is as follows:
the environmental state s_k comprises the current encoded packet P_new, the k-th encoded packet P_j(k) in the buffer of intermediate node i, and the M (e.g. M = 10) most recently received encoded packets stored in the buffer of the next-hop node z of intermediate node i, i.e. s_k = (P_new, P_j(k), P_i(1), …, P_i(M)), where P_i(l) is the l-th encoded packet in the buffer of the next-hop node z of intermediate node i, and P_j(1), P_j(2), …, P_j(M) are received earlier than P_j.
According to one embodiment of the invention, in the current environmental state s_k, intermediate node i performs action a_k:
at each decision step k, intermediate node i selects an action a_k ∈ A_R to determine the coding coefficient g(P_j(k)), g(P_j(k)) = a_k, where A_R = {0, 1, …, (q−1)} and q is the field size of the Galois field; in one embodiment q = 2, in another embodiment q is a positive integer.
According to one embodiment of the invention, the reward r_k received from the environment after intermediate node i performs action a_k in the current environmental state s_k is as follows:
when the encoded packet sent by intermediate node i can increase the rank of the linear system formed by the encoded packets in the buffer of the next-hop node z of intermediate node i, r_k = 1; otherwise, r_k = 0.
After the k-th decision step, the current encoded packet P_new has been re-encoded, i.e. P_new = P_new ⊕ (g(P_j(k)) · P_j(k)); in particular, when k = 1, P_new = P_j ⊕ (g(P_j(1)) · P_j(1)). After M decision steps, the encoded packet P_j received by intermediate node i from the previous-hop node j has been re-encoded M times, and finally intermediate node i transmits the encoded packet P_new obtained after the last decision step M.
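A minimal sketch of the M re-encoding steps with q = 2 (the packet contents are illustrative, and the fixed coefficient list stands in for the coefficients that the intermediate node's DQN would choose step by step):

    def xor_bytes(a, b):
        """Bytewise XOR of two equal-length packets."""
        return bytes(x ^ y for x, y in zip(a, b))

    def reencode(P_j, buffer_packets, coefficients):
        """P_new starts from the received packet P_j; at step k it absorbs the buffered packet
        P_j(k) whenever its chosen coefficient g(P_j(k)) is 1: P_new = P_new XOR (g * P_j(k))."""
        P_new = P_j
        for g, pkt in zip(coefficients, buffer_packets):
            if g:                      # over GF(2), multiplying by g keeps or drops the packet
                P_new = xor_bytes(P_new, pkt)
        return P_new

    # Example with M = 3 buffered packets.
    P_new = reencode(b"\x10\x20", [b"\x01\x02", b"\x03\x04", b"\x05\x06"], coefficients=[1, 0, 1])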
In one embodiment, the intermediate node keeps a copy of the encoded packets addressed to the next-hop node, forming a mirror of the next-hop node buffer P_i(1), …, P_i(M) on the intermediate node, and confirms whether the next-hop node has received an encoded packet through the ACK fed back by the next-hop node. If the node does not receive an ACK, meaning that the next-hop node did not receive the encoded packet, P_i(1), …, P_i(M) does not change, i.e. the buffered-packet component of the state s_k when intermediate node i sends the next encoded packet does not change relative to sending the current packet. If the node receives an ACK, meaning that the next-hop node successfully received the encoded packet, P_i(1), …, P_i(M) changes, i.e. the buffered-packet component of the state s_k when intermediate node i sends the next encoded packet changes relative to sending the current packet. It follows that whether an ACK is received is determined by the link quality, which in turn affects the encoded packets stored in the buffer P_i(1), …, P_i(M); the coding model of the intermediate node can therefore adaptively adjust the coding coefficients according to changes in network link quality.
In one embodiment, after all M steps have been performed and the encoded packet has been sent to the next-hop node, the rewards of steps 1 through M are determined; the rewards of the M steps are the same. Since the node retains in its buffer mirror P_i(1), …, P_i(M) the encoded packets accepted by the next-hop node, the node can evaluate the behavior according to whether the transmitted encoded packet changes the rank of the linear system formed by the encoded packets in P_i(1), …, P_i(M), regardless of whether an ACK is received.
FIG. 3 illustrates a device functional configuration block diagram of intelligent network coding for deep reinforcement learning according to an embodiment of the present invention. The apparatus includes: a source node coding coefficient optimization unit configured to optimize coding coefficients of a data packet on a source node by a depth reinforcement learning coding coefficient optimization model of the source node; an intermediate node coding coefficient optimization unit configured to optimize coding coefficients of the data packet on the intermediate node by a depth reinforcement learning coding coefficient optimization model of the intermediate node; an intelligent network coding unit configured to code information according to the optimized coding coefficient; and the data packet forwarding unit is configured to forward the encoded data packet.
The effects of the present invention will be described below in terms of simulation and platform verification experiments of the present invention.
This example uses the TensorFlow 1.15 framework with Python 3.5 to construct the intelligent network coding method based on deep reinforcement learning and the architecture of the deep neural network. The example considers a multi-hop linear network topology with a single source, multiple intermediate nodes and a single destination, and a multi-hop parallel network topology; FIG. 4 shows the multi-hop linear network topology and FIG. 5 shows the multi-hop parallel network topology.
The intelligent network coding method based on deep reinforcement learning is evaluated using two performance indexes: decoding rate and overhead. Before analyzing the experimental results, the concepts and terms involved in the experiments are briefly described:
Decoding rate: the probability of successful decoding (recovering the original information) after the destination node receives P data packets;
Overhead: used to measure the decoding efficiency of different coding algorithms; it is defined in terms of K, the number of packets into which one piece of information is divided, E, the number of redundant packets when network coding is used, and Nr, the number of packets received at the destination node.
Link quality: this patent represents link quality by the packet error rate (PER). The probability of erroneous packet transmission for a given signal-to-interference-plus-noise ratio (SINR) value γ is PER(γ) = 1 − (1 − BER(γ))^{N_b}, where N_b is the size of a data packet (unit: bit) and BER(γ) is the bit error rate for the given SINR value γ, which depends on the technology employed by the physical layer and the statistical characteristics of the channel.
FIG. 6 shows, for the multi-hop linear network topology with a packet error rate of 0.1 on every link, the relationship between the decoding rate of this example and the number of packets transmitted by the source node and the number of intermediate nodes. It can be seen that as the number of packets transmitted by the source node increases and as the number of intermediate nodes increases, the probability of successful decoding at the destination node improves. Furthermore, for the same number of packets received by the destination node, the larger K is, the lower the destination node's decoding probability. For K = 5, the overhead when the number of intermediate nodes (N) equals 2, 4, 6, 8 is 12.2%, 15.1%, 19.2% and 20.1%, respectively. For K = 10, the overhead when the number of intermediate nodes (N) equals 2, 4, 6, 8 is 2.5%, 4.2%, 4.5% and 5.2%, respectively. The more intermediate nodes there are, the more data packets travel over longer paths (more intermediate nodes) to reach the destination node and the larger the total packet loss rate becomes; some information packets cannot reach the destination node, so the source node needs to send considerable redundant information, and the number Nr of data packets finally received by the destination node (the numerator in the overhead formula) increases, which increases the overhead.
FIG. 7 shows, for the multi-hop parallel network topology with a packet error rate of 0.1 on the links between the source node and the intermediate nodes, 0.3 on the links between the intermediate nodes and the destination node, and 0.8 on the link between the source node and the destination node, the relationship between the decoding rate and the number of packets transmitted by the source node and the number of intermediate nodes. It can be seen that as the number of packets transmitted by the source node increases and as the number of intermediate nodes increases, the probability of successful decoding at the destination node improves. Furthermore, for the same number of packets received by the destination node, the larger K is, the lower the destination node's decoding probability. For K = 5, the overhead when the number of intermediate nodes (N) equals 2, 6, 10, 14 is 12.2%, 15.1%, 19.2% and 20.1%, respectively. For K = 10, the overhead when the number of intermediate nodes (N) equals 2, 6, 10, 14 is 4.8%, 4.1%, 3.8% and 3.1%, respectively.
FIG. 8 shows the generalization capability of the invention over different numbers of intermediate nodes in the linear topology with a packet error rate of 0.1 on every link. A DQN model, defined as Train_{N=1}, is first trained for this example with the number of intermediate nodes N = 1. The trained DQN model is then used to test the decoding rate for other numbers of intermediate nodes; these test results are defined as (Test_{N=i}, Train_{N=1}), i = 2, 4, 6, 8. Finally, these results are compared with the training-and-testing results obtained at the same number of intermediate nodes, defined as (Test_{N=i}, Train_{N=i}), i = 2, 4, 6, 8. It can be seen that the (Test_{N=i}, Train_{N=1}) results and the (Test_{N=i}, Train_{N=i}) results are quite consistent, with root mean square errors (RMSE) of 0.0034, 0.0072, 0.011 and 0.015 at N = 2, 4, 6, 8 respectively, which verifies the generalization capability of the method of the invention over different network scales.
FIG. 9 shows the generalization capability of the invention over different link qualities in the linear topology with the number of intermediate nodes N = 1. A DQN model is trained for this example with a packet error rate PER_{S-R1} = 0.3 on the link between the source S and the intermediate node R1 of FIG. 4 and a packet error rate PER_{R1-D} = 0.3 on the link between the intermediate node R1 and the destination node D; this model is defined as Train_{(0.3, 0.3)}. The trained DQN model is then used to test the decoding rate at other link qualities, (PER_{S-R1} = 0, PER_{R1-D} = 0), (PER_{S-R1} = 0.1, PER_{R1-D} = 0.3) and (PER_{S-R1} = 0.1, PER_{R1-D} = 0.5); a result (Test_{(u, v)}, Train_{(w, y)}) denotes testing, at link quality PER_{S-R1} = u, PER_{R1-D} = v, the DQN model trained at link quality PER_{S-R1} = w, PER_{R1-D} = y. Finally, these results are compared with the training-and-testing results at the same link quality. It can be seen that for the present invention the two sets of results are quite consistent, with root mean square errors (RMSE) of 0, 0.002 and 0.003 at the link qualities (PER_{S-R1} = 0, PER_{R1-D} = 0), (PER_{S-R1} = 0.1, PER_{R1-D} = 0.3) and (PER_{S-R1} = 0.1, PER_{R1-D} = 0.5) respectively, which verifies the generalization capability of the method of the invention over different link qualities.
Finally, the performance of the embodiment of the invention is evaluated on a real test platform. The source node coding coefficient optimization unit, intermediate node coding coefficient optimization unit, intelligent network coding unit and data packet forwarding unit are configured, and the experiments are carried out on Raspberry Pi 3B+ devices. The Raspberry Pi 3B+ has a 1.4 GHz ARM A53 processor, 1 GB of RAM, and integrated wireless and Bluetooth functionality. The DQN model trained in this example is deployed to the Raspberry Pi 3B+ using TensorFlow Lite. In this experiment, the example of the invention is compared with a conventional reference coding algorithm and an existing reinforcement learning based coding algorithm (RL-aided SNC: Dynamic Sparse Coded Multi-Hop Transmissions using Reinforcement Learning). In the reference coding algorithm, the source node uses a conventional fountain code while the intermediate nodes use a random network coding algorithm. Meanwhile, the decoding results obtained in the simulation environment are compared with the decoding results on the real test platform.
FIG. 10 shows the decoding rate of this example compared with the conventional reference coding algorithm and the existing reinforcement learning coding algorithm RL-aided SNC in the multi-hop linear topology with the packet error rate of each link equal to 0.1 and K = 5. It can be seen that, for the same number of intermediate nodes, the decoding rate of the present invention is higher. In addition, the simulation results are consistent with the results obtained on the real test platform: in the simulation environment and on the real test platform, the root mean square errors of the decoding results of the three coding algorithms are 0.0042, 0.0153 and 0.0379, respectively.
The experimental result of the embodiment shows that the intelligent network coding method based on the deep reinforcement learning has higher decoding rate and lower cost compared with the existing coding method.
It should be noted that, the steps in the foregoing embodiments are not necessary, and those skilled in the art may perform appropriate operations, substitutions, modifications and the like according to actual needs.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and are not limiting. Although the invention has been described in detail with reference to the embodiments, those skilled in the art will understand that modifications and equivalents may be made thereto without departing from the spirit and scope of the invention, which is intended to be covered by the claims.

Claims (12)

1. A network coding method, the network comprising a source node and an intermediate node, the method comprising:
the source node divides the information to be transmitted into K pieces x_1, x_2, …, x_K, K being an integer greater than 1, determines the coding coefficients g(x_1), g(x_2), …, g(x_K) according to a source node coding model, encodes the K pieces to generate an encoded packet P_S, and transmits the encoded packet P_S to the next-hop node, wherein the source node coding model is obtained by training a DQN network, wherein, during the source node coding model training process, the DQN of the source node is trained by randomly sampling experience from an experience replay memory, wherein the environmental state of each step, ss_k = (x_k, P_S(1), …, P_S(M)), is used as the training input and training is performed by minimizing a predetermined loss function, the output being the optimal cumulative reward value Q obtained after the selected behavior in the environmental state; ss_k is the environmental state of the k-th step, x_k is the k-th piece of the packet, and P_S(1), …, P_S(M) are the M most recently received encoded packets stored in the buffer of the source node's next-hop intermediate node, M being an integer greater than 1;
the intermediate node receives the encoded packets sent by the previous node, re-encodes the received encoded packet P_j over M steps, determines the coding coefficients g(P_j(1)), g(P_j(2)), …, g(P_j(M)) according to an intermediate node coding model, generates an encoded packet P_new, and transmits the encoded packet P_new to the next-hop node, wherein the intermediate node coding model is obtained by training a DQN network, wherein, during the intermediate node coding model training process, the DQN of the intermediate node is trained by randomly sampling experience from an experience replay memory, wherein the environmental state of each step, s_k = (P_new, P_j(k), P_i(1), …, P_i(M)), is used as the training input and training is performed by minimizing a predetermined loss function, the output being the optimal cumulative reward value Q obtained after the selected behavior in the environmental state; s_k is the environmental state of the k-th step, P_new is the current encoded packet, P_j(k) is the k-th encoded packet in the intermediate node buffer, and P_i(1), …, P_i(M) are the M most recently received encoded packets stored in the buffer of the intermediate node's next-hop node z.
2. The method of claim 1, wherein the source node coding model comprises a target network N_s and an execution network N_snode, and the training of the source node coding model comprises the following steps:
Step 110: training N_s by randomly sampling experience from the experience replay memory M_s;
Step 120: sending the trained DQN parameters of N_s to the source node to update N_snode; and/or
Step 130: at the source node, taking the environmental state ss_k as the input of the DQN model of N_snode, outputting the Q value corresponding to each behavior, selecting a behavior with greedy-strategy probability ε to determine the coding coefficients of the K pieces of the original information, collecting the experience of the source node interacting with the environment after execution, and storing the experience in the experience replay memory M_s.
3. The method of claim 1, wherein the intermediate node coding model comprises a target network N_R and an execution network N_Rnode, and the training of the intermediate node coding model includes:
Step 210: training N_R by randomly sampling experience from the experience replay memory M_R;
Step 220: sending the trained DQN parameters of N_R to each intermediate node to update N_Rnode; and/or
Step 230: at each intermediate node, taking the environmental state s_k as the input of the DQN model of N_Rnode, outputting the Q value corresponding to each behavior, selecting behaviors with greedy-strategy probability ε to determine the coding coefficients of the M packets in the intermediate node buffer, collecting the experience of the intermediate node interacting with the environment after execution, and storing the experience in the experience replay memory M_R.
4. The method of claim 2, wherein the training of N_s includes:
taking the coding network environmental state ss_k as the input of N_s and training the neural network by minimizing the loss function L(θ_k) = (Q_target − Q(ss_k, a_k; θ_k))², where k takes the values 1…K and Q_target is the target Q value calculated by N_s;
a_k represents the behavior of the k-th step;
r_k represents the reward after taking the action in the k-th step;
θ_k represents the network parameters of the DQN at step k;
ss_{k+1} represents the environmental state of the network coding at step k+1;
Q(ss_k, a_k; θ_k) represents the Q value after selecting behavior a_k in the environmental state ss_k.
5. The method according to claim 3, wherein the training of N_R includes:
taking the coding network environmental state s_k as the input of N_R and training the neural network by minimizing the loss function L(θ_k) = (Q_target − Q(s_k, a_k; θ_k))², where k takes the values 1…M;
Q_target is the target Q value calculated by N_R;
a_k represents the behavior of the k-th step;
r_k represents the reward after taking the action in the k-th step;
θ_k represents the network parameters of the DQN at step k;
s_{k+1} represents the environmental state of the network coding at step k+1;
Q(s_k, a_k; θ_k) represents the Q value after selecting behavior a_k in the environmental state s_k.
6. The method of claim 4, wherein for N_s:
a_k is the coding coefficient of the k-th slice x_k of the information, a_k ∈ A_S, where A_S = {0, 1, …, (q−1)} and q is the field size of the Galois field;
r_k is 1 when the encoded packet sent by the source node can increase the rank of the linear system formed by the encoded packets in the source node's next-hop intermediate node buffer; otherwise, r_k is 0.
7. The method of claim 5, wherein for N_R:
a_k is the coding coefficient of the k-th packet, a_k ∈ A_R, where A_R = {0, 1, …, (q−1)} and q is the field size of the Galois field;
r_k is 1 when the encoded packet sent by the intermediate node increases the rank of the linear system formed by the encoded packets in the intermediate node's next-hop node buffer; otherwise, r_k is 0.
8. The method of claim 1, wherein if the source node does not receive an ACK, the buffered-packet component P_S(1), …, P_S(M) of the source node's ss_k remains unchanged; if the intermediate node does not receive an ACK, the buffered-packet component P_i(1), …, P_i(M) of the intermediate node's s_k remains unchanged.
9. The method of claim 1, wherein the source node generates the encoded packet P_S by:
P_S = G_S · X, where X = [x_1, x_2, …, x_K] and G_S = [g(x_1), g(x_2), …, g(x_K)].
10. The method of claim 1, wherein the k-th of the M encodings of the intermediate node comprises:
when k = 1, P_new = P_j ⊕ (g(P_j(k)) · P_j(k));
when k > 1, P_new = P_new ⊕ (g(P_j(k)) · P_j(k));
where P_j(k) is the k-th encoded packet in the buffer of the intermediate node and k takes the values 1…M.
11. A computer readable storage medium, in which one or more computer programs are stored which, when executed, are adapted to carry out the method of any one of claims 1-10.
12. A network coded computing system, comprising
A storage device, and one or more processors;
wherein the storage means is for storing one or more computer programs which, when executed by the processor, are for implementing the method of any of claims 1-10.
CN202011344089.5A 2020-11-26 2020-11-26 Intelligent network coding method and equipment based on deep reinforcement learning Active CN112564712B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011344089.5A CN112564712B (en) 2020-11-26 2020-11-26 Intelligent network coding method and equipment based on deep reinforcement learning
PCT/CN2021/118099 WO2022110980A1 (en) 2020-11-26 2021-09-14 Intelligent network coding method and device based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011344089.5A CN112564712B (en) 2020-11-26 2020-11-26 Intelligent network coding method and equipment based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN112564712A CN112564712A (en) 2021-03-26
CN112564712B true CN112564712B (en) 2023-10-10

Family

ID=75045041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011344089.5A Active CN112564712B (en) 2020-11-26 2020-11-26 Intelligent network coding method and equipment based on deep reinforcement learning

Country Status (2)

Country Link
CN (1) CN112564712B (en)
WO (1) WO2022110980A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112564712B (en) * 2020-11-26 2023-10-10 中国科学院计算技术研究所 Intelligent network coding method and equipment based on deep reinforcement learning
CN116074891A (en) * 2021-10-29 2023-05-05 华为技术有限公司 Communication method and related device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102209079A (en) * 2011-06-22 2011-10-05 北京大学深圳研究生院 Transmission control protocol (TCP)-based adaptive network control transmission method and system
CN104079483A (en) * 2013-03-29 2014-10-01 南京邮电大学 Multistage security routing method for delay tolerant network and based on network codes
CN110519020A (en) * 2019-08-13 2019-11-29 中国科学院计算技术研究所 Unmanned systems network intelligence cross-layer data transmission method and system
WO2020215462A1 (en) * 2019-04-24 2020-10-29 香港中文大学(深圳) Batch encoding-based network communication method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9160687B2 (en) * 2012-02-15 2015-10-13 Massachusetts Institute Of Technology Method and apparatus for performing finite memory network coding in an arbitrary network
US9787614B2 (en) * 2015-06-03 2017-10-10 Aalborg Universitet Composite extension finite fields for low overhead network coding
CN111770546B (en) * 2020-06-28 2022-09-16 江西理工大学 Delay tolerant network random network coding method based on Q learning
CN112564712B (en) * 2020-11-26 2023-10-10 中国科学院计算技术研究所 Intelligent network coding method and equipment based on deep reinforcement learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102209079A (en) * 2011-06-22 2011-10-05 北京大学深圳研究生院 Transmission control protocol (TCP)-based adaptive network control transmission method and system
CN104079483A (en) * 2013-03-29 2014-10-01 南京邮电大学 Multistage security routing method for delay tolerant network and based on network codes
WO2020215462A1 (en) * 2019-04-24 2020-10-29 香港中文大学(深圳) Batch encoding-based network communication method and system
CN110519020A (en) * 2019-08-13 2019-11-29 中国科学院计算技术研究所 Unmanned systems network intelligence cross-layer data transmission method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Multi-path coding routing protocol for event-driven sensor networks; 仝杰; 钱德沛; 刘轶; 李世晗; Journal of Xi'an Jiaotong University (Issue 06); full text *

Also Published As

Publication number Publication date
CN112564712A (en) 2021-03-26
WO2022110980A1 (en) 2022-06-02

Similar Documents

Publication Publication Date Title
CN112564712B (en) Intelligent network coding method and equipment based on deep reinforcement learning
KR101751497B1 (en) Apparatus and method using matrix network coding
CN107994971B (en) Coding transmission method and coding communication system for limited buffer area relay link
CN108092742B (en) A kind of communication means based on polarization code
CN103650399A (en) Adaptive generation of correction data units
CN112468265B (en) Wireless local area network modulation coding self-adaptive selection method based on reinforcement learning and wireless equipment
Wang et al. INCdeep: Intelligent network coding with deep reinforcement learning
Saxena et al. Deep learning for frame error probability prediction in BICM-OFDM systems
US9876608B2 (en) Encoding apparatus and encoding method
CN101431358B (en) Vertical layered space-time signal detection method based on M-elite evolution algorithm
CN113923743A (en) Routing method, device, terminal and storage medium for electric power underground pipe gallery
Zhang et al. Deep Deterministic Policy Gradient for End-to-End Communication Systems without Prior Channel Knowledge
WO2018157263A1 (en) Generalized polar codes
CN110505681B (en) Non-orthogonal multiple access scene user pairing method based on genetic method
CN109039531B (en) Method for adjusting LT code coding length based on machine learning
Zhao et al. Joint Computing Resource and Bandwidth Allocation for Semantic Communication Networks
CN109660317A (en) Quantum network transmission method based on self-dual quantum low-density parity check error correction
CN115378548A (en) Connectionless-oriented binary superposition determined linear network coding transmission method
CN117581493A (en) Link adaptation
Ao et al. Deep reinforcement learning based spinal code transmission strategy in long distance FSO communication
CN110190931B (en) Recursive chaotic channel coding method
TWI833065B (en) Network optimizer and network optimization method thereof
CN115515181B (en) Distributed computing method and system based on network coding in wireless environment
CN112332862A (en) Polarization code incremental redundancy hybrid retransmission method and device based on deep reinforcement learning
CN115208821B (en) Cross-network route forwarding method and device based on BP neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant