CN111629415B - Opportunistic routing protocol design method based on Markov decision process model - Google Patents

Opportunistic routing protocol design method based on Markov decision process model

Info

Publication number
CN111629415B
Authority
CN
China
Prior art keywords
node
state
value
packet
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010331293.7A
Other languages
Chinese (zh)
Other versions
CN111629415A (en)
Inventor
黄成
尹政
刘子淇
刘振光
姚文杰
徐志良
王力立
Current Assignee
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority claimed from application CN202010331293.7A
Publication of CN111629415A
Application granted
Publication of CN111629415B

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04W — WIRELESS COMMUNICATION NETWORKS
    • H04W 40/00 — Communication routing or communication path finding
    • H04W 40/02 — Communication route or path selection, e.g. power-based or shortest-path routing
    • H04W 40/04 — Communication route or path selection based on wireless node resources
    • H04W 40/10 — Communication route or path selection based on available power or energy
    • H04W 40/12 — Communication route or path selection based on transmission quality or channel quality
    • H04W 4/00 — Services specially adapted for wireless communication networks; facilities therefor
    • H04W 4/30 — Services specially adapted for particular environments, situations or purposes
    • H04W 4/38 — Services for collecting sensor information
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 30/00 — Reducing energy consumption in communication networks
    • Y02D 30/70 — Reducing energy consumption in wireless communication networks

Abstract

The invention discloses an opportunistic routing protocol based on a Markov decision process model, comprising the following steps. First, the environmental link quality is evaluated and the packet reception rate is estimated: packet reception rate data under the same RSSI value, LQI mean values, and packet reception rate data at different communication distances are acquired to establish a sample space, and curve-family regression fitting of the LQI mean value against the packet reception rate data yields an estimation formula for the packet reception rate. Wireless sensor nodes are then deployed to construct a wireless sensor network; each sensor node periodically broadcasts and receives detection packets and establishes a neighbor information table; each sensor node establishes a candidate node set; the node holding a valid data packet broadcasts it, each candidate node that receives the packet recalculates its corresponding state value according to the value iteration formula, and the sender selects the returning node with the largest corresponding state value as the next-hop forwarding node. The invention optimizes and balances the energy use of the wireless sensor network.

Description

Opportunistic routing protocol design method based on Markov decision process model
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to a method for designing an opportunistic routing protocol based on a Markov decision process model.
Background
The wireless sensor network is a network formed by a plurality of sensor nodes in a multi-hop self-organizing mode, and has a very wide application prospect. In a great deal of research work on wireless sensor networks, the research on routing protocols is always key content, and reasonable routing design can effectively improve network performance. Because the sensor nodes are scattered in the unmanned area at random, the battery is difficult to replace, and the problems of saving the consumption of the node energy and balancing the use of the network energy become unavoidable.
Conventional wireless sensor network routing protocols select one or more optimized fixed paths before data transmission begins, and data packets are transmitted along the predetermined fixed paths. Unlike conventional routing protocols, each node in the opportunistic routing that receives a data packet may act as a relay node, and the routing path from the source node to the destination node is not fixed. And each node in the network acquires neighbor node information and network parameters through periodically sending and receiving the detection packet, and selects a proper neighbor node as a candidate node to form a candidate node set (CRS). In the forwarding process, the node selects an optimal next-hop forwarding node from candidate nodes which successfully receive the data packet. This process is repeated until the packet is forwarded to the destination node.
The traditional routing protocol has the defects that its routing path is fixed, it cannot adapt well to changes in the network environment, and its performance is strongly affected by changes in network link quality, node residual energy, node position, and other factors. If transmission to the next-hop node fails during some data transmission, the sender retransmits the data packet until the next-hop node receives it successfully, and the packet is not forwarded by other nodes that did receive it, wasting network resources. The disadvantage of opportunistic routing is that the data packet is forwarded hop by hop toward the destination node while the forwarding node has no global information about the network, so routing decisions can rely only on the position information of the neighbor nodes and the destination node to ensure that the packet keeps progressing toward the destination. In this case, if the positioning of the sensor nodes in the network deviates or the node positions change unpredictably, routing protocol performance is severely affected. Besides position-information error, the opportunistic routing protocol considers only the parameters of neighbor nodes and seeks only a single-step optimum, so a global optimum cannot be achieved. These problems limit further improvement of opportunistic routing protocol performance; overcoming the dependence on position information and selecting the best forwarding node from a global perspective remain the main difficulties.
Disclosure of Invention
The invention aims to provide an opportunistic routing protocol based on a Markov decision process model, so as to solve the problems that the routing protocol performance is poor because node position information is relied on and a global optimal solution cannot be realized in the prior art; the invention realizes the design of the opportunistic routing protocol through the Markov decision process of reinforcement learning, so that the energy use of the wireless sensor network is optimized and balanced, and the aim of prolonging the life cycle of the network is fulfilled.
The technical solution for realizing the purpose of the invention is as follows:
An opportunistic routing protocol based on a Markov decision process model comprises the following steps:
step 1, evaluating the environmental link quality and the packet receiving rate:
acquiring packet receiving rate data under the same RSSI value and LQI average value and packet receiving rate data under different communication distances, establishing a sample space, and performing curve family regression fitting on the LQI average value and the packet receiving rate data to obtain an estimation formula of the packet receiving rate;
step 2, deploying wireless sensor nodes and constructing a wireless sensor network: the wireless sensor network comprises a sink node, which is responsible for collecting the data gathered by ordinary nodes in the area and uploading it to the network;
step 3, periodically broadcasting and receiving the detection packet by the sensor node, and establishing a neighbor information table;
step 4, the sensor node establishes a candidate node set;
step 5, solving the forwarding node selection problem in opportunistic routing with a Markov decision process: the node holding a valid data packet broadcasts it, each candidate node that receives the data packet recalculates its corresponding state value according to the value iteration formula, and the sender selects the node with the largest corresponding state value as the next-hop forwarding node.
Step 6, repeating the data packet forwarding process of the step 5 until the data packet is forwarded to the sink node; and finally, obtaining an optimal routing path by continuously carrying out data packet forwarding and state value iteration.
Compared with the prior art, the invention has the remarkable advantages that:
(1) The invention introduces the Markov decision process, a classical reinforcement learning method, into the field of wireless sensor network routing protocol design. When modeling the opportunistic routing optimization problem as a Markov Decision Process (MDP), the probability transition matrix P is derived from the packet reception rates between sensor nodes, and the optimal solution of the MDP model is then found by dynamic programming, making the algorithm more efficient and better-converging.
(2) The invention calculates the packet receiving rate between the sensor nodes in real time by utilizing the Received Signal Strength Indication (RSSI) and the Link Quality Indication (LQI) provided by the network physical layer, so that the algorithm can adapt to the change of the network state.
(3) The invention designs the opportunistic routing protocol with a reinforcement-learning Markov decision process model, does not depend on the position information of the sensor nodes, and after continuous learning can find a suitable forwarding path from a globally optimal perspective.
Drawings
FIG. 1 is a schematic view of the present invention
FIG. 2 is a schematic diagram of a sensor network layout
FIG. 3 is a schematic diagram of state transition matrix computation
FIG. 4 is a plot of the relation between the action reward R and the current node's remaining-energy ratio k_E
FIG. 5 is a schematic diagram of a learned opportunistic routing path
FIG. 6 is a schematic diagram of an end-of-life wireless sensor network
Detailed Description
The invention is further described with reference to the drawings and specific embodiments.
The invention provides an opportunity routing protocol based on a Markov decision process model, which utilizes reinforcement learning to find an energy optimal forwarding path, and comprises the following specific steps:
step 1, evaluating the quality of an environmental link, and providing a packet receiving rate evaluation method:
and selecting a certain area as a data acquisition area, carrying out multiple communication experiments in the area by utilizing two sensor nodes under different communication distances, acquiring packet receiving rate data under the same RSSI value, and establishing a sample space by LQI average values and packet receiving rate data under different communication distances. When the link communication quality is good, the correlation between the RSSI value and the packet receiving rate is the best, so the RSSI value is used for estimating the packet receiving rate. When-70 dBm is less than or equal to RSSI, the packet receiving rate is 100%; when the RSSI is less than or equal to-75 dBm and less than or equal to-70 dBm, the wrapping yield is 99 percent; when the RSSI is less than or equal to-80 dBm and less than or equal to-75 dBm, the wrapping yield is 98 percent; when the RSSI is less than or equal to-85 dBm and less than or equal to-80 dBm, the packet receiving rate is (RSSI+177%; when the RSSI is less than-85 dBm, the packet receiving rate is estimated by using the LQI mean value, and curve family regression fitting is carried out on the LQI mean value and the packet receiving rate data, so as to obtain an estimation formula of the packet receiving rate. The method utilizes the RSSI and LQI value information carried in the data packet transmission to estimate the packet receiving rate in real time, and can adapt the routing protocol to the change of the network.
Step 2, deploying wireless sensor nodes and constructing a wireless sensor network:
as shown in fig. 2, a plurality of wireless sensor nodes are randomly scattered in the selected data acquisition area to form a wireless sensor network, which comprises a sink node responsible for collecting the data gathered by ordinary nodes in the area and uploading it to the network. The ordinary nodes have limited energy, while the sink node has sufficient energy.
Step 3, periodically broadcasting and receiving the detection packet by the sensor node, and establishing a neighbor information table:
Each sensor node in the sensor network periodically broadcasts a detection packet containing the node ID, the node's corresponding state value, the node's sleep/listening duty cycle, and the node's candidate node set. The node's corresponding state value evaluates the node's worth as a data packet forwarding node, and the candidate node set of a node is the set of nodes that can receive and forward data packets from that node. Each sensor node receives detection packets from its neighbor nodes, builds a neighbor information table, records the RSSI and LQI values observed when receiving data packets, and estimates the packet reception rate between itself and each neighbor using the fitting formula obtained in step 1. Taking the sleep/listening period of a node into account, the probability L_ij that node j successfully receives a packet broadcast by node i can be calculated as:

L_ij = p_ij · k_jw

where p_ij is the packet reception rate between node i and node j, and k_jw is the listening-time duty cycle of node j. This step is performed periodically during the life cycle of the sensor network.
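The listening-discounted link probability is a simple product; a one-function Python sketch (names are illustrative, not from the patent):

```python
def link_success_prob(p_ij, k_jw):
    """Probability L_ij that neighbour j actually receives a broadcast from i:
    the raw packet reception rate p_ij discounted by j's listening duty cycle k_jw."""
    return p_ij * k_jw
```

For instance, a 98% link to a node that listens half the time yields L_ij = 0.49.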
Step 4, the sensor node establishes a candidate node set:
The sensor node sorts its neighbor nodes by their corresponding state values and, in descending order, selects the neighbor nodes whose state values are greater than or equal to its own as candidate nodes, until the probability that a data packet broadcast by the sensor node is successfully received by at least one candidate node exceeds 90%, or no selectable neighbor node remains. The candidate nodes constitute the candidate node set. If a data packet broadcast by a sensor node is not received by any candidate node, the node rebroadcasts it. After the neighbor nodes' state values are updated, the node periodically repeats this step and reestablishes the candidate node set according to the new state values.
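The candidate-set construction of step 4 can be sketched as a greedy loop over neighbors ordered by state value, stopping once the chance that at least one candidate hears the broadcast exceeds the 90% threshold. A minimal sketch under assumed data structures (dicts mapping node id to state value and to L_ij), not the patent's implementation:

```python
def build_candidate_set(neighbors, links, own_value, threshold=0.90):
    """Greedy candidate-set construction (step 4 sketch).
    `neighbors` maps node id -> state value; `links` maps node id -> L_ij.
    Picks neighbours with value >= own_value, best first, until the
    probability that at least one receives the broadcast exceeds `threshold`."""
    eligible = sorted((j for j, v in neighbors.items() if v >= own_value),
                      key=lambda j: neighbors[j], reverse=True)
    crs, p_none = [], 1.0
    for j in eligible:
        crs.append(j)
        p_none *= (1.0 - links[j])       # probability every chosen candidate misses
        if 1.0 - p_none > threshold:     # someone receives with probability > 90%
            break
    return crs
```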
Step 5, solving the forwarding node selection problem in opportunistic routing with a Markov decision process: the node holding a valid data packet broadcasts it, each candidate node that receives the data packet recalculates its corresponding state value according to the value iteration formula, and the sender selects the node with the largest corresponding state value as the next-hop forwarding node.
5.1 modeling opportunistic routing problems with a Markov decision process model:
In the opportunistic routing problem, the agent is a valid data packet that needs to be forwarded. The modeling follows the standard formulation. S denotes the state set, with states denoted by s; the different states of the valid data packet correspond to the packet residing at different sensor nodes, so each state s corresponds to a node, and the value of the state corresponding to a node is that node's state value, representing the node's worth as a forwarding node. A denotes the action set, with actions denoted by a; every action in A broadcasts the data packet and selects a next-hop forwarding node according to some rule, and different actions differ in the rule used to select the next-hop node, hence in the state transition probability matrix P they generate. P is the state transition matrix, giving the probability that the valid data packet resides at a given node after taking a given action; different actions produce different, action-dependent state transition probabilities. R is the action reward: taking an action in a given state generates a corresponding reward.
5.2 calculating a state transition probability matrix P:
according to the invention, a state transition probability matrix P is obtained according to the packet receiving rate between nodes, and fig. 3 is a schematic diagram of state transition probability matrix calculation. In the graph, CRS i is a candidate node set of node i, and the node set has m candidate nodes and corresponds to a state value v (j) 1 )>v(j 2 )>v(j 3 )>v(j 4 )>…>v( m), wherein j1 、j 2 、j 3 、j 4 、j m Are candidate nodes for node i. The action taken by the node where the effective data packet is located is broadcasting the data packet, and the node with the largest state value is selected from candidate nodes receiving the data packet as a forwarding node of the next hop according to a greedy strategy, so that the effective data packet is transferred from the node i to the node j y And is composed of j y Probability of forwarding
Figure GDA0004048064370000051
Can be calculated by the following formula:
Figure GDA0004048064370000052
wherein ,
Figure GDA0004048064370000053
representing node j y Probability of successful reception of a node i broadcast packet, < >>
Figure GDA0004048064370000054
Representing node j t The probability of successfully receiving a broadcast packet by node i, t is the amount of change from 1 to y-1, and y is the amount of change from 1 to m. Node j other than the candidate node x (x=m+1, m+2.,. N) will not act as a forwarding node for the node i broadcast packet, and therefore +.>
Figure GDA0004048064370000055
Is the amount varying from m+1 to N. Probability calculated by node i>
Figure GDA0004048064370000056
The ith row and the jth row of the state transition probability matrix P respectively 1 、j 2 、…、j m 、…、j N And calculating the value of the column and the value of the corresponding row of the state transition probability matrix P by each node to obtain the complete state transition probability matrix P.
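One row of P follows directly from the greedy rule: candidate j_y forwards exactly when it receives the packet and every better-valued candidate misses. A Python sketch of that computation (illustrative, with the leftover probability mass interpreted as node i rebroadcasting, as described in step 4):

```python
def transition_row(link_probs):
    """One row of the state transition matrix P (step 5.2 sketch).
    `link_probs` lists L_{i,j_y} for candidates j_1..j_m ordered by
    DECREASING state value. Under the greedy policy, j_y forwards only
    when it receives the packet and every better-valued candidate misses."""
    row, p_all_miss = [], 1.0
    for L in link_probs:
        row.append(p_all_miss * L)   # P(packet moves from i to this candidate)
        p_all_miss *= (1.0 - L)
    return row, p_all_miss           # residue: nobody received; i rebroadcasts
```

With two candidates at L = 0.5 each, the row is [0.5, 0.25] and the residue 0.25, summing to 1.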
5.3 developing a reward function R:
In reinforcement learning, each step taken generates an action reward R, with R^a_s denoting the reward obtained by taking action a in state s. In each state, every action in the action set A broadcasts the data packet and selects a next-hop candidate according to some rule. Broadcasting the data packet is the actual action that incurs the reward, so the rewards of all actions in the same action set are equal, and the action reward depends only on the current state.
A reasonably formulated action reward lets the reinforcement learning algorithm realize the corresponding optimization task. The final purpose of energy-aware opportunistic routing is to optimize network energy use and extend the network life cycle. To achieve this, network energy must on the one hand be saved, forwarding the data packet to the destination node along the shortest possible path, and on the other hand be balanced, preventing some nodes from exhausting their energy prematurely through frequent use. To balance these two concerns, the invention formulates an action reward function R_s = −1 + f(k_E), where R_s denotes the action reward for broadcasting a data packet in state s, k_E is the ratio of the current node's remaining energy to its initial energy, and f(k_E) is a function of k_E. Broadcasting a data packet consumes energy, so the action rewards are all negative. Each transmission carries a base reward of −1, which guarantees that after a certain amount of learning the state-value functions of nodes far from the destination node are smaller. f(k_E) makes the reward energy-aware: the lower the current node's remaining energy, the greater the cost of forwarding the data packet and the smaller the action reward. f(k_E) is designed according to this principle, and the resulting relation between the action reward R and the current node's remaining-energy ratio k_E is shown in FIG. 4.
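The reward shape R_s = −1 + f(k_E) can be sketched as follows. The patent's exact f(k_E) is given only as an equation image not reproduced in this text, so the quadratic penalty below is an ASSUMED stand-in that merely satisfies the stated principle (monotone: less remaining energy means a smaller reward):

```python
def action_reward(k_e, f=lambda k: -(1.0 - k) ** 2):
    """Action reward R_s = -1 + f(k_E) (step 5.3 sketch).
    `f` is an ASSUMED monotone energy penalty, not the patent's exact curve:
    the reward shrinks as the remaining-energy ratio k_e drops."""
    return -1.0 + f(k_e)
```

With this stand-in, a full-energy node pays only the base cost (−1.0), while a depleted node pays −2.0.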
5.4, formulating an action strategy:
The action strategy in the Markov decision process model adopts a greedy strategy: the optimal action is taken in state s, maximizing the post-iteration state value of s. To explore the state space better, most algorithms give the action strategy some randomness, so that the agent has a probability of taking random actions to find possibly better solutions. In the opportunistic routing problem, however, the reward function is negative, so every state that has been visited is assigned a negative state value while unvisited states keep their higher initial values; the greedy choice therefore automatically explores the unknown state space, and previously unused nodes are preferentially used to forward data packets. Through this continuous learning process, the data packet reaches the destination node along the path with the minimum forwarding cost. The algorithm thus uses a pure greedy strategy as the action strategy.
5.5 the candidate nodes iterate corresponding state values and return the state values:
broadcasting data packets by nodes where effective data packets are located, and dynamically planning a value iteration formula of candidate nodes receiving the data packets according to the dynamic programming value
Figure GDA0004048064370000062
The own state value is recalculated, but instead of immediately replacing the original state value, the value is returned to the source node. Wherein k represents the kth iteration, k+1 represents the kth+1 iteration, v represents a state value, s represents a current state, namely, a state corresponding to a candidate node receiving the data packet, and s' representsShowing the next time state, v k+1 (s) is the value of state s at the (k+1) th iteration, v k (s ') is the value of state s' at the kth iteration, a represents an action that can be taken in the current state s, A is the set of actions, gamma is the discount factor, R is the action reward,
Figure GDA0004048064370000063
representing rewards for taking action a in state s, P is a state transition probability matrix,
Figure GDA0004048064370000064
representing the probability that the state becomes s' at the next time after taking action a in state s, a corresponding value can be found in the state transition probability matrix P, max indicating that the action policy is a greedy policy.
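The update above is the standard Bellman optimality backup. A minimal synchronous value-iteration sketch in Python (illustrative only; in the protocol the update is distributed and asynchronous, with each candidate computing one backup per packet):

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Synchronous value iteration (step 5.5 sketch):
    v_{k+1}(s) = max_a [ R[a][s] + gamma * sum_s' P[a][s][s'] * v_k(s') ].
    P: dict action -> NxN row-stochastic matrix; R: dict action -> length-N rewards."""
    n = len(next(iter(R.values())))
    v = [0.0] * n
    while True:
        v_new = [max(R[a][s] + gamma * sum(P[a][s][t] * v[t] for t in range(n))
                     for a in P)
                 for s in range(n)]
        if max(abs(x - y) for x, y in zip(v_new, v)) < tol:
            return v_new
        v = v_new
```

On a toy two-state chain (node → sink, with the sink absorbing at reward 0 and a broadcast cost of −1), the node converges to value −1 and the sink to 0.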
And 5.6 selecting a next hop forwarding node.
The data packet sending node receives the state values returned by the candidate nodes, selects the candidate with the largest successfully returned state value as the next-hop forwarding node, and broadcasts this message. The candidate node selected as the forwarding node adopts the state value calculated by the value iteration formula in step 5.5 as its new state value; the other candidate nodes do not update their state values and discard the received data packet.
And 6, repeating the data packet forwarding process in the step 5 until the data packet is forwarded to the sink node. And finally, obtaining an optimal routing path by continuously carrying out data packet forwarding and state value iteration. Fig. 5 is a schematic diagram of a learned routing path. If a certain data packet has no path to reach the sink node, the data packet transmission fails. And when 30% of data packet transmission fails in a certain longer period, the life cycle of the wireless sensor network is ended. Fig. 6 is a schematic diagram of the end of life of the network, where the black filled circles indicate sensor nodes that die from energy depletion.

Claims (1)

1. The opportunistic routing protocol design method based on the Markov decision process model is characterized by comprising the following steps of:
step 1, evaluating the environmental link quality and the packet receiving rate:
carrying out multiple communication experiments in a data acquisition area with two sensor nodes at different communication distances, acquiring packet reception rate data under the same RSSI value, and establishing a sample space from LQI mean values and packet reception rate data at different communication distances; when −70 dBm ≤ RSSI, the packet reception rate is 100%; when −75 dBm ≤ RSSI < −70 dBm, the packet reception rate is 99%; when −80 dBm ≤ RSSI < −75 dBm, the packet reception rate is 98%; when −85 dBm ≤ RSSI < −80 dBm, the packet reception rate is (RSSI + 177)%; when RSSI < −85 dBm, the packet reception rate is estimated from the LQI mean value, and curve-family regression fitting of the LQI mean value against the packet reception rate data yields an estimation formula for the packet reception rate;
step 2, deploying wireless sensor nodes and constructing a wireless sensor network: the wireless sensor network comprises a sink node, which is responsible for collecting the data gathered by ordinary nodes in the area and uploading it to the network;
step 3, each wireless sensor node periodically broadcasts and receives detection packets and establishes a neighbor information table, the neighbor information table comprising the neighbor node IDs, the neighbor nodes' corresponding state values, the neighbor nodes' sleep/listening duty cycles, the candidate node sets of the neighbor nodes, and the packet reception rates to the neighbor nodes;
step 4, the wireless sensor node establishes a candidate node set: the wireless sensor node sorts its neighbor nodes by their corresponding state values and, in descending order, selects the neighbor nodes whose state values are greater than or equal to its own as candidate nodes, until the probability that a data packet it broadcasts is successfully received by at least one candidate node exceeds a set value, or no selectable neighbor node remains; the candidate nodes constitute the candidate node set;
step 5, solving the forwarding node selection problem in opportunistic routing with a Markov decision process: the node holding a valid data packet broadcasts it, each candidate node that receives the data packet recalculates its corresponding state value according to a value iteration formula, and the sender selects the node with the largest corresponding state value as the next-hop forwarding node; solving the forwarding node selection problem in opportunistic routing with the Markov decision process specifically comprises the following steps:
5.1 modeling the opportunistic routing problem by using a Markov decision process model;
5.2, calculating a state transition probability matrix P: a state transition probability matrix P is obtained from the packet reception rates between nodes, and the probability P^a_{i,j_y} that the valid data packet is transferred from node i to candidate node j_y and forwarded by j_y (y = 1, 2, …, m) can be calculated by the following formula:

P^a_{i,j_y} = L_{i,j_y} · ∏_{t=1}^{y−1} (1 − L_{i,j_t})

where L_{i,j_y} denotes the probability that node j_y successfully receives a packet broadcast by node i, L_{i,j_t} denotes the same probability for node j_t, and m is the number of candidate nodes in the candidate node set of node i; the remaining nodes other than the candidates are j_x (x = m+1, m+2, …, N), for which P^a_{i,j_x} = 0, N being the total number of wireless sensor nodes in the network; the probabilities computed by node i fill row i of the state transition probability matrix P in columns j_1, j_2, …, j_m, …, j_N, and with every node computing the values of its corresponding row, the complete state transition probability matrix P can be obtained;
5.3, formulating the action reward function:
The action reward function is formulated as R_s = -1 + f(k_E), wherein R_s represents the reward for the action of broadcasting a data packet in state s, k_E is the ratio of the current node's remaining energy to its initial energy, and f(k_E) is a function of this ratio k_E;
f(k_E) is designed as a monotonically increasing function of k_E [its expression is given as an image in the original];
the less remaining energy the current node has, the greater the cost of forwarding a data packet and the smaller the action reward;
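Since the patent gives the expression for f(k_E) only as an image, the sketch below assumes the simplest monotone choice, f(k_E) = k_E, purely for illustration; any increasing f with the stated property (lower residual energy, smaller reward) would fit the same shape.

```python
def action_reward(remaining_energy, initial_energy):
    """Reward R_s = -1 + f(k_E) for broadcasting a packet in state s.

    Assumption (not from the patent): f(k_E) = k_E, so the reward
    ranges from -1 (node depleted) up toward 0 (node at full energy).
    """
    k_e = remaining_energy / initial_energy  # ratio of remaining to initial energy
    return -1.0 + k_e
```

With this choice, every hop carries a base cost of -1 that is partially offset by the relay's energy level, steering traffic away from nearly depleted nodes.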
5.4, formulating the action policy: a greedy policy is adopted, i.e. the optimal action is taken in state s, so that the state value of state s after iteration is maximized;
5.5 the candidate nodes iterate their corresponding state values and return them:
The node holding the valid data packet broadcasts the packet; each candidate node that receives it recalculates its own state value according to the dynamic-programming value iteration formula

v_{k+1}(s) = \max_{a \in A} \Big( R_s^a + \lambda \sum_{s'} P_{ss'}^a \, v_k(s') \Big)

but does not immediately replace its original state value with this value, instead returning the value to the source node; wherein k denotes the k-th iteration and k+1 the (k+1)-th iteration, v denotes the state value, s denotes the current state, i.e. the state corresponding to the candidate node receiving the data packet, s' denotes the state at the next moment, v_{k+1}(s) is the value of state s at the (k+1)-th iteration, v_k(s') is the value of state s' at the k-th iteration, a denotes an action that can be taken in the current state s, A is the action set, λ is the discount factor, R is the action reward, R_s^a denotes the reward for taking action a in state s, P is the state transition probability matrix, P_{ss'}^a denotes the probability that the state becomes s' at the next moment after taking action a in state s (the corresponding value can be found in the state transition probability matrix P), and the max operator reflects the greedy action policy;
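One synchronous Bellman backup of the value iteration formula in step 5.5 can be written in tabular form as below. This is a generic sketch of the update, not the patent's distributed implementation; the array layouts `P[a][s][s2]` and `R[s][a]` are assumptions for illustration.

```python
def value_iteration_step(v, P, R, actions, gamma):
    """One sweep of v_{k+1}(s) = max_a ( R_s^a + gamma * sum_{s'} P_{ss'}^a v_k(s') ).

    v: list of current state values v_k(s).
    P[a][s][s2]: transition probability P_{ss'}^a.
    R[s][a]: action reward R_s^a.
    gamma: discount factor (lambda in the patent's notation).
    """
    n = len(v)
    v_new = [0.0] * n
    for s in range(n):
        # greedy policy: take the action maximizing the backed-up value
        v_new[s] = max(
            R[s][a] + gamma * sum(P[a][s][s2] * v[s2] for s2 in range(n))
            for a in actions
        )
    return v_new
```

In the protocol, each candidate node performs only its own state's backup and reports v_{k+1}(s) to the sender rather than committing it locally.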
5.6 selecting the next-hop forwarding node:
the data packet sending node receives the state values returned by the candidate nodes, selects from among the candidates that successfully received the packet the node corresponding to the largest state value as the next-hop forwarding node, and broadcasts the message;
step 6, repeating the data packet forwarding process of step 5 until the data packet is forwarded to the sink node; by continuously performing packet forwarding and state value iteration, the optimal routing path is finally obtained.
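The repeated hop selection of steps 5-6 reduces to a greedy walk over returned state values. The toy sketch below assumes the broadcast/reply exchange has already produced, for each candidate, an iterated state value; `candidates_of` and `state_value` are illustrative stand-ins for that exchange, not names from the patent.

```python
def route_to_sink(source, sink, candidates_of, state_value):
    """Greedy next-hop selection repeated until the sink is reached.

    candidates_of[node]: candidate set of `node` (assumed precomputed).
    state_value[node]: the state value the candidate returned after its
    value-iteration update (step 5.5).
    Returns the resulting routing path as a list of node ids.
    """
    path = [source]
    node = source
    while node != sink:
        # the sender picks the candidate that returned the largest state value
        node = max(candidates_of[node], key=lambda j: state_value[j])
        path.append(node)
    return path
```

In the real protocol the state values keep being refined on every forwarding round, so successive packets converge toward the optimal path rather than following a fixed table.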
CN202010331293.7A 2020-04-24 2020-04-24 Opportunistic routing protocol design method based on Markov decision process model Active CN111629415B (en)
