CN111629415A - Opportunistic routing protocol based on Markov decision process model - Google Patents


Info

Publication number: CN111629415A
Authority: CN (China)
Prior art keywords: node, state, value, data packet, packet
Legal status: Granted
Application number: CN202010331293.7A
Other languages: Chinese (zh)
Other versions: CN111629415B (en)
Inventors: 黄成, 尹政, 刘子淇, 刘振光, 姚文杰, 徐志良, 王力立
Current Assignee: Nanjing University of Science and Technology
Original Assignee: Nanjing University of Science and Technology
Application filed by: Nanjing University of Science and Technology
Priority application: CN202010331293.7A
Publication of CN111629415A; application granted; publication of CN111629415B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 40/00 Communication routing or communication path finding
    • H04W 40/02 Communication route or path selection, e.g. power-based or shortest path routing
    • H04W 40/04 Communication route or path selection based on wireless node resources
    • H04W 40/10 Communication route or path selection based on available power or energy
    • H04W 40/12 Communication route or path selection based on transmission quality or channel quality
    • H04W 4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/30 Services specially adapted for particular environments, situations or purposes
    • H04W 4/38 Services for collecting sensor information
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 30/00 Reducing energy consumption in communication networks
    • Y02D 30/70 Reducing energy consumption in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an opportunistic routing protocol based on a Markov decision process model. First, the quality of the environmental link is evaluated and the packet reception rate is estimated: packet reception rate data under the same RSSI value and LQI mean, and under different communication distances, are collected to establish a sample space, and curve-family regression fitting of the LQI mean against the packet reception rate data yields an estimation formula for the packet reception rate. Wireless sensor nodes are then scattered to establish a wireless sensor network; each sensor node periodically broadcasts and receives detection packets and establishes a neighbor information table; each sensor node establishes a candidate node set. The node holding a valid data packet broadcasts it, each candidate node that receives the packet recalculates its corresponding state value according to a value iteration formula, and the sender selects the node with the largest corresponding state value as the next-hop forwarding node. The invention optimizes and balances the energy use of the wireless sensor network.

Description

Opportunistic routing protocol based on Markov decision process model
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to an opportunistic routing protocol based on a Markov decision process model.
Background
A wireless sensor network is a network formed by many sensor nodes in a multi-hop, self-organizing manner, and has very broad application prospects. Among the large body of research on wireless sensor networks, routing protocols have always been a key topic, since reasonable routing design can effectively improve network performance. Because many sensor nodes are randomly scattered in unattended areas where batteries are difficult to replace, saving node energy and balancing network energy use is an unavoidable problem.
In the conventional wireless sensor network, one or more optimized fixed paths are selected by a protocol before data starts to be transmitted, and data packets are transmitted along the preset fixed paths. Unlike the conventional routing protocol, each node receiving a data packet in the opportunistic routing is likely to serve as a relay node, and the routing path from the source node to the destination node is not fixed. Each node in the network acquires neighbor node information and network parameters through periodically sending and receiving detection packets, and selects proper neighbor nodes as candidate nodes to form a candidate node set (CRS). In the forwarding process, the node selects the optimal next hop forwarding node from the candidate nodes which successfully receive the data packet. This process is repeated until the packet is forwarded to the destination node.
The drawback of traditional routing protocols is that the routing path is fixed and cannot adapt well to changes in the network environment; changes in factors such as link quality, node residual energy, and node position can greatly affect network performance. If a transmission from the sender to the next-hop node fails, the sender retransmits the data packet until the next-hop node receives it successfully, instead of letting other nodes that did receive the packet forward it, which wastes network resources. The drawback of opportunistic routing is that forwarding nodes are selected hop by hop and no global network information is available, so routing decisions can only rely on the position information of neighbor nodes and the destination node to ensure that the data packet keeps moving toward the destination. In this case, if the positioning of the sensor nodes is inaccurate or the node positions are unknown, the performance of the routing protocol is seriously affected. Besides the influence of position errors, such opportunistic routing protocols consider only the parameters of neighbor nodes, find only single-step optimal solutions, and cannot achieve global optimality. These problems limit further improvement of opportunistic routing performance; how to remove the dependence on position information and how to select the optimal forwarding node from a global perspective are the key challenges.
Disclosure of Invention
The invention aims to provide an opportunistic routing protocol based on a Markov decision process model, to solve the prior-art problems that routing performance is poor because it relies on node position information and cannot achieve a globally optimal solution. The invention designs the opportunistic routing protocol through the Markov decision process of reinforcement learning, optimizes and balances the energy use of the wireless sensor network, and thereby prolongs the network lifetime.
The technical solution for realizing the purpose of the invention is as follows:
an opportunistic routing protocol based on a Markov decision process model comprising the steps of:
step 1, evaluating environment link quality, evaluating packet receiving rate:
collecting packet receiving rate data under the same RSSI value and LQI mean value and packet receiving rate data under different communication distances to establish a sample space, and performing curvilinear family regression fitting on the LQI mean value and the packet receiving rate data to obtain an estimation formula of the packet receiving rate;
step 2, scattering wireless sensor nodes, and establishing a wireless sensor network: the wireless sensor network comprises a sink node and is responsible for collecting data collected by common nodes in a region and uploading the data to the network;
step 3, periodically broadcasting and receiving detection packets by the sensor nodes, and establishing a neighbor information table;
step 4, the sensor nodes establish a candidate node set;
step 5, solving the forwarding node selection problem in the opportunistic routing with a Markov decision process: the node holding a valid data packet broadcasts it, each candidate node that receives the packet recalculates its corresponding state value according to a value iteration formula, and the data packet sender selects the node with the largest corresponding state value as the next-hop forwarding node.
Step 6, repeating the data packet forwarding process of the step 5 until the data packet is forwarded to the sink node; and finally, obtaining the optimal routing path by continuously forwarding the data packet and iterating the state value.
Compared with the prior art, the invention has the following remarkable advantages:
(1) The invention combines the Markov decision process, a classical reinforcement learning model, with opportunistic routing, providing a new approach for the design of wireless sensor network routing protocols.
(2) The invention utilizes the Received Signal Strength Indicator (RSSI) and the Link Quality Indicator (LQI) provided by the network physical layer to calculate the packet receiving rate among the sensor nodes in real time, so that the algorithm can adapt to the change of the network state.
(3) The invention designs the opportunistic routing protocol by utilizing a Markov decision process model for reinforcement learning, does not depend on the position information of the sensor nodes, and can search a proper forwarding path from the perspective of global optimum after continuous learning.
Drawings
FIG. 1 is a general schematic view of the present invention
FIG. 2 is a schematic diagram of a sensor network layout
FIG. 3 is a schematic diagram of state transition matrix calculation
FIG. 4 is a diagram of the relation between the action reward R and the ratio k_E of the current node's residual energy
FIG. 5 is a diagram of learned opportunistic routing paths
FIG. 6 is a schematic diagram of the end of life of a wireless sensor network
Detailed Description
The invention is further described with reference to the following figures and embodiments.
The invention provides an opportunistic routing protocol based on a Markov decision process model, which utilizes reinforcement learning to search an energy optimal forwarding path and comprises the following specific steps:
step 1, evaluating the quality of an environment link, and providing a packet receiving rate evaluation method:
A certain area is selected as the data acquisition area, and two sensor nodes in the area perform a number of communication experiments at different communication distances, collecting packet reception rate data under the same RSSI value and LQI mean, and under different communication distances, to establish a sample space. When the link quality is good, the RSSI value correlates best with the packet reception rate, so the RSSI value is used to estimate it: when RSSI ≥ -70 dBm, the packet reception rate is 100%; when -75 dBm ≤ RSSI < -70 dBm, it is 99%; when -80 dBm ≤ RSSI < -75 dBm, it is 98%; when -85 dBm ≤ RSSI < -80 dBm, it is (RSSI + 177)%. When RSSI < -85 dBm, the packet reception rate is instead estimated from the LQI mean, using the estimation formula obtained by curve-family regression fitting of the LQI mean against the packet reception rate data. The RSSI and LQI values carried in data packet transmissions are used to estimate the packet reception rate in real time, so the routing protocol can adapt to changes in the network.
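The piecewise rule above can be sketched as a small estimator. This is a minimal sketch: the LQI regression fit obtained in step 1 is not given numerically in the patent, so it is passed in here as a hypothetical callable.

```python
def packet_reception_rate(rssi_dbm, lqi_mean, lqi_fit=None):
    """Piecewise packet-reception-rate estimate (%) from step 1.

    Above -85 dBm the RSSI rule from the text is applied directly; below
    -85 dBm the LQI regression fit is used. The fit's coefficients are
    not given in the patent, so `lqi_fit` is a hypothetical callable.
    """
    if rssi_dbm >= -70:
        return 100.0
    if rssi_dbm >= -75:
        return 99.0
    if rssi_dbm >= -80:
        return 98.0
    if rssi_dbm >= -85:
        return rssi_dbm + 177.0  # linear segment, e.g. -82 dBm -> 95%
    if lqi_fit is None:
        raise ValueError("LQI regression fit is required below -85 dBm")
    return lqi_fit(lqi_mean)
```

For example, `packet_reception_rate(-82, 0)` falls in the linear segment and returns 95.0.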
Step 2, scattering wireless sensor nodes, and establishing a wireless sensor network:
as shown in fig. 2, a plurality of wireless sensor nodes are randomly scattered in a selected data acquisition area to form a wireless sensor network, wherein the wireless sensor network comprises a sink node which is responsible for collecting data acquired by common nodes in the area and uploading the data to the network. The energy of the common node is limited, and the energy of the sink node is sufficient.
Step 3, the sensor node periodically broadcasts and receives the detection packet, and establishes a neighbor information table:
Each sensor node in the sensor network periodically broadcasts a detection packet containing the node ID, the node's corresponding state value, the node's sleep/listening duty cycle, and the node's candidate node set. The corresponding state value evaluates the value of a node as a data packet forwarding node, and the candidate node set of a node is the set of nodes that can receive and forward data packets from it. Each sensor node receives detection packets from its neighbors, establishes a neighbor information table, obtains the RSSI and LQI values when receiving a packet, and estimates the packet reception rate to each neighbor with the fitting formula obtained in step 1. Taking a node's sleep/listening schedule into account, the probability L_ij that node j actually receives a packet broadcast by node i is calculated as

L_ij = p_ij · k_jw

where p_ij is the packet reception rate between node i and node j, and k_jw is the listening fraction (duty cycle) of node j. This step is performed periodically throughout the sensor network lifecycle.
Step 4, the sensor nodes establish a candidate node set:
The sensor node sorts its neighbor nodes by their corresponding state values, and selects, in decreasing order, the neighbor nodes whose state values are greater than or equal to its own as candidate nodes, stopping once the probability that at least one candidate successfully receives a data packet broadcast by the sensor node exceeds 90%, or when no eligible neighbor node remains. The candidate nodes constitute the candidate node set. If a data packet broadcast by the sensor node is not received by any candidate node, the node rebroadcasts it. After the state values of the neighbor nodes are updated, the node periodically repeats this step and re-establishes the candidate node set according to the new state values.
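The candidate-set construction in this step can be sketched as follows, assuming each neighbor entry carries the effective reception probability L_ij = p_ij · k_jw from step 3; the 90% threshold is the success-probability target stated above.

```python
def build_candidate_set(neighbors, own_value, target=0.90):
    """Build the candidate node set (CRS) as described in step 4.

    neighbors: iterable of (node_id, state_value, l_ij), where l_ij is
    the probability L_ij = p_ij * k_jw that the neighbor hears a
    broadcast. Only neighbors whose state value is >= this node's own
    qualify; they are added in decreasing state-value order until the
    probability that at least one candidate receives the broadcast
    exceeds `target`, or no eligible neighbor remains.
    """
    eligible = sorted((n for n in neighbors if n[1] >= own_value),
                      key=lambda n: n[1], reverse=True)
    crs, p_miss_all = [], 1.0  # p_miss_all: probability no candidate receives
    for node_id, _value, l_ij in eligible:
        crs.append(node_id)
        p_miss_all *= 1.0 - l_ij
        if 1.0 - p_miss_all > target:
            break
    return crs
```

With neighbors ("a", 5, 0.8), ("b", 4, 0.7), ("c", 3, 0.6) and own value 2, the set stops at ["a", "b"], since 1 - 0.2 * 0.3 = 0.94 already exceeds 90%.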
Step 5, solving the forwarding node selection problem in the opportunistic routing by using a Markov decision process: and broadcasting the data packet by the node where the effective data packet is located, recalculating the corresponding state value of the node by the candidate node receiving the data packet according to a value iteration formula, and selecting the node with the maximum corresponding state value as a next skip sending node by the data packet sender.
5.1 modeling the opportunistic routing problem by using a Markov decision process model:
In the opportunistic routing problem, the agent is the valid data packet that needs to be forwarded. The modeling follows the standard Markov decision process formulation. S is the state set, with states denoted s; different states of the valid data packet correspond to the packet being located at different sensor nodes, so each state s corresponds to one node, and the value of that state is the node's corresponding state value, representing the node's worth as a forwarding node. A is the action set, with actions denoted a; each action is to broadcast the data packet and select the next-hop forwarding node according to some rule, and different actions differ in that rule, so they induce different state transition probability matrices P. P is the state transition matrix, giving the probability that the valid data packet is at each node after an action is taken; different actions yield different transition probabilities. R is the action reward: taking an action in a state generates a corresponding reward.
5.2 calculating the state transition probability matrix P:
The invention obtains the state transition probability matrix P from the packet reception rates between nodes; FIG. 3 is a schematic diagram of the calculation. In the figure, CRS_i is the candidate node set of node i, which contains m candidate nodes with corresponding state values v(j_1) > v(j_2) > v(j_3) > v(j_4) > … > v(j_m), where j_1, j_2, j_3, j_4, …, j_m are all candidate nodes of node i. The action taken by the node holding the valid data packet is to broadcast it and, following a greedy strategy, select the node with the largest state value among the candidates that received it as the next-hop forwarding node. The probability P^a_(i,j_y) that the valid data packet moves from node i to node j_y, to be forwarded by j_y, is therefore calculated as

P^a_(i,j_y) = L_(i,j_y) · ∏_(t=1)^(y-1) (1 - L_(i,j_t))

where L_(i,j_y) is the probability that node j_y successfully receives node i's broadcast packet, L_(i,j_t) is the probability that node j_t successfully receives it, t varies from 1 to y-1, and y varies from 1 to m. Nodes other than the candidates, j_x (x = m+1, m+2, …, N), cannot act as forwarding nodes for node i's broadcast packets, so

P^a_(i,j_x) = 0, for x varying from m+1 to N.

The probabilities P^a_(i,j) computed by node i fill row i of the state transition probability matrix P in columns j_1, j_2, …, j_m, …, j_N; each node computes its corresponding row, yielding the complete state transition probability matrix P.
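Under the stated greedy forwarding rule, row i of P can be computed as in this sketch. One assumption beyond the formula itself: the residual case, in which no candidate receives the broadcast and node i rebroadcasts (step 4), is assigned to the diagonal entry so that the row is properly stochastic.

```python
def transition_row(i, candidates, n_nodes):
    """Row i of the transition matrix P under the greedy rule of 5.2.

    candidates: list of (j, l_ij) pairs sorted by decreasing state
    value (j_1 ... j_m), where l_ij is the reception probability L_ij
    from step 3. The packet moves to j_y only if j_y receives it and
    every better-valued candidate j_t (t < y) misses it:
        P(i -> j_y) = L_{i,j_y} * prod_{t<y} (1 - L_{i,j_t})
    The residual mass (no candidate receives) is put on the diagonal:
    node i keeps the packet and rebroadcasts, per step 4.
    """
    row = [0.0] * n_nodes
    p_all_better_miss = 1.0  # probability every better candidate missed
    for j, l_ij in candidates:
        row[j] = l_ij * p_all_better_miss
        p_all_better_miss *= 1.0 - l_ij
    row[i] = p_all_better_miss  # rebroadcast case; row now sums to 1
    return row
```

For instance, with candidates j_1 = 1 (L = 0.8) and j_2 = 2 (L = 0.5), row 0 of a 4-node network is [0.1, 0.8, 0.1, 0.0].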
5.3 formulating a reward function R:
In reinforcement learning, every action taken generates an action reward R, where R^a_s denotes the reward obtained by taking action a in state s. In each state, the available action set A is to broadcast the data packet and select a next-hop candidate node according to some rule. Broadcasting the data packet is the actual action that generates the reward, so the rewards of all actions in the same action set are identical, and the action reward depends only on the current state.

Only with a reasonably formulated action reward can the reinforcement learning algorithm realize the intended optimization task. Energy-aware opportunistic routing aims to optimize network energy use and prolong the network lifetime. To this end, on the one hand network energy must be saved by forwarding data packets to the destination node along the shortest possible path; on the other hand network energy must be balanced, so that some nodes do not exhaust their energy prematurely through frequent use. To balance these two aspects, the invention formulates the action reward function R_s = -1 + f(k_E), where R_s is the action reward for broadcasting a data packet in state s, k_E is the ratio of the current node's remaining energy to its initial energy, and f(k_E) is a function of that ratio. Broadcasting a packet consumes energy, so the action reward is negative: every transmission of the data packet incurs a base reward of -1, so that after some learning the state values of nodes far from the destination node become smaller. The value of f(k_E) is always negative; the less residual energy the current node has, the higher the cost of forwarding the data packet and the smaller the action reward. Based on this principle, f(k_E) is designed as shown in the following formula (given in the original only as an image):

[formula for f(k_E): image in original]

The relation between the action reward R and the ratio k_E of the current node's residual energy is then as shown in FIG. 4.
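A sketch of the reward computation follows. Since the patent gives f(k_E) only as a figure, the default `f` used here is a hypothetical stand-in chosen only to satisfy the stated properties: always non-positive, and falling steeply as the remaining energy shrinks.

```python
def action_reward(k_e, f=lambda k: -(1.0 - k) / max(k, 1e-6)):
    """Action reward R_s = -1 + f(k_E) for broadcasting in state s (5.3).

    k_e is the ratio of the node's remaining energy to its initial
    energy. The default `f` is a HYPOTHETICAL stand-in (the patent
    shows f(k_E) only as a figure): it is <= 0 everywhere, equals 0 at
    full energy, and falls steeply as remaining energy nears zero.
    """
    return -1.0 + f(k_e)
```

With this stand-in, a full-energy node pays only the base cost of -1, while a half-drained node pays -2, so depleted nodes become progressively less attractive as forwarders.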
5.4, an action strategy is established:
The action strategy in the Markov decision process model is a greedy strategy: in state s the optimal action is taken, so that the iterated state value of s is maximal. To explore the state space better, most algorithms add some randomness to the action strategy, so that the agent occasionally takes random actions in search of possibly better solutions. In the opportunistic routing problem, however, the reward function is negative, so every state that has been reached acquires a negative state value while unvisited states retain their higher initial values; the algorithm therefore explores the unknown state space automatically and preferentially forwards data packets through unused nodes. Through this continuous learning process, the data packet comes to reach the destination node along the path with the minimum forwarding cost. The algorithm thus employs a plain greedy strategy as its action strategy.
5.5 the candidate node iterates and returns the corresponding state value:
The node holding the valid data packet broadcasts it, and each candidate node that receives the packet recalculates its own state value according to the dynamic programming value iteration formula

v_(k+1)(s) = max_(a∈A) [ R^a_s + γ · Σ_(s'∈S) P^a_(ss') · v_k(s') ]

but does not immediately replace its original state value with the new one; instead it transmits the value back to the sending node. In the formula, k denotes the k-th iteration and k+1 the (k+1)-th; v denotes the state value; s is the current state, i.e. the state corresponding to the candidate node that received the data packet, and s' is the state at the next moment; v_(k+1)(s) is the value of state s at the (k+1)-th iteration and v_k(s') the value of state s' at the k-th iteration; a is an action that can be taken in the current state s and A is the action set; γ is the discount factor; R^a_s is the reward for taking action a in state s; P is the state transition probability matrix, and P^a_(ss') is the probability that the state becomes s' at the next moment after taking action a in state s, whose value is found in P; max indicates that the action strategy is the greedy strategy.
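One synchronous sweep of the value iteration rule above can be sketched as follows. In this protocol every state has the single action "broadcast", so the max over actions reduces to evaluating each state's one transition row.

```python
def value_update(rewards, P, v, gamma=0.9):
    """One synchronous sweep of the value iteration of step 5.5:

        v_{k+1}(s) = max_a [ R_s^a + gamma * sum_{s'} P_{ss'}^a v_k(s') ]

    Here every state has the single action 'broadcast', so the max over
    actions degenerates: rewards[s] is R_s, P[s] is state s's transition
    row, and v holds the current estimates v_k.
    """
    n = len(v)
    return [rewards[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
            for s in range(n)]
```

On a three-node chain 0 -> 1 -> 2 (sink) with reward -1 per hop, repeated sweeps drive the value of node 0 toward -1 + γ·(-1), so nodes farther from the sink acquire smaller state values, exactly the gradient the forwarding rule descends.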
5.6 selecting the next hop forwarding node.
The data packet sending node receives the state values returned by the candidate nodes, selects the node with the highest corresponding state value as the next-hop forwarding node, and broadcasts this choice. The candidate node selected as forwarding node adopts the state value computed with the value iteration formula of step 5.5 as its new state value; the other candidate nodes do not update their state values and discard the received data packet.
Step 6, the data packet forwarding process of step 5 is repeated until the data packet is forwarded to the sink node. By continuously forwarding data packets and iterating the state values, the optimal routing path is finally obtained. Fig. 5 is a schematic diagram of a learned routing path. If no path of a data packet can reach the sink node, the transmission of that packet fails. When 30% of data packets fail to be delivered within a sufficiently long period, the lifetime of the wireless sensor network is considered ended. Fig. 6 is a schematic diagram of the end of the network lifetime, in which black filled circles indicate sensor nodes that have died of energy depletion.

Claims (4)

1. An opportunistic routing protocol based on a Markov decision process model, comprising the steps of:
step 1, evaluating environment link quality, evaluating packet receiving rate:
collecting packet receiving rate data under the same RSSI value and LQI mean value and packet receiving rate data under different communication distances to establish a sample space, and performing curvilinear family regression fitting on the LQI mean value and the packet receiving rate data to obtain an estimation formula of the packet receiving rate;
step 2, scattering wireless sensor nodes, and establishing a wireless sensor network: the wireless sensor network comprises a sink node and is responsible for collecting data collected by common nodes in a region and uploading the data to the network;
step 3, periodically broadcasting and receiving detection packets by the sensor nodes, and establishing a neighbor information table;
step 4, the sensor nodes establish a candidate node set;
step 5, solving the forwarding node selection problem in the opportunistic routing with a Markov decision process: the node holding a valid data packet broadcasts it, each candidate node that receives the packet recalculates its corresponding state value according to a value iteration formula, and the data packet sender selects the node with the largest corresponding state value as the next-hop forwarding node.
Step 6, repeating the data packet forwarding process of the step 5 until the data packet is forwarded to the sink node; and finally, obtaining the optimal routing path by continuously forwarding the data packet and iterating the state value.
2. The opportunistic routing protocol according to claim 1, wherein the neighbor information table established in step 3 comprises the neighbor node IDs, the neighbor nodes' corresponding state values, the neighbor nodes' sleep/listening duty cycles, the neighbor nodes' candidate node sets, and the packet reception rates between the node itself and its neighbor nodes.
3. The opportunistic routing protocol according to claim 1, wherein the candidate node set is established in step 4 as follows: the sensor node sorts its neighbor nodes by their corresponding state values, and selects, in decreasing order, the neighbor nodes whose state values are greater than or equal to its own as candidate nodes, stopping once the probability that at least one candidate successfully receives a data packet broadcast by the sensor node exceeds a set value, or when no eligible neighbor node remains; the candidate nodes constitute the candidate node set.
4. The opportunistic routing protocol of claim 1 wherein step 5, solving the forwarding node selection problem in opportunistic routing with a markov decision process, specifically comprises the steps of:
5.1 modeling opportunistic routing problems with Markov decision process models
5.2 calculating the state transition probability matrix P: the state transition probability matrix P is obtained from the packet reception rates between nodes; the probability P^a_(i,j_y) that the valid data packet moves from node i to candidate node j_y, to be forwarded by j_y, is calculated as

P^a_(i,j_y) = L_(i,j_y) · ∏_(t=1)^(y-1) (1 - L_(i,j_t))

where L_(i,j_y) is the probability that node j_y successfully receives node i's broadcast data packet, L_(i,j_t) is the probability that node j_t successfully receives it, and m denotes the number of candidate nodes in node i's candidate node set. For the nodes other than the candidates, j_x (x = m+1, m+2, …, N),

P^a_(i,j_x) = 0,

where N denotes that the network has N sensor nodes. The probabilities P^a_(i,j) computed by node i fill row i of the state transition probability matrix P in columns j_1, j_2, …, j_m, …, j_N; each node computes its corresponding row, yielding the complete state transition probability matrix P.
5.3 formulating a reward function:
formulating an action reward function R_s = -1 + f(k_E), where R_s is the action reward for broadcasting a data packet in state s, k_E is the ratio of the current node's remaining energy to its initial energy, and f(k_E) is a function of that ratio;
f(k_E) is designed as shown in the following formula (given in the original only as an image):

[formula for f(k_E): image in original]

the less residual energy the current node has, the higher the cost of forwarding the data packet and the smaller the action reward;
5.4, an action strategy is established: adopting a greedy strategy, namely, taking the optimal action under the state s to maximize the state value of the state s after iteration;
5.5 the candidate nodes iterate and return the corresponding state values:
the node holding the valid data packet broadcasts the packet, and each candidate node receiving the packet recalculates its own state value according to the dynamic-programming value iteration formula

v_{k+1}(s) = max_{a∈A} [ R_s^a + γ · Σ_{s'} P_{ss'}^a · v_k(s') ]

but does not immediately replace its original state value with the new one; instead, it transmits the new state value back to the source node. In the formula, k represents the k-th iteration and k+1 the (k+1)-th iteration; v represents the state value; s represents the current state, i.e. the state corresponding to the candidate node receiving the data packet; s' represents the state at the next moment; v_{k+1}(s) is the value of state s at the (k+1)-th iteration; v_k(s') is the value of state s' at the k-th iteration; a represents an action that can be taken in the current state s, and A is the action set; γ is the discount factor; R is the action reward, with R_s^a representing the reward for taking action a in state s; P is the state transition probability matrix, with P_{ss'}^a representing the probability that the state becomes s' at the next moment after action a is taken in state s, found as the corresponding value in P; max indicates that the action strategy is the greedy strategy;
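A minimal sketch of one value-iteration update as described in step 5.5 (Python; the dict-based layout of P, R and v is an illustrative assumption, not the patent's data structures):

```python
def value_update(s, states, actions, P, R, v, gamma=0.9):
    """One greedy value-iteration step:
    v_{k+1}(s) = max_a [ R(s,a) + gamma * sum_{s'} P(s,a,s') * v_k(s') ].

    P: dict keyed by (state, action) -> {next_state: probability}
    R: dict keyed by (state, action) -> action reward
    v: dict mapping each state to its current (k-th iteration) value
    """
    best = float('-inf')
    for a in actions:
        # expected discounted value of the successor states under action a
        q = R[(s, a)] + gamma * sum(P[(s, a)].get(s2, 0.0) * v[s2]
                                    for s2 in states)
        best = max(best, q)  # greedy strategy: keep the best action's value
    return best
```

As in the protocol, the caller would hold this returned value aside and send it back to the source node rather than overwrite v[s] immediately.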
5.6 selecting the next-hop forwarding node:
the data packet sending node receives the state values returned by the candidate nodes, selects the node with the highest state value as the next-hop forwarding node, and broadcasts the data packet.
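The selection in step 5.6 reduces to an argmax over the returned values (a minimal Python sketch; the function name and dict layout are illustrative):

```python
def select_next_hop(returned_values):
    """Pick the next-hop forwarder: among the candidate nodes that
    returned their recomputed state values, choose the one whose
    state value is highest.

    returned_values: dict mapping candidate node id -> returned state value.
    """
    return max(returned_values, key=returned_values.get)
```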
CN202010331293.7A 2020-04-24 2020-04-24 Opportunistic routing protocol design method based on Markov decision process model Active CN111629415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010331293.7A CN111629415B (en) 2020-04-24 2020-04-24 Opportunistic routing protocol design method based on Markov decision process model


Publications (2)

Publication Number Publication Date
CN111629415A true CN111629415A (en) 2020-09-04
CN111629415B CN111629415B (en) 2023-04-28

Family

ID=72260539


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112702710A (en) * 2020-12-22 2021-04-23 杭州电子科技大学 Opportunistic routing optimization method based on link correlation in low duty ratio network
CN112954769A (en) * 2021-01-25 2021-06-11 哈尔滨工程大学 Underwater wireless sensor network routing method based on reinforcement learning
CN113950113A (en) * 2021-10-08 2022-01-18 东北大学 Hidden Markov-based Internet of vehicles switching decision algorithm
CN114125984A (en) * 2021-11-22 2022-03-01 北京邮电大学 Efficient opportunistic routing method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105848247A (en) * 2016-05-17 2016-08-10 中山大学 Vehicular Ad Hoc network self-adaption routing protocol method





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant