CN114025405A - Underwater unmanned vehicle safety opportunity routing method and device based on reinforcement learning - Google Patents

Underwater unmanned vehicle safety opportunity routing method and device based on reinforcement learning Download PDF

Info

Publication number
CN114025405A
CN114025405A (application CN202111176454.0A)
Authority
CN
China
Prior art keywords
node
nodes
value
unmanned vehicle
reinforcement learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111176454.0A
Other languages
Chinese (zh)
Other versions
CN114025405B (en)
Inventor
王桐
崔立佳
高山
陈立伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN202111176454.0A priority Critical patent/CN114025405B/en
Publication of CN114025405A publication Critical patent/CN114025405A/en
Application granted granted Critical
Publication of CN114025405B publication Critical patent/CN114025405B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • H  ELECTRICITY
    • H04  ELECTRIC COMMUNICATION TECHNIQUE
    • H04W  WIRELESS COMMUNICATION NETWORKS
    • H04W40/00  Communication routing or communication path finding
    • H04W40/02  Communication route or path selection, e.g. power-based or shortest path routing
    • H04W40/12  Communication route or path selection, e.g. power-based or shortest path routing based on transmission quality or channel quality
    • H  ELECTRICITY
    • H04  ELECTRIC COMMUNICATION TECHNIQUE
    • H04L  TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00  Routing or path finding of packets in data switching networks
    • H04L45/02  Topology update or discovery
    • H04L45/08  Learning-based routing, e.g. using neural networks or artificial intelligence
    • Y  GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02  TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D  CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00  Reducing energy consumption in communication networks
    • Y02D30/70  Reducing energy consumption in communication networks in wireless communication networks

Abstract

A reinforcement-learning-based safety opportunity routing method and device for an underwater unmanned vehicle, belonging to the technical field of sensors. Current underwater exploration targets sensor nodes that cannot move autonomously underwater: encountered nodes cannot be selected during movement, and void nodes are easily created. The invention provides a reinforcement-learning-based safety opportunity routing method for an underwater unmanned vehicle, comprising the following steps: the underwater unmanned vehicle performs a primary screening of the nodes within its communication range, and a trust evaluation model is established; the trust evaluation model evaluates the primarily screened nodes; the evaluation elements are input into a fuzzy logic system to obtain the comprehensive trust value of the evaluated node, which is updated into the dynamic table of encountered-node trust values; and, according to the comprehensive trust value output by the fuzzy logic system, reinforcement learning is used to perform routing selection and to set the state-action value update function and the reward function. The method is applied to the field of safety opportunity routing for underwater unmanned vehicles.

Description

Underwater unmanned vehicle safety opportunity routing method and device based on reinforcement learning
Technical Field
The invention relates to the field of safety opportunity routing of an underwater unmanned vehicle, in particular to the field of safety opportunity routing of the underwater unmanned vehicle based on reinforcement learning.
Background
The existing invention CN112188583A, 'An ocean underwater wireless sensor network opportunistic routing method based on reinforcement learning', proposes combining reinforcement learning with opportunistic routing, but it targets sensor nodes that cannot move autonomously underwater: their topology changes little, and each node only records interaction information with its neighbor nodes.
If such wireless sensor network opportunistic routing is applied directly to underwater unmanned vehicles, the nodes cannot be covered comprehensively or updated automatically, encountered nodes cannot be selected during movement, safe and efficient end-to-end message transmission cannot ultimately be achieved, and void nodes are easily created.
Disclosure of Invention
The invention addresses the problems that sensor nodes which cannot move autonomously underwater exhibit little topological change and only record interaction information with their neighbor nodes; that encountered nodes cannot be selected during movement, preventing the final safe and efficient delivery of messages; and that void nodes are easily created.
A reinforcement learning-based underwater unmanned vehicle safety opportunity routing method, the method comprising:
the underwater unmanned vehicle performs a primary screening of the nodes within its communication range, and a trust evaluation model is established from the primarily screened nodes;
the trust evaluation model evaluates the preliminarily screened nodes, its evaluation elements consisting of a direct trust value DTValue and an indirect trust value ITValue;
the evaluation elements are input into a fuzzy logic system to obtain the comprehensive trust value of the evaluated node, which is updated into the dynamic table of encountered-node trust values;
according to the comprehensive trust value of the evaluated node output by the fuzzy logic system, reinforcement learning is used to perform routing selection, set the state-action value update function, and set the reward function.
Further, the process by which the underwater unmanned vehicle primarily screens the nodes within its communication range and establishes the trust evaluation model from the primarily screened nodes comprises the following steps:
the underwater unmanned vehicle node carrying the message broadcasts to the other nodes within its communication range, requests them to feed back their node information, acquires their data packets, performs a primary screening according to the indirect trust value ITValue in each packet, and selects the nodes whose indirect trust value exceeds a threshold as candidate relay nodes for further evaluation.
Further, the direct trust value DTValue evaluation elements are selected as: 1. inter-node communication quality, estimated from the relative distance between nodes, which is computed from the send/receive time difference of node data packets; 2. node familiarity; 3. node relay ratio.
Further, the path loss estimated from the relative distance between nodes measures the inter-node communication quality; the path loss A(d, f) experienced by any pair of nodes over the underwater acoustic channel is:
A(d, f) = A0 · d^k · α1(f)^d
10·log A(d, f) = 10·log A0 + k·10·log d + d·10·log α1(f)
where f is the signal frequency in kHz, d is the distance in m, A0 is a unit normalization constant, k is the propagation factor characterizing the geometry of the propagation, and α1 is the absorption factor;
further, the node familiarity includes:
each node records its interactions with the previous-hop and next-hop nodes, including the counterpart's node number, the destination node, the start and end times of the transmission, and the number of interactions;
after receiving the message, the destination node broadcasts into the network an acknowledgement data packet containing only a packet header, which carries the successful-transmission-path information together with the destination node information;
a node that receives the header message checks its interaction records; if it appears on the successful transmission path, its previous-hop and next-hop nodes enter its own successful-cooperative-transmission node table, and it is judged whether those nodes already exist in the table:
for nodes already in the table, only the recorded data are updated, namely the start and end times of the transmission and the accumulated transmission count; the nodes in the table can be regarded as friend nodes of the current node;
if an interaction record is not found on any successful transmission path, it is automatically cleared after a certain time;
after the network has operated successfully, each node has its own friend nodes; influenced by node movement speed and transmission radius, the contact interval between friend nodes obeys a negative exponential distribution, so the contact intervals are modeled as negatively exponentially distributed and the contact probability between friend nodes is estimated:
P_A,B(T) = 1 − e^(−T/x̄_A,B)
where B is a friend node of A, P_A,B(T) denotes the probability that nodes A and B come into contact within time T, n is the total number of acquired historical transmission intervals, and x_i is the i-th transmission interval; in a mobile opportunistic network the recorded number of successful interactions with friend nodes is finite, so the value of n differs between nodes; x̄_A,B = (1/n) · Σ_(i=1..n) x_i is the statistical average of the historical transmission intervals.
Further, the node relay ratio is:
P_ret = P_A,B(T) / N_r
where P_ret is the node relay ratio and N_r is the number of messages the node has received.
Further, according to the comprehensive trust value of the evaluated node output by the fuzzy logic system, reinforcement learning is used to perform routing selection and to set the state-action value update function and the reward function, comprising the following steps:
determining the comprehensive trust value of the encountered node by the fuzzy logic method, using the Q-learning strategy of reinforcement learning to find a suitable forwarding path for the message, and defining the update formula of the state-action value Q as:
Q_d(s,x) ← (1 − α)·Q_d(s,x) + α·[ R_d(s,x) + γ_d(s,x)·max_(y∈N_x) Q′_d(x,y) ]
where Q_d(s,x) is the state-action value of selecting node x as the next-hop forwarding node at node s for a data packet destined for node d, i.e. the forwarding utility Q value of node s forwarding a packet destined for d to node x; on each update the corresponding Q value stored in the state-action value table is taken out and substituted into the formula, and the updated value is stored back into the table; α is the learning coefficient, 0 ≤ α ≤ 1; γ_d(s,x) is the dynamic discount factor for forwarding a packet destined for d from node s to node x; N_x denotes the contact-node set of node x, containing all nodes encountered during the movement of node x; R_d(s,x) denotes the immediate return (the reward function defined below); and Q′_d(x,y) is the state-action value weighted by the node comprehensive trust value, introduced to guarantee the security dynamics of the mobile opportunistic network;
the dynamic discount factor γ_d(s,x) is
γ_d(s,x) = γ·e^(CTValue(s,x) − 1)
where γ is a fixed constant, γ ∈ (0, 1).
Further, the reward function is an immediate return value, a function of the node comprehensive trust value, serving as positive feedback to the nodes on a successfully transmitted path:
R_d(s,x) = e^(CTValue(s,x)) − 1  if node x is the destination node d
R_d(s,x) = 0                     otherwise
where CTValue(s,x) denotes the comprehensive trust value of encountered node x as evaluated by node s;
the positive feedback is the feedback issued after the message is successfully delivered to the destination node.
The invention provides an underwater unmanned vehicle safety opportunity routing device based on reinforcement learning, which comprises:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing a reinforcement learning-based underwater unmanned vehicle safety opportunity routing method as described above.
The present invention provides a computer device, characterized in that it comprises a memory in which a computer program is stored and a processor; when the processor executes the computer program stored in the memory, it performs a reinforcement-learning-based underwater unmanned vehicle safety opportunity routing method as described above.
The invention has the advantages that:
the invention solves the problems that the existing sensor node which can not move autonomously underwater has small topological change, and the sensor node only records the interaction information with the neighbor node and can not move autonomously; the encountered nodes can not be selected in the moving process, so that the final safe and efficient transmission of the messages is realized, and the problem of void nodes is easily caused.
The method is designed for the routing protocol of the underwater unmanned vehicle; it realizes safe and efficient information transmission, avoids underwater voids, improves the networking performance of underwater unmanned vehicles, reduces underwater transmission delay, and increases the message delivery rate.
The underwater unmanned vehicle serves as a sensor node with autonomous mobility: it can select encountered nodes while moving, uses opportunistic routing, evaluates the trust values of encountered nodes upon meeting, performs routing selection by combining reinforcement learning with the nodes' comprehensive trust values, dynamically updates and optimizes the overall performance of the underwater network, and avoids void nodes while achieving effective transmission of information.
The method is applied to the field of safety opportunity routing of the underwater unmanned vehicle.
Drawings
FIG. 1 is an overall implementation process of an underwater unmanned vehicle safety opportunity routing;
FIG. 2 is a comprehensive trust value output model.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments.
First embodiment: this embodiment is described with reference to FIG. 1. The embodiment provides a reinforcement-learning-based underwater unmanned vehicle safety opportunity routing method, which comprises the following steps:
the underwater unmanned vehicle performs a primary screening of the nodes within its communication range, and a trust evaluation model is established from the primarily screened nodes;
the trust evaluation model evaluates the preliminarily screened nodes, its evaluation elements consisting of a direct trust value DTValue and an indirect trust value ITValue;
the evaluation elements are input into a fuzzy logic system to obtain the comprehensive trust value of the evaluated node, which is updated into the dynamic table of encountered-node trust values;
according to the comprehensive trust value of the evaluated node output by the fuzzy logic system, reinforcement learning is used to perform routing selection, set the state-action value update function, and set the reward function.
In the reinforcement-learning-based safety opportunity routing method for the underwater unmanned vehicle, the vehicle has autonomous mobility and can select encountered nodes during movement; opportunistic routing is used; when nodes meet, the encountered nodes undergo trust value evaluation; routing selection is performed by combining reinforcement learning with the nodes' comprehensive trust values; the overall performance of the underwater network is dynamically updated and optimized; and void nodes are avoided while achieving effective transmission of information.
Second embodiment: this embodiment is described with reference to FIG. 1 and further limits the method of the first embodiment. In this embodiment, the process by which the underwater unmanned vehicle primarily screens the nodes within its communication range and establishes the trust evaluation model from the primarily screened nodes is as follows:
the underwater unmanned vehicle node carrying the message broadcasts to the other nodes within its communication range, requests them to feed back their node information, acquires their data packets, performs a primary screening according to the indirect trust value ITValue in each packet, and selects the nodes whose indirect trust value exceeds a threshold as candidate relay nodes for further evaluation.
In this embodiment, node information is acquired by performing a primary screening within the node's communication range.
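As an illustration of this screening step, a minimal sketch follows (Python is used here purely for illustration); the feedback-packet structure, the NodeInfo fields, and the 0.5 threshold are assumptions, since the patent does not specify the threshold value.

```python
from dataclasses import dataclass

@dataclass
class NodeInfo:
    """Node information fed back in response to the broadcast (assumed fields)."""
    node_id: int
    it_value: float  # indirect trust value ITValue carried in the feedback packet

IT_THRESHOLD = 0.5   # assumed screening threshold; the patent leaves it unspecified

def primary_screen(replies: list[NodeInfo]) -> list[NodeInfo]:
    """Keep only nodes whose ITValue exceeds the threshold as candidate relays."""
    return [n for n in replies if n.it_value > IT_THRESHOLD]

# Example: three nodes answer the broadcast; nodes 2 and 4 pass the screening.
replies = [NodeInfo(2, 0.7), NodeInfo(3, 0.3), NodeInfo(4, 0.9)]
print([n.node_id for n in primary_screen(replies)])  # -> [2, 4]
```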
Third embodiment: this embodiment is described with reference to FIG. 2 and further limits the method of the first embodiment. In this embodiment, the direct trust value DTValue evaluation elements are selected as: 1. inter-node communication quality, estimated from the relative distance between nodes, which is computed from the send/receive time difference of node data packets; 2. node familiarity; 3. node relay ratio.
The indirect trust value ITValue guarantees the objectivity of the evaluation of the current node: each node maintains a dynamic trust value table recording the comprehensive trust values that other nodes have assigned to it, and the average of the data in this table is output as the indirect trust value.
In this embodiment, the evaluation elements consist of the indirect trust value ITValue and the direct trust value DTValue; the comprehensive trust value CTValue of a candidate relay node is obtained through their combined calculation, realizing safe and effective information transmission.
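For illustration, the following is a minimal fuzzy-inference sketch that combines DTValue and ITValue into CTValue. The patent does not publish its membership functions or rule base, so the triangular sets, the Mamdani-style min rule firing, the rule table, and the weighted-average defuzzification below are all illustrative assumptions.

```python
def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function peaking at b on the interval [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Fuzzy sets "low", "medium", "high" on [0, 1] for both inputs (assumed shapes).
SETS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}
# Crisp output level per output label, used in the weighted-average defuzzification.
OUT = {"low": 0.0, "medium": 0.5, "high": 1.0}
# Assumed rule base: (DTValue label, ITValue label) -> CTValue label.
RULES = {("low", "low"): "low", ("low", "medium"): "low",
         ("low", "high"): "medium", ("medium", "low"): "low",
         ("medium", "medium"): "medium", ("medium", "high"): "high",
         ("high", "low"): "medium", ("high", "medium"): "high",
         ("high", "high"): "high"}

def ct_value(dt: float, it: float) -> float:
    """Weighted average over all fired rules (min firing strength per rule)."""
    num = den = 0.0
    for (ld, li), lo in RULES.items():
        w = min(tri(dt, *SETS[ld]), tri(it, *SETS[li]))  # rule firing strength
        num += w * OUT[lo]
        den += w
    return num / den if den else 0.0

# Example: a node with fairly high direct trust and moderate indirect trust.
print(round(ct_value(0.8, 0.6), 3))
```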
Fourth embodiment: this embodiment is described with reference to FIG. 2. In this embodiment, the path loss estimated from the relative distance between nodes measures the inter-node communication quality; the path loss A(d, f) experienced by any pair of nodes over the underwater acoustic channel is:
A(d, f) = A0 · d^k · α1(f)^d
10·log A(d, f) = 10·log A0 + k·10·log d + d·10·log α1(f)
where f is the signal frequency in kHz, d is the distance in m, A0 is a unit normalization constant, k is the propagation factor characterizing the geometry of the propagation, and α1 is the absorption factor. The geometric spreading loss depends only on the propagation distance and is independent of frequency.
The term k·10·log d + d·10·log α1(f) represents the attenuation caused by the distance d and the absorption factor α1, where the absorption is given by Thorp's empirical formula (f in kHz, result in dB/km):
10·log α1(f) = 0.11·f²/(1 + f²) + 44·f²/(4100 + f²) + 2.75×10⁻⁴·f² + 0.003
the signal-to-noise ratio is inversely proportional to the distance d and the error rate, and the distance is directly proportional to the error rate, so that the smaller the distance is, the more reliable data transmission can be ensured.
In this embodiment, inter-node communication quality is an important factor in achieving effective message transmission to relay nodes during the communication of the underwater unmanned vehicle.
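A minimal sketch of the path-loss computation follows; the use of Thorp's empirical formula for the absorption factor and the spreading factor value k = 1.5 (practical spreading) are assumptions, since the patent only names the quantities.

```python
import math

def thorp_absorption_db_per_km(f_khz: float) -> float:
    """Thorp's empirical absorption, 10*log10(alpha1(f)), in dB/km (f in kHz)."""
    f2 = f_khz ** 2
    return 0.11 * f2 / (1 + f2) + 44 * f2 / (4100 + f2) + 2.75e-4 * f2 + 0.003

def path_loss_db(d_m: float, f_khz: float, k: float = 1.5, a0_db: float = 0.0) -> float:
    """Path loss 10*log10(A(d, f)) in dB: spreading term plus absorption term."""
    absorption_db = thorp_absorption_db_per_km(f_khz) * (d_m / 1000.0)  # dB/km -> dB
    return a0_db + k * 10 * math.log10(d_m) + absorption_db

# Example: a 20 kHz signal over 500 m; a smaller d gives a smaller loss, matching
# the observation above that shorter links are more reliable.
print(round(path_loss_db(500.0, 20.0), 2), "dB")
```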
Fifth embodiment: this embodiment is described with reference to FIG. 2 and further limits the method of the third embodiment. In this embodiment, the node familiarity is determined as follows:
Each node on the transmission path records its interactions with the previous-hop and next-hop nodes, including the counterpart's node number, the destination node, the start and end times of the transmission, and the number of interactions. After the destination node successfully receives the message, it broadcasts into the network an acknowledgement data packet containing only a packet header, which carries the successful-transmission-path information together with the destination node information. A node that receives this header message checks its interaction records: if it appears on the successful transmission path, it adds its previous-hop and next-hop nodes to its own successful-cooperative-transmission node table; for nodes already present in the table, only the recorded data are updated, namely the start and end times of the transmission and the accumulated transmission count. The nodes in this table can be regarded as friend nodes of the current node. If an interaction record is not found on any successful transmission path, it is automatically cleared after a certain time.
After the network has operated successfully for a period of time, each node has its own friend nodes. Influenced by node movement speed and transmission radius, the contact interval between friend nodes obeys a negative exponential distribution; the contact intervals are therefore assumed to follow a negative exponential distribution, and the contact probability between friend nodes is estimated.
When B is a friend node of A, P_A,B(T) denotes the probability that nodes A and B come into contact within time T, and θ_A,B denotes the mean of the negative exponential distribution of the contact intervals between A and B:
P_A,B(T) = 1 − e^(−T/θ_A,B)
The historical transmission interval records are obtained through successful cooperative transmissions between the nodes, and the value of θ_A,B can be estimated by the maximum likelihood method, which yields
θ̂_A,B = (1/n) · Σ_(i=1..n) x_i
where n is the total number of historical transmission intervals that can be acquired and x_i is the i-th transmission interval; since the recorded number of successful interactions with friend nodes in a mobile opportunistic network is finite, the value of n differs between nodes;
therefore, the mean of the contact-interval exponential distribution can be estimated by the statistical average of the friend nodes' contact intervals, and the contact probability of nodes A and B within time T is given by
P_A,B(T) = 1 − e^(−T/x̄_A,B)
where x̄_A,B is the statistical average of the historical transmission intervals:
x̄_A,B = (1/n) · Σ_(i=1..n) x_i
in the embodiment, the routing is carried out by calculating and measuring the contact probability between the friend nodes, so that the safety of information transmission can be effectively improved, and the effective transmission of data packets can be guaranteed.
Sixth embodiment: this embodiment is described with reference to FIG. 2 and further limits the method of the third embodiment. In this embodiment, the node relay ratio is:
P_ret = P_A,B(T) / N_r
where P_ret is the node relay ratio and N_r is the number of messages the node has received.
The node relay ratio P_ret described in this embodiment well reflects a node's ability to forward messages; introducing this element allows failed underwater nodes to be identified and underwater voids to be avoided, realizing safe and reliable message transmission.
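A minimal sketch of the friend-node contact probability and the node relay ratio follows; the sample interval data, the contact window T, and the guard against division by zero are illustrative assumptions.

```python
import math
from statistics import mean

def contact_probability(intervals: list[float], t: float) -> float:
    """P_A,B(T) = 1 - exp(-T / mean(x_i)): negative exponential contact model,
    with the distribution mean estimated by the statistical average (the MLE)."""
    if not intervals:
        return 0.0
    return 1.0 - math.exp(-t / mean(intervals))

def relay_ratio(intervals: list[float], t: float, n_received: int) -> float:
    """P_ret = P_A,B(T) / N_r, where N_r is the number of messages received."""
    return contact_probability(intervals, t) / max(n_received, 1)  # avoid /0

# Example: five historical transmission intervals (seconds), a 60 s window,
# and a node that has received 12 messages.
history = [30.0, 45.0, 25.0, 60.0, 40.0]
print(round(contact_probability(history, 60.0), 3),
      round(relay_ratio(history, 60.0, 12), 4))
```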
Seventh embodiment: this embodiment is described with reference to FIG. 1. In this embodiment, according to the comprehensive trust value of the evaluated node output by the fuzzy logic system, reinforcement learning is used to perform routing selection and to set the state-action value update function and the reward function, comprising the following steps:
determining the comprehensive trust value of the encountered node by the fuzzy logic method, using the Q-learning strategy of reinforcement learning to find a suitable forwarding path for the message, and defining the update formula of the state-action value Q as:
Q_d(s,x) ← (1 − α)·Q_d(s,x) + α·[ R_d(s,x) + γ_d(s,x)·max_(y∈N_x) Q′_d(x,y) ]
where Q_d(s,x) is the state-action value of selecting node x as the next-hop forwarding node at node s for a data packet destined for node d, i.e. the forwarding utility Q value of node s forwarding a packet destined for d to node x; on each update the corresponding Q value stored in the state-action value table is taken out and substituted into the formula, and the updated value is stored back into the table; α is the learning coefficient, 0 ≤ α ≤ 1; γ_d(s,x) is the dynamic discount factor for forwarding a packet destined for d from node s to node x; N_x denotes the contact-node set of node x, containing all nodes encountered during the movement of node x; R_d(s,x) denotes the immediate return (the reward function defined below); and Q′_d(x,y) is the state-action value weighted by the node comprehensive trust value, introduced to guarantee the security dynamics of the mobile opportunistic network.
Assuming the learning rate α is 1, the state-action value update formula becomes:
Q_d(s,x) = R_d(s,x) + γ_d(s,x)·max_(y∈N_x) Q′_d(x,y)
Q′_d(x,y) = CTValue(x,y)·Q_d(x,y)
The dynamic discount factor γ_d(s,x) is
γ_d(s,x) = γ·e^(CTValue(s,x) − 1)
where γ is a fixed constant, γ ∈ (0, 1).
As the data packets described in this embodiment are transmitted through the network, the reward value and the dynamic discount factor gradually take effect through iterative updates, so that relay nodes with higher comprehensive trust values are selected for the data packets; since the comprehensive trust value reflects both network security and efficiency, network transmission performance is improved.
Eighth embodiment: this embodiment is described with reference to FIG. 1. In this embodiment, the reward function is an immediate return value, a function of the node comprehensive trust value, serving as positive feedback to the nodes on a successfully transmitted path:
R_d(s,x) = e^(CTValue(s,x)) − 1  if node x is the destination node d
R_d(s,x) = 0                     otherwise
where CTValue(s,x) denotes the comprehensive trust value of encountered node x as evaluated by node s, obtained through the fuzzy logic system; if node s forwards the data packet to its destination node d, the immediate return value obtained is e^(CTValue(s,x)) − 1, and otherwise it is 0; the larger the node's comprehensive trust value, the larger the immediate return obtained.
The positive feedback is the feedback issued after the message is successfully delivered to the destination node.
This embodiment uses a greedy strategy to update the state-action values, selecting the maximum trust-weighted state-action value from the action set of node x,
max_(y∈N_x) Q′_d(x,y),
for the iterative update of the state-action values. In underwater unmanned vehicle networking, the encountered-node trust value plays an important role in secure route design, so incorporating the encountered node's comprehensive trust value into the state-action value update guarantees the safety and reliability of the transmission path.
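A minimal sketch of this trust-weighted update follows; the table layout (a dictionary keyed by destination, state, and action), the CTValue lookup, and the example numbers are illustrative assumptions, and the learning rate α is kept general rather than fixed at 1.

```python
import math
from collections import defaultdict

GAMMA = 0.8   # fixed constant gamma in (0, 1); the value is an assumption
ALPHA = 1.0   # learning coefficient, 0 <= alpha <= 1

Q = defaultdict(float)        # Q[(d, s, x)]: forwarding utility Q_d(s, x)
contacts = defaultdict(set)   # contacts[x]: contact-node set N_x of node x

def reward(d, x, ct_sx: float) -> float:
    """Immediate return: e^CTValue(s,x) - 1 if x is the destination d, else 0."""
    return math.exp(ct_sx) - 1.0 if x == d else 0.0

def dynamic_discount(ct_sx: float) -> float:
    """gamma_d(s, x) = gamma * e^(CTValue(s,x) - 1); stays below GAMMA."""
    return GAMMA * math.exp(ct_sx - 1.0)

def update_q(d, s, x, ct) -> None:
    """One update of Q_d(s, x); ct maps node pairs to comprehensive trust values."""
    # Greedy step: maximum trust-weighted value Q'_d(x, y) over the set N_x.
    best = max((ct[(x, y)] * Q[(d, x, y)] for y in contacts[x]), default=0.0)
    target = reward(d, x, ct[(s, x)]) + dynamic_discount(ct[(s, x)]) * best
    Q[(d, s, x)] = (1 - ALPHA) * Q[(d, s, x)] + ALPHA * target

# Example: node s=1 meets x=2 with a packet destined for d=9; node 2 has met {3, 9}.
contacts[2] = {3, 9}
ct = defaultdict(lambda: 0.6)  # assumed uniform comprehensive trust values
Q[(9, 2, 9)] = 1.0             # seed: node 2 already has utility toward node 9
update_q(9, 1, 2, ct)
print(round(Q[(9, 1, 2)], 4))
```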
The data packet containing only the header in the embodiment of the present invention is shown in table 1.
(Table 1, the format of the header-only data packet, is reproduced as an image in the original publication.)
When node x receives the message forwarded by node s: if the selected relay node x is not the final destination of the data packet, node x adds its own related information to the path information in the packet header, the packet sequence number Packet_id being kept unchanged. When node s subsequently receives the forwarded data packet broadcast by node x, that packet acts as an acknowledgement: node s extracts the information related to node x from the packet header to replace its previous information about node x and then calculates node x's comprehensive trust value; node s obtains an immediate return of 0 at this point and updates the state-action value Q_d(s,x) corresponding to node x.
If node x is the final destination of the data packet, it does not need to forward the packet further; it only broadcasts to the other nodes a load-free message packet carrying its own information, with the packet sequence number Packet_id set to −1, indicating that the packet is used to update the other nodes' information.
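The following sketch mirrors the forwarding and implicit-acknowledgement behavior just described; the Packet fields and the handling shown are illustrative assumptions based on this embodiment.

```python
from dataclasses import dataclass, field

@dataclass
class Packet:
    packet_id: int               # sequence number Packet_id; -1 marks an update packet
    dest: int                    # final destination node of the data packet
    path: list[int] = field(default_factory=list)  # path information in the header

def on_receive(node_id: int, pkt: Packet) -> Packet:
    """Return the packet this node broadcasts after receiving pkt."""
    if node_id == pkt.dest:
        # Destination: broadcast a load-free packet with own info, Packet_id = -1.
        return Packet(packet_id=-1, dest=pkt.dest, path=pkt.path + [node_id])
    # Relay: append own information to the header path; Packet_id is kept, and
    # the previous sender treats the overheard rebroadcast as an acknowledgement.
    return Packet(packet_id=pkt.packet_id, dest=pkt.dest, path=pkt.path + [node_id])

# Example: relay node 2 forwards a packet from node 1 that is destined for node 9.
out = on_receive(2, Packet(packet_id=7, dest=9, path=[1]))
print(out.packet_id, out.path)  # -> 7 [1, 2]
```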
TABLE 2 State-action value update and packet forwarding procedures
(Table 2 is reproduced as images in the original publication.)
Ninth embodiment: the reinforcement-learning-based underwater unmanned vehicle safety opportunity routing apparatus according to this embodiment comprises:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing a reinforcement learning-based underwater unmanned vehicle safety opportunity routing method as in any of the above embodiments.
Tenth embodiment: a computer device according to this embodiment comprises a memory in which a computer program is stored and a processor; when the processor executes the computer program stored in the memory, it performs a reinforcement-learning-based underwater unmanned vehicle safety opportunity routing method as described in any of the above embodiments.

Claims (10)

1. An underwater unmanned vehicle safety opportunity routing method based on reinforcement learning is characterized by comprising the following steps:
primarily screening nodes in a communication range by using an underwater unmanned vehicle, and establishing a trust evaluation model according to the primarily screened nodes;
evaluating the preliminarily screened nodes by using a trust evaluation model, wherein evaluation elements of the evaluation model consist of a direct trust value DTvalue and an indirect trust value ITvalue;
inputting the evaluation elements into a fuzzy logic system to obtain the comprehensive trust value of the evaluated node, and updating it into the dynamic table of encountered-node trust values;
according to the comprehensive trust value of the evaluated node output by the fuzzy logic system, performing routing selection by using reinforcement learning, setting a state-action value update function, and setting a reward function.
2. The reinforcement learning-based underwater unmanned vehicle safety opportunity routing method as claimed in claim 1, wherein the process by which the underwater unmanned vehicle primarily screens the nodes within its communication range and establishes the trust evaluation model from the primarily screened nodes comprises the following steps:
the underwater unmanned vehicle node carrying the message broadcasts to the other nodes within its communication range, requests them to feed back their node information, acquires their data packets, performs a primary screening according to the indirect trust value ITValue in each packet, and selects the nodes whose indirect trust value exceeds a threshold as candidate relay nodes for further evaluation.
3. The reinforcement learning-based underwater unmanned vehicle safety opportunity routing method according to claim 1, wherein the direct trust value DTValue evaluation elements are selected as: 1. inter-node communication quality, estimated from the relative distance between nodes, which is computed from the send/receive time difference of node data packets; 2. node familiarity; 3. node relay ratio;
the indirect trust value ITValue guarantees the objectivity of the evaluation of the current node: each node maintains a dynamic trust value table recording the comprehensive trust values that other nodes have assigned to it, and the average of the data in this table is output as the indirect trust value.
4. The reinforcement learning-based underwater unmanned vehicle safety opportunity routing method according to claim 3, wherein the path loss estimated from the relative distance between nodes measures the inter-node communication quality, the path loss A(d, f) experienced by any pair of nodes over the underwater acoustic channel being:
A(d, f) = A0 · d^k · α1(f)^d
10·log A(d, f) = 10·log A0 + k·10·log d + d·10·log α1(f)
where f is the signal frequency in kHz, d is the distance in m, A0 is a unit normalization constant, k is the propagation factor characterizing the geometry of the propagation, and α1 is the absorption factor.
5. The reinforcement learning-based underwater unmanned vehicle safety opportunity routing method of claim 3, wherein the node familiarity comprises:
each node records its interactions with the previous-hop and next-hop nodes, including the counterpart's node number, the destination node, the start and end times of the transmission, and the number of interactions;
after receiving the message, the destination node broadcasts into the network an acknowledgement data packet containing only a packet header, which carries the successful-transmission-path information together with the destination node information;
a node that receives the header message checks its interaction records; if it appears on the successful transmission path, its previous-hop and next-hop nodes enter its own successful-cooperative-transmission node table, and it is judged whether those nodes already exist in the table:
for nodes already in the table, only the recorded data are updated, namely the start and end times of the transmission and the accumulated transmission count; the nodes in the table can be regarded as friend nodes of the current node;
if an interaction record is not found on any successful transmission path, it is automatically cleared after a certain time;
after the network has operated successfully, each node has its own friend nodes; influenced by node movement speed and transmission radius, the contact interval between friend nodes obeys a negative exponential distribution, so the contact intervals are modeled as negatively exponentially distributed and the contact probability between friend nodes is estimated:
P_A,B(T) = 1 − e^(−T/x̄_A,B)
where B is a friend node of A, P_A,B(T) denotes the probability that nodes A and B come into contact within time T, n is the total number of acquired historical transmission intervals, and x_i is the i-th transmission interval; in a mobile opportunistic network the recorded number of successful interactions with friend nodes is finite, so the value of n differs between nodes; x̄_A,B = (1/n) · Σ_(i=1..n) x_i is the statistical average of the historical transmission intervals.
6. The reinforcement learning-based underwater unmanned vehicle safety opportunity routing method of claim 3, wherein the node relay ratio is:
P_ret = P_A,B(T) / N_r
where P_ret is the node relay ratio and N_r is the number of messages the node has received.
7. The reinforcement learning-based underwater unmanned vehicle safety opportunity routing method of claim 1, wherein, according to the comprehensive trust value of the evaluated node output by the fuzzy logic system, reinforcement learning is used to perform routing selection and to set the state-action value update function and the reward function, comprising the steps of:
determining the comprehensive trust value of the encountered node by the fuzzy logic method, using the Q-learning strategy of reinforcement learning to find a suitable forwarding path for the message, and defining the update formula of the state-action value Q as:
Q_d(s,x) ← (1 − α)·Q_d(s,x) + α·[ R_d(s,x) + γ_d(s,x)·max_(y∈N_x) Q′_d(x,y) ]
where Q_d(s,x) is the state-action value of selecting node x as the next-hop forwarding node at node s for a data packet destined for node d, i.e. the forwarding utility Q value of node s forwarding a packet destined for d to node x; on each update the corresponding Q value stored in the state-action value table is taken out and substituted into the formula, and the updated value is stored back into the table; α is the learning coefficient, 0 ≤ α ≤ 1; γ_d(s,x) is the dynamic discount factor for forwarding a packet destined for d from node s to node x; N_x denotes the contact-node set of node x, containing all nodes encountered during the movement of node x; R_d(s,x) denotes the immediate return (the reward function of claim 8); and Q′_d(x,y) is the state-action value weighted by the node comprehensive trust value, introduced to guarantee the security dynamics of the mobile opportunistic network;
the dynamic discount factor γ_d(s,x) is
γ_d(s,x) = γ·e^(CTValue(s,x) − 1)
where γ is a fixed constant, γ ∈ (0, 1).
8. The reinforcement learning-based underwater unmanned vehicle safety opportunity routing method of claim 1, wherein the reward function is an immediate return value, a function of the node comprehensive trust value, serving as positive feedback to the nodes on a successfully transmitted path:
R_d(s,x) = e^(CTValue(s,x)) − 1  if node x is the destination node d
R_d(s,x) = 0                     otherwise
wherein CTValue(s,x) denotes the comprehensive trust value of encountered node x as evaluated by node s;
the positive feedback is the feedback issued after the message is successfully delivered to the destination node.
9. An underwater unmanned vehicle safety opportunity routing device based on reinforcement learning, comprising:
one or more processors;
a memory; and
one or more programs, wherein one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing a reinforcement learning-based underwater unmanned vehicle safety opportunity routing method of any of claims 1-8.
10. A computer device, characterized by: comprising a memory and a processor, the memory having a computer program stored therein, the processor when executing the computer program stored in the memory performing a reinforcement learning-based underwater unmanned vehicle safety opportunity routing method according to any one of claims 1-8.
CN202111176454.0A 2021-10-09 2021-10-09 Underwater unmanned vehicle safety opportunity routing method and device based on reinforcement learning Active CN114025405B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111176454.0A CN114025405B (en) 2021-10-09 2021-10-09 Underwater unmanned vehicle safety opportunity routing method and device based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111176454.0A CN114025405B (en) 2021-10-09 2021-10-09 Underwater unmanned vehicle safety opportunity routing method and device based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN114025405A true CN114025405A (en) 2022-02-08
CN114025405B CN114025405B (en) 2023-07-28

Family

ID=80055812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111176454.0A Active CN114025405B (en) 2021-10-09 2021-10-09 Underwater unmanned vehicle safety opportunity routing method and device based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN114025405B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114692551A (en) * 2022-03-22 2022-07-01 中国科学院大学 Method for detecting safety key signals of Verilog design files

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129974A1 (en) * 2016-11-04 2018-05-10 United Technologies Corporation Control systems using deep reinforcement learning
CN109547351A (en) * 2019-01-22 2019-03-29 西安电子科技大学 Method for routing based on Q study and trust model in Ad Hoc network
CN111065145A (en) * 2020-01-13 2020-04-24 清华大学 Q learning ant colony routing method for underwater multi-agent
US20210111988A1 (en) * 2019-10-10 2021-04-15 United States Of America As Represented By The Secretary Of The Navy Reinforcement Learning-Based Intelligent Control of Packet Transmissions Within Ad-Hoc Networks
CN112954769A (en) * 2021-01-25 2021-06-11 哈尔滨工程大学 Underwater wireless sensor network routing method based on reinforcement learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129974A1 (en) * 2016-11-04 2018-05-10 United Technologies Corporation Control systems using deep reinforcement learning
CN109547351A (en) * 2019-01-22 2019-03-29 西安电子科技大学 Method for routing based on Q study and trust model in Ad Hoc network
US20210111988A1 (en) * 2019-10-10 2021-04-15 United States Of America As Represented By The Secretary Of The Navy Reinforcement Learning-Based Intelligent Control of Packet Transmissions Within Ad-Hoc Networks
CN111065145A (en) * 2020-01-13 2020-04-24 清华大学 Q learning ant colony routing method for underwater multi-agent
CN112954769A (en) * 2021-01-25 2021-06-11 哈尔滨工程大学 Underwater wireless sensor network routing method based on reinforcement learning

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114692551A (en) * 2022-03-22 2022-07-01 中国科学院大学 Method for detecting safety key signals of Verilog design files

Also Published As

Publication number Publication date
CN114025405B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
US6493759B1 (en) Cluster head resignation to improve routing in mobile communication systems
KR101091740B1 (en) Method for adjusting a transmitting power in a wire-less communications network
US20020071395A1 (en) Mechanism for performing energy-based routing in wireless networks
CN103118439B (en) based on the data fusion method of sensor network node universal middleware
Patil et al. Serial data fusion using space-filling curves in wireless sensor networks
CN109936866B (en) Wireless mesh network opportunistic routing method based on service quality guarantee
CN110324805B (en) Unmanned aerial vehicle-assisted wireless sensor network data collection method
US9621411B2 (en) Relaying information for an unreliably heard utility node
CN111049743B (en) Joint optimization underwater sound multi-hop cooperative communication network routing selection method
US9485676B2 (en) Wireless communication device and method for searching for bypass route in wireless network
US10798158B1 (en) Network system and decision method
CN106658539B (en) Mobile path planning method for mobile data collector in wireless sensor network
US10193661B2 (en) Communication device, non-transitory computer readable medium and wireless communication system
CN114025405A (en) Underwater unmanned vehicle safety opportunity routing method and device based on reinforcement learning
CN116261202A (en) Farmland data opportunity transmission method and device, electronic equipment and medium
US10313956B2 (en) Communication method within a dynamic-depth cluster of communicating electronic devices, communicating electronic device implementing said method and associated system
CN114430581B (en) Ant colony strategy-based AC-OLSR routing method, equipment and medium
JP5821467B2 (en) Wireless terminal
US8825104B2 (en) Wireless communication apparatus, wireless communication system and transmitting power control method
CN113347679A (en) Data transmission method and device, storage medium and electronic device
CN113163411B (en) Satellite network clustering method and device, electronic equipment and storage medium
JP2001128231A (en) Variable area adhoc network
CN110831006A (en) Ad hoc network system and data transmission method thereof
US11968252B2 (en) Peer selection for data distribution in a mesh network
KR100874009B1 (en) Repeater selection method in mobile communication system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant