CN116600265B - Unmanned ship self-organizing network routing method based on multi-agent QMIX algorithm - Google Patents

Info

Publication number
CN116600265B
Authority
CN
China
Prior art keywords
unmanned
unmanned ship
ship
hoc network
agent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310650889.7A
Other languages
Chinese (zh)
Other versions
CN116600265A (en)
Inventor
温广辉
周艳
郑治
罗中婧
邵佳伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202310650889.7A
Publication of CN116600265A
Application granted
Publication of CN116600265B
Current legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30 Services specially adapted for particular environments, situations or purposes
    • H04W4/40 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • H04W24/06 Testing, supervising or monitoring using simulated traffic
    • H04W40/00 Communication routing or communication path finding
    • H04W40/24 Connectivity information management, e.g. connectivity discovery or connectivity update
    • H04W40/246 Connectivity information discovery
    • H04W40/248 Connectivity information update
    • H04W84/00 Network topologies
    • H04W84/18 Self-organising networks, e.g. ad-hoc networks or sensor networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses an unmanned ship self-organizing network routing method based on a multi-agent QMIX algorithm, which comprises the following steps: modeling the unmanned ship dynamic communication network as an ad hoc network, and setting the moving area and moving mode of the unmanned ships; establishing a communication model of the unmanned ship ad hoc network, and calculating the related parameters between unmanned ships; describing the objective of the unmanned ship ad hoc network route optimization problem, and giving the constraint conditions for unmanned ship movement; modeling the unmanned ship ad hoc network routing problem as a reinforcement learning problem, and designing the elements of the multi-agent QMIX algorithm; dividing each unmanned ship agent into two sub-agents, which are respectively responsible for unmanned ship trajectory design and selection of the next-hop unmanned ship; and exchanging information among the unmanned ship agents, with each unmanned ship using the state-action value of its next-hop unmanned ship to update its own value function. The multi-agent QMIX algorithm provided by the invention can obtain an intelligent real-time routing strategy and maintain the reliability of the communication links between unmanned ships.

Description

Unmanned ship self-organizing network routing method based on multi-agent QMIX algorithm
Technical Field
The invention relates to the technical field of network routing, in particular to an unmanned ship self-organizing network routing method based on a multi-agent QMIX algorithm.
Background
The unmanned ship self-organizing network is a decentralized, autonomous wireless ad hoc network composed of a plurality of unmanned ships, and is characterized by self-organization, self-repair and self-adaptation. Compared with traditional wired networks and sensor networks, the unmanned ship ad hoc network offers greater flexibility, real-time performance and robustness, and can be widely applied in fields such as marine survey, environmental monitoring and marine rescue. In an unmanned ship ad hoc network, the routing algorithm is a critical component: it determines how data packets are forwarded between unmanned ships so that information reaches its destination in time, and it directly affects the performance and efficiency of the network. Traditional routing algorithms face many challenges in unmanned ship ad hoc networks, such as the high moving speed of the unmanned ships, the complex and changeable environment, and the dynamically changing network topology. A routing method suited to unmanned ship ad hoc networks is therefore needed to reduce the transmission time of data packets and the probability of network congestion.
Among existing routing methods based on deep reinforcement learning, the SDN-routing algorithm proposed in [Stampa G, Arias M, Sánchez-Charles D, et al. A deep-reinforcement learning approach for software-defined networking routing optimization. arXiv preprint arXiv:1709.07080, 2017] was the first to use single-agent deep reinforcement learning for route optimization in traffic engineering. It takes the traffic demand between source and destination nodes as the state set and determines the transmission path of data packets through a network controller, effectively reducing network delay. Because it is based on single-agent deep reinforcement learning, this method cannot solve the multi-task decision problem of the unmanned ship ad hoc network. The work in [Mukhutdinov D, Filchenkov A, Shalyto A, et al. Multi-agent deep learning for simultaneous optimization for time and energy in distributed routing system. Future Generation Computer Systems, 2019, 94: 587-600] combines the Q-learning algorithm with the DQN algorithm and was the first to use multi-agent deep reinforcement learning to optimize routing problems. In the centralized training phase, each router is regarded as an agent, and the agents share their parameters and update them simultaneously. In the execution phase, each agent issues independent instructions for the transmission of data packets. The algorithm optimizes both the transmission time and the energy consumption of data packets.
The work in [You X, Li X, Xu Y, et al. Toward packet routing with fully distributed multiagent deep reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 52(2): 855-68] proposes DQRC, a routing method based on multi-agent reinforcement learning. Each router has a long short-term memory (LSTM) recurrent neural network for training and decision-making in a distributed environment; it extracts routing features from backlogged data packets and past actions, and effectively approximates the Q-learning value function. The algorithm balances congestion awareness against shortest paths and significantly reduces the transmission time of data packets. However, most existing routing methods based on multi-agent deep reinforcement learning assume a static network model and are not suitable for routing decisions in a dynamic network.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an unmanned ship self-organizing network routing method based on the multi-agent QMIX algorithm. By designing the movement trajectories of the unmanned ships and selecting a suitable next hop for each data packet, the method reduces the data packet transmission time and the network congestion probability while maintaining reliable communication links between unmanned ships, providing a technical guarantee for the efficient and stable operation of the unmanned ship self-organizing network.
In order to solve the technical problems, the invention provides an unmanned ship self-organizing network routing method based on a multi-agent QMIX algorithm, which comprises the following steps:
step 1, modeling a dynamic communication network of an unmanned ship into an ad hoc network, and setting a moving area and a moving mode of the unmanned ship;
step 2, establishing a communication model of the unmanned ship ad hoc network, and calculating the signal-to-noise ratio, the transmission rate and the transmission time of data packets of communication between unmanned ships;
step 3, describing a target of the unmanned ship ad hoc network route optimization problem, and giving constraint conditions of unmanned ship movement;
step 4, modeling the unmanned ship ad hoc network routing problem as a reinforcement learning problem, and designing the elements of the multi-agent QMIX algorithm;
step 5, dividing each unmanned ship agent into two sub-agents, which are respectively responsible for unmanned ship trajectory design and selection of the next-hop unmanned ship;
step 6, exchanging information among the unmanned ship agents, each unmanned ship updating its value function using the state-action value of its next-hop unmanned ship.
Preferably, in step 1, the unmanned ship dynamic communication network is modeled as an ad hoc network in which the unmanned ships are regarded as nodes; the unmanned ship ad hoc network is composed of N unmanned ships, whose indexes are expressed as the set $\{1, 2, \ldots, N\}$.
Preferably, in step 1, the moving area of the unmanned ships in the system is described as follows: the moving area is a square with side length L, and the shore-based base station is located at the edge of the area; unmanned ship N stays near the base station and maintains a communication connection with it, and is also the destination of the data packets in the unmanned ship ad hoc network.
Preferably, in step 1, the movement mode of the unmanned ships in the system is described as follows: each unmanned ship moves within the area at a fixed speed V, and the position of unmanned ship n at time t is $p_n(t)$, $0 \le t \le T$, where T represents the total running time of the unmanned ship ad hoc network. The system is assumed to run in time slots of length $\delta_t$, and the moving direction of each unmanned ship remains unchanged within a time slot. The movement of unmanned ship n is expressed as $p_n(t+1) = p_n(t) + V \delta_t e_n(t)$, where $e_n(t)$ represents the moving direction of unmanned ship n at time t, so that the position of an unmanned ship is linear in its moving direction.
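The slotted movement model can be illustrated with a minimal Python sketch. It assumes the square area has its corner at the origin and that one call advances one time slot; the function names `step_position` and `inside_area` and the slot length used in the example are hypothetical and are not taken from the patent.

```python
from typing import Tuple

Vec = Tuple[float, float]


def step_position(p: Vec, e: Vec, speed: float, slot: float) -> Vec:
    """Advance one time slot: p(t+1) = p(t) + speed * slot * e(t)."""
    return (p[0] + speed * slot * e[0], p[1] + speed * slot * e[1])


def inside_area(p: Vec, side: float) -> bool:
    """Check that a position lies inside the square [0, side] x [0, side] (assumed origin)."""
    return 0.0 <= p[0] <= side and 0.0 <= p[1] <= side


# Example: V = 10 m/s, an assumed slot length of 1 s, moving right twice and then forward once.
p = (100.0, 100.0)
for e in [(1, 0), (1, 0), (0, 1)]:
    p = step_position(p, e, speed=10.0, slot=1.0)
print(p)  # (120.0, 110.0)
```

Because the direction is fixed within a slot, the trajectory is fully described by the sequence of per-slot directions, which is what the trajectory sub-agent later has to choose.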
Preferably, in step 2, a communication model of the unmanned ship ad hoc network is established, and the calculation of the signal-to-noise ratio, the transmission rate and the transmission time of the data packet of the communication between the unmanned ships specifically comprises the following steps:
step 21, a communication model of the unmanned ship ad hoc network is as follows:
because the distance between a source unmanned ship and the destination unmanned ship N exceeds the communicable distance, each data packet can reach the destination unmanned ship N only through multiple hops across the unmanned ship ad hoc network; the transmission path between a source unmanned ship and the destination unmanned ship N is denoted $\omega = \{n_1, n_2, \ldots, n_{M_\omega}\}$, where $M_\omega$ is the number of unmanned ships on the data packet path and $n_m$ is the m-th unmanned ship on the path $\omega$;
the wireless channel transmission loss model of the system adopts the Longley-Rice model, which distinguishes three cases: line-of-sight transmission loss, diffraction transmission loss and scattering transmission loss; based on the Longley-Rice model, the transmission loss $L_{n,n'}(t)$ between unmanned ship n and unmanned ship n′ is expressed as $L_{n,n'}(t) = L_{free} + L_{ref}(d)$, where the free-space term is
$L_{free} = 32.45 + 20 \lg d + 20 \lg f$,
and $L_{ref}(d)$ takes a different form in each of the three distance ranges below; here $L_{free}$ represents the free-space path transmission loss, d the transmission distance, f the radio frequency, $k_1$ and $k_2$ the transmission loss coefficients, $m_d$ and $m_s$ the loss coefficients of diffraction and scattering respectively, and $L_{be}$, $L_{bed}$ and $L_{bes}$ the line-of-sight, diffraction and scattering transmission losses in free space respectively; if $d_{min} \le d \le d_{Ls}$, d is a line-of-sight transmission distance; if $d_{Ls} \le d \le d_x$, d is a diffraction transmission distance; if $d \ge d_x$, d is a scattering transmission distance; $d_{Ls}$ represents the smooth-earth distance, and $d_x$ is the distance at which the diffraction loss and the scattering loss are equal;
step 22, because the unmanned ships in the system operate in the same frequency band, communication between unmanned ships is subject to mutual interference; the signal-to-noise ratio of the signal that unmanned ship $n_{m+1}$ receives from unmanned ship $n_m$ depends on the following quantities: P, the transmit power of unmanned ship $n_m$; B, the communication bandwidth between unmanned ships; and $\xi_0$, the noise power spectral density;
step 23, the data packet transmission rate between unmanned ship $n_m$ and unmanned ship $n_{m+1}$ at time t is obtained from this signal-to-noise ratio and the bandwidth B;
step 24, each unmanned ship is provided with a buffer queue for storing received data packets; packets enter and leave the queue on a first-in, first-out basis, and in each time slot the unmanned ship can forward the data packet at the head of its buffer queue; the data packet sent from unmanned ship $n_m$ to unmanned ship $n_{m+1}$ has size $\sigma$; if the remaining buffer of unmanned ship $n_{m+1}$ is at least $\sigma$, the packet can be received, and if it is smaller than $\sigma$, network congestion is caused; if the communication link of the unmanned ship ad hoc network is normal, the transmission time of the data packet between the two unmanned ships follows from the packet size $\sigma$ and the transmission rate, and the total transmission time $t_{total}$ of the data packet on the path $\omega$ is the sum of the transmission times over all hops of the path.
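Steps 21 to 24 can be tied together in a short Python sketch. Only the free-space formula is taken from the text; the interference-plus-noise form of the signal-to-noise ratio, the Shannon-type rate $B \log_2(1 + \text{SINR})$, the per-hop time $\sigma / R$, the units (d in km, f in MHz, packet size in bits) and all function names are assumptions made for illustration, and the distance-dependent term $L_{ref}(d)$ is left out.

```python
import math
from typing import Iterable


def free_space_loss_db(d_km: float, f_mhz: float) -> float:
    """L_free = 32.45 + 20 lg d + 20 lg f, assuming d in km and f in MHz."""
    return 32.45 + 20.0 * math.log10(d_km) + 20.0 * math.log10(f_mhz)


def gain_from_loss(loss_db: float) -> float:
    """Convert a loss in dB into a linear power gain."""
    return 10.0 ** (-loss_db / 10.0)


def sinr(p_tx: float, loss_db: float, interferer_losses_db: Iterable[float],
         bandwidth_hz: float, noise_psd: float) -> float:
    """Assumed form: received power / (co-channel interference + B * xi_0)."""
    signal = p_tx * gain_from_loss(loss_db)
    interference = sum(p_tx * gain_from_loss(l) for l in interferer_losses_db)
    return signal / (interference + bandwidth_hz * noise_psd)


def rate_bps(bandwidth_hz: float, sinr_value: float) -> float:
    """Assumed Shannon-type rate B * log2(1 + SINR)."""
    return bandwidth_hz * math.log2(1.0 + sinr_value)


def hop_time(packet_bits: float, rate: float) -> float:
    """Assumed per-hop transmission time: packet size divided by rate."""
    return packet_bits / rate


def total_time(hop_times: Iterable[float]) -> float:
    """Total transmission time on a path: sum over its hops."""
    return sum(hop_times)


def causes_congestion(remaining_buffer: float, packet_size: float) -> bool:
    """Congestion occurs when the next hop cannot hold the packet."""
    return remaining_buffer < packet_size


# Example with made-up numbers: one interferer that is 20 dB weaker than the wanted link.
loss = free_space_loss_db(d_km=0.5, f_mhz=2400.0)
g = sinr(p_tx=1.0, loss_db=loss, interferer_losses_db=[loss + 20.0],
         bandwidth_hz=1e6, noise_psd=1e-17)
print(hop_time(25_000, rate_bps(1e6, g)))
```

The sketch makes the dependency chain explicit: distance drives loss, loss drives the signal-to-noise ratio, and the rate then fixes how long a packet of size $\sigma$ occupies the hop.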
Preferably, in step 3, the objective of the unmanned ship ad hoc network route optimization problem is described and the constraint conditions for unmanned ship movement are given, specifically as follows: by designing suitable unmanned ship movement trajectories and a data packet transmission path $\omega$, the transmission time of each data packet and the congestion probability of the network are reduced; the route optimization problem is therefore expressed as minimizing the total transmission time $t_{total}$ over the trajectories and the path $\omega$, subject to the following conditions:
the initial position of each unmanned ship n is given, and $\phi_l$ and $\phi_u$ respectively represent the lower and upper boundaries of the unmanned ship moving area; a speed constraint applies to each unmanned ship; the next-hop unmanned ship must be within the communication range of the current unmanned ship, and a service interruption event occurs if this constraint is not met; the unmanned ship must have enough remaining buffer to receive the incoming data packet; and $\phi_l \le p_n(t) \le \phi_u$ indicates that each unmanned ship is restricted to moving within the set area.
Preferably, in step 4, the unmanned ad hoc network routing problem is modeled as a reinforcement learning problem, and the design of the multi-agent QMIX algorithm element specifically includes the following steps:
step 41, the observation of the agent is described as follows: the observation $o_n(t)$ of unmanned ship n consists of the following parts: the position of the unmanned ship, the remaining buffer of the unmanned ship, and the size $\sigma$ of the data packet currently being transmitted;
step 42, the action of the agent is described as follows: the action $a_n(t)$ of each unmanned ship consists of two parts, one representing the trajectory design of the unmanned ship and the other representing its next-hop selection; the moving direction is chosen from five options, left, right, front, rear and rest: {(-1,0), (1,0), (0,1), (0,-1), (0,0)}; the unmanned ship selects a suitable next hop to reduce the data packet transmission time;
step 43, the reward of the agent is described as follows: the reward of each unmanned ship agent consists of the following parts: (1) the transmission time of the data packet from the current unmanned ship to the next-hop unmanned ship: the goal is to minimize the transmission time, so the reward is defined as the negative of the transmission time; (2) if the unmanned ship moves outside the defined area, a penalty value is applied, which confines the unmanned ship to the networking area; (3) if transmitting the data packet from the current unmanned ship to the next-hop unmanned ship causes network congestion at the next-hop unmanned ship, a penalty value is applied, which reduces the network congestion probability; (4) if the next-hop unmanned ship selected by the current unmanned ship is beyond the communication range, a penalty value is applied, which keeps the communication links between unmanned ships reliable; the total reward of unmanned ship n combines the transmission-time term, weighted by the positive weight $k_t$, with the three penalty terms, each multiplied by an indicator of whether unmanned ship n moves outside the defined area, whether congestion is caused at the selected next-hop unmanned ship, and whether the selected next-hop unmanned ship is within communication range, respectively.
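The reward of step 43 can be sketched in Python as below. The additive form with indicator-gated penalties follows the description; the penalty magnitudes, the default weight `k_t = 1.0`, the class name `RewardParams` and the convention of passing `tx_time = 0.0` when the hop is out of range are placeholders chosen for illustration and not values from the patent.

```python
from dataclasses import dataclass

# The five moving directions of step 42: left, right, front, rear, rest.
MOVES = [(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)]


@dataclass(frozen=True)
class RewardParams:
    """Hypothetical reward weights; the real values are not given in the text."""
    k_t: float = 1.0             # positive weight on transmission time
    pen_area: float = 10.0       # penalty for leaving the moving area
    pen_congestion: float = 10.0  # penalty for congesting the next hop
    pen_range: float = 10.0      # penalty for choosing a hop out of range


def reward(tx_time: float, left_area: bool, congested: bool,
           out_of_range: bool, params: RewardParams = RewardParams()) -> float:
    """Negative weighted transmission time minus the penalties that apply."""
    r = -params.k_t * tx_time
    if left_area:
        r -= params.pen_area
    if congested:
        r -= params.pen_congestion
    if out_of_range:
        r -= params.pen_range
    return r


print(reward(0.5, False, False, False))  # -0.5
print(reward(0.5, True, True, True))     # -30.5
```

With no penalty active the reward is simply the negative transmission time, which is the property the later discussion of state-action values relies on.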
Preferably, in step 5, each unmanned ship agent is divided into two sub-agents, which are respectively responsible for unmanned ship trajectory design and selection of the next-hop unmanned ship, specifically as follows: to cope with the large action space caused by the two types of decision, each unmanned ship agent uses the multi-agent QMIX algorithm; each unmanned ship agent is divided into two sub-agents, responsible for trajectory design and next-hop selection respectively, so that one large action space is split into two smaller ones; the action space is thereby reduced from 5 × (N-1) to 5 + (N-1), which effectively reduces the training complexity; under the greedy strategy of unmanned ship n, each sub-agent selects the action that maximizes its own state-action value function, one function covering trajectory design and the other next-hop selection, and the joint strategy of the unmanned ship agent is the combination of the two sub-agent strategies.
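The effect of the split on the action space, and the per-sub-agent greedy choice, can be checked with a small Python sketch. The function names are hypothetical, and the Q-value lists stand in for the outputs of the two sub-agent networks.

```python
from typing import Sequence, Tuple


def joint_action_count(n_ships: int) -> int:
    """Size of the undivided action space: 5 directions times (N - 1) next hops."""
    return 5 * (n_ships - 1)


def split_action_count(n_ships: int) -> int:
    """Total outputs after the split: 5 directions plus (N - 1) next hops."""
    return 5 + (n_ships - 1)


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def greedy_action(q_move: Sequence[float], q_hop: Sequence[float]) -> Tuple[int, int]:
    """Each sub-agent greedily maximizes its own state-action values."""
    return argmax(q_move), argmax(q_hop)


# With the 20 unmanned ships of Example 1:
print(joint_action_count(20), split_action_count(20))  # 95 24
print(greedy_action([0.1, 0.4, 0.2, 0.0, 0.3], [-2.0, -0.5, -1.0]))  # (1, 1)
```

The two greedy choices together are consistent with the greedy joint action only because QMIX constrains the mixing network to be monotonic in each sub-agent value, which is a general property of QMIX rather than something stated in this text.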
Preferably, in step 6, information is exchanged among the unmanned ship agents, and each unmanned ship updates its own value function using the state-action value of its next-hop unmanned ship, specifically as follows: although each unmanned ship is divided into two sub-agents, the mixing network still outputs a single action-value function for the unmanned ship; during training, each unmanned ship is therefore treated as one agent with respect to the other unmanned ships, and the algorithm computes the training target from the action-value function of the next-hop unmanned ship, which is obtained through information exchange between unmanned ships.
Preferably, the training target is defined using the state-action value of the next-hop unmanned ship.
Calculating the training target from the state-action values of other agents can break the correlation between the target network and the training network. After sufficient training, when few penalties are incurred, the state-action value of each unmanned ship agent can be regarded as the negative of the transmission time of the current data packet. For example, when the communication link between unmanned ship $n_m$ and the destination unmanned ship is normal, the reward of unmanned ship $n_m$ is the negative of the time taken to transmit the data packet from unmanned ship $n_m$ to the destination unmanned ship, and the state-action value is likewise the negative of the time taken to transmit the data packet from unmanned ship $n_m$ to the destination unmanned ship.
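Since the target formula itself is not reproduced in this text, the following Python sketch assumes a standard bootstrapped form, reward plus a discounted maximum of the next-hop ship's state-action values, with the destination treated as terminal. The function name `td_target`, the discount `gamma = 1.0` and the hop times are illustrative assumptions.

```python
from typing import Sequence


def td_target(reward: float, next_hop_q_values: Sequence[float],
              gamma: float = 1.0) -> float:
    """Assumed target: r + gamma * max Q of the next-hop ship (r alone at the destination)."""
    if not next_hop_q_values:
        return reward
    return reward + gamma * max(next_hop_q_values)


# Backing up along a three-hop path with per-hop times 0.25 s, 0.25 s and 0.5 s,
# penalty-free rewards (negative hop times) and gamma = 1.
hop_times = [0.25, 0.25, 0.5]
values = []
q_next = None
for t in reversed(hop_times):
    q = td_target(-t, [] if q_next is None else [q_next])
    values.append(q)
    q_next = q
print(list(reversed(values)))  # [-1.0, -0.75, -0.5]
```

Under these assumptions each value equals the negative of the remaining transmission time to the destination, which matches the interpretation of the state-action value given above.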
The beneficial effects of the invention are as follows: (1) the invention models the unmanned ship dynamic communication network as an ad hoc network, and reduces the data packet transmission time and the network congestion probability by designing the movement trajectories of the unmanned ships and selecting a suitable next hop for each data packet, while maintaining the reliability of the communication links between unmanned ships; (2) the invention divides each unmanned ship agent into two sub-agents, which are respectively responsible for designing the movement trajectory of the unmanned ship and selecting the next hop of the data packet; decomposing the agent in this way reduces its action space and simplifies the training process; (3) in the invention, the unmanned ship agents cooperate with each other in a distributed manner and learn a more intelligent routing strategy from interaction with the network environment, which reduces the data packet transmission time and the network congestion probability and improves the network throughput.
Drawings
FIG. 1 is a schematic diagram of the method steps of the present invention.
Fig. 2 is a schematic diagram of a network model structure according to the present invention.
FIG. 3 is a schematic flow chart of the method of the present invention.
FIG. 4 is a schematic diagram of the method of the present invention.
FIG. 5 is a diagram of the cumulative reward of all unmanned ship agents in each round provided by the present invention.
Fig. 6 is a probability map of network congestion and communication outage events provided by the present invention.
Fig. 7 is a graph of the influence of network load on the probability of network congestion provided by the present invention.
Fig. 8 is a graph of the impact of network load on network throughput provided by the present invention.
Detailed Description
As shown in fig. 1 to 4, a multi-agent QMIX algorithm-based unmanned ship ad hoc network routing method comprises the following steps:
step 1, modeling a dynamic communication network of an unmanned ship into an ad hoc network, and setting a moving area and a moving mode of the unmanned ship;
step 2, establishing a communication model of the unmanned ship ad hoc network, and calculating the signal-to-noise ratio, the transmission rate and the transmission time of data packets of communication between unmanned ships;
step 3, describing a target of the unmanned ship ad hoc network route optimization problem, and giving constraint conditions of unmanned ship movement;
step 4, modeling the unmanned ship ad hoc network routing problem as a reinforcement learning problem, and designing the elements of the multi-agent QMIX algorithm;
step 5, dividing each unmanned ship agent into two sub-agents, which are respectively responsible for unmanned ship trajectory design and selection of the next-hop unmanned ship;
step 6, exchanging information among the unmanned ship agents, each unmanned ship updating its value function using the state-action value of its next-hop unmanned ship.
Example 1:
The moving area of the unmanned ships is a square with side length L = 2 km, the number of unmanned ships is 20, the communication range of each unmanned ship is $d_c$, the moving speed of the unmanned ships is V = 10 m/s, the length of a training round is T = 1000, and the size of the generated data packets follows a Poisson distribution with λ = 25. The unmanned ship communication parameters are shown in Table 1.
Table 1 unmanned ship communication parameters
TABLE 2 MLP Structure of sub-agent
In the proposed multi-agent QMIX algorithm, each sub-agent has a deep neural network with two convolutional layers for extracting the distribution features of the unmanned ships. Each sub-agent uses a multi-layer perceptron to output state-action values; its structure is shown in Table 2. The mixing network also has a convolutional layer for extracting the distribution features of the unmanned ships; it takes the state-action values of the two sub-agents as input and outputs the joint state-action value through a multi-layer perceptron. The weights of the mixing network are generated by a hypernetwork, which takes the observations as input. The random exploration probability is set to ε = 0.1 during training. In each training step the agent samples randomly from the experience replay pool, whose capacity is 16000 in the experiment. The reward parameter settings used in the experiment are shown in Table 3.
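The two training settings quoted above, an exploration probability of 0.1 and a replay pool of capacity 16000, can be illustrated with a minimal Python sketch. The class `ReplayBuffer`, the function `epsilon_greedy` and the transition tuple layout are hypothetical; the neural networks themselves are outside the scope of this sketch.

```python
import random
from collections import deque
from typing import Any, List, Sequence


class ReplayBuffer:
    """Fixed-capacity experience pool; the oldest transitions are dropped first."""

    def __init__(self, capacity: int = 16000) -> None:
        self._data: deque = deque(maxlen=capacity)

    def push(self, transition: Any) -> None:
        self._data.append(transition)

    def sample(self, batch_size: int) -> List[Any]:
        """Uniform random sample without replacement."""
        k = min(batch_size, len(self._data))
        return random.sample(list(self._data), k)

    def __len__(self) -> int:
        return len(self._data)


def epsilon_greedy(q_values: Sequence[float], epsilon: float = 0.1) -> int:
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


# Example: store a few (observation, action, reward, next_observation) tuples.
pool = ReplayBuffer(capacity=3)
for step in range(5):
    pool.push((step, epsilon_greedy([0.0, 1.0, 0.5]), -0.1, step + 1))
print(len(pool))  # 3
```

Sampling uniformly from the pool rather than training on consecutive transitions is the usual way to reduce the correlation between training samples.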
TABLE 3 unmanned boat rewarding parameters
The proposed multi-agent QMIX algorithm is compared with the DQN-routing algorithm and the AODV algorithm. The positions of all unmanned ships are fixed in the AODV algorithm. As shown in FIG. 5, the effectiveness of the proposed algorithm is verified experimentally using the cumulative reward of all agents in each round, which trends upward as the number of training rounds increases. With its two sub-agents, the multi-agent QMIX algorithm achieves the best performance at the end of training, outperforming both the DQN-routing algorithm and the AODV algorithm. At the beginning of training, the cumulative reward of the QMIX algorithm is lower than that of the DQN-routing algorithm and the AODV algorithm. This is because, early in training, the unmanned ships move almost randomly along the trajectories designed by the sub-agents, which makes the communication links between unmanned ships unstable. The selected next-hop unmanned ship is often outside the communication range, so data packets are lost and the agents incur more penalties. As training proceeds, the unmanned ship agents gradually learn to select a suitable next-hop unmanned ship within the communication range, and the rewards obtained gradually increase. The DQN-routing algorithm converges faster than the QMIX algorithm, but its rewards fluctuate more after convergence. This is because the DQN-routing algorithm does not plan the unmanned ship trajectories and trains only the agent that selects the next-hop unmanned ship, so it finds a better data packet transmission path only some of the time as the network topology changes. Because the network topology under the AODV algorithm is static and the next hop is selected by a shortest-path algorithm, the rewards it obtains are stable.
As shown in FIG. 6, the proposed method also prevents network congestion and communication disruption events more effectively. The probability of communication disruption events drops sharply in the first 1000 rounds, indicating that the agents have learned to select a next-hop unmanned ship within communicable range. The network congestion probability rises at the beginning of training and falls after about 1000 rounds. This is because network congestion can only be counted when a data packet is actually transmitted to an unmanned ship within communicable range. At the beginning of training, most unmanned ship agents cannot correctly select a next-hop unmanned ship within communication range, so the measured probability of network congestion is small. As the number of training rounds increases, more data packets are transmitted to unmanned ships within communication range, and the congestion probability begins to rise. The unmanned ship agents then gradually learn from the experience replay pool how to avoid network congestion, so the congestion probability gradually drops to zero later in training.
FIG. 7 shows the effect of network load on the network congestion probability. Each time step is set to 10 ms in the experiment; the shorter the interval between data packet generations, the more data packets are generated in the unmanned ship ad hoc network. As can be seen from FIG. 7, the congestion probability of the network tends to decrease as the number of data packets in the network decreases. The network congestion probability of the multi-agent QMIX algorithm is lower than that of the DQN-routing algorithm and the AODV algorithm. This is because the AODV algorithm selects the next hop with a shortest-path algorithm and the DQN-routing algorithm does not take the current unmanned ship positions into account, whereas the multi-agent QMIX algorithm learns from past experience and, through unmanned ship trajectory design, selects a suitable transmission path that avoids network congestion.
FIG. 8 shows the effect of network load on network throughput. Of the three algorithms, the multi-agent QMIX algorithm has the highest throughput, and its network throughput is improved by 86.67% at the point where the system throughput is lowest. The throughput of the system first increases and then decreases as the number of data packets grows: when there are few data packets in the network the load is light, and additional data packets raise the throughput; once the throughput reaches the network capacity, however, additional data packets raise the network congestion probability, more data packets are discarded, and the throughput falls.

Claims (10)

1. An unmanned ship ad hoc network routing method based on a multi-agent QMIX algorithm, characterized by comprising the following steps:
step 1, modeling the dynamic communication network of the unmanned ships as an ad hoc network, and setting the moving area and the moving mode of the unmanned ships;
step 2, establishing a communication model of the unmanned ship ad hoc network, and calculating the signal-to-noise ratio, the transmission rate, and the data packet transmission time of communication between unmanned ships;
step 3, describing the objective of the unmanned ship ad hoc network route optimization problem, and giving the constraint conditions of unmanned ship movement;
step 4, modeling the unmanned ship ad hoc network routing problem as a reinforcement learning problem, and designing the elements of the multi-agent QMIX algorithm;
step 5, dividing each unmanned ship agent into two sub-agents, which are respectively responsible for unmanned ship trajectory design and selection of the next-hop unmanned ship;
step 6, exchanging information among the unmanned ship agents, each unmanned ship updating its value function by using the state-action value of the next-hop unmanned ship.
2. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm of claim 1, wherein in step 1, the unmanned ship dynamic communication network is modeled as an ad hoc network in which each unmanned ship is regarded as a node of the network; the unmanned ship ad hoc network consists of N unmanned ships, whose indices are expressed as the set $\{1, 2, \ldots, N\}$.
3. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm of claim 1, wherein in step 1, the moving area of the unmanned ships in the system is described as follows: the moving area of the unmanned ships is a square area with side length $L$, and a shore-based base station is located at the edge of the area; unmanned ship $N$ is stationed near the base station and keeps a communication connection with it, and it is also the destination of the data packets in the unmanned ship ad hoc network; the unmanned ship ad hoc network consists of N unmanned ships, whose indices are expressed as the set $\{1, 2, \ldots, N\}$.
4. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm of claim 1, wherein in step 1, the movement of the unmanned ships in the system is described as follows: each unmanned ship moves within the area at a fixed speed $V$, and the position of unmanned ship $n$ at time $t$ is $p_n(t)$, $0 \le t \le T$, where $T$ represents the total running time of the unmanned ship ad hoc network; the system is assumed to run in a time-slotted mode, namely [equation missing in source], each time slot being $\delta_t$, and the moving direction of an unmanned ship remains unchanged within each time slot; the movement of the unmanned ship is expressed as: [equation missing in source], where $e_n(t)$ represents the moving direction of unmanned ship $n$ at time $t$, so that the position and the moving direction of an unmanned ship in the system are linearly related; the unmanned ship ad hoc network consists of N unmanned ships, whose indices are expressed as the set $\{1, 2, \ldots, N\}$.
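The movement model of claim 4 can be illustrated with a minimal Python sketch. Because the movement equation itself is not reproduced in this text, the sketch assumes the linear update $p_n(t + \delta_t) = p_n(t) + V \delta_t e_n(t)$ on 2-D coordinates; the names `DIRECTIONS`, `step_position`, and `in_area` are hypothetical and do not come from the patent.

```python
# Hypothetical encoding of the five moving directions (left, right,
# forward, backward, stationary) as unit vectors; this is an assumption.
DIRECTIONS = [(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)]


def step_position(position, direction, speed, slot_length):
    """Advance a 2-D position by one time slot at a fixed speed.

    Assumed update rule: p(t + dt) = p(t) + V * dt * e(t).
    """
    x, y = position
    ex, ey = direction
    return (x + speed * slot_length * ex, y + speed * slot_length * ey)


def in_area(position, lower, upper):
    """Check the box constraint lower <= p <= upper on both coordinates."""
    return all(lower <= coordinate <= upper for coordinate in position)
```

For example, a ship at the origin moving right at speed 5 for a slot of length 2 ends up at (10.0, 0.0), which `in_area` accepts for a square with bounds 0 and 10.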
5. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm of claim 1, wherein in step 2, establishing the communication model of the unmanned ship ad hoc network and calculating the signal-to-noise ratio, the transmission rate, and the data packet transmission time of communication between unmanned ships comprises the following steps:
step 21, the communication model of the unmanned ship ad hoc network is as follows:
because the distance between the source unmanned ship and the destination unmanned ship $N$ exceeds the communicable distance, each data packet can reach the destination unmanned ship $N$ only through multiple hops over the unmanned ships of the ad hoc network; the transmission path $\omega$ between the source unmanned ship and the destination unmanned ship $N$ in the system is expressed as: [equation missing in source], where $M_\sigma$ indicates the number of unmanned ships on the data packet path and $n_m$ represents the $m$-th unmanned ship on the path $\omega$; the unmanned ship ad hoc network consists of N unmanned ships, whose indices are expressed as the set $\{1, 2, \ldots, N\}$;
the wireless channel transmission loss model of the system adopts the Longley-Rice model, which distinguishes three cases: line-of-sight transmission loss, diffraction transmission loss, and scattering transmission loss; based on the Longley-Rice model, the transmission loss $L_{n,n'}(t)$ between unmanned ship $n$ and unmanned ship $n'$ is expressed as $L_{n,n'}(t) = L_{free} + L_{ref}(d)$, where $L_{free}$ and $L_{ref}(d)$ are expressed as:
$L_{free} = 32.45 + 20 \lg d + 20 \lg f$,
[equation for $L_{ref}(d)$ missing in source]
where $L_{free}$ represents the free-space path transmission loss, $d$ represents the transmission distance, $f$ represents the radio frequency, $k_1$ and $k_2$ represent transmission loss coefficients, and $m_d$ and $m_s$ represent the loss coefficients of diffraction and scattering, respectively; $L_{be}$, $L_{bed}$, and $L_{bes}$ represent the line-of-sight, diffraction, and scattering transmission losses in free space, respectively; if $d_{min} \le d \le d_{Ls}$, $d$ is a line-of-sight transmission distance; if $d_{Ls} \le d \le d_x$, $d$ is a diffraction transmission distance; if $d \ge d_x$, $d$ is a scattering transmission distance; here $d_{Ls}$ represents the smooth-earth distance, and $d_x$ represents the distance at which the diffraction loss and the scattering loss are equal;
step 22, because the unmanned ships in the system operate in the same frequency band, communication interference arises among them; the signal-to-noise ratio of the signal received by unmanned ship $n_{m+1}$ from unmanned ship $n_m$ is expressed as: [equation missing in source], where $P$ represents the transmit power of unmanned ship $n_m$, $B$ represents the communication bandwidth between unmanned ships, and $\xi_0$ represents the noise power spectral density;
step 23, the transmission rate between unmanned ship $n_m$ and unmanned ship $n_{m+1}$ at time $t$ is: [equation missing in source];
step 24, each unmanned ship is provided with a buffer queue for storing received data packets; data packets enter and leave the queue on a first-in, first-out basis, and in each time slot an unmanned ship can forward the data packet at the head of its buffer queue; the size of the data packet sent from unmanned ship $n_m$ to unmanned ship $n_{m+1}$ is $\sigma$; if the remaining buffer of unmanned ship $n_{m+1}$ satisfies [condition missing in source], then [equation missing in source]; if [condition missing in source], network congestion is caused; if the communication link of the unmanned ship ad hoc network is normal, the transmission time of a data packet between unmanned ships is denoted as [equation missing in source]; from this, the total transmission time $t_{total}$ of the data packet on the path $\omega$ is derived as [equation missing in source].
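The per-hop quantities of claim 5 can be sketched in Python under stated assumptions. Only the free-space loss formula is taken from the text; the constant 32.45 corresponds to the usual convention of distance in kilometres and frequency in megahertz. The received-power ratio, the Shannon-capacity rate, the "packet size divided by rate" transmission time, and the buffer test are assumptions standing in for equations that are not reproduced here, and all function names are hypothetical.

```python
import math


def free_space_loss_db(distance_km, frequency_mhz):
    """Free-space path loss: 32.45 + 20 lg d + 20 lg f (d in km, f in MHz)."""
    return 32.45 + 20 * math.log10(distance_km) + 20 * math.log10(frequency_mhz)


def received_ratio(tx_power_w, loss_db, interference_w, bandwidth_hz, noise_psd):
    """Assumed ratio: received power over (interference + B * noise PSD)."""
    received_w = tx_power_w * 10 ** (-loss_db / 10)
    return received_w / (interference_w + bandwidth_hz * noise_psd)


def rate_bps(bandwidth_hz, ratio):
    """Assumed Shannon-capacity rate B * log2(1 + ratio)."""
    return bandwidth_hz * math.log2(1 + ratio)


def hop_time_s(packet_bits, rate):
    """Assumed per-hop transmission time: packet size divided by rate."""
    return packet_bits / rate


def causes_congestion(remaining_buffer_bits, packet_bits):
    """Assumed congestion test: the packet does not fit the remaining buffer."""
    return packet_bits > remaining_buffer_bits


def total_time_s(hop_times):
    """Total transmission time on a path as the sum of its hop times."""
    return sum(hop_times)
```

The diffraction and scattering terms of the Longley-Rice model are deliberately left out, since their coefficients are not given in this text.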
6. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm of claim 1, wherein in step 3, the objective of the unmanned ship ad hoc network route optimization problem and the constraints of unmanned ship movement are described as follows: by designing suitable unmanned ship movement trajectories and a packet transmission path $\omega$, the transmission time of each data packet and the congestion probability of the network are reduced, so the route optimization problem is expressed as [equation missing in source], subject to the following conditions: [constraints missing in source]
where [symbol missing in source] indicates the transmission time of data packets between unmanned ships, [symbol missing in source] represents the initial position of unmanned ship $n$, $M_\sigma$ represents the number of unmanned ships on the data packet path, and $\phi_l$ and $\phi_u$ represent the lower and upper boundaries of the unmanned ship moving area, respectively; [constraint missing in source] represents the speed constraint of the unmanned ship; [constraint missing in source] indicates whether the next-hop unmanned ship is within the communication range of the current unmanned ship, and if this constraint is not satisfied, a service interruption event occurs; [constraint missing in source] indicates that the unmanned ship has enough remaining buffer to receive the incoming data packet; the size of the data packet sent from unmanned ship $n_m$ to unmanned ship $n_{m+1}$ is $\sigma$; $\phi_l \le p_n(t) \le \phi_u$ indicates that the unmanned ship is restricted to move within the set area; the unmanned ship ad hoc network consists of N unmanned ships, whose indices are expressed as the set $\{1, 2, \ldots, N\}$.
7. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm of claim 1, wherein in step 4, modeling the unmanned ship ad hoc network routing problem as a reinforcement learning problem and designing the elements of the multi-agent QMIX algorithm specifically comprises the steps of:
step 41, the observation of an agent is described as follows: the observation $o_n(t)$ of unmanned ship $n$ consists of the following parts: the position of the unmanned ship, the remaining buffer of the unmanned ship, and the size $\sigma$ of the currently transmitted data packet; the unmanned ship ad hoc network consists of N unmanned ships, whose indices are expressed as the set $\{1, 2, \ldots, N\}$;
step 42, the actions of an agent are described as follows: the action $a_n(t)$ of each unmanned ship consists of two parts [equation missing in source], one representing the trajectory design of the unmanned ship and the other representing its next-hop selection; the moving direction in the system consists of five choices, namely left, right, forward, backward, and stationary: $\{(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)\}$; the unmanned ship reduces the packet transmission time by selecting an appropriate next hop;
step 43, the reward of an agent is described as follows: the reward of each unmanned ship agent consists of the following parts: the transmission time of the data packet from the current unmanned ship to the next-hop unmanned ship, where the goal is to minimize the transmission time, so this part of the reward is defined as the negative of the transmission time; if the unmanned ship moves outside the defined area, it is given a penalty value, which restricts the unmanned ship to moving within the networking area; if transmitting the data packet from the current unmanned ship to the next-hop unmanned ship causes congestion at the next-hop unmanned ship, a penalty value is given, which reduces the network congestion probability; if the next-hop unmanned ship selected by the current unmanned ship is beyond the communication range, a penalty value is given, which ensures that the communication links between unmanned ships are reliable; the total reward of unmanned ship $n$ is [equation missing in source], where $k_t$ is a weight with a positive value, and three indicator terms [symbols missing in source] indicate, respectively, whether unmanned ship $n$ has moved outside the defined area, whether the selected next-hop unmanned ship is congested, and whether the selected next-hop unmanned ship is within communication range.
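The reward structure of step 43 can be illustrated with a short Python sketch. The total-reward equation is not reproduced in this text, so the sketch assumes a simple additive form: the negative transmission time scaled by the weight $k_t$, minus one penalty per violated condition. The function name, the default penalty magnitudes, and the additive combination are all assumptions.

```python
def agent_reward(tx_time, out_of_area, congested, out_of_range,
                 k_t=1.0, area_penalty=10.0, congestion_penalty=10.0,
                 range_penalty=10.0):
    """Assumed additive reward for one unmanned ship agent.

    tx_time      -- transmission time to the next-hop ship
    out_of_area  -- True if the ship moved outside the defined area
    congested    -- True if the next-hop ship's buffer was congested
    out_of_range -- True if the next-hop ship was beyond communication range
    The penalty magnitudes are placeholders, not values from the patent.
    """
    reward = -k_t * tx_time  # shorter transmission time gives higher reward
    if out_of_area:
        reward -= area_penalty
    if congested:
        reward -= congestion_penalty
    if out_of_range:
        reward -= range_penalty
    return reward
```

With no violations the reward reduces to the negative weighted transmission time, which matches the statement that minimizing transmission time is the primary goal.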
8. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm of claim 1, wherein in step 5, each unmanned ship agent is divided into two sub-agents, which are respectively responsible for unmanned ship trajectory design and selection of the next-hop unmanned ship, specifically: to solve the problem of the large action space caused by the two types of decisions, each unmanned ship agent uses the multi-agent QMIX algorithm; in the system, each unmanned ship agent is divided into two sub-agents responsible for trajectory design and next-hop selection respectively, so that one larger action space is split into two smaller ones and the size of the action space is reduced from $5 \times (N-1)$ to $5 + (N-1)$, which effectively reduces the training complexity; the unmanned ship ad hoc network consists of N unmanned ships, whose indices are expressed as the set $\{1, 2, \ldots, N\}$; the greedy strategy of unmanned ship $n$ is expressed as: [equation missing in source], where [symbols missing in source] represent the state-action value functions of the sub-agents responsible for trajectory design and next-hop selection, respectively; the joint strategy of the unmanned ship agent is expressed as [equation missing in source].
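The action-space reduction in claim 8 can be checked with a small Python sketch. It compares the joint action count $5 \times (N-1)$ with the factored count $5 + (N-1)$ and shows a greedy choice in which each sub-agent independently takes the argmax of its own value list. Representing the value functions as plain lists and the helper names are assumptions made for illustration.

```python
def joint_action_count(num_ships):
    """Joint choices: 5 moving directions times (N - 1) next-hop candidates."""
    return 5 * (num_ships - 1)


def factored_action_count(num_ships):
    """Outputs needed when the two decisions are handled by two sub-agents."""
    return 5 + (num_ships - 1)


def argmax(values):
    """Index of the largest value (first index on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def greedy_joint_action(q_direction, q_next_hop):
    """Each sub-agent greedily picks its own action; the pair is the joint action."""
    return argmax(q_direction), argmax(q_next_hop)
```

For N = 10 the count drops from 45 to 14. In QMIX, the monotonic mixing of per-agent values is what makes such independent argmax choices consistent with the argmax of the mixed value.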
9. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm of claim 1, wherein in step 6, the information exchange among the unmanned ship agents, in which each unmanned ship updates its own value function by using the state-action value of the next-hop unmanned ship, is specifically: although each unmanned ship is divided into two sub-agents, the mixing network can still output a single action-value function; therefore, each unmanned ship is regarded as one agent in the training process between unmanned ships, and the algorithm calculates the training target by using the action-value function of the next-hop unmanned ship, which is obtained through information exchange between the unmanned ships.
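How a mixing network can combine the two sub-agent values into one action value may be sketched as follows. This is a deliberately simplified, single-layer stand-in: in QMIX as published, the mixing weights are produced by hypernetworks conditioned on the global state and the mixer has more than one layer. The sketch keeps only the key property, non-negative weights, which makes the mixed value monotonically non-decreasing in each sub-agent value. The function name and the fixed weights are assumptions.

```python
def monotonic_mix(sub_agent_q_values, raw_weights, bias=0.0):
    """Combine sub-agent values with non-negative weights.

    Taking abs() of the raw weights enforces monotonicity: raising any
    sub-agent value can never lower the mixed value.
    """
    return sum(abs(w) * q for w, q in zip(raw_weights, sub_agent_q_values)) + bias
```

For instance, `monotonic_mix([1.0, 2.0], [-0.5, 2.0])` weights the two values by 0.5 and 2.0 even though the first raw weight is negative.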
10. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm of claim 9, wherein the training target is defined as: [equation missing in source]
calculating the training target using state-action values from other agents can break the correlation between the target network and the training network; after sufficient training, when few penalties are incurred, the state-action value of each unmanned ship agent can be regarded as the negative of the transmission time of the current data packet.
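The idea of building the training target from the next-hop ship's value can be sketched in Python. The target equation is not reproduced in this text, so the sketch assumes a bootstrapped target in the style of Q-routing: the local reward plus the discounted best action value reported by the next-hop unmanned ship. The function name, the discount factor `gamma`, and the `delivered` flag are assumptions.

```python
def training_target(reward, next_hop_q_values, gamma=0.99, delivered=False):
    """Assumed target: y = r + gamma * max over the next hop's action values.

    next_hop_q_values -- action values received from the next-hop ship
    delivered         -- True if the packet reached the destination ship,
                         in which case no further value is bootstrapped
    """
    if delivered or not next_hop_q_values:
        return reward
    return reward + gamma * max(next_hop_q_values)
```

With `gamma=1.0` and rewards equal to negative per-hop transmission times, the target accumulates the negative remaining transmission time, which is in line with the interpretation of the state-action value given in claim 10.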
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310650889.7A 2023-06-02 2023-06-02 Unmanned ship self-organizing network routing method based on multi-agent QMIX algorithm

Publications (2)

Publication Number Publication Date
CN116600265A CN116600265A (en) 2023-08-15
CN116600265B true CN116600265B (en) 2024-04-05

Family

ID=87608117

Country Status (1)

Country Link
CN (1) CN116600265B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113741449A (en) * 2021-08-30 2021-12-03 南京信息工程大学 Multi-agent control method for air-sea cooperative observation task
CN114499648A (en) * 2022-03-10 2022-05-13 南京理工大学 Unmanned aerial vehicle cluster network intelligent multi-hop routing method based on multi-agent cooperation
CN116127848A (en) * 2023-02-27 2023-05-16 东南大学 Multi-unmanned aerial vehicle collaborative tracking method based on deep reinforcement learning
WO2023095151A1 (en) * 2021-11-26 2023-06-01 Telefonaktiebolaget Lm Ericsson (Publ) Improving collective performance of multi-agents

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
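Research on delay-tolerant network routing technology based on multi-agent reinforcement learning; Han Chenchen (韩晨晨); Master's thesis, Beijing University of Posts and Telecommunications; full text *
Multi-agent reinforcement learning QMIX algorithm introducing communication and exploration; Deng Huiyi (邓晖奕) et al.; Journal of Computer Applications (No. 1); full text *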

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant