CN116600265B

CN116600265B - Unmanned ship self-organizing network routing method based on multi-agent QMIX algorithm

Info

Publication number: CN116600265B
Application number: CN202310650889.7A
Authority: CN
Inventors: 温广辉; 周艳; 郑治; 罗中婧; 邵佳伟
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2023-06-02
Filing date: 2023-06-02
Publication date: 2024-04-05
Anticipated expiration: 2043-06-02
Also published as: CN116600265A

Abstract

The invention discloses an unmanned ship self-organizing network routing method based on the multi-agent QMIX algorithm, comprising the following steps: modeling the dynamic communication network of unmanned ships as an ad hoc network and setting the moving area and movement mode of the unmanned ships; establishing a communication model of the unmanned ship ad hoc network and calculating the related parameters between unmanned ships; describing the objective of the routing optimization problem of the unmanned ship ad hoc network and giving the constraints on unmanned ship movement; modeling the routing problem as a reinforcement learning problem and designing the elements of the multi-agent QMIX algorithm; dividing each unmanned ship agent into two sub-agents, responsible respectively for trajectory design and for selecting the next-hop unmanned ship; and exchanging information among the unmanned ship agents, with each unmanned ship updating its own value function using the state-action value of its next-hop unmanned ship. The proposed multi-agent QMIX algorithm yields an intelligent real-time routing strategy while keeping the communication links between unmanned ships reliable.

Description

Unmanned ship self-organizing network routing method based on multi-agent QMIX algorithm

Technical Field

The invention relates to the technical field of network routing, in particular to an unmanned ship self-organizing network routing method based on a multi-agent QMIX algorithm.

Background

The unmanned ship self-organizing network is a decentralized, autonomous wireless self-organizing network composed of multiple unmanned ships, with properties such as self-organization, self-repair and self-adaptation. Compared with traditional wired networks and sensor networks, an unmanned ship ad hoc network offers more flexibility, real-time capability and robustness, and can be widely applied in marine survey, environmental monitoring, maritime rescue and similar fields. The routing algorithm is a critical part of such a network: it determines how data packets are forwarded between unmanned ships so that information reaches its destination in time, and it directly affects network performance and efficiency. Traditional routing algorithms face many challenges in unmanned ship ad hoc networks, such as the high moving speed of the ships, a complex and changeable environment, and a dynamically changing network topology. A routing method suited to unmanned ship ad hoc networks is therefore needed to reduce packet transmission time and network congestion probability.

Among existing routing methods based on deep reinforcement learning, the SDN-routing algorithm proposed in [Stampa G, Arias M, Sánchez-Charles D, et al. A deep-reinforcement learning approach for software-defined networking routing optimization. arXiv preprint arXiv:1709.07080, 2017] was the first to use single-agent deep reinforcement learning for traffic-engineering route optimization. It takes the traffic demand between the source node and the destination node as the state and lets a network controller determine the transmission path of each data packet, effectively reducing network delay. As a single-agent method, however, it cannot handle the multi-task decision problem of an unmanned ship ad hoc network. The work [Mukhutdinov D, Filchenkov A, Shalyto A, et al. Multi-agent deep learning for simultaneous optimization for time and energy in distributed routing system. Future Generation Computer Systems, 2019, 94: 587-600] combines Q-learning with DQN and was the first to apply multi-agent deep reinforcement learning to routing optimization. In the centralized training phase, each router is treated as an agent whose parameters are shared with the other agents and updated simultaneously; in the execution phase, each agent independently decides how to forward data packets. The algorithm jointly optimizes packet transmission time and energy consumption.
The work [You X, Li X, Xu Y, et al. Toward packet routing with fully distributed multiagent deep reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 52(2): 855-868] proposes DQRC, a routing method based on multi-agent reinforcement learning in which each router has a long short-term memory (LSTM) recurrent neural network for training and decision-making in a distributed environment. The network extracts routing features from backlogged data packets and past actions and effectively approximates the Q-learning value function. The algorithm balances congestion awareness against shortest paths and significantly reduces packet transmission time. However, most existing routing methods based on multi-agent deep reinforcement learning assume a static network model and are not suitable for routing decisions in dynamic networks.

Disclosure of Invention

The technical problem to be solved by the invention is to provide an unmanned ship self-organizing network routing method based on the multi-agent QMIX algorithm that reduces packet transmission time and network congestion probability by designing the movement trajectories of the unmanned ships and selecting a suitable next hop for each data packet, while maintaining reliable communication links between unmanned ships, thereby providing technical support for the efficient and stable operation of unmanned ship self-organizing networks.

In order to solve the technical problems, the invention provides an unmanned ship self-organizing network routing method based on a multi-agent QMIX algorithm, which comprises the following steps:

step 1, modeling the dynamic communication network of unmanned ships as an ad hoc network, and setting the moving area and movement mode of the unmanned ships;

step 2, establishing a communication model of the unmanned ship ad hoc network, and calculating the signal-to-noise ratio, the transmission rate and the packet transmission time of communication between unmanned ships;

step 3, describing the objective of the unmanned ship ad hoc network route optimization problem, and giving the constraints on unmanned ship movement;

step 4, modeling the unmanned ship self-organizing network routing problem as a reinforcement learning problem, and designing the elements of the multi-agent QMIX algorithm;

step 5, dividing each unmanned ship agent into two sub-agents, responsible respectively for the trajectory design of the unmanned ship and for selecting the next-hop unmanned ship;

step 6, exchanging information among the unmanned ship agents, with each unmanned ship updating its value function using the state-action value of the next-hop unmanned ship.

Preferably, in step 1, the unmanned ship dynamic communication network is modeled as an ad hoc network in which each unmanned ship is regarded as a node; the unmanned ship ad hoc network consists of N unmanned ships, whose indexes are expressed as the set $\mathcal{N} = \{1, 2, \dots, N\}$.

Preferably, in step 1, the moving area of the unmanned boats in the system is described as follows: the moving area is a square with side length L, and the shore base is located at the edge of the area. Unmanned boat N stays near the base station, maintains a communication connection with it, and is also the destination of the data packets in the unmanned ship ad hoc network.

Preferably, in step 1, the movement mode of the unmanned boats in the system is described as follows: each unmanned boat moves within the area at a fixed speed V, and the position of unmanned boat n at time t is $p_n(t)$, $0 \le t \le T$, where T is the total running time of the unmanned ship ad hoc network. The system is assumed to operate in time slots of length $\delta_t$, and the moving direction of each unmanned boat remains unchanged within a time slot, so its movement is expressed as $p_n(t+\delta_t) = p_n(t) + V\delta_t\, e_n(t)$, where $e_n(t)$ is the moving direction of unmanned boat n at time t; the position of an unmanned boat is thus linear in its moving direction.
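The slot-based motion model can be sketched in Python as below. This is a minimal illustration, assuming positions in metres, the five headings listed in step 42, and the linear update $p + V\delta_t e$ implied by the description; the function names are hypothetical and the slot length in the usage line is arbitrary.

```python
import numpy as np

# Five headings from step 42: left, right, front, rear, stay (assumed unit vectors).
DIRECTIONS = np.array([(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)], dtype=float)

def step_position(p, direction_idx, speed, slot):
    """Advance one boat by one slot: p(t + slot) = p(t) + speed * slot * e(t)."""
    e = DIRECTIONS[direction_idx]
    return np.asarray(p, dtype=float) + speed * slot * e

def inside_area(p, lower, upper):
    """True if every coordinate satisfies lower <= p <= upper (phi_l <= p <= phi_u)."""
    p = np.asarray(p, dtype=float)
    return bool(np.all((p >= lower) & (p <= upper)))

# Hypothetical example: V = 10 m/s, 1 s slot, heading "front" (index 2), 2 km area.
p = step_position([100.0, 200.0], direction_idx=2, speed=10.0, slot=1.0)
print(p, inside_area(p, 0.0, 2000.0))  # [100. 210.] True
```

Keeping the heading fixed within a slot is what makes the update a single multiply-add per boat.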

Preferably, in step 2, a communication model of the unmanned ship ad hoc network is established, and the calculation of the signal-to-noise ratio, the transmission rate and the transmission time of the data packet of the communication between the unmanned ships specifically comprises the following steps:

step 21, a communication model of the unmanned ship ad hoc network is as follows:

because the distance between the source unmanned boat and the destination unmanned boat N exceeds the communicable distance, each data packet can reach N only through multiple hops of the unmanned ship ad hoc network. The transmission path between the source unmanned boat and the destination unmanned boat N is denoted $\omega = \{n_1, n_2, \dots, n_{M_\sigma}\}$, where $M_\sigma$ is the number of unmanned boats on the packet's path and $n_m$ is the m-th unmanned boat on path ω, with $n_1$ the source and $n_{M_\sigma} = N$;

the wireless channel transmission loss of the system follows the Longley-Rice model, which distinguishes three cases: line-of-sight transmission loss, diffraction transmission loss and scattering transmission loss. Based on the Longley-Rice model, the transmission loss between unmanned boats n and n′ is $L_{n,n'}(t) = L_{free} + L_{ref}(d)$, where

$L_{free} = 32.45 + 20\lg d + 20\lg f$,

and $L_{ref}(d)$ is a piecewise function of d. Here $L_{free}$ is the free-space path loss, d is the transmission distance (in km), f is the radio frequency (in MHz), $k_1$ and $k_2$ are the line-of-sight loss coefficients, $m_d$ and $m_s$ are the diffraction and scattering loss coefficients, and $L_{be}$, $L_{bed}$ and $L_{bes}$ are the base transmission losses of the line-of-sight, diffraction and scattering regions. If $d_{min} \le d \le d_{Ls}$, d is a line-of-sight transmission distance; if $d_{Ls} \le d \le d_x$, d is a diffraction transmission distance; if $d \ge d_x$, d is a scattering transmission distance. Here $d_{Ls}$ is the smooth-earth horizon distance and $d_x$ is the distance at which the diffraction loss equals the scattering loss;
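The free-space term can be computed directly (the 32.45 dB constant corresponds to d in km and f in MHz). The patent's expression for $L_{ref}(d)$ did not survive extraction, so the piecewise function below is an assumption modelled on the three-region reference attenuation of the Longley-Rice (ITM) area mode, reusing the coefficient names from the text; all coefficient values are hypothetical.

```python
import math

def free_space_loss_db(d_km, f_mhz):
    """L_free = 32.45 + 20 lg d + 20 lg f, with d in km and f in MHz."""
    return 32.45 + 20 * math.log10(d_km) + 20 * math.log10(f_mhz)

def reference_loss_db(d, p):
    """Assumed piecewise L_ref(d); `p` holds the coefficients named in the text.

    line of sight  d_min <= d <= d_Ls : L_be  + k1*d + k2*ln(d / d_Ls)
    diffraction    d_Ls  <  d <= d_x  : L_bed + m_d*d
    scattering     d     >  d_x       : L_bes + m_s*d
    """
    if d < p["d_min"]:
        raise ValueError("distance below the model's validity range")
    if d <= p["d_Ls"]:
        return p["L_be"] + p["k1"] * d + p["k2"] * math.log(d / p["d_Ls"])
    if d <= p["d_x"]:
        return p["L_bed"] + p["m_d"] * d
    return p["L_bes"] + p["m_s"] * d

def link_loss_db(d_km, f_mhz, p):
    """L_{n,n'} = L_free + L_ref(d)."""
    return free_space_loss_db(d_km, f_mhz) + reference_loss_db(d_km, p)

print(round(free_space_loss_db(1.0, 100.0), 2))  # 72.45
```

The region boundaries $d_{Ls}$ and $d_x$ select which branch applies, so the loss grows piecewise-linearly with distance beyond the horizon.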

step 22, because the unmanned boats in the system operate in the same frequency band, they interfere with one another. The signal-to-noise ratio $\gamma_{n_m,n_{m+1}}(t)$ received by unmanned boat $n_{m+1}$ from unmanned boat $n_m$ therefore depends on the transmit power P of $n_m$, the transmission loss between the two boats, the co-channel interference from other unmanned boats, and the noise power $B\zeta_0$, where B is the communication bandwidth between unmanned boats and $\zeta_0$ is the noise power spectral density;

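step 23, the packet transmission rate between unmanned boats $n_m$ and $n_{m+1}$ at time t is $R_{n_m,n_{m+1}}(t) = B\log_2\left(1 + \gamma_{n_m,n_{m+1}}(t)\right)$, where $\gamma_{n_m,n_{m+1}}(t)$ is the signal-to-noise ratio received by $n_{m+1}$ from $n_m$ in step 22;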
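Steps 22 and 23 can be sketched as follows. The exact SINR expression was lost in extraction, so the code assumes a common form: received power $P\cdot10^{-L/10}$ divided by the sum of co-channel interference powers plus noise $B\zeta_0$, followed by a Shannon rate $B\log_2(1+\gamma)$. Both formulas and all function names are assumptions for illustration.

```python
import math

def db_to_gain(loss_db):
    """Convert a transmission loss in dB into a linear power gain."""
    return 10 ** (-loss_db / 10)

def sinr(p_tx_w, loss_db, interferer_losses_db, bandwidth_hz, noise_psd_w_per_hz):
    """Assumed SINR at n_{m+1}: P*g / (sum of co-channel P*g_i + B*zeta_0)."""
    signal = p_tx_w * db_to_gain(loss_db)
    interference = sum(p_tx_w * db_to_gain(l) for l in interferer_losses_db)
    noise = bandwidth_hz * noise_psd_w_per_hz
    return signal / (interference + noise)

def rate_bps(bandwidth_hz, gamma):
    """Assumed Shannon rate R = B log2(1 + gamma)."""
    return bandwidth_hz * math.log2(1 + gamma)

# Hypothetical link: 1 W transmitter, 100 dB loss, one interferer at 110 dB.
gamma = sinr(1.0, 100.0, [110.0], 1e6, 1e-17)
print(rate_bps(1e6, gamma))
```

Adding interferers to the list lowers γ and therefore the rate, which is how co-channel interference enters the hop time.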

step 24, each unmanned boat has a buffer queue that stores received data packets; packets enter and leave the queue first-in first-out, and in each time slot an unmanned boat can forward the packet at the head of its buffer queue. Let the packet sent from $n_m$ to $n_{m+1}$ have size σ and let $b_{n_{m+1}}(t)$ be the remaining buffer of $n_{m+1}$. If $b_{n_{m+1}}(t) \ge \sigma$, the packet is stored in the buffer of $n_{m+1}$; if $b_{n_{m+1}}(t) < \sigma$, network congestion occurs. If the communication link of the unmanned ship ad hoc network is normal, the transmission time of the packet between the two unmanned boats is $t_{n_m,n_{m+1}} = \sigma / R_{n_m,n_{m+1}}(t)$, where $R_{n_m,n_{m+1}}(t)$ is the transmission rate from step 23, so the total transmission time of the packet on path ω is $t_{total} = \sum_{m=1}^{M_\sigma-1} t_{n_m,n_{m+1}}$.
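The buffer rule and the per-hop and total transmission times of step 24 can be sketched as below. This is a minimal illustration under stated assumptions: sizes and capacity share one hypothetical unit (bits), and only transmission time is summed, so queueing delay is ignored; the class and function names are hypothetical.

```python
from collections import deque

class BoatBuffer:
    """FIFO buffer of one unmanned boat (capacity and packet sizes in bits)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.used = 0

    @property
    def remaining(self):
        return self.capacity - self.used

    def receive(self, size):
        """Enqueue a packet; return False (congestion) if remaining < size."""
        if self.remaining < size:
            return False
        self.queue.append(size)
        self.used += size
        return True

    def forward_head(self):
        """Dequeue the head-of-line packet (one per time slot)."""
        size = self.queue.popleft()
        self.used -= size
        return size

def hop_time(size_bits, rate_bps):
    """t_{n_m, n_{m+1}} = sigma / R."""
    return size_bits / rate_bps

def path_time(size_bits, hop_rates_bps):
    """t_total: sum of per-hop transmission times along path omega."""
    return sum(hop_time(size_bits, r) for r in hop_rates_bps)

buf = BoatBuffer(capacity=100)
print(buf.receive(60), buf.receive(50))  # True False
```

The second `receive` fails because only 40 bits remain, which is exactly the congestion event penalised later in step 43.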

Preferably, in step 3, the objective of the unmanned ship ad hoc network route optimization problem and the constraints on unmanned boat movement are as follows: by designing suitable unmanned boat movement trajectories $\{p_n(t)\}$ and a packet transmission path ω, the transmission time of each packet and the congestion probability of the network are reduced, so the route optimization problem is expressed as $\min_{\{p_n(t)\},\,\omega} t_{total}$, where $t_{total}$ is the total transmission time of a packet on path ω, subject to the following conditions:

(1) $p_n(0)$ is the given initial position of unmanned boat n; (2) a speed constraint limits each unmanned boat to the speed V; (3) $\|p_{n_m}(t) - p_{n_{m+1}}(t)\| \le d_c$ requires the next-hop unmanned boat to be within the communication range $d_c$ of the current unmanned boat, and a service interruption event occurs if this constraint is violated; (4) $b_{n_{m+1}}(t) \ge \sigma$ requires the remaining buffer $b_{n_{m+1}}(t)$ of the next-hop unmanned boat to be large enough to receive the incoming packet of size σ; (5) $\phi_l \le p_n(t) \le \phi_u$ restricts each unmanned boat to the set area, where $\phi_l$ and $\phi_u$ are the lower and upper boundaries of the moving area.
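The hop-level constraints (3) to (5) and the speed constraint can be checked with a small helper, sketched below. The speed test assumes the constraint means "displacement per slot at most $V\delta_t$", which is an interpretation rather than the patent's lost formula; the function names and constraint labels are hypothetical.

```python
import numpy as np

def hop_violations(p_cur, p_next, d_c, remaining_next, sigma, lower, upper):
    """Return the names of the step-3 constraints violated by one hop."""
    p_cur = np.asarray(p_cur, dtype=float)
    p_next = np.asarray(p_next, dtype=float)
    violations = set()
    if np.linalg.norm(p_next - p_cur) > d_c:
        violations.add("out_of_range")    # service interruption event
    if remaining_next < sigma:
        violations.add("congestion")      # next hop cannot buffer the packet
    for p in (p_cur, p_next):
        if np.any(p < lower) or np.any(p > upper):
            violations.add("outside_area")
    return violations

def speed_ok(p_prev, p_now, speed, slot, tol=1e-9):
    """Assumed speed constraint: displacement per slot at most speed * slot."""
    step = np.linalg.norm(np.asarray(p_now, float) - np.asarray(p_prev, float))
    return bool(step <= speed * slot + tol)

print(hop_violations([0, 0], [300, 400], 400, 10, 50, 0, 2000))
```

An empty set means the hop is feasible; the same three flags reappear as indicator terms in the reward of step 43.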

Preferably, in step 4, the unmanned ship ad hoc network routing problem is modeled as a reinforcement learning problem, and the elements of the multi-agent QMIX algorithm are designed through the following steps:

step 41, the observation of an agent is described as follows: the observation $o_n(t)$ of unmanned boat n consists of the position $p_n(t)$ of the unmanned boat, its remaining buffer $b_n(t)$, and the size σ of the data packet currently being transmitted;

step 42, the action of an agent is described as follows: the action of each unmanned boat consists of two parts, $a_n(t) = \{a_n^{tr}(t), a_n^{hop}(t)\}$, where $a_n^{tr}(t)$ is the trajectory design of the unmanned boat and $a_n^{hop}(t)$ is its next-hop selection. The moving direction $e_n(t)$ in the system has five choices, left, right, front, rear and rest: {(-1,0), (1,0), (0,1), (0,-1), (0,0)}. By selecting an appropriate next hop $a_n^{hop}(t)$, the unmanned boat reduces the packet transmission time.

Step 43, the reward of an agent is described as follows: the reward of each unmanned boat agent consists of the following parts. The first part is the transmission time $t_{n_m,n_{m+1}}$ of the data packet from the current unmanned boat to the next-hop unmanned boat; since the goal is to minimize the transmission time, this reward is defined as the negative of the transmission time. If the unmanned boat moves outside the defined area, it receives a penalty $r^{out}$, which keeps the unmanned boats inside the networking area. If transmitting the packet from the current unmanned boat to the next-hop unmanned boat congests the next hop, a penalty $r^{cong}$ is given, which reduces the network congestion probability. If the next-hop unmanned boat selected by the current unmanned boat is outside the communication range, a penalty $r^{int}$ is given, which keeps the communication links between unmanned boats reliable. With the penalties taken as positive magnitudes, the total reward of unmanned boat n is $r_n(t) = -k_t\, t_{n_m,n_{m+1}} - \mathbb{I}_n^{out}\, r^{out} - \mathbb{I}_n^{cong}\, r^{cong} - \mathbb{I}_n^{int}\, r^{int}$, where $k_t$ is a positive weight; $\mathbb{I}_n^{out} \in \{0,1\}$ indicates whether unmanned boat n moves outside the defined area; $\mathbb{I}_n^{cong}$ indicates whether the selected next-hop unmanned boat becomes congested; and $\mathbb{I}_n^{int}$ indicates whether the selected next-hop unmanned boat is outside the communication range.
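The reward of step 43 can be sketched as a single function. This assumes the sign convention "negative weighted time minus positive penalties" and uses hypothetical parameter names; the actual penalty values are in Table 3 and are not reproduced here.

```python
def hop_reward(hop_time_s, k_t, outside_area, congested, out_of_range,
               r_out, r_cong, r_int):
    """Assumed reward: -k_t * t - I_out * r_out - I_cong * r_cong - I_int * r_int.

    The boolean flags act as the indicator functions; r_out, r_cong and
    r_int are hypothetical positive penalty magnitudes.
    """
    reward = -k_t * hop_time_s
    reward -= r_out * outside_area    # boat left the networking area
    reward -= r_cong * congested      # next hop's buffer could not hold the packet
    reward -= r_int * out_of_range    # next hop beyond d_c: service interruption
    return reward

print(hop_reward(0.0, 10.0, True, True, False, 1.0, 2.0, 3.0))  # -3.0
```

When no penalty fires, the reward reduces to $-k_t$ times the hop time, which is what lets the learned Q-values approximate negative delivery times.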

Preferably, in step 5, each unmanned boat agent is divided into two sub-agents, responsible respectively for the trajectory design of the unmanned boat and for selecting the next-hop unmanned boat. Specifically, making both kinds of decisions with a single agent leads to a large action space, so each unmanned boat agent uses the multi-agent QMIX algorithm internally: it is divided into one sub-agent for trajectory design and one for next-hop selection, which splits one large action space into two smaller ones. The action space is thereby reduced from 5×(N−1) to 5+(N−1), which effectively reduces the training complexity. The greedy policy of unmanned boat n is $a_n^{tr}(t) = \arg\max_{a} Q_n^{tr}(o_n(t), a)$ and $a_n^{hop}(t) = \arg\max_{a} Q_n^{hop}(o_n(t), a)$, where $Q_n^{tr}$ and $Q_n^{hop}$ are the state-action value functions of the two sub-agents, responsible for trajectory design and next-hop selection respectively, and the joint policy of the unmanned boat agent is $a_n(t) = \{a_n^{tr}(t), a_n^{hop}(t)\}$.
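The effect of the decomposition on action-space size, and the independent greedy choice of each sub-agent, can be sketched as below. The Q-values are plain arrays standing in for the sub-agent networks; the function names and neighbour list are hypothetical.

```python
import numpy as np

# Movement choices from step 42: left, right, front, rear, stay.
DIRECTIONS = [(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)]

def action_space_sizes(n_boats):
    """Joint (single agent) vs factored (two sub-agents) action-space sizes."""
    candidates = n_boats - 1              # every other boat may be the next hop
    return len(DIRECTIONS) * candidates, len(DIRECTIONS) + candidates

def greedy_factored_action(q_tr, q_hop, neighbours):
    """Each sub-agent takes the argmax of its own Q-values independently.

    q_tr:  5 values from the trajectory sub-agent, aligned with DIRECTIONS.
    q_hop: values from the next-hop sub-agent, aligned with `neighbours`.
    """
    move = DIRECTIONS[int(np.argmax(q_tr))]
    next_hop = neighbours[int(np.argmax(q_hop))]
    return move, next_hop

print(action_space_sizes(20))  # (95, 24)
```

With 20 boats, as in Example 1, each boat's networks output 24 values instead of 95.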

Preferably, in step 6, information is exchanged among the unmanned boat agents, and each unmanned boat updates its own value function with the state-action value of the next-hop unmanned boat. Specifically, although each unmanned boat is divided into two sub-agents, the mixing network still outputs a single joint action-value function $Q_n^{tot}$, so in training across unmanned boats each unmanned boat is treated as one agent. The algorithm computes the training target from the action-value function of the next-hop unmanned boat, which is obtained through information exchange between unmanned boats.

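Preferably, the training target of unmanned boat $n_m$ is computed from its own reward and the joint state-action value $Q_{n_{m+1}}^{tot}$ provided by the next-hop unmanned boat $n_{m+1}$.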

Computing the training target from the state-action values of other agents can break the correlation between the target network and the training network. After sufficient training, when an unmanned boat agent incurs few penalties, its state-action value $Q_n^{tot}$ can be regarded as the negative of the transmission time of the current data packet. For example, when the communication link between unmanned boat $n_m$ and the destination unmanned boat is normal, the reward of $n_m$ is the negative of the time needed to transmit the packet from $n_m$ to the destination, and the state-action value $Q_{n_m}^{tot}$ is likewise the negative of the time needed to transmit the packet from $n_m$ to the destination.
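The exact target formula was lost in extraction. The sketch below assumes a Q-routing-style target $y = r + \gamma \max_{a'} Q^{tot}_{n_{m+1}}(o', a')$ with $\gamma = 1$, and $y = r$ when the next hop is the destination; under that assumption and with no penalties, backing targets up along a path reproduces the "negative transmission time" interpretation above. All names are hypothetical.

```python
def td_target(reward, next_hop_q_values, next_is_destination, gamma=1.0):
    """Assumed target: r + gamma * max Q_tot of the next hop; just r at the destination."""
    if next_is_destination:
        return reward
    return reward + gamma * max(next_hop_q_values)

def squared_td_error(q_pred, target):
    """Loss minimised for the boat's own Q_tot."""
    return (q_pred - target) ** 2

def q_along_path(hop_times):
    """Back up targets from the destination with rewards -t_hop and no penalties."""
    q_next = None
    for t in reversed(hop_times):
        if q_next is None:
            q_next = td_target(-t, [], next_is_destination=True)
        else:
            q_next = td_target(-t, [q_next], next_is_destination=False)
    return q_next

print(q_along_path([0.5, 1.0, 0.25]))  # -1.75
```

The converged value equals minus the summed hop times, matching the statement that $Q^{tot}$ approximates the negative delivery time.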

The beneficial effects of the invention are as follows: (1) the invention models the dynamic communication network of unmanned boats as an ad hoc network and, by designing the movement trajectories of the unmanned boats and selecting a suitable next hop for each data packet, reduces packet transmission time and network congestion probability while keeping the communication links between unmanned boats reliable; (2) each unmanned boat agent is divided into two sub-agents, responsible respectively for the movement trajectory of the boat and for the next-hop selection of the data packet, and this decomposition shrinks the agents' action space and simplifies training; (3) the unmanned boat agents cooperate in a distributed manner and can learn a more intelligent routing strategy from interaction with the network environment, reducing packet transmission time and network congestion probability to some extent and improving network throughput.

Drawings

FIG. 1 is a schematic diagram of the method steps of the present invention.

Fig. 2 is a schematic diagram of a network model structure according to the present invention.

FIG. 3 is a schematic flow chart of the method of the present invention.

FIG. 4 is a schematic diagram of the method of the present invention.

FIG. 5 is a graph of the cumulative reward of all unmanned boat agents in each episode provided by the present invention.

Fig. 6 is a graph of the probabilities of network congestion and communication interruption events provided by the present invention.

Fig. 7 is a graph of the influence of network load on the probability of network congestion provided by the present invention.

Fig. 8 is a graph of the impact of network load on network throughput provided by the present invention.

Detailed Description

As shown in fig. 1 to 4, a multi-agent QMIX algorithm-based unmanned ship ad hoc network routing method comprises the following steps:

Example 1:

The movable area of the unmanned boats is a square with side length L = 2 km; there are 20 unmanned boats with communication range $d_c$, moving at speed V = 10 m/s; the training episode length is T = 1000; and the size of each generated data packet follows a Poisson distribution with λ = 25. The unmanned boat communication parameters are shown in Table 1.
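The Example 1 settings can be collected in a configuration object, sketched below together with the Poisson packet-size generator. The field names are hypothetical; ε and the replay capacity are taken from the training description, while $d_c$ and the Table 1 and Table 3 values are not reproduced because they are not given in the text, and the packet-size unit is not stated.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ExperimentConfig:
    """Example 1 settings; d_c, B, P, zeta_0 and reward penalties are omitted."""
    area_side_m: float = 2000.0    # L = 2 km
    n_boats: int = 20
    speed_mps: float = 10.0        # V = 10 m/s
    episode_len: int = 1000        # training episode length T
    packet_lambda: float = 25.0    # Poisson mean of the packet size
    epsilon: float = 0.1           # random exploration probability
    replay_capacity: int = 16000   # experience replay pool size

def sample_packet_sizes(cfg, n_packets, seed=0):
    """Draw packet sizes from Poisson(packet_lambda) with a seeded NumPy Generator."""
    rng = np.random.default_rng(seed)
    return rng.poisson(cfg.packet_lambda, size=n_packets)

cfg = ExperimentConfig()
print(sample_packet_sizes(cfg, 5))
```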

Table 1 Unmanned boat communication parameters

Table 2 MLP structure of the sub-agents

In the proposed multi-agent QMIX algorithm, each sub-agent has a deep neural network with two convolutional layers that extracts the distribution features of the unmanned boats, followed by a multi-layer perceptron (MLP) that outputs state-action values; its structure is shown in Table 2. The mixing network likewise has a convolutional layer to extract the distribution features of the unmanned boats; it takes the state-action values of the two sub-agents as input and outputs the joint state-action value through an MLP. The weights of the mixing network are generated by a hypernetwork that takes the observations as input. During training, the random exploration probability is ε = 0.1. In each training step, the agent samples randomly from the experience replay pool, whose capacity in the experiments is 16000. The reward parameter settings are shown in Table 3.
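The monotonic mixing idea behind QMIX can be sketched in NumPy as below, together with ε-greedy exploration. This is an untrained illustration under stated assumptions: the hypernetworks are single random linear maps, the convolutional feature extractors are omitted, and the class and function names are hypothetical. Taking absolute values of the hypernetwork outputs keeps all mixing weights non-negative, so $Q^{tot}$ never decreases when either sub-agent's value increases.

```python
import numpy as np

class MonotonicMixer:
    """QMIX-style mixer for one boat's two sub-agent values (untrained sketch)."""

    def __init__(self, obs_dim, n_sub=2, embed=8, seed=0):
        rng = np.random.default_rng(seed)
        self.n_sub, self.embed = n_sub, embed
        # Linear hypernetworks: observation -> mixing weights and biases.
        self.hyper_w1 = rng.normal(size=(obs_dim, n_sub * embed))
        self.hyper_b1 = rng.normal(size=(obs_dim, embed))
        self.hyper_w2 = rng.normal(size=(obs_dim, embed))
        self.hyper_b2 = rng.normal(size=(obs_dim, 1))

    def __call__(self, q_sub, obs):
        q_sub = np.asarray(q_sub, dtype=float)   # (n_sub,)
        obs = np.asarray(obs, dtype=float)       # (obs_dim,)
        w1 = np.abs(obs @ self.hyper_w1).reshape(self.n_sub, self.embed)
        hidden = q_sub @ w1 + obs @ self.hyper_b1
        hidden = np.where(hidden > 0, hidden, np.expm1(np.minimum(hidden, 0.0)))  # ELU
        w2 = np.abs(obs @ self.hyper_w2)
        b2 = (obs @ self.hyper_b2)[0]
        return float(hidden @ w2 + b2)

def epsilon_greedy(q_values, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

mixer = MonotonicMixer(obs_dim=4)
obs = [0.1, 0.2, 0.3, 0.4]
print(mixer([1.0, 2.0], obs), mixer([1.5, 2.0], obs))
```

Monotonicity is what allows each sub-agent to act greedily on its own Q-values while remaining consistent with the joint value used in training.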

Table 3 Unmanned boat reward parameters

The proposed multi-agent QMIX algorithm is compared with the DQN-routing algorithm and the AODV algorithm; in the AODV baseline the positions of all unmanned boats are fixed. Fig. 5 shows the cumulative reward of all agents in each episode, which trends upward as training proceeds and thus verifies the effectiveness of the proposed algorithm. At the end of training, the QMIX algorithm, with its two sub-agents per boat, performs best and outperforms both DQN-routing and AODV. At the beginning of training, however, its cumulative reward is lower than that of DQN-routing and AODV. This is because early in training the unmanned boats move essentially randomly along the trajectories chosen by the trajectory sub-agents, so the communication links between boats are unstable; the selected next-hop boat is often outside the communication range, packets are lost, and the agents receive more penalties. As training proceeds, the agents gradually learn to select a suitable next-hop boat within communication range, and the rewards increase. DQN-routing converges faster than QMIX, but its reward fluctuates more after convergence. This is because DQN-routing does not plan the unmanned boat trajectories and trains only the next-hop selection agent, so it finds a better packet transmission path only occasionally as the network topology changes. Because the network topology in AODV is static and the next hop is selected by a shortest-path algorithm, the reward it obtains is stable.

As shown in Fig. 6, the proposed method also better prevents network congestion and communication interruption events. Communication interruption events drop sharply within the first 1000 episodes, indicating that the agents have learned to select a next-hop boat within communicable range. The network congestion probability rises at the beginning of training and falls after about 1000 episodes. This is because congestion can only be counted when a packet is actually transmitted to a boat within communicable range. Early in training, most agents cannot correctly select a next hop within range, so the measured congestion probability is small. As training proceeds, more packets reach boats within range, and the congestion probability begins to rise. The agents then gradually learn from the experience replay pool how to avoid congestion, so the congestion probability falls to zero late in training.

Fig. 7 shows the effect of network load on the network congestion probability. In the experiment each time step is 10 ms; the shorter the interval between packet generations, the more packets are generated in the unmanned boat ad hoc network. As Fig. 7 shows, the congestion probability decreases as the number of packets in the network decreases, and the multi-agent QMIX algorithm has a lower congestion probability than DQN-routing and AODV. This is because AODV selects the next hop by a shortest-path algorithm and DQN-routing does not consider the current unmanned boat positions, whereas the multi-agent QMIX algorithm learns from past experience and, through trajectory design, selects an appropriate transmission path that avoids congestion.

Fig. 8 shows the effect of network load on network throughput. Among the three algorithms, the multi-agent QMIX algorithm achieves the highest throughput, exceeding DQN-routing and AODV; at the load where system throughput is lowest, it improves network throughput by 86.67%. System throughput first increases and then decreases as the number of packets grows: when there are few packets the network is lightly loaded, so more packets raise the throughput; once throughput reaches the network capacity, additional packets raise the congestion probability, more packets are dropped, and throughput falls.

Claims

1. An unmanned ship self-organizing network routing method based on a multi-agent QMIX algorithm is characterized by comprising the following steps:

2. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm according to claim 1, wherein in step 1 the unmanned ship dynamic communication network is modeled as an ad hoc network in which each unmanned ship is regarded as a node, the unmanned ship ad hoc network consists of N unmanned ships, and their indexes are expressed as the set $\mathcal{N} = \{1, 2, \dots, N\}$.

3. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm according to claim 1, wherein in step 1 the moving area of the unmanned ships in the system is described as follows: the moving area is a square with side length L, a shore base is located at the edge of the area, and unmanned ship N stays near the base station, maintains a communication connection with it, and is also the destination of the data packets in the unmanned ship ad hoc network; the unmanned ship ad hoc network consists of N unmanned ships, whose indexes are expressed as the set $\mathcal{N} = \{1, 2, \dots, N\}$.

4. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm according to claim 1, wherein in step 1 the movement of the unmanned ships in the system is described as follows: each unmanned ship moves within the area at a fixed speed V, and the position of unmanned ship n at time t is $p_n(t)$, $0 \le t \le T$, where T is the total running time of the unmanned ship ad hoc network; the system is assumed to operate in time slots of length $\delta_t$, and the moving direction of each unmanned ship remains unchanged within a time slot, so its movement is expressed as $p_n(t+\delta_t) = p_n(t) + V\delta_t\, e_n(t)$, where $e_n(t)$ is the moving direction of unmanned ship n at time t and the position of an unmanned ship is linear in its moving direction; the unmanned ship ad hoc network consists of N unmanned ships, whose indexes are expressed as the set $\mathcal{N} = \{1, 2, \dots, N\}$.

5. The unmanned ship ad hoc network routing method based on multi-agent QMIX algorithm of claim 1, wherein in step 2, a communication model of the unmanned ship ad hoc network is established, and the calculation of the signal-to-noise ratio, the transmission rate and the transmission time of the data packet of the communication between the unmanned ships comprises the following steps:

because the distance between the source unmanned ship and the destination unmanned ship N exceeds the communicable distance, each data packet can reach N only through multiple hops of the unmanned ship ad hoc network, and the transmission path between the source unmanned ship and the destination unmanned ship N is denoted $\omega = \{n_1, n_2, \dots, n_{M_\sigma}\}$, where $M_\sigma$ is the number of unmanned ships on the packet's path and $n_m$ is the m-th unmanned ship on path ω; the unmanned ship ad hoc network consists of N unmanned ships, whose indexes are expressed as the set $\mathcal{N} = \{1, 2, \dots, N\}$;

$L_{free} = 32.45 + 20\lg d + 20\lg f$,

step 23, the packet transmission rate between unmanned ship $n_m$ and unmanned ship $n_{m+1}$ at time t is $R_{n_m,n_{m+1}}(t) = B\log_2\left(1 + \gamma_{n_m,n_{m+1}}(t)\right)$, where B is the communication bandwidth and $\gamma_{n_m,n_{m+1}}(t)$ is the signal-to-noise ratio received by $n_{m+1}$ from $n_m$.

6. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm according to claim 1, wherein in step 3 the objective of the unmanned ship ad hoc network route optimization problem and the constraints on unmanned ship movement are as follows: by designing suitable unmanned ship movement trajectories $\{p_n(t)\}$ and a packet transmission path ω, the transmission time of each packet and the congestion probability of the network are reduced, so the route optimization problem is expressed as $\min_{\{p_n(t)\},\,\omega} \sum_{m=1}^{M_\sigma-1} t_{n_m,n_{m+1}}$, subject to the following conditions:

where $t_{n_m,n_{m+1}}$ is the transmission time of a data packet between unmanned ships, $p_n(0)$ is the initial position of unmanned ship n, $M_\sigma$ is the number of unmanned ships on the packet's path, and $\phi_l$ and $\phi_u$ are the lower and upper boundaries of the moving area; a speed constraint limits each unmanned ship to the speed V; $\|p_{n_m}(t) - p_{n_{m+1}}(t)\| \le d_c$ requires the next-hop unmanned ship to be within the communication range $d_c$ of the current unmanned ship, and a service interruption event occurs if this constraint is violated; $b_{n_{m+1}}(t) \ge \sigma$ requires the unmanned ship to have enough remaining buffer $b_{n_{m+1}}(t)$ to receive the incoming data packet, where σ is the size of the packet sent from $n_m$ to $n_{m+1}$; $\phi_l \le p_n(t) \le \phi_u$ restricts the unmanned ship to the set area; the unmanned ship ad hoc network consists of N unmanned ships, whose indexes are expressed as the set $\mathcal{N} = \{1, 2, \dots, N\}$.

7. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm according to claim 1, wherein in step 4, modeling the unmanned ship ad hoc network routing problem as a reinforcement learning problem and designing the multi-agent QMIX algorithm elements specifically comprises the steps of:

Step 41, the observation of the agent is described as follows: the observation $o_n(t)$ of unmanned ship n consists of the following parts: the unmanned ship positions, the remaining buffers of the unmanned ships, and the size σ of the currently transmitted data packet. The unmanned ship ad hoc network consists of N unmanned ships, indexed by the set $\mathcal{N} = \{1, 2, \dots, N\}$.
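A minimal sketch of assembling $o_n(t)$ as a flat vector for a neural Q-network follows. Whether the position and buffer terms cover only ship n or every ship in $\mathcal{N}$ is not recoverable from the extracted text, so the hypothetical `build_observation` accepts arrays of either length; the flat float32 layout is also an assumption.

```python
import numpy as np


def build_observation(positions, remaining_buffers, sigma: float) -> np.ndarray:
    """Concatenate positions, remaining buffers and packet size sigma into o_n(t)."""
    pos = np.asarray(positions, dtype=np.float32).reshape(-1)          # (x, y) pairs, flattened
    buf = np.asarray(remaining_buffers, dtype=np.float32).reshape(-1)  # one value per ship
    return np.concatenate([pos, buf, np.array([sigma], dtype=np.float32)])


# Two ships observed: positions (1, 2) and (3, 4), buffers 5 and 6, packet size 7.
obs = build_observation([[1, 2], [3, 4]], [5, 6], 7.0)
```

In practice the components would usually be normalized (for example, positions by the area bounds) before being fed to the network.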

Step 42, the actions of the agent are described as follows: the action $a_n(t)$ of each unmanned ship consists of two parts, where the first part represents the trajectory design of the unmanned ship and the second part represents its next-hop selection. The movement direction in the system consists of five choices, left, right, front, rear and rest: {(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)}. The unmanned ship reduces the packet transmission time by selecting an appropriate next hop.
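The two-part action can be sketched as a movement index into the five directions plus a next-hop index into the other N − 1 ships. The assignment of tuples to left/right/front/rear (x to the right, y to the front), the step length `step`, and the helper names are illustrative assumptions.

```python
from typing import List, Tuple

# Five movement choices of step 42; assumed order: left, right, front, rear, rest.
DIRECTIONS: List[Tuple[int, int]] = [(-1, 0), (1, 0), (0, 1), (0, -1), (0, 0)]


def next_hop_candidates(n: int, num_ships: int) -> List[int]:
    """The N - 1 possible next hops of ship n: every index in {1..N} except n."""
    return [m for m in range(1, num_ships + 1) if m != n]


def apply_move(pos: Tuple[float, float], direction_idx: int, step: float) -> Tuple[float, float]:
    """Move a ship one step of length `step` (hypothetical) in the chosen direction."""
    dx, dy = DIRECTIONS[direction_idx]
    return (pos[0] + step * dx, pos[1] + step * dy)


candidates = next_hop_candidates(2, 4)  # ship 2 in a 4-ship network
new_pos = apply_move((0.0, 0.0), 1, 2.0)  # move "right" by 2 units
```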

Step 43, the reward of the agent is described as follows. The reward of each unmanned ship agent consists of the following parts. The first part is the transmission time of the data packet from the current unmanned ship to the next-hop unmanned ship; since the goal is to minimize the transmission time, the reward is defined as the negative of the transmission time. If the unmanned ship moves outside the defined area, a penalty is applied, which keeps the unmanned ship within the networking area. If transmitting the data packet from the current unmanned ship to the next-hop unmanned ship congests the next-hop unmanned ship, a penalty is applied, which reduces the congestion probability of the network. If the next-hop unmanned ship selected by the current unmanned ship is beyond the communication range, a penalty is applied, which keeps the communication links between unmanned ships reliable. The total reward of unmanned ship n combines these terms, where the weight $k_t$ is positive; binary indicators denote whether unmanned ship n moves outside the defined area, whether the selected next-hop unmanned ship is congested, and whether the selected next-hop unmanned ship is within communication range.
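Because the total-reward formula did not survive extraction, the sketch below assumes the simplest combination consistent with the text: the weighted negative transmission time minus each penalty times its indicator. The penalty magnitudes and parameter names are hypothetical placeholders, not values from the patent.

```python
def reward(transmission_time: float, out_of_area: bool, congested: bool, in_range: bool,
           k_t: float = 1.0, penalty_area: float = 10.0,
           penalty_congestion: float = 10.0, penalty_range: float = 10.0) -> float:
    """Assumed additive reward of step 43 (penalty values are hypothetical)."""
    r = -k_t * transmission_time                  # shorter transmission -> larger reward
    r -= penalty_area * float(out_of_area)        # ship left the networking area
    r -= penalty_congestion * float(congested)    # next hop becomes congested
    r -= penalty_range * float(not in_range)      # next hop outside communication range
    return r


r_ok = reward(2.0, out_of_area=False, congested=False, in_range=True)
r_bad = reward(2.0, out_of_area=True, congested=True, in_range=False)
```

With penalties much larger than typical transmission times, the agent first learns to satisfy the constraints and then to shorten delivery.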

8. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm according to claim 1, wherein in step 5, dividing each unmanned ship agent into two sub-agents, responsible for the unmanned ship trajectory design and the selection of the next-hop unmanned ship respectively, specifically comprises: to address the large action space caused by the two types of decisions, each unmanned ship agent uses the multi-agent QMIX algorithm. In the system, each unmanned ship agent is divided into two sub-agents, responsible for the trajectory design and the next-hop selection respectively, so that one large action space is split into two smaller action spaces; the action space is reduced from 5 × (N − 1) to 5 + (N − 1), which effectively reduces the training complexity. The unmanned ship ad hoc network consists of N unmanned ships, indexed by the set $\mathcal{N} = \{1, 2, \dots, N\}$. The greedy policy of unmanned ship n selects, for each sub-agent, the action that maximizes that sub-agent's state-action value function; the two state-action value functions belong to the sub-agents responsible for the trajectory design and the next-hop selection, respectively, and the joint policy of the unmanned ship agent is the combination of the two sub-policies.
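The action-space reduction and the factorized greedy policy can be checked with a few lines. The sketch assumes each sub-agent outputs one Q-value per action (5 movement values, N − 1 next-hop values) and takes the argmax of each independently; the function names are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def action_space_sizes(num_ships: int) -> Tuple[int, int]:
    """Joint (5 x (N - 1)) versus factorized (5 + (N - 1)) action-space sizes."""
    moves, hops = 5, num_ships - 1
    return moves * hops, moves + hops


def greedy_factored_action(q_traj: np.ndarray, q_hop: np.ndarray,
                           hop_candidates: List[int]) -> Tuple[int, int]:
    """Each sub-agent acts greedily on its own Q-values; the pair is the ship's action."""
    move_idx = int(np.argmax(q_traj))                 # trajectory sub-agent
    hop = hop_candidates[int(np.argmax(q_hop))]       # next-hop sub-agent
    return move_idx, hop


sizes = action_space_sizes(10)
action = greedy_factored_action(np.array([0.1, 0.5, 0.2, 0.0, 0.0]),
                                np.array([-3.0, -1.0, -2.0]), [1, 3, 4])
```

For N = 10 the joint space has 45 actions while the two sub-agents together output only 14 values.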

9. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm according to claim 1, wherein in step 6, exchanging information between the unmanned ship agents, with each unmanned ship updating its own value function using the state-action value of the next-hop unmanned ship, specifically comprises: although each unmanned ship is divided into two sub-agents, the mixing network still outputs a single action-value function for the unmanned ship. Therefore, during training across unmanned ships, each unmanned ship is treated as one agent, and the algorithm computes the training target using the action-value function of the next-hop unmanned ship, which is obtained through information exchange between the unmanned ships.
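The mixing step can be sketched in the style of QMIX: a state-conditioned network combines the two sub-agent Q-values into one ship-level value, with non-negative mixing weights so the result is monotone in each sub-agent value. Monotonicity is what lets the per-sub-agent greedy choice of claim 8 also maximize the mixed value. Simplifications (all assumptions): the hypernetworks are single linear maps with fixed random weights rather than trained networks, and the class name and sizes are hypothetical.

```python
import numpy as np


class TinyQMixer:
    """QMIX-style monotonic mixer over the two sub-agent Q-values of one ship.

    Simplification: hypernetworks are single linear maps with fixed random
    weights; in QMIX they are trained jointly with the agent networks.
    """

    def __init__(self, state_dim: int, n_sub: int = 2, embed: int = 8, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.n_sub, self.embed = n_sub, embed
        self.hyper_w1 = rng.normal(size=(state_dim, n_sub * embed))
        self.hyper_b1 = rng.normal(size=(state_dim, embed))
        self.hyper_w2 = rng.normal(size=(state_dim, embed))
        self.hyper_b2 = rng.normal(size=state_dim)

    def __call__(self, q_sub: np.ndarray, state: np.ndarray) -> float:
        # abs() keeps mixing weights non-negative, so Q_tot is monotone in each q_sub.
        w1 = np.abs(state @ self.hyper_w1).reshape(self.n_sub, self.embed)
        b1 = state @ self.hyper_b1
        h = q_sub @ w1 + b1
        h = np.where(h > 0, h, np.expm1(np.minimum(h, 0.0)))  # ELU activation
        w2 = np.abs(state @ self.hyper_w2)
        b2 = float(state @ self.hyper_b2)
        return float(h @ w2 + b2)


mixer = TinyQMixer(state_dim=4)
s = np.array([0.2, -0.1, 0.5, 1.0])
q_low = mixer(np.array([1.0, 2.0]), s)   # (trajectory Q, next-hop Q)
q_high = mixer(np.array([1.5, 2.0]), s)  # raise the trajectory sub-agent's Q
```

The scalar returned by the mixer is the single ship-level action value that is shared with neighbouring ships.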

10. The unmanned ship ad hoc network routing method based on the multi-agent QMIX algorithm according to claim 9, wherein the training target is defined as:

calculating the training target with state-action values from other agents can break the correlation between the target network and the training network. After sufficient training, when few penalties are incurred, the state-action value of each unmanned ship agent can be regarded as the negative of the transmission time of the current data packet.
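Since the target formula itself is missing from the extraction, the sketch below assumes a one-step, Q-routing-style target: the reward plus the discounted best action value reported by the next-hop ship, with no bootstrap once the packet reaches the destination. A tabular toy (hypothetical 4-ship topology and hop times, learning rate `alpha`, discount `gamma = 1`) stands in for the networks. It shows the property stated above: with no penalties, each learned value converges to the negative of the packet's remaining transmission time.

```python
from typing import Dict, List, Optional, Tuple


def td_target(reward: float, next_hop_q: Optional[float], gamma: float = 1.0) -> float:
    """Assumed target y = r + gamma * max_a Q_next; no bootstrap at the destination."""
    if next_hop_q is None:  # packet delivered to the destination
        return reward
    return reward + gamma * next_hop_q


def train(times: Dict[Tuple[int, int], float], neighbors: Dict[int, List[int]],
          dest: int, sweeps: int = 200, alpha: float = 0.5,
          gamma: float = 1.0) -> Dict[int, Dict[int, float]]:
    """Tabular stand-in: q[n][m] is ship n's value for forwarding to next hop m."""
    q = {n: {m: 0.0 for m in nbrs} for n, nbrs in neighbors.items()}
    for _ in range(sweeps):
        for n, nbrs in neighbors.items():
            for m in nbrs:
                # Value "exchanged" by the next-hop ship m (its best action value).
                nxt = None if m == dest else max(q[m].values())
                y = td_target(-times[(n, m)], nxt, gamma)  # reward = -transmission time
                q[n][m] += alpha * (y - q[n][m])
    return q


hop_times = {(1, 2): 2.0, (1, 3): 1.0, (2, 4): 1.0, (3, 4): 4.0}
q = train(hop_times, neighbors={1: [2, 3], 2: [4], 3: [4]}, dest=4)
```

Ship 1 learns about -3 for forwarding via ship 2 (2 + 1 time units) and about -5 via ship 3 (1 + 4), so its greedy next hop is ship 2 even though the first hop to ship 3 is faster.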