CN111065105B - Distributed intelligent routing method for unmanned aerial vehicle network slice

Info

Publication number: CN111065105B (earlier published as application CN111065105A)
Application number: CN201911395351.6A
Authority: CN (China)
Inventors: 陈博伦, 孙耀, 秦爽, 冯钢
Assignee: University of Electronic Science and Technology of China
Legal status: Active (application granted)

Classifications

    • H04W 16/22 — Network planning: traffic simulation tools or models
    • H04W 40/12 — Communication route or path selection based on transmission quality or channel quality
    • H04W 40/22 — Communication route or path selection using selective relaying for reaching a base transceiver station or an access point
    • H04W 40/248 — Connectivity information update
    • Y02T 10/40 — Engine management systems


Abstract

The invention discloses a distributed intelligent routing method for unmanned aerial vehicle network slicing, which comprises the following steps: modeling the unmanned aerial vehicle network as a network model; setting constraint conditions for the network model, the constraint conditions comprising a time delay limit, a rate limit and a packet loss rate limit; for low-delay-oriented slices, formulating the constrained network model as a multi-constraint optimization model; solving the multi-constraint optimization model through a reinforcement learning model to obtain a solution, wherein during the solving process each communication node independently stores the link conditions between itself and its neighbor nodes and updates them in real time; and carrying out dynamic routing according to the solution and the link conditions. The invention effectively addresses the inability of prior-art routing methods to adapt to the strict delay requirements and constantly changing topology of unmanned aerial vehicle networks, filling a technical gap and creating better conditions for the unmanned aerial vehicle network environment.

Description

Distributed intelligent routing method for unmanned aerial vehicle network slice
Technical Field
The invention relates to a wireless network routing method, and in particular to a distributed intelligent routing method for unmanned aerial vehicle network slicing.
Background
Traditional routing strategies based on the shortest-path algorithm are difficult to apply to unmanned aerial vehicle network slicing scenarios. They adapt poorly to a network environment that changes in real time, and because the shortest-path algorithm selects a single path, the probability of network congestion increases significantly as the number of traffic flows in the network grows. Reducing congestion on the basis of the shortest-path algorithm comes at the cost of the number of service flows that can be carried. In addition, in practical scenarios, node mobility causes the communication quality of links to change, and this network dynamism also affects the result of the routing algorithm.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: prior-art routing algorithms are inflexible and cannot adapt to network dynamics, so that network congestion grows in proportion to the number of service flows, making them unsuitable for unmanned aerial vehicle network slicing scenarios. The invention aims to provide a distributed intelligent routing method for unmanned aerial vehicle network slicing, solving the problem of how to improve communication quality in the dynamic network environment of such a scenario.
The invention is realized by the following technical scheme:
a distributed intelligent routing method facing unmanned aerial vehicle network slicing comprises the following steps:
s1: modeling an unmanned aerial vehicle network into a network model;
s2: setting constraints on the network model, wherein the constraints comprise: time delay limit, rate limit, packet loss rate limit;
s3: when the slice is oriented to low time delay, the constrained network model is built into a multi-constraint optimization model;
s4: solving the multi-constraint optimization model through a reinforcement learning model to obtain a solved value, wherein each communication node independently stores the link conditions of the communication node and the neighbor nodes and updates the link conditions in real time in the solving process;
s5: and carrying out dynamic routing according to the solved value and the link condition.
Firstly, the unmanned aerial vehicle network is established as a network model in a static environment, and constraint conditions of time delay limit, rate limit and packet loss rate limit are set for the network model. Aiming at the requirements of the unmanned aerial vehicle network, for low-delay-oriented slices the constrained network model is formulated as a multi-constraint optimization model. A reinforcement learning model is established and used to solve the multi-constraint optimization model; during the solving process, each communication node in the network independently stores the link conditions between itself and its neighbor nodes and updates its current link conditions in real time. Finally, according to the solution and the link states of the communication nodes, a routing path between the source node and the destination node is found that minimizes delay while meeting the rate, packet-loss and other requirements. This path changes dynamically over time as network conditions and constraints change.
Further, the network model in the static environment is established as an undirected graph G(V, E, W) with weighted edges, where V is the set of communication nodes in the network, E is the set of communication links in the network, and W represents the weight values of the links in the network. The techniques used by a communication link include TDMA, CSMA, or polling.
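The weighted-multigraph model above can be sketched in code. This is an illustrative data structure, not from the patent text: class and field names, and the sample link weights, are assumptions; each undirected link between nodes i and j may carry one edge per communication technology k.

```python
# Sketch of the undirected weighted multigraph G(V, E, W): each link e_ijk is
# keyed by (i, j, k), with k the communication technology, and its weight is
# the triple (delay, rate, packet loss).  All names/values are illustrative.
TECHS = {1: "TDMA", 2: "CSMA", 3: "polling"}

class UavNetwork:
    def __init__(self):
        self.links = {}  # (i, j, k) -> (delay_ms, rate_kbps, loss)

    def add_link(self, i, j, k, delay_ms, rate_kbps, loss):
        # e_ijk and e_jik denote the same undirected link, so store a
        # canonical key with the smaller node id first.
        key = (min(i, j), max(i, j), k)
        self.links[key] = (delay_ms, rate_kbps, loss)

    def weight(self, i, j, k):
        return self.links[(min(i, j), max(i, j), k)]

net = UavNetwork()
net.add_link(4, 5, 1, delay_ms=3.0, rate_kbps=200.0, loss=0.05)  # TDMA edge
net.add_link(4, 5, 2, delay_ms=5.0, rate_kbps=450.0, loss=0.10)  # CSMA edge
net.add_link(5, 4, 3, delay_ms=2.0, rate_kbps=100.0, loss=0.02)  # polling edge
```

The three parallel edges between nodes 4 and 5 mirror the multi-technology link of fig. 2.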
Further, the constraint conditions of the network model are expressed by QoS = (δ, ν, γ), where δ denotes delay, ν denotes rate, and γ denotes packet loss rate. The constraint conditions are:
Σ_{i,j=1…n} D(L_ij) ≤ δ,
min_{i,j=1…n} V(L_ij) ≥ ν,
1 − Π_{i,j=1…n} (1 − R(L_ij)) ≤ γ;
where L_ij represents the link from communication node i to communication node j; D(L_ij) represents the time delay of link L_ij; V(L_ij) represents the rate of link L_ij; and R(L_ij) represents the packet loss rate of link L_ij.
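The three constraints above compose along a path as a sum (delay), a minimum (rate) and a complementary product (loss). A minimal sketch of the feasibility test, assuming each link is a (delay, rate, loss) tuple:

```python
# Sketch: checking a candidate path against the three QoS constraints.
# Each element of path_links is an assumed (delay_ms, rate_kbps, loss) tuple.
def satisfies_qos(path_links, delta, nu, gamma):
    total_delay = sum(d for d, _, _ in path_links)   # Σ D(L_ij) ≤ δ
    min_rate = min(v for _, v, _ in path_links)      # min V(L_ij) ≥ ν
    success = 1.0
    for _, _, r in path_links:
        success *= (1.0 - r)
    path_loss = 1.0 - success                        # 1 − Π(1 − R(L_ij)) ≤ γ
    return total_delay <= delta and min_rate >= nu and path_loss <= gamma

links = [(5.0, 120.0, 0.05), (4.0, 200.0, 0.02)]
assert satisfies_qos(links, delta=10.0, nu=100.0, gamma=0.1)       # feasible
assert not satisfies_qos(links, delta=8.0, nu=100.0, gamma=0.1)    # delay 9 > 8
```

Note that loss composes multiplicatively through the link success probabilities, which is why the third constraint is not a simple sum.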
Further, for a low-delay slice of the unmanned aerial vehicle network, the multi-constraint optimization model is:
minimize Delay = E[Σ_{i,j=1…n} D(L_ij)]_t
subject to Error = E[1 − Π_{i,j=1…n} (1 − R(L_ij))]_t ≤ γ,
TransRate = E[min_{i,j=1…n} V(L_ij)]_t ≥ ν,
L_ij ∈ p = (L_si, …, L_ij, …, L_jd);
where E[θ]_t denotes the expectation of θ over its duration, t denotes the duration of the traffic flow, Error denotes the packet loss rate of the traffic flow, and TransRate denotes the transmission rate of the link; the constraint on L_ij requires the path's start node to be the source node s and its end node to be the destination node d. The objective "minimize Delay" minimizes the transmission delay of the path.
Further, the communication node measures the QoS quality of the link between itself and the destination node using a Q value, and step S5 comprises the following sub-steps:
S51: defining the communication node currently sending a data packet as the data packet node;
S52: the data packet node sends the data packet to a neighbor node;
S53: judging the communication node of the next hop of the data packet according to the Q values of the neighbor nodes and the link QoS, and taking that next-hop communication node as the new data packet node;
S54: repeating steps S52-S53 until the data packet node is the destination node.
The data packet node independently calculates the Q value from itself to the destination node.
The communication node uses a Q value to measure the QoS quality of the link between the current communication node and the destination node. The communication node sends data packets to its neighbor nodes, judges which communication node should be the next hop of the data packet according to the neighbor nodes' Q values and link QoS, and forwards the data packet according to this judgment until it reaches the destination node, completing the iterative routing process from a source node to the destination node. On this basis, a distributed algorithm is preferably adopted: each communication node independently calculates the Q value from itself to the destination node, exchanges Q values and link QoS with its neighbor nodes, and uses the obtained Q values and link QoS values as the selection criterion for the next communication node.
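The next-hop decision described above can be sketched as follows. This is one plausible way to combine the neighbors' Q values with the link QoS (filter on the loss and rate Q values, then take the neighbor with the smallest delay Q value); the data layout and threshold rule are assumptions, and the patent's concrete procedure is given in the later embodiments.

```python
# Sketch: among neighbours whose accumulated QoS (via their Q values) still
# satisfies the flow's loss (γ) and rate (ν) requirements, pick the one with
# the smallest accumulated-delay Q value as the next hop.
def choose_next_hop(q_d, q_e, q_v, neighbors, gamma, nu):
    feasible = [j for j in neighbors if q_e[j] <= gamma and q_v[j] >= nu]
    if not feasible:
        return None  # no neighbour can satisfy the QoS requirement
    return min(feasible, key=lambda j: q_d[j])

q_d = {2: 12.0, 3: 9.0, 4: 15.0}    # accumulated delay to destination via j
q_e = {2: 0.08, 3: 0.20, 4: 0.05}   # accumulated loss rate via j
q_v = {2: 150.0, 3: 300.0, 4: 80.0} # path rate via j
assert choose_next_hop(q_d, q_e, q_v, [2, 3, 4], gamma=0.10, nu=100.0) == 2
```

Here neighbor 3 is excluded by the loss requirement and neighbor 4 by the rate requirement, so the packet is forwarded to neighbor 2.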
Further, when the network fluctuates greatly, the communication node discards the currently stored link condition and performs the link condition calculation again.
Further, the reinforcement learning adopts a value iteration method.
Further, the routing problem is modeled as a Markov Decision Process (MDP), the MDP model comprising: a state set, an action set, a probability transition matrix, a reward matrix, and a discount factor, the discount factor being used to calculate the cumulative reward.
Compared with the prior art, the invention has the following advantages and beneficial effects:
the invention establishes a multi-constraint optimization model for low-delay network slicing, the model has the dynamic characteristic, the optimal delay path from a source node to a destination node can be found by solving the multi-constraint optimization model, and the path dynamically changes at any time along with the network condition. The method is particularly suitable for the unmanned aerial vehicle network environment, and has the characteristics of high requirement on time delay and random change of network nodes. Meanwhile, the network node adopts a distributed method, and each user independently stores the link condition between the user and the neighbor and updates the link state, so that the selection of the path is faster and more efficient.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic view of example 1.
Fig. 2 is a schematic diagram of a network model in embodiment 1.
Fig. 3 is a schematic diagram of a Q learning method in embodiment 2.
FIG. 4 is a diagram of a simulation environment of embodiment 5.
FIG. 5 is a graph showing the convergence of the average Q values at different ε in example 5.
FIG. 6 is a diagram showing the convergence of example 5 with a small number of iterations.
Fig. 7 shows the routing results of example 6 under different QoS requirements.
Fig. 8 is a diagram of node loss in the network according to embodiment 6.
Fig. 9 is a diagram of link failure in the network according to embodiment 6.
Fig. 10 is a graph comparing the service transmission rates of the present invention and the DSDV algorithm.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example 1.
As shown in fig. 1. Embodiment 1 is a distributed intelligent routing method for unmanned aerial vehicle network slicing. Firstly, modeling an unmanned aerial vehicle network into a network model; and setting constraint conditions for the network model, wherein the constraint conditions comprise: time delay limit, rate limit, packet loss rate limit; when the slice is oriented to low time delay, the constrained network model is built into a multi-constraint optimization model; then solving the multi-constraint optimization model through a reinforcement learning model to obtain a solved value, wherein each communication node independently stores the link conditions of the communication node and the neighbor nodes and updates the link conditions in real time in the solving process; and then, carrying out dynamic routing selection according to the solved value and the link condition.
In a static network environment, the network model in embodiment 1 is established as an undirected graph G(V, E, W) with weighted edges, where V is the set of communication nodes in the network, denoted V = {v_1, v_2, …, v_n}, with n the number of nodes; E is the set of communication links in the network, E = {e_111, e_121, …, e_ijk}, where e_ijk denotes a link from node i to node j and k denotes the communication technology used by the link, k taking one of the three values {1, 2, 3}, representing the TDMA, CSMA and polling techniques respectively; W represents the weight values of the links in the network, where W = (d_ijk, v_ijk, p_ijk) denotes the link delay, link transmission rate and packet loss rate corresponding to link e_ijk. In the undirected graph, links e_ijk and e_jik denote the same link. As shown in fig. 2, node 4 and node 5 are joined by 3 parallel edges, indicating that the link from node 4 to node 5 supports 3 communication technologies.
On the basis of the network model, each service flow applying for access to the network has corresponding delay, rate and packet loss rate requirements, represented by a triple QoS = (δ, ν, γ) (the corresponding values for each link can be obtained by sending probe packets). The purpose of QoS routing is to find, in the set P of paths between the source node s and the destination node d of traffic flow f, a suitable path p = (L_si, …, L_ij, …, L_jd) satisfying the following constraints:
1. Delay limit of the service request:
Σ_{i,j=1…n} D(L_ij) ≤ δ
2. Rate limit of the service request:
min_{i,j=1…n} V(L_ij) ≥ ν
3. Packet loss rate limit of the service request:
1 − Π_{i,j=1…n} (1 − R(L_ij)) ≤ γ
where D(L_ij) represents the transmission delay of link L_ij, V(L_ij) represents the available bandwidth of link L_ij, and R(L_ij) represents the packet loss rate of link L_ij.
In a slicing scenario, different types of services have different requirements on different QoS characteristics. Taking low-delay-oriented slices as an example, when a service has a strict delay requirement, the following multi-constraint optimization model can be established:
minimize Delay = E[Σ_{i,j=1…n} D(L_ij)]_t
subject to Error = E[1 − Π_{i,j=1…n} (1 − R(L_ij))]_t ≤ γ;
TransRate = E[min_{i,j=1…n} V(L_ij)]_t ≥ ν;
L_ij ∈ p = (L_si, …, L_ij, …, L_jd).
Here E[θ]_t denotes the expectation of θ over its duration, where t is the duration of the traffic flow. Constraint one (Error) is the packet loss (bit error) rate constraint of the traffic flow, constraint two (TransRate) is the transmission rate of the link, and constraint three (L_ij ∈ p) requires that the start node of a route be the source node s and the end node be the destination node d; the optimization objective (minimize Delay) is to minimize the transmission delay of the path.
And establishing a reinforcement learning model to solve the multi-constraint optimization model.
The routing problem is first modeled as a Markov Decision Process (MDP) and solved using a reinforcement learning model. Under this model, each node of the network is regarded as a state (State) in the MDP, selecting a neighbor node as the next hop of the route in each state is treated as action (Action) selection, each node acts as an agent that independently performs its own action selection and reward (Reward) calculation, and updating of the whole network is realized through information interaction among the nodes.
The MDP model contains five main elements ⟨S, A, P, R, γ⟩: S represents the state set, A the action set, P the probability transition matrix, R the reward matrix, and γ the discount factor used to calculate the cumulative reward. Since the state transition after selecting a given action in a given state is deterministic, the transition probability is p = 1. The QoS indexes of the link are used as the immediate reward value r; specifically, d_ij(t), e_ij(t) and v_ij(t) represent the delay expectation, bit error rate expectation and rate of link i-j, respectively, over a period of time. In addition, three Q values are defined, Q_ij(d), Q_ij(e) and Q_ij(v), denoting the cumulative delay, cumulative error rate and cumulative rate over all links of the whole path from node i to the destination node when node i selects node j as the next hop. Corresponding to the optimization model, the goal is to minimize the accumulated delay expectation over all links of the selected path, so the discount factor γ is set to 1, which simplifies the Q-value update strategy. The general update formula of the Q-learning value function is Q(s, a) = Q(s, a) + α(r + γQ(s', a') − Q(s, a)), where Q(s, a) is the value of the state-action pair (s, a), α is the learning rate, γ is the discount factor, and r is the immediate reward. For the routing problem, the formula can be reduced to Q(s, a) = r + Q(s', a').
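With γ = 1, p = 1 and deterministic transitions, iterating the simplified update Q(s, a) = r + min_a' Q(s', a') is a value iteration whose fixed point is the shortest-path delay to the destination. A minimal sketch on a toy topology (the four-node graph and its delays are illustrative assumptions, not from the patent):

```python
# Sketch: value iteration with Q(s,a) = r + min_a' Q(s',a') converges to the
# shortest-path delay.  Toy undirected topology with per-link delays (ms).
INF = float("inf")
delay = {(0, 1): 4.0, (1, 3): 3.0, (0, 2): 2.0, (2, 3): 7.0}
edges = {}
for (i, j), d in delay.items():
    edges.setdefault(i, {})[j] = d   # store both directions:
    edges.setdefault(j, {})[i] = d   # the graph is undirected

dest = 3
Q = {i: {j: INF for j in nbrs} for i, nbrs in edges.items()}
for _ in range(10):                  # sweep until the values stop changing
    for i, nbrs in edges.items():
        for j, d in nbrs.items():
            downstream = 0.0 if j == dest else min(Q[j].values())
            Q[i][j] = d + downstream # Q(s,a) = r + min_a' Q(s',a')

assert min(Q[0].values()) == 7.0     # best route 0 -> 1 -> 3: 4 + 3 ms
```

Each Q[i][j] is the delay of the best path from i to the destination whose first hop is j, matching the Q_ij(d) semantics defined above.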
In the learning model, the QoS performance of the links is obtained through interaction among nodes, so a global link-QoS performance matrix (the return R) can be obtained at any time t after the system starts to operate. At time t, ⟨S, A, P, R, γ⟩ are all known, so the problem can be solved with model-based reinforcement learning.
According to the Q-value update formula Q(s, a) = r + Q(s', a'), Q(s, a) represents the shortest path delay from node s to node d when node s selects its neighbor node a as the next hop. Since the optimization objective considered is to minimize delay, minimizing Q(s, a) is equivalent to minimizing r + Q(s', a'); and because the immediate reward term r is a fixed value at a given time, only Q(s', a') needs to be minimized. Specifically, after the next-hop node j of a node i in the network is found by the ε-greedy algorithm, only the entry with the minimum Q value in node j's Q table (i.e. min_k Q_jk) is needed for the update, and the Q_ij value thus calculated is the optimal value for the current stage.
Example 2.
The specific process of Q learning is described taking fig. 3 as an example: a path satisfying the QoS requirements is to be found from v1 to v7. Each node corresponds to a state; in the network shown in fig. 3 the number of states is 7. In each state, the actions are defined as the selectable next hops of the node: taking v1 as an example, the action set size is 3, the three selectable actions being v2, v3 and v4. Three Q values are defined in each state, describing the delay, packet loss rate and bandwidth from the current node to the destination node, respectively. v1 first sends a data packet using the ε-greedy strategy and judges, according to the Q values of v2, v3 and v4, to which node the next hop of the data packet should be sent. If the packet loss rate and bandwidth requirements are met when v2 is selected (judged from Q12(e) and Q12(v)), Q12(d) is updated; v2 then repeats the same steps until v7 is reached, completing one iteration.
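The ε-greedy rule used in the walk-through can be sketched as follows: with probability ε a random neighbor is explored, otherwise the neighbor with the smallest delay Q value is exploited. The function and variable names are illustrative assumptions.

```python
# Sketch of the ε-greedy next-hop rule: explore a random neighbour with
# probability ε, otherwise exploit the neighbour with the lowest delay Q value.
import random

def epsilon_greedy_next_hop(q_d, neighbors, epsilon, rng=random):
    if rng.random() < epsilon:
        return rng.choice(neighbors)             # explore
    return min(neighbors, key=lambda j: q_d[j])  # exploit

q_d = {"v2": 11.0, "v3": 8.0, "v4": 14.0}        # delay Q values at v1
hop = epsilon_greedy_next_hop(q_d, ["v2", "v3", "v4"], epsilon=0.0)
assert hop == "v3"  # ε = 0 always exploits the minimum-Q neighbour
```

Embodiment 5 below studies how the choice of ε (fixed versus linearly varying) affects the convergence speed of exactly this rule.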
A large amount of node and link information is needed in the network environment. If all of it were stored in the SDN controller, it would occupy a large amount of the controller's cache space, and the complexity of updating and accessing the information would be on the order of O(|V||E|); taking a complete graph as an example, the complexity is O(|V|³), which would greatly increase the pressure on the SDN controller.
In view of this, and considering that each node has a certain computing power and storage space, the invention adopts a distributed method in which each user independently stores the link conditions between itself and its neighbors and updates the Q values. That is, each node independently calculates its Q value to v7 in the same way using RREQ data packets, while the Q values and link QoS values of other nodes are obtained through interaction between nodes and used as the criterion for action selection, so that training is faster and complexity lower.
Example 3.
It is assumed that a service flow in the current network initiates a routing request with source node s and destination node d. Q_ij(d), Q_ij(e) and Q_ij(v) respectively represent the accumulated link delay, accumulated packet loss rate and path rate from the current node to the destination node d, and ∞ represents infinity. Each node maintains these 3 types of Q-value information for itself; i denotes a neighbor node of node j, and l denotes the experimental round.
It is known that: the source node s, the destination node d, the service type (low-delay service is taken as an example here), and the QoS requirements of the traffic flow.
The goal is to obtain: the path-selection scheme for this traffic flow.
Initialization stage:
Q_ij(d) = ∞, Q_ij(e) = 1, Q_ij(v) = 0 // initialize Q values
Online learning stage: (the pseudo-code of the online learning loop and of the output step is given as figures in the original filing)
example 4.
This embodiment introduces the method by which the network copes with link fluctuations. Each node i ∈ V is regarded as an independent agent with a set of neighbor users N_i. Within a fixed iteration period, a node sends a Route Request message (RREQ) according to the ε-greedy rule to probe the QoS values of the wireless links between itself and its neighboring users.
A given node i in the network is set, according to its own carrying and computing capacity, to maintain the information of a number of neighbors (all neighbors can be updated when their number is not large), selecting k neighbors each time an RREQ is sent. On receiving the RREQ, a neighbor node returns a Route Reply message (RREP) carrying the instantaneous state of the link as probed by the RREQ (including the bit error rate e_ij, delay d_ij and rate v_ij) together with the Q values stored by the neighbor node, Q_jk(d), Q_jk(e) and Q_jk(v); the link state is thus returned to node i through the RREP, completing the link-state update.
After node i has selected k neighbors and sent RREQs, it receives the RREPs sent back by the neighbor nodes and updates the QoS values of the links i-j using the information carried in the RREPs.
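Folding an RREP's instantaneous measurements into the stored link-state expectations can be sketched as follows. The patent's exact update formulas appear only as figures in the original filing, so an exponential moving average is assumed here purely for illustration; the function name and α parameter are hypothetical.

```python
# Sketch (assumed update rule): blend the stored link-state expectation with
# the instantaneous values carried by an RREP using an exponential moving
# average.  The patent's own formulas are given as figures, not reproduced here.
def update_link_state(stored, rrep, alpha=0.1):
    """stored / rrep: dicts with keys 'delay', 'error', 'rate'."""
    return {k: (1 - alpha) * stored[k] + alpha * rrep[k] for k in stored}

state = {"delay": 10.0, "error": 0.10, "rate": 200.0}
rrep = {"delay": 20.0, "error": 0.20, "rate": 100.0}
state = update_link_state(state, rrep, alpha=0.5)
assert state["delay"] == 15.0 and state["rate"] == 150.0
```

A small α weights the stored expectation heavily (smooth but slow to react); discarding the stored values after large fluctuations, as described below, corresponds to restarting this average.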
Off-line stage:
Initialization:
d_ij(0) = 0, e_ij(0) = 0, v_ij(0) = 0
(the iterative update formulas for the link-state expectations are given as figures in the original filing)
As can be seen from the above, if the network fluctuates greatly over a period of time, the currently stored expected values of the link state can be discarded and the link-state calculation performed anew.
Example 5.
The algorithm was simulated using python. Based on the system model, the following parameters are set for the simulation environment:
Number of nodes: 50; area size: 1000 km × 1000 km; node communication range: 300 km; link error rate: normally distributed; link rate: uniformly distributed over 50 kbps to 450 kbps; link delay: normally distributed; source node, destination node: random integers from 0 to 49; service requirements: randomly generated.
The error rate of a link is set as a random number following a normal distribution whose parameters are related to the degree of the node (its number of neighbors), and the delay of a link is likewise a normally distributed random number whose parameters are related to the length of the link. To generate a dynamic environment, some dynamism is added to the link QoS values while the link connection state remains unchanged: the QoS values vary within a small range, simulating link fluctuation in a real scenario.
In the simulation, each node is defined as a python class with the following attributes: node position, neighbor node set, and Q-value tables for the different QoS metrics, where each type of Q-value table of a node is stored as a list whose length equals the number of its neighbors. The adjacency matrix (a two-dimensional array) of the network environment map is stored, and the positions of the 50 nodes are obtained by randomly generating horizontal and vertical coordinates within the 1000 km × 1000 km area; the resulting network map is shown in fig. 4.
In fig. 4, the source node of the traffic flow is set to node 2 and the destination node to node 20; their locations are marked in the figure. To facilitate the verification of convergence, the QoS requirements of the traffic flow are first set as: rate requirement ν ≥ 50 kbps and error rate requirement e ≤ 1. Under this requirement, the problem reduces to an unconstrained minimum-delay routing problem.
Fig. 5 shows the convergence process of the algorithm under different values of ε, and fig. 6 shows the convergence details within the first 7000 iterations for a linearly varying ε and for ε = 0.5; the algorithm can be considered converged when the Q value no longer changes over a period of time. When ε is small, the algorithm explores with small probability and exploits with large probability, so network information cannot be fully acquired in the initial stage of system operation and convergence is slow; the convergence speed of the algorithm increases with ε. As can be seen in fig. 6, the fastest convergence is obtained with a linearly varying ε tied to the iteration round, reaching convergence after about 2000 iterations. After the algorithm converges, the source node starts to use the Q values, selecting at each node the neighbor with the smallest Q value as the next hop; the resulting path is [2, 36, 49, 23, 20], which is the lowest-delay route in this network, with a delay of 16.9647 ms.
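The path-extraction step just described (repeatedly moving to the neighbor with the smallest Q value) can be sketched as follows. The small Q table below is an illustrative assumption chosen to reproduce the reported route shape, not actual simulation data.

```python
# Sketch: after convergence, the source extracts the route by greedily
# following the minimum-Q neighbour at each hop until the destination.
def extract_path(q_tables, source, dest, max_hops=50):
    path, node = [source], source
    while node != dest and len(path) <= max_hops:   # max_hops guards cycles
        node = min(q_tables[node], key=q_tables[node].get)
        path.append(node)
    return path

q_tables = {  # per-node delay Q values (illustrative)
    2: {36: 5.1, 10: 9.7},
    36: {49: 3.2, 2: 20.0},
    49: {23: 2.8, 36: 15.0},
    23: {20: 1.9, 49: 12.0},
}
assert extract_path(q_tables, source=2, dest=20) == [2, 36, 49, 23, 20]
```

Because every node holds its own Q table, this extraction needs no central controller: each hop's decision is local, matching the distributed design of the method.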
Example 6.
As shown in fig. 7, paths from node 32 (source) to node 20 (destination) are found for traffic flows under different bit error rate and rate requirements using the method of the present invention; the simulation data gives the detailed QoS requirement settings and algorithm results:
QoS requirements of traffic flow 1: γ ≤ 0.15, ν ≥ 100 kbps; actual routing result: γ = 0.119, ν = 106.78 kbps, δ = 19.294 ms.
QoS requirements of traffic flow 2: γ ≤ 0.15, ν ≥ 120 kbps; actual routing result: γ = 0.137, ν = 169.497 kbps, δ = 19.74 ms.
QoS requirements of traffic flow 3: γ ≤ 0.25, ν ≥ 250 kbps; actual routing result: γ = 0.238, ν = 272.289 kbps, δ = 23.769 ms.
QoS requirements of traffic flow 4: γ ≤ 0.75, ν ≥ 450 kbps; actual routing result: γ = 1, ν = 0, δ = ∞.
QoS requirements of traffic flow 5: γ ≤ 0.05, ν ≥ 50 kbps; actual routing result: γ = 1, ν = 0, δ = ∞.
For traffic flows 1, 2 and 3, the present invention finds the route with the shortest delay under the given requirements; the specific paths are shown in fig. 7. Traffic flow 3 has tighter QoS constraints than flows 1 and 2, so its routing path is longer, has more hops, and its delay is noticeably larger. For traffic flows 4 and 5, no path satisfying the requirements exists in the network, so the algorithm returns an infinite delay: for flow 4 the network cannot meet the required communication rate, and for flow 5 it cannot meet the required error rate.
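The infeasibility reported for flows 4 and 5 follows directly from the path-level QoS aggregation used by the model: delay is the sum of link delays, loss is 1 − Π(1 − R), and rate is the bottleneck minimum. A minimal Python sketch, with hypothetical helper names:

```python
import math

def path_qos(links):
    """Aggregate per-link QoS along a path.
    Each link is a tuple (delay_ms, loss_rate, rate_kbps).
    Path delay is the sum of link delays, path loss is
    1 - prod(1 - loss), and path rate is the minimum link rate."""
    delay = sum(d for d, _, _ in links)
    loss = 1.0 - math.prod(1.0 - r for _, r, _ in links)
    rate = min(v for _, _, v in links)
    return delay, loss, rate

def feasible(links, gamma, nu):
    """Check a path against the QoS limits loss <= gamma and rate >= nu.
    When no feasible path exists, report delta = infinity, as the
    method does for flows 4 and 5."""
    delay, loss, rate = path_qos(links)
    if loss <= gamma and rate >= nu:
        return delay, loss, rate
    return math.inf, 1.0, 0.0
```

For a two-link path with per-link loss 0.05, for example, the path loss is 1 − 0.95² = 0.0975, which passes a γ ≤ 0.15 requirement but fails γ ≤ 0.05.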
Fig. 8 shows the case of partial node loss in the network, and fig. 9 the case of partial link failure. In the simulation the failures are set to occur at iteration 7000. As the figures show, when some nodes or links in the network become abnormal, the method of the present invention senses the change and adaptively recovers the route. Because each node stores Q value information for every action, if a node or link on the optimal route fails, the method selects an alternative path to forward the traffic, enhancing network survivability.
To evaluate QoS performance, the distributed Q learning method (QLRA) is compared with the DSDV routing algorithm. From source node s to destination node d, QLRA obtains a path with delay 13.6436 ms, bit error rate 0.1140 and rate 131.0551 kbps, while DSDV obtains delay 8.3547 ms, bit error rate 0.1274 and rate 57.2383 kbps. DSDV uses hop count as its path metric, so its routing information can satisfy only the single QoS requirement of hop count; when the number of QoS requirement types of a service grows, DSDV cannot satisfy them all. When the service requires a bit error rate ≤ 0.12 and a communication rate ≥ 50 kbps, DSDV cannot provide a route meeting the requirements and can only return its shortest-delay result.
Fig. 10 shows the probability of correct traffic delivery at different interaction frequencies in a dynamic network environment. The QLRA method senses changes in the network environment, so its delivery accuracy gradually increases with the learning rounds; DSDV can only store the most recently obtained local topology information, so its accuracy cannot be guaranteed, and it must exchange information and rebuild its routing table again every time the environment changes. Fig. 10(a) shows the delivery success rate at a high interaction frequency and fig. 10(b) at a low interaction frequency. The success rates of both algorithms fall, to different degrees, as the interaction frequency decreases. However, because QLRA updates its information in return for the long-term expectation of the environment state, it still learns the network's trend of change to some degree even when inter-node interaction becomes less frequent, and thus preserves a certain accuracy. The accuracy of DSDV drops more sharply, because it can only guarantee that the link information exchanged at a given moment is correct, so dynamic environment changes degrade it heavily.
The DSDV algorithm exchanges and acquires routing information by flooding, which produces a large number of invalid messages in the network and causes the broadcast storm problem. In addition, the routing table maintained by DSDV does not provide a standby route for a node to forward messages. When the network environment fluctuates, for example when some links or nodes in the network are damaged, DSDV cannot guarantee the service delivery rate, whereas the QLRA method of the present invention can select a backup path for forwarding according to the Q value information of the other nodes in its Q table.
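The backup-path behaviour attributed to QLRA above can be illustrated with a small sketch: because a node keeps one Q value per neighbor, it can fall back to the next-best surviving neighbor when the preferred next hop fails. The function and variable names here are illustrative, not from the patent:

```python
def next_hop_with_fallback(q_values, neighbors, failed):
    """Pick the neighbor with the smallest Q value; if the best
    next hop (or its link) has failed, fall back to the next-best
    surviving neighbor instead of dropping the packet."""
    alive = [n for n in neighbors if n not in failed]
    if not alive:
        return None  # no surviving neighbor: delivery fails
    return min(alive, key=lambda n: q_values[n])
```

A hop-count protocol that stores only a single next hop per destination has no such per-neighbor estimate to fall back on, which matches the survivability contrast drawn above.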
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A distributed intelligent routing method for unmanned aerial vehicle network slices, characterized by comprising the following steps:
s1: modeling an unmanned aerial vehicle network as a network model; the network model is a weighted undirected graph G(V, E, W), where V is the set of communication nodes in the network, E is the set of communication links in the network, and W represents the weights of the links in the network;
s2: setting constraints on the network model, wherein the constraints comprise: time delay limit, rate limit, packet loss rate limit;
s3: when the slice is oriented to low time delay, building the constrained network model into a multi-constraint optimization model; the multi-constraint optimization model is:
minimize Delay = E[Σ_{i,j=1…n} D(L_ij)]_t
subject to Error = E[1 − Π_{i,j=1…n} (1 − R(L_ij))]_t ≤ γ,
TransRate = E[min_{i,j=1…n} V(L_ij)]_t ≥ ν,
L_ij ∈ p = (L_si, …, L_ij, …, L_jd);
where E[θ]_t denotes the expectation of θ over the duration t of the traffic flow; Error denotes the packet loss rate of the traffic flow; TransRate denotes the transmission rate of the link; L_ij denotes a link on the path p whose start node is the source node s and whose end node is the destination node d; and minimize Delay expresses the optimization goal of minimizing the transmission delay of the path;
s4: solving the multi-constraint optimization model through a reinforcement learning model to obtain a solved value, wherein each communication node independently stores the link conditions of the communication node and the neighbor nodes and updates the link conditions in real time in the solving process;
s5: and carrying out dynamic routing according to the solved value and the link condition.
2. The distributed intelligent routing method for unmanned aerial vehicle network slices of claim 1, wherein the technologies used by the communication link include: TDMA, CSMA, or polling.
3. The distributed intelligent routing method for unmanned aerial vehicle network slices of claim 1, wherein the constraint is expressed as QoS = (δ, ν, γ), where δ denotes the time delay, ν denotes the rate and γ denotes the packet loss rate, and the constraints comprise:
Σ_{i,j=1…n} D(L_ij) ≤ δ,
min_{i,j=1…n} V(L_ij) ≥ ν,
1 − Π_{i,j=1…n} (1 − R(L_ij)) ≤ γ;
where L_ij represents the link from communication node i to communication node j; D(L_ij) represents the time delay of link L_ij; V(L_ij) represents the rate of link L_ij; and R(L_ij) represents the packet loss rate of link L_ij.
4. The distributed intelligent routing method for unmanned aerial vehicle network slice according to claim 1, wherein the communication node measures the quality of the link QoS between the communication node and the destination node by using a Q value, and the step S5 includes the following sub-steps:
s51: defining a communication node for sending a data packet as a data packet node;
s52: the data packet node sends a data packet to a neighbor node;
s53: judging a communication node of a next hop of a data packet according to the Q value of the neighbor node and the link QoS, and taking the communication node of the next hop as a data packet node;
s54: repeating the steps S52-S54 until the data packet node is the destination node.
5. The distributed intelligent routing method for unmanned aerial vehicle network slices of claim 4, wherein the packet node independently calculates a Q value from the packet node to a destination node.
6. The distributed intelligent routing method for unmanned aerial vehicle network slices of claim 1, wherein when the network fluctuates greatly, the communication node discards the currently stored link condition and performs the link condition calculation again.
7. The distributed intelligent routing method for unmanned aerial vehicle network slices of claim 1, wherein the reinforcement learning employs a value iteration method.
8. The distributed intelligent routing method for unmanned aerial vehicle network slices of claim 1, wherein the routing problem is modeled as a Markov Decision Process (MDP), the MDP model comprising: a state set, an action set, a probability transition matrix, a reward matrix and a discount factor, the discount factor being used to calculate the cumulative reward.
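As a rough illustration of the distributed Q-value maintenance described in claims 4, 5 and 8, the following Python sketch follows the classic Q-routing update, in which each node keeps one delay estimate per neighbor and refines it from the delay target reported back by that neighbor. The class name, parameters and exact update rule are assumptions and may differ from the patented QLRA method:

```python
class QLRANode:
    """Minimal sketch of one node in a distributed Q-routing scheme.
    self.q[n] estimates the delay to the destination when forwarding
    via neighbor n; smaller is better."""

    def __init__(self, neighbors, alpha=0.5, discount=1.0):
        self.q = {n: 0.0 for n in neighbors}  # optimistic initial estimates
        self.alpha = alpha        # learning rate (assumed value)
        self.discount = discount  # discount factor from the MDP model

    def best_next_hop(self):
        """Exploit: forward to the neighbor with the smallest Q value."""
        return min(self.q, key=self.q.get)

    def update(self, via, link_delay, reported_min_q):
        """After forwarding via `via`, that neighbor reports its own
        smallest Q toward the destination; move our estimate toward
        link_delay + discount * reported_min_q."""
        target = link_delay + self.discount * reported_min_q
        self.q[via] += self.alpha * (target - self.q[via])
```

Because each node updates only its own table from locally exchanged information, no global topology flooding is required, which is the contrast drawn with DSDV in the description above.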
CN201911395351.6A 2019-12-30 2019-12-30 Distributed intelligent routing method for unmanned aerial vehicle network slice Active CN111065105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911395351.6A CN111065105B (en) 2019-12-30 2019-12-30 Distributed intelligent routing method for unmanned aerial vehicle network slice


Publications (2)

Publication Number Publication Date
CN111065105A CN111065105A (en) 2020-04-24
CN111065105B true CN111065105B (en) 2021-06-11

Family

ID=70304598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911395351.6A Active CN111065105B (en) 2019-12-30 2019-12-30 Distributed intelligent routing method for unmanned aerial vehicle network slice

Country Status (1)

Country Link
CN (1) CN111065105B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112202848B (en) * 2020-09-15 2021-11-30 中国科学院计算技术研究所 Unmanned system network self-adaptive routing method and system based on deep reinforcement learning
CN112161630B (en) * 2020-10-12 2022-07-15 北京化工大学 AGV (automatic guided vehicle) online collision-free path planning method suitable for large-scale storage system
CN112383482B (en) * 2020-11-16 2021-10-08 北京邮电大学 Dynamic Q value route calculation method and device based on data plane
CN112672110B (en) * 2020-12-16 2023-05-26 深圳市国电科技通信有限公司 Unmanned aerial vehicle inspection real-time video transmission system based on network slicing
CN112822109B (en) * 2020-12-31 2023-04-07 上海缔安科技股份有限公司 SDN core network QoS route optimization method based on reinforcement learning
CN112887999B (en) * 2021-01-27 2022-04-01 重庆邮电大学 Intelligent access control and resource allocation method based on distributed A-C
CN113347102B (en) * 2021-05-20 2022-08-16 中国电子科技集团公司第七研究所 SDN link surviving method, storage medium and system based on Q-learning
CN115843037A (en) * 2021-08-17 2023-03-24 华为技术有限公司 Data processing method and device
CN114499648B (en) * 2022-03-10 2024-05-24 南京理工大学 Unmanned aerial vehicle cluster network intelligent multi-hop routing method based on multi-agent cooperation
CN114726434B (en) * 2022-03-18 2023-09-19 电子科技大学 Millisecond-level rapid path-finding method suitable for large-scale optical network
CN116208560B (en) * 2023-03-03 2024-04-30 济南大学 SDN data center network load balancing method and system for elephant flow

Citations (5)

Publication number Priority date Publication date Assignee Title
CN104168620A (en) * 2014-05-13 2014-11-26 北京邮电大学 Route establishing method in wireless multi-hop backhaul network
CN104581862A (en) * 2014-12-27 2015-04-29 中国人民解放军63655部队 Measurement and control communication method and system based on low-altitude unmanned aerial vehicle self-network
KR20150123120A (en) * 2014-04-24 2015-11-03 울산대학교 산학협력단 Apparatus for providing qos in wireless ad hoc networks
CN107517158A (en) * 2017-08-29 2017-12-26 北京航空航天大学 The design method of Communication Network for UAVS joint route agreement
CN108521375A (en) * 2018-04-17 2018-09-11 中国矿业大学 The transmission of the network multi-service flow QoS based on SDN a kind of and dispatching method

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US20080008202A1 (en) * 2002-10-31 2008-01-10 Terrell William C Router with routing processors and methods for virtualization
CN102393747B (en) * 2011-08-17 2015-07-29 清华大学 The collaborative interactive method of unmanned plane cluster
CN103246204B (en) * 2013-05-02 2016-01-20 天津大学 Multiple no-manned plane system emulation and verification method and device
US9565689B2 (en) * 2013-10-23 2017-02-07 Texas Instruments Incorporated Near-optimal QoS-based resource allocation for hybrid-medium communication networks
KR102339925B1 (en) * 2017-01-10 2021-12-16 한국전자통신연구원 Apparatus for controlling multi-drone networks and method for the same
CN108040353A (en) * 2017-12-18 2018-05-15 北京工业大学 A kind of unmanned plane swarm intelligence Geographic routing method of Q study


Non-Patent Citations (3)

Title
A QoS routing algorithm for group communications in Cognitive Radio ad hoc networks;Liming Xie; Jingjing Xi;《2011 International Conference on Mechatronic Science, Electric Engineering and Computer (MEC)》;20110923;1953-1956 *
Design and Implementation of a QoS Adaptive Routing System Based on Overlay Networks; Lü Baoping; China Master's Theses Full-text Database, Information Science and Technology (monthly); 20110515; full text *
Research on Routing Protocols and QoS Routing Algorithms in Tactical MANETs; Du Qingsong; China Doctoral Dissertations Full-text Database, Information Science and Technology (monthly); 20170215; full text *

Also Published As

Publication number Publication date
CN111065105A (en) 2020-04-24

Similar Documents

Publication Publication Date Title
CN111065105B (en) Distributed intelligent routing method for unmanned aerial vehicle network slice
Jung et al. QGeo: Q-learning-based geographic ad hoc routing protocol for unmanned robotic networks
Zheng et al. A mobility and load aware OLSR routing protocol for UAV mobile ad-hoc networks
CN111416771B (en) Method for controlling routing action based on multi-agent reinforcement learning routing strategy
De Rango et al. Link-stability and energy aware routing protocol in distributed wireless networks
Magaia et al. A multi-objective routing algorithm for wireless multimedia sensor networks
CN109962773B (en) Wide-area quantum cryptography network data encryption routing method
CN108684063B (en) On-demand routing protocol improvement method based on network topology change
Hendriks et al. Q 2-routing: A QoS-aware Q-routing algorithm for wireless ad hoc networks
CN111130853B (en) Future route prediction method of software defined vehicle network based on time information
KR101506586B1 (en) Method for routing and load balancing in communication networks
Harshavardhana et al. Power control and cross-layer design of RPL objective function for low power and lossy networks
CN107094112A (en) Bandwidth constraint multicast routing optimization method based on drosophila optimized algorithm
CN105848238A (en) IPv6 routing method of wireless sensor networks based on multiple parameters
Arkian et al. FcVcA: A fuzzy clustering-based vehicular cloud architecture
Long et al. Research on applying hierachical clustered based routing technique using artificial intelligence algorithms for quality of service of service based routing
CN113727408A (en) Unmanned aerial vehicle ad hoc network improved AODV routing method based on speed and energy perception
Gallardo et al. Multipath routing using generalized load sharing for wireless sensor networks
Bokhari et al. AMIRA: interference-aware routing using ant colony optimization in wireless mesh networks
Wang et al. A reliability-aware adaptive greedy-multicast routing protocol for 3D highly dynamic networks
Dong et al. Topology control mechanism based on link available probability in aeronautical ad hoc network
Ziane et al. Inductive routing based on dynamic end-to-end delay for mobile networks
Garai et al. A novel architecture for qos provision on vanet
Onwuegbuzie et al. Shortest Path Priority-based RPL (SPPB-RPL): The Case of a Smart Campus
KR101639149B1 (en) A method of information transmission using location information including measurement errors in wireless mobile networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant