CN113242556B - Unmanned aerial vehicle resource dynamic deployment method based on differentiated services

Info

Publication number: CN113242556B
Application number: CN202110625142.7A
Authority: CN
Inventors: 王小洁; 宁兆龙; 郭磊; 高新波
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2021-06-04
Filing date: 2021-06-04
Publication date: 2022-08-23
Anticipated expiration: 2041-06-04
Also published as: CN113242556A

Abstract

The invention discloses a method for dynamically deploying unmanned aerial vehicle resources based on differentiated services, which enables efficient differentiated services in a wireless mobile network environment. The invention first establishes a Markov game model based on the state information of the unmanned aerial vehicles and the ground users, and then derives, under complete information, the Nash equilibrium condition for the service resources provided by the unmanned aerial vehicle owners, at which the users also achieve optimal utility. Under incomplete information, imitation learning is used so that each unmanned aerial vehicle owner can decide effectively how many resources to provide in each time slot. In addition, the invention designs a novel neural network model for strategy training that combines a convolutional neural network, a generative adversarial network and a gradient descent strategy. Theoretical analysis shows that the unmanned aerial vehicle service deployment decision provided by the invention is an asymptotically optimal solution. The invention thus provides a new method for unmanned aerial vehicle resource allocation based on differentiated services.

Description

Unmanned aerial vehicle resource dynamic deployment method based on differentiated services

Technical Field

The invention relates to the dynamic deployment of the available resources of unmanned aerial vehicles according to the differentiated service demands of ground users, and particularly relates to a method for dynamically deploying unmanned aerial vehicle resources based on imitation learning.

Background

Unmanned aerial vehicles are flexible and mobile, and are widely used in wireless edge networks to provide users with services such as data collection, network access and content caching. The key problem in implementing services based on unmanned aerial vehicles is how to deploy unmanned aerial vehicle resources efficiently to meet user demands. However, existing unmanned aerial vehicle deployment schemes focus on only a single type of service and do not consider scenarios in which multiple differentiated services coexist. In a typical application scenario, different network operators provide heterogeneous network services, such as 4G and 5G networks, to ground users via drones. At a live basketball game, for example, spectators can purchase different kinds of value-added network services from network operators according to their own network demands and purchasing power. How to satisfy the interests of both the users and the drone owners through optimal configuration of drone resources therefore remains to be explored. Addressing these shortcomings of existing research, the invention aims to provide a dynamic unmanned aerial vehicle service deployment method based on differentiated services. The method uses imitation learning to realize online resource deployment by unmanned aerial vehicle owners under incomplete information, optimizes the utility of users and unmanned aerial vehicle owners simultaneously, and provides a new method for unmanned aerial vehicle resource deployment based on differentiated services.

Disclosure of Invention

The present invention is directed to solving the above problems of the prior art. An unmanned aerial vehicle resource dynamic deployment method based on differentiated services is provided. The technical scheme of the invention is as follows:

a method for dynamically deploying unmanned aerial vehicle resources based on differentiated services comprises the following steps:

1) constructing a dynamic demand model, and determining the utility of a user and the owner of the unmanned aerial vehicle;

2) constructing a Markov game model, and converting the profit maximization problem in the step 1) into a Markov optimization problem;

3) under complete information, constructing an expert strategy whose performance reaches the offline optimum;

4) under local information, constructing an online learning strategy for the agents based on the offline expert strategy set obtained in step 3).

Further, step 1) constructs a dynamic demand model, determines the utility of the user and the owner of the unmanned aerial vehicle, and specifically includes:

the dynamic demand model comprises H hot spot areas and K unmanned aerial vehicle owners, and in each time slot t, a user i has probability

of generating a service request, which is defined as

where $d_{hi}(t)$ represents the required service capability and $\iota_{hik}(t)\in[0,1]$ represents the degree of preference of user i in hotspot area h for service k;

the budget of user i in hotspot area h for purchasing services is denoted $e_{hi}$, and the total number of users in hotspot area h is denoted $m_h(t)$; the total user budget of hotspot area h is then

The aggregated preference of the user for service k is:

the total demand of the hotspot area h for service k within the time slot t is expressed as:

then the aggregate user utility in hotspot region h can be calculated by the following equation:

where $0 < \alpha < 1$ indicates the degree of substitution between different services, and the variable $q_{hk}(t)$ is the total amount of service that the drone owner can provide for hotspot area h within time slot t; in caching applications, for example, $q_{hk}(t)$ represents the available transmission rate; the total revenue of the system users is calculated by the following formula:

further, the service overhead of the drone owner includes two parts: maintenance cost and energy cost, where the unit maintenance cost is denoted $g_0$, the energy cost per unit power is denoted $g_s$, and the energy cost per unit of service is denoted $g_c$; the energy consumption cost of drone owner k in time slot t is calculated by the following formula:

the expression

indicates the number of drones required, where $b_k$ represents the service capacity of a single drone; the profit of drone owner k in time slot t is calculated by the following formula:

$\Gamma_{hk}(t) = p_k(t)\,q_{hk}(t) - c_{hk}(t),$

where $p_k(t)$ is the price of service k in time slot t;
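The owner-side accounting above can be illustrated with a minimal Python sketch. The cost form used here, $(g_0 + g_s)$ per deployed drone plus $g_c$ per unit of service, is an assumption inferred from the coefficient $A_k = (g_0 + g_s + g_c b_k)/b_k$ that appears later in the expert strategy; treating the number of drones as the continuous ratio $q/b$ (no rounding up) is likewise an assumption. All function and parameter names are hypothetical.

```python
def drones_required(q: float, b: float) -> float:
    """Number of drones needed to deliver service amount q with per-drone capacity b.

    Assumption: continuous relaxation q / b, without rounding up.
    """
    if b <= 0:
        raise ValueError("drone capacity b must be positive")
    return q / b


def owner_cost(q: float, b: float, g0: float, gs: float, gc: float) -> float:
    """Assumed cost c_hk(t): per-drone maintenance and power cost plus per-unit service energy."""
    return (g0 + gs) * drones_required(q, b) + gc * q


def owner_profit(p: float, q: float, b: float, g0: float, gs: float, gc: float) -> float:
    """Profit Gamma_hk(t) = p_k(t) * q_hk(t) - c_hk(t), as stated in the text."""
    return p * q - owner_cost(q, b, g0, gs, gc)
```

Under these assumptions the cost is linear in the service quantity, with slope equal to $A_k$, which is why the sketch agrees with the coefficient used in the expert strategy.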

based on the definition of the user aggregate utility and the unmanned aerial vehicle owner income, the first optimization target is to maximize the total utility of the user, and the problem is described as follows:

P1

s.t.

the constraint condition ensures that the total user overhead of the hot spot area h in the time slot t does not exceed the total budget;
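As a minimal sketch of this budget constraint, the following Python snippet checks whether a set of prices and service quantities is affordable for one hotspot area in one time slot. It assumes that the total user overhead is the sum over services of $p_k(t)\,q_{hk}(t)$, which matches the revenue term in the owner profit, and that the total budget is the sum of the individual budgets $e_{hi}$; the function names are hypothetical.

```python
from typing import Sequence


def total_budget(user_budgets: Sequence[float]) -> float:
    """Total budget of a hotspot area: sum of the individual user budgets e_hi."""
    return sum(user_budgets)


def within_budget(prices: Sequence[float],
                  quantities: Sequence[float],
                  user_budgets: Sequence[float]) -> bool:
    """Check the assumed constraint sum_k p_k(t) * q_hk(t) <= total budget."""
    if len(prices) != len(quantities):
        raise ValueError("prices and quantities must have one entry per service")
    spend = sum(p * q for p, q in zip(prices, quantities))
    return spend <= total_budget(user_budgets)
```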

the second goal is to maximize the long-term revenue of the drone owners, and the problem is described as follows:

P2:

further, the step 2) of constructing a markov game model, and converting the profit maximization problem in the step 1) into a markov optimization problem specifically includes:

the drone owner revenue maximization problem defined in step 1 is converted into a Markov game, which can be represented by the tuple $\langle K, S, O, A, P, R, \gamma \rangle$, whose elements have the following meanings:

State S: the state information of the established Markov game model, expressed as

where $S_1$ represents the state of the users, including the service demands, service preferences and budgets generated by the users; $S_2$ represents the state of the drone owners, including unit cost, service capacity and service substitutability; $S_3$ represents the state of the provided services, including the quantity and price of the services provided in the past;

Observation O: a drone owner in the system cannot observe the full system state S and can only observe partial information, expressed as

where

is the observed state of drone owner k, including the users' budgets, the drone owner's unit cost, service capacity and service substitutability, and the quantity and price of the services provided in the past;

Action A: the action set of the drone owners is expressed as

where $\Delta q_{hk}(t)$ is the additional quantity of service to be provided relative to the previous time slot.
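The incremental action can be sketched in Python as follows: the service quantity offered in the current time slot is the previous quantity plus the chosen increment, one entry per hotspot area. Clipping the result at zero is an assumption made here so that a large negative increment cannot produce a negative quantity; the function name is hypothetical.

```python
from typing import List, Sequence


def apply_action(prev_q: Sequence[float], delta_q: Sequence[float]) -> List[float]:
    """Update q_hk(t) = q_hk(t-1) + delta_q_hk(t) for every hotspot area h.

    Assumption: quantities are clipped at zero.
    """
    if len(prev_q) != len(delta_q):
        raise ValueError("one increment is required per hotspot area")
    return [max(0.0, q + dq) for q, dq in zip(prev_q, delta_q)]
```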

State transition probability P: expressed as $P: S \times A \times S \to [0, 1]$; given action $a^t$, the system state moves from $s^t$ to $s^{t+1}$ with probability $P(s^{t+1} \mid s^t, a^t)$;

Reward function R: expressed as

$S \times A \to \mathbb{R}$, representing the instantaneous reward obtained by agent k after performing its action within time slot t; the instantaneous reward is calculated by the following formula:

the objective of the drone owner is thereby converted into maximizing the accumulated instantaneous reward
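A minimal Python sketch of an accumulated reward is given below. It assumes that the accumulation is the $\gamma$-discounted sum of the per-slot instantaneous rewards, which is the usual reading of the discount factor $\gamma$ in the game tuple; the exact objective of the invention is not reproduced here, and the function name is hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * rewards[t], computed backwards for numerical simplicity."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```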

Further, the step 3: in a complete information state, constructing an expert strategy to make the performance reach offline optimum, specifically comprising:

in the complete information state, the optimization problems P1 and P2 are converted to obtain the relationship between the service quantity and the price:

the optimization problems P1 and P2 are converted into optimization problems whose only unknown variable is $q_{hk}(t)$, and the optimal solutions of P1 and P2 are verified to be consistent; the expert strategy is obtained by the following steps:

1) the K experts obtain the optimal service quantity $q_{hk}(t)$ by solving the following equation according to the current system state:

where $A_k = (g_0 + g_s + g_c b_k)/b_k$, $b_k$ is the service resource capacity of a single drone, and $q_{h,-k}$ is the quantity of services other than service k provided in hotspot area h; $Q_k = f_{hk}(t)\,[q_{hk}(t)]^{\alpha}$, and

2) the actions, system states, observable states and rewards of the K experts in each time slot are recorded to form a data set.
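The recording step can be sketched in Python as follows. Because the equation that yields the optimal service quantity is not reproduced here, each expert strategy is passed in as a hypothetical callable, as are the observation and reward functions; the sequence of system states is also supplied by the caller rather than simulated. All names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    slot: int                 # time slot t
    expert: int               # expert index k
    state: State              # full system state seen by the expert
    observation: State        # partial state an agent would observe
    action: float             # action executed by the expert
    reward: float             # instantaneous reward obtained


def record_expert_dataset(
    states: Sequence[State],
    expert_policies: Sequence[Callable[[State], float]],
    observe: Callable[[int, State], State],
    reward: Callable[[int, State, float], float],
) -> List[Transition]:
    """Record (state, observation, action, reward) for every expert in every time slot."""
    dataset: List[Transition] = []
    for t, s in enumerate(states):
        for k, policy in enumerate(expert_policies):
            a = policy(s)
            dataset.append(Transition(t, k, tuple(s), tuple(observe(k, s)), a, reward(k, s, a)))
    return dataset
```

Keeping both the full state and the partial observation in each record lets an agent that sees only the observation be trained against decisions that were made with complete information.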

Further, the step 4: in a local information state, constructing an intelligent agent online learning strategy based on the offline expert strategy set obtained in the step 3), specifically comprising:

first, under partial observation, each agent needs to predict the strategies of its opponents; based on occupancy measure matching, the relationship between the strategy $\pi_k$ of agent k and the opponent strategy $\pi_{-k}$ can be established, expressed as:

where o represents the observed state. By using a generative adversarial network to train the agent strategy, the optimization problem can be converted into the following form:

P3

where

denotes the expectation under the agent strategies $\pi_k$ and $\pi_{-k}$, and $D_k$ denotes the output of the generative adversarial network. Only the saddle point $(\pi_k, D_k)$ needs to be found to solve the problem;

second, the agent strategy model is trained to solve for the saddle point $(\pi_k, D_k)$.
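One half of such a saddle-point search can be sketched in pure Python. The sketch below trains only a discriminator and omits the strategy update. It assumes a logistic (linear) discriminator in place of the neural network of the invention, and it assumes the common adversarial imitation convention in which $D_k$ is the probability that a feature vector of observation and action came from the agent, so that the discriminator maximizes $E_{agent}[\log D_k] + E_{expert}[\log(1 - D_k)]$. The form of problem P3 is not reproduced here, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def discriminator(w: Sequence[float], x: Sequence[float]) -> float:
    """D_k(x): assumed probability that features x were produced by the agent."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def objective(w: Sequence[float], agent, expert) -> float:
    """Assumed objective: E_agent[log D] + E_expert[log(1 - D)]."""
    eps = 1e-12  # guards against log(0)
    agent_term = sum(math.log(discriminator(w, x) + eps) for x in agent) / len(agent)
    expert_term = sum(math.log(1.0 - discriminator(w, x) + eps) for x in expert) / len(expert)
    return agent_term + expert_term


def train_discriminator(agent, expert, steps: int = 200, lr: float = 0.5) -> List[float]:
    """Gradient ascent on the objective with respect to the discriminator weights."""
    w = [0.0] * (len(agent[0]) + 1)  # bias followed by one weight per feature
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x in agent:
            c = (1.0 - discriminator(w, x)) / len(agent)  # d/dz log(sigmoid(z))
            grad[0] += c
            for j, xj in enumerate(x):
                grad[j + 1] += c * xj
        for x in expert:
            c = -discriminator(w, x) / len(expert)  # d/dz log(1 - sigmoid(z))
            grad[0] += c
            for j, xj in enumerate(x):
                grad[j + 1] += c * xj
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w
```

In a full training loop, the agent strategy would then be updated to make its samples harder to distinguish from the expert samples, and the two updates would alternate until neither side can improve.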

Further, in order to meet user requirements, the unmanned aerial vehicles managed by the same unmanned aerial vehicle owner form a mesh network that circles above hotspot position h; the nodes in the mesh network can communicate with each other and perform load balancing adaptively, whereas unmanned aerial vehicles managed by different owners do not communicate with each other. A user only needs to upload its service requirements to the nearest drone of its preferred type.

The invention has the following advantages and beneficial effects:

the invention constructs a dynamic unmanned aerial vehicle resource deployment framework for realizing differentiated service scheduling based on unmanned aerial vehicles in a wireless mobile edge network. To maximize user utility and the long-term revenue of the various unmanned aerial vehicle owners, the method first establishes a Markov game model based on the state information of the unmanned aerial vehicles and the ground users, and then theoretically derives, under complete information, the Nash equilibrium condition for the service resources provided by the unmanned aerial vehicle owners, at which the users also achieve optimal utility. Under incomplete information, imitation learning is used so that each unmanned aerial vehicle owner can decide effectively how many resources to provide in each time slot. The method is the first to combine imitation learning with differentiated service resource scheduling based on unmanned aerial vehicles; compared with traditional scheduling algorithms, it is better suited to online scheduling and does not depend on centralized control. Compared with model-free machine learning schemes, the method has better convergence and performance. The experimental results demonstrate the efficiency of the method in terms of user utility, unmanned aerial vehicle owner revenue and fairness. The invention provides a novel deployment method for differentiated services in edge networks assisted by unmanned aerial vehicles.

Drawings

FIG. 1 is a diagram of the dynamic demand model of a preferred embodiment provided by the present invention.

FIG. 2 is a schematic diagram of algorithm training based on imitation learning.

FIGS. 3 and 4 compare the performance of the proposed MILU algorithm with three other algorithms in terms of average user utility and the revenue obtained by the drone owners.

FIGS. 5 and 6 compare the performance of the proposed MILU algorithm with three other algorithms in terms of the fairness of the drone owners' revenue.

Detailed Description

The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.

The technical scheme for solving the technical problems is as follows:

FIG. 1 shows the dynamic demand model of a preferred embodiment provided by the present invention, in which the drones managed by each drone owner in a hotspot area cooperate to provide services to ground users, and the service request data changes dynamically as users move.

FIG. 2 is a schematic diagram of algorithm training based on imitation learning. A plurality of experts and a plurality of agents interact with the environment; the experts obtain complete observation information of the system, whereas the agents obtain only partially observable information. The experts generate a library of state dynamics through offline learning, and the agents are trained online on this library using strategy, value and discriminator modules.

FIGS. 3 and 4 compare the performance of the proposed MILU algorithm with three other algorithms in terms of average user utility and the revenue obtained by the drone owners. The experimental results show that imitation learning benefits online scheduling when only local information is known, and that the method obtains higher system user utility and drone owner revenue than the comparison algorithms.

FIGS. 5 and 6 compare the performance of the proposed MILU algorithm with three other algorithms in terms of the fairness of the drone owners' revenue. The experimental results show that the method achieves better fairness in the revenue obtained by the drone owners and has a smaller performance gap to the expert strategy.

The embodiment of the invention provides an unmanned aerial vehicle resource deployment method based on differentiated services, which comprises the following steps:

Step 1: construct a dynamic demand model, and determine the utility of the users and the unmanned aerial vehicle owners.

The invention constructs a dynamic demand model comprising H hotspot areas and K unmanned aerial vehicle owners. In each time slot t, user i has a probability

of generating a service request, which is defined as

where $d_{hi}(t)$ represents the required service capability and $\iota_{hik}(t)\in[0,1]$ indicates the degree of preference of user i in hotspot area h for service k. To meet user requirements, the unmanned aerial vehicles managed by the same owner form a mesh network that circles above hotspot position h. All nodes in the mesh network can communicate with each other and perform load balancing adaptively, whereas unmanned aerial vehicles managed by different owners do not communicate with each other. A user only needs to upload its service requirements to the nearest drone of its preferred type.
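The upload rule described above can be sketched in Python: among the drones offering the user's preferred service type, the user selects the one at the smallest Euclidean distance. The `Drone` record, the use of three-dimensional coordinates and the function name are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

Position = Tuple[float, float, float]


@dataclass(frozen=True)
class Drone:
    drone_id: str
    service_type: int      # index k of the service (and owner) the drone belongs to
    position: Position     # (x, y, altitude)


def nearest_preferred_drone(user_position: Position,
                            preferred_type: int,
                            drones: Iterable[Drone]) -> Optional[Drone]:
    """Return the closest drone of the preferred service type, or None if there is none."""
    candidates = [d for d in drones if d.service_type == preferred_type]
    if not candidates:
        return None
    return min(candidates, key=lambda d: math.dist(user_position, d.position))
```

Filtering by service type before comparing distances reflects the statement that drones of different owners do not communicate with each other, so a closer drone of another type cannot relay the request.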

The budget of user i in hotspot area h for purchasing services is denoted $e_{hi}$, and the total number of users in hotspot area h is denoted $m_h(t)$; the total user budget of hotspot area h is then

The aggregated preference of the user for service k is:

the total demand of the hotspot area h for service k in time slot t can be expressed as:

The aggregate user utility in hotspot area h can then be calculated by the following equation:

where $0 < \alpha < 1$ indicates the degree of substitution between different services. The variable $q_{hk}(t)$ is the total amount of service that the drone owner can provide for hotspot area h within time slot t; in caching applications, for example, $q_{hk}(t)$ represents the available transmission rate. The total revenue of the system users can then be calculated using the following formula:

The service overhead of the drone owner consists of two parts: maintenance cost and energy cost, where the unit maintenance cost is denoted $g_0$, the energy cost per unit power is denoted $g_s$, and the energy cost per unit of service is denoted $g_c$. The energy consumption cost of drone owner k in time slot t can be calculated by the following formula:

The expression

indicates the number of drones required, where $b_k$ represents the service capacity of a single drone. The profit of drone owner k in time slot t can be calculated by the following formula:

$\Gamma_{hk}(t) = p_k(t)\,q_{hk}(t) - c_{hk}(t),$

where $p_k(t)$ is the price of service k in time slot t.

Based on the definition of the user aggregate utility and the unmanned aerial vehicle owner income, the first optimization goal of the method is to maximize the total utility of the user, and the problems are described as follows:

P1

s.t.

the constraint conditions ensure that the total user overhead of the hotspot area h in the time slot t does not exceed the total budget.

Second, the goal is to maximize the long-term revenue of the owner of the drone, and the problem is described as follows:

P2：

Step 2: construct a Markov game model, and convert the profit maximization problem of step 1 into a Markov optimization problem.

The drone owner revenue maximization problem defined in step 1 can be converted into a Markov game, which can be represented by the tuple $\langle K, S, O, A, P, R, \gamma \rangle$, whose elements have the following meanings:

State S: the state information of the established Markov game model, expressed as

where $S_1$ represents the state of the users, including the service demands, service preferences and budgets generated by the users; $S_2$ represents the state of the drone owners, including unit cost, service capacity and service substitutability; $S_3$ represents the state of the provided services, including the quantity and price of the services provided in the past.

Observation O: a drone owner in the system cannot observe the full system state S and can only observe partial information, expressed as

where

is the observed state of drone owner k, including the users' budgets, the drone owner's unit cost, service capacity and service substitutability, and the quantity and price of the services provided in the past.

Action A: the action set of the drone owners is expressed as

State transition probability P: expressed as $P: S \times A \times S \to [0, 1]$; given action $a^t$, the system state moves from $s^t$ to $s^{t+1}$ with probability $P(s^{t+1} \mid s^t, a^t)$.

Reward function R: expressed as

$S \times A \to \mathbb{R}$, representing the instantaneous reward obtained by agent k after performing its action within time slot t. The instantaneous reward in the present system can be calculated by the following formula:

Step 3: under complete information, construct an expert strategy whose performance reaches the offline optimum.

Under complete information, the optimization problems P1 and P2 can be transformed to obtain the relationship between service quantity and price:

The optimization problems P1 and P2 can then be transformed into problems whose only unknown variable is $q_{hk}(t)$, and the optimal solutions of P1 and P2 can be verified to be consistent. The expert strategy can be obtained by the following steps:

1) the K experts calculate the optimal service quantity according to the current system state.

Step 4: under local information, construct an online learning strategy for the agents based on the offline expert strategy set obtained in step 3.

First, under partial observation, each agent needs to predict the strategies of its opponents. Based on occupancy measure matching, the relationship between the strategy $\pi_k$ of agent k and the opponent strategy $\pi_{-k}$ can be established, expressed as:

The invention uses a generative adversarial network to train the agent strategy, and the optimization problem can be converted into the following form:

P3

Only the saddle point $(\pi_k, D_k)$ needs to be found to solve the problem.

Second, the agent strategy model is trained to solve for the saddle point $(\pi_k, D_k)$; the pseudo code of the algorithm is shown in Table 1.

Table 1. Pseudo code for agent strategy model training

The pseudo code of the designed online algorithm MILU is shown in Table 2.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the present invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims

1. A method for dynamically deploying unmanned aerial vehicle resources based on differentiated services is characterized by comprising the following steps:

4) under the state of local information, constructing an intelligent agent online learning strategy based on the offline expert strategy set obtained in the step 3);

the step 1) of constructing a dynamic demand model, determining the utility of a user and an unmanned aerial vehicle owner, and specifically comprises the following steps:

a service request is generated, which is defined as

where $d_{hi}(t)$ represents the required service capability and $\iota_{hik}(t)\in[0,1]$ represents the degree of preference of user i in hotspot area h for service k;

the budget of user i in hotspot area h for purchasing services is denoted $e_{hi}$, and the total number of users in hotspot area h is denoted $m_h(t)$; the total user budget of hotspot area h is then

The aggregated preference of the user for service k is:

where $0 < \alpha < 1$ denotes the degree of substitution between different services, and the variable $q_{hk}(t)$ is the total amount of service that the drone owner can provide for hotspot area h within time slot t; in caching applications $q_{hk}(t)$ represents the available transmission rate; the total revenue of the system users is calculated by the following formula:

the service overhead of the drone owner consists of two parts: maintenance cost and energy cost, where the unit maintenance cost is denoted $g_0$, the energy cost per unit power is denoted $g_s$, and the energy cost per unit of service is denoted $g_c$; the energy consumption cost of drone owner k in time slot t is calculated by the following formula:

the expression

$\Gamma_{hk}(t) = p_k(t)\,q_{hk}(t) - c_{hk}(t),$

where $p_k(t)$ is the price of service k within time slot t;

based on the definition of the user aggregate utility and the unmanned aerial vehicle owner income, the first optimization goal is to maximize the user total utility, and the problem is described as follows:

P1：

P2：

the step 2) of constructing a Markov game model, and converting the profit maximization problem in the step 1) into a Markov optimization problem specifically comprises the following steps:

where $S_1$ represents the state of the users, including the service demands, service preferences and budgets generated by the users; $S_2$ represents the state of the drone owners, including unit cost, service capacity and service substitutability; $S_3$ represents the state of the provided services, including the quantity and price of the services provided in the past;

Wherein

Action A: the action set of the drone owners is expressed as

where $\Delta q_{hk}(t)$ is the additional quantity of service to be provided relative to the previous time slot;

Reward function R: expressed as

$S \times A \to \mathbb{R}$, representing the instantaneous reward obtained by agent k after performing its action within time slot t;

the objective of the drone owner is thereby converted into maximizing the accumulated instantaneous reward

The step 3: in a complete information state, constructing an expert strategy to make the performance reach offline optimization, specifically comprising:

the optimization problems P1 and P2 are converted into optimization problems whose only unknown variable is $q_{hk}(t)$, and the optimal solutions of P1 and P2 are verified to be consistent; the expert strategy is obtained by:

1) the K experts obtain the optimal service quantity $q_{hk}(t)$ according to the current system state by solving the following equation:

where $A_k = (g_0 + g_s + g_c b_k)/b_k$, $b_k$ is the service resource capacity of a single drone, and $q_{h,-k}$ is the quantity of services other than service k provided in hotspot area h; $Q_k = f_{hk}(t)\,[q_{hk}(t)]^{\alpha}$, and

2) the actions, system states, observable states and rewards of the K experts in each time slot are recorded to form a data set;

the step 4: in a local information state, constructing an intelligent agent online learning strategy based on the offline expert strategy set obtained in the step 3), specifically comprising:

first, under partial observation, each agent needs to predict the strategies of its opponents; based on occupancy measure matching, the relationship between the strategy $\pi_k$ of agent k and the opponent strategy $\pi_{-k}$ can be established, expressed as:

where o represents the observed state; by using a generative adversarial network to train the agent strategy, the optimization problem can be converted into the following form:

P3:

where

denotes the expectation under the agent strategies $\pi_k$ and $\pi_{-k}$, and $D_k$ denotes the output of the generative adversarial network; only the saddle point $(\pi_k, D_k)$ needs to be found to solve the problem;

2. The method for dynamically deploying unmanned aerial vehicle resources based on differentiated services according to claim 1, wherein, in order to meet user requirements, the unmanned aerial vehicles managed by the same unmanned aerial vehicle owner form a mesh network that circles above hotspot position h, the nodes in the mesh network can communicate with each other and perform load balancing adaptively, and unmanned aerial vehicles managed by different unmanned aerial vehicle owners do not communicate with each other; a user only needs to upload its service requirements to the nearest drone of its preferred type.