CN113242556B - Unmanned aerial vehicle resource dynamic deployment method based on differentiated services - Google Patents

Unmanned aerial vehicle resource dynamic deployment method based on differentiated services Download PDF

Info

Publication number
CN113242556B
CN113242556B CN202110625142.7A CN202110625142A CN113242556B CN 113242556 B CN113242556 B CN 113242556B CN 202110625142 A CN202110625142 A CN 202110625142A CN 113242556 B CN113242556 B CN 113242556B
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
service
owner
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110625142.7A
Other languages
Chinese (zh)
Other versions
CN113242556A (en
Inventor
王小洁
宁兆龙
郭磊
高新波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202110625142.7A priority Critical patent/CN113242556B/en
Publication of CN113242556A publication Critical patent/CN113242556A/en
Application granted granted Critical
Publication of CN113242556B publication Critical patent/CN113242556B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/02Resource partitioning among network components, e.g. reuse partitioning
    • H04W16/10Dynamic resource partitioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/142Network analysis or design using statistical or mathematical methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/145Network analysis or design involving simulating, designing, planning or modelling of a network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/18Network planning tools
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W16/00Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/22Traffic simulation tools or models
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/50Allocation or scheduling criteria for wireless resources
    • H04W72/53Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses an unmanned aerial vehicle dynamic deployment method based on differentiated services, which is used for realizing efficient differentiated services in a wireless mobile network environment. The invention firstly establishes a Markov game model based on the state information of the unmanned aerial vehicle and the ground users, and then deduces the Nash equilibrium condition of the service resources provided by the owner of the unmanned aerial vehicle under the condition of complete information, and the users can achieve the optimal utility. And in the case of incomplete information, effective decision of the unmanned aerial vehicle owner on the provided resources of each time slot is realized by means of simulation learning. In addition, the invention designs a novel neural network model for strategy training, and combines a convolutional neural network, a generation countermeasure network and a gradient descent strategy. Theoretical analysis shows that the unmanned aerial vehicle service deployment decision provided by the invention is a progressive optimal solution. The invention provides a new method for unmanned aerial vehicle resource allocation based on differentiated services.

Description

Unmanned aerial vehicle resource dynamic deployment method based on differentiated services
Technical Field
The invention belongs to a method for dynamically deploying available resources of an unmanned aerial vehicle based on differentiated service demands of ground users, and particularly relates to a method for dynamically deploying resources of the unmanned aerial vehicle based on imitation learning.
Background
The unmanned aerial vehicle has the characteristics of flexibility, mobility and the like, and is widely applied to a wireless edge network to provide services for users, including data collection, network access, content caching and the like. The key problem in implementing services based on unmanned aerial vehicles is how to efficiently deploy unmanned aerial vehicle resources to meet user demands. However, the existing unmanned aerial vehicle deployment scheme only focuses on a single type of service, and does not consider a scenario in which multiple differentiated services coexist. A typical application scenario is that different network operators may provide heterogeneous network services, such as 4G and 5G networks, to ground users via drones. In a live basketball live game, spectators can purchase different kinds of network value-added services from network operators according to their own demands and purchasing power on the network. Therefore, how to satisfy both the interests of the user and the owner of the drone through optimal configuration of the drone resources is awaited further exploration by researchers. The invention aims to provide a dynamic unmanned aerial vehicle service deployment method based on differentiated services, mainly aiming at the defects of the existing research, and the method realizes the unmanned aerial vehicle owner online resource deployment scheme under incomplete information by using simulation learning, optimizes the utility of users and unmanned aerial vehicle owners simultaneously, and provides a new method for unmanned aerial vehicle resource deployment based on differentiated services.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art. An unmanned aerial vehicle resource dynamic deployment method based on differentiated services is provided. The technical scheme of the invention is as follows:
a method for dynamically deploying unmanned aerial vehicle resources based on differentiated services comprises the following steps:
1) constructing a dynamic demand model, and determining the utility of a user and the owner of the unmanned aerial vehicle;
2) constructing a Markov game model, and converting the profit maximization problem in the step 1) into a Markov optimization problem;
3) in the complete information state, constructing an expert strategy to enable the performance to be optimal off line;
4) and in a local information state, constructing an intelligent agent online learning strategy based on the offline expert strategy set obtained in the step 3).
Further, step 1) constructs a dynamic demand model, determines the utility of the user and the owner of the unmanned aerial vehicle, and specifically includes:
the dynamic demand model comprises H hot spot areas and K unmanned aerial vehicle owners, and in each time slot t, a user i has probability
Figure BDA0003101885210000029
Generating service requests hi (t) and is defined as
Figure BDA0003101885210000021
Wherein d is hi (t) represents the required service capability, iota hik (t)∈[0,1]Representing the preference degree of the user i in the hotspot area h for the service k;
budget e for purchasing service by user i located in hot spot area h hi M represents the total number of users in the hot spot area h h (t) then the total user budget for the hot spot area h is
Figure BDA0003101885210000022
The aggregated preference of the user for service k is:
Figure BDA0003101885210000023
the total demand of the hotspot area h for service k within the time slot t is expressed as:
Figure BDA0003101885210000024
then the aggregate user utility in hotspot region h can be calculated by the following equation:
Figure BDA0003101885210000025
where 0 < alpha < 1 indicates the degree of substitution for different services, variable q hk (t) total amount of service q in cache application for the unmanned aerial vehicle owner can provide for the hotspot area h in the time slot t hk (t) represents the available transmission rate, and the total revenue of the system user is calculated by the following formula:
Figure BDA0003101885210000026
further, the service overhead of the drone owner includes two parts: maintenance and energy costs, where the unit maintenance cost is in g 0 Expressed as g for energy consumption per unit power s Means that unit service energy consumption g c It is shown that the energy consumption cost of the owner k of the drone in the time slot t is calculated by the following formula:
Figure BDA0003101885210000027
expression(s)
Figure BDA0003101885210000028
Indicating the number of drones required, where b k Representing the service capacity of a single drone, the benefit of the drone owner k in the time slot t is calculated by the following formula:
Γ hk (t)=p k (t)q hk (t)-c hk (t),
wherein p is k (t) is the price of service k in time slot t;
based on the definition of the user aggregate utility and the unmanned aerial vehicle owner income, the first optimization target is to maximize the total utility of the user, and the problem is described as follows:
P1
Figure BDA0003101885210000031
s.t.
Figure BDA0003101885210000032
the constraint condition ensures that the total user overhead of the hot spot area h in the time slot t does not exceed the total budget;
second, the goal is to maximize the long-term revenue of the owner of the drone, the problem is described as follows:
P2:
Figure BDA0003101885210000033
further, the step 2) of constructing a markov game model, and converting the profit maximization problem in the step 1) into a markov optimization problem specifically includes:
the unmanned aerial vehicle owner income maximization problem defined in the step 1 is converted into a Markov game problem, the game can be represented by a tuple < K, S, O, A, P, R, gamma >, and the meanings of all elements are as follows:
the state S represents the state information of the established Markov game model and is expressed as
Figure BDA0003101885210000034
Wherein S is 1 Representing the state of the user, including the service demand, service preference and budget generated by the user; s 2 Representing the state of the owner of the unmanned aerial vehicle, including unit cost overhead, service capacity and service replaceability; s 3 Representing the status of the provided services, including the number and price of the services provided in the past;
observing state O, wherein the owner of the unmanned aerial vehicle in the system can not observe the system state S, can only observe partial information and is expressed as
Figure BDA0003101885210000035
Wherein
Figure BDA0003101885210000037
Is the observed state of the drone owner k, including the user's budget, the drone owner cost per unit, service capacity and service alternatives, and the number and price of services offered in the past;
action A. action set of the owner of the drone is represented as
Figure BDA0003101885210000036
Wherein Δ q hk (t) is the number of services that need to be provided in addition to the last time slot.
State transition probability P: expressed as P: sxas × S → [0, 1]Based on the probability P(s) t+1 |s t ,a t ) And action a t The system state is from s t Jump to s t+1
The reward function R can be expressed as
Figure BDA0003101885210000041
S × A → R, representing agent k performing an action within time slot t
Figure BDA0003101885210000042
Post-acquired transient rewards; the instant prize may be calculated by the following formula:
Figure BDA0003101885210000043
so that the objective function conversion of the owner of the unmanned aerial vehicle maximizes the accumulated instantaneous rewards
Figure BDA0003101885210000044
Further, the step 3: in a complete information state, constructing an expert strategy to make the performance reach offline optimum, specifically comprising:
in the complete information state, the optimization problems P1 and P2 are converted to obtain the relationship between the service quantity and the price:
Figure BDA0003101885210000045
the optimization problems P1 and P2 are converted into the optimization problems only with the unknown variable q hk (t) while verifying that the optimal solutions of P1 and P2 are consistent, the expert strategy is obtained by the following steps:
1) k experts obtain the optimal service quantity q by solving a following equation according to the current system state hk (t):
Figure BDA0003101885210000046
Wherein A is k =(g o +g s +g c b k )/b k Of variable b k Serving resource capacity, variable q, for a single drone h,-k The number of services provided in the hotspot region k for other services except the service k; variable Q k =f hk (t)[q hk (t)] α And is made of
Figure BDA0003101885210000047
Figure BDA0003101885210000048
2) And recording the actions, the system state, the observable state and the reward executed by K experts in each time slot to form a data set.
Further, the step 4: in a local information state, constructing an intelligent agent online learning strategy based on the offline expert strategy set obtained in the step 3), specifically comprising:
firstly, under a partial observation state, each agent needs to predict an adversary strategy, and a strategy pi of an agent K can be established based on an occupancy rate measurement matching strategy k And adversary strategy pi -k The relationship between them, expressed as:
Figure BDA0003101885210000051
where o represents the observed state. By using the strategy of generating an confrontation network training agent, the optimization problem can be converted into the following form:
P3
Figure BDA0003101885210000052
Figure BDA0003101885210000054
wherein
Figure BDA0003101885210000053
Representing smart-based strategies pi k And pi- k Expectation of (D) k Indicating the generation of an output of the countermeasure network. Only the saddle point (pi) needs to be found k ,D k ) The problem can be solved;
second, to solve for the saddle point (π k ,D k ) And training the intelligent agent strategy model.
Further, in order to meet user requirements, an unmanned aerial vehicle governed by the same unmanned aerial vehicle owner forms a mesh network which is spiraled above a hotspot position h, nodes in the mesh network can communicate with each other and perform load balancing in a self-adaptive manner, and unmanned aerial vehicles governed by different unmanned aerial vehicle owners do not communicate with each other. The user only needs to upload the service requirements to the drone that is of its preferred type and closest to it.
The invention has the following advantages and beneficial effects:
the invention constructs a dynamic unmanned aerial vehicle resource deployment framework for realizing differentiated service scheduling based on the unmanned aerial vehicle in a wireless mobile edge network. In order to maximize the user utility and maximize the long-term income of various unmanned aerial vehicle owners, the method firstly establishes a Markov game model based on the state information of the unmanned aerial vehicles and the ground users, and then theoretically deduces the Nash equilibrium condition of the service resources provided by the unmanned aerial vehicle owners under the condition of complete information, and the users can achieve the optimal utility. And in the case of incomplete information, effective decision of the unmanned aerial vehicle owner on the provided resources of each time slot is realized by means of simulation learning. The method combines the simulation learning and the differentiated service resource scheduling based on the unmanned aerial vehicle for the first time, and is more suitable for online scheduling and independent of a centralized control mode compared with the traditional scheduling algorithm. Compared with a machine learning scheme based on no model, the method has better convergence and performance. The experimental results prove the high efficiency of the method in the aspects of user utility, unmanned aerial vehicle owner income and fairness. The invention provides a novel deployment method of differentiated services applied to an unmanned aerial vehicle-assisted edge network.
Drawings
FIG. 1 is a diagram of a preferred embodiment dynamic demand model provided by the present invention.
Fig. 2 is a schematic diagram of algorithm training based on the simulation learning.
Fig. 3 and 4 compare the performance of the proposed mlu algorithm with the other three algorithms in terms of average user utility and revenue obtained by the owner of the drone.
Fig. 5 and 6 compare the performance of the mlu algorithm proposed by the present invention with the other three algorithms in the fairness of the unmanned aerial vehicle owner's revenue.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
fig. 1 is a dynamic demand model of a preferred embodiment provided by the present invention, in which a plurality of drones managed by each drone owner in a hotspot area cooperate to provide services to ground users, and service request data dynamically changes as users move.
Fig. 2 is a schematic diagram of algorithm training based on the simulation learning. A plurality of experts and a plurality of agents interact with the environment, the experts obtain complete observation information of the system, and the agents obtain partial observable information of the system. The expert generates a state dynamic library through offline learning, and the intelligent agent performs online training through modules such as strategies, values and discriminators based on the library generated by the expert.
Fig. 3 and 4 compare the performance of the proposed mlu algorithm with the other three algorithms in terms of average user utility and revenue obtained by the owner of the drone. The experimental result shows that the simulation learning is beneficial to the online scheduling with the knowledge of local information, and compared with a comparison algorithm, the method can obtain higher system user utility and unmanned aerial vehicle owner income.
Fig. 5 and 6 compare the performance of the mlu algorithm proposed by the present invention with the other three algorithms in the fairness of the unmanned aerial vehicle owner's revenue. The experimental result shows that the method can achieve better fairness in the aspect of income obtained by an unmanned aerial vehicle owner, and has smaller performance gap with an expert strategy.
The embodiment of the invention provides an unmanned aerial vehicle resource deployment method based on difference service, which comprises the following steps:
step 1: and constructing a dynamic demand model, and determining the utility of the user and the owner of the unmanned aerial vehicle.
Hair brushA dynamic demand model is constructed, wherein the dynamic demand model comprises H hot spot areas and K unmanned aerial vehicle owners. In each time slot t, user i has a probability
Figure BDA0003101885210000078
Generates a service request and defines as
Figure BDA0003101885210000071
Wherein d is hi (t) represents the required service capability, iota hik (t)∈[0,1]Indicating the degree of preference of user i in hotspot area h for service k. In order to meet the user requirements, the unmanned aerial vehicles governed by the same unmanned aerial vehicle owner form a mesh network to circle above the hotspot position h. All nodes in the mesh network can communicate with each other and perform load balancing in a self-adaptive manner, and unmanned aerial vehicles governed by different unmanned aerial vehicle owners do not communicate with each other. The user only needs to upload the service requirements to the drone that is of its preferred type and closest to it.
The budget for purchasing services by user i located in hotspot area h can be used as e hi The total number of users in the hot spot area h can be expressed by m h (t) then the total user budget for the hot spot area h is
Figure BDA0003101885210000073
The aggregated preference of the user for service k is:
Figure BDA0003101885210000074
the total demand of the hotspot area h for service k in time slot t can be expressed as:
Figure BDA0003101885210000075
then the aggregate user utility in hotspot region h can be calculated by the following equation:
Figure BDA0003101885210000076
wherein 0 < alpha < 1Showing the degree of substitution of different services. Variable q hk (t) total amount of service that the owner of the drone can provide for hotspot h within time slot t, e.g. q in caching applications hk (t) represents the available transmission rate. The total revenue for the system user can then be calculated using the following formula:
Figure BDA0003101885210000077
the service overhead of the drone owner consists of two parts: maintenance and energy costs, where a unit maintenance cost may be in g 0 Expressed as g for energy cost per unit power s Means that unit service energy consumption g c And (4) showing. The energy consumption cost of the owner k of the unmanned aerial vehicle in the time slot t can be calculated by the following formula:
Figure BDA0003101885210000081
expression formula
Figure BDA0003101885210000082
Indicating the number of drones required, where b k Representing the service capacity of a single drone. The profit for the drone owner k in time slot t may be calculated by the following formula:
Г hk (t)=p k (t)q hk (t)-c hk (t),
wherein p is k (t) is the price of service k in time slot t.
Based on the definition of the user aggregate utility and the unmanned aerial vehicle owner income, the first optimization goal of the method is to maximize the total utility of the user, and the problems are described as follows:
P1
Figure BDA0003101885210000083
s.t.
Figure BDA0003101885210000084
the constraint conditions ensure that the total user overhead of the hotspot area h in the time slot t does not exceed the total budget.
Second, the goal is to maximize the long-term revenue of the owner of the drone, and the problem is described as follows:
P2:
Figure BDA0003101885210000085
and 2, step: constructing a Markov game model, and converting the profit maximization problem in the step 1) into a Markov optimization problem.
The unmanned aerial vehicle owner income maximization problem defined in the step 1 can be converted into a Markov game problem, the game can be represented by a tuple < K, S, O, A, P, R, gamma >, and the meanings of the elements are as follows:
the state S represents the state information of the established Markov game model and is expressed as
Figure BDA0003101885210000086
Wherein S is 1 Representing the status of the user, including the user's generated service requirements, service preferences, and budgets; s 2 Representing the state of the owner of the unmanned aerial vehicle, including unit cost overhead, service capacity and service replaceability; s 3 Indicating the status of the services provided, including the number and price of services provided in the past.
Observing state O, wherein the owner of the unmanned aerial vehicle in the system can not observe the system state S, can only observe partial information and is expressed as
Figure BDA0003101885210000091
Wherein
Figure BDA0003101885210000092
Is the observed state of the drone owner k, including the user's budget, the drone owner cost per unit, service capacity and service alternatives, and the number and price of services offered in the past.
Action A movement of the owner of the unmanned aerial vehicleAre shown as a set
Figure BDA0003101885210000093
Wherein Δ q hk (t) is the number of services that need to be provided in addition to the last time slot.
State transition probability P: expressed as P: sxas × S → [0, 1]Based on the probability P(s) t+1 |s t ,a t ) And action a t The system state is from s t Jump to s t+1
The reward function R can be expressed as
Figure BDA0003101885210000095
S × A → R, representing agent k performing an action within time slot t
Figure BDA0003101885210000096
The instant prize later earned. The instant prize in the present system can be calculated by the following formula:
Figure BDA0003101885210000097
so that the objective function conversion of the owner of the unmanned aerial vehicle maximizes the accumulated instantaneous rewards
Figure BDA0003101885210000098
And step 3: and in a complete information state, constructing an expert strategy, wherein the performance of the expert strategy can reach the offline optimum.
In the complete information state, the optimization problems P1 and P2 are converted, and the relationship between the service quantity and the price can be obtained:
Figure BDA0003101885210000099
optimization problems P1 and P2 can be transformed to only the unknown variables q hk (t) as a function of. Meanwhile, the consistency of the optimal solutions of P1 and P2 can be verified. The expert strategy can be obtained by the following steps:
1) and K experts calculate the optimal service quantity according to the current system state.
2) And recording the actions, the system state, the observable state and the reward executed by K experts in each time slot to form a data set.
And 4, step 4: and in a local information state, constructing an intelligent agent online learning strategy based on the offline expert strategy set obtained in the step 3).
First, under partial observation conditions, each agent needs to predict the adversary strategy. Strategy pi of intelligent agent K can be established based on occupancy rate measurement matching strategy k And adversary strategy pi -k The relationship between them, expressed as:
Figure BDA00031018852100000910
the invention adopts a strategy of generating an confrontation network training agent, and the optimization problem can be converted into the following form:
P3
Figure BDA0003101885210000101
Figure BDA0003101885210000103
only the saddle point (pi) needs to be found k ,D k ) The problem can be solved.
Second, to solve for the saddle point (π k ,D k ) Training the strategy model of the agent, wherein the pseudo code flow of the algorithm is shown in table 1.
TABLE 1 agent policy model training pseudo-code
Figure BDA0003101885210000102
The designed online algorithm MILU pseudo code flow is shown in Table 2.
Figure BDA0003101885210000111
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the present invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (2)

1. A method for dynamically deploying unmanned aerial vehicle resources based on differentiated services is characterized by comprising the following steps:
1) constructing a dynamic demand model, and determining the utility of a user and the owner of the unmanned aerial vehicle;
2) constructing a Markov game model, and converting the profit maximization problem in the step 1) into a Markov optimization problem;
3) in the complete information state, constructing an expert strategy to enable the performance to be optimal off line;
4) under the state of local information, constructing an intelligent agent online learning strategy based on the offline expert strategy set obtained in the step 3);
the step 1) of constructing a dynamic demand model, determining the utility of a user and an unmanned aerial vehicle owner, and specifically comprises the following steps:
the dynamic demand model comprises H hot spot areas and K unmanned aerial vehicle owners, and in each time slot t, a user i has probability
Figure FDA0003674158010000016
Generating service requests hi (t) and is defined as
Figure FDA0003674158010000011
Wherein d is hi (t) represents the required service capability, iota hik (t)∈[0,1]Representing the preference degree of the user i in the hotspot region h for the service k;
budget e for purchasing service by user i located in hot spot area h hi M represents the total number of users in the hot spot area h h (t) indicates that then the total user budget for the hot spot area h is
Figure FDA0003674158010000012
The aggregated preference of the user for service k is:
Figure FDA0003674158010000013
the total demand of the hotspot area h for service k within the time slot t is expressed as:
Figure FDA0003674158010000014
then the aggregate user utility in hotspot region h can be calculated by the following equation:
Figure FDA0003674158010000015
where 0 < alpha < 1 denotes the degree of substitution of different services, variable q hk (t) total amount of service that the owner of the drone can provide for the hotspot area h within the time slot t, q in caching applications hk (t) represents the available transmission rate, and the total revenue of the system user is calculated by the following formula:
Figure FDA0003674158010000021
the service overhead of the drone owner consists of two parts: maintenance costs and energy consumption costs, among othersG for unit maintenance cost 0 Expressed as g for energy consumption per unit power s Means that unit service energy consumption g c It is shown that the energy consumption cost of the owner k of the drone in the time slot t is calculated by the following formula:
Figure FDA0003674158010000022
expression formula
Figure FDA0003674158010000023
Indicating the number of drones required, where b k Representing the service capacity of a single drone, the benefit of the drone owner k in the time slot t is calculated by the following formula:
Γ hk (t)=p k (t)q hk (t)-c hk (t),
wherein p is k (t) is the price of service k within time slot t;
based on the definition of the user aggregate utility and the unmanned aerial vehicle owner income, the first optimization goal is to maximize the user total utility, and the problem is described as follows:
P1:
Figure FDA0003674158010000024
Figure FDA0003674158010000025
the constraint condition ensures that the total user overhead of the hot spot area h in the time slot t does not exceed the total budget;
second, the goal is to maximize the long-term revenue of the owner of the drone, and the problem is described as follows:
P2:
Figure FDA0003674158010000026
the step 2) of constructing a Markov game model, and converting the profit maximization problem in the step 1) into a Markov optimization problem specifically comprises the following steps:
the unmanned aerial vehicle owner income maximization problem defined in the step 1 is converted into a Markov game problem, the game can be represented by a tuple < K, S, O, A, P, R, gamma >, and the meanings of all elements are as follows:
the state S represents the state information of the established Markov game model and is expressed as
Figure FDA0003674158010000027
Wherein S is 1 Representing the status of the user, including the user's generated service requirements, service preferences, and budgets; s. the 2 Representing the state of the owner of the unmanned aerial vehicle, including unit cost overhead, service capacity and service replaceability; s. the 3 Representing the status of the provided services, including the number and price of the services provided in the past;
observing state O, wherein the owner of the unmanned aerial vehicle in the system can not observe the system state S, can only observe partial information and is expressed as
Figure FDA0003674158010000031
Wherein
Figure FDA0003674158010000032
Is the observed state of the drone owner k, including the user's budget, the drone owner cost per unit, service capacity and service alternatives, and the number and price of services offered in the past;
action A. action set of the owner of the drone is represented as
Figure FDA0003674158010000033
Wherein Δ q hk (t) is the number of services that need to be provided in addition to the last time slot;
state transition probability P: expressed as P: sxas × S → [0, 1]Based on the probability P(s) t+1 |s t ,a t ) And action a t The system state is from s t Jump to s t+1
The reward function R can be expressed as
Figure FDA0003674158010000034
S × A → R, representing agent k performing an action within time slot t
Figure FDA0003674158010000035
Post-acquired transient rewards; the instant prize may be calculated by the following formula:
Figure FDA0003674158010000036
thus, the objective function conversion of the owner of the unmanned aerial vehicle maximizes the accumulated instantaneous rewards
Figure FDA0003674158010000037
The step 3: in a complete information state, constructing an expert strategy to make the performance reach offline optimization, specifically comprising:
in the complete information state, the optimization problems P1 and P2 are converted to obtain the relationship between the service quantity and the price:
Figure FDA0003674158010000038
the optimization problems P1 and P2 are converted into the optimization problems only with the unknown variable q hk (t) while verifying that the optimal solutions of P1 and P2 are consistent, the expert strategy is obtained by:
1) k experts obtain the optimal service quantity q according to the current system state by solving the following equation hk (t):
Figure FDA0003674158010000039
Wherein A is k =(g o +g s +g c b k )/b k Of variable b k Serving resource capacity for a single drone, variable q h,-k Are clothesThe number of services provided by other services except the service k in the hotspot area k; variable Q k =f hk (t)[q hk (t)] α And is made of
Figure FDA0003674158010000041
Figure FDA0003674158010000042
2) Recording actions, system states, observable states and rewards executed by K experts in each time slot to form a data set;
the step 4: in a local information state, constructing an intelligent agent online learning strategy based on the offline expert strategy set obtained in the step 3), specifically comprising:
firstly, under a partial observation state, each agent needs to predict an adversary strategy, and a relation between a strategy pi K of an agent K and an adversary strategy pi-K can be established based on an occupancy rate measurement matching strategy, and is expressed as follows:
Figure FDA0003674158010000043
wherein o represents the observation state, and the optimization problem can be converted into the following form by adopting the strategy of generating the confrontation network training agent:
P3:
Figure FDA0003674158010000044
wherein
Figure FDA0003674158010000045
Representing smart-based strategies pi k And pi -k Expectation of (D) k Representing an output for generating a countermeasure network; only the saddle point (pi) needs to be found k ,D k ) The problem can be solved;
second, to solve for the saddle point (π k ,D k ) And training the intelligent agent strategy model.
2. The method for dynamically deploying unmanned aerial vehicle resources based on differentiated services according to claim 1, wherein in order to meet user requirements, an unmanned aerial vehicle administered by the same unmanned aerial vehicle owner forms a mesh network to circle above a hotspot position h, nodes in the mesh network can communicate with each other and adaptively perform load balancing, and unmanned aerial vehicles administered by different unmanned aerial vehicle owners do not communicate with each other; the user only needs to upload the service requirements to the drone that is of its preferred type and closest to it.
CN202110625142.7A 2021-06-04 2021-06-04 Unmanned aerial vehicle resource dynamic deployment method based on differentiated services Active CN113242556B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110625142.7A CN113242556B (en) 2021-06-04 2021-06-04 Unmanned aerial vehicle resource dynamic deployment method based on differentiated services

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110625142.7A CN113242556B (en) 2021-06-04 2021-06-04 Unmanned aerial vehicle resource dynamic deployment method based on differentiated services

Publications (2)

Publication Number Publication Date
CN113242556A CN113242556A (en) 2021-08-10
CN113242556B true CN113242556B (en) 2022-08-23

Family

ID=77136840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110625142.7A Active CN113242556B (en) 2021-06-04 2021-06-04 Unmanned aerial vehicle resource dynamic deployment method based on differentiated services

Country Status (1)

Country Link
CN (1) CN113242556B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105979603A (en) * 2016-06-24 2016-09-28 贵州宇鹏科技有限责任公司 Unmanned aerial vehicle uplink scheduling method for information flow QoS guarantee based on TD-LTE technology
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN112118556A (en) * 2020-03-02 2020-12-22 湖北工业大学 Unmanned aerial vehicle track and power joint optimization method based on deep reinforcement learning

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10452951B2 (en) * 2016-08-26 2019-10-22 Goodrich Corporation Active visual attention models for computer vision tasks
CN108594858B (en) * 2018-07-16 2020-10-27 河南大学 Unmanned aerial vehicle searching method and device for Markov moving target
CN110263388A (en) * 2019-05-30 2019-09-20 东华大学 A kind of multiple unmanned plane cooperative system performance estimating methods based on stochastic Petri net
CN110488859B (en) * 2019-07-15 2020-08-21 北京航空航天大学 Unmanned aerial vehicle route planning method based on improved Q-learning algorithm
KR102223736B1 (en) * 2019-07-22 2021-03-05 엘지전자 주식회사 A speech processing method using an artificial intelligence device
CN111193536B (en) * 2019-12-11 2021-06-04 西北工业大学 Multi-unmanned aerial vehicle base station track optimization and power distribution method
CN111787509B (en) * 2020-07-14 2021-11-02 中南大学 Unmanned aerial vehicle task unloading method and system based on reinforcement learning in edge calculation
CN112308372A (en) * 2020-09-22 2021-02-02 合肥工业大学 Data and model combined driven air-ground patrol resource dynamic scheduling method and system
CN112507622B (en) * 2020-12-16 2022-06-21 中国人民解放军国防科技大学 Anti-unmanned aerial vehicle task allocation method based on reinforcement learning
CN112702714B (en) * 2020-12-28 2021-12-14 湖南大学 Unmanned aerial vehicle cooperative type vehicle networking operation task unloading method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105979603A (en) * 2016-06-24 2016-09-28 贵州宇鹏科技有限责任公司 Unmanned aerial vehicle uplink scheduling method for information flow QoS guarantee based on TD-LTE technology
CN112118556A (en) * 2020-03-02 2020-12-22 湖北工业大学 Unmanned aerial vehicle track and power joint optimization method based on deep reinforcement learning
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning

Also Published As

Publication number Publication date
CN113242556A (en) 2021-08-10

Similar Documents

Publication Publication Date Title
Du et al. Resource pricing and allocation in MEC enabled blockchain systems: An A3C deep reinforcement learning approach
Hu et al. Twin-timescale artificial intelligence aided mobility-aware edge caching and computing in vehicular networks
Tushar et al. Distributed real-time electricity allocation mechanism for large residential microgrid
Chen et al. Multiuser computation offloading and resource allocation for cloud–edge heterogeneous network
CN107906675A (en) A kind of central air-conditioning cluster optimal control method based on user demand
CN116306324B (en) Distributed resource scheduling method based on multiple agents
CN104754063B (en) Local cloud computing resource scheduling method
CN109831808A (en) A kind of resource allocation methods of the hybrid power supply C-RAN based on machine learning
Gu et al. Service management and energy scheduling toward low-carbon edge computing
Xie et al. Multi-Agent attention-based deep reinforcement learning for demand response in grid-responsive buildings
Qin et al. User-edge collaborative resource allocation and offloading strategy in edge computing
Xu et al. Task allocation for unmanned aerial vehicles in mobile crowdsensing
Zhao et al. Reinforcement learning for resource mapping in 5G network slicing
CN113242556B (en) Unmanned aerial vehicle resource dynamic deployment method based on differentiated services
CN113946423A (en) Multi-task edge computing scheduling optimization method based on graph attention network
Yu et al. Resources sharing in 5G networks: Learning-enabled incentives and coalitional games
CN113821346A (en) Computation uninstalling and resource management method in edge computation based on deep reinforcement learning
CN114619907A (en) Coordinated charging method and coordinated charging system based on distributed deep reinforcement learning
CN115361453B (en) Load fair unloading and migration method for edge service network
CN112822055A (en) DQN-based edge computing node deployment algorithm
Han et al. Multi-agent reinforcement learning enabling dynamic pricing policy for charging station operators
CN116502921A (en) Park comprehensive energy system optimization management system and coordination scheduling method thereof
Zhang et al. Flexible selection framework for secondary frequency regulation units based on learning optimisation method
CN116468168A (en) Distributed power supply multi-target hierarchical planning method based on improved beluga optimization algorithm
Luan et al. Cooperative power consumption in the smart grid based on coalition formation game

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant