CN113242556B - Unmanned aerial vehicle resource dynamic deployment method based on differentiated services - Google Patents
- Publication number
- CN113242556B (application CN202110625142.7A)
- Authority
- CN
- China
- Prior art keywords
- unmanned aerial
- aerial vehicle
- service
- owner
- user
- Legal status
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/02—Resource partitioning among network components, e.g. reuse partitioning
- H04W16/10—Dynamic resource partitioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/18—Network planning tools
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/22—Traffic simulation tools or models
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/53—Allocation or scheduling criteria for wireless resources based on regulatory allocation policies
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a dynamic unmanned aerial vehicle deployment method based on differentiated services, which realizes efficient differentiated services in wireless mobile network environments. The invention first establishes a Markov game model based on the state information of the unmanned aerial vehicles and the ground users, and then derives, under complete information, the Nash equilibrium condition for the service resources provided by the unmanned aerial vehicle owners, at which users attain optimal utility. Under incomplete information, imitation learning enables each unmanned aerial vehicle owner to decide effectively how many resources to provide in each time slot. In addition, the invention designs a novel neural network model for strategy training that combines a convolutional neural network, a generative adversarial network and gradient descent. Theoretical analysis shows that the unmanned aerial vehicle service deployment decision provided by the invention is asymptotically optimal. The invention thus provides a new method for unmanned aerial vehicle resource allocation based on differentiated services.
Description
Technical Field
The invention relates to a method for dynamically deploying the available resources of unmanned aerial vehicles according to the differentiated service demands of ground users, and in particular to a method for dynamically deploying unmanned aerial vehicle resources based on imitation learning.
Background
Unmanned aerial vehicles are flexible and mobile, and are widely used in wireless edge networks to provide services to users, including data collection, network access and content caching. The key problem in drone-based services is how to deploy drone resources efficiently to meet user demand. However, existing drone deployment schemes focus on a single type of service and do not consider scenarios in which multiple differentiated services coexist. In a typical scenario, different network operators provide heterogeneous network services, such as 4G and 5G access, to ground users via drones; at a live basketball game, for example, spectators can purchase different network value-added services from operators according to their own needs and purchasing power. How to serve the interests of both the users and the drone owners through optimal configuration of drone resources therefore remains an open problem. Addressing these shortcomings of existing research, the invention provides a dynamic drone service deployment method based on differentiated services. The method uses imitation learning to realize online resource deployment by drone owners under incomplete information, simultaneously optimizes the utility of users and drone owners, and provides a new approach to drone resource deployment based on differentiated services.
Disclosure of Invention
The present invention aims to solve the above problems of the prior art by providing an unmanned aerial vehicle resource dynamic deployment method based on differentiated services. The technical scheme of the invention is as follows:
a method for dynamically deploying unmanned aerial vehicle resources based on differentiated services comprises the following steps:
1) constructing a dynamic demand model, and determining the utility of users and unmanned aerial vehicle owners;
2) constructing a Markov game model, and converting the profit maximization problem of step 1) into a Markov optimization problem;
3) under complete information, constructing an expert strategy whose performance reaches the offline optimum;
4) under local information, constructing an online learning strategy for the agents based on the offline expert strategy set obtained in step 3).
Further, step 1) constructs a dynamic demand model and determines the utility of users and unmanned aerial vehicle owners, specifically:
the dynamic demand model comprises H hotspot areas and K unmanned aerial vehicle owners. In each time slot t, user i generates a service request with a certain probability [equation not reproduced]; the request is defined as [equation not reproduced], where $d_{hi}(t)$ denotes the required service capacity and $\iota_{hik}(t)\in[0,1]$ denotes the degree of preference of user i in hotspot area h for service k;
the budget of user i in hotspot area h for purchasing services is denoted $e_{hi}$ and the total number of users in hotspot area h is denoted $m_h(t)$, so the total user budget of hotspot area h is $\sum_{i=1}^{m_h(t)} e_{hi}$. The aggregated preference of users for service k is: [equation not reproduced]
the total demand of hotspot area h for service k within time slot t is expressed as [equation not reproduced]; the aggregate user utility in hotspot area h is then calculated by the following equation: [equation not reproduced]
where $0<\alpha<1$ indicates the degree of substitutability between different services, and the variable $q_{hk}(t)$ is the total amount of service that the unmanned aerial vehicle owner can provide to hotspot area h in time slot t (in caching applications, for example, $q_{hk}(t)$ represents the available transmission rate). The total revenue of the system's users is calculated by the following formula: [equation not reproduced]
Further, the service overhead of the unmanned aerial vehicle owner consists of two parts, maintenance cost and energy cost: the unit maintenance cost is denoted $g_0$, the energy cost per unit power is denoted $g_s$, and the unit service energy consumption is denoted $g_c$. The energy consumption cost of unmanned aerial vehicle owner k in time slot t is calculated by the following formula: [equation not reproduced]
the expression [equation not reproduced] gives the number of drones required, where $b_k$ denotes the service capacity of a single drone; the profit of unmanned aerial vehicle owner k in time slot t is calculated by the following formula:
$\Gamma_{hk}(t)=p_k(t)\,q_{hk}(t)-c_{hk}(t)$,
where $p_k(t)$ is the price of service k in time slot t;
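The profit definition above can be illustrated with a small Python sketch. It assumes a hypothetical cost model that the text does not spell out: each deployed drone incurs the unit maintenance cost $g_0$ and the unit power cost $g_s$, each unit of service incurs $g_c$, and the number of drones is $q_{hk}(t)/b_k$ rounded up. In its continuous relaxation this model gives a per-unit cost of $(g_0+g_s+g_c b_k)/b_k$, which matches the coefficient $A_k$ used in step 3). All function names are illustrative.

```python
import math


def drones_needed(q: float, b_k: float) -> int:
    """Number of drones needed to supply service quantity q (assumption: round up)."""
    return math.ceil(q / b_k)


def energy_cost(q: float, b_k: float, g0: float, gs: float, gc: float) -> float:
    """Hypothetical c_hk(t): per-drone maintenance g0 and power gs, plus gc per unit of service."""
    return (g0 + gs) * drones_needed(q, b_k) + gc * q


def owner_profit(p_k: float, q: float, b_k: float, g0: float, gs: float, gc: float) -> float:
    """Gamma_hk(t) = p_k(t) * q_hk(t) - c_hk(t)."""
    return p_k * q - energy_cost(q, b_k, g0, gs, gc)


def unit_cost(b_k: float, g0: float, gs: float, gc: float) -> float:
    """A_k = (g0 + gs + gc * b_k) / b_k, the per-unit cost when q is a multiple of b_k."""
    return (g0 + gs + gc * b_k) / b_k


# Example: 25 units of service with a capacity of 10 per drone -> 3 drones.
print(drones_needed(25, 10))                     # 3
print(owner_profit(2.0, 25, 10, 2.0, 1.0, 0.5))  # 28.5
```

Rounding up reflects that drones are indivisible; dropping the ceiling recovers the linear cost $A_k\,q_{hk}(t)$.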
based on the definitions of aggregate user utility and unmanned aerial vehicle owner revenue, the first optimization objective is to maximize the total user utility, described as follows (problem P1): [equation not reproduced]
the constraint ensures that the total user expenditure in hotspot area h in time slot t does not exceed the total budget;
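The budget constraint can be checked as in the following minimal sketch. It assumes that the total user expenditure in hotspot area h is $\sum_k p_k(t)\,q_{hk}(t)$; the text describes the constraint, but the equation itself is not reproduced. The function names are hypothetical.

```python
from typing import Sequence


def total_budget(budgets: Sequence[float]) -> float:
    """Total budget of hotspot h: the sum of e_hi over its m_h(t) users."""
    return sum(budgets)


def within_budget(prices: Sequence[float], quantities: Sequence[float],
                  budgets: Sequence[float]) -> bool:
    """Check the budget constraint, assuming expenditure = sum_k p_k(t) * q_hk(t)."""
    if len(prices) != len(quantities):
        raise ValueError("one price and one quantity per service k")
    spend = sum(p * q for p, q in zip(prices, quantities))
    return spend <= total_budget(budgets)


# Two services priced 2.0 and 3.0; the users in hotspot h hold budgets 20 and 15.
print(within_budget([2.0, 3.0], [10, 5], [20, 15]))  # True  (35 <= 35)
print(within_budget([2.0, 3.0], [10, 6], [20, 15]))  # False (38 > 35)
```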
second, the objective is to maximize the long-term revenue of each unmanned aerial vehicle owner, described as follows (problem P2): [equation not reproduced]
Further, step 2) constructs a Markov game model and converts the profit maximization problem of step 1) into a Markov optimization problem, specifically:
the unmanned aerial vehicle owner revenue maximization problem defined in step 1) is converted into a Markov game represented by the tuple $\langle K, S, O, A, P, R, \gamma\rangle$, whose elements have the following meanings:
State S: the state information of the Markov game, expressed as [equation not reproduced], where $S_1$ is the state of the users, including their service demands, service preferences and budgets; $S_2$ is the state of the unmanned aerial vehicle owners, including unit cost, service capacity and service substitutability; and $S_3$ is the state of the provided services, including the quantities and prices of services provided in the past;
Observation O: an unmanned aerial vehicle owner cannot observe the full system state S, only partial information, expressed as [equation not reproduced]; the observation of owner k includes the users' budgets, the owner's unit cost, service capacity and service substitutability, and the quantities and prices of services provided in the past;
Action A: the action set of the unmanned aerial vehicle owners is expressed as [equation not reproduced], where $\Delta q_{hk}(t)$ is the additional quantity of service to be provided relative to the previous time slot;
State transition probability P: expressed as $P: S\times A\times S\to[0,1]$; given action $a_t$, the system state moves from $s_t$ to $s_{t+1}$ with probability $P(s_{t+1}\mid s_t, a_t)$;
Reward function R: a mapping $S\times A\to\mathbb{R}$ that gives the instantaneous reward agent k obtains after performing its action in time slot t. The instantaneous reward is calculated by the following formula: [equation not reproduced], so the objective of the unmanned aerial vehicle owner is converted into maximizing the accumulated instantaneous rewards [equation not reproduced].
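As a sketch of the Markov-game objective, the snippet below assumes that agent k's instantaneous reward is its total profit over hotspot areas, $\sum_h \Gamma_{hk}(t)$, and that the accumulated reward is discounted by the factor $\gamma$ from the tuple. Both are assumptions, since the reward formula itself is not reproduced here; the function names are hypothetical.

```python
from typing import Sequence


def instantaneous_reward(profits_over_hotspots: Sequence[float]) -> float:
    """Assumed reward of agent k in slot t: the sum over h of Gamma_hk(t)."""
    return sum(profits_over_hotspots)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Accumulated reward sum_t gamma**t * r_t, evaluated backwards (Horner form)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [instantaneous_reward([3.0, -1.0]), 1.0, 1.0]  # [2.0, 1.0, 1.0]
print(discounted_return(rewards, 0.5))  # 2.0 + 0.5 + 0.25 = 2.75
```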
Further, step 3) constructs, under complete information, an expert strategy whose performance reaches the offline optimum, specifically:
under complete information, optimization problems P1 and P2 are transformed to obtain the relationship between service quantity and price: [equation not reproduced]
problems P1 and P2 are thereby converted into optimization problems whose only unknown variable is $q_{hk}(t)$, and the optimal solutions of P1 and P2 are verified to be consistent. The expert strategy is obtained by the following steps:
1) the K experts obtain the optimal service quantity $q_{hk}(t)$ from the current system state by solving the following equation: [equation not reproduced]
where $A_k=(g_0+g_s+g_c b_k)/b_k$, the variable $b_k$ is the service resource capacity of a single drone, the variable $q_{h,-k}$ is the quantity of services other than service k provided in hotspot area h, and the variable $Q_k=f_{hk}(t)[q_{hk}(t)]^{\alpha}$, with [equation not reproduced];
2) the actions, system states, observations and rewards of the K experts in each time slot are recorded to form a data set.
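Step 2) of the expert procedure, recording what the K experts do in every slot, amounts to building a demonstration buffer. The sketch below is one possible layout; the class and field names are hypothetical, and since the expert's own decision rule (the equation in step 1) is not reproduced here, the example feeds in placeholder values.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ExpertTransition:
    slot: int                        # time slot t
    expert: int                      # expert index k
    state: Tuple[float, ...]         # full system state s_t (complete information)
    observation: Tuple[float, ...]   # what owner k alone can observe
    action: Tuple[float, ...]        # e.g. Delta q_hk(t) for every hotspot h
    reward: float                    # instantaneous reward


@dataclass
class ExpertDataset:
    transitions: List[ExpertTransition] = field(default_factory=list)

    def record(self, slot: int, expert: int, state: Sequence[float],
               observation: Sequence[float], action: Sequence[float],
               reward: float) -> None:
        self.transitions.append(ExpertTransition(
            slot, expert, tuple(state), tuple(observation), tuple(action), float(reward)))

    def for_expert(self, k: int) -> List[ExpertTransition]:
        return [tr for tr in self.transitions if tr.expert == k]


data = ExpertDataset()
for t in range(2):          # two time slots
    for k in range(2):      # K = 2 experts
        data.record(t, k, state=[1.0, 2.0], observation=[1.0],
                    action=[0.5], reward=1.0)  # placeholder values
print(len(data.transitions), len(data.for_expert(1)))  # 4 2
```

Keeping both the full state and the partial observation lets the online agents, which see only observations, be trained against expert behaviour that was computed with complete information.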
Further, step 4) constructs, under local information, an online learning strategy for the agents based on the offline expert strategy set obtained in step 3), specifically:
first, under partial observation, each agent needs to predict its opponents' strategies. Based on occupancy measure matching, the relationship between the strategy $\pi_k$ of agent k and the opponent strategy $\pi_{-k}$ can be established as: [equation not reproduced]
where o denotes the observation. When the agent strategies are trained with a generative adversarial network, the optimization problem can be converted into the following form: [equation not reproduced]
where [equation not reproduced] denotes the expectation under the strategies $\pi_k$ and $\pi_{-k}$, and $D_k$ denotes the discriminator output of the generative adversarial network. The problem is solved by finding the saddle point $(\pi_k, D_k)$;
second, the agent strategy model is trained to solve for the saddle point $(\pi_k, D_k)$.
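The saddle-point formulation follows the pattern of generative adversarial imitation learning: a discriminator $D_k$ learns to tell expert (observation, action) pairs from the agent's, and its output then serves as a surrogate reward for the strategy update. The sketch below shows only the discriminator half, with a linear logistic model standing in for the patent's convolutional network and synthetic data in place of real demonstrations. It uses the common convention that D is close to 1 on expert pairs, and it is an illustration of the technique, not the patent's training procedure.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_discriminator(expert_sa: np.ndarray, agent_sa: np.ndarray,
                        steps: int = 500, lr: float = 0.1):
    """Fit D(x) = sigmoid(w.x + b) by gradient descent on binary cross-entropy,
    with label 1 for expert pairs and 0 for agent pairs."""
    x = np.vstack([expert_sa, agent_sa])
    y = np.concatenate([np.ones(len(expert_sa)), np.zeros(len(agent_sa))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        err = sigmoid(x @ w + b) - y      # derivative of the BCE loss w.r.t. the logit
        w -= lr * (x.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b


def surrogate_reward(w: np.ndarray, b: float, agent_sa: np.ndarray) -> np.ndarray:
    """log D(o, a): larger where the agent's pairs resemble the expert's."""
    return np.log(np.clip(sigmoid(agent_sa @ w + b), 1e-8, 1.0))


rng = np.random.default_rng(0)
expert = rng.normal(1.0, 0.3, size=(200, 2))   # synthetic expert (o, a) features
agent = rng.normal(-1.0, 0.3, size=(200, 2))   # synthetic agent (o, a) features
w, b = train_discriminator(expert, agent)
print(sigmoid(expert @ w + b).mean(), sigmoid(agent @ w + b).mean())
```

In a full imitation-learning loop, the policy would then be updated to increase this surrogate reward while the discriminator is retrained, alternating until the pair approaches the saddle point.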
Further, to meet user demand, the drones managed by the same unmanned aerial vehicle owner form a mesh network that circles above hotspot location h; nodes in the mesh network can communicate with one another and balance load adaptively, whereas drones managed by different owners do not communicate with each other. A user only needs to upload its service request to the nearest drone that offers its preferred type of service.
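The user-side rule in this paragraph, sending the request to the nearest drone that offers the user's preferred service, can be sketched as below. It assumes that the preferred service is the one with the largest preference degree $\iota_{hik}(t)$ and that distance is planar Euclidean distance; the function name and data layout are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Drone = Tuple[int, Tuple[float, float]]   # (owner / service index k, (x, y) position)


def choose_drone(user_pos: Tuple[float, float],
                 preferences: Dict[int, float],
                 drones: List[Drone]) -> Optional[int]:
    """Return the index of the closest drone run by the user's most-preferred owner."""
    k_star = max(preferences, key=preferences.get)   # argmax_k iota_hik(t)
    candidates = [(math.dist(user_pos, pos), idx)
                  for idx, (owner, pos) in enumerate(drones) if owner == k_star]
    if not candidates:
        return None          # no drone of the preferred type is hovering over h
    return min(candidates)[1]


drones = [(0, (0.0, 0.0)), (1, (5.0, 5.0)), (1, (1.0, 1.0)), (0, (2.0, 2.0))]
print(choose_drone((0.0, 0.0), {0: 0.2, 1: 0.7}, drones))  # 2
```

Because drones of one owner balance load among themselves over the mesh, the user does not need to pick among them by load; proximity alone decides the entry point.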
The invention has the following advantages and beneficial effects:
the invention constructs a dynamic unmanned aerial vehicle resource deployment framework that realizes drone-based differentiated service scheduling in wireless mobile edge networks. To maximize user utility and the long-term revenue of the various unmanned aerial vehicle owners, the method first establishes a Markov game model based on the state information of the drones and the ground users, and then theoretically derives, under complete information, the Nash equilibrium condition for the service resources provided by the owners, at which users attain optimal utility. Under incomplete information, imitation learning enables each owner to decide effectively how many resources to provide in each time slot. The method is the first to combine imitation learning with drone-based differentiated service resource scheduling; compared with traditional scheduling algorithms, it is better suited to online scheduling and does not depend on centralized control. Compared with model-free machine learning schemes, it achieves better convergence and performance. Experimental results demonstrate the efficiency of the method in terms of user utility, owner revenue and fairness. The invention provides a novel deployment method for differentiated services in unmanned aerial vehicle-assisted edge networks.
Drawings
FIG. 1 is a diagram of the dynamic demand model of a preferred embodiment provided by the present invention.
FIG. 2 is a schematic diagram of algorithm training based on imitation learning.
FIGS. 3 and 4 compare the proposed MILU algorithm with three other algorithms in terms of average user utility and the revenue obtained by unmanned aerial vehicle owners.
FIGS. 5 and 6 compare the proposed MILU algorithm with three other algorithms in terms of the fairness of unmanned aerial vehicle owners' revenue.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the above technical problems is as follows:
FIG. 1 shows the dynamic demand model of a preferred embodiment, in which the multiple drones managed by each unmanned aerial vehicle owner in a hotspot area cooperate to serve ground users, and the service request data change dynamically as users move.
FIG. 2 is a schematic diagram of algorithm training based on imitation learning. Multiple experts and multiple agents interact with the environment; the experts obtain complete observations of the system, whereas the agents obtain only partial observations. The experts generate a dynamic state library through offline learning, and the agents train online on this library through modules such as the policy, value and discriminator modules.
FIGS. 3 and 4 compare the proposed MILU algorithm with three other algorithms in terms of average user utility and the revenue obtained by unmanned aerial vehicle owners. The experimental results show that imitation learning supports online scheduling with only local information, and that the method obtains higher system user utility and owner revenue than the comparison algorithms.
FIGS. 5 and 6 compare the proposed MILU algorithm with three other algorithms in terms of the fairness of unmanned aerial vehicle owners' revenue. The experimental results show that the method achieves better fairness in the revenue obtained by owners and has a smaller performance gap to the expert strategy.
The embodiment of the invention provides an unmanned aerial vehicle resource deployment method based on differentiated services, comprising the following steps:
Step 1: construct a dynamic demand model, and determine the utility of users and unmanned aerial vehicle owners.
The invention constructs a dynamic demand model comprising H hotspot areas and K unmanned aerial vehicle owners. In each time slot t, user i generates a service request with a certain probability [equation not reproduced]; the request is defined as [equation not reproduced], where $d_{hi}(t)$ denotes the required service capacity and $\iota_{hik}(t)\in[0,1]$ denotes the degree of preference of user i in hotspot area h for service k. To meet user demand, the drones managed by the same unmanned aerial vehicle owner form a mesh network that circles above hotspot location h. All nodes in the mesh network can communicate with one another and balance load adaptively, whereas drones managed by different owners do not communicate with each other. A user only needs to upload its service request to the nearest drone that offers its preferred type of service.
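The request process of the dynamic demand model can be simulated per time slot as in the following sketch. The request probability, the demand range and the uniform distributions are hypothetical placeholders, because the text does not specify them; only the structure (a request carries a demand $d_{hi}(t)$ and a preference $\iota_{hik}(t)\in[0,1]$ for each service k) comes from the model.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceRequest:
    hotspot: int             # h
    user: int                # i
    demand: float            # d_hi(t), required service capacity
    preference: List[float]  # iota_hik(t) in [0, 1] for each service k


def generate_requests(h: int, num_users: int, num_services: int,
                      p_request: float, rng: random.Random,
                      max_demand: float = 10.0) -> List[ServiceRequest]:
    """One time slot: each user independently issues a request with probability p_request."""
    requests = []
    for i in range(num_users):
        if rng.random() < p_request:                      # Bernoulli(p_request) trial
            demand = rng.uniform(0.0, max_demand)           # placeholder distribution
            prefs = [rng.random() for _ in range(num_services)]
            requests.append(ServiceRequest(h, i, demand, prefs))
    return requests


rng = random.Random(42)
slot_requests = generate_requests(h=0, num_users=100, num_services=3, p_request=0.3, rng=rng)
print(len(slot_requests))
```

Calling the generator once per slot with fresh draws reproduces the "service request data change dynamically" behaviour in a simple, seedable way.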
The budget of user i in hotspot area h for purchasing services is denoted $e_{hi}$ and the total number of users in hotspot area h is denoted $m_h(t)$, so the total user budget of hotspot area h is $\sum_{i=1}^{m_h(t)} e_{hi}$. The aggregated preference of users for service k is: [equation not reproduced]
the total demand of hotspot area h for service k in time slot t can be expressed as [equation not reproduced]; the aggregate user utility in hotspot area h is then calculated by the following equation: [equation not reproduced]
where $0<\alpha<1$ indicates the degree of substitutability between different services. The variable $q_{hk}(t)$ is the total amount of service that the unmanned aerial vehicle owner can provide to hotspot area h within time slot t; in caching applications, for example, $q_{hk}(t)$ represents the available transmission rate. The total revenue of the system's users can then be calculated by the following formula: [equation not reproduced]
The service overhead of the unmanned aerial vehicle owner consists of two parts, maintenance cost and energy cost: the unit maintenance cost is denoted $g_0$, the energy cost per unit power is denoted $g_s$, and the unit service energy consumption is denoted $g_c$. The energy consumption cost of unmanned aerial vehicle owner k in time slot t can be calculated by the following formula: [equation not reproduced]
The expression [equation not reproduced] gives the number of drones required, where $b_k$ denotes the service capacity of a single drone. The profit of unmanned aerial vehicle owner k in time slot t can be calculated by the following formula:
$\Gamma_{hk}(t)=p_k(t)\,q_{hk}(t)-c_{hk}(t)$,
where $p_k(t)$ is the price of service k in time slot t.
Based on the definitions of aggregate user utility and unmanned aerial vehicle owner revenue, the first optimization objective of the method is to maximize the total user utility, described as follows (problem P1): [equation not reproduced]
the constraint ensures that the total user expenditure in hotspot area h in time slot t does not exceed the total budget.
Second, the objective is to maximize the long-term revenue of each unmanned aerial vehicle owner, described as follows (problem P2): [equation not reproduced]
Step 2: construct a Markov game model, and convert the profit maximization problem of step 1 into a Markov optimization problem.
The unmanned aerial vehicle owner revenue maximization problem defined in step 1 can be converted into a Markov game represented by the tuple $\langle K, S, O, A, P, R, \gamma\rangle$, whose elements have the following meanings:
State S: the state information of the Markov game, expressed as [equation not reproduced], where $S_1$ is the state of the users, including their service demands, service preferences and budgets; $S_2$ is the state of the unmanned aerial vehicle owners, including unit cost, service capacity and service substitutability; and $S_3$ is the state of the provided services, including the quantities and prices of services provided in the past.
Observation O: an unmanned aerial vehicle owner cannot observe the full system state S, only partial information, expressed as [equation not reproduced]; the observation of owner k includes the users' budgets, the owner's unit cost, service capacity and service substitutability, and the quantities and prices of services provided in the past.
Action A: the action set of the unmanned aerial vehicle owners is expressed as [equation not reproduced], where $\Delta q_{hk}(t)$ is the additional quantity of service to be provided relative to the previous time slot.
State transition probability P: expressed as $P: S\times A\times S\to[0,1]$; given action $a_t$, the system state moves from $s_t$ to $s_{t+1}$ with probability $P(s_{t+1}\mid s_t, a_t)$.
Reward function R: a mapping $S\times A\to\mathbb{R}$ that gives the instantaneous reward agent k obtains after performing its action in time slot t. The instantaneous reward in the present system can be calculated by the following formula: [equation not reproduced], so the objective of the unmanned aerial vehicle owner is converted into maximizing the accumulated instantaneous rewards [equation not reproduced].
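Because each action $\Delta q_{hk}(t)$ is an increment relative to the previous slot, the service quantity evolves additively. The sketch below applies one agent's action across hotspot areas; clipping at zero is an added assumption (the text does not state how negative quantities are handled), and the function name is hypothetical.

```python
from typing import List, Sequence


def apply_action(q_prev: Sequence[float], delta_q: Sequence[float]) -> List[float]:
    """q_hk(t) = q_hk(t-1) + Delta q_hk(t) for each hotspot h, floored at zero (assumed)."""
    if len(q_prev) != len(delta_q):
        raise ValueError("one increment per hotspot area")
    return [max(0.0, q + d) for q, d in zip(q_prev, delta_q)]


print(apply_action([5.0, 2.0], [1.5, -3.0]))  # [6.5, 0.0]
```

Expressing actions as increments keeps each decision small relative to the current deployment, which is one plausible reason for this action design.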
Step 3: under complete information, construct an expert strategy whose performance reaches the offline optimum.
Under complete information, optimization problems P1 and P2 are transformed to obtain the relationship between service quantity and price: [equation not reproduced]
optimization problems P1 and P2 can thus be transformed into functions of the single unknown variable $q_{hk}(t)$. Meanwhile, the optimal solutions of P1 and P2 can be verified to be consistent. The expert strategy is obtained by the following steps:
1) the K experts calculate the optimal service quantity according to the current system state;
2) the actions, system states, observations and rewards of the K experts in each time slot are recorded to form a data set.
Step 4: under local information, construct an online learning strategy for the agents based on the offline expert strategy set obtained in step 3.
First, under partial observation, each agent needs to predict its opponents' strategies. Based on occupancy measure matching, the relationship between the strategy $\pi_k$ of agent k and the opponent strategy $\pi_{-k}$ can be established as: [equation not reproduced]
The invention trains the agent strategies with a generative adversarial network, and the optimization problem can be converted into the following form: [equation not reproduced]
The problem is solved by finding the saddle point $(\pi_k, D_k)$.
Second, the agent strategy model is trained to solve for the saddle point $(\pi_k, D_k)$; the pseudo-code of the algorithm is shown in Table 1.
TABLE 1 agent policy model training pseudo-code
The pseudo-code of the designed online algorithm MILU is shown in Table 2.
It should also be noted that the terms "comprises", "comprising" and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
The above examples should be construed as merely illustrative and not as limiting the remainder of the disclosure. After reading the description of the present invention, a skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall within the scope of the invention defined by the claims.
Claims (2)
1. A method for dynamically deploying unmanned aerial vehicle resources based on differentiated services, characterized by comprising the following steps:
1) constructing a dynamic demand model, and determining the utility of users and unmanned aerial vehicle owners;
2) constructing a Markov game model, and converting the profit maximization problem of step 1) into a Markov optimization problem;
3) under complete information, constructing an expert strategy whose performance reaches the offline optimum;
4) under local information, constructing an online learning strategy for the agents based on the offline expert strategy set obtained in step 3);
wherein step 1) of constructing a dynamic demand model and determining the utility of users and unmanned aerial vehicle owners specifically comprises:
the dynamic demand model comprises H hotspot areas and K unmanned aerial vehicle owners; in each time slot t, user i generates a service request with a certain probability [equation not reproduced], the request being defined as [equation not reproduced], where $d_{hi}(t)$ denotes the required service capacity and $\iota_{hik}(t)\in[0,1]$ denotes the degree of preference of user i in hotspot area h for service k;
the budget of user i in hotspot area h for purchasing services is denoted $e_{hi}$ and the total number of users in hotspot area h is denoted $m_h(t)$, so the total user budget of hotspot area h is $\sum_{i=1}^{m_h(t)} e_{hi}$; the aggregated preference of users for service k is: [equation not reproduced]
the total demand of hotspot area h for service k within time slot t is expressed as [equation not reproduced]; the aggregate user utility in hotspot area h is then calculated by the following equation: [equation not reproduced]
where $0<\alpha<1$ denotes the degree of substitutability between different services, the variable $q_{hk}(t)$ is the total amount of service that the unmanned aerial vehicle owner can provide to hotspot area h within time slot t (in caching applications, $q_{hk}(t)$ represents the available transmission rate), and the total revenue of the system's users is calculated by the following formula: [equation not reproduced]
the service overhead of the unmanned aerial vehicle owner consists of two parts, maintenance cost and energy consumption cost, where the unit maintenance cost is denoted $g_0$, the energy cost per unit power is denoted $g_s$, and the unit service energy consumption is denoted $g_c$; the energy consumption cost of unmanned aerial vehicle owner k in time slot t is calculated by the following formula: [equation not reproduced]
the expression [equation not reproduced] gives the number of drones required, where $b_k$ denotes the service capacity of a single drone; the profit of unmanned aerial vehicle owner k in time slot t is calculated by the following formula:
$\Gamma_{hk}(t)=p_k(t)\,q_{hk}(t)-c_{hk}(t)$,
where $p_k(t)$ is the price of service k within time slot t;
based on the definitions of aggregate user utility and unmanned aerial vehicle owner revenue, the first optimization objective is to maximize the total user utility, described as follows (problem P1): [equation not reproduced]
the constraint ensures that the total user expenditure in hotspot area h in time slot t does not exceed the total budget;
second, the objective is to maximize the long-term revenue of each unmanned aerial vehicle owner, described as follows (problem P2): [equation not reproduced]
step 2) of constructing a Markov game model and converting the profit maximization problem of step 1) into a Markov optimization problem specifically comprises:
the unmanned aerial vehicle owner revenue maximization problem defined in step 1) is converted into a Markov game represented by the tuple $\langle K, S, O, A, P, R, \gamma\rangle$, whose elements have the following meanings:
state S: the state information of the Markov game, expressed as [equation not reproduced], where $S_1$ is the state of the users, including their service demands, service preferences and budgets; $S_2$ is the state of the unmanned aerial vehicle owners, including unit cost, service capacity and service substitutability; and $S_3$ is the state of the provided services, including the quantities and prices of services provided in the past;
observation O: an unmanned aerial vehicle owner cannot observe the full system state S, only partial information, expressed as [equation not reproduced]; the observation of owner k includes the users' budgets, the owner's unit cost, service capacity and service substitutability, and the quantities and prices of services provided in the past;
action A: the action set of the unmanned aerial vehicle owners is expressed as [equation not reproduced], where $\Delta q_{hk}(t)$ is the additional quantity of service to be provided relative to the previous time slot;
state transition probability P: expressed as $P: S\times A\times S\to[0,1]$; given action $a_t$, the system state moves from $s_t$ to $s_{t+1}$ with probability $P(s_{t+1}\mid s_t, a_t)$;
reward function R: a mapping $S\times A\to\mathbb{R}$ that gives the instantaneous reward agent k obtains after performing its action in time slot t; the instantaneous reward is calculated by the following formula: [equation not reproduced], so the objective of the unmanned aerial vehicle owner is converted into maximizing the accumulated instantaneous rewards [equation not reproduced];
Step 3): under complete information, constructing an expert policy whose performance reaches the offline optimum, specifically comprising:
under complete information, the optimization problems P1 and P2 are transformed to obtain the relationship between service quantity and price:
P1 and P2 are thereby converted into optimization problems whose only unknown variable is $q_{hk}(t)$; after verifying that the optimal solutions of P1 and P2 coincide, the expert policy is obtained as follows:
1) each of the K experts obtains its optimal service quantity $q_{hk}(t)$ for the current system state by solving the following equation:
where $A_k = (g_o + g_s + g_c b_k)/b_k$; $b_k$ is the service resource capacity of a single unmanned aerial vehicle; $q_{h,-k}$ is the number of services provided in hotspot area h by service providers other than k; and $Q_k = f_{hk}(t)\,[q_{hk}(t)]^{\alpha}$;
2) the actions executed by the K experts in each time slot are recorded together with the system states, observations and rewards to form a data set;
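The auxiliary quantities of item 1) and the data recording of item 2) can be sketched as follows. This is a minimal illustration under stated assumptions: the optimality equation that each expert solves is not restated in the claim text, so it is passed in as a hypothetical callable `solve_q`, along with hypothetical `observe` and `reward` callables; $g_o$, $g_s$, $g_c$, $b_k$, $f_{hk}(t)$ and $\alpha$ are treated as given numbers; `ExpertRecord` and `collect_expert_data` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def coefficient_a(g_o: float, g_s: float, g_c: float, b_k: float) -> float:
    """A_k = (g_o + g_s + g_c * b_k) / b_k, with b_k the capacity of one drone."""
    if b_k <= 0:
        raise ValueError("b_k must be positive")
    return (g_o + g_s + g_c * b_k) / b_k


def effective_quantity(f_hk: float, q_hk: float, alpha: float) -> float:
    """Q_k = f_hk(t) * q_hk(t) ** alpha."""
    return f_hk * q_hk ** alpha


@dataclass
class ExpertRecord:
    """One (time slot, expert) entry of the expert data set."""
    t: int
    agent: int
    state: Dict[str, float]
    observation: Dict[str, float]
    action: float
    reward: float


def collect_expert_data(
    states: List[Dict[str, float]],
    num_agents: int,
    solve_q: Callable[[Dict[str, float], int], float],
    observe: Callable[[Dict[str, float], int], Dict[str, float]],
    reward: Callable[[Dict[str, float], int, float], float],
) -> List[ExpertRecord]:
    """Record state, observation, action and reward for every expert in every slot.

    solve_q, observe and reward are hypothetical stand-ins for the claim's
    optimality equation, observation model and reward formula.
    """
    data: List[ExpertRecord] = []
    prev_q = [0.0] * num_agents
    for t, s in enumerate(states):
        for k in range(num_agents):
            q_opt = solve_q(s, k)
            action = q_opt - prev_q[k]  # the action is the increment delta q_hk(t)
            data.append(ExpertRecord(t, k, s, observe(s, k), action, reward(s, k, q_opt)))
            prev_q[k] = q_opt
    return data
```

For example, `coefficient_a(1.0, 2.0, 3.0, 2.0)` evaluates $(1 + 2 + 3 \cdot 2)/2 = 4.5$. Storing the action as the increment keeps the recorded data in the same action space as the Markov game of step 2).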
Step 4): under local information, constructing an online learning policy for the agents based on the offline expert policy set obtained in step 3), specifically comprising:
first, under partial observation, each agent must predict its opponents' policies; based on occupancy-measure matching, a relation between the policy $\pi_k$ of agent k and the opponent policy $\pi_{-k}$ can be established as follows:
where o denotes the observation; by training the agents with a generative adversarial network, the optimization problem can be converted into the following form:
where $\mathbb{E}_{\pi_k, \pi_{-k}}[\cdot]$ denotes the expectation under the agent policies $\pi_k$ and $\pi_{-k}$, and $D_k$ denotes the output of the generative adversarial network; the problem is solved by finding the saddle point $(\pi_k, D_k)$;
second, the agent policy models are trained to solve for the saddle point $(\pi_k, D_k)$.
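The saddle-point search over $(\pi_k, D_k)$ follows the generative-adversarial imitation pattern, in which $D_k$ acts as a discriminator separating expert observation-action pairs from the agent's own, and its output becomes a surrogate reward for updating $\pi_k$. Below is a minimal sketch of the discriminator half only, assuming a logistic-regression discriminator over concatenated (o, a) feature vectors, $D_k$ = probability that a pair came from the expert, and the common surrogate reward $-\log(1 - D_k)$. This is the generic GAIL-style construction, not the patent's exact objective. The policy-update half (e.g., a policy-gradient step on the surrogate reward) is omitted, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_discriminator(expert: np.ndarray, agent: np.ndarray,
                        lr: float = 0.1, steps: int = 500) -> np.ndarray:
    """Logistic discriminator D(x) = sigmoid(w.x + b), labelled 1 for expert pairs.

    Gradient ascent on the log-likelihood: sum of log D over expert pairs plus
    sum of log(1 - D) over agent pairs. Returns the parameter vector [w, b].
    """
    x = np.vstack([expert, agent])
    x = np.hstack([x, np.ones((x.shape[0], 1))])  # bias column
    y = np.concatenate([np.ones(len(expert)), np.zeros(len(agent))])
    theta = np.zeros(x.shape[1])
    for _ in range(steps):
        p = sigmoid(x @ theta)
        theta += lr * x.T @ (y - p) / len(y)  # averaged log-likelihood gradient
    return theta


def surrogate_reward(theta: np.ndarray, pairs: np.ndarray) -> np.ndarray:
    """Reward -log(1 - D(o, a)): larger when a pair looks more expert-like."""
    x = np.hstack([pairs, np.ones((pairs.shape[0], 1))])
    d = sigmoid(x @ theta)
    return -np.log(np.clip(1.0 - d, 1e-8, 1.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expert_pairs = rng.normal(loc=1.0, scale=0.3, size=(200, 2))
    agent_pairs = rng.normal(loc=-1.0, scale=0.3, size=(200, 2))
    theta = train_discriminator(expert_pairs, agent_pairs)
    r_exp = surrogate_reward(theta, expert_pairs).mean()
    r_agt = surrogate_reward(theta, agent_pairs).mean()
    print(r_exp > r_agt)  # expert-like pairs receive the higher reward
```

Alternating this discriminator fit with a policy update that maximizes the surrogate reward is what drives both players toward the saddle point.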
2. The method for dynamically deploying unmanned aerial vehicle resources based on differentiated services according to claim 1, wherein, to meet user demand, the unmanned aerial vehicles administered by the same owner form a mesh network hovering above hotspot position h; nodes in the mesh network can communicate with each other and perform adaptive load balancing, whereas unmanned aerial vehicles administered by different owners do not communicate with each other; a user only needs to upload its service demand to the nearest unmanned aerial vehicle of its preferred type.
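The user-side rule of claim 2 (upload the demand to the nearest unmanned aerial vehicle of the preferred type) amounts to a filtered nearest-neighbour selection. A minimal sketch, assuming planar (x, y) positions in a common frame and Euclidean distance; the `Drone` fields and `select_drone` are hypothetical names, and the forwarding inside the owner's mesh for load balancing is not modelled.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Drone:
    drone_id: str
    owner: str          # owner administering this UAV (and its mesh network)
    service_type: str   # type of service the UAV offers
    position: Tuple[float, float]


def select_drone(user_pos: Tuple[float, float], preferred_type: str,
                 drones: List[Drone]) -> Optional[Drone]:
    """Return the nearest UAV offering the user's preferred type, or None."""
    candidates = [d for d in drones if d.service_type == preferred_type]
    if not candidates:
        return None
    return min(candidates, key=lambda d: math.dist(user_pos, d.position))


if __name__ == "__main__":
    fleet = [
        Drone("u1", "A", "video", (0.0, 0.0)),
        Drone("u2", "B", "video", (3.0, 4.0)),
        Drone("u3", "A", "compute", (1.0, 1.0)),
    ]
    print(select_drone((2.5, 3.5), "video", fleet).drone_id)  # u2
```

Because the user contacts only one UAV, redistributing the load is left to the nodes of that owner's mesh network, as the claim describes.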
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110625142.7A (CN113242556B) | 2021-06-04 | 2021-06-04 | Unmanned aerial vehicle resource dynamic deployment method based on differentiated services |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113242556A | 2021-08-10 |
CN113242556B | 2022-08-23 |
Family ID: 77136840
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202110625142.7A (CN113242556B) | Active | 2021-06-04 | 2021-06-04 | Unmanned aerial vehicle resource dynamic deployment method based on differentiated services |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113242556B |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105979603A (en) * | 2016-06-24 | 2016-09-28 | 贵州宇鹏科技有限责任公司 | Unmanned aerial vehicle uplink scheduling method for information flow QoS guarantee based on TD-LTE technology |
CN111786713A (en) * | 2020-06-04 | 2020-10-16 | 大连理工大学 | Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning |
CN112118556A (en) * | 2020-03-02 | 2020-12-22 | 湖北工业大学 | Unmanned aerial vehicle track and power joint optimization method based on deep reinforcement learning |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10452951B2 (en) * | 2016-08-26 | 2019-10-22 | Goodrich Corporation | Active visual attention models for computer vision tasks |
CN108594858B (en) * | 2018-07-16 | 2020-10-27 | 河南大学 | Unmanned aerial vehicle searching method and device for Markov moving target |
CN110263388A (en) * | 2019-05-30 | 2019-09-20 | 东华大学 | A kind of multiple unmanned plane cooperative system performance estimating methods based on stochastic Petri net |
CN110488859B (en) * | 2019-07-15 | 2020-08-21 | 北京航空航天大学 | Unmanned aerial vehicle route planning method based on improved Q-learning algorithm |
KR102223736B1 (en) * | 2019-07-22 | 2021-03-05 | 엘지전자 주식회사 | A speech processing method using an artificial intelligence device |
CN111193536B (en) * | 2019-12-11 | 2021-06-04 | 西北工业大学 | Multi-unmanned aerial vehicle base station track optimization and power distribution method |
CN111787509B (en) * | 2020-07-14 | 2021-11-02 | 中南大学 | Unmanned aerial vehicle task unloading method and system based on reinforcement learning in edge calculation |
CN112308372A (en) * | 2020-09-22 | 2021-02-02 | 合肥工业大学 | Data and model combined driven air-ground patrol resource dynamic scheduling method and system |
CN112507622B (en) * | 2020-12-16 | 2022-06-21 | 中国人民解放军国防科技大学 | Anti-unmanned aerial vehicle task allocation method based on reinforcement learning |
CN112702714B (en) * | 2020-12-28 | 2021-12-14 | 湖南大学 | Unmanned aerial vehicle cooperative type vehicle networking operation task unloading method |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |