CN111800828B - Mobile edge computing resource allocation method for ultra-dense network - Google Patents
Mobile edge computing resource allocation method for ultra-dense network
- Publication number: CN111800828B (application CN202010597779.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/16—Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/20—Control channels or signalling for resource management
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a mobile edge computing resource allocation method for an ultra-dense network. A NOMA-MEC communication system in the ultra-dense network comprises M = {1,2,…,M} small base stations, each equipped with an MEC server to execute computing tasks offloaded by users. Assuming the set of users served by each small base station is N = {1,2,…,N}, the N users are divided into Y = {1,2,…,Y} groups with K = {1,2,…,K} users in each group. The method addresses the difficulty, in the prior art, of handling mutual interference among users, which degrades users' computing performance.
Description
[Technical Field]
The invention belongs to the technical field of wireless communication, and particularly relates to a mobile edge computing resource allocation method of an ultra-dense network.
[Background Art]
With the rapid development of fifth-generation (5G) mobile communication technology, the deployment of ultra-dense networks (UDNs) has become a major architecture for future systems. A UDN can effectively improve system capacity and data transmission rate to guarantee user quality of service. However, because users' computing power is limited, handling computation-intensive tasks in UDNs is a significant challenge. Mobile edge computing (MEC) has emerged to relieve this computational pressure: users offload computation-intensive tasks to the network edge, reducing their energy consumption and task delay.
In MEC systems, improving the utilization of spectrum resources among users is a key challenge, as it directly affects energy consumption and task delay. As an emerging multiple access method, non-orthogonal multiple access (NOMA) can effectively improve the spectral efficiency of a system by allocating the same resources to multiple users. NOMA has therefore been applied to MEC systems to reduce energy consumption and task delay.
Mean field game (MFG) theory is a tool suited to scenarios with a large population of players and can model the relationship between an individual and the group in a UDN. Specifically, in a UDN the MFG averages the influence among members, simplifying an otherwise complex model.
The authors of reference 1, "Learning deep mean field games for modeling large population behavior" (International Conference on Learning Representations, Vancouver, Canada, Apr. 2018), obtain an equilibrium solution of a mean field game via a Markov decision process (MDP) to predict the evolution of population behavior over time.
Reference 2, "Collaborative Artificial Intelligence (AI) for User-Cell Association in Ultra-Dense Cellular Systems" (IEEE International Conference on Communications Workshops (ICC Workshops), Kansas City, MO, May 2018), proposes a neural Q-learning algorithm to solve the user association problem in ultra-dense network systems.
Unlike the prior art, the present invention models a NOMA-MEC system in a UDN scenario, where each small base station (SBS) is equipped with an MEC server. When a user cannot handle a large number of computing tasks, some tasks are offloaded to the MEC server. First, a user clustering matching algorithm (UCMA) based on channel-gain differences is proposed to cluster users and thereby improve their data rates. Then, with the NOMA-MEC system as the model, an MFG theoretical framework is established, and the deep deterministic policy gradient (DDPG) algorithm from reinforcement learning is used to solve for the equilibrium of the MFG, reducing users' energy consumption and task delay.
[Summary of the Invention]
The invention aims to provide a mobile edge computing resource allocation method for an ultra-dense network, to solve the prior art's difficulty in handling mutual interference among users, which degrades users' computing performance.
The technical scheme adopted by the invention is a mobile edge computing resource allocation method based on an ultra-dense network, in which a NOMA-MEC communication system comprises M = {1,2,…,M} small base stations, each equipped with an MEC server to execute computing tasks offloaded by users; assuming the set of users served by each small base station is N = {1,2,…,N}, the N users are divided into Y = {1,2,…,Y} groups, with K = {1,2,…,K} users in each group;
the resource allocation method is implemented according to the following steps:
step one, constructing an uplink NOMA-MEC communication system, wherein each SBS is provided with an MEC server to serve a plurality of users;
step two, clustering is carried out on all users in the NOMA-MEC communication system according to the difference of channel gains; the users in the clusters adopt a NOMA transmission mode, and the clusters adopt a TDMA transmission mode;
step three, calculating the calculation cost of the user, namely the time delay and the energy consumption when the user processes the task; wherein the computing costs include a local computing cost of the user and an offload computing cost;
step four, modeling the NOMA-MEC communication system as an MFG framework; the user's SINR and channel gain are expressed as the state space, and the user's transmit power, offloading decision factor, and resource allocation factor are expressed as the action space; a reward function of the user is constructed from the user's computation cost;
step five, obtaining the equilibrium solution of the mean field game using a DDPG-based reinforcement learning method, which yields the optimal resource allocation scheme in the mobile edge computing system.
Further, the specific method of the second step is as follows:
in the NOMA-MEC communication system model established in step one, all users served by each SBS are sorted by channel gain, and the M users with the largest channel gains are selected in turn as the first user of each of the M NOMA clusters;
from the remaining users, the user maximizing the sum of channel-gain differences with a NOMA cluster is selected for that cluster according to a greedy matching method;
when the number of users cannot be evenly allocated to the clusters, the surplus users are randomly allocated to different clusters, so that each user in a cluster has a distinct channel gain.
Further, the specific mode of the third step is as follows:
3.1) Local computation cost of the user:
let x_mk denote the offloading variable of the kth user in the mth group. For the local computing model, i.e., when the user completes the computing task locally without offloading it to the MEC server, let f_mk^l > 0 denote the local computing capacity of the kth user in the mth group. When the user executes the task locally, the time is:

t_mk^l = c_mk / f_mk^l (5),
for the energy consumption of local computing, the commonly used computation-energy model ε = κf² is adopted, where κ is an energy coefficient depending on the chip architecture. The local energy consumption of the kth user in the mth group can then be expressed as:

e_mk^l = κ(f_mk^l)² c_mk (6),
according to formulas (5) and (6), the local computation cost of the kth user in the mth group can be expressed as:

Z_mk^l = λ_mk^t t_mk^l + λ_mk^e e_mk^l (7),

where λ_mk^t and λ_mk^e are the weight coefficients of delay and energy consumption, respectively, with λ_mk^t + λ_mk^e = 1.
3.2) Offloading computation cost of the user:
offloading to the MEC server for computation comprises two parts, transmission and computation at the MEC server; the transmission time and execution time are respectively:

t_mk^tr = d_mk / R_mk (8),
t_mk^exe = c_mk / f_s (9),

where f_s is the computing capacity of the MEC server;
the total time of the offloading process is:

t_mk^off = t_mk^tr + t_mk^exe (10);
the energy consumption of the offloading process likewise has two parts, the energy consumed during transmission and the energy consumed executing the computing task at the MEC server, respectively:

e_mk^tr = p_mk t_mk^tr (11),
e_mk^exe = κ f_s² c_mk (12);
according to equations (11) and (12), the total energy consumption of the offloading process is:

e_mk^off = e_mk^tr + e_mk^exe (13);
thus, the offloading computation cost function of the kth user in the mth group is expressed as:

Z_mk^off = λ_mk^t t_mk^off + λ_mk^e e_mk^off (14);
3.3) Total computation cost of the user:
according to 3.1 and 3.2, having obtained the user's local computation cost and offloading computation cost, the overall cost function for the user to complete the computing task can be expressed as:

Z_mk = (1 − x_mk) Z_mk^l + x_mk Z_mk^off (15).
further, the specific steps of the fourth step are as follows:
in the NOMA-MEC system of the ultra-dense network, the SINR and channel gain of the kth user in the mth group form the state space, expressed as:

s_mk(t) = {τ_mk(t), h_mk(t)} (16),
each user selects an action a_mk(t) from the action space A based on its current state s_mk(t); the action of the kth user in the mth group consists of its transmit power, offloading variable, and weight coefficient, and a_mk(t) ∈ A is expressed as:

a_mk(t) = {p_mk(t), x_mk, λ_mk} (17),

where λ_mk denotes the weight coefficient between delay and energy consumption;
according to the analysis of the user's computation cost in step three, the user's cost function is expressed as formula (18); the reward function of the kth user in the mth group is then expressed as formula (19);
in mean field games, the Hamilton-Jacobi-Bellman (HJB) equation and the Fokker-Planck-Kolmogorov (FPK) equation describe the overall system model;
when the kth user in the mth group selects action a_mk(t) in state s_mk(t), its FPK equation can be expressed as:

π_mk(t+1) = π_mk(t) P_mk(p_mk, x_mk, λ_mk) (20),
where π_mk(t+1) is the state distribution of the kth user in the mth group at time (t+1), and P_mk(p_mk, x_mk, λ_mk) is the probability that this user transitions from its state at time t to its state at time (t+1), determined mainly by the user's action;
according to the definition of the reward function, the value function of state s_mk(t) at time t (i.e., the HJB equation) is given by formula (21);
the Nash equilibrium of the MFG is then solved based on the FPK and HJB equations.
Further, the specific mode of the fifth step is as follows:
the DDPG algorithm is adopted to solve for the equilibrium of the MFG, and its objective function is defined in formula (22), where θ^μ is the parameter of the policy network that generates deterministic actions; θ^μ is updated via the policy gradient;
the Actor part mainly contains two networks, an online policy network and a target policy network; the deterministic policy μ directly yields the action at each time step, a_t = μ(s_t | θ^μ). Likewise, the Critic part contains two networks, an online Q network and a target Q network. The Q function (i.e., the action-value function) defined by the Bellman equation is the expected reward of selecting an action under the deterministic policy, and a Q network is used to fit it:

Q^μ(s_t, a_t) = E[R + γ Q(s_{t+1}, μ(s_{t+1}))] (23),
where Q^μ(s_t, a_t) denotes the expected return obtained by selecting action a_t under deterministic policy μ in state s_t. To measure the performance of the policy, a performance objective J_β is defined in formula (24), where β denotes the behavior policy and ρ^β is the probability density function over the state space. In the Critic part, the mean squared error is used as the loss function:

L(θ^Q) = E[(R + γ Q′(s_{t+1}, μ′(s_{t+1} | θ^{μ′}) | θ^{Q′}) − Q(s_t, a_t | θ^Q))²] (25),
thus, the gradient of the loss function L(θ^Q) with respect to θ^Q, given in formula (26), can be obtained by standard backpropagation;
updating along this gradient until the objective function converges yields the optimal policy, i.e., the optimal resource allocation scheme in the mobile edge computing system.
Compared with the prior art, the invention has the following beneficial effects:
1. The NOMA-MEC system is formulated as an MFG theoretical framework, and the equilibrium of the MFG is solved through reinforcement learning, minimizing the user's computation cost, including energy consumption and delay.
2. The invention constructs an uplink NOMA-MEC system in an ultra-dense network, where each SBS is equipped with one MEC server serving multiple users. In this system, all users served by each SBS are divided into clusters by a user clustering algorithm to increase user data rates.
3. The NOMA-MEC system in the ultra-dense network is modeled as an MFG framework; the equilibrium of the MFG is then solved with the DDPG method, and a dynamic resource allocation strategy is learned to reduce users' energy consumption and task delay.
4. Experiments show that the method effectively learns the optimal resource allocation strategy and, compared with other methods, more effectively reduces users' computation delay and energy consumption.
[Description of the Drawings]
FIG. 1 is a block diagram of a system for mobile edge computation for ultra dense networks in accordance with the present invention;
FIG. 2 is a schematic diagram of the relationship between the average field gaming and reinforcement learning algorithms of the present invention;
FIG. 3 is a schematic diagram of the present invention employing a reinforcement learning algorithm to optimize resource allocation in a NOMA-MEC system;
FIG. 4 is a graph showing the relationship between the energy consumption and the maximum transmission power under different algorithm comparisons according to the present invention;
fig. 5 is a schematic diagram showing the relationship between the calculated time delay and the maximum transmitting power under the comparison of different algorithms according to the present invention.
[Detailed Description]
The invention will be described in detail below with reference to the drawings and the detailed description.
Unlike the existing literature, the invention studies resource optimization in an uplink NOMA-MEC system in an ultra-dense network from the perspectives of relieving network resources and overcoming the limitations of mobile devices, combining a deep reinforcement learning algorithm to minimize system delay and energy consumption by optimizing power and offloading strategies.
Step one, constructing a system model:
an uplink NOMA-MEC system is constructed, with one MEC server per SBS to serve multiple users.
The concrete construction mode is as follows:
as shown in fig. 1, the invention considers a NOMA-MEC communication system in an ultra-dense network with M = {1,2,…,M} small base stations, each equipped with an MEC server to execute users' offloaded computing tasks. Assuming the set of users served by each small base station is N = {1,2,…,N}, the users must be grouped to reduce mutual interference. In the invention, the N users are divided into Y = {1,2,…,Y} groups with K = {1,2,…,K} users in each group.
For information transmission, the bandwidth B of the whole system is divided into Y sub-channels, each of bandwidth B_sc = B/Y, and the users in each group transmit simultaneously on their sub-channel.
And step two, clustering all users in the system through a user clustering algorithm to improve the data transmission rate of the users. The users in the clusters adopt a NOMA transmission mode, and the clusters adopt a time division multiple access (Time division multiple access, TDMA) transmission mode.
The specific mode of the second step is as follows:
in the NOMA-MEC communication system model established in step one, all users served by each SBS are sorted by channel gain, and the M users with the largest channel gains are selected in turn as the first user of each of the M NOMA clusters. Next, for each cluster, the user maximizing the sum of channel-gain differences with that cluster is selected from the remaining users according to a greedy matching method. In addition, when the number of users cannot be evenly allocated to the clusters, the surplus users may be randomly allocated to different clusters, so that each user in a cluster has a distinct channel gain.
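The clustering procedure above can be sketched in a few lines of Python. This is an illustrative interpretation, not the patent's exact UCMA: the function name, the balanced-size cap, and the toy channel gains used below are assumptions.

```python
def ucma_cluster(gains, num_clusters):
    """Greedy user clustering by channel-gain difference (UCMA-style sketch).

    gains: {user_id: channel_gain}; returns a list of clusters (lists of ids).
    """
    # Sort users by descending channel gain.
    order = sorted(gains, key=gains.get, reverse=True)
    # The num_clusters strongest users become the first member of each cluster.
    clusters = [[u] for u in order[:num_clusters]]
    # Greedily place each remaining user in the cluster whose members differ
    # most from it in channel gain (maximizing the sum of gain differences).
    size_cap = -(-len(gains) // num_clusters)  # ceil: keeps clusters balanced
    for u in order[num_clusters:]:
        candidates = [c for c in clusters if len(c) < size_cap] or clusters
        best = max(candidates,
                   key=lambda c: sum(abs(gains[u] - gains[v]) for v in c))
        best.append(u)
    return clusters
```

With gains {0.9, 0.8, 0.5, 0.4, 0.2, 0.1} and two clusters, the two strongest users seed the clusters and the remaining users are spread so that strong and weak channels share a cluster, which is what gives NOMA its decoding-order benefit.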
And step three, calculating the calculation cost of the user, namely the time delay and the energy consumption when the user processes the task. Including the local and offload computation costs of the user.
The specific mode of the third step is as follows:
users complete clustering according to the clustering algorithm in step two. Because users within a cluster transmit with NOMA while clusters use TDMA, any transmitting user can be interfered with both by users in the same cluster and by users served by other SBSs in the same time slot.
For users within a NOMA cluster, a user with larger channel gain experiences interference from users with smaller channel gain, while the user with the smallest channel gain experiences no intra-cluster interference. The intra-cluster interference experienced by the kth user in the mth NOMA cluster can thus be expressed as:

I_mk^in = Σ_{f: h_mf < h_mk} p_mf h_mf (1),

where p_mf denotes the transmit power of the fth user in the mth NOMA cluster and h_mf denotes the channel gain of the fth user in the mth group.
Second, in an ultra-dense network, users served by different small base stations interfere with each other when transmitting in the same time slot, which can be expressed as:

I_mk^ex = Σ_{j ≠ m} p_jk h_jk (2),

where p_jk denotes the transmit power of the kth user in the jth group and h_jk denotes the channel gain of the kth user in the jth group.
The SINR of the kth user in the mth group is then expressed as:

τ_mk = p_mk h_mk / (I_mk^in + I_mk^ex + σ²) (3),

where I_mk^in and I_mk^ex are the intra-cluster and inter-cell interference above, and σ² is the power of the additive white Gaussian noise. The data rate of the kth user in the mth group is therefore:

R_mk = W_sc log(1 + τ_mk) (4),

where W_sc = W_total / M and W_total is the system bandwidth.
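Equations (3) and (4) amount to two one-line computations. A minimal sketch, assuming a base-2 logarithm (rate in bit/s) and treating the interference sums as precomputed inputs; the function names and example values are illustrative:

```python
import math

def sinr(p_mk, h_mk, intra_interf, inter_interf, noise_power):
    """SINR of a NOMA user: desired received power over interference plus noise."""
    return (p_mk * h_mk) / (intra_interf + inter_interf + noise_power)

def data_rate(w_sc, tau):
    """Achievable rate R = W_sc * log2(1 + SINR) on a sub-channel of width w_sc Hz."""
    return w_sc * math.log2(1.0 + tau)
```

For example, with transmit power 0.1 W, channel gain 1e-6, no interference, and noise power 1e-7 W, the SINR is 1 and a 1 MHz sub-channel yields 1 Mbit/s.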
The computing task of the kth user in the mth group may be defined as I_mk = {d_mk, c_mk, t_mk^max}, where d_mk denotes the input data required for the kth user in the mth group to complete the computing task, c_mk denotes the number of CPU cycles required to compute d_mk, and t_mk^max denotes the latest time by which the kth user in the mth group must complete the computing task.
Let x_mk denote the offloading variable of the kth user in the mth group. For the local computing model, assume f_mk^l > 0 represents the local computing capacity of the kth user in the mth group; when the user executes the task locally, the time is:

t_mk^l = c_mk / f_mk^l (5),
for the energy consumption of local computing, the commonly used computation-energy model ε = κf² is adopted, where κ is an energy coefficient depending on the chip architecture, so the local energy consumption of the kth user in the mth group can be expressed as:

e_mk^l = κ(f_mk^l)² c_mk (6),
according to formulas (5) and (6), the computation cost of the kth user in the mth group for local computing can be expressed as:

Z_mk^l = λ_mk^t t_mk^l + λ_mk^e e_mk^l (7),

where λ_mk^t and λ_mk^e are the weight coefficients of delay and energy consumption, respectively, with λ_mk^t + λ_mk^e = 1. When λ_mk^t > λ_mk^e, the user is delay-sensitive and cares more about computation time; otherwise, the user is energy-constrained and cares more about the energy consumed by the computing task.
In the process of unloading to the MEC server for calculation, the method comprises two parts of transmission and calculation at the MEC server, wherein the transmission time and the execution time are respectively as follows:
wherein f s Is the computational power of the MEC server. The total time for this unloading process is:
similarly, the energy consumption of the offloading process has two parts, the energy consumed during transmission and the energy consumed executing the computing task at the MEC server, respectively:

e_mk^tr = p_mk t_mk^tr (11),
e_mk^exe = κ f_s² c_mk (12).

According to equations (11) and (12), the total energy consumption of the offloading process can be expressed as:

e_mk^off = e_mk^tr + e_mk^exe (13).
Thus, the cost function of the kth user in the mth group during offloading can be expressed as:

Z_mk^off = λ_mk^t t_mk^off + λ_mk^e e_mk^off (14).
further, the cost function of the kth user in the mth group to complete the computing task can be expressed as
Step four, establishing a cost function:
The NOMA-MEC system is modeled as an MFG framework, in which the user's SINR and channel gain are expressed as the state space, and the user's transmit power, offloading decision factor, and resource allocation factor are expressed as the action space; a reward function of the user is constructed from the user's computation cost.
The specific steps of the fourth step are as follows:
Interference can become severe when many users compute tasks simultaneously, which sharply reduces users' data transmission rates and thus increases the delay and energy consumption of offloading computing tasks. Since each user is an independent individual that, in an ultra-dense scenario, considers only its own interest, the invention expresses this model within the MFG theoretical framework.
The state of each user comes only from its own local observations. In the NOMA-MEC system of the ultra-dense network, the SINR and channel gain of the kth user in the mth group form the state space, expressed as:

s_mk(t) = {τ_mk(t), h_mk(t)} (16),
each user selects an action a_mk(t) from the action space A based on its current state s_mk(t); the action of the kth user in the mth group consists of its transmit power, offloading variable, and weight coefficient, and a_mk(t) ∈ A is expressed as:

a_mk(t) = {p_mk(t), x_mk, λ_mk} (17),

where λ_mk denotes the weight coefficient between delay and energy consumption.
The object of the invention is to minimize the user's computation cost subject to a maximum delay. From the analysis of the user's computation cost in step three, the user's cost function can be expressed as formula (18); the reward function of the kth user in the mth group can then be expressed as formula (19).
In mean field games, the Hamilton-Jacobi-Bellman (HJB) equation and the Fokker-Planck-Kolmogorov (FPK) equation describe the overall system model. When the kth user in the mth group selects action a_mk(t) in state s_mk(t), its FPK equation can be expressed as:

π_mk(t+1) = π_mk(t) P_mk(p_mk, x_mk, λ_mk) (20),
where π_mk(t+1) is the state distribution of the kth user in the mth group at time (t+1), and P_mk(p_mk, x_mk, λ_mk) is the probability that this user transitions from its state at time t to its state at time (t+1), determined mainly by the user's action.
According to the definition of the reward function, the value function of state s_mk(t) at time t (i.e., the HJB equation) is given by formula (21).
the Nash equilibrium solution for MFG can be solved based on FPK and HJB equations.
And fifthly, acquiring an equilibrium solution of the average field game by using a reinforcement learning method based on DDPG.
The specific mode of the fifth step is as follows:
The DDPG algorithm, which can handle continuous action spaces, is adopted to solve for the equilibrium of the MFG; the relationship between the MFG and reinforcement learning is shown in fig. 2. The DDPG algorithm is applicable to resource optimization problems in many communication scenarios.
A schematic diagram of optimizing resource allocation in the NOMA-MEC system with the DDPG algorithm is shown in fig. 3. DDPG follows an Actor-Critic framework, so the algorithm is described in terms of its Actor and Critic parts. Given an input state s, the Actor part outputs a specific action a through the deterministic policy μ so as to optimize Q(s, a); given the state s and the action a, the Critic part outputs Q(s, a), which is updated by the Bellman equation. The objective function of the DDPG algorithm is defined in formula (22), where θ^μ is the parameter of the policy network that generates deterministic actions; θ^μ is updated via the policy gradient.
The Actor part mainly contains two networks, an online policy network and a target policy network; the deterministic policy μ directly yields the action at each time step, a_t = μ(s_t | θ^μ). Likewise, the Critic part contains two networks, an online Q network and a target Q network. The Q function (i.e., the action-value function) defined by the Bellman equation is the expected reward of selecting an action under the deterministic policy, and a Q network is used to fit it:

Q^μ(s_t, a_t) = E[R + γ Q(s_{t+1}, μ(s_{t+1}))] (23),
where Q^μ(s_t, a_t) denotes the expected return obtained by selecting action a_t under deterministic policy μ in state s_t. To measure the performance of the policy, a performance objective J_β is defined in formula (24), where β denotes the behavior policy and ρ^β is the probability density function over the state space. The purpose of training is to maximize the performance objective J_β while minimizing the loss of the Q network. In the Critic part, the mean squared error is used as the loss function:
L(θ^Q) = E[(R + γ Q′(s_{t+1}, μ′(s_{t+1} | θ^{μ′}) | θ^{Q′}) − Q(s_t, a_t | θ^Q))²] (25),
thus, the gradient of the loss function L(θ^Q) with respect to θ^Q, given in formula (26), can be obtained by standard backpropagation.
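The Critic-side quantities can be sketched as scalar helpers: the TD target and mean-squared loss of (25), plus the soft (Polyak) target-network update that standard DDPG uses between gradient steps. The τ value and the list-of-weights representation are illustrative assumptions, not the patent's implementation.

```python
def td_target(r, gamma, q_next):
    """Critic target y = R + gamma * Q'(s_{t+1}, mu'(s_{t+1})), per eq. (25)."""
    return r + gamma * q_next

def critic_loss(q_values, targets):
    """Mean-squared TD error used as the Critic loss in eq. (25)."""
    n = len(q_values)
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / n

def soft_update(theta_target, theta_online, tau=0.001):
    """Polyak averaging of target-network parameters, standard in DDPG."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(theta_online, theta_target)]
```

The target networks change slowly (small τ), which is what keeps the bootstrapped target in (25) stable while the online networks are trained.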
examples:
The illustrations and specific parameter values in the following example mainly serve to explain the basic idea of the invention and to verify it by simulation; in a specific application environment, they can be adjusted appropriately to the actual scenario and requirements.
The invention studies a NOMA-MEC system in an ultra-dense network in which 60 small base stations are randomly distributed within a 10 km × 10 km area, each small base station has a coverage radius of 20 m, and 64 users are randomly distributed near the small base stations.
To implement the DDPG algorithm, the Actor and Critic networks use fully connected neural networks with three hidden layers of 300 neurons each. For the Actor network, the output layer uses a Sigmoid activation function to ensure the final action output lies in (0, 1); for the Critic network, each layer uses a ReLU activation function. The learning rates of the Actor and Critic networks are set to 0.0001 and 0.001, respectively.
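The described Actor layout (three hidden layers with ReLU, Sigmoid output) can be sketched as a NumPy forward pass. The `make_actor` helper, its random initialization scale, and the small dimensions used below are illustrative assumptions; the network is untrained and only demonstrates the shape and output range of the policy.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_actor(state_dim, action_dim, hidden=300, seed=0):
    """Random (W, b) pairs for a 3-hidden-layer actor, per the experiment setup."""
    rng = np.random.default_rng(seed)
    dims = [state_dim, hidden, hidden, hidden, action_dim]
    return [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def actor_forward(params, state):
    """Forward pass: ReLU hidden layers, Sigmoid output so actions lie in (0, 1)."""
    h = state
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return sigmoid(h @ W + b)
```

The Sigmoid output layer guarantees every component of the action vector (power, offloading, and weight factors) stays strictly inside (0, 1), matching the normalization described above.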
Figs. 4 and 5 show the effect of the maximum transmit power for different algorithms and different multiple access modes. In fig. 4, it can be observed that the energy consumption of the system gradually increases with the maximum transmit power. For a fixed maximum transmit power, the NOMA scheme achieves lower energy consumption, because users in a NOMA cluster can transmit simultaneously over the full spectrum resources, reducing system energy consumption. As seen in fig. 5, the computation delay decreases as the maximum transmit power increases, because a larger maximum transmit power increases both the user's computation speed and data transmission rate, reducing computation delay.
Claims (5)
1. A mobile edge computing resource allocation method for an ultra-dense network, characterized in that:
the resource allocation method is based on an ultra-dense network; the NOMA-MEC communication system in the ultra-dense network comprises a number of small base stations, each provided with an MEC server to execute the computation tasks offloaded by users; it is assumed that each small base station serves a set of N users, and the N users are divided into M groups, each group containing K users;
the resource allocation method is implemented according to the following steps:
step one, constructing an uplink NOMA-MEC communication system, wherein each small base station SBS is provided with an MEC server to serve a plurality of users;
step two, clustering is carried out on all users in the NOMA-MEC communication system according to the difference of channel gains; the users in the clusters adopt a NOMA transmission mode, and the clusters adopt a TDMA transmission mode;
step three, calculating the calculation cost of the user, namely the time delay and the energy consumption when the user processes the task; wherein the computing costs include a local computing cost of the user and an offload computing cost;
step four, modeling the NOMA-MEC communication system as an MFG framework; the SINR and channel gain of a user constitute the state space, and the user's transmit power, offloading decision factor and resource allocation factor constitute the action space; the user's reward function is constructed from the user's computation cost;
step five, acquiring the equilibrium solution of the mean field game, namely the optimal resource allocation scheme in the mobile edge computing system, by using a DDPG-based reinforcement learning method.
2. The method for allocating mobile edge computing resources of an ultra-dense network according to claim 1, wherein the specific method in the second step is as follows:
in the NOMA-MEC communication system model established in step one, all users served by each SBS are sorted by channel gain, and the M users with the largest channel gains are selected in turn as the first user of each of the M NOMA clusters;
from the remaining users, the user that maximizes the sum of channel gain differences within a NOMA cluster is selected for that cluster according to a greedy matching method;
when the number of users cannot be divided evenly among the clusters, the surplus users are randomly assigned to different clusters, and the channel gains of the users within a cluster are all distinct.
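The clustering in claim 2 can be sketched as follows. This is a simplified illustration, not the patent's implementation: the greedy step here compares each user's gain only against the cluster head's gain (the claim's "sum of channel gain differences" over all cluster members is reduced to this for brevity), and all names are invented.

```python
import numpy as np

def cluster_users(gains, M, rng=np.random.default_rng(0)):
    """Sort by channel gain, seed M clusters with the M strongest users,
    greedily fill clusters by gain difference, randomly place surplus users."""
    order = list(np.argsort(gains)[::-1])      # strongest first
    heads, rest = order[:M], order[M:]
    clusters = [[h] for h in heads]
    K = len(gains) // M                        # target cluster size
    fill, surplus = rest[: M * K - M], rest[M * K - M:]
    for u in fill:
        # pick the non-full cluster maximizing the gain difference to its head
        best = max((i for i in range(M) if len(clusters[i]) < K),
                   key=lambda i: abs(gains[heads[i]] - gains[u]))
        clusters[best].append(u)
    for u in surplus:                          # redundant users: random clusters
        clusters[int(rng.integers(M))].append(u)
    return clusters

gains = np.array([9.0, 8.0, 7.0, 6.0, 1.0, 2.0, 3.0, 4.0])
clusters = cluster_users(gains, M=2)
print(clusters)   # every user lands in exactly one of the two clusters
```

Pairing strong and weak users in a cluster keeps the intra-cluster gain gap large, which is what makes successive interference cancellation in NOMA effective.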
3. The method for allocating mobile edge computing resources of an ultra-dense network according to claim 1 or 2, wherein the specific manner of the third step is:
3.1) Cost of local computation for the user:
let x_mk denote the offloading variable of the kth user in the mth group; in the local computing model the user completes the computing task locally without offloading it to the MEC server; let f_mk denote the local computing capacity of the kth user in the mth group and c_mk the number of CPU cycles required for its local computation; when the user executes the task locally, the time is:
when computing the energy consumption of local computing, the commonly used computation energy model ε = κf² is adopted, where ε denotes the local computation energy consumption and κ is an energy coefficient determined by the chip architecture; the local energy consumption of the kth user in the mth group can thus be expressed as:
according to formulas (5) and (6), the local computation cost of the kth user in the mth group can be expressed as:
wherein λ_mk and (1 − λ_mk) represent the weight coefficients of delay and energy consumption, respectively;
3.2) Offloading computational cost for the user:
the process of offloading to the MEC server for computation comprises two parts, transmission and computation at the MEC server; the transmission time and the execution time are respectively:
wherein f_s is the computing capability of the MEC server and R_mk represents the data transmission rate of the kth user in the mth group;
the total time of the offloading process is:
the energy consumption of the offloading process likewise has two parts, the energy consumed during transmission and the energy consumed executing the computation task at the MEC server, which are respectively:
wherein p_mk represents the transmit power of the user;
according to equations (11) and (12), the total energy consumption of the offloading process is expressed as:
thus, the offloading computation cost function of the kth user in the mth group is expressed as:
3.3) Total computation cost for the user:
according to 3.1) and 3.2), the local computation cost and the offloading computation cost of the user are obtained, and the overall cost function for the user to complete the computation task can be expressed as:
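The cost model of claim 3 can be walked through with concrete numbers. Every parameter value below is invented for illustration; the formulas follow the claim: local delay c/f, local energy ε = κf², offloading delay = transmission time + execution time, and a λ-weighted combination of delay and energy.

```python
kappa = 1e-27        # chip-dependent energy coefficient (assumed)
c_mk = 1e9           # CPU cycles required by the task (assumed)
f_local = 1e9        # user's local computing capacity, cycles/s (assumed)
f_s = 10e9           # MEC server computing capability, cycles/s (assumed)
d_mk = 1e6           # task data size in bits (assumed, for transmission)
R_mk = 2e6           # uplink data transmission rate, bits/s (assumed)
p_mk = 0.5           # user transmit power, W (assumed)
lam = 0.5            # weight between delay and energy

# local computing cost: delay c/f, energy kappa * f^2
t_loc = c_mk / f_local
e_loc = kappa * f_local ** 2
C_loc = lam * t_loc + (1 - lam) * e_loc

# offloading cost: transmit the task, then execute it on the MEC server
t_tr, t_exe = d_mk / R_mk, c_mk / f_s
e_off = p_mk * t_tr          # transmission energy (server-side energy omitted here)
C_off = lam * (t_tr + t_exe) + (1 - lam) * e_off

x_mk = 1 if C_off < C_loc else 0   # offload when it is the cheaper option
print(round(C_loc, 3), round(C_off, 3), x_mk)
```

With these numbers offloading wins: the server computes ten times faster, so the saved execution time outweighs the transmission delay and energy.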
4. The method for allocating mobile edge computing resources of an ultra-dense network according to claim 1 or 2, wherein the step four comprises the following specific steps:
in the NOMA-MEC system of the ultra-dense network, the SINR and channel gain of the kth user in the mth group constitute its state, and the state space is expressed as:
s_mk(t) = {τ_mk(t), h_mk(t)}   (16),
wherein τ_mk(t) represents the signal-to-interference-plus-noise ratio of the user and h_mk(t) represents the channel gain of the user;
each user selects an action a_mk(t) from the action space according to its current state s_mk(t); the action of the kth user in the mth group consists of its power, offloading variable and weight coefficient, and is expressed as:
a_mk(t) = {p_mk(t), x_mk, λ_mk}   (17),
wherein λ_mk represents the weight coefficient of delay and energy consumption, p_mk(t) represents the data transmission power of the user, and x_mk represents the offloading variable of the user;
according to the analysis of the user computation cost in step three, the cost function of the user is expressed as:
therefore, the reward function for the kth user in the mth group is expressed as:
wherein the two terms represent the local computation cost and the offloading computation cost of the kth user in the mth group, respectively;
in average field gaming, the Hamilton-Jacobi-Bellman (HJB) equation and Fokker-Planck-Kolmogorov (FPK) equation describe the overall system model;
when the kth user in the mth group selects action a_mk(t) in state s_mk(t), its FPK equation can be expressed as:
wherein π_mk(t+1) is the state of the kth user in the mth group at time (t+1), and P_mk(p_mk, x_mk, λ_mk) is the probability that the kth user in the mth group transitions from its state at time t to its state at time (t+1), which is determined mainly by the user's action;
according to the definition of the reward function, the value function of state s_mk(t) at time t is expressed as:
wherein V_t^μ(s_mk) represents the value function of selecting strategy μ at time t and R(p_mk, x_mk, λ_mk | s_mk) represents the reward function; the Nash equilibrium solution of the MFG is then solved based on the FPK and HJB equations.
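The value function in claim 4 obeys a Bellman recursion, which a toy finite-state example makes concrete. The two-state transition matrix and rewards below are invented; the sketch only shows that iterating the backup V ← R + γ P V converges to the fixed point (I − γP)⁻¹R for a fixed policy.

```python
import numpy as np

gamma = 0.9
P = np.array([[0.8, 0.2],        # row-stochastic transition probabilities
              [0.3, 0.7]])       # under some fixed policy (invented)
R = np.array([1.0, 0.0])         # reward of each state under that policy

V = np.zeros(2)
for _ in range(500):
    V = R + gamma * P @ V        # Bellman backup

# closed form for comparison: V* = (I - gamma * P)^(-1) R
V_star = np.linalg.solve(np.eye(2) - gamma * P, R)
print(V, V_star)
```

In the mean field game the transition kernel P itself evolves with the population distribution via the FPK equation, which is why claim 5 resorts to DDPG rather than this direct iteration.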
5. The method for allocating mobile edge computing resources of an ultra-dense network according to claim 1 or 2, wherein the specific manner of the fifth step is:
the DDPG algorithm is adopted to solve the equilibrium solution of the MFG, and the objective function of the DDPG algorithm is defined as:
wherein θ^μ is the parameter of the policy network that generates deterministic actions and is updated by means of policy gradients, E represents the expected value of the function, γ represents the weighting value of the reward function, and R_i represents the reward function value;
the Actor part mainly contains two networks, an online policy network and a target policy network; the deterministic policy μ directly yields a determined action a_t = μ(s_t | θ^μ) at each moment; like the Actor part, the Critic part also contains two networks, an online Q network and a target Q network; the Q function defined by the Bellman equation is the expected reward of selecting actions under the deterministic policy, and is fitted with a Q network, namely:
wherein Q^μ(s_t, a_t) represents the expected value obtained by selecting action a_t in state s_t under the deterministic strategy μ; to measure the performance of the policy, the performance target is defined as follows:
wherein s represents the state of the user, drawn from the user state set and obeying the probability density function ρ^β, E represents the expected value of the function, β represents the behavior policy, and ρ^β is the probability density function of the state space; in the Critic part, the mean square error is used as the loss function, namely:
wherein E represents the expected value of the function, R represents the value of the reward function, γ represents the weighting value of the reward function, μ′ represents a deterministic strategy, Q′ represents the expected value obtained with the deterministic strategy μ′, θ^Q represents the Q network parameters generating the expected value under strategy μ, θ^μ′ represents the parameters of the policy network μ′ that generates deterministic actions, and θ^Q′ represents the Q network parameters generating the expected value under strategy μ′;
thus, the gradient of the loss function L with respect to θ^Q can be obtained from a standard back propagation algorithm, namely:
the gradient is updated in real time until the objective function converges, finally yielding the optimal strategy, namely the optimal resource allocation scheme in the mobile edge computing system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010597779.5A CN111800828B (en) | 2020-06-28 | 2020-06-28 | Mobile edge computing resource allocation method for ultra-dense network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111800828A CN111800828A (en) | 2020-10-20 |
CN111800828B true CN111800828B (en) | 2023-07-18 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107819840A (en) * | 2017-10-31 | 2018-03-20 | 北京邮电大学 | Distributed mobile edge calculations discharging method in the super-intensive network architecture |
CN109548013A (en) * | 2018-12-07 | 2019-03-29 | 南京邮电大学 | A kind of mobile edge calculations system constituting method of the NOMA with anti-eavesdropping ability |
CN109951897A (en) * | 2019-03-08 | 2019-06-28 | 东华大学 | A kind of MEC discharging method under energy consumption and deferred constraint |
CN110798849A (en) * | 2019-10-10 | 2020-02-14 | 西北工业大学 | Computing resource allocation and task unloading method for ultra-dense network edge computing |
CN111245539A (en) * | 2020-01-07 | 2020-06-05 | 南京邮电大学 | NOMA-based efficient resource allocation method for mobile edge computing network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR3072851B1 (en) * | 2017-10-23 | 2019-11-15 | Commissariat A L'energie Atomique Et Aux Energies Alternatives | REALIZING LEARNING TRANSMISSION RESOURCE ALLOCATION METHOD |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111800828B (en) | Mobile edge computing resource allocation method for ultra-dense network | |
CN109729528B (en) | D2D resource allocation method based on multi-agent deep reinforcement learning | |
Chen et al. | Multiuser computation offloading and resource allocation for cloud–edge heterogeneous network | |
Li et al. | Downlink transmit power control in ultra-dense UAV network based on mean field game and deep reinforcement learning | |
CN113873022A (en) | Mobile edge network intelligent resource allocation method capable of dividing tasks | |
CN111405569A (en) | Calculation unloading and resource allocation method and device based on deep reinforcement learning | |
CN109947545A (en) | A kind of decision-making technique of task unloading and migration based on user mobility | |
CN112788605B (en) | Edge computing resource scheduling method and system based on double-delay depth certainty strategy | |
CN110856259A (en) | Resource allocation and offloading method for adaptive data block size in mobile edge computing environment | |
CN116456493A (en) | D2D user resource allocation method and storage medium based on deep reinforcement learning algorithm | |
CN113490219B (en) | Dynamic resource allocation method for ultra-dense networking | |
Cheng et al. | Efficient resource allocation for NOMA-MEC system in ultra-dense network: A mean field game approach | |
CN113590279A (en) | Task scheduling and resource allocation method for multi-core edge computing server | |
CN114828018A (en) | Multi-user mobile edge computing unloading method based on depth certainty strategy gradient | |
CN113573363A (en) | MEC calculation unloading and resource allocation method based on deep reinforcement learning | |
CN114980039A (en) | Random task scheduling and resource allocation method in MEC system of D2D cooperative computing | |
Zhou et al. | Joint multi-objective optimization for radio access network slicing using multi-agent deep reinforcement learning | |
Bhandari et al. | Optimal Cache Resource Allocation Based on Deep Neural Networks for Fog Radio Access Networks | |
CN116321293A (en) | Edge computing unloading and resource allocation method based on multi-agent reinforcement learning | |
Ma et al. | On-demand resource management for 6G wireless networks using knowledge-assisted dynamic neural networks | |
Gao et al. | Multi-armed bandits scheme for tasks offloading in MEC-enabled maritime communication networks | |
CN113821346B (en) | Edge computing unloading and resource management method based on deep reinforcement learning | |
Geng et al. | Deep reinforcement learning-based computation offloading in vehicular networks | |
Han et al. | Multi-step reinforcement learning-based offloading for vehicle edge computing | |
CN114219074A (en) | Wireless communication network resource allocation algorithm dynamically adjusted according to requirements |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||