CN111405569A - Computation offloading and resource allocation method and device based on deep reinforcement learning - Google Patents

Computation offloading and resource allocation method and device based on deep reinforcement learning Download PDF

Info

Publication number
CN111405569A
Authority
CN
China
Prior art keywords
reinforcement learning
resource allocation
deep reinforcement
computing
resources
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010197729.8A
Other languages
Chinese (zh)
Inventor
周欢
江恺
冯阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Three Gorges University CTGU
Original Assignee
China Three Gorges University CTGU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Three Gorges University CTGU filed Critical China Three Gorges University CTGU
Priority to CN202010197729.8A priority Critical patent/CN111405569A/en
Publication of CN111405569A publication Critical patent/CN111405569A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/02 Resource partitioning among network components, e.g. reuse partitioning
    • H04W16/10 Dynamic resource partitioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/22 Traffic simulation tools or models
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention provides a computation offloading and resource allocation method and device based on deep reinforcement learning, wherein the method comprises the following steps: constructing an optimization problem model based on the computation task parameters of the UE, the performance parameters of the UE, the channel parameters between the UE and the AP, and the total computing resources of the mobile edge computing (MEC) server; and determining the optimal solution of the optimization problem model based on deep reinforcement learning, the optimal solution comprising the offloading decision of each UE, the percentage of the total computing resources allocated to the UE, and the percentage of the total spectrum resources allocated to the UE. The method and device for computation offloading and resource allocation based on deep reinforcement learning simultaneously consider the actual characteristics of computation offloading and resource allocation in a time-varying MEC system, the delay threshold of each task, and the limited resource capacity constraints of the system. Based on deep reinforcement learning, a DNN is used to effectively approximate the value function in reinforcement learning, so as to determine a joint optimal scheme of computation offloading and resource allocation and further reduce the energy consumption of the UEs.

Description

Computation offloading and resource allocation method and device based on deep reinforcement learning
Technical Field
The invention relates to the technical field of mobile communication, in particular to a computation offloading and resource allocation method and device based on deep reinforcement learning.
Background
In order to alleviate the increasingly serious conflict between application requirements and resource-constrained User Equipment (UE), Mobile Cloud Computing (MCC) has emerged as an effective solution, since the computing and storage capabilities of the cloud servers deployed in MCC are significantly higher than those of the UE. However, MCC inevitably faces the problem that the deployed cloud server is far away from the user equipment, which causes additional transmission energy overhead when the user equipment transmits data to the cloud server. In addition, long-distance transmission cannot guarantee the Quality of Service (QoS) of delay-sensitive applications.
In the prior art, Mobile Edge Computing (MEC) technology has been proposed, which moves part of the network functions to the network edge for execution. Compared with MCC, MEC is an important component of the emerging 5G architecture for handling compute-intensive tasks, and extends the capabilities of MCC by extending cloud computing services from a centralized cloud to the edge of the network. MEC allows user equipment to offload workload to an adjacent MEC server through a Base Station (BS) or an Access Point (AP), which can improve the QoS of mobile applications and significantly reduce the execution delay and power consumption of tasks.
Existing schemes, however, only focus on the performance of quasi-static systems and ignore the influence of different resource requirements and limited resource capacity on the performance of the MEC system, so the technical problem of excessive UE energy consumption still exists in practical network applications.
Disclosure of Invention
The embodiments of the invention provide a computation offloading and resource allocation method and device based on deep reinforcement learning, which are used for solving the above technical problems in the prior art.
In order to solve the above technical problem, in one aspect, an embodiment of the present invention provides a computation offloading and resource allocation method based on deep reinforcement learning, including:
constructing an optimization problem model based on the computation task parameters of the terminal UE, the performance parameters of the UE, the channel parameters between the UE and the access point AP, and the total computing resources of the mobile edge computing MEC server;
and determining an optimal solution of the optimization problem model based on deep reinforcement learning, wherein the optimal solution comprises the offloading decision of the UE, the percentage of the total computing resources that the MEC server allocates to the UE, and the percentage of the total spectrum resources that the AP allocates to the UE.
Further, the calculation task parameters include the amount of calculation resources required to complete the calculation task, the data size of the calculation task, and the maximum tolerable delay for executing the calculation task.
Further, the performance parameters include the energy consumed by the CPU per cycle when the computation task is executed locally, the transmission power when uploading data, and the power consumption in the standby state.
Further, the channel parameters include a channel bandwidth of an available spectrum, a channel gain of a wireless transmission channel, and a power of white gaussian noise inside the channel.
Further, the objective of the optimization problem model is to minimize the long-term energy consumption of all UEs in the system.
Further, the constraint conditions of the optimization problem model are as follows:
a. the offloading decision of the UE can only choose local execution or edge execution to handle its computational tasks;
b. the execution time of local or offloaded computation cannot exceed the maximum tolerable delay of the corresponding computation task;
c. the sum of the computing resources allocated to all UEs cannot exceed the total computing resources that the MEC server can provide;
d. the computing resources allocated to any UE cannot exceed the total computing resources that the MEC server can provide;
e. the sum of the spectrum resources allocated to all UEs cannot exceed the total spectrum resources that the AP can provide;
f. the spectrum resources allocated to any UE cannot exceed the total spectrum resources that the AP can provide.
Further, the determining an optimal solution of the optimization problem model based on the deep reinforcement learning specifically includes:
determining a state space, an action space and a return function according to the optimization problem model;
constructing a Markov decision problem;
and calculating the Markov decision problem based on deep reinforcement learning, estimating an action value function value by utilizing a deep neural network DNN, and determining the optimal solution of the optimization problem model.
In another aspect, an embodiment of the present invention provides a device for computation offload and resource allocation based on deep reinforcement learning, including:
the building module is used for constructing an optimization problem model based on the computation task parameters of the terminal UE, the performance parameters of the UE, the channel parameters between the UE and the access point AP, and the total computing resources of the mobile edge computing MEC server;
and the determining module is used for determining the optimal solution of the optimization problem model based on deep reinforcement learning, wherein the optimal solution comprises the offloading decision of the UE, the percentage of the total computing resources allocated to the UE by the MEC server, and the percentage of the total spectrum resources allocated to the UE by the AP.
In another aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method provided by the first aspect when executing the computer program.
In yet another aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the method provided in the first aspect.
The method and device for computation offloading and resource allocation based on deep reinforcement learning provided by the embodiments of the present invention simultaneously consider the actual characteristics of computation offloading and resource allocation in a time-varying MEC system, the delay threshold of each task, and the limited resource capacity constraints of the system; based on deep reinforcement learning, a DNN is used to effectively approximate the value function in reinforcement learning, so as to determine the joint optimal scheme of computation offloading and resource allocation and further reduce the energy consumption of the UEs.
Drawings
FIG. 1 is a schematic diagram of a deep reinforcement learning-based computation offloading and resource allocation method according to an embodiment of the present invention;
fig. 2 is a schematic view of a scenario of a multi-user mobile edge network model according to an embodiment of the present invention;
FIG. 3 is a diagram of a convergence analysis based on deep reinforcement learning according to an embodiment of the present invention;
fig. 4 is a schematic diagram of energy consumption of all users under different UE numbers according to an embodiment of the present invention;
fig. 5 is a schematic diagram of energy consumption of all users under different total computing resources of the MEC server according to the embodiment of the present invention;
FIG. 6 is a schematic diagram of an apparatus for computation offloading and resource allocation based on deep reinforcement learning according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
With the advent of many emerging wireless services in 5G networks, mobile applications, especially more and more compute-intensive tasks such as online interactive gaming, face recognition, and augmented/virtual reality (AR/VR), have resulted in an unprecedented explosive increase in data traffic. In general, these emerging applications have high requirements for quality of service (QoS) and delay sensitivity, which results in such applications consuming more power than legacy applications. However, considering the physical size and production cost constraints of User Equipments (UEs), the current UEs have certain limitations in terms of computation, resources, energy, etc., which may become new bottlenecks faced in the challenges of handling large-scale applications or providing persistent energy supply.
To alleviate the increasingly severe conflict between application requirements and resource-constrained UEs, Mobile Cloud Computing (MCC) has emerged as an effective solution, in view of the significantly higher computing and storage capabilities of the cloud servers deployed in MCC compared with the UEs. MCC technology can conveniently access a shared resource pool in a centralized "cloud" to provide storage, computing, and energy resources for UEs by offloading workload from the UEs to a cloud server. However, MCC inevitably faces the problem that the deployed cloud server is far away from the user equipment, which causes additional transmission energy overhead when the user equipment transmits data to the cloud server. In addition, long-distance transmission cannot guarantee the QoS of delay-sensitive applications.
Therefore, some researchers have proposed a Mobile Edge Computing (MEC) technology, which introduces part of the network functions to the network Edge to perform. MEC is an important component of the emerging 5G architecture to handle compute intensive tasks, extending the capabilities of MCC by extending cloud computing services from a centralized cloud to the edge of the network, as compared to MCC. In particular, the MEC supports user equipment to offload workload to an adjacent MEC server by using a Base Station (BS) or an Access Point (AP), which can improve QoS of mobile applications and significantly reduce execution delay and power consumption of tasks.
In view of the actual computation offloading and resource allocation characteristics in time-varying MEC systems, reinforcement learning has been considered a suitable method for obtaining optimal computation strategies. In particular, without any a priori information about the system environment, the agent can learn from feedback on future returns by observing the environment, thereby achieving a strategy with optimal long-term objectives. This feature gives reinforcement learning great potential for designing offloading decisions and resource allocation schemes in dynamic systems. However, in practical network applications, most previous research only focuses on the performance of quasi-static systems, rarely considers the delay-sensitive characteristics and time-varying conditions of the system in the time domain, and often ignores the influence of different resource requirements and limited resource capacity on the performance of the MEC system. In addition, in such complex dynamic computation offloading scenarios, the state space and the action space in reinforcement learning may grow exponentially with the number of UEs, so that the conventional reinforcement learning method cannot maintain the Q table due to the curse of dimensionality or memory limitations, and a large delay would be incurred in searching for the corresponding value in such a huge table.
In order to solve these problems, it is necessary to handle the delay thresholds of heterogeneous computational tasks and the uncertain, dynamic resource requirements of different tasks, while using a Deep Neural Network (DNN) instead of a Q table. Therefore, the present invention is directed to the joint optimization problem of offloading decisions and resource allocation for task execution in MEC; it models the corresponding problem as a nonlinear integer problem from the perspective of energy consumption, aims to minimize the total energy consumption of all UEs, and simultaneously considers the delay constraints and resource requirements of different computational tasks in the optimization problem.
Fig. 1 is a schematic view of a computation offloading and resource allocation method based on deep reinforcement learning according to an embodiment of the present invention, and as shown in fig. 1, an implementation subject of the computation offloading and resource allocation method based on deep reinforcement learning according to an embodiment of the present invention is a computation offloading and resource allocation apparatus based on deep reinforcement learning. The method comprises the following steps:
step S101, calculating total calculation resources of the MEC server based on calculation task parameters of the terminal UE, performance parameters of the UE, channel parameters between the UE and the access point AP and mobile edges, and constructing an optimization problem model.
Specifically, fig. 2 is a schematic view of a scenario of a multi-user mobile edge network model according to an embodiment of the present invention. As shown in fig. 2, a single-cell scenario is considered in a mobile edge computing network, where the scenario includes an Access Point (AP) and n users, and the set of users may be represented as I = {1, 2, …, n}. In order to provide the MEC service for the UEs, a group of MEC servers is deployed on the AP for computation offloading, and the UEs within the cell may offload their workloads to the MEC servers over the wireless link to assist in the computation. Suppose the system operates over time slices of fixed length, t ∈ T = {0, 1, 2, …, T}, and there is one compute-intensive task per UE to process in any time slice t. At the same time, all arriving computing tasks are considered to be atomic, i.e. they cannot be split into parts for processing, which means that the computing task of a UE cannot be executed on different devices; it can only be executed on the local device by means of the UE's own computing resources, or offloaded to the MEC server on the AP over the wireless link to perform the computation. When multiple tasks on different devices need to be offloaded simultaneously, the MEC server operator needs to decide how to optimally allocate spectrum resources and computing resources to each UE according to the time-varying system conditions, the task heterogeneity, and the energy overhead of all UEs under different conditions.
Without loss of generality, the embodiments of the present invention employ a widely used task model to describe the tasks arriving at the UEs. Within each time slice, any computational task on UE_i may be defined by three parameters:

$$H_i = \left(s_i,\ c_i,\ \tau_i^{\max}\right)$$

where s_i represents the data size of computational task H_i, and c_i indicates the amount of computing resources required to complete H_i. The variables c_i and s_i are independent and identically distributed within each time slice, and may follow an arbitrary probability distribution that is not necessarily known. τ_i^max indicates the maximum tolerable delay for executing task H_i, which means that, regardless of whether the task is executed on the local device or offloaded for computation, its execution time on any UE should not exceed the delay threshold τ_i^max.
Further, assume that the UE is always within the communication coverage of the AP during computation offloading. The embodiments of the present invention only consider executing tasks on the local device or offloading them to the MEC server deployed on the AP for execution, without further considering offloading tasks to a remote cloud or other macro base stations. An integer variable x_i ∈ {0, 1} is used to indicate the offloading decision of UE_i within a certain time slice t, where x_i = 0 denotes that task H_i is computed directly on the CPU of the local device UE_i, and x_i = 1 denotes that UE_i offloads task H_i to the MEC server for execution. Thus, the offloading decision vector of all users in the overall MEC system may be defined as η = {x_1, x_2, x_3, ..., x_n}.
1) Communication model: when the computing task is difficult to execute on the local device under the given constraints, the UE may offload the computing task to the MEC server deployed on the AP over the wireless link. It is assumed that the UE employs orthogonal frequency division when communicating with the AP, and the communication overhead between the MEC server and the AP is ignored. Meanwhile, because there is only one AP in the cell and the problem of overlapping coverage between adjacent cells is not considered, communication interference between users can be ignored. Now assuming that multiple UEs upload their computational tasks to the AP at the same time, the MEC system can allocate bandwidth according to the real-time needs of the UEs by using dynamic spectrum access. Let θ_i ∈ [0, 1] be defined as the percentage of the total spectrum resources that the AP allocates to a single UE_i. Therefore, when UE_i offloads its computing task to the AP, the channel upload rate R_i between UE_i and the AP can be expressed as:

$$R_i = \theta_i\, W \log_2\!\left(1 + \frac{p_i g_i}{\sigma}\right)$$

where W represents the channel bandwidth of the spectrum available between UE_i and the AP, p_i is the transmission power of UE_i when uploading data, g_i is the channel gain of the wireless transmission channel between UE_i and the AP, and σ is the power of the complex Gaussian white noise in the channel.
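For clarity, the following is a minimal illustrative sketch (not part of the original patent text) of how the upload rate R_i could be evaluated from the communication-model quantities defined above; the Shannon-type rate expression follows the description above, and the function name and numeric values are placeholders.

```python
import math

def upload_rate(theta_i: float, W: float, p_i: float, g_i: float, sigma: float) -> float:
    """Channel upload rate R_i (bit/s) of UE_i when a fraction theta_i of the
    total bandwidth W (Hz) is allocated to it; p_i is the transmission power,
    g_i the channel gain and sigma the noise power (all in linear units)."""
    return theta_i * W * math.log2(1.0 + p_i * g_i / sigma)

# Placeholder numbers purely for illustration.
R_i = upload_rate(theta_i=0.2, W=20e6, p_i=0.5, g_i=1e-6, sigma=1e-9)
```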
2) Computation model: computing task H_i can be executed locally relying on UE_i's own computing resources, or executed on the MEC server through computation offloading. These two computation models are presented below:

Local execution model: when x_i = 0, task H_i will be computed and processed locally by UE_i. Let f_i^l and e_i denote, respectively, the local computing capability (CPU cycles per second) of UE_i and the energy consumed by the CPU per cycle when executing the computing task locally. Thus, in this case, the computing processing time required by task H_i is:

$$t_i^{l} = \frac{c_i}{f_i^{l}}$$

and the corresponding energy consumption of UE_i can be calculated by:

$$E_i^{l} = c_i\, e_i$$

where the value of e_i depends on the actual CPU chip architecture.
Mobile edge execution model: when x_i = 1, UE_i chooses to offload computing task H_i to the MEC server connected to the AP for execution, and the MEC server returns the computation result to the UE after processing the task. It should be noted here that, since the data amount of the returned result is small and the downlink transmission rate from the AP to the UE is high in most cases, the transmission time and energy consumption spent in returning the result can be ignored. In summary, the total processing time of task H_i mainly comprises two parts: the first part is the time consumed to transmit task H_i from the UE to the MEC server over the wireless link, and the second part is the time consumed to perform the computation of task H_i on the MEC server.

The time taken to transmit task H_i from UE_i to the MEC server is directly related to the input data size s_i and the upload rate R_i of UE_i, so:

$$t_i^{tr} = \frac{s_i}{R_i}$$

Accordingly, the transmission energy spent to transmit task H_i from UE_i to the MEC server can be calculated as:

$$E_i^{tr} = p_i\, t_i^{tr} = \frac{p_i s_i}{R_i}$$

where p_i is the transmission power between UE_i and the AP.

Let β_i ∈ [0, 1] be defined as the percentage of the total MEC server resources represented by the computing resources allocated by the MEC server to a single UE_i, and let f_mec be the total amount of computing resources owned by the MEC server. Therefore, β_i f_mec represents the amount of computing resources allocated by the MEC server to UE_i within any time slice. When a high percentage of the computing resources is allocated to a certain UE, the execution time of its task becomes shorter, but the energy consumed by this process may also increase accordingly. At the same time, the variables β_i must satisfy the total resource allocation constraint

$$\sum_{i \in I} \beta_i \le 1.$$

Thus, the time taken by the MEC server to process task H_i can be given by:

$$t_i^{mec} = \frac{c_i}{\beta_i f_{mec}}$$

While the MEC server performs the computation task for UE_i, UE_i should wait for the result returned after the task execution is completed. During this time, assume that UE_i is in a standby mode, and define the power consumption of UE_i in the standby state as p_i^{idle}. Thus, the corresponding energy consumption of UE_i in this state is:

$$E_i^{idle} = p_i^{idle}\, t_i^{mec}$$

Therefore, combining the above calculation process, when computation offloading is performed, the total execution time and the corresponding energy consumption of the task on UE_i are both composed of two parts, namely the communication process and the computation process, which are respectively expressed as:

$$t_i^{off} = t_i^{tr} + t_i^{mec}$$

$$E_i^{off} = E_i^{tr} + E_i^{idle}$$
3) Energy consumption model: in the MEC system, UE_i has to select one computation mode to execute computing task H_i. Thus, for any UE_i within a certain time slice, its execution latency can be expressed as:

$$t_i = (1 - x_i)\, t_i^{l} + x_i\, t_i^{off}$$

Likewise, within a certain time slice, the energy consumed by a single UE_i in order to complete the arrived computing task H_i can be expressed as:

$$E_i = (1 - x_i)\, E_i^{l} + x_i\, E_i^{off}$$

Finally, the total energy consumption of all UEs in this MEC system can be derived, which is expressed as:

$$E_{total} = \sum_{i \in I} E_i$$
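As an informal illustration of the computation and energy models above, the following Python sketch (not part of the original patent text) evaluates the local and offloading delays and energies and combines them with the binary offloading decision x_i; the symbols mirror the definitions above (f_i^l, e_i, f_mec, β_i, p_i^idle), and the function and field names are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Task:
    s_i: float        # data size of the task (bits)
    c_i: float        # required computing resources (CPU cycles)
    tau_max: float    # maximum tolerable delay (s)

def local_cost(task: Task, f_local: float, e_per_cycle: float):
    """Local execution: delay t_i^l = c_i / f_i^l, energy E_i^l = c_i * e_i."""
    t_l = task.c_i / f_local
    return t_l, task.c_i * e_per_cycle

def offload_cost(task: Task, rate: float, p_tx: float, beta_i: float,
                 f_mec: float, p_idle: float):
    """Edge execution: upload time + MEC processing time; transmission energy
    plus standby energy while waiting for the result."""
    t_tr = task.s_i / rate
    t_mec = task.c_i / (beta_i * f_mec)
    return t_tr + t_mec, p_tx * t_tr + p_idle * t_mec

def ue_cost(task: Task, x_i: int, **kw):
    """Combine both modes with the binary offloading decision x_i."""
    t_l, e_l = local_cost(task, kw["f_local"], kw["e_per_cycle"])
    t_o, e_o = offload_cost(task, kw["rate"], kw["p_tx"], kw["beta_i"],
                            kw["f_mec"], kw["p_idle"])
    return ((1 - x_i) * t_l + x_i * t_o,
            (1 - x_i) * e_l + x_i * e_o)
```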
the joint optimization problem related to computation offloading and resource allocation in the MEC system proposed by the embodiments of the present invention aims to minimize long-term energy consumption of all UEs. Considering the maximum tolerable delay constraint of a task, the corresponding constraint optimization problem can be planned as follows:
Figure BDA0002418221630000101
Figure BDA0002418221630000102
Figure BDA0002418221630000103
Figure BDA0002418221630000104
Figure BDA0002418221630000107
Figure BDA0002418221630000105
Figure BDA0002418221630000106
the constraints in the above formula have the following meanings:
constraints (14) indicate that any UE can only select either the local execution model or the edge execution model to handle its computational tasks.
Constraints (15) ensure that neither the local nor the off-load computational model can be executed for more than the maximum tolerable delay for the task.
The constraint (16) indicates that the computational resources allocated to all UEs cannot exceed the total amount of computational resources that the MEC server can provide.
Constraint (17) guarantees that the computing resources allocated to a single UE_i must not exceed the total amount of computing resources that the MEC server can provide.
Constraints (18) ensure that the spectrum resources used by all UEs should be less than the total available spectrum resources of the AP.
Constraint (19) guarantees that the spectrum resources used by a single UE_i cannot exceed the total available spectrum resources of the AP.
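The following sketch (illustrative only, not from the patent) shows one way constraints (14)-(19) could be checked for a candidate joint decision; the helper name and the assumption that per-UE delays are precomputed for the chosen mode are hypothetical.

```python
def is_feasible(x, beta, theta, delays, tau_max):
    """Check constraints (14)-(19) for candidate decision vectors.

    x      : list of 0/1 offloading decisions                 -> (14)
    delays : list of execution delays t_i for the chosen mode,
             compared against per-task thresholds tau_max     -> (15)
    beta   : computing-resource fractions in [0, 1]           -> (16), (17)
    theta  : spectrum fractions in [0, 1]                     -> (18), (19)
    """
    if any(xi not in (0, 1) for xi in x):                          # (14)
        return False
    if any(t > tmax for t, tmax in zip(delays, tau_max)):          # (15)
        return False
    if sum(beta) > 1 or any(not 0 <= b <= 1 for b in beta):        # (16), (17)
        return False
    if sum(theta) > 1 or any(not 0 <= th <= 1 for th in theta):    # (18), (19)
        return False
    return True
```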
Step S102, determining the optimal solution of the optimization problem model based on deep reinforcement learning, wherein the optimal solution comprises the offloading decision of the UE, the percentage of the total computing resources allocated to the UE by the MEC server, and the percentage of the total spectrum resources allocated to the UE by the AP.
Specifically, to solve the above optimization problem, the optimal values of the offloading decision variables {x_i | i ∈ I}, the computing resource allocation variables {β_i | i ∈ I} and the communication resource allocation variables {θ_i | i ∈ I} must be obtained, which minimize the total computation energy consumption under the given delay constraints. Since x_i is a binary variable while β_i and θ_i change dynamically, the system needs to collect a large amount of network state information and make global offloading selection and resource allocation decisions for each UE based on the current state of the network. Because the objective function is a mixed-integer nonlinear programming problem, which is NP-hard, the embodiments of the present invention propose a reinforcement-learning-based approach to replace the traditional optimization approach.
Firstly, the state space, action space and reward function in reinforcement learning are defined, and a Markov decision process is established for the problem to be solved. Then, a method based on deep reinforcement learning is proposed to solve the above optimization problem and reduce the computational complexity.
1) Definition of state space, action space and reward function:
three key elements need to be determined in the reinforcement learning-based method: states, actions, and rewards, which in the context of the present problem may be defined as:
State space: within a certain time slice t, the available computing resources and the available spectrum resources are described by the system states f_avail(t) and W_avail(t), where the former is the percentage of computing resources that are idle in the current MEC server and the latter is the percentage of spectrum resources that are available in the current wireless channel; observing them serves to maintain the constraints on computing resource capacity and communication channel resource capacity. In addition, the energy consumption E(t) of all users in each time slice needs to be observed to compare whether the optimal state has been reached. Thus, the state vector within a certain time slice t can be represented as:

$$z(t) = \{f_{avail}(t),\ W_{avail}(t),\ E(t)\}$$

Action space: in the MEC system provided by the embodiments of the present invention, the MEC server needs to determine the offloading policy of each computing task, i.e. to select the local execution mode or the edge execution mode. In addition, it also needs to determine the percentages of computing resources and spectrum resources allocated to UE_i within a certain time slice t. Therefore, within a certain time slice t, the action vector should contain three parts, namely the offloading decision vector η = {x_1, x_2, ..., x_n} of the UEs, the computing resource allocation vector {β_1, β_2, ..., β_n} and the communication resource allocation vector {θ_1, θ_2, ..., θ_n}. The current action vector can thus be formed by combining possible values of these three parts, which can be specifically expressed as:

$$d(t) = \{x_1, x_2, ..., x_n,\ \theta_1, \theta_2, ..., \theta_n,\ \beta_1, \beta_2, ..., \beta_n\}$$
Reward function: generally, the real-time reward function should be related to the objective function. The optimization goal of the embodiments of the present invention is to obtain the minimum total energy consumption of all users, while the goal of reinforcement learning is to obtain the maximum return. Therefore, the reward value needs to be negatively correlated with the total energy consumption value. Within a certain time slice t, when a certain action d(t) is executed in state z(t), the immediate reward obtained by the agent may be expressed as r(z(t), d(t)). To minimize the energy consumption of all users, the immediate reward is uniformly defined as

$$r(z(t), d(t)) = -E_{total}(t)$$

where E_{total}(t) is the actual total energy consumption in the current state.
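The sketch below (illustrative, not part of the patent text) packs the state, action and reward definitions above into simple Python structures; the field and function names are hypothetical.

```python
import numpy as np

def build_state(idle_compute_frac: float, free_spectrum_frac: float,
                total_energy: float) -> np.ndarray:
    """State z(t) = {f_avail(t), W_avail(t), E(t)}."""
    return np.array([idle_compute_frac, free_spectrum_frac, total_energy],
                    dtype=np.float32)

def build_action(x, theta, beta) -> np.ndarray:
    """Action d(t): offloading decisions + spectrum shares + compute shares."""
    return np.concatenate([np.asarray(x, dtype=np.float32),
                           np.asarray(theta, dtype=np.float32),
                           np.asarray(beta, dtype=np.float32)])

def reward(total_energy: float) -> float:
    """Immediate reward r = -E_total(t): less energy, larger reward."""
    return -total_energy
```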
2) Markov decision process:
The Markov decision process (MDP) is the basis of reinforcement learning. In general, almost all planning problems in reinforcement learning can be described in terms of an MDP. Embodiments of the present invention formulate the computation offloading optimization problem as an MDP, in which the agent continuously learns and makes decisions through iterative interactions with the unknown environment in discrete time steps. Specifically, the agent observes the current state z_t of the environment at each time step, and then selects and executes an allowed action d_t according to a policy π. A policy π is regarded as a mapping from the current state to the corresponding action, and a specific policy π leads to different decision actions d_t in different current states z_t. After that, the agent obtains an immediate reward r_t, and at the same time the system transitions to the next new state z_{t+1}.
For long-term considerations, the state value function V_π(z_t) of the agent when executing policy π in state z_t is determined by the expected long-term discounted return and a discount factor; this state value function can be used to evaluate the long-term impact of implementing policy π in the current state (i.e., to measure the value of a state or of an available state-action pair). Thus, for any initial state z_0, the state value function can be defined as follows:

$$V_\pi(z_0) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_t \,\middle|\, z_0\right]$$

where E_π[·] denotes the expectation and γ ∈ [0, 1) is the discount factor, which indicates the importance of future returns relative to the current return.
Now let z_{t+1} denote the next new state after a certain action d_t is executed in any current state z_t, and let P(z_{t+1} | z_t, d_t) denote the transition probability from state z_t to state z_{t+1}. When the system environment is formulated as an MDP, the state value function V_π(z_t) can be converted into a temporal-difference form by the Bellman equation, specifically:

$$V_\pi(z_t) = r\!\left(z_t, \pi(z_t)\right) + \gamma \sum_{z_{t+1}} P\!\left(z_{t+1} \mid z_t, \pi(z_t)\right) V_\pi(z_{t+1})$$
Through the above process, it can be seen that the purpose of the reinforcement learning agent is to make, in the current state z_t, an optimal control strategy π* that maximizes the expected long-term discounted return. Thus, under the optimal policy π*, the optimization problem in the embodiments of the present invention can be converted into a recursive optimal state value function V*(z_t), specifically:

$$V^{*}(z_t) = \max_{d_t}\left[r(z_t, d_t) + \gamma \sum_{z_{t+1}} P\!\left(z_{t+1} \mid z_t, d_t\right) V^{*}(z_{t+1})\right]$$

subject to constraints (14)-(19).

Then, under the policy π*, the optimal action decision d_t^* for state z_t can be expressed as:

$$d_t^{*} = \arg\max_{d_t}\left[r(z_t, d_t) + \gamma \sum_{z_{t+1}} P\!\left(z_{t+1} \mid z_t, d_t\right) V^{*}(z_{t+1})\right]$$
3) Solution based on deep reinforcement learning:

The traditional reinforcement learning method estimates the optimal action value Q*(z_t, d_t) of each state-allowed-action pair at each time step and stores or updates it in a Q table. For the dynamic environment of the network model, traditional reinforcement learning algorithms let the agent autonomously learn, over the long term, the optimal behavior decision in each specific context at every time step. Such algorithms directly approximate the optimal Q value of any state-action pair instead of modeling the dynamic information of the MDP, and then update the Q value in a maintained two-dimensional Q table after each iteration. Finally, the corresponding policy can be derived by selecting the action that maximizes the Q value in each state. Here, the expected cumulative reward of taking action d_t in state z_t is defined as the state-action Q function, so the expected cumulative reward after performing a certain action d_t is:

$$Q_\pi(z_t, d_t) = \mathbb{E}_\pi\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k} \,\middle|\, z_t, d_t\right]$$

At this point, the relationship between the optimal state value function V*(z_t) and the state-action Q function can easily be obtained as:

$$V^{*}(z_t) = \max_{d_t} Q^{*}(z_t, d_t)$$

Combining this relationship with the recursive optimal state value function above, the optimality equation can be rewritten in terms of the Q function as follows:

$$Q^{*}(z_t, d_t) = r(z_t, d_t) + \gamma \sum_{z_{t+1}} P\!\left(z_{t+1} \mid z_t, d_t\right) \max_{d_{t+1}} Q^{*}(z_{t+1}, d_{t+1})$$
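As a rough illustration (not from the patent) of the Q-table approach described above, the following sketch performs the classic tabular Q-learning update toward the Bellman target; the learning-rate name and the dictionary-based table are assumptions for illustration.

```python
from collections import defaultdict

class QTable:
    """Tabular Q-learning: keep Q(z, d) in a table and move it toward the
    Bellman target r + gamma * max_d' Q(z', d') after each transition."""

    def __init__(self, actions, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)      # keyed by (state, action)
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma

    def best_action(self, state):
        return max(self.actions, key=lambda d: self.q[(state, d)])

    def update(self, z, d, r, z_next):
        target = r + self.gamma * max(self.q[(z_next, d2)] for d2 in self.actions)
        self.q[(z, d)] += self.alpha * (target - self.q[(z, d)])
```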
In order to avoid the bottlenecks of the conventional reinforcement learning method, the present invention adopts a method based on deep reinforcement learning (DRL) to solve the proposed Markov decision problem, and utilizes a Deep Neural Network (DNN) to estimate the action value function. The DRL-based method can approximate the optimal Q value by means of the continuously updated deep neural network parameters θ.

The Q value in DRL can be expressed as follows:

$$Q(z_t, d_t; \theta) \approx Q^{*}(z_t, d_t)$$

where θ is the weight of the main neural network. There is also a target neural network, as described below.

Unlike traditional reinforcement learning methods, the DRL-based method utilizes an experience replay pool: during any time slice t, the DRL agent stores the experienced transition tuple (z_t, d_t, r_t, z_{t+1}) of each time step into the experience replay pool. On the other hand, a fixed Q-target mechanism exists in DRL, which maintains two structurally identical neural networks with different parameters in order to break up correlations. The weights θ^- of the target neural network are periodically updated from the weight coefficients θ_j of the main neural network with a coefficient ζ < 1. The fixed Q-target mechanism is then used to generate the target Q value y_j, which is represented as follows:

$$y_j = r_j + \gamma \max_{d'} Q\!\left(z_{j+1}, d'; \theta^{-}\right)$$

Furthermore, the target Q-network updates its weights only after a number of training steps, rather than at every training step. By doing so, the learning process of the agent becomes more stable.

Throughout the training process, the DRL agent randomly selects a small batch of R samples (z_j, d_j, r_j, z_{j+1}) from the experience replay pool each time. In each iteration, the deep Q function is trained to gradually approach the target value by minimizing the loss function Loss(θ), which may be expressed as follows:

$$Loss(\theta) = \mathbb{E}\!\left[\left(y_j - Q(z_j, d_j; \theta)\right)^{2}\right]$$
The basic idea of the DRL-based method is to first build a deep neural network to capture the correlation between each state-action pair (z_t, d_t) and its value function Q(z_t, d_t; θ). Specifically, the offloading decisions and resource allocation of the MEC system first need to be pre-processed for a sufficiently long time using a randomly chosen strategy; then actions are executed, and the corresponding estimated Q values and some state-transition information are stored into the experience replay pool. Finally, the deep neural network is trained with the input state-action pairs (z_j, d_j) and the output value function Q(z_j, d_j; θ).

Specifically, in each episode, the DRL agent first obtains an initial observation of the MEC system and takes this observed state as the initial state z_0. Then the action d_t to be performed is selected using the ε-greedy strategy, i.e. at each action selection there is a very small probability ε of randomly choosing an action from the action set, and otherwise the action that maximizes the estimated Q value obtained by the main neural network is selected. The agent then performs action d_t, obtains the corresponding reward value r_t from the MEC system and the next observed state z_{t+1}, and at the same time stores the experience tuple (z_t, d_t, r_t, z_{t+1}) of each time step. These arriving samples can be used to train the parameters of the neural network, and in the subsequent training the agent may randomly select a small batch of previous samples from the experience replay pool to train the parameters of the deep neural network. After the target Q value y_j is calculated, the DRL agent updates the parameters θ of the main neural network by minimizing the loss function Loss(θ), using the gradient of Loss(θ) with respect to θ. Therefore, stochastic gradient descent is performed until the state-action Q function converges to the optimal value.
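To make the training procedure above concrete, here is a compact, self-contained DQN-style sketch (illustrative only; it is not the patent's implementation, and the environment interface `env.reset()`/`env.step()`, network sizes and hyper-parameters are assumptions). It follows the elements described above: main and target networks, an experience replay pool, ε-greedy action selection and minimization of Loss(θ) toward the fixed Q-target.

```python
import random
from collections import deque

import torch
import torch.nn as nn

def make_net(state_dim, n_actions):
    # Main/target Q-networks: a small MLP is an illustrative choice.
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

def train_dqn(env, state_dim, n_actions, episodes=200, gamma=0.9,
              lr=0.1, eps=0.1, batch_size=32, target_sync=50):
    q_net = make_net(state_dim, n_actions)
    target_net = make_net(state_dim, n_actions)
    target_net.load_state_dict(q_net.state_dict())
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    replay = deque(maxlen=10000)          # experience replay pool
    step = 0

    for _ in range(episodes):
        z = env.reset()                   # initial observed state z_0
        done = False
        while not done:
            # epsilon-greedy selection over the discretized joint actions d_t
            if random.random() < eps:
                d = random.randrange(n_actions)
            else:
                with torch.no_grad():
                    d = int(q_net(torch.as_tensor(z, dtype=torch.float32)).argmax())
            z_next, r, done = env.step(d) # reward r_t = -E_total(t)
            replay.append((z, d, r, z_next, done))
            z = z_next
            step += 1

            if len(replay) >= batch_size:
                batch = random.sample(replay, batch_size)
                zs, ds, rs, zs1, dn = map(list, zip(*batch))
                zs = torch.as_tensor(zs, dtype=torch.float32)
                zs1 = torch.as_tensor(zs1, dtype=torch.float32)
                rs = torch.as_tensor(rs, dtype=torch.float32)
                dn = torch.as_tensor(dn, dtype=torch.float32)
                ds = torch.as_tensor(ds, dtype=torch.int64).unsqueeze(1)
                q = q_net(zs).gather(1, ds).squeeze(1)
                with torch.no_grad():     # fixed Q-target y_j
                    y = rs + gamma * (1 - dn) * target_net(zs1).max(1).values
                loss = nn.functional.mse_loss(q, y)   # Loss(theta)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            if step % target_sync == 0:   # periodic target-network update
                target_net.load_state_dict(q_net.state_dict())
    return q_net
```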
The embodiments of the present invention consider the actual computation offloading and resource allocation characteristics in a time-varying MEC system, together with the delay threshold of each task and the limited resource capacity constraints of the system, and jointly optimize the offloading decision and the allocation of communication and computing resources in task execution. The corresponding problem is modeled as a mixed-integer nonlinear programming problem from the perspective of energy consumption, with the aim of minimizing the total energy consumption of all UEs.
The technical effects of the above technical solution are verified below with specific experimental data:
in experiments, the present invention considers a small cell with inscribed circle radius, where an AP with MEC server deployed is located in the center of the small cell. In each time slice, a plurality of UEs with computation tasks are randomly distributed in the coverage area of the AP.
The embodiments of the present invention compare the performance of the proposed DRL-based method with that of other baseline methods in a multi-user scenario. The computing capability of each UE is 0.8 GHz and the computing capability of the MEC server on the AP is 6 GHz. The data size of the computing tasks follows a uniform distribution over the interval (12, 16) Mbit, and the number of CPU cycles required to complete the corresponding computing task follows a uniform distribution over the interval (2000, 2500) Megacycles. The maximum tolerable delay of a computing task is 3 s, the learning rate is 0.1, and the reward discount factor γ is 0.9.
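For reference, the simulation parameters listed above could be collected in a small configuration structure such as the following sketch (names are illustrative; values are taken from the description above).

```python
EXPERIMENT_CONFIG = {
    "ue_cpu_hz": 0.8e9,                      # UE computing capability: 0.8 GHz
    "mec_cpu_hz": 6e9,                       # MEC server computing capability: 6 GHz
    "task_size_mbit": (12, 16),              # data size ~ Uniform(12, 16) Mbit
    "task_cycles_megacycles": (2000, 2500),  # required cycles ~ Uniform(2000, 2500)
    "max_delay_s": 3.0,                      # maximum tolerable delay
    "learning_rate": 0.1,
    "discount_factor": 0.9,
}
```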
Among the baseline methods participating in the comparison, "Local First" denotes a method in which the UEs attempt to execute their tasks locally as far as possible while meeting the maximum delay threshold τ_i^max. In contrast, "Offloading First" indicates that the UEs will preferentially offload their tasks to the MEC server for execution. In the Offloading First method, the total communication resources and the computing resources of the MEC server are evenly allocated to each UE. It should be noted that, since the resource requirements of different computing tasks are dynamic in each time slice t, under the constraint of the maximum tolerable delay τ_i^max some UEs may be unable to execute the arriving tasks on the local device because the required computing resources are too large. The key difference between the proposed method and the baseline methods is that the proposed method can dynamically make offloading decisions and allocate computing resources for the executed tasks in the MEC system.
Fig. 3 is a convergence analysis diagram of the proposed DRL-based method according to an embodiment of the present invention. As shown in fig. 3, for the proposed DRL-based method, the reward value in each episode gradually increases with the continuous interaction between the user agent and the MEC system environment; the agent can thus gradually learn an efficient computation offloading policy without any prior information.
Fig. 4 is a schematic diagram of the energy consumption of all users under different numbers of UEs according to an embodiment of the present invention. As shown in fig. 4, when the computing capabilities of the UE and the MEC server are 0.8 GHz and 6 GHz respectively, the total energy consumption of the proposed DRL-based method varies as the number of UEs increases. It can be seen that the total energy consumption of all three methods increases as the number of UEs increases. By comparing the three methods, it can be found that the proposed DRL-based method has the best performance and the least total energy consumption, which indicates that the proposed method is effective.
Fig. 5 is a schematic diagram of the energy consumption of all users under different total computing resources of the MEC server according to an embodiment of the present invention. As shown in fig. 5, when the number of UEs is 5, the proposed DRL-based method and the two other baseline methods are compared under different MEC server computing capacities f_mec. The performance of the proposed DRL-based method is still the best, meaning that the proposed method outperforms the Offloading First method and the Local First method in terms of execution delay and the corresponding energy consumption.
Based on any of the above embodiments, fig. 6 is a schematic diagram of a computation offloading and resource allocation apparatus based on deep reinforcement learning according to an embodiment of the present invention, as shown in fig. 6, an embodiment of the present invention provides a computation offloading and resource allocation apparatus based on deep reinforcement learning, including a construction module 601 and a determination module 602, where:
the building module 601 is configured to build an optimization problem model based on a calculation task parameter of the terminal UE, a performance parameter of the UE, a channel parameter between the UE and the access point AP, and a total calculation resource of the mobile edge calculation MEC server; the determining module 602 is configured to determine an optimal solution of the optimization problem model based on deep reinforcement learning, where the optimal solution includes an offloading decision of the UE, a percentage of a computing resource allocated to the UE by the MEC server to a total computing resource of the UE, and a percentage of a spectrum resource allocated to the UE by the AP to a total spectrum resource of the UE.
Embodiments of the present invention provide a device for computation offload and resource allocation based on deep reinforcement learning, which is configured to perform the method described in any of the above embodiments, and specific steps of performing the method described in one of the above embodiments by using the device provided in this embodiment are the same as those in the corresponding embodiments, and are not described herein again.
The device for computation offloading and resource allocation based on deep reinforcement learning provided by the embodiments of the present invention considers the actual characteristics of computation offloading and resource allocation in a time-varying MEC system, the delay threshold of each task and the limited resource capacity constraints of the system, and determines the joint optimal scheme of computation offloading and resource allocation based on deep reinforcement learning, thereby further reducing the energy consumption of the UEs.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 7, the electronic device includes: a processor (processor) 701, a communication interface (Communications Interface) 702, a memory (memory) 703 and a communication bus 704, wherein the processor 701, the communication interface 702 and the memory 703 communicate with each other through the communication bus 704. The processor 701 may call logic instructions in the memory 703 to perform the following method:
constructing an optimization problem model based on the computation task parameters of the terminal UE, the performance parameters of the UE, the channel parameters between the UE and the access point AP, and the total computing resources of the mobile edge computing MEC server;
and determining an optimal solution of the optimization problem model based on deep reinforcement learning, wherein the optimal solution comprises the offloading decision of the UE, the percentage of the total computing resources that the MEC server allocates to the UE, and the percentage of the total spectrum resources that the AP allocates to the UE.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Further, embodiments of the present invention provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the steps of the above-described method embodiments, for example, including:
constructing an optimization problem model based on the computation task parameters of the terminal UE, the performance parameters of the UE, the channel parameters between the UE and the access point AP, and the total computing resources of the mobile edge computing MEC server;
and determining an optimal solution of the optimization problem model based on deep reinforcement learning, wherein the optimal solution comprises the offloading decision of the UE, the percentage of the total computing resources that the MEC server allocates to the UE, and the percentage of the total spectrum resources that the AP allocates to the UE.
Further, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above method embodiments, for example, including:
constructing an optimization problem model based on the computation task parameters of the terminal UE, the performance parameters of the UE, the channel parameters between the UE and the access point AP, and the total computing resources of the mobile edge computing MEC server;
and determining an optimal solution of the optimization problem model based on deep reinforcement learning, wherein the optimal solution comprises the offloading decision of the UE, the percentage of the total computing resources that the MEC server allocates to the UE, and the percentage of the total spectrum resources that the AP allocates to the UE.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A deep reinforcement learning-based computing offloading and resource allocation method is characterized by comprising the following steps:
constructing an optimization problem model based on the computation task parameters of the terminal UE, the performance parameters of the UE, the channel parameters between the UE and the access point AP, and the total computing resources of the mobile edge computing MEC server;
and determining an optimal solution of the optimization problem model based on deep reinforcement learning, wherein the optimal solution comprises the offloading decision of the UE, the percentage of the total computing resources that the MEC server allocates to the UE, and the percentage of the total spectrum resources that the AP allocates to the UE.
2. The deep reinforcement learning-based computation offloading and resource allocation method according to claim 1, wherein the computation task parameters include an amount of computation resources required to complete the computation task, a data size of the computation task, and a maximum tolerable delay for executing the computation task.
3. The deep reinforcement learning-based computation offloading and resource allocation method according to claim 1, wherein the performance parameters include the energy consumed per CPU cycle when the computing task is executed locally, the transmission power when uploading data, and the power consumption in the standby state.
4. The deep reinforcement learning-based computation offloading and resource allocation method according to claim 1, wherein the channel parameters comprise the channel bandwidth of the available spectrum, the channel gain of the wireless transmission channel, and the power of the white Gaussian noise within the channel.
5. The deep reinforcement learning-based computation offloading and resource allocation method according to claim 1, wherein the objective of the optimization problem model is to minimize the long-term energy consumption of all UEs in the system.
6. The deep reinforcement learning-based computation offloading and resource allocation method according to claim 1, wherein the constraint conditions of the optimization problem model are as follows:
a. the offloading decision of the UE can only select local execution or edge execution for its computing task;
b. the execution time of local or offloaded computation cannot exceed the maximum tolerable delay of the corresponding computing task;
c. the sum of the computing resources allocated to all UEs cannot exceed the total computing resources that the MEC server can provide;
d. the computing resources allocated to any UE cannot exceed the total computing resources that the MEC server can provide;
e. the sum of the spectrum resources allocated to all UEs cannot exceed the total spectrum resources that the AP can provide;
f. the spectrum resources allocated to any UE cannot exceed the total spectrum resources that the AP can provide.
7. The deep reinforcement learning-based computation offloading and resource allocation method according to any one of claims 1 to 6, wherein determining the optimal solution of the optimization problem model based on deep reinforcement learning specifically comprises:
determining a state space, an action space and a reward function according to the optimization problem model;
constructing a Markov decision problem;
and solving the Markov decision problem based on deep reinforcement learning, estimating the action-value function with a deep neural network DNN, and determining the optimal solution of the optimization problem model.
8. A deep reinforcement learning-based computation offloading and resource allocation apparatus, characterized by comprising:
a building module, configured to construct an optimization problem model based on the computing task parameters of the terminal UE, the performance parameters of the UE, the channel parameters between the UE and the access point AP, and the total computing resources of the mobile edge computing MEC server;
and a determining module, configured to determine an optimal solution of the optimization problem model based on deep reinforcement learning, wherein the optimal solution comprises the offloading decision of the UE, the percentage of the total computing resources that the MEC server allocates to the UE, and the percentage of the total spectrum resources that the AP allocates to the UE.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the deep reinforcement learning-based computation offloading and resource allocation method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the deep reinforcement learning-based computation offloading and resource allocation method according to any one of claims 1 to 7.
CN202010197729.8A 2020-03-19 2020-03-19 Calculation unloading and resource allocation method and device based on deep reinforcement learning Pending CN111405569A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010197729.8A CN111405569A (en) 2020-03-19 2020-03-19 Calculation unloading and resource allocation method and device based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010197729.8A CN111405569A (en) 2020-03-19 2020-03-19 Calculation unloading and resource allocation method and device based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN111405569A true CN111405569A (en) 2020-07-10

Family

ID=71414019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010197729.8A Pending CN111405569A (en) 2020-03-19 2020-03-19 Calculation unloading and resource allocation method and device based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111405569A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130218814A1 (en) * 2012-02-20 2013-08-22 Xerox Corporation Method and system for the dynamic allocation of resources based on fairness, throughput, and user behavior measurement
US20190325307A1 (en) * 2018-04-20 2019-10-24 EMC IP Holding Company LLC Estimation of resources utilized by deep learning applications
CN109302709A (en) * 2018-09-14 2019-02-01 重庆邮电大学 The unloading of car networking task and resource allocation policy towards mobile edge calculations
CN110418356A (en) * 2019-06-18 2019-11-05 深圳大学 A kind of calculating task discharging method, device and computer readable storage medium
CN110418416A (en) * 2019-07-26 2019-11-05 东南大学 Resource allocation methods based on multiple agent intensified learning in mobile edge calculations system
CN110557769A (en) * 2019-09-12 2019-12-10 南京邮电大学 C-RAN calculation unloading and resource allocation method based on deep reinforcement learning
US20200008044A1 (en) * 2019-09-12 2020-01-02 Intel Corporation Multi-access edge computing service for mobile user equipment method and apparatus

Cited By (53)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111405568A (en) * 2020-03-19 2020-07-10 三峡大学 Computing unloading and resource allocation method and device based on Q learning
CN111405568B (en) * 2020-03-19 2023-01-17 三峡大学 Computing unloading and resource allocation method and device based on Q learning
CN111918339A (en) * 2020-07-17 2020-11-10 西安交通大学 AR task unloading and resource allocation method based on reinforcement learning in mobile edge network
CN111918339B (en) * 2020-07-17 2022-08-05 西安交通大学 AR task unloading and resource allocation method based on reinforcement learning in mobile edge network
CN111970762A (en) * 2020-08-06 2020-11-20 北京邮电大学 Spectrum allocation method and device and electronic equipment
CN111970762B (en) * 2020-08-06 2022-04-01 北京邮电大学 Spectrum allocation method and device and electronic equipment
CN111970154A (en) * 2020-08-24 2020-11-20 浙江工商大学 Unloading decision and resource allocation method based on deep reinforcement learning and convex optimization
CN111970154B (en) * 2020-08-24 2022-06-10 浙江工商大学 Unloading decision and resource allocation method based on deep reinforcement learning and convex optimization
CN112272390A (en) * 2020-10-20 2021-01-26 广州大学 Processing method and system for task unloading and bandwidth allocation based on physical layer
CN112272390B (en) * 2020-10-20 2023-06-20 广州大学 Processing method and system for task unloading and bandwidth allocation based on physical layer
CN112492591B (en) * 2020-11-06 2022-12-09 广东电网有限责任公司电力调度控制中心 Method and device for accessing power Internet of things terminal to network
CN114339819A (en) * 2020-11-06 2022-04-12 北京航空航天大学 Calculation unloading method based on optimal resource allocation amount and search algorithm
CN114339819B (en) * 2020-11-06 2023-10-03 北京航空航天大学 Computing unloading method based on optimal resource allocation amount and search algorithm
CN112492591A (en) * 2020-11-06 2021-03-12 广东电网有限责任公司电力调度控制中心 Method and device for accessing power Internet of things terminal to network
CN112422346A (en) * 2020-11-19 2021-02-26 北京航空航天大学 Variable-period mobile edge computing unloading decision method considering multi-resource limitation
CN112732359A (en) * 2021-01-14 2021-04-30 广东技术师范大学 Multi-user hybrid computing unloading method and device, electronic equipment and storage medium
CN112929849B (en) * 2021-01-27 2022-03-01 南京航空航天大学 Reliable vehicle-mounted edge calculation unloading method based on reinforcement learning
CN112929849A (en) * 2021-01-27 2021-06-08 南京航空航天大学 Reliable vehicle-mounted edge calculation unloading method based on reinforcement learning
CN112764932A (en) * 2021-01-27 2021-05-07 西安电子科技大学 Deep reinforcement learning-based calculation-intensive workload high-energy-efficiency distribution method
CN112764932B (en) * 2021-01-27 2022-12-02 西安电子科技大学 Deep reinforcement learning-based calculation-intensive workload high-energy-efficiency distribution method
CN112764936A (en) * 2021-01-29 2021-05-07 北京邮电大学 Edge calculation server information processing method and device based on deep reinforcement learning
CN112764936B (en) * 2021-01-29 2022-06-14 北京邮电大学 Edge calculation server information processing method and device based on deep reinforcement learning
CN112862083A (en) * 2021-04-06 2021-05-28 南京大学 Deep neural network inference method and device under edge environment
CN112862083B (en) * 2021-04-06 2024-04-09 南京大学 Deep neural network inference method and device in edge environment
CN113377531B (en) * 2021-06-04 2022-08-26 重庆邮电大学 Mobile edge computing distributed service deployment method based on wireless energy drive
CN113377531A (en) * 2021-06-04 2021-09-10 重庆邮电大学 Mobile edge computing distributed service deployment method based on wireless energy drive
CN113452625A (en) * 2021-06-28 2021-09-28 重庆大学 Deep reinforcement learning-based unloading scheduling and resource allocation method
CN113452625B (en) * 2021-06-28 2022-04-15 重庆大学 Deep reinforcement learning-based unloading scheduling and resource allocation method
CN113435580B (en) * 2021-06-29 2022-06-07 福州大学 DNN application calculation unloading self-adaptive middleware construction method in edge environment
CN113435580A (en) * 2021-06-29 2021-09-24 福州大学 DNN application calculation unloading self-adaptive middleware construction method in edge environment
CN113568727A (en) * 2021-07-23 2021-10-29 湖北工业大学 Mobile edge calculation task allocation method based on deep reinforcement learning
CN113573363B (en) * 2021-07-27 2024-01-23 西安热工研究院有限公司 MEC calculation unloading and resource allocation method based on deep reinforcement learning
CN113573363A (en) * 2021-07-27 2021-10-29 西安热工研究院有限公司 MEC calculation unloading and resource allocation method based on deep reinforcement learning
CN113612843B (en) * 2021-08-02 2022-08-30 吉林大学 MEC task unloading and resource allocation method based on deep reinforcement learning
CN113612843A (en) * 2021-08-02 2021-11-05 吉林大学 MEC task unloading and resource allocation method based on deep reinforcement learning
CN113726858B (en) * 2021-08-12 2022-08-16 西安交通大学 Self-adaptive AR task unloading and resource allocation method based on reinforcement learning
CN113726858A (en) * 2021-08-12 2021-11-30 西安交通大学 Self-adaptive AR task unloading and resource allocation method based on reinforcement learning
CN113626104A (en) * 2021-08-18 2021-11-09 北京工业大学 Multi-objective optimization unloading strategy based on deep reinforcement learning under edge cloud architecture
CN113626104B (en) * 2021-08-18 2023-12-15 北京工业大学 Multi-objective optimization unloading strategy based on deep reinforcement learning under edge cloud architecture
CN113821346A (en) * 2021-09-24 2021-12-21 天津大学 Computation uninstalling and resource management method in edge computation based on deep reinforcement learning
CN113821346B (en) * 2021-09-24 2023-09-05 天津大学 Edge computing unloading and resource management method based on deep reinforcement learning
CN114025359B (en) * 2021-11-01 2024-04-23 湖南大学 Resource allocation and calculation unloading method, system, equipment and medium based on deep reinforcement learning
CN114025359A (en) * 2021-11-01 2022-02-08 湖南大学 Resource allocation and computation unloading method, system, device and medium based on deep reinforcement learning
CN114116209A (en) * 2021-11-12 2022-03-01 中国人民解放军国防科技大学 Spectrum map construction and distribution method and system based on deep reinforcement learning
CN114490057B (en) * 2022-01-24 2023-04-25 电子科技大学 MEC offloaded task resource allocation method based on deep reinforcement learning
CN114490057A (en) * 2022-01-24 2022-05-13 电子科技大学 MEC unloaded task resource allocation method based on deep reinforcement learning
WO2023144926A1 (en) * 2022-01-26 2023-08-03 日本電信電話株式会社 Offload server, offload control method, and offload program
CN115396955A (en) * 2022-08-24 2022-11-25 广西电网有限责任公司 Resource allocation method and device based on deep reinforcement learning algorithm
CN115328638A (en) * 2022-10-13 2022-11-11 北京航空航天大学 Multi-aircraft task scheduling method based on mixed integer programming
CN115328638B (en) * 2022-10-13 2023-01-10 北京航空航天大学 Multi-aircraft task scheduling method based on mixed integer programming
CN115421930A (en) * 2022-11-07 2022-12-02 山东海量信息技术研究院 Task processing method, system, device, equipment and computer readable storage medium
CN115623540B (en) * 2022-11-11 2023-10-03 南京邮电大学 Edge optimization unloading method for mobile equipment
CN115623540A (en) * 2022-11-11 2023-01-17 南京邮电大学 Edge optimization unloading method of mobile equipment

Similar Documents

Publication Publication Date Title
CN111405569A (en) Calculation unloading and resource allocation method and device based on deep reinforcement learning
CN111405568B (en) Computing unloading and resource allocation method and device based on Q learning
CN109729528B (en) D2D resource allocation method based on multi-agent deep reinforcement learning
CN111586696B (en) Resource allocation and unloading decision method based on multi-agent architecture reinforcement learning
CN107766135B (en) Task allocation method based on particle swarm optimization and simulated annealing optimization in moving cloud
Lee et al. An online secretary framework for fog network formation with minimal latency
CN111800828B (en) Mobile edge computing resource allocation method for ultra-dense network
CN113543176B (en) Unloading decision method of mobile edge computing system based on intelligent reflecting surface assistance
Wang et al. Resource management for edge intelligence (EI)-assisted IoV using quantum-inspired reinforcement learning
KR20230007941A (en) Edge computational task offloading scheme using reinforcement learning for IIoT scenario
CN113573363A (en) MEC calculation unloading and resource allocation method based on deep reinforcement learning
CN113821346B (en) Edge computing unloading and resource management method based on deep reinforcement learning
CN113747507B (en) 5G ultra-dense network-oriented computing resource management method and device
CN114980039A (en) Random task scheduling and resource allocation method in MEC system of D2D cooperative computing
CN112689296B (en) Edge calculation and cache method and system in heterogeneous IoT network
Jo et al. Deep reinforcement learning‐based joint optimization of computation offloading and resource allocation in F‐RAN
CN116828534B (en) Intensive network large-scale terminal access and resource allocation method based on reinforcement learning
Hossain et al. Edge orchestration based computation peer offloading in MEC-enabled networks: a fuzzy logic approach
CN115665869A (en) Multi-user collaboration platform and method based on edge calculation and directed acyclic graph
Zhang et al. Computation offloading and shunting scheme in wireless wireline internetwork
Cao 5G communication resource allocation strategy based on edge computing
Cen et al. Resource Allocation Strategy Using Deep Reinforcement Learning in Cloud-Edge Collaborative Computing Environment
Liu et al. A Joint Allocation Algorithm of Computing and Communication Resources Based on Reinforcement Learning in MEC System.
Agbaje et al. Deep Reinforcement Learning for Energy-Efficient Task Offloading in Cooperative Vehicular Edge Networks
Hlophe et al. Prospect-theoretic DRL Approach for Container Provisioning in Energy-constrained Edge Platforms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination