CN110798849A

CN110798849A - Computing resource allocation and task unloading method for ultra-dense network edge computing

Info

Publication number: CN110798849A
Application number: CN201910959379.1A
Authority: CN
Inventors: 刘家佳; 郭鸿志; 孙文; 张海宾; 周小艺; 吕剑锋
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2019-10-10
Filing date: 2019-10-10
Publication date: 2020-02-14

Abstract

A computing resource allocation and task unloading method for ultra-dense network edge computing comprises the following steps: step 1, establishing a system model based on an ultra-dense network edge computing network of an SDN (software defined network), and acquiring network parameters; step 2, obtaining parameters required by edge calculation: sequentially carrying out local calculation and unloading to an edge server of a macro base station and an edge server connected with a small base station s to obtain an uplink data rate for transmitting a calculation task; step 3, obtaining an optimal computing resource allocation and task unloading strategy by adopting a Q-learning scheme; and step 4, obtaining an optimal computing resource allocation and task unloading strategy by adopting the DQN scheme. It is applicable to dynamic systems by stimulating agents to find optimal solutions on the basis of learning variables. In the Reinforcement Learning (RL) algorithm, Q-Learning performs well in some time-varying networks. By combining the deep learning technology with Q-learning, a learning scheme based on a Deep Q Network (DQN) is provided, so that the benefits of mobile equipment and operators are optimized simultaneously in a time-varying environment, and the learning time is shorter and the convergence is faster than that of a method based on Q-learning. The method achieves the benefit of optimizing Mobile Devices (MDs) and operators simultaneously in a time-varying environment based on DQN.

Description

Computing resource allocation and task unloading method for ultra-dense network edge computing

Technical Field

The invention belongs to the technical field of intelligent computers, and particularly relates to a computing resource allocation and task unloading method for ultra-dense network edge computing.

Background

In today's society, ever increasing Mobile Devices (MDs) with innovative applications place unprecedented demands on user experience and network capacity expansion. The ultra-dense network (UDN) can provide enough baseband resources and ubiquitous connectivity for widely distributed mobile equipment, and the Mobile Edge Computing (MEC) can well meet the requirements of high computing resources and low delay of various novel Internet of things applications. Therefore, the combination of ultra-dense networks and mobile edge computing is considered a promising future technology that can significantly increase the capacity of the system and extend the cloud computing power to the nearest edge servers to meet the ever-increasing computing demands of mobile devices.

However, how to optimize the computing resource configuration to maximize the operator's operating revenue while reducing the cost of the mobile device while meeting the different computing requirements of the mobile device has become a challenge to be solved. In terms of mobile equipment, as a key technology in mobile edge computing, Mobile Edge Computing Offloading (MECO) is an effective scheme for improving the benefit of mobile equipment by selecting an optimal offloading strategy. For operators, the computing resources of the edge servers in different computing demand areas are reasonably configured, so that the operating cost (OPEX) can be remarkably reduced. However, most of the conventional optimization schemes focus on one-time optimization targets in certain scenes and situations, and it is difficult to achieve long-term unloading performance in a changing real-world environment.

Disclosure of Invention

The present invention aims to provide a method for allocating computing resources and unloading tasks for ultra-dense network edge computing, so as to solve the above problems.

In order to achieve the purpose, the invention adopts the following technical scheme:

a computing resource allocation and task unloading method for ultra-dense network edge computing comprises the following steps:

step 1, establishing a system model based on an ultra-dense network edge computing network of an SDN (software defined network), and acquiring network parameters: calculating the average task rate reaching the edge server of the macro base station and the average task rate reaching the edge server of the small base station according to the number of the mobile devices, the number of the macro base stations, the number of the wireless channels of the small base stations connected to the macro base station and the number of the wireless channels connected to the small base stations s in the scene;

step 2, obtaining parameters required by edge calculation: sequentially carrying out local calculation and unloading to an edge server of a macro base station and an edge server connected with a small base station s to obtain an uplink data rate for transmitting a calculation task;

step 3, obtaining an optimal computing resource allocation and task unloading strategy by adopting a Q-learning scheme;

and step 4, obtaining an optimal computing resource allocation and task unloading strategy by adopting the DQN scheme.

Further, in step 1, network parameters are obtained: number of mobile devices in scene C, set

Represents; the number of macro base stations is 1, the number of small base stations is B, and the set is usedRepresents; the number of radio channels connected to the macro base station is W^mThe number of radio channels connected to the small base station s is W^s(ii) a The computing task types are E in total, and are represented by epsilon {1,2, … E }, the arrival and processing processes of the tasks adopt an M/M/1 queuing model, and the average task rate of the edge server of the macro base station is

The average task rate to the edge server of the small base station is

The transmission power of mobile m is p_m,nAnd a channel gain between the macro base station and the base station of

And a small base station has a channel gain of

The dependent variables define: c mobile devices are distributed randomly and covered by 1 macro base station and B small base stations, and the distance from each device to the macro base station and the small base stations is D_mUse of sets

Denotes the distance of all devices to all base stations, wherein

α for n-type calculation task requested by Mobile Device (MD) m_m,nRepresentation, computation task feature set ofWherein i_m,nSetting the size of the task to 300-; o_m,nSetting the CPU period required for processing the computing task as 100-1000 Megacycles;set to 0.5-3s for maximum allowable processing delay α_m,nIs described as an offload decision set

Wherein the content of the first and second substances,indicating that the task is to be computed locally,

indicating that the task should be offloaded to an edge server connected to the macro base station,

indicating that mobile device m selects to offload task α_m,nEdge server to connected small base station, wherein

σ²The background noise power is set to-100 dbm.

Further, define

All tasks to be processed at time t; definition ofFor the total computing resources at all edge servers time t,

wherein

a_s(t) is the total resources of all edge servers connecting the macro base station and the small base station; definition of

The computing resources of the seed are used at time t for all edge servers,

wherein sigma_m(t) and b_s(t) for connecting macro base station and small base stationResources being used by all edge servers; definition of

To allocate policies for computing resources to all edge servers,

further, in the step 2,

a. local compute task α_m,nQueue condition v of_m,nSetting the queuing delay introduced per decision periodSecond, CPU cycle o required to process a computational task_m,nAnd computing resources q of a particular mobile device_m. Locally calculating total delay

Calculated from the following formula:

b. computation tasks α off-load to edge servers of macro base station_m,nQueue condition v of_m,nSetting the queuing delay introduced per decision periodSecond, calculate size of task i_m,nCPU cycles o required to process a computational task_m,nAnd the computing resource size q of the edge server connected with the macro base station^mBetween 16-32 GHz. Uplink data rate for transmitting computational tasksCalculated from the following formula:

final total delay of the unloading mode

Calculated from the following formula:

c. computation tasks α off-loaded to edge servers connecting small cells s_m,nQueue condition v of_m,nSetting the queuing delay introduced per decision period

Second, calculate size of task i_m,nCPU cycles o required to process a computational task_m,nAnd the computing resource size q of the edge server connected with the macro base station^sBetween 4-8 GHz. Uplink data rate for transmitting computational tasks

Calculated from the following formula

Final total delay of the unloading mode

Calculated from the following formula:

the calculation time for which three calculation schemes can be obtained is

Further, step 3 specifically includes:

1) initializing a Q table, setting all Q values to be 0, and setting a discount factor gamma and a learning rate α;

2) defining system states

the system state at time t isα therein_m,n(t) is a calculation task feature, v_m,n(t) is the task queuing state,

computing resources for the total of all edge servers at time t-1;

computing resources of the seeds are used for all the edge servers at the time t;

the distance between the mobile equipment and all edge servers is collected;

3) defining actions

: the set of actions at time t is

I.e. a computing resource allocation strategy for all edge servers;

4) defining a reward function

Calculating processing time from edges

And price function

Wherein mu₁The price of computing resources per time unit for edge servers connected to macro base stations is set to 0.7, mu₂The price of computing resources in unit time unit for an edge server connected with a small base station is set as 1; pi_m,nThe size of the computing resources allocated to the respective computing task; the normalized conversion was defined as γ (x), and the normalized user benefit-expenditure utility was calculated as

R_m,n(t) earnings after processing the calculation tasks; the reward value function at time t is

For the total number of processing tasks at time t,

a_s(t) Total resources, σ, of all edge servers connecting the macro base station and the small base station_m(t)，b_s(t) is the resources being used by all edge servers connecting the macro base station and the small base station;

5) observing the current system state s, and executing corresponding action a according to the Q (s, a) value stored in the Q table, namely resource allocation; then observing the next system state s 'after the action a is executed, and according to the current system state s, the executed action a and the next system state s', obtaining the system state

Obtaining the current Q value and storingStoring in a Q table; continuously executing the training process until the training is finished; and finally, obtaining the optimal computing resource allocation and task unloading strategy.

Further, step 4 specifically includes:

1) initializing, namely, evaluating the weight parameter of the network to be theta, the weight parameter of the target network to be theta', discounting factor gamma and learning rate α, and exploring the probability

A priori playback

2) Defining a system state S: the system state at time t is

α therein_m,n(t) is a calculation task feature, v_m,n(t) is the task queuing state,

computing resources for the total of all edge servers at time t-1;computing resources of the seeds are used for all the edge servers at the time t;

the distance between the mobile equipment and all edge servers is collected;

3) define action A: the set of actions at time t is

I.e. a computing resource allocation strategy for all edge servers;

4) defining a reward function R: calculating processing time from edges

And price function

Wherein mu₁The price of computing resources per time unit for edge servers connected to macro base stations is set to 0.7, mu₂The price of computing resources in unit time unit for an edge server connected with a small base station is set as 1; pi_m,nThe size of the computing resources allocated to the respective computing task; the normalized conversion was defined as γ (x), and the normalized user benefit-expenditure utility could be calculated asR_m,n(t) earnings after processing the calculation tasks; the reward value function at time t is

For the total number of processing tasks at time t,

5) adopts an epsilon-greedy method to explore the probability

Gradually decreases from 1 to 0.1; observing the current system state s (t), selecting a random number omega from 0 to 1 ifRandomly selecting an action from all possible actions to execute, namely allocating computing resources; if it is not

According to a (t) argmax_aQ(s) (t; a (t); theta) selection action execution; after the corresponding action is executed, a reward function r (t) is calculated, the next system state s (t +1) is observed, and the transition states (s (t); a (t); r (t); s (t +1)) are stored in an a priori replay

In (1),

wherein Λ (t) ═ { s (t); a (t); r (t); s (t +1) }; randomly selecting MiniBatch from the prior experiment as a sample, and selecting y ═ r (n) + gamma max_a(n+1)Q (s (n + 1); a (n + 1); θ') sets a target network value y; then by a gradient decreasing function

Updating and evaluating a network weight parameter theta; continuously executing the process, and updating the target network weight parameter theta' to the current evaluation network weight parameter theta after J times; repeating the training process until the training is finished; and finally, obtaining the optimal computing resource allocation and task unloading strategy.

Compared with the prior art, the invention has the following technical effects:

in order to obtain a global view of a network and realize centralized management and scheduling, a Software Defined Network (SDN) is introduced into an architecture. By separating the control plane from the data plane, the SDN controller collects dynamic information from the mobile device and the network. Then, by continuously monitoring the system state, an optimal computing resource configuration strategy and a task unloading strategy can be generated.

Two optimal strategies for generating computing resource allocation and fast decision are proposed for a scene, one is a Q-learning-based method, and the other is a DQN-based method.

For the Q-learning based method, the following advantages are provided:

reinforcement Learning (RL) is an important branch of machine learning that is applicable to dynamic systems by stimulating an agent to find an optimal solution based on learning variables. In the Reinforcement Learning (RL) algorithm, Q-Learning performs well in some time-varying networks. The method achieves the benefit of optimizing Mobile Devices (MDs) and operators simultaneously in a time-varying environment based on Q-learning.

For DQN based methods, the following advantages are present:

on the basis of a Q-learning-based method, aiming at the problems of complex state and action information and long learning process in the Q-learning process, the method combines a deep learning technology with the Q-learning, and provides a learning scheme based on a Deep Q Network (DQN). The method has the advantages of the Q-learning-based method, realizes the effect of optimizing the mobile equipment and the operator simultaneously in the time-varying environment, and has shorter learning time and faster convergence compared with the Q-learning-based method.

Drawings

FIG. 1 is a schematic view of a scene model;

FIG. 2 is a flow chart of an algorithm for a Q-learning based method;

FIG. 3 is an algorithmic flow chart of a DQN-based method;

FIG. 4 is a graph of a training curve tracking weighted network utility comparison at different learning rates for a DQN-based method;

FIG. 5 is a comparison of training curves for DQN-based methods tracking average rewards at different learning rates;

FIG. 6 is a comparison of the convergence behavior of the DQN-based method and the Q-learning-based method with training segments;

FIG. 7 is a comparison of total processing delay with the number of Mobile Devices (MDs) under the DQN-based method, the Q-learning-based method, and the game theory method;

fig. 8 is a comparison of computational resource utilization and number of Mobile Devices (MDs) under the DQN-based method, the Q-learning-based method, and the game theory method.

Detailed Description

The invention is further described below with reference to the accompanying drawings:

step 1): as shown in fig. 1, the system model is established as an ultra-dense network edge computing network based on SDN, and is a scenario of multiple users, multiple task types and multiple MEC servers, a user local server and an edge server can only process one computing task at most simultaneously, and an MEC base station supports multi-user access

Acquiring network parameters: number of mobile devices in scene C, setRepresents; the number of macro base stations is 1, the number of small base stations is B, and the set is used

Represents; the number of radio channels connected to the macro base station is W^mThe number of radio channels connected to the small base station s is W^s(ii) a The computing task types are E in total, and are represented by epsilon {1,2, … E }, the arrival and processing processes of the tasks adopt an M/M/1 queuing model, and the average task rate of the edge server of the macro base station is

The average task rate to the edge server of the small base station is

The transmission power of Mobile Device (MD) m is p_m,nAnd a channel gain between the macro base station and the base station of

And a small base station has a channel gain of

The related variables in the invention are defined as follows: c mobile devices are distributed randomly and covered by 1 macro base station and B small base stations, and the distance from each device to the macro base station and the small base stations is D_mUse of setsDenotes the distance of all devices to all base stations, wherein

α for n types of computing tasks requested by mobile device m_m,nRepresentation, computation task feature set of

Wherein i_m,nSetting the size of the task to 300-; o_m,nSetting the CPU period required for processing the computing task as 100-1000 Megacycles;

set to 0.5-3s for maximum allowable processing delay α_m,nIs described as an offload decision set

Wherein the content of the first and second substances,

indicating that the task is to be computed locally,

indicating moving equipmentBackup m selection offload task α_m,nEdge server to connected small base station, wherein

σ²The background noise power is set to-100 dbm.

Because this patent needs to carry out many times iterative learning, define the following variable for convenient representation: definition of

All tasks to be processed at time t; definition of

For the total computing resources at all edge servers time t,

whereina_s(t) is the total resources of all edge servers connecting the macro base station and the small base station; definition of

The computing resources of the seed are used at time t for all edge servers,

wherein sigma_m(t) and b_s(t) is the resources being used by all edge servers connecting the macro base station and the small base station; definition of

To allocate policies for computing resources to all edge servers,

step 2): acquiring parameters required by edge calculation:

a. local compute task α_m,nQueue condition v of_m,nSetting the queuing delay introduced per decision period

Second, CPU cycle o required to process a computational task_m,nAnd computing resources q of a particular mobile device_m. Locally calculating total delay

Calculated from the following formula:

b. computation tasks α off-load to edge servers of macro base station_m,nQueue condition v of_m,nSetting the queuing delay introduced per decision period

Second, calculate task size i_m,nCPU cycles o required to process a computational task_m,nAnd the computing resource size q of the edge server connected with the macro base station^mBetween 16-32 GHz. Uplink data rate for transmitting computational tasks

Calculated from the following formula:

final total delay of the unloading mode

Is represented by the formulaAnd calculating to obtain:

c. computation tasks α off-loaded to edge servers connecting small cells s_m,nQueue condition v of_m,nSetting the queuing delay introduced per decision periodSecond, calculate task size i_m,nCPU cycles o required to process a computational task_m,nAnd the computing resource size q of the edge server connected with the macro base station^sBetween 4-8 GHz. Uplink data rate for transmitting computational tasks

Calculated from the following formula

Final total delay of the unloading modeCalculated from the following formula:

the calculation time for which three calculation schemes can be obtained is

Step 3): as shown in fig. 2, a Q-learning scheme is adopted to obtain an optimal computing resource allocation and task offloading strategy:

1. initialization is to initialize the Q table, make all Q values 0, set the discount factor γ and the learning rate α.

2. Defining system states

: the system state at time t is

the total computing resources at time t-1 are for all edge servers.The seed computing resources are used at time t for all edge servers.

Is the set of distances of the mobile device to all edge servers.

3. Defining actions

: the set of actions at time t is

I.e., a policy for the allocation of computing resources to all edge servers.

4. Defining a reward function

Calculating processing time from edges

And price function

Wherein mu₁The price of computing resources per time unit for edge servers connected to macro base stations is set to 0.7, mu₂The price of computing resources in unit time unit for an edge server connected with a small base station is set as 1; pi_m,nIs the size of the computing resource allocated to the respective computing task. The normalized conversion was defined as γ (x), and the normalized user benefit-expenditure utility could be calculated as

R_m,n(t) is the revenue after processing the computing task. the reward value function at time t is

For the total number of processing tasks at time t,

a_s(t) Total resources, σ, of all edge servers connecting the macro base station and the small base station_m(t)，b_s(t) is the resource being used by all edge servers connecting the macro base station and the small base station.

5. And observing the current system state s, and executing corresponding action a, namely resource allocation according to the Q (s, a) value stored in the Q table. Then observing the next system state s 'after the action a is executed, and according to the current system state s, the executed action a and the next system state s', obtaining the system state

The current Q value is obtained and stored in the Q table. The training process is continuously executed until the training is finished. And finally, obtaining the optimal computing resource allocation and task unloading strategy.

Step 4): as shown in fig. 3, an optimal computation resource allocation and task offloading strategy is obtained by using a DQN scheme:

1. initializing, namely, evaluating the weight parameter of the network to be theta, the weight parameter of the target network to be theta', discounting factor gamma and learning rate α, and exploring the probability

A priori playback

2. Defining a system state S: the system state at time t is

the total computing resources at time t-1 are for all edge servers.

The seed computing resources are used at time t for all edge servers.

Is the set of distances of the mobile device to all edge servers.

3. Define action a,: the set of actions at time t is

I.e., a policy for the allocation of computing resources to all edge servers.

4. Defining a reward function R: calculating processing time from edges

And price function

For the total number of processing tasks at time t,

5. Adopts an epsilon-greedy method to explore the probability

Gradually decreasing from 1 to 0.1. Observing the current system state s (t), selecting a random number omega from 0 to 1 if

Randomly selecting an action from all possible actions to execute, namely allocating computing resources; if it is not

According to a (t) argmax_aQ(s) (t; a (t); theta) is performed by a selection action. After the corresponding action is executed, a reward function r (t) is calculated, the next system state s (t +1) is observed, and the transition states (s (t); a (t); r (t); s (t +1)) are stored in an a priori replay

In (1),

wherein Λ (t) ═ { s (t); a (t); r (t); s (t +1) }. Randomly selecting MiniBatch from the prior experiment as a sample, and selecting y ═ r (n) + gamma max_a(n+1)Q (s (n + 1); a (n + 1); θ') sets the target network value y. Then by a gradient decreasing function

And updating the evaluation network weight parameter theta. And continuously executing the process, and updating the target network weight parameter theta' to the current evaluation network weight parameter theta after J times of operation. The training process is repeated until the training is finished. And finally, obtaining the optimal computing resource allocation and task unloading strategy.

Fig. 4 shows that the training curve based on the DQN method tracks the utility contrast of the weighting network under different learning rates, and the learning rate is an important parameter influencing the convergence performance. It can be seen that the training process does not converge within the set number of cycles when the learning rate is set to 0.1, and converges much slower when the learning rate is set to 0.001. In contrast, with a learning rate of 0.01, the training process converges faster, reaching higher practicability.

Fig. 5 shows that the training curve based on the DQN method tracks the average reward contrast at different learning rates. It can be seen in this figure that the curve at a learning rate of 0.01 converges most quickly around a period of 400. Although the bonus rate achieved at the learning rate of 0.001 is slightly reduced at 0.01, it converges slowly at about a period of 750. In addition, in the case where the learning rate is 0.1, the algorithm can obtain a higher reward than the other two learning rates at some time, but cannot ensure convergence. Therefore, we chose the learning rate of 0.01 as a simulation parameter for subsequent experiments.

Fig. 6 shows a comparison of the convergence behavior of the DQN-based method and the Q-learning-based method with training segments, which shows that both the DQN-based method and the Q-learning-based method can converge to near optimal utility values in short-term training. However, in contrast to the DQN-based method, which converges when the period is around 400, the Q-learning-based method converges when the period exceeds 1000. The improvement of DQN convergence speed is mainly benefited by dual Q network and memory space

The use of (1).

As shown in fig. 7, which is a comparison of the total processing delay and the number of mobile devices under the DQN-based method, the Q-learning-based method and the game theory method, it can be seen from the figure that the DQN-based method and the Q-learning-based method can significantly reduce the processing delay compared to the game theory method. The reason for this phenomenon is that our proposed solution looks at the long-term offload performance of time-varying systems.

Fig. 8 shows the comparison of the computing resource utilization rate with the number of mobile devices under the DQN-based method, the Q-learning-based method and the game theory method, and it can be seen from the figure that the average computing resource utilization rate obtained by the Q-learning-based method and the DQN-based method is 74.58% and 77.37%, respectively. However, with the game theory approach, the resulting utilization is 61.38%, which is much lower than our proposed solution. By improving the utilization rate of computing resources, operators can effectively reduce the operation cost (OPEX) without reducing the unloading performance of the system.

Claims

1. A computing resource allocation and task unloading method for ultra-dense network edge computing is characterized by comprising the following steps:

2. The method for computing resource allocation and task offloading in ultra-dense network edge computing according to claim 1, wherein in step 1, the network parameters are obtained: number of mobile devices in scene C, set

Represents; the number of macro base stations is 1, the number of small base stations is B, and the set is used

Represents; the number of radio channels connected to the macro base station is W^mThe number of radio channels connected to the small base station s is W^s(ii) a The computing task types are E in total, and are represented by epsilon {1, 2.. E }, the arrival and processing processes of the tasks adopt an M/M/1 queuing model, and the average task rate reaching the edge server of the macro base station is

The average task rate to the edge server of the small base station is

The transmission power of mobile m is p_m，nAnd a channel gain between the macro base station and the base station ofAnd a small base station has a channel gain of

Denotes the distance of all devices to all base stations, wherein

α for n types of computing tasks requested by mobile device m_m，nRepresentation, computation task feature set ofWherein i_m，nSetting the size of the task to 300-; o_m，nSetting the CPU period required for processing the computing task as 100-1000 Megacycles;

set to 0.5-3s for maximum allowable processing delay α_m，nUnloading blockThe policy set is described as

Wherein the content of the first and second substances,

indicating that the task is to be computed locally,indicating that the task should be offloaded to an edge server connected to the macro base station,

indicating that mobile device m selects to offload task α_m，nEdge server to connected small base station, wherein

σ²The background noise power is set to-100 dbm.

3. The method of claim 2, wherein defining the computing resource allocation and task off-loading for ultra-dense network edge computing

All tasks to be processed at time t; definition of

For the total computing resources at all edge servers time t,whereina_s(t) is the total resources of all edge servers connecting the macro base station and the small base station; definition of

The computing resources of the seed are used at time t for all edge servers,

wherein sigma_m(t) and b_s(t) is the resources being used by all edge servers connecting the macro base station and the small base station; definition ofTo allocate policies for computing resources to all edge servers,

4. the method for computing resource allocation and task offloading in ultra-dense network edge computing as claimed in claim 1, wherein in step 2,

a. local compute task α_m，nQueue condition v of_m，nSetting the queuing delay introduced per decision period

Second, CPU cycle o required to process a computational task_m，nAnd computing resources q of a particular mobile device_m(ii) a Locally calculating total delay

Calculated from the following formula:

b. computation tasks α off-load to edge servers of macro base station_m，nQueue condition v of_m，nSetting the queuing delay introduced per decision period

Second, calculate size of task i_m，nCPU cycles o required to process a computational task_m，nAnd the computing resource size q of the edge server connected with the macro base station^mBetween 16-32 GHz; uplink data rate for transmitting computational tasks

Calculated from the following formula:

final total delay of the unloading mode

Calculated from the following formula:

c. computation tasks α off-loaded to edge servers connecting small cells s_m，nQueue condition v of_m，nSetting the queuing delay introduced per decision period

Second, calculate size of task i_m，nCPU cycles o required to process a computational task_m，nAnd edge server computation resources connecting macro base stationsSource size q^sBetween 4-8 GHz; uplink data rate for transmitting computational tasks

Calculated from the following formula

Final total delay of the unloading mode

Calculated from the following formula:

the calculation time for which three calculation schemes can be obtained is

5. The method for computing resource allocation and task offloading in ultra-dense network edge computing according to claim 1, wherein step 3 specifically includes:

2) defining system states

the system state at time t is

α therein_m，n(t) is a calculation task feature, v_m，n(t) is the task queuing state,

computing resources for the total of all edge servers at time t-1;

the distance between the mobile equipment and all edge servers is collected;

3) defining actions

the set of actions at time t is

I.e. a computing resource allocation strategy for all edge servers;

4) defining a reward function

Calculating processing time from edges

And price function

Wherein mu₁The price of computing resources per time unit for edge servers connected to macro base stations is set to 0.7, mu₂For connecting small base stationsThe price of the edge server per unit time unit computing resource is set to be 1; pi_m，nThe size of the computing resources allocated to the respective computing task; defining the normalized conversion as gamma (x), and calculating the normalized user profit-expenditure utility asR_m，n(t) earnings after processing the calculation tasks; the reward value function at time t is

For the total number of processing tasks at time t,

5) observing the current system state s, and executing corresponding action a according to the Q (s, a) value stored in the Q table, namely resource allocation; then observing the next system state s 'after the action a is executed, and according to the current system state s, the executed action a and the next system state s', obtaining the system stateObtaining a current Q value and storing the current Q value in a Q table; continuously executing the training process until the training is finished; and finally, obtaining the optimal computing resource allocation and task unloading strategy.

6. The method for computing resource allocation and task offloading in ultra-dense network edge computing according to claim 1, wherein step 4 specifically includes:

1) initialization: the weight parameter of the evaluation network is theta, and the weight parameter of the target networkNumber θ', discount factor γ and learning rate α, probability of exploration

A priori playback

2) Defining a system state S: the system state at time t is

α therein_m，n(t) is a calculation task feature, v_m，n(t) is the task queuing state,computing resources for the total of all edge servers at time t-1;

the distance between the mobile equipment and all edge servers is collected;

3) define action A: the set of actions at time t is

I.e. a computing resource allocation strategy for all edge servers;

4) defining a reward function R: calculating processing time from edges

And price function

Wherein mu₁The price of computing resources per time unit for edge servers connected to macro base stations is set to 0.7, mu₂The price of computing resources in unit time unit for an edge server connected with a small base station is set as 1; pi_m，nThe size of the computing resources allocated to the respective computing task; defining the normalized transformation as γ (x), the normalized user benefit-expenditure utility can be calculated as

R_m，n(t) earnings after processing the calculation tasks; the reward value function at time t is

For the total number of processing tasks at time t,

5) adopts an epsilon-greedy method to explore the probability

Gradually decreases from 1 to 0.1; observing the current system state s (t), selecting a random number omega from 0 to 1 if

According to a (t) argmax_aQ(s) (t; a (t); theta) selection action execution; after the corresponding action is executed, a reward function r (t) is calculated, the next system state s (t +1) is observed, and the transition states (s (t); a (t); r (t); s (t +1)) are stored in an a priori replayIn (1),wherein Λ (t) ═ { s (t); a (t); r (t); s (t +1) }; randomly selecting MiniBatch from the prior experiment as a sample, and selecting y ═ r (n) + gamma max_a(n+1)Q (s (n + 1); a (n + 1); θ') sets a target network value y; then by a gradient decreasing function