CN115686779B - DQN-based self-adaptive edge computing task scheduling method - Google Patents

DQN-based self-adaptive edge computing task scheduling method

Info

Publication number
CN115686779B
CN115686779B
Authority
CN
China
Prior art keywords: value, task, network, dqn, neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211261147.7A
Other languages
Chinese (zh)
Other versions
CN115686779A (en)
Inventor
巨涛
王志强
刘帅
火久元
张学军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lanzhou Jiaotong University
Original Assignee
Lanzhou Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lanzhou Jiaotong University filed Critical Lanzhou Jiaotong University
Priority to CN202211261147.7A priority Critical patent/CN115686779B/en
Publication of CN115686779A publication Critical patent/CN115686779A/en
Application granted granted Critical
Publication of CN115686779B publication Critical patent/CN115686779B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a DQN-based self-adaptive task scheduling method for an edge computing system. An agent acquires task configuration information and computing node configuration information and uses them as the environment state information input to a neural network; the final output of the neural network is calculated from the loss value of the previous training step, computing nodes are selected for tasks according to this final output and the loss values of the last several training steps, and learning experience is stored based on the loss values. Optimal matching between tasks and computing nodes is thereby realized, and the invention provides an effective solution for fully utilizing edge computing resources, improving the real-time performance of task processing, and reducing system overhead.

Description

DQN-based self-adaptive edge computing task scheduling method
Technical Field
The invention belongs to the field of computer architecture, relates to a self-adaptive task scheduling method, and particularly relates to a DQN-based self-adaptive task scheduling method for edge computing systems.
Background
How to fully utilize the computing resources in an edge computing system, improve the real-time performance of task processing, and reduce system overhead is a key problem faced by edge computing systems. With the development of machine learning, more and more deep reinforcement learning algorithms (such as DQN, DDPG, and Actor-Critic) are used to solve task scheduling problems in edge computing. However, task scheduling is inherently a continuous problem, which requires either discretizing the action space and state space of the algorithm or selecting an algorithm designed for continuous problems. Scheduling a task that could be divided into finer-grained subtasks as a single whole is detrimental to the efficient use of computing resources. If an algorithm such as DQN, which is suited to discrete problems, is applied, a more effective discretization must be performed, and the convergence speed of the neural network must be taken into account while reducing the impact of the "overestimation" problem caused by the algorithm itself. If the degree of exploration of the action space cannot be effectively adjusted during training, the convergence and stability of the neural network suffer. When applying deep reinforcement learning to task scheduling in edge computing, the limited computing resources, the internal characteristics of the tasks, and the convergence speed and stability of the algorithm must all be considered: an algorithm with relatively small computational cost should be selected, tasks should be reasonably divided, the efficiency of exploring the solution space should be improved, and fluctuation after convergence should be reduced. Realizing the optimal matching of tasks and computing nodes in this way improves the utilization of system computing resources, improves the real-time performance of task processing, and reduces system overhead.
Most existing research treats a task as a single whole to be scheduled, which cannot make effective use of computing resources, and the probability value in the computing-node selection strategy is fixed, which hinders effective exploration of the action space and therefore makes convergence slow and unstable. Work that uses DQN and other algorithms with small computational cost, which are suited to discrete-space problems, requires discretization, otherwise accuracy is reduced. In addition, the sampling strategy for taking part of the learning experience out of the experience pool for replay is mostly random sampling, which cannot effectively improve sample efficiency. Since algorithms such as DQN always select the computing node with the largest adaptation value for the task, an "overestimation" problem arises, i.e., the estimated value is larger than the actual value. Algorithms such as DDPG are suitable for continuous task scheduling, but their computational cost is too large for edge computing systems whose computing resources are relatively limited.
Disclosure of Invention
The invention aims to overcome the problems in the prior art and provide a DQN-based self-adaptive edge computing task scheduling method, which is based on task configuration information and computing node configuration information to realize optimal matching of tasks and computing nodes so as to fully utilize computing resources, improve the real-time performance of task processing and reduce system overhead.
In order to achieve the purpose, the invention is realized by adopting the following technical scheme:
an adaptive edge computing task scheduling method based on DQN comprises the following steps:
1) When the number of training steps of the neural network is a multiple of the specified parameter-copying interval, copying the evaluation network parameters in the DQN to the target network; when the number of training steps is a multiple of the specified experience-replay interval, replaying the learning experience in the experience pool and emptying the experience pool;
2) Acquiring computing node configuration information, terminal device configuration information, and task configuration information as environment state information, and normalizing it as the input of the deep reinforcement learning neural network; the environment state information consists of the data size of the computing task, the number of required computing resources, the number of required storage resources, and the numbers of available computing resources and available storage resources of all computing nodes, namely state_i = (ds, tc, ts, nc_i, ns_i),
wherein state_i represents the state information of the computing task and the i-th computing node; ds, tc, and ts are respectively the data size, the number of required computing resources, and the number of required storage resources of the computing task; nc and ns are respectively the number of available computing resources and the number of available storage resources of the computing node.
3) Respectively obtaining the outputs of the evaluation network and the target network, calculating the final output of the neural network by a comprehensive Q-value calculation method that combines them with the loss value of the previous training step, and taking the final output as the adaptation degree value between the task and the computing nodes; the specific calculation formula of the comprehensive Q value is as follows:
wherein TNet and ENT are respectively the target network and the evaluation network, OT and OE are respectively their outputs, and Loss is the loss of the previous iteration.
4) Based on a self-adaptive dynamic action-space exploration-degree adjustment strategy, selecting for the task, with a certain probability, the computing node corresponding to the maximum adaptation degree value according to the final output of the neural network and the loss values of the last several training steps, and otherwise selecting a computing node randomly; the adaptive dynamic action-space exploration-degree adjustment strategy is specifically:
wherein rd is a random-number generation function that generates random numbers in the range [0,1]; if the value of F is True, an offloading action corresponding to a non-maximum value is selected for the current task to be processed, and if F is False, the offloading action corresponding to the maximum value is selected;
5) Calculating the loss values of all the current tasks;
the specific calculation method is as follows:
wherein output is the output of the evaluation network and action is the selected action;
6) Prioritizing the current tasks based on their loss values by means of an adaptive lightweight replay mechanism, and storing the learning experience with the highest priority in the experience pool;
7) Updating the evaluation network parameters;
8) Repeating until the end condition is satisfied.
Further:
In step 2), the configuration information of the subtasks into which the task is divided and the configuration information of each computing node are taken as the environment state information.
In step 3), the loss value of the previous training step in the comprehensive Q-value calculation method is used to weight the proportions of the evaluation network and the target network in the final output; the output of the target network dominates in the initial training stage of the neural network, and the output of the evaluation network gradually dominates as training progresses.
In step 4), the average of the loss values over the last several training steps is calculated in the adaptive dynamic action-space exploration-degree adjustment strategy and used as the basis for the computing-node selection probability.
In step 5), a cross entropy loss function is adopted when the loss values of all the current tasks are calculated.
In step 6), the adaptive lightweight replay mechanism sorts the current learning experience by loss value and stores the middle portion of the learning experience in the experience pool, because learning experience with a small loss value easily guides the neural network to a local optimum while learning experience with a large loss value is far from the optimal solution.
Compared with the prior art, the invention has the following beneficial effects:
For the task scheduling problem in edge computing, the task is regarded as being composed of independent subtasks, and the configuration information of the subtasks and of each computing node is used as the input of the neural network. The final output of the neural network is calculated based on the loss value obtained in the previous training step, computing nodes are selected for the tasks based on this final output and the loss values of the last several training steps, and finally the tasks are prioritized according to their loss values and the learning samples of the middle portion are stored in the experience pool; parameter copying or experience replay is performed when the specified condition is met. In this way the optimal matching of tasks and computing nodes is realized, computing resources are fully utilized, the real-time performance of task processing is improved, and the system overhead is reduced.
Drawings
Fig. 1 is a general framework of the present invention.
Fig. 2 is a process flow of the present invention.
Fig. 3 is a graph of loss values for the present invention.
Fig. 4 is a graph of loss values for DQN.
Fig. 5 is a graph of loss values for D3DQN.
Fig. 6 is an overall comparison of loss value curves.
Fig. 7 is a graph showing the cumulative energy consumption of the present invention versus various baseline algorithms.
Fig. 8 is a graph of the cumulative weighted overhead of the present invention versus various baseline algorithms.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the application scenario of the present invention may be:
In an edge computing system there is a set of computing nodes at the edge, a set of terminal devices, and a decision-making agent. When the agent receives a task scheduling request from a terminal device, it collects the task information submitted by the terminal device and the computing node information over the wireless network and makes a task offloading decision: if the task is offloaded, the task data is uploaded to an edge computing node for processing and the processing result is returned to the terminal device; if local processing is chosen, the task is processed on the terminal device.
Referring to fig. 2, an adaptive edge computing task scheduling method based on DQN includes the steps of:
1) When the number of training steps of the neural network is a multiple of the specified parameter-copying interval, the evaluation network parameters in the DQN are copied to the target network; when the number of training steps is a multiple of the specified experience-replay interval, the learning experience in the experience pool is replayed and the experience pool is emptied. Specifically: all parameters are initialized when processing starts; if the number of training steps itr reaches its maximum value, processing ends, otherwise it continues; if itr satisfies condition 1, i.e. it is a multiple of the specified parameter-copying interval, the evaluation network parameters are copied to the target network; if itr satisfies condition 2, i.e. it is a multiple of the specified experience-replay interval, the learning experience in the experience pool is replayed;
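The control flow described above can be summarized in a short sketch. The following Python fragment is a minimal illustration only, assuming hypothetical interval constants (COPY_INTERVAL, REPLAY_INTERVAL), a hypothetical step limit, and stand-in networks and helpers that the patent does not specify:

# A minimal control-flow sketch of step 1); the constants, stand-in parameter
# dictionaries, and helper functions are assumptions introduced for illustration.
import copy

COPY_INTERVAL = 100      # assumed: copy evaluation-network parameters every 100 steps
REPLAY_INTERVAL = 50     # assumed: replay and empty the experience pool every 50 steps
MAX_STEPS = 2000         # assumed maximum number of training steps

eval_params = {"w": 0.0}                      # stand-in for the evaluation network parameters
target_params = copy.deepcopy(eval_params)    # target network starts as a copy
experience_pool = []                          # learning experiences saved by the replay mechanism

def replay(experiences):
    """Stand-in for replaying saved learning experiences (training on them)."""
    pass

def train_one_step(itr):
    """Stand-in for steps 2)-7): build state, act, compute loss, update the evaluation network."""
    eval_params["w"] += 0.0                   # placeholder parameter update
    experience_pool.append({"step": itr})

for itr in range(1, MAX_STEPS + 1):
    if itr % COPY_INTERVAL == 0:              # condition 1: copy parameters to the target network
        target_params = copy.deepcopy(eval_params)
    if itr % REPLAY_INTERVAL == 0 and experience_pool:   # condition 2: replay, then empty the pool
        replay(experience_pool)
        experience_pool.clear()
    train_one_step(itr)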
2) Computing node configuration information, terminal device configuration information, and task configuration information are acquired as environment state information, which is normalized and used as the input of the deep reinforcement learning neural network. The configuration information of the subtasks into which the task is divided and the configuration information of each computing node are taken as the environment state information. The specific processing is as follows:
When processing a scheduling request sent by a terminal device, the agent needs to comprehensively consider the current task to be processed and the state information of all computing nodes in order to make an optimal scheduling decision. The scheduling request received from the terminal device contains the state information of the task to be processed and of the terminal device; in addition, the agent requests the state information of all edge computing nodes from the edge server. After acquiring the required environment state information, the agent can start making scheduling decisions.
The environment state information consists of the data size of the computing task, the number of required computing resources, the number of required storage resources, and the numbers of available computing resources and available storage resources of all computing nodes, namely state_i = (ds, tc, ts, nc_i, ns_i), where state_i represents the state information of the computing task and the i-th computing node; ds, tc, and ts are respectively the data size, the number of required computing resources, and the number of required storage resources of the computing task; nc and ns are respectively the number of available computing resources and the number of available storage resources of the computing node.
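As an illustration of step 2), the following Python sketch builds one normalized state vector per computing node from the fields listed above (ds, tc, ts, nc, ns); the dictionary field names, node capacities, and the max-value normalization constants are assumptions introduced only for the example:

# A minimal sketch of constructing and normalizing the per-node state vector
# state_i = (ds, tc, ts, nc_i, ns_i); the numeric values and max-normalization
# constants are hypothetical, not values from the patent.
import numpy as np

def build_states(task, nodes, max_vals):
    """Return one normalized state vector per computing node for the given task."""
    states = []
    for node in nodes:
        raw = np.array([task["ds"], task["tc"], task["ts"],
                        node["nc"], node["ns"]], dtype=np.float32)
        states.append(raw / max_vals)          # scale each field into [0, 1]
    return np.stack(states)

task = {"ds": 4e6, "tc": 2e9, "ts": 1e6}       # hypothetical task: data size, CPU cycles, storage
nodes = [{"nc": 8e9, "ns": 6e7}, {"nc": 4e9, "ns": 3e7}]   # hypothetical node capacities
max_vals = np.array([1e7, 1e10, 1e8, 1e10, 1e8], dtype=np.float32)
print(build_states(task, nodes, max_vals))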
3) The outputs of the evaluation network and the target network are obtained respectively, and the final output of the neural network is calculated by the comprehensive Q-value calculation method, which combines them with the loss value of the previous training step; this final output is taken as the adaptation degree value between the task and the computing nodes. The loss value of the previous training step in the comprehensive Q-value calculation method is used to weight the proportions of the evaluation network and the target network in the final output: the output of the target network dominates in the initial training stage of the neural network, and the output of the evaluation network gradually dominates as training progresses.
The design idea of the specific calculation method is as follows:
In the conventional DQN algorithm, the Q values of all possible offloading actions are output according to the environment state information; the magnitude of a Q value represents the probability that the corresponding offloading action is selected. The offloading action corresponding to the maximum Q value is then selected as the scheduling decision for the current task to be processed. However, in the initial stage of training, always selecting the maximum Q value causes the estimated Q value to be updated toward a value larger than the actual one during parameter updates, which leads to the overestimation problem. In existing work, when the parameters of the evaluation network and the target network are updated, the evaluation network parameters are updated in real time, the target network parameters are updated with a delay, and the output of the target network is used as the basis for action selection. While this reduces the effect of overestimation, it is disadvantageous for the parameter updates of the evaluation network and easily causes fluctuations of the neural network after the evaluation network parameters are copied to the target network. To solve these problems, the final output of the neural network is calculated from the loss value of the previous training step together with the outputs of the evaluation network and the target network; the specific calculation formula of the comprehensive Q value is as follows:
TNet and ENT are respectively the target network and the evaluation network, OT and OE are respectively their outputs, and Loss is the loss of the previous iteration. The loss value reflects the degree of learning of the neural network: the larger the loss value, the farther the network is from convergence, the harder it is to evaluate the current environment state accurately, and the larger the influence of overestimation; conversely, the closer the network is to convergence, the smaller the influence of overestimation. In the initial stage of learning the loss value is large, so according to the formula the output of the whole network is dominated by the output of the target network, which reduces the influence of overestimation; the closer the network is to convergence, the more the output of the evaluation network dominates. The target network and the evaluation network therefore jointly determine the final network output, reducing the influence of overestimation and ensuring the stability of the neural network.
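The exact blending formula appears in the patent drawings; as a hedged illustration of the idea described above, the Python sketch below blends the target-network output OT and the evaluation-network output OE with a weight that grows with the previous loss, so that OT dominates early in training and OE dominates near convergence. The Loss/(1+Loss) weighting is an assumption chosen only to demonstrate the behavior, not the patented formula:

# Illustrative blend of target-network and evaluation-network outputs driven by the
# previous training loss; the Loss/(1+Loss) weight is an assumed stand-in formula.
import numpy as np

def comprehensive_q(ot, oe, last_loss):
    """Blend target and evaluation outputs according to the previous training loss."""
    w_target = last_loss / (1.0 + last_loss)   # assumed weight: near 1 for large loss, near 0 at convergence
    return w_target * ot + (1.0 - w_target) * oe

ot = np.array([0.2, 0.7, 0.1])   # hypothetical target-network Q values per node
oe = np.array([0.3, 0.5, 0.2])   # hypothetical evaluation-network Q values per node
print(comprehensive_q(ot, oe, last_loss=2.0))    # early training: result is closer to OT
print(comprehensive_q(ot, oe, last_loss=0.05))   # near convergence: result is closer to OE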
4) Based on the self-adaptive dynamic action-space exploration-degree adjustment strategy, with a certain probability the computing node corresponding to the maximum adaptation degree value, determined from the final output of the neural network and the loss values of the last several training steps, is selected for the task; otherwise a computing node is selected randomly. The average of the loss values over the last several training steps is calculated in the adaptive dynamic action-space exploration-degree adjustment strategy and used as the basis for the computing-node selection probability.
The design idea of the specific computing-node selection method is as follows:
To increase exploration of the action space, existing work often adopts an ε-greedy strategy for action selection: another action is selected with a fixed probability, and otherwise the action corresponding to the maximum Q value is selected. However, the degree of exploration required differs at different stages of the learning process. In the initial stage of learning, in order to approach the optimal solution as quickly as possible, the action space should be explored with a higher probability, i.e. an offloading action corresponding to a non-maximum value should be selected; as learning proceeds, such non-maximum actions should be selected with a lower probability. The learning progress of the neural network is reflected by the loss value: if the loss value is large, the network has not yet converged and the action space needs to be explored with a high degree of exploration to find the optimal solution; if the loss value is small, the degree of exploration should be reduced accordingly. The action selection strategy is therefore designed based on the loss value, and to prevent fluctuations of the neural network caused by excessive changes of the loss value, the square of the average loss value over the last several training steps is used as the basis for the probability in the action selection strategy, realizing dynamic adjustment of the exploration degree of the action space. The calculation method is as follows:
where rd is a random-number generation function that generates random numbers in the range [0,1]. If the value of F is True, an offloading action corresponding to a non-maximum value is selected for the current task to be processed; if F is False, the offloading action corresponding to the maximum value is selected.
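As a hedged illustration of this adaptive exploration rule, the Python sketch below uses the square of the mean loss over the last several training steps as the exploration probability and otherwise exploits the maximum adaptation value; the clamping to [0,1] and the length of the recent-loss window are assumptions introduced for the example:

# Illustrative adaptive action selection: explore with probability (mean recent loss)^2,
# clamped to [0, 1] (assumed), otherwise pick the node with the maximum adaptation value.
import random

def choose_action(q_values, recent_losses):
    """Pick a node index: explore with probability ~ (mean recent loss)^2, else exploit."""
    mean_loss = sum(recent_losses) / len(recent_losses)
    explore_prob = min(1.0, mean_loss ** 2)          # assumed clamp into [0, 1]
    f = random.random() < explore_prob               # F: True -> explore, False -> exploit
    if f:
        best = q_values.index(max(q_values))
        non_max = [i for i in range(len(q_values)) if i != best]
        return random.choice(non_max) if non_max else best
    return q_values.index(max(q_values))

q = [0.3, 0.9, 0.4]
print(choose_action(q, recent_losses=[1.2, 0.9, 1.1]))     # early training: mostly explores
print(choose_action(q, recent_losses=[0.05, 0.04, 0.06]))  # near convergence: mostly exploits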
5) Calculating the loss values of all the current tasks;
the specific calculation method is as follows:
where output is the output of the evaluation network and action is the selected action.
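Since a cross-entropy loss function is adopted for the loss values of all current tasks (see the further limitations above), steps 5) and 7) can be illustrated with the following PyTorch sketch; the network architecture, batch contents, the choice of Adam as the designated optimizer, and the learning rate are assumptions introduced only for the example:

# Illustrative loss computation and evaluation-network update; sizes, optimizer, and
# learning rate are assumed for the sketch, not specified by the patent.
import torch
import torch.nn as nn
import torch.nn.functional as F

eval_net = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 3))  # assumed: 5 state fields, 3 nodes
optimizer = torch.optim.Adam(eval_net.parameters(), lr=1e-3)             # assumed optimizer and learning rate

states = torch.rand(4, 5)                    # hypothetical batch: 4 tasks, 5 state fields each
actions = torch.tensor([0, 2, 1, 2])         # offloading actions selected for those tasks

output = eval_net(states)                    # evaluation-network outputs (one score per node)
loss = F.cross_entropy(output, actions)      # cross-entropy loss of all current tasks

optimizer.zero_grad()
loss.backward()
optimizer.step()                             # update the evaluation-network parameters
print(float(loss))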
6) The current tasks are prioritized based on their loss values by an adaptive lightweight replay mechanism, and the learning experience with the highest priority is stored in the experience pool. The adaptive lightweight replay mechanism sorts the current learning experience by loss value and stores the middle portion of the learning experience in the experience pool, because learning experience with a small loss value easily guides the neural network to a local optimum while learning experience with a large loss value is far from the optimal solution.
The specific design concept is as follows:
As the dimensionality of the state space increases, the "curse of dimensionality" arises, i.e. more learning samples are needed for the neural network to achieve a satisfactory effect. However, the number of actual samples is often limited, so the efficiency of a limited number of samples must be considered. An experience replay mechanism not only solves the problem of low learning-sample efficiency but also breaks the continuity of the action space, and is often used together with the DQN algorithm to solve complex high-dimensional problems. However, in an edge environment with limited computing resources, a conventional experience replay mechanism that saves all historical experience consumes significant memory, and randomly extracting a certain number of samples from the historical experience for replay cannot effectively exploit the more valuable samples.
Since the most recent learning experience is most beneficial to the learning of the neural network and most correlated with its current state, the replay mechanism only saves the learning experience of the most recent m iterations. The learning experience is sorted by loss value; experience with a small loss value easily guides the neural network to a local optimum, while experience with a large loss value is far from the optimal solution, so the middle x portion of the historical experience is extracted and replayed. The value of x differs at different stages of learning: in the initial stage the neural network still needs to learn new experience, so x should take a smaller value; as learning deepens, the neural network should emphasize replaying historical experience to stabilize its performance, so x should take a larger value.
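As a hedged illustration of this replay mechanism, the Python sketch below keeps only the last m experiences, sorts them by loss value, and returns the middle x portion for replay, with x growing as training deepens; the value of m, the linear schedule for x, and the experience fields are assumptions introduced for the example:

# Illustrative lightweight replay selection: keep the last m experiences, sort by loss,
# and replay the middle x fraction, where x grows with training progress (assumed schedule).
def select_replay_batch(experiences, itr, max_itr, m=64):
    """Sort recent experiences by loss and return the middle x fraction for replay."""
    recent = experiences[-m:]                               # keep only the last m experiences
    ranked = sorted(recent, key=lambda e: e["loss"])        # ascending loss
    x = 0.2 + 0.6 * (itr / max_itr)                         # assumed schedule: x grows from 0.2 toward 0.8
    keep = max(1, int(len(ranked) * x))
    start = (len(ranked) - keep) // 2                       # middle portion: skip smallest and largest losses
    return ranked[start:start + keep]

pool = [{"state": None, "action": i % 3, "loss": abs(1.5 - 0.01 * i)} for i in range(200)]
print(len(select_replay_batch(pool, itr=100, max_itr=2000)))    # early training: small middle slice
print(len(select_replay_batch(pool, itr=1800, max_itr=2000)))   # late training: larger middle slice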
7) The parameters of the evaluation network are updated by the designated optimizer based on the loss values of the current tasks, the action selection, and the learning rate.
8) Until the end condition is satisfied.
The invention can perceive the changing environment state, acquire the required environment state information, and optimally match tasks with computing nodes according to that information, thereby realizing efficient, real-time, and low-energy task scheduling in an edge computing system with limited computing resources. The specific processing flow is shown in fig. 2.
For the task scheduling problem in an edge computing system, the invention uses the DQN algorithm, which has a relatively small computational cost, combined with the several improved methods and strategies designed on the basis of loss values, and matches tasks with computing nodes on the basis of the environment state information formed by the tasks and the computing nodes. The invention can realize the optimal matching of tasks and computing nodes according to the changing environment state information, fully utilize computing resources, improve the real-time performance of task processing, reduce system overhead, and provide a self-adaptive task scheduling method for edge computing systems.
To verify the effectiveness of the present invention, its performance is compared with that of various baseline algorithms, as shown in figs. 3-8. A brief analysis follows:
D3DQN-CAA is the method of the present invention, obtained by combining the three methods and mechanisms designed in the invention on the basis of Dueling Double DQN (D3DQN); the remaining algorithms are existing algorithms and are used to compare the performance of the invention.
Figs. 3-6 are a comprehensive comparison of the loss values of D3DQN-CAA, DQN, and D3DQN, with the number of training steps on the abscissa and the loss value on the ordinate. It can be seen that the D3DQN-CAA curve is the smoothest and has the smallest fluctuation amplitude after convergence, which shows that the comprehensive Q-value calculation method and the adaptive lightweight replay mechanism stabilize the model. Comparing figs. 3, 4, and 5, the loss curves of D3DQN-CAA, DQN, and D3DQN have similar trends, but the DQN loss curve decreases too quickly, and neither DQN nor D3DQN reaches the convergence value of D3DQN-CAA. Meanwhile, although the loss curves in figs. 4 and 5 converge, they converge at a relatively higher value and with a larger fluctuation range, which shows that the adaptive dynamic action-space exploration-degree adjustment strategy can effectively control the degree of exploration of the action space and improve the stability of the model. In addition, the comparison shows that the loss curves of DQN and D3DQN tend to fluctuate strongly around 1200, 1400, and 1600 training steps and need more training steps to converge than D3DQN-CAA, which shows that the comprehensive Q-value calculation method has a remarkable effect on improving the convergence speed of the neural network and reducing its fluctuation amplitude after parameter copying. These experimental comparisons show that the loss curve of the designed method requires the fewest training steps to converge, is less prone to fluctuation, and is more stable.
Fig. 7 shows the cumulative energy consumption of D3DQN-CAA, DQN, D3DQN, Only Local, Only Edge, and Random, with the number of training steps on the abscissa and the cumulative energy consumption on the ordinate. Because D3DQN-CAA adopts the comprehensive Q-value calculation method, the adaptive dynamic action-space exploration-degree adjustment strategy, and the adaptive lightweight replay mechanism, it can effectively reduce the number of training steps of the neural network and fully utilize the computing resources while ensuring the stability of the algorithm, so its cumulative energy consumption curve is lower than those of DQN, D3DQN, and the other algorithms.
Fig. 8 is a comparison of the cumulative weighted overhead (calculated as the weighted sum of system computation delay, transmission delay, and total energy consumption) of D3DQN-CAA, DQN, D3DQN, Only Local, Only Edge, and Random, with the number of training steps on the abscissa and the cumulative weighted overhead on the ordinate. As can be seen from the figure, Only Local performs worst, i.e. its cumulative weighted overhead curve is higher than those of the other algorithms; the proposed scheduling algorithm D3DQN-CAA performs best; and the curves in between, from high to low, are Random, DQN, Only Edge, and D3DQN.
The above is only intended to illustrate the technical idea of the present invention and does not limit its protection scope; any modification made on the basis of the technical solution according to the technical idea of the present invention falls within the protection scope of the claims of the present invention.

Claims (6)

1. The adaptive edge computing task scheduling method based on the DQN is characterized by comprising the following steps of:
1) When the number of training steps of the neural network is a multiple of the specified parameter-copying interval, copying the evaluation network parameters in the DQN to the target network; when the number of training steps is a multiple of the specified experience-replay interval, replaying the learning experience in the experience pool and emptying the experience pool;
2) Acquiring computing node configuration information, terminal device configuration information, and task configuration information as environment state information, and normalizing it as the input of the deep reinforcement learning neural network; the environment state information consists of the data size of the computing task, the number of required computing resources, the number of required storage resources, and the numbers of available computing resources and available storage resources of all computing nodes, namely state_i = (ds, tc, ts, nc_i, ns_i),
wherein state_i represents the state information of the computing task and the i-th computing node; ds, tc, and ts are respectively the data size, the number of required computing resources, and the number of required storage resources of the computing task; nc and ns are respectively the number of available computing resources and the number of available storage resources of the computing node;
3) Respectively obtaining the outputs of the evaluation network and the target network, calculating the final output of the neural network by a comprehensive Q-value calculation method that combines them with the loss value of the previous training step, and taking the final output as the adaptation degree value between the task and the computing nodes; the specific calculation formula of the comprehensive Q value is as follows:
wherein TNet and ENT are respectively the target network and the evaluation network, OT and OE are respectively their outputs, and Loss is the loss of the previous iteration;
4) Based on a self-adaptive dynamic action-space exploration-degree adjustment strategy, selecting for the task, with a certain probability, the computing node corresponding to the maximum adaptation degree value according to the final output of the neural network and the loss values of the last several training steps, and otherwise selecting a computing node randomly; the adaptive dynamic action-space exploration-degree adjustment strategy is specifically:
wherein rd is a random-number generation function that generates random numbers in the range [0,1]; if the value of F is True, selecting an offloading action corresponding to a non-maximum value for the current task to be processed, and if F is False, selecting the offloading action corresponding to the maximum value;
5) Calculating the loss values of all the current tasks;
the specific calculation method is as follows:
wherein output is the output of the evaluation network and action is the selected action;
6) Prioritizing the current tasks based on their loss values by means of an adaptive lightweight replay mechanism, and storing the learning experience with the highest priority in the experience pool;
7) Updating the evaluation network parameters;
8) Repeating until the end condition is satisfied.
2. The DQN-based adaptive edge computing task scheduling method of claim 1, wherein: in step 2), the configuration information of the subtasks into which the task is divided and the configuration information of each computing node are taken as the environment state information.
3. The DQN-based adaptive edge computing task scheduling method of claim 1, wherein: in step 3), the loss value of the previous training step in the comprehensive Q-value calculation method is used to weight the proportions of the evaluation network and the target network in the final output; the output of the target network dominates in the initial training stage of the neural network, and the output of the evaluation network gradually dominates as training progresses.
4. The DQN-based adaptive edge computing task scheduling method of claim 1, wherein: in step 4), the average of the loss values over the last several training steps is calculated in the adaptive dynamic action-space exploration-degree adjustment strategy and used as the basis for the computing-node selection probability.
5. The DQN-based adaptive edge computing task scheduling method of claim 1, wherein: in step 5), a cross entropy loss function is adopted when the loss values of all the current tasks are calculated.
6. The DQN-based adaptive edge computing task scheduling method of claim 1, wherein: in step 6), the adaptive lightweight replay mechanism sorts the current learning experience by loss value and stores the middle portion of the learning experience in the experience pool, because learning experience with a small loss value easily guides the neural network to a local optimum while learning experience with a large loss value is far from the optimal solution.
CN202211261147.7A 2022-10-14 2022-10-14 DQN-based self-adaptive edge computing task scheduling method Active CN115686779B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211261147.7A CN115686779B (en) 2022-10-14 2022-10-14 DQN-based self-adaptive edge computing task scheduling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211261147.7A CN115686779B (en) 2022-10-14 2022-10-14 DQN-based self-adaptive edge computing task scheduling method

Publications (2)

Publication Number Publication Date
CN115686779A CN115686779A (en) 2023-02-03
CN115686779B true CN115686779B (en) 2024-02-09

Family

ID=85067008

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211261147.7A Active CN115686779B (en) 2022-10-14 2022-10-14 DQN-based self-adaptive edge computing task scheduling method

Country Status (1)

Country Link
CN (1) CN115686779B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116257361B (en) * 2023-03-15 2023-11-10 北京信息科技大学 Unmanned aerial vehicle-assisted fault-prone mobile edge computing resource scheduling optimization method
CN116909717B (en) * 2023-09-12 2023-12-05 国能(北京)商务网络有限公司 Task scheduling method
CN117082008B (en) * 2023-10-17 2023-12-15 深圳云天畅想信息科技有限公司 Virtual elastic network data transmission scheduling method, computer device and storage medium
CN117806806A (en) * 2024-02-28 2024-04-02 湖南科技大学 Task part unloading scheduling method, terminal equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111866869A (en) * 2020-07-07 2020-10-30 兰州交通大学 Federal learning indoor positioning privacy protection method facing edge calculation
CN112134916A (en) * 2020-07-21 2020-12-25 南京邮电大学 Cloud edge collaborative computing migration method based on deep reinforcement learning
CN112822055A (en) * 2021-01-21 2021-05-18 国网河北省电力有限公司信息通信分公司 DQN-based edge computing node deployment algorithm
CN113269322A (en) * 2021-05-24 2021-08-17 东南大学 Deep reinforcement learning improvement method based on self-adaptive hyper-parameters
CN113296845A (en) * 2021-06-03 2021-08-24 南京邮电大学 Multi-cell task unloading algorithm based on deep reinforcement learning in edge computing environment
WO2022069747A1 (en) * 2020-10-02 2022-04-07 Deepmind Technologies Limited Training reinforcement learning agents using augmented temporal difference learning
CN114374949A (en) * 2021-12-31 2022-04-19 东莞理工学院 Power control mechanism based on information freshness optimization in Internet of vehicles

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111866869A (en) * 2020-07-07 2020-10-30 兰州交通大学 Federal learning indoor positioning privacy protection method facing edge calculation
CN112134916A (en) * 2020-07-21 2020-12-25 南京邮电大学 Cloud edge collaborative computing migration method based on deep reinforcement learning
WO2022069747A1 (en) * 2020-10-02 2022-04-07 Deepmind Technologies Limited Training reinforcement learning agents using augmented temporal difference learning
CN112822055A (en) * 2021-01-21 2021-05-18 国网河北省电力有限公司信息通信分公司 DQN-based edge computing node deployment algorithm
CN113269322A (en) * 2021-05-24 2021-08-17 东南大学 Deep reinforcement learning improvement method based on self-adaptive hyper-parameters
CN113296845A (en) * 2021-06-03 2021-08-24 南京邮电大学 Multi-cell task unloading algorithm based on deep reinforcement learning in edge computing environment
CN114374949A (en) * 2021-12-31 2022-04-19 东莞理工学院 Power control mechanism based on information freshness optimization in Internet of vehicles

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
D3DQN-CAA: A DRL-based adaptive edge computing task scheduling method; Ju Tao et al.; https://link.cnki.net/urlid/43.1061.N.20231013.0855.002; 2023-10-16; 1-13 *
Job Scheduling Based on Deep Reinforcement Learning in Cloud Data Center; Fengcun Li et al.; In Proceedings of the 2019 4th International Conference on Big Data and Computing; 2019-05-12; 48-53 *
Task offloading and resource allocation algorithm with multiple constraints in mobile edge computing; Tong Zhao et al.; Computer Engineering & Science; 2020-10-15; Vol. 42, No. 10; 1869-1879 *

Also Published As

Publication number Publication date
CN115686779A (en) 2023-02-03

Similar Documents

Publication Publication Date Title
CN115686779B (en) DQN-based self-adaptive edge computing task scheduling method
CN111556461B (en) Vehicle-mounted edge network task distribution and unloading method based on deep Q network
CN113434212B (en) Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
CN112380008B (en) Multi-user fine-grained task unloading scheduling method for mobile edge computing application
WO2021012946A1 (en) Video bit rate determining method and apparatus, electronic device, and storage medium
CN110460880B (en) Industrial wireless streaming media self-adaptive transmission method based on particle swarm and neural network
CN110780938B (en) Computing task unloading method based on differential evolution in mobile cloud environment
CN111768028B (en) GWLF model parameter adjusting method based on deep reinforcement learning
CN111813506A (en) Resource sensing calculation migration method, device and medium based on particle swarm algorithm
CN113485826A (en) Load balancing method and system for edge server
CN114760311A (en) Optimized service caching and calculation unloading method for mobile edge network system
CN113722980A (en) Ocean wave height prediction method, system, computer equipment, storage medium and terminal
CN114385272B (en) Ocean task oriented online adaptive computing unloading method and system
CN114706631B (en) Unloading decision method and system in mobile edge calculation based on deep Q learning
Dong et al. Quantum particle swarm optimization for task offloading in mobile edge computing
Li et al. An intelligent adaptive algorithm for servers balancing and tasks scheduling over mobile fog computing networks
CN114449584A (en) Distributed computing unloading method and device based on deep reinforcement learning
CN117579701A (en) Mobile edge network computing and unloading method and system
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
CN116305747A (en) Workflow multi-target scheduling method based on improved whale optimization algorithm
CN116137724A (en) Task unloading and resource allocation method based on mobile edge calculation
CN116367231A (en) Edge computing Internet of vehicles resource management joint optimization method based on DDPG algorithm
CN115766241A (en) Distributed intrusion detection system task scheduling and unloading method based on DQN algorithm
CN111709578A (en) Short-time ship traffic flow prediction method and device and storage medium
Yao et al. Performance Optimization in Serverless Edge Computing Environment using DRL-Based Function Offloading

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant