CN115914227A - Edge Internet of things agent resource allocation method based on deep reinforcement learning - Google Patents
- Publication number: CN115914227A (application CN202211401605.2A)
- Authority: CN (China)
- Prior art keywords: reinforcement learning, deep reinforcement, state, time, terminal device
- Prior art date: 2022-11-10
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
Description
Technical Field

The present invention relates to the technical field of the Internet of Things (IoT), and in particular to an edge IoT agent resource allocation method based on deep reinforcement learning.

Background Art

The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.

Reasonable resource allocation is an important guarantee for efficiently supporting the power services carried by edge IoT agents. The power IoT is an important part of the national industrial Internet, and building an efficient, secure and reliable perception layer has become an important construction task for the power industry. However, the computing power of current power IoT devices is limited and cannot effectively handle large local computing tasks that must be completed quickly. The edge IoT agent, as the core device of the IoT perception layer, connects IoT terminals to the cloud. With the access of voice, video, image and other data, together with high-frequency data collection and heterogeneous data storage, how to dynamically and adaptively deploy the tasks of IoT terminals on suitable edge IoT agent nodes is a key issue at this stage.

At present, the key problems of edge IoT agents are mainly reflected in two aspects. First, because IoT agents at different edges depend on one another, existing combinatorial optimization methods generally rely on approximate or heuristic algorithms to solve the deployment problem, which requires a long running time and delivers limited performance. Second, an edge IoT agent environment contains multiple edge nodes while the resource capacity of each edge server is limited; different edge nodes therefore need to cooperate through distributed decision-making to reach an optimal resource allocation that supports efficient and reliable information interaction.

The emergence of multi-layer network models provides a new solution for the optimal configuration of communication network resources: the network model is trained through a multi-layer network to obtain an accurate and efficient solution, and several researchers have studied and analyzed this direction. One existing scheme uses a convolutional neural network to achieve a reasonable allocation of IoT resources and efficient interaction and coordination between edge devices, terminal data and network tasks. Another scheme uses Bayesian optimization of a Q-learning network to rationalize and order resource allocation in the network so as to resist DDoS attacks. In addition, the introduction of deep spatio-temporal residual networks effectively supports load balancing in industrial IoT networks and ensures low-latency, highly reliable data interaction. Considering the heterogeneity of network devices, the existing technology mostly uses deep learning networks to match network servers with user requests and to allocate the best amount of resources to user devices. It should be noted, however, that because of the structure of deep network models, updating and iterating the network state easily runs into a mismatch between computing power and the problem being processed, which limits computing efficiency and is insufficient to support the optimal resource configuration of a complex power IoT.
Summary of the Invention

The purpose of the present invention is to address the above deficiencies of the prior art by providing an edge IoT agent resource allocation method based on deep reinforcement learning, which solves the problems that edge IoT agent resource allocation takes a long time, delivers limited performance, and that the prior art is insufficient to support the optimal resource configuration of a complex power IoT.

The technical solution of the present invention is as follows.

An edge IoT agent resource allocation method based on deep reinforcement learning comprises:

Step S1: terminal device x collects data from the environment and transmits the data to a deep reinforcement learning network model;

Step S2: according to the data, the deep reinforcement learning network model derives the optimal allocation strategy;

Step S3: according to the optimal allocation strategy, the data is sent to edge node e for computation, thereby realizing edge IoT agent resource allocation.
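The three steps above amount to a collect-decide-dispatch loop: the terminal gathers data, the trained model chooses where to offload, and the data is dispatched to the chosen edge node. The following Python sketch only illustrates that flow; the class names (Task, DRLModel), the best_action method and the random placeholder policy are assumptions introduced for illustration, not elements defined in this disclosure.

```python
# Minimal sketch of the S1-S3 flow: a terminal collects data, the trained DRL
# model picks an edge node, and the data is dispatched there for computation.
# The class names, the best_action method and the random placeholder policy are
# illustrative assumptions, not elements defined in this disclosure.
from dataclasses import dataclass
import random

@dataclass
class Task:
    device_id: int
    data_size_mb: float          # amount of data collected by terminal device x

class DRLModel:
    """Stand-in for the trained deep reinforcement learning network model."""
    def __init__(self, num_edge_nodes):
        self.num_edge_nodes = num_edge_nodes

    def best_action(self, task):
        # a trained model would return argmax_a Q(s, a); here the choice is random
        return random.randrange(self.num_edge_nodes)

def allocate(task, model):
    edge_node = model.best_action(task)   # step S2: optimal allocation strategy
    print(f"device {task.device_id}: send {task.data_size_mb:.1f} MB to edge node {edge_node}")
    return edge_node                      # step S3: dispatch to edge node e for computation

if __name__ == "__main__":
    model = DRLModel(num_edge_nodes=5)
    for x in range(3):                    # step S1: terminals collect environment data
        allocate(Task(device_id=x, data_size_mb=random.uniform(1.0, 10.0)), model)
```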
Furthermore, the training method of the deep reinforcement learning network model in step S1 comprises the following steps:

Step S101: initialize the system state s of the deep reinforcement learning network model;

Step S102: initialize the real-time ANN and the delayed ANN of the deep reinforcement learning network model;

Step S103: initialize the experience pool O of the deep reinforcement learning network model;

Step S104: according to the current system state s_t, select a system action a_t using the ε-greedy strategy;

Step S105: the environment feeds back the reward σ_{t+1} and the next system state s_{t+1} according to the system action a_t;

Step S106: from the current system state s_t, the system action a_t, the reward σ_{t+1} and the next system state s_{t+1}, form the state transition sequence Δ_t and store it in the experience pool O;

Step S107: determine whether the amount stored in the experience pool O has reached a preset value; if so, sample N state transition sequences from the experience pool O to train the real-time ANN and the delayed ANN, completing the training of the deep reinforcement learning network model; otherwise, update the current system state s_t to the next system state s_{t+1} and return to step S104.
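Steps S101–S107 describe a standard experience-collection loop: interact with the environment under an ε-greedy policy, store each transition Δ_t in the experience pool O, and begin training once the pool reaches a preset size. The sketch below is a minimal Python rendering of that loop; the environment interface (reset/step), the q_net callable and the numeric values (pool size, ε) are assumptions for illustration only.

```python
# Sketch of the training loop in steps S101-S107: interact with the environment
# under an epsilon-greedy policy, store each transition (s_t, a_t, sigma_{t+1},
# s_{t+1}) in the experience pool O, and only start training once the pool has
# reached a preset size. The environment interface and all numeric values are
# illustrative assumptions.
import random
from collections import deque

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)

def fill_experience_pool(env, q_net, pool_size=50, epsilon=0.1):
    pool = deque(maxlen=pool_size)                        # experience pool O (step S103)
    state = env.reset()                                   # initial system state s (step S101)
    while len(pool) < pool_size:                          # preset value check (step S107)
        action = epsilon_greedy(q_net(state), epsilon)    # step S104
        next_state, reward, _done = env.step(action)      # step S105
        pool.append((state, action, reward, next_state))  # transition Delta_t (step S106)
        state = next_state                                # s_t <- s_{t+1}
    return pool

if __name__ == "__main__":
    class DummyEnv:
        """Toy stand-in for the offloading environment (assumption)."""
        def reset(self):
            return 0
        def step(self, action):
            return random.randrange(5), random.random(), False  # next_state, reward, done

    pool = fill_experience_pool(DummyEnv(),
                                q_net=lambda s: [random.random() for _ in range(3)])
    print("experience pool size:", len(pool))
```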
Furthermore, the system state s in step S101 is the local offloading state, expressed as:

s = [F, M, B]

where:

F is the offloading decision vector;

M is the computing-resource allocation vector;

B is the remaining computing-resource vector, B = [b_1, b_2, b_3, …, b_d, …], where b_d is the remaining computing resource of the d-th MEC server, obtained by subtracting from the total computing resource G_d the computing resources allocated to each task in the allocation vector M.

The system action a_t in step S104 is expressed as:

a_t = [x, μ, k]

where:

x is the terminal device;

μ is the offloading scheme of terminal device x;

k is the computing-resource allocation scheme of terminal device x.

The reward σ_{t+1} in step S105 is computed from the reward function r and the following quantities:

A, the objective function value in the state at the current time t;

A', the objective function value of the next state reached after taking system action a_t in the current system state s_t;

A'', the value computed when all tasks are offloaded locally.

The state transition sequence Δ_t in step S106 is expressed as:

Δ_t = (s_t, a_t, σ_{t+1}, s_{t+1}).
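The state, action and transition definitions above translate directly into simple data containers. The sketch below shows one possible representation; the field types and the helper remaining_resources are assumptions for illustration, and the remaining-resource computation only reflects the verbal relation that b_d equals G_d minus the resources already allocated.

```python
# Illustrative containers for the system state s = [F, M, B], the action
# a_t = [x, mu, k] and the transition Delta_t = (s_t, a_t, sigma_{t+1}, s_{t+1}).
# Field types, and the helper for the remaining resources b_d, are assumptions.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SystemState:
    F: List[int]      # offloading decision vector
    M: List[float]    # computing-resource allocation vector
    B: List[float]    # remaining computing resources b_d of the MEC servers

@dataclass
class Action:
    x: int            # terminal device index
    mu: int           # offloading scheme chosen for device x
    k: float          # computing resources allocated to device x

# Delta_t = (s_t, a_t, sigma_{t+1}, s_{t+1})
Transition = Tuple[SystemState, Action, float, SystemState]

def remaining_resources(G: List[float], allocated: List[List[float]]) -> List[float]:
    """b_d: total resources G_d minus the resources already allocated on server d."""
    return [G_d - sum(alloc_d) for G_d, alloc_d in zip(G, allocated)]

if __name__ == "__main__":
    print(remaining_resources(G=[10.0, 8.0], allocated=[[2.0, 3.0], [1.0]]))  # [5.0, 7.0]
```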
Furthermore, the method for training the real-time ANN and the delayed ANN in step S107 comprises the following steps:

Step S1071: for the N state transition sequences, obtain from each sequence the estimated value Q(s_t, a_t, θ) of the state-action pair and the value Q(s_{t+1}, a_{t+1}, θ') of the next state;

Step S1072: from the next-state value Q(s_{t+1}, a_{t+1}, θ') and the reward σ_{t+1}, compute the target value y of the state-action pair;

Step S1073: from the estimated value Q(s_t, a_t, θ) of the state-action pair and the target value y, compute the loss function Loss(θ);

Step S1074: adjust the parameters θ of the real-time ANN through back-propagation of the loss, and reduce the loss function Loss(θ) with the RMSprop optimizer;

Step S1075: determine whether the number of steps since the parameters θ' of the delayed ANN were last updated equals the set value; if so, update the parameters θ' of the delayed ANN and go to step S1077; otherwise, go to step S1076;

Step S1076: determine whether training on the N state transition sequences has finished; if so, sample N new state transition sequences from the experience pool O and return to step S1071; otherwise, return to step S1071;

Step S1077: test the performance indicators of the deep reinforcement learning network model to obtain a test result;

Step S1078: determine whether the test result meets the requirements; if so, training of the real-time ANN and the delayed ANN ends and the trained deep reinforcement learning network model is obtained; otherwise, sample N new state transition sequences from the experience pool O and return to step S1071.
Furthermore, the target value y of the state-action pair in step S1072 is computed as:

y = σ_{t+1} + γ · max Q(s_{t+1}, a_{t+1}, θ')

where:

γ is the fluctuation coefficient applied to max Q(s_{t+1}, a_{t+1}, θ');

Q(s_{t+1}, a_{t+1}, θ') is the value of the next system state;

max Q(s_{t+1}, a_{t+1}, θ') is the maximum value of the next system state.

The loss function Loss(θ) in step S1073 is expressed as:

Loss(θ) = (1/N) · Σ_{n=1}^{N} [y − Q(s_t, a_t, θ)]²

where:

N is the number of state transition sequences sampled each time;

n is the index of a state transition sequence.
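With the symbols above, the target value and the loss can be evaluated over a sampled batch of N transitions as in standard DQN training. The sketch below assumes the reconstructed forms given above (γ as the fluctuation/discount coefficient and a mean-squared error over the batch); the random arrays standing in for Q(s_t, ·, θ) and Q(s_{t+1}, ·, θ') are assumptions for illustration.

```python
# Target value y and loss Loss(theta) for a sampled batch of N transitions,
# following the standard DQN form of steps S1071-S1073. The random arrays stand
# in for Q(s_t, ., theta) from the real-time ANN and Q(s_{t+1}, ., theta') from
# the delayed ANN; gamma plays the role of the fluctuation (discount) coefficient.
import numpy as np

def td_targets(rewards, next_q_values, gamma=0.9):
    # y = sigma_{t+1} + gamma * max Q(s_{t+1}, a_{t+1}, theta')
    return rewards + gamma * next_q_values.max(axis=1)

def dqn_loss(q_values, actions, targets):
    # Loss(theta) = (1/N) * sum_n [y - Q(s_t, a_t, theta)]^2
    idx = np.arange(len(actions))
    taken = q_values[idx, actions]
    return float(np.mean((targets - taken) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, num_actions = 4, 3
    q_values = rng.normal(size=(N, num_actions))       # Q(s_t, ., theta)
    next_q_values = rng.normal(size=(N, num_actions))  # Q(s_{t+1}, ., theta')
    rewards = rng.normal(size=N)                       # sigma_{t+1}
    actions = rng.integers(0, num_actions, size=N)     # a_t
    print("Loss(theta) =", dqn_loss(q_values, actions, td_targets(rewards, next_q_values)))
```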
Furthermore, the performance indicators of the deep reinforcement learning network model in step S1077 include the global cost and the reliability;

the global cost comprises the delay cost c1, the migration cost c2 and the load cost c3.

Furthermore, the delay cost c1 is computed from the following quantities:

t, the number of interactions;

X, the set of terminal devices;

E, the set of edge nodes;

u_x, the amount of data sent;

the deployment variable of terminal device x and edge node e in the current interaction time;

τ_xe, the transmission delay between terminal device x and edge node e.

The migration cost c2 is computed from the following quantities:

j, the migration edge node;

the deployment variable of terminal device x and edge node e in the previous interaction time;

the deployment variable of terminal device x and migration edge node j in the current interaction time.

The load cost c3 is computed from u_x, the amount of data sent.
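Because the closed-form cost expressions are not reproduced in this text, the sketch below should be read only as one plausible way to combine the quantities listed above (data volume u_x, deployment variables, transmission delay τ_xe, and node capacity): the delay term weights data volume by deployment and per-link delay, the migration term counts re-deployed tasks, and the load term penalizes capacity overruns. All three combinations, the capacity vector and the unit migration cost are assumptions.

```python
# One plausible way to combine the quantities listed above into the three cost
# terms for a single interaction t. The exact closed forms are not reproduced in
# this text, so these combinations, the per-migration cost and the capacity
# vector are assumptions for illustration only.
import numpy as np

def delay_cost(u, deploy, tau):
    # c1: data volume u_x weighted by the deployment variable and the link delay tau_xe
    return float(np.sum(u[:, None] * deploy * tau))

def migration_cost(prev_deploy, deploy, per_migration=1.0):
    # c2: count tasks whose deployment changed between two consecutive interactions
    moved = np.logical_and(prev_deploy == 1, deploy == 0).sum()
    return per_migration * float(moved)

def load_cost(u, deploy, capacity):
    # c3: penalize edge nodes whose assigned data volume exceeds their capacity
    load = deploy.T @ u                        # data volume placed on each edge node
    return float(np.sum(np.maximum(load - capacity, 0.0)))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X, E = 6, 3                                # terminal devices and edge nodes
    u = rng.uniform(1, 5, size=X)              # u_x, amount of data sent
    deploy = np.eye(E)[rng.integers(0, E, X)]  # current deployment variables (one node per task)
    prev_deploy = np.eye(E)[rng.integers(0, E, X)]
    tau = rng.uniform(0.1, 1.0, size=(X, E))   # tau_xe, transmission delays
    capacity = np.full(E, 8.0)
    print("c1 =", round(delay_cost(u, deploy, tau), 3))
    print("c2 =", migration_cost(prev_deploy, deploy))
    print("c3 =", round(load_cost(u, deploy, capacity), 3))
```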
Furthermore, the calculation of the reliability comprises the following steps:

Step A1: store the interaction data between terminal device x and edge node e in a sliding window and update it in real time;

Step A2: from the historical interaction data between terminal device x and edge node e, compute the time-decay degree and the resource allocation rate of the current interaction using the expected value of a Bayesian trust evaluation;

Step A3: from the time-decay degree and the resource allocation rate, compute the reliability T_ex(t).

Furthermore, the reliability T_ex(t) is computed from the following quantities, where the negative service satisfaction satisfies N_ex(t) = 1 − P_ex(t):

U, the number of valid records in the sliding window;

w, the current interaction record;

the time-decay degree;

H_ex(t_w), the resource allocation rate;

ε, a fluctuation coefficient;

P_ex(t_w), the positive service satisfaction of the current interaction;

N_ex(t_w), the negative service satisfaction of the current interaction;

s_ex(t), the number of successful historical interactions between terminal device x and edge node e;

f_ex(t), the number of failed historical interactions between terminal device x and edge node e.

Furthermore, the time-decay degree in step A2 is determined by Δt_w, the time gap from the end of the w-th interaction to the start of the current interaction;

the resource allocation rate in step A2 is computed as

H_ex(t) = source_ex(t) / source_e(t)

where:

source_ex(t) is the amount of resources that edge node e can provide to terminal device x in the current time slot;

source_e(t) is the total amount of resources that edge node e can provide in the current time slot.
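The reliability calculation combines a sliding window of recent interactions, a time-decay weight over Δt_w, the resource allocation rate H_ex, and the positive/negative service satisfaction. Since the closed form of T_ex(t) is not reproduced here, the sketch below combines these ingredients in one illustrative way; the exponential decay, the ratio-based satisfaction estimate and the final weighting are assumptions.

```python
# Illustrative reliability score for a terminal/edge pair, combining the
# ingredients of steps A1-A3: a sliding window of recent interactions, a
# time-decay weight over Delta_t_w, the resource allocation rate H_ex, and the
# positive/negative service satisfaction. The exponential decay, the ratio-based
# satisfaction estimate and the final weighting are assumptions; the original
# closed form of T_ex(t) is not reproduced here.
import math
from collections import deque

class ReliabilityTracker:
    def __init__(self, window_size=10):
        self.window = deque(maxlen=window_size)        # step A1: sliding window

    def record(self, end_time, success, provided, total):
        self.window.append((end_time, success, provided, total))

    def reliability(self, now):
        if not self.window:
            return 0.0
        successes = sum(1 for _, ok, _, _ in self.window if ok)
        p_ex = successes / len(self.window)            # positive satisfaction (assumed s/(s+f))
        score, weight_sum = 0.0, 0.0
        for end_time, _ok, provided, total in self.window:
            decay = math.exp(-(now - end_time))        # step A2: time-decay weight (assumed form)
            h_ex = provided / total                    # resource allocation rate H_ex
            score += decay * h_ex
            weight_sum += decay
        return p_ex * score / weight_sum               # step A3: assumed combination

if __name__ == "__main__":
    tracker = ReliabilityTracker()
    tracker.record(end_time=1.0, success=True, provided=4.0, total=10.0)
    tracker.record(end_time=2.5, success=True, provided=7.0, total=10.0)
    tracker.record(end_time=3.0, success=False, provided=1.0, total=10.0)
    print("T_ex(t) ~", round(tracker.reliability(now=4.0), 3))
```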
Compared with the prior art, the present invention has the following beneficial effects:

1. In the edge IoT agent resource allocation method based on deep reinforcement learning, the optimal allocation strategy is computed by a deep reinforcement learning network model, and according to this strategy the terminal data is transmitted to edge node e for computation. This effectively relieves the computing pressure on field devices, avoids the storage difficulties caused by large data volumes during resource allocation, guarantees reliable and efficient information interaction over the communication network, and provides better information-interaction support services for the power IoT.

2. The deep reinforcement learning network model combines the perception capability of deep learning with the decision-making capability of reinforcement learning so that the two complement each other, which supports solving for the optimal strategy over large volumes of data.

3. The neural networks include a real-time ANN and a delayed ANN; after a set number of training steps, the parameters of the delayed ANN are updated to those of the real-time ANN, which keeps the value function of the delayed ANN up to date and reduces the correlation between states.

4. The global cost and the reliability are used as performance indicators of the network model, providing the basis on which the network model searches for the optimal strategy.

5. A sliding-window mechanism is used to update the interaction information, directly discarding interaction records with long elapsed times, which reduces the computational overhead of the user terminal; the reliability calculation guarantees the security of the user terminal during task offloading and helps establish a good interaction environment.

6. The method computes the interaction-quality values between the user terminal and the edge server, preparing for the reliability calculation and providing the basis on which the network model searches for the optimal strategy.
Brief Description of the Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a flow chart of the implementation of the deep reinforcement learning network model in the present invention.

FIG. 3 is a flow chart of the training method of the real-time ANN and the delayed ANN in the present invention.

FIG. 4 is a flow chart of the reliability calculation method in the present invention.

FIG. 5 is a schematic diagram of the sliding window in the present invention.

FIG. 6 is a structural diagram of the deep reinforcement learning network in the present invention.

FIG. 7 shows the parameters of the deep reinforcement learning network model in an embodiment of the present invention.

FIG. 8 is a graph of network performance at different learning rates for the deep reinforcement learning network model in an embodiment of the present invention.
Detailed Description

It should be noted that relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes it.

The features and performance of the present invention are described in further detail below in conjunction with the embodiments.

Embodiment 1
Referring to FIG. 1, an edge IoT agent resource allocation method based on deep reinforcement learning comprises:

Step S1: terminal device x collects data from the environment and transmits the data to the deep reinforcement learning network model.

Preferably, in this embodiment, the data collected by terminal device x includes the voice, video, image and other data of the user terminal.

Preferably, in this embodiment, python3 + tensorflow1.0 is used as the simulation platform, the hardware is an Intel Core i7-5200U with 16 GB of memory, and the simulation test environment contains 50 terminal devices x and 5 edge nodes e, distributed uniformly over a 15 km × 15 km grid.

Preferably, in this embodiment, terminal device x sends a task request to edge node e once every hour, and the edge nodes e decide in a distributed manner which server executes each task. The load of terminal device x comes from a real load dataset; in this dataset, because of the tidal effect, the load of terminal tasks roughly follows a 24-hour periodic distribution but also fluctuates randomly due to environmental factors.
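The embodiment's test setup (50 terminal devices, 5 edge nodes, a 15 km × 15 km grid, one task request per terminal per hour, roughly 24-hour periodic load with random fluctuation) can be mocked up as follows. The sinusoidal load profile and the noise level are assumptions standing in for the real load dataset mentioned above.

```python
# Mock-up of the simulation setup described above: 50 terminal devices and
# 5 edge nodes placed uniformly on a 15 km x 15 km grid, with every terminal
# issuing one task request per hour. The sinusoidal 24-hour load profile and
# the noise level are assumptions standing in for the real load dataset.
import numpy as np

rng = np.random.default_rng(42)
NUM_TERMINALS, NUM_EDGE_NODES, GRID_KM = 50, 5, 15.0

terminal_xy = rng.uniform(0.0, GRID_KM, size=(NUM_TERMINALS, 2))  # terminal positions
edge_xy = rng.uniform(0.0, GRID_KM, size=(NUM_EDGE_NODES, 2))     # edge-node positions

def hourly_load(hour):
    """Per-terminal load: roughly periodic over 24 h (tidal effect) plus random fluctuation."""
    base = 1.0 + 0.5 * np.sin(2.0 * np.pi * hour / 24.0)
    noise = rng.normal(0.0, 0.1, size=NUM_TERMINALS)
    return np.clip(base + noise, 0.1, None)

for hour in range(3):                      # one task request per terminal per hour
    loads = hourly_load(hour)
    print(f"hour {hour}: mean task load {loads.mean():.2f}")
```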
Preferably, in this embodiment, FIG. 7 lists the parameters of the deep reinforcement learning network model.

Step S2: according to the data, the deep reinforcement learning network model derives the optimal allocation strategy.

Step S3: according to the optimal allocation strategy, the data is sent to edge node e for computation, thereby realizing edge IoT agent resource allocation.

In this embodiment, specifically, as shown in FIG. 2, the training method of the deep reinforcement learning network model in step S1 comprises the following steps:

Step S101: initialize the system state s of the deep reinforcement learning network model;

Step S102: initialize the real-time ANN and the delayed ANN of the deep reinforcement learning network model;

Step S103: initialize the experience pool O of the deep reinforcement learning network model;

Step S104: according to the current system state s_t, select a system action a_t using the ε-greedy strategy;

Step S105: the environment feeds back the reward σ_{t+1} and the next system state s_{t+1} according to the system action a_t;

Step S106: from the current system state s_t, the system action a_t, the reward σ_{t+1} and the next system state s_{t+1}, form the state transition sequence Δ_t and store it in the experience pool O;

Step S107: determine whether the amount stored in the experience pool O has reached a preset value; if so, sample N state transition sequences from the experience pool O to train the real-time ANN and the delayed ANN, completing the training of the deep reinforcement learning network model; otherwise, update the current system state s_t to the next system state s_{t+1} and return to step S104.
In this embodiment, specifically, the system state s in step S101 is the local offloading state, expressed as:

s = [F, M, B]

where:

F is the offloading decision vector;

M is the computing-resource allocation vector;

B is the remaining computing-resource vector, B = [b_1, b_2, b_3, …, b_d, …], where b_d is the remaining computing resource of the d-th MEC server, obtained by subtracting from the total computing resource G_d the computing resources allocated to each task in the allocation vector M.

The system action a_t in step S104 is expressed as:

a_t = [x, μ, k]

where:

x is the terminal device;

μ is the offloading scheme of terminal device x;

k is the computing-resource allocation scheme of terminal device x.

The reward σ_{t+1} in step S105 is computed from the reward function r and the following quantities:

A, the objective function value in the state at the current time t;

A', the objective function value of the next state reached after taking system action a_t in the current system state s_t;

A'', the value computed when all tasks are offloaded locally.

The state transition sequence Δ_t in step S106 is expressed as:

Δ_t = (s_t, a_t, σ_{t+1}, s_{t+1}).
In this embodiment, specifically, as shown in FIG. 3, the method for training the real-time ANN and the delayed ANN in step S107 comprises the following steps:

Step S1071: for the N state transition sequences, obtain from each sequence the estimated value Q(s_t, a_t, θ) of the state-action pair and the value Q(s_{t+1}, a_{t+1}, θ') of the next state;

Step S1072: from the next-state value Q(s_{t+1}, a_{t+1}, θ') and the reward σ_{t+1}, compute the target value y of the state-action pair;

Step S1073: from the estimated value Q(s_t, a_t, θ) of the state-action pair and the target value y, compute the loss function Loss(θ);

Step S1074: adjust the parameters θ of the real-time ANN through back-propagation of the loss, and reduce the loss function Loss(θ) with the RMSprop optimizer;

Step S1075: determine whether the number of steps since the parameters θ' of the delayed ANN were last updated equals the set value; if so, update the parameters θ' of the delayed ANN and go to step S1077; otherwise, go to step S1076;

Step S1076: determine whether training on the N state transition sequences has finished; if so, sample N new state transition sequences from the experience pool O and return to step S1071; otherwise, return to step S1071;

Step S1077: test the performance indicators of the deep reinforcement learning network model to obtain a test result;

Step S1078: determine whether the test result meets the requirements; if so, training of the real-time ANN and the delayed ANN ends and the trained deep reinforcement learning network model is obtained; otherwise, sample N new state transition sequences from the experience pool O and return to step S1071.

In this embodiment, specifically, the target value y of the state-action pair in step S1072 is computed as:

y = σ_{t+1} + γ · max Q(s_{t+1}, a_{t+1}, θ')

where:

γ is the fluctuation coefficient applied to max Q(s_{t+1}, a_{t+1}, θ');

Q(s_{t+1}, a_{t+1}, θ') is the value of the next system state;

max Q(s_{t+1}, a_{t+1}, θ') is the maximum value of the next system state.

The loss function Loss(θ) in step S1073 is expressed as:

Loss(θ) = (1/N) · Σ_{n=1}^{N} [y − Q(s_t, a_t, θ)]²

where:

N is the number of state transition sequences sampled each time;

n is the index of a state transition sequence.
In this embodiment, specifically, the performance indicators of the deep reinforcement learning network model in step S1077 include the global cost and the reliability;

the global cost comprises the delay cost c1, the migration cost c2 and the load cost c3.

In this embodiment, three factors are considered in order to achieve efficient task processing: the delay cost c1, the migration cost c2 and the load cost c3. Because terminal device x needs to send the collected data to edge node e for processing, the data transmission introduces a time delay. When processing a task, edge node e may also decide whether to send the task to a migration edge node j; however, since migrating a task requires redeploying the model, this incurs a migration cost. Because the capacity of edge node e is limited, if too many tasks are deployed on the same edge node e it tends to become overloaded, which produces a load cost.

In this embodiment, specifically, the delay cost c1 is computed from the following quantities:

t, the number of interactions;

X, the set of terminal devices;

E, the set of edge nodes;

u_x, the amount of data sent;

the deployment variable of terminal device x and edge node e in the current interaction time;

τ_xe, the transmission delay between terminal device x and edge node e.

The migration cost c2 is computed from the following quantities:

j, the migration edge node;

the deployment variable of terminal device x and edge node e in the previous interaction time;

the deployment variable of terminal device x and migration edge node j in the current interaction time.

The load cost c3 is computed from u_x, the amount of data sent.

In this embodiment, specifically, as shown in FIG. 4, the calculation of the reliability comprises the following steps:

Step A1: store the interaction data between terminal device x and edge node e in a sliding window and update it in real time.

In this embodiment, considering that interaction experience separated by a long interval is not sufficient to update the current reliability value in time, more attention should be paid to recent interaction behaviour, so a sliding-window mechanism is used to update the interaction information. As shown in FIG. 5, when the interaction information of the next time slot arrives, the record of the time slot with the longest elapsed time is discarded from the window and the valid interaction information is recorded in the window, which reduces the computational overhead of the user terminal.
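The sliding-window update described above — discard the record with the longest elapsed time whenever a new time slot's interaction information arrives — maps directly onto a fixed-length FIFO queue, as the following sketch shows; the window length and the record fields are assumptions.

```python
# The sliding-window update of FIG. 5 (discard the record with the longest
# elapsed time when the next time slot's interaction information arrives) maps
# directly onto a fixed-length FIFO queue. The window length and record fields
# are assumptions.
from collections import deque

WINDOW_LENGTH = 5
window = deque(maxlen=WINDOW_LENGTH)

for slot in range(8):                                   # interaction time slots arrive in order
    window.append({"slot": slot, "success": slot % 3 != 0})
    # once the window is full, appending automatically evicts the oldest slot
    print(f"after slot {slot}: window holds slots {[w['slot'] for w in window]}")
```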
Step A2: from the historical interaction data between terminal device x and edge node e, compute the time-decay degree and the resource allocation rate of the current interaction using the expected value of a Bayesian trust evaluation.

In this embodiment, because the reliability of an edge server is updated dynamically, the longer a piece of historical interaction information lies from the current time, the smaller its influence on the current reliability evaluation. The time-decay function expresses the degree to which the information obtained from the w-th interaction has decayed by the current interaction time slot, where Δt_w = t − t_w and t_w is the end time of the w-th interaction time slot. The amount of computing resources the edge server can provide during each interaction also affects the update of the interaction information.

Step A3: from the time-decay degree and the resource allocation rate, compute the reliability T_ex(t).

In this embodiment, specifically, the reliability T_ex(t) is computed from the following quantities, where the negative service satisfaction satisfies N_ex(t) = 1 − P_ex(t):

U, the number of valid records in the sliding window;

w, the current interaction record;

the time-decay degree;

H_ex(t_w), the resource allocation rate;

ε, a fluctuation coefficient;

P_ex(t_w), the positive service satisfaction of the current interaction;

N_ex(t_w), the negative service satisfaction of the current interaction;

s_ex(t), the number of successful historical interactions between terminal device x and edge node e;

f_ex(t), the number of failed historical interactions between terminal device x and edge node e.

In this embodiment, specifically, the time-decay degree in step A2 is determined by Δt_w, the time gap from the end of the w-th interaction to the start of the current interaction;

the resource allocation rate in step A2 is computed as

H_ex(t) = source_ex(t) / source_e(t)

where:

source_ex(t) is the amount of resources that edge node e can provide to terminal device x in the current time slot;

source_e(t) is the total amount of resources that edge node e can provide in the current time slot.

In this embodiment, the deep reinforcement learning network model is used to solve for the optimal allocation strategy. As shown in FIG. 6, the model comprises two neural networks. The first, called the real-time ANN, computes the estimated value Q(s_t, a_t, θ) of the current state-action pair, where θ denotes the parameters of the real-time ANN, which are updated every time the estimate for the current state is computed. The second, called the delayed ANN, computes the value Q(s_{t+1}, a_{t+1}, θ') of the next state, which is used to compute the target value y.
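The two-network arrangement of FIG. 6 can be summarized as: update the real-time parameters θ at every training step, and copy them into the delayed parameters θ' only every fixed number of steps, so that the target values change slowly. The sketch below illustrates this periodic hard update; the single linear layer, the pretend gradient step and the sync interval C are assumptions.

```python
# Sketch of the two-network arrangement of FIG. 6: the real-time ANN (theta) is
# updated at every training step, and every C steps its parameters are copied
# into the delayed ANN (theta'), which is only used for the next-state value
# Q(s_{t+1}, a_{t+1}, theta'). The single linear layer, the pretend gradient
# step and the sync interval C are assumptions.
import numpy as np

rng = np.random.default_rng(7)
theta = {"W": rng.normal(size=(8, 4)), "b": np.zeros(4)}   # real-time ANN parameters
theta_prime = {k: v.copy() for k, v in theta.items()}      # delayed ANN parameters

SYNC_INTERVAL_C = 10

def q_values(params, state):
    """Toy Q-value head: a single linear layer."""
    return state @ params["W"] + params["b"]

for step in range(1, 31):
    # stand-in for one RMSprop update of the real-time parameters theta
    theta["W"] -= 0.01 * rng.normal(size=theta["W"].shape)
    theta["b"] -= 0.01 * rng.normal(size=theta["b"].shape)
    if step % SYNC_INTERVAL_C == 0:                        # periodic hard update of theta'
        theta_prime = {k: v.copy() for k, v in theta.items()}
        state = rng.normal(size=8)
        same = np.allclose(q_values(theta, state), q_values(theta_prime, state))
        print(f"step {step}: delayed ANN refreshed; Q-values identical: {same}")
```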
In this embodiment, the effect of different learning rates on the deep reinforcement learning network model was tested. As shown in FIG. 8, when the learning-rate factor is set to 0.01, the network loss function cannot converge effectively and its value oscillates noticeably. By contrast, when the learning rate is set to 0.0001, the dispersion of the network is effectively improved and the network converges by about 60 iterations, although the convergence speed is noticeably slower. With the learning rate set to 0.0001, the resource allocation performance is best: the loss function drops quickly and the network converges more stably, with a better convergence result.

The above-described embodiments only express specific implementations of the present application; their description is relatively specific and detailed, but they should not be understood as limiting the scope of protection of the present application. It should be pointed out that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the technical solution of the present application, and these all fall within the scope of protection of the present application.

This Background section is provided to generally present the context of the invention. The work of the presently named inventors, to the extent it is described in the Background section, as well as aspects of the description that did not constitute prior art at the time of filing, are neither expressly nor impliedly admitted to be prior art against the present invention.
Claims (10)
Priority Applications (1)

- CN202211401605.2A (granted as CN115914227B) — priority date 2022-11-10, filing date 2022-11-10 — Edge internet of things proxy resource allocation method based on deep reinforcement learning

Applications Claiming Priority (1)

- CN202211401605.2A (granted as CN115914227B) — priority date 2022-11-10, filing date 2022-11-10 — Edge internet of things proxy resource allocation method based on deep reinforcement learning
Publications (2)

- CN115914227A — published 2023-04-04 (this publication)
- CN115914227B — published 2024-03-19 (granted publication)

Family

- ID: 86493215

Family Applications (1)

- CN202211401605.2A (CN115914227B, Active) — priority/filing date 2022-11-10 — Edge internet of things proxy resource allocation method based on deep reinforcement learning

Country Status (1)

- CN: CN115914227B
Patent Citations (4)

- CN112134916A — priority 2020-07-21, published 2020-12-25 — A cloud-edge collaborative computing migration method based on deep reinforcement learning
- US 2022/0180174 A1 — priority 2020-12-07, published 2022-06-09 — Using a deep learning based surrogate model in a simulation
- CN113890653A — priority 2021-08-30, published 2022-01-04 — Multi-agent reinforcement learning power allocation method for multi-user benefit
- CN114490057A — priority 2022-01-24, published 2022-05-13 — A deep reinforcement learning-based resource allocation method for MEC offloaded tasks
Non-Patent Citations (5)

- BO FENG et al., "Influence analysis of neutral point grounding mode on the single-phase grounding fault characteristics of distribution network with distributed generation", 2020 5th Asia Conference on Power and Electrical Engineering (ACPEE), 30 June 2020.
- 朱斐, 吴文, 刘全, 伏玉琛, "A deep Q-network method with maximum upper-confidence-bound experience sampling" (一种最大置信上界经验采样的深度Q网络方法), 计算机研究与发展 (Journal of Computer Research and Development), no. 08, 15 August 2018.
- 李孜恒, 孟超, "A wireless network resource allocation algorithm based on deep reinforcement learning" (基于深度强化学习的无线网络资源分配算法), 通信技术 (Communications Technology), vol. 53, no. 008, 31 December 2020.
- 李孜恒, 孟超, "A wireless network resource allocation algorithm based on deep reinforcement learning" (基于深度强化学习的无线网络资源分配算法), 通信技术 (Communications Technology), no. 08, 10 August 2020.
- 饶宁 et al., "A distributed cooperative jamming power allocation algorithm based on multi-agent deep reinforcement learning" (基于多智能体深度强化学习的分布式协同干扰功率分配算法), 电子学报 (Acta Electronica Sinica), 30 June 2022.
Cited By (1)

- CN118916180A — priority 2024-09-30, published 2024-11-08 — 苏州元脑智能科技有限公司 — Resource scheduling method, device, program product and storage medium
Also Published As

- CN115914227B — published 2024-03-19
Legal Events

- PB01 — Publication
- SE01 — Entry into force of request for substantive examination
- GR01 — Patent grant