CN115413044A - A Joint Allocation Method of Computing and Communication Resources in Industrial Wireless Network - Google Patents

A Joint Allocation Method of Computing and Communication Resources in Industrial Wireless Network

Info

Publication number
CN115413044A
Authority
CN
China
Prior art keywords
industrial
computing
terminal
terminals
tasks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211052799.XA
Other languages
Chinese (zh)
Other versions
CN115413044B (en)
Inventor
许驰
唐紫萱
金曦
夏长清
李栋
曾鹏
于海斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Institute of Automation of CAS
Original Assignee
Shenyang Institute of Automation of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Institute of Automation of CAS
Priority to CN202211052799.XA
Publication of CN115413044A
Application granted
Publication of CN115413044B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The present invention relates to industrial wireless networks, and specifically to a method for jointly allocating computing and communication resources in an industrial wireless network. The method comprises the following steps: establishing an industrial wireless network with edge computing capability; formulating the joint computing and communication resource allocation problem; transforming the problem on the basis of a Markov decision process; constructing a two-layer dual deep neural network model based on deep reinforcement learning; training the neural network model offline until the reward converges; and executing the computing and communication resource allocation online to process heterogeneous industrial tasks cooperatively. By jointly allocating the computing and communication resources of the industrial wireless network, the invention supports on-demand offloading of heterogeneous industrial tasks and achieves device-edge resource coordination. Subject to the deadline requirements of the heterogeneous industrial tasks and to the limits on the computing capability, maximum transmit power, and peak interference power of the device-edge equipment, it minimizes the total delay of processing heterogeneous industrial tasks and supports collaborative manufacturing.

Description

A Joint Allocation Method of Computing and Communication Resources in Industrial Wireless Network

Technical Field

The present invention relates to the field of industrial wireless networks, and in particular to a method for jointly allocating the computing and communication resources of an industrial wireless network.

Background

As 5G Ultra-Reliable Low-Latency Communication (URLLC) technology continues to mature, 5G-based industrial wireless networks are becoming increasingly capable and can support critical industrial control tasks and complex production. However, although URLLC can, to a certain extent, guarantee that control commands arrive deterministically before their deadlines, complex industrial tasks usually require real-time computation to make decisions and generate those control commands, which poses a major challenge for resource-constrained industrial field devices. The rapid development of Multi-access Edge Computing (MEC) and its combination with 5G URLLC offer an effective way to perform the real-time computation of complex production tasks and to reduce delay. Deploying MEC servers jointly with industrial base stations supplements and strengthens the computing capability of industrial field devices and improves the real-time performance of complex task processing.

However, the uneven distribution of device and edge resources in industrial wireless networks, the competition for resources caused by massive device access, and the differing QoS requirements of heterogeneous tasks make on-demand allocation of the network's computing and communication resources even more challenging and have become a bottleneck for the efficient application of MEC. To coordinate device and edge resources in industrial wireless networks, researchers have proposed various resource allocation and computation offloading algorithms that balance the allocation of computing and communication resources in different scenarios, including single-user multi-server, multi-user single-server, and multi-user multi-server scenarios. These methods aim to minimize delay, minimize energy consumption, maximize throughput, and so on, and mainly use Deep Reinforcement Learning (DRL) to decide and allocate complexly coupled computing and communication resources such as computing decisions, offloading ratios, transmit power, communication bandwidth, and CPU resources, in order to cope with highly dynamic wireless network environments.

However, existing methods pay little attention to the highly concurrent access of heterogeneous industrial tasks with different QoS requirements, such as computation-intensive machine vision tasks and delay-sensitive motion control tasks. In industrial scenarios in particular, heterogeneous industrial tasks have different deadline requirements and interference limits, which makes the allocation of computing and communication resources in industrial wireless networks very difficult.

Summary of the Invention

The present invention addresses the cooperative processing of heterogeneous, highly concurrent tasks in industrial wireless networks in the general multi-user multi-server scenario. Taking into account the deadline requirements of heterogeneous tasks, the maximum transmit power of industrial terminals and the peak interference power they impose on other devices, and the computing capability of the network's device and edge equipment, it proposes a deep-reinforcement-learning-based method for jointly allocating the communication and computing resources of an industrial wireless network. The method overcomes the state-space explosion that traditional resource allocation methods struggle with in dynamic network environments, minimizes the total processing delay of heterogeneous industrial tasks, and supports real-time cooperative processing of heterogeneous, highly concurrent industrial tasks such as computation-intensive and delay-sensitive tasks.

The technical solution adopted by the present invention to achieve the above object is a method for jointly allocating computing and communication resources in an industrial wireless network, which uses deep reinforcement learning to coordinate the allocation of computing and communication resources between the industrial base stations and industrial terminals in the industrial wireless network and comprises the following steps:

1) Establish an industrial wireless network with edge computing capability;

2) Formulate, according to the deadline requirements of the heterogeneous industrial tasks, an optimization problem for the joint allocation of computing and communication resources between the industrial base stations and the industrial terminals;

3) Convert the optimization problem into a Markov decision process problem;

4) Construct a two-layer dual deep neural network model based on deep reinforcement learning to solve the Markov decision process problem;

5) Train the two-layer dual deep neural network model offline to obtain the computing and communication resource allocation result;

6) Have the industrial base stations and industrial terminals in the industrial wireless network execute the computing and communication resource allocation online, performing wireless communication and task offloading so as to process heterogeneous industrial tasks cooperatively and minimize delay.

The industrial wireless network with edge computing capability comprises $N$ industrial base stations and $M$ industrial terminals.

Each industrial base station is equipped with an edge computing server, provides computing resources to multiple industrial terminals, and schedules the industrial terminals within its coverage.

Each industrial terminal computes heterogeneous industrial tasks locally and can offload them over a wireless channel to an industrial base station for edge computing.

The task of a single industrial terminal may be not offloaded, partially offloaded, or fully offloaded to one industrial base station. The computing decision is denoted $d_{m,n}$: $d_{m,n} = 1$ means that industrial terminal $m$ selects industrial base station $n$ for offloading; otherwise no offloading takes place.

The transmission rate at which an industrial terminal offloads a task over the wireless channel is

[Equation image BDA0003824000120000021]

where $B_{m,n}$ denotes the bandwidth, [symbol image BDA0003824000120000022] denotes the noise at industrial base station $n$, $g_{m,n}$ and $g_{m',n}$ denote the channel power gains from industrial terminal $m$ and industrial terminal $m'$ to industrial base station $n$, respectively, and $p_m$ and $p_{m'}$ denote the transmit powers of industrial terminals $m$ and $m'$, respectively.
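The rate expression itself survives only as an image reference in this text. A plausible reconstruction from the symbols defined above, assuming the standard Shannon-capacity form with co-channel interference from the other terminals and writing the noise power as $\sigma_n^2$ (an assumed symbol, since the original one is not recoverable), would be:

```latex
r_{m,n} = B_{m,n}\,\log_2\!\left(1 + \frac{p_m\, g_{m,n}}{\sigma_n^2 + \sum_{m' \neq m} p_{m'}\, g_{m',n}}\right)
```

Here $r_{m,n}$ is likewise an assumed name for the rate. The interference sum runs over every other terminal $m'$ transmitting to base station $n$, which is why both $g_{m',n}$ and $p_{m'}$ appear in the definitions.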

The optimization problem for the joint allocation of computing and communication resources between the industrial base stations and the industrial terminals is

[Equation image BDA0003824000120000023]

s.t. C1: $0 \le p_m \le P_{\max},\ m = 1, \ldots, M$,

C2: [Equation image BDA0003824000120000024]

C3: $0 \le f_{m,n} \le F_n$,

C4: [Equation image BDA0003824000120000025]

C5: $0 \le u_{m,n} \le 1$,

C6: $d_{m,n} \in \{0, 1\}$,

C7: [Equation image BDA0003824000120000026]

C8: $0 \le \tau_{m,n} \le w_m$

where [Equation image BDA0003824000120000031] is the objective, namely minimizing the total delay; $\tau_{m,n}$ denotes the delay of processing the task of industrial terminal $m$; and [symbol image BDA0003824000120000032] denote, respectively, the computing decisions, offloading ratios, and transmit powers of all industrial base stations and industrial terminals.

C1 and C2 are the transmit power constraints, where $P_{\max}$ denotes the maximum transmit power of an industrial terminal, $I_p$ denotes the peak interference power that an industrial terminal can tolerate, and $g_{m,m'}$ denotes the channel power gain between industrial terminal $m$ and industrial terminal $m'$.

C3 and C4 are the computing resource constraints, where $f_{m,n}$ denotes the computing resources that industrial base station $n$ allocates to industrial terminal $m$ and $F_n$ denotes the total computing resources of industrial base station $n$, both measured in CPU cycles per unit time.

C5 is the offloading ratio constraint, where $u_{m,n}$, a value between 0 and 1, denotes the proportion of its task that industrial terminal $m$ offloads to industrial base station $n$.

C6 and C7 are the computing decision constraints, where $d_{m,n}$ denotes the computing decision: $d_{m,n} = 1$ means that industrial terminal $m$ selects industrial base station $n$ for task offloading, and $d_{m,n} = 0$ means that it does not. Each industrial terminal can select only one industrial base station for task offloading and cannot offload to several.

C8 is the task deadline constraint, where $w_m$ denotes the deadline of the task executed by industrial terminal $m$, that is, the longest task processing time that industrial terminal $m$ can accept.
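For orientation, the full problem can be written out as one block. C1, C3, C5, C6, and C8 are taken directly from the text; the objective and the constraints C2, C4, and C7 exist only as images, so the forms below are assumptions inferred from the verbal descriptions (per-pair peak interference, per-station capacity, at most one selected station) and may differ from the original:

```latex
\begin{aligned}
\min_{\mathbf{d},\,\mathbf{u},\,\mathbf{p}} \quad & \sum_{m=1}^{M} \sum_{n=1}^{N} \tau_{m,n} \\
\text{s.t.}\quad
& \text{C1: } 0 \le p_m \le P_{\max}, \quad m = 1,\dots,M, \\
& \text{C2: } p_m\, g_{m,m'} \le I_p, \quad \forall\, m' \neq m \quad \text{(assumed form)}, \\
& \text{C3: } 0 \le f_{m,n} \le F_n, \\
& \text{C4: } \sum_{m=1}^{M} d_{m,n}\, f_{m,n} \le F_n \quad \text{(assumed form)}, \\
& \text{C5: } 0 \le u_{m,n} \le 1, \\
& \text{C6: } d_{m,n} \in \{0,1\}, \\
& \text{C7: } \sum_{n=1}^{N} d_{m,n} \le 1 \quad \text{(assumed form)}, \\
& \text{C8: } 0 \le \tau_{m,n} \le w_m .
\end{aligned}
```

The mix of the binary variables $d_{m,n}$ with the continuous variables $u_{m,n}$ and $p_m$ is what makes the problem hard to solve directly and motivates the learning-based approach.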

The task processing delay of an industrial terminal is determined by the edge computing delay and the local computing delay, and is calculated as

[Equation image BDA0003824000120000033]

The edge computing delay [symbol image BDA0003824000120000034] is determined by the communication delay and the computing delay, and is calculated as

[Equation image BDA0003824000120000035]

where $C_m$ denotes the number of CPU cycles that industrial terminal $m$ needs to process one unit of task data and $T_m$ denotes the amount of task data that industrial terminal $m$ has to process.

The local computing delay [symbol image BDA0003824000120000036] is calculated as

[Equation image BDA0003824000120000037]
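The three delay expressions are image references only. One reconstruction consistent with the surrounding definitions is sketched below; it assumes that the offloaded share $u_{m,n} T_m$ is first transmitted at rate $r_{m,n}$ and then computed with the allocated edge resources $f_{m,n}$, that the remaining share is computed locally in parallel, and that the terminal's own CPU frequency is written $f_m$. The symbols $\tau^{e}_{m,n}$, $\tau^{l}_{m}$, $r_{m,n}$, and $f_m$, and the use of a maximum, are assumptions rather than the patent's confirmed notation:

```latex
\begin{aligned}
\tau^{e}_{m,n} &= \frac{u_{m,n}\, T_m}{r_{m,n}} + \frac{u_{m,n}\, T_m\, C_m}{f_{m,n}}, \\
\tau^{l}_{m}   &= \frac{(1 - u_{m,n})\, T_m\, C_m}{f_m}, \\
\tau_{m,n}     &= \max\left\{ \tau^{e}_{m,n},\ \tau^{l}_{m} \right\}.
\end{aligned}
```

If local and edge processing ran one after the other instead, the last line would be a sum rather than a maximum; the text does not say which applies.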

The problem is transformed on the basis of a Markov decision process as follows:

a) Establish the Markov decision model, comprising a state vector, an action vector, a reward vector, and a state transition function.

The state vector is the state of the industrial terminals at time $t$, denoted $s(t) = \{T(t), C(t), w(t), v(t), g(t)\}$,

where $T(t) = \{T_m(t)\}_M$ is the set of data sizes of the tasks executed by the $M$ industrial terminals, $C(t) = \{C_m(t)\}_M$ is the set of CPU cycles that the $M$ industrial terminals need to process one unit of task data, $v(t) = \{v_m(t)\}_M$ is the set of task priorities of the $M$ industrial terminals, $w(t) = \{w_m(t)\}_M$ is the set of deadlines of the tasks executed by the $M$ industrial terminals, and $g(t) = \{g_{m,n}(t)\}_{M \times N}$ is the set of channel power gains between the $M$ industrial terminals and the $N$ industrial base stations.

The action vector is the action of the industrial terminals at time $t$, denoted $a(t) = \{d(t), u(t), p(t)\}$,

where $d(t) = \{d_{m,n}(t)\}_{M \times N}$ is the set of computing decisions for offloading tasks from the $M$ industrial terminals to the $N$ industrial base stations, $u(t) = \{u_{m,n}(t)\}_{M \times N}$ is the set of proportions of the tasks that the $M$ industrial terminals offload to the $N$ industrial base stations, and $p(t) = \{p_{m,n}(t)\}_{M \times N}$ is the set of transmit powers of the $M$ industrial terminals.

The reward vector is the reward obtained by the industrial terminals at time $t$, denoted $r(t) = \{r_{m,n}(t)\}_{M \times N}$,

where the reward obtained by industrial terminal $m$ is $r_{m,n}(t) = \tau_{m,n}(t) + \rho\,(\tau_{m,n}(t) - w_m(t))$ and $\rho$ is the reward-penalty coefficient of the task, determined by the task priority.

The state transition function is the probability of moving from state $s(t)$ to state $s(t+1)$ after action $a(t)$ is executed at time $t$, written $f(s(t+1) \mid s(t), a(t))$.

b) Determine the long-term cumulative reward function. The long-term cumulative reward is

[Equation image BDA0003824000120000041]

where $t_0$ denotes the previous time instant and $\gamma \in [0, 1]$ denotes the discount factor, which indicates how strongly past rewards influence the current reward.

c) Transform the problem. The transformed problem is

$\max R_p(t)$

s.t. C1, C2, C3, C4, C5, C6, C7, C8

That is, the long-term cumulative reward is maximized subject to constraints C1 to C8 in order to obtain the optimal state transition probability and, from it, an effective delay minimization policy.
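The reward and the cumulative reward can be illustrated with a short Python sketch. `task_reward` evaluates the per-task expression exactly as it is written in the text. `discounted_return` assumes the usual geometric form, the sum of $\gamma^k r_k$ over a finite sequence, because the cumulative-reward formula itself is only an image; both function names are hypothetical. Note that the expression as written grows with the delay while the transformed problem maximizes the return, so an implementation would presumably fix the sign in some way; the text does not say how, and the sketch leaves the expression unchanged.

```python
from typing import Sequence


def task_reward(delay: float, deadline: float, rho: float) -> float:
    """Per-task reward as written in the text: tau + rho * (tau - w)."""
    return delay + rho * (delay - deadline)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**k * r_k over a finite reward sequence (assumed form)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total


# A task that finishes 1 s before a 3 s deadline, with rho = 0.5.
early = task_reward(delay=2.0, deadline=3.0, rho=0.5)

# Three equal rewards discounted with gamma = 0.5.
ret = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

With $\rho$ tied to the task priority, a high-priority task that misses its deadline shifts the value more strongly than a low-priority one.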

The two-layer dual deep neural network model built on deep reinforcement learning comprises two deep neural networks, called the estimated Q network and the target Q network, which together form the agent. The two networks have the same structure but use different hyperparameters $\theta$ to generate Q values.

The offline training of the neural network model until the reward converges comprises the following steps:

a) Draw experience data $E(t)$ from the experience pool as training data;

b) Input $s(t)$ into the estimated Q network to generate the action $a(t)$ and the Q value $Q(s(t), a(t) \mid \theta)$;

c) Execute $a(t)$, which moves the state from $s(t)$ to $s(t+1)$ and yields the reward $r(t)$;

d) Train the estimated Q network and update the hyperparameters $\theta$ of the target Q network in real time;

e) Store the current state, action, and reward together with the next state in the experience pool as the experience $E(t) = \{s(t), a(t), r(t), s(t+1)\}$;

f) Input $s(t+1)$ into the target Q network to obtain $a(t+1)$ and the Q value $Q'(s(t), a(t) \mid \theta')$, and compute $Q'(s(t+1), a(t+1) \mid \theta')$;

g) Update $\theta$ by stochastic gradient descent according to

[Equation image BDA0003824000120000042]

where [symbol image BDA0003824000120000043] is the loss function of the estimated Q network, namely the mean squared error between the target Q network and the estimated Q network;

h) Perform experience replay and repeat steps a) to g) until the reward converges to a stable value, which yields an effective delay minimization policy, that is, the computing and communication resource allocation result in terms of the computing decisions, offloading ratios, and transmit powers of all industrial base stations and industrial terminals.
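The interplay of the two Q networks and the experience pool in the steps above can be sketched in a few lines of Python. This is a minimal illustration under several assumptions that the text does not confirm: a linear function `LinearQ` stands in for each deep network, the action space is discrete, the target value is $r + \gamma \max_{a'} Q'(s', a')$, and the target network is refreshed by a plain parameter copy. All names are hypothetical.

```python
import random
from collections import deque

import numpy as np


class LinearQ:
    """Tiny stand-in for a Q network: Q(s, .) = W @ s + b."""

    def __init__(self, state_dim, n_actions, rng):
        self.W = rng.normal(scale=0.1, size=(n_actions, state_dim))
        self.b = np.zeros(n_actions)

    def q_values(self, s):
        return self.W @ s + self.b

    def copy_from(self, other):
        # Hard copy of the parameters (assumed target-update rule).
        self.W = other.W.copy()
        self.b = other.b.copy()


def sample_batch(pool, batch_size):
    """Step a): draw experiences E(t) uniformly from the experience pool."""
    return random.sample(list(pool), min(batch_size, len(pool)))


def train_step(est, tgt, batch, gamma=0.9, lr=0.01):
    """Steps f) and g): targets from the target network, SGD on the squared
    error of the estimated network. Returns the mean squared error."""
    loss = 0.0
    for s, a, r, s_next in batch:
        y = r + gamma * np.max(tgt.q_values(s_next))  # target Q value
        td = y - est.q_values(s)[a]                   # temporal-difference error
        est.W[a] += lr * td * s                       # gradient step on 0.5 * td**2
        est.b[a] += lr * td
        loss += td ** 2
    return loss / len(batch)


# Toy usage with one hand-written experience (s, a, r, s_next).
rng = np.random.default_rng(0)
est, tgt = LinearQ(3, 4, rng), LinearQ(3, 4, rng)
tgt.copy_from(est)
pool = deque(maxlen=1000)                             # experience pool
pool.append((np.array([1.0, 0.5, -0.5]), 2, 1.0, np.array([0.2, 0.1, 0.0])))
losses = [train_step(est, tgt, sample_batch(pool, 32)) for _ in range(200)]
```

Keeping the target network fixed while the estimated network is updated gives the regression a stationary target, which is the usual reason for maintaining two structurally identical networks.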

The online execution of the computing and communication resource allocation to process heterogeneous industrial tasks cooperatively comprises the following steps:

a) Feed the state vector $s(t)$ of all industrial terminals at the current time $t$ into the agent obtained from offline training to obtain the output action vector $a(t)$;

b) According to the computing decisions, offloading ratios, and transmit powers in the output action vector $a(t)$, all industrial terminals allocate computing and communication resources and process the industrial tasks.
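How an output of the trained agent might be turned into a concrete allocation can be sketched as follows. The text does not describe how the action vector is encoded, so this example assumes, purely for illustration, that each terminal has a flat discrete action index covering the selected base station, a quantized offloading ratio, and a quantized transmit power; `greedy_action` and `decode_action` are hypothetical helpers.

```python
import numpy as np


def greedy_action(q_values: np.ndarray) -> int:
    """Pick the index of the largest Q value (online, no exploration)."""
    return int(np.argmax(q_values))


def decode_action(index, n_stations, u_levels, p_levels):
    """Map a flat action index to (station, offloading ratio, transmit power)."""
    n_u, n_p = len(u_levels), len(p_levels)
    if not 0 <= index < n_stations * n_u * n_p:
        raise IndexError("action index out of range")
    station, rest = divmod(index, n_u * n_p)
    u_idx, p_idx = divmod(rest, n_p)
    return station, u_levels[u_idx], p_levels[p_idx]


# Two base stations, three offloading ratios, two power levels (in watts).
u_levels = [0.0, 0.5, 1.0]
p_levels = [0.1, 0.2]
q = np.zeros(2 * len(u_levels) * len(p_levels))
q[7] = 1.0                                   # pretend the agent prefers action 7
station, ratio, power = decode_action(greedy_action(q), 2, u_levels, p_levels)
```

In this encoding an offloading ratio of 0 represents purely local computing, whichever station index accompanies it.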

The present invention has the following beneficial effects and advantages:

1. To cope with the highly dynamic industrial network environment and with the modeling difficulty and state-space explosion caused by the complex coupling of multi-dimensional communication and computing resources, the invention uses deep reinforcement learning to provide a method for jointly allocating the computing and communication resources of an industrial wireless network. Resource allocation is trained offline and executed online, which ensures the real-time performance of the industrial wireless network.

2. The invention addresses the cooperative processing of heterogeneous, highly concurrent industrial tasks in industrial wireless networks. It fully considers the deadline requirements of heterogeneous tasks, the maximum transmit power of industrial terminals and the peak interference power they impose on other devices, and the computing capability of the device and edge sides of the network. The resulting deep-reinforcement-learning-based joint allocation of communication and computing resources satisfies the transmission quality requirements of heterogeneous industrial tasks in interference-limited industrial wireless networks and supports cooperative processing of heterogeneous tasks.

Brief Description of the Drawings

Fig. 1 is a flowchart of the method of the present invention;

Fig. 2 is a schematic diagram of the multi-user multi-server industrial wireless network scenario addressed by the present invention;

Fig. 3 is a structural diagram of the deep neural network used by the present invention;

Fig. 4 is a flowchart of the deep reinforcement learning training of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments.

The present invention addresses device-edge resource allocation in industrial wireless networks in the general multi-user multi-server scenario and proposes a deep-reinforcement-learning-based method for jointly allocating the communication and computing resources of an industrial wireless network. By jointly allocating the computing and communication resources of the industrial wireless network, the invention supports on-demand offloading of heterogeneous industrial tasks and achieves device-edge resource coordination. Subject to the deadline requirements of the heterogeneous industrial tasks and to the limits on the computing capability, maximum transmit power, and peak interference power of the device-edge equipment, it minimizes the total processing delay of heterogeneous industrial tasks and supports collaborative manufacturing.

The method for jointly allocating the communication and computing resources of an industrial wireless network proposed by the present invention comprises the following steps: 1) establishing an industrial wireless network with edge computing capability; 2) formulating the joint computing and communication resource allocation problem; 3) transforming the problem on the basis of a Markov decision process; 4) constructing a two-layer dual deep neural network model based on deep reinforcement learning; 5) training the neural network model offline until the reward converges; 6) executing the computing and communication resource allocation online to process heterogeneous industrial tasks cooperatively. The overall flow of the invention is shown in Fig. 1.

1) Establish an industrial wireless network with edge computing capability. As shown in Fig. 2, the industrial wireless network comprises $N$ industrial base stations and $M$ industrial terminals. Each industrial base station is equipped with an edge computing server and has strong real-time computing capability; it provides computing resources to multiple industrial terminals and schedules the industrial terminals within its coverage so as to meet their computing and communication needs. Each industrial terminal has computing and communication capabilities, can process heterogeneous industrial tasks in real time, and can offload heterogeneous industrial tasks over a wireless channel to an industrial base station for processing.

Depending on the computing decision, the task of a single industrial terminal may be not offloaded, partially offloaded, or fully offloaded to an industrial base station, but it can be offloaded to only one industrial base station. The computing decision is denoted $d_{m,n}$: $d_{m,n} = 1$ means that industrial terminal $m$ selects industrial base station $n$ for offloading; otherwise no offloading takes place.

The rate at which an industrial terminal offloads a task over the wireless channel is

[Equation image BDA0003824000120000061]

where $B_{m,n}$ denotes the bandwidth, [symbol image BDA0003824000120000062] denotes the noise at industrial base station $n$, $g_{m,n}$ and $g_{m',n}$ denote the channel power gains from industrial terminal $m$ and industrial terminal $m'$ to industrial base station $n$, respectively, and $p_m$ and $p_{m'}$ denote the transmit powers of industrial terminals $m$ and $m'$, respectively.
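As a numerical illustration of how these quantities combine, the following sketch computes an uplink rate in Python. It assumes the standard Shannon form with interference, that is, bandwidth times $\log_2(1 + \text{SINR})$ where every other terminal's signal at base station $n$ counts as interference; the patent's own expression is not recoverable from this text, and the function name and array layout are hypothetical.

```python
import numpy as np


def offload_rate(m, n, p, g, bandwidth, noise_power):
    """Uplink rate of terminal m to base station n, in bit/s (assumed form).

    p:           (M,)   transmit powers in watts
    g:           (M, N) channel power gains, terminal to base station
    bandwidth:   (M, N) bandwidths in Hz
    noise_power: (N,)   noise powers at the base stations in watts
    """
    signal = p[m] * g[m, n]
    # Power received at station n from all terminals, minus the wanted signal.
    interference = float(np.dot(p, g[:, n])) - signal
    sinr = signal / (noise_power[n] + interference)
    return float(bandwidth[m, n] * np.log2(1.0 + sinr))


# Two terminals, one base station, 1 MHz of bandwidth each.
p = np.array([1.0, 1.0])
bandwidth = np.array([[1e6], [1e6]])
noise_power = np.array([1.0])
g_isolated = np.array([[1.0], [0.0]])    # terminal 1 is not heard at the station
g_interfered = np.array([[1.0], [1.0]])  # terminal 1 interferes at equal strength
rate_isolated = offload_rate(0, 0, p, g_isolated, bandwidth, noise_power)
rate_interfered = offload_rate(0, 0, p, g_interfered, bandwidth, noise_power)
```

Raising one terminal's transmit power increases its own rate but lowers the rates of the others, which is why the transmit powers have to be allocated jointly.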

2) Formulate the joint computing and communication resource allocation problem.

The optimization problem for the joint device-edge allocation of computing and communication resources in the industrial wireless network is

[Equation image BDA0003824000120000063]

s.t. C1: $0 \le p_m \le P_{\max},\ m = 1, \ldots, M$,

C2: [Equation image BDA0003824000120000064]

C3: $0 \le f_{m,n} \le F_n$,

C4: [Equation image BDA0003824000120000065]

C5: $0 \le u_{m,n} \le 1$,

C6: $d_{m,n} \in \{0, 1\}$,

C7: [Equation image BDA0003824000120000066]

C8: $0 \le \tau_{m,n} \le w_m$

其中,

Figure BDA0003824000120000067
分别表示所有工业基站和工业终端的客观计算决策、卸载比例和发射功率。
Figure BDA0003824000120000068
为任务目标,即最小化总时延。τm,n表示处理工业终端m的任务时延,由边缘计算时延和本地计算时延决定,计算方法如下:in,
Figure BDA0003824000120000067
Represent the objective calculation decision, unloading ratio and transmit power of all industrial base stations and industrial terminals, respectively.
Figure BDA0003824000120000068
is the mission goal, that is, to minimize the total delay. τ m,n represents the task delay of processing industrial terminal m, which is determined by edge computing delay and local computing delay. The calculation method is as follows:

Figure BDA0003824000120000069
Figure BDA0003824000120000069

边缘计算时延

Figure BDA00038240001200000610
由通信时延和计算时延决定,计算为Edge Computing Latency
Figure BDA00038240001200000610
Determined by the communication delay and calculation delay, calculated as

Figure BDA0003824000120000071
Figure BDA0003824000120000071

其中,Cm表示工业终端m计算单位数据量任务所需的CPU周期数,Tm表示工业终端m要处理的任务数据量大小。Among them, C m represents the number of CPU cycles required by the industrial terminal m to calculate the unit data volume task, and T m represents the task data volume to be processed by the industrial terminal m.

本地计算时延

Figure BDA0003824000120000072
计算为Local Computing Latency
Figure BDA0003824000120000072
calculated as

Figure BDA0003824000120000073
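The delay expressions above survive only as figure placeholders, so the following Python sketch is an illustration under stated assumptions rather than the patent's exact formulas. It assumes that the offloaded share u·T_m is transmitted at a rate `rate` and processed at the allocated edge frequency, that the remaining share is processed at a hypothetical local CPU frequency `f_local` (a symbol not defined in the text), and that the two branches run in parallel so the task delay is the larger of the two.

```python
def edge_delay(u, T, C, rate, f_edge):
    """Assumed edge delay: transmission time plus edge processing time."""
    if u == 0:
        return 0.0  # nothing is offloaded, so no edge delay is incurred
    return u * T / rate + u * T * C / f_edge


def local_delay(u, T, C, f_local):
    """Assumed local delay for the share (1 - u) kept on the terminal."""
    return (1.0 - u) * T * C / f_local


def task_delay(u, T, C, rate, f_edge, f_local):
    """Assumed task delay: both branches run in parallel, so take the maximum."""
    return max(edge_delay(u, T, C, rate, f_edge),
               local_delay(u, T, C, f_local))
```

Under these assumptions, raising the offloading ratio shortens the local branch and lengthens the edge branch, which is the trade-off the allocation method has to balance.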

C1 and C2 are the transmit power constraints, where p_m denotes the transmit power of industrial terminal m, P_max denotes the maximum transmit power of an industrial terminal, I_p denotes the peak interference power that an industrial terminal can tolerate, and g_{m,m'} denotes the channel power gain between industrial terminal m and industrial terminal m'.

C3 and C4 are the computing resource constraints, where f_{m,n} denotes the computing resources that industrial base station n allocates to industrial terminal m, and F_n denotes the total computing resources of industrial base station n; both are measured in CPU cycles per unit time.

C5 is the offloading ratio constraint, where u_{m,n} denotes the proportion of its task that industrial terminal m offloads to industrial base station n, with a value between 0 and 1.

C6 and C7 are the computing decision constraints, where d_{m,n} denotes the computing decision: d_{m,n} = 1 means that industrial terminal m selects industrial base station n for task offloading, and d_{m,n} = 0 means that it does not.

C8 is the task deadline constraint, where w_m denotes the deadline of the task executed by industrial terminal m, that is, the processing of the task of industrial terminal m must be completed within this maximum time.
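As a rough illustration of how a candidate allocation could be screened against these constraints, the sketch below checks C1, C3, C5, C6 and C8 as they are written in the text. C7 is checked as "at most one base station per terminal", following the wording of claim 4; C2 and C4 are omitted because their expressions exist only as figures. The function name and the nested-list layout are hypothetical.

```python
def is_feasible(p, f, u, d, tau, w, P_max, F):
    """Check the constraints whose form is stated in the text.

    p[m]: transmit power, f[m][n]: allocated edge CPU, u[m][n]: offloading
    ratio, d[m][n]: computing decision, tau[m][n]: delay, w[m]: deadline,
    F[n]: total CPU of base station n. C2 and C4 are not checked here.
    """
    M, N = len(d), len(d[0])
    for m in range(M):
        if not 0 <= p[m] <= P_max:              # C1
            return False
        if sum(d[m]) > 1:                       # C7 (assumed: at most one base station)
            return False
        for n in range(N):
            if d[m][n] not in (0, 1):           # C6
                return False
            if not 0 <= f[m][n] <= F[n]:        # C3
                return False
            if not 0 <= u[m][n] <= 1:           # C5
                return False
            if not 0 <= tau[m][n] <= w[m]:      # C8
                return False
    return True
```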

3) The problem is converted on the basis of a Markov decision process as follows:

a) Establish a Markov decision model consisting of a state vector, an action vector, a reward vector, and a state transition function.

The state vector is the state of the industrial terminals at time t, expressed as s(t) = {T(t), C(t), w(t), v(t), g(t)}, where T(t) = {T_m(t)}_M is the set of data sizes of the tasks executed by the M industrial terminals, C(t) = {C_m(t)}_M is the set of CPU cycles that the M industrial terminals need per unit of task data, v(t) = {v_m(t)}_M is the set of task priorities of the M industrial terminals, w(t) = {w_m(t)}_M is the set of task deadlines of the M industrial terminals, and g(t) = {g_{m,n}(t)}_{M×N} is the set of channel power gains between the M industrial terminals and the N industrial base stations.
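To make the shape of s(t) concrete, the following sketch flattens the five components into a single list that could serve as a neural network input. The ordering of the components and the absence of any normalization are assumptions made for illustration; the text does not specify either.

```python
def flatten_state(T, C, w, v, g):
    """Concatenate the per-terminal vectors and the M x N gain matrix.

    T, C, w, v are length-M sequences; g is a list of M rows with N entries.
    The result has 4 * M + M * N entries.
    """
    state = list(T) + list(C) + list(w) + list(v)
    for row in g:  # append the channel gains terminal by terminal
        state.extend(row)
    return state
```

With M = 2 terminals and N = 3 base stations the flattened state has 4·2 + 2·3 = 14 entries.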

The action vector is the action of the industrial terminals at time t, expressed as a(t) = {d(t), u(t), p(t)}, where d(t) = {d_{m,n}(t)}_{M×N} is the set of computing decisions of the M industrial terminals for offloading tasks to the N industrial base stations, u(t) = {u_{m,n}(t)}_{M×N} is the set of ratios at which the M industrial terminals offload tasks to the N industrial base stations, and p(t) = {p_{m,n}(t)}_{M×N} is the set of transmit powers of the M industrial terminals.

The reward vector is the reward obtained by the industrial terminals at time t, expressed as r(t) = {r_{m,n}(t)}_{M×N}, where the reward obtained by industrial terminal m is

r_{m,n}(t) = τ_{m,n}(t) + ρ(τ_{m,n}(t) − w_m(t))

where ρ denotes the reward-and-penalty coefficient of the task, which is determined by the task priority.
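The reward expression can be evaluated directly, as in the sketch below. The function computes the formula exactly as written; note that the value grows with the delay, so an implementation that maximizes cumulative reward would presumably use its negative. That sign convention is an assumption here and is not stated in the text.

```python
def reward(tau, w, rho):
    """r = tau + rho * (tau - w), as written in the text.

    The second term is negative when the task finishes before its
    deadline w and positive when the deadline is missed.
    """
    return tau + rho * (tau - w)
```

For example, with w = 0.5 and ρ = 2, a delay of 0.2 gives a value of about −0.4, while a delay of 0.8 gives about 1.4.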

The state transition function is the probability of moving from state s(t) to state s(t+1) after action a(t) is executed at time t, expressed as f(s(t+1)|s(t), a(t)).

b) Determine the long-term cumulative reward as

Figure BDA0003824000120000081

where t_0 denotes the previous moment and γ ∈ [0,1] denotes the discount coefficient, which indicates how past rewards influence the current reward.
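Because the cumulative reward formula is available only as a figure, the sketch below assumes the standard discounted sum used in reinforcement learning, in which the reward k steps away is weighted by γ^k. The function name and the list-based interface are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], assuming the standard discounted form."""
    total = 0.0
    for r in reversed(rewards):  # accumulate backwards: total = r + gamma * total
        total = r + gamma * total
    return total
```

With γ = 0 only the first reward counts; with γ = 1 all rewards are weighted equally.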

c) Perform the problem conversion

Based on the long-term cumulative reward function, the optimization problem of computing and communication resource allocation is converted into

max R_p(t)

s.t. C1, C2, C3, C4, C5, C6, C7, C8

Maximizing the long-term cumulative reward while satisfying constraints C1-C8 yields the best state transition probability, and thus an effective computing and communication resource allocation result that minimizes the delay.

4) Construct a double dueling deep neural network model based on deep reinforcement learning. The model comprises two deep neural networks, called the estimated Q-network and the target Q-network. The two have the same structure but use different network parameters θ to generate Q values. Both the estimated Q-network and the target Q-network adopt a dueling architecture, as shown in Figure 3, with two branches V(s) and A(s, a), which give the state value of the current state and the advantage of each action, respectively.
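The text does not say how the two branches are merged into a Q value. A common choice in dueling networks, assumed here for illustration, is to add the state value to the advantages after subtracting their mean, which keeps V(s) and A(s, a) identifiable.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) as Q = V + (A - mean(A)).

    The mean-subtraction form is an assumption; the source only names
    the two branches.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]
```

The ranking of actions is determined by the advantage branch alone, while the value branch shifts all Q values for a state together.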

5) Train the neural network model offline until the reward converges, as shown in Figure 4, through the following steps:

a) Extract data from the experience pool as training data;

b) Input s(t) into the estimated Q-network to generate the action a(t) and the Q value Q(s(t), a(t)|θ);

c) Execute a(t), which moves the state from s(t) to s(t+1), and obtain the reward r(t);

d) Train the estimated Q-network and update the network parameters θ of the target Q-network in real time;

e) Store the current state, action, and reward together with the next state in the experience pool as the experience E(t) = {s(t), a(t), r(t), s(t+1)};

f) Input s(t+1) into the target Q-network to obtain a(t+1) and the Q value Q′(s(t), a(t)|θ′), and compute Q′(s(t+1), a(t+1)|θ′);

g) Update θ by stochastic gradient descent, according to the following formula:

Figure BDA0003824000120000082

where
Figure BDA0003824000120000083
is the loss function of the estimated Q-network, which represents the mean squared error between the target Q-network and the estimated Q-network;

h) Perform experience replay and repeat steps a)-g) until the reward converges to a stable value, which yields an effective computing and communication resource allocation result that minimizes the delay.
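The training steps above can be sketched with three small building blocks: an experience pool, a target value, and a mean squared error loss. The update formula itself is available only as a figure, so the target below assumes the standard double-DQN form, in which the estimated Q-network selects the next action and the target Q-network evaluates it. All class and function names are hypothetical, and the neural networks are replaced by plain lists of Q values.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size experience pool holding (s, a, r, s_next) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are dropped first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Draw a random mini-batch without replacement
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)


def double_dqn_target(r, q_est_next, q_tgt_next, gamma):
    """Assumed double-DQN target: estimated net picks, target net evaluates."""
    best = max(range(len(q_est_next)), key=lambda a: q_est_next[a])
    return r + gamma * q_tgt_next[best]


def mse_loss(targets, predictions):
    """Mean squared error between target values and estimated Q values."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)
```

In a full implementation the loss would be minimized with respect to θ by a stochastic gradient step, and the target network parameters θ′ would be refreshed from θ according to the chosen update schedule.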

6) Executing the computing and communication resource allocation online and processing heterogeneous industrial tasks collaboratively comprises the following steps:

a) The state vector s(t) of all industrial terminals at the current time t is input to the agent that has completed offline training, which outputs the action vector a(t);

b) Based on the computing decisions, offloading ratios, and transmit powers in the output action vector a(t), all industrial terminals allocate computing and communication resources and process the industrial tasks.
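The online step can be illustrated for a single terminal as follows. The sketch assumes a discretized action space in which each flat action index encodes a base station, an offloading ratio level, and a power level; the text does not describe how actions are encoded, so this layout and both function names are hypothetical.

```python
def greedy_action(q_values):
    """Return the index of the largest Q value (no exploration online)."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def decode_action(index, num_bs, ratio_levels, power_levels):
    """Map a flat action index to (base station, offloading ratio, power).

    Assumed layout: index = (bs * len(ratio_levels) + ratio_idx)
                            * len(power_levels) + power_idx
    """
    n_ratio, n_power = len(ratio_levels), len(power_levels)
    bs, rest = divmod(index, n_ratio * n_power)
    if not 0 <= bs < num_bs:
        raise ValueError("action index out of range")
    ratio_idx, power_idx = divmod(rest, n_power)
    return bs, ratio_levels[ratio_idx], power_levels[power_idx]
```

A terminal would call `greedy_action` on the Q values produced by the trained agent and then `decode_action` to recover the base station to use, the share of the task to offload, and the transmit power.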

Claims (9)

1. A method for joint allocation of computing and communication resources in an industrial wireless network, characterized in that collaborative allocation of the computing and communication resources of industrial base stations and industrial terminals in the industrial wireless network is achieved on the basis of deep reinforcement learning, the method comprising the following steps:
1) establishing an industrial wireless network with edge computing capability;
2) constructing, according to the deadline requirements of heterogeneous industrial tasks, an optimization problem for the joint allocation of the computing and communication resources of the industrial base stations and industrial terminals;
3) converting the optimization problem into a Markov decision process problem;
4) constructing a double dueling deep neural network model based on deep reinforcement learning to solve the Markov decision process problem;
5) training the double dueling deep neural network model offline to obtain the computing and communication resource allocation result;
6) the industrial base stations and industrial terminals in the industrial wireless network executing the computing and communication resource allocation online and performing wireless communication and task offloading, so as to process heterogeneous industrial tasks collaboratively and minimize delay.

2. The method for joint allocation of computing and communication resources in an industrial wireless network according to claim 1, characterized in that the industrial wireless network with edge computing capability comprises N industrial base stations and M industrial terminals;
the industrial base station is equipped with an edge computing server, provides computing resources for a plurality of industrial terminals, and supports scheduling of the industrial terminals within its coverage;
the industrial terminal performs local computing of heterogeneous industrial tasks and supports offloading heterogeneous industrial tasks to an industrial base station over a wireless channel for edge computing.

3. The method for joint allocation of computing and communication resources in an industrial wireless network according to claim 2, characterized in that the task of a single industrial terminal is not offloaded, partially offloaded, or fully offloaded to one industrial base station; d_{m,n} denotes the computing decision, where d_{m,n} = 1 means that industrial terminal m selects industrial base station n for offloading, and otherwise the task is not offloaded;
the transmission rate at which the industrial terminal offloads a task over the wireless channel is
Figure FDA0003824000110000011
where B_{m,n} denotes the bandwidth,
Figure FDA0003824000110000012
denotes the noise at industrial base station n, g_{m,n} and g_{m',n} denote the channel power gains from industrial terminal m and industrial terminal m' to industrial base station n, respectively, and p_m and p_{m'} denote the transmit powers of industrial terminal m and industrial terminal m', respectively.

4. The method for joint allocation of computing and communication resources in an industrial wireless network according to claim 1, characterized in that the optimization problem for the joint allocation of the computing and communication resources of the industrial base stations and industrial terminals is
Figure FDA0003824000110000013
Figure FDA0003824000110000021
C3: 0 ≤ f_{m,n} ≤ F_n,
Figure FDA0003824000110000022
C5: 0 ≤ u_{m,n} ≤ 1,
C6: d_{m,n} ∈ {0, 1},
Figure FDA0003824000110000023
C8: 0 ≤ τ_{m,n} ≤ w_m
where
Figure FDA0003824000110000024
is the objective, namely minimizing the total delay, τ_{m,n} denotes the delay of processing the task of industrial terminal m, and
Figure FDA0003824000110000025
denote the computing decisions, offloading ratios, and transmit powers of all industrial base stations and industrial terminals, respectively;
C1 and C2 are the transmit power constraints, where P_max denotes the maximum transmit power of an industrial terminal, I_p denotes the peak interference power that an industrial terminal can tolerate, and g_{m,m'} denotes the channel power gain between industrial terminal m and industrial terminal m';
C3 and C4 are the computing resource constraints, where f_{m,n} denotes the computing resources that industrial base station n allocates to industrial terminal m, and F_n denotes the total computing resources of industrial base station n, both characterized by the number of CPU cycles per unit time;
C5 is the offloading ratio constraint, where u_{m,n} denotes the proportion of its task that industrial terminal m offloads to industrial base station n, with a value between 0 and 1;
C6 and C7 are the computing decision constraints, where d_{m,n} denotes the computing decision, d_{m,n} = 1 means that industrial terminal m selects industrial base station n for task offloading, and d_{m,n} = 0 means that industrial terminal m does not select industrial base station n for task offloading; each industrial terminal can select only one industrial base station for task offloading and cannot offload to multiple industrial base stations;
C8 is the task deadline constraint, where w_m denotes the deadline of the task executed by industrial terminal m, that is, the longest task processing time that industrial terminal m can accept.

5. The method for joint allocation of computing and communication resources in an industrial wireless network according to claim 4, characterized in that the task processing delay of the industrial terminal is determined by the edge computing delay and the local computing delay and is calculated as follows:
Figure FDA0003824000110000026
the edge computing delay
Figure FDA0003824000110000027
is determined by the communication delay and the computing delay, and is calculated as
Figure FDA0003824000110000028
where C_m denotes the number of CPU cycles that industrial terminal m needs to process one unit of task data, and T_m denotes the data size of the task to be processed by industrial terminal m;
the local computing delay
Figure FDA0003824000110000029
is calculated as
Figure FDA0003824000110000031

6. The method for joint allocation of computing and communication resources in an industrial wireless network according to claim 1, characterized in that the problem is converted on the basis of a Markov decision process as follows:
a) establishing a Markov decision model consisting of a state vector, an action vector, a reward vector, and a state transition function;
the state vector is the state of the industrial terminals at time t, denoted s(t) = {T(t), C(t), w(t), v(t), g(t)};
where T(t) = {T_m(t)}_M is the set of data sizes of the tasks executed by the M industrial terminals, C(t) = {C_m(t)}_M is the set of CPU cycles that the M industrial terminals need per unit of task data, v(t) = {v_m(t)}_M is the set of task priorities of the M industrial terminals, w(t) = {w_m(t)}_M is the set of task deadlines of the M industrial terminals, and g(t) = {g_{m,n}(t)}_{M×N} is the set of channel power gains between the M industrial terminals and the N industrial base stations;
the action vector is the action of the industrial terminals at time t, denoted a(t) = {d(t), u(t), p(t)};
where d(t) = {d_{m,n}(t)}_{M×N} is the set of computing decisions of the M industrial terminals for offloading tasks to the N industrial base stations, u(t) = {u_{m,n}(t)}_{M×N} is the set of ratios at which the M industrial terminals offload tasks to the N industrial base stations, and p(t) = {p_{m,n}(t)}_{M×N} is the set of transmit powers of the M industrial terminals;
the reward vector is the reward obtained by the industrial terminals at time t, denoted r(t) = {r_{m,n}(t)}_{M×N};
where the reward obtained by industrial terminal m is r_{m,n}(t) = τ_{m,n}(t) + ρ(τ_{m,n}(t) − w_m(t)), and ρ denotes the reward-and-penalty coefficient of the task, which is determined by the task priority;
the state transition function is the probability of moving from state s(t) to state s(t+1) after action a(t) is executed at time t, expressed as f(s(t+1)|s(t), a(t));
b) determining the long-term cumulative reward function, the long-term cumulative reward being
Figure FDA0003824000110000032
where t_0 denotes the previous moment and γ ∈ [0,1] denotes the discount coefficient, which indicates how past rewards influence the current reward;
c) performing the problem conversion, the converted problem being
max R_p(t)
s.t. C1, C2, C3, C4, C5, C6, C7, C8
where the long-term cumulative reward is maximized while constraints C1-C8 are satisfied, so as to obtain the best state transition probability and thus an effective delay minimization strategy.

7. The method for joint allocation of computing and communication resources in an industrial wireless network according to claim 1, characterized in that the double dueling deep neural network model constructed on the basis of deep reinforcement learning comprises two deep neural networks, called the estimated Q-network and the target Q-network, which together constitute the agent; the two have the same structure but use different network parameters θ to generate Q values.

8. The method for joint allocation of computing and communication resources in an industrial wireless network according to claim 1, characterized in that training the neural network model offline until the reward converges comprises the following steps:
a) extracting experience data E(t) from the experience pool as training data;
b) inputting s(t) into the estimated Q-network to generate the action a(t) and the Q value Q(s(t), a(t)|θ);
c) executing a(t), which moves the state from s(t) to s(t+1), and obtaining the reward r(t);
d) training the estimated Q-network and updating the network parameters θ of the target Q-network in real time;
e) storing the current state, action, and reward together with the next state in the experience pool as the experience E(t) = {s(t), a(t), r(t), s(t+1)};
f) inputting s(t+1) into the target Q-network to obtain a(t+1) and the Q value Q′(s(t), a(t)|θ′), and computing Q′(s(t+1), a(t+1)|θ′);
g) updating θ by stochastic gradient descent, according to the following formula:
Figure FDA0003824000110000041
where
Figure FDA0003824000110000042
is the loss function of the estimated Q-network, which represents the mean squared error between the target Q-network and the estimated Q-network;
h) performing experience replay and repeating steps a)-g) until the reward converges to a stable value, which yields an effective delay minimization strategy, namely the computing and communication resource allocation result in terms of the computing decisions, offloading ratios, and transmit powers of all industrial base stations and industrial terminals.

9. The method for joint allocation of computing and communication resources in an industrial wireless network according to claim 1, characterized in that executing the computing and communication resource allocation online and processing heterogeneous industrial tasks collaboratively comprises the following steps:
a) inputting the state vector s(t) of all industrial terminals at the current time t to the agent that has completed offline training, and obtaining the output action vector a(t);
b) all industrial terminals allocating computing and communication resources and processing the industrial tasks according to the computing decisions, offloading ratios, and transmit powers in the output action vector a(t).

CN202211052799.XA 2022-08-31 2022-08-31 Computing and communication resource joint allocation method for industrial wireless network Active CN115413044B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211052799.XA CN115413044B (en) 2022-08-31 2022-08-31 Computing and communication resource joint allocation method for industrial wireless network

Publications (2)

Publication Number Publication Date
CN115413044A true CN115413044A (en) 2022-11-29
CN115413044B CN115413044B (en) 2024-08-06

Family

ID=84164572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211052799.XA Active CN115413044B (en) 2022-08-31 2022-08-31 Computing and communication resource joint allocation method for industrial wireless network

Country Status (1)

Country Link
CN (1) CN115413044B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108848561A (en) * 2018-04-11 2018-11-20 湖北工业大学 A kind of isomery cellular network combined optimization method based on deeply study
CN112069903A (en) * 2020-08-07 2020-12-11 之江实验室 Method and device for achieving face recognition end side unloading calculation based on deep reinforcement learning
CN113543156A (en) * 2021-06-24 2021-10-22 中国科学院沈阳自动化研究所 Industrial wireless network resource allocation method based on multi-agent deep reinforcement learning
WO2022027776A1 (en) * 2020-08-03 2022-02-10 威胜信息技术股份有限公司 Edge computing network task scheduling and resource allocation method and edge computing system
CN114143891A (en) * 2021-11-30 2022-03-04 南京工业大学 Multidimensional resource collaborative optimization method based on FDQL in mobile edge network

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12191503B2 (en) 2019-12-10 2025-01-07 Dalian Institute Of Chemical Physics, Chinese Academy Of Sciences Fibrous electrode material, preparation and application thereof
CN116074211A (en) * 2023-01-05 2023-05-05 国家电网有限公司 Heterogeneous network resource optimization method and system taking service quality as center
WO2024159708A1 (en) * 2023-01-31 2024-08-08 中国科学院沈阳自动化研究所 Digital twinning-based end-edge collaborative scheduling method for heterogeneous task and resource
CN117575113A (en) * 2024-01-17 2024-02-20 南方电网数字电网研究院股份有限公司 Edge collaborative task processing method, device and equipment based on Markov chain
CN117575113B (en) * 2024-01-17 2024-05-03 南方电网数字电网研究院股份有限公司 Edge collaborative task processing method, device and equipment based on Markov chain

Also Published As

Publication number Publication date
CN115413044B (en) 2024-08-06

Similar Documents

Publication Publication Date Title
CN113950066B (en) Method, system, and device for offloading part of computing from single server in mobile edge environment
CN113778648B (en) Task Scheduling Method Based on Deep Reinforcement Learning in Hierarchical Edge Computing Environment
CN113543156B (en) Resource allocation method for industrial wireless network based on multi-agent deep reinforcement learning
CN115413044A (en) A Joint Allocation Method of Computing and Communication Resources in Industrial Wireless Network
WO2024159708A1 (en) Digital twinning-based end-edge collaborative scheduling method for heterogeneous task and resource
CN113626104B (en) Multi-objective optimization offloading strategy based on deep reinforcement learning under edge cloud architecture
CN111800828A (en) A mobile edge computing resource allocation method for ultra-dense networks
CN114205353B (en) A Computational Offloading Method Based on Hybrid Action Space Reinforcement Learning Algorithm
CN113364630A (en) Quality of service (QoS) differentiation optimization method and device
CN116489708A (en) Mobile edge computing task offloading method for metaverse-oriented cloud-edge-device collaboration
CN116233926A (en) Task unloading and service cache joint optimization method based on mobile edge calculation
CN116887355A (en) Multi-unmanned aerial vehicle fair collaboration and task unloading optimization method and system
CN116112488A (en) Fine-grained task unloading and resource allocation method for MEC network
CN116431326A (en) Multi-user dependency task unloading method based on edge calculation and deep reinforcement learning
CN118540743A (en) Method for unloading and caching service of partitionable tasks in cloud-edge cooperative internet of vehicles scene
CN116204319A (en) Cloud-edge-device collaborative offloading method and system based on SAC algorithm and task dependencies
Zhang et al. Federated deep reinforcement learning for multimedia task offloading and resource allocation in MEC networks
Xie et al. Computation offloading and resource allocation in leo satellite-terrestrial integrated networks with system state delay
Hongchang et al. Deep reinforcement learning-based task offloading and service migrating policies in service caching-assisted mobile edge computing
CN118467127A (en) Mobile edge computing task scheduling and offloading method based on multi-agent collaboration
CN118612855A (en) A method for task offloading and resource optimization of heterogeneous UAVs
CN116866353B (en) Distributed resource collaborative scheduling method, device, equipment and medium for integrated computing
CN115934192B (en) B5G/6G network-oriented internet of vehicles multi-type task cooperation unloading method
CN117377084A (en) Mobile edge computing task scheduling method based on deep reinforcement learning
CN117479236A (en) Multilateral server collaborative computing unloading system based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant