CN115413044A - A Joint Allocation Method of Computing and Communication Resources in Industrial Wireless Network

Info

Publication number: CN115413044A
Application number: CN202211052799.XA
Authority: CN
Inventors: 许驰; 唐紫萱; 金曦; 夏长清; 李栋; 曾鹏; 于海斌
Original assignee: Shenyang Institute of Automation of CAS
Current assignee: Shenyang Institute of Automation of CAS
Priority date: 2022-08-31
Filing date: 2022-08-31
Publication date: 2022-11-29
Anticipated expiration: 2042-08-31
Also published as: CN115413044B

Abstract

The present invention relates to industrial wireless networks, and in particular to a method for jointly allocating computing and communication resources in an industrial wireless network, comprising the following steps: establishing an industrial wireless network with edge computing capability; formulating the problem of jointly allocating computing and communication resources; converting the problem on the basis of a Markov decision process; constructing a double dueling deep neural network model based on deep reinforcement learning; training the neural network model offline until the reward converges; and executing the computing and communication resource allocation online to process heterogeneous industrial tasks collaboratively. By jointly allocating the computing and communication resources of the industrial wireless network, the invention supports on-demand offloading of heterogeneous industrial tasks and achieves "end-edge" resource collaboration. Subject to the deadline requirements of the heterogeneous industrial tasks and to the limits on the computing capability, maximum transmit power and peak interference power of the "end-edge" devices, it minimizes the total delay of processing the heterogeneous industrial tasks and supports collaborative manufacturing.

Description

A Joint Allocation Method of Computing and Communication Resources in Industrial Wireless Network

Technical field

The present invention relates to the field of industrial wireless networks, and specifically to a method for jointly allocating the computing and communication resources of an industrial wireless network.

Background

As 5G ultra-reliable low-latency communication (URLLC) technology continues to improve, 5G-based industrial wireless networks are becoming increasingly capable and can support critical industrial control tasks and complex production. However, although URLLC can, to a certain extent, guarantee that control commands arrive deterministically before their deadlines, complex industrial tasks usually require real-time computation to make decisions and generate control commands, which poses a major challenge for resource-constrained industrial field devices. The rapid development of multi-access edge computing (MEC) and its combination with 5G URLLC offer an effective way to compute complex production tasks in real time and reduce delay. Deploying MEC servers jointly with industrial base stations supplements and enhances the computing capability of industrial field devices and improves the real-time performance of complex task processing.

However, the uneven distribution of end-edge resources in industrial wireless networks, the competition for resources caused by massive device access, and the differing QoS requirements of heterogeneous tasks further complicate the on-demand allocation of computing and communication resources, and have become a bottleneck restricting the efficient application of MEC. To achieve end-edge resource collaboration in industrial wireless networks, academia has proposed various resource allocation and computation offloading algorithms that balance the allocation of computing and communication resources in different scenarios, including single-user multi-server, multi-user single-server and multi-user multi-server scenarios. These methods aim to minimize delay, minimize energy consumption or maximize throughput, and mainly use deep reinforcement learning (DRL) to decide and allocate tightly coupled computing and communication resources such as computing decisions, offloading ratios, transmit power, communication bandwidth and CPU resources, in order to cope with highly dynamic wireless network environments.

However, existing methods pay little attention to the highly concurrent access of heterogeneous industrial tasks with different QoS requirements, such as computation-intensive machine vision tasks and delay-sensitive motion control tasks. In industrial scenarios in particular, heterogeneous industrial tasks have different deadline requirements and interference limits, which makes allocating the computing and communication resources of industrial wireless networks very difficult.

Summary of the invention

The present invention addresses the collaborative processing of heterogeneous, highly concurrent tasks in industrial wireless networks in the general multi-user multi-server scenario. Taking into account the deadline requirements of heterogeneous tasks, the maximum transmit power of industrial terminals, the peak interference power they impose on other devices, and the computing capability of the end and edge devices of the network, it proposes a method based on deep reinforcement learning for jointly allocating the communication and computing resources of an industrial wireless network. The method overcomes the state-space explosion that traditional resource allocation methods struggle with in dynamic network environments, minimizes the total processing delay of heterogeneous industrial tasks, and supports real-time collaborative processing of heterogeneous, highly concurrent industrial tasks, including computation-intensive and delay-sensitive ones.

The technical solution adopted by the present invention to achieve the above purpose is a method for jointly allocating the computing and communication resources of an industrial wireless network, which uses deep reinforcement learning to allocate the computing and communication resources of the industrial base stations and industrial terminals in the network collaboratively, and comprises the following steps:

1) establishing an industrial wireless network with edge computing capability;

2) constructing, according to the deadline requirements of heterogeneous industrial tasks, an optimization problem for the joint allocation of the computing and communication resources of the industrial base stations and industrial terminals;

3) converting the optimization problem into a Markov decision process problem;

4) constructing a double dueling deep neural network model based on deep reinforcement learning to solve the Markov decision process problem;

5) training the double dueling deep neural network model offline to obtain the computing and communication resource allocation result;

6) having the industrial base stations and industrial terminals in the industrial wireless network execute the computing and communication resource allocation online and perform wireless communication and task offloading, so as to process heterogeneous industrial tasks collaboratively and minimize delay.

The industrial wireless network with edge computing capability comprises N industrial base stations and M industrial terminals.

Each industrial base station is equipped with an edge computing server, which provides computing resources for multiple industrial terminals and supports scheduling of the industrial terminals within its coverage.

Each industrial terminal computes heterogeneous industrial tasks locally and supports offloading them over a wireless channel to an industrial base station for edge computing.

The task of a single industrial terminal may be not offloaded, partially offloaded or fully offloaded to one industrial base station. The computing decision is denoted by $d_{m,n}$: $d_{m,n} = 1$ means that industrial terminal m selects industrial base station n for offloading; otherwise the task is not offloaded to that base station.

The transmission rate at which an industrial terminal offloads a task over the wireless channel is determined by the following quantities: $B_{m,n}$ denotes the bandwidth; the noise at industrial base station n is taken into account; $g_{m,n}$ and $g_{m',n}$ denote the channel power gains from industrial terminal m and industrial terminal m', respectively, to industrial base station n; and $p_m$ and $p_{m'}$ denote the transmit powers of industrial terminal m and industrial terminal m', respectively.
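The quantities listed for the transmission rate match the usual Shannon-capacity expression with co-channel interference. The following is a minimal sketch under that assumption only; the function name `offload_rate` and the exact functional form are illustrative and are not taken from the patent text.

```python
import math


def offload_rate(m, n, bandwidth, noise, gain, power):
    """Rate from terminal m to base station n (assumed Shannon form).

    Assumption: rate = B[m][n] * log2(1 + SINR), where the interference is
    the power received at base station n from every other terminal m'.
    bandwidth and gain are M x N nested lists; noise has length N and
    power has length M.
    """
    interference = sum(
        power[k] * gain[k][n] for k in range(len(power)) if k != m
    )
    sinr = power[m] * gain[m][n] / (noise[n] + interference)
    return bandwidth[m][n] * math.log2(1.0 + sinr)
```

With a single terminal the interference term vanishes; each additional active terminal lowers the rate of the others, which is why the transmit powers have to be chosen jointly.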

The optimization problem for the joint allocation of the computing and communication resources of the industrial base stations and industrial terminals minimizes the total delay subject to the following constraints:

C1: $0 \le p_m \le P_{\max}$, $m = 1, \ldots, M$,

C2: a transmit power constraint, described below,

C3: $0 \le f_{m,n} \le F_n$,

C4: a computing resource constraint, described below,

C5: $0 \le u_{m,n} \le 1$,

C6: $d_{m,n} \in \{0, 1\}$,

C7: a computing decision constraint, described below,

C8: $0 \le \tau_{m,n} \le w_m$,

where the objective is to minimize the total delay, $\tau_{m,n}$ denotes the delay of processing the task of industrial terminal m, and the optimization variables are the computing decisions, offloading ratios and transmit powers of all industrial base stations and industrial terminals.

C1 and C2 are the transmit power constraints, where $P_{\max}$ denotes the maximum transmit power of an industrial terminal, $I_p$ denotes the peak interference power that an industrial terminal can tolerate, and $g_{m,m'}$ denotes the channel power gain between industrial terminal m and industrial terminal m'.

C3 and C4 are the computing resource constraints, where $f_{m,n}$ denotes the computing resources that industrial base station n allocates to industrial terminal m and $F_n$ denotes the total computing resources of industrial base station n, both expressed as the number of CPU cycles per unit time.

C5 is the offloading ratio constraint, where $u_{m,n}$ denotes the fraction of its task that industrial terminal m offloads to industrial base station n, which lies between 0 and 1.

C6 and C7 are the computing decision constraints, where $d_{m,n} = 1$ means that industrial terminal m selects industrial base station n for task offloading and $d_{m,n} = 0$ means that it does not; each industrial terminal can select only one industrial base station for task offloading and cannot offload to several.

C8 is the task deadline constraint, where $w_m$ denotes the deadline of the task executed by industrial terminal m, that is, the longest task processing time that industrial terminal m can accept.
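The constraints whose form is stated explicitly can be checked mechanically for a candidate allocation. The sketch below covers C1, C3, C5, C6 and C8, and reads C7 as "at most one base station per terminal", following the description above. C2 and C4 are left out because their formulas are not reproduced in this text. The function name and argument layout are hypothetical.

```python
def satisfies_explicit_constraints(p, f, u, d, tau, w, p_max, f_total):
    """Check C1, C3, C5, C6, C7 and C8 for one candidate allocation.

    p, w: length-M lists; f, u, d, tau: M x N nested lists;
    f_total: length-N list of base-station capacities F_n.
    C2 and C4 are not checked (their formulas are not given here).
    """
    for m in range(len(p)):
        if not 0.0 <= p[m] <= p_max:                 # C1: transmit power
            return False
        if sum(d[m]) > 1:                            # C7: one base station at most
            return False
        for n in range(len(f_total)):
            if not 0.0 <= f[m][n] <= f_total[n]:     # C3: allocated CPU cycles
                return False
            if not 0.0 <= u[m][n] <= 1.0:            # C5: offloading ratio
                return False
            if d[m][n] not in (0, 1):                # C6: binary decision
                return False
            if not 0.0 <= tau[m][n] <= w[m]:         # C8: deadline
                return False
    return True
```

A check of this kind is one way to reject or penalize infeasible actions during training; how the patent enforces the constraints in practice is not specified here.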

The task processing delay of an industrial terminal is determined by the edge computing delay and the local computing delay, which are calculated as follows.

The edge computing delay is determined by the communication delay and the computing delay, where $C_m$ denotes the number of CPU cycles that industrial terminal m needs to compute a unit amount of task data and $T_m$ denotes the amount of task data that industrial terminal m has to process.

The local computing delay is the delay of computing the task locally at the industrial terminal.
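The delay formulas themselves are not reproduced in this text, so the following is only a sketch of one common way to model them. It assumes that the offloaded share of the data is transmitted and then computed at the base station (the two delays add), that the remaining share is computed locally, and that the two parts run in parallel. The local CPU capacity `f_local` and all function names are hypothetical.

```python
def edge_delay(T_m, C_m, u_mn, rate_mn, f_mn):
    """Assumed edge delay: communication delay plus edge computing delay."""
    if u_mn == 0.0:
        return 0.0  # nothing is offloaded
    communication = u_mn * T_m / rate_mn      # send the offloaded data
    computation = u_mn * T_m * C_m / f_mn     # CPU cycles / allocated cycles per second
    return communication + computation


def local_delay(T_m, C_m, u_mn, f_local):
    """Assumed local delay for the share of the task that stays on the terminal."""
    return (1.0 - u_mn) * T_m * C_m / f_local


def task_delay(T_m, C_m, u_mn, rate_mn, f_mn, f_local):
    """Assumed total delay when edge and local parts run in parallel."""
    return max(edge_delay(T_m, C_m, u_mn, rate_mn, f_mn),
               local_delay(T_m, C_m, u_mn, f_local))
```

Under these assumptions the offloading ratio trades transmission time against the speed-up gained from the edge server, which is the coupling between communication and computing resources that the method optimizes.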

The problem is converted on the basis of a Markov decision process as follows:

a) A Markov decision model is established, comprising a state vector, an action vector, a reward vector and a state transition function.

The state vector is the state of the industrial terminals at time t, denoted $s(t) = \{T(t), C(t), w(t), v(t), g(t)\}$,

where $T(t) = \{T_m(t)\}_M$ is the set of data sizes of the tasks executed by the M industrial terminals, $C(t) = \{C_m(t)\}_M$ is the set of CPU cycles that the M industrial terminals need per unit amount of task data, $v(t) = \{v_m(t)\}_M$ is the set of task priorities of the M industrial terminals, $w(t) = \{w_m(t)\}_M$ is the set of deadlines of the tasks executed by the M industrial terminals, and $g(t) = \{g_{m,n}(t)\}_{M \times N}$ is the set of channel power gains between the M industrial terminals and the N industrial base stations.

The action vector is the action of the industrial terminals at time t, denoted $a(t) = \{d(t), u(t), p(t)\}$,

where $d(t) = \{d_{m,n}(t)\}_{M \times N}$ is the set of computing decisions of the M industrial terminals for offloading tasks to the N industrial base stations, $u(t) = \{u_{m,n}(t)\}_{M \times N}$ is the set of ratios in which the M industrial terminals offload tasks to the N industrial base stations, and $p(t) = \{p_{m,n}(t)\}_{M \times N}$ is the set of transmit powers of the M industrial terminals.

The reward vector is the reward obtained by the industrial terminals at time t, denoted $r(t) = \{r_{m,n}(t)\}_{M \times N}$,

where the reward obtained by industrial terminal m is $r_{m,n}(t) = \tau_{m,n}(t) + \rho(\tau_{m,n}(t) - w_m(t))$, and $\rho$ is the reward-penalty coefficient of the task, determined by the task priority.

The state transition function is the probability of moving from state s(t) to state s(t+1) after action a(t) is executed at time t, denoted $f(s(t+1) \mid s(t), a(t))$.
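The state, action and per-terminal reward defined above map naturally onto small data containers. The sketch below mirrors the symbols in the text; the class and function names are illustrative, and the reward is implemented exactly as the stated expression, with no sign or scaling convention added.

```python
from typing import List, NamedTuple


class State(NamedTuple):
    T: List[float]        # task data sizes, one per terminal
    C: List[float]        # CPU cycles per unit of task data, one per terminal
    w: List[float]        # task deadlines, one per terminal
    v: List[float]        # task priorities, one per terminal
    g: List[List[float]]  # M x N channel power gains


class Action(NamedTuple):
    d: List[List[int]]    # M x N computing decisions (0 or 1)
    u: List[List[float]]  # M x N offloading ratios
    p: List[List[float]]  # M x N transmit powers, as in the stated action vector


def reward(tau_mn: float, w_m: float, rho: float) -> float:
    """r = tau + rho * (tau - w), as stated for terminal m and base station n."""
    return tau_mn + rho * (tau_mn - w_m)
```

The term `tau_mn - w_m` is negative when the task finishes before its deadline and positive when it misses it, so `rho` scales how strongly a deadline margin or violation affects the reward.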

b) The long-term cumulative reward function is determined. The long-term cumulative reward $R_p(t)$ accumulates the discounted rewards, where $t_0$ denotes the previous time instant and $\gamma \in [0, 1]$ is the discount coefficient, which indicates how strongly past rewards influence the current reward.
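The exact expression for the long-term cumulative reward is not reproduced in this text. As a rough illustration, the sketch below assumes the usual discounted sum, in which the k-th reward in a sequence is weighted by the k-th power of the discount coefficient; the indexing relative to $t_0$ in the patent may differ.

```python
def discounted_return(rewards, gamma):
    """Assumed form: sum over k of gamma**k * rewards[k]."""
    total = 0.0
    # Accumulate from the last reward backwards: G = r + gamma * G_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

With `gamma` close to 0 only the first reward matters; with `gamma` close to 1 all rewards in the sequence count almost equally.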

c) The problem is converted. The converted problem is

$\max R_p(t)$

s.t. C1, C2, C3, C4, C5, C6, C7, C8.

That is, the long-term cumulative reward is maximized subject to constraints C1-C8 to obtain the optimal state transition probability and, in turn, an effective delay minimization policy.

The double dueling deep neural network model constructed on the basis of deep reinforcement learning comprises two deep neural networks, called the estimated Q network and the target Q network, which together form the agent; the two have the same structure but use different hyperparameters θ to generate Q values.

Training the neural network model offline until the reward converges comprises the following steps:

a) extracting experience data E(t) from the experience pool as training data;

b) inputting s(t) into the estimated Q network to generate the action a(t) and the Q value $Q(s(t), a(t) \mid \theta)$;

c) executing a(t), which moves the state from s(t) to s(t+1) and yields the reward r(t);

d) training the estimated Q network and updating the hyperparameters θ of the target Q network in real time;

e) storing the current state, action, reward and next state in the experience pool as the experience $E(t) = \{s(t), a(t), r(t), s(t+1)\}$;

f) inputting s(t+1) into the target Q network to obtain a(t+1) and the Q value $Q'(s(t), a(t) \mid \theta')$, and computing $Q'(s(t+1), a(t+1) \mid \theta')$;

g) updating θ by stochastic gradient descent on the loss function of the estimated Q network, which is the mean squared error between the target Q network and the estimated Q network;

h) performing experience replay and repeating steps a)-g) until the reward converges to a stable value, which yields an effective delay minimization policy, namely the computing and communication resource allocation result comprising the computing decisions, offloading ratios and transmit powers of all industrial base stations and industrial terminals.
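The bookkeeping in steps a), e), f) and g) can be sketched without a deep learning framework. The code below shows an experience pool, a temporal-difference target built from the target network's Q values, and the mean squared error loss. The target form `r + gamma * max Q'` is the standard deep Q-learning choice and is an assumption here, since the update formula is not reproduced in this text; `ReplayPool`, `td_target` and `mse_loss` are hypothetical names.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-size experience pool holding (s, a, r, s_next) tuples."""

    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)  # oldest experience is dropped first

    def store(self, s, a, r, s_next):
        self._buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        size = min(batch_size, len(self._buffer))
        return random.sample(list(self._buffer), size)

    def __len__(self):
        return len(self._buffer)


def td_target(r, q_target_next, gamma):
    """Assumed target: r + gamma * max over a' of Q'(s(t+1), a' | theta')."""
    return r + gamma * max(q_target_next)


def mse_loss(targets, predictions):
    """Mean squared error between target values and estimated Q values."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)
```

In a full implementation the gradient of `mse_loss` with respect to θ would drive the stochastic gradient descent step, and the target network's parameters would be refreshed from the estimated network on a chosen schedule.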

Executing the computing and communication resource allocation online to process heterogeneous industrial tasks collaboratively comprises the following steps:

a) the state vector s(t) of all industrial terminals at the current time t is fed to the agent obtained from offline training, which outputs the action vector a(t);

b) all industrial terminals allocate computing and communication resources according to the computing decisions, offloading ratios and transmit powers in a(t), and process the industrial tasks.
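Step b) amounts to reading the action vector and turning it into a per-terminal plan. The sketch below is one hypothetical way to do that; it assumes the transmit power is given per terminal (as $p_m$ in constraint C1) and that a terminal with no selected base station processes its whole task locally.

```python
def decode_action(d, u, p):
    """Turn an action (d, u, p) into a per-terminal offloading plan.

    d, u: M x N nested lists (decisions and offloading ratios);
    p: length-M list of transmit powers (assumed per terminal).
    """
    plan = []
    for m, decisions in enumerate(d):
        # Index of the selected base station, or None if none is selected.
        station = next((n for n, flag in enumerate(decisions) if flag == 1), None)
        ratio = u[m][station] if station is not None else 0.0
        plan.append({
            "terminal": m,
            "base_station": station,
            "offload_ratio": ratio,
            "transmit_power": p[m],
        })
    return plan
```

For example, `decode_action([[0, 1]], [[0.0, 0.4]], [0.2])` describes one terminal that sends 40% of its task to base station 1 at transmit power 0.2.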

The present invention has the following beneficial effects and advantages:

1. To address the highly dynamic industrial network environment, and the modelling difficulty and state-space explosion caused by the complex coupling of multi-dimensional communication and computing resources, the invention uses deep reinforcement learning to provide a method for jointly allocating the computing and communication resources of an industrial wireless network. Resource allocation is trained offline and executed online, which can ensure the real-time performance of the industrial wireless network.

2. The invention addresses the collaborative processing of heterogeneous, highly concurrent industrial tasks in industrial wireless networks. It takes full account of the deadline requirements of heterogeneous tasks, the maximum transmit power of industrial terminals, the peak interference power they impose on other devices, and the computing capability of the end and edge devices of the network, and provides a method based on deep reinforcement learning for jointly allocating the communication and computing resources of an industrial wireless network. The method can meet the transmission quality requirements of heterogeneous industrial tasks in interference-limited industrial wireless networks and supports the collaborative processing of heterogeneous tasks.

Description of drawings

Figure 1 is a flowchart of the method of the present invention;

Figure 2 is a schematic diagram of the multi-user multi-server industrial wireless network scenario addressed by the present invention;

Figure 3 is a structural diagram of the deep neural network used in the present invention;

Figure 4 is a flowchart of the deep reinforcement learning training performed in the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments.

The present invention addresses "end-edge" resource allocation in industrial wireless networks in the general multi-user multi-server scenario and proposes a method based on deep reinforcement learning for jointly allocating the communication and computing resources of an industrial wireless network. By jointly allocating the computing and communication resources of the industrial wireless network, the invention supports on-demand offloading of heterogeneous industrial tasks and achieves "end-edge" resource collaboration. Subject to the deadline requirements of the heterogeneous industrial tasks and to the limits on the computing capability, maximum transmit power and peak interference power of the "end-edge" devices, it minimizes the total processing delay of the heterogeneous industrial tasks and supports collaborative manufacturing.

The method proposed by the present invention for jointly allocating the communication and computing resources of an industrial wireless network comprises the following steps: 1) establishing an industrial wireless network with edge computing capability; 2) formulating the problem of jointly allocating computing and communication resources; 3) converting the problem on the basis of a Markov decision process; 4) constructing a double dueling deep neural network model based on deep reinforcement learning; 5) training the neural network model offline until the reward converges; 6) executing the computing and communication resource allocation online to process heterogeneous industrial tasks collaboratively. The overall flow of the invention is shown in Figure 1.

1) Establishing an industrial wireless network with edge computing capability. As shown in Figure 2, the industrial wireless network comprises N industrial base stations and M industrial terminals. Each industrial base station is equipped with an edge computing server with strong real-time computing capability, provides computing resources for multiple industrial terminals, and supports scheduling of the industrial terminals within its coverage to meet their computing and communication needs. Each industrial terminal has computing and communication capabilities, can process heterogeneous industrial tasks in real time, and supports offloading them over a wireless channel to an industrial base station for processing.

Depending on the computing decision, the task of a single industrial terminal may be not offloaded, partially offloaded or fully offloaded to an industrial base station, but to only one industrial base station. The computing decision is denoted by $d_{m,n}$: $d_{m,n} = 1$ means that industrial terminal m selects industrial base station n for offloading; otherwise the task is not offloaded to that base station.

The rate at which an industrial terminal offloads a task over the wireless channel is determined as described above, where $B_{m,n}$ denotes the bandwidth.

2) Formulating the problem of jointly allocating computing and communication resources

The optimization problem of joint end-edge computing and communication resource allocation in the industrial wireless network minimizes the total delay subject to the following constraints:

C1: $0 \le p_m \le P_{\max}$, $m = 1, \ldots, M$,

C2: a transmit power constraint, described below,

C3: $0 \le f_{m,n} \le F_n$,

C4: a computing resource constraint, described below,

C5: $0 \le u_{m,n} \le 1$,

C6: $d_{m,n} \in \{0, 1\}$,

C7: a computing decision constraint, described below,

C8: $0 \le \tau_{m,n} \le w_m$,

where the optimization variables are the computing decisions, offloading ratios and transmit powers of all industrial base stations and industrial terminals, and the objective is to minimize the total delay. $\tau_{m,n}$ denotes the delay of processing the task of industrial terminal m, which is determined by the edge computing delay and the local computing delay, calculated as follows.

The edge computing delay is determined by the communication delay and the computing delay, where $C_m$ denotes the number of CPU cycles that industrial terminal m needs to compute a unit amount of task data and $T_m$ denotes the amount of task data that industrial terminal m has to process.

The local computing delay is the delay of computing the task locally at the industrial terminal.

C1 and C2 are the transmit power constraints, where $p_m$ denotes the transmit power of industrial terminal m, $P_{\max}$ denotes the maximum transmit power of an industrial terminal, $I_p$ denotes the peak interference power that an industrial terminal can tolerate, and $g_{m,m'}$ denotes the channel power gain between industrial terminal m and industrial terminal m'.

C3 and C4 are the computing resource constraints, where $f_{m,n}$ denotes the computing resources that industrial base station n allocates to industrial terminal m and $F_n$ denotes the total computing resources of industrial base station n, both measured as the number of CPU cycles per unit time.

C5 is the offloading ratio constraint, where $u_{m,n}$ denotes the fraction of its task that industrial terminal m offloads to industrial base station n, which lies between 0 and 1.

C6 and C7 are the computing decision constraints, where $d_{m,n} = 1$ means that industrial terminal m selects industrial base station n for task offloading and $d_{m,n} = 0$ means that it does not.

C8 is the task deadline constraint, where $w_m$ denotes the deadline of the task executed by industrial terminal m, that is, the processing of the task of industrial terminal m must be completed within this maximum time.

3) The problem is converted on the basis of a Markov decision process as follows:

a) A Markov decision model is established, comprising a state vector, an action vector, a reward vector and a state transition function.

The state vector is the state of the industrial terminals at time t, expressed as $s(t) = \{T(t), C(t), w(t), v(t), g(t)\}$, where $T(t) = \{T_m(t)\}_M$ is the set of data sizes of the tasks executed by the M industrial terminals, $C(t) = \{C_m(t)\}_M$ is the set of CPU cycles that the M industrial terminals need per unit amount of task data, $v(t) = \{v_m(t)\}_M$ is the set of task priorities of the M industrial terminals, $w(t) = \{w_m(t)\}_M$ is the set of deadlines of the tasks executed by the M industrial terminals, and $g(t) = \{g_{m,n}(t)\}_{M \times N}$ is the set of channel power gains between the M industrial terminals and the N industrial base stations.

The action vector is the action of the industrial terminals at time t, expressed as $a(t) = \{d(t), u(t), p(t)\}$, where $d(t) = \{d_{m,n}(t)\}_{M \times N}$ is the set of computing decisions of the M industrial terminals for offloading tasks to the N industrial base stations, $u(t) = \{u_{m,n}(t)\}_{M \times N}$ is the set of ratios in which the M industrial terminals offload tasks to the N industrial base stations, and $p(t) = \{p_{m,n}(t)\}_{M \times N}$ is the set of transmit powers of the M industrial terminals.

The reward vector is the reward obtained by the industrial terminals at time t, expressed as $r(t) = \{r_{m,n}(t)\}_{M \times N}$, where the reward obtained by industrial terminal m is

$r_{m,n}(t) = \tau_{m,n}(t) + \rho(\tau_{m,n}(t) - w_m(t))$

and $\rho$ is the reward-penalty coefficient of the task, determined by the task priority.

The state transition function is the probability of moving from state s(t) to state s(t+1) after action a(t) is executed at time t, expressed as $f(s(t+1) \mid s(t), a(t))$.

b) The long-term cumulative reward $R_p(t)$ is determined. It accumulates the discounted rewards, where $t_0$ denotes the previous time instant and $\gamma \in [0, 1]$ is the discount coefficient, which indicates how strongly past rewards influence the current reward.

c) Perform the problem conversion

Based on the long-term cumulative reward function, the optimization problem of computing and communication resource allocation is converted into

max R_p(t)

s.t. C1, C2, C3, C4, C5, C6, C7, C8

Maximizing the long-term cumulative reward subject to constraints C1-C8 yields the best state transition probability and hence an effective computing and communication resource allocation result that minimizes the delay.

4) Construct the dual-layer dual deep neural network model based on deep reinforcement learning. The model includes two deep neural networks, called the estimated Q network and the target Q network. The two have the same structure but use different hyperparameters θ to generate Q values. Both the estimated Q network and the target Q network adopt a dual (dueling) architecture, as shown in Figure 3, with two branches, V(s) and A(s,a), which represent the state value of the current state and the advantage of each action, respectively.
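The text does not state how the V(s) and A(s,a) branches are combined into a Q value. As a hedged sketch, the Python snippet below assumes the mean-subtracted aggregation commonly used with dueling networks, Q(s,a) = V(s) + A(s,a) - mean over a' of A(s,a'). The function name `dueling_q_values` and the sample numbers are illustrative assumptions.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) as Q(s, a) = V(s) + A(s, a) - mean(A(s, .)).

    The mean-subtracted aggregation is an assumption; the source only names
    the two branches.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


# Illustrative branch outputs for one state with three candidate actions.
q = dueling_q_values(state_value=2.0, advantages=[1.0, 0.0, -1.0])
best_action = max(range(len(q)), key=q.__getitem__)  # index of the largest Q value
```

Subtracting the mean advantage keeps V(s) and A(s,a) identifiable, because adding a constant to every advantage no longer changes the resulting Q values.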

5) Train the neural network model offline until the reward converges, as shown in Figure 4, including the following steps:

a) Extract data from the experience pool as training data;

where

h) Perform experience replay and repeat steps a)-g) until the reward converges to a stable value, obtaining an effective computing and communication resource allocation result that minimizes the delay.

6) Execute the computing and communication resource allocation online and collaboratively process heterogeneous industrial tasks, including the following steps:

Claims

1. A method for joint allocation of computing and communication resources of an industrial wireless network, characterized in that the computing and communication resources of industrial base stations and industrial terminals in the industrial wireless network are allocated collaboratively based on deep reinforcement learning, the method comprising the following steps:

1) Establish an industrial wireless network with edge computing capabilities;

2) According to the deadline requirements of heterogeneous industrial tasks, construct an optimization problem about the joint allocation of computing and communication resources between industrial base stations and industrial terminals;

3) Convert the optimization problem into a Markov decision process problem;

4) Construct a dual-layer dual deep neural network model based on deep reinforcement learning to solve the Markov decision process problem;

5) Train the dual-layer dual deep neural network model offline to obtain the computing and communication resource allocation results;

6) The industrial base stations and industrial terminals in the industrial wireless network execute the computing and communication resource allocation online and carry out wireless communication and task offloading, so as to collaboratively process heterogeneous industrial tasks and minimize the delay.

2. A method for joint allocation of computing and communication resources for an industrial wireless network according to claim 1, wherein the industrial wireless network with edge computing capabilities includes: N industrial base stations and M industrial terminals;

The industrial base station is configured with an edge computing server, which is used to provide computing resources for multiple industrial terminals and support scheduling of industrial terminals within its coverage;

The industrial terminal is used for local computing of heterogeneous industrial tasks, and supports offloading of heterogeneous industrial tasks to industrial base stations through wireless channels for edge computing.

3. A method for joint allocation of computing and communication resources of an industrial wireless network according to claim 2, characterized in that the task of a single industrial terminal is not offloaded, partially offloaded, or fully offloaded to one industrial base station; d_{m,n} denotes the computing decision, where d_{m,n} = 1 means that industrial terminal m selects industrial base station n for offloading, and otherwise no offloading is performed;

The transmission rate when the industrial terminal performs task offloading through the wireless channel is

where B_{m,n} denotes the bandwidth,

4. A method for joint allocation of computing and communication resources of an industrial wireless network according to claim 1, characterized in that the optimization problem of the joint allocation of computing and communication resources of the industrial base stations and industrial terminals is

C3: 0 ≤ f_{m,n} ≤ F_n,

C5: 0 ≤ u_{m,n} ≤ 1,

C6: d_{m,n} ∈ {0,1},

C8: 0 ≤ τ_{m,n} ≤ w_m

where

C1 and C2 are transmit power constraints, where P_max denotes the maximum transmit power of the industrial terminal, I_p denotes the peak interference power that the industrial terminal can tolerate, and g_{m,m'} denotes the channel power gain between industrial terminal m and industrial terminal m';

C3 and C4 are computing resource constraints, where f_{m,n} denotes the computing resources allocated by industrial base station n to industrial terminal m and F_n denotes the total computing resources of industrial base station n, expressed as the number of CPU cycles per unit time;

C5 is the offloading ratio constraint, where u_{m,n} denotes the ratio of the task that industrial terminal m offloads to industrial base station n, which lies between 0 and 1;

C6 and C7 are computing decision constraints, where d_{m,n} denotes the computing decision: d_{m,n} = 1 means that industrial terminal m selects industrial base station n for task offloading, and d_{m,n} = 0 means that it does not; each industrial terminal can select only one industrial base station for task offloading and cannot offload to multiple industrial base stations;

C8 is the task deadline constraint, where w_m denotes the deadline of the task performed by industrial terminal m, that is, the longest task processing time that industrial terminal m can accept.
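The constraints that survive in this text can be checked mechanically. The sketch below covers only C3, C5, C6, and C8 as written, plus C7 as described in prose (each terminal selects at most one base station, which is an interpretation). C1, C2, and C4 are omitted because their expressions are not reproduced here. The function name, argument layout, and sample values are illustrative assumptions.

```python
from typing import List


def satisfies_visible_constraints(
    d: List[List[int]],      # d[m][n] in {0, 1}: computing decision
    u: List[List[float]],    # u[m][n]: offloading ratio
    f: List[List[float]],    # f[m][n]: CPU cycles per unit time granted by base station n
    tau: List[List[float]],  # tau[m][n]: task processing delay
    F: List[float],          # F[n]: total computing resources of base station n
    w: List[float],          # w[m]: deadline of the task of terminal m
) -> bool:
    """Check C3, C5, C6, C8 as written and C7 as described in prose."""
    for m, row in enumerate(d):
        if any(x not in (0, 1) for x in row):    # C6: binary decision
            return False
        if sum(row) > 1:                         # C7 (assumed form): at most one base station
            return False
        for n in range(len(row)):
            if not 0.0 <= f[m][n] <= F[n]:       # C3
                return False
            if not 0.0 <= u[m][n] <= 1.0:        # C5
                return False
            if not 0.0 <= tau[m][n] <= w[m]:     # C8
                return False
    return True


# Two terminals, two base stations (assumed values).
common = dict(
    u=[[0.5, 0.0], [0.0, 1.0]],
    f=[[2.0, 0.0], [0.0, 3.0]],
    tau=[[1.0, 0.0], [0.0, 2.0]],
    F=[4.0, 4.0],
    w=[2.0, 2.0],
)
feasible = satisfies_visible_constraints(d=[[1, 0], [0, 1]], **common)
infeasible = satisfies_visible_constraints(d=[[1, 1], [0, 1]], **common)  # violates C7
```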

5. A method for joint allocation of computing and communication resources of an industrial wireless network according to claim 4, wherein the task processing delay of the industrial terminal is determined by the edge computing delay and the local computing delay, which are calculated as follows:

The edge computing delay is determined by the communication delay and the computing delay, and is calculated as

where C_m denotes the number of CPU cycles that industrial terminal m requires to process one unit of task data, and T_m denotes the amount of task data to be processed by industrial terminal m;

The local computing delay is calculated as
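The delay expressions themselves are not reproduced in this text. As a hedged illustration only, the sketch below assumes a common partial-offloading model: the offloaded share u·T_m is transmitted at a given rate and then processed at the edge with f_{m,n} cycles per unit time, while the remaining share (1 - u)·T_m is processed locally. These formulas, the local CPU frequency `f_local`, the function names, and the sample values are all assumptions and may differ from the claimed expressions.

```python
def edge_delay(T: float, C: float, u: float, rate: float, f_edge: float) -> float:
    """Assumed edge delay: transmission time plus edge computing time."""
    if rate <= 0.0 or f_edge <= 0.0:
        raise ValueError("rate and f_edge must be positive")
    offloaded = u * T                      # amount of task data sent to the base station
    communication = offloaded / rate       # communication delay
    computation = offloaded * C / f_edge   # computing delay at the edge server
    return communication + computation


def local_delay(T: float, C: float, u: float, f_local: float) -> float:
    """Assumed local delay for the share of the task that is not offloaded."""
    if f_local <= 0.0:
        raise ValueError("f_local must be positive")
    return (1.0 - u) * T * C / f_local


# Illustrative numbers (assumed): 1000 data units, 10 cycles per unit, half offloaded.
tau_edge = edge_delay(T=1000.0, C=10.0, u=0.5, rate=500.0, f_edge=5000.0)
tau_local = local_delay(T=1000.0, C=10.0, u=0.5, f_local=2500.0)
```

How the two delays combine into the task processing delay (for example, their maximum when both parts run in parallel) is not stated in this text and is left open here.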

6. A method for jointly allocating computing and communication resources of an industrial wireless network according to claim 1, wherein the process of converting a problem based on a Markov decision process is as follows:

a) Establish a Markov decision model, including state vectors, action vectors, reward vectors and state transition functions;

The state vector is the state of the industrial terminal at time t, denoted as s(t)={T(t), C(t), w(t), v(t), g(t)};

where T(t) = {T_m(t)}_M is the set of task data sizes of the M industrial terminals, C(t) = {C_m(t)}_M is the set of CPU cycles the M industrial terminals require to process one unit of task data, v(t) = {v_m(t)}_M is the set of task priorities of the M industrial terminals, w(t) = {w_m(t)}_M is the set of task deadlines of the M industrial terminals, and g(t) = {g_{m,n}(t)}_{M×N} is the set of channel power gains between the M industrial terminals and the N industrial base stations;

The action vector is the action of the industrial terminal at time t, denoted as a(t)={d(t), u(t), p(t)};

where d(t) = {d_{m,n}(t)}_{M×N} is the set of computing decisions of the M industrial terminals for offloading tasks to the N industrial base stations, u(t) = {u_{m,n}(t)}_{M×N} is the set of ratios of the tasks that the M industrial terminals offload to the N industrial base stations, and p(t) = {p_{m,n}(t)}_{M×N} is the set of transmit powers of the M industrial terminals;

The reward vector is the reward obtained by the industrial terminals at time t, expressed as r(t) = {r_{m,n}(t)}_{M×N};

where the reward obtained by industrial terminal m is r_{m,n}(t) = τ_{m,n}(t) + ρ(τ_{m,n}(t) - w_m(t)), and ρ denotes the reward and punishment coefficient of the task, which is determined by the task priority;

The state transition function is the probability of transitioning from state s(t) to state s(t+1) after action a(t) is executed at time t, expressed as f(s(t+1) | s(t), a(t));

b) Determine the long-term cumulative reward function; the long-term cumulative reward is

where t₀ denotes the previous time instant and γ ∈ [0,1] denotes the discount factor, which indicates how much past rewards influence the current reward;

c) Perform the problem conversion; the converted problem is

max R_p(t)

s.t. C1, C2, C3, C4, C5, C6, C7, C8

Maximizing the long-term cumulative reward subject to constraints C1-C8 yields the best state transition probability and hence an effective delay minimization strategy.

7. A method for jointly allocating computing and communication resources of an industrial wireless network according to claim 1, wherein the dual-layer dual deep neural network model constructed based on deep reinforcement learning includes two deep neural networks, called the estimated Q network and the target Q network, which together constitute the agent; the two have the same structure but use different hyperparameters θ to generate Q values.

8. A method for jointly allocating computing and communication resources of an industrial wireless network according to claim 1, wherein the off-line training of the neural network model until the reward converges comprises the following steps:

a) Extract experience data E(t) from the experience pool as training data;

b) Input s(t) into the estimated Q network to generate action a(t) and Q value Q(s(t), a(t)|θ);

c) Execute a(t), convert s(t) into s(t+1), and get reward r(t);

d) Train the estimated Q network and update the hyperparameter θ of the target Q network in real time;

e) Store the obtained current state, action, reward and next state as experience E(t)={s(t), a(t), r(t), s(t+1)} into the experience pool;

f) Input s(t+1) into the target Q network to obtain a(t+1) and the Q value Q′(s(t), a(t)|θ′), and calculate Q′(s(t+1), a(t+1)|θ′);

g) Update θ with the stochastic gradient descent method according to the following formula:

where

h) Perform experience replay and repeat steps a)-g) until the reward converges to a stable value, obtaining an effective delay minimization strategy, that is, the computing and communication resource allocation results comprising the computing decisions, offloading ratios, and transmit powers of all industrial base stations and industrial terminals.
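Steps a), e), and h) rely on an experience pool that stores tuples E(t) = {s(t), a(t), r(t), s(t+1)} and returns them as training data. A minimal sketch of such a pool is given below. The class names, the fixed capacity, and uniform random sampling are assumptions, since the text does not specify how the pool is sized or sampled.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Sequence


class Experience(NamedTuple):
    """One stored transition E(t) = {s(t), a(t), r(t), s(t+1)}."""
    state: Sequence[float]
    action: Sequence[float]
    reward: float
    next_state: Sequence[float]


class ExperiencePool:
    """Fixed-capacity pool; the oldest experience is dropped when full (assumed policy)."""

    def __init__(self, capacity: int) -> None:
        self._buffer: Deque[Experience] = deque(maxlen=capacity)

    def store(self, experience: Experience) -> None:
        self._buffer.append(experience)

    def sample(self, batch_size: int) -> List[Experience]:
        # Uniform sampling without replacement (assumed sampling rule).
        size = min(batch_size, len(self._buffer))
        return random.sample(list(self._buffer), size)

    def __len__(self) -> int:
        return len(self._buffer)


# Store five toy transitions in a pool of capacity three, then draw a batch of two.
pool = ExperiencePool(capacity=3)
for t in range(5):
    pool.store(Experience([float(t)], [0.0], 1.0, [float(t + 1)]))
batch = pool.sample(batch_size=2)
```

Sampling from the pool rather than training on consecutive transitions reduces the correlation between training samples, which is the usual motivation for experience replay.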

9. A method for jointly allocating computing and communication resources of an industrial wireless network according to claim 1, wherein said online execution of computing and communication resource allocation, and collaborative processing of heterogeneous industrial tasks comprises the following steps:

a) The state vector s(t) of all industrial terminals at the current moment t is used as the input of the agent after offline training, and the output action vector a(t) is obtained;

b) According to the output action vector a(t), all industrial terminals allocate computing and communication resources according to the computing decisions, offloading ratios, and transmit powers in a(t) to process the industrial tasks.