CN115413044A - Computing and communication resource joint distribution method for industrial wireless network - Google Patents


Info

Publication number
CN115413044A
Authority
CN
China
Prior art keywords
industrial
terminal
computing
task
base station
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211052799.XA
Other languages
Chinese (zh)
Inventor
许驰
唐紫萱
金曦
夏长清
李栋
曾鹏
于海斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Institute of Automation of CAS
Original Assignee
Shenyang Institute of Automation of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Institute of Automation of CAS
Priority to CN202211052799.XA
Publication of CN115413044A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to industrial wireless networks, and in particular to a method for jointly allocating the computing and communication resources of an industrial wireless network. The method comprises the following steps: establishing an industrial wireless network with edge computing capability; formulating the problem of joint allocation of computing and communication resources; converting the problem into a Markov decision process; constructing a double-layer dual deep neural network model based on deep reinforcement learning; training the neural network model offline until the reward converges; and performing computing and communication resource allocation online to cooperatively process heterogeneous industrial tasks. By jointly allocating the computing and communication resources of the industrial wireless network, the invention supports on-demand offloading of heterogeneous industrial tasks and realizes end-edge resource cooperation. Subject to the deadline requirements of the heterogeneous industrial tasks and to the limits on the computing capacity, maximum transmit power and peak interference power of the end-edge devices, it minimizes the total delay of processing the heterogeneous industrial tasks and supports collaborative manufacturing.

Description

Computing and communication resource joint distribution method for industrial wireless network
Technical Field
The invention relates to the field of industrial wireless networks, in particular to a computing and communication resource joint allocation method of an industrial wireless network.
Background
With the continuous enhancement of 5G Ultra-Reliable Low-Latency Communication (URLLC) technology, 5G-based industrial wireless networks are becoming increasingly capable and can support critical industrial control tasks and complex production. However, while URLLC can guarantee, to some extent, that control commands arrive deterministically before their deadlines, complex industrial tasks typically require real-time computation to complete decisions and generate control commands, which poses a significant challenge to resource-limited industrial field devices. The rapid development of Multi-access Edge Computing (MEC) technology, and its combination with 5G URLLC, provides an effective way to perform real-time computation for complex production tasks and to reduce delay. By co-deploying MEC servers with industrial base stations, the computing power of industrial field devices can be supplemented and enhanced, and complex tasks can be processed with better real-time performance.
However, the uneven distribution of edge resources in industrial wireless networks, the competition for resources among massive numbers of accessing devices, and the differing QoS requirements of heterogeneous tasks pose further challenges to on-demand allocation of the network's computing and communication resources, and have become bottlenecks that restrict the efficient application of MEC. To realize end-edge resource cooperation in industrial wireless networks, the academic community has proposed various resource allocation and computation offloading algorithms that balance computing and communication resource allocation in different scenarios, including single-user multi-server, multi-user single-server and multi-user multi-server scenarios. These methods mainly adopt Deep Reinforcement Learning (DRL) to decide and allocate tightly coupled computing and communication resources, such as computing decisions, offloading proportions, transmit power, communication bandwidth and CPU resources, in order to cope with a highly dynamic wireless network environment, with objectives such as delay minimization, energy consumption minimization and throughput maximization.
However, existing approaches pay little attention to highly concurrent access by heterogeneous industrial tasks with different QoS requirements, such as computation-intensive machine vision tasks and delay-sensitive motion control tasks. In particular, in industrial scenarios, heterogeneous industrial tasks have different deadline requirements and interference limits, which makes allocating the computing and communication resources of an industrial wireless network very difficult.
Disclosure of Invention
For the cooperative processing of heterogeneous, highly concurrent industrial tasks in an industrial wireless network in the general multi-user multi-server scenario, the invention provides a deep reinforcement learning-based method for jointly allocating the communication and computing resources of the industrial wireless network. The method addresses the difficulty that traditional resource allocation methods have in coping with state space explosion in a dynamic network environment. It takes into account the deadline requirements of the heterogeneous tasks, the limits on the maximum transmit power of an industrial terminal and on the peak interference power it causes to other devices, and the computing capacity of the end-edge devices of the network. On this basis it minimizes the total processing delay of the heterogeneous industrial tasks and supports real-time cooperative processing of heterogeneous, highly concurrent industrial tasks such as computation-intensive and delay-sensitive tasks.
The technical solution adopted by the invention to achieve this purpose is as follows: a method for jointly allocating the computing and communication resources of an industrial wireless network, which realizes cooperative allocation of the computing and communication resources of the industrial base stations and industrial terminals in the industrial wireless network based on deep reinforcement learning, and comprises the following steps:
1) Establishing an industrial wireless network with edge computing capability;
2) Constructing an optimization problem for the joint allocation of the computing and communication resources of the industrial base stations and industrial terminals according to the deadline requirements of the heterogeneous industrial tasks;
3) Converting the optimization problem into a Markov decision process problem;
4) Constructing a double-layer dual deep neural network model based on deep reinforcement learning to solve the Markov decision process problem;
5) Training the double-layer dual deep neural network model offline to obtain the computing and communication resource allocation results;
6) The industrial base stations and industrial terminals in the industrial wireless network performing computing and communication resource allocation online, and performing wireless communication and task offloading, so as to cooperatively process the heterogeneous industrial tasks and minimize delay.
The industrial wireless network with edge computing capability comprises N industrial base stations and M industrial terminals;
each industrial base station is equipped with an edge computing server, provides computing resources for a plurality of industrial terminals, and supports scheduling of the industrial terminals within its coverage area;
each industrial terminal performs local computation on heterogeneous industrial tasks and supports offloading the heterogeneous industrial tasks to an industrial base station over a wireless channel for edge computing.
The task of a single industrial terminal is handled with no offloading, partial offloading or complete offloading to an industrial base station. The computing decision is denoted d_{m,n}: d_{m,n} = 1 indicates that industrial terminal m selects industrial base station n for offloading; otherwise, industrial terminal m does not offload to industrial base station n.
The transmission rate at which an industrial terminal offloads its task over the wireless channel is
Figure BDA0003824000120000021
where B_{m,n} denotes the bandwidth, the symbol shown in Figure BDA0003824000120000022 denotes the noise at industrial base station n, g_{m,n} and g_{m',n} denote the channel power gains from industrial terminal m and industrial terminal m', respectively, to industrial base station n, and p_m and p_{m'} denote the transmit powers of industrial terminal m and industrial terminal m', respectively.
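By way of non-limiting illustration, the rate computation can be sketched in Python. Because the rate equation itself survives only as an image placeholder, the sketch assumes the usual interference-limited Shannon form, in which the signals of all other terminals m' arriving at base station n are treated as interference. The function name `offload_rate`, the argument `noise_power` and all numeric values are hypothetical.

```python
import math

def offload_rate(m, n, bandwidth, power, gain, noise_power):
    """Assumed uplink rate of terminal m at base station n.

    bandwidth[m][n] -> B_{m,n}, power[m] -> p_m, gain[m][n] -> g_{m,n},
    noise_power[n]  -> noise at base station n (symbol lost in the source).
    """
    signal = power[m] * gain[m][n]
    # Interference: every other terminal m' != m received at base station n.
    interference = sum(
        power[k] * gain[k][n] for k in range(len(power)) if k != m
    )
    sinr = signal / (noise_power[n] + interference)
    return bandwidth[m][n] * math.log2(1.0 + sinr)

# Two terminals sharing one base station (hypothetical numbers).
B = [[1e6], [1e6]]
p = [0.2, 0.1]
g = [[1e-6], [2e-6]]
noise = [1e-9]
rate_0 = offload_rate(0, 0, B, p, g, noise)
```

With these numbers the interference term is as large as the useful signal, so the rate of terminal 0 stays just below the 1 MHz bandwidth; this is the coupling between transmit powers that the joint allocation has to manage.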
The optimization problem for the joint allocation of the computing and communication resources of the industrial base stations and industrial terminals is
Figure BDA0003824000120000023
s.t. C1: 0 ≤ p_m ≤ P_max, m = 1, ..., M,
C2:
Figure BDA0003824000120000024
C3: 0 ≤ f_{m,n} ≤ F_n,
C4:
Figure BDA0003824000120000025
C5: 0 ≤ u_{m,n} ≤ 1,
C6: d_{m,n} ∈ {0, 1},
C7:
Figure BDA0003824000120000026
C8: 0 ≤ τ_{m,n} ≤ w_m
where
Figure BDA0003824000120000031
is the objective, namely minimizing the total delay; τ_{m,n} denotes the delay of processing the task of industrial terminal m; and
Figure BDA0003824000120000032
denote, respectively, the computing decisions, offloading proportions and transmit powers of all industrial base stations and industrial terminals;
C1 and C2 are transmit power constraints, where P_max denotes the maximum transmit power of an industrial terminal, I_p denotes the peak interference power that an industrial terminal can tolerate, and g_{m,m'} denotes the channel power gain between industrial terminal m and industrial terminal m';
C3 and C4 are computing resource constraints, where f_{m,n} denotes the computing resources allocated to industrial terminal m by industrial base station n, and F_n denotes the total computing resources of industrial base station n, both measured in CPU cycles per unit time;
C5 is the offloading proportion constraint, where u_{m,n} denotes the proportion of its task that industrial terminal m offloads to industrial base station n, which lies between 0 and 1;
C6 and C7 are constraints on the computing decision, where d_{m,n} denotes the computing decision: d_{m,n} = 1 indicates that industrial terminal m selects industrial base station n for task offloading, and d_{m,n} = 0 indicates that it does not; each industrial terminal can select only one industrial base station for task offloading, or can choose not to offload at all;
C8 is the task deadline constraint, where w_m denotes the deadline of the task executed by industrial terminal m, i.e. the longest task processing time that industrial terminal m can accept.
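By way of non-limiting illustration, the constraints whose form is given in the text can be checked for a candidate allocation as sketched below. C2 and C4 are omitted because their formulas survive only as image placeholders. C7 is assumed to mean "at most one base station per terminal", following the sentence on base station selection. The function name, argument names and numeric values are hypothetical.

```python
def violated_constraints(d, u, p, f, tau, w, p_max, f_total):
    """List which of C1, C3, C5, C6, C7, C8 an allocation violates.

    d, u, f, tau: M x N nested lists (terminal m, base station n).
    p, w: length-M lists; f_total: length-N list holding F_n.
    """
    violated = set()
    for m, row in enumerate(d):
        if not 0.0 <= p[m] <= p_max:
            violated.add("C1")  # transmit power bound
        if sum(row) > 1:
            violated.add("C7")  # assumed: at most one base station selected
        for n, d_mn in enumerate(row):
            if d_mn not in (0, 1):
                violated.add("C6")  # binary computing decision
            if not 0.0 <= f[m][n] <= f_total[n]:
                violated.add("C3")  # per-terminal share of edge CPU
            if not 0.0 <= u[m][n] <= 1.0:
                violated.add("C5")  # offloading proportion
            if not 0.0 <= tau[m][n] <= w[m]:
                violated.add("C8")  # task deadline
    return sorted(violated)

# Two terminals, two base stations; terminal 0 offloads half its task.
d = [[1, 0], [0, 0]]
u = [[0.5, 0.0], [0.0, 0.0]]
p = [0.1, 0.0]
f = [[1e9, 0.0], [0.0, 0.0]]
tau = [[0.05, 0.0], [0.02, 0.0]]
w = [0.1, 0.1]
result = violated_constraints(d, u, p, f, tau, w, p_max=0.2, f_total=[5e9, 5e9])
```

A check of this kind could be used, for example, to reject or penalize infeasible actions during training; the text does not state how the constraints are enforced.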
The task processing delay of an industrial terminal is determined by the edge computing delay and the local computing delay, and is calculated as
Figure BDA0003824000120000033
The edge computing delay
Figure BDA0003824000120000034
is determined by the communication delay and the computation delay, and is calculated as
Figure BDA0003824000120000035
where C_m denotes the number of CPU cycles required by industrial terminal m to process one unit of task data, and T_m denotes the data size of the task to be processed by industrial terminal m;
The local computing delay
Figure BDA0003824000120000036
is calculated as
Figure BDA0003824000120000037
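By way of non-limiting illustration, one common partial-offloading delay model consistent with the description above can be sketched as follows. The delay equations themselves are not recoverable from the text, so three points are assumptions: the offloaded share u·T_m incurs transmission time plus edge computation time, the remaining share is computed locally, and the two parts run in parallel so that the slower one determines the task delay. The local CPU frequency `f_local` is a hypothetical symbol that the text does not define.

```python
def edge_delay(u, T, C, rate, f_edge):
    """Communication delay plus edge computation delay for the share u*T."""
    if u == 0:
        return 0.0  # nothing is offloaded, so no edge delay arises
    return u * T / rate + u * T * C / f_edge

def local_delay(u, T, C, f_local):
    """Local computation delay for the remaining share (1 - u)*T."""
    return (1.0 - u) * T * C / f_local

def task_delay(u, T, C, rate, f_edge, f_local):
    # Assumption: local and edge processing overlap in time.
    return max(edge_delay(u, T, C, rate, f_edge),
               local_delay(u, T, C, f_local))

# 1 Mbit task, 100 cycles per bit, 10 Mbit/s link, 10 GHz edge share,
# 1 GHz local CPU, half of the task offloaded (hypothetical numbers).
tau = task_delay(0.5, 1e6, 100, 1e7, 1e10, 1e9)
```

In this example the edge path takes 0.055 s and the local path 0.05 s, so the task delay is 0.055 s. If the parts were processed sequentially, the `max` would be replaced by a sum.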
The problem conversion based on the Markov decision process proceeds as follows:
a) Establishing a Markov decision model, which comprises a state vector, an action vector, a reward vector and a state transition function;
the state vector is the state of the industrial terminals at time t, denoted s(t) = {T(t), C(t), w(t), v(t), g(t)};
where T(t) = {T_m(t)}_M denotes the set of data sizes of the tasks executed by the M industrial terminals, C(t) = {C_m(t)}_M denotes the set of CPU cycles required by the M industrial terminals to process one unit of task data, v(t) = {v_m(t)}_M denotes the set of task priorities of the M industrial terminals, w(t) = {w_m(t)}_M denotes the set of deadlines of the tasks executed by the M industrial terminals, and g(t) = {g_{m,n}(t)}_{M×N} denotes the set of channel power gains between the M industrial terminals and the N industrial base stations;
the action vector is the action of the industrial terminals at time t, denoted a(t) = {d(t), u(t), p(t)};
where d(t) = {d_{m,n}(t)}_{M×N} denotes the set of computing decisions for offloading tasks from the M industrial terminals to the N industrial base stations, u(t) = {u_{m,n}(t)}_{M×N} denotes the set of proportions in which the M industrial terminals offload tasks to the N industrial base stations, and p(t) = {p_{m,n}(t)}_{M×N} denotes the set of transmit powers of the M industrial terminals;
the reward vector is the reward obtained by the industrial terminals at time t, denoted r(t) = {r_{m,n}(t)}_{M×N};
where the reward obtained by industrial terminal m is r_{m,n}(t) = τ_{m,n}(t) + ρ(τ_{m,n}(t) − w_m(t)), and ρ denotes the reward-and-penalty coefficient of the task, which is determined by the task priority;
the state transition function is the probability of transitioning from state s(t) to state s(t+1) after action a(t) is executed at time t, denoted f(s(t+1) | s(t), a(t));
b) Determining the long-term cumulative reward function; the long-term cumulative reward is
Figure BDA0003824000120000041
where t_0 denotes the last time step, and γ ∈ [0, 1] is the discount coefficient, which indicates the influence of past rewards on the current reward;
c) Performing the problem conversion; the converted problem is
max R_p(t)
s.t. C1, C2, C3, C4, C5, C6, C7, C8
Subject to constraints C1-C8, the long-term cumulative reward is maximized to obtain the optimal state transition probability, and hence an effective delay minimization strategy.
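By way of non-limiting illustration, the per-terminal reward defined in step a) can be computed as sketched below. The expression is taken as written in the text; how ρ is derived from the task priority is not specified, so it is passed in directly. The function names and numeric values are hypothetical.

```python
def reward(tau, w, rho):
    """r = tau + rho * (tau - w), exactly as the expression is written."""
    return tau + rho * (tau - w)

def reward_matrix(tau, w, rho):
    """Apply the reward to every (terminal m, base station n) pair.

    tau: M x N nested list of delays; w and rho: length-M lists.
    """
    return [
        [reward(tau_mn, w[m], rho[m]) for tau_mn in row]
        for m, row in enumerate(tau)
    ]

on_time = reward(0.05, 0.10, 2.0)  # finished before the deadline
late = reward(0.20, 0.10, 2.0)     # missed the deadline
```

As written, a task that finishes before its deadline yields a smaller value than one that misses it (here about -0.05 versus 0.4). An agent that maximizes cumulative reward would therefore need the opposite sign, for example by negating this quantity; that adjustment is an assumption and is not stated in the text.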
The double-layer dual deep neural network model constructed based on deep reinforcement learning comprises two deep neural networks, called the estimation Q network and the target Q network, which together form the agent; the two networks have the same structure but use different parameters θ to generate Q values.
The offline training of the neural network model until the reward converges comprises the following steps:
a) Extracting experience data E(t) from the experience pool as training data;
b) Inputting s(t) into the estimation Q network to generate an action a(t) and a Q value Q(s(t), a(t) | θ);
c) Executing a(t), whereby s(t) transitions to s(t+1) and a reward r(t) is obtained;
d) Training the estimation Q network, and updating the parameter θ of the target Q network in real time;
e) Storing the current state, the action, the reward and the next state as experience E(t) = {s(t), a(t), r(t), s(t+1)} in the experience pool;
f) Inputting s(t+1) into the target Q network to obtain a(t+1) and the Q value Q'(s(t), a(t) | θ'), and calculating Q'(s(t+1), a(t+1) | θ');
g) Updating θ by stochastic gradient descent, which is implemented as:
Figure BDA0003824000120000042
where
Figure BDA0003824000120000043
is the loss function of the estimation Q network, representing the mean square error between the target Q network and the estimation Q network;
h) Performing experience replay and iterating steps a)-g) until the reward converges to a stable value, thereby obtaining an effective delay minimization strategy, namely the computing and communication resource allocation results consisting of the computing decisions, offloading proportions and transmit powers of all industrial base stations and industrial terminals.
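By way of non-limiting illustration, the experience pool of steps a) and e) and the mean-square-error loss of step g) can be sketched as follows. The update and loss equations are not recoverable from the text, so the target value follows the standard double Q-learning rule, in which the estimation network selects the next action and the target network evaluates it. This differs from step f) as written, where a(t+1) is obtained from the target Q network, and is therefore an assumption. All class and function names are hypothetical.

```python
import random
from collections import deque

class ExperiencePool:
    """Fixed-capacity store of (s, a, r, s_next) tuples."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries are dropped

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored experience.
        pool = list(self.buffer)
        return random.sample(pool, min(batch_size, len(pool)))

def double_q_target(r, q_est_next, q_tgt_next, gamma):
    """y = r + gamma * Q'(s', argmax_a Q(s', a | theta) | theta')."""
    best = max(range(len(q_est_next)), key=q_est_next.__getitem__)
    return r + gamma * q_tgt_next[best]

def mse_loss(targets, predictions):
    """Mean square error between target values and estimated Q values."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)

pool = ExperiencePool(capacity=2)
for step in range(3):
    pool.store(step, 0, 1.0, step + 1)  # the first entry is evicted
y = double_q_target(1.0, [0.2, 0.9], [0.5, 0.3], gamma=0.9)
```

Here the estimation network prefers the second action, so the target network's value for that action (0.3) is used, giving y = 1.27.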
The online execution of computing and communication resource allocation and the cooperative processing of the heterogeneous industrial tasks comprise the following steps:
a) Taking the state vectors s(t) of all industrial terminals at the current time t as the input of the agent obtained after offline training, and obtaining the output action vector a(t);
b) According to the output action vector a(t), all industrial terminals allocate computing and communication resources in accordance with the computing decisions, offloading proportions and transmit powers in a(t), and process the industrial tasks.
The invention has the following beneficial effects and advantages:
1. To address the modeling difficulty and the algorithmic state space explosion caused by the highly dynamic industrial network environment and the complex coupling of multidimensional communication and computing resources, the invention provides a method for jointly allocating the computing and communication resources of an industrial wireless network using deep reinforcement learning. Resource allocation is trained offline and executed online, which ensures the real-time performance of the industrial wireless network.
2. For the cooperative processing of heterogeneous, highly concurrent industrial tasks in an industrial wireless network, the invention provides a deep reinforcement learning-based method for jointly allocating communication and computing resources. The method fully considers the deadline requirements of the heterogeneous tasks, the limits on the maximum transmit power of an industrial terminal and on the peak interference power it causes to other devices, and the computing capacity of the end-edge devices of the network. It can therefore meet the transmission quality requirements of heterogeneous industrial tasks in an interference-limited industrial wireless network and support cooperative processing of the heterogeneous tasks.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the multi-user multi-server industrial wireless network scenario to which the present invention is directed;
FIG. 3 is a diagram of a deep neural network architecture employed in the present invention;
FIG. 4 is a flowchart of deep reinforcement learning training according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The invention provides a deep reinforcement learning-based method for jointly allocating the communication and computing resources of an industrial wireless network, directed at end-edge resource allocation in the general multi-user multi-server scenario. By jointly allocating the computing and communication resources of the industrial wireless network, the invention supports on-demand offloading of heterogeneous industrial tasks and realizes end-edge resource cooperation. Subject to the deadline requirements of the heterogeneous industrial tasks and to the limits on the computing capacity, maximum transmit power and peak interference power of the end-edge devices, it minimizes the total processing delay of the heterogeneous industrial tasks and supports collaborative manufacturing.
The method for jointly allocating the communication and computing resources of an industrial wireless network comprises the following steps: 1) establishing an industrial wireless network with edge computing capability; 2) formulating the problem of joint allocation of computing and communication resources; 3) converting the problem based on a Markov decision process; 4) constructing a double-layer dual deep neural network model based on deep reinforcement learning; 5) training the neural network model offline until the reward converges; 6) performing computing and communication resource allocation online and cooperatively processing heterogeneous industrial tasks. The overall process of the invention is shown in FIG. 1.
1) An industrial wireless network with edge computing capability is established. As shown in FIG. 2, the industrial wireless network includes N industrial base stations and M industrial terminals. Each industrial base station is equipped with an edge computing server and has strong real-time computing capability; it can provide computing resources for a plurality of industrial terminals, supports scheduling of the industrial terminals within its coverage area, and meets their computing and communication requirements. Each industrial terminal has computing and communication capabilities, can process heterogeneous industrial tasks in real time, and supports offloading the heterogeneous industrial tasks to an industrial base station over a wireless channel.
Depending on the computing decision, the task of a single industrial terminal can be handled with no offloading, partial offloading or full offloading, but can be offloaded to only one industrial base station. The computing decision is denoted d_{m,n}: d_{m,n} = 1 indicates that industrial terminal m selects industrial base station n for offloading; otherwise, industrial terminal m does not offload to industrial base station n.
The rate at which an industrial terminal offloads its task over the wireless channel is
Figure BDA0003824000120000061
where B_{m,n} denotes the bandwidth, the symbol shown in Figure BDA0003824000120000062 denotes the noise at industrial base station n, g_{m,n} and g_{m',n} denote the channel power gains from industrial terminal m and industrial terminal m', respectively, to industrial base station n, and p_m and p_{m'} denote the transmit powers of industrial terminal m and industrial terminal m', respectively.
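By way of non-limiting illustration, a rate expression consistent with the symbols defined above is the interference-limited Shannon rate. The original equation is available only as an image placeholder, so the form below is an assumption, and σ_n² is a hypothetical symbol for the noise power at industrial base station n.

```latex
R_{m,n} = B_{m,n} \log_2\!\left( 1 + \frac{p_m \, g_{m,n}}
          {\sigma_n^2 + \sum_{m' \neq m} p_{m'} \, g_{m',n}} \right)
```

In this form, raising p_m increases the rate of terminal m while reducing the rate of every other terminal served by the same base station, which is why the transmit powers have to be chosen jointly.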
2) A problem prototype for the joint allocation of computing and communication resources is established.
The optimization problem for the joint end-edge allocation of the computing and communication resources of the industrial wireless network is
Figure BDA0003824000120000063
s.t. C1: 0 ≤ p_m ≤ P_max, m = 1, ..., M,
C2:
Figure BDA0003824000120000064
C3: 0 ≤ f_{m,n} ≤ F_n,
C4:
Figure BDA0003824000120000065
C5: 0 ≤ u_{m,n} ≤ 1,
C6: d_{m,n} ∈ {0, 1},
C7:
Figure BDA0003824000120000066
C8: 0 ≤ τ_{m,n} ≤ w_m
where
Figure BDA0003824000120000067
denote, respectively, the computing decisions, offloading proportions and transmit powers of all industrial base stations and industrial terminals.
Figure BDA0003824000120000068
is the objective, namely minimizing the total delay. τ_{m,n}, the delay of processing the task of industrial terminal m, is determined by the edge computing delay and the local computing delay, and is calculated as
Figure BDA0003824000120000069
The edge computing delay
Figure BDA00038240001200000610
is determined by the communication delay and the computation delay, and is calculated as
Figure BDA0003824000120000071
where C_m denotes the number of CPU cycles required by industrial terminal m to process one unit of task data, and T_m denotes the data size of the task to be processed by industrial terminal m.
The local computing delay
Figure BDA0003824000120000072
is calculated as
Figure BDA0003824000120000073
C1 and C2 are transmit power constraints, where p_m denotes the transmit power of industrial terminal m, P_max denotes the maximum transmit power of an industrial terminal, I_p denotes the peak interference power that an industrial terminal can tolerate, and g_{m,m'} denotes the channel power gain between industrial terminal m and industrial terminal m'.
C3 and C4 are computing resource constraints, where f_{m,n} denotes the computing resources allocated to industrial terminal m by industrial base station n, and F_n denotes the total computing resources of industrial base station n, both measured in CPU cycles per unit time.
C5 is the offloading proportion constraint, where u_{m,n} denotes the proportion of its task that industrial terminal m offloads to industrial base station n, which lies between 0 and 1.
C6 and C7 are constraints on the computing decision, where d_{m,n} denotes the computing decision: d_{m,n} = 1 indicates that industrial terminal m selects industrial base station n for task offloading, and d_{m,n} = 0 indicates that it does not.
C8 is the task deadline constraint, where w_m denotes the deadline of the task executed by industrial terminal m, i.e. the latest time by which the processing of the task of industrial terminal m must be completed.
3) The problem conversion based on the Markov decision process proceeds as follows:
a) A Markov decision model is established, which comprises a state vector, an action vector, a reward vector and a state transition function.
The state vector is the state of the industrial terminals at time t, denoted s(t) = {T(t), C(t), w(t), v(t), g(t)}, where T(t) = {T_m(t)}_M denotes the set of data sizes of the tasks executed by the M industrial terminals, C(t) = {C_m(t)}_M denotes the set of CPU cycles required by the M industrial terminals to process one unit of task data, v(t) = {v_m(t)}_M denotes the set of task priorities of the M industrial terminals, w(t) = {w_m(t)}_M denotes the set of deadlines of the tasks executed by the M industrial terminals, and g(t) = {g_{m,n}(t)}_{M×N} denotes the set of channel power gains between the M industrial terminals and the N industrial base stations.
The action vector is the action of the industrial terminals at time t, denoted a(t) = {d(t), u(t), p(t)}, where d(t) = {d_{m,n}(t)}_{M×N} denotes the set of computing decisions for offloading tasks from the M industrial terminals to the N industrial base stations, u(t) = {u_{m,n}(t)}_{M×N} denotes the set of proportions in which the M industrial terminals offload tasks to the N industrial base stations, and p(t) = {p_{m,n}(t)}_{M×N} denotes the set of transmit powers of the M industrial terminals.
The reward vector is the reward obtained by the industrial terminals at time t, denoted r(t) = {r_{m,n}(t)}_{M×N}, where the reward obtained by industrial terminal m is
r_{m,n}(t) = τ_{m,n}(t) + ρ(τ_{m,n}(t) − w_m(t))
where ρ denotes the reward-and-penalty coefficient of the task, which is determined by the task priority.
The state transition function is the probability of transitioning from state s(t) to state s(t+1) after action a(t) is executed at time t, denoted f(s(t+1) | s(t), a(t)).
b) The long-term cumulative reward is determined as
Figure BDA0003824000120000081
where t_0 denotes the last time step, and γ ∈ [0, 1] is the discount coefficient, which indicates the influence of past rewards on the current reward.
c) The problem conversion is performed.
The optimization problem of computing and communication resource allocation is converted into maximizing the long-term cumulative reward function:
max R_p(t)
s.t. C1, C2, C3, C4, C5, C6, C7, C8
Subject to constraints C1-C8, maximizing the long-term cumulative reward yields the optimal state transition probability and hence effective computing and communication resource allocation results, thereby minimizing delay.
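By way of non-limiting illustration, a discounted cumulative reward of the kind described in step b) can be accumulated as sketched below. The exact summation limits are not recoverable from the text, so the sketch assumes that the rewards are ordered from the current time step to the last time step t_0 and that the k-th reward is weighted by γ^k. The function name is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        # Each step back in time discounts everything that follows by gamma.
        total = r + gamma * total
    return total

R = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

With γ = 0.5 the three unit rewards contribute 1, 0.5 and 0.25, giving 1.75. With γ = 0 only the first reward counts, and with γ = 1 all rewards count equally.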
4) A double-layer dual deep neural network model is constructed based on deep reinforcement learning. It comprises two deep neural networks, called the estimation Q network and the target Q network. The two networks have the same structure but use different parameters θ to generate Q values. Both the estimation Q network and the target Q network adopt a dual (dueling) architecture, as shown in FIG. 3, comprising two branches, V(s) and A(s, a), which represent the state value of the current state and the advantage of each action, respectively.
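By way of non-limiting illustration, the outputs of the two branches V(s) and A(s, a) can be combined into Q values as sketched below. The text does not state how the branches are merged, so the sketch assumes the mean-subtracted aggregation commonly used with this architecture, Q(s, a) = V(s) + A(s, a) − mean over a' of A(s, a'). The function name is hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar V(s) with per-action advantages A(s, a)."""
    mean_advantage = sum(advantages) / len(advantages)
    # Subtracting the mean keeps V(s) and A(s, a) separately identifiable.
    return [state_value + a - mean_advantage for a in advantages]

q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
```

Here the advantages average to zero, so the Q values are simply 3.0, 2.0 and 1.0, and the first action would be selected greedily.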
5) The neural network model is trained offline until the reward converges, as shown in FIG. 4, through the following steps:
a) Extracting data from the experience pool as training data;
b) Inputting s(t) into the estimation Q network to generate an action a(t) and a Q value Q(s(t), a(t) | θ);
c) Executing a(t), so that s(t) transitions to s(t+1), and obtaining a reward r(t);
d) Training the estimation Q network, and updating the hyperparameter θ of the target Q network in real time;
e) Storing the state at the current time, the action, the reward and the state at the next time as an experience E(t) = {s(t), a(t), r(t), s(t+1)} in the experience pool;
f) Inputting s(t+1) into the target Q network to obtain a(t+1) and the Q value Q'(s(t), a(t) | θ'), and calculating Q'(s(t+1), a(t+1) | θ');
g) Updating θ by stochastic gradient descent, implemented as:
Figure BDA0003824000120000082
wherein
Figure BDA0003824000120000083
is the loss function of the estimation Q network, representing the mean square error between the target Q network and the estimation Q network;
h) Performing experience replay and repeating steps a) to g) until the reward converges to a stable value, thereby obtaining an effective computing and communication resource allocation result that minimizes the delay.
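The building blocks of training steps a) to h) can be sketched in plain Python as follows. This is an illustrative sketch, not the patent's implementation. The update and loss formulas appear only as figures, so the target `r(t) + gamma * max Q'(s(t+1), a | θ')` is an assumption based on step f), and the names `ExperiencePool`, `td_target` and `mse_loss` are hypothetical.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-size pool of experiences E(t) = (s(t), a(t), r(t), s(t+1))."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are dropped first

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the current pool size.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)


def td_target(reward, gamma, target_q_next):
    """Assumed target: r(t) + gamma * max over a(t+1) of Q'(s(t+1), a(t+1) | theta')."""
    return reward + gamma * max(target_q_next)


def mse_loss(targets, estimates):
    """Mean square error between target values and estimation-network Q values."""
    if len(targets) != len(estimates):
        raise ValueError("targets and estimates must have the same length")
    return sum((y - q) ** 2 for y, q in zip(targets, estimates)) / len(targets)
```

In a full training loop, a batch drawn with `sample` would be passed through both networks, `td_target` would be evaluated per experience, and θ would be updated by a stochastic gradient step on `mse_loss`.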
6) Computing and communication resource allocation is performed online to cooperatively process heterogeneous industrial tasks, through the following steps:
a) Taking the state vectors s(t) of all industrial terminals at the current time t as the input of the agent obtained from offline training, and obtaining the output action vector a(t);
b) According to the output action vector a(t), all industrial terminals allocate computing and communication resources based on the computing decisions, offloading ratios and transmit powers in a(t) to process the industrial tasks.
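Step b) can be illustrated by decoding one terminal's slice of the action vector a(t) into a concrete allocation. The sketch below assumes that each terminal's decisions, offloading ratios and transmit powers are given as per-base-station lists. The function name `apply_action` and the dictionary keys are hypothetical.

```python
def apply_action(decisions, ratios, powers):
    """Turn one terminal's (d, u, p) entries of a(t) into an allocation record."""
    chosen = [n for n, d in enumerate(decisions) if d == 1]
    if len(chosen) > 1:
        raise ValueError("a terminal may offload to at most one base station")
    if not chosen:
        # No base station selected: the whole task is computed locally.
        return {"base_station": None, "offload_ratio": 0.0, "transmit_power": 0.0}
    n = chosen[0]
    return {"base_station": n, "offload_ratio": ratios[n], "transmit_power": powers[n]}
```

For example, `apply_action([0, 1], [0.0, 0.6], [0.0, 0.2])` selects base station index 1 with an offloading ratio of 0.6 and a transmit power of 0.2.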

Claims (9)

1. A method for joint allocation of computing and communication resources in an industrial wireless network, characterized in that cooperative allocation of the computing and communication resources of industrial base stations and industrial terminals in the industrial wireless network is realized based on deep reinforcement learning, the method comprising the following steps:
1) Establishing an industrial wireless network with edge computing capability;
2) Constructing an optimization problem about the joint allocation of computing and communication resources of the industrial base station and the industrial terminal according to the deadline requirement of the heterogeneous industrial task;
3) Converting the optimization problem into a Markov decision process problem;
4) Constructing a double-layer dual deep neural network model based on deep reinforcement learning so as to solve the Markov decision process problem;
5) Training the double-layer dual deep neural network model offline to obtain the computing and communication resource allocation results;
6) The industrial base stations and the industrial terminals in the industrial wireless network perform computing and communication resource allocation online, and perform wireless communication and task offloading, so as to cooperatively process heterogeneous industrial tasks and minimize the delay.
2. The method of claim 1, wherein the industrial wireless network with edge computing capability comprises N industrial base stations and M industrial terminals;
the industrial base station is equipped with an edge computing server, and is used for providing computing resources to a plurality of industrial terminals and for scheduling the industrial terminals within its coverage area;
the industrial terminal is used for computing heterogeneous industrial tasks locally, and supports offloading the heterogeneous industrial tasks to the industrial base station through a wireless channel for edge computing.
3. The method of claim 2, wherein the task of a single industrial terminal is processed with no offloading, partial offloading or full offloading, and is offloaded to at most one industrial base station; d_{m,n} represents the computing decision, where d_{m,n} = 1 indicates that industrial terminal m selects industrial base station n for offloading, and otherwise no task is offloaded to industrial base station n;
the transmission rate at which the industrial terminal offloads a task through the wireless channel is
Figure FDA0003824000110000011
where B_{m,n} represents the bandwidth,
Figure FDA0003824000110000012
represents the noise at industrial base station n, g_{m,n} and g_{m',n} respectively represent the channel power gains from industrial terminal m and industrial terminal m' to industrial base station n, and p_m and p_{m'} respectively represent the transmit powers of industrial terminal m and industrial terminal m'.
4. The method as claimed in claim 1, wherein the optimization problem of the joint allocation of the computing and communication resources of the industrial base station and the industrial terminal is
Figure FDA0003824000110000013
Figure FDA0003824000110000021
C3: 0 ≤ f_{m,n} ≤ F_n,
Figure FDA0003824000110000022
C5: 0 ≤ u_{m,n} ≤ 1,
C6: d_{m,n} ∈ {0,1},
Figure FDA0003824000110000023
C8: 0 ≤ τ_{m,n} ≤ w_m
wherein
Figure FDA0003824000110000024
is the optimization objective, i.e. minimizing the total delay; τ_{m,n} represents the delay for processing the task of industrial terminal m;
Figure FDA0003824000110000025
respectively represent the computing decisions, offloading ratios and transmit powers of all industrial base stations and industrial terminals, which are the optimization variables;
C1 and C2 are transmit power constraints, where P_max represents the maximum transmit power of an industrial terminal, I_p represents the peak interference power that an industrial terminal can tolerate, and g_{m,m'} represents the channel power gain between industrial terminal m and industrial terminal m';
C3 and C4 are computing resource constraints, where f_{m,n} represents the computing resources allocated to industrial terminal m by industrial base station n, and F_n represents the total computing resources of industrial base station n, expressed as the number of CPU cycles per unit time;
C5 is the offloading ratio constraint, where u_{m,n} represents the proportion of its task that industrial terminal m offloads to industrial base station n, which lies between 0 and 1;
C6 and C7 are computing decision constraints, where d_{m,n} represents the computing decision: d_{m,n} = 1 indicates that industrial terminal m selects industrial base station n for task offloading, and d_{m,n} = 0 indicates that it does not; each industrial terminal can select at most one industrial base station for task offloading, or can choose not to offload at all;
C8 is the task deadline constraint, where w_m represents the deadline of the task executed by industrial terminal m, i.e. the longest task processing time that industrial terminal m can accept.
5. The method of claim 4, wherein the task processing delay of the industrial terminal is determined by the edge computing delay and the local computing delay, and is calculated as:
Figure FDA0003824000110000026
the edge computing delay
Figure FDA0003824000110000027
is determined by the communication delay and the computation delay, and is calculated as
Figure FDA0003824000110000028
where C_m represents the number of CPU cycles required by industrial terminal m to compute a unit data volume of the task, and T_m represents the data size of the task to be processed by industrial terminal m;
the local computing delay
Figure FDA0003824000110000029
is calculated as
Figure FDA0003824000110000031
6. The method of claim 1, wherein the problem transformation is performed based on a Markov decision process as follows:
a) Establishing a Markov decision model comprising a state vector, an action vector, a reward vector and a state transition function;
the state vector is the state of the industrial terminals at time t, denoted s(t) = {T(t), C(t), w(t), v(t), g(t)};
where T(t) = {T_m(t)}_M represents the set of data sizes of the tasks executed by the M industrial terminals, C(t) = {C_m(t)}_M represents the set of CPU cycles required by the M industrial terminals to execute a unit data volume of a task, v(t) = {v_m(t)}_M represents the set of task priorities of the M industrial terminals, w(t) = {w_m(t)}_M represents the set of deadlines of the tasks executed by the M industrial terminals, and g(t) = {g_{m,n}(t)}_{M×N} represents the set of channel power gains between the M industrial terminals and the N industrial base stations;
the action vector is the action of the industrial terminals at time t, denoted a(t) = {d(t), u(t), p(t)};
where d(t) = {d_{m,n}(t)}_{M×N} represents the set of computing decisions of the M industrial terminals for offloading tasks to the N industrial base stations, u(t) = {u_{m,n}(t)}_{M×N} represents the set of proportions in which the M industrial terminals offload tasks to the N industrial base stations, and p(t) = {p_{m,n}(t)}_{M×N} represents the set of transmit powers of the M industrial terminals;
the reward vector is the reward obtained by the industrial terminals at time t, denoted r(t) = {r_{m,n}(t)}_{M×N};
where the reward obtained by industrial terminal m is r_{m,n}(t) = τ_{m,n}(t) + ρ(τ_{m,n}(t) - w_m(t)), and ρ represents the reward and punishment coefficient of the task, determined by the task priority;
the state transition function is the probability of transitioning from state s(t) to state s(t+1) after action a(t) is executed at time t, denoted f(s(t+1) | s(t), a(t));
b) Determining the long-term cumulative reward function; the long-term cumulative reward is
Figure FDA0003824000110000032
where t_0 denotes the last time and γ ∈ [0,1] denotes the discount coefficient, which indicates the influence of past rewards on the current reward;
c) Performing problem conversion; the converted problem is
max R p (t)
s.t. C1, C2, C3, C4, C5, C6, C7, C8
Maximizing the long-term cumulative reward subject to constraints C1 to C8 yields the optimal state transition probability, and hence an effective delay minimization strategy.
7. The method for joint allocation of computing and communication resources of the industrial wireless network according to claim 1, wherein the double-layer dual deep neural network model constructed based on deep reinforcement learning comprises two deep neural networks, namely an estimation Q network and a target Q network, which form an agent; the two have the same structure but use different hyperparameters θ to generate Q values.
8. The method for jointly allocating computing and communication resources of an industrial wireless network according to claim 1, wherein training the neural network model offline until the reward converges comprises the following steps:
a) Extracting experience data E(t) from the experience pool as training data;
b) Inputting s(t) into the estimation Q network to generate an action a(t) and a Q value Q(s(t), a(t) | θ);
c) Executing a(t), so that s(t) transitions to s(t+1), and obtaining a reward r(t);
d) Training the estimation Q network, and updating the hyperparameter θ of the target Q network in real time;
e) Storing the state at the current time, the action, the reward and the state at the next time as an experience E(t) = {s(t), a(t), r(t), s(t+1)} in the experience pool;
f) Inputting s(t+1) into the target Q network to obtain a(t+1) and the Q value Q'(s(t), a(t) | θ'), and calculating Q'(s(t+1), a(t+1) | θ');
g) Updating θ by stochastic gradient descent, implemented by the following formula:
Figure FDA0003824000110000041
wherein
Figure FDA0003824000110000042
is the loss function of the estimation Q network, representing the mean square error between the target Q network and the estimation Q network;
h) Performing experience replay and repeating steps a) to g) until the reward converges to a stable value, thereby obtaining an effective delay minimization strategy, namely the computing decisions, offloading ratios and transmit powers of all industrial base stations and industrial terminals, i.e. the computing and communication resource allocation results.
9. The method for joint allocation of computing and communication resources of an industrial wireless network according to claim 1, wherein performing computing and communication resource allocation online and cooperatively processing heterogeneous industrial tasks comprises the following steps:
a) Taking the state vectors s(t) of all industrial terminals at the current time t as the input of the agent obtained from offline training, and obtaining the output action vector a(t);
b) According to the output action vector a(t), all industrial terminals allocate computing and communication resources based on the computing decisions, offloading ratios and transmit powers in a(t) to process the industrial tasks.
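The quantities defined in claims 3 and 4 can be illustrated with a short Python sketch. The rate formula itself appears only as a figure, so the Shannon-type expression below is an assumption that is consistent with the listed symbols (bandwidth, noise, channel power gains and transmit powers). Constraint C7 is likewise read from its textual description as "at most one selected base station". C1, C2 and C4 are omitted because their exact forms are not recoverable from the text. The function names `transmission_rate` and `is_feasible` are hypothetical.

```python
import math


def transmission_rate(bandwidth, p_m, g_mn, noise, interferers=()):
    """Assumed Shannon-type rate: B * log2(1 + p_m*g_mn / (noise + interference)).

    interferers is an iterable of (p_m_prime, g_m_prime_n) pairs.
    """
    interference = sum(p * g for p, g in interferers)
    return bandwidth * math.log2(1.0 + p_m * g_mn / (noise + interference))


def is_feasible(d_row, u, f, F_n, tau, w_m):
    """Check C3, C5, C6, C7 and C8 for one terminal and its selected base station."""
    c3 = 0.0 <= f <= F_n                  # computing resource bound
    c5 = 0.0 <= u <= 1.0                  # offloading ratio bound
    c6 = all(d in (0, 1) for d in d_row)  # binary computing decisions
    c7 = sum(d_row) <= 1                  # at most one base station selected
    c8 = 0.0 <= tau <= w_m                # task deadline
    return c3 and c5 and c6 and c7 and c8
```

A check of this kind could be used to reject or penalize actions that violate the constraints before the reward is computed, although the patent does not state how constraint violations are handled.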

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211052799.XA CN115413044A (en) 2022-08-31 2022-08-31 Computing and communication resource joint distribution method for industrial wireless network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211052799.XA CN115413044A (en) 2022-08-31 2022-08-31 Computing and communication resource joint distribution method for industrial wireless network

Publications (1)

Publication Number Publication Date
CN115413044A true CN115413044A (en) 2022-11-29

Family

ID=84164572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211052799.XA Pending CN115413044A (en) 2022-08-31 2022-08-31 Computing and communication resource joint distribution method for industrial wireless network

Country Status (1)

Country Link
CN (1) CN115413044A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117575113A (en) * 2024-01-17 2024-02-20 南方电网数字电网研究院股份有限公司 Edge collaborative task processing method, device and equipment based on Markov chain
CN117575113B (en) * 2024-01-17 2024-05-03 南方电网数字电网研究院股份有限公司 Edge collaborative task processing method, device and equipment based on Markov chain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination