CN115413044A - Computing and communication resource joint distribution method for industrial wireless network - Google Patents
- Publication number
- CN115413044A (application number CN202211052799.XA)
- Authority
- CN
- China
- Prior art keywords
- industrial
- terminal
- computing
- task
- base station
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to industrial wireless networks, and in particular to a method for jointly allocating computing and communication resources in an industrial wireless network, comprising the following steps: establishing an industrial wireless network with edge computing capability; formulating the joint computing and communication resource allocation problem; converting the problem on the basis of a Markov decision process; constructing a double dueling deep neural network model based on deep reinforcement learning; training the neural network model offline until the reward converges; and performing computing and communication resource allocation online to process heterogeneous industrial tasks cooperatively. By jointly allocating the computing and communication resources of the industrial wireless network, the invention supports on-demand offloading of heterogeneous industrial tasks and achieves end-edge resource cooperation. Subject to the deadline requirements of the heterogeneous industrial tasks and to limits on the computing capacity, maximum transmit power, and peak interference power of the end-edge devices, it minimizes the total delay of processing the heterogeneous industrial tasks and supports collaborative manufacturing.
Description
Technical Field
The invention relates to the field of industrial wireless networks, and in particular to a method for jointly allocating computing and communication resources in an industrial wireless network.
Background
With the continuing improvement of 5G Ultra-Reliable Low-Latency Communication (URLLC) technology, 5G-based industrial wireless networks are becoming increasingly capable and can support critical industrial control tasks and complex production. However, although URLLC can guarantee, to some extent, that control commands arrive deterministically before their deadlines, complex industrial tasks typically require real-time computation to complete decisions and generate control commands, which poses a significant challenge for resource-limited industrial field devices. The rapid development of Multi-access Edge Computing (MEC), and its combination with 5G URLLC, offers an effective way to compute complex production tasks in real time and to reduce delay. Deploying MEC servers jointly with industrial base stations supplements and enhances the computing power of industrial field devices and improves the real-time performance of complex task processing.
However, the uneven distribution of edge resources in industrial wireless networks, the competition for resources among massive numbers of accessing devices, and the differing QoS requirements of heterogeneous tasks make on-demand allocation of computing and communication resources more difficult, and have become bottlenecks that restrict the efficient application of MEC. To achieve end-edge resource cooperation in industrial wireless networks, researchers have proposed various resource allocation and computation offloading algorithms that balance computing and communication resource allocation in different scenarios, including single-user multi-server, multi-user single-server, and multi-user multi-server scenarios. These methods mainly use Deep Reinforcement Learning (DRL) to decide and allocate tightly coupled computing and communication resources, such as computation decisions, offloading ratios, transmit power, communication bandwidth, and CPU resources, with objectives such as minimizing delay, minimizing energy consumption, or maximizing throughput, in order to cope with a highly dynamic wireless network environment.
However, existing approaches pay little attention to highly concurrent access by heterogeneous industrial tasks with different QoS requirements, such as computation-intensive machine vision tasks and delay-sensitive motion control tasks. In industrial scenarios in particular, heterogeneous industrial tasks have different deadline requirements and interference limits, which makes allocating the computing and communication resources of an industrial wireless network very difficult.
Disclosure of Invention
The invention addresses the cooperative processing of heterogeneous, highly concurrent industrial tasks in an industrial wireless network under the general multi-user, multi-server scenario, and provides a deep-reinforcement-learning-based method for jointly allocating the communication and computing resources of the network. The method overcomes the difficulty that traditional resource allocation methods have in coping with state space explosion in a dynamic network environment. It takes into account the deadline requirements of the heterogeneous tasks, the limits on the maximum transmit power of an industrial terminal and on the peak interference power it causes to other devices, and the computing capacity of the end-edge devices. On this basis it minimizes the total processing delay of the heterogeneous industrial tasks and supports real-time cooperative processing of heterogeneous, highly concurrent industrial tasks such as computation-intensive and delay-sensitive tasks.
The technical solution adopted by the invention to achieve this purpose is as follows: a method for jointly allocating computing and communication resources of an industrial wireless network, which uses deep reinforcement learning to allocate the computing and communication resources of the industrial base stations and industrial terminals in the network cooperatively, and which comprises the following steps:
1) Establishing an industrial wireless network with edge computing capability;
2) Constructing an optimization problem for the joint allocation of computing and communication resources of the industrial base stations and industrial terminals according to the deadline requirements of the heterogeneous industrial tasks;
3) Converting the optimization problem into a Markov decision process problem;
4) Constructing a double dueling deep neural network model based on deep reinforcement learning to solve the Markov decision process problem;
5) Training the double dueling deep neural network model offline to obtain the computing and communication resource allocation results;
6) The industrial base stations and industrial terminals in the industrial wireless network perform computing and communication resource allocation online, and carry out wireless communication and task offloading, so as to process the heterogeneous industrial tasks cooperatively and minimize delay.
The industrial wireless network with edge computing capability comprises N industrial base stations and M industrial terminals;
each industrial base station is equipped with an edge computing server, provides computing resources for multiple industrial terminals, and schedules the industrial terminals within its coverage area;
each industrial terminal performs local computing on heterogeneous industrial tasks and can offload them to an industrial base station over a wireless channel for edge computing.
The tasks of a single industrial terminal are handled with no offloading, partial offloading, or complete offloading to an industrial base station; $d_{m,n}$ denotes the computation decision, where $d_{m,n} = 1$ indicates that industrial terminal $m$ selects industrial base station $n$ for offloading and $d_{m,n} = 0$ indicates that it does not;
the transmission rate at which an industrial terminal offloads a task over the wireless channel is
where $B_{m,n}$ denotes the bandwidth, the noise term denotes the noise at industrial base station $n$, $g_{m,n}$ and $g_{m',n}$ denote the channel power gains from industrial terminal $m$ and industrial terminal $m'$ to industrial base station $n$, and $p_m$ and $p_{m'}$ denote the transmit powers of industrial terminal $m$ and industrial terminal $m'$, respectively.
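A rate expression consistent with the symbols defined above, assuming the usual Shannon capacity with co-channel interference from the other terminals, can be written as follows. The names $r_{m,n}$ for the rate and $\sigma_n^2$ for the noise power at industrial base station $n$ are assumed here for illustration, as is the choice to sum the interference over all other terminals $m'$.

```latex
r_{m,n} = B_{m,n} \log_2\!\left( 1 + \frac{p_m \, g_{m,n}}{\sigma_n^2 + \sum_{m' \neq m} p_{m'} \, g_{m',n}} \right)
```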
The optimization problem of jointly allocating the computing and communication resources of the industrial base stations and industrial terminals is
s.t. C1: $0 \le p_m \le P_{\max}$, $m = 1, \dots, M$,
C3: $0 \le f_{m,n} \le F_n$,
C5: $0 \le u_{m,n} \le 1$,
C6: $d_{m,n} \in \{0, 1\}$,
C8: $0 \le \tau_{m,n} \le w_m$
where the objective is to minimize the total delay, $\tau_{m,n}$ denotes the delay of processing the task of industrial terminal $m$, and the optimization variables are the computation decisions, offloading ratios, and transmit powers of all industrial base stations and industrial terminals;
C1 and C2 are transmit power constraints, where $P_{\max}$ denotes the maximum transmit power of an industrial terminal, $I_p$ denotes the peak interference power that an industrial terminal can tolerate, and $g_{m,m'}$ denotes the channel power gain between industrial terminal $m$ and industrial terminal $m'$;
C3 and C4 are computing resource constraints, where $f_{m,n}$ denotes the computing resources that industrial base station $n$ allocates to industrial terminal $m$ and $F_n$ denotes the total computing resources of industrial base station $n$, both measured in CPU cycles per unit time;
C5 is the offloading ratio constraint, where $u_{m,n}$ denotes the proportion of its task that industrial terminal $m$ offloads to industrial base station $n$, which lies between 0 and 1;
C6 and C7 are computation decision constraints, where $d_{m,n} = 1$ indicates that industrial terminal $m$ selects industrial base station $n$ for task offloading and $d_{m,n} = 0$ indicates that it does not; each industrial terminal can offload to at most one industrial base station, or can choose not to offload at all;
C8 is the task deadline constraint, where $w_m$ denotes the deadline of the task executed by industrial terminal $m$, i.e. the longest task processing time that industrial terminal $m$ can accept.
The task processing delay of an industrial terminal is determined by the edge computing delay and the local computing delay, and is calculated as follows:
the edge computing delay is determined by the communication delay and the computing delay, and is calculated as
where $C_m$ denotes the number of CPU cycles that industrial terminal $m$ needs to process one unit of task data, and $T_m$ denotes the data size of the task to be processed by industrial terminal $m$;
The problem conversion based on the Markov decision process proceeds as follows:
a) Establishing a Markov decision model comprising a state vector, an action vector, a reward vector, and a state transition function;
the state vector is the state of the industrial terminals at time $t$, denoted $s(t) = \{T(t), C(t), w(t), v(t), g(t)\}$;
where $T(t) = \{T_m(t)\}_M$ is the set of data sizes of the tasks executed by the $M$ industrial terminals, $C(t) = \{C_m(t)\}_M$ is the set of CPU cycles that the $M$ industrial terminals need to process one unit of task data, $v(t) = \{v_m(t)\}_M$ is the set of task priorities of the $M$ industrial terminals, $w(t) = \{w_m(t)\}_M$ is the set of deadlines of the tasks executed by the $M$ industrial terminals, and $g(t) = \{g_{m,n}(t)\}_{M \times N}$ is the set of channel power gains between the $M$ industrial terminals and the $N$ industrial base stations;
the action vector is the action of the industrial terminals at time $t$, denoted $a(t) = \{d(t), u(t), p(t)\}$;
where $d(t) = \{d_{m,n}(t)\}_{M \times N}$ is the set of computation decisions for offloading tasks from the $M$ industrial terminals to the $N$ industrial base stations, $u(t) = \{u_{m,n}(t)\}_{M \times N}$ is the set of offloading ratios from the $M$ industrial terminals to the $N$ industrial base stations, and $p(t) = \{p_{m,n}(t)\}_{M \times N}$ is the set of transmit powers of the $M$ industrial terminals;
the reward vector is the reward obtained by the industrial terminals at time $t$, denoted $r(t) = \{r_{m,n}(t)\}_{M \times N}$;
where the reward obtained by industrial terminal $m$ is $r_{m,n}(t) = \tau_{m,n}(t) + \rho(\tau_{m,n}(t) - w_m(t))$, and $\rho$ denotes the reward-and-penalty coefficient of the task, which is determined by the task priority;
the state transition function is the probability of moving from state $s(t)$ to state $s(t+1)$ after action $a(t)$ is executed at time $t$, denoted $f(s(t+1) \mid s(t), a(t))$;
b) Determining the long-term cumulative reward function; the long-term cumulative reward is
where $t_0$ denotes the final time and $\gamma \in [0, 1]$ is the discount coefficient, which indicates the influence of past rewards on the current reward;
c) Performing the problem conversion; the converted problem is
$\max R_p(t)$
s.t. C1, C2, C3, C4, C5, C6, C7, C8
Subject to constraints C1 to C8, maximizing the long-term cumulative reward yields the optimal state transition probability and hence an effective delay minimization strategy.
A double dueling deep neural network model is constructed based on deep reinforcement learning; the model comprises two deep neural networks, called the estimation Q network and the target Q network, which together form the agent; the two networks have the same structure but use different parameters $\theta$ to generate Q values.
Training the neural network model offline until the reward converges comprises the following steps:
a) Sampling experience data $E(t)$ from the experience pool as training data;
b) Inputting $s(t)$ into the estimation Q network to generate an action $a(t)$ and a Q value $Q(s(t), a(t) \mid \theta)$;
c) Executing $a(t)$, whereupon $s(t)$ transitions to $s(t+1)$ and a reward $r(t)$ is obtained;
d) Training the estimation Q network and updating the parameters $\theta$ of the target Q network in real time;
e) Storing the current state, action, reward, and next state as experience $E(t) = \{s(t), a(t), r(t), s(t+1)\}$ in the experience pool;
f) Inputting $s(t+1)$ into the target Q network to obtain $a(t+1)$ and the Q value $Q'(s(t), a(t) \mid \theta')$, and calculating $Q'(s(t+1), a(t+1) \mid \theta')$;
g) Updating $\theta$ by stochastic gradient descent, which is implemented by:
where the loss function of the estimation Q network is the mean square error between the target Q network and the estimation Q network;
h) Performing experience replay and iterating steps a) to g) until the reward converges to a stable value, which yields an effective delay minimization strategy, namely the computation decisions, offloading ratios, and transmit powers of all industrial base stations and industrial terminals, i.e. the computing and communication resource allocation results.
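The experience pool used in steps a), e) and h) can be sketched as a fixed-size buffer with uniform random sampling. This is a minimal illustration only: the class name `ExperiencePool`, the capacity limit, and uniform sampling are assumptions, since the text does not say how the pool is sized or sampled.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-size store of experiences E(t) = (s(t), a(t), r(t), s(t+1))."""

    def __init__(self, capacity):
        # Assumption: once the pool is full, the oldest experience is dropped.
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        """Step e): save one transition."""
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Step a): draw a mini-batch uniformly at random without replacement."""
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)
```

Sampling at random rather than in arrival order reduces the correlation between consecutive training samples, which is the usual motivation for experience replay.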
Performing computing and communication resource allocation online and processing the heterogeneous industrial tasks cooperatively comprises the following steps:
a) Taking the state vectors $s(t)$ of all industrial terminals at the current time $t$ as the input of the agent that has completed offline training, and obtaining the output action vector $a(t)$;
b) According to the output action vector $a(t)$, all industrial terminals allocate computing and communication resources according to the computation decisions, offloading ratios, and transmit powers in $a(t)$ and process the industrial tasks.
The invention has the following beneficial effects and advantages:
1. To address the difficulty of modeling a highly dynamic industrial network environment, and the state space explosion that algorithms face because communication and computing resources are coupled across multiple dimensions, the invention provides a deep-reinforcement-learning-based method for jointly allocating the computing and communication resources of an industrial wireless network. Resource allocation is trained offline and executed online, which ensures the real-time performance of the industrial wireless network.
2. For the cooperative processing of heterogeneous, highly concurrent industrial tasks in an industrial wireless network, the invention provides a deep-reinforcement-learning-based method for jointly allocating industrial wireless network communication and computing resources. The method fully considers the deadline requirements of the heterogeneous tasks, the limits on the maximum transmit power of an industrial terminal and on the peak interference power it causes to other devices, and the computing capacity of the end-edge devices; it can meet the transmission quality requirements of heterogeneous industrial tasks in an interference-limited industrial wireless network and supports cooperative processing of the heterogeneous tasks.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the multi-user, multi-server industrial wireless network scenario to which the present invention is directed;
FIG. 3 is a diagram of a deep neural network architecture employed in the present invention;
FIG. 4 is a flowchart of deep reinforcement learning training according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The invention provides a deep-reinforcement-learning-based method for jointly allocating industrial wireless network communication and computing resources, directed at end-edge resource allocation in an industrial wireless network under the general multi-user, multi-server scenario. By jointly allocating the computing and communication resources of the industrial wireless network, the invention supports on-demand offloading of heterogeneous industrial tasks and achieves end-edge resource cooperation. Subject to the deadline requirements of the heterogeneous industrial tasks and to limits on the computing capacity, maximum transmit power, and peak interference power of the end-edge devices, it minimizes the total processing delay of the heterogeneous industrial tasks and supports collaborative manufacturing.
The method for jointly allocating industrial wireless network communication and computing resources comprises the following steps: 1) establishing an industrial wireless network with edge computing capability; 2) formulating the joint computing and communication resource allocation problem; 3) converting the problem on the basis of a Markov decision process; 4) constructing a double dueling deep neural network model based on deep reinforcement learning; 5) training the neural network model offline until the reward converges; 6) performing computing and communication resource allocation online and processing heterogeneous industrial tasks cooperatively. The overall process is shown in FIG. 1.
1) An industrial wireless network with edge computing capability is established. As shown in FIG. 2, the industrial wireless network comprises N industrial base stations and M industrial terminals. Each industrial base station is equipped with an edge computing server and has strong real-time computing capability; it provides computing resources for multiple industrial terminals, schedules the industrial terminals within its coverage area, and meets their computing and communication requirements. Each industrial terminal has computing and communication capabilities, can process heterogeneous industrial tasks in real time, and can offload heterogeneous industrial tasks to an industrial base station over a wireless channel.
Depending on the computation decision, the tasks of a single industrial terminal are handled with no offloading, partial offloading, or complete offloading, and can be offloaded to only one industrial base station. $d_{m,n}$ denotes the computation decision: $d_{m,n} = 1$ indicates that industrial terminal $m$ selects industrial base station $n$ for offloading, and $d_{m,n} = 0$ indicates that it does not.
The rate at which an industrial terminal offloads a task over the wireless channel is
where $B_{m,n}$ denotes the bandwidth, the noise term denotes the noise at industrial base station $n$, $g_{m,n}$ and $g_{m',n}$ denote the channel power gains from industrial terminal $m$ and industrial terminal $m'$ to industrial base station $n$, and $p_m$ and $p_{m'}$ denote the transmit powers of industrial terminal $m$ and industrial terminal $m'$, respectively.
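As a rough illustration of how such a rate could be evaluated, the sketch below assumes a Shannon-type rate in which the signals of all other terminals at base station n count as interference. The function name `offload_rate`, the list-based data layout, and the interference model are assumptions for illustration and are not taken from the text.

```python
import math


def offload_rate(m, n, bandwidth, power, gain, noise):
    """Shannon-type rate (bit/s) from terminal m to base station n.

    bandwidth[m][n]: B_{m,n} in Hz
    power[m]:        transmit power p_m in W
    gain[m][n]:      channel power gain g_{m,n}
    noise[n]:        noise power at base station n in W
    """
    # Assumption: every other terminal m' interferes at base station n.
    interference = sum(
        power[k] * gain[k][n] for k in range(len(power)) if k != m
    )
    sinr = power[m] * gain[m][n] / (noise[n] + interference)
    return bandwidth[m][n] * math.log2(1.0 + sinr)
```

For example, with two terminals and one base station, `offload_rate(0, 0, [[1e6], [1e6]], [1.0, 1.0], [[1.0], [0.0]], [1.0])` evaluates the rate of terminal 0 when terminal 1 has zero gain towards the base station and therefore causes no interference.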
2) Formulating the joint computing and communication resource allocation problem
The optimization problem of jointly allocating computing and communication resources on the end-edge side of the industrial wireless network is
s.t. C1: $0 \le p_m \le P_{\max}$, $m = 1, \dots, M$,
C3: $0 \le f_{m,n} \le F_n$,
C5: $0 \le u_{m,n} \le 1$,
C6: $d_{m,n} \in \{0, 1\}$,
C8: $0 \le \tau_{m,n} \le w_m$
where the optimization variables are the computation decisions, offloading ratios, and transmit powers of all industrial base stations and industrial terminals, and the objective is to minimize the total delay. $\tau_{m,n}$ is the delay of processing the task of industrial terminal $m$, which is determined by the edge computing delay and the local computing delay and is calculated as follows:
where $C_m$ denotes the number of CPU cycles that industrial terminal $m$ needs to process one unit of task data, and $T_m$ denotes the data size of the task to be processed by industrial terminal $m$.
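A minimal sketch of one way to compute this delay is shown below. It assumes that the offloaded share and the local share are processed in parallel, so the task delay is the larger of the two, and that the edge delay is the upload time plus the edge computing time. The parallel model, the function name `task_delay`, and the local CPU speed `f_local` (which the text does not name) are assumptions.

```python
def task_delay(T, C, u, rate, f_edge, f_local):
    """Delay (s) of a task of T bits that needs C CPU cycles per bit.

    u:       offloading ratio in [0, 1]
    rate:    uplink rate to the selected base station (bit/s)
    f_edge:  edge CPU cycles per second allocated to this terminal
    f_local: local CPU cycles per second (hypothetical parameter)
    """
    # Local share: (1 - u) * T bits processed on the terminal itself.
    local = (1.0 - u) * T * C / f_local if u < 1.0 else 0.0
    # Edge share: transmit u * T bits, then process them at the base station.
    edge = u * T / rate + u * T * C / f_edge if u > 0.0 else 0.0
    # Assumption: both shares run in parallel, so the slower one dominates.
    return max(local, edge)
```

With this model, no offloading (u = 0) reduces to the pure local computing delay and complete offloading (u = 1) reduces to the pure edge delay.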
C1 and C2 are transmit power constraints, where $p_m$ denotes the transmit power of industrial terminal $m$, $P_{\max}$ denotes the maximum transmit power of an industrial terminal, $I_p$ denotes the peak interference power that an industrial terminal can tolerate, and $g_{m,m'}$ denotes the channel power gain between industrial terminal $m$ and industrial terminal $m'$.
C3 and C4 are computing resource constraints, where $f_{m,n}$ denotes the computing resources that industrial base station $n$ allocates to industrial terminal $m$ and $F_n$ denotes the total computing resources of industrial base station $n$, both measured in CPU cycles per unit time.
C5 is the offloading ratio constraint, where $u_{m,n}$ denotes the proportion of its task that industrial terminal $m$ offloads to industrial base station $n$, which lies between 0 and 1.
C6 and C7 are computation decision constraints, where $d_{m,n} = 1$ indicates that industrial terminal $m$ selects industrial base station $n$ for task offloading and $d_{m,n} = 0$ indicates that it does not.
C8 is the task deadline constraint, where $w_m$ denotes the deadline of the task executed by industrial terminal $m$, i.e. the maximum processing delay within which the task of industrial terminal $m$ must be completed.
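The constraint set can be illustrated with a small feasibility check. C1, C3, C5, C6 and C8 follow the inequalities given above. The forms used for C2, C4 and C7 are assumptions inferred from their verbal descriptions (peak interference limit, total capacity of a base station, at most one selected base station), and the function name, argument layout, and per-terminal delay `tau[m]` are hypothetical simplifications.

```python
def violated_constraints(p, f, u, d, tau, w, g_tt, P_max, I_p, F):
    """Return the labels of the constraints that an allocation violates.

    p[m]: transmit power; f[m][n]: edge CPU share; u[m][n]: offloading ratio;
    d[m][n]: computation decision; tau[m]: task delay; w[m]: deadline;
    g_tt[m][k]: gain between terminals m and k; F[n]: capacity of station n.
    """
    M, N = len(d), len(d[0])
    pairs = [(m, n) for m in range(M) for n in range(N)]
    bad = []
    if any(not 0.0 <= p[m] <= P_max for m in range(M)):
        bad.append("C1")
    # C2, assumed form: interference caused at any other terminal stays within I_p.
    if any(p[m] * g_tt[m][k] > I_p for m in range(M) for k in range(M) if k != m):
        bad.append("C2")
    if any(not 0.0 <= f[m][n] <= F[n] for m, n in pairs):
        bad.append("C3")
    # C4, assumed form: shares granted by station n do not exceed its capacity.
    if any(sum(d[m][n] * f[m][n] for m in range(M)) > F[n] for n in range(N)):
        bad.append("C4")
    if any(not 0.0 <= u[m][n] <= 1.0 for m, n in pairs):
        bad.append("C5")
    if any(d[m][n] not in (0, 1) for m, n in pairs):
        bad.append("C6")
    # C7, assumed form: each terminal selects at most one base station.
    if any(sum(row) > 1 for row in d):
        bad.append("C7")
    if any(not 0.0 <= tau[m] <= w[m] for m in range(M)):
        bad.append("C8")
    return bad
```

An empty list means the allocation satisfies all eight checks as modelled here; a learning agent could use the returned labels to penalize or mask infeasible actions.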
3) The problem conversion based on the Markov decision process proceeds as follows:
a) A Markov decision model is established, comprising a state vector, an action vector, a reward vector, and a state transition function.
The state vector is the state of the industrial terminals at time $t$, denoted $s(t) = \{T(t), C(t), w(t), v(t), g(t)\}$, where $T(t) = \{T_m(t)\}_M$ is the set of data sizes of the tasks executed by the $M$ industrial terminals, $C(t) = \{C_m(t)\}_M$ is the set of CPU cycles that the $M$ industrial terminals need to process one unit of task data, $v(t) = \{v_m(t)\}_M$ is the set of task priorities of the $M$ industrial terminals, $w(t) = \{w_m(t)\}_M$ is the set of deadlines of the tasks executed by the $M$ industrial terminals, and $g(t) = \{g_{m,n}(t)\}_{M \times N}$ is the set of channel power gains between the $M$ industrial terminals and the $N$ industrial base stations.
The action vector is the action of the industrial terminals at time $t$, denoted $a(t) = \{d(t), u(t), p(t)\}$, where $d(t) = \{d_{m,n}(t)\}_{M \times N}$ is the set of computation decisions for offloading tasks from the $M$ industrial terminals to the $N$ industrial base stations, $u(t) = \{u_{m,n}(t)\}_{M \times N}$ is the set of offloading ratios from the $M$ industrial terminals to the $N$ industrial base stations, and $p(t) = \{p_{m,n}(t)\}_{M \times N}$ is the set of transmit powers of the $M$ industrial terminals.
The reward vector is the reward obtained by the industrial terminals at time $t$, denoted $r(t) = \{r_{m,n}(t)\}_{M \times N}$, where the reward obtained by industrial terminal $m$ is
$r_{m,n}(t) = \tau_{m,n}(t) + \rho(\tau_{m,n}(t) - w_m(t))$
where $\rho$ denotes the reward-and-penalty coefficient of the task, which is determined by the task priority.
The state transition function is the probability of moving from state $s(t)$ to state $s(t+1)$ after action $a(t)$ is executed at time $t$, denoted $f(s(t+1) \mid s(t), a(t))$.
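The state vector and the per-terminal reward can be sketched as plain data structures. The class name `State`, the `flatten` helper for feeding a neural network, and passing $\rho$ in as a ready-made number are assumptions for illustration; the text does not specify how $\rho$ is derived from the task priority. The reward is computed exactly as the expression is written above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    T: List[float]        # task data sizes T_m(t)
    C: List[float]        # CPU cycles per unit of data C_m(t)
    w: List[float]        # deadlines w_m(t)
    v: List[float]        # task priorities v_m(t)
    g: List[List[float]]  # channel power gains g_{m,n}(t), M x N

    def flatten(self) -> List[float]:
        """Concatenate all components into one input vector of length 4M + MN."""
        flat_g = [x for row in self.g for x in row]
        return self.T + self.C + self.w + self.v + flat_g


def reward(tau: float, w: float, rho: float) -> float:
    """r = tau + rho * (tau - w), with rho supplied by the caller."""
    return tau + rho * (tau - w)
```

The second term is negative when the task finishes before its deadline and positive when it overruns, so it shifts the value according to deadline compliance, weighted by $\rho$.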
b) The long-term cumulative reward is determined as
where $t_0$ denotes the final time and $\gamma \in [0, 1]$ is the discount factor, which indicates the influence of past rewards on the current reward.
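As an illustration of a discounted cumulative reward, the sketch below assumes the common form in which the reward at step i of a trajectory is weighted by $\gamma^i$. The exact expression intended by the text may differ; the function name and the list-based input are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**i * rewards[i], accumulated from the last step backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

With `gamma = 0` only the first reward counts, and with `gamma = 1` all rewards count equally, which shows how the discount factor trades off near-term against long-term reward.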
c) The problem conversion is performed
The optimization problem of computing and communication resource allocation is converted into the maximization of the long-term cumulative reward:
$\max R_p(t)$
s.t. C1, C2, C3, C4, C5, C6, C7, C8
Subject to constraints C1 to C8, maximizing the long-term cumulative reward yields the optimal state transition probability and hence effective computing and communication resource allocation results that minimize delay.
4) A double dueling deep neural network model is constructed based on deep reinforcement learning. It comprises two deep neural networks, called the estimation Q network and the target Q network. The two networks have the same structure but use different parameters $\theta$ to generate Q values. Both the estimation Q network and the target Q network adopt a dueling architecture, as shown in FIG. 3, with two branches, $V(s)$ and $A(s, a)$, which represent the value of the current state and the advantage of each action, respectively.
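The two-branch structure can be illustrated with the aggregation step that turns $V(s)$ and $A(s, a)$ into Q values. The sketch below assumes the commonly used mean-subtracted combination, Q(s, a) = V(s) + A(s, a) minus the mean advantage; the text does not state which combination rule is used, and the function name is hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine the V(s) branch and the A(s, a) branch into one Q value per action."""
    # Subtracting the mean advantage keeps V and A identifiable:
    # adding a constant to every advantage leaves the Q values unchanged.
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]
```

For instance, `dueling_q_values(2.0, [1.0, 3.0])` combines a state value of 2.0 with two action advantages; the action with the larger advantage receives the larger Q value.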
5) Training the neural network model offline until the reward converges, as shown in FIG. 4, comprising the following steps:
a) Extracting data from the experience pool as training data;
b) Inputting s(t) into the estimation Q network to generate an action a(t) and a Q value Q(s(t), a(t) | θ);
c) Executing a(t), so that s(t) transitions to s(t+1), and obtaining a reward r(t);
d) Training the estimation Q network, and updating the parameters of the target Q network in real time;
e) Storing the current state, the action, the reward and the next state as an experience E(t) = {s(t), a(t), r(t), s(t+1)} in the experience pool;
f) Inputting s(t+1) into the target Q network to obtain a(t+1) and a Q value Q'(s(t), a(t) | θ'), and calculating Q'(s(t+1), a(t+1) | θ');
g) Updating θ by stochastic gradient descent, implemented by:
where the loss function of the estimation Q network is the mean square error between the target Q network and the estimation Q network;
h) Performing experience replay and repeating steps a)-g) until the reward converges to a stable value, thereby obtaining an effective computing and communication resource allocation result that minimizes delay.
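The update formula for step g) is not reproduced in this text. The sketch below shows one plausible reading of steps f) and g): the target Q network scores s(t+1), the best-scoring action supplies the bootstrap value, and the loss is the mean square error between that target and the estimation network's Q value. The Bellman-style target r + γ·Q' and all function names are assumptions. A double-DQN variant would instead choose a(t+1) with the estimation network.

```python
from typing import Sequence


def td_target(reward: float, gamma: float, q_target_next: Sequence[float]) -> float:
    """Assumed target: r(t) + gamma * Q'(s(t+1), a(t+1) | theta')."""
    # Step f): the target Q network scores every action in s(t+1);
    # a(t+1) is taken here as the action with the largest target Q value.
    next_action = max(range(len(q_target_next)), key=lambda a: q_target_next[a])
    return reward + gamma * q_target_next[next_action]


def mse_loss(targets: Sequence[float], estimates: Sequence[float]) -> float:
    """Mean square error between targets and estimation-network Q values."""
    return sum((y - q) ** 2 for y, q in zip(targets, estimates)) / len(targets)


# Hypothetical mini-batch of two experiences drawn from the experience pool.
y1 = td_target(reward=1.0, gamma=0.5, q_target_next=[2.0, 4.0])
y2 = td_target(reward=0.0, gamma=0.5, q_target_next=[2.0, 1.0])
loss = mse_loss([y1, y2], estimates=[2.0, 1.0])
```

In a full implementation the gradient of this loss with respect to θ would drive the stochastic gradient descent step, typically through an automatic differentiation library.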
6) Performing computing and communication resource allocation online and cooperatively processing heterogeneous industrial tasks, comprising the following steps:
a) Taking the state vectors s(t) of all industrial terminals at the current time t as the input of the agent obtained from offline training, and obtaining the output action vector a(t);
b) According to the obtained action vector a(t), all industrial terminals allocate computing and communication resources based on the computing decision, offloading proportion and transmit power in a(t) to process the industrial tasks.
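Step b) can be pictured as decoding the M×N components of a(t) into one allocation per terminal. The sketch below is only illustrative: the `TerminalAllocation` type, the `decode_action` helper and the sample matrices are hypothetical. The checks mirror the stated rules that each terminal selects at most one base station and that the offloading proportion lies between 0 and 1.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class TerminalAllocation:
    base_station: Optional[int]  # None means the task is computed locally
    offload_ratio: float         # share of the task sent to the base station
    transmit_power: float        # power used for the offloading transmission


def decode_action(d: Sequence[Sequence[int]],
                  u: Sequence[Sequence[float]],
                  p: Sequence[Sequence[float]]) -> List[TerminalAllocation]:
    """Turn the M x N matrices d(t), u(t), p(t) into per-terminal allocations."""
    allocations = []
    for m, row in enumerate(d):
        chosen = [n for n, flag in enumerate(row) if flag == 1]
        if len(chosen) > 1:
            raise ValueError(f"terminal {m} selects more than one base station")
        if not chosen:
            # No offloading: the whole task stays on the terminal.
            allocations.append(TerminalAllocation(None, 0.0, 0.0))
            continue
        n = chosen[0]
        if not 0.0 <= u[m][n] <= 1.0:
            raise ValueError(f"offloading proportion of terminal {m} is out of range")
        allocations.append(TerminalAllocation(n, u[m][n], p[m][n]))
    return allocations


# Hypothetical action for M = 2 terminals and N = 2 base stations.
plan = decode_action(d=[[0, 1], [0, 0]],
                     u=[[0.0, 0.6], [0.0, 0.0]],
                     p=[[0.0, 0.2], [0.0, 0.0]])
```

Here terminal 0 offloads 60% of its task to base station 1, while terminal 1 processes its task locally.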
Claims (9)
1. A joint allocation method for computing and communication resources of an industrial wireless network is characterized in that the cooperative allocation of the computing and communication resources of an industrial base station and an industrial terminal in the industrial wireless network is realized based on deep reinforcement learning, and the joint allocation method comprises the following steps:
1) Establishing an industrial wireless network with edge computing capability;
2) Constructing an optimization problem about the joint allocation of computing and communication resources of the industrial base station and the industrial terminal according to the deadline requirement of the heterogeneous industrial task;
3) Converting the optimization problem into a Markov decision process problem;
4) Constructing a double dueling deep neural network model based on deep reinforcement learning so as to solve the Markov decision process problem;
5) Training the double dueling deep neural network model offline to obtain computing and communication resource allocation results;
6) The industrial base stations and the industrial terminals in the industrial wireless network perform computing and communication resource allocation online, and perform wireless communication and task offloading so as to cooperatively process heterogeneous industrial tasks and minimize delay.
2. The method of claim 1, wherein the edge-computing-capable industrial wireless network comprises: n industrial base stations and M industrial terminals;
the industrial base station is provided with an edge computing server and is used for providing computing resources for a plurality of industrial terminals and supporting the scheduling of the industrial terminals within its coverage area;
the industrial terminal is used for performing local computing on heterogeneous industrial tasks and supports offloading the heterogeneous industrial tasks to the industrial base station through a wireless channel for edge computing.
3. The method of claim 2, wherein the task of a single industrial terminal is handled by no offloading, partial offloading, or full offloading to one industrial base station; d_{m,n} denotes the computing decision, where d_{m,n} = 1 indicates that industrial terminal m selects industrial base station n for offloading, and otherwise industrial terminal m does not offload to industrial base station n;
the transmission rate at which the industrial terminal offloads a task through the wireless channel is
where B_{m,n} denotes the bandwidth; the noise term is the noise at industrial base station n; g_{m,n} and g_{m',n} denote the channel power gains from industrial terminal m and industrial terminal m', respectively, to industrial base station n; and p_m and p_{m'} denote the transmit powers of industrial terminal m and industrial terminal m', respectively.
4. The method as claimed in claim 1, wherein the optimization problem of the joint allocation of the computing and communication resources of the industrial base station and the industrial terminal is
C3: 0 ≤ f_{m,n} ≤ F_n,
C5: 0 ≤ u_{m,n} ≤ 1,
C6: d_{m,n} ∈ {0, 1},
C8: 0 ≤ τ_{m,n} ≤ w_m
where the objective is to minimize the total delay; τ_{m,n} denotes the delay for processing the task of industrial terminal m; and the optimization variables are the computing decisions, offloading proportions and transmit powers of all industrial base stations and industrial terminals;
C1 and C2 are transmit power constraints, where P_max denotes the maximum transmit power of an industrial terminal, I_p denotes the peak interference power that an industrial terminal can tolerate, and g_{m,m'} denotes the channel power gain between industrial terminal m and industrial terminal m';
C3 and C4 are computing resource constraints, where f_{m,n} denotes the computing resources allocated to industrial terminal m by industrial base station n, and F_n denotes the total computing resources of industrial base station n, expressed as the number of CPU cycles per unit time;
C5 is the offloading proportion constraint, where u_{m,n} denotes the proportion of its task that industrial terminal m offloads to industrial base station n, which lies between 0 and 1;
C6 and C7 are computing decision constraints, where d_{m,n} denotes the computing decision: d_{m,n} = 1 indicates that industrial terminal m selects industrial base station n for task offloading, and d_{m,n} = 0 indicates that industrial terminal m does not select industrial base station n for task offloading; each industrial terminal can select at most one industrial base station for task offloading, or may choose not to offload at all;
C8 is the task deadline constraint, where w_m denotes the deadline of the task executed by industrial terminal m, i.e. the longest task processing time that industrial terminal m can accept.
5. The method of claim 4, wherein the task processing delay of the industrial terminal is determined by an edge computing delay and a local computing delay, calculated as follows:
the edge computing delay is determined by the communication delay and the computing delay, and is calculated as
where C_m denotes the number of CPU cycles required by industrial terminal m to compute a task of unit data volume, and T_m denotes the data size of the task to be processed by industrial terminal m;
6. The method of claim 1, wherein the problem transformation is performed based on a Markov decision process as follows:
a) Establishing a Markov decision model comprising a state vector, an action vector, a reward vector and a state transition function;
the state vector is the state of the industrial terminals at time t, denoted s(t) = {T(t), C(t), w(t), v(t), g(t)};
where T(t) = {T_m(t)}_M denotes the set of data sizes of the tasks executed by the M industrial terminals; C(t) = {C_m(t)}_M denotes the set of CPU cycles required by the M industrial terminals to execute a task of unit data volume; v(t) = {v_m(t)}_M denotes the set of task priorities of the M industrial terminals; w(t) = {w_m(t)}_M denotes the set of deadlines of the tasks executed by the M industrial terminals; and g(t) = {g_{m,n}(t)}_{M×N} denotes the set of channel power gains between the M industrial terminals and the N industrial base stations;
the action vector is the action of the industrial terminals at time t, denoted a(t) = {d(t), u(t), p(t)};
where d(t) = {d_{m,n}(t)}_{M×N} denotes the set of computing decisions for the M industrial terminals offloading tasks to the N industrial base stations; u(t) = {u_{m,n}(t)}_{M×N} denotes the set of proportions in which the M industrial terminals offload tasks to the N industrial base stations; and p(t) = {p_{m,n}(t)}_{M×N} denotes the set of transmit powers of the M industrial terminals;
the reward vector is the reward obtained by the industrial terminals at time t, denoted r(t) = {r_{m,n}(t)}_{M×N};
where the reward obtained by industrial terminal m is r_{m,n}(t) = τ_{m,n}(t) + ρ(τ_{m,n}(t) - w_m(t)), and ρ denotes the reward and punishment coefficient of the task and is determined by the task priority;
the state transition function is the probability of transitioning from state s(t) to state s(t+1) after action a(t) is executed at time t, denoted f(s(t+1) | s(t), a(t));
b) Determining a long-term cumulative reward function; the long-term cumulative reward is
where t_0 denotes the final time step and γ ∈ [0,1] denotes the discount factor, which indicates the impact of past rewards on the current reward;
c) Performing problem conversion; the converted problem is
max R_p(t)
s.t. C1, C2, C3, C4, C5, C6, C7, C8
Subject to constraints C1-C8, the long-term cumulative reward is maximized to obtain the optimal state transition probability and, in turn, an effective delay minimization strategy.
7. The method for joint allocation of computing and communication resources of the industrial wireless network according to claim 1, wherein the double dueling deep neural network model constructed based on deep reinforcement learning comprises two deep neural networks, namely an estimation Q network and a target Q network, which form an agent; the two have the same structure but use different parameters θ to generate Q values.
8. The method for jointly allocating computing and communication resources of an industrial wireless network according to claim 1, wherein training the neural network model offline until the reward converges comprises the following steps:
a) Extracting experience data E(t) from the experience pool as training data;
b) Inputting s(t) into the estimation Q network to generate an action a(t) and a Q value Q(s(t), a(t) | θ);
c) Executing a(t), so that s(t) transitions to s(t+1), and obtaining a reward r(t);
d) Training the estimation Q network, and updating the parameters of the target Q network in real time;
e) Storing the current state, the action, the reward and the next state as an experience E(t) = {s(t), a(t), r(t), s(t+1)} in the experience pool;
f) Inputting s(t+1) into the target Q network to obtain a(t+1) and a Q value Q'(s(t), a(t) | θ'), and calculating Q'(s(t+1), a(t+1) | θ');
g) Updating θ by stochastic gradient descent, implemented by the following formula:
where the loss function of the estimation Q network is the mean square error between the target Q network and the estimation Q network;
h) Performing experience replay and repeating steps a)-g) until the reward converges to a stable value, thereby obtaining an effective delay minimization strategy, namely the computing decisions, offloading proportions and transmit powers of all industrial base stations and industrial terminals, i.e. the computing and communication resource allocation results.
9. The method for joint allocation of computing and communication resources of an industrial wireless network according to claim 1, wherein performing computing and communication resource allocation online and cooperatively processing heterogeneous industrial tasks comprises the following steps:
a) Taking the state vectors s(t) of all industrial terminals at the current time t as the input of the agent obtained from offline training, and obtaining the output action vector a(t);
b) According to the obtained action vector a(t), all industrial terminals allocate computing and communication resources based on the computing decision, offloading proportion and transmit power in a(t) to process the industrial tasks.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211052799.XA CN115413044B (en) | 2022-08-31 | 2022-08-31 | Computing and communication resource joint allocation method for industrial wireless network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115413044A (en) | 2022-11-29 |
CN115413044B (en) | 2024-08-06 |
Family
ID=84164572
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211052799.XA Active CN115413044B (en) | 2022-08-31 | 2022-08-31 | Computing and communication resource joint allocation method for industrial wireless network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115413044B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108848561A (en) * | 2018-04-11 | 2018-11-20 | 湖北工业大学 | A kind of isomery cellular network combined optimization method based on deeply study |
WO2022027776A1 (en) * | 2020-08-03 | 2022-02-10 | 威胜信息技术股份有限公司 | Edge computing network task scheduling and resource allocation method and edge computing system |
CN112069903A (en) * | 2020-08-07 | 2020-12-11 | 之江实验室 | Method and device for achieving face recognition end side unloading calculation based on deep reinforcement learning |
CN113543156A (en) * | 2021-06-24 | 2021-10-22 | 中国科学院沈阳自动化研究所 | Industrial wireless network resource allocation method based on multi-agent deep reinforcement learning |
CN114143891A (en) * | 2021-11-30 | 2022-03-04 | 南京工业大学 | FDQL-based multi-dimensional resource collaborative optimization method in mobile edge network |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116074211A (en) * | 2023-01-05 | 2023-05-05 | 国家电网有限公司 | Heterogeneous network resource optimization method and system taking service quality as center |
WO2024159708A1 (en) * | 2023-01-31 | 2024-08-08 | 中国科学院沈阳自动化研究所 | Digital twinning-based end-edge collaborative scheduling method for heterogeneous task and resource |
CN117575113A (en) * | 2024-01-17 | 2024-02-20 | 南方电网数字电网研究院股份有限公司 | Edge collaborative task processing method, device and equipment based on Markov chain |
CN117575113B (en) * | 2024-01-17 | 2024-05-03 | 南方电网数字电网研究院股份有限公司 | Edge collaborative task processing method, device and equipment based on Markov chain |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||