CN116828542A - Power load terminal access response method, system, management system, equipment and storage medium - Google Patents
- Publication number
- CN116828542A (application CN202310790860.9A)
- Authority
- CN
- China
- Prior art keywords
- power load
- load terminal
- representing
- power
- computing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/08—Load balancing or load distribution
- H04W28/09—Management thereof
- H04W28/0925—Management thereof using policies
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/08—Load balancing or load distribution
- H04W28/09—Management thereof
- H04W28/0917—Management thereof based on the energy state of entities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/02—Traffic management, e.g. flow control or congestion control
- H04W28/08—Load balancing or load distribution
- H04W28/09—Management thereof
- H04W28/0958—Management thereof based on metrics or performance parameters
- H04W28/0967—Quality of Service [QoS] parameters
- H04W28/0975—Quality of Service [QoS] parameters for reducing delays
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a power load terminal access response method, system, management system, equipment and storage medium. The method comprises the following steps: receiving an access request from a power load terminal to be accessed, and collecting the current system state of the power load management system; inputting the current system state into a trained actor network to obtain the optimal action of the power load management system; making a computing resource allocation decision and a communication resource allocation decision according to the optimal action; and, based on these two decisions, having the edge server or the cloud server provide edge or cloud computing resources to the power load terminal while also providing the communication resources for accessing the edge server or the cloud server, thereby completing the access of the power load terminal to be accessed. The invention can effectively reduce the access response delay and the energy consumption of all power load terminals in the new power load management system, balance the workload of the edge servers, and improve the utilization of computing and communication resources.
Description
Technical Field
The invention belongs to the technical field of power load terminal access response calculation, and particularly relates to a power load terminal access response method, a power load terminal access response system, a power load management system, computer equipment and a storage medium.
Background
With the rapid development of new power load management systems, more and more power load terminals are connected to the power network to support various power applications, such as power equipment monitoring and fault diagnosis, power consumption behavior monitoring and prediction, and energy monitoring and energy-saving management. When a large number of power load terminals access a limited radio spectrum at scale, severe congestion may result, reducing the probability that terminal access and transmission succeed. In view of the explosive growth in the number of power load terminals, terminal access efficiency must be improved to accommodate large-scale access scenarios with various quality of service (QoS) metrics. Ultra-Reliable Low-Latency Communications (URLLC) is one of the most challenging services in terms of QoS, with stringent low-latency and high-reliability requirements.
To solve the above problems, edge computing has been adopted as an effective solution. It deploys edge servers (ESs) near the power load terminals, so that the terminals can access the ESs directly instead of a remote cloud server (CS), thereby alleviating network congestion caused by large-scale power load terminal access, reducing the response delay of computing tasks, and improving transmission reliability. However, the computing power of a single ES is generally limited. For a computing task that is not delay-sensitive but has a huge computation amount, the ES can cooperate with the CS, which has abundant computing resources, to further improve resource utilization.
A real new power load management system environment is typically dynamic and unpredictable (e.g., time-varying computing task parameters, power load terminal states, and channel gains), for which reinforcement learning (RL) has become a promising solution. RL learns the best policy by interacting with the dynamic environment, without prior knowledge of the environment dynamics. However, conventional RL algorithms are only suitable for environments with a fully observable, low-dimensional state space, while a real new power load management system environment typically has a high-dimensional, continuous state space from which it is difficult to extract all useful features. Deep reinforcement learning (DRL) integrates the powerful feature extraction capability of deep neural networks (DNNs) with the powerful decision capability of RL. Specifically, DRL uses a DNN model to approximate the policy function and the value function in RL, and can learn the best policy from a large, high-dimensional, continuous state space, so DRL is suitable for a real new power load management system environment.
Most existing research considers only power load terminal access optimization, or only the allocation optimization of computing and communication resources; it does not consider the joint optimization of the two, and it ignores the URLLC constraints. In fact, as the number of power load terminals increases, the demand for URLLC increases. Meanwhile, power load terminal access and resource allocation are complementary and influence each other: on the one hand, the access decisions of the power load terminals affect the result of resource allocation optimization; on the other hand, the competition of multiple power load terminals for resources affects the terminals' decisions. Thus, joint optimization of terminal access and resource allocation is required.
Disclosure of Invention
The invention aims to: solve the problems that the prior art considers only power load terminal access optimization or only computing and communication resource allocation optimization, does not consider the joint optimization of power load terminal access and resource allocation, and ignores the URLLC constraints. To this end, the invention provides a power load terminal access response method, a power load terminal access response system, a power load management system, computer equipment and a storage medium for a new cloud-edge power load management system.
The technical scheme is as follows: an access response method for a power load terminal comprises the following steps:
step 1: receiving an access request from a power load terminal to be accessed, and collecting the current system state of a power load management system;
step 2: inputting the current system state of the power load management system into a trained actor network to obtain the optimal action of the power load management system;
step 3: according to the optimal action of the power load management system, performing a calculation resource allocation decision and a communication resource allocation decision; providing edge or cloud computing resources for the power load terminal based on the computing resource allocation decision and the communication resource allocation decision by the edge server or the cloud server, and simultaneously providing communication resources accessed to the edge server or the cloud server for the power load terminal to complete the access of the power load terminal to be accessed;
The trained actor network is obtained by training the actor network to minimize the total system cost while satisfying the URLLC constraints; the total system cost consists of the total delay cost of computing task execution and the total system energy consumption cost of completing all computing tasks.
Further, training the actor network to minimize the total system cost while satisfying the URLLC constraints specifically comprises the following steps:
S1: initialize the actor network $\pi_\mu$ with parameter $\mu$, the critic1 network $Q_{\theta_1}$ with parameter $\theta_1$, the critic2 network $Q_{\theta_2}$ with parameter $\theta_2$, the target actor network with parameter $\tilde{\mu}$, the target critic1 network with parameter $\tilde{\theta}_1$, the target critic2 network with parameter $\tilde{\theta}_2$, the exploration noise $\varepsilon$, the total number of episodes NE, the number of time steps NS in each episode, the storage capacity RS of the experience replay pool, the experience replay period RP, the discount factor $\gamma$, and the smoothing coefficient $\tau$ of the target actor network; define the indexes of the episode and the time step as ne and t respectively, and initialize ne = 1 and t = 1;
S2: for ne ∈ {1, 2, ..., NE} and t ∈ {1, 2, ..., NS}, input the current system state $s_t$ into the actor network $\pi_\mu$, which outputs the probability distribution $\pi_\mu(s_t)$ over all possible continuous actions satisfying the URLLC constraints; add the exploration noise $\varepsilon$ to $\pi_\mu(s_t)$ to obtain the action $a_t \sim \pi_\mu(s_t) + \varepsilon$, $\varepsilon \sim N(0, \sigma)$, and execute action $a_t$;
$s_t \in S$, where S represents the state space, which includes the computing task $J_n = \{b_n, c_n, Q_n\}$, the access decision, the transmit power $P_m$ of the power load terminal, the received power of the ES, the transmit power of the ES, the total bandwidth $B_u$ of each ES, the computing power $f_n(t)$ of the power load terminal, the computing power $f_{mn}(t)$ allocated to the power load terminal by the ES, the computing power $f_{mc}(t)$ allocated to the power load terminal by the CS, and the transfer rate $R_{md}(t)$ between each ES and the SDN controller; the access decision comprises the ratio of the computing task computed locally at power load terminal n, the offload ratio of the computing task from power load terminal n to edge server m, and the ratio of the computing task sent from power load terminal n to edge server m and then on to cloud server c;
$a_t \in A$, where A represents the action space, used to make the power load terminal access, computing task response, computing resource allocation and communication resource allocation decisions; it includes the access decision of each power load terminal, the computing power $f_n(t)$ of the power load terminal, the computing power $f_{mn}(t)$ allocated to the power load terminal by the ES, the computing power $f_{mc}(t)$ allocated to the power load terminal by the CS, the transmit power $P_m$ of the power load terminal, the received power of the ES, and the transmit power of the ES;
S3: after executing action $a_t$, obtain the instant reward $R_t(s_t, a_t)$, expressed as:
$R_t(s_t, a_t) = -(\omega_1 T_{Latency} + \omega_2 E(t))$ (20)
where $\omega_1$ and $\omega_2$ are weight parameters with $\omega_1 + \omega_2 = 1$; $T_{Latency}$ is the total delay of computing task execution; $E(t)$ is the total system energy consumption for completing all computing tasks;
transition to the next system state $s_{t+1}$, then store the experience sample $e_t = (s_t, a_t, R_t, s_{t+1})$ in the experience replay pool;
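As an illustration of step S3, the following is a minimal sketch of the reward in formula (20) and of an experience replay pool with storage capacity RS. The names `instant_reward` and `ReplayPool`, the default weight `w1=0.5`, and the uniform random sampling are assumptions made for this example; the source does not specify how samples are drawn.

```python
import random
from collections import deque


def instant_reward(total_delay, total_energy, w1=0.5):
    """Formula (20): R = -(w1 * T_Latency + w2 * E(t)), with w1 + w2 = 1."""
    w2 = 1.0 - w1
    return -(w1 * total_delay + w2 * total_energy)


class ReplayPool:
    """Fixed-capacity pool of (s_t, a_t, R_t, s_{t+1}) experience samples."""

    def __init__(self, capacity):
        # Assumption: once the pool is full, the oldest sample is discarded.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Assumption: uniform sampling without replacement.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)


pool = ReplayPool(capacity=2)
for t in range(3):
    r = instant_reward(total_delay=2.0, total_energy=4.0)
    pool.store(state=t, action=0.0, reward=r, next_state=t + 1)
batch = pool.sample(batch_size=2)
```

Because the reward is the negative weighted cost, a larger total delay or energy consumption yields a smaller reward.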
S4: determine whether the stated condition is satisfied; if it is, let t = t + 1 and return to S2; otherwise, execute S5;
S5: determine whether t % RP == 0 is satisfied, and execute S6 only when it is;
S6: using experience replay, draw experience samples $e_i = (s_i, a_i, R_i, s_{i+1})$ from the experience replay pool; input $s_{i+1}$ from $e_i$ into the target actor network and add the exploration noise $\varepsilon$ to the output of the target actor network to obtain a predicted action; here $s_i$ represents the system state in the drawn experience sample, $a_i$ the action, $s_{i+1}$ the next system state, and $R_i$ the instant reward;
input $s_{i+1}$ and the predicted action into the target critic1 network and the target critic2 network to obtain their respective Q values;
Defining a target Q value as:
wherein y represents a TD target;
update the critic1 network parameter $\theta_1$ and the critic2 network parameter $\theta_2$ separately by minimizing the loss function with mini-batch gradient descent, expressed as:
wherein Z represents the number of empirical samples drawn;
updating the actor network parameter μ using deterministic policy gradient method every d time steps:
where the network used is the critic network with the minimum Q value of the target critic1 network and the target critic2 network;
perform a soft update on the target critic1 network parameter, the target critic2 network parameter and the target actor network parameter:
$\tilde{\mu} \leftarrow \tau\mu + (1-\tau)\tilde{\mu}$ (33).
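The update in S6 follows the general pattern of twin-critic actor-critic training. The sketch below assumes the TD target takes the common clipped double-Q form, y = R + γ·min(Q1', Q2'), and that the critic loss is a mean squared error over Z samples; these forms are assumptions, because the corresponding formulas are not legible in the source. Only the soft update mirrors formula (33) directly. All function names are hypothetical.

```python
import numpy as np


def td_target(reward, q1_next, q2_next, gamma):
    """Assumed clipped double-Q target: y = R + gamma * min(Q1', Q2')."""
    return reward + gamma * np.minimum(q1_next, q2_next)


def critic_loss(q_pred, y):
    """Assumed loss: mean squared TD error over a mini-batch of Z samples."""
    return float(np.mean((y - q_pred) ** 2))


def soft_update(target_params, online_params, tau):
    """Formula (33): target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


rewards = np.array([-1.0, -2.0])
q1_next = np.array([-10.0, -5.0])   # target critic1 outputs
q2_next = np.array([-8.0, -6.0])    # target critic2 outputs
y = td_target(rewards, q1_next, q2_next, gamma=0.9)

loss = critic_loss(q_pred=np.array([-9.0, -7.0]), y=y)
new_target = soft_update([np.array([0.0])], [np.array([1.0])], tau=0.1)
```

Taking the smaller of the two target Q values is the usual way to limit overestimation of the value function, which is consistent with the text's reference to the critic with the minimum Q value.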
Further, in S2, the URLLC constraints are expressed as:
$0 \le f_{mn}(t) \le F_{mn}(t)$ (25)
$0 \le f_{mc}(t) \le F_{mc}(t)$ (26)
$\omega_1 + \omega_2 = 1$ (27)
In formula (24), the constrained quantities are the transmit power of edge server m and the received power of edge server m; P represents the maximum value of the power range;
In formula (25), $f_{mn}(t)$ represents the computing resources allocated to power load terminal n by edge server m in time slot t, and $F_{mn}(t)$ represents the maximum computing resources that edge server m can allocate to power load terminal n;
In formula (26), $f_{mc}(t)$ represents the computing resources allocated to power load terminal n by cloud server c in time slot t, and $F_{mc}(t)$ represents the maximum computing resources that cloud server c can allocate to power load terminal n;
In formula (3), the quantities are: the data rate of each URLLC service on the ith communication link; D, the size of the task data uploaded by power load terminal n; $B_n$, the bandwidth of each communication link; $\lambda_m$, the computing task arrival rate; $f^{-1}(\cdot)$, defined as $[-e^{-1}, 0) \to [1, \infty)$; the minimum data rate under the delay constraint that ensures no delay outage occurs; the maximum tolerable SINR violation probability; and $T_{max}$, the threshold of the maximum tolerable delay;
In formula (4), the quantities are: the outage probability; the minimum SINR threshold on the nth communication link; the maximum tolerable probability of violating the SINR constraint; $SINR_n$, the SINR value on the nth communication link; and the probability that the received SINR is less than the minimum SINR threshold.
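The constraints above can be checked numerically. The sketch below is one possible reading: the box constraints (25) and (26) are enforced by clipping, and the reliability constraint of formula (4) is approximated by an empirical estimate of Pr{SINR < threshold} over sampled SINR values. The function names and the sample-based estimate are assumptions for illustration; the source does not describe how the probability is evaluated.

```python
import numpy as np


def clip_allocation(f, f_max):
    """Enforce 0 <= f <= F_max, as in formulas (25) and (26)."""
    return np.clip(f, 0.0, f_max)


def outage_probability(sinr_samples, sinr_min):
    """Empirical estimate of Pr{SINR < SINR_min} from sampled SINR values."""
    sinr_samples = np.asarray(sinr_samples, dtype=float)
    return float(np.mean(sinr_samples < sinr_min))


def satisfies_reliability(sinr_samples, sinr_min, eps_max):
    """True if the estimated outage probability is within the tolerable bound."""
    return outage_probability(sinr_samples, sinr_min) <= eps_max


f_mn = clip_allocation(np.array([-1.0, 3.0, 10.0]), f_max=5.0)
p_out = outage_probability([1.0, 2.0, 3.0, 4.0], sinr_min=2.5)
ok = satisfies_reliability([1.0, 2.0, 3.0, 4.0], sinr_min=2.5, eps_max=0.1)
```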
Further, in S3, the total delay of computing task execution is the sum of the transmission delay and the computation delay;
the transmission delay includes: the local transmission delay of the power load terminal, and the transmission delay between the edge server and the cloud server;
the computation delay includes: the local computation delay of the power load terminal, the computation delay of the edge server in processing the computing task of the power load terminal, and the computation delay of the cloud server in processing the computing task of the power load terminal.
Further, the total time delay of the execution of the computing task is expressed as:
where the terms represent:
the local transmission delay of the power load terminal in time slot t;
the transmission delay between the edge server and the cloud server in time slot t;
the local computation delay of the power load terminal in time slot t;
the computation delay of edge server m in processing the computing task of power load terminal n in time slot t;
the computation delay of cloud server c in processing the computing task of power load terminal n in time slot t.
Further, the local transmission delay of the power load terminal is expressed as:
where $D_m$ represents the data amount of the computing task that power load terminal n needs to process; $B_m$ represents the uplink bandwidth equally distributed to all power load terminals;
the remaining quantities are: the transmission channel gain in time slot t; $g_0$, the channel gain between edge server m and power load terminal n; $d_{mn}(t)$, the distance between edge server m and power load terminal n in time slot t; the additive white Gaussian noise power at each edge server m; and $M_n(t)$, the number of power load terminals served by edge server m in time slot t.
Further, the transmission delay between the edge server and the cloud server is expressed as:
where $D_m$ represents the data amount of the computing task that power load terminal n needs to process; $R_{mc}(t)$ represents the transmission rate between the edge server and the cloud server in time slot t; $R_{md}(t)$ represents the transmission rate inside the edge server in time slot t; $B_c$ represents the bandwidth pre-allocated to edge server m by the cloud server; $P_c$ represents the transmit power of edge server m; the noise term represents the additive white Gaussian noise power at cloud server c; and $h_{mc}(t)$ represents the transmission channel gain between the edge server and the cloud server in time slot t;
Further, the local computation delay of the power load terminal is expressed as:
where $f_n(t)$ represents the computing power of power load terminal n in time slot t, $D_m$ represents the data amount of the computing task that power load terminal n needs to process, and C represents the number of CPU cycles required to process each task;
Further, the computation delay of edge server m in processing the computing task of power load terminal n is expressed as:
where $f_{mn}(t)$ represents the computing resources allocated to power load terminal n by edge server m in time slot t, C represents the number of CPU cycles required to process each task, and $D_m$ represents the data amount of the computing task that power load terminal n needs to process.
Further, the computation delay of cloud server c in processing the computing task of power load terminal n is expressed as:
where $f_{mc}(t)$ represents the computing resources allocated to power load terminal n by cloud server c in time slot t, C represents the number of CPU cycles required to process each task, and $D_m$ represents the data amount of the computing task that power load terminal n needs to process.
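The delay terms listed above can be sketched numerically under common modelling assumptions. This example assumes a Shannon-capacity link rate, B·log2(1 + P·h/σ²), a transmission delay of data size divided by rate, and a computation delay of D·C/f where C is treated as CPU cycles per bit. These forms and all function names are assumptions for illustration, since the source formulas are not legible and C is described there only as the cycles required per task.

```python
import math


def shannon_rate(bandwidth_hz, tx_power_w, channel_gain, noise_power_w):
    """Assumed link rate in bit/s: B * log2(1 + P * h / sigma^2)."""
    return bandwidth_hz * math.log2(1.0 + tx_power_w * channel_gain / noise_power_w)


def transmission_delay(data_bits, rate_bps):
    """Time to send data_bits over a link of rate_bps."""
    return data_bits / rate_bps


def computation_delay(data_bits, cycles_per_bit, cpu_hz):
    """Assumed compute time: D * C / f, with C read as cycles per bit."""
    return data_bits * cycles_per_bit / cpu_hz


# Hypothetical numbers: 1 MHz uplink, unit SNR, 2 Mbit task, 1 GHz CPU.
rate = shannon_rate(1e6, tx_power_w=1.0, channel_gain=1.0, noise_power_w=1.0)
t_tx = transmission_delay(2e6, rate)
t_cpu = computation_delay(1e6, cycles_per_bit=100.0, cpu_hz=1e9)
total_delay = t_tx + t_cpu  # total delay = transmission delay + computation delay
```

The same two helper functions can be reused for the terminal, edge and cloud terms by substituting the corresponding bandwidth, power and allocated computing resources.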
Further, in S3, the total system energy consumption for completing all computing tasks is the sum of the local computation energy consumption of power load terminal n, the transmission energy consumption when power load terminal n offloads part of the computing task to edge server m, the computation energy consumption of edge server m in processing the computing task of power load terminal n, and the transmission energy consumption when edge server m offloads the remaining computing task to cloud server c.
Further, the total system energy consumption for completing all the computing tasks is expressed as:
where $E(t)$ represents the total system energy consumption in time slot t after the computing tasks of all power load terminals are completed; $\lambda_m$ represents the computing task arrival rate; the local term represents the local computation energy consumption of power load terminal n in time slot t; $E_{mn}(t)$ represents the transmission energy consumption when power load terminal n offloads part of the computing task to edge server m in time slot t; the edge term represents the computation energy consumption of edge server m in processing the computing task of power load terminal n in time slot t; and $E_{mc}(t)$ represents the transmission energy consumption when edge server m offloads the remaining computing task to CS c in time slot t.
Further, the local computation energy consumption of power load terminal n is expressed as:
where $\kappa \ge 0$ represents the effective switched capacitance, $f_n(t)$ represents the computing power of power load terminal n in time slot t, and the delay term represents the local computation delay of the power load terminal in time slot t.
Further, $E_{mn}(t)$ is expressed as:
where the power term represents the received power of edge server m.
Further, the computation energy consumption of edge server m in processing the computing task of power load terminal n is expressed as:
Further, $E_{mc}(t)$ is expressed as:
where the power term is the transmit power of edge server m.
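The four energy terms can be sketched as follows. The example assumes the widely used dynamic-power model for computation, κ·f³·T (power κ·f³ multiplied by the computation delay), and power multiplied by transmission delay for the communication terms. These forms, the function names and the sample values are assumptions; the source formulas are not legible, and the role of the arrival rate $\lambda_m$ in the total is not modelled here.

```python
import math


def compute_energy(kappa, cpu_hz, compute_delay_s):
    """Assumed model: E = kappa * f^3 * T (dynamic power times compute time)."""
    return kappa * cpu_hz ** 3 * compute_delay_s


def transmission_energy(power_w, tx_delay_s):
    """Assumed model: E = P * T for a transmission lasting T seconds."""
    return power_w * tx_delay_s


def total_energy(e_local, e_tx_edge, e_edge_compute, e_tx_cloud):
    """Sum of the four components named in the text."""
    return e_local + e_tx_edge + e_edge_compute + e_tx_cloud


# Hypothetical values for one terminal in one time slot.
e_local = compute_energy(kappa=1e-27, cpu_hz=1e9, compute_delay_s=0.1)
e_tx_edge = transmission_energy(power_w=0.5, tx_delay_s=2.0)
e_edge = compute_energy(kappa=1e-27, cpu_hz=2e9, compute_delay_s=0.05)
e_tx_cloud = transmission_energy(power_w=1.0, tx_delay_s=0.3)
e_total = total_energy(e_local, e_tx_edge, e_edge, e_tx_cloud)
```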
The invention discloses an access response system of an electric load terminal, which comprises:
the access request acquisition module is used for receiving an access request from a power load terminal to be accessed;
the system state collection module is used for collecting the current system state of the power load management system;
the processing module is used for inputting the current system state of the power load management system into the trained actor network to obtain the optimal action of the power load management system;
the decision module is used for carrying out a calculation resource allocation decision and a communication resource allocation decision according to the optimal action of the power load management system; providing edge or cloud computing resources for the power load terminal based on the computing resource allocation decision and the communication resource allocation decision by the edge server or the cloud server, and simultaneously providing communication resources accessed to the edge server or the cloud server for the power load terminal to complete the access of the power load terminal to be accessed;
The trained actor network is obtained by training the actor network to minimize the total system cost while satisfying the URLLC constraints; the total system cost consists of the total delay cost of computing task execution and the total system energy consumption cost of completing all computing tasks.
Further, training the actor network to minimize the total system cost while satisfying the URLLC constraints specifically comprises the following steps:
S1: initialize the actor network $\pi_\mu$ with parameter $\mu$, the critic1 network $Q_{\theta_1}$ with parameter $\theta_1$, the critic2 network $Q_{\theta_2}$ with parameter $\theta_2$, the target actor network with parameter $\tilde{\mu}$, the target critic1 network with parameter $\tilde{\theta}_1$, the target critic2 network with parameter $\tilde{\theta}_2$, the exploration noise $\varepsilon$, the total number of episodes NE, the number of time steps NS in each episode, the storage capacity RS of the experience replay pool, the experience replay period RP, the discount factor $\gamma$, and the smoothing coefficient $\tau$ of the target actor network; define the indexes of the episode and the time step as ne and t respectively, and initialize ne = 1 and t = 1;
S2: for ne ∈ {1, 2, ..., NE} and t ∈ {1, 2, ..., NS}, input the current system state $s_t$ into the actor network $\pi_\mu$, which outputs the probability distribution $\pi_\mu(s_t)$ over all possible continuous actions satisfying the URLLC constraints; add the exploration noise $\varepsilon$ to $\pi_\mu(s_t)$ to obtain the action $a_t \sim \pi_\mu(s_t) + \varepsilon$, $\varepsilon \sim N(0, \sigma)$, and execute action $a_t$;
$s_t \in S$, where S represents the state space, which includes the computing task $J_n = \{b_n, c_n, Q_n\}$, the access decision, the transmit power $P_m$ of the power load terminal, the received power of the ES, the transmit power of the ES, the total bandwidth $B_u$ of each ES, the computing power $f_n(t)$ of the power load terminal, the computing power $f_{mn}(t)$ allocated to the power load terminal by the ES, the computing power $f_{mc}(t)$ allocated to the power load terminal by the CS, and the transfer rate $R_{md}(t)$ between each ES and the SDN controller; the access decision comprises the ratio of the computing task computed locally at power load terminal n, the offload ratio of the computing task from power load terminal n to edge server m, and the ratio of the computing task sent from power load terminal n to edge server m and then on to cloud server c;
$a_t \in A$, where A represents the action space, used to make the power load terminal access, computing task response, computing resource allocation and communication resource allocation decisions; it includes the access decision of each power load terminal, the computing power $f_n(t)$ of the power load terminal, the computing power $f_{mn}(t)$ allocated to the power load terminal by the ES, the computing power $f_{mc}(t)$ allocated to the power load terminal by the CS, the transmit power $P_m$ of the power load terminal, the received power of the ES, and the transmit power of the ES;
S3: after executing action $a_t$, obtain the instant reward $R_t(s_t, a_t)$, expressed as:
$R_t(s_t, a_t) = -(\omega_1 T_{Latency} + \omega_2 E(t))$ (20)
where $\omega_1$ and $\omega_2$ are weight parameters with $\omega_1 + \omega_2 = 1$; $T_{Latency}$ is the total delay of computing task execution; $E(t)$ is the total system energy consumption for completing all computing tasks; the greater the total system cost, the smaller the instant reward;
transition to the next system state $s_{t+1}$, then store the experience sample $e_t = (s_t, a_t, R_t, s_{t+1})$ in the experience replay pool;
S4: determine whether the stated condition is satisfied; if it is, let t = t + 1 and return to S2; otherwise, execute S5;
S5: determine whether t % RP == 0 is satisfied, and execute S6 only when it is;
S6: using experience replay, draw experience samples $e_i = (s_i, a_i, R_i, s_{i+1})$ from the experience replay pool; input $s_{i+1}$ from $e_i$ into the target actor network and add the exploration noise $\varepsilon$ to the output of the target actor network to obtain a predicted action; here $s_i$ represents the system state in the drawn experience sample, $a_i$ the action, $s_{i+1}$ the next system state, and $R_i$ the instant reward;
input $s_{i+1}$ and the predicted action into the target critic1 network and the target critic2 network to obtain their respective Q values;
Defining a target Q value as:
wherein y represents a TD target;
update the critic1 network parameter $\theta_1$ and the critic2 network parameter $\theta_2$ separately by minimizing the loss function with mini-batch gradient descent, expressed as:
wherein Z represents the number of empirical samples drawn;
updating the actor network parameter μ using deterministic policy gradient method every d time steps:
wherein ,representing the critical network with the minimum Q value in the target critical 1 network and the target critical 2 network;
soft update is performed on the target critic1 network parameter, the target critic2 network parameter and the target actor network parameter:
μ ~ ←τμ+(1-τ)μ ~ (33)。
Further, in S2, the URLLC constraints are expressed as:
0 ≤ f_mn(t) ≤ F_mn(t) (25)
0 ≤ f_mc(t) ≤ F_mc(t) (26)
ω_1 + ω_2 = 1 (27)
In formula (24), the constrained quantities are the transmit power and the receive power of edge server m, and P represents the maximum value of the power range;
in formula (25), f_mn(t) represents the computing resources allocated by edge server m to power load terminal n during time slot t, and F_mn(t) represents the maximum computing resources that edge server m can allocate to power load terminal n;
in formula (26), f_mc(t) represents the computing resources allocated by cloud server c to power load terminal n during time slot t, and F_mc(t) represents the maximum computing resources that cloud server c can allocate to power load terminal n;
in formula (3), the constrained quantity is the data rate of each URLLC service on the i-th communication link; D represents the size of the task data uploaded by power load terminal n; B_n represents the bandwidth of each communication link; λ_m is the computing task arrival rate; f^{-1}(·) is defined as a mapping [-e^{-1}, 0) → [-1, ∞); the remaining symbols denote the minimum data rate under the delay constraint that ensures no delay outage occurs and the maximum tolerable SINR violation probability; and T_max represents the maximum tolerable delay threshold;
in formula (4), the symbols denote the outage probability, the minimum SINR threshold on the n-th communication link, and the maximum tolerable probability of violating the SINR constraint; SINR_n represents the SINR value on the n-th communication link, and Pr{·} denotes the probability that the received SINR is less than the minimum SINR threshold.
Further, the total time delay of the execution of the computing task is expressed as:
where the formula contains the following terms: the local transmission delay of the power load terminal in time slot t;
the transmission delay between the edge server and the cloud server in time slot t;
the local computing delay of the power load terminal in time slot t;
the computing delay of edge server m processing the computing task of power load terminal n in time slot t;
and the computing delay of cloud server c processing the computing task of power load terminal n in time slot t.
Further, the total system energy consumption for completing all the computing tasks is expressed as:
where E(t) represents the total system energy consumption in time slot t after the computing tasks of all power load terminals are completed; λ_m represents the arrival rate of computing tasks; E_mn(t) represents the transmission energy consumption when power load terminal n offloads part of its computing tasks to edge server m in time slot t; E_mc(t) represents the transmission energy consumption when edge server m forwards the remaining computing tasks to CS c in time slot t; and the remaining two terms are the local computing energy consumption of power load terminal n in time slot t and the computing energy consumption of edge server m when processing the computing tasks of power load terminal n in time slot t.
The invention discloses a management system, comprising: a power load terminal area composed of a plurality of power load terminals, an edge area composed of M edge servers and one SDN controller, and a cloud area composed of a plurality of cloud servers; each edge server has communication links with a plurality of power load terminals in the communication range, and each edge server communicates with a cloud server in the communication range through an SDN controller;
The SDN controller is used for executing the steps of a power load terminal access response method, obtaining an optimal action, and carrying out resource allocation decision according to the optimal action to complete the access of the power load terminal to be accessed;
the edge server is used for providing edge computing resources for the power load terminal based on the resource allocation decision obtained by the SDN controller and providing communication resources accessed to the edge server for the power load terminal;
the cloud server is used for providing cloud computing resources for the power load terminal based on the resource allocation decision obtained by the SDN controller and providing communication resources accessed to the cloud server for the power load terminal.
The invention discloses a device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of a power load terminal access response method when executing the computer program.
The invention discloses a storage medium storing a program of an access response method, which when executed by at least one processor, implements steps of a power load terminal access response method.
The beneficial effects are that: compared with the prior art, the invention has the following advantages:
(1) By adopting the trained actor network, the method/system solves the joint optimization of low-delay terminal access response and resource allocation in the novel cloud-edge collaborative power load management system, mitigating Q-value overestimation and improving learning efficiency;
(2) The method/system can effectively reduce the access response delay and the energy consumption of the power load terminal in the novel power load management system, balance the workload of the edge server and improve the utilization rate of computing and communication resources.
Drawings
Fig. 1 is a flow chart of a method for responding to access of a power load terminal according to the present invention;
FIG. 2 is a schematic diagram of a network model according to the present invention;
fig. 3 is a block flow diagram of a TD3 algorithm employed in the present invention.
Detailed Description
The technical scheme of the invention is further described with reference to the accompanying drawings.
Example 1:
as shown in fig. 1, this embodiment discloses a power load terminal access response method, which mainly includes the following steps:
step 1: receiving an access request from a power load terminal to be accessed, and collecting the current system state of a power load management system;
Step 2: inputting the current system state of the power load management system into a trained actor network to obtain the optimal action of the power load management system;
step 3: according to the optimal action of the power load management system, performing a calculation resource allocation decision and a communication resource allocation decision; providing edge or cloud computing resources for the power load terminal based on the computing resource allocation decision and the communication resource allocation decision by the edge server or the cloud server, and simultaneously providing communication resources accessed to the edge server or the cloud server for the power load terminal to complete the access of the power load terminal to be accessed;
The trained actor network is obtained by training the actor network to minimize the total system cost while satisfying the URLLC constraints; in this embodiment, the total system cost consists of the total execution delay cost of the computing tasks and the total system energy consumption cost of completing all computing tasks.
Prior-art schemes consider only power load terminal access optimization, or only computing and communication resource allocation optimization; they do not jointly optimize terminal access and resource allocation, and they ignore the URLLC constraints. This embodiment addresses these problems: by adopting a trained actor network, it solves the joint optimization of low-delay terminal access response and resource allocation in the novel cloud-edge collaborative power load management system, mitigating Q-value overestimation and improving learning efficiency.
Example 2:
This embodiment discloses a low-delay terminal access response method for a novel cloud-edge collaborative power load management system, with the following specific steps:
Step 1: according to the URLLC requirements of the novel power load management system, construct a cloud-edge collaborative model of the low-delay load terminal access response system and configure the system parameters, so as to support real-time access of load terminals at large scale;
the cloud-edge collaborative low-delay load terminal access response system model in this step comprises: a network model, a time slot division model, a reliability and delay model, a transmission model, a computing model and an energy consumption model;
specifically:
S100: construct the cloud-edge collaborative network model shown in Fig. 2. The network model is divided into three areas: a power load terminal area, an edge area and a cloud area. The power load terminal area comprises a plurality of power load terminals, each denoted by symbol n; the power load terminals randomly generate computation-intensive and delay-sensitive computing tasks; each power load terminal can access the ES and the CS through a wireless network; each power load terminal is equipped with a battery, which supplies it with electric energy and is charged in a wired or wireless manner. The edge area comprises M ESs and one software defined network (Software Defined Network, SDN) controller; each ES is denoted by symbol m and its computing power by f_m; the ES is responsible for providing edge computing resources to the power load terminals; the SDN controller is connected to the cloud layer through the core backbone network and is responsible for collecting environment state information and making terminal access and resource allocation decisions for each power load terminal. The cloud area comprises a plurality of CSs, each denoted by symbol c, and the computing power of each CS is denoted by f_c. The communication link between each power load terminal and the ES/CS is denoted by symbol i.
S110: set the time slot division model: the whole time axis is divided into T time slots of equal length, with t ∈ T denoting the time slot index; a quasi-static model is adopted, that is, all environment state parameters remain unchanged within one time slot and may differ between time slots.
S120: build the total execution delay model of the computing task: the access and computing task responses of the power load terminals must meet the URLLC requirements. Under the URLLC requirements, the total execution delay T_Latency of a computing task consists mainly of the transmission delay T_tr and the computing delay T_pc, and is expressed as:
T_Latency = T_tr + T_pc (1)
Construct the transmission model: in each time slot t, each power load terminal randomly generates a computing task J_n = {D_m, c_n, Q_n}, where D_m represents the size of the input data of the computing task, c_n represents the CPU cycles required to process each computing task, and Q_n represents the QoS constraints of the service. The QoS constraints comprise the maximum allowable delay and the reliability.
Because of the low-latency constraint, each packet must be transmitted successfully within a given period. Let T_max be the maximum tolerable delay threshold; the delay outage probability P_i^Latency of URLLC is:
where the bound is the maximum tolerable signal-to-interference-plus-noise ratio (Signal to Interference plus Noise Ratio, SINR) violation probability, and Pr{·} denotes the probability that the received SINR is less than the minimum SINR threshold.
To guarantee the delay outage probability constraint, the data rate of each URLLC service on the i-th communication link should satisfy:
where B_n is the bandwidth of each communication link; D is the size of the data; λ_m is the computing task arrival rate; f^{-1}(·) is defined as a mapping [-e^{-1}, 0) → [-1, ∞); and the remaining two symbols denote the maximum tolerable SINR violation probability and the minimum data rate under the delay constraint that ensures no delay outage occurs.
If the data rate is less than the minimum data rate threshold, that is, if the delay exceeds the maximum delay threshold, the current URLLC service is unsuccessful and the corresponding data transmission stops.
Further, the SINR value may be used to represent the reliability of URLLC: the SINR at the receiver should be greater than a minimum SINR threshold, otherwise the received signal cannot be demodulated successfully. The outage probability expressed in terms of SINR is therefore:
where the symbols denote the minimum SINR threshold on communication link n and the maximum tolerable probability of violating the SINR constraint.
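The reliability condition described above, namely that the probability of the received SINR falling below the minimum threshold must not exceed the tolerated violation probability, can be illustrated with an empirical estimate. This is only a minimal sketch: the function names, the sample values and the idea of estimating the probability from measured samples are assumptions for illustration and are not taken from the specification.

```python
def empirical_outage_probability(sinr_samples, sinr_min):
    """Fraction of observed SINR samples that fall below the minimum threshold."""
    if not sinr_samples:
        raise ValueError("at least one SINR sample is required")
    below = sum(1 for s in sinr_samples if s < sinr_min)
    return below / len(sinr_samples)


def meets_reliability(sinr_samples, sinr_min, max_violation_prob):
    """True when the estimated outage probability stays within the tolerated bound."""
    return empirical_outage_probability(sinr_samples, sinr_min) <= max_violation_prob


# Made-up linear SINR measurements on one communication link (hypothetical data).
samples = [12.0, 9.5, 15.2, 7.9, 11.3, 10.4, 13.8, 8.1, 14.0, 12.7]
print(empirical_outage_probability(samples, sinr_min=8.0))
print(meets_reliability(samples, sinr_min=8.0, max_violation_prob=0.05))
```

In this made-up data set one sample out of ten lies below the threshold, so a tolerated violation probability of 0.05 would not be met.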
Define the set of all power load terminals that generate a computing task within ES m at time slot t, together with the corresponding number of such terminals. Cooperative partial offloading is adopted, that is, each computing task is assumed to be divisible into several sub-tasks. First, the power load terminal determines whether it has sufficient local computing resources; if so, it processes the whole computing task locally; otherwise, it processes part of the computing task according to its own computing capacity and offloads the rest to the ES. After receiving the offloaded computing task, the ES likewise chooses, according to the URLLC requirement, either to compute locally or to forward the computing task to the CS. The access decision of power load terminal n ∈ N at ES m in time slot t is defined by three ratios: the ratio of the computing task computed locally at power load terminal n, the ratio offloaded from power load terminal n to ES m, and the ratio forwarded from ES m to CS c, which together satisfy the constraint given with the definition.
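The three-way split of a computing task can be sketched as follows. The constraint on the three ratios is not legible in this text, so the sketch assumes that each ratio lies in [0, 1] and that the three sum to 1; the function name and dictionary keys are hypothetical.

```python
def split_task(data_size, local_ratio, edge_ratio, cloud_ratio, tol=1e-9):
    """Split a task of `data_size` units among terminal, edge server and cloud server.

    Assumption (not confirmed by the source): every ratio lies in [0, 1] and the
    three ratios sum to 1, so that the whole task is processed exactly once.
    """
    ratios = (local_ratio, edge_ratio, cloud_ratio)
    if any(r < 0.0 or r > 1.0 for r in ratios):
        raise ValueError("each ratio must lie in [0, 1]")
    if abs(sum(ratios) - 1.0) > tol:
        raise ValueError("the three ratios must sum to 1")
    return {
        "local": data_size * local_ratio,
        "edge": data_size * edge_ratio,
        "cloud": data_size * cloud_ratio,
    }


# Half of the task stays on the terminal; the rest is shared by ES and CS.
print(split_task(1000.0, 0.5, 0.25, 0.25))
```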
During the computing task response, the uplink bandwidth B_m is shared equally among all power load terminals, and the transmission rate between power load terminal n and ES m is:
where the transmission channel gain is defined in terms of g_0, the channel gain between ES m and power load terminal n, and d_mn(t), the distance between ES m and power load terminal n; P_m is the transmit power of power load terminal n; the noise term is the additive white Gaussian noise power at each ES m; and M_n(t) is the number of power load terminals served by ES m during time slot t.
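The rate equation itself is not reproduced in this text. As a hedged illustration only, the sketch below assumes the common Shannon-capacity form, in which each terminal receives an equal share B_m / M_n(t) of the uplink bandwidth; the function name and all numeric values are hypothetical, and the channel gain is passed in directly rather than derived from g_0 and d_mn(t).

```python
import math


def uplink_rate(bandwidth_hz, num_terminals, tx_power_w, channel_gain, noise_power_w):
    """Assumed Shannon-type rate for one terminal with an equal bandwidth share."""
    if num_terminals <= 0 or noise_power_w <= 0:
        raise ValueError("num_terminals and noise_power_w must be positive")
    share_hz = bandwidth_hz / num_terminals           # equal split of B_m
    snr = tx_power_w * channel_gain / noise_power_w   # received signal over noise
    return share_hz * math.log2(1.0 + snr)


# 10 MHz shared by 5 terminals, with illustrative power, gain and noise values.
rate = uplink_rate(10e6, 5, tx_power_w=1.0, channel_gain=3.0, noise_power_w=1.0)
print(rate)  # bits per second under the assumed model
```

Under this assumed model, the transmission delay of a task portion is its data size divided by the returned rate.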
Let the data size of the computing task to be processed by power load terminal n be D_m; the corresponding transmission delay is:
When the ES offloads to the CS, it first accesses the SDN controller and, through it, the CS; R_md(t) is the transmission rate between ES m and the SDN controller;
meanwhile, the transmission rate between the SDN controller and the CS is R_mc(t):
where B_c is the bandwidth pre-allocated by the CS to ES m; P_c, with 0 ≤ P_c ≤ P_max, is the transmit power of ES m; P_max is the maximum transmit power of each ES m; the noise term is the additive white Gaussian noise power of the SDN controller; and h_mc(t) is the transmission channel gain between the edge server and the cloud server in time slot t.
The corresponding total transmission delay is:
Therefore, the total transmission delay is:
Constructing cloud-edge collaborative computing models, and adopting three computing models according to different scenes: a local computing model, an edge computing model, and a cloud computing model;
When the local computing model is adopted, the computing capacity of power load terminal n is defined as f_n(t) and the data size of the computing task to be processed by power load terminal n is D_m; when power load terminal n executes the task locally, the computing delay of the computing task is:
where C represents the number of CPU cycles required to process each task.
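The delay formula is missing from this text. A minimal sketch follows, assuming the usual form "required CPU cycles divided by CPU frequency" and assuming that C is counted per unit of data, so that the locally processed share of D_m needs local_ratio × D_m × C cycles. Both the form and the per-unit reading of C are assumptions, as are the function and parameter names.

```python
def local_compute_delay(data_size, cycles_per_unit, local_ratio, cpu_hz):
    """Assumed local computing delay: cycles needed for the local share / f_n(t)."""
    if cpu_hz <= 0:
        raise ValueError("cpu_hz must be positive")
    required_cycles = local_ratio * data_size * cycles_per_unit
    return required_cycles / cpu_hz


# 1e6 data units, 100 cycles per unit, half processed locally on a 1 GHz CPU.
print(local_compute_delay(1e6, 100, 0.5, 1e9))  # seconds under the assumed model
```

The same shape would apply to the edge and cloud computing delays by substituting the allocated resources f_mn(t) or f_mc(t) and the corresponding offloaded share.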
When the edge computing model is adopted and the computing resources of power load terminal n are insufficient, power load terminal n offloads part of its computing task to ES m, and the computing delay of ES m in processing the computing task of power load terminal n is:
where f_mn(t) represents the computing resources allocated by ES m to power load terminal n.
When the cloud computing model is adopted and the computing resources on the ES cannot meet the computing task requirements of the power load terminal, ES m further forwards the computing task to CS c through the SDN controller for execution, so that the abundant computing resources of the CS are fully used. The transmission delay of ES m forwarding part of the computing task to CS c in time slot t is defined as:
where f_mc(t) represents the computing resources allocated by CS c to power load terminal n.
Since the data size of the result of a computing task is usually very small, the download delay of the result is ignored. The total computing delay of the computing task in time slot t is defined as:
Assuming that computation on the ES node can proceed while the computing task is being transmitted to the CS node, the total execution delay of the computing task can be expressed as:
s130: building an energy consumption model:
When the computing task is executed locally on power load terminal n, the computing energy consumption of power load terminal n is:
where κ ≥ 0 is the effective switched capacitance.
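The energy formula is not reproduced here. The sketch below assumes the widely used dynamic-power model in which the energy per CPU cycle is κ·f², so that the computing energy is κ·f²·(number of cycles); this form, the function name and the numeric values are assumptions for illustration.

```python
def local_compute_energy(kappa, cpu_hz, required_cycles):
    """Assumed computing energy: effective switched capacitance * f^2 * cycles."""
    if kappa < 0:
        raise ValueError("kappa must be non-negative")
    return kappa * cpu_hz ** 2 * required_cycles


# Illustrative values: kappa = 1e-27, a 1 GHz CPU and 5e7 required cycles.
print(local_compute_energy(1e-27, 1e9, 5e7))  # joules under the assumed model
```

Under this model, lowering the CPU frequency reduces energy quadratically while lengthening the computing delay linearly, which is the trade-off that the weights ω_1 and ω_2 of the reward balance.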
When power load terminal n offloads part of its computing task to ES m, the transmission energy consumption E_mn(t) of power load terminal n is:
where the power term is the receive power of ES m.
The computing energy consumption of ES m in processing the computing task of power load terminal n is:
When ES m forwards the remaining computing task to CS c, the transmission energy consumption E_mc(t) is:
where the power term is the transmit power of ES m.
The cloud computing center is assumed to have unlimited energy, so the cloud computing energy consumption is neglected.
Therefore, after the computing tasks of all power load terminals are completed, the total system energy consumption is:
where λ_m represents the arrival rate of computing tasks.
Step 2: use the TD3 algorithm to make the optimal access response and resource allocation decisions for each power load terminal; the goal is to minimize the long-term total system cost (the cost of the total execution delay T_Latency of the computing tasks plus the cost of the total system energy consumption E(t)) while satisfying the URLLC constraints in a large-scale access scenario. The specific operations are as follows:
S200: formulate the power load terminal access, computing task response and resource allocation problem as a constrained Markov decision process, whose optimization objective is to maximize the long-term cumulative discounted reward of the system, with the SDN controller as the agent; the process is represented by a five-tuple <S, A, R, Pr, C>, where:
S denotes the state space: in time slot t, the SDN controller collects the system state s_t ∈ S, including the computing task J_n = {b_n, c_n, Q_n}, the access decision, the transmit power P_m of the power load terminal, the receive power of the ES, the transmit power of the ES, the total bandwidth B_u of each ES, the computing power f_n(t) of the power load terminal, the computing power f_mn(t) allocated to the power load terminal by the ES, the computing power f_mc(t) allocated to the power load terminal by the CS, and the transmission rate R_md(t) between each ES and the SDN controller.
A denotes the action space: after receiving the system state s_t, the agent selects an action a_t ∈ A to make the power load terminal access, computing task response and resource allocation decisions, including the access decision of each power load terminal, the computing power f_n(t) of the power load terminal, the computing power f_mn(t) allocated to the power load terminal by the ES, the computing power f_mc(t) allocated to the power load terminal by the CS, the transmit power P_m of the power load terminal, the receive power of the ES, and the transmit power of the ES.
R denotes the reward: after executing action a_t in the current system state s_t, the agent receives an instant reward R_t(s_t, a_t):
R_t(s_t, a_t) = -(ω_1 T_Latency + ω_2 E(t)) (20)
where ω_1 and ω_2 are weight parameters with ω_1 + ω_2 = 1. Adjusting ω_1 and ω_2 adapts the system to different network demands, achieving the effect of network slicing: if ω_1 > ω_2, the system gives priority to reducing delay; if ω_1 < ω_2, it gives priority to reducing energy consumption.
The greater the total system cost, the smaller the reward value.
Pr denotes the state transition probability: the probability that the system transitions to the next system state s_{t+1} after the agent executes action a_t in the current system state s_t.
C denotes the constraints: the action a_t selected by the agent in each time slot t must satisfy the following constraints:
0 ≤ f_mn(t) ≤ F_mn(t) (25)
0 ≤ f_mc(t) ≤ F_mc(t) (26)
ω_1 + ω_2 = 1 (27)
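The reward of equation (20) and the resource constraints (25) to (27) translate directly into code. The following is a minimal sketch; the function names are hypothetical, and the delay and energy values are placeholders rather than outputs of the delay and energy models.

```python
def instant_reward(total_delay, total_energy, w1, w2, tol=1e-9):
    """Equation (20): R = -(w1 * T_Latency + w2 * E(t)), with w1 + w2 = 1 (27)."""
    if abs(w1 + w2 - 1.0) > tol:
        raise ValueError("the weights must sum to 1")
    return -(w1 * total_delay + w2 * total_energy)


def within_compute_limits(f_mn, f_mn_max, f_mc, f_mc_max):
    """Constraints (25) and (26): allocated resources stay within [0, maximum]."""
    return 0.0 <= f_mn <= f_mn_max and 0.0 <= f_mc <= f_mc_max


# Equal weights on delay and energy (placeholder cost values).
print(instant_reward(total_delay=2.0, total_energy=4.0, w1=0.5, w2=0.5))
print(within_compute_limits(f_mn=1.5e9, f_mn_max=2e9, f_mc=3e9, f_mc_max=2.5e9))
```

A larger weighted cost yields a more negative reward, so maximizing the cumulative reward is equivalent to minimizing the long-term total system cost.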
S210: use the TD3 algorithm to make the terminal access, computing task response and resource allocation decisions for each power load terminal. As shown in Fig. 3, the specific operations of the TD3 algorithm in this embodiment are:
Initialization stage: initialize the actor network π_μ, the actor network parameter μ, the critic1 network Q_θ1, the critic1 network parameter θ_1, the critic2 network Q_θ2, the critic2 network parameter θ_2, the target actor network, the target actor network parameter μ~, the target critic1 network, the target critic1 network parameter θ_1~, the target critic2 network, the target critic2 network parameter θ_2~, the exploration noise ε, the total number of episodes NE, the number of time steps NS in each episode, the storage capacity RS of the experience replay pool, the experience replay period RP, the discount factor γ and the smoothing coefficient τ of the target networks; the episode and time step indices are ne and t respectively, initialized to ne = 1 and t = 1.
Training actor network stage:
S210_1: for ne ∈ {1, 2, ..., NE} and t ∈ {1, 2, ..., NS}, the current system state s_t is collected and input into the actor network π_μ, which outputs the probability distribution π_μ(s_t) over all possible continuous actions satisfying the constraints; exploration noise is then added to obtain an action a_t ~ π_μ(s_t) + ε, ε ~ N(0, σ), which is executed;
S210_2: after executing action a_t, the SDN controller obtains an instant reward R_t(s_t, a_t) and the system transitions to the next state s_{t+1}; the experience sample e_t = (s_t, a_t, R_t, s_{t+1}) is then stored in the experience replay pool;
S210_3: determine whether the step condition is satisfied; if it is, set t = t + 1 and return to S210_1; otherwise, execute S210_4;
S210_4: determine whether t % RP == 0 is satisfied; if so, execute S210_5; if not, execute S210_3;
S210_5: using the experience replay technique, sample data e_i = (s_i, a_i, R_i, s_{i+1}) from the experience replay pool;
input s_{i+1} from e_i into the target actor network and add noise to obtain a predicted action;
input s_{i+1} and the predicted action into the target critic1 and target critic2 networks to obtain the two Q values;
define the target Q value as:
mini-batch gradient descent is then used to minimize the loss function and update the parameters θ_1 and θ_2 of the critic1 and critic2 networks respectively, that is:
where Z represents the number of experience samples drawn;
the parameter μ of the actor network is updated using the deterministic policy gradient method every d time steps:
where the critic used in this update is the one with the smaller Q value among the target critic1 and target critic2 networks;
the target critic1 network parameters, the target critic2 network parameters and the target actor network parameters are soft-updated:
μ~ ← τμ + (1 - τ)μ~ (33)
S210_6: after the training process of the TD3 algorithm is completed, the optimal policy of the actor network is obtained, and the trained actor network is then deployed to the SDN controller.
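The building blocks of the training stage can be sketched in plain Python without a deep-learning library. This is an illustrative sketch only: the class and function names are hypothetical, the clipping range of the noisy action is an assumption standing in for the constraint handling, and the TD target uses the standard TD3 form y = R_i + γ·min(Q_1, Q_2), which is assumed here because the target equation is not reproduced in this text. Only the soft update follows a formula that is given explicitly, equation (33).

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity experience replay pool; the oldest samples are dropped first."""

    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)  # capacity plays the role of RS

    def store(self, state, action, reward, next_state):
        self._buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Draw without replacement; never request more samples than are stored.
        return random.sample(list(self._buffer), min(batch_size, len(self._buffer)))

    def __len__(self):
        return len(self._buffer)


def noisy_action(policy_output, sigma, low, high):
    """Add zero-mean Gaussian exploration noise, then clip to the allowed range."""
    return [min(high, max(low, a + random.gauss(0.0, sigma))) for a in policy_output]


def td_target(reward, q1_next, q2_next, gamma):
    """Assumed clipped double-Q target: bootstrap from the smaller critic estimate."""
    return reward + gamma * min(q1_next, q2_next)


def soft_update(target_params, online_params, tau):
    """Equation (33), element-wise: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online_params, target_params)]


pool = ReplayPool(capacity=3)
for step in range(5):
    pool.store(step, [0.1], -1.0, step + 1)
print(len(pool))  # 3: only the three most recent samples are kept
print(noisy_action([0.2, 0.9], sigma=0.1, low=0.0, high=1.0))
print(td_target(-1.0, 4.0, 5.0, gamma=0.9))
print(soft_update([0.0, 0.0], [1.0, 2.0], tau=0.5))
```

Taking the minimum of the two critic estimates is what counteracts Q-value overestimation, and a small τ makes the target networks track the online networks slowly, which stabilizes the targets.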
The execution stage:
For t ∈ {1, 2, ..., T}, when a power load terminal requests service from an ES, the SDN controller collects the current system state s_t, inputs s_t into the trained actor network, and then selects and executes an optimal action a_t according to the learned policy;
after the optimal action a_t is executed, the SDN controller obtains an instant reward R_t(s_t, a_t) and the system transitions to the next state s_{t+1}.
Example 3:
This embodiment discloses a power load terminal access response system, comprising:
the access request acquisition module, used for receiving an access request from a power load terminal to be accessed;
the system state collection module, used for collecting the current system state of the power load management system; the current system state includes: the randomly generated computing tasks of the currently accessed power load terminals, the total computing resources of the current edge servers, the total computing resources of the current cloud servers, the current channel gain and the current environmental noise;
the processing module, used for inputting the current system state of the power load management system into the trained actor network to obtain the optimal action of the power load management system;
the decision module, used for making a computing resource allocation decision and a communication resource allocation decision according to the optimal action of the power load management system; based on these decisions, the edge server or the cloud server provides edge or cloud computing resources for the power load terminal, and at the same time provides the power load terminal with communication resources for accessing the edge server or the cloud server, completing the access of the power load terminal to be accessed; the trained actor network is obtained by training the actor network to minimize the total system cost while satisfying the URLLC constraints; the total system cost consists of the total execution delay cost of the computing tasks and the total system energy consumption cost of completing all computing tasks.
Example 4:
as shown in fig. 2, the present invention discloses a management system, comprising: a power load terminal area composed of a plurality of power load terminals, an edge area composed of M edge servers and one SDN controller, and a cloud area composed of a plurality of cloud servers; each edge server has communication links with a plurality of power load terminals in the communication range, and each edge server communicates with a cloud server in the communication range through an SDN controller;
the SDN controller is used for executing the steps of an access response method of the power load terminal, obtaining an optimal action, and carrying out resource allocation decision according to the optimal action to finish the access of the power load terminal to be accessed;
the edge server is used for providing edge computing resources for the power load terminal based on the resource allocation decision obtained by the SDN controller and providing communication resources accessed to the edge server for the power load terminal;
the cloud server is used for providing cloud computing resources for the power load terminal based on the resource allocation decision obtained by the SDN controller and providing communication resources accessed to the cloud server for the power load terminal.
Example 5:
the embodiment discloses a computer device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps disclosed in any one of the embodiments.
Example 6:
the present embodiment discloses a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps disclosed in any of the embodiments described above.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer readable storage medium, which, when executed, may carry out the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.
Claims (24)
1. A power load terminal access response method, characterized in that the method comprises the following steps:
Step 1: receiving an access request from a power load terminal to be accessed, and collecting the current system state of a power load management system;
Step 2: inputting the current system state of the power load management system into a trained actor network to obtain the optimal action of the power load management system;
Step 3: making a computing resource allocation decision and a communication resource allocation decision according to the optimal action of the power load management system; based on the computing resource allocation decision and the communication resource allocation decision, the edge server or the cloud server provides edge or cloud computing resources for the power load terminal and, at the same time, provides the power load terminal with communication resources for accessing the edge server or the cloud server, thereby completing the access of the power load terminal to be accessed;
wherein the trained actor network is obtained by training the actor network to minimize the total system cost while satisfying the ultra-reliable low-latency communication (URLLC) constraints; the total system cost consists of the total delay cost of computing task execution and the total system energy consumption cost of completing all computing tasks.
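The inference path of Steps 2 and 3 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the flat action layout `ACTION_FIELDS`, the `Decisions` container, and the `actor` callable standing in for the trained actor network are not defined by the claim, and the access decision component of the action is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical flat action layout; the claim does not fix an ordering.
ACTION_FIELDS = ("f_n", "f_mn", "f_mc", "p_terminal", "p_es_rx", "p_es_tx")
COMPUTE_FIELDS = ("f_n", "f_mn", "f_mc")


@dataclass
class Decisions:
    compute: Dict[str, float]  # computing resource allocation decision
    comm: Dict[str, float]     # communication resource allocation decision


def respond_to_access_request(
    state: Sequence[float],
    actor: Callable[[Sequence[float]], Sequence[float]],
) -> Decisions:
    """Steps 2 and 3: map the system state to an action, then split it."""
    action = list(actor(state))
    if len(action) != len(ACTION_FIELDS):
        raise ValueError("actor output does not match the assumed action layout")
    named = dict(zip(ACTION_FIELDS, action))
    # Computing powers form the computing resource allocation decision.
    compute = {k: named[k] for k in COMPUTE_FIELDS}
    # The remaining entries (powers) form the communication decision.
    comm = {k: v for k, v in named.items() if k not in COMPUTE_FIELDS}
    return Decisions(compute=compute, comm=comm)
```

In a deployment along the lines of the claim, the `actor` argument would be the trained network and the two dictionaries would be forwarded to the edge or cloud server.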
2. The power load terminal access response method of claim 1, wherein training the actor network to minimize the total system cost while satisfying the URLLC constraints specifically comprises the following steps:
S1: initializing the actor network $\pi_{\mu}$ with actor network parameter $\mu$, the critic1 network $Q_{\theta_1}$ with critic1 network parameter $\theta_1$, the critic2 network $Q_{\theta_2}$ with critic2 network parameter $\theta_2$, the target actor network $\pi_{\tilde{\mu}}$ with target actor network parameter $\tilde{\mu}$, the target critic1 network $Q_{\tilde{\theta}_1}$ with target critic1 network parameter $\tilde{\theta}_1$, the target critic2 network $Q_{\tilde{\theta}_2}$ with target critic2 network parameter $\tilde{\theta}_2$, the exploration noise $\epsilon$, the total number of episodes $NE$, the number of time steps $NS$ contained in each episode, the storage capacity $RS$ of the experience replay pool, the experience replay period $RP$, the discount factor $\gamma$, and the smoothing coefficient $\tau$ of the target actor network; defining the indexes of the episode and the time step as $ne$ and $t$, respectively, and initializing $ne = 1$ and $t = 1$;
S2: for $ne \in \{1, 2, \ldots, NE\}$ and $t \in \{1, 2, \ldots, NS\}$, inputting the current system state $s_t$ into the actor network $\pi_{\mu}$, which outputs the probability distribution $\pi_{\mu}(s_t)$ over all possible continuous actions satisfying the URLLC constraints; adding exploration noise $\epsilon$ to $\pi_{\mu}(s_t)$ to obtain the action $a_t \sim \pi_{\mu}(s_t) + \epsilon$, $\epsilon \sim N(0, \sigma)$, and executing the action $a_t$;
$s_t \in S$, where $S$ represents the state space, including the computing task $J_n = \{b_n, c_n, Q_n\}$, the access decision, the transmit power $P_m$ of the power load terminal, the receive power of the edge server (ES), the transmit power of the ES, the total bandwidth $B_u$ of each ES, the computing power $f_n(t)$ of the power load terminal, the computing power $f_{mn}(t)$ allocated to the power load terminal by the ES, the computing power $f_{mc}(t)$ allocated to the power load terminal by the cloud server (CS), and the transmission rate $R_{md}(t)$ between each ES and the SDN controller; in addition, three ratios are defined, representing respectively the ratio of the computing task computed locally at power load terminal $n$, the offload ratio of the computing task from power load terminal $n$ to edge server $m$, and the ratio of the computing task passed from power load terminal $n$ to edge server $m$ and then to cloud server $c$;
the action $a_t \in A$, where $A$ represents the action space used to make the power load terminal access, computing task response, computing resource allocation, and communication resource allocation decisions, including the access decision of each power load terminal, the computing power $f_n(t)$ of the power load terminal, the computing power $f_{mn}(t)$ allocated to the power load terminal by the ES, the computing power $f_{mc}(t)$ allocated to the power load terminal by the CS, the transmit power $P_m$ of the power load terminal, the receive power of the ES, and the transmit power of the ES;
S3: after executing the action $a_t$, obtaining an instant reward $R_t(s_t, a_t)$, expressed as:
$R_t(s_t, a_t) = -(\omega_1 T_{Latency} + \omega_2 E(t))$ (20)
where $\omega_1$ and $\omega_2$ are weight parameters with $\omega_1 + \omega_2 = 1$; $T_{Latency}$ is the total delay of computing task execution; $E(t)$ is the total system energy consumption for completing all computing tasks;
transitioning to the next system state $s_{t+1}$, and then storing the experience sample $e_t = (s_t, a_t, R_t, s_{t+1})$ in the experience replay pool;
S4: judging whether the set condition is satisfied; if it is satisfied, setting $t = t + 1$ and returning to S2; otherwise, executing S5;
S5: determining whether $t \bmod RP = 0$ is satisfied, and executing S6 only when it is satisfied;
S6: sampling experience samples $e_i = (s_i, a_i, R_i, s_{i+1})$ from the experience replay pool by experience replay; inputting $s_{i+1}$ of $e_i$ into the target actor network, and adding exploration noise $\epsilon \sim N(0, \sigma)$ to the output of the target actor network to obtain a predicted action; $s_i$ represents the system state in the sampled experience sample, $a_i$ represents the action in the sampled experience sample, $s_{i+1}$ represents the next system state in the sampled experience sample, and $R_i$ represents the instant reward in the sampled experience sample;
inputting $s_{i+1}$ and the predicted action into the target critic1 network and the target critic2 network to obtain the corresponding Q values;
Defining a target Q value as:
wherein y represents a TD target;
updating the critic1 network parameter $\theta_1$ and the critic2 network parameter $\theta_2$ separately by minimizing the loss function using mini-batch gradient descent, expressed as:
wherein Z represents the number of empirical samples drawn;
updating the actor network parameter μ using deterministic policy gradient method every d time steps:
where the critic network used is the one with the minimum Q value among the target critic1 network and the target critic2 network;
performing a soft update on the target critic1 network parameter, the target critic2 network parameter, and the target actor network parameter:
$\tilde{\mu} \leftarrow \tau\mu + (1 - \tau)\tilde{\mu}$ (33).
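Three scalar pieces of the training loop in S3 and S6 can be sketched in Python. The reward follows formula (20) and the soft update follows formula (33). The target Q formula is not reproduced in this text, so `td_target` assumes the clipped double-Q form used by standard twin-critic methods, which matches the description of taking the smaller of the two target critic values. All function names are hypothetical.

```python
from typing import List, Sequence


def instant_reward(t_latency: float, energy: float,
                   w1: float = 0.5, w2: float = 0.5) -> float:
    """Formula (20): negative weighted sum of delay and energy."""
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights must satisfy w1 + w2 = 1")
    return -(w1 * t_latency + w2 * energy)


def td_target(reward: float, gamma: float,
              q1_next: float, q2_next: float) -> float:
    """Assumed clipped double-Q target: r + gamma * min(Q1', Q2')."""
    return reward + gamma * min(q1_next, q2_next)


def soft_update(target: Sequence[float], online: Sequence[float],
                tau: float) -> List[float]:
    """Formula (33) applied element-wise: target <- tau*online + (1-tau)*target."""
    if len(target) != len(online):
        raise ValueError("parameter vectors must have the same length")
    return [tau * o + (1.0 - tau) * t for o, t in zip(online, target)]
```

A larger delay or energy consumption gives a more negative reward, so maximizing the expected return corresponds to minimizing the weighted total cost.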
3. The power load terminal access response method according to claim 2, characterized in that, in S2, the URLLC constraints are expressed as:
$0 \le f_{mn}(t) \le F_{mn}(t)$ (25)
$0 \le f_{mc}(t) \le F_{mc}(t)$ (26)
$\omega_1 + \omega_2 = 1$ (27)
In formula (24), the two power terms represent the transmit power of edge server $m$ and the receive power of edge server $m$, respectively; $P$ represents the maximum value range of the power;
In formula (25), $f_{mn}(t)$ represents the computing resources allocated to power load terminal $n$ by edge server $m$ in time slot $t$, and $F_{mn}(t)$ represents the maximum computing resources that edge server $m$ can allocate to power load terminal $n$;
In formula (26), $f_{mc}(t)$ represents the computing resources allocated to power load terminal $n$ by cloud server $c$ in time slot $t$, and $F_{mc}(t)$ represents the maximum computing resources that cloud server $c$ can allocate to power load terminal $n$;
In formula (3), the rate term represents the data rate of each URLLC service on the $i$-th communication link; $D$ represents the size of the task data uploaded by power load terminal $n$; $B_n$ represents the bandwidth of each communication link; $\lambda_m$ is the computing task arrival rate; $f^{-1}(\cdot)$ is defined as $[-e^{-1}, 0) \to [-1, \infty)$; the remaining terms represent the minimum data rate that, under the delay constraint, ensures no delay outage occurs, and the maximum tolerable SINR violation probability; $T_{max}$ represents the threshold of the maximum tolerable delay;
In formula (4), the terms represent, respectively, the outage probability, the minimum SINR threshold on the $n$-th communication link, and the maximum tolerable probability of violating the SINR constraint; $SINR_n$ represents the SINR value on the $n$-th communication link, and $P_r\{\cdot\}$ represents the probability that the received SINR is less than the minimum SINR threshold.
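A feasibility check for constraints (25) to (27), together with an empirical version of the SINR outage condition described for formula (4), might look like the following Python sketch. The outage estimate from samples is an illustrative stand-in, since formula (4) itself is not reproduced in this text; all function names are hypothetical.

```python
from typing import Sequence


def allocation_is_feasible(f_mn: float, F_mn: float,
                           f_mc: float, F_mc: float,
                           w1: float, w2: float,
                           tol: float = 1e-9) -> bool:
    """Constraints (25)-(27): bounded allocations and weights summing to 1."""
    return (0.0 <= f_mn <= F_mn
            and 0.0 <= f_mc <= F_mc
            and abs(w1 + w2 - 1.0) <= tol)


def empirical_outage(sinr_samples: Sequence[float], sinr_min: float) -> float:
    """Fraction of observed SINR values below the minimum SINR threshold."""
    if not sinr_samples:
        raise ValueError("at least one SINR sample is required")
    below = sum(1 for s in sinr_samples if s < sinr_min)
    return below / len(sinr_samples)


def reliability_ok(sinr_samples: Sequence[float], sinr_min: float,
                   eps_max: float) -> bool:
    """Outage estimate must not exceed the maximum tolerable probability."""
    return empirical_outage(sinr_samples, sinr_min) <= eps_max
```

An action produced by the actor network could be rejected or projected back into the feasible set whenever either check fails; which of the two the patent intends is not stated here.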
4. The power load terminal access response method according to claim 3, characterized in that, in S3, the total delay of computing task execution is the sum of the transmission delay and the computation delay;
the transmission delay includes: the local transmission delay of the power load terminal, and the transmission delay between the edge server and the cloud server;
the computation delay includes: the local computation delay of the power load terminal, the computation delay of the edge server in processing the computing task of the power load terminal, and the computation delay of the cloud server in processing the computing task of the power load terminal.
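The decomposition above can be written as a short Python sketch. It assumes a plain unweighted sum of the five components; the formula of the dependent claim may weight the terms (for example by the offload ratios), which this text does not show. The `DelayComponents` container and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DelayComponents:
    tx_local: float       # local transmission delay of the terminal
    tx_edge_cloud: float  # transmission delay between edge and cloud server
    cmp_local: float      # local computation delay of the terminal
    cmp_edge: float       # computation delay at the edge server
    cmp_cloud: float      # computation delay at the cloud server


def total_latency(d: DelayComponents) -> float:
    """Total delay = transmission delay + computation delay."""
    transmission = d.tx_local + d.tx_edge_cloud
    computation = d.cmp_local + d.cmp_edge + d.cmp_cloud
    return transmission + computation
```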
5. The power load terminal access response method of claim 4, wherein the total delay of computing task execution is expressed as:
where the terms of the formula represent, respectively:
the local transmission delay of the power load terminal in time slot $t$;
the transmission delay between the edge server and the cloud server in time slot $t$;
the local computation delay of the power load terminal in time slot $t$;
the computation delay of edge server $m$ in processing the computing task of power load terminal $n$ in time slot $t$;
and the computation delay of cloud server $c$ in processing the computing task of power load terminal $n$ in time slot $t$.
6. The power load terminal access response method of claim 5, wherein the local transmission delay of the power load terminal is expressed as:
where $D_m$ represents the amount of computing task data that power load terminal $n$ needs to process; $B_m$ represents the uplink bandwidth equally allocated to all power load terminals;
the channel term represents the transmission channel gain in time slot $t$; $g_0$ represents the channel gain between edge server $m$ and power load terminal $n$; $d_{mn}(t)$ represents the distance between edge server $m$ and power load terminal $n$ in time slot $t$; the noise term represents the additive white Gaussian noise power at each edge server $m$; and $M_n(t)$ represents the number of power load terminals served by edge server $m$ in time slot $t$.
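Since the formula of this claim is not reproduced in the text, the following Python sketch shows one common way the listed quantities combine, and should be read as an assumption rather than the claimed expression: the bandwidth is shared equally among the served terminals, the channel gain follows a distance power law with a hypothetical exponent `path_loss_exp`, the rate follows the Shannon capacity formula, and the delay is data size over rate.

```python
import math


def uplink_rate(bandwidth_hz: float, n_terminals: int, tx_power_w: float,
                g0: float, distance_m: float, noise_w: float,
                path_loss_exp: float = 2.0) -> float:
    """Assumed Shannon rate on an equal bandwidth share (bits per second)."""
    if n_terminals < 1 or distance_m <= 0 or noise_w <= 0:
        raise ValueError("invalid link parameters")
    share = bandwidth_hz / n_terminals          # equal split of B_m
    gain = g0 / distance_m ** path_loss_exp     # assumed path-loss model
    snr = tx_power_w * gain / noise_w
    return share * math.log2(1.0 + snr)


def transmission_delay(data_bits: float, rate_bps: float) -> float:
    """Delay of sending data_bits over a link of rate_bps."""
    if rate_bps <= 0:
        raise ValueError("rate must be positive")
    return data_bits / rate_bps
```

For example, a 1 MHz band shared by two terminals at unit signal-to-noise ratio gives each terminal 0.5 Mbit/s, so a 0.5 Mbit task takes one second to upload.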
7. The power load terminal access response method of claim 5, wherein the transmission delay between the edge server and the cloud server is expressed as:
where $D_m$ represents the amount of computing task data that power load terminal $n$ needs to process; $R_{mc}(t)$ represents the transmission rate between the edge server and the cloud server in time slot $t$; $R_{md}(t)$ represents the transmission rate inside the edge server in time slot $t$; $B_c$ represents the bandwidth pre-allocated to edge server $m$ by the cloud server; $P_c$ represents the transmit power of edge server $m$; the noise term represents the additive white Gaussian noise power at cloud server $c$; and $h_{mc}(t)$ represents the transmission channel gain between the edge server and the cloud server in time slot $t$.
8. The power load terminal access response method of claim 5, wherein the local computation delay of the power load terminal is expressed as:
where $f_n(t)$ represents the computing power of power load terminal $n$ in time slot $t$, $D_m$ represents the amount of computing task data that power load terminal $n$ needs to process, and $C$ represents the number of CPU cycles required to process each task.
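A computation delay of this kind is often modeled as required CPU cycles divided by available computing power. The Python sketch below assumes that $C$ is a cycle count per unit of data, so that the workload is $D_m \cdot C$ cycles; the claimed formula is not reproduced here and may differ, for example by an offload ratio. The same function would apply to the edge and cloud cases by passing $f_{mn}(t)$ or $f_{mc}(t)$ as the computing power.

```python
def compute_delay(data_bits: float, cycles_per_bit: float,
                  cpu_hz: float) -> float:
    """Assumed model: delay = (data size * cycles per bit) / CPU frequency."""
    if cpu_hz <= 0:
        raise ValueError("computing power must be positive")
    return data_bits * cycles_per_bit / cpu_hz
```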
9. The power load terminal access response method of claim 5, wherein the computation delay of edge server $m$ in processing the computing task of power load terminal $n$ is expressed as:
where $f_{mn}(t)$ represents the computing resources allocated to power load terminal $n$ by edge server $m$ in time slot $t$; $C$ represents the number of CPU cycles required to process each task; and $D_m$ represents the amount of computing task data that power load terminal $n$ needs to process.
10. The power load terminal access response method of claim 5, wherein the computation delay of cloud server $c$ in processing the computing task of power load terminal $n$ is expressed as:
where $f_{mc}(t)$ represents the computing resources allocated to power load terminal $n$ by cloud server $c$ in time slot $t$; $C$ represents the number of CPU cycles required to process each task; and $D_m$ represents the amount of computing task data that power load terminal $n$ needs to process.
11. The power load terminal access response method according to claim 2, characterized in that, in S3, the total system energy consumption for completing all computing tasks is the sum of the local computing energy consumption of power load terminal $n$, the transmission energy consumption when power load terminal $n$ offloads part of its computing task to edge server $m$, the computing energy consumption of edge server $m$ in processing the computing task of power load terminal $n$, and the transmission energy consumption when edge server $m$ offloads the remaining computing task to cloud server $c$.
12. The power load terminal access response method of claim 11, wherein the total system energy consumption for completing all computing tasks is expressed as:
where $E(t)$ represents the total system energy consumption in time slot $t$ after the computing tasks of all power load terminals are completed; $\lambda_m$ represents the computing task arrival rate; $E_{mn}(t)$ represents the transmission energy consumption when power load terminal $n$ offloads part of its computing task to edge server $m$ in time slot $t$; $E_{mc}(t)$ represents the transmission energy consumption when edge server $m$ offloads the remaining computing task to cloud server $c$ in time slot $t$; and the two remaining terms represent, respectively, the local computing energy consumption of power load terminal $n$ in time slot $t$ and the computing energy consumption of edge server $m$ in processing the computing task of power load terminal $n$ in time slot $t$.
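The four-part sum described above can be sketched in Python as follows. The sketch simply adds the four components over all terminals; the claimed formula also involves the task arrival rate $\lambda_m$, whose exact role is not visible in this text, so any such weighting is omitted. The `TerminalEnergy` tuple and its field names are hypothetical.

```python
from typing import Iterable, NamedTuple


class TerminalEnergy(NamedTuple):
    local_compute: float     # local computing energy of terminal n
    uplink_tx: float         # E_mn(t): terminal-to-edge transmission energy
    edge_compute: float      # edge server computing energy for terminal n
    edge_to_cloud_tx: float  # E_mc(t): edge-to-cloud transmission energy


def total_energy(per_terminal: Iterable[TerminalEnergy]) -> float:
    """Unweighted sum of all four components over all terminals."""
    return float(sum(sum(e) for e in per_terminal))
```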
13. The power load terminal access response method of claim 12, wherein the local computing energy consumption of power load terminal $n$ is expressed as:
where $\kappa \ge 0$ represents the effective switched capacitance, $f_n(t)$ represents the computing power of power load terminal $n$ in time slot $t$, and the delay term represents the local computation delay of the power load terminal in time slot $t$.
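The three quantities named in this claim match a widely used dynamic-power model in which CPU power is $\kappa f^3$ and energy is power multiplied by run time. The Python sketch below assumes that model; the claimed formula itself is not reproduced in this text, and the function name is hypothetical.

```python
def local_compute_energy(kappa: float, cpu_hz: float, delay_s: float) -> float:
    """Assumed model: energy = kappa * f^3 * local computation delay."""
    if kappa < 0 or cpu_hz < 0 or delay_s < 0:
        raise ValueError("inputs must be non-negative")
    return kappa * cpu_hz ** 3 * delay_s
```

Under this model, running the same workload at a higher frequency shortens the delay linearly but raises power cubically, so energy grows with the square of the frequency.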
14. The power load terminal access response method of claim 12, wherein $E_{mn}(t)$ is expressed as:
where the power term represents the receive power of edge server $m$.
15. The power load terminal access response method of claim 12, wherein the computing energy consumption of edge server $m$ in processing the computing task of power load terminal $n$ is expressed as:
16. The power load terminal access response method of claim 12, wherein $E_{mc}(t)$ is expressed as:
where the power term is the transmit power of edge server $m$.
17. A power load terminal access response system, characterized by comprising:
an access request acquisition module, configured to receive an access request from a power load terminal to be accessed;
a system state collection module, configured to collect the current system state of the power load management system;
a processing module, configured to input the current system state of the power load management system into the trained actor network to obtain the optimal action of the power load management system;
a decision module, configured to make a computing resource allocation decision and a communication resource allocation decision according to the optimal action of the power load management system; based on the computing resource allocation decision and the communication resource allocation decision, the edge server or the cloud server provides edge or cloud computing resources for the power load terminal and, at the same time, provides the power load terminal with communication resources for accessing the edge server or the cloud server, thereby completing the access of the power load terminal to be accessed;
wherein the trained actor network is obtained by training the actor network to minimize the total system cost while satisfying the URLLC constraints; the total system cost consists of the total delay cost of computing task execution and the total system energy consumption cost of completing all computing tasks.
18. The power load terminal access response system according to claim 17, wherein training the actor network to minimize the total system cost while satisfying the URLLC constraints specifically comprises the following steps:
S1: initializing the actor network $\pi_{\mu}$ with actor network parameter $\mu$, the critic1 network $Q_{\theta_1}$ with critic1 network parameter $\theta_1$, the critic2 network $Q_{\theta_2}$ with critic2 network parameter $\theta_2$, the target actor network $\pi_{\tilde{\mu}}$ with target actor network parameter $\tilde{\mu}$, the target critic1 network $Q_{\tilde{\theta}_1}$ with target critic1 network parameter $\tilde{\theta}_1$, the target critic2 network $Q_{\tilde{\theta}_2}$ with target critic2 network parameter $\tilde{\theta}_2$, the exploration noise $\epsilon$, the total number of episodes $NE$, the number of time steps $NS$ contained in each episode, the storage capacity $RS$ of the experience replay pool, the experience replay period $RP$, the discount factor $\gamma$, and the smoothing coefficient $\tau$ of the target actor network; defining the indexes of the episode and the time step as $ne$ and $t$, respectively, and initializing $ne = 1$ and $t = 1$;
S2: for $ne \in \{1, 2, \ldots, NE\}$ and $t \in \{1, 2, \ldots, NS\}$, inputting the current system state $s_t$ into the actor network $\pi_{\mu}$, which outputs the probability distribution $\pi_{\mu}(s_t)$ over all possible continuous actions satisfying the URLLC constraints; adding exploration noise $\epsilon$ to $\pi_{\mu}(s_t)$ to obtain the action $a_t \sim \pi_{\mu}(s_t) + \epsilon$, $\epsilon \sim N(0, \sigma)$, and executing the action $a_t$;
$s_t \in S$, where $S$ represents the state space, including the computing task $J_n = \{b_n, c_n, Q_n\}$, the access decision, the transmit power $P_m$ of the power load terminal, the receive power of the edge server (ES), the transmit power of the ES, the total bandwidth $B_u$ of each ES, the computing power $f_n(t)$ of the power load terminal, the computing power $f_{mn}(t)$ allocated to the power load terminal by the ES, the computing power $f_{mc}(t)$ allocated to the power load terminal by the cloud server (CS), and the transmission rate $R_{md}(t)$ between each ES and the SDN controller; in addition, three ratios are defined, representing respectively the ratio of the computing task computed locally at power load terminal $n$, the offload ratio of the computing task from power load terminal $n$ to edge server $m$, and the ratio of the computing task passed from power load terminal $n$ to edge server $m$ and then to cloud server $c$;
the action $a_t \in A$, where $A$ represents the action space used to make the power load terminal access, computing task response, computing resource allocation, and communication resource allocation decisions, including the access decision of each power load terminal, the computing power $f_n(t)$ of the power load terminal, the computing power $f_{mn}(t)$ allocated to the power load terminal by the ES, the computing power $f_{mc}(t)$ allocated to the power load terminal by the CS, the transmit power $P_m$ of the power load terminal, the receive power of the ES, and the transmit power of the ES;
S3: after executing the action $a_t$, obtaining an instant reward $R_t(s_t, a_t)$, expressed as:
$R_t(s_t, a_t) = -(\omega_1 T_{Latency} + \omega_2 E(t))$ (20)
where $\omega_1$ and $\omega_2$ are weight parameters with $\omega_1 + \omega_2 = 1$; $T_{Latency}$ is the total delay of computing task execution; $E(t)$ is the total system energy consumption for completing all computing tasks; the greater the total system cost, the smaller the instant reward;
transitioning to the next system state $s_{t+1}$, and then storing the experience sample $e_t = (s_t, a_t, R_t, s_{t+1})$ in the experience replay pool;
S4: judging whether the set condition is satisfied; if it is satisfied, setting $t = t + 1$ and returning to S2; otherwise, executing S5;
S5: determining whether $t \bmod RP = 0$ is satisfied, and executing S6 only when it is satisfied;
S6: sampling experience samples $e_i = (s_i, a_i, R_i, s_{i+1})$ from the experience replay pool by experience replay; inputting $s_{i+1}$ of $e_i$ into the target actor network, and adding exploration noise $\epsilon \sim N(0, \sigma)$ to the output of the target actor network to obtain a predicted action; $s_i$ represents the system state in the sampled experience sample, $a_i$ represents the action in the sampled experience sample, $s_{i+1}$ represents the next system state in the sampled experience sample, and $R_i$ represents the instant reward in the sampled experience sample;
inputting $s_{i+1}$ and the predicted action into the target critic1 network and the target critic2 network to obtain the corresponding Q values;
defining a target Q value as:
where $y$ represents the TD target;
updating the critic1 network parameter $\theta_1$ and the critic2 network parameter $\theta_2$ separately by minimizing the loss function using mini-batch gradient descent, expressed as:
where $Z$ represents the number of experience samples drawn;
updating the actor network parameter $\mu$ using the deterministic policy gradient method every $d$ time steps:
where the critic network used is the one with the minimum Q value among the target critic1 network and the target critic2 network;
performing a soft update on the target critic1 network parameter, the target critic2 network parameter, and the target actor network parameter:
$\tilde{\mu} \leftarrow \tau\mu + (1 - \tau)\tilde{\mu}$ (33).
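The experience replay pool of capacity $RS$ and the replay period test of S5 can be sketched in Python with the standard library. The class name `ReplayPool`, the uniform sampling strategy, and the choice to evict the oldest sample when the pool is full are assumptions; the claim only states that samples are stored and later drawn by experience replay.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# (state, action, reward, next_state), mirroring e_t = (s_t, a_t, R_t, s_{t+1})
Experience = Tuple[tuple, tuple, float, tuple]


class ReplayPool:
    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity RS must be at least 1")
        # A bounded deque drops the oldest sample once capacity is reached.
        self._buf: Deque[Experience] = deque(maxlen=capacity)

    def store(self, sample: Experience) -> None:
        self._buf.append(sample)

    def sample(self, z: int) -> List[Experience]:
        """Draw up to z distinct samples uniformly at random."""
        z = min(z, len(self._buf))
        return random.sample(list(self._buf), z)

    def __len__(self) -> int:
        return len(self._buf)


def should_update(t: int, rp: int) -> bool:
    """S5: run the network update only when t is a multiple of RP."""
    return t % rp == 0
```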
19. The power load terminal access response system according to claim 17, wherein, in S2, the URLLC constraints are expressed as:
$0 \le f_{mn}(t) \le F_{mn}(t)$ (25)
$0 \le f_{mc}(t) \le F_{mc}(t)$ (26)
$\omega_1 + \omega_2 = 1$ (27)
In formula (24), the two power terms represent the transmit power of edge server $m$ and the receive power of edge server $m$, respectively; $P$ represents the maximum value range of the power;
In formula (25), $f_{mn}(t)$ represents the computing resources allocated to power load terminal $n$ by edge server $m$ in time slot $t$, and $F_{mn}(t)$ represents the maximum computing resources that edge server $m$ can allocate to power load terminal $n$;
In formula (26), $f_{mc}(t)$ represents the computing resources allocated to power load terminal $n$ by cloud server $c$ in time slot $t$, and $F_{mc}(t)$ represents the maximum computing resources that cloud server $c$ can allocate to power load terminal $n$;
In formula (3), the rate term represents the data rate of each URLLC service on the $i$-th communication link; $D$ represents the size of the task data uploaded by power load terminal $n$; $B_n$ represents the bandwidth of each communication link; $\lambda_m$ is the computing task arrival rate; $f^{-1}(\cdot)$ is defined as $[-e^{-1}, 0) \to [-1, \infty)$; the remaining terms represent the minimum data rate that, under the delay constraint, ensures no delay outage occurs, and the maximum tolerable SINR violation probability; $T_{max}$ represents the threshold of the maximum tolerable delay;
In formula (4), the terms represent, respectively, the outage probability, the minimum SINR threshold on the $n$-th communication link, and the maximum tolerable probability of violating the SINR constraint; $SINR_n$ represents the SINR value on the $n$-th communication link, and $P_r\{\cdot\}$ represents the probability that the received SINR is less than the minimum SINR threshold.
20. The power load terminal access response system according to claim 19, wherein the total delay of computing task execution is expressed as:
where the terms of the formula represent, respectively:
the local transmission delay of the power load terminal in time slot $t$;
the transmission delay between the edge server and the cloud server in time slot $t$;
the local computation delay of the power load terminal in time slot $t$;
the computation delay of edge server $m$ in processing the computing task of power load terminal $n$ in time slot $t$;
and the computation delay of cloud server $c$ in processing the computing task of power load terminal $n$ in time slot $t$.
21. The power load terminal access response system according to claim 19, wherein the total system energy consumption for completing all computing tasks is expressed as:
where $E(t)$ represents the total system energy consumption in time slot $t$ after the computing tasks of all power load terminals are completed; $\lambda_m$ represents the computing task arrival rate; $E_{mn}(t)$ represents the transmission energy consumption when power load terminal $n$ offloads part of its computing task to edge server $m$ in time slot $t$; $E_{mc}(t)$ represents the transmission energy consumption when edge server $m$ offloads the remaining computing task to cloud server $c$ in time slot $t$; and the two remaining terms represent, respectively, the local computing energy consumption of power load terminal $n$ in time slot $t$ and the computing energy consumption of edge server $m$ in processing the computing task of power load terminal $n$ in time slot $t$.
22. A management system characterized by: comprising the following steps: a power load terminal area composed of a plurality of power load terminals, an edge area composed of M edge servers and one SDN controller, and a cloud area composed of a plurality of cloud servers; each edge server has communication links with a plurality of power load terminals in the communication range, and each edge server communicates with a cloud server in the communication range through an SDN controller;
The SDN controller is configured to perform the steps of the power load terminal access response method according to any one of claims 1 to 16, obtain an optimal action, and perform a resource allocation decision according to the optimal action, so as to complete access of a power load terminal to be accessed;
the edge server is used for providing edge computing resources for the power load terminal based on the resource allocation decision obtained by the SDN controller and providing communication resources accessed to the edge server for the power load terminal;
the cloud server is used for providing cloud computing resources for the power load terminal based on the resource allocation decision obtained by the SDN controller and providing communication resources accessed to the cloud server for the power load terminal.
23. An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of a power load terminal access response method according to any one of claims 1 to 16 when the computer program is executed.
24. A storage medium storing a program of an access response method which, when executed by at least one processor, implements the steps of the power load terminal access response method of any one of claims 1 to 16.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310790860.9A CN116828542A (en) | 2023-06-30 | 2023-06-30 | Power load terminal access response method, system, management system, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116828542A (en) | 2023-09-29 |
Family
ID=88116235
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310790860.9A Pending CN116828542A (en) | 2023-06-30 | 2023-06-30 | Power load terminal access response method, system, management system, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116828542A (en) |
Publication | Publication Date | Title |
---|---|---|
CN111556461B (en) | Vehicle-mounted edge network task distribution and unloading method based on deep Q network | |
CN110809306B (en) | Terminal access selection method based on deep reinforcement learning | |
CN111628855B (en) | Industrial 5G dynamic multi-priority multi-access method based on deep reinforcement learning | |
CN110971706A (en) | Approximate optimization and reinforcement learning-based task unloading method in MEC | |
CN111711666B (en) | Internet of vehicles cloud computing resource optimization method based on reinforcement learning | |
CN109151864A (en) | A kind of migration decision and resource optimal distribution method towards mobile edge calculations super-intensive network | |
CN114757352B (en) | Intelligent agent training method, cross-domain heterogeneous environment task scheduling method and related device | |
CN113687875B (en) | Method and device for unloading vehicle tasks in Internet of vehicles | |
CN112533237B (en) | Network capacity optimization method for supporting large-scale equipment communication in industrial internet | |
CN115633380B (en) | Multi-edge service cache scheduling method and system considering dynamic topology | |
Yang et al. | Deep reinforcement learning based wireless network optimization: A comparative study | |
CN114938381B (en) | D2D-MEC unloading method based on deep reinforcement learning | |
CN114567895A (en) | Method for realizing intelligent cooperation strategy of MEC server cluster | |
CN113573363A (en) | MEC calculation unloading and resource allocation method based on deep reinforcement learning | |
CN117580105B (en) | Unmanned aerial vehicle task unloading optimization method for power grid inspection | |
CN117858109A (en) | User association, task unloading and resource allocation optimization method based on digital twin | |
Tan et al. | Toward a task offloading framework based on cyber digital twins in mobile edge computing | |
CN116828542A (en) | Power load terminal access response method, system, management system, equipment and storage medium | |
CN116109058A (en) | Substation inspection management method and device based on deep reinforcement learning | |
CN114173421B (en) | LoRa logic channel based on deep reinforcement learning and power distribution method | |
CN115580900A (en) | Unmanned aerial vehicle assisted cooperative task unloading method based on deep reinforcement learning | |
CN113157344B (en) | DRL-based energy consumption perception task unloading method in mobile edge computing environment | |
CN117544680B (en) | Caching method, system, equipment and medium based on electric power Internet of things | |
CN117676608A (en) | Resource allocation method and system for heterogeneous equipment access pipe gallery task and electronic equipment | |
CN117135692A (en) | Collaborative task unloading and service caching method based on graph attention multi-agent reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||