CN114626306B - Method and system for guaranteeing freshness of regulation and control information of park distributed energy

Publication number: CN114626306B (granted from application CN202210287027.8A; earlier publication CN114626306A)
Authority: CN (China)
Legal status: Active (granted)
Original language: Chinese (zh)
Inventors: Liao Haijun (廖海君), Zhou Zhenyu (周振宇), Wang Yaqian (王雅倩), Lu Wenbing (卢文冰), Yang Yang (杨阳)
Applicants/Assignees: Beijing Kuaidian Technology Co., Ltd.; North China Electric Power University

Classifications

    • G06F30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06N3/045 - Neural networks; combinations of networks
    • G06N3/08 - Neural-network learning methods
    • G06Q10/0631 - Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q50/06 - ICT specially adapted for the energy or water supply sector
    • G06F2111/04 - Constraint-based CAD
    • G06F2111/06 - Multi-objective optimisation, e.g. Pareto optimisation
    • G06F2113/04 - Power grid distribution networks


Abstract

The invention provides a method and a system for guaranteeing the freshness of regulation and control information of park distributed energy. The system comprises four layers. The data layer deploys Internet of Things terminals on electrical equipment and provides sample data and local models for training the park distributed energy regulation decision model. The network layer comprises multiple communication media and provides channels for interaction between the data layer and the control layer. The control layer reduces the age of the regulation information, and thereby improves its freshness, by adjusting channel-allocation and batch-size decisions. The service layer comprises the regulation services. The method comprises: training the park distributed energy regulation decision model; modeling the information-freshness guarantee problem faced by this training; and designing IDEAL, an information-freshness-aware algorithm for collaborative optimization of communication and computing resources in the simplified power Internet of Things. The method reduces waiting delay and guarantees information freshness while reducing the global loss.

Description

Method and system for guaranteeing freshness of regulation and control information of park distributed energy
Technical Field
The invention provides a method and a system for guaranteeing the freshness of regulation and control information of park distributed energy, and belongs to the technical field of power systems.
Background
With the rapid rollout of whole-county photovoltaics and the construction of the new-type power system, distributed energy resources are growing explosively. However, distributed resources such as photovoltaics are intermittent, random, and volatile, and grid connection of a high proportion of them strongly affects the power-flow distribution, power quality, network losses, and regulation capability of the power system. Distributed energy resources therefore need to be dynamically regulated according to load, so as to improve the stability of the new-type power system, balance active/reactive power, better absorb distributed generation such as photovoltaics, and avoid curtailment caused by absorption difficulties. Regulating park distributed energy requires building and training, from massive information, a model linking load demand, photovoltaic output, meteorological information, and the regulation strategy. The age of information is an effective index of information freshness: it is the delay from the moment information is generated until it is used in regulation-model training. The freshness of the regulation information strongly affects the accuracy of model training. When the information age is large, freshness and timeliness are poor, so the loss function of the trained model is large, i.e., the model output deviates substantially from the true output, and the reliability, economy, and accuracy of distributed energy regulation decrease.
The simplified power Internet of Things (IoT) offers control-data decoupling, multi-mode communication, cross-domain resource cooperation, and other advantages, providing strong communication-network support for collecting and transmitting the data required to train distributed energy regulation models. However, a simplified power IoT oriented to distributed energy regulation still faces the following technical challenges.
First, the coupling of model training and data transmission means that large volumes of raw data are uploaded to a central training node during model training, causing network congestion, wasted communication resources, and leakage of local data privacy.
Second, the optimization of cross-domain resources such as communication, computing, and storage adapts poorly to model training, so the model loss function stays large and the accuracy and reliability of distributed energy regulation decrease. Moreover, cross-domain resource co-optimization involves a high-dimensional optimization space for which an accurate probabilistic model and a closed-form solution are difficult to obtain.
Third, multi-mode heterogeneous networks such as power line communication (PLC), WLAN, and 5G coexist at the regulation site, and differences in terminal computing resources and multi-mode channel quality increase the information age, making the long-term constraint on the freshness of distributed energy regulation information difficult to guarantee.
Therefore, a method and a system for guaranteeing the freshness of park distributed energy regulation information are urgently needed: ones that minimize the loss function of the distributed energy regulation model under a long-term information-freshness constraint, solve the poor adaptability between cross-domain resource optimization and model training in the park and the difficulty of guaranteeing information freshness, and thereby ensure reliable and economical regulation of distributed energy.
Disclosure of Invention
Aiming at the communication-resource waste, network congestion, and data-privacy leakage caused by large-scale data interaction in park distributed energy regulation, the invention establishes a semi-distributed regulation-model training architecture based on federated learning. Local model training at the data layer and global model training at the control layer decouple decision optimization from raw-data transmission, avoiding the communication-resource waste and network congestion caused by large-scale data interaction.
To address the poor adaptability between the optimization of cross-domain resources such as communication, computing, and storage and the minimization of the loss function of the distributed energy regulation decision model, the method decouples the long-term loss-minimization problem across iterations using telescoping and Lyapunov optimization, converting it into short-term Markov decision process (MDP) optimization problems. A deep Q network (DQN) improves the fitting accuracy of state-action values in the high-dimensional optimization space; by learning a joint channel-allocation and batch-size optimization strategy, communication and computing resources are allocated cooperatively, the global model is trained with more samples, the loss function of the distributed energy regulation decision model is minimized, and the accuracy and reliability of distributed energy regulation are guaranteed. In particular, the controller resolves multi-mode channel-assignment conflicts by comparing Q values and assigns each channel to the terminal that obtains the largest state-action value.
To address the coupling of resource-allocation strategies across time slots and the long-term guarantee of regulation-information freshness, the method perceives the freshness of the regulation information: the evolution of a deficit virtual queue senses the deviation of each slot's information freshness from the specified constraint, and the multi-mode channel-allocation and batch-size strategy is adjusted dynamically to reduce the information age and guarantee information freshness in the long term.
The technical scheme is as follows:
the garden distributed energy regulation and control information freshness guarantee system comprises a data layer, a network layer, a control layer and a service layer from bottom to top; the data layer provides sample data and a local model for garden distributed energy regulation and control decision model training by deploying an internet of things terminal on the electrical equipment;
the network layer comprises a plurality of communication media and provides a channel for interaction of the data layer and the control layer;
the control layer is used for reducing the age of the regulation and control information by adjusting channel allocation and batch scale decision, improving the freshness of the regulation and control information and ensuring the timeliness of a local terminal model received by the controller;
and the business layer comprises regulation business.
The regulation and control business comprises energy storage regulation and control, distributed energy output prediction, flexible load regulation and control and distributed photovoltaic regulation and control.
The method for guaranteeing the freshness of park distributed energy regulation information comprises the following steps:
S1, training the park distributed energy regulation decision model;
S2, modeling the information-freshness guarantee problem faced by the training of the distributed energy regulation decision model;
S3, designing IDEAL, an information-freshness-aware algorithm for collaborative optimization of communication and computing resources in the simplified power IoT.
The information-freshness guarantee problem is modeled by optimizing batch size and multi-mode channel selection during the training of the distributed energy regulation decision model, so that the loss function of the regulation decision model is minimized while information freshness is guaranteed. The IDEAL algorithm runs in the controller at the control layer of the park distributed energy regulation-information freshness guarantee system. By executing the algorithm during the training of the regulation decision model, the controller dynamically optimizes batch size and multi-mode channel selection and guarantees the freshness of the regulation information in the long term.
Specifically, the method comprises the following steps:
s1. Park distributed energy regulation and control decision model training
And iteratively training the garden distributed energy regulation and control decision model by adopting a federated learning architecture, assuming that T iterations are required, and the set is represented as T = {1, …, T, …, T }. Each iteration comprises four steps, which are specifically described as follows:
1) Issuing a global model: and the controller issues the global model to the terminal through a multi-mode communication network fusing AC/DC PLC, WLAN and 5G.
2) Local model training: each terminal performs local model training based on the local data set.
3) Uploading a local model: and each terminal uploads the trained local model to the controller through the multi-mode communication network.
4) And (3) global model training: after receiving the local models uploaded by all the terminals, the controller trains a global model based on weighted aggregation, and supports accurate distributed energy regulation and control optimization.
Specifically, local model training:
Assume there are N Internet of Things terminals, with the set denoted as N = {1, ..., n, ..., N}. In the t-th iteration, terminal n updates its local model with the global model after the (t-1)-th iteration, i.e., $\omega_{n,t-1} = \omega_{t-1}$. Terminal n then trains the local model on its local data set $D_n$. Define the number of samples terminal n uses for local training in the t-th iteration as the batch size $\beta_{n,t}$; a loss function quantifies the deviation between the model's actual output and the target output. The local loss function of terminal n in the t-th iteration is defined as the average loss over the local samples:

$$F_n(\omega_{n,t-1}, \beta_{n,t}) = \frac{1}{\beta_{n,t}} \sum_{m=1}^{\beta_{n,t}} f(\omega_{n,t-1}, x_{n,m}) \qquad (1)$$

where the sample loss function $f(\omega_{n,t-1}, x_{n,m})$ quantifies the performance gap between the output of the local model $\omega_{n,t-1}$ on the m-th sample of $D_n$ and the optimal output. $F_n(\omega_{n,t-1}, \beta_{n,t})$ reflects the performance of the local model and is used for the local model update. Based on gradient descent, the local model of terminal n is updated as

$$\omega_{n,t} = \omega_{n,t-1} - \gamma \nabla F_n(\omega_{n,t-1}, \beta_{n,t}) \qquad (2)$$

where $\gamma > 0$ is the learning step size and $\nabla F_n(\omega_{n,t-1}, \beta_{n,t})$ is the gradient of the loss function with respect to the local model $\omega_{n,t-1}$.
Define the computing resource available to terminal n in the t-th iteration as $f_{n,t}$. The delay and energy consumption of local model training are then

$$\tau^{\mathrm{cmp}}_{n,t} = \frac{\beta_{n,t}\,\xi_n}{f_{n,t}} \qquad (3)$$

$$E^{\mathrm{cmp}}_{n,t} = e_n\,\beta_{n,t}\,\xi_n\,f_{n,t}^{2} \qquad (4)$$

where $e_n$ is the energy-consumption coefficient (W·s³/cycle³) and $\xi_n$ is the number of CPU cycles required to train a single sample (cycles/sample).
Uploading the local model:
Assume there are J multi-mode channels, comprising $J_1$ 5G channels, $J_2$ WLAN channels, and $J_3$ PLC channels, i.e., $J = J_1 + J_2 + J_3$. The channel set is denoted $\theta = \{1, \ldots, J_1, \ldots, J_1+J_2, \ldots, J\}$, where $j = 1, \ldots, J_1$ are 5G channels, $j = J_1+1, \ldots, J_1+J_2$ are WLAN channels, and $j = J_1+J_2+1, \ldots, J$ are PLC channels. Define the channel-allocation variable $\alpha_{n,j,t} \in \{0,1\}$, where $\alpha_{n,j,t} = 1$ means that in the t-th iteration the controller allocates channel j to terminal n for uploading its local model, and otherwise $\alpha_{n,j,t} = 0$. In the t-th iteration, the transmission rate at which terminal n uploads its model over channel j is

$$R_{n,j,t} = B_{n,j} \log_2\!\left(1 + \frac{P_{n,j,t}\,g_{n,j,t}}{I_{n,j,t} + \sigma^2}\right) \qquad (5)$$

where $B_{n,j}$ is the channel bandwidth, $g_{n,j,t}$ is the channel gain, $P_{n,j,t}$ is the uplink transmission power, $I_{n,j,t}$ is the electromagnetic-interference power from operating electrical equipment, and $\sigma^2$ is the noise power.
Define $|\omega_{n,t}|$ as the size (bits) of the local model $\omega_{n,t}$. The delay and energy consumption for terminal n to upload its local model are

$$\tau^{\mathrm{com}}_{n,t} = \frac{|\omega_{n,t}|}{\sum_{j\in\theta} \alpha_{n,j,t}\,R_{n,j,t}} \qquad (6)$$

$$E^{\mathrm{com}}_{n,t} = \sum_{j\in\theta} \alpha_{n,j,t}\,P_{n,j,t}\,\tau^{\mathrm{com}}_{n,t} \qquad (7)$$

The total energy consumption of terminal n in the t-th iteration is the sum of the local-training and upload energies:

$$E_{n,t} = E^{\mathrm{cmp}}_{n,t} + E^{\mathrm{com}}_{n,t} \qquad (8)$$

In the t-th iteration, the delay with which the controller receives the local model of terminal n is

$$\tau_{n,t} = \tau^{\mathrm{cmp}}_{n,t} + \tau^{\mathrm{com}}_{n,t} \qquad (9)$$
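A minimal sketch of the uplink model in the reconstructed Eqs. (5)-(9), assuming a single allocated channel; the symbol-to-argument mapping and the numeric values are illustrative assumptions.

import math

def upload_rate(B_hz: float, P_w: float, gain: float, I_w: float, noise_w: float) -> float:
    """Transmission rate of Eq. (5): R = B * log2(1 + P*g / (I + sigma^2))."""
    return B_hz * math.log2(1.0 + P_w * gain / (I_w + noise_w))

def upload_cost(model_bits: float, rate_bps: float, P_w: float):
    """Upload delay (Eq. 6) and upload energy (Eq. 7) of the local model."""
    tau_com = model_bits / rate_bps
    return tau_com, P_w * tau_com

rate = upload_rate(B_hz=5e6, P_w=0.2, gain=1e-7, I_w=1e-10, noise_w=1e-10)
tau_com, E_com = upload_cost(model_bits=1e6, rate_bps=rate, P_w=0.2)
tau_total = 0.064 + tau_com   # Eq. (9): tau_n,t = tau_cmp + tau_com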
Global model training:
After the controller receives the local models of all N terminals, it trains the global model by weighted aggregation of the local models:

$$\omega_t = \sum_{n=1}^{N} \rho_{n,t}\,\omega_{n,t} \qquad (10)$$

where $\rho_{n,t} = \beta_{n,t} / \sum_{n'=1}^{N} \beta_{n',t}$ is the weight of terminal n's local model, defined as the ratio of its batch size to the sum of the batch sizes of all N terminals.
The gap between the actual output of the global model and the target output is quantified by the global loss function, defined as the weighted sum of the N terminals' local loss functions:

$$F(\omega_t) = \sum_{n=1}^{N} \rho_{n,t}\,F_n(\omega_{n,t-1}, \beta_{n,t}) \qquad (11)$$
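The batch-size-weighted aggregation of Eqs. (10)-(11) can be sketched as follows; representing model parameters as NumPy arrays is an illustrative assumption.

import numpy as np

def aggregate(local_models, batch_sizes):
    """Global model of Eq. (10): w_t = sum_n rho_n * w_n,
    with weights rho_n = beta_n / sum_n' beta_n'."""
    total = float(sum(batch_sizes))
    return sum((b / total) * w for w, b in zip(local_models, batch_sizes))

def global_loss(local_losses, batch_sizes):
    """Global loss of Eq. (11): the same batch-size-weighted sum over F_n."""
    total = float(sum(batch_sizes))
    return sum((b / total) * F for F, b in zip(local_losses, batch_sizes))

w_t = aggregate([np.ones(4), np.zeros(4)], [30, 10])   # -> array of 0.75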
Local training, local model uploading, and global model aggregation must satisfy the regulation-information freshness constraint.
Regulation-information freshness is a measure of information timeliness and strongly affects the accuracy and real-time performance of distributed energy regulation. The fresher the information used during regulation-model training, the smaller the performance gap between the generated regulation strategy and the optimal strategy. Because the controller can start global model training only after it has received all terminals' local models, information freshness is closely tied to the delay experienced by each local model before the controller receives it.
The regulation-information freshness constraint model is described as follows:
Define the age of information (AoI) of the local model obtained by terminal n in the t-th training iteration as the delay from the moment the model leaves terminal n until global model training begins. It consists of the transmission delay $\tau^{\mathrm{com}}_{n,t}$ and the waiting delay $\tau^{\mathrm{wait}}_{n,t}$:

$$A_{n,t} = \tau^{\mathrm{com}}_{n,t} + \tau^{\mathrm{wait}}_{n,t} \qquad (12)$$

The waiting delay of terminal n's local model depends on the delay with which the controller receives the last terminal's local model:

$$\tau^{\mathrm{wait}}_{n,t} = \max_{n' \in N}\{\tau_{n',t}\} - \tau_{n,t} \qquad (13)$$

Define the regulation-information freshness of terminal n in the t-th iteration as the reciprocal of its information age:

$$h_{n,t} = \frac{1}{A_{n,t}} \qquad (14)$$

Constraining the model with the largest information age guarantees the freshness of the regulation information. Define the set of all terminals' regulation-information freshness as $h_t = \{h_{1,t}, \ldots, h_{n,t}, \ldots, h_{N,t}\}$. The long-term freshness constraint over T iterations is then constructed as

$$\frac{1}{T} \sum_{t=1}^{T} \min\{h_t\} \geq h_{\min} \qquad (15)$$

where $h_{\min}$ is the information-freshness constraint threshold.
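The following sketch evaluates the AoI chain of the reconstructed Eqs. (12)-(15) for one iteration; the list-based interface is an illustrative assumption.

def freshness_per_terminal(tau_cmp, tau_com):
    """Per-terminal freshness h_n = 1 / (tau_com_n + tau_wait_n), Eqs. (12)-(14).

    tau_cmp, tau_com : per-terminal training and upload delays (same order).
    """
    arrivals = [c + u for c, u in zip(tau_cmp, tau_com)]   # Eq. (9) per terminal
    last = max(arrivals)               # aggregation starts at the last arrival
    return [1.0 / (u + (last - a))     # tau_wait = last arrival - own arrival, Eq. (13)
            for u, a in zip(tau_com, arrivals)]

def freshness_constraint_met(h_history, h_min):
    """Long-term constraint of Eq. (15): mean over iterations of min_n h_n >= h_min."""
    worst = [min(h_t) for h_t in h_history]
    return sum(worst) / len(worst) >= h_min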
S2. Modeling the information-freshness guarantee problem for multi-mode channel-allocation and batch-size optimization
The invention addresses the minimization of the loss function of the distributed energy regulation decision model. The optimization objective is to minimize the global loss function $F(\omega_T)$ after T iterations through collaborative optimization of communication and computing resources in the simplified power IoT, while guaranteeing long-term constraints such as regulation-information freshness. Define the set of multi-mode channel-allocation optimization variables as $\alpha_{n,t} = \{\alpha_{n,1,t}, \ldots, \alpha_{n,j,t}, \ldots, \alpha_{n,J,t}\}$ and the batch-size optimization variable as $\beta_{n,t} \in \{1, 2, \ldots, |D_n|\}$. The optimization problem is constructed as

$$\mathrm{P1}: \min_{\{\alpha_{n,t}\},\{\beta_{n,t}\}} F(\omega_T)$$
$$\text{s.t. } C_1: \sum_{n=1}^{N} \alpha_{n,j,t} \leq 1,\ \forall j \in \theta; \quad C_2: \sum_{j\in\theta} \alpha_{n,j,t} \leq 1,\ \forall n \in N;$$
$$C_3: 1 \leq \beta_{n,t} \leq |D_n|; \quad C_4: \sum_{t=1}^{T} E_{n,t} \leq E_{n,\max};$$
$$C_5: \frac{1}{T}\sum_{t=1}^{T} \min\{h_t\} \geq h_{\min}; \quad C_6: P_{n,j,t} \in \{P_{\mathrm{PLC}}, P_{\mathrm{WLAN}}, P_{\mathrm{5G}}\} \qquad (16)$$

where $C_1$ states that each channel can be allocated to at most one terminal; $C_2$ states that each terminal can be allocated at most one channel; $C_3$ is the batch-size constraint of terminal-local model training, with $|D_n|$ the size of terminal n's local data set $D_n$; $C_4$ is the long-term energy-consumption constraint of terminal n, with $E_{n,\max}$ its long-term energy budget; $C_5$ is the long-term information-freshness constraint over T iterations; and $C_6$ is the terminal transmission-power constraint, with $P_{\mathrm{PLC}}$, $P_{\mathrm{WLAN}}$, and $P_{\mathrm{5G}}$ the PLC, WLAN, and 5G channel transmission powers, respectively.
The optimization strategy of each iteration is coupled both with the global loss function $F(\omega_T)$ after T iterations and with long-term constraints such as information freshness, so P1 is hard to solve directly and must be decoupled across iterations.
For the first coupling, applying telescoping, $F(\omega_T)$ is decoupled as

$$F(\omega_T) = F(\omega_{t-1}) + \sum_{t'=t}^{T} \left[F(\omega_{t'}) - F(\omega_{t'-1})\right] \qquad (17)$$

where $F(\omega_{t-1})$ is the global loss after the (t-1)-th iteration and is a known parameter when optimizing the t-th iteration. From the above, within the t-th iteration $F(\omega_T)$ depends only on the global loss $F(\omega_t)$ of that iteration, i.e., the minimization of $F(\omega_T)$ is converted into the per-iteration optimization of $F(\omega_t)$.
For the second coupling, based on virtual-queue theory, a terminal energy-consumption deficit virtual queue $G_n(t)$ and a regulation-information freshness deficit virtual queue $H(t)$ are constructed for constraints $C_4$ and $C_5$, respectively. Their queue backlogs are updated as

$$G_n(t+1) = \max\left\{G_n(t) + E_{n,t} - \frac{E_{n,\max}}{T},\ 0\right\} \qquad (18)$$

$$H(t+1) = \max\left\{H(t) - \min\{h_t\} + h_{\min},\ 0\right\} \qquad (19)$$

where $G_n(t)$ represents the deviation between terminal n's energy consumption after the t-th iteration and the per-iteration energy budget $E_{n,\max}/T$, and $H(t)$ represents the deviation between the regulation-information freshness after the t-th iteration and the freshness constraint $h_{\min}$.
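A direct transcription of the deficit-queue updates (18)-(19); the variable names mirror the reconstructed notation and are assumptions.

def update_energy_deficit(G_nt: float, E_nt: float, E_max: float, T: int) -> float:
    """Eq. (18): G_n(t+1) = max(G_n(t) + E_n,t - E_n,max / T, 0)."""
    return max(G_nt + E_nt - E_max / T, 0.0)

def update_freshness_deficit(H_t: float, h_t: list, h_min: float) -> float:
    """Eq. (19): H(t+1) = max(H(t) - min(h_t) + h_min, 0)."""
    return max(H_t - min(h_t) + h_min, 0.0)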
Based on Lyapunov optimization theory, the Lyapunov drift-plus-penalty is computed and its upper bound derived; P1 can then be decoupled into short-term problems that minimize the per-iteration loss. The optimization objective becomes the weighted sum of the loss function, the regulation-information freshness deficit, and the terminal energy-consumption deficits. The joint optimization problem of the t-th iteration is expressed as

$$\mathrm{P2}: \min_{\{\alpha_{n,t}\},\{\beta_{n,t}\}} F(\omega_t) + V_H\,H(t)\left(h_{\min} - \min\{h_t\}\right) + V_G \sum_{n=1}^{N} G_n(t)\,E_{n,t} \quad \text{s.t. } C_1\text{-}C_3,\ C_6 \qquad (20)$$

where $V_H$ and $V_G$ are the weights of the regulation-information freshness deficit and the terminal energy-consumption deficit, respectively.
The transformed problem P2 is further modeled as an MDP optimization problem whose key elements, the state space, action space, and reward function, are as follows:
1) State space: define the set of terminal energy-consumption deficits as $G(t) = \{G_1(t), \ldots, G_n(t), \ldots, G_N(t)\}$ and the set of terminal energy budgets as $E_{\max} = \{E_{1,\max}, \ldots, E_{n,\max}, \ldots, E_{N,\max}\}$. The state space comprises the terminal energy-consumption deficits, the regulation-information freshness deficit, the terminal energy budgets, and the freshness constraint threshold:

$$S_t = \{G(t),\ H(t),\ E_{\max},\ h_{\min}\} \qquad (21)$$

2) Action space: the action space is defined as $A_t = \{A_{1,t}, \ldots, A_{n,t}, \ldots, A_{N,t}\}$, where $A_{n,t}$, the action space of terminal n, is the Cartesian product of $\alpha_{n,t}$ and $\beta_{n,t}$:

$$A_{n,t} = \alpha_{n,t} \times \beta_{n,t} \qquad (22)$$

3) Reward function: the reward is defined from the optimization objective of P2, negated so that a smaller drift-plus-penalty yields a larger reward:

$$R_t = -\left[F(\omega_t) + V_H\,H(t)\left(h_{\min} - \min\{h_t\}\right) + V_G \sum_{n=1}^{N} G_n(t)\,E_{n,t}\right] \qquad (23)$$
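Under the reconstructed form of (20) and (23), the per-iteration reward can be sketched as the negated drift-plus-penalty objective; this exact weighting is an assumption consistent with the text's statement that a growing H(t) lowers the reward.

def reward(F_t, H_t, h_t, h_min, G, E, V_H, V_G):
    """Negated P2 objective, Eqs. (20) and (23).

    F_t  : global loss after this iteration
    G, E : per-terminal energy deficits and energy consumptions
    """
    penalty = (F_t
               + V_H * H_t * (h_min - min(h_t))
               + V_G * sum(g * e for g, e in zip(G, E)))
    return -penalty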
S3. Design of the information-freshness-aware collaborative optimization algorithm for communication and computing resources in the simplified power IoT
The algorithm runs at the control layer and coordinates the park terminals participating in the training of the distributed energy regulation decision model. Its core idea is to use a deep Q network to quantify and fit the state-action values in the high-dimensional state space, i.e., the Q values representing the cumulative reward of actions, and to optimize channel-allocation and batch-size decisions according to these Q values.
The IDEAL algorithm structure comprises a main network, a target network, an experience pool, a multi-mode channel-allocation conflict-resolution module, and a regulation-information freshness-deficit update module.
The execution subject of the IDEAL algorithm is the controller. For each terminal n, the controller constructs two DQNs: a main network $Q_n(S_t, A_{n,t}; \phi_n)$ used for optimization decisions and a target network $\hat{Q}_n(S_t, A_{n,t}; \hat{\phi}_n)$ used to assist main-network training. The target network has the same neural-network structure as the main network; a longer target-network update period keeps the main network's training target relatively fixed over a period of time, improving learning stability. The controller also constructs an experience pool to store experience data. On this basis, IDEAL adopts an experience-replay mechanism and trains the DQN by periodically sampling part of the experience data at random.
One iteration of regulation-decision-model training divides into three steps. First, the controller optimizes the channel-allocation and batch-size decisions based on the Q values estimated by the main networks and resolves multi-mode channel-allocation conflicts by comparing Q values; the core idea is to allocate each channel to the terminal that obtains the largest state-action value. Second, the controller issues the channel-allocation and batch-size decisions; all terminals perform local model training and model uploading and feed their energy-consumption information back to the controller. Finally, based on the information uploaded by the terminals, the controller updates the regulation-information freshness deficit and the terminal energy-consumption deficits, computes the reward function, updates the experience pool, and transitions to the next state. The controller computes the DQN loss function, updates the main-network parameters accordingly, and periodically updates the target-network parameters.
The IDEAL algorithm comprises three phases: initialization, action selection with multi-mode channel-allocation conflict resolution, and learning.
1) Initialization: initialize $G_n(t) = 0$, $H(t) = 0$, $\alpha_{n,j,t} = 0$, $\beta_{n,t} = 0$, and the main- and target-network parameters $\phi_n$ and $\hat{\phi}_n$. Define the set of terminals not yet allocated a channel as $N_t$ and initialize $N_t = N$. Define the available channel set of terminal $n \in N_t$ as $\theta_{n,t}$ and initialize $\theta_{n,t} = \theta$.
2) Action selection and multi-mode channel-allocation conflict resolution: first, the controller selects an action for each terminal by an ε-greedy policy. Taking terminal n as an example, based on the Q value $Q_n(S_t, A_{n,t}; \phi_n)$ estimated by terminal n's main network, the controller selects an action at random with probability ε, and with probability 1-ε selects the action with the largest Q value:

$$A_{n,t}^{*} = \arg\max_{A_{n,t}} Q_n(S_t, A_{n,t}; \phi_n)$$

Second, when a channel-allocation conflict occurs, i.e., terminals n and m are allocated channel j simultaneously and $Q_n(S_t, A_{n,t}; \phi_n) \geq Q_m(S_t, A_{m,t}; \phi_m)$, the controller compares the Q values of terminals n and m, allocates channel j to terminal n, which has the larger Q value, and rejects terminal m. The controller then removes terminal n from the set of terminals without a channel, i.e., $N_t = N_t \setminus \{n\}$, and sets the Q values of the rejected terminal m to

$$Q_m(S_t, a_{m,t}; \phi_m) = -\infty$$

where $a_{m,t}$ is the subset of terminal m's action space $A_{m,t}$ corresponding to channel j, denoted $a_{m,t} = \{A_{m,t}(j,1), A_{m,t}(j,2), \ldots, A_{m,t}(j,|D_m|)\}$. Based on the updated Q values, the action-selection and conflict-resolution process repeats until every terminal has been allocated a channel, as in the sketch below.
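A minimal sketch of the Q-value-based conflict resolution, assuming q[n][j] already holds terminal n's largest Q value over all batch sizes on channel j; the table layout and tie-breaking are illustrative assumptions.

import numpy as np

def resolve_channel_conflicts(q: np.ndarray) -> dict:
    """Assign each channel to at most one terminal by comparing Q values.

    q : array of shape (N, J); a rejected terminal gets Q = -inf on the
        contested channel, mirroring the update rule in the text.
    Returns {channel j: terminal n}.
    """
    q = q.astype(float).copy()
    assignment, unassigned = {}, set(range(q.shape[0]))
    while unassigned:
        n = max(unassigned, key=lambda n_: q[n_].max())   # best remaining terminal
        if not np.isfinite(q[n].max()):
            break                          # no feasible channel left
        j = int(np.argmax(q[n]))
        assignment[j] = n                  # winner keeps channel j
        unassigned.remove(n)
        q[list(unassigned), j] = -np.inf   # losers must re-select other channels
    return assignment

print(resolve_channel_conflicts(np.array([[5.0, 1.0], [4.0, 3.0]])))   # {0: 0, 1: 1}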
Finally, the controller issues the channel-allocation and batch-size decisions; each terminal n ∈ N performs local model training and local model uploading according to the decisions and uploads its energy-consumption information $E_{n,t}$ to the controller.
3) Learning phase: after the terminals execute their actions, the controller computes the reward and updates the DQN parameters, improving the DQN's fit to the state-action values so that it outputs the optimal strategy, realizes channel-allocation and batch-size optimization, improves the accuracy of the global model, guarantees the freshness of the regulation information, and reduces terminal energy consumption.
First, based on the energy-consumption information uploaded by the terminals, the controller updates the terminal energy-consumption deficits $G_n(t+1)$ by (18). Meanwhile, from the received local-model timestamps, the model-issuing times, and formulas (9), (13), and (14), the controller computes the information freshness of the t-th iteration and updates the regulation-information freshness deficit $H(t+1)$ by (19). The controller then computes the reward function $R_t$ from (20) and (23).
As (20) shows, when the regulation-information freshness deviates severely from the specified constraint, $H(t)$ grows, which lowers the reward and forces the controller to adjust the channel-allocation and batch-size decisions to reduce the age of the regulation information. This improves the freshness of the regulation information and the timeliness of the local models received by the controller, realizing freshness perception and improving the accuracy and reliability of the controller's distributed energy regulation decisions.
Next, the controller generates the sample $(S_t, A_t, R_t, S_{t+1})$, uses it to update the replay experience pool $\Upsilon_{n,t}$, and transitions to state $S_{t+1}$. A mini-batch $\tilde{\Upsilon}_n$ is drawn at random from the experience pool, with $|\tilde{\Upsilon}_n|$ the number of samples it contains. The DQN loss function is calculated as

$$L(\phi_n) = \frac{1}{|\tilde{\Upsilon}_n|} \sum_{(S, A_n, R, S') \in \tilde{\Upsilon}_n} \left[y_n - Q_n(S, A_n; \phi_n)\right]^2 \qquad (24)$$

where the target value is

$$y_n = R + \lambda \max_{A'_n} \hat{Q}_n(S', A'_n; \hat{\phi}_n) \qquad (25)$$

and $\lambda$ is the discount factor.
Finally, based on $\tilde{\Upsilon}_n$, the main-network parameters $\phi_n$ are updated by gradient descent:

$$\phi_n \leftarrow \phi_n - \kappa \nabla_{\phi_n} L(\phi_n) \qquad (26)$$

where $\kappa$ is the learning step size. Every $T_0$ iterations the target network is updated as $\hat{\phi}_n \leftarrow \phi_n$. A condensed sketch of this learning stage follows below.
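A condensed PyTorch sketch of the learning stage, covering the loss (24), target value (25), gradient update (26), and periodic target-network sync; the network width, batch size, and buffer capacity are illustrative assumptions, not the patent's configuration.

import random
from collections import deque
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Main/target DQN mapping a state to one Q value per discrete action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, s):
        return self.net(s)

state_dim, n_actions, T0, lam, kappa = 8, 12, 20, 0.9, 1e-3
main_net, target_net = QNet(state_dim, n_actions), QNet(state_dim, n_actions)
target_net.load_state_dict(main_net.state_dict())
opt = torch.optim.SGD(main_net.parameters(), lr=kappa)    # step size kappa, Eq. (26)
replay = deque(maxlen=1000)   # experience pool of (S_t, a_t, R_t, S_{t+1}) tuples

def learn(t: int, batch_size: int = 32):
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)             # experience replay
    s  = torch.stack([b[0] for b in batch])               # states as float tensors
    a  = torch.tensor([b[1] for b in batch])              # action indices
    r  = torch.tensor([b[2] for b in batch])              # rewards
    s2 = torch.stack([b[3] for b in batch])               # next states
    y = r + lam * target_net(s2).max(dim=1).values.detach()   # target, Eq. (25)
    q = main_net(s).gather(1, a.view(-1, 1)).squeeze(1)
    loss = ((y - q) ** 2).mean()                              # DQN loss, Eq. (24)
    opt.zero_grad(); loss.backward(); opt.step()              # update, Eq. (26)
    if t % T0 == 0:                                           # periodic sync
        target_net.load_state_dict(main_net.state_dict())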
The method incorporates the long-term regulation-information freshness constraint into the regulation of park distributed energy; by building the park distributed energy regulation decision model, it reduces the information age and guarantees the freshness of the regulation information while minimizing the global loss. In addition, the method can be applied to whole-county photovoltaic and other new distributed energy grid-connection projects, providing information-freshness guarantees for the training of distributed energy regulation models.
(1) A park distributed energy regulation-information freshness guarantee system is provided, which incorporates the long-term freshness constraint into the regulation of park distributed energy, reducing waiting delay and guaranteeing information freshness while reducing the global loss.
(2) A method for guaranteeing the freshness of park distributed energy regulation information is provided, which establishes a data-layer local-training and control-layer global-training architecture based on federated learning, decouples decision optimization from raw-data transmission, and avoids the communication-resource waste and network congestion caused by large-scale data interaction.
(3) A method for guaranteeing the freshness of park distributed energy regulation information is provided, which models the freshness-guarantee problem for multi-mode channel-allocation and batch-size optimization and decouples the long-term loss-minimization problem across iterations using telescoping and Lyapunov optimization.
(4) A method for guaranteeing the freshness of park distributed energy regulation information is provided, which proposes IDEAL, an information-freshness-aware algorithm for collaborative optimization of communication and computing resources in the simplified power IoT. A deep Q network (DQN) improves the fitting accuracy of state-action values in the high-dimensional optimization space; learning a joint channel-allocation and batch-size strategy coordinates communication and computing resources; and multi-mode channel contention is resolved by comparing terminal Q values. IDEAL perceives freshness: the evolution of the deficit virtual queue senses the deviation of each slot's information freshness from the specified constraint, and the channel-allocation and batch-size strategy is adjusted dynamically accordingly, reducing the age of the regulation information and guaranteeing its freshness in the long term.
Drawings
FIG. 1 is a schematic diagram of the park distributed energy regulation-information freshness guarantee system of the invention;
FIG. 2 (a) is a schematic diagram showing how terminal computing-resource differences and multi-mode channel differences increase information age according to the invention;
FIG. 2 (b) is a schematic diagram of channel allocation and batch-scale co-optimization for reducing information age according to the present invention;
FIG. 3 is a diagram of the IDEAL algorithm structure of the present invention;
FIG. 4 is a graph of the global loss function as a function of iteration number;
FIG. 5 is a comparison of average regulatory information freshness and average batch size;
FIG. 6 is a graph of training delay, transmission delay, latency delay, and batch size as a function of iteration number;
FIG. 7 is a diagram comparing terminal energy consumption and regulation information freshness distribution of different algorithms;
fig. 8 is a graph comparing the change of the average regulatory information freshness and the average information age with the weight of the regulatory information freshness.
Detailed Description
The specific technical scheme of the invention is explained below with reference to the drawings.
The overall technical scheme comprises the park distributed energy regulation-information freshness guarantee system and the corresponding guarantee method.
As shown in fig. 1, the park distributed energy regulation-information freshness guarantee system comprises, from bottom to top, a data layer, a network layer, a control layer, and a service layer. The data layer deploys Internet of Things terminals on electrical equipment such as distributed photovoltaics, controllable loads, and charging piles and provides sample data and local models for training the park distributed energy regulation decision model. The network layer comprises multiple communication media such as PLC, WLAN, and 5G and provides channels for interaction between the data layer and the control layer. The control layer reduces the age of the regulation information by adjusting channel-allocation and batch-size decisions, improving information freshness and ensuring the timeliness of the local models received by the controller, thereby realizing freshness perception and improving the accuracy and reliability of the controller's distributed energy regulation decisions. The service layer comprises regulation services such as energy storage regulation, distributed energy output prediction, flexible load regulation, and distributed photovoltaic regulation.
The system is based on a simplified power IoT architecture and adopts technologies such as distributed artificial intelligence, control-data decoupling, unified signaling interaction, and cross-domain resource fusion, realizing heterogeneous fusion of multi-mode IoT terminals and supporting the training of the distributed energy regulation decision model. By incorporating the long-term regulation-information freshness constraint into the regulation of park distributed energy, waiting delay is reduced and information freshness is guaranteed while the global loss is reduced.
The method comprises: S1, training the park distributed energy regulation decision model; S2, modeling the information-freshness guarantee problem faced by this training; and S3, designing IDEAL, the information-freshness-aware algorithm for collaborative optimization of communication and computing resources in the simplified power IoT. The freshness-guarantee problem is modeled by optimizing batch size and multi-mode channel selection during the training of the regulation decision model, so that the loss function of the regulation decision model is minimized while information freshness is guaranteed. IDEAL runs in the controller at the control layer of the system; by executing it during model training, the controller dynamically optimizes batch size and multi-mode channel selection and guarantees information freshness in the long term.
S1. Park distributed energy regulation decision model training
The park distributed energy regulation decision model is trained iteratively under a federated learning architecture. Assume T iterations in total, with the set denoted T = {1, ..., t, ..., T}. Each iteration comprises four steps:
1) Global model issuing: the controller issues the global model to the terminals through a multi-mode communication network fusing AC/DC PLC, WLAN, and 5G.
2) Local model training: each terminal trains its local model on its local data set.
3) Local model uploading: each terminal uploads the trained local model to the controller through the multi-mode communication network.
4) Global model training: after receiving the local models uploaded by all terminals, the controller trains the global model by weighted aggregation, supporting accurate distributed energy regulation optimization.
Because downlink transmission capability is strong, the global-model issuing delay is negligible. The method therefore focuses on the three steps of local model training, local model uploading, and global model aggregation.
(1) Local model training
Assume there are N Internet of Things terminals, with the set denoted as N = {1, ..., n, ..., N}. In the t-th iteration, terminal n updates its local model with the global model after the (t-1)-th iteration, i.e., $\omega_{n,t-1} = \omega_{t-1}$. Terminal n then trains the local model on its local data set $D_n$. Define the number of samples terminal n uses for local training in the t-th iteration as the batch size $\beta_{n,t}$; a loss function quantifies the deviation between the model's actual output and the target output. The local loss function of terminal n in the t-th iteration is defined as the average loss over the local samples, i.e.,

$$F_n(\omega_{n,t-1}, \beta_{n,t}) = \frac{1}{\beta_{n,t}} \sum_{m=1}^{\beta_{n,t}} f(\omega_{n,t-1}, x_{n,m}) \qquad (1)$$

where the sample loss function $f(\omega_{n,t-1}, x_{n,m})$ quantifies the performance gap between the output of the local model $\omega_{n,t-1}$ on the m-th sample of $D_n$ and the optimal output. $F_n(\omega_{n,t-1}, \beta_{n,t})$ reflects the performance of the local model and is used for the local model update. Based on gradient descent, the local model of terminal n is updated as

$$\omega_{n,t} = \omega_{n,t-1} - \gamma \nabla F_n(\omega_{n,t-1}, \beta_{n,t}) \qquad (2)$$

where $\gamma > 0$ is the learning step size and $\nabla F_n(\omega_{n,t-1}, \beta_{n,t})$ is the gradient of the loss function with respect to the local model $\omega_{n,t-1}$.
Define the computing resource available to terminal n in the t-th iteration as $f_{n,t}$. The delay and energy consumption of local model training are then

$$\tau^{\mathrm{cmp}}_{n,t} = \frac{\beta_{n,t}\,\xi_n}{f_{n,t}} \qquad (3)$$

$$E^{\mathrm{cmp}}_{n,t} = e_n\,\beta_{n,t}\,\xi_n\,f_{n,t}^{2} \qquad (4)$$

where $e_n$ is the energy-consumption coefficient (W·s³/cycle³) and $\xi_n$ is the number of CPU cycles required to train a single sample (cycles/sample).
(2) Local model upload
Assume there are J multi-mode channels, comprising $J_1$ 5G channels, $J_2$ WLAN channels, and $J_3$ PLC channels, i.e., $J = J_1 + J_2 + J_3$. The channel set is denoted $\theta = \{1, \ldots, J_1, \ldots, J_1+J_2, \ldots, J\}$, where $j = 1, \ldots, J_1$ are 5G channels, $j = J_1+1, \ldots, J_1+J_2$ are WLAN channels, and $j = J_1+J_2+1, \ldots, J$ are PLC channels. Define the channel-allocation variable $\alpha_{n,j,t} \in \{0,1\}$, where $\alpha_{n,j,t} = 1$ means that in the t-th iteration the controller allocates channel j to terminal n for uploading its local model, and otherwise $\alpha_{n,j,t} = 0$. In the t-th iteration, the transmission rate at which terminal n uploads its model over channel j is

$$R_{n,j,t} = B_{n,j} \log_2\!\left(1 + \frac{P_{n,j,t}\,g_{n,j,t}}{I_{n,j,t} + \sigma^2}\right) \qquad (5)$$

where $B_{n,j}$ is the channel bandwidth, $g_{n,j,t}$ is the channel gain, $P_{n,j,t}$ is the uplink transmission power, $I_{n,j,t}$ is the electromagnetic-interference power from operating electrical equipment, and $\sigma^2$ is the noise power.
Define $|\omega_{n,t}|$ as the size (bits) of the local model $\omega_{n,t}$. The delay and energy consumption for terminal n to upload its local model are

$$\tau^{\mathrm{com}}_{n,t} = \frac{|\omega_{n,t}|}{\sum_{j\in\theta} \alpha_{n,j,t}\,R_{n,j,t}} \qquad (6)$$

$$E^{\mathrm{com}}_{n,t} = \sum_{j\in\theta} \alpha_{n,j,t}\,P_{n,j,t}\,\tau^{\mathrm{com}}_{n,t} \qquad (7)$$

The total energy consumption of terminal n in the t-th iteration is the sum of the local-training and upload energies:

$$E_{n,t} = E^{\mathrm{cmp}}_{n,t} + E^{\mathrm{com}}_{n,t} \qquad (8)$$

In the t-th iteration, the delay with which the controller receives the local model of terminal n is

$$\tau_{n,t} = \tau^{\mathrm{cmp}}_{n,t} + \tau^{\mathrm{com}}_{n,t} \qquad (9)$$
(3) Global model training
After the controller receives the local models of all N terminals, it trains the global model by weighted aggregation of the local models:

$$\omega_t = \sum_{n=1}^{N} \rho_{n,t}\,\omega_{n,t} \qquad (10)$$

where $\rho_{n,t} = \beta_{n,t} / \sum_{n'=1}^{N} \beta_{n',t}$ is the weight of terminal n's local model, defined as the ratio of its batch size to the sum of the batch sizes of all N terminals.
The gap between the actual output of the global model and the target output is quantified by the global loss function, defined as the weighted sum of the N terminals' local loss functions, i.e.,

$$F(\omega_t) = \sum_{n=1}^{N} \rho_{n,t}\,F_n(\omega_{n,t-1}, \beta_{n,t}) \qquad (11)$$

Local training, local model uploading, and global model aggregation must satisfy the regulation-information freshness constraint.
Regulation-information freshness is a measure of information timeliness and strongly affects the accuracy and real-time performance of distributed energy regulation. The fresher the information used during regulation-model training, the smaller the performance gap between the generated regulation strategy and the optimal strategy. Because the controller can start global model training only after it has received all terminals' local models, information freshness is closely tied to the delay experienced by each local model before the controller receives it.
Schematic diagrams of local-model information age are shown in figs. 2 (a) and 2 (b). The regulation-information freshness constraint model is described as follows:
Define the age of information (AoI) of the local model obtained by terminal n in the t-th training iteration as the delay from the moment the model leaves terminal n until global model training begins, consisting of the transmission delay $\tau^{\mathrm{com}}_{n,t}$ and the waiting delay $\tau^{\mathrm{wait}}_{n,t}$:

$$A_{n,t} = \tau^{\mathrm{com}}_{n,t} + \tau^{\mathrm{wait}}_{n,t} \qquad (12)$$

As shown in fig. 2 (a), owing to differences in the terminals' available computing resources and communication media, a model that arrives first must wait until the controller has received the local models of all terminals before it can join global model training, which increases the regulation-information age and lowers information freshness. The waiting delay of terminal n's local model therefore depends on the delay with which the controller receives the last terminal's local model:

$$\tau^{\mathrm{wait}}_{n,t} = \max_{n' \in N}\{\tau_{n',t}\} - \tau_{n,t} \qquad (13)$$

Define the regulation-information freshness of terminal n in the t-th iteration as the reciprocal of its information age:

$$h_{n,t} = \frac{1}{A_{n,t}} \qquad (14)$$

Constraining the model with the largest information age guarantees the freshness of the regulation information. Define the set of all terminals' regulation-information freshness as $h_t = \{h_{1,t}, \ldots, h_{n,t}, \ldots, h_{N,t}\}$. The long-term freshness constraint over T iterations is then

$$\frac{1}{T} \sum_{t=1}^{T} \min\{h_t\} \geq h_{\min} \qquad (15)$$

where $h_{\min}$ is the information-freshness constraint threshold.
Comparing fig. 2 (a) and fig. 2 (b) shows that dynamically adjusting the multi-mode channel allocation and the batch-size strategy reduces the information age and improves information freshness. In fig. 2 (a), because terminal 1 has poor computing performance, the local models uploaded by terminals 2 and 3 must wait for terminal 1 to finish uploading before aggregation, so the information of terminals 2 and 3 ages and its freshness is low. In fig. 2 (b), coordinating channel allocation and batch size increases the batch sizes of terminals 2 and 3 and allocates them the 5G and WLAN channels with better channel quality. This eliminates the waiting delay and improves the information freshness of the global model; at the same time, the global model is trained with more samples, reducing the global loss function and ensuring the accuracy and reliability of distributed energy regulation.
S2. Modeling the information-freshness guarantee problem for multi-mode channel-allocation and batch-size optimization
The invention addresses the minimization of the loss function of the distributed energy regulation decision model. The optimization objective is to minimize the global loss function $F(\omega_T)$ after T iterations through collaborative optimization of communication and computing resources in the simplified power IoT, while guaranteeing long-term constraints such as regulation-information freshness. Define the set of multi-mode channel-allocation optimization variables as $\alpha_{n,t} = \{\alpha_{n,1,t}, \ldots, \alpha_{n,j,t}, \ldots, \alpha_{n,J,t}\}$ and the batch-size optimization variable as $\beta_{n,t} \in \{1, 2, \ldots, |D_n|\}$. The optimization problem is constructed as

$$\mathrm{P1}: \min_{\{\alpha_{n,t}\},\{\beta_{n,t}\}} F(\omega_T)$$
$$\text{s.t. } C_1: \sum_{n=1}^{N} \alpha_{n,j,t} \leq 1,\ \forall j \in \theta; \quad C_2: \sum_{j\in\theta} \alpha_{n,j,t} \leq 1,\ \forall n \in N;$$
$$C_3: 1 \leq \beta_{n,t} \leq |D_n|; \quad C_4: \sum_{t=1}^{T} E_{n,t} \leq E_{n,\max};$$
$$C_5: \frac{1}{T}\sum_{t=1}^{T} \min\{h_t\} \geq h_{\min}; \quad C_6: P_{n,j,t} \in \{P_{\mathrm{PLC}}, P_{\mathrm{WLAN}}, P_{\mathrm{5G}}\} \qquad (16)$$

where $C_1$ states that each channel can be allocated to at most one terminal; $C_2$ states that each terminal can be allocated at most one channel; $C_3$ is the batch-size constraint of terminal-local model training, with $|D_n|$ the size of terminal n's local data set $D_n$; $C_4$ is the long-term energy-consumption constraint of terminal n, with $E_{n,\max}$ its long-term energy budget; $C_5$ is the long-term information-freshness constraint over T iterations; and $C_6$ is the terminal transmission-power constraint, with $P_{\mathrm{PLC}}$, $P_{\mathrm{WLAN}}$, and $P_{\mathrm{5G}}$ the PLC, WLAN, and 5G channel transmission powers, respectively.
Because the optimization strategy of each iteration is not only in accordance with the global loss function F (omega) after T iterations T ) Coupling and coupling with long-term constraints such as information freshness and the like, so that the optimization problem P1 is difficult to directly solve and the optimization problem decoupling between iterations is required;
for the first coupling, F (ω) is scaled and theorem is applied T ) Is decoupled into
Figure BDA0003558664750000151
Wherein, F (ω) t-1 ) Is a global loss function after the t-1 th iteration and is a known parameter during the t-th iteration optimization. From the above formula, F (ω) T ) Global loss function F (ω) only with the t-th iteration t ) Correlation, i.e. F (ω) T ) Is superior toThe transformation into a loss function F (ω) for the t-th iteration t ) And (6) optimizing.
For the second coupling, based on the virtual queue theory, the structures corresponding to the constraint C are respectively constructed 4 And C 5 Terminal energy consumption deficit virtual queue G n (t) and the virtual queue H (t) for the freshness red of the control information, the queue backlog of which is updated to
G_n(t+1) = max{G_n(t) + E_{n,t} − E_{n,max}/T, 0}  (18)
H(t+1) = max{H(t) − min{h_t} + h_min, 0}  (19)
wherein G_n(t) represents the deviation between the energy consumption of terminal n after the t-th iteration and the per-iteration energy budget E_{n,max}/T, and H(t) represents the deviation between the regulation information freshness after the t-th iteration and the freshness constraint h_min.
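Editor's note: as an illustrative sketch only, the two backlog updates (18)–(19) translate directly into code; the function and argument names below are hypothetical, not from the patent.

# Python sketch of the deficit virtual queue updates (18)-(19).
def update_energy_deficit(G_n: float, E_n_t: float, E_n_max: float, T: int) -> float:
    # (18): the backlog grows when this iteration's energy use E_n_t exceeds
    # the per-iteration budget E_n_max / T, and is clipped at zero.
    return max(G_n + E_n_t - E_n_max / T, 0.0)

def update_freshness_deficit(H: float, h_t: list, h_min: float) -> float:
    # (19): the backlog grows when the worst (minimum) terminal freshness
    # min{h_t} falls below the constraint threshold h_min.
    return max(H - min(h_t) + h_min, 0.0)

A persistently positive backlog signals accumulated constraint violation, which the reward function defined below penalizes.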
Based on the Lyapunov optimization theory, the Lyapunov drift-plus-penalty is calculated and its upper bound is derived, so that P1 can be decoupled into a short-term optimization problem of minimizing the loss function of each iteration; the optimization objective is the weighted sum of the loss function, the regulation information freshness deficit and the terminal energy consumption deficit. The joint optimization problem of the t-th iteration is represented as
[Optimization problem P2 — rendered as an image in the original; its objective is the weighted sum described above, and a reconstruction is sketched below]
wherein V_H and V_G are the weights corresponding to the regulation information freshness deficit and the terminal energy consumption deficit, respectively.
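Editor's note: the image for P2 is not reproduced. One standard drift-plus-penalty instantiation consistent with the stated objective (per-iteration loss plus weighted freshness-deficit and energy-deficit terms) would be the following; this is an assumption, not the patent's verbatim formula (LaTeX source):

\mathrm{P2:}\quad \min_{\alpha_{n,t},\,\beta_{n,t}}\ F(\omega_t)
\;+\; V_H\,H(t)\bigl(h_{\min}-\min\{h_t\}\bigr)
\;+\; V_G\textstyle\sum_{n=1}^{N}G_n(t)\Bigl(E_{n,t}-\tfrac{E_{n,\max}}{T}\Bigr)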
The transformed problem P2 is further modeled as an MDP optimization problem, whose key elements comprise the state space, the action space and the reward function, introduced as follows:
1) State space: the set of terminal energy consumption deficits is defined as G(t) = {G_1(t), …, G_n(t), …, G_N(t)}, and the set of terminal energy budgets as E_max = {E_{1,max}, …, E_{n,max}, …, E_{N,max}}. The state space comprises the terminal energy consumption deficits, the regulation information freshness deficit, the terminal energy budgets and the regulation information freshness constraint threshold, expressed as
S_t = {G(t), H(t), E_max, h_min}
2) Action space: the action space is defined as A_t = {A_{1,t}, …, A_{n,t}, …, A_{N,t}}, where A_{n,t}, the action space corresponding to terminal n, is the Cartesian product of α_{n,t} and β_{n,t}, i.e.
A_{n,t} = α_{n,t} × β_{n,t}
3) Reward function: the reward function is defined as the optimization objective of P2, i.e.
[Reward function R_t, equation (20) — rendered as an image in the original; see the sketch below]
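Editor's note: a minimal sketch of the reward computation, assuming the conjectured P2 form above (the reward is the negated P2 objective); all names are illustrative, not from the patent.

def compute_reward(F_t, H_t, h_t, h_min, G, E_t, E_max, T, V_H, V_G):
    # Negated weighted sum of the per-iteration loss, the freshness-deficit
    # term and the energy-deficit term; a larger reward means a lower loss
    # and smaller constraint violations.
    freshness_term = V_H * H_t * (h_min - min(h_t))
    energy_term = V_G * sum(g * (e - em / T) for g, e, em in zip(G, E_t, E_max))
    return -(F_t + freshness_term + energy_term)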
S3, designing a regulation-information-freshness-aware collaborative optimization algorithm for power minimalist Internet of Things communication and computing resources
The invention provides a method and a system for guaranteeing the regulation information freshness of park distributed energy, and proposes a regulation-information-freshness-aware power minimalist Internet of Things communication and computing resource collaborative optimization algorithm (IDEAL). The algorithm runs at the control layer and is used to coordinate the park terminals participating in the training of the distributed energy regulation decision model. Its core idea is to use a deep Q network (DQN) to quantify and fit the state-action value in a high-dimensional state space, i.e., the Q value representing the accumulated reward of an action, and to optimize the channel allocation and batch size decisions according to the Q values.
The structure of the IDEAL algorithm is shown in Fig. 3 and includes a main network, a target network, an experience pool, a multi-modal channel allocation conflict resolution module and a regulation information freshness deficit updating module.
The execution subject of the IDEAL algorithm is the controller. For each terminal, e.g., terminal n, the controller constructs two DQNs: a main network (with parameters θ_{n,t}) used for optimization decisions, and a target network (with parameters θ̂_{n,t}) used to assist the training of the main network.
The target network and the main network have the same neural network structure; by adopting a longer target network update period, the target value of the main network is kept relatively fixed over a period of time, which improves learning stability. The controller constructs an experience pool for storing experience data such as states, actions and rewards. On this basis, IDEAL adopts an experience replay mechanism and trains the DQN by periodically and randomly sampling part of the experience data, which alleviates the correlation and non-stationary distribution of the experience data and improves the optimization performance.
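Editor's note: an illustrative PyTorch sketch (not the patent's implementation) of the per-terminal main/target DQN pair and experience replay pool described above; the network size and pool capacity are arbitrary assumptions.

import random
from collections import deque
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Small MLP mapping a state vector to one Q value per discrete action.
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions))

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

class TerminalAgent:
    # Main network for decisions, a structurally identical target network
    # for stable targets, and a replay pool of (state, action, reward, next state).
    def __init__(self, state_dim: int, n_actions: int, pool_size: int = 10000):
        self.main = QNetwork(state_dim, n_actions)
        self.target = QNetwork(state_dim, n_actions)
        self.sync_target()
        self.pool = deque(maxlen=pool_size)

    def store(self, s, a, r, s_next):
        self.pool.append((s, a, r, s_next))

    def sample(self, k: int):
        # periodic random sampling breaks the temporal correlation of the data
        return random.sample(list(self.pool), min(k, len(self.pool)))

    def sync_target(self):
        # infrequent copying keeps the training target relatively fixed
        self.target.load_state_dict(self.main.state_dict())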
One iteration of the regulation decision model training can be divided into three steps. First, the controller optimizes the channel allocation and batch size decisions based on the Q values estimated by the main networks and resolves multi-modal channel allocation conflicts by comparing Q values; the core idea is to allocate a channel to the terminal that can obtain the largest state-action value. Second, the controller issues the channel allocation and batch size decisions, and each terminal performs local model training and model uploading and feeds its energy consumption information back to the controller. Finally, based on the information uploaded by the terminals, the controller updates the regulation information freshness deficit and the terminal energy consumption deficits, calculates the reward function, updates the experience pool, and transitions to the next state. The controller calculates the DQN loss function, updates the main network parameters accordingly, and periodically updates the target network parameters.
The IDEAL algorithm execution flow is shown in Algorithm 1 and includes three stages: initialization (lines 1–3), action selection and multi-modal channel allocation conflict resolution (lines 5–16), and learning (lines 17–25).
1) Initialization stage: initialize G_n(t) = 0, H(t) = 0, α_{n,j,t} = 0 and β_{n,t} = 0. Define the set of terminals to which no channel has been allocated as N_t and initialize N_t = N. Define the set of channels allocable to terminal n ∈ N_t as Θ_{n,t} and initialize it to the full channel set.
2) Action selection and multi-modal channel allocation conflict resolution stage: first, the controller selects an action for each terminal based on the ε-greedy algorithm. Taking terminal n as an example, based on the Q value Q(S_t, A_{n,t}; θ_{n,t}) estimated by the main network of terminal n, the controller selects an action at random with probability ε and selects the action with the maximum Q value with probability 1 − ε.
Second, when a channel allocation conflict exists, e.g., terminals n and m are allocated channel j simultaneously and the Q value of terminal n is not smaller than that of terminal m, the controller compares the Q values of terminals n and m, allocates channel j to terminal n, which has the larger Q value, and rejects terminal m. The controller then removes terminal n from the set of terminals without an allocated channel, i.e., N_t = N_t \ {n}, and sets the Q values of the rejected terminal m over the action subset a_{m,t} corresponding to channel j so that channel j is excluded from reselection (the exact expression is rendered as an image in the original), where a_{m,t} is the action subset of terminal m's action space A_{m,t} corresponding to channel j, denoted a_{m,t} = {A_{m,t}(j,1), A_{m,t}(j,2), …, A_{m,t}(j,|Δ_n|)}. Based on the updated Q values, the above action selection and multi-modal channel allocation conflict resolution process is repeated until all terminals are allocated channels.
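Editor's note: an illustrative sketch of ε-greedy action selection with Q-value-based conflict resolution as described above; the data structures, the use of −∞ to exclude a taken channel, and the assumption of at least as many channels as terminals are editorial, not from the patent.

import math
import random

def allocate_channels(q_values: dict, eps: float = 0.1) -> dict:
    # q_values: {terminal: {(channel, batch_size): Q}}; returns the chosen
    # (channel, batch_size) per terminal, at most one terminal per channel.
    q = {n: dict(qs) for n, qs in q_values.items()}   # working copy
    allocated, pending = {}, set(q)
    while pending:
        proposals = {}
        for n in pending:   # epsilon-greedy proposal per pending terminal
            valid = {a: v for a, v in q[n].items() if v != -math.inf}
            if random.random() < eps:
                proposals[n] = random.choice(list(valid))
            else:
                proposals[n] = max(valid, key=valid.get)
        conflicts = {}
        for n, (j, b) in proposals.items():
            conflicts.setdefault(j, []).append(n)
        for j, terminals in conflicts.items():
            # the terminal with the larger Q value wins the contested channel
            winner = max(terminals, key=lambda n: q[n][proposals[n]])
            allocated[winner] = proposals[winner]
            pending.discard(winner)
            for n in pending:   # channel j is now taken: exclude it elsewhere
                for action in q[n]:
                    if action[0] == j:
                        q[n][action] = -math.inf
    return allocated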
Finally, the controller issues the channel allocation and batch size decisions; each terminal n ∈ N performs local model training and local model uploading according to the decisions, and uploads its energy consumption information E_{n,t} to the controller.
3) Learning stage: in the learning stage, after the terminals execute their actions, the controller updates the DQN network parameters by calculating the reward function, so as to improve the fitting accuracy of the DQN to the state-action value and enable the DQN to output the optimal strategy, thereby realizing channel allocation and batch size optimization, improving the accuracy of the global model, guaranteeing the regulation information freshness and reducing the terminal energy consumption. First, based on the energy consumption information uploaded by the terminals, the controller updates the terminal energy consumption deficit G_n(t+1) according to (18). Meanwhile, the controller calculates the information freshness of the t-th iteration from the received local model timestamps, the model issuing time and formulas (9), (13) and (14), and updates the regulation information freshness deficit H(t+1) according to (19). The controller then calculates the reward function R_t according to (20).
As can be seen from (20), when the regulation information freshness deviates seriously from the prescribed constraint, H(t) increases gradually, which decreases the reward value and forces the controller to adjust the channel allocation and batch size decisions so as to reduce the regulation information age, improve the regulation information freshness and guarantee the timeliness of the local terminal models received by the controller. In this way, regulation information freshness awareness is realized and the accuracy and reliability of the controller's distributed energy regulation decisions are improved.
Next, the controller generates a sample η_{n,t} (its components are rendered as an image in the original) for updating the experience replay pool and transitions to state S_{t+1}. A mini-batch of samples Φ_n is randomly drawn from the experience replay pool, and |Φ_n| denotes the number of samples in Φ_n. The DQN loss function can be calculated as
υ_n = (1/|Φ_n|) Σ_{η∈Φ_n} (y_n − Q(S_t, A_{n,t}; θ_{n,t}))²  (21)
wherein
y_n = R_t + λ max_{A'} Q̂(S_{t+1}, A'; θ̂_{n,t})  (22)
where λ is the discount factor and Q̂(·; θ̂_{n,t}) is the Q value estimated by the target network.
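Editor's note: an illustrative PyTorch sketch of the mini-batch TD loss (21)–(22) as reconstructed above, reusing the TerminalAgent class sketched earlier; all names and the batch layout are assumptions.

import torch

def dqn_loss(agent, batch, lam: float) -> torch.Tensor:
    # batch: list of (state tensor, action index, reward, next-state tensor)
    s, a, r, s_next = zip(*batch)
    s = torch.stack(s)
    s_next = torch.stack(s_next)
    a = torch.tensor(a, dtype=torch.int64)
    r = torch.tensor(r, dtype=torch.float32)
    with torch.no_grad():   # (22): target value from the frozen target network
        y = r + lam * agent.target(s_next).max(dim=1).values
    q_sa = agent.main(s).gather(1, a.unsqueeze(1)).squeeze(1)
    return ((y - q_sa) ** 2).mean()   # (21): mean squared TD error

Training would then step an optimizer with learning rate κ on agent.main and call agent.sync_target() every T_0 iterations, matching (23) and the periodic target network update below.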
Finally, based on υ_n, the main network parameters θ_{n,t} are updated as
θ_{n,t+1} = θ_{n,t} − κ∇υ_n  (23)
where κ is the learning step size. Every T_0 iterations, the target network is updated as θ̂_{n,t} ← θ_{n,t}.
Algorithm 1: IDEAL algorithm
Input: N, J, T, V, {E_{n,max}}, {h_min}
Output: α_{n,t}, β_{n,t}
1) Stage one: initialization
2) Initialize G_n(t) = 0, H(t) = 0, α_{n,j,t} = 0, β_{n,t} = 0
3) Define the set of terminals without an allocated channel as N_t and initialize N_t = N
4) for t = 1, …, T
5) Stage two: action selection and multi-modal channel allocation conflict resolution
6) for n = 1, …, N
7) Select an action at random with probability ε, and the action with the largest Q value with probability 1 − ε
8) end for
9) if α_{n,j,t} = α_{m,j,t} = 1 (terminals n and m are allocated channel j simultaneously)
10) Suppose the Q value of terminal n is larger than that of terminal m
11) The controller allocates channel j to terminal n, which has the larger Q value, and rejects terminal m
12) end if
13) Update N_t = N_t \ {n}
14) Set the Q values of the rejected terminal m over its channel-j actions so that channel j is excluded
15) Based on the updated Q values, repeat the above action selection and multi-modal channel allocation conflict resolution process until all terminals are allocated channels
16) Each terminal n ∈ N performs local model training and local model uploading according to the decision, and uploads its energy consumption information E_{n,t} to the controller
17) Stage three: learning
18) The controller updates the terminal energy consumption deficit G_n(t+1) and the regulation information freshness deficit H(t+1) according to (18) and (19), calculates the reward function R_t according to (20), generates the sample η_{n,t} and updates the experience replay pool
19) Transition to state S_{t+1}
20) for n = 1, …, N
21) The controller calculates the loss function υ_n according to (21) and updates the main network parameters θ_{n,t} according to (23)
22) if t mod T_0 = 0
23) Update the target network θ̂_{n,t} ← θ_{n,t}
24) end if
25) end for
26) end for
To verify its performance, the invention simulates the IDEAL algorithm and sets up two comparison algorithms, configured as follows:
Comparison algorithm 1: a federated-deep-reinforcement-learning-based low-latency resource allocation algorithm (FLRA), which minimizes the federated learning global loss function by optimizing the batch size and channel allocation strategy based on the deep deterministic policy gradient; this algorithm has neither energy consumption awareness nor regulation information freshness awareness.
Comparison algorithm 2: an adaptive federated learning batch size optimization algorithm (AFLB) based on proximal policy optimization, which minimizes the global loss function by optimizing the batch size; this algorithm cannot optimize channel allocation or resolve channel allocation conflicts, and has no regulation information freshness awareness.
Fig. 4 depicts the variation of the global loss function with the number of iterations. As the number of iterations increases, the global loss function first decreases and then stabilizes. At 200 iterations, the global loss function of IDEAL is reduced by 63.29% and 38.88% compared with FLRA and AFLB, respectively. IDEAL maximizes the batch size participating in local model training on the premise of guaranteeing the long-term constraints on terminal energy consumption and regulation information freshness, thereby reducing the global loss function. The related simulation results are further illustrated in Fig. 5.
Fig. 5 compares the average regulation information freshness and average batch size of the different algorithms, defined respectively as the averages of the regulation information freshness and of the batch size over the T iterations (the exact expressions are rendered as images in the original).
Compared with FLRA and AFLB, the average regulation information freshness of IDEAL is improved by 20.59% and 57.69%, respectively, and the average batch size is increased by 70.37% and 6.98%, respectively. For terminals with poor computing capability, IDEAL reduces the transmission delay by allocating them channels of better quality; for terminals with large waiting delay, IDEAL reduces the waiting delay by increasing the batch size of local training, improving the regulation information freshness.
Fig. 6 depicts the training delay, transmission delay, waiting delay and average batch size as functions of the number of iterations. After 200 iterations, the training delay increases by 23.08%, the transmission delay decreases by 52.50%, the waiting delay decreases by 71.88%, and the total delay decreases by 21.17%. IDEAL adjusts the ratio of training delay to waiting delay: it reduces the transmission delay by optimizing channel allocation and increases the training delay by enlarging the batch size, thereby significantly reducing the waiting delay and the total delay.
Fig. 7 compares the terminal energy consumption and the regulation information freshness distributions of the different algorithms over 200 iterations. According to the simulation results, IDEAL has the lowest median terminal energy consumption and the highest median regulation information freshness. Compared with FLRA and AFLB, the fluctuation range of IDEAL's terminal energy consumption is reduced by 24.93% and 16.38%, respectively, and the fluctuation range of its regulation information freshness is reduced by 30.97% and 39.61%, respectively. FLRA cannot guarantee the long-term constraints on terminal energy consumption and regulation information freshness, so its fluctuation range is large. Combined with Fig. 5, FLRA uses a smaller batch size and has lower training energy consumption, so its terminal energy consumption performance is better than that of AFLB.
Fig. 8 depicts the average regulation information freshness and the average information age as functions of the regulation information freshness weight V_H. As V_H increases, the average information age decreases gradually, the average regulation information freshness increases gradually, and the waiting delay decreases significantly. When V_H increases from 5 to 12, the average regulation information freshness improves by 55.56%, the average information age decreases by 35.61%, and the waiting delay decreases by 67.05%. The simulation results show that IDEAL reduces the information age and improves the regulation information freshness mainly by reducing the waiting delay.

Claims (8)

1. A method for guaranteeing the freshness of regulation information of park distributed energy, using a system for guaranteeing the freshness of regulation information of park distributed energy, the system comprising, from bottom to top, a data layer, a network layer, a control layer and a service layer;
the data layer provides sample data and local models for the training of the park distributed energy regulation decision model by deploying Internet of Things terminals on the electrical equipment;
the network layer comprises a plurality of communication media and provides a channel for interaction of the data layer and the control layer;
the control layer is used for reducing the regulation information age by adjusting the channel allocation and batch size decisions, improving the regulation information freshness, and guaranteeing the timeliness of the local terminal models received by the controller;
a service layer including an energy regulation service;
the method is characterized by comprising the following steps:
s1, training a park distributed energy regulation decision model;
s2, modeling the regulation information freshness guarantee problem in the training of the distributed energy regulation decision model;
s3, designing a regulation-information-freshness-aware IDEAL algorithm for collaborative optimization of power minimalist Internet of Things communication and computing resources;
the regulation information freshness guarantee problem is modeled by optimizing the batch size and the multi-modal channel selection in the training process of the distributed energy regulation decision model, so that the loss function of the energy regulation decision model is minimized while the freshness of regulation information is guaranteed;
the proposed IDEAL algorithm is carried in the controller of the control layer of the park distributed energy regulation information freshness guarantee system; by executing the IDEAL algorithm during the training of the distributed energy regulation decision model, the controller dynamically optimizes the batch size and the multi-modal channel selection, thereby realizing long-term guarantee of the regulation information freshness;
the IDEAL algorithm structure comprises a main network, a target network, an experience pool, a multi-modal channel allocation conflict resolution module and a regulation information freshness deficit updating module;
the execution subject of the IDEAL algorithm is the controller; for each terminal n, the controller constructs two deep Q networks (DQNs): a main network (with parameters θ_{n,t}) used for optimization decisions and a target network (with parameters θ̂_{n,t}) used to assist the training of the main network;
the target network and the main network have the same neural network structure; by adopting a longer target network update period, the target value of the main network is kept relatively fixed over a period of time, which improves learning stability; the controller constructs an experience pool for storing experience data; on this basis, IDEAL adopts an experience replay mechanism and trains the DQN by periodically and randomly sampling part of the experience data;
the IDEAL algorithm execution process comprises three stages of initialization, action selection and multi-mode channel allocation conflict resolution and learning respectively;
1) Initialization stage: initialize G_n(t) = 0, H(t) = 0, α_{n,j,t} = 0 and β_{n,t} = 0;
G_n(t) represents the deviation between the energy consumption of terminal n after the t-th iteration and the per-iteration energy budget E_{n,max}/T;
H(t) represents the deviation between the regulation information freshness after the t-th iteration and the freshness constraint h_min;
α_{n,j,t} ∈ {0,1} is the channel allocation variable, wherein α_{n,j,t} = 1 indicates that in the t-th iteration the controller allocates channel j to terminal n for uploading the local model, and otherwise α_{n,j,t} = 0;
β_{n,t} is the batch size, i.e., the number of samples used by terminal n for local model training in the t-th iteration;
the set of the N Internet of Things terminals is expressed as N = {1, …, n, …, N}; the channel set is expressed as J = {1, …, j, …, J}, wherein j = 1, …, J_1 are 5G channels, j = J_1+1, …, J_1+J_2 are WLAN channels, and j = J_1+J_2+1, …, J are PLC channels; the T iterations are expressed as the set T = {1, …, t, …, T};
this means that, for all physically associated terminals, channels and iterations, the energy consumption deviation G_n(t), the regulation information freshness deviation H(t) and the channel allocation variables are all initialized to 0;
the set of terminals to which no channel has been allocated is defined as N_t and initialized as N_t = N; the set of channels allocable to terminal n ∈ N_t is defined as Θ_{n,t} and initialized to the full channel set;
2) Action selection and multi-modal channel allocation conflict resolution stage:
first, the controller selects an action for each terminal based on the ε-greedy algorithm; taking terminal n as an example, S_t is the state space, A_{n,t} is the action space, and θ_{n,t} are the main network parameters; the Q value Q(S_t, A_{n,t}; θ_{n,t}) estimated by the main network of terminal n reflects the value of executing an action of the action space A_{n,t} in the state space S_t; an action is selected at random with probability ε, and the action with the maximum Q value is selected with probability 1 − ε;
second, when a channel allocation conflict exists, e.g., terminals n and m are allocated channel j simultaneously and the Q value of terminal n is not smaller than that of terminal m, the controller compares the Q values of terminals n and m, allocates channel j to terminal n, which has the larger Q value, and rejects terminal m; subsequently, the controller removes terminal n from the set of terminals to which no channel has been allocated, i.e., N_t = N_t \ {n}, and sets the Q values of the rejected terminal m over the action subset a_{m,t} corresponding to channel j so that channel j is excluded from reselection, wherein a_{m,t} is the action subset of terminal m's action space A_{m,t} corresponding to channel j, denoted a_{m,t} = {A_{m,t}(j,1), A_{m,t}(j,2), …, A_{m,t}(j,|Δ_n|)}, and |Δ_n| represents the size of the local data set D_n of terminal n; based on the updated Q values, the action selection and multi-modal channel allocation conflict resolution process is repeated until all terminals are allocated channels;
finally, the controller issues the channel allocation and batch size decisions; each terminal n ∈ N performs local model training and local model uploading according to the decisions and uploads its energy consumption information E_{n,t} to the controller;
3) Learning stage: in the learning stage, after the terminals execute their actions, the controller updates the DQN network parameters by calculating the reward function, so as to improve the fitting accuracy of the DQN to the state-action value, enable the DQN to output the optimal strategy, realize channel allocation and batch size optimization, improve the accuracy of the global model, guarantee the regulation information freshness and reduce the terminal energy consumption;
the learning stage comprises the following steps: first, based on the energy consumption information uploaded by the terminals, the controller updates the terminal energy consumption deficit virtual queue G_n(t+1); meanwhile, the controller calculates the information freshness of the t-th iteration from the received local model timestamps, the model issuing time, the delay of the local model of terminal n, the delay of the last terminal local model and the reciprocal of the local model information age, and updates the regulation information freshness deficit virtual queue H(t+1); the controller then calculates the reward function R_t;
when the regulation information freshness deviates seriously from the prescribed constraint, H(t) increases gradually, which decreases the reward value and forces the controller to adjust the channel allocation and batch size decisions so as to reduce the regulation information age, improve the regulation information freshness and guarantee the timeliness of the local terminal models received by the controller, thereby realizing regulation information freshness awareness and improving the accuracy and reliability of the controller's distributed energy regulation decisions;
next, the controller generates a sample η_{n,t} for updating the experience replay pool and transitions to state S_{t+1}; a mini-batch of samples Φ_n is randomly drawn from the experience replay pool, and |Φ_n| denotes the number of samples in Φ_n; the DQN loss function is calculated as
υ_n = (1/|Φ_n|) Σ_{η∈Φ_n} (y_n − Q(S_t, A_{n,t}; θ_{n,t}))²  (21)
wherein
y_n = R_t + λ max_{A'} Q̂(S_{t+1}, A'; θ̂_{n,t})  (22)
wherein λ is the discount factor and Q̂(·; θ̂_{n,t}) denotes the Q value estimated by the target network;
finally, based on υ_n, the main network parameters θ_{n,t} are updated as
θ_{n,t+1} = θ_{n,t} − κ∇υ_n  (23)
wherein κ is the learning step size; every T_0 iterations, the target network is updated as θ̂_{n,t} ← θ_{n,t}.
2. The method for guaranteeing the freshness of regulation information of park distributed energy according to claim 1, wherein the energy regulation service comprises energy storage regulation, distributed energy output prediction, flexible load regulation and distributed photovoltaic regulation.
3. The method for guaranteeing the freshness of regulation information of park distributed energy according to claim 1, wherein in step S1, a federated learning architecture is adopted to iteratively train the park distributed energy regulation decision model; assuming a total of T iterations are required, expressed as the set T = {1, …, t, …, T},
Each iteration comprises four steps:
1) Issuing a global model: the controller issues the global model to the terminal through a multi-mode communication network fusing AC/DC PLC, WLAN and 5G;
2) Local model training: each terminal executes local model training based on the local data set;
3) Uploading a local model: each terminal uploads the trained local model to the controller through the multi-mode communication network;
4) Global model training: after receiving the local models uploaded by all the terminals, the controller trains the global model based on weighted aggregation, supporting accurate distributed energy regulation optimization.
4. The method for guaranteeing the freshness of regulation information of park distributed energy according to claim 3, wherein the local model training comprises the following process:
the set of the N Internet of Things terminals is expressed as N = {1, …, n, …, N}; in the t-th iteration, terminal n first updates the local model ω_{n,t-1} using the global model ω_{t-1} of the (t−1)-th iteration; subsequently, terminal n trains the local model using part of the samples of the local data set D_n; the number of samples used by terminal n for local model training in the t-th iteration is defined as the batch size β_{n,t}, and a loss function is used to quantify the deviation between the true output and the target output of the model; the local loss function of terminal n at the t-th iteration is defined as the average loss over the local samples
F_n(ω_{n,t-1}, β_{n,t}) = (1/β_{n,t}) Σ_{m=1}^{β_{n,t}} f(ω_{n,t-1}, x_{n,m})
wherein the sample loss function f(ω_{n,t-1}, x_{n,m}) quantifies the performance difference between the output of the local model ω_{n,t-1} on the m-th sample of the local data set D_n and the optimal output; F_n(ω_{n,t-1}, β_{n,t}) reflects the accuracy of the local model ω_{n,t-1} and can be used for local model updating; based on the gradient descent method, the local model of terminal n is updated as
ω_{n,t} = ω_{n,t-1} − γ∇F_n(ω_{n,t-1}, β_{n,t})
wherein γ > 0 is the learning step size and ∇F_n(ω_{n,t-1}, β_{n,t}) is the gradient of the loss function F_n(ω_{n,t-1}, β_{n,t}) with respect to the local model ω_{n,t-1};
the available computing resource of terminal n in the t-th iteration is defined as f_{n,t}; the delay and energy consumption of local model training are then
t^{cmp}_{n,t} = ξ_n β_{n,t} / f_{n,t}
E^{cmp}_{n,t} = e_n ξ_n β_{n,t} f_{n,t}²
wherein e_n is the energy consumption coefficient (watt·s³/cycle³) and ξ_n is the number of CPU cycles required to train a single sample (cycles/sample);
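Editor's note: a minimal numeric sketch of the training delay and energy expressions as reconstructed above; units follow the coefficients just defined, and the names are illustrative.

def local_training_cost(beta: int, xi: float, f: float, e: float):
    # delay = cycles needed / CPU frequency; energy = e * cycles * f^2
    # (beta: samples, xi: cycles/sample, f: cycles/s, e: W*s^3/cycle^3)
    delay = xi * beta / f
    energy = e * xi * beta * f ** 2
    return delay, energy

Note the trade-off this encodes: a larger batch size β increases both training delay and training energy linearly, which is exactly what the controller balances against waiting delay.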
the local model uploading comprises the following process:
the J multi-modal channels include J_1 5G channels, J_2 WLAN channels and J_3 PLC channels, J = J_1 + J_2 + J_3; the channel set is expressed as J = {1, …, j, …, J}, wherein j = 1, …, J_1 are 5G channels, j = J_1+1, …, J_1+J_2 are WLAN channels, and j = J_1+J_2+1, …, J are PLC channels; the channel allocation variable is defined as α_{n,j,t} ∈ {0,1}, wherein α_{n,j,t} = 1 indicates that in the t-th iteration the controller allocates channel j to terminal n for uploading the local model, and otherwise α_{n,j,t} = 0; in the t-th iteration, the transmission rate at which terminal n uploads the model through channel j is
r_{n,j,t} = B_{n,j} log₂(1 + P_{n,j,t} g_{n,j,t} / (I_{n,j,t} + σ²))
wherein B_{n,j} is the channel bandwidth, g_{n,j,t} is the channel gain, P_{n,j,t} is the uplink transmission power, I_{n,j,t} is the electromagnetic interference power caused by the operation of electrical equipment, and σ² is the noise power;
with |ω_{n,t}| defined as the size (bits) of the local model ω_{n,t}, the delay and energy consumption of terminal n uploading the local model are
t^{up}_{n,t} = Σ_{j∈J} α_{n,j,t} |ω_{n,t}| / r_{n,j,t}
E^{up}_{n,t} = Σ_{j∈J} α_{n,j,t} P_{n,j,t} |ω_{n,t}| / r_{n,j,t}
the total energy consumption of terminal n in the t-th iteration is the sum of the energy consumption of local model training and uploading, expressed as
E_{n,t} = E^{cmp}_{n,t} + E^{up}_{n,t}
in the t-th iteration, the delay with which the controller receives the local model of terminal n is
t_{n,t} = t^{cmp}_{n,t} + t^{up}_{n,t}
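Editor's note: an illustrative sketch of the upload rate, delay and energy on the allocated channel, following the reconstructed expressions above; parameter names are editorial assumptions.

import math

def upload_cost(model_bits: float, bandwidth: float, gain: float,
                tx_power: float, emi_power: float, noise_power: float):
    # Shannon-style rate with electromagnetic interference added to the noise
    rate = bandwidth * math.log2(1 + tx_power * gain / (emi_power + noise_power))
    delay = model_bits / rate
    energy = tx_power * delay
    return delay, energy

The terminal's total per-iteration energy and the controller's receive delay are then the sums of the training and uploading components, as stated above.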
the global model training comprises the following process:
after the controller receives the local models of the N terminals, the global model is trained based on weighted aggregation of the local models, expressed as
ω_t = Σ_{n=1}^{N} ρ_{n,t} ω_{n,t}
wherein ρ_{n,t} = β_{n,t} / Σ_{n'=1}^{N} β_{n',t} represents the weight of the local model of terminal n, defined as the ratio of the batch size of terminal n to the sum of the batch sizes of the N terminals;
the difference between the true output and the target output of the global model is quantified by the global loss function, defined as the weighted sum F(ω_t) of the N terminal local loss functions, expressed as
F(ω_t) = Σ_{n=1}^{N} ρ_{n,t} F_n(ω_{n,t-1}, β_{n,t})
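Editor's note: a minimal sketch of the batch-size-weighted aggregation described above; models are represented as plain parameter lists purely for illustration.

def aggregate(local_models: list, batch_sizes: list) -> list:
    # global model = sum_n rho_n * local_model_n, rho_n = beta_n / sum(beta)
    total = float(sum(batch_sizes))
    dim = len(local_models[0])
    global_model = [0.0] * dim
    for w, b in zip(local_models, batch_sizes):
        rho = b / total   # weight = terminal's share of the total batch size
        for i in range(dim):
            global_model[i] += rho * w[i]
    return global_model

This weighting is why enlarging a terminal's batch size both reduces the global loss and increases that terminal's influence on the aggregated model.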
5. The method for guaranteeing the freshness of regulation information of park distributed energy according to claim 4, wherein the local model training, local model uploading and local model weighted aggregation need to satisfy the regulation information freshness constraint;
the regulation information freshness constraint model is as follows:
the information age of the local model obtained by terminal n in the t-th iterative training is defined as the delay from the model leaving terminal n to its participation in the global model training, mainly comprising the transmission delay t^{up}_{n,t} and the waiting delay t^{wait}_{n,t}, expressed as
τ_{n,t} = t^{up}_{n,t} + t^{wait}_{n,t}
the local model waiting delay t^{wait}_{n,t} of terminal n depends on the delay experienced until the controller receives the last terminal local model, expressed as
t^{wait}_{n,t} = max_{n'∈N} {t_{n',t}} − t_{n,t}
the regulation information freshness of terminal n in the t-th iteration is defined as the reciprocal of the local model information age, expressed as
h_{n,t} = 1 / τ_{n,t}
the regulation information freshness is guaranteed by constraining the model with the largest information age; the set of the regulation information freshness of all terminals is defined as h_t = {h_{1,t}, …, h_{n,t}, …, h_{N,t}}, and the long-term constraint model on regulation information freshness over T iterations is constructed as
(1/T) Σ_{t=1}^{T} min{h_t} ≥ h_min
wherein h_min is the information freshness constraint threshold.
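Editor's note: an illustrative sketch of the information age and freshness computation as reconstructed above; it assumes the waiting delay is the gap to the slowest terminal, and the names are editorial.

def regulation_freshness(t_up: list, t_total: list) -> list:
    # wait_n = max_k t_total[k] - t_total[n]; age_n = t_up[n] + wait_n;
    # freshness h_n = 1 / age_n (t_up: upload delays, t_total: receive delays)
    t_last = max(t_total)
    ages = [up + (t_last - tot) for up, tot in zip(t_up, t_total)]
    return [1.0 / age for age in ages]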
6. The method for guaranteeing the freshness of regulation information of park distributed energy according to claim 1, wherein in step S2, while long-term constraints such as the regulation information freshness are guaranteed, the global loss function F(ω_T) of the regulation model after T iterations is minimized through collaborative optimization of power minimalist Internet of Things communication and computing resources; the set of multi-modal channel allocation optimization variables is defined as α_{n,t} = {α_{n,1,t}, …, α_{n,j,t}, …, α_{n,J,t}}, and the set of batch size optimization variables as β_{n,t} ∈ {1, 2, …, |D_n|}; the optimization problem P1 is constructed to minimize F(ω_T) subject to constraints C_1–C_6 (the full expression is rendered as an image in the original), wherein C_1 indicates that each channel can only be allocated to one terminal; C_2 indicates that each terminal can only be assigned one channel; C_3 represents the terminal local model training batch size constraint; C_4 is the long-term constraint on the energy consumption of terminal n, wherein E_{n,max} is the long-term energy budget of terminal n; C_5 is the long-term constraint model on regulation information freshness over T iterations; C_6 represents the terminal transmission power constraint, wherein P_PLC, P_WLAN and P_5G are the PLC, WLAN and 5G channel transmission powers, respectively.
7. The method for guaranteeing the freshness of regulation information of park distributed energy according to claim 6, wherein the optimization strategy of each iteration is coupled both with the global loss function F(ω_T) after T iterations and with long-term constraints such as information freshness, so that the optimization problem P1 is difficult to solve directly and must be decoupled between iterations;
for the first coupling, F(ω_T) is scaled and decoupled (the decoupled expression is rendered as an image in the original, expressing F(ω_T) through the per-iteration increments F(ω_t) − F(ω_{t-1})), wherein F(ω_{t-1}) is the global loss function after the (t−1)-th iteration and a known parameter during the t-th iteration optimization; optimizing F(ω_T) is thus converted into optimizing the loss function F(ω_t) of the t-th iteration;
for the second coupling, based on virtual queue theory, a terminal energy consumption deficit virtual queue G_n(t) and a regulation information freshness deficit virtual queue H(t) are constructed for constraints C_4 and C_5, respectively, whose queue backlogs are updated as
G_n(t+1) = max{G_n(t) + E_{n,t} − E_{n,max}/T, 0}  (18)
H(t+1) = max{H(t) − min{h_t} + h_min, 0}  (19)
wherein G_n(t) represents the deviation between the energy consumption of terminal n after the t-th iteration and the per-iteration energy budget E_{n,max}/T, and H(t) represents the deviation between the regulation information freshness after the t-th iteration and the freshness constraint h_min;
based on the Lyapunov optimization theory, the Lyapunov drift-plus-penalty is calculated and its upper bound is derived, so that P1 is decoupled into a short-term optimization problem of minimizing the loss function of each iteration, whose optimization objective is the weighted sum of the loss function, the regulation information freshness deficit and the terminal energy consumption deficit; the joint optimization problem P2 of the t-th iteration (rendered as an image in the original) has weights V_H and V_G corresponding to the regulation information freshness deficit and the terminal energy consumption deficit, respectively;
the transformed problem P2 is further modeled as an MDP optimization problem, whose key elements comprise the state space, the action space and the reward function:
1) State space: the set of terminal energy consumption deficits is defined as G(t) = {G_1(t), …, G_n(t), …, G_N(t)}, and the set of terminal energy budgets as E_max = {E_{1,max}, …, E_{n,max}, …, E_{N,max}}; the state space comprises the terminal energy consumption deficits, the regulation information freshness deficit, the terminal energy budgets and the regulation information freshness constraint threshold, expressed as S_t = {G(t), H(t), E_max, h_min};
2) Action space: the action space is defined as A_t = {A_{1,t}, …, A_{n,t}, …, A_{N,t}}, wherein A_{n,t}, the action space corresponding to terminal n, is the Cartesian product of α_{n,t} and β_{n,t}, i.e., A_{n,t} = α_{n,t} × β_{n,t};
3) Reward function: the reward function is defined as the optimization objective of P2 (rendered as an image in the original).
8. The method for guaranteeing the freshness of regulation information of park distributed energy according to claim 1, wherein the IDEAL algorithm in step S3 is applied at the control layer and used to coordinate the park terminals participating in the training of the distributed energy regulation decision model; its core idea is to use a deep Q network to quantify and fit the state-action value in the high-dimensional state space, i.e., the Q value representing the accumulated reward of an action, and to optimize the channel allocation and batch size decisions according to the Q values.
CN202210287027.8A 2022-03-22 2022-03-22 Method and system for guaranteeing freshness of regulation and control information of park distributed energy Active CN114626306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210287027.8A CN114626306B (en) 2022-03-22 2022-03-22 Method and system for guaranteeing freshness of regulation and control information of park distributed energy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210287027.8A CN114626306B (en) 2022-03-22 2022-03-22 Method and system for guaranteeing freshness of regulation and control information of park distributed energy

Publications (2)

Publication Number Publication Date
CN114626306A CN114626306A (en) 2022-06-14
CN114626306B true CN114626306B (en) 2023-01-24

Family

ID=81904355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210287027.8A Active CN114626306B (en) 2022-03-22 2022-03-22 Method and system for guaranteeing freshness of regulation and control information of park distributed energy

Country Status (1)

Country Link
CN (1) CN114626306B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114979014A (en) * 2022-06-30 2022-08-30 国网北京市电力公司 Data forwarding path planning method and device and electronic equipment
CN115174396B (en) * 2022-07-02 2024-04-16 华北电力大学 Low-carbon energy management and control communication network service management method based on digital twinning
CN117240610B (en) * 2023-11-13 2024-01-23 傲拓科技股份有限公司 PLC module operation data transmission method and system based on data encryption

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201508269D0 (en) * 2015-05-14 2015-06-24 Barletta Media Ltd A system and method for providing a search engine, and a graphical user interface therefor
CN112637914A (en) * 2020-12-10 2021-04-09 天津(滨海)人工智能军民融合创新中心 DQN algorithm-based channel switching system and method in dual-channel environment
CN112752337A (en) * 2020-12-16 2021-05-04 南京航空航天大学 System and method for keeping information freshness through relay assistance of unmanned aerial vehicle based on Q learning
CN113902021A (en) * 2021-10-13 2022-01-07 北京邮电大学 High-energy-efficiency clustering federal edge learning strategy generation method and device
CN113988356A (en) * 2021-09-02 2022-01-28 华北电力大学 DQN-based 5G fusion intelligent power distribution network energy management method
CN114143355A (en) * 2021-12-08 2022-03-04 华北电力大学 Low-delay safety cloud side end cooperation method for power internet of things
CN114205374A (en) * 2020-09-17 2022-03-18 北京邮电大学 Transmission and calculation joint scheduling method, device and system based on information timeliness

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102253998B (en) * 2011-07-12 2013-08-14 武汉大学 Method for automatically discovering and sequencing outdated webpage based on Web time inconsistency
CN113162798A (en) * 2021-03-03 2021-07-23 国网能源研究院有限公司 Information transmission optimization method and system of wireless power supply communication network
CN113657678A (en) * 2021-08-23 2021-11-16 国网安徽省电力有限公司电力科学研究院 Power grid power data prediction method based on information freshness

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201508269D0 (en) * 2015-05-14 2015-06-24 Barletta Media Ltd A system and method for providing a search engine, and a graphical user interface therefor
CN114205374A (en) * 2020-09-17 2022-03-18 北京邮电大学 Transmission and calculation joint scheduling method, device and system based on information timeliness
CN112637914A (en) * 2020-12-10 2021-04-09 天津(滨海)人工智能军民融合创新中心 DQN algorithm-based channel switching system and method in dual-channel environment
CN112752337A (en) * 2020-12-16 2021-05-04 南京航空航天大学 System and method for keeping information freshness through relay assistance of unmanned aerial vehicle based on Q learning
CN113988356A (en) * 2021-09-02 2022-01-28 华北电力大学 DQN-based 5G fusion intelligent power distribution network energy management method
CN113902021A (en) * 2021-10-13 2022-01-07 北京邮电大学 High-energy-efficiency clustering federal edge learning strategy generation method and device
CN114143355A (en) * 2021-12-08 2022-03-04 华北电力大学 Low-delay safety cloud side end cooperation method for power internet of things

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Age-optimal scheduling for heterogeneous …; Jingzhou Sun, Lehan Wang, Zhiyuan Jiang, Sheng Zhou; IEEE Transactions on Industrial Informatics; 2021-05-31; full text *
Joint rate control and power …; BAO Wei, CHEN He, LI Yonghui, et al.; IEEE Transactions on Industrial Informatics; 2017-07-17; full text *
Dynamic scheduling for …; Yuxuan Sun, Sheng Zhou, Zhisheng Niu, Deniz Gündüz; IEEE Transactions on Industrial Informatics; 2022-01-31; full text *
Efficient federated …; Van-Dinh Nguyen, Shree Krishna Sharma, Thang X. Vu; IEEE Transactions on Industrial Informatics; 2021-03-31; full text *
Low-latency federated …; Yunlong Lu, Xiaohong Huang; IEEE Transactions on Industrial Informatics; 2021-07-31; full text *
Context-learning-based access control method for power Internet of Things (基于上下文学习的电力物联网接入控制方法); Zhou Zhenyu, Jia Zehan, Liao Haijun, Zhao Xiongwen, Zhang Lei; Journal on Communications (通信学报); 2021-03-04; full text *
Research on demand-side response and optimal dispatching strategy of smart grid based on reinforcement learning algorithms (基于强化学习算法的智能电网需求侧响应及优化调度策略研究); Li Jinwei; China Masters' Theses Full-text Database, Information Science and Technology (中国优秀硕士学位论文全文数据库(信息科技辑)); 2022-03-15; full text *

Also Published As

Publication number Publication date
CN114626306A (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN114626306B (en) Method and system for guaranteeing freshness of regulation and control information of park distributed energy
Liu et al. FedCPF: An efficient-communication federated learning approach for vehicular edge computing in 6G communication networks
CN111277437B (en) Network slice resource allocation method for smart power grid
CN112598150B (en) Method for improving fire detection effect based on federal learning in intelligent power plant
CN113326002A (en) Cloud edge cooperative control system based on computing migration and migration decision generation method
CN113778677B (en) SLA-oriented intelligent optimization method for cloud-edge cooperative resource arrangement and request scheduling
CN113905347B (en) Cloud edge end cooperation method for air-ground integrated power Internet of things
CN114650228B (en) Federal learning scheduling method based on calculation unloading in heterogeneous network
CN114641076A (en) Edge computing unloading method based on dynamic user satisfaction in ultra-dense network
CN114885422A (en) Dynamic edge computing unloading method based on hybrid access mode in ultra-dense network
Su et al. Joint DNN partition and resource allocation optimization for energy-constrained hierarchical edge-cloud systems
CN117119486B (en) Deep unsupervised learning resource allocation method for guaranteeing long-term user rate of multi-cell cellular network
CN116009990B (en) Cloud edge collaborative element reinforcement learning computing unloading method based on wide attention mechanism
Jiang et al. MARS: A DRL-based Multi-task Resource Scheduling Framework for UAV with IRS-assisted Mobile Edge Computing System
CN116341679A (en) Design method of federal edge learning scheduling strategy with high aging
CN116484976A (en) Asynchronous federal learning method in wireless network
CN115883371A (en) Virtual network function placement method based on learning optimization method in edge-cloud collaborative system
Do et al. Actor-critic deep learning for efficient user association and bandwidth allocation in dense mobile networks with green base stations
CN115499441A (en) Deep reinforcement learning-based edge computing task unloading method in ultra-dense network
Behmandpoor et al. Model-free decentralized training for deep learning based resource allocation in communication networks
Duan et al. Lightweight federated reinforcement learning for independent request scheduling in microgrids
Ma et al. FLIRRAS: fast learning with integrated reward and reduced action space for online multitask offloading
CN113835894A (en) Intelligent calculation migration method based on double-delay depth certainty strategy gradient
Tong et al. D2OP: A fair dual-objective weighted scheduling scheme in Internet of Everything
CN117539640B (en) Heterogeneous reasoning task-oriented side-end cooperative system and resource allocation method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant