CN115664924A - Endogenous security perception resource management method for social asset participation power grid interaction - Google Patents


Info

Publication number
CN115664924A
Authority
CN
China
Legal status
Pending
Application number
CN202211300064.4A
Other languages
Chinese (zh)
Inventor
周振宇
张孙烜
姚子佳
廖海君
杜治钢
甘忠
朱靖恺
姚贤炯
游兆阳
陈毅龙
肖云杰
宋岩
黄大维
郭磊
冯晨
Current Assignee
North China Electric Power University
State Grid Shanghai Electric Power Co Ltd
Original Assignee
North China Electric Power University
State Grid Shanghai Electric Power Co Ltd
Application filed by North China Electric Power University and State Grid Shanghai Electric Power Co Ltd
Priority to CN202211300064.4A
Publication of CN115664924A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S40/00: Systems for electrical power generation, transmission, distribution or end-user application management characterised by the use of communication or information technologies, or communication or information technology specific aspects supporting them
    • Y04S40/20: Information technology specific aspects, e.g. CAD, simulation, modelling, system security

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an endogenous security perception resource management method for social assets participating in power grid interaction. The underlying system comprises a device layer, a 6G edge intelligence layer and a digital twin layer. In the device layer, communication devices deployed on electrical equipment perform local model training and upload their local model parameters to the 6G edge intelligence layer for global model averaging. The 6G edge intelligence layer deploys an edge server, which guides local model training, formulates the communication device scheduling strategy for parameter upload, and performs model poisoning attack detection on the uploaded local model parameters. The digital twin layer interacts with the park communication devices in real time and assists resource management optimization. The advantages of the invention are that it provides real-time simulation and accurate prediction for the safe and reliable operation of the smart park, achieves endogenous security perception, reduces the optimization space, and accelerates the convergence of multi-timescale resource management optimization.

Description

Endogenous security perception resource management method for social asset participation power grid interaction
Technical Field
The invention relates to the technical field of smart parks, and in particular to a smart park system and an endogenous security perception resource management method for social assets participating in power grid interaction.
Background
A smart park hosts multiple energy entities, such as a high proportion of renewable energy sources and distributed energy storage, and its safe and reliable operation is a key support for implementing China's new energy security strategy and achieving the "dual carbon" goals (carbon peaking and carbon neutrality). To ensure this safe and reliable operation, a large number of communication devices are deployed on electrical equipment such as distributed photovoltaics, charging piles and distribution boxes to provide monitoring, control and management of park operation. Digital twin technology bridges the gap between physical space and digital systems, providing real-time simulation and accurate prediction for the safe and reliable operation of the smart park. The devices train models locally and upload the model parameters to an edge server for global model averaging, thereby updating the digital twin model.
To keep the digital twin accurately synchronized in real time with the physical network of the smart park, local training period scheduling and communication device scheduling must be managed intelligently and flexibly. However, the energy supply, energy use and asset ownership relationships of a smart park are complex, and a large number of social assets participate in power grid interaction through operator networks. The resulting communication environment is complex: various malicious attackers can intrude into the park communication network, and digital twin model updates face diverse and frequent cyberattack threats. Among these, model poisoning attacks have the greatest impact on digital twin model updates. By controlling communication devices to upload erroneous model parameters, a malicious attacker corrupts the global model averaging, which reduces the accuracy of the digital twin model, biases its simulations and predictions, and seriously threatens the security of the smart park. Therefore, while optimizing park communication resource management, endogenous security perception must be introduced so that the park actively adjusts its resource management strategy to counter malicious attackers when social assets participate in the interaction. However, communication resource management with endogenous security perception for social-asset interaction is still at an early stage, and the following challenges must be solved:
First, to counter the loss of digital twin model accuracy caused by model poisoning attacks from malicious attackers during power grid interaction, the edge server can increase the number of scheduled devices and the local training period. However, this increases the total delay of digital twin model training.
Second, because the social assets of the smart park frequently participate in power grid interaction, the park's network environment is complex and the behavior of malicious attackers is highly random. Faced with unpredictable attacks, the edge server has difficulty actively adjusting its resource management strategy to guarantee high-accuracy, low-delay digital twin model training.
Third, the probability of model poisoning attacks changes much more slowly than channel conditions, so local training period scheduling should be optimized on a large timescale, while device scheduling must be optimized on a small timescale to keep the digital twin model updated in real time. However, because the optimization space is large, the edge server has difficulty finding an optimal multi-timescale resource management strategy within a limited optimization time, and the optimization converges slowly.
Prior Art 1
Device scheduling algorithm based on deep reinforcement learning
When optimizing resource management, prior art 1 considers only small-timescale device scheduling and not large-timescale local training period scheduling, so it cannot effectively guarantee convergence of the digital twin loss function. It also does not consider model poisoning attack detection, so the algorithm lacks endogenous security perception and cannot actively adjust its resource management strategy.
Prior Art 2
Client scheduling algorithm based on upper confidence bound
When optimizing resource management, prior art 2 considers only the average total training time and small-timescale device scheduling; it considers neither the digital twin loss function nor large-timescale local training period scheduling, so it cannot effectively guarantee the accuracy of the digital twin model or the convergence of the loss function. It also does not consider model poisoning attack detection, so the algorithm has difficulty perceiving endogenous security and actively adjusting its resource management strategy.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a smart park system, and an endogenous security perception resource management method and device for social assets participating in power grid interaction.
To achieve this purpose, the invention adopts the following technical solution:
A smart park system, comprising a device layer, a 6G edge intelligence layer and a digital twin layer;
the device layer comprises communication devices and electrical equipment, the communication devices being deployed on the electrical equipment to perform local model training; the communication devices perform local model training according to a local training period scheduling strategy and upload local model parameters to the 6G edge intelligence layer according to a communication device scheduling strategy for global model averaging, thereby updating the digital twin model of the digital twin layer; the electrical equipment includes photovoltaics and charging piles;
the 6G edge intelligence layer deploys a base station equipped with an edge server; the edge server schedules the local training period to guide local model training and formulates the communication device scheduling strategy for local parameter upload; the edge server also performs model poisoning attack detection on the uploaded local model parameters;
the digital twin layer is maintained by the edge server of the 6G edge intelligence layer; it interacts with the park communication devices in real time and stays synchronized with the physical network; it estimates channel gain and electromagnetic interference from the local state information uploaded by the device layer, and estimates the model poisoning attack probability from the model poisoning attack detection of the 6G edge intelligence layer, thereby assisting the 6G edge intelligence layer in optimizing local training period scheduling and communication device scheduling.
Further, performing local model training according to the local training period scheduling strategy comprises:
in each iteration, the edge server schedules communication devices to upload local model parameters for global model averaging; a communication device that the edge server has identified as malicious can no longer perform local training or upload parameters; let the set of available communication devices in the t-th iteration be $\mathcal{D}_{ava}(t)$, with size $|\mathcal{D}_{ava}(t)|$, which satisfies
[equation image BDA0003903944180000041 not reproduced]
Define $a_k(t)\in\{0,1\}$ as the scheduling indicator variable of communication device $d_k$ in the t-th iteration: $a_k(t)=1$ means that $d_k$ is scheduled in the t-th iteration; otherwise $a_k(t)=0$;
In the t-th iteration, each scheduled communication device $d_k$ downloads the global model from the edge server to update its local model, as follows:
[equation image BDA0003903944180000042 not reproduced]
where $\tau(r)$ is the local training period of the r-th communication round, $\omega_g(t-1)$ is the global model of the (t-1)-th iteration, and $\omega_k(t-1)$ is the local model of communication device $d_k$ in the (t-1)-th iteration;
in each iteration, communication device $d_k$ trains its local model on its local dataset $\mathcal{D}_k$; let $x_j$ and $y_j$ be the input and target output of the j-th sample of $\mathcal{D}_k$; the loss function of communication device $d_k$ in the t-th iteration is then
$$F_k(\omega_k(t)) = \frac{1}{|\mathcal{D}_k|}\sum_{j=1}^{|\mathcal{D}_k|} f(\omega_k(t); x_j, y_j)$$
where $F_k(\omega_k(t))$ is the loss function of communication device $d_k$ in the t-th iteration, $|\mathcal{D}_k|$ is the number of samples in $\mathcal{D}_k$, and $f(\cdot)$ is the loss of a single sample, defined as the deviation of the actual output from the target output; using gradient descent, $\omega_k(t)$ is updated as
$$\omega_k(t) \leftarrow \omega_k(t) - \gamma \nabla F_k(\omega_k(t))$$
where $\gamma$ is the learning step size;
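The local update can be illustrated with a minimal sketch. It assumes a one-parameter linear model and squared error as the per-sample loss f (the source only describes f as the deviation of the actual output from the target output), and it treats the local training period τ(r) as the number of full-batch gradient steps. The names `local_loss` and `local_train` and the toy data are hypothetical.

```python
def local_loss(w, data):
    """F_k(w) = (1/|D_k|) * sum_j f(w; x_j, y_j), with squared error as the assumed f."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def local_train(w, data, gamma, tau):
    """Run tau full-batch gradient-descent steps w <- w - gamma * dF_k/dw."""
    n = len(data)
    for _ in range(tau):
        # Derivative of the mean squared error of the model y_hat = w * x.
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / n
        w = w - gamma * grad
    return w


# Toy local dataset D_k generated by y = 2x (hypothetical values).
data_k = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_global = 0.0  # stands in for omega_g(t-1) downloaded from the edge server
w_local = local_train(w_global, data_k, gamma=0.05, tau=50)
print(round(w_local, 4))  # 2.0
```

A larger τ(r) moves the local model further toward the local optimum per round, at the cost of a proportionally longer local training time.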
thus, the local training delay $L_k^{L}(t)$ of communication device $d_k$ in the t-th iteration is
$$L_k^{L}(t) = \frac{\tau(r)\,\zeta_k\,|\mathcal{D}_k|}{f_k(t)}$$
where $\zeta_k$ is the number of CPU cycles required to process one sample, $f_k(t)$ is the locally available computing resource, $\tau(r)$ is the local training period of the r-th communication round, and $|\mathcal{D}_k|$ is the number of samples in $\mathcal{D}_k$.
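As a quick numeric check, the sketch below assumes the local training delay takes the form τ(r)·ζ_k·|D_k| / f_k(t), which is what the listed variables suggest; the function name and the device figures are hypothetical.

```python
def local_training_delay(tau_r, zeta_k, num_samples, f_k):
    """Local training delay of device d_k, assumed form tau(r) * zeta_k * |D_k| / f_k(t).

    tau_r: local training period (passes over the data), zeta_k: CPU cycles per sample,
    num_samples: |D_k|, f_k: available CPU frequency in cycles per second.
    """
    return tau_r * zeta_k * num_samples / f_k


# Hypothetical device: 5 local epochs, 2e4 cycles/sample, 1000 samples, 1 GHz CPU.
print(local_training_delay(5, 2e4, 1000, 1e9))  # 0.1 (seconds)
```

The delay grows linearly with τ(r), which is the accuracy-versus-delay tension the background section describes.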
Further, uploading the local model parameters to the 6G edge intelligence layer specifically comprises:
after local model training ends, each scheduled communication device uploads its local model parameters and local state information; the transmission rate is
$$R_k(t) = B_k \log_2\left(1 + \frac{p_k\, g_k(t)}{\sigma_0 + I_k(t)}\right)$$
where $R_k(t)$ is the transmission rate of communication device $d_k$ in the t-th iteration, $B_k$ is the transmission bandwidth, $p_k$ and $g_k(t)$ are the transmission power and channel gain, and $\sigma_0$ and $I_k(t)$ are the white Gaussian noise and the electromagnetic interference, respectively; $|\omega_k(t)|$ is the size of the local model parameters and $|s_k(t)|$ is the size of the local state information; because $|s_k(t)|$ is small, its upload delay is ignored, and the local model parameter upload delay $L_k^{U}(t)$ is
$$L_k^{U}(t) = \frac{|\omega_k(t)|}{R_k(t)}$$
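The rate and upload-delay computation can be sketched as follows, assuming a Shannon-type rate B_k·log2(1 + p_k·g_k(t) / (σ_0 + I_k(t))), where I_k(t) is a hypothetical name for the electromagnetic interference term. The numbers are toy values chosen so that the SINR is exactly 1; they are not realistic radio parameters.

```python
import math


def transmission_rate(bandwidth, power, gain, noise, interference):
    """Assumed Shannon-type rate R_k(t) = B_k * log2(1 + p_k g_k(t) / (sigma_0 + I_k(t)))."""
    sinr = power * gain / (noise + interference)
    return bandwidth * math.log2(1.0 + sinr)


def upload_delay(model_bits, rate):
    """Upload delay |omega_k(t)| / R_k(t); the small state upload |s_k(t)| is ignored."""
    return model_bits / rate


# Toy numbers chosen so that the SINR is exactly 1 (not realistic radio values).
rate = transmission_rate(bandwidth=1e6, power=0.5, gain=1.0, noise=0.25, interference=0.25)
print(rate)                     # 1000000.0 bit/s
print(upload_delay(2e6, rate))  # 2.0 s for a 2 Mbit model
```

Higher interference lowers the rate and lengthens the upload, which is why the digital twin layer's interference estimate feeds into device scheduling.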
Further, performing global model averaging in the 6G edge intelligence layer specifically comprises:
after the scheduled communication devices upload their local model parameters and state information to the edge server, the edge server performs global model averaging to update the digital twin model; the global model $\omega_g(t)$ of the t-th iteration is
$$\omega_g(t) = \frac{\sum_{k} a_k(t)\,|\mathcal{D}_k|\,\omega_k(t)}{\sum_{k} a_k(t)\,|\mathcal{D}_k|}$$
where $\omega_g(t)$ is the global model of the t-th iteration, $|\mathcal{D}_k|$ is the number of samples in $\mathcal{D}_k$, $a_k(t)$ is the scheduling indicator variable of communication device $d_k$ in the t-th iteration, and $\omega_k(t)$ is the local model parameters of $d_k$ in the t-th iteration;
the global model averaging delay $L_G(t)$ of the t-th iteration is
$$L_G(t) = \frac{\lambda_0 \sum_{k} a_k(t)\,|\omega_k(t)|}{f_g(t)}$$
where $f_g(t)$ is the computing resource available to the edge server in the t-th iteration, $\lambda_0$ is the number of CPU cycles required to process 1 bit of data, $|\omega_k(t)|$ is the size of $\omega_k(t)$, and $a_k(t)$ is the scheduling indicator variable of communication device $d_k$ in the t-th iteration;
the accuracy of the global model is measured by the loss function $F_g(\omega_g(t), t)$ of $\omega_g(t)$:
$$F_g(\omega_g(t), t) = \frac{\sum_{k} a_k(t)\,|\mathcal{D}_k|\,F_k(\omega_k(t))}{\sum_{k} a_k(t)\,|\mathcal{D}_k|}$$
where $F_k(\omega_k(t))$ is the loss function of communication device $d_k$ in the t-th iteration, $|\mathcal{D}_k|$ is the number of samples in $\mathcal{D}_k$, and $a_k(t)$ is the scheduling indicator variable of $d_k$ in the t-th iteration.
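The edge-server side of an iteration can be sketched with plain lists. The sketch assumes the global model and global loss are averages weighted by a_k(t)·|D_k| and that the averaging delay is λ_0·Σ a_k(t)|ω_k(t)| / f_g(t); these forms are inferred from the listed variables, and all names and numbers are hypothetical.

```python
def global_average(models, sizes, scheduled):
    """Weighted average of scheduled local models, weights a_k(t) * |D_k| (assumed form)."""
    total = sum(a * n for a, n in zip(scheduled, sizes))
    dim = len(models[0])
    return [
        sum(a * n * m[i] for m, n, a in zip(models, sizes, scheduled)) / total
        for i in range(dim)
    ]


def global_average_delay(model_bits, scheduled, lambda0, f_g):
    """Assumed form lambda0 * sum_k a_k(t) |omega_k(t)| / f_g(t), in seconds."""
    return lambda0 * sum(a * b for a, b in zip(scheduled, model_bits)) / f_g


def global_loss(local_losses, sizes, scheduled):
    """Data-size-weighted average of the scheduled devices' local losses."""
    total = sum(a * n for a, n in zip(scheduled, sizes))
    return sum(a * n * f for f, n, a in zip(local_losses, sizes, scheduled)) / total


models = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
sizes = [100, 300, 200]
scheduled = [1, 1, 0]  # the third device is not scheduled, so it is ignored
print(global_average(models, sizes, scheduled))          # [2.5, 3.5]
print(global_loss([0.25, 0.75, 9.0], sizes, scheduled))  # 0.625
print(global_average_delay([1e6, 1e6, 1e6], scheduled, 10, 1e10))
```

Note how the unscheduled outlier [100, 100] has no influence: scheduling is itself a first line of defence, which detection then complements.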
Further, the model poisoning attack detection includes:
the model poisoning attack detection computes the Euclidean distance between the parameters under test and the average of the other scheduled parameters, normalizes it, and compares the result with a preset threshold: if it exceeds the threshold, the parameters are judged erroneous; otherwise they are judged normal; the average parameter excluding $\omega_k(t)$, denoted $\bar{\omega}_{-k}(t)$, is computed as
$$\bar{\omega}_{-k}(t) = \frac{\sum_{z \neq k} a_z(t)\,\omega_z(t)}{\sum_{z \neq k} a_z(t)}$$
where $a_z(t)$ is the scheduling indicator variable of communication device $d_z$ in the t-th iteration and $\omega_z(t)$ is the local model parameters of $d_z$ in the t-th iteration;
the Euclidean norm $d_k(t)$ of the difference between $\omega_k(t)$ and $\bar{\omega}_{-k}(t)$ is
$$d_k(t) = \left\| \omega_k(t) - \bar{\omega}_{-k}(t) \right\|_2$$
the reference ratio $e_k(t)$ of $d_k(t)$ to $\|\bar{\omega}_{-k}(t)\|_2$ is
$$e_k(t) = \frac{d_k(t)}{\left\| \bar{\omega}_{-k}(t) \right\|_2}$$
where $\|\bar{\omega}_{-k}(t)\|_2$ is the Euclidean norm of $\bar{\omega}_{-k}(t)$ and $d_k(t)$ is the Euclidean norm of $\omega_k(t) - \bar{\omega}_{-k}(t)$;
let $b_k(t)\in\{0,1\}$ be the attack detection variable, where $b_k(t)=1$ means the model parameters are normal and $b_k(t)=0$ otherwise; model poisoning attack detection is expressed as
$$b_k(t) = \begin{cases} 1, & e_k(t) \le e_{Thr} \\ 0, & e_k(t) > e_{Thr} \end{cases}$$
where $e_{Thr}$ is the detection threshold;
when the proportion of erroneous model parameters uploaded by communication device $d_k$ up to the t-th iteration exceeds the threshold $\xi$, i.e.
$$\frac{\sum_{z=1}^{t} a_k(z)\,\big(1 - b_k(z)\big)}{\sum_{z=1}^{t} a_k(z)} > \xi$$
where $\xi$ is the threshold on the proportion of erroneous model parameters, $a_k(z)$ is the scheduling indicator variable of $d_k$ in the z-th iteration and $b_k(z)$ is its attack detection variable, communication device $d_k$ is treated as a malicious communication device and removed from $\mathcal{D}_{ava}(t+1)$.
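A minimal sketch of the detection rule follows. It assumes an unweighted leave-one-out mean over the other scheduled devices, the ratio e_k(t) = ‖ω_k(t) − ω̄_{-k}(t)‖₂ / ‖ω̄_{-k}(t)‖₂, and a cumulative error ratio for the malicious-device test; the helper names and the convention of giving unscheduled devices b_k = 1 are hypothetical.

```python
import math


def euclidean_norm(v):
    return math.sqrt(sum(x * x for x in v))


def leave_one_out_mean(models, scheduled, k):
    """Average of the other scheduled devices' parameters (excluding device k)."""
    others = [m for z, m in enumerate(models) if z != k and scheduled[z] == 1]
    if not others:
        raise ValueError("need at least two scheduled devices")
    return [sum(m[i] for m in others) / len(others) for i in range(len(models[k]))]


def detect(models, scheduled, e_thr):
    """Return b_k(t) per device: 1 = normal, 0 = erroneous.

    Unscheduled devices are not tested; they get b_k = 1 here (a hypothetical
    convention) so they never count as erroneous.
    """
    b = []
    for k, a in enumerate(scheduled):
        if a == 0:
            b.append(1)
            continue
        mean_k = leave_one_out_mean(models, scheduled, k)
        d_k = euclidean_norm([x - y for x, y in zip(models[k], mean_k)])
        e_k = d_k / euclidean_norm(mean_k)  # reference ratio e_k(t)
        b.append(1 if e_k <= e_thr else 0)
    return b


def is_malicious(history, xi):
    """history: (a_k(z), b_k(z)) pairs for z = 1..t; flag if the error ratio exceeds xi."""
    n_scheduled = sum(a for a, _ in history)
    n_errors = sum(a * (1 - b) for a, b in history)
    return n_scheduled > 0 and n_errors / n_scheduled > xi


# Nine honest devices at [1, 1] and one poisoned update (hypothetical values).
models = [[1.0, 1.0] for _ in range(9)] + [[-2.0, -2.0]]
scheduled = [1] * 10
print(detect(models, scheduled, e_thr=1.0))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
print(is_malicious([(1, 0), (0, 1), (1, 0), (1, 1)], xi=0.5))  # True
```

The leave-one-out mean keeps the tested update out of its own reference; with very few scheduled devices, however, a single large outlier can still pull the reference for honest devices, so the threshold e_Thr has to be chosen with the scheduled-set size in mind.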
Further, the total digital twin model training delay $L_{Sum}(t)$ of the t-th iteration consists of the maximum local training delay in the device layer, the maximum model parameter upload delay from the device layer to the 6G edge intelligence layer, and the global model averaging delay in the 6G edge intelligence layer:
$$L_{Sum}(t) = \max_{k}\, a_k(t)\,L_k^{L}(t) + \max_{k}\, a_k(t)\,L_k^{U}(t) + L_G(t)$$
where $L_{Sum}(t)$ is the total delay of digital twin model training, $L_k^{L}(t)$ is the local training delay, $L_k^{U}(t)$ is the local model parameter upload delay, and $L_G(t)$ is the global model averaging delay.
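Because each iteration is synchronous, the slowest scheduled device sets the pace. The sketch below takes the description literally (maximum local delay plus maximum upload delay plus the global averaging delay, where the two maxima may belong to different devices); the function name and the delay values are hypothetical.

```python
def total_delay(scheduled, local_delays, upload_delays, global_delay):
    """L_Sum(t) = max local delay + max upload delay over scheduled devices + L_G(t)."""
    local = [d for a, d in zip(scheduled, local_delays) if a == 1]
    upload = [d for a, d in zip(scheduled, upload_delays) if a == 1]
    return max(local) + max(upload) + global_delay


# The unscheduled second device is slow, but it does not affect the total.
print(total_delay([1, 0, 1], [0.5, 2.0, 0.25], [0.125, 3.0, 0.375], 0.125))  # 1.0
```

Scheduling the slow device as well would raise the total to 5.125 s, which illustrates why device scheduling directly trades delay against the amount of data contributing to the model.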
The invention also discloses an endogenous security perception resource management method for social assets participating in power grid interaction, which is implemented on the basis of the above smart park system;
the endogenous security perception resource management method comprises the following steps:
Step 1, initialization: initialize the evaluation network parameters and target network parameters of large-scale local training period scheduling, and the evaluation network parameters and target network parameters of small-scale device scheduling; set the small-scale device scheduling overhead function $\Omega(a_T(r), a(t)) = 0$ and the large-scale local training period scheduling overhead function to 0, where $a_T(r)$ is the large-scale local training period scheduling action and $a(t)$ is the small-scale device scheduling action;
Step 2, multi-timescale resource management action selection: at the start of each communication round, i.e. $t = (r-1)T_0 + 1$, the edge server makes the large-timescale local training period scheduling decision using an ε-greedy algorithm; similarly, at the start of each iteration, the edge server makes the small-timescale device scheduling decision using the ε-greedy algorithm; the devices perform local model training and upload local model parameters according to these decisions; the edge server performs global model averaging and model poisoning attack detection and computes the poisoning attack probability;
Step 3, overhead function calculation and smart park state transition: at the end of each iteration, the edge server observes the digital twin loss function and the average total training delay, and computes the overhead function of the small-scale device scheduling network;
at the end of each communication round, the edge server computes the overhead function of the large-scale local training period scheduling network;
for large-scale local training period scheduling, the edge server generates experience data $u_T(r)$ to update the experience replay pool $U_T(r)$ and moves from the current state $S_T(r)$ to the next state $S_T(r+1)$; similarly, for small-scale device scheduling, the edge server generates experience data $u_D(t) = \{S_D(t), A_D(t), \Omega(a_T(r), a(t)), S_D(t+1)\}$ to update the experience replay pool $U_D(t)$ and moves from the current state $S_D(t)$ to the next state $S_D(t+1)$;
Step 4, learning and multi-timescale DQN network update: at the end of each communication round, the edge server draws a random sample set from the experience replay pool $U_T(r)$ and computes the loss function $\eta_T(r)$ of the large-scale local training period scheduling network as follows:
[equation image BDA0003903944180000093 not reproduced]
where $\eta_T(r)$ is the loss function of the large-scale local training period scheduling network and is built from the Q value of the large-scale local training period scheduling evaluation network and the Q value of the local training period scheduling target network; $S_T(r+1)$ and $A_T(r+1)$ are the state space and action space of the (r+1)-th communication round, and the evaluation network parameters are those of the r-th round; the Q value of the local training period scheduling target network is computed as
[equation image BDA0003903944180000097 not reproduced]
where $\iota$ is the discount factor, $\Omega(a_T(r), a(t))$ is the small-scale device scheduling overhead function, and the remaining term is the Q value of the large-scale local training period scheduling evaluation network;
based on the loss function $\eta_T(r)$, the edge server updates the large-scale evaluation network parameters by gradient descent; at the end of each iteration, the edge server draws a random sample set from the experience replay pool $U_D(t)$, computes the loss function $\eta_D(t)$ of the small-scale device scheduling network, and updates the small-scale evaluation network parameters;
every $R_0$ rounds, the large-scale local training period scheduling target network parameters are updated; every $T_0$ iterations, the small-scale device scheduling target network parameters are updated;
Step 5: repeat steps 2 to 4 until $t = T$.
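The timing structure of steps 2 to 5 (one large-timescale decision per round of T_0 iterations, one small-timescale decision per iteration, ε-greedy selection and two replay pools) can be sketched as below. This is a skeleton under stated assumptions, not the patented learner: states are omitted, the two DQNs are replaced by tabular running-average overhead estimates, the round overhead is taken as the sum of per-iteration overheads, and `toy_overhead`, `run_two_timescale` and all numbers are hypothetical.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon; otherwise pick the action with the lowest
    estimated overhead (overheads are minimised, so argmin rather than argmax)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return min(range(len(q_values)), key=lambda a: q_values[a])


def run_two_timescale(T, T0, tau_choices, device_choices, overhead,
                      epsilon=0.1, alpha=0.1, seed=0):
    """Skeleton of steps 2-5; tabular estimates stand in for the two DQNs."""
    rng = random.Random(seed)
    q_T = [0.0] * len(tau_choices)     # large-timescale estimates (per tau choice)
    q_D = [0.0] * len(device_choices)  # small-timescale estimates (per device count)
    pool_T = deque(maxlen=1000)        # replay pool U_T
    pool_D = deque(maxlen=1000)        # replay pool U_D
    a_T, round_cost = 0, 0.0
    for t in range(1, T + 1):
        r = (t - 1) // T0 + 1
        if t == (r - 1) * T0 + 1:      # start of round r: choose tau(r)
            a_T = epsilon_greedy(q_T, epsilon, rng)
            round_cost = 0.0
        a_D = epsilon_greedy(q_D, epsilon, rng)  # every iteration: choose devices
        cost = overhead(tau_choices[a_T], device_choices[a_D])
        pool_D.append((t, a_D, cost))
        q_D[a_D] += alpha * (cost - q_D[a_D])    # stand-in for the DQN update
        round_cost += cost                       # hypothetical round overhead
        if t == r * T0:                          # end of round r
            pool_T.append((r, a_T, round_cost))
            q_T[a_T] += alpha * (round_cost - q_T[a_T])
    return q_T, q_D, pool_T, pool_D


# Hypothetical overhead: lowest with tau = 3 local epochs and 4 scheduled devices.
def toy_overhead(tau, n_devices):
    return abs(tau - 3) + abs(n_devices - 4)


q_T, q_D, pool_T, pool_D = run_two_timescale(
    T=2000, T0=10, tau_choices=[1, 2, 3, 4, 5],
    device_choices=[2, 3, 4, 5, 6], overhead=toy_overhead)
print(len(pool_T), len(pool_D))  # 200 1000
```

Nesting the loops this way means the small-timescale learner always acts under a fixed τ(r) for a whole round, which matches the description of learning device scheduling on top of the large-timescale decision.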
Further, in step 2, the poisoning attack probability is computed as
$$\hat{P}(t) = \frac{\sum_{k} a_k(t)\,\big(1 - b_k(t)\big)}{\sum_{k} a_k(t)}$$
where $\hat{P}(t)$ is the estimated model poisoning attack probability of the t-th iteration, $a_k(t)$ is the scheduling indicator variable of communication device $d_k$, and $b_k(t)$ is the attack detection variable.
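Assuming the estimate is the share of scheduled uploads flagged as erroneous in the current iteration (the only combination of a_k(t) and b_k(t) the definitions suggest), it reduces to a few lines; the function name and the zero-upload convention are hypothetical.

```python
def attack_probability(scheduled, detected):
    """Share of scheduled devices whose parameters were flagged (b_k = 0) this iteration."""
    n = sum(scheduled)
    if n == 0:
        return 0.0  # no uploads, no evidence of attack (hypothetical convention)
    return sum(a * (1 - b) for a, b in zip(scheduled, detected)) / n


print(attack_probability([1, 1, 1, 1, 0], [1, 0, 1, 1, 1]))  # 0.25
```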
Further, in step 3, the overhead function $\Omega(a_T(r), a(t))$ of the small-scale device scheduling network is computed as
[equation image BDA0003903944180000103 not reproduced]
where
[equation image BDA0003903944180000104 not reproduced]
δ is the upper bound of the divergence between the gradient of the local model loss function and the gradient of the digital twin loss function; ρ and β indicate that the local model loss function is ρ-Lipschitz and β-smooth; θ is the upper bound of the parameter norms of all local models; Θ is the lower bound of the divergence between the digital twin loss function of the current iteration and the converged loss function $F^*$;
[equation image BDA0003903944180000106 not reproduced]
and $\hat{P}(t)$ is the estimated model poisoning attack probability of the t-th iteration;
at the end of each communication round, i.e. $t = rT_0$, the edge server computes the overhead function of the large-scale local training period scheduling network as
[equation image BDA0003903944180000109 not reproduced]
which defines the large-scale local training period scheduling network overhead function in terms of the small-scale device scheduling overhead function $\Omega(a_T(r), a(t))$.
Further, in step 4, the loss function $\eta_T(r)$ of the large-scale local training period scheduling network is computed as
[equation image BDA0003903944180000111 not reproduced]
where $\eta_T(r)$ is the loss function of the large-scale local training period scheduling network and is built from the Q value of the large-scale local training period scheduling evaluation network and the Q value of the local training period scheduling target network; $S_T(r+1)$ and $A_T(r+1)$ are the state space and action space of the (r+1)-th communication round, and the evaluation network parameters are those of the r-th round; the Q value of the local training period scheduling target network is computed as
[equation image BDA0003903944180000115 not reproduced]
where $\iota$ is the discount factor, $\Omega(a_T(r), a(t))$ is the small-scale device scheduling overhead function, and the remaining term is the Q value of the large-scale local training period scheduling evaluation network;
based on $\eta_T(r)$, the edge server updates the large-scale evaluation network parameters $\theta_T$ by gradient descent:
$$\theta_T(r+1) = \theta_T(r) - \kappa\, \nabla \eta_T(r)$$
where κ is the learning step size, $\theta_T(r+1)$ denotes the large-scale local training period scheduling evaluation network parameters for the (r+1)-th communication round, and $\nabla \eta_T(r)$ is the gradient of the loss function $\eta_T(r)$;
at the end of each iteration, the edge server draws a random sample set from the experience replay pool $U_D(t)$, computes the loss function $\eta_D(t)$ of the small-scale device scheduling network, and updates the small-scale evaluation network parameters $\theta_D$ as follows:
[equation image BDA00039039441800001113 not reproduced]
$$\theta_D(t+1) = \theta_D(t) - \kappa\, \nabla \eta_D(t)$$
where $\eta_D(t)$ is the loss function of the small-scale device scheduling network and is built from the Q value of the device scheduling target network and the Q value of the small-scale device scheduling evaluation network; $S_D(t+1)$ and $A_D(t+1)$ are the state space and action space of the (t+1)-th iteration, and the evaluation network parameters are those of the t-th iteration; κ is the learning step size; $\theta_D(t+1)$ denotes the small-scale device scheduling evaluation network parameters of the (t+1)-th iteration; and $\nabla \eta_D(t)$ is the gradient of $\eta_D(t)$;
every $R_0$ communication rounds, the large-scale local training period scheduling target network parameters are updated; every $T_0$ iterations, the small-scale device scheduling target network parameters are updated.
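The update pattern above (minibatch from a replay pool, gradient step θ ← θ − κ∇η, periodic copy into the target network) can be sketched with a linear Q-function instead of a neural network. Because the exact loss and target formulas are only available as equation images, this sketch assumes the standard DQN squared temporal-difference error with a min-based target y = c + ι·min Q(s′, a′; θ⁻), since the scheduling objectives are overheads to be minimised. `LinearQ`, `dqn_update`, `SYNC_EVERY` and the toy pool are hypothetical.

```python
import random


class LinearQ:
    """Q(s, a; theta) = theta[a] . s, a linear stand-in for a DQN."""

    def __init__(self, n_actions, n_features):
        self.theta = [[0.0] * n_features for _ in range(n_actions)]

    def q(self, s, a):
        return sum(w * x for w, x in zip(self.theta[a], s))

    def copy_from(self, other):
        self.theta = [row[:] for row in other.theta]


def dqn_update(evaluate, target, batch, iota, kappa):
    """One step theta <- theta - kappa * grad(eta) on a sampled minibatch.

    eta = mean of (Q(s, a; theta) - y)^2 with the assumed TD target
    y = c + iota * min_a' Q(s', a'; theta_target), since overheads are minimised.
    """
    n_actions = len(evaluate.theta)
    grads = [[0.0] * len(row) for row in evaluate.theta]
    loss = 0.0
    for s, a, c, s_next in batch:
        y = c + iota * min(target.q(s_next, b) for b in range(n_actions))
        err = evaluate.q(s, a) - y
        loss += err * err
        for i, x in enumerate(s):
            grads[a][i] += 2.0 * err * x
    m = len(batch)
    for a in range(n_actions):
        for i in range(len(grads[a])):
            evaluate.theta[a][i] -= kappa * grads[a][i] / m
    return loss / m


# Toy replay pool: one state, action 0 costs 1.0, action 1 costs 2.0 (hypothetical).
s = [1.0]
pool = [(s, 0, 1.0, s), (s, 1, 2.0, s)] * 5
rng = random.Random(0)
evaluate, target = LinearQ(2, 1), LinearQ(2, 1)
SYNC_EVERY = 10  # plays the role of the R_0 (or T_0) target-update period
for step in range(1, 5001):
    dqn_update(evaluate, target, rng.sample(pool, 4), iota=0.5, kappa=0.1)
    if step % SYNC_EVERY == 0:
        target.copy_from(evaluate)
print(round(evaluate.q(s, 0), 3), round(evaluate.q(s, 1), 3))  # 2.0 3.0
```

With ι = 0.5 the fixed point is Q(s, 0) = 1 + 0.5·2 = 2 and Q(s, 1) = 2 + 0.5·2 = 3, so the learned values can be checked by hand. Freezing the target between synchronisations keeps the regression target stable, which is the role the R_0 and T_0 periods play in the method.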
The invention also discloses a computer device, comprising:
at least one processor; and a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, enabling the at least one processor to perform the endogenous security perception resource management method described above.
The invention also discloses a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the endogenous security perception resource management method described above.
Compared with the prior art, the invention has the following advantages:
1. Frequent interaction among the multiple layers of the smart park system supports real-time simulation and accurate prediction for the safe and reliable operation of the smart park.
2. The invention establishes a communication resource management model that jointly optimizes the digital twin loss function and the average total training delay, taking their weighted sum as the optimization objective. By optimizing the local training period and the communication device scheduling strategy under model poisoning attacks, it dynamically balances high accuracy and low delay in digital twin model training, providing real-time simulation and accurate prediction for the safe and reliable operation of the smart park.
3. The invention uses model poisoning attack detection to identify erroneous model parameters and estimate the attack probability. When the estimated model poisoning attack probability changes, the edge server actively adjusts the local training period and the device scheduling strategy to maintain the digital twin loss function and the average total digital twin model training delay, thereby achieving endogenous security perception when the social assets of the smart park participate in power grid interaction.
4. The invention derives the minimum number of scheduled devices, which reduces the optimization space and accelerates the convergence of multi-timescale resource management optimization. Assisted by the digital twin and model poisoning attack detection, it then provides an endogenous security perception resource management method for social assets participating in power grid interaction: a large-timescale DQN network learns the large-timescale local training period scheduling strategy, and, based on that strategy, a small-timescale DQN network learns the small-timescale device scheduling strategy.
Drawings
FIG. 1 is a diagram of a digital-twin-enabled smart park scenario according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the endogenous security perception resource management method for social assets participating in power grid interaction according to an embodiment of the present invention;
FIG. 3 is a flowchart of the endogenous security perception resource management method for social assets participating in power grid interaction according to an embodiment of the present invention;
FIG. 4 shows the weighted-sum performance versus the number of iterations according to an embodiment of the present invention;
FIG. 5 shows the weighted-sum performance versus the average model poisoning attack probability according to an embodiment of the present invention;
FIG. 6 shows the convergence performance according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings by way of examples.
The digital-twin-enabled smart park scenario considered in the embodiments of the present invention is shown in FIG. 1; its aim is to train a digital twin model of the smart park to ensure the park's reliable operation. The scenario comprises a device layer, a 6G edge intelligent layer and a digital twin layer. In the device layer, communication devices are deployed on electrical equipment such as photovoltaic units and charging piles to perform local model training. Assuming $K$ communication devices are deployed, the device set is denoted $\mathcal{D}=\{d_1,\dots,d_k,\dots,d_K\}$, where $d_k$ denotes the $k$-th device. Each device first performs local model training according to the local training period scheduling strategy and then, according to the device scheduling strategy, uploads its local model parameters to the 6G edge intelligent layer for global model averaging. Because the smart park involves complex energy supply, energy consumption and asset ownership relations, a large number of social assets participate in power grid interaction through operator communication networks, which exposes the park to malicious attacks. A malicious attacker can launch model poisoning attacks on local model uploads, i.e., take control of devices and upload erroneous model parameters to degrade the accuracy of the digital twin model. The 6G edge intelligent layer deploys a base station equipped with an edge server. To counter model poisoning attacks, the edge server schedules the local training period to guide local model training and formulates the device scheduling strategy for uploading local parameters. In addition, the edge server performs model poisoning attack detection on the uploaded local model parameters.
The digital twin layer is maintained by the edge server; it interacts with the park devices in real time and keeps the digital twin synchronized with the physical network. Furthermore, the digital twin can assist resource management optimization by providing state estimates of channel gain, electromagnetic interference, model poisoning attack probability, and so on.
Since the model poisoning attack probability varies much more slowly than the channel gain, local training period scheduling is optimized on a large time scale to ensure training stability, whereas device scheduling is optimized on a small time scale to adapt to rapidly varying link conditions. The invention divides the total optimization time into $R$ communication rounds (the large time scale), and each communication round is divided into $T_0$ iterations (the small time scale). The set of communication rounds is denoted $\mathcal{R}=\{1,\dots,R\}$, and the set of iterations in the $r$-th round is $\mathcal{T}(r)=\{(r-1)T_0+1,(r-1)T_0+2,\dots,rT_0\}$. The $R$ communication rounds comprise $T=RT_0$ iterations in total, and the set of iterations is $\mathcal{T}=\{1,\dots,T\}$.
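The two-time-scale indexing above can be illustrated with a small sketch. It only encodes the set definitions $\mathcal{T}(r)$ and $T=RT_0$; the helper names (`iterations_in_round`, `round_of_iteration`, `is_round_start`, `is_round_end`) are hypothetical and not part of the described method.

```python
# Minimal sketch of the two-time-scale indexing: R rounds (large time scale),
# each split into T0 iterations (small time scale), so T = R * T0.
# All indices are 1-based, matching the sets R = {1..R} and T = {1..T}.

def iterations_in_round(r: int, T0: int) -> list[int]:
    """Return T(r) = {(r-1)T0 + 1, ..., r*T0}."""
    return list(range((r - 1) * T0 + 1, r * T0 + 1))

def round_of_iteration(t: int, T0: int) -> int:
    """Return the round r such that iteration t belongs to T(r)."""
    return (t - 1) // T0 + 1

def is_round_start(t: int, T0: int) -> bool:
    """True at t = (r-1)T0 + 1, where a large-time-scale decision would be made."""
    return (t - 1) % T0 == 0

def is_round_end(t: int, T0: int) -> bool:
    """True at t = r*T0, the last iteration of a round."""
    return t % T0 == 0
```

With the embodiment's settings (100 rounds of 10 iterations), `round_of_iteration(1000, 10)` gives round 100, and `iterations_in_round(2, 10)` gives iterations 11 through 20.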
1. Digital twin model
The digital twin of device $d_k$ is defined as $DT_k$, which comprises the device's local model
Figure BDA0003903944180000141
local dataset
Figure BDA0003903944180000142
and local state information $s_k(t)$. $DT_k$ can be modeled as
Figure BDA0003903944180000143
2. Device scheduling and local training model
At each iteration, the edge server schedules devices to upload their local model parameters for global model averaging. A device that the edge server has identified as malicious can no longer perform local training or upload parameters. Let the set of available devices at the $t$-th iteration be $\mathcal{D}_{ava}(t)$, with size $|\mathcal{D}_{ava}(t)|$, satisfying
Figure BDA0003903944180000151
Define $a_k(t)\in\{0,1\}$ as the device scheduling indicator of the $t$-th iteration: $a_k(t)=1$ means that device $d_k$ is scheduled at the $t$-th iteration; otherwise $a_k(t)=0$.
In the $t$-th iteration, a scheduled device, e.g., device $d_k$, downloads the global model from the edge server to update its local model, expressed as
Figure BDA0003903944180000152
where $\tau(r)$ denotes the local training period of the $r$-th communication round.
In each iteration, device $d_k$ trains its local model on the local dataset
Figure BDA0003903944180000153
Let $x_j$ and $y_j$ denote, respectively, the input and the target output of the $j$-th sample in
Figure BDA0003903944180000154
The loss function of device $d_k$ at the $t$-th iteration is then
Figure BDA0003903944180000155
where
Figure BDA0003903944180000156
denotes the number of samples in the dataset
Figure BDA0003903944180000157
and
Figure BDA0003903944180000158
denotes the loss of a single sample, defined as the deviation of the actual output from the target output. Using gradient descent,
Figure BDA0003903944180000159
is updated as
Figure BDA00039039441800001510
where $\gamma$ denotes the learning step size.
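The local loss (the average of per-sample losses over the local dataset) and the gradient-descent update can be sketched as below. This is an illustration under assumptions not stated in the source: a linear model and a squared-error per-sample loss stand in for the unspecified digital twin model. The function names and the `steps` argument are hypothetical.

```python
import numpy as np

# Minimal sketch of local training on device d_k. Assumptions (not from the
# source): a linear model y ≈ X @ w and a squared-error per-sample loss.

def local_loss(w, X, y):
    """F_k(w) = (1/|D_k|) * sum_j f(w, x_j, y_j), with f = 0.5 * (w @ x_j - y_j)**2."""
    residual = X @ w - y
    return 0.5 * np.mean(residual ** 2)

def local_gradient(w, X, y):
    """Gradient of local_loss with respect to w."""
    return X.T @ (X @ w - y) / len(y)

def local_training(w, X, y, gamma, steps):
    """Apply `steps` gradient-descent updates w <- w - gamma * grad F_k(w)."""
    for _ in range(steps):
        w = w - gamma * local_gradient(w, X, y)
    return w
```

The learning step size `gamma` plays the role of γ. How many such steps a device runs per iteration, and how this relates to $\tau(r)$, is not recoverable from the text, so `steps` is left as a free parameter.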
The local training delay of device $d_k$ at the $t$-th iteration is then expressed as
Figure BDA00039039441800001511
where $\zeta_k$ is the number of CPU cycles required to process one sample and $f_k(t)$ is the available local computing resource.
3. Local upload model
After local model training finishes, each scheduled device uploads its local model parameters and local state information. The transmission rate can be expressed as
Figure BDA0003903944180000161
where $B_k$ is the transmission bandwidth, $p_k$ and $g_k(t)$ are the transmit power and the channel gain, respectively, and $\sigma_0$ and
Figure BDA0003903944180000162
denote Gaussian white noise and electromagnetic interference, respectively. Define
Figure BDA0003903944180000163
as the sizes of the local model parameters and the local state information. Since the local state information is small, its upload delay is ignored, and the local model parameter upload delay is expressed as
Figure BDA0003903944180000164
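Because the rate and delay formulas are only available as images, the following sketch assumes the usual Shannon-type form suggested by the listed symbols: rate $B_k\log_2\bigl(1+p_k g_k(t)/(\sigma_0+I_k(t))\bigr)$, with upload delay equal to model size divided by rate. The symbol $I_k(t)$ for interference and the function names are hypothetical.

```python
import math

def transmission_rate(B_k, p_k, g_k, sigma0, interference):
    """Assumed Shannon-type rate: B_k * log2(1 + p_k * g_k / (sigma0 + I_k))."""
    sinr = p_k * g_k / (sigma0 + interference)
    return B_k * math.log2(1.0 + sinr)

def upload_delay(model_bits, rate):
    """Upload delay of the local model parameters: size / rate.
    The state-information upload is ignored, as stated in the text."""
    return model_bits / rate
```

For example, with 1 MHz of bandwidth and an SINR of 3, the assumed rate is 2 Mbit/s, so a 4 Mbit parameter upload takes 2 s.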
4. Model poisoning attack model
The model poisoning attack launched by a malicious attacker on local model parameter uploads within the park is modeled as
Figure BDA0003903944180000165
where $P_a(r)$ denotes the model poisoning attack probability of the $r$-th communication round, and
Figure BDA0003903944180000166
denotes the erroneous parameters uploaded by device $d_k$ after an attack, modeled as
Figure BDA0003903944180000167
where $\chi\in[-1,1]$ denotes a scale factor and $n_k(t)$ is additive noise following
Figure BDA0003903944180000168
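A simulation of this attack model might look like the sketch below. It assumes that the erroneous parameters take the form $\chi\,w_k + n_k(t)$ with zero-mean Gaussian noise. The exact expression and noise distribution exist only as images, so both the form and the function name `maybe_poison` are assumptions.

```python
import numpy as np

def maybe_poison(w_k, P_a, chi, noise_std, rng):
    """Return (uploaded_params, attacked).

    With probability P_a (the round's attack probability) the upload is
    replaced by chi * w_k + n_k, with n_k ~ N(0, noise_std**2) elementwise.
    This form of the erroneous parameters is an assumption."""
    if not -1.0 <= chi <= 1.0:
        raise ValueError("scale factor chi must lie in [-1, 1]")
    if rng.random() < P_a:
        noise = rng.normal(0.0, noise_std, size=w_k.shape)
        return chi * w_k + noise, True
    return w_k.copy(), False
```

Setting `chi = -1` with small noise produces a sign-flipping attack, which the distance-based detection described later is meant to catch.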
5. Global mean model
After the scheduled devices upload their local model parameters and state information to the edge server, the edge server performs global model averaging to update the digital twin model. The global model average of the $t$-th iteration is
Figure BDA0003903944180000169
The global model averaging delay is expressed as
Figure BDA0003903944180000171
where $f_g(t)$ denotes the computing resources available at the edge server in the $t$-th iteration.
The invention measures the accuracy of the global model by the loss function of $\omega_g(t)$, expressed as
Figure BDA0003903944180000172
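Because the averaging formula is an image, this sketch assumes a FedAvg-style average of the scheduled devices' parameters, weighted by local dataset size. It also optionally excludes uploads flagged by attack detection ($b_k(t)=0$); whether the method excludes them is an assumption. The function name is hypothetical.

```python
import numpy as np

def global_average(params, data_sizes, scheduled, passed=None):
    """Data-size-weighted average of uploaded local parameters.

    params: (K, d) array of local parameters; data_sizes: (K,) sample counts;
    scheduled: a_k(t) in {0, 1}; passed: optional b_k(t) in {0, 1} used to
    drop uploads judged erroneous (an assumption, not stated in the source)."""
    mask = np.asarray(scheduled, dtype=float)
    if passed is not None:
        mask = mask * np.asarray(passed, dtype=float)
    weights = mask * np.asarray(data_sizes, dtype=float)
    if weights.sum() == 0:
        raise ValueError("no device contributes to the global average")
    return weights @ np.asarray(params, dtype=float) / weights.sum()
```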
6. Model poisoning attack detection
The model poisoning attack detection adopted by the invention computes the Euclidean norm of the difference between the parameters under test and the average of the remaining parameters, and compares the result with a preset threshold: if it exceeds the threshold, the parameters are judged erroneous; otherwise, they are judged normal. Taking
Figure BDA0003903944180000173
as an example, the average of the parameters other than
Figure BDA0003903944180000174
can be calculated as
Figure BDA0003903944180000175
The Euclidean norm between
Figure BDA0003903944180000176
and
Figure BDA0003903944180000177
is expressed as
Figure BDA0003903944180000178
The proportional reference value of $d_k(t)$ relative to
Figure BDA0003903944180000179
is
Figure BDA00039039441800001710
Let $b_k(t)\in\{0,1\}$ be the attack detection variable, where $b_k(t)=1$ indicates that the model parameters are normal and $b_k(t)=0$ otherwise. Model poisoning attack detection can be expressed as
Figure BDA00039039441800001711
where $\epsilon_{Thr}$ is the detection threshold.
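The leave-one-out detection rule described above can be sketched as follows. The text does not recover which quantity the distance $d_k(t)$ is normalized by, so this sketch assumes normalization by the norm of the leave-one-out mean. The function name `detect_poisoning` is hypothetical.

```python
import numpy as np

def detect_poisoning(params, scheduled_idx, eps_thr):
    """Return {k: b_k} for the scheduled devices.

    For each scheduled k: compute the mean of the *other* scheduled uploads,
    the Euclidean distance d_k = ||w_k - mean_{-k}||, and a ratio
    d_k / ||mean_{-k}|| (assumed normalization). b_k = 1 (normal) if the
    ratio <= eps_thr, else 0 (erroneous)."""
    params = np.asarray(params, dtype=float)
    idx = list(scheduled_idx)
    if len(idx) < 2:
        raise ValueError("need at least two scheduled uploads")
    total = params[idx].sum(axis=0)
    flags = {}
    for k in idx:
        others_mean = (total - params[k]) / (len(idx) - 1)
        d_k = np.linalg.norm(params[k] - others_mean)
        ratio = d_k / max(np.linalg.norm(others_mean), 1e-12)
        flags[k] = 1 if ratio <= eps_thr else 0
    return flags
```

With five honest uploads near $(1,1)$ and one sign-flipped upload $(-1,-1)$, the honest ratios are about 0.67 and the attacker's is about 2.0, so a threshold of 1.0 separates them. With very few devices, a single attacker can shift the leave-one-out means enough to cause false alarms.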
If, up to the $t$-th iteration, the proportion of erroneous model parameters uploaded by device $d_k$ exceeds a threshold, i.e.,
Figure BDA00039039441800001712
device $d_k$ is regarded as a malicious device and removed from $\mathcal{D}_{ava}(t+1)$.
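The removal rule can be sketched by keeping per-device counters of uploads and flagged uploads. Assumptions: the proportion is computed over all of a device's uploads so far, and a device with no uploads stays available. The helper names `record_detection` and `update_available_set` are hypothetical.

```python
def record_detection(flags, flagged_counts, upload_counts):
    """Accumulate per-device counts from one iteration's detection results {k: b_k}."""
    for k, b_k in flags.items():
        upload_counts[k] = upload_counts.get(k, 0) + 1
        if b_k == 0:
            flagged_counts[k] = flagged_counts.get(k, 0) + 1

def update_available_set(available, flagged_counts, upload_counts, ratio_thr):
    """Return D_ava(t+1): drop device k once (#flagged uploads / #uploads) > ratio_thr."""
    kept = set()
    for k in available:
        n_up = upload_counts.get(k, 0)
        n_bad = flagged_counts.get(k, 0)
        if n_up == 0 or n_bad / n_up <= ratio_thr:
            kept.add(k)
    return kept
```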
7. Total time delay model
The total digital twin model training delay of the $t$-th iteration consists of the maximum local training delay, the maximum model parameter upload delay and the global model averaging delay, and is expressed as
Figure BDA0003903944180000181
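The per-iteration total delay follows the stated composition: the slowest scheduled device's training delay, plus the slowest upload, plus the global averaging delay. The sketch below shows that composition. The local training delay formula, which assumes one pass over $|\mathcal{D}_k|$ samples at $\zeta_k$ cycles each on $f_k(t)$ cycles per second, is an assumption because the original is an image. All function names are hypothetical.

```python
def local_training_delay(zeta_k, n_samples, f_k):
    """Assumed form: CPU cycles per sample * samples processed / available CPU rate."""
    return zeta_k * n_samples / f_k

def total_delay(train_delays, upload_delays, global_delay):
    """Per-iteration total delay over the scheduled devices:
    max local training delay + max upload delay + global averaging delay."""
    if not train_delays or len(train_delays) != len(upload_delays):
        raise ValueError("need one training and one upload delay per scheduled device")
    return max(train_delays) + max(upload_delays) + global_delay

def average_total_delay(per_iteration_delays):
    """Average total training delay over the iterations observed so far."""
    return sum(per_iteration_delays) / len(per_iteration_delays)
```

Using the maximum reflects the synchronous nature of global averaging: the edge server must wait for the slowest scheduled device in each stage.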
8. Modeling and transformation of the problem of minimizing the weighted sum of the digital twin loss function and the average total training delay
The objective of the invention is to minimize the weighted sum of the digital twin loss function and the average total training delay by jointly optimizing local training period scheduling and the device scheduling strategy. The minimization problem is modeled as
Figure BDA0003903944180000182
where
Figure BDA0003903944180000183
C1 and C2 are device scheduling constraints, and C3 is the discretized local training period constraint. Let $\tau_1$ and $\tau_H$ be the minimum and maximum local training periods, respectively; the interval $[\tau_1,\tau_H]$ is then quantized into $H$ levels, where
Figure BDA0003903944180000184
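The discretization in C3 can be illustrated with uniform spacing. The actual level definition is an image, so uniform spacing and the helper name `training_period_levels` are assumptions. Each level $\tau_h$ then corresponds to one discrete action of the large-time-scale scheduler.

```python
def training_period_levels(tau_min, tau_max, H):
    """Uniformly quantize [tau_1, tau_H] into H levels tau_1, ..., tau_H (assumed spacing)."""
    if H < 1:
        raise ValueError("H must be at least 1")
    if H == 1:
        return [float(tau_min)]
    return [tau_min + (tau_max - tau_min) * h / (H - 1) for h in range(H)]
```

For instance, `training_period_levels(1, 10, 4)` yields the levels 1, 4, 7 and 10.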
Because the resource management decision of each iteration is coupled with the optimization objective, P1 is difficult to solve directly. P1 is therefore converted into minimizing, at each iteration, the weighted sum of the convergence upper bound of the digital twin loss function and the total model training delay, i.e.,
Figure BDA0003903944180000191
where
Figure BDA0003903944180000192
δ is the upper bound on the divergence between the local model loss function gradient and the digital twin loss function gradient; ρ and β indicate that the local model loss function is ρ-Lipschitz and β-smooth; θ is the upper bound on the norms of all local model parameters; and a further constant denotes the lower bound of the divergence between the digital twin loss function at the current iteration and the converged loss function
Figure BDA0003903944180000193
Figure BDA0003903944180000194
Figure BDA0003903944180000195
denotes the estimated model poisoning attack probability of the $t$-th iteration, calculated as
Figure BDA0003903944180000196
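One plausible reading of the estimator is the fraction of scheduled uploads that detection flags as erroneous in iteration $t$. The source formula is an image, so the following is a hypothetical estimator, not the patented one.

```python
def estimate_attack_probability(scheduled, passed):
    """Hypothetical estimator: share of scheduled uploads (a_k = 1) flagged
    as erroneous (b_k = 0) in the current iteration."""
    n_sched = sum(scheduled)
    if n_sched == 0:
        return 0.0
    n_flagged = sum(a * (1 - b) for a, b in zip(scheduled, passed))
    return n_flagged / n_sched
```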
The invention imposes an upper-bound constraint on the convergence of the digital twin loss function and prevents the accuracy of the digital twin model from deteriorating by scheduling at least a certain number of devices. Observing the convergence upper bound of the digital twin loss function in P2, the constraint on this upper bound can be converted into a constraint on the part of the bound that depends on the number of scheduled devices, namely
Figure BDA0003903944180000197
where $\lambda(t)$ denotes the constraint threshold. Since the number of scheduled devices is an integer, (20) can be rewritten as
Figure BDA0003903944180000198
P2 can therefore be rewritten as P3:
Figure BDA0003903944180000201
9. Design of the endogenous security perception resource management method for social assets participating in power grid interaction
To solve P3, the invention models it as a Markov decision problem whose key elements are the state space, the action space and the reward function, introduced as follows:
State space estimated by the digital twin: the state space of large-scale local training period scheduling comprises all model poisoning attack probabilities estimated by the park digital twin in the previous communication round, namely
Figure BDA0003903944180000202
The state space of small-scale device scheduling comprises the large-scale local training period scheduling decision together with the model poisoning attack probability, the channel gain
Figure BDA0003903944180000203
and the electromagnetic interference
Figure BDA0003903944180000204
estimated by the digital twin, namely
Figure BDA0003903944180000205
Action space: the action space of large-scale local training period scheduling is defined as
Figure BDA0003903944180000206
where
Figure BDA0003903944180000207
indicates that $\tau(r)=\tau_h$.
The action space of small-scale device scheduling is defined as
$A_D(t)=\{a_1(t),\dots,a_k(t),\dots,a_K(t)\}$ (28)
Cost function: since P3 is a minimization problem, the invention defines the reward through a cost function. The cost function of small-scale device scheduling is defined as the per-iteration cost $\Omega(a_T(r),a(t))$, where
Figure BDA0003903944180000208
The cost function of large-scale local training period scheduling is defined as the sum of the per-iteration costs over the $T_0$ iterations of the round, expressed as
Figure BDA0003903944180000211
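The relationship between the two cost functions can be sketched as follows: the large-scale cost of round $r$ is the sum of the $T_0$ small-scale costs in $\mathcal{T}(r)$. The small-scale cost is written here as "loss-bound term + V × total delay" with a hypothetical weight `V`. The actual weighting in (29) is an image, so this form is an assumption.

```python
def small_scale_cost(loss_bound, delay, V):
    """Hypothetical per-iteration cost Omega: loss-bound term + V * total delay."""
    return loss_bound + V * delay

def large_scale_cost(per_iteration_costs, T0):
    """Large-scale cost of round r: sum of the T0 per-iteration costs in T(r)."""
    if len(per_iteration_costs) != T0:
        raise ValueError("expected exactly T0 per-iteration costs")
    return sum(per_iteration_costs)
```

Summing rather than averaging keeps the large-scale cost on the same footing as the accumulated small-scale costs within the round.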
To solve this Markov decision problem, the invention provides an endogenous security perception resource management method for social assets participating in power grid interaction; its schematic is shown in FIG. 2. The edge server executes the algorithm. For large-scale local training period scheduling, the edge server maintains an evaluation network and a target network, whose parameters are denoted
Figure BDA0003903944180000212
and
Figure BDA0003903944180000213
respectively. Similarly, the evaluation and target network parameters for small-scale device scheduling are denoted
Figure BDA0003903944180000214
and
Figure BDA0003903944180000215
respectively.
The flowchart of the endogenous security perception resource management method for social assets participating in power grid interaction is shown in FIG. 3. The specific steps are as follows:
(1) Initialization: initialize
Figure BDA0003903944180000216
and
Figure BDA0003903944180000217
and set $\Omega(a_T(r),a(t))=0$,
Figure BDA0003903944180000218
(2) Selecting multi-time-scale resource management actions: at the beginning of each communication round, i.e., $t=(r-1)T_0+1$, the edge server makes the large-time-scale local training period scheduling decision using the ε-greedy algorithm; similarly, at the beginning of each iteration, the edge server makes the small-time-scale device scheduling decision using the ε-greedy algorithm. The devices perform local model training and upload local model parameters according to these decisions. The edge server then performs global model averaging and model poisoning attack detection, and estimates the model poisoning attack probability with the estimation formula given in part 8.
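The ε-greedy selection in step (2) can be sketched as below. The sketch assumes each network outputs one estimated cost per discrete action, for example one per level $\tau_h$. Because the objective is a cost to be minimized, the greedy choice is the argmin. For device scheduling, the action index could enumerate the feasible scheduling vectors $a(t)$, which is an assumption. `epsilon_greedy` is a hypothetical name.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index: uniformly random with probability epsilon,
    otherwise the action with the lowest estimated cost (argmin)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmin(q_values))
```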
(3) Computing the cost function and transitioning the smart park state: at the end of each iteration, the edge server observes the digital twin loss function and average total training delay performance and computes the cost $\Omega(a_T(r),a(t))$ of the small-scale device scheduling network according to (29). At the end of each communication round, i.e., $t=rT_0$, the edge server computes the cost of the large-scale local training period scheduling network according to (30)
Figure BDA0003903944180000219
For large-scale local training period scheduling, the edge server generates the experience data
Figure BDA00039039441800002110
to update the experience replay pool $U_T(r)$, and transitions from the current state $S_T(r)$ to the next state $S_T(r+1)$. Similarly, for small-scale device scheduling, the edge server generates the experience data $u_D(t)=\{S_D(t),A_D(t),\Omega(a_T(r),a(t)),S_D(t+1)\}$ to update the experience replay pool $U_D(t)$, and transitions from the current state $S_D(t)$ to the next state $S_D(t+1)$.
(4) Learning and multi-time-scale DQN network updating: at the end of each communication round, the edge server randomly samples a mini-batch from the experience replay pool $U_T(r)$
Figure BDA0003903944180000221
and computes the loss function of the large-scale local training period scheduling network as
Figure BDA0003903944180000222
where
Figure BDA0003903944180000223
denotes the Q-value of the local training period scheduling target network, computed as
Figure BDA0003903944180000224
where ι is the discount factor.
Based on $\eta_T(r)$, the edge server updates the parameters
Figure BDA0003903944180000225
by gradient descent, i.e.,
Figure BDA0003903944180000226
where κ denotes the learning step size.
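One gradient-descent step on the DQN loss can be sketched with a linear Q-function, which keeps the gradient analytic. This is a substitute for the unspecified network architecture. The target takes the form cost + ι · min over next actions of the target network's Q, using min because Q estimates cost here. That form is an assumption, since the source's target equation is an image. `dqn_update` is hypothetical.

```python
import numpy as np

def dqn_update(W_eval, W_target, batch, iota, kappa):
    """One gradient step on the mean squared TD loss for Q(s, a) = W[a] @ s.

    batch: iterable of (state, action, cost, next_state).
    Target y = cost + iota * min_a' Q_target(s', a') (cost-minimizing form).
    Returns (updated W_eval, mean loss before the update)."""
    grad = np.zeros_like(W_eval, dtype=float)
    loss = 0.0
    for s, a, cost, s_next in batch:
        s = np.asarray(s, dtype=float)
        s_next = np.asarray(s_next, dtype=float)
        y = cost + iota * np.min(W_target @ s_next)  # target-network estimate
        td = y - W_eval[a] @ s                         # temporal-difference error
        loss += td ** 2
        grad[a] += -2.0 * td * s                       # d/dW[a] of (y - W[a] @ s)**2
    n = len(batch)
    return W_eval - kappa * grad / n, loss / n
```

Keeping the target network frozen between synchronizations stabilizes y while the evaluation network is trained. This is why both time scales maintain separate evaluation and target parameters.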
Similarly, at the end of each iteration, the edge server randomly samples a mini-batch from the experience replay pool $U_D(t)$
Figure BDA0003903944180000227
computes the loss function of the small-scale device scheduling network as follows, and updates the parameters:
Figure BDA0003903944180000228
Figure BDA0003903944180000229
Figure BDA00039039441800002210
Every $R_0$ communication rounds, the large-scale local training period scheduling target network parameters are updated as
Figure BDA00039039441800002211
Every $T_0$ iterations, the small-scale device scheduling target network parameters are updated as
Figure BDA00039039441800002212
Steps (2) to (4) are repeated until $t=T$.
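The timing of steps (2) to (4) and of the two target-network synchronizations can be sketched as an event skeleton. Decisions and gradient steps are only logged here, not computed. The function name `run_schedule`, the `params` dictionary keys, and the deep-copy synchronization are hypothetical.

```python
import copy

def run_schedule(R, T0, R0, params):
    """Skeleton of the two-time-scale loop; returns an event log.

    params: dict with 'eval_T', 'target_T', 'eval_D', 'target_D' (any objects).
    Target networks are overwritten with copies of their evaluation networks:
    small-scale every T0 iterations, large-scale every R0 rounds."""
    log = []
    for t in range(1, R * T0 + 1):
        r = (t - 1) // T0 + 1
        if (t - 1) % T0 == 0:
            log.append(("choose_tau", r))          # step (2): large-scale decision
        log.append(("schedule_devices", t))        # step (2): small-scale decision
        log.append(("update_eval_D", t))           # step (4): small-scale DQN step
        if t % T0 == 0:
            log.append(("update_eval_T", r))       # step (4): large-scale DQN step
            params["target_D"] = copy.deepcopy(params["eval_D"])
            if r % R0 == 0:
                params["target_T"] = copy.deepcopy(params["eval_T"])
                log.append(("sync_target_T", r))
    return log
```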
The method achieves endogenous security perception by actively adjusting the large-time-scale local training period scheduling strategy and the small-time-scale device scheduling strategy according to the estimated model poisoning attack probability. Specifically, for small-time-scale device scheduling, when the estimated model poisoning attack probability of the current iteration exceeds that of the previous iteration, the cost $\Omega(a_T(r),a(t))$ increases. The edge server therefore actively adjusts the device scheduling strategy to reduce $\Omega(a_T(r),a(t))$ and maintain the digital twin loss function and average total training delay performance. For large-time-scale local training period scheduling, the method learns the relationship between the cost function
Figure BDA0003903944180000231
and the distribution of the estimated model poisoning attack probability. The edge server can therefore actively adjust the local training period scheduling strategy to adapt to continuously changing model poisoning attacks, ensuring the endogenous security of digital twin model training.
Comparison algorithm 1 is a deep-reinforcement-learning-based device orchestration algorithm (DRL-DO), whose optimization goal is to minimize the weighted sum of the digital twin loss function and the average total training delay. Comparison algorithm 2 is an upper-confidence-bound-based client scheduling algorithm (UCB-CS), whose optimization goal is to minimize the average total training delay. Both comparison algorithms omit large-scale local training period scheduling and fix the local training period at 1. In addition, neither performs model poisoning attack detection at the edge server, so neither can estimate the model poisoning attack probability or achieve endogenous security perception.
The embodiment of the invention considers a 400 m × 400 m smart park containing one base station equipped with an edge server and 30 communication devices. The devices are randomly distributed within the park, and the base station is located at its center. The number of communication rounds is 100 and the total number of iterations is 1000, so each communication round contains 10 iterations. The digital twin model is trained on the MNIST dataset, and each device is randomly assigned training samples from MNIST. The model poisoning attack probability is randomly distributed in [0.05, 0.2]. The simulation results are as follows:
FIG. 4 shows the weighted-sum performance versus the number of iterations. Compared with DRL-DO and UCB-CS, the invention improves the weighted-sum performance of the digital twin loss function and the average total training delay by 16.51% and 19.58%, respectively. The weighted-sum performance of the invention also converges better and fluctuates less. This is because the invention optimizes both large-scale local training period scheduling and small-scale device scheduling with endogenous security perception, jointly optimizing the digital twin loss function and the average total training delay.
FIG. 5 shows the weighted-sum performance versus the average model poisoning attack probability. As the average model poisoning attack probability increases, the weighted-sum performance of all three algorithms degrades, and the invention degrades the least. When the model poisoning attack probability equals 0.4, the invention improves the weighted-sum performance by 24.29% and 32.51% compared with DRL-DO and UCB-CS, respectively. The reason is that the invention uses model poisoning attack detection to estimate the model poisoning attack probability, so the multi-time-scale resource management strategy can be actively adjusted according to the estimated attack probability, achieving endogenous security in digital twin model training.
FIG. 6 shows the convergence performance of the invention. The simulation results show that the invention converges at the 400th iteration, whereas a variant of the invention without the problem transformation converges at the 600th iteration. This is because constraint C4 imposes a minimum number of scheduled devices, which shrinks the action space and thus speeds up convergence.
An embodiment of the present application further provides an endogenous security perception resource management device based on a smart park, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform: local model training according to the local training period scheduling strategy, local model parameter uploading, global model averaging, model poisoning attack detection, total delay calculation, and execution of the endogenous security perception resource management method.
An embodiment of the present application further provides a computer storage medium storing computer-executable instructions configured to perform: local model training according to the local training period scheduling strategy, local model parameter uploading, global model averaging, model poisoning attack detection, total delay calculation, and execution of the endogenous security perception resource management method.
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the device and media embodiments, the description is relatively simple as it is substantially similar to the method embodiments, and reference may be made to some descriptions of the method embodiments for relevant points.
The device and the medium provided by the embodiment of the application correspond to the method one to one, so the device and the medium also have the similar beneficial technical effects as the corresponding method, and the beneficial technical effects of the method are explained in detail above, so the beneficial technical effects of the device and the medium are not repeated herein.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," and any other variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises that element.
It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the practice of the invention, and it is to be understood that the scope of the invention is not limited to such specific statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (12)

1. A smart park system, comprising a device layer, a 6G edge intelligence layer, and a digital twin layer;
the device layer comprises communication devices and electrical equipment, the communication devices being deployed on the electrical equipment to perform local model training; each communication device performs local model training according to a local training period scheduling strategy and, according to a communication device scheduling strategy, uploads its local model parameters to the 6G edge intelligence layer for global model averaging, so as to update the digital twin model of the digital twin layer; the electrical equipment includes photovoltaic installations and charging piles;
the 6G edge intelligence layer deploys a base station equipped with an edge server; the edge server schedules the local training period to guide local model training and schedules the communication devices that upload local parameters; the edge server performs model poisoning attack detection on the uploaded local model parameters;
the digital twin layer is maintained by the edge server of the 6G edge intelligence layer, interacts with the park communication devices in real time, and stays synchronized with the physical network; based on the local state information uploaded by the device layer, the digital twin layer provides channel gain and electromagnetic interference estimates; based on the model poisoning attack detection of the 6G edge intelligence layer, it provides an estimate of the model poisoning attack probability; and, based on this estimated probability, it schedules the local training period of the 6G edge intelligence layer and optimizes communication device scheduling.
2. The smart park system according to claim 1, wherein performing local model training according to the local training period scheduling strategy comprises:
in each iteration, the edge server schedules communication devices to upload their local model parameters for global model averaging; a communication device that the edge server has identified as malicious can no longer perform local training or upload parameters; the set of available communication devices in the $t$-th iteration is denoted $\mathcal{D}_{\mathrm{ava}}(t)$, its size is denoted $|\mathcal{D}_{\mathrm{ava}}(t)|$, and it satisfies
Figure FDA0003903944170000011
$a_k(t) \in \{0,1\}$ is defined as the scheduling indicator variable of communication device $d_k$ in the $t$-th iteration; $a_k(t)=1$ indicates that $d_k$ is scheduled in the $t$-th iteration, otherwise $a_k(t)=0$;
in the $t$-th iteration, a scheduled communication device $d_k$ downloads the global model from the edge server to update its local model as follows:
Figure FDA0003903944170000021
where $\tau(r)$ denotes the local training period of the $r$-th communication round, $\omega_g(t-1)$ denotes the global model of the $(t-1)$-th iteration, and Figure FDA0003903944170000022 denotes the local model of communication device $d_k$ in the $(t-1)$-th iteration;
in each iteration, communication device $d_k$ performs local model training on its local data set Figure FDA0003903944170000023; $x_j$ and $y_j$ are defined as the input and the target output, respectively, of the $j$-th sample of Figure FDA0003903944170000024; the loss function of communication device $d_k$ in the $t$-th iteration is therefore:
Figure FDA0003903944170000025
where Figure FDA0003903944170000026 denotes the loss function of communication device $d_k$ in the $t$-th iteration, Figure FDA0003903944170000027 denotes the number of samples in data set Figure FDA0003903944170000028, and Figure FDA0003903944170000029 denotes the loss function of a single sample, defined as the deviation of the actual output from the target output; based on gradient descent, Figure FDA00039039441700000210 is updated according to:
Figure FDA00039039441700000211
where $\gamma$ denotes the learning step size;
the local training delay Figure FDA00039039441700000212 of communication device $d_k$ in the $t$-th iteration is therefore given by:
Figure FDA00039039441700000213
where Figure FDA00039039441700000214 denotes the local training delay of communication device $d_k$ in the $t$-th iteration, $\zeta_k$ is the number of CPU cycles required to process one sample, $f_k(t)$ is the locally available computing resource, $\tau(r)$ denotes the local training period of the $r$-th communication round, and Figure FDA00039039441700000215 denotes the number of samples in data set Figure FDA00039039441700000216.
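The local-training step of claim 2 can be illustrated with a minimal Python sketch. It assumes a linear model with a squared-error per-sample loss (the claim leaves both unspecified, and the exact formulas are only in images FDA0003903944170000025, FDA00039039441700000211 and FDA00039039441700000213), interprets $\tau(r)$ as the number of full-batch gradient steps per iteration, and takes the delay as $\tau(r)\,\zeta_k \cdot (\text{number of local samples}) / f_k(t)$, which matches the listed variables but is not confirmed by the source. All function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear model output w . x (an illustrative stand-in for the local model)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def local_loss(w: Sequence[float], X: Sequence[Sequence[float]], Y: Sequence[float]) -> float:
    """Average per-sample loss over the local data set; squared error is an assumption."""
    return sum(0.5 * (predict(w, x) - y) ** 2 for x, y in zip(X, Y)) / len(Y)


def local_gradient(w: Sequence[float], X: Sequence[Sequence[float]], Y: Sequence[float]) -> Vector:
    """Gradient of local_loss with respect to w."""
    grad = [0.0] * len(w)
    for x, y in zip(X, Y):
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi
    return [g / len(Y) for g in grad]


def local_training(w_global: Sequence[float], X, Y, tau: int, gamma: float) -> Vector:
    """Start from the downloaded global model and run tau gradient-descent steps of size gamma."""
    w = list(w_global)
    for _ in range(tau):
        g = local_gradient(w, X, Y)
        w = [wi - gamma * gi for wi, gi in zip(w, g)]
    return w


def local_training_delay(tau: int, zeta_k: float, num_samples: int, f_k: float) -> float:
    """Assumed delay model: tau passes over num_samples samples, zeta_k cycles each, at f_k cycles/s."""
    return tau * zeta_k * num_samples / f_k
```

For example, with $\tau(r) = 5$, $\zeta_k = 10^4$ cycles per sample, 200 samples and $f_k(t) = 10^9$ cycles/s, `local_training_delay` gives 0.01 s under this assumed model.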
3. The smart park system according to claim 1, wherein uploading the local model parameters to the 6G edge intelligence layer specifically comprises:
after local model training is completed, each scheduled communication device uploads its local model parameters and local state information; the transmission rate is expressed as:
Figure FDA0003903944170000031
where $R_k(t)$ is the transmission rate of communication device $d_k$ in the $t$-th iteration, $B_k$ is the transmission bandwidth, $p_k$ and $g_k(t)$ are the transmission power and the channel gain, respectively, and $\sigma_0$ and Figure FDA0003903944170000032 are the Gaussian white noise and the electromagnetic interference, respectively;
Figure FDA0003903944170000033 is the size of the local model parameters and $|s_k(t)|$ is the size of the local state information; because $|s_k(t)|$ is small, the upload delay of $s_k(t)$ is ignored, and the local model parameter upload delay Figure FDA0003903944170000034 is given by:
Figure FDA0003903944170000035
where Figure FDA0003903944170000036 is the size of the local model parameters and Figure FDA0003903944170000037 is the local model parameter upload delay.
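Claim 3's rate and upload delay can be sketched as follows, assuming the usual Shannon-capacity form $R_k(t) = B_k \log_2\!\big(1 + p_k g_k(t) / (\sigma_0 + I_k(t))\big)$, with the electromagnetic interference $I_k(t)$ added to the noise term, and an upload delay equal to the parameter size divided by the rate. The exact expressions are only in images FDA0003903944170000031 and FDA0003903944170000035, and the function names are hypothetical.

```python
import math


def transmission_rate(B_k: float, p_k: float, g_k: float, sigma0: float, interference: float) -> float:
    """Assumed Shannon-type uplink rate in bit/s, treating interference as additional noise."""
    sinr = p_k * g_k / (sigma0 + interference)
    return B_k * math.log2(1.0 + sinr)


def upload_delay(model_size_bits: float, rate_bps: float) -> float:
    """Model-parameter upload delay; the small state-information payload |s_k(t)| is ignored."""
    return model_size_bits / rate_bps
```

With $B_k = 1$ MHz and an SINR of 3, the sketch gives a rate of 2 Mbit/s, so a 4 Mbit parameter vector takes 2 s to upload; higher interference lowers the rate and lengthens the delay.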
4. The smart park system according to claim 1, wherein global model averaging in the 6G edge intelligence layer specifically comprises:
after the scheduled communication devices upload their local model parameters and state information to the edge server, the edge server performs global model averaging to update the digital twin model; the global model $\omega_g(t)$ of the $t$-th iteration is averaged as follows:
Figure FDA0003903944170000038
where $\omega_g(t)$ denotes the global model of the $t$-th iteration, Figure FDA0003903944170000039 denotes the number of samples in data set Figure FDA00039039441700000310, $a_k(t)$ is the scheduling indicator variable of communication device $d_k$ in the $t$-th iteration, and Figure FDA00039039441700000311 denotes the local model parameters of communication device $d_k$ in the $t$-th iteration;
the global model averaging delay $L_G(t)$ of the $t$-th iteration is expressed as:
Figure FDA00039039441700000312
where $L_G(t)$ is the global model averaging delay of the $t$-th iteration, $f_g(t)$ denotes the computing resources available to the edge server in the $t$-th iteration, $\lambda_0$ denotes the number of CPU cycles required to process 1 bit of data, Figure FDA0003903944170000041 denotes the size of Figure FDA0003903944170000042, and $a_k(t)$ is the scheduling indicator variable of communication device $d_k$ in the $t$-th iteration;
the accuracy of the global model is measured by the loss function $F_g(\omega_g(t), t)$ of $\omega_g(t)$, expressed as:
Figure FDA0003903944170000043
where $F_g(\omega_g(t), t)$ denotes the loss function of $\omega_g(t)$, Figure FDA0003903944170000044 denotes the loss function of communication device $d_k$ in the $t$-th iteration, Figure FDA0003903944170000045 denotes the number of samples in data set Figure FDA0003903944170000046, and $a_k(t)$ is the scheduling indicator variable of communication device $d_k$ in the $t$-th iteration.
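Assuming the global averaging of claim 4 is the common FedAvg rule weighted by $a_k(t)$ times the local sample count (with the same weighting for the global loss $F_g$), and that the averaging delay is $\lambda_0$ times the total uploaded bits divided by $f_g(t)$, the three quantities can be sketched in Python. The weights and the delay form are inferred from the listed variables rather than read from the image formulas, and the names are hypothetical.

```python
from typing import List, Sequence


def _weights(sample_counts: Sequence[int], scheduled: Sequence[int]) -> List[float]:
    """Per-device weight a_k(t) * (local sample count); unscheduled devices get weight 0."""
    return [float(a * n) for a, n in zip(scheduled, sample_counts)]


def global_average(local_models: Sequence[Sequence[float]], sample_counts: Sequence[int],
                   scheduled: Sequence[int]) -> List[float]:
    """Assumed sample-size-weighted average of the scheduled devices' local parameters."""
    w = _weights(sample_counts, scheduled)
    total = sum(w)
    if total == 0:
        raise ValueError("no communication device was scheduled")
    dim = len(local_models[0])
    return [sum(wk * m[i] for wk, m in zip(w, local_models)) / total for i in range(dim)]


def global_loss(local_losses: Sequence[float], sample_counts: Sequence[int],
                scheduled: Sequence[int]) -> float:
    """Assumed weighted average of local losses, used as the accuracy measure F_g."""
    w = _weights(sample_counts, scheduled)
    total = sum(w)
    if total == 0:
        raise ValueError("no communication device was scheduled")
    return sum(wk * f for wk, f in zip(w, local_losses)) / total


def global_averaging_delay(model_sizes_bits: Sequence[float], scheduled: Sequence[int],
                           lambda0: float, f_g: float) -> float:
    """Assumed form: lambda0 cycles per uploaded bit, processed at f_g cycles/s."""
    return lambda0 * sum(a * s for a, s in zip(scheduled, model_sizes_bits)) / f_g
```

An unscheduled device ($a_k(t) = 0$) contributes neither to the average nor to the delay, which is why a poisoned upload from a device removed from $\mathcal{D}_{\mathrm{ava}}$ cannot affect the digital twin model.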
5. The smart park system according to claim 1, wherein the model poisoning attack detection comprises:
model poisoning attack detection computes the Euclidean norm between the parameters under test and the average of the parameters other than those under test, and compares it, in the form of the proportional reference value $e_k(t)$ defined below, with a preset threshold; if it exceeds the threshold, the parameters under test are judged to be erroneous model parameters, otherwise they are judged to be normal model parameters; the average parameter Figure FDA0003903944170000048 excluding Figure FDA0003903944170000047 is calculated as follows:
Figure FDA0003903944170000049
where Figure FDA00039039441700000410 is the average parameter excluding Figure FDA00039039441700000411, $a_z(t)$ is the scheduling indicator variable of communication device $d_z$ in the $t$-th iteration, and Figure FDA00039039441700000412 denotes the local model parameters of communication device $d_z$ in the $t$-th iteration;
the Euclidean norm $d_k(t)$ between Figure FDA00039039441700000413 and Figure FDA00039039441700000414 is expressed as:
Figure FDA00039039441700000415
where $d_k(t)$ denotes the Euclidean norm between Figure FDA00039039441700000416 and Figure FDA00039039441700000417;
the proportional reference value $e_k(t)$ of $d_k(t)$ and Figure FDA00039039441700000418 is expressed as:
Figure FDA0003903944170000051
where $e_k(t)$ denotes the proportional reference value of $d_k(t)$ and Figure FDA0003903944170000052, Figure FDA0003903944170000053 denotes the Euclidean norm of Figure FDA0003903944170000054, and $d_k(t)$ denotes the Euclidean norm between Figure FDA0003903944170000055 and Figure FDA0003903944170000056;
the attack detection variable is defined as $b_k(t) \in \{0,1\}$, where $b_k(t)=1$ indicates that the model parameters are normal and $b_k(t)=0$ otherwise; model poisoning attack detection is expressed as
Figure FDA0003903944170000057
where $b_k(t)$ is the attack detection variable and $e_{\mathrm{Thr}}$ is the detection threshold;
when the proportion of erroneous model parameters uploaded by communication device $d_k$ up to the $t$-th iteration exceeds the threshold $\xi$, that is,
Figure FDA0003903944170000058
where $\xi$ denotes the erroneous model parameter proportion threshold, $a_k(z)$ denotes the scheduling indicator variable of communication device $d_k$ in the $z$-th iteration, and $b_k(z)$ denotes the attack detection variable of communication device $d_k$ in the $z$-th iteration,
communication device $d_k$ is treated as a malicious communication device and removed from $\mathcal{D}_{\mathrm{ava}}(t+1)$.
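A minimal sketch of the detection rule in claim 5 follows, under stated assumptions: the leave-one-out mean is taken over the other scheduled devices, $e_k(t)$ is the ratio $d_k(t)$ divided by the norm of that mean, a device is flagged ($b_k(t) = 0$) when $e_k(t) > e_{\mathrm{Thr}}$, and a device is declared malicious when the flagged share of its scheduled uploads exceeds $\xi$. The image formulas are not reproduced, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def mean_excluding(models: Sequence[Sequence[float]], scheduled: Sequence[int], k: int) -> List[float]:
    """Average of the other scheduled devices' parameters (device k left out)."""
    others = [m for z, (m, a) in enumerate(zip(models, scheduled)) if z != k and a == 1]
    if not others:
        raise ValueError("need at least one other scheduled device")
    dim = len(models[k])
    return [sum(m[i] for m in others) / len(others) for i in range(dim)]


def detect(models: Sequence[Sequence[float]], scheduled: Sequence[int], e_thr: float) -> List[int]:
    """b_k(t) per device: 1 = normal, 0 = erroneous. Unscheduled devices are not tested and get 1."""
    b = []
    for k, a in enumerate(scheduled):
        if a == 0:
            b.append(1)
            continue
        ref = mean_excluding(models, scheduled, k)
        ref_norm = _norm(ref)
        if ref_norm == 0:
            raise ValueError("leave-one-out mean has zero norm; ratio undefined")
        d_k = _norm([x - y for x, y in zip(models[k], ref)])
        e_k = d_k / ref_norm  # assumed proportional reference value e_k(t)
        b.append(1 if e_k <= e_thr else 0)
    return b


def is_malicious(a_hist: Sequence[int], b_hist: Sequence[int], xi: float) -> bool:
    """True if the share of scheduled uploads flagged as erroneous exceeds xi."""
    n_sched = sum(a_hist)
    if n_sched == 0:
        return False
    n_err = sum(a * (1 - b) for a, b in zip(a_hist, b_hist))
    return n_err / n_sched > xi
```

With few scheduled devices an outlier also shifts the leave-one-out means seen by honest devices (in the test case below their ratio rises to 0.6), so $e_{\mathrm{Thr}}$ has to be chosen with the expected number of scheduled devices in mind.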
6. The smart park system according to claim 1, wherein the total digital twin model training delay $L_{\mathrm{Sum}}(t)$ of the $t$-th iteration consists of the maximum local training delay in the device layer, the maximum model parameter upload delay from the device layer to the 6G edge intelligence layer, and the global model averaging delay in the 6G edge intelligence layer, and is expressed as:
Figure FDA0003903944170000059
where $L_{\mathrm{Sum}}(t)$ denotes the total digital twin model training delay, Figure FDA00039039441700000510 denotes the local training delay, Figure FDA00039039441700000511 denotes the local model parameter upload delay, and $L_G(t)$ is the global model averaging delay.
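Claim 6 adds the slowest local training delay, the slowest upload delay, and the global averaging delay. A short sketch, assuming the maxima are taken over scheduled devices only (the helper name is hypothetical):

```python
from typing import Sequence


def total_training_delay(local_delays: Sequence[float], upload_delays: Sequence[float],
                         global_delay: float, scheduled: Sequence[int]) -> float:
    """L_Sum(t): slowest scheduled trainer + slowest scheduled uploader + global averaging delay."""
    max_local = max((d for d, a in zip(local_delays, scheduled) if a), default=0.0)
    max_upload = max((d for d, a in zip(upload_delays, scheduled) if a), default=0.0)
    return max_local + max_upload + global_delay
```

Because the two maxima may come from different devices, this sum is an upper bound on the wall-clock time of an iteration in which every device uploads as soon as it finishes training.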
7. An endogenous security-aware resource management method for social assets participating in power grid interaction, implemented on the basis of the smart park system according to claim 1;
the endogenous security-aware resource management method comprises the following steps:
step 1: initialize Figure FDA0003903944170000061 and Figure FDA0003903944170000062, and set $\Omega(a_T(r), a(t)) = 0$ and Figure FDA0003903944170000063, where Figure FDA0003903944170000064 and Figure FDA0003903944170000065 denote the evaluation network parameters and the target network parameters, respectively, of large-scale local training period scheduling; Figure FDA0003903944170000066 and Figure FDA0003903944170000067 denote the evaluation network parameters and the target network parameters, respectively, of small-scale device scheduling; $\Omega(a_T(r), a(t))$ denotes the small-scale device scheduling overhead function; Figure FDA0003903944170000068 denotes the large-scale local training period scheduling overhead function; $a_T(r)$ denotes the large-scale local training period scheduling action; and $a(t)$ denotes the small-scale device scheduling action;
step 2: at the beginning of each communication round, that is, when $t = (r-1)T_0 + 1$, the edge server makes the large-timescale local training period scheduling decision using an $\epsilon$-greedy algorithm; similarly, at the beginning of each iteration, the edge server makes the small-timescale device scheduling decision using an $\epsilon$-greedy algorithm; the devices perform local model training and upload local model parameters according to these decisions; the edge server performs global model averaging and model poisoning attack detection, and calculates the poisoning attack probability;
step 3: at the end of each iteration, the edge server observes the digital twin loss function and the average total training delay,
and calculates the overhead function of the small-scale device scheduling network;
at the end of each communication round, the edge server calculates the overhead function Figure FDA0003903944170000069 of the large-scale local training period scheduling network;
for large-scale local training period scheduling, the edge server generates experience data Figure FDA00039039441700000610 to update the experience replay pool $U_T(r)$, and transitions from the current state $S_T(r)$ to the next state $S_T(r+1)$; for small-scale device scheduling, the edge server generates experience data $u_D(t) = \{S_D(t), A_D(t), \Omega(a_T(r), a(t)), S_D(t+1)\}$ to update the experience replay pool $U_D(t)$, and transitions from the current state $S_D(t)$ to the next state $S_D(t+1)$;
step 4: at the end of each communication round, the edge server randomly draws a sample set Figure FDA0003903944170000071 from the experience replay pool $U_T(r)$ and calculates the loss function $\eta_T(r)$ of the large-scale local training period scheduling network as follows:
Figure FDA0003903944170000072
where $\eta_T(r)$ is the loss function of the large-scale local training period scheduling network, Figure FDA0003903944170000073 is the Q value of the large-scale local training period scheduling evaluation network, Figure FDA0003903944170000074 are, respectively, the state space and action space of the $(r+1)$-th communication round and the large-scale local training period scheduling evaluation network parameters of the $r$-th communication round, and Figure FDA0003903944170000075 denotes the Q value of the local training period scheduling target network, calculated as follows:
Figure FDA0003903944170000076
where $\iota$ is the discount factor, $\Omega(a_T(r), a(t))$ is the small-scale device scheduling overhead function, and Figure FDA0003903944170000077 is the Q value of the large-scale local training period scheduling evaluation network;
based on the loss function $\eta_T(r)$ of the large-scale local training period scheduling network, the edge server updates the parameters Figure FDA0003903944170000078 using gradient descent;
at the end of each iteration, the edge server randomly draws a sample set Figure FDA0003903944170000079 from the experience replay pool $U_D(t)$, calculates the loss function $\eta_D(t)$ of the small-scale device scheduling network, and updates the parameters Figure FDA00039039441700000710;
every $R_0$ communication rounds, the large-scale local training period scheduling target network parameters Figure FDA00039039441700000711 are updated;
every $T_0$ iterations, the small-scale device scheduling target network parameters Figure FDA00039039441700000712 are updated;
step 5: repeat steps 2 to 4 until $t = T$.
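The two-timescale decision structure of steps 2 and 3 (a large-timescale decision at $t = (r-1)T_0 + 1$, a small-timescale decision every iteration, both by $\epsilon$-greedy, with experience stored in replay pools $U_T$ and $U_D$) can be sketched with the helpers below. The sketch assumes that Q values estimate overhead, so the greedy action is the one with the lowest Q; the patent's networks, state encodings, and action sets are not reproduced, and all names are hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

# (state, action, overhead, next_state), mirroring u_D(t) = {S_D(t), A_D(t), Omega, S_D(t+1)}
Transition = Tuple[Sequence[float], int, float, Sequence[float]]


def is_round_start(t: int, T0: int) -> bool:
    """Iteration t opens a communication round when t = (r - 1) * T0 + 1."""
    return (t - 1) % T0 == 0


def round_index(t: int, T0: int) -> int:
    """Communication round r (1-indexed) that contains iteration t."""
    return (t - 1) // T0 + 1


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon; otherwise exploit.
    Assumption: Q values are overhead estimates, so exploiting means taking the minimum."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return min(range(len(q_values)), key=lambda i: q_values[i])


class ReplayPool:
    """Bounded experience replay pool; the oldest transitions are dropped when full."""

    def __init__(self, capacity: int) -> None:
        self._buf: Deque[Transition] = deque(maxlen=capacity)

    def push(self, state: Sequence[float], action: int, overhead: float,
             next_state: Sequence[float]) -> None:
        self._buf.append((state, action, overhead, next_state))

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        """Uniform random mini-batch without replacement."""
        return rng.sample(list(self._buf), min(batch_size, len(self._buf)))

    def __len__(self) -> int:
        return len(self._buf)
```

In a driver loop, `is_round_start(t, T0)` would gate the large-timescale decision and the push into $U_T$ at the end of round `round_index(t, T0)`, while the small-timescale decision and the push into $U_D$ happen on every iteration.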
8. The endogenous security-aware resource management method according to claim 7, wherein in step 2, the poisoning attack probability is calculated as follows:
Figure FDA0003903944170000081
where Figure FDA0003903944170000082 denotes the estimated model poisoning attack probability of the $t$-th iteration, $a_k(t)$ is the scheduling indicator variable of the communication device, and $b_k(t)$ denotes the attack detection variable.
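Claim 8's formula is only available as an image; one natural reading of the listed variables is the share of scheduled devices whose upload was flagged in the $t$-th iteration, $\sum_k a_k(t)\,(1 - b_k(t)) / \sum_k a_k(t)$. The sketch below implements that assumed form; the function name is hypothetical.

```python
from typing import Sequence


def poisoning_attack_probability(scheduled: Sequence[int], detected: Sequence[int]) -> float:
    """Assumed estimate: fraction of this iteration's scheduled uploads with b_k(t) = 0."""
    n_sched = sum(scheduled)
    if n_sched == 0:
        return 0.0  # nothing uploaded, nothing flagged
    return sum(a * (1 - b) for a, b in zip(scheduled, detected)) / n_sched
```

For four scheduled devices of which one is flagged, the estimate is 0.25; flags on unscheduled devices are ignored because their $a_k(t)$ is 0.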
9. The endogenous security-aware resource management method according to claim 8, wherein in step 3,
the overhead function $\Omega(a_T(r), a(t))$ of the small-scale device scheduling network is calculated as
Figure FDA0003903944170000083
where
Figure FDA0003903944170000084
$\delta$ denotes the upper bound of the divergence between the gradient of the local model loss function and the gradient of the digital twin loss function; $\rho$ and $\beta$ indicate that the local model loss function is $\rho$-Lipschitz and $\beta$-smooth; $\theta$ denotes the upper bound of the parameter norms of all local models; and $\theta$ denotes the lower bound of the divergence between the digital twin loss function of the current iteration and the converged loss function Figure FDA0003903944170000085;
Figure FDA0003903944170000086
Figure FDA0003903944170000087 denotes the estimated model poisoning attack probability of the $t$-th iteration;
at the end of each communication round, that is, when $t = rT_0$, the edge server calculates the overhead function Figure FDA0003903944170000088 of the large-scale local training period scheduling network as follows:
Figure FDA0003903944170000089
where Figure FDA00039039441700000810 is the large-scale local training period scheduling network overhead function and $\Omega(a_T(r), a(t))$ is the small-scale device scheduling overhead function.
10. The endogenous security-aware resource management method according to claim 9, wherein:
in step 4, the loss function $\eta_T(r)$ of the large-scale local training period scheduling network is calculated as follows:
Figure FDA0003903944170000091
where $\eta_T(r)$ is the loss function of the large-scale local training period scheduling network, Figure FDA0003903944170000092 is the Q value of the large-scale local training period scheduling evaluation network, and $S_T(r+1)$, $A_T(r+1)$, and Figure FDA0003903944170000093 are, respectively, the state space and action space of the $(r+1)$-th communication round and the large-scale local training period scheduling evaluation network parameters of the $r$-th communication round;
Figure FDA0003903944170000094 denotes the Q value of the local training period scheduling target network and is calculated as follows:
Figure FDA0003903944170000095
where $\iota$ is the discount factor, $\Omega(a_T(r), a(t))$ is the small-scale device scheduling overhead function, and Figure FDA0003903944170000096 is the Q value of the large-scale local training period scheduling evaluation network;
based on $\eta_T(r)$, the edge server updates the parameters Figure FDA0003903944170000097 using gradient descent as follows:
Figure FDA0003903944170000098
where $\kappa$ denotes the learning step size, Figure FDA0003903944170000099 denotes the large-scale local training period scheduling evaluation network parameters of the $(r+1)$-th communication round, and Figure FDA00039039441700000910 is the gradient of the loss function $\eta_T(r)$;
at the end of each iteration, the edge server randomly draws a sample set Figure FDA00039039441700000911 from the experience replay pool $U_D(t)$, calculates the loss function $\eta_D(t)$ of the small-scale device scheduling network, and updates the parameters Figure FDA00039039441700000912 as follows:
Figure FDA00039039441700000913
Figure FDA00039039441700000914
where $\eta_D(t)$ is the loss function of the small-scale device scheduling network, Figure FDA00039039441700000915 is the Q value of the device scheduling target network, Figure FDA00039039441700000916 is the Q value of the small-scale device scheduling evaluation network, $S_D(t+1)$, $A_D(t+1)$, and Figure FDA00039039441700000917 are, respectively, the state space and action space of the $(t+1)$-th iteration and the small-scale device scheduling evaluation network parameters of the $t$-th iteration, $\kappa$ denotes the learning step size, Figure FDA00039039441700000918 denotes the small-scale device scheduling evaluation network parameters of the $(t+1)$-th iteration, and Figure FDA0003903944170000101 is the gradient of the loss function $\eta_D(t)$;
every $R_0$ communication rounds, the large-scale local training period scheduling target network parameters Figure FDA0003903944170000102 are updated;
every $T_0$ iterations, the small-scale device scheduling target network parameters Figure FDA0003903944170000103 are updated.
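The loss and update rules in claim 10 follow the standard DQN pattern: a temporal-difference target built from the observed overhead plus the discounted target-network Q value, a squared-error loss, a gradient step of size $\kappa$ on the evaluation network, and a periodic hard copy into the target network every $R_0$ rounds or $T_0$ iterations. The sketch replaces the neural networks with a linear Q-function so that it stays dependency-free and, consistent with treating overhead as a cost, takes the minimum target Q over next actions; both choices are assumptions, and the class and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Transition = Tuple[Sequence[float], int, float, Sequence[float]]  # (S, A, overhead, S_next)


class LinearQ:
    """Q(s, a) = w_a . s, a linear stand-in for the evaluation and target networks."""

    def __init__(self, n_actions: int, state_dim: int) -> None:
        self.w: List[List[float]] = [[0.0] * state_dim for _ in range(n_actions)]

    def q(self, s: Sequence[float], a: int) -> float:
        return sum(wi * si for wi, si in zip(self.w[a], s))

    def copy_from(self, other: "LinearQ") -> None:
        """Hard target-network update: copy the evaluation parameters."""
        self.w = [row[:] for row in other.w]


def td_step(eval_net: LinearQ, target_net: LinearQ, batch: Sequence[Transition],
            iota: float, kappa: float) -> float:
    """One gradient step on the mean squared TD error; returns the loss before the update."""
    n_actions = len(eval_net.w)
    grads = [[0.0] * len(row) for row in eval_net.w]
    loss = 0.0
    for s, a, overhead, s_next in batch:
        # TD target: observed overhead plus discounted lowest target-network Q at s_next
        y = overhead + iota * min(target_net.q(s_next, b) for b in range(n_actions))
        err = eval_net.q(s, a) - y
        loss += err * err
        for i, si in enumerate(s):
            grads[a][i] += 2.0 * err * si  # d(err^2)/dw_a = 2 * err * s
    n = len(batch)
    for a, row in enumerate(grads):
        for i, g in enumerate(row):
            eval_net.w[a][i] -= kappa * g / n
    return loss / n
```

In the large-scale loop, `td_step` would run once per communication round on a batch sampled from $U_T$, followed by `target_net.copy_from(eval_net)` every $R_0$ rounds; the small-scale loop would do the same on every iteration with batches from $U_D$ and a copy every $T_0$ iterations.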
11. A computer device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the endogenous security-aware resource management method according to any one of claims 7 to 10.
12. A computer-readable storage medium, storing computer instructions which, when executed by a processor, implement the endogenous security-aware resource management method according to any one of claims 7 to 10.
CN202211300064.4A 2022-10-24 2022-10-24 Endogenous security perception resource management method for social asset participation power grid interaction Pending CN115664924A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211300064.4A CN115664924A (en) 2022-10-24 2022-10-24 Endogenous security perception resource management method for social asset participation power grid interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211300064.4A CN115664924A (en) 2022-10-24 2022-10-24 Endogenous security perception resource management method for social asset participation power grid interaction

Publications (1)

Publication Number Publication Date
CN115664924A true CN115664924A (en) 2023-01-31

Family

ID=84991465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211300064.4A Pending CN115664924A (en) 2022-10-24 2022-10-24 Endogenous security perception resource management method for social asset participation power grid interaction

Country Status (1)

Country Link
CN (1) CN115664924A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116843304A (en) * 2023-09-04 2023-10-03 中国工业互联网研究院 (China Academy of Industrial Internet) Digital twin park management method, device, equipment and storage medium
CN116843304B (en) * 2023-09-04 2023-11-21 中国工业互联网研究院 (China Academy of Industrial Internet) Digital twin park management method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
Daneshfar et al. Load–frequency control: a GA-based multi-agent reinforcement learning
CN110190918B (en) Cognitive wireless sensor network spectrum access method based on deep Q learning
CN103730006A (en) Short-time traffic flow combined forecasting method
CN111144663B (en) Ultra-short-term wind power prediction method for offshore wind farm considering output fluctuation process
KR101749427B1 (en) Method for forecasting wind speed based on artificial neural networks having different features
CN105138717A (en) Transformer state evaluation method by optimizing neural network with dynamic mutation particle swarm
Truong et al. Robust variable sampling period control for networked control systems
CN103322553A (en) Multi-model disturbance estimation predictive-control method for superheated steam temperature of thermal power generating unit
CN105760213A (en) Early warning system and method of resource utilization rate of virtual machine in cloud environment
CN110889779A (en) Typical scene model construction method and unit recovery method for multi-wind-farm output
CN115664924A (en) Endogenous security perception resource management method for social asset participation power grid interaction
CN112187554A (en) Operation and maintenance system fault positioning method and system based on Monte Carlo tree search
CN110874665B (en) Control device and method for wind generating set
Tabak Maiden application of fractional order PID plus second order derivative controller in automatic voltage regulator
CN112310980A (en) Safety and stability evaluation method and system for direct-current blocking frequency of alternating-current and direct-current series-parallel power grid
CN115564193A (en) Multi-dimensional comprehensive benefit evaluation method and system for intelligent power distribution network and storage medium
CN115865679A (en) Network management and control method, system and storage medium
CN103578274B (en) A kind of traffic flow forecasting method and device
WO2021250445A1 (en) Network performance assessment
CN114757548A (en) Wind power energy storage equipment adjusting performance evaluation method adopting scene construction
CN114500561A (en) Power internet of things network resource allocation decision method, system, device and medium
Váquez et al. New predictive PID controllers for packet dropouts in wireless networked control systems
Huang et al. Probabilistic prediction intervals of wind speed based on explainable neural network
CN117311159A (en) Self-adaptive adjusting method and device of control system, storage medium and electronic equipment
CN113705067B (en) Microgrid optimization operation strategy generation method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination