CN115664924A

CN115664924A - Endogenous security perception resource management method for social asset participation power grid interaction

Info

Publication number: CN115664924A
Application number: CN202211300064.4A
Authority: CN
Inventors: 周振宇; 张孙烜; 姚子佳; 廖海君; 杜治钢; 甘忠; 朱靖恺; 姚贤炯; 游兆阳; 陈毅龙; 肖云杰; 宋岩; 黄大维; 郭磊; 冯晨
Original assignee: North China Electric Power University; State Grid Shanghai Electric Power Co Ltd
Current assignee: North China Electric Power University; State Grid Shanghai Electric Power Co Ltd
Priority date: 2022-10-24
Filing date: 2022-10-24
Publication date: 2023-01-31

Abstract

The invention discloses an endogenous security perception resource management method for social assets participating in power grid interaction, based on a smart park system that comprises a device layer, a 6G edge intelligent layer and a digital twin layer. In the device layer, communication devices are deployed on electrical equipment to perform local model training and upload the local model parameters to the 6G edge intelligent layer for global model averaging. The 6G edge intelligent layer deploys an edge server, which schedules the local training period to guide local model training, formulates a communication device scheduling strategy for uploading local parameters, and provides model poisoning attack detection for the uploaded local model parameters. The digital twin layer interacts with the park communication devices in real time and assists resource management optimization. The advantages of the invention are that it provides real-time simulation and accurate prediction for the safe and reliable operation of the smart park, achieves endogenous security perception, reduces the optimization space, and accelerates the convergence of multi-time-scale resource management optimization.

Description

Endogenous security perception resource management method for social asset participation power grid interaction

Technical Field

The invention relates to the technical field of smart parks, in particular to a smart park system and an endogenous security perception resource management method for social assets participating in power grid interaction.

Background

The smart park covers multiple energy entities such as a high proportion of renewable energy sources and distributed energy storage, and its safe and reliable operation is a key support for implementing China's new energy security strategy and achieving the 'dual carbon' goal. To achieve safe and reliable operation of the smart park, a large number of communication devices are deployed on electrical equipment such as distributed photovoltaics, charging piles and distribution boxes to provide monitoring and control management for park operation. Digital twin technology provides real-time simulation and accurate prediction for the safe and reliable operation of the smart park by bridging the gap between the physical space and the digital system. Each device trains a model locally and uploads the model parameters to the edge server for global model averaging, thereby updating the digital twin model.

To keep the digital twin accurately synchronized in real time with the physical smart park network, local training period scheduling and communication device scheduling must be managed intelligently and flexibly. However, the energy supply, energy use and asset ownership relations of the smart park are complex, and a large number of social assets participate in power grid interaction through operator networks. As a result, the communication environment of the smart park is complex, various malicious attackers can intrude into the park communication network, and the updating of the digital twin model faces diverse and frequent network attack threats. Among these, model poisoning attacks have the most significant influence on digital twin model updating. A malicious attacker corrupts global model averaging by controlling communication devices to upload erroneous model parameters, which reduces the accuracy of the digital twin model, biases simulation and prediction, and seriously threatens the security of the smart park. Therefore, while optimizing park communication resource management, an endogenous security perception technique needs to be introduced so that the park actively adjusts its resource management strategy to cope with malicious attacks under social asset participation and interaction. However, research on endogenous security perception communication resource management for social asset participation and interaction is still at an early stage, and the following challenges need to be solved:

Firstly, to counter the loss of digital twin model accuracy caused by model poisoning attacks from malicious attackers when social assets participate in power grid interaction, the edge server increases the number of scheduled devices and the local training period. However, this increases the total delay of digital twin model training.

Secondly, the social assets of the smart park frequently participate in power grid interaction, so the network environment of the smart park is complex and the behavior of malicious attackers is highly random. Because the attack behavior is unpredictable, it is difficult for the edge server to actively adjust the resource management strategy so as to ensure high-accuracy and low-delay digital twin model training.

Thirdly, since the probability change of model poisoning attack is much slower than the change of channel condition, local training period scheduling needs to be optimized on a large time scale; to ensure real-time updating of the digital twin model, the device scheduling needs to be optimized on a small time scale. However, due to the large optimization space, it is difficult for the edge server to make an optimal multi-time scale resource management strategy within a limited optimization time, so that the convergence speed of the optimization is slow.

Prior art 1

Equipment arrangement algorithm based on deep reinforcement learning

Prior art 1 considers only small-time-scale device scheduling optimization when performing resource management optimization. It does not consider large-time-scale local training period scheduling optimization, so the convergence of the digital twin loss function cannot be effectively guaranteed. In addition, prior art 1 does not consider model poisoning attack detection, so the algorithm can hardly achieve endogenous security perception and actively adjust the resource management strategy.

Prior art 2

Client scheduling algorithm based on upper confidence bound

Prior art 2 considers only the optimization of the average total training delay and small-time-scale device scheduling when performing resource management optimization. It does not consider the optimization of the digital twin loss function or large-time-scale local training period scheduling, so the accuracy of the digital twin model and the convergence of the loss function cannot be effectively guaranteed. In addition, prior art 2 does not consider model poisoning attack detection, so the algorithm can hardly achieve endogenous security perception and actively adjust the resource management strategy.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a smart park system, and an endogenous security perception resource management method and equipment for social asset participation power grid interaction.

In order to realize the purpose of the invention, the technical scheme adopted by the invention is as follows:

a smart park system, comprising: a device layer, a 6G edge intelligent layer and a digital twin layer;

the device layer includes: communication devices and electrical equipment, wherein the communication devices are deployed on the electrical equipment to perform local model training; the communication devices execute local model training according to the local training period scheduling strategy, and upload local model parameters to the 6G edge intelligent layer according to the communication device scheduling strategy for global model averaging, so as to update the digital twin model of the digital twin layer; the electrical equipment includes photovoltaic equipment and charging piles;

the 6G edge intelligent layer deploys a base station carrying an edge server; the edge server schedules the local training period to guide local model training, and formulates a communication device scheduling strategy for uploading local parameters; the edge server provides model poisoning attack detection for the uploaded local model parameters;

the digital twin layer is maintained by an edge server of a 6G edge intelligent layer, real-time interaction with the park communication equipment is realized by the digital twin layer, and synchronization of the digital twin layer and a physical network is kept; the digital twin layer provides channel gain and electromagnetic interference estimation through local state information uploaded by the equipment layer, and provides model poisoning attack probability estimation through model poisoning attack detection of the 6G edge intelligent layer, so that local training period scheduling and communication equipment scheduling optimization of the 6G edge intelligent layer are assisted.

Further, the local training period scheduling strategy performs local model training, including:

in each iteration, the edge server schedules communication devices to upload local model parameters for global model averaging; if a communication device has been identified by the edge server as a malicious communication device, it cannot perform local training or parameter upload; let the set of available communication devices in the t-th iteration be $D^{ava}(t)$, with size $|D^{ava}(t)|$, which satisfies $|D^{ava}(t)| \le K$, where $K$ is the total number of deployed communication devices;

Define $a_k(t) \in \{0,1\}$ as the scheduling indicator variable of communication device $d_k$ in the t-th iteration: $a_k(t) = 1$ denotes that communication device $d_k$ is scheduled in the t-th iteration, and $a_k(t) = 0$ otherwise;

In the t-th iteration, each scheduled communication device, e.g. communication device $d_k$, downloads the global model from the edge server to update its local model, which is expressed as follows:

where $\tau(r)$ represents the local training period of the r-th communication round, $\omega_g(t-1)$ represents the global model of the (t-1)-th iteration, and $\omega_k(t-1)$ denotes the local model of communication device $d_k$ in the (t-1)-th iteration;

in each iteration, communication device $d_k$ performs local model training using its local dataset $\mathcal{D}_k$; define $x_j$ and $y_j$ as the input and target output of the j-th sample in $\mathcal{D}_k$, respectively; thus, the loss function of communication device $d_k$ in the t-th iteration is as follows:

$$F_k(\omega_k(t), t) = \frac{1}{|\mathcal{D}_k|} \sum_{j=1}^{|\mathcal{D}_k|} f\big(x_j, y_j, \omega_k(t)\big)$$

wherein $F_k(\omega_k(t), t)$ represents the loss function of communication device $d_k$ in the t-th iteration, $\omega_k(t)$ denotes its local model, $|\mathcal{D}_k|$ represents the number of samples in dataset $\mathcal{D}_k$, and $f(x_j, y_j, \omega_k(t))$ represents the loss function of a single sample, defined as the deviation of the actual output from the target output; based on the gradient descent method, $\omega_k(t)$ is updated according to

$$\omega_k(t) \leftarrow \omega_k(t) - \gamma \nabla F_k(\omega_k(t), t)$$

wherein $\gamma$ represents the learning step size;

thus, the local training delay $L_k^{Loc}(t)$ of communication device $d_k$ in the t-th iteration is represented by the formula:

$$L_k^{Loc}(t) = \frac{\tau(r)\, \zeta_k\, |\mathcal{D}_k|}{f_k(t)}$$

wherein $L_k^{Loc}(t)$ represents the local training delay of communication device $d_k$ in the t-th iteration, $\zeta_k$ is the number of CPU cycles required to process one sample, $f_k(t)$ is the locally available computing resource, $\tau(r)$ denotes the local training period of the r-th communication round, and $|\mathcal{D}_k|$ represents the number of samples in the local dataset $\mathcal{D}_k$.
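The local training step and its delay can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the source: the single-sample loss is taken to be a squared error on a linear model, and the function names `local_train` and `local_training_delay` are hypothetical. The delay is computed as local training period times CPU cycles per sample times sample count, divided by the available computing resource, which is one reading of the quantities defined above.

```python
import numpy as np

def local_train(w, X, y, tau, gamma):
    """Run tau gradient-descent steps on a mean-squared-error loss (assumed loss)."""
    w = np.asarray(w, dtype=float).copy()
    for _ in range(tau):
        residual = X @ w - y                    # actual output minus target output
        grad = 2.0 * X.T @ residual / len(y)    # gradient of the mean squared error
        w -= gamma * grad                       # gradient-descent update with step gamma
    return w

def local_training_delay(tau, zeta_k, num_samples, f_k):
    """Assumed form: tau * zeta_k * num_samples / f_k, in seconds."""
    return tau * zeta_k * num_samples / f_k
```

A larger local training period `tau` refines the local model further per iteration but increases the delay proportionally, which is the trade-off the scheduling strategy has to balance.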

Further, uploading the local model parameters to the 6G edge smart tier specifically comprises:

after local model training is finished, each scheduled communication device uploads its local model parameters and local state information; the transmission rate is expressed as follows:

$$R_k(t) = B_k \log_2\left(1 + \frac{p_k\, g_k(t)}{\sigma_0 + I_k(t)}\right)$$

wherein $R_k(t)$ is the transmission rate of communication device $d_k$ in the t-th iteration, $B_k$ is the transmission bandwidth, $p_k$ and $g_k(t)$ are the transmission power and channel gain, respectively, and $\sigma_0$ and $I_k(t)$ are the white Gaussian noise and electromagnetic interference, respectively; $|\omega_k(t)|$ is the size of the local model parameters and $|s_k(t)|$ is the size of the local state information; since $|s_k(t)|$ is small, the upload delay of $s_k(t)$ is ignored, and the local model parameter upload delay $L_k^{Up}(t)$ is represented by the formula:

$$L_k^{Up}(t) = \frac{|\omega_k(t)|}{R_k(t)}$$

wherein $|\omega_k(t)|$ is the size of the local model parameters and $L_k^{Up}(t)$ is the local model parameter upload delay.
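As a rough illustration of the upload model, the following Python sketch computes a Shannon-type transmission rate and the resulting upload delay. The rate form (bandwidth times log2 of one plus signal power over noise plus electromagnetic interference) is an assumption consistent with the quantities listed above, and the function names are hypothetical.

```python
import math

def transmission_rate(bandwidth_hz, tx_power, channel_gain, noise, emi):
    """Assumed Shannon-type rate: B * log2(1 + p * g / (noise + interference))."""
    sinr = tx_power * channel_gain / (noise + emi)
    return bandwidth_hz * math.log2(1.0 + sinr)

def upload_delay(model_bits, rate_bps):
    """Upload delay of the local model parameters: size divided by rate."""
    return model_bits / rate_bps
```

Stronger electromagnetic interference lowers the rate and lengthens the upload, which is why the digital twin layer's interference estimate is useful for device scheduling.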

Further, the performing global model averaging by the 6G edge smart tier specifically includes:

after the scheduled communication devices upload their local model parameters and state information to the edge server, the edge server executes global model averaging to update the digital twin model; the global model $\omega_g(t)$ of the t-th iteration is averaged as follows:

$$\omega_g(t) = \frac{\sum_{d_k \in D^{ava}(t)} a_k(t)\, |\mathcal{D}_k|\, \omega_k(t)}{\sum_{d_k \in D^{ava}(t)} a_k(t)\, |\mathcal{D}_k|}$$

wherein $\omega_g(t)$ represents the global model of the t-th iteration, $|\mathcal{D}_k|$ represents the number of samples in the local dataset $\mathcal{D}_k$, $a_k(t)$ is the scheduling indicator variable of communication device $d_k$ in the t-th iteration, and $\omega_k(t)$ denotes the local model parameters of communication device $d_k$ in the t-th iteration;

the global model averaging delay $L^G(t)$ of the t-th iteration is represented by the following formula:

$$L^G(t) = \frac{\lambda_0 \sum_{d_k \in D^{ava}(t)} a_k(t)\, |\omega_k(t)|}{f_g(t)}$$

wherein $L^G(t)$ is the global model averaging delay of the t-th iteration, $f_g(t)$ denotes the available computing resources of the edge server in the t-th iteration, $\lambda_0$ indicates the number of CPU cycles required to process 1 bit of data, $|\omega_k(t)|$ represents the size of $\omega_k(t)$, and $a_k(t)$ is the scheduling indicator variable of communication device $d_k$ in the t-th iteration;

the loss function $F_g(\omega_g(t), t)$ of $\omega_g(t)$ is used to measure the accuracy of the global model, expressed as:

$$F_g(\omega_g(t), t) = \frac{\sum_{d_k \in D^{ava}(t)} a_k(t)\, |\mathcal{D}_k|\, F_k(\omega_k(t), t)}{\sum_{d_k \in D^{ava}(t)} a_k(t)\, |\mathcal{D}_k|}$$

wherein $F_g(\omega_g(t), t)$ represents the loss function of $\omega_g(t)$, $F_k(\omega_k(t), t)$ represents the loss function of communication device $d_k$ in the t-th iteration, $|\mathcal{D}_k|$ represents the number of samples in the local dataset $\mathcal{D}_k$, and $a_k(t)$ is the scheduling indicator variable of communication device $d_k$ in the t-th iteration.
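Global model averaging over the scheduled devices can be sketched in Python as follows. This is a minimal sketch that assumes a sample-count-weighted average in the style of federated averaging and a delay proportional to the total number of uploaded bits; the function names are hypothetical.

```python
import numpy as np

def global_average(local_models, sample_counts, scheduled):
    """Average the scheduled local models, weighted by local sample counts (assumed weighting)."""
    weights = np.asarray(sample_counts, dtype=float) * np.asarray(scheduled, dtype=float)
    if weights.sum() == 0:
        raise ValueError("no device was scheduled in this iteration")
    models = np.asarray(local_models, dtype=float)
    return (weights[:, None] * models).sum(axis=0) / weights.sum()

def global_averaging_delay(model_bits, scheduled, lambda_0, f_g):
    """Assumed form: CPU cycles per bit times total uploaded bits, over server resources."""
    total_bits = sum(a * s for a, s in zip(scheduled, model_bits))
    return lambda_0 * total_bits / f_g
```

Unscheduled devices carry a zero indicator and therefore contribute neither to the average nor to the delay.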

Further, the model poisoning attack detection includes:

model poisoning attack detection calculates the Euclidean norm between the parameters under test and the average of all other parameters, compares its ratio to the norm of that average with a preset threshold, and judges the parameters under test to be erroneous model parameters if the threshold is exceeded, and normal model parameters otherwise; the average parameter $\bar{\omega}_{-k}(t)$ excluding $\omega_k(t)$ is calculated as follows:

$$\bar{\omega}_{-k}(t) = \frac{\sum_{d_z \in D^{ava}(t),\, z \neq k} a_z(t)\, \omega_z(t)}{\sum_{d_z \in D^{ava}(t),\, z \neq k} a_z(t)}$$

wherein $\bar{\omega}_{-k}(t)$ is the average parameter excluding $\omega_k(t)$, $a_z(t)$ is the scheduling indicator variable of communication device $d_z$ in the t-th iteration, and $\omega_z(t)$ denotes the local model parameters of communication device $d_z$ in the t-th iteration;

the Euclidean norm $d_k(t)$ between $\omega_k(t)$ and $\bar{\omega}_{-k}(t)$ is represented by the following formula:

$$d_k(t) = \left\| \omega_k(t) - \bar{\omega}_{-k}(t) \right\|_2$$

wherein $d_k(t)$ represents the Euclidean norm between $\omega_k(t)$ and $\bar{\omega}_{-k}(t)$;

the proportional reference value $e_k(t)$ of $d_k(t)$ with respect to $\|\bar{\omega}_{-k}(t)\|_2$ is represented by the following formula:

$$e_k(t) = \frac{d_k(t)}{\left\| \bar{\omega}_{-k}(t) \right\|_2}$$

wherein $e_k(t)$ represents the ratio of $d_k(t)$ to $\|\bar{\omega}_{-k}(t)\|_2$, $\|\bar{\omega}_{-k}(t)\|_2$ represents the Euclidean norm of $\bar{\omega}_{-k}(t)$, and $d_k(t)$ represents the Euclidean norm between $\omega_k(t)$ and $\bar{\omega}_{-k}(t)$;

let the attack detection variable be $b_k(t) \in \{0,1\}$, where $b_k(t) = 1$ denotes that the model parameter is a normal parameter, and $b_k(t) = 0$ otherwise; model poisoning attack detection can be represented by

$$b_k(t) = \begin{cases} 1, & e_k(t) \le e^{Thr} \\ 0, & \text{otherwise} \end{cases}$$

wherein $b_k(t)$ represents the attack detection variable and $e^{Thr}$ is the detection threshold;

when, by the t-th iteration, the proportion of erroneous model parameters uploaded by communication device $d_k$ exceeds a threshold $\xi$, as given by the following formula:

$$\frac{\sum_{z=1}^{t} a_k(z)\big(1 - b_k(z)\big)}{\sum_{z=1}^{t} a_k(z)} > \xi$$

where $\xi$ represents the erroneous model parameter proportion threshold, $a_k(z)$ denotes the scheduling indicator variable of communication device $d_k$ in the z-th iteration, and $b_k(z)$ denotes the attack detection variable of communication device $d_k$ in the z-th iteration;

communication device $d_k$ is treated as a malicious communication device and is removed from $D^{ava}(t+1)$.
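The detection rule described above can be sketched in Python. This is a minimal sketch under stated assumptions: the average of the other devices' parameters is taken as an unweighted mean over the scheduled devices, the ratio is the distance divided by the norm of that mean, and the names `detect_poisoning` and `is_malicious` are hypothetical.

```python
import numpy as np

def detect_poisoning(models, scheduled, threshold):
    """Return {device index: b_k}, with b_k = 1 for normal and 0 for erroneous parameters."""
    models = np.asarray(models, dtype=float)
    idx = [k for k, a in enumerate(scheduled) if a == 1]
    if len(idx) < 2:
        raise ValueError("need at least two scheduled devices to compare against")
    flags = {}
    for k in idx:
        others = [z for z in idx if z != k]
        mean_others = models[others].mean(axis=0)         # average excluding device k
        distance = np.linalg.norm(models[k] - mean_others)
        ratio = distance / max(np.linalg.norm(mean_others), 1e-12)
        flags[k] = 1 if ratio <= threshold else 0
    return flags

def is_malicious(schedule_history, detection_history, xi):
    """Flag a device once its share of erroneous uploads exceeds xi (assumed ratio form)."""
    uploads = sum(schedule_history)
    if uploads == 0:
        return False
    bad = sum(a * (1 - b) for a, b in zip(schedule_history, detection_history))
    return bad / uploads > xi
```

Because one strong outlier also shifts the leave-one-out average seen by the honest devices, the threshold has to be chosen with some margin; the example values used to exercise this sketch were picked with that in mind.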

Further, the total delay $L^{Sum}(t)$ of digital twin model training in the t-th iteration is the sum of the maximum local training delay of the device layer, the maximum model parameter upload delay from the device layer to the 6G edge intelligent layer, and the global model averaging delay of the 6G edge intelligent layer, represented as follows:

$$L^{Sum}(t) = \max_{d_k \in D^{ava}(t)} \left\{ a_k(t)\, L_k^{Loc}(t) \right\} + \max_{d_k \in D^{ava}(t)} \left\{ a_k(t)\, L_k^{Up}(t) \right\} + L^G(t)$$

wherein $L^{Sum}(t)$ represents the total delay of digital twin model training, $L_k^{Loc}(t)$ represents the local training delay, $L_k^{Up}(t)$ is the local model parameter upload delay, and $L^G(t)$ is the global model averaging delay.
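The composition of the total delay can be illustrated with a short Python sketch. It assumes that the slowest scheduled device determines each of the two device-side terms, as the description of the maxima suggests; the function name `total_delay` is hypothetical.

```python
def total_delay(scheduled, train_delays, upload_delays, global_delay):
    """Max local training delay + max upload delay over scheduled devices + averaging delay."""
    max_train = max(a * d for a, d in zip(scheduled, train_delays))
    max_upload = max(a * d for a, d in zip(scheduled, upload_delays))
    return max_train + max_upload + global_delay
```

Multiplying by the scheduling indicator removes unscheduled devices from both maxima, so scheduling fewer or faster devices directly shortens each iteration.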

The invention also discloses an endogenous security perception resource management method for social assets participating in power grid interaction, characterized in that the method is implemented on the basis of the above smart park system;

the endogenous security perception resource management method comprises the following steps:

step 1, initialization: initialize the evaluation network parameters and the target network parameters of the large-time-scale local training period scheduling network, and the evaluation network parameters and the target network parameters of the small-time-scale device scheduling network; set the small-time-scale device scheduling overhead function $\Omega(a^T(r), a(t)) = 0$ and the large-time-scale local training period scheduling overhead function to zero, wherein $a^T(r)$ represents the large-time-scale local training period scheduling action and $a(t)$ represents the small-time-scale device scheduling action;

step 2, selecting multi-time-scale resource management actions: at the beginning of each communication round, i.e. $t = (r-1)T_0 + 1$, the edge server makes the large-time-scale local training period scheduling decision using the ε-greedy algorithm; similarly, at the beginning of each iteration, the edge server makes the small-time-scale device scheduling decision using the ε-greedy algorithm; the devices perform local model training and local model parameter upload according to these decisions; the edge server executes global model averaging and model poisoning attack detection, and calculates the poisoning attack probability;

step 3, overhead function calculation and smart park state transition: at the end of each iteration, the edge server observes the digital twin loss function and the average total training delay performance, and calculates the overhead function of the small-time-scale device scheduling network;

at the end of each communication round, the edge server calculates the overhead function of the large-time-scale local training period scheduling network;

for large-time-scale local training period scheduling, the edge server generates experience data $u^T(r)$, consisting of the current state $S^T(r)$, the action $A^T(r)$, the large-time-scale overhead function value and the next state $S^T(r+1)$, to update the experience replay pool $U^T(r)$, and transitions the current state $S^T(r)$ to the next state $S^T(r+1)$; similarly, for small-time-scale device scheduling, the edge server generates experience data $u^D(t) = \{S^D(t), A^D(t), \Omega(a^T(r), a(t)), S^D(t+1)\}$ to update the experience replay pool $U^D(t)$, and transitions the current state $S^D(t)$ to the next state $S^D(t+1)$;

step 4, learning and multi-time-scale DQN network updating: at the end of each communication round, the edge server randomly draws a sample set from the experience replay pool $U^T(r)$ and calculates the loss function $\eta^T(r)$ of the large-time-scale local training period scheduling network from the Q value of the large-time-scale local training period scheduling evaluation network and the Q value of the local training period scheduling target network; the evaluation network Q value depends on the state space $S^T(r+1)$ and action space $A^T(r+1)$ of the (r+1)-th communication round and on the evaluation network parameters of the r-th communication round; the target network Q value is calculated from the discount factor $\iota$, the small-time-scale device scheduling overhead function $\Omega(a^T(r), a(t))$ and the Q value of the large-time-scale local training period scheduling evaluation network;

based on the loss function $\eta^T(r)$ of the large-time-scale local training period scheduling network, the edge server updates the evaluation network parameters using the gradient descent method;

at the end of each iteration, the edge server randomly draws a sample set from the experience replay pool $U^D(t)$, calculates the loss function $\eta^D(t)$ of the small-time-scale device scheduling network, and updates its evaluation network parameters;

every $R_0$ communication rounds, the large-time-scale local training period scheduling target network parameters are updated; every $T_0$ iterations, the small-time-scale device scheduling target network parameters are updated;

step 5, repeat steps 2 to 4 until $t = T$.
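The two-level decision loop of steps 2 to 5 can be sketched in Python. This is only a skeleton under stated assumptions: fixed lists of Q values stand in for the outputs of the two DQN networks, the Q values are interpreted as estimated overhead so that the greedy choice is the minimum, and the function names are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the lowest-overhead action (assumption)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return min(range(len(q_values)), key=lambda a: q_values[a])

def run_two_timescale(num_rounds, iters_per_round, q_large, q_small, epsilon, seed=0):
    """Make one large-time-scale decision per round and one small-time-scale decision per iteration."""
    rng = random.Random(seed)
    decisions = []
    for r in range(1, num_rounds + 1):
        tau_action = epsilon_greedy(q_large, epsilon, rng)         # local training period choice
        first = (r - 1) * iters_per_round + 1
        for t in range(first, first + iters_per_round):
            device_action = epsilon_greedy(q_small, epsilon, rng)  # device scheduling choice
            decisions.append((r, t, tau_action, device_action))
    return decisions
```

In a full implementation, the training, upload, averaging, detection and network update steps would run inside the inner loop, and the Q values would be recomputed from the current state at every decision.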

Further, in step 2, the poisoning attack probability is calculated according to the following formula:

wherein the result is the estimated model poisoning attack probability of the t-th iteration, $a_k(t)$ is the scheduling indicator variable of the communication device, and $b_k(t)$ represents the attack detection variable.

Further, in step 3, the overhead function $\Omega(a^T(r), a(t))$ of the small-time-scale device scheduling network is calculated, in which $\delta$ represents the upper bound of the divergence between the gradient of the local model loss function and the gradient of the digital twin loss function, and $\rho$ and $\beta$ indicate that the local model loss function is $\rho$-Lipschitz and $\beta$-smooth; the remaining quantities are the upper bound of the parameter norms of all local models, the lower bound of the divergence between the digital twin loss function of the current iteration and the converged loss function, and the estimated model poisoning attack probability of the t-th iteration;

at the end of each communication round, i.e. $t = rT_0$, the edge server calculates the overhead function of the large-time-scale local training period scheduling network from the small-time-scale device scheduling overhead function $\Omega(a^T(r), a(t))$.

Further, in step 4, the loss function $\eta^T(r)$ of the large-time-scale local training period scheduling network is calculated from the Q value of the local training period scheduling target network and the Q value of the large-time-scale local training period scheduling evaluation network;

based on $\eta^T(r)$, the edge server updates the evaluation network parameters using the gradient descent method, where k represents the learning step size, the updated quantity is the large-time-scale local training period scheduling evaluation network parameters of the (r+1)-th communication round, and the update direction is given by the gradient of the loss function $\eta^T(r)$;

the loss function $\eta^D(t)$ of the small-time-scale device scheduling network is calculated and the parameters are updated in the same way, wherein $\eta^D(t)$ is the loss function of the small-time-scale device scheduling network, computed from the Q value of the device scheduling target network and the Q value of the small-time-scale device scheduling evaluation network; $S^D(t+1)$ and $A^D(t+1)$ are the state space and action space of the (t+1)-th iteration; the evaluation network Q value further depends on the small-time-scale device scheduling evaluation network parameters of the t-th iteration; k represents the learning step size; the updated quantity is the small-time-scale device scheduling evaluation network parameters of the (t+1)-th iteration, and the update direction is given by the gradient of the loss function $\eta^D(t)$;

every $R_0$ communication rounds, the large-time-scale local training period scheduling target network parameters are updated; every $T_0$ iterations, the small-time-scale device scheduling target network parameters are updated.
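The network update in step 4 follows the general pattern of deep Q-learning, which can be sketched in Python with NumPy. This is a generic sketch, not the exact formulas of the method: it assumes a temporal-difference target of the form overhead plus discount factor times the best next Q value from the target network, takes "best" to mean the minimum because the signal is an overhead, and uses a mean squared error loss. All function names are hypothetical.

```python
import numpy as np

def td_targets(overheads, next_q_target, discount):
    """Assumed target: overhead + discount * min over next actions of the target-network Q value."""
    return np.asarray(overheads, dtype=float) + discount * np.min(next_q_target, axis=1)

def dqn_loss(q_eval_taken, targets):
    """Mean squared error between targets and evaluation-network Q values of the taken actions."""
    diff = np.asarray(targets, dtype=float) - np.asarray(q_eval_taken, dtype=float)
    return float(np.mean(diff ** 2))

def should_sync_target(step, period):
    """Copy evaluation parameters to the target network every `period` steps."""
    return step % period == 0
```

The same three pieces apply to both networks: the large-time-scale network uses communication rounds as steps with period $R_0$, and the small-time-scale network uses iterations as steps with period $T_0$.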

The invention also discloses a computer device, comprising:

at least one processor; and a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of endogenous security-aware resource management described above.

The invention also discloses a computer readable storage medium for storing computer instructions, and the computer instructions are executed by the processor to realize the endogenous security perception resource management method.

Compared with the prior art, the invention has the advantages that:

1. Frequent interaction among the multiple layers of the smart park system supports real-time simulation and accurate prediction for the safe and reliable operation of the smart park.

2. The method establishes a communication resource management model jointly optimized by a digital twin loss function and an average total training time delay, takes the weighted sum of the digital twin loss function and the average total training time delay as an optimization target, and realizes the dynamic balance of high precision and low time delay of the training of the digital twin model by optimizing a local training period and a communication equipment scheduling strategy under the poisoning attack of the model, thereby providing real-time simulation and accurate prediction for the safe and reliable operation of the intelligent park.

3. The present invention employs model poisoning attack detection to detect erroneous model parameters to estimate the attack probability. When the estimated model poisoning attack probability changes, the edge server can actively adjust a local training period and a scheduling strategy of equipment to ensure the performance of a digital twin loss function and the average total digital twin model training time delay, so that the endogenous safety perception that the social assets of the intelligent park participate in power grid interaction is realized.

4. The invention deduces the minimum number of scheduling devices to reduce the optimization space and accelerate the convergence of the multi-time scale resource management optimization. Then, with the aid of digital twin and model poisoning attack detection, the invention designs an endogenous security perception resource management method for social assets participating in power grid interaction. Specifically, the method adopts a large-time-scale DQN network to learn a large-time-scale local training period scheduling strategy. And based on an optimization strategy of the large-time-scale local training period scheduling, learning a small-time-scale equipment scheduling strategy by adopting a small-time-scale DQN network.

Drawings

FIG. 1 is a diagram of a digital twin enabled smart campus scenario in accordance with an embodiment of the present invention;

FIG. 2 is a schematic diagram of an endogenous security perception resource management method for social assets participating in power grid interaction according to an embodiment of the invention;

FIG. 3 is a flowchart of an embodiment of the invention, which is directed to a method for managing endogenous security perception resources for social assets participating in power grid interaction;

FIG. 4 is a graph of the weighted-sum performance as a function of the number of iterations according to an embodiment of the present invention;

FIG. 5 is a graph of the weighted-sum performance as a function of the average model poisoning attack probability according to an embodiment of the present invention;

FIG. 6 is a convergence performance diagram according to an embodiment of the invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings by way of examples.

The digital twin enabled smart park scenario considered in the embodiments of the present invention is shown in FIG. 1, and aims to train a digital twin model for the smart park to ensure its reliable operation. The scenario includes a device layer, a 6G edge intelligent layer and a digital twin layer. In the device layer, communication devices are deployed on electrical equipment such as photovoltaic equipment and charging piles for local model training. Assuming that K communication devices are deployed, the set is denoted $D = \{d_1, ..., d_k, ..., d_K\}$, where $d_k$ represents the k-th device. Each device first executes local model training according to the local training period scheduling strategy, and then uploads local model parameters to the 6G edge intelligent layer according to the device scheduling strategy for global model averaging. Because the energy supply, energy use and asset ownership relations of the smart park are complex, a large number of social assets participate in power grid interaction through the operator communication network, which exposes the park to malicious attack threats. A malicious attacker can launch model poisoning attacks on local model uploading, i.e. reduce the accuracy of the digital twin model by controlling devices to upload erroneous model parameters. The 6G edge intelligent layer deploys a base station carrying an edge server. Because of model poisoning attacks, the edge server schedules the local training period to guide local model training and formulates a device scheduling strategy for uploading local parameters. In addition, the edge server provides model poisoning attack detection for the uploaded local model parameters.
The digital twin layer is maintained by the edge server. This layer mainly realizes real-time interaction with the park devices and keeps the digital twin synchronized with the physical network. Furthermore, the digital twin can assist resource management optimization by providing state estimates such as channel gain, electromagnetic interference and model poisoning attack probability.

Since the probability of model poisoning attacks varies much more slowly than the channel gain, local training period scheduling needs to be optimized on a large time scale to ensure training stability, and device scheduling needs to be optimized on a small time scale to adapt to rapidly varying link conditions. The invention divides the total optimization time into R communication rounds, i.e. the large time scale, and divides each communication round into $T_0$ iterations, i.e. the small time scale. The set of communication rounds is denoted $\mathcal{R} = \{1, ..., R\}$, where the set of iterations in the r-th communication round is defined as $\mathcal{T}(r) = \{(r-1)T_0 + 1, (r-1)T_0 + 2, ..., rT_0\}$. The R communication rounds comprise a total of T iterations, i.e. $T = RT_0$, and the set of iterations is defined as $\mathcal{T} = \{1, ..., T\}$.
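The relation between the two time scales can be made concrete with a small Python sketch. The function names are hypothetical; the index arithmetic follows directly from the definition of the iteration set of the r-th communication round.

```python
def iterations_in_round(r, T0):
    """Iterations belonging to communication round r: (r-1)*T0 + 1, ..., r*T0."""
    return list(range((r - 1) * T0 + 1, r * T0 + 1))

def round_of_iteration(t, T0):
    """Communication round that contains iteration t (both 1-indexed)."""
    return (t - 1) // T0 + 1
```

For example, with three iterations per round, the second round covers iterations 4 to 6, and iteration 4 is the point at which a new local training period decision is made.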

1. Digital twin model

The digital twin of device $d_k$ is defined as $DT_k$. It comprises the local model of the device, the local data set, and the local state information $s_k(t)$, and $DT_k$ can be modeled as the tuple of these three elements.

2. Device scheduling and local training model

In each iteration, the edge server needs to schedule devices to upload their local model parameters for global model averaging. A device that has been identified by the edge server as malicious cannot perform local training or parameter uploading. Let the set of available devices in the t-th iteration be $\mathcal{D}^{ava}(t)$. The set size is denoted $|\mathcal{D}^{ava}(t)|$ and satisfies

Define $a_k(t) \in \{0, 1\}$ as the device scheduling indicator variable of the t-th iteration, where $a_k(t) = 1$ denotes that device $d_k$ is scheduled in the t-th iteration, and $a_k(t) = 0$ otherwise.

In the t-th iteration, a scheduled device, e.g., device $d_k$, downloads the global model from the edge server to update its local model, expressed as

where $\tau(r)$ represents the local training period of the r-th communication round.

In each iteration, device $d_k$ performs local model training using its local data set. Define $x_j$ and $y_j$ as the input and the target output, respectively, of the j-th sample of the local data set. The loss function of device $d_k$ in the t-th iteration is then

where the loss is defined in terms of the number of samples in the local data set and the loss function of a single sample, the latter being the deviation of the actual output from the target output. Based on the gradient descent method, the local model is updated according to

where $\gamma$ denotes the learning step size.

Thus, the local training delay of device $d_k$ in the t-th iteration is expressed as

where $\zeta_k$ is the number of CPU cycles required to process one sample and $f_k(t)$ is the locally available computing resource.
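As a hedged illustration of the local training step described above, the following Python sketch runs gradient descent on a one-parameter linear model with a mean-squared-error loss. Several points are assumptions made only for this example and are not taken from the source: the model form, the squared-error per-sample loss, the interpretation of the local training period as the number of local gradient steps, and the delay expression (local steps × samples × CPU cycles per sample ÷ available CPU frequency). All function names are hypothetical.

```python
def local_loss(w, samples):
    """Mean squared error of the model y = w * x over (x_j, y_j) samples."""
    return sum((w * x - y) ** 2 for x, y in samples) / len(samples)


def local_train(w_global, samples, tau, gamma):
    """Start from the downloaded global model and run `tau` gradient steps.

    gamma is the learning step size; tau is ASSUMED to be the number of
    local gradient-descent steps in one iteration.
    """
    w = w_global
    n = len(samples)
    for _ in range(tau):
        grad = sum(2.0 * (w * x - y) * x for x, y in samples) / n
        w -= gamma * grad
    return w


def local_training_delay(tau, num_samples, cycles_per_sample, cpu_hz):
    """ASSUMED delay model: total CPU cycles divided by available frequency."""
    return tau * num_samples * cycles_per_sample / cpu_hz


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0)]          # samples of y = 2x
    w = local_train(0.0, data, tau=50, gamma=0.1)
    print(w, local_loss(w, data))
    print(local_training_delay(5, 100, 1e4, 1e9))
```

A larger `tau` lowers the local loss further per iteration but lengthens the local training delay, which is the trade-off that the local training period scheduling has to balance.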

3. Local upload model

After local model training is finished, the scheduled device uploads its local model parameters and local state information. The transmission rate can be expressed as

where $B_k$ is the transmission bandwidth, $p_k$ and $g_k(t)$ are the transmission power and channel gain, respectively, and the noise terms comprise the Gaussian white noise $\sigma_0$ and the electromagnetic interference. Two sizes are defined: the size of the local model parameters and the size of the local state information, $|s_k(t)|$. Because $|s_k(t)|$ is small, the upload delay of the local state information is ignored. The local model parameter upload delay is expressed as
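The rate and upload-delay relationships of this subsection can be sketched in Python as follows. The sketch assumes a Shannon-capacity rate in which the Gaussian white noise and the electromagnetic interference add in the denominator of the signal-to-interference-plus-noise ratio, and an upload delay equal to the model size divided by the rate. Both forms are assumptions consistent with the symbols defined in the text, and the function and parameter names are hypothetical.

```python
import math


def transmission_rate(bandwidth_hz, tx_power, channel_gain, noise, emi):
    """ASSUMED Shannon-type rate: B * log2(1 + p * g / (noise + emi))."""
    sinr = tx_power * channel_gain / (noise + emi)
    return bandwidth_hz * math.log2(1.0 + sinr)


def upload_delay(model_size_bits, rate_bps):
    """ASSUMED upload delay: model parameter size divided by the rate.

    The local state information is small, so its upload delay is ignored.
    """
    return model_size_bits / rate_bps


if __name__ == "__main__":
    rate = transmission_rate(1e6, 1.0, 3.0, 0.5, 0.5)
    print(rate, upload_delay(4e6, rate))
```

Under these assumptions, stronger electromagnetic interference lowers the rate and therefore lengthens the upload delay of the affected device.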

4. Model poisoning attack model

The model poisoning attack that a malicious attacker mounts on the upload of local model parameters within the park is modeled as follows

where $P_a(r)$ represents the model poisoning attack probability of the r-th communication round, and the erroneous parameters uploaded by device $d_k$ after an attack are modeled as

where $\chi \in [-1, 1]$ denotes a scale factor and $n_k(t)$ is additive noise following a specified distribution.
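A minimal Python sketch of this attack model is given below. It assumes that, with the attack probability of the current round, the uploaded parameters are replaced by the scale factor times the true parameters plus additive noise, and that the noise is zero-mean Gaussian. The exact expression and the noise distribution are assumptions made for illustration, and `maybe_poison` is a hypothetical name.

```python
import random


def maybe_poison(params, attack_prob, chi, noise_std, rng):
    """Return (uploaded_params, attacked).

    With probability `attack_prob` the upload is replaced by
    chi * w + n, where n is ASSUMED zero-mean Gaussian noise.
    """
    if not -1.0 <= chi <= 1.0:
        raise ValueError("scale factor chi must lie in [-1, 1]")
    if rng.random() >= attack_prob:
        return list(params), False          # normal upload
    poisoned = [chi * w + rng.gauss(0.0, noise_std) for w in params]
    return poisoned, True


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(maybe_poison([1.0, 2.0], 0.2, -0.5, 0.1, rng))
```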

5. Global averaging model

After the scheduled devices upload their local model parameters and state information to the edge server, the edge server performs global model averaging to update the digital twin model. The global model average of the t-th iteration is as follows

The global model averaging delay is expressed as

where $f_g(t)$ represents the available computing resources of the edge server in the t-th iteration.
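The global model averaging step of this subsection can be sketched in Python as a sample-count-weighted average over the scheduled devices. This weighting is the common federated averaging form and is assumed here for illustration; `global_average` and its parameters are hypothetical names.

```python
def global_average(local_models, sample_counts, scheduled):
    """Average local parameter vectors over the scheduled devices.

    Each scheduled device (indicator 1) is weighted by the number of
    samples in its local data set (ASSUMED weighting).
    """
    total = sum(n for n, a in zip(sample_counts, scheduled) if a)
    if total == 0:
        raise ValueError("at least one device must be scheduled")
    dim = len(local_models[0])
    avg = [0.0] * dim
    for w, n, a in zip(local_models, sample_counts, scheduled):
        if not a:
            continue                         # unscheduled devices are skipped
        for i in range(dim):
            avg[i] += n * w[i] / total
    return avg


if __name__ == "__main__":
    models = [[1.0, 1.0], [3.0, 5.0], [100.0, 100.0]]
    print(global_average(models, [1, 3, 10], [1, 1, 0]))
```

In the example call the third device is not scheduled, so its parameters do not influence the result.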

The invention uses the loss function of $\omega_g(t)$ to measure the accuracy of the global model, expressed as

6. Model poisoning attack detection

The model poisoning attack detection adopted by the invention calculates the Euclidean norm between the parameter under test and the average of the other parameters, and compares the result with a preset threshold. If the threshold is exceeded, the parameter is judged to be an erroneous model parameter; otherwise, it is a normal model parameter. Taking the local model parameters uploaded by device $d_k$ as an example, the average of the parameters excluding them can be calculated as

The Euclidean norm between the two, $d_k(t)$, is expressed as

The ratio reference value $e_k(t)$ of $d_k(t)$ to the Euclidean norm of this average parameter is

Let the attack detection variable be $b_k(t) \in \{0, 1\}$, where $b_k(t) = 1$ denotes that the model parameter is normal and $b_k(t) = 0$ otherwise. Model poisoning attack detection can be expressed as

where $e^{Thr}$ is the detection threshold.

If, up to the t-th iteration, the proportion of erroneous model parameters uploaded by device $d_k$ exceeds a threshold, that is

device $d_k$ is regarded as a malicious device and removed from $\mathcal{D}^{ava}(t+1)$.
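The detection procedure described above can be sketched in Python as follows. The sketch assumes an unweighted leave-one-out average, a ratio equal to the distance divided by the Euclidean norm of that average, and a per-device error ratio equal to flagged uploads divided by total uploads. These concrete forms, and all function names, are assumptions made for illustration.

```python
import math


def leave_one_out_mean(params, k):
    """Average of all uploaded parameter vectors except the k-th one."""
    others = [p for z, p in enumerate(params) if z != k]
    dim = len(params[k])
    return [sum(p[i] for p in others) / len(others) for i in range(dim)]


def detect(params, k, threshold):
    """Return b_k: 1 if upload k looks normal, 0 if it looks erroneous."""
    mean_others = leave_one_out_mean(params, k)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(params[k], mean_others)))
    ref = math.sqrt(sum(m ** 2 for m in mean_others))
    ratio = dist / ref                      # ASSUMED form of the ratio value
    return 1 if ratio <= threshold else 0


def update_available(available, error_counts, upload_counts, max_error_ratio):
    """Drop devices whose share of flagged uploads exceeds the limit."""
    kept = set()
    for k in available:
        uploads = upload_counts.get(k, 0)
        errors = error_counts.get(k, 0)
        if uploads == 0 or errors / uploads <= max_error_ratio:
            kept.add(k)
    return kept


if __name__ == "__main__":
    uploads = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [4.0, 4.0]]
    print([detect(uploads, k, 1.0) for k in range(len(uploads))])
```

Because a poisoned upload also shifts the leave-one-out averages of the honest devices, the threshold has to be chosen loosely enough that honest uploads are not flagged.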

7. Total delay model

The total delay $L^{Sum}(t)$ of digital twin model training in the t-th iteration consists of the maximum local training delay, the maximum model parameter upload delay, and the global model averaging delay.
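As a hedged sketch of the total delay described above, the following Python function adds the largest local training delay and the largest upload delay among the scheduled devices to the global model averaging delay. Restricting the maxima to scheduled devices and summing the three components are assumptions made for illustration; `total_delay` is a hypothetical name.

```python
def total_delay(train_delays, upload_delays, scheduled, global_avg_delay):
    """ASSUMED total delay: max training + max upload + averaging delay."""
    active = [k for k, a in enumerate(scheduled) if a]
    if not active:
        raise ValueError("at least one device must be scheduled")
    slowest_training = max(train_delays[k] for k in active)
    slowest_upload = max(upload_delays[k] for k in active)
    return slowest_training + slowest_upload + global_avg_delay


if __name__ == "__main__":
    print(total_delay([0.2, 0.5, 9.0], [0.3, 0.1, 9.0], [1, 1, 0], 0.05))
```

In the example, the third device is slow but not scheduled, so it does not contribute to the total delay.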

8. Modeling and transformation of the problem of minimizing the weighted sum of the digital twin loss function and the average total training delay

The objective of the invention is to minimize the weighted sum of the digital twin loss function and the average total training delay by jointly optimizing the local training period scheduling strategy and the device scheduling strategy. The minimization problem, denoted P1, is modeled as follows

where

C1 and C2 represent device scheduling constraints, and C3 represents the discretized local training period constraint. Define $\tau_1$ and $\tau_H$ as the minimum and maximum local training periods, respectively; the interval $[\tau_1, \tau_H]$ can then be quantized into H levels, where

Since the resource management optimization decision of each iteration is coupled with the optimization objective, P1 is difficult to solve directly. P1 is therefore converted into P2, which minimizes in each iteration the weighted sum of the convergence upper bound of the digital twin loss function and the total model training delay, i.e.

where

$\delta$ represents the upper bound of the divergence between the gradient of the local model loss function and the gradient of the digital twin loss function; $\rho$ and $\beta$ indicate that the local model loss function is $\rho$-Lipschitz and $\beta$-smooth; $\theta$ represents the upper bound of the parameter norms of all local models; a second constant, also written theta in this text, represents the lower bound of the divergence between the digital twin loss function at the current iteration and the converged loss function; and the estimated model poisoning attack probability of the t-th iteration is calculated as follows:

The invention sets a convergence upper bound constraint on the digital twin loss function, which prevents the accuracy of the digital twin model from deteriorating by requiring that at least a certain number of devices be scheduled. By inspecting the convergence upper bound of the digital twin loss function in P2, the constraint on the convergence upper bound can be converted into a constraint on the part of the bound that depends on the number of scheduled devices, namely

where $\lambda(t)$ represents the constraint threshold. Since the number of scheduled devices is an integer, (20) can be rewritten as

Thus, P2 can be rewritten as P3

9. Design of endogenous security perception resource management method for social assets participating in power grid interaction

To solve P3, the invention models it as a Markov optimization problem whose key elements comprise a state space, an action space, and a reward function, introduced as follows:

State space with digital twin estimation: the state space of large-time-scale local training period scheduling comprises all estimated model poisoning attack probabilities of the previous communication round provided by the park digital twin, namely

The state space of small-time-scale device scheduling comprises the large-time-scale local training period scheduling strategy together with the model poisoning attack probability, channel gain, and electromagnetic interference estimated by the digital twin, namely

Action space: the action space of large-time-scale local training period scheduling is defined as

where the h-th action denotes $\tau(r) = \tau_h$.

The action space of small-time-scale device scheduling is defined as

$A^D(t) = \{a_1(t), \ldots, a_k(t), \ldots, a_K(t)\} \quad (28)$

Cost function: since P3 is a minimization problem, the invention uses a cost function in place of the reward function. The cost function of small-time-scale device scheduling is defined as the cost function $\Omega(a^T(r), a(t))$ of one iteration, where

The cost function of large-time-scale local training period scheduling is defined as the sum of the cost functions of the $T_0$ iterations, expressed as

The invention provides an endogenous security perception resource management method for social assets participating in power grid interaction to solve this Markov optimization problem. The algorithm is illustrated in FIG. 2. The edge server is the entity that executes the algorithm. For large-time-scale local training period scheduling, the edge server maintains an evaluation network and a target network, each with its own network parameters. Similarly, an evaluation network and a target network, each with its own parameters, are maintained for small-time-scale device scheduling.

The flow chart of the endogenous security perception resource management method for social assets participating in power grid interaction is shown in FIG. 3, and the specific steps are as follows:

(1) Initialization: initialize the evaluation network and target network parameters of both time scales, and set $\Omega(a^T(r), a(t)) = 0$.

(2) Multi-time-scale resource management action selection: at the beginning of each communication round, i.e., $t = (r-1)T_0 + 1$, the edge server makes the large-time-scale local training period scheduling decision using the ε-greedy algorithm; likewise, at the beginning of each iteration, the edge server makes the small-time-scale device scheduling decision using the ε-greedy algorithm. The devices perform local model training and local model parameter uploading according to these decisions. The edge server performs global model averaging and model poisoning attack detection, and estimates the model poisoning attack probability according to (21).

(3) Cost function calculation and smart park state transition: at the end of each iteration, the edge server observes the digital twin loss function and the average total training delay performance, and calculates the cost function $\Omega(a^T(r), a(t))$ of the small-time-scale device scheduling network according to formula (20). At the end of each communication round, i.e., $t = rT_0$, the edge server calculates the cost function of the large-time-scale local training period scheduling network according to formula (29)

which is used to update the experience replay pool $U^T(r)$, and the current state $S^T(r)$ transitions to the next state $S^T(r+1)$. Similarly, for small-time-scale device scheduling, the edge server generates experience data $u^D(t) = \{S^D(t), A^D(t), \Omega(a^T(r), a(t)), S^D(t+1)\}$ to update the experience replay pool $U^D(t)$, and the current state $S^D(t)$ transitions to the next state $S^D(t+1)$.

(4) Learning and multi-time-scale DQN network update: at the end of each communication round, the edge server draws a random sample set from the experience replay pool $U^T(r)$

and calculates the loss function of the large-time-scale local training period scheduling network according to the following formula

where the Q value of the local training period scheduling target network is calculated by the following formula

where $\iota$ is the discount factor.

The network parameters are then updated, namely

where $\kappa$ denotes the learning step size.

Similarly, at the end of each iteration, the edge server draws a random sample set from the experience replay pool $U^D(t)$

calculates the loss function of the small-time-scale device scheduling network according to the following formula, and updates the parameters

Steps (2) to (4) are repeated until $t = T$.
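The control flow of steps (1) to (4) can be sketched in Python as follows. This is a heavily simplified, stateless, tabular stand-in for the two DQN agents, intended only to show how the large-time-scale and small-time-scale ε-greedy decisions, the experience replay pools, and the updates are interleaved. The one-step target (cost plus discounted minimum of the next Q values), the tabular update, and the `cost_fn` callback are assumptions made for illustration and are not the networks or formulas of the method itself.

```python
import random
from collections import deque


def epsilon_greedy(q_costs, epsilon, rng):
    """With probability epsilon explore; otherwise pick the lowest-cost action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_costs))
    return min(range(len(q_costs)), key=q_costs.__getitem__)


def td_target(cost, next_q_costs, discount):
    """ASSUMED one-step target for a cost-minimizing Q function."""
    return cost + discount * min(next_q_costs)


def run_two_time_scale(num_rounds, t0, tau_levels, num_device_actions,
                       cost_fn, epsilon=0.1, discount=0.9, step=0.1, seed=0):
    rng = random.Random(seed)
    # (1) Initialization: tabular stand-ins for the evaluation networks.
    q_large = [0.0] * len(tau_levels)
    q_small = [0.0] * num_device_actions
    pool_large, pool_small = deque(maxlen=1000), deque(maxlen=1000)
    for r in range(1, num_rounds + 1):
        # (2) Large time scale: choose the local training period once per round.
        h = epsilon_greedy(q_large, epsilon, rng)
        round_cost = 0.0
        for t in range((r - 1) * t0 + 1, r * t0 + 1):
            # (2) Small time scale: choose a device scheduling action.
            a = epsilon_greedy(q_small, epsilon, rng)
            # (3) Observe the per-iteration cost and store the experience.
            cost = cost_fn(tau_levels[h], a)
            pool_small.append((t, a, cost))
            round_cost += cost
            # (4) Learn from one randomly replayed small-time-scale experience.
            _, a_s, c_s = rng.choice(list(pool_small))
            q_small[a_s] += step * (td_target(c_s, q_small, discount) - q_small[a_s])
        # (3) The round cost is the sum of the T0 per-iteration costs.
        pool_large.append((r, h, round_cost))
        # (4) Learn from one randomly replayed large-time-scale experience.
        _, h_s, c_l = rng.choice(list(pool_large))
        q_large[h_s] += step * (td_target(c_l, q_large, discount) - q_large[h_s])
    return q_large, q_small, pool_large, pool_small


if __name__ == "__main__":
    # Hypothetical cost: grows with the action index and the training period.
    result = run_two_time_scale(5, 4, [1, 2, 3], 3, lambda tau, a: float(a) + tau)
    print(result[0], result[1])
```

A real implementation would replace the two tables with neural networks conditioned on the digital-twin-estimated states and would keep separate target networks; the sketch only mirrors the scheduling cadence.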

The method realizes endogenous security perception by actively adjusting the large-time-scale local training period scheduling strategy and the small-time-scale device scheduling strategy according to the estimated model poisoning attack probability. Specifically, for small-time-scale device scheduling, when the estimated model poisoning attack probability of the current iteration is greater than that of the previous iteration, the cost function $\Omega(a^T(r), a(t))$ increases. The edge server can therefore actively adjust the device scheduling strategy to reduce $\Omega(a^T(r), a(t))$ and thus guarantee the digital twin loss function and average total training delay performance. For large-time-scale local training period scheduling, the designed method can perceive the large-time-scale cost function and the distribution of the estimated model poisoning attack probability. The edge server can therefore actively adjust the local training period scheduling strategy to adapt to continuously changing model poisoning attacks and ensure the endogenous security of digital twin model training.

Comparison algorithm 1 is a deep reinforcement learning-based device orchestration algorithm (DRL-DO), whose optimization goal is to minimize the weighted sum of the digital twin loss function and the average total training delay. Comparison algorithm 2 is a client scheduling algorithm based on the upper confidence bound (UCB-CS), whose optimization goal is to minimize the average total training delay. Both comparison algorithms omit large-time-scale local training period scheduling optimization and set the local training period to 1. In addition, neither comparison algorithm performs model poisoning attack detection at the edge server, so neither can realize endogenous security perception by estimating the model poisoning attack probability.

The embodiment of the invention considers a 400 m × 400 m smart park scenario comprising a base station equipped with an edge server and 30 communication devices. The devices are randomly distributed within the considered smart park area, and the base station is located at the center of the area. The number of communication rounds is set to 100 and the total number of iterations to 1000, so each communication round comprises 10 iterations. The digital twin model is trained on the MNIST data set, and each device is randomly assigned training samples from MNIST. The model poisoning attack probability is randomly distributed in [0.05, 0.2]. The simulation results are as follows:

FIG. 4 shows the weighted-sum performance as a function of the number of iterations. Compared with DRL-DO and UCB-CS, the weighted-sum performance of the digital twin loss function and the average total training delay of the invention is improved by 16.51% and 19.58%, respectively. In addition, the weighted-sum performance of the invention shows better convergence and lower volatility. By perceiving endogenous security, the invention optimizes both large-time-scale local training period scheduling and small-time-scale device scheduling, thereby realizing the joint optimization of the digital twin loss function and the average total training delay.

FIG. 5 shows the weighted-sum performance as a function of the average model poisoning attack probability. As the average model poisoning attack probability increases, the weighted-sum performance of the digital twin loss function and the average total training delay degrades for all three algorithms, and the invention degrades the least. When the model poisoning attack probability equals 0.4, the weighted-sum performance of the invention is improved by 24.29% and 32.51% compared with DRL-DO and UCB-CS, respectively. The reason is that the invention uses model poisoning attack detection to estimate the model poisoning attack probability. The multi-time-scale resource management strategy can therefore be actively adjusted by perceiving the estimated attack probability, which realizes the endogenous security of digital twin model training.

FIG. 6 shows the convergence behavior of the invention. The simulation results show that the invention converges at the 400th iteration, whereas the variant of the invention without the problem transformation converges at the 600th iteration. The reason is that constraint C4 limits the minimum number of scheduled devices, which reduces the size of the action space and thereby achieves faster convergence.

An embodiment of the present application further provides an endogenous security perception resource management device based on a smart park, comprising:

at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform: local model training according to the local training period scheduling strategy, local model parameter uploading, global model averaging, model poisoning attack detection, total delay calculation, and execution of the endogenous security perception resource management method.

An embodiment of the present application further provides a computer storage medium storing computer-executable instructions, where the computer-executable instructions are configured to perform: local model training according to the local training period scheduling strategy, local model parameter uploading, global model averaging, model poisoning attack detection, total delay calculation, and execution of the endogenous security perception resource management method.

The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the device and media embodiments, the description is relatively simple as it is substantially similar to the method embodiments, and reference may be made to some descriptions of the method embodiments for relevant points.

The device and the medium provided by the embodiment of the application correspond to the method one to one, so the device and the medium also have the similar beneficial technical effects as the corresponding method, and the beneficial technical effects of the method are explained in detail above, so the beneficial technical effects of the device and the medium are not repeated herein.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transitory media) such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises that element.

It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the practice of the invention, and it is to be understood that the scope of the invention is not limited to such specific statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims

1. A smart park system, comprising: a device layer, a 6G edge intelligent layer, and a digital twin layer;

the device layer includes communication devices and electrical equipment, wherein the communication devices are deployed on the electrical equipment to perform local model training; the communication devices perform local model training according to a local training period scheduling strategy, and upload local model parameters to the 6G edge intelligent layer according to a communication device scheduling strategy for global model averaging, so as to update the digital twin model of the digital twin layer; the electrical equipment includes photovoltaic equipment and charging piles;

the 6G edge intelligent layer deploys a base station equipped with an edge server; the edge server schedules the local training period to guide local model training, and formulates the communication device scheduling strategy for uploading local parameters; the edge server performs model poisoning attack detection on the uploaded local model parameters;

the digital twin layer is maintained by the edge server of the 6G edge intelligent layer, implements real-time interaction with the park communication devices, and keeps the digital twin layer synchronized with the physical network; the digital twin layer provides channel gain and electromagnetic interference estimates from the local state information uploaded by the device layer, provides a model poisoning attack probability estimate from the model poisoning attack detection of the 6G edge intelligent layer, and, based on the estimated model poisoning attack probability, optimizes the local training period scheduling and the communication device scheduling of the 6G edge intelligent layer.

2. The smart park system of claim 1, wherein performing local model training according to the local training period scheduling strategy comprises:

in the t-th iteration, the scheduled communication device $d_k$ downloads the global model from the edge server to update its local model, as follows:

where the formula is expressed in terms of the local model of communication device $d_k$ in the (t-1)-th iteration;

in each iteration, communication device $d_k$ performs local model training using its local data set, for which the input and the target output of the j-th sample are defined; the loss function of communication device $d_k$ in the t-th iteration is therefore as follows:

where the formula gives the loss function of communication device $d_k$ in the t-th iteration in terms of the number of samples in the local data set and the loss function of a single sample, the latter defined as the deviation of the actual output from the target output; based on the gradient descent method, the local model is updated according to:

where $\gamma$ represents the learning step size;

thus, the local training delay of communication device $d_k$ in the t-th iteration is represented by:

where the formula involves the number of samples in the local data set.

3. The smart park system according to claim 1, wherein uploading the local model parameters to the 6G edge intelligent layer specifically comprises:

where $R_k(t)$ is the transmission rate of communication device $d_k$ in the t-th iteration, $B_k$ is the transmission bandwidth, $p_k$ and $g_k(t)$ are the transmission power and channel gain, respectively, and the noise terms comprise the Gaussian white noise $\sigma_0$ and the electromagnetic interference;

the size of the local model parameters and the size of the local state information, $|s_k(t)|$, are defined; since $|s_k(t)|$ is small, the upload delay of $s_k(t)$ is ignored, and the local model parameter upload delay is represented by:

where the formula is expressed in terms of the size of the local model parameters and yields the local model parameter upload delay.

4. The smart park system of claim 1, wherein the global model averaging of the 6G edge intelligent layer specifically comprises:

where $\omega_g(t)$ represents the global model of the t-th iteration, and the remaining symbols denote the local data set and the local model parameters of communication device $d_k$ in the t-th iteration;

the loss function $F_g(\omega_g(t), t)$ of $\omega_g(t)$ is used to measure the accuracy of the global model, and is expressed as follows:

where $F_g(\omega_g(t), t)$ represents the loss function of $\omega_g(t)$, the formula involves the number of samples in the local data set, and $a_k(t)$ is the scheduling indicator variable of communication device $d_k$ in the t-th iteration.

5. The smart park system according to claim 1, wherein the model poisoning attack detection comprises:

calculating the Euclidean norm between the parameter under test and the average of the parameters other than the parameter under test, and comparing the result with a preset threshold; if the threshold is exceeded, the parameter under test is judged to be an erroneous model parameter, and otherwise a normal model parameter; the average parameter excluding the parameter under test is calculated as follows:

where $a_z(t)$ is the scheduling indicator variable of communication device $d_z$ in the t-th iteration, and the formula averages the local model parameters of the communication devices $d_z$ in the t-th iteration;

the Euclidean norm $d_k(t)$ between the parameter under test and this average parameter is represented by the following formula:

the ratio reference value $e_k(t)$ of $d_k(t)$ to the Euclidean norm of the average parameter is represented by the following formula:

6. The smart park system of claim 1, wherein the total delay $L^{Sum}(t)$ of digital twin model training in the t-th iteration consists of the maximum local training delay of the device layer, the maximum model parameter upload delay incurred when the device layer transmits parameters to the 6G edge intelligent layer, and the global model averaging delay of the 6G edge intelligent layer, and is represented as follows:

where one term of the formula represents the local training delay.

7. An endogenous security perception resource management method for social assets participating in power grid interaction is characterized by comprising the following steps: an endogenous security-aware resource management method implemented on the basis of the intelligent campus system of claim 1;

step 1, initialization

And

setting omega (a) ^T (r),a(t))＝0，

Wherein the content of the first and second substances,

and

and

respectively representing the evaluation network and target network parameters of the small-scale equipment scheduling, omega (a) ^T (r), a (t)) represents a small-scale device scheduling overhead function,

step 2, at the beginning of each round of communication, namely T = (r-1) T ₀ +1, the edge server performs large-time-scale local training period scheduling decision by using an epsilon-greedy algorithm; when each iteration starts, similarly, the edge server performs small-time-scale equipment scheduling decision by using an epsilon-greedy algorithm; the equipment carries out local model training and local model parameter uploading according to the decision; the edge server executes global model tie and model poisoning attack detection and calculates the poisoning attack probability;

Step 3: at the end of each iteration, the edge server observes the digital twin loss function and the average total training delay,

and calculates the cost function of the small-time-scale device scheduling network;

to update the experience replay pool U^T(r), and transitions the current state S^T(r) to the next state S^T(r+1); for small-time-scale device scheduling, the edge server generates experience data u^D(t) = {S^D(t), A^D(t), Ω(a^T(r), a(t)), S^D(t+1)} to update the experience replay pool U^D(t), and transitions the current state S^D(t) to the next state S^D(t+1);

Step 4: at the end of each communication round, the edge server randomly draws a sample set from the experience replay pool U^T(r)

is the Q value of the large-time-scale local training period scheduling evaluation network,

represents the Q value of the local training period scheduling target network, calculated by the following formula:

Step 5: repeat steps 2 to 4 until t = T.
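The timing of the two decision scales in steps 2 to 5 can be sketched in Python as follows. This is an illustrative outline only, not the claimed method: the Q estimates are fixed lists instead of neural networks, the state is reduced to the iteration index t, and the greedy choice picks the lowest estimated cost on the assumption that Q values track cost. All function names, the `cost_fn` callback, and the numeric values are hypothetical.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore at random, otherwise exploit.

    Assumption: Q values estimate cost, so exploiting means taking the minimum.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return min(range(len(q_values)), key=lambda a: q_values[a])


def run_schedule(num_rounds, t0, q_period, q_device, cost_fn,
                 epsilon=0.1, seed=0):
    """Walk through iterations t = 1..T with T = num_rounds * t0."""
    rng = random.Random(seed)
    pool_device = deque(maxlen=10_000)  # plays the role of U^D(t)
    decisions = []                      # (t, r, period_action, device_action)
    period_action = None
    for t in range(1, num_rounds * t0 + 1):
        r = (t - 1) // t0 + 1           # index of the current communication round
        if t == (r - 1) * t0 + 1:
            # Start of round r: large-time-scale decision (local training period).
            period_action = epsilon_greedy(q_period, epsilon, rng)
        # Every iteration: small-time-scale decision (device scheduling).
        device_action = epsilon_greedy(q_device, epsilon, rng)
        cost = cost_fn(period_action, device_action)
        # Experience tuple (state, action, cost, next state); state is a placeholder.
        pool_device.append((t, device_action, cost, t + 1))
        decisions.append((t, r, period_action, device_action))
    return decisions, pool_device


# Example: 2 rounds of 3 iterations, pure exploitation (epsilon = 0).
decisions, pool = run_schedule(
    num_rounds=2, t0=3,
    q_period=[2.0, 1.0], q_device=[0.5, 0.1, 0.9],
    cost_fn=lambda p, d: float(p + d), epsilon=0.0)
```

The large-time-scale action changes only when t = (r-1)T₀ + 1, while the small-time-scale action is re-selected in every iteration, which is the nesting the steps describe.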

8. The endogenous security-aware resource management method of claim 7, wherein in step 2 the poisoning attack probability is calculated according to the following formula:

wherein

9. The endogenous security-aware resource management method of claim 8, wherein in step 3

the cost function Ω(a^T(r), a(t)) of the small-time-scale device scheduling network is calculated,

wherein

The lower bound of the divergence between them,

at the end of each communication round, i.e. t = rT₀, the edge server calculates the cost function of the large-time-scale local training period scheduling network according to the following formula:

is the cost function of the large-time-scale local training period scheduling network, and Ω(a^T(r), a(t)) is the small-time-scale device scheduling cost function.

10. The endogenous security-aware resource management method of claim 9, wherein:

in step 4, the loss function η^T(r) of the large-time-scale local training period scheduling network is calculated by the following formula:

are respectively the state space and action space of the (r+1)-th communication round and the parameters of the large-time-scale local training period scheduling evaluation network in the r-th communication round,

wherein ι is the discount factor and Ω(a^T(r), a(t)) is the small-time-scale device scheduling cost function;

the evaluation network parameters are updated according to the following formula:

wherein k represents the learning step size,

is the gradient of the loss function η^T(r);

the loss function η^D(t) of the small-time-scale device scheduling network is calculated and its parameters are updated according to the following formulas:

wherein η^D(t) is the loss function of the small-time-scale device scheduling network,

is the Q value of the device scheduling target network,

are respectively the state space and action space of the (t+1)-th iteration and the parameters of the small-time-scale device scheduling evaluation network in the t-th iteration, k represents the learning step size,

is the gradient of the loss function η^D(t).
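The loss calculation and parameter update named in claim 10 follow the general pattern of a deep Q-network update with an evaluation network and a target network. Because the formulas themselves are not recoverable from the text, the Python sketch below uses a generic form under explicit assumptions: a tabular Q function instead of a neural network, a mean squared temporal-difference loss, and a target equal to the observed cost plus the discounted minimum target-network Q value of the next state. The function name and all numeric values are hypothetical.

```python
def td_update(q_eval, q_target, batch, discount, step):
    """One gradient-descent step on a mean squared TD loss (tabular sketch).

    q_eval, q_target: lists indexed as q[state][action].
    batch: experience tuples (state, action, cost, next_state).
    discount: discount factor (iota in the claims).
    step: learning step size (k in the claims).
    Returns the loss evaluated before the update.
    """
    n = len(batch)
    loss = 0.0
    grads = {}
    for s, a, cost, s_next in batch:
        # Assumption: costs are minimized, so the target uses min over next actions.
        target = cost + discount * min(q_target[s_next])
        err = q_eval[s][a] - target
        loss += err * err / n
        # d(loss)/d(q_eval[s][a]) accumulated over the batch.
        grads[(s, a)] = grads.get((s, a), 0.0) + 2.0 * err / n
    for (s, a), g in grads.items():
        q_eval[s][a] -= step * g  # parameter <- parameter - step * gradient
    return loss


# Example with made-up values: two states, two actions, one sampled experience.
q_eval = [[0.0, 0.0], [0.0, 0.0]]
q_target = [[1.0, 2.0], [3.0, 4.0]]
loss = td_update(q_eval, q_target, batch=[(0, 1, 1.0, 1)],
                 discount=0.5, step=0.25)
```

The same routine would apply to both time scales, with samples drawn from U^T(r) for the local training period scheduling network and from U^D(t) for the device scheduling network.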

11. A computer device, comprising:

at least one processor and a memory, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the endogenous security-aware resource management method of any one of claims 7 to 10.

12. A computer-readable storage medium, characterized by storing computer instructions which, when executed by a processor, implement the endogenous security-aware resource management method of any one of claims 7 to 10.