CN116109058A

CN116109058A - Substation inspection management method and device based on deep reinforcement learning

Info

Publication number: CN116109058A
Application number: CN202211502140.XA
Authority: CN
Inventors: 李竹筠; 冯善强; 张延旭; 胡春潮; 何英发
Original assignee: China Southern Power Grid Power Technology Co Ltd
Current assignee: China Southern Power Grid Power Technology Co Ltd
Priority date: 2022-11-28
Filing date: 2022-11-28
Publication date: 2023-05-12

Abstract

The invention provides a substation inspection management method and device based on deep reinforcement learning. The MuZero algorithm is used to construct an inference task scheduling model. The model takes the real-time computing power of the edge nodes and the cloud as input and predicts either the number of a specific edge node device or the cloud. With load balancing as the objective, the model is trained on data generated by the various inspection tasks in the substation. Based on the current state of the substation, the trained model then performs node prediction, and edge computing resources are allocated to each inspection task according to the prediction result, which balances the load of substation inspection management. The MuZero-based deep reinforcement learning model receives the edge nodes and the cloud together as input features and selects the target node with load balancing as the objective. In this way the invention achieves edge-cloud coordination and improves the reliability and operating efficiency of the substation inspection system.

Description

Substation inspection management method and device based on deep reinforcement learning

Technical Field

The invention belongs to the technical field of substation management, and particularly relates to a substation inspection management method and device based on deep reinforcement learning.

Background

Intelligent digital power grids are developing rapidly, so the number of substations keeps growing and so does the operation and maintenance workload. Some substations are remote and widely scattered. In severe weather, manual inspection may be impossible, and the condition of the equipment and the site environment then cannot be known in time. A substation needs an unmanned inspection system that can replace manual inspection, reduce the workload of on-duty operators, and improve inspection efficiency and quality.

The substation field suffers from a heavy manual management workload, a multitude of different algorithm platforms, and a low degree of automation. Inspection terminals include a large number of image and video monitoring devices, and the resulting mass of data is difficult to collect, manage, analyze, and recognize effectively.

Intelligent technology is penetrating further into production. As a result, applying the production video and image data of substation inspection systems challenges the intelligent operation and inspection systems that cover grid equipment, channel environments, operation and maintenance, overhaul, and production management. In particular, collaboration between the cloud and the edge is insufficient. The master station platform is under excessive pressure in three respects: network bandwidth, centralized computing, and centralized storage. Substations therefore need to exploit the advantages of cloud-edge collaboration through edge computing.

Disclosure of Invention

In view of the above, the invention aims to solve the following problem: existing substation inspection systems cannot achieve intelligent coordination and unified scheduling across the device, edge, and cloud, and this places excessive pressure on the master station platform.

In order to solve the technical problems, the invention provides the following technical scheme:

In a first aspect, the present invention provides a substation inspection management method based on deep reinforcement learning, comprising the following steps:

constructing an inference task scheduling model by using the MuZero algorithm, where the model takes the real-time computing power of the edge nodes and the cloud as input and predicts the number of a specific edge node device or the cloud;

training the inference task scheduling model on data generated by the various inspection tasks in the substation, with load balancing as the objective, where load balancing means allocating resources according to the real-time computing power of the edge nodes and the cloud so that the average service time of the inspection tasks is minimized;

based on the current state of the substation, performing node prediction with the trained inference task scheduling model and allocating edge computing resources to the inspection task according to the prediction result, so as to balance the load of substation inspection management.

Further, the inference task scheduling model is deployed in the cloud, where it is trained centrally. After training is completed, the trained model is distributed to each edge node.

Further, training the inference task scheduling model on data generated by the various inspection tasks in the substation, with load balancing as the objective, specifically comprises:

dividing the substation inspection tasks into computation-intensive tasks and delay-sensitive tasks;

assigning delay-sensitive tasks to edge nodes for processing and sending computation-intensive tasks to the cloud for processing;

training the inference task scheduling model on data generated by the various inspection tasks in the substation;

and calculating the task failure rate over all inspection tasks and optimizing the inference task scheduling model with the lowest task failure rate as the objective.
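The task split, baseline routing rule, and failure-rate objective listed above can be sketched as follows. This is a minimal illustration and not the patented scheduler. The `InspectionTask` class, its fields, and the helper names are hypothetical. The fixed edge/cloud rule is only the baseline behaviour that the trained model is meant to refine.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InspectionTask:
    task_id: int
    compute_intensive: bool   # True: computation-intensive, False: delay-sensitive
    failed: bool = False      # hypothetical flag set when the system drops the task

def route(task: InspectionTask) -> str:
    """Baseline rule: delay-sensitive tasks go to the edge, computation-intensive ones to the cloud."""
    return "cloud" if task.compute_intensive else "edge"

def task_failure_rate(tasks: List[InspectionTask]) -> float:
    """Fraction of failed tasks; training aims to make this as low as possible."""
    if not tasks:
        return 0.0
    return sum(t.failed for t in tasks) / len(tasks)

tasks = [
    InspectionTask(1, compute_intensive=False),
    InspectionTask(2, compute_intensive=True),
    InspectionTask(3, compute_intensive=False, failed=True),
    InspectionTask(4, compute_intensive=True),
]
print([route(t) for t in tasks])   # ['edge', 'cloud', 'edge', 'cloud']
print(task_failure_rate(tasks))    # 0.25
```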

Further, the data information includes the real-time computing power of the edge nodes and the cloud, network bandwidth and delay, task size, and delay sensitivity.

Further, the inference task scheduling model specifically comprises a terminal layer, an edge computing layer, and a cloud layer;

the terminal layer comprises multiple types of devices that periodically collect environmental data and generate computation-intensive and delay-sensitive tasks, and the terminal devices access the edge computing layer over TCP;

the edge computing layer provides edge computing resources for the terminal devices and is connected to the cloud layer over a 5G network;

the cloud layer provides cloud computing capability for the terminal devices.

In a second aspect, the present invention provides a substation inspection management device based on deep reinforcement learning, comprising:

a model construction module, configured to construct an inference task scheduling model by using the MuZero algorithm, where the model takes the real-time computing power of the edge nodes and the cloud as input and predicts the number of a specific edge node device or the cloud;

a model training module, configured to train the inference task scheduling model on data generated by the various inspection tasks in the substation, with load balancing as the objective, where load balancing means allocating resources according to the real-time computing power of the edge nodes and the cloud so that the average service time of the inspection tasks is minimized;

a resource allocation module, configured to perform node prediction with the trained inference task scheduling model based on the current state of the substation, and to allocate edge computing resources to the inspection task according to the prediction result, so as to balance the load of substation inspection management.

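Further, in the model training module, the inference task scheduling model is deployed in the cloud, where it is trained centrally. After training is completed, the trained model is distributed to each edge node.

Further, in the model training module, training the inference task scheduling model on data generated by the various inspection tasks in the substation, with load balancing as the objective, specifically comprises:

dividing the substation inspection tasks into computation-intensive tasks and delay-sensitive tasks;

assigning delay-sensitive tasks to edge nodes for processing and sending computation-intensive tasks to the cloud for processing;

training the inference task scheduling model on data generated by the various inspection tasks in the substation;

and calculating the task failure rate over all inspection tasks and optimizing the inference task scheduling model with the lowest task failure rate as the objective.

Further, in the model training module, the data information includes the real-time computing power of the edge nodes and the cloud, network bandwidth and delay, task size, and delay sensitivity.

Further, in the model construction module, the inference task scheduling model specifically comprises a terminal layer, an edge computing layer, and a cloud layer.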

In summary, the invention provides a substation inspection management method and device based on deep reinforcement learning. The MuZero algorithm is used to construct an inference task scheduling model. The model takes the real-time computing power of the edge nodes and the cloud as input and predicts the number of a specific edge node device or the cloud. With load balancing as the objective, the model is trained on data generated by the various inspection tasks in the substation. Here, load balancing means allocating resources according to the real-time computing power of the edge nodes and the cloud so that the average service time of the inspection tasks is minimized. Based on the current state of the substation, the trained model performs node prediction, and edge computing resources are allocated to each inspection task according to the prediction result, which balances the load of substation inspection management. The MuZero-based deep reinforcement learning model receives the edge nodes and the cloud together as input features and selects the target node with load balancing as the objective. In this way the invention achieves edge-cloud coordination and improves the reliability and operating efficiency of the substation inspection system.

Drawings

The drawings used in describing the embodiments or the prior art are briefly introduced below, so that the technical solutions in the embodiments of the invention or in the prior art can be described more clearly. The drawings described below show only some embodiments of the invention. A person skilled in the art can derive other drawings from them without inventive effort.

Fig. 1 is a schematic flowchart of a substation inspection management method based on deep reinforcement learning according to an embodiment of the present invention;

Fig. 2 is a diagram of the relationship between the edge nodes and the cloud according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings, so that the objects, features, and advantages of the invention are clearer and easier to understand. The embodiments described are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art obtains from these embodiments without inventive effort fall within the scope of protection of the invention.

Based on the foregoing, the invention provides a substation inspection management method and device based on deep reinforcement learning.

Referring to Fig. 1, this embodiment provides a substation inspection management method based on deep reinforcement learning, comprising the following steps:

S100: Using the MuZero algorithm, construct an inference task scheduling model that takes the real-time computing power of the edge nodes and the cloud as input and predicts the number of a specific edge node device or the cloud.

It should be noted that the inference task scheduling model constructed with the MuZero algorithm is an abstract Markov decision process (MDP) model. On this model, the quantities that matter directly for planning (policies, value functions, and rewards) are predicted. Planning is then carried out on these predictions to select the node on which each inference task runs.
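Planning over the abstract MDP is driven by a tree search. The sketch below shows the pUCT rule that the published MuZero algorithm uses to choose which action to expand at a search node, applied to the two placement actions of this scheduler. The constants are the values reported for MuZero. The Q-value normalization used in the full algorithm is omitted, and the function names and example statistics are hypothetical.

```python
import math
from typing import Dict

C1, C2 = 1.25, 19652.0  # exploration constants reported for MuZero

def puct_score(q: float, prior: float, n_a: int, n_total: int) -> float:
    """Q(s,a) plus a prior-weighted exploration bonus that shrinks as action a is visited."""
    bonus = prior * math.sqrt(n_total) / (1 + n_a)
    bonus *= C1 + math.log((n_total + C2 + 1) / C2)
    return q + bonus

def select_action(q: Dict[str, float], prior: Dict[str, float],
                  visits: Dict[str, int]) -> str:
    """Return the action with the highest pUCT score at one search node."""
    n_total = sum(visits.values())
    return max(q, key=lambda a: puct_score(q[a], prior[a], visits[a], n_total))

# Hypothetical root statistics for one inspection task.
q = {"edge": 0.2, "cloud": 0.5}
prior = {"edge": 0.6, "cloud": 0.4}
visits = {"edge": 10, "cloud": 2}
print(select_action(q, prior, visits))  # cloud
```

In this example the less-visited "cloud" action wins on both its Q value and its larger exploration bonus. When no action has been visited yet, the bonus is zero and the rule falls back to the highest Q value.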

MuZero combines tree-based search with a learned model. First, an encoder (the representation function) maps the state observed in the real environment to a hidden state in an abstract state space. This hidden state is not directly constrained to reproduce the observation. A dynamics model and a prediction network are trained in this abstract state space. Given the initial hidden state and a sequence of k future actions, they predict the value and reward of each of the next k steps. Training drives these predictions as close as possible to the search-derived values and the rewards observed in the real environment.

Value equivalence is maintained, so planning in the abstract MDP remains equivalent to planning in the real environment. That is, starting from the same real state, the cumulative reward of a trajectory through the abstract MDP matches the cumulative reward of the corresponding trajectory in the real environment. Given a hidden state and a candidate action, the dynamics model produces an immediate reward and a new hidden state. The prediction function then computes the policy and value from the hidden state. Actions are sampled from the search policy, and the environment receives each action and returns a new observation and a reward.
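The interaction of the three learned functions can be illustrated with a toy unroll. In the sketch below, the representation function h, dynamics function g, and prediction function f are hypothetical hand-written stand-ins for the neural networks MuZero would learn. Only the control flow reflects the algorithm described above: encode once, then step the hidden state with each action and predict the policy, value, and reward.

```python
import math
from typing import List, Tuple

Hidden = List[float]

def representation(observation: List[float]) -> Hidden:
    """h: encode a real observation into a hidden state (toy: tanh per feature)."""
    return [math.tanh(x) for x in observation]

def dynamics(hidden: Hidden, action: int) -> Tuple[float, Hidden]:
    """g: (hidden state, action) -> (immediate reward, next hidden state); toy rule."""
    shift = 0.1 if action == 0 else -0.1          # 0 = edge, 1 = cloud
    next_hidden = [math.tanh(x + shift) for x in hidden]
    reward = -sum(abs(x) for x in next_hidden) / len(next_hidden)
    return reward, next_hidden

def prediction(hidden: Hidden) -> Tuple[List[float], float]:
    """f: hidden state -> (policy over [edge, cloud], value); toy softmax head."""
    logits = [sum(hidden), -sum(hidden)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    value = -sum(abs(x) for x in hidden)
    return [e / total for e in exps], value

def unroll(observation: List[float], actions: List[int]):
    """Return (policy, value, reward) for k = 0..K along the given action sequence."""
    hidden = representation(observation)
    policy, value = prediction(hidden)
    steps = [(policy, value, 0.0)]                 # no reward is predicted at k = 0
    for a in actions:
        reward, hidden = dynamics(hidden, a)
        policy, value = prediction(hidden)
        steps.append((policy, value, reward))
    return steps

for k, (p, v, r) in enumerate(unroll([0.3, 0.7, 0.1], [0, 1, 1])):
    print(k, [round(x, 3) for x in p], round(v, 3), round(r, 3))
```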

The inference task scheduling model has three layers, which from bottom to top are a terminal layer, an edge computing layer, and a cloud layer.

The terminal layer contains multiple types of devices, each denoted u. These devices periodically collect environmental data and generate computation-intensive and delay-sensitive tasks. The terminal devices access the edge computing layer over TCP, and the edge side provides them with edge computing resources.

The edge layer comprises three capabilities. The edge infrastructure capability EC-IaaS consists of wireless base stations, small data centers (edge servers), edge nodes, and interfaces. It mainly provides computing, storage, network, and virtualization resources. The edge platform capability EC-PaaS provides functions for the edge cloud environment, chiefly data preprocessing and analysis and the deployment and orchestration of applications. The edge application capability EC-SaaS extends application functions to the edge and serves applications as fully as possible.

The edge computing layer is connected to the cloud layer over 5G. The cloud comprises two capabilities. The infrastructure capability IaaS provides infrastructure such as computing, storage, networking, and virtual machines. The platform capability PaaS provides device management, resource management, data processing and analysis, data modeling and analysis, service components, edge management, and business orchestration. The computing power of the cloud is denoted f_c. The relationship between the edge computing layer and the cloud layer is shown in Fig. 2.
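The three-layer topology can be captured in a few data types. This is a minimal sketch under stated assumptions. The class names, the `capacity_mips` field, and its unit are hypothetical. Only the layer structure, the device symbol u, the cloud computing power f_c, and the TCP and 5G links come from the text. The helper function lists the scheduler's output space, which is one edge node number or the cloud.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TerminalDevice:
    """Terminal layer: a device u that periodically generates inspection tasks."""
    device_id: str
    kind: str                       # e.g. "camera" or "sensor" (illustrative)

@dataclass
class EdgeNode:
    """Edge computing layer: an edge node serving terminal devices attached over TCP."""
    node_id: str
    capacity_mips: float            # assumed unit for real-time computing power
    devices: List[TerminalDevice] = field(default_factory=list)

@dataclass
class Cloud:
    """Cloud layer: computing power f_c, reachable from edge nodes over 5G."""
    f_c: float
    edges: List[EdgeNode] = field(default_factory=list)

def candidate_targets(cloud: Cloud) -> List[str]:
    """Output space of the scheduling model: an edge node number or the cloud."""
    return [edge.node_id for edge in cloud.edges] + ["cloud"]

cloud = Cloud(
    f_c=20000.0,
    edges=[
        EdgeNode("E1", 4000.0, [TerminalDevice("u1", "camera")]),
        EdgeNode("E2", 2500.0),
    ],
)
print(candidate_targets(cloud))  # ['E1', 'E2', 'cloud']
```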

S200: With load balancing as the objective, train the inference task scheduling model on data generated by the various inspection tasks in the substation. Here, load balancing means allocating resources according to the real-time computing power of the edge nodes and the cloud so that the average service time of the inspection tasks is minimized.

The inference task scheduling model is trained on data generated by the various inspection tasks in the substation. The model uses this data to predict whether a task should be placed on an edge node or in the cloud. The substation inspection management system then allocates resources according to the prediction. This reduces the average service time, which is the sum of task processing time and communication delay, and balances the load.
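The service-time quantity being minimized can be made concrete with a small worked calculation. The sketch makes two assumptions. Processing time is task size divided by node computing power, and communication delay is transmission time plus a fixed link latency. These formulas, the units, and all numbers are illustrative and are not values from the patent.

```python
from typing import List

def service_time(task_mi: float, node_mips: float,
                 data_mb: float, bandwidth_mbps: float, latency_s: float) -> float:
    """Service time = processing time + communication delay (seconds)."""
    processing = task_mi / node_mips                          # compute time on the chosen node
    communication = data_mb * 8 / bandwidth_mbps + latency_s  # transfer time + link latency
    return processing + communication

def average_service_time(times: List[float]) -> float:
    return sum(times) / len(times)

# Hypothetical task: 1000 MI of work and 2 MB of image data.
at_edge = service_time(1000, 4000, 2, 100, 0.005)   # edge node reached over the LAN
at_cloud = service_time(1000, 20000, 2, 20, 0.05)   # cloud reached over the WAN
print(round(at_edge, 3), round(at_cloud, 3))        # 0.415 0.9
print(round(average_service_time([at_edge, at_cloud]), 4))  # 0.6575
```

The cloud processes the task faster, but the slower WAN transfer dominates its total. This is the kind of trade-off the scheduler has to weigh.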

The inference task scheduling model is trained centrally in the cloud and distributed to each edge node after training is completed. Each edge node performs inference on the environmental information collected by the terminal devices. This yields the probability that the node accepts the task and, for computation-intensive tasks, the probability that the task is offloaded from the edge server to the cloud server.
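Edge-side inference can be sketched as turning the distributed model's policy output into placement probabilities. This is a minimal illustration under stated assumptions. `edge_inference` is a hypothetical helper, and the model is assumed to emit one logit per action in the order [edge, cloud]. Delay-sensitive tasks are pinned to the edge node, as stated earlier in the disclosure.

```python
import math
from typing import Dict, List

def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def edge_inference(logits: List[float], compute_intensive: bool) -> Dict[str, float]:
    """Map policy logits [edge, cloud] to the probability of keeping or offloading a task."""
    if not compute_intensive:
        # Delay-sensitive tasks stay on the edge node; no offload probability is needed.
        return {"edge": 1.0, "cloud": 0.0}
    p_edge, p_cloud = softmax(logits)
    return {"edge": p_edge, "cloud": p_cloud}

print(edge_inference([0.0, 0.0], compute_intensive=True))    # {'edge': 0.5, 'cloud': 0.5}
print(edge_inference([2.0, -1.0], compute_intensive=False))  # {'edge': 1.0, 'cloud': 0.0}
```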

To train the MuZero-based inference task scheduling model, the state S, action A, reward R, and value v(S) are defined as follows.

The state S comprises:

- the CPU utilization of each edge server and the cloud server;
- the usage of the networks (LAN and WAN) between the device and each server;
- the task size;
- the real-time attribute of the task.

The cloud server parameters include the number of servers, the number of virtual machines, the number of cores per virtual machine, the average execution speed of single-word-length fixed-point instructions, memory, and storage capacity.

The state S therefore contains the following: CPU utilization (edge and cloud), network status (LAN bandwidth, WAN bandwidth, LAN latency, WAN latency), task size, and delay sensitivity (load check interval, location check interval, device counter size).

In the proposed method, the real-time requirement of a task is treated as one of the factors that determine the action. A flag is set for each task to indicate whether it requires real-time processing.
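The state S can be flattened into the observation vector fed to the representation function. The fields below are a subset of the quantities listed above; the load check interval, location check interval, and device counter size are omitted. The class name, units, and field order are assumptions made for illustration.

```python
from dataclasses import dataclass, astuple
from typing import List

@dataclass
class SchedulerState:
    edge_cpu_util: float       # CPU utilization of the edge server, 0..1
    cloud_cpu_util: float      # CPU utilization of the cloud server, 0..1
    lan_bandwidth_mbps: float
    wan_bandwidth_mbps: float
    lan_latency_ms: float
    wan_latency_ms: float
    task_size_mi: float
    real_time: bool            # whether the task requires real-time processing

def to_observation(state: SchedulerState) -> List[float]:
    """Flatten the state into a float vector; the real-time flag becomes 1.0 or 0.0."""
    return [float(x) for x in astuple(state)]

s = SchedulerState(0.72, 0.35, 100.0, 20.0, 5.0, 50.0, 1000.0, True)
print(to_observation(s))  # [0.72, 0.35, 100.0, 20.0, 5.0, 50.0, 1000.0, 1.0]
```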

The action space A contains two actions:

(1) processing the task on the edge server;
(2) offloading the task from the edge server to the cloud server.

The reward R is based on the service time of the task. The shorter the service time, the larger the reward and the better the value v(S). The inference task scheduling model is trained using these definitions and the data generated by substation inspection tasks.
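The action space can be written as below, together with a reward that grows as service time shrinks. Using the negative service time as the reward is an assumption, because the text does not give the exact formula. The discount factor is also an illustrative value. `Action`, `reward`, and `discounted_return` are hypothetical names.

```python
from enum import IntEnum
from typing import List

class Action(IntEnum):
    EDGE = 0   # process the task on the edge server
    CLOUD = 1  # offload the task from the edge server to the cloud server

def reward(service_time_s: float) -> float:
    """Assumed reward: the shorter the service time, the larger (less negative) the reward."""
    return -service_time_s

def discounted_return(rewards: List[float], gamma: float = 0.997) -> float:
    """Return sum_k gamma^k * r_k, the quantity the value v(S) estimates."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(reward(0.415) > reward(0.9))                               # True
print(discounted_return([reward(1.0), reward(1.0)], gamma=0.5))  # -1.5
```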

For a general model built on the MuZero algorithm, the training process is as follows:

S1: Use the MCTS algorithm to obtain a policy p_k and an estimated value v_k.

S2: Select action a_{t+1} according to the policy, execute it, and obtain the intermediate reward and the next state.

S3: At each step, train all model parameters jointly so that the predicted policy, value, and reward match the targets actually observed.

S4: Train all model parameters jointly so that the policy, value, and reward of each hypothetical step k match the corresponding target values observed k actual time steps later.

S5: Add an L2 regularization term to obtain the final loss function.

S6: Repeat steps S3 to S4 until the loss function converges, at which point training is complete.
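Steps S3 to S5 correspond to a loss summed over the K unrolled steps plus an L2 penalty. The sketch below follows the general loss form of the published MuZero algorithm: a value term, a reward term, a cross-entropy policy term, and c times the squared norm of the parameters. Three choices here are illustrative and not taken from the patent: squared error for value and reward, the constant c, and the requirement that the caller pass aligned per-step lists.

```python
import math
from typing import List, Sequence

def cross_entropy(target: Sequence[float], predicted: Sequence[float], eps: float = 1e-12) -> float:
    """Policy loss: -sum_a pi(a) * log p(a)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def muzero_loss(pred_policies: List[List[float]], pred_values: List[float],
                pred_rewards: List[float], target_policies: List[List[float]],
                target_values: List[float], target_rewards: List[float],
                params: List[float], c: float = 1e-4) -> float:
    """Sum over unrolled steps k of value, reward, and policy losses, plus c * ||theta||^2."""
    loss = 0.0
    for k in range(len(pred_policies)):
        loss += (target_values[k] - pred_values[k]) ** 2              # value loss
        loss += (target_rewards[k] - pred_rewards[k]) ** 2            # reward loss
        loss += cross_entropy(target_policies[k], pred_policies[k])   # policy loss
    return loss + c * sum(w * w for w in params)                      # S5: L2 regularization

pol = [[1.0, 0.0], [0.0, 1.0]]
# Perfect predictions, so only the L2 term (0.1 * (1 + 4)) remains: approximately 0.5.
print(muzero_loss(pol, [0.5, 0.2], [0.0, -0.4], pol, [0.5, 0.2], [0.0, -0.4],
                  params=[1.0, 2.0], c=0.1))
```

Under S6, a training loop would recompute this loss after each parameter update and stop once it no longer decreases.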

S300: Based on the current state of the substation, perform node prediction with the trained inference task scheduling model, and allocate edge computing resources to the inspection task according to the prediction result, so as to balance the load of substation inspection management.

It should be noted that substation inspection tasks are divided into computation-intensive and delay-sensitive tasks. Delay-sensitive tasks are further divided into two levels, level 0 and level 1. Tasks at level 1 are sensitive to communication latency and should therefore be processed on the edge server to reduce that latency. As the number of devices grows, so does the number of tasks to be processed. The substation inspection management system discards tasks that cannot be held while waiting for processing or that cannot be transmitted because of network congestion, and it counts these as failed tasks. The lower the task failure rate, the more effective the load balancing method. The training objective of the inference task scheduling model is therefore to minimize the task failure rate of the whole system, which achieves load balancing.
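The failure-counting rule can be illustrated with a deliberately simplified snapshot simulation. Its assumptions are all hypothetical:

- all tasks arrive within one scheduling interval and none completes during it;
- the edge server can hold at most `edge_queue_cap` waiting tasks;
- an offloaded task fails whenever the WAN is congested.

As the text requires, level-1 tasks are never offloaded.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Task:
    task_id: int
    level: int      # 1: latency-critical, always processed on the edge; 0: may be offloaded
    offload: bool   # scheduler's decision (only honoured for level-0 tasks)

def failure_rate(tasks: List[Task], edge_queue_cap: int, wan_congested: bool) -> float:
    """Fraction of tasks dropped because the edge queue is full or the WAN is congested."""
    queued, failed = 0, 0
    for task in tasks:
        if task.level == 0 and task.offload:
            if wan_congested:
                failed += 1          # cannot be transmitted to the cloud
        elif queued < edge_queue_cap:
            queued += 1              # waits on the edge server
        else:
            failed += 1              # no room to hold it while waiting
    return failed / len(tasks) if tasks else 0.0

tasks = [Task(1, 1, False), Task(2, 1, True), Task(3, 1, False), Task(4, 0, True)]
print(failure_rate(tasks, edge_queue_cap=2, wan_congested=False))  # 0.25
print(failure_rate(tasks, edge_queue_cap=2, wan_congested=True))   # 0.5
```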

This embodiment provides a substation inspection management method based on deep reinforcement learning. The MuZero algorithm is used to construct an inference task scheduling model. The model takes the real-time computing power of the edge nodes and the cloud as input and predicts the number of a specific edge node device or the cloud. With load balancing as the objective, the model is trained on data generated by the various inspection tasks in the substation. Here, load balancing means allocating resources according to the real-time computing power of the edge nodes and the cloud so that the average service time of the inspection tasks is minimized. Based on the current state of the substation, the trained model performs node prediction, and edge computing resources are allocated to each inspection task according to the prediction result, which balances the load of substation inspection management. The MuZero-based deep reinforcement learning model receives the edge nodes and the cloud together as input features and selects the target node with load balancing as the objective. In this way the invention achieves edge-cloud coordination and improves the reliability and operating efficiency of the substation inspection system.

The foregoing describes in detail an embodiment of the substation inspection management method based on deep reinforcement learning according to the present invention. An embodiment of the substation inspection management device based on deep reinforcement learning according to the present invention is described in detail below.

The invention provides a substation inspection management device based on deep reinforcement learning, comprising a model construction module, a model training module, and a resource allocation module.

In this embodiment, the model construction module is configured to construct an inference task scheduling model by using the MuZero algorithm. The model takes the real-time computing power of the edge nodes and the cloud as input and predicts the number of a specific edge node device or the cloud.

It should be noted that, in the model construction module, the inference task scheduling model specifically comprises a terminal layer, an edge computing layer, and a cloud layer.

In this embodiment, the model training module is configured to train the inference task scheduling model on data generated by the various inspection tasks in the substation, with load balancing as the objective. Here, load balancing means allocating resources according to the real-time computing power of the edge nodes and the cloud so that the average service time of the inspection tasks is minimized.

In the model training module, the inference task scheduling model is deployed in the cloud, where it is trained centrally, and the trained model is distributed to each edge node after training is completed.

In this embodiment, the resource allocation module is configured to perform node prediction with the trained inference task scheduling model based on the current state of the substation. It then allocates edge computing resources to the inspection task according to the prediction result, so as to balance the load of substation inspection management.
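The three modules of the device can be wired together as a skeleton. This is a hypothetical sketch for orientation only. None of the class or method names come from the patent. The "trained model" is a placeholder copied to every edge node to mirror the centralized-training-then-distribution step. A least-loaded rule stands in for MuZero's node prediction, and it is not the patented method.

```python
from typing import Dict, List

class ModelConstructionModule:
    """Defines the scheduler's output space: each edge node number plus the cloud."""
    def build(self, edge_ids: List[str]) -> List[str]:
        return list(edge_ids) + ["cloud"]

class ModelTrainingModule:
    """Stand-in for centralised training in the cloud followed by distribution."""
    def train_and_distribute(self, targets: List[str], edge_ids: List[str]) -> Dict[str, List[str]]:
        trained_model = list(targets)   # placeholder for the trained MuZero weights
        # Every edge node receives its own copy of the same trained model.
        return {edge: list(trained_model) for edge in edge_ids}

class ResourceAllocationModule:
    """Chooses a target for a task; a least-loaded rule stands in for the model's prediction."""
    def allocate(self, targets: List[str], load: Dict[str, float]) -> str:
        return min(targets, key=lambda t: load[t])

edges = ["E1", "E2"]
targets = ModelConstructionModule().build(edges)
replicas = ModelTrainingModule().train_and_distribute(targets, edges)
choice = ResourceAllocationModule().allocate(targets, {"E1": 0.9, "E2": 0.3, "cloud": 0.5})
print(targets, sorted(replicas), choice)  # ['E1', 'E2', 'cloud'] ['E1', 'E2'] E2
```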

The above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. The invention has been described in detail with reference to the foregoing embodiments. A person of ordinary skill in the art should nevertheless understand that the technical solutions described in these embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A substation inspection management method based on deep reinforcement learning, characterized by comprising the following steps:

constructing an inference task scheduling model by using the MuZero algorithm, wherein the model takes the real-time computing power of the edge nodes and the cloud as input and predicts the number of a specific edge node device or the cloud;

training the inference task scheduling model on data generated by the various inspection tasks in the substation, with load balancing as the objective, wherein load balancing means allocating resources according to the real-time computing power of the edge nodes and the cloud so as to minimize the average service time of the inspection tasks;

and, based on the current state of the substation, performing node prediction with the trained inference task scheduling model and allocating edge computing resources to the inspection task according to the prediction result, so as to balance the load of substation inspection management.

2. The substation inspection management method based on deep reinforcement learning according to claim 1, wherein the inference task scheduling model is deployed in the cloud, the cloud trains the inference task scheduling model centrally, and the trained model is distributed to each edge node after training is completed.

3. The substation inspection management method based on deep reinforcement learning according to claim 2, wherein training the inference task scheduling model on data generated by the various inspection tasks in the substation, with load balancing as the objective, specifically comprises:

dividing the substation inspection tasks into computation-intensive tasks and delay-sensitive tasks;

assigning delay-sensitive tasks to edge nodes for processing, and sending computation-intensive tasks to the cloud for processing;

training the inference task scheduling model on data generated by the various inspection tasks in the substation;

and calculating the task failure rate over all inspection tasks, and optimizing the inference task scheduling model with the lowest task failure rate as the objective.

4. The substation inspection management method based on deep reinforcement learning according to claim 3, wherein the data information includes the real-time computing power of the edge nodes and the cloud, network bandwidth and delay, task size, and delay sensitivity.

5. The substation inspection management method based on deep reinforcement learning according to claim 2, wherein the inference task scheduling model specifically comprises a terminal layer, an edge computing layer, and a cloud layer.

6. A substation inspection management device based on deep reinforcement learning, characterized by comprising:

a model construction module, configured to construct an inference task scheduling model by using the MuZero algorithm, wherein the model takes the real-time computing power of the edge nodes and the cloud as input and predicts the number of a specific edge node device or the cloud;

a model training module, configured to train the inference task scheduling model on data generated by the various inspection tasks in the substation, with load balancing as the objective, wherein load balancing means allocating resources according to the real-time computing power of the edge nodes and the cloud so as to minimize the average service time of the inspection tasks;

and a resource allocation module, configured to perform node prediction with the trained inference task scheduling model based on the current state of the substation, and to allocate edge computing resources to the inspection task according to the prediction result, so as to balance the load of substation inspection management.

7. The substation inspection management device based on deep reinforcement learning according to claim 6, wherein, in the model training module, the inference task scheduling model is deployed in the cloud, the cloud trains the inference task scheduling model centrally, and the trained model is distributed to each edge node after training is completed.

8. The substation inspection management device based on deep reinforcement learning according to claim 7, wherein, in the model training module, training the inference task scheduling model on data generated by the various inspection tasks in the substation, with load balancing as the objective, specifically comprises: dividing the substation inspection tasks into computation-intensive tasks and delay-sensitive tasks; assigning delay-sensitive tasks to edge nodes for processing, and sending computation-intensive tasks to the cloud for processing; training the inference task scheduling model on data generated by the various inspection tasks in the substation; and calculating the task failure rate over all inspection tasks, and optimizing the inference task scheduling model with the lowest task failure rate as the objective.

9. The substation inspection management device based on deep reinforcement learning according to claim 8, wherein, in the model training module, the data information includes the real-time computing power of the edge nodes and the cloud, network bandwidth and delay, task size, and delay sensitivity.

10. The substation inspection management device based on deep reinforcement learning according to claim 7, wherein, in the model construction module, the inference task scheduling model specifically comprises a terminal layer, an edge computing layer, and a cloud layer.