CN115542736A - Device control method, computer-readable storage medium, and computer terminal

Info

Publication number: CN115542736A
Application number: CN202211192135.3A
Authority: CN
Inventors: 仪忠凯; 王雪; 杨程; 印卧涛; 杨超; 钮孟洋; 韩佳澦
Original assignee: Alibaba Damo Institute Hangzhou Technology Co Ltd
Current assignee: Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority date: 2022-09-28
Filing date: 2022-09-28
Publication date: 2022-12-30

Abstract

The invention discloses a device control method, a computer-readable storage medium, and a computer terminal. The method comprises: measuring the real environment in which a device to be controlled is located to obtain state information of the device to be controlled; mapping the state information into an initial control instruction for the device to be controlled; correcting the initial control instruction to obtain a target control instruction, wherein the target control instruction lies within a preset instruction space of the device to be controlled, and control instructions within the preset instruction space control the device to be controlled to operate safely; and controlling the device to be controlled based on the target control instruction. The invention solves the technical problem that scheduling systems in the related art have difficulty meeting real-time scheduling requirements.

Description

Device control method, computer-readable storage medium, and computer terminal

Technical Field

The present invention relates to the field of device control, and in particular, to a device control method, a computer-readable storage medium, and a computer terminal.

Background

At present, with the integration of large-scale renewable energy and flexible resources, traditional system scheduling modes have difficulty meeting real-time scheduling requirements in scheduling environments that change rapidly and have complex models.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiment of the invention provides a device control method, a computer readable storage medium and a computer terminal, which at least solve the technical problem that a scheduling system in the related art is difficult to meet real-time scheduling.

According to an aspect of an embodiment of the present invention, there is provided an apparatus control method including: measuring the real environment of the equipment to be controlled to obtain the state information of the equipment to be controlled; mapping the state information into an initial control instruction of the equipment to be controlled; correcting the initial control instruction to obtain a target control instruction, wherein the target control instruction is located in a preset instruction space of the equipment to be controlled, and the control instruction located in the preset instruction space is used for controlling the equipment to be controlled to operate safely; and controlling the equipment to be controlled based on the target control instruction.

According to another aspect of the embodiments of the present invention, there is also provided an apparatus control method, including: measuring the real power grid environment where the power equipment is located to obtain the state information of the power equipment; mapping the state information into an initial scheduling instruction of the power equipment; correcting the initial scheduling instruction to obtain a target scheduling instruction, wherein the target scheduling instruction is located in a preset instruction space of the power equipment, and the scheduling instruction located in the preset instruction space is used for controlling the power equipment to run safely; and controlling the power equipment based on the target scheduling instruction.

According to another aspect of the embodiments of the present invention, there is also provided a device control method, including: a cloud server receives state information of a device to be controlled uploaded by a client, wherein the state information is obtained by measuring the device to be controlled in the real environment in which it is located; the cloud server maps the state information into an initial control instruction for the device to be controlled; the cloud server corrects the initial control instruction to obtain a target control instruction, wherein the target control instruction lies within a preset instruction space of the device to be controlled, and control instructions within the preset instruction space control the device to be controlled to operate safely; and the cloud server sends the target control instruction to the client.

According to an aspect of the embodiments of the present application, there is provided a computer-readable storage medium including a stored program, where the program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the apparatus control method in any one of the above embodiments.

According to an aspect of an embodiment of the present application, there is provided a computer terminal including: a memory for storing a program; and the processor is connected with the memory and used for running the program, wherein the program executes the equipment control method in any one of the above embodiments when running.

In the embodiments of the invention, the real environment in which the device to be controlled is located is measured to obtain state information of the device to be controlled; the state information is mapped into an initial control instruction for the device to be controlled; the initial control instruction is corrected to obtain a target control instruction, wherein the target control instruction lies within a preset instruction space of the device to be controlled, and control instructions within the preset instruction space control the device to be controlled to operate safely; and the device to be controlled is controlled based on the target control instruction, thereby ensuring safe operation of the device to be controlled. Notably, because the state information is obtained by measuring the real environment in which the device to be controlled is located, it reflects real-environment factors, so scheduling can be performed in real time according to the real scenario. Correcting the initial control instruction with the safety correction model to obtain the target control instruction further enhances the operating safety of the device to be controlled, thereby solving the technical problem that scheduling systems in the related art have difficulty meeting real-time scheduling requirements.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a block diagram of the hardware structure of a computer terminal (or a mobile device) for implementing a device control method according to an embodiment of the present application;

FIG. 2 is a flowchart of a device control method according to embodiment 1 of the present application;

FIG. 3 is a schematic diagram of a device control process according to an embodiment of the present application;

FIG. 4 is a flowchart of a device control method according to an embodiment of the present application;

FIG. 5 is a flowchart of a device control method according to embodiment 2 of the present application;

FIG. 6 is a flowchart of a device control method according to embodiment 3 of the present application;

FIG. 7 is a schematic diagram of a device control apparatus according to embodiment 4 of the present application;

FIG. 8 is a schematic diagram of a device control apparatus according to embodiment 5 of the present application;

FIG. 9 is a schematic diagram of a device control apparatus according to embodiment 6 of the present application;

FIG. 10 is a block diagram of a computer terminal according to an embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

First, some of the terms appearing in the description of the embodiments of the present application are explained as follows:

Alternating Current Optimal Power Flow (ACOPF): a refined mathematical model for computing an optimal power flow distribution in real-time scheduling of a power system.

Constrained Markov Decision Process (CMDP): a mathematical framework for sequential decision making under constraints.

The low-carbon economic operation of the power system is essential to achieving the "dual carbon" goals, and building a new data-driven power system scheduling mode is an important driver of the digital transformation of the energy system and the high-quality development of the national economy. The core of the real-time power system scheduling problem is how to solve a series of time-varying ACOPF problems quickly, economically, and safely. Mathematically, ACOPF is a non-convex optimization problem: it is computationally inefficient and depends strongly on the accuracy of model parameters. With the large-scale integration of renewable energy and increasingly complex grid operating modes, traditional power system scheduling modes have difficulty meeting real-time scheduling requirements in environments that change rapidly and have inaccurate parameters.

In the related art, a series of overall solutions for smart grid scheduling and control have been developed. However, these solutions are mainly based on optimization modeling, depend heavily on the parameters of the power system and of adjustable resource models, and cannot provide fast decisions in systems where new energy output changes sharply. The related art also provides an intelligent scheduling, operation, and optimization decision platform and an integrated source-network-load-storage optimization scheduling platform, but these platforms rely excessively on accurate mathematical models and cannot fully exploit the complementary advantages of expert knowledge and massive data. A series of safe reinforcement learning algorithms have also been proposed and trialed in applications, but they have not reached commercial deployment, so their practicality and generality are insufficient.

Overall, the industry currently lacks a general-purpose real-time power system scheduling technology that integrates power grid knowledge with massive data; existing products depend strongly on the model parameters of the grid and adjustable devices, have low decision efficiency, and need improved generality.

In view of this, the present solution provides an apparatus control method to solve the real-time scheduling problem of a complex power system.

Example 1

According to an embodiment of the present invention, an embodiment of a device control method is provided. It should be noted that the steps illustrated in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. FIG. 1 is a block diagram of the hardware structure of a computer terminal (or a mobile device) for implementing a device control method according to an embodiment of the present application. As shown in FIG. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown here as 102a, 102b, …, 102n; a processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, the computer terminal 10 may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only illustrative and does not limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).

The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the device control method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, so as to implement the device control method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.

The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).

It should be noted that in some alternative embodiments, the computer device (or mobile device) shown in FIG. 1 may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that FIG. 1 is only one specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.

Under the above operating environment, the present application provides an apparatus control method as shown in fig. 2. Fig. 2 is a flowchart of an apparatus control method according to embodiment 1 of the present application.

Step S202, measuring the real environment of the equipment to be controlled to obtain the state information of the equipment to be controlled.

The state information may include: the environment state of the real environment and the equipment state of the equipment to be controlled.

The device to be controlled may be a device that needs to be scheduled in real time in different systems (such as an electric power system, an internet system, etc.), and in the embodiment of the present application, an adjustable device in the electric power system is taken as an example for description. For example, in an energy management scenario, the device to be controlled may be an active regulating device (e.g., a generator set, a new energy plant, an energy storage device, an adjustable load, etc.); in a voltage regulation scenario, the device to be controlled may be a reactive regulation device (e.g., a genset with reactive regulation capability, power electronics, a capacitor, etc.), but is not limited thereto.

The real environment may be a real network environment, a real power grid environment, a power system operating environment, or the like in which the device to be controlled is located, and may be determined according to the environment in which the device to be controlled is located.

The state information of the device to be controlled may be device parameters, device operating parameters, and the like of the device to be controlled in the real environment, and may also be the environment state of the real environment. The environment state may be, for example, the state of the network and the state of signals in the real environment, where the state of the network may be the stability of the network and the state of a signal may be the stability of the signal; these are given only as examples and are not limiting.

The real environment may be the operating environment of the device to be controlled in different scenarios, and the corresponding environment state may be parameters of that operating environment. The real environment may belong to different systems, such as a power system or an internet system. Taking the power system as an example, the real environment may be a capacity management environment of the power grid, characterized by the states of the voltage, current, lines, and the like of power regulation devices, for example the capacity management environment corresponding to a generator set, a new energy power station, an energy storage device, an adjustable load, and the like; the environment state may then be parameters of the power grid during capacity management. The real environment may also be a voltage control environment, in which the reactive power of multiple types of reactive power regulation devices in the power system is regulated in real time to keep the grid voltage at a safe level; the environment state may then be parameters of the power grid under voltage control. The real environment may also be an emergency environment, in which case the environment state may be parameters of the power grid during the emergency.

In the power system, the environment state of the real environment may be a grid state, wherein the grid state includes, but is not limited to, branch power flow, node voltage, and line status. The device state of the device to be controlled may be a power system state, wherein the power system state includes, but is not limited to, generator output, the energy state of energy storage, and load demand.

In an optional embodiment, the real environment in which the device to be controlled is located may be measured by various measuring devices installed in the real environment, so as to obtain the state information of the device to be controlled in the real environment, and safety constraints may be imposed on the device to be controlled according to this state information.

Step S204, mapping the state information into an initial control instruction of the equipment to be controlled.

In an alternative embodiment, the state information may be mapped to the initial control command of the device to be controlled by using a reinforcement learning model, wherein the reinforcement learning model is used for representing the mapping relationship between different state information and different control commands.

The reinforcement learning model may be a deep neural network.

The reinforcement learning model may be trained with a reinforcement learning algorithm whose reward function is designed with the goals of minimizing operating cost and eliminating power flow limit violations, where violations may be eliminated by, for example, reducing unit output, shedding load, or transferring load.
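As an illustration of a reward designed around these two goals, the following is a minimal sketch only. The function name `reward`, the penalty weight, and the use of the summed branch overload as the violation measure are assumptions made for this example; the source does not specify the form of the reward function.

```python
def reward(operating_cost, branch_flows, flow_limits, penalty_weight=10.0):
    """Hypothetical reward: low operating cost and no branch overloads score best.

    operating_cost : cost of the current dispatch (any consistent unit)
    branch_flows   : power flow on each branch
    flow_limits    : thermal limit of each branch, same order as branch_flows
    penalty_weight : assumed trade-off between cost and limit violations
    """
    # Amount by which each branch exceeds its limit (zero if within the limit).
    overload = sum(
        max(0.0, abs(flow) - limit)
        for flow, limit in zip(branch_flows, flow_limits)
    )
    # Reinforcement learning maximizes reward, so cost and violations enter negatively.
    return -operating_cost - penalty_weight * overload
```

With this shape, a dispatch that removes an overload (for example by reducing unit output) receives a higher reward than one with the same cost that leaves the overload in place.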

The initial control instruction may be a real-time scheduling instruction.

The reinforcement learning action space corresponding to the reinforcement learning model can comprise real-time scheduling instructions of various types of adjustable equipment in the power system, such as source, network, load, storage and the like.

The state information can be expressed as the current state space value of the equipment to be controlled in the reinforcement learning model.

In an optional embodiment, a reinforcement learning model may be used to train a scheduling policy, where the scheduling policy may be a mapping from a state space to an action space. The state information of the device to be controlled, acquired in the real environment, may be taken as the current state space value. The scheduling policy may then determine the action space value corresponding to the current state space value, and a reference value of the scheduling instruction for the adjustable device, that is, the initial control instruction, may be output according to the action space value and sent to the next stage.
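The mapping from a state space value to an action space value can be sketched as follows. This is an illustrative stand-in, assuming a single tanh layer with already-trained parameters; the names `policy`, `weights`, `biases`, `p_min`, and `p_max` are hypothetical, and the source only states that the model may be a deep neural network.

```python
import math


def policy(state, weights, biases, p_min, p_max):
    """Map a state vector to one reference set-point per adjustable device.

    weights : one row of weights per device (assumed to come from training)
    biases  : one bias per device
    p_min, p_max : nominal output range of each device, used only for scaling
    """
    actions = []
    for w_row, bias, lo, hi in zip(weights, biases, p_min, p_max):
        z = sum(w * s for w, s in zip(w_row, state)) + bias
        squashed = math.tanh(z)  # value in (-1, 1)
        # Scale the squashed value into the device's nominal range.
        actions.append(lo + 0.5 * (squashed + 1.0) * (hi - lo))
    return actions
```

The output here is only a reference value: scaling to each device's nominal range does not enforce network-level limits such as branch flows, which is why a separate correction stage follows.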

Step S206, correcting the initial control instruction to obtain a target control instruction.

The target control instruction is located in a preset instruction space of the device to be controlled, and the control instruction located in the preset instruction space is used for controlling the device to be controlled to operate safely.

In an alternative embodiment, the initial control command may be corrected by using a safety correction model to obtain a target control command.

The safety correction model may be used to ensure that the device to be controlled operates in a safe environment. The safety correction model may be constructed by taking the safe operation limits of the power system as constraints and taking minimization of the deviation between the output value of the safety correction model and the reference value as the objective.
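One way to write this construction as an optimization problem is shown below. The notation is assumed for illustration and does not appear in the source: $a^{\mathrm{ref}}$ is the reference value (the initial control instruction), $a$ is the output value of the safety correction model (the target control instruction), $s$ is the measured state information, $\theta$ denotes the model parameters, and $g$ collects the safe operation limits. The squared Euclidean norm is one possible choice of deviation measure.

```latex
\begin{aligned}
\min_{a} \quad & \lVert a - a^{\mathrm{ref}} \rVert_2^2 \\
\text{s.t.} \quad & g(a, s; \theta) \le 0
\end{aligned}
```

Under this reading, the preset instruction space is the feasible set $\{a : g(a, s; \theta) \le 0\}$, and a reference value that already satisfies the limits is returned unchanged.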

The preset instruction space may include a plurality of preset instructions, and each instruction is used to implement different control on the device to be controlled.

In an optional embodiment, the initial control instruction may be corrected by using the safety correction model, so that it is corrected into the safety domain of the device to be controlled to obtain the target control instruction, and the target control instruction may be issued to the device to be controlled for real-time execution. When there are a plurality of devices to be controlled, the target control instruction generated for each device to be controlled may be sent to the corresponding device to be controlled.
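A minimal sketch of such a correction is given below, assuming a deliberately simplified safety domain: each device must stay within its own limits and the total output must equal a given demand. This stands in for the power system's safe operation limits and is not the ACOPF-based model of the source; the names `correct_instruction`, `a_ref`, `lower`, `upper`, and `demand` are hypothetical. The function returns the point of the safety domain closest (in Euclidean distance) to the initial control instruction.

```python
def correct_instruction(a_ref, lower, upper, demand, tol=1e-9):
    """Project a_ref onto {lower <= a <= upper, sum(a) == demand}.

    The closest feasible point has the form clip(a_ref - shift, lower, upper)
    for a single scalar shift, which is found here by bisection.
    """
    if not (sum(lower) <= demand <= sum(upper)):
        raise ValueError("demand cannot be met within the device limits")

    def clipped(shift):
        return [min(max(r - shift, lo), hi)
                for r, lo, hi in zip(a_ref, lower, upper)]

    # The total output is non-increasing in shift, so bisection applies.
    lo_shift = min(r - hi for r, hi in zip(a_ref, upper))  # all devices at upper limit
    hi_shift = max(r - lo for r, lo in zip(a_ref, lower))  # all devices at lower limit
    for _ in range(200):
        mid = 0.5 * (lo_shift + hi_shift)
        if sum(clipped(mid)) > demand:
            lo_shift = mid   # total too high: shift further down
        else:
            hi_shift = mid   # total at or below demand: shift less
        if hi_shift - lo_shift < tol:
            break
    return clipped(0.5 * (lo_shift + hi_shift))
```

For example, with limits of 60 and 100 and a demand of 120, a reference of `[80.0, 50.0]` is corrected to approximately `[60.0, 60.0]`: the first device is held at its upper limit and the second makes up the balance.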

In an optional embodiment, the safety correction model may further output a theoretical state of the device, and the theoretical state of the device may be compared with an actually measured state of the device in a real environment to obtain a deviation value, and the model parameter of the safety correction model is updated based on the deviation value.

Step S208, controlling the equipment to be controlled based on the target control instruction.

In an optional embodiment, the device to be controlled may be controlled according to the target control instruction, so that the device to be controlled executes an action corresponding to the target control instruction.

Taking the adjustable devices in a power system as the devices to be controlled, the real environment in which each adjustable device is located may first be measured to obtain the state information of each adjustable device, where the state information may be the power grid state and the device state of each adjustable device in the real environment. Based on the current state space value sampled from the real environment of the power system, that is, the space value corresponding to the device state and the power grid state, the scheduling policy is used to determine the action space value for the current state space value and to output a scheduling reference instruction for each adjustable device, that is, the initial control instruction. The scheduling reference value corresponding to the scheduling reference instruction may then be corrected by the safety correction model to obtain a corrected scheduling instruction value, that is, the target control instruction, which may be issued to each adjustable device for execution.
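The four steps S202 to S208 can be sketched as one control cycle. This is an illustrative skeleton only: `control_step` and its four callable parameters are hypothetical names, and the measurement, policy, correction, and dispatch components are supplied by the caller rather than taken from the source.

```python
from typing import Callable, List, Tuple

State = List[float]
Instruction = List[float]


def control_step(
    measure: Callable[[], State],
    policy: Callable[[State], Instruction],
    correct: Callable[[Instruction, State], Instruction],
    dispatch: Callable[[Instruction], None],
) -> Tuple[State, Instruction, Instruction]:
    """Run one measure -> map -> correct -> control cycle."""
    state = measure()                  # S202: measure the real environment
    initial = policy(state)            # S204: state -> initial control instruction
    target = correct(initial, state)   # S206: correct into the safe instruction space
    dispatch(target)                   # S208: issue the target instruction to the device
    return state, initial, target


if __name__ == "__main__":
    # Stub components, purely for demonstration.
    issued = []
    control_step(
        measure=lambda: [1.0, 0.98],
        policy=lambda s: [120.0],
        correct=lambda a, s: [min(a[0], 100.0)],
        dispatch=issued.append,
    )
    print(issued)
```

Keeping the policy and the correction as separate callables mirrors the separation in the text: the learned policy can be retrained without touching the component that enforces safety.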

The present application provides a knowledge-data fusion safe reinforcement learning framework for real-time scheduling of a power system. The framework combines reinforcement learning with an optimization method: reinforcement learning is used to make fast decisions in an environment with inaccurate model parameters, and the optimization method is then used to apply a safety correction to the actions output by reinforcement learning so as to ensure safe operation of the system. In addition, to improve model accuracy and the long-term economic benefit of the scheduling policy, the safety correction model is initialized with prior knowledge of the power system, and its model parameters are dynamically updated using measured data. The scheme can significantly reduce dependence on the power system model and its parameters, and has clear advantages in decision efficiency, safety guarantees, and range of application.

Through the above steps, the real environment in which the device to be controlled is located is measured to obtain state information of the device to be controlled; the state information is mapped into an initial control instruction for the device to be controlled; the initial control instruction is corrected to obtain a target control instruction, wherein the target control instruction lies within a preset instruction space of the device to be controlled, and control instructions within the preset instruction space control the device to be controlled to operate safely; and the device to be controlled is controlled based on the target control instruction, thereby ensuring safe operation of the device to be controlled. Notably, because the state information is obtained by measuring the real environment in which the device to be controlled is located, it reflects real-environment factors, so scheduling can be performed in real time according to the real scenario. Correcting the initial control instruction with the safety correction model to obtain the target control instruction further enhances the operating safety of the device to be controlled, thereby solving the technical problem that scheduling systems in the related art have difficulty meeting real-time scheduling requirements.

In the above embodiment of the present application, correcting the initial control instruction to obtain the target control instruction includes: and correcting the initial control instruction by using the safety correction model to obtain a target control instruction.

In the above embodiment of the present application, the safety correction model is further configured to correct the initial control instruction to obtain an equipment theoretical state of the equipment to be controlled, and the method further includes: acquiring an actual measurement state of equipment to be controlled in a real environment; comparing the actual measurement state of the equipment with the theoretical state of the equipment to obtain a target control deviation; and updating the model parameters of the safety correction model under the condition that the target control deviation exceeds a preset threshold value.

The actual measurement state of the device to be controlled in the real environment may be the actual measurement state of the device when the device to be controlled safely operates in the real environment.

The above-mentioned equipment theoretical state may be a theoretical value of a power flow state of the power system.

The preset threshold may be a deviation threshold set in advance.

In an optional embodiment, the theoretical value of the power flow state of the power system (the theoretical state of the device) may be compared with the measured state of the device to obtain the target control deviation, where the target control deviation may be the control deviation accumulated over the comparison results. When the target control deviation exceeds the preset threshold, the accuracy of the safety correction model is low and safety and control accuracy are difficult to guarantee; in this case, the model parameters of the safety correction model may be updated.

In the above embodiment of the present application, comparing the actual measurement state of the device with the theoretical state of the device to obtain the target control deviation includes: comparing the actual measurement state of the equipment with the theoretical state of the equipment to obtain the current control deviation of the safety correction model; acquiring historical control deviation of a safety correction model; and accumulating the historical control deviation and the current control deviation to obtain the target control deviation.

The above-mentioned historical control deviation may be a control deviation between the last obtained measured state of the device and the theoretical state of the device. The historical control deviation can also be a control deviation between the actual measurement state and the theoretical state of the equipment, which is obtained in a historical preset time period. The historical control deviations described above may be one or more. The specific setting mode of the historical control deviation is not limited, and the required historical control deviation can be determined according to the actual scene requirements.

In an optional embodiment, the actual measurement state of the device may be compared with the theoretical state of the device to obtain a current control deviation of the safety correction model, and a historical control deviation of the safety correction model may be obtained, and the historical control deviation and the current control deviation may be accumulated to obtain a target control deviation.
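The accumulation and threshold check described above can be sketched as follows. The class name `DeviationMonitor`, the use of a sum of absolute differences as the deviation measure, and the optional `window` parameter that bounds how many historical deviations are kept are all assumptions made for illustration; the present application does not fix these choices.

```python
from collections import deque
from typing import Optional, Sequence


class DeviationMonitor:
    """Accumulates control deviations and flags when an update is needed."""

    def __init__(self, threshold: float, window: Optional[int] = None):
        self.threshold = threshold
        # Historical control deviations; window=None keeps all of them.
        self.history = deque(maxlen=window)

    def observe(self, measured: Sequence[float], theoretical: Sequence[float]) -> bool:
        """Return True if the target control deviation exceeds the threshold."""
        # Current control deviation (assumed metric: sum of absolute differences).
        current = sum(abs(m - t) for m, t in zip(measured, theoretical))
        # Target control deviation = historical deviations + current deviation.
        target = sum(self.history) + current
        self.history.append(current)
        return target > self.threshold
```

A caller would update the model parameters of the safety correction model whenever `observe` returns `True`.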

In the above embodiment of the present application, after the initial control instruction is corrected by using the safety correction model to obtain the target control instruction and the theoretical state of the device, the method further includes: outputting a target control instruction; and under the condition of receiving a deviation confirmation instruction corresponding to the target control instruction, updating model parameters of the safety correction model, wherein the deviation confirmation instruction is used for determining that the deviation exists between the actual measurement state of the equipment to be controlled and the theoretical state of the equipment output by the safety correction model.

In an alternative embodiment, the theoretical state of the device may be output to a user's terminal or display screen so that the user can confirm whether the theoretical state of the device deviates. If the user considers that a deviation exists, the user may click a confirmation button or confirm in another manner, whereupon a deviation confirmation instruction is generated, and the model parameters of the safety correction model may be updated according to the generated deviation confirmation instruction, so as to improve the accuracy of the safety correction model.

In the above embodiments of the present application, updating the model parameters of the safety correction model includes: storing the state information in a preset storage device, wherein the preset storage device is used for storing historical states obtained by measuring the real environment; sampling the data stored in the preset storage device to generate first training data corresponding to the safety correction model; and updating the model parameters of the safety correction model based on the first training data.

The preset storage device may be a device dedicated to storing data in the power system, and the preset storage device may also be each device to be controlled in the power system, where the device to be controlled may have a storage space dedicated to storing data.

The preset storage device may also be a data storage.

In an optional embodiment, after the state information of the device to be controlled is acquired, the state information may be stored in the preset storage device to facilitate its subsequent use; the data stored in the preset storage device can be continuously sampled to generate first training data corresponding to the safety correction model, so that the model parameters of the safety correction model can be updated according to the first training data, and the accuracy of the safety correction model can be efficiently improved.

In another optional embodiment, after the latest status information is acquired, the data in the preset storage device may be updated by using the latest status information; and periodically updating the data in the preset storage device according to the acquired state information.
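The storing, refreshing, and sampling behaviour of the preset storage device can be sketched as a bounded buffer. The class name `StateStore`, the fixed `capacity` (oldest entries are discarded first), and uniform random sampling are assumptions for illustration and are not specified by the present application.

```python
import random
from collections import deque
from typing import Any, List, Optional


class StateStore:
    """Bounded store of measured state information with random sampling."""

    def __init__(self, capacity: int, seed: Optional[int] = None):
        # When full, appending drops the oldest entry (assumed update rule).
        self._data = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, state_info: Any) -> None:
        """Store the latest state information."""
        self._data.append(state_info)

    def sample(self, batch_size: int) -> List[Any]:
        """Draw training data without replacement (at most what is stored)."""
        k = min(batch_size, len(self._data))
        return self._rng.sample(list(self._data), k)

    def __len__(self) -> int:
        return len(self._data)
```

The same store could serve both the safety correction model and the reinforcement learning model, each drawing its own training batches.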

In the above embodiment of the present application, the method further includes: constructing an initial correction model based on the safe operation limiting conditions of the equipment to be controlled; and training the initial correction model through historical knowledge information to obtain a safety correction model.

The safe operation restriction condition may be a constraint condition of the safe operation restriction. In the power system, the safe operation restriction condition may be a constraint condition of a safe operation restriction in the power system.

The historical knowledge information may be prior knowledge, where the prior knowledge may be input learning rate, update step length, power system and each adjustable device model parameter, and may also be other prior knowledge, which is not listed here.

In an optional embodiment, various neural networks in the reinforcement learning algorithm can be initialized, and the learning rate, the updating step length, the power system and the initial correction model of each adjustable device are input; and training the initial correction model of each adjustable device based on the prior knowledge to obtain a safety correction model.

In the above embodiment of the present application, after the initial control instruction is corrected by using the safety correction model to obtain the target control instruction, the method further includes: storing the state information to a preset storage device; sampling data stored in preset storage equipment to generate second training data corresponding to the reinforcement learning model; model parameters of the reinforcement learning model are updated based on the second training data.

The preset storage device may also be a data storage.

In an optional embodiment, after the state information of the device to be controlled is acquired, the state information may be stored in the preset storage device to facilitate its subsequent use; the data stored in the preset storage device can be continuously sampled to generate second training data corresponding to the reinforcement learning model, so that the model parameters of the reinforcement learning model can be updated according to the second training data, and the accuracy of the reinforcement learning model can be efficiently improved.

In another alternative embodiment, during the real-time operation of the power system, the real-time scheduling instructions of the source-network-load-storage multiple types of adjustable devices can be periodically output by using a neural network and a safety correction model in reinforcement learning. The method comprises the steps of updating a data storage according to state information of adjustable equipment in a newly measured power system, continuously sampling from the data storage to generate a training set, dynamically updating and optimizing parameters of a scheduling strategy by using a reinforcement learning algorithm, and promoting the strategy to be continuously optimized, so that the adjustable equipment is mutually matched and coordinately scheduled, and rewards are obtained from the power system environment to the maximum extent.

In the above embodiment of the present application, the method further includes: constructing an initial learning model, wherein the initial learning model is used for representing the mapping relation between different states of the equipment to be controlled and different control instructions; generating a reward function of the initial learning model based on the running cost and the line load flow of the target control instruction; and training the initial learning model based on the third training data and the reward function to obtain the reinforcement learning model.

The line power flow may reflect the unit output, load shedding, and load transfer.

The operation cost may be an operation cost of the device to be controlled when executing the target control command, and the line flow may be a line flow of the device to be controlled when executing the target control command.

The above-mentioned reward function may be a reward function designed with the goals of minimizing the operation cost and eliminating power flow out-of-limit scenarios.

The different states of the devices to be controlled may be a reinforcement learning state space, where the reinforcement learning state space is composed of a power grid state (including but not limited to a branch flow, a node voltage, a line state, and the like) and a device state (including but not limited to a generator output, an energy storage state, a load demand, and the like).

The different control instructions of the equipment to be controlled can be real-time scheduling instructions of a plurality of types of adjustable equipment in a reinforcement learning action space including source-network-load-storage in the power system.

The mapping relation between different states of the device to be controlled and different control commands, which is characterized by the initial learning model, can be a scheduling strategy.

In an alternative embodiment, a reinforcement learning algorithm may be designed, that is, an initial learning model is constructed and a reward function is designed with the goals of minimizing the operation cost and eliminating power flow out-of-limit scenarios; the scheduling strategy is then trained by using the initial learning model to obtain a reinforcement learning model. The reinforcement learning model can use the scheduling strategy to determine an action space value from the current state space value sampled from the real-time environment of the power system, output a scheduling instruction reference value for each adjustable device, and issue the reference value to the next link.
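A reward function built from the operation cost and the line power flow could take the following form. This is only one possible shape: the function name `reward`, the linear penalty on the amount by which each line exceeds its limit, and the `penalty` weight are assumptions, not values given in the present application.

```python
from typing import Sequence


def reward(
    operating_cost: float,
    line_flows: Sequence[float],
    line_limits: Sequence[float],
    penalty: float = 100.0,
) -> float:
    """Higher reward for lower cost and no power flow out-of-limit."""
    # Total amount by which line flows exceed their limits (either direction).
    violation = sum(
        max(0.0, abs(flow) - limit) for flow, limit in zip(line_flows, line_limits)
    )
    # Assumed form: negative cost minus a weighted out-of-limit penalty.
    return -operating_cost - penalty * violation
```

With this form, maximizing the reward pushes the scheduling strategy toward low operation cost while discouraging any out-of-limit line flow.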

The traditional power system dispatching mode has difficulty meeting real-time dispatching requirements in a power grid environment with violent changes and inaccurate model parameters. The present application aims to address the challenge brought by violent changes in new energy and to ensure the safety and economy of long-term operation of a complex power system, including under emergency conditions.

Fig. 3 is a schematic diagram of a device control process according to an embodiment of the present application. In the reinforcement learning algorithm, the power grid state and the device state may be input into a multi-layer neural network to obtain a reinforcement learning model, that is, a scheduling strategy. The scheduling strategy may produce real-time scheduling instructions for the source-network-load-storage multiple types of adjustable devices, and the scheduling instruction reference value may be input into the safety correction model so that the safety correction model outputs a scheduling instruction correction value. The objective of the safety correction model may be to correct the scheduling instruction into a safe domain, and its constraints may be the safe operation limits of the power system. The safety correction model may be initialized by using prior knowledge, and its model parameters may be updated by sampling from a data storage. The device to be controlled in the power system may be controlled by using the scheduling instruction correction value. The environment state and the device state of the device to be controlled in the real environment can be collected in real time, a simulator can be used to simulate the environment state and the device state to obtain state information, and the state information can be stored in the data storage, where the data in the data storage, such as the reinforcement learning state, action, and reward value, can be used for subsequent training.

Fig. 4 is a flowchart of a device control method according to an embodiment of the present application, and as shown in fig. 4, the method includes:

step S401, inputting a learning rate, a neural network parameter, a power grid and an adjustable equipment model parameter;

optionally, various neural networks in the reinforcement learning algorithm can be initialized, and the learning rate, the updating step length, the power system and various adjustable equipment model parameters are input; and initializing a safety correction model of the power system based on the prior knowledge and the model parameters.

Step S402, initializing a safety correction model of the power system based on prior knowledge;

step S403, updating neural network parameters in the deep reinforcement learning algorithm;

the deep reinforcement learning algorithm may be a reinforcement learning model.

Step S404, utilizing deep reinforcement learning to generate a dispatching instruction reference value;

the above-mentioned scheduling instruction reference value may be an initial control instruction.

Optionally, a reinforcement learning algorithm can be designed, and a reward function is designed by taking a scenario of minimizing operation cost and eliminating power flow out-of-limit as a target; the reinforcement learning action space comprises real-time scheduling instructions of source-network-load-storage multiple types of adjustable equipment in the power system; the reinforcement learning state space is composed of a power grid state and an equipment state. Training the scheduling strategy by using a reinforcement learning algorithm, determining an action space value by using the scheduling strategy according to a current state space value sampled from the real-time environment of the power system, outputting a scheduling instruction reference value of each adjustable device, and issuing the scheduling instruction reference value to the next link.

Step S405, correcting the scheduling instruction based on the safety correction model;

the scheduling instruction corrected by the safety correction model may be the target control instruction.

Optionally, the dispatching instruction reference value output by reinforcement learning often cannot strictly guarantee the safety of the power system, and a safety correction optimization model is constructed for ensuring that the real-time dispatching instruction of the power system meets the safe operation requirement of the power grid. The safety correction optimization model is an optimization problem containing constraint, the safety operation limit of the power system is taken as a constraint condition, and the minimum deviation between the actual measurement state of the equipment and the theoretical state of the equipment is taken as an optimization target. And solving the safety correction optimization model to obtain a target control instruction capable of ensuring the safe operation of the power system, and issuing the target control instruction to each adjustable device for execution and implementation.
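As a simplified illustration of a constrained safety correction, the sketch below finds the dispatch closest (in the least-squares sense) to the reference value that respects per-device limits and a power balance. Several points are assumptions made only for this example: the function name `correct_dispatch`, the choice of "closest to the reference instruction" as the objective, and the reduction of the safe operation limits to box limits plus a single balance equation. A real safety correction model would include power flow constraints and the objective described above.

```python
from typing import List, Sequence


def correct_dispatch(
    reference: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    demand: float,
    tol: float = 1e-9,
) -> List[float]:
    """Project `reference` onto {lower <= x <= upper, sum(x) == demand}."""
    if not sum(lower) <= demand <= sum(upper):
        raise ValueError("demand cannot be met within the device limits")

    def dispatch(shift: float) -> List[float]:
        # Optimality conditions give x_i = clip(reference_i + shift).
        return [min(max(r + shift, lo), hi)
                for r, lo, hi in zip(reference, lower, upper)]

    # Bracket the shift: all devices at lower limit / all at upper limit.
    lo_shift = min(l - r for l, r in zip(lower, reference))
    hi_shift = max(u - r for u, r in zip(upper, reference))
    # The total output is non-decreasing in the shift, so bisection applies.
    while hi_shift - lo_shift > tol:
        mid = 0.5 * (lo_shift + hi_shift)
        if sum(dispatch(mid)) < demand:
            lo_shift = mid
        else:
            hi_shift = mid
    return dispatch(0.5 * (lo_shift + hi_shift))
```

For instance, a reference of 80 and 50 units with a 60-unit limit per device and a demand of 100 units is corrected to approximately 60 and 40 units.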

In the application, the safety correction model can be initialized by using the priori knowledge of the power system, so that the defect that the safety guarantee capability of a related safety reinforcement learning scheme is poor in the initial training stage and the inexperienced emergency condition is overcome; on the basis of the prior knowledge, the model parameters of the safety correction model can be dynamically updated based on the measured data, so that the model precision is further improved, and the long-term safe operation of the power system is promoted.

Step S406, storing the states of the power grid and the equipment, and updating a data memory;

the power grid and device states may be state information of the device to be controlled.

Optionally, in the real-time operation process of the power system, a neural network and a safety correction model in reinforcement learning can be used for periodically outputting real-time scheduling instructions of the source-network-load-storage multiple types of adjustable devices. The data storage is updated according to the latest actually measured power system operation data, a training set is generated by continuously sampling from the data storage, dynamic updating and parameter optimization are carried out on the scheduling strategy by using a reinforcement learning algorithm, the strategy is promoted to be continuously optimized, therefore, mutual cooperation and coordinated scheduling of all adjustable devices are realized, and the rewards are obtained from the power system environment to the maximum extent.

The data storage device may be a preset storage device.

Step S407, determining whether the error of the safety correction model reaches a preset threshold; if yes, executing step S408, and if not, executing step S409;

optionally, the power flow state theoretical value of the power system obtained by the safety correction model may be compared with the actual measurement data in the actual environment, and if the accumulated deviation exceeds a preset threshold, the parameters of the safety correction model are updated based on the actual measurement data, so as to improve the model precision.

Step S408, updating the model parameters of the safety correction model based on the latest state information;

step S409, determining whether training is completed, if yes, performing step S410, and if not, performing step S403;

step S410, outputting the parameters of the reinforcement learning neural network and the parameters of the safety correction model.
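The loop formed by steps S403 to S409 can be sketched as a skeleton in which each step is a caller-supplied function. The function name `run_training`, all parameter names, and the use of a fixed `max_iterations` as the "training completed" test in step S409 are assumptions for illustration; steps S401, S402, and S410 (input, initialization, and output of parameters) are left to the caller.

```python
from typing import Any, Callable


def run_training(
    update_policy: Callable[[], None],
    generate_reference: Callable[[], Any],
    safety_correct: Callable[[Any], Any],
    store_state: Callable[[Any], None],
    model_error: Callable[[], float],
    update_safety_model: Callable[[], None],
    threshold: float,
    max_iterations: int,
) -> int:
    """Run the S403-S409 loop; return how often the safety model was updated."""
    safety_updates = 0
    for _ in range(max_iterations):        # S409: assumed stop condition
        update_policy()                    # S403: update neural network parameters
        reference = generate_reference()   # S404: scheduling instruction reference
        target = safety_correct(reference) # S405: corrected scheduling instruction
        store_state(target)                # S406: store states, update data storage
        if model_error() > threshold:      # S407: accumulated deviation check
            update_safety_model()          # S408: update safety correction model
            safety_updates += 1
    return safety_updates
```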

The key technical innovation points of the scheme are listed as follows:

the method has the innovation points of reinforcement learning and optimization complementation, knowledge-data fusion and source-network-load-storage coordinated scheduling.

For reinforcement learning and optimization complementation, the rapid decision advantage of reinforcement learning and the safety guarantee advantage of an optimization method can be utilized, the reinforcement learning is utilized to realize rapid decision under the condition that the model parameters are inaccurate, the optimization method is adopted to carry out safety correction on the output action of the reinforcement learning, and the rapid calculation and the safe implementation of a scheduling command are guaranteed.

For knowledge-data fusion, a safety correction method based on knowledge-data fusion can be constructed, a safety correction model is initialized by using priori knowledge of the power system, and the defect that a related safety reinforcement learning scheme is poor in safety in an initial training stage and under an inexperienced emergency condition is overcome; in addition, the parameters of the safety correction model are dynamically updated by utilizing the measured data on the basis of the knowledge model, and the model precision is further improved.

For source-network-load-storage coordinated scheduling, a safety reinforcement learning architecture can be constructed to realize coordinated scheduling and mutual cooperation of various resources of source-network-load-storage, and fully excavate the flexibility and the adjustment potential of the power system.

Solutions similar to this scheme can be divided into the following three categories: an optimization scheme, a reinforcement learning scheme and a safety reinforcement learning scheme which take safety constraints into consideration. Details of similar schemes and their disadvantages are described below:

1): An optimization scheme considering safety constraints. Optimization methods for solving the real-time scheduling problem of the power system can be mainly divided into two types: (1) converting the alternating current power flow equations, through convex relaxation, into a tractable convex model such as a linear model or a second-order cone model, and then solving it with a convex optimization method; (2) solving directly with a non-convex optimization method, such as a primal-dual interior point method, a gradient algorithm, or a heuristic algorithm.

Its main disadvantages are as follows: (1) The computational burden is heavy. Due to the high nonlinearity and non-convexity of the power flow equations, optimization-based schemes have low computational efficiency and high model complexity. As renewable energy sources and flexible resources continue to grow, such schemes have difficulty meeting real-time scheduling requirements in a new type of power system environment with violent changes and complex models. (2) They depend excessively on the parameters of the power system model. In recent years, uncertainty in power systems has grown, and adjustable resources whose mathematical models are inaccurate (for example, network parameters of a low-voltage distribution network and model parameters on the demand side) are increasing; it is difficult for optimization schemes to build models for power systems with inaccurate model parameters.

2): A reinforcement learning scheme. The real-time scheduling instruction of the power system is taken as the action of reinforcement learning, and the scheduling method is trained based on a reinforcement learning method and massive historical operation data. Its main disadvantage is that the safety of the power system scheduling instructions cannot be ensured during training and execution, so that phenomena such as power flow safety constraint violations and system collapse are prone to occur.

3): A safety reinforcement learning scheme. Safety reinforcement learning is a reinforcement learning method that takes safe operation constraints into account and has great application potential in real-time scheduling of power systems. The related safety reinforcement learning methods can be broadly classified into three categories: (1) adding to the reward function a penalty term or an adaptive penalty function corresponding to violations of the safety constraints; (2) learning a control strategy in a constrained Markov decision process (CMDP) environment by using constrained policy optimization or a primal-dual method; (3) projecting the reinforcement learning action into a safe domain by adding an additional safety correction method. Their main disadvantages are as follows: (1) the related safety reinforcement learning schemes mainly model the safety constraints of the power system as soft constraints, which makes it difficult to ensure the instantaneous safety of the power system; (2) although unsafe reinforcement learning actions can be projected into a safe feasible domain by adding an additional safety correction method, obtaining the relevant safety correction model depends on a large amount of historical data, and the method performs poorly in the initial training stage and in emergencies that have not been experienced before.

The scheme provides a safety reinforcement learning framework with knowledge-data fusion, is used for solving the problem of real-time scheduling of a complex power system, and has the advantages of low dependence degree on model precision, high decision efficiency, strong safety guarantee performance and wide application range. Compared with the related scheme, the technical effect that the scheme can achieve is as follows:

with respect to the above related art 1), the present application can reduce the degree of dependence on the power system model and parameters: the method adopts a knowledge-data fusion safety reinforcement learning framework, takes a scheduling instruction as the action quantity of reinforcement learning, takes the operating environment of the power system as the exploration object of the reinforcement learning, and dynamically adjusts and updates the scheduling strategy according to the actual feedback state of the power system. Therefore, training and optimization of the scheduling strategy of the scheme can be completed through continuous interaction with the environment, and compared with related products and technologies, the degree of dependence on the power system model and parameters is greatly reduced.

With respect to the above related art 2), the present application offers high execution efficiency and fast decision-making: it makes full use of the fast decision-making advantage of reinforcement learning and, compared with an optimization scheme considering safety constraints, no longer depends on an accurate power system model.

With respect to the above related art 3), the present application offers strong safety guarantees: it makes full use of the safety guarantee advantage of the optimization method and, compared with a reinforcement learning scheme, can greatly improve the real-time operating safety of the power system. In addition, the safety correction model is initialized by using prior knowledge of the power system, which overcomes the poor safety guarantee capability of related safety reinforcement learning methods in the initial training stage and in emergencies that have not been experienced before; on the basis of the prior knowledge, the parameters of the safety correction model are dynamically updated based on measured data, so that the model accuracy can be further improved and the long-term safe operation of the power system is promoted.

In addition, the present application has a wide application range: it can promote the coordination and complementation of the various source-network-load-storage resources, has strong universality, and is easy to adapt to various real-time scheduling scenarios of the power system, such as energy management, voltage regulation, and emergency control. Specifically: for energy management, the present application can be used to perform real-time active power dispatching of the various active power regulating devices in a large power system (such as generator sets, new energy power stations, energy storage devices, and adjustable loads), maintaining the power balance of the power grid and meeting energy supply requirements; for voltage regulation, the present application can be used to regulate in real time the various reactive power regulating devices in the power system (such as generator sets with reactive power regulation capability, power electronic equipment, and capacitors), ensuring that the voltage of the power grid stays at a safe level; under emergency conditions, the present application can be used to restore and adjust the tie-line power of a regional power system, realizing functions such as section recovery, power tracking, and emergency control.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

According to an embodiment of the present application, an embodiment of a device control method is also provided. It should be noted that the steps illustrated in the flowcharts of the accompanying drawings may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that shown here.

Fig. 5 is a flowchart of a device control method according to embodiment 2 of the present application, and as shown in fig. 5, the method may include the following steps:

step S502, the real power grid environment where the power equipment is located is measured, and the state information of the power equipment is obtained.

Wherein the state information includes: the power grid state of the real power grid environment and the equipment state of the power equipment.

Step S504, mapping the state information to an initial scheduling instruction of the power equipment.

Step S506, the initial scheduling instruction is corrected to obtain a target scheduling instruction.

The target scheduling instruction is located in a preset instruction space of the power equipment, and scheduling instructions located in the preset instruction space are used for controlling the power equipment to operate safely.

Step S508, controlling the power equipment based on the target scheduling instruction.

In the above embodiment of the present application, correcting the initial scheduling instruction to obtain the target scheduling instruction includes: and correcting the initial scheduling instruction by using the safety correction model to obtain a target scheduling instruction.

In the above embodiment of the present application, the safety correction model is further configured to correct the initial scheduling instruction to obtain an equipment theoretical state of the electrical equipment, and the method further includes: acquiring an equipment actual measurement state of power equipment in a real power grid environment; comparing the actual measurement state of the equipment with the theoretical state of the equipment to obtain a target control deviation; and updating the model parameters of the safety correction model under the condition that the target control deviation exceeds a preset threshold value.

It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.

Example 3

Fig. 6 is a flowchart of a device control method according to embodiment 3 of the present application, and as shown in fig. 6, the method may include the steps of:

step S602, the cloud server receives the state information of the device to be controlled, which is uploaded by the client.

The state information is obtained by measuring the equipment to be controlled in the real environment where the equipment to be controlled is located, and the state information comprises: the environmental state of the real environment, the device state of the device to be controlled.

Step S604, the cloud server maps the state information to an initial control instruction of the device to be controlled.

Step S606, the cloud server corrects the initial control instruction to obtain a target control instruction.

Step S608, the cloud server sends the target control instruction to the client.

The target control instruction is used for controlling the equipment to be controlled through the client.
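The exchange in steps S602 to S608 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of this embodiment: the function name `handle_upload`, the JSON message layout with the keys `state` and `target`, and the stand-in `policy` and `correct` callables are all hypothetical and do not appear in the application.

```python
import json
from typing import Callable, List

Vector = List[float]


def handle_upload(message: str,
                  policy: Callable[[Vector], Vector],
                  correct: Callable[[Vector], Vector]) -> str:
    """Hypothetical cloud-side handler covering steps S602 to S608."""
    state = json.loads(message)["state"]      # S602: receive the uploaded state information
    initial = policy(state)                   # S604: map the state to an initial instruction
    target = correct(initial)                 # S606: correct it into the safe instruction space
    return json.dumps({"target": target})     # S608: reply that is sent back to the client


# Stand-in models, purely for illustration (assumptions, not the trained models).
policy = lambda s: [2.0 * x for x in s]
correct = lambda u: [min(max(x, 0.0), 1.0) for x in u]

reply = handle_upload(json.dumps({"state": [0.25, 0.9]}), policy, correct)
target = json.loads(reply)["target"]
```

Passing the mapping and the correction in as callables keeps the handler independent of how either model is trained, which matches the separation between steps S604 and S606.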

Example 4

According to an embodiment of the present application, there is also provided an equipment control apparatus for implementing the equipment control method, and fig. 7 is a schematic diagram of an equipment control apparatus according to embodiment 4 of the present application, and as shown in fig. 7, the apparatus 700 includes: a measurement module 702, a mapping module 704, a correction module 706, and a control module 708.

The measuring module is used for measuring the real environment where the equipment to be controlled is located to obtain the state information of the equipment to be controlled; the mapping module is used for mapping the state information into an initial control instruction of the equipment to be controlled; the correction module is used for correcting the initial control instruction to obtain a target control instruction, wherein the target control instruction is positioned in a preset instruction space of the equipment to be controlled, and the control instruction positioned in the preset instruction space is used for controlling the safe operation of the equipment to be controlled; the control module is used for controlling the equipment to be controlled based on the target control instruction.

It should be noted here that the measurement module 702, the mapping module 704, the correction module 706, and the control module 708 correspond to steps S202 to S208 of embodiment 1, and the four modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in the first embodiment. It should be noted that the modules described above as part of the apparatus may be run in the computing terminal 10 provided in the first embodiment.

In the above embodiments of the present application, the correction module includes: a correction unit.

The correction unit is used for correcting the initial control instruction by using the safety correction model to obtain a target control instruction.

In the above embodiment of the present application, the apparatus further includes: an acquisition module, a comparison module, and an updating module.

The acquisition module is used for acquiring the actual measurement state of the equipment to be controlled in the real environment; the comparison module is used for comparing the actual measurement state of the equipment with the theoretical state of the equipment to obtain a target control deviation; the updating module is used for updating the model parameters of the safety correction model under the condition that the target control deviation exceeds a preset threshold value.

In the above embodiment of the present application, the comparison module includes: a comparison unit, an acquisition unit, and an accumulation unit.

The comparison unit is used for comparing the actual measurement state of the equipment with the theoretical state of the equipment to obtain the current control deviation of the safety correction model; the acquisition unit is used for acquiring historical control deviation of the safety correction model; the accumulation unit is used for accumulating the historical control deviation and the current control deviation to obtain the target control deviation.
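The accumulation of control deviations described above can be sketched as follows. The class name `DeviationMonitor`, the use of the largest absolute gap as the deviation metric, and the plain sum as the accumulation rule are assumptions made for illustration; the application does not specify them.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class DeviationMonitor:
    """Hypothetical tracker for the control deviation of the safety correction model."""
    threshold: float                                      # assumed preset threshold value
    history: List[float] = field(default_factory=list)    # historical control deviations

    def observe(self, measured: Sequence[float], theoretical: Sequence[float]) -> bool:
        # Current deviation: largest absolute gap between the two states (assumed metric).
        current = max(abs(m - t) for m, t in zip(measured, theoretical))
        # Target deviation: historical deviations accumulated with the current one.
        target = sum(self.history) + current
        self.history.append(current)
        # True means the model parameters should be updated.
        return target > self.threshold


monitor = DeviationMonitor(threshold=0.5)
first = monitor.observe([1.0, 2.0], [1.25, 2.0])    # accumulated deviation 0.25
second = monitor.observe([1.0, 2.0], [1.5, 2.0])    # accumulated deviation 0.75
```

Accumulating over time lets a series of small deviations, none of which would exceed the threshold alone, still trigger an update of the model parameters.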

In the above embodiment of the present application, the apparatus further includes: an output module and an updating module.

The output module is used for outputting a target control instruction; the updating module is used for updating the model parameters of the safety correction model under the condition that a deviation confirming instruction corresponding to the target control instruction is received, wherein the deviation confirming instruction is used for determining that the actual measurement state of the equipment to be controlled has deviation from the theoretical state of the equipment output by the safety correction model.

In the above embodiments of the present application, the updating module includes: a storage unit, a generation unit, and an updating unit.

The storage unit is used for storing the state information to preset storage equipment, wherein the preset storage equipment is used for storing a historical state obtained by measuring a real environment; the generating unit is used for sampling data stored in preset storage equipment and generating first training data corresponding to the safety correction model; the updating unit is used for updating the model parameters of the safety correction model based on the first training data.
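The storing and sampling described above can be sketched as follows. The class name `StateStore`, the fixed capacity, and uniform sampling without replacement are assumptions introduced for illustration; the application only states that stored data is sampled to generate training data.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple


class StateStore:
    """Hypothetical stand-in for the preset storage device."""

    def __init__(self, capacity: int) -> None:
        # A bounded deque drops the oldest historical state once capacity is reached.
        self._states: Deque[Tuple[float, ...]] = deque(maxlen=capacity)

    def store(self, state: Sequence[float]) -> None:
        self._states.append(tuple(state))

    def sample(self, batch_size: int, rng: random.Random) -> List[Tuple[float, ...]]:
        # Uniform sampling without replacement (assumed), capped at the stored count.
        k = min(batch_size, len(self._states))
        return rng.sample(list(self._states), k)


store = StateStore(capacity=3)
for t in range(5):
    store.store([float(t), float(t) * 0.5])

batch = store.sample(batch_size=2, rng=random.Random(0))
```

The sampled batch would then serve as the first training data for updating the safety correction model; the update step itself is not sketched here.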

In the above embodiment of the present application, the apparatus further includes: a construction module and a training module.

The construction module is used for constructing an initial correction model based on a safe operation limiting condition of the equipment to be controlled; the training module is used for training the initial correction model through historical knowledge information to obtain the safety correction model.
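As an illustration of building a correction model from safe operation limiting conditions, the sketch below assumes the conditions are simple lower and upper limits per instruction component. The class name `BoxCorrection` and this box-shaped instruction space are hypothetical; the application does not fix the form of the conditions, and the training step with historical knowledge information is not covered.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BoxCorrection:
    """Hypothetical initial correction model built from per-component operating limits."""
    lower: Sequence[float]
    upper: Sequence[float]

    def __post_init__(self) -> None:
        if len(self.lower) != len(self.upper):
            raise ValueError("lower and upper must have the same length")
        if any(lo > hi for lo, hi in zip(self.lower, self.upper)):
            raise ValueError("each lower limit must not exceed its upper limit")

    def __call__(self, instruction: Sequence[float]) -> List[float]:
        # Clipping each component is the Euclidean projection onto the box,
        # so an instruction that is already inside the limits is left unchanged.
        return [min(max(u, lo), hi)
                for u, lo, hi in zip(instruction, self.lower, self.upper)]


correct = BoxCorrection(lower=[0.0, 10.0], upper=[50.0, 40.0])
target = correct([62.5, 5.0])
unchanged = correct([20.0, 30.0])
```

Under this assumption the corrected instruction always lies inside the limits, which is the property required of the preset instruction space.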

In the above embodiment of the present application, the apparatus further includes: a storage module, a sampling module, and an updating module.

The storage module is used for storing the state information to preset storage equipment; the sampling module is used for sampling data stored in the preset storage equipment to generate second training data corresponding to the reinforcement learning model; the updating module is used for updating the model parameters of the reinforcement learning model based on the second training data and, as described above, for updating the model parameters of the safety correction model based on the first training data.

In the above embodiment of the present application, the apparatus further includes: a generation module.

The construction module is used for constructing an initial learning model, wherein the initial learning model is used for representing the mapping relation between different states of the equipment to be controlled and different control instructions; the generation module is used for generating a reward function of the initial learning model based on the operating cost and the line power flow of the target control instruction; the sampling module is used for training the initial learning model based on the third training data and the reward function to obtain the reinforcement learning model.
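A reward that combines the cost of an instruction with the loading of the lines can be sketched as follows. The linear cost, the overload penalty, the penalty weight of 10.0, and all parameter names are assumptions chosen for illustration; the application does not give the form of the reward function.

```python
from typing import Sequence


def reward(outputs: Sequence[float],
           unit_costs: Sequence[float],
           line_flows: Sequence[float],
           line_limits: Sequence[float],
           penalty: float = 10.0) -> float:
    """Hypothetical reward: negative operating cost minus a line-overload penalty."""
    # Assumed linear operating cost of the dispatched outputs.
    operating_cost = sum(c * p for c, p in zip(unit_costs, outputs))
    # Amount by which each line flow exceeds its limit, in either direction.
    overload = sum(max(0.0, abs(f) - limit)
                   for f, limit in zip(line_flows, line_limits))
    return -(operating_cost + penalty * overload)


r_safe = reward([10.0, 20.0], [2.0, 3.0], [30.0, -40.0], [50.0, 50.0])
r_over = reward([10.0, 20.0], [2.0, 3.0], [60.0, -40.0], [50.0, 50.0])
```

With the same dispatch, the case that overloads a line receives the lower reward, so training is steered toward instructions that are both cheap and within line limits.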

It should be noted that the preferred embodiments described in the foregoing examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.

Example 5

According to an embodiment of the present application, there is further provided an equipment control apparatus for implementing the equipment control method, and fig. 8 is a schematic diagram of an equipment control apparatus according to embodiment 5 of the present application. As shown in fig. 8, the apparatus 800 includes: a measurement module 802, a mapping module 804, a correction module 806, and a control module 808.

The measuring module is used for measuring the real power grid environment where the power equipment is located to obtain the state information of the power equipment; the mapping module is used for mapping the state information into an initial scheduling instruction of the power equipment; the correction module is used for correcting the initial scheduling instruction to obtain a target scheduling instruction, wherein the target scheduling instruction is located in a preset instruction space of the power equipment, and the scheduling instruction located in the preset instruction space is used for controlling the power equipment to run safely; the control module is used for controlling the power equipment based on the target scheduling instruction.

It should be noted here that the measurement module 802, the mapping module 804, the correction module 806, and the control module 808 correspond to steps S502 to S508 of the embodiment 2, and the four modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure of the first embodiment. It should be noted that the above modules as part of the apparatus may be run in the computing terminal 10 provided in the first embodiment.

In the above embodiments of the present application, the correction module includes: a correction unit.

The correcting unit is used for correcting the initial scheduling instruction by using the safety correction model to obtain a target scheduling instruction.

Example 6

According to an embodiment of the present application, there is also provided an equipment control apparatus for implementing the equipment control method, and fig. 9 is a schematic diagram of an equipment control apparatus according to embodiment 6 of the present application. As shown in fig. 9, the apparatus 900 includes: a receiving module 902, a mapping module 904, a correcting module 906, and a sending module 908.

The receiving module is used for receiving the state information of the equipment to be controlled uploaded by the client through the cloud server; the mapping module is used for mapping the state information into an initial control instruction of the equipment to be controlled through the cloud server; the correction module is used for correcting the initial control instruction through the cloud server to obtain a target control instruction, wherein the target control instruction is located in a preset instruction space of the device to be controlled, and the control instruction located in the preset instruction space is used for controlling the device to be controlled to operate safely; the sending module is used for sending a target control instruction to the client through the cloud server, wherein the target control instruction is used for controlling the device to be controlled through the client.

It should be noted here that the receiving module 902, the mapping module 904, the correcting module 906, and the sending module 908 correspond to steps S602 to S608 of embodiment 3, and the four modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in the first embodiment. It should be noted that the above modules as part of the apparatus may be run in the computing terminal 10 provided in the first embodiment.

Example 7

The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.

Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.

In this embodiment, the computer terminal may execute program codes of the following steps in the device control method: measuring the real environment of the equipment to be controlled to obtain the state information of the equipment to be controlled; mapping the state information into an initial control instruction of the equipment to be controlled; correcting the initial control instruction to obtain a target control instruction, wherein the target control instruction is located in a preset instruction space of the equipment to be controlled, and the control instruction located in the preset instruction space is used for controlling the equipment to be controlled to operate safely; and controlling the equipment to be controlled based on the target control instruction.

Optionally, fig. 10 is a block diagram of a computer terminal according to an embodiment of the present invention. As shown in fig. 10, the computer terminal A may include one or more processors (only one is shown) and a memory.

The memory may be configured to store software programs and modules, such as the program instructions/modules corresponding to the device control method and apparatus in the embodiments of the present invention. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, thereby implementing the device control method. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, which may be connected to the computer terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: measuring the real environment of the equipment to be controlled to obtain the state information of the equipment to be controlled; mapping the state information into an initial control instruction of the equipment to be controlled; correcting the initial control instruction to obtain a target control instruction, wherein the target control instruction is located in a preset instruction space of the equipment to be controlled, and the control instruction located in the preset instruction space is used for controlling the equipment to be controlled to operate safely; and controlling the equipment to be controlled based on the target control instruction.

Optionally, the processor may further execute the program code of the following steps: correcting the initial control instruction by using the safety correction model to obtain a target control instruction.

Optionally, the processor may further execute the program code of the following steps: acquiring an actual measurement state of equipment to be controlled in a real environment; comparing the actual measurement state of the equipment with the theoretical state of the equipment to obtain a target control deviation; and updating the model parameters of the safety correction model under the condition that the target control deviation exceeds a preset threshold value.

Optionally, the processor may further execute the program code of the following steps: comparing the actual measurement state of the equipment with the theoretical state of the equipment to obtain the current control deviation of the safety correction model; acquiring historical control deviation of a safety correction model; and accumulating the historical control deviation and the current control deviation to obtain the target control deviation.

Optionally, the processor may further execute the program code of the following steps: outputting a target control instruction; and under the condition that a deviation confirmation instruction corresponding to the target control instruction is received, updating model parameters of the safety correction model, wherein the deviation confirmation instruction is used for determining that the deviation exists between the actual measurement state of the equipment to be controlled and the theoretical state of the equipment output by the safety correction model.

Optionally, the processor may further execute the program code of the following steps: storing the state information into a preset storage device, wherein the preset storage device is used for storing a historical state obtained by measuring a real environment; sampling data stored in the preset storage device to generate first training data corresponding to the safety correction model; and updating the model parameters of the safety correction model based on the first training data.

Optionally, the processor may further execute the program code of the following steps: constructing an initial correction model based on the safe operation limiting conditions of the equipment to be controlled; and training the initial correction model through historical knowledge information to obtain a safety correction model.

Optionally, the processor may further execute the program code of the following steps: mapping the state information into an initial control instruction of the equipment to be controlled by using a reinforcement learning model, wherein the reinforcement learning model is used for representing the mapping relation between different state information and different control instructions.

Optionally, the processor may further execute the program code of the following steps: storing the state information to a preset storage device; sampling data stored in preset storage equipment to generate second training data corresponding to the reinforcement learning model; model parameters of the reinforcement learning model are updated based on the second training data.

Optionally, the processor may further execute the program code of the following steps: constructing an initial learning model, wherein the initial learning model is used for representing the mapping relation between different states of the equipment to be controlled and different control instructions; generating a reward function of the initial learning model based on the operating cost and the line power flow of the target control instruction; and training the initial learning model based on the third training data and the reward function to obtain the reinforcement learning model.

The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: measuring the real power grid environment where the power equipment is located to obtain state information of the power equipment; mapping the state information into an initial scheduling instruction of the power equipment; correcting the initial scheduling instruction to obtain a target scheduling instruction, wherein the target scheduling instruction is located in a preset instruction space of the power equipment, and the scheduling instruction located in the preset instruction space is used for controlling the power equipment to run safely; and controlling the power equipment based on the target scheduling instruction.

Optionally, the processor may further execute the program code of the following steps: correcting the initial scheduling instruction by using the safety correction model to obtain a target scheduling instruction.

The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: the cloud server receives state information of the equipment to be controlled uploaded by a client, wherein the state information is obtained by measuring the equipment to be controlled in the real environment where the equipment to be controlled is located; the cloud server maps the state information into an initial control instruction of the equipment to be controlled; the cloud server corrects the initial control instruction to obtain a target control instruction, wherein the target control instruction is located in a preset instruction space of the device to be controlled, and the control instruction located in the preset instruction space is used for controlling the device to be controlled to operate safely; and the cloud server sends the target control instruction to the client, wherein the target control instruction is used for controlling the device to be controlled through the client.

By adopting the embodiment of the invention, the real environment where the equipment to be controlled is located is measured to obtain the state information of the equipment to be controlled; the state information is mapped into an initial control instruction of the equipment to be controlled; the initial control instruction is corrected to obtain a target control instruction, wherein the target control instruction is located in a preset instruction space of the equipment to be controlled, and the control instruction located in the preset instruction space is used for controlling the equipment to be controlled to operate safely; and the equipment to be controlled is controlled based on the target control instruction, thereby ensuring the safe operation of the equipment to be controlled. Because the state information is obtained by measuring the real environment and therefore reflects the factors of the real environment, scheduling can follow the real scene in real time. In addition, correcting the initial control instruction by using the safety correction model further improves the operating safety of the equipment to be controlled. This solves the technical problem that the scheduling system in the related art has difficulty meeting the requirement of real-time scheduling.

It can be understood by those skilled in the art that the structure shown in fig. 10 is only illustrative, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. Fig. 10 does not limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 10, or have a different configuration from that shown in fig. 10.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only Memories (ROMs), Random Access Memories (RAMs), magnetic disks, optical disks, and the like.

Example 8

The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store the program code executed by the device control method provided in the first embodiment.

Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: measuring the real environment of the equipment to be controlled to obtain the state information of the equipment to be controlled; mapping the state information into an initial control instruction of the equipment to be controlled; correcting the initial control instruction to obtain a target control instruction, wherein the target control instruction is located in a preset instruction space of the equipment to be controlled, and the control instruction located in the preset instruction space is used for controlling the equipment to be controlled to operate safely; and controlling the equipment to be controlled based on the target control instruction.

Optionally, the storage medium is further configured to store program code for performing the following steps: correcting the initial control instruction by using the safety correction model to obtain a target control instruction.

Optionally, the storage medium is further configured to store program code for performing the following steps: acquiring an actual measurement state of equipment to be controlled in a real environment; comparing the actual measurement state of the equipment with the theoretical state of the equipment to obtain a target control deviation; and updating the model parameters of the safety correction model under the condition that the target control deviation exceeds a preset threshold value.

Optionally, the storage medium is further configured to store program code for performing the following steps: comparing the actual measurement state of the equipment with the theoretical state of the equipment to obtain the current control deviation of the safety correction model; acquiring historical control deviation of a safety correction model; and accumulating the historical control deviation and the current control deviation to obtain the target control deviation.

Optionally, the storage medium is further configured to store program code for performing the following steps: outputting a target control instruction; and under the condition that a deviation confirmation instruction corresponding to the target control instruction is received, updating model parameters of the safety correction model, wherein the deviation confirmation instruction is used for determining that the deviation exists between the actual measurement state of the equipment to be controlled and the theoretical state of the equipment output by the safety correction model.

Optionally, the storage medium is further configured to store program code for performing the following steps: storing the state information into a preset storage device, wherein the preset storage device is used for storing a historical state obtained by measuring a real environment; sampling data stored in the preset storage device to generate first training data corresponding to the safety correction model; and updating the model parameters of the safety correction model based on the first training data.

Optionally, the storage medium is further configured to store program code for performing the following steps: constructing an initial correction model based on the safe operation limiting conditions of the equipment to be controlled; and training the initial correction model through historical knowledge information to obtain a safety correction model.

Optionally, the storage medium is further configured to store program code for performing the following steps: mapping the state information into an initial control instruction of the equipment to be controlled by using a reinforcement learning model, wherein the reinforcement learning model is used for representing the mapping relation between different state information and different control instructions.

Optionally, the storage medium is further configured to store program code for performing the following steps: storing the state information to a preset storage device; sampling data stored in preset storage equipment to generate second training data corresponding to the reinforcement learning model; and updating the model parameters of the reinforcement learning model based on the second training data.

Optionally, the storage medium is further configured to store program code for performing the following steps: constructing an initial learning model, wherein the initial learning model is used for representing the mapping relation between different states of the equipment to be controlled and different control instructions; generating a reward function of the initial learning model based on the operating cost and the line power flow of the target control instruction; and training the initial learning model based on the third training data and the reward function to obtain the reinforcement learning model.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: measuring the real power grid environment where the power equipment is located to obtain state information of the power equipment; mapping the state information into an initial scheduling instruction of the power equipment; correcting the initial scheduling instruction to obtain a target scheduling instruction, wherein the target scheduling instruction is located in a preset instruction space of the power equipment, and the scheduling instruction located in the preset instruction space is used for controlling the power equipment to safely operate; and controlling the power equipment based on the target scheduling instruction.

Optionally, the storage medium is further configured to store program code for performing the following steps: correcting the initial scheduling instruction by using a safety correction model to obtain a target scheduling instruction and a theoretical state of the equipment.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the cloud server receives state information of the equipment to be controlled uploaded by a client, wherein the state information is obtained by measuring the equipment to be controlled in the real environment where the equipment to be controlled is located; the cloud server maps the state information into an initial control instruction of the equipment to be controlled; the cloud server corrects the initial control instruction to obtain a target control instruction, wherein the target control instruction is located in a preset instruction space of the device to be controlled, and the control instruction located in the preset instruction space is used for controlling the device to be controlled to operate safely; and the cloud server sends the target control instruction to the client, wherein the target control instruction is used for controlling the device to be controlled through the client.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and various other media capable of storing program code.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the scope of protection of the present invention.

Claims

1. A device control method, comprising:

measuring the real environment in which a device to be controlled is located to obtain state information of the device to be controlled;

mapping the state information to an initial control instruction for the device to be controlled;

correcting the initial control instruction to obtain a target control instruction, wherein the target control instruction lies within a preset instruction space of the device to be controlled, and control instructions within the preset instruction space are used to control the device to be controlled to operate safely;

and controlling the device to be controlled based on the target control instruction.

2. The method of claim 1, wherein correcting the initial control instruction to obtain a target control instruction comprises:

correcting the initial control instruction by using a safety correction model to obtain the target control instruction.

3. The method according to claim 2, wherein the safety correction model is further configured to correct the initial control instruction to obtain a theoretical device state of the device to be controlled, and the method further comprises:

acquiring a measured device state of the device to be controlled in the real environment;

comparing the measured device state with the theoretical device state to obtain a target control deviation;

and updating the model parameters of the safety correction model when the target control deviation exceeds a preset threshold.

4. The method of claim 3, wherein comparing the measured device state with the theoretical device state to obtain the target control deviation comprises:

comparing the measured device state with the theoretical device state to obtain a current control deviation of the safety correction model;

acquiring a historical control deviation of the safety correction model;

and accumulating the historical control deviation and the current control deviation to obtain the target control deviation.
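By way of non-limiting illustration, the deviation accumulation and threshold check recited in claims 3 and 4 can be sketched as follows. The sketch assumes that the deviation between the measured and theoretical states is a sum of absolute differences and that the accumulated value is reset after the model is updated; neither choice is specified by the claims. The class `DeviationMonitor` and its members are hypothetical.

```python
from typing import Sequence


class DeviationMonitor:
    """Accumulates control deviation and flags when a model update is due."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.accumulated = 0.0  # historical control deviation so far

    def observe(self, measured: Sequence[float],
                theoretical: Sequence[float]) -> bool:
        # Assumption: current deviation is the L1 distance between states.
        current = sum(abs(m - t) for m, t in zip(measured, theoretical))
        # Target deviation = historical deviation + current deviation.
        self.accumulated += current
        return self.accumulated > self.threshold

    def reset(self) -> None:
        # Assumption: the history is cleared once the model is updated.
        self.accumulated = 0.0


monitor = DeviationMonitor(threshold=1.0)
first = monitor.observe([1.0, 2.0], [1.0, 1.5])    # current deviation 0.5
second = monitor.observe([1.0, 2.0], [0.25, 2.0])  # current deviation 0.75
print(first, second, monitor.accumulated)  # False True 1.25
```

Accumulating the deviation lets a sequence of small mismatches trigger an update even when no single step exceeds the threshold on its own.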

5. The method of claim 3, wherein after the initial control instruction is corrected by using the safety correction model to obtain the target control instruction, the method further comprises:

outputting the target control instruction;

and updating the model parameters of the safety correction model upon receiving a deviation confirmation instruction corresponding to the target control instruction, wherein the deviation confirmation instruction is used to confirm that a deviation exists between the measured device state of the device to be controlled and the theoretical device state output by the safety correction model.

6. The method according to any one of claims 2 to 5, wherein updating the model parameters of the safety correction model comprises:

storing the state information in a preset storage device, wherein the preset storage device is used for storing historical states obtained by measuring the real environment;

sampling the data stored in the preset storage device to generate first training data corresponding to the safety correction model;

and updating the model parameters of the safety correction model based on the first training data.
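By way of non-limiting illustration, the store-then-sample steps of claim 6 can be sketched as a bounded buffer with uniform random sampling. The fixed capacity, the uniform sampling, and the name `StateStore` are assumptions made for this sketch; the claim only requires that measured states be stored and later sampled to form training data.

```python
import random
from collections import deque
from typing import List, Optional, Sequence, Tuple


class StateStore:
    """Hypothetical preset storage device holding recent measured states."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        # Oldest entries are discarded once the capacity is reached.
        self._buffer: deque = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, state: Sequence[float]) -> None:
        self._buffer.append(tuple(state))

    def sample(self, batch_size: int) -> List[Tuple[float, ...]]:
        # Uniform sampling without replacement, capped at the stored count.
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def __len__(self) -> int:
        return len(self._buffer)


store = StateStore(capacity=3, seed=0)
for t in range(5):
    store.store([float(t), 0.5 * t])

batch = store.sample(2)  # training data drawn from the three newest states
print(len(store), len(batch))  # 3 2
```

The sampled batch would then be passed to whatever training routine updates the model parameters, which is outside the scope of this sketch.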

7. The method of claim 2, further comprising:

constructing an initial correction model based on safe operation constraints of the device to be controlled;

and training the initial correction model through historical knowledge information to obtain the safety correction model.

8. The method of claim 2, wherein mapping the state information to an initial control instruction for the device to be controlled comprises:

mapping the state information to the initial control instruction for the device to be controlled by using a reinforcement learning model, wherein the reinforcement learning model is used to represent the mapping relationship between different state information and different control instructions.

9. The method of claim 8, wherein after the initial control instruction is corrected by using the safety correction model to obtain the target control instruction, the method further comprises:

storing the state information in a preset storage device;

sampling the data stored in the preset storage device to generate second training data corresponding to the reinforcement learning model;

updating model parameters of the reinforcement learning model based on the second training data.

10. The method of claim 8, further comprising:

constructing an initial learning model, wherein the initial learning model is used to represent the mapping relationship between different states and different control instructions of the device to be controlled;

generating a reward function of the initial learning model based on the operating cost and the line power flow of the target control instruction;

and training the initial learning model based on third training data and the reward function to obtain the reinforcement learning model.
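By way of non-limiting illustration, a reward function of the kind recited in claim 10 can be sketched as follows. The sketch interprets the line-related term as the power flow on each line, and assumes that the reward is the negative operating cost minus a weighted penalty for flow exceeding each line's limit. The penalty form, the weight `overload_penalty`, and the function name `reward` are assumptions and are not taken from the claims.

```python
from typing import Sequence


def reward(operating_cost: float,
           line_flows: Sequence[float],
           line_limits: Sequence[float],
           overload_penalty: float = 10.0) -> float:
    """Higher is better: low cost and no line loaded beyond its limit."""
    # Total amount by which the flow magnitude exceeds the limit on each line.
    overload = sum(max(0.0, abs(flow) - limit)
                   for flow, limit in zip(line_flows, line_limits))
    return -operating_cost - overload_penalty * overload


safe = reward(5.0, [1.0, 1.0], [2.0, 2.0])         # no line overloaded
overloaded = reward(5.0, [1.0, -3.0], [2.0, 2.0])  # second line over by 1.0
print(safe, overloaded)  # -5.0 -15.0
```

With such a reward, an instruction that is cheap but overloads a line scores worse than a slightly costlier instruction that keeps every line within its limit, provided the penalty weight is chosen large enough.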

11. A device control method, comprising:

measuring a real power grid environment where the power equipment is located to obtain state information of the power equipment;

mapping the state information to an initial scheduling instruction of the power equipment;

correcting the initial scheduling instruction to obtain a target scheduling instruction, wherein the target scheduling instruction is located in a preset instruction space of the power equipment, and the scheduling instruction located in the preset instruction space is used for controlling the power equipment to operate safely;

and controlling the power equipment based on the target scheduling instruction.

12. The method of claim 11, wherein correcting the initial scheduling instruction to obtain a target scheduling instruction comprises:

correcting the initial scheduling instruction by using a safety correction model to obtain the target scheduling instruction and a theoretical device state of the power equipment.

13. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls a device in which the computer-readable storage medium is located to perform the device control method according to any one of claims 1 to 12.

14. A computer terminal, comprising:

a memory for storing a program;

a processor connected to the memory and configured to execute the program, wherein the program, when executed, performs the device control method according to any one of claims 1 to 12.