CN114721345A - Industrial control method, device and system based on reinforcement learning and electronic equipment


Info

Publication number: CN114721345A
Application number: CN202210649819.5A (filed by Nanqi Xiance Nanjing Technology Co., Ltd.)
Authority: CN (China)
Prior art keywords: industrial, target, virtual environment, data, industrial equipment
Legal status: Pending (the listed status is an assumption by Google Patents, not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 薛飞 (Xue Fei), 邹晓川 (Zou Xiaochuan)
Original and current assignee: Nanqi Xiance Nanjing Technology Co., Ltd.

Classifications

    • G05B19/41885 (G: Physics; G05B: control or regulating systems in general; G05B19/418: total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]): total factory control characterised by modeling or simulation of the manufacturing system
    • G06N3/045 (G06N: computing arrangements based on specific computational models; G06N3/02: neural networks; G06N3/04: architecture, e.g. interconnection topology): combinations of networks
    • G06N3/08 (G06N3/02: neural networks): learning methods
    • G05B2219/32339 (G05B2219: program-control systems; G05B2219/32: operator till task planning): object oriented modeling, design, analysis, implementation, simulation language
    • Y02P90/02 (Y02P: climate change mitigation technologies in the production or processing of goods; Y02P90: enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation): total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Manufacturing & Machinery (AREA)
  • Quality & Reliability (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

Embodiments of the invention disclose an industrial control method, device, and system based on reinforcement learning, and an electronic device. The method comprises: acquiring current operation data of industrial equipment; determining target control information based on the current operation data and a target control decision model corresponding to the industrial equipment, where the target control decision model is obtained in advance by performing reinforcement learning on a preset control decision model within a target virtual environment model corresponding to the industrial equipment, and the target virtual environment model is obtained by environment modeling based on historical operation data of the industrial equipment; and sending the target control information to the industrial equipment so that the industrial equipment operates based on it. The technical solution of the embodiments effectively ensures both the accuracy and the efficiency of industrial control.

Description

Industrial control method, device and system based on reinforcement learning and electronic equipment
Technical Field
Embodiments of the invention relate to computer technology, and in particular to an industrial control method, device, and system based on reinforcement learning, and to an electronic device.
Background
An industrial controller can be used to control industrial equipment during industrial production so as to ensure the equipment's normal operation. Typically, an industrial controller performs industrial control using Model Predictive Control (MPC). The MPC approach comprises two parts: a prediction model that predicts future states, and an optimizer that solves for the optimal control based on those future states.
At present, the prediction model in the existing MPC approach is built manually from human expertise, so its accuracy depends heavily on that expertise, and the modeling takes a long time and costs much. Moreover, the existing optimizer's solution process is time-consuming and has difficulty solving nonlinear problems with complex constraints. The existing approach to industrial control therefore cannot effectively guarantee accuracy or efficiency.
Disclosure of Invention
Embodiments of the invention provide an industrial control method, device, and system based on reinforcement learning, and an electronic device, so as to effectively ensure the accuracy and efficiency of industrial control.
According to an aspect of the present invention, there is provided a reinforcement learning-based industrial control method, including:
acquiring current operation data of the industrial equipment;
determining target control information based on a target control decision model corresponding to the industrial equipment and the current operation data, wherein the target control decision model is obtained by performing reinforcement learning on a preset control decision model based on a target virtual environment model corresponding to the industrial equipment in advance, and the target virtual environment model corresponding to the industrial equipment is obtained by performing environment modeling based on historical operation data of the industrial equipment;
and sending the target control information to the industrial equipment so that the industrial equipment operates based on the target control information.
According to another aspect of the present invention, there is provided an industrial control device based on reinforcement learning, including:
the current operation data acquisition module is used for acquiring current operation data of the industrial equipment;
a target control information determination module, configured to determine target control information based on a target control decision model corresponding to the industrial device and the current operation data, where the target control decision model is obtained by performing reinforcement learning on a preset control decision model based on a target virtual environment model corresponding to the industrial device in advance, and the target virtual environment model corresponding to the industrial device is obtained by performing environment modeling based on historical operation data of the industrial device;
and the target control information sending module is used for sending the target control information to the industrial equipment so as to enable the industrial equipment to operate based on the target control information.
According to another aspect of the present invention, there is provided a reinforcement learning-based industrial control system, the system comprising: industrial equipment and industrial controllers;
wherein the industrial controller is used for realizing the industrial control method based on reinforcement learning according to any embodiment of the invention.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the reinforcement learning-based industrial control method according to any of the embodiments of the present invention.
According to the technical solution of the embodiments of the invention, environment modeling is performed in advance based on historical operation data of the industrial equipment to obtain a target virtual environment model corresponding to that equipment. Because the modeling is driven by historical data, it needs no reliance on human expertise, takes little time, and costs little, effectively ensuring the accuracy of industrial control. Reinforcement learning is then performed on a preset control decision model within the target virtual environment model to obtain the target control decision model corresponding to the industrial equipment, so that control decisions on the equipment's current operation data are made with a model learned by reinforcement learning, target control information is obtained more quickly, and the efficiency of industrial control is further improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings required for the description of the embodiments are briefly introduced below. The drawings described here represent only some embodiments of the invention; those skilled in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of an industrial control method based on reinforcement learning according to an embodiment of the present invention;
FIG. 2 is a flowchart of an industrial control method based on reinforcement learning according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an industrial control device based on reinforcement learning according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an industrial control system based on reinforcement learning according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device implementing the reinforcement learning-based industrial control method according to the embodiments of the present invention.
Detailed Description
In order to make the technical solutions of the invention better understood, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the invention.
It is to be understood that the terms "target" and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of an industrial control method based on reinforcement learning according to an embodiment of the present invention, which is applicable to industrial control of industrial equipment. The method may be performed by a reinforcement learning-based industrial control device, which may be implemented in hardware and/or software, and which may be configured in an electronic device, such as an industrial controller. As shown in fig. 1, the method specifically includes the following steps:
and S110, acquiring current operation data of the industrial equipment.
The industrial equipment can be any equipment requiring industrial control. The current operation data is the operation data of the industrial equipment at the current time and can be used to characterize the equipment's current environmental state; it may include, but is not limited to, the current value of at least one operating parameter of the equipment.
Specifically, the industrial controller may be connected to the industrial equipment in advance, by wire or wirelessly, for data transmission. The industrial equipment collects operation data in real time and sends the most recently collected data to the industrial controller, which thereby obtains the equipment's current operation data.
S120: determine target control information based on the target control decision model corresponding to the industrial equipment and the current operation data, where the target control decision model is obtained in advance by performing reinforcement learning on a preset control decision model within the target virtual environment model corresponding to the industrial equipment, and that virtual environment model is obtained by environment modeling based on the equipment's historical operation data.
The target control information may be information for controlling the operation of the industrial equipment and may include a target parameter value for at least one control parameter, i.e., the value to which that parameter needs to be adjusted. The target control decision model may be a network model that decides industrial control actions according to a learned control strategy. The preset control decision model may be an initial, to-be-trained control decision model set in advance according to industrial requirements. The target virtual environment model may be a deep-learning network model that imitates the behavior of the real environment in which the industrial equipment operates. The historical operation data may be operation data of the industrial equipment collected over a historical period.
Specifically, the industrial controller can perform environment modeling in advance from the equipment's historical operation data and build the target virtual environment model automatically; no human expertise is needed, modeling is fast and cheap, and modeling accuracy can be guaranteed, which effectively ensures the accuracy of industrial control. Notably, with this data-driven approach the industrial environment of any kind of industrial equipment can be modeled, including complex industrial environments, giving the method broad applicability and more accurate environment modeling. The industrial controller can then perform reinforcement learning on the preset control decision model, based on the constructed target virtual environment model and a target reward function, to obtain the trained target control decision model. Different target reward functions can be set according to actual business requirements, so that different target control decision models are learned, different business needs are met, and the flexibility of industrial control improves. During actual control, the industrial controller inputs the equipment's current operation data into the target control decision model and obtains the target control information for the current step from the model's output; because the decision model was trained by reinforcement learning, the target control information is obtained quickly, effectively improving the efficiency of industrial control.
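Purely as an illustration of this decision step, the sketch below maps current operation data to target control information with a trained policy network. The use of PyTorch, the tensor shapes, and all names are assumptions made for the example, not part of the disclosure.

```python
import torch

# Minimal sketch of step S120 (assumed PyTorch policy network; names,
# shapes, and the feature layout are illustrative, not from the patent).
def decide_control(policy: torch.nn.Module,
                   current_operation_data: torch.Tensor) -> torch.Tensor:
    """Map the current environment state to target control information."""
    policy.eval()
    with torch.no_grad():
        # One forward pass: the policy emits a target value per control parameter.
        target_control_info = policy(current_operation_data.unsqueeze(0))
    return target_control_info.squeeze(0)

# Usage sketch: an 8-dimensional operating state mapped to 3 control parameters.
policy = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Tanh(),
                             torch.nn.Linear(64, 3))
print(decide_control(policy, torch.randn(8)))
```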
Illustratively, performing environment modeling based on historical operation data of the industrial equipment to obtain the target virtual environment model corresponding to the industrial equipment may include: preprocessing the historical operation data to determine historical operation trajectory time-series information; analyzing that trajectory time-series information and creating an initial virtual environment model; and training the initial virtual environment model by generative adversarial training on the trajectory time-series information, to obtain, after training is finished, the target virtual environment model corresponding to the industrial equipment.
Specifically, the historical operation data of the industrial equipment can first be cleaned, for example by removing outliers, and the cleaned data can then be segmented, according to when each record was generated, into historical operation trajectory time-series information (S1, S2, ..., Sn). For example, trajectory S1 = {operating state at time 1, decision action 1, decision result 1, operating state at time 2, decision action 2, ..., operating state at termination time N}, trajectory S2 = {operating state at time 1, decision action 1, decision result 1, operating state at time 2, decision action 2, ..., operating state at termination time N}, and so on. The parameter information in the trajectory time-series information is then analyzed to determine the environment state variables and the agent decision-action variables required for environment modeling, and from these an initial virtual environment model with a deep-learning network architecture is created. Finally, using generative adversarial training with the initial virtual environment model as the generator, the model is trained to learn the environment's probabilistic transition distribution, and the trained model is the target virtual environment model.
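The preprocessing just described might be sketched as follows. The three-sigma cleaning rule and the column names ("episode_id", "timestamp") are assumptions made for the example; the patent does not fix a concrete scheme.

```python
import pandas as pd

# Sketch of the preprocessing step: remove outliers, then segment the log
# into per-trajectory time series S1, S2, ..., Sn ordered by timestamp.
def build_trajectories(history: pd.DataFrame, value_cols: list) -> list:
    cleaned = history.copy()
    # Drop rows whose values lie more than 3 standard deviations from the
    # mean (one simple cleaning rule chosen for illustration).
    for col in value_cols:
        mean, std = cleaned[col].mean(), cleaned[col].std()
        cleaned = cleaned[(cleaned[col] - mean).abs() <= 3 * std]
    # Segment by trajectory and sort each trajectory S_i by time.
    return [traj.sort_values("timestamp")
            for _, traj in cleaned.groupby("episode_id")]
```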
For example, training the initial virtual environment model by generative adversarial training on the historical operation trajectory time-series information to obtain the target virtual environment model corresponding to the industrial equipment after training is finished may include: determining sample input data, and state label data corresponding to that input, based on the trajectory time-series information; using the initial virtual environment model as the generator in a generative adversarial network (GAN) and inputting the sample input data into it to obtain predicted state data; inputting the predicted state data into the GAN's discriminator to obtain a discrimination result; and alternately training the initial virtual environment model and the discriminator, based on the discrimination result and the state label data, until a preset convergence condition ends training, yielding the target virtual environment model corresponding to the industrial equipment.
The sample input data may include the operation data at each time (characterizing the current environmental state) and the decision action taken at that time. The state label data is the operation data at the next time, characterizing the next environmental state. The initial virtual environment model (the generator) and the discriminator of the adversarial network are trained alternately against a training objective: during training, the generator keeps learning to bring its generated distribution as close as possible to the real one so as to fool the discriminator, while the discriminator optimizes the distance between the real and generated distributions so as to distinguish generator outputs from real data. When a preset convergence condition is reached, for example when the number of alternating iterations reaches a preset count or the training error is minimal, training of the initial virtual environment model ends, and the trained model can be used as the target virtual environment model. In this embodiment, the trained model can also be evaluated on a test set, and the candidate with the best test result selected as the final target virtual environment model, ensuring the accuracy of the environment modeling.
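A minimal sketch of one alternating training step is given below, assuming PyTorch and a binary real/fake objective. The patent only requires that the generator (the environment model) and the discriminator be trained alternately, so the concrete loss function and interfaces are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# One alternating GAN step: the environment model (generator) predicts the
# next state from a (state, action) input; the discriminator scores real
# next states against predicted ones.
def gan_step(env_model, discriminator, opt_g, opt_d, state_action, next_state):
    # Discriminator step: push real next states toward 1, generated toward 0.
    fake = env_model(state_action).detach()
    d_real, d_fake = discriminator(next_state), discriminator(fake)
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: make predicted next states indistinguishable from real ones.
    d_fake = discriminator(env_model(state_action))
    loss_g = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```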
Illustratively, obtaining the target control decision model by performing reinforcement learning on the preset control decision model based on the target virtual environment model corresponding to the industrial equipment may include: determining a control-parameter search space for the preset control decision model; determining a target reward function for the preset control decision model; and performing reinforcement learning on the preset control decision model, based on the target reward function, the control-parameter search space, and the target virtual environment model corresponding to the industrial equipment, to obtain the target control decision model.
Specifically, an appropriate search space for each control parameter, i.e., its value range, can be determined based on the scale of the data and the size of the model. Too large a search space makes it hard for the model to converge; too small a search space limits the model's expressive power. This embodiment can use automatic search: once the parametric form of a control parameter is given, its search space can be adjusted dynamically based on the training results, so that the search space keeps evolving in a good direction. The target reward function for the preset control decision model can be chosen according to the control effect to be achieved, for example a reward function whose control objective is energy saving. The reward function guides the learning direction of the control strategy during reinforcement learning and should reflect the strategy's effect well, such as the economic benefit it yields. Using a reinforcement learning algorithm such as PPO (Proximal Policy Optimization), the preset control decision model interacts continuously with the virtual environment within the target virtual environment model over a period of time, producing interaction trajectories; by maximizing the cumulative reward along those trajectories, reinforcement learning trains an optimal control decision policy, yielding the final target control decision model. In this way, reinforcement learning on the preset control decision model takes place conveniently and quickly inside the virtual environment model, without disturbing real users, and the learning effect of the target control decision model is assured. In this embodiment, the trained target control decision model can also be evaluated on a test set, and the candidate with the best test result selected as the final target control decision model, ensuring the accuracy of the control strategy.
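To make the interaction loop concrete, the sketch below wraps the learned virtual environment model in the standard Gymnasium interface so that an off-the-shelf PPO implementation can roll out trajectories in it instead of the real plant. The reward function, the env_model.predict interface, and all dimensions are assumptions made for the example.

```python
import numpy as np
import gymnasium as gym

# Sketch: expose the trained virtual environment model through the Gym
# interface so a PPO agent can interact with it rather than real equipment.
class VirtualPlantEnv(gym.Env):
    def __init__(self, env_model, reward_fn, state_dim, action_dim, horizon=200):
        self.env_model, self.reward_fn, self.horizon = env_model, reward_fn, horizon
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, (state_dim,), np.float32)
        self.action_space = gym.spaces.Box(-1.0, 1.0, (action_dim,), np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.state = np.zeros(self.observation_space.shape, np.float32)
        return self.state, {}

    def step(self, action):
        # The virtual environment model predicts the next operating state.
        self.state = self.env_model.predict(self.state, action)
        self.t += 1
        reward = self.reward_fn(self.state, action)  # e.g. an energy-saving objective
        return self.state, reward, False, self.t >= self.horizon, {}

# Usage sketch (assuming stable-baselines3 is installed):
#   from stable_baselines3 import PPO
#   agent = PPO("MlpPolicy", VirtualPlantEnv(env_model, reward_fn, 8, 3))
#   agent.learn(total_timesteps=100_000)
```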
S130: send the target control information to the industrial equipment so that the industrial equipment operates based on it.
Specifically, the industrial controller can send the target control information output by the target control decision model to the industrial equipment so that the equipment continues operating on it, thereby controlling the equipment effectively and ensuring its normal operation.
It should be noted that the industrial controller in this embodiment may include a socket through which it connects to the industrial equipment, supporting plug-and-play without complicated manual debugging and thereby improving convenience and ease of use.
According to the technical solution of the embodiments of the invention, environment modeling is performed in advance based on historical operation data of the industrial equipment to obtain a target virtual environment model corresponding to that equipment. Because the modeling is driven by historical data, it needs no reliance on human expertise, takes little time, and costs little, effectively ensuring the accuracy of industrial control. Reinforcement learning is then performed on the preset control decision model within the target virtual environment model to obtain the target control decision model corresponding to the industrial equipment, so that control decisions on the equipment's current operation data are made with a model learned by reinforcement learning, target control information is obtained more quickly, and the efficiency of industrial control is further improved.
On the basis of the above technical solution, performing environment modeling based on historical operation data of the industrial equipment to obtain the target virtual environment model corresponding to the industrial equipment, and performing reinforcement learning on the preset control decision model based on that virtual environment model to obtain the target control decision model, may include:
acquiring historical operation data of the industrial equipment; sending the historical operation data to a server, so that the server performs environment modeling based on it to obtain the target virtual environment model corresponding to the industrial equipment, performs reinforcement learning on the preset control decision model based on that virtual environment model to obtain the target control decision model, and returns it; and receiving the target control decision model returned by the server.
Specifically, the industrial controller is communicatively connected to the server, so it can send the collected historical operation data to the server; the server performs the environment modeling to obtain the target virtual environment model corresponding to the industrial equipment, and then performs reinforcement learning on the preset control decision model within it to obtain the target control decision model. The industrial controller obtains the model by downloading it from the server; the server's greater computing power thus generates the target control decision model faster and more accurately, further improving the accuracy and efficiency of industrial control.
Example two
Fig. 2 is a flowchart of an industrial control method based on reinforcement learning according to a second embodiment of the present invention. On the basis of the above embodiment, this embodiment describes the process of detecting abnormal operation data in detail and further details the processing performed after an anomaly is detected. Explanations of terms that are the same as or correspond to those in the above embodiments are omitted. Referring to fig. 2, the method provided in this embodiment specifically includes the following steps:
and S210, acquiring current operation data of the industrial equipment.
And S220, determining target control information based on the target control decision model corresponding to the industrial equipment and the current operation data.
And S230, sending the target control information to the industrial equipment so that the industrial equipment operates based on the target control information.
And S240, acquiring next actual operation data of the industrial equipment at the next moment after the industrial equipment operates based on the target control information.
Specifically, after the industrial equipment operates based on the target control information, it enters the operating state of the next moment and sends the next actual operation data to the industrial controller, which thereby obtains that data.
S250: perform data anomaly detection on the next actual operation data, and issue anomaly alarm information when an anomaly is detected.
Specifically, industrial equipment wears and ages as its service time grows, which changes its performance; the industrial environment it operates in may therefore also change, so anomaly detection on the operation data is needed to keep industrial control accurate. Before the next control step is made from the next actual operation data, the industrial controller can check that data for anomalies, i.e., judge its plausibility, to determine whether the equipment's industrial environment has changed. For example, upper and lower bounds can be preset for each operating parameter's value, and an anomaly is flagged when the next actual operation data exceeds either bound. When an anomaly is detected, alarm information can be issued so that the relevant personnel are reminded in time and can carry out the corresponding handling.
For example, the "detecting data abnormality for next actual operation data" in S250 may include: determining next predicted operation data at the next moment based on a target virtual environment model corresponding to the industrial equipment and the current operation data; and comparing the next actual operation data with the next predicted operation data, and determining whether data abnormity occurs or not based on the comparison result.
Specifically, the industrial controller can input the current operation data, as the current environmental state, into the target virtual environment model and obtain from its output the next environmental state, i.e., the next predicted operation data. The target virtual environment model may be built locally by the industrial controller or downloaded from the server. If the difference between the next actual operation data and the next predicted operation data exceeds a preset threshold, a data anomaly can be concluded, i.e., the industrial production environment of the equipment has changed.
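The two checks just described, a per-parameter bounds check and a comparison against the model's prediction, can be sketched together as below; the thresholds and the array-based interface are illustrative assumptions.

```python
import numpy as np

# Sketch of step S250: flag an anomaly if any parameter leaves its preset
# bounds, or if the actual next state deviates too far from the virtual
# environment model's prediction.
def detect_anomaly(next_actual, lower, upper,
                   next_predicted=None, max_deviation=None):
    next_actual = np.asarray(next_actual)
    # Plausibility check against preset per-parameter upper/lower bounds.
    if np.any(next_actual < lower) or np.any(next_actual > upper):
        return True
    # Drift check: a large gap to the model's prediction suggests the real
    # industrial environment has changed.
    if next_predicted is not None:
        if np.linalg.norm(next_actual - np.asarray(next_predicted)) > max_deviation:
            return True
    return False
```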
S260: update the target virtual environment model corresponding to the industrial equipment based on the target operation data generated by the equipment after the data anomaly, obtaining an updated target virtual environment model.
Specifically, after a data anomaly is detected, the target operation data generated after the anomaly can be obtained, and the target virtual environment model can be updated on this latest data, yielding a target virtual environment model that reflects the changed industrial environment.
S270: perform reinforcement learning on the preset control decision model again, based on the updated target virtual environment model, to obtain an updated target control decision model.
Specifically, within the updated target virtual environment model, the preset control decision model again interacts continuously with the virtual environment through the reinforcement learning algorithm, generating interaction trajectories; by maximizing the cumulative reward along those trajectories, reinforcement learning is performed anew on the preset control decision model, yielding a new target control decision model adapted to the changed industrial environment. Subsequent industrial control of the equipment then uses the updated model, further ensuring the accuracy of industrial control.
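As a minimal sketch of this update path (steps S260 and S270), assuming the fine_tune and retrain interfaces below, which are illustrative and not part of the disclosure:

```python
# Sketch of the post-anomaly update path: refresh the virtual environment
# model on data collected after the drift, then rerun reinforcement
# learning against the refreshed model.
def update_after_drift(env_model, policy_trainer, post_anomaly_trajectories):
    env_model.fine_tune(post_anomaly_trajectories)  # model update (S260)
    return policy_trainer.retrain(env_model)        # renewed RL (S270)
```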
It should be noted that if the environment modeling and decision control are performed on the server, the industrial controller needs to send the target operation data to the server, so that the server updates the target virtual environment model based on that data, performs reinforcement learning on the preset control decision model again with the updated virtual environment model, and sends the updated target control decision model back to the industrial controller.
In the technical solution of this embodiment, anomaly detection is performed on the next actual operation data, and anomaly alarm information is issued when an anomaly is detected, so that the relevant personnel are reminded of the anomaly in time and can handle it accordingly. After a data anomaly is detected, the target virtual environment model corresponding to the industrial equipment can be updated on the target operation data generated after the anomaly, and reinforcement learning is performed on the preset control decision model again with the updated virtual environment model, yielding an updated target control decision model.
The following is an embodiment of an industrial control device based on reinforcement learning according to an embodiment of the invention. It belongs to the same inventive concept as the industrial control method of the above embodiments; for details not described in this device embodiment, refer to the method embodiments above.
EXAMPLE III
Fig. 3 is a schematic structural diagram of an industrial control device based on reinforcement learning according to a third embodiment of the present invention. As shown in fig. 3, the apparatus specifically includes: a current operation data acquisition module 310, a target control information determination module 320, and a target control information transmission module 330.
The current operation data acquiring module 310 is configured to acquire current operation data of the industrial device; a target control information determining module 320, configured to determine target control information based on a target control decision model corresponding to the industrial device and the current operation data, where the target control decision model is obtained by performing reinforcement learning on a preset control decision model based on a target virtual environment model corresponding to the industrial device in advance, and the target virtual environment model corresponding to the industrial device is obtained by performing environment modeling based on historical operation data of the industrial device; a target control information sending module 330, configured to send the target control information to the industrial device, so that the industrial device operates based on the target control information.
According to the technical solution of the embodiments of the invention, environment modeling is performed in advance based on historical operation data of the industrial equipment to obtain a target virtual environment model corresponding to that equipment. Because the modeling is driven by historical data, it needs no reliance on human expertise, takes little time, and costs little, effectively ensuring the accuracy of industrial control. Reinforcement learning is then performed on the preset control decision model within the target virtual environment model to obtain the target control decision model corresponding to the industrial equipment, so that control decisions on the equipment's current operation data are made with a model learned by reinforcement learning, target control information is obtained more quickly, and the efficiency of industrial control is further improved.
Optionally, the apparatus further comprises: an environmental modeling module comprising:
the historical operation trajectory time-series information determining unit is used for preprocessing the historical operation data and determining the historical operation trajectory time-series information;
the initial virtual environment model creating unit is used for analyzing the historical operation trajectory time-series information and creating an initial virtual environment model;
and the target virtual environment model determining unit is used for training the initial virtual environment model by generative adversarial training on the historical operation trajectory time-series information, to obtain the target virtual environment model corresponding to the industrial equipment after training is finished.
Optionally, the target virtual environment model determining unit is specifically configured to:
determining sample input data, and state label data corresponding to the sample input data, based on the historical operation trajectory time-series information; using the initial virtual environment model as the generator in a generative adversarial network, and inputting the sample input data into it to obtain predicted state data output by the initial virtual environment model; inputting the predicted state data into the discriminator of the generative adversarial network to obtain a discrimination result output by the discriminator; and alternately training the initial virtual environment model and the discriminator, based on the discrimination result and the state label data, until a preset convergence condition ends training, obtaining the target virtual environment model corresponding to the industrial equipment.
Optionally, the apparatus further comprises:
a target control decision model determination module, specifically configured to: determining a control parameter search space corresponding to a preset control decision model; determining a target reward function corresponding to the preset control decision model; and performing reinforcement learning on the preset control decision model based on the target reward function, the control parameter search space and a target virtual environment model corresponding to the industrial equipment to obtain the target control decision model.
Optionally, the apparatus further comprises:
the historical operating data acquisition module is used for acquiring historical operating data of the industrial equipment;
the historical operation data sending module is used for sending the historical operation data to a server, so that the server performs environment modeling based on the historical operation data of the industrial equipment to obtain the target virtual environment model corresponding to the industrial equipment, performs reinforcement learning on the preset control decision model based on that target virtual environment model to obtain the target control decision model, and returns it;
and the target control decision model receiving module is used for receiving the target control decision model returned by the server.
Optionally, the apparatus further comprises:
the next actual operation data acquisition module is used for acquiring next actual operation data of the industrial equipment at the next moment after the industrial equipment operates based on the target control information;
and the abnormal alarm information sending module is used for carrying out data abnormality detection on the next actual operation data and sending abnormal alarm information when the data abnormality is detected.
Optionally, the abnormal alarm information issuing module includes: a data abnormality detection unit;
The data anomaly detection unit is specifically configured to: determine the next predicted operation data for the next moment based on the target virtual environment model corresponding to the industrial equipment and the current operation data; and compare the next actual operation data with the next predicted operation data, determining from the comparison result whether a data anomaly has occurred.
Optionally, the apparatus further comprises:
the model updating module is used for updating a model of a target virtual environment model corresponding to the industrial equipment based on target operation data generated by the industrial equipment after the data abnormality is detected to obtain the updated target virtual environment model; and based on the updated target virtual environment model, performing reinforcement learning on the preset control decision model again to obtain an updated target control decision model.
The reinforcement learning-based industrial control device provided by the embodiment of the invention can execute the reinforcement learning-based industrial control method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the reinforcement learning-based industrial control method.
It should be noted that, in the embodiment of the industrial control device based on reinforcement learning, the included units and modules are only divided according to the functional logic, but are not limited to the above division as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
Example four
Fig. 4 is a schematic structural diagram of an industrial control system based on reinforcement learning according to a fourth embodiment of the present disclosure. As shown in fig. 4, the system specifically includes: an industrial device 410 and an industrial controller 420.
The industrial controller 420 is configured to implement the reinforcement learning-based industrial control method according to any embodiment of the present invention.
With the reinforcement learning-based industrial control system of this embodiment of the disclosure, environment modeling is performed in advance based on historical operation data of the industrial equipment to obtain a target virtual environment model corresponding to that equipment. Because the modeling is driven by historical data, it needs no reliance on human expertise, takes little time, and costs little, effectively ensuring the accuracy of industrial control. Reinforcement learning is then performed on the preset control decision model within the target virtual environment model to obtain the target control decision model corresponding to the industrial equipment, so that control decisions on the equipment's current operation data are made with a model learned by reinforcement learning, target control information is obtained more quickly, and the efficiency of industrial control is further improved.
EXAMPLE five
FIG. 5 illustrates a schematic diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to it, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13. The memory stores a computer program executable by the at least one processor; the processor 11 can perform various suitable actions and processes according to the computer program stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13. The RAM 13 can also store the programs and data the electronic device 10 needs to operate. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14, to which an input/output (I/O) interface 15 is also connected.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as a reinforcement learning based industrial control method.
In some embodiments, the reinforcement learning-based industrial control method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When loaded into RAM 13 and executed by processor 11, the computer program may perform one or more steps of the reinforcement learning based industrial control method described above. Alternatively, in other embodiments, the processor 11 may be configured to perform the reinforcement learning based industrial control method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network; their relationship arises from computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system that overcomes the difficult management and weak service scalability of traditional physical hosts and VPS (virtual private server) services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. An industrial control method based on reinforcement learning is characterized by comprising the following steps:
acquiring current operation data of the industrial equipment;
determining target control information based on a target control decision model corresponding to the industrial equipment and the current operation data, wherein the target control decision model is obtained by performing reinforcement learning on a preset control decision model based on a target virtual environment model corresponding to the industrial equipment in advance, and the target virtual environment model corresponding to the industrial equipment is obtained by performing environment modeling based on historical operation data of the industrial equipment;
and sending the target control information to the industrial equipment so that the industrial equipment operates based on the target control information.
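As a non-limiting illustration of claim 1, the Python sketch below walks the three claimed steps in order: read current operation data, infer target control information with the trained decision model, and hand the result to the device. The MLP policy, the dimensions, and the device I/O names are assumptions made for illustration, not details taken from the patent.

```python
# Minimal sketch of the claim-1 loop: acquire current operation data, run it
# through the target control decision model, send the result to the device.
# All names, dimensions, and the device interface are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn

class ControlDecisionModel(nn.Module):
    """Stand-in for the target control decision model (a small MLP policy)."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),  # control outputs scaled to [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def determine_target_control_info(model: nn.Module, current_data: np.ndarray) -> np.ndarray:
    """Map current operation data to target control information (inference only)."""
    with torch.no_grad():
        return model(torch.as_tensor(current_data, dtype=torch.float32)).numpy()

model = ControlDecisionModel(obs_dim=8, act_dim=2)     # dimensions assumed
current_data = np.random.randn(8).astype(np.float32)   # stands in for a sensor readout
target_control_info = determine_target_control_info(model, current_data)
# send_to_device(target_control_info)  # hypothetical transport to the industrial equipment
```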
2. The method of claim 1, wherein performing environment modeling based on the historical operation data of the industrial equipment to obtain the target virtual environment model corresponding to the industrial equipment comprises:
preprocessing the historical operation data to determine historical operation trajectory time-series information;
performing information analysis on the historical operation trajectory time-series information, and creating an initial virtual environment model;
and training the initial virtual environment model in a generative adversarial training manner based on the historical operation trajectory time-series information, to obtain the target virtual environment model corresponding to the industrial equipment after training is completed.
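A non-limiting sketch of the preprocessing step of claim 2, assuming the historical operation data arrives as a (timesteps x features) array: rows with missing values are dropped, features are normalized, and the log is sliced into overlapping windows that serve as trajectory time-series samples. The cleaning recipe and window length are assumptions.

```python
# Sketch of the claim-2 preprocessing step: raw operation logs are cleaned and
# sliced into trajectory time-series samples. Column layout, cleaning steps,
# and window length are assumptions for illustration.
import numpy as np

def preprocess_history(log: np.ndarray, window: int = 10) -> np.ndarray:
    """log: (T, D) array of per-timestep operation records.
    Returns overlapping windows of shape (N, window, D) as trajectory
    time-series information."""
    # Drop rows with missing values, then normalize each feature to
    # zero mean / unit variance (a common, minimal cleaning recipe).
    log = log[~np.isnan(log).any(axis=1)]
    log = (log - log.mean(axis=0)) / (log.std(axis=0) + 1e-8)
    return np.stack([log[i:i + window] for i in range(len(log) - window + 1)])

history = np.random.randn(500, 8)           # stands in for historical operation data
trajectories = preprocess_history(history)  # (491, 10, 8) trajectory windows
```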
3. The method of claim 2, wherein training the initial virtual environment model in a generative adversarial training manner based on the historical operation trajectory time-series information, to obtain the target virtual environment model corresponding to the industrial equipment after training is completed, comprises:
determining sample input data, and state label data corresponding to the sample input data, based on the historical operation trajectory time-series information;
taking the initial virtual environment model as the generator in a generative adversarial network, inputting the sample input data into the initial virtual environment model, and obtaining predicted state data output by the initial virtual environment model;
inputting the predicted state data into the discriminator in the generative adversarial network to obtain a discrimination result output by the discriminator;
and alternately training the initial virtual environment model and the discriminator based on the discrimination result and the state label data, until training is completed when a preset convergence condition is reached, to obtain the target virtual environment model corresponding to the industrial equipment.
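The alternating training of claim 3 can be sketched as a standard GAN loop in which the environment model plays the generator (mapping a state-action pair to a predicted next state) and the discriminator scores transitions against the state label data. Network sizes, optimizers, and the stand-in batches are assumptions.

```python
# Sketch of the claim-3 alternating (generative adversarial) training loop.
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2

env_model = nn.Sequential(        # generator: (state, action) -> predicted next state
    nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, obs_dim))
discriminator = nn.Sequential(    # scores (state, action, next_state) transitions
    nn.Linear(obs_dim + act_dim + obs_dim, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(env_model.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

def train_step(state, action, next_state_label):
    """One alternation: update the discriminator on real vs. predicted
    transitions, then update the environment model to fool it."""
    pred_next = env_model(torch.cat([state, action], dim=-1))

    # Discriminator step: real transitions labeled 1, predicted ones labeled 0.
    real = torch.cat([state, action, next_state_label], dim=-1)
    fake = torch.cat([state, action, pred_next.detach()], dim=-1)
    d_loss = bce(discriminator(real), torch.ones(len(real), 1)) + \
             bce(discriminator(fake), torch.zeros(len(fake), 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator (environment model) step: make predictions look real.
    fooled = discriminator(torch.cat([state, action, pred_next], dim=-1))
    g_loss = bce(fooled, torch.ones(len(fooled), 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()

# Random stand-in batches; a real run would iterate over the trajectory
# samples until the preset convergence condition is reached.
for _ in range(100):
    s, a = torch.randn(32, obs_dim), torch.randn(32, act_dim)
    s_next = torch.randn(32, obs_dim)
    train_step(s, a, s_next)
```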
4. The method of claim 1, wherein performing reinforcement learning on the preset control decision model based on the target virtual environment model corresponding to the industrial equipment to obtain the target control decision model comprises:
determining a control parameter search space corresponding to a preset control decision model;
determining a target reward function corresponding to the preset control decision model;
and performing reinforcement learning on the preset control decision model based on the target reward function, the control parameter search space and a target virtual environment model corresponding to the industrial equipment to obtain the target control decision model.
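A non-limiting sketch of claim 4: a target reward function is scored over rollouts of the learned virtual environment, and the control-parameter search space is explored for the best-scoring controller. Plain random search stands in here for whatever reinforcement-learning algorithm an implementation would actually use; the toy dynamics, parameter bounds, and reward are likewise assumptions.

```python
# Sketch of claim 4: reward function + bounded control-parameter search space,
# optimized entirely against the learned virtual environment model.
import numpy as np

def virtual_env_step(state: np.ndarray, params: np.ndarray) -> np.ndarray:
    """Stand-in for the target virtual environment model: next state under a
    linear feedback controller u = -params @ state (toy dynamics, assumed)."""
    action = -params @ state
    return 0.9 * state + 0.1 * action

def target_reward(state: np.ndarray) -> float:
    return -float(np.sum(state ** 2))  # penalize deviation from the setpoint 0

def evaluate(params: np.ndarray, horizon: int = 50) -> float:
    """Roll the virtual environment forward and accumulate the reward."""
    state, total = np.ones(4), 0.0
    for _ in range(horizon):
        state = virtual_env_step(state, params)
        total += target_reward(state)
    return total

rng = np.random.default_rng(0)
best_params, best_score = None, -np.inf
for _ in range(200):                                   # explore the search space
    candidate = rng.uniform(-2.0, 2.0, size=(1, 4))    # assumed parameter bounds
    score = evaluate(candidate)
    if score > best_score:
        best_params, best_score = candidate, score
```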
5. The method according to any one of claims 1 to 4, wherein performing environment modeling based on the historical operation data of the industrial equipment to obtain the target virtual environment model corresponding to the industrial equipment, and performing reinforcement learning on the preset control decision model based on the target virtual environment model corresponding to the industrial equipment to obtain the target control decision model, comprises:
acquiring historical operation data of the industrial equipment;
sending the historical operation data to a server, so that the server performs environment modeling based on the historical operation data of the industrial equipment to obtain the target virtual environment model corresponding to the industrial equipment, performs reinforcement learning on the preset control decision model based on that target virtual environment model to obtain the target control decision model, and returns the target control decision model;
and receiving the target control decision model returned by the server.
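The client-server split of claim 5 might look like the following, where the controller posts its historical operation data and receives the serialized trained model back. The endpoint URL, payload layout, and serialization are hypothetical.

```python
# Sketch of the claim-5 offload: ship historical data to a modeling server,
# receive the trained target control decision model in the response body.
# The endpoint, payload shape, and serialization are hypothetical.
import json
import urllib.request

def offload_training(history_rows, server_url="http://modeling-server.local/train"):
    """POST historical operation data; the response is assumed to carry the
    serialized target control decision model."""
    payload = json.dumps({"historical_operation_data": history_rows}).encode()
    req = urllib.request.Request(server_url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # serialized model returned by the server

# model_bytes = offload_training([[0.1, 0.2], [0.3, 0.4]])  # requires a live server
```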
6. The method of claim 1, further comprising:
acquiring next actual operation data of the industrial equipment at the next moment after the industrial equipment operates based on the target control information;
and performing data anomaly detection on the next actual operation data, and issuing anomaly alarm information when a data anomaly is detected.
7. The method of claim 6, wherein performing data anomaly detection on the next actual operation data comprises:
determining next predicted operation data at the next moment based on the target virtual environment model corresponding to the industrial equipment and the current operation data;
and comparing the next actual operation data with the next predicted operation data, and determining whether a data anomaly has occurred based on the comparison result.
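A minimal sketch of the detection rule in claims 6 and 7: the virtual environment model's prediction for the next step is compared with what the equipment actually reports, and an alarm is raised when the error exceeds a threshold. The distance metric and threshold value are assumptions.

```python
# Sketch of claims 6-7: compare predicted vs. actual next operation data and
# alarm on a large deviation. Metric and threshold are assumptions.
import numpy as np

def detect_anomaly(predicted_next: np.ndarray, actual_next: np.ndarray,
                   threshold: float = 3.0) -> bool:
    """Flag an anomaly when the prediction error is abnormally large."""
    return float(np.linalg.norm(actual_next - predicted_next)) > threshold

predicted = np.array([1.0, 0.5, 0.2])  # from the target virtual environment model
actual = np.array([1.1, 0.4, 4.8])     # next actual operation data from the device
if detect_anomaly(predicted, actual):
    print("anomaly alarm: operation data deviates from the virtual environment prediction")
```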
8. The method of claim 6, further comprising, after the data anomaly is detected:
performing model updating on the target virtual environment model corresponding to the industrial equipment based on target operation data generated by the industrial equipment after the data anomaly, to obtain an updated target virtual environment model;
and performing reinforcement learning on the preset control decision model again based on the updated target virtual environment model, to obtain an updated target control decision model.
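Claim 8's update step can be sketched as supervised fine-tuning of the environment model on transitions gathered after the anomaly, after which the reinforcement-learning step (as in claim 4) would be rerun against the updated model. Shapes, epoch count, and the learning rate are assumptions; the model mirrors the generator from the claim-3 sketch.

```python
# Sketch of claim 8: fine-tune the virtual environment model on post-anomaly
# transitions, then redo the policy search against the updated model.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim = 8, 2
env_model = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                          nn.Linear(64, obs_dim))

def update_env_model(model, states, actions, next_states, epochs=20, lr=1e-4):
    """One-step supervised fine-tuning on transitions from the new regime."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        pred = model(torch.cat([states, actions], dim=-1))
        loss = F.mse_loss(pred, next_states)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

# Stand-in post-anomaly data; afterwards the claim-4 style search would be
# rerun against the updated model to refresh the control decision model.
s, a, s2 = torch.randn(64, obs_dim), torch.randn(64, act_dim), torch.randn(64, obs_dim)
env_model = update_env_model(env_model, s, a, s2)
```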
9. An industrial control device based on reinforcement learning, characterized by comprising:
the current operation data acquisition module is used for acquiring current operation data of the industrial equipment;
a target control information determination module, configured to determine target control information based on a target control decision model corresponding to the industrial device and the current operation data, where the target control decision model is obtained by performing reinforcement learning on a preset control decision model based on a target virtual environment model corresponding to the industrial device in advance, and the target virtual environment model corresponding to the industrial device is obtained by performing environment modeling based on historical operation data of the industrial device;
and the target control information sending module is used for sending the target control information to the industrial equipment so as to enable the industrial equipment to operate based on the target control information.
10. A reinforcement learning-based industrial control system, the system comprising: industrial equipment and an industrial controller;
wherein the industrial controller is used for implementing the reinforcement learning-based industrial control method according to any one of claims 1-8.
11. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the reinforcement learning-based industrial control method of any one of claims 1-8.
CN202210649819.5A 2022-06-10 2022-06-10 Industrial control method, device and system based on reinforcement learning and electronic equipment Pending CN114721345A (en)

Priority Applications (1)

Application Number: CN202210649819.5A
Publication: CN114721345A (en)
Priority Date: 2022-06-10
Filing Date: 2022-06-10
Title: Industrial control method, device and system based on reinforcement learning and electronic equipment


Publications (1)

Publication Number Publication Date
CN114721345A true CN114721345A (en) 2022-07-08

Family

ID=82232466

Family Applications (1)

Application Number: CN202210649819.5A
Status: Pending
Priority Date: 2022-06-10
Filing Date: 2022-06-10
Title: Industrial control method, device and system based on reinforcement learning and electronic equipment

Country Status (1)

Country Link
CN (1) CN114721345A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134062A (en) * 2019-04-17 2019-08-16 Huazhong University of Science and Technology Machining path optimization method for multi-axis CNC machine tools based on reinforcement learning
CN112597217A (en) * 2021-03-02 2021-04-02 Nanqi Xiance (Nanjing) Technology Co., Ltd. Intelligent decision platform driven by historical decision data and implementation method thereof
CN114202061A (en) * 2021-12-01 2022-03-18 Beihang University Article recommendation method based on a generative adversarial network model and deep reinforcement learning, electronic device and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Xu Haoran: "Research on Offline Reinforcement Learning Algorithms and Their Application in the Field of Industrial Control", China Master's Theses Full-text Database, Information Science and Technology series *
Shi Lili: "Research on a Quality Assurance System for Automatic Monitoring Data from Stationary Pollution Sources: The Case of Heilongjiang Province", 31 December 2013 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115562038A (en) * 2022-11-01 2023-01-03 Nanqi Xiance (Nanjing) Technology Co., Ltd. Early warning method, device, equipment and medium for feedback control system change
CN115562038B (en) * 2022-11-01 2023-08-29 Nanqi Xiance (Nanjing) High-Tech Co., Ltd. Early warning method, device, equipment and medium for feedback control system change
CN115859808A (en) * 2022-12-01 2023-03-28 Nanqi Xiance (Nanjing) Technology Co., Ltd. Pump set work prediction method and device, electronic equipment and storage medium
CN115903548A (en) * 2022-12-27 2023-04-04 Nanqi Xiance (Nanjing) Technology Co., Ltd. Optimization method, device, equipment and storage medium of coal mill group controller
CN115903548B (en) * 2022-12-27 2024-03-08 Nanqi Xiance (Nanjing) Technology Co., Ltd. Optimization method, device and equipment for coal mill unit controller and storage medium

Similar Documents

Publication Publication Date Title
CN114721345A (en) Industrial control method, device and system based on reinforcement learning and electronic equipment
KR20200128232A (en) Apparatus for predicting electricity demand and method thereof
CN110766236A (en) Power equipment state trend prediction method based on statistical analysis and deep learning
CN115375039A (en) Industrial equipment fault prediction method and device, electronic equipment and storage medium
CN103885867A (en) Online evaluation method of performance of analog circuit
CN117992743A (en) Intelligent analysis and treatment method and system for power grid faults based on knowledge graph
CN114818913A (en) Decision generation method and device
CN114139604A (en) Online learning-based electric power industrial control attack monitoring method and device
CN116562156B (en) Training method, device, equipment and storage medium for control decision model
CN117313953A (en) Load prediction method and device, electronic equipment and storage medium
CN112785111A (en) Production efficiency prediction method, device, storage medium and electronic equipment
CN116467606A (en) Determination method, device, equipment and medium of decision suggestion information
CN115759751A (en) Enterprise risk prediction method and device, storage medium, electronic equipment and product
CN114692987A (en) Time sequence data analysis method, device, equipment and storage medium
CN113554280A (en) Training method, device, equipment and storage medium for power grid system scheduling model
EP3701679B1 (en) Network resource allocation
CN113360486B (en) Data prediction method, device, electronic equipment and medium
CN117857377A (en) System real-time state prediction method, device, equipment and medium
Shu et al. Research on Assembly Time Quota Prediction Model of Toy Products of A Company Based on Improved RFECV and XGBoost Algorithms
CN115858621A (en) User behavior prediction method and device, electronic equipment and storage medium
CN118133786A (en) Operation ticket generation method and device, electronic equipment and storage medium
CN117220354A (en) Power distribution network scheduling method and device, electronic equipment and storage medium
CN117455067A (en) Electric quantity consumption prediction method and device, electronic equipment and storage medium
CN116523249A (en) Production line determining method, device, equipment and storage medium
CN113960965A (en) Data processing method and device, electronic equipment and readable storage medium

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20220708)