DE102011075337A1

DE102011075337A1 - Method for controlling system, involves carrying out action to control system by obtaining control information from data of system, where another action or operation is determined according to data to control system

Info

Publication number: DE102011075337A1
Application number: DE102011075337A
Authority: DE
Inventors: Siegmund Düll; Kai Heesche; Volkmar Sterzing; Steffen Udluft; Per Egedal; Thomas Esbensen
Original assignee: Siemens AG
Current assignee: Siemens AG
Priority date: 2011-05-05
Filing date: 2011-05-05
Publication date: 2012-11-08

Abstract

The method involves controlling a system with a first action (110) and obtaining data (111) of the system as a result of that action. A second action (113) is determined from the data and is used to control the system. The first and/or the second action is chosen so as to explore parts of the state space of the system. The system may comprise a technical facility, a manufacturing plant, a wind turbine or a gas turbine. The two actions comprise a setting of operating parameters of the system. An independent claim is included for an apparatus for controlling a system by means of a processing unit.

Description

The invention relates to a method and a device for controlling a plant.

There are increasing efforts to build wind turbines for generating renewable energy in different environments, e.g. on the coast or offshore. Depending on geographical conditions, the wind shows different characteristics that significantly affect the operation and the efficiency of the wind turbine. While the wind turbine limits the power it provides at high wind speeds, it is driven by lower wind speeds most of the time, i.e. the power provided is usually well below the maximum possible limit of the wind turbine.

To optimize the power provided by the wind turbine, its operating parameters can be adjusted. A disadvantage, however, is that local conditions such as turbulence, shear winds, temperature fluctuations, humidity, dust, ice, etc. are not adequately taken into account. This effect is amplified when several wind turbines are used in so-called wind farms, since the wind turbines themselves influence the wind speeds.

The object of the invention is to avoid the disadvantages mentioned above and, in particular, to provide an efficient solution for setting up or operating (at least) one wind turbine.

This object is achieved according to the features of the independent claims. Further developments of the invention also follow from the dependent claims.

To achieve this object, a method for controlling a plant is specified,

- in which the plant is controlled by means of a first action,
- in which data of the plant are determined as a result of the control with the first action,
- in which a second action is determined on the basis of the data and the plant is controlled with the second action,
- in which the first action and/or the second action is determined in order to explore parts of a state space of the plant.
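The four steps listed above can be sketched as a small loop. This is only an illustrative sketch, not the patented implementation: `toy_plant`, `choose_second_action`, the pitch candidates and the `target` threshold are hypothetical names and values chosen for the example.

```python
import random


def toy_plant(pitch_deg):
    """Hypothetical stand-in for a plant: power peaks at a pitch of 4 degrees."""
    return 100.0 - (pitch_deg - 4.0) ** 2


def choose_second_action(first_action, power, candidates, target=90.0):
    """Assumed rule: determine the second action from the observed data.

    A poor result leads to a far jump in the action space, a good result
    leads to a nearby refinement.
    """
    untried = [a for a in candidates if a != first_action]

    def distance(a):
        return abs(a - first_action)

    if power < target:
        return max(untried, key=distance)
    return min(untried, key=distance)


def run_control_loop(apply_action, select_second, candidates, seed=0):
    """Act, observe, decide, act again; returns the (action, data) pairs."""
    rng = random.Random(seed)
    # Step 1: control the plant with a first action (random exploration here).
    first_action = rng.choice(candidates)
    # Step 2: determine data of the plant as a result of the first action.
    data = apply_action(first_action)
    # Step 3: determine a second action from the data and apply it.
    second_action = select_second(first_action, data, candidates)
    second_data = apply_action(second_action)
    # Step 4 is implicit: both actions were chosen to cover new parts
    # of the action/state space.
    return [(first_action, data), (second_action, second_data)]


history = run_control_loop(toy_plant, choose_second_action, [0.0, 2.0, 4.0, 6.0])
```

The random first action corresponds to the unsystematic exploration mentioned in the text; a systematic variant would simply iterate over the candidates in a fixed order.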

Determining the parts of the state space may be a systematic and/or an unsystematic (e.g. random or pseudorandom) exploration of the state space (or of a part of the state space).

It is thus possible to enlarge, and thereby improve, the knowledge base (data) for modeling by means of targeted actions. This knowledge base can be used for modeling and for further control of the plant. As a result, the efficiency of the plant can be increased effectively.

The present solution is particularly advantageous when the plant and the model associated with it exhibit temporal dynamics, stochastic properties, nonlinear effects and/or high dimensionality.

In one development, the first action and/or the second action is an action for which no data are available yet or for which the available data are outdated.

The actions can thus be used to explore the state space systematically, unsystematically, stochastically or with mixed strategies, i.e. predefined operating parameters are set and the associated data are determined.

In another development, the first action and the second action comprise a setting of operating parameters of the plant.

In particular, in one development the plant comprises at least one of the following components:

- a technical system,
- an automation system,
- a manufacturing plant,
- a production plant,
- a machine,
- a wind turbine,
- a gas turbine,
- a turbomachine,
- an energy grid,
- a power distribution network,
- a load distribution network,
- a communication network.

In a further development, the determined data comprise measured values, information and/or observations of the plant.

In particular, different sensors (e.g. for temperature, humidity, air pressure, speed, acceleration, etc.) may be provided to obtain data from the plant.

In a further development, the data are stored together with the associated first action.

For example, the data may be stored in a database and collected, e.g., over a predefined period of time. This database can serve as a knowledge base for a learning algorithm (see below).

In an additional development, the data are stored together with the associated first action.

The database may thus contain records, each of which holds an action together with the associated data.
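A record that pairs an action with its resulting data could look like the following sketch. The class names, field names and the `timestamp` field (added here so that outdated data can be recognized) are assumptions made for illustration; the source does not prescribe a schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    action: Dict[str, float]  # operating parameters that were set
    data: Dict[str, float]    # measurements observed as a result
    timestamp: float          # seconds; assumed field for detecting stale data


@dataclass
class KnowledgeBase:
    records: List[Record] = field(default_factory=list)

    def add(self, action, data, timestamp):
        """Store an action together with its associated data."""
        self.records.append(Record(dict(action), dict(data), timestamp))

    def recent(self, now, max_age):
        """Return the records that are at most `max_age` seconds old."""
        return [r for r in self.records if now - r.timestamp <= max_age]


kb = KnowledgeBase()
kb.add({"pitch_deg": 2.0}, {"power_kw": 800.0}, timestamp=0.0)
kb.add({"pitch_deg": 3.0}, {"power_kw": 850.0}, timestamp=100.0)
fresh = kb.recent(now=120.0, max_age=60.0)
```

Actions whose records are missing from `recent(...)` are natural candidates for the exploration of missing or outdated data described earlier.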

In a next development, the second action is determined on the basis of the data by determining a state, in particular a Markov state, by means of a state estimation.

For example, the Markov state can be determined by means of an MPEN method ("Markov Decision Process Extraction Network", see e.g.: S. Duell, A. Hans, and S. Udluft: The Markov Decision Process Extraction Network. In Proc. of the European Symposium on Artificial Neural Networks, 2010). In particular, the state estimation can determine a Markov state based on the plant data stored in the database.

In one embodiment, an optimized action is determined as the second action on the basis of the state by means of a control strategy.

The state estimation thus provides a state which, by way of the control strategy, allows one action to be selected from a plurality of possible actions. This action can then be used as the second action for controlling the plant.
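The chain from data to state to action can be illustrated as follows. This is a deliberately simple sketch: the moving average is only a stand-in for a real state estimator (it is not the MPEN method), and `q_value` is a hypothetical value function that a learning algorithm would normally supply.

```python
def estimate_state(observations, window=3):
    """Stand-in state estimate: mean of the last `window` observations."""
    recent = observations[-window:]
    return sum(recent) / len(recent)


def control_strategy(state, actions, q_value):
    """Select the action with the highest estimated value in `state`."""
    return max(actions, key=lambda a: q_value(state, a))


def q_value(state, action):
    """Hypothetical value function: prefers a pitch of half the wind speed."""
    return -(action - 0.5 * state) ** 2


wind_speeds = [6.0, 8.0, 10.0, 12.0]          # assumed observations in m/s
state = estimate_state(wind_speeds)            # mean of the last three values
second_action = control_strategy(state, [2.0, 4.0, 5.0, 8.0], q_value)
```

Keeping the state estimator and the control strategy as separate functions mirrors the text: either part can be replaced or retrained without touching the other.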

In an alternative embodiment, a model of the plant is determined or modified by means of a learning algorithm based on data of the plant, and the state estimation is carried out based on the model.

The model can correspond to a representation of the plant. Based on the records stored in the database (see above), the model can be iteratively refined or changed. In this way it is possible, for example, to react flexibly to changes in the plant, or to take into account efficiency gains that became apparent through an exploration of the state space (e.g. a stepwise systematic or (pseudo-)random exploration).

The model can thus be used as a prediction model for determining a Markov state. The dynamics of the plant can be taken into account here; the state estimation approaches the state space of the Markov states.

The learning algorithm used may be, for example,

- an NFQ method ("Neural Fitted Q Iteration", see: M. Riedmiller: Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method. In Proc. of the European Conf. on Machine Learning, 2005),
- an RCNN ("Recurrent Control Neural Network", see: A. M. Schaefer, S. Udluft, and H.-G. Zimmermann. A Recurrent Control Neural Network for Data Efficient Reinforcement Learning. In Proc. of the IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007; or A. M. Schäfer, D. Schneegaß, V. Sterzing, and S. Udluft. A Neural Reinforcement Learning Approach to Gas Turbine Control. International Joint Conference on Neural Networks, 2007) and/or
- a PGNRR method ("Policy Gradient Neural Rewards Regression", see: D. Schneegaß, S. Udluft, and Th. Martinetz. Improving Optimality of Neural Rewards Regression for Data-Efficient Batch Near-Optimal Policy Identification. In Proc. of the International Conf. on Artificial Neural Networks, 2007).

In a next embodiment, the model of the plant is determined by means of a neural network, a tree structure and/or Gaussian processes.

In a further embodiment, the control strategy for selecting the second action based on the state is set or modified based on data of the plant, by means of the learning algorithm and by means of the model of the plant.

Such a generator can be understood as a generator for the state estimation. The generator can also serve to create or modify a strategy for reinforcement learning.

An advantage here is that the proposed approach is applicable to continuous state spaces. For example, a Markov state can be mapped to at least one action by means of the control strategy, the control strategy itself being a result of the learning algorithm.

In one development, the plant can be controlled in a first operating mode and in a second operating mode, the first operating mode denoting normal operation of the plant and the second operating mode denoting operation of the plant for generating the data.

In particular, it is possible to switch between the operating modes. The two operating modes can be activated alternately, randomly or at regular intervals. The operating modes can also be activated according to a predefined scheme. Alternatively, the operating modes can be activated dynamically, e.g. triggered by an improvement in the efficiency of the plant in the first operating mode. For example, the second operating mode may be activated only rarely if the plant is already running sufficiently efficiently or if additional measurements can achieve no (or hardly any) further increase in efficiency.
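One way to realize the dynamic activation described above is to make the probability of entering the second (data-generating) mode depend on how much recent exploration improved efficiency. The sketch below is an assumption for illustration only; the parameter names and the default values `base`, `floor` and `gain_scale` are hypothetical and not taken from the source.

```python
import random

NORMAL, EXPLORE = "normal", "explore"


def exploration_probability(recent_gain, base=0.2, floor=0.01, gain_scale=0.05):
    """Probability of activating the second operating mode.

    `recent_gain` is the relative efficiency gain observed from recent
    exploration. Small gains shrink the probability towards `floor`, so
    the second mode is activated only rarely once the plant runs well.
    """
    scaled = base * min(1.0, max(0.0, recent_gain) / gain_scale)
    return max(floor, scaled)


def select_mode(recent_gain, rng):
    """Randomly pick the operating mode for the next control interval."""
    if rng.random() < exploration_probability(recent_gain):
        return EXPLORE
    return NORMAL


rng = random.Random(0)
modes = [select_mode(recent_gain=0.1, rng=rng) for _ in range(10)]
```

A fixed schedule (e.g. every n-th interval) would be an equally valid reading of the text; the probabilistic rule is shown only because it covers both the random and the dynamic variant.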

In an additional embodiment, in the second operating mode the plant is controlled with a first or second action that serves the targeted acquisition of data that are not yet available or of up-to-date data.

Thus, in the second operating mode, the state space of the plant can be explored in a targeted manner, i.e. the behavior of the plant is determined from the data as a function of the operating parameters set for the plant (i.e. the actions performed). The targeted acquisition of not yet available data about the state space of the plant aims in particular at performing new actions and determining from the data whether they increase or reduce the efficiency of the plant. The stepwise exploration of the state space of the plant can serve to model the plant and thus improve both the state estimation and the control strategy, as results of the learning algorithm.

The exploration of the state space can also take the control strategy into account. Different variants of exploring the state space are possible: for example, a data density can be increased, or the proximity (in the state space) between data already determined (and, where applicable, their efficiency) can be taken into account in order to determine further comparable or completely different data in the state space.
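The proximity-based variant can be sketched as a farthest-point rule: among candidate operating points, pick the one whose nearest already-visited point is farthest away, which targets regions of low data density. This is one possible heuristic chosen for illustration, not a rule stated in the source; the function name and the two-dimensional operating points are assumptions.

```python
import math


def least_explored(candidates, visited):
    """Return the candidate whose nearest visited point is farthest away."""
    if not visited:
        return candidates[0]

    def nearest_distance(candidate):
        return min(math.dist(candidate, v) for v in visited)

    return max(candidates, key=nearest_distance)


# Hypothetical operating points, e.g. (pitch angle, rotor speed), normalized.
visited = [(0.0, 0.0), (1.0, 1.0)]
candidates = [(0.5, 0.5), (5.0, 5.0), (1.0, 0.0)]
next_point = least_explored(candidates, visited)
```

Replacing `max` with `min` would instead increase the data density around points that are already known, which corresponds to the other variant mentioned in the text.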

The above object is also achieved by a device for controlling a plant, having a processing unit that is set up such that

- the plant can be controlled by means of a first action,
- data of the plant can be determined as a result of the control with the first action,
- a second action can be determined on the basis of the data and the plant can be controlled with the second action,
- the first action and/or the second action can be determined in order to determine parts of a state space of the plant.

The processing unit may in particular be a processor unit and/or an at least partially hardwired or logic circuit arrangement that is set up, for example, such that the method as described herein can be carried out. Said processing unit may be or comprise any type of processor or computer with the necessary peripherals (memory, input/output interfaces, input/output devices, etc.).

The above explanations regarding the method apply to the device accordingly. The device may be implemented in one component or distributed over several components. In particular, part of the device may also be connected via a network interface (e.g. the Internet). For example, the database explained above may be a database in a network.

According to one development, the device serves to control at least one wind turbine.

Exemplary embodiments of the invention are illustrated and explained below with reference to the drawings.

The figures show:

Fig. 1 a block diagram illustrating an (autonomous) learning approach for controlling a wind turbine;

Fig. 2 a schematic arrangement with several wind turbines that can be controlled via a common control unit.

It is proposed in particular to efficiently control a plant that can be adjusted by means of operating parameters. The plant may be any technical system that has, e.g., electrical and/or mechanical components. The plant may comprise, e.g., a technical system, an automation system, a manufacturing plant or a machine. In particular, the plant may comprise a wind turbine. It is also possible for the plant to comprise several wind turbines which, e.g. in the form of a wind farm, can be controlled at least partly jointly.

Operating conditions of the plant can significantly influence a target variable of the plant. The operating conditions may be, for example, ambient conditions, environmental influences, etc. The target variable of the plant may be, e.g., an electrical power provided, a product provided, a service provided, a quality (e.g. of the product or service provided) or the like.

In the following, a wind turbine is considered as an exemplary plant. Other plants are possible accordingly. The plant may also comprise a plurality of wind turbines.

By adapting the operating parameters of the wind turbine, it is possible (e.g. for a class or group of wind turbines, possibly installed in a wind farm) to provide an optimized control strategy, or to learn and/or modify the control strategy on the basis of existing data, and thus to take into account effects such as air density, shear winds, start-up conditions of the wind turbine and other factors.

The operating parameters can be adapted automatically. For example, a global optimization of the wind turbine can be achieved by using a state of the turbine that takes only global effects into account, e.g.:

- global weather conditions,
- global wind conditions and/or
- general turbine characteristics.

A local optimization of the wind turbine can be based on a state that takes into account, for example, the following site-specific information:

- local wind directions,
- locally occurring turbulence and/or
- locally occurring shear winds.

Taking into account data from a model (e.g. a design model) of the wind turbine and wind profiles of different turbulence intensities,

- a rotational speed of the rotor of the wind turbine and
- a pitch angle of the rotor blades

can be set in order to improve efficiency. The power provided by the turbine is given by

$$p(t) = 0.5 \cdot \rho(t) \cdot A \cdot C_p(t) \cdot v_{\mathrm{wind}}(t)^3 - p_I(t), \qquad (1)$$

where

$t$: the time,
$\rho(t)$: an air density,
$A$: a rotor area,
$C_p(t)$: a power coefficient,
$v_{\mathrm{wind}}(t)$: a wind speed and
$p_I(t)$: losses due to friction and auxiliary power

are denoted.
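Equation (1) can be evaluated directly. The following sketch shows the arithmetic; the numerical values in the usage lines (air density, rotor radius, power coefficient, wind speed, losses) are illustrative assumptions and do not come from the source.

```python
import math


def turbine_power(air_density, rotor_area, power_coefficient, wind_speed, losses=0.0):
    """Evaluate equation (1): p = 0.5 * rho * A * C_p * v_wind**3 - p_I.

    Units are assumed to be SI: kg/m^3, m^2, dimensionless, m/s, W.
    """
    aerodynamic = 0.5 * air_density * rotor_area * power_coefficient * wind_speed ** 3
    return aerodynamic - losses


# Illustrative values only (assumptions, not data from the source).
rotor_area = math.pi * 50.0 ** 2      # rotor radius of 50 m
power_w = turbine_power(
    air_density=1.225,
    rotor_area=rotor_area,
    power_coefficient=0.45,
    wind_speed=8.0,
    losses=20_000.0,
)
```

Because the wind speed enters with the third power, doubling it increases the aerodynamic term by a factor of eight, which is why small changes in local wind conditions matter so much for the power provided.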

For a given wind speed, the power can be optimized by using a power coefficient that is optimized by means of an aerodynamic model of the turbine and its rotor blades.

The result of such an optimization can be expressed by the function

$$P_{\mathrm{el}}(t) = f(v_{\mathrm{wind}}(t)), \qquad (2)$$

which provides an optimized power setpoint $P_{\mathrm{el}}(t)$ for each wind speed. The wind speed can be replaced, for example, by the speed of the rotor. This is possible because, at a given wind speed and a given pitch angle of the rotor blades, the speed of the rotor results from an equilibrium between the torque of the rotor caused by the wind and the torque of the generator. Using a speed $n(t)$ of the rotor, the power setpoint is thus given by

$$P_{\mathrm{el}}(t) = f_{\mathrm{sp}}(n(t)), \qquad (3)$$

where the function $f_{\mathrm{sp}}$ provides a static mapping from a rotor speed to a power setpoint.
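A static mapping such as $f_{\mathrm{sp}}$ in equation (3) is often stored as a lookup table. The sketch below builds such a function by linear interpolation between support points; the interpolation scheme, the function name and the table values are assumptions for illustration, since the source does not specify how the mapping is represented.

```python
from bisect import bisect_right


def make_setpoint_curve(points):
    """Build f_sp from (rotor_speed, power_setpoint) pairs.

    Values between support points are linearly interpolated; values
    outside the table are clamped to the first or last setpoint.
    """
    pts = sorted(points)
    speeds = [speed for speed, _ in pts]

    def f_sp(n):
        if n <= speeds[0]:
            return pts[0][1]
        if n >= speeds[-1]:
            return pts[-1][1]
        i = bisect_right(speeds, n)
        (n0, p0), (n1, p1) = pts[i - 1], pts[i]
        return p0 + (p1 - p0) * (n - n0) / (n1 - n0)

    return f_sp


# Hypothetical table: rotor speed in rpm -> power setpoint in kW.
f_sp = make_setpoint_curve([(6.0, 0.0), (10.0, 400.0), (14.0, 2000.0)])
setpoint = f_sp(8.0)
```

Such a table is static by construction, which is the limitation the learning approach addresses: it cannot react to local effects such as turbulence or shear winds.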

To take further influences on the wind turbine into account, and thereby improve the efficiency of the wind turbine with respect to the power it provides as the target variable, an autonomous learning approach in particular is proposed. With this approach, the operating parameters of the wind turbine are optimized by actively exploring a state space of the operating parameters that are possible for the wind turbine. The information obtained in this way, in particular the influence of the operating parameters on the target variable, is used to derive an optimized control strategy and to apply it to the operation of the wind turbine.

It is thus also possible to take into account local conditions that affect the operation of the wind turbine, e.g. in the form of turbulence and wind shear. In addition, further factors that likewise affect the operation of the wind turbine, e.g. temperature, humidity, dust, ice, etc., can be taken into account.

FIG. 1 shows a block diagram illustrating an (autonomous) learning approach for controlling a wind turbine 101.

A control unit 102 carries out an action 110 via a unit 103 for implementing an exploration strategy by setting or modifying the operating parameters of the wind turbine 101. As a result of the action 110, data 111 are provided by, or determined from, the wind turbine 101. The data 111 may be measured values, information or observations, including those relating to other wind turbines of a wind farm. The data 111 are stored, for example, in a database 106. Preferably, data sets comprising the data 111 together with the operating parameters specified by the control unit 102 are stored in the database.
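One possible way to keep the observed data together with the operating parameters that produced them is sketched below. The class names, field names and sample values are hypothetical; the application does not prescribe a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One data set: the operating parameters applied and the data observed."""
    operating_parameters: dict
    observations: dict


@dataclass
class Database:
    """Minimal in-memory stand-in for the database 106."""
    records: list = field(default_factory=list)

    def store(self, operating_parameters, observations):
        # Copy the dictionaries so later changes by the caller do not alter the record.
        self.records.append(Record(dict(operating_parameters), dict(observations)))


db = Database()
db.store({"pitch_deg": 2.0, "power_setpoint_kw": 800.0},
         {"rotor_speed_rpm": 10.4, "power_kw": 785.0})
```

Storing action and observation in one record is what later allows a learning algorithm to relate the operating parameters to the target variable.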

Based on the data 111, a state 112 (e.g. a Markov state) of the wind turbine 101 can be determined by means of a state estimator 104. Based on the state 112, a unit 105 that implements a control strategy proposes an optimized or optimal action 113 to the unit 103 for implementing the exploration strategy.

It should be noted that the unit 103 can provide either an action 110 for optimized operation of the wind turbine 101 or an action 110 for the best possible exploration of the state space of the wind turbine 101. In particular, the unit 103 can switch between a first operating mode, "efficient operation", and a second operating mode, "information gathering".
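The choice between the two kinds of action can be pictured as a simple probabilistic switch. The sketch below assumes a fixed exploration probability, which is only one of many conceivable switching rules; the function and parameter names are hypothetical.

```python
import random


def select_action(proposed_action, candidate_actions, explore_probability, rng):
    """Return (mode, action): explore with the given probability, otherwise exploit."""
    if rng.random() < explore_probability:
        # Second operating mode: pick an action that gathers information.
        return "information gathering", rng.choice(candidate_actions)
    # First operating mode: pass on the action proposed by the control strategy.
    return "efficient operation", proposed_action


rng = random.Random(0)
mode, action = select_action({"pitch_deg": 2.0},
                             [{"pitch_deg": 1.0}, {"pitch_deg": 3.0}],
                             explore_probability=0.1, rng=rng)
```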

By way of example, FIG. 1 shows the units 103, 105 and the state estimator 104 as part of the control unit 102. Other distributed or centralized implementations in one or more functional or physical components are possible. The control unit 102 can perform or provide the function of a runtime controller (online control) of the wind turbine 101.

The state estimator 104 can determine the Markov state, for example, by means of a so-called MPEN method (MPEN: Markov Decision Process Extraction Network). In particular, the state estimator can determine a compact Markov state based on accumulated data 111 of the wind turbine 101 (e.g. data collected over a period of time, in particular multi-dimensional data).

The action 110 is transmitted to the wind turbine 101, for example in the form of a set of operating parameters. During the autonomous learning approach, the exploration strategy of the unit 103 can be used to modify this action 110 in order to search the state space of the wind turbine 101 (which is adjustable by means of the operating parameters), e.g. systematically or (pseudo-)randomly.
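The difference between a systematic and a (pseudo-)random search can be sketched as follows. The parameter names and permitted values are hypothetical; a real turbine would impose its own limits on what may be set.

```python
import itertools
import random

# Hypothetical adjustable operating parameters and their permitted values.
PARAMETER_GRID = {
    "pitch_offset_deg": [-1.0, 0.0, 1.0],
    "setpoint_scale": [0.95, 1.0, 1.05],
}


def systematic_actions(grid):
    """Yield every combination of parameter values in a fixed order."""
    names = sorted(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def random_action(grid, rng):
    """Draw one parameter combination pseudo-randomly."""
    return {name: rng.choice(values) for name, values in grid.items()}
```

A systematic sweep guarantees coverage of the grid, while random draws avoid visiting the combinations in a fixed sequence that might correlate with changing wind conditions.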

Thus, for example, the wind turbine 101 can deliberately be operated with possibly non-optimal actions 110 in order to obtain data 111 that help the learning approach to take into account new operating points in the state space beyond the operating points already known. A learning algorithm 107 can be executed based on the data 111 or on the data sets stored in the database 106. In this way, a model of the wind turbine 101 can be improved, for example in the form of a neural network 108 that can be used by the state estimator 104 to enable an improved estimation of states 112. Furthermore, the learning algorithm 107 can, by means of a control generator 109, modify and in particular improve the control strategy provided by the unit 105 (see arrow 114), so that the actions 113 selected by this unit 105 are also optimized. The neural network 108 can modify both the state estimator 104 and the control generator 109 (see connection 115).

The learning algorithm 107 can be provided by at least one component. The learning algorithm 107 can be executable independently (offline) and can update the model of the wind turbine 101 based on the data in the database 106, e.g. continuously or at predetermined times or time intervals.

Instead of the neural network 108, a generator for the state estimator 104 can also be provided. Instead of the control generator 109, a generator for a reinforcement learning ("RL") strategy can be provided, which, based on determined states, provides the unit 105 with an optimized control strategy that assigns optimized actions to given Markov states.

The learning algorithm 107 can be based, for example, on an NFQ (Neural Fitted Q Iteration) method. The NFQ method relies on a conventional feed-forward neural network (FFNN), which is used to learn a function Q.

Based on input data I, the neural network supplies target data T_i, where i denotes an iteration counter. The learning can be illustrated by the following three steps:

Step 1:

The variables of the learning algorithm are initialized:

i = 0
r = f_reward(s, a, s')
T_0 = r
I = {s, a} (4)

where the symbols have the following meanings:

i: the iteration counter (i.e. a step i of the iteration),
r: a vector of rewards (which can alternatively be implemented as costs),
s: a vector of states,
a: a vector of actions,
s': a vector of the states that follow the states s,
f_reward(...): a reward function,
T_0: a vector of target data of the neural network for the first iteration (i = 0), and
I: the input data for the neural network.
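The initialization in Step 1 can be sketched in a few lines. The representation of a transition as a tuple (s, a, s') and the example reward function are assumptions made only for this illustration.

```python
def initialise(transitions, f_reward):
    """Step 1: build rewards r, first targets T_0 = r and inputs I = {s, a}."""
    i = 0
    r = [f_reward(s, a, s_next) for s, a, s_next in transitions]
    targets = list(r)                             # T_0 = r
    inputs = [(s, a) for s, a, _ in transitions]  # I = {s, a}
    return i, r, targets, inputs


# Hypothetical reward: the power gained in the transition (state = power in kW).
transitions = [(700.0, "pitch+", 720.0), (720.0, "pitch-", 705.0)]
i, r, T0, I = initialise(transitions, lambda s, a, s_next: s_next - s)
```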

The vector r of rewards can be estimated based on the vector of states s, the actions a and the subsequent states s', since the reward function f_reward is often not known explicitly. The input data I can be present as a set of the data 111 described above together with their associated actions 110.

Step 2:

A mapping of the input data I to the target data T_i is determined by means of a function Q_i using the FFNN:

T_i = Q_i(I). (5)

Step 3:

The target data of the next iteration T_{i+1} are determined as follows:

T_{i+1} = r + γ · max_{a' ∈ A} Q_i(s', a') (6)

with γ ∈ [0, 1].

The target data of the next iteration are determined by adding the reward r and an estimated additional reward Q_i for the action a' in the state s'. The action a' is determined from all available actions A as the action that yields the greatest reward Q_i.
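A small numerical sketch may make this update concrete. It assumes the usual fitted Q iteration form, in which the estimated additional reward is weighted by the factor γ, i.e. T_{i+1} = r + γ · max over a' of Q_i(s', a'). The states, actions and Q values are hypothetical.

```python
def next_target(r, gamma, q_i, s_next, actions):
    """T_{i+1} = r + gamma * max over a' of Q_i(s', a')."""
    return r + gamma * max(q_i(s_next, a) for a in actions)


# Hypothetical Q_i estimates for the successor state "s1".
q_values = {("s1", "left"): 2.0, ("s1", "right"): 5.0}
target = next_target(r=1.0, gamma=0.9,
                     q_i=lambda s, a: q_values[(s, a)],
                     s_next="s1", actions=["left", "right"])
```

Here the action "right" is selected as a' because it has the larger estimate, so the target is the immediate reward plus 0.9 times that estimate.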

The influence of the reward Q_i relative to the reward r can be adjusted by means of a correction factor γ, which lies, for example, in an interval between 0 and 1.

Steps 2 and 3 can be carried out iteratively until the iteration counter i reaches or exceeds a predefined threshold value i_max, or until a predefined termination condition is met.
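The three steps can be put together in a compact batch loop. The sketch below is not the NFQ method itself: for brevity a lookup table that averages the targets per state-action pair stands in for the feed-forward neural network, and the loop stops only on the iteration limit i_max. Transitions are assumed to be tuples (s, a, r, s').

```python
from collections import defaultdict


def fitted_q_iteration(transitions, actions, gamma, i_max):
    """Batch Q iteration; a lookup table stands in for the feed-forward network."""
    q = defaultdict(float)                       # Q estimate, 0 for unseen pairs
    targets = [r for _, _, r, _ in transitions]  # Step 1: T_0 = r
    for i in range(i_max):
        # Step 2: fit Q_i to the targets (here: mean target per (s, a) pair).
        sums, counts = defaultdict(float), defaultdict(int)
        for (s, a, _, _), t in zip(transitions, targets):
            sums[(s, a)] += t
            counts[(s, a)] += 1
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
        # Step 3: T_{i+1} = r + gamma * max over a' of Q_i(s', a').
        targets = [r + gamma * max(q[(s_next, b)] for b in actions)
                   for _, _, r, s_next in transitions]
    return q
```

Replacing the table by a neural network changes only Step 2, where the network would be trained on the pairs of inputs and targets instead of averaging them.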

The learning algorithm can also be replaced by an RCNN algorithm (Recurrent Control Neural Network, i.e. a neural controller unfolded in time), in which case the control strategy (see unit 105) can provide a continuous action for an estimated state. The continuous action does not necessarily have to be determined in advance, since this algorithm permits generalization between actions. It is also possible to use a PGNRR algorithm (Policy Gradient Neural Rewards Regression, i.e. neural gradient-based control by reward regression).

Since the wind speed at the turbine itself cannot be measured accurately, the objective function cannot be determined with the required accuracy. An autonomous learning approach is therefore proposed that requires no explicit objective function and, by means of a learning algorithm, autonomously improves the target variable of a plant, here the power output of the wind turbine.

FIG. 2 shows a schematic arrangement with several wind turbines 101 that can be controlled via a common control unit 201. To this extent, the control unit 102 described in connection with FIG. 1 can provide actions for a plurality of wind turbines 101. The data from the wind turbines 101 can also be stored by means of the control unit 201 (cf. the functionality of the database 106 shown in FIG. 1). In the example according to FIG. 2, a central unit 202 provides the functionality of the learning algorithm 107 shown in FIG. 1.

The present solution is suitable for combined operation of a plant, a system or a machine, e.g. at least one wind turbine, in which it is possible to switch between a first operating mode, "efficient operation", and a second operating mode, "gathering information about the state space of the plant". In the second operating mode, as explained above, data are collected as responses to possible actions (i.e. settings) of the plant. Targeted control with specified actions thus makes it possible to explore the state space of the plant and so enlarge a knowledge base, on the basis of which improved (e.g. more efficient) operation of the plant is then made possible. In the first operating mode, the plant is controlled on the basis of the information obtained (and, where applicable, further specified information). The plant is controlled by setting its operating parameters.

The change between the first and second operating modes can follow a predefined fixed scheme or a dynamic scheme. For example, a time base for controlling the plant can be divided into time intervals. Time intervals can then be used regularly, irregularly or randomly for the first or for the second operating mode. It is also possible for the change in efficiency to serve as a measure for modifying the duration or frequency of at least one of the operating modes: if, for example, iterating the second operating mode yields little or no further gain in efficiency, the second operating mode can be activated correspondingly less often. Otherwise, the gathering of information can be stepped up in order to increase efficiency further. The second operating mode can also be activated when the plant does not have to be operated (efficiently), e.g. when a wind turbine does not have to supply power, or when it is in a fault mode; in such cases, data about the plant can at least be collected.
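A dynamic scheme of this kind could, for example, adapt the share of time intervals spent in the second operating mode to the efficiency gain observed recently. The rule below is purely illustrative; the threshold, the step factor and the bounds are hypothetical tuning values.

```python
def update_explore_fraction(fraction, efficiency_gain, threshold=0.01,
                            step=0.5, minimum=0.01, maximum=0.5):
    """Shrink the share of exploration intervals when gains stall, else grow it."""
    if efficiency_gain < threshold:
        # Little or no further improvement: explore less often.
        return max(minimum, fraction * step)
    # Exploration still pays off: gather information more often.
    return min(maximum, fraction / step)
```

For instance, with the default values a fraction of 0.2 is halved when the gain falls below the threshold and doubled otherwise, always staying within the configured bounds.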

CITATIONS INCLUDED IN THE DESCRIPTION

This list of the documents cited by the applicant was generated automatically and is included solely for the reader's information. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Cited non-patent literature

"Markov Decision Process Extraction Network", see for example: S. Duell, A. Hans, and S. Udluft: The Markov Decision Process Extraction Network. In Proc. of the European Symposium on Artificial Neural Networks, 2010 [0021]
"Neural Fitted Q Iteration", see: M. Riedmiller: Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method. In Proc. of the European Conf. on Machine Learning, 2005 [0027]
"Recurrent Control Neural Network", see: A. M. Schaefer, S. Udluft, and H.-G. Zimmermann: A Recurrent Neural Network for Data Efficient Reinforcement Learning. In Proc. of the IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007 [0027]
A. M. Schaefer, D. Schneegass, V. Sterzing, and S. Udluft: A Neural Reinforcement Learning Approach to Gas Turbine Control. In Proc. of the International Joint Conference on Neural Networks, 2007 [0027]
"Policy Gradient Neural Rewards Regression", see: D. Schneegass, S. Udluft, and Th. Martinetz: Improving Optimality of Neural Rewards Regression for Data-Efficient Batch Near-Optimal Policy Identification. In Proc. of the International Conf. on Artificial Neural Networks, 2007 [0027]

Claims

1. Method for controlling a plant,
- in which the plant is controlled with a first action,
- in which data of the plant are determined as a result of the control with the first action,
- in which a second action is determined on the basis of the data and the plant is controlled with the second action,
- in which the first action and/or the second action is determined in order to explore parts of a state space of the plant.

2. Method according to claim 1, in which the first action and/or the second action is an action for which no data are yet available or for which the available data are obsolete.

3. Method according to one of the preceding claims, in which the first action and the second action comprise a setting of operating parameters of the plant.

4. Method according to one of the preceding claims, in which the plant comprises at least one of the following components:
- a technical system,
- an automation system,
- a manufacturing plant,
- a production plant,
- a machine,
- a wind turbine,
- a gas turbine,
- a turbomachine,
- a power grid,
- a power distribution network,
- a load distribution network,
- a communication network.

5. Method according to one of the preceding claims, in which the determined data comprise measured values, information and/or observations of the plant.

6. Method according to one of the preceding claims, in which the data are stored together with the associated first action.

7. Method according to one of the preceding claims, in which the second action is determined on the basis of the data by determining a state, in particular a Markov state, by means of a state estimation.

8. Method according to claim 7, in which an optimized action is determined as the second action based on the state by means of a control strategy.

9. Method according to one of claims 7 or 8, in which a model of the plant is determined or modified by means of a learning algorithm based on data of the plant, and the state estimation is carried out based on the model.

10. Method according to claim 9, in which the model of the plant is determined by means of a neural network, a tree structure and/or Gaussian processes.

11. Method according to one of claims 9 or 10, in which the control strategy for selecting the second action based on the state is set or modified by means of the learning algorithm and the model of the plant, based on data of the plant.

12. Method according to one of the preceding claims, in which the plant can be controlled in a first operating mode and in a second operating mode, the first operating mode denoting normal operation of the plant and the second operating mode denoting operation of the plant for generating the data.

13. Method according to claim 12, in which, in the second operating mode, the plant is controlled with a first or second action that serves the targeted acquisition of data that do not yet exist or of up-to-date data.

14. Device for controlling a plant, having a processing unit that is set up such that
- the plant can be controlled by means of a first action,
- data of the plant can be determined as a result of the control with the first action,
- a second action can be determined on the basis of the data and the plant can be controlled with the second action,
- the first action and/or the second action can be determined in order to explore parts of a state space of the plant.

15. Device according to claim 14 for controlling at least one wind turbine.