US20150370227A1 - Controlling a Target System - Google Patents

Controlling a Target System

Info

Publication number
US20150370227A1
Authority
US
United States
Prior art keywords
control policies
control
target system
weights
policies
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/309,641
Other languages
English (en)
Inventor
Hany F. Bassily
Clemens Otte
Siegmund Düll
Michael Müller
Steffen Udluft
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens AG
Original Assignee
Siemens AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens AG
Priority to US14/309,641 (US20150370227A1)
Assigned to SIEMENS AKTIENGESELLSCHAFT. Assignment of assignors interest (see document for details). Assignors: UDLUFT, STEFFEN; DÜLL, SIEGMUND; OTTE, CLEMENS; MÜLLER, MICHAEL
Assigned to SIEMENS ENERGY, INC. Assignment of assignors interest (see document for details). Assignor: BASSILY, HANY F.
Assigned to SIEMENS AKTIENGESELLSCHAFT. Assignment of assignors interest (see document for details). Assignor: SIEMENS ENERGY, INC.
Priority to EP15725521.7A (EP3129839B1)
Priority to PCT/EP2015/060298 (WO2015193032A1)
Priority to KR1020177001589A (KR101963686B1)
Priority to CN201580032397.5A (CN106462117B)
Publication of US20150370227A1
Priority to US15/376,794 (US10747184B2)
Legal status: Abandoned

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/027Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/20Pc systems
    • G05B2219/23Pc programming
    • G05B2219/23288Adaptive states; learning transitions
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/20Pc systems
    • G05B2219/25Pc structure of the system
    • G05B2219/25255Neural network
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Definitions

  • The control of complex dynamical technical systems may be optimized by data-driven approaches.
  • Various aspects of such dynamical systems may be improved. For example, efficiency, combustion dynamics, or emissions for gas turbines may be improved. Additionally, life-time consumption, efficiency, or yaw for wind turbines may be improved.
  • Modern data driven optimization utilizes machine learning methods for improving control policies (e.g., control strategies) of dynamical systems with regard to general or specific optimization goals.
  • Such machine learning methods may outperform conventional control strategies. For example, if the controlled system is changing, an adaptive control approach capable of learning and adjusting a control strategy according to the new situation and new properties of the dynamical system may be advantageous over conventional non-learning control strategies.
  • Known methods for machine learning include reinforcement learning methods that focus on data-efficient learning for a specified dynamical system. However, even when using these methods, it may take some time until a good data-driven control strategy is available after a change of the dynamical system. Until then, the changed dynamical system operates outside a possibly optimized envelope. If the change rate of the dynamical system is very high, only sub-optimal results for a data-driven optimization may be achieved, since a sufficient amount of operational data may never be available.
  • Control of a target system that allows more rapid learning of a control policy (e.g., for a changing target system) is provided.
  • Embodiments of a method, a controller, and a computer program product for controlling a target system (e.g., a gas or wind turbine or another technical system) by a processor are based on a pool of control policies.
  • The method, controller, or computer program product (a non-transitory computer readable storage medium having instructions that, when executed by a processor, perform actions) is configured to receive the pool of control policies, which includes a plurality of control policies, and to receive weights for weighting each of the plurality of control policies.
  • The plurality of control policies is weighted by the weights to provide a weighted aggregated control policy.
  • The target system is controlled using the weighted aggregated control policy, and performance data relating to a performance of the controlled target system are received.
  • The weights are adjusted by the processor based on the received performance data to improve the performance of the controlled target system.
  • The plurality of control policies is reweighted by the adjusted weights to adjust the weighted aggregated control policy.
  • One or more of the present embodiments allow for an effective learning of peculiarities of the target system by adjusting the weights for the plurality of control policies.
  • Such weights may include far fewer parameters than the pool of control policies.
  • Adjusting the weights may therefore take much less computing effort and may converge much faster than training the whole pool of control policies.
  • A high level of optimization may thus be reached in a shorter time. For example, a reaction time to changes of the target system may be significantly reduced. Aggregating a plurality of control policies reduces the risk of accidentally choosing a poor policy, thus increasing the robustness of the method.
  • The weights may be adjusted by training a neural network run by the processor.
  • Using a neural network to adjust the weights allows for efficient learning and flexible adaptation.
  • The plurality of control policies may be calculated from different data sets of operational data of one or more source systems (e.g., by training a neural network).
  • The different data sets may relate to different source systems, to different versions of one or more source systems, to different policy models, to source systems in different climes, or to one or more source systems under different conditions (e.g., before and after repair, maintenance, changed parts, etc.).
  • The one or more source systems may be chosen to be similar to the target system, so that control policies optimized for the one or more source systems are expected to perform well for the target system. Therefore, the plurality of control policies based on one or more similar source systems is a good starting point for controlling the target system. Such learning from similar situations is often denoted as “transfer learning.” Hence, much less performance data relating to the target system is needed in order to obtain a good aggregated control policy for the target system. Thus, effective aggregated control policies may be learned in a short time even for target systems with scarce data.
  • The calculation of the plurality of control policies may use a reward function relating to a performance of the source systems. That reward function may also be used for adjusting the weights.
  • The performance data may include state data relating to a current state of the target system.
  • The plurality of control policies may be weighted and/or reweighted depending on the state data. This allows for a more accurate and more effective adjustment of the weights. For example, the weight of a control policy may be increased if a state is recognized where the control policy turned out to perform well, and vice versa.
  • The performance data may be received from the controlled target system, from a simulation model of the target system, and/or from a policy evaluation.
  • Performance data from the controlled target system allows monitoring the actual performance of the target system and may improve the performance by learning a particular response characteristic of the target system.
  • A simulation model of the target system also allows what-if queries for the reward function. With a policy evaluation, a Q-function may be set up, allowing an expectation value to be determined for the reward function.
  • An aggregated control action for controlling the target system may be determined according to the weighted aggregated control policy by weighted majority voting, by forming a weighted mean, and/or by forming a weighted median from action proposals according to the plurality of control policies.
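  • As a minimal sketch of the three aggregation options named above (weighted majority voting, weighted mean, and weighted median), the following Python functions combine action proposals with per-policy weights. The function names and the assumption of non-negative weights are illustrative and do not come from the source; the median shown is the lower weighted median, which is only one of several possible conventions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def weighted_majority_vote(proposals: Sequence[Hashable],
                           weights: Sequence[float]) -> Hashable:
    """Return the discrete action whose supporters carry the largest total weight."""
    totals: Dict[Hashable, float] = defaultdict(float)
    for action, weight in zip(proposals, weights):
        totals[action] += weight
    return max(totals, key=totals.get)


def weighted_mean(proposals: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weight-averaged continuous action proposal."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(p * w for p, w in zip(proposals, weights)) / total


def weighted_median(proposals: Sequence[float], weights: Sequence[float]) -> float:
    """Return the smallest proposal at which the cumulative weight reaches half the total."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    ordered = sorted(zip(proposals, weights))
    cumulative = 0.0
    for proposal, weight in ordered:
        cumulative += weight
        if cumulative >= total / 2:
            return proposal
    return ordered[-1][0]  # fallback for floating-point rounding
```

  • Majority voting fits discrete action proposals, while the mean and median fit continuous ones; the median is less sensitive to a single outlying proposal.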
  • The training of the neural network may be based on a reinforcement learning model, which allows efficient learning of control policies for dynamical systems.
  • The neural network may operate as a recurrent neural network. This allows for maintaining an internal state, enabling efficient detection of time-dependent patterns when controlling a dynamical system. Many Partially Observable Markov Decision Processes may be handled like Markov Decision Processes by a recurrent neural network.
  • The plurality of control policies may be selected from the pool of control policies depending on a performance evaluation of control policies.
  • The selected control policies may establish an ensemble of control policies. For example, only those control policies may be selected from the pool of control policies that perform well according to a predefined criterion.
  • Control policies from the pool of control policies may be included into the plurality of control policies or excluded from the plurality of control policies in dependence of the adjusted weights. This allows improvement of the selection of control policies contained in the plurality of control policies. So, for example, control policies with very small weights may be removed from the plurality of control policies in order to reduce a computational effort.
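  • The exclusion of control policies with very small weights might look like the following sketch. The threshold value, the policy names, and the renormalization of the remaining weights are assumptions made for illustration; the source only states that such policies may be removed to reduce computational effort.

```python
from typing import Dict


def prune_ensemble(weights: Dict[str, float],
                   min_weight: float = 0.01) -> Dict[str, float]:
    """Drop policies whose weight is below min_weight and renormalize the rest.

    If every weight is below the threshold, the single best policy is kept
    so that the ensemble never becomes empty (an illustrative safeguard).
    """
    kept = {name: w for name, w in weights.items() if w >= min_weight}
    if not kept:
        best = max(weights, key=weights.get)
        kept = {best: weights[best]}
    total = sum(kept.values())
    return {name: w / total for name, w in kept.items()}
```

  • For example, `prune_ensemble({"P1": 0.6, "P2": 0.395, "P3": 0.005})` keeps only the hypothetical policies `P1` and `P2`, so later control steps no longer need to evaluate `P3`.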
  • FIG. 1 illustrates an exemplary embodiment including a target system and a plurality of source systems together with controllers generating a pool of control policies.
  • FIG. 2 illustrates the target system together with a controller in greater detail.
  • FIG. 1 illustrates an exemplary embodiment including a target system TS and a plurality of source systems S 1 , . . . , SN.
  • The target system TS and the plurality of source systems S 1 , . . . , SN may be gas or wind turbines or other dynamical systems, including simulation tools for simulating a dynamical system.
  • The source systems S 1 , . . . , SN are chosen to be similar to the target system TS.
  • The source systems S 1 , . . . , SN may also include the target system TS at a different time (e.g., before maintenance of the target system TS or before exchange of a system component, etc.).
  • The target system TS may be one of the source systems S 1 , . . . , SN at a later time.
  • Each of the source systems S 1 , . . . , SN is controlled by a reinforcement learning controller RLC 1 , . . . , or RLCN, respectively.
  • The reinforcement learning controllers RLC 1 , . . . , or RLCN are driven by control policies P 1 , . . . , or PN, respectively.
  • The reinforcement learning controllers RLC 1 , . . . , RLCN may each include a recurrent neural network (not shown) for learning (e.g., optimizing the control policies P 1 , . . . , PN).
  • Operational data OD 1 , . . . , ODN of the source systems S 1 , . . . , SN are collected and stored in databases DB 1 , . . . , DBN.
  • The operational data OD 1 , . . . , ODN are processed according to the control policies P 1 , . . . , PN, and the control policies P 1 , . . . , PN are refined by reinforcement learning by the reinforcement learning controllers RLC 1 , . . . , RLCN.
  • The control output of the control policies P 1 , . . . , PN is fed back into the respective source system S 1 , . . . , or SN via a control loop CL, resulting in a closed learning loop for the respective control policy P 1 , . . . , or PN.
  • The control policies P 1 , . . . , PN are fed into a reinforcement learning policy generator PGEN that generates a pool P of control policies including the control policies P 1 , . . . , PN.
  • The target system TS is controlled by a reinforcement learning controller RLC including a recurrent neural network RNN and an aggregated control policy ACP.
  • The reinforcement learning controller RLC receives the control policies P 1 , . . . , PN from the reinforcement learning policy generator PGEN and generates the aggregated control policy ACP from the control policies P 1 , . . . , PN.
  • The reinforcement learning controller RLC receives performance data PD relating to a current performance of the target system TS (e.g., a current power output, a current efficiency, etc.) from the target system TS.
  • The performance data PD includes state data SD relating to a current state of the target system TS (e.g., temperature, rotation speed, etc.).
  • The performance data PD is input to the recurrent neural network RNN for training of the recurrent neural network RNN and input to the aggregated control policy ACP for generating an aggregated control action for controlling the target system TS via a control loop CL. This results in a closed learning loop for the reinforcement learning controller RLC.
  • Using pre-trained control policies P 1 , . . . , PN from several similar source systems S 1 , . . . , SN gives a good starting point for a neural model run by the reinforcement learning controller RLC. With that, the amount of data and/or time required for learning an efficient control policy for the target system TS may be reduced considerably.
  • FIG. 2 illustrates one embodiment of the target system TS together with the reinforcement learning controller RLC in greater detail.
  • The reinforcement learning controller RLC includes a processor PROC and, as already mentioned above, the recurrent neural network RNN and the aggregated control policy ACP.
  • The recurrent neural network RNN implements a reinforcement learning model.
  • The performance data PD(SD) including the state data SD stemming from the target system TS is input to the recurrent neural network RNN and to the aggregated control policy ACP.
  • The control policies P 1 , . . . , PN are input to the reinforcement learning controller RLC.
  • The control policies P 1 , . . . , PN may include the whole pool P or a selection of control policies from the pool P.
  • The recurrent neural network RNN is adapted to train a weighting policy WP including weights W 1 , . . . , WN for weighting each of the control policies P 1 , . . . , PN.
  • The weights W 1 , . . . , WN are initialized by initial weights IW 1 , . . . , IWN received by the reinforcement learning controller RLC (e.g., from the reinforcement learning policy generator PGEN or from a different source).
  • The aggregated control policy ACP relies on an aggregation function AF receiving the weights W 1 , . . . , WN from the recurrent neural network RNN and on the control policies P 1 , . . . , PN.
  • Each of the control policies P 1 , . . . , PN or a pre-selected part of the control policies P 1 , . . . , PN receives the performance data PD(SD) with the state data SD and calculates from the performance data PD(SD) and the state data SD a specific action proposal AP 1 , . . . , or APN, respectively.
  • The action proposals AP 1 , . . . , APN are input to the aggregation function AF, which weights each of the action proposals AP 1 , . . . , APN with a respective weight W 1 , . . . , or WN to generate an aggregated control action AGGA.
  • The action proposals AP 1 , . . . , APN may be weighted (e.g., by majority voting, by forming a weighted mean, and/or by forming a weighted median from the control policies P 1 , . . . , PN).
  • The target system TS is controlled by the aggregated control action AGGA.
  • The performance data PD(SD) resulting from the control of the target system TS by the aggregated control action AGGA are fed back to the aggregated control policy ACP and to the recurrent neural network RNN.
  • New specific action proposals AP 1 , . . . , APN are then calculated by the control policies P 1 , . . . , PN.
  • The recurrent neural network RNN uses a reward function (not shown) relating to a desired performance of the target system TS for adjusting the weights W 1 , . . . , WN depending on the performance data PD(SD) fed back from the target system TS.
  • The weights W 1 , . . . , WN are adjusted by reinforcement learning with an optimization goal directed to an improvement of the desired performance.
  • With the adjusted weights W 1 , . . . , WN, an update UPD of the aggregation function AF is made.
  • The updated aggregation function AF weights the new action proposals AP 1 , . . . , APN (e.g., reweights the control policies P 1 , . . . , PN) by the adjusted weights W 1 , . . . , WN in order to generate a new aggregated control action AGGA for controlling the target system TS.
  • The above acts implement a closed learning loop leading to a considerable improvement of the performance of the target system TS.
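  • The closed loop described above (aggregate the action proposals, observe a reward, adjust the weights, reweight) can be illustrated with a deliberately simplified sketch. It does not use a recurrent neural network or reinforcement learning as in the source; a random-search hill climber stands in for the weight adjustment, and the scalar toy system, the three linear policies, and all names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Policy = Callable[[float], float]          # maps a state to an action proposal
Reward = Callable[[float, float], float]   # maps (state, action) to a reward


def aggregate(policies: Sequence[Policy], weights: Sequence[float],
              state: float) -> float:
    """Weighted mean of the action proposals for one state."""
    total = sum(weights)
    return sum(w * p(state) for p, w in zip(policies, weights)) / total


def mean_reward(policies: Sequence[Policy], weights: Sequence[float],
                states: Sequence[float], reward: Reward) -> float:
    """Average reward of the aggregated policy over a set of states."""
    return sum(reward(s, aggregate(policies, weights, s)) for s in states) / len(states)


def adjust_weights(policies: Sequence[Policy], weights: Sequence[float],
                   states: Sequence[float], reward: Reward,
                   steps: int = 200, step_size: float = 0.1,
                   seed: int = 0) -> List[float]:
    """Stand-in for the learned weighting policy: keep random perturbations
    of the weights only when they increase the measured reward."""
    rng = random.Random(seed)
    best = list(weights)
    best_reward = mean_reward(policies, best, states, reward)
    for _ in range(steps):
        candidate = [max(1e-6, w + rng.gauss(0.0, step_size)) for w in best]
        candidate_reward = mean_reward(policies, candidate, states, reward)
        if candidate_reward > best_reward:
            best, best_reward = candidate, candidate_reward
    total = sum(best)
    return [w / total for w in best]  # normalized weights for the next control step


# Hypothetical toy target system: the ideal action is 2 * state.
POLICIES: List[Policy] = [lambda s: 1.0 * s, lambda s: 3.0 * s, lambda s: 10.0 * s]
STATES = [0.5, 1.0, 1.5, 2.0]


def toy_reward(state: float, action: float) -> float:
    return -(action - 2.0 * state) ** 2


initial = [1.0 / 3.0] * 3
adjusted = adjust_weights(POLICIES, initial, STATES, toy_reward)
```

  • Only the three weights are searched here, while the policies themselves stay fixed, which mirrors the source's point that adjusting the weights involves far fewer parameters than retraining the policies.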
  • Each control policy P 1 , . . . , PN is initially calculated by the reinforcement learning controllers RLC 1 , . . . , RLCN based on a set of operational data OD 1 , . . . , or ODN, respectively.
  • The set of operational data for a specific control policy may be specified in multiple ways. Examples of such specific sets of operational data are operational data of a single system (e.g., a single plant), operational data of multiple plants of a certain version, operational data of plants before and/or after a repair, or operational data of plants in a certain clime, in a certain operational condition, and/or in a certain environmental condition.
  • Different control policies from P 1 , . . . , PN may refer to different policy models trained on a same set of operational data.
  • Control policies may be selected from the pool P to form an ensemble of control policies P 1 , . . . , PN.
  • Each control policy P 1 , . . . , PN provides a separate action proposal AP 1 , . . . , or APN, from the performance data PD(SD).
  • The action proposals AP 1 , . . . , APN are aggregated to calculate the aggregated control action AGGA of the aggregated control policy ACP.
  • The aggregation may be performed using majority voting. If the action proposals AP 1 , . . . , APN are continuous, a mean or median value of the action proposals AP 1 , . . . , APN may be used for the aggregation.
  • The reweighting of the control policies P 1 , . . . , PN by the adjusted weights W 1 , . . . , WN allows for a rapid adjustment of the aggregated control policy ACP, for example, if the target system TS changes.
  • The reweighting depends on the recent performance data PD(SD) generated while interacting with the target system TS. Since the weighting policy WP has fewer free parameters (e.g., the weights W 1 , . . . , WN) than a control policy usually has, less data is needed to adjust to a new situation or to a modified system.
  • The weights W 1 , . . . , WN may be adjusted using the current performance data PD(SD) of the target system and/or using a model of the target system (e.g., implemented by an additional recurrent neural network) and/or using a policy evaluation.
  • Each control policy P 1 , . . . , PN may be globally weighted (e.g., over a complete state space of the target system TS). A weight of zero may indicate that a particular control policy is not part of the ensemble of policies.
  • The weighting by the aggregation function AF may depend on the system state (e.g., on the state data SD of the target system TS). This may be used to favor good control policies with high weights within one region of the state space of the target system TS. Within other regions of the state space, the control policies may not be used at all.
  • A possible approach may be to calculate the weights W i based on distances (e.g., according to a pre-defined metric of the state space) between the current state s and states stored together with P i in a training set including states where P i performed well. Uncertainty estimates (e.g., provided by a probabilistic policy) may also be included in the weight calculation.
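  • One way to read the distance-based weighting is sketched below. The Euclidean metric, the nearest-neighbor distance, the Gaussian kernel, and the `bandwidth` parameter are all assumptions chosen for illustration; the source only requires some pre-defined metric of the state space, and uncertainty estimates are left out here.

```python
import math
from typing import Dict, List, Sequence


def state_dependent_weights(state: Sequence[float],
                            good_states: Dict[str, List[Sequence[float]]],
                            bandwidth: float = 1.0) -> Dict[str, float]:
    """Weight each policy by how close the current state is to the nearest
    stored state in which that policy performed well."""
    raw: Dict[str, float] = {}
    for name, states in good_states.items():
        nearest = min(math.dist(state, s) for s in states)
        raw[name] = math.exp(-(nearest ** 2) / (2.0 * bandwidth ** 2))
    total = sum(raw.values())
    if total == 0.0:
        # Far from every stored state: fall back to uniform weights.
        return {name: 1.0 / len(raw) for name in raw}
    return {name: value / total for name, value in raw.items()}


# Hypothetical stored states for two policies in a 2-D state space.
weights = state_dependent_weights(
    state=(1.0, 1.0),
    good_states={"P1": [(1.0, 1.2), (0.8, 1.0)], "P2": [(5.0, 5.0)]},
)
```

  • In this example the current state lies close to states where the hypothetical policy `P1` performed well, so `P1` receives nearly all of the weight and `P2` is effectively unused in that region of the state space.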
  • The global and/or state-dependent weighting is optimized using reinforcement learning.
  • The action space of such a reinforcement learning problem is the space of the weights W 1 , . . . , WN, while the state space is defined in the state space of the target system TS.
  • With, for example, ten control policies, the action space is only ten dimensional and, therefore, allows rapid optimization with comparatively little input data and little computational effort. Meta actions may be used to reduce the dimensionality of the action space even further. Delayed effects are mitigated by using the reinforcement learning approach.
  • The adjustment of the weights W 1 , . . . , WN may be carried out by applying a measured performance of the ensemble of control policies P 1 , . . . , PN to a reward function.
  • The reward function may be chosen according to the goal of maximizing efficiency, maximizing output, minimizing emissions, and/or minimizing wear of the target system TS.
  • A reward function used to train the control policies P 1 , . . . , PN may be used for training and/or initializing the weighting policy WP.
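  • A reward function that combines the goals listed above might be sketched as a weighted sum, as shown here. The linear form, the field names, and the coefficient defaults are hypothetical; the source does not specify how the individual goals are combined or scaled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Performance:
    """Hypothetical, already normalized performance measurements."""
    efficiency: float
    output: float
    emissions: float
    wear: float


def reward(perf: Performance,
           w_efficiency: float = 1.0,
           w_output: float = 1.0,
           w_emissions: float = 1.0,
           w_wear: float = 1.0) -> float:
    """Reward efficiency and output; penalize emissions and wear."""
    return (w_efficiency * perf.efficiency
            + w_output * perf.output
            - w_emissions * perf.emissions
            - w_wear * perf.wear)
```

  • Because the same function can score both a single control policy and the aggregated control policy, it could serve for training the policies on the source systems and for adjusting the weights on the target system, in line with the reuse described above.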

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)
US14/309,641 2014-06-19 2014-06-19 Controlling a Target System Abandoned US20150370227A1 (en)

Priority Applications (6)

Application Number Priority Date Filing Date Title
US14/309,641 US20150370227A1 (en) 2014-06-19 2014-06-19 Controlling a Target System
EP15725521.7A EP3129839B1 (en) 2014-06-19 2015-05-11 Controlling a target system
PCT/EP2015/060298 WO2015193032A1 (en) 2014-06-19 2015-05-11 Controlling a target system
KR1020177001589A KR101963686B1 (ko) 2014-06-19 2015-05-11 타겟 시스템 제어
CN201580032397.5A CN106462117B (zh) 2014-06-19 2015-05-11 控制目标系统
US15/376,794 US10747184B2 (en) 2014-06-19 2016-12-13 Controlling a target system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/309,641 US20150370227A1 (en) 2014-06-19 2014-06-19 Controlling a Target System

Related Child Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2015/060298 Continuation WO2015193032A1 (en) 2014-06-19 2015-05-11 Controlling a target system

Publications (1)

Publication Number Publication Date
US20150370227A1 (en) 2015-12-24

Family

ID=53274489

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/309,641 Abandoned US20150370227A1 (en) 2014-06-19 2014-06-19 Controlling a Target System
US15/376,794 Active 2036-11-10 US10747184B2 (en) 2014-06-19 2016-12-13 Controlling a target system

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/376,794 Active 2036-11-10 US10747184B2 (en) 2014-06-19 2016-12-13 Controlling a target system

Country Status (5)

Country Link
US (2) US20150370227A1 (ko)
EP (1) EP3129839B1 (ko)
KR (1) KR101963686B1 (ko)
CN (1) CN106462117B (ko)
WO (1) WO2015193032A1 (ko)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019058508A1 (en) * 2017-09-22 2019-03-28 Nec Corporation ASSEMBLY REGULATION SYSTEM, ASSEMBLY REGULATION METHOD, AND ASSEMBLY REGULATION PROGRAM
JP2019067238A (ja) * 2017-10-03 2019-04-25 エヌ・ティ・ティ・コミュニケーションズ株式会社 制御装置、制御方法および制御プログラム
WO2020174262A1 (en) * 2019-02-27 2020-09-03 Telefonaktiebolaget Lm Ericsson (Publ) Transfer learning for radio resource management

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6453919B2 (ja) * 2017-01-26 2019-01-16 ファナック株式会社 行動情報学習装置、行動情報最適化システム及び行動情報学習プログラム
CN109308246A (zh) * 2017-07-27 2019-02-05 阿里巴巴集团控股有限公司 系统参数的优化方法、装置及设备、可读介质
CN109388547A (zh) * 2018-09-06 2019-02-26 福州瑞芯微电子股份有限公司 一种优化终端性能的方法及一种存储设备
EP3715608B1 (en) * 2019-03-27 2023-07-12 Siemens Aktiengesellschaft Machine control based on automated learning of subordinate control skills
EP3792483A1 (en) * 2019-09-16 2021-03-17 Siemens Gamesa Renewable Energy A/S Wind turbine control based on reinforcement learning

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5734593A (en) * 1996-04-24 1998-03-31 Bei Sensors & Systems Company, Inc. Fuzzy logic controlled cryogenic cooler
US6574754B1 (en) * 2000-02-14 2003-06-03 International Business Machines Corporation Self-monitoring storage device using neural networks
US6577908B1 (en) * 2000-06-20 2003-06-10 Fisher Rosemount Systems, Inc Adaptive feedback/feedforward PID controller
US6925338B2 (en) 2001-03-01 2005-08-02 Fisher-Rosemount Systems, Inc. Fiducial technique for estimating and using degradation levels in a process plant
JPWO2004068399A1 (ja) * 2003-01-31 2006-05-25 松下電器産業株式会社 予測型行動決定装置および行動決定方法
US7184847B2 (en) * 2004-12-17 2007-02-27 Texaco Inc. Method and system for controlling a process in a plant
CN100530003C (zh) * 2007-10-19 2009-08-19 西安交通大学 基于数据挖掘的火电厂钢球磨煤机制粉系统自动控制方法
CN103034122A (zh) * 2012-11-28 2013-04-10 上海交通大学 基于时间序列的多模型自适应控制器及控制方法
CN103019097B (zh) * 2012-11-29 2015-03-25 北京和隆优化科技股份有限公司 一种轧钢加热炉优化控制系统
US20150301510A1 (en) * 2014-04-22 2015-10-22 Siegmund Düll Controlling a Target System

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Galtier, Ideomotor feedback control in a recurrent neural network, Biological Cybernetics 109(3), 2014, pp. 1-17 *
Nakamura, et al., Natural Policy Gradient Reinforcement Learning for a CPG Control of a Biped Robot, Parallel Problem Solving from Nature - PPSN VIII, Volume 3242, Lecture Notes in Computer Science, pp. 972-981 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019058508A1 (en) * 2017-09-22 2019-03-28 Nec Corporation ASSEMBLY REGULATION SYSTEM, ASSEMBLY REGULATION METHOD, AND ASSEMBLY REGULATION PROGRAM
JP2020529664A (ja) * 2017-09-22 2020-10-08 日本電気株式会社 組み合わせ制御システム、組み合わせ制御方法、および、組み合わせ制御プログラム
JP7060080B2 (ja) 2017-09-22 2022-04-26 日本電気株式会社 組み合わせ制御システム、組み合わせ制御方法、および、組み合わせ制御プログラム
JP2019067238A (ja) * 2017-10-03 2019-04-25 エヌ・ティ・ティ・コミュニケーションズ株式会社 制御装置、制御方法および制御プログラム
WO2020174262A1 (en) * 2019-02-27 2020-09-03 Telefonaktiebolaget Lm Ericsson (Publ) Transfer learning for radio resource management
US11658880B2 (en) 2019-02-27 2023-05-23 Telefonaktiebolaget Lm Ericsson (Publ) Transfer learning for radio resource management

Also Published As

Publication number Publication date
US10747184B2 (en) 2020-08-18
WO2015193032A1 (en) 2015-12-23
KR20170023098A (ko) 2017-03-02
CN106462117B (zh) 2019-12-10
EP3129839B1 (en) 2019-06-26
CN106462117A (zh) 2017-02-22
US20170090429A1 (en) 2017-03-30
KR101963686B1 (ko) 2019-03-29
EP3129839A1 (en) 2017-02-15

Similar Documents

Publication Publication Date Title
US10747184B2 (en) Controlling a target system
CN107798199B (zh) 一种水电机组参数闭环辨识方法
EP3117274B1 (en) Method, controller, and computer program product for controlling a target system by separately training a first and a second recurrent neural network models, which are initally trained using oparational data of source systems
CN103927580B (zh) 一种基于改进人工蜂群算法的工程约束参数优化方法
Karaboga et al. Proportional—integral—derivative controller design by using artificial bee colony, harmony search, and the bees algorithms
US20210256428A1 (en) Controller for controlling a technical system, and method for configuring the controller
CN104391444B (zh) 一种基于离散系统改进单神经元的pid整定方法
CN103235620A (zh) 基于全局变量预测模型的温室环境智能控制方法
CN108181802A (zh) 一种性能可控pid控制器参数优化整定方法
CN104500336B (zh) 一种基于Hammerstein‑Wiener模型的风电机组恒功率广义预测控制方法
EP4067766A1 (en) Machine learning device, demand control system, and air conditioning control system
Xuemei et al. Particle swarm optimization-based LS-SVM for building cooling load prediction
CN102663224A (zh) 基于信息熵的交通流量集成预测模型
Lu et al. Vegetable price prediction based on pso-bp neural network
CN112398115A (zh) 一种基于改进模型预测控制的多时间尺度火电-光伏-抽水蓄能联合优化调度方案
CN113852098B (zh) 一种基于多目标蜻蜓算法的自动发电控制调度方法
CN108459570B (zh) 基于生成对抗网络架构的灌溉配水智能控制系统及方法
Chi et al. Comparison of two multi-step ahead forecasting mechanisms for wind speed based on machine learning models
Moustakis et al. A practical Bayesian optimization approach for the optimal estimation of the rotor effective wind speed
CN113110061B (zh) 基于改进粒子群算法优化的智能灌溉模糊控制方法及系统
Lei et al. Multi-agent path planning for unmanned aerial vehicle based on threats analysis
CN111191815B (zh) 一种用于风电集群的超短期出力预测方法及系统
Avila-Miranda et al. An optimal and intelligent control strategy to ventilate a greenhouse
He et al. Application of an improved augmented Lagrangian algorithm to the tuning of robust PID controller for hydraulic turbine governing system
CN110908280B (zh) 一种小车-二级倒立摆系统优化控制方法

Legal Events

Date Code Title Description
AS Assignment

Owner name: SIEMENS AKTIENGESELLSCHAFT, GERMANY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DUELL, SIEGMUND;MUELLER, MICHAEL;OTTE, CLEMENS;AND OTHERS;SIGNING DATES FROM 20140713 TO 20140716;REEL/FRAME:034598/0574

Owner name: SIEMENS ENERGY, INC., FLORIDA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:BASSILY, HANY F.;REEL/FRAME:034598/0592

Effective date: 20140918

Owner name: SIEMENS AKTIENGESELLSCHAFT, GERMANY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SIEMENS ENERGY, INC.;REEL/FRAME:034598/0599

Effective date: 20141023

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION