GB2620635A - PID tuning - Google Patents

PID tuning

Info

Publication number
GB2620635A
Authority
GB
United Kingdom
Prior art keywords
pid
electrical
terms
parameter
control
Prior art date
Legal status
Pending
Application number
GB2210433.5A
Other versions
GB202210433D0 (en)
Inventor
Omar Shabka Zacharaya
Zervas Georgios
Current Assignee
UCL Business Ltd
Original Assignee
UCL Business Ltd
Priority date
Filing date
Publication date
Application filed by UCL Business Ltd
Priority to GB2210433.5A
Publication of GB202210433D0
Priority to PCT/GB2023/051677 (published as WO2024013465A2)
Publication of GB2620635A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B11/00 - Automatic controllers
    • G05B11/01 - Automatic controllers electric
    • G05B11/36 - Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/092 - Reinforcement learning
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 - Program-control systems
    • G05B2219/30 - Nc systems
    • G05B2219/42 - Servomotor, servo controller kind till VSS
    • G05B2219/42018 - Pid learning controller, gains adapted as function of previous error
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Abstract

A proportional-integral-derivative, PID, system 100 has a device 130, a sensor 140 to sense information associated with the device 130, and a controller 120 to control the device 130 based on closed-loop control using the sensed information and PID terms. The controller 120 uses a machine learning technique, such as reinforcement learning, and the sensed information to determine the PID terms.

Description

PID Tuning
FIELD AND BACKGROUND
[0001] The present techniques relate to proportional-integral-derivative, PID, tuning in the field of closed-loop control systems. More particularly, but not exclusively, the present techniques relate to determining PID terms based on a machine learning technique for use in a PID-controlled system.
[0002] PID-controlled systems, or control systems incorporating a form of PID control, are a type of closed-loop feedback-based control system. Such systems generally utilise a set point value, which corresponds to a desired output of the control system, and a measured process variable, which corresponds to a measured output of the control system. The control system operates to minimise or eliminate the difference between the desired set point value and the measured process variable. To do this, a correction is performed by the control system. The nature of this correction depends on the type of control system and the type of device being controlled within the control system.
[0003] Generally, a PID-controlled system includes a PID controller that calculates the correction to be performed by the control system or the device being controlled within the control system. This correction is often based on a difference between the desired set point value and the measured process variable, and in PID-controlled systems, is further based on proportional, integral, and derivative terms specific to the control system. More particularly, the PID controller performs mathematical operations using the calculated difference and each of the proportional, integral, and derivative terms, for example a multiplication operation, an integral operation, and a derivative operation, to determine the correction to be performed. The device being controlled within the control system then performs the correction.
[0004] In such control systems, the control loop cycles through the steps of calculating the difference between a desired set point value and a measured process variable, performing PID control by calculating a correction based on the difference and the proportional, integral, and derivative terms, performing the correction, measuring the process variable, and repeating until a stop condition is satisfied, for example when the measured process variable is equal to or within a threshold of the desired set point.
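By way of illustration only, the generic closed-loop cycle described above can be sketched as follows. The `read_sensor` and `apply_correction` callables, the sampling period, and the stop tolerance are hypothetical stand-ins for the sensor, device, and stop condition of a concrete system; this is not the specific controller of the present disclosure.

```python
import time

def pid_loop(set_point, read_sensor, apply_correction,
             kp, ki, kd, dt=0.001, tolerance=1e-3):
    integral, prev_error = 0.0, 0.0
    while True:
        process_variable = read_sensor()          # measure the output
        error = set_point - process_variable      # difference to the set point
        if abs(error) <= tolerance:               # stop condition satisfied
            return process_variable
        integral += error * dt                    # input to the integral term
        derivative = (error - prev_error) / dt    # input to the derivative term
        correction = kp * error + ki * integral + kd * derivative
        apply_correction(correction)              # device performs the correction
        prev_error = error
        time.sleep(dt)                            # next cycle of the control loop
```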
[0005] The process of determining PID terms for use in such a control system is known as PID tuning. Conventionally, the PID terms, i.e. the proportional, integral, derivative terms that are used by the PID controller to calculate the correction, are determined manually through trial and error or through an iterative tuning process.
[0006] However, as recognised by the inventors of the present techniques, existing methods for determining PID terms for a control system, i.e. PID tuning, can require operation of the control system or device, for example through iterative operation of the control loop, or may be overly sensitive to uncertainties associated with the control system, for example due to manufacturing variation. Other existing methods may fail to account for multiple performance objectives in the PID term determination, may lack efficiency, and may be overly time-consuming.
SUMMARY
[0007] Particular aspects and embodiments are set out in the appended claims.
[0008] Viewed from a first aspect, there is provided a method for determining proportional-integral-derivative, PID, terms for use in a PID-controlled system comprising a device, a sensor configured to sense information associated with the device, and a controller configured to control the device based on closed-loop control using the sensed information and the PID terms, the method comprising: obtaining a physical parameter indicative of a physical property of the device; and determining, based on applying a machine learning technique to the physical parameter, the PID terms. Thereby, there is provided an approach for efficiently determining PID terms specific to a device in a control system, by using a parameter that relates directly to a physical property of the device controlled in the control system. Accordingly, time-consuming iterative PID tuning steps may be omitted and the device need not be operated in a control loop, resulting in an efficient PID determination method.
[0009] In other words, the present approach may be considered as a method to determine PID terms for a control system based on knowledge of a physical property of the control system. In particular, the present approach utilises a machine learning technique. Through application of the machine learning technique to a physical parameter indicative of a physical property of a device, PID terms for closed-loop control of the device are determined.
[0010] Thus, PID tuning is achieved for the control system comprising the device based on a physical property of the device, thereby increasing PID tuning efficiency. Further, as a physical parameter of the physical property is used, which may be determined at manufacture of the device, the device need not be operated to determine the PID terms. Thus, the present approach allows for offline tuning of the PID terms. Moreover, the use of the physical parameter, which may be obtained directly from basic system characteristics, further increases the efficiency of the PID term determination when compared to comparative examples that do not utilise a physical parameter for the determination according to the present approach.
[0011] Further, the present approach achieves a one-step determination of the PID terms for the control system: once the physical parameter is obtained, the machine learning technique is applied to the physical parameter to determine the PID terms in a single step. This increases efficiency of the tuning process and reduces the total duration of the tuning process compared to comparative examples that do not utilise the physical parameter and machine learning technique according to the present approach, thus conserving computing resources.
[0012] It will be appreciated that while techniques are described for determining PID terms, the present techniques may equally be used to determine one or a subset of the proportional, integral, and derivative terms. For example, in some cases, the present techniques may be used to determine PI terms, a P term, etc. The machine learning technique may thus be designed for determining any or a subset of the PID terms.
[0013] Viewed from a second aspect, there is provided an apparatus for determining proportional-integral-derivative, PID, terms for use in a PID-controlled system comprising a device, a sensor configured to sense information associated with the device, and a controller configured to control the device based on closed-loop control using the sensed information and the PID terms, the apparatus comprising: parameter obtaining circuitry configured to obtain a parameter indicative of a physical property of the device; and PID term determining circuitry configured to determine, based on machine learning circuitry configured to apply a machine learning technique to the parameter, the PID terms. Thereby, there is provided an apparatus configured to efficiently determine PID terms for use in a PID-controlled system by using a parameter that relates directly to a physical property of the device controlled in the control system.
[0014] Viewed from a third aspect, there is provided a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the methods described herein. Thereby, there is provided a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to efficiently determine PID terms for use in a PID-controlled system by using a parameter that relates directly to a physical property of the device controlled in the control system.
[0015] Viewed from a fourth aspect, there is provided a system comprising: a device; a sensor configured to sense information associated with the device; and a controller configured to control the device based on closed-loop PID control using the sensed information and PID terms determined based on applying a machine learning technique to an obtained parameter indicative of a physical property of the device. Thereby, there is provided a system configured to control a device in a PID-controlled system using PID terms efficiently determined by using a parameter that relates directly to a physical property of the device controlled in the control system.
[0016] Viewed from a fifth aspect, there is provided a PID-controlled system comprising: a piezoelectric actuator; a position sensor configured to sense a position of the piezoelectric actuator; and a controller configured to control the piezoelectric actuator based on closed-loop PID control using the position of the piezoelectric actuator sensed by the position sensor and PID terms determined based on applying a machine learning technique to an obtained parameter indicative of a physical property of the piezoelectric actuator. Thereby, there is provided a PID-controlled system configured to control a piezoelectric actuator in a PID-controlled system using PID terms efficiently determined by using a parameter that relates directly to a physical property of the piezoelectric device controlled in the control system.
[0017] Viewed from a sixth aspect, there is provided a method for training a reinforcement learning model to determine PID terms for use in a PID-controlled system comprising a device, a sensor configured to sense information associated with the device, and a controller configured to control the device based on closed-loop control using the sensed information and the PID terms, the method comprising: obtaining training data comprising physical parameters indicative of physical properties associated with a test apparatus comprising a plurality of test devices and electrical parameters indicative of electrical characteristics associated with the test apparatus, the test devices corresponding to the device; and training, based on the training data and a multi-objective reward function, the reinforcement learning model. Thereby, there is provided a method for training a reinforcement learning model to efficiently determine PID terms using parameters that relate directly to physical properties and electrical characteristics of test devices.
[0018] Other aspects will also become apparent upon review of the present disclosure, in particular upon review of the Brief Description of the Drawings, Detailed Description and Claims sections.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] Examples of the disclosure will now be described, by way of example only, with reference to the accompanying drawings in which:
[0020] Figure 1 schematically illustrates a PID-controlled system configured to operate according to teachings of the present disclosure.
[0021] Figure 2 schematically illustrates a method for determining PID terms according to teachings of the present disclosure.
[0022] Figure 3 schematically illustrates a method for determining PID terms according to teachings of the present disclosure.
[0023] Figure 4 schematically illustrates a method for training the reinforcement learning model according to teachings of the present disclosure.
[0024] Figure 5 schematically illustrates a method for training the reinforcement learning model according to teachings of the present disclosure.
[0025] Figure 6 schematically illustrates a PID-controlled system configured to operate according to teachings of the present disclosure.
[0026] Figure 7 schematically illustrates an optical switch configured to operate according to teachings of the present disclosure.
[0027] Figure 8 schematically illustrates an apparatus configured to operate according to teachings of the present disclosure.
[0028] While the disclosure is susceptible to various modifications and alternative forms, specific example approaches are shown by way of example in the drawings and are herein described in detail. It should be understood however that the drawings and detailed description attached hereto are not intended to limit the disclosure to the particular form disclosed but rather the disclosure is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the claimed invention.
[0029] It will be recognised that the features of the above-described examples of the disclosure can conveniently and interchangeably be used in any suitable combination.
DETAILED DESCRIPTION
[0030] The present approaches as described herein relate to determination of PID terms. Such PID terms may be used for control of a PID-controlled system, for example as part of determining a correction to be applied by a device in the control system. Indeed, as discussed above, there is provided a method for determining proportional-integral-derivative, PID, terms for use in a PID-controlled system comprising a device, a sensor configured to sense information associated with the device, and a controller configured to control the device based on closed-loop control using the sensed information and the PID terms. The method comprises obtaining a physical parameter indicative of a physical property of the device; and determining, based on applying a machine learning technique to the physical parameter, the PID terms.
[0031] In some examples, the physical property of the device is a resonant frequency of the device. In this case, the physical parameter is indicative of the resonant frequency of the device. The present inventors have recognised that the resonant frequency of a device may be used as a way of characterising the device to determine suitable PID terms. In these examples, as the resonant frequency of the device may be determined in an efficient manner, for example by measurement as described further below, the PID terms may also be efficiently determined. In some cases, the resonant frequency of the device may be determined as part of a manufacturing or validation/characterisation process and accordingly may be a predetermined resonant frequency known for the specific device prior to deployment of the device within a control system or loop.
[0032] In some examples, the method further comprises obtaining a further parameter indicative of a characteristic of the device, wherein determining the PID terms is based on applying the machine learning technique to the physical parameter and the further parameter. The present inventors have recognised that a parameter indicative of a characteristic of the device may be used in combination with a physical parameter indicative of the physical property of the device (in some examples, a resonant frequency of the device) to determine the PID terms. In so doing, PID terms may be efficiently determined that are specific to the device in the control system. Accordingly, the PID terms may result in increased performance of the control system, for example in that the control system may converge to the desired set point value more quickly and with less overshoot, i.e. the settling time and overshoot may be reduced. In other words, PID terms specific to the device within the control system may be determined without having to perform iterative PID tuning procedures for every device. Further, computing resources for determining PID terms may be reduced.
[0033] In some examples, the characteristic of the device is a gain associated with the device. Thus, in such examples, the device may be characterised in terms of its resonant frequency and its gain, which for many devices can be efficiently determined. Accordingly, the PID terms may be determined using only knowledge of these two aspects of the device, which can result in a simplified and more efficient PID tuning process. In these examples, the gain may be any form of gain associated with the device, for example an electrical gain or a mechanical gain.
[0034] In some examples, the gain associated with the device is an electrical gain associated with the device, and the further parameter is an electrical parameter. In these examples, the PID terms are determined based on knowledge of a physical parameter and an electrical parameter of the device, each being associated with a physical property and an electrical gain respectively. Thus, in some examples, the PID terms may be determined based on aspects of the device which may be determined in an offline manner, i.e. without requiring operation of the device within the control loop, and which may be determined separately and before deployment of the device within the control system/loop. In some examples, the PID terms are determined based on physical and electrical parameters indicative of a resonant frequency and electrical gain respectively associated with the device. These physical and electrical parameters may be determined efficiently and in an offline manner, not requiring operation of the device within the control loop. Further, these parameters may be determined as part of a manufacturing process of the device, and as such may be readily available. For example, the physical parameter and electrical parameter may be measured as part of a manufacturing process, or as part of a characterisation process following manufacture. Example measurement methods are discussed further below.
[0035] In some examples, the physical parameter comprises a first physical parameter indicative of a first resonant frequency of the device and a second physical parameter indicative of a second resonant frequency of the device; the electrical parameter comprises a first electrical parameter indicative of a first electrical gain associated with the device and a second electrical parameter indicative of a second electrical gain associated with the device; and the PID terms comprise a first and second proportional term, a first and second integral term, and a first and second derivative term. Thus, in some examples, multiple resonant frequencies of the device and multiple electrical gains associated with the device may be used to determine the PID terms. In such cases, multiple terms for each of the proportional, integral, and derivative terms may be determined. For example, the number of terms of each of the proportional, integral, and derivative terms may correspond to the number of resonant frequency values used and/or the number of electrical gain values. In such examples, the method may efficiently determine multiple sets of PID terms corresponding to multiple sets of resonant frequency/electrical gain parameters. In some examples, these multiple sets of PID terms (each corresponding to a set of resonant frequency/gain parameters) may be determined simultaneously. Accordingly, the number of tuning steps required to determine multiple sets of PID terms for a device may be reduced.
[0036] In some examples, the first resonant frequency of the device is a resonant frequency of the device associated with a first dimension of the device; the second resonant frequency of the device is a resonant frequency of the device associated with a second dimension of the device; the first electrical gain associated with the device is an electrical gain associated with the first dimension of the device; and the second electrical gain associated with the device is an electrical gain associated with the second dimension of the device. Accordingly, the method may determine PID terms for different dimensions of the device in a single offline step, thus increasing efficiency of the PID tuning. The first and second dimensions may correspond to control axes of the device, and in some examples the method calculates PID terms for two control axes simultaneously.
[0037] Indeed, in some examples, the device may have resonant frequencies associated with different dimensions of the device, and electrical gains associated with different dimensions of the device. Accordingly, in these examples, the method may determine a set of PID terms for a first dimension of the device (which is associated with a first resonant frequency and a first electrical gain), and a second set of PID terms for a second dimension of the device (which is associated with a second resonant frequency and a second electrical gain). The first and second dimensions of the device may correspond to two control axes of the device, i.e. the device may be controlled in two axes (for an actuator, for example, the actuator may be controllable in two control axes/dimensions). The first and second resonant frequencies may be the same or different. The first and second electrical gains may be the same or different. Accordingly, the method may be able to determine PID terms for devices that operate within the control system in two dimensions, in other words, where devices are controlled across two axes of the device.
[0038] It will be appreciated that the number of dimensions is not limited to two. For example, a device may have resonant frequencies and electrical gains associated with three dimensions of the device (i.e. when a device is controllable in three control axes, such as x, y, and z axes). In such examples, the method may determine three sets of PID terms, one set of PID terms for each dimension/control axis.
[0039] In some conventional techniques, in order to account for inter-axis coupling effects within the same device (damped oscillator, actuator, etc., as described below) a value representing this effect must be specified. This value can be challenging to determine, and so an additional tuning process is conventionally required to approximate this effect, or this effect is ignored altogether. By contrast, according to some examples in accordance with the present teachings, through application of the machine learning technique PID terms for multiple control axes of the device may be determined simultaneously, and thus the technique implicitly handles this effect in a way that is learnt end-to-end rather than requiring additional iterative tuning steps.
[0040] In some examples, the device is a damped oscillator. The present inventors have recognised that damped oscillators may be appropriately characterised based on a physical property. In some examples, the damped oscillator may be characterised based on a physical property and an electrical characteristic, for example a resonant frequency and an electrical gain.
[0041] In some examples, the damped oscillator may be an actuator. Actuators may be approximated as damped oscillators, and as such the techniques of the present disclosure may be used to determine PID terms for control systems comprising actuators.
[0042] In some examples, the damped oscillator is a piezoelectric actuator. A piezoelectric actuator may be an electro-mechanical device with controllable planar movement through the application of a voltage along each of its degrees of freedom (i.e. control axes). Accordingly, in these examples where the device is a piezoelectric actuator, PEA, a resonant frequency of the PEA and an electrical gain of the PEA may be obtained and then used to determine, based on application of a machine learning technique to the PEA resonant frequency and the PEA electrical gain, the PID terms. PEAs (and indeed actuators or damped oscillators in general) are often characterised during manufacture to determine their resonant frequency and electrical gain, and thus the present techniques may provide a particularly efficient method for determining PID terms for PEA-based PID control systems.
[0043] In some examples, the machine learning technique is a reinforcement learning technique. In other words, the PID terms are determined based on application of a reinforcement learning technique to the physical and electrical parameters. Thus, the PID tuning may be performed in a one-step and offline manner, thereby providing an automatic tuning method that does not rely on time-consuming trial and error or manual searching of the PID search space. Further, in some examples, the reinforcement learning model has been trained based on optimising multiple objectives, resulting in PID terms that are optimal for the multiple objectives. Possible objectives may include minimising the settling time of the response and reducing the overshoot, for example. A further objective may be maintaining the steady-state error within a pre-determined acceptable range.
[0044] In some examples, the machine learning technique is a deep reinforcement learning technique. In such examples, a policy function is implemented as a trainable neural-network-based model architecture, providing a high-performance PID tuning method. In some examples, the neural network model architecture may comprise two hidden layers with 16 units each. In some examples, ReLU activations are used in the hidden layers, and normalised actions are used (i.e. the agent outputs values around ±1) which are then scaled to suitable ranges for each PID parameter. While the present teachings generally refer to a reinforcement learning technique, it will be appreciated that the discussion related to the reinforcement learning technique may equally apply to examples where the reinforcement learning technique is a deep reinforcement learning technique.
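By way of illustration, a minimal sketch of such a policy network follows, assuming a PyTorch implementation (no framework is specified in the present disclosure). The input size of four and output size of six correspond to the two-axis actuator example described further below; the tanh output layer and the scaling ranges are assumptions consistent with the normalised-action description above.

```python
import torch
import torch.nn as nn

class PIDPolicy(nn.Module):
    """Policy network: two hidden layers of 16 units with ReLU activations."""

    def __init__(self, state_dim: int = 4, action_dim: int = 6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 16),
            nn.ReLU(),
            nn.Linear(16, 16),
            nn.ReLU(),
            nn.Linear(16, action_dim),
            nn.Tanh(),  # normalised actions around +/-1
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def scale_actions(actions: torch.Tensor,
                  low: torch.Tensor,
                  high: torch.Tensor) -> torch.Tensor:
    # Map normalised actions in [-1, 1] to a suitable range for each PID
    # parameter; the ranges themselves would be chosen per application.
    return low + (actions + 1.0) * 0.5 * (high - low)
```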
[0045] In some examples, applying the reinforcement learning technique comprises inputting the physical parameter and the electrical parameter to a trained reinforcement learning model. Thus, tuning of the PID terms is one-step and offline. In these examples, the physical parameter and the electrical parameter are input to the trained reinforcement learning model and the output is the PID terms. Accordingly, PID terms for a control system are output in a one-step and offline manner, not requiring operation of the control system, and based on characteristics/properties (resonant frequency and electrical gain) of the control system (i.e. the device). Thus, these examples provide an efficient PID tuning method. As discussed above, the trained reinforcement learning model may in some examples be a trained deep reinforcement learning model.
[0046] In some examples, the trained reinforcement learning model is trained based on training data comprising physical parameters indicative of physical properties associated with a test apparatus and electrical parameters indicative of electrical characteristics associated with the test apparatus. In some examples, the test apparatus comprises a plurality of test devices and the test devices correspond to the device. Thus, the reinforcement learning model is trained based on a training apparatus and training devices similar to or corresponding to the device. This allows determination of PID terms for a new device that has not formed part of the training apparatus. This results in a PID determination method that, once trained, may determine PID terms for multiple unseen devices. Accordingly, computing resources are conserved, and the time required for determining PID terms for an unseen device is reduced. This may also relieve bottlenecks due to device characterisation and PID term determination in manufacturing processes. It will be appreciated that devices, for example PEAs, may vary from PEA to PEA due to manufacturing tolerances, limitations, and processes. Thus, in some examples, the test devices correspond to the device in that the test devices and the device are the same type of device, for example a PEA, but are not the same device.
[0047] In some examples, the trained reinforcement learning model is trained further based on a multi-objective reward function. Thus, in some examples, the determined PID terms take into account multiple objectives. In some examples, the multiple objectives are coupled. For example, the multiple objectives may comprise minimising settling time and overshoot. Thus, in some examples, the techniques may automatically manage performance compromise decisions across multiple and possibly interfering objectives. Accordingly, in these examples, when the PID terms are determined based on a trained reinforcement learning model trained based on a multi-objective reward function, performance of the control system may be improved. For example, a settling time of the control system to settle to the desired set point value may be reduced, or overshoot of the control system (overshooting the desired set point value, i.e. through 'over correcting') may be reduced. Further, in some examples, the reward function accommodates multiple objectives, as does the optimisation method, which may accommodate a plurality of metrics that can be measured from a response signal of a device. Example metrics may include settling time and overshoot, and are described in more detail further below.
[0048] In some examples, the trained reinforcement learning model is trained based on: (1) initialising a reinforcement learning policy; (2) performing a training episode comprising: (a) selecting a test device of the plurality of test devices of the test apparatus; (b) obtaining a physical parameter indicative of a physical property of the selected test device and an electrical parameter indicative of an electrical characteristic associated with the selected test device; (c) inputting a state vector based on the physical parameter and the electrical parameter to the policy to generate a set of PID parameters; (d) selecting a control operation for the device to perform; (e) performing the selected control operation under closed-loop control and using the set of PID parameters while measuring an output variable of the control operation; and (f) generating a reward using a multi-objective reward function and one or more metrics calculated based on the output variable of the control operation; and (3) updating the policy based on the generated reward.
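The training episode of steps (1)-(3) might be sketched as follows. This is an outline only: `measure_parameters`, `sample_control_operation`, `run_control_operation`, and `policy.update` are hypothetical stand-ins for the test-apparatus instrumentation and for whichever reinforcement learning algorithm performs the policy update.

```python
import random

import numpy as np

def train(policy, test_devices, reward_fn, n_episodes=1000):
    for _ in range(n_episodes):
        device = random.choice(test_devices)           # (a) select a test device
        f_res, gain = device.measure_parameters()      # (b) physical/electrical parameters
        state = np.concatenate([f_res, gain])          # (c) state vector -> policy
        pid_terms = policy(state)                      #     policy outputs PID parameters
        operation = device.sample_control_operation()  # (d) select a control operation
        trace = device.run_control_operation(operation, pid_terms)  # (e) closed-loop run
        reward = reward_fn(trace)                      # (f) multi-objective reward
        policy.update(state, pid_terms, reward)        # (3) update the policy
    return policy
```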
[0049] In some examples, the performing of the training episode and the updating may be iteratively performed. These steps, in some examples, may be iteratively performed until a stop condition is satisfied. The stop condition may be a predetermined number of iterations, or may be satisfied based on determining that the reward has not changed more than a predetermined amount between successive iterations.
[0050] In some examples, the multi-objective reward function, R, is:

R = -Σ_{i∈S} max(0, m_i/l_i - 1)

[0051] wherein S is the set of metrics (the number of metrics being greater than one), m_i is the measured value of metric i for a given output, and l_i is a predefined acceptable threshold for that metric. In other words, for a set of metrics S, the measured value of each metric {m_i} for a given PID output, and a set of (soft) upper limits for each of these metrics (i.e. the predefined acceptable thresholds) {l_i} for metric i ∈ S, the reward function R used to train the reinforcement learning model may be structured as shown above. The model may only be punished on the basis of its performance on a particular metric if the performance is outside a limit (i.e. the predefined acceptable threshold), but it may also be punished in a way that is inherently balanced over the various metrics (i.e. it cannot focus on only one metric, as the contributions from the other metrics will likely accumulate large penalties). Accordingly, in examples using R, multiple objectives may be optimised, resulting in more performant control systems when using the PID terms. In particular, this may result in a reduction in the settling time and/or overshoot when the control system is subsequently operated using the PID terms. It will be appreciated that the exact form of R may vary. For example, an exponent may be associated with one or more terms of R.
[0052] Thus, in some examples, the reward function is multi-objective and allows for multiple, in some examples coupled, objectives. Further, this may be implemented in a way that does not require human observation throughout the optimisation process. This form of reward function has been found to effectively accommodate optimisation with respect to multiple objectives. Accordingly, in these examples, the method may produce more performant PID terms once deployed in the control system.
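A minimal sketch of this reward function follows, using the form of R reconstructed above (a metric is only penalised once it exceeds its soft limit, and penalties accumulate across metrics); the metric names and numeric values are illustrative only.

```python
def reward(measured: dict, limits: dict) -> float:
    # R = -sum_i max(0, m_i / l_i - 1): no penalty while a metric is within
    # its soft limit, a growing penalty once it exceeds the limit.
    return -sum(
        max(0.0, measured[name] / limits[name] - 1.0)
        for name in limits
    )

# Example: only settling time exceeds its limit, so only it is penalised.
r = reward(
    measured={"settling_time": 0.012, "overshoot": 0.08},
    limits={"settling_time": 0.010, "overshoot": 0.10},
)
```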
[0053] In some examples, the metrics may be settling time and overshoot, with associated predefined acceptable values of settling time and overshoot that may be implemented in R.
[0054] In some examples, the trained reinforcement learning model is trained further based on iteratively: (a) selecting a test device of the plurality of test devices; (b) measuring a physical parameter indicative of a physical property of the selected test device; (c) measuring an electrical parameter indicative of an electrical characteristic associated with the selected test device; (d) inputting a state vector based on the physical parameter and the electrical parameter to the reinforcement learning model to generate a set of PID parameters; (e) selecting a control operation to be performed on the selected device; (f) performing the selected control operation on the selected device under closed-loop control and using the set of PID parameters while measuring an output variable of the control operation; and (g) generating a reward using the multi-objective reward function and one or more metrics calculated based on the output variable of the control operation, wherein the trained reinforcement learning model is trained further based on updating the reinforcement learning model based on the generated reward.
[0055] Accordingly, the reinforcement learning model may be trained to take into account a variety of different test devices, each with specific physical and electrical parameters. As such, once trained, the reinforcement learning model may be used to determine PID terms for unseen devices, allowing for more efficient PID tuning. In some examples, the physical and electrical parameters of the test devices are measured, although in other examples these parameters may be obtained. The steps may be iterated a pre-determined number of times or until a pre-determined condition is satisfied, for example until all test devices of the test apparatus have been selected.
[0056] In some examples, inputting the state vector in step (d) above may comprise inputting a state vector to a policy of the reinforcement learning model, and updating the reinforcement learning model may comprise updating the policy.
[0057] In some examples, the test devices and control operations are randomly selected. By randomly selecting the test devices and the control operations, the agent is exposed to a greater range of possible combinations of test devices and control operations, across the full range of possible control operations, and thus trains in an optimised manner with respect to the full range of possible combinations. In so doing, the trained model may be more performant when applied to a new device on which the model was not explicitly trained.
[0058] In some examples, the plurality of test devices are a plurality of piezoelectric actuators. The physical properties may be, for example, resonant frequencies of the piezoelectric actuators. The electrical characteristics may be gains associated with the piezoelectric actuators. In some examples, randomly selecting the control operation comprises randomly selecting an actuation to be performed on the selected piezoelectric actuator. In some examples, the selected control operation is performed on the piezoelectric actuator by performing the selected actuation on the selected piezoelectric actuator.
[0059] Thus, in some examples, when the control system comprises a PEA and the PID terms to be determined are for a PEA, the reinforcement learning model is trained based on operating a test apparatus that comprises a plurality of PEAs, which are iteratively used to train the reinforcement learning model. As such, once trained, the reinforcement learning model may be used to determine PID terms for unseen PEAs, i.e. PEAs which were not part of the training apparatus. In so doing, PID terms may be determined in a one-step and offline manner for unseen PEAs, without needing to train the reinforcement learning model again or to undertake multiple time-consuming tuning or optimisation steps.
[0060] In some comparative examples, PID tuning is required on a per-device basis, i.e. the tuning process must be repeated for every device to determine PID terms specific to that device (as devices may vary, for example due to manufacturing variations). In examples described herein, once the reinforcement learning model is trained, it may be used to determine PID terms for multiple devices. For example, PID terms specific to a device (based on the resonant frequency and electrical gain, for example) are determined using the trained reinforcement learning model, thus removing the repetition requirement of the comparative tuning methods to determine device-specific PID terms.
[0061] Further, in some examples, the test apparatus is an optical switch that comprises a plurality of PEAs as the plurality of test devices. In this context, a PEA may be selected (in some examples randomly), and a switching destination may be selected (in some examples randomly); for example, the switching destination may be a second (in some examples randomly) selected PEA. In this context, an optical fibre may be connected to a PEA, and actuation of the PEA controls the position of the optical fibre (directly or by controlling the propagation direction of light from the fibre using a lens). In some examples, an optical switch comprises an input plane and an output plane, each with M x N PEAs (each connected to an optical fibre), that may be separated by a region of free space and where a collimator lens may be attached to the end of each fibre. By positioning a particular input and output pair of PEAs such that they correspond, a light path from the input to the output may be formed. The optical switch may be configured to, using the PEAs, direct light from a transmitting optical fibre to any of a plurality of receiving optical fibres. In some examples, the optical switch comprises 384 x 384 ports, i.e. a PEA may perform a switching operation to any of the 384 possible positions on the receiving side of the switch.
[0062] In some examples, the method further comprises controlling, based on the PID terms, the device. Thus, once the PID terms have been determined, actual control of the device may be carried out based on the PID terms. The PID terms may be determined once based on the physical parameter and electrical parameter of the device (in some examples the resonant frequency and electrical gain). These determined PID terms may then be stored in a memory associated with the control system or the device, and may be applied during each iteration of a control loop of the control system. In other words, the PID terms may be determined once, and stored in memory for application during operation of the closed-loop control.
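For illustration, this determine-once-then-reuse flow might look as follows; the `policy` argument, file path, and JSON storage format are hypothetical, as the disclosure only requires that the terms be stored in a memory associated with the control system or device.

```python
import json

def tune_and_store(policy, state, path="pid_terms.json"):
    # One-step, offline determination of the PID terms from the device's
    # physical/electrical parameters, persisted for later use.
    kp, ki, kd = policy(state)
    with open(path, "w") as f:
        json.dump({"kp": kp, "ki": ki, "kd": kd}, f)

def load_terms(path="pid_terms.json"):
    # Retrieved once at start-up and applied on every control-loop cycle.
    with open(path) as f:
        terms = json.load(f)
    return terms["kp"], terms["ki"], terms["kd"]
```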
[0063] According to the fifth aspect, in some examples, the PID-controlled system is an optical switch, and the PID-controlled system further comprises: an optical fibre configured to transmit light in a propagation direction, wherein the piezoelectric actuator is configured to actuate movement of the propagation direction.
[0064] According to the sixth aspect, in some examples, obtaining the training data further comprises iteratively: selecting a test device of the plurality of test devices; measuring a physical parameter indicative of a physical property of the selected test device; measuring an electrical parameter indicative of an electrical characteristic associated with the selected test device; inputting a state vector based on the physical parameter and the electrical parameter to the reinforcement learning policy model to generate a set of PID parameters; selecting a control operation to be performed on the selected device; performing the selected control operation on the selected device under closed-loop control and using the set of PID parameters while measuring an output variable of the control operation; and generating a reward using the multi-objective reward function and one or more metrics calculated based on the output variable of the control operation, wherein training the reinforcement learning model further comprises updating the reinforcement learning model based on the generated reward.
[0065] In some examples, the test device and control operation are selected randomly.
[0066] Thus, there have been described techniques for efficiently determining PID terms specific to a control system in a one-step and offline manner.
[0067] Figure 1 schematically illustrates a PID-controlled system. The PID-controlled system 100 may in some examples be a system that implements PID-based control as the system performs a control loop. In the current example, the PID-controlled system 100 comprises an input 110 that is the input for the control loop of the PID-controlled system 100. Input 110 may correspond to a desired set point value that the PID-controlled system 100 is attempting to control a device 130 to output. The input 110 is provided to a controller 120 by input connection 115. The controller 120 is configured to control operation of the device 130 based on the input 110, and further based on sensor data provided by a sensor 140. Device 130 is configured to be controlled by controller 120 based on control input 125 and is configured to provide output 150 via the output connection 135. Sensor 140 is configured to sense information associated with the device 130. More particularly, sensor 140 senses the output 150 of the device 130 after the control of the device has been performed by the controller 120; this sensed output may be referred to as a measured process variable.
[0068] Controller 120, device 130, and sensor 140 may be implemented in dedicated circuitry or FPGAs, or may be implemented by general-purpose circuitry executing code. It will be appreciated that, while shown as separate components, any of the components of figure 1 may be combined. For example, the controller 120 and device 130 may be combined into a single component. Further, the sensor 140, rather than being a separate component, may instead represent the sending of the output of the device 130 to the controller 120. It will further be appreciated that input 110 and output 150 may represent input and output values, rather than representing an input or output component. Further, any of the components illustrated may be implemented in a distributed manner. For example, the controller may be implemented off-device and network-connected with the other components of figure 1.
[0069] An exemplary control loop will now be described. A set point value as input 110 may be provided to the controller 120. The set point value represents a desired output value of the device 130. The sensor 140 may then sense (i.e. measure) the output 150 of the device, i.e. the measured process variable, and provide the measured process variable to the controller 120 (indicated by the arrows). The controller 120 may then determine a correction to be performed by the device 130 based on the input 110/desired set point value and the output 150/measured process variable. In some examples, determining this correction comprises determining a difference between the desired set point value and the measured process variable, and performing one or more mathematical operations on the difference. The controller may be configured to store in memory a proportional, integral, and derivative term for use in the determination of the correction. More particularly, the controller 120 may perform one or more mathematical operations on the set point/process variable difference using each (or one or a subset of) the stored PID terms to determine the correction. The controller 120 may then provide this correction as a control input 125 to cause the device 130 to be controlled based on the calculated correction. Output 150 of the device 130 is then sensed, i.e. the new measured process variable, and this may be used at the start of the next control loop.
[0070] Thus, in the current example, the controller 120 is configured to store pre-determined PID terms for use in the determination of the correction and control to be applied to the device 130. It will be appreciated that the controller 120 may implement one or any combination of P, I, and D terms for performing any combination of P, I, and D control. The controller 120 is configured to apply the PID terms and perform PID control using conventional techniques. As discussed herein, the present techniques relate to methods for determining PID terms for use in a control system, such as the PID-controlled system 100. As also discussed herein, PID terms refer to the constant proportional, integral, and derivative terms that may be used in mathematical operations of the PID controller. The constant P term may multiply an error value. The constant I term may multiply an integral calculated based on the error value. The constant D term may multiply a derivative based on the error value. In other words, the PID terms referred to herein may be referred to as PID coefficients.
[0071] It will be appreciated that the PID-controlled system 100 may comprise additional components that are not shown in figure 1. Further, while the controller 120 is shown to receive sensor 140 input and described as calculating the difference in the input 110 and output 150, it will be appreciated that a module other than the controller 120 may perform this functionality.
[0072] Figure 2 schematically illustrates a method 200 for determining PID terms for use in a PID-controlled system, for example in the system of figure 1. The controller of the PID-controlled system may further be configured to receive input and to control the device based on closed-loop control also using this input. The method 200 includes the following steps.
[0073] At step S201, a physical parameter indicative of a physical property of the device is obtained. In some examples, the physical property of the device is a resonant frequency of the device.
[0074] At step S202, the PID terms are determined based on applying a machine learning technique to the physical parameter.
[0075] Thus, and as discussed above, the present approach enables the determination of PID terms based on a physical property of the device in the control system in an offline manner without needing to operate the device in the control loop. Further, once the physical parameter is obtained, the present approach utilises a machine learning technique in a one-step tuning to determine the PID terms.
[0076] With reference again to figure 2, method 200 may, in some examples, also include steps S201a and S202a. At step S201a, a further parameter indicative of a characteristic of the device is obtained. While step S202a has been shown in figure 2 as a separately labelled step, it will be appreciated that step S202a optionally modifies step S202 rather than representing a further separate step. At step S202a, the PID terms are determined based on applying the machine learning technique to the physical parameter and the further parameter.
[0077] Accordingly, in some example methods that include steps S201a and S202a, the PID term determination is based on applying the machine learning technique to the physical parameter and the further parameter. Introducing this further parameter into the PID determination may, in some implementations, result in further increased efficiency of the PID term determination and in further increased performance of the PID terms when used for control of the device.
[0078] Figure 3 schematically illustrates a method 300 for determining PID terms for use in a PID-controlled system comprising a device, a sensor configured to sense information associated with the device, and a controller configured to control the device based on closed-loop control using the sensed information and the PID terms according to the teachings of the present disclosure. For example, method 300 may be implemented in the system of figure 1.
[0079] Method 300 is similar to method 200 when method 200 includes steps S201a and S202a, but in method 300, the physical property is a resonant frequency of the device, the further parameter is an electrical parameter, and the characteristic of the device is an electrical gain associated with the device. As recognised by the present inventors, a physical and electrical parameter indicative of a resonant frequency and an electrical gain are able to characterise the device such that PID terms may be determined for the device based on these parameters alone. Method 300 includes the following steps.
[0080] At step S301, a physical parameter indicative of a resonant frequency of the device is obtained. In some examples, the physical parameter indicative of the resonant frequency may correspond to the resonant frequency. For example, the physical parameter may be the resonant frequency itself, or may be a value determined from the resonant frequency. In some examples, the resonant frequency is pre-determined. In other examples, the resonant frequency of the device may be measured, which is explained further below.
[0081] At step S302, an electrical parameter indicative of an electrical gain associated with the device is obtained. In some examples, the electrical parameter indicative of the electrical gain corresponds to the electrical gain. For example, the electrical parameter may be the electrical gain itself, or may be a value determined from the electrical gain. In some examples, the electrical gain is pre-determined. In other examples, the electrical gain of the device may be measured, which is explained further below.
[0082] At step S303, the PID terms are determined based on applying a machine learning technique to the physical parameter and the electrical parameter.
[0083] As discussed above, the machine learning technique may be a reinforcement learning or deep reinforcement learning technique. In some examples, applying the machine learning technique to the physical parameter and the electrical parameter comprises inputting the physical parameter and the electrical parameter to a trained reinforcement learning/deep reinforcement learning model to determine the PID terms.
[0084] Figure 4 schematically illustrates a method 400 for training the reinforcement learning model according to techniques of the disclosure, for example for use in the system of figure 1. Method 400 comprises the following steps.
[0085] At step S401, training data is obtained that comprises physical parameters indicative of physical properties associated with a test apparatus comprising a plurality of test devices and electrical parameters indicative of electrical characteristics associated with the test apparatus, the test devices corresponding to the device. In other words, the test apparatus comprises a plurality of test devices which correspond to the device. In some examples, the training data comprises a physical parameter and an electrical parameter for each of the plurality of test devices of the test apparatus.
[0086] At step S402, the reinforcement learning model is trained based on the training data and a multi-objective reward function.
[0087] With reference still to figure 4, method 400 may, in some examples, also include step S401a. Step S401a is shown in figure 5, and comprises steps S501 to S507. It will be appreciated that step S401a optionally modifies step S401 or may represent further separate steps. Step S401a includes further features relating to the obtaining of training data. Step S401a as shown in figure 5 will now be described.
[0088] At step S501, a test device of the plurality of test devices is selected. This selection may be random.
[0089] At step S502, a physical parameter indicative of a physical property of the selected test device is measured. In some examples, the physical parameter is obtained rather than measured. In some examples, the physical property is a resonant frequency. In some examples, the test device is a piezoelectric actuator and the test apparatus is an optical switch comprising a plurality of piezoelectric actuators.
[0090] At step S503, an electrical parameter indicative of an electrical characteristic associated with the selected test device is measured. In some examples, the electrical parameter is obtained rather than measured. In some examples, the electrical characteristic is an electrical gain associated with the test device. In some examples, the test device is a piezoelectric actuator and the test apparatus is an optical switch comprising a plurality of piezoelectric actuators.
[0091] An example approach for measuring the resonant frequency of an actuator and an electrical gain associated with the actuator will now be described, which may be relevant for some examples described herein. It will be appreciated that this example approach, while described in the context of actuators, is not limited to actuators. For example, the resonant frequency and electrical gain of any damped oscillator or actuator or PEA (or device) may be determined using this example approach.
[0092] In the current example, an actuator is driven to transition across a distance in open loop and the response is measured over a time period selected such that the actuator stabilises to a steady state value. This measurement may be referred to as the 'trace'. The value of this distance may be selected based on the actuator being used. The data collected during this open loop transition and the trace is used to calculate the resonant frequency and electrical gain as follows.
[0093] The electrical gain may be calculated as the ratio between the change in voltage of the trace (effectively the difference between the starting position and the final steady state position) and the amount of voltage used to drive the actuator, i.e. the output transition voltage divided by the input transition voltage.
[0094] The resonant frequency may be measured as the frequency corresponding to the peak of the Fourier transformation of the trace, i.e. corresponding to the largest contributing frequency in the spectrum of the actuator response. In some examples, as the actuators may be driven in two control axes, for example a first dimension or x-axis and a second dimension or y-axis, this process is repeated twice for each actuator - once per control axis. In other examples, the process may be repeated as many times as there are control axes of the actuator/device.
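One possible numpy implementation of this characterisation is sketched below, under the assumption that the trace is a uniformly sampled open-loop response that has reached steady state by its final sample; the function name and arguments are illustrative only. For an actuator with two control axes, it would be called once per axis.

    import numpy as np

    def characterise_axis(trace, input_transition_voltage, sample_rate_hz):
        # Electrical gain: change in the trace between the starting position
        # and the final steady state, divided by the input transition voltage.
        gain = (trace[-1] - trace[0]) / input_transition_voltage
        # Resonant frequency: the largest contributing (non-DC) component
        # in the spectrum of the open-loop response.
        spectrum = np.abs(np.fft.rfft(trace - np.mean(trace)))
        freqs = np.fft.rfftfreq(len(trace), d=1.0 / sample_rate_hz)
        resonant_frequency = freqs[1:][np.argmax(spectrum[1:])]
        return resonant_frequency, gain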
[0095] Returning to figure 5, at step S504 a state vector based on the physical parameter and the electrical parameter is input to the reinforcement learning model to generate a set of PID parameters. In some examples, the reinforcement learning policy has an input size of four (a resonant frequency and electrical gain for the x and y axes of an actuator) and an output size of six (P, I, and D terms for the x and y axes). In some examples, the state vector is generated based on concatenating the resonant frequency and electrical gain values for each axis to generate a four-dimensional state vector. In some examples, inputting the four-dimensional state vector to the reinforcement learning policy produces PID terms for the x and y axes of the actuator (i.e. six terms in total).
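The state construction and policy call of step S504 can be sketched as follows; the policy is shown as a plain callable returning six values, which is an assumption made only to make the input/output sizes concrete.

    import numpy as np

    def pid_terms_from_policy(policy, wx, gx, wy, gy):
        # Concatenate the per-axis resonant frequencies and gains into the
        # four-dimensional state vector described above.
        state = np.array([wx, gx, wy, gy], dtype=np.float32)
        # The policy outputs six values: P, I and D terms for each axis.
        px, ix, dx, py, iy, dy = policy(state)
        return (px, ix, dx), (py, iy, dy)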
[0096] At step S505, a control operation to be performed on the selected device is selected. In some examples, the control operation is an actuation of the actuator or a switching operation of the actuator. This selection may be random.
[0097] At step S506, the selected control operation is performed on the selected device under closed-loop control and using the set of PID terms generated by the reinforcement learning policy, while measuring an output variable of the control operation. In some examples, the actuator is actuated in accordance with the selected actuation or switching operation while the position of the actuator is recorded for a predetermined time period. In some examples, this period of time during which actuator movements are to be observed is determined based on the metrics used in the reward. For example, the period of time may be determined such that metrics used in the reward function may be reliably measured. Overshoot may require a smaller measurement time period than steady state, for example.
[0098] At step S507, a reward is generated using the multi-objective reward function and one or more metrics calculated based on the output variable of the control operation. In some examples, a metric measured from the output variable, i.e. the position of the actuator over time during the operation of the actuator, may be settling time and/or overshoot. In some examples, generating the reward is based on a plurality of metrics, for example two metrics where the metrics are the settling time and the overshoot. Further, the reward may be generated based on pre-determined acceptable values for the metrics to be measured. In some examples, pre-determined acceptable values for an overshoot and settling time are used as the basis of generating the reward, and the measured metrics are compared to the pre-determined acceptable values. In some examples, a pre-determined acceptable maximum value of settling time may be 20 ms, with smaller settling times being rewarded more than greater settling times.
[0099] In the context of piezoelectric actuators (or actuators in general), the settling time may correspond to the time period for the position of the PEA to fall and remain within a predefined threshold of the desired position (or the steady state error). In the context of PEAs in optical switches, minimising this predefined threshold equates to minimising the switching overhead, since when the position of the PEA is not close enough to its desired position, the PEA cannot reliably be used to establish a light path to an intended output. More generally, stability with respect to a desired position relates directly to optical loss, as greater movement about a desired position will cause more light to be diverted away from the intended output.
[00100] In the context of piezoelectric actuators (or actuators in general), the overshoot may correspond to an amount past a desired position that the PEA has reached. In the context of PEAs in optical switches, a 'large' overshoot may result in a switching port's input being directed at an output port other than the intended output port, with resulting optical losses.
[00101] A further example metric is the steady state. In some examples, steady state is defined as the time when the position of the PEA gets to and remains within a predefined threshold that is smaller than the predefined threshold associated with the settling time. Thus, in some examples, the steady state may indicate the longer term stability of the PEA, rather than its instantaneous stability after movement (as for the settling time metric).
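As an illustration of how such metrics might be computed from the recorded output variable, the sketch below derives a settling time and an overshoot from a position trace; the target and tolerance arguments are assumptions for illustration, not values defined by the present techniques.

    import numpy as np

    def settling_time(times, positions, target, tolerance):
        # Time after which the position enters and remains within
        # +/- tolerance of the target position.
        outside = np.abs(positions - target) > tolerance
        if not outside.any():
            return times[0]                   # within tolerance throughout
        last_outside = np.flatnonzero(outside)[-1]
        if last_outside == len(times) - 1:
            return times[-1]                  # never settled in the window
        return times[last_outside + 1]

    def overshoot(positions, start, target):
        # Largest excursion past the target in the direction of travel.
        direction = 1.0 if target >= start else -1.0
        return max(0.0, float((direction * (positions - target)).max()))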
[00102] The method of the current example may then return to step S501 and the training procedure may repeat until a condition is satisfied, indicated by the dashed arrow shown in figure 5 between steps S507 and S501. Before returning to step S501, in some examples, the method may further comprise updating the reinforcement learning model based on the generated reward. In some examples, this updating is part of the loop shown in the method of figure 5. In some examples, the reinforcement learning model is updated after each iteration of the training procedure steps S501 to S507. In other examples, multiple iterations of steps S501 to S507 may be performed before the reinforcement model is updated based on the generated rewards.
[00103] A further example training procedure will now be described, comprising steps (1) to (4) as described below. In this example, at step (1) a reinforcement learning policy is initialised. This may be randomly initialised. In some examples, as described above, the policy has an input size of four (a resonant frequency and electrical gain for the x and y axes of an actuator), and an output size of six (P, I, and D terms for the x and y axes of the actuator). It will be appreciated that the input and output sizes may vary, for example when only one control axis is considered (producing three PID terms), or if three control axes are considered (i.e. x, y, and z dimensions of the actuator are controlled, producing nine PID terms). Thus, the training procedure is not limited to an input size of four and an output size of six. Further, policy and model are referred to interchangeably in the context of this example training procedure. At step (2) a reinforcement learning loop is performed, i.e. a Markov Decision Process-policy interface process is performed.
[00104] An example training episode of step (2) is as follows: (a) choose a source actuator, (b) choose a destination position, (c) in open-loop control, measure the resonant frequency and gain of each axis, (d) concatenate the resonant frequency and gain values for each axis to make a four-dimensional state vector (s = [wx, gx, wy, gy]), (e) input the state vector to the policy and receive an output specifying PID terms for each axis (x and y) (PID = [Px, Ix, Dx, Py, Iy, Dy]), (f) set the PID parameters of the chosen source actuator to be these parameters, (g) in closed-loop control, move the position of the source actuator to the chosen destination position and record the position of the actuator throughout this switching process for the selected time period, (h) from this signal, measure all relevant metrics (for example settling time and overshoot), (i) calculate the reward based on these metrics, (j) end of episode - go back to (a).
[00105] In some examples, the source actuator and destination position are randomly chosen. In some examples, at step (i), the reward is calculated based on a reward function and inputting the metrics to the reward function. In some examples, a multi-objective reward function may be used at step (i). In some examples, the multi-objective reward function used at step (i) may be R as defined herein. When using R, given a set of metrics, S (e.g. settling time, overshoot), the measured value of each metric is as measured during step (h), and corresponds to mi in R. The predefined acceptable threshold of each metric, for example the predefined acceptable thresholds for settling time and overshoot, corresponds to li in R. In this example, the reinforcement learning policy/model will be punished should the PID parameters generated by the policy/model drive a switching process that yields a measurement of mi > li. In some examples, as there may be a reward value associated with each control axis of the PEA, the axis with the worst result (measured metric) is used in this comparison.
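The exact form of R is not reproduced in this section; purely to illustrate the threshold behaviour described above (rewarding mi <= li and punishing mi > li), one assumed functional form might be sketched as follows. The true R used herein may differ.

    def multi_objective_reward(measured, thresholds):
        # Illustrative stand-in for R: each measured metric mi is compared
        # with its predefined acceptable threshold li. Values within the
        # threshold earn a reward that grows as mi shrinks; values beyond
        # the threshold are punished.
        reward = 0.0
        for m, l in zip(measured, thresholds):
            reward += (1.0 - m / l) if m <= l else -(m / l)
        return reward

    # e.g. a settling time of 12 ms against a 20 ms threshold and an
    # overshoot of 0.03 against a 0.05 threshold:
    # multi_objective_reward([0.012, 0.03], [0.020, 0.05])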
[00106] This example training procedure may further comprise step (3), where the reinforcement learning policy is updated. In some examples, at step (3), in accordance with the initialised reinforcement learning policy from step (1), and after training data has been collected during an iteration of the training procedure of step (2) (comprising steps (a) to (j)), the reinforcement learning policy is updated. Finally, at step (4), if a pre-determined stop condition is met then the training procedure terminates, otherwise the training procedure returns to step (2) to collect more training data. The stop condition may be based on an average of the reward not changing more than a predetermined amount between training iterations, or on a pre-determined number of training iterations having been met (i.e. steps (2) and (3) have been repeated a pre-determined number of times), for example 1600 iterations. Once trained, the reinforcement learning policy/model may then be used on a different plurality of actuators to determine PID terms specific to those actuators in a one-step and offline manner.
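A sketch of the overall loop of steps (2) to (4), reusing the multi_objective_reward sketch above, is given below; measure_open_loop, run_switch and the policy object are hypothetical helpers introduced only to make the structure concrete, and policy.update stands in for whichever update rule (e.g. PPO) is used.

    import random
    import numpy as np

    def train(policy, actuators, thresholds, max_iterations=1600):
        for _ in range(max_iterations):                # step (4): stop condition
            act = random.choice(actuators)             # (a) choose source actuator
            dest = random.choice(act.destinations)     # (b) choose destination
            wx, gx, wy, gy = measure_open_loop(act)    # (c) per-axis freq and gain
            state = np.array([wx, gx, wy, gy])         # (d) 4-d state vector
            pid = policy.act(state)                    # (e)/(f) six PID terms
            metrics = run_switch(act, dest, pid)       # (g)/(h) closed-loop move
            reward = multi_objective_reward(metrics, thresholds)  # (i)
            policy.update(state, pid, reward)          # step (3): policy update
        return policy                                  # (j) each episode restarts at (a)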
[00107] Further, the trained reinforcement learning model of the present teachings may be used to generate a set of PID terms for a control system once, rather than being integrated into the control loop of the control system and used to dynamically change the PID terms throughout the lifetime of the control system, thereby removing the need for resource-intensive computations to be carried out by the control system during a control loop.
[00108] Thus, the training episode is one-step, as the policy/model may receive an initial state (i.e. the physical state information for a given PEA - resonant frequency and electrical gain for every control axis) and take a single action (the simultaneous output of three PID terms per axis), before receiving a reward based on the switching response of that PEA. At the end of the training episode, a new PEA may be randomly selected and the process may be repeated.
[00109] It will be appreciated that while this example training procedure has been described in the context of actuators, the procedure may equally be applied to damped oscillators, or any device that may be characterised by resonant frequency and gain. It will further be appreciated that any aspect of this example training procedure may be combined, modified or interchanged with the method of figure 5. Further, this training procedure may have a number of variables, for example how the decisions are made (e.g. policy gradient vs a modified Q-learning method), how/if actions or observations are scaled, how samples are used to train the policy, or how the policy is implemented (e.g. deep neural network or Q-table). In some examples, the training procedure may be implemented using the proximal policy optimisation, PPO, algorithm, but it will be appreciated that any other reinforcement learning algorithm that supports a continuous action space could be used instead. Further examples include soft actor critic, twin delayed deep deterministic, or asynchronous advantage actor critic.
[00110] An example testing procedure will now be discussed. Given state information (resonant frequency and electrical gain values) for one or more actuators to be controlled, and a policy/model that has already been trained using the example training procedure, the PID terms may be tested. In the current example, for each actuator, the state information is used as an input to the trained policy/model, and the PID terms for that actuator are output. These PID terms may then be implemented in a control system, for example stored in memory of a controller of the control system. As such, during the testing procedure, actuators (or devices more generally) are not required to be powered on/operated in a control loop, assuming that the physical and electrical parameters (resonant frequency and electrical gain) are already known. Conventionally, actuators are characterised by this information from the factory. Thus, testing is also offline and one-step, i.e. not requiring an iterative process. Once trained, the policy/model may be used for a large number of actuators.
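The one-step, offline deployment might then look as follows; the actuator attributes and the controller-memory interface are assumptions for illustration only.

    import numpy as np

    def deploy(policy, actuators, controller_memory):
        # One forward pass per actuator: no control loop, no iteration,
        # and no need to power the devices if their characterisation
        # (resonant frequency and gain per axis) is already known.
        for act in actuators:
            state = np.array([act.wx, act.gx, act.wy, act.gy])
            pid_x, pid_y = np.split(policy.act(state), 2)
            controller_memory[act.identifier] = (pid_x, pid_y)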
[00111] Figure 6 schematically illustrates a PID-controlled system 600 that may implement the PID terms determined according to the present techniques. As shown, PID-controlled system 600 comprises a piezoelectric actuator 610 and a position sensor 620 configured to sense a position of the piezoelectric actuator 610. The PID-controlled system 600 further comprises a controller 630 configured to control the piezoelectric actuator 610 based on closed-loop PID control using the position of the piezoelectric actuator 610 sensed by the position sensor 620 and PID terms determined according to the present techniques. The determined PID terms may be stored in a memory associated with the PID-controlled system 600, for example within a memory of the controller 630. Controller 630 is configured to receive an input and to control the piezoelectric actuator based on this input, the sensed position, and the PID terms, in accordance with the discussion of figure 1. Controller 630 performs PID control using conventional techniques but based on the PID terms determined according to the present teachings. It will be appreciated that PID-controlled system 600 may comprise additional components not shown in figure 6.
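For completeness, one conventional discrete-time PID update of the kind controller 630 might perform with the determined terms is sketched below; this is standard control practice rather than part of the present techniques.

    def pid_step(error, previous_error, integral, kp, ki, kd, dt):
        # One iteration of a conventional discrete PID law using the
        # determined proportional (kp), integral (ki) and derivative (kd)
        # terms; error is the difference between desired and sensed position.
        integral += error * dt
        derivative = (error - previous_error) / dt
        output = kp * error + ki * integral + kd * derivative
        return output, integral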
[00112] Figure 7 schematically illustrates an optical switch 700 that may implement the PID terms determined according to the present techniques. Optical switch 700 is an example of a PID-controlled system. As shown, optical switch 700 comprises the components of the PID-controlled system 600, with the addition of an optical fibre 710, and so discussion of the components in figure 7 labelled with the same reference signs as figure 6 applies. The optical fibre 710 is configured to transmit light in a propagation direction, and the piezoelectric actuator 610 is configured to actuate movement of the propagation direction. This movement may be actuated by the piezoelectric actuator 610 controlling a position of the optical fibre 710, or by controlling a position of a lens through which the light transmitted by the optical fibre 710 is directed.
[00113] As discussed for figure 6, controller 630 is configured to receive an input and to control the piezoelectric actuator based on this input, the sensed position, and the PID terms. It will be appreciated that optical switch 700 may comprise additional components not shown in figure 7, for example additional piezoelectric actuators and additional optical fibres.
[00114] Figure 8 schematically illustrates an apparatus 800 that is configured to determine PID terms according to the present techniques. More particularly, apparatus 800 is configured to determine PID terms for use in a PID-controlled system comprising a device, a sensor configured to sense information associated with the device, and a controller configured to control the device based on closed-loop control using the sensed information and the PID terms.
[00115] As shown, apparatus 800 comprises parameter obtaining circuitry 810 configured to obtain a parameter indicative of a physical property of the device. Apparatus 800 further comprises PID determining circuitry 820 and machine learning circuitry 830. Machine learning circuitry 830 is configured to apply a machine learning technique to the parameter, and the PID determining circuitry 820 is configured to determine, based on the machine learning circuitry 830 applying a machine learning technique to the parameter, the PID terms. It will be appreciated that in some examples machine learning circuitry 830 may not be implemented on the apparatus 800 itself. In such examples, the PID determining circuitry 820 may be configured to communicate with machine learning circuitry 830 over a network connection or similar. Machine learning circuitry 830 may be implemented in a server connected over a network to the apparatus 800. It will be appreciated that the components of apparatus 800 may be network connected and distributed across one or more computing systems/devices, for example a plurality of computing devices may each implement one of the parameter obtaining circuitry 810, PID determining circuitry 820, and machine learning circuitry 830.
[00116] The methods discussed above may be performed under control of a computer program executing on a computing device, for example the apparatus 800 of figure 8. The computing device may comprise a CUDA-enabled GPU. Hence, a computer program may comprise instructions for controlling a computing device to perform any of the methods discussed above. The program can be comprised in a computer-readable medium. A computer-readable medium may include non-transitory media such as physical storage media, for example storage discs and solid-state devices. A computer-readable medium may additionally or alternatively include transient media such as carrier signals and transmission media, which may for example be used to convey instructions between a number of separate computer systems, and/or between components within a single computer system.
[00117] Therefore, from one perspective, there has been described a method for efficiently determining PID terms specific to a control system using a machine learning technique and properties of the device controlled in the control system. These techniques, while being largely directed to the determination of PID terms, can be adapted to determine any or a subset of the P, I, and D terms. Further, these techniques can be adapted to determine PID terms, in some examples, for any damped oscillator, actuator, or piezoelectric actuator. The PID terms determined by these techniques can be used for performing PID control in a PID-controlled system. For example, a controller of the control system may store PID terms determined according to these techniques for use in controlling a device and determining a correction to be performed by the device.
[00118] The various embodiments described herein are presented only to assist in understanding and teaching the claimed features. These embodiments are provided as a representative sample of embodiments only, and are not exhaustive and/or exclusive. It is to be understood that advantages, embodiments, examples, functions, features, structures, and/or other aspects described herein are not to be considered limitations on the disclosure scope defined by the claims or limitations on equivalents to the claims, and that other embodiments may be utilised and modifications may be made without departing from the scope of the invention as defined by the claims.

Claims (24)

CLAIMS: 1. A method for determining proportional-integral-derivative, PID, terms for use in a PID-controlled system comprising a device, a sensor configured to sense information associated with the device, and a controller configured to control the device based on closed-loop control using the sensed information and the PID terms, the method comprising: obtaining a physical parameter indicative of a physical property of the device; and determining, based on applying a machine learning technique to the physical parameter, the PID terms.
2. The method of claim 1, wherein the physical property of the device is a resonant frequency of the device.
3. The method of any preceding claim, further comprising obtaining a further parameter indicative of a characteristic of the device, wherein determining the PID terms is based on applying the machine learning technique to the physical parameter and the further parameter.
4. The method of claim 3, wherein the characteristic of the device is a gain associated with the device.
5. The method of claim 4, wherein the gain associated with the device is an electrical gain associated with the device, and wherein the further parameter is an electrical parameter.
6. The method of claim 5, wherein: the physical parameter comprises a first physical parameter indicative of a first resonant frequency of the device and a second physical parameter indicative of a second resonant frequency of the device; the electrical parameter comprises a first electrical parameter indicative of a first electrical gain associated with the device and a second electrical parameter indicative of a second electrical gain associated with the device; and the PID terms comprise a first and second proportional term, a first and second integral term, and a first and second derivative term.
7. The method of claim 6, wherein: the first resonant frequency of the device is a resonant frequency of the device associated with a first dimension of the device; the second resonant frequency of the device is a resonant frequency of the device associated with a second dimension of the device; the first electrical gain associated with the device is an electrical gain associated with the first dimension of the device; and the second electrical gain associated with the device is an electrical gain associated with the second dimension of the device.
8. The method of any preceding claim, wherein the device is a damped oscillator.
9. The method of claim 8, wherein the damped oscillator is a piezoelectric actuator.
10. The method of any preceding claim, wherein the machine learning technique is a reinforcement learning technique.
11. The method of claim 10, wherein applying the reinforcement learning technique comprises inputting the physical parameter and the electrical parameter to a trained reinforcement learning model.
12. The method of claim 11, wherein: the trained reinforcement learning model is trained based on training data comprising physical parameters indicative of physical properties associated with a test apparatus comprising a plurality of test devices and electrical parameters indicative of electrical characteristics associated with the test apparatus; and the test devices correspond to the device.
13. The method of claim 12, wherein the trained reinforcement learning model is trained further based on a multi-objective reward function.
14. The method of claim 13, wherein the multi-objective reward function, R, is: wherein i is a number greater than one, mi is a measured value of a metric for a given output, and li is a predefined acceptable threshold for that metric.
15. The method of claims 13 or 14, wherein the trained reinforcement learning model is trained further based on iteratively: selecting a test device of the plurality of test devices; measuring a physical parameter indicative of a physical property of the selected test device; measuring an electrical parameter indicative of an electrical characteristic associated with the selected test device; inputting a state vector based on the physical parameter and the electrical parameter to the reinforcement learning model to generate a set of PID parameters; selecting a control operation to be performed on the selected device; performing the selected control operation on the selected device under closed-loop control and using the set of PID parameters while measuring an output variable of the control operation; and generating a reward using the multi-objective reward function and one or more metrics calculated based on the output variable of the control operation, wherein the trained reinforcement learning model is trained further based on updating the reinforcement learning model based on the generated reward.
16. The method of claim 15, wherein: the plurality of test devices are a plurality of piezoelectric actuators; the physical properties are resonant frequencies of the piezoelectric actuators; the electrical characteristics are electrical gains associated with the piezoelectric actuators; selecting the control operation comprises selecting an actuation to be performed on the selected piezoelectric actuator; and performing the selected control operation on the piezoelectric actuator by performing the actuation on the selected piezoelectric actuator.
17. The method of any preceding claim, the method further comprising controlling, based on the PID terms, the device.
18. An apparatus for determining proportional-integral-derivative, PID, terms for use in a PID-controlled system comprising a device, a sensor configured to sense information associated with the device, and a controller configured to control the device based on closed-loop control using the sensed information and the PID terms, the apparatus comprising: parameter obtaining circuitry configured to obtain a parameter indicative of a physical property of the device; and PID term determining circuitry configured to determine, based on machine learning circuitry configured to apply a machine learning technique to the parameter, the PID terms.
19. A computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the method of claims 1 to 17.
20. A system comprising: a device; a sensor configured to sense information associated with the device; and a controller configured to control the device based on closed-loop PID control using the sensed information and PID terms determined based on applying a machine learning technique to an obtained parameter indicative of a physical property of the device.
21. A PID-controlled system comprising: a piezoelectric actuator; a position sensor configured to sense a position of the piezoelectric actuator; and a controller configured to control the piezoelectric actuator based on closed-loop PID control using the position of the piezoelectric actuator sensed by the position sensor and PID terms determined based on applying a machine learning technique to an obtained parameter indicative of a physical property of the piezoelectric actuator.
22. The PID-controlled system of claim 21, wherein the PID-controlled system is an optical switch, the system further comprising: an optical fibre configured to transmit light in a propagation direction, wherein: the piezoelectric actuator is configured to actuate movement of the propagation direction.
23. A method for training a reinforcement learning model to determine PID terms for use in a PID-controlled system comprising a device, a sensor configured to sense information associated with the device, and a controller configured to control the device based on closed-loop control using the sensed information and the PID terms, the method comprising: obtaining training data comprising physical parameters indicative of physical properties associated with a test apparatus comprising a plurality of test devices and electrical parameters indicative of electrical characteristics associated with the test apparatus, the test devices corresponding to the device; and training, based on the training data and a multi-objective reward function, the reinforcement learning model.
24. The method of claim 23, wherein obtaining the training data further comprises iteratively: selecting a test device of the plurality of test devices; measuring a physical parameter indicative of a physical property of the selected test device; measuring an electrical parameter indicative of an electrical characteristic associated with the selected test device; inputting a state vector based on the physical parameter and the electrical parameter to the reinforcement learning model to generate a set of PID parameters; selecting a control operation to be performed on the selected device; performing the selected control operation on the selected device under closed-loop control and using the set of PID parameters while measuring an output variable of the control operation; and generating a reward using the multi-objective reward function and one or more metrics calculated based on the output variable of the control operation, wherein training the reinforcement learning model further comprises updating the reinforcement learning model based on the generated reward.