WO2022192352A1 - Enhanced vector control optimization based on a multi-critic with multi-Q-learning for motor control - Google Patents

Enhanced vector control optimization based on a multi-critic with multi-Q-learning for motor control

Info

Publication number
WO2022192352A1
Authority
WO
WIPO (PCT)
Prior art keywords
controller
mcmql
current
electric motor
actor
Prior art date
Application number
PCT/US2022/019486
Other languages
English (en)
Other versions
WO2022192352A9 (fr)
Inventor
Soumava BHATTACHARJEE
Ye Yan
Narayan C. KAR
Lakshmi Varaha IYER
Original Assignee
Magna International Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Magna International Inc. filed Critical Magna International Inc.
Publication of WO2022192352A1 publication Critical patent/WO2022192352A1/fr
Publication of WO2022192352A9 publication Critical patent/WO2022192352A9/fr


Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02PCONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P23/00Arrangements or methods for the control of AC motors characterised by a control method other than vector control
    • H02P23/12Observer control, e.g. using Luenberger observers or Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02PCONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P21/00Arrangements or methods for the control of electric machines by vector control, e.g. by control of field orientation
    • H02P21/0003Control strategies in general, e.g. linear type, e.g. P, PI, PID, using robust control
    • H02P21/0025Control strategies in general, e.g. linear type, e.g. P, PI, PID, using robust control implementing a off line learning phase to determine and store useful data for on-line control
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02PCONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P21/00Arrangements or methods for the control of electric machines by vector control, e.g. by control of field orientation
    • H02P21/14Estimation or adaptation of machine parameters, e.g. flux, current or voltage
    • H02P21/20Estimation of torque
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02PCONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P6/00Arrangements for controlling synchronous motors or other dynamo-electric motors using electronic commutation dependent on the rotor position; Electronic commutators therefor
    • H02P6/34Modelling or simulation for control purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02PCONTROL OR REGULATION OF ELECTRIC MOTORS, ELECTRIC GENERATORS OR DYNAMO-ELECTRIC CONVERTERS; CONTROLLING TRANSFORMERS, REACTORS OR CHOKE COILS
    • H02P2207/00Indexing scheme relating to controlling arrangements characterised by the type of motor
    • H02P2207/05Synchronous machines, e.g. with permanent magnets or DC excitation

Definitions

  • the present disclosure relates generally to a method and system for controlling an electric motor drive.
  • Electric motor drives also called motor drives, provide an alternating current (AC) voltage to an electric motor.
  • Motor drives typically include inverters to convert direct current (DC) electrical power to the AC voltage.
  • Motor drives are frequently used for powering traction motors in electric vehicles (EVs), such as battery electric vehicles, hybrid electric vehicles (HEVs), and plug-in hybrid electric vehicles (PHEVs).
  • EVs electric vehicles
  • HEVs hybrid electric vehicles
  • PHEVs plug-in hybrid electric vehicles
  • Motor drives must control several different characteristics of the AC voltage, depending on design details of the electric motor and demands of the system at any given time (e.g. amount of torque to be produced, speed and position of the motor, etc.).
  • Vector control plays a critical role in a motor drive to deliver the desired torque and speed for electrified vehicle (EV) applications.
  • Motor speed and stator current control depends on various motor parameters which influence the performance of the electric motor.
  • tuning of speed and current controller parameters using conventional control techniques also depends on parameters of the electric motor.
  • an electric motor drive comprises: an inverter having at least one switch operable to supply an alternating current (AC) power to an electric motor; and a controller.
  • the controller is configured to: compute an error signal by subtracting an actual value signal from a reference value corresponding to a control signal; determine a reward based on the error signal; determine a state observation based on the error signal; compute, using a multi-critic with multi-Q-learning (MCMQL) actor optimization (AO) controller, an actor output based on the error signal, the MCMQL-AO controller including an actor network and a plurality of critic networks, each configured to update the actor network based on the reward and the state observation; determine the control signal based on the actor output; and command, based on the control signal, the at least one switch to selectively conduct current.
  • MCMQL multi-Q-learning
  • AO actor optimization
  • a method of operating an electric motor drive comprises: computing an error signal by subtracting an actual value signal from a reference value corresponding to a control signal; determining a reward based on the error signal; determining a state observation based on the error signal; computing, using a multi-critic with multi-Q-learning (MCMQL) actor optimization (AO) controller, an actor output based on the error signal, the MCMQL-AO controller including an actor network and a plurality of critic networks each configured to update the actor network based on the reward and the state observation; determining the control signal based on the actor output; and commanding, based on the control signal, at least one switch of an inverter to selectively conduct current.
  • MCMQL multi-Q-learning
  • AO actor optimization
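  • The claimed control flow above (error computation, reward and state-observation determination, actor inference, and switch commanding) can be pictured with the minimal Python sketch below; the function names, signatures, and the direct use of the actor output as the control signal are illustrative assumptions and do not reproduce the claimed MCMQL-AO networks.

```python
# Illustrative sketch of the claimed per-step control flow (not the patented networks).
# 'actor', 'reward_fn', 'observe_fn', and 'pwm_command' are hypothetical placeholders.
def control_step(reference, actual, actor, reward_fn, observe_fn, pwm_command):
    error = reference - actual        # error signal = reference value - actual value signal
    reward = reward_fn(error)         # reward determined based on the error signal
    state = observe_fn(error)         # state observation determined based on the error signal
    action = actor(state)             # actor output (e.g. T_ref, or V_d and V_q)
    control_signal = action           # control signal determined based on the actor output
    pwm_command(control_signal)       # command the inverter switch(es) to selectively conduct
    return reward, state, action
```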
  • FIG. 1 shows a schematic block diagram of a system, in accordance with an aspect of the present disclosure
  • FIG. 2 shows a schematic block diagram of an electric motor drive, in accordance with an aspect of the present disclosure
  • FIG. 3 shows a schematic block diagram of an electric motor drive in accordance with an aspect of the present disclosure
  • FIG. 4 shows a block diagram of a motor control architecture incorporating deep reinforcement learning (DRL) using multi-critic with multi-Q-learning actor optimization (MCMQL-AO) controllers, in accordance with an aspect of the present disclosure
  • FIG. 5 shows a block diagram of a test system for performing adaptive PI control
  • FIG. 6 shows a graph tracking q-axis current in a motor drive using an adaptive PI control technique of the present disclosure
  • FIG. 7 shows a graph tracking d-axis current in the motor drive using the adaptive PI control technique of the present disclosure
  • FIG. 8 shows a graph of speed vs. time using the adaptive PI speed control technique of the present disclosure
  • FIG. 9 shows a graph of torque vs. time using the adaptive PI speed control technique of the present disclosure
  • FIG. 10 shows a schematic block diagram of a motor vector control using multi-critic with multi-Q-learning actor optimization (MCMQL-AO) controllers, in accordance with an aspect of the present disclosure
  • FIG. 11 shows a schematic block diagram of a first MCMQL-AO controller, in accordance with an aspect of the present disclosure
  • FIG. 12 shows a schematic block diagram of a second MCMQL-AO controller, in accordance with an aspect of the present disclosure
  • FIG. 13 shows a schematic of the MCMQL-AO actor-critic network, in accordance with an aspect of the present disclosure
  • FIG. 14 shows a flow chart of steps in a Training workflow of a MCMQL-AO algorithm for control, in accordance with an aspect of the present disclosure
  • FIG. 15 shows a flow diagram for training and tuning a MCMQL-AO controller, in accordance with an aspect of the present disclosure
  • FIG. 16 shows a graph of discounted cumulative reward of an RL agent during training, in accordance with an aspect of the present disclosure
  • FIG. 17 shows a block diagram of a test setup for SIL validation of MCMQL-AO vector control of a permanent magnet synchronous motor (PMSM), in accordance with an aspect of the present disclosure
  • FIG. 18 shows a graph of speed vs. time, with plots showing performance of a MCMQL-AO controller and an adaptive PI controller, in accordance with an aspect of the present disclosure
  • FIG. 19 shows a graph of torque vs. time, with plots showing performance of a MCMQL-AO controller and an adaptive PI controller, in accordance with an aspect of the present disclosure
  • FIG. 20 shows a combined graph of q-axis and d-axis current tracking using both a MCMQL-AO controller and an adaptive PI controller, in accordance with an aspect of the present disclosure
  • FIG. 21 shows a graph of cumulative reward of a MCMQL-AO speed and current controller during speed and current tracking evaluation, in accordance with an aspect of the present disclosure
  • FIG. 22 shows a combined graph of q-axis and d-axis current tracking using both a MCMQL-AO controller and an adaptive PI controller, each with 20% increased flux linkage, in accordance with an aspect of the present disclosure
  • FIG. 23 shows a combined graph of q-axis and d-axis current tracking using both a MCMQL-AO controller and an adaptive PI controller, each with 20% decreased flux linkage, in accordance with an aspect of the present disclosure
  • FIG. 24 shows a graph showing electromagnetic torque waveforms of PMSM for adaptive PI and MCMQL-AO current controllers under 20% reduced flux linkage
  • FIG. 25 shows a combined graph of q-axis and d-axis current tracking using both a MCMQL-AO controller and an adaptive PI controller, each with 20% increased time-varying parameters, in accordance with an aspect of the present disclosure
  • FIG. 26 shows a combined graph of q-axis and d-axis current tracking using both a MCMQL-AO controller and an adaptive PI controller, each with 20% decreased time-varying parameters, in accordance with an aspect of the present disclosure
  • FIG. 27 shows a graph showing the torque response for 20% increased time-varying parameters for adaptive PI and MCMQL-AO current controllers, in accordance with an aspect of the present disclosure.
  • FIG. 28 shows a combined graph of q-axis and d-axis current tracking using both a MCMQL-AO controller and an adaptive PI controller, showing the effect of sample time at different speed profiles, in accordance with an aspect of the present disclosure.
  • a system and method for controlling an electric motor drive is provided.
  • the system and method of the present disclosure may be used to control a motor drive for a permanent magnet synchronous machine (PMSM) type electric motor.
  • PMSM permanent magnet synchronous machine
  • the system and method of the present disclosure may be used with other types of electric machines, such as induction motors or wound field synchronous machines.
  • PMSM permanent magnet synchronous machine
  • MCMQL-AO multi-critic with multi-Q-learning actor optimization
  • the present disclosure provides a methodology to deliver a closed-loop reinforcement learning (RL) agent trained with the deterministic policy gradient based multi-critic with multi-Q-learning (MCMQL) actor optimization (AO) algorithm in the plant environment, where the cost of exploration is high.
  • MCMQL multi-Q-learning
  • AO actor optimization
  • Permanent magnet synchronous motor (PMSM) drives are widely used in EV applications due to their promising features, such as high power density, high efficiency, reliability, and light weight, compared to other motor drive technologies. PMSM drives are commonly controlled using field-oriented control (FOC) algorithms.
  • FOC field–oriented control
  • One strategy for controlling a motor drive is to employ cascaded control loops. For example, conventional field-oriented control (FOC) algorithms use cascaded proportional-integral (PI) based control loops for speed and stator current control of a PMSM. Such a configuration may minimize or eliminate tracking errors under dynamic conditions. The speed and stator current control plays a key role in PMSM torque control and its overall performance.
  • FOC field–oriented control
  • PI proportional-integral
  • Such cascaded control loops may include a first controller to determine a reference torque T ref that corresponds to the electric motor having an actual rotational speed ω m matching the desired rotational speed ω' m , and a second controller to determine the stator reference voltage to be applied to the electric motor to cause a motor current supplied to the electric motor to match a desired current value corresponding to the reference torque T ref .
  • a maximum torque per ampere (MTPA) block may be used to generate the desired current value based on the reference torque T ref .
  • the desired current may be given as direct axis and quadrature axis values, collectively i' dq , although the current may be specified in other ways, such as phase currents i a , i b , i c .
  • the desired voltage may be given as direct axis and quadrature axis values, collectively V dq , although the desired voltage may be specified in other ways, such as phase voltages V ab , V bc , V ca .
  • the effective tuning of controller parameters such as PI controller parameters, depends on plant parameters which are derived from inverter and motor transfer functions.
  • Direct torque control provides optimal torque control for PMSM based on hysteresis comparators and space vector modulation (SVM) switching tables.
  • SVM space vector modulation
  • this control also demands a PI-based torque and flux controller, which is further dependent on the system's non-linear parameters.
  • this control methodology leads to high flux and torque ripple and further requires a higher sampling frequency.
  • Model predictive control (MPC) and model-based design provide effective current control and improved dynamic motor performance. These approaches incorporate complex mathematical models and equations, which increases the computation cost. For real-time implementation, these controls neglect parasitic effects, such as nonlinearity and cross-saturation, to reduce the computational burden and implementation cost.
  • Machine learning may be applied to one or more aspects of controlling a motor drive. Such machine learning controls are based on supervised learning, which generally requires large, labeled, and accurate data sets for training under dynamic conditions.
  • Neural network (NN) based PMSM control has been proven to be efficient due to its effective control and reduced computation time.
  • Recurrent NNs (RNNs) and radial basis function NNs (RBFNNs) have been proposed to enhance the control performance of the PMSM.
  • RNN Recurrent NN
  • RBFNN radial basis function NN
  • the use of supervised learning requires large training and test data sets for dynamic performance.
  • the learning algorithm generalizes and maps the input vector to the target vector, with the aim of minimizing the mapping errors as a cost function.
  • a recurrent neural network may also be used in speed and current controller to enhance the motor dynamic performance.
  • the RNN may be trained using a dynamic programming algorithm to enable accurate and fast-tracking.
  • dynamic programming enables improved training of neural networks for enhanced vector control of a motor.
  • the development of a cost function involves uncertain plant parameters (e.g., inductance, flux linkage, resistance, and switching frequency of the motor and inverter).
  • Deep neural network (DNN) based motor control uses the concept of supervised learning, and has proven to be a continued success in this domain.
  • Supervised learning maps the input vector to the anticipated target vector by adapting a generalized function approximation or cost function.
  • the use of supervised learning requires large, labeled training and test data sets under different dynamic conditions.
  • the learning algorithm generalizes and maps the input vector to the target vector, with the aim of minimizing the mapping errors as a cost function.
  • the stator voltage equation of the PMSM is derived from the Park transformation as equation (1), where V d and V q are the input stator voltages in the d-axis and q-axis, respectively; i d and i q are the stator currents in the dq frame; the electrical angular speed is expressed as ω e ; L d and L q are the d-axis and q-axis stator inductances; R s is the stator winding resistance; and λ m is the flux linkage of the PMSM.
  • L d , L q , R s , and λ m are the time-varying non-linear parameters.
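  • Because the image of equation (1) is not reproduced in this text, the standard dq-frame PMSM stator voltage equations consistent with the variable definitions above are shown here for reference:

```latex
\begin{aligned}
v_d &= R_s i_d + L_d \frac{di_d}{dt} - \omega_e L_q i_q \\
v_q &= R_s i_q + L_q \frac{di_q}{dt} + \omega_e L_d i_d + \omega_e \lambda_m
\end{aligned}
```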
  • The modeling of the electromagnetic torque equation, T e , can be further achieved from the FOC theory, considering the PMSM type.
  • SPMSM surface PMSM
  • IPMSM interior PMSM
  • T e is expressed in equations (2) and (3), respectively, where P denotes the number of pole pairs, ω e is the electrical speed corresponding to the frequency of the stator voltages, L d and L q are the d-axis and q-axis inductances, respectively, and i d and i q are the d-axis and q-axis currents, respectively.
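  • Equations (2) and (3) are likewise not reproduced in the extracted text; the standard electromagnetic torque expressions consistent with these definitions, for a surface PMSM and an interior PMSM respectively, are:

```latex
T_e = \frac{3}{2} P \lambda_m i_q
\qquad\text{(SPMSM)}
\qquad
T_e = \frac{3}{2} P \left[ \lambda_m i_q + (L_d - L_q)\, i_d i_q \right]
\qquad\text{(IPMSM)}
```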
  • the discretization of the continuous PMSM model can be achieved with a Runge-Kutta or Dormand-Prince ordinary differential equation (ODE) solver.
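  • As an illustration of such a discretization, a minimal fixed-step Runge-Kutta (RK4) sketch of the dq current dynamics is given below. The function names, parameter values, and step size are placeholder assumptions, not values taken from the disclosure.

```python
# Minimal RK4 discretization of the dq-frame PMSM current dynamics (illustrative only).
def didt(i_d, i_q, v_d, v_q, w_e, R_s=0.05, L_d=1.0e-3, L_q=1.2e-3, lam_m=0.1):
    """Continuous-time derivatives of the d- and q-axis stator currents."""
    did = (v_d - R_s * i_d + w_e * L_q * i_q) / L_d
    diq = (v_q - R_s * i_q - w_e * L_d * i_d - w_e * lam_m) / L_q
    return did, diq

def rk4_step(i_d, i_q, v_d, v_q, w_e, dt=1.0e-5):
    """One fixed-step Runge-Kutta (RK4) update of the stator currents."""
    k1 = didt(i_d, i_q, v_d, v_q, w_e)
    k2 = didt(i_d + 0.5 * dt * k1[0], i_q + 0.5 * dt * k1[1], v_d, v_q, w_e)
    k3 = didt(i_d + 0.5 * dt * k2[0], i_q + 0.5 * dt * k2[1], v_d, v_q, w_e)
    k4 = didt(i_d + dt * k3[0], i_q + dt * k3[1], v_d, v_q, w_e)
    i_d += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    i_q += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return i_d, i_q
```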
  • Dynamic coupling terms, including ω e L q i q and -ω e L d i d , change rapidly with changes in torque and speed. This leads to poor performance, with a slow current response and current fluctuations under dynamic conditions.
  • IPMSM interior permanent magnet synchronous machine
  • FIG. 1 shows a block diagram of system 10 in accordance with an aspect of the present disclosure.
  • the system 10 includes an inverter 20 having one or more solid-state switches 22, such as field effect transistors (FETs) configured to switch current from a DC power supply 23 and to generate an AC power upon a set of motor leads 24.
  • the motor leads 24 transmit electrical power between the inverter 20 and an electric motor 26.
  • the electric motor 26 may be a permanent magnet synchronous motor (PMSM).
  • PMSM permanent magnet synchronous motor
  • the system 10 may be used with other types of electric machines such as wound field machines, inductance machines, and/or reluctance machines.
  • the electric motor 26 is shown as a 3-phase machine; however, the electric motor 26 may have any number of phases.
  • the electric motor 26 may be a single-phase machine, a 3-phase machine, or a higher-order multiphase machine.
  • the electric motor 26 may be used as a motor, a generator, or as a motor/generator that functions as both a motor and a generator.
  • Current sensors 28 measure currents in corresponding ones of the motor leads 24.
  • the system 10 may include other sensors, such as voltage sensors configured to measure voltages upon or between the motor leads 24.
  • the system 10 of FIG. 1 also includes a controller 30 in communication with the current sensors 28 to measure the currents in the motor leads 24.
  • the controller 30 may also be in functional communication with the inverter 20 to control operation of the motor drive and/or to monitor parameters measured by sensors associated with the inverter 20.
  • the controller 30 includes a processor 32 coupled to a storage memory 34.
  • the storage memory 34 stores instructions, such as program code for execution by the processor 32, in an instruction storage 36.
  • the storage memory 34 also includes data storage 38 for holding data to be used by the processor 32.
  • the data storage 38 may record, for example, values of the parameters measured by the current sensors 28 and/or the outcome of functions calculated by the processor 32.
  • An encoder 42 may measure a rotational position ⁇ of a shaft 40 of the electric motor 26.
  • the rotational position ⁇ of the electric motor 26 may be communicated to the controller 30.
  • the rotational position ⁇ of the electric motor 26 may be determined indirectly, for example, as a result of variations in the voltage and/or current on the motor leads 24
  • Controller tuning may depend on the switching frequency f sw of the inverter and non-linear motor parameters, including, for example, the d-axis inductance L d , the q-axis inductance L q , the flux linkage λ m , and the stator winding resistance R s .
  • the non-linear motor parameters may change due to temperature, magnetic saturation, loading, and aging in various machine running conditions (e.g. wear on bearings, breakdown and contamination of lubricants, and/or breakdown of electrical insulators).
  • System performance such as currents, speed, torque, may experience oscillations due to changing transient conditions, such as changing load conditions.
  • accurate system performance e.g. currents, speed, torque
  • the system and method of the present disclosure is configured to address each of these objectives.
  • the provided multi-critic network enables enhanced optimization of actor network training in reinforcement learning.
  • the provided multi-Q-learning estimation facilitates multi-scenario or multi-task learning, enabling more accurate actions T ref , V d , and V q .
  • the multi-critic with multi-Q-learning actor optimization (MCMQL-AO) controllers of the present disclosure are independent of plant (inverter and motor) parameters.
  • the MCMQL-AO controllers (RL Agent 1 and 2) can reduce transient oscillations under dynamic speed-torque conditions.
  • the schematic of a proposed MCMQL-AO vector control architecture is shown in FIG. 2.
  • FIG. 2 shows a schematic block diagram of a first electric motor drive 100, in accordance with an aspect of the present disclosure.
  • the first electric motor drive 100 includes a speed controller 102 and a current controller 104 in a cascaded configuration. Some or all of the speed controller 102 and/or the current controller 104 may be implemented in software instructions executed by the processor 32, in hardware, or in a combination of hardware and software.
  • the speed controller 102 includes a first reinforcement learning (RL) agent 110
  • the current controller 104 includes a second RL agent 130.
  • Each of the RL agents 110, 130 is a multi-critic with multi-Q-learning (MCMQL) agent.
  • the speed controller 102 is configured to determine a reference torque T ref for the electric motor 26, based on a speed error signal e ω , to cause the actual speed ω m of the electric motor 26 to match a speed reference ω' m .
  • a first difference block 105 computes the speed error signal e ω by subtracting the actual speed ω m of the electric motor 26 from the speed reference ω' m .
  • error should be understood to refer to a difference between a reference value and a process-dependent value.
  • the process-dependent value may also be called an actual value.
  • a maximum torque per ampere (MTPA) block 120 computes a reference current i' dq to cause the electric motor 26 to produce the reference torque T ref .
  • the reference current i' dq may be provided as d- axis and q- axis components.
  • the speed controller 102 may be configured to directly compute the reference current i' dq without the intermediate step of determining the reference torque T ref .
  • the speed controller 102 includes a first observer multiplexer 106 which supplies data to a first state observer 109 to determine a set of first state observations based on the speed error signal e ω , and/or other control observations by the first RL agent 110, over time.
  • the first observer multiplexer 106 also supplies data to a first reward calculator 108, which is configured to determine a reward based on the speed error signal e ω and/or the reference torque T ref output by the first RL agent 110, over time.
  • the first RL agent 110 includes a first number i of critics 112 coupled to an actor 114 configured to determine the reference torque T ref .
  • the critics 112 may also be called critic networks.
  • the first number i may be any number of critics 112 and each critic network may have different configurations, C styles .
  • the first number i may be 2, 4, 5, 8, 16, 32, or any other number of critics 112.
  • the current controller 104 is configured to determine a reference voltage V dq for the inverter 20 to apply to the electric motor 26, based on a current error signal e d,q , such that the actual current i d,q supplied to the electric motor 26 matches the reference current i' dq .
  • the reference voltage V dq may be provided as d-axis and q-axis components. However, the reference voltage V dq may be provided in other forms, such as V a , V b , V c line voltages on the respective motor leads 24.
  • a second difference block 124 computes the current error signal e d,q by subtracting the actual current i d,q supplied to the electric motor 26 from the reference current i' dq .
  • the current controller 104 includes a second observer multiplexer 126 which supplies data to second state observer 129 to determine a set of second state observations based on the current error signal e d,q , and/or other control observations by the second RL agent 130, over time.
  • the second observer multiplexer 126 also supplies data to a second reward calculator 128, which is configured to determine a reward based on the current error signal e d,q and/or the reference voltage V dq output by the second RL agent 130, over time.
  • the second RL agent 130 includes a second number j of critics 132 coupled to an actor 134 configured to determine the reference voltage Vdq.
  • the second number j of critics 132 in the second RL agent 130 may be greater than, less than, or equal to the first number i of critics 112 in the first RL agent 110.
  • the second number j may be any number of critics 132 and each critic network may have different configurations, C styles .
  • the second number j may be 2, 4, 5, 8, 16, 32, or any other number of critics 132.
  • An αβ transform block 140 takes the reference voltage V dq in the dq domain and transforms the reference voltage to the αβ domain to generate output voltage commands V αβ .
  • a pulse-width-modulation (PWM) block 142 generates pulse-width-modulation signals for controlling the solid-state switches 22 to cause the inverter 20 to apply the reference voltage V dq on the motor leads 24.
  • PWM pulse-width-modulation
  • Either or both of the αβ transform block 140 and/or the PWM block 142 may be implemented in software instructions executed by the processor 32, in hardware, or in a combination of hardware and software.
  • a dq transform block 144 calculates the d-axis and q-axis actual current i d,q supplied to the electric motor 26 based on the phase currents i a , i b , i c on the motor leads 24, which may be measured by current sensors (not shown), and the angular position θ (see the illustrative transform sketch below).
  • the αβ transform block 140 may use the angular position θ to generate the output voltage commands V αβ .
  • a derivative block 148 calculates the actual speed ⁇ m of the electric motor 26 by taking a derivative of the rotational position ⁇ of the electric motor 26. Either or both of the dq transform block 144 and/or the derivative block 148 may be implemented in software instructions executed by the processor 32, in hardware, or in a combination of hardware and software.
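  • Purely as an illustration of the dq transform block 144, a minimal Park-transform sketch is given below; the amplitude-invariant (2/3) scaling is an assumption, since the disclosure does not state which convention is used.

```python
import math

def abc_to_dq(i_a, i_b, i_c, theta):
    """Amplitude-invariant Park transform: phase currents -> d-axis and q-axis currents."""
    i_d = (2.0 / 3.0) * (i_a * math.cos(theta)
                         + i_b * math.cos(theta - 2.0 * math.pi / 3.0)
                         + i_c * math.cos(theta + 2.0 * math.pi / 3.0))
    i_q = -(2.0 / 3.0) * (i_a * math.sin(theta)
                          + i_b * math.sin(theta - 2.0 * math.pi / 3.0)
                          + i_c * math.sin(theta + 2.0 * math.pi / 3.0))
    return i_d, i_q
```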
  • Two separate agents 110, 130 are used and trained simultaneously for accurate speed and current control.
  • the first RL agent 110 and second RL agent 130 are used for speed tracking and current tracking of the motor, respectively.
  • the first RL agent 110 actor imitates the control action of the controller, taking the speed error e ω as input and the reference torque T ref as output.
  • the second RL agent 130 actor imitates the control action of the controller, taking the reference stator current error e d,q as input and the motor stator voltages V d and V q as output.
  • the multi-critics 112, 132 each evaluate the respective controller action in terms of the scenario(s) or task(s) reward and further optimize the corresponding actor 114, 134. Also, the multi-critics 112, 132 each estimate the action values and the probability of taking the action again in future actions through multi-Q-value estimations.
  • FIG. 3 shows a schematic block diagram of a second electric motor drive 150, in accordance with an aspect of the present disclosure.
  • the second electric motor drive 150 includes a speed controller 152 and a current controller 104 in a cascaded configuration.
  • the second electric motor drive 150 may be similar or identical to the first electric motor drive 100, except with only one RL agent 130.
  • the speed controller 152 of the second electric motor drive 150 includes a PI controller 154 and generates a reference torque T ref .
  • the reference torque T ref from the speed controller 152 is supplied to a MTPA controller 120 to compute the reference current i' dq.
  • the current controller 104 includes an RL agent 130, which may be similar or identical to the configuration described above with reference to FIG. 2.
  • an electric motor drive may include a single RL agent 110 in a speed controller, and the current controller may use a PI controller.
  • the optimal action of the RL agents 110, 130 is predicted from the optimal policy π of the corresponding actor network through regression.
  • the critic evaluates the Q-values for π from the action-value Bellman equation in equation (4), below, where t is the discrete-time step, γ ∈ (0,1] is the discount factor, and a t(1,2) are the control actions (T ref or V d and V q ) at given states s t(1,2) receiving rewards r t(1,2) .
  • the state observations s t(1,2) and the reward functions r t(1,2) are defined in equations (5) and (6).
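  • A standard form of the action-value Bellman recursion consistent with these definitions is reproduced below for reference, since the image of equation (4) is not included in this text:

```latex
Q^{\pi}\left(s_{t(1,2)}, a_{t(1,2)}\right)
  = \mathbb{E}\left[\, r_{t(1,2)} + \gamma\, Q^{\pi}\left(s_{t+1\,(1,2)},\, \pi\left(s_{t+1\,(1,2)}\right)\right) \right]
```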
  • Multi-critic with multi-Q-learning can maximize the reward by avoiding local maxima in gradient ascent optimization in reinforcement learning
  • Multi-critic with multi-Q-learning can improve the learning performance and efficiency of reinforcement learning
  • Multi-critic with multi-Q-learning can facilitate multi-scenario or multi-task optimization using different configurations of critic networks
  • Multi-critic with multi-Q-learning can mitigate the overestimation of the Q-value in the early stage of learning
  • Multi-critic with multi-Q-learning can adapt to the learning performance and efficiency tradeoff of reinforcement learning
  • Multi-critic with multi-Q-learning can support hybrid online/offline reinforcement learning.
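  • One simple way to realize the multi-Q-value estimation described above is to query several independently parameterized critics and combine their estimates, for example taking the minimum to curb early overestimation. The sketch below is a generic illustration under that assumption, not the disclosure's exact formulation.

```python
def multi_q_estimate(critics, state, action, combine="min"):
    """Combine Q-value estimates from several critic networks.

    'critics' is a list of callables mapping (state, action) -> Q-value.
    Taking the minimum mitigates early Q-value overestimation; the mean
    trades that pessimism for smoother learning.
    """
    q_values = [critic(state, action) for critic in critics]
    if combine == "min":
        return min(q_values)
    return sum(q_values) / len(q_values)
```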
  • the vector control in PMSM plays an integral role in motor speed, stator current and torque control, influencing PMSM transient and dynamic performance.
  • coupled PI controllers are used to eliminate the speed tracking error, e ω , in the speed loop and the current tracking errors in the d-axis and q-axis current loops, e d and e q , as in equation (7), where ω' m , i' d , and i' q are the reference speed and d- and q-axis stator currents, and ω m , i d , and i q are the actual speed and d- and q-axis stator currents, respectively.
  • the proportional (P) and integral (I) parameters of the adaptive PI controller are tuned from the PMSM discrete transfer functions, which include the PMSM time-varying parameters, to eliminate the tracking errors of the speed, e ω , and stator currents, e d and e q .
  • the coupled PI controllers deliver the reference torque, T ref , based on e ω , and the reference stator voltages, V d and V q , based on e d and e q , as in equation (8), where T s is the sample time of the controllers.
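  • A minimal discrete-time PI update consistent with the error definitions of equation (7) and the general form of equation (8) is sketched below; the gains, the sample time T s, and the omission of anti-windup and adaptive gain scheduling are simplifying assumptions.

```python
class DiscretePI:
    """Discrete PI controller: u[k] = Kp * e[k] + Ki * Ts * sum(e[0..k])."""

    def __init__(self, kp, ki, ts):
        self.kp, self.ki, self.ts = kp, ki, ts
        self.integral = 0.0

    def update(self, reference, actual):
        error = reference - actual          # e.g. e_w = w'_m - w_m, per equation (7)
        self.integral += error * self.ts    # rectangular integration of the error
        return self.kp * error + self.ki * self.integral   # e.g. T_ref, V_d, or V_q
```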
  • FIG. 5 shows a block diagram of a test system 220 for performing adaptive PI control.
  • the test system 220 shown in FIG. 5 includes an electric motor 26 which is a PMSM.
  • the electric motor 26 is coupled to a dynamometer 222, called a dyno, for short, via a coupling 224.
  • the test system 220 includes a torque transducer 226 connected to the coupling 224 and measuring the torque generated by the electric motor 26.
  • the test system 220 also includes a data acquisition device 228 coupled to the dynamometer 222 and to the torque transducer 226 for storing and processing information therefrom.
  • the test system 220 also includes an inverter 20 connected to the electric motor 26 via motor leads 24 and configured to provide power for operating the electric motor 26.
  • the test system 220 also includes a set of sensors 230 such as current sensors and/or voltage sensors for measuring corresponding current and/or voltage supplied to the electric motor 26.
  • the test system 220 also includes a controller 30, such as a real-time OPAL-RT controller, functionally connected to the inverter 20 for providing commands thereto.
  • the controller 30 is also in functional communication with the sensors 230 for receiving information therefrom.
  • the performance is evaluated through current control by rotating the motor at a constant speed using a speed-controlled dynamometer.
  • the speed and current control performance are evaluated by loading the motor using laboratory torque- and speed-controlled dynamometers, respectively.
  • the adaptive PI controllers are used to track the speed and d- axis and q-axis currents.
  • a real-time OPAL-RT controller is used for rapid control prototyping (RCP) of the control algorithm with an insulated-gate bipolar transistor (IGBT) inverter.
  • the PMSM speed is varied, and the torque performance is observed.
  • the MTPA current angle is varied at each maximum current, and the current tracking response is evaluated.
  • FIGS. 6 and 7 show q-axis current and d-axis current, respectively, over a common time scale.
  • the plots of FIGS. 6-7 demonstrate transient overshoot in the d-axis due to the change in current in the q-axis at 5.5 seconds.
  • a transient peak is observed in the q-axis due to a change of current in the d-axis at 10.25 seconds.
  • the PMSM electromagnetic torque performance under varying speed conditions using the adaptive PI speed controller is shown in FIGS. 8-9 over a common time scale.
  • the plots of FIGS. 8-9 demonstrate slow response with transient overshoot due to load torque disturbance at 0.01, 0.4, and 0.7 min.
  • the multi-MCMQL-AO system 200 includes a first MCMQL-AO RL agent 202 performing a first action 203 for speed control and a second MCMQL-AO RL agent 204 performing a second action 205 for current control.
  • the MCMQL-AO RL agents 202, 204 may be closed-loop reinforcement learning (RL) agents trained with a deterministic policy gradient multi-critic with multi-Q-learning (MCMQL) algorithm to calculate an optimal control action(s) a (1,2) with random noise ε at a given time t.
  • RL closed-loop reinforcement learning
  • the MCMQL-AO RL agents 202, 204 calculate actions, as an output, which may include the reference torque T ref and the d-axis and q-axis voltage commands V d , V q .
  • the multi-MCMQL-AO system 200 also includes an environment simulator 206, which may employ one or more ordinary differential equation (ODE) solvers to model dynamics of the inverter 20 and/or dynamics of the electric motor 26, which may be a permanent magnet synchronous motor.
  • ODE ordinary differential equation
  • the environment simulator 206 may determine values for state variables, such as the tracking errors of the motor speed e ω and the d-axis and q-axis currents e id , e iq , with their integral and differential forms.
  • the environment simulator 206 may also determine a reward function, which is a function of the state variables and a function of the actions.
  • the multi-MCMQL-AO system 200 also includes a first RL updater 208 configured to update the first MCMQL-AO RL agent 202 using the state variables and the reward function from the environment simulator 206.
  • the multi-MCMQL-AO system 200 also includes a second RL updater 210 configured to update the second MCMQL-AO RL agent 204 using the state variables and the reward function from the environment simulator 206.
  • a single MCMQL-AO RL agent may be used in the speed controller or the current controller.
  • the proposed optimal vector control uses the off-policy, actor-critic method to adapt to the continuous action spaces.
  • the MCMQL-AO RL agents interact with the plant environment (inverter and PMSM) to create the optimal deterministic policy function, ⁇ .
  • the RL agents incorporate multiple deep networks (actor and multi-critic) to deliver the optimal control actions, T ref , Vd and V q .
  • the actor network is trained to imitate the optimal control action(s) of the vector controllers at a given state by maximizing the reward through gradient ascent.
  • a multi-Q-learning optimization-based multi-critic network is used to evaluate the accuracy of the previous state-action pairs by estimating the reward value from the environment feedback.
  • Each critic network may have different hidden layers and activation functions, C styles . Further, the multi-critic network tunes the actor network from the evaluation of the state-action pairs (Q-value) and the cumulative reward.
  • the multi-critic, multi-Q-value optimization for a deterministic policy can be formulated from the fundamental Bellman equation (4), where t is the discrete-time step, a t1 is the optimal action corresponding to the reference torque T ref at state observation s t1 , a t2 is the optimal reference voltages (V d and V q ) at state observation s t2 , γ ∈ (0,1] is the discount factor, and r t(1,2) is the reward function.
  • a schematic diagram 300 of the proposed MCMQL-AO vector control scheme is shown in FIG. 10, including a first MCMQL- AO controller 302 and a second MCMQL-AO controller 304.
  • the first MCMQL-AO controller 302 is configured as a speed controller and takes a speed error signal e ⁇ as an input and generates a torque reference signal T ref based on the speed error signal e ⁇ .
  • the second MCMQL-AO controller 304 is configured as a current controller and takes a current error signal e d,q as an input and generates a voltage command V d , V q based on the current error signal e d,q .
  • FIG. 11 shows a block diagram showing internal details of the first MCMQL-AO controller 302
  • FIG. 12 shows a block diagram showing internal details of the second MCMQL-AO controller 304.
  • FIG. 11 shows details of the first MCMQL-AO controller 302.
  • One or more portions of the first MCMQL-AO controller 302 may be implemented in the form of program instructions executed by the processor 32 of the controller 30, and which may be stored in the memory 34 of the controller 30.
  • one or more components of the first MCMQL-AO controller 302 may be implemented in other forms, such as hardware, software, or a combination of hardware and software.
  • the first MCMQL-AO controller 302 includes a first reward calculator 310 configured to calculate a first reward r t1 based on the speed error signal e ⁇ .
  • the first reward calculator 310 may use equation (5) to compute the first reward r t1 .
  • the first MCMQL-AO controller 302 also includes a first state observation calculator 312 configured to calculate a first state observation s t1 based on the speed error signal e ⁇ .
  • the first state observation calculator 312 may use equation (5) to compute the first state observation s t1 .
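  • The exact forms of the reward and state observation in equation (5) are not reproduced in this text; purely as an illustration of how they could be derived from the speed error (quadratic error penalty, control-effort penalty, constraint-violation penalty, and error/integral/difference observations), a hypothetical sketch follows. All weights and terms are assumptions.

```python
def speed_reward(e_w, action, w_err=1.0, w_act=0.01, action_limit=None, penalty=10.0):
    """Illustrative reward: penalize squared speed error, control effort, and limit violations."""
    r = -w_err * e_w ** 2 - w_act * action ** 2
    if action_limit is not None and abs(action) > action_limit:
        r -= penalty                          # constraint-violation penalty
    return r

def speed_observation(e_w, integral_e_w, prev_e_w):
    """Illustrative state observation: error, its integral, and its difference (derivative)."""
    return (e_w, integral_e_w, e_w - prev_e_w)
```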
  • the first MCMQL-AO controller 302 also includes a memory buffer 314 configured to store values of several parameters for one or more time instances, t, t+1, etc.
  • the parameters stored by the memory buffer 314 include the first reward r t1 , the first state observation s t1 , and a control action a t1 .
  • the first MCMQL-AO controller 302 also includes a first actor network 316 and a first multi-critic network 318 having j number of critic networks. Each of the first actor network 316 and the first multi-critic network 318 may include one or more different configurations, including those described elsewhere in the present disclosure.
  • the first multi-critic network 318 determines an initial Q-value Q θj for the episode.
  • the initial Q-value Q θj may be an initial prediction from the multi-critic network with j number of networks.
  • the first MCMQL-AO controller 302 also includes a discounting block 320 that applies a discount factor γ to the initial Q-value Q θj .
  • a first summing block 322 adds an output of the discounting block 320 (i.e. the initial Q-value Q θj , as modified by the discount factor γ) to the first reward r t1 .
  • a summation block 324 performs a calculation based on subtraction of the estimated Q-value Q θj from the output of the first summing block 322 to determine a gradient based on gradient descent and to further update the first multi-critic network 318.
  • the multi-critic network with j critic networks performing Q-value estimation can be represented as Q θj , having network parameters θ j , which may be stored in memory, such as the memory 34 of the controller 30.
  • the first MCMQL-AO controller 302 also includes a first gradient block 326 configured to compute a gradient of the multi-critic network's Q-value function Q θj .
  • the first MCMQL-AO controller 302 also includes a second gradient block 328 configured to compute a gradient of the control action a t1 .
  • the first MCMQL-AO controller 302 also includes a multiplier block 330 configured to multiply the outputs of the first gradient block 326 and the second gradient block 328 and to compute the overall gradient, which is applied with gradient ascent to update the first actor network 316.
  • the first MCMQL-AO controller 302 also includes a first noise generator 332 configured to generate a random decaying noise ε, and a second summing block 334 configured to add the random decaying noise ε from the first noise generator 332 to an output of the first actor network 316 and to generate the control action a t1 as a sum of the random decaying noise ε and the output of the first actor network 316.
  • the first MCMQL-AO controller 302 also includes a first output block 336 configured to calculate the control action a t1 (i.e. the reference torque T ref ).
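  • The gradient blocks 326, 328 and the multiplier block 330 correspond to the deterministic policy gradient chain rule, in which the gradient of the Q-value with respect to the action is multiplied by the gradient of the action with respect to the actor parameters. A PyTorch-style sketch of one possible realization of that actor update is shown below; the module interfaces, the minimum combination over critics, and the optimizer handling are assumptions for illustration.

```python
import torch

def update_actor(actor, critics, states, actor_optimizer):
    """Gradient-ascent actor update through the (minimum) multi-critic Q-value."""
    actions = actor(states)                          # a = pi_phi(s)
    q_stack = torch.stack([critic(states, actions) for critic in critics])
    q_values = torch.min(q_stack, dim=0).values      # pessimistic combination over critics
    actor_loss = -q_values.mean()                    # ascend Q by descending -Q
    actor_optimizer.zero_grad()
    actor_loss.backward()                            # chain rule: dQ/da * da/dphi
    actor_optimizer.step()
    return actor_loss.item()
```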
  • FIG. 12 shows details of the second MCMQL-AO controller 304.
  • One or more portions of the second MCMQL-AO controller 304 may be implemented in the form of program instructions executed by the processor 32 of the controller 30, and which may be stored in the memory 34 of the controller 30. Alternatively or additionally, one or more components of the second MCMQL-AO controller 304 may be implemented in the other forms, such as hardware, software, or a combination of hardware and software.
  • the second MCMQL-AO controller 304 includes a second reward calculator 350 configured to calculate a second reward r t2 based on the current error signal e d,q .
  • the second reward calculator 350 may use equation (5) to compute the second reward r t2 .
  • the second MCMQL-AO controller 304 also includes a second state observation calculator 352 configured to calculate a second state observation s t2 based on the current error signal e d,q .
  • the second state observation calculator 352 may use equation (5) to compute the second state observation s t2 .
  • the second MCMQL-AO controller 304 also includes a memory buffer 354 configured to store values of several parameters for one or more time instances, t, t+1, etc.
  • the parameters stored by the memory buffer 354 include the second reward r t2 , the second state observation s t2 , and a control action a t2 .
  • the second MCMQL-AO controller 304 also includes a second actor network 356 and a second multi-critic network 358 having j number of critic networks.
  • the number of critic networks j in the second multi-critic network 358 may be the same as or different from the number of critic networks in the first multi-critic network 318 of the first MCMQL-AO controller 302.
  • the multi -critic networks 318, 358 of the first and second MCMQL-AO controllers 302, 304 may have a different number of critics (as shown, for example, as i and n in FIG. 2).
  • multi-critic networks 318, 358 may each have a same number of critics.
  • the second multi-critic network 358 determines an initial Q-value for the episode.
  • the initial Q-value Q θj may be an initial prediction from the multi-critic network with j number of networks.
  • the second MCMQL-AO controller 304 also includes a discounting block 360 that applies a discount factor γ to the initial Q-value Q θj .
  • a fourth summing block 362 adds an output of the discounting block 360 (i.e. the initial Q-value Q θj as modified by the discount factor γ) to the second reward r t2 .
  • a second summation block 364 performs a calculation based on subtraction of the estimated Q-value Q θj from the output of the fourth summing block 362 to determine a gradient based on gradient descent and to further update the second multi-critic network 358.
  • the multi-critic network with j critic networks performing Q-value estimation can be represented as Q θj , having parameters θ j , which may be stored in memory, such as the memory 34 of the controller 30.
  • the second MCMQL-AO controller 304 also includes a first gradient block 366 configured to compute a gradient of the multi-critic network's Q-value function Q θj .
  • the second MCMQL-AO controller 304 also includes a second gradient block 368 configured to compute a gradient of the control action a t2 .
  • the second MCMQL-AO controller 304 also includes a multiplier block 330 configured to multiply the outputs of the first gradient block 366 and the second gradient block 368 and to compute the overall gradient, which is applied with gradient ascent to update the actor network 356.
  • the second MCMQL-AO controller 304 also includes a second noise generator 372 configured to generate a random decaying noise ε, and a fourth summing block 374 configured to add the random decaying noise ε from the second noise generator 372 to an output of the actor network 356 and to generate the control action a t2 as a sum of the random decaying noise ε and the output of the actor network 356.
  • the second MCMQL-AO controller 304 also includes an output block 376 configured to calculate the reference d-axis voltage V d, and the reference q-axis voltage V q from the control action a t2 .
  • the output block 376 may include a multiplexer to sequentially compute each of the d-axis voltage V d and the reference q-axis voltage V q .
  • the actor-critic networks of the RL agents are trained using the MCMQL-AO learning algorithm.
  • the MCMQL-AO algorithm is an online, model-free learning method for continuous time and action spaces. This algorithm aids the RL agents in interacting with the plant environment (inverter and PMSM) by learning the Q functions based on multiple scenarios and predicting the optimal policy.
  • the training of each RL agent is carried out through sampled transitions and by estimating the Q-values using the multi-critic network.
  • the final Q-values, y, are estimated from equation (9), below.
  • the objective of the actor network is to maximize the expected reward through gradient ascent.
  • the update equation for the actor policy π φ is expressed as the gradient of the cumulative expected return J, as shown in equation (11).
  • the target actor and multi-critic network parameters are updated at every time step using equation (13), where τ is the smoothing factor and is less than 1.
  • τ is the smoothing factor and is less than 1.
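  • The soft (Polyak) target update referenced as equation (13) commonly takes the following form, with τ < 1 the smoothing factor; the exact expression in the disclosure is not reproduced here:

```latex
\theta_{\mathrm{target}} \leftarrow \tau\,\theta + (1 - \tau)\,\theta_{\mathrm{target}},
\qquad
\phi_{\mathrm{target}} \leftarrow \tau\,\phi + (1 - \tau)\,\phi_{\mathrm{target}}
```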
  • the process of obtaining the optimal actions, T ref , V d and V q , by maximizing the discounted cumulative reward is achieved through regression.
  • the multi-critic network estimates the Q-values by evaluating the Bellman equation in (4) and retunes the actor network of the speed and current controllers to achieve optimality.
  • the hyperparameters of the proposed actor-critic networks are shown in Table II, below.
  • the actor network applies a hyperbolic tangent (tanh) activation function, and the multi-critic network uses a combination of rectified linear unit (ReLU), clipped rectified linear unit (clipped ReLU), and leaky rectified linear unit (leaky ReLU) activation functions in the hidden layers with C styles .
  • ReLU rectified linear unit
  • clipped ReLU clipped rectified linear unit
  • leaky ReLU leaky rectified linear unit
  • different combinations of activation functions and hidden layers for both the actor and multi-critic networks may be used in the MCMQL-AO controller.
  • the schematic of the MCMQL-AO network layouts 400 is shown in FIG. 13.
  • the MCMQL actor-critic network layouts 400 include a policy gradient 402, an actor network 404, and a multi-critic network with C styles 406.
  • the policy gradient 402 is configured to calculate the derivative of the objective function, J(φ), using equation (11) and to optimize the actor network 404.
  • the actor network 404 includes an actor feature layer, an actor hidden layer applying the tanh activation function, and an actor output layer.
  • the actor feature layer is configured to take the values of state variables, and compute those state variables with the actor hidden layer.
  • the actor output layer is configured to combine the outputs of the actor hidden layer and to use those outputs to generate an action output.
  • the multi-critic network 406 is configured to estimate the Q-values and to evaluate the policy gradient 402.
  • the multi-critic network 406 incorporates a C styles network including a critic feature layer, a critic hidden layer applying a combination of ReLU, clipped ReLU, and leaky ReLU activation functions, and a critic output layer.
  • the critic feature layer is configured to take the state variables s t(1,2) and rewards r t(1,2) , and to compute those with the hidden layer.
  • the critic output layer is configured to combine the outputs of the multi-critic network and to use those outputs to generate the optimal estimated Q-value.
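  • To make the layer and activation description concrete, an assumed PyTorch-style sketch of one possible actor network (tanh hidden layer) and one critic configuration (ReLU-family hidden layers) follows; layer sizes are placeholders, and the exact hyperparameters of Table II are not reproduced.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor: state -> bounded action, using tanh activations as described for the actor network."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim), nn.Tanh(),   # bounded action output
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """One critic 'style': (state, action) -> Q-value, with ReLU-family hidden layers."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```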
  • FIG. 14 shows a flow chart listing steps in an overall workflow 500 of the MCMQL-AO algorithm for control, in accordance with an aspect of the present disclosure.
  • the workflow 500 also includes initializing the actor network π with random weights φ at step 506.
  • the workflow 500 also includes setting the target network parameters at step 508.
  • the workflow 500 also includes initializing the replay buffer M, the total number of updates/episodes N, and the step time t at step 510.
  • the workflow 500 also includes adding noise ε to the final action a t to initiate exploration with the inverter and motor model at step 512.
  • the workflow 500 also includes observing the next state s t+1 , the reward r, and the terminal state d at step 514.
  • the workflow 500 also includes storing transitions of s t , a t , r, and s t+1 in the buffer M at step 516.
  • the workflow 500 also includes sampling values from M to compute the target y j and update the critic parameters θ j based on the action a t at step 518.
  • the workflow 500 also includes updating the policy π using gradient ascent at step 522.
  • the workflow 500 also includes updating the target networks θ target,n and φ target at step 524.
  • the workflow 500 also includes determining whether a delayed policy update is due, via N mod policy delay, at step 520.
  • Step 526 includes continuing the workflow 500 at step 516 in response to step 526 having a negative result, and proceeding to step 528 in response to step 526 having an affirmative result.
  • the workflow 500 ends at step 528.
  • the workflow 500 demonstrated in FIG. 14 is provided for training a single MCMQL-AO RL agent.
  • a method similar or identical to the workflow 500 may be used as a nested loop with corresponding network architecture and state- action pairs for each of two or more MCMQL-AO controllers for speed and current control in a multi-agent system.
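  • A condensed Python-style sketch of the workflow 500 (noisy exploration, replay storage and sampling, critic update, delayed policy update, and target update) is given below. It is a generic off-policy actor-critic loop under the assumptions noted in the comments (environment and update functions are placeholders), not the disclosure's exact algorithm.

```python
import random

def train_mcmql_ao(env, actor, critics, targets, buffer, episodes,
                   update_critics_fn, update_actor_fn, soft_update_fn,
                   noise_fn, policy_delay=2, batch_size=64):
    """Generic off-policy training loop mirroring workflow 500 (illustrative only)."""
    for episode in range(episodes):
        state, done, t = env.reset(), False, 0            # assumed environment interface
        while not done:
            action = actor(state) + noise_fn(t)           # step 512: add exploration noise
            state_next, reward, done = env.step(action)   # step 514: observe s_{t+1}, r, d
            buffer.append((state, action, reward, state_next, done))   # step 516
            batch = random.sample(buffer, min(batch_size, len(buffer)))
            update_critics_fn(critics, targets, batch)    # step 518: targets y_j, critic update
            if t % policy_delay == 0:                     # step 520: delayed policy update
                update_actor_fn(actor, critics, batch)    # step 522: gradient ascent on policy
                soft_update_fn(actor, critics, targets)   # step 524: update target networks
            state, t = state_next, t + 1
    return actor, critics
```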
  • FIG. 15 shows a flow diagram 600 for training and tuning the MCMQL-AO vector controllers with a PMSM, with state observations s t(1,2) derived from the tracking errors, over N training episodes.
  • the flow diagram 600 includes a first training episode of the deterministic policy gradient based MCMQL-AO nested speed and current controllers configured to generate a first reference torque T ref (1), a first voltage command V d,q (1), and a first reward r t(1,2) (1) at 602, each based on initial system states s t(1,2) .
  • the first reference torque T ref (1) is applied to the control workflow for current control and the first voltage command V d,q (1) is applied to the inverter and PMSM dynamics model at 604 to generate second system state observations s t(1,2) (2).
  • the MCMQL-AO vector controllers generate, at 606, a second reference torque T ref (2), a second voltage command V d,q (2), and a second reward r t(1,2) (2), each based on the second system states s t(1,2) (2) and the first reward r t(1,2) (1).
  • the second system actions a t(1,2) , namely T ref (2) and V d,q (2), are applied to the control workflow and the inverter and PMSM dynamics model at 608.
  • This training and tuning process is repeated for N episodes of subsequent MCMQL-AO speed and current controllers at 610, 614, and subsequent corresponding inverter and PMSM dynamic states at 612, 616, to generate the optimal MCMQL-AO controller policy at system states s t(1,2) (N) having the highest cumulative reward r t(1,2) (N).
  • the reward calculated from each previous episode is used as feedback in the next episode by the C styles multi-critic network to optimize the action of the actor network through gradient ascent.
  • This iterative process over N episodes enables a strong memorization ability of the actor network with accurate current tracking ability.
  • the proposed control methodology represents the motor drive as a recurrent system with a one-step delay to the MCMQL-AO speed and current vector controllers.
  • From the reward functions r t(1,2) in equations (5) and (6), the RL agents converge from a negative value toward the estimated Q-value of the multi-critic network, for a profitable reward through optimal T ref , V d , and V q .
  • the RL agents maximize the reward value with respect to the Q-value estimation.
  • due to dynamic variation in the reference speed ω e and the reference currents i d and i q , the reward never achieves a perfect zero value or the estimated Q-value, since the agents are also penalized for constraint violations.
  • the total cumulative discounted reward over an action for every episode during the training of the MCMQL-AO controllers is represented in FIG. 16.
  • FIG. 16 shows a graph 650 of the discounted reward r t(1,2) as a function of the number of episodes.
  • Graph 650 includes a first plot 652 of the value of the reward r t1 and a second plot 654 of the average reward for the MCMQL-AO speed controller.
  • Graph 650 also includes a third plot
  • FIG. 17 shows a block diagram for a simulation system 670 for evaluating the proposed MCMQL-AO vector control system and method.
  • FIG. 17 includes a real-time OP4510 simulator 672 simulating the proposed MCMQL-AO vector control system.
  • FIG. 17 also includes a host computer 674 in communication with the real-time OP4510 simulator 672 for monitoring and adjusting parameters of the proposed MCMQL-AO vector control system.
  • the tracking ability of the MCMQL-AO speed and current loop controllers is evaluated by rotating the device under test (DUT) at a constant torque or speed with a programmable dynamometer.
  • the PMSM is rotated at varying speed commands and the load torque is varied using the dynamometer.
  • the SIL test results and the comparison of the proposed controller with the conventional adaptive PI controller are shown in FIG. 18 and FIG. 19, respectively.
  • FIG. 18 shows a graph of speed vs. time, with a first plot 680 showing performance of a MCMQL controller of the present disclosure, a second plot 682 showing performance of a conventional adaptive PI controller, and a third plot 684 showing a reference or commanded speed.
  • the speed tracking command is varied from 100 rpm to 975 rpm with field weakening control under varying load torque.
  • the proposed MCMQL-AO speed controller and the adaptive PI controller both track the speed satisfactorily. However, the MCMQL-AO controller shows a faster response, with a settling time at least 0.03 seconds shorter than that of the adaptive PI controller.
  • FIG. 19 shows a graph of torque vs. time, with a first plot 690 showing performance of a MCMQL-AO controller of the present disclosure, a second plot 692 showing performance of a conventional adaptive PI controller, and a third plot 694 showing a reference or commanded torque.
  • the faster response and settling time of the MCMQL-AO controller results in a faster electromagnetic torque response of the PMSM with reduced load torque disturbances under dynamic conditions, as shown in FIG. 19.
  • the average and standard deviation (SD) of the speed tracking error for FIG. 18 is shown in Table III.
  • FIG. 21 shows a graph 710 with a plot (a) 712 showing the cumulative reward earned by the MCMQL-AO RL agent 1 during speed tracking.
  • the agent is penalized at 0.2, 0.4, 0.6, and 0.8 s due to the constraint violations from varying speed commands.
  • the tracking ability of the current loop controller is evaluated by rotating the device under test (DUT) at a constant speed with a speed-controlled dynamometer, while the reference d- axis and q-axis current is varied.
  • the proposed MCMQL-AO current controller is validated with the same procedure as in FIG. 17 to evaluate its current tracking capability.
  • the reference d- axis and q-axis currents are changed to study the transient and dynamic tracking capability of the proposed MCMQL-AO advanced current controller.
  • the reference q-axis current is changed from 15 A to 5 A at 0.3 seconds and again to 10 A at 0.6 seconds.
  • the reference d-axis current is changed from -15 A to -5 A.
  • the SIL simulator results of the current control loop with the performance comparison are shown in FIG. 20, which shows a combined graph 700 including first and second graphs (a), (b) over a common time scale.
  • the first graph (a) of the combined graph 700 includes a first plot 702 of q-axis current i q using adaptive PI control, and a second plot 704 of q-axis current i q using the MCMQL-AO current control technique.
  • the second graph (b) includes a third plot 706 of d-axis current i d using adaptive PI control, and a fourth plot 708 of d-axis current i d using the MCMQL-AO current control technique. From FIG. 20, it is observed that the proposed MCMQL-AO current controller tracks the reference current accurately in both the q- and d-axes.
  • the average and standard deviation (SD) of d- and q-axis currents tracking errors for FIG. 20 are represented in Table IV. The average and SD can be further reduced with increased exploration time.
  • FIG. 21 shows a graph 710 with a plot (b) 714 of the cumulative reward earned by the MCMQL-AO RL agent 2 during the current tracking evaluation. Due to constraint violation i.e., change in reference current, the cumulative reward decreases at 0.3, 0.45, and 0.6 seconds respectively.
  • the MCMQL-AO current controller shows fast and promising dynamic characteristics, mitigating decoupling inaccuracy. A reliable response, with transient peaks reduced by 5 A, is observed in the MCMQL-AO based current control compared to adaptive PI.
  • FIG. 22 shows a combined graph 720 of currents under varying PM flux linkage. Combined graph 720 includes first and second graphs (a), (b) over a common time scale.
  • the first graph (a) includes a first plot 722 of q-axis current i q using adaptive PI control, and a second plot 724 of q-axis current i q using the MCMQL-AO current control technique.
  • the second graph (b) includes a third plot 726 of d-axis current i d using adaptive PI control, and a fourth plot 728 of d-axis current i d using the MCMQL-AO current control technique.
  • FIG. 23 shows a combined graph 730 of currents resulting from a 20% decreased flux linkage.
  • Combined graph 730 includes first and second graphs (a), (b) over a common time scale.
  • the first graph (a) includes a first plot 732 of q- axis current i q using adaptive PI control, and a second plot 734 of q- axis current i q using the MCMQL DRL control technique.
  • the second graph (b) includes a third plot 736 of d-axis current i d using adaptive PI control, and a fourth plot 738 of d-axis current i d using the MCMQL-AO current control technique.
  • the updated flux linkage values are presented in Table V, below.
  • the evaluation of the proposed current control is achieved by operating the PMSM at a constant speed and tracking the d-axis and q-axis currents.
  • the MCMQL-AO current controller demonstrated in FIG. 22 and FIG. 23 adapts itself with fluctuating PMSM rotor magnet flux. The transient overshoots at 0.3 and 0.6 seconds are mitigated with better dynamic current tracking capabilities under varying flux linkage compared to adaptive PI control. TABLE V - PMSM VARYING FLUX LINKAGE
  • FIG. 24 shows a graph 740 illustrating the torque response for 20% reduced flux linkage, corresponding to FIG. 23.
  • Graph 740 includes a first plot 742 of torque (in Newton- meters) produced by the electric motor 26 using adaptive PI control, and a second plot 744 of torque (in Newton-meters) produced by the electric motor 26 using the MCMQL-AO control technique.
  • a transient electromagnetic torque overshoot at 0.3 s is observed in FIG. 24 and is due to the flux linkage variation at 0.3 s.
  • An enhanced dynamic current tracking under changing PM flux is observed in the proposed MCMQL-AO controller.
  • a better transient response of the PMSM electromagnetic torque, by 4 Nm, is noted under varying flux linkage. Similar current tracking and torque performance during transient conditions is observed under 20% increased PM flux linkage.
  • FIG. 25 shows a combined graph 750 of currents resulting from 20% increased time- varying parameters.
  • Combined graph 750 includes first and second graphs (a), (b) over a common time scale.
  • the first graph (a) includes a first plot 752 of q- axis current i q using adaptive PI control, and a second plot 754 of q- axis current i q using the MCMQL-AO control technique.
  • the second graph (b) includes a third plot 756 of d- axis current i d using adaptive PI control, and a fourth plot 758 of d- axis current i d using the MCMQL-AO control technique.
  • FIG. 26 shows a combined graph 760 of currents resulting from 20% decreased time-varying parameters.
  • Combined graph 760 includes first and second graphs (a), (b) over a common time scale.
  • the first graph (a) includes a first plot 762 of q- axis current i q using adaptive PI control, and a second plot 764 of q- axis current i q using the MCMQL-AO control technique.
  • the second graph (b) includes a third plot 766 of d- axis current i d using adaptive PI control, and a fourth plot 768 of d- axis current i d using the MCMQL-AO control technique.
  • FIG. 27 shows a graph 770 illustrating the torque response for 20% increased time- varying parameters of PMSM corresponding to FIG. 25.
  • Graph 770 includes a first plot 772 of torque (in Newton-meters) produced by the electric motor 26 using adaptive PI control, and a second plot 774 of torque (in Newton-meters) produced by the electric motor 26 using the MCMQL-AO control technique.
  • the transient overshoot and slow response in the torque waveform due to the decoupling inaccuracy of the PI controller (ω e L q i q and -ω e L d i d) are mitigated with the MCMQL-AO current controller.
  • a very similar torque response, with the transient response reduced by 3 Nm, is observed for 20% reduced time-varying parameters (not shown).
  • the MCMQL DRL-based advanced current controller of the present disclosure shows a more reliable, stable, and superior adaptive performance under dynamic conditions with parameter uncertainties of the PMSM.
  • the speed control loop, the current control loop, and the plant hardware run asynchronously, i.e. with different discrete sampling times.
  • the rotational speed of the PMSM is directly related to the stator dq currents. Since the PMSM speed and the electrical frequency are directly related to each other, the sampling rate plays a vital role in the performance of the PMSM. Also, the angular position of the motor changes very quickly with changes in rotational speed, and hence a smaller sampling time would be ideal for fast and efficient performance.
  • the MCMQL-AO current controller of the present disclosure can be trained with a smaller sample time for a faster and more effective current response. Since the training and exploration of the DRL controller are time-consuming processes, the proposed MCMQL-AO current controller is trained at a moderate sample time of 1×10⁻⁴ seconds. The training and exploration time can be significantly reduced with the use of CUDA-based graphics processing unit (GPU) or tensor processing unit (TPU) hardware.
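  • As a simple worked example of the sampling assumption above, training at a sample time of 1×10⁻⁴ seconds implies 10,000 control updates per simulated second; the one-second episode length used here is an assumption for illustration.

```python
TS_CURRENT = 1e-4                                 # training sample time in seconds (from the text)
SIM_TIME = 1.0                                    # simulated seconds per episode (assumed)
steps_per_episode = int(SIM_TIME / TS_CURRENT)    # 10,000 control updates per episode
```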
  • FIG. 28 shows a combined graph 780 showing the effect of sample time at different speed profiles.
  • Combined graph 780 includes first and second graphs (a), (b) over a common time scale.
  • the first graph (a) shows performance with a sample rate of 1×10⁻⁴ seconds and includes a first plot 782 of q-axis current i q using adaptive PI control, and a second plot 784 of q-axis current i q using the MCMQL-AO control technique.
  • the second graph (b) shows performance with a sample rate of 1×10⁻⁴ seconds and includes a third plot 786 of d-axis current i d using adaptive PI control, and a fourth plot 788 of d-axis current i d using the MCMQL-AO control technique.
  • the proposed MCMQL-AO controller shows reduced transient overshoot and improved performance stability.
  • an MCMQL-AO parameter-independent speed and current control of a PMSM for EV applications, based on deep reinforcement learning (DRL), is presented.
  • Our proposed MCMQL-AO control shows accurate, reliable, adaptive, and efficient performance compared to conventional control techniques such as PI control.
  • the multi-critic network with C styles allows faster training with improved performance.
  • the exploration of RL agents with the plant environment enables accurate learning with a strong adaptive ability reducing transient overshoot responses, oscillations, and decoupling inaccuracy.
  • the proposed MCMQL-AO speed and current controllers are independent of the PMSM parameters and hence mitigate complex tuning methods in non-linear systems. This study foresees an added advantage of efficient online learning with the MCMQL-AO controllers.
  • our proposed MCMQL-AO controller concept can be adapted to other complex non-linear control systems.
  • the present disclosure provides an electric motor drive that includes an inverter 20 and a controller 30.
  • the inverter 20 includes at least one switch 22 operable to supply an alternating current (AC) power to an electric motor 26.
  • the controller 30 may include a processor 32, such as a general-purpose microprocessor. However, the controller 30 may have a different configuration, which may include application-specific hardware and/or software.
  • the controller 30 is configured to: compute an error signal by subtracting an actual value signal from a reference value corresponding to a control signal.
  • the error signal may include the speed error signal e ω.
  • the controller 30 may include the processor 32 executing instructions to implement the first difference block 105 to compute the error signal by subtracting the actual speed ω m of the electric motor 26 from the speed reference ω' m.
  • the error signal may include the current error signal e d,q
  • the controller 30 may include the processor 32 executing instructions to implement the second difference block 124 to compute the error signal by subtracting the actual current i d,q supplied to the electric motor 26 from the reference current i' d,q.
  • the error signal may include a different signal, such as a voltage error, etc.
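  • For illustration only, the error signals described above reduce to simple differences between reference and measured values; the variable names below are assumptions of the sketch.

```python
def speed_error(w_ref, w_actual):
    # e_w = w'_m - w_m
    return w_ref - w_actual

def current_error(i_d_ref, i_q_ref, i_d, i_q):
    # e_d,q = i'_d,q - i_d,q
    return i_d_ref - i_d, i_q_ref - i_q
```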
  • the controller 30 is also configured to determine a reward based on the error signal.
  • the controller 30 may include the processor 32 executing instructions to implement the first reward calculator 310 to calculate a first reward r t1 based on the speed error signal e ω.
  • the controller 30 may include the processor 32 executing instructions to implement the second reward calculator 350 to calculate a second reward r t2 based on the current error signal e d , q .
  • the controller 30 is also configured to determine a state observation based on the error signal.
  • the controller 30 may include the processor 32 executing instructions to implement the first state observation calculator 312 to calculate the first state observation s t1 based on the speed error signal e ω.
  • the controller 30 may include the processor 32 executing instructions to implement the second state observation calculator 352 to calculate the second state observation s t2 based on the current error signal e d,q .
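  • The disclosure does not enumerate the exact components of the state observation, so the following is only an assumed illustration in which the observation is built from the error signal, its integral, and its derivative.

```python
class StateObservation:
    """Assumed example of forming a state observation s_t from an error signal."""

    def __init__(self, dt):
        self.dt = dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        # Accumulate the integral and estimate the derivative of the error.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (error, self.integral, derivative)
```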
  • the controller 30 is also configured to compute, using a multi-critic with multi-Q learning (MCMQL) actor optimization (AO) controller, an actor output based on the error signal, the MCMQL-AO controller including an actor network and multiple critic networks with multi-Q-value optimization, each of the critic networks configured to update the actor network based on the reward and the state observation.
  • the controller 30 may include the processor 32 executing instructions to implement the first actor network 316 and the first multi-critic network 318.
  • the controller 30 may include the processor 32 executing instructions to implement the second actor network 356 and the second multi-critic network 358.
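  • A hedged sketch (PyTorch assumed) of a multi-Q critic update is given below; regressing every critic toward a shared conservative target (here the element-wise minimum over the target critics) is one common choice and is an assumption of this sketch, not a statement of the disclosed algorithm.

```python
import torch
import torch.nn.functional as F

def critic_update(critics, critic_optimizers, target_critics, target_actor,
                  states, actions, rewards, next_states, gamma=0.99):
    with torch.no_grad():
        next_actions = target_actor(next_states)
        # Q-estimates of all target critics for the next state-action pair.
        target_qs = torch.stack([tc(next_states, next_actions) for tc in target_critics])
        y = rewards + gamma * target_qs.min(dim=0).values   # conservative shared target
    for critic, optimizer in zip(critics, critic_optimizers):
        loss = F.mse_loss(critic(states, actions), y)       # regress each critic toward y
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```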
  • the controller 30 is also configured to determine the control signal based on the actor output.
  • the MCMQL-AO controller may be configured as a speed controller, and the control signal may include the torque reference signal T ref based on the speed error signal e ω.
  • the controller 30 may include the processor 32 executing instructions to implement the second summing block 334 and/or the first output block 336 to determine the control signal (i.e. the torque reference signal T ref) based on the actor output from the first actor network 316.
  • the MCMQL-AO controller may be configured as a current controller, and the control signal may include the voltage command V d , V q based on the current error signal e d,q .
  • the controller 30 may include the processor 32 executing instructions to implement the fourth summing block 374 and/or second output block 376 to determine the control signal (i.e. the voltage command V d , V q ) based on the actor output from the second actor network 356.
  • the controller 30 is also configured to command, based on the control signal, the at least one switch to selectively conduct current.
  • the controller 30 may include the processor 32 executing instructions to implement the pulse-width-modulation (PWM) block 142 to generate the pulse-width-modulation signals for controlling the solid-state switches 22 based on the control signal, such as the torque reference signal T ref and/or the voltage command V d, V q.
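  • An illustrative end-to-end control step for the cascaded arrangement described above is sketched below; the actor interfaces and the torque-to-current mapping are assumptions, and only the standard dq-to-stationary-frame transform preceding the PWM stage is written out.

```python
import math

def inverse_park(v_d, v_q, theta_e):
    # Standard dq -> alpha/beta transform (rotor electrical angle theta_e in radians).
    v_alpha = v_d * math.cos(theta_e) - v_q * math.sin(theta_e)
    v_beta = v_d * math.sin(theta_e) + v_q * math.cos(theta_e)
    return v_alpha, v_beta

def control_step(w_ref, w_m, i_d, i_q, theta_e, speed_actor, current_actor, torque_to_idq):
    e_w = w_ref - w_m                           # speed error
    t_ref = speed_actor.act(e_w)                # speed-loop actor output: torque reference
    i_d_ref, i_q_ref = torque_to_idq(t_ref)     # reference currents (e.g. an assumed MTPA map)
    e_d, e_q = i_d_ref - i_d, i_q_ref - i_q     # current errors
    v_d, v_q = current_actor.act((e_d, e_q))    # current-loop actor output: voltage command
    return inverse_park(v_d, v_q, theta_e)      # stationary-frame voltages for the PWM stage
```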
  • the system, methods and/or processes described above, and steps thereof, may be realized in hardware, software or any combination of hardware and software suitable for a particular application.
  • the hardware may include a general purpose computer and/or dedicated computing device or specific computing device or particular aspect or component of a specific computing device.
  • the processes may be realized in one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors or other programmable device, along with internal and/or external memory.
  • the processes may also, or alternatively, be embodied in an application specific integrated circuit, a programmable gate array, programmable array logic, or any other device or combination of devices that may be configured to process electronic signals. It will further be appreciated that one or more of the processes may be realized as a computer executable code capable of being executed on a machine readable medium.
  • the computer executable code may be created using a structured programming language such as C, an object oriented programming language such as C++, or any other high-level or low-level programming language (including assembly languages, hardware description languages, and database programming languages and technologies) that may be stored, compiled or interpreted to run on one of the above devices, as well as heterogeneous combinations of processors, processor architectures, or combinations of different hardware and software, or any other machine capable of executing program instructions.
  • each method described above and combinations thereof may be embodied in computer executable code that, when executing on one or more computing devices performs the steps thereof.
  • the methods may be embodied in systems that perform the steps thereof, and may be distributed across devices in a number of ways, or all of the functionality may be integrated into a dedicated, standalone device or other hardware.
  • the means for performing the steps associated with the processes described above may include any of the hardware and/or software described above. All such permutations and combinations are intended to fall within the scope of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Power Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Control Of Ac Motors In General (AREA)
  • Control Of Electric Motors In General (AREA)

Abstract

An electric motor drive includes a multi-critic with multi-Q learning actor-optimized controller. The electric motor drive includes an inverter configured to apply an alternating voltage to an electric motor and to supply current to the electric motor. The electric motor drive also includes a speed controller and a current controller in a cascaded configuration. The speed controller is configured to determine a reference torque or a reference current for the inverter to supply to the electric motor, and the current controller is configured to determine the voltage based on the reference current. At least one of the speed controller or the current controller includes the multi-critic with multi-Q learning actor-optimized controller. In some embodiments, each of the speed controller and the current controller includes a corresponding multi-critic with multi-Q learning actor-optimized controller.
PCT/US2022/019486 2021-03-09 2022-03-09 Optimisation de commande vectorielle améliorée basée sur un multi-critique avec apprentissage multi-q pour une commande de moteur WO2022192352A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163158544P 2021-03-09 2021-03-09
US63/158,544 2021-03-09

Publications (2)

Publication Number Publication Date
WO2022192352A1 true WO2022192352A1 (fr) 2022-09-15
WO2022192352A9 WO2022192352A9 (fr) 2022-12-01

Family

ID=83227066

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/019486 WO2022192352A1 (fr) 2021-03-09 2022-03-09 Optimisation de commande vectorielle améliorée basée sur un multi-critique avec apprentissage multi-q pour une commande de moteur

Country Status (1)

Country Link
WO (1) WO2022192352A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115765562A (zh) * 2022-09-23 2023-03-07 华北电力大学 一种永磁同步电机无模型预测电流控制方法及装置
WO2024083390A1 (fr) * 2022-10-19 2024-04-25 Robert Bosch Gmbh Procédé et dispositif pour fournir un modèle d'actionnement pour une machine électrique à commutation électronique et un système de moteur

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170033729A1 (en) * 2015-07-31 2017-02-02 Fanuc Corporation Machine learning apparatus for learning operation conditions of cooling device, motor control apparatus and motor control system having the machine learning apparatus, and machine learning method
US20170350404A1 (en) * 2012-03-02 2017-12-07 Panasonic Intellectual Property Management Co., Ltd. Motor controller and motor control method
US20200104685A1 (en) * 2018-09-27 2020-04-02 Deepmind Technologies Limited Learning motor primitives and training a machine learning system using a linear-feedback-stabilized policy
WO2020144292A1 (fr) * 2019-01-09 2020-07-16 Continental Automotive Gmbh Contrôle thermique pour moteur de véhicule
CN110488759B (zh) * 2019-08-09 2020-08-04 西安交通大学 一种基于Actor-Critic算法的数控机床进给控制补偿方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170350404A1 (en) * 2012-03-02 2017-12-07 Panasonic Intellectual Property Management Co., Ltd. Motor controller and motor control method
US20170033729A1 (en) * 2015-07-31 2017-02-02 Fanuc Corporation Machine learning apparatus for learning operation conditions of cooling device, motor control apparatus and motor control system having the machine learning apparatus, and machine learning method
US20200104685A1 (en) * 2018-09-27 2020-04-02 Deepmind Technologies Limited Learning motor primitives and training a machine learning system using a linear-feedback-stabilized policy
WO2020144292A1 (fr) * 2019-01-09 2020-07-16 Continental Automotive Gmbh Contrôle thermique pour moteur de véhicule
CN110488759B (zh) * 2019-08-09 2020-08-04 西安交通大学 一种基于Actor-Critic算法的数控机床进给控制补偿方法

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115765562A (zh) * 2022-09-23 2023-03-07 华北电力大学 一种永磁同步电机无模型预测电流控制方法及装置
WO2024083390A1 (fr) * 2022-10-19 2024-04-25 Robert Bosch Gmbh Procédé et dispositif pour fournir un modèle d'actionnement pour une machine électrique à commutation électronique et un système de moteur

Also Published As

Publication number Publication date
WO2022192352A9 (fr) 2022-12-01

Similar Documents

Publication Publication Date Title
US10305413B2 (en) Machine learning device which learns current command for motor, motor controller, and machine learning method
US10090791B2 (en) Machine learning apparatus and method for learning correction value in motor current control, correction value computation apparatus including machine learning apparatus and motor driving apparatus
JP6193961B2 (ja) 機械の送り軸の送りの滑らかさを最適化する機械学習装置および方法ならびに該機械学習装置を備えたモータ制御装置
Li et al. Inductance surface learning for model predictive current control of switched reluctance motors
WO2022192352A1 (fr) Optimisation de commande vectorielle améliorée basée sur un multi-critique avec apprentissage multi-q pour une commande de moteur
Cabrera et al. Tuning the stator resistance of induction motors using artificial neural network
Qi Rotor resistance and excitation inductance estimation of an induction motor using deep-Q-learning algorithm
Bhattacharjee et al. Real-time sil validation of a novel pmsm control based on deep deterministic policy gradient scheme for electrified vehicles
Hanke et al. Finite-control-set model predictive control for a permanent magnet synchronous motor application with online least squares system identification
Stender et al. Accurate torque control for induction motors by utilizing a globally optimized flux observer
Ren et al. Speed sensorless nonlinear adaptive control of induction motor using combined speed and perturbation observer
Bhattacharjee et al. An advanced policy gradient based vector control of PMSM for EV application
Raj et al. Particle swarm optimized deep convolutional neural sugeno-takagi fuzzy PID controller in permanent magnet synchronous motor
Karami-Shahnani et al. Online Inductance Estimation of PM-Assisted Synchronous Reluctance Motor Using Artificial Neural Network
CN112422014A (zh) 基于高阶滑模补偿的超局部无模型永磁同步电机转速预测方法
Stender et al. Accurate torque estimation for induction motors by utilizing a hybrid machine learning approach
Li et al. A flexible current tracking control of sensorless induction motors via adaptive observer
Thakar et al. Fractional-order PI controller for permanent magnet synchronous motor: A design-based comparative study
Kirad et al. Improved sensorless backstepping controller using extended Kalman filter of a permanent magnet synchronous machine
Lubineau et al. Design of an advanced non linear controller for induction motors and experimental validation on an industrial benchmark
Yin et al. Overshoot Reduction Inspired Recurrent RBF Neural Network Controller Design for PMSM
De Martin et al. Trajectory Linearisation-based Offset-free MPC for Synchronous Electric Motor Drives with Nonlinear Magnetic Characteristic
Ting et al. An SOS Observer-Based Sensorless Control for PMLSM Drive System
CN113726244B (zh) 一种基于Adaline神经网络的转子磁链实时估计方法及系统
Kamiński et al. Adaptive Control Structure with Neural Data Processing Applied for Electrical Drive with Elastic Shaft. Energies 2021, 14, 3389

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22767864

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22767864

Country of ref document: EP

Kind code of ref document: A1