CN113467226A - Proportional valve position control method based on Q-Learning - Google Patents

Proportional valve position control method based on Q-Learning

Info

Publication number
CN113467226A
Authority
CN
China
Prior art keywords: current, loop controller, state, PID closed, value
Legal status: Granted
Application number
CN202110720142.5A
Other languages
Chinese (zh)
Other versions
CN113467226B (en
Inventor
张辉
张思龙
Current Assignee: Beihang University
Original Assignee: Beihang University
Application filed by Beihang University
Priority to CN202110720142.5A
Publication of CN113467226A
Application granted
Publication of CN113467226B
Current legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B11/00 Automatic controllers
    • G05B11/01 Automatic controllers electric
    • G05B11/36 Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential
    • G05B11/42 Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential for obtaining a characteristic which is both proportional and time-dependent, e.g. P. I., P. I. D.
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Magnetically Actuated Valves (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a proportional valve position control method based on Q-Learning. The displacement control parameters of a position PID closed-loop controller and the current control parameters of a current PID closed-loop controller are each adaptively adjusted through Q-Learning, so that these parameters adapt as the sampling time changes, and accurate control of the valve core position is achieved through position PID closed-loop control and current PID closed-loop control. Aiming at high-precision position control of a high-frequency-response proportional valve, the invention takes the various nonlinear factors in the proportional solenoid valve into account and provides a control method that can effectively improve the bandwidth and precision of the proportional valve, offering a reference for the high-frequency response of proportional valves. The control method is fast and stable, can follow the valve core displacement synchronously at both low and high frequencies, and solves the problem that fixed PID parameters cannot adapt to high-frequency and low-frequency signals simultaneously.

Description

Proportional valve position control method based on Q-Learning
Technical Field
The invention relates to the technical field of proportional valve control, in particular to a proportional valve position control method based on Q-Learning.
Background
The electro-hydraulic proportional valve is a hydraulic element commonly used in the field of industrial control. With the continuous development of engineering technology, the requirements on the position control precision and the frequency response of the proportional valve are becoming higher and higher. Therefore, research on high-precision, high-frequency-response proportional valve position control methods is extremely important for the development of electro-hydraulic control technology. At present, a series of studies on the position control of high-frequency-response servo proportional valves have been carried out both in China and abroad.
In existing research on proportional valve position control methods, a compensation value is determined by acquiring the valve core position and judging whether the valve core has reached the expected position or whether the system oscillates; the control signal is then compensated and corrected based on the size of the dead-zone compensation value, and the corrected signal is output to the electro-hydraulic proportional valve. Compared with a traditional valve, this improves the control precision, but the position control precision of the hydraulic valve under high-frequency response is still difficult to guarantee.
In addition, proportional valves contain nonlinear factors. The current-force characteristic of the proportional electromagnet becomes nonlinear under high-frequency-response working conditions: first, the proportional electromagnet wears with use after leaving the factory; second, hysteresis causes nonlinearity under high-frequency-response working conditions. In the valve body, the friction coefficient between the valve core and the valve sleeve changes as the valve core moves, so the friction force changes continuously. The hydrodynamic force also exerts a nonlinear influence on the valve core as it moves. Nonlinear factors are also introduced by the control circuit itself, such as the nonlinear relationship between the given PWM duty cycle and frequency and the circuit output current.
Disclosure of Invention
In view of the above, the present invention provides a proportional valve position control method based on Q-Learning, which takes the nonlinearity of the system into account and solves the problem that fixed PID parameters cannot adapt to both high-frequency and low-frequency given signals.
The invention provides a proportional valve position control method based on Q-Learning, which comprises the following steps:
s1: taking the difference value between the set position of the valve core and the currently acquired position of the valve core as the input of a position PID closed-loop controller, and taking the sum of the output of the position PID closed-loop controller and the reference current as the current reference input;
s2: taking the difference value between the current reference input and the currently acquired electromagnetic valve input current as the input of the current PID closed-loop controller, superposing the output of the current PID closed-loop controller and a reference PWM duty ratio, and loading the superposed output to the electromagnetic valve to control the movement of a valve core;
s3: according to the relation between the input of the current PID closed-loop controller and the PWM duty ratio, obtaining the PWM duty ratio corresponding to the input of the current PID closed-loop controller, and taking the PWM duty ratio as a reference value of the PWM generator duty ratio;
s4: setting initial values of three displacement control parameters of the position PID closed-loop controller and three current control parameters of the current PID closed-loop controller, collecting a valve core position as a state set, and collecting output current of a PWM generator;
s5: dividing the value range of each displacement control parameter and each current control parameter into a plurality of storage intervals, wherein the values in the same storage interval are in the same state, and the values in different storage intervals are in different states, so as to obtain six Q value sets;
s6: giving an epsilon-greedy method criterion and the current value state of each displacement control parameter and each current control parameter, and generating the action sets of the six Q value sets;
s7: formulating a Q value set reward scheme related to the displacement control parameter and a Q value set reward scheme related to the current control parameter;
s8: inputting iteration rounds, a state set, an action set, a step length, an attenuation factor and an exploration rate, performing Q-Learning, and outputting rewards corresponding to the state set and the action set;
s9: and executing steps S1-S4 according to the current value state and the reward corresponding to the action of each displacement control parameter and each current control parameter, and controlling the position of the valve core.
In a possible implementation manner, in the above method for controlling a proportional valve position based on Q-Learning provided by the present invention, step S1 specifically includes:
calculating the difference between the valve core set position x_ref and the currently acquired valve core position x_k:
e(x_k) = x_ref - x_k (1)
taking e(x_k) as the input of the position PID closed-loop controller, the output of the position PID closed-loop controller is:
u(x_k) = k_px e(x_k) + k_ix Σ_{j=0}^{k} e(x_j) Δx_k + k_dx [e(x_k) - e(x_{k-1})]/Δx_k (2)
where k_px denotes the proportional gain of the position PID closed-loop controller, k_ix denotes the integral gain of the position PID closed-loop controller, k_dx denotes the differential gain of the position PID closed-loop controller, Δx_k denotes the sampling interval time of the position PID closed-loop controller, and e(x_{k-1}) denotes the difference between the valve core set position x_ref and the valve core position x_{k-1} acquired at the (k-1)-th sampling moment;
taking the sum of the output u(x_k) of the position PID closed-loop controller and a reference current c_st as the current reference input:
c_ref = u(x_k) + c_st (3).
in a possible implementation manner, in the above method for controlling a proportional valve position based on Q-Learning provided by the present invention, step S2 specifically includes:
calculating the difference between the current reference input c_ref and the currently acquired electromagnetic valve input current c_k:
e(c_k) = c_ref - c_k (4)
taking e(c_k) as the input of the current PID closed-loop controller, the output of the current PID closed-loop controller is:
u(c_k) = k_pc e(c_k) + k_ic Σ_{j=0}^{k} e(c_j) Δc_k + k_dc [e(c_k) - e(c_{k-1})]/Δc_k (5)
where k_pc denotes the proportional gain of the current PID closed-loop controller, k_ic denotes the integral gain of the current PID closed-loop controller, k_dc denotes the differential gain of the current PID closed-loop controller, Δc_k denotes the sampling interval time of the current PID closed-loop controller, and e(c_{k-1}) denotes the difference between the current reference input c_ref and the electromagnetic valve input current c_{k-1} collected at the (k-1)-th sampling moment;
superposing u(c_k) with the reference PWM duty ratio and then loading the result to the electromagnetic valve to control the movement of the valve core.
In a possible implementation manner, in the proportional valve position control method based on Q-Learning provided by the present invention, in step S5, values in the same storage interval are set using the same rule:
n = [N(x_con - X_min)/(X_max - X_min)] (6)
where [x] = max{n ∈ Z | n ≤ x}, n represents a discrete spool displacement or current, x_con represents a continuous spool displacement or current, X_min and X_max are respectively the lower and upper limits of x_con, and N represents the number of intervals into which each spool displacement or each current is divided.
In a possible implementation manner, in the proportional valve position control method based on Q-Learning provided by the present invention, in step S6, an epsilon-greedy criterion is defined as:
a = a* = argmax_a Q(s, a), if ξ > ε
a = a random action, if ξ ≤ ε (7)
where ξ ∈ [0, 1], ε represents the probability of taking a random action, a* = argmax_a Q(s, a) represents the action corresponding to the current maximum Q value, s represents the current state, and a represents the action currently taken.
In a possible implementation manner, in the above proportional valve position control method based on Q-Learning provided by the present invention, in step S7, a Q value set reward scheme related to the displacement control parameter is formulated:
[equation (8): reward value R_tx defined in terms of e(x_{k+1}) and the displacement threshold x_lim]
where R_tx represents the reward value of the position PID closed-loop controller, e(x_{k+1}) denotes the difference between the valve core set position x_ref and the valve core position x_{k+1} acquired at the (k+1)-th sampling moment, and x_lim denotes a set displacement threshold;
formulating a set of Q values reward scheme associated with the current control parameter:
[equation (9): reward value R_tc defined in terms of e(c_{k+1}) and the current threshold c_lim]
where R_tc represents the reward value of the current PID closed-loop controller, e(c_{k+1}) represents the difference between the current reference input c_ref and the electromagnetic valve input current c_{k+1} collected at the (k+1)-th sampling moment, and c_lim denotes a set current threshold.
In a possible implementation manner, in the proportional valve position control method based on Q-Learning provided by the present invention, in step S8, the performing Q-Learning specifically includes:
s81: randomly initializing a reward corresponding to the state set and the action set, wherein the initial value of the termination state is 0;
s82: sequentially setting i to 1,2, …, and T, and executing steps S83 to S87, respectively; wherein T represents the number of iterations;
s83: initializing a current state s as a first state of a current state set;
s84: selecting an action in the current state by using an epsilon-greedy method criterion;
s85: executing the selected action in the current state to obtain a new state and a corresponding reward;
s86: updating the value function with the new state and its corresponding reward:
Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + α[r_k + γ max Q_k(s_{k+1}, a_{k+1}) - Q_k(s_k, a_k)] (10)
where s_k represents the state at the k-th sampling instant, a_k represents the action at the k-th sampling instant, Q_{k+1}(s_k, a_k) denotes the Q value at the (k+1)-th sampling instant after taking action a_k in state s_k and transferring to state s_{k+1}, Q_k(s_k, a_k) denotes the Q value of taking action a_k in state s_k at the k-th sampling instant, Q_k(s_{k+1}, a_{k+1}) represents the Q value after taking the action a_{k+1} of the next sampling instant at the k-th sampling instant, α represents the learning rate, α ∈ (0, 1], r represents the reward, k represents the sampling instant, and γ represents the attenuation factor;
s87: let s = s', where s' is the state obtained after taking the action, and judge whether s' is the termination state; if yes, the current iteration round is finished and the process returns to step S82; if not, the process returns to step S84.
According to the Q-Learning-based proportional valve position control method provided by the invention, the displacement control parameters of the position PID closed-loop controller and the current control parameters of the current PID closed-loop controller are each adaptively adjusted through Q-Learning, so that these parameters adapt as the sampling time changes, and accurate control of the valve core position is achieved through position PID closed-loop control and current PID closed-loop control. Aiming at high-precision position control of a high-frequency-response proportional valve, the invention takes the various nonlinear factors in the proportional solenoid valve into account and provides a control method that can effectively improve the bandwidth and precision of the proportional valve; the method is fast and stable and offers a reference for the high-frequency response of proportional valves. In a traditional control method the six control parameters are fixed values, so the valve core displacement can be followed only at low frequency or only at high frequency, not at both; the present method follows the valve core displacement synchronously at both low and high frequencies.
Drawings
FIG. 1 is a block flow diagram of a method for controlling a proportional valve position based on Q-Learning according to embodiment 1 of the present invention;
fig. 2 is a flow chart of the Q-Learning algorithm in embodiment 1 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only illustrative and are not intended to limit the present invention.
The invention provides a proportional valve position control method based on Q-Learning, which comprises the following steps:
s1: taking the difference value between the set position of the valve core and the currently acquired position of the valve core as the input of a position PID closed-loop controller, and taking the sum of the output of the position PID closed-loop controller and the reference current as the current reference input;
s2: taking the difference value of the current reference input and the currently acquired electromagnetic valve input current as the input of a current PID closed-loop controller, superposing the output of the current PID closed-loop controller and a reference PWM duty ratio, and loading the superposed output to the electromagnetic valve to control the movement of a valve core;
s3: according to the relation between the input of the current PID closed-loop controller and the PWM duty ratio, obtaining the PWM duty ratio corresponding to the input of the current PID closed-loop controller, and taking the PWM duty ratio as a reference value of the PWM generator duty ratio;
s4: setting initial values of three displacement control parameters of a position PID closed-loop controller and three current control parameters of a current PID closed-loop controller, collecting a valve core position as a state set, and collecting output current of a PWM generator;
s5: dividing the value range of each displacement control parameter and each current control parameter into a plurality of storage intervals, wherein the values in the same storage interval are in the same state, and the values in different storage intervals are in different states, so as to obtain six Q value sets;
s6: giving an epsilon-greedy method criterion and the current value state of each displacement control parameter and each current control parameter, and generating the action sets of the six Q value sets;
s7: formulating a Q value set reward scheme related to the displacement control parameter and a Q value set reward scheme related to the current control parameter;
s8: inputting iteration rounds, a state set, an action set, a step length, an attenuation factor and an exploration rate, performing Q-Learning, and outputting rewards corresponding to the state set and the action set;
s9: and executing steps S1-S4 according to the current value state and the reward corresponding to the action of each displacement control parameter and each current control parameter, and controlling the position of the valve core.
The specific implementation of the above-mentioned proportional valve position control method based on Q-Learning according to the present invention is described in detail with reference to a specific embodiment.
Example 1: the Q-Learning based proportional valve position control method includes position PID control of the outer loop and current PID control of the inner loop as shown in fig. 1.
The valve core set position x_ref of the proportional valve is taken as the input, and the position of the valve core is controlled by taking as the control quantities the deviation between the currently acquired valve core position and the valve core set position and the deviation between the calculated current value and the current value output by the PWM generator. The valve core position and the output current of the PWM generator are discretized and used as the states for data acquisition, and Q-Learning is performed to obtain six Q value sets, where each Q value set corresponds to one of the system gains of the two PID controllers, namely the displacement control parameters of the position PID closed-loop controller and the current control parameters of the current PID closed-loop controller. Given the current state, each Q value set generates the best value of its corresponding gain.
The method comprises the following specific steps:
firstly, the difference value between the valve core set position and the currently acquired valve core position is used as the input of a position PID closed-loop controller, and the sum of the output of the position PID closed-loop controller and a reference current is used as a current reference input.
Taking the valve core set position x_ref as the reference, the difference between the valve core set position x_ref and the currently acquired valve core position x_k is calculated:
e(x_k) = x_ref - x_k (1)
With e(x_k) as the input of the position PID closed-loop controller, the output of the position PID closed-loop controller is:
u(x_k) = k_px e(x_k) + k_ix Σ_{j=0}^{k} e(x_j) Δx_k + k_dx [e(x_k) - e(x_{k-1})]/Δx_k (2)
where k_px denotes the proportional gain of the position PID closed-loop controller, k_ix denotes the integral gain of the position PID closed-loop controller, k_dx denotes the differential gain of the position PID closed-loop controller, Δx_k denotes the sampling interval time of the position PID closed-loop controller, and e(x_{k-1}) denotes the difference between the valve core set position x_ref and the valve core position x_{k-1} acquired at the (k-1)-th sampling moment;
The sum of the output u(x_k) of the position PID closed-loop controller and a reference current c_st is taken as the current reference input:
c_ref = u(x_k) + c_st (3).
and secondly, taking the difference value between the current reference input and the currently acquired input current of the electromagnetic valve as the input of a current PID closed-loop controller, superposing the output of the current PID closed-loop controller and a reference PWM duty ratio, and loading the superposed output to the electromagnetic valve to control the movement of a valve core.
The difference between the current reference input c_ref and the currently collected electromagnetic valve input current c_k is calculated:
e(c_k) = c_ref - c_k (4)
With e(c_k) as the input of the current PID closed-loop controller, the output of the current PID closed-loop controller is:
u(c_k) = k_pc e(c_k) + k_ic Σ_{j=0}^{k} e(c_j) Δc_k + k_dc [e(c_k) - e(c_{k-1})]/Δc_k (5)
where k_pc denotes the proportional gain of the current PID closed-loop controller, k_ic denotes the integral gain of the current PID closed-loop controller, k_dc denotes the differential gain of the current PID closed-loop controller, Δc_k denotes the sampling interval time of the current PID closed-loop controller, and e(c_{k-1}) denotes the difference between the current reference input c_ref and the electromagnetic valve input current c_{k-1} collected at the (k-1)-th sampling moment;
u(c_k) is superposed with the reference PWM duty ratio and then loaded to the electromagnetic valve to control the movement of the valve core.
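As an illustration of the first and second steps, the following Python sketch implements the cascaded position/current PID computation of equations (1)-(5) in the standard discrete positional PID form; the class and function names, gains, sampling intervals and reference values are illustrative assumptions rather than values taken from this patent.

```python
# Minimal sketch of the cascaded position/current PID update of equations (1)-(5).
# All numeric values and names here are illustrative placeholders.

class DiscretePID:
    """Positional discrete PID: u_k = kp*e_k + ki*sum(e_j)*dt + kd*(e_k - e_{k-1})/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.err_sum = 0.0
        self.err_prev = 0.0

    def update(self, err):
        self.err_sum += err
        out = (self.kp * err
               + self.ki * self.err_sum * self.dt
               + self.kd * (err - self.err_prev) / self.dt)
        self.err_prev = err
        return out


def control_step(x_ref, x_k, c_k, pos_pid, cur_pid, c_st, duty_ref):
    """One sampling step of the outer position loop and inner current loop."""
    e_x = x_ref - x_k                      # equation (1)
    c_ref = pos_pid.update(e_x) + c_st     # equations (2)-(3): current reference input
    e_c = c_ref - c_k                      # equation (4)
    duty = cur_pid.update(e_c) + duty_ref  # equation (5) superposed on the reference PWM duty
    return duty
```

At each sampling moment, control_step would be called with the newly acquired valve core position x_k and solenoid input current c_k, and the returned value applied to the PWM generator as its duty-ratio command.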
And thirdly, according to the relation between the input of the current PID closed-loop controller and the PWM duty ratio, obtaining the PWM duty ratio corresponding to the input of the current PID closed-loop controller, and taking the PWM duty ratio as the reference value of the PWM generator duty ratio.
And fourthly, setting initial values of displacement control parameters of the position PID closed-loop controller and current control parameters of the current PID closed-loop controller, collecting the position of the valve core as a state set, and collecting output current of the PWM generator.
And fifthly, dividing the value range of each displacement control parameter and each current control parameter into a plurality of storage intervals, wherein the values in the same storage interval are in the same state, and the values in different storage intervals are in different states, so as to obtain 6Q value sets.
The Q value sets are obtained by the Q-Learning algorithm. There are six Q value sets in total: three Q value sets are related to the displacement control parameters k_p1, k_i1 and k_d1, and the other three Q value sets are related to the current control parameters k_p2, k_i2 and k_d2.
Specifically, the same rule is used to set the values in the same storage interval:
n = [N(x_con - X_min)/(X_max - X_min)] (6)
where [x] = max{n ∈ Z | n ≤ x}, n represents a discrete spool displacement or current, x_con represents a continuous spool displacement or current, X_min and X_max are respectively the lower and upper limits of x_con, and N represents the number of intervals into which each spool displacement or each current is divided.
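A minimal Python sketch of this binning rule, under the definitions given above, is shown below; the clamping of the result to a valid index is an added assumption.

```python
import math

def discretize(x_con, x_min, x_max, n_intervals):
    """Map a continuous spool displacement or current x_con in [x_min, x_max]
    onto one of n_intervals storage intervals, following equation (6).
    Clamping the index to the valid range is an implementation assumption."""
    n = math.floor(n_intervals * (x_con - x_min) / (x_max - x_min))
    return min(max(n, 0), n_intervals - 1)
```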
And sixthly, given the epsilon-greedy method criterion and the current value state of each displacement control parameter and each current control parameter, the action sets of the six Q value sets are generated.
Specifically, the epsilon-greedy criterion is defined as:
a = a* = argmax_a Q(s, a), if ξ > ε
a = a random action, if ξ ≤ ε (7)
where ξ ∈ [0, 1], ε represents the probability of taking a random action, a* = argmax_a Q(s, a) represents the action corresponding to the current maximum Q value, s represents the current state, and a represents the action currently taken.
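A minimal Python sketch of the epsilon-greedy selection of equation (7) follows; the representation of a Q value set as a list indexed by action is an assumption for illustration.

```python
import random

def epsilon_greedy(q_row, epsilon):
    """Choose an action index from one row of a Q table per equation (7):
    with probability epsilon (xi <= epsilon) take a random action,
    otherwise take the action with the largest Q value."""
    if random.random() <= epsilon:
        return random.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])
```

A larger ε favours exploring new parameter values, while a smaller ε exploits the parameter values already found to give high rewards.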
And seventhly, establishing a Q value set reward scheme related to the displacement control parameter and a Q value set reward scheme related to the current control parameter.
Specifically, a Q value set reward scheme related to the displacement control parameters is established:
[equation (8): reward value R_tx defined in terms of e(x_{k+1}) and the displacement threshold x_lim]
where R_tx represents the reward value of the position PID closed-loop controller, e(x_{k+1}) denotes the difference between the valve core set position x_ref and the valve core position x_{k+1} acquired at the (k+1)-th sampling moment, and x_lim denotes a set displacement threshold;
establishing a Q value set reward scheme related to the current control parameter:
[equation (9): reward value R_tc defined in terms of e(c_{k+1}) and the current threshold c_lim]
where R_tc represents the reward value of the current PID closed-loop controller, e(c_{k+1}) represents the difference between the current reference input c_ref and the electromagnetic valve input current c_{k+1} collected at the (k+1)-th sampling moment, and c_lim denotes a set current threshold.
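The exact reward values are given by equations (8) and (9) above; purely as an illustration of a threshold-based scheme of the kind described, a sketch might look like the following, where the +1.0/-1.0 values are assumptions and not the patented values.

```python
def threshold_reward(error_next, limit):
    """Illustrative threshold-style reward of the kind described for R_tx and R_tc:
    reward the controller when the next-step error magnitude stays within the set
    threshold. The +1.0/-1.0 values are assumptions, not the values of
    equations (8) and (9)."""
    return 1.0 if abs(error_next) <= limit else -1.0
```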
And step eight, performing Q-Learning.
The Q-Learning algorithm is a model-free reinforcement learning method based on temporal-difference learning, which enables an agent to select an optimal action sequence in a Markov decision process through interaction with the external environment. The value function of the Q-Learning algorithm is defined as:
Q(s_k, a_k) = R(s, a) + γ max_a Q(s_{k+1}, a) (10)
where Q(s_k, a_k) denotes the Q value of taking action a_k in state s_k, max_a Q(s_{k+1}, a) represents the maximum Q value over the actions in the next state, γ represents the attenuation factor, and R(s, a) represents the value returned by the reward scheme.
In the Q-Learning algorithm, the agent receives an input state s_k from the external environment and outputs a corresponding action a_k through its internal mechanism. Under the action a_k, the external environment changes to a new state s_{k+1}. At the same time, the algorithm generates an instant reward signal R_{k+1} for the agent. The instant reward signal R_{k+1} is an evaluation of the agent's action a_k in the external environment state s_k.
In the Q-Learning algorithm, the Q-value optimization iterative formula is as follows:
Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + α[r_k + γ max Q_k(s_{k+1}, a_{k+1}) - Q_k(s_k, a_k)] (11)
where s_k represents the state at the k-th sampling instant, a_k represents the action at the k-th sampling instant, Q_{k+1}(s_k, a_k) denotes the Q value at the (k+1)-th sampling instant after taking action a_k in state s_k and transferring to state s_{k+1}, Q_k(s_k, a_k) denotes the Q value of taking action a_k in state s_k at the k-th sampling instant, Q_k(s_{k+1}, a_{k+1}) represents the Q value after taking the action a_{k+1} of the next sampling instant at the k-th sampling instant, α represents the learning rate, α ∈ (0, 1], r represents the reward, k represents the sampling instant, and γ represents the attenuation factor. In the optimizing process, Q_k(s_{k+1}, a*_{k+1}) denotes the Q value of taking the optimal action a*_{k+1} at the (k+1)-th sampling instant, and Q_k(s_k, a*_k) denotes the Q value of taking the optimal action a*_k at the k-th sampling instant.
The Q-Learning algorithm is embodied as follows, as shown in FIG. 2:
(1) inputting iteration rounds, a state set, an action set, a step length, an attenuation factor and an exploration rate;
(2) randomly initializing a reward corresponding to the state set and the action set, wherein the initial value of the termination state is 0;
(3) sequentially setting i to 1,2, … and T, and executing steps (4) to (8) respectively; wherein T represents the number of iterations;
(4) initializing a current state s as a first state of a current state set;
(5) selecting an action in the current state by using an epsilon-greedy method criterion;
(6) executing the selected action in the current state to obtain a new state and a corresponding reward;
(7) updating the value function according to the formula (11) by using the new state and the corresponding reward;
(8) let s = s', where s' is the state obtained after taking the action, and judge whether s' is the termination state; if yes, the current iteration round is finished and the process returns to step (3); if not, the process returns to step (5);
(9) and outputting the rewards corresponding to the state set and the action set.
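Steps (1) to (9) correspond to the standard tabular Q-Learning loop; a minimal Python sketch is given below. The callbacks step_env (environment transition and reward) and is_terminal (termination test), as well as the dictionary-based Q table, are assumptions introduced for illustration and are not specified in the patent.

```python
import random

def q_learning(n_rounds, states, actions, alpha, gamma, epsilon, step_env, is_terminal):
    """Sketch of the tabular Q-Learning loop of steps (1)-(9). step_env and
    is_terminal are assumed callbacks; they are not specified in the patent text."""
    # step (2): random initialisation, termination states set to 0
    Q = {(s, a): random.random() for s in states for a in actions}
    for s in states:
        if is_terminal(s):
            for a in actions:
                Q[(s, a)] = 0.0
    for _ in range(n_rounds):                        # step (3): i = 1, 2, ..., T
        s = states[0]                                # step (4): first state of the state set
        while not is_terminal(s):
            # step (5): epsilon-greedy action selection, equation (7)
            if random.random() <= epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s_next, r = step_env(s, a)               # step (6): new state and reward
            best_next = max(Q[(s_next, a_)] for a_ in actions)
            # step (7): value-function update, equation (11)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next                               # step (8)
    return Q                                         # step (9): rewards for state-action pairs
```

In this scheme, the same loop would be run for each of the six Q value sets, with the discretized valve core position or PWM generator output current as the state and the candidate values of the corresponding PID control parameter as the actions.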
And ninthly, executing the first step to the fourth step according to the current value state and the reward corresponding to the action of each displacement control parameter and each current control parameter, and controlling the position of the valve core.
According to the Q-Learning-based proportional valve position control method provided by the invention, the displacement control parameters of the position PID closed-loop controller and the current control parameters of the current PID closed-loop controller are each adaptively adjusted through Q-Learning, so that these parameters adapt as the sampling time changes, and accurate control of the valve core position is achieved through position PID closed-loop control and current PID closed-loop control. Aiming at high-precision position control of a high-frequency-response proportional valve, the invention takes the various nonlinear factors in the proportional solenoid valve into account and provides a control method that can effectively improve the bandwidth and precision of the proportional valve; the method is fast and stable and offers a reference for the high-frequency response of proportional valves. In a traditional control method the six control parameters are fixed values, so the valve core displacement can be followed only at low frequency or only at high frequency, not at both; the present method follows the valve core displacement synchronously at both low and high frequencies.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (7)

1. A method for controlling a proportional valve position based on Q-Learning, comprising the steps of:
s1: taking the difference value between the set position of the valve core and the currently acquired position of the valve core as the input of a position PID closed-loop controller, and taking the sum of the output of the position PID closed-loop controller and the reference current as the current reference input;
s2: taking the difference value between the current reference input and the currently acquired electromagnetic valve input current as the input of the current PID closed-loop controller, superposing the output of the current PID closed-loop controller and a reference PWM duty ratio, and loading the superposed output to the electromagnetic valve to control the movement of a valve core;
s3: according to the relation between the input of the current PID closed-loop controller and the PWM duty ratio, obtaining the PWM duty ratio corresponding to the input of the current PID closed-loop controller, and taking the PWM duty ratio as a reference value of the PWM generator duty ratio;
s4: setting initial values of three displacement control parameters of the position PID closed-loop controller and three current control parameters of the current PID closed-loop controller, collecting a valve core position as a state set, and collecting output current of a PWM generator;
s5: dividing the value range of each displacement control parameter and each current control parameter into a plurality of storage intervals, wherein the values in the same storage interval are in the same state, and the values in different storage intervals are in different states, so as to obtain six Q value sets;
s6: giving an epsilon-greedy method criterion and the current value state of each displacement control parameter and each current control parameter, and generating the action sets of the six Q value sets;
s7: formulating a Q value set reward scheme related to the displacement control parameter and a Q value set reward scheme related to the current control parameter;
s8: inputting iteration rounds, a state set, an action set, a step length, an attenuation factor and an exploration rate, performing Q-Learning, and outputting rewards corresponding to the state set and the action set;
s9: and executing steps S1-S4 according to the current value state and the reward corresponding to the action of each displacement control parameter and each current control parameter, and controlling the position of the valve core.
2. The method for controlling the Q-Learning based proportional valve position of claim 1, wherein step S1 specifically comprises:
calculating the difference between the valve core set position x_ref and the currently acquired valve core position x_k:
e(x_k) = x_ref - x_k (1)
taking e(x_k) as the input of the position PID closed-loop controller, the output of the position PID closed-loop controller is as follows:
u(x_k) = k_px e(x_k) + k_ix Σ_{j=0}^{k} e(x_j) Δx_k + k_dx [e(x_k) - e(x_{k-1})]/Δx_k (2)
where k_px denotes the proportional gain of the position PID closed-loop controller, k_ix denotes the integral gain of the position PID closed-loop controller, k_dx denotes the differential gain of the position PID closed-loop controller, Δx_k denotes the sampling interval time of the position PID closed-loop controller, and e(x_{k-1}) denotes the difference between the valve core set position x_ref and the valve core position x_{k-1} acquired at the (k-1)-th sampling moment;
taking the sum of the output u(x_k) of the position PID closed-loop controller and a reference current c_st as the current reference input:
c_ref = u(x_k) + c_st (3).
3. the method for controlling the Q-Learning based proportional valve position of claim 2, wherein the step S2 specifically comprises:
calculating the difference between the current reference input c_ref and the currently collected electromagnetic valve input current c_k:
e(c_k) = c_ref - c_k (4)
taking e(c_k) as the input of the current PID closed-loop controller, the output of the current PID closed-loop controller is:
u(c_k) = k_pc e(c_k) + k_ic Σ_{j=0}^{k} e(c_j) Δc_k + k_dc [e(c_k) - e(c_{k-1})]/Δc_k (5)
where k_pc denotes the proportional gain of the current PID closed-loop controller, k_ic denotes the integral gain of the current PID closed-loop controller, k_dc denotes the differential gain of the current PID closed-loop controller, Δc_k denotes the sampling interval time of the current PID closed-loop controller, and e(c_{k-1}) denotes the difference between the current reference input c_ref and the electromagnetic valve input current c_{k-1} collected at the (k-1)-th sampling moment;
superposing u(c_k) with the reference PWM duty ratio and then loading the result to the electromagnetic valve to control the movement of the valve core.
4. The method for Q-Learning based proportional valve position control of claim 3, wherein in step S5, the same rule is used to set the values in the same storage interval:
n = [N(x_con - X_min)/(X_max - X_min)] (6)
where [x] = max{n ∈ Z | n ≤ x}, n represents a discrete spool displacement or current, x_con represents a continuous spool displacement or current, X_min and X_max are respectively the lower and upper limits of x_con, and N represents the number of intervals into which each spool displacement or each current is divided.
5. The Q-Learning based proportional valve position control method of claim 4, wherein in step S6, the epsilon-greedy criterion is defined as:
a = a* = argmax_a Q(s, a), if ξ > ε
a = a random action, if ξ ≤ ε (7)
where ξ ∈ [0, 1], ε represents the probability of taking a random action, a* = argmax_a Q(s, a) represents the action corresponding to the current maximum Q value, s represents the current state, and a represents the action currently taken.
6. The method for Q-Learning based proportional valve position control of claim 5, wherein in step S7, a Q-value set reward scheme associated with the displacement control parameter is formulated:
[equation (8): reward value R_tx defined in terms of e(x_{k+1}) and the displacement threshold x_lim]
where R_tx represents the reward value of the position PID closed-loop controller, e(x_{k+1}) denotes the difference between the valve core set position x_ref and the valve core position x_{k+1} acquired at the (k+1)-th sampling moment, and x_lim denotes a set displacement threshold;
formulating a set of Q values reward scheme associated with the current control parameter:
[equation (9): reward value R_tc defined in terms of e(c_{k+1}) and the current threshold c_lim]
where R_tc represents the reward value of the current PID closed-loop controller, e(c_{k+1}) represents the difference between the current reference input c_ref and the electromagnetic valve input current c_{k+1} collected at the (k+1)-th sampling moment, and c_lim denotes a set current threshold.
7. The method for controlling the Q-Learning based proportional valve position of claim 6, wherein the performing Q-Learning in step S8 specifically comprises:
s81: randomly initializing a reward corresponding to the state set and the action set, wherein the initial value of the termination state is 0;
s82: sequentially setting i = 1, 2, …, T, and executing steps S83 to S87 respectively; wherein T represents the number of iterations;
s83: initializing a current state s as a first state of a current state set;
s84: selecting an action in the current state by using an epsilon-greedy method criterion;
s85: executing the selected action in the current state to obtain a new state and a corresponding reward;
s86: updating the value function with the new state and its corresponding reward:
Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + α[r_k + γ max Q_k(s_{k+1}, a_{k+1}) - Q_k(s_k, a_k)] (10)
where s_k represents the state at the k-th sampling instant, a_k represents the action at the k-th sampling instant, Q_{k+1}(s_k, a_k) denotes the Q value at the (k+1)-th sampling instant after taking action a_k in state s_k and transferring to state s_{k+1}, Q_k(s_k, a_k) denotes the Q value of taking action a_k in state s_k at the k-th sampling instant, Q_k(s_{k+1}, a_{k+1}) represents the Q value after taking the action a_{k+1} of the next sampling instant at the k-th sampling instant, α represents the learning rate, α ∈ (0, 1], r represents the reward, k represents the sampling instant, and γ represents the attenuation factor;
s87: let s = s', where s' is the state obtained after taking the action, and judge whether s' is the termination state; if yes, the current iteration round is finished and the process returns to step S82; if not, the process returns to step S84.
CN202110720142.5A 2021-06-28 2021-06-28 Proportional valve position control method based on Q-Learning Active CN113467226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110720142.5A CN113467226B (en) 2021-06-28 2021-06-28 Proportional valve position control method based on Q-Learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110720142.5A CN113467226B (en) 2021-06-28 2021-06-28 Proportional valve position control method based on Q-Learning

Publications (2)

Publication Number Publication Date
CN113467226A true CN113467226A (en) 2021-10-01
CN113467226B CN113467226B (en) 2023-05-16

Family

ID=77873541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110720142.5A Active CN113467226B (en) 2021-06-28 2021-06-28 Proportional valve position control method based on Q-Learning

Country Status (1)

Country Link
CN (1) CN113467226B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114384795A (en) * 2021-12-21 2022-04-22 卓品智能科技无锡有限公司 Proportional solenoid valve current vibration control method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101025630A (en) * 2006-02-13 2007-08-29 Smc株式会社 Positioning control system and filter
CN103995463A (en) * 2014-05-30 2014-08-20 北京敬科海工科技有限公司 Method for performing position servo driving on electro-hydraulic proportional valve based on hybrid control
CN106246986A (en) * 2016-06-27 2016-12-21 南昌大学 Integrated form vibrating signal self adaptation proportional valve amplifier
CN109597331A (en) * 2018-11-28 2019-04-09 南京晨光集团有限责任公司 A kind of pilot-operated type double electromagnet proportional valve controller based on current automatic adaptation
US10900343B1 (en) * 2018-01-25 2021-01-26 National Technology & Engineering Solutions Of Sandia, Llc Control systems and methods to enable autonomous drilling
CN112305920A (en) * 2020-12-28 2021-02-02 南京理工大学 Reinforced learning platform for design of closed-loop jet rock suppression controller

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101025630A (en) * 2006-02-13 2007-08-29 Smc株式会社 Positioning control system and filter
CN103995463A (en) * 2014-05-30 2014-08-20 北京敬科海工科技有限公司 Method for performing position servo driving on electro-hydraulic proportional valve based on hybrid control
CN106246986A (en) * 2016-06-27 2016-12-21 南昌大学 Integrated form vibrating signal self adaptation proportional valve amplifier
US10900343B1 (en) * 2018-01-25 2021-01-26 National Technology & Engineering Solutions Of Sandia, Llc Control systems and methods to enable autonomous drilling
CN109597331A (en) * 2018-11-28 2019-04-09 南京晨光集团有限责任公司 A kind of pilot-operated type double electromagnet proportional valve controller based on current automatic adaptation
CN112305920A (en) * 2020-12-28 2021-02-02 南京理工大学 Reinforced learning platform for design of closed-loop jet rock suppression controller

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张活俊; 江励; 汤健华; 黄辉: "Research on active force control of a polishing robot based on reinforcement learning" (基于强化学习的抛光机器人主动力控制研究), 机械工程师 (Mechanical Engineer) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114384795A (en) * 2021-12-21 2022-04-22 卓品智能科技无锡有限公司 Proportional solenoid valve current vibration control method
CN114384795B (en) * 2021-12-21 2022-10-25 卓品智能科技无锡股份有限公司 Proportional solenoid valve current vibration control method

Also Published As

Publication number Publication date
CN113467226B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN108008627B (en) Parallel optimization reinforcement learning self-adaptive PID control method
CN107544255B (en) State compensation model control method for batch injection molding process
CN116610025B (en) PID controller optimization method based on improved meta heuristic algorithm
CN113467226A (en) Proportional valve position control method based on Q-Learning
CN112571420B (en) Dual-function model prediction control method under unknown parameters
CN112417000B (en) Time sequence missing value filling method based on bidirectional cyclic codec neural network
CN108845501A (en) A kind of blast-melted quality adaptation optimal control method based on Lazy learning
CN110703718A (en) Industrial process control method based on signal compensation
CN109581863A (en) A kind of intelligence complex fertilizer control system liquid manure consistency controller
CN115167102A (en) Reinforced learning self-adaptive PID control method based on parallel dominant motion evaluation
CN113625547B (en) Main valve position control method of controller
CN106094524A (en) The rapid model prediction control method compensated based on input trend
CN116880191A (en) Intelligent control method of process industrial production system based on time sequence prediction
CN108696199B (en) Control method for improving position precision of permanent magnet synchronous linear motor
CN110597055A (en) Uncertainty-resistant 2D piecewise affine intermittent process minimum-maximum optimization prediction control method
Liu et al. Data learning‐based model‐free adaptive control and application to an NAO robot
CN116127681A (en) Method for driving self-evolution of digital twin of building by hybrid algorithm
Boubaker et al. Variable structure estimation and control of nonlinear distributed parameter bioreactors
Rizvi et al. Output Feedback Reinforcement Learning Control for Linear Systems
CN109491245A (en) A kind of disturbance compensation control method of CSTR system
CN110705700A (en) Drift prediction method of soil temperature sensor
CN114721268B (en) Heuristic iterative learning control method for pressure robustness of injection molding nozzle
CN117930775A (en) Intelligent control method for natural gas unmanned station yard and related components
CN113568309B (en) On-line space-time control method for temperature field
CN115877811B (en) Flow process treatment method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant