CN113467226A - Proportional valve position control method based on Q-Learning - Google Patents
- Publication number
- CN113467226A CN113467226A CN202110720142.5A CN202110720142A CN113467226A CN 113467226 A CN113467226 A CN 113467226A CN 202110720142 A CN202110720142 A CN 202110720142A CN 113467226 A CN113467226 A CN 113467226A
- Authority
- CN
- China
- Prior art keywords
- current
- loop controller
- state
- pid closed
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B11/00—Automatic controllers
- G05B11/01—Automatic controllers electric
- G05B11/36—Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential
- G05B11/42—Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential for obtaining a characteristic which is both proportional and time-dependent, e.g. P. I., P. I. D.
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Magnetically Actuated Valves (AREA)
- Feedback Control In General (AREA)
Abstract
The invention discloses a proportional valve position control method based on Q-Learning. Q-Learning adaptively adjusts the displacement control parameters of a position PID closed-loop controller and the current control parameters of a current PID closed-loop controller, so that both sets of parameters adapt as the sampling time changes, and accurate control of the valve core position is realized through the position PID closed loop and the current PID closed loop. Aiming at high-precision position control of a high-frequency-response proportional valve, the invention takes the various nonlinear factors in a proportional solenoid valve into account and provides a control method that can effectively improve the bandwidth and precision of the proportional valve, offering a reference for high-frequency-response proportional valve design. The control method is fast and stable, can follow the valve core displacement synchronously at both low frequency and high frequency, and solves the problem that fixed PID parameters cannot suit high-frequency and low-frequency signals at the same time.
Description
Technical Field
The invention relates to the technical field of proportional valve control, in particular to a proportional valve position control method based on Q-Learning.
Background
The electro-hydraulic proportional valve is a hydraulic element commonly used in the field of industrial control. With the continuous development of engineering technology, the requirements on the position control precision and the frequency response of the proportional valve are higher and higher. Therefore, the research of a high-precision and high-frequency-response proportional valve position control method is extremely important for the development of the electro-hydraulic control technology. At present, a series of researches are carried out at home and abroad aiming at the position control of a high-frequency response servo proportional valve.
In existing research on proportional valve position control, a compensation value is determined by acquiring the valve core position and judging whether the valve core has reached the expected position or whether the system oscillates; the control signal is then compensated and corrected according to the size of the dead-zone compensation value, and the corrected signal is output to the electro-hydraulic proportional valve. Compared with a traditional valve, this method improves the control precision, but the position control precision of the hydraulic valve under high-frequency-response conditions is still difficult to guarantee.
In addition, proportional valves are subject to nonlinear factors. The current-force characteristic of the proportional electromagnet becomes nonlinear under high-frequency-response working conditions: first, the proportional electromagnet wears with use after leaving the factory; second, hysteresis causes nonlinearity under high-frequency-response conditions. In the valve body, the movement of the valve core changes the friction coefficient between the valve core and the valve sleeve, so the friction force varies continuously, and the hydrodynamic force also exerts a nonlinear influence on the valve core as it moves. The control circuit itself introduces further nonlinear factors, such as the nonlinear relationship between the given PWM duty cycle and frequency and the circuit output current.
Disclosure of Invention
In view of the above, the present invention provides a proportional valve position control method based on Q-Learning, which is used to consider the nonlinearity of the system and solve the problem that the PID parameter cannot adapt to both high frequency and low frequency given signals.
The invention provides a proportional valve position control method based on Q-Learning, which comprises the following steps:
s1: taking the difference value between the set position of the valve core and the currently acquired position of the valve core as the input of a position PID closed-loop controller, and taking the sum of the output of the position PID closed-loop controller and the reference current as the current reference input;
s2: taking the difference value between the current reference input and the currently acquired electromagnetic valve input current as the input of the current PID closed-loop controller, superposing the output of the current PID closed-loop controller and a reference PWM duty ratio, and loading the superposed output to the electromagnetic valve to control the movement of a valve core;
s3: according to the relation between the input of the current PID closed-loop controller and the PWM duty ratio, obtaining the PWM duty ratio corresponding to the input of the current PID closed-loop controller, and taking the PWM duty ratio as a reference value of the PWM generator duty ratio;
s4: setting initial values of three displacement control parameters of the position PID closed-loop controller and three current control parameters of the current PID closed-loop controller, collecting a valve core position as a state set, and collecting output current of a PWM generator;
s5: dividing the value range of each displacement control parameter and each current control parameter into a plurality of storage intervals, wherein the values in the same storage interval are in the same state, and the values in different storage intervals are in different states, so as to obtain six Q value sets;
s6: giving an epsilon-greedy method criterion and the current value taking state of each displacement control parameter and each current control parameter to generate an action set with six Q value sets;
s7: formulating a Q value set reward scheme related to the displacement control parameter and a Q value set reward scheme related to the current control parameter;
s8: inputting iteration rounds, a state set, an action set, a step length, an attenuation factor and an exploration rate, performing Q-Learning, and outputting rewards corresponding to the state set and the action set;
s9: and executing steps S1-S4 according to the current value state and the reward corresponding to the action of each displacement control parameter and each current control parameter, and controlling the position of the valve core.
In a possible implementation manner, in the above method for controlling a proportional valve position based on Q-Learning provided by the present invention, step S1 specifically includes:
calculating the difference between the valve core set position x_ref and the currently acquired valve core position x_k:

e(x_k) = x_ref − x_k (1)

taking e(x_k) as the input of the position PID closed-loop controller, the output of the position PID closed-loop controller is:

u(x_k) = k_px·e(x_k) + k_ix·Δx_k·Σ_{j=0}^{k} e(x_j) + k_dx·[e(x_k) − e(x_{k−1})]/Δx_k (2)

wherein k_px represents the proportional gain of the position PID closed-loop controller, k_ix represents the integral gain of the position PID closed-loop controller, k_dx represents the differential gain of the position PID closed-loop controller, Δx_k represents the sampling interval time of the position PID closed-loop controller, and e(x_{k−1}) represents the difference between the valve core set position x_ref and the valve core position x_{k−1} acquired at the (k−1)-th sampling moment;

the output u(x_k) of the position PID closed-loop controller and a reference current c_st are summed to form the current reference input:

c_ref = u(x_k) + c_st (3).
in a possible implementation manner, in the above method for controlling a proportional valve position based on Q-Learning provided by the present invention, step S2 specifically includes:
calculating the difference between the current reference input c_ref and the currently acquired electromagnetic valve input current c_k:

e(c_k) = c_ref − c_k (4)

taking e(c_k) as the input of the current PID closed-loop controller, the output of the current PID closed-loop controller is:

u(c_k) = k_pc·e(c_k) + k_ic·Δc_k·Σ_{j=0}^{k} e(c_j) + k_dc·[e(c_k) − e(c_{k−1})]/Δc_k (5)

wherein k_pc represents the proportional gain of the current PID closed-loop controller, k_ic represents the integral gain of the current PID closed-loop controller, k_dc represents the differential gain of the current PID closed-loop controller, Δc_k represents the sampling interval time of the current PID closed-loop controller, and e(c_{k−1}) represents the difference between the current reference input c_ref and the electromagnetic valve input current c_{k−1} collected at the (k−1)-th sampling moment;

the output u(c_k) is superposed with the reference PWM duty ratio and then loaded to the electromagnetic valve to control the movement of the valve core.
In a possible implementation manner, in the proportional valve position control method based on Q-Learning provided by the present invention, in step S5, the values in the same storage interval are determined by the same rule:

n = [(x_con − X_min)/(X_max − X_min)·N] (6)

wherein [x] = max{n ∈ Z | n ≤ x} denotes rounding down; n represents a discrete spool displacement or current; x_con represents a continuous spool displacement or current; X_min and X_max are respectively the lower and upper limits of x_con; and N represents the number of intervals into which each spool displacement or each current range is divided.
In a possible implementation manner, in the proportional valve position control method based on Q-Learning provided by the present invention, in step S6, the ε-greedy criterion is defined as:

a = { a randomly selected action, if ξ < ε ; argmax_a Q(s, a), if ξ ≥ ε } (7)

wherein ξ ∈ [0, 1] is a random number, ε represents the probability of taking a random action, argmax_a Q(s, a) represents the action corresponding to the current maximum Q value, s represents the current state, and a represents the action currently taken.
In a possible implementation manner, in the above proportional valve position control method based on Q-Learning provided by the present invention, in step S7, a Q value set reward scheme related to the displacement control parameter is formulated:
wherein R_tx represents the reward value of the position PID closed-loop controller, e(x_{k+1}) represents the difference between the valve core set position x_ref and the valve core position x_{k+1} acquired at the (k+1)-th sampling moment, and x_lim represents the set displacement threshold;
formulating a set of Q values reward scheme associated with the current control parameter:
wherein R_tc represents the reward value of the current PID closed-loop controller, e(c_{k+1}) represents the difference between the current reference input c_ref and the electromagnetic valve input current c_{k+1} collected at the (k+1)-th sampling moment, and c_lim represents the set current threshold.
In a possible implementation manner, in the proportional valve position control method based on Q-Learning provided by the present invention, in step S8, the performing Q-Learning specifically includes:
s81: randomly initializing a reward corresponding to the state set and the action set, wherein the initial value of the termination state is 0;
s82: sequentially setting i to 1,2, …, and T, and executing steps S83 to S87, respectively; wherein T represents the number of iterations;
s83: initializing a current state s as a first state of a current state set;
s84: selecting an action in the current state by using an epsilon-greedy method criterion;
s85: executing the selected action in the current state to obtain a new state and a corresponding reward;
s86: updating the cost function with the new state and its corresponding reward:
Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + α[r_k + γ·max Q_k(s_{k+1}, a_{k+1}) − Q_k(s_k, a_k)] (10)

wherein s_k represents the state at the k-th sampling moment; a_k represents the action at the k-th sampling moment; Q_{k+1}(s_k, a_k) denotes the Q value at the (k+1)-th sampling moment after taking action a_k in state s_k and transferring to state s_{k+1}; Q_k(s_k, a_k) denotes the Q value of taking action a_k in state s_k at the k-th sampling moment; Q_k(s_{k+1}, a_{k+1}) represents the Q value after taking the action a_{k+1} of the next sampling moment at the k-th sampling moment; α represents the learning rate, α ∈ (0, 1]; r represents the reward; k represents the sampling moment; and γ represents the attenuation factor;
s87: let s = s′, where s′ is the state reached after taking the action, and judge whether s′ is the termination state; if yes, the current round of iteration is finished and the process returns to step S82; if not, the process returns to step S84.
According to the proportional valve position control method based on Q-Learning, provided by the invention, the displacement control parameter of the position PID closed-loop controller and the current control parameter of the current PID closed-loop controller are respectively subjected to self-adaptive adjustment through Q-Learning, so that the displacement control parameter of the position PID closed-loop controller and the current control parameter of the current PID closed-loop controller are adapted along with the change of sampling time, and the accurate control of the position of the valve core is realized through position PID closed-loop control and current PID closed-loop control. The invention provides a control method capable of effectively improving the bandwidth and the precision of a proportional valve by considering various nonlinear factors in a proportional solenoid valve aiming at the high-precision position control of the high-frequency response proportional valve, has the advantages of rapidity and stability, and can provide some references for the high-frequency response of the proportional valve. Six control parameters in the traditional control method are fixed values and can only follow the displacement of the valve core at low frequency or the displacement of the valve core at high frequency and cannot synchronously follow the displacements of the valve core at low frequency and high frequency.
Drawings
FIG. 1 is a block flow diagram of a method for controlling a proportional valve position based on Q-Learning according to embodiment 1 of the present invention;
fig. 2 is a flow chart of the Q-Learning algorithm in embodiment 1 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only illustrative and are not intended to limit the present invention.
The invention provides a proportional valve position control method based on Q-Learning, which comprises the following steps:
s1: taking the difference value between the set position of the valve core and the currently acquired position of the valve core as the input of a position PID closed-loop controller, and taking the sum of the output of the position PID closed-loop controller and the reference current as the current reference input;
s2: taking the difference value of the current reference input and the currently acquired electromagnetic valve input current as the input of a current PID closed-loop controller, superposing the output of the current PID closed-loop controller and a reference PWM duty ratio, and loading the superposed output to the electromagnetic valve to control the movement of a valve core;
s3: according to the relation between the input of the current PID closed-loop controller and the PWM duty ratio, obtaining the PWM duty ratio corresponding to the input of the current PID closed-loop controller, and taking the PWM duty ratio as a reference value of the PWM generator duty ratio;
s4: setting initial values of three displacement control parameters of a position PID closed-loop controller and three current control parameters of a current PID closed-loop controller, collecting a valve core position as a state set, and collecting output current of a PWM generator;
s5: dividing the value range of each displacement control parameter and each current control parameter into a plurality of storage intervals, wherein the values in the same storage interval are in the same state, and the values in different storage intervals are in different states, so as to obtain six Q value sets;
s6: giving an epsilon-greedy method criterion and the current value taking state of each displacement control parameter and each current control parameter to generate an action set with six Q value sets;
s7: formulating a Q value set reward scheme related to the displacement control parameter and a Q value set reward scheme related to the current control parameter;
s8: inputting iteration rounds, a state set, an action set, a step length, an attenuation factor and an exploration rate, performing Q-Learning, and outputting rewards corresponding to the state set and the action set;
s9: and executing steps S1-S4 according to the current value state and the reward corresponding to the action of each displacement control parameter and each current control parameter, and controlling the position of the valve core.
The specific implementation of the above-mentioned proportional valve position control method based on Q-Learning according to the present invention is described in detail with reference to a specific embodiment.
Example 1: the Q-Learning based proportional valve position control method includes position PID control of the outer loop and current PID control of the inner loop as shown in fig. 1.
The valve core set position x_ref of the given proportional valve is taken as the input, and the valve core position is controlled by taking as control quantities the deviation between the currently acquired valve core position and the valve core set position, and the deviation between the calculated current value and the current value output by the PWM generator. The valve core position and the output current of the PWM generator are discretized and used as states for data acquisition, and Q-Learning is performed to obtain six Q value sets, each Q value set corresponding to one system gain of the two PID controllers, namely the displacement control parameters of the position PID closed-loop controller and the current control parameters of the current PID closed-loop controller. Given the current state, each Q value set generates the best value of its corresponding gain.
The method comprises the following specific steps:
firstly, the difference value between the valve core set position and the currently acquired valve core position is used as the input of a position PID closed-loop controller, and the sum of the output of the position PID closed-loop controller and a reference current is used as a current reference input.
Taking the valve core set position x_ref as the reference, the difference between x_ref and the currently acquired valve core position x_k is calculated:

e(x_k) = x_ref − x_k (1)

Taking e(x_k) as the input of the position PID closed-loop controller, the output of the position PID closed-loop controller is:

u(x_k) = k_px·e(x_k) + k_ix·Δx_k·Σ_{j=0}^{k} e(x_j) + k_dx·[e(x_k) − e(x_{k−1})]/Δx_k (2)

wherein k_px represents the proportional gain of the position PID closed-loop controller, k_ix represents the integral gain of the position PID closed-loop controller, k_dx represents the differential gain of the position PID closed-loop controller, Δx_k represents the sampling interval time of the position PID closed-loop controller, and e(x_{k−1}) represents the difference between the valve core set position x_ref and the valve core position x_{k−1} acquired at the (k−1)-th sampling moment;

The output u(x_k) of the position PID closed-loop controller and the reference current c_st are summed to form the current reference input:

c_ref = u(x_k) + c_st (3).
and secondly, taking the difference value between the current reference input and the currently acquired input current of the electromagnetic valve as the input of a current PID closed-loop controller, superposing the output of the current PID closed-loop controller and a reference PWM duty ratio, and loading the superposed output to the electromagnetic valve to control the movement of a valve core.
The difference between the current reference input c_ref and the currently acquired electromagnetic valve input current c_k is calculated:

e(c_k) = c_ref − c_k (4)

Taking e(c_k) as the input of the current PID closed-loop controller, the output of the current PID closed-loop controller is:

u(c_k) = k_pc·e(c_k) + k_ic·Δc_k·Σ_{j=0}^{k} e(c_j) + k_dc·[e(c_k) − e(c_{k−1})]/Δc_k (5)

wherein k_pc represents the proportional gain of the current PID closed-loop controller, k_ic represents the integral gain of the current PID closed-loop controller, k_dc represents the differential gain of the current PID closed-loop controller, Δc_k represents the sampling interval time of the current PID closed-loop controller, and e(c_{k−1}) represents the difference between the current reference input c_ref and the electromagnetic valve input current c_{k−1} collected at the (k−1)-th sampling moment;

The output u(c_k) is superposed with the reference PWM duty ratio and then loaded to the electromagnetic valve to control the movement of the valve core.
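The two coupled loops of equations (1)–(5) can be sketched in code. The following is an illustrative sketch only, not the patent's implementation; the helper names, the tuple-based controller state, and all numeric values are assumptions introduced here:

```python
def pid_step(state, e, kp, ki, kd, dt):
    """One discrete PID update; `state` carries the error integral and the previous error."""
    integral, e_prev = state
    integral += e * dt                                  # integral term accumulation
    u = kp * e + ki * integral + kd * (e - e_prev) / dt  # discrete PID law, cf. eqs. (2)/(5)
    return u, (integral, e)

def control_step(x_ref, x_k, c_k, pos_state, cur_state, gains, dt, c_st, duty_ref):
    """One sampling period of the cascaded position/current control of eqs. (1)-(5)."""
    kpx, kix, kdx, kpc, kic, kdc = gains
    # Outer loop: position error -> current reference (eqs. 1-3)
    e_x = x_ref - x_k
    u_x, pos_state = pid_step(pos_state, e_x, kpx, kix, kdx, dt)
    c_ref = u_x + c_st
    # Inner loop: current error -> PWM duty correction (eqs. 4-5)
    e_c = c_ref - c_k
    u_c, cur_state = pid_step(cur_state, e_c, kpc, kic, kdc, dt)
    duty = duty_ref + u_c   # superposed on the reference PWM duty ratio
    return duty, pos_state, cur_state
```

In use, `control_step` would run once per sampling interval, with x_k read from the displacement sensor and c_k from the current sensor, and the returned duty ratio loaded onto the PWM generator.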
And thirdly, according to the relation between the input of the current PID closed-loop controller and the PWM duty ratio, obtaining the PWM duty ratio corresponding to the input of the current PID closed-loop controller, and taking the PWM duty ratio as the reference value of the PWM generator duty ratio.
And fourthly, setting initial values of displacement control parameters of the position PID closed-loop controller and current control parameters of the current PID closed-loop controller, collecting the position of the valve core as a state set, and collecting output current of the PWM generator.
And fifthly, dividing the value range of each displacement control parameter and each current control parameter into a plurality of storage intervals, wherein the values in the same storage interval are in the same state, and the values in different storage intervals are in different states, so as to obtain 6Q value sets.
The Q value sets are obtained by the Q-Learning algorithm. There are six Q value sets in total: three are related to the displacement control parameters k_px, k_ix and k_dx, and the other three are related to the current control parameters k_pc, k_ic and k_dc.
Specifically, the values in the same storage interval are determined by the same rule:

n = [(x_con − X_min)/(X_max − X_min)·N] (6)

wherein [x] = max{n ∈ Z | n ≤ x} denotes rounding down; n represents a discrete spool displacement or current; x_con represents a continuous spool displacement or current; X_min and X_max are respectively the lower and upper limits of x_con; and N represents the number of intervals into which each spool displacement or each current range is divided.
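Under a uniform-interval reading of equation (6) (the floor of the normalised value scaled by N), the state-discretisation rule can be sketched as follows; the clamping of out-of-range samples into [X_min, X_max] is an assumption added here:

```python
import math

def discretize(x_con, x_min, x_max, n_intervals):
    """Map a continuous spool displacement or current onto one of N storage
    intervals; values landing in the same interval share the same state."""
    x = min(max(x_con, x_min), x_max)          # clamp to the admissible range
    n = math.floor((x - x_min) / (x_max - x_min) * n_intervals)
    return min(n, n_intervals - 1)             # top edge joins the last interval
```

Two samples that fall into the same interval then index the same row of a Q value set.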
And sixthly, giving an epsilon-greedy method criterion and the current value taking state of each displacement control parameter and each current control parameter, and generating 6 action sets of Q value sets.
Specifically, the ε-greedy criterion is defined as:

a = { a randomly selected action, if ξ < ε ; argmax_a Q(s, a), if ξ ≥ ε } (7)

wherein ξ ∈ [0, 1] is a random number, ε represents the probability of taking a random action, argmax_a Q(s, a) represents the action corresponding to the current maximum Q value, s represents the current state, and a represents the action currently taken.
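A minimal sketch of the ε-greedy criterion over one row of a Q value set; representing the row as a Python list whose indices stand for actions is an assumption of this sketch, not the patent's data layout:

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """epsilon-greedy selection: draw xi uniformly from [0, 1); if xi < epsilon
    take a random action (exploration), otherwise take the action with the
    current maximum Q value (exploitation)."""
    xi = rng.random()
    if xi < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])
```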
And seventhly, establishing a Q value set reward scheme related to the displacement control parameter and a Q value set reward scheme related to the current control parameter.
Specifically, a Q value set reward scheme related to the displacement control parameters is established:
wherein R_tx represents the reward value of the position PID closed-loop controller, e(x_{k+1}) represents the difference between the valve core set position x_ref and the valve core position x_{k+1} acquired at the (k+1)-th sampling moment, and x_lim represents the set displacement threshold;
establishing a Q value set reward scheme related to the current control parameter:
wherein R_tc represents the reward value of the current PID closed-loop controller, e(c_{k+1}) represents the difference between the current reference input c_ref and the electromagnetic valve input current c_{k+1} collected at the (k+1)-th sampling moment, and c_lim represents the set current threshold.
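Both reward schemes compare the next-step error against a set threshold (x_lim for displacement, c_lim for current). A threshold-style sketch is below; the reward magnitudes r_in and r_out are placeholders, since the text here does not reproduce the numeric reward values:

```python
def threshold_reward(e_next, limit, r_in=1.0, r_out=-1.0):
    """Reward used for both Q value set reward schemes in this sketch: a
    positive reward when the next-step error stays within the set threshold,
    a negative reward otherwise. r_in / r_out are assumed values."""
    return r_in if abs(e_next) <= limit else r_out
```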
And step eight, performing Q-Learning.
The Q-Learning algorithm is a model-free reinforcement learning method solved by temporal-difference updates, which enables an agent to select an optimal action sequence in a Markov decision process through interaction with the external environment. The value function of the Q-Learning algorithm is defined as:
Q(s_k, a_k) = R(s, a) + γ·max_a Q(s_{k+1}, a) (10)

wherein Q(s_k, a_k) denotes the Q value of taking action a_k in state s_k; max_a Q(s_{k+1}, a) represents the maximum Q value over the actions available in the next state; γ represents the attenuation factor; and R(s, a) represents the value returned by the reward scheme;
in the Q-Learning algorithm, the agent receives an input state s_k from the external environment and outputs a corresponding action a_k through its internal mechanism. Under the action of a_k, the external environment transfers to a new state s_{k+1}. At the same time, the algorithm generates an instant reward signal R_{k+1} for the agent. The instant reward signal R_{k+1} is an evaluation of the agent action a_k taken in the external environment state s_k.
In the Q-Learning algorithm, the Q-value optimization iterative formula is as follows:
Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + α[r_k + γ·max Q_k(s_{k+1}, a_{k+1}) − Q_k(s_k, a_k)] (11)

wherein s_k represents the state at the k-th sampling moment; a_k represents the action at the k-th sampling moment; Q_{k+1}(s_k, a_k) denotes the Q value at the (k+1)-th sampling moment after taking action a_k in state s_k and transferring to state s_{k+1}; Q_k(s_k, a_k) denotes the Q value of taking action a_k in state s_k at the k-th sampling moment; Q_k(s_{k+1}, a_{k+1}) represents the Q value after taking the action a_{k+1} of the next sampling moment at the k-th sampling moment; α represents the learning rate, α ∈ (0, 1]; r represents the reward; k represents the sampling moment; and γ represents the attenuation factor. The term max Q_k(s_{k+1}, a_{k+1}) corresponds to taking the optimal action at the (k+1)-th sampling moment in the optimizing process.
The Q-Learning algorithm is embodied as follows, as shown in FIG. 2:
(1) inputting iteration rounds, a state set, an action set, a step length, an attenuation factor and an exploration rate;
(2) randomly initializing a reward corresponding to the state set and the action set, wherein the initial value of the termination state is 0;
(3) sequentially setting i to 1,2, … and T, and executing steps (4) to (8) respectively; wherein T represents the number of iterations;
(4) initializing a current state s as a first state of a current state set;
(5) selecting an action in the current state by using an epsilon-greedy method criterion;
(6) executing the selected action in the current state to obtain a new state and a corresponding reward;
(7) updating the cost function according to the formula (11) by using the new state and the corresponding reward;
(8) let s = s′, where s′ is the state reached after taking the action, and judge whether s′ is the termination state; if yes, the current round of iteration is finished and the process returns to step (3); if not, the process returns to step (5);
(9) and outputting the rewards corresponding to the state set and the action set.
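Steps (1)–(9) above can be sketched as a tabular Q-Learning loop. The environment callbacks `step(s, a)` (returning the new state and its reward) and `is_terminal(s)` are illustrative stand-ins for the valve-control interaction, not functions defined by the patent:

```python
import random

def q_learning(n_episodes, n_states, n_actions, step, alpha, gamma, epsilon,
               is_terminal):
    """Tabular Q-Learning following steps (1)-(9) of the algorithm above."""
    # step (2): random initialisation; termination states start at 0
    Q = [[random.random() for _ in range(n_actions)] for _ in range(n_states)]
    for s in range(n_states):
        if is_terminal(s):
            Q[s] = [0.0] * n_actions
    for _ in range(n_episodes):                 # step (3): i = 1, 2, ..., T
        s = 0                                   # step (4): first state of the set
        while not is_terminal(s):
            # step (5): epsilon-greedy selection in the current state
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s_next, r = step(s, a)              # step (6): new state and reward
            # step (7): value-function update of equation (11)
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next                          # step (8): advance or terminate
    return Q                                    # step (9): learned Q values
```

Run against the discretised valve states, the returned table plays the role of the six Q value sets from which the PID gains are selected.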
And ninthly, executing the first step to the fourth step according to the current value state and the reward corresponding to the action of each displacement control parameter and each current control parameter, and controlling the position of the valve core.
According to the proportional valve position control method based on Q-Learning, provided by the invention, the displacement control parameter of the position PID closed-loop controller and the current control parameter of the current PID closed-loop controller are respectively subjected to self-adaptive adjustment through Q-Learning, so that the displacement control parameter of the position PID closed-loop controller and the current control parameter of the current PID closed-loop controller are adapted along with the change of sampling time, and the accurate control of the position of the valve core is realized through position PID closed-loop control and current PID closed-loop control. The invention provides a control method capable of effectively improving the bandwidth and the precision of a proportional valve by considering various nonlinear factors in a proportional solenoid valve aiming at the high-precision position control of the high-frequency response proportional valve, has the advantages of rapidity and stability, and can provide some references for the high-frequency response of the proportional valve. Six control parameters in the traditional control method are fixed values and can only follow the displacement of the valve core at low frequency or the displacement of the valve core at high frequency and cannot synchronously follow the displacements of the valve core at low frequency and high frequency.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (7)
1. A method for controlling a proportional valve position based on Q-Learning, comprising the steps of:
s1: taking the difference value between the set position of the valve core and the currently acquired position of the valve core as the input of a position PID closed-loop controller, and taking the sum of the output of the position PID closed-loop controller and the reference current as the current reference input;
s2: taking the difference value between the current reference input and the currently acquired electromagnetic valve input current as the input of the current PID closed-loop controller, superposing the output of the current PID closed-loop controller and a reference PWM duty ratio, and loading the superposed output to the electromagnetic valve to control the movement of a valve core;
s3: according to the relation between the input of the current PID closed-loop controller and the PWM duty ratio, obtaining the PWM duty ratio corresponding to the input of the current PID closed-loop controller, and taking the PWM duty ratio as a reference value of the PWM generator duty ratio;
s4: setting initial values of three displacement control parameters of the position PID closed-loop controller and three current control parameters of the current PID closed-loop controller, collecting a valve core position as a state set, and collecting output current of a PWM generator;
s5: dividing the value range of each displacement control parameter and each current control parameter into a plurality of storage intervals, wherein the values in the same storage interval are in the same state, and the values in different storage intervals are in different states, so as to obtain six Q value sets;
s6: giving an epsilon-greedy method criterion and the current value taking state of each displacement control parameter and each current control parameter to generate an action set with six Q value sets;
s7: formulating a Q value set reward scheme related to the displacement control parameter and a Q value set reward scheme related to the current control parameter;
s8: inputting iteration rounds, a state set, an action set, a step length, an attenuation factor and an exploration rate, performing Q-Learning, and outputting rewards corresponding to the state set and the action set;
s9: and executing steps S1-S4 according to the current value state and the reward corresponding to the action of each displacement control parameter and each current control parameter, and controlling the position of the valve core.
2. The method for controlling the Q-Learning based proportional valve position of claim 1, wherein step S1 specifically comprises:
calculating the difference between the valve core set position x_ref and the currently acquired valve core position x_k:
e(x_k) = x_ref - x_k  (1)
taking e(x_k) as the input of the position PID closed-loop controller, the output of the position PID closed-loop controller is:
u(x_k) = k_px * e(x_k) + k_ix * Delta_x_k * sum_{j=1}^{k} e(x_j) + k_dx * [e(x_k) - e(x_{k-1})] / Delta_x_k  (2)
wherein k_px represents the proportional gain of the position PID closed-loop controller, k_ix represents the integral gain of the position PID closed-loop controller, k_dx represents the differential gain of the position PID closed-loop controller, Delta_x_k represents the sampling interval time of the position PID closed-loop controller, and e(x_{k-1}) represents the difference between the valve core set position x_ref and the valve core position x_{k-1} acquired at the (k-1)-th sampling moment;
taking the sum of the output u(x_k) of the position PID closed-loop controller and the reference current c_st as the current reference input:
c_ref = u(x_k) + c_st  (3).
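Equations (1)-(3) describe a standard discrete PID position loop whose output, summed with the reference current, forms the current reference input. The sketch below is a minimal Python rendering; since formula (2) is only available as an image in the source, the rectangular-rule integral and backward-difference derivative are assumptions:

```python
class PositionPID:
    """Outer position loop of the cascade, per equations (1)-(3).
    The integral/derivative discretization is assumed, since the
    original formula (2) is only available as an image."""

    def __init__(self, kpx, kix, kdx, dt):
        self.kpx, self.kix, self.kdx = kpx, kix, kdx
        self.dt = dt            # sampling interval (Delta_x_k)
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, x_ref, x_k, c_st):
        e = x_ref - x_k                                   # equation (1)
        self.integral += e * self.dt                      # assumed integral term
        u = (self.kpx * e
             + self.kix * self.integral
             + self.kdx * (e - self.prev_err) / self.dt)  # assumed derivative term
        self.prev_err = e
        return u + c_st                                   # equation (3)
```

For example, `c_ref = PositionPID(2.0, 0.1, 0.01, 0.001).step(x_ref, x_k, c_st)` would yield the current reference input fed to the inner loop (the gain values are illustrative).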
3. the method for controlling the Q-Learning based proportional valve position of claim 2, wherein the step S2 specifically comprises:
calculating the difference between the current reference input c_ref and the currently acquired electromagnetic valve input current c_k:
e(c_k) = c_ref - c_k  (4)
taking e(c_k) as the input of the current PID closed-loop controller, the output of the current PID closed-loop controller is:
u(c_k) = k_pc * e(c_k) + k_ic * Delta_c_k * sum_{j=1}^{k} e(c_j) + k_dc * [e(c_k) - e(c_{k-1})] / Delta_c_k  (5)
wherein k_pc represents the proportional gain of the current PID closed-loop controller, k_ic represents the integral gain of the current PID closed-loop controller, k_dc represents the differential gain of the current PID closed-loop controller, Delta_c_k represents the sampling interval time of the current PID closed-loop controller, and e(c_{k-1}) represents the difference between the current reference input c_ref and the electromagnetic valve input current c_{k-1} acquired at the (k-1)-th sampling moment;
superposing u(c_k) on the reference PWM duty ratio and then loading the result onto the electromagnetic valve to control the movement of the valve core.
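The inner current loop of equations (4)-(5) mirrors the position loop and superposes its output on the reference PWM duty ratio. A minimal sketch under the same assumed discretization (formula (5) is an image in the source); the [0, 1] duty-cycle clamp is also an added assumption, not from the patent:

```python
class CurrentPID:
    """Inner current loop of the cascade, per equations (4)-(5); the
    discretization mirrors the position loop and is assumed, since
    formula (5) is only available as an image. The [0, 1] clamp on
    the duty cycle is an extra safety assumption."""

    def __init__(self, kpc, kic, kdc, dt):
        self.kpc, self.kic, self.kdc = kpc, kic, kdc
        self.dt = dt            # sampling interval (Delta_c_k)
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, c_ref, c_k, duty_ref):
        e = c_ref - c_k                                   # equation (4)
        self.integral += e * self.dt
        u = (self.kpc * e
             + self.kic * self.integral
             + self.kdc * (e - self.prev_err) / self.dt)
        self.prev_err = e
        # superpose u(c_k) on the reference PWM duty ratio
        return min(1.0, max(0.0, duty_ref + u))
```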
4. The method for Q-Learning based proportional valve position control of claim 3, wherein in step S5, the values in the same storage interval are determined according to the following rule:
n = [N * (x_con - X_min) / (X_max - X_min)]  (6)
wherein [x] = max{n ∈ Z | n ≤ x}, and n represents a discrete spool displacement or current; x_con represents a continuous spool displacement or current; X_min and X_max are respectively the lower and upper limits of x_con; N represents the number of intervals into which each spool displacement or each current is divided.
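The storage-interval rule of claim 4 is a floor-based binning of a continuous displacement or current. A sketch, assuming the common bin-index form n = [N(x_con - X_min)/(X_max - X_min)] with the floor operation [x] = max{n ∈ Z | n ≤ x} (the exact formula is an image in the source, and the edge clamping is an added assumption):

```python
import math

def discretize(x_con, x_min, x_max, n_bins):
    """Map a continuous spool displacement/current to one of `n_bins`
    storage intervals; assumed form n = floor(N*(x - Xmin)/(Xmax - Xmin)),
    consistent with the definitions in claim 4. Values at or beyond the
    range limits are clamped to the outermost bins (assumption)."""
    frac = (x_con - x_min) / (x_max - x_min)
    return min(n_bins - 1, max(0, math.floor(n_bins * frac)))

assert discretize(0.0, 0.0, 1.0, 10) == 0
assert discretize(0.55, 0.0, 1.0, 10) == 5
assert discretize(1.0, 0.0, 1.0, 10) == 9   # upper edge clamped into last bin
```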
5. The Q-Learning based proportional valve position control method of claim 4, wherein in step S6, the ε-greedy method criterion is defined as:
a = argmax_a Q(s, a), with probability 1 - ε; a = a randomly selected action from the action set, with probability ε  (7)
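The ε-greedy criterion of claim 5 balances exploration and exploitation: with probability ε a random action is tried, otherwise the best-known action is taken. A minimal sketch (the dict-based Q-row layout is an illustration, not from the patent):

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """epsilon-greedy criterion: with probability `epsilon` explore a
    random action, otherwise exploit the action with the highest Q value.
    `q_row` maps actions to Q values for the current state."""
    if rng.random() < epsilon:
        return rng.choice(list(q_row))
    return max(q_row, key=q_row.get)

q_row = {"inc": 0.9, "dec": 0.1}
assert epsilon_greedy(q_row, 0.0) == "inc"   # epsilon = 0: pure exploitation
assert epsilon_greedy(q_row, 1.0) in q_row   # epsilon = 1: pure exploration
```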
6. The method for Q-Learning based proportional valve position control of claim 5, wherein in step S7, the Q value set reward scheme associated with the displacement control parameter is formulated as follows:
wherein R_tx represents the reward value of the position PID closed-loop controller, e(x_{k+1}) represents the difference between the valve core set position x_ref and the valve core position x_{k+1} acquired at the (k+1)-th sampling moment, and x_lim represents a set displacement threshold;
formulating a set of Q values reward scheme associated with the current control parameter:
wherein R_tc represents the reward value of the current PID closed-loop controller, e(c_{k+1}) represents the difference between the current reference input c_ref and the electromagnetic valve input current c_{k+1} acquired at the (k+1)-th sampling moment, and c_lim represents a set current threshold.
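The reward formulas (8)-(9) themselves are images in the source, so their exact piecewise form is not recoverable here. A common threshold form consistent with the named quantities (reward when the tracking error stays within x_lim or c_lim, punishment otherwise) might look like the following; the +1/-1 values are purely an assumption:

```python
def threshold_reward(error, limit):
    """Hypothetical threshold reward: formulas (8)-(9) are images in
    the source, so this +1/-1 scheme is only an assumed common form
    (small tracking error rewarded, large error punished)."""
    return 1.0 if abs(error) <= limit else -1.0

assert threshold_reward(0.01, 0.05) == 1.0    # |e(x_{k+1})| within x_lim
assert threshold_reward(0.10, 0.05) == -1.0   # |e(c_{k+1})| outside c_lim
```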
7. The method for controlling the Q-Learning based proportional valve position of claim 6, wherein the performing Q-Learning in step S8 specifically comprises:
s81: randomly initializing a reward corresponding to the state set and the action set, wherein the initial value of the termination state is 0;
s82: sequentially executing steps S83 to S87 with i being 1,2,. and T, respectively; wherein T represents the number of iterations;
s83: initializing a current state s as a first state of a current state set;
s84: selecting an action in the current state by using an epsilon-greedy method criterion;
s85: executing the selected action in the current state to obtain a new state and a corresponding reward;
s86: updating the cost function with the new state and its corresponding reward:
Qk+1(sk,ak)=Qk(sk,ak)+α[rk+γmax Qk(sk+1,ak+1)-Qk(sk,ak)] (10)
wherein s iskRepresenting the state at the kth sampling instant, akRepresenting the action at the kth sampling instant, Qk+1(sk,ak) Denotes the (k + 1) th sampling instant at skIn the state of taking akMovement is transferred to sk+1Q value of State, Qk(sk,ak) Denotes the k-th sampling instant at skIn the state of taking akQ value of action, Qk(sk+1,ak+1) Represents the action a of taking the next sampling time at the kth sampling timek+1The latter Q value, α, represents the learning rate, α ∈ (0, 1)]R represents reward, k represents sampling time, and γ represents attenuation factor;
s87: let s ' be s, s ' be the state after taking action, judge s ' whether it is the end state; if yes, the current round of iteration is finished, and the step S82 is returned; if not, the process returns to step S84.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110720142.5A CN113467226B (en) | 2021-06-28 | 2021-06-28 | Proportional valve position control method based on Q-Learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113467226A true CN113467226A (en) | 2021-10-01 |
CN113467226B CN113467226B (en) | 2023-05-16 |
Family
ID=77873541
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110720142.5A Active CN113467226B (en) | 2021-06-28 | 2021-06-28 | Proportional valve position control method based on Q-Learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113467226B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101025630A (en) * | 2006-02-13 | 2007-08-29 | Smc株式会社 | Positioning control system and filter |
CN103995463A (en) * | 2014-05-30 | 2014-08-20 | 北京敬科海工科技有限公司 | Method for performing position servo driving on electro-hydraulic proportional valve based on hybrid control |
CN106246986A (en) * | 2016-06-27 | 2016-12-21 | 南昌大学 | Integrated form vibrating signal self adaptation proportional valve amplifier |
CN109597331A (en) * | 2018-11-28 | 2019-04-09 | 南京晨光集团有限责任公司 | A kind of pilot-operated type double electromagnet proportional valve controller based on current automatic adaptation |
US10900343B1 (en) * | 2018-01-25 | 2021-01-26 | National Technology & Engineering Solutions Of Sandia, Llc | Control systems and methods to enable autonomous drilling |
CN112305920A (en) * | 2020-12-28 | 2021-02-02 | 南京理工大学 | Reinforced learning platform for design of closed-loop jet rock suppression controller |
Non-Patent Citations (1)
Title |
---|
张活俊; 江励; 汤健华; 黄辉: "Research on Active Force Control of a Polishing Robot Based on Reinforcement Learning" (基于强化学习的抛光机器人主动力控制研究), Mechanical Engineer (机械工程师) * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114384795A (en) * | 2021-12-21 | 2022-04-22 | 卓品智能科技无锡有限公司 | Proportional solenoid valve current vibration control method |
CN114384795B (en) * | 2021-12-21 | 2022-10-25 | 卓品智能科技无锡股份有限公司 | Proportional solenoid valve current vibration control method |
Also Published As
Publication number | Publication date |
---|---|
CN113467226B (en) | 2023-05-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||