CN113467226A - Proportional valve position control method based on Q-Learning - Google Patents

Proportional valve position control method based on Q-Learning

Info

Publication number
CN113467226A
Authority
CN
China
Prior art keywords: current, loop controller, state, PID closed, value
Legal status: Granted
Application number
CN202110720142.5A
Other languages
Chinese (zh)
Other versions
CN113467226B (en
Inventor
张辉
张思龙
Current Assignee: Beihang University
Original Assignee: Beihang University
Application filed by Beihang University
Priority to CN202110720142.5A
Publication of CN113467226A
Application granted
Publication of CN113467226B
Current legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B11/00 Automatic controllers
    • G05B11/01 Automatic controllers electric
    • G05B11/36 Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential
    • G05B11/42 Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential for obtaining a characteristic which is both proportional and time-dependent, e.g. P. I., P. I. D.
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Magnetically Actuated Valves (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a proportional valve position control method based on Q-Learning. The displacement control parameters of a position PID closed-loop controller and the current control parameters of a current PID closed-loop controller are each adaptively adjusted through Q-Learning, so that these parameters adapt as the sampling time changes, and accurate control of the valve core position is achieved through position PID closed-loop control and current PID closed-loop control. Aiming at high-precision position control of a high-frequency-response proportional valve, the invention takes the various nonlinear factors in the proportional solenoid valve into account and provides a control method that can effectively improve the bandwidth and precision of the proportional valve, offering a reference for the high-frequency response of proportional valves. The control method is fast and stable, can follow the valve core displacement synchronously at both low and high frequencies, and solves the problem that fixed PID parameters cannot adapt to high-frequency and low-frequency signals simultaneously.

Description

Proportional valve position control method based on Q-Learning
Technical Field
The invention relates to the technical field of proportional valve control, in particular to a proportional valve position control method based on Q-Learning.
Background
The electro-hydraulic proportional valve is a hydraulic element commonly used in the field of industrial control. With the continuous development of engineering technology, the requirements on the position control precision and the frequency response of the proportional valve are becoming higher and higher. Therefore, research on high-precision, high-frequency-response proportional valve position control methods is extremely important for the development of electro-hydraulic control technology. At present, a series of studies on the position control of high-frequency-response servo proportional valves have been carried out both in China and abroad.
In existing research on proportional valve position control methods, a compensation value is determined by acquiring the valve core position and judging whether the valve core has reached the expected position or whether the system oscillates; the control signal is then compensated and corrected based on the size of the dead-zone compensation value, and the corrected signal is output to the electro-hydraulic proportional valve. Compared with a traditional valve, this improves the control precision, but the position control precision of the hydraulic valve under high-frequency response is still difficult to guarantee.
In addition, proportional valves contain nonlinear factors. The current-force characteristic of the proportional electromagnet becomes nonlinear under high-frequency-response working conditions: first, the proportional electromagnet wears with use after leaving the factory; second, hysteresis causes nonlinearity under high-frequency-response working conditions. In the valve body, the friction coefficient between the valve core and the valve sleeve changes as the valve core moves, so the friction force changes continuously. The hydrodynamic force also exerts a nonlinear influence on the valve core as it moves. Nonlinear factors are also introduced by the control circuit itself, such as the nonlinear relationship between the given PWM duty cycle and frequency and the circuit output current.
Disclosure of Invention
In view of the above, the present invention provides a proportional valve position control method based on Q-Learning, which takes the nonlinearity of the system into account and solves the problem that fixed PID parameters cannot adapt to both high-frequency and low-frequency given signals.
The invention provides a proportional valve position control method based on Q-Learning, which comprises the following steps:
s1: taking the difference value between the set position of the valve core and the currently acquired position of the valve core as the input of a position PID closed-loop controller, and taking the sum of the output of the position PID closed-loop controller and the reference current as the current reference input;
s2: taking the difference value between the current reference input and the currently acquired electromagnetic valve input current as the input of the current PID closed-loop controller, superposing the output of the current PID closed-loop controller and a reference PWM duty ratio, and loading the superposed output to the electromagnetic valve to control the movement of a valve core;
s3: according to the relation between the input of the current PID closed-loop controller and the PWM duty ratio, obtaining the PWM duty ratio corresponding to the input of the current PID closed-loop controller, and taking the PWM duty ratio as a reference value of the PWM generator duty ratio;
s4: setting initial values of three displacement control parameters of the position PID closed-loop controller and three current control parameters of the current PID closed-loop controller, collecting a valve core position as a state set, and collecting output current of a PWM generator;
s5: dividing the value range of each displacement control parameter and each current control parameter into a plurality of storage intervals, wherein the values in the same storage interval are in the same state, and the values in different storage intervals are in different states, so as to obtain six Q value sets;
s6: giving an epsilon-greedy method criterion and the current value state of each displacement control parameter and each current control parameter, and generating the action sets of the six Q value sets;
s7: formulating a Q value set reward scheme related to the displacement control parameter and a Q value set reward scheme related to the current control parameter;
s8: inputting iteration rounds, a state set, an action set, a step length, an attenuation factor and an exploration rate, performing Q-Learning, and outputting rewards corresponding to the state set and the action set;
s9: and executing steps S1-S4 according to the current value state and the reward corresponding to the action of each displacement control parameter and each current control parameter, and controlling the position of the valve core.
In a possible implementation manner, in the above method for controlling a proportional valve position based on Q-Learning provided by the present invention, step S1 specifically includes:
calculating the difference between the valve core set position x_ref and the currently acquired valve core position x_k:
e(x_k) = x_ref - x_k (1)
taking e(x_k) as the input of the position PID closed-loop controller, the output of the position PID closed-loop controller is:
u(x_k) = k_px e(x_k) + k_ix Σ_{j=0}^{k} e(x_j) Δx_k + k_dx [e(x_k) - e(x_{k-1})]/Δx_k (2)
where k_px denotes the proportional gain of the position PID closed-loop controller, k_ix denotes the integral gain of the position PID closed-loop controller, k_dx denotes the differential gain of the position PID closed-loop controller, Δx_k denotes the sampling interval time of the position PID closed-loop controller, and e(x_{k-1}) denotes the difference between the valve core set position x_ref and the valve core position x_{k-1} acquired at the (k-1)-th sampling moment;
taking the sum of the output u(x_k) of the position PID closed-loop controller and a reference current c_st as the current reference input:
c_ref = u(x_k) + c_st (3).
in a possible implementation manner, in the above method for controlling a proportional valve position based on Q-Learning provided by the present invention, step S2 specifically includes:
calculating the difference between the current reference input c_ref and the currently acquired electromagnetic valve input current c_k:
e(c_k) = c_ref - c_k (4)
taking e(c_k) as the input of the current PID closed-loop controller, the output of the current PID closed-loop controller is:
u(c_k) = k_pc e(c_k) + k_ic Σ_{j=0}^{k} e(c_j) Δc_k + k_dc [e(c_k) - e(c_{k-1})]/Δc_k (5)
where k_pc denotes the proportional gain of the current PID closed-loop controller, k_ic denotes the integral gain of the current PID closed-loop controller, k_dc denotes the differential gain of the current PID closed-loop controller, Δc_k denotes the sampling interval time of the current PID closed-loop controller, and e(c_{k-1}) denotes the difference between the current reference input c_ref and the electromagnetic valve input current c_{k-1} collected at the (k-1)-th sampling moment;
superposing u(c_k) with the reference PWM duty ratio and then loading the result to the electromagnetic valve to control the movement of the valve core.
In a possible implementation manner, in the proportional valve position control method based on Q-Learning provided by the present invention, in step S5, values in the same storage interval are set using the same rule:
n = [N(x_con - X_min)/(X_max - X_min)] (6)
where [x] = max{n ∈ Z | n ≤ x}, n represents a discrete spool displacement or current, x_con represents a continuous spool displacement or current, X_min and X_max are respectively the lower and upper limits of x_con, and N represents the number of intervals into which each spool displacement or each current is divided.
In a possible implementation manner, in the proportional valve position control method based on Q-Learning provided by the present invention, in step S6, an epsilon-greedy criterion is defined as:
a = a* = argmax_a Q(s, a), if ξ > ε
a = a random action, if ξ ≤ ε (7)
where ξ ∈ [0, 1], ε represents the probability of taking a random action, a* = argmax_a Q(s, a) represents the action corresponding to the current maximum Q value, s represents the current state, and a represents the action currently taken.
In a possible implementation manner, in the above proportional valve position control method based on Q-Learning provided by the present invention, in step S7, a Q value set reward scheme related to the displacement control parameter is formulated:
[equation (8): reward value R_tx defined in terms of e(x_{k+1}) and the displacement threshold x_lim]
where R_tx represents the reward value of the position PID closed-loop controller, e(x_{k+1}) denotes the difference between the valve core set position x_ref and the valve core position x_{k+1} acquired at the (k+1)-th sampling moment, and x_lim denotes a set displacement threshold;
formulating a set of Q values reward scheme associated with the current control parameter:
[equation (9): reward value R_tc defined in terms of e(c_{k+1}) and the current threshold c_lim]
where R_tc represents the reward value of the current PID closed-loop controller, e(c_{k+1}) represents the difference between the current reference input c_ref and the electromagnetic valve input current c_{k+1} collected at the (k+1)-th sampling moment, and c_lim denotes a set current threshold.
In a possible implementation manner, in the proportional valve position control method based on Q-Learning provided by the present invention, in step S8, the performing Q-Learning specifically includes:
s81: randomly initializing a reward corresponding to the state set and the action set, wherein the initial value of the termination state is 0;
s82: sequentially setting i to 1,2, …, and T, and executing steps S83 to S87, respectively; wherein T represents the number of iterations;
s83: initializing a current state s as a first state of a current state set;
s84: selecting an action in the current state by using an epsilon-greedy method criterion;
s85: executing the selected action in the current state to obtain a new state and a corresponding reward;
s86: updating the value function with the new state and its corresponding reward:
Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + α[r_k + γ max Q_k(s_{k+1}, a_{k+1}) - Q_k(s_k, a_k)] (10)
where s_k represents the state at the k-th sampling instant, a_k represents the action at the k-th sampling instant, Q_{k+1}(s_k, a_k) denotes the Q value at the (k+1)-th sampling instant after taking action a_k in state s_k and transferring to state s_{k+1}, Q_k(s_k, a_k) denotes the Q value of taking action a_k in state s_k at the k-th sampling instant, Q_k(s_{k+1}, a_{k+1}) represents the Q value after taking the action a_{k+1} of the next sampling instant at the k-th sampling instant, α represents the learning rate, α ∈ (0, 1], r represents the reward, k represents the sampling instant, and γ represents the attenuation factor;
s87: let s = s', where s' is the state obtained after taking the action, and judge whether s' is the termination state; if yes, the current iteration round is finished and the process returns to step S82; if not, the process returns to step S84.
According to the Q-Learning-based proportional valve position control method provided by the invention, the displacement control parameters of the position PID closed-loop controller and the current control parameters of the current PID closed-loop controller are each adaptively adjusted through Q-Learning, so that these parameters adapt as the sampling time changes, and accurate control of the valve core position is achieved through position PID closed-loop control and current PID closed-loop control. Aiming at high-precision position control of a high-frequency-response proportional valve, the invention takes the various nonlinear factors in the proportional solenoid valve into account and provides a control method that can effectively improve the bandwidth and precision of the proportional valve; the method is fast and stable and offers a reference for the high-frequency response of proportional valves. In a traditional control method the six control parameters are fixed values, so the valve core displacement can be followed only at low frequency or only at high frequency, not at both; the present method follows the valve core displacement synchronously at both low and high frequencies.
Drawings
FIG. 1 is a block flow diagram of a method for controlling a proportional valve position based on Q-Learning according to embodiment 1 of the present invention;
fig. 2 is a flow chart of the Q-Learning algorithm in embodiment 1 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only illustrative and are not intended to limit the present invention.
The invention provides a proportional valve position control method based on Q-Learning, which comprises the following steps:
s1: taking the difference value between the set position of the valve core and the currently acquired position of the valve core as the input of a position PID closed-loop controller, and taking the sum of the output of the position PID closed-loop controller and the reference current as the current reference input;
s2: taking the difference value of the current reference input and the currently acquired electromagnetic valve input current as the input of a current PID closed-loop controller, superposing the output of the current PID closed-loop controller and a reference PWM duty ratio, and loading the superposed output to the electromagnetic valve to control the movement of a valve core;
s3: according to the relation between the input of the current PID closed-loop controller and the PWM duty ratio, obtaining the PWM duty ratio corresponding to the input of the current PID closed-loop controller, and taking the PWM duty ratio as a reference value of the PWM generator duty ratio;
s4: setting initial values of three displacement control parameters of a position PID closed-loop controller and three current control parameters of a current PID closed-loop controller, collecting a valve core position as a state set, and collecting output current of a PWM generator;
s5: dividing the value range of each displacement control parameter and each current control parameter into a plurality of storage intervals, wherein the values in the same storage interval are in the same state, and the values in different storage intervals are in different states, so as to obtain six Q value sets;
s6: giving an epsilon-greedy method criterion and the current value state of each displacement control parameter and each current control parameter, and generating the action sets of the six Q value sets;
s7: formulating a Q value set reward scheme related to the displacement control parameter and a Q value set reward scheme related to the current control parameter;
s8: inputting iteration rounds, a state set, an action set, a step length, an attenuation factor and an exploration rate, performing Q-Learning, and outputting rewards corresponding to the state set and the action set;
s9: and executing steps S1-S4 according to the current value state and the reward corresponding to the action of each displacement control parameter and each current control parameter, and controlling the position of the valve core.
The specific implementation of the above-mentioned proportional valve position control method based on Q-Learning according to the present invention is described in detail with reference to a specific embodiment.
Example 1: the Q-Learning based proportional valve position control method includes position PID control of the outer loop and current PID control of the inner loop as shown in fig. 1.
The valve core set position x_ref of the proportional valve is taken as the input, and the position of the valve core is controlled by taking as the control quantities the deviation between the currently acquired valve core position and the valve core set position and the deviation between the calculated current value and the current value output by the PWM generator. The valve core position and the output current of the PWM generator are discretized and used as the states for data acquisition, and Q-Learning is performed to obtain six Q value sets, where each Q value set corresponds to one of the system gains of the two PID controllers, namely the displacement control parameters of the position PID closed-loop controller and the current control parameters of the current PID closed-loop controller. Given the current state, each Q value set generates the best value of its corresponding gain.
The method comprises the following specific steps:
firstly, the difference value between the valve core set position and the currently acquired valve core position is used as the input of a position PID closed-loop controller, and the sum of the output of the position PID closed-loop controller and a reference current is used as a current reference input.
Taking the valve core set position x_ref as the reference, the difference between the valve core set position x_ref and the currently acquired valve core position x_k is calculated:
e(x_k) = x_ref - x_k (1)
With e(x_k) as the input of the position PID closed-loop controller, the output of the position PID closed-loop controller is:
u(x_k) = k_px e(x_k) + k_ix Σ_{j=0}^{k} e(x_j) Δx_k + k_dx [e(x_k) - e(x_{k-1})]/Δx_k (2)
where k_px denotes the proportional gain of the position PID closed-loop controller, k_ix denotes the integral gain of the position PID closed-loop controller, k_dx denotes the differential gain of the position PID closed-loop controller, Δx_k denotes the sampling interval time of the position PID closed-loop controller, and e(x_{k-1}) denotes the difference between the valve core set position x_ref and the valve core position x_{k-1} acquired at the (k-1)-th sampling moment;
The sum of the output u(x_k) of the position PID closed-loop controller and a reference current c_st is taken as the current reference input:
c_ref = u(x_k) + c_st (3).
and secondly, taking the difference value between the current reference input and the currently acquired input current of the electromagnetic valve as the input of a current PID closed-loop controller, superposing the output of the current PID closed-loop controller and a reference PWM duty ratio, and loading the superposed output to the electromagnetic valve to control the movement of a valve core.
The difference between the current reference input c_ref and the currently collected electromagnetic valve input current c_k is calculated:
e(c_k) = c_ref - c_k (4)
With e(c_k) as the input of the current PID closed-loop controller, the output of the current PID closed-loop controller is:
u(c_k) = k_pc e(c_k) + k_ic Σ_{j=0}^{k} e(c_j) Δc_k + k_dc [e(c_k) - e(c_{k-1})]/Δc_k (5)
where k_pc denotes the proportional gain of the current PID closed-loop controller, k_ic denotes the integral gain of the current PID closed-loop controller, k_dc denotes the differential gain of the current PID closed-loop controller, Δc_k denotes the sampling interval time of the current PID closed-loop controller, and e(c_{k-1}) denotes the difference between the current reference input c_ref and the electromagnetic valve input current c_{k-1} collected at the (k-1)-th sampling moment;
u(c_k) is superposed with the reference PWM duty ratio and then loaded to the electromagnetic valve to control the movement of the valve core.
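As an illustration of the first and second steps, the following Python sketch implements the cascaded position/current PID computation of equations (1)-(5) in the standard discrete positional PID form; the class and function names, gains, sampling intervals and reference values are illustrative assumptions rather than values taken from this patent.

```python
# Minimal sketch of the cascaded position/current PID update of equations (1)-(5).
# All numeric values and names here are illustrative placeholders.

class DiscretePID:
    """Positional discrete PID: u_k = kp*e_k + ki*sum(e_j)*dt + kd*(e_k - e_{k-1})/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.err_sum = 0.0
        self.err_prev = 0.0

    def update(self, err):
        self.err_sum += err
        out = (self.kp * err
               + self.ki * self.err_sum * self.dt
               + self.kd * (err - self.err_prev) / self.dt)
        self.err_prev = err
        return out


def control_step(x_ref, x_k, c_k, pos_pid, cur_pid, c_st, duty_ref):
    """One sampling step of the outer position loop and inner current loop."""
    e_x = x_ref - x_k                      # equation (1)
    c_ref = pos_pid.update(e_x) + c_st     # equations (2)-(3): current reference input
    e_c = c_ref - c_k                      # equation (4)
    duty = cur_pid.update(e_c) + duty_ref  # equation (5) superposed on the reference PWM duty
    return duty
```

At each sampling moment, control_step would be called with the newly acquired valve core position x_k and solenoid input current c_k, and the returned value applied to the PWM generator as its duty-ratio command.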
And thirdly, according to the relation between the input of the current PID closed-loop controller and the PWM duty ratio, obtaining the PWM duty ratio corresponding to the input of the current PID closed-loop controller, and taking the PWM duty ratio as the reference value of the PWM generator duty ratio.
And fourthly, setting initial values of displacement control parameters of the position PID closed-loop controller and current control parameters of the current PID closed-loop controller, collecting the position of the valve core as a state set, and collecting output current of the PWM generator.
And fifthly, dividing the value range of each displacement control parameter and each current control parameter into a plurality of storage intervals, wherein the values in the same storage interval are in the same state, and the values in different storage intervals are in different states, so as to obtain 6Q value sets.
The Q value sets are obtained by the Q-Learning algorithm. There are six Q value sets in total: three Q value sets are related to the displacement control parameters k_p1, k_i1 and k_d1, and the other three Q value sets are related to the current control parameters k_p2, k_i2 and k_d2.
Specifically, the same rule is used to set the values in the same storage interval:
n = [N(x_con - X_min)/(X_max - X_min)] (6)
where [x] = max{n ∈ Z | n ≤ x}, n represents a discrete spool displacement or current, x_con represents a continuous spool displacement or current, X_min and X_max are respectively the lower and upper limits of x_con, and N represents the number of intervals into which each spool displacement or each current is divided.
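A minimal Python sketch of this binning rule, under the definitions given above, is shown below; the clamping of the result to a valid index is an added assumption.

```python
import math

def discretize(x_con, x_min, x_max, n_intervals):
    """Map a continuous spool displacement or current x_con in [x_min, x_max]
    onto one of n_intervals storage intervals, following equation (6).
    Clamping the index to the valid range is an implementation assumption."""
    n = math.floor(n_intervals * (x_con - x_min) / (x_max - x_min))
    return min(max(n, 0), n_intervals - 1)
```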
And sixthly, given the epsilon-greedy method criterion and the current value state of each displacement control parameter and each current control parameter, the action sets of the six Q value sets are generated.
Specifically, the epsilon-greedy criterion is defined as:
a = a* = argmax_a Q(s, a), if ξ > ε
a = a random action, if ξ ≤ ε (7)
where ξ ∈ [0, 1], ε represents the probability of taking a random action, a* = argmax_a Q(s, a) represents the action corresponding to the current maximum Q value, s represents the current state, and a represents the action currently taken.
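A minimal Python sketch of the epsilon-greedy selection of equation (7) follows; the representation of a Q value set as a list indexed by action is an assumption for illustration.

```python
import random

def epsilon_greedy(q_row, epsilon):
    """Choose an action index from one row of a Q table per equation (7):
    with probability epsilon (xi <= epsilon) take a random action,
    otherwise take the action with the largest Q value."""
    if random.random() <= epsilon:
        return random.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])
```

A larger ε favours exploring new parameter values, while a smaller ε exploits the parameter values already found to give high rewards.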
And seventhly, establishing a Q value set reward scheme related to the displacement control parameter and a Q value set reward scheme related to the current control parameter.
Specifically, a Q value set reward scheme related to the displacement control parameters is established:
[equation (8): reward value R_tx defined in terms of e(x_{k+1}) and the displacement threshold x_lim]
where R_tx represents the reward value of the position PID closed-loop controller, e(x_{k+1}) denotes the difference between the valve core set position x_ref and the valve core position x_{k+1} acquired at the (k+1)-th sampling moment, and x_lim denotes a set displacement threshold;
establishing a Q value set reward scheme related to the current control parameter:
[equation (9): reward value R_tc defined in terms of e(c_{k+1}) and the current threshold c_lim]
where R_tc represents the reward value of the current PID closed-loop controller, e(c_{k+1}) represents the difference between the current reference input c_ref and the electromagnetic valve input current c_{k+1} collected at the (k+1)-th sampling moment, and c_lim denotes a set current threshold.
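The exact reward values are given by equations (8) and (9) above; purely as an illustration of a threshold-based scheme of the kind described, a sketch might look like the following, where the +1.0/-1.0 values are assumptions and not the patented values.

```python
def threshold_reward(error_next, limit):
    """Illustrative threshold-style reward of the kind described for R_tx and R_tc:
    reward the controller when the next-step error magnitude stays within the set
    threshold. The +1.0/-1.0 values are assumptions, not the values of
    equations (8) and (9)."""
    return 1.0 if abs(error_next) <= limit else -1.0
```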
And step eight, performing Q-Learning.
The Q-Learning algorithm is a model-free reinforcement learning method based on temporal-difference learning, which enables an agent to select an optimal action sequence in a Markov decision process through interaction with the external environment. The value function of the Q-Learning algorithm is defined as:
Q(s_k, a_k) = R(s, a) + γ max_a Q(s_{k+1}, a) (10)
where Q(s_k, a_k) denotes the Q value of taking action a_k in state s_k, max_a Q(s_{k+1}, a) represents the maximum Q value over the actions in the next state, γ represents the attenuation factor, and R(s, a) represents the value returned by the reward scheme.
In the Q-Learning algorithm, the agent receives an input state s_k from the external environment and outputs a corresponding action a_k through its internal mechanism. Under the action a_k, the external environment changes to a new state s_{k+1}. At the same time, the algorithm generates an instant reward signal R_{k+1} for the agent. The instant reward signal R_{k+1} is an evaluation of the agent's action a_k in the external environment state s_k.
In the Q-Learning algorithm, the Q-value optimization iterative formula is as follows:
Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + α[r_k + γ max Q_k(s_{k+1}, a_{k+1}) - Q_k(s_k, a_k)] (11)
where s_k represents the state at the k-th sampling instant, a_k represents the action at the k-th sampling instant, Q_{k+1}(s_k, a_k) denotes the Q value at the (k+1)-th sampling instant after taking action a_k in state s_k and transferring to state s_{k+1}, Q_k(s_k, a_k) denotes the Q value of taking action a_k in state s_k at the k-th sampling instant, Q_k(s_{k+1}, a_{k+1}) represents the Q value after taking the action a_{k+1} of the next sampling instant at the k-th sampling instant, α represents the learning rate, α ∈ (0, 1], r represents the reward, k represents the sampling instant, and γ represents the attenuation factor. In the optimizing process, Q_k(s_{k+1}, a*_{k+1}) denotes the Q value of taking the optimal action a*_{k+1} at the (k+1)-th sampling instant, and Q_k(s_k, a*_k) denotes the Q value of taking the optimal action a*_k at the k-th sampling instant.
The Q-Learning algorithm is embodied as follows, as shown in FIG. 2:
(1) inputting iteration rounds, a state set, an action set, a step length, an attenuation factor and an exploration rate;
(2) randomly initializing a reward corresponding to the state set and the action set, wherein the initial value of the termination state is 0;
(3) sequentially setting i to 1,2, … and T, and executing steps (4) to (8) respectively; wherein T represents the number of iterations;
(4) initializing a current state s as a first state of a current state set;
(5) selecting an action in the current state by using an epsilon-greedy method criterion;
(6) executing the selected action in the current state to obtain a new state and a corresponding reward;
(7) updating the value function according to the formula (11) by using the new state and the corresponding reward;
(8) let s = s', where s' is the state obtained after taking the action, and judge whether s' is the termination state; if yes, the current iteration round is finished and the process returns to step (3); if not, the process returns to step (5);
(9) and outputting the rewards corresponding to the state set and the action set.
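Steps (1) to (9) correspond to the standard tabular Q-Learning loop; a minimal Python sketch is given below. The callbacks step_env (environment transition and reward) and is_terminal (termination test), as well as the dictionary-based Q table, are assumptions introduced for illustration and are not specified in the patent.

```python
import random

def q_learning(n_rounds, states, actions, alpha, gamma, epsilon, step_env, is_terminal):
    """Sketch of the tabular Q-Learning loop of steps (1)-(9). step_env and
    is_terminal are assumed callbacks; they are not specified in the patent text."""
    # step (2): random initialisation, termination states set to 0
    Q = {(s, a): random.random() for s in states for a in actions}
    for s in states:
        if is_terminal(s):
            for a in actions:
                Q[(s, a)] = 0.0
    for _ in range(n_rounds):                        # step (3): i = 1, 2, ..., T
        s = states[0]                                # step (4): first state of the state set
        while not is_terminal(s):
            # step (5): epsilon-greedy action selection, equation (7)
            if random.random() <= epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s_next, r = step_env(s, a)               # step (6): new state and reward
            best_next = max(Q[(s_next, a_)] for a_ in actions)
            # step (7): value-function update, equation (11)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next                               # step (8)
    return Q                                         # step (9): rewards for state-action pairs
```

In this scheme, the same loop would be run for each of the six Q value sets, with the discretized valve core position or PWM generator output current as the state and the candidate values of the corresponding PID control parameter as the actions.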
And ninthly, executing the first step to the fourth step according to the current value state and the reward corresponding to the action of each displacement control parameter and each current control parameter, and controlling the position of the valve core.
According to the Q-Learning-based proportional valve position control method provided by the invention, the displacement control parameters of the position PID closed-loop controller and the current control parameters of the current PID closed-loop controller are each adaptively adjusted through Q-Learning, so that these parameters adapt as the sampling time changes, and accurate control of the valve core position is achieved through position PID closed-loop control and current PID closed-loop control. Aiming at high-precision position control of a high-frequency-response proportional valve, the invention takes the various nonlinear factors in the proportional solenoid valve into account and provides a control method that can effectively improve the bandwidth and precision of the proportional valve; the method is fast and stable and offers a reference for the high-frequency response of proportional valves. In a traditional control method the six control parameters are fixed values, so the valve core displacement can be followed only at low frequency or only at high frequency, not at both; the present method follows the valve core displacement synchronously at both low and high frequencies.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (7)

1. A method for controlling a proportional valve position based on Q-Learning, comprising the steps of:
s1: taking the difference value between the set position of the valve core and the currently acquired position of the valve core as the input of a position PID closed-loop controller, and taking the sum of the output of the position PID closed-loop controller and the reference current as the current reference input;
s2: taking the difference value between the current reference input and the currently acquired electromagnetic valve input current as the input of the current PID closed-loop controller, superposing the output of the current PID closed-loop controller and a reference PWM duty ratio, and loading the superposed output to the electromagnetic valve to control the movement of a valve core;
s3: according to the relation between the input of the current PID closed-loop controller and the PWM duty ratio, obtaining the PWM duty ratio corresponding to the input of the current PID closed-loop controller, and taking the PWM duty ratio as a reference value of the PWM generator duty ratio;
s4: setting initial values of three displacement control parameters of the position PID closed-loop controller and three current control parameters of the current PID closed-loop controller, collecting a valve core position as a state set, and collecting output current of a PWM generator;
s5: dividing the value range of each displacement control parameter and each current control parameter into a plurality of storage intervals, wherein the values in the same storage interval are in the same state, and the values in different storage intervals are in different states, so as to obtain six Q value sets;
s6: giving an epsilon-greedy method criterion and the current value state of each displacement control parameter and each current control parameter, and generating the action sets of the six Q value sets;
s7: formulating a Q value set reward scheme related to the displacement control parameter and a Q value set reward scheme related to the current control parameter;
s8: inputting iteration rounds, a state set, an action set, a step length, an attenuation factor and an exploration rate, performing Q-Learning, and outputting rewards corresponding to the state set and the action set;
s9: and executing steps S1-S4 according to the current value state and the reward corresponding to the action of each displacement control parameter and each current control parameter, and controlling the position of the valve core.
2. The method for controlling the Q-Learning based proportional valve position of claim 1, wherein step S1 specifically comprises:
calculating the difference between the valve core set position x_ref and the currently acquired valve core position x_k:
e(x_k) = x_ref - x_k (1)
taking e(x_k) as the input of the position PID closed-loop controller, the output of the position PID closed-loop controller is as follows:
u(x_k) = k_px e(x_k) + k_ix Σ_{j=0}^{k} e(x_j) Δx_k + k_dx [e(x_k) - e(x_{k-1})]/Δx_k (2)
where k_px denotes the proportional gain of the position PID closed-loop controller, k_ix denotes the integral gain of the position PID closed-loop controller, k_dx denotes the differential gain of the position PID closed-loop controller, Δx_k denotes the sampling interval time of the position PID closed-loop controller, and e(x_{k-1}) denotes the difference between the valve core set position x_ref and the valve core position x_{k-1} acquired at the (k-1)-th sampling moment;
taking the sum of the output u(x_k) of the position PID closed-loop controller and a reference current c_st as the current reference input:
c_ref = u(x_k) + c_st (3).
3. the method for controlling the Q-Learning based proportional valve position of claim 2, wherein the step S2 specifically comprises:
calculating the difference between the current reference input c_ref and the currently collected electromagnetic valve input current c_k:
e(c_k) = c_ref - c_k (4)
taking e(c_k) as the input of the current PID closed-loop controller, the output of the current PID closed-loop controller is:
u(c_k) = k_pc e(c_k) + k_ic Σ_{j=0}^{k} e(c_j) Δc_k + k_dc [e(c_k) - e(c_{k-1})]/Δc_k (5)
where k_pc denotes the proportional gain of the current PID closed-loop controller, k_ic denotes the integral gain of the current PID closed-loop controller, k_dc denotes the differential gain of the current PID closed-loop controller, Δc_k denotes the sampling interval time of the current PID closed-loop controller, and e(c_{k-1}) denotes the difference between the current reference input c_ref and the electromagnetic valve input current c_{k-1} collected at the (k-1)-th sampling moment;
superposing u(c_k) with the reference PWM duty ratio and then loading the result to the electromagnetic valve to control the movement of the valve core.
4. The method for Q-Learning based proportional valve position control of claim 3, wherein in step S5, the same rule is used to set the values in the same storage interval:
n = [N(x_con - X_min)/(X_max - X_min)] (6)
where [x] = max{n ∈ Z | n ≤ x}, n represents a discrete spool displacement or current, x_con represents a continuous spool displacement or current, X_min and X_max are respectively the lower and upper limits of x_con, and N represents the number of intervals into which each spool displacement or each current is divided.
5. The Q-Learning based proportional valve position control method of claim 4, wherein in step S6, the epsilon-greedy criterion is defined as:
a = a* = argmax_a Q(s, a), if ξ > ε
a = a random action, if ξ ≤ ε (7)
where ξ ∈ [0, 1], ε represents the probability of taking a random action, a* = argmax_a Q(s, a) represents the action corresponding to the current maximum Q value, s represents the current state, and a represents the action currently taken.
6. The method for Q-Learning based proportional valve position control of claim 5, wherein in step S7, a Q-value set reward scheme associated with the displacement control parameter is formulated:
[equation (8): reward value R_tx defined in terms of e(x_{k+1}) and the displacement threshold x_lim]
where R_tx represents the reward value of the position PID closed-loop controller, e(x_{k+1}) denotes the difference between the valve core set position x_ref and the valve core position x_{k+1} acquired at the (k+1)-th sampling moment, and x_lim denotes a set displacement threshold;
formulating a set of Q values reward scheme associated with the current control parameter:
[equation (9): reward value R_tc defined in terms of e(c_{k+1}) and the current threshold c_lim]
where R_tc represents the reward value of the current PID closed-loop controller, e(c_{k+1}) represents the difference between the current reference input c_ref and the electromagnetic valve input current c_{k+1} collected at the (k+1)-th sampling moment, and c_lim denotes a set current threshold.
7. The method for controlling the Q-Learning based proportional valve position of claim 6, wherein the performing Q-Learning in step S8 specifically comprises:
s81: randomly initializing a reward corresponding to the state set and the action set, wherein the initial value of the termination state is 0;
s82: sequentially setting i = 1, 2, …, T, and executing steps S83 to S87 respectively; wherein T represents the number of iterations;
s83: initializing a current state s as a first state of a current state set;
s84: selecting an action in the current state by using an epsilon-greedy method criterion;
s85: executing the selected action in the current state to obtain a new state and a corresponding reward;
s86: updating the value function with the new state and its corresponding reward:
Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + α[r_k + γ max Q_k(s_{k+1}, a_{k+1}) - Q_k(s_k, a_k)] (10)
where s_k represents the state at the k-th sampling instant, a_k represents the action at the k-th sampling instant, Q_{k+1}(s_k, a_k) denotes the Q value at the (k+1)-th sampling instant after taking action a_k in state s_k and transferring to state s_{k+1}, Q_k(s_k, a_k) denotes the Q value of taking action a_k in state s_k at the k-th sampling instant, Q_k(s_{k+1}, a_{k+1}) represents the Q value after taking the action a_{k+1} of the next sampling instant at the k-th sampling instant, α represents the learning rate, α ∈ (0, 1], r represents the reward, k represents the sampling instant, and γ represents the attenuation factor;
s87: let s = s', where s' is the state obtained after taking the action, and judge whether s' is the termination state; if yes, the current iteration round is finished and the process returns to step S82; if not, the process returns to step S84.
CN202110720142.5A 2021-06-28 2021-06-28 Proportional valve position control method based on Q-Learning Active CN113467226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110720142.5A CN113467226B (en) 2021-06-28 2021-06-28 Proportional valve position control method based on Q-Learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110720142.5A CN113467226B (en) 2021-06-28 2021-06-28 Proportional valve position control method based on Q-Learning

Publications (2)

Publication Number Publication Date
CN113467226A true CN113467226A (en) 2021-10-01
CN113467226B CN113467226B (en) 2023-05-16

Family

ID=77873541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110720142.5A Active CN113467226B (en) 2021-06-28 2021-06-28 Proportional valve position control method based on Q-Learning

Country Status (1)

Country Link
CN (1) CN113467226B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114384795A (en) * 2021-12-21 2022-04-22 卓品智能科技无锡有限公司 Proportional solenoid valve current vibration control method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101025630A (en) * 2006-02-13 2007-08-29 Smc株式会社 Positioning control system and filter
CN103995463A (en) * 2014-05-30 2014-08-20 北京敬科海工科技有限公司 Method for performing position servo driving on electro-hydraulic proportional valve based on hybrid control
CN106246986A (en) * 2016-06-27 2016-12-21 南昌大学 Integrated form vibrating signal self adaptation proportional valve amplifier
CN109597331A (en) * 2018-11-28 2019-04-09 南京晨光集团有限责任公司 A kind of pilot-operated type double electromagnet proportional valve controller based on current automatic adaptation
US10900343B1 (en) * 2018-01-25 2021-01-26 National Technology & Engineering Solutions Of Sandia, Llc Control systems and methods to enable autonomous drilling
CN112305920A (en) * 2020-12-28 2021-02-02 南京理工大学 Reinforced learning platform for design of closed-loop jet rock suppression controller

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101025630A (en) * 2006-02-13 2007-08-29 Smc株式会社 Positioning control system and filter
CN103995463A (en) * 2014-05-30 2014-08-20 北京敬科海工科技有限公司 Method for performing position servo driving on electro-hydraulic proportional valve based on hybrid control
CN106246986A (en) * 2016-06-27 2016-12-21 南昌大学 Integrated form vibrating signal self adaptation proportional valve amplifier
US10900343B1 (en) * 2018-01-25 2021-01-26 National Technology & Engineering Solutions Of Sandia, Llc Control systems and methods to enable autonomous drilling
CN109597331A (en) * 2018-11-28 2019-04-09 南京晨光集团有限责任公司 A kind of pilot-operated type double electromagnet proportional valve controller based on current automatic adaptation
CN112305920A (en) * 2020-12-28 2021-02-02 南京理工大学 Reinforced learning platform for design of closed-loop jet rock suppression controller

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张活俊; 江励; 汤健华; 黄辉: "Research on active force control of a polishing robot based on reinforcement learning" (基于强化学习的抛光机器人主动力控制研究), 机械工程师 (Mechanical Engineer) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114384795A (en) * 2021-12-21 2022-04-22 卓品智能科技无锡有限公司 Proportional solenoid valve current vibration control method
CN114384795B (en) * 2021-12-21 2022-10-25 卓品智能科技无锡股份有限公司 Proportional solenoid valve current vibration control method

Also Published As

Publication number Publication date
CN113467226B (en) 2023-05-16

Similar Documents

Publication Publication Date Title
CN108008627B (en) Parallel optimization reinforcement learning self-adaptive PID control method
CN107544255B (en) State compensation model control method for batch injection molding process
CN116610025B (en) PID controller optimization method based on improved meta heuristic algorithm
CN113467226A (en) Proportional valve position control method based on Q-Learning
CN112571420B (en) Dual-function model prediction control method under unknown parameters
CN112417000B (en) Time sequence missing value filling method based on bidirectional cyclic codec neural network
CN108845501A (en) A kind of blast-melted quality adaptation optimal control method based on Lazy learning
CN110703718A (en) Industrial process control method based on signal compensation
CN109581863A (en) A kind of intelligence complex fertilizer control system liquid manure consistency controller
CN115167102A (en) Reinforced learning self-adaptive PID control method based on parallel dominant motion evaluation
CN113625547B (en) Main valve position control method of controller
CN106094524A (en) The rapid model prediction control method compensated based on input trend
CN116880191A (en) Intelligent control method of process industrial production system based on time sequence prediction
CN108696199B (en) Control method for improving position precision of permanent magnet synchronous linear motor
CN110597055A (en) Uncertainty-resistant 2D piecewise affine intermittent process minimum-maximum optimization prediction control method
Liu et al. Data learning‐based model‐free adaptive control and application to an NAO robot
CN116127681A (en) Method for driving self-evolution of digital twin of building by hybrid algorithm
Boubaker et al. Variable structure estimation and control of nonlinear distributed parameter bioreactors
Rizvi et al. Output Feedback Reinforcement Learning Control for Linear Systems
CN109491245A (en) A kind of disturbance compensation control method of CSTR system
CN110705700A (en) Drift prediction method of soil temperature sensor
CN114721268B (en) Heuristic iterative learning control method for pressure robustness of injection molding nozzle
CN117930775A (en) Intelligent control method for natural gas unmanned station yard and related components
CN113568309B (en) On-line space-time control method for temperature field
CN115877811B (en) Flow process treatment method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant