CN112180996A - Liquid level fault-tolerant control method based on reinforcement learning

Info

Publication number: CN112180996A
Application number: CN202010947314.8A
Authority: CN
Inventors: 张大鹏
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2020-09-10
Filing date: 2020-09-10
Publication date: 2021-01-05

Abstract

A liquid level fault-tolerant control system based on reinforcement learning is used for fault-tolerant control of a multi-tank system. Its only precondition is that a fault has been detected; no further diagnosis of the fault is needed. This precondition is easy to satisfy in fault detection and diagnosis, for which mature methods such as PCA and Bayesian decision making already exist. In addition, the evaluation-action structure is implemented mainly with artificial neural networks, which are robust and can effectively suppress the influence of noise. The invention can perform control directly from acquired data without any training samples, so that the container liquid level reaches the same index as in the fault-free case. The control quantity obtained by the method is the optimal control quantity under the fault, and yields the best performance index attainable under the fault.

Description

Liquid level fault-tolerant control method based on reinforcement learning

Technical Field

The invention relates to a liquid level fault-tolerant control method, and in particular to a liquid level fault-tolerant control method based on reinforcement learning.

Background

In industrial and agricultural production, the liquid level of a container often has to be controlled, for example the automatic control of the water stored in a water tank, a pool or a boiler. A large number of mature products are available for controlling the water level of a single container. In industrial production (for example, crystallizer liquid level control), however, several containers are often connected through valves, and the set heights of the different containers are maintained by regulating the valve openings so that the liquid-phase reaction in the containers runs at higher efficiency. The liquid level nevertheless often deviates from its set value because of detection signal deviation caused by reduced sensor precision, degraded valve controller performance, or liquid leakage from a tank caused by seal failure, and this lowers the liquid-phase reaction efficiency. The commonly adopted remedy is fault-tolerant control.

In a multi-container arrangement, the level of each container is held at its set position by adjusting the openings of the connecting valves, but the liquid level often deviates from its set value because of detection signal deviation caused by reduced sensor precision, degraded valve controller performance, or liquid leakage from a tank caused by seal failure.

Traditional data-driven artificial intelligence methods require training on sample data in advance. However, because the time at which a fault occurs is uncertain and the fault type is random, it is difficult to obtain enough valid fault data to serve as training samples.

Disclosure of Invention

The invention aims to provide a liquid level fault-tolerant control method based on reinforcement learning which, by adjusting the flow, keeps the liquid level of each container at its fault-free height even when a fault occurs.

The technical scheme adopted by the invention is as follows: a liquid level fault-tolerant control system based on reinforcement learning, used for fault-tolerant control of a multi-tank system, comprises: an information acquisition unit for acquiring the liquid level information of each water tank at different moments; a fault-free model for predicting the liquid level information of all the water tanks at the moment k+1 according to the liquid level information of all the water tanks at the moment k and the control information of the frequency converter output by the information acquisition unit; an evaluation network for estimating the total values V(k) and V(k+1) of the control variables of the frequency converter corresponding to the moment k and the moment k+1 according to the liquid level information of all the water tanks at the moment k and the moment k+1 output by the information acquisition unit; a stage value evaluation unit for evaluating the stage value R(k) according to the liquid level information of all the water tanks at the moment k+1 output by the information acquisition unit and the liquid level information of all the water tanks at the moment k+1 predicted by the fault-free model; a deviation estimation unit for outputting a fitness function used for weight updating according to the stage value output by the stage value evaluation unit and the total values V(k) and V(k+1) output by the evaluation network; a weight updating unit for updating the weights of the evaluation network according to the fitness function output by the deviation estimation unit, the evaluation network outputting the weights related to the control quantity u(k) of the frequency converter from all the updated weights output by the weight updating unit; and an action network for obtaining, by iterative updating according to the weights related to the control quantity u(k) of the frequency converter output by the evaluation network and the liquid level information of all the water tanks at the moment k output by the information acquisition unit, the optimal control variable with which the frequency converter of the multi-tank system is controlled.

The liquid level fault-tolerant control method based on reinforcement learning has the following advantages:

1. The method of the invention does not need to diagnose and locate the fault type and position in advance, and directly applies a data-driven method to fault-tolerant control of the container liquid level.

2. The method of the invention resolves the conflict between the need of traditional artificial intelligence methods for enough training samples and the difficulty of obtaining such sample data from a real system. It can perform control directly from acquired data without any training samples, so that the container liquid level reaches the same index as in the fault-free case.

3. The control quantity obtained by the method is the optimal control quantity under the fault, and yields the best performance index attainable under the fault.

Drawings

FIG. 1 is a schematic diagram of a control structure of a liquid level fault-tolerant control method based on reinforcement learning according to the present invention;

FIG. 2 is a schematic diagram of an evaluation neural network in accordance with the present invention;

FIG. 3 is a schematic diagram of an acting neural network according to the present invention;

FIG. 4 is a schematic structural diagram of a three-tank system according to an embodiment of the present invention;

FIG. 5 is a fluid level diagram of T3 in an actuator output deviation fault scenario in accordance with an embodiment of the present invention;

FIG. 6 is a diagram illustrating the evolution of various states in an actuator output bias fault scenario in accordance with an embodiment of the present invention;

FIG. 7 is a control variable plot in an actuator output deviation fault scenario in accordance with an embodiment of the present invention;

FIG. 8 is a liquid level diagram of T3 in an actuator stuck fault scenario in accordance with an embodiment of the present invention;

FIG. 9 is a diagram illustrating the evolution of each state in a stuck-at fault scenario of an actuator according to an embodiment of the present invention;

FIG. 10 is a control variable diagram in an actuator stuck fault scenario in accordance with an embodiment of the present invention;

FIG. 11 is a liquid level diagram of T3 when, in a fault similar to the stuck fault, the opening of the submersible pump 1 is reduced to 30% according to an embodiment of the present invention;

FIG. 12 is a diagram illustrating the evolution of each state when, in a fault similar to the stuck fault, the opening of the submersible pump 1 is reduced to 30% according to an embodiment of the present invention;

FIG. 13 is a control variable diagram when, in a fault similar to the stuck fault, the opening of the submersible pump 1 is reduced to 30% according to an embodiment of the present invention;

FIG. 14 is a liquid level diagram of T3 in a leak fault scenario according to an embodiment of the present invention;

FIG. 15 is a diagram illustrating the evolution of various states in a leakage failure scenario in accordance with an embodiment of the present invention;

FIG. 16 is a graph of control variables in a leakage fault scenario in accordance with an embodiment of the present invention.

Detailed Description

The liquid level fault-tolerant control method based on reinforcement learning of the invention is described in detail below with reference to the embodiments and the accompanying drawings.

As shown in fig. 1, the liquid level fault-tolerant control system based on reinforcement learning of the present invention is a fault-tolerant control system for a multi-tank system, and includes: an information acquisition unit 1 for acquiring the liquid level information of each water tank at different moments; a fault-free model 3 for predicting the liquid level information of all the water tanks at the moment k+1 according to the liquid level information of all the water tanks at the moment k and the control information of the frequency converter output by the information acquisition unit 1; an evaluation network 2 for estimating the total values V(k) and V(k+1) of the control variables of the frequency converter corresponding to the moment k and the moment k+1 according to the liquid level information of all the water tanks at the moment k and the moment k+1 output by the information acquisition unit 1; a stage value evaluation unit 4 for evaluating the stage value R(k) according to the liquid level information of all the water tanks at the moment k+1 output by the information acquisition unit 1 and the liquid level information of all the water tanks at the moment k+1 predicted by the fault-free model 3; a deviation estimation unit 5 for outputting a fitness function used for weight updating according to the stage value output by the stage value evaluation unit 4 and the total values V(k) and V(k+1) output by the evaluation network 2; a weight updating unit 6 for updating the weights of the evaluation network 2 according to the fitness function output by the deviation estimation unit 5, the evaluation network 2 outputting the weights related to the control quantity u(k) of the frequency converter from all the updated weights output by the weight updating unit 6; and an action network 7 for obtaining, by iterative updating according to the weights related to the control quantity u(k) of the frequency converter output by the evaluation network 2 and the liquid level information of all the water tanks at the moment k output by the information acquisition unit 1, the optimal control variable with which the frequency converter of the multi-tank system is controlled. Wherein:

1) the liquid level information of all the water tanks at the moment k output by the information acquisition unit 1 is represented as x (k), and the liquid level information of all the water tanks at the moment k +1 is represented as x (k + 1).

2) The fault-free model 3 is represented as follows:

…

in the formula, x₁, x₂, x₃ and x_n are the liquid level information of the water tank T1, the water tank T2, the water tank T3 and the water tank Tn, respectively; S₁, S₂, S₃ and S_n are the sectional areas of the water tank T1, the water tank T2, the water tank T3 and the water tank Tn, respectively; g is the gravitational acceleration; and the parameters are

…

In these formulas, R₁₂ is the flow resistance between the tank 1 and the tank 2, R₃₂ is the flow resistance between the tank 3 and the tank 2, R₄₃ is the flow resistance between the tank 4 and the tank 3, R_n-1,n is the flow resistance between the tank n-1 and the tank n, R_n is the drainage resistance of the water tank Tn, and ρ is the liquid density;

Q₁ and Q₂ are the flow rates of the submersible pump 1 and the submersible pump 2.

3) The evaluation network 2 is shown in fig. 2 and includes an input layer, a hidden layer and an output layer which are fully connected in sequence, wherein the input layer has n+2 neurons, the hidden layer has 2n neurons, and the output layer has 1 neuron.
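As a minimal sketch of the layer sizes described above, the following NumPy code builds an evaluation network with n+2 inputs, 2n hidden neurons and one output. The sigmoid hidden activation, the linear output, the absence of bias terms, and the names `init_critic` and `critic_value` are assumptions made for illustration; the text specifies only the neuron counts.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_critic(n, rng):
    """Random weights for an (n+2)-2n-1 network (no biases, by assumption)."""
    w_c1 = rng.uniform(-1.0, 1.0, size=(2 * n, n + 2))  # input -> hidden
    w_c2 = rng.uniform(-1.0, 1.0, size=(1, 2 * n))      # hidden -> output
    return w_c1, w_c2

def critic_value(w_c1, w_c2, x, u):
    """Estimate the total value from n liquid levels x and 2 control quantities u."""
    z = np.concatenate([x, u])          # n + 2 inputs
    s_out_c = sigmoid(w_c1 @ z)         # 2n hidden outputs
    return float((w_c2 @ s_out_c)[0])   # single output neuron

# Illustrative values only: three tanks, two pump commands.
rng = np.random.default_rng(0)
w_c1, w_c2 = init_critic(3, rng)
v = critic_value(w_c1, w_c2, np.array([0.10, 0.12, 0.09]), np.array([0.5, 0.5]))
```

For n = 3 this gives a 5-6-1 network, so `w_c1` has shape (6, 5) and `w_c2` has shape (1, 6).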

4) The stage value evaluation unit 4 is composed of the following formula:

wherein R(k) is the stage value; x(k+1) is the liquid level information of all the water tanks at the moment k+1; and x_r(k+1) is the liquid level information of all the water tanks at the moment k+1 predicted by the fault-free model 3.

5) The deviation estimating unit 5 is composed of the following formula:

TE＝V(k)-R(k)+γV(k+1)

wherein TE is the deviation; V(k) and V(k+1) are the total values of the control variables of the frequency converter corresponding to the moment k and the moment k+1, respectively; R(k) is the stage value; and γ is the discount factor.
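The deviation can be computed directly from these three quantities. The short sketch below evaluates the formula exactly as it is written above; the function name and the numeric values are illustrative assumptions.

```python
def deviation(v_k, v_k1, r_k, gamma):
    """TE = V(k) - R(k) + gamma * V(k+1), term by term as stated in the text."""
    return v_k - r_k + gamma * v_k1

# Illustrative numbers only.
te = deviation(v_k=2.0, v_k1=1.0, r_k=0.5, gamma=0.5)
```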

6) The weight updating unit 6 includes:

(1) representing the weights W_c1 between the input layer and the hidden layer and the weights W_c2 between the hidden layer and the output layer of the evaluation network 2 as particle positions, and randomly selecting the initial particle values;

(2) the fitness function for each particle is calculated according to the following formula:

wherein FF(z(k)) is the fitness function of the i-th particle at the p-th iteration; V(k) and V(k+1) are the total values of the control variables of the frequency converter corresponding to the moment k and the moment k+1, respectively; R(k) is the stage value; γ is the discount factor; and z(k) is the combination of the liquid level information x(k) of all the water tanks at the moment k and the control information u(k) of the frequency converter;

(3) obtaining the current optimal position p_best of the particle swarm and the optimal position g_best experienced by the whole particle swarm according to the fitness function values and the following formula, and updating p_best and g_best:

wherein i is the particle index, m is the number of particles, and p is the number of iterations;

(4) updating the particle velocity v_i and the particle position z_i according to the basic iteration formula of particle swarm optimization:

wherein z denotes the particle position, v denotes the particle velocity, ω is the inertia weight, c₁ and c₂ are the acceleration constants, rand1 and rand2 are two random numbers generated independently of each other in [0,1], p_best is the current optimal position of the particle swarm, g_best is the optimal position experienced by the whole particle swarm, and (p) denotes the number of iterations;

(5) repeating steps (2) to (4) until convergence, and recording the current optimal position of the particle swarm as g_best1;

(6) redistributing the particles using random numbers in [0,1] to obtain new fitness function values;

(7) repeating steps (2) to (4) until convergence, and recording the current optimal position of the particle swarm as g_best2;

(8) if the optimal position g_best2 is better than the optimal position g_best1, replacing g_best1 with g_best2; otherwise, keeping g_best1 unchanged;

(9) repeating steps (2) to (8) until no better optimal position can be found, and obtaining the final position g_best1;

(10) taking the particle position g_best1 as the solution for the weights W_c1 and W_c2 of the evaluation network.
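Steps (1) to (10) can be sketched as a particle swarm run wrapped in a restart loop. The code below is only an illustration under several assumptions: the fitness is minimized; a fixed iteration budget stands in for "until convergence"; the search stops after `patience` restarts without improvement; p_best is kept per particle, as in the usual textbook algorithm; and the velocity and position update is the standard textbook form, since the formula for step (4) is not reproduced in the text. All function names, parameter values and the quadratic test fitness are hypothetical.

```python
import numpy as np

def pso_run(fitness, dim, rng, m=20, iters=100, omega=0.729, c1=1.494, c2=1.494):
    """Steps (1)-(5): one particle swarm run; returns (g_best, best fitness)."""
    z = rng.uniform(0.0, 1.0, size=(m, dim))   # random initial positions in [0, 1]
    v = np.zeros((m, dim))                     # particle velocities
    p_best = z.copy()                          # best position seen by each particle
    p_val = np.array([fitness(zi) for zi in z])
    g_idx = int(np.argmin(p_val))
    g_best, g_val = p_best[g_idx].copy(), float(p_val[g_idx])
    for _ in range(iters):                     # fixed budget instead of a convergence test
        r1 = rng.random((m, dim))
        r2 = rng.random((m, dim))
        v = omega * v + c1 * r1 * (p_best - z) + c2 * r2 * (g_best - z)
        z = z + v
        val = np.array([fitness(zi) for zi in z])
        improved = val < p_val
        p_best[improved] = z[improved]
        p_val[improved] = val[improved]
        g_idx = int(np.argmin(p_val))
        if p_val[g_idx] < g_val:
            g_best, g_val = p_best[g_idx].copy(), float(p_val[g_idx])
    return g_best, g_val

def pso_with_restarts(fitness, dim, rng, patience=3, max_restarts=50):
    """Steps (6)-(10): rerun from redistributed particles, keep the better g_best."""
    g_best1, val1 = pso_run(fitness, dim, rng)
    stale = 0
    for _ in range(max_restarts):
        g_best2, val2 = pso_run(fitness, dim, rng)
        if val2 < val1:                        # step (8): replace g_best1 by g_best2
            g_best1, val1, stale = g_best2, val2, 0
        else:
            stale += 1
        if stale >= patience:                  # step (9): no better position found
            break
    return g_best1, val1

# Illustrative fitness: squared distance to a known point.
rng = np.random.default_rng(0)
target = np.array([0.3, 0.7])
g_best, best_val = pso_with_restarts(lambda z: float(np.sum((z - target) ** 2)), 2, rng)
```

In the scheme described above, a particle position would hold the flattened weights W_c1 and W_c2, and the fitness would be built from the deviation TE rather than from the toy quadratic used here.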

7) The action network 7 is shown in fig. 3 and comprises an input layer, a hidden layer and an output layer which are fully connected in sequence, wherein the input layer has n neurons, the hidden layer has n+3 neurons, the output layer has 2 neurons, the weight between the input layer and the hidden layer is W_a1, and the weight between the hidden layer and the output layer is W_a2.

8) The weight changes of the action network 7 are:

ΔW_a2＝l·W_c2·[s_out,c(1-s_out,c)]·W_c1,u·s_out,a

ΔW_a1＝l·W_c2·[s_out,c(1-s_out,c)]·W_c1,u·W_a2·[s_out,a(1-s_out,a)]·x(k)

wherein l is the learning rate; W_c2 is the weight between the hidden layer and the output layer in the evaluation network 2; s_out,c and s_out,a are the outputs of the nonlinear functions in the evaluation network 2 and the action network 7, respectively; W_c1,u is the hidden-layer weight of the evaluation network 2 associated with the control quantity u(k) of the frequency converter; W_a2 is the weight between the hidden layer and the output layer in the action network 7; x(k) is the liquid level information of all the water tanks at the moment k; and W_c1,u, W_c2, s_out,c, s_out,a and W_a2 are all obtained from the evaluation network and the action network;

the weights W_a1 and W_a2 of the action network are updated according to the following formulas:

W_a1’＝W_a1+ΔW_a1

W_a2’＝W_a2+ΔW_a2

In the formula, W_a1' and W_a2' are the updated weights between the input layer and the hidden layer and between the hidden layer and the output layer of the action network 7, respectively.
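One way to read the two weight-change formulas is as the sensitivity of the evaluation network output to u(k), propagated back through the action network. The sketch below follows that reading with explicit matrix shapes. The sigmoid hidden units, the linear output layers, the missing bias terms, the ordering of the critic input as [x(k), u(k)], and the name `actor_update` are all assumptions; the text gives the formulas only in the compact form above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def actor_update(w_a1, w_a2, w_c1, w_c2, x, l):
    """Return the updated (W_a1', W_a2') and the control u(k) used for the update."""
    n = x.size
    s_out_a = sigmoid(w_a1 @ x)                        # action hidden outputs, n + 3 values
    u = w_a2 @ s_out_a                                 # 2 control quantities (linear output assumed)
    s_out_c = sigmoid(w_c1 @ np.concatenate([x, u]))   # evaluation hidden outputs, 2n values
    w_c1_u = w_c1[:, n:]                               # evaluation input weights attached to u(k)
    # W_c2 . [s_out,c (1 - s_out,c)] . W_c1,u : sensitivity of the value to u(k)
    dv_du = (w_c2.ravel() * s_out_c * (1.0 - s_out_c)) @ w_c1_u
    delta_w_a2 = l * np.outer(dv_du, s_out_a)
    delta_w_a1 = l * np.outer((w_a2.T @ dv_du) * s_out_a * (1.0 - s_out_a), x)
    return w_a1 + delta_w_a1, w_a2 + delta_w_a2, u

# Illustrative three-tank sizes and random weights.
n = 3
rng = np.random.default_rng(1)
w_c1 = rng.uniform(-1, 1, (2 * n, n + 2))
w_c2 = rng.uniform(-1, 1, (1, 2 * n))
w_a1 = rng.uniform(-1, 1, (n + 3, n))
w_a2 = rng.uniform(-1, 1, (2, n + 3))
x_k = np.array([0.10, 0.12, 0.09])
new_w_a1, new_w_a2, u_k = actor_update(w_a1, w_a2, w_c1, w_c2, x_k, l=0.05)
```

The update adds the weight change, as in the formulas above. Whether l should be positive or negative then depends on whether the total value is treated as something to increase or to reduce, which the text does not state.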

Experimental validation is given below

The proposed method was verified using a three-tank system as the experimental platform. The three-tank system consists of the water tanks T1, T2 and T3, the submersible pumps 1 and 2 with flow rates Q1 and Q2 controlled by a digital controller, the connecting valves CV1, CV2 and CV3, the leakage valves LV1, LV2 and LV3, and pipelines. The liquid level of each of the water tanks T1, T2 and T3 can be obtained separately by a liquid level meter. The three tanks T1, T2 and T3 have plumbing connections of the same size. The system operates with the connecting valves open and the leakage valves closed. Thus, the liquid in the reservoir flows into the tank through the connecting valve CV3 and re-enters the tank body through the submersible pumps 1 and 2. The inter-tank flow resistance can be changed by manually adjusting the openings of the connecting valves CV1, CV2, CV3 and the leakage valves LV1, LV2, LV3. The submersible pumps 1 and 2 are each controlled by a separate frequency converter: the flow rate of each pump is determined by its rotating speed, and the rotating speed is set by its frequency converter. The controller outputs a frequency converter control signal of 0-5 V. The relation between the pump flow and the frequency control signal was obtained by additional experiments. In what follows, for clarity, we omit the frequency converter and use the pump flow rate instead of the rotating speed as the control variable of the controlled object. The structure is shown in fig. 4.

The following formula gives the fault-free model of the three-tank system:

In the formula, each variable has the same meaning as described above.

In the fault-free case, we use a PID controller for the submersible pump 1 and keep the submersible pump 2 at 50% opening (2.5 V, the midpoint of the 0-5 V soft channel control signal), so that the liquid level in T3 is held at the reference level. We call this steady state the fault-free standard state. When a fault occurs, an FTC (fault-tolerant control) controller with two outputs (the flows of the submersible pumps 1 and 2) replaces the previous controllers (the PID for the submersible pump 1 and the fixed 50% opening for the submersible pump 2). Our goal is to maintain the reference level in T3 by controlling the flow rates of the submersible pump 1 and the submersible pump 2, respectively.

A. Actuator output deviation fault scenario

Starting from the fault-free case, an actuator fault of the submersible pump 1 is simulated by changing the relation between the pump flow and the frequency control signal, so that the flow increases or decreases relative to the initial value of the controller output. In this way the actuator output deviation fault is produced in software, and damage to the real actuator is avoided. After sample 100, the actuator of the submersible pump 1 develops a fault in which the flow is greater than the initial set value, at 12 L/min (converted according to the relation between the pump flow and the frequency control signal). The liquid level of T3, the state evolution and the control variables are shown in fig. 5, 6 and 7, respectively.
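A software-injected output deviation of this kind can be sketched as a wrapper around the commanded flow. Treating the fault as an additive flow offset that starts after sample 100 is an assumption, as are the function name and the offset value used in the example, which is purely illustrative.

```python
def inject_output_deviation(q_cmd, k, offset, fault_start=100):
    """Flow delivered at sample k for a commanded flow q_cmd (same units as offset).

    Before and at fault_start the command passes through unchanged;
    afterwards the delivered flow deviates by a constant offset.
    """
    return q_cmd + offset if k > fault_start else q_cmd

# Illustrative values only.
before = inject_output_deviation(10.0, k=100, offset=2.0)
after = inject_output_deviation(10.0, k=101, offset=2.0)
```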

The first and second curves represent the cases without and with FTC, respectively. Fig. 6 shows that the states x1, x2 and x3 remain stable in the fault-free state until the fault occurs, and the liquid level of T3 remains at the reference level (fig. 5). When the fault occurs at sample 100, the state x1 rises because more flow enters T1, and without FTC the states x2 and x3 also rise because of the coupling between the tanks. After a transient, the liquid level of T3 settles into another steady state, having risen from 10 cm to 15 cm. The FTC controller was designed as per procedure 1, using a 3-10-2 feedforward neural network. 100 data points were selected from the training set, and training was carried out with the Levenberg-Marquardt algorithm. The trained neural network is used as the FTC. The algorithm clearly restores the liquid level of T3.

Further explanation of the control variables is given on the basis of fig. 7. In fig. 7, the horizontal axis is the sampling time and the vertical axis is the pump flow rate. The zero point of the vertical scale represents the pump flow rate in the normal state. We use a zero-referenced scale instead of the actual flow because, in the absence of a fault, the standard state varies with the reference level of T3. Negative values mean less flow and positive values mean more flow than in the fault-free normal state. The first and second curves represent the flow rates without and with FTC, respectively. It can be seen that pump 1 reduces its flow in response to the actuator output fault. Pump 2 also decreases its output to keep the T3 liquid level at the reference level.

B. Actuator stuck fault scenario

After sample 100, pump 1 experienced a stuck fault at 60% opening (3 V on the 0-5 V frequency converter control signal), meaning that pump 1 was stuck because control was lost. Fig. 8, 9 and 10 show the liquid level of T3, the state evolution and the control variables, respectively.
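The opening percentages quoted in the experiments (50% as 2.5 V, 60% as 3 V, 30% as 1.5 V) are consistent with a linear map onto the 0-5 V control signal. The sketch below uses that map to emulate a stuck actuator; the linear relation, the function names and the parameter defaults are assumptions for illustration.

```python
def opening_to_signal(opening_percent, full_scale_volts=5.0):
    """Map a pump opening in percent to the 0-5 V control signal (linear map assumed)."""
    return opening_percent * full_scale_volts / 100.0

def inject_stuck(signal_cmd, k, stuck_opening=60.0, fault_start=100):
    """After fault_start the pump ignores the command and holds the stuck signal."""
    return opening_to_signal(stuck_opening) if k > fault_start else signal_cmd

# Commanded 2.5 V (50% opening) before and after the fault.
healthy = inject_stuck(2.5, k=50)
stuck = inject_stuck(2.5, k=150)
```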

As can be seen from the first curve of fig. 9 (without FTC), the liquid levels of T1, T2 and T3 rise slowly after the stuck fault occurs, in line with the characteristics of the equipment. Fig. 10 shows the control variables with FTC (second curve) and without FTC (first curve). Because pump 1 is stuck, it loses its regulating function, and its first and second curves coincide. Pump 2 responds to the fault by stopping its delivery flow for a period of time to release the accumulated liquid, and then provides a steady flow to maintain the level of T3. Fig. 8 shows that under FTC control the liquid level of T3 can be maintained at the fault-free level (red curve).

In a fault similar to the stuck fault, the opening of pump 1 is reduced to 30% (1.5 V on the 0-5 V frequency converter control signal), and the liquid level cannot be maintained. The liquid level of T3, the state evolution and the control variables are shown in fig. 11, 12 and 13, respectively. The first curve represents the case without FTC and the second curve the case with FTC. As can be seen from fig. 13, the flow release is less intense and shorter than with the 60% stuck opening. Owing to the difference in the steady state, a deviation between the first curve and the second curve also appears.

C. Leakage fault scenario

We also produced a leakage fault by partially opening LV2 of the T3 tank. As shown in fig. 14, if the fault-free controller is kept (first curve), the liquid level in T3 drops from 9 cm to 7 cm because of the leakage. The second curve of fig. 14 shows the liquid level trend of T3 with FTC: the liquid level in T3 remains at the fault-free level owing to the action of the FTC. The state evolution and the control variables are shown in fig. 15 and 16.

Claims

1. A liquid level fault-tolerant control system based on reinforcement learning, characterized in that the fault-tolerant control system is used for a multi-tank system and comprises: an information acquisition unit (1) for acquiring the liquid level information of each water tank at different moments; a fault-free model (3) for predicting the liquid level information of all the water tanks at the moment k+1 according to the liquid level information of all the water tanks at the moment k and the control information of the frequency converter output by the information acquisition unit (1); an evaluation network (2) for estimating the total values V(k) and V(k+1) of the control variables of the frequency converter corresponding to the moment k and the moment k+1 according to the liquid level information of all the water tanks at the moment k and the moment k+1 output by the information acquisition unit (1); a stage value evaluation unit (4) for evaluating the stage value R(k) according to the liquid level information of all the water tanks at the moment k+1 output by the information acquisition unit (1) and the liquid level information of all the water tanks at the moment k+1 predicted by the fault-free model (3); a deviation estimation unit (5) for outputting a fitness function used for weight updating according to the stage value output by the stage value evaluation unit (4) and the total values V(k) and V(k+1) output by the evaluation network (2); a weight updating unit (6) for updating the weights of the evaluation network (2) according to the fitness function output by the deviation estimation unit (5), the evaluation network (2) outputting the weights related to the control quantity u(k) of the frequency converter from all the updated weights output by the weight updating unit (6); and an action network (7) for obtaining, by iterative updating according to the weights related to the control quantity u(k) of the frequency converter output by the evaluation network (2) and the liquid level information of all the water tanks at the moment k output by the information acquisition unit (1), the optimal control variable with which the frequency converter of the multi-tank system is controlled.

2. The liquid level fault-tolerant control method based on reinforcement learning of claim 1, wherein the liquid level information of all water tanks at time k output by the information acquisition unit (1) is represented as x (k), and the liquid level information of all water tanks at time k +1 is represented as x (k + 1).

3. The reinforcement learning-based liquid level fault-tolerant control method according to claim 1, characterized in that the fault-free model (3) is represented as follows:

…

in the formula, x₁, x₂, x₃ and x_n are the liquid level information of the water tank T1, the water tank T2, the water tank T3 and the water tank Tn, respectively; S₁, S₂, S₃ and S_n are the sectional areas of the water tank T1, the water tank T2, the water tank T3 and the water tank Tn, respectively; g is the gravitational acceleration; and the parameters are

…

In these formulas, R₁₂ is the flow resistance between the tank 1 and the tank 2, R₃₂ is the flow resistance between the tank 3 and the tank 2, R₄₃ is the flow resistance between the tank 4 and the tank 3, R_n-1,n is the flow resistance between the tank n-1 and the tank n, R_n is the drainage resistance of the water tank Tn, and ρ is the liquid density.

4. The reinforcement learning-based liquid level fault-tolerant control method according to claim 1, characterized in that the evaluation network (2) comprises an input layer, a hidden layer and an output layer which are all connected in sequence, wherein the input layer has n +2 neurons, the hidden layer has 2n neurons, and the output layer has 1 neuron.

5. The liquid level fault-tolerant control method based on reinforcement learning of claim 1, characterized in that the stage value evaluation unit (4) is composed of the following formula:

wherein R(k) is the stage value; x(k+1) is the liquid level information of all the water tanks at the moment k+1; and x_r(k+1) is the liquid level information of all the water tanks at the moment k+1 predicted by the fault-free model (3).

6. The fault-tolerant liquid level control method based on reinforcement learning of claim 1, wherein the deviation estimation unit (5) is formed by the following formula:

TE＝V(k)-R(k)+γV(k+1)

7. The liquid level fault-tolerant control method based on reinforcement learning of claim 1, wherein the weight updating unit (6) comprises:

1) representing the weights W_c1 between the input layer and the hidden layer and the weights W_c2 between the hidden layer and the output layer of the evaluation network (2) as particle positions, and randomly selecting the initial particle values;

2) calculating the fitness function of each particle according to the following formula:

3) obtaining the current optimal position p_best of the particle swarm and the optimal position g_best experienced by the whole particle swarm according to the fitness function values and the following formula, and updating p_best and g_best:

4) updating the particle velocity v_i and the particle position z_i according to the basic iteration formula of particle swarm optimization:

5) repeating steps 2) to 4) until convergence, and recording the current optimal position of the particle swarm as g_best1;

6) redistributing the particles using random numbers in [0,1] to obtain new fitness function values;

7) repeating steps 2) to 4) until convergence, and recording the current optimal position of the particle swarm as g_best2;

8) if the optimal position g_best2 is better than the optimal position g_best1, replacing g_best1 with g_best2; otherwise, keeping g_best1 unchanged;

9) repeating steps 2) to 8) until no better optimal position can be found, and obtaining the final position g_best1;

10) taking the particle position g_best1 as the solution for the weights W_c1 and W_c2 of the evaluation network.

8. The liquid level fault-tolerant control method based on reinforcement learning of claim 1, characterized in that the action network (7) comprises an input layer, a hidden layer and an output layer which are fully connected in sequence, wherein the input layer has n neurons, the hidden layer has n+3 neurons, the output layer has 2 neurons, the weight between the input layer and the hidden layer is W_a1, and the weight between the hidden layer and the output layer is W_a2.

9. The reinforcement learning-based liquid level fault-tolerant control method according to claim 1 or 8, characterized in that the weight changes of the action network (7) are:

ΔW_a2＝l·W_c2·[s_out,c(1-s_out,c)]·W_c1,u·s_out,a

ΔW_a1＝l·W_c2·[s_out,c(1-s_out,c)]·W_c1,u·W_a2·[s_out,a(1-s_out,a)]·x(k)

wherein l is the learning rate; W_c2 is the weight between the hidden layer and the output layer in the evaluation network (2); s_out,c and s_out,a are the outputs of the nonlinear functions in the evaluation network (2) and the action network (7), respectively; W_c1,u is the hidden-layer weight of the evaluation network (2) associated with the control quantity u(k) of the frequency converter; W_a2 is the weight between the hidden layer and the output layer in the action network (7); x(k) is the liquid level information of all the water tanks at the moment k; and W_c1,u, W_c2, s_out,c, s_out,a and W_a2 are all obtained from the evaluation network and the action network;

W_a1’＝W_a1+ΔW_a1

W_a2’＝W_a2+ΔW_a2

In the formula, W_a1' and W_a2' are updated weights between the input layer and the hidden layer and weights between the hidden layer and the output layer in the action network (7).