CN110320796A - Electrical control method, device and equipment based on PID controller
- Publication number
- CN110320796A (application number CN201910722233.5A)
- Authority
- CN
- China
- Prior art keywords
- function
- pid controller
- value
- electrical control
- parameter setting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B11/00—Automatic controllers
- G05B11/01—Automatic controllers electric
- G05B11/36—Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential
- G05B11/42—Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential for obtaining a characteristic which is both proportional and time-dependent, e.g. P. I., P. I. D.
Abstract
The invention discloses an electrical control method, device, equipment and computer readable storage medium based on a PID controller, comprising the following steps: constructing a target function of a PID controller parameter setting problem, wherein the undetermined parameters of the target function comprise N single-dimensional variables; discretizing the N single-dimensional variables, and learning them with N agents according to a reinforcement learning algorithm to determine their target values; determining the optimal value of the target function according to the target values of the N single-dimensional variables, completing parameter setting of the PID controller; and controlling a control object in the electrical control system by using the PID controller after parameter setting is finished. The method, device, equipment and computer readable storage medium provided by the invention improve the parameter optimization efficiency and convergence speed of the PID controller as well as its control performance.
Description
Technical Field
The invention relates to the technical field of process control, in particular to an electrical control method, device and equipment based on a PID controller and a computer readable storage medium.
Background
Process control technology in the electrical field has advanced considerably in recent decades. Researchers around the world have developed various control methods, including adaptive control, artificial neural network control and fuzzy control. Among them, the most basic and most widely used is the single-loop PID controller. Composed of proportional (P), integral (I) and derivative (D) units, the PID controller is simple in structure and maintains good robustness even when the operating conditions vary over a wide range. Therefore, how to optimize and tune the proportional, integral and derivative parameters of the PID controller is one of the key topics of control research.
In the prior art, parameter optimization methods fall into two categories: traditional tuning and intelligent tuning. Traditional tuning methods include the Ziegler-Nichols algorithm and optimal PID parameter tuning based on the integral of squared time-weighted error (ISTE) criterion. Their adjustment process is complex, oscillation and large overshoot are difficult to avoid, and optimal PID parameters are hard to obtain. Researchers have therefore devoted themselves to developing intelligent PID parameter tuning methods based on various heuristic algorithms. Artificial intelligence techniques such as the genetic algorithm (GA), particle swarm optimization (PSO), fuzzy inference algorithms and artificial neural networks have been applied to the tuning of PID parameters. These techniques can effectively overcome the above disadvantages of traditional tuning and enhance the control performance of the PID controller. However, they have drawbacks of their own. For example, the GA requires a cumbersome encoding process, and both GA and PSO depend on the concept of a population, which leads to long convergence times and slow convergence; for fuzzy inference, it is difficult to find a systematic method for selecting the algorithm parameters; and a neural network comprises multiple layers of neurons, with no explicit method for determining the number of hidden-layer neurons or their initial weights.
From the above, it can be seen that improving the optimization efficiency of PID controller parameters is a problem that remains to be solved.
Disclosure of Invention
The invention aims to provide an electrical control method, device, equipment and computer readable storage medium based on a PID controller, so as to solve the problems that the PID controller parameter tuning methods of the prior art involve complex processes, long convergence times and slow convergence rates.
In order to solve the above technical problem, the present invention provides an electrical control method based on a PID controller, comprising: constructing a target function of a PID controller parameter setting problem, wherein undetermined parameters of the target function comprise N single-dimensional variables; discretizing the N single-dimensional variables, and learning the N single-dimensional variables by adopting N agents according to a reinforcement learning algorithm to determine target values of the N single-dimensional variables; according to the target values of the N single-dimensional variables, determining the optimal value of the target function, and completing parameter setting of the PID controller; and controlling a control object in the electrical control system by using the PID controller after parameter setting is finished.
Preferably, constructing the target function of the PID controller parameter setting problem, wherein the undetermined parameters of the target function comprise N single-dimensional variables, includes:
constructing a target function of the PID controller parameter setting problem:

J = ∫₀^∞ ( ω₁|e(t)| + ω₂u²(t) ) dt + ω₃·t_u, with the overshoot penalty term ω₄|ey(t)| added to the integrand when ey(t) < 0;

wherein e(t) is the tracking error of the PID controller; u(t) is the output of the PID controller; t_u is the rise time for the output signal y(t) of the electrical control system to rise from 10% to 90% of its steady-state value; ey(t) = y(t) − y(t−1) is the overshoot penalty: when ey(t) ≥ 0, ω₄ = 0; when ey(t) < 0, ω₄ ≠ 0 and ω₄ >> ω₁. The undetermined parameters of the target function comprise a first weight ω₁, a second weight ω₂, a third weight ω₃ and a fourth weight ω₄.
Preferably, the step of learning each single-dimensional variable by each agent comprises:
S1: determining the current solution of the objective function after the i-th agent (i = 1, 2, …, N) selects a current action from the set of actions that can be taken for the i-th single-dimensional variable;
S2: determining the reward function value corresponding to the current behavior according to the calculation rule of a preset reward function and the current solution of the objective function;
S3: updating the value function corresponding to the current behavior according to the reward function value, so that the i-th agent selects the next behavior according to the updated value function;
S4: adding different disturbances to all dimensions of the current solution;
S5: repeating steps S1 to S4 until the number of cycles reaches a preset number, completing the learning of the i-th single-dimensional variable.
Preferably, the determining, according to the calculation rule of the preset reward function and the current solution of the objective function, the reward function value corresponding to the current behavior includes:
determining, according to a preset formula, the value R_k of the reward function for the k-th step of the current behavior of the i-th agent; wherein J_k is the current solution of the objective function, and J_best is the initial optimal solution of the objective function.
Preferably, the updating the value function corresponding to the current behavior according to the reward function value includes:
according to V_{k+1}(i,j) = (1 − α)V_k(i,j) + α[R_k + (1 − λ₂)L_max(i,j) + λ₂L_min(i,j)], updating the value function corresponding to the current behavior;
wherein V_k(i,j) is the value function corresponding to the current behavior; L_l(i,j) is a path value, with l = 1 indicating the path to the left and l = 2 the path to the right; λ₁ is the weight of the value function V_k(i,j); α is the learning rate; L_max(i,j) and L_min(i,j) are the maximum and minimum of the two path values, respectively; λ₂ and (1 − λ₂) are the weights of the minimum and maximum path values, with (1 − λ₂) > λ₂.
The invention also provides an electrical control device based on the PID controller, which comprises:
the construction module, used for constructing a target function of the PID controller parameter setting problem, wherein the undetermined parameters of the target function comprise N single-dimensional variables;
the reinforcement learning module is used for discretizing the N single-dimensional variables, learning the N single-dimensional variables by adopting N agents according to a reinforcement learning algorithm, and determining target values of the N single-dimensional variables;
the setting module is used for determining the optimal value of the objective function according to the target values of the N single-dimensional variables to complete parameter setting of the PID controller;
and the electrical control module is used for controlling a control object in the electrical control system by utilizing the PID controller after parameter setting is finished.
Preferably, the construction module is specifically configured to:
constructing a target function of the PID controller parameter setting problem:

J = ∫₀^∞ ( ω₁|e(t)| + ω₂u²(t) ) dt + ω₃·t_u, with the overshoot penalty term ω₄|ey(t)| added to the integrand when ey(t) < 0;

wherein e(t) is the tracking error of the PID controller; u(t) is the output of the PID controller; t_u is the rise time for the output signal y(t) of the electrical control system to rise from 10% to 90% of its steady-state value; ey(t) = y(t) − y(t−1) is the overshoot penalty: when ey(t) ≥ 0, ω₄ = 0; when ey(t) < 0, ω₄ ≠ 0 and ω₄ >> ω₁. The undetermined parameters of the target function comprise a first weight ω₁, a second weight ω₂, a third weight ω₃ and a fourth weight ω₄.
Preferably, the reinforcement learning module includes:
a selecting unit, configured to determine the current solution of the objective function after the i-th agent (i = 1, 2, …, N) selects a current behavior from the set of behaviors that can be taken for the i-th single-dimensional variable;
the determining unit is used for determining a reward function value corresponding to the current behavior according to a calculation rule of a preset reward function and the current solution of the target function;
the updating unit is used for updating the value function corresponding to the current behavior according to the reward function value so that the ith agent can select the next behavior according to the updated value function;
the disturbance unit is used for adding different disturbances to all dimensions of the current solution;
a loop unit, configured to repeat the following until the number of cycles reaches a preset number, thereby completing the learning of the i-th single-dimensional variable: determining the current solution of the objective function after the i-th agent (i = 1, 2, …, N) selects a current behavior from the set of behaviors that can be taken for the i-th single-dimensional variable; determining the reward function value corresponding to the current behavior according to the calculation rule of the preset reward function and the current solution of the objective function; updating the value function corresponding to the current behavior according to the reward function value so that the i-th agent selects the next behavior according to the updated value function; and adding different disturbances to all dimensions of the current solution.
The invention also provides an electrical control device based on the PID controller, which comprises:
a memory for storing a computer program; and a processor for implementing the steps of the electrical control method based on the PID controller when executing the computer program.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a PID controller-based electrical control method as described above.
According to the electrical control method based on the PID controller, after the parameters of the PID controller are set with a reinforcement learning algorithm, the tuned PID controller is used to control the control object in the electrical control system. The electrical control system consists of a PID controller and a controlled electrical system. When setting the parameters of the PID controller with the reinforcement learning algorithm, the N single-dimensional variables in the target function of the parameter setting problem are first discretized. Then, according to the reinforcement learning algorithm, N agents are adopted to learn the discretized N single-dimensional variables respectively and determine their target values, from which the optimal value of the target function is determined and the parameter setting of the PID controller is completed. The method provided by the invention sets the PID controller parameters based on the reinforcement learning algorithm; it does not depend on a population, adopts the idea of repeated trial and error, and completes parameter setting through the interaction of agents with an unknown environment. When the unknown environment changes, i.e., the controlled system is dynamically time-varying, the PID controller parameters can be optimized online and the system tracked and controlled. The invention improves the optimization efficiency and convergence speed of the PID controller parameters as well as the control performance of the PID controller, and is convenient to implement and practical; moreover, the reinforcement learning algorithm has a certain randomness and can escape local optima.
Drawings
In order to more clearly illustrate the embodiments or technical solutions of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is a flow chart of a first embodiment of a PID controller based electrical control method provided by the present invention;
FIG. 2 is a schematic diagram of an electrical control system;
FIG. 3 is a step response comparison graph of the system corresponding to the GA algorithm, the PSO algorithm, and the RL algorithm;
FIG. 4 is a comparison graph of the average objective function convergence results of the GA algorithm, the PSO algorithm and the RL algorithm which are respectively optimized for 10 times;
FIG. 5 is a flow chart of a method for each agent to learn each single-dimensional control variable;
FIG. 6 is a block diagram of an electrical control device based on a PID controller according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide an electrical control method, a device, equipment and a computer readable storage medium based on a PID controller, which improve the parameter optimization efficiency, the convergence rate and the control performance of the PID controller.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating a PID controller-based electrical control method according to a first embodiment of the present invention; the specific operation steps are as follows:
step S101: constructing a target function of a PID controller parameter setting problem, wherein undetermined parameters of the target function comprise N single-dimensional variables;
An electrical control system composed of a PID controller and a control object is shown in fig. 2, wherein C(s) is the transfer function of the PID controller and G(s) is the transfer function of the control object. The input and output of the whole electrical control system are r(t) and y(t), respectively. The input signal of the electrical control system serves as the reference for the output signal of the control object; the difference between the input signal and the output signal is the tracking error e(t) of the PID controller, and the output u(t) of the PID controller is the input of the control object. In the control process, given the input signal, i.e., the reference signal, and the control object, the PID controller makes the output of the control object approach the input signal by processing the tracking error. The specific processing is represented by the following Laplace transfer function:

C(s) = K_p + K_i/s + K_d·s

wherein K_p, K_i and K_d are the proportional, integral and derivative parameters to be determined, respectively.
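To make the control law concrete, the following is a minimal discrete-time sketch of one PID step computation; the class name, gains and sample period are illustrative placeholders, not values from the patent.

```python
class PID:
    """Minimal discrete-time sketch of u(t) = Kp*e(t) + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error: float) -> float:
        # Accumulate the integral term and approximate the derivative
        # with a backward difference over one sample period.
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```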
The environmental state is quantitatively represented by the objective function value. The target function of the PID controller parameter setting problem is:

J = ∫₀^∞ ( ω₁|e(t)| + ω₂u²(t) ) dt + ω₃·t_u

wherein e(t) is the tracking error of the PID controller; u(t) is the output of the PID controller; t_u is the rise time for the output signal y(t) of the electrical control system to rise from 10% to 90% of its steady-state value. In order to avoid large overshoot, an overshoot penalty term is set in the objective function: ey(t) = y(t) − y(t−1) is the overshoot penalty, and the term ω₄|ey(t)| is added to the integrand when ey(t) < 0; when ey(t) ≥ 0, ω₄ = 0; when ey(t) < 0, ω₄ ≠ 0 and ω₄ >> ω₁. The undetermined parameters of the target function comprise a first weight ω₁, a second weight ω₂, a third weight ω₃ and a fourth weight ω₄.
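A discrete approximation of this objective can be sketched as follows, assuming sampled signals e, u, y, a sample period dt and a separately measured 10%-90% rise time t_u; the default weights are the ones used later in this embodiment.

```python
import numpy as np

def objective(e, u, y, dt, t_u, w1=0.999, w2=0.001, w3=2.0, w4=100.0):
    """Discrete sketch of J: sum of w1*|e| + w2*u^2 (plus the overshoot
    penalty w4*|ey| whenever ey = y(t) - y(t-1) < 0), times the sample
    period, plus w3 times the 10%-90% rise time t_u."""
    e, u, y = map(np.asarray, (e, u, y))
    ey = np.diff(y, prepend=y[0])                      # ey(t) = y(t) - y(t-1)
    penalty = np.where(ey < 0, w4 * np.abs(ey), 0.0)   # active only on overshoot
    return float(np.sum(w1 * np.abs(e) + w2 * u**2 + penalty) * dt + w3 * t_u)
```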
Step S102: discretizing the N single-dimensional variables, and learning the N single-dimensional variables by adopting N agents according to a reinforcement learning algorithm to determine target values of the N single-dimensional variables;
Assuming the dimension of the parameter X to be determined is N, it can be expressed as X = [x₁, x₂, …, x_N]. The reinforcement learning algorithm adopts N agents; each agent is responsible for optimizing one single-dimensional variable, and the N agents perform learning steps on their respective variables in turn. The feasible domain of the i-th single-dimensional variable is discretized into D_i (i = 1, 2, …, N) cells, and the set of actions that the i-th agent can take is A_i = {1, 2, …, D_i}.
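A sketch of this discretization, with illustrative (assumed) bounds for each variable; agent i's action set is then just the index set of its D_i cells.

```python
import numpy as np

def make_grid(lower: float, upper: float, D: int) -> np.ndarray:
    """Split the feasible interval of one single-dimensional variable
    into D cells and return the cell centers."""
    edges = np.linspace(lower, upper, D + 1)
    return (edges[:-1] + edges[1:]) / 2.0

# Example: N = 4 pending parameters, each with an assumed feasible
# interval [0, 10] discretized into D_i = 50 cells; the i-th agent's
# action set A_i is then {0, ..., D_i - 1}.
grids = [make_grid(0.0, 10.0, 50) for _ in range(4)]
action_sets = [range(len(g)) for g in grids]
```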
Step S103: according to the target values of the N single-dimensional variables, determining the optimal value of the target function, and completing parameter setting of the PID controller;
The control object in the electrical control system is set to a given plant transfer function; a PID controller based on the reinforcement learning algorithm is adopted for control, and the input signal is a unit step signal.
In this embodiment, the PSO algorithm and the GA algorithm are selected for comparison with the PID parameter setting method based on the reinforcement learning algorithm; the three algorithms optimize the PID controller parameters of the same system. The parameters of the reinforcement learning algorithm are set as follows: N = 4, λ₁ = 0.5, λ₂ = 0.25, and the remaining two parameters are set to 1 and 10. The parameters of the PSO algorithm are set as: acceleration factors c₁ = c₂ = 2, population size 100. The parameters of the GA algorithm are set as: crossover and mutation rates 0.9 and 0.01, respectively, population size 100. The weights in the objective function are set as: ω₁ = 0.999, ω₂ = 0.001, ω₃ = 2, ω₄ = 100.
Fig. 3 shows the step response of the system under the three algorithms; the RL algorithm is the reinforcement learning algorithm of this embodiment. At time 0 the input signal steps from 0 to 1. The results show that all three algorithms eliminate oscillation and overshoot before the output signal reaches the steady-state value of 1; their performance is close, and the step response completes within 0.1 second. The system response of the RL algorithm almost coincides with that of PSO; the response of the GA algorithm is slightly faster in the rise phase but settles to the steady value slightly more slowly than the other two. Fig. 4 shows the average objective-function convergence over 10 optimization runs for the three algorithms. The PSO and RL algorithms converge to smaller objective function values than the GA algorithm, and the RL algorithm converges about twice as fast as PSO.
Step S104: and controlling a control object in the electrical control system by using the PID controller after parameter setting is finished.
The method provided by this embodiment optimizes the parameters of the PID controller with a reinforcement learning algorithm. The reinforcement learning algorithm avoids the population introduced by genetic and particle swarm algorithms, instead introducing agents to optimize the target function, which improves the convergence rate of the optimization process. Meanwhile, the reinforcement learning algorithm has a certain randomness and can escape local optima; it is also convenient to implement and practical.
Based on step S102 of the above embodiment, this embodiment provides the steps by which each agent learns each single-dimensional variable. Referring to fig. 5, fig. 5 is a flowchart of the method for each agent to learn each single-dimensional control variable; the specific optimization steps include:
S501: determining the current solution of the objective function after the i-th agent (i = 1, 2, …, N) selects a current action from the set of actions that can be taken for the i-th single-dimensional variable;
the environmental state is quantitatively represented by the objective function value.
S502: determining a reward function value corresponding to the current behavior according to a calculation rule of a preset reward function and the current solution of the objective function;
The environment is fed back to the agent as a reward function, characterizing whether the agent has taken a favorable action that shifts the environment to a better state. The value R_k of the reward function for the k-th step of the current behavior of the i-th agent is determined according to a preset formula, wherein J_k is the current solution of the objective function and J_best is the initial optimal solution of the objective function.
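The reward formula itself is given as an equation in the original that is not reproduced in this text; the relative-improvement form below is only an assumed stand-in consistent with the stated inputs J_k and J_best.

```python
def reward(J_k: float, J_best: float) -> float:
    # Assumed stand-in for the patent's reward formula: positive when the
    # current solution J_k improves on the best solution J_best, negative
    # when it is worse. Only the inputs are taken from the source.
    return (J_best - J_k) / J_best
```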
S503: updating the value function corresponding to the current behavior according to the reward function value so that the ith agent selects the next behavior according to the updated value function;
the value function corresponding to the jth action of the ith agent is V (i, j). And the agent updates the value function corresponding to the currently taken action according to the reward function and the path value. The path value refers to the value of the ith agent in the ith dimension of the variable, selecting a path search continuing to the left or right from the current jth grid, and is expressed as Ll(i, j), l 1 indicates a path to the left, and l 2 indicates a path to the right. The left and right path values are calculated by the value function corresponding to n lattices adjacent to the left and right sides of the jth lattice, as follows:
wherein,for the m-th element after descending the function of adjacent n values, λ1Is the weight of a value function and satisfies
In summary, the update rule of the value function is:

V_{k+1}(i,j) = (1 − α)V_k(i,j) + α[R_k + (1 − λ₂)L_max(i,j) + λ₂L_min(i,j)]

wherein V_k(i,j) is the value function corresponding to the current behavior; L_l(i,j) is a path value, with l = 1 indicating the path to the left and l = 2 the path to the right; λ₁ is the weight of the value function V_k(i,j); α is the learning rate, characterizing the impact of the new information [R_k + (1 − λ₂)L_max(i,j) + λ₂L_min(i,j)] on the value function; L_max(i,j) and L_min(i,j) are the maximum and minimum of the two path values, respectively; λ₂ and (1 − λ₂) are the weights of the minimum and maximum path values, with (1 − λ₂) > λ₂.
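This update rule translates directly into code; the defaults for α and λ₂ below are illustrative (λ₂ = 0.25 matches the embodiment's setting).

```python
def update_value(V, i, j, R_k, L_max, L_min, alpha=0.5, lam2=0.25):
    """Implements V_{k+1}(i,j) = (1-alpha)*V_k(i,j)
       + alpha*[R_k + (1-lam2)*L_max(i,j) + lam2*L_min(i,j)].
    V is a per-agent table of value functions (list of lists)."""
    V[i][j] = (1 - alpha) * V[i][j] + alpha * (
        R_k + (1 - lam2) * L_max + lam2 * L_min
    )
    return V[i][j]
```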
The agent selects the next action according to the updated value function. Before that, the agent first selects a path; the selection probability is governed by a temperature τ_k, with 0 < τ_k ≤ 1. When τ_k is large, the selection probabilities of the remaining, less favorable behaviors are close to one another; when τ_k is close to 0, the probability of each behavior being selected varies with the magnitude of its value function. The value of τ_k decreases gradually with the number of learning steps.
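The patent's selection formula is not reproduced in this text; a Boltzmann (softmax) rule over the two path values reproduces the described behavior, near-uniform choice for large τ_k and near-greedy choice as τ_k approaches 0, and is offered here only as an assumed sketch.

```python
import numpy as np

def select_path(L_left: float, L_right: float, tau_k: float) -> int:
    # Assumed Boltzmann selection over the two path values: large tau_k
    # gives near-uniform probabilities, while tau_k -> 0 strongly favours
    # the larger path value. Returns 1 for the left path, 2 for the right.
    prefs = np.array([L_left, L_right]) / tau_k
    p = np.exp(prefs - prefs.max())       # subtract max for numerical stability
    p /= p.sum()
    return int(np.random.choice([1, 2], p=p))
```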
Then, starting from the j-th cell, the agent selects one behavior from the n adjacent cells on the selected path, and the value of the next single-dimensional variable is determined randomly within the selected cell.
S504: adding different disturbances to all dimensions of the current solution;
In order to increase the diversity of solutions and prevent the algorithm from falling into a local optimum, after the N-th agent completes a learning step, the algorithm adds different disturbances to all dimensions of the current solution:

X ← X + Δ, Δ = [Δ₁, Δ₂, …, Δ_N]

wherein the disturbance quantity Δ is generated according to a covariance evolution algorithm.
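The covariance evolution algorithm that generates Δ is not detailed here; as a simplified stand-in, the sketch below draws each Δ_i from a zero-mean Gaussian with an illustrative scale.

```python
import numpy as np

def perturb(X: np.ndarray, sigma: float = 0.01) -> np.ndarray:
    # Simplified stand-in for the covariance-evolution perturbation:
    # X <- X + Delta, with Delta drawn i.i.d. from N(0, sigma^2) per
    # dimension. sigma is an illustrative scale, not a source value.
    return X + np.random.normal(0.0, sigma, size=len(X))
```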
S505: repeating steps S501 to S504 until the number of cycles reaches a preset number, completing the learning of the i-th single-dimensional variable.
Steps S501 to S504 are repeated; each time the i-th agent completes a learning step, the counter k is incremented by 1. When k reaches the preset threshold k_max, the algorithm terminates.
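Putting steps S501 to S504 together, a hypothetical outer loop might look as follows; `select_action`, `grid` and `update` are assumed per-agent helpers wrapping the path selection and value-function update sketched above, and `evaluate` maps a full parameter vector to the objective value J.

```python
def tune(agents, X, k_max, evaluate):
    """Hypothetical outer loop over steps S501-S504; reward() and
    perturb() are the sketches defined earlier."""
    J_best = evaluate(X)
    for k in range(k_max):                     # S505: preset number of cycles
        for i, agent in enumerate(agents):     # agents learn in turn
            j = agent.select_action()          # S501: pick a cell index
            X[i] = agent.grid[j]               # candidate value for x_i
            J_k = evaluate(X)                  # current solution
            R_k = reward(J_k, J_best)          # S502: reward value
            agent.update(j, R_k)               # S503: value-function update
            J_best = min(J_best, J_k)
        X = perturb(X)                         # S504: diversify the solution
    return X, J_best
```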
This embodiment provides a PID controller parameter setting method based on a reinforcement learning algorithm. The method does not depend on a population; it adopts the idea of repeated trial and error and completes parameter setting through the interaction of agents with an unknown environment. When the unknown environment changes, i.e., the controlled system is dynamically time-varying, the reinforcement learning algorithm optimizes the PID parameters online to perform tracking control of the system.
Referring to fig. 6, fig. 6 is a block diagram of an electrical control apparatus based on a PID controller according to an embodiment of the present invention; the specific device may include:
the building module 100 is configured to build a target function of a PID controller parameter tuning problem, where a to-be-determined parameter of the target function includes N single-dimensional variables;
the reinforcement learning module 200 is configured to perform discretization on the N single-dimensional variables, and learn the N single-dimensional variables by using N agents according to a reinforcement learning algorithm, so as to determine target values of the N single-dimensional variables;
a setting module 300, configured to determine an optimal value of the objective function according to the target values of the N single-dimensional variables, and complete parameter setting of the PID controller;
and the electrical control module 400 is used for controlling a control object in the electrical control system by using the PID controller after parameter setting is finished.
The electrical control apparatus based on the PID controller of this embodiment is used for implementing the aforementioned electrical control method based on the PID controller; specific implementations in the apparatus can therefore be found in the preceding example portions of the method. For example, the construction module 100, the reinforcement learning module 200, the setting module 300 and the electrical control module 400 are respectively used for implementing steps S101, S102, S103 and S104 of the method, so their specific implementations can refer to the descriptions of the corresponding embodiments and are not repeated here.
The specific embodiment of the present invention further provides an electrical control device based on a PID controller, including: a memory for storing a computer program; and the processor is used for realizing the steps of the electrical control method based on the PID controller when executing the computer program.
The present invention further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the electrical control method based on the PID controller.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The electrical control method, device, equipment and computer readable storage medium based on the PID controller provided by the invention are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
Claims (10)
1. An electrical control method based on a PID controller is characterized by comprising the following steps:
constructing a target function of a PID controller parameter setting problem, wherein undetermined parameters of the target function comprise N single-dimensional variables;
discretizing the N single-dimensional variables, and learning the N single-dimensional variables by adopting N agents according to a reinforcement learning algorithm to determine target values of the N single-dimensional variables;
according to the target values of the N single-dimensional variables, determining the optimal value of the target function, and completing parameter setting of the PID controller;
and controlling a control object in the electrical control system by using the PID controller after parameter setting is finished.
2. The method of claim 1, wherein the constructing an objective function of a PID controller parameter tuning problem, wherein the pending parameters of the objective function include N single-dimensional variables comprises:
constructing a target function of the PID controller parameter setting problem:

J = ∫₀^∞ ( ω₁|e(t)| + ω₂u²(t) ) dt + ω₃·t_u, with the overshoot penalty term ω₄|ey(t)| added to the integrand when ey(t) < 0;

wherein e(t) is the tracking error of the PID controller; u(t) is the output of the PID controller; t_u is the rise time for the output signal y(t) of the electrical control system to rise from 10% to 90% of its steady-state value; ey(t) = y(t) − y(t−1) is the overshoot penalty: when ey(t) ≥ 0, ω₄ = 0; when ey(t) < 0, ω₄ ≠ 0 and ω₄ >> ω₁. The undetermined parameters of the target function comprise a first weight ω₁, a second weight ω₂, a third weight ω₃ and a fourth weight ω₄.
3. The method of claim 2, wherein the step of each agent learning each single-dimensional variable comprises:
S1: determining the current solution of the objective function after the i-th agent (i = 1, 2, …, N) selects a current action from the set of actions that can be taken for the i-th single-dimensional variable;
S2: determining the reward function value corresponding to the current behavior according to the calculation rule of a preset reward function and the current solution of the objective function;
S3: updating the value function corresponding to the current behavior according to the reward function value, so that the i-th agent selects the next behavior according to the updated value function;
S4: adding different disturbances to all dimensions of the current solution;
S5: repeating steps S1 to S4 until the number of cycles reaches a preset number, completing the learning of the i-th single-dimensional variable.
4. The method of claim 3, wherein determining the reward function value corresponding to the current behavior according to the preset reward function calculation rule and the current solution of the objective function comprises:
determining, according to a preset formula, the value R_k of the reward function for the k-th step of the current behavior of the i-th agent; wherein J_k is the current solution of the objective function, and J_best is the initial optimal solution of the objective function.
5. The method of claim 4, wherein said updating the value function corresponding to the current behavior according to the reward function value comprises:
according to V_{k+1}(i,j) = (1 − α)V_k(i,j) + α[R_k + (1 − λ₂)L_max(i,j) + λ₂L_min(i,j)], updating the value function corresponding to the current behavior;
wherein V_k(i,j) is the value function corresponding to the current behavior; L_l(i,j) is a path value, with l = 1 indicating the path to the left and l = 2 the path to the right; λ₁ is the weight of the value function V_k(i,j); α is the learning rate; L_max(i,j) and L_min(i,j) are the maximum and minimum of the two path values, respectively; λ₂ and (1 − λ₂) are the weights of the minimum and maximum path values, with (1 − λ₂) > λ₂.
6. An electrical control apparatus based on a PID controller, comprising:
the construction module, used for constructing a target function of the PID controller parameter setting problem, wherein the undetermined parameters of the target function comprise N single-dimensional variables;
the reinforcement learning module is used for discretizing the N single-dimensional variables, learning the N single-dimensional variables by adopting N agents according to a reinforcement learning algorithm, and determining target values of the N single-dimensional variables;
the setting module is used for determining the optimal value of the objective function according to the target values of the N single-dimensional variables to complete parameter setting of the PID controller;
and the electrical control module is used for controlling a control object in the electrical control system by utilizing the PID controller after parameter setting is finished.
7. The apparatus of claim 6, wherein the construction module is specifically configured to:
constructing a target function of the PID controller parameter setting problem:

J = ∫₀^∞ ( ω₁|e(t)| + ω₂u²(t) ) dt + ω₃·t_u, with the overshoot penalty term ω₄|ey(t)| added to the integrand when ey(t) < 0;

wherein e(t) is the tracking error of the PID controller; u(t) is the output of the PID controller; t_u is the rise time for the output signal y(t) of the electrical control system to rise from 10% to 90% of its steady-state value; ey(t) = y(t) − y(t−1) is the overshoot penalty: when ey(t) ≥ 0, ω₄ = 0; when ey(t) < 0, ω₄ ≠ 0 and ω₄ >> ω₁. The undetermined parameters of the target function comprise a first weight ω₁, a second weight ω₂, a third weight ω₃ and a fourth weight ω₄.
8. The apparatus of claim 7, wherein the reinforcement learning module comprises:
a selecting unit, configured to determine the current solution of the objective function after the i-th agent (i = 1, 2, …, N) selects a current behavior from the set of behaviors that can be taken for the i-th single-dimensional variable;
the determining unit is used for determining a reward function value corresponding to the current behavior according to a calculation rule of a preset reward function and the current solution of the target function;
the updating unit is used for updating the value function corresponding to the current behavior according to the reward function value so that the ith agent can select the next behavior according to the updated value function;
the disturbance unit is used for adding different disturbances to all dimensions of the current solution;
a loop unit, configured to repeat the following until the number of cycles reaches a preset number, thereby completing the learning of the i-th single-dimensional variable: determining the current solution of the objective function after the i-th agent (i = 1, 2, …, N) selects a current behavior from the set of behaviors that can be taken for the i-th single-dimensional variable; determining the reward function value corresponding to the current behavior according to the calculation rule of the preset reward function and the current solution of the objective function; updating the value function corresponding to the current behavior according to the reward function value so that the i-th agent selects the next behavior according to the updated value function; and adding different disturbances to all dimensions of the current solution.
9. An electrical control apparatus based on a PID controller, comprising:
a memory for storing a computer program;
a processor for implementing the steps of a PID controller based electrical control method according to any of claims 1 to 5 when executing said computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of a PID controller-based electrical control method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910722233.5A CN110320796A (en) | 2019-08-06 | 2019-08-06 | Electrical control method, device and equipment based on PID controller |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110320796A true CN110320796A (en) | 2019-10-11 |
Family
ID=68125626
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910722233.5A Pending CN110320796A (en) | 2019-08-06 | 2019-08-06 | Electrical control method, device and equipment based on PID controller |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110320796A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104200096A (en) * | 2014-08-29 | 2014-12-10 | 中国南方电网有限责任公司超高压输电公司昆明局 | Lightning arrester grading ring optimization method based on differential evolutionary algorithm and BP neural network |
CN105911868A (en) * | 2016-06-15 | 2016-08-31 | 南京工业大学 | Multi-batch intermittent reactor two-dimensional iterative learning feedback control method |
EP3357651A2 (en) * | 2017-02-06 | 2018-08-08 | Seiko Epson Corporation | Control device, robot, and robot system |
CN106896716A (en) * | 2017-04-17 | 2017-06-27 | 华北电力大学(保定) | Micro-capacitance sensor alternating current-direct current section transverter pid parameter optimization method based on grey wolf algorithm |
Non-Patent Citations (1)
Title |
---|
X. Y. Shang et al., "Parameter Optimization of PID Controllers by Reinforcement Learning", 5th Computer Science and Electronic Engineering Conference (CEEC) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118011783A (en) * | 2024-04-09 | 2024-05-10 | 天津仁爱学院 | Construction environment PID control method based on improved barrel jellyfish algorithm |
CN118011783B (en) * | 2024-04-09 | 2024-06-04 | 天津仁爱学院 | Construction environment PID control method based on improved barrel jellyfish algorithm |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20191011 |