WO2021068748A1

WO2021068748A1 - Pid control method and apparatus, and video encoding and decoding system

Info

Publication number: WO2021068748A1
Application number: PCT/CN2020/117211
Authority: WO
Inventors: 周益民; 程学理
Original assignee: Oppo广东移动通信有限公司
Priority date: 2019-10-09
Filing date: 2020-09-23
Publication date: 2021-04-15
Also published as: CN112631120B; CN112631120A

Abstract

The present application provides a Proportional Integral Differential (PID) control method and apparatus, and a video encoding and decoding system. The PID control method comprises: determining, according to a difference between an output value of a PID controller in a target control system and a theoretical value, the current reward value corresponding to PID parameters of the PID controller, the PID parameters comprising at least one of a proportional gain, an integral gain, and a differential gain; in the case that the current reward value is less than 0, updating the PID parameters according to the current accumulated reward value and the current reward value, the current accumulated reward value being determined according to the current reward value and the previous accumulated reward value; and performing next PID control to the PID controller in the target control system according to the updated PID parameters. According to the method and apparatus and the system in embodiments of the present application, the difficulty in setting the PID parameters is reduced, and the control performance and versatility of the PID controller are improved.

Description

PID control method, device and video coding and decoding system

Technical field

This application relates to the field of control, and more specifically, to a PID control method, device, and video codec system.

Background

At present, proportional-integral-derivative (PID) control is widely used in the field of control because its algorithm is simple and reliable. The PID parameters of a traditional PID controller can include a proportional gain, an integral gain and a derivative gain. These parameters directly determine the control performance of the PID controller, so parameter tuning is the core of control system design. The traditional tuning process relies heavily on the experience of practitioners, who debug the PID parameters repeatedly until the requirements of the actual application are met; the resulting workload is extremely large.

Summary of the invention

The present application provides a PID control method, a PID control device, and a video encoding and decoding system, which help reduce the difficulty of setting PID parameters and improve the control performance and versatility of the PID controller.

In a first aspect, a PID control method is provided. The PID control method includes: determining, according to the difference between the output value of the PID controller in the target control system and the theoretical value, the current reward value corresponding to the PID parameters of the PID controller, the PID parameters including at least one of a proportional gain, an integral gain, and a derivative gain; if the current reward value is less than 0, updating the PID parameters according to the current accumulated reward value and the current reward value, wherein the current accumulated reward value is determined based on the current reward value and the previous accumulated reward value; and performing the next PID control on the PID controller in the target control system according to the updated PID parameters.

In a second aspect, a PID control device is provided. The PID control device includes: a determining unit, configured to determine the current reward value corresponding to the PID parameters of the PID controller according to the difference between the output value of the PID controller in the target control system and the theoretical value, the PID parameters including at least one of a proportional gain, an integral gain, and a derivative gain; an update unit, configured to update the PID parameters according to the current accumulated reward value and the current reward value when the current reward value is less than 0, wherein the current accumulated reward value is determined based on the current reward value and the previous accumulated reward value; and a control unit, configured to perform the next PID control on the PID controller in the target control system according to the updated PID parameters.

In a third aspect, a video encoding and decoding system is provided, including the PID control device in the second aspect or its implementation manners.

In a fourth aspect, a PID control device is provided, which includes a processor and a memory. The memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the above-mentioned first aspect or each of its implementation manners.

In a fifth aspect, a chip is provided, which is used to implement the method in the above-mentioned first aspect or each of its implementation manners.

Specifically, the chip includes: a processor, configured to call and run a computer program from the memory, so that the device installed with the chip executes the method in the above-mentioned first aspect or each of its implementation manners.

In a sixth aspect, a computer-readable storage medium is provided for storing a computer program that enables a computer to execute the method in the above-mentioned first aspect or each of its implementation manners.

In a seventh aspect, a computer program product is provided, including computer program instructions that cause a computer to execute the method in the first aspect or its implementation manners.

In an eighth aspect, a computer program is provided, which when running on a computer, causes the computer to execute the method in the first aspect or its implementation manners.

Through the above technical solution, the current reward value is determined from the difference between the output value of the PID controller and the theoretical value. When the current reward value is less than 0, the PID parameters are updated in combination with the historical reward values, and the next PID control is performed on the PID controller according to the updated PID parameters. The PID parameters can thus be adjusted adaptively rather than tuned from human experience, which greatly reduces the difficulty of PID parameter tuning while achieving a significant adjustment effect.

Description of the drawings

Figure 1 is a schematic structural diagram of the PID control system.

Fig. 2 is a schematic block diagram of a PID control method provided by an embodiment of the present application.

Fig. 3 is a schematic diagram of a negative correlation between the reward value this time and the absolute value of the difference in an embodiment of the present application.

FIG. 4 is a schematic diagram of another negative correlation between the reward value this time and the absolute value of the difference in an embodiment of the present application.

Fig. 5 is a schematic block diagram of a PID control device provided by an embodiment of the present application.

Fig. 6 is a schematic block diagram of a video encoding and decoding system provided by an embodiment of the present application.

Fig. 7 is another schematic block diagram of a PID control device provided by an embodiment of the present application.

FIG. 8 is a schematic block diagram of a chip provided by an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are a part of the embodiments of the present application, not all of the embodiments. Regarding the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of this application.

At present, the level of industrial automation has become an important indicator to measure the modernization level of various industries. At the same time, the development of control theory has also experienced three stages: classical control theory, modern control theory and intelligent control theory. A typical example of intelligent control is a fuzzy fully automatic washing machine. The control system can be divided into an open-loop control system and a closed-loop control system. The PID control is a closed-loop control method, which combines the proportion (P), integral (I) and derivative (D) of the input and output deviation to form a control quantity through a linear combination to control the controlled object. Typically, a control system includes a PID controller and a controlled object, as shown in Figure 1.

Usually PID control includes three parts: proportional, integral and derivative, but there are other types of controllers in practice. The three control laws of proportional, integral and derivative can be used alone or in combination. For example, proportional P controller, proportional integral PI controller, proportional derivative PD controller, etc. The type of specific controller is mainly determined according to the needs of the control system.

The expression of the PID controller is shown in formula (1):

$$u(t) = k_p\, e(t) + k_i \int_0^t e(\tau)\, d\tau + k_d\, \frac{de(t)}{dt} \tag{1}$$

In practical applications, it can also be discretized, and its expression is shown in formula (2):

$$u(t) = k_p\, e(t) + k_i \sum_{j=0}^{t} e(j) + k_d \left[ e(t) - e(t-1) \right] \tag{2}$$

where u(t) represents the output value of the PID controller, e(t) represents the deviation between the input value (that is, the theoretical value) and the output value, and k_p, k_i and k_d are the proportional gain, integral gain and derivative gain, respectively, which can also be called the proportional coefficient, integral coefficient and derivative coefficient. e(t) represents the deviation at time t, and e(t-1) represents the deviation at time (t-1).
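As an illustration of the discretized controller, the following minimal Python sketch assumes the usual parallel form u(t) = k_p·e(t) + k_i·Σe(j) + k_d·(e(t) − e(t−1)), which matches the symbols defined above. The class name `DiscretePID` and its fields are hypothetical and are not part of the application.

```python
from dataclasses import dataclass


@dataclass
class DiscretePID:
    """Discrete PID controller in the parallel (gain) form (illustrative only)."""
    kp: float
    ki: float
    kd: float
    error_sum: float = 0.0   # running sum of e(0)..e(t), the integral term
    prev_error: float = 0.0  # e(t-1), used by the derivative term

    def step(self, error: float) -> float:
        """Return u(t) for the deviation e(t) and advance the internal state."""
        self.error_sum += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.error_sum + self.kd * derivative


pid = DiscretePID(kp=2.0, ki=0.5, kd=1.0)
u0 = pid.step(1.0)  # 2.0*1.0 + 0.5*1.0 + 1.0*(1.0 - 0.0)
u1 = pid.step(0.5)  # 2.0*0.5 + 0.5*1.5 + 1.0*(0.5 - 1.0)
```

Setting `ki` or `kd` to 0 gives the P, PI or PD variants mentioned earlier.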

Under normal circumstances, these three parameters are the key parameters of the PID controller and directly determine its control performance. Therefore, parameter tuning of the PID controller is the core of control system design. At present, parameter setting mainly relies on the experience of practitioners, who adjust the three parameters repeatedly until the requirements of the actual application are met, and the adjustment workload is huge. In addition, the scope of application of this approach is limited: each adjustment applies only to a specific engineering scenario and is not universal.

Therefore, the embodiments of the present application provide a new PID control method which, once the initial PID parameters have been determined, dynamically adjusts the PID parameters by borrowing the reward and punishment mechanism of reinforcement learning.

Fig. 2 shows a schematic block diagram of a PID control method 100 provided by an embodiment of the present application. As shown in FIG. 2, the PID control method 100 may include some or all of the following content:

S110: Determine the current reward value corresponding to the PID parameters of the PID controller according to the difference between the output value of the proportional-integral-derivative (PID) controller in the target control system and the theoretical value, where the PID parameters include at least one of a proportional gain, an integral gain, and a derivative gain;

S120: In the case that the current reward value is less than 0, update the PID parameters according to the current accumulated reward value and the current reward value, where the current accumulated reward value is determined based on the current reward value and the previous accumulated reward value;

S130: Perform the next PID control on the PID controller in the target control system according to the updated PID parameters.
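Steps S110 to S130 can be sketched as a single tuning pass in Python. This is only a skeleton under stated assumptions: the reward function, accumulation function and update function are passed in by the caller, and the three `demo_*` functions below are hypothetical placeholders chosen to make the example runnable. In particular, `demo_update` is not the update formula of the application.

```python
from typing import Callable, Tuple

Gains = Tuple[float, float, float]  # (kp, ki, kd)


def tuning_step(
    gains: Gains,
    output: float,
    target: float,
    prev_cum_reward: float,
    reward_fn: Callable[[float, float], float],
    cum_fn: Callable[[float, float], float],
    update_fn: Callable[[Gains, float, float], Gains],
) -> Tuple[Gains, float]:
    """One pass of S110-S130: return the gains for the next control round
    and the accumulated reward to carry forward."""
    reward = reward_fn(output, target)            # S110: current reward value
    cum_reward = cum_fn(prev_cum_reward, reward)  # accumulated in every round
    if reward < 0:                                # S120: punished, so update
        gains = update_fn(gains, cum_reward, reward)
    return gains, cum_reward                      # S130: used in the next round


# Hypothetical placeholders, for demonstration only.
def demo_reward(output: float, target: float) -> float:
    return 1.0 - abs(output - target)


def demo_accumulate(prev: float, reward: float) -> float:
    return max(0.0, prev + reward)


def demo_update(gains: Gains, cum_reward: float, reward: float) -> Gains:
    return tuple(2.0 * g for g in gains)  # placeholder rule, not the patent's


bad = tuning_step((1.0, 0.5, 0.25), 3.0, 0.0, 0.5,
                  demo_reward, demo_accumulate, demo_update)
good = tuning_step((1.0, 0.5, 0.25), 0.5, 0.0, 0.5,
                   demo_reward, demo_accumulate, demo_update)
```

Passing the three rules as callables keeps the control flow of S110 to S130 separate from any particular reward or update formula.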

First of all, it should be noted that the PID controller in the embodiment of the present application is a collective term for all controllers that use PID control laws, and does not denote one specific type of controller. In other words, the PID controller can be a controller that uses all three control laws (proportional, integral and derivative), in which case its PID parameters include a proportional gain, an integral gain and a derivative gain; it can also be a controller that uses only the proportional and integral control laws, in which case its PID parameters include a proportional gain and an integral gain. It should be understood that the embodiment of the present application does not limit the type of the PID controller.

In addition, even if the PID parameters of the PID controller include multiple parameters, the PID control method provided in the embodiment of the present application may only be directed to some of the multiple parameters. For example, assuming that the PID parameters include proportional gain, integral gain, and derivative gain, the PID control method may only target one or two of the proportional gain, integral gain, and derivative gain. This needs to be determined according to the requirements of each control law in the control system.

In the embodiment of the present application, PID control is a cyclic process, and the PID parameters obtained from each update are used as the parameters for the next PID control. Specifically, whether to reward the PID parameters used in the current PID control, and the current reward value itself, can be determined from the difference between the output value and the theoretical value in the current PID control. In other words, the difference between the output value and the theoretical value indicates whether the PID controller is performing well. If it is, the PID parameters currently used are rewarded and the current reward value is greater than 0; if it is not, the PID parameters currently used are punished and the current reward value is less than 0. When the current reward value is less than 0, the PID parameters can be updated according to the current reward value and the current accumulated reward value.

The current accumulated reward value is determined based on the previous accumulated reward value and the current reward value. For example, in each PID control, the current reward value can be added to all previous reward values to form the current accumulated reward value. Specifically, let rwd(t) denote the accumulated reward value of the t-th PID control. Then rwd(t) equals the accumulated reward value of the (t-1)-th control plus the reward value of the t-th control, rwd(t-1) equals the accumulated reward value of the (t-2)-th control plus the reward value of the (t-1)-th control, and so on, so that

$$rwd(t) = rwd_t + rwd_{t-1} + rwd_{t-2} + \dots + rwd_1 + rwd_0$$

where rwd_i represents the reward value of the i-th PID control, i is an integer less than or equal to t, and rwd_0 = 0.

After the PID parameters are updated, the updated PID parameters can be used to perform the next PID control on the PID controller.

The PID control method can be executed by the PID control device in the control system. For example, the PID control device can be an independent device placed after the PID controller, which adjusts the PID parameters according to the output value of the PID controller. For another example, the PID control device may be the PID controller itself. The embodiments of the present application do not limit this.

In addition, the PID control method in the embodiments of the present application can be applied to various control systems, for example, constant temperature and humidity systems, power systems, video coding and decoding systems, and so on. Specifically, when the PID control method is applied to a video codec system, it can be applied to rate control in a video codec system.

Therefore, the PID control method of the embodiment of the present application determines the current reward value from the difference between the output value of the PID controller and the theoretical value. When the current reward value is less than 0, the PID parameters are updated in combination with the historical reward values, and the next PID control is performed on the PID controller according to the updated PID parameters. The PID parameters can thus be adjusted adaptively rather than tuned from human experience, which greatly reduces the difficulty of PID parameter tuning while achieving a significant adjustment effect. The PID control method is also versatile.

Optionally, in the embodiment of the present application, the current reward value is determined according to the difference between the output value of the PID controller and the theoretical value by comparing the absolute value of the difference with a first threshold. If the absolute value of the difference is less than the first threshold, the PID controller is considered to perform well and the PID parameters used can be rewarded, that is, the current reward value is greater than 0. If the absolute value of the difference is greater than the first threshold, the PID controller is considered to perform poorly and the PID parameters used need to be punished, that is, the current reward value is less than 0. If the absolute value of the difference is equal to the first threshold, the performance of the PID controller can be considered average and the PID parameters used are neither rewarded nor punished, that is, the current reward value is equal to 0.
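The three-way comparison against the first threshold can be sketched as follows. The function name `reward_sign` and the encoding of the result as +1, 0 and -1 are illustrative assumptions; the text only fixes the sign of the reward value, not its magnitude.

```python
def reward_sign(output: float, target: float, first_threshold: float) -> int:
    """Return +1 (reward), 0 (neither) or -1 (punish) for the current round,
    based on how |output - target| compares with the first threshold."""
    deviation = abs(output - target)
    if deviation < first_threshold:
        return 1    # good performance: reward value greater than 0
    if deviation > first_threshold:
        return -1   # poor performance: reward value less than 0
    return 0        # exactly at the threshold: reward value equal to 0
```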

Optionally, the current reward value may be negatively correlated with the absolute value of the difference. For example, the current reward value may be linearly negatively correlated with the absolute value of the difference, as shown in FIG. 3. That is, the current reward value can be determined by formula (3):

$$rwd = -a \cdot |u(t) - v^{*}| + b \tag{3}$$

For another example, the current reward value may be non-linearly negatively correlated with the absolute value of the difference, as shown in FIG. 4. That is, the current reward value can be determined by formula (4):

$$rwd = -a \cdot \ln\left(|u(t) - v^{*}| + 1\right) + b \tag{4}$$

where rwd represents the current reward value, u(t) represents the output value of the PID controller, v* represents the theoretical value, and a and b are both constants greater than 0. In FIGS. 3 and 4, δ represents the first threshold.

Alternatively, in addition to the logarithmic operation in formula (4), the nonlinear model in the embodiment of the present application may also be square root operation, exponential operation, trigonometric function operation, etc.
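Formulas (3) and (4) translate directly into code. The sketch below also computes the deviation at which each reward crosses zero, obtained by setting rwd = 0 in the respective formula (b/a for the linear model, e^(b/a) − 1 for the logarithmic model). Reading this zero crossing as the first threshold δ is an inference from the sign rule described above, and the function names are hypothetical.

```python
import math


def linear_reward(output: float, target: float, a: float, b: float) -> float:
    """Formula (3): rwd = -a * |u(t) - v*| + b, with a > 0 and b > 0."""
    return -a * abs(output - target) + b


def log_reward(output: float, target: float, a: float, b: float) -> float:
    """Formula (4): rwd = -a * ln(|u(t) - v*| + 1) + b, with a > 0 and b > 0."""
    return -a * math.log(abs(output - target) + 1.0) + b


def linear_zero_crossing(a: float, b: float) -> float:
    """Deviation at which formula (3) yields a reward of exactly 0."""
    return b / a


def log_zero_crossing(a: float, b: float) -> float:
    """Deviation at which formula (4) yields a reward of exactly 0."""
    return math.exp(b / a) - 1.0
```

For the same a and b, the logarithmic model tolerates a larger deviation before the reward turns negative, and it penalizes very large deviations less steeply than the linear model.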

Optionally, in this embodiment of the application, the accumulated reward value this time may be determined by formula (5):

$$rwd(t) = \max\left(0,\ rwd(t-1) + rwd\right) \tag{5}$$

Among them, rwd(t) represents the cumulative reward value this time, rwd(t-1) represents the previous cumulative reward value, and rwd represents the current reward value.

Specifically, as long as rwd(t-1)+rwd is greater than 0 in every PID control, rwd(t) is the sum of the reward values determined in all PID controls up to and including the current one. When rwd(t-1)+rwd is less than 0 in some PID control, the accumulated reward value rwd(t) for that control is 0, that is, accumulation restarts from 0 in the next PID control.
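The clamped accumulation of formula (5) can be traced with a short example. The reward sequence below is made up purely for illustration, and the function name `accumulate` is hypothetical.

```python
def accumulate(prev_cum_reward: float, reward: float) -> float:
    """Formula (5): rwd(t) = max(0, rwd(t-1) + rwd)."""
    return max(0.0, prev_cum_reward + reward)


cum = 0.0
history = []
for r in [1.0, 0.5, -2.0, 0.25]:  # illustrative per-round reward values
    cum = accumulate(cum, r)
    history.append(cum)
```

In the third round the running sum would be 1.5 − 2.0 = −0.5, so the accumulated value is clamped to 0 and the fourth round starts accumulating again from 0.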

Optionally, in this embodiment of the application, when the current reward value is greater than or equal to 0, the PID parameters currently used may be left unchanged. In other words, the next PID control still uses the PID parameters of the current PID control. Even in this case, the current accumulated reward value still needs to be determined from the previous accumulated reward value and the current reward value. That is to say, the current accumulated reward value is determined regardless of the sign of the current reward value: when the current reward value is greater than or equal to 0, the PID parameters are not updated; when the current reward value is less than 0, the PID parameters are updated according to the determined current accumulated reward value and the current reward value.

In an alternative embodiment, when the current reward value is greater than or equal to 0, the PID parameters can also be fine-tuned. For example, K₂ = a*K₁, where K₂ is the PID parameter after the update, K₁ is the PID parameter before the update, and a is close to 1, such as a = 0.99 or a = 1.01. The embodiment of the application does not limit this.

Optionally, in this embodiment of the application, updating the PID parameters according to the current accumulated reward value and the current reward value in the case that the current reward value is less than 0 includes: in the case that the current reward value is less than 0, updating the PID parameters according to the current accumulated reward value, the current reward value, and an update rate, where the update rate is used to adjust the proportion of the current reward value in the PID parameter update.

Alternatively, when the update rate is constant, the PID parameters can be updated only according to the two variables of the current cumulative reward value and the current reward value.

When the current reward value is less than 0, that is, when the current PID parameters need to be punished, the penalty can be classified as positive or negative according to the relationship between the output value and the theoretical value. If the output value is smaller than the theoretical value, the penalty is regarded as positive: the current PID controller is not adjusting strongly enough, and the adjustment strength of the PID parameters needs to be increased. In this case, the update rate can be used to increase the proportion of the current reward value in the PID parameter update. If the output value is greater than the theoretical value, the penalty is regarded as negative: the adjustment strength of the current PID controller is too large and needs to be reduced. In this case, the update rate can be used to reduce the proportion of the current reward value in the PID parameter update.
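The distinction between positive and negative penalties can be sketched as a small classifier. The enum `Penalty` and the function `classify_penalty` are hypothetical names. The case of a negative reward with the output exactly equal to the theoretical value is not discussed in the text, so the sketch treats it as no penalty.

```python
from enum import Enum


class Penalty(Enum):
    NONE = "none"
    POSITIVE = "positive"  # output below target: adjustment too weak
    NEGATIVE = "negative"  # output above target: adjustment too strong


def classify_penalty(reward: float, output: float, target: float) -> Penalty:
    """Classify the penalty for the current round from the reward sign and
    the relation between the output value and the theoretical value."""
    if reward >= 0:
        return Penalty.NONE       # rewarded or neutral: nothing to punish
    if output < target:
        return Penalty.POSITIVE
    if output > target:
        return Penalty.NEGATIVE
    return Penalty.NONE           # assumption: case not covered by the text
```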

Optionally, the update rate can be adjusted according to actual conditions; for example, it can be updated as part of the PID parameter update process. That is, every time the PID parameters are updated, the update rate is also updated once, and the new value serves as the update rate for the next PID parameter update. In one possible embodiment, when the current accumulated reward value is greater than a second threshold, the update rate is reduced, so that a higher-precision update can be achieved. Optionally, when the current accumulated reward value is less than the second threshold, the update rate may be increased; when the current accumulated reward value is equal to the second threshold, the update rate may be left unchanged.
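A minimal sketch of this update-rate adjustment is given below. The text specifies only the direction of the change, so the fixed additive `step` is an assumption, as is the function name; the clamp to [0, 1] follows the stated value range of the update rate.

```python
def adjust_update_rate(
    ur: float,
    cum_reward: float,
    second_threshold: float,
    step: float = 0.05,  # assumed step size; the text does not specify one
) -> float:
    """Lower the update rate when the accumulated reward exceeds the second
    threshold, raise it when below, and leave it unchanged when equal."""
    if cum_reward > second_threshold:
        ur -= step
    elif cum_reward < second_threshold:
        ur += step
    return min(1.0, max(0.0, ur))  # keep ur within its range of 0 to 1
```

A multiplicative adjustment (for example, scaling by a factor slightly below or above 1) would fit the description equally well.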

Further, when the reward value this time is less than 0, the positive penalty can use the following formula (6) to update the PID parameters:

Negative penalty can use the following formula (7) to update the PID parameters:

where k₂ represents the updated PID parameter, k₁ represents the PID parameter before the update, rwd(t) represents the current accumulated reward value, psh represents the negative of the current reward value, that is, psh = -rwd, and ur represents the update rate, whose value ranges from 0 to 1.

Substituting k_p, k_i and k_d in the PID parameters into formula (6) and formula (7) respectively, formula (6) becomes formula (8):

Formula (7) becomes formula (9):

It should be noted that the update formulas in the above formulas (6) to (9) are only for illustration and not for limitation, and simple changes to the above formulas also belong to the protection scope of the technical solution of this application.

It should be understood that the first threshold, the second threshold, and the update rate in the embodiments of the present application can be obtained based on the experience of relevant practitioners. The negative correlation between the reward value this time and the absolute value of the difference can also be obtained based on the experience of the relevant practitioners, and these are relatively easy to obtain for the relevant practitioners.

In addition, k_p, k_i and k_d usually have different adjustment strengths. For example, k_p has the largest, followed by k_i and k_d. Therefore, different update rates ur_p, ur_i and ur_d can be set for these three parameters.

FIG. 5 shows a schematic block diagram of a PID control device 200 provided in an embodiment of the present application. As shown in FIG. 5, the PID control device 200 includes some or all of the following contents:

The determining unit 210 is configured to determine the current reward value corresponding to the PID parameter of the PID controller according to the difference between the output value of the proportional integral derivative PID controller in the target control system and the theoretical value;

The update unit 220 is configured to update the PID parameters according to the current accumulated reward value and the current reward value when the current reward value is less than 0, where the PID parameters include at least one of a proportional gain, an integral gain, and a derivative gain, and the current accumulated reward value is determined based on the current reward value and the previous accumulated reward value;

The control unit 230 is configured to perform the next PID control on the PID controller in the target control system according to the updated PID parameters.

Therefore, the PID control device of the embodiment of the present application determines the current reward value from the difference between the output value of the PID controller and the theoretical value. When the current reward value is less than 0, the PID parameters are updated in combination with the historical reward values, and the next PID control is performed on the PID controller according to the updated PID parameters. The PID parameters can thus be adjusted adaptively rather than tuned from human experience, which greatly reduces the difficulty of PID parameter tuning while achieving a significant adjustment effect. The PID control device is also versatile.

Optionally, in this embodiment of the present application, the current reward value is negatively correlated with the absolute value of the difference.

Optionally, in this embodiment of the application, if the absolute value of the difference is less than or equal to a first threshold, the current reward value is greater than or equal to 0; if the absolute value of the difference is greater than the first threshold, the current reward value is less than 0.

Optionally, in the embodiment of the present application, the determining unit is specifically configured to:

Determine the current reward value according to the first formula, where the first formula is:

$$rwd = -a \cdot |u(t) - v^{*}| + b$$

or determine the current reward value according to the second formula, where the second formula is:

$$rwd = -a \cdot \ln\left(|u(t) - v^{*}| + 1\right) + b$$

where rwd represents the current reward value, u(t) represents the output value of the PID controller, v* represents the theoretical value, and a and b are both constants greater than 0.

Optionally, in the embodiment of the present application, the determining unit is further configured to:

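Determine the current accumulated reward value according to the third formula, where the third formula is:

$$rwd(t) = \max\left(0,\ rwd(t-1) + rwd\right)$$

where rwd(t) represents the current accumulated reward value, rwd(t-1) represents the previous accumulated reward value, and rwd represents the current reward value.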

Optionally, in the embodiment of the present application, the update unit is specifically configured to:

In the case that the current reward value is less than 0, update the PID parameters according to the current accumulated reward value, the current reward value, and the update rate, where the update rate is used to adjust the proportion of the current reward value in the PID parameter update.

Optionally, in the embodiment of the present application, in the case that the current reward value is less than 0 and the output value is less than the theoretical value, the update rate is used to increase the proportion of the current reward value in the PID parameter update; in the case that the current reward value is less than 0 and the output value is greater than the theoretical value, the update rate is used to reduce the proportion of the current reward value in the PID parameter update.

In the case where the current reward value is less than 0 and the output value is less than the theoretical value, the PID parameters are updated according to the fourth formula, where the fourth formula is:

In the case that the current reward value is less than 0 and the output value is greater than the theoretical value, the PID parameters are updated according to the fifth formula, where the fifth formula is:

where k₂ represents the updated PID parameter, k₁ represents the PID parameter before the update, rwd(t) represents the current accumulated reward value, psh represents the negative of the current reward value, and ur represents the update rate, whose value ranges from 0 to 1.

Optionally, in the embodiment of the present application, the update unit is further configured to:

If the accumulated reward value this time is greater than the second threshold, the update rate is reduced.

Optionally, in the embodiment of the present application, the control unit is further configured to:

In the case that the current reward value is greater than or equal to 0, the PID controller in the target control system is subjected to the next PID control according to the PID parameters.

In the case that the current reward value is greater than or equal to 0, the current accumulated reward value is determined according to the current reward value and the previous accumulated reward value, and the current accumulated reward value then serves as the previous accumulated reward value the next time the PID parameters are updated.

Optionally, in the embodiment of the present application, the target control system is a video codec system, and the PID control device is suitable for rate control in the video codec system.

It should be understood that the PID control device 200 according to the embodiment of the present application may correspond to the execution subject in the method embodiment of the present application, and that the above-mentioned and other operations and/or functions of the units in the PID control device 200 respectively implement the corresponding processes of the method in FIG. 2. For the sake of brevity, they are not repeated here.

It should be understood that the size of the sequence numbers of the foregoing processes does not mean the order of execution, and the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

Although the application and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made to the application without departing from the spirit and scope of the application as defined by the appended claims.

FIG. 6 is a schematic block diagram of a video encoding and decoding system 300 provided by an embodiment of the present application. The video encoding and decoding system 300 includes a PID controller 310, an encoding parameter adjustment device 320, an encoder 330, a buffer 340, and the PID control device 350 of the various embodiments described above. Specifically, the difference between the target line of the buffer and the fullness of the buffer is used as the proportional term of the PID controller 310. The encoding parameter adjustment device 320 uses this feedback to calculate the encoding parameters (such as the quantization parameter QP and the Lagrangian multiplier λ), and then assigns the adjusted encoding parameters to the encoder 330 for actual encoding. After one frame is encoded, the buffer is updated, and the next round of PID control is performed. When the buffer is updated, the PID control device 350 in the embodiment of the present application can adjust the PID parameters according to the difference between the target line of the buffer and the fullness of the buffer. If the error is large, the PID parameters need to be penalized. In that case, whether the penalty is positive or negative can also be determined from how the encoding parameters were adjusted, after which the PID parameters are updated and adjusted for the next rate control process.
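The per-frame loop around the buffer can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the `PID` class is a textbook discrete controller with a unit time step, the error sign (fullness minus target line), the `adjust_qp` mapping, the QP range of 0 to 51, and the `update_buffer` drain model are all hypothetical and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (assumed form, unit time step)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float) -> float:
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def adjust_qp(qp: int, control: float, qp_min: int = 0, qp_max: int = 51) -> int:
    """Hypothetical mapping: a positive control signal (buffer too full) raises QP."""
    return max(qp_min, min(qp_max, qp + round(control)))


def update_buffer(fullness: float, frame_bits: float, drain_bits: float) -> float:
    """Hypothetical buffer model: add the coded frame, drain at the channel rate."""
    return max(0.0, fullness + frame_bits - drain_bits)
```

In one round, the error `fullness - target_line` would be passed to `PID.step`, the result to `adjust_qp`, the frame would be encoded with the new QP, and `update_buffer` would be called with the bits actually produced before the next round begins.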

For judging whether the penalty is positive or negative, a specific example is as follows: when the actual number of bits generated by the encoder is large, the error is increasing, and QP or λ is increasing, the adjustment is not strong enough and needs to be strengthened; this is a positive penalty. See Table 1 for details.

Table 1

FIG. 7 is a schematic structural diagram of a PID control device 400 provided by an embodiment of the present application. The PID control device 400 includes a memory 410 and a processor 420. The memory 410 is used to store instructions, and the processor 420 is used to execute the instructions stored in the memory 410. Specifically, the processor 420 is used to perform the following operations: determining, according to the difference between the output value of the proportional integral derivative (PID) controller in the target control system and the theoretical value, the current reward value corresponding to the PID parameters of the PID controller, the PID parameters including at least one of proportional gain, integral gain, and derivative gain; in the case that the current reward value is less than 0, updating the PID parameters according to the current cumulative reward value and the current reward value, where the current cumulative reward value is determined based on the current reward value and the previous cumulative reward value; and performing the next PID control on the PID controller in the target control system according to the updated PID parameters.
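The three operations performed by the processor can be pictured as one tuning step. The Python sketch below assumes the cumulative reward is clamped at zero as in the third formula of the claims. Both `reward_fn` and `update_fn` are supplied by the caller: `update_fn` is a hypothetical placeholder for the fourth and fifth formulas, which are not reproduced in this text, and the name `tuning_step` is likewise an assumption.

```python
from typing import Callable, Tuple

Gains = Tuple[float, float, float]  # (proportional, integral, derivative)


def tuning_step(
    gains: Gains,
    output: float,
    theoretical: float,
    prev_cumulative: float,
    reward_fn: Callable[[float, float], float],
    update_fn: Callable[[Gains, float, float, float, float], Gains],
) -> Tuple[Gains, float]:
    """Score the current gains, update them only if the reward is negative."""
    rwd = reward_fn(output, theoretical)
    # Cumulative reward, clamped at zero (third formula in the claims).
    cumulative = max(0.0, prev_cumulative + rwd)
    if rwd < 0:
        # Penalty branch: the caller-supplied rule stands in for the
        # fourth/fifth formulas, which this sketch does not attempt to guess.
        gains = update_fn(gains, cumulative, rwd, output, theoretical)
    # The returned gains drive the next PID control; the returned cumulative
    # value becomes the previous cumulative reward value of the next step.
    return gains, cumulative
```

Passing the update rule in as a parameter keeps the control flow visible (score, branch on the sign of the reward, carry the cumulative value forward) without committing to any particular update formula.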

FIG. 8 is a schematic structural diagram of a chip of an embodiment of the present application. The chip 500 shown in FIG. 8 includes a processor 510, and the processor 510 can call and run a computer program from the memory to implement the method in the embodiment of the present application.

Optionally, as shown in FIG. 8, the chip 500 may further include a memory 520. The processor 510 may call and run a computer program from the memory 520 to implement the method in the embodiment of the present application.

The memory 520 may be a separate device independent of the processor 510, or may be integrated in the processor 510.

Optionally, the chip 500 may further include an input interface 530. The processor 510 can control the input interface 530 to communicate with other devices or chips, and specifically, can obtain information or data sent by other devices or chips.

Optionally, the chip 500 may further include an output interface 550. The processor 510 can control the output interface 550 to communicate with other devices or chips, and specifically, can output information or data to other devices or chips.

Optionally, the chip can be applied to the PID control device in the embodiment of the present application, and the chip can implement the corresponding process in each method of the embodiment of the present application. For the sake of brevity, details are not described herein again.

It should be understood that the chip mentioned in the embodiment of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.

It should be understood that the processor of the embodiment of the present application may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method embodiments can be completed by hardware integrated logic circuits in the processor or by instructions in the form of software. The above-mentioned processor may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor. The software module can be located in a storage medium that is mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.

It can be understood that the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory. The volatile memory may be random access memory (Random Access Memory, RAM), which is used as an external cache. By way of exemplary but not restrictive description, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DR RAM). It should be noted that the memories of the systems and methods described herein are intended to include, but are not limited to, these and any other suitable types of memories.

It should be understood that the foregoing memory is exemplary but not restrictive. For example, the memory in the embodiment of the present application may also be static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (synch link DRAM, SLDRAM), direct Rambus random access memory (Direct Rambus RAM, DR RAM), and so on. That is to say, the memory in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.

The embodiment of the present application also provides a computer-readable storage medium for storing computer programs.

Optionally, the computer-readable storage medium can be applied to the network device in the embodiment of the present application, and the computer program causes the computer to execute the corresponding process implemented by the network device in each method of the embodiment of the present application. For the sake of brevity, details are not repeated here.

Optionally, the computer-readable storage medium can be applied to the mobile terminal/terminal device in the embodiment of the present application, and the computer program causes the computer to execute the corresponding process implemented by the mobile terminal/terminal device in each method of the embodiment of the present application. For the sake of brevity, details are not repeated here.

The embodiments of the present application also provide a computer program product, including computer program instructions.

Optionally, the computer program product can be applied to the network device in the embodiment of the present application, and the computer program instructions cause the computer to execute the corresponding process implemented by the network device in each method of the embodiment of the present application. For the sake of brevity, details are not repeated here.

Optionally, the computer program product can be applied to the mobile terminal/terminal device in the embodiment of the present application, and the computer program instructions cause the computer to execute the corresponding process implemented by the mobile terminal/terminal device in each method of the embodiment of the present application. For the sake of brevity, details are not repeated here.

The embodiment of the present application also provides a computer program.

Optionally, the computer program can be applied to the network device in the embodiment of the present application. When the computer program runs on a computer, it causes the computer to execute the corresponding process implemented by the network device in each method of the embodiment of the present application. For the sake of brevity, details are not repeated here.

Optionally, the computer program can be applied to the mobile terminal/terminal device in the embodiment of the present application. When the computer program runs on a computer, it causes the computer to execute the corresponding process implemented by the mobile terminal/terminal device in each method of the embodiment of the present application. For the sake of brevity, details are not repeated here.

A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

If the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disks, optical disks, and other media that can store program code.

The above are only specific implementations of this application, but the protection scope of this application is not limited to this. Any changes or substitutions that a person skilled in the art can easily think of within the technical scope disclosed in this application shall be covered within the scope of protection of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

A PID control method is characterized in that it comprises:

According to the difference between the output value of the proportional integral derivative (PID) controller in the target control system and the theoretical value, determine the current reward value corresponding to the PID parameters of the PID controller, the PID parameters including at least one of proportional gain, integral gain, and differential gain;

In the case that the current reward value is less than 0, the PID parameters are updated according to the current cumulative reward value and the current reward value, wherein the current cumulative reward value is determined based on the current reward value and the previous cumulative reward value;

According to the updated PID parameters, perform the next PID control on the PID controller in the target control system.
The PID control method according to claim 1, wherein the current reward value is negatively correlated with the absolute value of the difference.
The PID control method according to claim 1 or 2, wherein if the absolute value of the difference is less than or equal to the first threshold, the current reward value is greater than or equal to 0; and if the absolute value of the difference is greater than the first threshold, the current reward value is less than 0.
The PID control method according to claim 2 or 3, wherein determining the current reward value corresponding to the PID parameters of the PID controller according to the difference between the output value of the target control system and the theoretical value comprises:

Determine the reward value this time according to the first formula, where the first formula is:

rwd=-a*|u(t)-v*|+b; or

Determine the reward value this time according to the second formula, where the second formula is:

rwd=-a*ln(|u(t)-v*|+1)+b;

where rwd represents the reward value this time, u(t) represents the output value of the PID controller, v* represents the theoretical value, and both a and b are constants greater than 0.
The PID control method according to any one of claims 1 to 4, wherein the PID control method further comprises:

According to the third formula, determine the accumulated reward value this time, where the third formula is:

rwd(t)=max(0,rwd(t-1)+rwd);

where rwd(t) represents the cumulative reward value this time, rwd(t-1) represents the previous cumulative reward value, and rwd represents the current reward value.
The PID control method according to any one of claims 1 to 5, wherein updating the PID parameters according to the current cumulative reward value and the current reward value in the case that the current reward value is less than 0 comprises:

In the case that the current reward value is less than 0, the PID parameters are updated according to the current accumulated reward value, the current reward value, and the update rate, the update rate being used to adjust the proportion of the current reward value in updating the PID parameters.
The PID control method according to claim 6, wherein in the case that the current reward value is less than 0 and the output value is less than the theoretical value, the update rate is used to increase the proportion of the current reward value in updating the PID parameters; and in the case that the current reward value is less than 0 and the output value is greater than the theoretical value, the update rate is used to reduce the proportion of the current reward value in updating the PID parameters.
The PID control method according to claim 6 or 7, wherein updating the PID parameters according to the current cumulative reward value, the current reward value, and the update rate in the case that the current reward value is less than 0 comprises:

In the case that the current reward value is less than 0 and the output value is less than the theoretical value, the PID parameters are updated according to the fourth formula, where the fourth formula is:

In the case that the current reward value is less than 0 and the output value is greater than the theoretical value, the PID parameters are updated according to the fifth formula, where the fifth formula is:

where k2 represents the updated PID parameters, k1 represents the PID parameters before the update, rwd(t) represents the accumulated reward value this time, psh represents the negative of the reward value this time, ur represents the update rate, and the value of ur ranges between 0 and 1.
The PID control method according to any one of claims 6 to 8, wherein the PID control method further comprises:

If the accumulated reward value this time is greater than the second threshold, the update rate is reduced.
The PID control method according to any one of claims 1 to 9, wherein the PID control method further comprises:

In the case that the current reward value is greater than or equal to 0, the PID controller in the target control system is subjected to the next PID control according to the PID parameters.
The PID control method according to any one of claims 1 to 10, wherein the PID control method further comprises:

In the case that the current reward value is greater than or equal to 0, the current cumulative reward value is determined according to the current reward value and the previous cumulative reward value, and the current cumulative reward value serves as the previous cumulative reward value the next time the PID parameters are updated.
The PID control method according to any one of claims 1 to 11, wherein the target control system is a video codec system, and the PID control method is suitable for rate control in the video codec system.
A PID control device, characterized in that the PID control device comprises:

The determining unit is configured to determine the current reward value corresponding to the PID parameter of the PID controller according to the difference between the output value of the proportional integral derivative PID controller in the target control system and the theoretical value;

The update unit is configured to update the PID parameters according to the current cumulative reward value and the current reward value when the current reward value is less than 0, the PID parameters including at least one of proportional gain, integral gain, and differential gain, wherein the current cumulative reward value is determined based on the current reward value and the previous cumulative reward value;

The control unit is configured to perform the next PID control on the PID controller in the target control system according to the updated PID parameters.
The PID control device according to claim 13, wherein the current reward value is negatively correlated with the absolute value of the difference.
The PID control device according to claim 13 or 14, wherein if the absolute value of the difference is less than or equal to the first threshold, the current reward value is greater than or equal to 0; and if the absolute value of the difference is greater than the first threshold, the current reward value is less than 0.
The PID control device according to claim 14 or 15, wherein the determining unit is specifically configured to:

Determine the reward value this time according to the first formula, where the first formula is:

rwd=-a*|u(t)-v*|+b; or

Determine the reward value this time according to the second formula, where the second formula is:

rwd=-a*ln(|u(t)-v*|+1)+b;

where rwd represents the reward value this time, u(t) represents the output value of the PID control device, v* represents the theoretical value, and both a and b are constants greater than 0.
The PID control device according to any one of claims 13 to 16, wherein the determining unit is further configured to:

According to the third formula, determine the accumulated reward value this time, where the third formula is:

rwd(t)=max(0,rwd(t-1)+rwd);

where rwd(t) represents the cumulative reward value this time, rwd(t-1) represents the previous cumulative reward value, and rwd represents the current reward value.
The PID control device according to any one of claims 13 to 17, wherein the update unit is specifically configured to:

In the case that the current reward value is less than 0, the PID parameters are updated according to the current accumulated reward value, the current reward value, and the update rate, the update rate being used to adjust the proportion of the current reward value in updating the PID parameters.
The PID control device according to claim 18, wherein in the case that the current reward value is less than 0 and the output value is less than the theoretical value, the update rate is used to increase the proportion of the current reward value in updating the PID parameters; and in the case that the current reward value is less than 0 and the output value is greater than the theoretical value, the update rate is used to reduce the proportion of the current reward value in updating the PID parameters.
The PID control device according to claim 18 or 19, wherein the update unit is specifically configured to:

In the case where the current reward value is less than 0 and the output value is less than the theoretical value, the PID parameters are updated according to the fourth formula, where the fourth formula is:

In the case that the current reward value is less than 0 and the output value is greater than the theoretical value, the PID parameters are updated according to the fifth formula, where the fifth formula is:

where k2 represents the updated PID parameters, k1 represents the PID parameters before the update, rwd(t) represents the accumulated reward value this time, psh represents the negative of the reward value this time, ur represents the update rate, and the value of ur ranges between 0 and 1.
The PID control device according to any one of claims 18 to 20, wherein the update unit is further configured to:

If the accumulated reward value this time is greater than the second threshold, the update rate is reduced.
The PID control device according to any one of claims 13 to 21, wherein the control unit is further configured to:

In the case that the current reward value is greater than or equal to 0, the PID controller in the target control system is subjected to the next PID control according to the PID parameters.
The PID control device according to any one of claims 13 to 22, wherein the determining unit is further configured to:

In the case that the current reward value is greater than or equal to 0, the current cumulative reward value is determined according to the current reward value and the previous cumulative reward value, and the current cumulative reward value serves as the previous cumulative reward value the next time the PID parameters are updated.
The PID control device according to any one of claims 13 to 23, wherein the target control system is a video codec system, and the PID control device is suitable for rate control in the video codec system.
A video encoding and decoding system, characterized by comprising the PID control device according to any one of claims 13 to 24, the PID control device being suitable for rate control in the video encoding and decoding system.
A PID control device, characterized by comprising: a processor and a memory, wherein the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method according to any one of claims 1 to 12.
A chip, characterized by comprising: a processor, configured to call and run a computer program from a memory, so that a device installed with the chip executes the method according to any one of claims 1 to 12.
A computer-readable storage medium, characterized in that it is used to store a computer program that enables a computer to execute the method according to any one of claims 1 to 12.
A computer program product, characterized by comprising computer program instructions, which cause a computer to execute the method according to any one of claims 1 to 12.
A computer program, wherein the computer program causes a computer to execute the method according to any one of claims 1 to 12.