WO2021068748A1 - Pid control method and apparatus, and video encoding and decoding system - Google Patents

PID control method and apparatus, and video encoding and decoding system

Info

Publication number
WO2021068748A1
Authority
WO
WIPO (PCT)
Prior art keywords
pid
reward value
value
current
pid control
Prior art date
Application number
PCT/CN2020/117211
Other languages
French (fr)
Chinese (zh)
Inventor
周益民
程学理
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2021068748A1

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B11/00 Automatic controllers
    • G05B11/01 Automatic controllers electric
    • G05B11/36 Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential
    • G05B11/42 Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential for obtaining a characteristic which is both proportional and time-dependent, e.g. P.I., P.I.D.
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output

Definitions

  • This application relates to the field of control, and more specifically, to a PID control method, device, and video codec system.
  • proportional integral differential (PID) control has a wide range of applications in the field of control due to its simple algorithm and good reliability.
  • the PID parameters of the traditional PID controller can include proportional gain, integral gain and derivative gain. PID parameters directly determine the control performance of the PID controller. Therefore, the parameter tuning of the PID controller is the core content of the control system design.
  • the traditional parameter setting process relies heavily on the experience of relevant practitioners, who repeatedly adjust the PID parameters until the actual application requirements are met; the resulting tuning workload is extremely large.
  • the present application provides a PID control method, device, and video encoding and decoding system that help reduce the difficulty of tuning PID parameters and improve the control performance and versatility of the PID controller.
  • in a first aspect, a PID control method is provided. The method includes: determining the current reward value corresponding to the PID parameters of the PID controller according to the difference between the output value of the PID controller in the target control system and the theoretical value, where the PID parameters include at least one of proportional gain, integral gain, and derivative gain; if the current reward value is less than 0, updating the PID parameters according to the current cumulative reward value and the current reward value, where the current cumulative reward value is determined based on the current reward value and the previous cumulative reward value; and performing the next PID control on the PID controller in the target control system according to the updated PID parameters.
  • in a second aspect, a PID control device is provided. The device includes: a determining unit, configured to determine the current reward value corresponding to the PID parameters of the PID controller according to the difference between the output value of the PID controller in the target control system and the theoretical value, where the PID parameters include at least one of proportional gain, integral gain, and derivative gain; an update unit, configured to update the PID parameters according to the current cumulative reward value and the current reward value when the current reward value is less than 0, where the current cumulative reward value is determined based on the current reward value and the previous cumulative reward value; and a control unit, configured to perform the next PID control on the PID controller in the target control system according to the updated PID parameters.
  • a video encoding and decoding system including the PID control device in the second aspect or its implementation manners.
  • a PID control device which includes a processor and a memory.
  • the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to execute the method in the above-mentioned first aspect or each of its implementation manners.
  • a chip is provided, which is used to implement the method in the above-mentioned first aspect or each of its implementation manners.
  • the chip includes: a processor, configured to call and run a computer program from the memory, so that the device installed with the chip executes the method in the above-mentioned first aspect or each of its implementation manners.
  • a computer-readable storage medium for storing a computer program that enables a computer to execute the method in the above-mentioned first aspect or each of its implementation manners.
  • a computer program product including computer program instructions that cause a computer to execute the method in the first aspect or its implementation manners.
  • a computer program which when running on a computer, causes the computer to execute the method in the first aspect or its implementation manners.
  • the current reward value is determined by the difference between the output value of the PID controller and the theoretical value. When the current reward value is less than 0, the PID parameters are updated in combination with the historical reward value, and the next PID control is performed on the PID controller according to the updated PID parameters. The PID parameters can therefore be adjusted adaptively without relying on human experience, which greatly reduces the difficulty of PID parameter tuning, and the adjustment effect is significant.
  • Figure 1 is a schematic structural diagram of the PID control system.
  • Fig. 2 is a schematic block diagram of a PID control method provided by an embodiment of the present application.
  • Fig. 3 is a schematic diagram of a negative correlation between the reward value this time and the absolute value of the difference in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of another negative correlation between the reward value this time and the absolute value of the difference in an embodiment of the present application.
  • Fig. 5 is a schematic block diagram of a PID control device provided by an embodiment of the present application.
  • Fig. 6 is a schematic block diagram of a video encoding and decoding system provided by an embodiment of the present application.
  • Fig. 7 is another schematic block diagram of a PID control device provided by an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of a chip provided by an embodiment of the present application.
  • in control theory, PID control is a closed-loop control method that linearly combines the proportional (P), integral (I) and derivative (D) terms of the deviation between the input and the output to form a control quantity, which is then used to control the controlled object.
  • a control system includes a PID controller and a controlled object, as shown in Figure 1.
  • PID control includes three parts: proportional, integral and derivative, but there are other types of controllers in practice.
  • the three control laws of proportional, integral and derivative can be used alone or in combination.
  • the type of specific controller is mainly determined according to the needs of the control system.
  • u(t) represents the output value of the PID controller
  • e(t) represents the deviation between the input value (that is, the theoretical value) and the output value
  • k_p, k_i and k_d are the proportional gain, integral gain and derivative gain, respectively; they can also be called the proportional coefficient, integral coefficient and differential coefficient.
  • e(t) represents the deviation at time t
  • e(t-1) represents the deviation at time (t-1).
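  • the formulas that these symbols belong to are not reproduced in this text. As a hedged illustration only, the sketch below uses the common textbook positional form of a discrete PID controller, u(t) = k_p·e(t) + k_i·Σe(j) + k_d·(e(t) − e(t−1)); the class name `PIDController` and its fields are assumptions made for this example and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Textbook positional-form discrete PID controller (illustrative only)."""

    kp: float  # proportional gain k_p
    ki: float  # integral gain k_i
    kd: float  # derivative gain k_d
    integral: float = 0.0    # running sum of the deviations e(0..t)
    prev_error: float = 0.0  # deviation at the previous step, e(t-1)

    def step(self, setpoint: float, measured: float) -> float:
        """Return u(t) for the deviation e(t) = setpoint - measured."""
        error = setpoint - measured
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

  • setting `ki` or `kd` to 0 gives the P, PI or PD variants mentioned above, which is one simple way to use the control laws alone or in combination.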
  • the parameter tuning of the PID controller is the core content of the control system design.
  • the current parameter setting mainly relies on the experience of relevant practitioners, through repeated adjustments of these three parameters, until the actual application requirements are met. Such adjustment workload is huge.
  • the scope of application of this method is limited, and each adjustment is only applicable to a specific engineering scenario, and does not have universality.
  • the embodiments of the present application provide a new PID control method which, once the initial PID parameters have been determined, dynamically adjusts the PID parameters by borrowing the reward and punishment mechanism of reinforcement learning.
  • Fig. 2 shows a schematic block diagram of a PID control method 100 provided by an embodiment of the present application.
  • the PID control method 100 may include some or all of the following content:
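  • S110 Determine the current reward value corresponding to the PID parameters of the PID controller according to the difference between the output value of the proportional integral derivative (PID) controller in the target control system and the theoretical value. The PID parameters include at least one of proportional gain, integral gain and derivative gain;
  • S120 If the current reward value is less than 0, update the PID parameters according to the current cumulative reward value and the current reward value, where the current cumulative reward value is determined based on the current reward value and the previous cumulative reward value;
  • S130 Perform the next PID control on the PID controller in the target control system according to the updated PID parameters.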
  • the PID controller in the embodiment of the present application is a collective term for all controllers that use PID control laws, and does not represent the type of controller.
  • the PID controller can be a controller that uses all three control laws (proportional, integral, and derivative), in which case its PID parameters include proportional gain, integral gain, and derivative gain; it can also be a controller that uses only the proportional and integral control laws, in which case its PID parameters include proportional gain and integral gain. It should be understood that the embodiment of the present application does not limit the type of the PID controller.
  • the PID control method may only be directed to some of the multiple parameters. For example, assuming that the PID parameters include proportional gain, integral gain, and derivative gain, the PID control method may only target one or two of the proportional gain, integral gain, and derivative gain. This needs to be determined according to the requirements of each control law in the control system.
  • PID control is a cyclic control process, and the PID parameters obtained from each update can be used as the parameters for the next PID control. Specifically, whether to reward the PID parameters used in the current PID control can be decided based on the difference between the output value and the theoretical value in that control, and the current reward value is determined accordingly. In other words, the difference between the output value and the theoretical value indicates whether the PID controller is performing well. If the performance is good, the PID parameters currently used are rewarded, and the current reward value is greater than 0; if the performance is not good, the PID parameters currently used are punished, and the current reward value is less than 0.
  • the PID parameters can be updated according to the reward value this time and the accumulated reward value this time.
  • the accumulated reward value this time is determined based on the previous accumulated reward value and the current reward value. For example, for each PID control, the determined reward value of this time can be accumulated with all previous reward values to form the accumulated reward value of this time. Specifically, assuming that rwd(t) represents the accumulated reward value for the tth time, rwd(t) can be equal to the sum of the accumulated reward value for the (t-1)th time and the current reward value rwd for the tth time.
  • the updated PID parameters can be used to perform the next PID control on the PID controller.
  • the PID control method can be executed by the PID control device in the control system.
  • the PID control device can be an independent device that can be placed behind the PID controller and adjust the PID parameters according to the output value of the PID controller.
  • the PID control device may also be a PID controller.
  • the embodiments of the present application do not constitute a limitation on this.
  • the PID control method in the embodiments of the present application can be applied to various control systems, for example, constant temperature and humidity systems, power systems, video coding and decoding systems, and so on.
  • when the PID control method is applied to a video codec system, it can be used for rate control in that system.
  • the PID control method of the embodiment of the present application determines the current reward value from the difference between the output value of the PID controller and the theoretical value. When the current reward value is less than 0, the PID parameters are updated in combination with the historical reward value, and the next PID control is performed on the PID controller according to the updated PID parameters. The PID parameters can therefore be adjusted adaptively without relying on human experience, which greatly reduces the difficulty of PID parameter tuning, and the adjustment effect is significant.
  • the PID control method is also versatile.
  • the current reward value is determined according to the difference between the output value of the PID controller and the theoretical value.
  • the absolute value of the difference may be compared with the first threshold. If the absolute value of the difference is less than the first threshold, the PID controller is considered to perform well, and the PID parameters used can be rewarded, that is, the current reward value is greater than 0. If the absolute value of the difference is greater than the first threshold, the PID controller is considered to perform poorly, and the PID parameters used need to be punished, that is, the current reward value is less than 0. If the absolute value of the difference is equal to the first threshold, the performance of the PID controller can be considered average, and the PID parameters used are neither rewarded nor punished, that is, the current reward value is equal to 0.
  • the current reward value may be negatively correlated with the absolute value of the difference.
  • the current reward value and the absolute value of the difference may be linearly negatively correlated, as shown in FIG. 3. That is, the reward value this time can be determined by formula (3):
  • the current reward value and the absolute value of the difference may be non-linearly negatively correlated, as shown in FIG. 4. That is, the reward value this time can be determined by formula (4):
  • rwd represents the reward value this time
  • u(t) represents the output value of the PID controller
  • v* represents the theoretical value
  • both a and b are constants greater than 0.
  • represents the first threshold.
  • the nonlinear model in the embodiment of the present application may also be square root operation, exponential operation, trigonometric function operation, etc.
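  • formulas (3) and (4) themselves are not reproduced in this text. The sketch below is therefore only an assumed pair of reward shapes that satisfy the stated properties: the reward decreases as |u(t) − v*| grows, it depends on positive constants a and b, and it changes sign at a threshold. The function names and the exact expressions are hypothetical and should not be read as the application's formulas.

```python
def linear_reward(output: float, target: float, a: float, b: float) -> float:
    """Assumed linear shape: rwd = b - a * |u(t) - v*|.

    With this shape the reward is 0 when |u(t) - v*| equals b / a.
    """
    return b - a * abs(output - target)


def quadratic_reward(output: float, target: float, a: float, b: float) -> float:
    """Assumed non-linear shape: rwd = b - a * (u(t) - v*)**2.

    With this shape the reward is 0 when |u(t) - v*| equals sqrt(b / a).
    """
    return b - a * (output - target) ** 2
```

  • in both assumed shapes a small deviation yields a positive reward and a large deviation yields a negative one (a punishment), which matches the threshold behaviour described above.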
  • the accumulated reward value this time may be determined by formula (5): $\mathrm{rwd}(t) = \max(0,\ \mathrm{rwd}(t-1) + \mathrm{rwd})$
  • rwd(t) represents the cumulative reward value this time
  • rwd(t-1) represents the previous cumulative reward value
  • rwd represents the current reward value
  • when rwd(t-1)+rwd is greater than 0 in every PID control, rwd(t) is the sum of the current reward values determined in all PID controls up to and including this one. When rwd(t-1)+rwd is less than 0 in some PID control, the accumulated reward value rwd(t) for that control is 0, that is, the accumulation starts again from 0 in the next PID control.
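  • the clamped accumulation rwd(t) = max(0, rwd(t-1) + rwd) described here can be sketched as follows. The function names and the sample reward sequence are illustrative assumptions.

```python
from typing import Iterable, List


def accumulate(prev_cumulative: float, reward: float) -> float:
    """Cumulative reward update: rwd(t) = max(0, rwd(t-1) + rwd)."""
    return max(0.0, prev_cumulative + reward)


def cumulative_history(rewards: Iterable[float]) -> List[float]:
    """Apply the update to a sequence of per-step rewards, starting from 0."""
    history = []
    total = 0.0
    for reward in rewards:
        total = accumulate(total, reward)
        history.append(total)
    return history
```

  • for the rewards 1, 2, -5, 3 the history is 1, 3, 0, 3: the punishment of -5 drives the sum below 0, so the accumulated value is reset to 0 and accumulation restarts with the next reward.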
  • when the current reward value is greater than or equal to 0, the PID parameters currently used may be left unchanged, that is, the PID parameters of this PID control are still used for the next PID control.
  • the current cumulative reward value still needs to be determined from the previous cumulative reward value and the current reward value. That is to say, the current cumulative reward value is determined regardless of whether the current reward value is greater than or less than 0.
  • when the current reward value is greater than or equal to 0, the PID parameters are not updated; when the current reward value is less than 0, the PID parameters are updated according to the determined current cumulative reward value and the current reward value.
  • the PID parameters can also be fine-tuned.
  • the embodiment of the application does not limit this.
  • updating the PID parameters according to the current accumulated reward value and the current reward value includes: when the current reward value is less than 0, updating the PID parameters according to the current accumulated reward value, the current reward value, and the update rate, where the update rate is used to adjust the proportion of the current reward value when updating the PID parameters.
  • the PID parameters can be updated only according to the two variables of the current cumulative reward value and the current reward value.
  • the penalty can be divided into a positive penalty and a negative penalty according to the relationship between the output value and the theoretical value. If the output value is smaller than the theoretical value, the penalty can be regarded as a positive penalty; if the output value is greater than the theoretical value, it can be regarded as a negative penalty.
  • a positive penalty can mean that the adjustment strength of the current PID controller is insufficient and the adjustment strength of the PID parameters needs to be increased. In this case, the update rate can be used to increase the proportion of the current reward value when updating the PID parameters.
  • a negative penalty can mean that the adjustment strength of the current PID controller is too large and the adjustment strength of the PID parameters needs to be reduced. In this case, the update rate can be used to reduce the proportion of the current reward value when updating the PID parameters.
  • the update rate can be adjusted according to actual conditions; for example, it can be updated during the PID parameter update process. That is, every time the PID parameters are updated, the update rate is also updated once and then serves as the update rate for the next PID parameter update. In one possible embodiment, when the accumulated reward value this time is greater than the second threshold, the update rate is reduced, so that a higher-precision update can be achieved. Optionally, when the accumulated reward value this time is less than the second threshold, the update rate may be increased; when the accumulated reward value this time is equal to the second threshold, the update rate may be left unchanged.
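  • the threshold rule for the update rate can be sketched as below. The text only states the direction of the change, so the multiplicative `factor` and the clamping to the range 0 to 1 are assumptions chosen for this illustration.

```python
def adjust_update_rate(
    ur: float,
    cumulative_reward: float,
    second_threshold: float,
    factor: float = 0.9,  # hypothetical shrink/grow factor, not from the source
) -> float:
    """Return the update rate to use for the next PID parameter update."""
    if cumulative_reward > second_threshold:
        ur *= factor   # good history: take smaller, more precise steps
    elif cumulative_reward < second_threshold:
        ur /= factor   # poor history: allow larger steps
    # equal to the threshold: leave the update rate unchanged
    return min(1.0, max(0.0, ur))
```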
  • the positive penalty can use the following formula (6) to update the PID parameters:
  • Negative penalty can use the following formula (7) to update the PID parameters:
  • k_2 represents the updated PID parameters
  • k_1 represents the PID parameters before the update
  • rwd(t) represents the accumulated reward value this time
  • ur represents the update rate
  • the value of ur ranges from 0 to 1.
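  • formulas (6) and (7) are not reproduced in this text, so the following is only a hypothetical update rule that respects what the description does state: no update when the reward is non-negative, a stronger gain under a positive penalty (output below the theoretical value), a weaker gain under a negative penalty (output above it), and a step that depends on the cumulative reward rwd(t), the current reward, and the update rate ur. The specific step expression is an assumption.

```python
def update_gain(
    k1: float,
    cumulative_reward: float,
    reward: float,
    ur: float,
    output: float,
    target: float,
) -> float:
    """Return the updated gain k_2 from the previous gain k_1 (hypothetical rule)."""
    if reward >= 0:
        return k1  # rewarded or neutral: keep the current parameter
    psh = -reward  # magnitude of the punishment
    # Assumed step: scaled by ur, damped when the parameter has a good history.
    step = ur * psh / (1.0 + cumulative_reward)
    if output < target:
        return k1 * (1.0 + step)        # positive penalty: strengthen
    return max(0.0, k1 * (1.0 - step))  # negative penalty: weaken, never below 0
```

  • a separate `ur` can be passed for each of k_p, k_i and k_d if the three gains should be adjusted with different intensities.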
  • the first threshold, the second threshold, and the update rate in the embodiments of the present application can be obtained based on the experience of relevant practitioners.
  • the negative correlation between the reward value this time and the absolute value of the difference can also be obtained based on the experience of the relevant practitioners, and these are relatively easy to obtain for the relevant practitioners.
  • k_p, k_i and k_d have different adjustment intensities: k_p is the largest, followed by k_i and k_d. Therefore, different update rates ur_p, ur_i and ur_d can be set for these three parameters.
  • FIG. 5 shows a schematic block diagram of a PID control device 200 provided in an embodiment of the present application. As shown in FIG. 5, the PID control device 200 includes some or all of the following contents:
  • the determining unit 210 is configured to determine the current reward value corresponding to the PID parameter of the PID controller according to the difference between the output value of the proportional integral derivative PID controller in the target control system and the theoretical value;
  • the update unit 220 is configured to update the PID parameters according to the current accumulated reward value and the current reward value when the current reward value is less than 0, where the PID parameters include at least one of proportional gain, integral gain and derivative gain, and the current cumulative reward value is determined based on the current reward value and the previous cumulative reward value;
  • the control unit 230 is configured to perform the next PID control on the PID controller in the target control system according to the updated PID parameters.
  • the PID control device of the embodiment of the present application determines the current reward value from the difference between the output value of the PID controller and the theoretical value. When the current reward value is less than 0, the PID parameters are updated in combination with the historical reward value, and the next PID control is performed on the PID controller according to the updated PID parameters. The PID parameters can therefore be adjusted adaptively without relying on human experience, which greatly reduces the difficulty of PID parameter tuning, and the adjustment effect is significant.
  • the PID control method is also versatile.
  • the current reward value is negatively correlated with the absolute value of the difference.
  • if the absolute value of the difference is less than or equal to the first threshold, the current reward value is greater than or equal to 0; if the absolute value of the difference is greater than the first threshold, the current reward value is less than 0.
  • the determining unit is specifically configured to:
  • rwd represents the reward value this time
  • u(t) represents the output value of the PID control device
  • v* represents the theoretical value
  • both a and b are constants greater than zero.
  • the determining unit is further configured to:
  • $\mathrm{rwd}(t) = \max(0,\ \mathrm{rwd}(t-1) + \mathrm{rwd})$;
  • rwd(t) represents the cumulative reward value this time
  • rwd(t-1) represents the previous cumulative reward value
  • rwd represents the current reward value
  • the update unit is specifically configured to:
  • when the current reward value is less than 0, the PID parameters are updated according to the current accumulated reward value, the current reward value, and the update rate, where the update rate is used to adjust the proportion of the current reward value when updating the PID parameters.
  • when the current reward value is less than 0 and the output value is less than the theoretical value, the update rate is used to increase the proportion of the current reward value when updating the PID parameters; when the current reward value is less than 0 and the output value is greater than the theoretical value, the update rate is used to reduce the proportion of the current reward value when updating the PID parameters.
  • the update unit is specifically configured to:
  • the PID parameters are updated according to the fourth formula, where the fourth formula is:
  • the PID parameters are updated according to the fifth formula, where the fifth formula is:
  • k_2 represents the updated PID parameters
  • k_1 represents the PID parameters before the update
  • rwd(t) represents the accumulated reward value this time
  • psh represents the negative number of the reward value this time
  • ur represents the update rate
  • the value of ur ranges from 0 to 1.
  • the update unit is further configured to:
  • when the current cumulative reward value is greater than the second threshold, the update rate is reduced.
  • control unit is further configured to:
  • when the current reward value is greater than or equal to 0, the next PID control is performed on the PID controller in the target control system according to the current, non-updated PID parameters.
  • the determining unit is further configured to:
  • the current cumulative reward value is determined according to the current reward value and the previous cumulative reward value, and the current cumulative reward value is used for the next update of the reward.
  • the target control system is a video codec system
  • the PID control device is suitable for rate control in the video codec system.
  • the PID control device 200 may correspond to the execution subject in the method embodiment of the present application, and the above-mentioned and other operations and/or functions of the units in the PID control device 200 respectively implement the corresponding processes of the method of FIG. 2. For brevity, they are not repeated here.
  • the video encoding and decoding system 300 includes a PID controller 310, an encoding parameter adjustment device 320, an encoder 330, a buffer 340, and the PID control device 350 of the various embodiments described above. Specifically, the difference between the target line of the buffer and the fullness of the buffer is used as the proportional term of the PID controller 310; the encoding parameter adjustment device 320 calculates the encoding parameters (such as the quantization parameter QP and the Lagrangian multiplier λ) through feedback, and the adjusted encoding parameters are then assigned to the encoder 330 for actual encoding.
  • the PID control device 350 in the embodiment of the present application can be used to adjust the PID parameters according to the difference between the target line of the buffer and the fullness of the buffer when the buffer is updated. If the error is large, the PID parameters need to be penalized. At this time, you can also determine whether it is a positive penalty or a negative penalty based on the adjustment of the encoding parameters, and then complete the update and adjustment of the PID parameters for the next rate control process.
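  • as a rough illustration of how a buffer-based deviation could drive an encoding parameter, the sketch below maps the gap between buffer fullness and the target line to a QP offset using only a proportional term. The function name, the sign convention (fullness minus target), the rounding, and the QP limits of 0 to 51 (the usual range for 8-bit H.264/HEVC) are assumptions made for this example; the application does not specify this mapping.

```python
def next_qp(
    base_qp: int,
    fullness: float,     # current buffer fullness, e.g. as a fraction of capacity
    target_line: float,  # desired buffer fullness, same unit as `fullness`
    kp: float,           # proportional gain (could be tuned by the reward scheme)
    qp_min: int = 0,
    qp_max: int = 51,
) -> int:
    """Return a clamped QP for the next coding unit (proportional term only)."""
    error = fullness - target_line  # positive: buffer is fuller than desired
    qp = base_qp + round(kp * error)  # fuller buffer -> higher QP -> fewer bits
    return max(qp_min, min(qp_max, qp))
```

  • the integral and derivative terms are omitted here for brevity; in a full controller they would be added to `kp * error` before rounding.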
  • FIG. 7 is a schematic structural diagram of a PID control device 400 provided by an embodiment of the present application.
  • the PID control device 400 includes a memory 410 and a processor 420.
  • the memory 410 is used to store instructions
  • the processor 420 is used to execute the instructions stored in the memory 410.
  • the processor 420 is used to perform the following operations: determining the current reward value corresponding to the PID parameters of the PID controller according to the difference between the output value of the proportional integral derivative (PID) controller in the target control system and the theoretical value, where the PID parameters include at least one of proportional gain, integral gain, and derivative gain; when the current reward value is less than 0, updating the PID parameters according to the current cumulative reward value and the current reward value, where the current cumulative reward value is determined based on the current reward value and the previous cumulative reward value; and performing the next PID control on the PID controller in the target control system according to the updated PID parameters.
  • FIG. 8 is a schematic structural diagram of a chip of an embodiment of the present application.
  • the chip 500 shown in FIG. 8 includes a processor 510, and the processor 510 can call and run a computer program from the memory to implement the method in the embodiment of the present application.
  • the chip 500 may further include a memory 520.
  • the processor 510 may call and run a computer program from the memory 520 to implement the method in the embodiment of the present application.
  • the memory 520 may be a separate device independent of the processor 510, or may be integrated in the processor 510.
  • the chip 500 may further include an input interface 530.
  • the processor 510 can control the input interface 530 to communicate with other devices or chips, and specifically, can obtain information or data sent by other devices or chips.
  • the chip 500 may further include an output interface 550.
  • the processor 510 can control the output interface 550 to communicate with other devices or chips, and specifically, can output information or data to other devices or chips.
  • the chip can be applied to the PID control device in the embodiment of the present application, and the chip can implement the corresponding process in each method of the embodiment of the present application.
  • details are not described herein again.
  • the chip mentioned in the embodiment of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip (SoC).
  • the processor of the embodiment of the present application may be an integrated circuit chip with signal processing capability.
  • the steps of the foregoing method embodiments can be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
  • the above-mentioned processor may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • the volatile memory may be random access memory (Random Access Memory, RAM), which is used as an external cache.
  • the memory in the embodiment of the present application may also be static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), direct Rambus random access memory (DR RAM), and so on. That is to say, the memory in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
  • the embodiment of the present application also provides a computer-readable storage medium for storing computer programs.
  • the computer-readable storage medium can be applied to the network device in the embodiment of the present application, and the computer program causes the computer to execute the corresponding process implemented by the network device in each method of the embodiment of the present application.
  • the computer-readable storage medium can be applied to the mobile terminal/terminal device in the embodiment of the present application, and the computer program causes the computer to execute the corresponding process implemented by the mobile terminal/terminal device in each method of the embodiment of the present application. For the sake of brevity, details are not repeated here.
  • the embodiments of the present application also provide a computer program product, including computer program instructions.
  • the computer program product can be applied to the network device in the embodiment of the present application, and the computer program instructions cause the computer to execute the corresponding process implemented by the network device in each method of the embodiment of the present application.
  • the computer program product can be applied to the mobile terminal/terminal device in the embodiment of the present application, and the computer program instructions cause the computer to execute the corresponding process implemented by the mobile terminal/terminal device in each method of the embodiment of the present application. For the sake of brevity, details are not repeated here.
  • the embodiment of the present application also provides a computer program.
  • the computer program can be applied to the network device in the embodiment of the present application.
  • when the computer program runs on a computer, it causes the computer to execute the corresponding process implemented by the network device in each method of the embodiment of the present application. For the sake of brevity, details are not repeated here.
  • the computer program can be applied to the mobile terminal/terminal device in the embodiment of the present application.
  • when the computer program runs on a computer, it causes the computer to execute the corresponding process implemented by the mobile terminal/terminal device in each method of the embodiment of the present application. For the sake of brevity, details are not repeated here.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Feedback Control In General (AREA)

Abstract

The present application provides a Proportional Integral Differential (PID) control method and apparatus, and a video encoding and decoding system. The PID control method comprises: determining, according to a difference between an output value of a PID controller in a target control system and a theoretical value, the current reward value corresponding to PID parameters of the PID controller, the PID parameters comprising at least one of a proportional gain, an integral gain, and a differential gain; in the case that the current reward value is less than 0, updating the PID parameters according to the current accumulated reward value and the current reward value, the current accumulated reward value being determined according to the current reward value and the previous accumulated reward value; and performing the next PID control on the PID controller in the target control system according to the updated PID parameters. According to the method, apparatus, and system in embodiments of the present application, the difficulty of setting the PID parameters is reduced, and the control performance and versatility of the PID controller are improved.

Description

PID control method, device, and video encoding and decoding system
Technical field
This application relates to the field of control, and more specifically, to a PID control method, a PID control device, and a video encoding and decoding system.
Background
At present, proportional-integral-derivative (PID) control is widely used in the control field because its algorithm is simple and reliable. The PID parameters of a traditional PID controller may include a proportional gain, an integral gain, and a derivative gain. The PID parameters directly determine the control performance of the PID controller, so parameter tuning is the core of control system design. The traditional tuning process relies heavily on the experience of practitioners, who repeatedly adjust the PID parameters until the requirements of the actual application are met; this involves a very large workload.
Summary of the invention
The present application provides a PID control method, a PID control device, and a video encoding and decoding system, which help to reduce the difficulty of tuning PID parameters and to improve the control performance and versatility of the PID controller.
In a first aspect, a PID control method is provided. The PID control method includes: determining, according to the difference between an output value of a PID controller in a target control system and a theoretical value, a current reward value corresponding to PID parameters of the PID controller, where the PID parameters include at least one of a proportional gain, an integral gain, and a derivative gain; when the current reward value is less than 0, updating the PID parameters according to a current accumulated reward value and the current reward value, where the current accumulated reward value is determined according to the current reward value and the previous accumulated reward value; and performing the next PID control on the PID controller in the target control system according to the updated PID parameters.
In a second aspect, a PID control device is provided. The PID control device includes: a determining unit, configured to determine, according to the difference between an output value of a PID controller in a target control system and a theoretical value, a current reward value corresponding to PID parameters of the PID controller; an updating unit, configured to update the PID parameters according to a current accumulated reward value and the current reward value when the current reward value is less than 0, where the PID parameters include at least one of a proportional gain, an integral gain, and a derivative gain, and the current accumulated reward value is determined according to the current reward value and the previous accumulated reward value; and a control unit, configured to perform the next PID control on the PID controller in the target control system according to the updated PID parameters.
In a third aspect, a video encoding and decoding system is provided, including the PID control device of the second aspect or any of its implementations.
In a fourth aspect, a PID control device is provided, including a processor and a memory. The memory is configured to store a computer program, and the processor is configured to call and run the computer program stored in the memory to execute the method of the first aspect or any of its implementations.
In a fifth aspect, a chip is provided for implementing the method of the first aspect or any of its implementations.
Specifically, the chip includes a processor configured to call and run a computer program from a memory, so that a device in which the chip is installed executes the method of the first aspect or any of its implementations.
In a sixth aspect, a computer-readable storage medium is provided for storing a computer program that causes a computer to execute the method of the first aspect or any of its implementations.
In a seventh aspect, a computer program product is provided, including computer program instructions that cause a computer to execute the method of the first aspect or any of its implementations.
In an eighth aspect, a computer program is provided which, when run on a computer, causes the computer to execute the method of the first aspect or any of its implementations.
With the above technical solution, the current reward value is determined from the difference between the output value of the PID controller and the theoretical value. When the current reward value is less than 0, the PID parameters are updated in combination with the historical reward values, and the next PID control is performed on the PID controller according to the updated PID parameters. The PID parameters can thus be adjusted adaptively rather than by human experience, which greatly reduces the difficulty of tuning the PID parameters, and the adjustment effect is significant.
Description of the drawings
Fig. 1 is a schematic structural diagram of a PID control system.
Fig. 2 is a schematic block diagram of a PID control method provided by an embodiment of the present application.
Fig. 3 is a schematic diagram of one negative correlation between the current reward value and the absolute value of the difference in an embodiment of the present application.
Fig. 4 is a schematic diagram of another negative correlation between the current reward value and the absolute value of the difference in an embodiment of the present application.
Fig. 5 is a schematic block diagram of a PID control device provided by an embodiment of the present application.
Fig. 6 is a schematic block diagram of a video encoding and decoding system provided by an embodiment of the present application.
Fig. 7 is another schematic block diagram of a PID control device provided by an embodiment of the present application.
Fig. 8 is a schematic block diagram of a chip provided by an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described below with reference to the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
The level of industrial automation has become an important indicator of the degree of modernization across industries. Meanwhile, control theory has developed through three stages: classical control theory, modern control theory, and intelligent control theory. A typical example of intelligent control is the fuzzy-logic fully automatic washing machine. Control systems can be divided into open-loop and closed-loop control systems. PID control is a closed-loop control method: the proportional (P), integral (I), and derivative (D) terms of the deviation between input and output are linearly combined into a control quantity that drives the controlled object. Typically, a control system includes a PID controller and a controlled object, as shown in Fig. 1.
PID control usually comprises three parts, proportional, integral, and derivative, but other controller types also exist in practice. The three control laws can be used alone or in combination, for example as a proportional (P) controller, a proportional-integral (PI) controller, or a proportional-derivative (PD) controller. The controller type is chosen mainly according to the requirements of the control system.
The expression of the PID controller is shown in formula (1):
Figure PCTCN2020117211-appb-000001
In practical applications, the expression can also be discretized, as shown in formula (2):
Figure PCTCN2020117211-appb-000002
Here, u(t) denotes the output value of the PID controller, e(t) denotes the deviation between the input value (that is, the theoretical value) and the output value, and k_p, k_i, and k_d are the proportional gain, integral gain, and derivative gain, which may also be called the proportional coefficient, integral coefficient, and derivative coefficient. e(t) denotes the deviation at time t, and e(t-1) denotes the deviation at time (t-1).
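The formula images are not reproduced in this text, so the following is only a minimal sketch that assumes the common textbook positional form u(t) = k_p*e(t) + k_i*sum(e(0..t)) + k_d*(e(t) - e(t-1)) with a unit sampling interval; the figure in the original filing may use a different but equivalent notation. The class name `DiscretePID`, the method `step`, and the initial value e(-1) = 0 are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DiscretePID:
    """Positional discrete PID (assumed textbook form, unit sample time)."""

    k_p: float  # proportional gain
    k_i: float  # integral gain
    k_d: float  # derivative gain
    error_sum: float = 0.0   # running sum of e(0), ..., e(t)
    prev_error: float = 0.0  # e(t-1); assumed to be 0 before the first step

    def step(self, setpoint: float, measurement: float) -> float:
        """Return u(t) for the deviation e(t) = setpoint - measurement."""
        error = setpoint - measurement
        self.error_sum += error               # integral term accumulates e
        derivative = error - self.prev_error  # e(t) - e(t-1)
        self.prev_error = error
        return (self.k_p * error
                + self.k_i * self.error_sum
                + self.k_d * derivative)


pid = DiscretePID(k_p=2.0, k_i=0.5, k_d=1.0)
first = pid.step(setpoint=1.0, measurement=0.0)   # e = 1.0
second = pid.step(setpoint=1.0, measurement=0.5)  # e = 0.5
```

With these illustrative gains, the first call combines 2.0 + 0.5 + 1.0 and the second combines 1.0 + 0.75 - 0.5, which shows how the derivative term turns negative as the deviation shrinks.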
These three parameters are the key parameters of the PID controller and directly determine its control performance, so parameter tuning is the core of control system design. At present, tuning relies mainly on the experience of practitioners, who repeatedly adjust the three parameters until the requirements of the actual application are met. This involves a very large workload. Moreover, the approach has limited applicability: each tuning result applies only to one specific engineering scenario and does not generalize.
Therefore, the embodiments of the present application provide a new PID control method that borrows the reward and punishment mechanism of reinforcement learning and can dynamically adjust the PID parameters once their initial values have been determined.
Fig. 2 shows a schematic block diagram of a PID control method 100 provided by an embodiment of the present application. As shown in Fig. 2, the PID control method 100 may include some or all of the following:
S110: determine, according to the difference between the output value of a proportional-integral-derivative (PID) controller in a target control system and a theoretical value, the current reward value corresponding to the PID parameters of the PID controller, where the PID parameters include at least one of a proportional gain, an integral gain, and a derivative gain;
S120: when the current reward value is less than 0, update the PID parameters according to the current accumulated reward value and the current reward value, where the current accumulated reward value is determined according to the current reward value and the previous accumulated reward value;
S130: perform the next PID control on the PID controller in the target control system according to the updated PID parameters.
First, it should be noted that "PID controller" in the embodiments of the present application is a collective term for all controllers that use PID control laws and does not denote a particular controller type. That is, the PID controller may use all three control laws (proportional, integral, and derivative), in which case its PID parameters include a proportional gain, an integral gain, and a derivative gain; it may also use only the proportional and integral laws, in which case its PID parameters include a proportional gain and an integral gain. It should be understood that the embodiments of the present application do not limit the type of the PID controller.
In addition, even if the PID parameters of the PID controller include several parameters, the PID control method provided in the embodiments of the present application may be applied to only some of them. For example, if the PID parameters include a proportional gain, an integral gain, and a derivative gain, the method may be applied to only one or two of these three. The choice depends on how much each control law matters in the control system.
In the embodiments of the present application, PID control is a cyclic process, and the PID parameters obtained from each update can be used as the parameters of the next PID control. Specifically, the difference between the output value and the theoretical value in one PID control can be used to decide whether the PID parameters used in that control should be rewarded, and then to determine the current reward value. In other words, the difference between the output value and the theoretical value indicates whether the PID controller performs well. If it performs well, the PID parameters currently in use are rewarded and the current reward value is greater than 0; if it performs poorly, the PID parameters currently in use are penalized and the current reward value is less than 0. When the current reward value is less than 0, that is, when the PID controller performs poorly, the PID parameters can be updated according to the current reward value and the current accumulated reward value. The current accumulated reward value is determined from the previous accumulated reward value and the current reward value. For example, for each PID control, the current reward value can be added to all previous reward values to form the current accumulated reward value. Specifically, let rwd(t) denote the accumulated reward value of the t-th control. rwd(t) can equal the sum of the accumulated reward value of the (t-1)-th control and the current reward value rwd of the t-th control, the accumulated reward value of the (t-1)-th control equals the sum of the accumulated reward value of the (t-2)-th control and the current reward value rwd of the (t-1)-th control, and so on, so that rwd(t) = rwd_t + rwd_{t-1} + rwd_{t-2} + … + rwd_1 + rwd_0, where rwd_i denotes the current reward value of the i-th control, i is an integer less than or equal to t, and rwd_0 = 0. After the PID parameters have been updated, the updated PID parameters can be used to perform the next PID control on the PID controller.
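As a small illustration of the plain running sum described above, the sketch below accumulates a short list of per-step reward values. The reward values are made up for the example, and `initial=0.0` stands in for rwd_0 = 0.

```python
from itertools import accumulate

# Illustrative per-step reward values rwd_1, rwd_2, rwd_3 (not from the source).
rewards = [1.0, -0.5, 0.25]

# accumulated[t] is rwd(t) = rwd_0 + rwd_1 + ... + rwd_t, with rwd_0 = 0.
accumulated = list(accumulate(rewards, initial=0.0))
```

Each entry depends only on the previous entry and the newest reward, so a controller needs to store just one number between PID control steps.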
The PID control method can be executed by a PID control device in the control system. For example, the PID control device may be an independent device placed after the PID controller that adjusts the PID parameters according to the output value of the PID controller. Alternatively, the PID control device may be the PID controller itself. The embodiments of the present application are not limited in this respect.
In addition, the PID control method in the embodiments of the present application can be applied to various control systems, for example constant temperature and humidity systems, power systems, and video encoding and decoding systems. In a video encoding and decoding system, the method can be applied to rate control.
Therefore, in the PID control method of the embodiments of the present application, the current reward value is determined from the difference between the output value of the PID controller and the theoretical value. When the current reward value is less than 0, the PID parameters are updated in combination with the historical reward values, and the next PID control is performed on the PID controller according to the updated PID parameters. The PID parameters can thus be adjusted adaptively rather than by human experience, which greatly reduces the difficulty of tuning the PID parameters, and the adjustment effect is significant. The PID control method is also general-purpose.
Optionally, in the embodiments of the present application, the current reward value may be determined from the difference between the output value of the PID controller and the theoretical value by comparing the absolute value of the difference with a first threshold. If the absolute value is less than the first threshold, the performance of the PID controller is considered good and the PID parameters in use can be rewarded, that is, the current reward value is greater than 0. If the absolute value is greater than the first threshold, the performance is considered poor and the PID parameters in use need to be penalized, that is, the current reward value is less than 0. If the absolute value equals the first threshold, the performance is considered average and the PID parameters are neither rewarded nor penalized, that is, the current reward value equals 0.
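The three-way comparison above can be sketched as follows. The function name `reward_sign` and the convention of returning +1, 0, or -1 for the sign of the current reward value are assumptions made for illustration; the text only fixes the sign, not the magnitude.

```python
def reward_sign(output: float, target: float, threshold: float) -> int:
    """Sign of the current reward value from |output - target| vs. the threshold."""
    error = abs(output - target)
    if error < threshold:
        return 1    # good performance: reward, rwd > 0
    if error > threshold:
        return -1   # poor performance: penalty, rwd < 0
    return 0        # exactly at the threshold: neither reward nor penalty


signs = [reward_sign(10.5, 10.0, 1.0),
         reward_sign(11.0, 10.0, 1.0),
         reward_sign(12.0, 10.0, 1.0)]
```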
Optionally, the current reward value may be negatively correlated with the absolute value of the difference. For example, the correlation may be linear, as shown in Fig. 3, in which case the current reward value can be determined by formula (3):
rwd = -a * |u(t) - v*| + b       Formula (3)
As another example, the correlation may be nonlinear, as shown in Fig. 4, in which case the current reward value can be determined by formula (4):
rwd = -a * ln(|u(t) - v*| + 1) + b       Formula (4)
Here, rwd denotes the current reward value, u(t) denotes the output value of the PID controller, v* denotes the theoretical value, and a and b are constants greater than 0. In Fig. 3 and Fig. 4, δ denotes the first threshold.
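Formulas (3) and (4) can be sketched directly. The function names are hypothetical. The choice of b below is an inference rather than something the text states: if the reward is to be exactly 0 when the absolute difference equals the first threshold δ, then b = a*δ for formula (3) and b = a*ln(δ + 1) for formula (4).

```python
import math


def linear_reward(output: float, target: float, a: float, b: float) -> float:
    """Formula (3): rwd = -a * |u(t) - v*| + b."""
    return -a * abs(output - target) + b


def log_reward(output: float, target: float, a: float, b: float) -> float:
    """Formula (4): rwd = -a * ln(|u(t) - v*| + 1) + b."""
    return -a * math.log(abs(output - target) + 1.0) + b


# Illustrative constants; b is derived so that the reward crosses 0 at delta.
delta = 2.0
a = 1.0
b_linear = a * delta
b_log = a * math.log(delta + 1.0)

small_error = linear_reward(5.0, 4.0, a, b_linear)  # |diff| = 1 < delta
large_error = linear_reward(7.0, 4.0, a, b_linear)  # |diff| = 3 > delta
```

The logarithmic variant penalizes large deviations more gently than the linear one, which may matter when the output occasionally spikes.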
Alternatively, besides the logarithm in formula (4), the nonlinear model in the embodiments of the present application may use a square root, an exponential, a trigonometric function, or the like.
Optionally, in the embodiments of the present application, the current accumulated reward value may be determined by formula (5):
rwd(t) = max(0, rwd(t-1) + rwd)       Formula (5)
Here, rwd(t) denotes the current accumulated reward value, rwd(t-1) denotes the previous accumulated reward value, and rwd denotes the current reward value.
Specifically, as long as rwd(t-1) + rwd is greater than 0 in every PID control, rwd(t) is the sum of the current reward values determined in all PID controls up to and including the current one. If rwd(t-1) + rwd is less than 0 in some PID control, the accumulated reward value rwd(t) for that control is 0, that is, the accumulation restarts from the next PID control.
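A minimal sketch of formula (5) follows, showing the reset to 0 when the running total would go negative. The function name `accumulate_reward` and the reward sequence are illustrative assumptions.

```python
def accumulate_reward(prev_accumulated: float, reward: float) -> float:
    """Formula (5): rwd(t) = max(0, rwd(t-1) + rwd)."""
    return max(0.0, prev_accumulated + reward)


history = []
accumulated = 0.0  # starting value before the first PID control
for reward in [1.0, -0.5, -2.0, 0.25]:  # illustrative per-step reward values
    accumulated = accumulate_reward(accumulated, reward)
    history.append(accumulated)
```

In this run the third reward would push the total to -1.5, so the accumulated value is clamped to 0 and the fourth step starts counting again from there.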
Optionally, in the embodiments of the present application, when the current reward value is greater than or equal to 0, the PID parameters currently in use need not be updated; the next PID control then still uses the PID parameters of the current PID control. Even in this case, the current accumulated reward value is still determined from the previous accumulated reward value and the current reward value. In other words, the current accumulated reward value is computed regardless of the sign of the current reward value: when the current reward value is greater than or equal to 0, the PID parameters are not updated, and when it is less than 0, the PID parameters are updated according to the current accumulated reward value and the current reward value.
在一种可替代的实施例中,当所述本次奖励值大于或等于0时,也可以对PID参数进行微调。例如,K 2=a*K 1,其中,K 2是更新后的PID参数,K 1是更新前的PID参数,而a则接近于1,如a=0.99,a=1.01等。本申请实施例对此不作限定。 In an alternative embodiment, when the current reward value is greater than or equal to 0, the PID parameters can also be fine-tuned. For example, K 2 =a*K 1 , where K 2 is the PID parameter after update, K 1 is the PID parameter before update, and a is close to 1, such as a=0.99, a=1.01 and so on. The embodiment of the application does not limit this.
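The non-negative-reward branch described above might be sketched as follows. Storing the gains in a dictionary, the function name, and applying one common factor to every gain are assumptions; the text only gives K_2 = a*K_1 for a single PID parameter.

```python
from typing import Dict, Optional


def keep_or_fine_tune(params: Dict[str, float], reward: float,
                      factor: Optional[float] = None) -> Dict[str, float]:
    """Handle the reward >= 0 case: keep the gains, or scale them by a factor near 1."""
    if reward < 0:
        # Negative rewards go through the penalty update, which is not covered here.
        raise ValueError("negative reward: use the penalty update instead")
    if factor is None:
        return dict(params)  # default embodiment: parameters unchanged
    # Alternative embodiment: K_2 = a * K_1 with a close to 1 (e.g. 0.99 or 1.01).
    return {name: factor * value for name, value in params.items()}
```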
Optionally, in this embodiment of the present application, updating the PID parameters according to the current accumulated reward value and the current reward value when the current reward value is less than 0 includes: when the current reward value is less than 0, updating the PID parameters according to the current accumulated reward value, the current reward value, and an update rate, where the update rate adjusts the weight that the current reward value carries in the PID parameter update.
Alternatively, when the update rate is a constant, the PID parameters may be updated according to only two variables: the current accumulated reward value and the current reward value.
When the current reward value is less than 0, that is, when the current PID parameters need to be penalized, the penalty can be classified as positive or negative according to the relationship between the output value and the theoretical value. If the output value is smaller than the theoretical value, the penalty is regarded as positive: the PID controller is not adjusting strongly enough and the adjustment strength of the PID parameters needs to be increased, so the update rate can be used to increase the weight of the current reward value in the PID parameter update. If the output value is larger than the theoretical value, the penalty is regarded as negative: the PID controller is adjusting too strongly and the adjustment strength of the PID parameters needs to be reduced, so the update rate can be used to reduce the weight of the current reward value in the PID parameter update.
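The positive/negative classification above can be illustrated with a short sketch. The enum, the function name, and the handling of an output exactly equal to the theoretical value are assumptions; the text does not cover that last case.

```python
from enum import Enum


class Penalty(Enum):
    NONE = "none"
    POSITIVE = "positive"   # controller adjusts too weakly
    NEGATIVE = "negative"   # controller adjusts too strongly


def classify_penalty(reward: float, output: float, theoretical: float) -> Penalty:
    """Classify the penalty from the reward sign and the output/theoretical relation."""
    if reward >= 0:
        return Penalty.NONE            # no penalty is applied
    if output < theoretical:
        return Penalty.POSITIVE
    if output > theoretical:
        return Penalty.NEGATIVE
    # Equal values with a negative reward are not described in the text;
    # treating this as "no penalty" is an assumption.
    return Penalty.NONE
```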
Optionally, the update rate may be adjusted according to the actual situation; for example, it may be updated during the PID parameter update process. That is, each time the PID parameters are updated, the update rate is also updated and serves as the update rate for the next PID parameter update. In one possible implementation, when the current accumulated reward value is greater than a second threshold, the update rate is reduced, which allows a more precise update. Optionally, when the current accumulated reward value is less than the second threshold, the update rate may be increased; when it equals the second threshold, the update rate may be left unchanged.
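A hedged sketch of this threshold rule follows. The text specifies only the direction of the change; the fixed step size and the clamping bounds are assumptions, chosen to keep the update rate strictly between 0 and 1.

```python
def adjust_update_rate(update_rate: float, accumulated_reward: float,
                       second_threshold: float, step: float = 0.05,
                       lower: float = 0.01, upper: float = 0.99) -> float:
    """Lower the update rate above the second threshold, raise it below, else keep it.

    `step`, `lower` and `upper` are illustrative assumptions, not values from the text.
    """
    if accumulated_reward > second_threshold:
        update_rate -= step
    elif accumulated_reward < second_threshold:
        update_rate += step
    # Keep the rate inside the open interval (0, 1).
    return min(upper, max(lower, update_rate))
```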
Further, when the current reward value is less than 0, a positive penalty may update the PID parameters using Formula (6):
[Formula (6): equation image PCTCN2020117211-appb-000003]
A negative penalty may update the PID parameters using Formula (7):
[Formula (7): equation image PCTCN2020117211-appb-000004]
where k_2 denotes the updated PID parameter, k_1 denotes the PID parameter before the update, rwd(t) denotes the current accumulated reward value, psh denotes the negative of the current reward value, that is, psh = -rwd, and ur denotes the update rate, whose value lies between 0 and 1.
Substituting the PID parameters k_p, k_i, and k_d into Formula (6) and Formula (7) respectively, Formula (6) becomes Formula (8):
[Formula (8): equation image PCTCN2020117211-appb-000005]
and Formula (7) becomes Formula (9):
[Formula (9): equation image PCTCN2020117211-appb-000006]
It should be noted that the update formulas in Formulas (6) to (9) are illustrative only and not limiting; simple variations of these formulas also fall within the protection scope of the technical solution of the present application.
It should be understood that the first threshold, the second threshold, and the update rate in the embodiments of the present application can be obtained from the experience of practitioners in the field. The negative correlation between the current reward value and the absolute value of the difference can likewise be obtained from such experience, and these are relatively easy for practitioners to obtain.
In addition, the three parameters k_p, k_i, and k_d usually differ in adjustment strength; for example, k_p is the largest, followed by k_i and k_d. Different update rates ur_p, ur_i, and ur_d can therefore be set for the three parameters.
FIG. 5 shows a schematic block diagram of a PID control apparatus 200 provided by an embodiment of the present application. As shown in FIG. 5, the PID control apparatus 200 includes some or all of the following:
a determining unit 210, configured to determine, according to the difference between the output value of a proportional-integral-derivative (PID) controller in a target control system and a theoretical value, the current reward value corresponding to the PID parameters of the PID controller;
an updating unit 220, configured to update the PID parameters according to a current accumulated reward value and the current reward value when the current reward value is less than 0, where the PID parameters include at least one of a proportional gain, an integral gain, and a derivative gain, and the current accumulated reward value is determined from the current reward value and the previous accumulated reward value; and
a control unit 230, configured to perform the next round of PID control on the PID controller in the target control system according to the updated PID parameters.
Therefore, the PID control apparatus of the embodiment of the present application determines the current reward value from the difference between the output value of the PID controller and the theoretical value, updates the PID parameters in combination with historical reward values when the current reward value is less than 0, and performs the next round of PID control on the PID controller according to the updated PID parameters. The PID parameters can thus be adjusted adaptively rather than by human experience, which greatly reduces the difficulty of PID parameter tuning, and the adjustment effect is significant. The PID control method is also general-purpose.
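The cooperation of the determining, updating, and control units might be sketched as the skeleton below. The class name, the dictionary of gains, and both callables are hypothetical. In particular, `update_fn` stands in for the penalty update formulas, which appear in the source only as equation images and are deliberately not reproduced or guessed here.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Gains = Dict[str, float]


@dataclass
class PidTuner:
    # (output, theoretical) -> current reward value
    reward_fn: Callable[[float, float], float]
    # (gains, accumulated reward, reward, is_positive_penalty) -> new gains.
    # Hypothetical placeholder for the penalty update formulas.
    update_fn: Callable[[Gains, float, float, bool], Gains]
    accumulated: float = 0.0  # initial accumulated reward is an assumption

    def step(self, gains: Gains, output: float, theoretical: float) -> Gains:
        """Return the gains to use in the next round of PID control."""
        reward = self.reward_fn(output, theoretical)             # determining unit
        # Accumulated reward is refreshed whatever the sign of the reward.
        self.accumulated = max(0.0, self.accumulated + reward)
        if reward >= 0:
            return gains                                         # no update needed
        positive = output < theoretical                          # positive penalty
        return self.update_fn(gains, self.accumulated, reward, positive)  # updating unit
```

The caller plays the role of the control unit: it feeds the returned gains to the PID controller for the next round.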
Optionally, in this embodiment of the present application, the current reward value is negatively correlated with the absolute value of the difference.
Optionally, in this embodiment of the present application, if the absolute value of the difference is less than or equal to a first threshold, the current reward value is greater than or equal to 0; if the absolute value of the difference is greater than the first threshold, the current reward value is less than 0.
Optionally, in this embodiment of the present application, the determining unit is specifically configured to:
determine the current reward value according to a first formula, where the first formula is:
rwd = -a*|u(t) - v*| + b; or
determine the current reward value according to a second formula, where the second formula is:
rwd = -a*ln(|u(t) - v*| + 1) + b;
where rwd denotes the current reward value, u(t) denotes the output value of the PID controller, v* denotes the theoretical value, and a and b are both constants greater than 0.
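The two reward formulas above translate directly into code. The sketch below also computes the absolute difference at which each reward crosses 0, which follows algebraically from the formulas (b/a for the first, e^(b/a) - 1 for the second); reading that crossing as the first threshold is an interpretation, and the function names are assumptions.

```python
import math


def reward_linear(output: float, theoretical: float, a: float, b: float) -> float:
    """First formula: rwd = -a*|u(t) - v*| + b."""
    return -a * abs(output - theoretical) + b


def reward_log(output: float, theoretical: float, a: float, b: float) -> float:
    """Second formula: rwd = -a*ln(|u(t) - v*| + 1) + b."""
    return -a * math.log(abs(output - theoretical) + 1.0) + b


def zero_crossing_linear(a: float, b: float) -> float:
    """|u(t) - v*| at which the first formula gives rwd = 0."""
    return b / a


def zero_crossing_log(a: float, b: float) -> float:
    """|u(t) - v*| at which the second formula gives rwd = 0."""
    return math.exp(b / a) - 1.0
```

Both rewards equal b when the output matches the theoretical value and decrease as the absolute difference grows, which is the negative correlation stated above.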
Optionally, in this embodiment of the present application, the determining unit is further configured to:
determine the current accumulated reward value according to a third formula, where the third formula is:
rwd(t) = max(0, rwd(t-1) + rwd);
where rwd(t) denotes the current accumulated reward value, rwd(t-1) denotes the previous accumulated reward value, and rwd denotes the current reward value.
Optionally, in this embodiment of the present application, the updating unit is specifically configured to:
when the current reward value is less than 0, update the PID parameters according to the current accumulated reward value, the current reward value, and an update rate, where the update rate adjusts the weight that the current reward value carries in the PID parameter update.
Optionally, in this embodiment of the present application, when the current reward value is less than 0 and the output value is less than the theoretical value, the update rate is used to increase the weight of the current reward value in the PID parameter update; when the current reward value is less than 0 and the output value is greater than the theoretical value, the update rate is used to reduce the weight of the current reward value in the PID parameter update.
Optionally, in this embodiment of the present application, the updating unit is specifically configured to:
when the current reward value is less than 0 and the output value is less than the theoretical value, update the PID parameters according to a fourth formula, where the fourth formula is:
[Fourth formula: equation image PCTCN2020117211-appb-000007]
when the current reward value is less than 0 and the output value is greater than the theoretical value, update the PID parameters according to a fifth formula, where the fifth formula is:
[Fifth formula: equation image PCTCN2020117211-appb-000008]
where k_2 denotes the updated PID parameter, k_1 denotes the PID parameter before the update, rwd(t) denotes the current accumulated reward value, psh denotes the negative of the current reward value, and ur denotes the update rate, whose value lies between 0 and 1.
Optionally, in this embodiment of the present application, the updating unit is further configured to:
reduce the update rate if the current accumulated reward value is greater than a second threshold.
Optionally, in this embodiment of the present application, the control unit is further configured to:
when the current reward value is greater than or equal to 0, perform the next round of PID control on the PID controller in the target control system according to the PID parameters.
Optionally, in this embodiment of the present application, the determining unit is further configured to:
when the current reward value is greater than or equal to 0, determine the current accumulated reward value from the current reward value and the previous accumulated reward value, where the current accumulated reward value serves as the previous accumulated reward value in the next PID parameter update.
Optionally, in this embodiment of the present application, the target control system is a video encoding and decoding system, and the PID control apparatus is applicable to rate control in the video encoding and decoding system.
It should be understood that the PID control apparatus 200 according to the embodiment of the present application may correspond to the execution body of the method embodiments of the present application, and that the above and other operations and/or functions of the units in the PID control apparatus 200 respectively implement the corresponding procedures of the method in FIG. 2. For brevity, details are not repeated here.
It should be understood that the sequence numbers of the above processes do not imply an order of execution. The order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Although the present application and its advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made to the present application without departing from its spirit and scope as defined by the appended claims.
FIG. 6 is a schematic block diagram of a video encoding and decoding system 300 provided by an embodiment of the present application. The video encoding and decoding system 300 includes a PID controller 310, an encoding parameter adjustment apparatus 320, an encoder 330, a buffer 340, and the PID control apparatus 350 of any of the embodiments above. Specifically, the difference between the buffer target line and the buffer fullness serves as the proportional term of the PID controller 310. The encoding parameter adjustment apparatus 320 computes the encoding parameters (such as the quantization parameter QP and the Lagrange multiplier λ) by feedback from the output of the PID controller 310, and then passes the adjusted encoding parameters to the encoder 330 for actual encoding. After a frame has been encoded, the buffer is updated and the next round of PID control begins. The PID control apparatus 350 in the embodiment of the present application can adjust the PID parameters when the buffer is updated, according to the difference between the buffer target line and the buffer fullness. If the error is large, the PID parameters need to be penalized; in that case, the adjustment of the encoding parameters can additionally be used to decide whether the penalty is positive or negative, which completes the update of the PID parameters for the next round of rate control.
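To make the controller side of this loop concrete, the sketch below applies a textbook discrete PID step to the buffer error. The class name, the sign convention (target line minus fullness), and the unit time step are assumptions. The mapping from the controller output to QP or λ is not specified in the text and is left out.

```python
from dataclasses import dataclass


@dataclass
class BufferPid:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0     # running sum of errors
    prev_error: float = 0.0   # error of the previous frame

    def step(self, target_line: float, fullness: float) -> float:
        """Return the PID output for one frame from the buffer state."""
        error = target_line - fullness          # proportional term input
        self.integral += error                  # integral term (unit time step)
        derivative = error - self.prev_error    # derivative term (unit time step)
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

In a complete rate-control loop, the gains kp, ki, and kd of such a controller would be the quantities that the PID control apparatus adjusts after each buffer update.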
As a concrete example of deciding between a positive and a negative penalty: when the actual number of bits produced by the encoder is large, so that the error grows, while QP or λ is already being increased, the adjustment is not strong enough and needs to be strengthened; this is a positive penalty. See Table 1 for details.
Table 1
[Table 1: image PCTCN2020117211-appb-000009]
FIG. 7 is a schematic structural diagram of a PID control apparatus 400 provided by an embodiment of the present application. The PID control apparatus 400 includes a memory 410 and a processor 420. The memory 410 is configured to store instructions, and the processor 420 is configured to execute the instructions stored in the memory 410. Specifically, the processor 420 is configured to perform the following operations: determining, according to the difference between the output value of a proportional-integral-derivative (PID) controller in a target control system and a theoretical value, the current reward value corresponding to the PID parameters of the PID controller, where the PID parameters include at least one of a proportional gain, an integral gain, and a derivative gain; when the current reward value is less than 0, updating the PID parameters according to a current accumulated reward value and the current reward value, where the current accumulated reward value is determined from the current reward value and the previous accumulated reward value; and performing the next round of PID control on the PID controller in the target control system according to the updated PID parameters.
FIG. 8 is a schematic structural diagram of a chip according to an embodiment of the present application. The chip 500 shown in FIG. 8 includes a processor 510, which can call and run a computer program from a memory to implement the methods in the embodiments of the present application.
Optionally, as shown in FIG. 8, the chip 500 may further include a memory 520. The processor 510 can call and run a computer program from the memory 520 to implement the methods in the embodiments of the present application.
The memory 520 may be a separate device independent of the processor 510, or may be integrated in the processor 510.
Optionally, the chip 500 may further include an input interface 530. The processor 510 can control the input interface 530 to communicate with other devices or chips; specifically, it can obtain information or data sent by other devices or chips.
Optionally, the chip 500 may further include an output interface 550. The processor 510 can control the output interface 550 to communicate with other devices or chips; specifically, it can output information or data to other devices or chips.
Optionally, the chip can be applied to the PID control apparatus in the embodiments of the present application, and the chip can implement the corresponding procedures of the methods in the embodiments of the present application. For brevity, details are not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
It should be understood that the processor of the embodiments of the present application may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing method embodiments may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the foregoing methods in combination with its hardware.
It can be understood that the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM). It should be noted that the memories of the systems and methods described herein are intended to include, but not be limited to, these and any other suitable types of memory.
It should be understood that the above memories are described by way of example and not limitation. For example, the memory in the embodiments of the present application may also be a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a Synchlink DRAM (SLDRAM), a Direct Rambus RAM (DR RAM), and so on. That is, the memory in the embodiments of the present application is intended to include, but not be limited to, these and any other suitable types of memory.
An embodiment of the present application further provides a computer-readable storage medium for storing a computer program.
Optionally, the computer-readable storage medium can be applied to the network device in the embodiments of the present application, and the computer program causes a computer to execute the corresponding procedures implemented by the network device in the methods of the embodiments of the present application. For brevity, details are not repeated here.
Optionally, the computer-readable storage medium can be applied to the mobile terminal/terminal device in the embodiments of the present application, and the computer program causes a computer to execute the corresponding procedures implemented by the mobile terminal/terminal device in the methods of the embodiments of the present application. For brevity, details are not repeated here.
An embodiment of the present application further provides a computer program product including computer program instructions.
Optionally, the computer program product can be applied to the network device in the embodiments of the present application, and the computer program instructions cause a computer to execute the corresponding procedures implemented by the network device in the methods of the embodiments of the present application. For brevity, details are not repeated here.
Optionally, the computer program product can be applied to the mobile terminal/terminal device in the embodiments of the present application, and the computer program instructions cause a computer to execute the corresponding procedures implemented by the mobile terminal/terminal device in the methods of the embodiments of the present application. For brevity, details are not repeated here.
An embodiment of the present application further provides a computer program.
Optionally, the computer program can be applied to the network device in the embodiments of the present application. When the computer program runs on a computer, it causes the computer to execute the corresponding procedures implemented by the network device in the methods of the embodiments of the present application. For brevity, details are not repeated here.
Optionally, the computer program can be applied to the mobile terminal/terminal device in the embodiments of the present application. When the computer program runs on a computer, it causes the computer to execute the corresponding procedures implemented by the mobile terminal/terminal device in the methods of the embodiments of the present application. For brevity, details are not repeated here.
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the system, device and unit described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative, for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or It can be integrated into another system, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
The above are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (30)

  1. A PID control method, characterized by comprising:
    determining, according to a difference between an output value of a proportional-integral-derivative (PID) controller in a target control system and a theoretical value, a current reward value corresponding to a PID parameter of the PID controller, wherein the PID parameter comprises at least one of a proportional gain, an integral gain, and a derivative gain;
    when the current reward value is less than 0, updating the PID parameter according to a current cumulative reward value and the current reward value, wherein the current cumulative reward value is determined according to the current reward value and a previous cumulative reward value; and
    performing, according to the updated PID parameter, a next PID control on the PID controller in the target control system.
  2. The PID control method according to claim 1, wherein the current reward value is negatively correlated with the absolute value of the difference.
  3. The PID control method according to claim 1 or 2, wherein if the absolute value of the difference is less than or equal to a first threshold, the current reward value is greater than or equal to 0; and if the absolute value of the difference is greater than the first threshold, the current reward value is less than 0.
  4. The PID control method according to claim 2 or 3, wherein the determining, according to the difference between the output value of the target control system and the theoretical value, of the current reward value corresponding to the PID parameter of the PID controller comprises:
    determining the current reward value according to a first formula, wherein the first formula is:
    $rwd = -a \cdot |u(t) - v^{*}| + b$; or
    determining the current reward value according to a second formula, wherein the second formula is:
    $rwd = -a \cdot \ln(|u(t) - v^{*}| + 1) + b$;
    where $rwd$ denotes the current reward value, $u(t)$ denotes the output value of the PID controller, $v^{*}$ denotes the theoretical value, and $a$ and $b$ are both constants greater than 0.
  5. The PID control method according to any one of claims 1 to 4, wherein the PID control method further comprises:
    determining the current cumulative reward value according to a third formula, wherein the third formula is:
    $rwd(t) = \max(0,\ rwd(t-1) + rwd)$;
    where $rwd(t)$ denotes the current cumulative reward value, $rwd(t-1)$ denotes the previous cumulative reward value, and $rwd$ denotes the current reward value.
  6. The PID control method according to any one of claims 1 to 5, wherein the updating of the PID parameter according to the current cumulative reward value and the current reward value when the current reward value is less than 0 comprises:
    when the current reward value is less than 0, updating the PID parameter according to the current cumulative reward value, the current reward value, and an update rate, wherein the update rate is used to adjust the proportion that the current reward value contributes when the PID parameter is updated.
  7. The PID control method according to claim 6, wherein when the current reward value is less than 0 and the output value is less than the theoretical value, the update rate is used to increase the proportion that the current reward value contributes when the PID parameter is updated; and when the current reward value is less than 0 and the output value is greater than the theoretical value, the update rate is used to decrease the proportion that the current reward value contributes when the PID parameter is updated.
  8. The PID control method according to claim 6 or 7, wherein the updating of the PID parameter according to the current cumulative reward value, the current reward value, and the update rate when the current reward value is less than 0 comprises:
    when the current reward value is less than 0 and the output value is less than the theoretical value, updating the PID parameter according to a fourth formula, wherein the fourth formula is:
    [Formula image: Figure PCTCN2020117211-appb-100001]
    when the current reward value is less than 0 and the output value is greater than the theoretical value, updating the PID parameter according to a fifth formula, wherein the fifth formula is:
    [Formula image: Figure PCTCN2020117211-appb-100002]
    where $k_2$ denotes the PID parameter after the update, $k_1$ denotes the PID parameter before the update, $rwd(t)$ denotes the current cumulative reward value, $psh$ denotes the negative of the current reward value, and $ur$ denotes the update rate, which takes a value between 0 and 1.
  9. The PID control method according to any one of claims 6 to 8, wherein the PID control method further comprises:
    if the current cumulative reward value is greater than a second threshold, decreasing the update rate.
  10. The PID control method according to any one of claims 1 to 9, wherein the PID control method further comprises:
    when the current reward value is greater than or equal to 0, performing the next PID control on the PID controller in the target control system according to the PID parameter.
  11. The PID control method according to any one of claims 1 to 10, wherein the PID control method further comprises:
    when the current reward value is greater than or equal to 0, determining the current cumulative reward value according to the current reward value and the previous cumulative reward value, wherein the current cumulative reward value serves as the previous cumulative reward value used the next time the PID parameter is updated.
  12. The PID control method according to any one of claims 1 to 11, wherein the target control system is a video encoding and decoding system, and the PID control method is applicable to rate control in the video encoding and decoding system.
  13. A PID control device, characterized in that the PID control device comprises:
    a determining unit, configured to determine, according to a difference between an output value of a proportional-integral-derivative (PID) controller in a target control system and a theoretical value, a current reward value corresponding to a PID parameter of the PID controller;
    an update unit, configured to update the PID parameter according to a current cumulative reward value and the current reward value when the current reward value is less than 0, wherein the PID parameter comprises at least one of a proportional gain, an integral gain, and a derivative gain, and the current cumulative reward value is determined according to the current reward value and a previous cumulative reward value; and
    a control unit, configured to perform, according to the updated PID parameter, a next PID control on the PID controller in the target control system.
  14. The PID control device according to claim 13, wherein the current reward value is negatively correlated with the absolute value of the difference.
  15. The PID control device according to claim 13 or 14, wherein if the absolute value of the difference is less than or equal to a first threshold, the current reward value is greater than or equal to 0; and if the absolute value of the difference is greater than the first threshold, the current reward value is less than 0.
  16. The PID control device according to claim 14 or 15, wherein the determining unit is specifically configured to:
    determine the current reward value according to a first formula, wherein the first formula is:
    $rwd = -a \cdot |u(t) - v^{*}| + b$; or
    determine the current reward value according to a second formula, wherein the second formula is:
    $rwd = -a \cdot \ln(|u(t) - v^{*}| + 1) + b$;
    where $rwd$ denotes the current reward value, $u(t)$ denotes the output value of the PID control device, $v^{*}$ denotes the theoretical value, and $a$ and $b$ are both constants greater than 0.
  17. The PID control device according to any one of claims 13 to 16, wherein the determining unit is further configured to:
    determine the current cumulative reward value according to a third formula, wherein the third formula is:
    $rwd(t) = \max(0,\ rwd(t-1) + rwd)$;
    where $rwd(t)$ denotes the current cumulative reward value, $rwd(t-1)$ denotes the previous cumulative reward value, and $rwd$ denotes the current reward value.
  18. The PID control device according to any one of claims 13 to 17, wherein the update unit is specifically configured to:
    when the current reward value is less than 0, update the PID parameter according to the current cumulative reward value, the current reward value, and an update rate, wherein the update rate is used to adjust the proportion that the current reward value contributes when the PID parameter is updated.
  19. The PID control device according to claim 18, wherein when the current reward value is less than 0 and the output value is less than the theoretical value, the update rate is used to increase the proportion that the current reward value contributes when the PID parameter is updated; and when the current reward value is less than 0 and the output value is greater than the theoretical value, the update rate is used to decrease the proportion that the current reward value contributes when the PID parameter is updated.
  20. The PID control device according to claim 18 or 19, wherein the update unit is specifically configured to:
    when the current reward value is less than 0 and the output value is less than the theoretical value, update the PID parameter according to a fourth formula, wherein the fourth formula is:
    [Formula image: Figure PCTCN2020117211-appb-100003]
    when the current reward value is less than 0 and the output value is greater than the theoretical value, update the PID parameter according to a fifth formula, wherein the fifth formula is:
    [Formula image: Figure PCTCN2020117211-appb-100004]
    where $k_2$ denotes the PID parameter after the update, $k_1$ denotes the PID parameter before the update, $rwd(t)$ denotes the current cumulative reward value, $psh$ denotes the negative of the current reward value, and $ur$ denotes the update rate, which takes a value between 0 and 1.
  21. The PID control device according to any one of claims 18 to 20, wherein the update unit is further configured to:
    decrease the update rate if the current cumulative reward value is greater than a second threshold.
  22. The PID control device according to any one of claims 13 to 21, wherein the control unit is further configured to:
    when the current reward value is greater than or equal to 0, perform the next PID control on the PID controller in the target control system according to the PID parameter.
  23. The PID control device according to any one of claims 13 to 22, wherein the determining unit is further configured to:
    when the current reward value is greater than or equal to 0, determine the current cumulative reward value according to the current reward value and the previous cumulative reward value, wherein the current cumulative reward value serves as the previous cumulative reward value used the next time the PID parameter is updated.
  24. The PID control device according to any one of claims 13 to 23, wherein the target control system is a video encoding and decoding system, and the PID control device is applicable to rate control in the video encoding and decoding system.
  25. A video encoding and decoding system, characterized by comprising the PID control device according to any one of claims 13 to 24, wherein the PID control device is applicable to rate control in the video encoding and decoding system.
  26. A PID control device, characterized by comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to call and run the computer program stored in the memory to perform the method according to any one of claims 1 to 12.
  27. A chip, characterized by comprising a processor configured to call and run a computer program from a memory, so that a device in which the chip is installed performs the method according to any one of claims 1 to 12.
  28. A computer-readable storage medium, characterized in that it is configured to store a computer program, wherein the computer program causes a computer to perform the method according to any one of claims 1 to 12.
  29. A computer program product, characterized by comprising computer program instructions, wherein the computer program instructions cause a computer to perform the method according to any one of claims 1 to 12.
  30. A computer program, characterized in that the computer program causes a computer to perform the method according to any one of claims 1 to 12.
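
The reward computation of claims 4 and 5 and the update gating of claims 1, 10 and 11 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: all function and class names are invented here, a single scalar gain stands in for the PID parameter, and because the fourth and fifth formulas are available only as images, the update rule is supplied by the caller. The `placeholder_rule` below is an arbitrary stand-in and does not reproduce those formulas.

```python
import math
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (k, cumulative reward, current reward,
# output_below_target) -> new k
UpdateRule = Callable[[float, float, float, bool], float]


def reward_linear(u: float, v_star: float, a: float, b: float) -> float:
    """First formula: rwd = -a*|u(t) - v*| + b, with a > 0 and b > 0."""
    return -a * abs(u - v_star) + b


def reward_log(u: float, v_star: float, a: float, b: float) -> float:
    """Second formula: rwd = -a*ln(|u(t) - v*| + 1) + b."""
    return -a * math.log(abs(u - v_star) + 1.0) + b


def accumulate(prev_cum: float, rwd: float) -> float:
    """Third formula: rwd(t) = max(0, rwd(t-1) + rwd)."""
    return max(0.0, prev_cum + rwd)


@dataclass
class TunerState:
    k: float          # one PID gain (hypothetical scalar stand-in)
    cum: float = 0.0  # cumulative reward carried into the next step


def step(state: TunerState, u: float, v_star: float,
         a: float, b: float, update_rule: UpdateRule) -> TunerState:
    """One tuning step using the first (linear) reward formula."""
    rwd = reward_linear(u, v_star, a, b)
    # The cumulative reward is refreshed whether or not the reward is negative.
    cum = accumulate(state.cum, rwd)
    k = state.k
    if rwd < 0:
        # The gain changes only when the current reward is negative.
        k = update_rule(k, cum, rwd, u < v_star)
    return TunerState(k=k, cum=cum)


def placeholder_rule(k: float, cum: float, rwd: float, below: bool) -> float:
    """Arbitrary stand-in; NOT the fourth or fifth formula of the claims."""
    return k * 1.1 if below else k * 0.9
```

Under the first formula, the reward is non-negative exactly when |u(t) - v*| <= b/a, so b/a would play the role of the first threshold of claim 3; this reading is an inference from the formula rather than something the claims state. For example, with a = 1 and b = 2, an output of 10 against a theoretical value of 9 gives a reward of 1 and leaves the gain unchanged, while a theoretical value of 5 gives a reward of -3 and triggers the update rule.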
PCT/CN2020/117211 2019-10-09 2020-09-23 Pid control method and apparatus, and video encoding and decoding system WO2021068748A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910955024.5A CN112631120B (en) 2019-10-09 2019-10-09 PID control method, device and video coding and decoding system
CN201910955024.5 2019-10-09

Publications (1)

Publication Number Publication Date
WO2021068748A1 true WO2021068748A1 (en) 2021-04-15

Family

ID=75283283

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117211 WO2021068748A1 (en) 2019-10-09 2020-09-23 Pid control method and apparatus, and video encoding and decoding system

Country Status (2)

Country Link
CN (1) CN112631120B (en)
WO (1) WO2021068748A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102355584A (en) * 2011-10-31 2012-02-15 电子科技大学 Code rate control method based on intra-frame predictive coding modes
CN107154918A (en) * 2016-03-03 2017-09-12 北京大学 Net cast transfer control method and system based on PID control
CN107943022A (en) * 2017-10-23 2018-04-20 清华大学 A kind of PID locomotive automatic Pilot optimal control methods based on intensified learning
CN108008627A (en) * 2017-12-13 2018-05-08 中国石油大学(华东) A kind of reinforcement learning adaptive PID control method of parallel optimization
US20190187631A1 (en) * 2017-12-15 2019-06-20 Exxonmobil Research And Engineering Company Adaptive pid controller tuning via deep reinforcement learning

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110167025A1 (en) * 2008-07-24 2011-07-07 Kourosh Danai Systems and methods for parameter adaptation
CN102636989B (en) * 2012-04-25 2014-09-17 北京科技大学 Design method for data-driven PID (proportional integral derivative) controller for adjusting depth of stockline of bell-less top blast furnace
CN102787915A (en) * 2012-06-06 2012-11-21 哈尔滨工程大学 Diesel engine electronic speed adjusting method based on reinforced study of proportion integration differentiation (PID) controller
US9958840B2 (en) * 2015-02-25 2018-05-01 Mitsubishi Electric Research Laboratories, Inc. System and method for controlling system using a control signal for transitioning a state of the system from a current state to a next state using different instances of data with different precisions
CN105163121B (en) * 2015-08-24 2018-04-17 西安电子科技大学 Big compression ratio satellite remote sensing images compression method based on depth autoencoder network
JP6474456B2 (en) * 2017-05-16 2019-02-27 ファナック株式会社 Machine learning apparatus, servo control system, and machine learning method
CN107515531B (en) * 2017-08-30 2021-01-26 京东方科技集团股份有限公司 Intelligent control method and system and intelligent monitoring system for plant factory
JP6680756B2 (en) * 2017-12-26 2020-04-15 ファナック株式会社 Control device and machine learning device
CN108462876B (en) * 2018-01-19 2021-01-26 瑞芯微电子股份有限公司 Video decoding optimization adjustment device and method
CN108447082A (en) * 2018-03-15 2018-08-24 深圳市唯特视科技有限公司 A kind of objective matching process based on combination learning Keypoint detector
CN109270833A (en) * 2018-10-23 2019-01-25 大连海事大学 A kind of Varied scope fuzzy control method based on brshless DC motor Q study
CN109521669A (en) * 2018-11-12 2019-03-26 中国航空工业集团公司北京航空精密机械研究所 A kind of turning table control methods of self-tuning based on intensified learning
CN109451038A (en) * 2018-12-06 2019-03-08 北京达佳互联信息技术有限公司 A kind of information-pushing method, device, server and computer readable storage medium
CN109739090A (en) * 2019-01-15 2019-05-10 哈尔滨工程大学 A kind of autonomous type underwater robot neural network intensified learning control method
CN110262218A (en) * 2019-05-20 2019-09-20 北京航空航天大学 Control method, device, equipment and the storage medium of machine fish

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JUNKAI SHAO, XUAN ZHAO, JUE YANG, WENMING ZHANG, YITING KANG, XINXIN ZHAO: "Reinforcement Learning Algorithm for Path Following Control of Articulated Vehicle", TRANSACTIONS OF THE CHINESE SOCIETY FOR AGRICULTURAL MACHINERY, vol. 48, no. 3, 1 March 2017 (2017-03-01), pages 376 - 382, XP055800463, DOI: 10.6041/j.issn.1000-1298.2017.03.048 *
SHI QIAN; LAM HAK-KEUNG; XIAO BO; TSAI SHUN-HUNG: "Adaptive PID controller based on Q-learning algorithm", CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, THE INSTITUTION OF ENGINEERING AND TECHNOLOGY, MICHAEL FARADAY HOUSE, SIX HILLS WAY, STEVENAGE, HERTS. SG1 2AY, UK, vol. 3, no. 4, 1 December 2018 (2018-12-01), pages 235 - 244, XP006087797, DOI: 10.1049/trit.2018.1007 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113110029A (en) * 2021-04-16 2021-07-13 北京黑蚁兄弟科技有限公司 PID control method and device based on hybrid filtering and PID control equipment
CN113110029B (en) * 2021-04-16 2022-11-18 北京黑蚁兄弟科技有限公司 PID control method and device based on hybrid filtering and PID control equipment

Also Published As

Publication number Publication date
CN112631120B (en) 2022-05-17
CN112631120A (en) 2021-04-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20873843

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20873843

Country of ref document: EP

Kind code of ref document: A1