WO2020042098A1 - Frequency modulation method, device, and computer-readable storage medium - Google Patents
- Publication number
- WO2020042098A1 (PCT/CN2018/103307)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- energy efficiency
- current frame
- module
- frequency point
- frequency
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/325—Power saving in peripheral device
- G06F1/3275—Power saving in memory, e.g. RAM, cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/324—Power saving characterised by the action undertaken by lowering clock frequency
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3206—Monitoring of events, devices or parameters that trigger a change in power modality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3206—Monitoring of events, devices or parameters that trigger a change in power modality
- G06F1/3215—Monitoring of peripheral devices
- G06F1/3218—Monitoring of peripheral devices of display devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3206—Monitoring of events, devices or parameters that trigger a change in power modality
- G06F1/3215—Monitoring of peripheral devices
- G06F1/3225—Monitoring of peripheral devices of memory devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3206—Monitoring of events, devices or parameters that trigger a change in power modality
- G06F1/3228—Monitoring task completion, e.g. by use of idle timers, stop commands or wait commands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/3243—Power saving in microcontroller unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/325—Power saving in peripheral device
- G06F1/3278—Power saving in modem or I/O interface
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F1/00—Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
- G06F1/26—Power supply means, e.g. regulation thereof
- G06F1/32—Means for saving power
- G06F1/3203—Power management, i.e. event-based initiation of a power-saving mode
- G06F1/3234—Power saving characterised by the action undertaken
- G06F1/3296—Power saving characterised by the action undertaken by lowering the supply or operating voltage
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/4893—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues taking into account power or heat criteria
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5094—Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
Definitions
- the present application relates to the field of electronic technology, and more particularly, to a frequency modulation method, device, and computer-readable storage medium.
- the frequency of the central processing unit (CPU) can be adjusted according to the load, so that the CPU frequency responds quickly to load demand (or performance demand), thereby reducing the power consumption of the smartphone and extending its battery life.
- WALT window-assisted load tracking
- PELT per-entity load tracking
- WALT can calculate the duty cycle or CPU usage in the current window through a defined statistical strategy according to the duty cycle or CPU usage of the task in the N non-empty windows, and adjust the frequency points step by step.
- the window size is used as a cycle, and the frequency is adjusted step by step according to the predicted CPU occupancy rate, which cannot respond to changes in performance requirements in a timely manner. Not only the CPU, but other similar devices have similar problems with frequency modulation.
- the present application provides a frequency modulation method and device, which can respond to load changing requirements in time during the frequency modulation process.
- a frequency modulation method includes: predicting energy efficiency parameters required for at least one module to process a current frame of an image; selecting a first frequency point set from a plurality of frequency point sets according to the predicted energy efficiency parameters; and adjusting the working frequency point at which the at least one module processes the current frame to the preset frequency point corresponding to the at least one module.
- the at least one module for processing an image includes at least one of a central processing unit CPU, a graphics processing unit GPU, an internal memory (for example, a DDR memory) for storing the current frame, or a neural network processing unit NPU.
- CPU central processing unit
- GPU graphics processing unit
- internal memory, for example, a DDR memory
- NPU neural network processing unit
- the first frequency point set includes a preset frequency point corresponding to the at least one module.
- the energy efficiency parameters required for the current frame can be predicted according to the historical energy efficiency parameters (which can be obtained from the energy efficiency parameters of at least one frame before the current frame). For example, the energy efficiency parameters required for the current frame can be predicted based on the average of the historical energy efficiency parameters. As another example, a load prediction table can also be constructed and searched based on historical energy efficiency parameters, so that the energy efficiency parameters required for the current frame can be predicted according to the load prediction table.
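The averaging approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the class name `FrameLoadPredictor`, the history length, and the sample instruction counts are hypothetical and do not come from the application.

```python
from collections import deque
from statistics import mean


class FrameLoadPredictor:
    """Predicts one energy efficiency parameter from recent frames (hypothetical helper)."""

    def __init__(self, history_len=4):
        # Keep only the most recent `history_len` frames; older samples are dropped.
        self.history = deque(maxlen=history_len)

    def record(self, value):
        """Store the parameter measured for a frame that has finished."""
        self.history.append(value)

    def predict(self):
        """Predict the parameter for the current frame as the mean of the history."""
        if not self.history:
            raise ValueError("no historical energy efficiency parameters recorded")
        return mean(self.history)


# Example: instruction counts (in millions) measured for four earlier frames.
predictor = FrameLoadPredictor(history_len=3)
for instructions in (100, 120, 140, 160):
    predictor.record(instructions)

# Only the last three frames (120, 140, 160) are retained, so the prediction is their mean.
predicted = predictor.predict()
```

A short history reacts faster to load changes, while a longer one smooths out single-frame spikes; the application does not prescribe a particular length.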
- the energy efficiency parameters include at least one of: the number of instructions required by the CPU to process the current frame, the cache misses generated by the CPU when processing the current frame, the number of calls to a drawing function required by the GPU to process the current frame, the bandwidth required to read the current frame from or store it in the memory, or the amount of calculation required by the NPU to process the current frame.
- the energy efficiency parameters predicted by the CPU in this embodiment of the present application may include, but are not limited to, at least one of the following parameters:
- the number of instructions required by the CPU to process the current frame
- the cache misses generated by the CPU when processing the current frame, e.g., the number of L1 data cache misses, the number of L2 cache misses, or both
- the number of times the CPU has no operations due to front-end reasons (also known as FE bound)
- the number of times the CPU has no operations due to back-end reasons (also known as BE bound)
- the energy efficiency parameters predicted by the GPU in the embodiments of the present application may include, but are not limited to, at least one of the following parameters:
- the number of calls to the drawing function (draw calls) required by the GPU to process the current frame
- the number of triangles, or the number of pixels (total-fragment)
- the number of texture drawing operations (texture-operation, TEX-operation)
- the energy efficiency parameters predicted for the memory in the embodiment of the present application may include, but are not limited to, at least one of the following parameters: the bandwidth required to store the current frame in the memory or read it from the memory.
- the energy efficiency parameters predicted by the NPU in this embodiment of the present application may include, but are not limited to, at least one of the following parameters:
- the amount of calculation (tasks) required for the NPU to process the current frame, the total number of instructions that the NPU needs to execute for the current frame (total-INSTR-EXEC), and the total number of memory access requests issued by the NPU (total-MEM-request).
- the first set of frequency points can meet energy efficiency requirements, and the energy efficiency requirements include at least one of the following: power consumption requirements or performance requirements.
- the energy efficiency requirements in the embodiments of the present application may be used to indicate that the preset frequency points of the current frame can meet the performance requirements, or that the preset frequency points of the current frame can meet the power consumption requirements, which are not specifically limited in this application.
- the performance corresponding to each frequency point set in the multiple frequency point sets (for example, the running time corresponding to at least one module) may be predicted according to the predicted energy efficiency parameters.
- the power consumption corresponding to multiple frequency point sets may also be predicted according to the predicted energy efficiency parameters, and the frequency point set corresponding to the lowest power consumption may be selected based on the power consumption formula corresponding to the at least one module.
- the power consumption demand is a minimum power consumption demand
- the performance demand is a preset threshold
- the preset threshold may be less than or equal to a certain time threshold.
- the time threshold may be the duration available for each frame, calculated from a target frame rate. Taking a game running at 60 frames per second as an example, the time available for each frame is 1000/60 ≈ 16.7 ms.
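The per-frame time threshold follows directly from the target frame rate. A minimal sketch of that arithmetic, where the function name `frame_budget_ms` is a hypothetical choice for illustration:

```python
def frame_budget_ms(target_fps):
    """Return the time available for one frame, in milliseconds."""
    if target_fps <= 0:
        raise ValueError("target frame rate must be positive")
    return 1000.0 / target_fps


# A 60 FPS target leaves roughly 16.7 ms per frame; a 30 FPS target leaves roughly 33.3 ms.
budget_60 = frame_budget_ms(60)
budget_30 = frame_budget_ms(30)
```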
- the performance corresponding to each frequency point set is predicted according to the predicted energy efficiency parameter, and the first frequency point set, whose performance satisfies a preset threshold, is selected from the plurality of frequency point sets.
- a first frequency point set corresponding to a performance that meets a preset threshold may be selected from a plurality of frequency point sets.
- the running time corresponding to the preset frequency point of the at least one module in the first frequency point set, as predicted by a performance formula, meets the preset threshold; the performance formula represents a functional relationship between the running time of the at least one module and an energy efficiency parameter.
- the power consumption corresponding to each frequency point set is predicted according to the predicted energy efficiency parameter, and a plurality of second frequency point sets whose performance satisfies the preset threshold are selected from the plurality of frequency point sets.
- multiple second frequency point sets whose performance meets the preset threshold may be selected from the multiple frequency point sets, and the first frequency point set corresponding to the lowest power consumption is then selected from the multiple second frequency point sets.
- a plurality of second frequency point sets that meet performance requirements may be selected from the plurality of frequency point sets.
- the first frequency point set corresponding to the lowest power consumption may also be selected from the plurality of second frequency point sets according to the predicted energy efficiency parameter and power consumption formula.
- the power consumption formula is used to represent a functional relationship between the power consumption required for the at least one module to process the current frame and the energy efficiency parameter of the at least one module.
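The two-stage selection described above (keep the frequency point sets that meet the performance threshold, then pick the one with the lowest power consumption) can be sketched as follows. The application does not give concrete performance or power consumption formulas, so everything numeric here is an assumption made for illustration: the frequency points, the one-instruction-per-cycle CPU model, the 2000 cycles per draw call, the linear power model, and the fallback to the highest frequency point set are all hypothetical.

```python
from itertools import product

# Hypothetical frequency points in MHz for two modules (not from the application).
CPU_FREQS = (800, 1200, 1800)
GPU_FREQS = (400, 600)


def predicted_runtime_ms(cpu_mhz, gpu_mhz, instructions, draw_calls):
    """Assumed performance formula: work divided by frequency, modules run in sequence."""
    cpu_ms = instructions / (cpu_mhz * 1000.0)         # assumes 1 instruction per cycle
    gpu_ms = draw_calls * 2000.0 / (gpu_mhz * 1000.0)  # assumes 2000 cycles per draw call
    return cpu_ms + gpu_ms


def predicted_power_mw(cpu_mhz, gpu_mhz):
    """Assumed power formula: power grows linearly with each module's frequency."""
    return 0.5 * cpu_mhz + 0.8 * gpu_mhz


def select_frequency_set(instructions, draw_calls, budget_ms):
    """Return the (cpu_mhz, gpu_mhz) set that meets the budget at the lowest power."""
    candidates = list(product(CPU_FREQS, GPU_FREQS))
    # Stage 1: the "second frequency point sets" are those meeting the time threshold.
    feasible = [
        fs for fs in candidates
        if predicted_runtime_ms(*fs, instructions, draw_calls) <= budget_ms
    ]
    if not feasible:
        # Assumed fallback (not from the application): run at the highest frequencies.
        return max(candidates)
    # Stage 2: the "first frequency point set" is the feasible set with the lowest power.
    return min(feasible, key=lambda fs: predicted_power_mw(*fs))


# Predicted load: 12 million CPU instructions and 1500 draw calls, 16.6 ms budget.
chosen = select_frequency_set(12_000_000, 1500, budget_ms=16.6)
```

With these assumed numbers, three of the six sets meet the 16.6 ms budget, and the sketch picks (1200, 600) because it has the lowest modelled power among them.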
- the energy efficiency parameters required by the at least one module to process the current frame of the image are predicted according to historical energy efficiency parameters, where the historical energy efficiency parameters are obtained from the energy efficiency parameters of at least one frame before the current frame.
- a load prediction table is searched according to historical energy efficiency parameters to predict energy efficiency parameters required for the at least one module to process a current frame of an image.
- the load prediction table indicates at least a change trend of the plurality of historical energy efficiency parameters, or directly indicates a correspondence between the plurality of historical energy efficiency parameters and the predicted energy efficiency parameter.
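One way to read the trend-based variant of the load prediction table is sketched below. The application does not specify the table's contents, so the three trend classes, the 5% thresholds, and the scaling factors are hypothetical values chosen only to make the lookup concrete.

```python
# Hypothetical load prediction table: maps a change trend to a scaling factor
# applied to the most recent frame's parameter (values are assumptions).
LOAD_PREDICTION_TABLE = {
    "rising": 1.10,
    "steady": 1.00,
    "falling": 0.90,
}


def classify_trend(history):
    """Classify the change between the two most recent frames (assumed 5% thresholds)."""
    previous, latest = history[-2], history[-1]
    if latest > previous * 1.05:
        return "rising"
    if latest < previous * 0.95:
        return "falling"
    return "steady"


def predict_from_table(history):
    """Look up the trend in the table and scale the latest value accordingly."""
    if len(history) < 2:
        raise ValueError("at least two historical frames are needed to detect a trend")
    return history[-1] * LOAD_PREDICTION_TABLE[classify_trend(history)]


# The load grew from 100 to 120, so the table predicts a further increase (about 132).
rising_prediction = predict_from_table([100, 120])
# The load barely changed, so the latest value is carried forward.
steady_prediction = predict_from_table([100, 101])
```

The other variant described above, a table that maps historical parameters directly to a predicted value, would replace the trend key with a quantized form of the history itself.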
- a frequency modulation device includes a prediction module, a determination module, and a processing module.
- the prediction module is configured to predict an energy efficiency parameter required for at least one module to process a current frame of an image.
- the at least one module includes at least one of a central processing unit (CPU), a graphics processing unit (GPU), a memory for storing the current frame, or a neural network processing unit (NPU); the determining module is configured to select a first frequency point set from a plurality of frequency point sets according to the predicted energy efficiency parameter, where the first frequency point set includes a preset frequency point corresponding to the at least one module; and the processing module is configured to adjust the working frequency point at which the at least one module processes the current frame to the preset frequency point corresponding to the at least one module.
- the energy efficiency parameters include at least one of: the number of instructions required by the CPU to process the current frame, the cache misses generated by the CPU when processing the current frame, the number of calls to a drawing function required by the GPU to process the current frame, the bandwidth required to read the current frame from or store it in the memory, or the amount of calculation required by the NPU to process the current frame.
- the first set of frequency points can meet energy efficiency requirements, and the energy efficiency requirements include at least one of the following: power consumption requirements or performance requirements.
- the power consumption requirement is a minimum power consumption requirement
- the performance requirement is a preset threshold
- the determining module is specifically configured to predict the performance corresponding to each frequency point set according to the predicted energy efficiency parameter, and to select, from a plurality of frequency point sets, a first frequency point set whose performance meets a preset threshold.
- the determining module is further specifically configured to: predict the power consumption corresponding to each frequency point set according to the predicted energy efficiency parameter; select, from the plurality of frequency point sets, a plurality of second frequency point sets whose performance satisfies a preset threshold; and select, from the plurality of second frequency point sets, a first frequency point set corresponding to the lowest power consumption.
- the prediction module is specifically configured to predict, according to historical energy efficiency parameters, the energy efficiency parameters required by the at least one module to process the current frame of the image, where the historical energy efficiency parameters are obtained from the energy efficiency parameters of at least one frame before the current frame.
- the prediction module is specifically configured to: search a load prediction table according to historical energy efficiency parameters to predict the energy efficiency parameters required for the at least one module to process the current frame of the image.
- a frequency modulation system includes a memory and a processor.
- the memory is configured to store a program; the processor is configured to execute the program stored in the memory, and when the program is executed, the processor performs the method described in the first aspect or any possible implementation manner of the first aspect.
- the processor may be further communicatively connected to the transceiver.
- the memory may be used to store program code and data of the device. Therefore, the memory may be a storage unit inside the processor, or an external storage unit independent of the processor, or a component including a storage unit inside the processor and an external storage unit independent of the processor.
- the processor may be a general-purpose processor, and the method process may be implemented by hardware, or the process may be implemented by executing software.
- the processor may include a microprocessor with a logic circuit or an integrated circuit, a digital signal processor, a microcontroller, or the CPU, and the function is realized by reading software code stored in a memory.
- the memory can be integrated in the processor, or can be located outside the processor and exist independently.
- the processor may include a necessary hardware accelerator, such as a hardware algorithm circuit, a logic operation circuit, or an analog circuit that does not rely on software to perform operations.
- when the program is executed, the processor is configured to predict energy efficiency parameters required by at least one module to process a current frame of the image, where the at least one module includes at least one of a central processing unit (CPU), a graphics processing unit (GPU), a memory for storing the current frame, or a neural network processing unit (NPU); the processor is further configured to select a first frequency point set from a plurality of frequency point sets according to the predicted energy efficiency parameter, where the first frequency point set includes a preset frequency point corresponding to the at least one module; and the processor is further configured to adjust the working frequency point at which the at least one module processes the current frame to the preset frequency point corresponding to the at least one module.
- the energy efficiency parameters include at least one of: the number of instructions required by the CPU to process the current frame, the cache misses generated by the CPU when processing the current frame, the number of calls to a drawing function required by the GPU to process the current frame, the bandwidth required to read the current frame from or store it in the memory, or the amount of calculation required by the NPU to process the current frame.
- the first frequency point set can meet energy efficiency requirements, and the energy efficiency requirements include at least one of the following: power consumption requirements or performance requirements.
- the power consumption requirement is a minimum power consumption requirement
- the performance requirement is to meet a preset threshold
- the processor is specifically configured to predict the performance corresponding to each frequency point set according to the predicted energy efficiency parameter, and to select, from a plurality of frequency point sets, a first frequency point set whose performance meets a preset threshold.
- the processor is configured to: predict the power consumption corresponding to each frequency point set according to the predicted energy efficiency parameter; select, from the plurality of frequency point sets, a plurality of second frequency point sets whose performance satisfies a preset threshold; and select, from the plurality of second frequency point sets, a first frequency point set corresponding to the lowest power consumption.
- the processor is specifically configured to predict, according to historical energy efficiency parameters, the energy efficiency parameters required for the at least one module to process the current frame of the image, where the historical energy efficiency parameters are obtained from the energy efficiency parameters of at least one frame before the current frame.
- the processor is specifically configured to: search a load prediction table according to historical energy efficiency parameters to predict energy efficiency parameters required for the at least one module to process a current frame of the image.
- a chip including a processor and an interface for externally coupling the processor to the chip.
- the interface may be used for coupling to a memory external to the chip, and the memory may be used for storing program code and data of the device. For specific descriptions of the memory and the processor, reference may be made to the introduction of the third aspect or any implementation manner thereof.
- the processor can read program code from the memory through the interface to perform the operations.
- when the program is executed, the processor is configured to predict energy efficiency parameters required by at least one module to process a current frame of the image, where the at least one module includes at least one of a central processing unit (CPU), a graphics processing unit (GPU), a memory for storing the current frame, or a neural network processing unit (NPU); the processor is further configured to select a first frequency point set from a plurality of frequency point sets according to the predicted energy efficiency parameter, where the first frequency point set includes a preset frequency point corresponding to the at least one module; and the processor is further configured to adjust the working frequency point at which the at least one module processes the current frame to the preset frequency point corresponding to the at least one module.
- the energy efficiency parameters include at least one of: the number of instructions required by the CPU to process the current frame, the cache misses generated by the CPU when processing the current frame, the number of calls to a drawing function required by the GPU to process the current frame, the bandwidth required to read the current frame from or store it in the memory, or the amount of calculation required by the NPU to process the current frame.
- the first frequency point set can meet energy efficiency requirements, and the energy efficiency requirements include at least one of the following: power consumption requirements or performance requirements.
- the power consumption requirement is a minimum power consumption requirement
- the performance requirement is a preset threshold
- the processor is specifically configured to predict the performance corresponding to each frequency point set according to the predicted energy efficiency parameter, and to select, from a plurality of frequency point sets, a first frequency point set whose performance meets a preset threshold.
- the processor is configured to: predict the power consumption corresponding to each frequency point set according to the predicted energy efficiency parameter; select, from the plurality of frequency point sets, a plurality of second frequency point sets whose performance satisfies a preset threshold; and select, from the plurality of second frequency point sets, a first frequency point set corresponding to the lowest power consumption.
- the processor is specifically configured to predict, according to historical energy efficiency parameters, the energy efficiency parameters required for the at least one module to process the current frame of the image, where the historical energy efficiency parameters are obtained from the energy efficiency parameters of at least one frame before the current frame.
- the processor is specifically configured to: search a load prediction table according to historical energy efficiency parameters to predict energy efficiency parameters required for the at least one module to process a current frame of the image.
- a computer-readable storage medium including a computer program; when the computer program is run on a computer or a processor, the computer or processor is caused to execute the method described in the first aspect or any implementation manner of the first aspect.
- a computer program product is provided; when the computer program product runs on a computer or a processor, the computer or processor is caused to execute the method described in the first aspect or any implementation manner of the first aspect.
- FIG. 1 is a schematic block diagram of a frequency modulation system 100 according to an embodiment of the present application.
- FIG. 2 is a schematic flowchart of a frequency modulation method according to an embodiment of the present application.
- FIG. 3 is a schematic flowchart of a method for predicting an energy efficiency parameter according to an embodiment of the present application.
- FIG. 4 is a schematic flowchart of a possible method for selecting a preset frequency point according to an embodiment of the present application.
- FIG. 5 is a schematic diagram of a possible preset frequency point selection provided by an embodiment of the present application.
- FIG. 6 is a schematic diagram of a possible preset frequency point selection provided by another embodiment of the present application.
- FIG. 7 is a schematic block diagram of a frequency modulation apparatus 700 according to an embodiment of the present application.
- frequency adjustment may be performed on one or more of the following modules: for example, a central processing unit (CPU), a graphics processing unit (GPU), or a double data rate (DDR) memory.
- the frequency points of the one or more modules can be made to respond quickly to load requirements (performance requirements and / or power requirements), thereby reducing the power consumption of the smart phone and extending the battery life of the smart phone.
- the frequency modulation method provided in the embodiment of the present application may take a frame of an image as the unit and, according to the predicted frame information (which may also be referred to as the energy efficiency parameter of at least one module), adjust the frequency of the at least one module (for example, the CPU, GPU, DDR, or NPU) to the predicted frequency. Because the solution of this embodiment performs the adjustment in units of image frames, it can respond to load changes in a timely manner and can obtain good performance and/or power consumption benefits.
- the frequency modulation method in the embodiment of the present application can be applied to, but not limited to, a game field, a video field, or other general application fields. This solution can be applied to any field with image processing.
- the game field is taken as an example.
- the frequency of at least one of the modules (for example, CPU, GPU, DDR, and NPU) is adjusted to the predicted frequency based on the predicted frame information.
- the NPU in this embodiment is an artificial intelligence computing unit, and can be calculated using, for example, a convolutional neural network (CNN).
- CNN convolutional neural network
- the frame rate may be a metric for measuring the number of frames of a displayed image, that is, measuring the number of frames displayed per second (frame per second, FPS).
- Frames per second (FPS), or frame rate, indicates how many times per second the graphics processor can update the image. Because of the physiology of the human eye, a picture whose frame rate exceeds a certain threshold is perceived as continuous; a higher frame rate yields a smoother game screen and more realistic animation.
- One frame of instructions can guarantee game performance.
- a preset frequency point that meets performance requirements and power consumption requirements may be predicted based on the predicted energy efficiency parameters of at least one module (that is, the predicted parameter information that the at least one module needs in order to process the current frame), and the operating frequency of the at least one module can be adjusted to the preset frequency point.
- FIG. 1 is a schematic block diagram of a frequency modulation system 100 according to an embodiment of the present application.
- the frequency modulation system 100 may include a CPU 110, a GPU 120, a memory 130, an NPU 140, and a bus 150.
- the frequency modulation system 100 in FIG. 1 may be included in one or more chips to form an electronic system.
- the system may be located in an electronic device, which may be a wireless terminal, a wired terminal, a user equipment, or a connectionless device.
- the electronic device may be a mobile phone, a laptop, a tablet, or a game console.
- the memory 130 may be, for example, a DDR memory, and the memory 130 may be used to store program codes and data of the frequency modulation system 100.
- the memory 130 may be connected to the CPU 110, the GPU 120, and the NPU 140 through the bus 150.
- bus 150 may be any type of bus such as a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus.
- the bus 150 may be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one line is used in FIG. 1, but it does not mean that there is only one bus or one type of bus.
- the CPU 110, GPU 120, and NPU 140 shown in FIG. 1 can be used to process images.
- the CPU 110, GPU 120, and NPU 140 can be on a chip and read from the memory 130 through an interface for external coupling with the chip.
- the software code is read into the chip from the memory 130 via the interface and the bus 150 and used by at least one of the CPU 110, the GPU 120, or the NPU 140.
- the frequency modulation system 100 may further include a hardware accelerator 160.
- the hardware accelerator 160 may include an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or other programmable logic devices, transistor logic devices, hardware components, or any combination thereof.
- FIG. 2 is a schematic flowchart of a frequency modulation method according to an embodiment of the present application.
- the method shown in FIG. 2 may include steps 210-230. Steps 210-230 are described in detail below.
- Step 210 Predict the energy efficiency parameters required for the at least one module to process the current frame.
- a module for processing a current frame of an image is not specifically limited. It may be one or more modules in the CPU 110, GPU 120, memory 130 (for example, DDR), or NPU 140 in the frequency modulation system 100 shown in FIG.
- the energy efficiency parameters that the at least one module requires to process the current frame are not specifically limited. Detailed examples of the energy efficiency parameters required by different modules to process the current frame are given below, module by module.
- the energy efficiency parameters of the CPU 110 predicted in the embodiments of the present application may include, but are not limited to, at least one of the following parameters: the number of instructions required by the CPU to process the current frame; the cache misses generated when the CPU processes the current frame, for example, the number of level 1 data cache misses (L1 data cache misses) and the number of level 2 data cache misses (L2 data cache misses); the number of times the CPU performs no operation due to front-end reasons (also known as frontend bound, FE bound); and the number of times the CPU performs no operation due to back-end reasons (also known as backend bound, BE bound).
- the energy efficiency parameters of the GPU 120 predicted in the embodiments of the present application may include, but are not limited to, at least one of the following parameters: the number of draw function calls (draw calls) required by the GPU to process the current frame, the number of triangles, the number of pixel draws (total-fragment, total-FRAG), and the number of texture operations (texture-operation, TEX-operation).
- the energy efficiency parameters of the memory 130 predicted in the embodiments of the present application may include, but are not limited to, at least one of the following parameters: the bandwidth required for the memory to store or read the current frame.
- the energy efficiency parameters of the NPU 140 predicted in the embodiments of the present application may include, but are not limited to, at least one of the following parameters: the amount of calculation (tasks) required by the NPU to process the current frame, the total number of instructions executed by the NPU for the current frame (total-INSTR-EXEC), and the number of memory access requests issued by the NPU (total-MEM-request).
- the energy efficiency parameters required for the current frame can be predicted according to the historical energy efficiency parameters (which can be obtained from the energy efficiency parameters of at least one frame before the current frame). For example, the energy efficiency parameters required for the current frame can be predicted based on the average of the historical energy efficiency parameters.
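- As a minimal sketch of the averaging approach described above, the next frame's parameter can be estimated as the mean of the most recent frames. The function name `predict_from_history`, the window length of 3, and the sample instruction counts are assumptions chosen for illustration; the source does not fix any of them.

```python
from statistics import mean


def predict_from_history(history, window=3):
    """Predict the next frame's parameter as the mean of the last `window` values."""
    if not history:
        raise ValueError("history must contain at least one frame")
    recent = history[-window:]  # uses fewer values if the history is short
    return mean(recent)


# Hypothetical CPU instruction counts for the three frames before the current one.
cpu_instructions = [5e8, 7e8, 3e8]
predicted = predict_from_history(cpu_instructions)
```

- A plain mean reacts slowly to sudden load changes, which is one reason a table-based predictor may be preferred in practice.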
- a load prediction table can also be constructed and searched based on historical energy efficiency parameters, so that the energy efficiency parameters required for the current frame can be predicted according to the load prediction table. The specific implementation manner of constructing and finding the load prediction table will be described later with reference to FIG. 3, and is not repeated here.
- Step 220 According to the predicted energy efficiency parameters, select a first frequency point set from a plurality of frequency point sets to meet the energy efficiency requirements.
- the multiple frequency point sets in the embodiment of the present application may be used to represent a complete set of frequency points or of frequency point combinations of at least one module shown in FIG. 1. It should be noted that different types of modules have different complete sets of frequency points. Therefore, the number of frequency points or frequency point combinations in the multiple frequency point sets is not specifically limited in the embodiment of the present application.
- the energy efficiency requirement can be used to indicate that the preset frequency point of the current frame can meet the performance requirement, or it can be expressed that the preset frequency point of the current frame can meet the power consumption requirement, which is not specifically limited in this application.
- the performance corresponding to each frequency point set in the multiple frequency point sets (for example, the running time of at least one module shown in FIG. 1) can be predicted from the predicted energy efficiency parameters and the performance formula corresponding to the at least one module, and a first frequency point set whose performance satisfies a preset threshold (for example, a preset running time for the at least one module) can be selected from the multiple frequency point sets.
- the operating frequency of at least one module shown in FIG. 1 may be adjusted to a preset frequency in a corresponding first frequency set.
- the power consumption corresponding to the frequency point sets in the first frequency point set (for example, the power consumption required for at least one module shown in FIG. 1 to run the current frame) can also be predicted according to the predicted energy efficiency parameters, and the frequency point set corresponding to the lowest power consumption may be selected from the first frequency point set according to the power consumption formula corresponding to the at least one module.
- the operating frequency of at least one module shown in FIG. 1 may be adjusted to a preset frequency in a frequency set corresponding to the lowest power consumption.
- Step 230 Adjust the working frequency of at least one module to process the current frame to a preset frequency corresponding to at least one module in the first frequency set.
- a frequency point set that meets the performance requirements can be selected according to the performance formula in step 220, and the working frequency point for processing the current frame by at least one module shown in FIG. 1 can be adjusted to a frequency point set that meets the performance requirements.
- the frequency point set corresponding to the lowest power consumption can also be selected according to the power consumption formula in step 220, and the working frequency point of at least one module shown in FIG. 1 for processing the current frame can be adjusted to the frequency point set corresponding to the lowest power consumption.
- the embodiment of the present application does not specifically limit the preset frequency point corresponding to the at least one module.
- FIG. 3 is a schematic flowchart of a method for predicting an energy efficiency parameter according to an embodiment of the present application.
- the method shown in FIG. 3 may include steps 310-340. Steps 310-340 are described in detail below.
- Step 310 Collect historical energy efficiency parameters online. The energy efficiency parameters of at least one frame before the current frame (which can be understood as historical energy efficiency parameters) are collected online; different modules have different energy efficiency parameters. For the energy efficiency parameters required by at least one module shown in FIG. 1 to process the current frame, refer to the description in step 210, which is not repeated here.
- Step 320 Quantify historical energy efficiency parameters.
- the historical energy efficiency parameters can be quantified.
- quantization can be understood as mapping the discrete instantaneous values of the historical energy efficiency parameters collected online to values on a quantized scale.
- the quantization process is described in detail below, taking as an example the case where the module that processes the current frame of the image is the CPU 110 shown in FIG. 1 and the energy efficiency parameter is the number of instructions required by the CPU to process the current frame.
- the number of instructions the CPU requires to process the first frame is $1 \times 10^8$, and $1 \times 10^8$ may be quantized to 1.
- the number of instructions the CPU requires to process the second frame is $5 \times 10^8$, and $5 \times 10^8$ may be quantized to 5.
- the number of instructions required by the CPU to process different frames can be quantified.
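- The quantization step can be sketched as follows. The step size of 1e8 matches the two examples above; rounding to the nearest level is an assumption, since the source only shows values that are exact multiples of the step, and the name `quantize` is hypothetical.

```python
def quantize(value, step=1e8):
    """Map a raw counter value to an integer level (nearest multiple of `step`)."""
    return int(round(value / step))


# The two frames from the text, plus one hypothetical in-between value.
levels = [quantize(v) for v in (1e8, 5e8, 4.6e8)]
```

- Coarser steps shrink the prediction table at the cost of precision, so the step size is a tuning choice.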
- Step 330 Establish a prediction table of energy efficiency parameter changes.
- a prediction table of energy efficiency parameter changes may be established.
- the load prediction table may indicate the change trends of multiple historical energy efficiency parameters or may directly indicate the correspondence between the multiple historical energy efficiency parameters and the predicted energy efficiency parameters. Therefore, the energy efficiency parameter required by the at least one module shown in FIG. 1 when processing the current frame of the image can be predicted according to the change rule of the energy efficiency parameter in the energy efficiency parameter change prediction table. It can be understood that the establishment of the energy efficiency parameter change prediction table can be performed in real time or when the device is in an idle or charged state.
- the energy efficiency parameters required for at least one module to process the current frame may be predicted according to the energy efficiency parameters of at least one frame before the current frame.
- the embodiment of the present application uses the energy efficiency parameters of 3 frames before the current frame to predict the energy efficiency parameters required for at least one module to process the current frame as an example for description.
- the following describes in detail, as an example, the case where the module that processes the current frame of the image is the CPU and the predicted energy efficiency parameter is the number of instructions required by the CPU to process the current frame: the number of instructions required for the current frame is predicted from the instruction counts of the preceding 3 frames in the prediction table.
- the probability of each quantized value occurring in the current frame can be calculated from the pattern of change in the quantized energy efficiency parameters of at least one frame before the current frame, and the quantized value with the highest probability can be used as the predicted number of instructions required by the CPU to process the current frame.
- the detailed description is given below with reference to specific examples in Table 2.
- for example, the quantized numbers of instructions required by the CPU to process the three frames before the current frame are 5, 7, and 3, respectively.
- in this case, the quantized value most likely to occur next (in the current frame) is 5, with a probability of about 54%.
- it can therefore be predicted that the number of instructions required by the CPU to process the current frame is $5 \times 10^8$ (5 after quantization).
- as another example, the quantized numbers of instructions required by the CPU to process the three frames before the current frame are 5, 7, and 4, respectively.
- in this case, the quantized value most likely to occur next (in the current frame) is 4, with a probability of about 70%.
- it can therefore be predicted that the number of instructions required by the CPU to process the current frame is $4 \times 10^8$ (4 after quantization).
- the historical energy efficiency parameter quantization in step 320 and the energy efficiency parameter change prediction table in step 330 may be dynamically updated.
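- One way to realize such a prediction table is a frequency count keyed by the last three quantized values, as sketched below. The table layout, the function names, and the sample history are assumptions for illustration; the source does not specify how the table is stored, and the probabilities here do not reproduce the figures quoted from Table 2.

```python
from collections import Counter, defaultdict


def build_prediction_table(levels, order=3):
    """Count which quantized level followed each run of `order` levels."""
    table = defaultdict(Counter)
    for i in range(len(levels) - order):
        key = tuple(levels[i:i + order])
        table[key][levels[i + order]] += 1
    return table


def predict_next(table, recent):
    """Return (most likely next level, its observed probability), or None if unseen."""
    counts = table.get(tuple(recent))
    if not counts:
        return None
    level, hits = counts.most_common(1)[0]
    return level, hits / sum(counts.values())


# Hypothetical quantized instruction counts from earlier frames.
history = [5, 7, 3, 5, 5, 7, 3, 5, 5, 7, 3, 4, 5, 7, 3]
table = build_prediction_table(history)
prediction = predict_next(table, [5, 7, 3])
```

- Because the table is only a set of counters, it can be updated incrementally as each new frame's value is observed, which fits the dynamic updating mentioned above.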
- Step 340 Predict the energy efficiency parameters required for the at least one module to process the current frame of the image.
- the energy efficiency parameter required by the at least one module shown in FIG. 1 for processing the current frame of the image may be predicted according to the energy efficiency parameter change prediction table in step 330.
- for step 340, reference may be made to step 210 in the process of FIG. 2.
- Steps 310-330 are a process of establishing a load prediction table and can be considered a process of pre-generating a table. For the subsequent step 340, reference may be made to the corresponding process in FIG. 2.
- the preset frequency point corresponding to the at least one module that meets the requirements may be selected according to the predicted energy efficiency parameters of the current frame and the energy efficiency requirements, and the working frequency point at which at least one module shown in FIG. 1 processes the current frame can be adjusted to the corresponding preset frequency point.
- the embodiments of the present application do not specifically limit energy efficiency requirements.
- the energy efficiency requirement may be that at least one module processing the current frame of the image meets the performance requirement, for example, the running time of the at least one module processing the current frame of the image is less than or equal to a preset threshold.
- the energy efficiency requirement may also be that at least one module processing the current frame of the image satisfies both performance requirements and energy consumption requirements.
- FIG. 4 is a schematic flowchart of a possible method for selecting a preset frequency point according to an embodiment of the present application.
- the method shown in FIG. 4 may include steps 410-430. Steps 410-430 are described in detail below.
- Step 410 Collect offline the energy efficiency parameters of all frequency points or frequency point combinations of at least one module.
- energy efficiency parameters of all frequency points or frequency point combinations of at least one module (CPU, GPU, DDR, or NPU module) for processing an image shown in FIG. 1 may be collected offline.
- the set of frequency points corresponding to different types of modules may be different, and energy efficiency parameters of each frequency point or combination of frequency points may be collected according to one or more modules processing the current frame of the image.
- the at least one of the above modules may process the current frame of the image separately, and multiple modules may also process the current frame of the image in any combination.
- energy efficiency parameters of the CPU at various frequency points may be collected.
- the energy efficiency parameters of the CPU and the DDR combined at various frequency points may be collected.
- Step 420 Derive offline the performance formula of at least one module.
- y in the performance formula y = f(x) can be the CPU running time, and x can be one or more of the parameters instructions, L1 data cache misses, L2 data cache misses, FE bound, or BE bound.
- the performance formula of the CPU can be expressed as:
- W0-W3 are weight parameters
- instructions is the number of instructions required by the CPU to process any frame of the image
- FE bound is the number of times the CPU has no instruction execution due to front-end reasons
- BE bound is the number of times the queue is congested when the CPU executes the CPU instruction due to back-end reasons
- the performance formula of the CPU can also be expressed as:
- L1 misses is the number of times a level 1 data cache miss occurs when the CPU processes an arbitrary frame of the image
- L2 misses is the number of times a level 2 data cache miss occurs when the CPU processes any frame of the image
- the GPU's performance formula can be expressed as:
- draw calls are the number of drawing function calls required by the GPU when processing any frame of the image
- triangles is the number of triangles required for the GPU to process any frame of the image
- total-FRAG is the number of pixel drawing times required by the GPU when processing any frame of the image
- TEX-operation is the number of times the texture is drawn when the GPU processes any frame of the image
- y in the performance formula y = f(x) can be the NPU running time, and x can be one or more of the parameters tasks, total-INSTR-EXEC, and total-MEM-request.
- the performance formula of the NPU can be expressed as:
- $\text{NPU running time} = W_0 \cdot \text{tasks} + W_1 \cdot \text{total-INSTR-EXEC} + W_2 \cdot \text{total-MEM-request} + W_3 \quad (4)$
- tasks is the amount of calculation required when the NPU processes any frame of the image
- total-INSTR-EXEC is the number of instructions required by the NPU to process any frame of the image
- total-MEM-request is the number of times that the NPU requests memory access when processing any frame of the image
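- Formula (4) is a linear model and can be evaluated directly, as in the sketch below. The weight values and input counts are hypothetical placeholders; in the described method the weights would be derived offline for each frequency point.

```python
def npu_running_time(tasks, total_instr_exec, total_mem_request, weights):
    """Evaluate formula (4): W0*tasks + W1*total-INSTR-EXEC + W2*total-MEM-request + W3."""
    w0, w1, w2, w3 = weights
    return w0 * tasks + w1 * total_instr_exec + w2 * total_mem_request + w3


# Hypothetical weights for one NPU frequency point (result in milliseconds).
weights_at_freq = (2.0, 0.001, 0.01, 0.5)
estimate_ms = npu_running_time(3, 2000, 100, weights_at_freq)
```

- Each frequency point carries its own weight set, so the same predicted parameters yield a different running time estimate per frequency point.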
- Step 430 Select a frequency point set that meets the performance requirements according to the performance formula and the predicted energy efficiency parameters required for the current frame.
- Steps 410 and 420 are offline operations, and step 430 is online calculation.
- the specific online operation can refer to the corresponding process in FIG.
- a subset S that satisfies a preset condition may be selected.
- the preset condition may be that the running time is less than or equal to a certain time threshold.
- the time threshold may be the duration available for each frame, calculated from a target frame rate. Taking a 60-frame game as an example, the time available for each frame is 1000/60 ≈ 16.7 ms.
- a frequency point subset S whose running time is less than or equal to the time threshold is selected (the subset S includes the preset frequency point corresponding to the at least one module).
- the working frequency of at least one module processing the current frame may be adjusted to a preset frequency point corresponding to at least one module in the foregoing subset S, and the corresponding at least one module may process the current frame at the preset frequency point.
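- The selection of the subset S can be sketched as a filter on predicted running time against the per-frame budget. The combination labels and predicted times below are hypothetical; in the described method the times would come from the performance formula evaluated with each combination's weights.

```python
def frame_budget_ms(target_fps):
    """Time available per frame in milliseconds, e.g. 1000/60 for a 60 FPS target."""
    return 1000.0 / target_fps


def select_subset(predicted_times_ms, target_fps):
    """Keep the frequency combinations whose predicted running time fits the budget."""
    budget = frame_budget_ms(target_fps)
    return [combo for combo, t in predicted_times_ms.items() if t <= budget]


# Hypothetical predicted running times (ms) for three CPU/DDR combinations.
predicted_times = {
    ("CPU f1", "DDR f1"): 21.0,
    ("CPU f2", "DDR f1"): 16.0,
    ("CPU f2", "DDR f2"): 14.5,
}
subset_s = select_subset(predicted_times, target_fps=60)
```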
- FIG. 5 is a schematic diagram of a possible preset frequency selection provided by an embodiment of the present application.
- the CPU 110 and the memory 130 (for example, DDR) shown in FIG. 1 need to be frequency-modulated (that is, the CPU 110 and the memory 130 (for example, DDR) process the current frame of the image) as an example.
- the complete set of frequency points T in FIG. 4 may be a combination of all frequency points of the CPU 110 and the memory 130 (for example, DDR).
- the complete frequency set T can be:
- Frequency combination 1: CPU frequency 1 / DDR frequency 1, $\{W_{10}, W_{11}, W_{12}, W_{13}\}$
- Frequency combination 2: CPU frequency 1 / DDR frequency 2, $\{W_{20}, W_{21}, W_{22}, W_{23}\}$
- Frequency combination 3: CPU frequency 1 / DDR frequency 3, $\{W_{30}, W_{31}, W_{32}, W_{33}\}$
- Frequency combination n: CPU frequency 1 / DDR frequency n, $\{W_{n0}, W_{n1}, W_{n2}, W_{n3}\}$
- Frequency combination n + 1: CPU frequency 2 / DDR frequency 1, $\{W_{(n+1)0}, W_{(n+1)1}, W_{(n+1)2}, W_{(n+1)3}\}$
- Frequency combination n + 2: CPU frequency 2 / DDR frequency 2, $\{W_{(n+2)0}, W_{(n+2)1}, W_{(n+2)2}, W_{(n+2)3}\}$
- Frequency combination 2n: CPU frequency 2 / DDR frequency n, $\{W_{(2n)0}, W_{(2n)1}, W_{(2n)2}, W_{(2n)3}\}$
- the set $\{W_{00}, W_{01}, W_{02}, W_{03}\}$ attached to a frequency point combination may correspond to the weights $W_x$ in the performance formula under that frequency point combination.
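- The complete set T of CPU/DDR combinations can be enumerated as a Cartesian product of the two modules' frequency points, as sketched here. The frequency values in MHz are hypothetical; with 2 CPU frequency points and n DDR frequency points the product has 2n entries, matching the listing above.

```python
from itertools import product


def full_frequency_set(cpu_freqs, ddr_freqs):
    """Enumerate every (CPU frequency, DDR frequency) pair, CPU-major order."""
    return list(product(cpu_freqs, ddr_freqs))


# Hypothetical frequency points in MHz: 2 for the CPU, 3 for the DDR.
cpu_freqs_mhz = [1200, 1800]
ddr_freqs_mhz = [800, 1333, 1866]
full_set_t = full_frequency_set(cpu_freqs_mhz, ddr_freqs_mhz)
```

- Each pair would then be associated with its own offline-derived weight set for the performance formula.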
- a subset S that satisfies performance requirements may be selected from the complete set T of frequency combinations.
- the subset S that meets the performance requirements can be:
- Frequency combination 1: CPU frequency 1 / DDR frequency 1, $\{W_{00}, W_{01}, W_{02}, W_{03}\}$
- Frequency combination 2: CPU frequency 1 / DDR frequency 2, $\{W_{10}, W_{11}, W_{12}, W_{13}\}$
- Frequency combination 3: CPU frequency 1 / DDR frequency 3, $\{W_{20}, W_{21}, W_{22}, W_{23}\}$
- Frequency combination 4: CPU frequency 2 / DDR frequency 1, $\{W_{30}, W_{31}, W_{32}, W_{33}\}$
- Frequency combination 5: CPU frequency 2 / DDR frequency 3, $\{W_{40}, W_{41}, W_{42}, W_{43}\}$
- Frequency combination 6: CPU frequency 3 / DDR frequency 3, $\{W_{50}, W_{51}, W_{52}, W_{53}\}$
- Frequency combination 7: CPU frequency 4 / DDR frequency 1, $\{W_{60}, W_{61}, W_{62}, W_{63}\}$
- the frequency point combinations of the CPU and DDR included in the subset S in FIG. 5 can meet the performance requirements, and the working frequency points of the CPU and DDR for processing the current frame of the image can be adjusted to any one of the frequency point combinations in the subset S.
- FIG. 5 illustrates selecting, based on performance requirements, a subset S that satisfies the performance requirements from all frequency points or frequency point combinations of at least one module.
- the frequency point set M corresponding to the lowest power consumption may be selected from the subset S according to the power consumption formula.
- the specific implementation manner of selecting the frequency point set M corresponding to the lowest power consumption from the subset S is described in combination with FIG. 6, and details are not described herein again.
- the total power consumption formula required by the CPU and DDR can be expressed as:
- W0-W3 are weight parameters
- frame duration is the reciprocal of the target frame rate
- CPU running time is the running time required by the CPU to process any frame of the image
- bandwidth is the total amount of bandwidth required by DDR to store or read any frame of the image
- the total power consumption formula required by the CPU, GPU, NPU, and DDR can be expressed as:
- W0-W5 are weight parameters
- GPU running time is the running time required for the GPU to process any frame of the image
- NPU running time is the running time required for the NPU to process any frame of the image
- FIG. 6 is a schematic diagram of a possible preset frequency point selection provided by another embodiment of the present application.
- the CPU 110 and the memory 130 (for example, DDR) shown in FIG. 1 need to be frequency-modulated (that is, the CPU 110 and the memory 130 (for example, DDR) in FIG. 1 process the current frame of the image) as an example.
- the frequency point combination M with the lowest total power consumption required by the CPU and DDR can be selected from the subset S (for example, the above-mentioned 6 frequency combinations of the CPU and DDR that meet the performance requirements) according to the power consumption formula (5), and the working frequency points of the CPU and DDR for processing the current frame of the image can be adjusted to the frequency points in the frequency point combination M.
- the frequency combination M with the lowest total power consumption required by the CPU and DDR can be:
- Frequency combination 4: CPU frequency 2 / DDR frequency 1, $\{W_{30}, W_{31}, W_{32}, W_{33}\}$
- the frequency combination M with the lowest total power consumption required by the CPU and DDR may be one or multiple, which is not specifically limited in this embodiment of the present application.
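- Choosing M from the subset S amounts to a minimum search over predicted power, keeping ties since M may contain more than one combination. The power figures below are hypothetical inputs; formula (5) is not reproduced in this text, so the sketch takes already-predicted power values rather than computing them.

```python
def lowest_power_combinations(predicted_power_mw):
    """Return every combination in S whose predicted power equals the minimum."""
    if not predicted_power_mw:
        return []
    lowest = min(predicted_power_mw.values())
    return [combo for combo, p in predicted_power_mw.items() if p == lowest]


# Hypothetical predicted total CPU + DDR power (mW) for combinations in S.
power_in_s = {
    "combination 1": 950,
    "combination 4": 820,
    "combination 5": 910,
}
combination_m = lowest_power_combinations(power_in_s)
```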
- the frequency modulation method provided in the embodiment of the present application can respond to the load change demand in time during the frequency modulation process, and can obtain good performance and / or power consumption benefits.
- the following takes the overall performance of some games as an example and uses Table 3 to describe in detail the overall performance and power consumption benefits of the game after using the frequency modulation method (AI frequency modulation method) in the embodiment of the present application.
- Table 3 lists the game performance benefits and power consumption benefits of some games before and after AI frequency modulation scheduling is applied. Game performance gains are mainly reflected in the average frame rate, fluency, and stutter rate. The closer the average frame rate is to the full frame rate, the better the game performance. Fluency indicates the variation in frame rate (the standard deviation of the next second's frame rate minus the previous second's frame rate); the lower the value, the better the game performance. The stutter rate indicates the proportion of time, counted in seconds, during which the frame rate is below a set threshold; the lower the value, the better the game performance.
- average frame rate: when the AI frequency modulation scheduling method provided by the embodiment of the present application is not used, the average frame rate is 56; after the method is used, the average frame rate rises to 57.89.
- fluency: when the AI frequency modulation scheduling method is not used, the fluency is 4.81; after the method is used, the fluency drops to 2.39.
- power consumption benefits: after the AI frequency modulation scheduling method provided in the embodiment of the present application is used, the energy efficiency benefit ratio increases by 9.11%. Therefore, with the AI frequency modulation scheduling method, both the overall performance and the power consumption of the game improve, and the energy efficiency benefit is clear.
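- The three game performance metrics described above can be computed from per-second frame rate samples roughly as follows. This is a sketch under assumptions: the population standard deviation is used for fluency (the source does not say which variant applies), and the sample values and the 50 FPS stutter threshold are hypothetical.

```python
from statistics import mean, pstdev


def game_metrics(fps_per_second, stutter_threshold):
    """Compute average frame rate, fluency, and stutter rate from per-second FPS."""
    if len(fps_per_second) < 2:
        raise ValueError("need at least two seconds of samples")
    # Second-to-second change in frame rate.
    diffs = [b - a for a, b in zip(fps_per_second, fps_per_second[1:])]
    below = sum(1 for f in fps_per_second if f < stutter_threshold)
    return {
        "average_fps": mean(fps_per_second),
        "fluency": pstdev(diffs),
        "stutter_rate": below / len(fps_per_second),
    }


# Hypothetical frame rates observed over five consecutive seconds.
metrics = game_metrics([60, 58, 60, 45, 60], stutter_threshold=50)
```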
- the predicted running time required for at least one module to process the current frame is more accurate.
- the following takes the game QQ speeding car in Table 4 as an example and uses Table 5 to describe in detail the deviation between the predicted running time required by the at least one module to process the current frame and the actual running time.
- the frequency modulation method in the embodiment of the present application can be applied to, but not limited to, a game field, a video field, or other general application fields.
- the MAD counts the deviation between the running time required by the at least one module and the actual running time when processing the current frame.
- FIG. 7 shows a schematic block diagram of a frequency modulation apparatus 700 according to an embodiment of the present application.
- Each module in the frequency modulation apparatus 700 is configured to perform each action or process in the foregoing method.
- the frequency modulation device 700 may include a prediction module 710, a determination module 720, and a processing module 730.
- the prediction module 710 is configured to predict energy efficiency parameters required by at least one module to process a current frame of an image, and the at least one module includes a central processing unit.
- the energy efficiency parameters include at least one of: the number of instructions required by the CPU to process the current frame, the cache misses generated when the CPU processes the current frame, the number of draw function calls required by the GPU to process the current frame, the bandwidth required for the memory to store or read the current frame, or the amount of calculation required by the NPU to process the current frame.
- the energy efficiency requirement includes at least one of the following: a power consumption requirement or a performance requirement.
- the power consumption requirement is a minimum power consumption requirement
- the performance requirement is to meet a preset threshold
- the determining module 720 is specifically configured to: predict the performance corresponding to each frequency point set according to the predicted energy efficiency parameters; and select, from the plurality of frequency point sets, the first frequency point set whose performance satisfies a preset threshold.
- the determining module 720 is further specifically configured to: predict the power consumption corresponding to each frequency point set according to the predicted energy efficiency parameters; select, from the plurality of frequency point sets, a plurality of second frequency point sets whose performance satisfies a preset threshold; and select, from the plurality of second frequency point sets, the first frequency point set corresponding to the lowest power consumption.
- the prediction module 710 is specifically configured to predict the energy efficiency parameters required by the at least one module to process the current frame of an image according to historical energy efficiency parameters, where the historical energy efficiency parameters are obtained from the energy efficiency parameters of at least one frame before the current frame.
- the prediction module 710 is specifically configured to: search a load prediction table according to the historical energy efficiency parameters to predict the energy efficiency parameters required for the at least one module to process the current frame of an image.
- each module 710-730 in the frequency modulation apparatus 700 corresponding to FIG. 7 may be implemented by software, hardware, or a combination thereof. If implemented in hardware, the frequency modulation device 700 is a hardware circuit, and each module can be considered as a circuit unit, including at least one of a digital circuit, a logic circuit, an analog circuit, a hardware accelerator, or an algorithm circuit. At this time, the frequency modulation device 700 can be regarded as a dedicated hardware, for example, it can be regarded as a hardware accelerator 160 or a part thereof in the system of FIG. 1.
- alternatively, the frequency modulation device 700 may be formed by a software program, where each module includes program instructions executed by a processor, such as the CPU 110 in the system of FIG. 1, to implement the related functions; that is, each module is a software module that can run on the CPU 110 shown in FIG. 1. For details, refer to the previous description.
- alternatively, some modules in the frequency modulation device 700 are hardware circuits, which may be the hardware accelerator 160 in the system of FIG. 1, and the other modules are software modules executed by a processor such as the CPU 110 in FIG. 1. This is not limited in this embodiment.
- An embodiment of the present application further provides a chip, including a processor and an interface for coupling the processor to components outside the chip.
- the interface may be used for coupling to a memory external to the chip, and the memory may be used for storing program code.
- the processor can read the program code from the memory through the interface to perform the operations.
- An embodiment of the present application further provides a computer-readable storage medium including a computer program, and when the computer program runs on a computer or a processor, the computer or the processor causes the computer or the processor to execute the steps described in steps 210 to 230 and the like. method.
- the embodiment of the present application further provides a computer program product, when the computer program product runs on a computer or a processor, the computer or processor is caused to execute the method described in steps 210 to 230 and the like.
- the flow of any method in FIG. 2 to FIG. 4 may be executed by a processor.
- the processor may be a general-purpose processor, and the operations of the flow may be implemented by hardware or related operations may be implemented by executing software.
- the processor may include a microprocessor having a logic circuit or an integrated circuit, a digital signal processor, a microcontroller, or the CPU 110 mentioned in the above embodiments, etc., and implements the functions by reading software code stored in the memory.
- the memory can be integrated in the processor, or can be located outside the processor and exist independently.
- the processor may include a necessary hardware accelerator, such as a hardware algorithm circuit, a logic operation circuit, or an analog circuit that does not rely on software to perform operations.
- the method flow in the previous embodiment is executed by the CPU 110.
- the CPU 110 implements the method flow by executing a program in the memory 130 so as to tune a plurality of modules in the system, including the CPU 110 itself.
- the online operation is an operation performed by the CPU 110 or other processors.
- the offline operation may be performed by a person skilled in the art during the development process before the device leaves the factory, and the obtained results, such as the performance formula mentioned in the embodiment, are preset in the device.
- the performance formula may be preset in the device in the form of software or hardware algorithm circuit.
- the “first”, “second”, and “third” in the embodiments of the application are only for distinction and should not be construed as any limitation on this application. It should also be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
- various aspects or features of the present application may be implemented as a method, apparatus, or article of manufacture using standard programming and / or engineering techniques.
- article of manufacture encompasses a computer program accessible from any computer-readable device, carrier, or medium.
- computer-readable media may include, but are not limited to: magnetic storage devices (e.g., hard disks, floppy disks, or magnetic tapes), optical disks (e.g., compact discs (CD) or digital versatile discs (DVD)), smart cards, and flash memory devices (e.g., erasable programmable read-only memory (EPROM), cards, sticks, or key drives).
- various storage media described herein may represent one or more devices and / or other machine-readable media used to store information.
- machine-readable medium may include, but is not limited to, wireless channels and various other media capable of storing, containing, and / or carrying instruction (s) and / or data.
- if the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
- the technical solution of this application in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product.
- the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or another device) to perform all or part of the steps of the method described in each embodiment of the present application.
- the aforementioned storage media include U disks, mobile hard disks, read-only memories (ROMs), random access memories (RAMs), magnetic disks, compact discs, and other media that can store program code.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Power Sources (AREA)
Abstract
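A frequency scaling method, including: predicting an energy efficiency parameter required by at least one module to process a current frame of an image (210); selecting, from a plurality of frequency point sets and according to the predicted energy efficiency parameter, a first frequency point set that meets an energy efficiency requirement (220); and adjusting the operating frequency point at which the at least one module processes the current frame to the preset frequency point corresponding to the at least one module (230). The method can respond promptly to changes in load demand during frequency scaling.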
Description
本申请涉及电子技术领域,并且更具体地,涉及一种调频方法、装置及计算机可读存储介质。
近年来,移动设备领域特别是智能手机的飞速发展,移动化的使用习惯意味着用户对智能手机的功耗和发热也有着很高的要求。为了降低智能手机的功耗、延长智能手机的续航时间,可以根据负载对中央处理单元(central processing unit,CPU)的频点进行调节。可以使得CPU的频点快速响应负载需求(或性能需求),从而降低智能手机的功耗、延长智能手机的续航时间。
现有技术中,采用窗口辅助负载跟踪(window assisted load tracking,WALT)或每个实体负载跟踪(per-entity load tracking,PELT)的方法调频调度。WALT可以根据任务的历史N个非空窗口下的占空比或CPU的占用率通过定义好的统计策略计算出当前窗口下的占空比或CPU的占用率,并逐级对频点进行调节。但是现有技术中,以窗口大小为周期,根据预测的CPU占用率逐级进行调频,不能及时响应性能需求的变化。不仅是CPU,其他类似器件的调频也存在类似问题。
因此,在调频过程中,如何及时响应任务的性能变化、降低功耗成为亟需要解决的问题。
发明内容
本申请提供一种调频方法、装置,在调频过程中,可以及时响应负载变化需求。
第一方面,提供了一种调频方法,该方法包括:预测至少一个模块处理图像的当前帧所需的能效参数;根据预测的所述能效参数,从多个频点集合中选择出第一频点集合,将所述至少一个模块处理所述当前帧的工作频点调整为所述至少一个模块对应的预设频点。
应理解,处理图像的所述至少一个模块包括中央处理单元CPU、图形处理单元GPU、用于存储所述当前帧的内存储器(例如,可以是DDR存储器)、或神经网络处理单元NPU中的至少一个。还应理解,所述第一频点集合中包括所述至少一个模块对应的预设频点。
本申请实施例中预测至少一个模块处理当前帧所需要的能效参数的实现方式有多种,本申请对此不做具体限定。可以根据历史能效参数(可以由当前帧之前的至少一个帧的能效参数获得)预测当前帧所需要的能效参数。例如,可以根据历史能效参数的均值预测当前帧所需要的能效参数。又如,还可以根据历史能效参数构建并查找负载预测表,从而可以根据负载预测表预测当前帧所需要的能效参数。
本申请实施例中,在调频过程中可以及时响应负载变化需求,可以获得良好的性能和 /或功耗收益。
结合第一方面,在第一方面的某些实现方式中,能效参数包括所述CPU处理所述当前帧所需的指令数、所述CPU处理所述当前帧所产生的缓存缺失、所述GPU处理所述当前帧所需的绘图函数的调用次数、所述存储器读取或存储所述当前帧所需的带宽、或所述NPU处理所述当前帧所需的计算量中的至少一个。
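Optionally, in some embodiments, if the CPU processes the current frame of the image, the energy efficiency parameter of the CPU predicted in this embodiment may include, but is not limited to, at least one of the following: the number of instructions the CPU needs to process the current frame (instructions); the cache misses produced while the CPU processes the current frame (for example, one or both of the number of level-1 data cache misses (L1 data cache misses) and the number of level-2 cache misses (L2 data cache misses)); the number of times no operation is issued due to the frontend (also called FE bound); and the number of times no operation is issued due to the backend (also called BE bound).
Optionally, in some embodiments, if the GPU processes the current frame of the image, the energy efficiency parameter of the GPU predicted in this embodiment may include, but is not limited to, at least one of the following: the number of drawing function calls (draw calls) the GPU needs to process the current frame, the number of triangles drawn (triangles), the number of pixel fill operations (total-fragment, total-FRAG), and the number of texture operations (texture-operation, TEX-operation).
Optionally, in some embodiments, for the memory that stores the current frame (for example, DDR memory), the energy efficiency parameter of the memory predicted in this embodiment may include, but is not limited to, at least one of the following: the bandwidth the memory needs to store or read the current frame.
Optionally, in some embodiments, if the NPU processes the current frame of the image, the energy efficiency parameter of the NPU predicted in this embodiment may include, but is not limited to, at least one of the following: the amount of computation the NPU needs to process the current frame (tasks), the total number of instructions needed to execute the current frame (total-INSTR-EXEC), and the total number of memory access requests of the NPU (total-MEM-request).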
结合第一方面,在第一方面的某些实现方式中,所述第一频点集合能够满足能效需求,能效需求包括如下至少一项:功耗需求或性能需求。
本申请实施例中能效需求可以用于表示当前帧的预设频点能够满足性能需求,也可以表示为当前帧的预设频点能够满足功耗需求,本申请对此不做具体限定。作为一个示例,可以根据预测的能效参数,预测多个频点集合中的每个频点集合对应的性能(例如,至少一个模块所对应的运行时间)。作为另一个示例,还可以根据预测的能效参数,预测第一频点集合中多个频点集合对应的功耗(例如,至少一个模块运行当前帧所需要的功耗),并可以根据至少一个模块所对应的功耗公式,从第一频点集合中选择出最低功耗对应的频点集合。
结合第一方面,在第一方面的某些实现方式中,所述功耗需求为最低功耗需求,所述性能需求为满足预设阈值。
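It should be understood that the preset threshold may be that the running time is less than or equal to a time threshold. As an example, the time threshold may be the per-frame duration calculated from the target frame rate. For a game running at 60 frames per second, each frame has 1000/60 ≈ 16.7 ms.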
结合第一方面,在第一方面的某些实现方式中,根据预测的所述能效参数预测所述每个频点集合对应的性能;从多个频点集合中选择出满足预设阈值的性能所对应的第一频点集合。
具体地,可以根据预测的所述能效参数以及性能公式,从多个频点集合中选择出满足预设阈值的性能所对应的第一频点集合。
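It should be understood that the running time corresponding to the preset frequency point of the at least one module in the first frequency point set meets the preset threshold, and that the performance formula represents the functional relationship between the running time of the at least one module and the energy efficiency parameter.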
本申请实施例中可以根据至少一个模块的运行参数(例如,运行时间)以及能效参数推导出至少一个模块的性能公式y=f(x)。
应理解,性能公式y=f(x)可以用于表示至少一个模块的运行参数(例如,运行时间)以及能效参数之间的函数关系,其中,x可以是自变量(例如,至少一个模块的能效参数),y可以是因变量(例如,可以是至少一个模块执行任意帧所需的运行时间)。
本申请实施例对性能公式y=f(x)不做具体限定,x与y之间的函数关系可以是线性的,也可以是非线性的。
结合第一方面,在第一方面的某些实现方式中,根据预测的所述能效参数预测所述每个频点集合对应的功耗;从多个频点集合中选择出满足预设阈值的多个性能所对应的多个第二频点集合;从多个第二频点集合中选择出最低功耗所对应的第一频点集合。
具体地,可以根据预测的所述能效参数以及功耗公式,从多个频点集合中选择出满足预设阈值的多个性能所对应的多个第二频点集合;从多个第二频点集合中选择出最低功耗所对应的第一频点集合。
应理解,可以根据预测的所述能效参数以及性能公式,从多个频点集合中选择出满足性能要求的多个第二频点集合。还可以根据预测的所述能效参数以及功耗公式,从多个第二频点集合中选择出最低功耗所对应的第一频点集合。
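The power consumption formula represents the functional relationship between the power consumption required by the at least one module to process the current frame and the energy efficiency parameter of the at least one module.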
本申请实施例中可以根据至少一个模块的功耗(power)参数以及能效参数推导出至少一个模块的功耗公式p=f(y)。
应理解,功耗公式p=f(y)可以用于表示至少一个模块的功耗参数以及能效参数之间的函数关系,其中,y可以是自变量,y可以是性能公式的输出值,例如,至少一个模块执行任意帧所需的运行时间,p可以是因变量,例如至少一个模块处理图像当前帧所需的功耗(power)。
本申请实施例对功耗公式p=f(y)不做具体限定,y与p之间的函数关系可以是线性的,也可以是非线性的。
结合第一方面,在第一方面的某些实现方式中,根据历史能效参数预测所述至少一个模块处理图像的当前帧所需的能效参数,所述历史能效参数由当前帧之前的至少一个帧的能效参数获得。
结合第一方面,在第一方面的某些实现方式中,根据历史能效参数查找负载预测表,以预测所述至少一个模块处理图像的当前帧所需的能效参数。
结合第一方面,在第一方面的某些实现方式中,所述负载预测表至少指示所述多个历 史能效参数的变化趋势或者直接指示了多个历史能效参数与预测的能效参数之间的对应关系。
第二方面,提供了一种调频装置,该装置包括:预测模块、确定模块以及处理模块;所述预测模块用于:预测至少一个模块处理图像的当前帧所需的能效参数,所述至少一个模块包括中央处理单元CPU、图形处理单元GPU、用于存储所述当前帧的存储器、或神经网络处理单元NPU中的至少一个;所述确定模块用于:根据预测的所述能效参数,从多个频点集合中选择出第一频点集合,所述第一频点集合中包括所述至少一个模块对应的预设频点;所述处理模块用于:将所述至少一个模块处理所述当前帧的工作频点调整为所述至少一个模块对应的预设频点。
结合第二方面,在第二方面的某些实现方式中,所述能效参数包括所述CPU处理所述当前帧所需的指令数、所述CPU处理所述当前帧所产生的缓存缺失、所述GPU处理所述当前帧所需的绘图函数的调用次数、所述存储器读取或存储所述当前帧所需的带宽、或所述NPU处理所述当前帧所需的计算量中的至少一个。
结合第二方面,在第二方面的某些实现方式中,所述第一频点集合能够满足能效需求,所述能效需求包括如下至少一项:功耗需求或性能需求。
结合第二方面,在第二方面的某些实现方式中,所述功耗需求为最低功耗需求,所述性能需求为满足预设阈值。
结合第二方面,在第二方面的某些实现方式中,所述确定模块具体用于:根据预测的所述能效参数预测所述每个频点集合对应的性能;从多个频点集合中选择出满足预设阈值的性能所对应的第一频点集合。
结合第二方面,在第二方面的某些实现方式中,所述确定模块还具体用于:根据预测的所述能效参数预测所述每个频点集合对应的功耗;所述从多个频点集合中选择出满足预设阈值的多个性能所对应的多个第二频点集合;从多个第二频点集合中选择出最低功耗所对应的第一频点集合。
结合第二方面,在第二方面的某些实现方式中,所述预测模块具体用于:根据历史能效参数预测所述至少一个模块处理图像的当前帧所需的能效参数,所述历史能效参数由当前帧之前的至少一个帧的能效参数获得。
结合第二方面,在第二方面的某些实现方式中,所述预测模块具体用于:根据历史能效参数查找负载预测表以预测所述至少一个模块处理图像的当前帧所需的能效参数。
第三方面,提供了一种调频系统,该装置包括:存储器、处理器。所述存储器用于存储程序;所述处理器用于执行所述存储器中存储的程序,当所述程序被执行时,所述处理器执行第一方面或第一方面中任意一种可能的实现方式中所述的方法。例如,该处理器可以进一步与收发器通信连接。例如,该存储器可以用于存储该设备的程序代码和数据。因此,该存储器可以是处理器内部的存储单元,也可以是与处理器独立的外部存储单元,还可以是包括处理器内部的存储单元和与处理器独立的外部存储单元的部件。
可选地,该处理器可以是通用处理器,可以通过硬件来实现所述方法流程也可以通过执行软件来实现所述流程。当通过硬件实现时,该处理器可以包括具有逻辑电路或集成电路等的微处理器、数字信号处理器、微控制器或所述CPU等,通过读取存储器中存储的软件代码来实现功能,该存储器可以集成在处理器中,可以位于该处理器之外,独立存在。 进一步地,该处理器可以包括必要的硬件加速器,例如不依赖于软件执行运算的硬件算法电路、逻辑运算电路、或模拟电路等。
当程序被执行时,所述处理器用于:预测至少一个模块处理图像的当前帧所需的能效参数,所述至少一个模块包括中央处理单元CPU、图形处理单元GPU、用于存储所述当前帧的存储器、或神经网络处理单元NPU中的至少一个;所述处理器还用于:根据预测的所述能效参数,从多个频点集合中选择出第一频点集合,所述第一频点集合中包括所述至少一个模块对应的预设频点;所述处理器还用于:将所述至少一个模块处理所述当前帧的工作频点调整为所述至少一个模块对应的预设频点。
结合第三方面,在第三方面的某些实现方式中,所述能效参数包括所述CPU处理所述当前帧所需的指令数、所述CPU处理所述当前帧所产生的缓存缺失、所述GPU处理所述当前帧所需的绘图函数的调用次数、所述存储器读取或存储所述当前帧所需的带宽、或所述NPU处理所述当前帧所需的计算量中的至少一个。
结合第三方面,在第三方面的某些实现方式中,所述第一频点集合能够满足能效需求,所述能效需求包括如下至少一项:功耗需求或性能需求。
结合第三方面,在第三方面的某些实现方式中,所述功耗需求为最低功耗需求,所述性能需求为满足预设阈值。
结合第三方面,在第三方面的某些实现方式中,所述处理器具体用于:根据预测的所述能效参数预测所述每个频点集合对应的性能;从多个频点集合中选择出满足预设阈值的性能所对应的第一频点集合。
结合第三方面,在第三方面的某些实现方式中,所述处理器用于:根据预测的所述能效参数预测所述每个频点集合对应的功耗;所述从多个频点集合中选择出满足预设阈值的多个性能所对应的多个第二频点集合;从多个第二频点集合中选择出最低功耗所对应的第一频点集合。
结合第三方面,在第三方面的某些实现方式中,所述处理器具体用于:根据历史能效参数预测所述至少一个模块处理图像的当前帧所需的能效参数,所述历史能效参数由当前帧之前的至少一个帧的能效参数获得。
结合第三方面,在第三方面的某些实现方式中,所述处理器具体用于:根据历史能效参数查找负载预测表以预测所述至少一个模块处理图像的当前帧所需的能效参数。
第四方面,提供了一种芯片,包括处理器和将该处理器与所述芯片外部耦合的接口,例如,接口可以用于耦合至所述芯片外部的存储器,该存储器可以用于存储该设备的程序代码和数据。关于存储器和处理器的具体描述可以参照第三方面或其中任一实现方式的介绍。处理器可以通过该接口从存储器中读取程序代码以执行所述操作。
当程序被执行时,所述处理器用于:预测至少一个模块处理图像的当前帧所需的能效参数,所述至少一个模块包括中央处理单元CPU、图形处理单元GPU、用于存储所述当前帧的存储器、或神经网络处理单元NPU中的至少一个;所述处理器还用于:根据预测的所述能效参数,从多个频点集合中选择出第一频点集合,所述第一频点集合中包括所述至少一个模块对应的预设频点;所述处理器还用于:将所述至少一个模块处理所述当前帧的工作频点调整为所述至少一个模块对应的预设频点。
结合第四方面,在第四方面的某些实现方式中,所述能效参数包括所述CPU处理所 述当前帧所需的指令数、所述CPU处理所述当前帧所产生的缓存缺失、所述GPU处理所述当前帧所需的绘图函数的调用次数、所述存储器读取或存储所述当前帧所需的带宽、或所述NPU处理所述当前帧所需的计算量中的至少一个。
结合第四方面,在第四方面的某些实现方式中,所述第一频点集合能够满足能效需求,所述能效需求包括如下至少一项:功耗需求或性能需求。
结合第四方面,在第四方面的某些实现方式中,所述功耗需求为最低功耗需求,所述性能需求为满足预设阈值。
结合第四方面,在第四方面的某些实现方式中,所述处理器具体用于:根据预测的所述能效参数预测所述每个频点集合对应的性能;从多个频点集合中选择出满足预设阈值的性能所对应的第一频点集合。
结合第四方面,在第四方面的某些实现方式中,所述处理器用于:根据预测的所述能效参数预测所述每个频点集合对应的功耗;所述从多个频点集合中选择出满足预设阈值的多个性能所对应的多个第二频点集合;从多个第二频点集合中选择出最低功耗所对应的第一频点集合。
结合第四方面,在第四方面的某些实现方式中,所述处理器具体用于:根据历史能效参数预测所述至少一个模块处理图像的当前帧所需的能效参数,所述历史能效参数由当前帧之前的至少一个帧的能效参数获得。
结合第四方面,在第四方面的某些实现方式中,所述处理器具体用于:根据历史能效参数查找负载预测表以预测所述至少一个模块处理图像的当前帧所需的能效参数。
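According to a fifth aspect, a computer-readable storage medium is provided, including a computer program; when the computer program runs on a computer or a processor, the computer or the processor is caused to perform the method according to the first aspect or any implementation of the first aspect.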
第六方面,提供了一种计算机程序产品,当该计算机程序产品在计算机或处理器上运行时,使得该计算机或处理器执行如第一方面或第一方面任意一种实现方式中所述的方法。
图1是本申请实施例提供的一种调频系统100示意性框图。
图2是本申请实施例提供的一种调频方法的示意性流程图。
图3是本申请实施例提供的一种预测能效参数的方法的示意性流程图。
图4是本申请实施例提供的一种可能的预设频点的选择方法的示意性流程图。
图5是本申请实施例提供的一种可能的预设频点选择示意图。
图6是本申请另一实施例提供的一种可能的预设频点选择的示意图。
图7是本申请实施例提供的调频装置700的示意性框图。
下面将结合附图,对本申请中的技术方案进行描述。近年来,移动设备领域特别是智能手机的飞速发展,使得智能手机在日常生活中的作用越来越大。与此同时,智能手机上运行的应用程序功能日益多样,用户对性能的要求自然也水涨船高。但是,移动化的使用 习惯意味着用户对智能手机的功耗和发热也有着很高的要求。如何在满足性能的前提下降低功耗、延长智能手机的续航时间、减少发热成为了行业和消费者共同关注的焦点。
为了降低智能手机的功耗、延长智能手机的续航时间,可以根据负载对以下一种或多种模块(例如,中央处理单元(central processing unit,CPU),图形处理单元(graphics processing unit,GPU),双倍速率(double data rate,DDR))存储器的频点进行调节。可以使得上述一个或多个模块的频点快速响应负载需求(性能需求和/或功率需求),从而降低智能手机的功耗、延长智能手机的续航时间。
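To overcome the problem in the prior art that the CPU frequency point is adjusted step by step according to the predicted CPU utilization of the current window and therefore cannot respond promptly to changes in load demand, the frequency scaling method provided in the embodiments of this application may work in units of image frames: according to predicted frame information (which may also be called the energy efficiency parameter of at least one module), the frequency point of at least one of the above modules (for example, CPU, GPU, DDR, NPU) is adjusted to a predicted frequency point. Because the solution of this embodiment performs the adjustment frame by frame, it can respond promptly to changes in load demand and can obtain good performance and/or power consumption gains.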
本申请实施例中的调频方法可以适用但不限于:游戏领域、视频领域或其他通用的应用领域。凡是具有图像处理的领域,本方案可以适用。下面以游戏领域为例,对本申请实施例中以帧为单位,根据预测的帧信息将上述至少一种模块(例如,CPU、GPU、DDR、NPU)的频点调整为预测频点,可以及时响应负载变化需求进行详细说明。本实施例的NPU是一种人工智能运算单元,可以利用例如卷积神经网络(CNN)进行计算。
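On the one hand, in the gaming field the most important indicator of game performance is the frame rate. The frame rate measures the number of displayed image frames, that is, the frames displayed per second (frames per second, FPS). The FPS or frame rate can indicate how many times per second the graphics processor is able to update. Because of the physiology of the human eye, an image whose frame rate is higher than a certain threshold is perceived as continuous; a higher frame rate yields a smoother game picture and more lifelike animation.
It should be understood that a frame in this embodiment may be a still image composed of segmented data. Therefore, as long as the frame finishes within the specified time (for a 60 FPS game, each frame lasts 1000/60 ≈ 16.7 ms), the performance requirement of the game is guaranteed. For modules such as the CPU, GPU, and DDR, frequency scaling can be performed per frame so that the instructions needed by each frame complete within the specified time, which guarantees game performance.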
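The per-frame time budget implied by a target frame rate can be written down directly. The following is a minimal sketch; the function names `frame_budget_ms` and `meets_budget` are illustrative and do not appear in the application.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available to finish one frame, in milliseconds."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 1000.0 / target_fps


def meets_budget(predicted_runtime_ms: float, target_fps: float) -> bool:
    """True if a predicted per-frame running time fits within the frame budget."""
    return predicted_runtime_ms <= frame_budget_ms(target_fps)


if __name__ == "__main__":
    # For a 60 FPS target the budget is 1000/60, roughly 16.67 ms.
    print(f"{frame_budget_ms(60):.2f} ms")
```

A predicted running time of 15 ms would fit a 60 FPS budget under this check, while 17 ms would not.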
另一方面,在对上述至少一种模块(例如,CPU、GPU、DDR、NPU)进行调频的过程中,可以根据预测的至少一个模块的能效参数(预测出该至少一个模块处理当前帧所需要的参数信息)预测出满足性能要求和功耗要求的预设频点,并可以将上述至少一个模块的工作频率调节为预设频点。
图1是本申请实施例提供的一种调频系统100的示意性框图。该调频系统100可以包括:CPU 110、GPU 120、存储器130、NPU 140以及总线150。图1中的调频系统100可以包括在一个或多个芯片中,从而形成一个电子系统。该系统可位于一个电子设备中,该电子设备可以是无线终端、有线终端、用户设备或无连接设备。例如,该电子设备可以是手机、膝上电脑、平板电脑或游戏机等。
参见图1,存储器130例如可以是DDR存储器,该存储器130可以用于存储调频系统100的程序代码和数据。存储器130可以通过总线150与CPU 110、GPU 120以及NPU 140连接。
应理解,总线150可以是外设部件互连标准(peripheral component interconnect,PCI)总线或扩展工业标准结构(extended industry standard architecture,EISA)总线等任一类型总线。所述总线150可以分为地址总线、数据总线、控制总线等。为便于表示,图1中仅用一条线表示,但并不表示仅有一根总线或一种类型的总线。
图1所示的CPU 110、GPU 120以及NPU 140可以用于处理图像,CPU 110、GPU 120以及NPU 140可以为一个芯片上,并通过用于与所述芯片外部耦合的接口读取存储器130中存储的软件代码来实现相应的功能。该软件代码经由接口和总线150从存储器130被读入芯片内部并被CPU 110、GPU 120或NPU 140的至少一个使用。
可选地,在一些实施例中,调频系统100还可以包括硬件加速器160。该硬件加速器160可以包括专用集成电路(application-specific integrated circuit,ASIC),现场可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。
下面结合图1所示的硬件系统架构(调频系统100),对本申请实施例提供的一种调频方法进行详细描述。图2是本申请实施例提供的一种调频方法的示意性流程图。图2所示的方法可以包括步骤210-230,下面分别对步骤210-230进行详细描述。
步骤210,预测至少一个模块处理当前帧所需要的能效参数。本申请实施例中对处理图像的当前帧的模块不做具体限定。可以是图1所示的调频系统100中的CPU 110、GPU 120、存储器130(例如DDR)、或NPU 140中的一个模块或多个模块。本申请实施例中对上述至少一个模块所对应的处理当前帧所需要的能效参数不做具体限定,下面会结合不同的模块,对不同的模块在处理当前帧所需的能效参数进行详细的举例说明。
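Optionally, in some embodiments, if the CPU 110 in the frequency scaling system 100 shown in FIG. 1 processes the current frame of the image, the energy efficiency parameter of the CPU 110 predicted in this embodiment may include, but is not limited to, at least one of the following: the number of instructions the CPU needs to process the current frame (instructions); the cache misses produced while the CPU processes the current frame (for example, the number of level-1 data cache misses (L1 data cache misses) and the number of level-2 cache misses (L2 data cache misses)); the number of times no operation is issued due to the frontend (also called frontend bound, FE bound); and the number of times no operation is issued due to the backend (also called backend bound, BE bound).
Optionally, in some embodiments, if the GPU 120 in the frequency scaling system 100 shown in FIG. 1 processes the current frame of the image, the energy efficiency parameter of the GPU 120 predicted in this embodiment may include, but is not limited to, at least one of the following: the number of drawing function calls (draw calls) the GPU needs to process the current frame, the number of triangles drawn (triangles), the number of pixel fill operations (total-fragment, total-FRAG), and the number of texture operations (texture-operation, TEX-operation).
Optionally, in some embodiments, if the memory 130 (for example, DDR) in the frequency scaling system 100 shown in FIG. 1 stores or reads the current frame, the energy efficiency parameter of the memory 130 predicted in this embodiment may include, but is not limited to, at least one of the following: the bandwidth the memory needs to store or read the current frame.
Optionally, in some embodiments, if the NPU 140 in the frequency scaling system 100 shown in FIG. 1 processes the current frame of the image, the energy efficiency parameter of the NPU 140 predicted in this embodiment may include, but is not limited to, at least one of the following: the amount of computation the NPU needs to process the current frame (tasks), the total number of instructions needed to execute the current frame (total-INSTR-EXEC), and the total number of memory access requests of the NPU (total-MEM-request).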
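In this embodiment there are multiple ways to predict the energy efficiency parameter required by the at least one module shown in FIG. 1 to process the current frame, and this application does not specifically limit them. The energy efficiency parameter required by the current frame can be predicted from historical energy efficiency parameters (which can be obtained from the energy efficiency parameters of at least one frame preceding the current frame). For example, it can be predicted from the mean of the historical energy efficiency parameters. As another example, a load prediction table can be built from the historical energy efficiency parameters and then looked up, so that the energy efficiency parameter required by the current frame is predicted from the table. How the load prediction table is built and looked up is described later with reference to FIG. 3 and is not repeated here.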
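The mean-based variant can be illustrated with a small sliding-window predictor. This is only a sketch under assumptions: the class name `MeanPredictor` and the window of 3 frames are hypothetical, since the application does not fix how many preceding frames are averaged.

```python
from collections import deque
from typing import Optional


class MeanPredictor:
    """Predicts the next frame's parameter as the mean of the last `window` frames."""

    def __init__(self, window: int = 3) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque(maxlen=...) silently drops the oldest sample once the window is full.
        self._history = deque(maxlen=window)

    def observe(self, value: float) -> None:
        """Record the parameter measured for the frame that just finished."""
        self._history.append(value)

    def predict(self) -> Optional[float]:
        """Return the mean of the recorded history, or None if nothing was observed."""
        if not self._history:
            return None
        return sum(self._history) / len(self._history)
```

After observing instruction counts of 5×10^8, 7×10^8, and 3×10^8, `predict()` returns their mean, 5×10^8.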
步骤220,根据预测的能效参数,从多个频点集合中选择出满足能效需求的第一频点集合。本申请实施例中的多个频点集合可以用于表示图1所示的至少一个模块的频点全集或频点组合的全集。需要说明的是,不同型号的模块对应的频点全集不同,因此,本申请实施例对多个频点集合中的频点或频点组合的数量不做具体限定。
本申请实施例中根据预测的能效参数,从多个频点集合中选择出满足能效需求的第一频点集合的实现方式有多种,本申请对此不做具体限定。应理解,能效需求可以用于表示当前帧的预设频点能够满足性能需求,也可以表示为当前帧的预设频点能够满足功耗需求,本申请对此不做具体限定。
作为一个示例,可以根据预测的能效参数,预测多个频点集合中的每个频点集合对应的性能(例如,图1所示的至少一个模块所对应的运行时间),并可以根据至少一个模块所对应的性能公式,从多个频点集合中选择出满足性能的预设阈值(例如,预设的至少一个模块所对应的运行时间)所对应的第一频点集合。可以将图1所示的至少一个模块的工作频点调整为对应的第一频点集合中的预设频点。后文会结合图4-图5对根据性能公式选择出符合性能要求的频点集合的具体实现方式进行描述,此处不再赘述。
作为另一个示例,还可以根据预测的能效参数,预测第一频点集合中多个频点集合对应的功耗(例如,图1所示的至少一个模块运行当前帧所需要的功耗),并可以根据至少一个模块所对应的功耗公式,从第一频点集合中选择出最低功耗对应的频点集合。可以将图1所示的至少一个模块的工作频点调整为最低功耗对应的频点集合中的预设频点。后文会结合图6对根据功耗公式选择出最低功耗对应的频点集合的具体实现方式进行描述,此处不再赘述。
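Step 230: adjust the operating frequency point at which the at least one module processes the current frame to the preset frequency point corresponding to the at least one module in the first frequency point set. In this embodiment, the frequency point set that meets the performance requirement may be selected according to the performance formula in step 220, and the operating frequency point at which the at least one module shown in FIG. 1 processes the current frame may be adjusted to the corresponding preset frequency point in that set. Alternatively, the frequency point set with the lowest power consumption may be selected according to the power consumption formula in step 220, and the operating frequency point at which the at least one module shown in FIG. 1 processes the current frame may be adjusted to the corresponding preset frequency point in that lowest-power set. This embodiment does not specifically limit this.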
本申请实施例中,在调频过程中可以及时响应负载变化需求,可以获得良好的性能和/或功耗收益。
下面结合图3中具体的例子,更加详细地描述本申请实施例中根据历史能效参数创建并查找负载预测表预测当前帧所需要的能效参数的具体实现方式。应注意,图3的例子仅仅是为了帮助本领域技术人员理解本申请实施例,而非要将申请实施例限制于所示例的具体数值或具体场景。本领域技术人员根据所给出的图3的例子,显然可以进行各种等价的修改或变化,这样的修改和变化也落入本申请实施例的范围内。
图3是本申请实施例提供的一种预测能效参数的方法的示意性流程图。图3所示的方法可以包括步骤310-340,下面分别对步骤310-340进行详细描述。
步骤310,在线收集历史能效参数。在线收集当前帧之前的至少一个帧的能效参数(可以理解为历史能效参数),处理当前帧不同的模块所对应的能效参数不同,具体的有关图1所示的至少一个模块所对应的处理当前帧所需要的能效参数请参照步骤210中的描述,此处不再赘述。
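Step 320: quantize the historical energy efficiency parameters. Quantization can be understood as discretizing the instantaneous values of the historical energy efficiency parameters collected online into values that can be clearly measured.
The following takes the CPU 110 shown in FIG. 1 as the module that processes the current frame of the image, and the number of instructions (instructions) the CPU 110 needs to process a frame as the energy efficiency parameter, and describes with reference to Table 1 how the historical energy efficiency parameters of the previous 8 frames are quantized.
Table 1 Quantization of energy efficiency parameters
Data | Frame 1 | Frame 2 | Frame 3 | Frame 4 | Frame 5 | Frame 6 | Frame 7 | Frame 8 |
---|---|---|---|---|---|---|---|---|
Raw data | 1×10^8 | 5×10^8 | 6×10^8 | 9×10^8 | 8×10^8 | 2×10^8 | 7×10^8 | 1×10^8 |
Quantized data | 1 | 5 | 6 | 9 | 8 | 2 | 7 | 1 |
Referring to Table 1, for the instruction counts of the 8 frames preceding the current frame, the CPU needs 1×10^8 instructions to process frame 1, which can be quantized as 1; it needs 5×10^8 instructions to process frame 2, which can be quantized as 5. By analogy, the number of instructions the CPU needs for each frame can be quantized.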
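The quantization in Table 1 can be sketched as follows. The rounding rule and the step of 10^8 are assumptions chosen to reproduce Table 1; the application does not state how values between levels are handled, and the name `quantize` is illustrative.

```python
def quantize(raw_value: float, step: float = 1e8) -> int:
    """Map a raw counter value to an integer level.

    Assumed rule: divide by `step` and round to the nearest integer.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return round(raw_value / step)


# Raw instruction counts of frames 1 to 8, copied from Table 1.
raw_instructions = [1e8, 5e8, 6e8, 9e8, 8e8, 2e8, 7e8, 1e8]
quantized = [quantize(v) for v in raw_instructions]
print(quantized)  # [1, 5, 6, 9, 8, 2, 7, 1]
```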
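Step 330: build the energy efficiency parameter change prediction table. The prediction table can be built using the quantization of step 320. It should be understood that the load prediction table may indicate the trend of a plurality of historical energy efficiency parameters, or may directly indicate the correspondence between a plurality of historical energy efficiency parameters and the predicted energy efficiency parameter. The energy efficiency parameter required by the at least one module shown in FIG. 1 to process the current frame of the image can then be predicted from the patterns of change recorded in the table. The table can be built in real time, or while the apparatus is idle or charging.
It should be noted that the energy efficiency parameter required by the at least one module to process the current frame can be predicted from the energy efficiency parameters of at least one frame preceding the current frame. This embodiment uses the 3 frames preceding the current frame as an example.
With reference to Table 2, the following takes the CPU as the module that processes the current frame and the number of instructions (instructions) as the predicted energy efficiency parameter, and describes in detail how the number of instructions the CPU needs for the current frame is predicted from the instruction counts of the previous 3 frames in the prediction table.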
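Table 2 Instruction count change prediction table
Previous 3 frames (quantized) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
573 | 0 | 0 | 0 | 2 | 19 | 8 | 6 | 0 | 0 |
574 | 0 | 0 | 0 | 7 | 3 | 0 | 0 | 0 | 0 |
575 | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 0 | 0 |
Referring to Table 2, the probability of each quantized value of the current frame can be counted from how the quantized energy efficiency parameters of the preceding frames have changed, and the quantized value with the highest probability can be taken as the predicted number of instructions the CPU needs to process the current frame. Each row of Table 2 is keyed by the quantized values of the previous 3 frames, and each column holds the number of times the corresponding quantized value followed that history. A detailed description with specific examples from Table 2 follows.
For example, if the quantized instruction counts of the 3 frames preceding the current frame are 5, 7, and 3, then according to the statistics in Table 2 (row 573) the quantized value 5 is the most likely next value (19 of 35 occurrences, about 54%), so the CPU can be predicted to need 5×10^8 instructions for the current frame (quantized value 5).
As another example, if the quantized instruction counts of the 3 frames preceding the current frame are 5, 7, and 4, then according to the statistics in Table 2 (row 574) the quantized value 4 is the most likely next value (7 of 10 occurrences, 70%), so the CPU can be predicted to need 4×10^8 instructions for the current frame (quantized value 4).
It should be noted that each time the energy efficiency parameter required by the at least one module to process the current frame has been predicted, the quantized history of step 320 and the energy efficiency parameter change prediction table of step 330 can be updated dynamically.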
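The table lookup and its dynamic update can be sketched with a dictionary of counters keyed by the quantized history. The class and method names below are hypothetical; only the counts in rows 573 and 574 are taken from Table 2, and breaking ties by first-seen order is an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Sequence, Tuple


class LoadPredictionTable:
    """Counts which quantized value followed each history of `history_len` frames."""

    def __init__(self, history_len: int = 3) -> None:
        self.history_len = history_len
        self.rows: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)

    def update(self, history: Sequence[int], observed: int) -> None:
        """Record that `observed` followed `history` (the dynamic update after each frame)."""
        if len(history) != self.history_len:
            raise ValueError("history has the wrong length")
        self.rows[tuple(history)][observed] += 1

    def predict(self, history: Sequence[int]) -> Optional[Tuple[int, float]]:
        """Return (most frequent next value, its empirical probability), or None if unseen."""
        row = self.rows.get(tuple(history))
        if not row:
            return None
        value, count = row.most_common(1)[0]
        return value, count / sum(row.values())


table = LoadPredictionTable()
# Counts copied from rows 573 and 574 of Table 2 (zero entries omitted).
table.rows[(5, 7, 3)].update({4: 2, 5: 19, 6: 8, 7: 6})
table.rows[(5, 7, 4)].update({4: 7, 5: 3})

print(table.predict((5, 7, 3)))  # value 5, probability 19/35 (about 54%)
print(table.predict((5, 7, 4)))  # value 4, probability 7/10
```

Multiplying the predicted level by the quantization step (10^8 in Table 1) recovers the predicted instruction count.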
步骤340,预测至少一个模块处理图像的当前帧所需的能效参数。可以根据步骤330中的能效参数变化预测表,预测出图1所示的至少一个模块处理图像的当前帧所需的能效参数。步骤340具体可参照图2的流程中的210的操作。步骤310-330是建立负载预测表的过程,可以认为是一个预生成表格的过程。随后的340操作可具体参照图2的对应流程。
本申请实施例中可以在预测到图1所示的至少一个模块处理图像的当前帧所需的能效参数之后,可以根据预测的当前帧所需的能效参数以及能效需求选择出符合要求的至少一个模块对应的预设频点,并可以将图1所示的至少一个模块处理当前帧的工作频点调整为对应的预设频点。
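This embodiment does not specifically limit the energy efficiency requirement. As an example, the energy efficiency requirement may be that the at least one module processing the current frame of the image meets a performance requirement, for example, that its running time is less than or equal to a preset threshold. As another example, the energy efficiency requirement may be that the at least one module processing the current frame meets both the performance requirement and the power consumption requirement.
The following describes in more detail, with reference to the specific example in FIG. 4, how the preset frequency point of the at least one module is selected according to the predicted energy efficiency parameter required by the current frame and the performance requirement. Note that the example in FIG. 4 is only intended to help a person skilled in the art understand the embodiments of this application, and not to limit them to the specific values or scenarios shown. A person skilled in the art can obviously make various equivalent modifications or changes based on the example in FIG. 4 given herein, and such modifications and changes also fall within the scope of the embodiments of this application.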
图4是本申请实施例提供的一种可能的预设频点的选择方法的示意性流程图。图4所示的方法可以包括步骤410-430,下面分别对步骤410-430进行详细描述。
步骤410中,离线采集至少一个模块的所有频点或频点组合的能效参数。本申请实施例中可以离线采集图1所示的处理图像的至少一个模块(CPU、GPU、DDR、或NPU模块)的所有频点或频点组合的能效参数。应理解,不同型号的模块对应的频点集合可以不同,可以根据处理图像当前帧的一个或多个模块,采集各个频点或频点组合的能效参数。
还应理解,上述至少一个模块可以单独对图像当前帧进行处理,多个模块之间也可以任意组合搭配对图像当前帧进行处理。作为一个示例,如果仅是CPU对图像当前帧进行处理,可以采集该CPU在各个频点的能效参数。作为另一个示例,如果是CPU以及DDR对图像当前帧进行处理,可以采集该CPU以及DDR在各个频点组合的能效参数。
上述至少一个模块的能效参数请参考步骤210中对不同的模块在处理当前帧所需的能效参数的描述,此处不再赘述。
步骤420中,离线推导出至少一个模块的性能公式。本申请实施例中可以根据至少一个模块的运行参数(例如,运行时间)以及能效参数推导出至少一个模块的性能公式y=f(x)。
应理解,性能公式y=f(x)可以用于表示至少一个模块的运行参数(例如,运行时间)以及能效参数之间的函数关系,其中,x可以是自变量(例如,至少一个模块的能效参数),y可以是因变量(例如,可以是至少一个模块执行任意帧所需的运行时间)。
本申请实施例对性能公式y=f(x)不做具体限定,x与y之间的函数关系可以是线性的,也可以是非线性的。
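The following uses a linear performance formula y=f(x) as an example to illustrate the performance formula of each module.
As an example, for the CPU, y in the performance formula y=f(x) may be the CPU running time, and x may be one or more of the parameters instructions, L1 data cache misses, L2 data cache misses, FE bound, and BE bound.
For example, the performance formula of the CPU may be expressed as:
CPU running time=W0*instructions+W1*FE bound+W2*BE bound+W3 (1)
where W0-W3 are weight parameters;
instructions is the number of instructions the CPU needs when processing any frame of the image;
FE bound is the number of times the CPU has no instruction to execute because of the frontend;
BE bound is the number of times the queue is congested during CPU instruction execution because of the backend.
As another example, the performance formula of the CPU may also be expressed as:
CPU running time=W0*instructions+W1*L1 misses+W2*L2 misses+W3 (2)
where L1 misses is the number of level-1 data cache misses when the CPU processes any frame of the image;
L2 misses is the number of level-2 data cache misses when the CPU processes any frame of the image.
As another example, for the GPU, y in the performance formula y=f(x) may be the GPU running time, and x may be one or more of the parameters draw calls, triangles, total-FRAG, and TEX-operation.
For example, the performance formula of the GPU may be expressed as:
GPU running time=W0*draw calls+W1*triangles+W2*total-FRAG+W3*TEX-operation+W4 (3)
where draw calls is the number of drawing function calls the GPU needs when processing any frame of the image;
triangles is the number of triangles the GPU draws when processing any frame of the image;
total-FRAG is the number of pixel draw operations the GPU needs when processing any frame of the image;
TEX-operation is the number of texture draw operations the GPU needs when processing any frame of the image.
As another example, for the NPU, y in the performance formula y=f(x) may be the NPU running time, and x may be one or more of the parameters tasks, total-INSTR-EXEC, and total-MEM-request.
For example, the performance formula of the NPU may be expressed as:
NPU running time=W0*tasks+W1*total-INSTR-EXEC+W2*total-MEM-request+W3 (4)
where tasks is the amount of computation the NPU needs when processing any frame of the image;
total-INSTR-EXEC is the number of instructions the NPU needs when processing any frame of the image;
total-MEM-request is the number of memory access requests the NPU issues when processing any frame of the image.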
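Formulas (1) to (4) share the same linear shape, so a single helper can evaluate any of them. The sketch below is illustrative only: the weight values, the intercept, and the parameter values are invented for the example, and in practice the weights would come from the offline derivation of step 420.

```python
from typing import Mapping


def linear_runtime(weights: Mapping[str, float],
                   intercept: float,
                   params: Mapping[str, float]) -> float:
    """Evaluate running time = sum(weight * parameter) + intercept."""
    return sum(w * params[name] for name, w in weights.items()) + intercept


# Hypothetical weights for formula (1); W3 is the intercept. Units are milliseconds.
cpu_weights = {"instructions": 2e-8, "fe_bound": 1e-6, "be_bound": 2e-6}
cpu_intercept = 1.0

# Hypothetical predicted energy efficiency parameters for the current frame.
predicted_params = {"instructions": 5e8, "fe_bound": 1e6, "be_bound": 5e5}

runtime_ms = linear_runtime(cpu_weights, cpu_intercept, predicted_params)
print(f"{runtime_ms:.1f} ms")  # 10 + 1 + 1 + 1 = 13.0 ms
```

The same helper covers the GPU and NPU formulas by passing their own parameter names and weights.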
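In step 430, the frequency point set that meets the performance requirement is selected according to the performance formula and the predicted energy efficiency parameter required by the current frame. Steps 410 and 420 are offline operations, while step 430 is computed online; its flow corresponds to step 220 shown in FIG. 2, that is, after steps 410 and 420 the online operations follow the corresponding flow of FIG. 2. In this embodiment, a subset S that meets a preset condition can be selected from all frequency points or frequency combinations of the at least one module according to the performance formula y=f(x) derived in step 420 and the predicted energy efficiency parameter required by the at least one module to process the current frame.
It should be understood that the preset condition may be that the running time is less than or equal to a time threshold. As an example, the time threshold may be the per-frame duration calculated from the target frame rate. For a game running at 60 frames per second, each frame has 1000/60 ≈ 16.7 ms.
Specifically, the predicted energy efficiency parameter required by the at least one module to process the current frame can be substituted as input into the performance formula y=f(x) of the corresponding module, and a frequency point subset S in which the running time of the corresponding module is less than or equal to the time threshold can be selected from all frequency points or frequency combinations of the at least one module (the subset S includes the preset frequency point corresponding to the at least one module).
In this embodiment, the operating frequency at which the at least one module processes the current frame can be adjusted to the preset frequency point corresponding to the at least one module in subset S, and the module can then process the current frame at the preset frequency point, so that the performance requirement is met.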
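Selecting subset S amounts to evaluating the performance formula once per frequency combination and keeping the combinations that fit the time threshold. The sketch below assumes formula (1) with one weight tuple per combination; the `FreqCombo` type, the frequency values in MHz, and all weights are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FreqCombo:
    cpu_mhz: int
    ddr_mhz: int
    weights: Tuple[float, float, float, float]  # (W0, W1, W2, W3) at this operating point


def predicted_runtime_ms(combo: FreqCombo, instructions: float,
                         fe_bound: float, be_bound: float) -> float:
    """Formula (1) evaluated with the weights of one frequency combination."""
    w0, w1, w2, w3 = combo.weights
    return w0 * instructions + w1 * fe_bound + w2 * be_bound + w3


def select_subset(combos: List[FreqCombo], params: Tuple[float, float, float],
                  budget_ms: float) -> List[FreqCombo]:
    """Keep the combinations whose predicted running time fits the time threshold."""
    return [c for c in combos if predicted_runtime_ms(c, *params) <= budget_ms]


full_set_t = [
    FreqCombo(1000, 800, (4e-8, 2e-6, 2e-6, 1.0)),     # predicts 25.0 ms
    FreqCombo(1800, 800, (2e-8, 1e-6, 2e-6, 1.0)),     # predicts 14.0 ms
    FreqCombo(1800, 1600, (2e-8, 1e-6, 1e-6, 1.0)),    # predicts 13.0 ms
    FreqCombo(2400, 1600, (1.5e-8, 1e-6, 1e-6, 1.0)),  # predicts 10.5 ms
]
predicted = (5e8, 1e6, 1e6)  # instructions, FE bound, BE bound for the current frame
subset_s = select_subset(full_set_t, predicted, budget_ms=1000.0 / 60)
print([(c.cpu_mhz, c.ddr_mhz) for c in subset_s])
```

With these invented numbers only the slowest combination misses the 60 FPS budget, so subset S holds the other three.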
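Referring to FIG. 5, FIG. 5 is a schematic diagram of a possible selection of preset frequency points provided by an embodiment of this application. FIG. 5 takes as an example the case in which the CPU 110 and the memory 130 (for example, DDR) shown in FIG. 1 need frequency scaling (that is, the CPU 110 and the memory 130 process the current frame of the image). The full set T of frequency combinations in FIG. 5 may consist of all combinations of the frequency points of the CPU 110 and the memory 130 (for example, DDR).
For example, the full set T of frequency combinations may be:
Frequency combination 1: CPU frequency point 1 / DDR frequency point 1 {W_10, W_11, W_12, W_13}
Frequency combination 2: CPU frequency point 1 / DDR frequency point 2 {W_20, W_21, W_22, W_23}
Frequency combination 3: CPU frequency point 1 / DDR frequency point 3 {W_30, W_31, W_32, W_33}
...
Frequency combination n: CPU frequency point 1 / DDR frequency point n {W_n0, W_n1, W_n2, W_n3}
Frequency combination n+1: CPU frequency point 2 / DDR frequency point 1 {W_(n+1)0, W_(n+1)1, W_(n+1)2, W_(n+1)3}
Frequency combination n+2: CPU frequency point 2 / DDR frequency point 2 {W_(n+2)0, W_(n+2)1, W_(n+2)2, W_(n+2)3}
...
Frequency combination 2n: CPU frequency point 2 / DDR frequency point n {W_(2n)0, W_(2n)1, W_(2n)2, W_(2n)3}
...
Frequency combination m×n: CPU frequency point m / DDR frequency point n {W_(m×n)0, W_(m×n)1, W_(m×n)2, W_(m×n)3}
It should be understood that modules of different models provide different numbers of frequency points m or n, which this application does not specifically limit.
It should also be understood that the weights in braces attached to each frequency combination above, for example {W_00, W_01, W_02, W_03}, may correspond to the weights W_x in the performance formula under that frequency combination.
In this embodiment, the subset S that meets the performance requirement can be selected from the full set T of frequency combinations according to the performance formula mentioned in step 420.
For example, the subset S that meets the performance requirement may be:
Frequency combination 1: CPU frequency point 1 / DDR frequency point 1 {W_00, W_01, W_02, W_03}
Frequency combination 2: CPU frequency point 1 / DDR frequency point 2 {W_10, W_11, W_12, W_13}
Frequency combination 3: CPU frequency point 1 / DDR frequency point 3 {W_20, W_21, W_22, W_23}
Frequency combination 4: CPU frequency point 2 / DDR frequency point 1 {W_30, W_31, W_32, W_33}
Frequency combination 5: CPU frequency point 2 / DDR frequency point 3 {W_40, W_41, W_42, W_43}
Frequency combination 6: CPU frequency point 3 / DDR frequency point 3 {W_50, W_51, W_52, W_53}
Frequency combination 7: CPU frequency point 4 / DDR frequency point 1 {W_60, W_61, W_62, W_63}
The CPU and DDR frequency combinations included in subset S in FIG. 5 all meet the performance requirement, and the operating frequency points at which the CPU and DDR process the current frame of the image may be adjusted to any one of the frequency combinations in subset S listed above.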
图5是基于性能需求从至少一个模块的所有频点或频点组合中选择出满足性能需求的子集合S。可选地,在一些实施例中,还可以在图5的基础上,可以根据功耗公式,从子集合S中选择出最低功耗对应的频点集合M。下面会结合图6对从子集合S中选择出最低功耗对应的频点集合M的具体实现方式进行描述,此处不再赘述。
本申请实施例中可以根据至少一个模块的功耗(power)参数以及能效参数推导出至少一个模块的功耗公式p=f(y)。
应理解,功耗公式p=f(y)可以用于表示至少一个模块的功耗参数以及能效参数之间的函数关系,其中,y可以是自变量(y可以是性能公式的输出值,例如,至少一个模块执行任意帧所需的运行时间),p可以是因变量(例如至少一个模块处理图像当前帧所需的功耗(power))。
本申请实施例对功耗公式p=f(y)不做具体限定,y与p之间的函数关系可以是线性的,也可以是非线性的。
下面以线性的功耗公式p=f(y)作为示例,对根据预测的至少一个模块的能效参数,预测子集合S中的频点或频点组合所对应的功耗大小进行举例说明。
例如,如果需要对CPU以及DDR模块进行调频(也就是说,CPU以及DDR模块处理图像的当前帧),子集合S中可以是CPU以及DDR的频点组合,并且可以根据如下的功耗公式p=f(y)预测子集合S中CPU以及DDR的频点组合所对应的功耗大小。
CPU以及DDR所需的总功耗公式可以表示为:
power≈W0*frame duration+W1*CPU running time+W2*bandwidth+W3 (5)
其中,W0-W3为权重参数;
frame duration为目标帧率的倒数;
CPU running time为CPU处理图像的任意帧时所需的运行时间;
bandwidth为DDR存储或读取图像的任意帧时所需的带宽总量;
需要说明的是,功耗公式p=f(y)中的自变量y可以按照子集合S中的模块进行搭配,如果子集合S中只需要对CPU以及DDR进行调频,则计算出的模块GPU以及NPU的运行时间(GPU running time、NPU running time)可以不需要参与总功耗的计算。
又如,如果需要对全系统的模块(CPU、GPU、NPU以及DDR)进行调频,则CPU、GPU、NPU以及DDR所需的总功耗公式可以表示为:
power≈W0*frame duration+W1*CPU running time+W2*GPU running time+W3*NPU running time+W4*bandwidth+W5 (6)
其中,W0-W5为权重参数;
GPU running time为GPU处理图像的任意帧时所需的运行时间;
NPU running time为NPU处理图像的任意帧时所需的运行时间;
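The following describes in more detail, with reference to the specific example in FIG. 6, how the frequency point set M with the lowest power consumption is selected from subset S in this embodiment. Note that the example in FIG. 6 is only intended to help a person skilled in the art understand the embodiments of this application, and not to limit them to the specific values or scenarios shown. A person skilled in the art can obviously make various equivalent modifications or changes based on the example in FIG. 6 given herein, and such modifications and changes also fall within the scope of the embodiments of this application.
FIG. 6 is a schematic diagram of a possible selection of preset frequency points provided by another embodiment of this application. FIG. 6 takes as an example the case in which the CPU 110 and the memory 130 (for example, DDR) shown in FIG. 1 need frequency scaling (that is, the CPU 110 and the memory 130 in FIG. 1 process the current frame of the image).
Referring to FIG. 6, on the basis of FIG. 5, the frequency combination M with the lowest total power consumption for the CPU and DDR can further be selected from subset S (for example, the CPU and DDR frequency combinations listed above that meet the performance requirement) according to power consumption formula (5), and the operating frequency points at which the CPU and DDR process the current frame of the image can be adjusted to the frequency points in frequency combination M.
For example, the frequency combination M with the lowest total power consumption for the CPU and DDR may be:
Frequency combination 4: CPU frequency point 2 / DDR frequency point 1 {W_30, W_31, W_32, W_33}
It should be noted that there may be one or more frequency combinations M with the lowest total power consumption for the CPU and DDR, which this embodiment does not specifically limit.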
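Choosing M from subset S can be sketched by evaluating power consumption formula (5) for each candidate and keeping the minimum. Everything numeric below is invented for illustration (weights, running times, bandwidth, and the implied power unit), and returning a list reflects the statement that M may contain more than one combination.

```python
from typing import Dict, List, Tuple

Weights = Tuple[float, float, float, float]  # (W0, W1, W2, W3) of formula (5)


def predict_power(weights: Weights, frame_duration_ms: float,
                  cpu_runtime_ms: float, bandwidth: float) -> float:
    """Formula (5): power = W0*frame duration + W1*CPU running time + W2*bandwidth + W3."""
    w0, w1, w2, w3 = weights
    return w0 * frame_duration_ms + w1 * cpu_runtime_ms + w2 * bandwidth + w3


def lowest_power_combos(power_by_combo: Dict[str, float]) -> List[str]:
    """Return every combination that attains the minimum predicted power."""
    lowest = min(power_by_combo.values())
    return [name for name, p in power_by_combo.items() if p == lowest]


frame_duration_ms = 1000.0 / 60  # reciprocal of a 60 FPS target frame rate

# Hypothetical subset S: name -> (weights, predicted CPU running time, predicted bandwidth).
subset_s = {
    "CPU 2 / DDR 1": ((1.0, 30.0, 0.05, 50.0), 14.0, 2000.0),
    "CPU 2 / DDR 3": ((1.0, 30.0, 0.08, 50.0), 13.0, 2000.0),
    "CPU 3 / DDR 3": ((1.0, 45.0, 0.08, 50.0), 10.5, 2000.0),
}
power = {
    name: predict_power(w, frame_duration_ms, runtime, bw)
    for name, (w, runtime, bw) in subset_s.items()
}
print(lowest_power_combos(power))  # ['CPU 2 / DDR 1']
```

In this invented data set the slowest admissible combination is also the cheapest, which is the usual motivation for filtering on performance first and minimizing power second.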
本申请实施例提供的调频方法可以在调频过程中及时响应负载变化需求,可以获得良好的性能和/或功耗收益。
下面以部分游戏的整体性能为例,结合表3,对使用本申请实施例中的调频方法(AI调频方法)之后,游戏的整体性能以及功耗收益进行详细描述。
表3 AI调频调度实施前后收益对比表
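Table 3 lists the game performance gains and power consumption gains of some games before and after AI frequency scaling and scheduling is applied. Game performance gains are reflected mainly in the average frame rate, smoothness, and stutter rate. The closer the average frame rate is to the full frame rate, the better the game performance. Smoothness may denote the standard deviation of the frame rate differences (the frame rate of one second minus the frame rate of the previous second); the lower the value, the better the game performance. The stutter rate may denote the proportion of seconds, counted per second, in which the frame rate is below a set threshold; the lower the value, the better the game performance.
For example, for the game 开心消消乐 in Table 3, the average frame rate is 56 without the frequency scaling and scheduling method of this embodiment (AI frequency scaling and scheduling) and rises to 57.89 with it. The smoothness value is 4.81 without the AI method and drops to 2.39 with it. In terms of power consumption gain, the energy efficiency gain rises by 9.11% after the AI method is applied. Therefore, after the AI frequency scaling and scheduling method of this embodiment is applied, both the overall performance and the power consumption of the game improve noticeably, and the energy efficiency gain is evident.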
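The three game performance metrics defined above can be computed from a per-second frame rate log. This is a sketch under assumptions: the function names are illustrative, the stutter threshold is a free parameter, and the population standard deviation is used because the text does not say which variant is meant.

```python
import statistics
from typing import Sequence


def average_fps(fps_per_second: Sequence[float]) -> float:
    """Mean frame rate over the sampled seconds."""
    return statistics.mean(fps_per_second)


def smoothness(fps_per_second: Sequence[float]) -> float:
    """Standard deviation of second-to-second frame rate differences (lower is better)."""
    if len(fps_per_second) < 2:
        raise ValueError("need at least two samples")
    diffs = [later - earlier
             for earlier, later in zip(fps_per_second, fps_per_second[1:])]
    return statistics.pstdev(diffs)


def stutter_rate(fps_per_second: Sequence[float], threshold: float) -> float:
    """Fraction of seconds whose frame rate is below `threshold` (lower is better)."""
    if not fps_per_second:
        raise ValueError("need at least one sample")
    below = sum(1 for fps in fps_per_second if fps < threshold)
    return below / len(fps_per_second)
```

For a log of `[60, 58, 40, 60]` and a threshold of 45, the average frame rate is 54.5 and the stutter rate is 0.25, since one of the four seconds falls below the threshold.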
本申请实施例提供的调频方法,预测的至少一个模块在处理当前帧所需的运行时间较准确。
下面结合表4以及表5,详细描述使用本申请实施例提供的调频方法之后,可以较准确的预测至少一个模块在处理当前帧所需的运行时间。
表4 AI调频调度预测准确率
游戏场景 | 预测准确率 |
王者荣耀 | 98.53% |
阴阳师 | 98.70% |
QQ飞车 | 98.65% |
NBA2018 | 98.55% |
崩坏3 | 98.39% |
由表4可以看出,使用本申请实施例提供的调频方法之后,预测的至少一个模块在处理当前帧所需的运行时间的准确率较高。
下面以表4中预测的运行QQ飞车的所需的帧运行时间为例,结合表5详细描述至少一个模块在处理当前帧所需的运行时间与实际运行时间的偏差。
表5 AI调频调度预测准确率
由表5可以看出,本申请实施例中的调频方法可以适用但不限于:游戏领域、视频领域或其他通用的应用领域。
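Referring to Table 5, in the column with index 62, the median absolute deviation (MAD) for this game is 0.0135, and the corresponding prediction accuracy for this game is 1 - 0.0135 = 0.9865, that is, 98.65%.
It should be understood that MAD measures the deviation between the predicted running time the at least one module needs to process the current frame and the actual running time.
The frequency scaling method provided by the embodiments of this application has been described in detail above with reference to FIG. 1 to FIG. 6. The apparatus embodiments of this application are described in detail below with reference to FIG. 7. It should be understood that the descriptions of the method embodiments and the apparatus embodiments correspond to each other; therefore, for parts not described in detail, refer to the foregoing method embodiments.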
图7示出了本申请实施例的调频装置700的示意性框图,该调频装置700中各模块分别用于执行上述方法中各动作或处理过程,这里,为了避免赘述,详细说明可以参照上文中的描述。
图7是本申请实施例提供的调频装置700的示意性框图。该调频装置700可以包括:预测模块710、确定模块720以及处理模块730,所述预测模块710用于:预测至少一个模块处理图像的当前帧所需的能效参数,所述至少一个模块包括中央处理单元CPU、图形处理单元GPU、用于存储所述当前帧的存储器、或神经网络处理单元NPU中的至少一个;所述确定模块720用于:根据预测的所述能效参数,从多个频点集合中选择出满足能效需求的第一频点集合,所述第一频点集合中包括所述至少一个模块对应的预设频点;所述处理模块730用于:将所述至少一个模块处理所述当前帧的工作频点调整为所述至少一个模块对应的预设频点。
可选地,在一些实施例中,所述能效参数包括所述CPU处理所述当前帧所需的指令数、所述CPU处理所述当前帧所产生的缓存缺失、所述GPU处理所述当前帧所需的绘图函数的调用次数、所述存储器存储或读取所述当前帧所需的带宽、或所述NPU处理所述当前帧所需的计算量中的至少一个。
可选地,在一些实施例中,所述能效需求包括如下至少一项:功耗需求或性能需求。
可选地,在一些实施例中,所述功耗需求为最低功耗需求,所述性能需求为满足预设阈值。
可选地,在一些实施例中,所述确定模块720具体用于:根据预测的所述能效参数预测所述每个频点集合对应的性能;从多个频点集合中选择出满足预设阈值的性能所对应的第一频点集合。
可选地,在一些实施例中,所述确定模块720还具体用于:根据预测的所述能效参数预测所述每个频点集合对应的功耗;所述从多个频点集合中选择出满足预设阈值的多个性能所对应的多个第二频点集合;从多个第二频点集合中选择出最低功耗所对应的第一频点集合。
可选地,在一些实施例中,所述预测模块710具体用于:根据历史能效参数预测所述至少一个模块处理图像的当前帧所需的能效参数,所述历史能效参数由当前帧之前的至少一个帧的能效参数获得。
可选地,在一些实施例中,所述预测模块710具体用于:根据历史能效参数查找负载预测表以预测所述至少一个模块处理图像的当前帧所需的能效参数。
可以理解,图7对应的调频装置700中的每个模块710-730可以以软件、硬件或其结合来实现。如果以硬件实现,则调频装置700是一个硬件电路,每个模块可以认为是一个电路单元,包括数字电路、逻辑电路、模拟电路、硬件加速器或算法电路中的至少一个。此时调频装置700可以视为是一个专用硬件,例如可以视为是图1系统中的一个硬件加速器160或其中一部分。
或者,调频装置700可以是以软件程序形成,其中每个模块包括程序指令,被处理器,如之前图1的系统中提到的CPU 110所运行以实现相关功能,即每个模块是一个软件模块,并可以运行在如图1所示的CPU 110上,具体可以参照之前的描述。
再或者,调频装置700中的部分模块是硬件电路,可以是图1的系统中的硬件加速器160,而另一部分模块是处理器,如图1中CPU 110执行的软件模块,本实施例对此不作限制。
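An embodiment of this application further provides a chip, including a processor and an interface that couples the processor to the outside of the chip. For example, the interface may be used to couple to a memory outside the chip, and the memory may be used to store the program code and data of the device. The processor can read the program code from the memory through the interface to perform the operations. For details about the memory and the processor, refer to the description in the foregoing embodiments; they are not repeated here.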
本申请实施例还提供了一种计算机可读存储介质,包括计算机程序,当该计算机程序在计算机或处理器上运行时,使得该计算机或处理器执行如步骤210-230等步骤中所述的方法。
本申请实施例还提供了一种计算机程序产品,当该计算机程序产品在计算机或处理器上运行时,使得该计算机或处理器执行如步骤210-230等步骤中所述的方法。
图2至图4中任一方法流程可以被处理器执行,该处理器可以是通用处理器,可以通过硬件来实现所述流程的运算也可以通过执行软件来实现相关运算。当通过硬件实现时,该处理器可以包括具有逻辑电路或集成电路等的微处理器、数字信号处理器、微控制器、或以上实施例提到的所述CPU110等,通过读取存储器中存储的软件代码来实现功能,该存储器可以集成在处理器中,可以位于该处理器之外,独立存在。进一步地,该处理器可以包括必要的硬件加速器,例如不依赖于软件执行运算的硬件算法电路、逻辑运算电路、或模拟电路等。
在一种典型的实现方式中,以前实施例中的所述方法流程被所述CPU110执行。所述CPU110通过执行存储器130内的程序来实现所述方法流程,以便对系统内的多个模块,包括该CPU110自身,进行调频。可以理解,在线操作是CPU110或其他处理器执行的操作。离线操作则可以是装置出厂之前有本领域技术人员在开发过程中执行,并将得到的结果,如实施例提到的性能公式预置在装置内。所述性能公式可以以软件或硬件算法电路的形式被预置在所述装置中。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本说明书中使用的术语“部件”、“模块”、“系统”等用于表示终端设备相关的实体、硬件、固件、硬件和软件的组合、软件、或执行中的软件。应理解,本申请实施例中的方式、情况、类别以及实施例的划分仅是为了描述的方便,不应构成特别的限定,各种方式、类别、情况以及实施例中的特征在不矛盾的情况下可以相结合。
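It should also be understood that “first”, “second”, and “third” in the embodiments of this application are only for distinction and should not constitute any limitation on this application. It should also be understood that, in the various embodiments of this application, the sequence numbers of the processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.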
还需要说明的是,“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。字符“/”一般表示前后关联对象是一种“或”的关系。“至少一个”是指一个或一个以上;“A和B中的至少一个”,类似于“A和/或B”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和B中的至少一个,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。下面将结合附图详细说明本申请提供的技术方案。
另外,本申请的各个方面或特征可以实现成方法、装置或使用标准编程和/或工程技术的制品。本申请中使用的术语“制品”涵盖可从任何计算机可读器件、载体或介质访问的计算机程序。例如,计算机可读介质可以包括,但不限于:磁存储器件(例如,硬盘、 软盘或磁带等),光盘(例如,压缩盘(compact disc,CD)、数字通用盘(digital versatile disc,DVD)等),智能卡和闪存器件(例如,可擦写可编程只读存储器(erasable programmable read-only memory,EPROM)、卡、棒或钥匙驱动器等)。另外,本文描述的各种存储介质可代表用于存储信息的一个或多个设备和/或其它机器可读介质。术语“机器可读介质”可包括但不限于,无线信道和能够存储、包含和/或承载指令和/或数据的各种其它介质。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令以使得一台计算机设备(可以是个人计算机,服务器,或者设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。
Claims (18)
- 一种调频方法,其特征在于,所述方法包括:预测至少一个模块处理图像的当前帧所需的能效参数,所述至少一个模块包括中央处理单元CPU、图形处理单元GPU、用于存储所述当前帧的存储器、或神经网络处理单元NPU中的至少一个;根据预测的所述能效参数,从多个频点集合中选择出第一频点集合,所述第一频点集合中包括所述至少一个模块对应的预设频点;将所述至少一个模块处理所述当前帧的工作频点调整为所述至少一个模块对应的预设频点。
- 根据权利要求1所述的方法,其特征在于,所述能效参数包括所述CPU处理所述当前帧所需的指令数、所述CPU处理所述当前帧所产生的缓存缺失、所述GPU处理所述当前帧所需的绘图函数的调用次数、所述存储器存储所述当前帧所需的带宽、或所述NPU处理所述当前帧所需的计算量中的至少一个。
- 根据权利要求1或2所述的方法,其特征在于,所述第一频点集合能够满足能效需求,所述能效需求包括如下至少一项:功耗需求或性能需求。
- 根据权利要求3所述的方法,其特征在于,所述功耗需求为最低功耗需求,所述性能需求为满足预设阈值。
- 根据权利要求3或4所述的方法,其特征在于,所述根据预测的所述能效参数,从多个频点集合中选择出满足能效需求的第一频点集合,包括:根据预测的所述能效参数预测所述每个频点集合对应的性能;从多个频点集合中选择出满足预设阈值的性能所对应的第一频点集合。
- 根据权利要求5所述的方法,其特征在于,所述从多个频点集合中选择出满足预设阈值的性能所对应的第一频点集合包括:根据预测的所述能效参数预测所述每个频点集合对应的功耗;所述从多个频点集合中选择出满足预设阈值的多个性能所对应的多个第二频点集合;从多个第二频点集合中选择出最低功耗所对应的第一频点集合。
- 根据权利要求1至6中任一项所述的方法,其特征在于,所述预测至少一个模块处理图像的当前帧所需的能效参数,包括:根据历史能效参数预测所述至少一个模块处理图像的当前帧所需的能效参数,所述历史能效参数由当前帧之前的至少一个帧的能效参数获得。
- 根据权利要求7所述的方法,其特征在于,所述根据历史能效参数预测所述至少一个模块处理图像的当前帧所需的能效参数,包括:根据历史能效参数查找负载预测表以预测所述至少一个模块处理图像的当前帧所需的能效参数。
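- A frequency scaling apparatus, comprising: a prediction module, configured to predict an energy efficiency parameter required by at least one module to process a current frame of an image, wherein the at least one module comprises at least one of a central processing unit (CPU), a graphics processing unit (GPU), a memory used to store the current frame, or a neural network processing unit (NPU); a determining module, configured to select a first frequency point set from a plurality of frequency point sets based on the predicted energy efficiency parameter, wherein the first frequency point set comprises a preset frequency point corresponding to the at least one module; and a processing module, configured to adjust the operating frequency point at which the at least one module processes the current frame to the preset frequency point corresponding to the at least one module.
- The apparatus according to claim 9, wherein the energy efficiency parameter comprises at least one of: a number of instructions required by the CPU to process the current frame, cache misses generated when the CPU processes the current frame, a number of draw function calls required by the GPU to process the current frame, a bandwidth required by the memory to store the current frame, or a computation amount required by the NPU to process the current frame.
- The apparatus according to claim 9 or 10, wherein the first frequency point set can satisfy an energy efficiency requirement, and the energy efficiency requirement comprises at least one of the following: a power consumption requirement or a performance requirement.
- The apparatus according to claim 11, wherein the power consumption requirement is a lowest power consumption requirement, and the performance requirement is that a preset threshold is satisfied.
- The apparatus according to claim 11 or 12, wherein the determining module is specifically configured to: predict, based on the predicted energy efficiency parameter, performance corresponding to each frequency point set; and select, from the plurality of frequency point sets, the first frequency point set corresponding to performance that satisfies the preset threshold.
- The apparatus according to claim 13, wherein the determining module is further specifically configured to: predict, based on the predicted energy efficiency parameter, power consumption corresponding to each frequency point set; select, from the plurality of frequency point sets, a plurality of second frequency point sets corresponding to a plurality of performances that satisfy the preset threshold; and select, from the plurality of second frequency point sets, the first frequency point set corresponding to the lowest power consumption.
- The apparatus according to any one of claims 9 to 14, wherein the prediction module is specifically configured to: predict, based on a historical energy efficiency parameter, the energy efficiency parameter required by the at least one module to process the current frame of the image, wherein the historical energy efficiency parameter is obtained from an energy efficiency parameter of at least one frame preceding the current frame.
- The apparatus according to claim 15, wherein the prediction module is specifically configured to: look up a load prediction table based on the historical energy efficiency parameter, to predict the energy efficiency parameter required by the at least one module to process the current frame of the image.
- A frequency scaling apparatus, comprising a memory and a processor, wherein the memory is configured to store a program, and the processor is configured to execute the program stored in the memory and, driven by the program, perform the method according to any one of claims 1 to 8.
- A computer-readable storage medium, comprising a computer program, wherein when the computer program runs on a computer or a processor, the computer or the processor is caused to perform the method according to any one of claims 1 to 8.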
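The selection flow recited in claims 1 and 5 to 7 can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the application: the names `FrequencySet`, `predict_energy_params`, `predict_frame_ms`, `predict_power`, and `select_frequency_set`, the constant `GPU_CYCLES_PER_DRAW_CALL`, the toy performance and power models, the frame-time threshold, and all numeric values. A plain average over preceding frames stands in for the load prediction table of claim 8, whose format the claims do not define.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical cost constant; the claims do not specify any cost model.
GPU_CYCLES_PER_DRAW_CALL = 10_000


@dataclass
class FrequencySet:
    """One candidate frequency point set: a preset frequency (MHz) per module."""
    name: str
    freqs_mhz: Dict[str, int]


def predict_energy_params(history: List[Dict[str, float]]) -> Dict[str, float]:
    """Predict the current frame's parameters from preceding frames (claim 7).

    Assumption: a plain average replaces the load prediction table of claim 8.
    """
    return {key: sum(frame[key] for frame in history) / len(history)
            for key in history[0]}


def predict_frame_ms(params: Dict[str, float], fs: FrequencySet) -> float:
    """Toy performance model: one CPU instruction per cycle, fixed draw-call cost."""
    cpu_ms = params["cpu_instructions"] / (fs.freqs_mhz["cpu"] * 1e3)
    gpu_cycles = params["gpu_draw_calls"] * GPU_CYCLES_PER_DRAW_CALL
    gpu_ms = gpu_cycles / (fs.freqs_mhz["gpu"] * 1e3)
    return cpu_ms + gpu_ms


def predict_power(fs: FrequencySet) -> float:
    """Toy power model in arbitrary units: cubic in each module's frequency (GHz)."""
    return sum((mhz / 1000) ** 3 for mhz in fs.freqs_mhz.values())


def select_frequency_set(params: Dict[str, float],
                         candidates: List[FrequencySet],
                         max_frame_ms: float) -> Optional[FrequencySet]:
    """Keep the sets whose predicted performance meets the threshold (the
    "second frequency point sets"), then return the lowest-power one."""
    second_sets = [fs for fs in candidates
                   if predict_frame_ms(params, fs) <= max_frame_ms]
    if not second_sets:
        return None  # no candidate meets the performance threshold
    return min(second_sets, key=predict_power)


# Illustrative inputs only: two preceding frames and three candidate sets.
history = [
    {"cpu_instructions": 9_000_000, "gpu_draw_calls": 280},
    {"cpu_instructions": 11_000_000, "gpu_draw_calls": 320},
]
CANDIDATES = [
    FrequencySet("low", {"cpu": 800, "gpu": 300}),
    FrequencySet("mid", {"cpu": 1400, "gpu": 500}),
    FrequencySet("high", {"cpu": 2000, "gpu": 700}),
]

predicted = predict_energy_params(history)
chosen = select_frequency_set(predicted, CANDIDATES, max_frame_ms=16.7)
if chosen is not None:
    # A platform-specific driver would apply these preset frequency points.
    print(chosen.name, chosen.freqs_mhz)
```

With these toy numbers, the "mid" and "high" sets meet the 16.7 ms threshold and "mid" has the lower modelled power, so it is the set returned. Filtering on performance before minimising power mirrors the order of the steps in claim 6.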
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18932041.9A EP3819744A4 (en) | 2018-08-30 | 2018-08-30 | FREQUENCY ADAPTATION METHOD, DEVICE AND COMPUTER-READABLE STORAGE MEDIUM |
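CN201880091791.XA CN111902790B (zh) | 2018-08-30 | 2018-08-30 | Frequency scaling method and apparatus, and computer-readable storage medium |
PCT/CN2018/103307 WO2020042098A1 (zh) | 2018-08-30 | 2018-08-30 | Frequency scaling method and apparatus, and computer-readable storage medium |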
US17/177,881 US11460905B2 (en) | 2018-08-30 | 2021-02-17 | Frequency scaling responding to a performance change method and apparatus and computer-readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/103307 WO2020042098A1 (zh) | 2018-08-30 | 2018-08-30 | Frequency scaling method and apparatus, and computer-readable storage medium |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/177,881 Continuation US11460905B2 (en) | 2018-08-30 | 2021-02-17 | Frequency scaling responding to a performance change method and apparatus and computer-readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020042098A1 (zh) | 2020-03-05 |
Family
ID=69642617
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/103307 WO2020042098A1 (zh) | Frequency scaling method and apparatus, and computer-readable storage medium | 2018-08-30 | 2018-08-30 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11460905B2 (zh) |
EP (1) | EP3819744A4 (zh) |
CN (1) | CN111902790B (zh) |
WO (1) | WO2020042098A1 (zh) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
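CN114510140B (zh) * | 2020-11-16 | 2024-04-16 | 深圳市万普拉斯科技有限公司 | Frequency scaling method and apparatus, and electronic device |
CN114510139B (zh) * | 2020-11-16 | 2024-06-04 | 深圳市万普拉斯科技有限公司 | Frequency scaling method and apparatus, and electronic device |
KR20230036589A (ko) * | 2021-09-06 | 2023-03-15 | 삼성전자주식회사 | System-on-chip and operating method thereof |
CN113918002B (zh) * | 2021-10-27 | 2024-06-28 | 杭州逗酷软件科技有限公司 | Frequency scaling method and apparatus, storage medium, and electronic device |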
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060020838A1 (en) * | 2004-06-30 | 2006-01-26 | Tschanz James W | Method, apparatus and system of adjusting one or more performance-related parameters of a processor |
US20110320846A1 (en) * | 2010-06-23 | 2011-12-29 | David Howard S | Adaptive memory frequency scaling |
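CN102609319A (zh) * | 2011-01-20 | 2012-07-25 | 中国移动通信有限公司 | Processor frequency scaling method, apparatus, and device |
CN105677482A (zh) * | 2015-12-31 | 2016-06-15 | 联想(北京)有限公司 | Frequency adjustment method and electronic device |
CN107678855A (zh) * | 2017-09-19 | 2018-02-09 | 中国电子产品可靠性与环境试验研究所 | Processor dynamic adjustment method and apparatus, and processor chip |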
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2406184B (en) * | 2003-09-17 | 2006-03-15 | Advanced Risc Mach Ltd | Data processing system |
US7529948B2 (en) * | 2005-08-25 | 2009-05-05 | Apple Inc. | Methods and apparatuses for dynamic power estimation |
CN100561404C (zh) * | 2005-12-29 | 2009-11-18 | 联想(北京)有限公司 | Method for saving processor power consumption |
KR20140030823A (ko) * | 2012-09-04 | 2014-03-12 | 삼성전자주식회사 | System-on-chip performing a DVFS policy using a three-dimensional workload and operating method thereof |
CN103019367B (zh) * | 2012-12-03 | 2015-07-08 | 福州瑞芯微电子有限公司 | Android-based embedded GPU dynamic frequency scaling method and apparatus |
KR20140088691A (ko) * | 2013-01-03 | 2014-07-11 | 삼성전자주식회사 | System-on-chip performing a DVFS policy and operating method thereof |
US9395796B2 (en) * | 2013-12-19 | 2016-07-19 | Intel Corporation | Dynamic graphics geometry preprocessing frequency scaling and prediction of performance gain |
US9378536B2 (en) * | 2014-04-30 | 2016-06-28 | Qualcomm Incorporated | CPU/GPU DCVS co-optimization for reducing power consumption in graphics frame processing |
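CN105094272A (zh) * | 2014-05-14 | 2015-11-25 | 中兴通讯股份有限公司 | Method and apparatus for adjusting hardware refresh rate of terminal |
CN105045367A (zh) * | 2015-01-16 | 2015-11-11 | 中国矿业大学 | Power consumption optimization method for Android devices based on game load prediction |
CN107465929B (zh) * | 2017-07-21 | 2019-02-01 | 山东大学 | HEVC-based DVFS control method and system, processor, and storage device |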
2018
- 2018-08-30 EP EP18932041.9A patent/EP3819744A4/en active Pending
- 2018-08-30 CN CN201880091791.XA patent/CN111902790B/zh active Active
- 2018-08-30 WO PCT/CN2018/103307 patent/WO2020042098A1/zh unknown
2021
- 2021-02-17 US US17/177,881 patent/US11460905B2/en active Active
Non-Patent Citations (1)
Title |
---|
See also references of EP3819744A4 * |
Also Published As
Publication number | Publication date |
---|---|
US11460905B2 (en) | 2022-10-04 |
EP3819744A1 (en) | 2021-05-12 |
US20210165477A1 (en) | 2021-06-03 |
CN111902790A (zh) | 2020-11-06 |
CN111902790B (zh) | 2022-05-31 |
EP3819744A4 (en) | 2021-07-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
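WO2020042098A1 (zh) | Frequency scaling method and apparatus, and computer-readable storage medium | |
CN111045814B (zh) | Resource scheduling method and terminal device | |
EP3452954B1 (en) | Dynamic classifier selection based on class skew | |
WO2020078135A1 (zh) | Resource scheduling method and computer device | |
CN112631415A (zh) | CPU frequency adjustment method and apparatus, electronic device, and storage medium | |
CN111639687B (zh) | Model training and abnormal account identification method and apparatus | |
US20200225995A1 (en) | Application cleaning method, storage medium and electronic device | |
WO2019019926A1 (zh) | System parameter optimization method, apparatus, and device, and readable medium | |
US10956976B2 (en) | Recommending shared products | |
WO2020233709A1 (zh) | Model compression method and apparatus | |
US20160035032A1 (en) | Determining an asset recommendation | |
US20240152393A1 (en) | Task execution method and apparatus | |
WO2019062409A1 (zh) | Background application management and control method, storage medium, and electronic device | |
CN110033383B (zh) | Data processing method, device, medium, and apparatus | |
CN115795146A (zh) | Method, apparatus, and device for determining resources to be recommended, and storage medium | |
CN113742581B (zh) | Ranking list generation method and apparatus, electronic device, and readable storage medium | |
Wang et al. | Make every penny count: Difficulty-adaptive self-consistency for cost-efficient reasoning | |
CN113760550A (zh) | Resource allocation method and resource allocation apparatus | |
Wu et al. | AyE-Edge: Automated Deployment Space Search Empowering Accuracy yet Efficient Real-Time Object Detection on the Edge | |
CN108476084B (zh) | Method and apparatus for adjusting state space boundary in Q-learning | |
CN113407335A (zh) | Computing resource planning method, electronic device, and storage apparatus | |
CN113596994B (zh) | Wireless video transmission resource allocation method and apparatus, and electronic device | |
WO2024159976A1 (zh) | Model training method, electronic device, and computer-readable storage medium | |
US10715384B2 (en) | Automatically modifying computer parameters as an incentive for complying with data policies | |
CN113704619A (zh) | Strategy recommendation method and apparatus |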
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18932041 Country of ref document: EP Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |