WO2022141735A1 - Processing apparatus, processing method, and related device

Info

Publication number
WO2022141735A1
Authority
WO
WIPO (PCT)
Prior art keywords
frequency
load
target
processor
processor core
Prior art date
Application number
PCT/CN2021/074555
Other languages
French (fr)
Chinese (zh)
Inventor
魏威
姚琮
谌灼杰
施赛丰
冷静
陈立前
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN202180088258.XA (CN116710904A)
Publication of WO2022141735A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus

Definitions

  • the present application relates to the technical field of processors, and in particular, to a processing apparatus, a processing method, and related equipment.
  • Embodiments of the present application provide a processing apparatus, a processing method, and related equipment, which can better optimize the performance of a processor and a memory and the energy efficiency of a computer system.
  • an embodiment of the present application provides a processing device. The processing device includes a processor, a frequency modulation controller, and a frequency regulator, and the processor includes a first processor core. The frequency modulation controller is used to: obtain at least one load type of the first processor core; determine a target frequency of a target object based on the at least one load type, where the target object includes a memory; and call the frequency regulator to adjust the operating frequency of the target object based on the target frequency.
  • the memory may include at least one of main memory and an asynchronous cache.
  • the foregoing load types may include computational loads, cache-dependent loads, and memory-dependent loads.
  • the working frequency of the corresponding memory is adjusted according to the workload type of the processor core (for example, the working frequency of the memory can be adjusted appropriately for a memory-dependent workload, and likewise for a computing-dependent workload). The memory can therefore better cooperate with the processor across different workload types, and system stuttering during processing can be reduced. Energy efficiency can be optimized while the processing efficiency required by the workload is still met; that is, the performance of the processor and the memory satisfies the needs of the workload while the energy efficiency of the system is optimized.
  • the existing technical solutions ignore the fact that processor performance and system energy efficiency are affected not only by the processor core itself but also by the memory (such as main memory and/or an asynchronous cache).
  • the above-mentioned target object further includes the above-mentioned first processor core; the frequency modulation controller is further configured to: based on the target frequency of the first processor core, call the frequency controller to adjust the frequency of the first processor core.
  • the processor core and the asynchronous storage system can be regarded as one, and the operating frequencies can be optimized simultaneously by using algorithms and frequency modulation methods.
  • the operating frequency of the storage system and the processor core is matched with the workload that needs to be processed, so that the overall performance requirements of the processor and the memory can be met while unnecessary energy consumption can be saved, thereby optimizing the energy efficiency of the system.
  • there is a mapping relationship between the target frequency of the first processor core and the target frequency of the memory, and the mapping relationship enables the performance and energy efficiency of the processor to be optimal at the same time. That is, when the first processor core and the memory run at their respective target frequencies while processing the corresponding workload, the performance requirements of the workload on the processor and the memory are met, unnecessary energy consumption is saved, and the energy efficiency of the system is optimized.
  • the above-mentioned processor further includes at least one second processor core, the above-mentioned first processor core and the at least one second processor core form a cluster, and the frequency modulation controller is further configured to determine the operating frequency of the cluster according to the target frequency of the first processor core.
  • the at least one second processor core may include one or more of the above-mentioned first processor cores.
  • because the processor cores in a cluster share one operating frequency, in this application, when one or more target frequencies of one or more processor cores in a cluster are obtained, a unified optimized frequency must be arbitrated from these target frequencies and used as the working frequency of the cluster.
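
The arbitration step can be illustrated with a minimal sketch. The text does not state the arbitration rule, so the policy here (serve the most demanding core, then round up to a supported operating point) is an assumption, as are the function name and the example frequencies.

```python
from typing import Sequence


def arbitrate_cluster_frequency(core_targets_mhz: Sequence[int],
                                supported_mhz: Sequence[int]) -> int:
    """Pick one shared frequency for a cluster from per-core target frequencies.

    Assumed policy (not taken from the patent text): satisfy the most
    demanding core, then round up to the nearest supported operating point,
    clamping to the highest supported point if the demand exceeds it.
    """
    demand = max(core_targets_mhz)
    points = sorted(supported_mhz)
    for freq in points:
        if freq >= demand:
            return freq
    return points[-1]


# Three cores in one cluster ask for different frequencies.
cluster_freq = arbitrate_cluster_frequency([1200, 1850, 900],
                                           [600, 1000, 1400, 1800, 2200])
print(cluster_freq)
```

Other policies (for example a weighted average, or a cap imposed by a power budget) would fit the same interface.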
  • the above-mentioned processor includes n above-mentioned first processor cores, where n is a positive integer; the above-mentioned frequency modulation controller is further configured to: determine a target frequency of the memory for the at least one load type of each of the first processor cores; determine the optimized operating frequency of the memory based on the n target frequencies of the memory; and call the frequency regulator to adjust the operating frequency of the memory to the optimized operating frequency.
  • because the memory is shared by the entire processor and has only one operating frequency, in the present application, when multiple optimized operating frequencies of the memory are obtained, a unified optimized frequency also needs to be arbitrated as the operating frequency of the memory.
  • the processing apparatus further includes a load classifier, configured to obtain load classification feature information in the first processor core and, based on the load classification feature information, classify the load of the first processor core to obtain the at least one load type; the frequency modulation controller is specifically configured to obtain the at least one load type from the load classifier.
  • the load classification feature information includes clock signal inversion information in the first processor core.
  • the load classifier can be implemented by one or more of hardware, software or firmware, and can classify the load in the processor core in a fine-grained manner, so as to cooperate with the frequency modulation controller to obtain a reasonable and accurate optimized frequency.
  • the frequency modulation controller is further configured to: acquire the load amount of the first processor core; and determine the target frequency of the target object based on the at least one load type when the load amount meets a preset condition.
  • the load amount is the load amount of the first processor core in a first time period; the first time period and the time period in which the load amount of the first processor core is obtained are two adjacent time periods, and the first time period occurs first.
  • the preset condition may include: the load is greater than the first load threshold or the load is less than the second load threshold. Wherein, the first load threshold is greater than the second load threshold.
  • the present application shows that the processing of frequency optimization and adjustment is triggered only when the load satisfies the preset conditions, which can reduce the resources consumed by triggering adjustment anytime and anywhere.
  • the present application may set two thresholds: an up-frequency threshold (the above-mentioned first load threshold) and a down-frequency threshold (the above-mentioned second load threshold). When the load is greater than the up-frequency threshold, the processor core is heavily loaded and its processing speed cannot keep up, so the frequency needs to be increased to improve performance. When the load is less than the down-frequency threshold, the processor core is lightly loaded and does not need such a fast processing speed, so the frequency needs to be reduced to save energy. Setting the two thresholds makes frequency regulation of the processor cores and other components more reasonable and also reduces the additional resource consumption of frequent frequency regulation.
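
The two-threshold trigger can be sketched as follows. The names `Action` and `frequency_action` and the example threshold values are illustrative assumptions; only the comparison rule comes from the text.

```python
from enum import Enum


class Action(Enum):
    RAISE = "raise"   # load above the up-frequency (first) threshold
    LOWER = "lower"   # load below the down-frequency (second) threshold
    HOLD = "hold"     # load between the thresholds: no adjustment triggered


def frequency_action(load: float, up_threshold: float,
                     down_threshold: float) -> Action:
    """Decide whether frequency adjustment should be triggered for a load."""
    if up_threshold <= down_threshold:
        raise ValueError("up_threshold must be greater than down_threshold")
    if load > up_threshold:
        return Action.RAISE
    if load < down_threshold:
        return Action.LOWER
    return Action.HOLD


# Loads expressed as utilization fractions (an assumption for illustration).
for load in (0.9, 0.5, 0.1):
    print(load, frequency_action(load, up_threshold=0.8, down_threshold=0.3))
```

The band between the two thresholds is what keeps the controller from re-tuning on every small fluctuation in load.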
  • the above-mentioned frequency modulation controller is specifically used for: looking up the current performance information of the processor in the first mapping table corresponding to the first load type, where the first mapping table includes the mapping relationship between the operating frequency of the target object and the performance information of the processor, and the first load type is the type with the most occurrences among the at least one load type; adjusting the current performance information based on the load amount to obtain target performance information; and looking up the first target frequency of the target object in the first mapping table based on the target performance information, and taking the first target frequency as the target frequency of the target object.
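
A minimal sketch of the three lookup steps follows. The table contents, the `desired_load` parameter, and the adjustment rule (scale the current performance by load / desired_load) are assumptions made for illustration; the text only says that the current performance is adjusted based on the load amount.

```python
from collections import Counter

# Hypothetical per-load-type tables: rows of (frequency_mhz, performance),
# sorted by ascending frequency. Real tables would come from offline training.
TABLES = {
    "compute": [(600, 100), (1000, 170), (1400, 230), (1800, 280)],
    "memory": [(600, 80), (1000, 120), (1400, 150), (1800, 170)],
}


def dominant_type(observed_types):
    """Return the load type that occurs most often."""
    return Counter(observed_types).most_common(1)[0][0]


def target_frequency(observed_types, current_mhz, load, desired_load=0.7):
    table = TABLES[dominant_type(observed_types)]
    # Step 1: current performance at the current operating frequency.
    current_perf = dict(table)[current_mhz]
    # Step 2: adjust by the load amount (assumed rule, see lead-in).
    target_perf = current_perf * load / desired_load
    # Step 3: reverse lookup, lowest frequency that reaches the target.
    for freq, perf in table:
        if perf >= target_perf:
            return freq
    return table[-1][0]  # clamp to the highest table entry


print(target_frequency(["memory", "compute", "memory"], 1000, load=0.9))
print(target_frequency(["compute"], 1400, load=0.35))
```

Choosing the lowest frequency that still reaches the target performance is one way to favor energy efficiency; a trained table could encode a different preference.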
  • the corresponding target frequency is calculated based on the load type with the largest number of occurrences or the largest proportion, so that the processor core can achieve better performance and energy efficiency as much as possible when processing the load.
  • the present application obtains the target frequency through the above-mentioned frequency-performance mapping table. Since these mapping tables are the optimal solution sets that satisfy various constraints obtained through offline training, the obtained target frequency is closer to ideal, and processing loads at the obtained target frequency yields better performance and energy efficiency.
  • the first mapping table further includes a mapping relationship between the operating frequency of the target object and the power consumption of the processor; the frequency modulation controller is further configured to: look up the second target frequency of the target object in the first mapping table based on the power consumption constraint of the processor; and, if the first target frequency and the second target frequency are different, determine the target frequency of the target object to be the second target frequency.
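
The power-constrained override can be sketched as below. The table values and the rule used to derive the second target frequency (the highest frequency whose power stays within the cap) are assumptions; the final step applies the stated rule literally, so the second target frequency wins whenever the two differ.

```python
# Hypothetical rows: (frequency_mhz, performance, power_mw), ascending frequency.
TABLE = [(600, 100, 150), (1000, 170, 320), (1400, 230, 600), (1800, 280, 1000)]


def frequency_under_power_cap(table, power_cap_mw):
    """Second target frequency: highest frequency within the power cap (assumed rule)."""
    allowed = [freq for freq, _perf, power in table if power <= power_cap_mw]
    return max(allowed) if allowed else table[0][0]


def resolve_target(first_target_mhz, table, power_cap_mw):
    """Apply the stated rule: if the two targets differ, use the second one."""
    second_target_mhz = frequency_under_power_cap(table, power_cap_mw)
    if first_target_mhz != second_target_mhz:
        return second_target_mhz
    return first_target_mhz


# Performance lookup asked for 1800 MHz, but a 700 mW cap only allows 1400 MHz.
print(resolve_target(1800, TABLE, power_cap_mw=700))
```

A real controller might instead take the lower of the two frequencies; that variant would be a design choice beyond what the text specifies.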
  • the above frequency-performance-power consumption mapping table is also the optimal solution set obtained by offline training and satisfying various constraints, so the obtained target frequency is more ideal, and the processing load based on the obtained target frequency can obtain better performance and efficiency.
  • the above-mentioned at least one load type is m load types, and m is an integer greater than 1; each of the m load types corresponds to a mapping table, and the mapping table includes the mapping relationship between the operating frequency of the target object and the performance information of the processor; the frequency modulation controller is specifically used for: looking up one piece of current performance information of the processor in each of the mapping tables; adjusting the m pieces of current performance information based on the load amount to obtain m pieces of target performance information; based on the m pieces of target performance information, finding m groups of first optimized frequencies of the target object in the m mapping tables; and processing the m groups of first optimized frequencies based on the load distribution weights of the m load types to obtain a third target frequency, which is used as the target frequency of the target object, where the load distribution weight indicates the proportion of the load of each of the m types.
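
One plausible reading of "processed based on the load distribution weights" is a weighted average, sketched below. The weighted-average rule, the function name `blend_groups`, and the example values are assumptions; each group is modelled as a mapping from a target object (core, memory) to its optimized frequency for one load type.

```python
def blend_groups(groups, weights):
    """Combine per-load-type frequency groups into one target per object.

    groups:  {load_type: {target_object: frequency_mhz}}
    weights: {load_type: share of the load belonging to that type}
    Assumed rule: weighted average, normalized by the total weight.
    """
    total = sum(weights[load_type] for load_type in groups)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    objects = next(iter(groups.values())).keys()
    return {
        obj: sum(groups[t][obj] * weights[t] for t in groups) / total
        for obj in objects
    }


groups = {
    "compute": {"core": 1800, "memory": 800},
    "memory": {"core": 1000, "memory": 1600},
}
weights = {"compute": 0.75, "memory": 0.25}
blended = blend_groups(groups, weights)
print(blended)
```

In practice the blended value would still need rounding to a frequency the hardware supports.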
  • the present application performs processing based on the load distribution weight to obtain the target frequency, which can also enable the processor core to achieve better performance and energy efficiency when processing the load corresponding to the load type.
  • the present application also uses the above frequency-performance mapping table to obtain the target frequency. Since these mapping tables are the optimal solution sets that satisfy various constraints obtained through offline training, the obtained target frequency is closer to ideal, and processing loads at the obtained target frequency yields better performance and energy efficiency.
  • the above m mapping tables further include a mapping relationship between the operating frequency of the target object and the power consumption of the processor; the frequency modulation controller is further configured to: find m groups of second optimized frequencies of the target object in the m mapping tables respectively, based on the power consumption constraint of the processor; process the m groups of second optimized frequencies based on the load distribution weight to obtain the fourth target frequency of the target object; and, when the third target frequency is different from the fourth target frequency, determine the target frequency of the target object to be the fourth target frequency.
  • the above frequency-performance-power consumption mapping table is also the optimal solution set obtained by offline training and satisfying various constraints, so the obtained target frequency is more ideal, and the processing load based on the obtained target frequency can obtain better performance and efficiency.
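  • the present application provides a processing method, which is applied to a processing device, where the processing device includes a processor, a frequency modulation controller, and a frequency regulator, and the processor includes a first processor core; the method includes performing the following operations through the frequency modulation controller: obtaining at least one load type of the first processor core; determining a target frequency of a target object based on the at least one load type, where the target object includes a memory; and calling the frequency regulator to adjust the operating frequency of the target object based on the target frequency.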
  • the above-mentioned memory includes at least one of main memory and an asynchronous cache.
  • the above load types include computational loads, cache-dependent loads, and memory-dependent loads.
  • the above-mentioned target object further includes the above-mentioned first processor core; the above-mentioned method further includes performing the following operation through the above-mentioned frequency modulation controller: calling the frequency regulator to adjust the operating frequency of the first processor core based on the target frequency of the first processor core, where there is a mapping relationship between the target frequency of the first processor core and the target frequency of the memory.
  • the above-mentioned processor further includes at least one second processor core, and the above-mentioned first processor core and the above-mentioned at least one second processor core form a cluster; the above-mentioned method further includes: using the above-mentioned frequency modulation controller according to the The target frequency of the first processor core determines the operating frequency of the cluster.
  • the above-mentioned processor includes n above-mentioned first processor cores, where n is a positive integer; the above-mentioned method further includes performing the following operations through the above-mentioned frequency modulation controller: determining a target frequency of the memory for the at least one load type of each of the first processor cores; determining an optimized operating frequency of the memory based on the n target frequencies of the memory; and calling the frequency regulator to adjust the operating frequency of the memory to the optimized operating frequency.
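  • the above-mentioned processing apparatus further includes a load classifier; the above-mentioned method further includes: obtaining, through the load classifier, load classification feature information in the first processor core, and classifying the load of the first processor core based on the load classification feature information to obtain the at least one load type; and obtaining the above-mentioned at least one load type from the above-mentioned load classifier.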
  • the load classification feature information includes clock signal inversion information in the first processor core.
  • the above method also includes: acquiring the load amount of the first processor core; the determining the target frequency of the target object based on the at least one load type includes: when the load amount satisfies a preset condition, determining the target frequency of the target object based on the at least one load type.
  • the above-mentioned determination of the target frequency based on the at least one load type includes: looking up the current performance information of the processor in the first mapping table corresponding to the first load type, where the first mapping table includes the mapping relationship between the operating frequency of the target object and the performance information of the processor, and the first load type is the type with the most occurrences among the at least one load type; adjusting the current performance information based on the load amount to obtain target performance information; and, based on the target performance information, looking up the first target frequency of the target object in the first mapping table and taking the first target frequency as the target frequency of the target object.
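  • the first mapping table further includes a mapping relationship between the operating frequency of the target object and the power consumption of the processor; the above method further includes: looking up the second target frequency of the target object in the first mapping table based on the power consumption constraint of the processor; and, when the first target frequency and the second target frequency are different, determining the target frequency of the target object as the second target frequency.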
  • the above-mentioned at least one load type is m load types, and m is an integer greater than 1; each of the m load types corresponds to a mapping table, and the mapping table includes the mapping relationship between the operating frequency of the target object and the performance information of the processor; the determination of the target frequency based on the at least one load type includes: looking up one piece of current performance information of the processor in each of the mapping tables; adjusting the m pieces of current performance information based on the load amount to obtain m pieces of target performance information; based on the m pieces of target performance information, finding m groups of first optimized frequencies of the target object in the m mapping tables; and processing the m groups of first optimized frequencies based on the load distribution weights of the m load types to obtain a third target frequency, which is used as the target frequency of the target object, where the load distribution weight indicates the proportion of the load of each of the m types.
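  • the m mapping tables further include a mapping relationship between the operating frequency of the target object and the power consumption of the processor; the above method further includes: finding m groups of second optimized frequencies of the target object in the m mapping tables respectively, based on the power consumption constraint of the processor; processing the m groups of second optimized frequencies based on the load distribution weight to obtain the fourth target frequency of the target object; and, when the third target frequency and the fourth target frequency are different, determining the target frequency of the target object as the fourth target frequency.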
  • the present application provides an electronic device, the device comprising: the processing device according to any one of the implementation manners of the above-mentioned first aspect, and a discrete device coupled to the processing device.
  • the present application provides a system-on-chip, where the system-on-chip includes the processing device provided by any one of the implementation manners of the first aspect.
  • the system-on-chip may consist of a processing chip, or may include a processing chip and other discrete devices.
  • the present application provides a computer program including instructions; when the computer program is executed by a processor, the processor is enabled to execute the processing method flow described in any one of the implementation manners of the second aspect above.
  • FIG. 1 is a schematic structural diagram of a processing apparatus according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a load classifier according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a relationship between another load type and a mapping table provided by an embodiment of the present application.
  • FIG. 4 and FIG. 5 are schematic flow charts of functions implemented by modules in a processing device provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a processing method provided by an embodiment of the present application.
  • Task scheduling: the task scheduling described in this application means that the operating system assigns thread tasks to multiple processor cores in the processor for execution.
  • a task is a series of operations that work together to achieve a certain purpose, which can be a process or a thread.
  • DVFS Dynamic voltage and frequency scaling
  • the processor can be a central processing unit (CPU), a general-purpose processor, a digital signal processor, an integrated circuit (IC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
  • a processor may also be a combination that performs computing functions, such as a combination comprising one or more microprocessors, a combination of a digital signal processor and a microprocessor, and the like.
  • the processor core which refers to the core of the processor, controls the execution of all operations such as computing, accepting/storing commands, and processing data.
  • the execution time of a unit instruction can be used to judge the performance of the processor in handling a certain type of load: the shorter the execution time of a unit instruction, the better the performance.
  • the number of instructions executed per unit time (instructions per second, IPS) can also be used to judge the performance of the processor in handling a certain type of load: when the processor is processing a certain type of load, the more instructions executed per unit time, the better the performance.
  • Energy efficiency refers to the ratio of the services provided by the computer to the total energy consumed.
  • energy efficiency can be measured by the number of instructions executed per unit of energy consumed (IPS per watt).
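
The two metrics can be computed as in the sketch below; the function names and sample numbers are illustrative assumptions. Because a watt is one joule per second, IPS per watt is the same quantity as instructions per joule.

```python
def ips(instructions: float, seconds: float) -> float:
    """Instructions executed per second (performance)."""
    return instructions / seconds


def ips_per_watt(instructions: float, seconds: float, joules: float) -> float:
    """Instructions per second per watt of average power (energy efficiency)."""
    average_watts = joules / seconds
    return ips(instructions, seconds) / average_watts


# 2 billion instructions in 1 s while consuming 4 J (average power 4 W).
print(ips(2_000_000_000, 1.0))
print(ips_per_watt(2_000_000_000, 1.0, 4.0))
```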
  • Computer instructions are the instructions and commands to direct the work of the machine, the program is a series of instructions arranged in a certain order, and the process of executing the program is the working process of the computer.
  • the instruction set is a set of instructions used in the processor to calculate and control the computer system.
  • When a processor is designed, an instruction system that matches its hardware circuits is specified. The capability of the instruction set is also an important indicator of the processor, and the instruction set is one of the most effective tools for improving the efficiency of the microprocessor.
  • Common instruction set architectures include complex instruction set computing (CISC) and reduced instruction set computing (RISC).
  • the typical representative of CISC is x86; typical representatives of RISC are the Advanced RISC Machine (ARM) architecture and the Microprocessor without Interlocked Pipelined Stages (MIPS) architecture.
  • a process is often defined as the execution of a program.
  • a process can be regarded as an independent program with its complete data space and code space in memory. Data and variables owned by a process belong only to itself.
  • a thread is a program that runs alone in a process. That is, threads exist within processes.
  • a process consists of one or more threads, each thread sharing the same code and global data, but each has its own stack. Since there is one stack per thread, local variables are private to each thread. Since all threads share the same code and global data, they are more tightly coupled than separate processes and tend to interact more, and the interaction between threads is easier because they have shared memory for communication: the global data of the process.
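
The difference between shared global data and thread-private local variables can be seen in a short sketch using Python's standard `threading` module. The variable and function names are illustrative only.

```python
import threading

counter = 0                # global data: shared by every thread in the process
lock = threading.Lock()    # protects updates to the shared counter


def worker(n, results, idx):
    global counter
    local_sum = 0          # local variable: private to this thread's stack
    for i in range(n):
        local_sum += i
    with lock:
        counter += 1       # every thread updates the same global
    results[idx] = local_sum


results = [None] * 4
threads = [threading.Thread(target=worker, args=(k * 10, results, k))
           for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter, results)
```

Each thread computes its own `local_sum` without interference, while `counter` reflects the work of all four threads.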
  • the existing scheduling-frequency modulation scheme integrates the completely fair scheduler (CFS) and the dynamic voltage and frequency scaling (DVFS) technology.
  • the scaling table senses the result of energy consumption in the scheduling process, and then controls the DVFS to adjust the working frequency of the CPU to optimize the energy efficiency of the scheduling.
  • DVFS runs periodically, sustaining target performance and energy efficiency.
  • the optimization of performance and energy efficiency brought about by this integration is limited to certain fixed scenarios, such as the first time a new thread is issued, the old thread is awakened, and the thread needs to be migrated. There is no ongoing guarantee.
  • the above-mentioned CFS and DVFS fusion scheme relies on a fixed performance and power consumption conversion table to optimize task performance and energy efficiency, but this table is not universally applicable under different task types.
  • this solution only adjusts the working frequency of the CPU, and does not involve the working frequency of the memory such as memory and asynchronous cache, so the optimization of performance and energy efficiency is limited.
  • the DVFS system cannot track the load in a fine-grained manner to iteratively optimize the energy efficiency.
  • FIG. 1 is a schematic structural diagram of a device 10 including scheduling and frequency modulation functions provided by an embodiment of the present application.
  • the device 10 may be located in any electronic device, such as a computer, a mobile phone, or a tablet.
  • the device 10 may be a chip or a chip set or a circuit board on which the chip or the chip set is mounted.
  • the chip or chip set or the circuit board on which the chip or chip set is mounted can be driven by necessary software.
  • the apparatus 10 includes a processor 101, and a frequency modulation controller 102, a frequency regulator 103, an asynchronous cache 104 and a memory 105 coupled to the processor 101.
  • the processor 101 includes one or more processor cores (Core).
  • FIG. 1 takes N (N is an integer greater than 0) processor cores 1011 as an example, including a processor core (Core) 1, a processor core (Core) 2, ..., and the processor core (Core) (N-1).
  • Each processor core 1011 includes a load classifier and an operation control unit.
  • the load classifier in each processor core 1011 is used to classify the load in each processor core 1011.
  • the operation control unit in each processor core 1011 may be used to control the execution and operation of hardware such as load classifiers in each processor core 1011 .
  • the load classifier can also be implemented in software.
  • the software-implemented load classifier may be configured in the processor core 1011 or in the frequency modulation controller 102. If configured in the processor core 1011, the load classifier directly obtains the load classification feature information from the processor core 1011, classifies the load, and outputs the load classification result to the frequency modulation controller 102. If configured in the frequency modulation controller 102, the load classifier obtains the load classification feature information from the processor core 1011, and the frequency modulation controller 102 performs the load classification calculation based on the obtained feature information to obtain the load classification information.
  • the load classification feature information will be introduced later, and will not be described in detail here.
  • the multiple processor cores 1011 may be divided into multiple clusters, and each cluster includes at least one processor core. For example, if the processor 101 includes 8 processor cores 1011, the 8 processor cores 1011 can be divided into three clusters, two of which include 3 processor cores 1011 each, while the remaining cluster includes 2 processor cores 1011.
  • the operating frequencies of the processor cores in each cluster are the same, that is, the processor cores in each cluster share one operating frequency.
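
The 8-core example (clusters of 3, 3 and 2 cores, one shared frequency per cluster) can be modelled with a small sketch. The core numbering, the helper names, and the frequencies are assumptions for illustration.

```python
def split_into_clusters(core_ids, sizes):
    """Partition a list of core ids into consecutive clusters of the given sizes."""
    if sum(sizes) != len(core_ids):
        raise ValueError("cluster sizes must add up to the number of cores")
    clusters, start = [], 0
    for size in sizes:
        clusters.append(core_ids[start:start + size])
        start += size
    return clusters


clusters = split_into_clusters(list(range(8)), [3, 3, 2])
cluster_freq_mhz = [1000] * len(clusters)   # one operating frequency per cluster


def core_frequency(core_id):
    """A core always runs at the frequency of the cluster it belongs to."""
    for idx, members in enumerate(clusters):
        if core_id in members:
            return cluster_freq_mhz[idx]
    raise KeyError(core_id)


cluster_freq_mhz[1] = 1800                  # retune the second cluster only
print(clusters)
print([core_frequency(c) for c in range(8)])
```

Storing the frequency per cluster rather than per core makes it impossible for two cores in the same cluster to drift apart.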
  • the frequency modulation controller 102 is configured to calculate the optimized operating frequency of the relevant components in the device (for example, each processor core 1011, the memory 105, and/or the asynchronous cache 104) based on the classification result of the above-mentioned load classifier, and then call the frequency regulator 103 to adjust the operating frequency of the relevant components to the corresponding optimized frequency, so as to optimize the performance of the processor and the energy efficiency of the computer system.
  • the FM controller 102 may be a low-power processor, such as a low-power CPU, a low-power microcontroller (MCU), or a low-power state machine, or the like.
  • the frequency regulator 103 may obtain the optimal operating frequency of the processor from the frequency regulation controller 102, and then adjust the operating frequency of the above-mentioned one or more processor cores based on the optimal frequency.
  • the frequency regulator 103 may also obtain the optimal operating frequencies of the asynchronous cache 104 and the memory 105 from the frequency regulation controller 102, and then adjust the operating frequencies of the asynchronous cache 104 and the memory 105 respectively based on these optimal frequencies.
  • the frequency regulator 103 may be a dynamic voltage and frequency scaling (DVFS) module or the like.
  • the memory 105 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read only memory (EPROM), or Portable read-only memory (compact disc read-only memory, CD-ROM), etc.
  • the asynchronous cache 104 may be a cache memory, usually composed of static random access memory (SRAM).
  • the asynchronous cache 104 may include an L3 cache and/or an L4 cache. It should be noted that the operating frequency of a synchronous cache (such as the L1 cache) is the same as that of the processor core, whereas the operating frequency of the asynchronous cache 104 differs from that of the processor core 1011. In this application, the operating frequency of the asynchronous cache can therefore be optimized and adjusted separately, to further optimize the performance of the processor and the energy efficiency of the entire computer system.
  • the secondary cache may be a synchronous cache, and in another possible implementation manner, the secondary cache is an asynchronous cache.
  • the memory 105 may be a DDR memory, where the DDR is an abbreviation for double data rate synchronous dynamic random access memory (DDR SDRAM).
  • At least one of the one or more processor cores (Cores) included in the processor 101 is used to run the task scheduler 106.
  • the task scheduler 106 is implemented by a computer program. The task scheduler 106 may be a scheduler in an operating system (OS), and is used to schedule tasks, through the scheduling channel 107, onto the one or more processor cores for execution.
  • the task scheduler 106 may send frequency modulation constraints to the frequency modulation controller 102, which may be power consumption constraints, performance constraints, or the like.
  • the task scheduler 106 may also obtain scheduling suggestions from the frequency regulation controller 102, and the scheduling suggestions may include suggestions such as load balancing, task migration, or power-on/off policies.
  • if the frequency regulation controller 102 finds that the processor cores in the same cluster have unbalanced loads or different load types, which is not conducive to optimization, it sends a load balancing suggestion to the task scheduler 106.
  • the load balancing suggestion may include information such as the load situation and load type of each processor core in the cluster.
  • the task scheduler 106 may also send a frequency modulation request to the frequency modulator 103, so as to call the frequency modulator 103 to perform frequency modulation.
  • the device 10 provided by the present application decouples hardware frequency modulation control from software thread scheduling: software is responsible for coarse-grained scheduling and provides frequency modulation constraints to hardware, while hardware controls fine-grained frequency modulation and provides scheduling advice to software. Coarse-grained tuning and fine-grained tuning are thereby combined.
  • the combination of the frequency modulation controller and the load classifier in the device 10 provided by the present application can track and iterate the load in a timely manner, increase and decrease the frequency in a timely manner, and optimize energy efficiency and performance in a timely and continuous manner.
  • implementing load tracking and frequency modulation control through hardware can effectively reduce software load and overhead.
  • the FM controller provides FM control of the processor and memory to optimize energy efficiency and performance from a system-wide perspective.
  • each component in the device 10 may include the following:
  • the load classifier in each processor core 1011 is used to classify the load in that processor core 1011 to obtain load classification information, where the load classification information indicates the load type to which the load in each processor core 1011 belongs.
  • the load classifier may also be configured in the frequency modulation controller 102 to classify the load in each processor core 1011 .
  • This application mainly focuses on configuring the load classifier in the processor core 1011 .
  • for the case in which the load classifier is configured in the frequency modulation controller 102, reference may be made to the corresponding description, which will not be repeated in this application.
  • the load type may include computational type, cache-dependent type, memory-dependent type, idle type, and the like.
  • the load is a computational load.
  • if the processor core needs to access the L3 cache and/or the L4 cache to obtain data while executing the load, the load is a cache-dependent load.
  • if the processor core needs to access the memory to obtain data while executing the load, the load is a memory-dependent load.
  • if the processor core executes an idle load, that is, when its work is suspended, the type of the load in the processor core 1011 is the idle type.
  • the load classifier can be a classification algorithm implemented by a hardware circuit or a software program, such as a softmax classification algorithm.
  • the softmax classification algorithm includes a pre-trained model. The model can take the load classification feature information in the processor core 1011-0 (optionally, also in the synchronous cache) as input and, after the model's calculation and judgment, output the classification information of the load in the processor core 1011-0.
  • the load classification feature information includes one or more of the clock signal flips and the performance events in the processor core 1011-0 (optionally, also in the synchronous cache).
  • the performance event may include one or more events such as an instruction and/or a level signal within a preset duration.
  • FIG. 2 exemplarily shows a schematic structural diagram of a load classifier implemented by a hardware circuit.
  • the load classifier may include a counter 201 , a processing unit 202 and a memory 203 .
  • the processing unit 202 may include a multiply-accumulate (multiply accumulate, MAC) unit, a comparison and control unit, and other units for performing operations such as calculation, comparison, and control processing.
  • the memory 203 can be a static random-access memory (static random-access memory, SRAM) or the like.
  • within the first preset duration, one or more of the clock signal flips and the performance events of the above-mentioned processor core 1011-0 (optionally, also of the synchronous cache) are input to the counter 201, which counts the number of clock signal flips and/or the number or duration of performance events within the first preset duration.
  • for example, if the performance event includes instructions, the counter 201 can count the number of instructions; if the performance event includes events such as level signals, the counter 201 can count the duration of those events within the first preset duration. The front-end execution efficiency and/or the back-end execution efficiency within the first preset duration can then be derived from these counts.
  • after the counter 201 completes the statistics of the input within the first preset duration, the statistical results are input into the processing unit 202. The processing unit 202 obtains information such as pre-trained calculation parameters from the memory 203, performs calculation and comparison on the statistical results using those parameters, and finally outputs a specific load type.
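  • the counter, multiply-accumulate, and comparison steps described for the load classifier can be illustrated in software. The following is a minimal sketch only, assuming a linear softmax model; the feature values, the weight matrix, the biases, and the function names are all hypothetical and do not come from this application.

```python
import math

LOAD_TYPES = ["computational", "cache-dependent", "memory-dependent", "idle"]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_load(features, weights, biases):
    """Return (load type, probabilities) for one classification window.

    features: counter outputs, e.g. normalized event counts (hypothetical).
    weights:  one row of pre-trained coefficients per load type (hypothetical).
    biases:   one pre-trained offset per load type (hypothetical).
    """
    # Multiply-accumulate step: one dot product per load type.
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    # Comparison step: pick the most probable load type.
    best = max(range(len(probs)), key=probs.__getitem__)
    return LOAD_TYPES[best], probs

# Made-up parameters and counter readings, for illustration only.
weights = [[2.0, -1.0, -1.0],
           [-1.0, 2.0, -1.0],
           [-1.0, -1.0, 2.0],
           [-1.0, -1.0, -1.0]]
biases = [0.0, 0.0, 0.0, 0.5]
load_type, probs = classify_load([0.1, 0.8, 0.2], weights, biases)
print(load_type)  # cache-dependent
```

  • in this sketch the stored parameters play the role of the contents of the memory 203, and the dot products and the final maximum correspond loosely to the MAC unit and the comparison unit of the processing unit 202.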
  • the load classifier may periodically classify the load in the processor core 1011-0 using the first preset duration as the period, or using a second preset duration greater than the first preset duration as the period.
  • the period may also be called a classification window, then the load classifier periodically classifies the load of the processor core 1011-0 in each classification window to obtain the type of the load in each classification window.
  • the size of the first preset duration and the second preset duration may be 100 microseconds, 10 milliseconds, or 1 second, etc. The present application does not limit the sizes of the first preset duration and the second preset duration.
  • the sizes of the first preset duration and the second preset duration are configured by the frequency modulation controller 102; or, they are determined by the load classifier in each processor core 1011, which triggers the FM controller 102 through an interrupt.
  • the frequency modulation controller 102 is configured to obtain the target load classification information of the first processor core from the first processor core, determine the target frequency of the target object based on the target load classification information, and then call the frequency regulator 103 to adjust the operating frequency of the target object based on the target frequency.
  • the first processor core is one of the above-mentioned one or more processor cores 1011 . As long as the processor core 1011 satisfies the preset condition, the processor core 1011 is the first processor core.
  • the preset condition is that the load of the processor core 1011 in the first window is greater than the first threshold or less than the second threshold.
  • the first window and the window in which the processor core 1011 is currently located are adjacent windows, and the first window is the earlier of the two.
  • the first window may include one or more classification periods of the foregoing load classifiers, that is, the first window may include one or more of the foregoing classification windows.
  • the above-mentioned first threshold and second threshold are collectively referred to as a load threshold, and the first threshold is greater than the second threshold.
  • the first threshold may be an up-frequency load threshold
  • the second threshold may be a down-frequency load threshold.
  • the value of the up-frequency load threshold may be, for example, 70% or 80%, and the value of the down-frequency load threshold may be 50% or 60%, etc.
  • the application does not limit the specific value of the load threshold.
  • if the load of the processor core 1011 in the first window is greater than the up-frequency load threshold, the load of the processor core 1011 is too heavy and its processing speed cannot keep up, so the operating frequency corresponding to the processor core 1011 is raised.
  • if the load of the processor core 1011 in the first window is less than the down-frequency load threshold, the load of the processor core 1011 is light and such a fast processing speed is not required, so the operating frequency corresponding to the processor core 1011 is lowered.
  • the first thresholds of different processor cores may be the same or different; the second thresholds of different processor cores may be the same or different.
  • the first thresholds of the processor cores in the same cluster may be the same, and the second thresholds of the processor cores in the same cluster may be the same; the first thresholds of the processor cores in different clusters may be different, and the second thresholds of the processor cores in different clusters may be different.
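  • the preset condition on the load in the first window can be sketched as a simple comparison against the two load thresholds. This is an illustrative sketch, not the implementation of this application: the function name, the returned labels, and the per-cluster threshold values are hypothetical, and the defaults merely reuse the example values of 80% and 50% mentioned above.

```python
def frequency_action(load, up_threshold=0.8, down_threshold=0.5):
    """Decide whether a core's load in the first window triggers frequency tuning.

    load is the fraction of the window the core was busy, in [0, 1].
    Returns "raise", "lower", or "hold" (labels chosen for this sketch).
    """
    if not down_threshold < up_threshold:
        raise ValueError("the up-frequency threshold must exceed the down-frequency threshold")
    if load > up_threshold:
        return "raise"   # load too heavy: the core qualifies as a first processor core
    if load < down_threshold:
        return "lower"   # load light: the core qualifies as a first processor core
    return "hold"        # preset condition not met: no tuning is triggered

# Hypothetical per-cluster thresholds (up-frequency, down-frequency).
CLUSTER_THRESHOLDS = {"cluster0": (0.8, 0.6), "cluster1": (0.7, 0.5)}

up, down = CLUSTER_THRESHOLDS["cluster1"]
print(frequency_action(0.75, up, down))  # raise
```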
  • the above-mentioned target object may include at least one of the first processor core and the memory.
  • the memory here includes at least one of a main memory and an asynchronous cache. The main memory may be the memory 105 shown in FIG. 1, and the asynchronous cache may be the asynchronous cache 104 shown in FIG. 1.
  • the foregoing load classification information may include a specific type to which a specific load belongs.
  • for example, the load classification information in the first window may be: the load type obtained by classification window 1 is the computational type, and the load type obtained by classification window 2 is the cache-dependent type.
  • the frequency modulation controller 102 can monitor the load of the above-mentioned one or more processor cores 1011 in each window. Taking the processor core 1011-0 as an example, when its load in the first window is greater than the first threshold or less than the second threshold (in this case, the processor core 1011-0 is the above-mentioned first processor core), the FM controller 102 is triggered to calculate the optimized frequency based on the load situation in the first window, and then calls the frequency regulator 103 to adjust the frequency.
  • the frequency modulation controller 102 can obtain the load amount of the processor core 1011-0 in the first window in various ways, two of which are exemplarily introduced below:
  • the first method is to obtain the load amount in the first window calculated by the processor core 1011-0 from the processor core 1011-0.
  • the processor core 1011-0 can perceive its own load in the first window.
  • the processor core 1011-0 can know for how long it was busy and for how long it was idle in the first window. The load in the first window can then be obtained by calculating the ratio of the busy duration to the total duration of the window.
  • the total duration of the window is the sum of the busy duration and the idle duration.
  • the busy duration refers to the time the processor core 1011-0 spends processing tasks in order to complete a certain target.
  • the load can be sent to the frequency modulation controller 102 for further processing.
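  • the first method amounts to a single ratio, load = busy duration / (busy duration + idle duration). A minimal sketch follows; the function name and the microsecond unit are assumptions made for illustration.

```python
def window_load(busy_us, idle_us):
    """Load of a core in one window: busy time divided by total window time."""
    if busy_us < 0 or idle_us < 0:
        raise ValueError("durations must be non-negative")
    total_us = busy_us + idle_us
    if total_us == 0:
        raise ValueError("the window must have a non-zero duration")
    return busy_us / total_us

# A core that was busy for 750 us and idle for 250 us of a 1 ms window.
print(window_load(750, 250))  # 0.75
```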
  • the second method is to obtain real-time load classification information from the load classifier in the processor core 1011-0, and to calculate the real-time load of the processor core 1011-0 in the first window based on that information.
  • the FM controller 102 can acquire, in real time, the load classification information produced by the load classifier in the processor core 1011-0 for each classification window. The load amount in the first window can then be obtained from the ratio of the number of load types to the total number of load types in the load classification information.
  • for example, the load classification information in the first window is: the load type obtained by classification window 1 is the computational type, the load type obtained by classification window 2 is the cache-dependent type, and the load type obtained by classification window 3 is the computational type.
  • after acquiring the load amount of the processor core 1011-0 in the first window, the frequency modulation controller 102 compares the load amount with the above-mentioned first threshold and second threshold respectively. If the load of the processor core 1011-0 in the first window is greater than the first threshold or less than the second threshold, the frequency modulation controller 102 calculates the target frequency of the target object (including at least one of the processor core 1011-0, the asynchronous cache 104, and the memory 105).
  • the following describes the process of calculating the target frequency of the target object.
  • the frequency modulation controller 102 may calculate the load distribution weight according to the load classification information in the first window obtained from the load classifier in the processor core 1011-0.
  • if the load classification information indicates that the loads of the processor core 1011-0 in the first window include m load types (where m is an integer greater than 1), then the load distribution weight indicates the proportion of the load of each of the m types.
  • for example, suppose the first window contains 10 classification windows (referred to as the 1st classification window, the 2nd classification window, ..., and the 10th classification window), and the load classification information obtained after the load classifier classifies the load of the processor core 1011-0 in these windows is as follows: the loads in the seventh classification window are memory-dependent loads, the loads in the eighth and tenth classification windows are idle-type loads, and, consistent with the weights given below, three of the remaining seven classification windows hold computational loads and four hold cache-dependent loads.
  • after the FM controller 102 obtains the load classification information, it can determine by counting that the total number of load types of the processor core 1011-0 in the first window is 10, and the calculated load distribution weight is then:
  • the weight of the computational load is 3/10, cache-dependent loads are weighted 4/10, memory-dependent loads are weighted 1/10, and idle type loads are weighted 2/10.
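  • the load distribution weight is simply the share of classification windows assigned to each load type. The sketch below reproduces the 3/10, 4/10, 1/10, and 2/10 weights of the example. The positions of the memory-dependent and idle windows follow the text, while the order of the computational and cache-dependent windows is an arbitrary assumption, as is the function name.

```python
from collections import Counter
from fractions import Fraction

def load_distribution_weights(window_types):
    """Map each load type to its share of the classification windows."""
    if not window_types:
        raise ValueError("at least one classification window is required")
    counts = Counter(window_types)
    total = len(window_types)
    return {load_type: Fraction(n, total) for load_type, n in counts.items()}

# Classification windows 1..10 of the first window. Windows 7, 8 and 10 follow
# the example in the text; the order of the other seven is assumed.
windows = [
    "computational", "cache-dependent", "computational", "cache-dependent",
    "cache-dependent", "computational", "memory-dependent", "idle",
    "cache-dependent", "idle",
]

weights = load_distribution_weights(windows)
for load_type, weight in weights.items():
    print(load_type, float(weight))
```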
  • the frequency modulation controller 102 also needs to calculate the load scaling ratio. The target performance can be calculated based on the load scaling ratio, so that the optimized frequency can be found based on the target performance. Specifically, if the load of the processor core 1011-0 in the first window is greater than the first load threshold (which may also be referred to as the up-frequency load threshold), the load scaling ratio is the ratio of the first load threshold to the load amount. If the load of the processor core 1011-0 in the first window is less than the second load threshold (which may also be referred to as the down-frequency load threshold), the load scaling ratio is the ratio of the second load threshold to the load amount.
  • optionally, the performance described in this application may instead be defined such that a larger value indicates better performance. In that case, when calculating the load scaling ratio, if the load of the processor core 1011-0 in the first window is greater than the first load threshold, the load scaling ratio is the ratio of the load amount to the first load threshold; if the load of the processor core 1011-0 in the first window is less than the second load threshold, the load scaling ratio is the ratio of the load amount to the second load threshold. It should be noted that this application uses the method described in the preceding paragraph as the example for introduction.
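  • both conventions for the load scaling ratio can be captured in one small function. This is a sketch under stated assumptions: the function name and the `larger_is_better` flag are hypothetical, and returning 1.0 when the load lies between the two thresholds is an assumption of this sketch, since no tuning is triggered in that case.

```python
def load_scaling_ratio(load, up_threshold, down_threshold, larger_is_better=False):
    """Ratio used to scale the current performance into the target performance.

    With larger_is_better=False, a smaller performance value is better and the
    ratio is threshold / load; with larger_is_better=True it is load / threshold.
    """
    if load <= 0:
        raise ValueError("load must be positive")
    if load > up_threshold:
        threshold = up_threshold
    elif load < down_threshold:
        threshold = down_threshold
    else:
        return 1.0  # assumption: no scaling when no threshold is crossed
    return load / threshold if larger_is_better else threshold / load

# A load of 95% against an 80% up-frequency threshold gives about 0.84.
print(round(load_scaling_ratio(0.95, 0.8, 0.5), 2))  # 0.84
```

  • under the first convention the ratio falls below 1 when the core is overloaded, so the target performance value (an execution-time-like quantity) is smaller than the current one, which corresponds to a higher frequency.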
  • the frequency modulation controller 102 also needs to acquire the current operating frequency of the target object.
  • the current operating frequency of the target object can be obtained from the frequency regulator 103, and the current operating frequency of the target object includes the current operating frequency of the processor core 1011-0.
  • the current operating frequency also includes the current operating frequency of the asynchronous cache 104 and/or the memory 105 .
  • the frequency modulation controller 102 can interact with the processor core running the task scheduler 106 to obtain the current operating frequency of the processor core 1011-0 from the task scheduler 106, and optionally, can also obtain asynchronous Current operating frequency of cache 104 and/or memory 105. Since the task scheduler 106 needs to know the conditions of each processor core, memory and other modules in the processor in order to make task scheduling decisions, the task scheduler 106 can obtain information such as the operating frequency of each processor core and memory in real time.
  • the frequency modulation controller 102 may directly interact with the processor core 1011-0 to obtain the current operating frequency of the processor core 1011-0.
  • the frequency modulation controller 102 may also interact with the memory, that is, the asynchronous cache 104 and/or the memory 105 to obtain the current operating frequency of the asynchronous cache 104 and/or the memory 105 .
  • after acquiring the current operating frequency corresponding to the processor core 1011-0, the FM controller 102 acquires the current performance information of the processor 101 based on the current operating frequency, where the current performance refers to the performance of the processor 101 when the load is processed at the current operating frequency of the target object. Target performance information is then calculated from the current performance information and the load scaling ratio calculated above, and the target frequency of the target object is determined based on the calculated target performance information.
  • the frequency-performance mapping table maintained in the frequency modulation controller 102 is introduced below.
  • a frequency-performance relationship mapping table (hereinafter referred to as the mapping table) corresponding to each processor core is maintained in the frequency modulation controller 102. Each processor core may also have multiple mapping tables, one for each of the multiple load types.
  • the frequency-performance mapping table can be obtained by offline training. Specifically, firstly, the load classifier described above can be used to perform offline training and classification for the loads of different processor cores to obtain different load types. Then, for different load types, the performance of processors corresponding to different operating frequencies of processor cores and/or memories is predicted through a linear regression model, thereby establishing frequency-performance mapping tables corresponding to different load types in different processor cores.
  • the above-mentioned linear regression model may include, but is not limited to, a first-order or higher-order polynomial. The input variables of the model may include, but are not limited to, the frequency values of the operating frequency of the processor core and/or memory, linear combinations of frequency values, ratios of frequency values, or normalized frequency values.
  • the linear regression algorithm corresponding to the linear regression model may include, but is not limited to, the least squares method or the least squares method with a regularization term.
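  • as an illustration of how such a table could be produced offline, the sketch below fits a first-order polynomial by least squares, with an optional regularization term, and tabulates predicted performance values for candidate frequencies. It is a sketch only: the single input variable (normalized core frequency), the sample values, and the function names are assumptions, not data or methods taken from this application.

```python
def fit_line(xs, ys, ridge=0.0):
    """Least-squares fit of y = intercept + slope * x.

    ridge > 0 adds an L2 penalty on the slope only (regularized variant).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) samples")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx + ridge == 0:
        raise ValueError("all x values are identical")
    slope = sxy / (sxx + ridge)
    intercept = y_mean - slope * x_mean
    return intercept, slope

def build_mapping(freqs_hz, intercept, slope, f_max):
    """Predict a performance value for each candidate core frequency."""
    return {f: intercept + slope * (f / f_max) for f in freqs_hz}

# Made-up offline samples for one load type:
# normalized core frequency -> execution-time-like performance value.
samples_x = [0.2, 0.4, 0.6, 0.8, 1.0]
samples_y = [9.6, 9.2, 8.8, 8.4, 8.0]

intercept, slope = fit_line(samples_x, samples_y)
table = build_mapping([100, 200, 300, 400, 500], intercept, slope, f_max=500)
for freq, perf in table.items():
    print(freq, round(perf, 2))
```

  • a real table would also cover memory and cache frequencies and would be fitted separately for each load type and processor core; the single-variable form is used here only to keep the example short.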
  • a mapping table corresponding to a processor core means that the data in the mapping table is applicable to that processor core.
  • the mapping table corresponding to a certain load type records the mapping relationship between the operating frequency of the processor core and the processor performance when the processor core executes a load of this load type. Optionally, the mapping table corresponding to that load type may also record the mapping relationship between the operating frequency of the memory and the performance of the processor when the processor core executes a load of that load type.
  • if the mapping table simultaneously records the mapping relationship among the operating frequency of the processor core, the operating frequency of the memory, and the processor performance, the operating frequency of the processor core and the operating frequency of the memory in the mapping table satisfy a certain mapping relationship, so that the performance of the processor can reach an ideal optimized state while the energy efficiency of the entire computer system is also better optimized.
  • the mapping table corresponding to each processor core is drawn by way of example, and the mapping table corresponding to each processor core in turn includes mapping tables corresponding to multiple load types. Mapping tables for computational loads, cache-dependent loads, and memory-dependent loads are shown as examples.
  • Table 1 exemplarily shows a part of a mapping table of a certain load type.
  • the unit of operating frequency is Hertz (Hz).
  • Table 1 shows a mapping table of computational loads corresponding to the processor core 1011-0. For example, in Table 1, when the processor core 1011-0 operates at a frequency of 100 Hz and the memory and the cache operate at 50 Hz and 40 Hz respectively, the performance of the processor core 1011-0 when processing a computational load is a1.
  • the higher the operating frequency of the processor core, the lower the performance value, indicating that it takes less time to execute a load, so the performance is better.
  • the first possible implementation is to select the load type with the largest weight among the above load distribution weights for calculation.
  • after the frequency modulation controller 102 calculates the load distribution weight of the processor core 1011-0 in the first window, it can obtain the load type with the largest weight (referred to as the first load type; it can also be described as the load type that occurs most frequently in the first window), and then find the mapping table of the first load type corresponding to the processor core 1011-0. Using the current operating frequency of the target object obtained above as an index, the performance information mapped to the current operating frequency is found in the mapping table of the first load type. This performance information may be called the first performance information, and it is the current performance information of the processor 101.
  • for example, suppose the mapping table of the first load type is shown in Table 1, and the obtained current operating frequency corresponding to the processor core 1011-0 is: an operating frequency of 100 Hz for the processor core 1011-0, 50 Hz for the memory, and 40 Hz for the cache. These operating frequencies are compared with the frequencies in Table 1, and the performance information mapped by them is found to be a1.
  • after the FM controller 102 finds the first performance information, it can multiply the first performance information by the load scaling ratio calculated above to obtain new performance information. The new performance is the expected performance of the processor core 1011-0 when handling the load of the first load type, and may also be referred to as the target performance.
  • the operating frequency mapped by the target performance information is searched in the mapping table of the first load type with the target performance information as an index, and the found operating frequency is the optimized frequency.
  • for example, the first performance information found in Table 1 above is a1. Assuming that the load scaling ratio calculated above is 0.84, the target performance information is a1*0.84 ≈ a2. Then, by comparing a2 with the performance values in Table 1, the mapped optimized frequency can be found: the optimized operating frequency of the processor core 1011-0 is 300 Hz, the optimized operating frequency of the memory is 50 Hz, and the optimized operating frequency of the cache is 53 Hz.
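  • the first implementation can be sketched as three steps: pick the load type with the largest weight, look up the current performance by the current frequencies, then search the same table for the entry whose performance is closest to the scaled target. The table contents below are hypothetical numbers, not the values of Table 1, and choosing the nearest entry is an assumption about how the comparison is resolved.

```python
# Hypothetical per-load-type tables for one processor core.
# Row layout: (core Hz, memory Hz, cache Hz, performance value); smaller is better.
TABLES = {
    "computational": [
        (100, 50, 40, 10.0),
        (200, 50, 49, 8.4),
        (300, 50, 53, 7.0),
    ],
}

def dominant_load_type(weights):
    """Load type with the largest load distribution weight."""
    return max(weights, key=weights.get)

def lookup_performance(table, freqs):
    """Performance value mapped to the given (core, memory, cache) frequencies."""
    for core, mem, cache, perf in table:
        if (core, mem, cache) == freqs:
            return perf
    raise KeyError(f"no table entry for frequencies {freqs}")

def target_frequencies(weights, current_freqs, scaling_ratio, tables=TABLES):
    """Frequencies whose mapped performance is closest to the target performance."""
    table = tables[dominant_load_type(weights)]
    target_perf = lookup_performance(table, current_freqs) * scaling_ratio
    core, mem, cache, _ = min(table, key=lambda row: abs(row[3] - target_perf))
    return core, mem, cache

weights = {"computational": 0.6, "cache-dependent": 0.4}
print(target_frequencies(weights, (100, 50, 40), 0.84))  # (200, 50, 49)
```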
  • the second possible implementation is to perform weighted average calculation based on the above load distribution weights.
  • based on the obtained load classification information of the processor core 1011-0 in the first window, the FM controller 102 knows that the first window includes m types of loads and which specific load types these m types are. The frequency modulation controller 102 then processes the m types of loads separately to obtain m groups of optimal operating frequencies of the target object, and performs a weighted average of the m groups of optimal operating frequencies based on the load distribution weight to obtain the target frequency of the target object.
  • the frequency modulation controller 102 obtains the i-th performance in the i-th mapping table based on the current operating frequency of the target object obtained above.
  • the i-th mapping table is a mapping table of the i-th load type among the above m types.
  • the value of i is an integer between 1 and m.
  • the frequency modulation controller 102 multiplies the i-th performance by the load scaling ratio calculated above to obtain the i-th target performance information.
  • the ith group of optimized operating frequencies is found in the ith mapping table using the ith target performance information as an index.
  • the frequency modulation controller 102 obtains m groups of optimal operating frequencies, and then calculates and obtains the above-mentioned target optimal operating frequencies based on the m groups of optimal operating frequencies and the above-mentioned load distribution weight. For ease of understanding, examples are given below.
  • for example, suppose two groups of optimized operating frequencies are obtained after calculation and processing. Assume that the group of optimal operating frequencies found in the mapping table of computational loads is: an optimal operating frequency of 100 Hz for the processor core, 50 Hz for the memory, and 40 Hz for the cache. Assume that the group of optimal operating frequencies found in the mapping table of cache-dependent loads is: an optimal operating frequency of 200 Hz for the processor core, 50 Hz for the memory, and 49 Hz for the cache.
  • the above m load types may be some types of multiple types to which the load of the processor core 1011-0 in the first window belongs.
  • for example, the loads of the processor core 1011-0 in the first window belong to three types: the computational type, the cache-dependent type, and the memory-dependent type, and the above-mentioned m load types may include only the computational type and the cache-dependent type among the three.
  • the m types may be pre-configured types, that is, the frequency modulation controller 102 may acquire external configuration information, and the configuration information indicates which specific load types the m types are.
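  • the weighted average of the second implementation can be sketched as follows, using the two example groups given above (100/50/40 Hz for computational loads and 200/50/49 Hz for cache-dependent loads). The weights are illustrative, and renormalizing them over only the m selected types is an assumption of this sketch, as is the function name.

```python
def weighted_target_frequency(groups, weights):
    """Weighted average of per-load-type optimal frequency groups.

    groups:  {load type: (core Hz, memory Hz, cache Hz)} for the m selected types.
    weights: {load type: load distribution weight}; may contain extra types.
    Weights are renormalized over the types present in groups (an assumption).
    """
    if not groups:
        raise ValueError("at least one group of frequencies is required")
    total = sum(weights[t] for t in groups)
    if total <= 0:
        raise ValueError("the selected load types must have positive total weight")
    width = len(next(iter(groups.values())))
    return tuple(
        sum(weights[t] * groups[t][i] for t in groups) / total
        for i in range(width)
    )

groups = {
    "computational": (100, 50, 40),
    "cache-dependent": (200, 50, 49),
}
weights = {"computational": 0.3, "cache-dependent": 0.4,
           "memory-dependent": 0.1, "idle": 0.2}

core, mem, cache = weighted_target_frequency(groups, weights)
print(round(core, 1), round(mem, 1), round(cache, 1))  # 157.1 50.0 45.1
```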
  • the frequency modulation controller 102 may obtain performance constraints from the task scheduler 106, including but not limited to a first performance threshold, a second performance threshold, and a third performance threshold.
  • if the FM controller 102 obtains the first performance threshold from the task scheduler 106, the frequency points in Table 1 whose performance is higher than the first performance threshold will not be used for the frequency search; if the FM controller 102 obtains the second performance threshold from the task scheduler 106, the frequency points in Table 1 whose performance is lower than the second performance threshold will not be used for the frequency search; if the FM controller 102 obtains the third performance threshold from the task scheduler 106, the frequency point in Table 1 whose performance is closest to the third performance threshold will be searched for. Doing so allows the processor and/or memory to work under certain performance constraints while still optimizing the performance of the processor and/or memory and the energy efficiency of the system.
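  • the three performance constraints act as filters on the frequency points of the mapping table. The sketch below shows one way this could look; the table values are hypothetical numbers, and the function names and parameter names are assumptions.

```python
# Hypothetical table rows: (core Hz, memory Hz, cache Hz, performance value).
TABLE = [
    (100, 50, 40, 10.0),
    (200, 50, 49, 8.4),
    (300, 50, 53, 7.0),
    (400, 55, 53, 6.1),
]

def searchable_rows(table, first_threshold=None, second_threshold=None):
    """Frequency points that remain available for the frequency search.

    Rows with performance above first_threshold or below second_threshold
    are excluded, mirroring the first and second performance thresholds.
    """
    rows = table
    if first_threshold is not None:
        rows = [row for row in rows if row[3] <= first_threshold]
    if second_threshold is not None:
        rows = [row for row in rows if row[3] >= second_threshold]
    return rows

def closest_row(table, third_threshold):
    """Frequency point whose performance is closest to the third threshold."""
    return min(table, key=lambda row: abs(row[3] - third_threshold))

print(searchable_rows(TABLE, first_threshold=9.0, second_threshold=6.5))
print(closest_row(TABLE, 8.0))
```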
  • mapping table may further include power consumption information, that is, the mapping table may be a frequency-performance-power consumption mapping table.
  • the frequency-performance-power consumption mapping table is similar to the above-mentioned frequency-performance mapping table, and may be obtained by offline training.
  • the load classifier described above can be used to perform offline training and classification for the loads of different processor cores to obtain different load types. Then, for different load types, the performance and power consumption of the processor corresponding to different operating frequencies of the processor core and/or memory are predicted through a linear regression model, thereby establishing the frequency-performance-power consumption mapping tables corresponding to different load types in different processor cores. For the description of the linear regression model, please refer to the previous introduction, which will not be repeated here.
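  • Table 2

    | Operating frequency of the processor core (Hz) | Memory operating frequency (Hz) | Cache operating frequency (Hz) | Performance | Power consumption (W) |
    | --- | --- | --- | --- | --- |
    | 100 | 50 | 40 | a1 | w1 |
    | 200 | 50 | 49 | a2 | w2 |
    | 200 | 50 | 52 | a3 | w3 |
    | 300 | 50 | 53 | a4 | w4 |
    | 400 | 55 | 53 | a5 | w5 |
    | 500 | 51 | 52 | a6 | w6 |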
  • Table 2 exemplarily shows a part of the frequency-performance-power consumption mapping table. Compared with Table 1, Table 2 additionally contains power consumption information. Power consumption refers to the energy consumed per unit time, in watts (W). Table 2 lists the performance and power consumption of the processor when the processor core handles a certain type of load at each set of operating frequencies. The greater the operating frequency of the processor core, the greater the corresponding power consumption, indicating that more energy is consumed.
  • the frequency modulation controller 102 may also acquire the power consumption constraints on the above-mentioned processor 101, and then look up the optimized frequency of the target object in the mapping table based on the acquired power consumption constraints. For example, taking Table 2 as an example, assuming that the obtained power consumption constraint is w5, the corresponding optimized operating frequencies can be found by searching Table 2 with w5 as an index: the optimized operating frequency of the processor core is 400 Hz, the optimized operating frequency of the memory is 55 Hz, and the optimized operating frequency of the cache is 53 Hz.
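  • The lookup by power consumption constraint can be sketched as follows. The frequencies are those of Table 2, but the numeric power values, the `Row` type, and the function `lookup_by_power` are assumptions made for illustration. The text above describes an exact match on w5; picking the highest-power row that does not exceed the limit is a generalization assumed here, and it returns the same row when the limit equals a table entry.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    core_hz: int
    memory_hz: int
    cache_hz: int
    power_w: float  # hypothetical numeric stand-in for w1 ... w6


# Frequencies follow Table 2; the power values are invented for illustration.
TABLE_2 = [
    Row(100, 50, 40, 1.0),
    Row(200, 50, 49, 1.8),
    Row(200, 50, 52, 2.0),
    Row(300, 50, 53, 2.9),
    Row(400, 55, 53, 4.1),
    Row(500, 51, 52, 5.5),
]


def lookup_by_power(table: List[Row], power_limit_w: float) -> Optional[Row]:
    """Return the highest-power row that does not exceed the limit, if any."""
    allowed = [row for row in table if row.power_w <= power_limit_w]
    return max(allowed, key=lambda row: row.power_w) if allowed else None


row = lookup_by_power(TABLE_2, 4.1)  # 4.1 plays the role of w5
```

  • Here `row` holds 400 Hz for the processor core, 55 Hz for the memory and 53 Hz for the cache, matching the w5 example above.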
  • the frequency modulation controller 102 may obtain the power consumption constraint of the processor 101 based on one or more of temperature control, thermal design power (TDP) control, and single-core overclocking control.
  • TDP thermal design power
  • PID proportional-integral-differential
  • the target frequency of the target object can be calculated using either the first or the second possible implementation manner above.
  • if the optimized frequency of the target object obtained based on the power consumption constraint is the same as the optimized frequency calculated using either the first or the second possible implementation manner above, the target frequency of the target object remains the calculated optimized frequency.
  • if the two differ, the target frequency of the target object is the optimized frequency obtained based on the power consumption constraint. This is because the power consumption constraint has a higher priority, and the processor core needs to work under the condition that the power consumption constraint is satisfied.
  • the power consumption in the mapping table is obtained for a fixed chip corner (the chip corner refers to the deviation introduced during chip manufacturing by materials and/or by operations such as welding and gluing) and a fixed ambient temperature, but the actual power consumption varies with the ambient temperature and the chip corner. Therefore, the power consumption of the processor can be obtained in real time to make minor corrections to the power consumption data in the mapping table and improve data accuracy.
  • the frequency modulation controller 102 may acquire the power consumption from a power sensor in real time, and then update the acquired power consumption into the corresponding frequency-performance-power consumption mapping table.
  • the power sensor can detect the power consumption of the processor in real time, and send the detected power consumption to the frequency modulation controller 102.
  • examples are given below.
  • Table 2 is the mapping table of the computing load corresponding to the processor core 1011-0. According to Table 2, when the operating frequency of the processor core 1011-0 is 100 Hz, the operating frequency of the memory 105 is 50 Hz, and the operating frequency of the asynchronous cache 104 is 40 Hz, the power consumption of the processor 101 while the processor core 1011-0 processes the computing load is w1. Due to the influence of the ambient temperature and the like, the power sensor actually detects a power consumption of w1' for the processor 101 while the processor core 1011-0 processes the computing load. The frequency modulation controller 102 then updates to w1' the power consumption that Table 2 maps to a processor core operating frequency of 100 Hz, a memory 105 operating frequency of 50 Hz, and an asynchronous cache 104 operating frequency of 40 Hz; the updated mapping is shown in Table 3.
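  • Table 3

    | Operating frequency of the processor core (Hz) | Memory operating frequency (Hz) | Cache operating frequency (Hz) | Performance | Power consumption (W) |
    | --- | --- | --- | --- | --- |
    | 100 | 50 | 40 | a1 | w1' |
    | 200 | 50 | 49 | a2 | w2 |
    | 200 | 50 | 52 | a3 | w3 |
    | 300 | 50 | 53 | a4 | w4 |
    | 400 | 55 | 53 | a5 | w5 |
    | 500 | 51 | 52 | a6 | w6 |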
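  • The update from Table 2 to Table 3 amounts to replacing one power entry keyed by the frequency triple. The sketch below keeps the symbolic labels from the tables; the dictionary layout and the function name `update_power` are assumptions made for illustration, not structures named in the application.

```python
from typing import Dict, Tuple

# Key: (core Hz, memory Hz, cache Hz). Value: (performance label, power label).
FrequencyKey = Tuple[int, int, int]
MappingTable = Dict[FrequencyKey, Tuple[str, str]]

table_2: MappingTable = {
    (100, 50, 40): ("a1", "w1"),
    (200, 50, 49): ("a2", "w2"),
    (200, 50, 52): ("a3", "w3"),
    (300, 50, 53): ("a4", "w4"),
    (400, 55, 53): ("a5", "w5"),
    (500, 51, 52): ("a6", "w6"),
}


def update_power(table: MappingTable, key: FrequencyKey, measured_power: str) -> MappingTable:
    """Return a copy of the table with the power entry for `key` replaced."""
    if key not in table:
        raise KeyError(f"no mapping for frequencies {key}")
    updated = dict(table)
    performance, _old_power = updated[key]
    updated[key] = (performance, measured_power)
    return updated


# The sensor reports w1' while the core runs at 100 Hz / 50 Hz / 40 Hz.
table_3 = update_power(table_2, (100, 50, 40), "w1'")
```

  • Returning a copy rather than mutating in place is a design choice of this sketch; it keeps the offline-trained table available for comparison.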
  • the frequency modulation controller 102 may also acquire the real-time power consumption of the processor by establishing a power consumption acquisition model, and then update the acquired real-time power consumption into the corresponding mapping table.
  • the operating frequency of the cache and memory can also be optimized in this application. Because optimizing only the operating frequency of the processor core improves performance and energy consumption to a limited extent, optimizing the operating frequency of the cache and memory at the same time can further improve the performance and energy efficiency of the processor core, adapt better to various types of load tasks, meet the needs of different load tasks, reduce system freezes, and improve system energy efficiency.
  • the present application obtains the target frequency through the above-mentioned frequency-performance mapping table or frequency-performance-power consumption mapping table. Since these mapping tables are optimal solution sets obtained by offline training and satisfying various constraints, the obtained target frequency is closer to ideal, and processing the load at that target frequency achieves better performance and energy efficiency.
  • the frequency modulation controller 102 can acquire the target frequency of a processor core.
  • the multiple processor cores 1011 can be divided into multiple clusters, and the processor cores in each cluster have the same operating frequency. Based on this, if, in the cluster where the processor core 1011-0 is located, the load within the first window of the other processor cores is also greater than the first threshold or less than the second threshold, that is, the other processor cores are also first processor cores, the frequency modulation controller 102 is also triggered to calculate the optimized frequency based on the load condition of each of the other processor cores within the first window, to obtain the target frequency of each of those processor cores.
  • the frequency modulation controller 102 thus obtains the target frequencies of multiple processor cores, but the processor cores in each cluster have the same operating frequency (for convenience of description, the optimized operating frequency of the processor cores in the cluster where the processor core 1011-0 is located may be called the first optimized operating frequency), so the first optimized operating frequency needs to be determined from the target frequencies of the plurality of processor cores.
  • the frequency modulation controller 102 may arbitrate based on the target frequencies of the plurality of processor cores.
  • the frequency modulation controller 102 may select the maximum frequency among the target frequencies of the plurality of processor cores as the first optimized operating frequency.
  • the frequency modulation controller 102 may select the minimum frequency among the target frequencies of the multiple processor cores as the first optimized operating frequency.
  • the frequency modulation controller 102 takes the average value of the target frequencies of the plurality of processor cores as the above-mentioned first optimized operating frequency.
  • after determining the first optimized operating frequency, the frequency modulation controller 102 sends it to the frequency modulator 103, and the frequency modulator 103 adjusts the operating frequency of the processor cores in the cluster where the processor core 1011-0 is located to the first optimized operating frequency.
  • the frequency modulation controller 102 calculates the target frequency only for each first processor core.
  • One or more processor cores other than the part of the first processor core in the cluster where the processor core 1011-0 is located may be referred to as a second processor core.
  • the frequency modulation controller 102 may arbitrate based on the target frequency of each of the part of first processor cores (hereinafter referred to as multiple optimized frequencies) and the current operating frequency of the second processor core.
  • the frequency modulation controller 102 may select the largest frequency from the above-mentioned multiple optimized frequencies and the current operating frequency of the second processor core as the above-mentioned first optimized operating frequency.
  • the frequency modulation controller 102 may select the minimum frequency from the above-mentioned multiple optimized frequencies and the current operating frequency of the second processor core as the above-mentioned first optimized operating frequency.
  • the frequency modulation controller 102 may take the average value of the above-mentioned multiple optimized frequencies and the current operating frequency of the second processor core as the above-mentioned first optimized operating frequency.
  • after determining the first optimized operating frequency, the frequency modulation controller 102 sends it to the frequency modulator 103, and the frequency modulator 103 adjusts the operating frequency of the processor cores in the cluster where the processor core 1011-0 is located to the first optimized operating frequency.
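  • Both arbitration cases described above (every core in the cluster has a target frequency, or only some cores do and the rest contribute their current operating frequency) reduce a list of frequencies to one value. A minimal sketch follows; the function name `arbitrate`, the policy strings, and the example frequencies are assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence

# The three options named in the text: maximum, minimum, or average.
POLICIES = {"max": max, "min": min, "mean": mean}


def arbitrate(
    optimized_frequencies: Sequence[float],
    current_frequencies: Sequence[float] = (),
    policy: str = "max",
) -> float:
    """Reduce per-core frequencies to one cluster-wide operating frequency.

    `optimized_frequencies` are the target frequencies of the first processor
    cores. `current_frequencies` are the current operating frequencies of any
    second processor cores; it is empty when every core has a target frequency.
    """
    candidates = list(optimized_frequencies) + list(current_frequencies)
    if not candidates:
        raise ValueError("no frequencies to arbitrate")
    return POLICIES[policy](candidates)


all_first_cores = arbitrate([300, 500])                    # maximum of the targets
with_second_core = arbitrate([300, 500], [200], "min")     # current frequency included
averaged = arbitrate([300, 500], [400], "mean")
```

  • Which policy suits a given system is not stated in the text; choosing the maximum favors performance, and choosing the minimum favors power saving.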
  • the frequency modulation controller 102 also calculates the target frequency of the memory, and can call the frequency modulator 103 to adjust the operating frequency of the memory based on the target frequency of the memory.
  • the memory, whether it is the asynchronous cache 104 or the memory 105, is shared by the multiple processor cores 1011 included in the processor 101, so the asynchronous cache 104 and the memory 105 each have only one operating frequency.
  • the asynchronous cache 104 is used as an example for introduction.
  • the frequency modulation controller 102 may calculate and obtain the corresponding target frequency based on the load conditions of the multiple first processor cores in the first window.
  • the target frequency corresponding to each of the plurality of first processor cores includes a target frequency of the asynchronous cache 104 .
  • the frequency modulation controller 102 may determine one frequency from the plurality of target frequencies of the asynchronous cache 104, and this frequency may be referred to as the second optimized operating frequency.
  • the frequency modulation controller 102 may select the largest frequency from the multiple target frequencies of the asynchronous cache 104 as the second optimized operating frequency.
  • the frequency modulation controller 102 may select the minimum frequency from the multiple target frequencies of the asynchronous cache 104 as the second optimized operating frequency.
  • the frequency modulation controller 102 may take the average value of the multiple target frequencies of the asynchronous cache 104 as the second optimized operating frequency.
  • the frequency modulation controller 102 may determine the second optimized operating frequency from the plurality of target frequencies of the asynchronous cache 104 and the current operating frequency of the asynchronous cache 104.
  • the frequency modulation controller 102 may select the largest frequency from the multiple target frequencies of the asynchronous cache 104 and the current operating frequency of the asynchronous cache 104 as the second optimized operating frequency.
  • the frequency modulation controller 102 may select the minimum frequency from the multiple target frequencies of the asynchronous cache 104 and the current operating frequency of the asynchronous cache 104 as the second optimized operating frequency.
  • the frequency modulation controller 102 may take the average value of the multiple target frequencies of the asynchronous cache 104 and the current operating frequency of the asynchronous cache 104 as the second optimized operating frequency.
  • after determining the second optimized operating frequency, the frequency modulation controller 102 sends it to the frequency modulator 103, and the frequency modulator 103 adjusts the operating frequency of the asynchronous cache 104 to the second optimized operating frequency.
  • FIG. 4 takes the processor 101 including 8 processor cores 1011 as an example.
  • the 8 processor cores are divided into three clusters: processor cores (cores) 0 to 2 form the first cluster, processor cores (cores) 3 to 5 form the second cluster, and processor cores (cores) 6 and 7 form the third cluster.
  • FIG. 4 shows the connection relationship between the processor cores of the three clusters and the corresponding processing channels in the frequency modulation controller 102, and also shows the arbitration of the operating frequency of the processor cores among the clusters and the overall control of the operating frequency of the memory.
  • Fig. 5 takes the processor core (core) 0 as an example to illustrate the process that the frequency modulation controller 102 included in the corresponding processing channel obtains the target frequency corresponding to the processor core based on the load condition of the processor core.
  • the frequency modulation controller 102 first obtains the external configuration information of the first window size and of the load type, and then performs load classification analysis based on this information, that is, it analyzes and processes the load classification information obtained from the processor core (core) 0. As described above, the first window may include one or more classification windows of the load classifier in the processor core (core) 0.
  • the configuration information of the load type mainly includes the above m types of information.
  • the frequency modulation controller 102 calculates the load amount, load scaling ratio and load distribution weight of the processor core (core) 0, obtains the current operating frequency corresponding to the processor core (core) 0, and queries the mapping table based on the current operating frequency to obtain the current performance. The target frequency corresponding to the processor core (core) 0 is then calculated based on the load scaling ratio and the load distribution weight.
  • the frequency modulation controller 102 processes each processor core to obtain the target frequency corresponding to each processor core. Then, based on these target frequencies, a final frequency for adjustment is determined for each cluster and for the memory, and the determined frequencies are sent to the frequency modulator 103, which adjusts the operating frequency of each cluster and the operating frequency of the memory based on the received frequencies.
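  • The last step of this flow, turning per-core results into one frequency per cluster and one frequency for the shared memory, can be sketched with the cluster layout of FIG. 4. The function name `final_frequencies`, the input layout, the example frequencies, and the choice of the maximum as the arbitration policy are assumptions made for illustration; the text allows the minimum or the average as well.

```python
from typing import Dict, List, Tuple

# Cluster layout of FIG. 4: core index -> cluster index.
CLUSTER_OF_CORE = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 2}


def final_frequencies(
    per_core_targets: Dict[int, Tuple[int, int]],
) -> Tuple[Dict[int, int], int]:
    """Arbitrate per-core targets into per-cluster and memory frequencies.

    `per_core_targets` maps a core index to (core target Hz, memory target Hz).
    Returns (cluster index -> cluster frequency, single memory frequency).
    """
    if not per_core_targets:
        raise ValueError("at least one core must report a target frequency")
    by_cluster: Dict[int, List[int]] = {}
    memory_targets: List[int] = []
    for core, (core_hz, memory_hz) in per_core_targets.items():
        by_cluster.setdefault(CLUSTER_OF_CORE[core], []).append(core_hz)
        memory_targets.append(memory_hz)
    # Cores in a cluster share one frequency; the memory is shared by all cores.
    cluster_hz = {cluster: max(freqs) for cluster, freqs in by_cluster.items()}
    return cluster_hz, max(memory_targets)


clusters, memory_hz = final_frequencies(
    {0: (200, 50), 2: (300, 50), 4: (400, 55), 7: (100, 50)}
)
```

  • In this example cores 0 and 2 share the first cluster, so that cluster runs at 300 Hz, while the single memory frequency is 55 Hz.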
  • the frequency modulation controller 102 may send a power-on/off suggestion for the corresponding processor core to the task scheduler.
  • the frequency modulation controller 102 may send a load balancing suggestion and the like to the task scheduler.
  • FIG. 6 is a schematic flowchart of a processing method provided by an embodiment of the present invention, and the processing method can be applied to a processing apparatus.
  • the processing device may include a processor, a frequency modulation controller and a frequency modulator, the processor including a first processor core.
  • the processing apparatus may be the apparatus 10 shown in FIG. 1
  • the first processor core may be any one of the multiple processor cores shown in FIG. 1.
  • the method includes but is not limited to the following steps:
  • S602 Determine, by the frequency modulation controller, a target frequency of a target object based on the at least one load type; the target object includes a memory.
  • the above-mentioned memory includes at least one of a memory and an asynchronous cache.
  • the above load types include computational loads, cache-dependent loads, and memory-dependent loads.
  • the above-mentioned target object further includes the above-mentioned first processor core; the above-mentioned method further includes: performing the following operations through the above-mentioned frequency modulation controller:
  • the frequency regulator is called to adjust the operating frequency of the first processor core, and there is a mapping relationship between the target frequency of the first processor core and the target frequency of the memory.
  • the above-mentioned processor further includes at least one second processor core, and the above-mentioned first processor core and the above-mentioned at least one second processor core form a cluster; the above-mentioned method further includes: using the above-mentioned frequency modulation controller according to the The target frequency of the first processor core determines the operating frequency of the cluster.
  • the above-mentioned processor includes n above-mentioned first processor cores, and the above-mentioned n is a positive integer; the above-mentioned method further includes: performing the following operations through the above-mentioned frequency modulation controller:
  • the above-mentioned processing apparatus further includes a load classifier
  • the above-mentioned method further includes:
  • the above-mentioned at least one load type is obtained from the above-mentioned load classifier.
  • the load classification feature information includes clock signal inversion information in the first processor core.
  • the above method also includes:
  • determining the target frequency of the target object based on the at least one load type includes: when the load amount satisfies a preset condition, determining the target frequency based on the above-mentioned at least one load type.
  • the above-mentioned determination of the above-mentioned target frequency based on the above-mentioned at least one load type includes:
  • the current performance information of the processor is searched in the first mapping table corresponding to the first load type, where the first mapping table includes the mapping between the operating frequency of the target object and the performance information of the processor relationship;
  • the above-mentioned first load type is the type with the most occurrences among the above-mentioned at least one load type;
  • the above-mentioned current performance information is adjusted based on the above-mentioned load amount to obtain target performance information;
  • the first target frequency is taken as the target frequency of the target object.
  • the second target frequency of the target object is searched in the first mapping table; when the first target frequency and the second target frequency are different, the target frequency of the target object is determined as the above-mentioned second target frequency.
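  • The single-table path above can be sketched as follows. The text fixes the order of the steps (pick the most frequent load type, look up the current performance, adjust it by the load amount, look up the first target frequency, then let a differing second target frequency take over), but not the arithmetic. Multiplying by a scaling ratio and choosing the lowest frequency that reaches the target performance are assumptions of this sketch, as are all names and numeric values.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# One hypothetical table per load type: (core frequency in Hz, performance), ascending.
MappingTable = List[Tuple[int, float]]


def dominant_load_type(load_types: List[str]) -> str:
    """The first load type: the one that occurs most often in the window."""
    return Counter(load_types).most_common(1)[0][0]


def first_target_frequency(table: MappingTable, current_hz: int, load_scale: float) -> int:
    """Look up current performance, scale it, and pick a frequency that delivers it."""
    current_perf = dict(table)[current_hz]
    target_perf = current_perf * load_scale  # assumed adjustment rule
    for freq_hz, perf in table:
        if perf >= target_perf:
            return freq_hz  # lowest frequency meeting the target (assumed rule)
    return table[-1][0]  # target out of reach: fall back to the highest frequency


def target_frequency(first_hz: int, second_hz: Optional[int]) -> int:
    """The second (power-constrained) target frequency wins when it differs."""
    if second_hz is None or second_hz == first_hz:
        return first_hz
    return second_hz


tables: Dict[str, MappingTable] = {
    "compute": [(100, 1.0), (200, 2.0), (300, 3.0), (400, 4.0)],
}
window = ["compute", "cache", "compute"]
load_type = dominant_load_type(window)
first_hz = first_target_frequency(tables[load_type], current_hz=200, load_scale=1.4)
```

  • With these invented numbers the dominant type is `"compute"`, the scaled performance target is 2.8, and the first target frequency is 300 Hz; `target_frequency(first_hz, 200)` would then return 200 Hz.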
  • the above-mentioned at least one load type is m load types, and the above-mentioned m is an integer greater than 1; each of the m load types corresponds to one mapping table, and each mapping table includes the mapping relationship between the operating frequency of the above-mentioned target object and the performance information of the above-mentioned processor; the above-mentioned determination of the above-mentioned target frequency based on the above-mentioned at least one load type includes:
  • one piece of current performance information of the processor is looked up in each of the above mapping tables; based on the above load amount, the m pieces of current performance information are adjusted to obtain m pieces of target performance information; based on the m pieces of target performance information, m groups of first optimized frequencies of the target object are looked up in the above m mapping tables; based on the load distribution weights of the above m load types, the m groups of first optimized frequencies are processed to obtain a third target frequency, and the third target frequency is used as the target frequency of the target object; the load distribution weight indicates the proportion of the load of each of the m load types.
  • the m groups of second optimized frequencies of the target object are respectively looked up in the m mapping tables; the m groups of second optimized frequencies are processed based on the load distribution weights to obtain a fourth target frequency of the target object; when the third target frequency and the fourth target frequency are different, the target frequency of the target object is determined to be the fourth target frequency.
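  • The text says the m groups of optimized frequencies are "processed" with the load distribution weights but does not give the formula. One plausible reading, assumed here, is a weighted average. The function name, the load type keys, and the numbers are illustrative, and a real controller would presumably still map the result onto a supported frequency point.

```python
from typing import Dict


def weighted_target_frequency(
    optimized_hz: Dict[str, float],
    weights: Dict[str, float],
) -> float:
    """Combine per-load-type optimized frequencies using load distribution weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("load distribution weights must sum to a positive value")
    # Each load type contributes in proportion to its share of the load.
    return sum(optimized_hz[t] * weights[t] for t in weights) / total


third_target = weighted_target_frequency(
    {"compute": 400, "cache": 300, "memory": 200},
    {"compute": 0.5, "cache": 0.25, "memory": 0.25},
)
```

  • The same function could be applied to the second optimized frequencies to obtain the fourth target frequency, which then replaces the third when the two differ.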
  • the present application provides a computer program including instructions; when the computer program is executed by a processor, the processor is enabled to execute the processing method flow described in FIG. 6 and any of its possible implementation manners.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • in the above-mentioned embodiments, implementation may be in whole or in part by software, hardware, firmware or any combination thereof.
  • when implemented in software, it can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted over a computer-readable storage medium.
  • the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that includes an integration of one or more available media.
  • the available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital versatile discs (DVDs)), or semiconductor media (e.g., solid state disks (SSDs)), etc.

Abstract

The present application provides a processing apparatus, a processing method, and a related device. The processing apparatus comprises: a processor, a frequency modulation controller, and a frequency modulator. The processor comprises a first processor core. The frequency modulation controller is used for: obtaining at least one load type of the first processor core; determining a target frequency of a target object on the basis of the at least one load type, the target object comprising a memory; and calling the frequency modulator on the basis of the target frequency to adjust the working frequency of the target object. According to the present application, the performance and energy efficiency of the processor can be well optimized.

Description

Processing apparatus, processing method, and related device
This application claims priority to the international application No. PCT/CN2020/142512, filed with the China Patent Office on December 31, 2020 and entitled "Processing apparatus, processing method, and related device", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the technical field of processors, and in particular, to a processing apparatus, a processing method, and a related device.
Background
As mobile devices demand ever more computing power and computing efficiency, the central processing unit (CPU) of a system on a chip (SoC) and its corresponding storage system are becoming increasingly complex. For example, the CPU itself has evolved from a single cluster to dual clusters or even multiple clusters, and level-3 and even level-4 caches have been introduced. These developments pose great challenges to optimizing the performance of the processor and the memory and the energy efficiency of the computer system.
In summary, how to better optimize the performance of the processor and the memory and the energy efficiency of the computer system is a technical problem to be solved by those skilled in the art.
Summary
Embodiments of the present application provide a processing apparatus, a processing method, and a related device, which can better optimize the performance of a processor and a memory and the energy efficiency of a computer system.
In a first aspect, an embodiment of the present application provides a processing apparatus. The processing apparatus includes a processor, a frequency modulation controller, and a frequency modulator, and the processor includes a first processor core. The frequency modulation controller is configured to: obtain at least one load type of the first processor core; determine a target frequency of a target object based on the at least one load type, where the target object includes a memory; and call the frequency modulator based on the target frequency to adjust the operating frequency of the target object.
Optionally, the memory may include at least one of a main memory and an asynchronous cache.
Optionally, the load types may include computational loads, cache-dependent loads, and memory-dependent loads.
Because the memory operating frequency that best matches a load may differ from one load type to another, the present application adjusts the operating frequency of the corresponding memory according to the workload type of the processor core. For example, if the workload is memory-dependent, the operating frequency of the memory can be raised appropriately; if the workload is compute-dependent, the operating frequency of the memory can be lowered appropriately. In this way, the memory can better cooperate with the processor in handling workloads of different types, and system stutter during processing can be reduced. The energy efficiency of the system is optimized while the processing efficiency required by the workload is met; that is, the performance of the processor and the memory meets the demands of the workload while the energy efficiency of the system is optimized. In addition, existing technical solutions overlook that processor performance and system energy efficiency are affected not only by the processor core itself but also by memory that is asynchronous to the processor core (for example, an asynchronous cache and/or main memory). To better optimize processor performance and system energy efficiency, the performance of this asynchronous memory needs to be optimized as well.
In a possible implementation, the target object further includes the first processor core, and the frequency modulation controller is further configured to: call the frequency modulator based on a target frequency of the first processor core to adjust the operating frequency of the first processor core, where a mapping relationship exists between the target frequency of the first processor core and the target frequency of the memory.
Because the processor and memory operating frequencies that best match a load differ between load types, in the present application the processor core and the asynchronous storage system can be treated as a whole, and their operating frequencies can be optimized simultaneously by means of algorithms and frequency modulation, so that the operating frequencies of the storage system and the processor core match the workload to be processed. This satisfies the overall performance requirements of the processor and the memory while saving unnecessary energy consumption, thereby optimizing the energy efficiency of the system.
In addition, a mapping relationship exists between the target frequency of the first processor core and the target frequency of the memory, and this mapping relationship allows the performance and the energy efficiency of the processor to be favorable at the same time. That is, when the corresponding workload is processed with the first processor core running at its target frequency and the memory running at its target frequency, the performance that the workload requires of the processor and the memory is met, unnecessary energy consumption is saved, and the energy efficiency of the system is well optimized.
In a possible implementation, the processor further includes at least one second processor core, and the first processor core and the at least one second processor core form a cluster. The frequency adjustment controller is further configured to determine the operating frequency of the cluster based on the target frequency of the first processor core.
Optionally, the at least one second processor core may include one or more of the foregoing first processor cores.
Because the processor cores in a cluster share one operating frequency, when one or more target frequencies are obtained for one or more processor cores in a cluster, a single unified optimized frequency needs to be arbitrated from these target frequencies and used as the operating frequency of the cluster.
In a possible implementation, the processor includes n first processor cores, where n is a positive integer. The frequency adjustment controller is further configured to: determine one target frequency of the memory for the at least one load type of each first processor core; determine an optimized operating frequency of the memory based on the n target frequencies of the memory; and call the frequency adjuster to adjust the operating frequency of the memory to the optimized operating frequency.
Because the memory is shared by the entire processor and has only one operating frequency, when multiple optimized operating frequencies of the memory are obtained, a single unified optimized frequency likewise needs to be arbitrated and used as the operating frequency of the memory.
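The arbitration step for a cluster or for the shared memory can be illustrated with a minimal Python sketch. The text does not state the arbitration rule, so the rule used here (the most demanding request wins, rounded up to the nearest supported operating point) is an assumption, as are the function name `arbitrate` and all frequency values.

```python
def arbitrate(targets_mhz, supported_mhz):
    """Pick one shared frequency from several per-core target frequencies.

    Assumption (not stated in the text): the highest request wins and is
    rounded up to the nearest supported operating point, capped at the
    highest supported point.
    """
    if not targets_mhz:
        raise ValueError("need at least one target frequency")
    wanted = max(targets_mhz)
    for freq in sorted(supported_mhz):
        if freq >= wanted:
            return freq
    return max(supported_mhz)


# Hypothetical example: three cores in one cluster, and two per-core
# requests for the shared memory frequency.
cluster_freq = arbitrate([1200, 1850, 900], [600, 1000, 1400, 1800, 2200])
memory_freq = arbitrate([800, 1600], [800, 1066, 1600, 2133])
print(cluster_freq, memory_freq)
```

Other rules, such as a weighted average of the requests, would fit the same interface; the maximum is chosen here only because it never starves the most demanding core.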
In a possible implementation, the processing apparatus further includes a load classifier, configured to obtain load classification feature information from the first processor core and to classify the load in the first processor core based on that information to obtain the at least one load type. The frequency adjustment controller is specifically configured to obtain the at least one load type from the load classifier.
Optionally, the load classification feature information includes clock signal toggling information in the first processor core.
In this application, the load classifier may be implemented in one or more of hardware, software, or firmware, and can classify the load in the processor core at a fine granularity, so that the frequency adjustment controller can obtain a reasonable and accurate optimized frequency.
In a possible implementation, the frequency adjustment controller is further configured to: obtain a load amount of the first processor core; and, when the load amount meets a preset condition, determine the target frequency of the target object based on the at least one load type.
Optionally, the load amount is the load amount of the first processor core in a first time period. The first time period and the time period in which the load amount is obtained are two adjacent time periods, and the first time period occurs first.
Optionally, the preset condition may include: the load amount is greater than a first load threshold, or the load amount is less than a second load threshold, where the first load threshold is greater than the second load threshold.
In this application, frequency optimization and adjustment are triggered only when the load amount meets the preset condition, which reduces the resources that would be consumed if adjustment were triggered at any time.
In addition, optionally, two thresholds may be set in this application: a frequency-increase threshold (the first load threshold) and a frequency-decrease threshold (the second load threshold). A load amount greater than the frequency-increase threshold indicates that the processor core is heavily loaded and its processing speed cannot keep up, so the frequency needs to be increased to improve performance. A load amount less than the frequency-decrease threshold indicates that the processor core is lightly loaded and does not need such a high processing speed, so the frequency needs to be decreased to save energy. Setting the two thresholds also makes frequency adjustment of the processor core and other objects more reasonable and reduces the extra resource consumption caused by frequent adjustment.
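The two-threshold trigger can be sketched as follows. The threshold values and the names `UP_THRESHOLD`, `DOWN_THRESHOLD`, and `scaling_decision` are hypothetical; the text only requires that the first (increase) threshold be greater than the second (decrease) threshold.

```python
UP_THRESHOLD = 0.80    # first load threshold (hypothetical value)
DOWN_THRESHOLD = 0.30  # second load threshold (hypothetical value)


def scaling_decision(load):
    """Decide whether the load of the previous period triggers an adjustment.

    Returns "up" above the increase threshold, "down" below the decrease
    threshold, and None in between, in which case no frequency
    optimization is triggered at all.
    """
    if load > UP_THRESHOLD:
        return "up"
    if load < DOWN_THRESHOLD:
        return "down"
    return None


for load in (0.95, 0.50, 0.10):
    print(load, scaling_decision(load))
```

The band between the two thresholds acts as a dead zone, which is what keeps small load fluctuations from causing frequent adjustments.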
In a possible implementation, the frequency adjustment controller is specifically configured to:
search, based on the current operating frequency of the target object, a first mapping table corresponding to a first load type for current performance information of the processor, where the first mapping table includes a mapping relationship between the operating frequency of the target object and performance information of the processor, and the first load type is the type that occurs most often among the at least one load type; adjust the current performance information based on the load amount to obtain target performance information; and search the first mapping table, based on the target performance information, for a first target frequency of the target object, and use the first target frequency as the target frequency of the target object.
This application calculates the target frequency from the load type that occurs most often, that is, the type with the largest share, so that the processor core achieves the best possible performance and energy efficiency when processing the load. In addition, this application obtains the target frequency from the foregoing frequency-performance mapping table. Because these mapping tables are optimal solution sets that satisfy various constraints and are obtained through offline training, the resulting target frequency is closer to ideal, and processing the load at that frequency yields better performance and energy efficiency.
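The lookup described above (dominant load type, current performance, adjusted target performance, first target frequency) can be sketched in Python. Everything concrete here is an assumption made for illustration: the table contents, the `TARGET_UTILIZATION` constant, the rule that scales performance by `load / TARGET_UTILIZATION`, and the choice of the lowest frequency that meets the target. The text does not specify how the performance information is adjusted by the load amount.

```python
from collections import Counter

# Hypothetical per-type tables: memory frequency (MHz) -> relative performance.
TABLES = {
    "compute": {800: 95, 1600: 100, 2133: 101},
    "cache": {800: 70, 1600: 100, 2133: 108},
    "memory": {800: 50, 1600: 100, 2133: 130},
}
TARGET_UTILIZATION = 0.7  # assumed utilization the controller aims for


def first_target_frequency(load_types, current_freq, load):
    """Return the first target frequency for the dominant load type."""
    # The first load type is the one observed most often.
    dominant = Counter(load_types).most_common(1)[0][0]
    table = TABLES[dominant]
    # Current frequency -> current performance information.
    current_perf = table[current_freq]
    # Assumed adjustment rule: scale performance so that the observed load
    # would land on the target utilization.
    target_perf = current_perf * load / TARGET_UTILIZATION
    # Target performance -> lowest frequency that delivers it.
    for freq in sorted(table):
        if table[freq] >= target_perf:
            return freq
    return max(table)


observed = ["memory", "memory", "compute"]
print(first_target_frequency(observed, 1600, 0.9))
print(first_target_frequency(observed, 1600, 0.3))
```

In a real system the tables would come from the offline training mentioned in the text rather than from hand-written constants.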
In a possible implementation, the first mapping table further includes a mapping relationship between the operating frequency of the target object and the power consumption of the processor. The frequency adjustment controller is further configured to: search the first mapping table, based on a power consumption constraint of the processor, for a second target frequency of the target object; and, when the first target frequency differs from the second target frequency, determine that the target frequency of the target object is the second target frequency.
This application considers frequency, performance, and power consumption (that is, energy efficiency) together, and matches a more reasonable and accurate operating frequency to the memory and/or the processor core while satisfying both the power consumption constraint and the performance requirement. This better optimizes the performance of the processor and the memory and the energy efficiency of the system. In addition, the foregoing frequency-performance-power mapping table is also an optimal solution set that satisfies various constraints and is obtained through offline training, so the resulting target frequency is closer to ideal, and processing the load at that frequency yields better performance and energy efficiency.
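A minimal sketch of the power-constrained override follows. The table values, the budget, and the function names are hypothetical. The text does not say how the second target frequency is searched, so this sketch assumes it is the fastest table entry that stays within the power budget and does not exceed the first target frequency.

```python
# Hypothetical table: frequency (MHz) -> (relative performance, power in mW).
TABLE = {800: (50, 300), 1600: (100, 700), 2133: (130, 1100)}


def second_target_frequency(table, first_target, power_budget_mw):
    """Fastest entry within the power budget, not above the first target.

    Falls back to the lowest frequency when nothing fits the budget
    (an assumption; the text does not cover this case).
    """
    candidates = [
        freq
        for freq, (_perf, power) in table.items()
        if power <= power_budget_mw and freq <= first_target
    ]
    return max(candidates) if candidates else min(table)


def final_target_frequency(table, first_target, power_budget_mw):
    """Use the second target frequency whenever it differs from the first."""
    second = second_target_frequency(table, first_target, power_budget_mw)
    return second if second != first_target else first_target


print(final_target_frequency(TABLE, 2133, 800))   # budget forces a lower point
print(final_target_frequency(TABLE, 1600, 1200))  # first target already fits
```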
In a possible implementation, the at least one load type is m load types, where m is an integer greater than 1. Each of the m load types corresponds to one mapping table, and each mapping table includes a mapping relationship between the operating frequency of the target object and performance information of the processor. The frequency adjustment controller is specifically configured to:
search each mapping table, based on the current operating frequency of the target object, for one piece of current performance information of the processor; adjust each of the m pieces of current performance information based on the load amount to obtain m pieces of target performance information; search the m mapping tables, based on the m pieces of target performance information, for m groups of first optimized frequencies of the target object; and process the m groups of first optimized frequencies based on load distribution weights of the m load types to obtain a third target frequency, and use the third target frequency as the target frequency of the target object, where the load distribution weights indicate the proportion of the load of each of the m types.
Obtaining the target frequency by processing based on the load distribution weights also allows the processor core to achieve favorable performance and energy efficiency when processing loads of the corresponding types. This application likewise obtains the target frequency from the foregoing frequency-performance mapping tables. Because these mapping tables are optimal solution sets that satisfy various constraints and are obtained through offline training, the resulting target frequency is closer to ideal, and processing the load at that frequency yields better performance and energy efficiency.
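The weighted combination of the m per-type optimized frequencies can be sketched as below. The text only says the frequencies are "processed" based on the load distribution weights; the weighted average and the rounding up to a supported operating point are assumptions, as are all names and values.

```python
def third_target_frequency(optimized_mhz, weights, supported_mhz):
    """Blend per-type optimized frequencies by their load share.

    optimized_mhz: load type -> optimized frequency found in that type's table
    weights:       load type -> share of that type in the observed load
    Assumption: weighted average, rounded up to a supported operating point.
    """
    total = sum(weights.values())
    blended = sum(optimized_mhz[t] * w for t, w in weights.items()) / total
    for freq in sorted(supported_mhz):
        if freq >= blended:
            return freq
    return max(supported_mhz)


SUPPORTED = [800, 1066, 1600, 2133]
optimized = {"compute": 800, "memory": 2133}

# Mostly memory-dependent load: the blend lands near the memory optimum.
print(third_target_frequency(optimized, {"compute": 0.25, "memory": 0.75}, SUPPORTED))
# Mostly compute-type load: the blend is pulled toward the lower frequency.
print(third_target_frequency(optimized, {"compute": 0.75, "memory": 0.25}, SUPPORTED))
```

Compared with picking only the dominant type, the blend lets a sizeable minority of memory-dependent work still raise the chosen frequency.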
In a possible implementation, the m mapping tables further include a mapping relationship between the operating frequency of the target object and the power consumption of the processor. The frequency adjustment controller is further configured to: search each of the m mapping tables, based on the power consumption constraint of the processor, for m groups of second optimized frequencies of the target object; process the m groups of second optimized frequencies based on the load distribution weights to obtain a fourth target frequency of the target object; and, when the third target frequency differs from the fourth target frequency, determine that the target frequency of the target object is the fourth target frequency.
Similarly, this application considers frequency, performance, and power consumption (that is, energy efficiency) together, and matches a more reasonable and accurate operating frequency to the memory and/or the processor core while satisfying both the power consumption constraint and the performance requirement. This better optimizes the performance of the processor and the memory and the energy efficiency of the system. In addition, the foregoing frequency-performance-power mapping tables are also optimal solution sets that satisfy various constraints and are obtained through offline training, so the resulting target frequency is closer to ideal, and processing the load at that frequency yields better performance and energy efficiency.
According to a second aspect, this application provides a processing method applied to a processing apparatus. The processing apparatus includes a processor, a frequency adjustment controller, and a frequency adjuster, and the processor includes a first processor core. The method includes performing the following operations by using the frequency adjustment controller:
obtaining at least one load type of the first processor core; determining a target frequency of a target object based on the at least one load type, where the target object includes a memory; and calling the frequency adjuster, based on the target frequency, to adjust the operating frequency of the target object.
In a possible implementation, the memory includes at least one of a main memory and an asynchronous cache.
In a possible implementation, the load types include compute-type loads, cache-dependent loads, and memory-dependent loads.
In a possible implementation, the target object further includes the first processor core, and the method further includes performing the following operation by using the frequency adjustment controller:
calling the frequency adjuster, based on a target frequency of the first processor core, to adjust the operating frequency of the first processor core, where a mapping relationship exists between the target frequency of the first processor core and the target frequency of the memory.
In a possible implementation, the processor further includes at least one second processor core, and the first processor core and the at least one second processor core form a cluster. The method further includes determining, by using the frequency adjustment controller, the operating frequency of the cluster based on the target frequency of the first processor core.
In a possible implementation, the processor includes n first processor cores, where n is a positive integer, and the method further includes performing the following operations by using the frequency adjustment controller:
determining one target frequency of the memory for the at least one load type of each first processor core; determining an optimized operating frequency of the memory based on the n target frequencies of the memory; and calling the frequency adjuster to adjust the operating frequency of the memory to the optimized operating frequency.
In a possible implementation, the processing apparatus further includes a load classifier, and the method further includes:
obtaining, by using the load classifier, load classification feature information from the first processor core, and classifying the load in the first processor core based on the load classification feature information to obtain the at least one load type; and obtaining, by using the frequency adjustment controller, the at least one load type from the load classifier.
In a possible implementation, the load classification feature information includes clock signal toggling information in the first processor core.
In a possible implementation, the method further includes:
obtaining, by using the frequency adjustment controller, a load amount of the first processor core. The determining a target frequency of a target object based on the at least one load type includes: when the load amount meets a preset condition, determining the target frequency based on the at least one load type.
In a possible implementation, the determining the target frequency based on the at least one load type includes:
searching, based on the current operating frequency of the target object, a first mapping table corresponding to a first load type for current performance information of the processor, where the first mapping table includes a mapping relationship between the operating frequency of the target object and performance information of the processor, and the first load type is the type that occurs most often among the at least one load type; adjusting the current performance information based on the load amount to obtain target performance information; and searching the first mapping table, based on the target performance information, for a first target frequency of the target object, and using the first target frequency as the target frequency of the target object.
In a possible implementation, the first mapping table further includes a mapping relationship between the operating frequency of the target object and the power consumption of the processor, and the method further includes performing the following operations by using the frequency adjustment controller:
searching the first mapping table, based on a power consumption constraint of the processor, for a second target frequency of the target object; and, when the first target frequency differs from the second target frequency, determining that the target frequency of the target object is the second target frequency.
In a possible implementation, the at least one load type is m load types, where m is an integer greater than 1. Each of the m load types corresponds to one mapping table, and each mapping table includes a mapping relationship between the operating frequency of the target object and performance information of the processor. The determining the target frequency based on the at least one load type includes:
searching each mapping table, based on the current operating frequency of the target object, for one piece of current performance information of the processor; adjusting each of the m pieces of current performance information based on the load amount to obtain m pieces of target performance information; searching the m mapping tables, based on the m pieces of target performance information, for m groups of first optimized frequencies of the target object; and processing the m groups of first optimized frequencies based on load distribution weights of the m load types to obtain a third target frequency, and using the third target frequency as the target frequency of the target object, where the load distribution weights indicate the proportion of the load of each of the m types.
In a possible implementation, the m mapping tables further include a mapping relationship between the operating frequency of the target object and the power consumption of the processor, and the method further includes performing the following operations by using the frequency adjustment controller:
searching each of the m mapping tables, based on the power consumption constraint of the processor, for m groups of second optimized frequencies of the target object; processing the m groups of second optimized frequencies based on the load distribution weights to obtain a fourth target frequency of the target object; and, when the third target frequency differs from the fourth target frequency, determining that the target frequency of the target object is the fourth target frequency.
According to a third aspect, this application provides an electronic device, including the processing apparatus according to any implementation of the first aspect and a discrete device coupled to the processing apparatus.
According to a fourth aspect, this application provides a system-on-chip that includes the processing apparatus provided by any implementation of the first aspect. The system-on-chip may consist of a processing chip, or may include a processing chip and other discrete devices.
According to a fifth aspect, this application provides a computer program that includes instructions. When the computer program is executed by a processor, the processor is enabled to perform the processing method according to any implementation of the second aspect.
The solutions provided in the second to fifth aspects are used to implement, or to cooperate in implementing, the processor provided in the first aspect, and can therefore achieve beneficial effects that are the same as or correspond to those of the first aspect. Details are not repeated here.
In summary, the solutions provided in this application can better optimize the performance of the processor and the energy efficiency of the system.
Description of Drawings
To describe the technical solutions in the embodiments of the present invention or in the background more clearly, the following describes the accompanying drawings used in the embodiments or the background.
FIG. 1 is a schematic architecture diagram of a processing apparatus according to an embodiment of this application.
FIG. 2 is a schematic architecture diagram of a load classifier according to an embodiment of this application.
FIG. 3 is a schematic diagram of a relationship between load types and mapping tables according to an embodiment of this application.
FIG. 4 and FIG. 5 are schematic flowcharts of functions implemented by modules in a processing apparatus according to an embodiment of this application.
FIG. 6 is a schematic flowchart of a processing method according to an embodiment of this application.
Detailed Description
The following describes the embodiments of the present invention with reference to the accompanying drawings in the embodiments.
The terms "first", "second", "third", "fourth", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" in this document means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. Occurrences of this phrase at various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. A person skilled in the art understands, explicitly and implicitly, that the embodiments described in this document may be combined with other embodiments.
First, some terms used in this application are explained to help a person skilled in the art understand them.
(1) Task scheduling. In this application, task scheduling means that the operating system schedules and assigns thread tasks to multiple processor cores in the processor for execution. A task is a series of operations that together achieve a purpose, and may be a process or a thread.
(2) Dynamic voltage and frequency scaling (DVFS). DVFS dynamically adjusts the operating frequency and voltage of a chip according to the computing capability required by the tasks running on the chip, so as to save energy.
(3) Processor. A processor may be a central processing unit (CPU), a general-purpose processor, a digital signal processor, an integrated circuit (IC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. A processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor.
(4) Processor core. The processor core is the core of a processor and controls the execution of all operations such as computation, receiving and storing commands, and processing data.
(5) Performance. In this application, the execution time per instruction (delay per instruction, DPI) can be used to judge how well a processor handles a given type of load: the shorter the execution time per instruction, the better the performance. Similarly, the number of instructions executed per unit time (instructions per second, IPS) can be used for the same purpose: the more instructions executed per unit time, the better the performance.
(6) Energy efficiency. Energy efficiency is the ratio of the service provided by a computer to the total energy consumed. In this application, energy efficiency can be measured by the number of instructions executed per unit of energy consumed (IPS per watt). When a computer system (including the processor system, the memory system, and so on) processes a given type of load, the more instructions it executes per unit of energy consumed, the better its energy efficiency.
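The three metrics defined above can be computed from raw counters, as the following minimal sketch shows. The function name `metrics` and the sample numbers are hypothetical. Note that IPS divided by average power in watts equals instructions per joule, which is why the text describes it as instructions executed per unit of energy.

```python
def metrics(instructions, seconds, joules):
    """Compute IPS, DPI, and IPS per watt from raw measurements."""
    ips = instructions / seconds   # instructions per second: higher is better
    dpi = seconds / instructions   # delay per instruction: lower is better
    watts = joules / seconds       # average power over the interval
    return {"ips": ips, "dpi": dpi, "ips_per_watt": ips / watts}


# Hypothetical measurement: 2e9 instructions in 1 s while consuming 4 J.
m = metrics(instructions=2e9, seconds=1.0, joules=4.0)
print(m)
```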
(7) Computer instructions. Computer instructions are the directions and commands that tell a machine what to do. A program is a series of instructions arranged in a particular order, and executing a program is the working process of the computer. An instruction set is the set of instructions that a processor uses to compute and to control the computer system; each processor is designed with an instruction system that matches its hardware circuits. The capability of the instruction set is also an important indicator of a processor, and the instruction set is one of the most effective tools for improving microprocessor efficiency. Common instruction set architectures (ISAs) include complex instruction set computing (CISC) and reduced instruction set computing (RISC). A typical representative of CISC is x86, and typical representatives of RISC are the Advanced RISC Machine (ARM) architecture and the Microprocessor without Interlocked Pipelined Stages (MIPS) architecture.
(8)进程(process)常常被定义为程序的执行。可以把一个进程看成是一个独立的程序,在内存中有其完备的数据空间和代码空间。一个进程所拥有的数据和变量只属于它自己。(8) A process is often defined as the execution of a program. A process can be regarded as an independent program with its complete data space and code space in memory. Data and variables owned by a process belong only to itself.
(9)线程(thread)则是某一进程中一路单独运行的程序。也就是说,线程存在于进程之中。一个进程由一个或多个线程构成,各线程共享相同的代码和全局数据,但各有其自己的堆栈。由于堆栈是每个线程一个,所以局部变量对每一线程来说是私有的。由于所有线程共享同样的代码和全局数据,它们比进程更紧密,比单独的进程间更趋向于相互作用,线程间的相互作用更容易些,因为它们本身就有某些供通信用的共享内存:进程的全局数据。(9) A thread is a program that runs alone in a process. That is, threads exist within processes. A process consists of one or more threads, each thread sharing the same code and global data, but each has its own stack. Since the stack is one per thread, local variables are private to each thread. Since all threads share the same code and global data, they are tighter than processes and tend to interact more than separate processes, and the interaction between threads is easier because they themselves have some shared memory for communication : Global data for the process.
To facilitate understanding of the embodiments of the present application, the technical problems that the present application specifically aims to solve are first analyzed and set out below.
Generally, the higher the operating frequency of a processor, the faster it executes tasks and the better its performance. However, a higher operating frequency also consumes more energy and draws more power, so the energy efficiency of the computer system is worse. To balance performance and energy efficiency, tasks can be scheduled and the operating frequency of the processor can be scaled. The existing scheduling and frequency scaling scheme combines the completely fair scheduler (CFS) with dynamic voltage and frequency scaling (DVFS). In the CFS scheduler, a preset DVFS energy and performance scaling table lets the scheduling process perceive the resulting energy consumption; the scheduler then controls DVFS to adjust the operating frequency of the CPU, optimizing the energy efficiency of the scheduling result. Meanwhile, DVFS runs periodically to sustain the target performance and energy efficiency. However, the performance and energy efficiency optimization that this combination brings is limited to certain fixed scenarios, such as when a new thread is first dispatched, when an old thread is woken up, and when a thread needs to be migrated. For the vast majority of the time, system performance and energy efficiency are not continuously guaranteed.
For a long-running task, sustaining the optimization of performance and energy efficiency still requires the current scheduling and frequency scaling system to rely on periodically executed DVFS. In theory, the shorter the DVFS execution interval, the greater the benefit to the system. However, the interval is limited by the execution overhead of DVFS itself and cannot be made finer-grained.
In addition, the combined CFS and DVFS scheme above relies on a fixed performance and power consumption conversion table to optimize the performance and energy efficiency of tasks, but this table does not generalize well across different task types. Moreover, the scheme adjusts only the operating frequency of the CPU and does not address the operating frequency of storage components such as the memory and the asynchronous cache, so the optimization of performance and energy efficiency it can achieve is limited.
Based on the above description, the technical problems to be solved by the present application may include the following:
1. The scheduling and frequency scaling system cannot sustain the target performance and energy efficiency.
2. The DVFS system cannot track the load at fine granularity to iteratively optimize energy efficiency.
3. The accuracy of the scheduling and frequency scaling system's behavior in pursuing the target performance and energy efficiency under different task types.
4. Frequency scaling cannot be coordinated with the storage system to optimize the performance of the whole processor and the energy efficiency of the system more comprehensively.
To solve the above technical problems, the present application first provides an apparatus. Referring to FIG. 1, FIG. 1 is a schematic structural diagram of an apparatus 10 with scheduling and frequency scaling functions according to an embodiment of the present application. The apparatus 10 may be located in any electronic device, such as a computer, a mobile phone, or a tablet. Specifically, the apparatus 10 may be a chip, a chipset, or a circuit board on which a chip or chipset is mounted. The chip, chipset, or circuit board can operate when driven by the necessary software.
The apparatus 10 includes a processor 101, and a frequency scaling controller 102, a frequency regulator 103, an asynchronous cache 104, and a memory 105 that are coupled to the processor 101.
The processor 101 includes one or more processor cores. FIG. 1 takes N processor cores 1011 as an example (N is an integer greater than 0), including processor core 1, processor core 2, ..., and processor core (N-1). Each processor core 1011 includes a load classifier and an operation control unit. The load classifier in each processor core 1011 classifies the load in that processor core 1011. The operation control unit in each processor core 1011 can control the execution and operation of hardware in that processor core 1011, such as the load classifier.
In a possible implementation, the load classifier may also be implemented in software. Optionally, the software load classifier may be configured in the processor core 1011 or in the frequency scaling controller 102. If it is configured in the processor 101, the load classifier obtains the load classification feature information directly from within the processor core 1011, classifies the load, and outputs the load classification result to the frequency scaling controller 102. If it is configured in the frequency scaling controller 102, the load classifier obtains the load classification feature information from the processor core 1011 and performs the load classification computation in the frequency scaling controller 102 based on that information, thereby obtaining the load classification information. The load classification feature information is described later and is not detailed here.
If the processor 101 includes multiple processor cores 1011, the processor cores 1011 may be divided into multiple clusters, each including at least one processor core. For example, if the processor 101 includes 8 processor cores 1011, the 8 processor cores 1011 may be divided into three clusters, two of which include 3 processor cores 1011 each while the remaining cluster includes 2 processor cores 1011. The processor cores in a cluster operate at the same frequency; that is, the processor cores in each cluster share one operating frequency.
The frequency scaling controller 102 calculates, based on the classification result of the load classifier, an optimized operating frequency for the relevant components in the apparatus (for example, each processor core 1011, the memory 105, and/or the asynchronous cache 104), and then invokes the frequency regulator 103 to adjust the operating frequency of each relevant component to the corresponding optimized frequency, so as to optimize the performance of the processor and the energy efficiency of the computer system.
The frequency scaling controller 102 may be a low-power processor, for example a low-power CPU, a low-power microcontroller unit (MCU), or a low-power state machine.
The frequency regulator 103 may obtain the optimized operating frequency of the processor from the frequency scaling controller 102 and then adjust the operating frequency of the one or more processor cores based on it. Optionally, the frequency regulator 103 may also obtain the optimized operating frequencies of the asynchronous cache 104 and the memory 105 from the frequency scaling controller 102 and then adjust the operating frequencies of the asynchronous cache 104 and the memory 105 accordingly.
The frequency regulator 103 may be a dynamic voltage and frequency scaling (DVFS) module or the like.
The memory 105 may include, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), or compact disc read-only memory (CD-ROM).
Optionally, the asynchronous cache 104 may be a cache memory, usually composed of static random access memory (SRAM). The asynchronous cache 104 may include a level-3 cache and/or a level-4 cache. Note that the operating frequency of a synchronous cache (such as the level-1 cache) is the same as that of the processor core, whereas the operating frequency of the asynchronous cache 104 differs from that of the processor core 1011. In the present application, the operating frequency of the asynchronous cache can therefore be optimized separately to further optimize the performance of the processor and the energy efficiency of the whole computer system. In addition, in one possible implementation the level-2 cache is a synchronous cache, and in another possible implementation the level-2 cache is an asynchronous cache.
Optionally, the memory 105 may be a DDR memory, where DDR is short for double data rate synchronous dynamic random access memory (DDR SDRAM).
At least one of the one or more processor cores included in the processor 101 runs a task scheduler 106. The task scheduler 106 is implemented as a computer program. It may be the scheduler of an operating system (OS), and it schedules and dispatches tasks through a scheduling channel 107 to the one or more processor cores for execution.
As shown in FIG. 1, the task scheduler 106 may send frequency scaling constraints to the frequency scaling controller 102; these may be power consumption constraints, performance constraints, and the like. The task scheduler 106 may also obtain scheduling suggestions from the frequency scaling controller 102, which may include suggestions on load balancing, task migration, or power-on and power-off policies. For example, when the frequency scaling controller 102 learns that the processor cores in the same cluster are in a state that hinders optimization, such as unbalanced load or differing load types, it sends a load balancing suggestion to the task scheduler 106. The load balancing suggestion may include, for example, information such as the load level and load type of each processor core in the cluster.
Optionally, the task scheduler 106 may also send a frequency scaling request to the frequency regulator 103, thereby invoking the frequency regulator 103 to scale the frequency.
In summary, the apparatus 10 provided by the present application decouples hardware frequency scaling control from software thread scheduling. Software is responsible for coarse-grained scheduling and provides frequency scaling constraints to hardware; hardware controls fine-grained frequency scaling and provides scheduling suggestions to software. Software and hardware work together, combining coarse adjustment with fine adjustment. In the apparatus 10, the frequency scaling controller is combined with the load classifier, so the load can be tracked and iterated on in real time more promptly, the frequency can be raised and lowered more promptly, and energy efficiency and performance can be optimized promptly and continuously. Implementing load tracking and frequency scaling control in hardware also effectively reduces software load and overhead. In addition, the frequency scaling controller can control the frequency of both the processor and the storage components, so energy efficiency and performance can be optimized from the perspective of the whole system.
Based on the software and hardware architecture of the above apparatus, in the embodiments of the present application, the functions specifically implemented by the components of the apparatus 10 may include the following:
The load classifier in each processor core 1011 classifies the load in that processor core 1011 to obtain load classification information, which indicates the load type to which the load in that processor core 1011 belongs.
In a possible implementation, the load classifier may instead be configured in the frequency scaling controller 102 to classify the load in each processor core 1011. The present application mainly describes the case in which the load classifier is configured in the processor core 1011; for the case in which it is configured in the frequency scaling controller 102, refer to the corresponding description, which is not repeated here.
In a specific embodiment, the load types may include a compute type, a cache-dependent type, a memory-dependent type, and an idle type. If, while executing a load, the processor core only needs to access the level-1 cache and/or the level-2 cache to obtain the data it needs, the load is a compute-type load. If the processor core needs to access the level-3 cache and/or the level-4 cache to obtain data, the load is a cache-dependent load. If the processor core needs to access the memory to obtain data, the load is a memory-dependent load. When the processor core executes an idle load, that is, when it is suspended and pauses work, the type of the load in the processor core 1011 is the idle type.
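The four load types can be summarized in a minimal sketch. The names `LoadType` and `load_type` and the three boolean inputs are hypothetical, introduced only for illustration. The order in which the conditions are checked (the deepest storage level accessed decides the type) is also an assumption, since the description does not state which type applies when a load touches several levels.

```python
from enum import Enum


class LoadType(Enum):
    COMPUTE = "compute"
    CACHE_DEPENDENT = "cache-dependent"
    MEMORY_DEPENDENT = "memory-dependent"
    IDLE = "idle"


def load_type(suspended: bool, uses_l3_or_l4: bool, uses_memory: bool) -> LoadType:
    """Map what a core did in one window to a load type.

    Assumption: the deepest storage level accessed decides the type.
    """
    if suspended:
        # The core is suspended and does no work.
        return LoadType.IDLE
    if uses_memory:
        # Data had to be fetched from memory.
        return LoadType.MEMORY_DEPENDENT
    if uses_l3_or_l4:
        # Data had to be fetched from the level-3 and/or level-4 cache.
        return LoadType.CACHE_DEPENDENT
    # Level-1 and/or level-2 cache was sufficient.
    return LoadType.COMPUTE
```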
The load classifier in processor core 1011-0 is taken as an example below. The load classifier may be a classification algorithm implemented in a hardware circuit or a software program, for example a softmax classification algorithm. The softmax classification algorithm includes a pre-trained model that takes as input the load classification feature information in the processor core 1011-0 (optionally also in the synchronous cache) and, after computation and judgment by the model, outputs the classification information of the load in the processor core 1011-0. The load classification feature information includes one or more of clock signal toggles and performance events in the processor core 1011-0 (optionally also in the synchronous cache). The performance events may include, for example, one or more of events such as instructions and/or level signals within a preset duration. For ease of understanding the load classifier, refer to FIG. 2.
FIG. 2 shows an example structure of a load classifier implemented as a hardware circuit. The load classifier may include a counter 201, a processing unit 202, and a memory 203. The processing unit 202 may include a multiply-accumulate (MAC) unit, a comparison and control unit, and other units that perform operations such as calculation, comparison, and control. The memory 203 may be a static random access memory (SRAM) or the like. For example, one or more of the clock signal toggles and performance events of the processor core 1011-0 (optionally also of the synchronous cache) within a first preset duration are input to the counter 201, which counts the number of clock signal toggles and/or the number of occurrences or the duration of the performance events within the first preset duration. If the performance events include instructions within the first preset duration, the counter 201 counts the number of those instructions. If the performance events include events such as level signals, the counter 201 can measure how long those level signals are present within the first preset duration, from which information such as the front-end execution efficiency and/or the back-end execution efficiency within the first preset duration can be derived. After the counter 201 finishes counting the inputs within the first preset duration, it passes the counts to the processing unit 202, and the processing unit 202 obtains information such as pre-trained calculation parameters from the memory 203. The processing unit 202 then performs calculation and comparison on the counts and the pre-trained parameters and finally outputs a specific load type.
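The data path of FIG. 2 (counts in, multiply-accumulate against stored parameters, comparison, load type out) can be sketched in software as a softmax classifier. This is only an illustrative sketch: the feature layout, the values in `WEIGHTS` and `BIASES`, and the function names are hypothetical placeholders, not trained parameters or names taken from the application.

```python
import math

LOAD_TYPES = ["compute", "cache-dependent", "memory-dependent", "idle"]

# Hypothetical placeholder parameters (the role played by memory 203).
# Assumed feature order: [normalized instruction count,
#                         fraction of window with a stall level signal asserted,
#                         normalized clock-toggle count]
WEIGHTS = [
    [2.0, -1.0, 1.0],    # compute
    [0.5, 1.0, 0.5],     # cache-dependent
    [-1.0, 2.0, 0.5],    # memory-dependent
    [-2.0, -2.0, -3.0],  # idle
]
BIASES = [0.0, 0.0, 0.0, 1.0]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(counts, weights=WEIGHTS, biases=BIASES):
    """Return (load type, class probabilities) for one classification window."""
    # Multiply-accumulate step: one dot product per load type.
    scores = [
        sum(w * x for w, x in zip(row, counts)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    # Comparison step: pick the most probable load type.
    best = max(range(len(probs)), key=probs.__getitem__)
    return LOAD_TYPES[best], probs
```

With these placeholder parameters, a window with many instructions and few stalls is classified as compute, and a window with all-zero counts is classified as idle. Real parameters would come from training.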
The load classifier may classify the load in the processor core 1011-0 periodically with the first preset duration as the period, or with a second preset duration, greater than the first preset duration, as the period. The period may also be called a classification window; the load classifier periodically classifies the load of the processor core 1011-0 in each classification window to obtain the type of the load in that window. The first and second preset durations may be, for example, 100 microseconds, 10 milliseconds, or 1 second; the present application does not limit them.
Optionally, the first and second preset durations are configured by the frequency scaling controller 102; alternatively, they are determined by the load classifier in each processor core 1011 triggering the frequency scaling controller 102 through an interrupt.
The frequency scaling controller 102 obtains target load classification information of a first processor core from the first processor core, determines a target frequency of a target object based on the target load classification information, and then invokes the frequency regulator 103 to adjust the operating frequency of the target object based on the target frequency.
The first processor core is one of the one or more processor cores 1011. A processor core 1011 is the first processor core only when it satisfies a preset condition.
The preset condition is that the load of the processor core 1011 in a first window is greater than a first threshold or less than a second threshold. The first window is adjacent to the window in which the processor core 1011 currently is, and the first window occurs first. Optionally, the first window may include one or more classification periods of the load classifier; that is, the first window may include one or more classification windows.
The first threshold and the second threshold are collectively called load thresholds, and the first threshold is greater than the second threshold. Specifically, the first threshold may be a frequency-increase load threshold, and the second threshold may be a frequency-decrease load threshold. The frequency-increase load threshold may be, for example, 70% or 80%, and the frequency-decrease load threshold may be, for example, 50% or 60%; the present application does not limit the specific values of the load thresholds.
When the load of the processor core 1011 in the first window is greater than the frequency-increase load threshold, the load on the processor core 1011 is too heavy and its processing speed cannot keep up; to improve processor performance, the operating frequency of the processor core 1011 needs to be raised. When the load of the processor core 1011 in the first window is less than the frequency-decrease load threshold, the load on the processor core 1011 is light and such a high processing speed is unnecessary; to save system energy, the operating frequency of the processor core 1011 needs to be lowered.
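The threshold comparison can be sketched as follows. The function name `frequency_action`, the returned strings, and the "hold" outcome for loads between the two thresholds are illustrative assumptions. The default values 0.8 and 0.6 are taken from the example values given for the two thresholds.

```python
def frequency_action(load: float, up_threshold: float = 0.8,
                     down_threshold: float = 0.6) -> str:
    """Decide whether a core's load in the first window calls for a frequency change.

    load is a fraction in [0, 1]; the first threshold must exceed the second.
    """
    if not down_threshold < up_threshold:
        raise ValueError("first (up) threshold must be greater than second (down) threshold")
    if load > up_threshold:
        return "raise"   # load too heavy: raise the operating frequency
    if load < down_threshold:
        return "lower"   # load light: lower the operating frequency
    return "hold"        # assumption: no change between the thresholds
```

Because both comparisons are strict, a load exactly equal to a threshold does not trigger a change in this sketch.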
In a possible implementation, the first thresholds of different processor cores may be the same or different, and the second thresholds of different processor cores may be the same or different.
In a possible implementation, the processor cores in the same cluster may have the same first threshold and the same second threshold, while the processor cores in different clusters may have different first thresholds and different second thresholds.
The target object may include at least one of the first processor core and a storage component. The storage component includes at least one of a memory and an asynchronous cache. For example, the memory may be the memory 105 shown in FIG. 1, and the asynchronous cache may be the asynchronous cache 104 shown in FIG. 1.
The load classification information may include the specific type to which each load belongs. For example, if the first window includes two classification windows (classification window 1 and classification window 2), the load classification information in the first window may be: the load type obtained for classification window 1 is the compute type, and the load type obtained for classification window 2 is the cache-dependent type.
In a specific embodiment, the frequency scaling controller 102 may monitor the load of the one or more processor cores 1011 in each window. Taking the processor core 1011-0 as an example, when the load of the processor core 1011-0 in the first window is greater than the first threshold or less than the second threshold (in which case the processor core 1011-0 is the first processor core), the frequency scaling controller 102 is triggered to calculate an optimized frequency based on the load situation in the first window and then invoke the frequency regulator 103 to scale the frequency.
The frequency scaling controller 102 can obtain the load of the processor core 1011-0 in the first window in several ways. Two are described below as examples:
First, obtain from the processor core 1011-0 the load in the first window as calculated by the processor core 1011-0 itself.
Specifically, the processor core 1011-0 can observe its own load in the first window. In particular, it can learn how long it is busy and how long it is idle in the first window, and then compute the ratio of the busy duration to the total duration of the window to obtain the load in the first window. The total duration of the window is the sum of the busy duration and the idle duration. The busy duration is the time during which the processor core 1011-0 processes tasks toward some goal, and the idle duration is the time during which the processor core 1011-0 is suspended, that is, pauses work. For example, if the total duration of the first window is 10 seconds, the busy duration is 8 seconds, and the idle duration is 2 seconds, the load in the first window is 8/10 = 80%. After the processor core 1011-0 calculates the load in the first window, it can send the load to the frequency scaling controller 102 for further processing.
Second, obtain real-time load classification information from the load classifier in the processor core 1011-0 and calculate the real-time load of the processor core 1011-0 in the first window based on it.
Specifically, the frequency scaling controller 102 can obtain in real time the load classification information produced in each window by the load classifier in the processor core 1011-0, and therefore the load classification information for the first window. Because the load classification information indicates the load type of each classification window included in the first window, the load in the first window is the ratio of the number of load types other than the idle type to the total number of load types in the load classification information. For example, suppose the first window includes 4 classification windows (classification window 1, classification window 2, classification window 3, and classification window 4), and the load classification information in the first window is: the load type obtained for classification window 1 is the compute type, the load type obtained for classification window 2 is the cache-dependent type, the load type obtained for classification window 3 is the compute type, and the load type obtained for classification window 4 is the idle type. Then the load in the first window is 3/4 = 75%.
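Both ways of obtaining the load reduce to a simple ratio, as the following sketch shows. The function names and the use of plain strings for load types are illustrative assumptions.

```python
def load_from_busy_time(busy_seconds: float, idle_seconds: float) -> float:
    """First way: busy duration divided by the total duration of the window."""
    total = busy_seconds + idle_seconds
    if total <= 0:
        raise ValueError("window duration must be positive")
    return busy_seconds / total


def load_from_classification(window_types: list) -> float:
    """Second way: share of classification windows whose load type is not idle."""
    if not window_types:
        raise ValueError("at least one classification window is required")
    non_idle = sum(1 for t in window_types if t != "idle")
    return non_idle / len(window_types)
```

Called with the two worked examples, `load_from_busy_time(8, 2)` gives 0.8 and `load_from_classification(["compute", "cache-dependent", "compute", "idle"])` gives 0.75, matching the 80% and 75% figures.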
After obtaining the load amount of the processor core 1011-0 in the first window, the frequency modulation controller 102 compares it with the first threshold and the second threshold. If the load amount is greater than the first threshold or less than the second threshold, the controller calculates the target frequency of the target object (at least one of the processor core 1011-0, the asynchronous cache 104, and the memory 105) based on the load condition in the first window, that is, on the load amount and the load classification information of that window.
The following describes the process of calculating the target frequency of the target object.
The frequency modulation controller 102 may calculate a load distribution weight from the load classification information of the first window obtained from the load classifier in the processor core 1011-0. Specifically, if the load classification information indicates that the loads of the processor core 1011-0 in the first window fall into m load types (m being an integer greater than 1), the load distribution weight indicates the proportion of the load that belongs to each of the m types. An example follows for ease of understanding.
Assume that the first window of the processor core 1011-0 includes 10 classification windows (the 1st classification window, the 2nd classification window, ..., and the 10th classification window), and that after the load classifier classifies the loads in these 10 classification windows the load classification information is: the loads in the 1st to 3rd classification windows are computational loads, the loads in the 4th to 6th classification windows and the 9th classification window are cache-dependent loads, the load in the 7th classification window is a memory-dependent load, and the loads in the 8th and 10th classification windows are idle loads. After obtaining this load classification information, the frequency modulation controller 102 counts a total of 10 load type entries for the processor core 1011-0 in the first window and calculates the load distribution weight as follows: the weight of computational loads is 3/10, the weight of cache-dependent loads is 4/10, the weight of memory-dependent loads is 1/10, and the weight of idle loads is 2/10.
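As a minimal sketch of the load distribution weight, the per-type counts can be divided by the number of classification windows. The labels and the function name `load_distribution_weights` are illustrative assumptions; `Fraction` is used only so that the weights print as exact ratios.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Sequence


def load_distribution_weights(window_types: Sequence[str]) -> Dict[str, Fraction]:
    """Share of classification windows that belongs to each load type."""
    counts = Counter(window_types)
    total = len(window_types)
    return {load_type: Fraction(n, total) for load_type, n in counts.items()}


# The ten classification windows from the example above.
first_window = (
    ["compute"] * 3            # windows 1 to 3
    + ["cache_dependent"] * 3  # windows 4 to 6
    + ["memory_dependent"]     # window 7
    + ["idle"]                 # window 8
    + ["cache_dependent"]      # window 9
    + ["idle"]                 # window 10
)
weights = load_distribution_weights(first_window)
for load_type, weight in weights.items():
    print(load_type, weight)
```

`Fraction` reduces 4/10 to 2/5 and 2/10 to 1/5 when printing; the values are the same as in the text.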
In addition, the frequency modulation controller 102 needs to calculate a load scaling ratio. The target performance is calculated from this ratio, and the optimized frequency is then looked up from the target performance. Specifically, if the load amount of the processor core 1011-0 in the first window is greater than the first load threshold (which may also be called the frequency-up load threshold), the load scaling ratio is the ratio of the first load threshold to the load amount. If the load amount of the processor core 1011-0 in the first window is less than the second load threshold (which may also be called the frequency-down load threshold), the load scaling ratio is the ratio of the second load threshold to the load amount.
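The threshold rule above can be sketched as follows. The threshold values 0.8 and 0.3 are hypothetical, and returning `None` when the load stays between the two thresholds is an assumption made here to signal that no retuning is triggered; the text does not define a ratio for that case or for a load amount of zero.

```python
from typing import Optional


def load_scaling_ratio(load: float, up_threshold: float, down_threshold: float) -> Optional[float]:
    """Return threshold / load when the load leaves the [down, up] band, else None."""
    if load > up_threshold:
        return up_threshold / load    # below 1: a smaller (faster) performance value is targeted
    if load < down_threshold:
        if load <= 0:
            raise ValueError("the ratio is undefined for a load amount of zero")
        return down_threshold / load  # above 1: a larger (slower) performance value is acceptable
    return None


UP, DOWN = 0.8, 0.3  # hypothetical frequency-up and frequency-down load thresholds
print(load_scaling_ratio(0.95, UP, DOWN))
print(load_scaling_ratio(0.15, UP, DOWN))
print(load_scaling_ratio(0.50, UP, DOWN))
```

With these assumed thresholds, a load amount of 0.95 gives a ratio of roughly 0.84, and a load amount of 0.15 gives a ratio of 2.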
In a possible implementation, the performance described in this application may instead be defined so that a larger value means better performance. In that case, if the load amount of the processor core 1011-0 in the first window is greater than the first load threshold, the load scaling ratio is the ratio of the load amount to the first load threshold; if the load amount is less than the second load threshold, the load scaling ratio is the ratio of the load amount to the second load threshold. It should be noted that this application uses the manner described in the previous paragraph as the example.
The frequency modulation controller 102 also needs to obtain the current operating frequency of the target object. In a possible implementation, the current operating frequency of the target object can be obtained from the frequency regulator 103. The current operating frequency of the target object includes the current operating frequency of the processor core 1011-0 and, optionally, the current operating frequency of the asynchronous cache 104 and/or the memory 105. In a possible implementation, the frequency modulation controller 102 can interact with the processor core that runs the task scheduler 106 and obtain from the task scheduler 106 the current operating frequency of the processor core 1011-0 and, optionally, the current operating frequency of the asynchronous cache 104 and/or the memory 105. Because the task scheduler 106 needs to know the state of each processor core and of modules such as the storage in the processor in order to make task scheduling decisions, it can obtain information such as the operating frequencies of each processor core and of the storage in real time.
In another possible implementation, the frequency modulation controller 102 may interact directly with the processor core 1011-0 to obtain the current operating frequency of the processor core 1011-0. Optionally, the frequency modulation controller 102 may also interact with the storage, that is, the asynchronous cache 104 and/or the memory 105, to obtain its current operating frequency.
After obtaining the current operating frequency corresponding to the processor core 1011-0, the frequency modulation controller 102 obtains current performance information of the processor 101 based on that frequency. The current performance is the performance of the processor 101 when it processes loads at the current operating frequency of the target object. The controller then calculates target performance information from the current performance information and the load scaling ratio calculated above, and determines the target frequency of the target object from the calculated target performance information.
Before the specific process of determining the target frequency of the target object is described, the frequency-performance mapping table maintained in the frequency modulation controller 102 is introduced.
Specifically, the frequency modulation controller 102 maintains a frequency-performance relationship mapping table (hereinafter, mapping table) for each of the processor cores described above. Each processor core may have multiple mapping tables, one for each of multiple load types.
The frequency-performance mapping table can be obtained by offline training. Specifically, the load classifier described above is first used to classify, offline, the loads of the different processor cores into load types. Then, for each load type, a linear regression model predicts the processor performance that corresponds to different operating frequencies of the processor core and/or the storage, which yields the frequency-performance mapping tables for the different load types of the different processor cores.
Optionally, the linear regression model may be, but is not limited to, a first-order or higher-order polynomial. Its input variables may include, but are not limited to, the operating frequency values of the processor core and/or the storage, linear combinations of the frequency values, ratios of frequencies, or normalized frequency values. The regression algorithm may be, but is not limited to, least squares or least squares with a regularization term.
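For illustration only, one first-order instance of such a model can be written as follows. The symbols (the predicted performance \hat{P}, the coefficients \beta_j, the N offline training samples, and the regularization strength \lambda) are notation assumed here rather than taken from the application; setting \lambda = 0 gives plain least squares.

```latex
\[
\hat{P}(f_{\mathrm{core}}, f_{\mathrm{cache}}, f_{\mathrm{mem}})
  = \beta_0 + \beta_1 f_{\mathrm{core}} + \beta_2 f_{\mathrm{cache}} + \beta_3 f_{\mathrm{mem}}
\]
\[
\hat{\beta} = \arg\min_{\beta}
  \sum_{k=1}^{N} \Bigl( P_k - \hat{P}\bigl(f_{\mathrm{core},k}, f_{\mathrm{cache},k}, f_{\mathrm{mem},k}\bigr) \Bigr)^{2}
  + \lambda \lVert \beta \rVert_2^{2}
\]
```

A separate coefficient vector would be fitted for each processor core and each load type, and the fitted model evaluated at the candidate frequency points to fill in the corresponding mapping table.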
It should be noted that the mapping table corresponding to a processor core is a table whose data applies to that processor core. The mapping table for a given load type records the mapping between the operating frequency of the processor core and the processor performance when the processor core executes a load of that type. Optionally, it may also record the mapping between the operating frequency of the storage and the processor performance when the processor core executes a load of that type.
When a mapping table records the operating frequency of the processor core, the operating frequency of the storage, and the processor performance together, the processor core frequency and the storage frequency in each entry satisfy a certain mapping relationship, so that the processor performance reaches a well-optimized state and the energy efficiency of the whole computer system is also well optimized.
For ease of understanding, see FIG. 3. FIG. 3 shows, by way of example, the mapping tables corresponding to each processor core. The mapping tables of each processor core include one table per load type, illustrated in FIG. 3 with computational loads, cache-dependent loads, and memory-dependent loads.
For ease of understanding, see Table 1, which shows part of the mapping table of one load type.
Table 1
Processor core operating frequency (Hz) | Memory operating frequency (Hz) | Cache operating frequency (Hz) | Performance
100 | 50 | 40 | a1
200 | 50 | 49 | a2
200 | 50 | 52 | a3
300 | 50 | 53 | a4
400 | 55 | 53 | a5
500 | 51 | 52 | a6
As Table 1 shows, operating frequencies are given in hertz (Hz). Assume that Table 1 is the mapping table for computational loads of the processor core 1011-0. For example, when the processor core 1011-0 processes a computational load at an operating frequency of 100 Hz while the memory and the cache operate at 50 Hz and 40 Hz respectively, the performance of the processor core 1011-0 is a1. In addition, the higher the operating frequency of the processor core, the smaller the performance value. A smaller value indicates that less time is needed to execute a load, and therefore better performance.
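One way to picture the per-core, per-load-type tables is a nested dictionary keyed by the frequency triple. This is a sketch of a possible data layout, not the structure the application prescribes; the key names and the function `current_performance` are hypothetical, and the performance entries are kept as the symbolic labels a1 to a6 from Table 1.

```python
from typing import Dict, Tuple

Freqs = Tuple[int, int, int]  # (core Hz, memory Hz, cache Hz)

# Rows of Table 1, assumed here to be the computational-load table of core 1011-0.
TABLE_1: Dict[Freqs, str] = {
    (100, 50, 40): "a1",
    (200, 50, 49): "a2",
    (200, 50, 52): "a3",
    (300, 50, 53): "a4",
    (400, 55, 53): "a5",
    (500, 51, 52): "a6",
}

# Hypothetical nesting: processor core id -> load type -> mapping table.
MAPPING_TABLES: Dict[str, Dict[str, Dict[Freqs, str]]] = {
    "1011-0": {"compute": TABLE_1},
}


def current_performance(core_id: str, load_type: str, freqs: Freqs) -> str:
    """Look up the performance entry mapped to the given operating frequencies."""
    return MAPPING_TABLES[core_id][load_type][freqs]


print(current_performance("1011-0", "compute", (100, 50, 40)))  # a1
```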
Based on the above description, the following describes the specific process of determining the target frequency of the target object. There are two possible implementations, described in turn.
In the first possible implementation, the load type with the largest weight in the load distribution weight is used for the calculation.
Specifically, after calculating the load distribution weight of the processor core 1011-0 in the first window, the frequency modulation controller 102 identifies the load type with the largest weight (called the first load type; it is also the load type that occurs most often in the first window) and finds the mapping table of the first load type for the processor core 1011-0. Using the current operating frequency of the target object obtained above as the index, the controller looks up in that table the performance information mapped to the current operating frequency. This is called the first performance information and is the current performance information of the processor 101.
For example, assume that the mapping table of the first load type is Table 1 and that the current operating frequencies obtained for the processor core 1011-0 are: 100 Hz for the processor core 1011-0, 50 Hz for the memory, and 40 Hz for the cache. Comparing these operating frequencies with the frequencies in Table 1 shows that the performance information mapped to them is a1.
After finding the first performance information, the frequency modulation controller 102 multiplies it by the load scaling ratio calculated above to obtain new performance information. This new performance is the performance expected of the processor core 1011-0 when it processes loads of the first load type, and may also be called the target performance. The controller then uses the target performance information as the index to look up, in the mapping table of the first load type, the operating frequency mapped to it. The operating frequency found is the optimized frequency.
For example, still using Table 1, the first performance information found in Table 1 is a1. Assuming that the load scaling ratio calculated above is 0.84, the target performance information is a1*0.84≈a2. Comparing a2 with the performance values in Table 1 gives the mapped optimized frequencies, which per the a2 row of Table 1 are: 200 Hz for the processor core 1011-0, 50 Hz for the memory, and 49 Hz for the cache.
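The first implementation can be sketched end to end as below. The frequencies come from Table 1, but the numeric performance values are made-up placeholders for a1 to a6 (the application gives only symbols), and picking the row whose performance is closest to the target is an assumption about how the approximate match is resolved.

```python
from typing import Dict, List, Tuple

Freqs = Tuple[int, int, int]  # (core Hz, memory Hz, cache Hz)
Row = Tuple[Freqs, float]     # (frequencies, performance value; smaller is better)

# Placeholder performance numbers standing in for a1..a6.
TABLES: Dict[str, List[Row]] = {
    "compute": [
        ((100, 50, 40), 100.0),  # a1
        ((200, 50, 49), 84.0),   # a2
        ((200, 50, 52), 80.0),   # a3
        ((300, 50, 53), 60.0),   # a4
        ((400, 55, 53), 50.0),   # a5
        ((500, 51, 52), 45.0),   # a6
    ],
}


def dominant_load_type(weights: Dict[str, float]) -> str:
    """Load type with the largest load distribution weight."""
    return max(weights, key=lambda load_type: weights[load_type])


def optimized_frequencies(table: List[Row], current: Freqs, scaling_ratio: float) -> Freqs:
    """Scale the current performance and return the frequencies of the closest row."""
    current_perf = dict(table)[current]         # first performance information
    target_perf = current_perf * scaling_ratio  # target performance
    freqs, _ = min(table, key=lambda row: abs(row[1] - target_perf))
    return freqs


weights = {"compute": 0.6, "cache_dependent": 0.4}  # hypothetical weights
load_type = dominant_load_type(weights)
result = optimized_frequencies(TABLES[load_type], (100, 50, 40), 0.84)
print(load_type, result)  # compute (200, 50, 49)
```

With the placeholder values, scaling a1 by 0.84 lands on the a2 row of Table 1.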
In the second possible implementation, the calculation takes a weighted average based on the load distribution weight.
In a specific embodiment, the frequency modulation controller 102 learns from the load classification information of the processor core 1011-0 in the first window that the first window contains m types of loads, and which load types they are. The controller then performs the calculation for each of the m load types to obtain m groups of optimized operating frequencies of the target object, and takes a weighted average of these m groups using the load distribution weight to obtain the target frequency of the target object.
Specifically, the frequency modulation controller 102 first obtains the i-th performance from the i-th mapping table based on the current operating frequency of the target object obtained above. The i-th mapping table is the mapping table of the i-th load type among the m types, and i is an integer from 1 to m. The controller then multiplies the i-th performance by the load scaling ratio calculated above to obtain the i-th target performance information, and uses the i-th target performance information as the index to find the i-th group of optimized operating frequencies in the i-th mapping table. After these steps the controller has m groups of optimized operating frequencies, from which it calculates the target frequency using the load distribution weight. An example follows for ease of understanding.
Assume that the m types are the computational type and the cache-dependent type, with weights of 0.4 and 0.6 respectively, and that the calculation yields two groups of optimized operating frequencies. Assume that the group found in the mapping table for computational loads is: 100 Hz for the processor core, 50 Hz for the memory, and 40 Hz for the cache. Assume that the group found in the mapping table for cache-dependent loads is: 200 Hz for the processor core, 50 Hz for the memory, and 49 Hz for the cache. The optimized operating frequency of the processor core in the target frequency is then 100*0.4+200*0.6=160, that of the memory is 50*0.4+50*0.6=50, and that of the cache is 40*0.4+49*0.6=45.4.
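The weighted average in this example can be checked with a short sketch. The function name and dictionary layout are illustrative assumptions, and the sketch assumes that the weights of the m types sum to 1, as they do in the example.

```python
from typing import Dict, Tuple

Freqs = Tuple[float, float, float]  # (core Hz, memory Hz, cache Hz)


def weighted_target_frequency(groups: Dict[str, Freqs], weights: Dict[str, float]) -> Freqs:
    """Weighted average of the per-load-type optimized frequency groups."""
    core = sum(weights[t] * groups[t][0] for t in groups)
    memory = sum(weights[t] * groups[t][1] for t in groups)
    cache = sum(weights[t] * groups[t][2] for t in groups)
    return (core, memory, cache)


groups = {
    "compute": (100, 50, 40),          # group found in the computational-load table
    "cache_dependent": (200, 50, 49),  # group found in the cache-dependent-load table
}
weights = {"compute": 0.4, "cache_dependent": 0.6}

core, memory, cache = weighted_target_frequency(groups, weights)
print(round(core, 1), round(memory, 1), round(cache, 1))
```

The three components come out as 160 Hz for the processor core, 50 Hz for the memory, and 45.4 Hz for the cache.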
In a possible implementation, the m load types may be only some of the types to which the loads of the processor core 1011-0 in the first window belong. For example, assume that the loads of the processor core 1011-0 in the first window are of three types: computational, cache-dependent, and memory-dependent. The m types may then include only the computational and cache-dependent types. In a specific embodiment, the m types may be preconfigured, that is, the frequency modulation controller 102 may obtain external configuration information that indicates which specific load types the m types are.
In a possible implementation, while looking up in the mapping table the operating frequency mapped to the target performance information, the frequency modulation controller 102 may obtain performance constraints from the task scheduler 106, including but not limited to a first performance threshold, a second performance threshold, and a third performance threshold. If the controller obtains the first performance threshold from the task scheduler 106, the frequency points in Table 1 whose performance is higher than the first performance threshold are not used in the frequency lookup. If it obtains the second performance threshold, the frequency points in Table 1 whose performance is lower than the second performance threshold are not used in the frequency lookup. If it obtains the third performance threshold, the frequency point in Table 1 closest to the third performance threshold is looked up. In this way the processor and/or the storage work within given performance constraints while their performance and the energy efficiency of the system are still optimized.
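A constrained lookup of this kind might look like the sketch below. Several points are assumptions made for illustration: "higher" and "lower" are interpreted on the stored performance value, the third threshold replaces the computed target as the value to match, and the numeric performance values are placeholders for a1 to a6. The application does not specify how the three thresholds combine.

```python
from typing import List, Optional, Tuple

Freqs = Tuple[int, int, int]  # (core Hz, memory Hz, cache Hz)
Row = Tuple[Freqs, float]     # (frequencies, performance value)

# Frequencies from Table 1; performance numbers are made-up placeholders for a1..a6.
TABLE: List[Row] = [
    ((100, 50, 40), 100.0),
    ((200, 50, 49), 84.0),
    ((200, 50, 52), 80.0),
    ((300, 50, 53), 60.0),
    ((400, 55, 53), 50.0),
    ((500, 51, 52), 45.0),
]


def constrained_lookup(
    table: List[Row],
    target_perf: float,
    first_threshold: Optional[float] = None,
    second_threshold: Optional[float] = None,
    third_threshold: Optional[float] = None,
) -> Freqs:
    """Drop rows outside the thresholds, then return the row closest to the goal."""
    candidates = [
        row for row in table
        if (first_threshold is None or row[1] <= first_threshold)
        and (second_threshold is None or row[1] >= second_threshold)
    ]
    if not candidates:
        raise ValueError("no frequency point satisfies the performance constraints")
    goal = target_perf if third_threshold is None else third_threshold
    freqs, _ = min(candidates, key=lambda row: abs(row[1] - goal))
    return freqs


print(constrained_lookup(TABLE, 84.0, first_threshold=82.0))  # (200, 50, 52)
print(constrained_lookup(TABLE, 84.0, third_threshold=58.0))  # (300, 50, 53)
```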
In a possible implementation, the mapping table may also include power consumption information, that is, the mapping table may be a frequency-performance-power consumption mapping table.
Like the frequency-performance mapping table, the frequency-performance-power consumption mapping table can be obtained by offline training. Likewise, the load classifier described above is first used to classify, offline, the loads of the different processor cores into load types. Then, for each load type, a linear regression model predicts the processor performance and power consumption that correspond to different operating frequencies of the processor core and/or the storage, which yields the frequency-performance-power consumption mapping tables for the different load types of the different processor cores. For the linear regression model, see the earlier description; details are not repeated here.
For ease of understanding the frequency-performance-power consumption mapping table, see Table 2.
Table 2
Processor core operating frequency (Hz) | Memory operating frequency (Hz) | Cache operating frequency (Hz) | Performance | Power consumption (W)
100 | 50 | 40 | a1 | w1
200 | 50 | 49 | a2 | w2
200 | 50 | 52 | a3 | w3
300 | 50 | 53 | a4 | w4
400 | 55 | 53 | a5 | w5
500 | 51 | 52 | a6 | w6
Table 2 shows part of a frequency-performance-power consumption mapping table. Compared with Table 1, Table 2 adds a power consumption column. Power consumption is the energy consumed per unit time, in watts (W). Table 2 lists the processor performance and power consumption when the processor core processes a given type of load at each of the listed operating frequencies. The higher the operating frequency of the processor core, the greater the corresponding power consumption, that is, the more energy is consumed.
In this case, the frequency modulation controller 102 may also obtain a power consumption constraint on the processor 101 and look up the optimized frequency of the target object in the mapping table based on that constraint. For example, using Table 2 and assuming that the power consumption constraint obtained is w5, a lookup in Table 2 with w5 as the index gives the corresponding optimized operating frequencies: 400 Hz for the processor core, 55 Hz for the memory, and 53 Hz for the cache.
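A power-constrained lookup could be sketched as follows. The numeric power values are placeholders for w1 to w6, and choosing the row with the highest power that still fits the budget is an assumption made here; the text only describes using the constraint as the lookup index.

```python
from typing import List, Optional, Tuple

Freqs = Tuple[int, int, int]        # (core Hz, memory Hz, cache Hz)
Row = Tuple[Freqs, str, float]      # (frequencies, performance label, power in W)

# Frequencies and labels from Table 2; power numbers are made-up placeholders for w1..w6.
TABLE_2: List[Row] = [
    ((100, 50, 40), "a1", 1.0),
    ((200, 50, 49), "a2", 2.0),
    ((200, 50, 52), "a3", 2.4),
    ((300, 50, 53), "a4", 3.5),
    ((400, 55, 53), "a5", 5.0),
    ((500, 51, 52), "a6", 7.0),
]


def frequencies_for_power_budget(table: List[Row], budget_w: float) -> Optional[Freqs]:
    """Frequencies of the highest-power row that does not exceed the budget."""
    allowed = [row for row in table if row[2] <= budget_w]
    if not allowed:
        return None  # no frequency point fits the budget
    return max(allowed, key=lambda row: row[2])[0]


print(frequencies_for_power_budget(TABLE_2, 5.0))  # (400, 55, 53)
```

With a budget equal to the placeholder for w5, the sketch returns the same row as the example in the text.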
In a specific embodiment, the frequency modulation controller 102 may obtain the power consumption constraint of the processor 101 from one or more of temperature control, thermal design power (TDP) control, and single-core overclocking control. Specifically, temperature control derives an allocatable power budget from the current temperature and the temperature-control target temperature, using a prediction model or a proportional-integral-derivative (PID) algorithm. TDP control directly assigns an upper limit on power consumption, and this upper limit is the power consumption constraint; the TDP value may come from the operating system configuration. Single-core overclocking control is similar to TDP control and likewise directly assigns an upper limit on power consumption.
In summary, when there is no power consumption constraint, the target frequency of the target object can be calculated using either the first or the second possible implementation described above.
When there is a power consumption constraint, and the optimized frequency of the target object obtained from the power consumption constraint is the same as the optimized frequency calculated using either the first or the second possible implementation, the target frequency of the target object is still the optimized frequency calculated using that implementation.
If the optimized frequency of the target object obtained from the power consumption constraint differs from the optimized frequency calculated using either the first or the second possible implementation, the target frequency of the target object is the optimized frequency obtained from the power consumption constraint. This is because the power consumption constraint has higher priority: the processor core must operate within the power consumption constraint.
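The three cases above reduce to a small selection rule, sketched here with hypothetical names. `perf_based` stands for the result of the first or second implementation, and `power_based` for the result of the power-constrained lookup, with `None` meaning that no power consumption constraint applies.

```python
from typing import Optional, Tuple

Freqs = Tuple[int, int, int]  # (core Hz, memory Hz, cache Hz)


def select_target_frequency(perf_based: Freqs, power_based: Optional[Freqs]) -> Freqs:
    """Pick the target frequency; the power consumption constraint takes priority."""
    if power_based is None:
        return perf_based   # no power consumption constraint
    if power_based == perf_based:
        return perf_based   # both lookups agree
    return power_based      # they differ: the power constraint wins


print(select_target_frequency((200, 50, 49), None))           # (200, 50, 49)
print(select_target_frequency((500, 51, 52), (400, 55, 53)))  # (400, 55, 53)
```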
In a possible implementation, the power consumption values in the frequency-performance-power consumption mapping table are obtained at a fixed chip corner (the chip corner refers to deviations introduced during chip manufacturing by materials and/or by operations such as soldering and glue filling) and at a fixed ambient temperature. Actual power consumption varies with the ambient temperature and the chip corner, so the power consumption of the processor can be obtained in real time and used to make fine corrections to the power consumption data in the mapping table, which improves the accuracy of the data.
Optionally, the frequency modulation controller 102 may obtain the real-time power consumption from a power sensor and then write it into the corresponding frequency-performance-power consumption mapping table. The power sensor detects the power consumption of the processor in real time and sends the detected value to the frequency modulation controller 102. An example follows for ease of understanding.
Still using the data in Table 2, assume that Table 2 is the mapping table for computational loads of the processor core 1011-0. According to the table, when the processor core 1011-0 operates at 100 Hz, the memory 105 at 50 Hz, and the asynchronous cache 104 at 40 Hz, the power consumption of the processor 101 while the processor core 1011-0 processes a computational load is w1. Because of influences such as the ambient temperature, the power sensor actually detects a power consumption of w1' for the processor 101 while the processor core 1011-0 processes the computational load. After obtaining the detected power consumption from the power sensor, the frequency modulation controller 102 updates to w1' the power consumption that Table 2 maps to a processor core frequency of 100 Hz, a memory 105 frequency of 50 Hz, and an asynchronous cache 104 frequency of 40 Hz. For the updated mapping, see Table 3.
Table 3

| Processor core operating frequency/Hz | Main memory operating frequency/Hz | Cache operating frequency/Hz | Performance | Power consumption/W |
| --- | --- | --- | --- | --- |
| 100 | 50 | 40 | a1 | w1' |
| 200 | 50 | 49 | a2 | w2 |
| 200 | 50 | 52 | a3 | w3 |
| 300 | 50 | 53 | a4 | w4 |
| 400 | 55 | 53 | a5 | w5 |
| 500 | 51 | 52 | a6 | w6 |
Optionally, the frequency modulation controller 102 may instead obtain the real-time power consumption of the processor from a power consumption acquisition model that it establishes, and then write the obtained value into the corresponding mapping table.
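The table update described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `MappingRow` type and the `update_power` function are hypothetical names, and performance and power are kept as the symbolic placeholders (a1, w1, and so on) used in the tables.

```python
from dataclasses import dataclass


@dataclass
class MappingRow:
    core_hz: int
    mem_hz: int
    cache_hz: int
    performance: str  # symbolic placeholder, e.g. "a1"
    power: str        # symbolic placeholder, e.g. "w1"


def update_power(table, core_hz, mem_hz, cache_hz, measured_power):
    """Overwrite the power of the row that matches the frequency triple.

    Returns True if a row was updated, False if no row matched.
    """
    for row in table:
        if (row.core_hz, row.mem_hz, row.cache_hz) == (core_hz, mem_hz, cache_hz):
            row.power = measured_power
            return True
    return False


# First two rows of the mapping table before the update.
table = [
    MappingRow(100, 50, 40, "a1", "w1"),
    MappingRow(200, 50, 49, "a2", "w2"),
]
# The sensor (or a power model) reports w1' for the current operating point.
updated = update_power(table, 100, 50, 40, "w1'")
```

Only the row for the current operating point changes; the other rows keep their offline-trained values.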
In summary, processor cores differ in performance and energy efficiency across load types. This application therefore starts from the load type and jointly considers frequency, performance, and power consumption (that is, energy efficiency), so that a more reasonable and accurate operating frequency can be matched to the processor core and the core's performance and energy efficiency can be better optimized.
In addition, this application can also optimize the operating frequencies of the cache and the main memory. Optimizing only the processor core's operating frequency yields limited gains in performance and energy consumption; optimizing the cache and main memory operating frequencies at the same time further improves the performance and energy efficiency of the processor core, adapts better to various types of load tasks, meets the needs of different load tasks, and reduces system stutter while improving system energy efficiency.
Furthermore, this application obtains the target frequency from the frequency-performance mapping table or the frequency-performance-power consumption mapping table described above. Because these mapping tables are optimal solution sets obtained by offline training under the various constraints, the resulting target frequency is closer to ideal, and processing the load at that target frequency yields better performance and energy efficiency.
Based on the above, the frequency modulation controller 102 can obtain the target frequency of one processor core. As described with reference to FIG. 1, when the processor 101 includes multiple processor cores 1011, the processor cores 1011 may be divided into multiple clusters, and the processor cores in each cluster run at the same operating frequency. Accordingly, if other processor cores in the cluster that contains the processor core 1011-0 also have a load amount in the first window that is greater than the first threshold or less than the second threshold, that is, those other processor cores are also first processor cores, the frequency modulation controller 102 is likewise triggered to calculate an optimized frequency for each of them based on its load in the first window, yielding a target frequency for each of those processor cores.
Thus, for the cluster that contains the processor core 1011-0, the frequency modulation controller 102 obtains the target frequencies of multiple processor cores. Because the processor cores in a cluster share one operating frequency (for ease of description, the optimized operating frequency of the processor cores in the cluster that contains the processor core 1011-0 is called the first optimized operating frequency), the first optimized operating frequency must be determined from the target frequencies of those processor cores. The frequency modulation controller 102 may arbitrate among the target frequencies of the multiple processor cores.
In a possible implementation, the frequency modulation controller 102 may select the highest of the target frequencies of the multiple processor cores as the first optimized operating frequency.
In another possible implementation, the frequency modulation controller 102 may select the lowest of the target frequencies of the multiple processor cores as the first optimized operating frequency.
In another possible implementation, the frequency modulation controller 102 may take the average of the target frequencies of the multiple processor cores as the first optimized operating frequency.
After determining the first optimized operating frequency, the frequency modulation controller 102 sends it to the frequency modulator 103, and the frequency modulator 103 sets the operating frequency of the processor cores in the cluster that contains the processor core 1011-0 to the first optimized operating frequency.
In a possible implementation, if only some of the processor cores in the cluster that contains the processor core 1011-0 are first processor cores, the frequency modulation controller 102 calculates a target frequency only for each of those processor cores. The one or more remaining processor cores in the cluster may be called second processor cores. To determine the first optimized operating frequency of the cluster, the frequency modulation controller 102 may arbitrate among the target frequencies of those first processor cores (hereinafter, the multiple optimized frequencies) and the current operating frequency of the second processor cores.
In a possible implementation, the frequency modulation controller 102 may select the highest frequency among the multiple optimized frequencies and the current operating frequency of the second processor cores as the first optimized operating frequency.
In another possible implementation, the frequency modulation controller 102 may select the lowest frequency among the multiple optimized frequencies and the current operating frequency of the second processor cores as the first optimized operating frequency.
In another possible implementation, the frequency modulation controller 102 may take the average of the multiple optimized frequencies and the current operating frequency of the second processor cores as the first optimized operating frequency.
After determining the first optimized operating frequency, the frequency modulation controller 102 sends it to the frequency modulator 103, and the frequency modulator 103 sets the operating frequency of the processor cores in the cluster that contains the processor core 1011-0 to the first optimized operating frequency.
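The cluster arbitration described above, covering both the case where every core in the cluster has a target frequency and the case where only some do, can be sketched as follows. The function names, the `policy` strings, and the use of a single current cluster frequency for the second processor cores (justified by the shared cluster clock) are assumptions made for illustration.

```python
def arbitrate(candidates, policy="max"):
    """Pick one frequency from the candidates using max, min, or mean."""
    if not candidates:
        raise ValueError("no candidate frequencies")
    if policy == "max":
        return max(candidates)
    if policy == "min":
        return min(candidates)
    if policy == "mean":
        return sum(candidates) / len(candidates)
    raise ValueError(f"unknown policy: {policy}")


def first_optimized_frequency(cluster_cores, targets, current_hz, policy="max"):
    """Arbitrate one operating frequency for a cluster.

    cluster_cores: ids of all cores in the cluster.
    targets: core id -> target frequency, present only for first processor cores.
    current_hz: current cluster frequency, which stands in for the second
        processor cores when at least one core has no target frequency.
    """
    candidates = [targets[c] for c in cluster_cores if c in targets]
    if len(candidates) < len(cluster_cores):
        candidates.append(current_hz)
    return arbitrate(candidates, policy)


# All three cores of the cluster have a target frequency.
full = first_optimized_frequency([0, 1, 2], {0: 300, 1: 200, 2: 400}, 250, "max")
# Only core 0 has a target; the current frequency joins the arbitration.
partial = first_optimized_frequency([0, 1, 2], {0: 300}, 250, "min")
```

Choosing the maximum favors performance, the minimum favors power, and the mean is a compromise; the text leaves the choice open.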
It should be noted that the processor core 1011-0 and its cluster are used above only as an example. For the processing of other processor cores and other clusters, refer to the description of the processor core 1011-0 and its cluster; details are not repeated here.
In a possible implementation, if the target object includes the memory, the frequency modulation controller 102 also calculates a target frequency of the memory and may invoke the frequency modulator 103 to adjust the operating frequency of the memory based on that target frequency. The memory, whether the asynchronous cache 104 or the main memory 105, is shared by the multiple processor cores 1011 of the processor 101, so the asynchronous cache 104 and the main memory 105 each have only one operating frequency. The asynchronous cache 104 is used as an example below; the description applies equally to the main memory 105 and is not repeated.
If there are multiple first processor cores in the first window, the frequency modulation controller 102 may calculate a corresponding target frequency for each of them based on its load in the first window. The target frequency corresponding to each of these first processor cores includes one target frequency of the asynchronous cache 104.
If the multiple first processor cores comprise all the processor cores in the processor 101, the frequency modulation controller 102 may determine one optimized frequency from the multiple target frequencies of the asynchronous cache 104. This optimized frequency may be called the second optimized operating frequency.
In a possible implementation, the frequency modulation controller 102 may select the highest of the multiple target frequencies of the asynchronous cache 104 as the second optimized operating frequency.
In another possible implementation, the frequency modulation controller 102 may select the lowest of the multiple target frequencies of the asynchronous cache 104 as the second optimized operating frequency.
In another possible implementation, the frequency modulation controller 102 may take the average of the multiple target frequencies of the asynchronous cache 104 as the second optimized operating frequency.
If the multiple first processor cores comprise only some of the processor cores in the processor 101, the frequency modulation controller 102 may determine the second optimized operating frequency from the multiple target frequencies of the asynchronous cache 104 together with the current operating frequency of the asynchronous cache 104.
In a possible implementation, the frequency modulation controller 102 may select the highest frequency among the multiple target frequencies of the asynchronous cache 104 and its current operating frequency as the second optimized operating frequency.
In another possible implementation, the frequency modulation controller 102 may select the lowest frequency among the multiple target frequencies of the asynchronous cache 104 and its current operating frequency as the second optimized operating frequency.
In another possible implementation, the frequency modulation controller 102 may take the average of the multiple target frequencies of the asynchronous cache 104 and its current operating frequency as the second optimized operating frequency.
After determining the second optimized operating frequency, the frequency modulation controller 102 sends it to the frequency modulator 103, and the frequency modulator 103 sets the operating frequency of the asynchronous cache 104 to the second optimized operating frequency.
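The arbitration of the shared asynchronous cache frequency can be sketched in the same spirit. The sketch assumes that each first processor core's target is a (core, main memory, cache) frequency triple, as in the rows of the mapping tables; the function name and parameters are hypothetical.

```python
from statistics import mean


def second_optimized_frequency(core_targets, total_cores, current_cache_hz, policy=max):
    """Arbitrate the single operating frequency of the shared asynchronous cache.

    core_targets: core id -> (core_hz, mem_hz, cache_hz), first processor cores only.
    total_cores: number of processor cores in the processor.
    current_cache_hz: joins the candidates only when some core has no target.
    policy: a callable such as max, min, or statistics.mean.
    """
    candidates = [triple[2] for triple in core_targets.values()]
    if len(core_targets) < total_cores:
        candidates.append(current_cache_hz)
    return policy(candidates)


# Two of eight cores met the preset condition in this window.
targets = {0: (300, 50, 53), 3: (200, 50, 49)}
cache_hz = second_optimized_frequency(targets, total_cores=8, current_cache_hz=52)
```

The same function would serve the main memory by reading index 1 of each triple instead of index 2.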
To facilitate understanding of the operations performed by the frequency modulation controller 102, refer to FIG. 4 and FIG. 5. FIG. 4 takes as an example a processor 101 that includes eight processor cores 1011 divided into three clusters: processor cores (core) 0 to 2 form the first cluster, processor cores 3 to 5 form the second cluster, and processor cores 6 and 7 form the third cluster. FIG. 4 shows how the processor cores of the three clusters connect to the corresponding processing channels in the frequency modulation controller 102, and outlines the arbitration of processor core operating frequencies among the clusters and the overall arbitration of the memory operating frequency. FIG. 5 takes processor core 0 as an example and shows how the frequency modulation controller 102, within the corresponding processing channel, obtains the target frequency of the processor core based on its load.
In the processing channel for processor core 0 shown in FIG. 5, the frequency modulation controller 102 first obtains the externally supplied configuration of the first window size and of the load types, and then performs load classification analysis on that basis, that is, it analyzes the load classification information obtained from processor core 0. As described earlier, the first window may span one or more classification windows of the load classifier in processor core 0. The load type configuration mainly comprises information on the m types mentioned above.
The frequency modulation controller 102 then calculates the load amount, load scaling ratio, and load distribution weights of processor core 0, obtains the current operating frequency corresponding to processor core 0, and queries the mapping table with that frequency to obtain the current performance. It then calculates the target frequency corresponding to processor core 0 based on the load scaling ratio and the load distribution weights.
Assume that all eight processor cores in FIG. 4 satisfy the condition for frequency adjustment, that is, the preset condition described above. The frequency modulation controller 102 then processes each processor core to obtain its target frequency. Based on these target frequencies, it determines one final frequency for each cluster and one final frequency for the memory, and sends the determined frequencies to the frequency modulator 103, which adjusts the operating frequency of each cluster and of the memory accordingly.
In addition, after the mapping table lookup and processing step and after the inter-cluster arbitration of processor core operating frequencies, the frequency modulation controller 102 may send power-on/power-off suggestions for the corresponding processor cores to the task scheduler. After the inter-cluster arbitration of processor core operating frequencies, the frequency modulation controller 102 may also send load balancing suggestions and the like to the task scheduler.
For the specific implementation of the steps shown in FIG. 4 and FIG. 5, refer to the detailed description based on FIG. 1; details are not repeated here.
Refer to FIG. 6, which is a schematic flowchart of a processing method provided by an embodiment of the present invention. The processing method may be applied to a processing apparatus that includes a processor, a frequency modulation controller, and a frequency modulator, where the processor includes a first processor core. For example, the processing apparatus may be the apparatus 10 shown in FIG. 1, and the first processor core may be any one of the multiple processor cores shown in FIG. 1. The method includes, but is not limited to, the following steps:
S601. Obtain, by the frequency modulation controller, at least one load type of the first processor core.
S602. Determine, by the frequency modulation controller, a target frequency of a target object based on the at least one load type, where the target object includes a memory.
S603. Invoke, by the frequency modulation controller, the frequency modulator based on the target frequency to adjust the operating frequency of the target object.
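Steps S601 to S603 can be read as a three-stage control loop. The skeleton below is only a sketch of that flow: the class name, the injected callables, and the dictionary of target objects are hypothetical, and the actual load classification and frequency decision are replaced by stubs.

```python
class FrequencyModulationController:
    """Skeleton of the S601-S603 flow with injected, hypothetical collaborators."""

    def __init__(self, get_load_types, decide_targets, modulator):
        self.get_load_types = get_load_types  # core id -> list of load types
        self.decide_targets = decide_targets  # load types -> {target object: Hz}
        self.modulator = modulator            # (target object, Hz) -> None

    def run_once(self, core_id):
        load_types = self.get_load_types(core_id)   # S601
        targets = self.decide_targets(load_types)   # S602
        for target_object, hz in targets.items():   # S603
            self.modulator(target_object, hz)
        return targets


# Stub collaborators; the frequencies are taken from the example table rows.
applied = []
controller = FrequencyModulationController(
    get_load_types=lambda core_id: ["computational", "computational", "cache-dependent"],
    decide_targets=lambda load_types: {"main_memory": 50, "async_cache": 49},
    modulator=lambda target_object, hz: applied.append((target_object, hz)),
)
result = controller.run_once(0)
```

Keeping the three stages behind separate callables mirrors the split in the text between the load classifier, the frequency modulation controller, and the frequency modulator.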
In a possible implementation, the memory includes at least one of a main memory and an asynchronous cache.
In a possible implementation, the load types include a computational load, a cache-dependent load, and a memory-dependent load.
In a possible implementation, the target object further includes the first processor core, and the method further includes performing the following operation by the frequency modulation controller:
Invoke the frequency modulator based on a target frequency of the first processor core to adjust the operating frequency of the first processor core, where a mapping relationship exists between the target frequency of the first processor core and the target frequency of the memory.
In a possible implementation, the processor further includes at least one second processor core, and the first processor core and the at least one second processor core form a cluster. The method further includes: determining, by the frequency modulation controller, the operating frequency of the cluster according to the target frequency of the first processor core.
In a possible implementation, the processor includes n first processor cores, where n is a positive integer, and the method further includes performing the following operations by the frequency modulation controller:
Determine one target frequency of the memory for the at least one load type of each first processor core; determine an optimized operating frequency of the memory based on the n target frequencies of the memory; and invoke the frequency modulator to set the operating frequency of the memory to the optimized operating frequency.
In a possible implementation, the processing apparatus further includes a load classifier, and the method further includes:
Obtain, by the load classifier, load classification feature information in the first processor core, and classify the load in the first processor core based on the load classification feature information to obtain the at least one load type; and obtain, by the frequency modulation controller, the at least one load type from the load classifier.
In a possible implementation, the load classification feature information includes clock signal toggle information in the first processor core.
In a possible implementation, the method further includes:
Obtain, by the frequency modulation controller, a load amount of the first processor core. Determining the target frequency of the target object based on the at least one load type includes: when the load amount satisfies a preset condition, determining the target frequency based on the at least one load type.
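The gating step can be sketched as follows. The paragraph itself does not define the preset condition; the sketch assumes the condition used elsewhere in the description, namely that the load amount in the first window is greater than a first threshold or less than a second threshold. The function names and the numeric thresholds are illustrative only.

```python
def meets_preset_condition(load, first_threshold, second_threshold):
    """True when the load is above the first threshold or below the second."""
    return load > first_threshold or load < second_threshold


def maybe_determine_target(load, first_threshold, second_threshold, determine):
    """Run the target-frequency calculation only when the load warrants it."""
    if meets_preset_condition(load, first_threshold, second_threshold):
        return determine()
    return None  # keep the current operating frequency


# Hypothetical thresholds: retune above 80% or below 20% utilization.
high = maybe_determine_target(0.9, 0.8, 0.2, lambda: 300)
steady = maybe_determine_target(0.5, 0.8, 0.2, lambda: 300)
```

A load between the two thresholds leaves the frequency untouched, which avoids retuning on every window.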
In a possible implementation, determining the target frequency based on the at least one load type includes:
Look up current performance information of the processor in a first mapping table corresponding to a first load type based on the current operating frequency of the target object, where the first mapping table includes a mapping relationship between the operating frequency of the target object and the performance information of the processor, and the first load type is the type that occurs most often among the at least one load type; adjust the current performance information based on the load amount to obtain target performance information; and look up a first target frequency of the target object in the first mapping table based on the target performance information, and use the first target frequency as the target frequency of the target object.
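The lookup sequence above can be sketched as follows. Several details are assumptions made for illustration rather than taken from the text: performance is modeled as a number, the adjustment by load amount is modeled as multiplication by a scaling factor, and the reverse lookup returns the lowest-performance row that still reaches the target performance. The table values are invented.

```python
from collections import Counter

# Rows: (core_hz, mem_hz, cache_hz, performance); all numbers are invented.
COMPUTE_TABLE = [
    (100, 50, 40, 1.0),
    (200, 50, 49, 1.8),
    (300, 50, 53, 2.5),
    (400, 55, 53, 3.1),
]


def first_load_type(window_types):
    """The load type that occurs most often among the classification results."""
    return Counter(window_types).most_common(1)[0][0]


def current_performance(table, freqs):
    """Forward lookup: frequency triple -> performance."""
    for *row_freqs, perf in table:
        if tuple(row_freqs) == freqs:
            return perf
    raise KeyError(freqs)


def first_target_frequency(table, freqs, scale):
    """Scale the current performance, then look up a frequency that reaches it."""
    target_perf = current_performance(table, freqs) * scale
    for *row_freqs, perf in sorted(table, key=lambda row: row[3]):
        if perf >= target_perf:
            return tuple(row_freqs)
    # No row is fast enough: fall back to the highest-performance row.
    return tuple(max(table, key=lambda row: row[3])[:3])


dominant = first_load_type(["compute", "cache", "compute"])
target = first_target_frequency(COMPUTE_TABLE, (100, 50, 40), scale=2.0)
```

With a scaling factor of 2.0 the required performance is 2.0, so the sketch selects the (300, 50, 53) row, the first one whose performance is at least that value.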
In a possible implementation, the first mapping table further includes a mapping relationship between the operating frequency of the target object and the power consumption of the processor, and the method further includes performing the following operations by the frequency modulation controller:
Look up a second target frequency of the target object in the first mapping table based on a power consumption constraint of the processor; and when the first target frequency and the second target frequency differ, determine that the target frequency of the target object is the second target frequency.
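The power-constrained lookup can be sketched as follows. How the second target frequency is selected under the constraint is not spelled out in this paragraph; the sketch assumes it is the highest-performance row whose power consumption stays within the budget. The table values and function names are invented for illustration.

```python
# Rows: (core_hz, mem_hz, cache_hz, performance, power_w); all numbers are invented.
TABLE = [
    (100, 50, 40, 1.0, 0.5),
    (200, 50, 49, 1.8, 1.1),
    (300, 50, 53, 2.5, 2.0),
    (400, 55, 53, 3.1, 3.2),
]


def second_target_frequency(table, power_budget_w):
    """Best-performing frequency triple whose power fits the budget, or None."""
    feasible = [row for row in table if row[4] <= power_budget_w]
    if not feasible:
        return None
    best = max(feasible, key=lambda row: row[3])
    return best[:3]


def resolve_target(first_target, second_target):
    """When the two targets differ, the power-constrained one takes precedence."""
    if second_target is not None and second_target != first_target:
        return second_target
    return first_target


second = second_target_frequency(TABLE, power_budget_w=2.0)
final = resolve_target((400, 55, 53), second)
```

Here the performance-driven target (400, 55, 53) would exceed the 2.0 W budget, so the power-constrained target replaces it.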
In a possible implementation, the at least one load type is m load types, where m is an integer greater than 1; each of the m load types corresponds to one mapping table, and each mapping table includes a mapping relationship between the operating frequency of the target object and the performance information of the processor. Determining the target frequency based on the at least one load type includes:
Look up one piece of current performance information of the processor in each mapping table based on the current operating frequency of the target object; adjust each of the m pieces of current performance information based on the load amount to obtain m pieces of target performance information; look up m groups of first optimized frequencies of the target object in the m mapping tables based on the m pieces of target performance information; and process the m groups of first optimized frequencies based on load distribution weights of the m load types to obtain a third target frequency, and use the third target frequency as the target frequency of the target object, where the load distribution weights indicate the proportion of load of each of the m types.
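The combination step can be sketched as follows, assuming that "processing" the m groups means a weighted average in which each load type's weight is its share of the classification results. This paragraph does not confirm that interpretation; the function name and all values are illustrative.

```python
def third_target_frequency(first_optimized, type_counts):
    """Weighted average of per-type frequency triples.

    first_optimized: load type -> (core_hz, mem_hz, cache_hz) from that type's table.
    type_counts: load type -> number of classification results of that type;
        the count divided by the total is the load distribution weight.
    """
    total = sum(type_counts.values())
    if total == 0:
        raise ValueError("no classified load")
    return tuple(
        sum(type_counts[t] * first_optimized[t][i] for t in first_optimized) / total
        for i in range(3)
    )


groups = {
    "computational": (300, 50, 53),
    "cache-dependent": (200, 50, 49),
    "memory-dependent": (100, 50, 40),
}
counts = {"computational": 5, "cache-dependent": 3, "memory-dependent": 2}
third = third_target_frequency(groups, counts)
```

A real controller would probably snap the averaged values to frequency levels that the hardware supports; that step is omitted here.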
In a possible implementation, the m mapping tables further include a mapping relationship between the operating frequency of the target object and the power consumption of the processor, and the method further includes performing the following operations by the frequency modulation controller:
Look up m groups of second optimized frequencies of the target object in the m mapping tables respectively based on the power consumption constraint of the processor; process the m groups of second optimized frequencies based on the load distribution weights to obtain a fourth target frequency of the target object; and when the third target frequency and the fourth target frequency differ, determine that the target frequency of the target object is the fourth target frequency.
It should be noted that, for the specific flow of the processing method described in FIG. 6 and its possible implementations, refer to the related descriptions in the embodiments of FIG. 1 to FIG. 5; details are not repeated here.
This application provides a computer program that includes instructions. When the computer program is executed by a processor, the processor can perform the processing method described in FIG. 6 or in any one of its possible implementations.
The foregoing describes only several embodiments of the present invention. Based on the disclosure of the application documents, a person skilled in the art may make various changes or modifications to the present invention without departing from its spirit and scope. For example, the specific shape or structure of each component in the accompanying drawings of the embodiments may be adjusted according to the actual application scenario.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function; in an actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
The foregoing embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are wholly or partly produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid-state disk (SSD)).

Claims (27)

  1. A processing apparatus, comprising a processor, a frequency modulation controller, and a frequency modulator, wherein the processor comprises a first processor core, and the frequency modulation controller is configured to:
    obtain at least one load type of the first processor core;
    determine a target frequency of a target object based on the at least one load type, wherein the target object comprises a memory; and
    invoke, based on the target frequency, the frequency modulator to adjust an operating frequency of the target object.
  2. The apparatus according to claim 1, wherein the memory comprises at least one of a main memory and an asynchronous cache.
  3. The apparatus according to claim 1 or 2, wherein the target object further comprises the first processor core, and the frequency modulation controller is further configured to:
    invoke, based on a target frequency of the first processor core, the frequency modulator to adjust an operating frequency of the first processor core, wherein a mapping relationship exists between the target frequency of the first processor core and the target frequency of the memory.
  4. The apparatus according to claim 3, wherein:
    the processor further comprises at least one second processor core, and the first processor core and the at least one second processor core form a cluster; and
    the frequency modulation controller is further configured to determine an operating frequency of the cluster according to the target frequency of the first processor core.
  5. The apparatus according to any one of claims 1 to 4, wherein the processor comprises n first processor cores, n being a positive integer, and the frequency modulation controller is further configured to:
    determine one target frequency of the memory for the at least one load type of each of the first processor cores;
    determine an optimized operating frequency of the memory based on the n target frequencies of the memory; and
    invoke the frequency modulator to adjust the operating frequency of the memory to the optimized operating frequency.
  6. The apparatus according to any one of claims 1 to 5, further comprising:
    a load classifier, configured to obtain load classification feature information in the first processor core, and to classify a load in the first processor core based on the load classification feature information to obtain the at least one load type;
    wherein the frequency modulation controller is specifically configured to obtain the at least one load type from the load classifier.
  7. The apparatus according to claim 6, wherein the load classification feature information comprises clock signal toggle information in the first processor core.
  8. The apparatus according to any one of claims 1 to 7, wherein the frequency modulation controller is further configured to:
    obtain a load amount of the first processor core; and
    determine the target frequency based on the at least one load type when the load amount satisfies a preset condition.
  9. The apparatus according to claim 8, wherein the frequency modulation controller is specifically configured to:
    look up current performance information of the processor in a first mapping table corresponding to a first load type based on a current operating frequency of the target object, wherein the first mapping table comprises a mapping relationship between the operating frequency of the target object and performance information of the processor, and the first load type is the type that occurs most often among the at least one load type;
    adjust the current performance information based on the load amount to obtain target performance information; and
    look up a first target frequency of the target object in the first mapping table based on the target performance information, and use the first target frequency as the target frequency of the target object.
  10. The apparatus according to claim 9, wherein the first mapping table further comprises a mapping relationship between the operating frequency of the target object and power consumption of the processor;
    and the frequency modulation controller is further configured to:
    look up a second target frequency of the target object in the first mapping table based on a power consumption constraint of the processor; and
    when the first target frequency and the second target frequency are different, determine that the target frequency of the target object is the second target frequency.
  11. The apparatus according to claim 8, wherein the at least one load type is m load types, m being an integer greater than 1; each of the m load types corresponds to one mapping table, and the mapping table comprises a mapping relationship between the operating frequency of the target object and performance information of the processor; and the frequency modulation controller is specifically configured to:
    look up one piece of current performance information of the processor in each of the mapping tables based on a current operating frequency of the target object;
    adjust each of the m pieces of current performance information based on the load amount to obtain m pieces of target performance information;
    look up m groups of first optimized frequencies of the target object in the m mapping tables based on the m pieces of target performance information; and
    process the m groups of first optimized frequencies based on load distribution weights of the m load types to obtain a third target frequency, and use the third target frequency as the target frequency of the target object, wherein the load distribution weights indicate the proportion of the load of each of the m types.
  12. The apparatus according to claim 11, wherein the m mapping tables further comprise a mapping relationship between the operating frequency of the target object and power consumption of the processor;
    and the frequency modulation controller is further configured to:
    look up m groups of second optimized frequencies of the target object in the m mapping tables respectively, based on a power consumption constraint of the processor;
    process the m groups of second optimized frequencies based on the load distribution weights to obtain a fourth target frequency of the target object; and
    when the third target frequency and the fourth target frequency are different, determine that the target frequency of the target object is the fourth target frequency.
  13. The apparatus according to any one of claims 1 to 12, wherein the load types comprise a compute-type load, a cache-dependent load, and a main-memory-dependent load.
  14. A processing method, applied to a processing apparatus, wherein the processing apparatus comprises a processor, a frequency modulation controller, and a frequency modulator, and the processor comprises a first processor core; the method comprising performing the following operations by the frequency modulation controller:
    obtaining at least one load type of the first processor core;
    determining a target frequency of a target object based on the at least one load type, wherein the target object comprises a memory; and
    invoking, based on the target frequency, the frequency modulator to adjust an operating frequency of the target object.
  15. The method according to claim 14, wherein the memory comprises at least one of a main memory and an asynchronous cache.
  16. The method according to claim 14 or 15, wherein the target object further comprises the first processor core, and the method further comprises performing the following operation by the frequency modulation controller:
    invoking, based on a target frequency of the first processor core, the frequency modulator to adjust an operating frequency of the first processor core, wherein a mapping relationship exists between the target frequency of the first processor core and the target frequency of the memory.
  17. The method according to claim 16, wherein the processor further comprises at least one second processor core, and the first processor core and the at least one second processor core form a cluster; the method further comprising:
    determining, by the frequency modulation controller, an operating frequency of the cluster according to the target frequency of the first processor core.
  18. The method according to any one of claims 14 to 17, wherein the processor comprises n first processor cores, n being a positive integer, and the method further comprises performing the following operations by the frequency modulation controller:
    determining one target frequency of the memory for the at least one load type of each of the first processor cores;
    determining an optimized operating frequency of the memory based on the n target frequencies of the memory; and
    invoking the frequency modulator to adjust the operating frequency of the memory to the optimized operating frequency.
  19. The method according to any one of claims 14 to 18, wherein the processing apparatus further comprises a load classifier, and the method further comprises:
    obtaining, by the load classifier, load classification feature information in the first processor core, and classifying a load in the first processor core based on the load classification feature information to obtain the at least one load type; and
    obtaining, by the frequency modulation controller, the at least one load type from the load classifier.
  20. The method according to claim 19, wherein the load classification feature information comprises clock signal toggle information in the first processor core.
  21. The method according to any one of claims 14 to 20, further comprising:
    obtaining, by the frequency modulation controller, a load amount of the first processor core;
    wherein the determining a target frequency of a target object based on the at least one load type comprises:
    determining the target frequency based on the at least one load type when the load amount satisfies a preset condition.
  22. The method according to claim 21, wherein the determining the target frequency based on the at least one load type comprises:
    looking up current performance information of the processor in a first mapping table corresponding to a first load type based on a current operating frequency of the target object, wherein the first mapping table comprises a mapping relationship between the operating frequency of the target object and performance information of the processor, and the first load type is the type that occurs most often among the at least one load type;
    adjusting the current performance information based on the load amount to obtain target performance information; and
    looking up a first target frequency of the target object in the first mapping table based on the target performance information, and using the first target frequency as the target frequency of the target object.
  23. The method according to claim 22, wherein the first mapping table further comprises a mapping relationship between the operating frequency of the target object and power consumption of the processor, and the method further comprises performing the following operations by the frequency modulation controller:
    looking up a second target frequency of the target object in the first mapping table based on a power consumption constraint of the processor; and
    when the first target frequency and the second target frequency are different, determining that the target frequency of the target object is the second target frequency.
  24. The method according to claim 21, wherein the at least one load type is m load types, m being an integer greater than 1; each of the m load types corresponds to one mapping table, and the mapping table comprises a mapping relationship between the operating frequency of the target object and performance information of the processor;
    and the determining the target frequency based on the at least one load type comprises:
    looking up one piece of current performance information of the processor in each of the mapping tables based on a current operating frequency of the target object;
    adjusting each of the m pieces of current performance information based on the load amount to obtain m pieces of target performance information;
    looking up m groups of first optimized frequencies of the target object in the m mapping tables based on the m pieces of target performance information; and
    processing the m groups of first optimized frequencies based on load distribution weights of the m load types to obtain a third target frequency, and using the third target frequency as the target frequency of the target object, wherein the load distribution weights indicate the proportion of the load of each of the m types.
  25. The method according to claim 24, wherein the m mapping tables further comprise a mapping relationship between the operating frequency of the target object and power consumption of the processor, and the method further comprises performing the following operations by the frequency modulation controller:
    looking up m groups of second optimized frequencies of the target object in the m mapping tables respectively, based on a power consumption constraint of the processor;
    processing the m groups of second optimized frequencies based on the load distribution weights to obtain a fourth target frequency of the target object; and
    when the third target frequency and the fourth target frequency are different, determining that the target frequency of the target object is the fourth target frequency.
  26. The method according to any one of claims 14 to 25, wherein the load types comprise a compute-type load, a cache-dependent load, and a main-memory-dependent load.
  27. An electronic device, comprising the processing apparatus according to any one of claims 1 to 13, and a discrete device coupled to the processing apparatus.
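
For illustration only, the table-lookup flow recited in claims 9 to 12 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the table contents, the load-adjustment rule (`load / headroom`), the reading of the power-constrained frequency as the highest frequency not above the performance-driven one that still fits the power cap, the use of a weighted average to combine per-type results, and every function and variable name are hypothetical and are not specified by the claims.

```python
# Hypothetical per-load-type mapping tables. Each row is
# (operating frequency in MHz, performance score, power in mW).
# All values are invented for illustration only.
TABLES = {
    "compute": [(800, 40, 300), (1200, 60, 500), (1600, 80, 800), (2000, 100, 1200)],
    "memory": [(800, 50, 350), (1200, 58, 550), (1600, 62, 850), (2000, 64, 1250)],
}


def first_frequency(table, cur_freq, load, headroom=0.8):
    """Performance-driven lookup: current performance -> target -> frequency."""
    cur_perf = next(perf for f, perf, _ in table if f == cur_freq)
    target_perf = cur_perf * load / headroom  # assumed adjustment rule
    for f, perf, _ in table:  # rows are sorted by ascending frequency
        if perf >= target_perf:
            return f
    return table[-1][0]  # target unreachable: fall back to the top frequency


def second_frequency(table, first, power_cap):
    """Power-driven lookup (assumed): highest frequency <= first within the cap."""
    allowed = [f for f, _, p in table if f <= first and p <= power_cap]
    return max(allowed) if allowed else table[0][0]


def single_type_target(load_type, cur_freq, load, power_cap):
    """Single dominant load type: the power-constrained result wins if it differs."""
    table = TABLES[load_type]
    first = first_frequency(table, cur_freq, load)
    second = second_frequency(table, first, power_cap)
    return second if second != first else first


def weighted_target(weights, cur_freq, load, power_cap):
    """Several load types: combine per-type results by load distribution weight."""
    firsts, seconds = {}, {}
    for t in weights:
        firsts[t] = first_frequency(TABLES[t], cur_freq, load)
        seconds[t] = second_frequency(TABLES[t], firsts[t], power_cap)
    total = sum(weights.values())
    third = sum(firsts[t] * w for t, w in weights.items()) / total
    fourth = sum(seconds[t] * w for t, w in weights.items()) / total
    return fourth if fourth != third else third
```

A call such as `weighted_target({"compute": 0.75, "memory": 0.25}, 800, 1.0, 900)` blends the per-type frequencies in proportion to the load mix. The blended value need not coincide with a supported frequency step, so a real controller would presumably snap it to a supported level before invoking the frequency modulator.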
PCT/CN2021/074555 2020-12-31 2021-01-30 Processing apparatus, processing method, and related device WO2022141735A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202180088258.XA CN116710904A (en) 2020-12-31 2021-01-30 Processing device, processing method and related equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2020/142512 2020-12-31
CN2020142512 2020-12-31

Publications (1)

Publication Number Publication Date
WO2022141735A1 true WO2022141735A1 (en) 2022-07-07

Family

ID=82259805

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074555 WO2022141735A1 (en) 2020-12-31 2021-01-30 Processing apparatus, processing method, and related device

Country Status (2)

Country Link
CN (1) CN116710904A (en)
WO (1) WO2022141735A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070277046A1 (en) * 2006-05-29 2007-11-29 Yoshiko Yasuda Power management method for information platform
US20100191988A1 (en) * 2006-06-13 2010-07-29 Via Technologies, Inc. Method for reducing power consumption of a computer system in the working state
CN106959930A (en) * 2017-03-31 2017-07-18 深圳市金立通信设备有限公司 A kind of method of control memory, device and terminal
US20170212581A1 (en) * 2016-01-25 2017-07-27 Qualcomm Incorporated Systems and methods for providing power efficiency via memory latency control
CN111459682A (en) * 2020-04-09 2020-07-28 Oppo广东移动通信有限公司 Frequency adjustment method, frequency adjustment device, electronic device and storage medium

Also Published As

Publication number Publication date
CN116710904A (en) 2023-09-05

Similar Documents

Publication Publication Date Title
CN110096349B (en) Job scheduling method based on cluster node load state prediction
GB2544609B (en) Granular quality of service for computing resources
US7814485B2 (en) System and method for adaptive power management based on processor utilization and cache misses
EP3274827B1 (en) Technologies for offloading and on-loading data for processor/coprocessor arrangements
CN101488098B (en) Multi-core computing resource management system based on virtual computing technology
US7689838B2 (en) Method and apparatus for providing for detecting processor state transitions
US8943340B2 (en) Controlling a turbo mode frequency of a processor
US9766933B2 (en) Fine-grained capacity management of computing environments that may support a database
KR101471303B1 (en) Device and method of power management for graphic processing unit
Arshad et al. Utilizing power consumption and SLA violations using dynamic VM consolidation in cloud data centers
JP5695766B2 (en) Multi-core system energy consumption optimization
WO2013137860A1 (en) Dynamically computing an electrical design point (edp) for a multicore processor
CN106528266A (en) Resource dynamic adjustment method and device in cloud computing system
WO2023015788A1 (en) Serverless computing resource allocation system for energy consumption optimization
CN113672383A (en) Cloud computing resource scheduling method, system, terminal and storage medium
CN111190735B (en) On-chip CPU/GPU pipelining calculation method based on Linux and computer system
WO2022246759A1 (en) Power consumption adjustment method and apparatus
KR101770736B1 (en) Method for reducing power consumption of system software using query scheduling of application and apparatus for reducing power consumption using said method
US10942850B2 (en) Performance telemetry aided processing scheme
CN108574600B (en) Service quality guarantee method for power consumption and resource competition cooperative control of cloud computing server
US20220214917A1 (en) Method and system for optimizing rack server resources
Rapp et al. NPU-accelerated imitation learning for thermal-and QoS-aware optimization of heterogeneous multi-cores
WO2022141735A1 (en) Processing apparatus, processing method, and related device
CN109582119B (en) Double-layer Spark energy-saving scheduling method based on dynamic voltage frequency adjustment
Huo et al. An energy efficient task scheduling scheme for heterogeneous GPU-enhanced clusters

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21912513

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180088258.X

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21912513

Country of ref document: EP

Kind code of ref document: A1