WO2021073130A1 - Frequency adjustment method and apparatus for a processor, and computing device - Google Patents

Frequency adjustment method and apparatus for a processor, and computing device

Info

Publication number
WO2021073130A1
Authority
WO
WIPO (PCT)
Prior art keywords
core
frequency
cores
target process
kernel
Prior art date
Application number
PCT/CN2020/095550
Other languages
English (en)
French (fr)
Inventor
胡耀国
黄靖淞
赵辉昌
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2021073130A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/26 Power supply means, e.g. regulation thereof
    • G06F1/32 Means for saving power
    • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
    • G06F1/3234 Power saving characterised by the action undertaken
    • G06F1/324 Power saving characterised by the action undertaken by lowering clock frequency
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Definitions

  • This application relates to the field of computer technology, and in particular to a frequency adjustment method and apparatus for a processor, and a computing device.
  • Processors usually include multiple cores.
  • The multiple cores can process tasks in parallel.
  • Each of the multiple cores processes a part of the task. After the multiple cores have finished processing their respective parts, one of the cores summarizes the processing results of all the cores to obtain the processing result of the task.
  • The present application provides a frequency adjustment method and apparatus for a processor, and a computing device, which can improve the effective utilization of processor resources.
  • The technical solution is as follows:
  • In a first aspect, a frequency adjustment method for a processor is provided. The method includes: obtaining the invalid utilization rate of each of multiple cores in a target process in which the multiple cores of the processor process a target task in parallel; reducing the frequency of a first core among the multiple cores in the target process; and increasing the frequency of a second core among the multiple cores in the target process; wherein the invalid utilization rate of the first core is higher than an invalid utilization threshold, and the invalid utilization rate of the second core is lower than the invalid utilization threshold.
  • The invalid utilization rate is used to characterize the utilization of core resources, and the invalid utilization rate of any core is negatively correlated with the calculation amount of that core in the process of processing the target task. A higher invalid utilization rate indicates that the current frequency of the core is relatively high, and a lower invalid utilization rate indicates that the current frequency of the core is relatively low.
  • While the multiple cores of the processor process the target task in parallel, the computing device obtains the invalid utilization rate of each of the multiple cores and then, according to the invalid utilization rates, reduces the frequency of the first core in the target process and increases the frequency of the second core in the target process, where the invalid utilization rate of the first core is higher than the invalid utilization threshold and the invalid utilization rate of the second core is lower than the invalid utilization threshold.
  • In other words, the frequency of a core with a higher invalid utilization rate is reduced in the target process, and the frequency of a core with a lower invalid utilization rate is increased in the target process.
  • This reduces the waiting time of the core with the higher invalid utilization rate and further reduces the time the processor takes to process the target task, thereby improving the efficiency of the multiple cores in processing the target task and the effective utilization of processor resources.
  • Obtaining the invalid utilization rate of each of the multiple cores includes: obtaining the number of calculations of any one of the multiple cores in at least one unit time period in the target process; and determining the ratio of the idle duration corresponding to that core to the total duration of the at least one unit time period as the invalid utilization rate of that core. The idle duration corresponding to a core is the total duration of the idle unit time periods of that core within the at least one unit time period, and the number of calculations of the core in each of its idle unit time periods is less than the threshold of the number of calculations.
  • The number of calculations may be the sum of the numbers of calculations that the core performs on at least one kind of parameter among floating-point numbers and vectors.
  • Alternatively, the number of calculations may be the sum of the numbers of calculations that the core performs on at least one kind of parameter among integers and vectors, which is not limited in this embodiment of the application.
  • Each calculation performed by any core corresponds to an event.
  • The processor includes a register that stores the event corresponding to each calculation.
  • The computing device can read the register in each unit time period to obtain the number of events stored in the register in that unit time period. This number is the number of calculations of the core in that unit time period.
  • The computing device can be preset with a threshold of the number of calculations. Because a core usually performs a large number of calculations when processing the target task, if the number of calculations of a core in some unit time period is less than the threshold of the number of calculations, the core has not executed the target task in that unit time period, that is, it is in an empty running state. The computing device may then determine that unit time period as an idle unit time period of the core. After all the idle unit time periods of a core within the at least one unit time period have been determined, the idle duration of the core can be determined, and the invalid utilization rate of the core can be obtained from it.
  • Reducing the frequency of the first core among the multiple cores in the target process includes: reducing the frequency of the first core in the target process based on the reference frequency of the first core, wherein, after the reduction, the frequency of the first core is less than or equal to the reference frequency of the first core, and the reference frequency of the first core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization rate of the first core.
  • Increasing the frequency of the second core among the multiple cores in the target process includes: increasing the frequency of the second core in the target process based on the reference frequency of the second core, wherein, after the increase, the frequency of the second core is less than or equal to the reference frequency of the second core, and the reference frequency of the second core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization rate of the second core.
  • The computing device may first determine the first core and the second core among the multiple cores based on the invalid utilization rate of each of the multiple cores. It then reduces the frequency of the first core in the target process based on the reference frequency of the first core, and increases the frequency of the second core in the target process based on the reference frequency of the second core.
  • The method further includes: determining an average value of the invalid utilization rates of the multiple cores as the invalid utilization threshold.
  • The average value may be an arithmetic average, geometric average, square average, harmonic average, or weighted average.
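  • As a minimal sketch of the averaging options listed above, the following Python function computes each kind of average over a list of per-core invalid utilization rates. The function name, the `kind` labels, and the sample rates are hypothetical and are not taken from the application; the geometric and harmonic variants assume all rates are positive.

```python
import math
import statistics


def invalid_utilization_threshold(rates, kind="arithmetic", weights=None):
    """Return one kind of average of the per-core invalid utilization rates.

    `kind` labels are illustrative names for the averages the text lists.
    """
    if kind == "arithmetic":
        return statistics.fmean(rates)
    if kind == "geometric":
        # Assumes every rate is strictly positive.
        return statistics.geometric_mean(rates)
    if kind == "square":
        # Square (quadratic) average: root of the mean of the squares.
        return math.sqrt(sum(r * r for r in rates) / len(rates))
    if kind == "harmonic":
        return statistics.harmonic_mean(rates)
    if kind == "weighted":
        if weights is None:
            raise ValueError("weights are required for a weighted average")
        return sum(w * r for w, r in zip(weights, rates)) / sum(weights)
    raise ValueError(f"unknown average kind: {kind}")
```

  • With hypothetical rates such as `[0.1, 0.5, 0.05, 0.3]`, the choice of average changes the threshold and can therefore change which cores count as first cores and which as second cores.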
  • The computing device can determine the reference frequency of each core from the average value and a target formula.
  • The target formula expresses s' in terms of s, a, and b, where s' represents the reference frequency of any one of the multiple cores, s represents the current frequency of that core, a represents the invalid utilization threshold, and b represents the invalid utilization rate of that core.
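  • The target formula itself is not reproduced in this text. As an assumption only, one form that is positively correlated with the difference a − b, and that is consistent with the figures quoted in the embodiment (a threshold of 0.2375, a current frequency of 100 MHz, and reference frequencies of 73.75, 93.75, 113.75 and 118.75 MHz), is s' = s × (1 + a − b). The sketch below uses that assumed form; the per-core rates in the usage note are back-solved from it and are not stated in the source.

```python
def reference_frequency(s, a, b):
    """Assumed target formula: s' = s * (1 + a - b).

    s: current frequency of the core
    a: invalid utilization threshold
    b: invalid utilization rate of the core
    This form is an assumption, not quoted from the application.
    """
    return s * (1.0 + a - b)
```

  • Under this assumption, a core at 100 MHz with an invalid utilization rate of 0.5 and a threshold of 0.2375 gets a reference frequency of about 73.75 MHz, and a core with a rate of 0.05 gets about 118.75 MHz.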
  • The computing device may reduce the frequency of the first core in the process of processing the target task to the reference frequency of the first core, and increase the frequency of the second core in the process of processing the target task to the reference frequency of the second core.
  • Each of the cores has a frequency threshold, and the frequency of a core in the process of processing the target task cannot exceed its frequency threshold. Therefore, before increasing the frequency of the second core among the multiple cores in the target process, the method further includes: determining the minimum of two parameters, namely the reference frequency of the second core and the frequency threshold of the second core. Increasing the frequency of the second core in the target process then includes: increasing the frequency of the second core in the target process to that minimum. This prevents the frequency of the second core from exceeding its frequency threshold after the increase.
  • The processor in the computing device further includes multiple power supply interfaces in one-to-one correspondence with the multiple cores, and each power supply interface provides a voltage to its corresponding core to drive that core to process the target task.
  • The computing device can reduce the voltage of the power supply interface corresponding to the first core to reduce the frequency of the first core in the process of processing the target task, and can increase the voltage of the power supply interface corresponding to the second core to increase the frequency of the second core in the process of processing the target task.
  • If the frequency of a core in the target process is too high, the processing stability of the core is affected. Therefore, after the frequency of the first core in the target process is reduced, the frequency of the first core is less than or equal to the reference frequency of the first core, and after the frequency of the second core in the target process is increased, the frequency of the second core is less than or equal to the reference frequency of the second core. In this way, the frequencies of the first core and the second core in the target process are prevented from being too high, which reduces the impact on the processing stability of the first core and the second core.
  • The method further includes: after reducing the frequency of the first core in the target process and increasing the frequency of the second core in the target process, detecting whether the target process has ended; and when the target process has not ended, repeating the process of obtaining the invalid utilization rates, reducing the frequency of the first core in the target process, and increasing the frequency of the second core in the target process.
  • Because the multiple cores execute the target task in the processor, the target task may be periodic or aperiodic.
  • When the target task is periodic, the operating scenarios of the multiple cores are almost the same in each cycle, so the process of obtaining the invalid utilization rates, reducing the frequency of the first core in the target process, and increasing the frequency of the second core in the target process may be executed only once to obtain the final frequency adjustment scheme.
  • When the target task is aperiodic, the operating scenarios of the multiple cores differ from one time period to another.
  • In that case, when the computing device detects that the multiple cores have not finished processing the target task, it can repeat the process of obtaining the invalid utilization rates, reducing the frequency of the first core in the target process, and increasing the frequency of the second core in the target process, so as to adjust the frequencies of the first core and the second core in real time. This more effectively reduces the difference in processing time between the multiple cores in the process of processing the target task and further improves the effective utilization of processor resources.
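  • The repeat-until-finished behaviour described above can be sketched as a small control loop. This is an illustration under stated assumptions, not the applicant's implementation: `target_running`, `measure` and `apply` are hypothetical callbacks standing in for "detect whether the target process has ended", "obtain the invalid utilization rates" and "reduce or increase the core frequencies".

```python
from typing import Callable, Dict


def tune_until_done(
    target_running: Callable[[], bool],
    measure: Callable[[], Dict[str, float]],
    apply: Callable[[Dict[str, float]], None],
    periodic: bool,
) -> int:
    """Run the measure-then-adjust round and return how many rounds ran.

    For a periodic target task one round is treated as sufficient; for an
    aperiodic task the round repeats while the target process is running.
    """
    rounds = 0
    while True:
        apply(measure())  # obtain invalid utilization, then adjust frequencies
        rounds += 1
        if periodic or not target_running():
            return rounds
```

  • Checking for the end of the target process after each adjustment, rather than before it, mirrors the order given in the text: adjust first, then detect whether the target process has ended.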
  • The sum of the frequency reductions of all the first cores among the multiple cores is greater than or equal to the sum of the frequency increases of all the second cores among the multiple cores.
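  • A quick way to check this constraint for a proposed set of frequencies is sketched below. The function name and the dictionary layout are hypothetical; the sample numbers in the usage note are the 100 MHz current frequencies and the reference frequencies quoted in the embodiment.

```python
def respects_frequency_budget(current_mhz, target_mhz):
    """Return True if total reductions are at least as large as total increases.

    Both arguments map a core name to a frequency in MHz.
    """
    decrease = sum(max(current_mhz[c] - target_mhz[c], 0.0) for c in current_mhz)
    increase = sum(max(target_mhz[c] - current_mhz[c], 0.0) for c in current_mhz)
    return decrease >= increase
```

  • For example, with every core at 100 MHz and targets of 113.75, 73.75, 118.75 and 93.75 MHz for cores a, b, c and d, the reductions (26.25 + 6.25) equal the increases (13.75 + 18.75), so the constraint holds with equality.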
  • In a second aspect, a frequency adjustment apparatus for a processor is provided. The apparatus includes modules for executing the frequency adjustment method for a processor according to any one of the first aspect.
  • In a third aspect, a computer-readable storage medium is provided, and a computer program is stored in the storage medium.
  • When the computer program is executed by a processor, the frequency adjustment method for a processor according to any one of the first aspect is implemented.
  • In a fourth aspect, a chip is provided. The chip includes a programmable logic circuit and/or program instructions, and is used to implement the frequency adjustment method for a processor according to any one of the first aspect when the chip is running.
  • In a fifth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, it causes the computer to execute the frequency adjustment method for a processor according to any one of the first aspect.
  • In a sixth aspect, a computing device is provided. The computing device includes a memory and a processor, wherein the processor is configured to execute a program stored in the memory to implement the frequency adjustment method for a processor according to any one of the first aspect.
  • FIG. 1 is a flowchart of a frequency adjustment method for a processor according to an embodiment of the application;
  • FIG. 2 is a schematic diagram of a scenario in which multiple cores in a computing device process a target task according to an embodiment of the application;
  • FIG. 3 is a flowchart of a method for obtaining the invalid utilization rate of each of multiple cores according to an embodiment of the application;
  • FIG. 4 is a schematic diagram of threads running in multiple cores according to an embodiment of the application;
  • FIG. 5 is another schematic diagram of threads running in multiple cores according to an embodiment of the application;
  • FIG. 6 is a block diagram of a frequency adjustment apparatus for a processor according to an embodiment of the application;
  • FIG. 7 is a block diagram of another frequency adjustment apparatus for a processor according to an embodiment of the application;
  • FIG. 8 is a schematic structural diagram of a computing device provided by an embodiment of this application.
  • The computing device includes a processor, and the processor usually includes multiple cores.
  • The multiple cores can process tasks in parallel. When multiple cores process a task in parallel, each core processes a part of the task. After the multiple cores have finished processing their respective parts, one of the cores summarizes the processing results of all the cores to obtain the processing result of the task.
  • Before a task is processed in parallel by multiple cores, the computing device needs to distribute the task to the multiple cores. However, the task is usually not evenly distributed among the cores.
  • For example, before multiple cores process a task, the computing device generates a calculation model according to the task and allocates the calculation model to the multiple cores. Because the calculation model is usually relatively complex, the parts of the task allocated to the cores are not uniform, and their calculation amounts differ considerably, so the processing time differs from core to core.
  • A core that finishes sooner then sits in an idling state, which lowers the effective utilization of processor resources. The effective utilization of a core is positively correlated with the amount of calculation the core performs in the process of processing the task.
  • The related art provides a processor acceleration (turbo) technology for increasing the processing speed of multiple cores, and a hardware performance state (HWP) technology.
  • In both technologies, the processor first detects whether each core is working. In the turbo technology, the processor then controls the power supply system, within the thermal design power (TDP) range of the processor, to increase the power supply voltage of the cores in the working state and reduce the power supply voltage of the cores in the non-working state.
  • This increases the operating frequency of the cores in the working state and reduces the operating frequency of the cores in the non-working state, thereby increasing the processing speed of the cores in the working state.
  • In the HWP technology, the processor reduces the operating frequency of the cores in the non-working state to reduce the processing power consumption of the multiple cores.
  • However, when multiple cores process tasks in parallel, in some application scenarios (such as HPC application scenarios), a faster core that is in the empty running state is still judged by the processor to be in the working state. As a result, in the turbo technology, the processor increases the operating frequency of every core.
  • In the HWP technology, the processor does not reduce the operating frequency of any core. The faster core therefore remains in the idling state, which affects the effective utilization of processor resources. Neither of the two methods provided by the related art improves the effective utilization of processor resources.
  • The embodiment of the present application provides a frequency adjustment method for a processor.
  • The method can be applied to a module other than the multiple cores in the processor included in a computing device; for example, to an uncore module or a management module in the processor.
  • Alternatively, the method can be applied to a module other than the processor in the computing device; for example, to a baseboard management controller (BMC).
  • Alternatively, the method may be applied to an external device different from the computing device in which the processor is located, which is not limited in the embodiment of the present application.
  • FIG. 1 is a flowchart of a frequency adjustment method for a processor provided by an embodiment of the application.
  • FIG. 1 is described using the example in which the method is applied to a computing device.
  • The method may include:
  • Step 101: In the target process in which the multiple cores of the processor process the target task in parallel, obtain the invalid utilization rate of each of the multiple cores.
  • The multiple cores are used to process the target task.
  • The multiple cores may be all of the cores included in the processor, or only some of them.
  • The processor may be a central processing unit (CPU) or a graphics processing unit (GPU).
  • FIG. 2 is a schematic diagram of a scenario in which multiple cores in a computing device process a target task according to an embodiment of the application.
  • This scenario includes a computing device 20, which includes a processor 201, and the processor 201 includes multiple cores (four are shown in FIG. 2), and the multiple cores include a core a, a core b, a core c, and a core d.
  • FIG. 2 takes one thread running in each core as an example: thread a1 runs in core a, thread b1 runs in core b, thread c1 runs in core c, and thread d1 runs in core d.
  • The invalid utilization rate is used to characterize the utilization of core resources, and the invalid utilization rate of any core is negatively correlated with the calculation amount of that core in the target process.
  • The multiple cores can start processing the target task first, and step 101 is performed a period of time after processing starts, so that the elapsed processing time is not too short to reflect how each core processes the target task.
  • The period of time may be three minutes, four minutes, five minutes, or more than ten minutes.
  • FIG. 3 is a flowchart of a method for obtaining the invalid utilization rate of each of multiple cores provided by an embodiment of the application. The method may include:
  • Step 1011: Obtain the number of calculations of any one of the multiple cores in at least one unit time period in the target process.
  • For example, the unit time period may be 1 millisecond (ms), in which case the at least one unit time period may be 300,000 unit time periods (that is, the total duration of the at least one unit time period is five minutes).
  • The number of calculations may be the sum of the numbers of calculations that the core performs on at least one kind of parameter among floating-point numbers and vectors. If the core performs integer calculations in the target process, the number of calculations can be the sum of the numbers of calculations that the core performs on at least one kind of parameter among integers and vectors, which is not limited in this embodiment of the application.
  • Each calculation performed by any core corresponds to an event.
  • The processor includes a register that stores the event corresponding to each calculation.
  • The computing device can read the register in each unit time period to obtain the number of events stored in the register in that unit time period. This number is the number of calculations of the core in that unit time period.
  • For example, the processor includes a performance monitor unit (PMU) register that stores multiple pieces of PMU data, each of which records an event.
  • The computing device can read the PMU data stored in the PMU register in each unit time period to obtain the number of calculations of the core in that unit time period.
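  • As an illustration of turning periodic register reads into per-period calculation counts, the sketch below assumes the register behaves as a cumulative event counter that is sampled once at each unit-time-period boundary and may wrap around. The cumulative behaviour, the counter width, and the function name are assumptions made for the example and are not stated in the source.

```python
def counts_per_period(samples, counter_bits=48):
    """Convert cumulative counter samples into per-period event counts.

    samples: counter values read at consecutive period boundaries.
    counter_bits: assumed width of the counter, used to undo wrap-around.
    """
    modulus = 1 << counter_bits
    return [
        (later - earlier) % modulus
        for earlier, later in zip(samples, samples[1:])
    ]
```

  • For instance, samples of 0, 1500, 1500 and 4000 yield counts of 1500, 0 and 2500 for the three unit time periods between them.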
  • Step 1012: Determine the ratio of the idle duration corresponding to any core to the total duration of the at least one unit time period as the invalid utilization rate of that core. The idle duration corresponding to a core is the total duration of the idle unit time periods of that core within the at least one unit time period, and the number of calculations of the core in each of its idle unit time periods is less than the threshold of the number of calculations.
  • The computing device can be preset with a threshold of the number of calculations. Because a core usually performs a large number of calculations in the target process, if the number of calculations of a core in some unit time period is less than the threshold of the number of calculations, the core has not executed the target task in that unit time period, that is, it is in the empty running state. The computing device may then determine that unit time period as an idle unit time period of the core. After all the idle unit time periods of a core within the at least one unit time period have been determined, the idle duration of the core can be determined, and the invalid utilization rate of the core can be obtained from it.
  • The threshold of the number of calculations may be 800, 900, 1000, or 1100. Assume that the threshold of the number of calculations is 1000, the unit time period is 1 ms, and the at least one unit time period is 300,000 unit time periods (that is, the total duration is 5 minutes).
  • the multiple cores include core a, core b, core c, and core d.
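  • Steps 1011 and 1012 can be sketched as follows: count the unit time periods whose number of calculations falls below the threshold, and divide the resulting idle duration by the total duration. The function name and the sample counts are hypothetical; the default threshold of 1000 and the 1 ms period are the example values given in the text.

```python
def invalid_utilization(counts, count_threshold=1000, period_ms=1.0):
    """Return idle duration / total duration for one core.

    counts: number of calculations observed in each unit time period.
    A period is idle when its count is strictly below count_threshold.
    """
    if not counts:
        raise ValueError("at least one unit time period is required")
    idle_periods = sum(1 for c in counts if c < count_threshold)
    idle_ms = idle_periods * period_ms
    total_ms = len(counts) * period_ms
    return idle_ms / total_ms
```

  • With counts of 1500, 200, 3000 and 999 in four consecutive 1 ms periods, two periods are idle and the invalid utilization rate is 0.5. A period with exactly 1000 calculations is not counted as idle, because the text requires the count to be less than the threshold.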
  • Step 102: Determine the first core and the second core among the multiple cores based on the invalid utilization rate of each of the multiple cores.
  • Each of the multiple cores is determined as a first core or a second core.
  • When the invalid utilization rate of any core is higher than the invalid utilization threshold, that core is determined as a first core.
  • When the invalid utilization rate of any core is lower than the invalid utilization threshold, that core is determined as a second core.
  • The invalid utilization threshold may be the average value of the invalid utilization rates of the multiple cores.
  • The average value may be an arithmetic average, a geometric average, a square average, a harmonic average, or a weighted average, which is not limited in the embodiment of the present application.
  • the average value is an arithmetic average value
  • the average value is a square average value
  • For example, the average value of the invalid utilization rates of core a, core b, core c, and core d is 0.2375.
  • The invalid utilization rates of core a and core c are both less than the invalid utilization threshold, and the invalid utilization rates of core b and core d are both greater than the invalid utilization threshold.
  • The computing device may therefore determine core a and core c as second cores, and core b and core d as first cores.
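  • Step 102 can be sketched as a comparison of each core's invalid utilization rate against the arithmetic average. The per-core rates used in the usage note (0.1, 0.5, 0.05 and 0.3) are hypothetical values chosen only so that their average is the 0.2375 quoted in the text; the source does not list them. How a core whose rate exactly equals the threshold should be treated is not specified, so this sketch leaves such a core in neither group.

```python
from statistics import fmean


def split_cores(rates):
    """Classify cores by invalid utilization rate against the arithmetic mean.

    rates: mapping of core name to invalid utilization rate.
    Returns (threshold, first_cores, second_cores).
    """
    threshold = fmean(rates.values())
    first = sorted(c for c, r in rates.items() if r > threshold)   # to slow down
    second = sorted(c for c, r in rates.items() if r < threshold)  # to speed up
    return threshold, first, second
```

  • Calling `split_cores({"a": 0.1, "b": 0.5, "c": 0.05, "d": 0.3})` gives a threshold of about 0.2375, first cores b and d, and second cores a and c, which matches the grouping in the example.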
  • Alternatively, the computing device may determine the magnitude relationship between the invalid utilization rate of each core and the invalid utilization threshold from the magnitude relationship between the current frequency of each core and its reference frequency, and thereby determine the first core and the second core among the multiple cores. The reference frequency of any core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization rate of that core. When the current frequency of a core is greater than its reference frequency, the invalid utilization rate of the core is higher than the invalid utilization threshold, and the core can be determined as a first core. When the current frequency of a core is less than its reference frequency, the invalid utilization rate of the core is lower than the invalid utilization threshold, and the core can be determined as a second core.
  • The computing device may determine the reference frequency of each of the multiple cores based on the invalid utilization rates of the multiple cores and the invalid utilization threshold.
  • For example, the computing device may be preset with a one-to-one correspondence between invalid utilization rates of a core and reference frequencies, and may look up the reference frequency in this correspondence according to the invalid utilization rate of any of the multiple cores.
  • The computing device then determines the relationship between the invalid utilization rate of each core and the invalid utilization threshold from the relationship between the current frequency of each core and its reference frequency, and further determines the first core and the second core among the multiple cores.
  • Alternatively, the reference frequency of each core may be determined from the invalid utilization threshold and the target formula.
  • For example, the invalid utilization threshold is the average value of the invalid utilization rates of the multiple cores, and the invalid utilization threshold is 0.2375.
  • The greater the difference between the invalid utilization threshold and the invalid utilization rate of a core, the greater the reference frequency of that core; the smaller the difference, the smaller the reference frequency of that core.
  • The current frequencies of core a and core c are both less than their respective reference frequencies, and the current frequencies of core b and core d are both greater than their respective reference frequencies.
  • The computing device may therefore determine core a and core c as second cores, and core b and core d as first cores.
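  • The frequency-based variant of the classification can be sketched as below. The function name and dictionary layout are hypothetical; the numbers in the usage note are the 100 MHz current frequencies and the reference frequencies quoted in the embodiment for cores a to d.

```python
def split_by_reference(current_mhz, reference_mhz):
    """Classify cores by comparing current frequency with reference frequency.

    A core running above its reference frequency is a first core (to be
    slowed down); a core running below it is a second core (to be sped up).
    """
    first = sorted(c for c in current_mhz if current_mhz[c] > reference_mhz[c])
    second = sorted(c for c in current_mhz if current_mhz[c] < reference_mhz[c])
    return first, second
```

  • With all four cores at 100 MHz and reference frequencies of 113.75 (a), 73.75 (b), 118.75 (c) and 93.75 (d) MHz, the function returns b and d as first cores and a and c as second cores, the same grouping obtained by comparing invalid utilization rates with the threshold.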
  • Step 103: Reduce the frequency of the first core among the multiple cores in the target process.
  • The computing device may reduce the frequency of the first core in the target process based on the reference frequency of the first core.
  • After the frequency of the first core in the target process is reduced, the frequency of the first core is less than or equal to the reference frequency of the first core. Because an excessively high core frequency in the target process affects the processing stability of the core, keeping the reduced frequency at or below the reference frequency prevents the frequency of the first core from being too high in the target process, thereby reducing the impact on the processing stability of the first core.
  • For example, core b and core d are the first cores, and their current frequencies are both 100 MHz.
  • The reference frequency of core b is 73.75 MHz, and the reference frequency of core d is 93.75 MHz.
  • The computing device can reduce the frequency of core b in the target process to 73.75 MHz or 73 MHz, and reduce the frequency of core d in the target process to 93.75 MHz or 93 MHz.
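  • The example allows either the reference frequency itself or a slightly lower value (73.75 MHz or 73 MHz). One way to pick such a value, assuming the hardware only supports frequencies in fixed steps, is to round the reference frequency down to the nearest step. The step size and the function name are hypothetical and are used here only to illustrate the "less than or equal to the reference frequency" condition.

```python
import math


def lowered_frequency(reference_mhz, step_mhz=1.0):
    """Round the reference frequency down to a multiple of step_mhz.

    The result never exceeds reference_mhz, as the text requires.
    """
    return math.floor(reference_mhz / step_mhz) * step_mhz
```

  • With a 1 MHz step, 73.75 MHz becomes 73 MHz and 93.75 MHz becomes 93 MHz; with a 0.25 MHz step, the reference frequencies are kept unchanged.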
  • the processor in the computing device further includes multiple power supply interfaces corresponding to the multiple cores one-to-one, and the multiple power supply interfaces are used to provide voltages to the corresponding cores to drive the corresponding cores to process target tasks.
  • the computing device can reduce the voltage of the power supply interface corresponding to the first core to reduce the frequency of the first core in the target process.
  • Step 104 Increase the frequency of the second core among the multiple cores in the target process.
  • the computing device can increase the voltage of the power supply interface corresponding to the second core to increase the frequency of the second core in the target process.
  • the computing device may increase the frequency of the second core in the target process based on the reference frequency of the second core. After the increase, the frequency of the second core is less than or equal to the reference frequency of the second core.
  • because an excessively high core frequency in the target process affects the processing stability of the core, keeping the frequency of the second core at or below its reference frequency prevents that frequency from being too high, thereby reducing the impact on the processing stability of the second core.
  • the core a and the core c are the second cores.
  • the current frequencies of core a and core c are both 100 MHz.
  • the reference frequency of core a is 113.75 MHz
  • the reference frequency of core c is 118.75 MHz.
  • the computing device can increase the frequency of core a in the target process to 113.75 MHz or 113 MHz, and increase the frequency of core c in the target process to 118.75 MHz or 118 MHz.
  • each of the multiple cores has a frequency threshold. Since the frequency of each core in the process of processing the target task cannot be greater than its frequency threshold, before performing step 104 the computing device can first determine the minimum of two parameters: the reference frequency of the second core and the frequency threshold of the second core. The frequency of the second core in the target process is then increased to that minimum, so that the frequency of the second core does not exceed the frequency threshold after the increase.
  • as the frequencies of the cores of the processor keep increasing, the available power of the processor's power supply system keeps decreasing, so it may become impossible to raise the voltage of a core, which affects the stability of the multiple cores.
  • if the frequency of the first core in the target process is reduced first and the frequency of the second core is increased afterwards, the available power in the power supply system of the processor increases after the reduction. In this way, the frequency of the second core in the target process can be increased as much as possible without affecting the stability of the multiple cores.
  • the sum of the reduced values of the frequencies of all the first cores in the plurality of cores may be greater than or equal to the sum of the increased values of the frequencies of all the second cores in the plurality of cores.
  • the sum of the reduced values of the frequencies of all the first cores in the plurality of cores may also be less than the sum of the increased values of the frequencies of all the second cores in the plurality of cores, which is not limited in this embodiment of the application.
  • FIG. 4 is a schematic diagram of a thread running in multiple cores according to an embodiment of the application
  • FIG. 5 is another schematic diagram of threads running in multiple cores according to an embodiment of the application.
  • Figure 4 shows threads running in multiple cores when the frequency of multiple cores in the target process is not changed
  • Figure 5 shows threads running in multiple cores after changing the frequency of multiple cores in the target process.
  • FIG. 4 and FIG. 5 both take the multiple cores including core a, core b, core c, and core d as examples for description.
  • thread a1 runs in core a
  • thread b1 runs in core b
  • thread c1 runs in core c
  • a thread d1 runs in the core d.
  • the thread b1 and the thread d1 need to wait for the thread a1 and the thread c1, so there are idle thread segments.
  • Step 105 Detect whether the target process is over.
  • Step 106 When the target process is not finished, repeat the process of obtaining the invalid utilization, reducing the frequency of the first core in the target process, and increasing the frequency of the second core in the target process.
  • the target task may be periodic or aperiodic. When the target task is periodic, the operating scenarios of the multiple cores are almost the same in each period, so step 101 to step 103 can be performed only once to obtain the final frequency modulation scheme.
  • when the target task is aperiodic, the operating scenarios of the multiple cores differ from one time period to the next. Therefore, when the computing device detects that the target process is not over, it can repeat step 101 to step 103 to adjust the frequencies of the first core and the second core in real time. The difference in processing time of the multiple cores in the target process can thereby be reduced more effectively, and the effective utilization of processor resources can be further improved.
  • in summary, in the frequency modulation method of the processor provided by the embodiment of the present application, the computing device obtains the invalid utilization rate of each of the multiple cores during the target process in which the multiple cores of the processor process the target task in parallel, and then, according to the invalid utilization rate, reduces the frequency of the first core in the target process and increases the frequency of the second core in the target process. The invalid utilization rate of the first core is higher than the invalid utilization threshold, and the invalid utilization rate of the second core is lower than the invalid utilization threshold.
  • as a result, the frequency of a core with a higher invalid utilization rate is reduced in the target process, and the frequency of a core with a lower invalid utilization rate is increased in the target process.
  • the waiting time of the core with the higher invalid utilization rate is reduced, and the time the processor takes to process the target task is further reduced, thereby improving the efficiency of the multiple cores in processing the target task and the effective utilization of processor resources.
  • the foregoing steps are described using a computing device performing frequency modulation on multiple cores included in one processor as an example.
  • in practical applications, a computing device may also perform frequency modulation on multiple cores included in multiple processors.
  • for the frequency modulation process of each core, refer to the aforementioned step 101 to step 106; details are not repeated in the embodiment of the present application.
  • step 103 and step 104 can be performed at the same time.
  • as the operating frequency of a processor increases, its voltage requirements become stricter; therefore, to maintain the stability of the processor, step 104 can also be performed before step 103. This is not limited in the embodiment of the present application.
  • FIG. 6 is a block diagram of a frequency modulation device for a processor according to an embodiment of the application.
  • the frequency modulation device 300 for a processor includes:
  • the obtaining module 301 is used to obtain the invalid utilization rate of each of the multiple cores during the target process in which the multiple cores of the processor process the target task in parallel.
  • the invalid utilization rate of any core is negatively correlated with the amount of calculation of that core in the target process.
  • the frequency modulation module 302 is used to reduce the frequency of the first core among the plurality of cores in the target process.
  • the frequency modulation module 302 is also used to increase the frequency of the second core among the multiple cores in the target process.
  • the invalid utilization rate of the first core is higher than the invalid utilization threshold, and the invalid utilization rate of the second core is lower than the invalid utilization threshold.
  • the obtaining module 301 is used to:
  • obtain the number of calculations of any one of the multiple cores in at least one unit time period of the target process;
  • determine the ratio of the idle duration corresponding to that core to the total duration of the at least one unit time period as the invalid utilization rate of that core, where the idle duration corresponding to that core is the total duration of the idle unit time periods corresponding to that core within the at least one unit time period, and the number of calculations of that core in each corresponding idle unit time period is less than the calculation count threshold.
  • the frequency modulation module 302 is used for:
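  • reduce the frequency of the first core in the target process based on the reference frequency of the first core, where after the reduction the frequency of the first core is less than or equal to the reference frequency of the first core;
  • the reference frequency of the first core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization of the first core.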
  • the frequency modulation module 302 is used for:
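  • increase the frequency of the second core in the target process based on the reference frequency of the second core, where after the increase the frequency of the second core is less than or equal to the reference frequency of the second core;
  • the reference frequency of the second core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization of the second core.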
  • FIG. 7 shows a block diagram of another frequency modulation device for a processor provided by an embodiment of the present application.
  • the frequency modulation device 300 for the processor further includes:
  • the determining module 303 is configured to determine the average value of the invalid utilization rate of multiple cores as the invalid utilization rate threshold.
  • the frequency modulation apparatus 300 of the processor further includes:
  • the detection module 304 is configured to detect whether the target process is over after the frequency of the first core in the target process has been reduced and the frequency of the second core in the target process has been increased.
  • the repetition module 305 is used to repeat, when the target process is not finished, the process of obtaining the invalid utilization, reducing the frequency of the first core in the target process, and increasing the frequency of the second core in the target process.
  • the sum of the reduced values of the frequencies of all the first cores in the plurality of cores is greater than or equal to the sum of the increased values of the frequencies of all the second cores in the plurality of cores.
  • in summary, in the frequency modulation device of the processor provided by the embodiment of the present application, after the obtaining module obtains the invalid utilization rate of each of the multiple cores during the target process in which the multiple cores of the processor process the target task in parallel, the frequency modulation module can reduce the frequency of the first core in the target process according to the invalid utilization rate.
  • the frequency modulation module can also increase the frequency of the second core in the target process according to the invalid utilization rate. The invalid utilization rate of the first core is higher than the invalid utilization threshold,
  • and the invalid utilization rate of the second core is lower than the invalid utilization threshold.
  • as a result, the frequency of a core with a higher invalid utilization rate is reduced in the target process, and the frequency of a core with a lower invalid utilization rate is increased in the target process.
  • the waiting time of the core with the higher invalid utilization rate is reduced, and the time the processor takes to process the target task is further reduced, thereby improving the efficiency of the multiple cores in processing the target task and the effective utilization of processor resources.
  • the embodiment of the present application provides a computer-readable storage medium in which a computer program is stored.
  • when the computer program is executed by a processor, any of the processor frequency modulation methods provided in the embodiment of the present application is implemented.
  • An embodiment of the present application provides a chip, which includes a programmable logic circuit and/or program instructions, and is used to implement any processor frequency modulation method provided in the embodiments of the present application when the chip is running.
  • the embodiments of the present application provide a computer program product containing instructions.
  • when the computer program product runs on a computer, the computer executes any of the processor frequency modulation methods provided in the embodiments of the present application.
  • the foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when software is used, the embodiments may be implemented in whole or in part in the form of a computer program product, which includes one or more computer instructions.
  • the computer may be a general-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium, or a semiconductor medium (for example, a solid state hard disk).
  • An embodiment of the present application provides a computing device that includes a memory and a processor, where the processor is used to execute a program stored in the memory to implement the frequency modulation method of any processor provided in the embodiment of the present application.
  • the processor may include multiple cores and a frequency modulation device of any processor provided in the embodiments of the present application.
  • the processor may include multiple cores and the chip provided in the embodiments of the present application.
  • the computing device may further include: a frequency modulation device of any processor provided in the embodiment of the present application.
  • the computing device may also include the chip provided in the embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a computing device provided by an embodiment of the present application.
  • the embodiment of the present application takes FIG. 8 as an example to describe the case in which the processor includes multiple cores and the frequency modulation device of any processor of the foregoing embodiments.
  • the computing device 40 includes: a memory 401 and a processor 402.
  • the memory 401 is used to store a program
  • the processor 402 is used to execute the program stored in the memory 401 to implement the frequency modulation method of any processor provided in the embodiment of the present application.
  • the computing device 40 may further include at least one communication interface 403 and at least one communication bus 404.
  • the memory 401, the processor 402, and the communication interface 403 are communicatively connected through a communication bus 404.
  • the communication interface 403 is used to communicate with other devices under the control of the processor 402, and the processor 402 can call a program stored in the memory 401 through the communication bus 404.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Power Sources (AREA)

Abstract

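A frequency modulation method and apparatus for a processor, and a computing device, belonging to the field of computer technologies. The frequency modulation method for a processor includes: in a target process in which multiple cores of the processor process a target task in parallel, obtaining an invalid utilization rate of each of the multiple cores; reducing the frequency, in the target process, of a first core among the multiple cores; and increasing the frequency, in the target process, of a second core among the multiple cores, where the invalid utilization rate of the first core is higher than an invalid utilization threshold and the invalid utilization rate of the second core is lower than the invalid utilization threshold. The method can improve the effective utilization of processor resources and can be used to adjust the frequencies of the multiple cores of a processor.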

Description

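Frequency modulation method and apparatus for processor, and computing device
Technical Field
This application relates to the field of computer technologies, and in particular to a frequency modulation method and apparatus for a processor, and a computing device.
Background
With the development of computer technologies, current processors generally include multiple cores, which can process a task in parallel.
When multiple cores process a task in parallel, each core processes a part of the task. After all the cores have finished their own parts, one of the cores aggregates the results of the individual cores to obtain the processing result of the task.
When multiple cores process a task in parallel, the processing loads of the cores usually differ considerably, so their processing durations differ. Before the task is completed, the faster cores may be in an idle-running state, which results in low effective utilization of processor resources.
Summary
This application provides a frequency modulation method and apparatus for a processor, and a computing device, which can improve the effective utilization of processor resources. The technical solutions are as follows: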
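According to a first aspect, a frequency modulation method for a processor is provided. The method includes: in a target process in which multiple cores of a processor process a target task in parallel, obtaining an invalid utilization rate of each of the multiple cores; reducing the frequency of a first core among the multiple cores in the target process; and increasing the frequency of a second core among the multiple cores in the target process, where the invalid utilization rate of the first core is higher than an invalid utilization threshold and the invalid utilization rate of the second core is lower than the invalid utilization threshold. The invalid utilization rate characterizes the usage of core resources, and the invalid utilization rate of any core is negatively correlated with the amount of calculation of that core in the process of processing the target task. A higher invalid utilization rate of a core indicates that the current frequency of the core is higher, and a lower invalid utilization rate indicates that the current frequency of the core is lower.
It should be noted that the computing device obtains the invalid utilization rate of each of the multiple cores during the target process in which the multiple cores of the processor process the target task in parallel, and then, according to the invalid utilization rate, reduces the frequency of the first core in the target process and increases the frequency of the second core in the target process, where the invalid utilization rate of the first core is higher than the invalid utilization threshold and the invalid utilization rate of the second core is lower than the invalid utilization threshold. As a result, the frequency of a core with a higher invalid utilization rate is reduced and the frequency of a core with a lower invalid utilization rate is increased. This reduces the waiting time of the core with the higher invalid utilization rate and further reduces the time the processor takes to process the target task, thereby improving the efficiency of the multiple cores in processing the target task and the effective utilization of processor resources.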
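Optionally, obtaining the invalid utilization rate of each of the multiple cores includes: obtaining the number of calculations of any one of the multiple cores in at least one unit time period of the target process; and determining the ratio of the idle duration corresponding to that core to the total duration of the at least one unit time period as the invalid utilization rate of that core, where the idle duration corresponding to that core is the total duration of the idle unit time periods corresponding to that core within the at least one unit time period, and the number of calculations of that core in each corresponding idle unit time period is less than a calculation count threshold.
For example, if a core performs floating-point calculations when processing the target task, the number of calculations may be the sum of the numbers of times the core calculates at least one of floating-point parameters and vector parameters. If the core performs integer calculations when processing the target task, the number of calculations may be the sum of the numbers of times the core calculates at least one of integer parameters and vector parameters. This is not limited in the embodiments of this application. Each calculation performed by a core corresponds to one event, and the processor includes a register that stores the event corresponding to each calculation. When obtaining the invalid utilization rate of a core, the computing device may read the register in each unit time period to obtain the number of events stored in the register in that unit time period. That number is the number of calculations of the core in that unit time period.
A calculation count threshold may be preset in the computing device. Because a core usually performs a large number of calculations when processing the target task, a number of calculations below the calculation count threshold in a given unit time period indicates that the core did not execute the target task in that unit time period, that is, the core was idle-running. The computing device may then determine that unit time period to be an idle unit time period. For any core, once all of its idle unit time periods within the at least one unit time period have been determined, the idle duration of the core can be determined, which yields the invalid utilization rate of the core.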
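Optionally, reducing the frequency of the first core among the multiple cores in the target process includes: reducing the frequency of the first core in the target process based on a reference frequency of the first core, where after the reduction the frequency of the first core is less than or equal to the reference frequency of the first core, and the reference frequency of the first core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization rate of the first core.
Optionally, increasing the frequency of the second core among the multiple cores in the target process includes: increasing the frequency of the second core in the target process based on a reference frequency of the second core, where after the increase the frequency of the second core is less than or equal to the reference frequency of the second core, and the reference frequency of the second core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization rate of the second core.
For example, the computing device may first determine the first core and the second core among the multiple cores based on the invalid utilization rate of each core, then reduce the frequency of the first core in the target process based on the reference frequency of the first core, and increase the frequency of the second core in the target process based on the reference frequency of the second core.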
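Optionally, the method further includes: determining the average of the invalid utilization rates of the multiple cores as the invalid utilization threshold. For example, the average may be an arithmetic mean, a geometric mean, a quadratic mean, a harmonic mean, or a weighted mean. The computing device may determine the reference frequency of each core from the average and a target formula. The target formula may be s' = s × (1 + a − b), where s' denotes the reference frequency of any one of the multiple cores, s denotes the current frequency of that core, a denotes the invalid utilization threshold, and b denotes the invalid utilization rate of that core. The computing device may reduce the frequency of the first core in the process of processing the target task to the reference frequency of the first core, and increase the frequency of the second core in the process of processing the target task to the reference frequency of the second core.
Optionally, each core has a frequency threshold. Because the frequency of a core in the process of processing the target task cannot be greater than its frequency threshold, before the frequency of the second core is increased the method further includes: determining the minimum of two parameters, namely the reference frequency of the second core and its frequency threshold. Increasing the frequency of the second core in the target process then includes: increasing the frequency of the second core in the target process to that minimum. This prevents the frequency of the second core from exceeding the frequency threshold after the increase.
Optionally, the processor in the computing device further includes multiple power supply interfaces in one-to-one correspondence with the multiple cores. The power supply interfaces provide voltages to the corresponding cores to drive them to process the target task. The computing device may reduce the voltage of the power supply interface corresponding to the first core to reduce the frequency of the first core in the process of processing the target task, and increase the voltage of the power supply interface corresponding to the second core to increase the frequency of the second core in the process of processing the target task.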
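It should be noted that an excessively high core frequency in the target process affects the processing stability of the core. Therefore, after the frequency of the first core is reduced, it is less than or equal to the reference frequency of the first core, and after the frequency of the second core is increased, it is less than or equal to the reference frequency of the second core. This prevents the frequencies of the first core and the second core from being too high in the target process, thereby reducing the impact on their processing stability.
Optionally, the method further includes: after reducing the frequency of the first core in the target process and increasing the frequency of the second core in the target process, detecting whether the target process has ended; and when the target process has not ended, repeating the process of obtaining the invalid utilization rate, reducing the frequency of the first core in the target process, and increasing the frequency of the second core in the target process.
It should be noted that the target task executed by the multiple cores may be periodic or aperiodic. When the target task is periodic, the operating scenarios of the multiple cores are almost the same in each period, so the process of obtaining the invalid utilization rate, reducing the frequency of the first core, and increasing the frequency of the second core may be performed only once to obtain the final frequency modulation scheme. When the target task is aperiodic, the operating scenarios of the multiple cores differ from one time period to the next, so when the computing device detects that the multiple cores have not finished processing the target task, it may repeat that process to adjust the frequencies of the first core and the second core in real time. This reduces the difference in processing durations of the multiple cores more effectively and further improves the effective utilization of processor resources.
Optionally, the sum of the frequency reductions of all first cores among the multiple cores is greater than or equal to the sum of the frequency increases of all second cores among the multiple cores. This ensures that the total voltage required by the multiple cores after the adjustment is less than or equal to the total voltage required before the adjustment, which avoids the situation in which the available power of the processor's power supply system is insufficient to adjust the frequencies of the multiple cores, and thereby reduces the impact on the stability of the multiple cores.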
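According to a second aspect, a frequency modulation apparatus for a processor is provided. The apparatus includes modules for performing the frequency modulation method for a processor according to any implementation of the first aspect.
According to a third aspect, a computer-readable storage medium is provided. The storage medium stores a computer program, and when the computer program is executed by a processor, the frequency modulation method for a processor according to any implementation of the first aspect is implemented.
According to a fourth aspect, a chip is provided. The chip includes a programmable logic circuit and/or program instructions, and when the chip runs, it implements the frequency modulation method for a processor according to any implementation of the first aspect.
According to a fifth aspect, a computer program product containing instructions is provided. When the computer program product runs on a computer, the computer is caused to perform the frequency modulation method for a processor according to any implementation of the first aspect.
According to a sixth aspect, a computing device is provided. The computing device includes a memory and a processor, where the processor is configured to execute a program stored in the memory to implement the frequency modulation method for a processor according to any implementation of the first aspect.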
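Brief Description of the Drawings
FIG. 1 is a flowchart of a frequency modulation method for a processor according to an embodiment of this application;
FIG. 2 is a schematic diagram of a scenario in which multiple cores in a computing device process a target task according to an embodiment of this application;
FIG. 3 is a flowchart of a method for obtaining the invalid utilization rate of each of multiple cores according to an embodiment of this application;
FIG. 4 is a schematic diagram of threads running in multiple cores according to an embodiment of this application;
FIG. 5 is another schematic diagram of threads running in multiple cores according to an embodiment of this application;
FIG. 6 is a block diagram of a frequency modulation apparatus for a processor according to an embodiment of this application;
FIG. 7 is a block diagram of another frequency modulation apparatus for a processor according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a computing device according to an embodiment of this application.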
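Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
A computing device includes a processor, and a processor generally includes multiple cores that can process a task in parallel. When multiple cores process a task in parallel, each core processes a part of the task. After all the cores have finished their own parts, one of the cores aggregates the results of the individual cores to obtain the processing result of the task.
Before multiple cores process a task in parallel, a computing device needs to distribute the task to the cores, but the task is usually not distributed evenly. For example, in the field of high performance computing (HPC), before multiple cores process a task, the computing device generates a computing model from the task and distributes the computing model to the cores. Because the computing model is usually complex, the parts distributed to the cores are uneven and their amounts of calculation differ considerably, so the processing durations of the cores differ. Before the task is completed, the faster cores may be in an idle-running state, which results in low effective utilization of processor resources. The effective utilization of a core is positively correlated with the amount of calculation of the core in the process of processing the task.
The related art provides two techniques for adjusting the frequencies of the multiple cores of a processor: processor acceleration (turbo) technology, which increases the processing speed of the cores, and hardware performance state (HWP) technology, which reduces the processing power consumption of the cores. With either technique, the processor detects whether each core is in a working state. With turbo technology, the processor then controls the power supply system, within the thermal design power (TDP) of the processor, to raise the supply voltage of cores in the working state and lower the supply voltage of cores in the non-working state, so that the operating frequency of working cores rises and that of non-working cores falls, which increases the processing speed of the working cores. With HWP technology, the processor lowers the operating frequency of cores in the non-working state to reduce the processing power consumption of the cores. However, when multiple cores process a task in parallel, in some application scenarios (for example, HPC scenarios) the processor still regards a faster core that is idle-running as being in the working state. As a result, with turbo technology the processor raises the operating frequency of every core, and with HWP technology the processor does not lower the operating frequency of any core. The faster cores therefore remain idle-running, which harms the effective utilization of processor resources. Neither of the two methods provided by the related art can improve the effective utilization of processor resources.
An embodiment of this application provides a frequency modulation method for a processor. The method may be applied to a module of the processor of a computing device other than the multiple cores, for example an uncore module or a management module of the processor. Alternatively, the method may be applied to a module of the computing device other than the processor, for example a baseboard management controller (BMC). Alternatively, the method may be applied to an external device different from the computing device in which the processor is located. This is not limited in the embodiments of this application. For example, FIG. 1 is a flowchart of a frequency modulation method for a processor according to an embodiment of this application. FIG. 1 is described using the case in which the method is applied to a computing device. The method may include: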
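Step 101: In a target process in which multiple cores of a processor process a target task in parallel, obtain the invalid utilization rate of each of the multiple cores.
The multiple cores are used to process the target task. They may be all of the cores included in the processor or only some of them. The processor may be a central processing unit (CPU) or a graphics processing unit (GPU).
In the target process in which the multiple cores process the target task, when multithreading is used so that multiple threads can run in parallel, one thread runs in each core; when multithreading is not used, one process runs in each core. For example, refer to FIG. 2, which is a schematic diagram of a scenario in which multiple cores in a computing device process a target task according to an embodiment of this application. The scenario includes a computing device 20, the computing device 20 includes a processor 201, and the processor 201 includes multiple cores (four are shown in FIG. 2): core a, core b, core c, and core d. FIG. 2 is described using the case in which one thread runs in each core: thread a1 runs in core a, thread b1 runs in core b, thread c1 runs in core c, and thread d1 runs in core d.
The invalid utilization rate characterizes the usage of core resources, and the invalid utilization rate of any core is negatively correlated with the amount of calculation of that core in the target process. A higher invalid utilization rate of a core indicates that the current frequency of the core is higher, and a lower invalid utilization rate indicates that the current frequency of the core is lower. Optionally, the multiple cores may first start processing the target task, and step 101 is performed some time after processing has started, so that the invalid utilization rate of each core can be obtained even though a very short processing time would not allow it. For example, this period may be three minutes, four minutes, five minutes, or more than ten minutes.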
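For example, refer to FIG. 3, which is a flowchart of a method for obtaining the invalid utilization rate of each of multiple cores according to an embodiment of this application. The method may include:
Step 1011: Obtain the number of calculations of any one of the multiple cores in at least one unit time period of the target process.
Optionally, the unit time period may be 1 millisecond (ms), in which case the at least one unit time period may be three hundred thousand unit time periods (that is, the total duration of the at least one unit time period is five minutes).
Optionally, if a core performs floating-point calculations in the target process, the number of calculations may be the sum of the numbers of times the core calculates at least one of floating-point parameters and vector parameters. If the core performs integer calculations in the target process, the number of calculations may be the sum of the numbers of times the core calculates at least one of integer parameters and vector parameters. This is not limited in the embodiments of this application.
Each calculation performed by a core corresponds to one event, and the processor includes a register that stores the event corresponding to each calculation. The computing device may read the register in each unit time period to obtain the number of events stored in the register in that unit time period. That number is the number of calculations of the core in that unit time period. For example, the processor includes a performance monitor unit (PMU) register that stores multiple pieces of PMU data, each of which stores one event. For any one of the multiple cores, the computing device may obtain the number of calculations of that core in each unit time period by reading the PMU data stored in the PMU register in each unit time period.
Step 1012: Determine the ratio of the idle duration corresponding to a core to the total duration of the at least one unit time period as the invalid utilization rate of that core, where the idle duration corresponding to the core is the total duration of the idle unit time periods corresponding to the core within the at least one unit time period, and the number of calculations of the core in each corresponding idle unit time period is less than a calculation count threshold.
A calculation count threshold may be preset in the computing device. Because a core usually performs a large number of calculations in the target process, a number of calculations below the calculation count threshold in a given unit time period indicates that the core did not execute the target task in that unit time period, that is, the core was idle-running. The computing device may then determine that unit time period to be an idle unit time period corresponding to the core. For any core, once all of its idle unit time periods within the at least one unit time period have been determined, the idle duration corresponding to the core can be determined, which yields the invalid utilization rate of the core.
For example, the calculation count threshold may be 800, 900, 1000, or 1100. Assume that the calculation count threshold is 1000, the unit time period is 1 ms, and the at least one unit time period is three hundred thousand unit time periods (a total duration of 5 minutes). The multiple cores include core a, core b, core c, and core d. Core a has 30,000 idle unit time periods among the 300,000 (an idle duration of 0.5 minutes), so its invalid utilization rate is 0.5/5 = 0.1. Core b has 150,000 idle unit time periods (an idle duration of 2.5 minutes), so its invalid utilization rate is 2.5/5 = 0.5. Core c has 15,000 idle unit time periods (an idle duration of 0.25 minutes), so its invalid utilization rate is 0.25/5 = 0.05. Core d has 90,000 idle unit time periods (an idle duration of 1.5 minutes), so its invalid utilization rate is 1.5/5 = 0.3.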
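The ratio described in step 1012 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `invalid_utilization` and the sample counts are assumptions, and the per-period calculation counts are taken as already read (how they are read from a PMU register is hardware specific and is not modelled here). Because all unit time periods have the same length, the ratio of idle duration to total duration equals the fraction of periods that are idle.

```python
from typing import Sequence


def invalid_utilization(op_counts: Sequence[int], count_threshold: int) -> float:
    """Fraction of equal-length unit time periods whose calculation count
    is below the threshold (i.e. idle duration / total duration)."""
    if not op_counts:
        raise ValueError("at least one unit time period is required")
    idle_periods = sum(1 for count in op_counts if count < count_threshold)
    return idle_periods / len(op_counts)


# Hypothetical samples: ten unit time periods, one of which is idle.
samples = [5000, 4800, 12, 5100, 4950, 5020, 4700, 5300, 4990, 5050]
print(invalid_utilization(samples, count_threshold=1000))
```

With 300,000 periods of which 30,000 are idle, the same function would return the 0.1 given for core a in the example above.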
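Step 102: Determine the first core and the second core among the multiple cores based on the invalid utilization rate of each of the multiple cores.
Because the invalid utilization rate of a first core is higher than the invalid utilization threshold and that of a second core is lower than the invalid utilization threshold, the computing device may determine each of the multiple cores to be a first core or a second core by comparing its invalid utilization rate with the invalid utilization threshold. When the invalid utilization rate of a core is higher than the invalid utilization threshold, the core is determined to be a first core. When the invalid utilization rate of a core is lower than the invalid utilization threshold, the core is determined to be a second core.
The invalid utilization threshold may be the average of the invalid utilization rates of the multiple cores. Optionally, the average may be an arithmetic mean, a geometric mean, a quadratic mean, a harmonic mean, or a weighted mean, which is not limited in the embodiments of this application. For example, referring to step 101 above, if the average is the arithmetic mean, the average of the invalid utilization rates of core a, core b, core c, and core d is (0.1 + 0.5 + 0.05 + 0.3)/4 = 0.2375. If the average is the quadratic mean, the average of the invalid utilization rates of core a, core b, core c, and core d is √((0.1² + 0.5² + 0.05² + 0.3²)/4) ≈ 0.2969.
When the invalid utilization threshold is 0.2375, referring to step 101 above, the invalid utilization rates of core a and core c are both less than the invalid utilization threshold, and those of core b and core d are both greater than the invalid utilization threshold. The computing device may determine core a and core c to be second cores, and core b and core d to be first cores.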
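The classification in step 102 can be illustrated with a short sketch. It assumes the arithmetic mean is chosen as the threshold (the text also allows other means); the function name `classify_cores` is hypothetical. A core whose invalid utilization rate equals the threshold exactly falls into neither group, a case the text does not address.

```python
from statistics import mean
from typing import Dict, List, Tuple


def classify_cores(invalid_util: Dict[str, float]) -> Tuple[float, List[str], List[str]]:
    """Return (threshold, first_cores, second_cores) using the arithmetic mean."""
    threshold = mean(list(invalid_util.values()))
    # First cores: invalid utilization above the threshold (to be slowed down).
    first = sorted(c for c, u in invalid_util.items() if u > threshold)
    # Second cores: invalid utilization below the threshold (to be sped up).
    second = sorted(c for c, u in invalid_util.items() if u < threshold)
    return threshold, first, second


threshold, first, second = classify_cores({"a": 0.1, "b": 0.5, "c": 0.05, "d": 0.3})
print(round(threshold, 4), first, second)
```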
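For example, the computing device may determine the relationship between the invalid utilization rate of each core and the invalid utilization threshold from the relationship between the current frequency of the core and its reference frequency, and thereby determine the first core and the second core among the multiple cores. The reference frequency of a core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization rate of the core. When the current frequency of a core is greater than its reference frequency, the invalid utilization rate of the core is higher than the invalid utilization threshold, and the core may be determined to be a first core. When the current frequency of a core is less than its reference frequency, the invalid utilization rate of the core is lower than the invalid utilization threshold, and the core may be determined to be a second core.
The computing device may determine the reference frequency of each core based on the invalid utilization rates of the multiple cores and the invalid utilization threshold. Alternatively, a one-to-one correspondence between invalid utilization rates and reference frequencies may be preset in the computing device, and the computing device may look up the reference frequency corresponding to the invalid utilization rate of a core directly in that correspondence. The computing device then determines the relationship between the invalid utilization rate of each core and the invalid utilization threshold from the relationship between the current frequency of the core and its reference frequency, and thereby determines the first core and the second core among the multiple cores.
For example, when determining the reference frequency of each core based on the invalid utilization rates of the multiple cores and the invalid utilization threshold, the computing device may use the invalid utilization threshold and a target formula. The target formula may be s' = s × (1 + a − b), where s' denotes the reference frequency of any one of the multiple cores, s denotes the current frequency of that core, a denotes the invalid utilization threshold, and b denotes the invalid utilization rate of that core.
For example, referring to step 101 and step 102 above, assume that the current frequencies of core a, core b, core c, and core d are all 100 megahertz (MHz) and that the invalid utilization threshold is the average of the invalid utilization rates of the multiple cores, namely 0.2375. The reference frequency of core a is 100 × (1 + 0.2375 − 0.1) = 113.75 MHz, that of core b is 100 × (1 + 0.2375 − 0.5) = 73.75 MHz, that of core c is 100 × (1 + 0.2375 − 0.05) = 118.75 MHz, and that of core d is 100 × (1 + 0.2375 − 0.3) = 93.75 MHz. It can be seen that the greater the difference between the invalid utilization threshold and the invalid utilization rate of a core, the greater the reference frequency of the core, and the smaller the difference, the smaller the reference frequency. The current frequencies of core a and core c are both less than their reference frequencies, and the current frequencies of core b and core d are both greater than their reference frequencies. The computing device may determine core a and core c to be second cores, and core b and core d to be first cores.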
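The target formula s' = s × (1 + a − b) can be checked against the worked numbers with a few lines of code. This is only a sketch of the arithmetic; the function and variable names are assumptions introduced for illustration.

```python
def reference_frequency(current_mhz: float, threshold: float, invalid_util: float) -> float:
    """Reference frequency s' = s * (1 + a - b) from the target formula."""
    return current_mhz * (1 + threshold - invalid_util)


# Values from the worked example: all cores at 100 MHz, threshold 0.2375.
invalid_util = {"a": 0.1, "b": 0.5, "c": 0.05, "d": 0.3}
reference = {
    core: reference_frequency(100.0, 0.2375, u) for core, u in invalid_util.items()
}
for core, freq in reference.items():
    # A reference above the current 100 MHz marks a second core, below it a first core.
    print(core, round(freq, 2))
```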
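Step 103: Reduce the frequency of the first core among the multiple cores in the target process.
Optionally, referring to step 102 above, the computing device may reduce the frequency of the first core in the target process based on the reference frequency of the first core. After the reduction, the frequency of the first core is less than or equal to the reference frequency of the first core. Because an excessively high core frequency in the target process affects the processing stability of the core, keeping the frequency of the first core at or below its reference frequency prevents that frequency from being too high, thereby reducing the impact on the processing stability of the first core.
For example, referring to step 101 and step 102 above, among core a, core b, core c, and core d, core b and core d are first cores. Before step 103 is performed, the current frequencies of core b and core d are both 100 MHz. The reference frequency of core b is 73.75 MHz, and the reference frequency of core d is 93.75 MHz. The computing device may reduce the frequency of core b in the target process to 73.75 MHz or 73 MHz, and reduce the frequency of core d in the target process to 93.75 MHz or 93 MHz.
Optionally, the processor in the computing device further includes multiple power supply interfaces in one-to-one correspondence with the multiple cores. The power supply interfaces provide voltages to the corresponding cores to drive them to process the target task. The computing device may reduce the voltage of the power supply interface corresponding to the first core to reduce the frequency of the first core in the target process.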
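Step 104: Increase the frequency of the second core among the multiple cores in the target process.
The computing device may increase the voltage of the power supply interface corresponding to the second core to increase the frequency of the second core in the target process. Optionally, referring to step 102 above, the computing device may increase the frequency of the second core in the target process based on the reference frequency of the second core. After the increase, the frequency of the second core is less than or equal to the reference frequency of the second core. Because an excessively high core frequency in the target process affects the processing stability of the core, keeping the frequency of the second core at or below its reference frequency prevents that frequency from being too high, thereby reducing the impact on the processing stability of the second core.
For example, referring to step 101 and step 102 above, among core a, core b, core c, and core d, core a and core c are second cores. Before step 104 is performed, the current frequencies of core a and core c are both 100 MHz. The reference frequency of core a is 113.75 MHz, and the reference frequency of core c is 118.75 MHz. The computing device may increase the frequency of core a in the target process to 113.75 MHz or 113 MHz, and increase the frequency of core c in the target process to 118.75 MHz or 118 MHz.
It should be noted that each of the multiple cores has a frequency threshold. Because the frequency of a core in the process of processing the target task cannot be greater than its frequency threshold, before step 104 is performed the computing device may first determine the minimum of two parameters: the reference frequency of the second core and the frequency threshold of the second core. The frequency of the second core in the target process is then increased to that minimum, so that the frequency of the second core does not exceed the frequency threshold after the increase.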
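The cap described above amounts to taking the smaller of the reference frequency and the frequency threshold. A minimal sketch follows; the function name and the 115 MHz limit are hypothetical values chosen only for illustration.

```python
def capped_target_frequency(reference_mhz: float, frequency_threshold_mhz: float) -> float:
    """Target for a second core: the smaller of its reference frequency
    and its frequency threshold, so the threshold is never exceeded."""
    return min(reference_mhz, frequency_threshold_mhz)


# Core c has a reference frequency of 118.75 MHz; assume a 115 MHz frequency threshold.
print(capped_target_frequency(118.75, 115.0))
# Core a has a reference frequency of 113.75 MHz, which is already under the assumed threshold.
print(capped_target_frequency(113.75, 115.0))
```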
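On the other hand, as the frequencies of the cores of a processor keep increasing, the available power of the processor's power supply system keeps decreasing, so it may become impossible to raise the voltage of a core, which affects the stability of the multiple cores. In the embodiments of this application, if the frequency of the first core in the target process is reduced first and the frequency of the second core is increased afterwards, the available power in the processor's power supply system increases after the reduction. In this way, the frequency of the second core in the target process can be increased as much as possible without affecting the stability of the multiple cores.
Optionally, the sum of the frequency reductions of all first cores among the multiple cores may be greater than or equal to the sum of the frequency increases of all second cores among the multiple cores. This ensures that the total voltage required by the multiple cores after the adjustment is less than or equal to the total voltage required before the adjustment, which avoids the situation in which the available power of the processor's power supply system is insufficient to adjust the frequencies of the multiple cores, and thereby reduces the impact on the stability of the multiple cores. Optionally, the sum of the frequency reductions of all first cores may also be less than the sum of the frequency increases of all second cores, which is not limited in the embodiments of this application.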
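The optional condition on the sums of reductions and increases can be expressed as a simple check. The sketch below is an assumption about how one might verify a planned adjustment before applying it; the function name and the tolerance parameter are illustrative and not part of the described method.

```python
from typing import Dict


def reductions_cover_increases(
    current_mhz: Dict[str, float], target_mhz: Dict[str, float], tol: float = 1e-9
) -> bool:
    """True if the total frequency reduction is at least the total frequency increase."""
    reduction = sum(
        current_mhz[c] - target_mhz[c] for c in current_mhz if target_mhz[c] < current_mhz[c]
    )
    increase = sum(
        target_mhz[c] - current_mhz[c] for c in current_mhz if target_mhz[c] > current_mhz[c]
    )
    return reduction + tol >= increase


current = {"a": 100.0, "b": 100.0, "c": 100.0, "d": 100.0}
target = {"a": 113.75, "b": 73.75, "c": 118.75, "d": 93.75}
# In the worked example both sums are 32.5 MHz, so the condition holds with equality.
print(reductions_cover_increases(current, target))
```

With equal starting frequencies and an arithmetic-mean threshold, the two sums in the worked example coincide (26.25 + 6.25 = 13.75 + 18.75 = 32.5 MHz).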
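For example, refer to FIG. 4 and FIG. 5. FIG. 4 is a schematic diagram of threads running in multiple cores according to an embodiment of this application, and FIG. 5 is another schematic diagram of threads running in multiple cores according to an embodiment of this application. FIG. 4 shows the threads running in the multiple cores when the frequencies of the cores in the target process are not changed, and FIG. 5 shows the threads running in the multiple cores after the frequencies of the cores in the target process are changed. Both figures are described using the case in which the multiple cores include core a, core b, core c, and core d. As shown in FIG. 4 and FIG. 5, when core a, core b, core c, and core d process the target task, thread a1 runs in core a, thread b1 runs in core b, thread c1 runs in core c, and thread d1 runs in core d. As shown in FIG. 4, thread b1 and thread d1 need to wait for thread a1 and thread c1, so both have idle thread segments. As shown in FIG. 5, after the frequencies of core b and core d in the target process are reduced and the frequencies of core a and core c in the target process are increased, thread b1 and thread d1 run more slowly and thread a1 and thread c1 run faster, so the proportions of thread b1 and thread d1 occupied by idle thread segments both decrease. This reduces the time that thread b1 and thread d1 wait for thread a1 and thread c1, further reduces the time the multiple cores take to process the target task, and improves the effective utilization of the multiple cores.
Step 105: Detect whether the target process has ended.
Step 106: When the target process has not ended, repeat the process of obtaining the invalid utilization rate, reducing the frequency of the first core in the target process, and increasing the frequency of the second core in the target process.
The target task may be periodic or aperiodic. When the target task is periodic, the operating scenarios of the multiple cores are almost the same in each period, so step 101 to step 103 may be performed only once to obtain the final frequency modulation scheme. When the target task is aperiodic, the operating scenarios of the multiple cores differ from one time period to the next, so when the computing device detects that the target process has not ended, it may repeat step 101 to step 103 to adjust the frequencies of the first core and the second core in real time. This reduces the difference in processing durations of the multiple cores in the target process more effectively and further improves the effective utilization of processor resources.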
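The repeat-until-finished behaviour of steps 105 and 106 can be sketched as a small control loop. This is a hedged illustration under several assumptions: the three callbacks (`measure`, `scale_core`, `is_done`) are hypothetical stand-ins for hardware-specific operations, the threshold is the arithmetic mean, and the scaling factor follows the target formula (1 + a − b). Cores are visited in descending order of invalid utilization rate so that reductions are applied before increases, in line with the power-budget remark above.

```python
from typing import Callable, Dict


def tune_until_done(
    measure: Callable[[], Dict[str, float]],
    scale_core: Callable[[str, float], None],
    is_done: Callable[[], bool],
    max_rounds: int = 1000,
) -> int:
    """Repeat measure -> lower/raise -> check until the target process ends.
    Returns the number of adjustment rounds performed."""
    rounds = 0
    while rounds < max_rounds:
        util = measure()                               # step 101
        threshold = sum(util.values()) / len(util)     # arithmetic-mean threshold
        # Highest invalid utilization first, so first cores are lowered
        # before second cores are raised.
        for core, u in sorted(util.items(), key=lambda kv: kv[1], reverse=True):
            if u != threshold:
                # factor < 1 lowers a first core, factor > 1 raises a second core
                scale_core(core, 1 + threshold - u)    # steps 103 and 104
        rounds += 1
        if is_done():                                  # step 105
            break                                      # otherwise step 106: repeat
    return rounds
```

For a periodic task, `is_done` could simply return `True` after the first round, which matches the single-pass case described above.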
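In summary, in the frequency modulation method for a processor provided by the embodiments of this application, the computing device obtains the invalid utilization rate of each of the multiple cores during the target process in which the multiple cores of the processor process the target task in parallel, and then, according to the invalid utilization rate, reduces the frequency of the first core in the target process and increases the frequency of the second core in the target process, where the invalid utilization rate of the first core is higher than the invalid utilization threshold and the invalid utilization rate of the second core is lower than the invalid utilization threshold. As a result, the frequency of a core with a higher invalid utilization rate is reduced and the frequency of a core with a lower invalid utilization rate is increased. This reduces the waiting time of the core with the higher invalid utilization rate and further reduces the time the processor takes to process the target task, thereby improving the efficiency of the multiple cores in processing the target task and the effective utilization of processor resources.
It should be noted that the foregoing steps are described using the case in which the computing device performs frequency modulation on the multiple cores of one processor. In practical applications, the computing device may also perform frequency modulation on multiple cores of multiple processors. For the frequency modulation process of each core, refer to step 101 to step 106 above; details are not repeated here.
The order of the steps of the method provided by the embodiments of this application may be adjusted appropriately, and steps may be added or removed as required. Any variation readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Step 103 and step 104 may be performed at the same time. In addition, as the operating frequency of a processor increases, its voltage requirements become stricter; therefore, to maintain the stability of the processor, step 104 may also be performed before step 103, which is not limited in the embodiments of this application.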
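The frequency modulation method for a processor provided by the embodiments of this application has been described in detail above with reference to FIG. 1 to FIG. 5. The frequency modulation apparatus for a processor provided by the embodiments of this application is described below with reference to FIG. 6 and FIG. 7.
An embodiment of this application provides a frequency modulation apparatus for a processor. Refer to FIG. 6, which is a block diagram of a frequency modulation apparatus for a processor according to an embodiment of this application. The frequency modulation apparatus 300 includes:
an obtaining module 301, configured to obtain the invalid utilization rate of each of the multiple cores in a target process in which the multiple cores of the processor process a target task in parallel, where the invalid utilization rate of any core is negatively correlated with the amount of calculation of that core in the target process; and
a frequency modulation module 302, configured to reduce the frequency of a first core among the multiple cores in the target process.
The frequency modulation module 302 is further configured to increase the frequency of a second core among the multiple cores in the target process.
The invalid utilization rate of the first core is higher than the invalid utilization threshold, and the invalid utilization rate of the second core is lower than the invalid utilization threshold.
Optionally, the obtaining module 301 is configured to:
obtain the number of calculations of any one of the multiple cores in at least one unit time period of the target process; and
determine the ratio of the idle duration corresponding to that core to the total duration of the at least one unit time period as the invalid utilization rate of that core, where the idle duration corresponding to that core is the total duration of the idle unit time periods corresponding to that core within the at least one unit time period, and the number of calculations of that core in each corresponding idle unit time period is less than the calculation count threshold.
Optionally, the frequency modulation module 302 is configured to:
reduce the frequency of the first core in the target process based on the reference frequency of the first core, where after the reduction the frequency of the first core is less than or equal to the reference frequency of the first core;
where the reference frequency of the first core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization rate of the first core.
Optionally, the frequency modulation module 302 is configured to:
increase the frequency of the second core in the target process based on the reference frequency of the second core, where after the increase the frequency of the second core is less than or equal to the reference frequency of the second core;
where the reference frequency of the second core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization rate of the second core.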
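FIG. 7 is a block diagram of another frequency modulation apparatus for a processor according to an embodiment of this application. Referring to FIG. 7, on the basis of FIG. 6, the frequency modulation apparatus 300 further includes:
a determining module 303, configured to determine the average of the invalid utilization rates of the multiple cores as the invalid utilization threshold.
Optionally, as shown in FIG. 7, the frequency modulation apparatus 300 further includes:
a detection module 304, configured to detect whether the target process has ended after the frequency of the first core in the target process has been reduced and the frequency of the second core in the target process has been increased; and
a repetition module 305, configured to repeat, when the target process has not ended, the process of obtaining the invalid utilization rate, reducing the frequency of the first core in the target process, and increasing the frequency of the second core in the target process.
Optionally, the sum of the frequency reductions of all first cores among the multiple cores is greater than or equal to the sum of the frequency increases of all second cores among the multiple cores.
In summary, in the frequency modulation apparatus for a processor provided by the embodiments of this application, after the obtaining module obtains the invalid utilization rate of each of the multiple cores during the target process in which the multiple cores of the processor process the target task in parallel, the frequency modulation module can reduce the frequency of the first core in the target process according to the invalid utilization rate, and can also increase the frequency of the second core in the target process according to the invalid utilization rate, where the invalid utilization rate of the first core is higher than the invalid utilization threshold and the invalid utilization rate of the second core is lower than the invalid utilization threshold. As a result, the frequency of a core with a higher invalid utilization rate is reduced and the frequency of a core with a lower invalid utilization rate is increased. This reduces the waiting time of the core with the higher invalid utilization rate and further reduces the time the processor takes to process the target task, thereby improving the efficiency of the multiple cores in processing the target task and the effective utilization of processor resources.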
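An embodiment of this application provides a computer-readable storage medium. The storage medium stores a computer program, and when the computer program is executed by a processor, any of the frequency modulation methods for a processor provided by the embodiments of this application is implemented.
An embodiment of this application provides a chip. The chip includes a programmable logic circuit and/or program instructions, and when the chip runs, it implements any of the frequency modulation methods for a processor provided by the embodiments of this application.
An embodiment of this application provides a computer program product containing instructions. When the computer program product runs on a computer, the computer is caused to perform any of the frequency modulation methods for a processor provided by the embodiments of this application.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented in whole or in part in the form of a computer program product, which includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are generated in whole or in part. The computer may be a general-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium, or a semiconductor medium (for example, a solid-state drive).
An embodiment of this application provides a computing device. The computing device includes a memory and a processor, where the processor is configured to execute a program stored in the memory to implement any of the frequency modulation methods for a processor provided by the embodiments of this application. Optionally, the processor may include multiple cores and any of the frequency modulation apparatuses for a processor provided by the embodiments of this application. Alternatively, the processor may include multiple cores and the chip provided by the embodiments of this application. Alternatively, the computing device may further include any of the frequency modulation apparatuses for a processor provided by the embodiments of this application. Alternatively, the computing device may further include the chip provided by the embodiments of this application.
For example, refer to FIG. 8, which is a schematic structural diagram of a computing device according to an embodiment of this application. FIG. 8 is used as an example to describe the case in which the processor includes multiple cores and the frequency modulation apparatus for a processor of any of the foregoing embodiments. As shown in FIG. 8, the computing device 40 includes a memory 401 and a processor 402. The memory 401 is configured to store a program, and the processor 402 is configured to execute the program stored in the memory 401 to implement any of the frequency modulation methods for a processor provided by the embodiments of this application.
Optionally, as shown in FIG. 8, the computing device 40 may further include at least one communication interface 403 and at least one communication bus 404. The memory 401, the processor 402, and the communication interface 403 are communicatively connected through the communication bus 404. The communication interface 403 is used to communicate with other devices under the control of the processor 402, and the processor 402 may call the program stored in the memory 401 through the communication bus 404.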
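In this application, the terms "first" and "second" are used for description only and shall not be understood as indicating or implying relative importance. The term "multiple" means two or more, unless otherwise expressly specified.
It should be noted that the method embodiments and the apparatus embodiments provided in this application may refer to one another, which is not limited in the embodiments of this application. The order of the steps of the method embodiments may be adjusted appropriately, and steps may be added or removed as required. Any variation readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application, and is therefore not described further.
The foregoing descriptions are merely specific implementations of this application, and the protection scope of this application is not limited thereto. Any equivalent modification or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.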

Claims (15)

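  1. A frequency modulation method for a processor, characterized in that the method comprises:
    in a target process in which multiple cores of a processor process a target task in parallel, obtaining an invalid utilization rate of each of the multiple cores;
    reducing the frequency of a first core among the multiple cores in the target process; and
    increasing the frequency of a second core among the multiple cores in the target process;
    wherein the invalid utilization rate of the first core is higher than an invalid utilization threshold, and the invalid utilization rate of the second core is lower than the invalid utilization threshold.
  2. The method according to claim 1, characterized in that the obtaining an invalid utilization rate of each of the multiple cores comprises:
    obtaining the number of calculations of any one of the multiple cores in at least one unit time period of the target process; and
    determining the ratio of the idle duration corresponding to the core to the total duration of the at least one unit time period as the invalid utilization rate of the core, wherein the idle duration corresponding to the core is the total duration of the idle unit time periods corresponding to the core within the at least one unit time period, and the number of calculations of the core in each corresponding idle unit time period is less than a calculation count threshold.
  3. The method according to claim 1 or 2, characterized in that the reducing the frequency of a first core among the multiple cores in the target process comprises:
    reducing the frequency of the first core in the target process based on a reference frequency of the first core, wherein after the frequency of the first core in the target process is reduced, the frequency of the first core is less than or equal to the reference frequency of the first core;
    wherein the reference frequency of the first core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization rate of the first core.
  4. The method according to any one of claims 1 to 3, characterized in that the increasing the frequency of a second core among the multiple cores in the target process comprises:
    increasing the frequency of the second core in the target process based on a reference frequency of the second core, wherein after the frequency of the second core in the target process is increased, the frequency of the second core is less than or equal to the reference frequency of the second core;
    wherein the reference frequency of the second core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization rate of the second core.
  5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
    determining the average of the invalid utilization rates of the multiple cores as the invalid utilization threshold.
  6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
    after reducing the frequency of the first core in the target process and increasing the frequency of the second core in the target process, detecting whether the target process has ended; and
    when the target process has not ended, repeating the process of obtaining the invalid utilization rate, reducing the frequency of the first core in the target process, and increasing the frequency of the second core in the target process.
  7. The method according to any one of claims 1 to 6, characterized in that the sum of the frequency reductions of all first cores among the multiple cores is greater than or equal to the sum of the frequency increases of all second cores among the multiple cores.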
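  8. A frequency adjustment apparatus for a processor, characterized in that the frequency adjustment apparatus for a processor comprises:
    an obtaining module, configured to: obtain, during a target process in which multiple cores of the processor process a target task in parallel, an invalid utilization of each of the multiple cores;
    a frequency adjustment module, configured to: lower the frequency, in the target process, of a first core among the multiple cores; and
    raise the frequency, in the target process, of a second core among the multiple cores;
    wherein the invalid utilization of the first core is higher than an invalid utilization threshold, and the invalid utilization of the second core is lower than the invalid utilization threshold.
  9. The apparatus according to claim 8, characterized in that the obtaining module is configured to:
    obtain the number of computations of any core among the multiple cores in at least one unit time period of the target process;
    determine the ratio of the idle duration corresponding to the core to the total duration of the at least one unit time period as the invalid utilization of the core, wherein the idle duration corresponding to the core is the total duration of the idle unit time periods corresponding to the core within the at least one unit time period, and the number of computations of the core in each corresponding idle unit time period is less than a computation count threshold.
  10. The apparatus according to claim 8 or 9, characterized in that the frequency adjustment module is configured to:
    lower the frequency of the first core in the target process based on a reference frequency of the first core, wherein, after the frequency of the first core in the target process is lowered, the frequency of the first core is less than or equal to the reference frequency of the first core;
    wherein the reference frequency of the first core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization of the first core.
  11. The apparatus according to any one of claims 8 to 10, characterized in that the frequency adjustment module is configured to:
    raise the frequency of the second core in the target process based on a reference frequency of the second core, wherein, after the frequency of the second core in the target process is raised, the frequency of the second core is less than or equal to the reference frequency of the second core;
    wherein the reference frequency of the second core is positively correlated with the difference between the invalid utilization threshold and the invalid utilization of the second core.
  12. The apparatus according to any one of claims 8 to 11, characterized in that the apparatus further comprises:
    a determining module, configured to determine the average of the invalid utilizations of the multiple cores as the invalid utilization threshold.
  13. The apparatus according to any one of claims 8 to 12, characterized in that the frequency adjustment apparatus for a processor further comprises:
    a detection module, configured to detect whether the target process has ended after the frequency of the first core in the target process is lowered and the frequency of the second core in the target process is raised;
    a repetition module, configured to, when the target process has not ended, repeat the process of obtaining the invalid utilization, lowering the frequency of the first core in the target process, and raising the frequency of the second core in the target process.
  14. The apparatus according to any one of claims 8 to 13, characterized in that the sum of the frequency decreases of all first cores among the multiple cores is greater than or equal to the sum of the frequency increases of all second cores among the multiple cores.
  15. A computing device, characterized in that the computing device comprises a memory and a processor, wherein the processor is configured to execute a program stored in the memory to implement the frequency adjustment method for a processor according to any one of claims 1 to 7.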
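
The method of claims 1 to 5 and 7 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The claims only require that the reference frequency be positively correlated with the difference between the threshold and a core's invalid utilization, so the linear rule in `reference_frequency`, the `gain` parameter, the equal-length unit time periods, and all names (`Core`, `invalid_utilization`, `adjust`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Core:
    freq: float        # current frequency, e.g. in MHz (the unit is an assumption)
    counts: List[int]  # computations counted in each unit time period


def invalid_utilization(counts: List[int], count_threshold: int) -> float:
    """Claim 2: idle duration divided by total duration.

    Assumes equal-length unit periods; a period is idle when its
    computation count is below count_threshold.
    """
    if not counts:
        raise ValueError("need at least one unit time period")
    idle_periods = sum(1 for c in counts if c < count_threshold)
    return idle_periods / len(counts)


def reference_frequency(freq: float, threshold: float, util: float, gain: float) -> float:
    """Hypothetical linear rule: the result grows with (threshold - util).

    With 0 < gain <= 1 and utilizations in [0, 1] the result stays positive.
    """
    return freq * (1.0 + gain * (threshold - util))


def adjust(cores: List[Core], count_threshold: int,
           gain: float = 0.5) -> Tuple[List[float], float, List[float]]:
    """One adjustment round; returns (utilizations, threshold, new frequencies)."""
    utils = [invalid_utilization(c.counts, count_threshold) for c in cores]
    threshold = sum(utils) / len(utils)  # claim 5: mean as threshold
    new_freqs = [c.freq for c in cores]
    saved = 0.0   # total decrease taken from first cores
    wanted = {}   # desired increase per second core (index -> amount)
    for i, (core, util) in enumerate(zip(cores, utils)):
        ref = reference_frequency(core.freq, threshold, util, gain)
        if util > threshold:    # first core: lower to its reference frequency
            new_freqs[i] = ref
            saved += core.freq - ref
        elif util < threshold:  # second core: may rise up to its reference
            wanted[i] = ref - core.freq
    total_wanted = sum(wanted.values())
    # Claim 7: scale increases so they do not exceed the total decrease.
    scale = min(1.0, saved / total_wanted) if total_wanted > 0 else 0.0
    for i, increase in wanted.items():
        new_freqs[i] = cores[i].freq + scale * increase
    return utils, threshold, new_freqs
```

For example, `adjust([Core(2000.0, [0, 0, 5, 5]), Core(2000.0, [5, 5, 5, 5])], count_threshold=1)` gives invalid utilizations of 0.5 and 0.0, a threshold of 0.25, and new frequencies of 1750.0 and 2250.0. The loop of claim 6 would call `adjust` repeatedly with fresh counts until the target process ends.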
PCT/CN2020/095550 2019-10-17 2020-06-11 Frequency adjustment method and apparatus for a processor, and computing device WO2021073130A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910990340.6 2019-10-17
CN201910990340.6A CN110941325B (zh) 2019-10-17 2019-10-17 Frequency adjustment method and apparatus for a processor, and computing device

Publications (1)

Publication Number Publication Date
WO2021073130A1 true WO2021073130A1 (zh) 2021-04-22

Family

ID=69905952

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/095550 WO2021073130A1 (zh) 2019-10-17 2020-06-11 Frequency adjustment method and apparatus for a processor, and computing device

Country Status (2)

Country Link
CN (2) CN114816033A (zh)
WO (1) WO2021073130A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023082723A1 (zh) * 2021-11-11 2023-05-19 Oppo广东移动通信有限公司 Processor circuit, power supply control method, and terminal device

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
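CN114816033A (zh) * 2019-10-17 2022-07-29 华为技术有限公司 Frequency adjustment method and apparatus for a processor, and computing device
CN111580639A (zh) * 2020-05-06 2020-08-25 深圳忆联信息系统有限公司 Method and apparatus for adjusting an SSD adaptive load clock, and computer device
WO2022041251A1 (zh) * 2020-08-31 2022-03-03 华为技术有限公司 Power budget allocation method and related device
CN112947737A (zh) * 2021-02-20 2021-06-11 山东云海国创云计算装备产业创新中心有限公司 Chip power consumption adjustment method and apparatus, electronic device, and storage medium
CN115599554A (zh) * 2022-11-16 2023-01-13 浪潮电子信息产业股份有限公司(Cn) GPGPU resource allocation method, apparatus, device, and storage medium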

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
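CN102483646A (zh) * 2009-07-24 2012-05-30 超威半导体公司 Non-uniformly altering the performance of computational units according to performance sensitivity
CN102999391A (zh) * 2012-11-21 2013-03-27 华为技术有限公司 Method and apparatus for adjusting the operating frequency of a processor
CN106462456A (zh) * 2014-07-09 2017-02-22 英特尔公司 Processor state control based on detection of producer/consumer workload serialization
US20190041962A1 (en) * 2018-09-28 2019-02-07 Avinash Ananthakrishnan Per-core operating voltage and/or operating frequency determination based on effective core utilization
CN110941325A (zh) * 2019-10-17 2020-03-31 华为技术有限公司 Frequency adjustment method and apparatus for a processor, and computing device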

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7490254B2 (en) * 2005-08-02 2009-02-10 Advanced Micro Devices, Inc. Increasing workload performance of one or more cores on multiple core processors
CN100365543C (zh) * 2006-03-10 2008-01-30 浙江大学 Energy-saving method in which the kernel dynamically adjusts the processor frequency
CN101661327A (zh) * 2009-10-14 2010-03-03 中兴通讯股份有限公司 Method and apparatus for adjusting the clock frequency of a central processing unit
KR20140080058A (ko) * 2012-12-20 2014-06-30 삼성전자주식회사 Load balancing method for multicore and mobile terminal
US9383790B2 (en) * 2013-03-11 2016-07-05 Intel Corporation Internal communication interconnect scalability
CN105740075B (zh) * 2016-01-27 2020-03-31 浪潮(北京)电子信息产业有限公司 CPU scheduling method and system
CN105511593A (zh) * 2016-02-25 2016-04-20 浪潮(北京)电子信息产业有限公司 Method and apparatus for adjusting the CPU subsystem frequency for a Linux system
CN107368402A (zh) * 2017-07-10 2017-11-21 中国第汽车股份有限公司 Method for calculating CPU utilization

Also Published As

Publication number Publication date
CN114816033A (zh) 2022-07-29
CN110941325B (zh) 2022-05-06
CN110941325A (zh) 2020-03-31

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20876294; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20876294; Country of ref document: EP; Kind code of ref document: A1)