WO2011120019A2 - Fine grain performance resource management of computer systems - Google Patents

Info

Publication number
WO2011120019A2
Authority
WO
WIPO (PCT)
Prior art keywords
task
rate
processor
clock
performance
Application number
PCT/US2011/030096
Other languages
English (en)
French (fr)
Other versions
WO2011120019A3 (en)
Inventor
Gary Allen Gibson
Valeri Popescu
Original Assignee
Virtualmetrix, Inc.
Application filed by Virtualmetrix, Inc.
Priority to KR1020127027941A (KR20130081213A, ko)
Priority to JP2013501534A (JP2013527516A, ja)
Priority to CN2011800254093A (CN102906696A, zh)
Priority to EP11760356.3A (EP2553573A4, en)
Publication of WO2011120019A2 (en)
Publication of WO2011120019A3 (en)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F9/00 Arrangements for program control, e.g. control units
            • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
              • G06F9/44 Arrangements for executing specific programs
              • G06F9/46 Multiprogramming arrangements
                • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
                  • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
                    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
                  • G06F9/5094 Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
          • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
            • G06F1/26 Power supply means, e.g. regulation thereof
              • G06F1/32 Means for saving power
                • G06F1/3203 Power management, i.e. event-based initiation of a power-saving mode
                  • G06F1/3206 Monitoring of events, devices or parameters that trigger a change in power modality
                    • G06F1/3228 Monitoring task completion, e.g. by use of idle timers, stop commands or wait commands
                  • G06F1/3234 Power saving characterised by the action undertaken
                    • G06F1/324 Power saving characterised by the action undertaken by lowering clock frequency
                    • G06F1/329 Power saving characterised by the action undertaken by task scheduling
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/30 Monitoring
              • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
                • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
                  • G06F11/3419 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment by assessing time
                • G06F11/3466 Performance evaluation by tracing or monitoring
          • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
            • G06F2201/865 Monitoring of software
          • G06F2209/00 Indexing scheme relating to G06F9/00
            • G06F2209/50 Indexing scheme relating to G06F9/50
              • G06F2209/501 Performance criteria
              • G06F2209/507 Low-level
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the subject matter described herein relates to systems, methods, and articles for management of performance resources utilized by tasks executing in a processor system.
  • a computing system consists not only of physical resources (processors, memory, peripherals, buses, etc.) but also of performance resources such as processor cycles, clock speed, memory and I/O bandwidth, and main/cache memory space.
  • the performance resources have generally been managed inefficiently or not managed at all.
  • processors are underutilized, consume too much energy and are robbed of some of their performance potential.
  • Many computer systems are capable of dynamically controlling the system and/or processor clock frequencies. Lowering the clock frequency can dramatically lower power consumption, because semiconductor scaling effects allow processor supply voltages to be lowered along with the clock frequency. Thus, being able to reduce the clock frequency, provided the computer system performs as required, can lead to reduced energy consumption, heat generation, etc.
  • processors are able to rapidly enter and exit idle or sleep states where they may consume very small amounts of energy compared to their active state(s).
  • placing processors and/or part or all of a computer system in a sleep state can reduce overall energy consumption, provided the computer system performs as required.
  • Execution of a plurality of tasks by a processor system is monitored. Based on this monitoring, tasks requiring additional performance resources are identified by calculating a progress error and/or one or more progress limit errors for each task. Thereafter, the performance resources of the processor system allocated to each identified task are adjusted. Such adjustment can comprise adjusting a clock rate of at least one processor in the processor system executing the task, adjusting an amount of cache and/or buffers to be utilized by the task, and/or adjusting an amount of input/output (I/O) bandwidth to be utilized by the task.
  • Each task can be selected from a group comprising: a single task, a group of tasks, a thread, a group of threads, a single state machine, a group of state machines, a single virtual machine, and a group of virtual machines, and any combination thereof.
  • the processor can comprise a single processor, a multi-processor, a processor system supporting multi-threading (e.g., simultaneous or pseudo-simultaneous multi-threading, etc.), and/or a multi-core processor.
  • Monitored performance metrics associated with the tasks executing / to be executed can be changed. For example, data transference can initially be monitored and later processor cycles can be monitored.
  • the progress error rate can be equal to the difference between work completed by the task and work to be completed by the task. Alternatively, the progress error rate can be equal to the difference between a work completion rate for completed work and an expected work rate for the task.
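The two progress-error definitions above can be illustrated with a short Python sketch. It is not the patented implementation. The function names, the sign convention (a negative error means the task is behind), and the `tolerance` used to flag a task as needing additional performance resources are assumptions made for illustration.

```python
def progress_error_from_work(work_completed: float, work_to_complete: float) -> float:
    """Difference between work completed (Wc) and work to be completed (We).

    Assumed sign convention: negative means the task is behind schedule.
    """
    return work_completed - work_to_complete


def progress_error_from_rates(work_completed: float, elapsed_s: float,
                              expected_rate: float) -> float:
    """Difference between the observed completion rate and the expected work rate."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return work_completed / elapsed_s - expected_rate


def needs_more_resources(progress_error: float, tolerance: float = 0.0) -> bool:
    """Hypothetical policy: flag a task that is behind by more than `tolerance`."""
    return progress_error < -tolerance


# Example: a task expected to retire 1000 instructions has retired 800 in 2 s,
# against an expected rate of 500 instructions/s.
print(progress_error_from_work(800, 1000))        # -200
print(progress_error_from_rates(800, 2.0, 500))   # -100.0
print(needs_more_resources(-200))                 # True
```

Work can be counted in any unit the metering hardware exposes, such as instructions or bytes. Only the sign and magnitude of the error matter to the resource manager.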
  • Each task can have an associated execution priority and an execution deadline (such priority and/or deadline can be specified by a scheduler, and/or can be derived by, used as part of, or passed as a parameter to a rate adaptation function). In such cases, the performance resources of the processor system can be adjusted to enable each identified task to be completed prior to its corresponding execution deadline and according to its corresponding execution priority.
  • Performance resources can be adjusted on a task-by-task basis.
  • Each task can have an associated performance profile that is used to establish the execution priority and the execution deadline for the task.
  • the associated performance profile can specify at least one performance parameter.
  • the performance parameter can, for example, be a cache occupancy quota specifying an initial maximum and/or minimum number of buffers to be used by the task, and the cache occupancy quota can be dynamically adjusted during execution of the task.
  • the cache occupancy quota can be dynamically adjusted based on at least one of: progress error, a cache miss rate for the task, a cache hit rate, or any other metric indicative of performance.
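A cache occupancy quota that reacts to progress error and cache miss rate might look like the following sketch. The class name, the fixed step size, and the `miss_threshold` value are hypothetical. The source says the quota can be adjusted using such metrics, but not how.

```python
from dataclasses import dataclass


@dataclass
class CacheOccupancyQuota:
    """Per-task buffer quota, bounded by profile limits (hypothetical model)."""
    buffers: int       # current maximum number of buffers the task may occupy
    min_buffers: int   # lower bound from the task performance profile
    max_buffers: int   # upper bound from the task performance profile

    def adjust(self, progress_error: float, miss_rate: float,
               miss_threshold: float = 0.1, step: int = 1) -> int:
        """Grow the quota for a lagging task that misses often; shrink it for one ahead of schedule."""
        if progress_error < 0 and miss_rate > miss_threshold:
            self.buffers += step
        elif progress_error > 0:
            self.buffers -= step
        # Keep the quota within the profile's limits.
        self.buffers = max(self.min_buffers, min(self.max_buffers, self.buffers))
        return self.buffers


quota = CacheOccupancyQuota(buffers=8, min_buffers=4, max_buffers=16)
print(quota.adjust(progress_error=-50, miss_rate=0.25))  # 9
print(quota.adjust(progress_error=20, miss_rate=0.02))   # 8
```

Requiring both a negative progress error and a high miss rate avoids giving cache to a task whose lag has some other cause, such as I/O.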
  • the performance parameter can specify initial bandwidth requirements for the execution of the task and such bandwidth requirements can be dynamically adjusted during execution of the task.
  • a processor clock demand rate required by each task can be determined. Based on such determinations, an aggregate clock demand rate based on the determined processor clock demand rate for all tasks can be computed. In response, the processor system clock rate can be adjusted to accommodate the aggregate clock demand rate. In some cases, the processor system clock rate can be adjusted to the aggregate clock demand rate plus an overhead demand rate.
  • the processor clock demand rate can be calculated as the product of the current processor system clock rate and the expected execution time for completion of the task, divided by the time interval.
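Written out, the per-task demand is f_task = f_clock * T_exec / Ti. The aggregate is the sum of the task demands plus an optional overhead demand rate. The Python sketch below uses made-up task timings. It also assumes, as the formula implicitly does, that execution time scales inversely with clock rate, which ignores effects such as memory stalls.

```python
def clock_demand_rate(current_clock_hz: float, expected_exec_time_s: float,
                      interval_s: float) -> float:
    """Clock rate a task needs: current clock * expected execution time / time interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return current_clock_hz * expected_exec_time_s / interval_s


def aggregate_clock_demand(task_demands_hz: list[float], overhead_hz: float = 0.0) -> float:
    """Sum of per-task demand rates plus an overhead demand rate."""
    return sum(task_demands_hz) + overhead_hz


# At 1 GHz, task A needs 2 ms of execution every 10 ms and task B needs 3 ms every 20 ms.
demand_a = clock_demand_rate(1e9, 0.002, 0.010)   # about 200 MHz
demand_b = clock_demand_rate(1e9, 0.003, 0.020)   # about 150 MHz
target = aggregate_clock_demand([demand_a, demand_b], overhead_hz=50e6)  # about 400 MHz
```

Here two tasks that would keep a 1 GHz processor busy only 35% of the time yield a target of roughly 400 MHz once a 50 MHz overhead allowance is added.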
  • the processor clock demand rate for each task can be updated based on errors affecting performance of the task and, as a result, the aggregate clock demand rate can be updated based on the updated processor clock demand rate for each task.
  • Updating of the processor clock demand rate for each task or the aggregate clock demand rate can use at least one adaptation function to dampen or enhance rapid rate changes.
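The source does not specify the adaptation function. One plausible choice, shown here purely as an assumption, is asymmetric exponential smoothing. A large coefficient lets demand rise quickly, and a small one damps decreases. The coefficient values are hypothetical tuning parameters.

```python
def adapt_rate(previous_hz: float, target_hz: float,
               alpha_up: float = 0.5, alpha_down: float = 0.125) -> float:
    """Move a demand rate toward its new target by exponential smoothing.

    A larger alpha_up lets demand rise quickly; a smaller alpha_down damps
    decreases so brief lulls do not drop the clock too far.
    """
    for alpha in (alpha_up, alpha_down):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("smoothing coefficients must be in (0, 1]")
    alpha = alpha_up if target_hz > previous_hz else alpha_down
    return previous_hz + alpha * (target_hz - previous_hz)


print(adapt_rate(100e6, 200e6))  # 150000000.0
print(adapt_rate(200e6, 100e6))  # 187500000.0
```

Setting both coefficients to 1.0 disables smoothing entirely, so the rate jumps straight to its target.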
  • a processor clock rate for each task can be added to the aggregate clock demand rate when the task is ready-to-run as determined by a scheduler or other system component that determines when a task is ready-to-run (such as an I/O subsystem completing an I/O request on which the task is blocked).
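The ready-to-run accounting can be modelled with a small tracker. The class and method names are hypothetical. Removing a task's demand when it blocks is an assumed counterpart to the addition the text describes.

```python
class ReadyDemandTracker:
    """Aggregate clock demand of ready-to-run tasks (hypothetical helper).

    A task's demand is added when it becomes ready to run (e.g. its I/O request
    completes) and, as an assumed counterpart, removed when it blocks.
    """

    def __init__(self) -> None:
        self._ready: dict[str, float] = {}  # task id -> clock demand rate (Hz)

    def task_ready(self, task_id: str, demand_hz: float) -> None:
        self._ready[task_id] = demand_hz

    def task_blocked(self, task_id: str) -> None:
        self._ready.pop(task_id, None)  # ignore tasks that were not ready

    @property
    def aggregate_hz(self) -> float:
        return sum(self._ready.values())


tracker = ReadyDemandTracker()
tracker.task_ready("video", 200e6)
tracker.task_ready("audio", 50e6)
tracker.task_blocked("video")   # video now waits on I/O
print(tracker.aggregate_hz)     # 50000000.0
```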
  • the aggregate clock demand rate can be calculated over a period of time such that, at times, the processor system clock rate is higher than the aggregate clock demand rate, and at other times, the processor system clock rate is lower than the aggregate clock demand rate.
  • the processor system can include at least two processors, and the aggregate clock demand rate can be determined for each of them based on the processor clock demand rates of the tasks executing on the corresponding processor. In such arrangements, the clock rate of each processor can be adjusted separately.
  • Each task is allocated physical memory. At least one task can utilize at least one virtual memory space that is mapped to at least a portion of the physical memory.
  • execution of a plurality of tasks by a processor system is monitored to determine at least one monitored value for each of the tasks.
  • the at least one monitored value characterizes at least one factor affecting performance of the corresponding task by the processor system.
  • Each task has an associated task performance profile that specifies at least one performance parameter.
  • the corresponding monitored value is compared with the corresponding at least one performance parameter specified in the associated task performance profile. Based on this comparison, it is determined for each task whether the performance resources utilized for its execution should be adjusted or maintained. Thereafter, performance resources can be adjusted by modifying a processor clock rate for each task whose performance resources should be adjusted, while the performance resources of each task whose resources should be maintained are left unchanged.
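The compare-then-adjust-or-maintain decision could be sketched as follows. The relative `tolerance` and the rule of scaling a task's clock demand by its relative shortfall are assumptions. The text says only that the clock rate is modified for tasks that need adjusting and left alone for the rest.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    name: str
    monitored_work: float   # e.g. instructions completed in the current interval
    expected_work: float    # performance parameter from the task performance profile
    clock_demand_hz: float  # current processor clock demand for the task


def should_adjust(task: TaskState, tolerance: float = 0.05) -> bool:
    """Adjust only if monitored work deviates from the profile by more than `tolerance` (relative)."""
    if task.expected_work <= 0:
        return False
    shortfall = (task.expected_work - task.monitored_work) / task.expected_work
    return abs(shortfall) > tolerance


def update_clock_demands(tasks: list[TaskState], tolerance: float = 0.05) -> None:
    """Scale the clock demand of tasks that need adjusting; maintain the rest."""
    for task in tasks:
        if should_adjust(task, tolerance):
            shortfall = (task.expected_work - task.monitored_work) / task.expected_work
            # Behind schedule -> more clock; ahead of schedule -> less (never negative).
            task.clock_demand_hz = max(0.0, task.clock_demand_hz * (1.0 + shortfall))


tasks = [TaskState("decoder", 800, 1000, 100e6), TaskState("logger", 990, 1000, 10e6)]
update_clock_demands(tasks)
print([round(t.clock_demand_hz) for t in tasks])  # [120000000, 10000000]
```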
  • the monitored value can characterize an amount of work completed by the task.
  • the amount of work completed by the task can be derived from at least one of: an amount of data transferred when executing the task, a number of processor instructions completed when executing the task, processor cycles, execution time, etc.
  • a current program state is determined for each task and the associated task performance profile specifies two or more program states having different performance parameters.
  • the monitored value can be compared to the performance parameter for the current program state (and what is monitored can be changed, e.g., instructions completed, data transference, etc.).
  • At least one performance profile of a task being executed can be modified so that a corresponding performance parameter is changed.
  • the monitored value can be compared to the changed performance parameter.
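A profile with one parameter set per program state, including modification while the task runs, might be modelled like this. The state names, the class names, and the "monitored work must reach expected work" test are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateParameters:
    expected_work: float  # work expected within the interval while in this state
    interval_s: float     # time interval for that work


class StatefulPerformanceProfile:
    """Task performance profile holding one parameter set per program state (hypothetical)."""

    def __init__(self, params_by_state: dict[str, StateParameters]) -> None:
        self._params = dict(params_by_state)

    def parameters_for(self, state: str) -> StateParameters:
        return self._params[state]

    def modify(self, state: str, params: StateParameters) -> None:
        """Change a performance parameter while the task is executing."""
        self._params[state] = params


def meets_target(profile: StatefulPerformanceProfile, state: str, monitored_work: float) -> bool:
    """Compare the monitored value against the parameter for the current program state."""
    return monitored_work >= profile.parameters_for(state).expected_work


profile = StatefulPerformanceProfile({
    "decode_frame": StateParameters(expected_work=5000, interval_s=0.033),
    "wait_for_input": StateParameters(expected_work=100, interval_s=0.1),
})
print(meets_target(profile, "decode_frame", 4000))   # False
profile.modify("decode_frame", StateParameters(expected_work=3000, interval_s=0.033))
print(meets_target(profile, "decode_frame", 4000))   # True
```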
  • a processor clock demand rate required by each task can be determined. Thereafter, an aggregate clock demand rate can be computed based on the determined processor clock demand rate for all tasks. As a result, the processor system clock rate can be adjusted to accommodate the aggregate clock demand rate.
  • a processor clock demand rate required by a particular task can be dynamically adjusted based on a difference between an expected or completed work rate and at least one progress limiting rate (e.g., a progress limit error, etc.). The processor clock demand rate required by each task can be based on an expected time of completion of the corresponding task.
  • the processor system clock rate can be selectively reduced to a level that does not affect the expected time of completion of the tasks.
  • the processor system can be placed in either a sleep or an idle state until the aggregate clock demand rate is greater than zero.
  • the processor system clock rate can fluctuate above and below the aggregate clock demand rate during a period of time provided that an average processor system clock rate during the period of time is above the aggregate clock demand rate.
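Choosing a clock from the aggregate demand, and idling when demand is zero, can be sketched as below. The list of supported clock levels is hypothetical, since real processors expose their own operating points. `dither` shows one assumed way to let the clock fluctuate above and below the demand within a period so that the average matches it. Adding the overhead demand rate to `aggregate_hz` would keep the average above the task demand.

```python
from bisect import bisect_left
from typing import Optional


def select_clock(aggregate_hz: float, levels_hz: list[float]) -> Optional[float]:
    """Lowest supported clock level covering the demand; None means enter idle/sleep."""
    if aggregate_hz <= 0:
        return None
    levels = sorted(levels_hz)
    i = bisect_left(levels, aggregate_hz)
    return levels[min(i, len(levels) - 1)]  # saturate at the fastest level


def dither(aggregate_hz: float, levels_hz: list[float]) -> tuple[float, float, float]:
    """Split a period between two adjacent levels.

    When the demand lies between the slowest and fastest levels, the average
    clock equals the demand. Returns (low_hz, high_hz, fraction_of_period_at_high_hz).
    """
    levels = sorted(levels_hz)
    if aggregate_hz <= levels[0]:
        return levels[0], levels[0], 0.0
    if aggregate_hz >= levels[-1]:
        return levels[-1], levels[-1], 1.0
    i = bisect_left(levels, aggregate_hz)  # levels[i - 1] < aggregate_hz <= levels[i]
    low, high = levels[i - 1], levels[i]
    return low, high, (aggregate_hz - low) / (high - low)


LEVELS_HZ = [200e6, 400e6, 800e6]  # hypothetical operating points
print(select_clock(0.0, LEVELS_HZ))     # None
print(select_clock(300e6, LEVELS_HZ))   # 400000000.0
print(dither(300e6, LEVELS_HZ))         # (200000000.0, 400000000.0, 0.5)
```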
  • the performance profile can specify an occupancy quota limiting a number of buffers a task can utilize.
  • the occupancy quota can be dynamically adjusted based on a difference between an expected and completed work rate and one or more progress limiting rates (e.g., progress limit error, etc.). Other performance metrics, from a single source or multiple sources, can also be used to adjust the occupancy quota.
  • Utilization of bandwidth by an input / output subsystem of the processor system can be selectively controlled so that performance requirements of each task are met.
  • the amount of bandwidth utilized can be dynamically adjusted based on a difference between an expected and completed work rate (e.g., progress error, etc.) and one or more progress limiting rates. Other performance metrics (e.g., progress limit error, etc.) can also be used.
  • a system includes at least one processor, a plurality of buffers, a scheduler module, a metering module, an adaptive clock manager module, a cache occupancy manager module, and an input/output bandwidth manager module.
  • the scheduler module can schedule a plurality of tasks to be executed by the at least one processor (and in some implementations each task has an associated execution priority and/or an execution deadline).
  • the metering module can monitor execution of the plurality of tasks and identify tasks that require additional processing resources.
  • the adaptive clock manager module can selectively adjust a clock rate of the at least one processor when executing a task.
  • the cache occupancy manager module can selectively adjust a maximum amount of buffers to be utilized by a task.
  • the input / output bandwidth manager module can selectively adjust a maximum amount of input/output (I/O) bandwidth to be utilized by a task.
  • Articles of manufacture are also described that comprise computer executable instructions permanently stored on computer readable media, which, when executed by a computer, cause the computer to perform operations herein.
  • computer systems are also described that may include a processor and a memory coupled to the processor. The memory may temporarily or permanently store one or more programs that cause the processor to perform one or more of the operations described herein.
  • managing performance resources according to performance requirements, in such a way as to provide performance guarantees/targets while using minimal resources, can allow a computer system to have greater capacity (because the resources required by each component are minimized).
  • the current subject matter can allow a computer system to require fewer/smaller physical computer resources thereby lowering cost and/or reducing physical size.
  • overall power consumption can be reduced because fewer power consuming resources are needed.
  • information such as aggregate clock rates, progress error and progress limit error can be used to inform a scheduler on which processor to schedule tasks.
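A scheduler could use per-processor aggregate clock demand when placing a task, for example as in the following sketch. The least-loaded-with-headroom policy and the per-processor maximum clock table are assumptions. The text says only that such information can inform the placement decision.

```python
from typing import Optional


def choose_processor(aggregate_hz: dict[int, float], max_clock_hz: dict[int, float],
                     task_demand_hz: float) -> Optional[int]:
    """Place a task on the least-loaded processor that can absorb its clock demand.

    Returns None if no processor has enough headroom.
    """
    candidates = [
        (load, cpu) for cpu, load in aggregate_hz.items()
        if load + task_demand_hz <= max_clock_hz[cpu]
    ]
    return min(candidates)[1] if candidates else None


loads = {0: 600e6, 1: 300e6, 2: 100e6}
limits = {0: 1e9, 1: 1e9, 2: 200e6}
print(choose_processor(loads, limits, 150e6))  # 1 (CPU 2 lacks headroom)
```

Progress error and progress limit error could be added as further inputs, for example to prefer a processor where no task is currently behind schedule.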
  • FIG. 1 is a block diagram of a computer system with performance resource management.
  • FIG. 2 is a block diagram of a metering module.
  • FIG. 3 is a block diagram of a performance resource manager module.
  • FIG. 4 is a diagram illustrating a calendar queue.
  • FIG. 5 is a process flow diagram illustrating a technique for processor system performance resource management.
  • FIG. 1 is a simplified block diagram of a computer system including a processor system 10, a management module 106, an I/O (Input / Output) subsystem 108 and a system memory 150.
  • the processor system 10 can include one or more of a central processing unit, a processor, a microprocessor, a processor core and the like.
  • the processor system 10 can comprise a plurality of processors and/or a multi- core processor.
  • the functional elements of the processor system depicted in FIG. 1 can be implemented in hardware or with a combination of hardware and software (or firmware).
  • the processor system 10 can include an instruction cache 104, an instruction fetch/branch unit 115, an instruction decode module 125, an execution unit 135, a load/store unit 140, a data cache 145, a clock module 180 for controlling the processor system's clock speed(s), an idle state module 184 for controlling the idle or sleep state of the processor system, a DMA (Direct Memory Access) module 186, a performance management system 105, and a scheduler module 130.
  • the performance management system 105 can include a metering module 110 and a performance resource management module 120.
  • a task context memory, which stores the task performance profile for a task, can be incorporated into the system memory 150. In other implementations, the task context memory may be independent of the system memory 150.
  • a task may be referred to as a set of instructions to be executed by the processor system 10.
  • the term task can be interpreted to include a group of tasks (unless otherwise stated).
  • a task can also comprise processes such as instances of computer programs that are being executed, threads of execution such as one or more simultaneously, or pseudo-simultaneously, executing instances of a computer program closely sharing resources, etc. that execute within one or more processor systems 10 (e.g., microprocessors) or virtual machines such as virtual execution environments on one or more processors.
  • a virtual machine (VM) is a software implementation of a machine (computer) that executes programs like a real machine.
  • the tasks can be state machines such as image processors, cryptographic processors and the like.
  • the management module 106 can be part of the computer system coupled to the processing module (for example, a program residing in the system memory 150).
  • the management module 106 can create, and/or retrieve previously created performance profiles from system memory 150 or from storage devices such as hard disk drives, non-volatile memory, etc., and assign task performance profiles that specify task performance parameters to tasks directly or through their task context (a set of data containing the information needed to manage a particular task).
  • the I/O subsystem module 108 can be part of the computer system coupled to the processing module (for example, a program residing in the system memory 150).
  • the I/O subsystem module 108 can control, enable, and/or provide the means for communication between the processing system and the outside world, possibly a human, storage devices, or another processing system.
  • Inputs are the signals or data received by the system, and outputs are the signals or data sent from it.
  • Storage can be used to store information for later retrieval; examples of storage devices include hard disk drives and non-volatile semiconductor memory. Devices for communication between computer systems, such as modems and network cards, typically serve for both input and output.
  • the performance management system 105 of the processor system 10 can control the allocation of processor performance resources to individual tasks and for the processor system.
  • the performance management system 105 can control the allocation of state machine performance resources to individual tasks executing in the state machine.
  • the management module 106 can control the allocation of resources by determining/controlling the task performance profiles (e.g., through a set of policies/rules, etc.). For example, by controlling the allocation of performance resources to all tasks, each task can be provided with throughput and response time guarantees.
  • processor resources of the processor system 10 and/or a computing system incorporating the processor system 10 that includes the I/O subsystem module 108 and the system memory 150, etc.
  • performance resources are utilized.
  • minimizing performance resources increases efficiency, lowering energy consumption and requiring fewer/smaller physical computer resources, which lowers cost.
  • the minimization of performance resources allocated to each task can enable the processor system 10 to have greater capacity enabling more tasks to run on the system while similarly providing throughput and response time guarantees to the larger number of tasks.
  • Tasks can be assigned performance profiles that specify task performance parameters.
  • task performance parameters include the work to be completed (We), the time interval (Ti), the maximum work to be completed (Wm), and cache occupancy and I/O (Input/Output) bandwidth requirements, as described elsewhere in this document.
  • the time interval can represent a deadline such that the task is expected to complete We work within Ti time.
  • the work to be completed can determine the expected work to be performed by the task when it is scheduled for execution.
  • the maximum work to be completed can specify the maximum work the task may accumulate if, for example, the completion of its expected work is postponed.
  • the time interval can also be utilized by the scheduler module 130 to influence scheduling decisions, such as using the time interval to influence when a task should run or as a deadline (the maximal time allowed for the task to complete its expected work).
  • these parameters can dynamically change with task state such that the performance profile parameters are sets of parameters where each set may be associated with one or more program states and changed dynamically during the task's execution.
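The We, Ti, and Wm parameters can be captured in a small profile type. This sketch assumes that expected work accrues at the constant rate We / Ti while the task is postponed, and caps the accumulated work at Wm. The names and numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskPerformanceProfile:
    we: float  # work to be completed within each time interval
    ti: float  # time interval (deadline) in seconds
    wm: float  # maximum work the task may accumulate

    @property
    def work_rate(self) -> float:
        return self.we / self.ti


def accrue_expected_work(profile: TaskPerformanceProfile, pending: float,
                         elapsed_s: float) -> float:
    """Add the work owed for `elapsed_s` to the pending work, capped at Wm."""
    return min(profile.wm, pending + profile.work_rate * elapsed_s)


video_profile = TaskPerformanceProfile(we=1000.0, ti=0.01, wm=2500.0)
pending = 0.0
for _ in range(4):  # the task is postponed for four intervals
    pending = accrue_expected_work(video_profile, pending, video_profile.ti)
print(pending)  # 2500.0
```

Without the Wm cap, a long-postponed task would build up an unbounded backlog and could demand an unreasonable clock rate once it runs again.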
  • a scheduler module (as well as related aspects that can be used in connection with the current subject matter) is described in U.S. Patent App. Pub. No. 2009/0055829 A1, the contents of which are hereby fully incorporated by reference.
  • Performance profiles can be assigned to groups of tasks similar to the performance profile for an individual task.
  • tasks that are members of a group share a common performance profile and the performance resource parameters can be derived from that common profile.
  • a subset of the performance parameters can be part of a group performance profile while others are part of individual task performance profile.
  • a task profile can include expected work parameters while the task is a member of a group that shares I/O bandwidth and cache occupancy performance parameters.
  • a multiplicity of groups can exist where tasks are members of one or more groups that specify both common and separate performance profile parameters, where the parameters utilized by the performance resource manager are derived from the various performance profiles (through a set of policies/rules).
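How effective parameters are derived from several group profiles and an individual task profile depends on policies the text leaves open. The sketch below assumes one such policy, labelled hypothetical in the code: the most restrictive group value wins, and the task's own parameters take precedence.

```python
def effective_parameters(group_profiles: list[dict[str, float]],
                         task_profile: dict[str, float]) -> dict[str, float]:
    """Derive the parameters the performance resource manager would use.

    Hypothetical policy: where several groups set the same parameter, the most
    restrictive (smallest) value wins; the task's own profile then overrides.
    """
    merged: dict[str, float] = {}
    for group in group_profiles:
        for key, value in group.items():
            merged[key] = min(merged[key], value) if key in merged else value
    merged.update(task_profile)
    return merged


media_group = {"io_bandwidth_mbps": 100, "cache_buffers": 64}
capped_group = {"io_bandwidth_mbps": 40}
task = {"we": 5000, "ti": 0.02}
print(effective_parameters([media_group, capped_group], task))
# {'io_bandwidth_mbps': 40, 'cache_buffers': 64, 'we': 5000, 'ti': 0.02}
```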
  • the work can be a measure of data transference, processor instructions completed, or other meaningful units of measure of work done by the processor system 10 or state machine such as image processors, cryptographic processors and the like. As this work can be measured to a fine granularity, the performance resources can be similarly managed to a fine granularity.
  • the processor system 10 can execute instructions stored in the system memory 150 where many of the instructions operate on data stored in the system memory 150.
  • the instructions can be referred to as a set of instructions or program instructions throughout this document.
  • the system memory 150 can be physically distributed in the computer system.
  • the instruction cache 104 can temporarily store instructions from the system memory 150.
  • the instruction cache 104 can act as a buffer memory between system memory 150 and the processor system 10. When instructions are to be executed, they are typically retrieved from system memory 150 and copied into the instruction cache 104. If the same instruction or group of instructions is used frequently in a set of program instructions, storage of these instructions in the instruction cache 104 can yield an increase in throughput because system memory accesses are eliminated.
  • the fetch/branch unit 115 can be coupled to the instruction cache 104 and configured to retrieve instructions from the system memory 150 for storage within the instruction cache 104.
  • the instruction decode module 125 can interpret and implement the instructions retrieved. In one implementation, the decode module 125 can break down the instructions into parts that have significance to other portions of the processor system 10.
  • the execution unit 135 can pass the decoded information as a sequence of control signals, for example, to relevant function units of the processor system 10 to perform the actions required by the instructions.
  • the execution unit can include register files and an Arithmetic Logic Unit (ALU).
  • the actions required by the instructions can include reading values from registers, passing the values to an ALU (not shown) to add them together and writing the result to a register.
  • the execution unit 135 can include a load/store unit 140 that is configured to perform access to the data cache 145.
  • the load/store unit 140 can be independent of the execution unit 135.
  • the data cache 145 can be a high-speed storage device, for example a random-access memory, which contains data items that have been recently accessed from system memory 150, for example.
  • the data cache 145 can be accessed independently of the instruction cache 104.
  • FIG. 2 is a block diagram of a metering module 110.
  • the metering module 110 can measure the work performed or amount of work completed by the currently executing task(s).
  • the metering module 110 can monitor the execution of the task to determine a monitored value related to the amount of work completed for the task.
  • the monitored value related to the amount of work completed can be the actual amount of work completed, a counter value or the like that is proportional to or related to the amount of work completed.
  • one implementation of the metering module 110 can comprise a work completed module 210 (Wc), a work to be completed module 220 (We), a comparator module 230, and an adder module 240.
  • the work completed module 210 can be a work completed counter and the work to be completed module 220 can also be a work to be completed counter.
  • the work to be completed counter can be updated based on the work rate to account for the passage of time.
  • the work to be completed can be calculated by the performance resource manager, for example, when the task is selected for execution on the processor system by the scheduler module 130 informing the performance resource manager of the task selection.
  • the metering module 110 can measure and monitor the work completed by a task that is currently being executed on the processor system 10.
  • One or more tasks can be implemented on the processor system 10 (e.g., processor(s) employing simultaneous or pseudo-simultaneous multi-threading, a multi-processor, etc.).
  • the monitored value of work completed, or information about the amount of work completed, can be measured by the number of instructions completed and can be acquired from the instruction fetch/branch unit 115, as illustrated by the arrow 170 in FIG. 1.
  • the monitored values can also be measured by the amount of data transferred through memory operations and can be acquired from the load/store unit 140 as illustrated by the arrow 165 in FIG. 1.
  • the metering module 110, when used to monitor memory operations (bandwidth), can be configured to account only for memory operations to/from certain addresses (such as a video frame buffer). This configuration can be varied on a task-by-task basis (with the configuration information part of the Task Context or task performance profile). In some implementations, there can be separate metering modules 110 for instruction completion and for memory operations, depending on specific details of the computer system implementation; each of these metering modules would be similar to a single metering module 110. As some processing modules 10 handle multiple tasks (threads) simultaneously, the instructions completed information can include information as to which thread completed certain instructions (typically by tagging the information with thread, process, or task identifier(s)).
  • the memory operations information can similarly include this thread identifier so that the metering module 110 can associate these operations with the correct task.
  • Processing modules 10 that include one or more of a central processing unit, a processor, a microprocessor, a processor core, etc., can include a plurality of metering modules 110 for each such processor.
  • a monitored value related to the work performed or work completed Wc can be measured by counting the accesses to memory, instructions completed, and/or other measurable quantities that are meaningful measurements of work by the currently executing task(s).
  • the monitored values, for example the number of accesses to memory (which can include the size of each access), can be received at the adder module 240, where they are summed and provided to the work completed module 210.
  • the monitored values can also be measured by the memory operations that can be acquired from the load/store unit 140 as illustrated by the arrow 165 in FIG. 1.
  • the work to be completed module 220 can receive a parameter value We related to the amount of work to be completed.
  • the parameter value related to the amount of work to be completed and/or work rate can be a predetermined value that is stored in the task performance profile of a task.
  • the work to be completed parameter value can be the actual amount of work to be completed, a counter value or the like that is proportional to or related to the amount of work to be completed.
  • the parameter value can be a constant parameter or calculated from the work rate to include, for example, work credit which can be calculated to account for the time the task waits to be executed by multiplying the work rate by the passage of time.
  • the work credit can also be calculated continuously, or periodically, such that the work to be done increases with the passage of time at the work rate even while the task is running. This computed work to be done can be limited to being no larger than a maximum work parameter.
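The work-credit accrual described above can be sketched as one update step. This is a minimal Python illustration, assuming work and time are plain numbers in consistent units; the function and parameter names are hypothetical and are not taken from the patent.

```python
def update_work_to_be_done(work_to_be_done: float, work_rate: float,
                           elapsed: float, max_work: float) -> float:
    """Accrue work credit at work_rate over `elapsed` time, capped at max_work.

    Hypothetical helper: the patent only states that the work to be done grows
    with time at the work rate and is limited by a maximum work parameter.
    """
    return min(work_to_be_done + work_rate * elapsed, max_work)
```

For example, a task that has waited 2 time units at a work rate of 50 gains 100 units of credit, unless that would push it past `max_work`.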
  • the parameter values can be predetermined by the management module 106 during the process of mapping a task to a computer system.
  • the work completed can be compared to the work to be completed by the comparator module 230.
  • the result of this comparison, the progress error, can be a value representing a differential between the work completed and the work to be completed and/or, by including time in the comparison, between the work completion rate and the work to be completed rate (the expected work rate).
  • One implementation can calculate a progress error based on a task achieving its expected work to be completed, within an expected runtime.
  • a negative progress error in the above example relation can indicate that the work completed is greater than the expected work at elapsed time qt.
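A minimal Python model of the metering module's counters and comparator is sketched below. It assumes, purely for illustration, that expected work accrues linearly over an expected runtime, so the progress error is the expected work at the elapsed time minus the work completed. This matches the sign convention above (negative means ahead of schedule) but is not necessarily the patent's exact relation. The class, field, and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Meter:
    """Hypothetical software model of metering module 110 (not the patent's design)."""
    work_to_be_completed: float  # We: expected work for this execution
    expected_runtime: float      # assumed: time in which We should be completed
    work_completed: float = 0.0  # Wc: accumulated by the adder

    def add(self, *samples: float) -> None:
        # Adder module 240: sum monitored quantities (e.g., instructions
        # completed or bytes accessed) into the work completed counter.
        self.work_completed += sum(samples)

    def progress_error(self, elapsed: float) -> float:
        # Comparator 230 (assumed linear form): expected work at the elapsed
        # time minus work completed. Negative means the task is ahead.
        fraction = min(elapsed / self.expected_runtime, 1.0)
        return self.work_to_be_completed * fraction - self.work_completed
```

With `We = 1000` over an expected runtime of 10, a task that has completed 400 units at time 5 has a progress error of 100 (behind); after completing 600 units it is -100 (ahead).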
  • a progress error can be used to allocate or adjust the allocation of performance related resources to tasks as detailed elsewhere in this document.
  • One or more instances of meter modules can be utilized to determine whether a task's progress is limited (directly or indirectly) by quantities a meter module may measure, for instance memory accesses or cache miss occurrences (i.e., failed attempts to read or write a piece of data in the buffer, resulting in a main memory access, etc.), by metering those quantities and comparing them to pre-calculated parameters.
  • the progress limit measurement can be achieved by providing the We module 220 of a meter module instance with a value to be compared to the accumulated metered quantity in the Wc module 210.
  • the value supplied to module 220 can be considered a progress limit parameter.
  • a comparator function can then compare the two values, including a comparison with respect to time, to determine whether progress is limited by the quantity measured; for example, limited by a certain cache miss rate or memory access rate.
  • the result can be expressed as a progress error (note that this result is different than the primary progress error arising from comparing work completed to work to be completed).
  • the progress limit error values can be used to allocate or adjust the allocation of performance related resources to tasks as detailed elsewhere in this document.
  • the progress limit parameters may be part of the task's performance profile.
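The progress-limit comparison can be sketched in the same style. This Python sketch assumes the progress limit parameter is expressed as an allowed rate per unit time (for example, cache misses per second) and that a positive result means the metered quantity exceeds what the limit allows, suggesting progress is limited by it. The sign convention and all names are assumptions, not the patent's definitions.

```python
def progress_limit_error(metered_count: float, limit_rate: float,
                         elapsed: float) -> float:
    """Hypothetical progress-limit comparison.

    metered_count: quantity accumulated in the Wc module (e.g., cache misses)
    limit_rate:    progress limit parameter given to the We module, per unit time
    Returns a positive value when the metered quantity exceeds the limit
    for the elapsed time.
    """
    return metered_count - limit_rate * elapsed


def is_progress_limited(metered_count: float, limit_rate: float,
                        elapsed: float) -> bool:
    # Assumed interpretation: any excess over the limit counts as limiting.
    return progress_limit_error(metered_count, limit_rate, elapsed) > 0
```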
  • a history of progress error and progress limit error values, from current and previous times a task was executing on the processor system, can be utilized to allocate or adjust the allocation of performance related resources to tasks as detailed elsewhere in this document. These values can be represented, for example, as cumulated progress and progress limit error values or as a series of current and historical values (which may be part of the task's performance profile).
  • the adaptive clock manager module 320 can manage the processor system's clock speed(s) by determining the required clock speed and setting the clock rate of the processor system 10 via the clock control module 180.
  • the processor system's clock speed(s) can be determined by computing the aggregate clock demand rate of the tasks in the computer system.
  • the task demand rate can represent the clock rate demand for task i to complete its expected work, We, within a time interval or deadline Ti.
  • the aggregate demand rate can include demand rates from the ready-to-run tasks while in other implementations the demand rate can include estimated demand rates from not ready-to-run tasks, calculating and/or speculating on when those tasks will be ready to run.
  • the overhead demand rate can be a constant parameter, or it can depend on system state such that one or more values for the overhead demand rate are selected depending on system state.
  • the overhead demand rate can be contained in the task demand rate (which then can incorporate the processor system overhead activity on behalf of the task).
  • the overhead demand rate can be predetermined by the management module 106 during the process of mapping tasks to a computer system.
  • the expected execution time is the expected time for the task to complete its expected work and can be part of the task's performance profile. In general, the expected execution time can be derived from the previous executions of the task (running on the processor system) and can be a measure of the cumulative time for the task's expected work to be completed. In addition, the expected execution time is typically dependent on the processor system frequency.
  • the task's demand rate can be a minimal clock rate for the task to complete its expected work within its time interval or deadline of Ti.
  • the task demand rate can be part of the task's performance profile.
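The aggregate demand rate computation can be sketched as follows. This Python sketch assumes a task's demand rate is expressed in cycles per second: the cycles for its expected work (expected execution time multiplied by the frequency at which that time was measured) spread over its interval Ti. The aggregate demand rate Ard is then the sum over ready-to-run tasks plus an optional overhead rate. The cycle-based model and the names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical subset of a task performance profile."""
    expected_exec_time: float  # seconds, measured at measured_freq_hz
    measured_freq_hz: float
    interval: float            # Ti: time interval or deadline, seconds

    def demand_rate(self) -> float:
        # Assumed model: cycles for the expected work, spread over Ti.
        cycles = self.expected_exec_time * self.measured_freq_hz
        return cycles / self.interval


def aggregate_demand_rate(ready_tasks, overhead_hz: float = 0.0) -> float:
    # Ard: sum of the ready-to-run tasks' demand rates plus an overhead rate.
    return sum(t.demand_rate() for t in ready_tasks) + overhead_hz
```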
  • the clock manager module 320 can request that the processor run at a clock frequency related to the aggregate demand rate, Ard, making such requests when the value of Ard changes in accordance with certain dependencies described elsewhere in this document.
  • the actual system may only be capable of supporting a set of discrete processor and system clock frequencies, in which case the system is set to a supported frequency such that the processor system frequency is higher than or equal to the aggregate demand rate.
  • multiple clock cycles can be required to change the clock frequency in which case the requested clock rate can be adjusted to account for clock switching time.
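Selecting among discrete supported frequencies can be sketched with a binary search. This Python sketch picks the lowest supported frequency that is greater than or equal to Ard. Falling back to the highest frequency when Ard exceeds every supported value is an assumed policy, and clock switching time is not modeled.

```python
import bisect


def select_frequency(supported_hz, aggregate_demand_hz):
    """Lowest supported frequency >= the aggregate demand rate (Ard).

    supported_hz must be sorted in ascending order. If Ard exceeds every
    supported frequency, the highest one is returned (assumed policy).
    """
    i = bisect.bisect_left(supported_hz, aggregate_demand_hz)
    return supported_hz[min(i, len(supported_hz) - 1)]
```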
  • the progress error and/or progress limit errors can be monitored and the task demand rate updated based on one or more of these values, for example at periodic intervals.
  • the updated task demand rate results in a new aggregate demand rate which can result in changing the processor system's clock as described elsewhere in this document.
  • the progress error and progress limit errors can be used to adjust the demand rate directly or through one or more rate adaption functions implemented by the adaptive clock manager module 320. For example, one rate adaption function can adjust the task demand rate if the error is larger than certain limits, while another adaption function can change the demand rate should the error persist for longer than a certain period of time.
  • the rate adaption function(s) can be used to dampen rapid changes in task and/or aggregate demand rates, which may be undesirable in particular processor systems and/or may arise from certain tasks; these functions can be system dependent and/or task dependent.
  • the rate adaptation functions can be part of the task's performance profile.
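The two example rate adaption functions above (act only when the error exceeds a limit, act only when it persists) can be combined in one small sketch. The Python sketch below assumes the progress error convention used earlier (positive means behind schedule), a linear gain, a persistence count measured in update calls, and clamping to the minimum/maximum threshold parameters. All names and the linear form are hypothetical.

```python
class RateAdapter:
    """Hypothetical rate adaption function for a task demand rate.

    The rate changes only when |progress error| exceeds a deadband and the
    excursion has persisted for at least min_persist consecutive updates.
    """

    def __init__(self, deadband, min_persist, gain, min_rate, max_rate):
        self.deadband = deadband
        self.min_persist = min_persist
        self.gain = gain          # rate change per unit of progress error (assumed)
        self.min_rate = min_rate  # minimum/maximum threshold parameters
        self.max_rate = max_rate
        self._count = 0           # consecutive updates outside the deadband

    def update(self, demand_rate, progress_error):
        if abs(progress_error) <= self.deadband:
            self._count = 0
            return demand_rate
        self._count += 1
        if self._count < self.min_persist:
            return demand_rate
        # Positive error (behind) raises the rate; negative (ahead) lowers it.
        new_rate = demand_rate + self.gain * progress_error
        return max(self.min_rate, min(new_rate, self.max_rate))
```

A deadband suppresses reaction to small errors, while the persistence count filters out short transients; both dampen rapid changes in the aggregate demand rate.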
  • the adaptive clock manager module 320 can adjust the aggregate demand rate by adjusting the individual task demand rates to account for the tasks meeting their expected work in their expected time.
  • the processor clock frequency can be adjusted relative to the aggregate demand rate while adjusting the individual task demand rates separately with both adjustments arising from progress error and progress limit error values.
  • the processor clock frequency, the aggregate demand rate, and the individual task demand rates can be adjusted in closed-loop form so that the work completed by all tasks under consideration matches their work to be completed.
  • Demand rate adjustments can allow the overhead demand rate to be included in the individual task demand rates and thus be an optional parameter.
  • Minimum and maximum threshold parameters can be associated with the task demand rate. These minimum and maximum threshold parameters can relate to progress error and progress limit error and can be used to limit the minimum and/or maximum task demand rate. In another implementation, thresholds can limit the minimum and maximum processor clock frequency chosen during the execution of the task. The minimum and maximum threshold parameters can be part of the task's performance profile.
  • the adaptive clock manager module 320 can detect when adjusting the processor clock frequency higher does not increase the work completed rate, in which case the requested clock rate can be adjusted down without adversely reducing the rate of work completed. This condition can be detected, for example, by observing a change, or lack thereof, in progress error as the processor frequency is changed.
  • the clock manager module 320 can adjust the requested clock rate higher when the task's state changes such that increasing the clock frequency does increase the work completed rate. This detection can be accomplished by setting the processor clock frequency such that the progress error meets a certain threshold criterion; when the error falls below a certain threshold, indicating greater progress, the clock frequency can be adjusted higher.
  • Certain rate adaption function(s) which can include progress error and/or progress limit error, can be utilized in computing the processor clock frequency. These rate adaption functions can be system and/or task dependent and can be part of the task performance profile.
  • the task demand rate, rate adaption parameters, progress limit parameters, and/or thresholds, etc. can dynamically change with task state such that the performance profile parameters are sets of parameters where each set may be associated with one or more program states and changed dynamically during the execution of the task by the management module 106.
  • these parameters can also be adjusted directly by the task (rather than by the management module 106).
  • a task's demand rate can be added to the aggregate demand rate when the task becomes ready-to-run which may be determined by the scheduler module 130 (e.g., based on scheduling or other events such as becoming unblocked on I/O operations, etc.) or other subsystems such as the I/O subsystem.
  • This demand rate can initially be specified by, or calculated from, the task's performance profile and can be updated based, for example, on the task's work completion progress over time, updated through a rate adaption function as a function of progress error, and the like.
  • the performance profile can contain one or more task state dependent performance parameters. In such cases, the task demand rate can be updated when these parameters change due to task state, or system state, change and can be further updated while the task is executing on the processor system through rate error adaptation (using the progress error and/or progress limit error in the computation of performance profile parameters).
  • the aggregate demand rate can be recalculated from the individual task demand rates.
  • the new aggregate demand rate can be calculated by subtracting the task's cumulative demand rate at the end of the time interval or of the current execution (when the expected work is completed), whichever is later. This can be done by placing the cumulative demand rate in a time-based queuing system, such as a calendar queue, which presents certain information at a specific time in the future.
  • This implementation reserves the task's demand rate within the aggregate demand rate from the time the task rate is first added until the end of its time interval or until it completes execution, whichever is later.
  • the adaptive clock manager module 320 can utilize a calendar queue for example, Calendar Queue Entry 1 (other calendar queue techniques can be utilized).
  • the adaptive clock manager module 320 can insert a task's cumulative clock demand rate into the location Ti - Rt units in the future, that is, the difference between the time interval, Ti, and the current real time, Rt (for example, the tasks under Calendar Queue Entry N-1).
  • the index can be calculated as MIN(Ti - Rt, MAX_CALENDAR_SIZE - 1), where MAX_CALENDAR_SIZE (N) is the number of discrete time entries of the calendar queue.
  • the index can represent a time related value in the future from the current time or real time.
  • a task with Ti > Rt can be reinserted into the calendar queue within a certain threshold.
  • the threshold and the size of the calendar can depend on the system design, precision of the real time clock and the desired time granularity.
  • the calendar queue can be a circular queue such that as the real time advances, the previous current time entry becomes the last entry in the calendar queue.
  • entry 0 becomes the oldest queue entry.
  • the index can take into account the fact that the calendar is a circular queue.
  • the current time index can advance from 0 to N-1 as real time advances. Thus at point N-1 the current time index wraps back to zero.
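The calendar-queue mechanism above can be sketched as a circular array of slots. In this Python sketch, time is counted in integer slot units. A reserved demand rate is added to the aggregate immediately and released when its slot comes due. Entries whose Ti lies beyond the calendar's range are placed in the furthest slot and reinserted when that slot is reached. The offset uses MIN(Ti - Rt, N - 1) and is additionally kept at least 1 so an entry never lands in the slot being processed; that lower bound, the class, and the method names are assumptions for illustration.

```python
class CalendarQueue:
    """Hypothetical circular calendar queue releasing reserved demand rates.

    Times (ti, now) are integers in calendar slot units; n_slots >= 2.
    """

    def __init__(self, n_slots: int):
        self.n = n_slots
        self.slots = [[] for _ in range(n_slots)]
        self.now = 0           # current real time Rt, in slot units
        self.aggregate = 0.0   # reserved portion of the aggregate demand rate

    def reserve(self, task_id, demand_rate: float, ti: int) -> None:
        """Add a task's demand rate and schedule its release at time ti."""
        self.aggregate += demand_rate
        self._insert(task_id, demand_rate, ti)

    def _insert(self, task_id, demand_rate: float, ti: int) -> None:
        # Offset MIN(Ti - Rt, N - 1), kept >= 1 (assumption, see lead-in).
        offset = max(1, min(ti - self.now, self.n - 1))
        self.slots[(self.now + offset) % self.n].append((task_id, demand_rate, ti))

    def tick(self) -> None:
        """Advance real time by one slot and process the entries now due."""
        self.now += 1
        slot = self.slots[self.now % self.n]  # index wraps around the circle
        due = slot[:]
        slot.clear()
        for task_id, rate, ti in due:
            if ti > self.now:
                # Ti lies beyond the calendar's range: reinsert further ahead.
                self._insert(task_id, rate, ti)
            else:
                # The interval has ended: release the reserved demand rate.
                self.aggregate -= rate
```

Because each slot is visited once per revolution, insertion and release are constant-time operations regardless of how many tasks hold reservations.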
  • the adaptive clock manager module 320 can additionally manage entering into and resuming from the processor system's idle state. Should the aggregate clock demand be zero, the clock manager module 320 can place the processor system into an idle state until such time that the aggregate clock rate is/will be greater than zero. In some processor systems, multiple clock cycles may be required to enter and resume from idle state, in which case the time of entering and resuming idle state, as well as the requested clock rate upon resuming active state, can be adjusted to account for idle enter and resume time (as well as clock switching time).
  • the clock manager module 320 can also be capable of achieving certain aggregate demand rates, over a period of time, by requesting a frequency greater than or equal to the aggregate demand rate and placing the processor system into an idle state such that the average frequency (considering the idle time to have a frequency of zero) is equal to or higher than the aggregate demand rate.
  • in some cases, the processor system 10 has greater energy efficiency when executing at a higher frequency and then being placed in idle state to satisfy certain aggregate demand rates.
  • the requested rate can be adapted to be higher than the calculated aggregate demand rate to bias placing the processing system in idle state.
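The duty-cycle arithmetic behind running faster and then idling follows directly from the averaging rule above: if the processor runs at a requested frequency f for a fraction (1 - x) of the time and idles (0 Hz) for the rest, the average is f(1 - x), so matching Ard requires x = 1 - Ard / f. This Python sketch ignores idle enter/resume time and clock switching time; the function name is hypothetical.

```python
def idle_fraction(requested_hz: float, aggregate_demand_hz: float) -> float:
    """Fraction of time to idle so the average frequency equals Ard.

    Idle time counts as 0 Hz; overheads for entering/resuming idle are ignored.
    """
    if aggregate_demand_hz <= 0:
        return 1.0  # no demand: the processor can stay idle
    if requested_hz < aggregate_demand_hz:
        raise ValueError("requested frequency must be >= the aggregate demand rate")
    return 1.0 - aggregate_demand_hz / requested_hz
```

For example, meeting a 200 MHz aggregate demand at 800 MHz means idling 75% of the time, since 800 MHz × 0.25 = 200 MHz on average.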
  • the parameters from which the frequency and idle state selection are made can be derived from characterizing the processor system by the management module 106 during the process of mapping task(s) to a computer system.
  • the adaptive clock management module can request the processor system enter idle state by signaling the idle state module 184 to idle the processor system.
  • the idle state can be exited when an event, such as an interrupt from an I/O device or timer, etc., occurs.
  • the aggregate demand rate can be calculated individually for each processor or collectively for all processors or a subset of processors or a combination of these. Some tasks can be assigned to certain processors while others may be free to run on any or a certain set of processors.
  • the aggregate demand rate can be calculated for all processors, observing the restrictions and freedoms each task has to run on a certain processor, including an affinity property where it is desirable to run a task on a particular processor.
  • each processor's clock rate and idle state can be controlled individually.
  • the clock manager module 320 can select a combination of clock rates while idling one or more processors to achieve minimum energy.
  • where the processors share a clock rate but their idle states may be controlled individually, a single clock rate can be chosen while idling one or more processors to achieve minimum energy consumption.
  • the clock rate can be chosen such that the aggregate demand rate for all, or a plurality of subsets of, processors is divided among the processors to achieve certain desired goals, such as maximizing throughput or minimizing the completion times of tasks individually or of parallel computations performed by a plurality of tasks. Interaction with the scheduler module 130 (in the determination of which task(s) execute on which processor) may be necessary to achieve the desired goals.
  • the clock module 180 and idle state module 184 can interact with other computer system components not shown in the drawings. These interactions may be necessary to enable changing the one or more processors' clock speed(s) or idle state(s). For example, changing the processor frequency can require changing the clock speed of busses, peripherals, system memory 150, etc. Similarly, to place the processor in, or resume from, an idle state, certain busses, peripherals, system memory 150, etc., may require preparation before such a state is entered (such as quiescing an I/O device and writing its buffers to system memory) or before active state is resumed (such as initializing an I/O device to commence operation(s)).
  • the cache occupancy management module 340 can manage the use of buffer or cache occupancy quotas. These occupancy quotas can be numerical limits of the number of buffers a task may (or should) use.
  • the occupancy quota, Oq, and current occupancy Oc can be additionally stored in the task's performance profile.
  • Cache occupancy can be selectively allocated using, for example, a cache replacement algorithm such as those described in co-pending U.S. Pat. App. Ser. No. 13/072,529 entitled "Control of Processor Cache Memory Occupancy", filed on March 25, 2011 and claiming priority to U.S. Pat. App. Ser. No. 61,341,069, the contents of both of which are hereby incorporated by reference.
  • Occupancy in this case can be characterized as an indication of actual number of buffers being used by a task.
  • a buffer is a memory or region of memory used to temporarily hold data (such as an input/output buffer cache) while it is being moved from one place to another or to allow faster access (such as a processor instruction/data cache).
  • as buffers are allocated to the task, the occupancy counter Oc can be incremented; as buffers are de-allocated from the task, the occupancy counter can be decremented. Whenever the occupancy counter is greater than the occupancy quota (Oc > Oq), the task is exceeding its occupancy quota.
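The occupancy accounting above amounts to a counter compared with a quota. This Python sketch tracks one task's occupancy counter Oc against a single quota Oq. Clamping the counter at zero on de-allocation is an assumed safeguard, and the class and method names are hypothetical.

```python
class OccupancyTracker:
    """Hypothetical per-task buffer occupancy accounting (Oc against quota Oq)."""

    def __init__(self, quota: int):
        self.quota = quota   # Oq
        self.occupancy = 0   # Oc

    def allocate(self, n: int = 1) -> None:
        # Buffers allocated to the task increase its occupancy.
        self.occupancy += n

    def deallocate(self, n: int = 1) -> None:
        # Buffers released by the task decrease it (clamped at zero).
        self.occupancy = max(0, self.occupancy - n)

    def over_quota(self) -> bool:
        # The task exceeds its quota whenever Oc > Oq.
        return self.occupancy > self.quota
```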
  • Occupancy quotas can contain multiple quota parameters such that higher or lower priority is given to comparing the occupancy to these additional quotas.
  • a task's occupancy quota can be part of its performance profile.
  • This performance profile parameter may be statically set, may be dependent on program state, or may be dynamically calculated by the cache occupancy manager. Dynamic occupancy quotas may be adjusted based on the performance of the task, for example meeting its deadline, based on the cache miss information during its execution or feedback from execution in terms of expected work compared to work completed using progress error and/or progress limit errors as described elsewhere in this document.
  • the cache occupancy manager can adjust the occupancy quotas. Such adjustments can be based, for example, on pre-defined / configured limits which in turn can be a combination of system-level configured limits and limits contained in the task's performance profile. In one implementation, the occupancy quota can be adjusted based on the differential between a task's expected work rate and work completed rate, utilizing progress error for instance, or the cache miss rate, or a combination of the two.
  • the computation of the occupancy quota can be made such that the occupancy quota can be increased when a task is below its expected work rate or the cache miss rate is above a certain threshold; conversely, the occupancy quota can be reduced when the task is exceeding its expected work or the cache miss rate is below a certain threshold.
  • This computation can also take progress limit error values into account, for example, by detecting that progress is being limited by a factor other than occupancy.
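The quota-adjustment rule above can be sketched as a single step function. This Python sketch assumes a fixed step size, the earlier progress error convention (positive means behind the expected work rate), and that an increase takes precedence when the two criteria disagree. It leaves the quota unchanged when a progress limit error indicates that another factor is limiting progress. The result is clamped to configured minimum and maximum quotas. The step size, precedence rule, and all names are assumptions.

```python
def adjust_quota(quota, progress_error, miss_rate, *, miss_high, miss_low,
                 step, min_quota, max_quota, limited_elsewhere=False):
    """Hypothetical occupancy-quota update.

    progress_error > 0 means the task is behind its expected work rate.
    limited_elsewhere: a progress limit error shows some factor other than
    occupancy is limiting progress, so more occupancy would not help.
    """
    if limited_elsewhere:
        return quota
    if progress_error > 0 or miss_rate > miss_high:
        quota += step   # behind schedule or missing too often: grow
    elif progress_error < 0 or miss_rate < miss_low:
        quota -= step   # ahead of schedule or few misses: shrink
    return max(min_quota, min(quota, max_quota))
```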
  • the cache occupancy management module can control occupancy quotas by setting quotas in the instruction cache 104 and/or data cache 145 if they have occupancy quota control mechanisms, or other buffer / caching components that can be part of, or coupled to, the processing system or computer system, such as a program stored in system memory 150.
  • the cache occupancy parameters can relate to a task (or group of tasks) such that the system allocates occupancy quotas to or on behalf of the task, perhaps by keeping track of a task identifier utilized by both the cache occupancy management module and the respective I/O subsystems.
  • the quota control mechanisms can be implemented in hardware or software (firmware) or a combination of both.
  • Cache occupancy can include the mapping of virtual memory to physical memory, where virtual memory refers to memory management techniques that allow tasks to utilize virtual memory address space(s) which may be separate from physical address space(s).
  • the physical memory in effect acts as a cache allowing a plurality of tasks to share physical memory wherein the total size of the virtual memory space(s) may be larger than the size of physical memory, or larger than the physical memory allocated to one or more tasks, and thus the physical memory, and/or a portion thereof, acts as a "cache".
  • Physical memory occupancy of a task can be managed as described elsewhere in this document.
  • the management module may be a separate module, as in 106, or may be an integral part of one or more operating systems, virtual machine monitors, etc.
  • a multiplicity of caches and/or buffer subsystems can exist and thus there can be several occupancy quota parameters utilized and stored in the task's performance profile.
  • These caches and buffers can be embodied in hardware or software (firmware) or a combination of both.
  • a task's occupancy quota(s) can be modified such that work completed rate is matched to the expected work completed rate in a closed loop form where occupancy can be increased to meet expected work rates and/or decreased when expected work rates are being met or exceeded.
  • the modification of occupancy quota(s) can utilize rate adaption functions, which may be task dependent and dependent on task state.
  • Task prioritization relative to occupancy quotas can be utilized to guarantee certain higher priority tasks meet their expected work at the expense of lower priority tasks.
  • the management module 106 can control the overall allocation of occupancy quotas by determining/controlling the maximum and minimum occupancy quotas and/or the maximum and minimum changes allowed to occupancy quotas, etc. (e.g., through a set of policies/rules).
  • the I/O bandwidth management module 360 can manage the utilization of bandwidth (a measure of data transferred per unit time) by the computer system's input/output subsystem(s). I/O operations performed by tasks, or by an operating system on behalf of a task's I/O request(s) for instance, can be managed as a performance resource by the I/O bandwidth manager to ensure that the tasks' performance requirements for I/O operations are met.
  • a task's I/O bandwidth can be part of its performance profile. This performance profile parameter can be statically set (based on, for example, program state), or it can be dynamically calculated, such as by the I/O bandwidth manager. Dynamic I/O bandwidth values can be adjusted based on the performance of the task, for example, meeting its calculated deadline or feedback from execution in terms of expected work rate vs. work completed rate.
  • the I/O bandwidth manager can adjust the I/O bandwidth parameters, within certain configured limits which can be a combination of system-level configured limits and limits contained in the task's performance profile.
  • the I/O bandwidth can be modified utilizing progress error and/or progress limit error values, or the expected I/O rate, or a combination of these.
  • the computation of an I/O bandwidth rate can be made such that the I/O bandwidth may be increased or decreased depending on progress and/or progress limit error values and thresholds. In general, these values and thresholds can be determined so as to match the task's work completed rate to its work to be completed rate without using I/O bandwidth unnecessarily.
  • a task's work may be the I/O bandwidth rate, in which case the task's primary work is the transference of I/O data at a certain rate.
  • I/O bandwidths can be adjusted such that the work completed rate is matched to the work to be completed rate in a closed loop form; where I/O bandwidths can be increased to meet expected work rates and/or decreased when expected work rates are being exceeded considering progress and progress limit errors.
  • I/O resources can be allocated through I/O bandwidth allocations, managed through the I/O bandwidth manager, in such a way as to provide system performance guarantees. Such guarantees can be that the total I/O bandwidth is not over allocated or that certain tasks receive their I/O bandwidth at the expense of others (depending on a set of policies/rules).
  • the I/O bandwidth management module can control I/O bandwidth by setting bandwidth parameters in the I/O subsystem module 108 for such bandwidth control mechanisms that exist, or other I/O components that may be part of, or coupled to, the processing system or computer system, such as a program stored in system memory 150.
  • the I/O bandwidth parameters can relate to a task (or group of tasks) such that the system allocates bandwidth to or on behalf of the task. In some variations, this can comprise keeping track of a task ID to associate with I/O operations such that the I/O bandwidth management module and the respective I/O subsystems may attribute data transference to a specific task.
  • the I/O bandwidth control mechanisms can be implemented in hardware or software (firmware) or a combination of both.
  • DMA controllers can be utilized. Direct memory access (DMA) is a feature of modern computers and microprocessors that allows certain hardware subsystems within the computer to access system memory for reading and/or writing independently of the central processing unit. Many hardware systems use DMA, including disk drive controllers, graphics cards, network cards, sound cards, and Graphics Processing Units (GPUs). DMA can also be used for intra-chip data transfer in multi-core processors, especially in multiprocessor systems-on-chip, where each processing element is equipped with a local memory (often called scratchpad memory) and DMA can be used for transferring data between the local memory and the main memory.
  • the I/O bandwidth manager can control I/O bandwidth through mechanisms that apply bandwidth control to I/O operations, such as bandwidth shaping.
  • Bandwidth shaping can be accomplished by delaying certain data transference requests until sufficient time has passed to accumulate credit for the transference (where credit is a measure of data that is accumulated over time at a certain rate, representing the bandwidth).
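Credit-based shaping as described above can be sketched as follows. This Python sketch accrues credit in bytes at the configured rate, caps it at a burst size so idle periods cannot bank unlimited credit, and lets a request proceed only when enough credit exists (otherwise reporting how long it should be delayed). The burst cap and the class and method names are illustrative assumptions, not part of the patent text.

```python
class CreditShaper:
    """Hypothetical credit-based bandwidth shaper for I/O or DMA requests.

    Credit (bytes) accrues at rate_bps bytes per second, up to burst_bytes.
    burst_bytes must be at least the largest single request, or that
    request can never be issued.
    """

    def __init__(self, rate_bps: float, burst_bytes: float, start: float = 0.0):
        self.rate = rate_bps
        self.burst = burst_bytes
        self.credit = 0.0
        self.last = start

    def _accrue(self, now: float) -> None:
        # Credit grows with elapsed time at the configured bandwidth.
        self.credit = min(self.burst, self.credit + self.rate * (now - self.last))
        self.last = now

    def delay_for(self, nbytes: float, now: float) -> float:
        """Seconds a request of nbytes must be delayed if submitted at `now`."""
        self._accrue(now)
        return max(0.0, (nbytes - self.credit) / self.rate)

    def try_transfer(self, nbytes: float, now: float) -> bool:
        """Issue the transfer, consuming credit, only if enough credit exists."""
        self._accrue(now)
        if self.credit >= nbytes:
            self.credit -= nbytes
            return True
        return False
```

This is the familiar token-bucket scheme; a hardware DMA engine could apply the same rule per channel.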
  • the bandwidth management of I/O and other data transference operations, including DMA operations, can be implemented in hardware or by software (or firmware).
  • the I/O bandwidth management system can request I/O operation prioritization based on tasks matching their work completed to their work to be completed, taking progress error and progress limit error into account. This can, for example, consider progress and progress limit errors for all tasks of interest such that tasks with greater progress error, within certain progress limit error values, are given priority over tasks with lesser progress error within progress limit error values.
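The prioritization rule above can be sketched as a sort. In this Python sketch, tasks whose progress limit error is within a threshold come first, ordered by descending progress error (furthest behind first), followed by the remaining tasks in the same order. Treating "within certain progress limit error values" as a single threshold is an assumption, as are the names.

```python
from typing import NamedTuple


class IoTask(NamedTuple):
    name: str
    progress_error: float        # expected minus completed; larger = further behind
    progress_limit_error: float  # positive = limited by another quantity (assumed)


def io_priority_order(tasks, limit_threshold: float = 0.0):
    """Order tasks for I/O service under the assumed rule described above."""
    # False sorts before True, so tasks within the threshold come first;
    # within each group, larger progress error gets higher priority.
    return sorted(tasks, key=lambda t: (t.progress_limit_error > limit_threshold,
                                        -t.progress_error))
```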
  • The progress error and progress limit errors can be used to adjust a task's I/O bandwidth parameters directly or through one or more rate adaptation functions implemented by the I/O bandwidth manager.
  • One rate adaptation function can adjust the I/O bandwidth only if the error exceeds certain limits, while another can change the demand rate only if the error persists for longer than a certain period of time.
  • The rate adaptation function(s) can be system-dependent and/or task-dependent.
  • The rate adaptation functions can be part of the task's performance profile.
  • Task prioritization relative to I/O bandwidth can be used to guarantee that certain higher-priority tasks meet their expected work at the expense of lower-priority tasks.
  • The management module 106 can control the overall allocation of I/O bandwidth by determining or controlling the maximum and minimum I/O bandwidth and/or bandwidth parameters (e.g., through a set of
  • The scheduler module 130 can select the next task(s) to be executed from its list of tasks based on the task parameters, including task priority.
  • The scheduler module 130 can indicate to the processor system 10 that a higher-priority task is ready.
  • The processor system 10 (or software on the processor system 10) can decide to preemptively switch from the currently running task to the higher-priority task.
  • The scheduler module 130, or software in the processor system 10, can indicate that a higher-priority task is to be selected for execution, possibly replacing the currently running task. In that case, the task currently running on the processor system 10 can also be indicated to the performance resource manager 120.
  • The performance resource manager can save the state of the metering module(s) 110 used for the currently running task in that task's context, and direct the metering module to monitor the newly selected task (by updating the modules 210, 220 and the comparator function(s) within the metering module). Additional state in the performance resource manager can be modified similarly as a result of this task switch.
  • Scheduling can be assigned on a processor-by-processor basis, such that a task on a particular processor can be influenced by the progress errors and/or progress limit errors of that task. This can also be done on a thread-by-thread basis for multithreaded systems.
  • FIG. 5 is a process flow diagram illustrating a method 500 in which, at 510, the execution of a plurality of tasks by a processor system is monitored. Based on the monitoring, at 520, tasks requiring adjustment of performance resources are identified by calculating at least one of a progress error and a progress limit error for each task.
  • Subsequently, the performance resources of the processor system allocated to each identified task are adjusted.
  • The adjusting can include, for example, one or more of: adjusting the clock rate of at least one processor in the processor system executing the task, adjusting the amount of cache and/or buffer to be used by the task, and adjusting the amount of input/output (I/O) bandwidth to be used by the task.
  • Various implementations of the subject matter described herein may be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
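The credit-based bandwidth shaping described above can be sketched in Python as a token-bucket-style shaper. This is a minimal illustration, not the patent's implementation: the class `CreditShaper`, its fields, and the cap on accumulated credit (`max_credit`, which bounds burst size) are assumptions introduced here, and time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class CreditShaper:
    """Credit-based shaper for one task's I/O (illustrative sketch).

    Credit accrues at `rate` bytes per second, capped at `max_credit`
    bytes (an assumed cap). A transfer is released only once the
    accumulated credit covers its size; otherwise the caller delays it.
    """
    rate: float             # bandwidth allotted to the task, in bytes/second
    max_credit: float       # hypothetical cap on stored credit; bounds bursts
    credit: float = 0.0     # credit currently available, in bytes
    last_time: float = 0.0  # time of the last credit update, in seconds

    def _accrue(self, now: float) -> None:
        # Credit grows linearly with elapsed time, saturating at max_credit.
        elapsed = max(0.0, now - self.last_time)
        self.credit = min(self.max_credit, self.credit + elapsed * self.rate)
        self.last_time = max(self.last_time, now)

    def delay_for(self, size: int, now: float) -> float:
        """Seconds a transfer of `size` bytes must wait before it may be issued."""
        if size > self.max_credit:
            raise ValueError("transfer larger than max_credit can never be issued")
        self._accrue(now)
        if self.credit >= size:
            return 0.0
        return (size - self.credit) / self.rate

    def try_send(self, size: int, now: float) -> bool:
        """Consume credit and return True if the transfer may proceed now."""
        if self.delay_for(size, now) == 0.0:
            self.credit -= size
            return True
        return False
```

With `rate=1000.0` bytes/s, for instance, a 512-byte request at `now=0.0` must be held for 0.512 s; by `now=0.6`, 600 bytes of credit have accrued, so the transfer proceeds and leaves about 88 bytes of credit. Capping the credit is a design choice: without a cap, a task that idled for a long time could later burst far above its allotted bandwidth.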
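The I/O prioritization rule described above (among tasks whose progress limit error is within bounds, greater progress error wins) could look like the following sketch. The sign conventions are assumptions: here a positive `progress_error` means a task is behind its expected work, a positive `progress_limit_error` means it has exceeded its permitted limit, and `limit_tolerance` is a hypothetical parameter standing in for the "certain progress limit error values". Tasks outside the tolerance are not dropped; they are simply served after the others.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskProgress:
    name: str
    progress_error: float        # assumed: expected work minus completed work (> 0: behind)
    progress_limit_error: float  # assumed: completed work minus limit (> 0: over its limit)


def io_priority_order(tasks: List[TaskProgress], limit_tolerance: float = 0.0) -> List[str]:
    """Return task names in I/O service order.

    Tasks whose progress limit error is within `limit_tolerance` come first,
    ranked by largest progress error; tasks beyond it are ranked the same
    way but served afterwards.
    """
    within = [t for t in tasks if t.progress_limit_error <= limit_tolerance]
    over = [t for t in tasks if t.progress_limit_error > limit_tolerance]
    within.sort(key=lambda t: t.progress_error, reverse=True)
    over.sort(key=lambda t: t.progress_error, reverse=True)
    return [t.name for t in within + over]
```

For example, a task far behind schedule but already over its limit (such as a hypothetical "logger" task) ends up behind two in-limit tasks, even though its progress error is the largest.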
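The two example rate adaptation functions mentioned above, one that acts only on errors larger than a limit and one that changes the demand rate only when the error persists, might be sketched as follows. The proportional correction (`gain * error`), the clamp at zero, and all names are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


def deadband_adapt(rate: float, error: float, threshold: float, gain: float) -> float:
    """Adjust the I/O bandwidth rate only when |error| exceeds `threshold`."""
    if abs(error) <= threshold:
        return rate                       # small errors are ignored
    return max(0.0, rate + gain * error)  # proportional correction, never negative


@dataclass
class PersistenceAdapter:
    """Change the demand rate only when the error has kept the same sign
    for longer than `hold_time` seconds."""
    hold_time: float
    gain: float
    since: Optional[float] = None  # when the current run of same-sign error began
    sign: int = 0                  # sign of that run: -1, 0 or +1

    def update(self, rate: float, error: float, now: float) -> float:
        sign = (error > 0) - (error < 0)
        if sign == 0 or sign != self.sign:
            # Error vanished or flipped sign: (re)start the persistence window.
            self.sign = sign
            self.since = now if sign != 0 else None
            return rate
        if now - self.since > self.hold_time:
            self.since = now  # restart the window after each adjustment
            return max(0.0, rate + self.gain * error)
        return rate
```

Because both functions share the same inputs (current rate and error), either could be recorded in a task's performance profile and selected per task or per system.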
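The task switch described above, in which the metering state of the outgoing task is saved in its context and the metering module is redirected to the newly selected task, can be modeled in software as a save/restore pair. This sketch assumes the metering state is just two counters (work completed and cycles). The modules 210, 220 and the comparator function(s) inside the metering module are not modeled, and every name here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MeterState:
    work_completed: int = 0  # e.g. units of work retired while the task ran
    cycles: int = 0          # processor cycles consumed while the task ran


@dataclass
class TaskContext:
    name: str
    meter: MeterState = field(default_factory=MeterState)  # saved meter state


class MeteringModule:
    """Meter that tracks only the currently selected task."""

    def __init__(self) -> None:
        self.state = MeterState()

    def tick(self, work: int, cycles: int) -> None:
        self.state.work_completed += work
        self.state.cycles += cycles


def switch_task(meter: MeteringModule, current: TaskContext, nxt: TaskContext) -> None:
    # Save the outgoing task's meter state into its own context...
    current.meter = MeterState(meter.state.work_completed, meter.state.cycles)
    # ...then redirect the meter to the incoming task by restoring its saved state.
    meter.state = MeterState(nxt.meter.work_completed, nxt.meter.cycles)
```

Copying the counters, rather than sharing one object, keeps a task's saved context unchanged while another task is being metered.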
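Method 500 (monitoring at 510, identifying tasks at 520, then adjusting their performance resources) can be illustrated with one pass of a simple control loop. The error definitions, the tolerance, and the fixed adjustment steps are assumptions for illustration only: the clock is doubled or halved within hypothetical `MIN_CLOCK_MHZ`/`MAX_CLOCK_MHZ` bounds, the cache share changes by one way, and I/O bandwidth rises by 25% or falls by 20%. The measured `completed_work` values stand in for the monitoring step.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical adjustment bounds; the description does not specify any.
MAX_CLOCK_MHZ = 1600
MIN_CLOCK_MHZ = 100


@dataclass
class Task:
    name: str
    expected_work: float   # work the task should have completed by now
    completed_work: float  # work actually completed, as measured by metering
    work_limit: float      # ceiling on the work the task may complete by now
    clock_mhz: int         # allocated resources: processor clock,
    cache_ways: int        #   share of cache,
    io_mbps: float         #   and I/O bandwidth


def errors(t: Task) -> Tuple[float, float]:
    progress_error = t.expected_work - t.completed_work  # > 0: behind schedule
    limit_error = t.completed_work - t.work_limit        # > 0: over its ceiling
    return progress_error, limit_error


def manage_once(tasks: List[Task], tolerance: float) -> List[str]:
    """Identify tasks needing adjustment and adjust them; return their names."""
    adjusted = []
    for t in tasks:
        progress_error, limit_error = errors(t)
        if progress_error > tolerance:     # behind: grant more resources
            t.clock_mhz = min(t.clock_mhz * 2, MAX_CLOCK_MHZ)
            t.cache_ways += 1
            t.io_mbps *= 1.25
        elif limit_error > 0:              # over its ceiling: reclaim resources
            t.clock_mhz = max(t.clock_mhz // 2, MIN_CLOCK_MHZ)
            t.cache_ways = max(t.cache_ways - 1, 1)
            t.io_mbps *= 0.8
        else:
            continue                       # within bounds: leave unchanged
        adjusted.append(t.name)
    return adjusted
```

A real system would more likely adjust in smaller steps, or route the errors through rate adaptation functions like those sketched earlier, rather than doubling or halving the clock in a single pass.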

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)
  • Hardware Redundancy (AREA)
  • Power Sources (AREA)
PCT/US2011/030096 2010-03-26 2011-03-25 Fine grain performance resource management of computer systems WO2011120019A2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
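KR1020127027941A KR20130081213A (ko) 2010-03-26 2011-03-25 Fine-grained performance resource management of computer systems
JP2013501534A JP2013527516A (ja) 2010-03-26 2011-03-25 Fine-grained performance resource management of computer systems
CN2011800254093A CN102906696A (zh) 2010-03-26 2011-03-25 Fine-grained performance resource management of computer systems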
EP11760356.3A EP2553573A4 (en) 2010-03-26 2011-03-25 FINANCIAL PERFORMANCE RESOURCE MANAGEMENT OF COMPUTER SYSTEMS

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US34117010P 2010-03-26 2010-03-26
US34106910P 2010-03-26 2010-03-26
US61/341,170 2010-03-26
US61/341,069 2010-03-26

Publications (2)

Publication Number Publication Date
WO2011120019A2 true WO2011120019A2 (en) 2011-09-29
WO2011120019A3 WO2011120019A3 (en) 2012-01-26

Family

ID=44673905

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/030096 WO2011120019A2 (en) 2010-03-26 2011-03-25 Fine grain performance resource management of computer systems

Country Status (5)

Country Link
EP (1) EP2553573A4
JP (1) JP2013527516A
KR (1) KR20130081213A
CN (1) CN102906696A
WO (1) WO2011120019A2

Also Published As

Publication number Publication date
CN102906696A (zh) 2013-01-30
EP2553573A4 (en) 2014-02-19
KR20130081213A (ko) 2013-07-16
JP2013527516A (ja) 2013-06-27
WO2011120019A3 (en) 2012-01-26
EP2553573A2 (en) 2013-02-06

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 201180025409.3

Country of ref document: CN

121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 11760356

Country of ref document: EP

Kind code of ref document: A2

WWE WIPO information: entry into national phase

Ref document number: 2013501534

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

WWE WIPO information: entry into national phase

Ref document number: 2011760356

Country of ref document: EP

WWE WIPO information: entry into national phase

Ref document number: 9035/CHENP/2012

Country of ref document: IN

ENP Entry into the national phase

Ref document number: 20127027941

Country of ref document: KR

Kind code of ref document: A