US20100205602A1 - Mechanism for Scheduling Execution of Threads for Fair Resource Allocation in a Multi-Threaded and/or Multi-Core Processing System - Google Patents
- Publication number
- US20100205602A1 (application US12/767,662)
- Authority
- US
- United States
- Prior art keywords
- thread
- threads
- processors
- processor
- processor group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/48—Indexing scheme relating to G06F9/48
- G06F2209/483—Multiproc
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/48—Indexing scheme relating to G06F9/48
- G06F2209/485—Resource constraint
Definitions
- This invention relates to schedulers as found in modern operating systems and in particular to a scheduler for use in a computer system with a multi-threaded and/or multi-core architecture.
- CPUs: central processing units
- MMUs: memory management units
- I/O devices like network interfaces, disks, printers, etc.
- Software is also part of a computer system; typically, a software application provides the ultimate utility of the computer system for users.
- OS: operating system
- the OS uses a more privileged mode of the CPU(s), so that it can perform operations which software applications cannot.
- One of the main jobs of the OS is to coordinate the access by the various applications to shared system resources.
- this mechanism is usually called a “scheduler,” which is a program that coordinates the use of shared resources according to certain rules programmed into the scheduler by the designer.
- Each task usually comprises one or more execution abstractions known as “threads.”
- a thread typically includes its own instruction pointer and sometimes has its own stack.
- access to a CPU is scheduled per-thread.
- a task is thus an environment in which one or several threads are scheduled independently to run on the CPU(s), and not necessarily all (or even more than one) at a time even in multi-processor architectures.
- One way to increase execution speed is, of course, through the design of the applications themselves.
- Another way is through efficient design of the OS, which usually entails computing an efficient schedule for executing threads.
- a specific scheduling problem is discussed below, but before this it is helpful also to consider some of the different hardware techniques that are being employed to increase overall execution speed, since these hardware choices also impact the problem of scheduling.
- SMP: symmetric multi-processor
- an SMP system is a hardware platform that connects multiple processors to a shared main memory and shared I/O devices.
- each processor may have private cache memory.
- the OS, which is aware of the multiple processors, allows truly concurrent execution of multiple threads, typically using time-slicing only when the number of ready threads exceeds the number of CPUs.
- in multi-core architectures, several CPUs are fabricated on a single chip. Although each CPU can execute threads independently, the CPUs share at least some cache and in some cases other resources as well. Each CPU is, however, provided with its own set of functional units, such as its own floating-point and arithmetic/logic units (ALU). Essentially, a multi-core architecture is a multi-processor on a single chip, although with limited resource sharing. The OS in such a system will, of course, be designed to schedule each thread for execution on one of the multi-core CPUs.
- Still another modern technique that provides for simultaneous execution of multiple threads is referred to as “simultaneous multi-threading,” in which more than one logical processor (hardware thread) operates simultaneously on a single chip, but in which the logical processors must flexibly share not only one or more caches (for example, for data, instructions and traces), but also functional units such as the floating-point unit and the ALU, as well as the translation lookaside buffer (TLB), if the TLB is shared.
- logical processor: hardware thread
- TLB: translation lookaside buffer
- Intel Corporation has developed its “Hyper-Threading Technology” to improve the performance of its Pentium 4 and Xeon processor lines.
- the single chip is referred to as a “package.”
- Although multi-threading does not provide the performance of a true multi-processor or multi-core system, it can improve the utilization of on-chip resources, leading to greater throughput for several important workload types, by exploiting the additional instruction-level parallelism that is exposed when the instruction streams of multiple threads execute concurrently.
- the OS designates which software threads the logical processor(s) are to execute, and can also issue commands to cause an idle logical processor to be put in a halt state, such that its execution resources are made available for use by any remaining logical processors.
- internal mechanisms of the processor control use of the shared resources by the executing threads.
- the operating system can preempt a thread, that is, force it to give up the CPU on which it is running, in order to run another thread (perhaps one that has not run for some time, or one that the user has given a higher priority to).
- Putting a processor into the halt state typically involves preempting the running thread and instead scheduling on that processor a dedicated idle thread.
- This idle thread may use a processor-specific method to make the execution resources from the hardware context available to other threads in the same functional processor group. For instance, on the Intel IA-32 architecture, the idle thread may issue the “HLT” instruction.
- the problem can arise that one thread might be “anti-cooperative,” meaning that it does not conform to a predetermined notion of “fairness.”
- Examples of anti-cooperative execution behavior include using so much of the shared resource, or otherwise “hoarding” it, or causing some other state change in the resource, that a co-executing thread cannot execute as efficiently as it would if it had exclusive or at least “normal” use of the resource, or that hardware or software intervention is required.
- one thread could theoretically even completely prevent another thread from making forward execution progress, that is, “starving” it, for lack of the shared resource.
- Grunwald offers four possible solutions to the problem of microarchitectural denial of service.
- First, Grunwald detects the need for intervention using various mechanisms such as performance counters, computing a function of committed instructions, and monitoring bad events such as cache and pipeline flushes. Then he applies one of four proposed “punition” mechanisms, all of which involve either stalling or suspending offending threads, or specifically modifying the OS kernel so that it changes the scheduling interval of an attacking thread.
- Even Grunwald acknowledges the inadequacy of his proposed software solutions, however, stating that “we think it is better to implement them in microarchitecture” in order to provide “compatibility across a number of operating systems, eliminating processor-specific features.”
- Snavely's scheduler attempts to optimize how much CPU time each thread will get. In the presence of run-time anti-cooperative execution behavior, however, merely allocating more CPU time to a thread does not ensure optimal execution progress. As Grunwald points out, even very small thread segments (with self-modifying code, for example) can cause severe performance degradation of another running thread, so merely reducing the time allocated to the offending thread may not eliminate the problem. For example, one thread may have 90% of the total CPU time, but the 10% used by another, co-scheduled and highly anti-cooperative thread might cause much of the first thread's 90% to be wasted recovering from the anti-cooperative thread's resource hoarding. Merely adjusting the amount of time allocated to a given thread therefore ignores the unique features of the SMT architecture, in particular the presence of more than one logical processor, and simply applies a solution that is equally applicable to standard, single-processor systems.
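The 90%/10% argument can be made concrete with a deliberately simple model. This model is an illustrative assumption, not something taken from the source: a thread's useful work equals its share of CPU time multiplied by the fraction of its cycles not spent rebuilding shared state that a co-scheduled thread disturbed. The function `useful_fraction` and the 60% recovery figure are hypothetical.

```python
def useful_fraction(time_share: float, recovery_fraction: float) -> float:
    """Fraction of total CPU time a thread spends making forward progress.

    Hypothetical model: the thread receives `time_share` of the CPU, but
    `recovery_fraction` of those cycles are lost rebuilding shared state
    (caches, pipeline) disturbed by a co-scheduled thread.
    """
    if not (0.0 <= time_share <= 1.0 and 0.0 <= recovery_fraction <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    return time_share * (1.0 - recovery_fraction)


# Victim gets 90% of CPU time, but (assumed) 60% of its cycles go to recovery.
print(round(useful_fraction(0.90, 0.6), 2))  # 0.36
# Giving the victim even more time barely helps while co-scheduling persists.
print(round(useful_fraction(0.95, 0.6), 2))  # 0.38
# Isolating the victim from the anti-cooperative thread removes the loss.
print(round(useful_fraction(0.90, 0.0), 2))  # 0.9
```

In this model, trimming the anti-cooperative thread's time barely helps while the recovery fraction stays high; only separating the two threads removes that loss, which is the lever the invention uses instead of time allocation.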
- an anti-cooperative process is not necessarily malicious and may in fact be one that the user wants to have run quickly, perhaps even with a higher priority than other runnable threads.
- suppose, for example, that a process that is important to the user contains self-modifying code in a tight loop, or has in the past caused problems for co-scheduled threads in an SMT architecture. Stalling or suspending this thread would benefit the other threads, but would lead to a worse result from the user's perspective.
- Proposed mechanisms for dealing with the problem of shared resource hoarding in multi-threaded architectures fail to provide the user with any ability to influence how the OS addresses the problem. It would thus be beneficial to enable the user to control at least some of the decision about what to do in the presence of an anti-cooperative process in a multi-threaded architecture.
- the invention provides a method and corresponding system implementation for controlling execution of a plurality of threads by a processing system that has at least two processors in at least one functional processor group, in which threads coscheduled for execution on the processors share an internal processor group resource.
- the invention senses, during run time, the presence of a rescheduling condition indicating anti-cooperative execution behavior.
- a scheduler reschedules at least one of the threads such that the first and second threads no longer execute in the same functional processor group at the same time.
- Examples of anti-cooperative execution behavior include: use by the first thread of the internal processor group resource that causes a denial of use of the resource by the second thread above a minimum acceptable level; triggering more than a threshold number of cache flushes or misses; triggering more than a threshold number of pipeline flushes; and so on.
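A minimal sketch of how the listed conditions might be checked once per sampling interval. The field names, the idea of expressing the second thread's access to the resource as a fraction, and the threshold values are all hypothetical; the source does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IntervalStats:
    """Observations for one co-scheduled pair over one sampling interval.

    Field names are hypothetical; real values would come from performance counters.
    """
    cache_flushes_or_misses: int
    pipeline_flushes: int
    victim_resource_share: float  # fraction of the shared resource the second thread obtained


@dataclass
class Thresholds:
    """Pre-set limits; any concrete numbers are placeholders, not recommendations."""
    max_cache_events: int
    max_pipeline_flushes: int
    min_victim_share: float


def rescheduling_reasons(stats: IntervalStats, limits: Thresholds) -> List[str]:
    """Return why the first thread looks anti-cooperative (empty list if it does not)."""
    reasons = []
    if stats.victim_resource_share < limits.min_victim_share:
        reasons.append("second thread denied the shared resource")
    if stats.cache_flushes_or_misses > limits.max_cache_events:
        reasons.append("too many cache flushes or misses")
    if stats.pipeline_flushes > limits.max_pipeline_flushes:
        reasons.append("too many pipeline flushes")
    return reasons


limits = Thresholds(max_cache_events=10_000, max_pipeline_flushes=500, min_victim_share=0.25)
print(rescheduling_reasons(IntervalStats(12_000, 100, 0.40), limits))
# ['too many cache flushes or misses']
```

A non-empty result would be treated as the rescheduling condition; an empty one leaves the current schedule alone.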
- Rescheduling may be triggered according to rules programmed into the scheduler, according to user-input parameters, or both, or it may be disabled altogether.
- the scheduler may input at least one user-specified thread performance requirement and then estimate run-time thread execution performance relative to that requirement as a function of an observable condition (for example, performance counters). One measure of anti-cooperative execution behavior is then violation of the user-specified thread performance requirement.
- the scheduler may input user designation of the first thread as being un-coschedulable with the second thread, in which such user designation is the rescheduling condition.
- it could input at least one user-provided execution guarantee for a designated one of the threads, in which the rescheduling condition is violation of the guarantee; upon violation of the guarantee, the scheduler then reschedules at least one of the coscheduled threads to ensure that the guarantee is met for the designated thread.
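The two user-driven conditions just described (an explicit "un-coschedulable" designation and a per-thread execution guarantee) can be sketched as a single policy check. The `UserPolicy` structure, the thread names, and the progress unit are hypothetical; the source leaves their representation open.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass
class UserPolicy:
    """User- or administrator-supplied constraints (a hypothetical structure)."""
    # Unordered pairs of thread names that must never run in the same processor group.
    uncoschedulable: Set[FrozenSet[str]] = field(default_factory=set)
    # Minimum acceptable progress per thread, in whatever unit the sensor reports.
    guarantees: Dict[str, float] = field(default_factory=dict)


def must_reschedule(group: List[str], progress: Dict[str, float], policy: UserPolicy) -> bool:
    """True if the threads now sharing one processor group violate the user policy."""
    # Rule 1: an explicit "never co-schedule" designation is itself the condition.
    for i, a in enumerate(group):
        for b in group[i + 1:]:
            if frozenset((a, b)) in policy.uncoschedulable:
                return True
    # Rule 2: a measured progress rate below a user-provided guarantee.
    for thread in group:
        floor = policy.guarantees.get(thread)
        rate = progress.get(thread)
        if floor is not None and rate is not None and rate < floor:
            return True
    return False


policy = UserPolicy(uncoschedulable={frozenset(("db", "encoder"))}, guarantees={"db": 800.0})
print(must_reschedule(["db", "encoder"], {"db": 1000.0}, policy))  # True: designated pair
print(must_reschedule(["db", "web"], {"db": 700.0}, policy))       # True: guarantee violated
print(must_reschedule(["db", "web"], {"db": 900.0}, policy))       # False
```

Using unordered `frozenset` pairs means the user does not have to list both orderings of a designation.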
- One embodiment of the invention is in a computer system with at least two functional processor groups (such as a multi-threaded processor package or a set of partnered multi-core processors), each having at least two processors (logical or physical, depending on the type of group).
- One rescheduling decision may then be allowing continued execution of the second thread, and rescheduling execution of the first thread on a processor in a different functional processor group.
- the first and second threads can then continue to execute simultaneously but in different functional processor groups.
- the scheduler can implement a rescheduling decision such that all the threads run simultaneously, but the first and second threads run in different functional processor groups, as do the third and fourth threads.
- the threads originate in at least one virtual machine, in which case the threads may be virtual CPUs.
- the virtual CPUs may themselves be virtualized logical processors within virtualized, functional processor groups.
- the process of rescheduling a thread may include putting the processor on which it was running into a halted state, preempting the thread that is running on that processor and scheduling a different thread to run on that processor.
- rescheduling a thread may alternatively comprise changing its priority relative to the priorities of coschedulable threads.
- FIG. 1 illustrates the simplest two-thread, two-logical-processor case of the mechanism according to the invention for reducing conflicts for a shared resource in a multi-threaded and/or multi-core computer system.
- FIG. 2 illustrates the mechanism according to the invention for descheduling one thread running on a package where anti-cooperative execution behavior is detected.
- FIG. 3 illustrates a normal operating condition in a multi-threaded or multi-core architecture, with two pairs of executable threads, each running on respective logical processors in a respective processor group.
- FIG. 4 illustrates one scheduling option at two different times in the case where one thread in each of two packages in FIG. 3 is found to display anti-cooperative behavior.
- FIG. 5 illustrates an alternative scheduling option to the one shown in FIG. 4, namely, a time-shared scheduling that keeps previously co-scheduled threads within the same processor group.
- FIG. 6 illustrates yet another alternative scheduling option to the one shown in FIG. 4, namely, one in which all executing threads continue to execute simultaneously, but in which previously co-scheduled threads are rescheduled to run on different processor groups.
- FIG. 7 illustrates a generalized embodiment of the invention, in which one or more guests, each having one or more multi-threaded or multi-core virtual processors, are scheduled using the invention to execute on a hardware platform that also has one or more processor groups, each containing one or more logical (in the multi-threaded case) or partnered physical (in the multi-core case) processors.
- In FIG. 1, a pair of “partnered” processors CPU0, CPU1 are associated in a functional group 101 such that they share at least one group resource 102 under the control of known hardware mechanisms within the group.
- SMT: simultaneous multi-threaded
- multi-threaded: such as Intel Corp.'s Hyper-Threading Technology
- a scheduler 610 schedules each of a plurality (two are shown by way of example) of logically cooperating executable threads Ta, Tb for execution on the processors CPU0, CPU1, while an activity sensor 615 within or accessible by the scheduler monitors the behavior of the executing threads. Extensions of this simplified embodiment are described below.
- the scheduler 610 will be part of some known intermediate software layer that mediates access to hardware resources. Examples include an operating system, a virtual machine monitor or hypervisor, a kernel in a virtualized computer system, etc., as will be made clearer below.
- the processors CPU0, CPU1 will be part of a larger set of system hardware 100, which will include such components as a disk, memory, power and timing devices, I/O controllers, etc.
- system software and hardware are not illustrated or described further here because they are well known and can be assumed to be present in any modern computer system.
- processor group 101 is a multi-threaded package, in which the partner processors CPU0, CPU1 are logical processors and the shared resource may be a cache, pipeline, etc.
- Another example of a group would be a set of multi-core processors.
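On Linux (an assumption about the host OS; the invention is not tied to one), the functional processor groups of a multi-threaded package can be discovered from sysfs: each `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` file lists the logical CPUs that share cpuN's physical core, in the kernel's cpulist syntax (for example `0,4` or `0-1`). The function names below are illustrative, and the sketch only reads those files.

```python
import glob
import os
from typing import FrozenSet, Set


def parse_cpulist(text: str) -> FrozenSet[int]:
    """Parse the kernel's cpulist syntax, e.g. "0-3,8,10-11", into CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def smt_groups(sysfs_root: str = "/sys/devices/system/cpu") -> Set[FrozenSet[int]]:
    """Return one set of logical CPU ids per physical core (SMT siblings).

    Every cpuN lists its own siblings, so duplicates collapse in the outer set.
    Returns an empty set where the files are absent (non-Linux, some sandboxes).
    """
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "topology", "thread_siblings_list")
    groups = set()
    for path in glob.glob(pattern):
        with open(path) as f:
            groups.add(parse_cpulist(f.read()))
    return groups


# Machine-dependent: e.g. [[0, 4], [1, 5], ...] on a 2-way SMT host.
print(sorted(sorted(g) for g in smt_groups()))
```

For a multi-core group the same approach could read a different topology file; which sibling level counts as a "functional processor group" is a design decision left to the scheduler.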
- the invention is not restricted to any particular number of executable threads, or the manner in which they logically cooperate, and there may be more than one processor group, each of which may have two or more associated processors.
- The term “thread” often implies a shared address space, but this is not necessary in this invention. Rather, as used here, a thread is simply a body of executable code that is scheduled for execution as a unit. Logical cooperation among threads may simply consist in their being multiple threads of the same software entity, which, for the sake of conciseness, is referred to below as a “task,” and which may be, for example, a single process, multiple virtual CPUs in the same virtual machine (see below), etc.
- the activity sensor 615 is a software module comprising computer-executable code that either monitors the activity of executing threads with respect to a predetermined activity parameter, or accesses any known mechanism within the system hardware (including within the processor group itself) to get a value of the activity parameter.
- the activity sensor 615 detects any observable condition such as any of the many known hardware performance counters, or includes software performance counters, to determine, for example, the frequency of pipeline flushes, cache flushes or misses, overflow of a resource, requiring too many floating-point operations per predetermined time unit, or any other event indicative of anti-cooperative execution behavior.
- the activity sensor may operate according to pre-set rules, or by comparing run-time behavior against a user-specified performance threshold or range, or both.
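Hardware performance counters are typically cumulative, so a sensor like 615 would sample them periodically and work with rates. The sketch below assumes two hypothetical counters and shows both modes described above: pre-set limits and a user-specified range. The counter names, units and limits are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CounterSnapshot:
    """Cumulative event counts read at time t (seconds); counter names are hypothetical."""
    t: float
    pipeline_flushes: int
    cache_misses: int


def event_rates(prev: CounterSnapshot, cur: CounterSnapshot) -> Dict[str, float]:
    """Turn two cumulative snapshots into events per second over the interval."""
    dt = cur.t - prev.t
    if dt <= 0:
        raise ValueError("snapshots must be taken in increasing time order")
    return {
        "pipeline_flushes": (cur.pipeline_flushes - prev.pipeline_flushes) / dt,
        "cache_misses": (cur.cache_misses - prev.cache_misses) / dt,
    }


def violations(rates: Dict[str, float],
               preset_max: Dict[str, float],
               user_range: Optional[Dict[str, Tuple[float, float]]] = None) -> List[str]:
    """Check rates against pre-set maxima, a user-specified range, or both."""
    found = []
    for name, rate in rates.items():
        if name in preset_max and rate > preset_max[name]:
            found.append(f"{name}: above preset limit")
        if user_range and name in user_range:
            low, high = user_range[name]
            if not low <= rate <= high:
                found.append(f"{name}: outside user range")
    return found


rates = event_rates(CounterSnapshot(0.0, 100, 1_000), CounterSnapshot(0.5, 400, 1_500))
print(rates)  # {'pipeline_flushes': 600.0, 'cache_misses': 1000.0}
print(violations(rates, {"pipeline_flushes": 500.0}, {"cache_misses": (0.0, 800.0)}))
# ['pipeline_flushes: above preset limit', 'cache_misses: outside user range']
```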
- the activity sensor 615 detects that thread Tb is behaving “anti-cooperatively,” which may be defined in any predetermined sense as any behavior that reduces the ability of one or more other co-scheduled threads to use the shared resource, or that interferes with another thread's attempts to use the resource, such that hardware or software intervention is required.
- Several examples of anti-cooperative execution behavior have been mentioned above.
- the scheduler 610 may deal with the situation according to predetermined rules programmed into the scheduler, or according to one or more options, depending on the desired implementation of the invention.
- three alternatives are provided: 1) do nothing, that is, no intervention, such that the anti-cooperative behavior is allowed to continue; 2) follow rules input by the user or administrator, for example via a console 300; or 3) automatically intervene according to predetermined, pre-programmed rules, such as when the anti-cooperative behavior causes the sensed or computed value of the activity parameter to exceed a threshold (for example, too many cache flushes or cache misses) or to fall outside given bounds.
- Options 2) and 3) may lead to the same type of intervention, as described below, although the conditions that trigger the intervention will be either user-selected or pre-set.
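The three alternatives can be sketched as a small decision function in which options 2) and 3) differ only in where the trigger comes from, not in the intervention that follows. The enum, the function, and the example bounds are hypothetical names and values chosen for illustration.

```python
from enum import Enum
from typing import Callable, Optional


class Intervention(Enum):
    NONE = 1          # option 1: let the anti-cooperative behavior continue
    USER_RULES = 2    # option 2: a rule entered by the user or administrator
    PRESET_RULES = 3  # option 3: a limit built into the scheduler


def should_intervene(mode: Intervention,
                     parameter: float,
                     preset_limit: float,
                     user_rule: Optional[Callable[[float], bool]] = None) -> bool:
    """Decide whether to reschedule, given the sensed value of the activity parameter."""
    if mode is Intervention.NONE:
        return False
    if mode is Intervention.USER_RULES:
        if user_rule is None:
            raise ValueError("USER_RULES mode needs a user-supplied rule")
        return user_rule(parameter)
    return parameter > preset_limit


def outside_bounds(value: float) -> bool:
    """Example user rule (hypothetical): intervene outside [100, 200] events per ms."""
    return not (100.0 <= value <= 200.0)


print(should_intervene(Intervention.USER_RULES, 250.0, 1_000.0, outside_bounds))  # True
print(should_intervene(Intervention.PRESET_RULES, 250.0, 1_000.0))                # False
```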
- the scheduler 610 deschedules Tb from CPU1, allowing thread Ta to run alone, or at least without being co-scheduled with the anti-cooperative thread Tb.
- the scheduler 610 could instead deschedule Ta.
- the processor group 101 is effectively converted into a single-processor configuration, in which it will operate as almost any other non-multi-threaded processor. Threads Ta and Tb can then be scheduled to execute separately. Although this will mean that thread Tb will have to wait, it may actually increase overall execution progress, since thread Ta will be able to execute with full access to the shared resource, without repeated wasted processor cycles needed to reconstruct it. Notice, however, a difference in the approach according to the invention relative to the prior art: Upon detecting anti-cooperative behavior, rather than just adjusting the time allotted to the offending thread, the invention makes use of the features of the multi-threaded processor architecture itself to prevent a partial or total denial of service.
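The FIG. 2 step can be sketched with a processor group modelled as a mapping from logical CPU to running thread. The string `"idle"` stands in for the dedicated idle thread that, on real hardware, might halt the logical processor; the function and variable names are hypothetical.

```python
from typing import Dict, List

IDLE = "idle"  # stands in for the dedicated idle thread that halts a logical CPU


def deschedule(group: Dict[str, str], offender: str, run_queue: List[str]) -> str:
    """Preempt `offender` in a processor group and park its CPU on the idle thread.

    Returns the freed CPU. The offender goes back on the run queue so it can run
    later, when it is no longer co-scheduled with the thread it was disturbing.
    """
    for cpu, thread in group.items():
        if thread == offender:
            group[cpu] = IDLE
            run_queue.append(offender)
            return cpu
    raise KeyError(f"{offender} is not running in this group")


group_101 = {"CPU0": "Ta", "CPU1": "Tb"}
waiting: List[str] = []
print(deschedule(group_101, "Tb", waiting))  # CPU1
print(group_101, waiting)                   # {'CPU0': 'Ta', 'CPU1': 'idle'} ['Tb']
```

After the call, the group behaves as a single-processor configuration: Ta has the shared resource to itself until the scheduler picks Tb from the queue again.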
- time-slicing implements only coarse-grained interleaving of software threads (thousands or millions of instructions may execute before switching threads), while the invention implements not only coarse-grained but also fine-grained interleaving of software threads, such that the pipeline may be processing instructions from both threads simultaneously.
- the invention directly attempts to determine anti-cooperative behavior, and does so at run-time, as threads are actually running together to do “real” work.
- FIG. 3 illustrates a configuration of the invention in which two or more tasks 500-a, 500-b (only two are shown for simplicity), each having more than one thread Ta0, Ta1, Tb0, Tb1 (again, only two per task are shown for simplicity), run via the intermediate software layer(s) 600 and are scheduled for execution on any of a plurality of functional processor groups 101-1, 101-2, each of which includes two or more associated physical or logical processors CPU0-1, CPU1-1, CPU0-2, CPU1-2 which share, within each group, a respective resource 102-1, 102-2.
- It is not necessary to the invention for the number of threads to be the same in each task, or for the number of processors to be the same in each group, or for the number of threads in any task to be the same as the number of processor groups or the number of processors in any given group. All that is necessary is that the scheduler, or some analogous component that performs operations according to the invention, be able to schedule a particular thread on a particular processor (or processor group, if mechanisms within the group assign processors to submitted threads).
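The one primitive required here, placing a particular thread on a particular processor, has a user-space counterpart on Linux in CPU affinity. The sketch below uses Python's `os.sched_setaffinity`/`os.sched_getaffinity`, which exist only on some Unix platforms; it is a stand-in for experimentation, not the kernel- or hypervisor-level mechanism the scheduler 610 would actually use. The function name `pin_to_cpu` is hypothetical.

```python
import os
from typing import Set


def pin_to_cpu(cpu: int, pid: int = 0) -> Set[int]:
    """Restrict `pid` (0 means the caller) to one logical CPU; return the new mask.

    os.sched_setaffinity exists only on some Unix platforms (notably Linux).
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity is not exposed on this platform")
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)


if hasattr(os, "sched_getaffinity"):
    allowed = os.sched_getaffinity(0)
    try:
        print(pin_to_cpu(min(allowed)))  # a one-element set, e.g. {0}
    finally:
        os.sched_setaffinity(0, allowed)  # restore the original mask
```

Combined with sibling information such as `thread_siblings_list`, pinning two threads to CPUs in different sibling sets approximates running them in different functional processor groups.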
- FIG. 3 illustrates the “normal,” cooperative multi-threaded situation, in which two threads are running on each processor group, sharing the respective resources. If a single thread is detected as being anti-cooperative, the scheduler 610 can deal with this in the same manner as described above for FIGS. 1 and 2, allowing the threads in the other processor group to continue execution as normal.
- In FIGS. 3-6, only the various processor groups are shown; the other hardware and software components of the system may be assumed.
- FIG. 4 illustrates one way for the scheduler 610 to allow the other threads Ta0 and Tb0 to proceed, namely, to deschedule Ta1 and Tb1 at time t0. Threads Ta1 and Tb1 can then be rescheduled later, at a time t1, when Ta0 and Tb0 have completed. Threads Ta1 and Tb1 do not have to be rescheduled at the same time, however. The procedure illustrated in FIG.
- FIG. 5 illustrates a different rescheduling option, which may be called a “time shared” scheduling option in that the threads of one task (Ta0 and Ta1, for example) are executed simultaneously, but are isolated from one another by being scheduled onto different groups, which then operate as single- or at least reduced-processor groups.
- the threads of the other task (here, Tb0 and Tb1) are then rescheduled to run afterwards.
- the decision as to which task's threads are to be given priority may be implemented in any desired manner: Either the task that contained the anti-cooperative thread could be “punished” by having to wait, or its threads could be scheduled to run immediately, with the other, cooperative threads running afterwards.
- FIG. 6 illustrates a scheduling option that handles both these situations: Rather than running the threads on the same processor group, the threads are “cross-scheduled,” that is, both processors in each group are working, but each processor group is handling one thread from each previously coscheduled pair.
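The FIG. 5 and FIG. 6 options can be sketched as plan-building functions. The pairs passed to `cross_schedule` are assumed to be the previously co-scheduled threads (taken here to be Ta0/Ta1 and Tb0/Tb1); the function names, group labels, and the "(k + i) mod G" placement rule are illustrative choices, not the patent's algorithm.

```python
from typing import Dict, List, Tuple


def time_shared_phases(tasks: Dict[str, List[str]], groups: List[str],
                       first: str) -> List[Dict[str, str]]:
    """FIG. 5 style plan: one task per phase, its threads spread one per group.

    `first` names the task that runs in the first phase; whether that is the task
    with the anti-cooperative thread or the other one is a policy choice.
    """
    order = [first] + [name for name in tasks if name != first]
    phases = []
    for name in order:
        threads = tasks[name]
        if len(threads) > len(groups):
            raise ValueError(f"task {name} has more threads than processor groups")
        phases.append(dict(zip(groups, threads)))
    return phases


def cross_schedule(pairs: List[Tuple[str, str]], groups: List[str],
                   slots_per_group: int = 2) -> Dict[str, List[str]]:
    """FIG. 6 style plan: every thread keeps running, but each former pair is split.

    Thread i of pair k goes to group (k + i) mod len(groups), so the two members
    of a pair always land in different groups when there are at least two groups.
    """
    if len(groups) < 2:
        raise ValueError("cross-scheduling needs at least two processor groups")
    plan: Dict[str, List[str]] = {g: [] for g in groups}
    for k, pair in enumerate(pairs):
        for i, thread in enumerate(pair):
            plan[groups[(k + i) % len(groups)]].append(thread)
    if any(len(threads) > slots_per_group for threads in plan.values()):
        raise ValueError("not enough processors per group for this plan")
    return plan


tasks = {"a": ["Ta0", "Ta1"], "b": ["Tb0", "Tb1"]}
groups = ["101-1", "101-2"]
print(time_shared_phases(tasks, groups, first="a"))
# [{'101-1': 'Ta0', '101-2': 'Ta1'}, {'101-1': 'Tb0', '101-2': 'Tb1'}]
print(cross_schedule([("Ta0", "Ta1"), ("Tb0", "Tb1")], groups))
# {'101-1': ['Ta0', 'Tb1'], '101-2': ['Ta1', 'Tb0']}
```

The time-shared plan leaves half of each group idle in every phase, whereas the cross-scheduled plan keeps all four logical processors busy; that trade-off is what distinguishes the two options.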
- a processor or, more correctly, the thread running on that processor
- this will mean that some thread (either the anti-cooperative thread or one of its thread “victims”) running in the same logical processor group (such as a package or multi-core processor set) is preempted, and that either another “working” thread (Ta0, Tb1, etc.) or an idle thread is scheduled to run on that processor.
- The term “halt” should not be limited to the sense or particular semantics of the HLT instruction used in most Intel processors, in particular those with the x86 architecture.
- the software entities (tasks) in which the various threads originate may be of any type.
- the invention has been found to be particularly advantageous, however, in virtualized computers running on a multi-threaded hardware architecture.
- An example of the invention in this context will now be described. In addition to providing a concrete example of the invention, this will also show how the invention can be generalized, as well as several specific features that improve performance and that can be used in other embodiments of the invention as well.
- VM: virtual machine
- a virtual machine is a software abstraction—a “virtualization”—of an actual physical computer system.
- a virtual machine is installed on a “host,” such as the hardware platform 100.
- FIG. 7 illustrates implementation of the scheduler 610 according to the invention in a virtualized computer system, in which each task whose threads are scheduled is shown as a “guest,” which, in the illustrated embodiment, is assumed by way of example to be a virtual machine.
- Two guests 500-1, 500-2 are shown for the sake of simplicity, although any number may be included, including only one.
- Each VM will typically have both virtual system hardware 501-1, 501-2 and guest system software, including or consisting of a guest operating system 520-1, 520-2, which has the typical included and associated software such as drivers as needed.
- the virtual system hardware typically includes virtual system memory 512, at least one virtual disk 514, and one or more virtual devices 540. Note that a disk, virtual or physical, is also a “device,” but is usually considered separately because of its important role. All of the virtual hardware components of the VM may be implemented in software using known techniques to emulate the corresponding physical components.
- each VM 500-1, 500-2 itself has a virtualized, multi-threaded processor architecture.
- each guest has a plurality of virtual processor packages (or, more generally, groups), each of which may have a different number of logical processors.
- VM 500-1 has m virtual processor packages VPACKAGE1 to VPACKAGEm, where VPACKAGE1 has logical processors VP0 to VPd and VPACKAGEm has logical processors VP0 to VPe; VM 500-2 has n virtual processor packages VPACKAGE1 to VPACKAGEn, where VPACKAGE1 has logical processors VP0 to VPx and VPACKAGEn has logical processors VP0 to VPy.
- in VM 500-1, i threads T0-1 to Ti-1 are shown as being ready, and in VM 500-2, j threads T0-2 to Tj-2 are shown as being ready.
- hardware processor packages PACKAGE1 to PACKAGEp are shown, where PACKAGE1 has logical processors P0-1 to P0-r, PACKAGEp has logical processors P0-p to Ps-p, and so on. As mentioned above, these groups of processors may also be multi-core instead of multi-threaded.
- If a VM is properly designed, then even though applications running within the VM are running indirectly, that is, via its respective guest OS and virtual processor(s), it will act just as it would if run on a “real” computer, except for a decrease in running speed that will be noticeable only in exceptionally time-critical applications. Executable files will be accessed by the guest OS from the virtual disk or virtual memory, which will simply be portions of the actual physical disk or memory allocated to that VM. Once an application is installed within the VM, the guest OS retrieves files from the virtual disk just as if they had been pre-stored as the result of a conventional installation of the application.
- the design and operation of virtual machines are well known in the field of computer science.
- A virtual machine monitor (VMM) is usually a software component that runs directly on top of a host, or directly on the hardware, and virtualizes at least some of the resources of the physical host machine so as to export some hardware interface to the VM.
- The various virtualized hardware components in the VM, such as the virtualized processors, the virtual memory, the virtual disk, and the virtual device(s), are shown as being part of each respective VM 500-1, 500-2 for the sake of conceptual simplicity; in actual implementations, these “components” are usually constructs or emulations exposed to the VM by its respective VMM, for example, as emulators.
- The VMM may be set up to expose “generic” devices, which facilitate VM migration and hardware platform-independence.
- In a fully virtualized system, the guest OS cannot determine the presence of the VMM and does not access hardware devices directly.
- One advantage of full virtualization is that the guest OS may then often simply be a copy of a conventional operating system.
- Another advantage is that the system provides complete isolation of a VM from other software entities in the system (in particular, from other VMs) if desired. Because such a VM (and thus the user of applications running in the VM) cannot usually detect the presence of the VMM, the VMM and the VM may be viewed as together forming a single virtual computer.
- In contrast, the guest OS in a so-called “para-virtualized” system is modified to support virtualization, such that it not only has an explicit interface to the VMM, but is sometimes also allowed to access at least one hardware resource directly.
- In such a system, virtualization transparency is sacrificed to gain speed.
- The VMM is sometimes referred to as a “hypervisor.”
- This invention may be used in both fully virtualized and para-virtualized computer systems. Indeed, virtualization is not a prerequisite for this invention at all, but rather the software mechanisms that implement the method according to the invention may be incorporated into system-level software even in conventional, non-virtualized systems.
- In a hosted virtualized computer system, an existing, general-purpose operating system forms a “host” OS that is used to perform certain I/O operations, alongside and sometimes at the request of the VMM.
- The Workstation product of VMware, Inc., of Palo Alto, Calif., is an example of a hosted, virtualized computer system, which is also explained in U.S. Pat. No. 6,496,847 (Bugnion, et al., “System and Method for Virtualizing Computer Systems,” 17 Dec. 2002).
- In a non-hosted virtualized computer system, a kernel customized to support virtual computers takes the place of and performs the conventional functions of the host OS, such that virtual computers run on the kernel.
- The kernel also handles any other applications running on the kernel that can be separately scheduled, as well as any temporary “console” operating system, if included, used for booting the system as a whole and for enabling certain user interactions with the kernel.
- In such systems, the kernel will be the primary, if not the sole, intermediate software layer 600.
- Such a kernel offers improved performance because it can be co-developed with the VMMs and optimized for the characteristics of a workload consisting mostly of virtualized computers. Moreover, a kernel can also be optimized for I/O operations, and it allows services to extend across multiple VMs (for example, for resource management).
- The ESX Server product of VMware, Inc. is an example of a non-hosted virtualized computer system.
- Various options for scheduling different threads on the logical processors of different packages are described above with reference to FIGS. 1-6. All of these options may be made available in the more generalized system shown in FIG. 7, such that logical processors are halted as needed, or anti-cooperative threads are rescheduled on different packages, upon detection of anti-cooperative behavior on the part of any running thread.
- With Hyper-Threading, it is important to recall that most processor resources are shared between the two executing threads. For instance, the L1, L2 and L3 caches and all functional units (such as the floating point units and arithmetic/logical units) are flexibly shared between the two threads. So, if one thread is using very little cache, the other thread will be able to take advantage of all the unused cache space. However, if both threads demand large amounts of cache, they will compete for the limited capacity and likely slow each other down.
- HT is preferably enabled during the ESX Server installation process on any hardware that supports the feature.
- A checkbox is also provided in a Management User Interface to enable or disable HT. Assuming that the user selects multi-threading, the user, for example using the console 300, is preferably also given the option of enabling or disabling this invention.
- One advantage of the invention is that it requires few other changes to the interface presented to the user: the number of CPUs shown in the Management User Interface will double, and the list of available CPUs for the per-VM “only use processors” setting (also known as CPU affinity) will double.
- Most systems with Intel Xeon MP processors or Intel Xeon processors with at least 512 KB of cache support HT.
- In order for ESX Server to enable multi-threading, the server BIOS must be properly configured with multi-threading enabled. Skilled systems administrators will know how to configure a BIOS; moreover, the factory default BIOS setup often enables HT.
- An operating system can cause logical processors to enter an architecture-dependent halted state, often within the context of an idle thread.
- This halted state frees up hardware execution resources to the partner logical processor (the other logical processor on the same package), so that a thread running on the partner logical processor runs effectively like a thread on a non-HT system.
- VMware ESX Server preferably uses the halted state aggressively to guarantee full utilization of the system's processing power, even when there are not enough running threads to occupy all logical processors.
- ESX Server accounts for CPU time in terms of “package seconds,” not logical processor seconds.
- A VM running on a logical processor that shares a package with another busy logical processor will be charged half as much as a VM running on a logical processor with its partner halted.
- In other words, a VM is only “half-charged” when it runs on only half of a package, but is fully charged if it has the package to itself.
- Performance testing has shown this to be the most accurate and understandable way to quantify the performance impact of HT. This style of accounting also makes it easier to compare performance between HT and non-HT systems, because CPU time consumed is measured in the same units on both system types.
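- The package-second accounting described above can be illustrated with a minimal sketch. It assumes a hypothetical model in which each accounting interval on a logical processor is recorded as a `Sample` noting whether the partner logical processor was busy; the names `Sample` and `package_seconds` are illustrative only and are not part of ESX Server.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One accounting interval on one logical processor (hypothetical model)."""
    vm: str             # VM whose virtual CPU ran on this logical processor
    seconds: float      # wall-clock length of the interval
    partner_busy: bool  # True if the partner logical processor ran another thread


def package_seconds(samples):
    """Charge CPU time in package seconds: half rate when the package is shared,
    full rate when the partner logical processor is halted."""
    charges = {}
    for s in samples:
        rate = 0.5 if s.partner_busy else 1.0
        charges[s.vm] = charges.get(s.vm, 0.0) + rate * s.seconds
    return charges


# VMs A and B share one package for 10 s; VM C has a package to itself for 10 s.
usage = package_seconds([
    Sample("A", 10.0, True),
    Sample("B", 10.0, True),
    Sample("C", 10.0, False),
])
print(usage)  # {'A': 5.0, 'B': 5.0, 'C': 10.0}
```

Because a non-HT processor corresponds to one logical processor with its partner always halted, the same function charges it at the full rate, which is why the units are comparable across HT and non-HT systems.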
- VMware ESX Server preferably coschedules both virtual CPUs in an SMP VM; that is, if one virtual CPU in the VM is running, the other must also be running or idle. This can lead to a problem of “processor fragmentation” on two-way systems.
- For example, suppose a uni-processor VM is running and a two-processor VM is ready to run:
- One physical CPU will be idle, but ESX Server will not be able to run the SMP VM, because it would need two available physical processors.
- As a result, a physical CPU may be left idle. This problem may also arise in the more generalized case shown in FIG.
- In that case, VMs may have more than two virtual packages and/or more than two logical CPUs per virtual package. For example, if one VM has a single two-CPU package and another VM has a single three-CPU package, then in known systems the VM with the three-CPU package would need to wait before making any execution progress.
- With HT enabled, VMware ESX Server could instead dedicate one package (with two logical CPUs) to the SMP VM and another package to the uni-processor VM (running on one logical CPU, with the other halted), thus fully utilizing the system's resources. This increased utilization can lead to substantial performance benefits for realistic workloads with a mix of SMP and uni-processor VMs.
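- The fragmentation effect can be reproduced with a deliberately simplified, count-only model of strict coscheduling, sketched below under the assumption that a VM runs only if all of its virtual CPUs obtain a logical CPU at the same instant. The function `gang_place` is hypothetical and ignores package alignment.

```python
def gang_place(vms, packages, lps_per_package):
    """Count-only model of strict coscheduling: a VM runs only if every one of its
    virtual CPUs can get a logical CPU at the same time (all-or-nothing).
    vms: list of (name, vcpu_count) in scheduling order.
    Returns (names of running VMs, number of idle logical CPUs)."""
    free = packages * lps_per_package
    running = []
    for name, vcpus in vms:
        if vcpus <= free:
            free -= vcpus
            running.append(name)
    return running, free


workload = [("uni", 1), ("smp", 2)]
# Two-way system without HT: the SMP VM cannot start and one CPU sits idle.
print(gang_place(workload, packages=2, lps_per_package=1))  # (['uni'], 1)
# Same two packages with HT: both VMs run; the one spare logical CPU can be
# halted so that the uni-processor VM has a full package.
print(gang_place(workload, packages=2, lps_per_package=2))  # (['uni', 'smp'], 1)
```

In the HT case the leftover logical CPU is not wasted: halting it gives its execution resources to the uni-processor VM on the same package.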
- VMware ESX Server provides a number of improvements and configuration options that advance the state of the art in HT performance and management.
- In VMware ESX Server, VMs typically receive CPU time proportional to an allocation of “shares.” Even in systems that incorporate the invention, VMware ESX Server's CPU resource controls are preferably tightly integrated with HT accounting: virtual machines still receive CPU time according to their share allocation, but are bounded by user-specified min and max values, which may be entered, for example, using the console 300. While shares allow relative allocation of resources (so that an administrator can specify, for instance, that one VM should receive twice the resources of another VM), min and max are absolute bounds, measured as a percentage of a package's resources.
- For example, a VM with a min of “75%” and a max of “90%” is guaranteed to get at least 75% of a package's time, but never more than 90%, even if extra idle time is available in the system. These limits may be incorporated into the scheduling routine of the scheduler 610 in any conventional manner.
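- One conventional way to combine shares with min and max bounds is sketched below. It is an assumed policy, not the ESX Server algorithm: every VM first receives its min, the remaining capacity is divided in proportion to shares, and any VM that reaches its max drops out while its unused portion is redistributed. The function `allocate` and its parameters are hypothetical.

```python
def allocate(vms, capacity):
    """One possible share-based policy with min/max bounds (an assumption, not ESX's
    actual algorithm). vms maps name -> (shares, min_pct, max_pct); shares must be > 0.
    capacity is in percent of one package (e.g. 200 for two packages).
    Every VM first gets its min; the rest is split in proportion to shares, and no VM
    exceeds its max. Capacity left once every VM hits its max stays idle."""
    alloc = {name: mn for name, (_, mn, _) in vms.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("min guarantees exceed capacity")
    active = {name for name, (_, _, mx) in vms.items() if alloc[name] < mx}
    while remaining > 1e-9 and active:
        total_shares = sum(vms[name][0] for name in active)
        spent = 0.0
        for name in sorted(active):
            shares, _, mx = vms[name]
            grant = min(remaining * shares / total_shares, mx - alloc[name])
            alloc[name] += grant
            spent += grant
            if alloc[name] >= mx - 1e-9:
                active.discard(name)  # capped: its unused portion is redistributed next round
        remaining -= spent
    return alloc


# Two packages (200%). "web" is capped at its 90% max and "batch" at 100%,
# so 10% of the capacity stays idle, as the max limit requires.
result = allocate({"web": (1000, 75, 90), "batch": (2000, 0, 100)}, capacity=200)
```

Starting from the min values makes the 75% guarantee hold regardless of shares, while the max check enforces the "never more than 90%" limit even when idle time remains.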
- ESX Server dynamically expands a high-priority VM to use a full package, by rescheduling its partner logical processor to run an idle thread (which, for example, may execute the HLT instruction), even if other VMs are currently runnable in the system.
- This does not waste resources, but simply redirects them to the high-priority VM, so that it can receive up to a full physical package (or two full physical packages for an SMP VM with two virtual CPUs), depending on the administrator-specified configuration.
- This feature differentiates ESX Server from commodity operating systems, which attempt to keep all logical CPUs busy, even if doing so hurts the progress of a high-priority thread. Expansion and contraction are preferably fully dynamic and transparent to the administrator.
- Another user choice made possible by the invention is that the user may not only specify a percentage of a package's time, but also indicate to the scheduler 610, via the console 300 or otherwise (for example, with settings in an associated configuration file or other user-specified configuration state), that a particular thread is known to be anti-cooperative and should not be coscheduled with other threads.
- The scheduler 610 then need not detect whether the indicated thread(s) must be isolated, since this will already have been decided.
- Such manual user control has an additional benefit: the user can take advantage of the invention to guard against attacks that either cannot be detected with the activity sensor or were not known when the scheduler was designed.
- Although HT Technology can provide a useful performance boost for many workloads, it also increases the possibility of performance interference between two applications running simultaneously. For instance, as discussed earlier, an application with extremely poor cache performance may cause performance problems for another application running on the same physical package.
- ESX Server provides an additional level of control for administrators to manage package-sharing settings at the level of the individual VM. Users can select from three choices (called “HT-sharing” settings) for each VM: any sharing, no sharing, or internal sharing only.
- The default setting, “any,” allows the scheduler 610 to schedule virtual CPUs from the designated VM on the same package with any other virtual CPU. This allows the system to exploit HT Technology to its fullest, and it is the best choice for the majority of applications.
- The “internal” setting applies only to SMP VMs. It specifies that the two (or more) virtual CPUs (which form schedulable threads) from the VM in question can share a package together, but not with virtual CPUs from any other VM. This contains any HT performance issues within the designated VM, so it can neither affect the performance of other VMs nor be affected by them. ESX Server can still dedicate a full package to each virtual CPU in the VM, however, if resource constraints and the system activity load permit it. For applications that are quite sensitive to performance variations (such as streaming media servers), this setting may provide the best balance between HT utilization and performance isolation.
- The “no sharing” setting guarantees that each virtual CPU will always run on a full package, with the partner logical CPU halted.
- This setting can be chosen to maximize the VM's isolation, and it is particularly appropriate for virtual machines running applications that are known to perform poorly on multi-threaded systems.
- In effect, the “no sharing” option causes the scheduler to implement time-slicing (coarse-grained interleaving), whereas the “internal” and “any” options both lead to fine-grained interleaving.
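- The three HT-sharing settings reduce to a simple admissibility check that a scheduler could apply before placing two virtual CPUs on the two logical CPUs of one package. The sketch below assumes that the more restrictive of the two settings wins; `HTSharing` and `may_share_package` are hypothetical names.

```python
from enum import Enum


class HTSharing(Enum):
    ANY = "any"            # may share a package with any other virtual CPU
    INTERNAL = "internal"  # may share only with virtual CPUs of the same VM
    NONE = "none"          # never shares: partner logical CPU is halted


def may_share_package(vm_a, mode_a, vm_b, mode_b):
    """Return True if a virtual CPU of vm_a and one of vm_b may run on the two
    logical CPUs of the same package at the same time."""
    if HTSharing.NONE in (mode_a, mode_b):
        return False
    if HTSharing.INTERNAL in (mode_a, mode_b):
        return vm_a == vm_b
    return True


print(may_share_package("vm1", HTSharing.ANY, "vm2", HTSharing.ANY))            # True
print(may_share_package("vm1", HTSharing.INTERNAL, "vm1", HTSharing.INTERNAL))  # True
print(may_share_package("vm1", HTSharing.INTERNAL, "vm2", HTSharing.ANY))       # False
```

Note that "no sharing" rejects pairing even between two virtual CPUs of the same VM, which is what forces the coarse-grained time-slicing mentioned above.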
- ESX Server includes special optimizations to ensure that a rogue thread in one VM cannot severely degrade the performance of another VM:
- The scheduler 610 in the ESX Server kernel accesses low-level hardware counters to observe the frequency of events that may indicate anti-cooperative behavior.
- If these events indicate anti-cooperative behavior by a VM, the system automatically “quarantines” that VM by placing it into the “no sharing” state (or, alternatively, the “internal” state), as described above. This setting protects other VMs from the potential denial-of-service attack, but does not excessively degrade performance for the misbehaving VM, since it loses only the added benefit of HT. If the degree of anti-cooperative behavior eventually drops below a specified threshold, the VM is released from the quarantined state and allowed to run on a package along with other threads.
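- Automatic quarantine, together with the manual marking of a thread as anti-cooperative described earlier, can be sketched as a small state machine. The use of two thresholds (hysteresis), so that a VM hovering near one limit does not flip back and forth, is an assumption, as are the units and the class name `Quarantine`.

```python
from dataclasses import dataclass


@dataclass
class Quarantine:
    """Per-VM quarantine with hysteresis (hypothetical thresholds, in events per
    million cycles). user_marked models an administrator declaring the VM
    anti-cooperative, in which case no detection is needed."""
    enter_threshold: float
    exit_threshold: float
    user_marked: bool = False
    quarantined: bool = False

    def update(self, event_rate):
        """Feed one sampled event rate; return the HT-sharing mode to apply.
        On release, the VM is assumed to return to the default "any" setting."""
        if self.user_marked:
            self.quarantined = True
        elif not self.quarantined and event_rate > self.enter_threshold:
            self.quarantined = True
        elif self.quarantined and event_rate < self.exit_threshold:
            self.quarantined = False
        return "none" if self.quarantined else "any"


q = Quarantine(enter_threshold=100.0, exit_threshold=20.0)
print([q.update(r) for r in (50, 150, 50, 10)])  # ['any', 'none', 'none', 'any']
```

A rate of 50 after quarantine keeps the VM isolated because it is still above the exit threshold; only a drop below 20 releases it.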
- ESX Server 2.1 has tightly integrated the interrupt-steering code with the HT-aware scheduler 610 .
- ESX Server minimizes unnecessary context switches by preferentially directing interrupts to idling logical processors, which are already waiting in a kernel mode, that is, are available to the kernel.
- When the scheduler has to decide which logical processor of a package should begin running a thread, it preferentially chooses the logical processor with the lower interrupt load, which the scheduler 610 may determine using known techniques.
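- The two interrupt-related preferences above can be expressed as simple selection rules. This sketch assumes each logical CPU exposes an `idle` flag and a recent interrupt-load figure; `LogicalCPU`, `interrupt_target` and `thread_target` are hypothetical names, and how the load is measured is left open, as in the text.

```python
from dataclasses import dataclass


@dataclass
class LogicalCPU:
    cpu_id: int
    idle: bool       # True if halted/idling in the kernel
    irq_load: float  # recent interrupt rate (hypothetical metric)


def interrupt_target(cpus):
    """Prefer idling logical CPUs (no context switch needed), then the lowest load."""
    return min(cpus, key=lambda c: (not c.idle, c.irq_load)).cpu_id


def thread_target(package):
    """Among a package's idle logical CPUs, start the thread on the one with the
    lower interrupt load. Returns None if no logical CPU on the package is free."""
    free = [c for c in package if c.idle]
    if not free:
        return None
    return min(free, key=lambda c: c.irq_load).cpu_id


cpus = [LogicalCPU(0, False, 1.0), LogicalCPU(1, True, 5.0), LogicalCPU(2, True, 2.0)]
print(interrupt_target(cpus))  # 2
print(thread_target([LogicalCPU(0, True, 3.0), LogicalCPU(1, True, 1.0)]))  # 1
```

The tuple key makes idleness dominate: a busy logical CPU with a light interrupt load still loses to any idle one.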
- Device drivers handle asynchronous events, such as interrupts or “bottom halves,” which are snippets of code (a form of thread) used to aid in the processing of interrupts.
- The manual/automatic quarantining approach according to the invention could also apply to interrupts, such that an interrupt is not handled by a logical CPU on the same package as a “no sharing” thread; furthermore, anti-cooperative device drivers could have their interrupts directed to processors that are not running time-critical threads.
Abstract
A thread scheduling mechanism is provided that flexibly enforces performance isolation of multiple threads to alleviate the effect of anti-cooperative execution behavior with respect to a shared resource, for example, hoarding a cache or pipeline, using the hardware capabilities of simultaneous multi-threaded (SMT) or multi-core processors. Given a plurality of threads running on at least two processors in at least one functional processor group, the occurrence of a rescheduling condition indicating anti-cooperative execution behavior is sensed, and, if present, at least one of the threads is rescheduled such that the first and second threads no longer execute in the same functional processor group at the same time.
Description
- This application claims the benefit of U.S. patent application Ser. No. 11/015,506, filed on 16 Dec. 2004, now issued as U.S. Pat. No. 7,707,578.
- 1. Field of the Invention
- This invention relates to schedulers as found in modern operating systems and in particular to a scheduler for use in a computer system with a multi-threaded and/or multi-core architecture.
- 2. Background Art
- As is well known, modern computer systems consist of one or more central processing units (CPUs), together with supporting hardware such as memory and a memory management unit (MMU) for each CPU, and less essential peripheral hardware such as I/O devices like network interfaces, disks, printers, etc. Software is also part of a computer system; typically, a software application provides the ultimate utility of the computer system for users.
- Users often want to use more than one of these software applications, perhaps concurrently. To make this possible, software applications are typically written to run on top of a more privileged piece of software, often known as the “operating system” (OS), which resides, logically, as or in an intermediate software layer, between the applications and the underlying hardware. The OS uses a more privileged mode of the CPU(s), so that it can perform operations which software applications cannot. One of the main jobs of the OS is to coordinate the access by the various applications to shared system resources.
- Given multiple applications that are to share some system resource, such as CPU or I/O access, some mechanism must exist to coordinate the sharing. In modern OSs, this mechanism is usually called a “scheduler,” which is a program that coordinates the use of shared resources according to certain rules programmed into the scheduler by the designer.
- The most fundamental shared resource is access to the CPU(s), since such access is required for execution of any code. Almost all modern operating systems export some notion of “task” or “process,” which is an abstraction of a CPU and memory. A task is conceptually similar to an execution vehicle, and typically corresponds to a single logical activity that requires computational resources (memory, CPU, and I/O devices) to make forward progress. The operating system multiplexes these tasks onto the physical CPUs and other physical resources of the system.
- Each task usually comprises one or more execution abstractions known as “threads.” A thread typically includes its own instruction pointer and sometimes has its own stack. Typically, access to a CPU is scheduled per-thread. A task is thus an environment in which one or several threads are scheduled independently to run on the CPU(s), and not necessarily all (or even more than one) at a time even in multi-processor architectures.
- A standing goal of all computer design, of both hardware such as CPUs and software such as OSs, is to enable applications to run as fast and as efficiently as possible, even when sharing system resources, including the CPU(s). One way to accomplish this is, of course, through the design of the applications themselves. Another way is through efficient design of the OS, which usually entails computing an efficient schedule for executing threads. A specific scheduling problem is discussed below, but before this it is helpful also to consider some of the different hardware techniques that are being employed to increase overall execution speed, since these hardware choices also impact the problem of scheduling.
- Most personal computer systems are equipped with a single CPU. Because CPUs today are quite fast, a single CPU often provides enough computational power to handle several “concurrent” execution threads by rapidly switching from thread to thread, or even task to task (a procedure sometimes known as time-slicing or multiprogramming). This management of concurrent threads is one of the main responsibilities of almost all operating systems.
- The use of multiple concurrent threads often allows an overall increase in the utilization of the hardware resources. The reason is that while one thread is waiting for input or output to happen, the CPU may execute other “ready” threads. However, as the number of threads, or the workload within each thread, increases, the point may be reached where computational cycles, i.e., CPU power, become the limiting factor. The exact point where this happens depends on the particular workloads.
- To permit computer systems to scale to larger numbers of concurrent threads, systems with multiple CPUs have been developed. These symmetric multi-processor (SMP) systems are available as extensions of the PC platform and from other vendors. Essentially, an SMP system is a hardware platform that connects multiple processors to a shared main memory and shared I/O devices. In addition, each processor may have private cache memory. The OS, which is aware of the multiple processors, allows truly concurrent execution of multiple threads, typically using time-slicing only when the number of ready threads exceeds the number of CPUs.
- Because of advances in manufacturing processes, the density of semiconductor elements per chip has now grown so great that “multi-core” architectures have been made possible; examples include the IBM POWER4 and POWER5 architectures, as well as the Sun UltraSparc IV. In these devices, more than one (at present, two, although this is a currently practical rather than a theoretical limitation) physical CPU is fabricated on a single chip. Although each CPU can execute threads independently, the CPUs share at least some cache and in some cases even other resources. Each CPU is provided with its own set of functional units, however, such as its own floating-point and arithmetic/logic units (ALU). Essentially, a multi-core architecture is a multi-processor on a single chip, although with limited resource sharing. Of course, the OS in such a system will be designed to schedule thread execution on one of the multi-core CPUs.
- Still another modern technique that provides for simultaneous execution of multiple threads is referred to as “simultaneous multi-threading,” in which more than one logical processor (hardware thread) operates simultaneously on a single chip, but in which the logical processors must flexibly share not only one or more caches (for example, for data, instructions and traces), but also functional units such as the floating-point unit and the ALU, as well as the translation lookaside buffer (TLB), if the TLB is shared.
- As one example of an SMT architecture, Intel Corporation has developed its “Hyper-Threading Technology” to improve the performance of its Pentium 4 and Xeon processor lines. In Intel's terminology, the single chip is referred to as a “package.” While multi-threading does not provide the performance of a true multi-processor or multi-core system, it can improve the utilization of on-chip resources, leading to greater throughput for several important workload types, by exploiting additional instruction-level parallelism that is exposed by executing the instruction streams associated with multiple threads concurrently.
- To understand the performance implications of simultaneous multi-threading, it is important to understand that most internal processor resources are shared between the two executing threads. For instance, in the Intel architecture, the L1, L2 and L3 caches and all functional units (such as the floating point units and arithmetic/logical units) are flexibly shared between the two threads. If one thread is using very little cache, then the other thread will be able to take advantage of all the unused cache space. However, if both threads demand large amounts of cache, they will compete for the limited capacity and likely slow each other down.
- In an SMT system, the OS designates which software threads the logical processor(s) are to execute, and can also issue commands to cause an idle logical processor to be put in a halt state, such that its execution resources are made available for use by any remaining logical processors. Once threads are scheduled for execution on a multi-threaded hardware processor, internal mechanisms of the processor control use of the shared resources by the executing threads. At any time, the operating system can preempt a thread, that is, force it to give up the CPU on which it is running, in order to run another thread (perhaps one that has not run for some time, or one that the user has given a higher priority to). Putting a processor into the halt state typically involves preempting the running thread and instead scheduling on that processor a dedicated idle thread. This idle thread may use a processor-specific method to make the execution resources from the hardware context available to other threads in the same functional processor group. For instance, on the Intel IA-32 architecture, the idle thread may issue the “HLT” instruction.
- Because at least one resource is shared between the logical processors of a multi-threaded system, the problem can arise that one thread might be “anti-cooperative,” meaning that it does not conform to a predetermined notion of “fairness.” Examples of anti-cooperative execution behavior include using so much of, or otherwise “hoarding,” the shared resource, or causing some other state change in the resource, such that a co-executing thread cannot execute as efficiently as it would if it had exclusive or at least “normal” use of the resource, or such that hardware or software intervention is required. In extreme cases, one thread could theoretically even completely prevent another thread from making forward execution progress, that is, “starve” it, for lack of the shared resource.
- One example of this problem is described by Dirk Grunwald and Soraya Ghiasi in “Microarchitectural denial of service: insuring microarchitectural fairness,” International Symposium on Microarchitecture, Proceedings of the 35th annual ACM/IEEE International Symposium on Microarchitecture, Istanbul, Turkey, pp. 409-18, 2002. Although most anti-cooperative applications in the specific SMT architecture they studied caused performance degradations of less than five percent, Grunwald and Ghiasi showed that a malicious application could degrade the performance of another workload running on the same physical package by as much as 90% through, for example, the use of self-modifying code in a tight loop.
- Existing OS schedulers are not designed to cope with such problems as a microarchitectural denial of service conflict (or outright attack); rather, known schedulers may adjust the amount of execution time allocated to each of a set of runnable threads, but this ignores that the allotted execution time of a given thread may be wasted because of the actions of a co-executing, anti-cooperative thread. For example, as Grunwald points out, self-modifying code can lead to frequent complete flushes of a shared trace cache, which means that the cached information of the other running thread will also be lost, such that many processing cycles are needed to build it back up again, over and over. Even though the “nice” thread will have its allotted execution time, it will not be able to use it efficiently and the OS scheduler will not be able to do anything to improve the situation, assuming that the scheduler detects the situation at all.
- Grunwald offers four possible solutions to the problem of microarchitectural denial of service. First, Grunwald detects the need for intervention using various mechanisms such as performance counters, computing a function of committed instructions, and monitoring bad events such as cache and pipeline flushes. Then he applies one of four proposed “punition” mechanisms, all of which involve either stalling or suspending offending threads, or specifically modifying the OS kernel so that it changes the scheduling interval of an attacking thread. Even Grunwald acknowledges the inadequacy of his proposed software solutions, however, stating that “we think it is better to implement them in microarchitecture” in order to provide “compatibility across a number of operating systems, eliminating processor-specific features.”
- In general, to the small extent that system designers have recognized and addressed the problem of anti-cooperative processes in multi-threaded environments at all, the solutions have focused either on hardware support, or on ways for the OS scheduler to detect anti-cooperativeness and to adjust the execution time slice given to currently offending processes. One solution proposed by Allan Snavely and Dean M. Tullsen in “Symbiotic jobscheduling for a simultaneous multithreaded processor,” ACM SIGOPS Operating Systems Review, v.34 n.5, p. 234-244, December 2000, involves an “SOS” (Sample, Optimize, Symbios) scheduler that samples the space of possible schedules, examines performance counters and applies heuristics to guess an optimal schedule, then runs the presumed optimal schedule.
- In a refinement, described by Allan Snavely, Dean M. Tullsen and Geoff Voelker in “Symbiotic jobscheduling with priorities for a simultaneous multithreading processor,” ACM SIGMETRICS Performance Evaluation Review, v.30 n.1, June 2002, Snavely et al. incorporate the notion of priorities into the scheduling decisions, such that if a particular thread has a high enough priority, then idle threads are scheduled to run alongside it in the same package so that it is guaranteed enough CPU time.
- One problem with both of Snavely's approaches lies in the Sample and Optimize phases, during which the processors are devoted to test cases. Only in a later phase are threads actually allowed to run so as to do the work they are intended to do. Because Snavely's method is two-pass, it is not suitable for detecting and alleviating anti-cooperative behavior at actual run time.
- Yet another disadvantage of Snavely's approaches is that his systems do not directly attempt to detect anti-cooperative behavior. Because of this, threads that appeared to run well together during the Sample and Optimize phases may not do so when actually running under normal conditions. In other words, Snavely assumes that threads will cooperate as well during actual “working” execution as they did during the Sample phase, but this assumption may not be correct: Snavely cannot detect and deal with previously undetected, run-time anti-cooperativeness.
- Snavely's scheduler attempts to optimize how much CPU time each thread will get. In the presence of run-time anti-cooperative execution behavior, however, merely allocating more CPU time to a thread does not ensure optimal execution progress. As Grunwald points out, even very small thread segments (with self-modifying code, for example) can cause severe performance degradation of another running thread, such that merely reducing allocated time may not eliminate the problem. For example, a thread may have 90% of the total CPU time, but the 10% used by another, coscheduled and highly anti-cooperative thread might cause much of the first thread's 90% to be wasted recovering from the resource hoarding of the anti-cooperative thread. Merely adjusting the amount of time allocated to a given thread therefore ignores the unique features of the SMT architecture, in particular, the presence of more than one logical processor, and simply applies a solution that is also applicable to standard, single-processor systems.
- Conversely, an anti-cooperative process is not necessarily malicious and may in fact be one that the user wants to have run quickly, perhaps even with a higher priority than other runnable threads. For example, a user may suppose that a particular important process contains self-modifying code in a tight loop, or has in the past caused problems for co-scheduled threads in an SMT architecture. Stalling or suspending this thread would therefore benefit other threads, but would lead to a worse result from the user's perspective.
- Proposed mechanisms for dealing with the problem of shared resource hoarding in multi-threaded architectures fail to provide the user with any ability to influence how the OS addresses the problem. It would thus be beneficial to enable the user to control at least some of the decision about what to do in the presence of an anti-cooperative process in a multi-threaded architecture.
- What is needed is a mechanism that more efficiently addresses the problem of anti-cooperative and malicious threads in multi-threaded processor architectures, and that preferably does so with no need for hardware support other than that already provided by the multi-threaded processor. Optionally, it would also be beneficial to give the user at least some control over the mechanism.
- The invention provides a method and corresponding system implementation for controlling execution of a plurality of threads by a processing system that has at least two processors in at least one functional processor group, in which threads coscheduled for execution on the processors share an internal processor group resource. When at least a first and a second thread are coscheduled for execution on the processors of the functional processor group, the invention senses, during run time, the presence of a rescheduling condition indicating anti-cooperative execution behavior. Upon sensing the rescheduling condition, a scheduler reschedules at least one of the threads such that the first and second threads no longer execute in the same functional processor group at the same time.
- Examples of anti-cooperative execution behavior include: use by the first thread of the internal processor group resource that causes the second thread to be denied use of the resource beyond a minimum acceptable level; triggering more than a threshold number of cache flushes or misses; triggering more than a threshold number of pipeline flushes; etc.
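A minimal Python sketch of how such threshold rules might be evaluated over one sampling interval follows. The counter fields, the `Thresholds` values, and the function names are hypothetical; the actual counters and sensible limits depend on the processor architecture and on what the activity sensor can observe.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical per-thread event counts for one sampling interval."""
    cache_misses: int
    pipeline_flushes: int
    denied_fraction: float  # share of this thread's resource requests that were denied


@dataclass
class Thresholds:
    """Hypothetical limits; real values would be tuned per architecture."""
    max_cache_misses: int
    max_pipeline_flushes: int
    max_denied_fraction: float


def rescheduling_condition(first: CounterSample, second: CounterSample,
                           limits: Thresholds) -> bool:
    """Return True if the first thread's behavior should trigger rescheduling.

    The first thread is flagged if it exceeds its own event thresholds, or if
    the coscheduled second thread is denied the shared resource more often
    than the minimum acceptable level allows.
    """
    too_many_misses = first.cache_misses > limits.max_cache_misses
    too_many_flushes = first.pipeline_flushes > limits.max_pipeline_flushes
    partner_starved = second.denied_fraction > limits.max_denied_fraction
    return too_many_misses or too_many_flushes or partner_starved
```

Attributing the partner's starvation to the first thread is a simplifying assumption of this sketch; a real sensor may need additional evidence before blaming a particular thread.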
- Rescheduling may be triggered according to rules programmed into the scheduler, according to user-input parameters, or both, or it may be disabled altogether. For example, the scheduler may accept at least one user-specified thread performance requirement and then estimate run-time thread execution performance relative to that requirement as a function of an observable condition (for example, performance counters). One measure of anti-cooperative execution behavior is then violation of the user-specified thread performance requirement.
- According to another optional aspect of the invention, the scheduler may accept a user designation of the first thread as being un-coschedulable with the second thread, in which case that designation is itself the rescheduling condition. Alternatively, it could accept at least one user-provided execution guarantee for a designated one of the threads, in which case the rescheduling condition is violation of the guarantee; upon violation of the guarantee, the scheduler then reschedules at least one of the coscheduled threads to ensure that the guarantee is met for the designated thread.
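The two user-driven conditions can be combined into a single check, sketched below in Python under stated assumptions: progress is estimated as the ratio of a thread's coscheduled instructions per cycle (IPC) to an IPC measured while it ran with its partner halted, and the `UserPolicy` structure and all function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UserPolicy:
    # Hypothetical: unordered thread pairs the user designated as un-coschedulable.
    never_together: set[frozenset[str]] = field(default_factory=set)
    # Hypothetical: guaranteed minimum fraction of solo progress per designated thread.
    guarantees: dict[str, float] = field(default_factory=dict)


def estimated_progress(ipc_coscheduled: float, ipc_solo: float) -> float:
    """Assumed proxy: coscheduled IPC relative to IPC with the partner halted."""
    return ipc_coscheduled / ipc_solo


def must_reschedule(policy: UserPolicy, first: str, second: str,
                    ipc_now: dict[str, float], ipc_solo: dict[str, float]) -> bool:
    """Return True if either user-specified rescheduling condition holds."""
    # Condition 1: the user designated this pair as un-coschedulable.
    if frozenset((first, second)) in policy.never_together:
        return True
    # Condition 2: a designated thread's execution guarantee is violated.
    for thread in (first, second):
        floor = policy.guarantees.get(thread)
        if floor is not None and estimated_progress(ipc_now[thread], ipc_solo[thread]) < floor:
            return True
    return False
```

Using a `frozenset` for each designated pair makes the check independent of which thread is called "first."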
- One embodiment of the invention is in a computer system with at least two functional processor groups (such as a multi-threaded processor package or a set of partnered multi-core processors), each having at least two processors (logical or physical, depending on the type of group). One rescheduling decision may then be to allow the second thread to continue executing while rescheduling the first thread onto a processor in a different functional processor group. The first and second threads can then continue to execute simultaneously, but in different functional processor groups.
- As an example of an extension of this decision, consider four threads executing simultaneously in pairs on two different processor groups, in which the first and second threads are initially scheduled in a first processor group and at least a third and a fourth thread are running in a second functional processor group. The scheduler according to the invention can implement a rescheduling decision such that all four threads are still running simultaneously, but the first and second threads are running in different functional processor groups, as are the third and fourth threads.
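One way to compute such a rearrangement is to keep the first thread of each pair in place and rotate the second threads by one group, so that no previously coscheduled pair shares a group afterwards. The Python sketch below is an illustrative assumption, not the scheduler's actual algorithm; any permutation that splits every pair would serve equally well.

```python
def cross_schedule(groups: list) -> list:
    """Split every coscheduled pair across processor groups.

    groups[i] is the (first, second) pair of threads running on group i.
    Hypothetical rule: group i keeps its first thread and receives the second
    thread of group i + 1 (wrapping around), so all threads keep running.
    """
    n = len(groups)
    if n < 2:
        raise ValueError("cross-scheduling needs at least two processor groups")
    return [(groups[i][0], groups[(i + 1) % n][1]) for i in range(n)]
```

For the four-thread case, `cross_schedule([("T1", "T2"), ("T3", "T4")])` yields `[("T1", "T4"), ("T3", "T2")]`: both processors in each group stay busy, and neither original pair shares a group.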
- In one advantageous embodiment of the invention, the threads originate in at least one virtual machine, in which case the threads may be virtual CPUs. The virtual CPUs may themselves be virtualized logical processors within virtualized, functional processor groups.
- Rescheduling a thread may include putting the processor on which it was running into a halted state, or preempting the thread running on that processor and scheduling a different thread to run there. In computers in which the processors in the functional processor group support a hardware thread priority, rescheduling a thread may alternatively comprise changing its priority relative to the priorities of coschedulable threads.
-
FIG. 1 illustrates the simplest two-thread, two-logical-processor case of the mechanism according to the invention for reducing conflicts for a shared resource in a multi-threaded and/or multi-core computer system. -
FIG. 2 illustrates the mechanism according to the invention for descheduling one thread running on a package where anti-cooperative execution behavior is detected. -
FIG. 3 illustrates a normal operating condition in a multi-threaded or multi-core architecture, with two pairs of executable threads each running on respective logical processors in a respective processor group. -
FIG. 4 illustrates one scheduling option at two different times in the case where one thread in each of two packages in FIG. 3 is found to display anti-cooperative behavior. -
FIG. 5 illustrates an alternative scheduling option to the one shown in FIG. 4, namely, a time-shared scheduling that keeps previously co-scheduled threads within the same processor group. -
FIG. 6 illustrates yet another alternative scheduling option to the one shown in FIG. 4, namely, one in which all executing threads continue to execute simultaneously, but in which previously co-scheduled threads are rescheduled to run on different processor groups. -
FIG. 7 illustrates a generalized embodiment of the invention, in which one or more guests, each having one or more multi-threaded or multi-core virtual processors, are scheduled using the invention to execute on a hardware platform that also has one or more processor groups, each containing one or more logical (in the multi-threaded case) or partnered physical (in the multi-core case) processors.
- The main idea of the invention is flexible enforcement of performance isolation using the hardware capabilities of SMT/multi-core processors. The simplest embodiment of the invention is illustrated in FIG. 1: A pair of “partnered” processors CPU0, CPU1 are associated in a functional group 101 such that they share at least one group resource 102 under the control of known hardware mechanisms within the group. As just one example, in a simultaneous multi-threaded (SMT, or, here, simply “multi-threaded”) architecture such as Intel Corp.'s Hyper-Threading Technology, there are two logical processors per package (a type of group), and a hardware mechanism in the processor package itself determines how each thread accesses the trace cache.
- A scheduler 610 schedules each of a plurality (two are shown by way of example) of logically cooperating executable threads Ta, Tb for execution on the processors CPU0, CPU1, while an activity sensor 615 within or accessible by the scheduler monitors the behavior of the executing threads. Extensions of this simplified embodiment are described below.
- The scheduler 610 will be part of some known intermediate software layer that mediates access to hardware resources. Examples include an operating system, a virtual machine monitor or hypervisor, a kernel in a virtualized computer system, etc., as will be made clearer below. Similarly, the processors CPU0, CPU1 will be part of a larger set of system hardware 100, which will include such components as a disk, memory, power and timing devices, I/O controllers, etc. The other features of system software and hardware are not illustrated or described further here because they are well known and can be assumed to be present in any modern computer system.
- One example of a processor group 101 is a multi-threaded package, in which the partner processors CPU0, CPU1 are logical processors and the shared resource may be a cache, pipeline, etc. Another example of a group would be a set of multi-core processors. As will become clearer below, the invention is not restricted to any particular number of executable threads, or to the manner in which they logically cooperate, and there may be more than one processor group, each of which may have two or more associated processors.
- As commonly (but not universally) used, the term “thread” often implies a shared address space. This is not necessary in this invention. Rather, as used here, a thread is simply a body of executable code that is scheduled for execution as a unit. Logical cooperation among threads may simply mean that they are multiple threads of the same software entity, which, for the sake of conciseness, is referred to below as a “task,” and which may be, for example, a single process, multiple virtual CPUs in the same virtual machine (see below), etc.
- The activity sensor 615 is a software module comprising computer-executable code that either monitors the activity of executing threads with respect to a predetermined activity parameter, or accesses any known mechanism within the system hardware (including within the processor group itself) to get a value of the activity parameter. For example, depending on the architecture in which the invention is included, the activity sensor 615 reads any observable condition, such as any of the many known hardware performance counters, or maintains software performance counters, to determine, for example, the frequency of pipeline flushes, cache flushes or misses, overflows of a resource, execution of too many floating-point operations per predetermined time unit, or any other event indicative of anti-cooperative execution behavior. Note that the activity sensor may operate according to pre-set rules, by comparing run-time behavior against a user-specified performance threshold or range, or both.
- Now assume that the activity sensor 615 detects that thread Tb is behaving “anti-cooperatively,” which may be defined in any predetermined sense as any behavior that reduces the ability of one or more other co-scheduled threads to use the shared resource, or that interferes with another thread's attempts to use the resource, such that hardware or software intervention is required. Several examples of anti-cooperative execution behavior have been mentioned above.
- Upon detecting anti-cooperative execution behavior, the scheduler 610 may deal with the situation according to predetermined rules programmed into the scheduler, or according to one or more options, depending on the desired implementation of the invention. In one system that incorporates the invention, three alternatives were provided: 1) do nothing, that is, no intervention, such that the anti-cooperative behavior is allowed to continue; 2) follow rules input by the user or administrator, for example via a console 300; or 3) automatically intervene according to predetermined, pre-programmed rules, such as when the anti-cooperative behavior causes the sensed or computed value of the activity parameter to exceed a threshold (for example, too many cache flushes or cache misses) or to fall outside given bounds. Options 2) and 3) may lead to the same type of intervention, as described below, although the conditions that trigger the intervention will be user-selected in one case and pre-set in the other.
- For the time being, the discussion of the invention will focus on the manner in which the scheduler intervenes, since this highlights perhaps the most beneficial aspect of the invention: the scheduler 610 deschedules Tb from CPU1, allowing thread Ta to run alone, or at least without being co-scheduled with the anti-cooperative thread Tb. Alternatively, if Tb is more important in any sense, such as if the user designates it as a higher priority thread, then the scheduler 610 could instead deschedule Ta.
- The effect of this is illustrated in FIG. 2: The processor group 101 is effectively converted into a single-processor configuration, in which it will operate as almost any other non-multi-threaded processor. Threads Ta and Tb can then be scheduled to execute separately. Although this will mean that thread Tb will have to wait, it may actually increase overall execution progress, since thread Ta will be able to execute with full access to the shared resource, without repeatedly wasting processor cycles to reconstruct its state. Notice, however, a difference between the approach according to the invention and the prior art: Upon detecting anti-cooperative behavior, rather than just adjusting the time allotted to the offending thread, the invention makes use of the features of the multi-threaded processor architecture itself to prevent a partial or total denial of service.
- Additionally, whereas Grunwald's proposals involve penalizing the anti-cooperative thread, this invention does not, but rather simply changes co-scheduling to reduced-processor scheduling (for example, from dual to single, or, more generally, from k processors to k−1), or changes which processor at least one of the threads executes on. A further distinction between known time-slicing techniques and the invention is that time-slicing implements only coarse-grained interleaving of software threads (thousands or millions of instructions may execute before switching threads), while the invention implements not only coarse-grained but also fine-grained interleaving of software threads, such that the pipeline may be processing instructions from both threads simultaneously. Moreover, unlike Snavely's time-slicing proposals, the invention directly attempts to detect anti-cooperative behavior, and does so at run time, as threads are actually running together to do “real” work.
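The three intervention alternatives, and the choice of which thread to deschedule, might be organized as in the following Python sketch. The `Mode` names, the integer priorities, and the use of `None` to stand for the idle thread (whose execution halts the logical processor) are hypothetical conventions chosen for illustration only.

```python
from __future__ import annotations

from enum import Enum
from typing import Optional

IDLE = None  # hypothetical stand-in for the idle thread; running it halts the logical processor


class Mode(Enum):
    NO_INTERVENTION = 1  # alternative 1): let the behavior continue
    USER_RULES = 2       # alternative 2): rules entered by a user or administrator
    AUTOMATIC = 3        # alternative 3): pre-programmed rules


def respond(mode: Mode, assignment: dict[str, Optional[str]], offender: str,
            victim: str, priority: dict[str, int],
            user_rule_hit: bool, preset_rule_hit: bool) -> dict[str, Optional[str]]:
    """Return a new processor-to-thread assignment after anti-cooperation is sensed."""
    if mode is Mode.NO_INTERVENTION:
        return dict(assignment)
    triggered = user_rule_hit if mode is Mode.USER_RULES else preset_rule_hit
    if not triggered:
        return dict(assignment)
    # Deschedule the offender, unless the user ranks it above its victim.
    if priority.get(offender, 0) > priority.get(victim, 0):
        loser = victim
    else:
        loser = offender
    return {cpu: (IDLE if thread == loser else thread)
            for cpu, thread in assignment.items()}
```

With `assignment = {"CPU0": "Ta", "CPU1": "Tb"}` and Tb as the offender, an automatic trigger leaves Ta running on CPU0 with CPU1 idle, which is the reduced-processor configuration of FIG. 2; giving Tb a higher priority instead idles CPU0.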
- FIG. 3 illustrates a configuration of the invention in which two or more tasks 500-a, 500-b (only two are shown for simplicity), each having more than one thread Ta0, Ta1, Tb0, Tb1 (again, only two per task are shown for simplicity), run via the intermediate software layer(s) 600 and are scheduled for execution on any of a plurality of functional processor groups 101-1, 101-2, each of which includes two or more associated physical or logical processors CPU0-1, CPU1-1, CPU0-2, CPU1-2 that share, within each group, a respective resource 102-1, 102-2. It is not necessary to the invention for the number of threads to be the same in each task, for the number of processors to be the same in each group, or for the number of threads in any task to be the same as the number of processor groups or the number of processors in any given group. All that is necessary is that the scheduler, or some analogous component that performs operations according to the invention, be able to schedule a particular thread on a particular processor (or processor group, if mechanisms within the group assign processors to submitted threads).
- FIG. 3 illustrates the “normal,” cooperative multi-threaded situation, in which two threads are running on each processor group, sharing the respective resources. Now, if a single thread is detected as being anti-cooperative, the scheduler 610 can deal with this in the same manner as described above for FIGS. 1 and 2, allowing the threads in the other processor group to continue execution as normal.
- For the sake of clarity, in FIGS. 3-6, only the various processor groups are shown. The other hardware and software components of the system may be assumed.
- Assume, however, that a thread in each group (for example, Ta1 and Tb1) is detected as being anti-cooperative. FIG. 4 illustrates one way for the scheduler 610 to allow the other threads Ta0 and Tb0 to proceed, namely, to deschedule Ta1 and Tb1 at time t0. Threads Ta1 and Tb1 can then be rescheduled later, at a time t1, when Ta0 and Tb0 have completed. Threads Ta1 and Tb1 do not have to be rescheduled at the same time, however. The procedure illustrated in FIG. 4 can be considered a “per group” sharing option inasmuch as each processor group remains dedicated to its associated threads (Ta0 and Ta1, and Tb0 and Tb1), which execute in the same processor group in which they were originally scheduled.
- FIG. 5 illustrates a different rescheduling option, which may be called a “time shared” scheduling option in that the threads of one task (Ta0 and Ta1, for example) are executed simultaneously, but are isolated from one another by being scheduled onto different groups, which then operate as single- or at least reduced-processor groups. The threads of the other task (here, Tb0 and Tb1) are then rescheduled to run afterwards. The decision as to which task's threads are to be given priority may be implemented in any desired manner: Either the task that contained the anti-cooperative thread could be “punished” by having to wait, or its threads could be scheduled to run immediately, with the other, cooperative threads running afterwards.
- Assume that two threads are known or found not to cooperate well with each other, but are unlikely to degrade the execution of threads of other tasks. Alternatively, assume that it is desired for any reason to ensure that two threads execute in isolation from one another. FIG. 6 illustrates a scheduling option that handles both these situations: Rather than running the threads on the same processor group, the threads are “cross-scheduled,” that is, both processors in each group are working, but each processor group is handling one thread from each previously coscheduled pair.
- In this description of the various embodiments of the invention, it is stated that a processor (or, more correctly, the thread running on that processor) may be rescheduled. Depending on how the scheduler chooses to deal with an anti-cooperative thread, this means that some thread (either the anti-cooperative thread or one of its thread “victims”) running in the same processor group (such as a package or multi-core processor set) is preempted, and that either another “working” thread (Ta0, Tb1, etc.) or an idle thread is scheduled to run on that processor. As mentioned above, scheduling an idle thread on a processor effectively puts it into a “halt” state; for purposes of understanding this invention, however, the term “halt” should not be limited to the sense or particular semantics of the HLT instruction used in most Intel processors, in particular, those with the x86 architecture.
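The per-group (FIG. 4) and time-shared (FIG. 5) options can be written down as sequences of scheduling phases. In this hypothetical Python sketch, each phase lists what each processor group runs as a (processor 0, processor 1) pair, with `None` standing for the idle thread on the halted partner processor; the function names and data layout are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Optional

IDLE = None  # hypothetical stand-in for the idle thread on a halted partner processor


def per_group_plan(groups: list[tuple[str, str]]) -> list[list[tuple[str, Optional[str]]]]:
    """FIG. 4 style. groups[i] = (cooperative, anti_cooperative) threads of group i.

    Phase 0 runs each cooperative thread alone in its own group; phase 1 later
    runs the descheduled threads, again in the group where they started.
    """
    phase0 = [(good, IDLE) for good, _bad in groups]
    phase1 = [(bad, IDLE) for _good, bad in groups]
    return [phase0, phase1]


def time_shared_plan(tasks: list[tuple[str, ...]],
                     n_groups: int) -> list[list[tuple[str, Optional[str]]]]:
    """FIG. 5 style: one task per phase, each of its threads alone in a separate group."""
    plan = []
    for task in tasks:
        if len(task) > n_groups:
            raise ValueError("task has more threads than processor groups")
        plan.append([(thread, IDLE) for thread in task])
    return plan
```

Both plans trade throughput for isolation by halting one processor per group in every phase; the cross-scheduled option of FIG. 6 instead keeps every processor busy in a single phase.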
- In processor architectures that support a “hardware thread priority,” another way to reschedule a thread would be to change the relative priorities of running threads so that a given processor will execute the anti-cooperative thread much less frequently. Note that this option will generally enforce performance isolation less strictly. Skilled programmers will be able to adapt the notion of rescheduling as described here to the needs of a given architecture.
- As mentioned above, the software entities (tasks) in which the various threads originate may be of any type. The invention has been found to be particularly advantageous, however, in virtualized computers running on a multi-threaded hardware architecture. An example of the invention in this context will now be described. In addition to providing a concrete example of the invention, this will also show how the invention can be generalized, as well as several specific features that improve performance and that can be used in other embodiments of the invention as well.
- The advantages of virtual machine technology have become widely recognized. Among these advantages is the ability to run multiple virtual machines on a single host platform. This makes better use of the capacity of the hardware, while still ensuring that each user enjoys the features of a “complete,” isolated computer. Depending on how it is implemented, virtualization also provides greater security since it can isolate potentially unstable or unsafe software so that it cannot adversely affect the hardware state or system files required for running the physical (as opposed to virtual) hardware.
- As is well known in the field of computer science, a virtual machine (VM) is a software abstraction (a “virtualization”) of an actual physical computer system. A virtual machine is installed on a “host,” such as the hardware platform 100.
- See FIG. 7, which illustrates implementation of the scheduler 610 according to the invention in a virtualized computer system, in which each task whose threads are scheduled is shown as a “guest,” which, in the illustrated embodiment, is assumed by way of example to be a virtual machine. Two guests 500-1, 500-2 are shown for the sake of simplicity, although any number, including only one, may be included.
- Each VM will typically have both virtual system hardware 501-1, 501-2 and guest system software, including or consisting of a guest operating system 520-1, 520-2, which has the typical included and associated software, such as drivers, as needed. The virtual system hardware typically includes virtual system memory 512, at least one virtual disk 514, and one or more virtual devices 540. Note that a disk, virtual or physical, is also a “device,” but is usually considered separately because of its important role. All of the virtual hardware components of the VM may be implemented in software using known techniques to emulate the corresponding physical components.
- In the illustrated embodiment, each VM 500-1, 500-2 itself has a virtualized, multi-threaded processor architecture. In fact, in the configuration shown in FIG. 7, each guest has a plurality of virtual processor packages (or, more generally, groups), each of which may have a different number of logical processors. Thus, VM 500-1 has m virtual processor packages VPACKAGE 1-m, where VPACKAGE 1 has logical processors VP0-VPd and VPACKAGE m has logical processors VP0-VPe; and VM 500-2 has n virtual processor packages VPACKAGE 1-n, where VPACKAGE 1 has logical processors VP0-VPx and VPACKAGE n has logical processors VP0-VPy. In VM 500-1, i threads T0-1 to Ti-1 are shown as being ready, and in VM 500-2, j threads T0-2 to Tj-2 are shown as being ready.
- As for the illustrated system hardware 100, p physical processor packages PACKAGE 1-p are shown, where PACKAGE 1 has logical processors P0-1 to Pr-1, PACKAGE p has logical processors P0-p to Ps-p, and so on. As mentioned above, these groups of processors may also be multi-core instead of multi-threaded.
- If a VM is properly designed, then even though applications running within the VM are running indirectly, that is, via its respective guest OS and virtual processor(s), it will act just as it would if run on a “real” computer, except for a decrease in running speed that will be noticeable only in exceptionally time-critical applications. Executable files will be accessed by the guest OS from the virtual disk or virtual memory, which will simply be portions of the actual physical disk or memory allocated to that VM. Once an application is installed within the VM, the guest OS retrieves files from the virtual disk just as if they had been pre-stored as the result of a conventional installation of the application. The design and operation of virtual machines are well known in the field of computer science.
- Some interface is usually required between a VM and the underlying host platform 100 (in particular, the hardware CPU(s)), which is responsible for actually executing VM-issued instructions and transferring data to and from the hardware memory and storage devices. A common term for this interface is a “virtual machine monitor” (VMM). A VMM is included as one of the intermediate software layers, but it is not labeled specifically in the figures, either because its presence can be assumed or because it may be the intermediate software layer 600 that actually includes the scheduler (depending on the virtualized configuration, as described below). A VMM is usually a software component that runs directly on top of a host, or directly on the hardware, and virtualizes at least some of the resources of the physical host machine so as to export some hardware interface to the VM.
- The various virtualized hardware components in the VM, such as the virtualized processors, the virtual memory, the virtual disk, and the virtual device(s), are shown as being part of each respective VM 500-1, 500-2 for the sake of conceptual simplicity; in actual implementations, these “components” are usually constructs or emulations exposed to the VM by its respective VMM, for example, as emulators. One advantage of such an arrangement is that the VMM may be set up to expose “generic” devices, which facilitate VM migration and hardware platform-independence.
- In fully virtualized systems, the guest OS cannot determine the presence of the VMM and does not access hardware devices directly. One advantage of full virtualization is that the guest OS may then often simply be a copy of a conventional operating system. Another advantage is that the system provides complete isolation of a VM from other software entities in the system (in particular, from other VMs) if desired. Because such a VM (and thus the user of applications running in the VM) cannot usually detect the presence of the VMM, the VMM and the VM may be viewed as together forming a single virtual computer.
- In contrast, the guest OS in a so-called “para-virtualized” system is modified to support virtualization, such that it not only has an explicit interface to the VMM, but is sometimes also allowed to access at least one hardware resource directly. In short, virtualization transparency is sacrificed to gain speed. In such para-virtualized systems, the VMM is sometimes referred to as a “hypervisor.”
- This invention may be used in both fully virtualized and para-virtualized computer systems. Indeed, virtualization is not a prerequisite for this invention at all, but rather the software mechanisms that implement the method according to the invention may be incorporated into system-level software even in conventional, non-virtualized systems.
- In addition to the distinction between full and partial (para-) virtualization, two arrangements of intermediate system-level software layer(s) are in general use—a “hosted” configuration, and a non-hosted configuration. In a hosted virtualized computer system, an existing, general-purpose operating system forms a “host” OS that is used to perform certain I/O operations, alongside and sometimes at the request of the VMM. The Workstation product of VMware, Inc., of Palo Alto, Calif., is an example of a hosted, virtualized computer system, which is also explained in U.S. Pat. No. 6,496,847 (Bugnion, et al., “System and Method for Virtualizing Computer Systems,” 17 Dec. 2002).
- In a non-hosted virtualized computer system, a kernel customized to support virtual computers takes the place of and performs the conventional functions of the host OS, such that virtual computers run on the kernel. In addition to the various VM/VMMs, the kernel also handles any other applications running on the kernel that can be separately scheduled, as well as any temporary “console” operating system, if included, used for booting the system as a whole and for enabling certain user interactions with the kernel. Thus, in a non-hosted virtualized computer system, the kernel will be the primary if not sole intermediate software layer 600.
- Compared with a system in which VMMs run directly on the hardware platform, use of a kernel offers improved performance because it can be co-developed with the VMMs and be optimized for the characteristics of a workload consisting mostly of virtualized computers. Moreover, a kernel can also be optimized for I/O operations and it allows services to extend across multiple VMs (for example, for resource management). The ESX Server product of VMware, Inc., is an example of a non-hosted virtualized computer system.
- Various options for scheduling different threads on the logical processors of different packages are described above with reference to FIGS. 1-6. All of these options may be made available in the more generalized system shown in FIG. 7, such that logical processors are halted as needed, or anti-cooperative threads can be rescheduled on different packages, upon detection of anti-cooperative behavior on the part of any running thread.
- Note that, in a system with guests that have virtual processors, the virtual processors themselves are typically the threads that are scheduled to run on the underlying hardware processors. Separate threads T0-1 to Ti-1 and T0-2 to Tj-2 are shown in FIG. 7 simply so that this Figure will be easier to compare with the previous figures.
- Certain specifics of an implementation of the invention in a non-hosted virtualized computer system, specifically, a version of VMware's ESX Server product, will now be described by way of example. In this example, it is assumed that the processor architecture is Intel Corp.'s Hyper-Threading Technology architecture and that the shared resource in question is a trace cache. To the extent they are needed at all, modifications to the described embodiment to accommodate other architectures (such as multi-core, with more than two logical processors per package, etc.) and shared resources will be within the skill of experienced designers of system-level software.
- To understand the performance implications of Hyper-Threading (HT), it is important to recall that most processor resources are shared between the two executing threads. For instance, the L1, L2 and L3 caches and all functional units (such as the floating point units and arithmetic/logical units) are flexibly shared between the two threads. So, if one thread is using very little cache, the other thread will be able to take advantage of all the unused cache space. However, if both threads demand large amounts of cache, they will compete for the limited capacity and will likely slow each other down.
- By default, HT is preferably enabled during the ESX Server installation process on any hardware that supports the feature. A checkbox is also provided in a Management User Interface to enable or disable HT. Assuming that the user selects multi-threading, the user, for example using the console 300, is preferably also given the option of enabling or disabling this invention. One advantage of the invention is that it requires few other changes to the interface presented to the user: the number of CPUs shown in the Management User Interface will double, and the list of CPUs available for the per-VM “only use processors” setting (also known as CPU affinity) will double.
- Most systems with Intel Xeon MP processors or Intel Xeon processors with at least 512 KB of cache support HT. However, in order for ESX Server to enable multi-threading, the server BIOS must be properly configured with multi-threading enabled. Skilled systems administrators will know how to configure a BIOS; moreover, the factory default BIOS setup often enables HT.
- As mentioned above, an operating system can cause logical processors to enter an architecture-dependent halted state, often within the context of an idle thread. This halted state frees up hardware execution resources to the partner logical processor (the other logical processor on the same package), so that a thread running on the partner logical processor runs effectively like a thread on a non-HT system. The VMware ESX Server preferably uses the halted state aggressively to guarantee full utilization of the system's processing power, even when there are not enough running threads to occupy all logical processors.
- ESX Server accounts for CPU time in terms of “package seconds,” not logical processor seconds. A VM running on a logical processor that shares a package with another busy logical processor will be charged half as much as a VM running on a logical processor whose partner is halted. In other words, a VM is only “half-charged” when it runs on half of a package, but fully charged if it has the package to itself. Performance testing has shown this to be the most accurate and understandable way to quantify the performance impact of HT. This style of accounting also makes it easier to compare performance between HT and non-HT systems, because CPU time consumed is measured in the same units on both system types.
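The package-second charging rule can be illustrated with a minimal Python sketch. It assumes accounting happens over discrete intervals in which each logical processor either shares its package with a busy partner or has its partner halted; the `Tick` record and the `charge_package_seconds` function are hypothetical names, not ESX Server internals.

```python
from dataclasses import dataclass

@dataclass
class Tick:
    """One accounting interval on a single logical processor (hypothetical)."""
    vm: str              # VM that ran on this logical processor
    seconds: float       # length of the interval
    partner_busy: bool   # True if the partner logical processor also ran a VM

def charge_package_seconds(ticks):
    """Charge each VM in package-seconds: half rate while sharing the package,
    full rate while the partner logical processor is halted."""
    charges = {}
    for t in ticks:
        rate = 0.5 if t.partner_busy else 1.0
        charges[t.vm] = charges.get(t.vm, 0.0) + rate * t.seconds
    return charges

ticks = [
    Tick("vm_a", 2.0, partner_busy=True),   # vm_a and vm_b share the package
    Tick("vm_b", 2.0, partner_busy=True),
    Tick("vm_a", 1.0, partner_busy=False),  # vm_a then has the package alone
]
print(charge_package_seconds(ticks))  # {'vm_a': 2.0, 'vm_b': 1.0}
```

The charges for a package add up to its wall-clock time (2 s shared plus 1 s alone gives 3 package-seconds in total), which is what makes the figures directly comparable with a non-HT system.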
- Because the benefits of HT depend so heavily on the characteristics of the running workload, it is difficult to generalize about the performance impact of HT. Intel suggests that some applications may see performance improvements of up to 30%, but, in practice, these extreme improvements are rare. More typical applications see performance benefits closer to 10%, and a few applications will decrease slightly in performance when run on a multi-threaded system.
- When running symmetric multi-processor (SMP) VMs on a system with two physical packages, however, the performance gains may be more substantial. VMware ESX Server preferably coschedules both virtual CPUs in an SMP VM. That is to say, both virtual CPUs in the VM must be running at the same time, or both must be idle. This can lead to a problem of “processor fragmentation” on two-way systems. Consider the case where a uni-processor VM is running and a two-processor VM is ready to run: one physical CPU will be idle, but ESX Server will not be able to run the SMP VM, because it would need two available physical processors. Thus, a physical CPU may be left idle. This problem may also arise in the more generalized case shown in FIG. 7, in which VMs have more than two virtual packages and/or more than two logical CPUs per virtual package. For example, if one VM has a single two-CPU package and another VM has a single three-CPU package, then the VM with the three-CPU package would need to wait to make any execution progress in known systems.
- The above situation would not be a problem for a multi-threaded system. For example, VMware ESX Server could dedicate one package (with two logical CPUs) to the SMP VM and another package to the uni-processor VM (running on one logical CPU, with the other halted), thus fully utilizing the system's resources. This increased utilization can lead to substantial performance benefits for realistic workloads with a mix of SMP and uni-processor VMs. In addition to the basic features described above, VMware ESX Server provides a number of improvements and configuration options that advance the state of the art in HT performance and management.
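To make the fragmentation argument concrete, here is a hypothetical Python sketch of all-or-nothing coscheduling on a two-package machine. The greedy, package-filling heuristic is an illustrative assumption rather than the ESX Server placement algorithm; it only shows that the same pair of VMs cannot run together with one CPU per package, but can with two logical CPUs per package.

```python
def place_coscheduled(vms, packages, threads_per_package):
    """Sketch of strict (all-or-nothing) coscheduling placement.

    vms: list of (name, vcpu_count), already-running VMs first.
    packages=2, threads_per_package=1 models a two-way non-HT system;
    packages=2, threads_per_package=2 models the same machine with HT.
    Returns {name: [(package, logical_cpu), ...]} for VMs that can run now.
    """
    free = {p: list(range(threads_per_package)) for p in range(packages)}
    placement = {}
    for name, vcpus in vms:
        if vcpus > sum(len(slots) for slots in free.values()):
            continue  # not every vCPU can start at once, so the whole VM waits
        chosen = []
        # Visit packages with the most free logical CPUs first, so that a
        # multi-vCPU VM tends to receive whole packages.
        for p in sorted(free, key=lambda pkg: len(free[pkg]), reverse=True):
            while free[p] and len(chosen) < vcpus:
                chosen.append((p, free[p].pop(0)))
        placement[name] = chosen
    return placement

vms = [("up_vm", 1), ("smp_vm", 2)]
print(place_coscheduled(vms, packages=2, threads_per_package=1))
# {'up_vm': [(0, 0)]}   smp_vm waits and one physical CPU sits idle
print(place_coscheduled(vms, packages=2, threads_per_package=2))
# {'up_vm': [(0, 0)], 'smp_vm': [(1, 0), (1, 1)]}   logical CPU (0, 1) halts
```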
- In VMware ESX Server, VMs typically receive CPU time proportional to an allocation of “shares.” Even in systems that incorporate the invention, VMware ESX Server's CPU resource controls are preferably tightly integrated with HT accounting: virtual machines still receive CPU time in proportion to their share allocation, but are bounded by user-specified min and max values, which may be entered, for example, using the console 300. While shares allow relative allocation of resources (so that an administrator can specify that one VM should receive twice the resources of another VM, for instance), min and max are absolute bounds, measured as a percentage of a package's resources. That is, a VM with a min of “75%” and a max of “90%” is guaranteed to get at least 75% of a package's time, but never more than 90%, even if extra idle time is available in the system. These limits may be incorporated into the scheduling routine of the scheduler 610 in any normal manner.
- To achieve this level of fairness, ESX Server dynamically expands a high-priority VM to use a full package by rescheduling its partner logical processor to run an idle thread (which, for example, may execute the HLT instruction), even if other VMs are currently runnable in the system. This does not waste resources, but simply redirects them to the high-priority VM, so that it can receive up to a full physical package (or two full physical packages for an SMP VM with two virtual CPUs), depending on the administrator-specified configuration. This feature differentiates ESX Server from commodity operating systems, which attempt to keep all logical CPUs busy, even if doing so hurts the progress of a high-priority thread. Expansion and contraction are preferably fully dynamic and transparent to the administrator.
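The text leaves the combination of shares with min and max to "any normal manner." One simple policy, assumed here purely for illustration and not necessarily the one ESX Server uses, is to grant each VM its min first, then divide the remaining package time in proportion to shares, never giving any VM more than its max; whatever cannot be handed out stays idle. The function name and the `(shares, min, max)` tuple layout are hypothetical.

```python
def allocate_package_time(vms, capacity):
    """Split `capacity` package-seconds among VMs (hypothetical policy).

    vms maps name -> (shares, min_frac, max_frac); min/max are fractions of
    one package. Assumes positive shares and that admission control already
    ensured the sum of all min values does not exceed capacity.
    """
    # 1. Honour every absolute minimum first.
    alloc = {name: mn for name, (_, mn, _) in vms.items()}
    remaining = capacity - sum(alloc.values())
    # 2. Water-fill the rest in proportion to shares, capping each VM at max.
    active = {name for name, (_, _, mx) in vms.items() if alloc[name] < mx}
    while remaining > 1e-12 and active:
        total_shares = sum(vms[name][0] for name in active)
        if total_shares <= 0:
            break
        handed_out = 0.0
        for name in list(active):
            shares, _, mx = vms[name]
            give = min(remaining * shares / total_shares, mx - alloc[name])
            alloc[name] += give
            handed_out += give
            if alloc[name] >= mx - 1e-12:
                active.discard(name)  # capped: no more time, even if idle
        remaining -= handed_out
    return alloc  # anything still in `remaining` is left idle

vms = {
    "db": (1000, 0.75, 0.90),   # the 75% / 90% example from the text
    "web": (1000, 0.0, 1.0),
    "batch": (2000, 0.0, 1.0),
}
print(allocate_package_time(vms, capacity=2.0))  # two packages of time
```

With two packages of capacity, `db` is held at its 90% max while `batch` receives twice as much as `web`, reflecting its doubled shares.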
- Another user choice made possible by the invention is that the user may specify not only a percentage of a package's time, but may also indicate to the scheduler 610, via the console 300 or otherwise (for example, through settings in an associated configuration file or other user-specified configuration state), that a particular thread is known to be anti-cooperative and should not be co-scheduled with other threads. The scheduler 610 then does not need to detect whether the indicated thread(s) must be isolated, since this has already been decided. In addition to saving the scheduler from having to make the quarantining decision, manual user control has an additional benefit: the user can take advantage of the invention to guard against attacks that either cannot be detected with the activity sensor or were not known at the time the scheduler was designed.
- While HT Technology can provide a useful performance boost for many workloads, it also increases the possibility of performance interference between two applications running simultaneously. For instance, as discussed earlier, an application with extremely poor cache performance may lead to performance problems for another application running on the same physical package.
- On a commodity operating system, when an application is observed to interact poorly with HT Technology, the administrator has little choice but to disable HT on the entire machine. ESX Server, however, using the invention, provides an additional level of control for administrators to manage package-sharing settings at the level of the individual VM. Users can select from three choices (called “HT-sharing” settings) for each VM: any sharing, no sharing, or internal sharing only. The default setting, “any,” allows the scheduler 610 to schedule virtual CPUs from the designated VM on the same package with any other virtual CPU. This allows the system to exploit HT Technology to its fullest, and it is the best choice for the majority of applications.
- The “internal” setting applies only to SMP VMs. It specifies that the two (or more) virtual CPUs (which form schedulable threads) from the VM in question can share a package together, but not with virtual CPUs from any other VM. This contains any HT performance issues within the designated VM, so it can neither affect the performance of other VMs nor be affected by them. ESX Server can still dedicate a full package to each virtual CPU in the VM, however, if resource constraints and the system activity load permit it. For applications that are quite sensitive to performance variations (such as streaming media servers), this setting may provide the best balance between HT utilization and performance isolation.
- Finally, the “no sharing” setting guarantees that each virtual CPU will always run on a full package, with the partner logical CPU halted. This setting can be chosen to maximize the VM's isolation, and it is particularly appropriate for virtual machines running applications that are known to perform poorly on multi-threaded systems. Note that the “no sharing” option causes the scheduler to implement time-slicing (coarse-grained interleaving) whereas the “internal” and “any” options both lead to fine-grained interleaving.
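The three HT-sharing settings reduce to a pairwise compatibility test that must hold from both sides before two virtual CPUs may occupy the same package. The following Python sketch encodes that rule under the assumption that each VM carries a single setting; `HTSharing`, `may_share_package` and `choose_partner` are hypothetical names, and a return value of `None` stands for halting the partner logical CPU.

```python
from enum import Enum

class HTSharing(Enum):
    ANY = "any"
    INTERNAL = "internal"
    NONE = "none"

def _permits(setting, same_vm):
    """Does this side's setting allow the pairing?"""
    if setting is HTSharing.NONE:
        return False          # always run on a full package
    if setting is HTSharing.INTERNAL:
        return same_vm        # only with sibling vCPUs of the same VM
    return True               # ANY

def may_share_package(vm_a, setting_a, vm_b, setting_b):
    """True if a vCPU of vm_a may run on the same package as a vCPU of vm_b."""
    same_vm = vm_a == vm_b
    return _permits(setting_a, same_vm) and _permits(setting_b, same_vm)

def choose_partner(running, candidates):
    """Pick a runnable (vm, setting) pair to run beside `running`, or return
    None so that the partner logical CPU is halted instead."""
    for cand in candidates:
        if may_share_package(*running, *cand):
            return cand
    return None

print(choose_partner(("smp", HTSharing.INTERNAL),
                     [("web", HTSharing.ANY), ("smp", HTSharing.INTERNAL)]))
```

Because both sides must agree, a VM set to "any" can still end up running alone when its only runnable neighbours are "internal" or "no sharing" VMs.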
- As mentioned above, Grunwald showed that a particularly malicious application could degrade the performance of another workload running on the same physical package by as much as 90%, for example through the use of self-modifying code in a tight loop. Although the inventors have not yet observed such an attack in the field, the invention as incorporated into ESX Server includes special optimizations to ensure that a rogue thread in one VM cannot severely degrade the performance of another VM: the scheduler 610 in the ESX Server kernel accesses low-level hardware counters to observe the frequency of events that may indicate potentially anti-cooperative behavior.
- If the number of harmful events observed in a given time period for a certain VM is too high, the system automatically “quarantines” that VM by placing it into the “no sharing” state (or, alternatively, the “internal” state), as described above. This setting protects other VMs from the potential denial-of-service attack, but does not excessively degrade performance for the misbehaving VM, since it loses only the added benefit of HT. If the degree of anti-cooperative behavior eventually drops below a specified threshold, the VM will be released from the quarantined state and allowed to run on a package along with other threads.
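The automatic quarantine can be sketched as a per-VM state machine driven by one counter reading per sampling period. Which hardware events count as "harmful" is not specified here, so the count is simply an input. The use of two thresholds (separate entry and release levels, giving hysteresis so that a VM hovering near a single limit does not flap in and out of quarantine) and their values are assumptions for illustration, as are all the names. The `user_isolated` flag models the manual designation described earlier, which overrides detection.

```python
from dataclasses import dataclass

# Hypothetical thresholds, in harmful events per sampling period.
QUARANTINE_ABOVE = 10_000  # enter quarantine above this count
RELEASE_BELOW = 2_000      # leave quarantine only below this lower count

@dataclass
class VMState:
    name: str
    sharing: str = "any"         # administrator's HT-sharing setting
    user_isolated: bool = False  # manually designated anti-cooperative
    quarantined: bool = False    # set automatically by update_quarantine

def update_quarantine(vm: VMState, harmful_events: int) -> str:
    """Fold one period's event count into vm's state and return the
    HT-sharing setting the scheduler should enforce for the next period."""
    if not vm.quarantined and harmful_events > QUARANTINE_ABOVE:
        vm.quarantined = True
    elif vm.quarantined and harmful_events < RELEASE_BELOW:
        vm.quarantined = False
    if vm.user_isolated or vm.quarantined:
        return "none"  # "internal" under the alternative policy in the text
    return vm.sharing

vm = VMState("rogue")
for count in (500, 50_000, 5_000, 1_000):
    print(count, update_quarantine(vm, count))
```

Feeding it 500, 50 000, 5 000 and 1 000 events yields any, none, none, any: the 5 000 reading is below the entry level but not below the release level, so the VM stays quarantined for that period.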
- Particularly for network-intensive workloads, context switches due to interrupts can be a major source of overhead. To address this problem, VMware ESX Server 2.1 has tightly integrated the interrupt-steering code with the HT-aware scheduler 610. ESX Server minimizes unnecessary context switches by preferentially directing interrupts to idling logical processors, which are already waiting in kernel mode, that is, are available to the kernel. Similarly, when the scheduler has to decide which logical processor of a package should begin running a thread, it preferentially chooses the logical processor with the lower interrupt load, which the scheduler 610 may determine using known techniques.
- In many systems, device drivers handle asynchronous events, such as interrupts or “bottom halves,” which are snippets of code (a form of thread) used to aid in the processing of interrupts. The manual/automatic quarantining approach according to the invention could also apply to interrupts, such that an interrupt is not handled by a logical CPU on the same package as a “no sharing” thread; furthermore, anti-cooperative device drivers could have their interrupts directed to processors that are not running time-critical threads.
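Both steering preferences, plus the interrupt-quarantining extension, fit in a few lines of Python. The `LogicalCPU` record, the per-CPU `interrupt_load` metric, and the fallback used when every package is isolated are hypothetical choices for this sketch, not details taken from ESX Server.

```python
from dataclasses import dataclass

@dataclass
class LogicalCPU:
    package: int
    index: int
    idle: bool             # halted, already waiting in kernel mode
    interrupt_load: float  # recent interrupt rate (hypothetical metric)

def steer_interrupt(cpus, isolated_packages=frozenset()):
    """Choose a logical CPU for an interrupt.

    Skips packages running a "no sharing" thread (isolated_packages), then
    prefers idle logical CPUs and, among equals, the lowest interrupt load.
    """
    candidates = [c for c in cpus if c.package not in isolated_packages]
    if not candidates:
        candidates = list(cpus)  # assumption: fall back rather than drop it
    # False sorts before True, so idle CPUs (not idle == False) win.
    return min(candidates, key=lambda c: (not c.idle, c.interrupt_load))

def pick_logical_cpu_for_thread(cpus, package):
    """Start a thread on the package's logical CPU with the lower interrupt load."""
    return min((c for c in cpus if c.package == package),
               key=lambda c: c.interrupt_load)

cpus = [
    LogicalCPU(0, 0, idle=False, interrupt_load=100.0),
    LogicalCPU(0, 1, idle=True, interrupt_load=300.0),
    LogicalCPU(1, 0, idle=False, interrupt_load=50.0),
    LogicalCPU(1, 1, idle=False, interrupt_load=10.0),
]
print(steer_interrupt(cpus))                           # the idle CPU (0, 1)
print(steer_interrupt(cpus, isolated_packages={0}))    # CPU (1, 1)
print(pick_logical_cpu_for_thread(cpus, package=1))    # CPU (1, 1)
```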
Claims (29)
1. A method for controlling execution of a plurality of threads by a processing system that has at least two processors in at least one functional processor group, in which threads coscheduled for execution on the processors share an internal processor group resource, the method comprising:
when at least a first and a second thread are coscheduled for execution on the processors of the functional processor group, sensing during run time the presence of a rescheduling condition indicating that either the first thread or the second thread is exhibiting anti-cooperative execution behavior towards the other coscheduled thread, wherein anti-cooperative execution behavior comprises any behavior that reduces or interferes with the ability of another coscheduled thread to use the shared internal processor group resource;
upon sensing the rescheduling condition, rescheduling at least one of the threads such that the first and second threads no longer execute in the same functional processor group at the same time.
2. A method as in claim 1, in which the anti-cooperative execution behavior is use by the first thread of the internal processor group resource causing a denial of use of the resource by the second thread above a minimum acceptable level.
3. A method as in claim 2, in which the anti-cooperative execution behavior is triggering more than a threshold number of cache flushes.
4. A method as in claim 2, in which the anti-cooperative execution behavior is triggering more than a threshold number of cache misses.
5. A method as in claim 2, in which the anti-cooperative execution behavior is triggering more than a threshold number of pipeline flushes.
6. A method as in claim 1, further comprising inputting user designation of the first thread as being un-coschedulable with the second thread, in which such user designation is the rescheduling condition.
7. A method as in claim 1, further comprising:
inputting at least one user-provided execution guarantee for a designated one of the threads, in which the rescheduling condition is violation of the guarantee; and
upon violation of the guarantee, rescheduling at least one of the coscheduled threads to ensure that the guarantee is met for the designated thread.
8. In a computer system with at least two functional processor groups each having at least two processors, the method of claim 1, further comprising
allowing continued execution of the second thread, and
rescheduling execution of the first thread on a processor in a different functional processor group,
whereby the first and second threads continue to execute simultaneously but in different functional processor groups.
9. A method as in claim 8, in which the first and second threads are initially scheduled in a first one of the processor groups and at least a third and a fourth thread are running in a second functional processor group, further comprising:
co-scheduling the second and third threads and co-scheduling the first and fourth threads such that the first, second, third and fourth threads are all running simultaneously but the first and second threads are running in different functional processor groups and the third and fourth threads are also running in different functional processor groups.
10. A method as in claim 1, in which each functional processor group is a multi-threaded processor and the processors are logical processors.
11. A method as in claim 1, in which each functional processor group is a multi-core processor arrangement and the at least two processors are partnered, physical processors.
12. A method as in claim 1, in which the threads originate in at least one virtual machine.
13. A method as in claim 12, in which the threads are virtual CPUs.
14. A method as in claim 13, in which the virtual CPUs themselves are virtualized logical processors within virtualized, functional processor groups.
15. A method as in claim 1, in which the step of rescheduling a thread comprises putting the processor on which it was running into a halted state.
16. A method as in claim 1, in which the step of rescheduling a thread comprises preempting the thread that is running on that processor and scheduling a different thread to run on that processor.
17. In a computer system in which the processors in the functional processor group support a hardware thread priority, the method as in claim 1, in which the step of rescheduling a thread comprises changing its priority relative to the priorities of coschedulable threads.
18. A method for controlling execution of a plurality of threads by a processing system that has at least two processors in at least one functional processor group, in which threads coscheduled for execution on the processors share an internal processor group resource, the method comprising:
when at least a first and a second thread are coscheduled for execution on the processors of the functional processor group, sensing during run time the presence of a rescheduling condition indicating that either the first thread or the second thread is exhibiting anti-cooperative execution behavior towards the other coscheduled thread, wherein anti-cooperative execution behavior comprises any behavior that reduces or interferes with the ability of another coscheduled thread to use the shared internal processor group resource;
upon sensing the rescheduling condition, rescheduling at least one of the threads such that the first and second threads no longer execute in the same functional processor group at the same time;
in which:
the anti-cooperative execution behavior is use by the first thread of the internal processor group resource causing a denial of use of the resource by the second thread above a minimum acceptable level;
each functional processor group is a multi-threaded processor and the processors are logical processors; and
the threads are virtual CPUs in a virtual machine.
19. A system for controlling execution of a plurality of threads by a processing system that has at least two processors in at least one functional processor group, in which threads coscheduled for execution on the processors share an internal processor group resource, the system comprising:
a scheduling module embodied in a computer readable storage medium comprising computer-executable code
for sensing, during run time, when at least a first and a second thread are coscheduled for execution on the processors of the functional processor group, the presence of a rescheduling condition indicating that either the first thread or the second thread is exhibiting anti-cooperative execution behavior towards the other coscheduled thread, wherein anti-cooperative execution behavior comprises any behavior that reduces or interferes with the ability of another coscheduled thread to use the shared internal processor group resource; and
upon sensing the rescheduling condition, for rescheduling at least one of the threads such that the first and second threads no longer execute in the same functional processor group at the same time.
20. A system as in claim 19, in which the anti-cooperative execution behavior is use by the first thread of the internal processor group resource causing a denial of use of the resource by the second thread above a minimum acceptable level.
21. A system as in claim 19, in which internal processor group resource is a cache and the anti-cooperative execution behavior is triggering more than a threshold number of cache flushes.
22. A system as in claim 20, in which internal processor group resource is a cache and the anti-cooperative execution behavior is triggering more than a threshold number of cache misses.
23. A system as in claim 20, in which internal processor group resource is a pipeline and the anti-cooperative execution behavior is triggering more than a threshold number of pipeline flushes.
24. A system as in claim 19, comprising at least two functional processor groups each having at least two processors, the scheduling module being further provided:
for allowing continued execution of the second thread, and
for rescheduling execution of the first thread on a processor in a different functional processor group,
whereby the first and second threads continue to execute simultaneously but in different functional processor groups.
25. A system as in claim 19, in which each functional processor group is a multi-threaded processor and the processors are logical processors.
26. A system as in claim 19, in which each functional processor group is a multi-core processor arrangement and the at least two processors are partnered, physical processors.
27. A system as in claim 19, further comprising at least one virtual machine, in which the threads originate.
28. A system as in claim 27, in which the threads are virtual CPUs.
29. A system as in claim 28, in which the virtual CPUs themselves are virtualized logical processors within virtualized, functional processor groups.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/767,662 US20100205602A1 (en) | 2004-12-16 | 2010-04-26 | Mechanism for Scheduling Execution of Threads for Fair Resource Allocation in a Multi-Threaded and/or Multi-Core Processing System |
US13/473,534 US10417048B2 (en) | 2004-12-16 | 2012-05-16 | Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/015,506 US7707578B1 (en) | 2004-12-16 | 2004-12-16 | Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system |
US12/767,662 US20100205602A1 (en) | 2004-12-16 | 2010-04-26 | Mechanism for Scheduling Execution of Threads for Fair Resource Allocation in a Multi-Threaded and/or Multi-Core Processing System |
Related Parent Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/015,506 Continuation-In-Part US7707578B1 (en) | 2004-12-16 | 2004-12-16 | Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system |
US11/015,506 Continuation US7707578B1 (en) | 2004-12-16 | 2004-12-16 | Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/473,534 Continuation US10417048B2 (en) | 2004-12-16 | 2012-05-16 | Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system |
Publications (1)
Publication Number | Publication Date |
---|---|
US20100205602A1 (en) | 2010-08-12 |
Family ID: 42112615
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/015,506 Active 2028-12-03 US7707578B1 (en) | 2004-12-16 | 2004-12-16 | Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system |
US12/767,662 Abandoned US20100205602A1 (en) | 2004-12-16 | 2010-04-26 | Mechanism for Scheduling Execution of Threads for Fair Resource Allocation in a Multi-Threaded and/or Multi-Core Processing System |
US13/473,534 Active 2030-07-06 US10417048B2 (en) | 2004-12-16 | 2012-05-16 | Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/015,506 Active 2028-12-03 US7707578B1 (en) | 2004-12-16 | 2004-12-16 | Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/473,534 Active 2030-07-06 US10417048B2 (en) | 2004-12-16 | 2012-05-16 | Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system |
Country Status (1)
Country | Link |
---|---|
US (3) | US7707578B1 (en) |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090077550A1 (en) * | 2007-09-13 | 2009-03-19 | Scott Rhine | Virtual machine schedular with memory access control |
US20110231855A1 (en) * | 2009-09-24 | 2011-09-22 | Fujitsu Limited | Apparatus and method for controlling priority |
US20120023492A1 (en) * | 2010-07-26 | 2012-01-26 | Microsoft Corporation | Workload interference estimation and performance optimization |
US8127301B1 (en) | 2007-02-16 | 2012-02-28 | Vmware, Inc. | Scheduling selected contexts in response to detecting skew between coscheduled contexts |
US8171488B1 (en) | 2007-02-16 | 2012-05-01 | Vmware, Inc. | Alternating scheduling and descheduling of coscheduled contexts |
US8176493B1 (en) * | 2007-02-16 | 2012-05-08 | Vmware, Inc. | Detecting and responding to skew between coscheduled contexts |
US8296767B1 (en) * | 2007-02-16 | 2012-10-23 | Vmware, Inc. | Defining and measuring skew between coscheduled contexts |
US20130018507A1 (en) * | 2011-07-13 | 2013-01-17 | Kuka Roboter Gmbh | Control System Of A Robot |
US20140129716A1 (en) * | 2012-11-07 | 2014-05-08 | International Business Machines Corporation | Mobility operation resource allocation |
US8752058B1 (en) | 2010-05-11 | 2014-06-10 | Vmware, Inc. | Implicit co-scheduling of CPUs |
US20150012634A1 (en) * | 2012-01-13 | 2015-01-08 | Accenture Global Services Limited | Performance Interference Model for Managing Consolidated Workloads In Qos-Aware Clouds |
US8935699B1 (en) | 2011-10-28 | 2015-01-13 | Amazon Technologies, Inc. | CPU sharing techniques |
US8990802B1 (en) * | 2010-05-24 | 2015-03-24 | Thinking Software, Inc. | Pinball virtual machine (PVM) implementing computing process within a structural space using PVM atoms and PVM atomic threads |
US9104485B1 (en) * | 2011-10-28 | 2015-08-11 | Amazon Technologies, Inc. | CPU sharing techniques |
US20150268942A1 (en) * | 2014-03-18 | 2015-09-24 | International Business Machines Corporation | Controlling execution of binary code |
US9158588B2 (en) | 2012-01-19 | 2015-10-13 | International Business Machines Corporation | Flexible task and thread binding with preferred processors based on thread layout |
US9195805B1 (en) * | 2011-12-08 | 2015-11-24 | Amazon Technologies, Inc. | Adaptive responses to trickle-type denial of service attacks |
US9268542B1 (en) * | 2011-04-28 | 2016-02-23 | Google Inc. | Cache contention management on a multicore processor based on the degree of contention exceeding a threshold |
JP2016165912A (en) * | 2015-03-09 | 2016-09-15 | 株式会社デンソー | Electronic control device |
US9578351B1 (en) | 2015-08-28 | 2017-02-21 | Accenture Global Services Limited | Generating visualizations for display along with video content |
US9940739B2 (en) | 2015-08-28 | 2018-04-10 | Accenture Global Services Limited | Generating interactively mapped data visualizations |
US10061615B2 (en) * | 2012-06-08 | 2018-08-28 | Throughputer, Inc. | Application load adaptive multi-stage parallel data processing architecture |
US10318353B2 (en) | 2011-07-15 | 2019-06-11 | Mark Henrik Sandstrom | Concurrent program execution optimization |
US20200174838A1 (en) * | 2018-11-29 | 2020-06-04 | International Business Machines Corporation | Utilizing accelerators to accelerate data analytic workloads in disaggregated systems |
US10996990B2 (en) | 2018-11-15 | 2021-05-04 | International Business Machines Corporation | Interrupt context switching using dedicated processors |
Families Citing this family (103)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070050777A1 (en) * | 2003-06-09 | 2007-03-01 | Hutchinson Thomas W | Duration of alerts and scanning of large data stores |
US9219917B2 (en) * | 2005-01-19 | 2015-12-22 | Thomson Licensing | Method and apparatus for real time parallel encoding |
US7870406B2 (en) * | 2005-02-03 | 2011-01-11 | International Business Machines Corporation | Method and apparatus for frequency independent processor utilization recording register in a simultaneously multi-threaded processor |
US7631308B2 (en) * | 2005-02-11 | 2009-12-08 | International Business Machines Corporation | Thread priority method for ensuring processing fairness in simultaneous multi-threading microprocessors |
US20070204268A1 (en) * | 2006-02-27 | 2007-08-30 | Red. Hat, Inc. | Methods and systems for scheduling processes in a multi-core processor environment |
US8327115B2 (en) | 2006-04-12 | 2012-12-04 | Soft Machines, Inc. | Plural matrices of execution units for processing matrices of row dependent instructions in single clock cycle in super or separate mode |
US20070294693A1 (en) * | 2006-06-16 | 2007-12-20 | Microsoft Corporation | Scheduling thread execution among a plurality of processors based on evaluation of memory access data |
US8069444B2 (en) * | 2006-08-29 | 2011-11-29 | Oracle America, Inc. | Method and apparatus for achieving fair cache sharing on multi-threaded chip multiprocessors |
EP2523101B1 (en) | 2006-11-14 | 2014-06-04 | Soft Machines, Inc. | Apparatus and method for processing complex instruction formats in a multi- threaded architecture supporting various context switch modes and virtualization schemes |
US8028286B2 (en) * | 2006-11-30 | 2011-09-27 | Oracle America, Inc. | Methods and apparatus for scheduling threads on multicore processors under fair distribution of cache and other shared resources of the processors |
JP4705051B2 (en) | 2007-01-29 | 2011-06-22 | 株式会社日立製作所 | Computer system |
US8286170B2 (en) * | 2007-01-31 | 2012-10-09 | International Business Machines Corporation | System and method for processor thread allocation using delay-costs |
TW200835319A (en) * | 2007-02-07 | 2008-08-16 | Lite On Technology Corp | Method for processing frames of digital broadcast signals and system thereof |
US8739162B2 (en) * | 2007-04-27 | 2014-05-27 | Hewlett-Packard Development Company, L.P. | Accurate measurement of multithreaded processor core utilization and logical processor utilization |
US20080271027A1 (en) * | 2007-04-27 | 2008-10-30 | Norton Scott J | Fair share scheduling with hardware multithreading |
GB2449455B (en) * | 2007-05-22 | 2011-08-03 | Advanced Risc Mach Ltd | A data processing apparatus and method for managing multiple program threads executed by processing circuitry |
US8813080B2 (en) * | 2007-06-28 | 2014-08-19 | Intel Corporation | System and method to optimize OS scheduling decisions for power savings based on temporal characteristics of the scheduled entity and system workload |
US8185907B2 (en) * | 2007-08-20 | 2012-05-22 | International Business Machines Corporation | Method and system for assigning logical partitions to multiple shared processor pools |
US20090070762A1 (en) * | 2007-09-06 | 2009-03-12 | Franaszek Peter A | System and method for event-driven scheduling of computing jobs on a multi-threaded machine using delay-costs |
US8136153B2 (en) * | 2007-11-08 | 2012-03-13 | Samsung Electronics Co., Ltd. | Securing CPU affinity in multiprocessor architectures |
US20090133029A1 (en) * | 2007-11-12 | 2009-05-21 | Srinidhi Varadarajan | Methods and systems for transparent stateful preemption of software system |
US9063778B2 (en) * | 2008-01-09 | 2015-06-23 | Microsoft Technology Licensing, Llc | Fair stateless model checking |
US8191073B2 (en) * | 2008-03-04 | 2012-05-29 | Fortinet, Inc. | Method and system for polling network controllers |
US8245229B2 (en) * | 2008-09-30 | 2012-08-14 | Microsoft Corporation | Temporal batching of I/O jobs |
US8346995B2 (en) | 2008-09-30 | 2013-01-01 | Microsoft Corporation | Balancing usage of hardware devices among clients |
WO2010038307A1 (en) * | 2008-10-03 | 2010-04-08 | 富士通株式会社 | Virtual computer system test method, test program, its recording medium, and virtual computer system |
US9244732B2 (en) * | 2009-08-28 | 2016-01-26 | Vmware, Inc. | Compensating threads for microarchitectural resource contentions by prioritizing scheduling and execution |
US8429665B2 (en) * | 2010-03-19 | 2013-04-23 | Vmware, Inc. | Cache performance prediction, partitioning and scheduling based on cache pressure of threads |
US9405931B2 (en) | 2008-11-14 | 2016-08-02 | Dell Products L.P. | Protected information stream allocation using a virtualized platform |
US9396021B2 (en) * | 2008-12-16 | 2016-07-19 | International Business Machines Corporation | Techniques for dynamically assigning jobs to processors in a cluster using local job tables |
US9384042B2 (en) * | 2008-12-16 | 2016-07-05 | International Business Machines Corporation | Techniques for dynamically assigning jobs to processors in a cluster based on inter-thread communications |
US8239524B2 (en) * | 2008-12-16 | 2012-08-07 | International Business Machines Corporation | Techniques for dynamically assigning jobs to processors in a cluster based on processor workload |
US8959517B2 (en) * | 2009-06-10 | 2015-02-17 | Microsoft Corporation | Cancellation mechanism for cancellable tasks including stolen task and descendent of stolen tasks from the cancellable taskgroup |
US8656396B2 (en) * | 2009-08-11 | 2014-02-18 | International Business Machines Corporation | Performance optimization based on threshold performance measure by resuming suspended threads if present or by creating threads within elastic and data parallel operators |
US8448027B2 (en) | 2010-05-27 | 2013-05-21 | International Business Machines Corporation | Energy-efficient failure detection and masking |
KR101685247B1 (en) | 2010-09-17 | 2016-12-09 | Soft Machines, Inc. | Single cycle multi-branch prediction including shadow cache for early far branch prediction |
KR20120083801A (en) * | 2011-01-18 | 2012-07-26 | Samsung Electronics Co., Ltd. | Apparatus and method of pre-processing multimedia data for virtual machine |
EP2689327B1 (en) | 2011-03-25 | 2021-07-28 | Intel Corporation | Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines |
CN108376097B (en) | 2011-03-25 | 2022-04-15 | Intel Corporation | Register file segments for supporting code block execution by using virtual cores instantiated by partitionable engines |
KR101966712B1 (en) | 2011-03-25 | 2019-04-09 | Intel Corporation | Memory fragments for supporting code block execution by using virtual cores instantiated by partitionable engines |
RU2011117765A (en) * | 2011-05-05 | 2012-11-10 | LSI Corporation (US) | Device (variants) and method for implementing a two-pass, linear-complexity task scheduler |
US9237127B2 (en) * | 2011-05-12 | 2016-01-12 | Airmagnet, Inc. | Method and apparatus for dynamic host operating system firewall configuration |
US9183047B2 (en) * | 2011-05-13 | 2015-11-10 | Samsung Electronics Co., Ltd. | Classifying requested application based on processing and response time and scheduling threads of the requested application according to a preset group |
KR101639853B1 (en) | 2011-05-20 | 2016-07-14 | Soft Machines, Inc. | Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines |
CN103649931B (en) | 2011-05-20 | 2016-10-12 | Soft Machines, Inc. | Interconnect structure for supporting execution of instruction sequences by multiple engines |
WO2013077876A1 (en) | 2011-11-22 | 2013-05-30 | Soft Machines, Inc. | A microprocessor accelerated code optimizer |
KR101703401B1 (en) | 2011-11-22 | 2017-02-06 | 소프트 머신즈, 인크. | An accelerated code optimizer for a multiengine microprocessor |
US8850450B2 (en) | 2012-01-18 | 2014-09-30 | International Business Machines Corporation | Warning track interruption facility |
US9104508B2 (en) | 2012-01-18 | 2015-08-11 | International Business Machines Corporation | Providing by one program to another program access to a warning track facility |
US9110878B2 (en) | 2012-01-18 | 2015-08-18 | International Business Machines Corporation | Use of a warning track interruption facility by a program |
EP2833264B1 (en) * | 2012-03-29 | 2020-06-24 | Hitachi, Ltd. | Virtual computer schedule method |
US20130332778A1 (en) * | 2012-06-07 | 2013-12-12 | Vmware, Inc. | Performance-imbalance-monitoring processor features |
US9075789B2 (en) * | 2012-12-11 | 2015-07-07 | General Dynamics C4 Systems, Inc. | Methods and apparatus for interleaving priorities of a plurality of virtual processors |
US9201681B2 (en) * | 2013-02-13 | 2015-12-01 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Method and controller device for quality of service (QOS) caching in a virtualized environment |
US9904625B2 (en) | 2013-03-15 | 2018-02-27 | Intel Corporation | Methods, systems and apparatus for predicting the way of a set associative cache |
US9891924B2 (en) | 2013-03-15 | 2018-02-13 | Intel Corporation | Method for implementing a reduced size register view data structure in a microprocessor |
US9632825B2 (en) | 2013-03-15 | 2017-04-25 | Intel Corporation | Method and apparatus for efficient scheduling for asymmetrical execution units |
WO2014150806A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for populating register view data structure by using register template snapshots |
WO2014150971A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for dependency broadcasting through a block organized source view data structure |
US10140138B2 (en) | 2013-03-15 | 2018-11-27 | Intel Corporation | Methods, systems and apparatus for supporting wide and efficient front-end operation with guest-architecture emulation |
US9569216B2 (en) | 2013-03-15 | 2017-02-14 | Soft Machines, Inc. | Method for populating a source view data structure by using register template snapshots |
US9886279B2 (en) | 2013-03-15 | 2018-02-06 | Intel Corporation | Method for populating and instruction view data structure by using register template snapshots |
WO2014151018A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for executing multithreaded instructions grouped onto blocks |
WO2014150991A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for implementing a reduced size register view data structure in a microprocessor |
US9811342B2 (en) | 2013-03-15 | 2017-11-07 | Intel Corporation | Method for performing dual dispatch of blocks and half blocks |
US10275255B2 (en) | 2013-03-15 | 2019-04-30 | Intel Corporation | Method for dependency broadcasting through a source organized source view data structure |
WO2014151043A1 (en) | 2013-03-15 | 2014-09-25 | Soft Machines, Inc. | A method for emulating a guest centralized flag architecture by using a native distributed flag architecture |
US9106391B2 (en) | 2013-05-28 | 2015-08-11 | International Business Machines Corporation | Elastic auto-parallelization for stream processing applications based on a measured throughput and congestion |
US9367472B2 (en) | 2013-06-10 | 2016-06-14 | Oracle International Corporation | Observation of data in persistent memory |
US20150052614A1 (en) * | 2013-08-19 | 2015-02-19 | International Business Machines Corporation | Virtual machine trust isolation in a cloud environment |
US9727361B2 (en) | 2013-12-12 | 2017-08-08 | International Business Machines Corporation | Closed-loop feedback mechanism for achieving optimum performance in a consolidated workload environment |
US9589311B2 (en) * | 2013-12-18 | 2017-03-07 | Intel Corporation | Independent thread saturation of graphics processing units |
US9804846B2 (en) | 2014-03-27 | 2017-10-31 | International Business Machines Corporation | Thread context preservation in a multithreading computer system |
US9354883B2 (en) | 2014-03-27 | 2016-05-31 | International Business Machines Corporation | Dynamic enablement of multithreading |
US9594660B2 (en) | 2014-03-27 | 2017-03-14 | International Business Machines Corporation | Multithreading computer system and program product for executing a query instruction for idle time accumulation among cores |
US9417876B2 (en) | 2014-03-27 | 2016-08-16 | International Business Machines Corporation | Thread context restoration in a multithreading computer system |
US9218185B2 (en) | 2014-03-27 | 2015-12-22 | International Business Machines Corporation | Multithreading capability information retrieval |
US9921848B2 (en) | 2014-03-27 | 2018-03-20 | International Business Machines Corporation | Address expansion and contraction in a multithreading computer system |
US10102004B2 (en) | 2014-03-27 | 2018-10-16 | International Business Machines Corporation | Hardware counters to track utilization in a multithreading computer system |
US9417927B2 (en) | 2014-04-01 | 2016-08-16 | International Business Machines Corporation | Runtime capacity planning in a simultaneous multithreading (SMT) environment |
US9361159B2 (en) | 2014-04-01 | 2016-06-07 | International Business Machines Corporation | Runtime chargeback in a simultaneous multithreading (SMT) environment |
US10642663B2 (en) | 2014-09-10 | 2020-05-05 | Oracle International Corporation | Coordinated garbage collection in distributed systems |
JP6189553B2 (en) | 2014-11-28 | 2017-08-30 | Hitachi, Ltd. | Virtual computer system control method and virtual computer system |
FR3031203B1 (en) * | 2014-12-24 | 2017-03-24 | Bull Sas | Method for scheduling tasks at the node level of a computer cluster, and associated task scheduler and cluster |
WO2016122503A1 (en) | 2015-01-28 | 2016-08-04 | Hewlett-Packard Development Company, L.P. | Collecting hardware performance data |
US10133602B2 (en) | 2015-02-19 | 2018-11-20 | Oracle International Corporation | Adaptive contention-aware thread placement for parallel runtime systems |
US9760404B2 (en) | 2015-09-01 | 2017-09-12 | Intel Corporation | Dynamic tuning of multiprocessor/multicore computing systems |
WO2017095388A1 (en) * | 2015-11-30 | 2017-06-08 | Hewlett-Packard Enterprise Development LP | Managing an isolation context |
US9753760B2 (en) | 2015-12-17 | 2017-09-05 | International Business Machines Corporation | Prioritization of low active thread count virtual machines in virtualized computing environment |
US10372493B2 (en) * | 2015-12-22 | 2019-08-06 | Intel Corporation | Thread and/or virtual machine scheduling for cores with diverse capabilities |
US10216547B2 (en) * | 2016-11-22 | 2019-02-26 | International Business Machines Corporation | Hyper-threaded processor allocation to nodes in multi-tenant distributed software systems |
US10956193B2 (en) * | 2017-03-31 | 2021-03-23 | Microsoft Technology Licensing, Llc | Hypervisor virtual processor execution with extra-hypervisor scheduling |
US10831500B2 (en) | 2018-06-10 | 2020-11-10 | International Business Machines Corporation | Adaptive locking in elastic threading systems |
US11059435B2 (en) * | 2018-09-17 | 2021-07-13 | Drimaes, Inc. | Vehicle software control device |
US11023273B2 (en) | 2019-03-21 | 2021-06-01 | International Business Machines Corporation | Multi-threaded programming |
US11422849B2 (en) * | 2019-08-22 | 2022-08-23 | Intel Corporation | Technology for dynamically grouping threads for energy efficiency |
US11301305B2 (en) | 2020-01-07 | 2022-04-12 | Bank Of America Corporation | Dynamic resource clustering architecture |
US11334393B2 (en) | 2020-01-07 | 2022-05-17 | Bank Of America Corporation | Resource cluster chaining architecture |
US10938742B1 (en) | 2020-01-31 | 2021-03-02 | Bank Of America Corporation | Multiplexed resource allocation architecture |
US11726816B2 (en) * | 2020-07-30 | 2023-08-15 | Vmware, Inc. | Scheduling workloads on a common set of resources by multiple schedulers operating independently |
CN111708613B (en) * | 2020-08-18 | 2020-12-11 | Guangdong Ruijiang Cloud Computing Co., Ltd. | Method and system for repairing boot failure card task of VM virtual machine |
US20240004680A1 (en) * | 2022-06-29 | 2024-01-04 | Microsoft Technology Licensing, Llc | CPU Core Off-parking |
US20240111563A1 (en) * | 2022-09-30 | 2024-04-04 | Advanced Micro Devices, Inc. | Security for simultaneous multithreading processors |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040054999A1 (en) * | 2002-08-30 | 2004-03-18 | Willen James W. | Computer OS dispatcher operation with virtual switching queue and IP queues |
US20050125795A1 (en) * | 2003-08-28 | 2005-06-09 | Mips Technologies, Inc. | Integrated mechanism for suspension and deallocation of computational threads of execution in a processor |
US7448037B2 (en) * | 2004-01-13 | 2008-11-04 | International Business Machines Corporation | Method and data processing system having dynamic profile-directed feedback at runtime |
US7475399B2 (en) * | 2004-01-13 | 2009-01-06 | International Business Machines Corporation | Method and data processing system optimizing performance through reporting of thread-level hardware resource utilization |
Family Cites Families (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE3789215T2 (en) * | 1986-12-22 | 1994-06-01 | American Telephone & Telegraph | Controlled dynamic load balancing for a multiprocessor system. |
US5692193A (en) * | 1994-03-31 | 1997-11-25 | Nec Research Institute, Inc. | Software architecture for control of highly parallel computer systems |
US6633897B1 (en) * | 1995-06-30 | 2003-10-14 | International Business Machines Corporation | Method and system for scheduling threads within a multiprocessor data processing system using an affinity scheduler |
US5872972A (en) * | 1996-07-05 | 1999-02-16 | Ncr Corporation | Method for load balancing a per processor affinity scheduler wherein processes are strictly affinitized to processors and the migration of a process from an affinitized processor to another available processor is limited |
US6785803B1 (en) * | 1996-11-13 | 2004-08-31 | Intel Corporation | Processor including replay queue to break livelocks |
US6317774B1 (en) * | 1997-01-09 | 2001-11-13 | Microsoft Corporation | Providing predictable scheduling of programs using a repeating precomputed schedule |
US6269391B1 (en) * | 1997-02-24 | 2001-07-31 | Novell, Inc. | Multi-processor scheduling kernel |
US6549930B1 (en) * | 1997-11-26 | 2003-04-15 | Compaq Computer Corporation | Method for scheduling threads in a multithreaded processor |
JP2002041304A (en) * | 2000-07-28 | 2002-02-08 | Hitachi Ltd | Method for automatically assigning backup resources to logical partitions, and logically partitioned computer system |
JP2002202959A (en) * | 2000-12-28 | 2002-07-19 | Hitachi Ltd | Virtual computer system for performing dynamic resource distribution |
US6996822B1 (en) * | 2001-08-01 | 2006-02-07 | Unisys Corporation | Hierarchical affinity dispatcher for task management in a multiprocessor computer system |
US7191440B2 (en) * | 2001-08-15 | 2007-03-13 | Intel Corporation | Tracking operating system process and thread execution and virtual machine execution in hardware or in a virtual machine monitor |
US7412492B1 (en) * | 2001-09-12 | 2008-08-12 | Vmware, Inc. | Proportional share resource allocation with reduction of unproductive resource consumption |
US20060218556A1 (en) * | 2001-09-28 | 2006-09-28 | Nemirovsky Mario D | Mechanism for managing resource locking in a multi-threaded environment |
US7389506B1 (en) * | 2002-07-30 | 2008-06-17 | Unisys Corporation | Selecting processor configuration based on thread usage in a multiprocessor system |
US8776050B2 (en) * | 2003-08-20 | 2014-07-08 | Oracle International Corporation | Distributed virtual machine monitor for managing multiple virtual resources across multiple physical nodes |
US7404067B2 (en) * | 2003-09-08 | 2008-07-22 | Intel Corporation | Method and apparatus for efficient utilization for prescient instruction prefetch |
US7614056B1 (en) * | 2003-09-12 | 2009-11-03 | Sun Microsystems, Inc. | Processor specific dispatching in a heterogeneous configuration |
US7441242B2 (en) * | 2004-04-22 | 2008-10-21 | International Business Machines Corporation | Monitoring performance of a logically-partitioned computer |
US20060048160A1 (en) * | 2004-09-02 | 2006-03-02 | International Business Machines Corporation | Method, apparatus, and computer program product for providing a self-tunable parameter used for dynamically yielding an idle processor |
US20060136919A1 (en) * | 2004-12-17 | 2006-06-22 | Sun Microsystems, Inc. | System and method for controlling thread suspension in a multithreaded processor |
US8756605B2 (en) * | 2004-12-17 | 2014-06-17 | Oracle America, Inc. | Method and apparatus for scheduling multiple threads for execution in a shared microprocessor pipeline |
US7343476B2 (en) * | 2005-02-10 | 2008-03-11 | International Business Machines Corporation | Intelligent SMT thread hang detect taking into account shared resource contention/blocking |
US8028286B2 (en) * | 2006-11-30 | 2011-09-27 | Oracle America, Inc. | Methods and apparatus for scheduling threads on multicore processors under fair distribution of cache and other shared resources of the processors |
- 2004-12-16: US application US11/015,506, published as US7707578B1 (Active)
- 2010-04-26: US application US12/767,662, published as US20100205602A1 (Not active: Abandoned)
- 2012-05-16: US application US13/473,534, published as US10417048B2 (Active)
Cited By (67)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8296767B1 (en) * | 2007-02-16 | 2012-10-23 | Vmware, Inc. | Defining and measuring skew between coscheduled contexts |
US8127301B1 (en) | 2007-02-16 | 2012-02-28 | Vmware, Inc. | Scheduling selected contexts in response to detecting skew between coscheduled contexts |
US8171488B1 (en) | 2007-02-16 | 2012-05-01 | Vmware, Inc. | Alternating scheduling and descheduling of coscheduled contexts |
US8176493B1 (en) * | 2007-02-16 | 2012-05-08 | Vmware, Inc. | Detecting and responding to skew between coscheduled contexts |
US20090077550A1 (en) * | 2007-09-13 | 2009-03-19 | Scott Rhine | Virtual machine schedular with memory access control |
US20110231855A1 (en) * | 2009-09-24 | 2011-09-22 | Fujitsu Limited | Apparatus and method for controlling priority |
US8752058B1 (en) | 2010-05-11 | 2014-06-10 | Vmware, Inc. | Implicit co-scheduling of CPUs |
US9632808B2 (en) | 2010-05-11 | 2017-04-25 | Vmware, Inc. | Implicit co-scheduling of CPUs |
US10572282B2 (en) | 2010-05-11 | 2020-02-25 | Vmware, Inc. | Implicit co-scheduling of CPUs |
US8990802B1 (en) * | 2010-05-24 | 2015-03-24 | Thinking Software, Inc. | Pinball virtual machine (PVM) implementing computing process within a structural space using PVM atoms and PVM atomic threads |
US8707300B2 (en) * | 2010-07-26 | 2014-04-22 | Microsoft Corporation | Workload interference estimation and performance optimization |
US10255113B2 (en) | 2010-07-26 | 2019-04-09 | Microsoft Technology Licensing, Llc | Workload interference estimation and performance optimization |
US20120023492A1 (en) * | 2010-07-26 | 2012-01-26 | Microsoft Corporation | Workload interference estimation and performance optimization |
US9268542B1 (en) * | 2011-04-28 | 2016-02-23 | Google Inc. | Cache contention management on a multicore processor based on the degree of contention exceeding a threshold |
US20130018507A1 (en) * | 2011-07-13 | 2013-01-17 | Kuka Roboter Gmbh | Control System Of A Robot |
US9114528B2 (en) * | 2011-07-13 | 2015-08-25 | Kuka Roboter Gmbh | Control system of a robot |
US10514953B2 (en) | 2011-07-15 | 2019-12-24 | Throughputer, Inc. | Systems and methods for managing resource allocation and concurrent program execution on an array of processor cores |
US10318353B2 (en) | 2011-07-15 | 2019-06-11 | Mark Henrik Sandstrom | Concurrent program execution optimization |
US8935699B1 (en) | 2011-10-28 | 2015-01-13 | Amazon Technologies, Inc. | CPU sharing techniques |
US9104485B1 (en) * | 2011-10-28 | 2015-08-11 | Amazon Technologies, Inc. | CPU sharing techniques |
US10620998B2 (en) | 2011-11-04 | 2020-04-14 | Throughputer, Inc. | Task switching and inter-task communications for coordination of applications executing on a multi-user parallel processing architecture |
US10963306B2 (en) | 2011-11-04 | 2021-03-30 | Throughputer, Inc. | Managing resource sharing in a multi-core data processing fabric |
US10437644B2 (en) | 2011-11-04 | 2019-10-08 | Throughputer, Inc. | Task switching and inter-task communications for coordination of applications executing on a multi-user parallel processing architecture |
US10430242B2 (en) | 2011-11-04 | 2019-10-01 | Throughputer, Inc. | Task switching and inter-task communications for coordination of applications executing on a multi-user parallel processing architecture |
US20210303354A1 (en) | 2011-11-04 | 2021-09-30 | Throughputer, Inc. | Managing resource sharing in a multi-core data processing fabric |
US10789099B1 (en) | 2011-11-04 | 2020-09-29 | Throughputer, Inc. | Task switching and inter-task communications for coordination of applications executing on a multi-user parallel processing architecture |
US10310901B2 (en) | 2011-11-04 | 2019-06-04 | Mark Henrik Sandstrom | System and method for input data load adaptive parallel processing |
US10310902B2 (en) | 2011-11-04 | 2019-06-04 | Mark Henrik Sandstrom | System and method for input data load adaptive parallel processing |
US11150948B1 (en) | 2011-11-04 | 2021-10-19 | Throughputer, Inc. | Managing programmable logic-based processing unit allocation on a parallel data processing platform |
US11928508B2 (en) | 2011-11-04 | 2024-03-12 | Throughputer, Inc. | Responding to application demand in a system that uses programmable logic components |
US9195805B1 (en) * | 2011-12-08 | 2015-11-24 | Amazon Technologies, Inc. | Adaptive responses to trickle-type denial of service attacks |
US9588816B2 (en) | 2012-01-13 | 2017-03-07 | Accenture Global Services Limited | Performance interference model for managing consolidated workloads in QOS-aware clouds |
US9344380B2 (en) | 2012-01-13 | 2016-05-17 | Accenture Global Services Limited | Performance interference model for managing consolidated workloads in QoS-aware clouds |
US20150012634A1 (en) * | 2012-01-13 | 2015-01-08 | Accenture Global Services Limited | Performance Interference Model for Managing Consolidated Workloads In Qos-Aware Clouds |
US9026662B2 (en) * | 2012-01-13 | 2015-05-05 | Accenture Global Services Limited | Performance interference model for managing consolidated workloads in QoS-aware clouds |
US9158588B2 (en) | 2012-01-19 | 2015-10-13 | International Business Machines Corporation | Flexible task and thread binding with preferred processors based on thread layout |
US9158587B2 (en) | 2012-01-19 | 2015-10-13 | International Business Machines Corporation | Flexible task and thread binding with preferred processors based on thread layout |
USRE47945E1 (en) | 2012-06-08 | 2020-04-14 | Throughputer, Inc. | Application load adaptive multi-stage parallel data processing architecture |
US10061615B2 (en) * | 2012-06-08 | 2018-08-28 | Throughputer, Inc. | Application load adaptive multi-stage parallel data processing architecture |
USRE47677E1 (en) | 2012-06-08 | 2019-10-29 | Throughputer, Inc. | Prioritizing instances of programs for execution based on input data availability |
US9166865B2 (en) * | 2012-11-07 | 2015-10-20 | International Business Machines Corporation | Mobility operation resource allocation |
US20140129958A1 (en) * | 2012-11-07 | 2014-05-08 | International Business Machines Corporation | Mobility operation resource allocation |
CN103810036A (en) * | 2012-11-07 | 2014-05-21 | International Business Machines Corporation | Mobility operation resource allocation |
US11237856B2 (en) | 2012-11-07 | 2022-02-01 | International Business Machines Corporation | Mobility operation resource allocation |
US20140129716A1 (en) * | 2012-11-07 | 2014-05-08 | International Business Machines Corporation | Mobility operation resource allocation |
US10942778B2 (en) | 2012-11-23 | 2021-03-09 | Throughputer, Inc. | Concurrent program execution optimization |
US11816505B2 (en) | 2013-08-23 | 2023-11-14 | Throughputer, Inc. | Configurable logic platform with reconfigurable processing circuitry |
US11347556B2 (en) | 2013-08-23 | 2022-05-31 | Throughputer, Inc. | Configurable logic platform with reconfigurable processing circuitry |
US11188388B2 (en) | 2013-08-23 | 2021-11-30 | Throughputer, Inc. | Concurrent program execution optimization |
US11687374B2 (en) | 2013-08-23 | 2023-06-27 | Throughputer, Inc. | Configurable logic platform with reconfigurable processing circuitry |
US11915055B2 (en) | 2013-08-23 | 2024-02-27 | Throughputer, Inc. | Configurable logic platform with reconfigurable processing circuitry |
US11500682B1 (en) | 2013-08-23 | 2022-11-15 | Throughputer, Inc. | Configurable logic platform with reconfigurable processing circuitry |
US11385934B2 (en) | 2013-08-23 | 2022-07-12 | Throughputer, Inc. | Configurable logic platform with reconfigurable processing circuitry |
US11036556B1 (en) | 2013-08-23 | 2021-06-15 | Throughputer, Inc. | Concurrent program execution optimization |
US20160179492A1 (en) * | 2014-03-18 | 2016-06-23 | International Business Machines Corporation | Controlling execution of binary code |
US20150268942A1 (en) * | 2014-03-18 | 2015-09-24 | International Business Machines Corporation | Controlling execution of binary code |
US20150293754A1 (en) * | 2014-03-18 | 2015-10-15 | International Business Machines Corporation | Controlling execution of binary code |
US9626169B2 (en) * | 2014-03-18 | 2017-04-18 | International Business Machines Corporation | Controlling execution of binary code |
US9430205B2 (en) * | 2014-03-18 | 2016-08-30 | International Business Machines Corporation | Controlling execution of binary code |
US10241768B2 (en) * | 2014-03-18 | 2019-03-26 | International Business Machines Corporation | Controlling execution of binary code |
US9760357B2 (en) * | 2014-03-18 | 2017-09-12 | International Business Machines Corporation | Controlling execution of binary code |
JP2016165912A (en) * | 2015-03-09 | 2016-09-15 | Denso Corporation | Electronic control device |
US9940739B2 (en) | 2015-08-28 | 2018-04-10 | Accenture Global Services Limited | Generating interactively mapped data visualizations |
US9578351B1 (en) | 2015-08-28 | 2017-02-21 | Accenture Global Services Limited | Generating visualizations for display along with video content |
US10996990B2 (en) | 2018-11-15 | 2021-05-04 | International Business Machines Corporation | Interrupt context switching using dedicated processors |
US20200174838A1 (en) * | 2018-11-29 | 2020-06-04 | International Business Machines Corporation | Utilizing accelerators to accelerate data analytic workloads in disaggregated systems |
US11275622B2 (en) * | 2018-11-29 | 2022-03-15 | International Business Machines Corporation | Utilizing accelerators to accelerate data analytic workloads in disaggregated systems |
Also Published As
Publication number | Publication date |
---|---|
US7707578B1 (en) | 2010-04-27 |
US10417048B2 (en) | 2019-09-17 |
US20120227042A1 (en) | 2012-09-06 |
Similar Documents
Publication | Title |
---|---|
US7707578B1 | Mechanism for scheduling execution of threads for fair resource allocation in a multi-threaded and/or multi-core processing system |
US7765543B1 | Selective descheduling of idling guests running on a host computer system |
US8095929B1 | Method and system for determining a cost-benefit metric for potential virtual machine migrations |
EP3039540B1 | Virtual machine monitor configured to support latency sensitive virtual machines |
Kim et al. | A coordinated approach for practical OS-level cache management in multi-core real-time systems |
US7945908B1 | Method and system for improving the accuracy of timing and process accounting within virtual machines |
US8667500B1 | Use of dynamic entitlement and adaptive threshold for cluster process balancing |
Lackorzyński et al. | Flattening hierarchical scheduling |
EP2191369B1 | Reducing the latency of virtual interrupt delivery in virtual machines |
Kim et al. | Demand-based coordinated scheduling for SMP VMs |
Cheng et al. | vScale: Automatic and efficient processor scaling for SMP virtual machines |
WO2006108169A2 | Sequencer address management |
Kim et al. | Guest-aware priority-based virtual machine scheduling for highly consolidated server |
Gottschlag et al. | Automatic core specialization for AVX-512 applications |
Nakajima et al. | Enhancements for Hyper-Threading Technology in the Operating System: Seeking the Optimal Scheduling |
Kim et al. | Transparently bridging semantic gap in cpu management for virtualized environments |
Lim et al. | Load-balancing for improving user responsiveness on multicore embedded systems |
US8127301B1 | Scheduling selected contexts in response to detecting skew between coscheduled contexts |
Pan et al. | Hypervisor support for efficient memory de-duplication |
US8171488B1 | Alternating scheduling and descheduling of coscheduled contexts |
KR101330609B1 | Method For Scheduling of Mobile Multi-Core Virtualization System To Guarantee Real Time Process |
Lackorzynski et al. | Predictable low-latency interrupt response with general-purpose systems |
Li et al. | A light-weighted virtualization layer for multicore processor-based rich functional embedded systems |
US11934890B2 | Opportunistic exclusive affinity for threads in a virtualized computing system |
Mackerras et al. | Operating system exploitation of the POWER5 system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |