US20160034310A1 - Job assignment in a multi-core processor - Google Patents
Job assignment in a multi-core processor
- Publication number
- US20160034310A1 (application US 14/447,216)
- Authority
- US
- United States
- Prior art keywords
- cores
- job
- core processor
- parallelism
- degree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
- G06F9/5088—Techniques for rebalancing the load in a distributed system involving task migration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5094—Allocation of resources, e.g. of the central processing unit [CPU] where the allocation takes into account power or heat criteria
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/501—Performance criteria
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- Datacenters may include one or more servers that may include multi-core processors. Jobs received at the datacenter may be assigned to cores within the multi-core processors based on a scheduling mechanism of a respective server. In some examples, the scheduling mechanism may schedule jobs to be executed on different cores in parallel.
- methods for assigning a job to be executed in a multi-core processor may include receiving, by the multi-core processor, the job at the multi-core processor at an arrival time.
- the multi-core processor may include a first set of cores with a first size.
- the multi-core processor may also include a second set of cores with a second size different from the first size.
- the job may include a request to execute a set of instructions.
- the methods may also include determining, by the multi-core processor, a job arrival rate of the job based on the arrival time of the job.
- the job arrival rate may indicate a frequency that the multi-core processor receives a plurality of jobs including the job.
- the methods may also include selecting, by the multi-core processor, a degree of parallelism based on the job arrival rate and based on a performance metric.
- the degree of parallelism may relate to a number of parallel threads associated with execution of the request.
- the performance metric may relate to the execution of the job on the first set of cores using the degree of parallelism.
- the methods may also include selecting, by the multi-core processor, the first set of cores based on the job arrival rate and based on a performance metric.
- the methods may also include, in response to the selection of the first set of cores, assigning, by the multi-core processor, the job to be executed on the first set of cores.
- the system may include the multi-core processor.
- the multi-core processor may include a first set of cores with a first size.
- the multi-core processor may also include a second set of cores with a second size different from the first size.
- the systems may also include a memory configured to be in communication with the multi-core processor.
- the multi-core processor may be configured to receive the job at an arrival time.
- the job may include a request to execute a set of instructions.
- the multi-core processor may also be configured to determine a job arrival rate of the job based on the arrival time of the job.
- the job arrival rate may indicate a frequency that the multi-core processor receives a plurality of jobs including the job.
- the multi-core processor may also be configured to select the first set of cores and select a degree of parallelism based on the job arrival rate and based on a performance metric.
- the degree of parallelism may relate to a number of parallel threads associated with execution of the request.
- the performance metric may relate to the execution of the job on the first set of cores using the degree of parallelism.
- the multi-core processor may also be configured to, in response to the selection of the first set of cores, assign the job to be executed on the first set of cores.
- multi-core processors configured to assign a job to a first set of cores in the multi-core processor are generally described.
- the multi-core processors may include the first set of cores with a first size.
- the multi-core processors may also include a second set of cores with a second size different from the first size.
- the multi-core processors may also include a memory configured to be in communication with the first set of cores and with the second set of cores.
- a particular core among the second set of cores may be configured to receive the job at an arrival time.
- the job may include a request to execute a set of instructions.
- the particular core may also be configured to determine a job arrival rate of the job based on the arrival time of the job.
- the job arrival rate may indicate a frequency that the multi-core processor receives a plurality of jobs including the job.
- the particular core may also be configured to select the first set of cores and select a degree of parallelism based on the job arrival rate and based on a performance metric.
- the degree of parallelism may relate to a number of parallel threads associated with execution of the request.
- the performance metric may relate to the execution of the job on the first set of cores using the degree of parallelism.
- the particular core may also be configured to, in response to the selection of the first set of cores, assign the job to be executed on the first set of cores.
- multi-core processors configured to execute a job on a first set of cores are generally described.
- the multi-core processors may include a memory.
- the multi-core processors may also include a first set of cores with a first size.
- the multi-core processors may also include a second set of cores with a second size different from the first size.
- the multi-core processors may include a switch configured to be in communication with the first set of cores, the second set of cores, and the memory.
- the multi-core processors may also include a power receiver configured to be in communication with the first set of cores, the second set of cores, and the memory. In response to a receipt of a selection signal at the switch, the switch may be configured to activate the first set of cores to execute the job.
- FIG. 1 illustrates an example system that can be utilized to implement job assignment in a multi-core processor
- FIG. 2 illustrates the example system of FIG. 1 with additional details relating to selection of a pod
- FIG. 3 illustrates the example system of FIG. 1 with additional details relating to assigning jobs to a selected pod
- FIG. 4 illustrates a flow diagram for an example process for implementing job assignment in a multi-core processor
- FIG. 5 illustrates an example computer program product that can be utilized to implement job assignment in a multi-core processor
- FIG. 6 is a block diagram illustrating an example computing device that is arranged to implement job assignment in a multi-core processor; all arranged according to at least some embodiments described herein.
- This disclosure is generally drawn, inter alia, to methods, apparatus, systems, devices, and computer program products related to job assignment in a multi-core processor.
- the methods may include receiving the job at the multi-core processor at an arrival time.
- the multi-core processor may include a first set of cores with a first size.
- the multi-core processor may also include a second set of cores with a second size different from the first size.
- the job may include a request to execute a set of instructions.
- the methods may also include determining a job arrival rate of the job based on the arrival time of the job.
- the job arrival rate may indicate a frequency that the multi-core processor receives a plurality of jobs including the job.
- the methods may also include selecting the first set of cores and selecting a degree of parallelism based on the job arrival rate and based on a performance metric.
- the degree of parallelism may relate to a number of parallel threads associated with execution of the request.
- the performance metric may relate to the execution of the job on the first set of cores using the degree of parallelism.
- the methods may also include, in response to the selection of the first set of cores, assigning the job to be executed on the first set of cores.
- FIG. 1 illustrates an example system 100 that can be utilized to implement job assignment in a multi-core processor, arranged in accordance with at least some embodiments described herein.
- System 100 may be implemented in a datacenter 101 and may include one or more multi-core processors 102 a, 102 b, 102 c, 102 d.
- multi-core processors 102 a, 102 b, 102 c, 102 d may each be a part of a respective server among a plurality of servers in datacenter 101 .
- Multi-core processors 102 a, 102 b, 102 c, 102 d may be configured to be in communication with each other.
- Datacenter 101 may include a power source 105 , such as a power generator, configured to provide power to multi-core processors 102 a, 102 b, 102 c, 102 d.
- Multi-core processors 102 a, 102 b, 102 c, 102 d may include the same components. Focusing on multi-core processor 102 a, multi-core processor 102 a may include one or more pods 110 , 120 , 130 , a power receiver 106 , a switch 108 , and/or a memory 140 . Power receiver 106 may receive power provided by power source 105 and, in response, may allocate the received power to a pod among pods 110 , 120 , 130 based on a selection of a pod (described below). Power receiver 106 may further be configured to allocate power to memory 140 .
- Switch 108 may be, for example, a multiplexer and may be configured to be in communication with pods 110 , 120 , 130 and/or memory 140 . Switch 108 may be configured to receive selection signals that may be effective to activate one pod among pods 110 , 120 , 130 (described below).
- Memory 140 may be a cache, for example a low-level cache such as a level three (L3) cache, that is effective to store data relating to operations of multi-core processor 102 a.
- Memory 140 may include one or more memory banks 142 a, 142 b, 142 c, 142 d. Switch 108 may be further configured to receive selection signals that may be effective to activate at least one memory bank among memory banks 142 a, 142 b, 142 c, 142 d based on a selection of a pod (described below).
- Each pod among pods 110 , 120 , 130 may include a respective set of processor cores (“cores”).
- Pods 110 , 120 , 130 may include a same or different number of cores.
- pod 110 may include nine cores 112 (e.g. 112 a, 112 b, 112 c, 112 d, 112 e, 112 f , 112 g, 112 h, 112 i ).
- Pod 120 may include four cores 122 (e.g. 122 a, 122 b, 122 c, 122 d ).
- Pod 130 may include two cores 132 (e.g. 132 a, 132 b ).
- Each pod among pods 110 , 120 , 130 may include cores of a same core size.
- core size may refer to a nominal frequency that relates to operating frequency of a core.
- For example, cores in pod 110 may be configured to operate under a nominal frequency of 2.0 gigahertz (GHz), cores in pod 120 under a nominal frequency of 2.2 GHz, and cores in pod 130 under a nominal frequency of 2.4 GHz.
- core size may refer to a dispatch width that relates to a number of instructions that may be executed simultaneously on a core.
- core size may refer to a window size that relates to storage of instructions that are waiting to be returned from a core.
- core size may refer to a peak power consumption of a core.
- core size may refer to cache size of a data cache or an instruction cache associated with a core.
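The pod layout described above can be written down as a small data model. This is a minimal sketch for illustration only: the class name `Pod`, its field names, and the helper `pods_supporting` are hypothetical, and nominal frequency is used as the one recorded meaning of "core size" even though the text lists several alternatives. The core counts and frequencies are the example values given for pods 110, 120, and 130.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pod:
    """One pod: a set of cores that all share the same core size."""
    pod_id: int              # reference numeral used in the text (110, 120, 130)
    num_cores: int           # number of cores in the pod
    nominal_freq_ghz: float  # nominal frequency, one possible meaning of "core size"


# Example values from the text; the field names are hypothetical.
PODS = [
    Pod(pod_id=110, num_cores=9, nominal_freq_ghz=2.0),
    Pod(pod_id=120, num_cores=4, nominal_freq_ghz=2.2),
    Pod(pod_id=130, num_cores=2, nominal_freq_ghz=2.4),
]


def pods_supporting(degree_of_parallelism: int) -> List[Pod]:
    """Return the pods that have at least as many cores as the requested degree."""
    return [p for p in PODS if p.num_cores >= degree_of_parallelism]
```

For instance, `pods_supporting(3)` returns pods 110 and 120 but not the two-core pod 130.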
- Each pod may include at least one tile where each tile includes a core and/or one or more modules of memory such as cache. Focusing on pod 130 , pod 130 may include a tile 131 a and a tile 131 b. Tile 131 a may include core 132 a and tile 131 b may include core 132 b . Focusing on tile 131 b, tile 131 b may further include a cache 134 b and a cache 136 b. In examples where cache 134 b is a level one (L1) cache, cache 134 b may be a part of core 132 b . Cache 134 b, 136 b may be configured to store data relating to operations of core 132 b.
- Core 132 b may be configured to be in communication with cache 134 b and/or cache 136 b. Core 132 b may be configured to execute jobs and/or threads issued by an operating system 104 or jobs received at multi-core processor 102 a. Operating system 104 may be an operating system effective to facilitate operations of multi-core processor 102 a and/or datacenter 101 .
- datacenter 101 may receive a job 160 from an entity such as a device configured to be in communication with datacenter 101 .
- a processor of datacenter 101 may assign job 160 to a multi-core processor such as multi-core processor 102 a.
- Job 160 may include a request to execute a set of instructions relating to contents stored in datacenter 101 .
- In an example where datacenter 101 is a datacenter for a host domain such as xyz.com, job 160 may include a request to execute instructions to search for particular content and/or files relating to webpages of xyz.com.
- job 160 may arrive at multi-core processor 102 a at an arrival time.
- Core 132 b, which may be assigned to execute operating system 104 , may determine a job arrival rate 162 based on the arrival time of job 160 at multi-core processor 102 a.
- Job arrival rate 162 may indicate a frequency that multi-core processor 102 a receives jobs such as job 160 .
- Core 132 b may select a pod among pods 110 , 120 , 130 , and may select a degree of parallelism 164 , based on job arrival rate 162 and based on a performance metric 166 .
- Degree of parallelism 164 may relate to a number of parallel threads associated with execution of requests of job 160 .
- Performance metric 166 may relate to parameters associated with an execution of job 160 such as a mean execution time.
- core 132 b may assign job 160 to be executed on the selected pod.
- FIG. 2 illustrates an example system 100 of FIG. 1 with additional details relating to selection of a pod, arranged in accordance with at least some embodiments described herein.
- FIG. 2 is substantially similar to system 100 of FIG. 1 , with additional details. Those components in FIG. 2 that are labeled identically to components of FIG. 1 will not be described again for the purposes of clarity.
- core 132 b may determine job arrival rate 162 based on arrival times of one or more jobs received at multi-core processor 102 a.
- Core 132 b may analyze threads associated with jobs received at multi-core processor 102 a and may determine a number of threads for a respective job that may be executed in parallel.
- core 132 b may determine at least one performance value of performance metric 166 .
- Core 132 b may further compare the determined performance values and, in response, may select a pod and a degree of parallelism based on the comparison.
- pod 130 may be activated and core 132 b may be assigned to execute operating system 104 and execute jobs being received at multi-core processor 102 a .
- Pods 110 , 120 may be deactivated (depicted by the shading) when pod 130 is activated.
- a portion of memory 140 such as memory banks 142 a, 142 b, may be activated to facilitate execution of operating system 104 and jobs on pod 130 .
- Memory banks 142 c, 142 d may be deactivated (depicted by the shading) if activation of memory banks 142 a, 142 b is determined, such as by core 132 b, to be sufficient for execution of operating system 104 and jobs on pod 130 .
- power receiver 106 may allocate power received at multi-core processor 102 a to pod 130 .
- multi-core processor 102 a may receive jobs 160 , 270 , 280 at arrival times 240 , 242 , 244 , respectively. Jobs 160 , 270 , 280 may each be associated with at least one thread associated with execution of requests of a respective job.
- job 160 may be associated with threads 262 , 264 , 266 .
- Job 270 may be associated with threads 272 , 274 , 276 .
- Job 280 may be associated with threads 282 , 284 , 286 .
- a queue 200 , which may be stored in cache 134 b or cache 136 b, may store incoming jobs at multi-core processor 102 a when core 132 b is not available to execute the incoming jobs.
- Queue 200 may be of an arbitrary size and may store one or more jobs. Queue 200 may be a queue of a particular queueing model such as an M/M/n queue, where jobs are expected to arrive based on a Poisson process. Queue 200 may also store indications of arrival times 240 , 242 , 244 of jobs 160 , 270 , 280 .
- Core 132 b may analyze jobs stored in queue 200 and may use arrival times 240 , 242 , 244 to determine job arrival rate 162 .
- operating system 104 may include instructions to command core 132 b to analyze jobs stored in queue 200 periodically.
- Core 132 b may analyze queue 200 and may determine that a total of three jobs, jobs 160 , 270 , 280 , are received at pod 130 between a first millisecond and a thirteenth millisecond.
- In an example where queue 200 is an M/M/n queue, core 132 b may determine job arrival rate 162 from the time interval (thirteen milliseconds) and the number of jobs in queue 200 (three jobs), based on instructions relating to a Poisson process.
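As a hedged illustration of the arrival-rate step, the sketch below uses the usual estimate for a Poisson process, the number of arrivals divided by the length of the observation window. The text does not state the exact formula, so this estimator, the function name `estimate_arrival_rate`, and the individual arrival times are assumptions; only the count (three jobs) and the window (thirteen milliseconds) come from the example.

```python
def estimate_arrival_rate(arrival_times_ms, window_ms):
    """Estimate jobs per second as (number of arrivals) / (observation window)."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return len(arrival_times_ms) / (window_ms / 1000.0)


# Hypothetical arrival times for jobs 160, 270, 280 inside a 13 ms window.
arrival_times_ms = [1.0, 6.0, 13.0]
rate = estimate_arrival_rate(arrival_times_ms, window_ms=13.0)
print(f"{rate:.1f} jobs per second")
```

Under these assumptions, three jobs in thirteen milliseconds corresponds to roughly 231 jobs per second.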
- performance metric 166 may be a mean service time relating to an expected service time of jobs 160 , 270 , 280 .
- Mean service time of jobs 160 , 270 , 280 may be based on job arrival rate 162 and a job service rate 210 that relates to an expected number of jobs that may be executed per second.
- Job service rate 210 may be based on a number of parallel threads associated with jobs 160 , 270 , 280 .
- Parallel threads may be threads that may be executed in parallel.
- core 132 b may determine a number of parallel threads among threads 262 , 264 , 266 , 272 , 274 , 276 , 282 , 284 , 286 associated with jobs 160 , 270 , 280 .
- core 132 b may determine that threads 262 , 264 , 272 , 274 , 282 , 284 are parallel threads, and threads 266 , 276 , 286 are serial threads, where serial threads are threads that cannot be executed in parallel. Based on the determination, core 132 b may determine a percentage, such as 66.66%, that indicates a percentage of parallel executions during execution of each of jobs 160 , 270 , 280 . Core 132 b may determine at least one value of job service rate 210 in an iterative manner using the percentage of parallel execution and at least one value of degree of parallelism 164 . The at least one value of degree of parallelism 164 may be an integer.
- a degree of parallelism of one may indicate executing jobs by executing one thread at a time.
- a degree of parallelism of two may indicate executing jobs by executing two threads at a time. As the percentage of parallel execution increases, a value of job service rate 210 may also increase.
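One way to picture how a job service rate could follow from the percentage of parallel execution and a degree of parallelism is Amdahl's law. The text does not name a formula, so Amdahl's law is a swapped-in assumption here, as are the function names and the single-thread `base_rate`. The sketch also treats the fraction of parallel threads (six of nine in the example) as the fraction of parallel execution time, which is a simplification.

```python
def parallel_fraction(parallel_threads, serial_threads):
    """Share of threads that may run in parallel (6 of 9 in the example)."""
    total = parallel_threads + serial_threads
    if total == 0:
        raise ValueError("at least one thread is required")
    return parallel_threads / total


def amdahl_speedup(p, d):
    """Amdahl's law: speedup with parallel fraction p and d threads at a time."""
    return 1.0 / ((1.0 - p) + p / d)


def job_service_rate(base_rate, p, d):
    """Assumed model: single-thread service rate scaled by the Amdahl speedup."""
    return base_rate * amdahl_speedup(p, d)


p = parallel_fraction(parallel_threads=6, serial_threads=3)
# base_rate of 100 jobs per second is a hypothetical single-thread figure.
rates = {d: job_service_rate(100.0, p, d) for d in (1, 2, 3)}
```

With a parallel fraction of two thirds, this model gives a speedup of 1.5 at a degree of parallelism of two, so the service rate rises as the degree of parallelism increases.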
- core 132 b may determine at least one performance value of mean service time in an iterative manner using job arrival rate 162 , job service rate 210 , and at least one value of degree of parallelism 164 .
- core 132 b may determine a first mean service time relating to execution of jobs 160 , 270 , 280 on the set of cores in pod 110 using a first value of degree of parallelism 164 .
- Core 132 b may determine a second mean service time relating to execution of jobs 160 , 270 , 280 on the set of cores in pod 110 using a second value of degree of parallelism 164 .
- Core 132 b may determine subsequent mean service times relating to execution of jobs 160 , 270 , 280 on each set of cores such as cores in pod 120 and cores in pod 130 , using the first, second, and subsequent values of degree of parallelism 164 .
- Selection of a pod may be further based on a number of cores in each pod. For example, core 132 b may not determine a mean service time relating to execution of jobs 160 , 270 , 280 on pod 130 using a degree of parallelism of three or greater because pod 130 includes two cores. Similarly, core 132 b may not determine a mean service time relating to execution of jobs 160 , 270 , 280 on pod 120 using a degree of parallelism of five or greater because pod 120 includes four cores.
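The iterative comparison described above can be sketched as a search over every pod and every degree of parallelism that the pod's core count allows. Everything beyond that loop structure is an assumption made for illustration: the standard M/M/n mean-time formula (Erlang C), Amdahl's law for the speedup, service rate scaling linearly with nominal frequency, and a pod with `cores` cores running `cores // d` jobs at once. The names `select_pod`, `base_rate`, and `ref_freq_ghz` are hypothetical.

```python
import math

# (pod id, number of cores, nominal frequency in GHz) from the example in the text
PODS = [(110, 9, 2.0), (120, 4, 2.2), (130, 2, 2.4)]


def amdahl_speedup(p, d):
    """Assumed speedup model for parallel fraction p and degree of parallelism d."""
    return 1.0 / ((1.0 - p) + p / d)


def erlang_c(n, a):
    """Probability that an arriving job waits in an M/M/n queue, a = lambda / mu."""
    tail = (a ** n / math.factorial(n)) * (n / (n - a))
    head = sum(a ** k / math.factorial(k) for k in range(n))
    return tail / (head + tail)


def mean_time_in_system(lam, mu, n):
    """Mean waiting plus service time for M/M/n, or None if the queue is unstable."""
    if lam >= n * mu:
        return None
    return erlang_c(n, lam / mu) / (n * mu - lam) + 1.0 / mu


def select_pod(lam, base_rate, p, ref_freq_ghz=2.0):
    """Return (mean time, pod id, degree of parallelism) with the lowest mean time."""
    best = None
    for pod_id, cores, freq in PODS:
        for d in range(1, cores + 1):  # never exceed the pod's core count
            n = cores // d             # assumed: jobs that can run at once
            mu = base_rate * (freq / ref_freq_ghz) * amdahl_speedup(p, d)
            t = mean_time_in_system(lam, mu, n)
            if t is not None and (best is None or t < best[0]):
                best = (t, pod_id, d)
    return best


choice = select_pod(lam=230.0, base_rate=100.0, p=2 / 3)
```

This sketch ranks candidates by mean time alone. A policy closer to the stated goal of reducing power consumption might instead pick the smallest pod whose mean time meets a target, which would only change the comparison inside the loop.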
- FIG. 3 illustrates example system 100 of FIG. 1 with additional details relating to assigning jobs to a selected pod, arranged in accordance with at least some embodiments described herein.
- FIG. 3 is substantially similar to system 100 of FIG. 1 , with additional details. Those components in FIG. 3 that are labeled identically to components of FIG. 1 will not be described again for the purposes of clarity.
- core 132 b may assign jobs 160 , 270 , 280 to the selected pod.
- the selected pod may execute jobs 160 , 270 , 280 using the selected degree of parallelism.
- multi-core processor 102 a may perform a migration 340 that may migrate operating system 104 to the selected pod and one or more memory banks of memory 140 may be activated or deactivated.
- core 132 b may select pod 110 to execute jobs 160 , 270 , 280 using a selected value of degree of parallelism 164 .
- core 132 b may first identify outstanding jobs in pod 130 . If there are outstanding jobs in pod 130 , cores in pod 130 may execute the outstanding jobs and core 132 b may store incoming jobs in queue 200 .
- core 132 b may activate pod 110 by sending a selection signal 310 to switch 108 .
- Switch 108 may receive selection signal 310 and in response, may activate pod 110 .
- Core 132 b may perform migration 340 in response to the activation of pod 110 to migrate operating system 104 and jobs stored in queue 200 to pod 110 .
- switch 108 may deactivate pod 130 .
- core 132 b may send an indication of the selected value of degree of parallelism 164 to pod 110 .
- a core in pod 110 may be assigned to execute operating system 104 .
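The hand-over from pod 130 to pod 110 can be summarised as an ordered sequence of steps. The sketch below is a toy simulation under stated assumptions: the `Switch` class stands in for switch 108 and only tracks which pods are active, and `migrate` and its parameters are hypothetical names rather than anything defined in the text.

```python
class Switch:
    """Toy stand-in for switch 108: tracks which pods are active."""

    def __init__(self, pod_ids, active):
        self.active = {pid: (pid == active) for pid in pod_ids}

    def activate(self, pod_id):
        self.active[pod_id] = True

    def deactivate(self, pod_id):
        self.active[pod_id] = False


def migrate(switch, source_pod, target_pod, outstanding, queue):
    """Return (jobs finished on the source pod, jobs carried to the target pod)."""
    finished = list(outstanding)   # 1. source pod finishes its outstanding jobs
    switch.activate(target_pod)    # 2. a selection signal activates the target pod
    carried = list(queue)          # 3. queued jobs move along with the operating system
    queue.clear()
    switch.deactivate(source_pod)  # 4. the source pod is deactivated
    return finished, carried


switch = Switch([110, 120, 130], active=130)
queue = ["job 270", "job 280"]
finished, carried = migrate(switch, 130, 110, ["job 160"], queue)
```

After the call, only pod 110 is marked active and the queue held by the source pod is empty.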
- Core 112 a may assign threads to each core in pod 110 based on the selected value of degree of parallelism 164 and based on the determined number of parallel threads (described above).
- a selected value of degree of parallelism 164 may be “9” and threads 262 , 264 , 266 , 272 , 274 , 276 , 282 , 284 , 286 may be parallel threads.
- Core 112 a may assign threads 262 , 264 , 266 , 272 , 274 , 276 , 282 , 284 , 286 to cores in pod 110 to be executed simultaneously based on the selected value of degree of parallelism 164 and the determined number of parallel threads.
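A simple way to illustrate assigning threads according to a selected degree of parallelism is to split the parallel threads into waves of at most that many threads and map each wave onto cores. The wave-based policy, the function name `assign_threads`, and the core labels are assumptions for illustration; the text only states that the nine threads may be executed simultaneously when the degree of parallelism is nine.

```python
def assign_threads(threads, cores, degree_of_parallelism):
    """Split threads into waves; each wave maps core -> thread for threads run together."""
    if not 1 <= degree_of_parallelism <= len(cores):
        raise ValueError("degree of parallelism must be between 1 and the core count")
    waves = []
    for start in range(0, len(threads), degree_of_parallelism):
        batch = threads[start:start + degree_of_parallelism]
        waves.append(dict(zip(cores, batch)))  # uses the first len(batch) cores
    return waves


threads = [262, 264, 266, 272, 274, 276, 282, 284, 286]
cores = ["112a", "112b", "112c", "112d", "112e", "112f", "112g", "112h", "112i"]
waves = assign_threads(threads, cores, degree_of_parallelism=9)
```

With a degree of parallelism of nine, all nine threads land in a single wave, one per core of pod 110. A degree of four would instead yield three waves of four, four, and one thread.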
- core 112 a may determine an amount of memory required for an execution of jobs 160 , 270 , 280 .
- memory banks 142 a, 142 b are activated prior to migration 340 and memory banks 142 c, 142 d may be deactivated prior to migration 340 .
- core 112 a may determine that the activated memory banks 142 a, 142 b in memory 140 may not provide sufficient memory capacity for the execution of jobs 160 , 270 , 280 .
- core 112 a may identify a portion of memory 140 , such as a deactivated portion, and in response, may activate memory banks associated with the identified portion of memory 140 in order to execute jobs 160 , 270 , 280 .
- core 112 a may activate bank 142 c by sending a selection signal 312 to switch 108 .
- Switch 108 may receive selection signal 312 and in response, may activate memory bank 142 c.
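The memory-bank decision above can be sketched as a capacity check. This is an illustrative sketch only: it assumes every bank has the same capacity and that banks are activated in list order, and the function name `banks_to_activate` and the capacity figures are hypothetical.

```python
import math


def banks_to_activate(required, bank_capacity, active, inactive):
    """Return the deactivated banks to switch on so capacity covers the requirement."""
    needed = math.ceil(required / bank_capacity)  # total banks the jobs need
    shortfall = needed - len(active)
    if shortfall <= 0:
        return []                                 # active banks already suffice
    if shortfall > len(inactive):
        raise ValueError("not enough memory banks for the requested capacity")
    return inactive[:shortfall]


# Hypothetical sizes: jobs need 5 units, each bank holds 2 units.
to_enable = banks_to_activate(5, 2, active=["142a", "142b"], inactive=["142c", "142d"])
```

With two banks already active and three needed, the sketch selects bank 142c, which matches the example in which core 112 a activates that bank through selection signal 312.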
- Prior to migration 340 , power receiver 106 may allocate power received at multi-core processor 102 a to pod 130 . Once pod 110 is selected and activated, power receiver 106 may allocate power to pod 110 instead of pod 130 .
- a system in accordance with the disclosure may benefit systems that may utilize multi-core processors.
- the system may reduce unnecessary power consumption by the multi-core processor by maximizing core usage within a multi-core processor. For example, a selection of a first pod where all cores within the first pod may be utilized may result in reduced unnecessary power consumption compared to a selection of a second pod where not all cores within the second pod may be utilized.
- the system may also provide continuous adjustment in power consumption by the multi-core processor. As jobs arrive at the multi-core processor at different arrival times, the system may select different pods at different times in order to reduce unnecessary power consumption by the multi-core processor.
- FIG. 4 illustrates a flow diagram for an example process for implementing job assignment in a multi-core processor, arranged in accordance with at least some embodiments presented herein.
- the process in FIG. 4 could be implemented using, for example, system 100 discussed above.
- An example process may include one or more operations, actions, or functions as illustrated by one or more of blocks S 2 , S 4 , S 6 , S 8 , and/or S 10 . Although illustrated as discrete blocks, various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation.
- a multi-core processor may receive a job at an arrival time.
- the multi-core processor may include a first set of cores with a first size.
- the multi-core processor may also include a second set of cores with a second size different from the first size.
- the job may include a request to execute a set of instructions.
- Processing may continue from block S 2 to block S 4 , “Determine a job arrival rate of the job based on the arrival time of the job”.
- the multi-core processor may determine a job arrival rate based on the arrival time of the job.
- the job arrival rate may indicate a frequency that the multi-core processor receives a plurality of jobs including the job.
- Processing may continue from block S 4 to block S 6 , “Select a degree of parallelism based on the job arrival rate and based on a performance metric”.
- the multi-core processor may select a degree of parallelism based on the job arrival rate and based on a performance metric.
- the degree of parallelism may relate to a number of parallel threads associated with execution of the request.
- the performance metric may relate to the execution of the job on the first set of cores using the degree of parallelism.
- the performance metric may be a mean service time associated with the job.
- Processing may continue from block S 6 to block S 8 , “Select the first set of cores based on the job arrival rate and based on a performance metric”.
- the multi-core processor may select the first set of cores based on the job arrival rate and based on a performance metric. In some examples, the selection of the first set of cores may be further based on a number of cores of the first size in the multi-core processor.
- Processing may continue from block S 8 to block S 10 , “Assign the job to be executed on a first set of cores of the multi-core processor”.
- the multi-core processor may assign the job to be executed on the first set of cores of the multi-core processor.
- the multi-core processor may allocate power to the first set of cores.
- the multi-core processor may identify outstanding jobs assigned to the second set of cores. In response to the identification of the outstanding jobs, the multi-core processor may execute the outstanding jobs on the second set of cores.
- the multi-core processor may deactivate the second set of cores.
- the multi-core processor may migrate an operating system to a particular core among the first set of cores.
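As a non-limiting illustration, the hand-over from a previously active set of cores to a newly selected set could be sketched as follows. The `CoreSet` class, the `switch_core_sets` function, and the ordering of the steps are assumptions made for this sketch and are not taken from the disclosure, which lists the operations without fixing a single implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CoreSet:
    """Hypothetical model of one set of cores; not part of the disclosure."""
    name: str
    active: bool = False
    hosts_os: bool = False
    outstanding: deque = field(default_factory=deque)  # jobs already assigned


def switch_core_sets(old: CoreSet, new: CoreSet, queued_jobs: list) -> list:
    """Hand execution over from `old` to `new`; returns a log of the steps taken."""
    log = []
    # 1. Execute jobs that were already assigned to the previously active set.
    while old.outstanding:
        job = old.outstanding.popleft()
        log.append(f"run {job} on {old.name}")
    # 2. Activate (allocate power to) the newly selected set of cores.
    new.active = True
    log.append(f"activate {new.name}")
    # 3. Migrate the operating system and any queued jobs to the new set.
    old.hosts_os, new.hosts_os = False, True
    new.outstanding.extend(queued_jobs)
    log.append(f"migrate OS and {len(queued_jobs)} queued job(s) to {new.name}")
    # 4. Deactivate the previously active set.
    old.active = False
    log.append(f"deactivate {old.name}")
    return log
```

In this sketch the previously active set is drained before the new set is powered on, so that at most one set executes jobs at a time; other orderings may be used.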
- FIG. 5 illustrates an example computer program product 500 that can be utilized to implement job assignment in a multi-core processor, arranged in accordance with at least some embodiments described herein.
- Program product 500 may include a signal bearing medium 502 .
- Signal bearing medium 502 may include one or more instructions 504 that, when executed by, for example, a processor, may provide the functionality described above with respect to FIGS. 1-4 .
- multi-core processors 102 a, 102 b, 102 c, 102 d may undertake one or more of the blocks shown in FIG. 5 in response to instructions 504 conveyed to the system 100 by medium 502 .
- signal bearing medium 502 may encompass a computer-readable medium 506 , such as, but not limited to, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, memory, etc.
- signal bearing medium 502 may encompass a recordable medium 508 , such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, etc.
- signal bearing medium 502 may encompass a communications medium 510 , such as, but not limited to, a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
- program product 500 may be conveyed to one or more modules of the system 100 by an RF signal bearing medium 502 , where the signal bearing medium 502 is conveyed by a wireless communications medium 510 (e.g., a wireless communications medium conforming with the IEEE 802.11 standard).
- FIG. 6 is a block diagram illustrating an example computing device 600 that is arranged to implement job assignment in a multi-core processor, arranged in accordance with at least some embodiments described herein.
- computing device 600 typically includes one or more processors 604 and a system memory 606 .
- a memory bus 608 may be used for communicating between processor 604 and system memory 606 .
- processor 604 may be of any type including but not limited to a microprocessor (µP), a microcontroller (µC), a digital signal processor (DSP), or any combination thereof.
- Processor 604 may include one or more levels of caching, such as a level one cache 610 and a level two cache 612 , a processor core 614 , and registers 616 .
- An example processor core 614 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof.
- An example memory controller 618 may also be used with processor 604 , or in some implementations memory controller 618 may be an internal part of processor 604 .
- system memory 606 may be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof.
- System memory 606 may include an operating system 620 , one or more applications 622 , and program data 624 .
- Application 622 may include a job assignment algorithm 626 that is arranged to perform the functions as described herein including those described with respect to system 100 of FIGS. 1-5 .
- Program data 624 may include job assignment data 628 that may be useful for implementation of job assignment in a multi-core processor as is described herein.
- application 622 may be arranged to operate with program data 624 on operating system 620 such that implementations of job assignment in a multi-core processor may be provided.
- This described basic configuration 602 is illustrated in FIG. 6 by those components within the inner dashed line.
- Computing device 600 may have additional features or functionality, and additional interfaces to facilitate communications between basic configuration 602 and any required devices and interfaces.
- a bus/interface controller 630 may be used to facilitate communications between basic configuration 602 and one or more data storage devices 632 via a storage interface bus 634 .
- Data storage devices 632 may be removable storage devices 636 , non-removable storage devices 638 , or a combination thereof. Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDDs), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSDs), and tape drives to name a few.
- Example computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
- Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which may be used to store the desired information and which may be accessed by computing device 600 . Any such computer storage media may be part of computing device 600 .
- Computing device 600 may also include an interface bus 640 for facilitating communication from various interface devices (e.g., output devices 642 , peripheral interfaces 644 , and communication devices 646 ) to basic configuration 602 via bus/interface controller 630 .
- Example output devices 642 include a graphics processing unit 648 and an audio processing unit 650 , which may be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 652 .
- Example peripheral interfaces 644 include a serial interface controller 654 or a parallel interface controller 656 , which may be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 658 .
- An example communication device 646 includes a network controller 660 , which may be arranged to facilitate communications with one or more other computing devices 662 over a network communication link via one or more communication ports 664 .
- the network communication link may be one example of a communication media.
- Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media.
- a “modulated data signal” may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
- communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) and other wireless media.
- the term computer readable media as used herein may include both storage media and communication media.
- Computing device 600 may be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions.
- Computing device 600 may also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
- Some example systems (such as multi-core processor architectures) and some example methods allow an improved (or, in some examples, substantially optimized) degree of parallelism to be selected for processing a job, the degree of parallelism being selected based upon the job arrival rate.
- Example systems include a data center capable of handling large variations in job arrival rate with a reduced mean service time.
- an optimum level of parallelism for processing a job is selected based on the job arrival rate, for example with the degree of parallelism increasing with decrease in the job arrival rate.
- An example system, such as a dark silicon multiprocessor architecture, includes a run-time scheduler configured to select a core type for job assignments that is optimized under a full-chip power budget to the job arrival rate at that time.
- An example system, such as a multi-core processor architecture, comprises a plurality of processing pods, for example where each pod represents a separate multicore processor, with different numbers and sizes of processors in each pod.
- a plurality of processing pods may include a first number of small cores, a second number of medium sized cores, and a third number of large cores.
- small, medium, and large may refer to relative sizes of cores.
- the first number is greater than or equal to the second number, and the second number is greater than or equal to the third number.
- cores in a pod have private L1/L2 caches, and in some examples the pods may share one or more banks of last-level caches of varying sizes, and may communicate with the caches through an interconnect.
- each pod is micro-architecturally different from the other pods.
- only one pod is turned on at any time.
- pods share a last level cache (LLC).
- a pod may be selected for processing by the run-time scheduler at a particular time, and the other pods are power gated.
- Each pod may be designed to expend the core power budget of the chip, and the pod may be chosen (from a plurality of heterogeneous pods) to reduce (e.g. approximately or substantially minimize) service time based on the job arrival rate at the time of selection.
- a system (such as a multi-core processor) comprises a plurality of pods, where each pod may comprise a plurality of processing cores.
- the processing cores in each pod may be identical.
- the system includes private caches for each core.
- the core type used in a pod is different from the core type used in any other pod on the chip, so that cores are micro-architecturally homogeneous within a pod but heterogeneous across pods.
- each pod is designed to consume the full power budget of the chip (excluding non-core components), and in some examples only one pod is switched on at any given time while the other pods remain dark.
- Some examples comprise a globally shared LLC with multiple banks and support for per-bank power gating, allowing for the LLC cache capacity to be changed dynamically at run-time, and at any given time part of the LLC may be dark.
- Some examples include a run-time scheduler that monitors the job arrival rate and determines which pod to utilize, the optimal degree of parallelism and number of jobs to run in parallel on that pod, and in some examples the number of banks of the LLC to turn on. The run-time scheduler may reduce, and in some examples substantially minimize, the mean service time of jobs within a peak power budget.
- Some example systems comprise a globally shared LLC that is partitioned into banks. Each bank may be individually power gated, allowing dynamic control of LLC capacity.
- increased cache capacity may be needed for lower degrees of parallelism and a higher number of parallel jobs.
- the system may be configured to dynamically control the cache capacity based on the degree of parallelism and the number of parallel jobs.
- the frequency of the cores may be reduced to compensate for the increase in LLC power consumption.
- the LLC implements a write-through policy and is therefore generally consistent with the main memory.
- if the run-time scheduler decides to switch off one or more banks, the run-time scheduler invalidates all the data in the LLC and updates the cache indexing policy to indicate the reduced cache capacity. Starting with a cold LLC may incur a performance overhead, but even taking that into account, improvements were observed through simulations. Based on the job arrival rate, the run-time scheduler may decide at intervals which pod to utilize and the optimal degree of parallelism for that pod, and optionally may also decide at intervals the LLC cache capacity. In some examples, the run-time scheduler may predict future values of job characteristics and/or job arrival rate, for example using time, historic data, or other approach or combination thereof.
- a run-time scheduler may be configured to implement an online policy that estimates future values and/or future variations of job characteristics, and/or future values and/or future variations of job arrival rate.
- a job arrival rate may be estimated for a subsequent time interval, and the degree of parallelism used for that time interval selected based on the estimated job arrival rate.
- a job arrival rate may be determined as an average over a time period, and in non-limiting examples the time period may be a time period in the range 1 second to 10 minutes, for example in the range 10 seconds to 5 minutes. In some examples, the job arrival rate may be determined from the time during which a predetermined number of jobs arrive. In some examples, job arrival rate may be determined as a rolling average of a parameter as described above. In some examples, ranges may be approximate.
- a range includes each individual member.
- a group having 1-3 cells refers to groups having 1, 2, or 3 cells.
- a group having 1-5 cells refers to groups having 1, 2, 3, 4, or 5 cells, and so forth.
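As a non-limiting illustration of the job arrival rate estimates mentioned above (an average over a time period, and a rate derived from the time during which a predetermined number of jobs arrive), a minimal sketch is given below. The class and function names, the default window length, and the choice to count intervals between arrivals in the second estimator are assumptions of the sketch.

```python
from collections import deque


class ArrivalRateEstimator:
    """Rolling job arrival rate (jobs per second) over a sliding time window."""

    def __init__(self, window_s: float = 60.0):
        self.window_s = window_s   # hypothetical averaging period, in seconds
        self.arrivals = deque()    # arrival times still inside the window

    def _evict(self, now: float) -> None:
        # Drop arrival times that have fallen out of the window.
        while self.arrivals and self.arrivals[0] <= now - self.window_s:
            self.arrivals.popleft()

    def record(self, arrival_time: float) -> None:
        self.arrivals.append(arrival_time)
        self._evict(arrival_time)

    def rate(self, now: float) -> float:
        self._evict(now)
        return len(self.arrivals) / self.window_s


def rate_from_last_n(arrival_times: list, n: int) -> float:
    """Rate derived from the time during which the last n jobs arrived."""
    recent = arrival_times[-n:]
    if len(recent) < 2:
        return 0.0
    span = recent[-1] - recent[0]
    # n arrivals delimit n - 1 inter-arrival intervals (a sketch assumption).
    return (len(recent) - 1) / span if span > 0 else float("inf")
```

A scheduler could call `record` on each job arrival and read `rate` at the start of each decision interval; the window length would be chosen within the ranges discussed above.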
Abstract
Description
- Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
- Datacenters may include one or more servers that may include multi-core processors. Jobs received at the datacenter may be assigned to cores within the multi-core processors based on a scheduling mechanism of a respective server. In some examples, the scheduling mechanism may schedule jobs to be executed on different cores in parallel.
- In some examples, methods for assigning a job to be executed in a multi-core processor are generally described. The methods may include receiving, by the multi-core processor, the job at the multi-core processor at an arrival time. The multi-core processor may include a first set of cores with a first size. The multi-core processor may also include a second set of cores with a second size different from the first size. The job may include a request to execute a set of instructions. The methods may also include determining, by the multi-core processor, a job arrival rate of the job based on the arrival time of the job. The job arrival rate may indicate a frequency that the multi-core processor receives a plurality of jobs including the job. The methods may also include selecting, by the multi-core processor, a degree of parallelism based on the job arrival rate and based on a performance metric. The degree of parallelism may relate to a number of parallel threads associated with execution of the request. The performance metric may relate to the execution of the job on the first set of cores using the degree of parallelism. The methods may also include selecting, by the multi-core processor, the first set of cores based on the job arrival rate and based on a performance metric. The methods may also include, in response to the selection of the first set of cores, assigning, by the multi-core processor, the job to be executed on the first set of cores.
- In some examples, systems effective to assign a job to be executed in a multi-core processor are generally described. The system may include the multi-core processor. The multi-core processor may include a first set of cores with a first size. The multi-core processor may also include a second set of cores with a second size different from the first size. The systems may also include a memory configured to be in communication with the multi-core processor. The multi-core processor may be configured to receive the job at an arrival time. The job may include a request to execute a set of instructions. The multi-core processor may also be configured to determine a job arrival rate of the job based on the arrival time of the job. The job arrival rate may indicate a frequency that the multi-core processor receives a plurality of jobs including the job. The multi-core processor may also be configured to select the first set of cores and select a degree of parallelism based on the job arrival rate and based on a performance metric. The degree of parallelism may relate to a number of parallel threads associated with execution of the request. The performance metric may relate to the execution of the job on the first set of cores using the degree of parallelism. The multi-core processor may also be configured to, in response to the selection of the first set of cores, assign the job to be executed on the first set of cores.
- In some examples, multi-core processors configured to assign a job to a first set of cores in the multi-core processor are generally described. The multi-core processors may include the first set of cores with a first size. The multi-core processors may also include a second set of cores with a second size different from the first size. The multi-core processors may also include a memory configured to be in communication with the first set of cores and with the second set of cores. A particular core among the second set of cores may be configured to receive the job at an arrival time. The job may include a request to execute a set of instructions. The particular core may also be configured to determine a job arrival rate of the job based on the arrival time of the job. The job arrival rate may indicate a frequency that the multi-core processor receives a plurality of jobs including the job. The particular core may also be configured to select the first set of cores and select a degree of parallelism based on the job arrival rate and based on a performance metric. The degree of parallelism may relate to a number of parallel threads associated with execution of the request. The performance metric may relate to the execution of the job on the first set of cores using the degree of parallelism. The particular core may also be configured to, in response to the selection of the first set of cores, assign the job to be executed on the first set of cores.
- In some examples, multi-core processors configured to execute a job on a first set of cores are generally described. The multi-core processors may include a memory. The multi-core processors may also include a first set of cores with a first size. The multi-core processors may also include a second set of cores with a second size different from the first size. The multi-core processors may include a switch configured to be in communication with the first set of cores, the second set of cores, and the memory. The multi-core processors may also include a power receiver configured to be in communication with the first set of cores, the second set of cores, and the memory. In response to a receipt of a selection signal at the switch, the switch may be configured to activate the first set of cores to execute the job.
- The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
- The foregoing and other features of this disclosure will become more fully apparent from the following description and appended claims, taken in conjunction with the accompanying drawings. Understanding that these drawings depict only several embodiments in accordance with the disclosure and are, therefore, not to be considered limiting of its scope, the disclosure will be described with additional specificity and detail through use of the accompanying drawings, in which:
- FIG. 1 illustrates an example system that can be utilized to implement job assignment in a multi-core processor;
- FIG. 2 illustrates the example system of FIG. 1 with additional details relating to selection of a pod;
- FIG. 3 illustrates the example system of FIG. 1 with additional details relating to assigning jobs to a selected pod;
- FIG. 4 illustrates a flow diagram for an example process for implementing job assignment in a multi-core processor;
- FIG. 5 illustrates an example computer program product that can be utilized to implement job assignment in a multi-core processor; and
- FIG. 6 is a block diagram illustrating an example computing device that is arranged to implement job assignment in a multi-core processor; all arranged according to at least some embodiments described herein.
- In the following detailed description, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the Figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.
- This disclosure is generally drawn, inter alia, to methods, apparatus, systems, devices, and computer program products related to job assignment in a multi-core processor.
- Briefly stated, technologies are generally described for methods and systems effective to assign a job to be executed in a multi-core processor. The methods may include receiving the job at the multi-core processor at an arrival time. The multi-core processor may include a first set of cores with a first size. The multi-core processor may also include a second set of cores with a second size different from the first size. The job may include a request to execute a set of instructions. The methods may also include determining a job arrival rate of the job based on the arrival time of the job. The job arrival rate may indicate a frequency that the multi-core processor receives a plurality of jobs including the job. The methods may also include selecting the first set of cores and selecting a degree of parallelism based on the job arrival rate and based on a performance metric. The degree of parallelism may relate to a number of parallel threads associated with execution of the request. The performance metric may relate to the execution of the job on the first set of cores using the degree of parallelism. The methods may also include, in response to the selection of the first set of cores, assigning the job to be executed on the first set of cores.
- FIG. 1 illustrates an example system 100 that can be utilized to implement job assignment in a multi-core processor, arranged in accordance with at least some embodiments described herein. System 100 may be implemented in a datacenter 101 and may include one or more multi-core processors multi-core processors datacenter 101. Multi-core processors Datacenter 101 may include a power source 105, such as a power generator, configured to provide power to multi-core processors
- Multi-core processors multi-core processor 102 a, multi-core processor 102 a may include one or more pods power receiver 106, a switch 108, and/or a memory 140. Power receiver 106 may receive power provided by power source 105 and, in response, may allocate the received power to a pod among pods Power receiver 106 may further be configured to allocate power to memory 140. Switch 108 may be, for example, a multiplexer and may be configured to be in communication with pods memory 140. Switch 108 may be configured to receive selection signals that may be effective to activate one pod among pods Memory 140 may be a cache, such as a low-level cache such as a level three (L3) cache that is effective to store data relating to operations of multi-core processor 102 a. Memory 140 may include one or more memory banks Switch 108 may be further configured to receive selection signals that may be effective to activate at least one memory bank among memory banks
- Each pod among pods Pods pod 110 may include nine cores 112 (e.g. 112 a, 112 b, 112 c, 112 d, 112 e, 112 f, 112 g, 112 h, 112 i). Pod 120 may include four cores 122 (e.g. 122 a, 122 b, 122 c, 122 d). Pod 130 may include two cores 132 (e.g. 132 a, 132 b). Each pod among pods pod 110 may be configured to operate under a nominal frequency of 2.0 gigahertz (GHz), cores in pod 120 may be configured to operate under a nominal frequency of 2.2 GHz, and cores in pod 130 may be configured to operate under a nominal frequency of 2.4 GHz. In some examples, core size may refer to a dispatch width that relates to a number of instructions that may be executed simultaneously on a core. In some examples, core size may refer to a window size that relates to storage of instructions that are waiting to be returned from a core. In some examples, core size may refer to a peak power consumption of a core. In some examples, core size may refer to cache size of a data cache or an instruction cache associated with a core.
- Each pod may include at least one tile where each tile includes a core and/or one or more modules of memory such as cache. Focusing on pod 130, pod 130 may include a tile 131 a and a tile 131 b. Tile 131 a may include core 132 a and tile 131 b may include core 132 b. Focusing on tile 131 b, tile 131 b may further include a cache 134 b and a cache 136 b. In examples where cache 134 b is a level one (L1) cache, cache 134 b may be a part of core 132 b. Cache core 132 b. Core 132 b may be configured to be in communication with cache 134 b and/or cache 136 b. Core 132 b may be configured to execute jobs and/or threads issued by an operating system 104 or jobs received at multi-core processor 102 a. Operating system 104 may be an operating system effective to facilitate operations of multi-core processor 102 a and/or datacenter 101.
- In an example, datacenter 101 may receive a job 160 from an entity such as a device configured to be in communication with datacenter 101. A processor of datacenter 101 may assign job 160 to a multi-core processor such as multi-core processor 102 a. Job 160 may include a request to execute a set of instructions relating to contents stored in datacenter 101. For example, when datacenter 101 is a datacenter for a host domain such as xyz.com, job 160 may include a request to execute instructions to search for particular content and/or files relating to webpages of xyz.com.
- As will be described in more detail below, job 160 may arrive at multi-core processor 102 a at an arrival time. Core 132 b, which may be assigned to execute operating system 104, may determine a job arrival rate 162 based on the arrival time of job 160 at multi-core processor 102 a. Job arrival rate 162 may indicate a frequency that multi-core processor 102 a receives jobs such as job 160. Core 132 b may select a pod among pods parallelism 164, based on job arrival rate 162 and based on a performance metric 166. Degree of parallelism 164 may relate to a number of parallel threads associated with execution of requests of job 160. Performance metric 166 may relate to parameters associated with an execution of job 160 such as a mean execution time. In response to selection of a pod among pods core 132 b may assign job 160 to be executed on the selected pod.
- FIG. 2 illustrates example system 100 of FIG. 1 with additional details relating to selection of a pod, arranged in accordance with at least some embodiments described herein. FIG. 2 is substantially similar to system 100 of FIG. 1, with additional details. Those components in FIG. 2 that are labeled identically to components of FIG. 1 will not be described again for the purposes of clarity.
- As will be explained in more detail below, core 132 b may determine job arrival rate 162 based on arrival times of one or more jobs received at multi-core processor 102 a. Core 132 b may analyze threads associated with jobs received at multi-core processor 102 a and may determine a number of threads for a respective job that may be executed in parallel. In response to the determination of job arrival rate 162 and the number of threads that may be executed in parallel, core 132 b may determine at least one performance value of performance metric 166. Core 132 b may further compare the determined performance values and, in response, may select a pod and a degree of parallelism based on the comparison.
- In the example, pod 130 may be activated and core 132 b may be assigned to execute operating system 104 and execute jobs being received at multi-core processor 102 a. Pods pod 130 is activated. In some examples, a portion of memory 140, such as memory banks operating system 104 and jobs on pod 130. Memory banks memory banks core 132 b, to be sufficient for execution of operating system 104 and jobs on pod 130. When pod 130 is activated, power receiver 106 may allocate power received at multi-core processor 102 a to pod 130.
- In the example, multi-core processor 102 a may receive jobs arrival times Jobs job 160 may be associated with threads Job 270 may be associated with threads Job 280 may be associated with threads queue 200, which may be stored in cache 134 b or cache 136 b, may store incoming jobs at multi-core processor 102 a when core 132 b is not available to execute the incoming jobs. Queue 200 may be of an arbitrary size and may store one or more jobs. Queue 200 may be a queue of a particular queueing model such as an M/M/n queue, where jobs are expected to arrive based on a Poisson process. Queue 200 may also store indications of arrival times jobs
Core 132 b may analyze jobs stored inqueue 200 and may usearrival times job arrival rate 162. In some examples,operating system 104 may include instructions tocommand core 132 b to analyze jobs stored inqueue 200 periodically.Core 132 b may analyzequeue 200 and may determine that a total of three jobs,jobs pod 130 between a first millisecond and a thirteenth millisecond. Whenqueue 200 is a M/M/n queue, based on a time interval (thirteen milliseconds) and a number of jobs in queue 200 (three jobs),core 132 b may determinejob arrival rate 162 based on instructions relating to a Poisson process. - In the example,
performance metric 166 may be a mean service time relating to an expected service time ofjobs jobs job arrival rate 162 and ajob service rate 210 that relates to an expected number of jobs that may be executed per second.Job service rate 210 may be based on a number of parallel threads associated withjobs performance metric 166 is a mean service time ofjobs job arrival rate 162,core 132 b may determine a number of parallel threads amongthreads jobs - For example,
core 132 b may determine thatthreads threads core 132 b may determine a percentage, such as 66.66%, that indicates a percentage of parallel executions during execution of each ofjobs Core 132 b may determine at least one value ofjob service rate 210 in an iterative manner using the percentage of parallel execution and at least one value of degree ofparallelism 164. The at least one value of degree ofparallelism 164 may be an integer. A degree of parallelism of one may indicate executing jobs by executing one thread at a time. A degree of parallelism of two may indicate executing jobs by executing two threads at a time. As the percentage of parallel execution increases, a value ofjob service rate 210 may also increase. - In response to the determination of
job arrival rate 162 and job service rate 210, core 132 b may determine at least one performance value of mean service time in an iterative manner using job arrival rate 162, job service rate 210, and at least one value of degree of parallelism 164. For example, core 132 b may determine a first mean service time relating to execution of the jobs on cores in pod 110 using a first value of degree of parallelism 164. Core 132 b may determine a second mean service time relating to execution of the jobs on cores in pod 110 using a second value of degree of parallelism 164. Core 132 b may determine subsequent mean service times relating to execution of the jobs on cores in pod 120 and cores in pod 130, using the first, second, and subsequent values of degree of parallelism 164. - Selection of a pod may be further based on a number of cores in each pod. For example,
core 132 b may not determine a mean service time relating to execution of the jobs on pod 130 using a degree of parallelism of three or greater because pod 130 includes two cores. Similarly, core 132 b may not determine a mean service time relating to execution of the jobs on pod 120 using a degree of parallelism of five or greater because pod 120 includes four cores. -
FIG. 3 illustrates example system 100 of FIG. 1 with additional details relating to assigning jobs to a selected pod, arranged in accordance with at least some embodiments described herein. FIG. 3 is substantially similar to system 100 of FIG. 1, with additional details. Those components in FIG. 3 that are labeled identically to components of FIG. 1 will not be described again for the purposes of clarity. - In response to a selection of a pod and degree of
parallelism 164, core 132 b may assign the jobs to the selected pod. In connection with the assignment of the jobs, multi-core processor 102 a may perform a migration 340 that may migrate operating system 104 to the selected pod, and one or more memory banks of memory 140 may be activated or deactivated. - Based on the comparison of the determined mean service times (described above), in the example,
core 132 b may select pod 110 to execute the jobs and may select a value of degree of parallelism 164. In response to selection of pod 110, core 132 b may first identify outstanding jobs in pod 130. If there are outstanding jobs in pod 130, cores in pod 130 may execute the outstanding jobs and core 132 b may store incoming jobs in queue 200. In response to a completion of the outstanding jobs assigned to pod 130, core 132 b may activate pod 110 by sending a selection signal 310 to switch 108. Switch 108 may receive selection signal 310 and in response, may activate pod 110. Core 132 b may perform migration 340 in response to the activation of pod 110 to migrate operating system 104 and jobs stored in queue 200 to pod 110. In response to a completion of migration 340, switch 108 may deactivate pod 130. During migration 340, core 132 b may send an indication of the selected value of degree of parallelism 164 to pod 110. - In response to a completion of migration 340, a core in
pod 110, such as core 112 a, may be assigned to execute operating system 104. Core 112 a may assign threads to each core in pod 110 based on the selected value of degree of parallelism 164 and based on the determined number of parallel threads (described above). In an example, a selected value of degree of parallelism 164 may be “9” and a number of the threads may be parallel threads. Core 112 a may assign the parallel threads to cores in pod 110 to be executed simultaneously based on the selected value of degree of parallelism 164 and the determined number of parallel threads. - In some examples, prior to assigning threads to cores in
pod 110, core 112 a may determine an amount of memory required for an execution of the jobs. Some memory banks of memory 140 may be activated and other memory banks may be deactivated. Core 112 a may determine that the activated memory banks of memory 140 may not provide sufficient memory capacity for the execution of the jobs. In response to that determination, core 112 a may identify a portion of memory 140, such as a deactivated portion, and may activate memory banks associated with the identified portion of memory 140 in order to execute the jobs. For example, core 112 a may activate memory bank 142 c by sending a selection signal 312 to switch 108. Switch 108 may receive selection signal 312 and in response, may activate memory bank 142 c. In some examples, prior to migration 340, power receiver 106 may allocate power received at multi-core processor 102 a to pod 130. In response to the selection of pod 110 to execute the jobs, power receiver 106 may allocate power to pod 110 instead of pod 130. - Among other possible benefits, a system in accordance with the disclosure may benefit systems that may utilize multi-core processors. The system may reduce unnecessary power consumption by the multi-core processor by maximizing core usage within a multi-core processor. For example, a selection of a first pod where all cores within the first pod may be utilized may result in reduced unnecessary power consumption compared to a selection of a second pod where not all cores within the second pod may be utilized. The system may also provide continuous adjustment in power consumption by the multi-core processor. As jobs arrive at the multi-core processor at different arrival times, the system may select different pods at different times in order to reduce unnecessary power consumption by the multi-core processor.
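The hand-off between pods described for FIG. 3 (finish outstanding jobs on the current pod, activate the selected pod, migrate, then deactivate the old pod) may be easier to follow as a short sequence. The following is a minimal sketch only: the `Pod` class, the `hand_off` function, and the log strings are hypothetical names introduced for illustration, and the selection signal, switch, and operating system migration are reduced to log entries.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Pod:
    """Hypothetical stand-in for a pod; not a structure defined in the disclosure."""
    name: str
    active: bool = False
    outstanding: list = field(default_factory=list)  # jobs already assigned to the pod


def hand_off(old: Pod, new: Pod, queue: deque, log: list) -> None:
    """Drain `old`, activate `new`, migrate queued jobs, then deactivate `old`."""
    # 1. Outstanding jobs finish on the old pod; incoming jobs stay in the queue.
    while old.outstanding:
        log.append(f"run {old.outstanding.pop(0)} on {old.name}")
    # 2. Activate the selected pod (stands in for the selection signal sent to the switch).
    new.active = True
    log.append(f"activate {new.name}")
    # 3. Migrate the queued jobs (and, conceptually, the operating system).
    while queue:
        new.outstanding.append(queue.popleft())
    log.append(f"migrate to {new.name}")
    # 4. The old pod is deactivated only after the migration completes.
    old.active = False
    log.append(f"deactivate {old.name}")
```

Keeping the old pod powered until step 4 reflects the ordering in the text, where deactivation follows completion of the migration rather than selection of the new pod.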
-
FIG. 4 illustrates a flow diagram for an example process for implementing job assignment in a multi-core processor, arranged in accordance with at least some embodiments presented herein. The process in FIG. 4 could be implemented using, for example, system 100 discussed above. An example process may include one or more operations, actions, or functions as illustrated by one or more of blocks S2, S4, S6, S8, and/or S10. Although illustrated as discrete blocks, various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation. - Processing may begin at block S2, “Receive a job at the multi-core processor at an arrival time”. At block S2, a multi-core processor may receive a job at an arrival time. The multi-core processor may include a first set of cores with a first size. The multi-core processor may also include a second set of cores with a second size different from the first size. The job may include a request to execute a set of instructions.
- Processing may continue from block S2 to block S4, “Determine a job arrival rate of the job based on the arrival time of the job”. At block S4, the multi-core processor may determine a job arrival rate based on the arrival time of the job. The job arrival rate may indicate a frequency that the multi-core processor receives a plurality of jobs including the job.
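One hedged way to picture block S4 is a sliding-window counter: the number of arrivals seen in a recent window divided by the window length. The class name `ArrivalRateEstimator`, the window parameter, and the intermediate arrival time used in the usage note are assumptions made for illustration; the disclosure does not prescribe a particular estimator.

```python
from collections import deque


class ArrivalRateEstimator:
    """Rolling estimate of job arrival rate, in jobs per second, over a time window."""

    def __init__(self, window_s: float):
        self.window_s = window_s      # window length in seconds (an assumed parameter)
        self.arrivals = deque()       # arrival times in seconds, oldest first

    def record(self, t_s: float) -> None:
        """Store the arrival time of one job."""
        self.arrivals.append(t_s)

    def rate(self, now_s: float) -> float:
        """Count arrivals inside (now - window, now] and divide by the window length."""
        while self.arrivals and self.arrivals[0] <= now_s - self.window_s:
            self.arrivals.popleft()
        return len(self.arrivals) / self.window_s
```

With a thirteen-millisecond window and three recorded arrivals (say at 1 ms, 6 ms, and 13 ms, the middle value being invented here), `rate(0.013)` evaluates to 3 / 0.013, roughly 231 jobs per second.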
- Processing may continue from block S4 to block S6, “Select a degree of parallelism based on the job arrival rate and based on a performance metric”. At block S6, the multi-core processor may select a degree of parallelism based on the job arrival rate and based on a performance metric. The degree of parallelism may relate to a number of parallel threads associated with execution of the request. The performance metric may relate to the execution of the job on the first set of cores using the degree of parallelism. In some examples, the performance metric may be a mean service time associated with the job.
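The selection in block S6 depends on how the job service rate changes with the degree of parallelism. The disclosure gives no formula for this, so the expression below uses Amdahl's law purely as an assumed model. The symbols are introduced for illustration: \mu_1 is the service rate at a degree of parallelism of one, p is the fraction of execution that is parallel, and d is the degree of parallelism.

```latex
\mu(d) = \frac{\mu_1}{(1 - p) + \dfrac{p}{d}},
\qquad
p = \tfrac{2}{3}:\quad
\mu(1) = \mu_1,\quad
\mu(2) = 1.5\,\mu_1,\quad
\mu(3) = 1.8\,\mu_1 .
```

Under this assumed model, a parallel fraction of about 66.66% yields diminishing gains as d grows, which is one reason a larger degree of parallelism is not always the better choice once queueing is taken into account.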
- Processing may continue from block S6 to block S8, “Select the first set of cores based on the job arrival rate and based on a performance metric”. At block S8, the multi-core processor may select the first set of cores based on the job arrival rate and based on a performance metric. In some examples, the selection of the first set of cores may be further based on a number of cores of the first size in the multi-core processor.
- Processing may continue from block S8 to block S10, “Assign the job to be executed on a first set of cores of the multi-core processor”. At block S10, the multi-core processor may assign the job to be executed on the first set of cores of the multi-core processor. In some examples, in response to the assignment of the job to be executed on the first set of cores, the multi-core processor may allocate power to the first set of cores. Prior to assigning the job to be executed on the first set of cores, the multi-core processor may identify outstanding jobs assigned to the second set of cores. In response to the identification of the outstanding jobs, the multi-core processor may execute the outstanding jobs on the second set of cores. In response to a completion of the execution of the outstanding jobs on the second set of cores, the multi-core processor may deactivate the second set of cores. In some examples, in response to the completion of execution of the outstanding jobs, the multi-core processor may migrate an operating system to a particular core among the first set of cores.
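Blocks S6 and S8 can be sketched together as a search over pods and degrees of parallelism for the lowest mean time. The sketch below rests on several assumptions that are not taken from the disclosure: Amdahl's law for the per-job service rate, a pod with c cores at degree of parallelism d serving c // d jobs at once, and the standard M/M/n mean time in system (Erlang C waiting time plus service time) as the mean service time. All function names and the per-pod rates are hypothetical.

```python
import math


def amdahl_speedup(p: float, d: int) -> float:
    """Speedup of one job with parallel fraction p run at degree of parallelism d."""
    return 1.0 / ((1.0 - p) + p / d)


def erlang_c(n: int, a: float) -> float:
    """Probability that an arrival waits in an M/M/n queue; a = lam / mu, a < n."""
    below = sum(a ** k / math.factorial(k) for k in range(n))
    top = a ** n / math.factorial(n) * n / (n - a)
    return top / (below + top)


def mean_time(lam: float, mu: float, n: int) -> float:
    """Mean time in system for M/M/n; infinite when the queue is unstable."""
    if lam >= n * mu:
        return math.inf
    return erlang_c(n, lam / mu) / (n * mu - lam) + 1.0 / mu


def select(lam: float, p: float, pods: dict) -> tuple:
    """pods maps name -> (cores, rate at d = 1). Returns (mean time, pod, d)."""
    best = (math.inf, None, None)
    for name, (cores, mu1) in pods.items():
        # A degree of parallelism above the core count is never considered.
        for d in range(1, cores + 1):
            servers = cores // d                      # jobs served concurrently
            mu = mu1 * amdahl_speedup(p, d)           # per-job service rate
            t = mean_time(lam, mu, servers)
            if t < best[0]:
                best = (t, name, d)
    return best
```

For a single two-core pod with a rate of 1.0 at d = 1 and p = 0.9, the sketch picks d = 2 at an arrival rate of 0.5 and d = 1 at an arrival rate of 1.9, which matches the qualitative behaviour described later in the text, where the degree of parallelism increases as the job arrival rate decreases.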
-
FIG. 5 illustrates an example computer program product 500 that can be utilized to implement job assignment in a multi-core processor, arranged in accordance with at least some embodiments described herein. Program product 500 may include a signal bearing medium 502. Signal bearing medium 502 may include one or more instructions 504 that, when executed by, for example, a processor, may provide the functionality described above with respect to FIGS. 1-4. Thus, for example, referring to system 100, the multi-core processors may undertake one or more of the blocks shown in FIG. 5 in response to instructions 504 conveyed to the system 100 by medium 502.
- In some implementations, signal bearing medium 502 may encompass a computer-readable medium 506, such as, but not limited to, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, memory, etc. In some implementations, signal bearing medium 502 may encompass a recordable medium 508, such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, etc. In some implementations, signal bearing medium 502 may encompass a communications medium 510, such as, but not limited to, a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.). Thus, for example, program product 500 may be conveyed to one or more modules of the system 100 by an RF signal bearing medium 502, where the signal bearing medium 502 is conveyed by a wireless communications medium 510 (e.g., a wireless communications medium conforming with the IEEE 802.11 standard). -
FIG. 6 is a block diagram illustrating an example computing device 600 that is arranged to implement job assignment in a multi-core processor, arranged in accordance with at least some embodiments described herein. In a very basic configuration 602, computing device 600 typically includes one or more processors 604 and a system memory 606. A memory bus 608 may be used for communicating between processor 604 and system memory 606. - Depending on the desired configuration,
processor 604 may be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. Processor 604 may include one or more levels of caching, such as a level one cache 610 and a level two cache 612, a processor core 614, and registers 616. An example processor core 614 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof. An example memory controller 618 may also be used with processor 604, or in some implementations memory controller 618 may be an internal part of processor 604. - Depending on the desired configuration,
system memory 606 may be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof. System memory 606 may include an operating system 620, one or more applications 622, and program data 624. Application 622 may include a job assignment algorithm 626 that is arranged to perform the functions as described herein including those described with respect to system 100 of FIGS. 1-5. Program data 624 may include job assignment data 628 that may be useful for implementation of job assignment in a multi-core processor as is described herein. In some embodiments, application 622 may be arranged to operate with program data 624 on operating system 620 such that implementations of job assignment in a multi-core processor may be provided. This described basic configuration 602 is illustrated in FIG. 6 by those components within the inner dashed line. -
Computing device 600 may have additional features or functionality, and additional interfaces to facilitate communications between basic configuration 602 and any required devices and interfaces. For example, a bus/interface controller 630 may be used to facilitate communications between basic configuration 602 and one or more data storage devices 632 via a storage interface bus 634. Data storage devices 632 may be removable storage devices 636, non-removable storage devices 638, or a combination thereof. Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDDs), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSDs), and tape drives to name a few. Example computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. -
System memory 606, removable storage devices 636 and non-removable storage devices 638 are examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which may be used to store the desired information and which may be accessed by computing device 600. Any such computer storage media may be part of computing device 600. -
Computing device 600 may also include an interface bus 640 for facilitating communication from various interface devices (e.g., output devices 642, peripheral interfaces 644, and communication devices 646) to basic configuration 602 via bus/interface controller 630. Example output devices 642 include a graphics processing unit 648 and an audio processing unit 650, which may be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 652. Example peripheral interfaces 644 include a serial interface controller 654 or a parallel interface controller 656, which may be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 658. An example communication device 646 includes a network controller 660, which may be arranged to facilitate communications with one or more other computing devices 662 over a network communication link via one or more communication ports 664. - The network communication link may be one example of a communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A “modulated data signal” may be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) and other wireless media. The term computer readable media as used herein may include both storage media and communication media.
-
Computing device 600 may be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions. Computing device 600 may also be implemented as a personal computer including both laptop computer and non-laptop computer configurations. - Some example systems (such as multi-core processor architectures) and some example methods allow an improved (or, in some examples, substantially optimized) degree of parallelism to be selected for processing a job, the degree of parallelism being selected based upon the job arrival rate. Example systems include a data center capable of handling large variations in job arrival rate with a reduced mean service time. In some examples, an optimum level of parallelism for processing a job is selected based on the job arrival rate, for example with the degree of parallelism increasing as the job arrival rate decreases. An example system, such as a dark silicon multiprocessor architecture, includes a run-time scheduler configured to select a core type for job assignments that is optimized under a full-chip power budget to the job arrival rate at that time.
- An example system, such as a multi-core processor architecture, comprises a plurality of processing pods, for example where each pod represents a separate multicore processor, with different numbers and sizes of processors in each pod. In some examples, a plurality of processing pods may include a first number of small cores, a second number of medium sized cores, and a third number of large cores. In this context, small, medium, and large may refer to relative sizes of cores. In some examples, the first number is greater than or equal to the second number, and the second number is greater than or equal to the third number. In some examples, cores in a pod have private L1/L2 caches, and in some examples the pods may share one or more banks of last-level caches of varying sizes, and may communicate with the caches through an interconnect. In some examples, each pod is micro-architecturally different from the other pods. In some examples, only one pod is turned on at any time. In some examples, pods share a last level cache (LLC).
- In some examples, a pod may be selected for processing by the run-time scheduler at a particular time, and the other pods are power gated. Each pod may be designed to expend the core power budget of the chip, and the pod may be chosen (from a plurality of heterogeneous pods) to reduce (e.g. approximately or substantially minimize) service time based on the job arrival rate at the time of selection.
- In some examples, a system (such as a multi-core processor) comprises a plurality of pods, where each pod may comprise a plurality of processing cores. In some examples, the processing cores in each pod may be identical. In some examples, the system includes private caches for each core. In some examples, the core type used in a pod is different from the core type used in any other pod on the chip, so that cores are micro-architecturally homogeneous within a pod but heterogeneous across pods. In some examples, each pod is designed to consume the full power budget of the chip (excluding non-core components), and in some examples only one pod is switched on at any given time while the other pods remain dark. Some examples comprise a globally shared LLC with multiple banks and support for per-bank power gating, allowing for the LLC cache capacity to be changed dynamically at run-time, and at any given time part of the LLC may be dark. Some examples include a run-time scheduler that monitors the job arrival rate and determines which pod to utilize, the optimal degree of parallelism and number of jobs to run in parallel on that pod, and in some examples the number of banks of the LLC to turn on. The run-time scheduler may reduce, and in some examples substantially minimize, the mean service time of jobs within a peak power budget.
- Experimental results were obtained using a cycle-accurate multi-core simulation and an in-house discrete event simulation (DES) engine, and showed that the optimal degree of parallelism, type of pod used and LLC capacity depended on the job arrival rate. Even in the absence of any diversity in job application characteristics, examples of the present disclosure (such as micro-architecturally heterogeneous dark silicon processors) show improved performance when there are arrival rate variations. Examples also include data centers that serve homogeneous workloads, for example a data center that serves web search queries, where there may be little application heterogeneity.
- Some example systems comprise a globally shared LLC that is partitioned into banks. Each bank may be individually power gated, allowing dynamic control of LLC capacity. In some examples, increased cache capacity may be needed for lower degrees of parallelism and a higher number of parallel jobs. The system may be configured to dynamically control the cache capacity based on the degree of parallelism and the number of parallel jobs. In some examples, as the number of banks of the LLC that are switched on increases, the frequency of the cores may be reduced to compensate for the increase in LLC power consumption. In some examples, the LLC implements a write-through policy and is therefore generally consistent with the main memory. In some examples, if the run-time scheduler decides to switch off one or more banks, the run-time scheduler invalidates all the data in the LLC and updates the cache indexing policy to indicate the reduced cache capacity. Starting with a cold LLC may incur a performance overhead, but even taking that into account, improvements were observed through simulations. Based on the job arrival rate, the run-time scheduler may decide at intervals which pod to utilize and the optimal degree of parallelism for that pod, and optionally may also decide at intervals the LLC cache capacity. In some examples, the run-time scheduler may predict future values of job characteristics and/or job arrival rate, for example using time, historic data, or another approach or a combination thereof. In some examples, a run-time scheduler may be configured to implement an online policy that estimates future values and/or future variations of job characteristics, and/or future values and/or future variations of job arrival rate. In some examples, a job arrival rate may be estimated for a subsequent time interval, and the degree of parallelism used for that time interval selected based on the estimated job arrival rate.
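The interaction between a write-through policy, per-bank power gating, and the indexing update can be illustrated with a toy model. This is a sketch under stated assumptions, not the disclosed design: the `BankedLLC` class, the modulo indexing, and the dictionary-backed main memory are all hypothetical, and the sketch invalidates on every resize (not only when banks are switched off) because the modulo index changes in both directions.

```python
class BankedLLC:
    """Toy write-through last-level cache whose active bank count can change at run time."""

    def __init__(self, total_banks: int, active_banks: int, memory: dict):
        self.total_banks = total_banks
        self.active_banks = active_banks
        self.memory = memory                          # backing store standing in for main memory
        self.banks = [dict() for _ in range(total_banks)]

    def _bank(self, addr: int) -> dict:
        # Indexing policy: addresses map only onto the currently active banks.
        return self.banks[addr % self.active_banks]

    def write(self, addr: int, value) -> None:
        self.memory[addr] = value                     # write-through keeps memory consistent
        self._bank(addr)[addr] = value

    def read(self, addr: int):
        bank = self._bank(addr)
        if addr not in bank:                          # miss: fill the line from main memory
            bank[addr] = self.memory[addr]
        return bank[addr]

    def resize(self, active_banks: int) -> None:
        """Change the number of powered banks; the cache restarts cold."""
        if not 1 <= active_banks <= self.total_banks:
            raise ValueError("active_banks out of range")
        for bank in self.banks:
            bank.clear()                              # no write-back needed: memory is current
        self.active_banks = active_banks
```

Because every write also reaches main memory, invalidating the whole cache loses no data, which is why the text can pair a write-through policy with a simple invalidate-and-reindex step at the cost of a cold start.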
- In some examples, a job arrival rate may be determined as an average over a time period, and in non-limiting examples the time period may be a time period in the
range of 1 second to 10 minutes, for example in the range of 10 seconds to 5 minutes. In some examples, the job arrival rate may be determined from the time during which a predetermined number of jobs arrive. In some examples, job arrival rate may be determined as a rolling average of a parameter as described above. In some examples, ranges may be approximate. - The present disclosure is not to be limited in terms of the particular embodiments described in this application, which are intended as illustrations of various aspects. Many modifications and variations can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the disclosure, in addition to those enumerated herein, will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the appended claims. It is to be understood that this disclosure is not limited to particular methods, reagents, compounds, compositions or biological systems, which can, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
- With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.
- It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will also be understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations). 
Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”
- In addition, where features or aspects of the disclosure are described in terms of Markush groups, those skilled in the art will recognize that the disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group.
- As will be understood by one skilled in the art, for any and all purposes, such as in terms of providing a written description, all ranges disclosed herein also encompass any and all possible subranges and combinations of subranges thereof. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art all language such as “up to,” “at least,” “greater than,” “less than,” and the like include the number recited and refer to ranges which can be subsequently broken down into subranges as discussed above. Finally, as will be understood by one skilled in the art, a range includes each individual member. Thus, for example, a group having 1-3 cells refers to groups having 1, 2, or 3 cells. Similarly, a group having 1-5 cells refers to groups having 1, 2, 3, 4, or 5 cells, and so forth.
- While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
Claims (23)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/447,216 US9256470B1 (en) | 2014-07-30 | 2014-07-30 | Job assignment in a multi-core processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/447,216 US9256470B1 (en) | 2014-07-30 | 2014-07-30 | Job assignment in a multi-core processor |
Publications (2)
Publication Number | Publication Date |
---|---|
US20160034310A1 (en) | 2016-02-04 |
US9256470B1 (en) | 2016-02-09 |
Family
ID=55180122
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/447,216 Active - Reinstated US9256470B1 (en) | 2014-07-30 | 2014-07-30 | Job assignment in a multi-core processor |
Country Status (1)
Country | Link |
---|---|
US (1) | US9256470B1 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018018425A1 (en) * | 2016-07-26 | 2018-02-01 | 张升泽 | Method and system for allocating threads of multi-kernel chip |
WO2018018373A1 (en) * | 2016-07-25 | 2018-02-01 | 张升泽 | Power calculation method and system for multiple core chips |
WO2018018491A1 (en) * | 2016-07-28 | 2018-02-01 | 张升泽 | Method and system for allocating voltage of electronic chip in plurality of intervals |
US20180285290A1 (en) * | 2017-03-30 | 2018-10-04 | Futurewei Technologies, Inc. | Distributed and shared memory controller |
US20180307624A1 (en) * | 2017-04-24 | 2018-10-25 | Intel Corporation | System cache optimizations for deep learning compute engines |
US10169051B2 (en) * | 2013-12-05 | 2019-01-01 | Blue Yonder GmbH | Data processing device, processor core array and method for characterizing behavior of equipment under observation |
CN111522420A (en) * | 2019-01-17 | 2020-08-11 | 电子科技大学 | Multi-core chip dynamic thermal management method based on power budget |
US20230028837A1 (en) * | 2021-07-23 | 2023-01-26 | Vmware, Inc. | Scaling for split-networking datapath |
US20230046808A1 (en) * | 2020-09-04 | 2023-02-16 | Micron Technology, Inc. | Volatile memory to non-volatile memory interface for power management |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106776004A (en) * | 2016-11-18 | 2017-05-31 | 努比亚技术有限公司 | Cpu resource distributor and method |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8015564B1 (en) * | 2005-04-27 | 2011-09-06 | Hewlett-Packard Development Company, L.P. | Method of dispatching tasks in multi-processor computing environment with dispatching rules and monitoring of system status |
GB0613923D0 (en) * | 2006-07-13 | 2006-08-23 | Ibm | A method, apparatus and software for managing processing for a plurality of processors |
US8291427B2 (en) * | 2008-06-09 | 2012-10-16 | International Business Machines Corporation | Scheduling applications for execution on a plurality of compute nodes of a parallel computer to manage temperature of the nodes during execution |
US8356304B2 (en) * | 2009-06-30 | 2013-01-15 | International Business Machines Corporation | Method and system for job scheduling |
US20120066683A1 (en) * | 2010-09-09 | 2012-03-15 | Srinath Nadig S | Balanced thread creation and task allocation |
US8713256B2 (en) | 2011-12-23 | 2014-04-29 | Intel Corporation | Method, apparatus, and system for energy efficiency and energy conservation including dynamic cache sizing and cache operating voltage management for optimal power performance |
US20140068621A1 (en) * | 2012-08-30 | 2014-03-06 | Sriram Sitaraman | Dynamic storage-aware job scheduling |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10169051B2 (en) * | 2013-12-05 | 2019-01-01 | Blue Yonder GmbH | Data processing device, processor core array and method for characterizing behavior of equipment under observation |
WO2018018373A1 (en) * | 2016-07-25 | 2018-02-01 | 张升泽 | Power calculation method and system for multiple core chips |
WO2018018425A1 (en) * | 2016-07-26 | 2018-02-01 | 张升泽 | Method and system for allocating threads of multi-kernel chip |
WO2018018491A1 (en) * | 2016-07-28 | 2018-02-01 | 张升泽 | Method and system for allocating voltage of electronic chip in plurality of intervals |
US20180285290A1 (en) * | 2017-03-30 | 2018-10-04 | Futurewei Technologies, Inc. | Distributed and shared memory controller |
US10769080B2 (en) * | 2017-03-30 | 2020-09-08 | Futurewei Technologies, Inc. | Distributed and shared memory controller |
US11586558B2 (en) | 2017-04-24 | 2023-02-21 | Intel Corporation | System cache optimizations for deep learning compute engines |
US20180307624A1 (en) * | 2017-04-24 | 2018-10-25 | Intel Corporation | System cache optimizations for deep learning compute engines |
US11003592B2 (en) * | 2017-04-24 | 2021-05-11 | Intel Corporation | System cache optimizations for deep learning compute engines |
US11914525B2 (en) | 2017-04-24 | 2024-02-27 | Intel Corporation | System cache optimizations for deep learning compute engines |
CN111522420A (en) * | 2019-01-17 | 2020-08-11 | 电子科技大学 | Multi-core chip dynamic thermal management method based on power budget |
US20230046808A1 (en) * | 2020-09-04 | 2023-02-16 | Micron Technology, Inc. | Volatile memory to non-volatile memory interface for power management |
US11960738B2 (en) * | 2020-09-04 | 2024-04-16 | Micron Technology, Inc. | Volatile memory to non-volatile memory interface for power management |
US20230028837A1 (en) * | 2021-07-23 | 2023-01-26 | Vmware, Inc. | Scaling for split-networking datapath |
Also Published As
Publication number | Publication date |
---|---|
US9256470B1 (en) | 2016-02-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9256470B1 (en) | | Job assignment in a multi-core processor |
CN111406250B (en) | | Provisioning using prefetched data in a serverless computing environment |
US8489744B2 (en) | | Selecting a host from a host cluster for live migration of a virtual machine |
US10048976B2 (en) | | Allocation of virtual machines to physical machines through dominant resource assisted heuristics |
US10037222B2 (en) | | Virtualization of hardware accelerator allowing simultaneous reading and writing |
US8694638B2 (en) | | Selecting a host from a host cluster to run a virtual machine |
KR101686010B1 (en) | | Apparatus for fair scheduling of synchronization in realtime multi-core systems and method of the same |
US8738875B2 (en) | | Increasing memory capacity in power-constrained systems |
US8689226B2 (en) | | Assigning resources to processing stages of a processing subsystem |
EP3111333B1 (en) | | Thread and data assignment in multi-core processors |
US8656405B2 (en) | | Pulling heavy tasks and pushing light tasks across multiple processor units of differing capacity |
US20130167152A1 (en) | | Multi-core-based computing apparatus having hierarchical scheduler and hierarchical scheduling method |
US20110161637A1 (en) | | Apparatus and method for parallel processing |
US10445131B2 (en) | | Core prioritization for heterogeneous on-chip networks |
US20150301858A1 (en) | | Multiprocessors systems and processes scheduling methods thereof |
JP6679146B2 (en) | | Event-Driven Reoptimization of Logically Partitioned Environments for Power Management |
TW201140442A (en) | | Accelerating a wake-up time of a system |
US20150378782A1 (en) | | Scheduling of tasks on idle processors without context switching |
US8352702B2 (en) | | Data processing system memory allocation |
US20160253216A1 (en) | | Ordering schemes for network and storage i/o requests for minimizing workload idle time and inter-workload interference |
US12001880B2 (en) | | Multi-core system and method of controlling operation of the same |
US20160170474A1 (en) | | Power-saving control system, control device, control method, and control program for server equipped with non-volatile memory |
WO2017020798A1 (en) | | Core load knowledge for elastic load balancing of threads |
US9612907B2 (en) | | Power efficient distribution and execution of tasks upon hardware fault with multiple processors |
US9652298B2 (en) | | Power-aware scheduling |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: EMPIRE TECHNOLOGY DEVELOPMENT LLC, DELAWARE Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GARG, SIDDHARTH;RAGHUNATHAN, BHARATHWAJ;SIGNING DATES FROM 20140408 TO 20140722;REEL/FRAME:033425/0597 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | AS | Assignment | Owner name: CRESTLINE DIRECT FINANCE, L.P., TEXAS Free format text: SECURITY INTEREST;ASSIGNOR:EMPIRE TECHNOLOGY DEVELOPMENT LLC;REEL/FRAME:048373/0217 Effective date: 20181228 |
| | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | FEPP | Fee payment procedure | Free format text: SURCHARGE FOR LATE PAYMENT, LARGE ENTITY (ORIGINAL EVENT CODE: M1554); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
| | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: EMPIRE TECHNOLOGY DEVELOPMENT LLC, WASHINGTON Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CRESTLINE DIRECT FINANCE, L.P.;REEL/FRAME:065712/0585 Effective date: 20231004 |
| | PRDP | Patent reinstated due to the acceptance of a late maintenance fee | Effective date: 20240307 |
| | FEPP | Fee payment procedure | Free format text: PETITION RELATED TO MAINTENANCE FEES FILED (ORIGINAL EVENT CODE: PMFP); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Free format text: PETITION RELATED TO MAINTENANCE FEES GRANTED (ORIGINAL EVENT CODE: PMFG); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Free format text: SURCHARGE, PETITION TO ACCEPT PYMT AFTER EXP, UNINTENTIONAL (ORIGINAL EVENT CODE: M1558); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |