CN102135793A - Mixed dividing method of low-power-consumption multi-core shared cache - Google Patents

Mixed dividing method of low-power-consumption multi-core shared cache

Info

Publication number
CN102135793A
CN102135793A, CN201110076723A, CN 201110076723
Authority
CN
China
Prior art keywords
ipc
cache
thread
formula
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201110076723
Other languages
Chinese (zh)
Other versions
CN102135793B (en)
Inventor
方娟 (Fang Juan)
杜文娟 (Du Wenjuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN2011100767236A priority Critical patent/CN102135793B/en
Publication of CN102135793A publication Critical patent/CN102135793A/en
Application granted granted Critical
Publication of CN102135793B publication Critical patent/CN102135793B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to a hybrid (mixed) partitioning method for a low-power multi-core shared cache, belonging to the field of computer architecture. As the number of cores integrated on a chip increases, low-power design becomes an inevitable trend; however, most traditional cache partitioning methods consider only throughput or fairness and ignore power consumption. The invention provides a novel low-power partitioning method. Exploiting the locality principle of programs, the method merges threads whose access behaviour in the second-level cache differs greatly into a single partitioning unit and partitions the cache by columns. When the same application is run, fewer cache columns are used and the remaining columns are shut down, so that power consumption is reduced while the required performance is maintained.

Description

A hybrid partitioning method for a low-power-oriented multi-core shared Cache
Technical field
The invention belongs to the field of computer architecture and specifically relates to a low-power-oriented hybrid partitioning method for a multi-core shared Cache.
Background art
As the number of cores integrated on a chip increases, the surface temperature of the processor rises higher and higher, growing exponentially. High power consumption not only means large energy consumption; heat accumulation and ever-increasing power density also threaten the stability of the system. Excessive power consumption limits further improvement of processor performance: raising the frequency further or enlarging the cache drives the processor's power consumption still higher, leading into a vicious cycle. Facing this power pressure, low-power design has become a key problem in future microprocessor design.
However, current research on multi-core shared second-level cache partitioning strategies is almost entirely oriented toward throughput or fairness, and research oriented toward low power is rare. The few low-power-oriented partitioning methods are realized at the cost of some performance and make no substantial improvement to the partitioning method itself.
Summary of the invention
The present invention exploits the locality principle of program execution and combines the private and shared resource-allocation modes to implement Cache partitioning, so that when the same application is run, as few Cache columns as possible are used and the remaining columns are shut down, thereby reducing system power consumption. To guarantee system performance, the invention adopts the system IPC as the performance metric, where IPC (Instructions Per Cycle) is the number of instructions executed per clock cycle.
The technical scheme provided by the invention is as follows:
The dynamic partitioning strategy of the invention consists of three main steps: initialization, partitioning and rollback. The algorithm has four basic parameters: the difference-degree threshold R_share, the rollback threshold IPC_[initial], the performance-loss threshold PLT and the time slice t. R_share is generally between 50% and 200%, IPC_[initial] is obtained by testing, PLT lies between 0 and 3%, and t lies between 100000 and 5000000 clock cycles. The main flow of the algorithm is as follows (an illustrative grouping of these parameters is sketched after this paragraph):
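As a reading aid only, the four tuning parameters can be collected into a small configuration record. The Python sketch below is not part of the patent text, and the concrete default values are merely hypothetical choices within the ranges stated above.

```python
from dataclasses import dataclass

@dataclass
class PartitionConfig:
    """Tuning parameters of the dynamic partitioning strategy (illustrative defaults only)."""
    r_share: float = 1.0        # difference-degree threshold R_share, typically 0.5 .. 2.0 (50%..200%)
    ipc_initial: float = 0.0    # rollback threshold IPC_[initial], obtained from a test run
    plt: float = 0.02           # performance-loss threshold PLT, typically 0 .. 0.03
    timeslice: int = 1_000_000  # time slice t in clock cycles, typically 1e5 .. 5e6

cfg = PartitionConfig()  # values would be tuned per platform
```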
1. A hybrid partitioning method for a low-power multi-core shared Cache, characterized by comprising the following steps:
(1) Initialization:
1.1) Treat each thread as an independent partitioning unit and allocate one column of the second-level cache to each partitioning unit;
1.2) After a time slice t, determine whether the program has finished running; if so, jump to step (4); otherwise continue with step 1.3). The time slice t refers to dividing the running time of the application into equal segments, each segment being called a time slice t;
1.3) Compute the system performance IPC_[par] according to formula (1); IPC is the number of instructions executed per clock cycle (Instructions Per Cycle):

IPC_{[par]} = \sum_{i=1}^{n} IPC_{[app_i]} = \sum_{i=1}^{n} \frac{1}{\alpha + \beta \times \theta_i(x)}        (1)

where n is the total number of threads of the application and IPC_{[app_i]} (1 ≤ i ≤ n) is the number of instructions executed per clock cycle by thread i, characterizing the performance of thread i, computed by formula (2):

IPC_{[app_i]} = \frac{1}{\alpha + \beta \times \theta_i(x)}        (2)

In formula (2), α and β are computed by formula (3), and θ_i(x) is the miss rate of thread i obtained from its stack-distance profile, characterizing the misses of thread i during a time slice t when thread i is allocated a second-level cache of size x;

\alpha = CPI_{[base]} + E_1 + M_1 \times E_2, \quad \beta = E_3        (3)

In formula (3), CPI_{[base]} is the base number of execution cycles per instruction, E_1 is the hit cost of accessing the first-level Cache, M_1 is the first-level Cache miss rate, E_2 is the hit cost of accessing the second-level cache on a first-level Cache miss, and E_3 is the hit cost of accessing main memory on a second-level cache miss; the values of M_1, E_1, E_2, E_3 and CPI_{[base]} are determined by the computer architecture used;
1.4) If at this moment the system performance satisfies IPC_[par] ≥ (1 − PLT) × IPC_[initial], shut down the remaining Cache columns and execute step (3); otherwise execute step (2). Here the rollback threshold IPC_[initial] denotes the system performance under a pure column partition of the Cache, and PLT denotes the performance-loss threshold. A sketch of this performance computation is given below;
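The following Python sketch (not part of the patent text) shows how formulas (1) to (3) combine into the system IPC, assuming the per-thread miss rates θ_i(x) have already been read from the miss-rate monitors; the machine constants are those of the embodiment below and the sample miss rates are hypothetical.

```python
def alpha_beta(cpi_base, e1, m1, e2, e3):
    """Formula (3): alpha = CPI_[base] + E1 + M1*E2, beta = E3."""
    return cpi_base + e1 + m1 * e2, e3

def thread_ipc(theta, alpha, beta):
    """Formula (2): IPC of one thread given its second-level cache miss rate theta(x)."""
    return 1.0 / (alpha + beta * theta)

def system_ipc(thetas, alpha, beta):
    """Formula (1): the system IPC is the sum of the per-thread IPC values."""
    return sum(thread_ipc(theta, alpha, beta) for theta in thetas)

# Constants of the embodiment: CPI_[base]=0.5, E1=3, M1=0.1, E2=6, E3=158 -> alpha=4.1, beta=158
alpha, beta = alpha_beta(0.5, 3, 0.1, 6, 158)
print(system_ipc([0.01, 0.02, 0.005, 0.03], alpha, beta))  # hypothetical miss rates for 4 threads
```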
(2) Partitioning stage: let all unmerged threads T form the thread set TS, TS = {T_i | 1 ≤ i ≤ n}.
2.1) If |TS| ≥ 2, execute step 2.2); otherwise jump to step 2.5);
2.2) Compute, according to formula (4), the difference degree R_diff with which any two threads in TS access the Cache columns:

R_{diff} = |P_i - P_j|, \quad (1 \le i \le n, 1 \le j \le n, i \ne j)        (4)

where P is the access bias of a thread, computed by formula (5):

P = \frac{U_{up} - U_{down}}{U_{used}}        (5)

The invention divides each Cache column into an upper half and a lower half: U_up denotes the number of Cache blocks in the upper half of the column that the thread has accessed, U_down the number of blocks accessed in the lower half, and U_used the total number of Cache blocks accessed in that column;
2.3) Among all thread pairs whose difference degree satisfies R_diff ≥ R_share, merge the two threads <T_i, T_j> with the maximum R_diff into one partitioning unit, and set TS = TS − {T_i, T_j}; here R_share denotes the difference-degree threshold;
2.4) If |TS| ≥ 2, execute step 2.3); otherwise execute step 2.5);
2.5) Determine whether there are unallocated Cache columns; if so, execute step 2.6); otherwise execute step (3);
2.6) Compute the IPC gain Plus_IPC_k(x) of every partitioning unit. Plus_IPC_k(x) is the IPC gain of partitioning unit k (1 ≤ k ≤ n) when the number of Cache columns allocated to it increases from x (x ∈ [0, C_L2)) to x + 1, where C_L2 is the total number of second-level cache columns; Plus_IPC_k(x) is computed by formula (6):

Plus\_IPC_k(x) = \sum_{T_i \in unit\ k} \left[ IPC_{[app_i]}(x+1) - IPC_{[app_i]}(x) \right]        (6)

where the function IPC_{[app_i]}(x) is computed by formula (2);
2.7) Allocate one Cache column to the partitioning unit with the maximum Plus_IPC_k(x);
2.8) Compute the system partitioning performance IPC_[par] according to formula (1);
2.9) If at this moment IPC_[par] ≥ (1 − PLT) × IPC_[initial], the partitioning stage ends: shut down the remaining Cache columns and go to step (3); otherwise execute step 2.5). A sketch of this greedy allocation loop is given below;
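The column-allocation loop of steps 2.5) to 2.9) amounts to a greedy procedure: while columns remain and the performance constraint of step 2.9) is not yet met, one column is handed to the unit with the largest Plus_IPC. The sketch below assumes a caller-supplied function ipc_of_thread(t, cols) that evaluates formula (2) for thread t at a given column count; it is an illustration, not the claimed hardware mechanism.

```python
def plus_ipc(ipc_of_thread, unit, cols):
    """Formula (6): IPC gain of a partitioning unit when its allocation grows from cols to cols + 1."""
    return sum(ipc_of_thread(t, cols + 1) - ipc_of_thread(t, cols) for t in unit)

def partition_stage(units, ipc_of_thread, total_cols, ipc_initial, plt):
    """Greedy sketch of steps 2.5)-2.9); units is a list of lists of thread ids."""
    alloc = [1] * len(units)              # every unit starts with one column
    free = total_cols - sum(alloc)
    while free > 0:
        ipc_par = sum(ipc_of_thread(t, alloc[k])
                      for k, unit in enumerate(units) for t in unit)
        if ipc_par >= (1 - plt) * ipc_initial:
            break                         # step 2.9: constraint met, leave remaining columns off
        gains = [plus_ipc(ipc_of_thread, unit, alloc[k]) for k, unit in enumerate(units)]
        k_best = max(range(len(units)), key=gains.__getitem__)
        alloc[k_best] += 1                # step 2.7: one more column for the largest gain
        free -= 1
    return alloc                          # columns beyond sum(alloc) can be powered down
```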
(3) Rollback stage: after the program has run for a further time slice t,
3.1) determine whether the program has finished running; if so, go to step (4); otherwise execute step 3.2);
3.2) compute the system partitioning performance IPC_[par] according to formula (1);
3.3) if at this moment IPC_[par] ≥ (1 − PLT) × IPC_[initial], go to step (3); otherwise restore the initial partition and execute step (2);
(4) Output the running results and evaluate the power saved by the system with a power-consumption evaluation tool.
In implementing the above partitioning strategy, the invention adopts IPC (Instructions Per Cycle, the number of instructions executed per clock cycle) as the measure of system performance and characterizes system performance by combining the miss rate with the ideal IPC, as in formula (1). In summary, the essence of this partitioning algorithm is to transform the low-power-oriented multi-core Cache partitioning problem into solving the optimization problem of formula (8) subject to the constraint of formula (7).

\sum_{i=1}^{n} \frac{1}{\alpha + \beta \times \theta_i(c_i)} \ge (1 - PLT) \times IPC_{[initial]}        (7)

E_{consum} = E_{col} \times \max\left( C_{L2} - n - \sum_{k \in Units} c_k \right), \quad c_k > 0        (8)

Formula (7) is the constraint: the system performance after partitioning must be at least as high as the performance of the basic column partition, less the allowed loss threshold. Formula (8) is the objective function, where E_col denotes the average power consumed by one Cache column, C_L2 denotes the total number of second-level cache columns, n denotes the number of Cache columns allocated at initialization, Units denotes the set of all partitioning units, c_k denotes the number of Cache columns allocated to partitioning unit k, and \sum_{k \in Units} c_k is the total number of Cache columns allocated to the partitioning units during the partitioning stage; E_consum then represents the total power saved while the program runs. It follows that the less second-level cache the system uses, the less power it consumes; therefore the program shuts down as many second-level cache columns as possible during execution, reducing power consumption to the greatest possible extent.
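For illustration, the saved power of formula (8) reduces to a single expression. The max(·) in the patent formula is read here as clamping at zero, which is an assumption, and the numeric arguments in the example are hypothetical.

```python
def saved_power(e_col, total_cols, n_initial, unit_cols):
    """Formula (8): power of the second-level cache columns that remain switched off."""
    unused = total_cols - n_initial - sum(unit_cols)
    return e_col * max(unused, 0)

# e.g. a 16-column L2, 4 columns used at initialization, 3 more handed out while partitioning
print(saved_power(e_col=0.5, total_cols=16, n_initial=4, unit_cols=[2, 1]))
```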
To collect dynamically the miss rate of the application under different Cache capacities, each core is augmented with a miss-rate monitor (MRM). In this way each core can model the access behaviour it would exhibit if it used the second-level cache exclusively. From the MRM records, the eviction behaviour of Cache columns at different stack distances can be obtained, and from these records the system can compute the miss rate θ(x) obtained when a partitioning unit is allocated a second-level cache of any given size x.
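One way to derive θ(x) from an MRM record is the usual LRU stack-distance argument: an access hits in an x-column allocation only if its stack distance fits within the blocks those columns provide. The histogram layout and the blocks-per-column figure in this sketch are assumptions, not details fixed by the patent.

```python
def miss_rate(stack_hist, x_cols, blocks_per_col=1, cold_misses=0):
    """theta(x): fraction of accesses whose LRU stack distance does not fit in x columns.

    stack_hist[d] counts reuses at stack distance d; cold_misses counts first-touch accesses,
    which miss regardless of the allocated capacity.
    """
    capacity = x_cols * blocks_per_col
    total = sum(stack_hist) + cold_misses
    if total == 0:
        return 0.0
    hits = sum(count for d, count in enumerate(stack_hist) if d < capacity)
    return 1.0 - hits / total

# hypothetical histogram: most reuses fall within the first four stack positions
print(miss_rate([50, 30, 10, 5, 3, 2], x_cols=2, blocks_per_col=2, cold_misses=10))
```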
To count the access difference of every Cache column, the invention also adds two access counters, ACU and ACD, to each core; they record the number of accesses to the upper half and to the lower half of a Cache column respectively, from which the access difference degree R_diff of the Cache columns is computed.
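Reading the two counters and turning them into the access bias of formula (5) and the pairwise difference degrees of formula (4) is simple arithmetic. In this sketch the counters are plain integers standing in for the ACU/ACD hardware registers, and U_used is taken as U_up + U_down, which is an assumption.

```python
def access_bias(up, down):
    """Formula (5): P = (U_up - U_down) / U_used, with U_used taken as U_up + U_down."""
    used = up + down
    return 0.0 if used == 0 else (up - down) / used

def diff_matrix(biases):
    """Formula (4): pairwise difference degrees R_diff = |P_i - P_j| (None on the diagonal)."""
    n = len(biases)
    return [[None if i == j else abs(biases[i] - biases[j]) for j in range(n)] for i in range(n)]

# hypothetical counter readings giving P = 100%, -80%, 90%, -85% as in the embodiment below
biases = [access_bias(100, 0), access_bias(10, 90), access_bias(95, 5), access_bias(15, 185)]
for row in diff_matrix(biases):
    print(row)
```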
Description of the drawings
Fig. 1 is the flowchart of the method of the invention;
Fig. 2 is the flowchart of the hybrid partitioning process of the invention;
Embodiment:
The partitioning method of the invention is described in detail below, taking a chip multiprocessor with a two-level Cache hierarchy as an example.
The configuration is shown in Table 1 (the detailed processor and Cache parameters appear as an image in the original publication).
Table 1
The four parameters take the following values on this processor: difference-degree threshold R_share = 100%, rollback threshold IPC_[initial] obtained by testing, performance-loss threshold PLT = 0, and time slice t = 100000 cycles; CACTI, a power-evaluation tool commonly used in this field, is adopted as the power-consumption evaluation tool, and 4 threads are run. The concrete steps are as follows:
(1) Initialization:
1.1) Treat each thread of the application as an independent partitioning unit and allocate one column of the second-level cache to each partitioning unit;
1.2) After a time slice, determine whether the program has finished running; if so, jump to step (4); otherwise continue with step 1.3);
1.3) For the present computer architecture, take M_1 = 0.1, E_1 = 3, E_2 = 6, E_3 = 158 and CPI_[base] = 0.5, with n = 4; then α = CPI_[base] + E_1 + M_1 × E_2 = 0.5 + 3 + 0.1 × 6 = 4.1 and β = E_3 = 158. θ_i(x) is determined by the running program and is obtained from the stack-distance profile according to the MRM monitoring results. Compute the system performance IPC_[par]:

IPC_{[par]} = \sum_{i=1}^{4} \frac{1}{\alpha + \beta \times \theta_i(x)} = \sum_{i=1}^{4} \frac{1}{4.1 + 158 \times \theta_i(x)}

1.4) If at this moment the system performance satisfies IPC_[par] ≥ IPC_[initial] (PLT = 0 here), shut down the remaining Cache columns and execute step (3); otherwise execute step (2);
(2) Partitioning stage: let all unmerged threads T form the thread set TS, TS = {T_i | 1 ≤ i ≤ 4}.
2.1) If |TS| ≥ 2, execute step 2.2); otherwise jump to step 2.5);
2.2) Read U_up and U_down of each thread from ACU and ACD and compute the access bias P. Suppose that at this moment P_1 = 100%, P_2 = −80%, P_3 = 90%, P_4 = −85%. Compute, according to the following formula, the difference degree R_diff with which any two threads in TS access the Cache columns:

R_{diff} = |P_i - P_j|, \quad (1 \le i \le 4, 1 \le j \le 4, i \ne j)

The computed R_diff values are listed in the following table:
R_diff    P_1      P_2      P_3      P_4
P_1       ——       180%     10%      185%
P_2       180%     ——       170%     5%
P_3       10%      170%     ——       175%
P_4       185%     5%       175%     ——
2.3) At this moment the thread pairs with R_diff ≥ 100% are <T_1, T_2>, <T_1, T_4>, <T_2, T_3> and <T_3, T_4>; the two threads with the maximum R_diff, <T_1, T_4>, are merged into one partitioning unit, and TS = TS − {T_1, T_4};
2.4) At this moment |TS| ≥ 2 and the difference degree of <T_2, T_3> is R_diff = 170% ≥ 100%, which satisfies the merge condition, so threads <T_2, T_3> are merged as well; now |TS| < 2, so execute step 2.5);
2.5) Determine whether there are unallocated Cache columns; if so, execute step 2.6); otherwise execute step (3);
2.6) After the thread merging the system has two partitioning units, partitioning unit 1 <T_1, T_4> and partitioning unit 2 <T_2, T_3>; compute the IPC gains Plus_IPC_k(x), k = 1, 2, of these two partitioning units by the following formulas:

Plus\_IPC_1(x) = \left[ IPC_{[app_1]}(x+1) - IPC_{[app_1]}(x) \right] + \left[ IPC_{[app_4]}(x+1) - IPC_{[app_4]}(x) \right]

Plus\_IPC_2(x) = \left[ IPC_{[app_2]}(x+1) - IPC_{[app_2]}(x) \right] + \left[ IPC_{[app_3]}(x+1) - IPC_{[app_3]}(x) \right]

2.7) If at this moment Plus_IPC_1(x) ≥ Plus_IPC_2(x), allocate one Cache column to partitioning unit 1, i.e. unit <T_1, T_4>; otherwise allocate one Cache column to partitioning unit 2, i.e. unit <T_2, T_3>;
2.8) Compute the system partitioning performance IPC_[par];
2.9) If at this moment IPC_[par] ≥ IPC_[initial], the partitioning stage ends: shut down the remaining Cache columns and go to step (3); otherwise execute step 2.5);
(3) Rollback stage: after the program has run for a further time slice,
3.1) determine whether the program has finished running; if so, go to step (4); otherwise execute step 3.2);
3.2) compute the system partitioning performance IPC_[par] according to formula (1);
3.3) if at this moment IPC_[par] ≥ IPC_[initial], this rollback check ends and the method returns to step (3); otherwise restore the initial partition and execute step (2);
(4) Output the running results and evaluate the power saved by the system with the power-consumption evaluation tool CACTI. A worked sketch of the thread merging performed in this embodiment follows.
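For completeness, the thread merging of steps 2.2) to 2.4) of this embodiment can be replayed in a few lines of Python; with the assumed bias values P_1 = 100%, P_2 = -80%, P_3 = 90%, P_4 = -85% it reproduces the R_diff table above and ends with the two partitioning units <T_1, T_4> and <T_2, T_3>. This is an illustrative sketch, not part of the claimed method.

```python
from itertools import combinations

def merge_threads(biases, r_share=1.0):
    """Repeatedly merge the remaining thread pair with the largest R_diff >= R_share."""
    remaining = list(range(len(biases)))       # indices of threads still in the set TS (0-based)
    units = []
    while len(remaining) >= 2:
        (i, j), diff = max((((i, j), abs(biases[i] - biases[j]))
                            for i, j in combinations(remaining, 2)), key=lambda p: p[1])
        if diff < r_share:
            break                              # no remaining pair differs enough to share columns
        units.append((i, j))
        remaining.remove(i)
        remaining.remove(j)
    units.extend((i,) for i in remaining)      # leftover threads stay as single-thread units
    return units

print(merge_threads([1.0, -0.8, 0.9, -0.85]))  # -> [(0, 3), (1, 2)], i.e. <T1, T4> and <T2, T3>
```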

Claims (1)

1. A hybrid partitioning method for a low-power multi-core shared Cache, characterized by comprising the following steps:
(1) Initialization:
1.1) Treat each thread as an independent partitioning unit and allocate one column of the second-level cache to each partitioning unit;
1.2) After a time slice t, determine whether the program has finished running; if so, jump to step (4); otherwise continue with step 1.3). The time slice t refers to dividing the running time of the application into equal segments, each segment being called a time slice t;
1.3) Compute the system performance IPC_[par] according to formula (1); IPC is the number of instructions executed per clock cycle (Instructions Per Cycle):

IPC_{[par]} = \sum_{i=1}^{n} IPC_{[app_i]} = \sum_{i=1}^{n} \frac{1}{\alpha + \beta \times \theta_i(x)}        (1)

where n is the total number of threads of the application and IPC_{[app_i]} (1 ≤ i ≤ n) is the number of instructions executed per clock cycle by thread i, characterizing the performance of thread i, computed by formula (2):

IPC_{[app_i]} = \frac{1}{\alpha + \beta \times \theta_i(x)}        (2)

In formula (2), α and β are computed by formula (3), and θ_i(x) is the miss rate of thread i obtained from its stack-distance profile, characterizing the misses of thread i during a time slice t when thread i is allocated a second-level cache of size x;

\alpha = CPI_{[base]} + E_1 + M_1 \times E_2, \quad \beta = E_3        (3)

In formula (3), CPI_{[base]} is the base number of execution cycles per instruction, E_1 is the hit cost of accessing the first-level Cache, M_1 is the first-level Cache miss rate, E_2 is the hit cost of accessing the second-level cache on a first-level Cache miss, and E_3 is the hit cost of accessing main memory on a second-level cache miss; the values of M_1, E_1, E_2, E_3 and CPI_{[base]} are determined by the computer architecture used;
1.4) If at this moment the system performance satisfies IPC_[par] ≥ (1 − PLT) × IPC_[initial], shut down the remaining Cache columns and execute step (3); otherwise execute step (2). Here the rollback threshold IPC_[initial] denotes the system performance under a pure column partition of the Cache, and PLT denotes the performance-loss threshold;
(2) Partitioning stage: let all unmerged threads T form the thread set TS, TS = {T_i | 1 ≤ i ≤ n}.
2.1) If |TS| ≥ 2, execute step 2.2); otherwise jump to step 2.5);
2.2) Compute, according to formula (4), the difference degree R_diff with which any two threads in TS access the Cache columns:

R_{diff} = |P_i - P_j|, \quad (1 \le i \le n, 1 \le j \le n, i \ne j)        (4)

where P is the access bias of a thread, computed by formula (5):

P = \frac{U_{up} - U_{down}}{U_{used}}        (5)

Each Cache column is divided into an upper half and a lower half: U_up denotes the number of Cache blocks in the upper half of the column that the thread has accessed, U_down the number of blocks accessed in the lower half, and U_used the total number of Cache blocks accessed in that column;
2.3) Among all thread pairs whose difference degree satisfies R_diff ≥ R_share, merge the two threads <T_i, T_j> with the maximum R_diff into one partitioning unit, and set TS = TS − {T_i, T_j}; here R_share denotes the difference-degree threshold;
2.4) If |TS| ≥ 2, execute step 2.3); otherwise execute step 2.5);
2.5) Determine whether there are unallocated Cache columns; if so, execute step 2.6); otherwise execute step (3);
2.6) Compute the IPC gain Plus_IPC_k(x) of every partitioning unit. Plus_IPC_k(x) is the IPC gain of partitioning unit k (1 ≤ k ≤ n) when the number of Cache columns allocated to it increases from x (x ∈ [0, C_L2)) to x + 1, where C_L2 is the total number of second-level cache columns; Plus_IPC_k(x) is computed by formula (6):

Plus\_IPC_k(x) = \sum_{T_i \in unit\ k} \left[ IPC_{[app_i]}(x+1) - IPC_{[app_i]}(x) \right]        (6)

where the function IPC_{[app_i]}(x) is computed by formula (2);
2.7) Allocate one Cache column to the partitioning unit with the maximum Plus_IPC_k(x);
2.8) Compute the system partitioning performance IPC_[par] according to formula (1);
2.9) If at this moment IPC_[par] ≥ (1 − PLT) × IPC_[initial], the partitioning stage ends: shut down the remaining Cache columns and go to step (3); otherwise execute step 2.5);
(3) Rollback stage: after the program has run for a further time slice t,
3.1) determine whether the program has finished running; if so, go to step (4); otherwise execute step 3.2);
3.2) compute the system partitioning performance IPC_[par] according to formula (1);
3.3) if at this moment IPC_[par] ≥ (1 − PLT) × IPC_[initial], go to step (3); otherwise restore the initial partition and execute step (2);
(4) Output the running results and evaluate the power saved by the system with a power-consumption evaluation tool.
CN2011100767236A 2011-03-29 2011-03-29 Mixed dividing method of low-power-consumption multi-core shared cache Expired - Fee Related CN102135793B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100767236A CN102135793B (en) 2011-03-29 2011-03-29 Mixed dividing method of low-power-consumption multi-core shared cache

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011100767236A CN102135793B (en) 2011-03-29 2011-03-29 Mixed dividing method of low-power-consumption multi-core shared cache

Publications (2)

Publication Number Publication Date
CN102135793A true CN102135793A (en) 2011-07-27
CN102135793B CN102135793B (en) 2012-07-04

Family

ID=44295596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100767236A Expired - Fee Related CN102135793B (en) 2011-03-29 2011-03-29 Mixed dividing method of low-power-consumption multi-core shared cache

Country Status (1)

Country Link
CN (1) CN102135793B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060018179A1 (en) * 2002-11-18 2006-01-26 Paul Marchal Cost-aware design-time/run-time memory management methods and apparatus
CN101739299A (en) * 2009-12-18 2010-06-16 北京工业大学 Method for dynamically and fairly partitioning shared cache based on chip multiprocessor

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI476583B (en) * 2011-09-23 2015-03-11 Nat Univ Tsing Hua Power aware computer simulation system and method thereof
CN103077128A (en) * 2012-12-29 2013-05-01 华中科技大学 Method for dynamically partitioning shared cache in multi-core environment
CN103077128B (en) * 2012-12-29 2015-09-23 华中科技大学 Shared buffer memory method for dynamically partitioning under a kind of multi-core environment
CN103150266A (en) * 2013-02-20 2013-06-12 北京工业大学 Improved multi-core shared cache replacing method
CN103150266B (en) * 2013-02-20 2015-10-28 北京工业大学 A kind of multinuclear cache sharing replacement method of improvement
CN106200868A (en) * 2016-06-29 2016-12-07 联想(北京)有限公司 Shared variable acquisition methods, device and polycaryon processor in polycaryon processor
CN106200868B (en) * 2016-06-29 2020-07-24 联想(北京)有限公司 Method and device for acquiring shared variables in multi-core processor and multi-core processor

Also Published As

Publication number Publication date
CN102135793B (en) 2012-07-04


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120704

Termination date: 20190329

CF01 Termination of patent right due to non-payment of annual fee