CN103593304B - Quantization method for effective cache use based on an LPT device model - Google Patents
- Publication number
- CN103593304B CN103593304B CN201210287737.7A CN201210287737A CN103593304B CN 103593304 B CN103593304 B CN 103593304B CN 201210287737 A CN201210287737 A CN 201210287737A CN 103593304 B CN103593304 B CN 103593304B
- Authority
- CN
- China
- Prior art keywords
- value
- cache hit
- current stack
- stack
- caching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
Abstract
A quantization method for the effective use of caches, based on an LPT device model, on a NUMA architecture. The method comprises six steps: check whether the number the CPU needs is present in the current stack; if it is not, fetch the number to be processed from the machine's physical memory; if it is, take out the nonzero number and increment the cache-hit counter by 1; move the number the CPU is to process to the top of the stack; increment the memory-access counter by 1; and compute the cache-hit rate of this program run. By means of the LPT device model, the caching quantization method enables quantitative study of the cache-hit rate for sparse matrices of different shapes and for different partitioning schemes. It can raise the cache-hit rate in sparse-matrix operations and reduce the use of communication bandwidth, and has broad practical value and application prospects in high-performance computing.
Description
Technical field
The present invention relates to a quantization method for the effective use of caches, based on an LPT device model, on high-performance computing platforms with a NUMA architecture. It specifically concerns a quantization method for the effective cache use of sparse-matrix operations on NUMA high-performance computers, and provides an implementation method and a theoretical basis for highly efficient caching of sparse-matrix operations on NUMA architectures. It belongs to the field of parallel numerical algorithms in high-performance computing.
Background art
The architecture of many earlier high-performance computers is NUMA. Its main characteristics are that the different processors on a compute node each have their own cache while sharing the host memory. Because the latency of a processor's cache access is tens of times lower than the latency of an access to shared machine memory, if the utilization of the data in the cache is low, data in main memory is accessed repeatedly, increasing the pressure on the communication channel. Moreover, accessing the nonzero elements of a sparse matrix requires additionally accessing the row and column subscripts of each nonzero element, which further increases memory-access latency and degrades the performance of the whole machine. The way to solve this problem is to make better use of the cache from the standpoint of algorithm design, increasing the reusability of the data in the cache, thereby reducing the number of processor memory accesses, reducing the volume of data communication between processor and memory, improving the efficiency of communication-bandwidth use, and so improving the performance of the high-performance computer.
Sparse-matrix operations are the core of many high-performance algorithms. Major applications of sparse matrix-matrix multiplication include graph-contraction algorithms, breadth-first search from multiple source vertices, recursive shortest-path algorithms, multigrid interpolation/restriction, and context-free-language parsing.
As the applications and the computational scale of sparse-matrix operations grow, most sparse-matrix operations are completed by parallel computation on high-performance computers. The sparse distribution of the matrix data makes effective cache use especially influential on the performance of a high-performance computer during a run, and different algorithms differ greatly in how efficiently they use the cache. A method that can quantitatively evaluate an algorithm's efficiency of cache use with as few runs of the high-performance computer as possible is therefore an indispensable technique for energy-efficient use of current high-performance computers. In high-performance computing, this technique is of significant importance for making better use of the performance the whole computer system provides and for saving energy.
Summary of the invention
1. Purpose: the object of the present invention is to provide a quantization method, based on an LPT device model, for highly effective caching on a NUMA architecture. Applying this model provides a quantitative measure of the cache-hit rate of different algorithms, reducing the number of repeated program runs needed for an algorithm to obtain good performance, thereby saving energy.
2. To achieve the above object, the technical scheme of the present invention is as follows.
The present invention studies a quantization method for the effective use of caches, based on an LPT device model, on a NUMA architecture, in which the cache model is implemented as a stack. The method comprises the following steps:
Step 1. Check whether the number the CPU needs for its computation is present in the current stack;
Step 2. If the number the CPU needs is not in the current stack, fetch the number to be processed from the machine's physical memory;
Step 3. If the number the CPU needs is in the current stack, take out the operand and increment the cache-hit counter by 1;
Step 4. Move the number the CPU is to process to the top of the stack;
Step 5. Increment the memory-access counter by 1;
Step 6. Compute the cache-hit rate of this program run from the values obtained in steps 3 and 5.
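Taken together, the steps above describe a stack-based software cache with two counters. The sketch below is an illustrative assumption (the patent provides no code): the class name, the dictionary standing in for physical memory, and evicting from the stack bottom when full are choices of this sketch, not details fixed by the text.

```python
class LPTCacheModel:
    """Software-managed cache modeled as a bounded stack of
    (row, col, value) triples, with cache-hit and memory-access
    counters as in steps 1-6."""

    def __init__(self, capacity):
        self.capacity = capacity  # based on the last-level cache size
        self.stack = []           # last list element = top of stack
        self.hits = 0             # step 3: cache-hit counter
        self.accesses = 0         # step 5: memory-access counter

    def access(self, row, col, physical_memory):
        """Fetch one nonzero element, updating both counters."""
        self.accesses += 1        # step 5: counted on hit and miss alike
        for i, (r, c, v) in enumerate(self.stack):
            if (r, c) == (row, col):                  # step 1: found in stack
                self.hits += 1                        # step 3: cache hit
                self.stack.append(self.stack.pop(i))  # step 4: move to top
                return v
        v = physical_memory[(row, col)]               # step 2: fetch from memory
        if len(self.stack) >= self.capacity:
            self.stack.pop(0)                         # evict bottom when full
        self.stack.append((row, col, v))              # step 4: place at top
        return v

    def hit_rate(self):
        """Step 6: cache hits divided by total memory accesses."""
        return self.hits / self.accesses if self.accesses else 0.0
```

For instance, with a one-entry cache, the access sequence (0,0), (0,0), (0,1), (0,0) records one hit in four accesses, a hit rate of 0.25.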
Advantages and effects: the present invention is a quantization method for effective cache use based on an LPT device model on a NUMA architecture. Compared with the prior art, its main advantages are: (1) it reduces the number of runs on the actual machine needed to find suitable parameters for effective cache use and good performance, saving electrical energy; (2) it uses a software-managed cache to gather accurate statistics of memory-access counts and cache-hit counts for sparse matrices of different shapes and for different cache sizes; (3) by means of the LPT device model, it achieves quantitative statistics of the cache-hit rate of sparse-matrix operations for sparse matrices of different shapes and different block-partitioning sizes.
Brief description of the drawings
Fig. 1: framework of the LPT device model of the present invention on a NUMA architecture
Fig. 2: flow of sparse matrix-matrix multiplication realized with the LPT device model of the present invention on a NUMA architecture
Detailed description of the invention
To make the object, technical solution, and advantages of the present invention clearer, the present invention is described below in further detail with reference to the accompanying drawings and a concrete example.
As shown in Fig. 1, the present invention studies a quantization method for the effective use of caches, based on an LPT device model, on a NUMA architecture, in which the cache model is implemented as a stack. The method comprises the following steps:
Step 1. Check whether the number the CPU needs for its computation is present in the current stack;
The main function of step 1 is to check whether the number the current CPU needs is present in the current stack. Each stack element of the present invention stores a nonzero value of the sparse matrix as a triple of the form (row value, column value, nonzero-element value); the row and column values of the triple are used to decide whether a cached number is the one the CPU needs. The capacity of the stack is based on the hardware cache capacity of the high-performance computer (its value is the size of the machine's last-level cache). To improve the portability of the program, the present invention saves the cache capacity in a designated text file in a designated directory of the high-performance computer, and the program reads this value from the file when it starts. The cache uses a first-in-first-out replacement policy and realizes the same function as the hardware cache, with one difference: this software-managed cache can count the number of memory accesses and the number of cache hits.
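The triple layout and the capacity file described here can be sketched as follows. The file name `cache_capacity.txt` and the fallback default are assumptions of this sketch; the patent only states that the last-level cache size is read from a designated text file at program start.

```python
from collections import namedtuple
from pathlib import Path

# One stack element: (row value, column value, nonzero-element value)
Triple = namedtuple("Triple", ["row", "col", "value"])

def read_cache_capacity(path="cache_capacity.txt", default=256):
    """Read the stack capacity (the last-level cache size) from a
    designated text file, for portability across machines; fall back
    to a default when the file is absent (hypothetical behavior)."""
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else default
```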
Step 2. If the number the CPU needs is not in the current stack, fetch the number to be processed from the machine's physical memory;
The main function of step 2 is: if the number to be processed is not in the current stack cache, fetch it from the physical memory of the high-performance computer;
Step 3. If the number the CPU needs is in the current stack, take out the operand and increment the cache-hit counter by 1;
The main function of step 3 is: if the number the current CPU needs for its computation is in the current cache stack, take out the operand and increment the cache-hit counter by 1. When the program has finished, the value of this variable is the accumulated number of cache hits of this program run;
Step 4. Move the number the CPU is to process to the top of the stack;
The main function of step 4 is to move the nonzero element just read to the top of the stack. The stack of the present invention comprises an array holding the stack elements, a top-of-stack pointer, the stack capacity, and the current number of nonzero elements in the stack; updating the stack means placing the row and column position of the nonzero element and its value at the top of the current stack;
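The stack record named here (element array, top-of-stack pointer, capacity, and nonzero-element count) can be sketched as a small data class; the field and method names are assumptions that mirror the description.

```python
from dataclasses import dataclass, field

@dataclass
class TripleStack:
    """Stack of (row, col, value) triples as described in step 4."""
    capacity: int                                 # stack capacity
    elements: list = field(default_factory=list)  # array of stack elements
    top: int = -1                                 # top-of-stack pointer (-1 = empty)
    count: int = 0                                # current nonzero-element count

    def push(self, triple):
        """Place a triple at the top of the stack and grow the count."""
        self.elements.append(triple)
        self.top += 1
        self.count += 1
```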
Step 5. Increment the memory-access counter by 1;
The main function of step 5 is to increment the memory-access counter by 1. When the program has finished, the value of this variable is the total number of memory accesses of this program run;
Step 6. Compute the cache-hit rate of this program run.
The main function of step 6 is to compute the cache-hit rate of this program run. When the program has finished, the cache-hit counter is divided by the memory-access counter; the result is the cache-hit rate of this algorithm on this example. The LPT device model of the present invention can accurately count the number of memory accesses and the number of cache hits for sparse matrices of different shapes and different block sizes, providing quantitative data for an accurate cache-hit-rate statistic;
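The division described in step 6 is, in sketch form (counter names are assumptions):

```python
def cache_hit_rate(cache_hits, memory_accesses):
    """Cache-hit counter divided by memory-access counter for one run."""
    if memory_accesses <= 0:
        raise ValueError("run recorded no memory accesses")
    return cache_hits / memory_accesses
```

For example, 750 hits over 1000 total accesses gives a hit rate of 0.75.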
The main idea of the present invention is that, on the hardware platform of a NUMA-architecture high-performance computer, applying the LPT device model provides a basis for the quantitative study of effective cache use.
First, when the program executes and the CPU needs to take a number for computation, it reads from the stack. When the number the CPU needs is in the stack, the cache-hit counter and the memory-access counter are both incremented by 1, the number just fetched is moved to the top of the stack, and the CPU takes the nonzero number for its computation. If the number the CPU needs is not in the current stack, it is read from the physical memory of the high-performance computer, the memory-access counter is incremented by 1, the nonzero number just read is moved to the top of the stack, and the CPU takes it for its computation. Finally, when the program has finished, the cache-hit rate of the algorithm is computed.
The method is illustrated below with an example; as shown in Fig. 2, it comprises the following steps:
Step 1: check whether the nonzero number the current CPU needs for its computation is in the current stack. The check first tests whether the nonzero-element count of the current stack is zero. If it is not zero, the current stack is not empty: read the numbers indicated by the current stack pointer one by one and determine whether each is the number the current CPU needs; if it is, perform step 3; otherwise check whether the current stack still has numbers not yet visited, and if so continue checking whether one of them is the number the CPU needs; if not, perform step 2. If the nonzero-element count of the current stack is zero, the number the CPU needs is not in the current stack, so perform step 2.
Step 2: the number the CPU needs for its computation is not in the current stack; fetch it from the physical memory of the current high-performance computer.
Step 3: the number the CPU needs for its computation is in the current stack; take this nonzero number out of the current stack, and at the same time increment the cache-hit counter by 1.
Step 4: update the number the CPU needs into the stack. If the number the CPU needs is already in the current stack, move it to the position of the current stack top; otherwise the number was fetched from the physical memory of the high-performance computer, and it is likewise placed at the top of the current stack. If the stack is now full, the element at the bottom of the stack is deleted first; otherwise the number is pushed onto the stack directly, and the stack's nonzero-element count is incremented by 1.
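The move-to-top update of step 4, including the delete-bottom-when-full case, can be sketched as follows; the list-with-top-at-the-end layout is an assumption of this sketch.

```python
def update_to_top(stack, capacity, entry):
    """Move `entry` to the stack top if it is already cached; otherwise
    push it, deleting the bottom-of-stack element first when the stack
    is full. `stack` is a list whose last element is the stack top."""
    if entry in stack:
        stack.remove(entry)   # cached: detach from its old position
    elif len(stack) >= capacity:
        stack.pop(0)          # full: delete the bottom-of-stack element
    stack.append(entry)       # place the entry at the top of the stack
    return stack
```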
Step 5: increment the memory-access counter by 1. Regardless of whether the current number was fetched from the stack or from the physical memory of the high-performance computer, the memory-access counter must be incremented by 1.
Step 6: the program has finished; compute the cache-hit rate of the algorithm. The calculation: divide the cache-hit counter by the memory-access counter; the quotient obtained is the cache-hit rate of the algorithm.
This example mainly studies the quantization method for effective cache use with the LPT device model; the measured cache behavior depends only on how the algorithm itself uses the cache, and is unrelated to the LPT device model of the present invention.
It should be noted, finally, that the above example serves only to illustrate and not to limit the technical solution of the present invention. Although the present invention has been described in detail with reference to the above embodiment, those skilled in the art will understand that the present invention may still be modified or its features replaced by equivalents; any modification or partial replacement that does not depart from the spirit and scope of the present invention shall be covered by the scope of the claims of the present invention.
Claims (1)
1. A quantization method for the effective use of caches based on an LPT device model, the method studying, on a NUMA architecture, the quantization of effective cache use based on an LPT device model, the cache model being implemented as a stack, characterized in that the method comprises the following steps: step 1, check whether the operand is present in the current stack; step 2, if the operand is not in the current stack, fetch the number to be processed from the machine's physical memory; step 3, if the number the CPU needs is in the current stack, take out the operand and increment the cache-hit counter by 1; step 4, move the operand found to the top of the stack; step 5, increment the memory-access counter by 1; step 6, compute the cache-hit rate of this program run from the values obtained in steps 3 and 5;
The stack elements in step 1 store the nonzero values of the sparse matrix as triples of the form (row value, column value, nonzero-element value); the row and column values of a triple are used to decide whether a cached number is the one the CPU needs. The capacity of the stack is based on the hardware cache capacity of the high-performance computer, its value being the size of the last-level cache; to improve the portability of the program, the cache capacity is saved in a designated text file in a designated directory of the high-performance computer, and this value is read from the designated file when the program starts. The cache uses a first-in-first-out replacement policy and realizes the same function as the hardware cache;
In step 3, if the operand is in the current stack, it is taken out and the cache-hit counter is incremented by 1; when the program has finished, the value of this counter is the accumulated number of cache hits of this program run;
In step 4, the operand found is moved to the top of the stack; the stack comprises an array holding the stack elements, a top-of-stack pointer, the stack capacity, and the current number of nonzero elements in the stack; updating the stack means placing the row and column position of the nonzero element and its value at the top of the current stack;
In step 5, the memory-access counter is incremented by 1; when the program has finished, the value of this counter is the total number of memory accesses of this program run.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210287737.7A CN103593304B (en) | 2012-08-14 | 2012-08-14 | The quantization method of effective use based on LPT device model caching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103593304A CN103593304A (en) | 2014-02-19 |
CN103593304B (en) | 2016-08-03
Family
ID=50083455
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210287737.7A Expired - Fee Related CN103593304B (en) | 2012-08-14 | 2012-08-14 | The quantization method of effective use based on LPT device model caching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103593304B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2015176245A (en) * | 2014-03-13 | 2015-10-05 | 株式会社東芝 | Information processing apparatus and data structure |
CN104504077B (en) * | 2014-12-22 | 2018-04-03 | 北京国双科技有限公司 | The statistical method and device of web page access data |
CN111338884B (en) * | 2018-12-19 | 2023-06-16 | 北京嘀嘀无限科技发展有限公司 | Cache miss rate monitoring method and device, electronic equipment and readable storage medium |
CN109783402A (en) * | 2018-12-28 | 2019-05-21 | 深圳竹云科技有限公司 | A kind of method of dynamic adjustment caching hot spot data |
CN110347487B (en) * | 2019-07-05 | 2021-03-23 | 中国人民大学 | Database application-oriented energy consumption characterization method and system for data movement |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102087634A (en) * | 2011-01-27 | 2011-06-08 | 凌阳科技股份有限公司 | Device and method for improving cache hit ratio |
- 2012-08-14: application CN201210287737.7A filed; granted as CN103593304B (status: Expired - Fee Related)
Non-Patent Citations (1)
Title |
---|
Haim Avron et al.; "Parallel Unsymmetric-Pattern Multifrontal Sparse LU with Column Preordering"; ACM Transactions on Mathematical Software; Vol. 34, No. 2; March 2008; pp. 41-71. *
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication |
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| C14 | Grant of patent or utility model |
| GR01 | Patent grant |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20160803; Termination date: 20180814