CN103593304B - The quantization method of effective use based on LPT device model caching - Google Patents


Info

Publication number
CN103593304B
CN103593304B CN201210287737.7A
Authority
CN
China
Prior art keywords
value
cache hit
current stack
stack
caching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201210287737.7A
Other languages
Chinese (zh)
Other versions
CN103593304A (en)
Inventor
陶袁
任可欣
付军
张运林
陈永胜
丁雪莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin Normal University
Original Assignee
Jilin Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin Normal University filed Critical Jilin Normal University
Priority to CN201210287737.7A priority Critical patent/CN103593304B/en
Publication of CN103593304A publication Critical patent/CN103593304A/en
Application granted granted Critical
Publication of CN103593304B publication Critical patent/CN103593304B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A quantization method for the effective use of caches, based on an LPT device model on a NUMA architecture. The method comprises six steps: check whether the value the CPU needs is present in the current stack; if it is not, fetch the value to be processed from the machine's physical memory; if it is, take out that non-zero value and increment the cache-hit counter by 1; move the value the CPU is to process to the top of the stack; increment the memory-access counter by 1; and compute the cache hit rate of this program run. The cache quantization method realized with the LPT device model supports quantitative study of the cache hit rate for sparse matrices of different shapes and for different partitioning schemes. It can improve the cache hit rate of sparse-matrix operations and reduce communication-bandwidth usage, and it has broad practical value and application prospects in high-performance computing.

Description

The quantization method of effective use based on LPT device model caching
Technical field
The present invention relates to a quantization method for the effective use of caches on a NUMA-architecture high-performance computing platform, based on an LPT device model. More specifically, it relates to a quantization method for the effective use of caches in sparse-matrix operations on NUMA-architecture high-performance computers, providing an implementation method and a theoretical basis for highly efficient caching of sparse-matrix operations on NUMA architectures. The invention belongs to the field of parallel numerical algorithms in high-performance computing.
Background technology
One common architecture for high-performance computers is the NUMA architecture. Its main characteristics are that the processors on a compute node each have their own cache while sharing the host memory. Because the latency of a cache access is tens of times lower than that of an access to shared machine memory, a low reuse rate of the data held in the cache causes the data in memory to be accessed repeatedly, increasing the pressure on the communication channel. Moreover, accessing the non-zero elements of a sparse matrix requires additional accesses to the row and column indices of those elements, which further increases memory-access latency and degrades the performance of the whole machine. The preferred way to solve this problem is to use the cache well at the algorithm-design level: increase the reusability of the data in the cache, thereby reducing the number of processor accesses to memory, reducing the volume of data traffic between processor and memory, improving the utilization of communication bandwidth, and thus improving the performance of the high-performance computer.
Sparse-matrix operations are core algorithms of many high-performance algorithms. Major applications, such as sparse matrix–sparse matrix multiplication, include graph contraction, multi-source breadth-first search, recursive shortest-path algorithms, multigrid interpolation/restriction, and parsing of context-free languages.
As the application scale and the computational scale of sparse-matrix operations grow, most sparse-matrix operations are completed by parallel computation on multiple high-performance computers. Because the sparse distribution of the data means that better cache use during sparse-matrix operations has a pronounced impact on high-performance-computer performance, and because different algorithms differ greatly in how efficiently they use the cache, a method that can quantitatively evaluate an algorithm's cache efficiency with as few runs of the high-performance computer as possible is an indispensable technique for energy-efficient high-performance computing. Such a technique is important both for making better use of the performance a complete computer system provides and for saving energy in high-performance computing.
Summary of the invention
1. Purpose: the object of the present invention is to provide a quantization method, based on an LPT device model, for highly efficient caching on a NUMA architecture. Applying this model provides a quantization method for the cache hit rate of different algorithms and reduces the number of repeated program runs needed for an algorithm to obtain good performance, thereby saving energy.
2. To achieve the above object, the technical scheme of the present invention is as follows:
The present invention studies a quantization method for the effective use of caches based on an LPT device model on a NUMA architecture, in which the cache model is implemented as a stack. The concrete method comprises the following steps:
Step 1. Check whether the value the CPU needs for its computation is present in the current stack.
Step 2. If the value the CPU needs is not in the current stack, fetch the value to be processed from the machine's physical memory.
Step 3. If the value the CPU needs is in the current stack, take out the operand and increment the cache-hit counter by 1.
Step 4. Move the found value that the CPU is to process to the top of the stack.
Step 5. Increment the memory-access counter by 1.
Step 6. Use the values obtained in steps 3 and 5 to compute the cache hit rate of this program run.
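The six steps above can be sketched as a software-managed stack cache. This is a minimal illustrative sketch, not the patent's implementation; the class and variable names (`StackCache`, `hits`, `accesses`) are assumptions, and the eviction detail follows the first-in-first-out policy described in the specification.

```python
class StackCache:
    """Software-managed stack cache that counts accesses and hits."""

    def __init__(self, capacity):
        self.capacity = capacity  # based on the last-level cache size
        self.stack = []           # (row, col, value) triples; end of list = stack top
        self.hits = 0             # cache-hit counter (step 3)
        self.accesses = 0         # memory-access counter (step 5)

    def access(self, row, col, memory):
        """Steps 1-5: look up (row, col); fetch from `memory` on a miss."""
        self.accesses += 1                           # step 5: every access counts
        for i, (r, c, v) in enumerate(self.stack):
            if (r, c) == (row, col):                 # step 1/3: found -> hit
                self.hits += 1
                self.stack.append(self.stack.pop(i))  # step 4: move to top
                return v
        v = memory[(row, col)]                       # step 2: miss -> physical memory
        if len(self.stack) == self.capacity:
            self.stack.pop(0)                        # evict the bottom element (FIFO)
        self.stack.append((row, col, v))             # step 4: new value to the top
        return v

    def hit_rate(self):
        """Step 6: hit counter divided by the memory-access counter."""
        return self.hits / self.accesses if self.accesses else 0.0
```

After a run over an access trace, `hit_rate()` yields the quantity the method is designed to measure, without rerunning the program on the real machine.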
Advantages and effects: the present invention is a quantization method for the effective use of caches based on an LPT device model on a NUMA architecture. Compared with the prior art, its main advantages are: (1) it reduces the number of repeated runs on the actual machine needed to find suitable parameters for effective cache use and good performance, thereby saving electrical energy; (2) it uses a software-managed cache to accurately count memory accesses and cache hits for sparse matrices of different shapes and for different cache sizes; (3) by means of the LPT device model, it achieves quantitative accounting of the cache hit rate of sparse-matrix operations for sparse matrices of different shapes and for different block-partition sizes.
Brief description of the drawings
Fig. 1 is a framework diagram of the LPT device model of the present invention on a NUMA architecture.
Fig. 2 is a flow diagram of sparse matrix multiplication realized with the LPT device model of the present invention on a NUMA architecture.
Detailed description of the invention
To make the object, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and a concrete example.
As shown in Fig. 1, the present invention studies a quantization method for the effective use of caches based on an LPT device model on a NUMA architecture, in which the cache model is implemented as a stack. The concrete method comprises the following steps:
Step 1. Check whether the value the CPU needs for its computation is present in the current stack.
The main function of step 1 is to check whether the value the current CPU needs for its computation is present in the current stack. In the present invention each stack element stores a non-zero value of the sparse matrix as a (row index, column index, non-zero value) triple; the row and column indices in the triple are used to determine whether the current value is the one the CPU needs. The capacity of the stack is based on the hardware cache capacity of the high-performance computer (its value is the size of the last-level cache). To improve the portability of the program, the cache capacity is stored in a designated text file in a designated directory on the high-performance computer, and the program reads this value from the file when it starts. The cache uses a first-in-first-out replacement policy and realizes the same function as the hardware cache, with one difference: this software-managed cache can additionally count the number of memory accesses and the number of cache hits.
Step 2. If the value the CPU needs is not in the current stack, fetch the value to be processed from the machine's physical memory.
The main function of step 2 is that, if the value to be processed is not in the current stack cache, the value is fetched from the physical memory of the high-performance computer.
Step 3. If the value the CPU needs is in the current stack, take out the operand and increment the cache-hit counter by 1.
The main function of step 3 is that, if the value the current CPU needs for its computation is in the current cache stack, the operand is taken out and the cache-hit counter is incremented by 1. When the program has finished, the value of this variable is the accumulated number of cache hits of this program run.
Step 4. Move the value the CPU is to process to the top of the stack.
The main function of step 4 is to move the non-zero element just read to the top of the stack. The stack of the present invention comprises an array holding the stack elements, a stack-top pointer, the stack capacity, and the current number of non-zero elements in the stack; updating the stack means moving the row and column position of the non-zero element and its value to the top of the current stack.
Step 5. Increment the memory-access counter by 1.
The main function of step 5 is to increment the memory-access counter by 1. When the program has finished, the value of this variable is the total number of memory accesses of this program run.
Step 6. Compute the cache hit rate of this program run.
The main function of step 6 is to compute the cache hit rate of this program run. When the program has finished, dividing the cache-hit counter by the memory-access counter gives the cache hit rate of this algorithm on this example. The LPT device model of the present invention can accurately count the number of memory accesses and the number of cache hits for sparse matrices of different shapes and for different block sizes, providing quantized data for an accurate accounting of the cache hit rate.
The main idea of the present invention is that, on the hardware platform of a NUMA-architecture high-performance computer, applying the LPT device model provides a basis for the quantitative study of effective cache use.
When the program executes, each time the CPU needs a value for its computation it first reads from the stack. If the value the CPU needs is in the stack, the cache-hit counter and the memory-access counter are both incremented by 1, the value just fetched is moved to the top of the stack, and the CPU obtains the non-zero value for its computation. If the value the CPU needs is not in the current stack, it is read from the physical memory of the high-performance computer, the memory-access counter is incremented by 1, the non-zero value just read is moved to the top of the stack, and the CPU takes the non-zero value for its computation. Finally, when the program has finished, the cache hit rate of the algorithm is computed.
This is illustrated with an example below, as shown in Fig. 2, comprising the following steps:
Step 1: check whether the non-zero value the current CPU needs for its computation is in the current stack. The check first examines whether the current stack's non-zero count is zero. If it is not zero, the stack is not empty: read the values pointed to by the stack pointer one by one and determine whether each is the value the current CPU needs; if it is, proceed to step 3; otherwise check whether the current stack still has values not yet examined, and if so continue checking whether one of them is the value the CPU needs; if not, perform step 2. If the current stack's non-zero count is zero, the value the CPU needs is not in the current stack, so perform step 2.
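The lookup in step 1 can be sketched as a sequential scan of the stack. The function name and the scan-from-the-top order are illustrative assumptions; the specification only requires that every element be examined before declaring a miss.

```python
def find_in_stack(stack, row, col):
    """Step 1 lookup: return the index of (row, col) in the stack, or -1.

    `stack` is a list of (row, col, value) triples with the top at the end.
    An empty stack (non-zero count of zero) means the value must come from
    physical memory (step 2).
    """
    if not stack:                               # non-zero count is zero: step 2
        return -1
    for i in range(len(stack) - 1, -1, -1):     # scan from the stack top down
        if stack[i][0] == row and stack[i][1] == col:
            return i                            # found: proceed to step 3
    return -1                                   # all examined, not found: step 2
```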
Step 2: the value the CPU needs for its computation is not in the current stack; fetch it from the physical memory of the current high-performance computer.
Step 3: the value the CPU needs for its computation is in the current stack; take this non-zero value out of the current stack and increment the cache-hit counter by 1.
Step 4: move the value the CPU needs into the stack. If the value the CPU needs is already in the current stack, move it to the top of the stack. Otherwise the value is fetched from the physical memory of the high-performance computer and likewise placed at the top of the stack: if the stack is full, the element at the bottom of the stack is deleted first; if not, the value is pushed directly onto the stack and the stack's non-zero count is incremented by 1.
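The stack update in step 4 can be sketched as follows; the function name is an illustrative assumption, and a Python list stands in for the stack array, with the end of the list as the stack top.

```python
def update_stack_top(stack, capacity, triple):
    """Step 4: place `triple` (row, col, value) at the top of the stack.

    If the triple is already in the stack it is moved to the top; otherwise
    it is pushed, after deleting the bottom element when the stack is full.
    """
    try:
        stack.remove(triple)          # already cached: lift it out, re-push on top
    except ValueError:
        if len(stack) == capacity:    # full: delete the element at the bottom
            stack.pop(0)
    stack.append(triple)              # end of the list = top of the stack
```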
Step 5: increment the memory-access counter by 1. Whether the current value was obtained from the stack or from the physical memory of the high-performance computer, the memory-access counter must be incremented by 1.
Step 6: the program has finished; compute the cache hit rate of the algorithm. The method of calculation: divide the cache-hit counter by the memory-access counter; the quotient is the cache hit rate of the algorithm.
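The quotient of step 6 is a one-line computation; the sketch below only adds a guard for the degenerate case of a run with no counted accesses (a detail the specification does not address, assumed here).

```python
def cache_hit_rate(hits, memory_accesses):
    """Step 6: cache-hit counter divided by the memory-access counter."""
    if memory_accesses == 0:   # assumed: no accesses counted -> rate of 0.0
        return 0.0
    return hits / memory_accesses
```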
This example mainly studies the quantization method for effective cache use with the LPT device model; the hit rate the method measures depends only on the cache-usage characteristics of the algorithm itself and is independent of the LPT device model of the present invention.
It should be noted, finally, that the above example is intended only to illustrate, not to limit, the technical scheme of the present invention. Although the present invention has been described in detail with reference to the above embodiment, those skilled in the art will understand that the present invention may still be modified or equivalently substituted; any modification or partial replacement that does not depart from the spirit and scope of the present invention shall be covered by the scope of the claims of the present invention.

Claims (1)

1. A quantization method for the effective use of caches based on an LPT device model, the method studying a quantization method for the effective use of caches based on an LPT device model on a NUMA architecture, the cache model being implemented as a stack, characterized in that the method comprises the following steps: step 1, check whether the operand is present in the current stack; step 2, if the operand is not in the current stack, fetch the value to be processed from the physical memory of the machine; step 3, if the value the CPU needs is in the current stack, take out the operand and increment the cache-hit counter by 1; step 4, move the found operand to the top of the stack; step 5, increment the memory-access counter by 1; step 6, use the values obtained in steps 3 and 5 to compute the cache hit rate of this program run;
the stack elements in step 1 store the non-zero values of the sparse matrix as (row index, column index, non-zero value) triples, the row and column indices in a triple being used to determine whether the current value is the one the CPU needs for its computation; the capacity of the stack is based on the hardware cache capacity of the high-performance computer, its value being the size of the last-level cache; to improve the portability of the program, the cache capacity is stored in a designated text file in a designated directory on the high-performance computer, and this value is read from the designated file when the program starts; the cache uses a first-in-first-out replacement policy and realizes the same function as the hardware cache;
in step 3, if the operand is in the current stack, the operand is taken out and the cache-hit counter is incremented by 1; when the program has finished, the value of this cache-hit counter is the accumulated number of cache hits of this program run;
in step 4, the found operand is moved to the top of the stack; the stack comprises an array holding the stack elements, a stack-top pointer, the stack capacity, and the current number of non-zero elements in the stack; updating the stack means moving the row and column position of the non-zero element and its value to the top of the current stack;
in step 5, the memory-access counter is incremented by 1; when the program has finished, the value of this counter is the total number of memory accesses of this program run.
CN201210287737.7A 2012-08-14 2012-08-14 The quantization method of effective use based on LPT device model caching Expired - Fee Related CN103593304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210287737.7A CN103593304B (en) 2012-08-14 2012-08-14 The quantization method of effective use based on LPT device model caching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210287737.7A CN103593304B (en) 2012-08-14 2012-08-14 The quantization method of effective use based on LPT device model caching

Publications (2)

Publication Number Publication Date
CN103593304A CN103593304A (en) 2014-02-19
CN103593304B true CN103593304B (en) 2016-08-03

Family

ID=50083455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210287737.7A Expired - Fee Related CN103593304B (en) 2012-08-14 2012-08-14 The quantization method of effective use based on LPT device model caching

Country Status (1)

Country Link
CN (1) CN103593304B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015176245A (en) * 2014-03-13 2015-10-05 株式会社東芝 Information processing apparatus and data structure
CN104504077B (en) * 2014-12-22 2018-04-03 北京国双科技有限公司 The statistical method and device of web page access data
CN111338884B (en) * 2018-12-19 2023-06-16 北京嘀嘀无限科技发展有限公司 Cache miss rate monitoring method and device, electronic equipment and readable storage medium
CN109783402A (en) * 2018-12-28 2019-05-21 深圳竹云科技有限公司 A kind of method of dynamic adjustment caching hot spot data
CN110347487B (en) * 2019-07-05 2021-03-23 中国人民大学 Database application-oriented energy consumption characterization method and system for data movement

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102087634A (en) * 2011-01-27 2011-06-08 凌阳科技股份有限公司 Device and method for improving cache hit ratio

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102087634A (en) * 2011-01-27 2011-06-08 凌阳科技股份有限公司 Device and method for improving cache hit ratio

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Parallel Unsymmetric-Pattern Multifrontal Sparse LU with Column Preordering; Haim Avron et al.; ACM Transactions on Mathematical Software; 2008-03-31; vol. 34, no. 2; pp. 41-71 *

Also Published As

Publication number Publication date
CN103593304A (en) 2014-02-19

Similar Documents

Publication Publication Date Title
CN103336758B (en) The sparse matrix storage means of a kind of employing with the sparse row of compression of local information and the SpMV implementation method based on the method
CN103593304B (en) The quantization method of effective use based on LPT device model caching
Kim et al. Fast, energy efficient scan inside flash memory SSDs
CN108196935B (en) Cloud computing-oriented virtual machine energy-saving migration method
CN104361113A (en) OLAP (On-Line Analytical Processing) query optimization method in memory and flesh memory hybrid storage mode
CN110362566B (en) Data placement in a hybrid data layout of a hierarchical HTAP database
CN104361118A (en) Mixed OLAP (on-line analytical processing) inquiring treating method adapting coprocessor
CN101593202A (en) Based on the hash connecting method for database of sharing the Cache polycaryon processor
WO2015142341A1 (en) Dynamic memory expansion by data compression
CN102110079A (en) Tuning calculation method of distributed conjugate gradient method based on MPI
WO2021232769A1 (en) Method for storing data and data processing apparatus
CN104536832A (en) Virtual machine deployment method
Liang et al. Ins-dla: An in-ssd deep learning accelerator for near-data processing
Choi et al. Energy efficient scale-in clusters with in-storage processing for big-data analytics
Ouyang et al. Active SSD design for energy-efficiency improvement of web-scale data analysis
CN105022631A (en) Scientific calculation-orientated floating-point data parallel lossless compression method
CN102521463B (en) Method for improving numerical reservoir simulation efficiency by optimizing behaviors of Cache
Kwon et al. Sparse convolutional neural network acceleration with lossless input feature map compression for resource‐constrained systems
CN112540718A (en) Sparse matrix storage method for Schenk core architecture
CN107341193B (en) Method for inquiring mobile object in road network
Maltenberger et al. Evaluating In-Memory Hash Joins on Persistent Memory.
CN103984832A (en) Simulation analysis method for electric field of aluminum electrolysis cell
CN104461941A (en) Memory system structure and management method
CN102929580B (en) Partitioning method and device of digit group multi-reference access
Li et al. HODS: Hardware object deserialization inside SSD storage

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160803

Termination date: 20180814