CN103218304A - On-chip and off-chip distribution method for embedded memory data - Google Patents
- Publication number
- CN103218304A CN103218304A CN2013101146843A CN201310114684A CN103218304A CN 103218304 A CN103218304 A CN 103218304A CN 2013101146843 A CN2013101146843 A CN 2013101146843A CN 201310114684 A CN201310114684 A CN 201310114684A CN 103218304 A CN103218304 A CN 103218304A
- Authority
- CN
- China
- Prior art keywords
- data
- data object
- tcg
- chip
- memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The invention relates to an on-chip/off-chip distribution method for embedded memory data. On-chip memory is a key part of an embedded system and directly affects overall system performance. The method first proposes a TCG model as a new metric for how likely a data object is to cause cache misses, comprehensively considering the key factors of data object size, life cycle, access count, temporal locality and spatial locality; it then proposes an SPM (scratch-pad memory)/Cache data distribution method that assigns the data objects most likely to cause conflicts (large TCG values) to the SPM; finally, it proposes a fixed-Cache data layout method that maps the remaining data objects with large TCG values to different cache sets to avoid conflicts. The method better matches the on-chip memory hardware with the software running on it and shortens the time programs spend accessing the memory system, thereby improving overall system performance.
Description
Technical field
The invention belongs to the technical field of embedded memory, and in particular relates to an on-chip/off-chip distribution method for embedded memory data. The invention can achieve the best performance of a specific application on a specific memory configuration, and is particularly suitable for performance optimization of multimedia applications on hybrid scratch-pad memory/cache on-chip memory structures.
Background technology
Owing to differences in manufacturing process and circuit logic structure, processor execution units have always been faster than memory reads and writes, and as semiconductor process technology develops, the performance gap caused by these differing growth rates keeps widening. An important technique for resolving the speed mismatch between the processor and external memory is a hierarchical memory system: a small but fast memory integrated on the chip improves the system's memory access performance.
On-chip memory is an important part of an embedded system and directly affects key parameters such as system performance, power consumption and cost. On-chip memory comes in two types: cache (Cache) and scratch-pad memory (SPM, Scratch-Pad Memory). Per byte of storage, SPM costs less area and power than Cache, so hybrid SPM/Cache on-chip memory structures are gradually becoming a trend in embedded systems. However, SPM's small capacity and application-specific nature make the effective use of on-chip memory resources a key issue in embedded system design.
Existing research on software data-placement optimization focuses mainly on raising the Cache hit rate or increasing the number of SPM accesses; little work optimizes data memory accesses for hybrid on-chip memory structures that combine Cache and SPM.
On-chip/off-chip data distribution is an embedded-system memory optimization technique. It yields an allocation strategy that decides which data are accessed through the SPM (said to be on-chip) and which through the Cache (said to be off-chip). By optimizing the distribution of data between SPM and Cache, it can achieve the best performance for a specific application, and it has become a focus of embedded-system memory optimization research.
Summary of the invention
The object of the invention is to address the deficiencies of the prior art by providing an on-chip/off-chip distribution method for embedded memory data that achieves the best performance of a specific application program on a specific memory configuration.
To solve the above technical problem, the technical solution adopted by the invention comprises the following steps:
Step 1. Extract information about the specific application program using compiler and simulator tools;
Step 2. Build the TCG model from this information;
Step 3. Apply the proposed data distribution method to assign data objects with large TCG values to the SPM;
Step 4. Apply the proposed data layout method to map data objects with large TCG values to different cache sets to avoid conflicts.
The application information of step 1 comprises the size, life cycle, access count, temporal locality and spatial locality of each data object; temporal locality is represented by a temporal relationship graph (TRG, Temporal Relationship Graph); spatial locality is represented by the maximum consecutive access count.
The TCG model of step 2 covers the factors extracted in step 1 (data object size, life cycle, access count, temporal locality and spatial locality), and its formula is:
TCG = (access count × life cycle × TRG value) / (maximum consecutive access count × object size).
The data distribution method of step 3 specifically comprises the steps:
3-1. Sort all data objects in descending order of TCG value, initially assign them all to off-chip memory, and treat them as the set of objects to be distributed;
3-2. Among the objects to be distributed, select, in descending order, the first data object whose size is less than or equal to the remaining scratch-pad memory capacity, and assign it to the on-chip scratch-pad memory;
3-3. Repeat step 3-2 until every remaining object is larger than the remaining scratch-pad memory capacity, then finish.
The data layout method of step 4 comprises the steps:
4-1. For a data object among the remaining objects to be distributed, compute the number of cache sets it needs:
set count = data object size / cache set size;
4-2. Assign the current cache set number to the data object, increment the current set number, and decrement the object's required set count;
4-3. Repeat step 4-2 until the object's required set count reaches zero;
4-4. Repeat steps 4-1, 4-2 and 4-3 until all remaining objects have been assigned.
The beneficial effects of the invention are as follows:
The method models the application information with the TCG model, jointly considering sensible use of the SPM and sensible placement of data objects in off-chip memory. It optimizes the data allocation between SPM and Cache, reduces the time and energy the program spends on data memory accesses, and achieves the best performance of a specific application program on a specific memory configuration.
Description of drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 is a structural diagram of the TCG model proposed by the method;
Fig. 3 is a flow chart of the SPM/Cache data distribution method of the invention;
Fig. 4 is a flow chart of the fixed-Cache data layout method of the invention.
Embodiment
The invention is described below with reference to an embodiment and the accompanying drawings.
As shown in Fig. 1, the embodiment first extracts information about the specific application program using compiler and simulator tools: 1. the GCC-2.7.1-MIPS compiler with the -O3 optimization option statically compiles the application program into MIPS assembly code; 2. the MIPS simulator configures the on-chip memory, including capacity, access latency and organization (replacement policy, write policy, write-miss policy and associativity), enables the performance statistics tool, and simulates the program's data memory access performance. Next, the TCG model is built from this information; then the SPM/Cache data distribution method assigns data objects with large TCG values to the SPM; finally, the fixed-Cache data layout method maps the data objects with large TCG values to different cache sets to avoid conflicts.
The application information comprises the size, life cycle, access count, temporal locality and spatial locality of each data object; temporal locality is represented by a temporal relationship graph (TRG, Temporal Relationship Graph); spatial locality is represented by the maximum consecutive access count.
As shown in Fig. 2, the TCG model covers the factors extracted in step 1 (data object size, life cycle, access count, temporal locality and spatial locality), and its formula is:
TCG = (access count × life cycle × TRG value) / (maximum consecutive access count × object size).
Here the temporal relationship graph (TRG) and the computation of TRG values are described in N. Gloy et al., "Procedure placement using temporal ordering information".
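The TCG formula can be sketched in code. The following is a minimal Python illustration, not part of the patent; the `DataObject` fields, their units, and the function name are assumptions made for clarity:

```python
from dataclasses import dataclass

@dataclass
class DataObject:
    name: str
    size: int             # object size in bytes
    lifetime: int         # life-cycle length, e.g. in simulated cycles
    accesses: int         # total access count reported by the simulator
    trg: float            # temporal-relationship-graph (TRG) value
    max_consecutive: int  # maximum consecutive access count (spatial locality)

def tcg_value(obj: DataObject) -> float:
    """TCG = (accesses * lifetime * TRG) / (max consecutive accesses * size).

    A larger value means the object is more likely to cause cache conflicts,
    making it a better candidate for the SPM."""
    return (obj.accesses * obj.lifetime * obj.trg) / (obj.max_consecutive * obj.size)
```

Objects that score high on this metric are exactly the ones the distribution and layout steps below treat specially.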
As shown in Fig. 3, the purpose of the SPM/Cache data distribution in this embodiment is to assign the data objects most likely to conflict to the SPM. It comprises the steps:
Step 1. Sort all data objects in descending order of TCG value, initially assign them all to off-chip memory, and treat them as the set of objects to be distributed;
Step 2. Among the objects to be distributed, select, in descending order, the first data object whose size is less than or equal to the remaining scratch-pad memory capacity, and assign it to the on-chip scratch-pad memory SPM;
Step 3. Repeat Step 2 until every remaining object is larger than the remaining scratch-pad memory capacity, then finish.
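The distribution steps above amount to a greedy highest-priority-first-fit allocation. A hedged Python sketch follows; the tuple representation and function name are assumptions, not the patent's notation:

```python
def allocate_spm(objects, spm_capacity):
    """SPM/Cache split: objects is a list of (name, size, tcg) tuples.
    All objects start off-chip; the highest-TCG object that still fits in
    the remaining SPM capacity is moved on-chip, until none fits."""
    pending = sorted(objects, key=lambda o: o[2], reverse=True)  # Step 1
    spm, remaining = [], spm_capacity
    while True:
        # Step 2: first object, in descending TCG order, that fits in the SPM
        pick = next((o for o in pending if o[1] <= remaining), None)
        if pick is None:           # Step 3: nothing fits any more
            break
        pending.remove(pick)
        spm.append(pick)
        remaining -= pick[1]
    return spm, pending            # pending objects stay in off-chip memory
```

For example, with a 6-byte SPM and objects a (8 B), b (4 B), c (2 B) in descending TCG order, a is skipped as too large, b and c are placed in the SPM, and a remains off-chip.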
As shown in Fig. 4, the fixed-Cache data layout in this embodiment has two goals: 1. reduce the number of cache misses; 2. reduce the off-chip memory footprint (i.e., reduce the holes left in off-chip memory after layout). It comprises the steps:
Step 4. For each of the remaining i objects to be distributed, compute the number j of cache sets it needs:
set count j = data object size / cache set size;
Step 5. Assign the current cache set number setNO to the data object, increment setNO, and decrement the object's required set count j;
Step 6. Repeat Step 5 until the object's required set count j reaches zero;
Step 7. Repeat Steps 4, 5 and 6 until all of the remaining i objects have been assigned.
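Steps 4 through 7 can be sketched as follows. Two details are assumptions the description does not spell out: ceiling division for the set count, and wrapping the set counter modulo the number of cache sets once it passes the last set:

```python
def layout_cache_sets(objects, set_size, num_sets):
    """Fixed-Cache layout: map each remaining object (name, size) to a run
    of consecutive cache sets so high-TCG objects occupy different sets."""
    placement, set_no = {}, 0
    for name, size in objects:
        needed = -(-size // set_size)  # Step 4: sets needed (ceiling division)
        # Steps 5-6: hand out consecutive set numbers, wrapping mod num_sets
        placement[name] = [(set_no + k) % num_sets for k in range(needed)]
        set_no = (set_no + needed) % num_sets
    return placement                   # Step 7: every remaining object placed
```

With 64-byte sets, a 100-byte object occupies sets 0 and 1 and a following 40-byte object occupies set 2, so the two never conflict in the cache.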
Claims (1)
1. An on-chip/off-chip distribution method for embedded memory data, characterized by comprising the steps of:
Step 1. Extract information about the specific application program using compiler and simulator tools;
Step 2. Build the TCG model from this information;
Step 3. Apply the proposed data distribution method to assign data objects with large TCG values to the SPM;
Step 4. Apply the proposed data layout method to map data objects with large TCG values to different cache sets to avoid conflicts;
the application information of step 1 comprises the size, life cycle, access count, temporal locality and spatial locality of each data object; temporal locality is represented by a temporal relationship graph TRG; spatial locality is represented by the maximum consecutive access count;
the TCG model of step 2 covers the factors extracted in step 1 (data object size, life cycle, access count, temporal locality and spatial locality), and its formula is:
TCG = (access count × life cycle × TRG value) / (maximum consecutive access count × object size);
the data distribution method of step 3 specifically comprises the steps:
3-1. Sort all data objects in descending order of TCG value, initially assign them all to off-chip memory, and treat them as the set of objects to be distributed;
3-2. Among the objects to be distributed, select, in descending order, the first data object whose size is less than or equal to the remaining scratch-pad memory capacity, and assign it to the on-chip scratch-pad memory;
3-3. Repeat step 3-2 until every remaining object is larger than the remaining scratch-pad memory capacity, then finish;
the data layout method of step 4 comprises the steps:
4-1. For a data object among the remaining objects to be distributed, compute the number of cache sets it needs:
set count = data object size / cache set size;
4-2. Assign the current cache set number to the data object, increment the current set number, and decrement the object's required set count;
4-3. Repeat step 4-2 until the object's required set count reaches zero;
4-4. Repeat steps 4-1, 4-2 and 4-3 until all remaining objects have been assigned.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310114684.3A CN103218304B (en) | 2013-04-03 | 2013-04-03 | Off-chip distribution method in a kind of embedded memory data slice |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103218304A true CN103218304A (en) | 2013-07-24 |
CN103218304B CN103218304B (en) | 2016-07-20 |
Family
ID=48816120
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310114684.3A Expired - Fee Related CN103218304B (en) | 2013-04-03 | 2013-04-03 | Off-chip distribution method in a kind of embedded memory data slice |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103218304B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559148A (en) * | 2013-11-15 | 2014-02-05 | 山东大学 | On-chip scratch-pad memory (SPM) management method facing multitasking embedded system |
CN103793339A (en) * | 2014-01-13 | 2014-05-14 | 杭州电子科技大学 | Memory access stack distance based data Cache performance exploring method |
CN105204940A (en) * | 2014-05-28 | 2015-12-30 | 中兴通讯股份有限公司 | Memory allocation method and device |
CN106940682A (en) * | 2017-03-07 | 2017-07-11 | 武汉科技大学 | A kind of embedded system optimization method based on programmable storage on piece |
CN116097222A (en) * | 2020-05-18 | 2023-05-09 | 华为技术有限公司 | Memory arrangement optimization method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101763316A (en) * | 2009-12-25 | 2010-06-30 | 东南大学 | Method for dynamically distributing isomerism storage resources on instruction parcel based on virtual memory mechanism |
CN101901192A (en) * | 2010-07-27 | 2010-12-01 | 杭州电子科技大学 | On-chip and off-chip data object static assignment method |
US20110219193A1 (en) * | 2007-11-06 | 2011-09-08 | Il Hyun Park | Processor and memory control method |
- 2013-04-03: application CN201310114684.3A filed; granted as CN103218304B; status: not active, Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
袁名举 (Yuan Mingju): "基于ScratchPad Memory的低功耗技术研究" ("Research on low-power techniques based on Scratch-Pad Memory"), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) *
Also Published As
Publication number | Publication date |
---|---|
CN103218304B (en) | 2016-07-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hu et al. | Data allocation optimization for hybrid scratch pad memory with SRAM and nonvolatile memory | |
Salkhordeh et al. | An operating system level data migration scheme in hybrid DRAM-NVM memory architecture | |
CN104081315B (en) | Including thread merging for efficiency and the methods, devices and systems of energy-conservation | |
Zomaya et al. | Energy-efficient distributed computing systems | |
Capra et al. | Measuring application software energy efficiency | |
CN103218304B (en) | Off-chip distribution method in a kind of embedded memory data slice | |
CN103150265B (en) | The fine-grained data distribution method of isomery storer on Embedded sheet | |
US20180024928A1 (en) | Modified query execution plans in hybrid memory systems for in-memory databases | |
CN104115093A (en) | Method, apparatus, and system for energy efficiency and energy conservation including power and performance balancing between multiple processing elements | |
Li et al. | MAC: Migration-aware compilation for STT-RAM based hybrid cache in embedded systems | |
CN103559148B (en) | Scratch-pad storage management method on the sheet of multi-task embedded operation system | |
CN104572500B (en) | The management method of microprocessor and its performance and power consumption | |
Li et al. | Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache | |
Hu et al. | Management and optimization for nonvolatile memory-based hybrid scratchpad memory on multicore embedded processors | |
Kline Jr et al. | Greenchip: A tool for evaluating holistic sustainability of modern computing systems | |
Kannan et al. | A software solution for dynamic stack management on scratch pad memory | |
Liu et al. | A space-efficient fair cache scheme based on machine learning for nvme ssds | |
CN101901192B (en) | On-chip and off-chip data object static assignment method | |
Köhler et al. | Carbon-Aware Memory Placement | |
Hu et al. | Optimizing data allocation and memory configuration for non-volatile memory based hybrid SPM on embedded CMPs | |
Tian et al. | Optimal task allocation on non-volatile memory based hybrid main memory | |
CN104182280B (en) | Low-energy RM real-time task scheduling method for hybrid main memory embedded system | |
Ramesh et al. | Energy management in embedded systems: Towards a taxonomy | |
Li et al. | Energy optimization of branch-aware data variable allocation on hybrid SRAM+ NVM SPM for CPS | |
Poursafaei et al. | NPAM: NVM-aware page allocation for multi-core embedded systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20160720 Termination date: 20170403 |