CN103218304A - On-chip and off-chip distribution method for embedded memory data - Google Patents
- Publication number
- CN103218304A CN103218304A CN2013101146843A CN201310114684A CN103218304A CN 103218304 A CN103218304 A CN 103218304A CN 2013101146843 A CN2013101146843 A CN 2013101146843A CN 201310114684 A CN201310114684 A CN 201310114684A CN 103218304 A CN103218304 A CN 103218304A
- Authority
- CN
- China
- Prior art keywords
- data
- data object
- tcg
- chip
- memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Memory System Of A Hierarchy Structure (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The invention relates to an on-chip/off-chip distribution method for embedded memory data. On-chip memory is a key part of an embedded system and directly affects overall system performance. The method first proposes a TCG model as a new metric for how likely a data object is to cause cache misses, comprehensively considering the key factors of data object size, life cycle, access count, temporal locality and spatial locality; it then proposes an SPM (scratch-pad memory)/Cache data distribution method that assigns the data objects most likely to cause conflicts (large TCG values) to the SPM; finally, it proposes a fixed-Cache data layout method that maps the remaining data objects with large TCG values to different cache sets to avoid conflicts. The method better matches the on-chip memory hardware with the software running on it and shortens the time programs spend accessing the memory system, thereby improving overall system performance.
Description
Technical field
The invention belongs to the technical field of embedded memory, and in particular relates to an on-chip/off-chip distribution method for embedded memory data. The invention can achieve the best performance of a specific application on a specific memory configuration, and is particularly suitable for performance optimization of multimedia applications on hybrid scratch-pad memory/cache on-chip memory structures.
Background technology
Owing to differences in manufacturing process and circuit logic structure, processor execution units have always been faster than memory reads and writes, and as semiconductor process technology develops, the performance gap caused by these differing growth rates keeps widening. An important technique for resolving the speed mismatch between the processor and external memory is a hierarchical memory system: a small but fast memory integrated on the chip improves the system's memory access performance.
On-chip memory is an important part of an embedded system and directly affects key parameters such as system performance, power consumption and cost. On-chip memory comes in two types: cache (Cache) and scratch-pad memory (SPM, Scratch-Pad Memory). Per byte of storage, SPM costs less area and power than Cache, so hybrid SPM/Cache on-chip memory structures are gradually becoming a trend in embedded systems. However, SPM's small capacity and application-specific nature make the effective use of on-chip memory resources a key issue in embedded system design.
Existing research on software data-placement optimization focuses mainly on raising the Cache hit rate or increasing the number of SPM accesses; little work optimizes data memory accesses for hybrid on-chip memory structures that combine Cache and SPM.
On-chip/off-chip data distribution is an embedded-system memory optimization technique. It yields an allocation strategy that decides which data are accessed through the SPM (said to be on-chip) and which through the Cache (said to be off-chip). By optimizing the distribution of data between SPM and Cache, it can achieve the best performance for a specific application, and it has become a focus of embedded-system memory optimization research.
Summary of the invention
The object of the invention is to address the deficiencies of the prior art by providing an on-chip/off-chip distribution method for embedded memory data that achieves the best performance of a specific application program on a specific memory configuration.
To solve the above technical problem, the technical solution adopted by the invention comprises the following steps:
Step 1. Extract information about the specific application program using compiler and simulator tools;
Step 2. Build the TCG model from this information;
Step 3. Apply the proposed data distribution method to assign data objects with large TCG values to the SPM;
Step 4. Apply the proposed data layout method to map data objects with large TCG values to different cache sets to avoid conflicts.
The application information of step 1 comprises the size, life cycle, access count, temporal locality and spatial locality of each data object; temporal locality is represented by a temporal relationship graph (TRG, Temporal Relationship Graph); spatial locality is represented by the maximum consecutive access count.
The TCG model of step 2 covers the factors extracted in step 1 (data object size, life cycle, access count, temporal locality and spatial locality), and its formula is:
TCG = (access count × life cycle × TRG value) / (maximum consecutive access count × object size).
The data distribution method of step 3 specifically comprises the steps:
3-1. Sort all data objects in descending order of TCG value, initially assign them all to off-chip memory, and treat them as the set of objects to be distributed;
3-2. Among the objects to be distributed, select, in descending order, the first data object whose size is less than or equal to the remaining scratch-pad memory capacity, and assign it to the on-chip scratch-pad memory;
3-3. Repeat step 3-2 until every remaining object is larger than the remaining scratch-pad memory capacity, then finish.
The data layout method of step 4 comprises the steps:
4-1. For a data object among the remaining objects to be distributed, compute the number of cache sets it needs:
set count = data object size / cache set size;
4-2. Assign the current cache set number to the data object, increment the current set number, and decrement the object's required set count;
4-3. Repeat step 4-2 until the object's required set count reaches zero;
4-4. Repeat steps 4-1, 4-2 and 4-3 until all remaining objects have been assigned.
The beneficial effects of the invention are as follows:
The method models the application information with the TCG model, jointly considering sensible use of the SPM and sensible placement of data objects in off-chip memory. It optimizes the data allocation between SPM and Cache, reduces the time and energy the program spends on data memory accesses, and achieves the best performance of a specific application program on a specific memory configuration.
Description of drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 is a structural diagram of the TCG model proposed by the method;
Fig. 3 is a flow chart of the SPM/Cache data distribution method of the invention;
Fig. 4 is a flow chart of the fixed-Cache data layout method of the invention.
Embodiment
The invention is described below with reference to an embodiment and the accompanying drawings.
As shown in Fig. 1, the embodiment first extracts information about the specific application program using compiler and simulator tools: 1. the GCC-2.7.1-MIPS compiler with the -O3 optimization option statically compiles the application program into MIPS assembly code; 2. the MIPS simulator configures the on-chip memory, including capacity, access latency and organization (replacement policy, write policy, write-miss policy and associativity), enables the performance statistics tool, and simulates the program's data memory access performance. Next, the TCG model is built from this information; then the SPM/Cache data distribution method assigns data objects with large TCG values to the SPM; finally, the fixed-Cache data layout method maps the data objects with large TCG values to different cache sets to avoid conflicts.
The application information comprises the size, life cycle, access count, temporal locality and spatial locality of each data object; temporal locality is represented by a temporal relationship graph (TRG, Temporal Relationship Graph); spatial locality is represented by the maximum consecutive access count.
As shown in Fig. 2, the TCG model covers the factors extracted in step 1 (data object size, life cycle, access count, temporal locality and spatial locality), and its formula is:
TCG = (access count × life cycle × TRG value) / (maximum consecutive access count × object size).
Here the temporal relationship graph (TRG) and the computation of TRG values are described in N. Gloy et al., "Procedure placement using temporal ordering information".
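The TCG formula can be sketched in code. The following is a minimal Python illustration, not part of the patent; the `DataObject` fields, their units, and the function name are assumptions made for clarity:

```python
from dataclasses import dataclass

@dataclass
class DataObject:
    name: str
    size: int             # object size in bytes
    lifetime: int         # life-cycle length, e.g. in simulated cycles
    accesses: int         # total access count reported by the simulator
    trg: float            # temporal-relationship-graph (TRG) value
    max_consecutive: int  # maximum consecutive access count (spatial locality)

def tcg_value(obj: DataObject) -> float:
    """TCG = (accesses * lifetime * TRG) / (max consecutive accesses * size).

    A larger value means the object is more likely to cause cache conflicts,
    making it a better candidate for the SPM."""
    return (obj.accesses * obj.lifetime * obj.trg) / (obj.max_consecutive * obj.size)
```

Objects that score high on this metric are exactly the ones the distribution and layout steps below treat specially.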
As shown in Fig. 3, the purpose of the SPM/Cache data distribution in this embodiment is to assign the data objects most likely to conflict to the SPM. It comprises the steps:
Step 1. Sort all data objects in descending order of TCG value, initially assign them all to off-chip memory, and treat them as the set of objects to be distributed;
Step 2. Among the objects to be distributed, select, in descending order, the first data object whose size is less than or equal to the remaining scratch-pad memory capacity, and assign it to the on-chip scratch-pad memory SPM;
Step 3. Repeat Step 2 until every remaining object is larger than the remaining scratch-pad memory capacity, then finish.
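The distribution steps above amount to a greedy highest-priority-first-fit allocation. A hedged Python sketch follows; the tuple representation and function name are assumptions, not the patent's notation:

```python
def allocate_spm(objects, spm_capacity):
    """SPM/Cache split: objects is a list of (name, size, tcg) tuples.
    All objects start off-chip; the highest-TCG object that still fits in
    the remaining SPM capacity is moved on-chip, until none fits."""
    pending = sorted(objects, key=lambda o: o[2], reverse=True)  # Step 1
    spm, remaining = [], spm_capacity
    while True:
        # Step 2: first object, in descending TCG order, that fits in the SPM
        pick = next((o for o in pending if o[1] <= remaining), None)
        if pick is None:           # Step 3: nothing fits any more
            break
        pending.remove(pick)
        spm.append(pick)
        remaining -= pick[1]
    return spm, pending            # pending objects stay in off-chip memory
```

For example, with a 6-byte SPM and objects a (8 B), b (4 B), c (2 B) in descending TCG order, a is skipped as too large, b and c are placed in the SPM, and a remains off-chip.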
As shown in Fig. 4, the fixed-Cache data layout in this embodiment has two goals: 1. reduce the number of cache misses; 2. reduce the off-chip memory footprint (i.e., reduce the holes left in off-chip memory after layout). It comprises the steps:
Step 4. For each of the remaining i objects to be distributed, compute the number j of cache sets it needs:
set count j = data object size / cache set size;
Step 5. Assign the current cache set number setNO to the data object, increment setNO, and decrement the object's required set count j;
Step 6. Repeat Step 5 until the object's required set count j reaches zero;
Step 7. Repeat Steps 4, 5 and 6 until all of the remaining i objects have been assigned.
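Steps 4 through 7 can be sketched as follows. Two details are assumptions the description does not spell out: ceiling division for the set count, and wrapping the set counter modulo the number of cache sets once it passes the last set:

```python
def layout_cache_sets(objects, set_size, num_sets):
    """Fixed-Cache layout: map each remaining object (name, size) to a run
    of consecutive cache sets so high-TCG objects occupy different sets."""
    placement, set_no = {}, 0
    for name, size in objects:
        needed = -(-size // set_size)  # Step 4: sets needed (ceiling division)
        # Steps 5-6: hand out consecutive set numbers, wrapping mod num_sets
        placement[name] = [(set_no + k) % num_sets for k in range(needed)]
        set_no = (set_no + needed) % num_sets
    return placement                   # Step 7: every remaining object placed
```

With 64-byte sets, a 100-byte object occupies sets 0 and 1 and a following 40-byte object occupies set 2, so the two never conflict in the cache.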
Claims (1)
1. An on-chip/off-chip distribution method for embedded memory data, characterized by comprising the steps of:
Step 1. Extract information about the specific application program using compiler and simulator tools;
Step 2. Build the TCG model from this information;
Step 3. Apply the proposed data distribution method to assign data objects with large TCG values to the SPM;
Step 4. Apply the proposed data layout method to map data objects with large TCG values to different cache sets to avoid conflicts;
the application information of step 1 comprises the size, life cycle, access count, temporal locality and spatial locality of each data object; temporal locality is represented by a temporal relationship graph TRG; spatial locality is represented by the maximum consecutive access count;
the TCG model of step 2 covers the factors extracted in step 1 (data object size, life cycle, access count, temporal locality and spatial locality), and its formula is:
TCG = (access count × life cycle × TRG value) / (maximum consecutive access count × object size);
the data distribution method of step 3 specifically comprises the steps:
3-1. Sort all data objects in descending order of TCG value, initially assign them all to off-chip memory, and treat them as the set of objects to be distributed;
3-2. Among the objects to be distributed, select, in descending order, the first data object whose size is less than or equal to the remaining scratch-pad memory capacity, and assign it to the on-chip scratch-pad memory;
3-3. Repeat step 3-2 until every remaining object is larger than the remaining scratch-pad memory capacity, then finish;
the data layout method of step 4 comprises the steps:
4-1. For a data object among the remaining objects to be distributed, compute the number of cache sets it needs:
set count = data object size / cache set size;
4-2. Assign the current cache set number to the data object, increment the current set number, and decrement the object's required set count;
4-3. Repeat step 4-2 until the object's required set count reaches zero;
4-4. Repeat steps 4-1, 4-2 and 4-3 until all remaining objects have been assigned.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310114684.3A CN103218304B (en) | 2013-04-03 | 2013-04-03 | Off-chip distribution method in a kind of embedded memory data slice |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103218304A true CN103218304A (en) | 2013-07-24 |
CN103218304B CN103218304B (en) | 2016-07-20 |
Family
ID=48816120
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310114684.3A Expired - Fee Related CN103218304B (en) | 2013-04-03 | 2013-04-03 | Off-chip distribution method in a kind of embedded memory data slice |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103218304B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559148A (en) * | 2013-11-15 | 2014-02-05 | 山东大学 | On-chip scratch-pad memory (SPM) management method facing multitasking embedded system |
CN103793339A (en) * | 2014-01-13 | 2014-05-14 | 杭州电子科技大学 | Memory access stack distance based data Cache performance exploring method |
CN105204940A (en) * | 2014-05-28 | 2015-12-30 | 中兴通讯股份有限公司 | Memory allocation method and device |
CN106940682A (en) * | 2017-03-07 | 2017-07-11 | 武汉科技大学 | A kind of embedded system optimization method based on programmable storage on piece |
CN116097222A (en) * | 2020-05-18 | 2023-05-09 | 华为技术有限公司 | Memory arrangement optimization method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101763316A (en) * | 2009-12-25 | 2010-06-30 | 东南大学 | Method for dynamically distributing isomerism storage resources on instruction parcel based on virtual memory mechanism |
CN101901192A (en) * | 2010-07-27 | 2010-12-01 | 杭州电子科技大学 | On-chip and off-chip data object static assignment method |
US20110219193A1 (en) * | 2007-11-06 | 2011-09-08 | Il Hyun Park | Processor and memory control method |
- 2013-04-03: application CN201310114684.3A filed; granted as CN103218304B; status: not active, Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
袁名举 (Yuan Mingju): "基于ScratchPad Memory的低功耗技术研究" ("Research on low-power techniques based on Scratch-Pad Memory"), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) *
Also Published As
Publication number | Publication date |
---|---|
CN103218304B (en) | 2016-07-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hu et al. | Data allocation optimization for hybrid scratch pad memory with SRAM and nonvolatile memory | |
Salkhordeh et al. | An operating system level data migration scheme in hybrid DRAM-NVM memory architecture | |
CN104081315B (en) | Including thread merging for efficiency and the methods, devices and systems of energy-conservation | |
Zomaya et al. | Energy-efficient distributed computing systems | |
Capra et al. | Measuring application software energy efficiency | |
CN103218304B (en) | Off-chip distribution method in a kind of embedded memory data slice | |
CN103150265B (en) | The fine-grained data distribution method of isomery storer on Embedded sheet | |
US20180024928A1 (en) | Modified query execution plans in hybrid memory systems for in-memory databases | |
CN104115093A (en) | Method, apparatus, and system for energy efficiency and energy conservation including power and performance balancing between multiple processing elements | |
Li et al. | MAC: Migration-aware compilation for STT-RAM based hybrid cache in embedded systems | |
CN103559148B (en) | Scratch-pad storage management method on the sheet of multi-task embedded operation system | |
CN104572500B (en) | The management method of microprocessor and its performance and power consumption | |
Li et al. | Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache | |
Hu et al. | Management and optimization for nonvolatile memory-based hybrid scratchpad memory on multicore embedded processors | |
Kline Jr et al. | Greenchip: A tool for evaluating holistic sustainability of modern computing systems | |
Kannan et al. | A software solution for dynamic stack management on scratch pad memory | |
Liu et al. | A space-efficient fair cache scheme based on machine learning for nvme ssds | |
CN101901192B (en) | On-chip and off-chip data object static assignment method | |
Köhler et al. | Carbon-Aware Memory Placement | |
Hu et al. | Optimizing data allocation and memory configuration for non-volatile memory based hybrid SPM on embedded CMPs | |
Tian et al. | Optimal task allocation on non-volatile memory based hybrid main memory | |
CN104182280B (en) | Low-energy RM real-time task scheduling method for hybrid main memory embedded system | |
Ramesh et al. | Energy management in embedded systems: Towards a taxonomy | |
Li et al. | Energy optimization of branch-aware data variable allocation on hybrid SRAM+ NVM SPM for CPS | |
Poursafaei et al. | NPAM: NVM-aware page allocation for multi-core embedded systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20160720 Termination date: 20170403 |