CN101763316A - Method for dynamically allocating heterogeneous on-chip instruction storage resources based on a virtual memory mechanism - Google Patents

Info

Publication number
CN101763316A
Authority
CN
China
Prior art keywords
instruction
spm
page
cache
time slot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200910264520A
Other languages
Chinese (zh)
Other versions
CN101763316B (en
Inventor
凌明
张阳
梅晨
王欢
武建平
李冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN2009102645202A
Publication of CN101763316A
Application granted
Publication of CN101763316B
Expired - Fee Related
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method, based on a virtual memory mechanism, for dynamically allocating heterogeneous on-chip instruction storage resources, which makes full use of the on-chip instruction storage, comprising an instruction cache and an instruction SPM (scratch-pad memory). A time slot analysis method is used to analyze the temporal and spatial distribution of instruction cache hits and frequently occurring misses, yielding a time slot access graph of the instruction cache, which is then abstracted mathematically. Guided by an energy objective function and a performance objective function, integer nonlinear programming selects the portions of the program's instructions that need optimizing in each time slot. A clock module divides execution into program phases; on each clock interrupt, an instruction SPM controller dynamically remaps the instruction pages worth optimizing into the instruction SPM. This avoids the extra memory accesses caused by instruction cache conflicts and gains energy from the difference in per-access energy between the cache and the SPM. The method makes full use of the heterogeneous on-chip instruction storage, reducing system energy consumption and improving system performance.

Description

Method for dynamically allocating heterogeneous on-chip instruction storage resources based on a virtual memory mechanism
Technical field
The present invention relates to the field of embedded on-chip memory, and in particular to a method, based on a virtual memory mechanism, for dynamically allocating heterogeneous on-chip instruction storage resources (comprising an instruction cache and an instruction SPM).
Background art
With the development of microelectronics, embedded computing platforms based on SoC (System-on-a-Chip) have become increasingly mature. However, because the gap between processor speed and external memory speed keeps widening, the SoC memory subsystem has become the bottleneck for system performance, power consumption, and cost. How to optimize the architecture and management policy of the memory subsystem has therefore long been a focus of embedded systems research.
As the traditional on-chip memory, the cache is managed by hardware, is transparent to software in most cases, and automatically loads frequently accessed instructions and data into on-chip memory. However, its high power consumption, large area, and unpredictable program execution time have long limited its use in embedded systems. In particular, the set-associative organization of a cache can cause different pieces of program content that map to the same cache line to replace one another repeatedly, depending on the access pattern, which increases the performance and energy overhead of the system; this is known as cache thrashing. Compared with a cache, SPM (Scratch-Pad Memory) is a high-speed on-chip memory, usually implemented in SRAM, and is an important architectural consideration in modern embedded systems. The SPM lies within the address space that the processor can access directly. Because a traditional SPM controller contains no logic to assist with data management, all content in the SPM must be managed explicitly by software, which, compared with a cache that is transparent to the programmer, increases the complexity of program management. Because it carries no overhead from management logic, the SPM is simpler to implement in hardware than a traditional cache, consumes less energy per access, occupies less chip area, and has a predictable access time. In short, the cache and the SPM each have advantages and complement each other, so studying heterogeneous storage in which a cache and an SPM coexist makes it possible to exploit the advantages of both, reducing system energy consumption and improving system performance as far as possible.
Much of the research on embedded on-chip memory analyzes architectures configured with only a cache or only an SPM, and cannot make good use of their complementary characteristics. Directly applying an optimization algorithm designed only for the SPM, or only for the cache, to an architecture in which the two coexist does not achieve an overall optimum: the gain obtained on one kind of memory may be offset by overhead on the other, or may even introduce additional performance and energy overhead. For example, an SPM-oriented optimization algorithm moves a section of main memory into the SPM and thereby gains performance and energy. The copying code itself, however, may pollute the instruction cache and invalidate the cache optimization algorithm, causing extra cache misses that offset the gain from the SPM.
On a cache miss, the external memory must actually be accessed and the new content brought into a cache line, which is expensive; this is called the cache miss penalty. Because of the set-associative organization of the cache, contents that map to the same cache line may replace one another repeatedly, generating a large number of memory accesses, so that system performance drops sharply and system energy consumption rises sharply. This is a cache conflict. Increasing the cache capacity or the associativity reduces cache conflicts, but adds chip area and increases the time and energy of each cache access. Existing research has pointed out that cache conflicts are a major cause of performance and energy bottlenecks, and therefore places the program segments that are prone to cache conflicts in the SPM to gain performance and energy. Placing conflict-prone pages in the SPM not only reduces energy consumption and improves performance by reducing cache conflicts, but also yields an additional gain from the difference in per-access energy between the SPM and the cache. These studies, however, are all based on static designs: the content of the SPM does not change during program execution, so the temporal locality of cache conflicts is not fully exploited and SPM utilization is reduced. Moreover, these designs manage the SPM by modifying jump instructions in the source program, which is an intrusive form of analysis.
Because changing the SPM content requires explicit software action, research on dynamic SPM management generally relies on instrumentation: code-copying instructions are inserted by hand before and after the core loops that need optimizing, so as to swap program content in and out. Inserting new instructions into the program image depends on analysis of the source code, and in a coexisting architecture the new instructions are likely to change cache behavior, for example by producing more conflicts.
At present, research on the instruction portion in architectures where a cache and an SPM coexist generally requires intrusive analysis of the program: code must be inserted into or modified in the user program to swap content in and out dynamically during execution. So far, no related work has used virtual memory management together with a time slot method to map a program's instruction content dynamically into the on-chip SPM without modifying the program's source code.
Summary of the invention
The object of the present invention is to overcome the shortcomings of existing on-chip memory subsystems by providing a method, based on a virtual memory mechanism, for dynamically allocating heterogeneous on-chip instruction storage resources. A small instruction SPM buffers the program content that is accessed frequently and is prone to instruction cache conflicts during execution, optimizing the instruction portion of the program and thereby increasing the running speed of the microprocessor and reducing system energy consumption.
To achieve the above object, the technical solution adopted by the invention is a method for dynamically allocating heterogeneous on-chip instruction storage resources based on a virtual memory mechanism, characterized as follows. The instruction cache accesses made by the processor core during execution of the application are traced to obtain the temporal and spatial distribution of instruction cache hits and misses. From this distribution, a time slot access graph of the instruction cache is obtained, comprising hit weights and miss relations with their weights, and the graph is abstracted mathematically. Using an energy objective function and a performance objective function respectively, integer nonlinear programming determines the state of each instruction page at which total system energy consumption is optimal, giving the page numbers of the instruction pages most worth optimizing in each time slot. The time slot sizes are then adjusted by iteration. During program execution, on each clock interrupt, the instruction SPM controller modifies the page table entries and configures the direct memory access (DMA) controller, so that the address regions that are instruction access hotspots and are prone to instruction cache conflicts are dynamically remapped into the on-chip instruction SPM, eliminating the extra memory accesses caused by instruction cache conflicts.
After the time slot access graph of the instruction cache has been obtained and abstracted mathematically, the temporal locality exhibited during program execution is exploited: in each time slot, the instruction pages that are accessed most frequently and are prone to instruction cache conflicts are ranked by gain. The number of instruction pages that can be optimized equals the size of the instruction SPM divided by the MMU page size; that many of the instruction pages most worth optimizing are selected and remapped into the instruction SPM.
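The selection rule above can be sketched in a few lines of Python. This is only an illustration: the function name `pages_for_slot`, the gain values, and the tie-breaking by page number are assumptions, and the per-page gains are taken as given rather than computed by the integer nonlinear programming step.

```python
def pages_for_slot(page_gain, spm_bytes, page_bytes):
    """Return the pages with the largest estimated gain that fit in the SPM.

    page_gain maps a page number to its estimated gain for one time slot
    (hypothetical values; the source derives them from its energy model).
    """
    capacity = spm_bytes // page_bytes  # number of page frames in the SPM
    # Rank by descending gain; ties are broken by page number (an assumption).
    ranked = sorted(page_gain, key=lambda p: (-page_gain[p], p))
    return ranked[:capacity]


# Hypothetical gains (arbitrary energy units) for one time slot.
gains = {0x10: 5.0, 0x11: 42.0, 0x12: 17.5, 0x13: 0.3, 0x14: 29.0}
selected = pages_for_slot(gains, spm_bytes=1024, page_bytes=512)
```

With a 1 KB SPM and 512-byte pages only two frames exist, so `selected` holds the two highest-gain pages.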
The virtual memory mechanism adds an S bit to the TLB to indicate that the content of a page resides in the instruction SPM, which reduces unnecessary cache energy overhead. The TLB is also modified to support virtual pages of 512 bytes and 256 bytes, so that the address regions that are instruction access hotspots and are prone to instruction cache conflicts can be isolated, and large amounts of address space with little optimization value are not remapped.
The instruction SPM controller swaps portions of the program's instructions into and out of the instruction SPM dynamically and efficiently during program execution, using the burst capability of the on-chip AHB high-speed bus and avoiding secondary pollution of the instruction cache and data cache. A group of instruction SPM region registers, dedicated to recording the write-back address and the virtual page size, is added to the instruction SPM controller. Their functions are: 1) when a virtual page is remapped into the instruction SPM, the registers record its corresponding main memory address, which serves as the DMA destination address when the page is swapped out of the SPM; 2) the bits recording the virtual page size are used to configure the DMA transfer length; 3) the enable bit controls whether the content of the page is valid.
The method of the invention can be implemented in the following steps:
(1) Establishing the virtual memory management mechanism
By modifying page table entries, virtual memory management can form addresses that are physically separate but logically contiguous, so that the addresses of some program pages can be mapped into the SPM. Compared with traditional dynamic SPM optimization techniques, using virtual memory to change the address space mapping allows completely non-intrusive optimization with respect to both the program source code and the binary image generated by compilation. To support this dynamic management of the cache and the SPM, the invention modifies the original memory hardware in two ways. First, an S bit is added to the TLB to provide the necessary access control. With the S bit, the MMU can determine during address translation which physical address should actually be accessed and send that address to the on-chip instruction memory that must be accessed, that is, either the instruction cache or the instruction SPM controller. Second, the decoding logic of the TLB is modified to add support for 512-byte and 256-byte virtual pages. A traditional TLB supports virtual pages no smaller than 1 KB, whereas a cache is organized in lines of only 32 to 64 bytes. Within a given period of program execution, the address regions that are instruction cache access hotspots and are prone to cache conflicts are mostly smaller than the smallest virtual page a traditional TLB supports, so the SPM capacity cannot be fully used. The invention uses the reserved bits in the conventional page table entry and modifies the tag memory and comparison circuit of the TLB to support 256-byte and 512-byte virtual pages.
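As a rough software model of the translation described in step (1), the sketch below looks up a virtual instruction address in a small TLB whose entries carry a page size and an S bit, and reports which on-chip memory the access would be routed to. The class `TlbEntry`, the function `translate`, and all addresses are hypothetical; real hardware performs the match in parallel in the TLB tag comparison logic.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    vpn: int         # virtual page number, in units of page_bytes
    pfn: int         # physical page (frame) number, in units of page_bytes
    page_bytes: int  # e.g. 256, 512 or 1024
    s_bit: bool      # True if the page currently resides in the instruction SPM


def translate(entries, vaddr):
    """Return (target, physical_address) for a virtual instruction address."""
    for e in entries:
        if vaddr // e.page_bytes == e.vpn:
            offset = vaddr % e.page_bytes
            # The S bit selects the on-chip memory that receives the address.
            target = "SPM" if e.s_bit else "ICACHE"
            return target, e.pfn * e.page_bytes + offset
    raise LookupError("TLB miss")


# Hypothetical TLB contents: one 256-byte page held in the SPM,
# one 1 KB page served through the instruction cache.
entries = [
    TlbEntry(vpn=0x40, pfn=0x2, page_bytes=256, s_bit=True),
    TlbEntry(vpn=0x8, pfn=0x300, page_bytes=1024, s_bit=False),
]
spm_access = translate(entries, 0x4010)
cache_access = translate(entries, 0x2004)
```

Because the target is known as soon as the entry matches, only one of the two memories needs to be probed, which is the energy argument the text makes for the S bit.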
(2) Building the time slot access graph
The invention optimizes the memories of the coexisting architecture by analyzing the access characteristics of the instruction cache, including its hit and miss characteristics. It introduces the concept of a time slot access graph to analyze cache hits and misses in both time and space. The graph is built from trace information for the instruction cache (hit information and miss information) and abstracted mathematically. Because the invention manages the instruction portion of the program through virtual memory management, the granularity at which instructions are partitioned is the MMU page size, and the time slot access graph is therefore abstracted page by page. The graph comprises: the weight of each instruction page, that is, its instruction cache hit count; and a page conflict graph, a directed graph that quantifies the replacement relations among different program contents mapped to the same instruction cache line, including the direction in which different instruction pages cause instruction cache misses for one another and the number of replacements. Once the time slot access graph has been built, it is modeled mathematically to describe the weight distribution among pages, and integer nonlinear programming finally yields the state of each page at which the overall energy gain is optimal in each time slot. This gives the instruction pages most worth optimizing in each time slot; these pages can be swapped into the SPM dynamically during program execution.
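A minimal sketch of how such a graph might be accumulated from a trace follows. The trace record layout `(time, page, hit, victim_page)` and the function name `build_slot_graphs` are assumptions made for illustration; the source does not specify a trace format.

```python
from collections import defaultdict


def build_slot_graphs(trace, slot_len):
    """Group a cache trace into per-slot hit weights and conflict edges.

    Each trace record is (time, page, hit, victim_page), where victim_page
    is the page whose cache line was replaced on a miss, or None.
    """
    slots = defaultdict(
        lambda: {"hits": defaultdict(int), "conflicts": defaultdict(int)}
    )
    for time, page, hit, victim in trace:
        graph = slots[time // slot_len]  # equal-length slots
        if hit:
            graph["hits"][page] += 1  # node weight: hit count
        elif victim is not None and victim != page:
            # Directed edge: `page` replaced a line belonging to `victim`.
            graph["conflicts"][(page, victim)] += 1
    return slots


# Hypothetical trace: pages 1 and 2 evict each other around the slot boundary.
trace = [
    (0, 1, True, None), (5, 1, True, None), (7, 2, False, 1),
    (12, 1, False, 2), (15, 2, False, 1), (18, 2, True, None),
]
graphs = build_slot_graphs(trace, slot_len=10)
```

Each slot then holds node weights (`hits`) and weighted directed edges (`conflicts`), which is the information the text says the mathematical model is built from.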
(3) Iterative solution of the time slot length
The method manages the program dynamically by time slot. Clock interrupts divide program execution into time slots: at the start of each time slot the counter of the clock module is reloaded, and a clock interrupt is generated when the counter reaches 0. The processor core then receives the interrupt request sent by the interrupt handling module, the system enters IRQ mode, and the content of the instruction SPM is swapped in and out, achieving dynamic management of the instruction SPM.
To mark program execution time with clock interrupts, the exact length of each time slot must be known. When the time slot conflict graph is built, the whole execution is divided into slots of equal length. After optimization, however, fewer instruction cache conflicts improve performance, that is, program execution time shortens, so the slot lengths change. An iterative method is therefore used to adjust the slot lengths after optimization. At each clock interrupt, the adjusted slot length and the page numbers determined above for placement in the SPM in each time slot are loaded into the time slot register of the clock module.
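The source does not spell out the iteration, so the following is only one plausible reading, written as a fixed-point loop: the slot length is repeatedly recomputed as the original length minus the cycles saved by the optimization, until it stops changing. The function `adjust_slot_length`, the callback `estimate_saving`, and the linear saving model in the example are all assumptions.

```python
def adjust_slot_length(base_len, estimate_saving, tol=1.0, max_iter=50):
    """Fixed-point iteration for the post-optimization slot length.

    base_len        : slot length (cycles) measured on the unoptimized run
    estimate_saving : hypothetical callback returning the cycles saved in a
                      slot of the given length once its pages are in the SPM
    """
    length = base_len
    for _ in range(max_iter):
        new_length = base_len - estimate_saving(length)
        if abs(new_length - length) <= tol:
            return new_length  # converged
        length = new_length
    return length  # best estimate after max_iter rounds


# Assumed toy model: the saving is 20% of the current slot length.
adjusted = adjust_slot_length(12000, lambda length: 0.2 * length)
```

Under this toy model the iteration settles near 10000 cycles, the solution of L = 12000 - 0.2 L.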
(4) Dynamic management by the instruction SPM controller
During program execution, the clock module raises an interrupt signal at the end of each time slot, which is taken over by the interrupt controller. In the interrupt handler, the instruction SPM controller modifies the page table entries and swaps the content of the instruction SPM in and out to suit the program's memory access pattern in the next time slot.
In the clock interrupt, the swapping of SPM content is carried out by the instruction SPM controller. After the interrupt is entered, the instruction SPM controller must: first, read the time slot register of the clock module to obtain the current time slot and determine the offset of the current slot's configuration information; second, update the page table entries of the pages to be written into the SPM in this time slot; third, configure the DMA to perform the swap-in.
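The three steps can be mimicked with a small software simulation. Everything here is a stand-in: `on_slot_interrupt`, the dictionary-based page table, the list of SPM frames, and the `dma_log` list are hypothetical structures that replace the real page table, SPM, and DMA controller, and the per-slot configuration is assumed never to request more pages than there are frames.

```python
def on_slot_interrupt(slot, config, page_table, spm_frames, dma_log):
    """Simulate the handler work for the time slot that is about to start."""
    cfg = config[slot]                       # step 1: locate this slot's config
    wanted = set(cfg["pages"])
    # Swap out resident pages that the new slot does not need.
    for frame, page in enumerate(spm_frames):
        if page is not None and page not in wanted:
            page_table[page] = {"s_bit": False, "frame": None}  # step 2
            dma_log.append(("out", page, frame))                # step 3
            spm_frames[frame] = None
    # Swap in the pages that are wanted but not yet resident.
    resident = {p for p in spm_frames if p is not None}
    for page in cfg["pages"]:
        if page not in resident:
            frame = spm_frames.index(None)   # first free frame
            page_table[page] = {"s_bit": True, "frame": frame}  # step 2
            dma_log.append(("in", page, frame))                 # step 3
            spm_frames[frame] = page
    return cfg["length"]                     # value to reload into the timer


# Hypothetical two-slot schedule for an SPM with two page frames.
config = [{"pages": [3, 7], "length": 10000}, {"pages": [7, 9], "length": 8000}]
page_table, spm_frames, dma_log = {}, [None, None], []
first_reload = on_slot_interrupt(0, config, page_table, spm_frames, dma_log)
second_reload = on_slot_interrupt(1, config, page_table, spm_frames, dma_log)
```

Page 7 is wanted in both slots, so it stays resident and no DMA transfer is issued for it at the second interrupt.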
A traditional SPM controller is simple in design and cannot implement the more complex dynamic SPM allocation scheme that the dynamic address mapping mechanism requires. The invention therefore adds to the traditional SPM controller a group of registers that record the write-back address, namely the SPM region registers. This extends the traditional SPM controller, which only performs addressing, into an instruction SPM controller that supports SPM management at different granularities and can swap its content in and out dynamically by actively configuring the DMA. Under the configuration of the instruction SPM controller, the DMA swaps instruction blocks into and out of the SPM dynamically. Compared with the traditional approach of swapping SPM content with LDR/STR instructions, the DMA makes extensive use of the burst capability of the main memory SDRAM and the on-chip AHB high-speed bus, reducing transfer cost and interrupt latency.
Advantages and notable effects of the invention: virtual memory management allows non-intrusive optimization of the instruction portion of the program, and the time slot access graph analysis makes full use of program locality. With virtual memory management, actual physical addresses are decoupled from virtual addresses. In each time slot, the content of the instruction SPM can thus be swapped in and out by modifying page table entries and using the DMA. To the CPU the address space is contiguous both before and after optimization, while in the actual hardware the portions of the program's instructions that are accessed often and frequently cause instruction cache conflicts are placed in the instruction SPM. This reduces the number of instruction cache conflicts and, through the difference in per-access energy between the cache and the SPM, yields gains in both performance and energy. Managing the program through the virtual memory mechanism also makes analysis and optimization non-intrusive: no explicit SPM copying code needs to be added to the user program, and program content is swapped in and out in the interrupt handler by configuring the DMA and modifying the page table. In addition, the invention exploits the temporal locality of program execution by introducing the time slot access graph; the dynamic SPM allocation algorithm then uses virtual memory management to relocate to the SPM the pages that cause cache conflicts in each time slot. Compared with static optimization, this yields larger performance and energy gains, making dynamic use of the limited on-chip SPM to reduce program energy consumption and improve system performance.
Description of the drawings
Fig. 1 is a system block diagram of the heterogeneous on-chip instruction memory;
Fig. 2 shows the register design of the instruction SPM controller;
Fig. 3 is an example of a time slot conflict graph;
Fig. 4 is a system flowchart of managing on-chip instruction storage resources with the time slot analysis method;
Fig. 5 is a schematic diagram of the iterative solution;
Fig. 6 shows test results of system energy consumption optimized with the method of the invention.
Embodiment
The invention is described in further detail below with reference to the drawings and embodiments.
The invention traces instruction cache accesses during execution of the application to obtain the temporal and spatial distribution of instruction cache hits and misses, and from this distribution derives the time slot access graph of the instruction cache, comprising the weight of each instruction page (its cache hit count) and the conflict graph among instruction pages. The conflict graph is a directed graph that quantifies the replacement relations among different program contents mapped to the same cache line. After the time slot accesses of the cache have been abstracted mathematically, integer nonlinear programming determines the state of each instruction page at which total system energy consumption is optimal, giving the page numbers of the instruction pages most worth optimizing in each time slot. The time slot sizes are then adjusted by iteration. Based on the instruction pages to be optimized in each time slot and the adjusted slot sizes, clock interrupts are used during program execution: the instruction SPM controller modifies the page table entries and configures the DMA, so that the address regions that are accessed frequently and are prone to instruction cache conflicts are dynamically remapped into the SPM. This eliminates the extra memory accesses caused by instruction cache conflicts and obtains a further energy gain from the difference in per-access energy between the cache and the SPM.
Fig. 1 shows the parts that must be added to or modified in the original architecture: the S bit in the TLB, the instruction SPM controller, the DMA, and the clock module.
The S bit is added to the TLB to provide the access control this method requires. With a dedicated S bit, the MMU can determine during address translation which physical address should actually be accessed and send that address to the on-chip memory that must be accessed (the instruction cache or the instruction SPM controller). Note that the value of the S bit is computed in the algorithm stage, where it marks the pages that must be remapped to the SPM in a particular time slot; the bit is actually set during the clock interrupt at the end of each time slot. After the clock interrupt has been handled and the benchmark program resumes, the type of on-chip memory for subsequent accesses can therefore be determined from the S bit within the same clock cycle as the virtual-to-physical address translation. Adding the S bit reduces the energy spent on on-chip addressing. Without it, after the MMU translates the virtual address to a physical address, the address would have to be sent to both the instruction cache and the instruction SPM controller, and the comparison circuit of the cache and the addressing logic of the SPM would start working at the same time. Because an instruction can reside in only one of the two memories, both sets of addressing logic would consume energy although only one is needed. With the S bit, the destination of the address, instruction cache or instruction SPM controller, is decided during address translation, which saves power.
Fig. 2 shows the register design of the instruction SPM controller. A traditional SPM controller is simple in design and cannot implement the more complex dynamic SPM allocation scheme that the dynamic address mapping mechanism requires. The invention therefore adds to the traditional SPM controller a group of registers that record the write-back address and the page size, namely the SPM region registers. This extends the traditional SPM controller, which only performs addressing, into an instruction SPM controller that supports SPM management at different granularities and can swap its content in and out dynamically by actively configuring the DMA. Because the design space can be explored over MMU pages of different sizes, the page size must be specified. Bit 0 of an SPM region register is the EN bit; it is cleared to 0 at system initialization and set to 1 when the first DMA swap-in for that SPM page is performed. Bits 1 to 3 specify the page size. The corresponding base address is given by the high-order bits, and the number of bits it occupies varies with the page size.
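To make the register layout concrete, the sketch below packs and unpacks a 32-bit region register with EN in bit 0, a page-size code in bits 1 to 3, and a page-aligned base address in the high-order bits. The 32-bit width, the size encoding in `PAGE_CODES`, and the function names are assumptions; the source gives only the bit positions.

```python
# Hypothetical encoding of the 3-bit page-size field (not given in the source).
PAGE_CODES = {256: 0, 512: 1, 1024: 2, 4096: 3}
CODE_PAGES = {code: size for size, code in PAGE_CODES.items()}


def pack_region(base_addr, page_bytes, enabled):
    """Build a region register value: base | size code << 1 | EN."""
    if base_addr % page_bytes:
        raise ValueError("base address must be page aligned")
    # A page-aligned base leaves at least the low 8 bits free for the fields.
    return (base_addr & 0xFFFFFFFF) | (PAGE_CODES[page_bytes] << 1) | int(enabled)


def unpack_region(reg):
    """Return (base_addr, page_bytes, enabled) from a region register value."""
    page_bytes = CODE_PAGES[(reg >> 1) & 0x7]
    # Larger pages leave fewer high-order bits for the base address.
    base_addr = reg & ~(page_bytes - 1) & 0xFFFFFFFF
    return base_addr, page_bytes, bool(reg & 1)


reg = pack_region(0x30001200, 512, True)
fields = unpack_region(reg)
```

The unpacked base address would serve as the DMA write-back address and the page size as the DMA transfer length, as described for the region registers.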
Under the configuration of the instruction SPM controller, the DMA swaps instruction blocks into and out of the instruction SPM dynamically. Compared with the traditional approach of swapping SPM content with LDR/STR instructions, the DMA makes extensive use of the burst capability of the main memory SDRAM and the on-chip AHB high-speed bus, reducing transfer cost and interrupt latency. The DMA is controlled mainly by the instruction SPM controller, which reads the configuration information loaded in main memory and, in each time slot, loads the parts that need updating into the corresponding control registers of the DMA controller, thereby configuring the DMA. The DMA controller then requests the bus and, as configured by the instruction SPM controller, swaps the required content into and out of the instruction SPM by DMA transfers, completing the dynamic update of the instruction SPM content.
Clock interrupts are the basis for dynamically adjusting the SPM content. Program execution usually shows marked temporal locality, so the instruction section of the program shows it too, and instruction cache misses have similar temporal locality. Exploiting this by adjusting the SPM content dynamically makes much fuller use of the SPM area and ultimately yields a better energy gain. When the clock count register of the clock module counts down to 0, the processor core receives the interrupt request sent by the interrupt handling module, and the system enters IRQ mode. The instruction SPM controller then swaps SPM content in and out according to the content of the region registers, achieving dynamic management of the instruction SPM.
Fig. 3 is an example of a time slot access graph. Because program execution exhibits fairly pronounced temporal locality, the time slot access graph divides the whole execution of the program into time slots of equal length and builds a separate Cache access graph for each time slot. Each access graph records the weight of every instruction page (its instruction Cache hit count) and its relations to other nodes (the conflict relations between instruction pages and the number of conflicts). Because the hit rate of the instruction Cache is high, the selection of the best nodes must take both Cache hits and Cache misses into account. The energy benefit of relocating an instruction page into the SPM differs between the two cases. For a hit, the benefit is the difference between the per-access energy of the Cache and that of the SPM. For a miss, the benefit is larger: on a miss, the Cache hardware automatically starts a line fill, which has to access SDRAM over the external bus and fill the line into the Cache array. A miss also costs two Cache reads, the first to discover the miss and the second, after the line fill completes, to read the valid data from the Cache. Once a conflicting instruction page is placed in the SPM, the Cache miss is eliminated and the access costs only the SPM access energy, so optimizing a miss gains more energy than optimizing a hit. The time slot access graph takes the access behavior of the Cache fully into account and, according to the characteristics of each time slot, selects the instruction pages most worth optimizing.
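The construction of a time slot access graph can be illustrated with a minimal Python sketch. It assumes a direct-mapped instruction Cache and a trace of (cycle, address) instruction fetches; the constants, the function name `build_slot_graphs`, and the dictionary layout are hypothetical choices for illustration and are not taken from the patent.

```python
from collections import Counter, defaultdict

PAGE_SIZE = 512   # bytes per virtual page (assumed)
LINE_SIZE = 32    # bytes per Cache line (assumed)
NUM_LINES = 128   # 128 lines x 32 bytes = 4 KB direct-mapped Cache (assumed)


def build_slot_graphs(trace, slot_len):
    """Return {slot number: graph} for a trace of (cycle, address) fetches.

    Each graph holds per-page hit counts (node weights) and
    per-page-pair conflict counts (edge weights).
    """
    cache = {}  # line index -> line address currently resident
    graphs = defaultdict(lambda: {"hits": Counter(), "conflicts": Counter()})
    for cycle, addr in trace:
        graph = graphs[cycle // slot_len]
        line_addr = addr // LINE_SIZE
        index = line_addr % NUM_LINES
        page = addr // PAGE_SIZE
        resident = cache.get(index)
        if resident == line_addr:
            graph["hits"][page] += 1  # node weight: hit count of the page
        else:
            if resident is not None:
                victim_page = resident * LINE_SIZE // PAGE_SIZE
                if victim_page != page:
                    # edge weight: how often two pages evict each other
                    edge = tuple(sorted((page, victim_page)))
                    graph["conflicts"][edge] += 1
            cache[index] = line_addr  # line fill after the miss
    return dict(graphs)
```

In this sketch the Cache state carries over from one time slot to the next, and cold misses (no resident line) are not counted as conflicts.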
In each time slot, the SPM dynamic allocation algorithm uses the virtual memory management mechanism to relocate the pages that cause Cache conflicts into the SPM. With the limited on-chip SPM resources, this time-slot-based dynamic optimization achieves larger performance and energy gains than static optimization.
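The hit and miss benefits described for the time slot access graph can be written as a rough per-page estimate. The sketch below is an illustration only: the function name and the three per-access energy parameters are assumptions, the numbers passed to it would have to come from a memory model, and the DMA cost of moving the page into SPM is deliberately left out.

```python
def page_energy_gain(hits, misses, e_cache, e_spm, e_line_fill):
    """Estimated energy saved in one time slot by relocating one page to SPM.

    hits, misses : instruction Cache hits and misses of the page in the slot
    e_cache      : energy of one Cache read (assumed parameter)
    e_spm        : energy of one SPM read (assumed parameter)
    e_line_fill  : energy of one line fill from SDRAM over the bus (assumed)
    """
    # A hit saves the difference between one Cache read and one SPM read.
    hit_gain = hits * (e_cache - e_spm)
    # A miss costs two Cache reads plus the line fill; in SPM it is one read.
    miss_gain = misses * (2 * e_cache + e_line_fill - e_spm)
    return hit_gain + miss_gain
```

With any parameters where a line fill is expensive, a single eliminated miss outweighs many eliminated hits, which is why the conflict edges of the access graph matter for page selection.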
Fig. 4 is the system flowchart of the method, which uses time slot analysis to dynamically manage the on-chip heterogeneous instruction storage resources. In the program analysis phase, the first step builds the instruction Cache time slot access graph from the collected instruction Cache trace information. Analysis based on the time slot access graph is non-intrusive to the program. The second step is mathematical abstraction: the instruction Cache access graph is modeled mathematically to describe the weight distribution among the instruction pages, the change in weight distribution is used to quantify how the state of each candidate instruction page affects the energy function, and integer nonlinear programming then yields the state of each instruction page at which the overall energy benefit is optimal. The third step ranks, by benefit, the instruction pages that are accessed most frequently in each time slot and that easily cause instruction Cache conflicts. The number of instruction pages that can be optimized is the size of the instruction SPM memory divided by the MMU page size; that many of the most valuable instruction pages are selected and remapped into the instruction SPM memory. For example, with a 4 KB instruction SPM memory and a 512-byte MMU page, 4 KB / 512 bytes = 8 instruction pages can be optimized, so the 8 most valuable instruction pages of each time slot are placed in the instruction SPM memory. This gives the page numbers of the most valuable instruction pages in each time slot, and these pages can be swapped into the instruction SPM memory dynamically while the program runs. The fourth step adjusts the slot length iteratively, as shown in Fig. 5. Eliminating a large number of instruction Cache misses in each time slot yields gains in both energy and performance, and the performance gain directly shortens the execution time of the slot, which shifts the point in time at which the next slot begins. An iterative method is therefore used to compute, for each time slot, the difference in execution time before and after optimization and to correct the slot accordingly. After these steps, the adjusted length of each time slot and the pages to be moved into SPM in each time slot are known.
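The third and fourth steps can be sketched in Python as follows. This is a simplified stand-in, not the patent's procedure: page selection here is a plain ranking by a precomputed per-page gain rather than the integer nonlinear programming of the second step, and the slot correction shows a single pass that would be repeated after re-profiling. All function names and the input layout are hypothetical.

```python
from itertools import accumulate


def spm_page_capacity(spm_bytes, page_bytes):
    """Number of pages that fit into the instruction SPM."""
    return spm_bytes // page_bytes


def select_pages(gains_per_slot, capacity):
    """Pick, for each slot, the pages with the largest positive gain.

    gains_per_slot: list with one {page number: estimated gain} dict per slot.
    """
    selected = []
    for gains in gains_per_slot:
        candidates = [page for page, gain in gains.items() if gain > 0]
        candidates.sort(key=lambda page: (-gains[page], page))
        selected.append(candidates[:capacity])
    return selected


def corrected_slot_starts(slot_len, saved_cycles):
    """One correction pass: shorten each slot by the cycles it saved.

    Returns (start time of each slot, adjusted length of each slot).
    """
    lengths = [slot_len - saved for saved in saved_cycles]
    starts = [0] + list(accumulate(lengths))[:-1]
    return starts, lengths
```

For the example in the text, `spm_page_capacity(4096, 512)` gives 8 pages per time slot.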
In the program execution phase, when the timer of the clock module expires, the processor core receives the interrupt request issued by the interrupt handling module, and the system enters IRQ mode. In this exception mode, the page table entries are modified and the contents of the instruction SPM memory are swapped in and out to suit the memory access pattern of the next time slot. The clock interrupt proceeds as follows. First, after entering this mode and saving the relevant context, the handler reads the time slot register of the clock module to obtain the current slot number and, with it, the configuration needed for this time slot, including the page numbers of the instruction pages to be swapped in dynamically and the adjusted slot length. Second, the instruction SPM controller uses this value to obtain the main memory address space information and the related flag bits of the pages that should be buffered in the instruction SPM memory in the current time slot. Third, the instruction SPM controller loads the main memory addresses of the pages to be written into SPM in this time slot into the corresponding mapping region registers and begins updating the page table entries. Fourth, after the page table entries have been updated, the instruction SPM controller loads the main memory address from the mapping region register into the DMA source address register, loads the physical address of the corresponding SPM page into the DMA destination address register, and then starts the DMA swap-in transfer. Finally, the interrupt handler increments the time slot register of the clock module by 1 in preparation for the next time slot, restores the context saved before the interrupt, and exits; the processor core then resumes the benchmark program that was running before the clock interrupt.
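The interrupt sequence above can be modeled with a small Python simulation. It is a sketch under stated assumptions: the class and field names are hypothetical, the page table is a plain dictionary, DMA transfers are only logged as (source, destination, length) tuples, the timer is reloaded with the adjusted slot length, and the swap-out transfer of pages leaving the SPM is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class SlotConfig:
    length: int  # adjusted slot length in timer ticks
    pages: list  # virtual page numbers to hold in SPM during this slot


@dataclass
class Soc:
    slot_table: list                                 # one SlotConfig per slot
    slot_reg: int = 0                                # time slot register
    timer: int = 0                                   # clock count register
    page_table: dict = field(default_factory=dict)   # vpn -> (memory, frame)
    dma_log: list = field(default_factory=list)      # (src, dst, length)


def timer_irq(soc, page_size=512):
    """Model of the clock interrupt handler for one slot boundary."""
    if soc.slot_reg >= len(soc.slot_table):
        return  # no further slots configured
    cfg = soc.slot_table[soc.slot_reg]  # step 1: look up the slot configuration
    # Pages held in SPM during the previous slot map back to main memory.
    for vpn, (where, _) in list(soc.page_table.items()):
        if where == "SPM":
            soc.page_table[vpn] = ("SDRAM", vpn)
    for frame, vpn in enumerate(cfg.pages):
        # steps 2 and 3: record the mapping and update the page table entry
        soc.page_table[vpn] = ("SPM", frame)
        # step 4: DMA source is the main memory address, destination the SPM page
        soc.dma_log.append((vpn * page_size, frame * page_size, page_size))
    soc.timer = cfg.length  # reload the countdown for this slot (assumed)
    soc.slot_reg += 1       # prepare for the next slot
```

For simplicity the sketch reloads every page of the new slot, including pages that were already resident in SPM; a real handler would likely skip those transfers.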
Fig. 6 shows the energy benefit obtained with the method of the present invention for dynamically allocating on-chip heterogeneous instruction storage resources based on a virtual memory mechanism. The control experiment uses a 16 KB 4-way set-associative instruction Cache, as commonly found in real chips; the optimization experiment uses a 4 KB direct-mapped instruction Cache and an 8 KB instruction SPM memory, with the time slot analysis method managing the heterogeneous instruction storage resources dynamically. According to the on-chip memory areas calculated with Cacti 3.2, the combined area of the 4 KB direct-mapped Cache and the 4 KB SPM used in the optimization experiment is 19.0% smaller than that of the 8 KB 4-way set-associative Cache of the control experiment. System energy consumption falls by 12.7% on average and by up to 22.3%, and system execution time falls by 15.9% on average and by up to 25.6%.

Claims (5)

1. A method for dynamically allocating on-chip heterogeneous instruction storage resources based on a virtual memory mechanism, characterized in that: the accesses of the processor core to the instruction Cache are traced during execution of the application program to obtain the temporal and spatial distribution of instruction Cache hits and misses; from this distribution the time slot access graph of the instruction Cache is obtained, comprising hit weights and miss relations and weights, and is mathematically abstracted; according to the energy objective function and the performance objective function respectively, integer nonlinear programming is used to select the state of each instruction page at which total system energy consumption is optimal, which gives the page numbers of the instruction pages most worth optimizing in each time slot; the time slot size is adjusted by an iterative method; during program execution, through clock interrupts, the instruction SPM controller modifies the page table entries and configures the direct memory access controller (DMA), so that the address space of instruction access hot spots and of code that easily causes instruction Cache conflicts is dynamically remapped into the on-chip instruction SPM memory; system energy consumption is reduced through the difference in per-access energy between the instruction Cache and the instruction SPM and through the reduction of the extra memory accesses caused by instruction Cache misses.
2. The method for dynamically allocating on-chip heterogeneous instruction storage resources based on a virtual memory mechanism according to claim 1, characterized in that: after the time slot access graph of the instruction Cache is obtained and mathematically abstracted, the temporal locality shown during program execution is used to rank, by benefit, the instruction pages that are accessed most frequently in each time slot and that easily cause instruction Cache conflicts; the number of instruction pages to optimize is the size of the instruction SPM memory divided by the MMU page size; that number of the most valuable instruction pages is selected and remapped into the instruction SPM memory.
3. The method for dynamically allocating on-chip heterogeneous instruction storage resources based on a virtual memory mechanism according to claim 1, characterized in that: the virtual memory mechanism adds an S bit to the translation lookaside buffer (TLB) to indicate that the content of the page resides in the instruction SPM memory, which reduces extra Cache energy overhead; in addition, the TLB is modified to support 512-byte virtual pages and 256-byte virtual pages, so that the address space of instruction access hot spots and of code that easily causes instruction Cache conflicts can be separated out, which avoids remapping large amounts of address space with little optimization value.
4. The method for dynamically allocating on-chip heterogeneous instruction storage resources based on a virtual memory mechanism according to claim 1, characterized in that: during program execution the instruction SPM controller dynamically and efficiently swaps parts of the program instructions into and out of the instruction SPM memory, using the burst feature of the on-chip AHB high-speed bus and avoiding secondary pollution of the instruction Cache and the data Cache.
5. The method for dynamically allocating on-chip heterogeneous instruction storage resources based on a virtual memory mechanism according to claim 4, characterized in that: a group of instruction SPM control registers dedicated to recording the write-back address and the virtual page size is added to the instruction SPM controller:
1) this group of registers records the corresponding main memory address when a virtual page is remapped into the instruction SPM memory; this address serves as the DMA destination address when the virtual page is swapped out of the instruction SPM memory;
2) the bits in this group of registers that record the virtual page size are used to configure the DMA transfer length;
3) the enable bit in this group of registers controls whether the content of the page is available.
CN2009102645202A 2009-12-25 2009-12-25 Method for dynamically distributing isomerism storage resources on instruction parcel based on virtual memory mechanism Expired - Fee Related CN101763316B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009102645202A CN101763316B (en) 2009-12-25 2009-12-25 Method for dynamically distributing isomerism storage resources on instruction parcel based on virtual memory mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009102645202A CN101763316B (en) 2009-12-25 2009-12-25 Method for dynamically distributing isomerism storage resources on instruction parcel based on virtual memory mechanism

Publications (2)

Publication Number Publication Date
CN101763316A true CN101763316A (en) 2010-06-30
CN101763316B CN101763316B (en) 2011-06-29

Family

ID=42494483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009102645202A Expired - Fee Related CN101763316B (en) 2009-12-25 2009-12-25 Method for dynamically distributing isomerism storage resources on instruction parcel based on virtual memory mechanism

Country Status (1)

Country Link
CN (1) CN101763316B (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101901192A (en) * 2010-07-27 2010-12-01 杭州电子科技大学 On-chip and off-chip data object static assignment method
CN102073596A (en) * 2011-01-14 2011-05-25 东南大学 Method for managing reconfigurable on-chip unified memory aiming at instructions
CN102073596B (en) * 2011-01-14 2012-07-25 东南大学 Method for managing reconfigurable on-chip unified memory aiming at instructions
CN103218304B (en) * 2013-04-03 2016-07-20 杭州电子科技大学 Off-chip distribution method in a kind of embedded memory data slice
CN103218304A (en) * 2013-04-03 2013-07-24 杭州电子科技大学 On-chip and off-chip distribution method for embedded memory data
CN105701032B (en) * 2014-12-14 2019-06-11 上海兆芯集成电路有限公司 The set associative cache memory for having a variety of replacement policies
US9811468B2 (en) 2014-12-14 2017-11-07 Via Alliance Semiconductor Co., Ltd. Set associative cache memory with heterogeneous replacement policy
TWI559143B (en) * 2014-12-14 2016-11-21 上海兆芯集成電路有限公司 Set associative cache memory with heterogeneous replacement policy
US9652398B2 (en) 2014-12-14 2017-05-16 Via Alliance Semiconductor Co., Ltd. Cache replacement policy that considers memory access type
US9652400B2 (en) 2014-12-14 2017-05-16 Via Alliance Semiconductor Co., Ltd. Fully associative cache memory budgeted by memory access type
CN105701032A (en) * 2014-12-14 2016-06-22 上海兆芯集成电路有限公司 Set associative cache memory with heterogeneous replacement policy
US9910785B2 (en) 2014-12-14 2018-03-06 Via Alliance Semiconductor Co., Ltd Cache memory budgeted by ways based on memory access type
WO2016097813A1 (en) * 2014-12-14 2016-06-23 Via Alliance Semiconductor Co., Ltd. Set associative cache memory with heterogeneous replacement policy
US9898411B2 (en) 2014-12-14 2018-02-20 Via Alliance Semiconductor Co., Ltd. Cache memory budgeted by chunks based on memory access type
CN106708747A (en) * 2015-11-17 2017-05-24 深圳市中兴微电子技术有限公司 Memory switching method and device
TWI739844B (en) * 2016-06-29 2021-09-21 美商甲骨文國際公司 Method and processor for controlling prefetching to prevent over-saturation of interfaces in memory hierarchy, and related computer system
CN106874222A (en) * 2016-12-26 2017-06-20 深圳市紫光同创电子有限公司 instruction delay control method, controller and memory
CN106874222B (en) * 2016-12-26 2020-12-15 深圳市紫光同创电子有限公司 Instruction delay control method, controller and memory
CN110059024A (en) * 2019-04-19 2019-07-26 中国科学院微电子研究所 A kind of memory headroom data cache method and device
CN113867820A (en) * 2021-09-29 2021-12-31 深圳市智微智能软件开发有限公司 Method, device and equipment for dynamically modifying frame buffer and storage medium

Also Published As

Publication number Publication date
CN101763316B (en) 2011-06-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110629

Termination date: 20141225

EXPY Termination of patent right or utility model