CN102073596A - Method for managing reconfigurable on-chip unified memory aiming at instructions - Google Patents

Method for managing reconfigurable on-chip unified memory aiming at instructions

Info

Publication number
CN102073596A
Authority
CN
China
Prior art keywords
cache
spm
program
storage
restructural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011100073102A
Other languages
Chinese (zh)
Other versions
CN102073596B (en
Inventor
凌明
王欢
梅晨
翟婷婷
张阳
武建平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN2011100073102A
Publication of CN102073596A
Application granted
Publication of CN102073596B
Current legal status: Expired - Fee Related

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a method that uses a virtual memory mechanism to manage a reconfigurable on-chip unified memory for instructions. With this method, the parameters of the Cache part and the SPM (Scratch-Pad Memory) part of the reconfigurable unified memory can be adjusted dynamically while a program runs, so that the memory architecture matches the requirements of each program execution phase. The method analyzes the memory access behavior of each program phase, obtains phase-change behavior graphs for the instruction part, and abstracts them mathematically. From the energy consumption objective function and the performance objective function, integer nonlinear programming (INLP) yields the reconfigurable memory configuration for each program phase and selects the program instruction parts to be optimized. Through the virtual memory management mechanism, code segments that conflict severely in the Cache or are accessed frequently are mapped into the SPM part as far as possible. This reduces the external memory access energy caused by repeatedly refilling the Cache, reduces the extra energy consumed by the compare logic in the Cache, and improves system performance.

Description

Method for managing a reconfigurable on-chip unified memory for instructions
Technical field
The present invention relates to a reconfigurable on-chip unified memory, and in particular to the dynamic management of this memory by means of a virtual memory mechanism. Specifically, it provides the circuit of this memory and a method for managing it dynamically.
Background art
With the development of microelectronics, embedded computing platforms based on SoC (System-on-a-Chip) have become increasingly mature. However, because the gap between processor speed and external memory speed keeps widening, the SoC memory subsystem has become the bottleneck for system performance, power consumption, and cost. How to optimize the architecture and management strategy of the memory subsystem has therefore long been a focus of embedded systems research.
Cache and SPM (Scratch-Pad Memory) are the most common conventional on-chip memories. A Cache is managed by hardware and is transparent to software in most cases; it automatically loads recently accessed instructions and data into on-chip memory. However, its high power consumption, large area, and unpredictable program execution time have long limited its wide use in embedded systems. In particular, because a Cache is set-associative, different program contents that map to the same Cache line may, depending on the access pattern, repeatedly replace one another, which adds performance and energy overhead. This is known as Cache thrashing. An SPM, by contrast, is a high-speed on-chip memory, usually implemented in SRAM, and is an important architectural design consideration in modern embedded systems. The SPM lies within the address space that the processor can access directly. Because a conventional SPM controller contains no logic to assist in managing data, all SPM contents must be managed explicitly by software, which makes program management more complex than with a Cache that is transparent to the programmer. Because it avoids the overhead of management logic, an SPM is simpler to implement in hardware than a conventional Cache, consumes less power per access, occupies less chip area, and has a predictable access time. In summary, Cache and SPM each have their own advantages and complement each other. Studying a reconfigurable on-chip unified memory that integrates Cache and SPM under unified configuration management can therefore exploit the advantages of both, reducing system energy consumption and improving system performance as far as possible.
Existing research on embedded on-chip memory mainly analyzes architectures configured with only a Cache or only an SPM, and cannot make good use of their complementary characteristics. Directly applying an optimization algorithm designed only for SPM, or only for Cache, to a reconfigurable on-chip unified memory cannot optimize overall power consumption and performance: the gain obtained on one kind of memory may be offset by overhead on the other, and additional performance and energy overhead may even be introduced. For example, an SPM optimization algorithm moves a section of main memory content into the SPM and thereby gains performance and energy. However, the copying code itself may pollute the instruction Cache and defeat the Cache optimization algorithm, causing extra Cache misses that offset the gain from SPM optimization.
On a Cache miss, the external memory must actually be accessed and the new content loaded into a Cache line. This overhead is large and is called the Cache miss penalty. Because a Cache is set-associative, contents mapped to the same Cache line repeatedly replace one another and cause a large number of memory accesses, so system performance drops sharply and system energy consumption rises sharply. This is a Cache conflict. Increasing the Cache capacity or the associativity can reduce Cache conflicts, but it adds chip area and raises the time and energy of each Cache access. Moreover, in a Cache with high associativity many storage blocks in some ways sit idle, wasting valuable on-chip storage resources. Some studies point out that Cache conflicts are a major cause of system performance and energy bottlenecks, and therefore place program segments that easily cause Cache conflicts into the SPM to gain performance and energy. Placing conflict-prone pages in the SPM not only reduces system energy consumption and improves system performance by reducing Cache conflicts, but also yields further gains from the difference in per-access energy between SPM and Cache. However, these studies are all based on static circuit designs, in which the associativity of the Cache and the size of the SPM cannot change during program execution. Studies show that different applications, and even different phases of the same program, have different memory access characteristics, and such a fixed memory architecture cannot adapt to these changes.
Because changing the SPM contents requires explicit software action, research on dynamic SPM management generally works by instrumentation: copy instructions are inserted manually before and after the kernel loops to be optimized, so that program contents are swapped in and out. Inserting new instructions into the program image depends on analysis of the source code, and in an architecture where Cache and SPM coexist the new instructions are likely to change Cache behavior, for example by producing more conflicts.
Current research on the instruction part in architectures where Cache and SPM coexist generally requires intrusive analysis of the program: code must be inserted into or modified in the user program to swap contents in and out dynamically during execution. Research on reconfigurable architectures mostly concerns reconfigurable Caches, tentatively changing Cache parameters while the program runs to minimize energy consumption, but it cannot improve program performance. To date, no research has addressed a method that uses virtual memory management to dynamically manage a reconfigurable on-chip unified memory for the instruction part of a program.
Summary of the invention
Technical problem: The object of the invention is to overcome the deficiencies of existing on-chip memory subsystems. Using a reconfigurable on-chip unified memory, it proposes a method that manages the reconfigurable memory dynamically through a virtual memory mechanism. The parameters of the Cache part and the SPM part of the reconfigurable memory are configured dynamically according to the phases of program execution, and instruction pages that cause Cache conflicts or are accessed frequently are mapped into the SPM part. This reduces the extra memory accesses caused by conflicts and the extra energy consumed by the Cache compare logic, and ultimately reduces system energy consumption and speeds up the microprocessor.
Technical solution: In the method of the invention for dynamically managing a reconfigurable on-chip unified memory through a virtual memory mechanism, the instruction fetches of the processor and the behavior of the Cache part of the reconfigurable memory are traced while the application executes. This yields the instruction execution characteristics and the temporal and spatial distribution of instruction hits and misses in the Cache, from which the phase-change behavior graphs of the instruction Cache in the different phases are obtained and abstracted mathematically. Based on the energy consumption objective function and the performance objective function, integer nonlinear programming selects the reconfigurable memory parameter configuration and the allocation of each instruction page for which total system energy consumption is optimal. During program execution, the program phase-change detector raises a phase-change interrupt. In each phase, the structure of the Cache part and the SPM (Scratch-Pad Memory) part of the reconfigurable on-chip unified memory is adjusted, and suitable instruction pages are mapped into the SPM by modifying page table entries and configuring the direct memory access (DMA) controller and the reconfigurable on-chip unified memory controller. This eliminates the extra memory accesses caused by instruction Cache conflicts and the extra compare-logic energy caused by frequent Cache accesses.
Because different phases of program execution exhibit different instruction execution characteristics, program execution is divided into phases. After the phase-change behavior graph of the Cache in the reconfigurable memory has been obtained for each phase, the locality of the instructions in the current phase is exploited: ways of the Cache part with low utilization are reconfigured as SPM, the instruction address space that most frequently causes instruction Cache conflicts or is accessed frequently during a period is remapped into the SPM storage area, and it is mapped back to main memory when its benefit becomes small.
While the program runs, the reconfigurable on-chip unified memory can be reconfigured through the current configuration information register of its controller: the Tag bank of one way of the Cache part is switched off and its Data bank is reconfigured for use as SPM, or the Tag bank corresponding to a bank of the SPM is switched on and the bank is reconfigured for use as Cache. In this way the associativity of the Cache and the capacity of the SPM in the memory architecture can be adjusted dynamically. The memory controller also contains a configuration information register group dedicated to recording the reconfiguration information of each program phase, and an SPM region register group that records the mapping of the SPM region. Their functions are as follows:
1) The configuration information register group records the configuration of the Cache part and the SPM part of the reconfigurable memory for each program phase. When the program phase-change detector detects a change of program phase, the interrupt handler loads the configuration required by the new phase from this register group into the current configuration information register, completing the dynamic configuration of the reconfigurable memory;
2) The SPM region register group records the physical addresses of the instruction pages that must be swapped into the SPM storage area in each program phase, which are used to configure the DMA controller to move the instruction pages from main memory into the SPM storage area. This register group also stores the page table entry that a virtual page had before it was swapped in, so that the entry can be restored when the page is swapped out of the SPM storage area.
The program phase-change detector collects statistics on the instruction execution characteristics while the program runs. According to the detection mode and threshold set in its configuration register, it raises a phase-change interrupt when the program phase changes. The interrupt handler can then configure the reconfigurable on-chip unified memory to meet the memory architecture requirements of the different program phases.
The reconfigurable on-chip unified memory comprises a Cache part and an SPM part, whose parameters can be adjusted dynamically while the program runs: the associativity of the Cache part and the capacity of the SPM part.
The phase-change detector measures in real time a characteristic of the instructions that the processor executes while the program runs, uses changes in this characteristic to determine that the program phase has changed, records the phase sequence number, and sends an interrupt signal to the processor.
The instruction execution characteristics and the temporal and spatial distribution of instruction hits and misses in the Cache are obtained. Using the phases that the program exhibits during execution, the address space that most frequently causes Cache conflicts or is accessed frequently during a period is remapped into the SPM, and is mapped back to main memory when its benefit becomes small.
During program execution, the reconfigurable on-chip unified memory controller uses its internal DMA controller to swap program instruction parts into the SPM storage area dynamically and efficiently. This exploits the burst capability of the on-chip AHB high-speed bus and avoids the secondary Cache pollution that copying by the processor would cause.
The reconfigurable on-chip unified memory controller contains a register group dedicated to recording the reconfigurable memory configuration of each program phase and the address mapping of the SPM storage area:
1) When the program phase-change detection circuit detects a change of program phase, the interrupt handler loads the configuration required by the new phase from this register group into the current configuration information register, completing the dynamic configuration of the reconfigurable memory;
2) This register group records the physical addresses of the instruction pages that must be swapped into the SPM storage area in each program phase, which are used to configure the DMA controller to move the instruction pages from main memory into the SPM storage area;
3) When a virtual page is remapped to the SPM storage area, this register group records the page's corresponding main memory address. When the page is swapped out of the SPM storage area, this address is used to restore the page table entry that the virtual page had before it was swapped in.
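The save-and-restore role described in items 2) and 3) can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name SpmRegionRegisters, the dictionary used as a page table, and the example addresses are all hypothetical and are not taken from the patent, which describes hardware registers.

```python
class SpmRegionRegisters:
    """Hypothetical software model of the SPM region register group."""

    def __init__(self):
        # virtual page number -> page table entry held before the swap-in
        self._saved = {}

    def swap_in(self, page_table, vpn, spm_addr):
        """Remember the main-memory mapping, then point the page at the SPM."""
        self._saved[vpn] = page_table[vpn]
        page_table[vpn] = spm_addr

    def swap_out(self, page_table, vpn):
        """Restore the page table entry recorded at swap-in time."""
        page_table[vpn] = self._saved.pop(vpn)


# Assumed example values: one virtual page mapped to main memory.
page_table = {0x40: 0x80004000}
regs = SpmRegionRegisters()
regs.swap_in(page_table, 0x40, 0x10000000)
mapped = page_table[0x40]      # now points into the SPM region
regs.swap_out(page_table, 0x40)
restored = page_table[0x40]    # original main-memory mapping again
```

Because the virtual page number never changes, the program sees the same virtual address before, during, and after the page resides in the SPM.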
Beneficial effects: The invention makes full use of the phase characteristics of program execution and introduces the novel concept of a phase-change behavior graph. By analyzing the phase-change behavior graph, it dynamically configures the parameters of the Cache part and the SPM part of the reconfigurable on-chip unified memory to match the memory access characteristics of each phase of program execution, reducing system energy consumption as far as possible and improving system performance to some extent. Virtual memory management conveniently removes the drawback of conventional SPM optimization techniques, which intrusively modify the program code layout. Conventional techniques mostly insert copy instructions into the program to move the sections to be optimized into the SPM dynamically. With virtual memory management, the actual physical addresses are decoupled from the virtual addresses assigned when the program is compiled. The virtual address space seen by the program is thus contiguous both before and after optimization, while in the actual hardware the instruction segments that are accessed frequently or cause Cache conflicts are remapped into the SPM part. This reduces the number of Cache accesses and conflicts and ultimately yields gains in performance and energy consumption. At the same time, managing the program through the virtual memory mechanism allows non-intrusive analysis and optimization: no SPM copying code needs to be added explicitly to the user program, and program contents are swapped in and out within the phase-change interrupt handler by configuring the DMA and modifying the page table. The invention combines virtual memory management with the reconfigurable on-chip unified memory and obtains greater performance and energy gains than Cache-only or SPM-only optimization.
Brief description of the drawings
Fig. 1 is a block diagram of the system that uses a virtual memory mechanism to manage the reconfigurable on-chip unified memory dynamically;
Fig. 2 is a schematic diagram of the modified TLB page table entry;
Fig. 3 is a schematic diagram of the reconfigurable on-chip unified memory;
Fig. 4 is a schematic diagram of a phase-change behavior graph;
Fig. 5 is a flowchart of the method that uses a virtual memory mechanism to manage the reconfigurable on-chip unified memory dynamically.
Detailed description of the embodiments
The method of the invention can be implemented in the following steps:
(1) Establishing the virtual memory management mechanism
By modifying page table entries, the virtual memory management mechanism can create addresses that are physically separate but logically contiguous, so that the addresses of some program pages can be mapped into the SPM part of the reconfigurable memory. Compared with conventional dynamic SPM optimization techniques, changing the address space mapping through virtual memory allows completely non-intrusive optimization with respect to both the program source code and the binary image generated by compilation. To suit the dynamic management of Cache and SPM and to improve the utilization of the SPM part, the invention must modify the original MMU hardware: by modifying the decoding logic of the TLB, support is added for virtual pages of 512 bytes and 256 bytes. A conventional TLB supports virtual pages no smaller than 1 KB, whereas a Cache is organized in lines of only 32 to 64 bytes, and the address ranges that suffer instruction Cache conflicts or are accessed frequently during a period of program execution are mostly smaller than the minimum virtual page size a conventional TLB supports. To refine the optimization granularity and improve SPM utilization, the invention uses a reserved bit in the conventional page table entry and modifies the Tag memory and comparator circuit of the TLB to support virtual pages of 256 bytes and 512 bytes.
(2) Building the phase-change behavior graph
The invention optimizes the reconfigurable memory dynamically by analyzing the access behavior of its Cache part. Because Cache behavior exhibits clear program phases, the invention introduces the novel concept of a "phase-change behavior graph" to analyze Cache behavior in time and space. The phase-change behavior graph is a mathematical abstraction of the trace information of the Cache part of the reconfigurable memory. It is a weighted graph that quantitatively describes the replacement relations and the access behavior among different program instruction segments mapped to the same Cache line. Because the invention manages the program instruction part through virtual memory management, the granularity at which the program is divided is the MMU page size. Cache behavior is abstracted by page and modeled mathematically to describe the weight distribution among the pages. Integer nonlinear programming then yields, for each time slot, the optimal configuration of the reconfigurable memory and the mapping of each page for which overall energy and performance gains are optimal. This gives the pages most worth optimizing in each phase; when the program phase changes, the memory is reconfigured and these pages are swapped into the SPM part dynamically.
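One way to picture how such a graph could be derived from a fetch trace is the sketch below. It is not the patent's procedure: it assumes a small set-associative cache with LRU replacement (the text does not name a replacement policy), records per-page access counts as node weights, and counts how often a line of one page evicts a line of another as edge weights. The function name build_behavior_graph and all parameter defaults are hypothetical.

```python
from collections import OrderedDict, defaultdict


def build_behavior_graph(trace, page_size=256, line_size=32, num_sets=4, ways=2):
    """Return (access_count, conflict_weight) for one phase of a fetch trace.

    access_count[page] is the number of fetches that fall in the page.
    conflict_weight[(evictor, victim)] is the number of times a line of
    page `evictor` replaced a line of page `victim` in the same cache set.
    LRU replacement is an assumption of this sketch.
    """
    sets = [OrderedDict() for _ in range(num_sets)]  # line -> None, oldest first
    access_count = defaultdict(int)
    conflict_weight = defaultdict(int)
    for addr in trace:
        page = addr // page_size
        line = addr // line_size
        cache_set = sets[line % num_sets]
        access_count[page] += 1
        if line in cache_set:
            cache_set.move_to_end(line)  # hit: mark as most recently used
            continue
        if len(cache_set) == ways:
            victim_line, _ = cache_set.popitem(last=False)  # evict the LRU line
            victim_page = victim_line * line_size // page_size
            conflict_weight[(page, victim_page)] += 1
        cache_set[line] = None
    return dict(access_count), dict(conflict_weight)


# Two pages whose lines share set 0 of a direct-mapped (1-way) cache.
thrash_trace = [0x000, 0x100, 0x000, 0x100]
counts, conflicts = build_behavior_graph(thrash_trace, ways=1)
_, no_conflicts = build_behavior_graph(thrash_trace, ways=2)
```

With one way the two pages evict each other on every fetch, while with two ways the same trace produces no conflict edges, which is the kind of difference between configurations that the graph is meant to expose.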
(3) Program phase-change analysis
The invention uses program phase changes to manage the reconfigurable memory dynamically. The execution of a program can often be divided into program phases. Within each phase the behavior of the program is essentially constant, as reflected in its memory structure requirements, its number of instructions executed per cycle, and so on. The invention uses the phase-change detector to measure in real time the number of instructions the processor executes per cycle. When the program phase changes, a hardware interrupt is raised: the processor core receives the interrupt request from the interrupt handling module, the system enters interrupt mode, the structure of the reconfigurable memory is adjusted, and the SPM storage area is remapped.
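A threshold-based detector of this kind can be sketched in a few lines. The patent states only that a detection mode and a threshold are configured; the specific rule used here (compare each instructions-per-cycle sample with a reference taken at the start of the current phase) is an assumption, as are the function name and the sample values.

```python
def detect_phase_changes(ipc_samples, threshold):
    """Return a list of (sample_index, phase_number) for each detected change.

    Assumed rule: a phase change is flagged when a sample differs from the
    reference of the current phase by more than `threshold`; the sample that
    triggered the change becomes the new reference.
    """
    changes = []
    reference = None
    phase = 0
    for index, ipc in enumerate(ipc_samples):
        if reference is None:
            reference = ipc  # first sample defines phase 0
            continue
        if abs(ipc - reference) > threshold:
            phase += 1
            reference = ipc
            changes.append((index, phase))  # hardware would raise an interrupt here
    return changes


samples = [1.0, 1.05, 0.98, 0.4, 0.42, 0.41, 1.2, 1.18]
phase_changes = detect_phase_changes(samples, threshold=0.3)
```

In hardware, each entry of the returned list would correspond to recording the phase sequence number and raising the phase-change interrupt.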
(4) Dynamic management through the reconfigurable memory controller
During program execution, when the phase-change detection module detects a change of program phase, the processor core, in exception mode, configures the reconfigurable memory controller to reconfigure the memory, modify the page table entries, and swap contents into the SPM, so as to suit the memory access pattern of the program in the new phase.
In the phase-change interrupt, the memory is reconfigured by configuring the reconfigurable memory controller. First, the phase-change record register in the phase-change detection module is read to find where the configuration information of the current phase is stored. Second, the configuration information is loaded into the current configuration register of the reconfigurable memory controller to adjust the parameters of the Cache part and the SPM part. Third, the page table entries of the instruction pages to be mapped to the SPM part in this phase are updated. Fourth, the DMA registers are configured to copy the instruction pages to be mapped to the SPM part from main memory into the SPM part. Fifth, the reconfigurable memory is enabled and the processor returns to normal program execution.
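The five steps above can be walked through with a simplified software model. Everything named here (PhaseConfig, Controller, handle_phase_interrupt, the dma_log list that stands in for DMA register writes, and the example addresses) is hypothetical; disabling the memory at the start of the handler is an assumption inferred from the fifth step, and swapping out pages from the previous phase is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class PhaseConfig:
    cs_reg: int      # value to load into the current configuration register
    spm_pages: dict  # virtual page number -> main-memory address of the page


@dataclass
class Controller:
    context_regs: dict  # phase number -> PhaseConfig
    current_cs_reg: int = 0
    enabled: bool = True
    dma_log: list = field(default_factory=list)  # (source, destination, length)


def handle_phase_interrupt(phase, ctrl, page_table, spm_base, page_size=256):
    ctrl.enabled = False                      # assumed: memory is idle while reconfigured
    cfg = ctrl.context_regs[phase]            # step 1: find this phase's configuration
    ctrl.current_cs_reg = cfg.cs_reg          # step 2: load the current configuration
    placement = {}
    for slot, vpn in enumerate(sorted(cfg.spm_pages)):
        placement[vpn] = spm_base + slot * page_size
        page_table[vpn] = placement[vpn]      # step 3: update page table entries
    for vpn, spm_addr in placement.items():   # step 4: program one DMA transfer per page
        ctrl.dma_log.append((cfg.spm_pages[vpn], spm_addr, page_size))
    ctrl.enabled = True                       # step 5: enable memory, resume execution


ctrl = Controller(context_regs={
    1: PhaseConfig(cs_reg=0b0001, spm_pages={0x40: 0x80004000, 0x12: 0x80001200}),
})
page_table = {0x12: 0x80001200, 0x40: 0x80004000}
handle_phase_interrupt(1, ctrl, page_table, spm_base=0x10000000)
```

After the call, both virtual pages resolve to SPM addresses and the log holds the two transfers the DMA controller would perform.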
The reconfigurable memory controller of the invention involves the following registers. First, the current configuration information register, which configures each bank of the reconfigurable memory as Cache or as SPM. Second, the context configuration information register group, in which each register corresponds to the memory configuration of one program phase and is loaded into the current configuration information register when the program phase changes. Third, the SPM region register group, which records the SPM mapping of each program phase and is read in order to modify page table entries when pages are swapped into or out of the SPM part. Fourth, the DMA transfer control register: configuring the DMA moves main memory content into the SPM storage area dynamically. Compared with the conventional approach of swapping SPM contents in and out with LDR/STR instructions, the DMA makes extensive use of the burst capability of the SDRAM main memory and the on-chip AHB high-speed bus, reducing the transfer cost and the interrupt latency.
The invention is described in further detail below with reference to the drawings and embodiments.
Fig. 1 shows the system block diagram, which comprises the processor core, the phase-change detector, the memory management unit (MMU), the instruction router, the reconfigurable on-chip unified memory, the reconfigurable memory controller, a dedicated direct memory access (DMA) controller, the bus, the interrupt controller, the clock module, the external memory interface, and the off-chip SDRAM main memory. The parts added to the original architecture are the phase-change detector, the reconfigurable on-chip unified memory, and the reconfigurable memory controller.
The processor core issues the virtual address of an instruction access. After the memory management unit (MMU) translates it into a physical address, the instruction router, according to the flag bits of the translation lookaside buffer (TLB) entry, sends the physical address to the Cache part of the reconfigurable memory, to the SPM part, or to off-chip memory. The phase-change detector monitors the instruction fetches of the CPU in real time and sends an interrupt signal when it detects a phase change; the reconfigurable memory controller and the interrupt controller respond, and the reconfigurable memory controller is configured in the interrupt handler. The reconfigurable memory controller contains the current configuration information register, a group of context configuration information registers, and the SPM region registers. According to the SPM region registers, the controller configures the source address, destination address, and transfer length of the DMA controller, and the DMA controller replaces the contents of the SPM storage area with program contents from the off-chip SDRAM main memory through the high-speed AHB bus and the external memory interface.
Figure 2 shows the modification made to the instruction TLB page table entry to support 512-byte and 256-byte virtual pages. A conventional MMU supports a minimum page size of only 1 KB, and in virtual-memory-based dynamic allocation of on-chip heterogeneous instruction storage resources, the minimum SPM management granularity is the MMU page size. If larger pages are used, the SPM area cannot be used efficiently for program instructions that are widely scattered. The present invention therefore modifies bit 2 of the second-level page table entry in the ARMv5TEJ standard PTE format: because instruction accesses do not need the Buffer attribute, the former B bit is reused as a Size extension bit, and the Tag memory and comparator circuit of the TLB are modified to support 256-byte and 512-byte virtual pages. The original address translation circuit must be adjusted and the TLB structure modified to add this support, so that the on-chip memory area can be fully used when the instruction SPM is managed dynamically. The TLB mainly comprises a Tag storage array, two SRAM storage arrays, an address decoding circuit, Hit logic, read/write control logic, and input/output driver circuits. A virtual address normally consists of a page number and an offset. In operation, the CPU issues a 32-bit virtual address, and the high-order page number of that address is compared with the virtual page numbers held in the Tag array. Because finer-grained pages are supported, the page number is correspondingly longer: the invention supports Tag comparisons of up to 24 bits, which corresponds to the minimum page size of 256 bytes. For 512-byte virtual pages, only the upper 23 bits of the Tag are used. The TLB also supports 22-, 20-, 16-, and 12-bit Tag comparisons, which correspond to the translation modes for tiny pages, small pages, large pages, and sections respectively.
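The relationship between page size and Tag width described above can be checked with a short calculation. The following is a minimal sketch, not the patent's circuit: the 32-bit address width and the page sizes come from the description, while the function names and the dictionary are illustrative assumptions.

```python
# Sketch only: ADDRESS_BITS and the page sizes follow the description;
# the helper names are hypothetical and not part of the patent.
ADDRESS_BITS = 32

PAGE_SIZES = {
    "256-byte page": 256,
    "512-byte page": 512,
    "tiny page": 1024,
    "small page": 4 * 1024,
    "large page": 64 * 1024,
    "section": 1024 * 1024,
}

def tag_bits(page_size: int) -> int:
    """Number of high-order virtual-address bits compared against the Tag."""
    offset_bits = page_size.bit_length() - 1  # log2 of a power-of-two size
    return ADDRESS_BITS - offset_bits

def split_address(vaddr: int, page_size: int) -> tuple:
    """Split a virtual address into (page number, offset within the page)."""
    offset_bits = page_size.bit_length() - 1
    return vaddr >> offset_bits, vaddr & (page_size - 1)

if __name__ == "__main__":
    for name, size in PAGE_SIZES.items():
        print(f"{name}: {tag_bits(size)}-bit tag")
```

With a 256-byte page, the address 0x12345678 splits into page number 0x123456 (24 bits) and offset 0x78, which matches the 24-bit maximum Tag comparison stated in the text.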
Figure 3 shows the structure of the reconfigurable memory. It comprises a reconfigurable memory controller, tag storage arrays, data storage arrays, a dedicated DMA, and other components. The memory bank part is based on a 4-way set-associative Cache structure; the main difference is that the tag storage arrays and data storage arrays can be controlled by the reconfigurable memory controller. The controller contains a current configuration information register group, current_cs_reg, in which each of C1-C4 controls one way's tag storage array and its corresponding data storage array. When C_i is 1, tag_i is switched off and data_i serves as SPM storage; when C_i is 0, tag_i is switched on and data_i serves as Cache storage. The controller also contains a group of SPM region registers, which store the mapping between the SPM part and main memory for each program phase. In addition, the controller holds a group of context configuration information registers so that, when the program changes phase, the phase context can be switched quickly: the reconfigurable memory completes the reconstruction of its banks in the shortest possible time and uses the dedicated DMA to map the SPM storage area quickly. As the structure diagram shows, when a way is configured as SPM, the extra power consumed by the tag comparison logic is saved, and the data part changes from being inaccessible to software to being software-addressable.
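The effect of the C1-C4 bits can be illustrated with a small decoder. This is a sketch under stated assumptions: the bit layout (bit 0 holds C1) and the per-way capacity `WAY_SIZE_BYTES` are not given in the source and are chosen here only for illustration.

```python
# Illustrative model of current_cs_reg. The bit order (bit 0 = C1) and
# WAY_SIZE_BYTES are assumptions; the source does not specify either.
NUM_WAYS = 4
WAY_SIZE_BYTES = 4 * 1024

def decode_config(current_cs_reg: int) -> dict:
    """Interpret C1..C4: a 1 bit means the way is SPM, a 0 bit means Cache."""
    spm_ways = [i + 1 for i in range(NUM_WAYS) if (current_cs_reg >> i) & 1]
    cache_ways = [i + 1 for i in range(NUM_WAYS) if not (current_cs_reg >> i) & 1]
    return {
        "spm_ways": spm_ways,            # tag_i off, data_i software-addressable
        "cache_ways": cache_ways,        # tag_i on, data_i used as Cache
        "cache_associativity": len(cache_ways),
        "spm_capacity_bytes": len(spm_ways) * WAY_SIZE_BYTES,
    }

if __name__ == "__main__":
    print(decode_config(0b0101))
```

Under these assumptions, the value 0b0101 turns ways 1 and 3 into SPM and leaves a 2-way set-associative Cache.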
Figure 4 is a schematic of the phase-change behavior graph. Program execution exhibits fairly distinct phases. Based on these phase characteristics, the phase-change behavior graph divides the whole execution into several phases, obtains a memory access behavior graph within each phase, and from it derives the best storage configuration of the reconfigurable memory for that program phase. In each time slot, a dynamic allocation algorithm uses the virtual memory management mechanism to relocate the pages that cause Cache conflicts and the pages that are accessed frequently into the SPM storage area. Dynamic optimization based on program phase characteristics makes good use of the limited on-chip storage resources and yields larger performance and energy benefits than a fixed storage structure.
Figure 5 shows the system flowchart of the method for dynamically managing the reconfigurable on-chip unified memory using the virtual memory mechanism.
In the program analysis phase, the first step configures all banks of the reconfigurable memory as Cache and builds the program's phase-change behavior graph from the trace information collected from the Cache part. Analysis based on the phase-change behavior graph is non-intrusive to the program. The second step performs mathematical abstraction: the phase-change behavior graph is modeled mathematically to describe how each instruction page is accessed during program execution and how the pages relate to one another; the effect of each candidate node's state on the energy consumption function is then quantified by analyzing how the weight distribution of the instruction pages changes across program phases; finally, integer nonlinear programming yields the state of each node at which the overall energy benefit is optimal. The third step uses the results of the second step to obtain the best memory configuration required in each program phase and thus determines the reconfiguration information of the reconfigurable memory for each phase. The fourth step uses the storage layout after reconfiguration, that is, the parameters of the Cache part and the SPM part, to determine which instruction pages must be mapped to the SPM part in each program phase and where they are placed within it, which gives the values of the SPM region register group. After these steps, the memory configuration information for each phase of program execution and the region mapping of the SPM storage area are available.
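The per-phase selection of a configuration and of the pages to place in the SPM can be sketched as follows. This is not the integer nonlinear programming formulation used by the invention, whose objective functions are not given in this section; it is a toy stand-in that enumerates the number of SPM ways for one phase under an assumed, separable energy model. All constants, the conflict scaling, and the function names are hypothetical.

```python
# Toy stand-in for the per-phase optimisation. The energy constants, the
# conflict scaling and PAGES_PER_WAY are assumptions, not patent values.
NUM_WAYS = 4
PAGES_PER_WAY = 2                  # assumed SPM capacity per way, in pages
E_SPM, E_DATA, E_TAG, E_MAIN = 1.0, 1.0, 0.25, 20.0  # assumed energy units

def page_cost_in_cache(accesses, misses, cache_ways):
    """Assumed energy of leaving a page in the Cache part."""
    if cache_ways == 0:
        return accesses * E_MAIN                  # no Cache left: main memory
    hit_energy = E_DATA + cache_ways * E_TAG      # one tag compare per way
    conflict_scale = NUM_WAYS / cache_ways        # fewer ways, more conflicts
    return accesses * hit_energy + misses * conflict_scale * E_MAIN

def best_config(phase_pages):
    """phase_pages maps page number -> (accesses, conflict misses).

    Returns (total energy, number of SPM ways, sorted pages placed in SPM).
    """
    best = None
    for spm_ways in range(NUM_WAYS + 1):
        cache_ways = NUM_WAYS - spm_ways
        saving = {p: page_cost_in_cache(a, m, cache_ways) - a * E_SPM
                  for p, (a, m) in phase_pages.items()}
        ranked = sorted(saving, key=lambda p: saving[p], reverse=True)
        in_spm = {p for p in ranked[:spm_ways * PAGES_PER_WAY] if saving[p] > 0}
        total = sum(a * E_SPM if p in in_spm
                    else page_cost_in_cache(a, m, cache_ways)
                    for p, (a, m) in phase_pages.items())
        if best is None or total < best[0]:       # ties keep more Cache ways
            best = (total, spm_ways, sorted(in_spm))
    return best

if __name__ == "__main__":
    print(best_config({0x10: (1000, 50), 0x11: (10, 0)}))
```

Because the assumed cost is additive per page, ranking pages by saving is exact for each candidate number of SPM ways; a real formulation with inter-page conflict terms would need the nonlinear solver the description refers to.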
In the program execution phase, the values of the configuration information registers and the SPM region registers are first loaded into the reconfigurable memory controller. When the program phase changes, the processor core receives an interrupt request from the interrupt controller and the system enters interrupt mode. In interrupt mode, the configuration information in the context configuration information registers is loaded into the current configuration information register, which completes the reconfiguration of the reconfigurable memory, the modification of the page table entries, and the swapping of content into and out of the SPM, so that the memory adapts to the access pattern of the current program phase. The interrupt is handled as follows. First, after entering interrupt mode and saving the relevant context, the Cache part of the reconfigurable memory and the MMU are disabled, because the memory must be reconfigured and the page table modified. Second, the phase-change counter register is read to obtain the current program phase number. Third, the region registers for the current phase are read, and the page table entries of the instruction pages that must be mapped to the SPM are modified. Fourth, the configuration information in the context configuration information registers is loaded into the current configuration information register, and the reconfigurable memory controller configures the memory structure accordingly. Fifth, the dedicated DMA is configured: the main-memory address held in the mapping region register is loaded into the DMA source address register, the physical address of the corresponding page in the SPM storage area is loaded into the DMA destination address register, and the DMA is then enabled to move the instruction pages that must be mapped to the SPM into the SPM part. Sixth, after the DMA transfer ends, the Cache and the MMU are re-enabled, the context saved before the interrupt is restored, the interrupt handler exits, and the processor core resumes the program that was running before the interrupt.
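The six handler steps can be traced with a small software model. This is a sketch only: the real sequence runs in an interrupt handler against hardware registers, and the class, field names, and the simplified page table (a mapping from main-memory page address to SPM address) are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ReconfigurableMemory:
    """Hypothetical software model of the controller state."""
    context_cs_regs: dict       # phase number -> configuration bits C1..C4
    spm_region_regs: dict       # phase number -> list of (main_addr, spm_addr)
    current_cs_reg: int = 0
    cache_enabled: bool = True
    mmu_enabled: bool = True
    page_table: dict = field(default_factory=dict)   # main_addr -> spm_addr
    dma_log: list = field(default_factory=list)      # (source, destination)

def handle_phase_interrupt(mem: ReconfigurableMemory, phase_counter: int) -> int:
    # Step 1: context save is omitted here; disable the Cache part and MMU.
    mem.cache_enabled = False
    mem.mmu_enabled = False
    # Step 2: read the phase-change counter to get the current phase number.
    phase = phase_counter
    # Step 3: rewrite the entries of the pages that move to SPM in this phase.
    regions = mem.spm_region_regs[phase]
    mem.page_table = {main_addr: spm_addr for main_addr, spm_addr in regions}
    # Step 4: load the context configuration into the current register.
    mem.current_cs_reg = mem.context_cs_regs[phase]
    # Step 5: program the DMA source and destination, then copy each page.
    for main_addr, spm_addr in regions:
        mem.dma_log.append((main_addr, spm_addr))
    # Step 6: re-enable Cache and MMU; context restore is omitted here.
    mem.cache_enabled = True
    mem.mmu_enabled = True
    return phase

if __name__ == "__main__":
    mem = ReconfigurableMemory(
        context_cs_regs={0: 0b0000, 1: 0b0011},
        spm_region_regs={0: [], 1: [(0x8000_0100, 0x0000), (0x8000_0500, 0x0100)]},
    )
    handle_phase_interrupt(mem, 1)
    print(bin(mem.current_cs_reg), mem.page_table, mem.dma_log)
```

The model keeps the ordering constraint from the description: the Cache part and MMU stay disabled while the page table and the memory structure change, and are re-enabled only after the DMA transfers complete.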

Claims (6)

1. A method for dynamically managing a reconfigurable on-chip unified memory using a virtual memory mechanism, characterized in that: during execution of an application program, the processor's instruction fetches and the behavior of the cache memory (Cache) part of the reconfigurable memory are traced to obtain the instruction execution characteristics and the temporal and spatial distribution of instruction hits and misses in the Cache; the phase-change behavior graphs of the instruction Cache in different phases are then obtained and mathematically abstracted; according to a power consumption objective function and a performance objective function respectively, integer nonlinear programming is used to select the reconfigurable memory parameter configuration and the allocation of each instruction page at which total system energy consumption is optimal; during program execution, a program phase-change detector generates phase-change interrupts, the structure of the Cache part and the scratchpad memory (SPM) part of the reconfigurable on-chip unified memory is reconfigured in each phase, and, by modifying the page table entries and configuring the direct memory access controller and the reconfigurable on-chip unified memory controller, suitable instruction pages are mapped into the SPM, eliminating the extra memory accesses caused by instruction Cache conflicts and the extra energy consumed by the comparison logic on frequent Cache accesses.
2. The method for dynamically managing a reconfigurable on-chip unified memory using a virtual memory mechanism according to claim 1, characterized in that: the reconfigurable on-chip unified memory comprises a Cache part and an SPM part, and the parameters of these two parts can be adjusted dynamically while the program runs: the associativity of the Cache part and the capacity of the SPM part.
3. The method for dynamically managing a reconfigurable on-chip unified memory using a virtual memory mechanism according to claim 1, characterized in that: the phase-change detector measures in real time the characteristics of the instructions executed by the processor while the program runs, uses changes in these characteristics to determine that a program phase change has occurred, records the phase sequence number, and generates an interrupt signal to the processor.
4. The method for dynamically managing a reconfigurable on-chip unified memory using a virtual memory mechanism according to claim 1, characterized in that: the instruction execution characteristics and the temporal and spatial distribution of instruction hits and misses in the Cache are obtained, and, using the phases exhibited during program execution, the address space that most frequently causes Cache conflicts and is frequently accessed during a period of time is remapped into the SPM and is mapped back to main memory when its benefit becomes small.
5. The method for dynamically managing a reconfigurable on-chip unified memory using a virtual memory mechanism according to claim 1, characterized in that: the reconfigurable unified on-chip memory controller uses its internal direct memory access controller to swap portions of the program instructions into the SPM storage area dynamically and efficiently during program execution, exploiting the burst feature of the on-chip AHB high-speed bus and avoiding the secondary Cache pollution that would result from moving the data through the processor.
6. The method for dynamically managing a reconfigurable on-chip unified memory using a virtual memory mechanism according to claim 1, characterized in that: the reconfigurable on-chip unified memory controller is provided with a register group dedicated to recording, for each program phase, the reconfigurable memory configuration information and the address mapping of the SPM storage area:
1) when the program phase-change detection circuit detects a program phase change, this register group supplies the configuration information required for that phase, which the interrupt handler loads into the current configuration information register to complete the dynamic configuration of the reconfigurable memory;
2) this register group records the physical addresses of the instruction pages that must be swapped into the SPM storage area in each program phase, which are used to configure the direct memory access controller to move the instruction pages from main memory into the SPM storage area;
3) when a virtual page is remapped into the SPM storage area, this register group records its corresponding main-memory address, which is used to restore the page table entry to its state before the swap-in when the virtual page is swapped out of the SPM storage area.
CN2011100073102A 2011-01-14 2011-01-14 Method for managing reconfigurable on-chip unified memory aiming at instructions Expired - Fee Related CN102073596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100073102A CN102073596B (en) 2011-01-14 2011-01-14 Method for managing reconfigurable on-chip unified memory aiming at instructions

Publications (2)

Publication Number Publication Date
CN102073596A true CN102073596A (en) 2011-05-25
CN102073596B CN102073596B (en) 2012-07-25

Family

ID=44032142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100073102A Expired - Fee Related CN102073596B (en) 2011-01-14 2011-01-14 Method for managing reconfigurable on-chip unified memory aiming at instructions

Country Status (1)

Country Link
CN (1) CN102073596B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1045307A2 (en) * 1999-04-16 2000-10-18 Infineon Technologies North America Corp. Dynamic reconfiguration of a micro-controller cache memory
CN101739358A (en) * 2009-12-21 2010-06-16 东南大学 Method for dynamically allocating on-chip heterogeneous memory resources by utilizing virtual memory mechanism
CN101763316A (en) * 2009-12-25 2010-06-30 东南大学 Method for dynamically distributing isomerism storage resources on instruction parcel based on virtual memory mechanism
CN201540564U (en) * 2009-12-21 2010-08-04 东南大学 Dynamic distribution circuit for distributing on-chip heterogenous storage resources by utilizing virtual memory mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
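《电脑知识与技术》 (Computer Knowledge and Technology) 2009-08-31 Zhang Yang et al. Implementing an SPM-based dynamic energy consumption optimization mechanism using the idea of virtual memory management, Vol. 5, No. 24 2 *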

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207838A (en) * 2012-01-17 2013-07-17 展讯通信(上海)有限公司 Method for improving property of chip
CN103207838B (en) * 2012-01-17 2016-03-30 展讯通信(上海)有限公司 Improve the method for chip performance
US9239786B2 (en) 2012-01-18 2016-01-19 Samsung Electronics Co., Ltd. Reconfigurable storage device
US10114750B2 (en) 2012-01-23 2018-10-30 Qualcomm Incorporated Preventing the displacement of high temporal locality of reference data fill buffers
CN104067244A (en) * 2012-01-23 2014-09-24 高通股份有限公司 Preventing the displacement of high temporal locality of reference data fill buffers
CN104067244B (en) * 2012-01-23 2017-10-31 高通股份有限公司 Prevent the displacement of the high temporal locality of reference data fill buffer
CN102662861A (en) * 2012-03-22 2012-09-12 北京北大众志微系统科技有限责任公司 Software-aided inserting strategy control method for last-level cache
CN102662861B (en) * 2012-03-22 2014-12-10 北京北大众志微系统科技有限责任公司 Software-aided inserting strategy control method for last-level cache
CN104813286B (en) * 2012-12-20 2018-08-10 英特尔公司 Method, apparatus, the system of continuous adjust automatically for code area
CN104813286A (en) * 2012-12-20 2015-07-29 英特尔公司 Method, apparatus, system for continuous automatic tuning of code regions
US9904555B2 (en) 2012-12-20 2018-02-27 Intel Corporation Method, apparatus, system for continuous automatic tuning of code regions
CN108874457A (en) * 2012-12-20 2018-11-23 英特尔公司 Method, apparatus, the system of continuous adjust automatically for code area
CN108874457B (en) * 2012-12-20 2021-08-17 英特尔公司 Method, device and system for continuous automatic adjustment of code area
CN103345429B (en) * 2013-06-19 2018-03-30 中国科学院计算技术研究所 High concurrent memory access accelerated method, accelerator and CPU based on RAM on piece
CN103345429A (en) * 2013-06-19 2013-10-09 中国科学院计算技术研究所 High-concurrency access and storage accelerating method and accelerator based on on-chip RAM, and CPU
CN103593324A (en) * 2013-11-12 2014-02-19 上海新储集成电路有限公司 Quick-start and low-power-consumption computer system-on-chip with self-learning function
WO2015149433A1 (en) * 2014-03-31 2015-10-08 Tsinghua University Method and device for generating configuration information of dynamic reconfigurable processor
CN106708747A (en) * 2015-11-17 2017-05-24 深圳市中兴微电子技术有限公司 Memory switching method and device
WO2017084415A1 (en) * 2015-11-17 2017-05-26 深圳市中兴微电子技术有限公司 Memory switching method, device, and computer storage medium
CN110806898A (en) * 2019-05-22 2020-02-18 成都海光集成电路设计有限公司 Processor and instruction operation method
CN110806898B (en) * 2019-05-22 2021-09-14 成都海光集成电路设计有限公司 Processor and instruction operation method

Also Published As

Publication number Publication date
CN102073596B (en) 2012-07-25

Similar Documents

Publication Publication Date Title
CN102073596B (en) Method for managing reconfigurable on-chip unified memory aiming at instructions
CN201540564U (en) Dynamic distribution circuit for distributing on-chip heterogenous storage resources by utilizing virtual memory mechanism
CN101763316B (en) Method for dynamically distributing isomerism storage resources on instruction parcel based on virtual memory mechanism
CN101739358B (en) Method for dynamically allocating on-chip heterogeneous memory resources by utilizing virtual memory mechanism
CN201570016U (en) Dynamic command on-chip heterogenous memory resource distribution circuit based on virtual memory mechanism
CN101464834B (en) Flash memory data write-in method and controller using the same
US20120151232A1 (en) CPU in Memory Cache Architecture
US10592430B2 (en) Memory structure comprising scratchpad memory
CN103513957A (en) High-performance cache system and method
CN102792285A (en) Hierarchical translation tables control
US20060212654A1 (en) Method and apparatus for intelligent instruction caching using application characteristics
CN101819518A (en) Method and device for quickly saving context in transactional memory
GB2505564A (en) Generating executable code by selecting an optimization from a plurality of optimizations on basis of ACET.
WO2012123061A1 (en) Parallel memory systems
CN100377117C (en) Method and device for converting virtual address, reading and writing high-speed buffer memory
CN103678571A (en) Multithreaded web crawler execution method applied to single host with multi-core processor
Liu et al. Scratchpad memory architectures and allocation algorithms for hard real-time multicore processors
Zhang et al. G10: Enabling an efficient unified gpu memory and storage architecture with smart tensor migrations
Siddique et al. Lmstr: Local memory store the case for hardware controlled scratchpad memory for general purpose processors
CN105447285A (en) Method for improving OpenCL hardware execution efficiency
Catthoor et al. How to solve the current memory access and data transfer bottlenecks: at the processor architecture or at the compiler level
CN101251810A (en) Method for optimizing embedded type operating system process scheduling based on SPM
CN101482851B (en) Threading sharing target local code cache replacement method and system in binary translator
CN103514107A (en) High-performance data caching system and method
Ji et al. Dynamic and adaptive SPM management for a multi-task environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120725

Termination date: 20150114

EXPY Termination of patent right or utility model