CN201570016U - Dynamic command on-chip heterogenous memory resource distribution circuit based on virtual memory mechanism - Google Patents


Publication number
CN201570016U
Authority
CN
China
Prior art keywords
instruction
spm
address
controller
tlb
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009202825304U
Other languages
Chinese (zh)
Inventor
凌明
张阳
梅晨
王欢
武建平
李冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The utility model relates to a circuit for dynamically allocating on-chip heterogeneous instruction memory resources based on a virtual memory mechanism. The processor core issues the virtual address of an instruction access; the memory management unit (MMU) converts it into a physical address; and, according to the state of a flag bit in the translation lookaside buffer (TLB), the instruction-side router sends the physical address to either the instruction Cache controller or the instruction SPM controller. If the instruction SPM controller receives the physical address, it decodes the address and accesses the instruction SPM memory. When a clock interrupt occurs, the clock module raises an interrupt signal, the interrupt controller responds to it, and the instruction SPM controller is invoked in the interrupt handler. The instruction SPM controller contains an SPM region register; according to the information in this register it configures the source address, destination address, and transfer length of the DMA controller. The DMA controller then updates the contents of the instruction SPM memory from the program contents in the off-chip main memory SDRAM, through the high-speed AHB bus and the external memory interface. At the same time, the instruction SPM controller configures the timing length of the clock module and enables the clock module.

Description

Circuit for dynamic allocation of on-chip heterogeneous instruction memory resources based on a virtual memory mechanism
Technical field
The utility model relates to the field of embedded on-chip memory, and in particular to a circuit that dynamically allocates on-chip heterogeneous instruction memory resources (the instruction Cache and the instruction SPM) based on a virtual memory mechanism.
Background technology
With the development of microelectronics, embedded computing platforms based on SoC (System-on-a-Chip) have become increasingly mature. However, because the gap between processor speed and external memory speed keeps growing, the SoC memory subsystem has become the bottleneck for system performance, power consumption, and cost. How to optimize the architecture and management strategy of the memory subsystem has therefore long been a focus of embedded systems research.
As the traditional on-chip memory, the Cache is managed by hardware and is transparent to software in most cases; it automatically loads frequently accessed instructions and data into on-chip memory. However, its high power consumption, large chip area, and unpredictable program execution time have limited its wide use in embedded systems. In particular, the set-associative nature of the Cache can cause different program contents that map to the same Cache line to evict one another repeatedly (Cache thrashing), which adds accesses to the main memory SDRAM and ultimately increases performance and energy overhead. Compared with the Cache, SPM (Scratch-pad Memory) is a high-speed on-chip memory, usually implemented with SRAM (static random-access memory), and is an important consideration in the architecture design of modern embedded systems. The SPM lies within the address space that the processor can access directly. Because a traditional SPM controller contains no logic that assists with data management, all contents of the SPM must be managed explicitly by software, which, compared with the programmer-transparent Cache, increases the complexity of program management. Because it carries no management-logic overhead, SPM hardware is simpler than a traditional Cache, consumes less energy per access, occupies less chip area, and has a predictable access time. In summary, the Cache and the SPM each have their own advantages and complement each other; effectively managing the heterogeneous on-chip memory resources in which they coexist makes full use of both, and thereby minimizes system energy consumption and improves system performance.
Some research on embedded on-chip memory mainly analyzes architectures configured with only a Cache or only an SPM, and so cannot exploit the complementary characteristics of the two. Other research directly applies optimization algorithms designed only for the SPM, or only for the Cache, to an architecture in which both coexist. The gain obtained on one kind of memory is then likely to be offset by overhead on the other, and may even introduce additional performance and energy overhead. For example, an SPM optimization algorithm moves a section of main memory into the SPM and thereby gains performance and energy. However, the copy code itself may pollute the instruction Cache and defeat the Cache optimization algorithm, causing additional Cache misses or even Cache thrashing, which ultimately cancels the gain from the SPM optimization.
On a Cache miss, main memory must be accessed and the new content brought into the Cache, which incurs a large performance and energy overhead. Because of the set-associative nature of the Cache, contents mapped to the same Cache line evict one another repeatedly, which generates a large number of memory accesses and ultimately causes system performance to drop sharply and energy consumption to rise sharply; this is a Cache conflict. Increasing the Cache capacity or the associativity reduces Cache conflicts, but adds chip area and increases the time and energy of each Cache access. Some research has pointed out that Cache conflicts are a major cause of system performance and energy bottlenecks, and therefore places the program segments that easily cause Cache conflicts into the SPM to gain performance and energy. Placing conflict-prone pages in the SPM not only reduces Cache conflicts but also yields an additional gain from the difference in per-access energy between the SPM and the Cache. However, these studies all use static optimization, that is, the contents of the SPM do not change during program execution. They do not exploit the locality of Cache conflicts in the time dimension, which lowers the utilization of the SPM. Moreover, these designs depend on modifying jump instructions in the source program to manage the SPM, which is an intrusive form of optimization.
Because changing the SPM contents requires explicit software action, research on dynamic SPM management generally relies on instrumentation: copy instructions are inserted manually before and after the program kernel loops to be optimized, in order to swap program contents in and out. Inserting new instructions into the program image requires analysis of the source code, and the new instructions are likely to change Cache behavior in the coexistence architecture and ultimately cause more Cache conflicts.
Current research on the instruction part of Cache and SPM coexistence architectures generally requires intrusive analysis of the program, and requires inserting or modifying some code in the user program to swap contents in and out dynamically during execution. So far, no related work has addressed a circuit that, based on virtual memory management and a time-slot method, dynamically maps the instruction contents of a program to the on-chip SPM without modifying the program source code.
Summary of the invention
The purpose of the utility model is to overcome the deficiencies of existing on-chip memory subsystems by providing a circuit for dynamically allocating on-chip heterogeneous instruction memory resources based on a virtual memory mechanism. A small-capacity instruction SPM buffers the instruction hotspots during program execution, which reduces instruction Cache misses and the additional memory accesses caused by conflicts, and ultimately increases the operating speed of the microprocessor and reduces system energy consumption.
To achieve the above purpose, the utility model provides a circuit for dynamically allocating on-chip heterogeneous instruction memory resources based on a virtual memory mechanism, characterized in that it is provided with a processor core, a memory management unit (MMU), an instruction-side router, an instruction Cache, an instruction SPM memory and an instruction SPM controller, a direct memory access (DMA) controller, a bus, an interrupt controller, a clock module, an external memory interface, and an off-chip main memory SDRAM. The processor core issues the virtual address of an instruction access. After the MMU converts it into a physical address, the instruction-side router sends the physical address to either the instruction Cache or the instruction SPM controller, according to the state of a flag bit in the translation lookaside buffer (TLB) of the MMU. If the instruction SPM controller receives the physical address, it decodes the address and accesses the instruction SPM memory. When a clock interrupt occurs, the clock module raises an interrupt signal, the interrupt controller responds to it, and the instruction SPM controller is invoked in the interrupt handler. The instruction SPM controller contains an SPM region register; according to the information in this register it configures the source address, destination address, and transfer length of the DMA controller. The DMA controller updates the contents of the instruction SPM memory from the program contents in the off-chip main memory SDRAM, through the high-speed AHB bus and the external memory interface. At the same time, the instruction SPM controller configures the timing length of the clock module and enables the clock module.
The memory management unit uses a two-level TLB architecture: the first level consists of separate instruction and data TLBs, and the second level is a unified TLB for instructions and data. When the core issues the address of an instruction access, the address is first sent to the first-level instruction TLB. If the first-level TLB hits, the translated physical address is sent to the instruction Cache or the instruction SPM controller. If the first-level TLB misses, the address is sent to the second-level unified TLB; if the second-level TLB hits, the translated physical address is sent to the instruction Cache or the instruction SPM controller. If the second-level TLB also misses, the page table in external memory must be accessed to perform the virtual-to-physical address translation.
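The lookup order described above can be sketched in software as follows. This is a minimal illustration, not the circuit itself: the dictionaries standing in for the TLBs and the page table, the `TlbEntry` type, the refill policy, and the fixed 256-byte page size are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

PAGE_SHIFT = 8  # assumption: 256-byte pages, the smallest size the text supports


@dataclass
class TlbEntry:
    ppn: int     # physical page number
    s_bit: bool  # True: page is in the instruction SPM; False: instruction Cache


def translate(vaddr: int,
              l1_itlb: Dict[int, TlbEntry],
              l2_tlb: Dict[int, TlbEntry],
              page_table: Dict[int, TlbEntry]) -> Tuple[int, str, str]:
    """Return (physical address, target memory, level that supplied the entry)."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    entry = l1_itlb.get(vpn)          # first-level instruction TLB
    level = "L1"
    if entry is None:
        entry = l2_tlb.get(vpn)       # second-level unified TLB
        level = "L2"
        if entry is None:
            entry = page_table[vpn]   # page-table walk in external memory
            level = "walk"
            l2_tlb[vpn] = entry       # refilling both levels is an assumption
        l1_itlb[vpn] = entry
    target = "SPM" if entry.s_bit else "CACHE"
    return (entry.ppn << PAGE_SHIFT) | offset, target, level
```

The third element of the result is only there to make the path visible in the example; in hardware the corresponding information is simply which level asserted its hit signal.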
Each TLB consists of a Tag storage array, two SRAM storage arrays, an address decoding circuit, Hit logic, read/write control logic, and input/output drive circuits. The Tag is 24 bits wide. After the virtual address issued by the processor core passes through the address decoding logic, its upper 24 bits are compared with the virtual page number stored in the Tag memory, and the Hit logic determines whether the access hits. On a hit, the address is translated according to the contents of the two SRAMs; on a miss, the next-level TLB or main memory must be accessed. The first SRAM is 20 bits wide and stores the flag bits, including the S bit that the utility model adds by using a reserved bit; once address translation completes, the physical address is sent to the instruction Cache or the instruction SPM controller according to the value of the S bit. The second SRAM is 24 bits wide and stores the physical page number.
The instruction SPM controller also loads configuration information into the SPM region register according to the time-slot record in the clock module. The SPM region register is a 32-bit register: bit 0 is the enable bit, bits 1 to 3 indicate the page size, and the corresponding base address is given by the high-order bits, whose number varies with the page size. On a clock interrupt, the instruction SPM controller configures the source address, destination address, and transfer length of the DMA controller according to the contents of the SPM region register, and the contents of the instruction SPM memory are updated.
The clock module has a register dedicated to recording the number of time slots; it is incremented by 1 automatically on each clock interrupt and indicates the current time slot number. The SPM controller configures the timing length of the clock module according to the current time slot and sets the clock module to one-shot mode; when the count is decremented to 0, the clock module raises a clock interrupt, which is taken over by the interrupt controller.
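The behavior of the clock module can be modeled roughly as below. The class name `SlotTimer`, its method names, and the cycle-by-cycle `tick` interface are hypothetical and chosen only for illustration; the sketch follows the description above in that the slot register advances when the interrupt is raised and the timer stays idle until it is configured again.

```python
class SlotTimer:
    """Toy model of a one-shot down-counter with a time-slot register."""

    def __init__(self) -> None:
        self.slot = 0         # register recording the current time slot number
        self.counter = 0      # remaining count of the current slot
        self.enabled = False

    def arm(self, length: int) -> None:
        """Configure the timing length and enable the timer (one-shot mode)."""
        if length <= 0:
            raise ValueError("slot length must be positive")
        self.counter = length
        self.enabled = True

    def tick(self) -> bool:
        """Advance one count; return True when the clock interrupt is raised."""
        if not self.enabled:
            return False
        self.counter -= 1
        if self.counter == 0:
            self.enabled = False  # one-shot: idle until armed again
            self.slot += 1        # slot register advances on each interrupt
            return True
        return False
```

A controller model would call `arm` with the length stored for slot `timer.slot` each time an interrupt has been handled.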
The above circuit involves a modification of the TLB structure. The smallest page that a traditional MMU supports is the 1 KByte virtual page, whereas in the dynamic allocation of on-chip heterogeneous instruction memory resources based on a virtual memory mechanism, the minimum management granularity of the SPM is the page size of the MMU. If larger pages are used, the area of the SPM cannot be used well for program instructions that are fairly scattered. The utility model therefore extends the reserved bits 6 to 9 of the second-level page table entry in the standard ARMv5TEJ PTE format and modifies the Tag memory and comparison circuit of the TLB to support 256-byte and 512-byte virtual pages, so that the on-chip memory area can be fully used when the SPM memory is managed dynamically. The TLB mainly comprises the following parts: a Tag storage array, two SRAM storage arrays, an address decoding circuit, Hit logic, read/write control logic, and input/output drive circuits. A virtual address usually consists of a page number and an offset. In operation, the processor core issues a 32-bit virtual address, and the high-order page number of the virtual address is compared with the virtual page number in the Tag. Because finer-grained pages are supported, the page number becomes correspondingly longer; the utility model supports a Tag comparison of at most 24 bits, that is, a minimum page of 256 bytes. For a 512-byte virtual page, the Tag uses only the upper 23 bits. The TLB also supports comparisons of 22, 20, 16, or 12 bits, which correspond to the tiny page, small page, large page, and section translation modes respectively.
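The Tag widths quoted above follow from simple arithmetic: with a 32-bit virtual address, a page of 2^k bytes leaves 32 - k bits for the page number. The helper names below are illustrative only; the 1 KByte, 4 KByte, 64 KByte, and 1 MByte sizes are the standard ARMv5 tiny page, small page, large page, and section sizes.

```python
def tag_bits(page_size: int, addr_bits: int = 32) -> int:
    """Number of high-order address bits that form the virtual page number."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page size must be a power of two")
    return addr_bits - (page_size.bit_length() - 1)


def split_address(vaddr: int, page_size: int) -> tuple:
    """Split a virtual address into (virtual page number, offset)."""
    shift = page_size.bit_length() - 1
    return vaddr >> shift, vaddr & (page_size - 1)


# 256 B and 512 B pages are the sizes added by the modified TLB.
for size in (256, 512, 1024, 4096, 64 * 1024, 1024 * 1024):
    print(f"{size:>8} B page: {tag_bits(size)} tag bits")
```

For a 256-byte page, `split_address(0x12345678, 256)` separates the address into page number `0x123456` and offset `0x78`.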
The above circuit also involves a modification of the TLB status bits. The status bits of a TLB are generally access control information stored in SRAM, such as the access permissions of the page, the address translation mode, and the domain to which the page belongs. To indicate whether a page resides in the instruction SPM or in the instruction Cache, the utility model uses a reserved TLB status bit to add an S bit to the TLB for the necessary access control. With the dedicated S bit, the MMU can determine during address translation which physical memory should actually be accessed and send the address only to that on-chip memory (the instruction Cache or the instruction SPM memory). Without the S bit, once the MMU has translated the virtual address into a physical address, the address would have to be sent to both the instruction Cache and the instruction SPM controller, and the comparison circuit of the instruction Cache and the addressing logic of the instruction SPM controller would both start working. Because an instruction can reside in only one of the two memories, both sets of addressing logic would consume energy although only one is needed. With the S bit, the destination (the instruction Cache or the instruction SPM controller) is decided during address translation, which saves this power.
The above circuit also involves a modification of the TLB architecture. With finer-grained pages, the address space described by each page table entry shrinks accordingly; if the TLB is not redesigned, frequent TLB misses may cause a large number of main memory accesses, which reduces performance and increases energy consumption. Therefore, when modifying the MMU page size, the utility model also analyzes the number of entries, the associativity, and the number of levels of the TLB in order to reduce the system energy cost introduced by small-page management. A two-level TLB offers a compromise between performance and energy consumption, so the utility model uses a first-level TLB with separate instruction and data parts and a unified second-level TLB. After design-space optimization, the utility model selects an 8-entry data uTLB, a 16-entry instruction uTLB, and a 64-entry, 32-way set-associative unified TLB (the second-level TLB). With this configuration, the cost of introducing small pages is minimized.
The above circuit requires a newly designed instruction SPM controller. During program execution, the instruction SPM controller swaps program instructions into and out of the SPM dynamically and efficiently, using the burst capability of the on-chip high-speed AHB bus and avoiding secondary pollution of the instruction Cache and the data Cache. A group of SPM region registers, dedicated to recording the write-back address and the virtual page size, is added to the instruction SPM controller: 1. when a virtual page is remapped to the SPM, this group of registers records its corresponding main-memory address, which serves as the destination address of the dedicated DMA when the virtual page is swapped out of the scratch-pad memory; 2. the bits in this group of registers that record the virtual page size are used to configure the transfer length of the dedicated DMA; 3. the enable bit in this group of registers controls whether the contents of the page are available.
Compared with the prior art, the utility model has the following advantages and notable effects:
(1) Based on the idea of virtual memory management, the problems caused by moving instruction segments, such as a discontinuous address space and the need to modify source code, can be solved. The virtual memory mechanism provides an address space that is virtually contiguous but physically discrete: the instruction hotspots, or the program segments that easily cause instruction Cache conflicts, are moved into the instruction SPM memory, and by modifying the page table entries a contiguous virtual address space is still formed. In this way no explicit SPM copy code needs to be added to the user program; the optimization of the program is completed entirely in the interrupt handler, which achieves non-intrusive analysis and optimization of the program.
(2) The utility model makes full use of the temporal locality of program execution. The clock module divides the program into different time slots, and the pages that are accessed frequently in each time slot and that easily cause instruction Cache conflicts are relocated to the SPM through the virtual memory management mechanism. This yields larger performance and energy gains than static optimization; the limited instruction SPM memory resources are used dynamically, the energy consumption of the program is reduced, and system performance is improved.
Description of drawings
Fig. 1 is a block diagram of the circuit for dynamic allocation of on-chip heterogeneous instruction memory resources based on a virtual memory mechanism;
Fig. 2 shows the modification of the TLB to support 512-byte and 256-byte virtual pages;
Fig. 3 shows the design of the memory management circuit;
Fig. 4 shows the register design of the instruction SPM controller;
Fig. 5 is the system flowchart of applying the time-slot analysis method to the on-chip heterogeneous instruction memory resources;
Fig. 6 is a schematic diagram of the iterative solution;
Fig. 7 shows the test results of optimizing system energy consumption with the optimization method of the utility model.
Embodiment
The utility model is described in further detail below with reference to the accompanying drawings and embodiments.
Fig. 1 shows the system block diagram, which includes a processor core, a memory management unit (MMU), an instruction-side router, an instruction Cache, an instruction SPM memory and an instruction SPM controller, a direct memory access (DMA) controller, a bus (including the high-speed AHB bus), an interrupt controller, a clock module, an external memory interface, and an off-chip main memory SDRAM. The parts that must be added to the original architecture are the S bit in the TLB and the instruction SPM controller; the parts that must be modified include the memory management unit (MMU) and the clock module.
The processor core issues the virtual address of an instruction access. After the memory management unit (MMU) converts it into a physical address, the instruction-side router sends the physical address to either the instruction Cache or the instruction SPM controller, according to the state of a flag bit in the translation lookaside buffer (TLB) of the MMU. If the instruction SPM controller receives the physical address, it decodes the address and accesses the instruction SPM memory. When a clock interrupt occurs, the clock module raises an interrupt signal, the interrupt controller responds to it, and the instruction SPM controller is invoked in the interrupt handler. The instruction SPM controller contains an SPM region register; according to the information in this register it configures the source address, destination address, and transfer length of the DMA controller. The DMA controller updates the contents of the instruction SPM memory from the program contents in the off-chip main memory SDRAM, through the high-speed AHB bus and the external memory interface. At the same time, the instruction SPM controller configures the timing length of the clock module and enables the clock module.
Fig. 2 shows the modification of the TLB architecture to support 512-byte and 256-byte virtual pages and the S status bit. The smallest page that a traditional MMU supports is the 1 KByte virtual page, whereas in the dynamic allocation of on-chip heterogeneous instruction memory resources based on a virtual memory mechanism, the minimum management granularity of the SPM is the page size of the MMU. If larger pages are used, the area of the SPM cannot be used well for program instructions that are fairly scattered. The utility model therefore extends the reserved bits 6 to 9 of the second-level page table entry in the standard ARMv5TEJ PTE format and modifies the Tag memory and comparison circuit of the TLB to support 256-byte and 512-byte virtual pages. The original address translation circuit must be adjusted and the TLB structure modified to add this support, so that the on-chip memory area can be fully used when the instruction SPM memory is managed dynamically. The TLB mainly comprises the following parts: a Tag storage array, two SRAM storage arrays, an address decoding circuit, Hit logic, read/write control logic, and input/output drive circuits. A virtual address usually consists of a page number and an offset. In operation, the processor core issues a 32-bit virtual address, and the high-order page number of the virtual address is compared with the virtual page number in the Tag. Because finer-grained pages are supported, the page number becomes correspondingly longer; the utility model supports a Tag comparison of at most 24 bits, that is, a minimum page of 256 bytes. For a 512-byte virtual page, the Tag uses only the upper 23 bits. The TLB also supports comparisons of 22, 20, 16, or 12 bits, which correspond to the tiny page, small page, large page, and section translation modes respectively.
The status bits of a TLB are generally access control information stored in SRAM, such as the access permissions of the page, the address translation mode, and the domain to which the page belongs. To indicate whether a page resides in the instruction SPM memory or in the instruction Cache, the utility model uses a reserved TLB status bit to add an S bit to the TLB for the necessary access control. With the dedicated S bit, the MMU can determine during address translation which physical memory should actually be accessed and send the address only to that on-chip memory (the instruction Cache or the instruction SPM memory). After the clock interrupt has been handled and execution of the benchmark resumes, the type of on-chip memory used for subsequent accesses is decided during virtual-to-physical address translation according to the setting of the S bit.
Fig. 3 shows the design of the memory management circuit. The utility model uses a two-level TLB design. The reason is that with finer-grained pages the address space described by each page table entry shrinks accordingly; if the TLB is not redesigned, frequent TLB misses may cause a large number of main memory accesses, which reduces performance and increases energy consumption. Therefore, when modifying the MMU page size, the utility model also analyzes the number of entries, the associativity, and the number of levels of the TLB in order to reduce the system energy cost introduced by small-page management. A two-level TLB offers a compromise between performance and energy consumption, so the utility model uses a first-level TLB with separate instruction and data parts and a unified second-level TLB. The circuit is designed only for the instruction part of the program, so only the instruction side contains an SPM memory and an SPM controller. When the processor core needs to access program instructions, it first sends the virtual address to the MMU. The instruction part of the first-level TLB starts working; if the page table entry is in the TLB, the first-level TLB hits and the address is sent to the SPM or the Cache according to the S flag bit in the TLB. If the first-level TLB misses, the second-level unified TLB is accessed; if the second-level TLB also misses, the virtual-to-physical address translation must be performed using the page table in main memory. After design-space optimization, the utility model selects an 8-entry data uTLB, a 16-entry instruction uTLB, and a 64-entry, 32-way set-associative unified TLB (the second-level TLB). With this configuration, the cost of introducing small pages is minimized.
Fig. 4 shows the register design of the instruction SPM controller. A traditional SPM controller is fairly simple and cannot implement the more complex dynamic instruction SPM allocation scheme that the dynamic address mapping mechanism requires. The utility model therefore adds to the traditional SPM controller a group of registers that record the write-back address, namely the SPM region registers. In this way a traditional SPM controller that can only perform addressing is extended into an advanced SPM controller that supports SPM management at different granularities and can swap its contents in and out dynamically by actively configuring the dedicated DMA. Bit 0 of the SPM region register is the EN bit; it is cleared to 0 at system initialization and set to 1 when the first DMA swap-in of an SPM page is performed. Bits 1 to 3 of the region register determine the page size; because the design space can be explored with MMU pages of different sizes, the page size must be specified. The base address of an instruction page is given by the high-order bits, whose number depends on the page size: for example, the page base address of a 128-byte virtual page occupies bits 7 to 31, and that of a 4096-byte virtual page occupies bits 12 to 31. The remaining bits are reserved.
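The register layout described above can be illustrated with a small pack/unpack sketch. The text gives the field positions (bit 0, bits 1 to 3, high-order base address) but not the encoding of the page-size field, so the mapping used here (code k means a page of 128 << k bytes) is purely an assumption, as are the function names.

```python
# Hypothetical encoding of bits 1-3: code k selects a page of 128 << k bytes
# (128, 256, 512, 1024, 2048, 4096). The source does not specify the encoding.
PAGE_SIZES = {code: 128 << code for code in range(6)}
SIZE_CODES = {size: code for code, size in PAGE_SIZES.items()}


def pack_region(base: int, page_size: int, enable: bool) -> int:
    """Build the 32-bit SPM region register value."""
    if base % page_size:
        raise ValueError("base address must be page aligned")
    return (base & 0xFFFFFFFF) | (SIZE_CODES[page_size] << 1) | int(enable)


def unpack_region(reg: int) -> tuple:
    """Return (base address, page size, enable) from a register value."""
    enable = bool(reg & 0x1)                        # bit 0: EN
    page_size = PAGE_SIZES[(reg >> 1) & 0x7]        # bits 1-3: page size code
    base = reg & ~(page_size - 1) & 0xFFFFFFFF      # high-order bits: base
    return base, page_size, enable
```

Because the base address is page aligned, its low-order bits are always zero, which is what leaves room for the control fields even for the smallest page (bits 7 to 31 for a 128-byte page).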
The instruction SPM controller configures the DMA according to the contents of the SPM region register. Compared with the traditional approach of swapping SPM contents in and out with LDR/STR instructions, DMA makes extensive use of the burst capability of the main memory SDRAM and the on-chip high-speed AHB bus, which reduces transfer cost and interrupt latency. The instruction SPM controller configures the DMA source address, destination address, and transfer length and then enables the DMA. The DMA controller then requests the bus and completes the dynamic update of the on-chip instruction SPM memory contents.
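As a rough software analogy of the swap-in transfer described above, the sketch below derives the three DMA parameters for one page and performs the copy on byte buffers. The `DmaJob` type, the function names, and the convention that the destination is an offset inside the SPM are assumptions for illustration; bus arbitration and burst behavior are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DmaJob:
    src: int     # source address in main memory (SDRAM)
    dst: int     # destination offset inside the instruction SPM (assumed)
    length: int  # transfer length in bytes, equal to the page size


def swap_in_job(main_mem_addr: int, spm_page_index: int, page_size: int) -> DmaJob:
    """Derive source, destination, and length for loading one page into the SPM."""
    return DmaJob(src=main_mem_addr,
                  dst=spm_page_index * page_size,
                  length=page_size)


def run_dma(job: DmaJob, sdram: bytes, spm: bytearray) -> None:
    """Copy one page from the SDRAM image into the SPM buffer."""
    spm[job.dst:job.dst + job.length] = sdram[job.src:job.src + job.length]
```

For example, `swap_in_job(256, 1, 256)` describes copying the 256 bytes at main-memory offset 256 into the second page of the SPM.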
Fig. 5 shows the system flowchart of the method that uses time-slot analysis to manage the on-chip heterogeneous instruction memory resources dynamically.
In the program analysis phase, the first step builds an instruction Cache time-slot access graph from the collected trace information of the instruction Cache. Analysis based on this graph is non-intrusive to the program. The second step performs mathematical abstraction: the instruction Cache access graph is modeled mathematically to describe the weight distribution among the instruction pages, the effect of each candidate node's state on the energy function is described quantitatively through changes in the weight distribution, and integer nonlinear programming then gives the state of each node at which the overall energy gain is optimal. The third step obtains, for each time slot, the page numbers most worth optimizing; these pages are swapped into the instruction SPM memory dynamically during program execution. The fourth step adjusts the slot lengths by iteration, as shown in Fig. 6. Because a large number of instruction Cache misses are eliminated in each time slot, both energy and performance improve; the performance gain directly shortens the execution time of the program within that slot and therefore shifts the point at which the next slot begins. The difference in each slot's duration before and after optimization must therefore be computed and corrected by iterative solution. After these steps, the adjusted length of each time slot and the page contents to be moved into the instruction SPM memory in each slot are known.
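The boundary shift described in the fourth step can be made concrete with a single correction pass. The sketch below is not the iterative solver of the utility model: it assumes the time saved in each slot is already known and only shows how shortened slots move the start of every later slot. The function name and the cycle units are hypothetical.

```python
from itertools import accumulate


def adjusted_schedule(lengths, savings):
    """Return (new slot lengths, new slot start times) after one correction pass.

    lengths: original duration of each time slot, in cycles
    savings: cycles saved in each slot by the eliminated Cache misses
    """
    if len(lengths) != len(savings):
        raise ValueError("one saving per slot is required")
    new_lengths = []
    for length, saved in zip(lengths, savings):
        if not 0 <= saved < length:
            raise ValueError("saving must be non-negative and below the slot length")
        new_lengths.append(length - saved)
    # Each slot starts where the shortened slots before it end.
    starts = [0] + list(accumulate(new_lengths))[:-1]
    return new_lengths, starts


print(adjusted_schedule([100, 100, 100], [10, 20, 0]))
# ([90, 80, 100], [0, 90, 170])
```

In the full method the savings themselves depend on where the boundaries fall, which is why the text corrects the slot lengths by iteration rather than in one pass.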
In the program execution phase, when the timer of the clock module expires, the processor core receives the interrupt request issued by the interrupt controller and the system enters IRQ mode. In this exception mode, the page table entries are modified and the content of the instruction SPM memory is swapped in and out to suit the program's memory access pattern in the next time slot. The clock interrupt proceeds as follows. In the first step, after entering IRQ mode and saving the relevant context, the handler reads the time-slot register of the clock module to obtain the current slot number and, from it, the configuration information for this slot, including the page numbers of the instruction pages to be swapped in dynamically and the adjusted slot length. In the second step, the instruction SPM controller obtains the main-memory address space information and the related flag bits for the content to be buffered in the instruction SPM memory in the current time slot. In the third step, the instruction SPM controller loads the main-memory addresses of the pages to be written into the instruction SPM memory in this slot into the corresponding mapping-region registers and begins updating the page table entries. In the fourth step, after the page table entries have been updated, the instruction SPM controller loads the main-memory address from the mapping-region register into the DMA source address register, loads the physical address of the corresponding instruction SPM memory page into the DMA destination address register, and then starts the DMA transfer. Finally, the interrupt handler increments the time-slot register of the clock module by 1 in preparation for the next time slot, restores the context saved before the interrupt and exits; the processor core then resumes the program that was running before the clock interrupt.
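The interrupt flow above can be summarized as a small behavioral sketch. Everything here is a hypothetical model: the `state` dictionary stands in for the hardware registers, the `S` field stands in for the flag bit that routes a page to the SPM, the DMA transfer is a slice copy, and clearing the flag of previously resident pages is an assumption about what the page table update involves. Context save and restore are omitted.

```python
PAGE_SIZE = 1024  # assumed page size in bytes, for illustration only


def clock_irq_handler(state, schedule, main_memory):
    """Model of the clock interrupt handler.

    state: dict with keys "slot", "timer", "resident", "page_table", "spm".
    schedule: list of (pages, adjusted_slot_length), one entry per time slot.
    """
    # Step 1: read the slot register and fetch this slot's configuration.
    slot = state["slot"]
    pages, slot_length = schedule[slot]

    # Steps 2 and 3: update page table entries. Pages that were in the SPM
    # go back to the Cache path (assumption); the new pages are flagged for the SPM.
    for vpn in state["resident"]:
        state["page_table"][vpn]["S"] = 0
    for index, vpn in enumerate(pages):
        entry = state["page_table"][vpn]
        entry["S"] = 1

        # Step 4: DMA transfer from main memory into the matching SPM page.
        src = entry["mem_addr"]
        dst = index * PAGE_SIZE
        state["spm"][dst:dst + PAGE_SIZE] = main_memory[src:src + PAGE_SIZE]

    state["resident"] = list(pages)

    # Final step: program the timer with the adjusted length and advance the slot register.
    state["timer"] = slot_length
    state["slot"] = slot + 1
```

Calling the handler once per clock interrupt walks through the schedule produced by the analysis phase, one time slot at a time.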
Figure 7 shows the energy savings obtained with the circuit proposed by the utility model for dynamic allocation of on-chip instruction heterogeneous memory resources based on a virtual memory mechanism. The baseline test uses an 8K Byte 4-way set-associative instruction Cache, which is common in real chips; the optimized test uses a 4K Byte direct-mapped instruction Cache and a 4K Byte instruction SPM memory, with the time-slot analysis method managing the instruction heterogeneous memory resources dynamically. According to the on-chip memory area calculated with CACTI 3.2, the combined area of the 4K Byte direct-mapped Cache and the 4K Byte SPM used in the optimized test is 19.0% smaller than that of the 8K Byte 4-way set-associative Cache of the baseline test. System energy consumption falls by 12.7% on average and by up to 22.3%; system execution time falls by 15.9% on average and by up to 25.6%.

Claims (5)

1. A circuit for dynamic allocation of on-chip instruction heterogeneous memory resources based on a virtual memory mechanism, characterized in that: it is provided with a processor core, a memory management unit MMU, an instruction-part router, an instruction Cache, an instruction SPM memory and an instruction SPM controller, a direct memory access controller DMA, a bus, an interrupt controller, a clock module, an external memory interface and an off-chip main memory SDRAM; the processor core issues the virtual address of an instruction access; after the memory management unit MMU converts it to a physical address, the physical address is sent through the instruction-part router, according to the flag-bit state of the MMU's translation lookaside buffer TLB, to one of the instruction Cache and the instruction SPM controller; if the instruction SPM controller receives the physical address, it decodes the physical address and then accesses the instruction SPM memory; the clock module issues an interrupt signal on a clock interrupt, the interrupt controller responds to it, and the instruction SPM controller is invoked in the interrupt handler; the instruction SPM controller comprises an SPM region register and, according to the information in the SPM region register, configures the source address, destination address and transfer length of the DMA controller; the DMA controller, via the high-speed AHB bus and the external memory interface, changes the content of the instruction SPM memory according to the program content in the off-chip main memory SDRAM; the instruction SPM controller at the same time configures the length information of the clock module and enables the clock module.
2. The circuit for dynamic allocation of on-chip instruction heterogeneous memory resources based on a virtual memory mechanism according to claim 1, characterized in that: the memory management unit adopts a two-level translation lookaside buffer TLB architecture, wherein the first level consists of separate instruction and data TLBs and the second level is a unified instruction and data TLB; when the core issues the address of an instruction access, the address is first sent to the first-level instruction TLB; if the first-level TLB hits, the translated physical address is sent to the instruction Cache or the instruction SPM controller; if the first-level TLB misses, the address is sent to the second-level unified TLB; if the second-level TLB hits, the translated physical address is sent to the instruction Cache or the instruction SPM controller; if the second-level TLB misses, the page table in external memory must be accessed to perform the virtual-to-physical address translation.
3. The circuit for dynamic allocation of on-chip instruction heterogeneous memory resources based on a virtual memory mechanism according to claim 2, characterized in that: each TLB consists of a Tag storage array, two SRAM storage arrays, an address decoding circuit, Hit logic, read/write control logic and input/output driver circuits; the Tag part is 24 bits; after the virtual address issued by the processor core passes through the address decoding logic, its upper 24 bits are compared with the virtual page number stored in the Tag memory; the Hit logic determines whether there is a hit; on a hit, address translation is performed according to the content of the two SRAMs; on a miss, the next-level TLB or main memory must be accessed; the first SRAM is 20 bits wide and stores the flag bits, including the S bit that the utility model adds by using a reserved bit; after address translation completes, the physical address is sent to the instruction Cache or the instruction SPM controller according to the value of the S bit; the second SRAM is 24 bits wide and stores the page number of the physical address.
4. The circuit for dynamic allocation of on-chip instruction heterogeneous memory resources based on a virtual memory mechanism according to claim 1, 2 or 3, characterized in that: the instruction SPM controller also loads configuration information into the SPM region register according to the time-slot record in the clock module; the SPM region register is a 32-bit register in which bit 0 is the enable bit and bits 1 to 3 indicate the page size; the corresponding base address is given by the high-order bits, the number of which varies with the page size; on a clock interrupt, the instruction SPM controller configures the source address, destination address and transfer length of the DMA controller according to the content of the SPM region register and changes the content of the instruction SPM memory.
5. The circuit for dynamic allocation of on-chip instruction heterogeneous memory resources based on a virtual memory mechanism according to claim 1, 2 or 3, characterized in that: the clock module is provided with a register dedicated to recording the number of time slots, which is incremented by 1 automatically at each clock interrupt and indicates the current slot number; the SPM controller configures the timing length of the clock module according to the current time slot and sets the clock module to One-shot mode; when the count is decremented to 0, a clock interrupt is issued and taken over by the interrupt controller.
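As an informal illustration of the address path in claims 1 to 3 and the register layout in claim 4, the sketch below models a two-level TLB lookup followed by routing on the S bit, and extracts the two fixed fields of the SPM region register. It is a software analogy under stated assumptions, not the claimed circuit: the TLBs and the page table are plain dictionaries mapping a virtual page number to `(physical_page_number, s_bit)`, the 4 KiB page size (`PAGE_SHIFT = 12`) is chosen only for the example, refilling both TLB levels after a page table walk is an assumed policy, and the meaning of the page-size code is left uninterpreted because the text does not define it.

```python
PAGE_SHIFT = 12  # assumed 4 KiB pages, for illustration only


def translate(vaddr, l1_itlb, l2_tlb, page_table):
    """Two-level TLB lookup followed by routing on the S bit.

    Returns (physical_address, target, level), where target is "SPM" or "Cache"
    and level names where the translation was found.
    """
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)

    entry, level = l1_itlb.get(vpn), "L1"
    if entry is None:
        entry, level = l2_tlb.get(vpn), "L2"
    if entry is None:
        # Both levels missed: walk the page table held in external memory.
        entry, level = page_table[vpn], "page table"
        l2_tlb[vpn] = entry   # refill policy is an assumption
        l1_itlb[vpn] = entry

    ppn, s_bit = entry
    target = "SPM" if s_bit else "Cache"
    return (ppn << PAGE_SHIFT) | offset, target, level


def decode_spm_region_register(value):
    """Extract the enable bit (bit 0) and the page-size code (bits 1 to 3)."""
    enable = value & 0x1
    size_code = (value >> 1) & 0x7
    return enable, size_code
```

With a page table entry whose S bit is set, `translate` reports the target as "SPM"; with the bit clear, the same address path ends at the instruction Cache.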
CN2009202825304U 2009-12-25 2009-12-25 Dynamic command on-chip heterogenous memory resource distribution circuit based on virtual memory mechanism Expired - Fee Related CN201570016U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009202825304U CN201570016U (en) 2009-12-25 2009-12-25 Dynamic command on-chip heterogenous memory resource distribution circuit based on virtual memory mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009202825304U CN201570016U (en) 2009-12-25 2009-12-25 Dynamic command on-chip heterogenous memory resource distribution circuit based on virtual memory mechanism

Publications (1)

Publication Number Publication Date
CN201570016U true CN201570016U (en) 2010-09-01

Family

ID=42662313

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009202825304U Expired - Fee Related CN201570016U (en) 2009-12-25 2009-12-25 Dynamic command on-chip heterogenous memory resource distribution circuit based on virtual memory mechanism

Country Status (1)

Country Link
CN (1) CN201570016U (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102043723B (en) * 2011-01-06 2012-08-22 中国人民解放军国防科学技术大学 On-chip cache structure used for variable memory access mode of general-purpose stream processor
CN102043723A (en) * 2011-01-06 2011-05-04 中国人民解放军国防科学技术大学 On-chip cache structure used for variable memory access mode of general-purpose stream processor
CN102426554A (en) * 2011-11-04 2012-04-25 杭州中天微系统有限公司 Interface converter for controlling translation lookaside buffers (TLBs)
CN102866958A (en) * 2012-09-07 2013-01-09 北京君正集成电路股份有限公司 Method and device for accessing dispersed internal memory
CN102866958B (en) * 2012-09-07 2015-07-01 北京君正集成电路股份有限公司 Method and device for accessing dispersed internal memory
CN103226508A (en) * 2013-04-17 2013-07-31 上海新储集成电路有限公司 Method for characterizing kernel working load condition of microcontroller
CN104731720A (en) * 2014-12-30 2015-06-24 杭州中天微系统有限公司 Set associative second-level memory management device
CN104731720B (en) * 2014-12-30 2018-01-09 杭州中天微系统有限公司 The connected secondary memory managing device of group
CN106560798B (en) * 2015-09-30 2020-04-03 杭州华为数字技术有限公司 Memory access method and device and computer system
CN106560798A (en) * 2015-09-30 2017-04-12 杭州华为数字技术有限公司 Internal memory access method and apparatus, and computer system
CN107526528A (en) * 2016-06-20 2017-12-29 北京正泽兴承科技有限责任公司 A kind of realization mechanism of piece upper low latency memory
CN107526528B (en) * 2016-06-20 2021-09-07 北京正泽兴承科技有限责任公司 Mechanism for realizing on-chip low-delay memory
CN108647161A (en) * 2018-04-17 2018-10-12 北京控制工程研究所 A kind of hardware observation circuit of record memory access address history
CN111221465A (en) * 2018-11-23 2020-06-02 中兴通讯股份有限公司 DSP processor, system and external storage space access method
US11782846B2 (en) 2018-11-23 2023-10-10 Zte Corporation Digital signal processor, DSP system, and method for accessing external memory space
CN111221465B (en) * 2018-11-23 2023-11-17 中兴通讯股份有限公司 DSP processor, system and external memory space access method
CN112527697A (en) * 2020-05-11 2021-03-19 大唐半导体科技有限公司 Data exchange controller of Cache RAM and Retention RAM and implementation method

Similar Documents

Publication Publication Date Title
CN201570016U (en) Dynamic command on-chip heterogenous memory resource distribution circuit based on virtual memory mechanism
CN201540564U (en) Dynamic distribution circuit for distributing on-chip heterogenous storage resources by utilizing virtual memory mechanism
CN101763316B (en) Method for dynamically distributing isomerism storage resources on instruction parcel based on virtual memory mechanism
CN101739358B (en) Method for dynamically allocating on-chip heterogeneous memory resources by utilizing virtual memory mechanism
CN102073596B (en) Method for managing reconfigurable on-chip unified memory aiming at instructions
US6260114B1 (en) Computer cache memory windowing
US9047090B2 (en) Methods, systems and devices for hybrid memory management
JP3715714B2 (en) Low power memory system
CN105103144A (en) Apparatuses and methods for adaptive control of memory
CN101346701A (en) Reducing number of memory bodies under power supply
CN102792285A (en) Hierarchical translation tables control
CN100549945C (en) In the embedded system based on the implementation method of the instruction buffer of SPM
WO2018027839A1 (en) Method for accessing table entry in translation lookaside buffer (tlb) and processing chip
Janapsatya et al. Hardware/software managed scratchpad memory for embedded system
CN112997161A (en) Method and apparatus for using storage system as main memory
CN102567220A (en) Cache access control method and Cache access control device
Siddique et al. Lmstr: Local memory store the case for hardware controlled scratchpad memory for general purpose processors
CN109521949A (en) It is a kind of that frequency data distribution method is write based on the perception for mixing scratch ROM
Hu et al. A novel design of software system on chip for embedded system
CN105353865A (en) Multiprocessor based dynamic frequency adjustment method
CN101251810A (en) Method for optimizing embedded type operating system process scheduling based on SPM
CN103377141A (en) High-speed memory area access method and high-speed memory area access device
Paul et al. Dynamically adaptive i-cache partitioning for energy-efficient embedded multitasking
Du et al. Optimization of data allocation on CMP embedded system with data migration
Wang et al. Energy-oriented dynamic SPM allocation based on time-slotted cache conflict graph

Legal Events

Date Code Title Description
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100901

Termination date: 20141225

EXPY Termination of patent right or utility model