CN101558391B - Configurable cache for a microprocessor - Google Patents


Info

Publication number
CN101558391B
CN101558391B CN2007800461129A CN200780046112A
Authority
CN
China
Prior art keywords
cache
instruction
cache line
address
line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2007800461129A
Other languages
Chinese (zh)
Other versions
CN101558391A (en)
Inventor
Rodney J. Pesavento
Gregg D. Lahti
Joseph W. Triece
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microchip Technology Inc
Original Assignee
Microchip Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/928,242 external-priority patent/US9208095B2/en
Application filed by Microchip Technology Inc filed Critical Microchip Technology Inc
Publication of CN101558391A publication Critical patent/CN101558391A/en
Application granted granted Critical
Publication of CN101558391B publication Critical patent/CN101558391B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

A cache module for a central processing unit has a cache control unit with an interface for a memory and a cache memory coupled with the control unit, wherein the cache memory has a plurality of cache lines, at least one of which has an address tag bit field and an associated storage area for storing instructions or data, wherein the address tag bit field is readable and writable, and wherein the cache control unit is operable, upon detecting that an address has been written to the address tag bit field, to initiate a preload function in which instructions or data from the memory are loaded from that address into the at least one cache line.

Description

Configurable cache for a microprocessor
Cross reference to related applications
This application claims the priority of U.S. Provisional Application No. 60/870,188, filed December 15, 2006, entitled "CONFIGURABLE PICOCACHE WITH PREFETCH AND LINKED BRANCH TRAIL BUFFERS, AND FLASH PREFETCH BUFFER," and of U.S. Provisional Application No. 60/870,622, filed December 19, 2006, entitled "LINKED BRANCH HISTORY BUFFER"; both provisional applications are incorporated herein in their entirety.
Technical field
The present invention relates to a configurable cache for a microprocessor or microcontroller.
Background technology
The bottleneck of a pipelined microprocessor architecture is the high access time of the memory system. A typical approach to this problem uses a large cache memory that, after an initial long memory access, transfers multiple data words per clock. Small microcontroller designs are limited in the amount of cache memory that can be placed on the chip and cannot support a large, high-latency but high-throughput wide memory. Hence, there is a need for a configurable cache for a microcontroller or microprocessor.
Summary of the invention
According to an embodiment, a cache module for a central processing unit can comprise a cache control unit comprising an interface for a memory, and a cache memory coupled with the control unit, wherein the cache memory comprises a plurality of cache lines, at least one of which comprises an address tag bit field and an associated storage area for storing instructions or data, wherein the address tag bit field is readable and writable, and wherein the cache control unit is operable, upon detecting that an address has been written to the address tag bit field, to initiate a preload function in which instructions or data from the memory are loaded from that address into the at least one cache line.
According to a further embodiment, the cache module can also comprise an index register by which at least one associated register accesses a cache line. According to a further embodiment, the cache module can also comprise registers that map the address tag field for read and write access. According to a further embodiment, the at least one cache line can further comprise a lock bit for locking the at least one cache line against being overwritten. According to a further embodiment, the at least one cache line can further comprise at least one control bit field, wherein the control bit field is coupled with the address tag bit field to mask a predefined number of bits in the address tag bit field. According to a further embodiment, at least one other cache line can comprise at least one branch trail bit for automatically locking that other cache line, wherein, if the branch trail bit is set, the lock bit is set automatically once a predefined instruction in the associated storage area has been issued. According to a further embodiment, each cache line can further comprise a validity control bit indicating the validity of the associated cache line. According to a further embodiment, each cache line can further comprise a type control bit indicating whether the cache line serves as an instruction cache line or as a data cache line. According to a further embodiment, the cache module can further comprise a prefetch unit coupled with the memory and the cache memory, wherein the prefetch unit is designed to load instructions from the memory automatically into another cache line when an instruction from a cache line previously loaded with instructions is issued. According to a further embodiment, the prefetch unit can be controlled to be enabled or disabled. According to a further embodiment, a least recently used algorithm can be used to determine which cache line is to be overwritten.
According to another embodiment, a method of operating a cache memory having a plurality of cache lines for storing instructions or data, each cache line having an address tag bit field, can comprise the steps of: providing an address of an instruction sequence stored in a memory; and writing the address into the address tag bit field of a cache line, whereupon an access to the memory at that address is performed to load the instructions or data stored in the memory at that address into the cache line.
According to a further embodiment, the method can further include the step of selecting a cache line before performing the write step. According to a further embodiment, the selection step can be performed by loading an index for the cache line into an index register. According to a further embodiment, the step of writing the address can be performed by writing the address into a register mapped to the cache line. According to a further embodiment, the method can further comprise the step of automatically loading instructions from the memory into another cache line when an instruction from a cache line previously loaded with instructions is issued.
According to another embodiment, a method of operating a system with a central processing unit (CPU) coupled with a cache memory having a plurality of cache lines for storing instructions or data, each cache line having an address tag bit field, can comprise the steps of: executing an instruction on the CPU which writes an address into the address tag bit field of a cache line; detecting that the address tag bit field has been overwritten; and thereupon accessing the memory at that address and loading the instructions or data stored in the memory at that address into the cache line.
According to a further embodiment, the method can further include the step of selecting a cache line before performing the write step. According to a further embodiment, the selection step can be performed by loading an index for the cache line into an index register. According to a further embodiment, the step of writing the address can be performed by writing the address into a register mapped to the cache line. According to a further embodiment, the method can further comprise the step of automatically loading instructions from the memory into another cache line when an instruction from a cache line previously loaded with instructions is issued.
According to another embodiment, a cache module for a central processing unit can comprise a cache control unit comprising an interface for a memory, and a cache memory coupled with the control unit, wherein the cache memory comprises a plurality of cache lines, wherein the cache memory is programmable to assign a first group of cache lines for caching instructions and a second group of cache lines for caching data, and wherein the cache control unit comprises a programmable function which forces data caching into the second group of cache lines while instructions are executed from the first group of cache lines.
According to another embodiment, a cache module for a central processing unit can comprise a cache control unit comprising an interface for a memory and a programmable control register, and a cache memory coupled with the control unit, wherein the cache memory comprises a plurality of cache lines, wherein the cache memory comprises a first group of cache lines for caching instructions and a second group of cache lines for caching data, and wherein the cache control unit is operable to force data caching into the second group of cache lines when at least one bit in the control register is set.
Brief description of the drawings
A more complete understanding of the present invention can be obtained by reference to the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 illustrates a first embodiment of a configurable cache.
Fig. 2 illustrates details of the cache memory sections according to the embodiment of Fig. 1.
Fig. 3 illustrates a second embodiment of a configurable cache.
Fig. 4 illustrates details of a cache line of the cache memory according to the embodiment of Fig. 3.
Fig. 5 illustrates exemplary registers for controlling functions of an embodiment of the cache memory.
Fig. 6 illustrates further registers mapping the content of a cache line according to one of the embodiments.
Fig. 7 illustrates a logic circuit for generating specific signals.
Fig. 8 illustrates a flow chart showing a simplified cache access process.
While the present invention is susceptible to various modifications and alternative forms, specific example embodiments thereof are shown in the drawings and are described herein in detail. It should be understood, however, that the description herein of specific example embodiments is not intended to limit the invention to the particular forms disclosed, but on the contrary, the invention is to cover all modifications and equivalents as defined by the appended claims.
Detailed description
Standard microcontroller units (MCUs) usually comprise 8- or 16-bit microprocessor cores; 32-bit cores have only recently entered the MCU domain. None of these cores typically has a cache memory. Only complex, high-end 32-bit microcontrollers may have a cache, because for an MCU a cache memory is large and expensive. The disclosed embodiments provide a middle ground: a small configurable cache that can be configured at run time, can act as a prefetch and branch trail buffer, and at the same time provides an optimal cache depth for MCU applications.
According to an embodiment, the cache memory can be designed to be configurable for very flexible operation. For instance, it can be programmed to operate strictly as a cache, which is useful for small loop optimization; to this end, the respective cache lines containing the loop can be manually locked. It can also dedicate a given number of cache lines (for example, up to half of the lines) to linked branch history storage, which accelerates function calls and returns. Finally, it can be configured to prefetch sequential program information into the least recently used cache line when the first instruction is issued from a cache line. By prefetching program instructions at twice the rate at which the microprocessor consumes them, the memory system provides available bandwidth to fetch program data without stalling the program instruction stream. In fact, not all program data fetches are transparent. The cache design approach according to different embodiments provides a mechanism to improve performance by balancing the features of a low-latency cache memory combined with a high-latency but high-throughput wide memory.
According to an embodiment, the cache memory can be designed as a fully associative cache that is configurable at run time and in operation. Fig. 1 shows a block diagram of an embodiment of such a configurable cache 100. Coupling buses 110a and 110b couple the cache memory to the central processing unit (CPU) of a microcontroller or microprocessor. The cache memory 100 comprises a cache controller 120 coupled to an instruction cache section 130 and a data cache section 140. Each instruction cache section includes instruction memory together with associated control bits and tags (for example, organized in lines), wherein a line can comprise a storage area for storing a plurality of words. For instance, a word can be 16 bits long, and a line in the instruction cache 130 can hold 4 double words, yielding 4 × 32 bits. According to an embodiment, the small instruction cache 130 can comprise 4 such lines. According to other embodiments, other configurations may be more advantageous depending on the design of the respective processor. According to an embodiment, the data cache section 140 can be designed similarly to the instruction cache 130. Depending on the design, separate data and instruction cache sections 130 and 140 may be desirable, for example in a processor with a Harvard architecture. In a conventional von Neumann type microprocessor, however, a hybrid cache memory that caches instructions and data from the same memory can be used. Fig. 1 shows only a program flash memory 160 (PFM) connected to the instruction and data caches 130, 140, corresponding to a processor with a Harvard architecture. A data memory could be coupled separately in the Harvard architecture, or memory 160 can be a unified instruction/data memory as used in a von Neumann architecture. A multiplexer 150, controlled for example by the cache controller 120, provides the data/instructions stored in the caches 130, 140 to the CPU via bus 110b.
Fig. 2 shows in more detail the structure of the instruction cache 130 and the data cache according to an embodiment. The arrangement again shows separate caches for instructions and data. Each line of the cache comprises a data/instruction storage area and a plurality of associated control and tag bits (for example, IFM, TAG and BT). IFM denotes a specific mask which can be used, for example, to mask certain bits of the address tag field TAG, which contains the start address of the cached data/instructions DATA, as explained in more detail below. Each line can, for example, include 4 × 32 bits of instruction/data storage, as shown in Fig. 2. The tag field can comprise the actual address plus additional bits indicating validity, locking, type, etc. of the respective cache line. Furthermore, as shown in Fig. 2, a branch trail bit BT is provided for each cache line. When this bit is set, the CPU can automatically lock the associated cache line when a subroutine call instruction is executed in the respective cache line and that instruction is not the last instruction in the line. In this case, the respective cache line is locked automatically, and when the program returns from the respective subroutine, the instructions following the call instruction will still be present in the cache, as explained in more detail below.
Fig. 3 shows another embodiment of a configurable cache. The cache controller 120 provides the control signals and information for all functions of the cache memory. For instance, cache controller 120 controls the TAG logic 310, which is coupled with hit logic 320; the hit logic 320 also processes data from the cache controller 120 and from the prefetch tag 330 provided by the cache controller. The hit logic generates signals controlling a cache line address encoder 340, which addresses the cache memory 350; in this embodiment the cache memory 350 comprises a data/instruction memory of, for example, 16 lines, each line including, for example, 4 × 32-bit double words for instruction/data storage. The program flash memory 160 is coupled with the cache controller 120 and, via a prefetch unit 360, with the cache memory; the prefetch unit 360 is also connected to the cache line address encoder 340. The prefetch unit 360 transfers instructions directly into the respective cache line of the cache memory 350 addressed by the cache line address encoder 340. To this end, the prefetch unit 360 can comprise one or more buffers that store the instructions to be transferred into the storage area of the respective cache line. The multiplexer 150 is controlled to select the respective byte/word/double word from the cache memory 350 or from a prefetch buffer of unit 360 and provide it to the CPU bus 110b.
Fig. 4 shows the cache memory 350 in more detail. In this embodiment, 16 cache lines are provided. Each line comprises a plurality of control bits and a 4 × 32-bit instruction/data storage area (Word0 to Word3). The control bits comprise a mask MASK, an address tag TAG, a validity bit V, a lock bit L, a type bit T, and a branch trail bit BT. The mask MASK allows selected bits of the address tag TAG to be masked during the comparison performed by the hit logic 320, as explained in more detail below. The address tag TAG points to the beginning of the cache line's contents in the instruction memory 160. As will be explained in more detail below, the address tag TAG is readable and writable, and a write by the user will force a prefetch. The validity bit V indicates that the entry in the associated cache line is valid; this bit cannot be changed by the user and is set or reset automatically. The lock bit L indicates whether a cache line is locked and therefore cannot be overwritten; this bit can be changed by the user or can be set automatically in connection with the branch trail function, as explained below. Bit T indicates the type of the cache line, that is, whether the cache line serves as an instruction cache line or as a data cache line. This bit can be designed to be changeable by the user, which allows very flexible assignment and configuration of the cache. Alternatively, instead of individual cache lines being designated as data cache lines through a single bit T, a general configuration register can be used to define that a given number of lines will be used for data caching and the remaining cache lines for instruction caching. In such an embodiment, bit T can still be provided to indicate which cache lines have been designated as data cache lines, in which case bit T cannot be modified by the user. As will be explained later, the cache memory according to an embodiment can, for example, be configured for zero, 1, 2 or 4 cache lines for data caching purposes. This assignment can thus split the cache into two parts; for example, depending on the number of lines assigned, data cache lines can be assigned from the bottom of the cache upwards. Other configurations with more data cache lines are of course possible, depending on the respective design of the cache. Thus, when set, bit T indicates that the line is used for data caching.
Fig. 7 shows an embodiment of a logic circuit that can be used to implement the branch trail function. As explained above, the branch trail bit 750 serves to automatically lock the associated cache line if an instruction that branches to a subroutine and returns (a subroutine call, trap, interrupt or similar instruction) is executed in the cache line and is not the last instruction in that line. When bit 750 is set, a call-subroutine-type instruction has been executed, and program flow branches away from its linear execution sequence, the CPU can automatically lock the associated line by setting bit 740 via logic gate 760. Execution of such a subroutine-type instruction can be detected in the execution unit and signaled to logic gate 760 by signal 770. This functionality is enabled only when at least one not-yet-executed instruction remains in the cache line, that is, an instruction that will be executed when the program returns from the respective subroutine. If the call instruction occupies the last storage space of the cache line, there is no need to preserve and automatically lock the cache line, because the following instructions will be in a different cache line or possibly not in the cache at all. When bit 750 is set, the CPU automatically sets and resets the lock bit 740 upon execution of a respective subroutine or interrupt call, which is signaled to logic gate 760 by detection signal 770.
Figs. 5 and 6 show examples of a general cache control register 510 and of further control registers 610 to 660 that can be implemented in a microprocessor or microcontroller to control the behavior and functionality of the configurable cache. All registers can be designed as 32-bit registers for use in a 32-bit environment; however, these registers can easily be adapted to work in a 16-bit or 8-bit environment. For instance, register CHECON comprises bit 31 to enable or disable the entire cache, and bit 16 CHECOH can be set to control cache coherency on a PFM program cycle. For example, when set, CHECOH can invalidate all data and instruction lines, or alternatively invalidate all data lines and only the unlocked instruction lines. Bit 24 can be used to enable a forced data cache function, as explained in more detail below; when set, this function forces data caching whenever the cache bandwidth is not being used to fetch instructions. Bits 11-12 BTSZ can be used to enable or disable the branch trail function. For instance, in one embodiment, if enabled, the branch trail size can be set to 1, 2 or 4 lines; thus 1, 2 or 4 cache lines will have this functionality. According to other embodiments, all cache lines can be enabled for this functionality. Bits 8-9 DCSZ define the number of data cache lines, as explained above. In one embodiment, the number can be set to enable 0, 1, 2 or 4 data cache lines.
Bits 4-5 PREFEN can be used to selectively enable predictive prefetch for cacheable and non-cacheable regions of memory. A cacheable region can be, for example, a region or program area of the memory that can actually be cached, meaning a memory region actually coupled with the cache. A non-cacheable region generally refers, for example, to memory-mapped peripheral space that usually cannot be cached. The criterion distinguishing cacheable from non-cacheable regions is system dependent. Some embodiments may need this distinction, and the respective microprocessor/microcontroller will support cached and non-cached access methods, while other processor embodiments may cache any type of memory, whether an actual memory region or a memory-mapped region.
If set, the prefetch unit will always fetch the instructions following the cache line from which instructions are currently being issued. Using two bits allows, for example, four different settings: enable predictive prefetch for both cacheable and non-cacheable regions, only for non-cacheable regions, only for cacheable regions, or disable predictive prefetch. According to an embodiment, assume a cache line comprises 16 bytes, or four double words. For instance, if the CPU requests instruction x1 from address 0x001000, the cache control logic compares all address tags with 0x00100X (where the bits X are ignored). If the controller generates a hit, the corresponding line is selected. The selected line comprises all instructions starting at address 0x001000. Thus, with each instruction being 32 bits long, the first instruction will be delivered to the CPU, and the prefetch unit will be triggered to prefetch the next line. To this end, the prefetch unit computes the subsequent address tag as 0x001010 and begins loading the corresponding instructions into the next available cache line. While the CPU continues to execute the instructions at addresses 0x001004, 0x001008 and 0x00100C, the prefetch unit fills the next available cache line with the instructions from addresses 0x001010, 0x001014, 0x001018 and 0x00101C. Before the CPU finishes executing the instructions of the currently selected cache line, the prefetch unit will have finished loading the subsequent instructions. Hence, the CPU is not stalled.
Referring back to Fig. 5, bits 0-2 define the number of wait states of the program flash memory. Thus, various flash memories can be used with the microcontroller.
Each line in the cache memory as shown in Fig. 4 can be mapped under control to registers as shown in Fig. 6. Thus, a cache line can be designed to be fully accessible through read and write operations and fully modifiable by the user. However, as indicated above, some bits of a cache line may be designed so that they cannot be changed by the user, or the respective line may need to be unlocked before the user can change it. To this end, an index register 600 can be provided for selecting one of the 16 cache lines. Once a cache line has been selected through index register 600, it can be accessed through registers 610-660. A mask register can comprise, for example in bits 5-15, the mask MASK of the selected cache line. A second register for the tag can contain the address tag, for example in bits 4-23, and can also comprise the bits V, L, T and BT indicating validity, lock status, type and branch trail function of the selected line. Finally, four 32-bit registers Word0, Word1, Word2 and Word3 can be provided for the contents of the selected line, cached data or instructions. Other control registers can be implemented to control the general functions of the cache. Thus, each cache line can be accessed and manipulated by the user or by software, as explained in more detail below.
According to the disclosed embodiments, the cache memory 100, 300 is designed to respond to an initial CPU instruction fetch by fetching a set of, for example, 128-bit-aligned instruction words (called a line) from the PFM 160. The actually requested instruction can be anywhere in that line. The line is stored in the cache 130, 350 (a fill), and the instruction is returned to the CPU. This access can take multiple clock cycles and stall the CPU. For instance, for a 40 ns access flash, an access at 80 MHz can cause 3 wait states. However, once a line is cached, subsequent accesses to instruction addresses present in that line occur with zero wait states.
If the cache is enabled, this process continues for each instruction address that misses the cache. In this way, if a small loop is 128-bit aligned and is the same size as or smaller in bytes than the cache 130, 350, the loop can execute from the cache with zero wait states. For a fully filled loop, the 4-line cache 130 with 32-bit instructions as shown in Fig. 1 executes one instruction per clock. In other words, the CPU executes all instructions stored in the cache 130 in 16 clocks. If only 128-bit-wide fetches were supported, the same loop would take a given number of wait states per line for the fetch (for example, 3 wait states) plus a given number of clocks for execution (for example, 4 clocks), resulting in, for example, 7 clocks per 4 instructions. In this example that yields a total loop time of 28 clocks.
The embodiment in Fig. 1 comprises a two-line data cache to exploit the spatial locality of constants and table data that can be stored in the PFM 160. In other embodiments, however, this cache can be larger and connected to a data memory.
Furthermore, as explained above, a cache memory as shown in Figs. 1 and 3 can also implement prefetching to avoid the wait states otherwise required to fetch a 128-bit-wide instruction stream. If prefetch is enabled, the cache 100, 300 performs a predicted-address fill using the least recently used line. The predicted address is simply the next sequential 128-bit-aligned address, as explained in detail above in the example using actual addresses. Thus, during execution within a cache line, if the predicted address is not already in the cache, the cache generates a flash memory access. Even when the CPU runs at a frequency requiring, for example, 3 wait states per access to the flash memory system, the predicted-address fetch completes within the cycle in which the CPU requires the predicted instruction. In this way, for linear code, CPU instruction fetches can run with zero wait states.
The branch trail feature examines instructions as branch-and-link and jump-and-link instructions execute in the CPU, in order to preserve cache lines for future use. This feature enhances the performance of function-call returns by preserving the instructions in the line that follows the branch or jump instruction.
The program flash memory cache and prefetch module 120, 360 provides enhanced performance for applications executing out of the cacheable program flash memory region. The performance enhancement is realized in three different ways.
The first way is the module's cache capability. A 4- or 16-line instruction cache 130, 350 as shown in Fig. 1 and Fig. 3 can supply a loop with one instruction per clock (up to 16/64 instructions for 32-bit opcodes and up to 32/128 instructions for 16-bit opcodes). Other configurations of cache size and organization are applicable. The embodiment shown in Fig. 1 also provides the ability to cache two lines of data, giving improved access to the data items in those lines. The embodiment shown in Fig. 3 provides more flexible assignment of data cache line sizes by setting a split point or by individually assigning the respective cache memory type, as explained above.
Second, when prefetch is enabled, the module provides one instruction per clock for linear code, thereby hiding the flash access time. Third, the module can allocate one or two instruction cache lines to linked-branch trail instructions. When a jump-and-link or branch-and-link instruction occurs in the CPU, the last line is marked as a branch trail line and preserved for the return from the call.
Module enable
According to an embodiment, after reset, the module can be enabled by setting a bit (for example, bit 31 ON/OFF in the CHECON register; see Fig. 5). Clearing this bit does the following:
Deactivates all cache, prefetch and branch trail functionality and resets the state of the cache.
Places the module in bypass mode.
Allows reads and writes of the special function registers (SFRs).
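The enable bit described above can be modeled with simple bit operations. This is a hedged sketch: the bit position (31) follows the example in the text, but the constant name and helper functions are assumptions of this illustration, not register definitions from the patent.

```python
CHECON_ON = 1 << 31  # ON/OFF bit, per the bit-31 example in the text

def cache_enabled(checon):
    """True if the module's enable bit is set in the CHECON value."""
    return bool(checon & CHECON_ON)

def enable_cache(checon):
    """Set the enable bit; cache, prefetch and branch trail become active."""
    return checon | CHECON_ON

def disable_cache(checon):
    # Clearing the bit disables cache/prefetch/branch trail and
    # leaves the module in bypass mode with SFR access allowed.
    return checon & ~CHECON_ON
```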
Operation in power-saving modes
Sleep mode
According to an embodiment, when the device enters sleep mode, the clock control block stops the clock of the cache module 100, 300.
Idle mode
According to an embodiment, when the device enters idle mode, the clock sources of the cache and prefetch remain active while the CPU stops executing code. The module 100, 300 completes any outstanding prefetch before stopping its clock via automatic clock gating.
Bypass behavior
According to an embodiment, the default mode of operation is bypass. In bypass mode, the module accesses the PFM for every instruction, incurring the flash access time defined by the PFMWS bits in the register CHECON (see Fig. 5).
Cache behavior
According to Fig. 1, the cache and prefetch module can implement a fully associative 4-line instruction cache. Depending on the design, more or fewer cache lines can be provided. The instruction/data storage area in a cache line, together with the associated control bits, can be designed to be cleared on a write, during a flash programming sequence, or when the corresponding bit in the general control register CHECON is set to logic 0. Each line uses a register or bit field containing a flash address tag. Each line can consist of 128 bits (16 bytes) of instructions, regardless of instruction size. To simplify access, the cache and prefetch module according to Fig. 1 and Fig. 3 can request only 16-byte aligned instruction data from the flash 160. According to an embodiment, if the address requested by the CPU is not aligned to a 16-byte boundary, the module aligns the address by discarding address bits [3:0].
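The alignment and tag extraction just described can be sketched in a few lines. This is an illustrative model only; the function names are assumptions, and the 24-bit address width follows the masking example later in the text.

```python
def align_fetch_address(addr):
    """Align a CPU request to a 16-byte boundary by discarding
    address bits [3:0], as the module does before a flash request."""
    return addr & ~0xF

def tag_of(addr):
    """Address tag of a line: the address without its low 4 bits.
    In a 24-bit address system, tag bits 0-19 = address bits 4-23."""
    return (addr & 0xFFFFFF) >> 4
```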
When configured as a cache only, the module works like any other cache, loading multiple instructions into a line on a miss. According to an embodiment, the module can use a simple least recently used (LRU) algorithm to select which line receives the new set of instructions. When the cache controller detects a miss, it uses the wait-state value in the register CHECON to determine how long it must wait for the flash access. On a hit, the cache returns data with zero wait states.
The instruction cache works in different ways according to whether prefetch and branch trail are selected. If the code is 100% linear, cache-only mode will return instructions to the CPU with the timing of the corresponding PFMWS cycles, where PFMWS is the number of wait states.
Masking
Further flexible use of the cache can be achieved with a mask bit field. Fig. 7 shows a possible logic circuit for implementing the masking function. The bit field 710 of a cache line contains, for example, 11 bits, which can be used to mask certain bits of the address tag 720. The 11 bits of the mask bit field 710 mask the lower bits 0-10 of the address tag 720. Any bit set to "1" in the mask bit field 710 causes the corresponding bit in the address tag to be ignored when the comparator 780 compares the address tag 720 with the requested address 790. If the instruction/data storage area comprises 16 bytes, the address tag does not include the lower 4 bits of the actual address. Thus, without masking, the comparator compares bits 0-19 of the address tag with bits 4-23 of the actual address in a system using 24 address bits. By means of the mask 730, however, the comparator 780 can be forced to compare only a fraction of the address tag 720 with the corresponding fraction of the actual address 790. Thus, multiple addresses can cause a hit. This functionality can be used to particular advantage with interrupts or trap instructions that cause a branch to a predefined address in the instruction memory. For example, an interrupt can cause a branch to a memory address containing an interrupt service routine, the memory address being defined by an interrupt base address plus an offset defined by the priority of the interrupt. For instance, a priority 0 interrupt branches to address 0x000100, a priority 1 interrupt branches to address 0x000110, a priority 2 interrupt branches to address 0x000120, and so on. Trap instructions can be organized similarly and can cause similar branching patterns. Assuming the interrupt service routines of a given number of priorities are identical for at least a predefined number of instructions, these addresses can, by using the masking function, all cause a branch to the same cache line containing the start of the service routine. For example, if the first four 32-bit instructions of the interrupt service routines for priority levels 0-3 are identical, the mask bit field of the cache line containing the instructions starting at address 0x000100 can be set to "11111111100", which will cause a hit for all addresses from 0x000100 to 0x000130. Thus, not only an interrupt with priority 0 but also interrupts with priorities 1, 2 and 3 will cause a hit, and all will jump to the same instruction sequence already loaded in the cache. Hence, no flash memory access penalty occurs.
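The masked comparison can be modeled as follows. This is a sketch, not the circuit of Fig. 7: the mask value used here (ignoring only the two low tag bits, which is what distinguishes the four priority vectors in the example) is my own illustrative choice, and the 24-bit address width follows the text.

```python
def masked_hit(tag, mask, addr):
    """Comparator 780: a mask bit set to 1 makes the corresponding
    address-tag bit a 'don't care' in the comparison."""
    req_tag = (addr & 0xFFFFFF) >> 4   # drop the 4 line-offset bits
    return (tag & ~mask) == (req_tag & ~mask)

# Vectors 0x000100..0x000130 differ only in tag bits 0-1, so a mask
# with those two bits set makes all four priorities hit one line.
tag = 0x000100 >> 4
mask = 0b00000000011
hits = [masked_hit(tag, mask, a) for a in (0x000100, 0x000110, 0x000120, 0x000130)]
print(hits)                              # [True, True, True, True]
print(masked_hit(tag, mask, 0x000140))   # False
```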
Prefetch behavior
The bit field PREFEN of the control register CHECON, or a corresponding single bit (see Fig. 5), can be used to enable the prefetch function. When configured for prefetch, the module 100, 300 predicts the next line address and returns its contents into the LRU line of the cache 130, 350. The prefetch function bases its prediction on the first CPU instruction fetch. Once the first line is placed in the cache 130, 350, the module simply increments the address to the next 16-byte aligned address and starts the flash access. The flash memory 160 can return the next instruction set at or before the time all current instructions have executed.
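The address prediction described here is a simple increment to the next 16-byte aligned line. A minimal sketch (function name assumed for this illustration):

```python
def predicted_prefetch_address(current_addr):
    """Next sequential 128-bit (16-byte) aligned address: the line
    the prefetcher starts fetching while the current line executes."""
    return (current_addr & ~0xF) + 16
```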
If, at any time during a predicted flash access, a new CPU address does not match the predicted address, the flash access is changed to the correct address. This behavior does not make the CPU access take longer than it would have taken without prediction.
If the predicted flash access completes, the instructions are placed in the LRU line together with their address tag. The LRU indication is not updated until a CPU address hits the line. If the hit is on the line just prefetched, that line is marked as most recently used and the other lines are updated accordingly. If the hit is on another line in the cache, the algorithm adjusts accordingly, but the line just prefetched remains the LRU line. If the CPU address misses the cache 130, 350, the access goes to flash and the returned instructions are placed in the LRU line (which is the most recently updated, but as yet unused, prefetched line).
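The LRU bookkeeping underlying this behavior can be sketched as a simple ordered list. This is an illustrative model with assumed names; it shows plain LRU promotion only and does not model the patent's refinement that a freshly prefetched line stays LRU until the CPU actually hits it.

```python
class LruTracker:
    """Minimal LRU bookkeeping for a fully associative cache:
    index 0 is the LRU line, the last index is the MRU line."""
    def __init__(self, n_lines):
        self.order = list(range(n_lines))

    def victim(self):
        return self.order[0]           # LRU line receives new fills

    def touch(self, line):
        # A hit promotes the line to MRU; the others shift down.
        self.order.remove(line)
        self.order.append(line)

lru = LruTracker(4)
print(lru.victim())   # 0
lru.touch(0); lru.touch(2)
print(lru.victim())   # 1
```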
According to an embodiment, as indicated above, data prefetch can be selectively turned on or off. According to another embodiment, if a control register bit (for example, a dedicated bit in CHECON) is set to logic 1, a data access occurring in the middle of an instruction prefetch can cause the instruction prefetch to abort. If this bit is set to logic 0, the data access completes only after the instruction prefetch has finished.
Branch trail behavior
The cache can be partitioned, for example by means of the bit field BTSZ in the program register CHECON (see Fig. 5), to use one or more lines of the instruction cache for branch trail instructions. When the CPU requests a new address, as calculated from a branch-and-link or jump-and-link instruction, the branch trail line is the most recently used cache line. According to an embodiment, when the module 100, 300 marks the MRU cache line as a branch trail line, it can also deallocate the LRU branch trail line, returning it for use as a general-purpose cache line.
As explained above, if the last access was to the last instruction in the MRU line (the highest address), the line is not marked as a branch trail line. Furthermore, the module does not deallocate any existing lines from the branch trail portion of the cache.
Preload behavior
Application code can direct the module 100, 300 to preload and lock a cache line with instructions from the flash memory 160. The preload function uses the LRU line from among the lines marked as cache (that is, not branch trail).
According to an embodiment, the address tag bit field in a cache line can be accessed directly, and the user can write any value into this bit field. The write causes a forced preload of the cache with the corresponding line of the addressed location in the flash memory. Thus, preload works by writing an address into the address tag bit field of a cache line, which causes the corresponding line to be preloaded from memory. According to an embodiment, this action invalidates the line before the flash is accessed to retrieve the instructions. After the preload, the line can be accessed by the central processing unit to execute the respective instructions.
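The "write the tag, trigger the load" behavior can be modeled with a property setter. This is a software sketch of the hardware mechanism only; `flash_read_line` is a hypothetical stand-in for the flash interface, and all names are assumptions of this illustration.

```python
def flash_read_line(addr):
    # Hypothetical stand-in for a 16-byte flash line read.
    return "instructions@%#x" % (addr & ~0xF)

class PreloadableLine:
    """Writing the address tag triggers the preload: the line is
    invalidated, then filled from the corresponding flash line."""
    def __init__(self):
        self._tag, self.data, self.valid = None, None, False

    @property
    def tag(self):
        return self._tag

    @tag.setter
    def tag(self, addr):
        self.valid = False                  # invalidate before the fetch
        self._tag = (addr & 0xFFFFFF) >> 4  # store the line's tag
        self.data = flash_read_line(addr)   # forced preload from flash
        self.valid = True

line = PreloadableLine()
line.tag = 0x000100
print(line.valid, hex(line.tag << 4))  # True 0x100
```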
According to an embodiment, this functionality can be used to implement very flexible debug functionality without changing the code in the program memory. Once the line containing an instruction that needs a breakpoint during a debug sequence has been identified, that line can be tagged as preloaded with the particular address. The content of the cache line can then be modified to include debug instructions. For example, system software can automatically replace an instruction in the cache line to generate a breakpoint or to execute a subroutine of any other type. Once the respective code has executed, the instruction can be replaced with the original instruction, and the stack can be altered to return to the same address from which the debug routine was executed. The preload functionality thus allows code within the system to be altered very flexibly.
According to another embodiment, if a cache line is locked by a lock bit, or potentially by a branch trail bit, write accesses to that cache line can be inhibited. Thus, only unlocked cache lines are writable. If this functionality is implemented, the user must first unlock the cache line before a new address tag can be written into it to force the cache controller to load the respective instructions or data from memory. The same applies to write accesses to the instruction/data storage area.
The ability to actively load the cache with specified instructions can be particularly useful in combination with the masking function explained above. For example, if many interrupt service routines start with the same instruction sequence, this sequence can be forced into the cache by writing the address of the respective service routine into the address tag, so that the respective cache line is preloaded with the instructions of the respective interrupt service routine. By setting the corresponding mask and locking the respective cache line as explained above, the cache can be preconfigured so that the program reacts to certain interrupts without any flash access penalty. Certain routines can thus always be accessed through the cache.
Reset and initialization
After reset, all cache lines are marked invalid and the cache features are deactivated. The wait states are reset, for example via the register CHECON, to their maximum wait-state value (allowing bypass accesses to proceed after reset).
When any flash programming operation begins, the module 100, 300 forces the cache to its reset values. Any accesses made by the CPU before the programming loop finishes are stalled. Once the programming loop completes, the pending CPU access proceeds via a switch to flash. The return of the instruction completes with the value defined in the configuration register.
Flash prefetch buffer (FPB)
According to an embodiment, the flash prefetch buffer (see Fig. 3) can be designed as a simple buffer, for example a latch or register 365. In one embodiment, it can be designed to prefetch up to a total of 8 instructions when operating in 16-bit instruction mode, or 4 instructions when operating in 32-bit instruction mode, allowing an x32-bit flash memory with 4 panels to be utilized. The FPB, implemented in the cache controller 120, prefetches instructions to guarantee that instructions fed to the core in a linear fashion do not stall the core. According to an embodiment, the FPB can contain 2 buffers of 16 bytes each. Each buffer tracks instruction address fetches. If a branch goes outside the current buffered instruction boundary, the alternate buffer is used (causing an initial stall, after which linear code fetches are again buffered). Each instruction fetch forces the FPB to grab the 16 bytes that potentially follow linearly, to fill the buffer.
According to another embodiment, an optional, programmable forced data cache operation can be implemented by means of the prefetch buffer. Once the cache is filled with one or more instruction lines, the instructions can be executed sequentially for a certain period of time without fetching further instruction lines. This is especially true because the execution time of the instructions in a single cache line can be twice as long, or even longer, than the time needed to load a cache line into the cache. Moreover, if one or more cache lines contain a loop being executed, there may be a relatively long period during which no other instructions need to be cached. According to an embodiment, this time can be used to cache data, for example a relatively large amount of data to be used in a table, etc. The cache can be programmed, by means of a register (for example, bit 23 DATAPREFEN in the register CHECON; see Fig. 5), to perform additional data caching functions when the cache bandwidth is not being used for fetching instructions. This can be useful if a data table needs to be loaded for use by a program that can run from the cache. The data fetch can take place after the initial fill while still allowing the core to continue using the instructions already prefetched into the cache lines. According to an embodiment, when the function bit DATAPREFEN is set, a data line can be fetched automatically after each instruction fetch. Alternatively, according to another embodiment, data caching can be forced for as long as the corresponding bit DATAPREFEN is set. Thus, for example, forced data caching can be started and stopped by setting the corresponding bit. In yet another embodiment, forced data caching can be performed automatically whenever the cache pauses instruction loading for a period of time. If multiple control bits are provided, a programmable combination of different data caching modes can be implemented.
Fig. 8 shows a simplified flash memory request using the cache and prefetch functions according to an embodiment. The flash memory request starts at step 800. First, step 805 determines whether the request is cacheable. If the request is cacheable, step 810 determines whether the supplied address produced a cache hit. If so, then according to an embodiment the process can branch into two parallel processes; other embodiments can execute these processes sequentially. The first branch begins with step 812, which determines whether a subroutine call has been requested. If not, the first parallel process ends. If so, step 815 determines whether the branch trail bit has been set in the respective cache line. If it has, step 820 determines whether the call is the last instruction in the cache line. If it is, the first parallel process ends; otherwise, the respective cache line is locked in step 830. The second parallel process begins in step 835, in which the requested instruction is returned from the cache and the least recently used algorithm is executed to update the state of the cache lines. If no cache hit was produced in step 810, or if the request is not cacheable, step 840 determines whether the prefetch buffer produces a hit. If the prefetch buffer contains the requested instruction, the instruction is returned in step 845. Otherwise, a flash access is performed in step 850, which stalls the CPU. In step 855, following step 850, the flash request can fill a cache line if one is available for performing the cache function. The routine ends with step 860.
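The decision flow of Fig. 8 can be sketched as a single function over stub objects. This is an illustrative model under stated assumptions: the stub classes, their methods, and the step mapping in the comments are mine, not the patent's interfaces.

```python
class StubCache:
    """Toy cache: maps 16-byte line addresses to returned data."""
    def __init__(self, lines):
        self.lines = dict(lines)
        self.locked, self.lru_hits = set(), []
    def cacheable(self, a):        return True
    def hit(self, a):              return (a & ~0xF) in self.lines
    def read(self, a):             return self.lines[a & ~0xF]
    def branch_trail_set(self, a): return False
    def is_last_instr(self, a):    return (a & 0xF) == 0xC
    def lock_line(self, a):        self.locked.add(a & ~0xF)
    def update_lru(self, a):       self.lru_hits.append(a & ~0xF)
    def maybe_fill(self, a, d):    self.lines[a & ~0xF] = d

class StubPFB:
    """Toy prefetch buffer that never hits, forcing the flash path."""
    def hit(self, a):  return False
    def read(self, a): return None

def flash_access(addr):            # step 850: stalls the CPU
    return "flash@%#x" % (addr & ~0xF)

def flash_request(addr, cache, pfb, is_call=False):
    """Simplified decision flow of Fig. 8."""
    if cache.cacheable(addr) and cache.hit(addr):
        # Steps 812-830: on a call, lock the line unless the call is
        # the last instruction in the line or branch trail is off.
        if is_call and cache.branch_trail_set(addr) and not cache.is_last_instr(addr):
            cache.lock_line(addr)
        cache.update_lru(addr)     # step 835
        return cache.read(addr)
    if pfb.hit(addr):              # steps 840-845
        return pfb.read(addr)
    data = flash_access(addr)      # step 850
    cache.maybe_fill(addr, data)   # step 855
    return data
```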
Although embodiments of the invention have been depicted, described and defined with reference to exemplary embodiments of the invention, such references do not imply a limitation on the invention, and no such limitation is to be inferred. The subject matter disclosed is capable of considerable modification, alteration and equivalents in form and function, as will occur to those of ordinary skill in the pertinent art having the benefit of this disclosure. The depicted and described embodiments of the invention are exemplary only, and are not exhaustive of the scope of the invention.

Claims (23)

1. A cache module for a central processing unit, wherein in response to requests by the central processing unit the cache module provides data and/or instructions to the central processing unit, the cache module comprising:
a cache control unit comprising an interface for a memory; and
a cache memory coupled with the control unit, wherein the cache memory comprises a plurality of cache lines, at least one cache line of the plurality of cache lines comprising an address tag bit field and an associated storage area for storing instructions or data, wherein the address tag bit field is readable and writable by the central processing unit, and wherein the cache control unit is operable, upon detecting that an address has been written to the address tag bit field by the central processing unit, to initiate a preload function in which instructions or data from the memory are loaded from the address into the at least one cache line.
2. The cache module according to claim 1, further comprising an index register for accessing the cache lines through at least one associated register.
3. The cache module according to claim 1, further comprising a register mapping the address tag bit field for read and write access.
4. The cache module according to claim 1, wherein the at least one cache line further comprises a lock bit for locking the at least one cache line against being overwritten.
5. The cache module according to claim 1, wherein the at least one cache line further comprises at least one control bit field, wherein the control bit field is coupled with the address tag bit field to mask a predefined number of bits of the address tag bit field.
6. The cache module according to claim 4, wherein at least one other cache line comprises at least one branch trail bit for automatically locking the at least one other cache line, wherein, if the branch trail bit is set, the lock bit is set automatically if a predefined instruction stored in the associated storage area has been issued.
7. The cache module according to claim 1, wherein each cache line further comprises a validity control bit indicating the validity of the cache line.
8. The cache module according to claim 1, wherein each cache line further comprises a type control bit indicating whether the cache line is an instruction cache line or a data cache line.
9. The cache module according to claim 1, further comprising a prefetch unit coupled with the memory and the cache memory, wherein the prefetch unit is designed to automatically load instructions from the memory into another cache line when an instruction from a cache line previously loaded with instructions is issued.
10. The cache module according to claim 9, wherein the prefetch unit is controllable to be enabled or disabled.
11. The cache module according to claim 9, wherein a least recently used algorithm is used to determine which cache line is to be overwritten.
12. The cache module according to claim 1, wherein the cache memory is programmable to assign a first group of cache lines for caching instructions and a second group of cache lines for caching data, and wherein the cache control unit comprises a programmable function that forces data caching into the second group of cache lines while instructions are being executed from the first group of cache lines.
13. The cache module according to claim 1, wherein the cache memory comprises a first group of cache lines for caching instructions and a second group of cache lines for caching data, and wherein the cache control unit is operable to force data caching into the second group of cache lines when at least one bit in a control register is set.
14. A method of operating a cache memory, the cache memory being coupled with a central processing unit and having a plurality of cache lines for storing instructions or data, each cache line having an address tag bit field, the method comprising the steps of:
providing, by the cache memory in response to requests by the central processing unit, data and/or instructions to the central processing unit;
providing an address for an instruction sequence stored in a memory;
writing, by the central processing unit, the address into the address tag bit field of a cache line; and, in response to a cache control unit comprising an interface for the memory detecting the write, performing an access to the memory under the address to load the instructions or data stored under the address in the memory into the cache line.
15. The method according to claim 14, further comprising the step of selecting the cache line before performing the writing step.
16. The method according to claim 15, wherein the selecting step is performed by writing an index for the cache line into an index register.
17. The method according to claim 14, wherein the step of writing the address is performed by writing the address into a register mapped to the cache line.
18. The method according to claim 14, further comprising the step of automatically loading instructions from the memory into another cache line when an instruction from a cache line previously loaded with instructions is issued.
19. A method of operating a system having a central processing unit (CPU) coupled with a cache module having a cache controller and a cache memory, the cache memory comprising a plurality of cache lines for storing instructions or data, each cache line having an address tag bit field, wherein the cache controller comprises an interface for a memory, the method comprising the steps of:
providing, by the cache memory in response to requests by the CPU, data and/or instructions to the CPU; and executing an instruction in the CPU, the instruction writing an address into the address tag bit field of a cache line;
detecting, by the cache controller, that the address tag bit field has been overwritten; and thereupon
accessing, by the cache controller, the memory under the address and loading the instructions or data stored under the address in the memory into the cache line.
20. The method according to claim 19, further comprising the step of selecting the cache line before performing the writing step.
21. The method according to claim 20, wherein the selecting step is performed by writing an index for the cache line into an index register.
22. The method according to claim 19, wherein the step of writing the address is performed by writing the address into a register mapped to the cache line.
23. The method according to claim 19, further comprising the step of automatically loading instructions from the memory into another cache line when an instruction from a cache line previously loaded with instructions is issued.
CN2007800461129A 2006-12-15 2007-12-12 Configurable cache for a microprocessor Active CN101558391B (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US87018806P 2006-12-15 2006-12-15
US60/870,188 2006-12-15
US87062206P 2006-12-19 2006-12-19
US60/870,622 2006-12-19
US11/928,242 US9208095B2 (en) 2006-12-15 2007-10-30 Configurable cache for a microprocessor
US11/928,242 2007-10-30
PCT/US2007/087238 WO2008085647A1 (en) 2006-12-15 2007-12-12 Configurable cache for a microprocessor

Publications (2)

Publication Number Publication Date
CN101558391A CN101558391A (en) 2009-10-14
CN101558391B true CN101558391B (en) 2013-10-16

Family

ID=41175633

Family Applications (3)

Application Number Title Priority Date Filing Date
CN200780046103.XA Active CN101558390B (en) 2006-12-15 2007-12-12 Configurable cache for a microprocessor
CN2007800461129A Active CN101558391B (en) 2006-12-15 2007-12-12 Configurable cache for a microprocessor
CN200780046003.7A Active CN101558393B (en) 2006-12-15 2007-12-14 Configurable cache for a microprocessor

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN200780046103.XA Active CN101558390B (en) 2006-12-15 2007-12-12 Configurable cache for a microprocessor

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN200780046003.7A Active CN101558393B (en) 2006-12-15 2007-12-14 Configurable cache for a microprocessor

Country Status (1)

Country Link
CN (3) CN101558390B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105874437B (en) * 2013-12-31 2019-10-25 三星电子株式会社 Storage management method and device

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011209904A (en) * 2010-03-29 2011-10-20 Sony Corp Instruction fetch apparatus and processor
CN102567220A (en) * 2010-12-10 2012-07-11 中兴通讯股份有限公司 Cache access control method and Cache access control device
JP5863855B2 (en) 2014-02-26 2016-02-17 ファナック株式会社 Programmable controller having instruction cache for processing branch instructions at high speed
JP6250447B2 (en) * 2014-03-20 2017-12-20 株式会社メガチップス Semiconductor device and instruction read control method
US9460016B2 (en) * 2014-06-16 2016-10-04 Analog Devices Global Hamilton Cache way prediction
DE102016211386A1 (en) * 2016-06-14 2017-12-14 Robert Bosch Gmbh Method for operating a computing unit
US10866897B2 (en) * 2016-09-26 2020-12-15 Samsung Electronics Co., Ltd. Byte-addressable flash-based memory module with prefetch mode that is adjusted based on feedback from prefetch accuracy that is calculated by comparing first decoded address and second decoded address, where the first decoded address is sent to memory controller, and the second decoded address is sent to prefetch buffer
CN111124955B (en) * 2018-10-31 2023-09-08 珠海格力电器股份有限公司 Cache control method and equipment and computer storage medium
US11360704B2 (en) * 2018-12-21 2022-06-14 Micron Technology, Inc. Multiplexed signal development in a memory device
CN112527390B (en) * 2019-08-28 2024-03-12 武汉杰开科技有限公司 Data acquisition method, microprocessor and device with storage function

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0795820A2 (en) * 1993-01-21 1997-09-17 Advanced Micro Devices Inc. Combination prefetch buffer and instructions cache
CN1744058A (en) * 2004-04-15 2006-03-08 国际商业机器公司 System and method for memory management
EP1729220A1 (en) * 2004-03-24 2006-12-06 Matsushita Electric Industrial Co., Ltd. Cache memory and control method thereof



Also Published As

Publication number Publication date
CN101558390A (en) 2009-10-14
CN101558390B (en) 2014-06-18
CN101558393A (en) 2009-10-14
CN101558393B (en) 2014-09-24
CN101558391A (en) 2009-10-14

Similar Documents

Publication Publication Date Title
CN101558391B (en) Configurable cache for a microprocessor
KR101363585B1 (en) Configurable cache for a microprocessor
KR101441019B1 (en) Configurable cache for a microprocessor
CA1199420A (en) Hierarchical memory system including separate cache memories for storing data and instructions
KR101462220B1 (en) Configurable cache for a microprocessor
US9286221B1 (en) Heterogeneous memory system
EP0795820B1 (en) Combined prefetch buffer and instructions cache memory system and method for providing instructions to a central processing unit utilizing said system.
US8725987B2 (en) Cache memory system including selectively accessible pre-fetch memory for pre-fetch of variable size data
US20070204107A1 (en) Cache memory background preprocessing
US20060212654A1 (en) Method and apparatus for intelligent instruction caching using application characteristics
KR100395756B1 (en) Cache memory and microprocessor using this cache memory
CN109952567B (en) Method and apparatus for bypassing internal caches of advanced DRAM memory controllers
DE102013202995A1 (en) Energy savings in branch prediction
US5287512A (en) Computer memory system and method for cleaning data elements
US9262325B1 (en) Heterogeneous memory system
US5835945A (en) Memory system with write buffer, prefetch and internal caches
JPH08314802A (en) Cache system, cache memory address unit and method for operation of cache memory
US5953740A (en) Computer memory system having programmable operational characteristics based on characteristics of a central processor
JPH06266623A (en) Cache memory and cache memory control method
JPH0421043A (en) One-chip cache memory

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant