CN1963789A - Stack cache memory and caching method suitable for context switching - Google Patents

Stack cache memory and caching method suitable for context switching

Info

Publication number
CN1963789A
Authority
CN
China
Prior art keywords
stack
address
space
cacheline
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2005100868602A
Other languages
Chinese (zh)
Other versions
CN100377115C (en)
Inventor
郇丹丹 (Huan Dandan)
胡伟武 (Hu Weiwu)
李祖松 (Li Zusong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Loongson Technology Corp Ltd
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CNB2005100868602A priority Critical patent/CN100377115C/en
Publication of CN1963789A publication Critical patent/CN1963789A/en
Application granted Critical
Publication of CN100377115C publication Critical patent/CN100377115C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

This invention discloses a stack cache memory suitable for context switching, and its caching method. The stack cache memory comprises at least two stack cache blocks, an OR circuit, and a selector, wherein each stack cache block consists of a tag part, a data part and a control part, and the control part of each stack cache block consists of at least three comparator circuits and an AND circuit. The method comprises the following steps: (a) initializing the stack; (b) allocating stack space; (c) reclaiming stack space; (d) comparing tags to determine whether the stack cache hits.

Description

Stack cache memory and caching method suitable for context switching
Technical field
The present invention relates to the field of microprocessor architecture, and in particular to a stack cache memory and caching method suitable for context switching.
Background technology
With the rapid development of microprocessor design and fabrication technology, the gap between memory access speed and processor speed grows ever more significant. Processor speed increases at roughly 50% per year while memory access speed improves far more slowly, so memory access has increasingly become the bottleneck limiting processor performance. Exploiting the principle of locality by adopting one or more levels of cache memory (Cache, "cache" for short) is one of the effective means of improving memory system performance. A cache is a small, fast special-purpose memory holding the instructions and data the processor has used most recently. If the instructions or data the processor accesses are in the cache, they can be accessed at very high speed; otherwise the processor must access main memory and wait much longer. An efficient cache design can significantly reduce the processor's average memory access time.
To better exploit the principle of locality and improve the cache hit rate, the cache is further divided into an instruction cache (Instruction Cache) and a data cache (Data Cache). Memory accesses fall into code segment, data segment, heap and stack accesses, which makes further specialization of the data cache possible. Stack accesses in particular exhibit good temporal and spatial locality: a program continually accesses data near the top of the stack. Saving local variables, passing parameters, and saving and restoring registers during function calls are all done through stack accesses. A stack cache memory (Stack Cache, "stack cache" for short) separates stack accesses from the data cache. It can better exploit the characteristics of stack accesses, avoids the data-cache pollution caused when stack data evicts heap data, and reduces pressure on the data cache ports. The characteristics of stack accesses are: (1) Stack accesses have good temporal and spatial locality, with programs continually accessing data near the stack top, so a stack cache need not be large to achieve a very high hit rate. (2) When stack space is allocated, i.e., the stack-top pointer (sp) decreases, the original contents of the corresponding blocks need not be fetched from the lower-level memory system. (3) When stack space is reclaimed, i.e., the stack-top pointer (sp) increases, data in the reclaimed region (even if dirty) need not be written back to the lower-level memory system. (4) Stack accesses address a contiguous region, which can be accessed as a common base address plus an offset.
Fig. 1 shows a conventional stack cache structure and its access path. To determine quickly whether the stack cache hits and to obtain the hit data, the stack cache is accessed by virtual address. The stack cache is divided into three parts: tag, data and control. Its input is the stack cache access address; its output is the stack cache hit or miss signal and the hit data. The tag part of the stack cache comprises a virtual base address (Vbase) field, a valid bit (Valid) field, a physical base address (Pbase) field, a stack-top address (Top) field and a stack-bottom address (Bottom) field. The data part of the stack cache comprises a data (Data) field and a dirty bit (W) field indicating whether the data has been written. Because the microprocessor's stack grows from high addresses toward low addresses, the stack-bottom address is greater than the stack-top address. The stack-bottom address is fixed by software convention, namely the application binary interface (ABI) standard. As shown in Fig. 1, the control part comprises a first comparator circuit, a second comparator circuit and an AND circuit. The first comparator circuit determines whether the base address of the access equals the stack cache's virtual base address, i.e., whether Base = Vbase. The second comparator circuit determines whether the access address belongs to the stack space, i.e., whether Top ≤ Vaddr ≤ Bottom. The AND circuit ANDs the outputs of the first and second comparator circuits with the valid bit to determine whether the stack cache hits, and outputs the hit or miss signal.
The stack cache access address is divided into two fixed parts: the base address (Base) and the offset (Offset). The base address is compared with the stack cache's virtual base address to determine whether the access hits; the offset selects the required content from the data field and outputs the hit data.
The stack cache is organized as a circular queue buffering a contiguous region. It detects the allocation and reclamation of stack space by observing changes in the stack-top pointer. When stack space is allocated, i.e., the stack-top pointer (sp) decreases, the space is allocated directly in the stack cache without fetching the original contents of the corresponding blocks from the lower-level memory system. If the stack cache lacks room for the newly allocated stack space, data near the stack bottom is evicted from the cache to preserve the contiguity of the cached region. When stack space is reclaimed, i.e., the stack-top pointer (sp) increases, dirty data in the reclaimed region need not be written back to the lower-level memory system.
A stack cache access determines whether the cache hits by tag comparison. The tag comparison must establish that the access address belongs to the stack space, i.e., that it is greater than or equal to the stack-top address and less than or equal to the stack-bottom address; that the base address of the access equals the virtual base address in the stack cache tag; and that the value of the valid bit field in the tag is 1. If all of these conditions hold at once, the stack cache hits, and the offset indexes the data field to obtain the required data.
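This tag comparison can be sketched as a minimal behavioral model in Python. The function and variable names below are ours, and the 12-bit offset width is an illustrative assumption matching the 4 KB blocks used in the examples later in this description; the sketch is not the hardware itself:

```python
OFFSET_BITS = 12  # assumed block size of 4 KB

def split_address(vaddr):
    """Split a virtual address into the fixed (Base, Offset) parts."""
    return vaddr >> OFFSET_BITS, vaddr & ((1 << OFFSET_BITS) - 1)

def conventional_hit(vaddr, vbase, top, bottom, valid):
    """Conventional stack cache hit check: the access must lie inside
    the stack space (the stack grows downward, so Top <= Vaddr <= Bottom),
    the base address must equal Vbase, and the valid bit must be 1."""
    base, _ = split_address(vaddr)
    return top <= vaddr <= bottom and base == vbase and valid == 1
```

On a hit, the Offset half of the address indexes the data field to select the word returned to the processor.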
On a stack cache miss, if the access address does not belong to the stack space, the access is handled by the data cache. If the stack cache misses and the access address belongs to the stack space, two cases arise. If the access address is lower than the lowest address currently in the stack cache, i.e., beyond the cached region near the stack top, the data from that lowest cached address down to the missing address is fetched from the lower-level memory system. If the access address is higher than the highest address currently in the stack cache, i.e., beyond the cached region near the stack bottom, the data from the highest cached address up to the missing address is fetched from the lower-level memory system. This preserves the contiguity of the stack cache.
When a context switch occurs, the stack cache records no identification information for the process (or thread). The new process may be allocated the same virtual addresses as the original process, so identical virtual addresses map to different physical addresses. Therefore, to guarantee data consistency on a context switch, all dirty data in the stack cache must be written back to the lower-level memory system to free the space for the new process; see U.S. Patent No. 6,167,488, "Stack caching circuit with overflow/underflow unit". Even if the stack cache has enough free space for the new process, the dirty data must still be written back to guarantee correctness.
When the processor runs a single-process application, the stack cache performs well. Under multi-user (Multi-user), multi-programming (Multi-programming) and multi-threading (Multi-threading) workloads, however, every context switch requires writing all dirty data in the stack cache back to the lower-level memory system at once, which is very expensive. After switching back, the written-back data must be fetched into the stack cache again, so the data-transfer cost is high. A processor with a stack cache therefore performs well in single-process applications but unsatisfactorily in multi-process (including multi-threaded) environments, particularly under frequent context switching. Multi-user, multi-programming and multi-threaded applications are the inevitable trend of microprocessor development.
Given these deficiencies of the prior art, a microprocessor stack cache suitable for context switching needs to be designed to reduce the processor's average memory access time and substantially improve the microprocessor's memory access performance in practical applications.
Summary of the invention
The object of the invention is to overcome the prior-art stack cache's unsuitability for context switching, and thereby to provide a microprocessor stack cache and method suitable for context switching that performs well even under frequent context switches in multi-user, multi-programming and multi-threaded environments, with low hardware overhead and easy implementation.
In order to achieve the above object, the present invention adopts the following technical scheme:
A stack cache memory suitable for context switching, comprising:
At least two stack cache blocks, each stack cache block consisting of a tag part, a data part and a control part;
An OR circuit, connected to the outputs of the control parts of the at least two stack cache blocks, which ORs the hit signals of the stack cache blocks and outputs the hit or miss result of the stack cache memory;
A selector, connected to the outputs of the control parts of the at least two stack cache blocks and to the outputs of their data parts, which selects the data of the stack cache block that hits and outputs the hit data;
Further, the tag part of each stack cache block comprises a virtual base address (Vbase) field, a valid bit (Valid) field, a physical base address (Pbase) field, a stack-top address (Top) field, a stack-bottom address (Bottom) field, and a process address space identifier (PASID, "process identifier" for short) field;
Further, the data part of each stack cache block comprises a data (Data) field and a dirty bit (W) field indicating whether the data has been written;
Further, the control part of each stack cache block comprises at least three comparator circuits and an AND circuit. The inputs of the first comparator circuit are connected to the virtual base address field of the tag part and to the base address field of the stack cache access address; it determines whether the base address of the access equals the virtual base address of the tag part, and its output is connected to the AND circuit. The inputs of the second comparator circuit are connected to the stack-top and stack-bottom address fields of the tag part and to the stack cache access address; it determines whether the access address belongs to the stack space, and its output is connected to the AND circuit. The inputs of the third comparator circuit are connected to the process identifier field of the tag part and to the process identifier field of the control register; it determines whether the value of the process address space identifier field equals the process address space identifier of the access instruction, and its output is connected to the AND circuit. The valid bit field of the tag part is also connected to an input of the AND circuit. The output of the AND circuit is connected to the OR circuit and to the selector.
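The block-level control logic just described (three comparators and a valid bit feeding an AND gate per block, with an OR gate and a selector across blocks) can be modeled behaviorally. The sketch below uses our own Python names and an assumed 12-bit offset; it is a simplified model, not the hardware:

```python
from dataclasses import dataclass, field

OFFSET_BITS = 12  # assumed 4 KB blocks

@dataclass
class StackCacheBlock:
    vbase: int            # virtual base address (Vbase) field
    top: int              # stack-top address (Top) field
    bottom: int           # stack-bottom address (Bottom) field
    pasid: int            # process address space identifier field
    valid: int            # valid bit (Valid) field
    data: dict = field(default_factory=dict)  # offset -> word

def block_hit(blk, vaddr, pasid):
    """Per-block AND circuit over the three comparators and the valid bit."""
    base = vaddr >> OFFSET_BITS
    return (base == blk.vbase                   # first comparator
            and blk.top <= vaddr <= blk.bottom  # second comparator
            and pasid == blk.pasid              # third comparator
            and blk.valid == 1)

def stack_cache_lookup(blocks, vaddr, pasid):
    """OR circuit plus selector: returns (hit?, hit data)."""
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    for blk in blocks:
        if block_hit(blk, vaddr, pasid):
            return True, blk.data.get(offset)
    return False, None
```

An access by one process misses any block tagged with a different PASID even at the same virtual address, which is exactly the property that lets the cache survive context switches without a wholesale flush.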
The microprocessor stack cache memory suitable for context switching provided by the invention takes as input the stack cache access address and the process address space identifier of the access instruction, and outputs the hit or miss signal and the hit data.
In the present invention, the process address space identifier of each access instruction comes from a control register of the microprocessor; every access instruction has a corresponding process address space identifier. Every microprocessor has control registers holding content equivalent to a process address space identifier, differing only in which register holds it and how it is stored. For example, MIPS processors store the address space identifier ASID (Address Space Identifier) in the EntryHi register and the global bit G (Global Bit) in the EntryLow register; together these form the process address space identifier.
Based on the above microprocessor stack cache memory, a microprocessor stack caching method suitable for context switching comprises the following steps:
(1) On a context switch, initialize the stack: if no stack space has been allocated for the process in the stack cache, record the initial stack-bottom and stack-top addresses in the stack cache;
(2) Allocate stack space: if the stack cache has allocatable space, allocate the new free space in the stack cache; if it does not, select a stack cache block to write back to the lower-level memory system, and initialize the tag of the newly allocated stack cache block;
(3) Reclaim stack space: dirty data in the reclaimed region need not be written back to the lower-level memory system; the stack cache space is reclaimed and released directly;
(4) On an instruction's stack cache access, perform the tag comparison and determine from its result whether the access hits; if it hits, go to step (5); if not, go to step (6);
(5) Output the hit data obtained by indexing the data of the hitting stack cache block with the offset;
(6) Determine whether the access address belongs to the stack space: if it does not, the access is handled by the data cache; if it does, fetch from the lower-level memory system the data of the missing address's stack cache block lying between the stack-bottom and stack-top addresses.
In step (2) above, when the stack cache has no allocatable space, the stack cache block to write back to the lower-level memory system may be selected by a first-in-first-out (FIFO), random (Random) or least-recently-used (LRU) policy. With the FIFO policy, the earliest-allocated block can be identified by adding an allocation-time (Age) field to the stack cache block tag: the Age field of the newly allocated block's tag is cleared, and the Age fields of all other block tags are incremented by 1.
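Under the FIFO option, victim selection and the Age update can be sketched as follows (the list representation and function names are our own illustrative choices):

```python
def select_fifo_victim(ages):
    """The earliest-allocated block is the one with the largest Age."""
    return max(range(len(ages)), key=lambda i: ages[i])

def update_ages(ages, new_block):
    """Clear the Age of the newly allocated block; increment the others."""
    return [0 if i == new_block else age + 1 for i, age in enumerate(ages)]
```

With four blocks aged [3, 8, 1, 2], the block with Age 8 entered first and is evicted; after reallocating it, the ages become [4, 0, 2, 3].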
In step (4) above, the tag comparison means checking that all of the following hold at once: the access address is greater than or equal to the stack-top address and less than or equal to the stack-bottom address; the base address of the access equals the virtual base address in a stack cache block tag; the value of the valid bit field in that tag is 1; and the value of the process address space identifier field in that tag equals the process address space identifier of the access instruction. The stack cache access hits when all of these conditions are met; otherwise the stack cache access misses.
The present invention has the following advantages:
1. The stack cache of the present invention is organized in blocks and adds a dedicated process address space identifier to each stack cache block tag to distinguish the address spaces of different processes. It is thus a stack caching scheme designed specifically for multi-user, multi-programming and multi-threaded environments, adapting well to process (and thread) context switches.
2. The present invention only needs to add a process address space identifier (PASID) field and an Age field to the stack cache block tag; the hardware overhead is small, the control is simple, and implementation complexity is avoided.
Description of drawings
Fig. 1 is a conventional stack cache structure and access diagram.
Fig. 2 is a stack cache structure and access diagram suitable for context switching according to one embodiment of the invention.
Embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments:
In Fig. 2, numeral 10 denotes a specific embodiment of a stack cache memory suitable for context switching according to the present invention. In this embodiment, the stack cache consists of two stack cache blocks, an OR circuit 11 and a selector 12. Its input is the stack cache access address 13 and the process address space identifier 14 of the access instruction; its output is the hit or miss signal and the hit data. The data within each stack cache block is contiguous, as in a conventional stack cache.
The two stack cache blocks have the same structure; for ease of description they are called the first and second stack cache blocks. The first stack cache block comprises three parts: a tag part 15, a data part 16 and a control part 17. The tag part 15 of the first stack cache block comprises: a virtual base address (Vbase) field representing the block's virtual base address; a valid bit (Valid) field indicating whether the block is valid; a physical base address (Pbase) field representing the physical base address corresponding to the block's virtual base address; a stack-top address (Top) field representing the stack-top address of the stack space of the process owning the block; a stack-bottom address (Bottom) field representing the stack-bottom address of that stack space; a process address space identifier (PASID) field representing the process address space identifier of the owning process; and an allocation-time (Age) field representing the block's allocation time. Each stack cache tag entry in the figure corresponds to the tag of one stack cache block; numeral 18 denotes the tag part of the second stack cache block, whose structure is identical to the tag part 15 of the first. The data part 16 of the first stack cache block comprises a data (Data) field and a dirty bit (W) field indicating whether the data has been written. Each stack cache data entry in the figure represents the data of one stack cache block; numeral 19 denotes the data part of the second stack cache block, whose structure is identical to the data part 16 of the first. The control part 17 of the first stack cache block comprises a first comparator circuit 20, a second comparator circuit 21, a third comparator circuit 22 and an AND circuit 23. The first comparator circuit 20 determines whether the base address of the access equals the virtual base address of the first stack cache block, i.e., whether Base = Vbase. The second comparator circuit 21 determines whether the access address belongs to the stack space, i.e., whether Top ≤ Vaddr ≤ Bottom. The third comparator circuit 22 determines whether the value of the process address space identifier field equals the process address space identifier 14 of the access instruction; the process address space identifier 14 (PASID) of each access instruction comes from a control register of the microprocessor. The AND circuit 23 ANDs the outputs of the first comparator circuit 20, the second comparator circuit 21 and the third comparator circuit 22 with the valid bit field of the tag part 15 to determine whether the first stack cache block hits, and outputs the block's hit or miss signal.
That OR circuit 11 is finished each stack cacheline hiting signal or operation, the i.e. first and second stack cachelines in the present embodiment, the result that output stack high-speed cache 10 hits or do not hit.
The selector 12 selects the data of the stack cache block that hits and outputs the hit data.
The address (Vaddr) 13 of the stack cache access instruction is divided into two fixed parts: a base address (Base) and an offset (Offset). The base address is compared with the virtual base address in each stack cache block tag to determine whether the access hits; the offset selects the required content from the data field. A control register (Control Register) of the processor holding process identity information supplies the process address space identifier 14 of the stack cache access instruction, which is compared with the process address space identifier in each stack cache block tag to identify the process issuing the access instruction.
In the present embodiment, the stack cache of the invention is illustrated with two stack cache blocks. It should be understood that a stack cache according to the invention may comprise any number of stack cache blocks, all having the same structure and interconnection; this will be evident to those skilled in the art.
Based on the stack cache memory provided by this embodiment, a microprocessor stack caching method suitable for context switching is implemented in the following concrete steps:
(1) On a context switch, initialize the stack. If no stack space has been allocated for the process in the stack cache, record the process's initial stack-bottom and stack-top addresses in the stack cache. If stack space has already been allocated for the process, the switch returns to a previous process's context and no operation is needed.
(2) Allocate stack space, i.e., the stack-top pointer decreases. If the stack cache has allocatable space, allocate the new free space in the stack cache. If it has none, select the earliest-allocated stack cache block according to the FIFO policy, write it back to the lower-level memory system, release it for use by the new process, and initialize the tag of the newly allocated block. The earliest-allocated block is identified through the allocation-time (Age) field added to each stack cache block tag: the block with the largest Age value is the earliest allocated. The Age field of the newly allocated block's tag is cleared and the Age fields of the other block tags are incremented by 1. If the process requesting stack space already owns an allocated stack cache block, the stack-top pointer in that block's tag is updated and the W fields corresponding to the newly allocated space are cleared; if the new stack-top address falls outside the blocks already allocated to the process, a new block containing the stack-top address is allocated. If the process requesting stack space owns no stack cache block, a new block containing the stack-top address is allocated for it. In the newly allocated block's tag, Vbase is set to the base address of the stack top, Pbase to the physical base address corresponding to that base address, the process address space identifier field to the current process address space identifier, the stack-bottom (Bottom) and stack-top (Top) addresses to the initial stack-bottom address and the current stack-top address, and the valid bit field (Valid) to 1; the W fields of the block's data are initialized to 0.
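The tag initialization for a newly allocated block in step (2) can be sketched as follows. The dictionary keys mirror the tag fields; the `virt_to_phys` helper and the 4 KB block size are our own illustrative assumptions:

```python
def init_new_block_tag(stack_top, stack_bottom, pasid, virt_to_phys):
    """Initialize the tag of a newly allocated stack cache block.
    virt_to_phys maps the stack top's virtual base address to its
    physical base address (standing in for the TLB / page tables)."""
    vbase = stack_top >> 12  # assumed 4 KB blocks: 12-bit offset
    return {
        "Vbase": vbase,
        "Pbase": virt_to_phys(vbase),
        "PASID": pasid,
        "Bottom": stack_bottom,
        "Top": stack_top,
        "Valid": 1,
        "Age": 0,
    }
```

With the numbers of Example 1 below, a stack top of 0x7fff7c00 yields Vbase 0x7fff7, and the block starts valid with its Age cleared.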
(3) Reclaim stack space, i.e., the stack-top pointer increases. Dirty data in the reclaimed region need not be written back to the lower-level memory system; the stack cache space is reclaimed and released directly and the stack-top pointer is updated. If the entire stack space is reclaimed, i.e., the stack-top pointer equals the stack-bottom pointer, the value of the valid bit field (Valid) of the tag corresponding to the reclaimed stack space is set to 0.
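Step (3) amounts to moving the stack-top pointer with no write-back; a minimal sketch, assuming a dictionary representation of the tag:

```python
def reclaim_stack_space(tag, new_top):
    """Reclaim stack space: raise the stack top without writing back
    any dirty data. If the whole stack space is reclaimed (the
    stack-top pointer reaches the stack bottom), clear the valid bit."""
    tag["Top"] = new_top
    if tag["Top"] == tag["Bottom"]:
        tag["Valid"] = 0
    return tag
```

Note that no data movement occurs in either case; only tag state changes, which is the cost advantage the patent claims for reclamation.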
(4) On an instruction's stack cache access, perform the tag comparison and determine from its result whether the stack cache hits. The tag comparison checks whether all of the following hold: (a) the access address is greater than or equal to the stack-top address and less than or equal to the stack-bottom address; (b) the base address of the access equals the virtual base address in a stack cache block tag; (c) the value of the valid bit (Valid) field in that tag is 1; (d) the value of the process address space identifier field in that tag equals the process address space identifier of the access instruction. If all conditions are met at once, the stack cache hits; go to step (5). Otherwise go to step (6).
(5) The access hits: output the hit data obtained by indexing the data of the hitting stack cache block with the offset.
(6) The access misses: determine whether the access address belongs to the stack space, and handle the two cases separately. (a) If the access address does not belong to the stack space, the access is handled by the data cache. (b) If it does, fetch from the lower-level memory system the data of the missing address's stack cache block lying between the stack-bottom and stack-top addresses, keeping the stack cache block contiguous. A stack cache block whose virtual base address equals the base address of the access address is allocated. When the lower-level memory access returns, Vbase in the block's tag is set to the base address, Pbase to the corresponding physical base address, and the Valid field to 1; the PASID field is filled from the PASID in the control register, the Age field is cleared, and the Age fields of the other stack cache blocks are incremented by 1. The returned data is stored in the Data field of the block and the corresponding W fields are set to 0.
Three specific examples are given below. Through examples of stack space allocation and reclamation, a stack cache access that hits, and a stack cache access that misses, they show in detail how the stack cache proposed by the invention is organized in blocks, adds a process address space identifier field to the stack cache block tag to distinguish the spaces used by different processes, and thereby realizes a stack caching method suitable for context switching.
Example 1: stack-space allocation. The virtual address is 32 bits, the stack-bottom address is 0x7fff8000, and the stack-top address has been lowered to 0x7fff7c00. The process address space identifier PASID in the control register is that of process 8. The stack cache block size is 4KB, so Vbase in each stack cache block tag is 20 bits and the offset (Offset) is 12 bits; the stack cache is 16KB, divided into 4 stack cache blocks. No stack cache block in the stack cache belongs to process 8, and no stack cache block has its valid bit equal to 0. The space that must be allocated is 1KB (0x7fff8000 - 0x7fff7c00). The stack cache block with the largest Age is sought: the second stack cache block has the largest Age, 8; its process number is 1, its stack-top address is 0x7fff7b00, its stack-bottom address is 0x7fff8000, its Vbase is 0x7fff7, its Valid bit is 1, and its Pbase is 0x00ff7. This stack cache block is evicted: from the stack top to the stack bottom of process 1, i.e. for offsets 0xb00 through 0xfff, every data entry whose W bit is 1 (dirty) is written back to the lower-level memory system, after which its W field is set to 0; the write-back address is Pbase concatenated with Offset. For example, if the data at Offset 0xb00 is dirty and Pbase is 0x00ff7, the write-back address is 0x00ff7b00. The stack cache block size is 4KB, which is greater than or equal to the space that must be allocated, so it suffices for the new process stack space. Process 8 takes over the second stack cache block: Vbase in the tag is set to 0x7fff7, Valid is set to 1, PASID is set to the PASID in the control register, 8, Pbase is set to the physical base address corresponding to this virtual base address, 0x01ff7, Bottom is set to 0x7fff8000, Top is set to 0x7fff7c00, and Age is cleared to 0. The Age of every other stack cache block is incremented by 1. When the stack space is reclaimed and the stack-top address rises to 0x7fff7f00, the corresponding data in the stack cache need not be written to the lower-level memory system; only the stack top need be changed to 0x7fff7f00.
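The arithmetic in Example 1 can be checked directly. A small sketch, assuming (as the example states) that the write-back address is Pbase concatenated with the 12-bit offset; `writeback_addr` is an illustrative name, not from the source.

```python
def writeback_addr(pbase, offset):
    # Write-back address = Pbase (upper 20 bits) concatenated with the 12-bit offset
    return (pbase << 12) | (offset & 0xfff)

bottom, new_top = 0x7fff8000, 0x7fff7c00
alloc_size = bottom - new_top               # space to allocate: 1KB
print(hex(alloc_size))                      # 0x400
print(hex(writeback_addr(0x00ff7, 0xb00)))  # 0xff7b00, i.e. 0x00ff7b00
```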
Example 2: a stack cache access that hits. The access address Vaddr is 0x7fff7f80, and the process address space identifier PASID in the control register is that of process 7. The stack cache block size is 4KB, so Vbase in each stack cache block tag is 20 bits and Offset is 12 bits; the stack cache is 16KB, divided into 4 stack cache blocks. The base address (Base) of the access address is 0x7fff7 and the offset (Offset) is 0xf80. In the first stack cache block, the virtual base address (Vbase) is 0x7fff7, the process address space identifier (PASID) is 7, the valid bit (Valid) is 1, the stack-bottom address (Bottom) is 0x7fff8000, the stack-top address (Top) is 0x7fff7400, the physical base address (Pbase) is 0x01ff7, and Age is 0. The data indexed by offset 0xf80 in the first stack cache block is 0x01fc00c0. The tag comparison finds that the access address lies between the stack-top and stack-bottom addresses (0x7fff7400 ≤ 0x7fff7f80 ≤ 0x7fff8000), the access base address equals the virtual base address 0x7fff7 in the first stack cache block tag, the value of the valid bit field in that tag is 1, and the value of the process address space identifier field equals the process address space identifier of the access instruction, both being 7. The conditions for a stack cache hit are satisfied, so the hit signal is returned. The selector selects the data that the first stack cache block obtains by indexing its data field with the offset, and the hit data 0x01fc00c0 is output.
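The four-way tag comparison that produces the hit in Example 2 can be modeled compactly. A sketch using the example's values; the dictionary tag layout and the word-per-offset data array are simplifying assumptions for illustration.

```python
def stack_cache_hit(vaddr, pasid, tag):
    """Tag comparison: all four conditions of a stack cache hit must hold."""
    base = vaddr >> 12                                # upper 20 bits of Vaddr
    in_stack = tag["top"] <= vaddr <= tag["bottom"]   # Top <= Vaddr <= Bottom
    base_eq = base == tag["vbase"]                    # access base == Vbase
    valid = tag["valid"] == 1
    pasid_eq = pasid == tag["pasid"]                  # PASID matches control register
    return in_stack and base_eq and valid and pasid_eq

# First stack cache block of Example 2
tag = {"vbase": 0x7fff7, "pasid": 7, "valid": 1,
       "bottom": 0x7fff8000, "top": 0x7fff7400, "pbase": 0x01ff7}
data = [0] * 0x1000
data[0xf80] = 0x01fc00c0
vaddr = 0x7fff7f80
assert stack_cache_hit(vaddr, 7, tag)
print(hex(data[vaddr & 0xfff]))   # the hit data 0x01fc00c0
```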
Example 3: a stack cache access that misses. The access address Vaddr is 0x7fff7f80, and the process address space identifier PASID in the control register is that of process 7. The stack cache block size is 4KB, so Vbase in each stack cache block tag is 20 bits and Offset is 12 bits; the stack cache is 16KB, divided into 4 stack cache blocks. The base address (Base) of the access address is 0x7fff7 and the offset (Offset) is 0xf80. In the first stack cache block tag, the virtual base address (Vbase) is 0x7fff6, the process address space identifier (PASID) is 7, the valid bit (Valid) is 1, the stack-bottom address (Bottom) is 0x7fff8000, the stack-top address (Top) is 0x7fff6000, the physical base address (Pbase) is 0x01ff6, and Age is 2. The access address lies between the stack-top and stack-bottom addresses (0x7fff6000 ≤ 0x7fff7f80 ≤ 0x7fff8000), the value of the valid bit field in the tag is 1, and the value of the process address space identifier field in the tag equals the process identifier of the access instruction, both being 7; but the access base address does not equal the virtual base address 0x7fff6 of the first stack cache block, no other stack cache block has process address space identifier 7, and the valid bit of the third stack cache block tag is 0. The stack cache therefore misses. The access address belongs to the stack space, so the data of the stack cache block containing the missed address, between the stack-top and stack-bottom addresses, is fetched from the lower-level memory system into the third stack cache block, ensuring that the stack cache block is contiguous: the data from virtual address 0x7fff7000 to 0x7fff7fff, at corresponding physical addresses 0x01ff7000 to 0x01ff7fff, is fetched. When the data returns from the lower-level memory system, Vbase in the tag of the third stack cache block is set to 0x7fff7, Pbase is set to the corresponding physical base address 0x01ff7, the Valid field is set to 1, PASID is filled with the address space identifier 7 in the control register, the Age field is set to 0, and the value of the Age field of every other stack cache block is incremented by 1. The returned data is stored in the Data field of the third stack cache block, and the corresponding W bits are set to 0.
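The fill performed on the miss in Example 3 covers the whole 4KB block containing the missed address. A sketch of the range computation, with the example's physical base 0x01ff7 supplied directly as an assumption rather than derived from a page table:

```python
BLOCK = 0x1000  # 4KB stack cache block

def fill_range(vaddr, pbase):
    """Virtual and physical address ranges of the block fetched on a stack cache miss."""
    vstart = (vaddr >> 12) << 12            # align down to the 4KB block base
    pstart = pbase << 12                    # physical base of the same block
    return (vstart, vstart + BLOCK - 1), (pstart, pstart + BLOCK - 1)

(vlo, vhi), (plo, phi) = fill_range(0x7fff7f80, 0x01ff7)
print(hex(vlo), hex(vhi))   # 0x7fff7000 0x7fff7fff
print(hex(plo), hex(phi))   # 0x1ff7000 0x1ff7fff, i.e. 0x01ff7000-0x01ff7fff
```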
The advantages of the present invention are evident from the embodiments described above. The present invention overcomes the shortcoming of traditional stack caching methods, which are unsuitable for process (including thread) context switching, and is highly practical.
Finally, it should be noted that the above embodiments only illustrate, and do not limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the present invention may still be modified, or its features replaced with equivalents, and that any modification or partial replacement that does not depart from the spirit and scope of the present invention shall be encompassed within the scope of the claims of the present invention.

Claims (8)

1. A stack cache memory suitable for context switching, comprising:
at least two stack cache blocks, each of said stack cache blocks being composed of a tag part, a data part, and a control part;
an OR-gate circuit, connected to the output terminals of said control parts of said at least two stack cache blocks, for ORing the hit signals of the individual stack cache blocks and outputting the result of whether this stack cache memory hits or misses;
a selector, connected to the output terminals of said control parts of said at least two stack cache blocks and to the output terminals of said data parts of said at least two stack cache blocks, for selecting the data of the stack cache block that hits and outputting the stack cache hit data.
2. The stack cache memory suitable for context switching according to claim 1, characterized in that each of said stack cache blocks has an identical structure.
3. The stack cache memory suitable for context switching according to claim 2, characterized in that the tag part of each said stack cache block comprises a virtual base address field, a valid bit field, a physical base address field, a stack-top address field, a stack-bottom address field, and a process address space identifier field.
4. The stack cache memory suitable for context switching according to claim 3, characterized in that the data part of each said stack cache block comprises a data field and a dirty bit field indicating whether the data has been written.
5. The stack cache memory suitable for context switching according to any one of claims 1-4, characterized in that the control part of each said stack cache block comprises at least three comparator circuits and an AND-gate circuit; wherein the input terminals of the first comparator circuit are connected to the virtual base address field of the tag part and to the base address field of the access address of the stack cache, for judging whether the access base address equals the virtual base address of the tag part, and its output terminal is connected to said AND-gate circuit; the input terminals of the second comparator circuit are connected to the stack-top address field and the stack-bottom address field of the tag part and to the access address of the stack cache, for judging whether the access address belongs to the stack space, and its output terminal is connected to said AND-gate circuit; the input terminals of the third comparator circuit are connected to the process identifier field of said tag part and to the process identifier field of the control register, for judging whether the value of the process address space identifier field equals the process address space identifier of the access instruction, and its output terminal is connected to said AND-gate circuit; the valid bit field of said tag part is also connected to an input terminal of said AND-gate circuit; the output terminal of said AND-gate circuit is connected to said OR-gate circuit and to said selector, respectively.
6. A microprocessor stack caching method suitable for context switching, the steps of which are as follows:
(1) On a context switch, initialize the stack: if no corresponding process stack space has been allocated in the stack cache, record the initial stack-bottom and stack-top addresses in the stack cache;
(2) Allocate stack space: if the stack cache has allocatable space, allocate new free space in the stack cache; if the stack cache has no allocatable space, select a stack cache block to write back to the lower-level memory system, and initialize the tag of the newly allocated stack cache block;
(3) Reclaim stack space: dirty data need not be written to the lower-level memory system; the stack cache space is directly reclaimed and released;
(4) An instruction accesses the stack cache: perform the tag comparison, and determine from the tag comparison result whether the stack cache access hits; if it hits, execute step (5); if it misses, execute step (6);
(5) Output the hit data obtained by indexing the data of the stack cache block that hits with the offset;
(6) Judge whether the access address belongs to the stack space: if the access address does not belong to the stack space, the access is handled by the data cache; if it belongs to the stack space, fetch the data between the stack-bottom and stack-top addresses of the stack cache block containing the missed address from the lower-level memory system.
7. The microprocessor stack caching method suitable for context switching according to claim 6, characterized in that, in said step (2), if the stack cache has no allocatable space, the stack cache block to be written back to the lower-level memory system is selected by a first-in-first-out policy, a random policy, or a least-recently-used policy; if the first-in-first-out policy is adopted, the selection of the earliest-entered stack cache block is realized by adding, to the tag of each stack cache block, a field representing the time at which the stack cache block was allocated: this field of the newly allocated stack cache block tag is cleared to 0, and this field of every other stack cache block tag is incremented by 1.
8. The microprocessor stack caching method suitable for context switching according to claim 6 or 7, characterized in that, in said step (4), said tag comparison means judging whether the following conditions are satisfied simultaneously: the access address is greater than or equal to the stack-top address and less than or equal to the stack-bottom address; the access base address is identical to the virtual base address in a stack cache block tag; the value of the valid bit field in the stack cache block tag whose virtual base address is identical to the access base address is 1; and the value of the process address space identifier field is identical to the process address space identifier of the access instruction. Said stack cache access hit means that the above conditions are satisfied; if the above conditions are not satisfied, the stack cache access misses.
CNB2005100868602A 2005-11-11 2005-11-11 Stack cache memory applied for context switch and buffer storage method Active CN100377115C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005100868602A CN100377115C (en) 2005-11-11 2005-11-11 Stack cache memory applied for context switch and buffer storage method


Publications (2)

Publication Number Publication Date
CN1963789A true CN1963789A (en) 2007-05-16
CN100377115C CN100377115C (en) 2008-03-26

Family

ID=38082852

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005100868602A Active CN100377115C (en) 2005-11-11 2005-11-11 Stack cache memory applied for context switch and buffer storage method

Country Status (1)

Country Link
CN (1) CN100377115C (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015081889A1 (en) * 2013-12-06 2015-06-11 上海芯豪微电子有限公司 Caching system and method
CN105808576A (en) * 2014-12-30 2016-07-27 展讯通信(天津)有限公司 Data recording system and method
WO2017049590A1 (en) * 2015-09-25 2017-03-30 Intel Corporation Systems and methods for input/output computing resource control
CN110134617A (en) * 2019-05-15 2019-08-16 上海东软载波微电子有限公司 Address space allocation method and device, computer readable storage medium
CN114840143A (en) * 2022-05-09 2022-08-02 Oppo广东移动通信有限公司 Stack space characteristic-based cache processing method and device, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6167488A (en) * 1997-03-31 2000-12-26 Sun Microsystems, Inc. Stack caching circuit with overflow/underflow unit
US7191291B2 (en) * 2003-01-16 2007-03-13 Ip-First, Llc Microprocessor with variable latency stack cache
US7139877B2 (en) * 2003-01-16 2006-11-21 Ip-First, Llc Microprocessor and apparatus for performing speculative load operation from a stack memory cache

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015081889A1 (en) * 2013-12-06 2015-06-11 上海芯豪微电子有限公司 Caching system and method
US9990299B2 (en) 2013-12-06 2018-06-05 Shanghai Xinhao Microelectronics Co. Ltd. Cache system and method
CN105808576A (en) * 2014-12-30 2016-07-27 展讯通信(天津)有限公司 Data recording system and method
CN105808576B (en) * 2014-12-30 2019-05-28 展讯通信(天津)有限公司 A kind of digital data recording system and method
WO2017049590A1 (en) * 2015-09-25 2017-03-30 Intel Corporation Systems and methods for input/output computing resource control
US10310974B2 (en) 2015-09-25 2019-06-04 Intel Corporation Systems and methods for input/output computing resource control
CN110134617A (en) * 2019-05-15 2019-08-16 上海东软载波微电子有限公司 Address space allocation method and device, computer readable storage medium
CN114840143A (en) * 2022-05-09 2022-08-02 Oppo广东移动通信有限公司 Stack space characteristic-based cache processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN100377115C (en) 2008-03-26

Similar Documents

Publication Publication Date Title
US6260114B1 (en) Computer cache memory windowing
US6957304B2 (en) Runahead allocation protection (RAP)
US8140764B2 (en) System for reconfiguring cache memory having an access bit associated with a sector of a lower-level cache memory and a granularity bit associated with a sector of a higher-level cache memory
US8677071B2 (en) Control of processor cache memory occupancy
CN1092360C (en) Method and apparatus for decreasing thread switch latency in multithread processor
US6427188B1 (en) Method and system for early tag accesses for lower-level caches in parallel with first-level cache
KR100936601B1 (en) Multi-processor system
CN112543916B (en) Multi-table branch target buffer
US20120290793A1 (en) Efficient tag storage for large data caches
JPH1196074A (en) Computer system for dynamically selecting exchange algorithm
CN100377115C (en) Stack cache memory applied for context switch and buffer storage method
US5765199A (en) Data processor with alocate bit and method of operation
CN104699627B (en) A kind of caching system and method
CN100399299C (en) Memory data processing method of cache failure processor
CN101178690A (en) Design method of low-power consumption high performance high speed scratch memory
US8266379B2 (en) Multithreaded processor with multiple caches
GB2299879A (en) Instruction/data prefetching using non-referenced prefetch cache
EP0173893B1 (en) Computing system and method providing working set prefetch for level two caches
US5819080A (en) Microprocessor using an instruction field to specify condition flags for use with branch instructions and a computer system employing the microprocessor
US11662931B2 (en) Mapping partition identifiers
US6751707B2 (en) Methods and apparatus for controlling a cache memory
EP1387275B1 (en) Memory management of local variables upon a change of context
US12001705B2 (en) Memory transaction parameter settings
US11455253B2 (en) Set indexing for first-level and second-level set-associative cache
JP2004303232A (en) Data memory cache device and data memory cache system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Assignee: Beijing Loongson Zhongke Technology Service Center Co., Ltd.

Assignor: Institute of Computing Technology, Chinese Academy of Sciences

Contract fulfillment period: 2009.12.16 to 2028.12.31

Contract record no.: 2010990000062

Denomination of invention: Stack cache memory applied for context switch and buffer storage method

Granted publication date: 20080326

License type: exclusive license

Record date: 20100128

LIC Patent licence contract for exploitation submitted for record

Free format text: EXCLUSIVE LICENSE; TIME LIMIT OF IMPLEMENTING CONTACT: 2009.12.16 TO 2028.12.31; CHANGE OF CONTRACT

Name of requester: BEIJING LOONGSON TECHNOLOGY SERVICE CENTER CO., LT

Effective date: 20100128

EC01 Cancellation of recordation of patent licensing contract

Assignee: Longxin Zhongke Technology Co., Ltd.

Assignor: Institute of Computing Technology, Chinese Academy of Sciences

Contract record no.: 2010990000062

Date of cancellation: 20141231

EM01 Change of recordation of patent licensing contract

Change date: 20141231

Contract record no.: 2010990000062

Assignee after: Longxin Zhongke Technology Co., Ltd.

Assignee before: Beijing Loongson Zhongke Technology Service Center Co., Ltd.

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20070516

Assignee: Longxin Zhongke Technology Co., Ltd.

Assignor: Institute of Computing Technology, Chinese Academy of Sciences

Contract record no.: 2015990000066

Denomination of invention: Stack cache memory applied for context switch and buffer storage method

Granted publication date: 20080326

License type: Common License

Record date: 20150211

TR01 Transfer of patent right

Effective date of registration: 20200818

Address after: 100095 Building 2, Longxin Industrial Park, Zhongguancun Environmental Protection Technology Demonstration Park, Haidian District, Beijing

Patentee after: LOONGSON TECHNOLOGY Corp.,Ltd.

Address before: 100080 No. 6 South Road, Academy of Sciences, Zhongguancun, Haidian District, Beijing

Patentee before: Institute of Computing Technology, Chinese Academy of Sciences

TR01 Transfer of patent right
EC01 Cancellation of recordation of patent licensing contract

Assignee: LOONGSON TECHNOLOGY Corp.,Ltd.

Assignor: Institute of Computing Technology, Chinese Academy of Sciences

Contract record no.: 2015990000066

Date of cancellation: 20200928

EC01 Cancellation of recordation of patent licensing contract
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 100095 Building 2, Longxin Industrial Park, Zhongguancun environmental protection technology demonstration park, Haidian District, Beijing

Patentee after: Loongson Zhongke Technology Co.,Ltd.

Address before: 100095 Building 2, Longxin Industrial Park, Zhongguancun environmental protection technology demonstration park, Haidian District, Beijing

Patentee before: LOONGSON TECHNOLOGY Corp.,Ltd.