WO2014206217A1 - Instruction cache management method, and processor - Google Patents
- Publication number
- WO2014206217A1 (PCT/CN2014/080059)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cache
- instruction
- hardware thread
- instruction cache
- private
- Prior art date
Classifications
- G06F12/0875—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with dedicated cache, e.g. instruction or stack (under G06F12/08, hierarchically structured memory systems)
- G06F12/0842—Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking (under G06F12/0806, multiuser, multiprocessor or multiprocessing cache systems)
- G06F9/3802—Instruction prefetching (under G06F9/38, concurrent instruction execution, e.g. pipeline or look ahead)
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution, from multiple instruction streams, e.g. multistreaming
Definitions
- the invention relates to a method and a processor for managing an instruction cache.
- this application claims priority to Chinese patent application No. 201310269557.0, filed with the Chinese Patent Office on June 28, 2013 and entitled "A Method of Cache Management and Processor", the entire contents of which are incorporated herein by reference.
- the present invention relates to the field of computers, and in particular, to a method and a processor for managing an instruction cache.
- the CPU (Central Processing Unit) cache (Cache Memory) is a temporary store located between the CPU and main memory.
- its capacity is much smaller than that of main memory, but it bridges the gap between the CPU's operation speed and the memory's read/write speed, improving the CPU's effective read speed.
- in simultaneous multithreading, multiple hardware threads fetch instructions from the same I-Cache (instruction cache). When an instruction to be fetched is absent from the I-Cache, a miss request is sent to the next-level cache, and execution switches to another hardware thread that continues fetching from the I-Cache, reducing the stalls caused by the I-Cache miss and improving pipeline efficiency. However, because the share of I-Cache resources allocated to each hardware thread is insufficient, the I-Cache miss rate increases and miss requests to the next-level cache become frequent. When an instruction retrieved from the next-level cache is backfilled, and the number of threads grows, the cache line containing the retrieved instruction is filled into the I-Cache immediately but may not be used right away, while the cache line it replaced may be about to be used again.
- Embodiments of the present invention provide a method and a processor for managing an instruction cache, which can expand the instruction cache capacity of a hardware thread, reduce the missing rate of the instruction cache, and improve system performance.
- a processor comprising: a program counter, a register file, an instruction prefetching component, an instruction decoding component, an instruction transmitting component, an address generating unit, an arithmetic logic unit, a shared floating point unit, a data cache, and an internal bus; it also includes:
- a shared instruction cache, configured to store shared instructions of all hardware threads, and including a tag storage array and a data storage array; the tag storage array stores tags, and the data storage array includes stored instructions and hardware thread identifiers, where a hardware thread identifier identifies the hardware thread corresponding to a cache line in the shared instruction cache;
- a private instruction cache, configured to store instruction cache lines replaced out of the shared instruction cache, where each private instruction cache corresponds to one hardware thread;
- a missing cache, configured to save, when a fetched instruction does not exist in the shared instruction cache, the cache line retrieved from the next-level cache of the shared instruction cache in the missing cache of the requesting hardware thread.
- the processor further includes tag comparison logic, configured to compare, when a hardware thread fetches an instruction, the tags in the private instruction cache corresponding to that hardware thread with the physical address translated by the translation look-aside buffer; the private instruction cache is coupled to the tag comparison logic so that the hardware thread accesses the private instruction cache while accessing the shared instruction cache.
- the processor is a multi-thread processor
- the structure of the private instruction cache is fully associative; the fully associative structure allows any block of instructions in main memory to map to any block in the private instruction cache.
- the shared instruction cache, the private instruction cache, and the missing cache are static memory chips or dynamic memory chips.
- the second aspect provides a method for managing an instruction cache, including:
- when a hardware thread of the processor fetches an instruction from the instruction cache, it simultaneously accesses the shared instruction cache in the instruction cache and the private instruction cache corresponding to the hardware thread, determines whether the instruction exists in the shared instruction cache or in that private instruction cache, and obtains the instruction from the shared instruction cache or from the private instruction cache corresponding to the hardware thread according to the result.
- the shared instruction cache includes a tag storage array and a data storage array; the tag storage array stores tags, and the data storage array includes stored instructions and hardware thread identifiers;
- a hardware thread identifier identifies the hardware thread corresponding to a cache line in the shared instruction cache.
- the structure of the private instruction cache is fully associative, meaning any block of instructions in main memory may map to any block in the private instruction cache; each private instruction cache corresponds to one hardware thread.
- determining whether the instruction exists in the shared instruction cache or in the private instruction cache corresponding to the hardware thread, and obtaining the instruction accordingly, includes:
- if the instruction exists in the shared instruction cache, it is obtained from the shared instruction cache; if the instruction exists in the private instruction cache corresponding to the hardware thread but not in the shared instruction cache, it is obtained from that private instruction cache.
- the method further includes:
- the hardware thread obtains the instruction from the next-level cache and stores the cache line containing the instruction in the missing cache corresponding to the hardware thread; the cache line is backfilled into the shared instruction cache when the hardware thread next fetches an instruction;
- each missing cache corresponds to one hardware thread.
- in a fourth possible implementation, when the cache line is backfilled into the shared instruction cache and the shared instruction cache has no idle resources, the cache line replaces a first cache line in the shared instruction cache and is backfilled in its place; according to the hardware thread identifier of the first hardware thread that fetched the first cache line, the first cache line is stored in the private instruction cache corresponding to that first hardware thread;
- the first cache line is selected by a least-recently-used algorithm.
- in a fifth possible implementation, when the replaced first cache line is to be stored in the private instruction cache corresponding to the first hardware thread and that private cache has no idle resources, the first cache line replaces a second cache line in that private instruction cache and is backfilled into the private instruction cache corresponding to the first hardware thread;
- the second cache line is selected by the least-recently-used algorithm.
- Embodiments of the present invention provide a method and a processor for managing an instruction cache.
- the processor includes a program counter, a register file, an instruction prefetching component, an instruction decoding component, an instruction transmitting component, an address generating unit, an arithmetic logic unit, a shared floating point unit, a data cache, and an internal bus, and also includes a shared instruction cache, private instruction caches, missing caches, and tag comparison logic.
- the shared instruction cache is used to store shared instructions of all hardware threads, including a tag storage array and a data storage array.
- the data storage array includes stored instructions and hardware thread identifiers, and a hardware thread identifier identifies the hardware thread corresponding to a cache line in the shared instruction cache.
- each private instruction cache stores instruction cache lines replaced out of the shared instruction cache and corresponds to one hardware thread; the tag comparison logic compares, when a hardware thread fetches an instruction, the tags in that thread's private instruction cache with the physical address translated by the translation look-aside buffer, and the private instruction cache is connected to the comparison logic so that the hardware thread accesses the private instruction cache while accessing the shared instruction cache.
- the processor determines whether the instruction exists in the shared instruction cache or in the private instruction cache corresponding to the hardware thread and obtains it from whichever holds it; this expands the instruction cache capacity available to each hardware thread, reduces the instruction cache miss rate, and improves system performance.
- FIG. 1 is a schematic structural diagram of a processor according to an embodiment of the present invention.
- FIG. 2 is a schematic flowchart of a method for managing an instruction cache according to an embodiment of the present invention
- FIG. 3 is a schematic diagram of a simultaneous access to a shared instruction cache and a private instruction cache according to an embodiment of the present invention
- FIG. 4 is a schematic diagram of a method for retrieving a cache line according to a cache miss request according to an embodiment of the present invention.
- if the instruction cache capacity of the L1 Cache allocated to each hardware thread is too small, L1 misses occur and the L1 miss rate rises, increasing the traffic between the L1 Cache and the L2 Cache; instructions must then be fetched from the L2 Cache or from main memory, and processor power consumption increases.
- an embodiment of the present invention provides a processor 01, as shown in FIG. 1, including a program counter 011, a register file 012, an instruction prefetching component 013, an instruction decoding component 014, an instruction transmitting component 015, an address generating unit 016, an arithmetic logic unit 017, a shared floating point unit 018, a data cache 019, and an internal bus; it also includes:
- the shared instruction cache 020, configured to store shared instructions of all hardware threads and including a tag storage array (Tag Array) 0201 and a data storage array (Data Array) 0202;
- the tag storage array 0201 is used to store tags;
- the data storage array 0202 includes stored instructions 02021 and hardware thread identifiers (Thread ID) 02022; a hardware thread identifier identifies the hardware thread corresponding to a cache line in the shared instruction cache 020.
- the private instruction cache 021 is used to store the instruction cache line replaced from the shared instruction cache 020, and the private instruction cache 021 corresponds to the hardware thread.
- the missing cache 022 is configured to save, when a fetched instruction does not exist in the shared instruction cache 020, the cache line retrieved from the next-level cache in the missing cache of the requesting hardware thread; when the corresponding hardware thread next fetches an instruction, the cache line in the missing cache 022 is backfilled into the shared instruction cache. Each missing cache 022 corresponds to one hardware thread.
- the tag comparison logic compares, when a hardware thread fetches an instruction, the tags in the private instruction cache corresponding to the hardware thread with the PA (Physical Address) translated by the TLB (Translation Look-aside Buffer);
- the private instruction cache 021 is coupled to the tag comparison logic so that the hardware thread accesses the private instruction cache 021 while accessing the shared instruction cache 020.
- TLB (Translation Look-aside Buffer): a page table cache storing page table entries, i.e., virtual-to-physical address mappings; the virtual address of a fetched instruction is translated into a physical address through the TLB before the tag comparison, and the hardware thread accesses the private instruction cache while accessing the shared instruction cache.
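The translation step can be modeled as a small sketch. The page size and the single mapping below are invented for illustration; a real TLB miss would trigger a page table walk, which is not modeled here.

```python
PAGE_SIZE = 4096  # assume 4 KB pages; all mappings here are invented

def translate(vaddr, tlb):
    """Translate a virtual fetch address via a TLB hit, producing the
    physical address that the tag comparison logic matches against."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in tlb:
        raise KeyError("TLB miss: page table walk needed")
    return tlb[vpn] * PAGE_SIZE + offset

tlb = {0x12: 0x7A}                        # virtual page 0x12 -> physical 0x7A
pa = translate(0x12 * PAGE_SIZE + 0x34, tlb)
print(hex(pa))                            # the PA fed to the tag comparison
```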
- PC: Program Counter
- GRF: General Register File; each logical processor (hardware thread) in a processor core corresponds to one GRF, and the number of GRFs equals the number of PCs
- ALU: Arithmetic Logic Unit
- CPU: Central Processing Unit
- D-Cache: data cache
- internal bus
- the processor 01 is a multi-threaded processor
- the structure of the private instruction cache 021 is a fully associative structure.
- the fully associative structure allows any instruction block in main memory to be placed in any cache line of the private instruction cache.
- shared instruction cache 020, private instruction cache 021, and missing cache 022 are static memory chips or dynamic memory chips.
- Thread ID: hardware thread identifier, stored in the I-Cache Data Array and used to indicate which hardware thread's miss request brought the Cache Line in.
- when a hardware thread misses in the L1 I-Cache, that is, the instruction the hardware thread needs does not exist in the I-Cache, L1 sends a Cache Miss request to its next-level cache, the L2 Cache; if the L2 Cache hits, the cache line (Cache Line) containing the instruction is returned.
- when the hardware thread receives the returned Cache Line, the Cache Line is not filled into the L1 Cache directly; instead, it is stored in the Miss Buffer corresponding to the hardware thread, and is only filled into the L1 Cache when the hardware thread next fetches an instruction.
- a replaced Cache Line is not directly discarded: using the Thread ID of the hardware thread corresponding to the replaced Cache Line, the replaced Cache Line is filled into the private instruction cache of that hardware thread;
- the replacement may be caused by the absence of idle resources in the L1 Cache, and the Cache Line to replace may be chosen by the LRU (Least Recently Used) algorithm.
- LRU: Least Recently Used;
- the LRU algorithm replaces the longest-unused instruction line out of the cache when an instruction cache miss occurs; in other words, the cache preferentially retains the most frequently used instructions.
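A minimal sketch of this LRU policy, with an illustrative two-line cache (the class and capacities are my own, not from the patent):

```python
from collections import OrderedDict

class LRUCache:
    """Every hit refreshes a line's recency; on a miss with a full
    cache, the longest-unused line is the one replaced."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # insertion order == recency order

    def access(self, addr, fill):
        if addr in self.lines:
            self.lines.move_to_end(addr)     # hit: mark most recently used
            return self.lines[addr]
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict least recently used
        self.lines[addr] = fill()            # miss: fetch and fill
        return self.lines[addr]

cache = LRUCache(2)
cache.access(1, lambda: "i1")
cache.access(2, lambda: "i2")
cache.access(1, lambda: "i1")   # refresh line 1
cache.access(3, lambda: "i3")   # evicts line 2, the longest unused
print(list(cache.lines))        # line 1 survived, line 2 did not
```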
- when a hardware thread fetches an instruction, it can simultaneously access the I-Cache and the private Cache corresponding to the hardware thread:
- if the instruction exists in the I-Cache, it is obtained from the I-Cache;
- if the instruction does not exist in the I-Cache but exists in the private Cache corresponding to the hardware thread, it is obtained from that private Cache;
- if the instruction exists in neither, the hardware thread sends a Cache Miss request to the next-level cache of the I-Cache to obtain it.
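The three fetch cases above can be sketched as a plain-dictionary model; the function, cache contents, and source labels are illustrative only:

```python
def fetch(addr, tid, icache, private_caches, next_level):
    """Return (instruction, source) following the three cases above."""
    if addr in icache:
        return icache[addr], "I-Cache"            # case 1: shared hit
    private = private_caches[tid]                 # probed simultaneously
    if addr in private:
        return private[addr], "private Cache"     # case 2: private hit
    # Case 3: neither holds the instruction; a Cache Miss request goes
    # to the next-level cache (modeled as a plain dictionary lookup).
    return next_level[addr], "Cache Miss -> next level"

icache = {0x10: "add"}
privates = {0: {0x20: "sub"}}
l2 = {0x30: "mul"}
print(fetch(0x10, 0, icache, privates, l2))
print(fetch(0x20, 0, icache, privates, l2))
print(fetch(0x30, 0, icache, privates, l2))
```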
- the private Cache corresponding to the hardware thread is connected to the tag comparison logic, so it is accessed while the shared instruction cache is accessed;
- the tags read from the private Cache are compared with the PA (Physical Address) output by the TLB (Translation Look-aside Buffer) to generate a private Cache Miss signal and the private Cache data output;
- when the private Cache Miss signal indicates a hit, the private Cache outputs a valid instruction.
- an embodiment of the present invention provides a processor including a program counter, a register file, an instruction prefetching component, an instruction decoding component, an instruction transmitting component, an address generating unit, an arithmetic logic unit, a shared floating point unit, a data cache, and an internal bus, and further including a shared instruction cache, private instruction caches, missing caches, and tag comparison logic.
- a hardware thread identifier is added to the data storage array of the shared instruction cache, recording which hardware thread's cache miss request brought each cache line in; when a line is replaced out of the shared instruction cache, the replaced cache line is stored, according to its hardware thread identifier, in the private instruction cache of the corresponding hardware thread;
- when a hardware thread receives the cache line returned for its cache miss request, it does not fill the cache line back into the shared instruction cache directly, but saves it in the missing cache until the hardware thread next fetches an instruction, at which point it is backfilled into the shared instruction cache; this reduces the chance that a cache line about to be accessed is replaced out of the instruction cache. In addition, the added private instruction caches increase the cache capacity of each hardware thread and improve system performance.
- a further embodiment of the present invention provides a method for managing an instruction cache, as shown in FIG. 2, including:
- when the hardware thread of the processor fetches an instruction from the instruction cache, the processor simultaneously accesses the shared instruction cache in the instruction cache and the private instruction cache corresponding to the hardware thread.
- the CPU (Central Processing Unit) can be a multi-threaded processor;
- a physical core can host multiple hardware threads, also called logical cores or logical processors, but a hardware thread is not itself a physical core; an operating system such as Windows treats each hardware thread as a schedulable logical processor, and each logical processor can run the code of a software thread.
- the instruction cache can comprise the shared instruction cache (I-Cache) in the processor's L1 Cache together with the hardware threads' private instruction caches;
- the L1 Cache includes a data cache (D-Cache) and an instruction cache (I-Cache).
- a fully associative private Cache can be provided for each hardware thread, that is, each private Cache corresponds to one hardware thread;
- the fully associative structure allows any block of instructions in main memory to map to any block in the private instruction cache;
- Tag comparison logic can be added;
- the private Cache of the hardware thread is connected to the Tag comparison logic, so that when a hardware thread fetches an instruction, the I-Cache and that thread's private Cache are accessed simultaneously.
- the processor determines whether the instruction exists in the shared instruction cache or in the private instruction cache corresponding to the hardware thread, and then proceeds to step 103, 104, 105, or 106.
- when the hardware thread accesses the I-Cache and its private Cache, it determines whether either contains the instruction to be fetched.
- for example, 32 hardware threads share a 64 KB I-Cache, that is, the shared instruction cache capacity is 64 KB;
- each hardware thread is given a 32-way fully associative private Cache that can store 32 replaced Cache Lines of 64 bytes each, so each private Cache has a capacity of 2 KB.
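A quick arithmetic check of the sizing in this example, using the figures given above:

```python
LINE_BYTES = 64          # bytes per Cache Line (from the example)
WAYS = 32                # fully associative ways per private Cache
THREADS = 32             # hardware threads sharing the I-Cache
SHARED_KB = 64           # shared I-Cache capacity in KB

private_kb = WAYS * LINE_BYTES // 1024      # 32 lines x 64 B = 2 KB
total_private_kb = THREADS * private_kb     # 32 threads x 2 KB = 64 KB
print(private_kb, "KB private Cache per hardware thread")
print(total_private_kb, "KB of private capacity in total")
print(SHARED_KB + total_private_kb, "KB effective instruction cache")
```

So the private caches collectively double the instruction storage available to the thread pool, while each individual thread gains a guaranteed 2 KB it never has to share.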
- while accessing the shared I-Cache, the hardware thread compares the 32 tags read from its private Cache with the PA (Physical Address) output by the TLB, generating a private Cache Miss signal and the private Cache data output; if any of the 32 tags matches the PA, the private Cache Miss signal indicates that the thread's private Cache holds the instruction, and the private Cache data output is a valid instruction, as shown in Figure 3.
- PA: Physical Address;
- TLB (Translation Look-aside Buffer): a page table cache storing page table entries, i.e., virtual-to-physical address mappings; the virtual address of the fetched instruction is translated into a physical address through the TLB, and that physical address is compared with the private Cache tags;
- the hardware thread accesses the private instruction cache while accessing the shared instruction cache.
- if the instruction exists in both the shared instruction cache and the private instruction cache, the processor obtains it from the shared instruction cache, that is, the instruction is fetched from the I-Cache.
- if the instruction exists in the shared instruction cache but not in the private instruction cache, the processor likewise obtains it from the I-Cache.
- if the I-Cache misses, that is, does not contain the fetched instruction, but the instruction exists in the private Cache corresponding to the hardware thread, the processor obtains the instruction from that private Cache; in this way, by having the private Cache of the fetching hardware thread participate in the tag comparison, the Cache capacity allocated to each hardware thread is expanded and the hit rate of the hardware thread's instruction cache increases.
- if the instruction exists in neither cache, the processor, through the hardware thread, sends a cache miss request to the next-level cache of the shared instruction cache;
- that is, the hardware thread issues a Cache Miss request to the next-level cache of the I-Cache;
- for example, the hardware thread sends a Cache Miss request to the L2 Cache, the next-level cache of the L1 Cache, to obtain the instruction from the L2 Cache.
- the processor obtains the instruction from the next-level cache through the hardware thread and stores the cache line containing the instruction in the missing cache corresponding to the hardware thread; when the hardware thread next fetches an instruction, the cache line is backfilled into the shared instruction cache;
- if the instruction exists in the L2 Cache, it is obtained from the L2 Cache, and the Cache Line containing it is not backfilled into the L1 Cache directly but saved in the Miss Buffer corresponding to the hardware thread until the hardware thread fetches, at which point the Cache Line is filled into the L1 Cache.
- the Miss Buffers correspond one-to-one with the hardware threads, that is, each hardware thread has its own Miss Buffer;
- each hardware thread uses its Miss Buffer to hold the Cache Line returned for a Cache Miss request, because backfilling the Cache Line into the L1 Cache causes a replacement;
- the replaced Cache Line might itself be a Cache Line about to be accessed;
- the Miss Buffer therefore optimizes the backfill timing of the Cache Line and reduces the chance that a Cache Line about to be accessed is replaced.
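The Miss Buffer behavior described here can be sketched as follows; the dictionaries and function names are invented for illustration:

```python
miss_buffers = {tid: {} for tid in range(4)}   # one Miss Buffer per thread
shared_cache = {}                              # the shared I-Cache

def on_miss_return(tid, addr, line):
    """A returned Cache Line parks in the requester's Miss Buffer;
    no backfill (and hence no replacement) happens yet."""
    miss_buffers[tid][addr] = line

def on_fetch(tid, addr):
    """Backfill only happens when the owning thread next fetches, so a
    parked line cannot displace a still-hot shared line early."""
    if addr in miss_buffers[tid]:
        shared_cache[addr] = miss_buffers[tid].pop(addr)
    return shared_cache.get(addr)

on_miss_return(0, 0x40, "nop")
print(0x40 in shared_cache)     # still parked in the Miss Buffer
print(on_fetch(0, 0x40))        # backfilled on this thread's own fetch
```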
- alternatively, the processor sends a miss request to main memory through the hardware thread, obtains the instruction from main memory, and stores the cache line containing the instruction in the missing cache corresponding to the hardware thread; when the hardware thread next fetches, the cache line is backfilled into the shared instruction cache.
- the hardware thread issues a Cache Miss request to main memory to obtain the instruction; if the instruction exists in main memory, it is obtained, and the Cache Line containing it is stored in the Miss Buffer corresponding to the hardware thread until the hardware thread next fetches, at which point the Cache Line is filled into the L1 Cache;
- if there is an L3 Cache, the hardware thread first sends the Cache Miss request to the L3 Cache; if the L3 Cache holds the instruction, it is obtained from there, and if not, a Cache Miss request is issued to main memory.
- the unit of exchange between the CPU and the Cache is a word;
- when the CPU reads a word from main memory, the memory address of the word is sent to the Cache and main memory at the same time; the control logic of the L1, L2, or L3 Cache determines from the Tag part of the address whether the word is present. On a hit, the CPU obtains the word directly; on a miss, the word is read out of main memory and delivered to the CPU using a main-memory read cycle;
- even though the CPU currently reads only one word, the Cache controller also copies the complete cache line containing that word from main memory into the Cache; this operation of transferring a whole row of data into the Cache is called cache line filling.
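The line fill above implies the usual address decomposition; this sketch uses arbitrarily chosen field widths (not taken from the patent) to show how the Tag part of a word address drives the hit check while the rest selects a set and a byte within the line:

```python
LINE_BYTES = 64      # bytes per cache line (illustrative)
SETS = 256           # number of sets in this illustrative cache

def split_address(addr):
    """Split a byte address into (tag, set index, byte offset)."""
    offset = addr % LINE_BYTES              # byte within the cache line
    index = (addr // LINE_BYTES) % SETS     # which set the line maps to
    tag = addr // (LINE_BYTES * SETS)       # compared against stored tags
    return tag, index, offset

tag, index, offset = split_address(0x1234ABCD)
print(hex(tag), hex(index), hex(offset))
```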
- when the cache line is backfilled into the shared instruction cache and the shared instruction cache has no idle resources, the cache line replaces a first cache line in the shared instruction cache and is backfilled in its place; the hardware thread identifier of the first hardware thread that fetched the first cache line is obtained, and the first cache line is stored in the private instruction cache corresponding to the first hardware thread;
- the first cache line is selected by the LRU (Least Recently Used) algorithm.
- Thread ID: hardware thread identifier, stored in the I-Cache Data Array and used to indicate which hardware thread's Cache Miss request brought a Cache Line back;
- the replaced Cache Line is not directly discarded: according to its Thread ID, the replaced Cache Line is filled into the private Cache of the hardware thread identified by that Thread ID, because the replaced Cache Line may be accessed again soon, as shown in Figure 4.
- if the private instruction cache corresponding to the first hardware thread has no idle resources, the first cache line replaces a second cache line in that private instruction cache and is backfilled into the private instruction cache corresponding to the first hardware thread;
- the second cache line is selected by the LRU algorithm;
- the LRU algorithm replaces the longest-unused instruction line out of the cache when an instruction cache miss occurs; in other words, the cache preferentially retains the most frequently used instructions.
- in this way, the instruction cache capacity allocated to each hardware thread is effectively expanded, the hit rate of each hardware thread's instruction cache increases, and the traffic between the I-Cache and the next-level Cache decreases;
- the added Miss Buffer optimizes the backfill timing of Cache Lines and reduces the probability that a Cache Line about to be accessed is replaced, and the added Tag comparison logic lets the shared instruction cache and the private instruction cache be accessed simultaneously, further raising the instruction cache hit rate.
- An embodiment of the present invention provides a method for managing an instruction cache.
- When a hardware thread of the processor fetches an instruction from the instruction cache, the shared instruction cache and the private instruction cache corresponding to the hardware thread are accessed simultaneously to determine whether either of them contains the instruction.
- The hardware thread obtains the instruction from the shared instruction cache or from the private instruction cache corresponding to the hardware thread according to the determination result. If the instruction is in neither the shared instruction cache nor the private instruction cache, the hardware thread sends a cache miss request to the next-level cache of the shared instruction cache and stores the cache line containing the instruction in the missing cache corresponding to the hardware thread.
- The cache line is then backfilled into the shared instruction cache. If the shared instruction cache has no free resources, a first cache line in the shared instruction cache is replaced by the backfilled cache line, and according to the hardware thread identifier of the first cache line, the first cache line is stored in the private instruction cache corresponding to the first hardware thread.
- This expands the instruction cache capacity available to each hardware thread, reduces the instruction cache miss rate, and improves system performance.
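The miss-handling and backfill path summarized above can be sketched as below. `backfill` and its data structures are assumptions made for illustration, not the patent's actual circuitry; the key point it models is that an evicted shared line is routed to the private cache of the thread recorded in its Thread ID rather than being discarded:

```python
def backfill(shared, private_caches, new_tag, new_line, owner_tid, shared_ways):
    """Insert a line fetched from the next-level cache into the shared
    I-Cache; if the shared cache is full, evict a victim and store it
    in the private instruction cache of the hardware thread named by
    the victim's Thread ID field.
    """
    if len(shared) >= shared_ways:
        # Evict the oldest entry as a stand-in for the LRU victim.
        victim_tag, (victim_line, victim_tid) = next(iter(shared.items()))
        del shared[victim_tag]
        # Backfill the victim into its owner thread's private cache.
        private_caches[victim_tid][victim_tag] = victim_line
    # Each shared line carries the Thread ID of the thread that fetched it.
    shared[new_tag] = (new_line, owner_tid)
```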
- the disclosed processor and method may be implemented in other manners.
- The device embodiments described above are merely illustrative.
- The division into units is only a division by logical function; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
- The couplings, direct couplings or communication connections between the components shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- The functional units may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
- the above units may be implemented in the form of hardware or in the form of hardware plus software functional units.
- All or part of the steps of the foregoing method embodiments may be implemented by a program instructing the relevant hardware.
- The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the foregoing method embodiments.
- the foregoing storage medium includes:
- USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, optical disc, or any other medium that can store program code.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The present invention provides a management method for an instruction cache, and a processor, which relate to the field of computing and can expand the instruction cache capacity available to each hardware thread, reduce the instruction cache miss rate, and improve system performance. A hardware thread identifier in a shared instruction cache of a processor is used to identify the hardware thread corresponding to a cache line in the shared instruction cache. A private instruction cache is used to store an instruction cache line that has been replaced out of the shared instruction cache. A missing cache is also included. When fetching an instruction from the instruction cache, a hardware thread of the processor simultaneously accesses the shared instruction cache and the private instruction cache corresponding to the hardware thread, determines whether the shared instruction cache and the private instruction cache corresponding to the hardware thread contain the instruction, and obtains the instruction from the shared instruction cache or from the private instruction cache corresponding to the hardware thread according to the determination result. The management method is used to manage an instruction cache of a processor.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310269557.0 | 2013-06-28 | ||
CN201310269557.0A CN104252425B (zh) | 2013-06-28 | 2013-06-28 | 一种指令缓存的管理方法和处理器 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014206217A1 true WO2014206217A1 (fr) | 2014-12-31 |
Family
ID=52141028
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2014/080059 WO2014206217A1 (fr) | 2013-06-28 | 2014-06-17 | Procédé de gestion pour cache d'instructions, et processeur |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104252425B (fr) |
WO (1) | WO2014206217A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018229701A1 (fr) * | 2017-06-16 | 2018-12-20 | International Business Machines Corporation | Support de traduction pour mémoire cache virtuelle |
US10606762B2 (en) | 2017-06-16 | 2020-03-31 | International Business Machines Corporation | Sharing virtual and real translations in a virtual cache |
US10713168B2 (en) | 2017-06-16 | 2020-07-14 | International Business Machines Corporation | Cache structure using a logical directory |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104809078B (zh) * | 2015-04-14 | 2019-05-14 | 苏州中晟宏芯信息科技有限公司 | 基于退出退让机制的共享高速缓存硬件资源访问方法 |
WO2017006235A1 (fr) * | 2015-07-09 | 2017-01-12 | Centipede Semi Ltd. | Processeur doté d'un accès à la mémoire efficace |
CN106484310B (zh) * | 2015-08-31 | 2020-01-10 | 华为数字技术(成都)有限公司 | 一种存储阵列操作方法和装置 |
CN109308190B (zh) * | 2018-07-09 | 2023-03-14 | 北京中科睿芯科技集团有限公司 | 基于3d堆栈内存架构的共享行缓冲系统及共享行缓冲器 |
US11099999B2 (en) * | 2019-04-19 | 2021-08-24 | Chengdu Haiguang Integrated Circuit Design Co., Ltd. | Cache management method, cache controller, processor and storage medium |
CN110990062B (zh) * | 2019-11-27 | 2023-03-28 | 上海高性能集成电路设计中心 | 一种指令预取过滤方法 |
CN111078592A (zh) * | 2019-12-27 | 2020-04-28 | 无锡中感微电子股份有限公司 | 一种低功耗片上系统的多级指令缓存 |
WO2022150996A1 (fr) * | 2021-01-13 | 2022-07-21 | 王志平 | Procédé de mise en œuvre d'une structure de mémoire cache de processeur |
CN114116533B (zh) * | 2021-11-29 | 2023-03-10 | 海光信息技术股份有限公司 | 利用共享存储器存储数据的方法 |
CN115098169B (zh) * | 2022-06-24 | 2024-03-05 | 海光信息技术股份有限公司 | 基于容量共享的调取指令的方法及装置 |
CN117851278B (zh) * | 2024-03-08 | 2024-06-18 | 上海芯联芯智能科技有限公司 | 一种共享静态随机存取存储器的方法及中央处理器 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020174285A1 (en) * | 1998-12-03 | 2002-11-21 | Marc Tremblay | Shared instruction cache for multiple processors |
CN101510191A (zh) * | 2009-03-26 | 2009-08-19 | 浙江大学 | 具备缓存窗口的多核体系架构及其实现方法 |
US20110320720A1 (en) * | 2010-06-23 | 2011-12-29 | International Business Machines Corporation | Cache Line Replacement In A Symmetric Multiprocessing Computer |
CN103020003A (zh) * | 2012-12-31 | 2013-04-03 | 哈尔滨工业大学 | 面向多核程序确定性重演的内存竞争记录装置及其控制方法 |
- 2013
- 2013-06-28 CN CN201310269557.0A patent/CN104252425B/zh active Active
- 2014
- 2014-06-17 WO PCT/CN2014/080059 patent/WO2014206217A1/fr active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020174285A1 (en) * | 1998-12-03 | 2002-11-21 | Marc Tremblay | Shared instruction cache for multiple processors |
CN101510191A (zh) * | 2009-03-26 | 2009-08-19 | 浙江大学 | 具备缓存窗口的多核体系架构及其实现方法 |
US20110320720A1 (en) * | 2010-06-23 | 2011-12-29 | International Business Machines Corporation | Cache Line Replacement In A Symmetric Multiprocessing Computer |
CN103020003A (zh) * | 2012-12-31 | 2013-04-03 | 哈尔滨工业大学 | 面向多核程序确定性重演的内存竞争记录装置及其控制方法 |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018229701A1 (fr) * | 2017-06-16 | 2018-12-20 | International Business Machines Corporation | Support de traduction pour mémoire cache virtuelle |
GB2577023A (en) * | 2017-06-16 | 2020-03-11 | Ibm | Translation support for a virtual cache |
US10606762B2 (en) | 2017-06-16 | 2020-03-31 | International Business Machines Corporation | Sharing virtual and real translations in a virtual cache |
US10698836B2 (en) | 2017-06-16 | 2020-06-30 | International Business Machines Corporation | Translation support for a virtual cache |
US10713168B2 (en) | 2017-06-16 | 2020-07-14 | International Business Machines Corporation | Cache structure using a logical directory |
GB2577023B (en) * | 2017-06-16 | 2020-08-05 | Ibm | Translation support for a virtual cache |
US10810134B2 (en) | 2017-06-16 | 2020-10-20 | International Business Machines Corporation | Sharing virtual and real translations in a virtual cache |
US10831674B2 (en) | 2017-06-16 | 2020-11-10 | International Business Machines Corporation | Translation support for a virtual cache |
US10831664B2 (en) | 2017-06-16 | 2020-11-10 | International Business Machines Corporation | Cache structure using a logical directory |
US11403222B2 (en) | 2017-06-16 | 2022-08-02 | International Business Machines Corporation | Cache structure using a logical directory |
US11775445B2 (en) | 2017-06-16 | 2023-10-03 | International Business Machines Corporation | Translation support for a virtual cache |
US12141076B2 (en) | 2017-06-16 | 2024-11-12 | International Business Machines Corporation | Translation support for a virtual cache |
Also Published As
Publication number | Publication date |
---|---|
CN104252425B (zh) | 2017-07-28 |
CN104252425A (zh) | 2014-12-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2014206217A1 (fr) | Procédé de gestion pour cache d'instructions, et processeur | |
US7290116B1 (en) | Level 2 cache index hashing to avoid hot spots | |
EP3238074B1 (fr) | Accès à une mémoire cache au moyen d'adresses virtuelles | |
CN104346294B (zh) | 基于多级缓存的数据读/写方法、装置和计算机系统 | |
US6427188B1 (en) | Method and system for early tag accesses for lower-level caches in parallel with first-level cache | |
US11016763B2 (en) | Implementing a micro-operation cache with compaction | |
KR101456860B1 (ko) | 메모리 디바이스의 전력 소비를 감소시키기 위한 방법 및 시스템 | |
US10884751B2 (en) | Method and apparatus for virtualizing the micro-op cache | |
WO2014206218A1 (fr) | Procédé et processeur pour accéder à un cache de données | |
US9727482B2 (en) | Address range priority mechanism | |
CN112631962B (zh) | 存储管理装置、存储管理方法、处理器和计算机系统 | |
US9547593B2 (en) | Systems and methods for reconfiguring cache memory | |
US20170185515A1 (en) | Cpu remote snoop filtering mechanism for field programmable gate array | |
US20120054425A1 (en) | Performing memory accesses using memory context information | |
US20120005454A1 (en) | Data processing apparatus for storing address translations | |
CN102073533A (zh) | 支持动态二进制翻译的多核体系结构 | |
CN107592927B (zh) | 管理扇区高速缓存 | |
JP2024164308A (ja) | 階層キャッシュシステムにおけるプリフェッチ管理 | |
JP2023179708A (ja) | 命令キャッシュにおけるプリフェッチの強制終了及び再開 | |
US20040221117A1 (en) | Logic and method for reading data from cache | |
JPWO2004031943A1 (ja) | データプロセッサ | |
JP2008502069A (ja) | メモリ・キャッシュ制御装置及びそのためのコヒーレンシ動作を実行する方法 | |
WO2014105167A1 (fr) | Appareil et procédé d'extension de parcours de pages pour des vérifications de sécurité renforcée | |
US9639467B2 (en) | Environment-aware cache flushing mechanism | |
US9037804B2 (en) | Efficient support of sparse data structure access |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14816879 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 14816879 Country of ref document: EP Kind code of ref document: A1 |