WO2014206217A1 - Management method for instruction cache, and processor - Google Patents


Info

Publication number
WO2014206217A1
WO2014206217A1 (PCT/CN2014/080059)
Authority
WO
WIPO (PCT)
Prior art keywords
cache
instruction
hardware thread
instruction cache
private
Prior art date
Application number
PCT/CN2014/080059
Other languages
French (fr)
Chinese (zh)
Inventor
Guo Xubin (郭旭斌)
Hou Rui (侯锐)
Feng Yujing (冯煜晶)
Su Dongfeng (苏东锋)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2014206217A1 publication Critical patent/WO2014206217A1/en


Classifications

    • G06F12/0875 — Addressing of a memory level requiring associative addressing means (caches), with dedicated cache, e.g. instruction or stack
    • G06F12/0842 — Multiuser, multiprocessor or multiprocessing cache systems for multiprocessing or multitasking
    • G06F9/3802 — Instruction prefetching
    • G06F9/3851 — Instruction issuing from multiple instruction streams, e.g. multistreaming

Definitions

  • This application claims priority to Chinese patent application No. 201310269557.0, filed with the Chinese Patent Office on June 28, 2013 and entitled "A Method of Cache Management and Processor", the entire contents of which are incorporated herein by reference.
  • The present invention relates to the field of computers, and in particular to a method and a processor for managing an instruction cache.
  • A CPU (Central Processing Unit) cache (Cache Memory) is a small, fast memory located between the CPU and main memory.
  • Its capacity is much smaller than that of main memory, but it bridges the gap between the CPU's operating speed and the memory's read/write speed, raising the effective CPU read speed.
  • In a multi-threaded processor, multiple hardware threads fetch instructions from the same I-Cache (instruction cache). When an instruction to be fetched is not in the I-Cache, a miss request is sent to the next-level cache, and the processor switches to other hardware threads, which continue fetching from the I-Cache; this reduces the stalls caused by I-Cache misses and improves pipeline efficiency. However, because the shared I-Cache resources allocated to each hardware thread are insufficient, the I-Cache miss rate rises and miss requests to the next-level cache become frequent. When instructions retrieved from the next-level cache are backfilled and the number of threads grows, the cache line containing the backfilled instruction may be filled into the I-Cache without being used immediately, while the cache line it replaces may be about to be used again.
  • Embodiments of the present invention provide a method and a processor for managing an instruction cache, which can expand the instruction cache capacity available to each hardware thread, reduce the instruction cache miss rate, and improve system performance.
  • A first aspect provides a processor comprising: a program counter, a register file, an instruction prefetch component, an instruction decode component, an instruction issue component, an address generation unit, an arithmetic logic unit, a shared floating point unit, a data cache, and an internal bus; it further comprises:
  • a shared instruction cache, configured to store the shared instructions of all hardware threads, including a tag storage array and a data storage array, the tag storage array being configured to store tags, the data storage array including stored instructions and hardware thread identifiers, where a hardware thread identifier identifies the hardware thread corresponding to a cache line in the shared instruction cache;
  • a private instruction cache, configured to store instruction cache lines replaced out of the shared instruction cache, where each private instruction cache corresponds to one hardware thread;
  • a miss buffer, configured to save, when a fetched instruction is not present in the shared instruction cache, the cache line retrieved from the next-level cache of the shared instruction cache in the miss buffer of the requesting hardware thread;
  • tag comparison logic, configured to compare, when a hardware thread fetches an instruction, the tags in the private instruction cache corresponding to that hardware thread against the physical address translated by the translation lookaside buffer; the private instruction cache is coupled to the tag comparison logic so that the hardware thread accesses its private instruction cache while accessing the shared instruction cache.
  • the processor is a multi-threaded processor;
  • the private instruction cache has a fully associative structure: the fully associative structure maps any instruction block in the private instruction cache to any instruction block in main memory;
  • the shared instruction cache, the private instruction cache, and the miss buffer are static memory chips or dynamic memory chips.
  • A second aspect provides a method for managing an instruction cache, including:
  • when a hardware thread of the processor fetches an instruction from the instruction cache, simultaneously accessing the shared instruction cache in the instruction cache and the private instruction cache corresponding to the hardware thread; determining whether the instruction exists in the shared instruction cache and in the private instruction cache corresponding to the hardware thread, and obtaining the instruction from the shared instruction cache or from the private instruction cache according to the result of the determination.
  • The shared instruction cache includes a tag storage array and a data storage array; the tag storage array is configured to store tags, and the data storage array includes stored instructions and hardware thread identifiers.
  • A hardware thread identifier identifies the hardware thread corresponding to a cache line in the shared instruction cache.
  • The private instruction cache has a fully associative structure, in which any instruction block in main memory can map to any instruction block in the private instruction cache; each private instruction cache corresponds to one hardware thread.
  • Determining whether the instruction exists in the shared instruction cache and in the private instruction cache corresponding to the hardware thread, and obtaining the instruction according to the result, includes:
  • if the instruction exists in the shared instruction cache, obtaining the instruction from the shared instruction cache; if the instruction exists in the private instruction cache corresponding to the hardware thread but not in the shared instruction cache, obtaining the instruction from the private instruction cache corresponding to the hardware thread.
  • The method further includes:
  • the hardware thread obtaining the instruction from the next-level cache, storing the cache line containing the instruction in the miss buffer corresponding to the hardware thread, and backfilling the cache line into the shared instruction cache when the hardware thread next fetches an instruction;
  • each miss buffer corresponds to one hardware thread.
  • In a fourth possible implementation, when the cache line is backfilled into the shared instruction cache and the shared instruction cache has no idle resources, the cache line replaces a first cache line in the shared instruction cache and is backfilled into the shared instruction cache; according to the hardware thread identifier of the first hardware thread that fetched the first cache line, the first cache line is stored in the private instruction cache corresponding to the first hardware thread;
  • the first cache line is determined by a least-recently-used algorithm.
  • In a fifth possible implementation, when the replaced first cache line is to be stored in the private instruction cache corresponding to the first hardware thread and that private instruction cache has no idle resources, the first cache line replaces a second cache line in the private instruction cache corresponding to the first hardware thread and is backfilled into that private instruction cache;
  • the second cache line is determined by the least-recently-used algorithm.
  • Embodiments of the present invention provide a method and a processor for managing an instruction cache.
  • The processor includes a program counter, a register file, an instruction prefetch component, an instruction decode component, an instruction issue component, an address generation unit, an arithmetic logic unit, a shared floating point unit, a data cache, and an internal bus, and further includes a shared instruction cache, private instruction caches, miss buffers, and tag comparison logic.
  • The shared instruction cache stores the shared instructions of all hardware threads and includes a tag storage array and a data storage array.
  • The data storage array includes stored instructions and hardware thread identifiers, and a hardware thread identifier identifies the hardware thread corresponding to a cache line in the shared instruction cache.
  • A private instruction cache stores instruction cache lines replaced out of the shared instruction cache, and each private instruction cache corresponds to one hardware thread.
  • The tag comparison logic compares, when a hardware thread fetches an instruction, the tags in that thread's private instruction cache with the physical address translated by the translation lookaside buffer; the private instruction cache is coupled to the tag comparison logic so that the hardware thread accesses its private instruction cache while accessing the shared instruction cache.
  • When fetching, it is determined whether the instruction exists in the shared instruction cache and in the private instruction cache corresponding to the hardware thread, and the instruction is obtained from the shared instruction cache or the private instruction cache according to the result, which expands the instruction cache capacity of each hardware thread, reduces the instruction cache miss rate, and improves system performance.
  • FIG. 1 is a schematic structural diagram of a processor according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a method for managing an instruction cache according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of simultaneous access to a shared instruction cache and a private instruction cache according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a method for retrieving a cache line according to a cache miss request according to an embodiment of the present invention.
  • When the instruction cache capacity of the L1 Cache allocated to each hardware thread is too small, L1 misses occur and the L1 miss rate rises, increasing the traffic between the L1 Cache and the L2 Cache; fetching from the L2 Cache, or from main memory, increases processor power consumption.
  • An embodiment of the present invention provides a processor 01, as shown in FIG. 1, including a program counter 011, a register file 012, an instruction prefetch component 013, an instruction decode component 014, an instruction issue component 015, an address generation unit 016, an arithmetic logic unit 017, a shared floating point unit 018, a data cache 019, and an internal bus; it further includes:
  • a shared instruction cache 020, configured to store the shared instructions of all hardware threads, including a tag storage array (Tag Array) 0201 and a data storage array (Data Array) 0202;
  • the tag storage array 0201 is used to store tags;
  • the data storage array 0202 includes the stored instructions 02021 and hardware thread identifiers (Thread ID) 02022, used to identify the hardware thread corresponding to each cache line in the shared instruction cache 020;
  • a private instruction cache 021, used to store instruction cache lines replaced out of the shared instruction cache 020, each private instruction cache 021 corresponding to one hardware thread;
  • a miss buffer 022, configured to hold, when a fetched instruction is not present in the shared instruction cache 020, the cache line retrieved from the next-level cache; when the corresponding hardware thread next fetches an instruction, the cache line in the miss buffer 022 is backfilled into the shared instruction cache; each miss buffer 022 corresponds to one hardware thread;
  • tag comparison logic, which, when a hardware thread fetches an instruction, compares the tags in the private instruction cache corresponding to the hardware thread with the PA (Physical Address) translated by the TLB (Translation Lookaside Buffer);
  • the private instruction cache 021 is coupled to the tag comparison logic so that the hardware thread accesses the private instruction cache 021 while accessing the shared instruction cache 020.
  • TLB: Translation Lookaside Buffer, a page table buffer that stores a number of page table entries (virtual-to-physical address translations); the virtual address of a fetched instruction can be translated into a physical address through the TLB, so that the hardware thread accesses the private instruction cache while accessing the shared instruction cache.
  • PC: Program Counter.
  • GRF: General Register File; each logical processor (hardware thread) in a processor core corresponds to one GRF, and the number of GRFs equals the number of PCs.
  • ALU: Arithmetic Logic Unit.
  • CPU: Central Processing Unit.
  • D-Cache: data cache.
  • Bus: internal bus.
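  • The translate-then-compare path above can be sketched as a small function. The TLB is modelled here as a dict of virtual-page to physical-page mappings and the private Cache as a set of line tags; these structures, the names, and the 4 KB page / 64-byte line geometry are illustrative assumptions, not the patent's own definitions.

```python
def private_cache_lookup(vaddr, tlb, private_tags, page_bits=12, line_bits=6):
    """Translate a virtual fetch address through a toy TLB, then compare
    the resulting physical address against the tags held in the thread's
    private Cache. Returns (private_hit, paddr)."""
    vpn = vaddr >> page_bits
    offset = vaddr & ((1 << page_bits) - 1)
    if vpn not in tlb:
        return False, None                 # no translation: cannot compare tags
    paddr = (tlb[vpn] << page_bits) | offset
    line_tag = paddr >> line_bits          # drop byte-in-line bits (64 B lines)
    return line_tag in private_tags, paddr
```

For example, with a TLB entry mapping virtual page 0x1 to physical page 0x80, fetching virtual address 0x1040 produces physical address 0x80040 and hits only if tag 0x2001 is in the private Cache.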
  • the processor 01 is a multi-threaded processor;
  • the private instruction cache 021 has a fully associative structure;
  • the fully associative structure maps any instruction cache line in the private instruction cache to any instruction block in main memory;
  • the shared instruction cache 020, the private instruction cache 021, and the miss buffer 022 are static memory chips or dynamic memory chips.
  • Thread ID: hardware thread identifier, added to the I-Cache Data Array to indicate which hardware thread's cache miss request brought each Cache Line in.
  • When a hardware thread misses in the L1 I-Cache, that is, the instruction the hardware thread wants is not in the I-Cache, L1 sends a Cache Miss request to L1's next-level cache, the L2 Cache; if the L2 Cache hits, the cache line (Cache Line) containing the instruction is returned to the hardware thread.
  • When the hardware thread receives the returned Cache Line, the Cache Line is not filled into the L1 Cache directly; instead it is stored in the Miss Buffer corresponding to the hardware thread, and is filled into the L1 Cache only when the hardware thread next fetches.
  • When backfilling causes a replacement, the replaced Cache Line is not discarded directly; using the Thread ID of the hardware thread corresponding to the replaced Cache Line,
  • the replaced Cache Line is filled into the private instruction cache of that hardware thread.
  • The replacement may be caused by the absence of idle resources in the L1 Cache, and the Cache Line to replace may be chosen by the LRU (Least Recently Used) algorithm.
  • LRU: Least Recently Used.
  • The LRU algorithm replaces the longest-unused instruction line out of the cache when an instruction cache miss occurs; in other words, the cache preferentially retains the most frequently used instructions.
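  • The LRU rule above can be sketched with an ordered map: hits move a line to the most-recently-used end, and a miss on a full cache evicts the oldest entry. The function name and dict-based cache model are illustrative, not from the patent.

```python
from collections import OrderedDict

def lru_touch_or_insert(cache, capacity, line_addr):
    """Touch or insert a line under LRU. `cache` is an OrderedDict
    ordered oldest-first; returns the evicted line address, or None."""
    if line_addr in cache:
        cache.move_to_end(line_addr)           # hit: now most recently used
        return None
    victim = None
    if len(cache) >= capacity:
        victim, _ = cache.popitem(last=False)  # evict least recently used
    cache[line_addr] = True
    return victim
```

For example, filling a two-line cache with lines A and B, then touching A again, makes B the victim when a third line arrives.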
  • When a hardware thread fetches an instruction, it can access the I-Cache and the private Cache corresponding to the hardware thread simultaneously:
  • if the fetched instruction exists in the I-Cache, it is obtained from the I-Cache;
  • if the instruction is not in the I-Cache but exists in the private Cache corresponding to the hardware thread, it is obtained from that private Cache;
  • if the instruction exists in both, it is obtained from the I-Cache;
  • if the instruction exists in neither, the hardware thread sends a Cache Miss request to the next-level cache of the I-Cache to obtain the fetched instruction.
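  • The four cases above amount to a fixed lookup priority, sketched below with the caches modelled as plain dicts of address to instruction; the function and return labels are illustrative stand-ins.

```python
def fetch(addr, i_cache, private_cache):
    """Lookup priority: the shared I-Cache wins whenever it hits
    (including when both caches hold the line), the thread's private
    Cache covers I-Cache misses, and anything else becomes a Cache Miss
    request to the next-level cache."""
    if addr in i_cache:
        return "I-Cache", i_cache[addr]
    if addr in private_cache:
        return "private Cache", private_cache[addr]
    return "Cache Miss", None          # forward to next-level cache
```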
  • While the shared instruction cache is accessed, the private Cache corresponding to the hardware thread also participates in the tag comparison:
  • the tags read from the private Cache are compared with the PA (Physical Address) output by the TLB (Translation Lookaside Buffer) to generate a private Cache Miss signal and the private Cache data output;
  • when the private Cache Miss signal indicates a hit, the instruction is present and is output.
  • An embodiment of the present invention thus provides a processor including a program counter, a register file, an instruction prefetch component, an instruction decode component, an instruction issue component, an address generation unit, an arithmetic logic unit, a shared floating point unit, a data cache, and an internal bus, and further including a shared instruction cache, private instruction caches, miss buffers, and tag comparison logic.
  • A hardware thread identifier is added to the data storage array of the shared instruction cache to record which hardware thread's cache miss request retrieved each cache line; when a line is replaced out of the shared instruction cache, the replaced cache line is stored, according to its hardware thread identifier, in the private instruction cache of the corresponding hardware thread.
  • When the hardware thread receives the cache line returned by a cache miss request, it does not fill the cache line back into the shared instruction cache immediately; it holds the line in the miss buffer and backfills it into the shared instruction cache only when the hardware thread next fetches, which reduces the chance that a cache line about to be accessed is replaced out of the instruction cache. In addition, the added private instruction caches increase the cache capacity of each hardware thread and improve system performance.
  • A further embodiment of the present invention provides a method for managing an instruction cache, as shown in FIG. 2, including:
  • When a hardware thread of the processor fetches an instruction from the instruction cache, the processor simultaneously accesses the shared instruction cache in the instruction cache and the private instruction cache corresponding to the hardware thread.
  • The processor (Central Processing Unit) can be a multi-threaded processor.
  • A physical core can host multiple hardware threads, also called logical cores or logical processors, but a hardware thread is not a physical core; Windows treats each hardware thread as a schedulable logical processor, and each logical processor can run the code of a software thread.
  • The instruction cache can consist of the shared instruction cache (I-Cache) in the processor's L1 Cache together with the private instruction caches of the hardware threads.
  • The L1 Cache includes a data cache (D-Cache) and an instruction cache (I-Cache).
  • A fully associative private Cache can be provided for each hardware thread, i.e., each private Cache corresponds to one hardware thread.
  • The fully associative structure maps any instruction block in the private instruction cache to any instruction block in main memory.
  • Tag comparison logic can be added;
  • the private Cache of the hardware thread is connected to the tag comparison logic, so that when a hardware thread fetches an instruction, the I-Cache and the private Cache are accessed simultaneously.
  • The processor determines whether the instruction exists in the shared instruction cache and in the private instruction cache corresponding to the hardware thread, and then proceeds to step 103, 104, 105, or 106.
  • When the hardware thread accesses the I-Cache and its private Cache, it determines whether the fetched instruction exists in the I-Cache and in the private Cache corresponding to the hardware thread.
  • For example, 32 hardware threads share a 64 KB I-Cache, i.e., the shared instruction cache capacity is 64 KB.
  • Each hardware thread has a 32-way fully associative private Cache that can store 32 replaced Cache Lines of 64 bytes each, so each private Cache has a capacity of 2 KB.
  • While accessing the shared I-Cache, the hardware thread compares the 32 tags read from its private Cache with the PA (Physical Address) output by the TLB, generating a private Cache Miss signal and the private Cache data output; if one of the 32 tags matches the PA, the private Cache Miss signal indicates that the fetched instruction exists in the hardware thread's private Cache, and the private Cache data output is a valid instruction, as shown in FIG. 3.
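  • The capacity arithmetic behind the example above is straightforward; the "effective" total below (shared plus all private caches) is my framing of the sizes, not a figure stated in the patent.

```python
# 32 hardware threads share a 64 KB I-Cache; each thread also gets a
# 32-way fully associative private Cache of 32 replaced 64-byte lines.
LINE_BYTES = 64
PRIVATE_WAYS = 32
THREADS = 32
SHARED_KB = 64

private_kb_per_thread = PRIVATE_WAYS * LINE_BYTES // 1024   # 2 KB each
total_private_kb = THREADS * private_kb_per_thread          # 64 KB in total
effective_kb = SHARED_KB + total_private_kb                 # 128 KB reachable overall
```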
  • PA: Physical Address.
  • TLB: page table buffer, which stores page table entries (virtual-to-physical address translations); the virtual address of the fetched instruction can be translated into a physical address through the TLB and compared with the tags of the private instruction cache, so that the hardware thread accesses the private instruction cache while accessing the shared instruction cache.
  • If the fetched instruction exists in the I-Cache, the processor obtains the instruction from the shared instruction cache;
  • if the instruction exists in both the shared instruction cache and the private instruction cache, the processor likewise obtains it from the I-Cache;
  • if the I-Cache misses, that is, does not hold the fetched instruction, and the private Cache corresponding to the hardware thread does hold it, the processor obtains the instruction from the private instruction cache corresponding to the hardware thread.
  • In this way, by letting the private Cache corresponding to the hardware thread participate in the tag comparison, the Cache capacity available to each hardware thread is expanded and the hit rate of the hardware thread's instruction cache is increased.
  • If the instruction exists in neither cache, the processor, through the hardware thread, sends a cache miss request to the next-level cache of the shared instruction cache;
  • that is, the hardware thread issues a Cache Miss request to the next-level cache of the I-Cache.
  • For example, the hardware thread sends a Cache Miss request to the L2 Cache, the next level below the L1 Cache, to obtain the fetched instruction from the L2 Cache.
  • The processor obtains the instruction from the next-level cache through the hardware thread and stores the cache line containing the instruction in the miss buffer corresponding to the hardware thread; when the hardware thread next fetches an instruction, the cache line is backfilled into the shared instruction cache.
  • If the instruction exists in the L2 Cache, it is obtained from the L2 Cache, and the Cache Line containing it is not backfilled into the L1 Cache directly; instead, the Cache Line is saved in the Miss Buffer corresponding to the hardware thread and filled into the L1 Cache only when the hardware thread next fetches.
  • The Miss Buffer corresponds one-to-one with the hardware threads, i.e., each hardware thread has its own Miss Buffer.
  • Each hardware thread uses a Miss Buffer to hold the Cache Line returned by its Cache Miss request, because a replacement occurs when the Cache Line is backfilled into the L1 Cache,
  • and the replaced Cache Line might itself be a Cache Line about to be accessed.
  • The Miss Buffer therefore optimizes the backfill timing of the Cache Line and reduces the chance that a Cache Line about to be accessed is replaced.
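  • The Miss Buffer behaviour above — park the returned line, backfill only on the owner's next fetch — can be sketched as a tiny class. The class and method names are illustrative, not taken from the patent.

```python
class MissBuffer:
    """Per-thread Miss Buffer: a Cache Line returned by a Cache Miss
    request is parked here and moved into the shared I-Cache only when
    the owning thread fetches again, so a line about to be accessed is
    not replaced prematurely."""

    def __init__(self):
        self.pending = {}                  # line address -> instruction data

    def on_miss_return(self, line_addr, data):
        self.pending[line_addr] = data     # park; do not backfill yet

    def on_thread_fetch(self, i_cache):
        i_cache.update(self.pending)       # backfill on the next fetch
        self.pending.clear()
```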
  • If the instruction also misses in the next-level cache, the processor sends a miss request to main memory through the hardware thread, obtains the instruction from main memory, and stores the cache line containing the instruction in the miss buffer corresponding to the hardware thread; when the hardware thread next fetches, the cache line is backfilled into the shared instruction cache.
  • That is, the hardware thread issues a Cache Miss request to main memory; if the fetched instruction exists in main memory, it is obtained, and the Cache Line containing it is stored in the Miss Buffer corresponding to the hardware thread until the hardware thread next fetches, at which point the Cache Line is filled into the L1 Cache.
  • If an L3 Cache exists, the hardware thread first sends the Cache Miss request to the L3 Cache; if the fetched instruction is in the L3 Cache it is obtained there, and if not, a Cache Miss request is issued to main memory to obtain the fetched instruction.
  • The unit of exchange between the CPU and the Cache is a word.
  • When the CPU reads a word from main memory,
  • the memory address of the word is sent to the Cache and to main memory at the same time; the Cache control logic (for the L1 Cache, L2 Cache, or L3 Cache)
  • determines from the tag part of the address whether the word is present. If it hits, the CPU obtains the word; if not, the word is read from main memory and delivered to the CPU using a main-memory read cycle. Even though the CPU currently reads only one word,
  • the Cache controller also copies the complete Cache line containing that word from main memory into the Cache. This operation of transferring a whole line into the Cache is called Cache line filling.
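  • The tag test described above implies splitting an address into tag, set index, and byte offset. The sketch below assumes an illustrative geometry of 64-byte lines and 256 sets; the patent itself does not fix these parameters.

```python
def split_address(addr, line_bytes=64, num_sets=256):
    """Decompose a physical address into (tag, set index, byte offset).
    The controller compares the tag; on a miss it copies the whole line
    containing the word, not just the requested word."""
    offset_bits = line_bytes.bit_length() - 1   # 6 bits for 64-byte lines
    index_bits = num_sets.bit_length() - 1      # 8 bits for 256 sets
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```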
  • When the cache line is backfilled into the shared instruction cache and the shared instruction cache has no idle resources, the cache line replaces a first cache line in the shared instruction cache and is backfilled into the shared instruction cache; the hardware thread identifier of the first hardware thread that fetched the first cache line is obtained, and the first cache line is stored in the private instruction cache corresponding to the first hardware thread.
  • The first cache line is determined by an LRU (Least Recently Used) algorithm.
  • Thread ID: hardware thread identifier, added to the I-Cache Data Array to indicate which hardware thread's Cache Miss request brought each Cache Line back.
  • The replaced Cache Line is not discarded directly; according to its Thread ID, the replaced Cache Line is filled into the private Cache of the hardware thread identified by that Thread ID, because the replaced Cache Line may be accessed again soon, as shown in FIG. 4.
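  • The replacement path above can be sketched as follows: when the shared I-Cache is full, the victim line is evicted and its stored Thread ID routes it into the owning thread's private Cache instead of being discarded. The dict-based structures and the oldest-first victim choice (a stand-in for LRU) are illustrative assumptions.

```python
def backfill_with_replacement(i_cache, capacity, new_addr, new_data,
                              owner_tid, private_caches):
    """Backfill a line into the shared I-Cache; on eviction, route the
    victim into the private Cache of the thread named by the victim's
    Thread ID (stored alongside the data, as in the Data Array)."""
    if len(i_cache) >= capacity:
        victim_addr = next(iter(i_cache))           # oldest entry as LRU stand-in
        victim_data, victim_tid = i_cache.pop(victim_addr)
        private_caches[victim_tid][victim_addr] = victim_data
    i_cache[new_addr] = (new_data, owner_tid)       # line carries its Thread ID
```

For example, backfilling into a full one-line I-Cache whose resident line belongs to thread 3 moves that line into thread 3's private Cache.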
  • If the private instruction cache corresponding to the first hardware thread has no idle resources, the first cache line replaces a second cache line in that private instruction cache and is backfilled into the private instruction cache corresponding to the first hardware thread.
  • The second cache line is determined by the LRU algorithm.
  • The LRU algorithm replaces the longest-unused instruction line out of the cache when an instruction cache miss occurs; in other words, the cache preferentially retains the most frequently used instructions.
  • In this way, the instruction cache capacity allocated to each hardware thread is effectively expanded, the hit rate of each hardware thread's instruction cache is increased, and the traffic between the I-Cache and the next-level Cache is reduced.
  • The added Miss Buffer optimizes the backfill timing of Cache Lines and reduces the probability that a Cache Line about to be accessed is replaced; the added tag comparison logic lets the shared instruction cache and the private instruction cache be accessed simultaneously on an I-Cache access, increasing the instruction cache hit rate.
  • An embodiment of the present invention provides a method for managing an instruction cache.
  • When a hardware thread of a processor fetches an instruction from the instruction cache,
  • the shared instruction cache in the instruction cache and the private instruction cache corresponding to the hardware thread are accessed simultaneously, and it is determined whether the instruction exists in the shared instruction cache and in the private instruction cache corresponding to the hardware thread.
  • The hardware thread obtains the instruction from the shared instruction cache or from its private instruction cache according to the result; if the instruction exists in neither cache, the hardware thread
  • sends a cache miss request to the next-level cache of the shared instruction cache and stores the cache line containing the instruction in the miss buffer corresponding to the hardware thread.
  • When the hardware thread next fetches, the cache line is backfilled into the shared instruction cache; if the shared instruction cache has no free resources,
  • the cache line replaces a first cache line in the shared instruction cache and is backfilled into the shared instruction cache, and, according to the hardware thread identifier of the first hardware thread that fetched the first cache line,
  • the first cache line is stored in the private instruction cache corresponding to the first hardware thread. This expands the instruction cache capacity of each hardware thread, reduces the instruction cache miss rate, and improves system performance.
  • the disclosed processor and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function.
  • in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • each functional unit may be integrated into one processing unit, each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above units may be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • all or part of the steps of the foregoing method embodiments may be implemented by program instructions instructing relevant hardware.
  • the foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the foregoing method embodiments;
  • the foregoing storage medium includes:
  • any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

A management method for an instruction cache, and a processor, which relate to the field of computers and can expand the instruction cache capacity of hardware threads, reduce the miss rate of an instruction cache, and improve system performance. A hardware thread identifier in a shared instruction cache of a processor is used for identifying the hardware thread corresponding to a cache line in the shared instruction cache. A private instruction cache is used for storing instruction cache lines which are replaced out of the shared instruction cache. A miss buffer is also included. When acquiring an instruction from an instruction cache, a hardware thread of the processor simultaneously accesses the shared instruction cache and the private instruction cache corresponding to the hardware thread, determines whether the shared instruction cache and the private instruction cache corresponding to the hardware thread contain the instruction, and acquires the instruction from the shared instruction cache or the private instruction cache corresponding to the hardware thread according to the determination result. The management method is used for managing an instruction cache of a processor.

Description

A Management Method for an Instruction Cache, and a Processor. This application claims priority to Chinese Patent Application No. 201310269557.0, filed with the Chinese Patent Office on June 28, 2013 and entitled "A Management Method for an Instruction Cache, and a Processor", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of computers, and in particular, to a management method for an instruction cache and a processor.
Background
A CPU (Central Processing Unit) cache (Cache Memory) is temporary storage located between the CPU and main memory. Its capacity is much smaller than that of main memory, but it resolves the mismatch between the CPU's computation speed and the memory's read/write speed, thereby speeding up CPU reads.
In a multi-threaded processor, multiple hardware threads fetch instructions from the same I-Cache (instruction cache). When the instruction to be fetched is not in the I-Cache, a miss request is sent to the next-level Cache while the processor switches to another hardware thread to continue fetching from the I-Cache, which reduces pipeline stalls caused by I-Cache misses and improves pipeline efficiency. However, when the shared I-Cache resources allocated to each hardware thread are insufficient, the I-Cache miss rate increases and miss requests from the I-Cache to the next-level Cache become frequent. Moreover, when a fetched instruction is retrieved from the next-level Cache and backfilled while the number of threads grows, the Cache line containing the filled instruction may be placed into the I-Cache without being used immediately, whereas the replaced Cache line may well be used again.
In addition, when the Thread scheduling policy is adjusted according to Cache hit behavior, the scheduler tries, for a period of time, to preferentially schedule threads whose memory-access instructions have a high hit rate in the Cache, but the problem of each hardware thread's share of the I-Cache being insufficient is not alleviated.
Summary of the Invention
Embodiments of the present invention provide a management method for an instruction cache and a processor, which can expand the instruction cache capacity of hardware threads, reduce the instruction cache miss rate, and improve system performance.
To achieve the above objective, the embodiments of the present invention adopt the following technical solutions:
According to a first aspect, a processor is provided, including a program counter, a register file, an instruction prefetch unit, an instruction decode unit, an instruction issue unit, an address generation unit, an arithmetic logic unit, a shared floating-point unit, a data cache, and an internal bus, and further including:
a shared instruction cache, configured to store shared instructions of all hardware threads and including a tag storage array and a data storage array, where the tag storage array is configured to store tags, the data storage array includes stored instructions and hardware thread identifiers, and a hardware thread identifier is used to identify the hardware thread corresponding to a cache line in the shared instruction cache;
a private instruction cache, configured to store instruction cache lines replaced out of the shared instruction cache, where the private instruction caches are in one-to-one correspondence with the hardware threads; and
a miss buffer, configured to: when a fetched instruction is not present in the shared instruction cache, hold the cache line retrieved from the next-level cache of the shared instruction cache in the miss buffer of the hardware thread, and, when the hardware thread corresponding to the fetched instruction fetches, backfill the cache line in the miss buffer into the shared instruction cache, where the miss buffers are in one-to-one correspondence with the hardware threads.
With reference to the first aspect, in a first possible implementation of the first aspect, the processor further includes: tag comparison logic, configured to compare, when a hardware thread fetches, tags in the private instruction cache corresponding to the hardware thread with the physical address translated by a translation look-aside buffer, where the private instruction cache is connected to the tag comparison logic so that the hardware thread accesses the private instruction cache while accessing the shared instruction cache.
With reference to the first possible implementation of the first aspect, in a second possible implementation, the processor is a multi-threaded processor, and the private instruction cache has a fully associative structure, in which any instruction cache block in main memory can map to any instruction cache block in the private instruction cache.
With reference to the second possible implementation of the first aspect, in a third possible implementation, the shared instruction cache, the private instruction cache, and the miss buffer are static memory chips or dynamic memory chips.
According to a second aspect, a management method for an instruction cache is provided, including:
when a hardware thread of a processor fetches an instruction from an instruction cache, simultaneously accessing a shared instruction cache in the instruction cache and a private instruction cache corresponding to the hardware thread; and determining whether the instruction exists in the shared instruction cache and in the private instruction cache corresponding to the hardware thread, and obtaining the instruction from the shared instruction cache or the private instruction cache corresponding to the hardware thread according to the determination result.
With reference to the second aspect, in a first possible implementation of the second aspect, the shared instruction cache includes a tag storage array and a data storage array, where the tag storage array is configured to store tags, the data storage array includes stored instructions and hardware thread identifiers, and a hardware thread identifier is used to identify the hardware thread corresponding to a cache line in the shared instruction cache; the private instruction cache has a fully associative structure, in which any instruction cache block in main memory can map to any instruction cache block in the private instruction cache; and the private instruction caches are in one-to-one correspondence with the hardware threads.
With reference to the first possible implementation of the second aspect, in a second possible implementation, the determining whether the instruction exists in the shared instruction cache and in the private instruction cache corresponding to the hardware thread, and obtaining the instruction from the shared instruction cache or the private instruction cache corresponding to the hardware thread according to the determination result includes:
if the instruction exists in both the shared instruction cache and the private instruction cache corresponding to the hardware thread, obtaining the instruction from the shared instruction cache;
if the instruction exists in the shared instruction cache but not in the private instruction cache corresponding to the hardware thread, obtaining the instruction from the shared instruction cache; and
if the instruction exists in the private instruction cache corresponding to the hardware thread but not in the shared instruction cache, obtaining the instruction from the private instruction cache corresponding to the hardware thread.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the method further includes:
if the instruction exists in neither the shared instruction cache nor the private instruction cache, sending, by the hardware thread, a cache miss request to the next-level cache of the shared instruction cache;
if the instruction exists in the next-level cache, obtaining, by the hardware thread, the instruction from the next-level cache, storing the cache line containing the instruction in the miss buffer corresponding to the hardware thread, and backfilling the cache line into the shared instruction cache when the hardware thread fetches; and
if the instruction does not exist in the next-level cache, sending, by the hardware thread, the miss request to main memory, obtaining the instruction from main memory, storing the cache line containing the instruction in the miss buffer corresponding to the hardware thread, and backfilling the cache line into the shared instruction cache when the hardware thread fetches;
where the miss buffers are in one-to-one correspondence with the hardware threads.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation, when the cache line is backfilled into the shared instruction cache, if the shared instruction cache has no free resource, the cache line replaces a first cache line in the shared instruction cache and is backfilled into the shared instruction cache, and at the same time, according to the hardware thread identifier of the first hardware thread that fetched the first cache line, the first cache line is stored in the private instruction cache corresponding to the first hardware thread;
where the first cache line is determined by a least recently used (LRU) algorithm.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, when the replaced first cache line is stored in the private instruction cache corresponding to the first hardware thread, if the private instruction cache corresponding to the first hardware thread has no free resource, the first cache line replaces a second cache line in the private instruction cache corresponding to the first hardware thread and is backfilled into the private instruction cache corresponding to the first hardware thread;
where the second cache line is determined by the least recently used algorithm.
Embodiments of the present invention provide a management method for an instruction cache and a processor. The processor includes a program counter, a register file, an instruction prefetch unit, an instruction decode unit, an instruction issue unit, an address generation unit, an arithmetic logic unit, a shared floating-point unit, a data cache, and an internal bus, and further includes a shared instruction cache, private instruction caches, miss buffers, and tag comparison logic. The shared instruction cache is configured to store shared instructions of all hardware threads and includes a tag storage array and a data storage array; the data storage array includes stored instructions and hardware thread identifiers, where a hardware thread identifier identifies the hardware thread corresponding to a cache line in the shared instruction cache. A private instruction cache stores instruction cache lines replaced out of the shared instruction cache, with the private instruction caches in one-to-one correspondence with the hardware threads. The tag comparison logic compares, when a hardware thread fetches, tags in the private instruction cache corresponding to that hardware thread with the physical address translated by the translation look-aside buffer; the private instruction cache is connected to the tag comparison logic so that the hardware thread accesses the private instruction cache while accessing the shared instruction cache. When a hardware thread of the processor fetches an instruction from the instruction cache, it simultaneously accesses the shared instruction cache and the private instruction cache corresponding to the hardware thread, determines whether the instruction exists in either of them, and obtains the instruction from the shared instruction cache or the private instruction cache according to the determination result. This expands the instruction cache capacity of hardware threads, reduces the instruction cache miss rate, and improves system performance.
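The deferred-backfill behavior of the miss buffer summarized above can be illustrated with a minimal sketch. The sketch is not part of the original disclosure; the class and method names are chosen for illustration only.

```python
class MissBuffer:
    """Illustrative per-thread miss buffer: a cache line returned for a
    cache miss request is parked here and only backfilled into the
    shared instruction cache when the owning hardware thread is next
    scheduled to fetch."""

    def __init__(self):
        self.pending = []  # cache lines waiting to be backfilled

    def store(self, cache_line):
        # Called when the next-level cache returns a line for this thread.
        self.pending.append(cache_line)

    def drain_on_fetch(self, shared_cache):
        # Called when this hardware thread gets its fetch slot.
        shared_cache.extend(self.pending)
        self.pending.clear()
```

Deferring the backfill this way reduces the chance that a line another thread is about to access is displaced from the shared instruction cache before it is used.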
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings described below show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative effort.
FIG. 1 is a schematic structural diagram of a processor according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a management method for an instruction cache according to an embodiment of the present invention; FIG. 3 is a logical schematic diagram of simultaneously accessing a shared instruction cache and a private instruction cache according to an embodiment of the present invention;
FIG. 4 is a logical schematic diagram of retrieving a cache line in response to a cache miss request according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention. In modern multi-threaded processor design, as the number of hardware threads grows, the shared resources available to each hardware thread become insufficient; this is especially true for the L1 (Level 1) Cache, an important shared resource in the Cache hierarchy. When the instruction cache capacity of the L1 Cache allocated to each hardware thread is too small, misses occur in L1 and the L1 miss rate rises, so communication between the L1 Cache and the L2 Cache increases as instructions are fetched from the L2 Cache, or even from main memory, and processor power consumption grows.
An embodiment of the present invention provides a processor 01. As shown in FIG. 1, the processor includes a program counter 011, a register file 012, an instruction prefetch unit 013, an instruction decode unit 014, an instruction issue unit 015, an address generation unit 016, an arithmetic logic unit 017, a shared floating-point unit 018, a data cache 019, and an internal bus, and further includes:
a shared instruction cache (I-Cache) 020, private instruction caches 021, miss buffers (Miss Buffer) 022, and tag (Tag) comparison logic 023.
The shared instruction cache 020 is configured to store shared instructions of all hardware threads and includes a tag storage array (Tag Array) 0201 and a data storage array (Data Array) 0202. The tag storage array 0201 is configured to store tags; the data storage array 0202 includes stored instructions 02021 and hardware thread identifiers (Thread ID) 02022, where a hardware thread identifier 02022 is used to identify the hardware thread corresponding to a cache line in the shared instruction cache 020.
A private instruction cache 021 is configured to store instruction cache lines replaced out of the shared instruction cache 020; the private instruction caches 021 are in one-to-one correspondence with the hardware threads.
A miss buffer 022 is configured to: when a fetched instruction is not present in the shared instruction cache 020, hold the cache line retrieved from the next-level cache of the shared instruction cache 020 in the miss buffer of the hardware thread, and, when the hardware thread corresponding to the fetched instruction fetches, backfill the cache line in the miss buffer 022 into the shared instruction cache; the miss buffers 022 are in one-to-one correspondence with the hardware threads.
The tag comparison logic is configured to compare, when a hardware thread fetches, tags in the private instruction cache corresponding to that hardware thread with the PA (Physical Address) translated by the TLB (Translation Look-aside Buffer). The private instruction cache 021 is connected to the tag comparison logic so that the hardware thread accesses the private instruction cache 021 while accessing the shared instruction cache 020.
The TLB, also called the page table buffer, stores page table entries (a translation table from virtual addresses to physical addresses). The virtual address of a fetched instruction can be translated into a physical address through the TLB; the physical address is then compared with the tags in the private instruction cache, and if the physical address matches a tag in the private instruction cache, the hardware thread accesses the private instruction cache at the same time as it accesses the shared instruction cache.
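The lookup just described can be sketched as follows. This is a simplified software model, not the original hardware logic; the TLB is modeled as a plain virtual-to-physical address map.

```python
def private_cache_lookup(tlb, private_tags, virtual_addr):
    """Sketch of the tag comparison step: the TLB (here a
    virtual-to-physical address map) translates the fetch address,
    and the result is compared against the tags of the fully
    associative private instruction cache, in parallel with the
    shared-cache lookup."""
    physical_addr = tlb[virtual_addr]    # TLB translation
    hit = physical_addr in private_tags  # fully associative tag compare
    return hit, physical_addr
```

A match produces a private-cache hit signal and the instruction output; a mismatch means the private cache cannot serve this fetch.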
For example, there are 16 PCs (Program Counter), PC0-PC15; the number of logical processor cores (hardware threads) in a processor core is the same as the number of PCs.
GRF (General Register File): each logical processor core in a processor core corresponds to one GRF, and the number of GRFs equals the number of PCs.
The Fetch unit (instruction prefetch unit) is used to fetch instructions; the Decoder (instruction decode unit) decodes instructions; Issue is the instruction issue unit, used to issue instructions; the AGU (Address Generator Unit) is the module that performs all address calculations, generating the addresses used to control memory accesses. The ALU (Arithmetic Logic Unit) is an execution unit of the CPU (Central Processing Unit) and can be built from AND gates and OR gates. The Shared Floating-Point Unit is the circuit unit in the processor dedicated to floating-point arithmetic; the data cache (D-Cache) is used to store data; and the internal bus connects the components in the processor.
For example, the processor 01 is a multi-threaded processor, and the private instruction cache 021 has a fully associative structure, in which any instruction cache block in main memory can map to any instruction cache block in the private instruction cache.
For example, the shared instruction cache 020, the private instruction caches 021, and the miss buffers 022 are static memory chips or dynamic memory chips.
For example, a Thread ID (hardware thread identifier) can be added to the I-Cache Data Array (instruction cache data storage array); the Thread ID indicates which hardware thread issued the Cache Miss request that retrieved the Cache Line.
For example, when a hardware thread's access to the L1 I-Cache misses, that is, when the instruction the hardware thread wants is not in the I-Cache, L1 sends a Cache Miss request to L1's next-level cache, the L2 Cache. If the L2 Cache hits, that is, the instruction the hardware thread wants exists in the L2 Cache, the hardware thread backfills the cache line (Cache Line) containing the instruction from the L2 Cache into the L1 Cache. Alternatively, on receiving the returned Cache Line, the hardware thread does not fill it into the L1 Cache directly but keeps it in the Miss Buffer corresponding to that hardware thread until it is that hardware thread's turn to fetch, at which point the Cache Line is filled into the L1 Cache.
In this way, when backfilling the cache line containing the instruction from the L2 Cache into the L1 Cache causes a replacement, the replaced Cache Line is not discarded directly; instead, according to the Thread ID of the hardware thread corresponding to the replaced Cache Line, the replaced Cache Line is filled into the private instruction cache of that hardware thread.
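A minimal sketch of this backfill-with-replacement step follows. It is not part of the original disclosure; the data layout and the victim choice are simplified for illustration.

```python
def backfill_shared_cache(shared_cache, private_caches, new_line, capacity):
    """Backfill 'new_line' (a dict with 'tag' and 'thread_id') into the
    shared I-Cache. If the cache is full, the replaced line is not
    discarded: the Thread ID stored with it selects the private
    instruction cache that receives it."""
    if len(shared_cache) >= capacity:
        victim = shared_cache.pop(0)  # stand-in for the LRU victim choice
        # The Thread ID stored alongside the line routes the victim
        # to the private cache of the thread that originally fetched it.
        private_caches[victim["thread_id"]].append(victim)
    shared_cache.append(new_line)
```

Because the victim survives in its owner's private cache, a line that is evicted but soon re-used can still be served without going back to the next-level cache.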
For example, a replacement may occur because there is no free resource in the L1 Cache, and the Cache Line to be replaced can be selected by the LRU (Least Recently Used) algorithm.
In the LRU algorithm, once an instruction cache miss occurs, the line that has gone unused for the longest time is replaced out of the cache; in other words, the cache preferentially retains the most recently used instructions.
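The replacement policy can be sketched with Python's `OrderedDict`. This is an illustrative model of LRU behavior, not the hardware implementation.

```python
from collections import OrderedDict

class LRUCacheModel:
    """Illustrative LRU model: a hit marks the line most recently used;
    a miss with no free slot evicts the line unused for the longest
    time, so the most recently used instructions are retained."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # tag -> line data, oldest first

    def access(self, tag):
        """Return (hit, evicted_tag)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            return True, None
        evicted = None
        if len(self.lines) >= self.capacity:
            evicted, _ = self.lines.popitem(last=False)  # evict LRU line
        self.lines[tag] = "line:" + tag
        return False, evicted
```

For instance, with capacity 3, accessing A, B, C, then A again, then D evicts B: B is the line that has gone unused the longest.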
For example, when a hardware thread fetches, it can access the I-Cache and its private Cache simultaneously.
If the fetched instruction exists in the I-Cache but not in the hardware thread's private Cache, the instruction is obtained from the I-Cache;
if the fetched instruction does not exist in the I-Cache but exists in the hardware thread's private Cache, the instruction is obtained from that private Cache;
if the fetched instruction exists in both the I-Cache and the hardware thread's private Cache, the instruction is obtained from the I-Cache;
if the fetched instruction exists in neither the I-Cache nor the hardware thread's private Cache, the hardware thread sends a Cache Miss request to the next-level cache of the I-Cache to obtain the instruction.
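The four cases above reduce to a simple priority rule, which can be sketched as follows (an illustrative summary, not the original hardware logic):

```python
def select_instruction_source(shared_hit, private_hit):
    """Decision table for the simultaneous lookup: the shared I-Cache
    serves the instruction whenever it hits, the private Cache serves
    it only on a shared-cache miss, and a miss in both escalates to
    the next-level cache."""
    if shared_hit:
        return "shared I-Cache"   # shared hit, alone or together with private
    if private_hit:
        return "private Cache"    # only the private cache hits
    return "Cache Miss request"   # miss in both: go to the next-level cache
```

The shared I-Cache takes priority on a double hit, so the private cache only ever adds capacity; it never changes which copy a hitting thread reads.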
For example, according to the hardware thread scheduling policy, when the next cycle switches to another thread for instruction fetch, the private Cache corresponding to the new hardware thread is connected to the Tag comparison logic while the shared instruction cache is accessed. The Tags read out of that private Cache are compared with the PA (Physical Address) output by the TLB (Translation Look-aside Buffer), producing a private Cache Miss signal and a private Cache data output. When the fetched instruction is present in the private Cache corresponding to the new hardware thread, the private Cache Miss signal indicates a hit and the instruction is output.
Accordingly, an embodiment of the present invention provides a processor comprising a program counter, a register file, an instruction prefetch unit, an instruction decode unit, an instruction issue unit, an address generation unit, an arithmetic logic unit, a shared floating-point unit, a data cache and an internal bus, and further comprising a shared instruction cache, private instruction caches, miss buffers and Tag comparison logic. A hardware thread identifier is added to the data storage array of the shared instruction cache to record which hardware thread issued the cache miss request that retrieved a given cache line. When a replacement occurs in the shared instruction cache, the evicted cache line is stored, according to its hardware thread identifier, into the private instruction cache of the corresponding hardware thread. The miss buffer is used so that when a hardware thread receives a cache line returned for a cache miss request, the cache line is not backfilled into the shared instruction cache immediately; instead, it is held in the miss buffer until it is that hardware thread's turn to fetch, at which point the cache line is backfilled into the shared instruction cache. This reduces the probability that a cache line about to be accessed is evicted from the instruction cache. In addition, the added private instruction caches enlarge the cache capacity available to each hardware thread, improving system performance.
A further embodiment of the present invention provides a method for managing an instruction cache, as shown in Fig. 2, comprising:
101. When a hardware thread of the processor fetches an instruction from the instruction cache, the processor simultaneously accesses the shared instruction cache in the instruction cache and the private instruction cache corresponding to the hardware thread.
Illustratively, the processor (CPU, Central Processing Unit) may be a multi-threaded processor. One physical core may have multiple hardware threads, also called logical cores or logical processors, but a hardware thread does not represent a physical core; Windows, for example, treats each hardware thread as a schedulable logical processor, and each logical processor can run the code of a software thread. The instruction cache may comprise the shared instruction cache (I-Cache) in the processor's L1 Cache together with the private instruction caches of the hardware threads, where the L1 Cache comprises a data cache (D-Cache) and an instruction cache (I-Cache).
Specifically, a fully associative private Cache may be provided for each hardware thread, i.e., the private Caches correspond one-to-one with the hardware threads. In a fully associative structure, any instruction cache block in main memory can be mapped to any block in the private instruction cache.
In addition, Tag comparison logic may be added. When a hardware thread fetches an instruction, the private Cache of the hardware thread is actively connected to the Tag comparison logic, so that when a hardware thread fetches, it accesses the I-Cache and the private Cache corresponding to that hardware thread simultaneously.

102. The processor determines whether the instruction is present in the shared instruction cache and in the private instruction cache corresponding to the hardware thread, and then proceeds to step 103, 104, 105 or 106.
Illustratively, when the hardware thread accesses the I-Cache and its corresponding private Cache simultaneously, it determines at the same time whether the fetched instruction is present in the I-Cache and in that private Cache.
Suppose the multi-threaded processor has 32 hardware threads sharing one 64 KB I-Cache, i.e., the shared instruction cache capacity is 64 KB. Each hardware thread has a 32-way fully associative private Cache that can store 32 evicted Cache Lines; each Cache Line holds 64 bytes, so each private Cache has a capacity of 2 KB.
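The capacity figures in this example can be checked with a line of arithmetic — purely illustrative:

```python
ways = 32           # 32-way fully associative private cache (32 lines)
line_bytes = 64     # one Cache Line = 64 bytes
threads = 32        # hardware threads sharing one 64 KB I-Cache

private_cache_bytes = ways * line_bytes        # capacity of one private cache
total_private_bytes = private_cache_bytes * threads

print(private_cache_bytes)  # 2048 bytes = 2 KB per thread
print(total_private_bytes)  # 65536 bytes = 64 KB of added private capacity
```

The added private capacity across all 32 threads thus matches the size of the shared 64 KB I-Cache itself.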
When 32-way Tag comparison logic is added, a hardware thread, while accessing the shared I-Cache, compares the 32 Tags read from its private Cache with the PA (Physical Address) output by the TLB, producing a private Cache Miss signal and a private Cache data output. If one of the 32 Tags matches the PA, the private Cache Miss signal indicates that the fetched instruction is present in the hardware thread's private Cache, and the private Cache data is a valid instruction, as shown in Fig. 3.
The TLB, also called a page-table buffer, stores page-table entries (virtual-to-physical address translations). The virtual address of the fetched instruction can be translated into a physical address by the TLB and compared against the tags in the private instruction cache; if the physical address matches a tag in the private instruction cache, the instruction is present there. In this way, the hardware thread accesses the private instruction cache at the same time as it accesses the shared instruction cache.
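The translate-then-compare step can be sketched as follows. This is a hedged model under simplifying assumptions (a 4 KB page, full physical-address tags, and a TLB reduced to a dictionary); the function name and parameters are invented for the sketch.

```python
def probe_private_cache(virtual_address, tlb, tag_array, page_shift=12):
    """Translate a fetch address through a TLB model and compare the
    resulting physical address against every tag of a fully associative
    private cache, as the added Tag comparison logic does in parallel.

    tlb: dict mapping virtual page number -> physical page number
    tag_array: list of physical-address tags, one per way
    Returns (miss, way): miss is True when no tag matches."""
    vpn = virtual_address >> page_shift
    offset = virtual_address & ((1 << page_shift) - 1)
    physical_address = (tlb[vpn] << page_shift) | offset

    for way, tag in enumerate(tag_array):
        if tag == physical_address:    # one comparator per way, in parallel
            return (False, way)        # private Cache Miss deasserted: hit
    return (True, None)                # private cache miss
```

In hardware all way comparators evaluate in the same cycle; the Python loop stands in for that parallel comparison.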
103. If the instruction is present in both the shared instruction cache and the private instruction cache corresponding to the hardware thread, the processor obtains the instruction from the shared instruction cache.
Illustratively, when the I-Cache and the private Cache are accessed simultaneously and the fetched instruction is present in both, the instruction is obtained from the I-Cache.
104. If the instruction is present in the shared instruction cache but not in the private instruction cache corresponding to the hardware thread, the processor obtains the instruction from the shared instruction cache.
Illustratively, if the fetched instruction is present in the I-Cache but not in the hardware thread's private Cache, the instruction is obtained from the I-Cache.
105. If the instruction is present in the private instruction cache corresponding to the hardware thread but not in the shared instruction cache, the processor obtains the instruction from that private instruction cache.

Illustratively, if the I-Cache misses, i.e., the fetched instruction is not present there, but the instruction is present in the hardware thread's private Cache, the instruction is obtained from that private Cache. In this way, by actively selecting the hardware thread's private Cache to participate in the Tag comparison, the Cache capacity available to each hardware thread is enlarged and the hit rate of the hardware thread's instruction cache is increased.
106. If the instruction is present in neither the shared instruction cache nor the private instruction cache, the processor sends, through the hardware thread, a cache miss request to the next-level cache below the shared instruction cache.
Illustratively, if the fetched instruction is present in neither the I-Cache nor the hardware thread's private Cache, the hardware thread issues a Cache Miss to the next-level cache below the I-Cache.
For example, if neither the L1 Cache nor the hardware thread's private Cache contains the fetched instruction, the hardware thread issues a Cache Miss to the L2 Cache, the next-level cache below the L1 Cache, so as to obtain the fetched instruction from the L2 Cache.
107. If the instruction is present in the next-level cache, the processor obtains the instruction from the next-level cache through the hardware thread and stores the cache line containing the instruction in the miss buffer corresponding to the hardware thread; when the hardware thread fetches, the cache line is backfilled into the shared instruction cache.
Illustratively, when the fetched instruction is present in the L2 Cache, the instruction is obtained from the L2 Cache, and the Cache Line containing the instruction is not backfilled into the L1 Cache directly; instead, the Cache Line is held in the Miss Buffer corresponding to the hardware thread until it is that hardware thread's turn to fetch, at which point the Cache Line is filled into the L1 Cache.
The Miss Buffers correspond one-to-one with the hardware threads, i.e., each hardware thread has its own Miss Buffer and uses it to buffer the Cache Line returned for its Cache Miss request. This is because a replacement could occur if the Cache Line were backfilled into the L1 Cache immediately, and the evicted Cache Line might be one that is about to be accessed. The Miss Buffer thus optimizes the backfill timing of the Cache Line and reduces the probability that a Cache Line about to be accessed is evicted from the cache.
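The deferred-backfill behavior of a per-thread Miss Buffer can be sketched as below — a hedged model only, with the class and method names invented for this illustration:

```python
class MissBuffer:
    """Per-thread miss buffer: holds cache lines returned for this
    thread's Cache Miss requests and backfills them into the shared
    cache only when the thread is next scheduled to fetch."""

    def __init__(self):
        self.pending = []  # (address, line) pairs awaiting backfill

    def on_miss_return(self, address, line):
        # Do not backfill immediately: the replacement it would trigger
        # might evict a line another thread is about to use.
        self.pending.append((address, line))

    def on_thread_fetch(self, shared_cache):
        # Now it is this thread's turn: drain the buffer into the cache.
        for address, line in self.pending:
            shared_cache[address] = line
        self.pending.clear()
```

The key property is visible in the model: between `on_miss_return` and `on_thread_fetch`, the shared cache is untouched, so no eviction happens while other threads are fetching.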
108. If the instruction is not present in the next-level cache, the processor sends the miss request to main memory through the hardware thread, obtains the instruction from main memory, and stores the cache line containing the instruction in the miss buffer corresponding to the hardware thread; when the hardware thread fetches, the cache line is backfilled into the shared instruction cache.

Illustratively, if the fetched instruction is not present in the L2 Cache either, the hardware thread issues a Cache Miss request to main memory so as to obtain the instruction from main memory. If the instruction is present in main memory, it is obtained, and the Cache Line containing it is held in the Miss Buffer corresponding to the hardware thread until it is that hardware thread's turn to fetch, at which point the Cache Line is filled into the L1 Cache.
Alternatively, when the instruction is not present in the L2 Cache, the hardware thread may issue a Cache Miss request to the L3 Cache; if the instruction is present in the L3 Cache, it is obtained there, and if not, a Cache Miss request is issued to main memory to obtain the instruction.
The unit of exchange between the CPU and the Cache is the word. When the CPU reads a word from main memory, the memory address of the word is sent to the Cache and to main memory at the same time; the Cache control logic of the L1, L2 or L3 Cache determines, from the Tag portion of the address, whether the word is present. On a hit, the CPU obtains the word; on a miss, the word must be read out of main memory in a main-memory read cycle and delivered to the CPU. Even if the CPU currently reads only one word, the Cache controller copies the complete Cache line containing that word from main memory into the Cache; this operation of transferring a line of data into the Cache is called a Cache line fill.
In addition, when the cache line is backfilled into the shared instruction cache, if there are no free resources in the shared instruction cache, the cache line replaces a first cache line in the shared instruction cache and is backfilled into the shared instruction cache, and at the same time, according to the hardware thread identifier of the first hardware thread that fetched the first cache line, the first cache line is stored in the private instruction cache corresponding to the first hardware thread. The first cache line is determined by the LRU (Least Recently Used) algorithm.
Illustratively, a Thread ID (hardware thread identifier) may be added to the I-Cache Data Array (instruction cache data storage array) to indicate which hardware thread's Cache Miss request retrieved a given Cache Line. In this way, after a fully associative private Cache has been provided for each hardware thread, when a replacement occurs in the I-Cache, the evicted Cache Line is not discarded directly; instead, it can be filled, according to the Thread ID, into the private Cache of the hardware thread identified by that Thread ID, because the evicted Cache Line may well be accessed again soon, as shown in Fig. 4.
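The routing of an evicted line back to its owner can be sketched by storing the requester's Thread ID alongside each fill. A hedged illustration — the data structures and names are invented here, and FIFO order stands in for the LRU victim choice to keep the sketch short:

```python
def backfill_shared(shared_cache, private_caches, address, line, thread_id,
                    capacity):
    """Backfill one line into the shared cache; if a victim must be
    evicted, route it into the private cache of the hardware thread
    whose Cache Miss request originally fetched it (its Thread ID)."""
    if len(shared_cache) >= capacity:
        # Victim chosen by insertion order here, as a stand-in for LRU.
        victim_addr, (victim_line, victim_tid) = next(iter(shared_cache.items()))
        del shared_cache[victim_addr]
        # Not discarded: filled into the owning thread's private cache.
        private_caches[victim_tid][victim_addr] = victim_line
    # Each entry stores the Thread ID alongside the data, as in the
    # extended I-Cache Data Array.
    shared_cache[address] = (line, thread_id)
```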
When the evicted first cache line is stored into the private instruction cache corresponding to the first hardware thread, if there are no free resources in that private instruction cache, the first cache line replaces a second cache line in the private instruction cache corresponding to the first hardware thread and is backfilled into that private instruction cache. The second cache line is determined by the LRU algorithm.
As before, the LRU algorithm evicts the cache line that has gone unused for the longest time whenever an instruction cache miss occurs; in other words, the cache preferentially retains the instructions that have been used most recently.
In this way, adding the private Caches effectively enlarges the instruction Cache capacity available to each hardware thread, increases the hit rate of each hardware thread's instruction Cache, and reduces traffic between the I-Cache and the next-level Cache. At the same time, the added Miss Buffers optimize the backfill timing of Cache Lines and reduce the probability that a Cache Line about to be accessed is evicted, and the added Tag comparison logic allows the shared instruction cache and the private instruction cache to be accessed simultaneously when the I-Cache is accessed, increasing the hit rate of the instruction cache.
An embodiment of the present invention provides a method for managing an instruction cache. When a hardware thread of the processor fetches an instruction from the instruction cache, the shared instruction cache in the instruction cache and the private instruction cache corresponding to the hardware thread are accessed simultaneously; it is determined whether the instruction is present in the shared instruction cache and in the private instruction cache corresponding to the hardware thread, and the instruction is obtained from the shared instruction cache or the private instruction cache according to the determination result. If the instruction is present in neither the shared instruction cache nor the private instruction cache, a cache miss request is sent through the hardware thread to the next-level cache below the shared instruction cache, the cache line containing the instruction is stored in the miss buffer corresponding to the hardware thread, and when the hardware thread fetches, the cache line is backfilled into the shared instruction cache. When the cache line is backfilled into the shared instruction cache, if there are no free resources in the shared instruction cache, the cache line replaces a first cache line in the shared instruction cache and is backfilled into the shared instruction cache, while, according to the hardware thread identifier of the first hardware thread that fetched the first cache line, the first cache line is stored in the private instruction cache corresponding to the first hardware thread. This enlarges the instruction cache capacity available to each hardware thread, reduces the instruction cache miss rate, and improves system performance.
In the several embodiments provided in this application, it should be understood that the disclosed processor and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation — for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or units, and may be electrical, mechanical or in other forms.
In addition, in the devices and systems of the embodiments of the present invention, the functional units may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The above units may be implemented in the form of hardware, or in the form of hardware plus software functional units.
All or part of the steps of the foregoing method embodiments may be performed by hardware controlled by program instructions. The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the foregoing method embodiments. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The foregoing describes only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A processor, comprising a program counter, a register file, an instruction prefetch unit, an instruction decode unit, an instruction issue unit, an address generation unit, an arithmetic logic unit, a shared floating-point unit, a data cache and an internal bus, characterized by further comprising:
a shared instruction cache, configured to store shared instructions of all hardware threads and comprising a tag storage array and a data storage array, wherein the tag storage array is configured to store tags, the data storage array comprises stored instructions and hardware thread identifiers, and a hardware thread identifier is used to identify the hardware thread corresponding to a cache line in the shared instruction cache;
a private instruction cache, configured to store instruction cache lines evicted from the shared instruction cache, wherein the private instruction caches correspond one-to-one with the hardware threads; and
a miss buffer, configured to, when a fetched instruction is not present in the shared instruction cache, hold the cache line retrieved from the next-level cache below the shared instruction cache in the miss buffer of the hardware thread, and, when the hardware thread corresponding to the fetched instruction fetches, backfill the cache line in the miss buffer into the shared instruction cache, wherein the miss buffers correspond one-to-one with the hardware threads.
2. The processor according to claim 1, characterized by further comprising:
tag comparison logic, configured to, when the hardware thread fetches an instruction, compare the tags in the private instruction cache corresponding to the hardware thread with the physical address translated by a translation look-aside buffer, wherein the private instruction cache is connected to the tag comparison logic so that the hardware thread accesses the private instruction cache while accessing the shared instruction cache.
3. The processor according to claim 2, characterized in that the processor is a multi-threaded processor, the private instruction cache has a fully associative structure, and in the fully associative structure any instruction cache block in main memory can be mapped to any block in the private instruction cache.
4. The processor according to claim 3, characterized in that the shared instruction cache, the private instruction cache and the miss buffer are static memory chips or dynamic memory chips.
5. A method for managing an instruction cache, characterized by comprising:
when a hardware thread of a processor fetches an instruction from the instruction cache, simultaneously accessing a shared instruction cache in the instruction cache and a private instruction cache corresponding to the hardware thread; and
determining whether the instruction is present in the shared instruction cache and the private instruction cache corresponding to the hardware thread, and obtaining the instruction from the shared instruction cache or the private instruction cache corresponding to the hardware thread according to the determination result.
6. The method according to claim 5, characterized in that the shared instruction cache comprises a tag storage array and a data storage array, the tag storage array is configured to store tags, the data storage array comprises stored instructions and hardware thread identifiers, and a hardware thread identifier is used to identify the hardware thread corresponding to a cache line in the shared instruction cache; and
the private instruction cache has a fully associative structure, in which any instruction cache block in main memory can be mapped to any block in the private instruction cache, and the private instruction caches correspond one-to-one with the hardware threads.
7. The method according to claim 6, characterized in that determining whether the instruction is present in the shared instruction cache and the private instruction cache corresponding to the hardware thread, and obtaining the instruction from the shared instruction cache or the private instruction cache corresponding to the hardware thread according to the determination result, comprises:
if the instruction is present in both the shared instruction cache and the private instruction cache corresponding to the hardware thread, obtaining the instruction from the shared instruction cache;
if the instruction is present in the shared instruction cache and not present in the private instruction cache corresponding to the hardware thread, obtaining the instruction from the shared instruction cache; and
if the instruction is present in the private instruction cache corresponding to the hardware thread and not present in the shared instruction cache, obtaining the instruction from the private instruction cache corresponding to the hardware thread.
8. The method according to claim 7, characterized in that the method further comprises: if the instruction is present in neither the shared instruction cache nor the private instruction cache, sending a cache miss request through the hardware thread to the next-level cache below the shared instruction cache; if the instruction is present in the next-level cache, obtaining the instruction from the next-level cache through the hardware thread, storing the cache line containing the instruction in the miss buffer corresponding to the hardware thread, and, when the hardware thread fetches, backfilling the cache line into the shared instruction cache; and
if the instruction is not present in the next-level cache, sending the miss request through the hardware thread to main memory, obtaining the instruction from main memory, storing the cache line containing the instruction in the miss buffer corresponding to the hardware thread, and, when the hardware thread fetches, backfilling the cache line into the shared instruction cache;
wherein the miss buffers correspond one-to-one with the hardware threads.
9. The method according to claim 8, characterized in that, when the cache line is backfilled into the shared instruction cache, if there are no free resources in the shared instruction cache, the cache line replaces a first cache line in the shared instruction cache and is backfilled into the shared instruction cache, and, according to the hardware thread identifier of the first hardware thread that fetched the first cache line, the first cache line is stored in the private instruction cache corresponding to the first hardware thread;
其中, 所述第一緩存行是通过最近最少使用到算法确定的。 Wherein, the first cache line is determined through a least recently used algorithm.
10. The method according to claim 9, wherein, when the replaced first cache line is stored in the private instruction cache corresponding to the first hardware thread, if no free resource exists in the private instruction cache corresponding to the first hardware thread, replacing a second cache line in that private instruction cache with the first cache line, thereby backfilling the first cache line into the private instruction cache corresponding to the first hardware thread;
wherein the second cache line is determined by the least recently used algorithm.
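Claims 8–10 together describe an eviction chain: a line fetched on a miss is backfilled into the shared instruction cache; if the shared cache is full, its LRU victim is demoted into the private instruction cache of the hardware thread that originally fetched that victim; and a full private cache in turn evicts its own LRU line. The following is a minimal Python sketch of that chain, not the patented implementation: the names `LRUCache`, `backfill`, and `owner_of` are illustrative, and the per-thread miss cache of claim 8 is omitted for brevity.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache with least-recently-used eviction (claims 9-10)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # tag -> cache line contents, LRU-first order

    def lookup(self, tag):
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            return self.lines[tag]
        return None

    def insert(self, tag, line):
        """Insert a line; return the evicted (tag, line) pair, or None."""
        victim = None
        if tag not in self.lines and len(self.lines) >= self.capacity:
            victim = self.lines.popitem(last=False)  # LRU entry is evicted
        self.lines[tag] = line
        self.lines.move_to_end(tag)
        return victim

def backfill(shared, privates, owner_of, thread_id, tag, line):
    """Backfill a fetched line into the shared instruction cache (claim 8).

    If the shared cache has no free resource, its LRU victim (claim 9) is
    moved to the private cache of the hardware thread that fetched it,
    where a further LRU eviction may occur (claim 10).
    """
    owner_of[tag] = thread_id                 # remember the fetching hardware thread
    victim = shared.insert(tag, line)
    if victim is not None:
        v_tag, v_line = victim
        first_thread = owner_of[v_tag]        # hardware thread ID of the original fetcher
        privates[first_thread].insert(v_tag, v_line)  # may LRU-evict in turn
```

With a two-line shared cache and one-line private caches, backfilling a third line evicts the LRU shared line into the private cache of the thread that fetched it, so the line remains reachable for that thread.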
PCT/CN2014/080059 2013-06-28 2014-06-17 Management method for instruction cache, and processor WO2014206217A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310269557.0A CN104252425B (en) 2013-06-28 2013-06-28 Management method for instruction cache, and processor
CN201310269557.0 2013-06-28

Publications (1)

Publication Number Publication Date
WO2014206217A1 true WO2014206217A1 (en) 2014-12-31

Family

ID=52141028

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/080059 WO2014206217A1 (en) 2013-06-28 2014-06-17 Management method for instruction cache, and processor

Country Status (2)

Country Link
CN (1) CN104252425B (en)
WO (1) WO2014206217A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809078B (en) * 2015-04-14 2019-05-14 苏州中晟宏芯信息科技有限公司 Based on the shared cache hardware resource access method for exiting yielding mechanism
WO2017006235A1 (en) * 2015-07-09 2017-01-12 Centipede Semi Ltd. Processor with efficient memory access
CN106484310B (en) * 2015-08-31 2020-01-10 华为数字技术(成都)有限公司 Storage array operation method and device
CN109308190B (en) * 2018-07-09 2023-03-14 北京中科睿芯科技集团有限公司 Shared line buffer system based on 3D stack memory architecture and shared line buffer
US11099999B2 (en) * 2019-04-19 2021-08-24 Chengdu Haiguang Integrated Circuit Design Co., Ltd. Cache management method, cache controller, processor and storage medium
CN110990062B (en) * 2019-11-27 2023-03-28 上海高性能集成电路设计中心 Instruction prefetching filtering method
CN111078592A (en) * 2019-12-27 2020-04-28 无锡中感微电子股份有限公司 Multi-level instruction cache of low-power-consumption system on chip
WO2022150996A1 (en) * 2021-01-13 2022-07-21 王志平 Method for implementing processor cache structure
CN114116533B (en) * 2021-11-29 2023-03-10 海光信息技术股份有限公司 Method for storing data by using shared memory
CN115098169B (en) * 2022-06-24 2024-03-05 海光信息技术股份有限公司 Method and device for fetching instruction based on capacity sharing

Citations (4)

Publication number Priority date Publication date Assignee Title
US20020174285A1 (en) * 1998-12-03 2002-11-21 Marc Tremblay Shared instruction cache for multiple processors
CN101510191A (en) * 2009-03-26 2009-08-19 浙江大学 Multi-core system structure with buffer window and implementing method thereof
US20110320720A1 (en) * 2010-06-23 2011-12-29 International Business Machines Corporation Cache Line Replacement In A Symmetric Multiprocessing Computer
CN103020003A (en) * 2012-12-31 2013-04-03 哈尔滨工业大学 Multi-core program determinacy replay-facing memory competition recording device and control method thereof

Cited By (11)

Publication number Priority date Publication date Assignee Title
WO2018229701A1 (en) * 2017-06-16 2018-12-20 International Business Machines Corporation Translation support for a virtual cache
GB2577023A (en) * 2017-06-16 2020-03-11 Ibm Translation support for a virtual cache
US10606762B2 (en) 2017-06-16 2020-03-31 International Business Machines Corporation Sharing virtual and real translations in a virtual cache
US10698836B2 (en) 2017-06-16 2020-06-30 International Business Machines Corporation Translation support for a virtual cache
US10713168B2 (en) 2017-06-16 2020-07-14 International Business Machines Corporation Cache structure using a logical directory
GB2577023B (en) * 2017-06-16 2020-08-05 Ibm Translation support for a virtual cache
US10810134B2 (en) 2017-06-16 2020-10-20 International Business Machines Corporation Sharing virtual and real translations in a virtual cache
US10831664B2 (en) 2017-06-16 2020-11-10 International Business Machines Corporation Cache structure using a logical directory
US10831674B2 (en) 2017-06-16 2020-11-10 International Business Machines Corporation Translation support for a virtual cache
US11403222B2 (en) 2017-06-16 2022-08-02 International Business Machines Corporation Cache structure using a logical directory
US11775445B2 (en) 2017-06-16 2023-10-03 International Business Machines Corporation Translation support for a virtual cache

Also Published As

Publication number Publication date
CN104252425A (en) 2014-12-31
CN104252425B (en) 2017-07-28

Similar Documents

Publication Publication Date Title
WO2014206217A1 (en) Management method for instruction cache, and processor
EP3238074B1 (en) Cache accessed using virtual addresses
US7290116B1 (en) Level 2 cache index hashing to avoid hot spots
US11016763B2 (en) Implementing a micro-operation cache with compaction
US6427188B1 (en) Method and system for early tag accesses for lower-level caches in parallel with first-level cache
KR101456860B1 (en) Method and system to reduce the power consumption of a memory device
WO2014206218A1 (en) Method and processor for accessing data cache
US10884751B2 (en) Method and apparatus for virtualizing the micro-op cache
US9727482B2 (en) Address range priority mechanism
US8521944B2 (en) Performing memory accesses using memory context information
US20170185515A1 (en) Cpu remote snoop filtering mechanism for field programmable gate array
US8335908B2 (en) Data processing apparatus for storing address translations
US9547593B2 (en) Systems and methods for reconfiguring cache memory
JP2008502069A (en) Memory cache controller and method for performing coherency operations therefor
KR20150079408A (en) Processor for data forwarding, operation method thereof and system including the same
WO2016191016A1 (en) Managing sectored cache
US20040221117A1 (en) Logic and method for reading data from cache
WO2014105167A1 (en) Apparatus and method for page walk extension for enhanced security checks
JPWO2004031943A1 (en) Data processor
US8271732B2 (en) System and method to reduce power consumption by partially disabling cache memory
JP2023179708A (en) Prefetch kill and retrieval in instruction cache
US10013352B2 (en) Partner-aware virtual microsectoring for sectored cache architectures
US9037804B2 (en) Efficient support of sparse data structure access
US9639467B2 (en) Environment-aware cache flushing mechanism
US10942851B2 (en) System, apparatus and method for dynamic automatic sub-cacheline granularity memory access control

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14816879

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14816879

Country of ref document: EP

Kind code of ref document: A1