GB2489243A - Trace cache with pointers to common blocks and high resource thread filtering - Google Patents


Info

Publication number
GB2489243A
GB2489243A GB1104760.2A GB201104760A GB2489243A GB 2489243 A GB2489243 A GB 2489243A GB 201104760 A GB201104760 A GB 201104760A GB 2489243 A GB2489243 A GB 2489243A
Authority
GB
United Kingdom
Prior art keywords
cache
threads
instructions
commonly used
computer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1104760.2A
Other versions
GB201104760D0 (en)
GB2489243B (en)
Inventor
Azam Beg
Ajmal Beg
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
United Arab Emirates University
Original Assignee
United Arab Emirates University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by United Arab Emirates University filed Critical United Arab Emirates University
Priority to GB1104760.2A priority Critical patent/GB2489243B/en
Publication of GB201104760D0 publication Critical patent/GB201104760D0/en
Publication of GB2489243A publication Critical patent/GB2489243A/en
Application granted granted Critical
Publication of GB2489243B publication Critical patent/GB2489243B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893 Caches characterised by their organisation or structure
    • G06F12/0897 Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893 Caches characterised by their organisation or structure
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/3017 Runtime instruction translation, e.g. macros
    • G06F9/30178 Runtime instruction translation, e.g. macros of compressed or encrypted instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3814 Implementation provisions of instruction buffers, e.g. prefetch buffer; banks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Debugging And Monitoring (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A processor 100 has an instruction cache 101 and a trace cache 102. The trace cache includes pointers to commonly used blocks of instructions. The commonly used blocks may be identified using a signal 112. The processor uses a multiplexer 103, 104 to select instructions from the trace cache or the instruction cache to be executed in the execution engine 106. The trace cache may be divided into separate pointer and block sections. The processor may use a thread filter 107 to trace only threads specified by a thread selection signal 110. The signal may be sent by a user or by the operating system based on the resource usage of the threads. The trace cache may be disabled to save energy in response to an energy mode signal 111.

Description

ARCHITECTURE OF A PROCESSOR WITH LOW-ENERGY
INSTRUCTION CACHE
TECHNICAL FIELD
This invention relates to an architecture of a processor with low-energy instruction cache.
BACKGROUND
A processor executes instructions using one or more execution units. An execution unit in a modern processor receives instructions from an instruction cache or a unified instruction-data cache. The execution unit remains idle on a "cache miss", i.e. when the desired instructions are not available in the cache and have to be fetched from a higher level of memory such as RAM. The access time of RAM is longer than that of the cache.
Practical programs exhibit "locality of reference", which is the tendency of sets of instructions (also called blocks) related to a single thread to execute repeatedly.
One of the ways of increasing the chances of finding the instructions in a cache is to store the blocks of instructions in a special instruction cache such as Code Pattern Cache (CPC).
CPC generally exhibits shorter access time than the common instruction cache.
The miss rate of any cache (including CPC) can be reduced by increasing its size and/or by dedicating cache areas for different threads running on a processor.
However, increasing the size of a cache (including CPC) results in higher energy consumption and increased die area. Thus, there is need for a cache architecture that results in lower energy consumption and smaller die area while maintaining lower miss rates.
BRIEF SUMMARY OF THE INVENTION
A low energy processor, comprising: an execution unit to execute instructions; an instruction cache to store instructions; a cache storing commonly used instructions; a cache storing blocks of instructions from threads, with references to blocks in the cache storing commonly used instructions; and a multiplexer to select instructions either from the instruction cache or from the cache storing blocks of instructions belonging to threads.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, together with the description, serve to explain the principles of the invention.
FIG. 1 illustrates an exemplary processor with low energy CPC (LE-CPC) according to the present invention.
FIG. 2 illustrates an exemplary executable for selecting threads which are more likely to benefit from the CPC and sending the selected thread information to the processor according to the present invention.
FIG. 3 illustrates an exemplary thread filter in the processor, which selects traces based on the information received by the exemplary executable illustrated in FIG. 2 and is according to the present invention.
FIG. 4 illustrates an exemplary trace build engine used to build traces according to the present invention.
FIG. 5 illustrates an exemplary executable which provides the processor the most commonly used blocks' information according to the present invention.
FIG. 6 illustrates an exemplary storage module which stores the traces of instructions relevant to selected threads according to the present invention.
FIG. 7 illustrates an exemplary cache which stores commonly used sets of instructions according to the present invention.
FIG. 8 illustrates an exemplary cache which stores pointers for traces of threads according to the present invention.
FIG. 9 illustrates exemplary cache structures for storing instructions according to the present invention.
FIG. 10 illustrates exemplary lines in the cache structures for storing compressed instructions according to the present invention.
FIG. 11 illustrates exemplary lines in the cache structures for storing uncompressed instructions according to the present invention.
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 illustrates an exemplary processor 100 with LE-CPC according to the present invention.
The execution engine 106 executes instructions. The instructions to the execution engine 106 are provided by instruction cache 101 and a further cache, namely a LE-CPC storage module 102.
The instruction cache 101 receives instructions 113 from other memory area such as RAM or another level of cache.
The LE-CPC storage module 102 stores the traces of instructions from threads executed by the execution engine 106.
Two multiplexers 103, 104 select between the instruction cache 101 and the LE-CPC storage module 102 as the source of instructions for the execution engine 106.
A thread filter 107 selects traces from only those threads which are using the processor extensively. The thread filter 107 selects threads based on a thread selection signal 110 received from a utility running at the operating system level. Using a utility to select the threads which use the processor extensively eliminates the need to build a selection unit into the processor at the hardware level, thus further reducing the die area of the processor and decreasing the energy consumption of the CPC.
LE-CPC trace build engine 108 buffers the output of the thread filter 107 to LE-CPC storage module 102 to build a trace 109.
The LE-CPC storage module 102 stores the traces in a commonly used block cache 603 and basic block cache 601 (as described below with reference to Figure 6). The commonly used block cache 603 in the LE-CPC storage module 102 stores the commonly used sets of instructions in the form of basic blocks.
The approach of taking traces of only those threads which are using the processor extensively (i.e. commonly) helps reduce the size of the CPC, thus reducing the energy consumption and the die area of the CPC.
The commonly used block cache 603 in the LE-CPC storage module is filled with the commonly used block information received from a utility running at the OS level. As this utility is not implemented at the hardware level, it further reduces the die area, resulting in low energy consumption by the processor.
The blocks in a built trace which are the same as the blocks stored in the commonly used block cache 603 are replaced by pointers into the commonly used block cache 603.
That is, parts of the trace are replaced by a reference to blocks stored in the LE-CPC storage module 102 (e.g. in the commonly used block cache 603). This approach further reduces the size of the CPC, thus further resulting in reduced die area and low energy consumption.
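The pointer-replacement step described above can be sketched as follows. This is an illustrative software model only: the function name, the tuple representation of blocks, and the cache shape are assumptions, not the patent's hardware implementation.

```python
def build_trace_with_pointers(trace_blocks, common_block_cache):
    """Replace any block that already exists in the commonly used block
    cache with a pointer (its block ID), storing only unique blocks."""
    # Invert the cache: block contents -> block ID
    lookup = {block: block_id for block_id, block in common_block_cache.items()}
    stored = []
    for block in trace_blocks:
        if block in lookup:
            stored.append(("ptr", lookup[block]))    # pointer into the common cache
        else:
            stored.append(("block", block))          # block stored verbatim
    return stored

common = {7: ("load", "add", "store")}               # hypothetical common block
trace = [("load", "add", "store"), ("mul", "sub")]
packed = build_trace_with_pointers(trace, common)
# packed == [("ptr", 7), ("block", ("mul", "sub"))]
```

Only the second block occupies trace storage; the first collapses to a small pointer, which is the source of the die-area and energy reduction claimed above.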
The thread filter 107, LE-CPC trace build engine 108 and LE-CPC storage module 102 have energy modes which are controlled by energy mode signals 111. In an embodiment, the thread filter 107, the LE-CPC trace build engine 108 and the LE-CPC storage module 102 can be switched off during periods of low activity of the processor 100, thus further allowing low energy consumption.
FIG. 2 illustrates an exemplary executable called "high resource thread selection utility" 203 for selecting threads which are more likely to benefit from the CPC and for sending the selected threads' information to the processor 100, 201, according to the present invention.
The high resource thread selection utility 203 runs at the operating system level 202 and collects thread information 208, 210. Thread information 208 from the OS level may have different details and formats compared to thread information 210 from the processor 100, 201.
The high resource thread selection utility 203 consists of two routines, "automatic high resource thread detection routine" 204 and "manual high resource thread selection routine" 205.
The high resource thread selection utility 203 also provides a graphical user interface 207.
The automatic high resource thread detection routine 204 collects information about high resource consuming threads without the need for input from a human user through the graphical user interface 207.
The manual high resource thread selection routine 205 collects information about high resource consuming threads based on input from a human user through the graphical user interface 207.
The high resource thread selection utility 203 may store high resource thread related information in OS level files 206 for later reference to make decisions related to thread selection based on the historical data.
Thread selection information 209, 211 is sent to the processor 100, 201. The thread selection information at the operating system level 209 and at the processor level 211 may have different details and formats compared to one another.
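The selection step such a utility performs might be sketched as below. The per-thread CPU-usage statistics and the set-of-IDs signal format are assumptions for illustration; the patent does not specify how the OS exposes resource usage.

```python
def select_high_resource_threads(cpu_usage, n):
    """Return the IDs of the n threads with the highest CPU usage,
    i.e. the threads most likely to benefit from tracing."""
    ranked = sorted(cpu_usage, key=cpu_usage.get, reverse=True)
    return set(ranked[:n])

# Hypothetical per-thread CPU shares collected at the OS level.
usage = {101: 0.62, 102: 0.05, 103: 0.29, 104: 0.01}
selected = select_high_resource_threads(usage, 2)
# selected == {101, 103}
```

The resulting set would be encoded into the thread selection signal 209, 211; keeping this ranking logic in software is exactly what lets the hardware thread filter stay simple.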
FIG. 3 illustrates an exemplary thread filter 300 in the processor (equivalent to thread filter 107 in Figure 1), which selects traces based on the thread selection signal 211, 303 (equivalent to thread selection signal 110 in Figure 1) according to the present invention. The filter 301 in the thread filter 300 receives the traces 302 of instructions related to the threads executed by the execution engine 106. The filter 301 passes only the traces of instructions that are related to the n selected threads. The selection of the n threads is based on the thread selection signal 211, 303 from the operating system 202.
As the thread filter 300 receives this thread selection information 211, 303 from the high resource thread selection utility 203, the thread filter 300 does not need to contain complex circuitry implementing decision-making logic at the hardware level. This approach reduces the die area, resulting in low energy consumption. The thread filter 300 can be switched to a low energy mode based on the energy mode signal 111, 304 during periods of low activity, thus allowing a further reduction in energy consumption.
FIG. 4 illustrates an exemplary trace build engine 400 (equivalent to LE-CPC trace build engine 108 in Figure 1) used to build traces according to the present invention. The exemplary trace build engine 400 receives and buffers traces of threads from the thread filter 107, 300. The exemplary trace build engine 400 is a first-in first-out buffer which consists of multiple trace buffer areas, called LE-CPC trace buffer areas 403. Each LE-CPC trace buffer area 403 consists of six memory-area fields: thread ID 404, sequence ID 405, head address 406, tail address 407, branch status 408 and instruction area 409. In an embodiment, all six memory areas are of fixed length.
Thread ID 404 stores the thread ID the trace belongs to. Head address 406 is the address of the first instruction in a block belonging to thread ID 404. Tail address 407 is the address of the last instruction in a block belonging to thread ID 404. Branch status 408 is the branch status of a block belonging to a trace in thread ID 404. Instruction area 409 stores the block belonging to thread ID 404 that starts from the head address 406 and ends at tail address 407. Long traces which cannot be stored fully in one single instruction area 409 are divided and stored in multiple LE-CPC trace buffer areas 403.
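The buffer-area layout and the splitting of long traces can be sketched as follows. The field names follow Figure 4; the fixed area length of four instructions and the use of per-instruction addresses are arbitrary illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class TraceBufferArea:
    thread_id: int
    sequence_id: int       # orders the pieces of a long trace
    head_address: int      # address of the first instruction in the block
    tail_address: int      # address of the last instruction in the block
    branch_status: int
    instructions: tuple    # fixed-length instruction area

def fill_buffer_areas(thread_id, base_addr, instructions, area_len=4):
    """Split a trace that exceeds one instruction area across several
    sequence-numbered LE-CPC trace buffer areas (FIFO order)."""
    areas = []
    for seq, start in enumerate(range(0, len(instructions), area_len)):
        chunk = tuple(instructions[start:start + area_len])
        areas.append(TraceBufferArea(
            thread_id=thread_id,
            sequence_id=seq,
            head_address=base_addr + start,
            tail_address=base_addr + start + len(chunk) - 1,
            branch_status=0,
            instructions=chunk,
        ))
    return areas

areas = fill_buffer_areas(thread_id=5, base_addr=0x100, instructions=list("abcdef"))
# a six-instruction trace spills into two areas: sequence IDs 0 and 1
```

The sequence ID is what lets a consumer reassemble the original trace in order after this split.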
FIG. 5 illustrates an exemplary executable called "commonly used block detection utility" 503 which provides the processor 100, 501 with the commonly used blocks information 112 (see Figure 1) according to the present invention.
The commonly used block detection utility 503 runs on operating system level 502 and collects commonly used blocks information 504. The commonly used blocks information at the OS level 504 may have different details and formats compared to the commonly used blocks information 505 sent to the processor 100, 501.
The commonly used block detection utility 503 keeps track of the commonly used blocks. The commonly used block detection utility 503 also provides a graphical user interface 507 to allow users to select the most commonly used programs. The commonly used block detection utility 503 may store commonly used block information in OS level files 506 for later reference, to make decisions related to commonly used blocks based on the historical data.
Commonly used block information 504, 505, 112 is passed to the LE-CPC storage module 102.
The processor does not need to implement logic at the hardware level to detect commonly used blocks. This reduces the die area and helps reduce the energy consumption.
FIG. 6 illustrates an exemplary LE-CPC storage module 600 (equivalent to LE-CPC storage module 102 of Figure 1) which stores the traces of instructions relevant to selected threads according to the present invention. The exemplary LE-CPC storage module 600 receives a built trace from the LE-CPC trace build engine via signals 611 (equivalent to build trace signal 109 of Figure 1). A commonly used block cache filler 604 receives the commonly used block information 112, 613 and stores this information in the commonly used block cache 603.
A basic block filler 605 uses the information in the commonly used block cache 603 and stores traces in a block pointer cache 602 and basic block cache 601.
An instruction extractor 606 receives an address 614 and sets a LE-CPC hit flag 607 to indicate the availability of a trace containing the address 614 in the LE-CPC storage module 600. A block count 608 produced by the instruction extractor 606 indicates the number of instructions contained in the trace. A block of instructions 610 produced by the instruction extractor 606 contains the set of instructions which form the trace. A block counter 609 produced by the instruction extractor 606 tracks the current position within the block count 608. With each cycle, the block counter 609 increments until it reaches the block count 608.
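A minimal cycle-by-cycle model of the extractor's hit flag and counter behaviour may help; the storage lookup itself is abstracted into a plain dictionary here, so this is a behavioural sketch, not the hardware design.

```python
def extract(traces, address):
    """Yield (hit_flag, block_counter, instruction) once per cycle.
    The counter increments each cycle until it reaches the block count."""
    trace = traces.get(address)
    if trace is None:
        yield (False, 0, None)              # LE-CPC miss: hit flag stays low
        return
    block_count = len(trace)                # number of instructions in the trace
    for counter in range(1, block_count + 1):
        yield (True, counter, trace[counter - 1])

traces = {0x200: ["i0", "i1", "i2"]}
cycles = list(extract(traces, 0x200))
# three cycles, counters 1..3, hit flag set on each
```

On a miss the processor would fall back to the instruction cache 101 via the multiplexers of Figure 1.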
The LE-CPC storage module 600 can be switched to a low energy mode based on the energy mode signal 111, 612 during periods of low activity, thus allowing a further reduction in energy consumption.
FIG. 7 illustrates an exemplary commonly used block cache 700 (equivalent to the commonly used block cache 603 of Figure 6) which stores commonly used sets of instructions according to the present invention. The commonly used block cache 700 consists of multiple fixed length memory areas 701, each storing a commonly used block; this fixed length memory area is called the "commonly used block area" 701 here. The commonly used block area 701 consists of three fixed length memory areas: block ID 702, SQ No. 703 and commonly used block 704.
The block ID 702 uniquely identifies the commonly used block. The commonly used block itself is stored in the commonly used block field 704.
The commonly used block cache may be updated periodically by the commonly used block detection utility 503.
The commonly used block detection utility 503 may get the information about the current thread selection signal 209 from the high resource thread selection utility 203 and fill the commonly used block cache 700 with the commonly used blocks that are relevant to the traces related to currently selected threads.
SQ No 703 allows storing commonly used code that spans over multiple commonly used block areas 701.
FIG. 8 illustrates an exemplary block pointer cache 800 (equivalent to block pointer cache 602 of Figure 6) which stores pointers to the blocks of the traces of threads, the blocks themselves being stored in the commonly used block cache 603 and the basic block cache 601, according to the present invention.
Each row consists of multiple fields: a thread ID 801 contains the unique identifier of a thread; a trace valid bit 803 indicates the validity of the trace; a trace LRU 804 indicates the least recently used status of the trace; a branch status 805 indicates the branch status.
Block fields 806, 807, 808, 809 hold the pointers to the blocks in the commonly used block cache 603 and the basic block cache 601. SQ No. 802 is the sequence number, used when a trace is too long to be accommodated in one row. Each block field has four sub-fields 810, 811, 812, 813. A first field 810 indicates the type of cache (the commonly used block cache 603 or the basic block cache 601) which holds the block. A head address 811 holds the head address of the block. A tail address 812 holds the tail address of the block. A way ID 813 indicates the way number for the block in the trace.
FIG. 9 illustrates the basic block cache 900 (equivalent to basic block cache 601 of Figure 6). An opcode detector and encoder (ODE) 901 distinguishes between two types of incoming instructions (opcodes) 906: frequently occurring opcodes and non-frequent opcodes.
The frequently occurring opcodes are compressed before storage, while the other opcodes are not. The four most frequently occurring opcodes are used for compression, following the example of the MIPS architecture, in which just four opcodes make up more than 50% of the executed instructions in the CPU2006 benchmarks. The compressible opcodes are encoded by the ODE 901 and routed 908 to a BBC compressed data array (BBC-CDA) 902, and the uncompressed codes 910 are dispatched to a BBC uncompressed data array (BBC-UDA) 903. The ODE 901 encoding converts 6-bit opcodes into 2-bit codes and sends them to the 28xn-bit-wide BBC-CDA 902. Storage and retrieval of 2-bit opcodes uses less energy than that of normal 6-bit opcodes. The remaining instructions, with their 6-bit opcodes, are stored in the regular 32xn-bit-wide BBC-UDA 903.
ODE 901 asserts a write enable signal 907 if the compressed opcodes are being sent to BBC-CDA 902, or asserts the other write enable signal 909 if the uncompressed opcodes 910 are being directed to BBC-UDA 903.
Upon a hit to the BBC-CDA 902, the instructions 911 are routed first to an opcode decoder 904 and then 912 to a merging buffer 905. A hit to BBC-UDA 903 does not require opcode decoding, and the instructions 913 are transferred directly to the merging buffer 905.
A multiplexer MUX 914 selects which instructions 912, 913 to send to the merging buffer 905.
The merging buffer 905 outputs instructions 915 to the execution engine (via the instruction extractor 606 of Figure 6).
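The compression path of Figure 9 can be sketched as follows. Only the 6-bit-to-2-bit mapping scheme follows the text; the four concrete opcode values are illustrative MIPS-style placeholders, and the routing labels "cda"/"uda" are made-up names, not specified by the patent.

```python
# Placeholder values resembling MIPS R-type, lw, sw, beq opcodes.
FREQUENT_OPCODES = [0b000000, 0b100011, 0b101011, 0b000100]
ENCODE = {op: code for code, op in enumerate(FREQUENT_OPCODES)}  # 6-bit -> 2-bit
DECODE = {code: op for op, code in ENCODE.items()}               # 2-bit -> 6-bit

def route_instruction(instr):
    """Mimic the opcode detector/encoder: compress the opcode of a frequent
    instruction (destined for BBC-CDA), pass others through (BBC-UDA)."""
    opcode = (instr >> 26) & 0x3F          # top 6 bits of a 32-bit instruction
    rest = instr & 0x03FF_FFFF             # remaining 26 bits
    if opcode in ENCODE:
        return ("cda", (ENCODE[opcode] << 26) | rest)  # 28-bit compressed form
    return ("uda", instr)                               # stored uncompressed

def decode_cda(entry):
    """Opcode decoder on a BBC-CDA hit: restore the original 6-bit opcode."""
    code = (entry >> 26) & 0x3
    return (DECODE[code] << 26) | (entry & 0x03FF_FFFF)

lw = (0b100011 << 26) | 0x1234             # a frequent-opcode instruction
kind, packed = route_instruction(lw)
# kind == "cda"; decode_cda(packed) == lw (lossless round trip)
```

The round trip is lossless because the 2-bit code only ever stands in for one of the four known 6-bit opcodes; everything else takes the uncompressed path.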
FIG. 10 illustrates the composition of lines 1000 in BBC-CDA 902. Each line of BBC-CDA 902 is made up of 28xn bits.
FIG. 11 illustrates the composition of lines 1100 in BBC-UDA 903. Each line of BBC-UDA 903 is made up of 32xn bits.
It is to be understood that while the detailed description describes the present invention, the foregoing description is for illustrative purposes and does not limit the scope of the present invention, which is defined by the appended claims. Other embodiments, arrangements, usages and equivalents will be evident to those skilled in the art and are within the scope of the present invention as defined by the appended claims.

Claims (12)

  1. A processor, comprising: an execution unit (106) to execute instructions; an instruction cache (101) storing instructions; a further cache (102, 600) storing traces of instructions from threads, where parts of the trace are replaced by references to blocks stored in the further cache; and a multiplexer (103, 104) to select instructions for the execution unit (106) either from the further cache storing traces of instructions from threads or from the instruction cache.
  2. A processor according to claim 1, characterized by further comprising a thread filter (107, 300) for selecting the threads of which a trace is to be stored in the further cache.
  3. A processor according to claim 2, characterized in that the thread filter (107, 300) is adapted to select the threads based on a signal (110, 303) from an operating system (202) of the processor.
  4. A processor according to any one of claims 1 to 3, characterized in that the further cache (102, 600) has a block memory part (603) allocated for storing commonly used instructions.
  5. A processor according to claim 4, wherein the commonly used instructions are filled in the block memory part (603) based on information from an operating system (202) of the processor.
  6. A processor according to any one of claims 1 to 5, characterized in that the further cache (102, 600) has a pointer memory part (602) allocated for pointers to blocks in the block memory part, thereby enabling parts of the trace to be replaced by references to blocks.
  7. A program, which when executed on a computer, causes the computer to carry out the following: a step of automatic high resource thread detection in which information about high resource consuming threads is collected and/or a step of accepting user input as to high resource threads; generating a thread selection signal based on the step of automatic high resource thread detection and/or the step of accepting user input; and selecting threads for storing in a further cache separate from an instruction cache based on the thread selection signal.
  8. The program of claim 7, which further causes the computer to carry out: storage of traces of instructions from selected threads, in which parts of the traces are replaced by reference to blocks stored in the further cache.
  9. The program of claim 7 or 8, which further causes the computer to carry out: storing blocks representing parts of traces of selected threads in a block memory part (603) of the further cache.
  10. The program of claim 8 or 9, which further causes the computer to carry out: storing pointers to blocks stored in the further cache in a pointer memory part of the further cache.
  11. The program of any of claims 7 to 10, wherein the step of accepting user input is carried out through a graphical user interface.
  12. A program, which when executed on a computer, causes the computer to carry out the following: a step of accepting user input to determine commonly used blocks and/or a step of code simulation to determine commonly used blocks; passing commonly used block information to a further cache different to an instruction cache; and storing block information passed to the further cache in a block memory part of the further cache.

AMENDMENTS TO CLAIMS HAVE BEEN FILED AS FOLLOWS

CLAIMS

  1. A program, which when executed on a computer, causes the computer to carry out the following: a step of automatic high resource thread detection in which information about high resource consuming threads is collected and/or a step of accepting user input as to high resource threads; generating a thread selection signal based on the step of automatic high resource thread detection and/or the step of accepting user input; and selecting threads for storing in a further cache separate from an instruction cache based on the thread selection signal.
  2. The program of claim 1, which further causes the computer to carry out: storage of traces of instructions from selected threads, in which parts of the traces are replaced by reference to blocks stored in the further cache.
  3. The program of claim 1 or 2, which further causes the computer to carry out: storing blocks representing parts of traces of selected threads in a block memory part (603) of the further cache.
  4. The program of claim 2 or 3, which further causes the computer to carry out: storing pointers to blocks stored in the further cache in a pointer memory part of the further cache.
  5. The program of any of claims 1 to 4, wherein the step of accepting user input is carried out through a graphical user interface.
GB1104760.2A 2011-03-21 2011-03-21 Trace cache with pointers to common blocks and high resource thread filtering Expired - Fee Related GB2489243B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1104760.2A GB2489243B (en) 2011-03-21 2011-03-21 Trace cache with pointers to common blocks and high resource thread filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1104760.2A GB2489243B (en) 2011-03-21 2011-03-21 Trace cache with pointers to common blocks and high resource thread filtering

Publications (3)

Publication Number Publication Date
GB201104760D0 GB201104760D0 (en) 2011-05-04
GB2489243A true GB2489243A (en) 2012-09-26
GB2489243B GB2489243B (en) 2013-02-27

Family

ID=44012923

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1104760.2A Expired - Fee Related GB2489243B (en) 2011-03-21 2011-03-21 Trace cache with pointers to common blocks and high resource thread filtering

Country Status (1)

Country Link
GB (1) GB2489243B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11237974B2 (en) * 2019-08-27 2022-02-01 Arm Limited Operation cache compression

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Black, B. et al., "The block-based trace cache" *
Cheol et al., "First-level instruction cache design for reducing dynamic energy consumption", in Hamalainen et al. (eds.), Proceedings of the 5th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation, Springer-Verlag, 18-20 July 2005, pp. 103-111 *
Sahuquillo et al., "Exploring the performance of split data cache schemes on superscalar processors and symmetric multiprocessors", Journal of Systems Architecture, vol. 51, no. 8 (Netherlands), August 2005, pp. 451-469 *

Similar Documents

Publication Publication Date Title
US8316214B2 (en) Data access tracing with compressed address output
US9798645B2 (en) Embedding stall and event trace profiling data in the timing stream
US9164727B2 (en) FPGA-based high-speed low-latency floating point accumulator and implementation method therefor
KR20210003267A (en) Multithreaded, thread status monitoring of systems with self-scheduling processor
KR101531078B1 (en) Data processing system and data processing method
EP3791266A1 (en) Thread commencement and completion using work descriptor packets in a system having a self-scheduling processor and a hybrid threading fabric
EP3791265A1 (en) Thread commencement using a work descriptor packet in a self-scheduling processor
WO2019217331A1 (en) Thread creation on local or remote compute elements by a multi-threaded, self-scheduling processor
WO2019217326A1 (en) Thread priority management in a multi-threaded, self-scheduling processor
WO2019217329A1 (en) Memory request size management in a multi-threaded, self scheduling processor
WO2019217304A1 (en) Adjustment of load access size by a multi-threaded, self-scheduling processor to manage network congestion
EP3791278A1 (en) Event messaging in a system having a self-scheduling processor and a hybrid threading fabric
EP3791273A1 (en) Multi-threaded, self-scheduling processor
US7840758B2 (en) Variable store gather window
US8285973B2 (en) Thread completion rate controlled scheduling
US20140201456A1 (en) Control Of Processor Cache Memory Occupancy
CN106575220B (en) Multiple clustered VLIW processing cores
WO2019217303A1 (en) Non-cached loads and stores in a system having a multi-threaded, self-scheduling processor
CN107870780B (en) Data processing apparatus and method
CN105512051A (en) Self-learning type intelligent solid-state hard disk cache management method and device
Zhong et al. LIRS2: an improved LIRS replacement algorithm
CN110147254A (en) A kind of data buffer storage processing method, device, equipment and readable storage medium storing program for executing
GB2489243A (en) Trace cache with pointers to common blocks and high resource thread filtering
US8589738B2 (en) Program trace message generation for page crossing events for debug
CN105378652A (en) Method and apparatus for allocating thread shared resource

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20190321