WO1997036234A1 - Cache multi-block touch mechanism for object oriented computer system - Google Patents
- Publication number
- WO1997036234A1 (PCT/US1996/019469)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cache
- block
- instruction
- touch instruction
- memory
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0862—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
- G06F8/4441—Reducing the execution time required by the program code
- G06F8/4442—Reducing the number of cache misses; Data prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30047—Prefetch instructions; cache control instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
- G06F9/383—Operand prefetching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/60—Details of cache memory
- G06F2212/6028—Prefetching based on hints or prefetch instructions
Definitions
- This invention generally relates to computer systems.
- More particularly, this invention relates to mechanisms for prefetching information into a cache memory within a computer system.
- The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era.
- Two of the more basic components of the EDVAC system which are still found in today's systems are the computer system's memory and processor.
- The computer system's memory is where the computer system's programs and data are stored.
- The computer system's processor is the entity responsible for executing the programs stored in that memory.
- Object Oriented Programming (OOP)
- Cache memory is special because a processor can retrieve information from cache memory much faster than it can from standard memory (called main memory).
- Cache memory is significantly more expensive than main memory. Consequently, computer system designers balance the need for speed against the cost of cache memory by keeping the size of cache memory relatively small when compared to that of main memory.
- A cache touch instruction serves as a signal to the memory controller to prefetch information from main memory to cache memory.
- The compiler inserts one or more cache touch instructions into the instruction stream to prefetch the needed information into the cache. In most cases, more than one cache touch instruction is needed to prefetch extended blocks of data into the cache before a system with cache capabilities sees a benefit.
- The hardware has the capability of prefetching each touched block of data or instructions into the cache memory in parallel with the execution of the instructions following the cache touch instruction. While the use of OOP and cache memory (with the associated use of cache touch instructions) have each improved overall computer system technology, their combined use in a single system has yet to be fully optimized. This lack of optimization exists because object oriented programs do not fully benefit from typical cache touch instructions. Standard cache touch instructions are designed to prefetch only small portions of data at a time, which means that large portions of objects are not brought into cache memory even though those portions may well be needed by instructions that are about to execute on the computer system's processor. More advanced cache instructions have been designed to prefetch larger amounts of data.
- It is an advantage of this invention to provide a cache multi-block touch mechanism for object oriented computer systems that is capable of successfully and optimally operating with any type or size of cache memory. It is another advantage of this invention to provide a cache multi-block touch mechanism for object oriented computer systems that successfully prefetches multiple cache lines without the compiler having to issue multiple touch instructions, regardless of the line size of the cache memory. It is a further advantage of the present invention to provide a cache multi-block touch mechanism that allows for both data cache and instruction cache multi-block touch instructions.
- Disclosed is an object-oriented computer apparatus for generating a first instruction stream executable on a processing unit from a second instruction stream.
- The computer apparatus comprises a multi-block cache touch instruction generator for generating and inserting a multi-block cache touch instruction into the first instruction stream in at least one location where prefetching multiple blocks of object data and code into the cache memory is advantageous.
- The execution of the multi-block cache touch instruction by the processing unit causes a prefetch of at least one of a plurality of blocks of data and code from a main memory into a plurality of cache lines of a cache memory.
- These multi-block touch instructions indicate the beginning address of the desired code or data in main memory and the size of the block to be prefetched.
- The memory controller will examine the amount of code/data requested and determine how many lines must be brought into the cache to satisfy the request. Thus, multiple cache lines may be prefetched with only one touch instruction.
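As a minimal sketch (illustrative C, not the patent's circuitry), the line count the memory controller must determine follows directly from the starting address, the request size, and the cache's own line size:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch: how many cache lines does a multi-block touch
 * request span? The line size belongs to the cache, not the instruction,
 * so the same request may map to a different number of lines on
 * different platforms. */
static size_t lines_needed(uintptr_t start, size_t bytes, size_t line_size)
{
    if (bytes == 0)
        return 0;
    uintptr_t first = start / line_size;               /* line of the first byte */
    uintptr_t last  = (start + bytes - 1) / line_size; /* line of the last byte  */
    return (size_t)(last - first + 1);
}
```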
- FIG. 1 is a block diagram of a computer system according to an embodiment of the present invention
- FIG. 2 is a block diagram of a typical region of the main memory of FIG. 1;
- FIG. 3 is a block diagram of a cache that has a line size of 8 bytes;
- FIG. 4 shows an example of prior art touch instructions that are configured and used for the 8-byte-length cache line of FIG. 3
- FIG. 5 is a block diagram of a cache that has a line size of 4 bytes
- FIG. 6 shows an example of prior art touch instructions that are configured for an 8-byte-length cache line but are used in the 4-byte-length cache line of FIG. 5;
- FIG. 7 is a block diagram of a cache that has a line size of 16 bytes;
- FIG. 8 shows an example of prior art touch instructions that are configured for an 8-byte-length cache line but are used in the 16-byte-length cache line of FIG. 7;
- FIG. 9 is a block diagram of the multi-block cache touch instruction used with the multi-block cache touch mechanism of the present invention, according to the preferred embodiment;
- FIG. 10 is a block diagram of a second cache that has a line size of 8 bytes
- FIG. 11 shows the results of using the present invention with the cache of FIG. 10;
- FIG. 12 is a block diagram of a second cache that has a line size of 4 bytes;
- FIG. 13 shows the results of using the present invention with the cache of FIG. 12;
- FIG. 14 is a block diagram of a second cache that has a line size of 16 bytes.
- FIG. 15 shows the results of using the present invention with the cache of FIG. 14.
Description of the Preferred Embodiments
- Objects can be thought of as autonomous agents that work together to perform the tasks required by a computer system.
- A single object represents an individual operation or a group of operations that are performed by a computer system upon information controlled by the object.
- The operations of objects are called “methods” and the information controlled by objects is called “object data” or just “data.”
- Objects are created (i.e., "instantiated") as instances of something called a "class.” Classes define the data that will be controlled by their instances and the methods that will provide access to that data.
- Computer programs are constructed using one or more programming languages. Like words written in English, a programming language is used to write a series of statements that have particular meaning to the drafter (i.e., the programmer). The programmer first drafts a computer program in human readable form (called source code) prescribed by the programming language, resulting in a source code instruction stream.
- Within this specification, the term compiler refers to any mechanism that transforms one representation of a computer program into another representation of that program.
- Object code, within this specification, is a stream of binary instructions (i.e., ones and zeros) that are meaningful to the computer.
- Compilers generally translate each source code statement in the source code instruction stream into one or more intermediate language instructions, which are then converted into corresponding object code instructions.
- Special compilers, called optimizing compilers, typically operate on the intermediate language instruction stream to make it perform better (e.g., by eliminating unneeded instructions).
- Some optimizing compilers are wholly separate while others are built into a primary compiler (i.e., the compiler that converts the source code statements into object code) to form a multi-pass compiler.
- Multi-pass compilers first operate to convert source code into an instruction stream in an intermediate language understood only by the compiler (i.e., as a first pass or stage) and then operate on the intermediate language instruction stream to optimize it and convert it into object code (i.e., as a second pass or stage).
- A compiler may reside within the memory of the computer which will be used to execute the object code, or may reside on a separate computer system. Compilers that reside on one computer system and are used to generate machine code for other computer systems are typically called "cross compilers." The methods and apparatus discussed herein apply to all types of compilers, including cross compilers.
Cache Prefetch Mechanisms
- Information may be prefetched into a cache memory due to specific requirements of the hardware architecture, or by the processor executing a special command or instruction stream indicating its desire for a prefetch.
- Modern compilers typically generate cache touch instructions to tell the memory controller (or memory subsystem) when to prefetch information into a cache memory.
- Prior art compilers must make an assumption about the cache line size of the target platform. This assumption results in code that runs less efficiently on targets whose cache line size is larger or smaller than assumed.
- FIG. 1 shows a block diagram of the computer system 100 in accordance with a preferred embodiment of the present invention.
- the computer system 100 of the preferred embodiment is an enhanced IBM AS/400 mid-range computer system.
- Computer system 100 suitably comprises a processor 110, main memory 120, a memory controller 130, an auxiliary storage interface 140, a terminal interface 150, instruction cache memory 160 and data cache memory 170, all of which are interconnected via a system bus 180.
- FIG. 1 is presented simply to illustrate some of the salient features of computer system 100.
- Processor 110 performs computation and control functions of computer system 100, and comprises a suitable central processing unit.
- Processor 110 may comprise a single integrated circuit, such as a microprocessor, or may comprise any suitable number of integrated circuit devices and/or circuit boards working in cooperation to accomplish the functions of a processor.
- Processor 110 suitably executes an instruction stream 124 within main memory 120.
- Auxiliary storage interface 140 is used to allow computer system 100 to store and retrieve information from auxiliary storage, such as magnetic disk (e.g., hard disks or floppy diskettes) or optical storage devices (e.g., CD-ROM).
- Memory controller 130, through use of a processor separate from processor 110, is responsible for moving requested information from main memory 120 and/or through auxiliary storage interface 140 to instruction cache 160, data cache 170 and/or processor 110. While for the purposes of explanation, memory controller 130 is shown as a separate entity, those skilled in the art understand that, in practice, portions of the function provided by memory controller 130 may actually reside in the circuitry associated with processor 110, main memory 120, instruction cache 160, data cache 170, and/or auxiliary storage interface 140.
- Terminal interface 150 allows system administrators and computer programmers to communicate with computer system 100, normally through programmable workstations.
- Although system 100 depicted in FIG. 1 contains only a single main processor 110 and a single system bus 180, it should be understood that the present invention applies equally to computer systems having multiple processors and/or multiple system buses.
- Similarly, although system bus 180 of the preferred embodiment is a typical hardwired, multidrop bus, any connection means that supports bi-directional communication could be used.
- Main memory 120 contains optimizing compiler 122, source code instruction stream 123, machine code instruction stream 124, application programs 126, and operating system 128. It should be understood that main memory 120 will not necessarily contain all parts of all mechanisms shown.
- Portions of application programs 126 and operating system 128 may be loaded into instruction cache 160 for processor 110 to execute, while other files may well be stored on magnetic or optical disk storage devices (not shown).
- Compiler 122 may generate a machine code instruction stream 124 that is intended to be executed on a different computer system if compiler 122 is a cross-compiler. It is also to be understood that any appropriate computer system memory may be used in place of main memory 120.
- Instruction cache 160 contains instructions/blocks of code from main memory 120 for processor 110 to readily access and use.
- Data cache 170 contains blocks of data from main memory 120 for processor 110 to readily access and use. It should be understood that even though data cache 170 is separate from instruction cache 160 in the preferred embodiment of the present invention, both caches may be combined to form a single unit.
- FIG. 2 is a block diagram of a typical region 210 of main memory 120 as used in the present invention.
- A region in main memory is generally made up of blocks of data 212 (or instruction code).
- Although related blocks of data may be stored in fragmented sections of main memory, they may be thought of conceptually as a contiguous stream of blocks in a region, as shown in FIG. 2. Effectively, the region 210 of related data blocks 212 will be of a given size, with an address indicating the beginning of the related data.
- A region of main memory may contain the instructions/data of an entire object, or significant portions thereof, which benefits the computer system when prefetched through the mechanism of the present invention.
- FIG. 3 shows a cache memory that has a line size of 8 bytes, that is, each cache line (0-N) is made up of 8 bytes (0-7).
- Blocks of data D are prefetched from memory to cache lines through cache touch instructions. Although data D is shown occupying cache lines 0-2, it is to be understood that other cache lines may be used.
- A1, A2, A3, and A4 are locations addressed by cache touch instructions, and will be discussed later in reference to FIG. 4.
- FIG. 4 illustrates the touch instructions and corresponding prefetched lines of a prior art system, wherein the compiler correctly assumes the cache line of FIG. 3 to be 8 bytes long.
- The touch instructions T1-T4 correspond to the touched addresses A1-A4 shown in the cache memory of FIG. 3.
- The processor will execute a cache touch instruction for each assumed cache line to be prefetched; that is, touch instruction T1 is executed for the beginning address (A1), T2 for 8 bytes after the beginning address (A2), T3 for 16 bytes after the beginning address (A3), and T4 for the end address (A4).
- Touch instructions T1, T2 and T3 cause the memory subsystem to prefetch cache lines 0, 1 and 2, respectively (i.e., prefetch blocks of data from main memory to occupy cache lines 0, 1 and 2).
- The end address touch instruction T4 does not prefetch any lines, but must be issued because the compiler cannot accurately predict the alignment of the data object or blocks of data with respect to cache line boundaries. Hence, a compiler cannot determine the number of cache lines that the object or related blocks of data may straddle, and needs a touch instruction to prefetch the last block of data in case it falls on a new cache line. In this case, cache touch instruction T4 is unnecessary and wastes the processor's resources.
- FIGS. 5-8 demonstrate the detrimental effects of prefetching in prior art systems when the compiler assumes a cache line length that is either greater or less than the actual cache line length.
- FIG. 5 shows a cache with lines that are 4 bytes long, that is, each cache line (0-N) is made up of 4 bytes (0-3).
- Data D is shown occupying cache lines 0-5.
- A1, A2, A3, and A4 are locations addressed by touch instructions that are used to prefetch data. Both the data D and the touched address locations of FIG. 5 will be discussed in reference to FIG. 6.
- FIG. 6 shows the touch instructions and prefetched lines of a prior art system used with the cache of FIG. 5, wherein the compiler assumes that the cache line is 8 bytes long, an assumption that is greater than the actual 4-byte-length cache line of FIG. 5.
- Touch instruction T1 is executed for the beginning address (A1), T2 for 8 bytes after the beginning address (A2), T3 for 16 bytes after the beginning address (A3), and T4 for the end address (A4) (i.e., the same touches as shown in FIG. 4, since an 8-byte cache line is assumed).
- Touch instruction T1 prefetches line 0; then, after advancing 8 bytes, T2 prefetches the corresponding cache line, which is line 2, not line 1. The compiler again advances 8 bytes, so T3 prefetches the corresponding cache line, which is cache line 4. The touch instruction for the end address then prefetches cache line 5. Cache lines 1 and 3, although they hold part of data D, are never prefetched.
- FIG. 7 is a block diagram of a cache with lines that are 16 bytes long, that is, each cache line (0-N) is made up of 16 bytes (0-15).
- Data D is shown occupying cache lines 0 and 1.
- A1, A2, A3, and A4 are locations addressed by touch instructions that are used to prefetch data. Both the data D and the touched address locations of FIG. 7 are discussed in reference to FIG. 8.
- FIG. 8 shows the touch instructions and prefetched lines of a prior art system used with the cache of FIG. 7, wherein the compiler assumes that the cache line is 8 bytes long, an assumption that is less than the actual 16-byte-length cache line of FIG. 7.
- The touched addresses A1-A4 of FIG. 7 correspond to touch instructions T1-T4.
- The processor will execute a cache touch instruction for each assumed cache line to be prefetched; that is, touch instruction T1 is executed for the beginning address (A1), T2 for 8 bytes after the beginning address (A2), T3 for 16 bytes after the beginning address (A3), and T4 for the end address (A4) (i.e., the same touches as shown in FIGS. 4 and 6). Because the actual cache line is 16 bytes long, with the block aligned as shown, T1 and T2 both address cache line 0, and T3 and T4 both address cache line 1, so two of the four touch instructions are redundant.
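The stride arithmetic behind FIGS. 4, 6 and 8 can be checked with a short sketch. This is illustrative C, not the patent's mechanism; the start address of 0 and the 24-byte block are assumed values chosen so that the touched lines match the three figures:

```c
#include <stdio.h>
#include <stdint.h>

/* Prior-art scheme as described above: the compiler emits one touch per
 * ASSUMED cache line (start, start+8, start+16, ...) plus one touch for
 * the end address, since it cannot know how the block aligns to line
 * boundaries. Which real lines those touches land on depends on the
 * target cache's actual line size. */
static void prior_art_touches(uintptr_t start, size_t bytes,
                              size_t assumed_line, size_t real_line)
{
    uintptr_t end = start + bytes - 1;
    printf("assumed %zu-byte lines, real %zu-byte lines:\n",
           assumed_line, real_line);
    for (uintptr_t a = start; a <= end; a += assumed_line)
        printf("  touch %#lx -> prefetch real line %lu\n",
               (unsigned long)a, (unsigned long)(a / real_line));
    printf("  touch %#lx (end) -> prefetch real line %lu\n",
           (unsigned long)end, (unsigned long)(end / real_line));
}

int main(void)
{
    prior_art_touches(0, 24, 8, 8);  /* FIG. 4: lines 0,1,2 plus a redundant end touch */
    prior_art_touches(0, 24, 8, 4);  /* FIG. 6: lines 0,2,4,5 -- lines 1 and 3 missed  */
    prior_art_touches(0, 24, 8, 16); /* FIG. 8: lines 0 and 1, each touched twice      */
    return 0;
}
```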
- FIG. 9 is a block diagram of the multi-block cache touch instruction in accordance with the present invention.
- The instruction comprises an op code field 310, an address field 312, and a size field 314.
- The op code field 310 distinguishes the multi-block cache touch instruction from other instructions.
- The op code field 310 allows the touch instruction to be placed within the hardware instruction set.
- The addressing field 312 indicates the beginning of the code or data block to be prefetched from memory. Those skilled in the art will appreciate that the addressing field may be generated through a variety of different methods.
- Some of these methods include, but are not limited to: denoting, in a field of the instruction, a register as containing the starting address; denoting, through two fields in the instruction, registers containing a base address and an offset for constructing the starting address; or, for loading blocks of code, using an offset from the current instruction pointer.
- For the particular case of an IBM PowerPC processor (and for other RISC architectures), the first of these methods could easily be retrofitted into the existing data cache block touch (dcbt) instruction, placing the size field in the unused target register field.
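For concreteness, a hypothetical encoding of such a retrofit is sketched below. The field widths follow the PowerPC X-form layout (6-bit primary opcode 31, three 5-bit register fields, 10-bit extended opcode 278 for dcbt), but the exact placement of the size in the unused target field, and the helper itself, are assumptions for illustration only:

```c
#include <stdint.h>

/* Hypothetical sketch: build a 32-bit multi-block dcbt word, carrying the
 * size in the normally unused target-register slot of the X-form layout. */
static uint32_t encode_multiblock_dcbt(unsigned size_units, /* e.g. 32-byte blocks */
                                       unsigned ra, unsigned rb)
{
    const uint32_t OPCD = 31;  /* primary opcode shared by X-form instructions */
    const uint32_t XO   = 278; /* dcbt's extended opcode */
    return (OPCD << 26)
         | ((size_units & 0x1fu) << 21) /* size in the unused target field */
         | ((ra & 0x1fu) << 16)         /* base address register */
         | ((rb & 0x1fu) << 11)         /* offset register */
         | (XO << 1);                   /* bit 31 (Rc) stays zero */
}
```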
- The size field 314 indicates how much memory is to be prefetched into the cache.
- The size field is measured in units that are independent of the size of a cache line, since platforms with different size cache lines may have to execute the same code. For example, the size field might denote the number of 32-byte memory blocks that are to be transferred.
- A cache with a 128-byte line size would examine the starting address and the size field and determine how many 128-byte blocks must be brought into the cache to satisfy the prefetch request.
- A size field entry of 3 (i.e., three 32-byte memory blocks) might then require either one or two cache lines to be prefetched, depending on the location of the starting address within its cache line.
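A quick worked check of that alignment dependence, as a sketch with start offsets chosen purely for illustration:

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* Three 32-byte units (96 bytes) against a 128-byte cache line: whether
 * one or two lines are needed depends only on where the start address
 * sits within its line. Illustrative sketch. */
int main(void)
{
    const size_t bytes = 3 * 32, line = 128;
    uintptr_t starts[] = { 0, 64 }; /* offset 0 fits in one line; offset 64 spills */
    for (int i = 0; i < 2; i++) {
        uintptr_t s = starts[i];
        size_t lines = (s % line + bytes + line - 1) / line;
        printf("start offset %3lu -> %zu cache line(s)\n",
               (unsigned long)(s % line), lines);
    }
    return 0;
}
```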
- FIGS. 10-15 demonstrate how the multi-block cache mechanism and instruction of the present invention operate in contrast to the prior art mechanism and touch instruction as shown in FIGS. 3-8.
- FIG. 10 illustrates a block diagram of an 8-byte-length cache line, which corresponds to the cache of FIG. 3. However, only one address A1 is touched (in contrast to the four touched addresses shown in FIG. 3) because only one touch instruction T1 is needed, as described in reference to FIG. 11.
- Each cache line (0-N) is made up of 8 bytes (0-7) and data D is shown occupying cache lines 0-2.
- FIG. 11 shows the results of using the multi-block cache touch instruction of the present invention with the 8-byte-length cache line of FIG. 10.
- The touched address A1 of FIG. 10 corresponds to the cache touch instruction T1.
- When a multi-block cache touch instruction generator (such as the compiler or a programmer) issues a multi-block cache touch instruction, the cache management logic determines the blocks of data or code in the computer system memory to be prefetched and the corresponding number of cache lines to be preloaded. The blocks are then prefetched directly from the computer system memory into the cache lines without any intermediate processing of the blocks, such as relocation of data or unnecessary manipulations of the blocks, thus optimizing the prefetching of blocks of data or code, or of objects for OOP.
- In this example, the size field denotes 19 bytes of data to be prefetched.
- The cache memory examines the starting address and the size field of the touch instruction and determines that three cache lines' worth of data must be brought into the cache to satisfy the prefetch request. The cache then prefetches lines 0, 1 and 2 from the memory subsystem. Unlike the prior art examples, which needed four cache touch instructions (see FIGS. 3 and 4), the present invention requires only one cache touch instruction.
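The FIG. 11 arithmetic can be checked the same way; this small sketch assumes a line-aligned start, as drawn in FIG. 10:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sanity check of the FIG. 11 example: 19 bytes starting at a line
 * boundary span lines 0-2 of an 8-byte-line cache, so a single
 * multi-block touch replaces the four prior-art touches. Sketch only. */
int main(void)
{
    uintptr_t start = 0; /* assumed line-aligned start, as in FIG. 10 */
    size_t bytes = 19, line = 8;
    size_t lines = (start % line + bytes + line - 1) / line;
    assert(lines == 3);  /* cache lines 0, 1 and 2 */
    return 0;
}
```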
- In the preferred embodiment, the cache management logic would be augmented with a "sequencer" to control the preloading of multiple cache lines.
- Upon receiving a request, registers in the sequencer would be initialized with the address of the first cache line to be preloaded and the number of cache lines to load. The sequencer would then process cache lines sequentially as follows. The sequencer first determines whether the currently addressed memory block is already in the cache. If so, no further action is required for this block, so the sequencer increments its address register by the size of a cache line and decrements the register containing the number of cache lines to process.
- If the block is not in the cache, the sequencer issues a memory transfer request to prefetch the currently addressed memory block into the cache, again followed by incrementing its address register and decrementing the number of cache lines to be processed. This process continues until all requested blocks have been found in the cache or are in the process of being loaded.
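A minimal C sketch of this sequencer loop follows. The cache_contains and issue_memory_transfer helpers are stand-ins for cache-directory and bus logic that the text leaves unspecified; both names are assumptions:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the cache directory lookup and the bus request. */
extern bool cache_contains(uintptr_t block_addr);
extern void issue_memory_transfer(uintptr_t block_addr);

/* Sketch of the sequencer behavior described above. */
static void sequencer_run(uintptr_t first_line_addr, size_t num_lines,
                          size_t line_size)
{
    /* The two sequencer "registers": current address and remaining count. */
    uintptr_t addr = first_line_addr;
    size_t remaining = num_lines;

    while (remaining > 0) {
        if (!cache_contains(addr))      /* miss: start the prefetch */
            issue_memory_transfer(addr);
        addr += line_size;              /* hit or miss, advance one line */
        remaining--;
    }
}
```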
- The present invention may be implemented through a variety of appropriate methods, and is not limited to the method described above.
- The present invention may also be extended by other features, such as the ability for a processor to issue a second cache request before the first multi-block prefetch has completed.
- The cache may respond to such a second request in a variety of ways.
- A contiguous effective address range (such as that implied by the multi-block touch instruction's address and size) may not map onto a contiguous real (physical main storage) address range when a page boundary is crossed (for example, a 4K-byte page).
- The memory subsystem then has the option of either: 1) prefetching only within a page; or 2) translating the effective address at the page boundary in order to reach the next real page, and thus prefetching across page boundaries.
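Option 1 amounts to clamping the request at the page boundary. A sketch, assuming 4K-byte pages:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed page size, per the 4K example above */

/* Sketch: limit a multi-block prefetch so it never crosses a page
 * boundary, since the next effective page may not be physically
 * contiguous with this one. */
static size_t clamp_to_page(uintptr_t start, size_t bytes)
{
    size_t room = PAGE_SIZE - (start % PAGE_SIZE); /* bytes left in this page */
    return bytes < room ? bytes : room;
}
```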
- An additional feature for extending the present invention includes the processor implementing a "history mechanism" that is associated with the touched address A1.
- A history mechanism would remember the actual cache lines that were subsequently referenced by the processor ("useful") after the prior execution of the multi-block touch to a given location. With this history, a subsequent multi-block touch to that location would only fetch memory blocks into cache blocks that were useful on the prior multi-block touch. That is, the cache management logic of the cache memory would prefetch only a subset of the blocks of data or code corresponding to the multi-block cache touch instruction, where the subset consists of one or more blocks of data or code that were used by the processor after a previous issuance of the multi-block cache touch instruction.
- One example of using the history mechanism is as follows.
- Suppose, for example, that a multi-block touch instruction causes cache lines 0 through 4 to be prefetched, but the processor subsequently references only lines 0 and 3. The history of that multi-block touch instruction would be maintained by the cache management logic of the memory subsystem. A subsequent execution of the multi-block touch instruction would then bring in only lines 0 and 3 rather than all of the lines 0, 1, 2, 3, and 4.
- The history mechanism may be implemented by means of a confirmation vector, which would associate one bit (initially zero) with each prefetched cache line. Each reference by the processor to a cache line would cause that cache line's bit in the confirmation vector to be set. At the next issuance of the multi-block cache touch instruction, only those cache lines with set bits in the confirmation vector would actually be prefetched.
- The confirmation vector scheme might be further extended to use a consensus vote, in which the history of the previous N issuances of a multi-block cache touch instruction is remembered, and in which a cache line is prefetched only if it was found to be useful after a majority of the previous N issuances.
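A sketch of the confirmation vector with the consensus-vote extension follows; the structure and function names are assumptions, and real hardware would hold these bits in registers rather than C state. At each reissue, should_prefetch is consulted per line using the existing history, then new_issuance begins tracking the new references:

```c
#include <stdbool.h>
#include <stdint.h>

#define HISTORY_N 3 /* assumed: remember the previous N issuances */

/* Illustrative sketch: one bit per prefetched line records whether the
 * processor referenced it ("useful"). Line indices are assumed < 32 so a
 * uint32_t holds one confirmation vector. With HISTORY_N == 1 this is the
 * plain confirmation vector; with N > 1 it is the consensus vote. */
struct touch_history {
    uint32_t confirmed[HISTORY_N]; /* slot 0 tracks the current issuance */
    int issuances;                 /* how many slots hold valid history  */
};

/* Age the history when the multi-block touch is issued again. */
static void new_issuance(struct touch_history *h)
{
    for (int i = HISTORY_N - 1; i > 0; i--)
        h->confirmed[i] = h->confirmed[i - 1];
    h->confirmed[0] = 0;
    if (h->issuances < HISTORY_N)
        h->issuances++;
}

/* Record that the processor referenced a line covered by the latest touch. */
static void record_reference(struct touch_history *h, int line)
{
    h->confirmed[0] |= (uint32_t)1 << line;
}

/* Should this line be prefetched? True only if it was useful after a
 * majority of the remembered issuances. */
static bool should_prefetch(const struct touch_history *h, int line)
{
    if (h->issuances == 0)
        return true; /* no history yet: prefetch everything */
    int votes = 0;
    for (int i = 0; i < h->issuances; i++)
        votes += (int)((h->confirmed[i] >> line) & 1u);
    return 2 * votes > h->issuances; /* strict majority */
}
```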
- FIG. 12 illustrates a 4-byte-length cache line corresponding to the cache memory of FIG. 5. That is, each cache line (0-N) is made up of 4 bytes 0-3. Data D is shown occupying cache lines 0-5. A1 is the touched address. Data D and A1 are discussed in reference to FIG. 13.
- FIG. 13 shows the results of using the multi-block cache touch instruction of the present invention with the 4-byte-length cache line of FIG. 12.
- The touch address A1 of FIG. 12 corresponds with touch instruction T1.
- The corresponding prior art mechanism was only able to prefetch lines 0, 2, 4, and 5 when its compiler assumed a cache line 8 bytes long (see FIG. 6). Even if the prior art mechanism had correctly assumed a 4-byte cache line, it would have taken six cache touch instructions to prefetch lines 0-5.
- The multi-block touch mechanism of the present invention prefetches all necessary cache lines (i.e., lines 0, 1, 2, 3, 4 and 5) with only one touch instruction T1.
- Hence, the present invention uses the processor's resources effectively when prefetching data into cache memory.
- FIG. 14 illustrates a 16-byte-length cache line corresponding to the cache memory of FIG. 7. That is, each cache line (0-N) is made up of 16 bytes 0-15. Data D is shown occupying cache lines 0-1. Unlike FIG. 7, A1 is the only touched address shown, and will be discussed in reference to FIG. 15.
- FIG. 15 shows the results of using the multi-block touch instruction of the present invention with the 16-byte-length cache line of FIG. 14.
- The touched address A1 of FIG. 14 corresponds with the touch instruction T1.
- In the prior art, unnecessary touch instructions are generated because: first, the compiler assumed an incorrect cache line length; and second, the prior art mechanism issued an end-address touch instruction in case the data was misaligned in the cache.
- The multi-block touch mechanism of the present invention prefetches cache lines 0 and 1 with only one touch instruction T1. Additionally, the present invention avoids the unnecessary cache touch instructions that result from potential misalignment of the block within the cache. This is possible because the cache, not the compiler, determines the number of cache lines that a block of data or an object may straddle by examining the size field of the touch instruction of the present invention. Hence, memory space and processing time are not wasted when prefetching data from a region of memory.
- The multi-block touch mechanism of the present invention is very efficient and beneficial when prefetching data blocks into a data cache memory.
- The multi-block touch mechanism may also be used, just as efficiently, for prefetching code into an instruction cache memory, and is especially useful for prefetching entire object functions, or significant portions of such functions, into an instruction cache.
- The multi-block touch instruction is less applicable to small functions and small object methods, though this can be mitigated by grouping related functions and methods that have an affinity for each other into contiguous memory blocks and performing the multi-block touch on groups of functions and object methods.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP96944764A EP0890148A1 (en) | 1996-03-28 | 1996-12-05 | Cache multi-block touch mechanism for object oriented computer system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US62326696A | 1996-03-28 | 1996-03-28 | |
US08/623,266 | 1996-03-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1997036234A1 true WO1997036234A1 (en) | 1997-10-02 |
Family
ID=24497420
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1996/019469 WO1997036234A1 (en) | 1996-03-28 | 1996-12-05 | Cache multi-block touch mechanism for object oriented computer system |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0890148A1 (en) |
KR (1) | KR19990087830A (en) |
WO (1) | WO1997036234A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0752644A2 (en) * | 1995-07-07 | 1997-01-08 | Sun Microsystems, Inc. | Memory management unit incorporating prefetch control |
EP1096371A1 (en) * | 1999-10-28 | 2001-05-02 | Hewlett-Packard Company, A Delaware Corporation | A method and apparatus for prefetching instructions |
EP0752645A3 (en) * | 1995-07-07 | 2002-01-02 | Sun Microsystems, Inc. | Tunable software control of Harvard architecture cache memories using prefetch instructions |
WO2006017874A2 (en) * | 2004-08-17 | 2006-02-23 | Schoeberl Martin | Instruction cache memory for real-time systems |
US8234452B2 (en) | 2006-11-30 | 2012-07-31 | Freescale Semiconductor, Inc. | Device and method for fetching instructions |
US11454950B2 (en) * | 2019-05-08 | 2022-09-27 | Fanuc Corporation | Machining control system and machining system |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100617765B1 (en) * | 1999-08-28 | 2006-08-28 | 삼성전자주식회사 | Distributed dbms cache management system for real-time distributed odbms in telecommunication systems |
1996
- 1996-12-05 WO PCT/US1996/019469 patent/WO1997036234A1/en not_active Application Discontinuation
- 1996-12-05 KR KR1019980707417A patent/KR19990087830A/en not_active Application Discontinuation
- 1996-12-05 EP EP96944764A patent/EP0890148A1/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
IEEE COMPUTER SOC. PRESS, CONFERENCE PAPER, Conference Date 02-04 October 1995, CHI et al., "Reducing Data Access Penalty Using Intelligent Opcode-Driven Cache Prefetching", pages 512-517. * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0752644A2 (en) * | 1995-07-07 | 1997-01-08 | Sun Microsystems, Inc. | Memory management unit incorporating prefetch control |
EP0752644A3 (en) * | 1995-07-07 | 2001-08-22 | Sun Microsystems, Inc. | Memory management unit incorporating prefetch control |
EP0752645A3 (en) * | 1995-07-07 | 2002-01-02 | Sun Microsystems, Inc. | Tunable software control of Harvard architecture cache memories using prefetch instructions |
EP1096371A1 (en) * | 1999-10-28 | 2001-05-02 | Hewlett-Packard Company, A Delaware Corporation | A method and apparatus for prefetching instructions |
US6799263B1 (en) * | 1999-10-28 | 2004-09-28 | Hewlett-Packard Development Company, L.P. | Prefetch instruction for an unpredicted path including a flush field for indicating whether earlier prefetches are to be discarded and whether in-progress prefetches are to be aborted |
WO2006017874A2 (en) * | 2004-08-17 | 2006-02-23 | Schoeberl Martin | Instruction cache memory for real-time systems |
WO2006017874A3 (en) * | 2004-08-17 | 2006-11-23 | Martin Schoeberl | Instruction cache memory for real-time systems |
US8234452B2 (en) | 2006-11-30 | 2012-07-31 | Freescale Semiconductor, Inc. | Device and method for fetching instructions |
US11454950B2 (en) * | 2019-05-08 | 2022-09-27 | Fanuc Corporation | Machining control system and machining system |
Also Published As
Publication number | Publication date |
---|---|
KR19990087830A (en) | 1999-12-27 |
EP0890148A1 (en) | 1999-01-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6131145A (en) | Information processing unit and method for controlling a hierarchical cache utilizing indicator bits to control content of prefetching operations | |
US5357618A (en) | Cache prefetch and bypass using stride registers | |
JP3618385B2 (en) | Method and system for buffering data | |
US5123095A (en) | Integrated scalar and vector processors with vector addressing by the scalar processor | |
JP3548616B2 (en) | Information processing equipment | |
JP3739491B2 (en) | Harmonized software control of Harvard architecture cache memory using prefetch instructions | |
US5893165A (en) | System and method for parallel execution of memory transactions using multiple memory models, including SSO, TSO, PSO and RMO | |
EP0803817B1 (en) | A computer system having cache prefetching capability based on CPU request types | |
US7168070B2 (en) | Aggregate bandwidth through management using insertion of reset instructions for cache-to-cache data transfer | |
US5796989A (en) | Method and system for increasing cache efficiency during emulation through operation code organization | |
US6668307B1 (en) | System and method for a software controlled cache | |
US6922753B2 (en) | Cache prefetching | |
US6658537B2 (en) | DMA driven processor cache | |
US6260191B1 (en) | User controlled relaxation of optimization constraints related to volatile memory references | |
US6301652B1 (en) | Instruction cache alignment mechanism for branch targets based on predicted execution frequencies | |
JPH06236353A (en) | Method and system for increase of parallelism of system memory of multiprocessor computer system | |
US20060149940A1 (en) | Implementation to save and restore processor registers on a context switch | |
EP1040412B1 (en) | Processor executing a computer instruction which generates multiple data-type results | |
US6892280B2 (en) | Multiprocessor system having distributed shared memory and instruction scheduling method used in the same system | |
WO1997036234A1 (en) | Cache multi-block touch mechanism for object oriented computer system | |
US5715425A (en) | Apparatus and method for prefetching data into an external cache | |
WO1998004972A1 (en) | Compiler having automatic common block splitting | |
EP0101718B1 (en) | Computer with automatic mapping of memory contents into machine registers | |
US20050251795A1 (en) | Method, system, and program for optimizing code | |
US20070162703A1 (en) | Method and structure for an improved data reformatting procedure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): JP KR |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 1996944764 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1019980707417 Country of ref document: KR |
|
NENP | Non-entry into the national phase |
Ref country code: JP Ref document number: 97522337 Format of ref document f/p: F |
|
WWP | Wipo information: published in national office |
Ref document number: 1996944764 Country of ref document: EP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 1996944764 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 1019980707417 Country of ref document: KR |
|
WWR | Wipo information: refused in national office |
Ref document number: 1019980707417 Country of ref document: KR |