EP1444584A1 - Data prefetching in a computer system - Google Patents

Data prefetching in a computer system

Info

Publication number
EP1444584A1
Authority
EP
European Patent Office
Prior art keywords
data
prefetching
data storage
storage information
job
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP01979147A
Other languages
English (en)
French (fr)
Inventor
Leif Karl Östen JOHANSSON
Jon Fredrik Helmer Reveman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget LM Ericsson AB filed Critical Telefonaktiebolaget LM Ericsson AB
Publication of EP1444584A1 publication Critical patent/EP1444584A1/de
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824Operand accessing
    • G06F9/383Operand prefetching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/60Details of cache memory
    • G06F2212/6028Prefetching based on hints or prefetch instructions

Definitions

  • the present invention generally relates to data prefetching in a computer system, and more particularly to a method and system for prefetching data as well as a method and system for supporting such data prefetching.
  • a common way of alleviating this problem, trying to reduce the average memory latency to an acceptable level, is to use one or more levels of small and fast cache as a buffer between the processor and the larger and slower main memory.
  • a cache memory contains copies of blocks of information that are also stored in the main memory.
  • the system first goes to the fast cache to determine if the information is present in the cache. If the information is available in the cache, a so-called cache hit, access to the main memory is not required and the information is taken directly from the cache. If the information is not available in the cache, a so-called cache miss, the data is fetched from the main memory into the cache, possibly overwriting other active data in the cache.
  • while the cache memory has the potential to reduce the average memory latency, the actual performance improvement naturally depends on the cache-hit ratio. It is only when the required information is available in the cache, a cache hit, that the memory latency will be reduced. Whenever the processor needs data and/or instructions that are not available in the cache, the processor "stalls" until the required information is loaded from the main memory, thus wasting valuable processing time.
  • the simplest example of hardware prefetching is the behavior of ordinary caches, which bring an entire cache line from memory even when only a single word in the line is referenced, assuming that other words in the same line will be referenced shortly.
  • More advanced conventional hardware techniques perform statistical analysis of the memory access patterns of the processor at run-time to generate appropriate prefetch requests.
  • in software prefetching, also generally known as compiler-assisted prefetching, the compiler analyzes the code to predict references in, for example, loop structures and inserts specific prefetch instructions into the code accordingly.
  • Compiler-assisted prefetching can use program code knowledge to provide prefetches at suitable places in the code, but has no knowledge of the run-time dynamics.
  • conventional hardware prefetching can effectuate prefetches for run-time memory access patterns not visible or available to the compiler, but has no knowledge of the program code flow.
  • U.S. Patent 5,704,053 relates to a compiler that facilitates efficient insertion of explicit data prefetch instructions into loop structures within an application by simple address expression analysis. Analysis and explicit prefetch instruction insertion is performed by the compiler in a low-level optimizer to provide access to more accurate expected loop iteration latency information. In addition, execution profiles from previous runs of an application are exploited in the insertion of prefetch instructions into loops with internal control flow. Cache line reuse patterns across loop iterations are recognized to eliminate unnecessary prefetch instructions. The prefetch insertion algorithm is also integrated with other low-level optimization phases such as loop unrolling, register reassociation and instruction scheduling.
  • U.S. Patent 5,812,996 relates to a database system for improving the execution speed of database queries by optimizing the use of buffer caches.
  • the system includes an optimizer for formulating an optimal strategy for a given query.
  • the optimizer communicates with a buffer manager for determining whether an object of interest exists in its own buffer cache, how much of the cache the object requires, and the optimal I/O size for the cache. Based on this information, the optimizer formulates a query strategy with hints that are passed to the buffer manager.
  • U.S. Patent 5,918,246 relates to data prefetching based on information in a compiler-generated program map.
  • the program map is generated by a compiler when the source code is compiled into object code, and represents the address flow of the compiled program with information of the address location of each branch target that the CPU might encounter during execution. For each application program, the user would have this program map stored with the object file.
  • the operating system will load the program map into a given area of the random access memory, and a special map control unit will utilize the program map in cooperation with a conventional cache controller to effectuate the actual pre-loading of data and instructions to the cache.
  • the present invention overcomes these and other drawbacks of the prior art arrangements.
  • Yet another object of the invention is to provide a method and system for supporting data prefetching in a computer system.
  • the invention is based on the recognition that program code knowledge of the data storage structure used by program procedures can be effectively combined with run-time information in order to generate appropriate prefetch requests.
  • the general idea according to the invention is to combine data storage information generated during program code analysis, for example at compile-time, with one or more run-time arguments to determine a memory address for prefetching data. In this way, efficient data prefetching with a high cache-hit ratio will be accomplished, thus reducing the memory latency and improving the processor utilization.
  • the invention takes advantage of the fact that many computer systems, such as transaction-based systems and database systems, have a queue of jobs to be executed. By peeking into the queue to fetch the relevant information for a given job well in advance of execution of the job, a prefetch can be requested sufficiently early so that the corresponding data will be available when the job is to be executed.
  • the data storage information is generally generated prior to the program execution by program code analysis and stored in the memory system for easy access during run-time.
  • a program code analyzer such as a compiler or code optimizer generates individual data storage information for each of a number of program procedures defined in the program code.
  • the appropriate data storage information to be used for a given job is then accessed based on the program procedure or procedures indicated in the job.
  • the data storage information comprises at least a data area start address together with information concerning which job input argument or arguments are required to pinpoint the memory address of the data to be prefetched.
  • the prefetch address determination and the actual prefetch request are preferably executed by operating system software, dedicated hardware or a combination thereof.
  • the invention offers the following advantages: efficient data prefetching; reduced average memory latency; improved processor utilization; and efficient compiler-derived support of data prefetching.
  • Fig. 1 is a schematic block diagram illustrating an example of a computer system implementing a prefetch mechanism according to a preferred embodiment of the invention;
  • Fig. 2 is a schematic diagram illustrating the job queue and the corresponding execution flow related to the exemplary computer system of Fig. 1;
  • Fig. 3 is a schematic diagram illustrating the general principle for generating data storage information according to the invention;
  • Fig. 4 is a schematic diagram illustrating an example of compiler-assisted generation of data storage information according to a preferred embodiment of the invention;
  • Fig. 5 is a schematic diagram illustrating a specific example of the relationship between given data storage information and a data storage structure in the data store.
  • Fig. 1 is a schematic block diagram illustrating an example of a computer system implementing a prefetch mechanism according to a preferred embodiment of the invention.
  • the description is not intended to be complete with regard to the entire computer system, but will concentrate on those parts that are relevant to the invention.
  • the example illustrated in Fig. 1 merely serves as a basis for understanding the basic principles of the invention, and the invention is not limited thereto.
  • the computer system basically comprises a processor 110 and a memory system 120.
  • the computer system also comprises a memory manager 130, a scheduling unit 140 and a prefetch unit 150, implemented in operating system software, dedicated hardware or a combination thereof.
  • the processor 110 and the memory system 120 are interconnected via a conventional communication bus.
  • the memory system 120 comprises a main memory 121 and a faster cache memory 122.
  • the main memory 121 generally comprises a job queue 123 for storing jobs to be executed, a data store 124 for storing data variables and constants, a program store 125 for storing executable program instructions, and a dedicated memory area 126 for storing special data storage information generated during program code analysis.
  • the cache memory 122 generally comprises a data cache 127 and an instruction cache 128.
  • the cache memory 122 may be representative of a so-called on-chip cache provided directly on the processor chip, an off-chip cache provided on a separate chip or both.
  • the performance of a cache is affected by the organization of the cache, and especially by the mapping scheme.
  • the mapping (placement) scheme generally determines to which blocks or lines in the relevant cache information in the main memory is mapped.
  • the most commonly used schemes are direct mapping, set-associative mapping and fully associative mapping.
  • although the mapping scheme determines to which blocks or lines in the relevant cache selected information in the main memory (or another higher-level cache) is mapped, it is still necessary to determine which blocks of information in the main memory should be copied into the cache in order to maximize the cache-hit ratio and minimize the memory latency.
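  • As a concrete illustration (not part of the patent text; the constants are invented), a direct-mapped cache derives the target line purely from address bits, so placement requires no search. A minimal sketch in C:

        #include <stdint.h>

        #define LINE_SIZE 64u   /* bytes per cache line (assumed) */
        #define NUM_LINES 512u  /* number of lines in the cache (assumed) */

        /* Direct mapping: every main-memory block has exactly one
         * candidate line, selected by the middle bits of the address. */
        static inline uint32_t cache_line_index(uint32_t addr)
        {
            return (addr / LINE_SIZE) % NUM_LINES;
        }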
  • the memory latency will be reduced only when the required information is available in the cache. Whenever the processor needs data and/or instructions that are not available in the cache, the processor stalls until the required information has been loaded from the main memory.
  • the invention proposes a new prefetch mechanism that effectively combines data storage information generated during program code analysis, for example at compile-time, with one or more run-time arguments in order to generate appropriate data prefetch requests.
  • data storage information and run-time information for a given program procedure are combined by means of a generic prefetch function in order to determine a useful prefetch address.
  • the invention has turned out to be particularly applicable in computer systems that operate based on a queue of jobs to be executed. It has been recognized that the queue structure makes it possible to peek into the job queue to fetch relevant information for a given job and request a prefetch of data well in advance of the actual execution of the job. By looking into the queue and generating a prefetch request sufficiently early, the required data will be available in time for execution of the job.
  • the job queue 123 is implemented as a first-in-first-out (FIFO) buffer in which a number of externally and/or internally generated job messages are buffered, awaiting processing by the processor.
  • each job message in the job queue 123 includes program address representative information, input arguments to be used in the execution as well as data storage information related to the given procedure.
  • the program address representative information directly or indirectly addresses the program procedure to be executed.
  • the actual program address is generally accessed by means of several table look-ups in different tables.
  • the program address information in the job message typically includes a pointer to a look-up table, which in turn points to another table and so on until the final program address is found.
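  • A hedged sketch of such a chained look-up (the table types and names are illustrative, not taken from the patent):

        #include <stdint.h>

        /* Illustrative two-level indirection: the job message carries
         * indices, and the final program address is found by walking
         * the tables, as described above. */
        typedef struct {
            const uint32_t *entries;  /* second-level table of addresses */
        } lookup_table_t;

        uint32_t resolve_program_address(const lookup_table_t *first_level,
                                         uint32_t table_idx,
                                         uint32_t entry_idx)
        {
            const lookup_table_t *t = &first_level[table_idx]; /* first look-up */
            return t->entries[entry_idx];                      /* final address */
        }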
  • the data storage information, also referred to as data storage structure information, is generated before program execution by proper analysis of the program code. In many applications, it is often convenient to generate individualized data storage information for each of a number of program procedures defined in the program code.
  • the procedure-specific data storage information generally describes the data storage structure related to the program procedure in question.
  • the data storage information is typically stored in the data storage information area 126 for access during run-time.
  • the data storage information is preferably transferred by the operating system (OS) or equivalent from the data storage information area 126 into the job queue 123.
  • the operating system analyzes each job message to be placed in the job queue 123, and detects which program procedure is defined in the job message based on the program address information included in the message. The operating system then adds the corresponding data storage information to the respective job message, and writes the entire job message into the job queue.
  • the data storage information may be loaded directly from the data storage information area 126 based on the program address information for the given job.
  • the scheduling unit 140 schedules the corresponding jobs for execution by the processor 110 by managing the job queue 123 using a special execution pointer.
  • the execution pointer usually points to the head of the job queue, indicating that the job at the head position is to be executed (or currently under execution).
  • the prefetch unit 150 looks ahead in the job queue 123, using a special prefetch pointer, and initiates the prefetch mechanism for a given future job a predetermined number of jobs in advance of execution. First, the prefetch unit 150 loads program address information, input arguments and data storage information for the indicated job from the memory-allocated job queue 123 into the cache, unless this information already resides in the cache. The prefetch unit 150 then combines selected data storage information with at least one of the input arguments for the job according to a given prefetch address function, thus calculating a data prefetch address.
  • the data storage information typically comprises a data area start address together with information concerning which input argument or arguments are required to fully determine the corresponding prefetch address.
  • a prefetch address may be calculated by using the start address to find the relevant area within the data store 124 and pinpointing the address of the needed data variable or constant by means of the indicated input argument.
  • the prefetch unit 150 communicates the calculated prefetch address to the memory manager 130, which in turn controls the actual transfer of data from the data store 124 into the data cache 127. If the memory manager 130 brings an entire cache line from the main memory 121 when a single word is referenced, it is generally not necessary to find the exact individual memory address for the future data reference. It is merely sufficient to determine the correct memory line or block in which the needed data is located. This relaxes the requirements on the exactness of the prefetch address function.
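  • In code, relaxing the prefetch address to its containing cache line is a single mask operation (a sketch; the 64-byte block size matches the simulation described later but is otherwise an assumption):

        #include <stdint.h>

        #define CACHE_LINE 64u  /* bytes; must be a power of two (assumed) */

        /* Only the containing line matters to the memory manager, so the
         * calculated prefetch address is rounded down to a line boundary. */
        static inline uint32_t line_base(uint32_t prefetch_addr)
        {
            return prefetch_addr & ~(CACHE_LINE - 1u);
        }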
  • prefetch unit 150 generally follows the same job queue as the scheduling unit 140, but operates a predetermined number of jobs ahead of the job to be executed.
  • the program address information and the input arguments for the job are already available in the cache.
  • data variables and/or constants to be used in the execution of the job are also available in the data cache. This minimizes the memory latency and thus substantially reduces the number of stall cycles. Simulations have indeed shown that the number of stall cycles due to data store accesses can be reduced by 25-50%, as will be described in detail later on.
  • prefetch is merely a hint to the memory system to bring the given data into a closer, faster level of memory, such that a later binding load will complete much faster.
  • This kind of prefetch is executed asynchronously with no follow-on dependencies in the code stream, and therefore does not cause any stall cycles.
  • Fig. 2 is a schematic diagram illustrating the job queue and the corresponding execution flow related to the exemplary computer system of Fig. 1.
  • the prefetch for a given future job is initiated a predetermined number K of jobs in advance of the actual execution.
  • in step 201, the data cache block or blocks required in the future execution of job M+K are subsequently calculated based on the obtained data storage information and at least one of the obtained input arguments.
  • in step 203, the actual prefetch of the calculated data cache block or blocks is requested.
  • the prefetch should not be issued too close in time to the actual data reference, since then the prefetched data may not be available in time to minimize or avoid a stall situation.
  • if the prefetch is issued too early, there is a risk that the prefetched line is displaced from the cache before the actual data reference takes place.
  • the so-called look-ahead period is ideally of the same order as the memory access time or slightly longer, so that data related to job M+K will be available in the cache when job M+K is ready for execution at the head of the job queue. If the average job execution time is known, it is possible to determine how many jobs in advance of execution the prefetch should be issued. Naturally, the optimal look-ahead period differs from application to application. However, simulations have shown that a significant performance gain can be achieved already with a look-ahead of one or two jobs.
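  • As a back-of-the-envelope sketch (this formula is an assumption, not the patent's): if the memory access time and the average job execution time are known in cycles, the look-ahead K follows from their ratio:

        /* Choose K so that K average jobs roughly cover one memory access
         * time; assumes avg_job_cycles > 0. */
        unsigned lookahead_jobs(unsigned mem_access_cycles, unsigned avg_job_cycles)
        {
            unsigned k = (mem_access_cycles + avg_job_cycles - 1) / avg_job_cycles;
            return k > 0 ? k : 1;  /* always look at least one job ahead */
        }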
  • note that the representation of the job queue in Fig. 2 is a snapshot, and that a similar prefetch of program address information, input arguments and data variables or constants has already been performed for each of the jobs to be executed before job M+K, including job M.
  • the program address information and input arguments required for starting job M as well as data variables and/or constants needed in the execution of job M are ideally available in the cache so that the execution of job M can be initiated in step 204.
  • the results of job M are stored in relevant parts of the memory system. If a new job M+N is generated as a result of the execution of job M, this job is stored in the job queue in step 205.
  • the operating system adds the data storage information corresponding to the new job M+N from the data storage information area 126 into the relevant position of the job queue.
  • job M is shifted out of the job queue and job M+1 is placed at the head of the job queue.
  • the prefetch mechanism according to the invention may be implemented as an operating system routine that is activated between jobs, or executed as an integrated part of the currently executing job M.
  • the prefetch may be executed by dedicated hardware or even a separate processor with software responsible for job scheduling and prefetching. In the latter case, it is generally easier to optimize the prefetch timing to the memory access time of the slower memory, since prefetches may be issued by the separate processor at any suitable time.
  • Fig. 3 is a schematic diagram illustrating the general principle for generating data storage information according to the invention.
  • An input program file 302 is provided to a code analyzer 304, which performs a flow graph analysis or equivalent analysis of the program code.
  • the code analyzer 304 extracts static information concerning the data storage structure related to the program procedure.
  • the code analyzer 304 may extract information regarding the start address to a specific area in the data store towards which the given program procedure operates.
  • to pinpoint the final address within that area, however, one or more run-time arguments are typically required.
  • the code analyzer 304 does not know the values of any run-time arguments, but instead analyzes the code to provide information as to which input argument or arguments are required to pinpoint the address of the needed data within the specified data store area. During run-time, the required input argument(s) can then be accessed and combined with the static information from the code analyzer to determine the relevant data store address.
  • Fig. 4 is a schematic diagram illustrating an example of compiler-assisted generation of data storage information according to a preferred embodiment of the invention.
  • the source code, in the form of an input program file 402, is provided to a compiler or optimizer 404. During compilation, the compiler translates the source code into object code, producing a corresponding output program file 406.
  • the compiler generally generates a compiler help file in the form of a procedure descriptor table 408.
  • This table normally includes a general description of each compiled program procedure indicating the name of the procedure, the number of input arguments, possibly the format of the arguments, and program address information.
  • the compiler 404 also generates individual data storage information for each program procedure by analysis of the corresponding program code, and integrates this information into the procedure descriptor table 408. The data storage information can then be accessed from the procedure descriptor table 408 during run-time and combined with run-time arguments to generate appropriate prefetch requests.
  • Fig. 5 is a schematic diagram illustrating a specific example of the relationship between given data storage information and a data storage structure in the data store.
  • the prefetch mechanism has access to the run-time arguments 510 for a given job as well as data storage information 520 related to a program procedure defined in the given job.
  • the data storage information 520 is represented by a base address number ban as well as an indication arg nr of which input argument or arguments to the procedure are required to determine a prefetch address.
  • the base address number is unique for the given program procedure and acts as a pointer to a base address table 525.
  • the base address table 525 holds data area start addresses, record size values and offset values, and each base address number is associated with a unique data area start address dfn, record size recsize and offset.
  • alternatively, the data storage parameters dfn, recsize and offset may be given directly, eliminating the need for a table look-up in the base address table using the base address number.
  • the dfn value indicates the start address of a given data store area 535 in the data store 530.
  • the recsize value represents the size of a record in the given data storage structure.
  • the offset value indicates which one of the variables within a record is requested.
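  • The described structures might be rendered as follows (an illustrative layout only; the patent's actual 32-bit encoding of the data storage information is not reproduced here, so all field widths are assumptions):

        #include <stdint.h>

        /* One entry of the base address table 525. */
        typedef struct {
            uint32_t dfn;      /* start address of the data store area 535 */
            uint32_t recsize;  /* size of one record                       */
            uint32_t offset;   /* position of the variable within a record */
        } base_address_entry_t;

        /* Per-procedure data storage information 520. */
        typedef struct {
            uint16_t ban;     /* base address number: index into the table */
            uint16_t arg_nr;  /* which input argument selects the record   */
        } data_storage_info_t;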
  • the input argument or arguments indicated by the data storage information are also required.
  • the data storage information 520 points out a certain input argument, which is provided as input to a pointer function p(arg).
  • the pointer function p(arg) may simply be defined as the sum of the relevant input argument arg and an arbitrary constant C. The resulting pointer value indicates in which record the needed data variable is located.
  • the prefetch address can thus be calculated according to the following generic prefetch function (assuming that the data store does not have any index dependencies):
  • prefetch address = dfn(ban) + p(arg) · recsize(ban) + offset(ban)   (1)
  • the dfn value gives the data area start address, the p(arg) value multiplied by the recsize value gives the relevant record, and the offset value finally gives the correct variable within the identified record.
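  • Reusing the illustrative types sketched above, equation (1) might be coded as follows (a sketch under the example's assumption that the pointer function is p(arg) = arg + C; the constant's value is invented):

        /* Generic prefetch function (1); assumes no index dependencies. */
        #define PTR_CONST 0u  /* the arbitrary constant C (assumed value) */

        uint32_t prefetch_address(const base_address_entry_t *bat,  /* table 525 */
                                  const data_storage_info_t *dsi,
                                  const uint32_t *args)  /* job input arguments */
        {
            const base_address_entry_t *e = &bat[dsi->ban]; /* table look-up      */
            uint32_t p = args[dsi->arg_nr] + PTR_CONST;     /* p(arg): record no. */
            return e->dfn + p * e->recsize + e->offset;     /* equation (1)       */
        }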
  • the prefetch unit then requests a prefetch of the data variable, or preferably an entire data cache block, from the data store into the data cache based on the calculated address.
  • each set of data storage information is 32 bits, with the following store layout:
  • the cache block determination defined by steps 1-11 is a very straightforward implementation based on simple logical operations and shift operations, and does not involve any logical decisions. This is important for minimizing the overhead for the data prefetch mechanism according to the invention.
  • a trace-driven simulation was used to study the cache-hit ratio obtained by using the memory address determination algorithm proposed above.
  • the trace was recorded in a live telecommunication exchange based on a 64-bit Alpha processor and included approximately 6 million assembler instructions.
  • the cache block size was 64 bytes.
  • the simulator counted the needed number of data cache blocks for every executed signal in the trace and compared that number with the number of preloaded cache blocks. Table I below shows the results of the simulation.
  • the simulation shows that 48-56% of the cache blocks could be preloaded with the algorithm used by the invention.
  • the percentage of data store accesses to these cache blocks was 58-65%.
  • Calculations and measurements show that nearly 50% of the execution time is stalled due to data store accesses in the Alpha-based processor architecture.
  • the proposed prefetch mechanism could reduce the number of stall cycles by approximately 25-50%. This corresponds to a total capacity gain of about 10-25%, which is a remarkable improvement offered by the invention. In real-life applications, improvements of at least 5-10% are expected.
  • the invention can be used in any computer system that allows non-binding asynchronous transfer of data from one level of memory to another level of memory. This includes most modern computer systems such as pipelined processor systems, superscalar processor systems, multiprocessor systems and combinations thereof.
  • the invention is particularly applicable to computer systems in which a number of externally and/or internally generated jobs are arranged, explicitly or implicitly, in a queue, and in applications with a high ratio of so-called context switching.
  • most transaction-based systems have a job queue in which jobs are buffered, awaiting processing by the processor or processors within the system.
  • in database applications, a number of requests or queries from various clients are typically buffered for subsequent processing by a database server.
  • the invention is also applicable to process-based computer systems. For example, many commercial operating systems such as Unix and Windows NT work with processes. In a system having an execution model based on processes, incoming signal messages originating from events in the system or from communication messaging are directed to corresponding processes.
  • a process is normally represented by its process control block, which holds the process state when the process is not executing and possibly administrative information required by the operating system.
  • the process state includes program address information and input arguments required for execution of the current job of the process.
  • a process can be either READY, waiting in a ready queue for execution, EXECUTING, meaning that a job is executed based on the current process state of the process control block, or BLOCKED, waiting for a required signal message in a blocked queue.
  • a job queue is defined and used by a main executive.
  • the job queue consists of job-queue entries, and each entry includes information about the specific procedure to be executed and the input arguments to the procedure.
  • two simple program procedures are defined.
  • a generic prefetch function is inlined in the main executive. The prefetch function uses information found in the job queue in combination with procedure-specific data storage information. The data storage information would normally be generated by the compiler.
  • element_number = element_number + 123;
  • next_free_entry = 0; /* Wrap around at end of queue */
  • unsigned int new_function = (element_number >> 1) & 1; /* Pick 2nd lowest bit for new function */ int arg2;
  • element_number = element_number >> 1; if ((element_number > 0) && (element_number < WA_SIZE)) {
  • pf_pointer = (execute_pointer + LOOKAHEAD) % JQ_SIZE;
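  • The fragments above are too incomplete to compile on their own; the following is a speculative reconstruction of how the main executive might combine the queue look-ahead with the inlined prefetch. Everything beyond the names appearing in the fragments, including job_t, main_executive_step and the GCC/Clang __builtin_prefetch hint, is invented for illustration:

        #include <stdint.h>

        #define JQ_SIZE   256u  /* job-queue length (assumed)        */
        #define LOOKAHEAD   2u  /* jobs ahead of execution (assumed) */

        typedef struct {
            void   (*proc)(uint32_t);      /* program procedure to execute */
            uint32_t arg;                  /* input argument               */
            uint32_t dfn, recsize, offset; /* data storage information     */
        } job_t;

        static job_t    jq[JQ_SIZE];  /* the FIFO job queue */
        static uint32_t execute_pointer;

        void main_executive_step(const uint8_t *data_store)
        {
            /* Peek LOOKAHEAD jobs ahead, as in the pf_pointer fragment. */
            uint32_t pf_pointer = (execute_pointer + LOOKAHEAD) % JQ_SIZE;
            const job_t *future = &jq[pf_pointer];

            /* Equation (1) with p(arg) = arg; issue a non-binding hint. */
            uint32_t addr = future->dfn + future->arg * future->recsize
                          + future->offset;
            __builtin_prefetch(&data_store[addr]);  /* no stall on miss */

            /* Execute the job at the head of the queue, then advance. */
            const job_t *current = &jq[execute_pointer];
            current->proc(current->arg);
            execute_pointer = (execute_pointer + 1u) % JQ_SIZE;
        }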

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
EP01979147A 2001-10-19 2001-10-19 Data prefetching in a computer system Withdrawn EP1444584A1 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SE2001/002290 WO2003034229A1 (en) 2001-10-19 2001-10-19 Data prefetching in a computer system

Publications (1)

Publication Number Publication Date
EP1444584A1 true EP1444584A1 (de) 2004-08-11

Family

ID=20284641

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01979147A Withdrawn EP1444584A1 (de) 2001-10-19 2001-10-19 Datenvorabruf in einem computersystem

Country Status (2)

Country Link
EP (1) EP1444584A1 (de)
WO (1) WO2003034229A1 (de)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060075394A1 (en) * 2004-10-01 2006-04-06 Tatsuya Iwamoto Dynamic loading and unloading for processing unit
US8285941B2 (en) 2008-02-25 2012-10-09 International Business Machines Corporation Enhancing timeliness of cache prefetching
JP6161395B2 (ja) * 2013-05-15 2017-07-12 オリンパス株式会社 演算装置 (Arithmetic device)
JP6161396B2 (ja) * 2013-05-15 2017-07-12 オリンパス株式会社 演算装置 (Arithmetic device)
US10042773B2 (en) 2015-07-28 2018-08-07 Futurewei Technologies, Inc. Advance cache allocator

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5761506A (en) * 1996-09-20 1998-06-02 Bay Networks, Inc. Method and apparatus for handling cache misses in a computer system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5778435A (en) * 1996-05-30 1998-07-07 Lucent Technologies, Inc. History-based prefetch cache including a time queue
US6175898B1 (en) * 1997-06-23 2001-01-16 Sun Microsystems, Inc. Method for prefetching data using a micro-TLB

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5761506A (en) * 1996-09-20 1998-06-02 Bay Networks, Inc. Method and apparatus for handling cache misses in a computer system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
S. P. VANDERWIEL: "Masking memory access latency with a compiler-assisted data prefetch controller", February 1999, UMI DISSERTATION PUBLISHING, ISBN: 9780599009111 *
See also references of WO03034229A1 *
VANDER WIEL S P ET AL: "A compiler-assisted data prefetch controller", COMPUTER DESIGN, 1999. (ICCD '99). INTERNATIONAL CONFERENCE ON AUSTIN, TX, USA 10-13 OCT. 1999, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US LNKD- DOI:10.1109/ICCD.1999.808569, 10 October 1999 (1999-10-10), pages 372 - 377, XP010360525, ISBN: 978-0-7695-0406-3 *

Also Published As

Publication number Publication date
WO2003034229A1 (en) 2003-04-24

Similar Documents

Publication Publication Date Title
Joseph et al. Prefetching using Markov predictors
US9003169B2 (en) Systems and methods for indirect register access using status-checking and status-setting instructions
CA2285760C (en) Method for prefetching structured data
US7904661B2 (en) Data stream prefetching in a microprocessor
US7467377B2 (en) Methods and apparatus for compiler managed first cache bypassing
JP3816586B2 (ja) 先取り命令を生成する方法とシステム (Method and system for generating prefetch instructions)
US7958316B2 (en) Dynamic adjustment of prefetch stream priority
USRE45086E1 (en) Method and apparatus for prefetching recursive data structures
US7716427B2 (en) Store stream prefetching in a microprocessor
US8949837B2 (en) Assist thread for injecting cache memory in a microprocessor
US20060179236A1 (en) System and method to improve hardware pre-fetching using translation hints
US6981119B1 (en) System and method for storing performance-enhancing data in memory space freed by data compression
Chen et al. Exploiting method-level parallelism in single-threaded Java programs
JP3681647B2 (ja) キャッシュメモリシステム装置 (Cache memory system device)
US6662273B1 (en) Least critical used replacement with critical cache
US20030084433A1 (en) Profile-guided stride prefetching
Tsai et al. Performance study of a concurrent multithreaded processor
Vander Wiel et al. A compiler-assisted data prefetch controller
US6760816B1 (en) Critical loads guided data prefetching
EP1444584A1 (de) Data prefetching in a computer system
Lee et al. A dual-mode instruction prefetch scheme for improved worst case and average case program execution times
Chen et al. Using incorrect speculation to prefetch data in a concurrent multithreaded processor
Kyriacou et al. Cacheflow: A short-term optimal cache management policy for data driven multithreading
Rau et al. The effect of instruction fetch strategies upon the performance of pipelined instruction units
Sair et al. Quantifying load stream behavior

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20040507

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO SI

17Q First examination report despatched

Effective date: 20100430

R17C First examination report despatched (corrected)

Effective date: 20100506

R17C First examination report despatched (corrected)

Effective date: 20100517

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20101130