WO2002041142A1 - A data processing system performing just-in-time data loading - Google Patents
- Publication number
- WO2002041142A1 (PCT/EP2001/008496)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- time
- data processing
- datum
- processing device
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3824—Operand accessing
Definitions
- a data processing system performing just-in-time data loading.
- The present invention relates to the field of architecture design of data processing systems in general. More specifically, the invention deals with a data processing system containing a data processing device performing just-in-time data loading.
- The term 'data processing device' denotes any of the following: a microprocessor, central processing unit (CPU), digital signal processor (DSP), micro-controller, any special-purpose processor (e.g. graphics processor) or any application specific instruction set processor (ASIP), whether embedded or stand-alone, any multi-processor system, or any array of more or less tightly coupled microprocessors, CPUs, DSPs and/or ASIPs.
- One of the main characteristics of a data processing device as defined above is that it has an instruction set.
- The machine code of a program which is running or executed on said data processing device contains instructions belonging to said instruction set.
- Said machine code is usually obtained by compiling the source code of a given program or by writing it manually.
- The source code of said program is usually written in a high-level programming language such as C++, Basic, Fortran or Java.
- Said instruction set may be dynamically reconfigurable during execution (run-time) of said machine code or may be fixed.
- The scope of the present invention is independent thereof.
- Said data processing device contains one or more functional units (FUs) representing any kind of arithmetic logic units (ALUs) and/or floating point units (FPUs) and/or load/store units and/or address generation units and/or memory management units and/or any other functional units.
- Instructions of a machine code running on said data processing device are executed on (by) the FUs of said data processing device.
- For example, arithmetic/logic instructions such as addition, multiplication and bit-wise OR are executed on (by) ALUs, address mode instructions are executed on (by) address generation units, load/store instructions are usually executed on (by) load/store units, etc.
- The data used (read) by instructions are often called instruction operands.
- When an instruction is executed on a FU, it performs a number of data operations on its operands and generates data results, also called instruction results.
- For example, an 'ADD' instruction (arithmetic addition) uses (reads) two operands (two numbers) and generates a result equal to the sum of the two operands.
- A load, store or prefetch instruction often uses as operand an explicitly specified memory address or the memory address stored within a register of a register file, and returns the content (value) stored at said memory address as the (data) result of said load/store instruction.
- Instructions of a machine code being executed on said data processing device have an instruction format. While the definition and meaning of an instruction format is well known, it is recalled that an instruction format is made up of (contains) one or more so-called bit-fields where specific information is stored. Common bit-fields appearing in instruction formats are, for example, 'opcode', 'operand' and 'destination' bit-fields, which specify a particular instruction (e.g. an 'ADD' instruction), its operands and its destination respectively. However, the order, the number and the type of bit-fields, as well as the information stored within each bit-field, may vary from data processing device to data processing device.
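- The notion of bit-fields within an instruction format can be illustrated by a minimal sketch. The 16-bit layout, the field names `dest`, `src_a`, `src_b` and the opcode value used for 'ADD' are assumptions chosen for illustration only; they are not taken from any particular data processing device.

```python
# Hypothetical 16-bit instruction format (an assumption for illustration only):
# bits 15..12 'opcode', bits 11..8 'destination', bits 7..4 and 3..0 two 'operand' fields.
FIELDS = {
    "opcode": (12, 4),  # (shift, width in bits)
    "dest": (8, 4),
    "src_a": (4, 4),
    "src_b": (0, 4),
}
OPCODE_ADD = 0x1  # hypothetical encoding of an 'ADD' instruction


def encode(**values):
    """Pack the named bit-field values into one instruction word."""
    word = 0
    for name, (shift, width) in FIELDS.items():
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def decode(word):
    """Split an instruction word back into its named bit-fields."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, (shift, width) in FIELDS.items()}


# 'ADD' with register 1 and register 2 as operands and register 3 as destination.
word = encode(opcode=OPCODE_ADD, dest=3, src_a=1, src_b=2)
fields = decode(word)
```

- A different device could order, size or name these bit-fields differently; only the `FIELDS` table would change in this sketch.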
- Here, the term 'instruction format' has a slightly broader meaning than the one normally found in the literature and includes instruction formats where no instruction (or data operation) is specified, either in an 'opcode' bit-field or in any other bit-field of the instruction format.
- In this case, either one or more 'implicit' instructions or one or more 'implicit and potential' instructions are associated with the data (or operands) specified by the 'operand' bit-fields or by any other bit-fields contained in the instruction format.
- An 'implicit' instruction is defined to be an instruction which is known by the data processing device prior to execution of said instruction and which need not be specified by an 'opcode' bit-field or any other bit-field in an instruction format of said instruction.
- An 'implicit' instruction may well have one or more operands and one or more destinations specified in corresponding bit-fields of said instruction format. It is also possible that an 'implicit' instruction has no operands and no destination specified in any bit-field of the instruction format.
- The 'implicit' instruction may be a special-purpose instruction which initializes some hardware circuitry of the data processing device or has some other well defined meaning or purpose.
- an 'implicit and potential' instruction is an 'implicit' instruction where the data results or the outcome of instructions which have not yet finished execution decide whether : 1) said 'implicit and potential' instruction shall be executed or not
- an example of an 'implicit instruction' associated to these two operands can be any kind of instruction (or data operation) like the addition or the multiplication of these two operands or the loading of these two operands from a memory or a register file etc.
- said implicit instruction may be specified by convention for the whole time of execution of said machine code or may be specified by another instruction which was executed prior to said instruction.
- An example of an 'implicit and potential instruction' associated to these two operands is a load- or a move-instruction which is loading the two operands from some memory 1) only after certain instructions not yet executed have been executed and 2) only if the outcome of the data results of said instructions satisfy certain conditions.
- the organization and architecture of the memory system and memory hierarchy of said data processing system plays an important role.
- the terms 'memory system' and 'memory hierarchy' are defined such as to comprise one or more of the following memories :
- one or more data caches which may be part of the data processing device itself, e.g. L0-, L1- and L2-caches
- main memory
- Often, data used by instructions are stored in separate caches and memories than instructions themselves. However, data caches and instruction caches may be unified such that data and instructions are stored within a same memory. The scope of the present invention is independent thereof.
- Load and write buffers are usually part of the data processing device itself, e.g. they may also be part of so-called reservation stations used within super-scalar microprocessors. Often, when data are pre-fetched or loaded from main memory or from a data cache, they may first be loaded into said load buffers where they are kept either until they are read by instructions to be executed on the functional units (FUs) of said data processing device or until they are stored in a register file of the data processing device. Similarly, instructions executed on the FUs of the data processing device often write their instruction results first into said write buffers before they are stored into a register file or into another memory of the memory system.
- Although the load and write buffers are not relevant for the scope of the present invention, they ease the conceptual description. Furthermore, said load and write buffers can be bypassed if required and are fully transparent for the programmer.
- the term 'memory' denotes either a register file, a data cache or main memory.
- The register file(s), the data caches and main memory each have different access times (latencies) for data read and for data write. Since one or more of these memories may be part of the memory system, we adopt a definition of the access time for data read/write (whether register file, data cache or main memory) which considers the memory system as a unit having data address ports, memory control ports (e.g. any kind of clock signal input, column/row address strobe signal inputs, chip select signal inputs, etc.) and data read/write ports. The memory system is accessed by said data processing device through these ports. Therefore, the access times for data read/write are defined with respect to the timing of the signals and data applied to these ports. Note that these ports may be those of the memory system or those of a particular memory of the memory system.
- The access time for data read of a memory of said memory system is defined as the time that elapses between: the point in time when said data processing device applies some memory control signals (e.g. any kind of clock signals, column/row address strobe signals, chip select signals, etc.) and/or applies valid data address signals on some read address bus (or some read address ports) of said memory or of said memory system, in order to signal a request for a data access to a particular memory of the memory system or in order to start a data read operation of said data from said memory; and the point in time when said data are returned by the memory system and/or are valid on some data read bus (or data read ports) of said memory or of said memory system, or when the transmission of said data to a memory location of the same or of another memory is finished.
- the access time for data write of a memory of the memory system (whether register file, data cache or main memory) is defined to be the time that elapses between : the point in time when said data processing device applies some memory control signals to (e.g.
- this definition of the access time for data read/write of a memory is independent of any particular embodiment of the memory system and of the memories in question, be it a synchronous static/dynamic RAM (SSRAM, SDRAM) or a packet-switched or packet-oriented memory (see memories produced by Rambus Inc.) .
- the term 'memory hierarchy' refers to the fact that said memory system has so-called memory hierarchy levels.
- each memory within said memory system has a specific memory hierarchy level.
- Said memory hierarchy level is often determined by the access times for data read/write of said memory.
- A memory having a certain memory hierarchy level denoted by 'j' is also said to be of or to have memory hierarchy level j, and said memory hierarchy level j is said to be the memory hierarchy level of said memory.
- The shorter the access times for data read/write of a memory, the lower the memory hierarchy level of that memory.
- An L0 (level-0) cache has shorter access times than an L1 cache.
- An L1 cache has shorter access times than an L2 cache, and so on.
- A register file is sometimes called an L0 cache.
- The term 'memory hierarchy level' is based in essence on either an upwards or downwards sorting and labeling of the memory hierarchy levels of a memory system according to the access times for data read/write of the different memories.
- If a memory A has a shorter access time for data write than a memory B, then said memory A has a lower or a higher hierarchy level than said memory B, depending on whether an upwards or a downwards sorting is used.
- Here, an upwards sorting of the memory hierarchy levels is used, in which case a lower memory hierarchy level implies a shorter (or faster) data access time, often concerning both data read and data write.
- A datum is said to be stored at the memory hierarchy level denoted by 'j' if the memory holding the value of said datum has memory hierarchy level j.
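- The upwards sorting of memory hierarchy levels by access time can be sketched as follows. The memory names and the read access times are hypothetical values chosen for illustration; the function name `assign_levels` is likewise an assumption.

```python
def assign_levels(read_access_ns):
    """Upwards sorting: the memory with the shortest read access time gets level 0."""
    ordered = sorted(read_access_ns, key=read_access_ns.get)
    return {name: level for level, name in enumerate(ordered)}


# Hypothetical access times for data read, in nanoseconds.
memories = {
    "main memory": 30.0,
    "L1 cache": 2.0,
    "register file": 1.0,
    "L2 cache": 6.0,
}
levels = assign_levels(memories)

# A datum is stored at level j if the memory holding its value has level j.
datum_location = "L2 cache"
datum_level = levels[datum_location]
```

- With a downwards sorting, the same ordering would simply be labeled in reverse.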
- access time for data read 2 ns
- access time for data read 1 ns
- access time for data read 4 ns
- access time for data read 6 ns
- access time for data read 10 ns
- access time for data read 30 ns
- The terms 'loading of a datum' and 'data loading' also refer to any reading, pre-fetching, moving, copying and transferring of a datum (of data) from a memory of the memory system into the same or into another memory of the memory system or into a load buffer.
- The verb 'load' also stands for verbs like 'pre-fetch', 'move', 'copy', 'read', 'transfer' and for any other synonym.
- 'Datum' denotes the singular of the term 'data'.
- The terms 'data' and 'datum' refer to their values taken during execution of a machine code running on said data processing device.
- If a datum is loaded from some memory of the memory system and transmitted, copied or moved to the same or to another memory of the memory system, this means of course that the value of said datum is loaded and transmitted to said data processing device.
- Several concepts used within the scope of the present invention require and assume that said data processing device has means (hardware circuitry) to measure time by using some method, otherwise machine code that is running on said data processing device may produce wrong data or wrong results.
- Said terms 'measure time' or 'time measurement' have a very broad meaning and implicitly assume the definition of a time axis and of a time unit such that all points in time, time intervals, time delays or any arbitrary time events refer to said time axis.
- Said time axis can be defined by starting to measure the time that elapses from a certain point in time onwards, this point in time usually being the point in time when said data processing device starts operation and begins to execute said machine code.
- Said time unit which is used to express the length of time intervals and time delays as well as the position on said time axis of points in time or any other time events, may be a physical time unit (e.g. nanosecond) or a logical time unit (e.g. the cycle of a clock used by a synchronously clocked microprocessor).
- E.g. synchronously clocked microprocessors use the cycles, the cycle times or the periods of one or more periodic clock signals to measure time.
- A clock signal is referred to simply as a clock.
- The cycle of said clock may change over time or during execution of a machine code on said microprocessor, as with the SpeedStep Technology used by Intel Corporation in the design of the Pentium IV microprocessor.
- asynchronously clocked microprocessors use the travel times required by signals to go through some specific hardware circuitry as time units.
- said time axis can be defined by starting to count and label the clock cycles of said clock from a certain point in time onwards, this point in time usually being the point in time when said microprocessor starts operation and begins to execute machine code.
- If a data processing device is able to measure time, then this implies that said data processing device is able to find out the chronological order of any two points in time or of any two time events on said time axis.
- A point in time (whose value is) denoted by 't1' is said to lie chronologically ahead of or behind another point in time denoted by 't2' if t2 - t1 ≤ 0.
- A point in time (whose value is) denoted by 't1' is said to lie chronologically before another point in time denoted by 't2' if t2 - t1 > 0.
- time measurement is made possible by letting said microprocessor operate with a clock in order to measure time with multiples (maybe integer or fractional) of the cycle of said clock, where one cycle of said clock can be seen as a logical time unit.
- a time delay time interval
- the cycle time of said clock is equal to 12.3 ns
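- The relation between a physical time unit and a logical time unit can be sketched as a simple conversion. The cycle time of 12.3 ns is the value mentioned above; the 30 ns delay and the function names are assumptions for illustration.

```python
import math


def ns_to_cycles(duration_ns, cycle_ns):
    """Express a duration as a (possibly fractional) multiple of the clock cycle."""
    return duration_ns / cycle_ns


def whole_cycles(duration_ns, cycle_ns):
    """Smallest integer number of cycles that covers the duration."""
    return math.ceil(duration_ns / cycle_ns)


CYCLE_NS = 12.3      # cycle time of the clock, in ns
delay_ns = 30.0      # hypothetical time delay, in ns

fractional = ns_to_cycles(delay_ns, CYCLE_NS)   # fractional multiple of the cycle
integer = whole_cycles(delay_ns, CYCLE_NS)      # integer multiple of the cycle
```

- The fractional value is useful when points in time are specified in physical units, while the integer value is what a synchronously clocked device would typically have to wait.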
- The clock which is used to measure time is often the clock with the shortest cycle time, such that said cycle is the smallest time unit (logical or physical) used by a synchronously clocked microprocessor in order to perform instruction scheduling and execution, e.g. to schedule all internal operations and actions necessary to execute a given machine code in a correct way.
- the scope of the present invention is independent of whether said data processing device is synchronously clocked or whether it uses asynchronous clocking, asynchronous timing or any other operating method or timing method to run and execute machine code.
- said data processing device has one or more instruction pipelines which contain each several (pipeline) stages and instructions may take each different amounts of time (in case of a synchronously clocked microprocessor : several cycles of said clock) to go through the different stages of said instruction pipeline before completing execution.
- the first pipeline stage is usually a 'prefetch' stage, followed by 'decode' and 'dispatch' stages, the last pipeline stage being often a 'write back' or an 'execution' stage.
- One often speaks of different phases through which an instruction has to go e.g. 'fetch', 'decode', 'dispatch', 'execute', 'write-back' phases etc., each phase containing several pipeline stages.
- the execution of an instruction may include the pipeline stages (and the amount of time) which are required to write or to store or to save operands or data results into some memory location, e.g. into a register, into a cache or into main memory.
- multiples (integer or fractional) of the cycle of said clock can be used as well to specify the depth and the number of the instruction pipeline stages of a microprocessor.
- the number of pipeline stages that a given instruction has to go through is often called the latency of said instruction.
- said latency is often given in cycle units of a clock.
- An instruction is said to begin execution or to be executed on a FU of said data processing device, or to have commenced execution on a FU of said data processing device, if said instruction enters a certain pipeline stage, said pipeline stage often being the first stage of the execution phase.
- An instruction is said to have finished execution if it leaves a certain pipeline stage, said pipeline stage being often the last stage of the execution phase.
- the point in time (on said time axis) at which a given instruction enters a pipeline stage is called the 'entrance point' of said instruction into said pipeline stage.
- the point in time at which a given instruction leaves a pipeline stage is called the 'exit point' of said instruction out of said pipeline stage.
- Microcode and microoperations usually differ from pipeline stage to pipeline stage. Note that microcode must not be confused with machine code.
- 2) An instruction may enter a stage of an instruction pipeline before another instruction has left another stage of the same instruction pipeline.
- For example, an instruction A1 may enter stage P2 at some point in time t1 while another instruction labeled B1 enters stage P4 at the same point in time t1.
- an instruction pipeline of a data processing device is such that instruction A1 may enter a stage before another instruction B1 has left the same stage.
- instruction pipeline is still valid and keeps the same meaning even if instructions are not pipelined.
- an instruction pipeline has one single stage.
- an instruction usually takes one cycle of said clock to go through one stage of an instruction pipeline.
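- Entrance points, exit points and latency can be sketched for the simple case where every stage takes exactly one clock cycle. The five stage names, the starting cycles and the function names are assumptions for illustration; real pipelines may have more stages and variable stage durations.

```python
# Hypothetical five-stage pipeline; each stage is assumed to take one clock cycle.
STAGES = ["fetch", "decode", "dispatch", "execute", "write-back"]


def stage_schedule(first_cycle, stages=STAGES):
    """Map each stage to its (entrance point, exit point), in clock cycles."""
    return {stage: (first_cycle + i, first_cycle + i + 1)
            for i, stage in enumerate(stages)}


def latency(schedule):
    """Cycles between entering the first stage and leaving the last stage."""
    entrances = [entrance for entrance, _ in schedule.values()]
    exits = [exit_point for _, exit_point in schedule.values()]
    return max(exits) - min(entrances)


b1 = stage_schedule(first_cycle=0)  # instruction B1 enters the pipeline at cycle 0
a1 = stage_schedule(first_cycle=2)  # instruction A1 follows two cycles later

# At cycle 3, A1 enters the second stage while B1 enters the fourth stage.
same_time = a1["decode"][0] == b1["execute"][0]
```

- In this sketch the latency of an instruction equals the number of stages, which matches the common case of one cycle per stage.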
- Typical depths of instruction pipelines of prior-art microprocessors range from 5 to 15 stages.
- the Pentium IV processor of Intel Corporation has an instruction pipeline containing 20 stages such that instructions may require up to 20 clock cycles to go through the entire pipeline, whereas the Alpha 21264 processor from Compaq Corporation has only 7 stages.
- the term 'instruction scheduling and execution' plays an important role for the scope of the present invention.
- the term 'instruction scheduling and execution' refers to the determination of the points in time on a time axis (as defined above) at which some operations or some time events are occurring (or are taking place) within said data processing device in order to allow for a correct execution of machine code on said data processing device
- the term 'instruction scheduling and execution' refers to the determination of the points in time on said time axis at which a given instruction of a machine code running on said data processing device enters or leaves one or more stages of an instruction pipeline of said data processing device in order to complete (finish) execution.
- said points in time can be integer or fractional multiples of a cycle, cycle time or period of a clock.
- instructions which perform data loading explicitly may be any kind of data load-, pre-fetch, move-, copy-, or transfer-instructions.
- Instructions which perform data loading implicitly are arithmetic/logic instructions which implicitly perform the loading of their operands from some memory (register file) of the memory system without requiring a separate data load instruction for doing so.
- an 'ADD R1,R2,R3' instruction which adds the contents of registers R1 and R2 together and stores the result (sum) into register R3 implicitly (automatically) performs the loading of the contents of registers R1 and R2 of some register file such that said register contents (values) are available for computation within a FU on which the ADD-instruction is going to be executed.
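- The implicit loading performed by such an 'ADD' instruction can be sketched with a register file modeled as a dictionary. The function name `execute_add` and the register contents are assumptions for illustration.

```python
def execute_add(regs, src1, src2, dest):
    """Model 'ADD src1,src2,dest': operands are read implicitly, no separate load."""
    a = regs[src1]          # implicit loading of the first operand
    b = regs[src2]          # implicit loading of the second operand
    regs[dest] = a + b      # result (sum) written to the destination register
    return regs[dest]


# Hypothetical register contents before execution.
regs = {"R1": 5, "R2": 7, "R3": 0}
result = execute_add(regs, "R1", "R2", "R3")
```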
- any loading of a datum performed implicitly by an instruction (e.g. arithmetic/logic instruction) in order to have said datum available as operand of that same instruction and any explicit data loading instruction (e.g. data load-, store-, copy-, transfer-, move-, pre-fetch instructions) which has the aim to make data available as operands of other instructions
- The points in time at which the loading of a datum is started and is finished are determined by the scheduling and execution of said data load instruction.
- The loading of a datum starts as soon as a corresponding instruction enters a certain stage of an instruction pipeline of the data processing device and is finished as soon as said instruction has left a certain stage of said instruction pipeline.
- The points in time at which the loading of a datum starts and finishes are also called the starting point and the end point respectively.
- said pipeline stages which may define the starting point and the end point of the loading of a datum may be different for each instruction.
- the loading of a datum by a 'move'-instruction may start as soon as said instruction has left a 'decode' stage, but the loading of a datum by an arithmetic instruction may only start when said instruction has entered an 'issue' stage or when said datum is valid.
- said starting points and end points are not known exactly before the instructions performing said data loading actually begin execution.
- said starting points and end points can often be estimated by determining optimistic or as-soon-as-possible (ASAP) instruction schedules.
- the estimated starting points and end points represent earliest possible starting points and end points.
- Said data loading will in no case start or end before said earliest possible points in time.
- Starting points and end points are said to be determined if they can be exactly calculated. Otherwise starting points and end points are said to be estimated.
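- An optimistic, as-soon-as-possible (ASAP) estimate of starting points and end points can be sketched as follows. The sketch assumes unlimited resources, an acyclic dependency graph and fixed instruction latencies; the instruction names, latencies and the function name `asap_schedule` are hypothetical.

```python
def asap_schedule(latency, deps):
    """Earliest possible start and end point (in cycles) of each instruction.

    latency: dict mapping instruction name -> latency in cycles
    deps:    dict mapping instruction name -> list of producer instructions
    Assumes unlimited resources and an acyclic dependency graph.
    """
    start, end = {}, {}

    def visit(name):
        if name not in start:
            # An instruction can start once all of its producers have finished.
            start[name] = max((visit(p) for p in deps.get(name, [])), default=0)
            end[name] = start[name] + latency[name]
        return end[name]

    for name in latency:
        visit(name)
    return start, end


# Hypothetical instruction stream: two loads feed an add, whose result is stored.
latency = {"load_a": 3, "load_b": 3, "add": 1, "store": 2}
deps = {"add": ["load_a", "load_b"], "store": ["add"]}
start, end = asap_schedule(latency, deps)
```

- Because resource conflicts are ignored, the resulting values are lower bounds: the actual starting points and end points may be later, but not earlier.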
- the loading of a datum may only be started if resources are available, otherwise the start of said loading may have to be delayed or postponed until resources are available.
- resources may be of any kind, e.g. number and type of ALUs, FPUs or load/store units, number and bandwidth of busses, number and type of read/write ports of memories etc ...
- data loading may also occur autonomously without requiring instructions to start or initiate data loading.
- said data processing device may have means to load or move data from one cache into another cache of the memory system, without requiring instructions to do so, but only by using some caching strategies such as a least-recently-used (LRU) strategy or such as some random replacement strategy. In this case, said caching strategies decide when the loading of a datum is started.
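- A least-recently-used strategy of this kind can be sketched in software. The class name `LRUCache`, the capacity and the memory contents are assumptions for illustration; a hardware cache would implement the same policy with dedicated circuitry.

```python
from collections import OrderedDict


class LRUCache:
    """Small cache that evicts the least-recently-used entry when it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> value, oldest entry first

    def read(self, address, backing):
        """Return (value, hit). On a miss the datum is loaded from 'backing'."""
        if address in self.lines:
            self.lines.move_to_end(address)   # mark as most recently used
            return self.lines[address], True
        value = backing[address]              # loading is triggered by the miss
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)    # evict the least-recently-used entry
        return value, False


# Hypothetical next-level memory and access sequence.
main_memory = {0: 10, 1: 11, 2: 12}
cache = LRUCache(capacity=2)
trace = [cache.read(a, main_memory) for a in (0, 1, 0, 2)]
resident = list(cache.lines)
```

- Here the strategy, not an instruction, decides when address 2 is loaded and that address 1 is replaced.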
- a definition of the points in time at which the loading of a datum is started and finished respectively, which is independent of whether the loading of said datum is initiated by instructions or whether it is performed autonomously, is as follows : the loading of a datum starts : as soon as said data processing device applies some memory control signals to (e.g.
- this definition assumes as well that the loading of a datum can only be started if resources are available, otherwise the start of said loading may have to be delayed until resources are available.
- the lifetime of a datum denotes a time interval on said time axis.
- the two points in time (on said time axis) defining the lifetime of a datum are respectively :
- Data lifetimes depend on instruction scheduling and execution as well. If instructions are scheduled using static scheduling (as in the case of DSPs), then most of the data lifetimes can be exactly calculated, e.g. by relying on array data flow analysis. If instructions are dynamically scheduled (as is the case for super-scalar microprocessors), then most of the data lifetimes can only be estimated by using array data flow analysis or by determining optimistic schedules like ASAP (as soon as possible) schedules. Data may be re-used at multiple and different points in time. Minimal and maximal data lifetimes are determined by the points in time where data are reused for the first and for the last time respectively. For the scope of the present invention, it is not relevant whether data lifetimes represent minimal, maximal or some intermediate lifetimes. Furthermore, it does not matter whether data lifetimes are calculated in an exact way or whether they are approximated or estimated by some method.
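- Minimal and maximal data lifetimes can be sketched from the points in time at which a datum is reused. The point in time at which the lifetime begins is passed in as the parameter `start`, which is an assumption of this sketch; the function name and the numeric values are hypothetical.

```python
def lifetimes(start, reuse_points):
    """Return (minimal lifetime, maximal lifetime) of a datum, in cycles.

    start:        assumed point in time at which the lifetime begins
    reuse_points: points in time at which the datum is reused
    """
    if not reuse_points:
        raise ValueError("the datum is never reused")
    first, last = min(reuse_points), max(reuse_points)
    if first < start:
        raise ValueError("a reuse point lies before the start of the lifetime")
    return first - start, last - start


# Hypothetical datum available from cycle 10 and reused at cycles 14, 22 and 17.
minimal, maximal = lifetimes(10, [14, 22, 17])
```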
- Fractional numbers are often expressed (specified) in some physical time unit (e.g. in [ns]) while integer numbers are often expressed (specified) in logical time units such as the cycle units of a clock of a synchronously clocked microprocessor. However, fractional numbers may also be expressed (specified) in logical time units.
- a run time window denotes a time interval where one of the two points in time defining said run-time window is the actual execution state of a machine code running on said data processing device.
- The actual execution state of a machine code running on said data processing device can be defined in multiple ways. Conceptually speaking, the actual execution state of a machine code running on said data processing device allows one to determine all the instructions (of said given machine code) which have been executed or which have entered a certain instruction pipeline stage since the start of execution of said machine code by said data processing device.
- the term 'the execution state' always refers to the actual execution state whereas the terms 'an execution state' and 'one or more execution states' refer to execution states which are not further specified or which may lie chronologically before said actual execution state.
- the execution state of a machine code is defined to be the point in time when the latest instruction of said machine code was fetched from some memory (e.g. instruction cache) or when the latest instruction fetched so far enters a certain stage of an instruction pipeline of said processing device.
- another possibility consists in defining the execution state in form of an integer number which represents the number of clock cycles of said clock which have elapsed since said program has started execution.
- Whichever definition of the execution state is chosen, it does not affect the scope of the present invention.
- The definition of the execution state of a machine code running on a data processing device must not be confused with the execution state of the data processing device itself.
- the term 'execution state' refers to the execution state of a machine code running on a data processing device as defined before.
- the other point in time defining said run-time window is always a point in time lying chronologically ahead of said execution state.
- If t_ahead denotes said point in time and t_exec denotes said execution state, then: t_exec ≤ t_ahead.
- the time difference between the two end points is called the size of the run time window.
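- A run time window and its size can be sketched as a small data structure. The sketch assumes that the execution state is expressed as an integer number of clock cycles and that the point lying ahead is never earlier than the execution state; the class name, field names and cycle values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunTimeWindow:
    t_exec: int    # execution state, here in elapsed clock cycles
    t_ahead: int   # point in time lying chronologically ahead of t_exec

    def __post_init__(self):
        if self.t_ahead < self.t_exec:
            raise ValueError("t_ahead must not lie before t_exec")

    @property
    def size(self):
        """Time difference between the two end points of the window."""
        return self.t_ahead - self.t_exec

    def contains(self, t):
        return self.t_exec <= t <= self.t_ahead


def instructions_in_window(window, estimated_starts):
    """Instructions whose estimated starting point falls inside the window."""
    return [name for name, t in estimated_starts.items() if window.contains(t)]


window = RunTimeWindow(t_exec=100, t_ahead=108)
selected = instructions_in_window(window, {"i1": 101, "i2": 108, "i3": 120})
```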
- dynamic instruction scheduling and execution makes it that the data processing device can often only estimate which instructions will begin execution within a run time window of a certain size and which will not. As soon as the data processing device knows which instructions are likely to begin execution within said run-time window, then said data processing is also able to determine which data are required by said instructions. Usually, this is done by fetching and decoding instructions ahead of the actual execution state by using branch or trace prediction methods. Therefore, in the following we assume that said data processing device has means to do so.
- Prior-art data loading strategies load data from some memory into the same or into another memory of the memory system or into a load buffer of the data processing device as soon as one or more of the following three conditions are satisfied :
- said data are known to be used by instructions which are known or estimated to begin execution and/or to finish execution within a run-time window of a certain size
- condition (1) is checked by a data processing device after having fetched instructions ahead of the actual execution state according to branch or trace prediction methods and after having decoded said instructions together with their operands.
- Condition (2) is checked by a data processing device by checking if all data hazards, e.g. read-after-write (RAW) hazards, have resolved or are satisfied.
- Data hazards are satisfied if instructions are executed in a chronological order which satisfies the data dependencies and the control dependencies between instructions.
- data which are used by some instruction denoted by 'Op1' in the form of instruction operands are only valid if said data are not equal to the result of (or are not produced by) another instruction which has not yet finished execution.
- condition (2) need not necessarily be satisfied. Data can well be loaded from some memory and used by other instructions even before it is known whether all data hazards are satisfied. This is called speculative data loading and speculative instruction execution.
- Condition (3) checks if there are resources available to schedule and execute instructions which use said data as operands.
- Resources may be of any kind, e.g. number and type of ALUs, FPUs or load/store units, number and bandwidth of busses, number and type of read/write ports of memories, etc. If condition (3) is not satisfied, the loading of a datum may have to be delayed or postponed until resources become available.
- the main difference between prior-art data loading and just-in-time data loading as based on the present invention is that, in prior-art data loading, neither the access time of the memory in which a datum is stored nor the lifetime of the datum is used to determine or to estimate the point in time when the loading of said datum may start. That starting point is, e.g., the point in time when said processing device applies memory control signals and/or valid data address signals to a read address bus (read address ports) of the memory system in order to signal a request for a data access to a particular memory of the memory system or in order to start a data read operation of said data from said memory.
- the access time may only determine the end point as well as any point in time lying chronologically between the starting point and the end point of the loading of said datum.
- an arithmetic instruction such as 'ADD R1,R2,R3' loads its operands R1 and R2 from a register file and although the amount of time required for loading said operands out of said register file and hence the end points of the loading of said operands are determined by the access time of said register file, the point in time at which the loading of said operands is started or initiated does not depend on the access time of said register file.
- just-in-time data loading delays the start of the loading of a datum from a memory as long as the access time for data read of the memory in question still allows the datum to be loaded into the same or into another memory of said memory system or into a load buffer of the data processing device just-in-time, e.g. just before the instruction, which uses said datum as an operand, is calculated or estimated to begin execution.
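The timing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the use of integer nanoseconds, and the three example access times are hypothetical and do not come from the patent text, and the sketch ignores resource availability and data lifetime.

```python
# Minimal sketch of the just-in-time start-time rule (hypothetical names and values).
# Times are integers in nanoseconds on a common time axis.

def latest_load_start(t_use: int, access_time: int) -> int:
    """Latest point in time at which a load may start and still finish by t_use."""
    return t_use - access_time


def may_start_now(t_exec: int, t_use: int, access_time: int) -> bool:
    """True once the execution state has reached the latest possible start time."""
    return t_exec >= latest_load_start(t_use, access_time)


# Assumed access times for three memories of a memory system (illustrative only).
ACCESS_TIME_NS = {"register_file": 1, "data_cache": 5, "main_memory": 60}

# An instruction is estimated to begin execution at t = 100 ns.
for memory, access_time in ACCESS_TIME_NS.items():
    print(memory, latest_load_start(100, access_time))
```

The slower the memory, the earlier the load has to start; a datum held in a fast memory can be left alone until shortly before the instruction that uses it begins execution.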
- a data processing system containing : a data processing device; a memory system which may comprise one or more of the following memories : one or more register files of said data processing device itself, any data caches (e.g. L0, L1 and L2 data caches), a main memory; where a machine code is running (is executed) on said data processing device, where said machine code contains instructions which use data in the form of operands and which produce data in the form of instruction results (data operation results), where said data processing device has means to perform just-in-time data loading, where said just-in-time data loading is defined as follows :
- said data processing device uses the access time of a memory where said datum is stored and/or the lifetime of said datum in order to determine or to estimate a point in time at which the loading of said datum may start, where said point in time may be postponed if resources are not available
- said data processing device determines all of or part of the data used by instructions (of said machine code) which are known or estimated to begin and/or end execution within a run time window, where the size of said run-time window may vary during execution of said machine code, where said instructions refer to any instructions performing or involving data loading
- said data processing device uses the access time of a memory where a datum of said data is stored and/or the lifetime of said datum in order to determine or estimate a point in time at which the loading of said datum may start, where said point in time may be postponed if resources are not available
- just-in-time data loading assumes that said data processing device has means to determine one or more of said memories where a datum is stored.
- the access time of the memory where a datum is stored may determine the starting point of the loading of said datum itself, e.g. the entrance point of a load instruction into an 'execution' stage of the instruction pipeline.
- said data processing device determines all of or part of the data used by instructions (of said machine code) which are known or estimated to begin and/or end execution within a run time window, where the size of said run-time window may vary during execution of said machine code, where said instructions refer to instructions performing or involving data loading
- said data processing device uses the access time of a memory where a datum of said data is stored and/or the lifetime of said datum in order to determine if the loading of said datum may start at a point in time given by the actual execution state or not
- said instructions of said machine code may refer to any instruction performing data loading, e.g. explicit load instructions, whether they be implicit and/or potential, and/or any integer and floating-point arithmetic/logic instructions.
- a practical and efficient implementation of the steps 2. to 5. would exploit the following 'buffering' property : the set of data which, for a given execution state $t_{exec}$, is used by instructions which are known or estimated to begin or to end execution within a run-time window of a given size $s_1$ is identical to the set of data which, at a later execution state given by $t_{exec} + \Delta$, is used by instructions whose execution times lie within a run-time window of size $s_1 - \Delta$, $\Delta$ being some positive amount of time
- a data processing device using this buffering property determines only once if a certain datum is used within a given run-time window and when the loading of said datum may start. In other words, as soon as the data processing device knows or estimates that a datum is used within a certain run-time window, it determines or estimates a point in time at which the loading of said datum may start. The processing device then stores said point in time (possibly together with a label specifying which datum this point in time refers to) in some kind of buffer. As soon as the actual execution state has advanced and has reached said point in time, the loading of said datum is started, provided that resources are available.
- This buffering property reduces the amount of computation as well as the costs required to implement just-in-time data loading.
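One way to picture the buffer described above is a priority queue keyed by load start time. The following Python sketch is an assumption-laden illustration, not the patent's implementation: the class name `LoadStartBuffer`, its methods, the labels `x` and `y`, and all numeric values are hypothetical.

```python
import heapq


class LoadStartBuffer:
    """Stores, once per datum, the point in time at which its loading may start."""

    def __init__(self):
        self._heap = []     # entries are (start_time, label), earliest first
        self._seen = set()  # labels whose start time has already been determined

    def schedule(self, label, t_use, access_time):
        """Determine the start time only once; repeated requests are ignored."""
        if label in self._seen:
            return False
        self._seen.add(label)
        heapq.heappush(self._heap, (t_use - access_time, label))
        return True

    def due(self, t_exec, resources_available=True):
        """Return the labels whose start time has been reached at execution state t_exec."""
        started = []
        while resources_available and self._heap and self._heap[0][0] <= t_exec:
            started.append(heapq.heappop(self._heap)[1])
        return started


buf = LoadStartBuffer()
buf.schedule("x", t_use=100, access_time=60)  # loading may start at t = 40
buf.schedule("y", t_use=100, access_time=10)  # loading may start at t = 90
buf.schedule("x", t_use=100, access_time=60)  # already buffered, not recomputed
print(buf.due(39), buf.due(40), buf.due(95))
```

If resources are not available, `due` returns nothing and the entries simply stay in the buffer, which corresponds to postponing the start of the load.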
- Steps 3. and 5. have to be performed according to a given procedure.
- Said procedure determines how said access time is used in order to calculate or to estimate the point in time at which said data loading may start.
- a procedure of particular interest is as follows :
- the value of b[1][2] is stored at memory 1 and the value of b[0][2] is stored at memory 0. Since b[0][2] does not belong to $SD_0$ but belongs to $SD_1$ and since $s_0 \le t_{r0} \le s_1$, the value of b[0][2] is loaded from memory for the given execution state.
- b[1][2] is not loaded from memory because, since b[1][2] does not belong to $SD_1$ but belongs to $SD_2$, the condition $s_1 \le t_{r1} \le s_2$ is not satisfied.
- the loading of the value of b[1][2] can be delayed by another 10 ns, while still guaranteeing that said value can be loaded in time, for example into a load buffer or into a register file of the data processing device, before the instruction using said value as operand begins execution. Therefore the value of b[1][2] is not loaded for the given execution state, but only at a later execution state.
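The decision made in this example can be mimicked with a small Python sketch. It assumes one reading of the condition: a datum is loaded at the current execution state only if the read access time of its memory lies between the sizes of the two adjacent run-time windows that bracket the datum's use. The function name, the window sizes and the access times below are hypothetical values chosen for illustration; they are not taken from the patent.

```python
def should_load_now(window_index, access_time_ns, window_sizes_ns):
    """Window test: lower window size <= access time <= upper window size.

    window_index is the index i of the first (smallest) run-time window in
    which the datum is used; the lower bound for i == 0 is assumed to be 0.
    """
    lower = window_sizes_ns[window_index - 1] if window_index > 0 else 0
    upper = window_sizes_ns[window_index]
    return lower <= access_time_ns <= upper


# Assumed run-time window sizes in ns, in increasing order.
sizes = [10, 20, 30]

# A datum first used within window 1, stored in a memory with a 15 ns read
# access time (assumed): the test holds, so loading starts now.
print(should_load_now(1, 15, sizes))

# A datum first used within window 2, stored in a memory with a 10 ns read
# access time (assumed): the test fails, so loading is deferred.
print(should_load_now(2, 10, sizes))
```

The sketch only evaluates the window test itself; a complete scheme would also have to handle a datum whose access time already exceeds the upper window size, as well as resource availability.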
- just-in-time data loading delays the loading of a datum as long as the access time for data read of the memory in which said datum is stored still allows the datum to be loaded, for example into a load buffer of the data processing device, just-in-time, e.g. just before the instruction, which uses said datum as an operand, is calculated or estimated to begin execution.
- the present invention concerns a data processing system containing a data processing device and a memory system and where said data processing device performs just-in-time data loading according to claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01978262A EP1410176A1 (en) | 2000-11-17 | 2001-07-23 | A data processing system performing just-in-time data loading |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EPPCT/EP00/11458 | 2000-11-17 | ||
EP0011458 | 2000-11-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2002041142A1 true WO2002041142A1 (en) | 2002-05-23 |
Family
ID=8164163
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2001/008496 WO2002041142A1 (en) | 2000-11-17 | 2001-07-23 | A data processing system performing just-in-time data loading |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2002041142A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5664193A (en) * | 1995-11-17 | 1997-09-02 | Sun Microsystems, Inc. | Method and apparatus for automatic selection of the load latency to be used in modulo scheduling in an optimizing compiler |
US6092180A (en) * | 1997-11-26 | 2000-07-18 | Digital Equipment Corporation | Method for measuring latencies by randomly selected sampling of the instructions while the instruction are executed |
-
2001
- 2001-07-23 WO PCT/EP2001/008496 patent/WO2002041142A1/en not_active Application Discontinuation
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7487340B2 (en) | Local and global branch prediction information storage | |
KR100958705B1 (en) | System and method for linking speculative results of load operations to register values | |
US7814469B2 (en) | Speculative multi-threading for instruction prefetch and/or trace pre-build | |
US20160098279A1 (en) | Method and apparatus for segmented sequential storage | |
US5848269A (en) | Branch predicting mechanism for enhancing accuracy in branch prediction by reference to data | |
US20070288733A1 (en) | Early Conditional Branch Resolution | |
US20120023314A1 (en) | Paired execution scheduling of dependent micro-operations | |
Schlansker et al. | EPIC: An architecture for instruction-level parallel processors | |
EP1296230A2 (en) | Instruction issuing in the presence of load misses | |
US20020124155A1 (en) | Processor architecture | |
US20020087794A1 (en) | Apparatus and method for speculative prefetching after data cache misses | |
EP1296229A2 (en) | Scoreboarding mechanism in a pipeline that includes replays and redirects | |
JP2001175473A (en) | Method and device for actualizing execution predicate in computer-processing system | |
WO2004001584A2 (en) | A method for executing structured symbolic machine code on a microprocessor | |
US8301871B2 (en) | Predicated issue for conditional branch instructions | |
US20050251621A1 (en) | Method for realizing autonomous load/store by using symbolic machine code | |
WO2002008894A1 (en) | A microprocessor having an instruction format containing timing information | |
US20070288732A1 (en) | Hybrid Branch Prediction Scheme | |
US20070288731A1 (en) | Dual Path Issue for Conditional Branch Instructions | |
US20080162908A1 (en) | structure for early conditional branch resolution | |
US20070288734A1 (en) | Double-Width Instruction Queue for Instruction Execution | |
US6092184A (en) | Parallel processing of pipelined instructions having register dependencies | |
US7484075B2 (en) | Method and apparatus for providing fast remote register access in a clustered VLIW processor using partitioned register files | |
US5737562A (en) | CPU pipeline having queuing stage to facilitate branch instructions | |
WO2004099978A2 (en) | Apparatus and method to identify data-speculative operations in microprocessor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): JP US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2001978262 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 2001978262 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 2001978262 Country of ref document: EP |