US20070089095A1 - Apparatus and method to trace high performance multi-issue processors - Google Patents


Info

Publication number
US20070089095A1
Authority
US
United States
Prior art keywords
computer
trace
instructions
data
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US11/608,725
Inventor
Radhika Thekkath
Franz Treue
Soren Kragh
Vidya Rajagopalan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
MIPS Tech LLC
Original Assignee
MIPS Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MIPS Technologies Inc
Priority to US11/608,725
Assigned to MIPS TECHNOLOGIES, INC. reassignment MIPS TECHNOLOGIES, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: RAJAGOPALAN, VIDYA, THEKKATH, RADHIKA, KRAGH, SOREN, TREUE, FRANZ
Publication of US20070089095A1
Assigned to JEFFERIES FINANCE LLC, AS COLLATERAL AGENT reassignment JEFFERIES FINANCE LLC, AS COLLATERAL AGENT SECURITY AGREEMENT Assignors: MIPS TECHNOLOGIES, INC.
Assigned to MIPS TECHNOLOGIES, INC. reassignment MIPS TECHNOLOGIES, INC. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: JEFFERIES FINANCE LLC, AS COLLATERAL AGENT
Current legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3636 Software debugging by tracing the execution of the program
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3854 Instruction completion, e.g. retiring, committing or graduating
    • G06F9/3856 Reordering of instructions, e.g. using queues or age tags
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3854 Instruction completion, e.g. retiring, committing or graduating
    • G06F9/3858 Result writeback, i.e. updating the architectural state or memory

Definitions

  • the present invention relates generally to on-chip debugging, and more specifically to program counter (PC) and data tracing in embedded processor systems.
  • PC program counter
  • Computer systems process information according to a program that includes a sequence of instructions defined by an application program or an operating system.
  • a program counter provides a series of memory addresses that are used by the processor for fetching instructions stored in the associated memory.
  • the processor conveys the memory address to the memory over an address bus, and the memory responds over an instruction/data bus with the instruction stored in the addressed memory location.
  • the instructions stored in the memory constitute the program to be executed.
  • Program development relies heavily on the verification of the instructions stored in memory as well as their corresponding execution. Typically, these debug efforts are supported by instruction tracing tools that generate a listing of executed instructions during the execution of a program.
  • a multi-issue processor may have out-of-order (OOO) dynamic scheduling, deep pipelines, multi-latency pipelines, or support of outstanding load misses.
  • OOO out-of-order
  • embodiments of the present invention include an apparatus, system, method, computer program product, and data signal embodied in a transmission medium for tracing multi-issue processors in program sequence order.
  • tracing instructions from a multi-issue processor includes: monitoring a reorder buffer having a graduation cycle for graduating instructions in program order and transmitting trace data for the instructions in graduation order for each graduation cycle along with information that enables a determination of program execution of the instructions.
  • the trace data may be transmitted using a trace interface having a plurality of trace buses.
  • one or more rules are used to assign trace data to the trace buses so that another element can reconstruct the program sequence.
  • One benefit of the present invention is that it facilitates tracing a complex multi-issue microprocessor having one or more features that may disrupt sequential execution of instructions, such as deep pipelines, multi-latency pipelines, multiple outstanding load misses, out-of-order (OOO) instructions, or superscalarity.
  • FIG. 1 illustrates a tracing system according to an embodiment of the present invention.
  • FIGS. 2 and 3 illustrate aspects of tracing a single instruction pipeline according to an embodiment of the present invention.
  • FIG. 4 illustrates a portion of a multi-issue processor tracing apparatus according to an embodiment of the present invention.
  • FIG. 5 illustrates an embodiment of the present invention in which a trace interface includes a plurality of trace slots.
  • FIG. 6 is a flow chart illustrating one embodiment of a rule that trace generation logic may use to assign trace buses to graduating instructions.
  • FIG. 7 is a flow chart illustrating an embodiment of a rule that trace generation logic may use in which data associated with an instruction is traced out on the same trace slot.
  • FIG. 8 illustrates a method of coordinating an end time signal for the case that data from a plurality of graduating instructions are traced out on a plurality of trace buses.
  • FIG. 9 is a timing diagram for a single instruction pipeline illustrating signals in an exemplary trace interface.
  • FIG. 10 is a table illustrating a method of tracing instructions in instruction order according to the graduation cycle of a reorder buffer.
  • FIG. 11 is a table illustrating an exemplary program sequence.
  • FIG. 12 illustrates the corresponding instruction complete signals for the program sequence of FIG. 11 .
  • FIG. 13 illustrates the trace bus data signals and end point signals associated with FIGS. 11-12 .
  • FIG. 14 illustrates an exemplary data sequence reconstructed from the signals of FIGS. 12-13 .
  • FIG. 1 illustrates a tracing system 100 that includes on-chip components identified as microprocessor core 110 , trace generation logic (TGL) 120 , trace control block (TCB) 130 , and test access port (TAP) controller 140 .
  • TGL 120 can be embodied as part of microprocessor core 110 .
  • TGL 120 is generally operative to generate program counter (PC) and data trace information based on the execution of program code in one or more pipelines within microprocessor core 110 .
  • microprocessor core 110 is a high performance multi-issue microprocessor having one or more features that may disrupt sequential execution of instructions, such as deep pipelines, multi-latency pipelines, multiple outstanding load misses, out-of-order (OOO) instructions, or superscalarity.
  • TGL 120 transmits the generated trace information to TCB 130 via trace interface 180 .
  • TGL 120 includes logic 480 to monitor a reorder buffer associated with multiple instruction pipelines (not shown in FIG. 1 ) of processor core 110 .
  • TCB 130 captures the trace information that is provided by TGL 120 on trace interface 180 and writes the trace information to trace memory 150 in accordance with a particular set of requirements of trace re-generation software 160 .
  • trace interface 180 includes trace buses for transmitting trace data, and TGL 120 may be adapted to apply one or more rules for outputting trace data on trace interface 180 so that sufficient information is transmitted to TCB 130 to reconstruct the program execution.
  • Also included on-chip is TAP controller 140 .
  • TAP controller 140 includes instruction, data, and control registers as well as circuitry that enables TAP controller 140 to access internal debug registers and to monitor and control the microprocessor core's address and data buses.
  • TAP controller 140 is based on the extended JTAG (EJTAG) specification developed in part by MIPS Technologies, Inc.
  • EJTAG extended JTAG
  • Trace regeneration software 160 is a post-processing software module that enables trace reconstruction.
  • Debugger 170 interfaces with TAP controller 140 and is generally operative to display TAP states as well as provide high-level commands to TAP controller 140 . For example, debugger 170 can be used to set breakpoints or examine contents of certain sections of memory.
  • FIG. 2 illustrates a single instruction pipeline 200 having six stages, labeled as fetch stage 310 , decode stage 320 , execute stage 330 , memory stage 340 , align stage 350 , and writeback stage 360 .
  • a tracing point may be placed after any stage beyond which instructions are certain to execute, such as after memory stage 340 .
  • FIG. 3 illustrates a single instruction pipeline having an associated data-order determination module 328 , compression modules 318 and 338 , and a FIFO queue 348 .
  • the deferred transmission of load and store data is enabled through the output of a data order signal that is designed to signal the out-of-order nature of load and store data.
  • the PC, store address, and load address are immediately provided to compression module 318 .
  • Store data and load data are provided to compression modules 318 and 338 only when the data is available to data order determination module 328 . If the data is retrieved from a cache/register, then the data is immediately available to data order determination module 328 . The data can then be passed on to compression modules.
  • compression modules 318 and 338 are operative to compress the trace data that is to be placed into FIFO 348 , where it awaits output onto a trace bus.
  • instructions may be issued out of order, i.e. out of program sequence. Even if instructions are not issued out of order, they may complete their execution out of order. This can happen when pipelines have different latencies or because of cache misses.
  • the instructions are typically put back in order at the back-end of the pipeline in a structure called the reorder buffer.
  • FIG. 4 is a block diagram illustrating some aspects of an embodiment of an on-chip tracing apparatus 400 for tracing a multi-issue processor.
  • a multi-issue processor may comprise processors 405 of processor core 110 that include at least two instruction pipelines, each including a plurality of stages, such as a fetch stage 410 , decode stage 420 , execute stage 430 , memory stage 440 , align stage 450 , and writeback stage 460 , together with an issue queue 470 and issue logic 475 having issue slots for each pipeline.
  • two pipelines are shown, although it will be understood that processors 405 may have any number of instruction pipelines.
  • TGL 120 is communicatively coupled to reorder buffer 470 and is adapted to monitor graduating instructions from reorder buffer 470 .
  • TGL 120 includes an instruction completion logic module 480 to monitor the graduation of instructions from reorder buffer 470 associated with instruction pipelines 405 .
  • reorder buffer 470 is shown as a separate element from instruction pipelines 405 although it will be understood that reorder buffer 470 is commonly considered to be a component of instruction pipelines 405 .
  • Reorder buffer 470 is responsible for putting the issued instructions back in program sequence order.
  • the reorder buffer holds instructions that complete out of order and graduates them in order, i.e., commits their results in order.
  • the reorder buffer is located at a point in the pipeline where it is certain that the instruction will not stop and can proceed to completion. This is typically at a point where it is certain that the instruction will not get an exception or be nullified for any reason.
  • the number of graduating instructions per cycle will typically not exceed the number of issue slots of the processor (e.g., 0, 1, or 2 instructions per graduation cycle for a two-issue processor). More generally, however, the maximum number of graduating instructions per cycle ranges from zero to the number of issue slots at the front of the multi-issue pipeline plus the number of load miss completions from the bus and cache units.
  • TGL 120 monitors instructions at the point of graduation and traces them out on output trace buses 490 .
  • instructions may be traced in program sequence order from a multi-issue processor.
  • the number of trace slots (i.e., the number of different portions of trace interface 180 able to simultaneously transmit instruction data)
  • each trace slot 505 of trace interface 180 thus includes a trace bus for outputting trace data and other associated signals for tracing out an instruction.
  • the number of trace slots is selected to be equal to the number of issue slots.
  • trace interface 180 includes at least two output trace buses 490 . It is possible that in some cycles the number of graduating instructions is greater than the number of instruction trace slots.
  • TGL 120 may include a buffer to hold the instruction(s) that could not be traced earlier and trace them during the next cycle, along with the other instructions that graduate in that cycle, while still maintaining the program sequence order.
  • TGL 120 outputs trace information on trace interface 180 in a format that facilitates reconstructing sequential program execution (e.g., by TRS 160 ) from information received by two or more trace buses over multiple graduation cycles.
  • Embodiments of the present invention include at least one tracing rule to facilitate reconstructing sequential program execution.
  • One tracing rule is a bus order rule. With multiple trace buses, TGL 120 maintains a bus order rule, which dictates that during each cycle, instructions are assigned to trace buses such that the relative instruction order may be determined from the trace bus order. An example is illustrated in the flow chart of FIG. 6 .
  • TGL 120 determines 605 the number of instructions graduating per cycle and assigns 610 these instructions to trace buses according to their relative program order. In a preferred embodiment, this assignment of instructions to trace buses is static, i.e., the earliest instruction is traced on trace bus 0 , the next on trace bus 1 , and so on. Alternatively, in another embodiment, this assignment can be dynamic, with the mapping being transmitted each cycle as well.
  • another tracing rule is that if an instruction is traced on a particular instruction trace slot (i.e., a particular output trace bus 490 ), then all other information for that instruction is sent on the signals of the same instruction trace slot. For example, referring to FIG. 7 , in one embodiment, if the PC is output 705 on a trace bus, the other associated data, such as load address, load data, store address, and store data, are traced out 710 on the same trace bus (in subsequent clock cycles).
  • This rule makes it easier for TCB 130 to associate and gather all the information relating to a particular instruction. The exception occurs when the load or store data is not immediately available, such as when a load misses in the cache. In this situation, the data is sent at a later time on any free trace bus, using an out-of-order data signal as described below in more detail.
  • another tracing rule is applied to help TCB 130 gather data on a per-instruction basis and determine the program sequence.
  • This rule coordinates the end cycle of data transfers of the instructions that began tracing together on the same cycle.
  • some types of data may be transmitted on each trace bus in one or more clock cycles.
  • a transaction end coordination rule facilitates sequencing data from the trace buses in proper order.
  • end signals, illustrated as Tend signals in FIG. 8 , are coordinated on each trace bus for the data associated with instructions that began their tracing cycle together.
  • the three above-described tracing rules are utilized together in combination.
  • embodiments of the present invention include using a subset of the three tracing rules (i.e., one or two of the tracing rules).
  • Trace interface 180 may comprise a set of input and output signals from microprocessor core 110 .
  • a “PDO_” prefix may be used to identify signals belonging to the output interface from TGL 120 .
  • a “PDI_” prefix may be used to identify signals belonging to the input interface to TGL 120 .
  • the trace interface may have one output trace bus for each instruction pipeline.
  • the output signal names described may further have a “_n” appended to the signal name to designate a predetermined trace slot.
  • a two-issue microprocessor core may use the signals PDO_InsComp_0 and PDO_InsComp_1 to represent the instruction completion status values of two simultaneously graduating instructions.
  • PDO_IamTracing is a signal sent out from TGL 120 that indicates that the rest of the Out signals represent valid trace data. In effect, PDO_IamTracing is an enable signal for the rest of the Out signals.
  • the PDO_InsComp signal is an instruction completion status signal.
  • PDO_AD is a trace bus for transmitting trace data.
  • PDO_Ttype specifies the transmission type for the transaction on the trace bus. The end of a particular transaction on a trace bus is indicated by a PDO_Tend signal.
  • a PDO_Tmode may be included to indicate the mode of the transmission.
  • a PDO_DataOrder signal is used to provide information on out-of-order data.
  • a PDO_Overflow signal may be used to indicate an overflow error.
  • a PDO_Issue_Tag_n signal is used to provide information on instructions that are issued together. The elements of Table 1 are described below in more detail.
    TABLE 1
    Output Signal Name   Description
    PDO_IamTracing       Global enable signal for signals output from the microprocessor core
    PDO_InsComp_n        Instruction completion status signal for the nth instruction pipeline
    PDO_AD_n             The nth trace bus for trace data
    PDO_Ttype_n          Specifies the transmission type for the transaction on the PDO_AD lines for the nth trace bus
    PDO_Tend_n           Indicates the last cycle of the current transaction
    PDO_TMode            Indicates the transmission mode for the bits transmitted on PDO_AD
    PDO_DataOrder_n      Indicates the out-of-order-ness of load and store data
    PDO_Overflow         Indicates an internal FIFO overflow error
    PDO_Issue_Tag_n      Indicates the instructions with matching tag values have issued together in the multi-issue pipelines
  • PDO_InsComp_n is an instruction completion status signal that is used as an indicator of completed instructions and their type in the processor's pipeline.
  • PDO_InsComp_n can take on the values of Table 2.
    TABLE 2
    PDO_InsComp   Description
    000           No instruction completed this cycle (NI)
    001           Instruction completed this cycle (I)
    010           Instruction completed this cycle was a load (IL)
    011           Instruction completed this cycle was a store (IS)
    100           Instruction completed this cycle was a PC sync (IPC)
    101           Instruction branched this cycle (IB)
    110           Instruction branched this cycle was a load (ILB)
    111           Instruction branched this cycle was a store (ISB)
  • a PDO_InsComp value ‘000’ is associated with a No Instruction complete (NI) indication.
  • for example, the NI indication can be used when the instruction pipeline is stalled. In another example, the NI indication can be used when an instruction is killed due to an exception.
  • the PDO_InsComp values ‘001,’ ‘010,’ and ‘011’ are associated with the completion of instructions within a basic block or predictable jumps and branches to the next basic block. Specifically, ‘001’ is used to signal the completion of a regular instruction (I), ‘010’ is used to signal the completion of a load instruction (IL), and ‘011’ is used to signal the completion of a store instruction (IS). Because the I, IL, or IS indication is associated with the completion of an instruction within a basic block, the PC value of the I, IL, or IS instruction need not be traced.
  • the completion of branch instructions is signaled using the PDO_InsComp values ‘101,’ ‘110,’ and ‘111.’ Specifically, ‘101’ is used to signal the completion of a regular branch instruction (IB), ‘110’ is used to signal the completion of a load-branch instruction (ILB), and ‘111’ is used to signal the completion of a store-branch instruction (ISB).
  • IB regular branch instruction
  • ILB load-branch instruction
  • ISB store-branch instruction
  • the three branch-type encodings (101, 110, and 111) imply that the associated instruction was the target of a taken branch, which may or may not have been statically predictable. In general, a branch is indicated on the first instruction in a new basic block.
  • the PDO_InsComp signal takes the value ILB or ISB to indicate the combined condition of a branch and a load or store, respectively. If the PC cannot be statically predicted for an IB, ILB, or ISB, then the PC is transmitted on the PDO_AD_n trace bus.
  • trace data is output on a trace bus, PDO_AD_n.
  • PDO_AD_n a trace bus
  • TPC PC values
  • TLA load address values
  • TSA store address values
  • TD data values
  • additional trace data beyond PC, address, and data values can also be transmitted on trace bus PDO_AD.
  • because the width of each PDO_AD trace bus may not be adequate to transmit an entire address or data value in one cycle, each transaction may take multiple cycles to transmit.
  • a FIFO is therefore used to hold pending transactions and values. In one embodiment, if a transaction takes multiple cycles, then the least-significant bits are sent first, followed by the more-significant bits.
  • the PDO_Ttype_n signal is used to indicate the type of information being transmitted on the PDO_AD_n bus.
  • the PDO_Ttype_n signal can take on the values of Table 3.
  • TABLE 3
    PDO_TType   Description
    000         No transmission this cycle (NT)
    001         Begin transmitting the PC (TPC)
    010         Begin transmitting the load address (TLA)
    011         Begin transmitting the store address (TSA)
    100         Begin transmitting the data value (TD)
    101         Begin transmitting the processor mode and the 8-bit ASID value (TMOAS)
    110         Begin user-defined trace record, type 1 (TU1)
    111         Begin user-defined trace record, type 2 (TU2)
  • the data types include PC values (TPC), load address values (TLA), store address values (TSA), and data values (TD). These trace data types are identified using the PDO_TType signal values ‘001’ to ‘100,’ respectively.
  • PDO_TType signal value ‘101’ is used to identify the transmission of processor mode and address space identifier (ASID) information.
  • the processor mode and ASID information (PDO_TType value ‘101’) can be included as part of the synchronization information that is periodically transmitted. This portion of the synchronization information enables trace regeneration software 160 to identify the software state of the computer system being traced.
  • the final data types that can be transmitted on trace bus PDO_AD are the user-defined trace records TU1 and TU2. These user-defined trace records are identified using PDO_TType signal values ‘110’ and ‘111,’ respectively.
  • the PDO_Tend_n signal indicates the last cycle of the current transaction on trace bus PDO_AD_n. This signal can be asserted in the same cycle that a transaction is started, implying that the particular transaction took only one cycle to complete.
  • the PDO_Tend signals are synchronized for all the PDO_AD_n transmissions associated with instructions that graduate together.
  • the PDO_Tmode_n signal indicates the transmission mode for the bits transmitted on trace bus PDO_AD_n.
  • the PDO_TMode signal can be used to signal to TCB 130 the type of compression that has been performed on the trace data that is transmitted on trace bus PDO_AD. This mode information is therefore used by TCB 130 to regenerate the program flow accurately.
  • the PDO_DataOrder_n signal is used to indicate the out-of-order nature of data that is traced out.
  • the use of the PDO_DataOrder signal enables TGL 120 to avoid having to include memory for storing data that are returned out-of-order. The data can simply be traced out as soon as they are available.
  • Out-of-order transfers of data are further described in co-pending application Ser. No. 09/751,747, entitled “Configurable Out-Of-Order Data Transfer in a Coprocessor Interface,” which is incorporated herein by reference in its entirety.
  • the PDO_DataOrder signal indicates the position of the data in the list of current outstanding load and stores starting at the oldest.
  • the PDO_DataOrder_n signal can take on the values of Table 4.
    TABLE 4
    PDO_DataOrder   Description
    0000            data from oldest load/store instruction (in-order)
    0001            data from second-oldest load/store instruction
    0010            data from third-oldest load/store instruction
    0011            data from fourth-oldest load/store instruction
    0100            data from fifth-oldest load/store instruction
    0101            data from sixth-oldest load/store instruction
    0110            data from seventh-oldest load/store instruction
    0111            data from eighth-oldest load/store instruction
    1000            data from ninth-oldest load/store instruction
    1001            data from tenth-oldest load/store instruction
    1010            data from eleventh-oldest load/store instruction
    1011            data from twelfth-oldest load/store instruction
    1100            data from thirteenth-oldest load/store instruction
    1101            data from fourteenth-oldest load/store instruction
    1110            data from fifteenth-oldest load/store instruction
  • Table 5 illustrates an example in which the program issues five loads, A, B, C, D, and E, to be traced, along with the corresponding PDO_DataOrder signals.
  • in this example, for a load that hits in the cache, the data is available in the same clock cycle as the instruction.
  • TABLE 5
    Load   Cycle #   Cache op   Data available   Data traced out   PDO_DataOrder
    A      1         Miss       -                -                 -
    B      2         Hit        B                B                 0001 (second oldest)
    C      3         Hit        C                C                 0001 (second oldest)
    D      4         Miss       -                -                 -
    E      5         Hit        E                E                 0010 (third oldest)
    -      k         -          A                A                 0000 (oldest)
    -      k + p     -          D                D                 0000 (oldest)
  • at clock cycle 1, load A misses in the cache and goes to memory. Load A is therefore considered outstanding.
  • at clock cycle 2, load B hits in the cache and is immediately available. Load B is then traced out with the PDO_DataOrder signal indicating that the load data is the second oldest outstanding load. Based on the values of Table 4, the PDO_DataOrder signal will have a value of ‘0001.’ At this point, load A is considered the oldest outstanding load.
  • at clock cycle 3, load C hits in the cache and is immediately available. Load C is then traced out with the PDO_DataOrder signal indicating with a value ‘0001’ that the load data is the second oldest outstanding load. At this point, load A is still considered the oldest outstanding load.
  • Load B is not considered outstanding as it was traced out at clock cycle 2 .
  • at clock cycle 4, load D misses in the cache and goes to memory. Load D is therefore considered outstanding.
  • at this point, load A and load D are the currently outstanding loads.
  • Load A is considered the oldest outstanding load while load D is considered the second oldest outstanding load.
  • at clock cycle 5, load E hits in the cache and is immediately available. Load E is then traced out with the PDO_DataOrder signal indicating with a value ‘0010’ that the load data is the third oldest outstanding load, behind load A and load D.
  • at clock cycle k, load A returns from memory and is available.
  • Load A is then traced out with the PDO_DataOrder signal indicating with a value ‘0000’ that the load data is the oldest outstanding load.
  • at clock cycle k + p, load D returns from memory and is available.
  • Load D is then traced out with the PDO_DataOrder signal indicating with a value ‘0000’ that the load data is the oldest outstanding load.
  • the PDO_Overflow signal is used to indicate that the current tracing is being abandoned due to a FIFO overflow. In this situation, TGL 120 discards all entries in FIFO 348 and restarts transmission from the next completed instruction. It should be noted that the first instruction to be signaled after the assertion of the PDO_Overflow signal should have its PC value sent as well. In effect, that instruction is treated as an IB, ILB, or ISB instruction.
  • the tracing of the PC value is important where the PC value could not be statically predicted. Without this information, trace regeneration software 160 is unable to reconstruct the program execution path. For example, if the branch was unpredictable and the unpredictability lies in the branch target address, then the PC value should be transmitted. If the unpredictability lies in the branch condition (i.e., determining if the branch is taken or not), on the other hand, then the branch target PC value need not be transmitted. Here, it is sufficient to simply indicate that the branch was taken. For branch instructions where there is a jump in PC, several options exist.
  • the following rules can be applied: (1) when the branch is unconditional and the branch target is predictable, IB, ILB, or ISB is used for the PDO_InsComp value, and the PC value is not traced out; (2) when the branch is conditional, and the branch target is predictable, IB, ILB, or ISB is used only when the branch is taken and there is no need to trace out the PC value; and (3) when the branch is conditional or unconditional, and the branch target is unpredictable, IB, ILB, or ISB is used and the PC value is traced out using TPC for the PDO_TType signal.
  • the PC value can be transmitted (a) after a JR or JALR instruction; (b) after a control transfer to an exception handler; (c) after a return from exception (ERET or DERET instruction); and (d) for resynchronization purposes.
  • the user may require that the target address and data be traced along with the transmitted PC value.
  • for a store, the PC value is sent first, followed by the store address, and finally the store data.
  • for a load, the PC value and load address are sent first, followed by the load data when it becomes available.
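The per-instruction transmission order can be summarized as a list of PDO_TType values. The helper `trace_packets` below is a hypothetical sketch: its parameters stand in for the user's tracing options, and treating late store data the same way as late load data is an assumption based on the data-order mechanism described for FIG. 3.

```python
def trace_packets(ins_comp, trace_pc, trace_addr=False, trace_data=False,
                  data_ready=True):
    """Return (immediate, deferred) PDO_TType values for one instruction.

    ins_comp: PDO_InsComp mnemonic such as "I", "IL", "IS", "ILB", "ISB".
    trace_pc: True when the PC must be sent (TPC).
    trace_addr, trace_data: user-selected address and data tracing.
    data_ready: False when the load/store data is not yet available.
    """
    immediate, deferred = [], []
    if trace_pc:
        immediate.append("TPC")                        # PC always goes first
    is_load = ins_comp in ("IL", "ILB")
    is_store = ins_comp in ("IS", "ISB")
    if (is_load or is_store) and trace_addr:
        immediate.append("TLA" if is_load else "TSA")  # then the address
    if (is_load or is_store) and trace_data:
        # Data follows the address, or is sent later if it is not ready.
        (immediate if data_ready else deferred).append("TD")
    if not immediate:
        immediate.append("NT")                         # nothing on PDO_AD now
    return immediate, deferred
```

With address and data tracing enabled, an ISB yields `["TPC", "TSA", "TD"]`, the same order as the transfer at clock cycles 12-18 of FIG. 9.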
  • FIG. 9 is a timing diagram 900 for a single instruction pipeline illustrating the above-described interface relative to a clock signal Pclk.
  • PDO_InsComp[2:0] has a value IB, indicating the completion of a branch instruction.
  • the value IB represents the completion of an instruction that could not be statically predicted. Accordingly, the PC value for the branch instruction should be traced, thereby enabling trace regeneration software 160 to recreate the execution of a new block of instructions.
  • the PC value for the branch instruction is transmitted on the trace bus PDO_AD[15:0].
  • the PDO_TMode signal indicates the transmission mode for the bits transmitted on trace bus PDO_AD[15:0].
  • PDO_InsComp[2:0] has a value I, indicating the completion of an instruction within a block of instructions. As noted, the completion of an instruction within a block does not require the tracing of the PC value. Accordingly, no transmission occurs on trace bus PDO_AD[15:0].
  • the no-transmission state is also signaled by the PDO_TType signal with an NT value.
  • PDO_InsComp[2:0] has a value IB, indicating the completion of another branch instruction.
  • the PC value is then transmitted on trace bus PDO_AD[15:0] with the data type TPC indicated on PDO_TType[2:0].
  • the transmission of the PC value requires two clock cycles (3 and 4). Accordingly, the PDO_TEnd signal is not asserted until the end of the transaction at clock cycle 4.
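A multi-cycle transfer on the 16-bit PDO_AD[15:0] bus, with PDO_TEnd flagging only the final cycle, can be sketched as follows. The helper `bus_beats` is hypothetical; sending the least-significant 16 bits first is an assumption, and compression and the PDO_TMode options are not modeled. A 32-bit PC split this way needs two beats, which is consistent with the two-cycle PC transfer at clock cycles 3 and 4.

```python
def bus_beats(value, n_bits, ttype, bus_width=16):
    """Split one trace transaction into per-cycle PDO_AD beats.

    Returns a list of (ad_bits, ttype, tend) tuples, one per clock cycle.
    Least-significant chunk first is an assumption of this sketch.
    """
    n_beats = -(-n_bits // bus_width)        # ceiling division
    mask = (1 << bus_width) - 1
    beats = []
    for i in range(n_beats):
        chunk = (value >> (i * bus_width)) & mask
        # PDO_TEnd is asserted only on the last beat of the transaction.
        beats.append((chunk, ttype, i == n_beats - 1))
    return beats
```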
  • PDO_InsComp[2:0] has a value IL, indicating the completion of a load instruction.
  • the PC value need not be transmitted.
  • the user can, however, specify that the load address and data be traced. Assuming that the load hits in the cache, the load address and data are immediately available. The load address is transmitted first on PDO_AD[15:0] at clock cycles 5 and 6, and the load data is transmitted next on PDO_AD[15:0] at clock cycles 7-10. In both cases, the corresponding data type is transmitted on PDO_TType[2:0] using the signal values TLA and TD, respectively.
  • PDO_InsComp[2:0] further signals the completion of IL at clock cycle 5, I at clock cycle 6, NI at clock cycles 7-9, and I at clock cycle 10.
  • None of these instruction-completion indications required a transmission on trace bus PDO_AD[15:0]. Accordingly, the trace data FIFO did not overflow while it waited to be cleared during the six-cycle transmission of the load address and data during clock cycles 5-10.
  • PDO_InsComp[2:0] indicates completion of a branch store instruction ISB.
  • the PC value, store address, and store data are then transmitted on trace bus PDO_AD[15:0] at clock cycles 12-13, 14-16, and 17-18, respectively.
  • PDO_InsComp[2:0] continues to indicate the completion of additional instructions.
  • PDO_InsComp[2:0] indicates the sequential completion of I, IL, IL, IS, IS, and IL instructions at clock cycles 13-18, respectively.
  • Timing diagram 900 illustrates an overflow condition at clock cycle 18.
  • the overflow is indicated by the assertion of the PDO_Overflow signal, which signals an internal FIFO overflow error.
  • a two-issue core can trace two instructions and uses the signals PDO_InsComp_0 and PDO_InsComp_1 to represent the completion status values of two simultaneously graduating instructions.
  • FIG. 10 illustrates the cycle in which each instruction graduates from the reorder buffer and the number of the instruction trace slot (trace bus) that actually traces that instruction.
  • the exemplary instructions are in the MIPS32 assembly language.
  • the SW fragments correspond to store word instructions having a corresponding PDO_InsComp value of IS and a potential PDO_TType value of multiple cycles of TSA and TD.
  • the JAL instruction corresponds to a jump instruction having a corresponding PDO_InsComp value of I and a PDO_TType value of NT.
  • the OR instruction has a PDO_InsComp value of I and a PDO_TType value of NT.
  • the NOP instruction has a corresponding PDO_InsComp value of IB (target of the previous JAL) and a PDO_TType value of NT.
  • the JR instruction has a corresponding PDO_InsComp value of I and a PDO_TType value of NT.
  • the LW instruction has a corresponding PDO_InsComp value of ILB and a PDO_TType value of TPC and potentially TLA and TD.
  • the BEQ instruction has a corresponding PDO_InsComp value of I and a corresponding PDO_TType value of NT.
  • the ADDIU instruction has a corresponding PDO_InsComp value of I and a corresponding PDO_TType value of NT.
  • the OR instruction has a corresponding PDO_InsComp value of I and a PDO_TType value of NT.
  • the NOP instruction has a corresponding PDO_InsComp value of I and a PDO_TType value of NT.
  • the ADDU instruction has a PDO_InsComp of I and a PDO_TType value of TMOAS.
  • the SLTU instruction has a corresponding PDO_InsComp value of I and a PDO_TType value of NT.
  • the instructions in the assembly fragment are also identified by an instruction number (Inst. No.). This example assumes a simple two-issue processor that allows up to one load/store instruction per issue and one branch instruction per cycle. In this example, 0, 1, or 2 instructions can graduate per cycle.
  • a preferred embodiment includes at least one rule to help TCB 130 organize data to reconstruct the program sequence.
  • the trace buses are implicitly ordered to facilitate determining the earliest instruction(s) and earliest data in a graduation cycle.
  • the data on trace bus PDO_AD_k may be assumed to be before the data on PDO_AD_k+1.
  • trace bus 0 may be used to trace out the earliest instruction in the graduation cycle.
  • trace bus 1 is used to trace out the next earliest instruction in the graduation cycle.
  • This rule facilitates keeping track of instruction order when more than one instruction graduates in a graduation cycle.
  • An application of this rule can be seen in FIG. 10 , in which the smallest number instruction in each graduation cycle is always assigned trace bus number 0 . Thus, if two instructions graduate in the cycle, the earliest (smallest number) instruction can be distinguished.
  • Another rule that assists TCB 130 in identifying the data associated with particular instructions is to synchronize, across all trace buses, the end points for transmitting the data from a graduation cycle.
  • data associated with instructions that are traced together on the different PDO_InsComp_n trace buses is transmitted such that the end points (i.e., the last data cycles) are synchronized.
  • This rule facilitates an external trace control block sequencing all the data operations in the various PDO_AD_n buses into the program sequence.
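One way to meet the end-point rule is to delay the shorter transfers so that every bus finishes on the same cycle. The Python sketch below illustrates that policy; the text only requires synchronized end points, so the scheduling choice and the name `align_end_points` are assumptions.

```python
def align_end_points(beats_per_bus, start_cycle=0):
    """Schedule per-bus transfers so they all end on the same cycle.

    beats_per_bus: total PDO_AD beats each trace bus needs for the
    instructions that began tracing in the same graduation cycle.
    Returns one (first_cycle, last_cycle) pair per bus, or None for a
    bus with nothing to send. Delaying the shorter transfers is an
    assumption of this sketch.
    """
    longest = max(beats_per_bus, default=0)
    end = start_cycle + longest - 1          # shared last data cycle
    return [(end - beats + 1, end) if beats else None
            for beats in beats_per_bus]
```

Because every non-empty bus ends on `end`, a receiver can rely on the same-cycle PDO_Tend assertions to know that the group's data is complete.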
  • FIGS. 11-14 illustrate an example of how the above-described rules may be applied to facilitate multi-issue tracing.
  • FIG. 11 shows a block of information corresponding to instruction complete (PDO_InsComp) values in a program sequence, i.e., ILBa is instruction number 1 , ILb is instruction number 2 , ISc is instruction number 3 , and ILd is instruction number 4 .
  • FIG. 12 shows the values of the block of information as they would be transmitted on two instruction trace slots, i.e., PDO_InsComp_0 and PDO_InsComp_1, for graduation cycles n and n+1.
  • instructions traced out on trace slot 0 are presumed to have a lower instruction number than instructions traced out on trace slot 1 .
  • the instructions are traced out in two graduation cycles.
  • FIG. 13 shows the corresponding PDO_AD and PDO_Tend values for the two trace slots.
  • the PDO_AD may correspond to PC, load-addr, load data, store-addr, or store data depending upon the corresponding PDO_InsComp value.
  • the data trace information for the instructions that were simultaneously traced on PDO_InsComp_0 and PDO_InsComp_1 is traced such that their PDO_Tend signals are coordinated to facilitate ordering the data received from the trace buses in program sequence.
  • for the PDO_InsComp values traced in cycle n+1, the data transmission ends in cycle m+9.
  • FIG. 14 illustrates how an external block reading the signals on the interface can take the data values from the two PDO_AD buses, and knowing the program sequence, can put the traced data in order.
  • the PDO_Tend signal indicates that one transaction is completed; e.g., the PDO_Tend_0 signal at cycle m+2 indicates the completion of the transaction transmitting a first group of PC data, TPCa1 and TPCa2.
  • the PDO_Tend signals in cycle 3 permit a determination that the next data in order are TLAa1, TLAa2, and TLAb1.
  • at cycle 5, all of the transfers are complete for the instructions traced in cycle n. Since the transfers complete at the same time, and since the data transferred on trace bus 0 can be assumed to occur earlier than the data on trace bus 1, the data may be sequenced with TDa1, TDa2 before TDb1, TDb2.
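The reconstruction in FIG. 14 amounts to a merge keyed on each transaction's end cycle, with bus order breaking ties. The sketch assumes a simple per-cycle record of what each PDO_AD bus carried and whether PDO_Tend was asserted; the function name and data layout are hypothetical, and the cycle numbers in the example are illustrative rather than taken from the figure.

```python
def merge_trace_buses(cycles):
    """Rebuild one data stream from several PDO_AD buses.

    cycles: list indexed by clock cycle; each entry holds one item per
    trace bus, either None (bus idle) or a (beat, tend) pair. A
    transaction is the run of beats on a bus up to and including the
    beat flagged with PDO_Tend. Completed transactions are emitted by
    end cycle, and for equal end cycles bus k comes before bus k+1.
    """
    pending = {}   # bus index -> beats of the transaction in progress
    done = []      # (end_cycle, bus, beats)
    for cycle, buses in enumerate(cycles):
        for bus, slot in enumerate(buses):
            if slot is None:
                continue
            beat, tend = slot
            pending.setdefault(bus, []).append(beat)
            if tend:
                done.append((cycle, bus, pending.pop(bus)))
    done.sort(key=lambda t: (t[0], t[1]))
    return [beat for _, _, beats in done for beat in beats]


example = [
    [("TPCa1", False), None],
    [("TPCa2", True), None],
    [("TLAa1", False), None],
    [("TLAa2", True), ("TLAb1", True)],
    [("TDa1", False), ("TDb1", False)],
    [("TDa2", True), ("TDb2", True)],
]
print(merge_trace_buses(example))
# ['TPCa1', 'TPCa2', 'TLAa1', 'TLAa2', 'TLAb1', 'TDa1', 'TDa2', 'TDb1', 'TDb2']
```

The printed order is the one described for FIG. 14: the PC data first, then TLAa before TLAb, then TDa before TDb.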
  • the present invention may be applied to out-of-order loads and stores in the multi-pipe core.
  • when a multi-issue processor needs to send out-of-order data, it uses the PDO_DataOrder signal to indicate that the data is out of order.
  • When out-of-order data is returned, it can be traced on any free PDO_AD_n bus, not necessarily the one that traced the corresponding instruction. This is because instruction tracing is sequentialized by the PDO_InsComp_n order, so the data can be associated with the correct instruction once the PDO_DataOrder value is known. Note that since the PDO_AD_n trace buses are implicitly ordered, for data transmissions that end on the same cycle, the data on PDO_AD_k is before the data on PDO_AD_k+1.
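The association of late data with its instruction can be sketched as a list of loads still waiting for data. This Python model is hypothetical: the text establishes only that the value '0000' marks the oldest outstanding load, so reading a general value k as the k-th oldest outstanding load is an assumption. Because the match uses only the PDO_DataOrder value, the bus that carried the data does not matter.

```python
class OutstandingLoads:
    """Match late load data to instructions using PDO_DataOrder.

    Assumption of this sketch: a DataOrder value k names the k-th oldest
    load still waiting for its data (k = 0, i.e. '0000', is the oldest).
    """

    def __init__(self):
        self.waiting = []          # instruction ids, oldest first

    def traced_without_data(self, inst_id):
        """Record a load whose instruction was traced before its data."""
        self.waiting.append(inst_id)

    def data_arrived(self, data_order, data):
        """Return (instruction id, data) for data arriving on any bus."""
        inst_id = self.waiting.pop(data_order)
        return inst_id, data
```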
  • the processor tags all the instructions that issue together using the signal PDO_IssueTag_n. This tag value is also traced out with each PDO_InsComp_n value. In one embodiment, a 6-bit tag value is used, assuming an issue window of about 64 instructions. Note that this tag information can be traced out of the TCB only if the user requires it; hence it does not consume bandwidth on the external pins unless there is a real need for the information. Thus, it is recommended that the TCB allow external tracing of this information at the user's discretion.
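A 6-bit tag gives 2^6 = 64 distinct values. Since tags wrap around, they can only distinguish issue groups that are in flight at the same time. The sketch below assigns one tag per issue group and regroups traced instructions by tag; the function names and the per-group counter are assumptions for illustration.

```python
from collections import defaultdict

TAG_BITS = 6                    # tag width quoted in the text
TAG_VALUES = 1 << TAG_BITS      # 64 distinct tags before wrap-around


def assign_issue_tags(issue_groups, first_tag=0):
    """Give all instructions in one issue group the same tag (mod 64)."""
    tags = {}
    for index, group in enumerate(issue_groups):
        for inst in group:
            tags[inst] = (first_tag + index) % TAG_VALUES
    return tags


def group_by_tag(traced, tags):
    """Regroup traced instructions (graduation order) by issue tag."""
    groups = defaultdict(list)
    for inst in traced:
        groups[tags[inst]].append(inst)
    return dict(groups)
```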
  • the invention can be embodied in a computer usable medium configured to store computer readable code (e.g., computer readable program code, data, etc).
  • the computer code causes the enablement of the functions or fabrication, or both, of the invention disclosed herein.
  • this can be accomplished through the use of general programming languages (e.g., C, C++, JAVA, etc.); GDSII databases; hardware description languages (HDL) including Verilog HDL, VHDL, Altera Hardware Description Language (AHDL), and so on; or other programming and/or circuit (i.e., schematic) capture tools available in the art.
  • the computer code can be disposed in any known computer usable (e.g., readable) medium including semiconductor memory, magnetic disk, optical disc (e.g., CD-ROM, DVD-ROM, etc.) and as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., carrier wave or any other medium including digital, optical or analog-based medium).
  • the code can be transmitted over communication networks including the Internet and intranets.
  • the invention can be embodied in computer code (e.g., as an HDL program) as part of a semiconductor intellectual property core (e.g., a microprocessor core) or a system level design (e.g., a system on chip) and transformed to hardware as part of the production of integrated circuits. Also, the invention may be embodied as a combination of hardware and computer code.
  • the tracing method of the present invention is not limited to processors including a reorder buffer.
  • the tracing method of the present invention may be applied to multi-issue processors having an element for placing instructions back in program sequence order.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)
  • Advance Control (AREA)

Abstract

A system and method for program counter and data tracing in a multi-issue processor is disclosed. Instructions are traced in program sequence order. In one embodiment instructions are traced in graduation order from a reorder buffer. The tracing mechanism of the present invention enables increased visibility into the hardware and software state of the processor core.

Description

    FIELD OF THE INVENTION
  • The present invention relates generally to on-chip debugging, and more specifically to program counter (PC) and data tracing in embedded processor systems.
  • BACKGROUND OF THE INVENTION
  • Computer systems process information according to a program that includes a sequence of instructions defined by an application program or an operating system. Typically, a program counter provides a series of memory addresses that are used by the processor for fetching instructions stored in the associated memory. In this process, the processor conveys the memory address to the memory over an address bus, and the memory responds over an instruction/data bus with the instruction stored in the addressed memory location. The instructions stored in the memory constitute the program to be executed.
  • Program development relies heavily on the verification of the instructions stored in memory as well as their corresponding execution. Typically, these debug efforts are supported by instruction tracing tools that generate a listing of executed instructions during the execution of a program.
  • The increased control and flexibility in the generation of tracing data is particularly important for the embedded processor industry. In the embedded processor industry, specialized on-chip circuitry is often combined with a processor core. However, high performance processors may include features that make it difficult to trace the sequential execution of a program. For example, a multi-issue processor may have out-of-order (OOO) dynamic scheduling, deep pipelines, multi-latency pipelines, or support for outstanding load misses.
  • SUMMARY
  • Broadly speaking, embodiments of the present invention include an apparatus, system, method, computer program product, and data signal embodied in a transmission medium for tracing multi-issue processors in program sequence order. In one embodiment, tracing instructions from a multi-issue processor includes: monitoring a reorder buffer having a graduation cycle for graduating instructions in program order and transmitting trace data for the instructions in graduation order for each graduation cycle along with information that enables a determination of program execution of the instructions. The trace data may be transmitted using a trace interface having a plurality of trace buses. In one embodiment, one or more rules are used to assign trace data to the trace buses to facilitate another element reconstructing the program sequence.
  • One benefit of the present invention is that it facilitates tracing a complex multi-issue microprocessor having one or more features that may disrupt sequential execution of instructions, such as deep pipelines, multi-latency pipelines, multiple outstanding load misses, out-of-order (OOO) instructions, or superscalarity.
  • BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 illustrates a tracing system according to an embodiment of the present invention.
  • FIGS. 2 and 3 illustrate aspects of tracing a single instruction pipeline according to an embodiment of the present invention.
  • FIG. 4 illustrates a portion of a multi-issue processor tracing apparatus according to an embodiment of the present invention.
  • FIG. 5 illustrates an embodiment of a present invention in which a trace interface includes a plurality of trace slots.
  • FIG. 6 is a flow chart illustrating one embodiment of a rule that trace generation logic may use to assign trace buses to graduating instructions.
  • FIG. 7 is a flow chart illustrating an embodiment of a rule that trace generation logic may use in which data associated with an instruction is traced out on the same trace slot.
  • FIG. 8 illustrates a method of coordinating an end time signal for the case that data from a plurality of graduating instructions are traced out on a plurality of trace buses.
  • FIG. 9 is a timing diagram for a single instruction pipeline illustrating signals in an exemplary trace interface.
  • FIG. 10 is a table illustrating a method of tracing instructions in instruction order according to the graduation cycle of a reorder buffer.
  • FIG. 11 is a table illustrating an exemplary program sequence.
  • FIG. 12 illustrates the corresponding instruction complete signals for the program sequence of FIG. 11.
  • FIG. 13 illustrates the trace bus data signals and end point signals associated with FIGS. 11-12.
  • FIG. 14 illustrates an exemplary data sequence reconstructed from the signals of FIGS. 12-13.
  • DETAILED DESCRIPTION
  • Embodiments of the invention are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without departing from the spirit and scope of the invention.
  • FIG. 1 illustrates a tracing system 100 that includes on-chip components identified as microprocessor core 110, trace generation logic (TGL) 120, trace control block (TCB) 130, and test access port (TAP) controller 140. TGL 120 can be embodied as part of microprocessor core 110. TGL 120 is generally operative to generate program counter (PC) and data trace information based on the execution of program code in one or more pipelines within microprocessor core 110. In some embodiments, microprocessor core 110 is a high performance multi-issue microprocessor having one or more features that may disrupt sequential execution of instructions, such as deep pipelines, multi-latency pipelines, multiple outstanding load misses, out-of-order (OOO) instructions, or superscalarity.
  • TGL 120 transmits the generated trace information to TCB 130 via trace interface 180. As described below in more detail, TGL 120 includes logic 480 to monitor a reorder buffer associated with multiple instruction pipelines (not shown in FIG. 1) of processor core 110. TCB 130 captures the trace information that is provided by TGL 120 on trace interface 180 and writes the trace information to trace memory 150 in accordance with a particular set of requirements of trace regeneration software 160. As described below in more detail, trace interface 180 includes trace buses for transmitting trace data, and TGL 120 may be adapted to apply one or more rules for outputting trace data on trace interface 180 so that sufficient information is transmitted to TCB 130 to reconstruct the program execution.
  • Also included on-chip is TAP controller 140. TAP controller 140 includes instruction, data, and control registers as well as circuitry that enables TAP controller 140 to access internal debug registers and to monitor and control the microprocessor core's address and data buses. In one embodiment, TAP controller 140 is based on the extended JTAG (EJTAG) specification developed in part by MIPS Technologies, Inc.
  • The trace information stored in trace memory 150 can be retrieved through trace regeneration software 160. Trace regeneration software 160 is a post-processing software module that enables trace reconstruction. Debugger 170 interfaces with TAP controller 140 and is generally operative to display TAP states as well as provide high-level commands to TAP controller 140. For example, debugger 170 can be used to set breakpoints or examine contents of certain sections of memory.
  • Some aspects of the present invention may be understood with regard to a single instruction pipeline. FIG. 2 illustrates a single instruction pipeline 200 having six stages, labeled as fetch stage 310, decode stage 320, execute stage 330, memory stage 340, align stage 350, and writeback stage 360. In a single instruction pipeline, a tracing point may be placed after any stage beyond which instructions are certain to execute, such as after memory stage 340.
  • FIG. 3 illustrates a single instruction pipeline having an associated data-order determination module 328, compression modules 318 and 338, and a FIFO queue 348. The deferred transmission of load and store data is enabled through the output of a data order signal that is designed to signal the out-of-order nature of load and store data. In the illustrated embodiment, the PC, store address, and load address are immediately provided to compression module 318. Store data and load data, on the other hand, are provided to compression modules 318 and 338 only when the data is available to data order determination module 328. If the data is retrieved from a cache or register, then the data is immediately available to data order determination module 328 and can then be passed on to the compression modules. If, on the other hand, the load data must be requested from memory (e.g., due to a cache miss), then the load data is not immediately available to data order determination module 328. In general, compression modules 318 and 338 are operative to compress the trace data that is to be placed into FIFO 348, where it awaits output onto a trace bus.
  • In a multi-issue pipeline, instructions may be issued out of order, i.e. out of program sequence. Even if instructions are not issued out of order, they may complete their execution out of order. This can happen when pipelines have different latencies or because of cache misses. To ensure correct execution of the program, the instructions are typically put back in order at the back-end of the pipeline in a structure called the reorder buffer.
  • In one embodiment of the present invention, instructions are traced out from a reorder buffer. FIG. 4 is a block diagram illustrating some aspects of an embodiment of an on-chip tracing apparatus 400 for tracing a multi-issue processor. A multi-issue processor may comprise processors 405 of processor core 110 that include at least two instruction pipelines, each including a plurality of stages, such as a fetch stage 410, decode stage 420, execute stage 430, memory stage 440, align stage 450, and writeback stage 460, as well as issue queue 470 and issue logic 475 having issue slots for each pipeline. For the purposes of illustration, two pipelines are shown, although it will be understood that processors 405 may have any number of instruction pipelines.
  • TGL 120 is communicatively coupled to reorder buffer 470 and is adapted to monitor graduating instructions from a reorder buffer 470. In one implementation, TGL 120 includes an instruction completion logic block module 480 to monitor the graduation of instructions from a reorder buffer 470 associated with instruction pipeline 405. For the purposes of illustration, reorder buffer 470 is shown as a separate element from instruction pipelines 405 although it will be understood that reorder buffer 470 is commonly considered to be a component of instruction pipelines 405.
  • Reorder buffer 470 is responsible for putting the issued instructions back in program sequence order. The reorder buffer holds instructions that complete out of order and graduates them in order, i.e., commits their results in order. The reorder buffer is located at a point in the pipeline where it is certain that the instruction will not stop and can proceed to completion. This is typically a point where it is certain that the instruction will not take an exception or be nullified for any reason.
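In-order graduation from the reorder buffer can be illustrated with a toy model. This Python sketch is not the processor's implementation: `ReorderBuffer`, its per-cycle `width` limit, and the completion flags are assumptions, and the extra load-miss completions mentioned in the next paragraph are ignored.

```python
from collections import OrderedDict


class ReorderBuffer:
    """Toy reorder buffer: complete in any order, graduate in order."""

    def __init__(self, width):
        self.width = width              # assumed: max graduations per cycle
        self.entries = OrderedDict()    # instruction id -> completed?

    def dispatch(self, inst):
        """Enter an instruction in program order."""
        self.entries[inst] = False

    def complete(self, inst):
        """Mark an instruction as finished executing (any order)."""
        self.entries[inst] = True

    def graduate(self):
        """Graduate up to `width` completed instructions from the head."""
        graduated = []
        while self.entries and len(graduated) < self.width:
            inst, done = next(iter(self.entries.items()))
            if not done:
                break                   # an older instruction is still busy
            self.entries.popitem(last=False)
            graduated.append(inst)
        return graduated
```

Instructions 2 and 3 may finish before instruction 1, but nothing graduates until 1 completes; the graduation stream is therefore always in program order, which is what makes it a convenient tracing point.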
  • The number of graduating instructions per cycle will typically not exceed the number of issue slots of the processor (e.g., 0, 1, or 2 instructions per graduation cycle for a two-issue processor). More generally, however, the maximum number of graduating instructions ranges from zero to the number of issue slots at the front of the multi-issue pipeline plus the number of load miss completions from the bus and cache units.
  • TGL 120 monitors instructions at the point of graduation and traces them out on output trace buses 490. Thus, instructions may be traced in program sequence order from a multi-issue processor. Since the maximum number of instructions that may graduate each cycle is typically equal to or greater than the number of issue slots, the number of trace slots (e.g., the number of different portions of trace interface 180 able to simultaneously transmit instruction data) is preferably at least equal to the number of issue slots. Referring to FIG. 5, each trace slot 505 of trace interface 180 thus includes a trace bus for outputting trace data and other associated signals for tracing out an instruction. In one embodiment, the number of trace slots is selected to be equal to the number of issue slots. Thus, in one embodiment of a two-issue processor, trace interface 180 includes at least two output trace buses 490. In some cycles, the number of graduating instructions may be greater than the number of instruction trace slots. In this case, TGL 120 may include a buffer to hold the instruction(s) that could not be traced earlier and trace them during the next cycle, together with the other instructions that graduate in that cycle, while still maintaining the program sequence order.
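The buffering just described, combined with the static assignment of the oldest instruction to trace slot 0, can be sketched as follows. The function name and the use of a single backlog queue are assumptions; the sketch only shows that buffered instructions stay ahead of newly graduating ones, so program order survives across cycles.

```python
from collections import deque


def schedule_trace_slots(graduations, n_slots):
    """Assign graduating instructions to trace slots, cycle by cycle.

    graduations: per-cycle lists of instruction ids, oldest first.
    Returns per-cycle rows of length n_slots; row[k] is the instruction
    traced on slot k, or None when that slot reports no instruction.
    """
    backlog = deque()            # hypothetical overflow buffer in TGL 120
    schedule = []

    def next_row():
        # Oldest waiting instruction goes on slot 0, the next on slot 1, ...
        return [backlog.popleft() if backlog else None for _ in range(n_slots)]

    for grads in graduations:
        backlog.extend(grads)    # buffered instructions stay ahead of new ones
        schedule.append(next_row())
    while backlog:               # flush anything still buffered
        schedule.append(next_row())
    return schedule
```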
  • As previously described, TCB 130 requires sufficient information to permit the sequential program execution to be reconstructed. Consequently, TGL 120 outputs trace information on trace interface 180 in a format that facilitates reconstructing sequential program execution (e.g., by TRS 160) from information received by two or more trace buses over multiple graduation cycles. Embodiments of the present invention include at least one tracing rule to facilitate reconstructing sequential program execution. One tracing rule is a bus order rule. With multiple trace buses, TGL 120 maintains a bus order rule, which dictates that during each cycle, instructions are assigned to trace buses such that the relative instruction order may be determined from the trace bus order. For example, referring to the flow chart of FIG. 6, in one embodiment TGL 120 determines 605 the number of instructions graduating per cycle and assigns 610 these instructions to trace buses depending upon their relative program order. In a preferred embodiment, this assignment of instructions to trace buses is static, i.e., the earliest instruction is traced on trace bus 0, the next on trace bus 1, and so on. Alternatively, in another embodiment, this assignment can be dynamic, with the mapping being transmitted each cycle as well.
  • To reconstruct program sequence execution, another tracing rule is that if an instruction is traced on a particular instruction trace slot (i.e., a particular output trace bus 490), then all other information for that instruction is sent on the signals of the same instruction trace slot. For example, referring to FIG. 7, in one embodiment, if the PC is output 705 on a trace bus, the other associated data, such as load address, load data, store address, and store data, are traced out 710 on the same trace bus (in subsequent clock cycles). This rule helps TCB 130 associate and gather all the information relating to a particular instruction. The exception occurs when the load or store data is not immediately available, such as when a load misses in the cache. In this situation, the data is sent at a later time on any free trace bus, using an out-of-order data signal as described below in more detail.
  • Referring to FIG. 8, another tracing rule is applied to help TCB 130 gather data on a per-instruction basis to determine the program sequence. This rule coordinates the end cycle of the data transfers of instructions that began tracing together in the same cycle. As described below in more detail, some types of data may be transmitted on each trace bus over one or more clock cycles. With a plurality of trace buses, a transaction end coordination rule facilitates sequencing data from the trace buses in the proper order. In one embodiment, end signals, illustrated as Tend signals in FIG. 8, are coordinated on each trace bus for the data associated with instructions that began their tracing cycle together.
  • In a preferred embodiment the three above-described tracing rules are utilized together in combination. However, it will also be understood that embodiments of the present invention include using a subset of the three tracing rules (i.e., one or two of the tracing rules).
  • Having described the general components of tracing system 100, a detailed description of one embodiment of trace interface 180 is now provided to illustrate in more detail one embodiment of a method of multi-issue tracing. Trace interface 180 may comprise a set of input and output signals from microprocessor core 110. A "PDO_" prefix may be used to identify signals belonging to the output interface from TGL 120, while a "PDI_" prefix may be used to identify signals belonging to the input interface to TGL 120. For a multi-issue pipeline, the trace interface may have one output trace bus for each instruction pipeline. The output signal names described may further have a "_n" appended to the signal name to designate a predetermined trace slot. For example, a two-issue microprocessor core may use the signals PDO_InsComp_0 and PDO_InsComp_1 to represent the instruction completion status values of two simultaneously graduating instructions.
  • An exemplary set of output signals includes the signals listed in Table 1 below. PDO_IamTracing is a signal sent out from TGL 120 that indicates that the rest of the Out signals represent valid trace data; in effect, PDO_IamTracing is an enable signal for the rest of the Out signals. The PDO_InsComp signal is an instruction completion status signal. PDO_AD is a trace bus for transmitting trace data. PDO_Ttype specifies the transmission type for the transaction on the trace bus. The end of a particular transaction on a trace bus is indicated by a PDO_Tend signal. A PDO_Tmode signal may be included to indicate the mode of the transmission. A PDO_DataOrder signal is used to provide information on out-of-order data. A PDO_Overflow signal may be used to indicate an overflow error. A PDO_Issue_Tag_n signal is used to provide information on instructions that are issued together. The elements of Table 1 are described below in more detail.
    TABLE 1
    Output Signal Name    Description
    PDO_IamTracing        Global enable signal for signals output from the microprocessor core
    PDO_InsComp_n         Instruction completion status signal for the nth instruction pipeline
    PDO_AD_n              The nth trace bus for trace data
    PDO_Ttype_n           Specifies the transmission type for the transaction on the PDO_AD lines for the nth trace bus
    PDO_Tend_n            Indicates the last cycle of the current transaction
    PDO_TMode             Indicates the transmission mode for the bits transmitted on PDO_AD
    PDO_DataOrder_n       Indicates the out-of-order position of load and store data
    PDO_Overflow          Indicates an internal FIFO overflow error
    PDO_Issue_Tag_n       Indicates that instructions with matching tag values have issued together in the multi-issue pipelines
  • PDO_InsComp_n is an instruction completion status signal that indicates completed instructions and their type in the processor's pipeline. In one embodiment, PDO_InsComp_n can take on the values of Table 2.
    TABLE 2
    PDO_InsComp Description
    000 No instruction completed this cycle (NI)
    001 Instruction completed this cycle (I)
    010 Instruction completed this cycle was a load (IL)
    011 Instruction completed this cycle was a store (IS)
    100 Instruction completed this cycle was a PC sync (IPC)
    101 Instruction branched this cycle (IB)
    110 Instruction branched this cycle was a load (ILB)
    111 Instruction branched this cycle was a store (ISB)
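The Table 2 encoding can be illustrated with a minimal decoding sketch, such as a trace consumer might use. The `InsComp` record, its field names, and the treatment of IPC (PC sync) as an ordinary non-branch completion are assumptions for illustration, not part of the described interface.

```python
from typing import NamedTuple


class InsComp(NamedTuple):
    """Decoded PDO_InsComp_n sample (hypothetical record, not part of the interface)."""
    mnemonic: str
    completed: bool  # an instruction graduated this cycle
    load: bool       # the completing instruction is a load
    store: bool      # the completing instruction is a store
    new_block: bool  # first instruction of a new basic block (IB, ILB, ISB)


# Encodings and mnemonics from Table 2. IPC is treated as an ordinary
# completion here; that choice is an assumption of this sketch.
INS_COMP_TABLE = {
    0b000: InsComp("NI", False, False, False, False),
    0b001: InsComp("I", True, False, False, False),
    0b010: InsComp("IL", True, True, False, False),
    0b011: InsComp("IS", True, False, True, False),
    0b100: InsComp("IPC", True, False, False, False),
    0b101: InsComp("IB", True, False, False, True),
    0b110: InsComp("ILB", True, True, False, True),
    0b111: InsComp("ISB", True, False, True, True),
}


def decode_ins_comp(value: int) -> InsComp:
    """Decode one 3-bit PDO_InsComp_n sample."""
    if value not in INS_COMP_TABLE:
        raise ValueError(f"PDO_InsComp_n is a 3-bit field, got {value!r}")
    return INS_COMP_TABLE[value]
```

For example, `decode_ins_comp(0b110)` yields the ILB record with both `load` and `new_block` set, signaling that a load address and data (and, if the target was not statically predictable, a PC) may follow on the matching PDO_AD_n bus.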
  • A PDO_InsComp value of ‘000’ is associated with a No Instruction completed (NI) indication. In one example, the NI indication can be used when the instruction pipeline is stalled. In another example, the NI indication can be used when an instruction is killed due to an exception.
  • The PDO_InsComp values ‘001,’ ‘010,’ and ‘011’ are associated with the completion of instructions within a basic block or of predictable jumps and branches to the next basic block. Specifically, ‘001’ signals the completion of a regular instruction (I), ‘010’ signals the completion of a load instruction (IL), and ‘011’ signals the completion of a store instruction (IS). Because the I, IL, or IS indication is associated with the completion of an instruction within a basic block, the PC value of the I, IL, or IS instruction need not be traced.
  • When PDO_InsComp indicates that the completing instruction is a store, the store address and data are also transmitted, provided that the user requires those values to be traced. Similarly, when PDO_InsComp indicates that the completing instruction is a load, the load address and data are also transmitted, provided that the user requires those values to be traced. In general, if the load instruction hits in the cache, the trace data for the load instruction is transmitted in a similar manner to the trace data for a store instruction.
  • The completion of branch instructions is signaled using the PDO_InsComp values ‘101,’ ‘110,’ and ‘111.’ Specifically, ‘101’ signals the completion of a regular branch instruction (IB), ‘110’ signals the completion of a load-branch instruction (ILB), and ‘111’ signals the completion of a store-branch instruction (ISB). The three branch-type encodings (101, 110, and 111) imply that the associated instruction was the target of a taken branch, whether or not that branch could be statically predicted. In general, a branch is indicated on the first instruction in a new basic block. When this first instruction is a load or a store, the PDO_InsComp signal takes the value ILB or ISB, respectively, to indicate the combined condition of a branch and a load or store. If the PC cannot be statically predicted for an IB, ILB, or ISB, the PC is transmitted on the PDO_AD_n trace bus.
  • As previously described, trace data is output on a trace bus, PDO_AD_n. In general, when a PC change, load/store address, or load/store data for the nth trace bus needs to be traced, these pieces of trace information are all sent out on the same PDO_AD_n trace bus. The width of the PDO_AD trace bus is implementation dependent; in one embodiment, it is 32 bits wide. A first set of trace data includes PC values (TPC), load address values (TLA), store address values (TSA), and data values (TD). These trace data types are identified using the PDO_TType signal values ‘001’ to ‘100,’ respectively. Trace data other than PC, address, and data values can also be transmitted on trace bus PDO_AD. Because the PDO_AD trace bus may not be wide enough to transmit an entire address or data value in one cycle, each transaction may take multiple cycles. A FIFO is therefore used to hold pending transactions and values. In one embodiment, if a transaction takes multiple cycles, the least-significant bits are sent first, followed by the more-significant bits.
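The least-significant-first rule can be illustrated with a minimal sketch that splits a value into per-cycle PDO_AD words and reassembles it on the receiving side. The function names and the choice of a 32-bit value over a 16-bit bus (the PDO_AD[15:0] width shown in FIG. 9) are illustrative assumptions.

```python
def split_lsb_first(value: int, value_bits: int, bus_bits: int) -> list[int]:
    """Split `value` into the PDO_AD words of one transaction, least-significant first."""
    if value < 0 or value >> value_bits:
        raise ValueError("value does not fit in value_bits")
    mask = (1 << bus_bits) - 1
    cycles = -(-value_bits // bus_bits)  # ceiling division: bus cycles needed
    return [(value >> (i * bus_bits)) & mask for i in range(cycles)]


def join_lsb_first(words: list[int], bus_bits: int) -> int:
    """Reassemble a value from PDO_AD words received least-significant first."""
    value = 0
    for i, word in enumerate(words):
        value |= word << (i * bus_bits)
    return value
```

For example, `split_lsb_first(0x80021234, 32, 16)` returns `[0x1234, 0x8002]`, a two-cycle transaction; on a 32-bit bus the same value would need only one cycle.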
  • The PDO_Ttype_n signal is used to indicate the type of information being transmitted on the PDO_AD_n bus. In one embodiment, the PDO_Ttype_n signal can take on the values of Table 3.
    TABLE 3
    PDO_TType Description
    000 No transmission this cycle (NT)
    001 Begin transmitting the PC (TPC)
    010 Begin transmitting the load address (TLA)
    011 Begin transmitting the store address (TSA)
    100 Begin transmitting the data value (TD)
    101 Begin transmitting the processor mode and the 8-bit ASID value (TMOAS)
    110 Begin user-defined trace record - type 1 (TU1)
    111 Begin user-defined trace record - type 2 (TU2)
  • As illustrated in Table 3, various data types can be output on each trace bus PDO_AD. In addition to the PC, address, and data types (TPC, TLA, TSA, and TD), the PDO_TType signal value ‘101’ (TMOAS) identifies the transmission of processor mode and address space identifier (ASID) information. The processor mode and ASID information can be included as part of the synchronization information that is periodically transmitted. This portion of the synchronization information enables trace regeneration software 160 to identify the software state of the computer system being traced. The final data types that can be transmitted on trace bus PDO_AD are the user-defined trace records TU1 and TU2, identified using PDO_TType signal values ‘110’ and ‘111,’ respectively.
  • Generally, the PDO_Tend_n signal indicates the last cycle of the current transaction on trace bus PDO_AD_n. This signal can be asserted in the same cycle in which a transaction is started, implying that the transaction took only one cycle to complete. In a multi-issue core, the PDO_Tend signals are synchronized for all the PDO_AD_n transmissions associated with instructions that graduate together.
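A trace consumer can use PDO_TType and PDO_Tend_n together to delimit transactions on one trace bus. The following minimal sketch assumes that PDO_TType carries a Table 3 code in the first cycle of a transaction and that its value in continuation cycles can be ignored; the input format, `Transaction` record, and function name are hypothetical.

```python
from typing import Iterable, NamedTuple

# Transmission types from Table 3 (000, NT, means nothing was sent this cycle).
TTYPE_NAMES = {0b001: "TPC", 0b010: "TLA", 0b011: "TSA", 0b100: "TD",
               0b101: "TMOAS", 0b110: "TU1", 0b111: "TU2"}


class Transaction(NamedTuple):
    kind: str    # Table 3 mnemonic
    words: list  # PDO_AD words, in transmission order
    start: int   # cycle in which the transaction began
    end: int     # cycle in which PDO_Tend_n was asserted


def parse_bus(samples: Iterable[tuple[int, int, bool]]) -> list[Transaction]:
    """Group per-cycle (ttype, ad, tend) samples of one trace bus into transactions."""
    done = []
    current = None  # (kind, words, start) of the transaction in progress
    for cycle, (ttype, ad, tend) in enumerate(samples):
        if current is None:
            if ttype == 0b000:  # NT: the bus is idle this cycle
                continue
            current = (TTYPE_NAMES[ttype], [], cycle)
        current[1].append(ad)
        if tend:  # last cycle of this transaction (may also be its first)
            kind, words, start = current
            done.append(Transaction(kind, words, start, cycle))
            current = None
    return done  # a transaction still open at the end is incomplete and dropped
```

For example, the samples `[(0b001, 0x1234, False), (0b000, 0x8002, True), (0b000, 0, False), (0b010, 0xBEEF, True)]` parse into a two-cycle TPC transaction ending in cycle 1 and a one-cycle TLA transaction that starts and ends in cycle 3, the case in which PDO_Tend_n is asserted in the starting cycle.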
  • The PDO_TMode signal indicates the transmission mode for the bits transmitted on trace bus PDO_AD_n. It can be used to signal to TCB 130 the type of compression that has been performed on the trace data transmitted on trace bus PDO_AD, and TCB 130 uses this mode information to regenerate the program flow accurately.
  • The PDO_DataOrder_n signal is used to indicate the out-of-order nature of data that is traced out. In general, the use of the PDO_DataOrder signal enables TGL 120 to avoid having to include memory for storing data that are returned out-of-order. The data can simply be traced out as soon as they are available. Out-of-order transfers of data are further described in co-pending application Ser. No. 09/751,747, entitled “Configurable Out-Of-Order Data Transfer in a Coprocessor Interface,” which is incorporated herein by reference in its entirety.
  • The PDO_DataOrder signal indicates the position of the data in the list of currently outstanding loads and stores, counting from the oldest. In one embodiment, the PDO_DataOrder_n signal can take on the values of Table 4.
    TABLE 4
    PDO_DataOrder Description
    0000 data from oldest load/store instruction (is in-order)
    0001 data from second-oldest load/store instruction
    0010 data from third-oldest load/store instruction
    0011 data from fourth-oldest load/store instruction
    0100 data from fifth-oldest load/store instruction
    0101 data from sixth-oldest load/store instruction
    0110 data from seventh-oldest load/store instruction
    0111 data from eighth-oldest load/store instruction
    1000 data from ninth-oldest load/store instruction
    1001 data from tenth-oldest load/store instruction
    1010 data from eleventh-oldest load/store instruction
    1011 data from twelfth-oldest load/store instruction
    1100 data from thirteenth-oldest load/store instruction
    1101 data from fourteenth-oldest load/store instruction
    1110 data from fifteenth-oldest load/store instruction
    1111 data from sixteenth-oldest load/store instruction
  • Table 5 illustrates an example in which the program issues five loads to be traced, A, B, C, D, and E, along with the corresponding PDO_DataOrder signals. For simplicity, this example assumes that on a cache hit the data is available in the same clock cycle as the instruction.
    TABLE 5
    Load  Cycle#  CacheOp  Data Available  Data Traced Out  PDO_DataOrder
    A     1       Miss     -               -                -
    B     2       Hit      B               B                0001 (second oldest)
    C     3       Hit      C               C                0001 (second oldest)
    D     4       Miss     -               -                -
    E     5       Hit      E               E                0010 (third oldest)
    -     k       -        A               A                0000 (oldest)
    -     k + p   -        D               D                0000 (oldest)
  • In clock cycle 1, load A misses in the cache and goes to memory; load A is therefore outstanding. In clock cycle 2, load B hits in the cache and its data is immediately available. Load B is traced out with the PDO_DataOrder signal indicating that the load data belongs to the second-oldest outstanding load, which, based on Table 4, is the value ‘0001.’ Load A remains the oldest outstanding load. In clock cycle 3, load C hits in the cache and is traced out with a PDO_DataOrder value of ‘0001,’ again indicating the second-oldest outstanding load: load A is still the oldest outstanding load, and load B is no longer outstanding because it was traced out in clock cycle 2. In clock cycle 4, load D misses in the cache and goes to memory, so loads A and D are now outstanding, with load A the oldest and load D the second oldest. In clock cycle 5, load E hits in the cache and is traced out with a PDO_DataOrder value of ‘0010,’ indicating that it is the third-oldest outstanding load, behind loads A and D. In clock cycle k, load A returns from memory and is traced out with a PDO_DataOrder value of ‘0000,’ indicating the oldest outstanding load. Finally, in clock cycle k+p, load D returns from memory and is traced out with a PDO_DataOrder value of ‘0000’; with load A no longer outstanding, load D is now the oldest outstanding load.
  • The PDO_Overflow signal indicates that the current tracing is being abandoned due to a FIFO overflow. In this situation, TGL 120 discards all entries in FIFO 348 and restarts transmission from the next completed instruction. The first instruction signaled after the assertion of the PDO_Overflow signal should also have its PC value sent; in effect, that instruction is treated as an IB, ILB, or ISB instruction.
  • In general, tracing the PC value is essential wherever the PC value cannot be statically predicted; without this information, trace regeneration software 160 is unable to reconstruct the program execution path. For example, if a branch is unpredictable because of its target address, the PC value should be transmitted. If, on the other hand, the unpredictability lies in the branch condition (i.e., whether the branch is taken or not), the branch target PC value need not be transmitted; it is sufficient to indicate that the branch was taken. For branch instructions where there is a jump in PC, several options exist. In one embodiment, the following rules can be applied: (1) when the branch is unconditional and the branch target is predictable, IB, ILB, or ISB is used for the PDO_InsComp value, and the PC value is not traced out; (2) when the branch is conditional and the branch target is predictable, IB, ILB, or ISB is used only when the branch is taken, and there is no need to trace out the PC value; and (3) when the branch is conditional or unconditional and the branch target is unpredictable, IB, ILB, or ISB is used and the PC value is traced out using TPC for the PDO_TType signal. As examples, the PC value can be transmitted (a) after a JR or JALR instruction; (b) after a control transfer to an exception handler; (c) after a return from exception (ERET or DERET instruction); and (d) for resynchronization purposes. For ISB and ILB indications, the user may require that the load/store address and data be traced along with the transmitted PC value. In particular, for an ISB indication, the PC value is sent first, followed by the store address, and finally the store data. For an ILB indication, the PC value and load address are sent first, followed by the load data when it becomes available.
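The bookkeeping behind the PDO_DataOrder values in Table 5 can be sketched as a list of outstanding loads and stores, oldest first: each load or store joins the list when it executes, and when its data becomes available its current position in the list is the PDO_DataOrder value, after which it leaves the list. The event format and function name below are hypothetical; the sketch only reproduces the rule illustrated by Table 5.

```python
def data_order_trace(events):
    """events: ("issue", name) when a load/store executes, ("data", name) when
    its data becomes available. Returns (name, PDO_DataOrder) for each data trace."""
    outstanding = []  # loads/stores whose data has not yet been traced, oldest first
    traced = []
    for kind, name in events:
        if kind == "issue":
            outstanding.append(name)
        elif kind == "data":
            position = outstanding.index(name)  # 0 means oldest, i.e. in order
            if position > 0b1111:
                raise ValueError("PDO_DataOrder (4 bits) covers only 16 outstanding entries")
            traced.append((name, format(position, "04b")))
            outstanding.pop(position)
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return traced


# The Table 5 sequence: A and D miss; B, C, and E hit; A returns at cycle k, D at k+p.
table5 = [("issue", "A"), ("issue", "B"), ("data", "B"), ("issue", "C"),
          ("data", "C"), ("issue", "D"), ("issue", "E"), ("data", "E"),
          ("data", "A"), ("data", "D")]
```

Here `data_order_trace(table5)` returns `[("B", "0001"), ("C", "0001"), ("E", "0010"), ("A", "0000"), ("D", "0000")]`, matching the PDO_DataOrder column of Table 5.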
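Rules (1) through (3) can be condensed into a small decision function. This is a minimal sketch under stated assumptions: the parameter names and the `kind` labels are hypothetical, and a not-taken conditional branch is assumed to need neither a branch encoding nor a PC, since execution continues in the same basic block.

```python
def branch_trace_action(conditional: bool, target_predictable: bool,
                        taken: bool = True) -> tuple[bool, bool]:
    """Return (use_branch_encoding, trace_pc) for the first instruction
    reached after a branch or jump."""
    if conditional and not taken:
        return False, False  # assumption: fall-through stays in the basic block
    if target_predictable:
        return True, False   # rules (1) and (2): IB/ILB/ISB, no PC traced
    return True, True        # rule (3): IB/ILB/ISB plus the PC via TPC


def ins_comp_mnemonic(kind: str, new_block: bool) -> str:
    """Pick the Table 2 mnemonic for a completing instruction.
    kind is one of "other", "load", "store" (hypothetical labels)."""
    base = {"other": "I", "load": "IL", "store": "IS"}[kind]
    return base + "B" if new_block else base
```

For an unconditional jump with an unpredictable target (such as JR) that lands on a load, `branch_trace_action(False, False)` returns `(True, True)` and `ins_comp_mnemonic("load", True)` returns `"ILB"`, the ILB-plus-TPC combination described for the LW instruction in FIG. 10.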
  • FIG. 9 is a timing diagram 900 illustrating the above-described interface for a single instruction pipeline relative to a clock signal Pclk. At clock cycle 1, PDO_InsComp[2:0] has the value IB, indicating the completion of a branch instruction whose target could not be statically predicted. Accordingly, the PC value for the branch instruction should be traced, enabling trace regeneration software 160 to recreate the execution of a new block of instructions.
  • The PC value for the branch instruction is transmitted on the trace bus PDO_AD[15:0]. The PDO_TMode signal indicates the transmission mode for the bits transmitted on trace bus PDO_AD[15:0]. At clock cycle 2, PDO_InsComp[2:0] has a value I, indicating the completion of an instruction within a block of instructions. As noted, the completion of an instruction within a block does not require the tracing of the PC value. Accordingly, no transmission occurs on trace bus PDO_AD[15:0]. The no transmission state is also signaled by the PDO_TType signal with a NT value.
  • At clock cycle 3, PDO_InsComp[2:0] has a value IB, indicating the completion of another branch instruction. The PC value is then transmitted on trace bus PDO_AD[15:0] with the data type TPC indicated on PDO_TType[2:0]. As illustrated, the transmission of the PC value requires two clock cycles (3 and 4). Accordingly, the PDO_TEnd signal is not asserted until the end of the transaction at clock cycle 4. Also occurring at clock cycle 4 is the signaling of value I on PDO_InsComp[2:0]. This indicates the completion of an instruction within a block of instructions and no transmission on trace bus PDO_AD[15:0] is required.
  • At clock cycle 5, PDO_InsComp[2:0] has the value IL, indicating the completion of a load instruction. Here, the PC value need not be transmitted. The user can, however, specify that the load address and data be traced. Assuming that the load hits in the cache, the load address and data are immediately available. The load address is transmitted first on PDO_AD[15:0] at clock cycles 5 and 6, and the load data is transmitted next on PDO_AD[15:0] at clock cycles 7-10. In both cases, the corresponding data type is signaled on PDO_TType[2:0] using the values TLA and TD, respectively.
  • During the load address and data transmission at clock cycles 5-10, PDO_InsComp[2:0] further signals the completion of IL at clock cycle 5, I at clock cycle 6, NI at clock cycles 7-9, and I at clock cycle 10. None of these instruction-completion indications required a transmission on trace bus PDO_AD[15:0]. Accordingly, the trace data FIFO did not overflow while it waited to be emptied during the six-cycle transmission of the load address and data at clock cycles 5-10.
  • At clock cycle 12, PDO_InsComp[2:0] indicates the completion of a branch store instruction (ISB). The PC value, store address, and store data are then transmitted on trace bus PDO_AD[15:0] at clock cycles 12-13, 14-16, and 17-18, respectively. While the trace data for the ISB indication is being transmitted, however, PDO_InsComp[2:0] continues to indicate the completion of additional instructions: specifically, the sequential completion of I, IL, IL, IS, IS, and IL instructions at clock cycles 13-18, respectively.
  • While the completion of instruction I at clock cycle 13 does not require tracing of any data, the completion of the IL and IS instructions at each of clock cycles 14-18 can require tracing of a load/store address and data. Each of these pieces of trace data continues to fill FIFO 348 while the trace data associated with the ISB instruction at clock cycle 12 completes its transmission on trace bus PDO_AD. Because FIFO 348 is being filled faster than it is being emptied, it eventually overflows, as shown at clock cycle 18 of timing diagram 900. The overflow is signaled by the assertion of the PDO_Overflow signal, indicating an internal FIFO overflow error.
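The overflow mechanism can be explored with a simple occupancy model. This minimal sketch assumes a FIFO holding `depth` trace words, one word drained onto PDO_AD per cycle, and words pushed in a cycle being eligible to drain in that same cycle; the depth and per-cycle word counts are hypothetical parameters, since the size of FIFO 348 is not given here. On overflow the model discards all entries, as described for PDO_Overflow.

```python
def simulate_fifo(words_per_cycle: list[int], depth: int) -> list[int]:
    """Return the cycles in which PDO_Overflow would be asserted.

    words_per_cycle[i] is the number of trace words pushed in cycle i.
    """
    occupancy = 0
    overflow_cycles = []
    for cycle, pushed in enumerate(words_per_cycle):
        occupancy += pushed
        if occupancy > depth:
            overflow_cycles.append(cycle)
            occupancy = 0  # TGL discards every FIFO entry and restarts tracing
            continue
        if occupancy:
            occupancy -= 1  # one word leaves on the PDO_AD bus this cycle
    return overflow_cycles
```

With a depth of 8, `simulate_fifo([6, 0, 4, 4, 4, 4, 4], 8)` returns `[3, 6]`: a run of loads and stores that each push four words outpaces the one-word-per-cycle drain, much as the IL and IS completions following the ISB do in FIG. 9.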
  • Now that one implementation of trace interface 180 has been described, an illustrative example of a method of tracing from the point of graduation from a reorder buffer is described with regard to FIGS. 10-14. In this example, a two-issue core can trace two instructions per cycle and uses the signals PDO_InsComp0 and PDO_InsComp1 to represent the completion status values of two simultaneously graduating instructions.
  • FIG. 10 illustrates the cycle in which each instruction graduates from the reorder buffer and the number of the instruction trace slot (trace bus) that actually traces that instruction. The exemplary instructions are in the MIPS32 assembly language. The SW instructions are store word instructions with a PDO_InsComp value of IS and a potential PDO_TType value of multiple cycles of TSA and TD. The JAL instruction is a jump instruction with a PDO_InsComp value of I and a PDO_TType value of NT. The OR instruction has a PDO_InsComp value of I and a PDO_TType value of NT. The NOP instruction has a PDO_InsComp value of IB (it is the target of the previous JAL) and a PDO_TType value of NT. The JR instruction has a PDO_InsComp value of I and a PDO_TType value of NT. The LW instruction has a PDO_InsComp value of ILB and a PDO_TType value of TPC and potentially TLA and TD. The BEQ instruction has a PDO_InsComp value of I and a PDO_TType value of NT. The ADDIU instruction has a PDO_InsComp value of I and a PDO_TType value of NT. A second OR instruction has a PDO_InsComp value of I and a PDO_TType value of NT. A second NOP instruction has a PDO_InsComp value of I and a PDO_TType value of NT. The ADDU instruction has a PDO_InsComp value of I and a PDO_TType value of TMOAS. The SLTU instruction has a PDO_InsComp value of I and a PDO_TType value of NT. For simplicity, the instructions in the assembly fragment are also identified by an instruction number (Inst. No.). This example assumes a simple two-issue processor that allows up to one load/store instruction per issue and one branch instruction per cycle. In this example, 0, 1, or 2 instructions can graduate per cycle.
  • A preferred embodiment includes at least one rule to help TCB 130 organize data to reconstruct the program sequence. One rule is that the trace buses are implicitly ordered, to facilitate determining the earliest instruction(s) and earliest data in a graduation cycle: for data transmissions that end on the same cycle, the data on trace bus PDO_AD_k may be assumed to precede the data on PDO_AD_k+1. For example, if there are two trace buses 0 and 1 for a two-issue processor, trace bus 0 may be used to trace out the earliest instruction in the graduation cycle and trace bus 1 the next earliest instruction in the graduation cycle. This rule facilitates keeping track of instruction order when more than one instruction graduates in a graduation cycle. An application of this rule can be seen in FIG. 10, in which the lowest-numbered instruction in each graduation cycle is always assigned trace bus number 0. Thus, if two instructions graduate in the same cycle, the earlier (lower-numbered) instruction can be distinguished.
  • Another rule is that if an instruction is traced out on a particular instruction trace slot, say on PDO_InsComp_k, then all other information for that instruction is sent on the signals of the kth instruction trace slot (e.g., the kth trace bus). For example, the address and data, if any, associated with that instruction are also sent on the PDO_AD_k bus. This facilitates reconstructing the program sequence, since both instructions and their associated data are received from the same trace bus. However, in one embodiment, an exception is made when the data is not immediately available. In this situation the data can be sent on any PDO_AD_n bus that is temporarily free and hence chosen by the processor to send that data, e.g., by using the out-of-order load/store data (PDO_DataOrder) signals.
  • Another rule that helps TCB 130 identify the data associated with particular instructions is to synchronize, across all trace buses, the end points of the data transmissions from a graduation cycle. In one embodiment, data associated with instructions that are traced together on the different PDO_InsComp_n trace slots are transmitted such that their end points (i.e., the last data cycles) are synchronized. This rule helps an external trace control block sequence all the data operations on the various PDO_AD_n buses into program order.
  • FIGS. 11-14 illustrate an example of how the above-described rules may be applied to facilitate multi-issue tracing. FIG. 11 shows a block of instruction completion (PDO_InsComp) values in program sequence: ILBa is instruction number 1, ILb is instruction number 2, ISc is instruction number 3, and ILd is instruction number 4. As previously mentioned, a reorder buffer would sequence the instructions in program order.
  • FIG. 12 shows the values of the block of information as they would be transmitted on two instruction trace slots, PDO_InsComp0 and PDO_InsComp1, for graduation cycles n and n+1. In this example, instructions traced out on trace slot 0 are presumed to have a lower instruction number than instructions traced out on trace slot 1 in the same graduation cycle, and the instructions are traced out in two graduation cycles.
  • FIG. 13 shows the corresponding PDO_AD and PDO_Tend values for the two trace slots. As previously discussed, the PDO_AD data may correspond to a PC, load address, load data, store address, or store data, depending on the corresponding PDO_InsComp value. In this example, the data trace information for the instructions that were simultaneously traced on PDO_InsComp0 and PDO_InsComp1 is traced such that their PDO_Tend signals are coordinated, which facilitates ordering the data received from the trace buses in program sequence. For the PDO_InsComp values traced in cycle n, the data transmission ends in cycle m+5; for the PDO_InsComp values traced in cycle n+1, the data transmission ends in cycle m+9.
  • FIG. 14 illustrates how an external block reading the signals on the interface can take the data values from the two PDO_AD buses and, knowing the program sequence, put the traced data in order. For a particular trace bus, the PDO_Tend signal indicates that one transaction is complete; e.g., the PDO_Tend0 signal at cycle m+2 indicates the completion of the transaction transmitting a first group of PC data, TPCa1 and TPCa2. By giving precedence to transactions that begin earlier in time and applying the trace bus ordering rules, the PDO_Tend signals in cycle m+3 permit a determination that the next data in order are TLAa1, TLAa2, and TLAb1. In cycle m+5, all of the transfers are complete for the instructions traced in cycle n. Since these transfers complete at the same time, and since the data transferred on trace bus 0 can be assumed to precede the data transferred on trace bus 1, the data may be sequenced as TDa1, TDa2 before TDb1, TDb2.
  • The present invention may be applied to out-of-order loads and stores in the multi-pipe core. When a multi-issue processor needs to send out-of-order data, it uses the PDO_DataOrder signal to indicate that the data is out of order. When out-of-order data is returned, it can be traced on any free PDO_AD_n bus, not necessarily the one that traced the corresponding instruction. This is possible because instruction tracing is sequentialized by the PDO_InsComp_n order, so the data can be associated with the correct instruction once the PDO_DataOrder value is known. Note that because the PDO_AD_n trace buses are implicitly ordered, for data transmissions that end on the same cycle, the data on PDO_AD_k precedes the data on PDO_AD_k+1.
  • With the method of tracing graduating instructions in sequence, it is not possible to know which instructions issued together without additional information. This information might be useful for tuning a code optimizer for high-performance microprocessors. In one embodiment, to trace this information, the processor tags all the instructions that issue together using the signal PDO_Issue_Tag_n. This tag value is also traced out with each PDO_InsComp_n value. In one embodiment, a 6-bit tag value is used, assuming an issue window of about 64 instructions (a 6-bit tag provides 64 distinct values). Note that this tag information is traced out of the TCB only if the user requires it, so it consumes no bandwidth on the external pins unless there is a real need for the information. It is therefore recommended that the TCB allow external tracing of this information at the user's discretion.
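The ordering rule applied to FIG. 14 can be sketched as a sort: transactions are ordered by the cycle in which PDO_Tend is asserted, and transactions that end in the same cycle are ordered by trace bus number, lower first. In this minimal sketch the tuple layout is hypothetical, the end cycles (relative to m) follow the FIG. 14 description (TPCa ending at m+2, TLAa and TLAb at m+3, TDa and TDb at m+5), and the start-time precedence is not modeled.

```python
def order_trace_data(transactions):
    """transactions: (bus, end_cycle, words) tuples, one per completed transaction.
    Returns every word in program data order."""
    # Earlier PDO_Tend first; for equal end cycles, PDO_AD_k precedes PDO_AD_k+1.
    ordered = sorted(transactions, key=lambda t: (t[1], t[0]))
    return [word for _bus, _end, words in ordered for word in words]


# Cycles are relative to m; the input order is deliberately scrambled.
fig14_cycle_n = [(1, 5, ["TDb1", "TDb2"]), (0, 3, ["TLAa1", "TLAa2"]),
                 (0, 2, ["TPCa1", "TPCa2"]), (1, 3, ["TLAb1"]),
                 (0, 5, ["TDa1", "TDa2"])]
```

Here `order_trace_data(fig14_cycle_n)` returns `["TPCa1", "TPCa2", "TLAa1", "TLAa2", "TLAb1", "TDa1", "TDa2", "TDb1", "TDb2"]`, the sequence described for FIG. 14.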
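On the receiving side, the association can be sketched as follows, assuming the consumer learns of each load or store from PDO_InsComp_n (IL, IS, ILB, or ISB) before its data arrives, and that each data transaction carries its 4-bit PDO_DataOrder value. Because the data is matched by position rather than by bus, it does not matter which free PDO_AD_n bus carried it. The event format and names are hypothetical.

```python
def associate_data(events):
    """events: ("ldst", name) when PDO_InsComp_n reports a load/store, or
    ("data", order) when data arrives with a 4-bit PDO_DataOrder string.
    Returns the instruction names in the order their data was received."""
    pending = []  # traced loads/stores still waiting for data, oldest first
    matched = []
    for kind, value in events:
        if kind == "ldst":
            pending.append(value)
        elif kind == "data":
            matched.append(pending.pop(int(value, 2)))  # nth-oldest pending entry
        else:
            raise ValueError(f"unknown event kind {kind!r}")
    return matched


# The Table 5 scenario as seen by a trace consumer.
received = [("ldst", "A"), ("ldst", "B"), ("data", "0001"), ("ldst", "C"),
            ("data", "0001"), ("ldst", "D"), ("ldst", "E"), ("data", "0010"),
            ("data", "0000"), ("data", "0000")]
```

Here `associate_data(received)` returns `["B", "C", "E", "A", "D"]`, attributing each data value to the same load as in Table 5.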
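Given the issue tag traced with each PDO_InsComp_n value, an analysis tool could recover issue groups by collecting instructions that share a tag. This minimal sketch assumes a 6-bit tag, as in the embodiment above, and assumes that no tag value is reused within the slice of trace being analyzed; a real tool would also have to handle tag wraparound. The input format and tag values are hypothetical.

```python
from collections import defaultdict


def issue_groups(graduated):
    """graduated: (inst_no, issue_tag) pairs in graduation (program) order.
    Returns lists of instruction numbers that issued together, ordered by
    the first graduation of each group."""
    groups = defaultdict(list)  # dict subclass, so insertion order is kept
    for inst_no, tag in graduated:
        if not 0 <= tag < 64:
            raise ValueError(f"expected a 6-bit issue tag, got {tag}")
        groups[tag].append(inst_no)
    return list(groups.values())
```

For example, `issue_groups([(1, 5), (2, 6), (3, 5), (4, 7), (5, 6)])` returns `[[1, 3], [2, 5], [4]]`: instructions 1 and 3 issued together even though instruction 2 graduated between them.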
  • In addition to embodiments of the invention using hardware, the invention can be embodied in a computer usable medium configured to store computer readable code (e.g., computer readable program code, data, etc.). The computer code enables the functions or fabrication, or both, of the invention disclosed herein.
  • For example, this can be accomplished through the use of general programming languages (e.g., C, C++, JAVA, etc.); GDSII databases; hardware description languages (HDL) including Verilog HDL, VHDL, Altera Hardware Description Language (AHDL) and so on; or other programming and/or circuit (i.e., schematic) capture tools available in the art.
  • The computer code can be disposed in any known computer usable (e.g., readable) medium including semiconductor memory, magnetic disk, optical disc (e.g., CD-ROM, DVD-ROM, etc.) and as a computer data signal embodied in a computer usable (e.g., readable) transmission medium (e.g., carrier wave or any other medium including digital, optical or analog-based medium). As such, the code can be transmitted over communication networks including the Internet and intranets.
  • It is understood that the invention can be embodied in computer code (e.g., as an HDL program) as part of a semiconductor intellectual property core (e.g., a microprocessor core) or a system level design (e.g., a system on chip) and transformed to hardware as part of the production of integrated circuits. Also, the invention may be embodied as a combination of hardware and computer code.
  • While embodiments have been described with regard to a multi-issue processor that includes a reorder buffer for placing instructions in program sequence order, it will also be understood that the tracing method of the present invention is not limited to processors including a reorder buffer. More generally, the tracing method of the present invention may be applied to multi-issue processors having any element for placing instructions back in program sequence order.
  • While the invention has been described in detail and with reference to specific embodiments thereof, it will be apparent to one skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope thereof. Thus, it is intended that the present invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims (9)

1. A computer program product having a computer-usable medium storing computer-readable program code, the computer-readable program code comprising:
computer-readable program code for causing a computer to describe a reorder buffer in a multi-issue processor core having a plurality of instruction pipelines, said reorder buffer having a graduation cycle for graduating instructions in program order in which the maximum number of instructions graduating per cycle depends on the number of issue slots of said multi-issue processor core; and
computer-readable program code for causing a computer to describe a trace generation module that transmits trace data for said instructions in graduation order for each graduation cycle along with information that enables a determination of program execution of said instructions.
2. The computer program product of claim 1, further comprising: computer-readable program code for causing said trace generation module to assign each instruction in the graduation cycle to one of a plurality of trace buses such that the relative instruction order of graduating instructions may be determined from associated trace bus numbers of said plurality of trace buses.
3. The computer program product of claim 2, further comprising: computer-readable program code for causing said trace generation module to coordinate transaction end points on said plurality of trace buses such that a sequence of program data may be determined.
4. A method for enabling a computer to generate tracing logic, comprising:
transmitting computer-readable program code to a computer-readable medium associated with a computer, said computer-readable program code including:
computer-readable program code for causing a computer reading said computer-readable memory to describe a reorder buffer in a multi-issue processor core having a plurality of instruction pipelines, said reorder buffer having a graduation cycle for graduating instructions in program order in which the maximum number of instructions graduating per cycle depends on the number of issue slots of said multi-issue processor core; and
computer-readable program code for causing a computer reading said computer-readable memory to describe a trace generation module that transmits trace data for said instructions in graduation order for each graduation cycle along with information that enables a determination of program execution of said instructions.
5. The method of claim 4, wherein computer-readable program code is transmitted to said computer over the Internet.
6. A computer data signal embodied in a transmission medium comprising:
computer-readable program code for causing a computer storing said data signal in a computer-readable medium to describe a reorder buffer in a multi-issue processor core having a plurality of instruction pipelines, said reorder buffer having a graduation cycle for graduating instructions in program order in which the maximum number of instructions graduating per cycle depends on the number of issue slots of said multi-issue processor core; and
computer-readable program code for causing a computer storing said data signal in a computer-readable medium to describe a trace generation module that transmits trace data for said instructions in graduation order for each graduation cycle along with information that enables a determination of program execution of said instructions.
7. The computer data signal of claim 6, further comprising: computer-readable program code for causing said trace generation module to assign each instruction in the graduation cycle to one of a plurality of trace buses such that the relative instruction order of graduating instructions may be determined from associated trace bus numbers of said plurality of trace buses.
8. The computer data signal of claim 7, further comprising: computer-readable program code for causing said trace generation module to coordinate transaction end points on said plurality of trace buses such that a sequence of program data may be determined.
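Claims 3 and 8 state only that the trace generation module coordinates transaction end points across the trace buses so that the sequence of program data can be determined. The claims do not describe a mechanism. The sketch below assumes one possible scheme: a transaction may span several beats, shorter transactions are front-padded with idle beats so that all buses end on the same beat, and an end flag marks that shared end point. The beat format, `encode` and `decode` are hypothetical illustrations, not the patent's encoding.

```python
from typing import List, Optional, Tuple

# One bus beat: (payload, or None for an idle beat; end-of-transaction flag).
Beat = Tuple[Optional[str], bool]


def encode(cycles: List[List[List[str]]], num_buses: int) -> List[List[Beat]]:
    """cycles[c][b] lists the payload beats bus b carries for graduation cycle c.
    Shorter transactions are front-padded with idle beats so that every bus
    raises its end flag on the same beat (a coordinated end point per cycle)."""
    streams: List[List[Beat]] = [[] for _ in range(num_buses)]
    for cycle in cycles:
        if len(cycle) > num_buses:
            raise ValueError("more transactions than trace buses")
        per_bus = cycle + [[] for _ in range(num_buses - len(cycle))]
        length = max(1, max(len(t) for t in per_bus))
        for bus, payload in enumerate(per_bus):
            padded: List[Optional[str]] = [None] * (length - len(payload)) + payload
            for i, beat in enumerate(padded):
                streams[bus].append((beat, i == length - 1))
    return streams


def decode(streams: List[List[Beat]]) -> List[str]:
    """Walk all buses in lock step. At each shared end point, release the
    finished transactions in bus-number (graduation) order."""
    out: List[str] = []
    pending: List[List[str]] = [[] for _ in streams]
    for beats in zip(*streams):
        for bus, (payload, _) in enumerate(beats):
            if payload is not None:
                pending[bus].append(payload)
        if all(end for _, end in beats):  # coordinated end point reached
            for bus_payload in pending:
                out.extend(bus_payload)
                bus_payload.clear()
    return out


cycles = [
    [["lw@0x100", "addr", "data"], ["add@0x104"]],  # cycle 0: bus 0 needs three beats
    [["sw@0x108", "addr"]],                         # cycle 1: only bus 0 is used
]
streams = encode(cycles, num_buses=2)
print(decode(streams))
# ['lw@0x100', 'addr', 'data', 'add@0x104', 'sw@0x108', 'addr']
```

Aligning end points rather than start points means the decoder needs only one shared marker per cycle to know that every bus's transaction for that cycle is complete.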
9. A tracing system, comprising:
an embedded processor, said embedded processor including a multi-issue processor core having a plurality of instruction pipelines for executing instructions and a reorder buffer that places out-of-order issued instructions in program sequence order, said reorder buffer having a graduation cycle in which the maximum number of instructions graduating per cycle depends on the number of issue slots of said multi-issue processor core; and
trace generation logic that is operative to generate trace data for said instructions executing in said multi-issue processor core, said trace generation logic monitoring said reorder buffer and tracing executed instructions according to the graduation cycle of said reorder buffer.
US11/608,725 2003-05-28 2006-12-08 Apparatus and method to trace high performance multi-issue processors Abandoned US20070089095A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US11/608,725 US20070089095A1 (en) 2003-05-28 2006-12-08 Apparatus and method to trace high performance multi-issue processors

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/448,324 US7159101B1 (en) 2003-05-28 2003-05-28 System and method to trace high performance multi-issue processors
US11/608,725 US20070089095A1 (en) 2003-05-28 2006-12-08 Apparatus and method to trace high performance multi-issue processors

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/448,324 Continuation US7159101B1 (en) 2003-05-28 2003-05-28 System and method to trace high performance multi-issue processors

Publications (1)

Publication Number Publication Date
US20070089095A1 true US20070089095A1 (en) 2007-04-19

Family

ID=37592414

Family Applications (2)

Application Number Title Priority Date Filing Date
US10/448,324 Active 2024-09-02 US7159101B1 (en) 2003-05-28 2003-05-28 System and method to trace high performance multi-issue processors
US11/608,725 Abandoned US20070089095A1 (en) 2003-05-28 2006-12-08 Apparatus and method to trace high performance multi-issue processors

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US10/448,324 Active 2024-09-02 US7159101B1 (en) 2003-05-28 2003-05-28 System and method to trace high performance multi-issue processors

Country Status (1)

Country Link
US (2) US7159101B1 (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080052572A1 (en) * 2006-07-26 2008-02-28 Moyer William C Pipelined data processor with deterministic signature generation
US20080115113A1 (en) * 2006-11-15 2008-05-15 Lucian Codrescu Non-intrusive, thread-selective, debugging method and system for a multi-thread digital signal processor
US20080115011A1 (en) * 2006-11-15 2008-05-15 Lucian Codrescu Method and system for trusted/untrusted digital signal processor debugging operations
US20080114972A1 (en) * 2006-11-15 2008-05-15 Lucian Codrescu Method and system for instruction stuffing operations during non-intrusive digital signal processor debugging
US20080115115A1 (en) * 2006-11-15 2008-05-15 Lucian Codrescu Embedded trace macrocell for enhanced digital signal processor debugging operations
US20080126769A1 (en) * 2006-07-26 2008-05-29 Moyer William C Data processing with reconfigurable registers
US20080256396A1 (en) * 2007-04-11 2008-10-16 Louis Achille Giannini Inter-thread trace alignment method and system for a multi-threaded processor
US20090083526A1 (en) * 2007-09-20 2009-03-26 Fujitsu Microelectronics Limited Program conversion apparatus, program conversion method, and comuter product
US20090249045A1 (en) * 2008-03-31 2009-10-01 Mips Technologies, Inc. Apparatus and method for condensing trace information in a multi-processor system
US20090249046A1 (en) * 2008-03-31 2009-10-01 Mips Technologies, Inc. Apparatus and method for low overhead correlation of multi-processor trace information
US20100281308A1 (en) * 2009-04-29 2010-11-04 Freescale Semiconductor, Inc. Trace messaging device and methods thereof
US20100281304A1 (en) * 2009-04-29 2010-11-04 Moyer William C Debug messaging with selective timestamp control
US20120185675A1 (en) * 2011-01-18 2012-07-19 Samsung Electronics Co., Ltd. Apparatus and method for compressing trace data
US20160170820A1 (en) * 2014-12-10 2016-06-16 Intel Corporation Tracking deferred data packets in a debug trace architecture
US10176546B2 (en) * 2013-05-31 2019-01-08 Arm Limited Data processing systems
US10209962B2 (en) * 2017-02-06 2019-02-19 International Business Machines Corporation Reconstructing a high level compilable program from an instruction trace

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
JP4533682B2 (en) * 2004-06-29 2010-09-01 株式会社東芝 Trace analysis apparatus and trace analysis method
US20070089102A1 (en) * 2005-10-18 2007-04-19 Erb David J System and method for analyzing software performance without requiring hardware
US7743279B2 (en) * 2007-04-06 2010-06-22 Apple Inc. Program counter (PC) trace
JP2011258055A (en) * 2010-06-10 2011-12-22 Fujitsu Ltd Information processing system, and fault processing method for information processing system
US8762783B2 (en) * 2010-06-24 2014-06-24 International Business Machines Corporation Error identification
US10747543B2 (en) * 2018-12-28 2020-08-18 Marvell Asia Pte, Ltd. Managing trace information storage using pipeline instruction insertion and filtering

Citations (5)

Publication number Priority date Publication date Assignee Title
US5574935A (en) * 1993-12-29 1996-11-12 Intel Corporation Superscalar processor with a multi-port reorder buffer
US5943498A (en) * 1994-12-28 1999-08-24 Hewlett-Packard Company Microprocessor, method for transmitting signals between the microprocessor and debugging tools, and method for tracing
US6002880A (en) * 1992-12-29 1999-12-14 Philips Electronics North America Corporation VLIW processor with less instruction issue slots than functional units
US6026479A (en) * 1998-04-22 2000-02-15 Hewlett-Packard Company Apparatus and method for efficient switching of CPU mode between regions of high instruction level parallism and low instruction level parallism in computer programs
US7110934B2 (en) * 2002-10-29 2006-09-19 Arm Limited. Analysis of the performance of a portion of a data processing system

Family Cites Families (96)

Publication number Priority date Publication date Assignee Title
US3473154A (en) 1964-05-04 1969-10-14 Gen Electric Data processing unit for providing sequential memory access and record thereof
US3585599A (en) 1968-07-09 1971-06-15 Ibm Universal system service adapter
CA969657A (en) 1969-02-28 1975-06-17 United Aircraft Corporation Selective data handling apparatus
US3694582A (en) 1969-03-13 1972-09-26 Int Standard Electric Corp Circuit arrangement for supervising the coded output information of a translator in telecommunication systems and particularly telephone systems
US3707725A (en) 1970-06-19 1972-12-26 Ibm Program execution tracing system improvements
US3704363A (en) 1971-06-09 1972-11-28 Ibm Statistical and environmental data logging system for data processing storage subsystem
US3771131A (en) 1972-04-17 1973-11-06 Xerox Corp Operating condition monitoring in digital computers
US3794831A (en) 1972-06-01 1974-02-26 Ibm Apparatus and method for monitoring the operation of tested units
US3805038A (en) 1972-07-12 1974-04-16 Gte Automatic Electric Lab Inc Data handling system maintenance arrangement for processing system fault conditions
US3906454A (en) 1973-05-18 1975-09-16 Bell Telephone Labor Inc Computer monitoring system
US4205370A (en) 1975-04-16 1980-05-27 Honeywell Information Systems Inc. Trace method and apparatus for use in a data processing system
US4293925A (en) 1977-08-29 1981-10-06 Hewlett-Packard Company Apparatus and method for indicating a minimum degree of activity of digital signals
JPS5755456A (en) 1980-09-19 1982-04-02 Hitachi Ltd Career recording system
FR2509936B1 (en) 1981-07-17 1986-12-19 Thomson Csf DISTURBANCE RECORDING SYSTEM
US4503495A (en) 1982-01-15 1985-03-05 Honeywell Information Systems Inc. Data processing system common bus utilization detection logic
US4511960A (en) 1982-01-15 1985-04-16 Honeywell Information Systems Inc. Data processing system auto address development logic for multiword fetch
US4462077A (en) 1982-06-24 1984-07-24 Bell Telephone Laboratories, Incorporated Trace facility for use in multiprocessing environment
JPS59133610A (en) 1983-01-19 1984-08-01 Omron Tateisi Electronics Co Programmable controller
US4539682A (en) 1983-04-11 1985-09-03 The United States Of America As Represented By The Secretary Of The Army Method and apparatus for signaling on-line failure detection
US4590550A (en) 1983-06-29 1986-05-20 International Business Machines Corporation Internally distributed monitoring system
US4554661A (en) 1983-10-31 1985-11-19 Burroughs Corporation Generalized fault reporting system
JPS60238944A (en) 1984-05-14 1985-11-27 Mitsubishi Electric Corp Storage device for tracing
EP0199009A3 (en) 1985-02-28 1989-05-31 Kabushiki Kaisha Toshiba Path coverage measuring system in a programme
US5535331A (en) 1987-09-04 1996-07-09 Texas Instruments Incorporated Processor condition sensing circuits, systems and methods
US5084814A (en) 1987-10-30 1992-01-28 Motorola, Inc. Data processor with development support features
JP2678283B2 (en) 1988-03-15 1997-11-17 株式会社日立製作所 Data communication controller
US5289587A (en) 1988-11-30 1994-02-22 National Semiconductor Corporation Apparatus for and method of providing the program counter of a microprocessor external to the device
US5274811A (en) 1989-06-19 1993-12-28 Digital Equipment Corporation Method for quickly acquiring and using very long traces of mixed system and user memory references
US5150470A (en) 1989-12-20 1992-09-22 International Business Machines Corporation Data processing system with instruction queue having tags indicating outstanding data status
GB2260428B (en) 1991-10-11 1995-03-08 Sony Broadcast & Communication Data Formatter
JPH0820949B2 (en) 1991-11-26 1996-03-04 松下電器産業株式会社 Information processing device
JP2693678B2 (en) 1992-01-28 1997-12-24 株式会社東芝 Data processing device
GB2263988B (en) 1992-02-04 1996-05-22 Digital Equipment Corp Work flow management system and method
US5491793A (en) 1992-07-31 1996-02-13 Fujitsu Limited Debug support in a processor chip
KR970005831B1 (en) 1992-09-09 1997-04-21 대우전자 주식회사 Image coder using adaptive frame/field change coding method
US5752013A (en) 1993-06-30 1998-05-12 Intel Corporation Method and apparatus for providing precise fault tracing in a superscalar microprocessor
US5751942A (en) 1993-06-30 1998-05-12 Intel Corporation Trace event detection during trace enable transitions
DE4332993C1 (en) 1993-09-28 1994-11-24 Siemens Ag Tracer system for fault analysis in running real-time systems
US5473754A (en) 1993-11-23 1995-12-05 Rockwell International Corporation Branch decision encoding scheme
JP3290280B2 (en) 1994-01-13 2002-06-10 株式会社東芝 Information processing device
SG75756A1 (en) * 1994-02-28 2000-10-24 Intel Corp Method and apparatus for avoiding writeback conflicts between execution units sharing a common writeback path
US5533193A (en) 1994-06-24 1996-07-02 Xerox Corporation Method of saving machine fault information including transferring said information to another memory when an occurrence of predetermined events or faults of a reproduction machine is recognized
GB2293467B (en) 1994-09-20 1999-03-31 Advanced Risc Mach Ltd Trace analysis of data processing
WO1996012228A1 (en) * 1994-10-14 1996-04-25 Silicon Graphics, Inc. Redundant mapping tables
US5764885A (en) 1994-12-19 1998-06-09 Digital Equipment Corporation Apparatus and method for tracing data flows in high-speed computer systems
US5802272A (en) 1994-12-19 1998-09-01 Digital Equipment Corporation Method and apparatus for tracing unpredictable execution flows in a trace buffer of a high-speed computer system
DE69523884T2 (en) 1994-12-28 2002-06-27 Toshiba Kawasaki Kk Microprocessor with troubleshooting system
US5642478A (en) 1994-12-29 1997-06-24 International Business Machines Corporation Distributed trace data acquisition system
US5598421A (en) 1995-02-17 1997-01-28 Unisys Corporation Method and system for tracking the state of each one of multiple JTAG chains used in testing the logic of intergrated circuits
JPH08320808A (en) 1995-05-24 1996-12-03 Nec Corp Emulation system
US5621886A (en) 1995-06-19 1997-04-15 Intel Corporation Method and apparatus for providing efficient software debugging
US6012085A (en) 1995-11-30 2000-01-04 Stampede Technolgies, Inc. Apparatus and method for increased data access in a network file object oriented caching system
US5724505A (en) 1996-05-15 1998-03-03 Lucent Technologies Inc. Apparatus and method for real-time program monitoring via a serial interface
JPH09307726A (en) 1996-05-17 1997-11-28 Oki Data:Kk Image compression and restoring device
US5903740A (en) * 1996-07-24 1999-05-11 Advanced Micro Devices, Inc. Apparatus and method for retiring instructions in excess of the number of accessible write ports
US5832515A (en) 1996-09-12 1998-11-03 Veritas Software Log device layered transparently within a filesystem paradigm
US5748904A (en) 1996-09-13 1998-05-05 Silicon Integrated Systems Corp. Method and system for segment encoded graphic data compression
US5812868A (en) 1996-09-16 1998-09-22 Motorola Inc. Method and apparatus for selecting a register file in a data processing system
US5848264A (en) 1996-10-25 1998-12-08 S3 Incorporated Debug and video queue for multi-processor chip
US5878208A (en) 1996-11-25 1999-03-02 International Business Machines Corporation Method and system for instruction trace reconstruction utilizing limited output pins and bus monitoring
US5996092A (en) 1996-12-05 1999-11-30 International Business Machines Corporation System and method for tracing program execution within a processor before and after a triggering event
US5946486A (en) 1996-12-10 1999-08-31 International Business Machines Corporation Apparatus and method for tracing entries to or exits from a dynamic link library
US5790561A (en) 1997-01-17 1998-08-04 Rockwell International Corporation Internal testability system for microprocessor-based integrated circuit
US6009270A (en) 1997-04-08 1999-12-28 Advanced Micro Devices, Inc. Trace synchronization in a processor
US6314530B1 (en) 1997-04-08 2001-11-06 Advanced Micro Devices, Inc. Processor having a trace access instruction to access on-chip trace memory
US6094729A (en) 1997-04-08 2000-07-25 Advanced Micro Devices, Inc. Debug interface including a compact trace record storage
US5944841A (en) 1997-04-15 1999-08-31 Advanced Micro Devices, Inc. Microprocessor with built-in instruction tracing capability
US5933626A (en) 1997-06-12 1999-08-03 Advanced Micro Devices, Inc. Apparatus and method for tracing microprocessor instructions
US6282701B1 (en) 1997-07-31 2001-08-28 Mutek Solutions, Ltd. System and method for monitoring and analyzing the execution of computer programs
US6289437B1 (en) * 1997-08-27 2001-09-11 International Business Machines Corporation Data processing system and method for implementing an efficient out-of-order issue mechanism
GB2329049B (en) 1997-09-09 2002-09-11 Advanced Risc Mach Ltd Apparatus and method for identifying exceptions when debugging software
GB2329048A (en) 1997-09-09 1999-03-10 * Advanced Risc Machines Limited A debugger interface unit with a stepping mode
US5970246A (en) 1997-09-11 1999-10-19 Motorola Inc. Data processing system having a trace mechanism and method therefor
US6338159B1 (en) 1997-12-12 2002-01-08 International Business Machines Corporation System and method for providing trace information
US6772324B2 (en) * 1997-12-17 2004-08-03 Intel Corporation Processor having multiple program counters and trace buffers outside an execution pipeline
US6687865B1 (en) 1998-03-25 2004-02-03 On-Chip Technologies, Inc. On-chip service processor for test and debug of integrated circuits
US6055630A (en) * 1998-04-20 2000-04-25 Intel Corporation System and method for processing a plurality of branch instructions by a plurality of storage devices and pipeline units
US6247143B1 (en) * 1998-06-30 2001-06-12 Sun Microsystems, Inc. I/O handling for a multiprocessor computer system
US6145123A (en) 1998-07-01 2000-11-07 Advanced Micro Devices, Inc. Trace on/off with breakpoint register
JP3277900B2 (en) 1998-09-30 2002-04-22 日本電気株式会社 Program inspection method, program inspection device, and computer-readable storage medium storing inspection program
US6256777B1 (en) 1998-10-09 2001-07-03 Hewlett-Packard Company Method and apparatus for debugging of optimized machine code, using hidden breakpoints
US6457144B1 (en) 1998-12-08 2002-09-24 International Business Machines Corporation System and method for collecting trace data in main storage
US6353924B1 (en) 1999-02-08 2002-03-05 Incert Software Corporation Method for back tracing program execution
US6487715B1 (en) 1999-04-16 2002-11-26 Sun Microsystems, Inc. Dynamic code motion optimization and path tracing
US6615370B1 (en) 1999-10-01 2003-09-02 Hitachi, Ltd. Circuit for storing trace information
US6684348B1 (en) 1999-10-01 2004-01-27 Hitachi, Ltd. Circuit for processing trace information
US6530076B1 (en) 1999-12-23 2003-03-04 Bull Hn Information Systems Inc. Data processing system processor dynamic selection of internal signal tracing
US7024661B2 (en) 2000-01-07 2006-04-04 Hewlett-Packard Development Company, L.P. System and method for verifying computer program correctness and providing recoverable execution trace information
JP3629181B2 (en) 2000-03-28 2005-03-16 Necマイクロシステム株式会社 Program development support device
US6694427B1 (en) * 2000-04-20 2004-02-17 International Business Machines Corporation Method system and apparatus for instruction tracing with out of order processors
US6658649B1 (en) 2000-06-13 2003-12-02 International Business Machines Corporation Method, apparatus and article of manufacture for debugging a user defined region of code
US6754804B1 (en) 2000-12-29 2004-06-22 Mips Technologies, Inc. Coprocessor interface transferring multiple instructions simultaneously along with issue path designation and/or issue order designation for the instructions
US7093236B2 (en) 2001-02-01 2006-08-15 Arm Limited Tracing out-of-order data
US6883162B2 (en) * 2001-06-06 2005-04-19 Sun Microsystems, Inc. Annotations for transaction tracing
US6834360B2 (en) 2001-11-16 2004-12-21 International Business Machines Corporation On-chip logic analyzer
US6615371B2 (en) 2002-03-11 2003-09-02 American Arium Trace reporting method and system

Cited By (31)

Publication number Priority date Publication date Assignee Title
US20080126769A1 (en) * 2006-07-26 2008-05-29 Moyer William C Data processing with reconfigurable registers
US20080052572A1 (en) * 2006-07-26 2008-02-28 Moyer William C Pipelined data processor with deterministic signature generation
US7627795B2 (en) * 2006-07-26 2009-12-01 Freescale Semiconductor, Inc Pipelined data processor with deterministic signature generation
US7823033B2 (en) 2006-07-26 2010-10-26 Freescale Semiconductor, Inc. Data processing with configurable registers
US8380966B2 (en) 2006-11-15 2013-02-19 Qualcomm Incorporated Method and system for instruction stuffing operations during non-intrusive digital signal processor debugging
US20080115113A1 (en) * 2006-11-15 2008-05-15 Lucian Codrescu Non-intrusive, thread-selective, debugging method and system for a multi-thread digital signal processor
US20080115011A1 (en) * 2006-11-15 2008-05-15 Lucian Codrescu Method and system for trusted/untrusted digital signal processor debugging operations
US20080114972A1 (en) * 2006-11-15 2008-05-15 Lucian Codrescu Method and system for instruction stuffing operations during non-intrusive digital signal processor debugging
US20080115115A1 (en) * 2006-11-15 2008-05-15 Lucian Codrescu Embedded trace macrocell for enhanced digital signal processor debugging operations
US8341604B2 (en) 2006-11-15 2012-12-25 Qualcomm Incorporated Embedded trace macrocell for enhanced digital signal processor debugging operations
US8533530B2 (en) 2006-11-15 2013-09-10 Qualcomm Incorporated Method and system for trusted/untrusted digital signal processor debugging operations
US8370806B2 (en) 2006-11-15 2013-02-05 Qualcomm Incorporated Non-intrusive, thread-selective, debugging method and system for a multi-thread digital signal processor
US20080256396A1 (en) * 2007-04-11 2008-10-16 Louis Achille Giannini Inter-thread trace alignment method and system for a multi-threaded processor
US8484516B2 (en) * 2007-04-11 2013-07-09 Qualcomm Incorporated Inter-thread trace alignment method and system for a multi-threaded processor
US20090083526A1 (en) * 2007-09-20 2009-03-26 Fujitsu Microelectronics Limited Program conversion apparatus, program conversion method, and comuter product
US8352928B2 (en) * 2007-09-20 2013-01-08 Fujitsu Semiconductor Limited Program conversion apparatus, program conversion method, and computer product
US8230202B2 (en) 2008-03-31 2012-07-24 Mips Technologies, Inc. Apparatus and method for condensing trace information in a multi-processor system
US20090249045A1 (en) * 2008-03-31 2009-10-01 Mips Technologies, Inc. Apparatus and method for condensing trace information in a multi-processor system
US20090249046A1 (en) * 2008-03-31 2009-10-01 Mips Technologies, Inc. Apparatus and method for low overhead correlation of multi-processor trace information
US20120331354A1 (en) * 2009-04-29 2012-12-27 Freescale Semiconductor, Inc. Trace messaging device and methods thereof
US20100281304A1 (en) * 2009-04-29 2010-11-04 Moyer William C Debug messaging with selective timestamp control
US20100281308A1 (en) * 2009-04-29 2010-11-04 Freescale Semiconductor, Inc. Trace messaging device and methods thereof
US8201025B2 (en) 2009-04-29 2012-06-12 Freescale Semiconductor, Inc. Debug messaging with selective timestamp control
US8286032B2 (en) * 2009-04-29 2012-10-09 Freescale Semiconductor, Inc. Trace messaging device and methods thereof
US20120185675A1 (en) * 2011-01-18 2012-07-19 Samsung Electronics Co., Ltd. Apparatus and method for compressing trace data
US9152422B2 (en) * 2011-01-18 2015-10-06 Samsung Electronics Co., Ltd. Apparatus and method for compressing trace data
US10176546B2 (en) * 2013-05-31 2019-01-08 Arm Limited Data processing systems
US20160170820A1 (en) * 2014-12-10 2016-06-16 Intel Corporation Tracking deferred data packets in a debug trace architecture
US9632907B2 (en) * 2014-12-10 2017-04-25 Intel Corporation Tracking deferred data packets in a debug trace architecture
US10209962B2 (en) * 2017-02-06 2019-02-19 International Business Machines Corporation Reconstructing a high level compilable program from an instruction trace
US10691419B2 (en) 2017-02-06 2020-06-23 International Business Machines Corporation Reconstructing a high level compilable program from an instruction trace

Also Published As

Publication number Publication date
US7159101B1 (en) 2007-01-02

Similar Documents

Publication Publication Date Title
US20070089095A1 (en) Apparatus and method to trace high performance multi-issue processors
US7185234B1 (en) Trace control from hardware and software
US7178133B1 (en) Trace control based on a characteristic of a processor's operating state
US8185879B2 (en) External trace synchronization via periodic sampling
US7069544B1 (en) Dynamic selection of a compression algorithm for trace data
US7181728B1 (en) User controlled trace records
US7043668B1 (en) Optimized external trace formats
CN101529392B (en) Method and system for instruction stuffing operations during non-intrusive digital signal processor debugging
EP0762280B1 (en) Data processor with built-in emulation circuit
US5537559A (en) Exception handling circuit and method
EP0762276B1 (en) Data processor with built-in emulation circuit
US6205560B1 (en) Debug system allowing programmable selection of alternate debug mechanisms such as debug handler, SMI, or JTAG
US6622237B1 (en) Store to load forward predictor training using delta tag
US6651161B1 (en) Store load forward predictor untraining
US6694424B1 (en) Store load forward predictor training
US7168066B1 (en) Tracing out-of order load data
US20030051122A1 (en) Trace information generation apparatus for generating branch trace information omitting at least part of branch source information and branch destination information on target processing
US7711934B2 (en) Processor core and method for managing branch misprediction in an out-of-order processor pipeline
US7231551B1 (en) Distributed tap controller
US20080141002A1 (en) Instruction pipeline monitoring device and method thereof
US7124072B1 (en) Program counter and data tracing from a multi-issue processor
US20100306513A1 (en) Processor Core and Method for Managing Program Counter Redirection in an Out-of-Order Processor Pipeline
US7013256B2 (en) Computer system with debug facility
US20080140993A1 (en) Fetch engine monitoring device and method thereof
WO2007084202A2 (en) Processor core and method for managing branch misprediction in an out-of-order processor pipeline

Legal Events

Date Code Title Description
AS Assignment

Owner name: MIPS TECHNOLOGIES, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:THEKKATH, RADHIKA;TREUE, FRANZ;KRAGH, SOREN;AND OTHERS;REEL/FRAME:018605/0338;SIGNING DATES FROM 20030909 TO 20030927

AS Assignment

Owner name: JEFFERIES FINANCE LLC, AS COLLATERAL AGENT, NEW YO

Free format text: SECURITY AGREEMENT;ASSIGNOR:MIPS TECHNOLOGIES, INC.;REEL/FRAME:019744/0001

Effective date: 20070824

AS Assignment

Owner name: MIPS TECHNOLOGIES, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:JEFFERIES FINANCE LLC, AS COLLATERAL AGENT;REEL/FRAME:021985/0015

Effective date: 20081205

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION