US20110258421A1 - Architecture Support for Debugging Multithreaded Code - Google Patents
- Publication number
- US20110258421A1 (US 12/762,817)
- Authority
- US
- United States
- Prior art keywords
- instruction
- entry
- cam
- exception
- address
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/3636—Software debugging by tracing the execution of the program
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/3648—Software debugging using additional hardware
Definitions
- the present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms that provide support for debugging multithreaded code.
- a method, in a processor of a data processing system, is provided for debugging application code.
- the method comprises receiving an instruction in a hardware unit of the processor, the instruction having a target memory address that the instruction is attempting to access.
- the method further comprises searching a content addressable memory (CAM) associated with the hardware unit for an entry in the CAM designating a range of addresses that includes the target memory address.
- the method comprises, in response to finding an entry in the CAM designating a range of addresses that include the target memory address, determining if information in the entry identifies the instruction as an instruction of interest.
- the method comprises, in response to the entry identifying the instruction as an instruction of interest, generating an exception and sending the exception to one of an exception handler or a debugger application.
- the method further includes the programmer loading the CAM associated with the hardware unit with ranges of addresses that include variables shared among the threads of the program. Furthermore, the method includes setting the CAM of every hardware thread that runs an application thread according to an embodiment of this invention. The program is then run, and if a thread accesses a variable in one of the ranges specified in the CAM, a debugger verifies that the application has acquired the necessary synchronization construct prior to accessing the variable. An unprotected access to such a variable is a potential synchronization bug, which is difficult to detect with conventional debugging.
- a computer program product is provided, comprising a computer useable or readable medium having a computer readable program.
- the computer readable program when executed on a computing device, causes the computing device to perform various ones, and combinations of, the operations outlined above with regard to the method illustrative embodiment.
- a system/apparatus may comprise one or more processors and a memory coupled to the one or more processors.
- the memory may comprise instructions which, when executed by the one or more processors, cause the one or more processors to perform various ones, and combinations of, the operations outlined above with regard to the method illustrative embodiment.
- FIG. 1 is an example diagram of a processor architecture in which aspects of the illustrative embodiments may be implemented.
- FIG. 2 is an example block diagram of a load/store unit in accordance with one illustrative embodiment.
- FIG. 3 is a flowchart outlining an example operation of a load/store unit in accordance with one illustrative embodiment.
- the illustrative embodiments provide a mechanism for providing debugging support for multi-threaded computer code.
- the mechanisms of the illustrative embodiments provide hardware support that enables an application to track memory accesses to several ranges in memory.
- the hardware support includes a content addressable memory (CAM) structure that can be set either by the application or a debugger that controls the application.
- Each entry in the CAM structure has a starting address, which designates the starting address of a range of memory being monitored.
- the entry further comprises a length field, which designates the size of the range of memory being monitored corresponding to the entry, a store bit (or S bit), and a load bit (or L bit), which enable detection of memory stores and loads, respectively, to the range of memory defined by the start address and length.
- a processor checks every access to memory within a running thread. If the address of the memory access matches one of the entries in the CAM, i.e., the address is within a range of memory corresponding to an entry in the CAM, then the hardware issues an exception. The exception causes the state of the thread to be stored on the stack and execution to jump to an exception handling routine in software. A match of the address of the access to an entry in the CAM occurs if the memory access is a store and the corresponding address lies in the range determined by one of the CAM entries with a corresponding S bit being set to a predetermined value, e.g., 1.
- a match also occurs if the memory access is a load and the corresponding address lies in the range determined by one of the CAM entries with a corresponding L bit being set to a predetermined value, e.g., 1. If the S bit or the L bit is not set to the predetermined value, e.g., the S bit or L bit is set to 0, and the access is a store or load, respectively, then the match is ignored.
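The matching rule for a single CAM entry can be sketched in software as follows. This is only an illustrative model, not the hardware logic itself: the names `CamEntry` and `entry_matches` are hypothetical, the length is assumed to be in bytes, and the range is assumed to be half-open (the start address is included, start plus length is not).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CamEntry:
    start: int    # starting address of the monitored range
    length: int   # size of the monitored range (bytes assumed)
    s_bit: bool   # True: stores to the range generate an exception
    l_bit: bool   # True: loads from the range generate an exception


def entry_matches(entry: CamEntry, address: int, is_store: bool) -> bool:
    """Return True when the access should generate a debug exception."""
    # Assumed half-open interval [start, start + length).
    in_range = entry.start <= address < entry.start + entry.length
    # Only the bit that corresponds to the access type is consulted.
    armed = entry.s_bit if is_store else entry.l_bit
    return in_range and armed


# Watch stores (but not loads) to a hypothetical 8-byte variable at 0x1000.
entry = CamEntry(start=0x1000, length=8, s_bit=True, l_bit=False)
print(entry_matches(entry, 0x1004, is_store=True))   # True
print(entry_matches(entry, 0x1004, is_store=False))  # False: L bit is 0
print(entry_matches(entry, 0x1008, is_store=True))   # False: outside range
```

The second call shows the "match is ignored" case: the address lies inside the range, but the bit for that access type is not set.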
- the application or the debugger controlling the application may set the range of memory to be monitored into one of the CAM entries and an exception handler may be provided to handle the exceptions generated upon any memory access to a monitored range.
- the exception handler may be used to determine where, in the application's code, a particular variable is being modified during execution, for example, by recording the variable's state at the time of the exception as well as other execution parameters, such as may be generated by performance counters, or the like.
- the CAM structure allows the hardware to monitor more than one range of memory simultaneously without any performance overhead that may cause execution dilation.
- the application or a debugger may set the exception handler to check whether a received instruction performs a store or a load to a variable's memory address after the protecting synchronization object, e.g., a lock, has been acquired by the accessing thread. If not, then this is an instance of a race condition or a demonic access to a shared variable, which are common and difficult-to-find bugs in multi-threaded applications. If the protecting synchronization object has been acquired prior to the access, then a race condition or demonic access to a shared variable has not been encountered. Other types of hard-to-find bugs may be found using the hardware mechanisms of the illustrative embodiments to provide support for generating debugging exceptions and branching execution to an appropriate exception handler to gather trace information for debugging purposes.
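A minimal sketch of such a handler check is shown below. The text does not say how the handler learns which lock protects which variable or which thread holds it, so the `LockTable` bookkeeping, the function name `handle_access_exception`, and the trace format are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LockTable:
    # (start, length) of a monitored range -> name of the lock guarding it
    guard_of: dict = field(default_factory=dict)
    # lock name -> id of the thread currently holding it (absent if free)
    holder_of: dict = field(default_factory=dict)


def handle_access_exception(table, thread_id, address, trace):
    """Return True to resume silently, False if a potential race was logged."""
    for (start, length), lock in table.guard_of.items():
        if start <= address < start + length:
            if table.holder_of.get(lock) == thread_id:
                return True  # protected access: let the application resume
            # Unprotected access: record trace information for later analysis.
            trace.append((thread_id, hex(address), lock))
            return False
    return True  # address is not guarded by any known lock


table = LockTable()
table.guard_of[(0x2000, 4)] = "counter_lock"
table.holder_of["counter_lock"] = 1   # thread 1 holds the lock

trace = []
print(handle_access_exception(table, 1, 0x2000, trace))  # True
print(handle_access_exception(table, 2, 0x2002, trace))  # False
print(trace)  # [(2, '0x2002', 'counter_lock')]
```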
- the mechanisms of the illustrative embodiments may be used in many different types of data processing system and processor architectures.
- the illustrative embodiments may be used in both single processor sequential processing architectures and multiple processor, multi-threaded data processing system architectures, to provide hardware support for debugging of computer programs.
- the data processing system in which the mechanisms of the illustrative embodiments are implemented is a multi-processor (or multi-core) data processing system that provides multi-threading hardware. It should be appreciated, however, that the illustrative embodiments and the present invention are not limited to such.
- the present invention may be embodied as a system, method, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
- the computer-usable or computer-readable medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium.
- more specific examples of the computer-readable medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CDROM), an optical storage device, transmission media such as those supporting the Internet or an intranet, or a magnetic storage device.
- a computer-usable or computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
- a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- the computer-usable medium may include a propagated data signal with the computer-usable program code embodied therewith, either in baseband or as part of a carrier wave.
- the computer usable program code may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RF), etc.
- Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java™, Smalltalk™, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- program code may be embodied on a computer readable storage medium on the server or the remote computer and downloaded over a network to a computer readable storage medium of the remote computer or the user's computer for storage and/or execution.
- any of the computing systems or data processing systems may store the program code in a computer readable storage medium after having downloaded the program code over a network from a remote computing system or data processing system.
- These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- Processor 100 may be implemented as one or more of the processing units in a multi-threaded data processing system architecture, for example. That is, processor 100 may comprise one or more processor cores supporting the simultaneous execution of more than one thread. For example, processor 100 may comprise a single integrated circuit superscalar microprocessor with dual-thread simultaneous multi-threading (SMT) that may also be operated in a single threaded mode. Accordingly, as discussed further herein below, processor 100 includes various units, registers, buffers, memories, and other sections, all of which are formed by integrated circuitry.
- instruction fetch unit (IFU) 102 connects to instruction cache 104 .
- Instruction cache 104 holds instructions for multiple programs (threads) to be executed.
- Instruction cache 104 also has an interface to level 2 (L2) cache/memory 106 .
- IFU 102 requests instructions from instruction cache 104 according to an instruction address, and passes instructions to instruction decode unit 108 .
- IFU 102 may request multiple instructions from instruction cache 104 for up to two threads at the same time.
- Instruction decode unit 108 decodes multiple instructions for up to two threads at the same time and passes decoded instructions to instruction sequencer unit (ISU) 109 .
- Processor 100 may also include issue queue 110, which receives decoded instructions from ISU 109. Instructions are stored in issue queue 110 while awaiting dispatch to the appropriate execution units. For an out-of-order processor to operate in an in-order manner, ISU 109 may selectively issue instructions quickly using false dependencies between each instruction. If the instruction does not produce data, such as in a read after write dependency, ISU 109 may add an additional source operand (also referred to as a consumer) per instruction to point to the previous target instruction (also referred to as a producer). Issue queue 110, when issuing the producer, may then wake up the consumer for issue. By introducing false dependencies, a chain of dependent instructions may be created, so that the instructions may be issued only in order.
- ISU 109 uses the added consumer for instruction scheduling purposes and the instructions, when executed, do not actually use the data from the added dependency. Once ISU 109 selectively adds any required false dependencies, then issue queue 110 takes over and issues the instructions in order for each thread, and outputs or issues instructions for each thread to execution units 112 , 114 , 116 , 118 , 120 , 122 , 124 , 126 , and 128 of the processor.
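The effect of the false dependencies can be illustrated with a small software model. It is a simplification under stated assumptions: every instruction receives an extra source pointing at its predecessor (the text adds them selectively), and the function names and instruction strings are hypothetical.

```python
def add_false_dependencies(program):
    """Give every instruction an extra source pointing at its predecessor."""
    deps = {}
    previous = None
    for instr in program:
        deps[instr] = previous  # None for the first instruction in the thread
        previous = instr
    return deps


def issue_order(queue_contents, deps):
    """Issue an instruction only once its (false) producer has issued."""
    issued = []
    pending = list(queue_contents)
    while pending:
        ready = [i for i in pending if deps[i] is None or deps[i] in issued]
        # With a complete chain, exactly one instruction is ready at a time.
        nxt = ready[0]
        issued.append(nxt)
        pending.remove(nxt)
    return issued


program = ["ld r1", "add r2", "st r3", "mul r4"]
deps = add_false_dependencies(program)
# The queue holds the instructions in scrambled order, yet the dependency
# chain forces them to issue in program order.
print(issue_order(["st r3", "mul r4", "ld r1", "add r2"], deps))
# ['ld r1', 'add r2', 'st r3', 'mul r4']
```

The added dependency is used for scheduling only; nothing in the model consumes data from the producer, which mirrors the description above.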
- the execution units of the processor may include branch unit 112 , load/store units (LSUA) 114 and (LSUB) 116 , fixed point execution units (FXUA) 118 and (FXUB) 120 , floating point execution units (FPUA) 122 and (FPUB) 124 , and vector multimedia extension units (VMXA) 126 and (VMXB) 128 .
- Execution units 112 , 114 , 116 , 118 , 120 , 122 , 124 , 126 , and 128 are fully shared across both threads, meaning that execution units 112 , 114 , 116 , 118 , 120 , 122 , 124 , 126 , and 128 may receive instructions from either or both threads.
- the processor includes multiple register sets 130 , 132 , 134 , 136 , 138 , 140 , 142 , 144 , and 146 , which may also be referred to as architected register files (ARFs).
- An ARF is a file where completed data is stored once an instruction has completed execution.
- ARFs 130 , 132 , 134 , 136 , 138 , 140 , 142 , 144 , and 146 may store data separately for each of the two threads and by the type of instruction, namely general purpose registers (GPRs) 130 and 132 , floating point registers (FPRs) 134 and 136 , special purpose registers (SPRs) 138 and 140 , and vector registers (VRs) 144 and 146 .
- the processor additionally includes a set of shared special purpose registers (SPR) 142 for holding program states, such as an instruction pointer, stack pointer, or processor status word, which may be used on instructions from either or both threads.
- Execution units 112 , 114 , 116 , 118 , 120 , 122 , 124 , 126 , and 128 are connected to ARFs 130 , 132 , 134 , 136 , 138 , 140 , 142 , 144 , and 146 through simplified internal bus structure 149 .
- FPUA 122 and FPUB 124 retrieve register source operand information, which is input data required to execute an instruction, from FPRs 134 and 136, if the instruction data required to execute the instruction is complete or if the data has passed the point of flushing in the pipeline.
- Complete data is data that has been generated by an execution unit once an instruction has completed execution and is stored in an ARF, such as ARFs 130 , 132 , 134 , 136 , 138 , 140 , 142 , 144 , and 146 .
- Incomplete data is data that has been generated during instruction execution where the instruction has not completed execution.
- FPUA 122 and FPUB 124 input their data according to which thread each executing instruction belongs. For example, FPUA 122 inputs completed data to FPR 134 and FPUB 124 inputs completed data to FPR 136, because FPUA 122, FPUB 124, and FPRs 134 and 136 are thread specific.
- FPUA 122 and FPUB 124 output their destination register operand data, or instruction data generated during execution of the instruction, to FPRs 134 and 136 when the instruction has passed the point of flushing in the pipeline.
- FXUA 118 , FXUB 120 , LSUA 114 , and LSUB 116 output their destination register operand data, or instruction data generated during execution of the instruction, to GPRs 130 and 132 when the instruction has passed the point of flushing in the pipeline.
- FXUA 118 , FXUB 120 , and branch unit 112 output their destination register operand data to SPRs 138 , 140 , and 142 when the instruction has passed the point of flushing in the pipeline.
- Program states such as an instruction pointer, stack pointer, or processor status word, stored in SPRs 138 and 140 indicate thread priority 152 to ISU 109 .
- VMXA 126 and VMXB 128 output their destination register operand data to VRs 144 and 146 when the instruction has passed the point of flushing in the pipeline.
- Data cache 150 may also have associated with it a non-cacheable unit (not shown) which accepts data from the processor and writes it directly to level 2 cache/memory 106 . In this way, the non-cacheable unit bypasses the coherency protocols required for storage to cache.
- In response to the instructions input from instruction cache 104 and decoded by instruction decode unit 108, ISU 109 selectively dispatches the instructions to issue queue 110 and then onto execution units 112, 114, 116, 118, 120, 122, 124, 126, and 128 with regard to instruction type and thread.
- execution units 112 , 114 , 116 , 118 , 120 , 122 , 124 , 126 , and 128 execute one or more instructions of a particular class or type of instructions.
- FXUA 118 and FXUB 120 execute fixed point mathematical operations on register source operands, such as addition, subtraction, ANDing, ORing and XORing.
- FPUA 122 and FPUB 124 execute floating point mathematical operations on register source operands, such as floating point multiplication and division.
- LSUA 114 and LSUB 116 execute load and store instructions, which move operand data between data cache 150 and ARFs 130 , 132 , 134 , and 136 .
- VMXA 126 and VMXB 128 execute single instruction operations that include multiple data.
- Branch unit 112 executes branch instructions which conditionally alter the flow of execution through a program by modifying the instruction address used by IFU 102 to request instructions from instruction cache 104 .
- Instruction completion unit 154 monitors internal bus structure 149 to determine when instructions executing in execution units 112 , 114 , 116 , 118 , 120 , 122 , 124 , 126 , and 128 are finished writing their operand results to ARFs 130 , 132 , 134 , 136 , 138 , 140 , 142 , 144 , and 146 .
- Instructions executed by branch unit 112, FXUA 118, FXUB 120, LSUA 114, and LSUB 116 require the same number of cycles to execute, while instructions executed by FPUA 122, FPUB 124, VMXA 126, and VMXB 128 require a variable, and larger, number of cycles to execute. Therefore, instructions that are grouped together and start executing at the same time do not necessarily finish executing at the same time.
- “Completion” of an instruction means that the instruction has finished executing in one of execution units 112, 114, 116, 118, 120, 122, 124, 126, or 128, has passed the point of flushing, and all older instructions have already been updated in the architected state, since instructions have to be completed in order. Hence, the instruction is now ready to complete and update the architected state, which means updating the final state of the data as the instruction has been completed.
- the architected state can only be updated in order, that is, instructions have to be completed in order and the completed data has to be updated as each instruction completes.
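The in-order completion rule can be illustrated with a short model. It assumes, purely for illustration, that each instruction's finish cycle is known and that several instructions may complete in the same cycle; `completion_order` is a hypothetical name.

```python
def completion_order(finish_cycle):
    """finish_cycle[i] is the cycle in which instruction i (in program order)
    finishes executing. Returns (instruction, completion cycle) pairs."""
    completed = []
    last = 0
    for instr, cycle in enumerate(finish_cycle):
        # An instruction cannot complete before every older instruction has.
        last = max(last, cycle)
        completed.append((instr, last))
    return completed


# Instruction 1 finishes executing early (cycle 1) but must wait for
# instruction 0 (cycle 3) before it can update the architected state.
print(completion_order([3, 1, 7, 2]))
# [(0, 3), (1, 3), (2, 7), (3, 7)]
```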
- Instruction completion unit 154 monitors for the completion of instructions, and sends control information 156 to ISU 109 to notify ISU 109 that more groups of instructions can be dispatched to execution units 112 , 114 , 116 , 118 , 120 , 122 , 124 , 126 , and 128 .
- ISU 109 sends dispatch signal 158 , which serves as a throttle to bring more instructions down the pipeline to the dispatch unit, to IFU 102 and instruction decode unit 108 to indicate that it is ready to receive more decoded instructions.
- while processor 100 provides one detailed description of a single integrated circuit superscalar microprocessor with dual-thread simultaneous multi-threading (SMT) that may also be operated in a single threaded mode, the illustrative embodiments are not limited to such microprocessors. That is, the illustrative embodiments may be implemented in any type of processor using a pipeline technology.
- one or more of the load/store units 114 and 116 may be augmented to include a hardware content addressable memory (CAM) structure and logic for implementing the mechanisms of the illustrative embodiments.
- a content addressable memory (CAM) is a special type of hardware search engine that is much faster than algorithmic approaches for search intensive applications.
- CAMs are composed of conventional semiconductor memory, usually SRAM, with added comparison circuitry that enables a search operation to complete in a single processor clock cycle.
- the logic of the load/store unit and its CAM structure are configurable by an application, debugger, or the like, to define ranges of memory, such as main memory, for which load and/or store operations targeting that range of memory should generate an exception in order to facilitate gathering of debugging information.
- the processor is augmented with special instructions to allow the debugger or the application to access the CAM structure, such as to load the CAM structure with ranges, and to set the corresponding S and L bits. Also, the instructions allow the application to turn off the CAM altogether to save energy when no debugging is taking place.
- the application or debugger creates an entry in the CAM structure that specifies the starting address of the range of memory, a length of the range of memory, and whether loads, stores, or loads and stores to this range of memory are to generate an exception for handling by an exception handler or the debugger application.
- This information is stored in the entry in the CAM structure and is searchable based on an address of an access operation to determine if the address of the access operation falls within a range specified by one of the entries in the CAM structure. If so, and the access operation is one that is indicated as being an access operation that generates an exception, the exception may be generated and handled by either an exception handler or the debugger to gather debugging information and/or perform the actual debugging of the application. This may be done whether or not the application is a multi-threaded application.
- the exception handler or debugger may be configured to identify difficult to find bugs in multi-threaded applications, such as race conditions or demonic accesses to shared variables. For example, in order to check for race conditions or demonic accesses, the exception handler or debugger may check to see if the thread that submitted the access operation had acquired a lock on the memory location specified by the address in the access operation prior to attempting the access operation. If so, then the debugger or exception handler may not perform any actions and instead allow the application to resume execution. However, if the thread that attempted the access operation did not first obtain the lock for the memory location, then the debugger or exception handler may take over the execution of the application and retrieve debug or trace information for use in analysis to identify a potential bug in the application code.
- if the access operation is one that is not indicated as being an access operation that generates an exception, or if the address of the access operation does not fall within one of the ranges of memory defined by an entry in the CAM, then the access operation may be performed without generating an exception.
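The configuration and check flow described in the preceding paragraphs can be modeled in software as follows. The text does not name the special instructions, so `load_entry` and `turn_off` are hypothetical stand-ins for them, and a Python exception stands in for the hardware exception; ranges are assumed to be half-open.

```python
class DebugException(Exception):
    """Stands in for the hardware exception described in the text."""


class WatchCam:
    def __init__(self):
        self.entries = []    # list of (start, length, s_bit, l_bit)
        self.enabled = True

    def load_entry(self, start, length, stores=False, loads=False):
        # Hypothetical stand-in for the instruction that loads a range
        # and sets its S and L bits.
        self.entries.append((start, length, stores, loads))

    def turn_off(self):
        # Hypothetical stand-in for the instruction that disables the CAM.
        self.enabled = False

    def access(self, address, is_store):
        """Model one load or store passing through the load/store unit."""
        if self.enabled:
            for start, length, s_bit, l_bit in self.entries:
                in_range = start <= address < start + length
                if in_range and (s_bit if is_store else l_bit):
                    kind = "store" if is_store else "load"
                    raise DebugException(f"{kind} at {address:#x}")
        return "performed"


cam = WatchCam()
cam.load_entry(0x4000, 16, stores=True)       # monitor stores only
print(cam.access(0x4000, is_store=False))     # performed (loads not armed)
try:
    cam.access(0x4008, is_store=True)
except DebugException as exc:
    print(exc)                                # store at 0x4008
cam.turn_off()
print(cam.access(0x4008, is_store=True))      # performed (CAM disabled)
```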
- FIG. 2 is an example block diagram of a load/store unit in accordance with one illustrative embodiment.
- the load/store unit 230 is augmented to include a content addressable memory (CAM) having one or more CAM entries and search logic 249 .
- Each CAM entry includes a start address 242 , a length 244 , a store bit (S bit) 246 , and a load bit (L bit) 248 .
- the start address 242 and length 244 define an address range of memory that is to be monitored using the CAM 240 .
- the start address 242 and length 244 may be specified in terms of effective addresses, virtual addresses, real or physical addresses, or the like, depending upon the particular implementation.
- the S bit 246 and L bit 248 designate whether one or both of store and load instructions/operations targeting the address range of memory specified by the corresponding start address 242 and length 244 are to be monitored, i.e. should generate an exception requiring exception handling.
- a single CAM structure 240 may be used to handle all load/store instructions executed by all threads executing in the processor architecture.
- separate CAM structures 240 may be provided for each of the threads such that the CAM structures 240 are associated with a thread context.
- the load/store unit 230 may have multiple CAM structures 240 , one for each thread executing in the processor.
- each load/store unit 230 may have one or more CAM structures 340 for each of the threads that they handle. In the case of multiple CAM structures 240 , one for each thread, which CAM structure 240 corresponds to which thread may be specified in the thread context information of the particular thread.
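The per-thread arrangement can be sketched as a lookup keyed by thread id. This is an assumed software analogy only: the `LoadStoreUnit` class and its methods are hypothetical, the thread id stands in for the thread context information, and the S and L bits are omitted for brevity.

```python
class LoadStoreUnit:
    def __init__(self):
        # thread id -> list of (start, length) ranges in that thread's CAM
        self.cam_for_thread = {}

    def set_range(self, thread_id, start, length):
        self.cam_for_thread.setdefault(thread_id, []).append((start, length))

    def set_range_on_all(self, thread_ids, start, length):
        # Mirrors setting the CAM of every hardware thread that runs an
        # application thread, so a shared variable is watched everywhere.
        for tid in thread_ids:
            self.set_range(tid, start, length)

    def is_monitored(self, thread_id, address):
        ranges = self.cam_for_thread.get(thread_id, [])
        return any(s <= address < s + n for s, n in ranges)


lsu = LoadStoreUnit()
lsu.set_range(0, 0x3000, 8)            # only thread 0 watches this range
print(lsu.is_monitored(0, 0x3004))     # True
print(lsu.is_monitored(1, 0x3004))     # False: thread 1 has its own CAM

lsu.set_range_on_all([0, 1], 0x5000, 4)
print(lsu.is_monitored(1, 0x5002))     # True
```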
- An application or debugger 280 may generate entries in the CAM 240 so that certain address ranges of memory are monitored and certain instructions, e.g., store and/or load instructions, targeting the monitored address range of memory are monitored. It should be appreciated that with the mechanisms of the illustrative embodiments, not all portions of the monitored memory need to be monitored. To the contrary, the mechanisms of the illustrative embodiments allow the application or debugger 280 to target individual portions of memory, i.e. individual address ranges of memory, so that targeted tracing and debugging can be performed. For example, an entry in the CAM may be associated with an address range of memory corresponding to a particular variable and thus, the mechanisms of the illustrative embodiments may be used to trace and debug the execution of the application code with regard to this particular variable.
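- As a rough illustration of targeting a single variable, the sketch below builds an entry from a variable's address and size. It uses Python's `ctypes` only to obtain a concrete address and size; the helper `watch_variable`, the `CamEntry` class, and the list standing in for the CAM are hypothetical and do not represent an interface defined by the source.

```python
import ctypes
from dataclasses import dataclass


@dataclass(frozen=True)
class CamEntry:
    start: int    # start address of the monitored range
    length: int   # size of the monitored range in bytes (assumed unit)
    s_bit: bool   # monitor stores
    l_bit: bool   # monitor loads


def watch_variable(cam, var, stores=True, loads=False):
    """Append an entry covering exactly the storage of `var` (hypothetical helper)."""
    entry = CamEntry(ctypes.addressof(var), ctypes.sizeof(var), stores, loads)
    cam.append(entry)
    return entry


# A 4-byte shared counter; only stores to it are of interest here.
shared_counter = ctypes.c_int32(0)
cam = []  # a plain list stands in for the hardware CAM
entry = watch_variable(cam, shared_counter)
```

- Only the four bytes of `shared_counter` are monitored; the rest of memory is left unwatched, which mirrors the targeted monitoring described above.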
- the search logic 249 of the CAM 240 is used to quickly search all of the entries in the CAM 240 in the same processor cycle and determine if there is a matching entry to an input address.
- the search logic 249 receives an input address 222 associated with the instruction 220 .
- the instruction 220 may be either a load or a store instruction.
- the search logic 249 searches the address ranges specified by the start address 242 and length 244 of each of the entries in the CAM 240 to determine if the input address 222 falls within an address range of an entry in the CAM 240 .
- the state of the S bit 246 and L bit 248 of the matching entry is determined and compared to an opcode of the load or store instruction 220 . If the opcode of the instruction 220 indicates that the instruction is a store instruction, and the S bit 246 of the corresponding matching CAM entry is set to a predetermined value, e.g., 1, then the logic of the CAM 240 may generate an exception 250 . Similarly, if the opcode of the instruction 220 indicates that the instruction is a load instruction, and the L bit 248 of the corresponding matching CAM entry is set to a predetermined value, e.g., 1, then the logic of the CAM 240 may also generate an exception 250 . If the instruction is a load instruction or a store instruction and the corresponding S bit 246 or L bit 248 is not set to the predetermined value, then no exception is generated and the execution of the instruction simply continues in a normal manner through the load/store unit 230 .
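- The match-and-compare decision above can be sketched in software as follows. The hardware searches all entries in the same cycle, whereas this model loops over them; the names `CamEntry`, `MonitoredAccess`, and `check_access`, and the use of the strings "load" and "store" as opcodes, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CamEntry:
    start: int
    length: int
    s_bit: bool
    l_bit: bool


class MonitoredAccess(Exception):
    """Stands in for the hardware exception 250."""


def check_access(cam: Iterable[CamEntry], opcode: str, address: int) -> None:
    """Raise MonitoredAccess if the access hits a monitored range of interest."""
    for entry in cam:
        # Range check, assuming a half-open range [start, start + length).
        in_range = entry.start <= address < entry.start + entry.length
        # Compare the opcode against the S bit and L bit of the entry.
        of_interest = (opcode == "store" and entry.s_bit) or (
            opcode == "load" and entry.l_bit
        )
        if in_range and of_interest:
            raise MonitoredAccess(f"{opcode} at {address:#x} hit a monitored range")
    # No matching entry of interest: the access proceeds normally.


# Stores to the 8 bytes at 0x2000 are monitored; loads are not.
cam = [CamEntry(0x2000, 8, s_bit=True, l_bit=False)]
```

- With this table, a load from 0x2004 completes silently, while a store to 0x2004 raises `MonitoredAccess`.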
- this check against the entries in the CAM 240 is performed for each thread that submits the load/store instruction 220 .
- multiple threads may be executing in the processor and each thread is checked by its corresponding CAM structure in the manner described above to determine whether the load/store instruction 220 targets an address range of interest and is an instruction of interest.
- the CAM structure 240 allows individual address ranges of the memory to be targeted as well as individual types of instructions, e.g., either loads, stores, or both loads and stores.
- the exception may be provided to an exception handler 260 .
- the exception may be sent directly to the application or debugger 280 rather than having a separate exception handler 260 .
- the exception handler 260 or the application/debugger 280 may have been previously registered to receive exceptions on behalf of the executing application. This can be done using traditional operating system techniques such as UNIX's ptrace( ) system call or the signal handling mechanisms of UNIX and UNIX-like systems.
- the operating system is responsible for channeling the exception to the appropriate entity (debugger or application) and at the appropriate code handler, as done in the current art.
- Execution of the application code is branched to the exception handler 260 or application/debugger 280 in the event of the exception 250 being generated which then may operate to collect trace/debug information in a trace data structure 270 .
- the application/debugger 280 may operate on the trace data structure 270 to perform analysis and identify potential bugs in the application code.
- the application/debugger 280 may identify potential race conditions or demonic accesses by multiple threads accessing the same address range of memory at substantially a same time. Race conditions or demonic accesses may pose serious problems with the execution of application code since data may be corrupted or otherwise made incorrect for one or more of the threads attempting to access that data due to one thread modifying the data while the other thread is attempting to use the data or modify it in a different manner.
- a first thread may be of the type:
- the application/debugger 280 may provide a debugger output 290 detailing the results of the analysis performed by the application/debugger 280 on the trace information stored in the trace data structure 270 .
- the application/debugger 280 may identify possible race conditions or demonic accesses by threads, identify the threads involved and the instructions that gave rise to the race conditions/demonic accesses, or the like.
- Various types of debugger outputs 290 may be provided based on the trace information gathered in the trace data structure 270 and the analysis performed by the application/debugger 280 .
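- One possible analysis over the gathered trace information is sketched below. The record layout `TraceRecord`, the `held_lock` flag (assumed to be recorded by the handler at exception time), and the reporting rule are all illustrative assumptions; the source does not define the contents of the trace data structure 270 or the format of the debugger output 290.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    thread_id: int
    address: int
    is_store: bool
    held_lock: bool  # assumed to be captured by the handler at exception time


def report_suspects(trace):
    """Map each suspicious address to the threads that accessed it unprotected.

    Assumed rule: an address is suspicious if more than one thread touched it,
    at least one access was a store, and at least one access held no lock.
    """
    by_address = defaultdict(list)
    for record in trace:
        by_address[record.address].append(record)

    report = {}
    for address, records in by_address.items():
        threads = {r.thread_id for r in records}
        unprotected = {r.thread_id for r in records if not r.held_lock}
        has_store = any(r.is_store for r in records)
        if len(threads) > 1 and has_store and unprotected:
            report[address] = sorted(unprotected)
    return report


trace = [
    TraceRecord(thread_id=1, address=0x10, is_store=True, held_lock=True),
    TraceRecord(thread_id=2, address=0x10, is_store=False, held_lock=False),
    TraceRecord(thread_id=1, address=0x20, is_store=True, held_lock=False),
]
report = report_suspects(trace)
```

- Here address 0x10 is reported with thread 2 as the unprotected accessor, while address 0x20 is not reported because only one thread touched it.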
- FIG. 3 is a flowchart outlining an example operation of a load/store unit in accordance with one illustrative embodiment with regard to using a content addressable memory (CAM) to trigger exceptions when load and/or store instructions, regardless of thread, attempt to access an address range of memory of interest.
- the operation in FIG. 3 assumes that the CAM structure is present in the load/store unit and has been populated with one or more entries specifying address ranges of memory that are of interest to a debugger.
- a debugger may write entries to the CAM structure to identify the address ranges of memory that are of interest to the debugger and may set the appropriate S bit and/or L bit for the types of instructions that are of interest to the debugger.
- the debugger may be registered with the system for handling exceptions generated by the CAM structure as discussed above.
- the operation starts with the receipt, in the load/store unit, of a load or store instruction (step 310 ).
- a lookup operation, or search, is performed in the CAM for the address specified in the load or store instruction to determine if the specified address is within an address range defined by one of the entries in the CAM (step 320 ).
- a determination is made as to whether there is a matching entry (step 330 ). If so, then a determination is made as to whether to generate an exception or not based on the setting of the S bit and L bit of the matching entry (step 340 ). For example, as mentioned above, if the instruction is a store and the S bit is set, or if the instruction is a load and the L bit is set, then an exception may be generated. Otherwise, the exception is not generated.
- the exception is generated and sent to an exception handler or debugger (step 350 ).
- the state of the thread that issued the load or store instruction is stored on the stack (step 360 ) and debug or trace information is gathered for the thread that generated the exception (step 370 ).
- the exception is then handled by either the exception handler or the debugger (step 380 ).
- the exception handler may analyze the debug/trace information gathered and determine if a race condition or demonic access is detected to have occurred.
- One way in which such conditions may be detected is to determine if the thread that issued the load or store instruction obtained a lock on the address range of the corresponding entry in the CAM, or at least the specific memory location identified by the address in the load or store instruction, before attempting to perform the load or store on the memory location. If so, then there is no race condition or demonic access. If the lock was not obtained, then a race condition or demonic access may have occurred.
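- The flow of FIG. 3, including the lock check just described, can be approximated in software as a minimal sketch. The tuple layout of the CAM entries, the `Thread` class, the representation of a held lock as a `(start, length)` pair, and the returned strings are hypothetical; saving the thread state on the stack (step 360) is represented only by a comment.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    thread_id: int
    # Assumption: a held lock is identified by the (start, length) of the range it protects.
    held_locks: set = field(default_factory=set)


def handle_instruction(cam, thread, opcode, address, trace):
    """Walk one load or store through steps 310 to 380 of FIG. 3."""
    # Steps 310 and 320: receive the instruction and search the CAM.
    match = next((e for e in cam if e[0] <= address < e[0] + e[1]), None)
    if match is None:  # step 330: no matching entry
        return "no match"
    start, length, s_bit, l_bit = match
    # Step 340: compare the instruction type with the S bit and L bit.
    if not ((opcode == "store" and s_bit) or (opcode == "load" and l_bit)):
        return "not of interest"
    # Steps 350 to 370: exception raised, thread state saved (not modeled),
    # and trace information gathered.
    trace.append((thread.thread_id, opcode, address))
    # Step 380: the handler checks whether the protecting lock was held.
    if (start, length) in thread.held_locks:
        return "protected"
    return "possible race"


cam = [(0x4000, 64, True, True)]          # (start, length, S bit, L bit)
locker = Thread(1, {(0x4000, 64)})        # acquired the lock before accessing
intruder = Thread(2)                      # never acquired the lock
trace = []
```

- A store by `locker` to 0x4008 is classified as protected, while the same store by `intruder` is flagged as a possible race or demonic access.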
- the illustrative embodiments provide hardware mechanisms for providing a CAM structure to assist in debugging application code.
- the mechanisms of the illustrative embodiments are especially well suited for assisting in the debugging of multi-threaded application code since one or more CAM structures, which may be associated with particular thread contexts, may be provided for generating exceptions whenever a processor attempts to access an address range of memory, regardless of the particular thread attempting the access. In this way, multiple concurrently running threads may be monitored concurrently with regard to specific address ranges of interest and with regard to particular types of instructions of interest.
- While the illustrative embodiments are described above in terms of CAM structures being provided in a load/store unit of a processor to monitor loads and/or stores to certain address ranges of memory, the illustrative embodiments are not limited to such. Rather, similar CAM structures may be provided in other functional units of a processor in order to monitor different types of instructions being executed in the processor.
- similar CAM structures may be provided in the branch unit 112 in FIG. 1 , the floating point units 122 or 124 , or the like, in order to monitor for different types of instructions and generate corresponding exceptions for generating debug or trace information.
- The key concept is the use of a hardware CAM structure to designate the address ranges of memory that are of interest and the types of instructions of interest, and the generation of an exception when an instruction of interest targets an address range of interest, regardless of which thread is executing the instruction.
- the illustrative embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements.
- the mechanisms of the illustrative embodiments are implemented in software or program code, which includes but is not limited to firmware, resident software, microcode, etc.
- a data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
- the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
- I/O devices can be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
Abstract
Mechanisms are provided for debugging application code using a content addressable memory. The mechanisms receive an instruction in a hardware unit of a processor of the data processing system, the instruction having a target memory address that the instruction is attempting to access. A content addressable memory (CAM) associated with the hardware unit is searched for an entry in the CAM corresponding to the target memory address. In response to an entry in the CAM corresponding to the target memory address being found, a determination is made as to whether information in the entry identifies the instruction as an instruction of interest. In response to the entry identifying the instruction as an instruction of interest, an exception is generated and sent to one of an exception handler or a debugger application. In this way, debugging of multithreaded applications may be performed in an efficient manner.
Description
- The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms that provide support for debugging multithreaded code.
- Writing computer programs to run in a multitude of threads is a recognized method in the current state of the art to improve application performance. Unlike single-threaded applications, which execute instructions sequentially according to program order, multithreaded applications improve performance by running multiple threads simultaneously on various processing components of a system. Performance improves because more than one processor or hardware thread is typically running the multithreaded code, thereby helping the application complete its tasks in a shorter time.
- The development of multithreaded applications remains a difficult task, however, because the programmer often has to insert synchronization code to make the threads behave in a desired manner to compute the equivalent result of the application running as a sequential program. Such synchronization code can be difficult to write and maintain. Another difficulty in developing multithreaded application code is to organize the sharing of data among the threads. Without careful organization of how threads share data among themselves, the threads within an application may overwrite each other's changes to data items in memory, or may produce unpredictable results because reads and writes of the same data item are not ordered properly. This condition is usually called a “data race” or simply a “race condition.”
- Many synchronization primitives have been invented to aid programmers in developing multithreaded applications. For example, semaphores, locks, and monitors are generally recognized techniques to impose order on shared data access and to ensure that threads interact with one another in a predictable manner. When a correctly written parallel program uses these constructs, it will generally produce correct results and behave in a deterministic manner. However, even with these constructs and primitives, the task of developing multithreaded code is not a simple one. A programmer may forget to protect access to a shared data item by failing to introduce the proper synchronization code. Such unprotected accesses are called demonic accesses, and are very difficult to track at runtime.
- Since no application code can be realistically assumed to be correct upon implementation, a debugging and testing phase usually follows code development. During this phase, the application runs a test suite (usually called regression testing) and the results are examined to see if the application can be released. If the results show errors in the application code, it is debugged by several techniques such as relating the errors back to their origins until the source of error has been identified and corrected. This technique, already difficult in sequential debugging, is even more difficult to use in multithreaded code because the application code is often not deterministic. For example, if there is a demonic access of shared data, a run of an application may have different possible schedules for the demonic access, and some of these schedules may not produce an error at all. Thus, repeating the execution of the application to find bugs is not a viable approach in debugging multithreaded code.
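- The point about schedules can be made concrete with a small enumeration. The sketch below assumes two threads, A and B, that each perform an unprotected increment of a shared counter as two separate steps (a load followed by a store); this scenario is an illustrative assumption and is not taken from the source.

```python
from collections import Counter
from itertools import permutations


def run(schedule):
    """Execute one interleaving and return the final value of the shared counter."""
    shared = 0
    local = {}
    for tid in schedule:
        if tid not in local:
            local[tid] = shared      # first step of a thread: load the counter
        else:
            shared = local[tid] + 1  # second step: store the incremented copy
    return shared


# Every distinct ordering of A's two steps and B's two steps.
schedules = set(permutations("AABB"))
outcomes = Counter(run(s) for s in schedules)
```

- Of the six possible schedules, only the two in which one thread finishes before the other starts leave the counter at 2; the other four lose an update and leave it at 1. A single run therefore may or may not expose the bug, depending on the schedule it happens to follow.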
- To exacerbate the problem, there is a dearth of tools that can help in debugging multithreaded applications. Unlike sequential code where the programmer can use tools to observe the behavior of the code as it runs through the different phases of a program, a parallel program may not execute in the same manner every time. Thus, there will be situations where a bug manifests itself some of the time, or worse yet, a bug may manifest itself rarely, making it difficult to uncover. Furthermore, many of the conventional techniques for sequential debugging may perturb the timing of a parallel program so as to mask the appearance of bugs while the debugging session is on, only to appear later when the debugging tools have been disengaged.
- In one illustrative embodiment, a method, in a processor of a data processing system, is provided for debugging application code. The method comprises receiving an instruction in a hardware unit of the processor, the instruction having a target memory address that the instruction is attempting to access. The method further comprises searching a content addressable memory (CAM) associated with the hardware unit for an entry in the CAM designating a range of addresses that includes the target memory address. Moreover, the method comprises, in response to finding an entry in the CAM designating a range of addresses that includes the target memory address, determining if information in the entry identifies the instruction as an instruction of interest. In addition, the method comprises, in response to the entry identifying the instruction as an instruction of interest, generating an exception and sending the exception to one of an exception handler or a debugger application.
- The method further includes the programmer loading the CAM associated with the hardware with ranges of addresses including variables shared among various threads in the program. Furthermore, the method includes setting the CAM of every hardware thread that runs an application thread according to an embodiment of this invention. The program is then run, and if a thread accesses a variable in the ranges specified in the CAM, a debugger verifies that the application has procured the necessary synchronization construct prior to accessing the variable. An access to a variable without protection is a potential for a synchronization bug, which is difficult to detect in conventional debugging.
- In other illustrative embodiments, a computer program product comprising a computer useable or readable medium having a computer readable program is provided. The computer readable program, when executed on a computing device, causes the computing device to perform various ones, and combinations of, the operations outlined above with regard to the method illustrative embodiment.
- In yet another illustrative embodiment, a system/apparatus is provided. The system/apparatus may comprise one or more processors and a memory coupled to the one or more processors. The memory may comprise instructions which, when executed by the one or more processors, cause the one or more processors to perform various ones, and combinations of, the operations outlined above with regard to the method illustrative embodiment.
- These and other features and advantages of the present invention will be described in, or will become apparent to those of ordinary skill in the art in view of, the following detailed description of the example embodiments of the present invention.
- The invention, as well as a preferred mode of use and further objectives and advantages thereof, will best be understood by reference to the following detailed description of illustrative embodiments when read in conjunction with the accompanying drawings, wherein:
- FIG. 1 is an example diagram of a processor architecture in which aspects of the illustrative embodiments may be implemented;
- FIG. 2 is an example block diagram of a load/store unit in accordance with one illustrative embodiment; and
- FIG. 3 is a flowchart outlining an example operation of a load/store unit in accordance with one illustrative embodiment.
- The illustrative embodiments provide a mechanism for providing debugging support for multi-threaded computer code. The mechanisms of the illustrative embodiments provide hardware support that enables an application to track memory accesses to several ranges in memory. The hardware support includes a content addressable memory (CAM) structure that can be set either by the application or a debugger that controls the application. Each entry in the CAM structure has a starting address, which designates the starting address of a range of memory being monitored. The entry further comprises a length field, which designates the size of the range of memory being monitored corresponding to the entry, a store bit (or S bit), and a load bit (or L bit), which enable detection of memory stores and loads, respectively, to the range of memory defined by the start address and length.
- At a hardware level, a processor checks every access to memory within a running thread. If the address of the memory access matches one of the entries in the CAM, i.e. the address is within a range of memory corresponding to an entry in the CAM, then the hardware issues an exception. The exception causes the state of the thread to be stored on the stack and execution to jump to an exception handling routine in software. A match of the address of the access to an entry in the CAM occurs if the memory access is a store and the corresponding address lies in the range determined by one of the CAM entries with a corresponding S bit being set to a predetermined value, e.g., 1. A match also occurs if the memory access is a load and the corresponding address lies in the range determined by one of the CAM entries with a corresponding L bit being set to a predetermined value, e.g., 1. If the S bit or the L bit is not set to the predetermined value, e.g., the S bit or L bit is set to 0, and the access is a store or load, respectively, then the match is ignored.
- To debug an application, the application or the debugger controlling the application, may set the range of memory to be monitored into one of the CAM entries and an exception handler may be provided to handle the exceptions generated upon any memory access to a monitored range. The exception handler may be used to determine where, in the application's code, a particular variable is being modified during execution, for example, by recording the variable's state at the time of the exception as well as other execution parameters, such as may be generated by performance counters, or the like.
- The CAM structure allows the hardware to monitor more than one range of memory simultaneously without any performance overhead that may cause execution dilation. To debug a multi-threaded application, the application or a debugger may set the exception handler to check whether, when a received instruction performs a store or a load to a variable's memory address, a protecting synchronization object, e.g., a lock, has been acquired by the accessing thread prior to the access. If not, then this is an instance of a race condition or a demonic access to a shared variable, both of which are common and difficult-to-find bugs in multi-threaded applications. If the protecting synchronization object has been acquired prior to the access, then a race condition or demonic access to a shared variable has not been encountered. Other types of hard-to-find bugs may be found using the hardware mechanisms of the illustrative embodiments to provide support for generating debugging exceptions and branching execution to an appropriate exception handler to gather trace information for debugging purposes.
- The mechanisms of the illustrative embodiments may be used in many different types of data processing system and processor architectures. The illustrative embodiments may be used in both single processor sequential processing architectures and multiple processor, multi-threaded data processing system architectures, to provide hardware support for debugging of computer programs. However, for purposes of this description, it will be assumed that the data processing system in which the mechanisms of the illustrative embodiments are implemented is a multi-processor (or multi-core) data processing system that provides multi-threading hardware. It should be appreciated, however, that the illustrative embodiments and the present invention are not limited to such.
- As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
- Any combination of one or more computer usable or computer readable medium(s) may be utilized. The computer-usable or computer-readable medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CDROM), an optical storage device, a transmission media such as those supporting the Internet or an intranet, or a magnetic storage device. Note that the computer-usable or computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory. In the context of this document, a computer-usable or computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-usable medium may include a propagated data signal with the computer-usable program code embodied therewith, either in baseband or as part of a carrier wave. The computer usable program code may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RF), etc.
- Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java™, Smalltalk™, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In addition, the program code may be embodied on a computer readable storage medium on the server or the remote computer and downloaded over a network to a computer readable storage medium of the remote computer or the users' computer for storage and/or execution. Moreover, any of the computing systems or data processing systems may store the program code in a computer readable storage medium after having downloaded the program code over a network from a remote computing system or data processing system.
- The illustrative embodiments are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the illustrative embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
- The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
- Referring to
FIG. 1 , an exemplary block diagram of a dual threaded processor design showing functional units and registers is depicted in accordance with an illustrative embodiment.Processor 100 may be implemented as one or more of the processing units in a multi-threaded data processing system architecture, for example. That is,processor 100 may comprise one or more processor cores supporting the simultaneous execution of more than one thread. For example,processor 100 may comprise a single integrated circuit superscalar microprocessor with dual-thread simultaneous multi-threading (SMT) that may also be operated in a single threaded mode. Accordingly, as discussed further herein below,processor 100 includes various units, registers, buffers, memories, and other sections, all of which are formed by integrated circuitry. It should be appreciated that while reference is made herein to a particular processor architecture and particular multi-threading capabilities for illustration purposes, the mechanisms of the illustrative embodiments are applicable to any processor architecture that supports any level of multi-threading, e.g., dual- thread, quad-thread, or the like. - As shown in
FIG. 1, instruction fetch unit (IFU) 102 connects to instruction cache 104. Instruction cache 104 holds instructions for multiple programs (threads) to be executed. Instruction cache 104 also has an interface to level 2 (L2) cache/memory 106. IFU 102 requests instructions from instruction cache 104 according to an instruction address, and passes instructions to instruction decode unit 108. In an illustrative embodiment, IFU 102 may request multiple instructions from instruction cache 104 for up to two threads at the same time. Instruction decode unit 108 decodes multiple instructions for up to two threads at the same time and passes decoded instructions to instruction sequencer unit (ISU) 109. -
Processor 100 may also include issue queue 110, which receives decoded instructions from ISU 109. Instructions are stored in the issue queue 110 while awaiting dispatch to the appropriate execution units. For an out-of-order processor to operate in an in-order manner, ISU 109 may selectively issue instructions quickly using false dependencies between each instruction. If the instruction does not produce data, such as in a read after write dependency, ISU 109 may add an additional source operand (also referred to as a consumer) per instruction to point to the previous target instruction (also referred to as a producer). Issue queue 110, when issuing the producer, may then wake up the consumer for issue. By introducing false dependencies, a chain of dependent instructions may then be created, such that the instructions may be issued only in order. ISU 109 uses the added consumer for instruction scheduling purposes and the instructions, when executed, do not actually use the data from the added dependency. Once ISU 109 selectively adds any required false dependencies, issue queue 110 takes over and issues the instructions in order for each thread, and outputs or issues instructions for each thread to execution units - In an illustrative embodiment, the execution units of the processor may include
branch unit 112, load/store units (LSUA) 114 and (LSUB) 116, fixed point execution units (FXUA) 118 and (FXUB) 120, floating point execution units (FPUA) 122 and (FPUB) 124, and vector multimedia extension units (VMXA) 126 and (VMXB) 128.Execution units execution units - An ARF is a file where completed data is stored once an instruction has completed execution.
ARFs - The processor additionally includes a set of shared special purpose registers (SPR) 142 for holding program states, such as an instruction pointer, stack pointer, or processor status word, which may be used on instructions from either or both threads.
Execution units ARFs internal bus structure 149. - In order to execute a floating point instruction,
FPUA 122 and FPUB 124 retrieve register source operand information, which is input data required to execute an instruction, from FPRs ARFs FPUA 122 and FPUB 124 input their data according to which thread each executing instruction belongs. For example, FPUA 122 inputs completed data to FPR 134 and FPUB 124 inputs completed data to FPR 136, because FPUA 122, FPUB 124, and FPRs 134 and 136 are thread specific. - During execution of an instruction,
FPUA 122 and FPUB 124 output their destination register operand data, or instruction data generated during execution of the instruction, to FPRs 134 and 136 when the instruction has passed the point of flushing in the pipeline. During execution of an instruction, FXUA 118, FXUB 120, LSUA 114, and LSUB 116 output their destination register operand data, or instruction data generated during execution of the instruction, to GPRs 130 and 132 when the instruction has passed the point of flushing in the pipeline. During execution of a subset of instructions, FXUA 118, FXUB 120, and branch unit 112 output their destination register operand data to SPRs 138, 140, and 142 when the instruction has passed the point of flushing in the pipeline. Program states, such as an instruction pointer, stack pointer, or processor status word, stored in SPRs thread priority 152 to ISU 109. During execution of an instruction, VMXA 126 and VMXB 128 output their destination register operand data to VRs 144 and 146 when the instruction has passed the point of flushing in the pipeline. -
Data cache 150 may also have associated with it a non-cacheable unit (not shown) which accepts data from the processor and writes it directly to level 2 cache/memory 106. In this way, the non-cacheable unit bypasses the coherency protocols required for storage to cache. - In response to the instructions input from
instruction cache 104 and decoded by instruction decode unit 108, ISU 109 selectively dispatches the instructions to issue queue 110 and then onto execution units execution units FXUA 118 and FXUB 120 execute fixed point mathematical operations on register source operands, such as addition, subtraction, ANDing, ORing and XORing. FPUA 122 and FPUB 124 execute floating point mathematical operations on register source operands, such as floating point multiplication and division. LSUA 114 and LSUB 116 execute load and store instructions, which move operand data between data cache 150 and ARFs VMXA 126 and VMXB 128 execute single instruction operations that include multiple data. Branch unit 112 executes branch instructions which conditionally alter the flow of execution through a program by modifying the instruction address used by IFU 102 to request instructions from instruction cache 104. -
Instruction completion unit 154 monitors internal bus structure 149 to determine when instructions executing in execution units ARFs branch unit 112, FXUA 118, FXUB 120, LSUA 114, and LSUB 116 require the same number of cycles to execute, while instructions executed by FPUA 122, FPUB 124, VMXA 126, and VMXB 128 require a variable, and larger, number of cycles to execute. Therefore, instructions that are grouped together and start executing at the same time do not necessarily finish executing at the same time. “Completion” of an instruction means that the instruction is finishing executing in one of execution units -
Instruction completion unit 154 monitors for the completion of instructions, and sends control information 156 to ISU 109 to notify ISU 109 that more groups of instructions can be dispatched to execution units ISU 109 sends dispatch signal 158, which serves as a throttle to bring more instructions down the pipeline to the dispatch unit, to IFU 102 and instruction decode unit 108 to indicate that it is ready to receive more decoded instructions. While processor 100 provides one detailed description of a single integrated circuit superscalar microprocessor with dual-thread simultaneous multi-threading (SMT) that may also be operated in a single threaded mode, the illustrative embodiments are not limited to such microprocessors. That is, the illustrative embodiments may be implemented in any type of processor using a pipeline technology. - In the architecture shown in
FIG. 1 , one or more of the load/store units - The logic of the load/store unit and its CAM structure are configurable by an application, debugger, or the like, to define ranges of memory, such as main memory, for which load and/or store operations targeting that range of memory should generate an exception in order to facilitate gathering of debugging information. The processor is augmented with special instructions to allow the debugger or the application to access the CAM structure, such as to load the CAM structure with ranges, and to set the corresponding S and L bits. Also, the instructions allow the application to turn off the CAM altogether to save energy when no debugging is taking place.
- The application or debugger creates an entry in the CAM structure that specifies the starting address of the range of memory, a length of the range of memory, and whether loads, stores, or loads and stores to this range of memory are to generate an exception for handling by an exception handler or the debugger application. This information is stored in the entry in the CAM structure and is searchable based on an address of an access operation to determine if the address of the access operation falls within a range specified by one of the entries in the CAM structure. If so, and the entry indicates that this type of access operation generates an exception, the exception may be generated and handled by either an exception handler or the debugger to gather debugging information and/or perform the actual debugging of the application. This may be done whether or not the application is a multi-threaded application.
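As a minimal software sketch of the entry described above (not the hardware itself), the following models a CAM entry holding a start address, a length, and the S and L bits, together with a lookup by address. The names `CamEntry`, `CamModel`, `load_entry`, and `find` are hypothetical, and treating the range as half-open, i.e. [start, start + length), is an assumption the text does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CamEntry:
    """One monitored range (hypothetical software model of a CAM entry)."""
    start: int        # starting address of the monitored range
    length: int       # length of the range, assumed to be in bytes
    store_bit: bool   # S bit: stores to the range generate an exception
    load_bit: bool    # L bit: loads from the range generate an exception

    def contains(self, address: int) -> bool:
        # Assumption: the range is half-open, [start, start + length).
        return self.start <= address < self.start + self.length


class CamModel:
    """Stand-in for the CAM plus the instructions that load and disable it."""

    def __init__(self) -> None:
        self.entries: List[CamEntry] = []
        self.enabled = True  # the text allows turning the CAM off entirely

    def load_entry(self, entry: CamEntry) -> None:
        self.entries.append(entry)

    def find(self, address: int) -> Optional[CamEntry]:
        # Hardware compares all entries in parallel; a loop is used here
        # only because this is a sequential software model.
        if not self.enabled:
            return None
        for entry in self.entries:
            if entry.contains(address):
                return entry
        return None
```

For instance, loading `CamEntry(0x1000, 16, True, False)` makes `find(0x100F)` return that entry, while `find(0x1010)` returns `None` because the address lies just past the end of the range.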
- The exception handler or debugger may be configured to identify difficult-to-find bugs in multi-threaded applications, such as race conditions or demonic accesses to shared variables. For example, in order to check for race conditions or demonic accesses, the exception handler or debugger may check to see if the thread that submitted the access operation had acquired a lock on the memory location specified by the address in the access operation prior to attempting the access operation. If so, then the debugger or exception handler may take no action and instead allow the application to resume execution. However, if the thread that attempted the access operation did not first obtain the lock for the memory location, then the debugger or exception handler may take over the execution of the application and retrieve debug or trace information for use in analysis to identify a potential bug in the application code.
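The lock check described above can be sketched as follows. This is an illustrative model only: the text does not say how a handler learns which thread holds which lock, so the `LockTable` bookkeeping, the function `handle_access_exception`, and the returned strings are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LockTable:
    """Hypothetical record of which thread holds the lock guarding an address."""
    owner_by_address: Dict[int, int] = field(default_factory=dict)

    def acquire(self, thread_id: int, address: int) -> None:
        self.owner_by_address[address] = thread_id

    def release(self, thread_id: int, address: int) -> None:
        # Only the current owner may release the lock.
        if self.owner_by_address.get(address) == thread_id:
            del self.owner_by_address[address]

    def held_by(self, thread_id: int, address: int) -> bool:
        return self.owner_by_address.get(address) == thread_id


def handle_access_exception(locks: LockTable, thread_id: int, address: int) -> str:
    """Decide what the handler does for one monitored access."""
    if locks.held_by(thread_id, address):
        # The thread locked the location first: let the application resume.
        return "resume"
    # No lock was held: a potential race, so gather debug or trace data.
    return "collect_trace"
```

With thread 1 holding the lock on address `0x2000`, an access by thread 1 yields `"resume"` and an access by thread 2 yields `"collect_trace"`.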
- If the access operation is one that is not indicated as being an access operation that generates an exception, or the address of the access operation does not fall within one of the ranges of memory defined by an entry in the CAM, then the access operation may be performed without generating an exception.
-
FIG. 2 is an example block diagram of a load/store unit in accordance with one illustrative embodiment. As shown in FIG. 2, the load/store unit 230 is augmented to include a content addressable memory (CAM) having one or more CAM entries and search logic 249. Each CAM entry includes a start address 242, a length 244, a store bit (S bit) 246, and a load bit (L bit) 248. The start address 242 and length 244 define an address range of memory that is to be monitored using the CAM 240. The start address 242 and length 244 may be specified in terms of effective addresses, virtual addresses, real or physical addresses, or the like, depending upon the particular implementation. The S bit 246 and L bit 248 designate whether one or both of store and load instructions/operations targeting the address range of memory specified by the corresponding start address 242 and length 244 are to be monitored, i.e. should generate an exception requiring exception handling. - It should be noted that, in some implementations of the illustrative embodiments, a
single CAM structure 240 may be used to handle all load/store instructions executed by all threads executing in the processor architecture. Alternatively, separate CAM structures 240 may be provided for each of the threads such that the CAM structures 240 are associated with a thread context. Thus, the load/store unit 230 may have multiple CAM structures 240, one for each thread executing in the processor. Alternatively, in an architecture having multiple load/store units 230, each load/store unit 230 may have one or more CAM structures 240 for each of the threads that they handle. In the case of multiple CAM structures 240, one for each thread, which CAM structure 240 corresponds to which thread may be specified in the thread context information of the particular thread. - An application or
debugger 280 may generate entries in the CAM 240 so that certain address ranges of memory are monitored and certain instructions, e.g., store and/or load instructions, targeting the monitored address range of memory are monitored. It should be appreciated that with the mechanisms of the illustrative embodiments, not all portions of the memory need to be monitored. To the contrary, the mechanisms of the illustrative embodiments allow the application or debugger 280 to target individual portions of memory, i.e. individual address ranges of memory, so that targeted tracing and debugging can be performed. For example, an entry in the CAM may be associated with an address range of memory corresponding to a particular variable and thus, the mechanisms of the illustrative embodiments may be used to trace and debug the execution of the application code with regard to this particular variable. - The
search logic 249 of the CAM 240 is used to quickly search all of the entries in the CAM 240 in the same processor cycle and determine if there is a matching entry to an input address. In particular, in response to an issue queue 210 issuing an instruction 220 to the load/store unit 230, the search logic 249 receives an input address 222 associated with the instruction 220. The instruction 220 may be either a load or a store instruction. In response to receiving the instruction 220 and its input address 222, the search logic 249 searches the address ranges specified by the start address 242 and length 244 of each of the entries in the CAM 240 to determine if the input address 222 falls within an address range of an entry in the CAM 240. If so, the state of the S bit 246 and L bit 248 of the matching entry is determined and compared to an opcode of the load or store instruction 220. If the opcode of the instruction 220 indicates that the instruction is a store instruction, and the S bit 246 of the corresponding matching CAM entry is set to a predetermined value, e.g., 1, then the logic of the CAM 240 may generate an exception 250. Similarly, if the opcode of the instruction 220 indicates that the instruction is a load instruction, and the L bit 248 of the corresponding matching CAM entry is set to a predetermined value, e.g., 1, then the logic of the CAM 240 may also generate an exception 250. If the instruction is a load instruction or a store instruction and the corresponding S bit 246 or L bit 248 is not set to the predetermined value, then no exception is generated and the execution of the instruction simply continues in a normal manner through the load/store unit 230. - It should be noted that this check against the entries in the
CAM 240 is performed for each thread that submits the load/store instruction 220. Thus, multiple threads may be executing in the processor and each thread is checked by its corresponding CAM structure in the manner described above to determine whether the load/store instruction 220 targets an address range of interest and is an instruction of interest. Hence, it is possible to monitor multiple threads at substantially the same time without having to serialize the monitoring on a thread by thread basis as is required in the prior art. Moreover, the CAM structure 240 allows individual address ranges of the memory to be targeted as well as individual types of instructions, e.g., either loads, stores, or both loads and stores. - In the event that an
exception 250 is generated by the CAM 240, the exception may be provided to an exception handler 260. Alternatively, the exception may be sent directly to the application or debugger 280 rather than having a separate exception handler 260. The exception handler 260 or the application/debugger 280 may have been previously registered to receive exceptions on behalf of the executing application. This can be done using traditional operating system techniques such as UNIX's ptrace( ) system call or the signal handling mechanisms of UNIX and UNIX-like systems. The operating system is responsible for channeling the exception to the appropriate entity (debugger or application) and to the appropriate code handler, as done in the current art. Execution of the application code is branched to the exception handler 260 or application/debugger 280 in the event of the exception 250 being generated, which then may operate to collect trace/debug information in a trace data structure 270. The application/debugger 280 may operate on the trace data structure 270 to perform analysis and identify potential bugs in the application code. - For example, the application/
debugger 280 may identify potential race conditions or demonic accesses by multiple threads accessing the same address range of memory at substantially the same time. Race conditions or demonic accesses may pose serious problems with the execution of application code since data may be corrupted or otherwise made incorrect for one or more of the threads attempting to access that data due to one thread modifying the data while the other thread is attempting to use the data or modify it in a different manner. For example, a first thread may be of the type: -
- Lock(I);
- v+=1;
- Unlock (I);
and a second thread may be of the type: - v+=2; //demonic variable access
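To make the effect of the unprotected second thread concrete, the following minimal sketch enumerates every interleaving of the two increments above. It assumes, as a modelling choice not stated in the text, that each `v+=n` is a separate load followed by a store, and that the first thread's lock excludes nothing because the second thread never acquires it. The function name `final_values` is hypothetical.

```python
from itertools import combinations


def final_values(initial: int, increments=(1, 2)) -> set:
    """Collect every final value of v over all load/store interleavings."""
    results = set()
    # Each thread contributes two steps: a load, then a store. Choose which
    # two of the four time slots belong to thread 0; thread 1 gets the rest.
    for slots in combinations(range(4), 2):
        order = [0 if i in slots else 1 for i in range(4)]
        v = initial
        local = [None, None]     # value each thread loaded into a register
        loaded = [False, False]
        for t in order:
            if not loaded[t]:
                local[t] = v                  # load v
                loaded[t] = True
            else:
                v = local[t] + increments[t]  # store the incremented value
        results.add(v)
    return results
```

Under these assumptions, `final_values(3)` evaluates to `{4, 5, 6}`: 6 when the increments run one after the other, and 4 or 5 when one thread's update overwrites the other's.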
- If v==3 before entering the first thread, then v==4, v==5, v==6 are all possible after exit. The same is true if v==2 before entry into the second thread. Moreover, if v==3 before entry into the second thread, both v==5 and v==6 are also possible. Thus, there is the possibility, with concurrent execution of the first and second thread, that a race condition or demonic access occurs when v is the same value in both the first and second threads. Such race conditions or demonic accesses may be detected with regard to
thread 2 in that thread 2 does not obtain the lock on the memory location before attempting to access it. This is a simple example, but it illustrates the possible problem. Actual errors occurring in multi-threaded applications will typically be more complex than this but may likewise be detected using the CAM structure and exception handling of the illustrative embodiments. - The application/
debugger 280 may provide a debugger output 290 detailing the results of the analysis performed by the application/debugger 280 on the trace information stored in the trace data structure 270. For example, the application/debugger 280 may identify possible race conditions or demonic accesses by threads, identify the threads involved and the instructions that gave rise to the race conditions/demonic accesses, or the like. Various types of debugger outputs 290 may be provided based on the trace information gathered in the trace data structure 270 and the analysis performed by the application/debugger 280. -
FIG. 3 is a flowchart outlining an example operation of a load/store unit in accordance with one illustrative embodiment with regard to using a content addressable memory (CAM) to trigger exceptions when load and/or store instructions, regardless of thread, attempt to access an address range of memory of interest. The operation in FIG. 3 assumes that the CAM structure is present in the load/store unit and has been populated with one or more entries specifying address ranges of memory that are of interest to a debugger. As noted above, a debugger may write entries to the CAM structure to identify the address ranges of memory that are of interest to the debugger and may set the appropriate S bit and/or L bit for the types of instructions that are of interest to the debugger. The debugger may be registered with the system for handling exceptions generated by the CAM structure as discussed above. - As shown in
FIG. 3, the operation starts with the receipt, in the load/store unit, of a load or store instruction (step 310). A lookup operation, or search, is performed in the CAM for the address specified in the load or store instruction to determine if the specified address is within an address range defined by one of the entries in the CAM (step 320). A determination is made as to whether there is a matching entry (step 330). If so, then a determination is made as to whether to generate an exception or not based on the setting of the S bit and L bit of the matching entry (step 340). For example, as mentioned above, if the instruction is a store and the S bit is set, or if the instruction is a load and the L bit is set, then an exception may be generated. Otherwise, the exception is not generated.
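Steps 310 through 340 can be sketched in software as below. This is a sequential model under stated assumptions rather than the hardware logic: the names `Entry`, `MonitoredAccess`, `lookup`, `should_raise`, and `execute` are hypothetical, the predetermined bit value is taken to be 1 as in the example given earlier, and a Python exception stands in for the hardware exception.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    start: int   # start address of the monitored range
    length: int  # length of the monitored range
    s_bit: int   # 1 means stores to the range generate an exception
    l_bit: int   # 1 means loads from the range generate an exception


class MonitoredAccess(Exception):
    """Stands in for the exception generated by the CAM logic."""


def lookup(entries: List[Entry], address: int) -> Optional[Entry]:
    # Steps 320 and 330: find an entry whose range covers the address.
    for entry in entries:
        if entry.start <= address < entry.start + entry.length:
            return entry
    return None


def should_raise(entry: Entry, is_store: bool) -> bool:
    # Step 340: compare the instruction type with the S and L bits.
    return entry.s_bit == 1 if is_store else entry.l_bit == 1


def execute(entries: List[Entry], address: int, is_store: bool) -> str:
    # Step 310: a load or store instruction arrives with its target address.
    entry = lookup(entries, address)
    if entry is not None and should_raise(entry, is_store):
        raise MonitoredAccess(hex(address))
    return "performed"  # no match, or the bit is clear: proceed normally
```

With a single entry `Entry(0x100, 8, 1, 0)`, a load from `0x104` is performed normally, whereas a store to `0x104` raises `MonitoredAccess`.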
- For example, the exception handler may analyze the debug/trace information gathered and determine if a race condition or demonic access is detected to have occurred. One way in which such conditions may be detected is to determine if the thread that issued the load or store instruction obtained a lock on the address range of the corresponding entry in the CAM, or at least the specific memory location identified by the address in the load or store instruction, before attempting to perform the load or store on the memory location. If so, then there is no race condition or demonic access. If the lock was not obtained, then a race condition or demonic access may have occurred.
- Thus, the illustrative embodiments provide hardware mechanisms for providing a CAM structure to assist in debugging application code. The mechanisms of the illustrative embodiments are especially well suited for assisting in the debugging of multi-threaded application code since one or more CAM structures, which may be associated with particular thread contexts, may be provided for generating exceptions whenever a processor attempts to access an address range of memory, regardless of the particular thread attempting the access. In this way, multiple concurrently running threads may be monitored concurrently with regard to specific address ranges of interest and with regard to particular types of instructions of interest.
- It should be appreciated that while the illustrative embodiments are described in terms of a CAM structure being provided in a load/store unit of a processor to monitor loads and/or stores to certain address ranges of memory, the illustrative embodiments are not limited to such. Rather, similar CAM structures may be provided in other functional units of a processor in order to monitor different types of instructions being executed in the processor. For example, similar CAM structures may be provided in the
branch unit 112 in FIG. 1, the floating point units - As noted above, it should be appreciated that the illustrative embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In one example embodiment, the mechanisms of the illustrative embodiments are implemented in software or program code, which includes but is not limited to firmware, resident software, microcode, etc.
- A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
- Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers. Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
- The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention, the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Claims (25)
1. A method, in a processor of a data processing system, for debugging application code, comprising:
receiving an instruction in a hardware unit of the processor, the instruction having a target memory address that the instruction is attempting to access in a memory of the data processing system;
searching a content addressable memory (CAM) associated with the hardware unit for an entry in the CAM corresponding to the target memory address;
in response to an entry in the CAM corresponding to the target memory address being found, determining whether information in the entry identifies the received instruction as an instruction of interest; and
in response to the entry identifying the received instruction as an instruction of interest, generating an exception and sending the exception to one of an exception handler or a debugger application.
2. The method of claim 1 , wherein searching the CAM comprises searching entries in the CAM for an entry having a starting address and length corresponding to a range of memory addresses within which the target memory address is present.
3. The method of claim 1 , wherein determining if information in the entry identifies the instruction as an instruction of interest comprises:
determining a type of the received instruction;
determining if a value in the entry indicates that the type of the received instruction is a type of instruction for which an exception should be generated; and
determining that the received instruction is an instruction of interest in response to the value in the entry indicating that the type of the received instruction is a type of instruction for which an exception should be generated.
4. The method of claim 1 , wherein the type of received instruction is one of a load instruction or a store instruction, and wherein the value in the entry indicates whether a load instruction or a store instruction is an instruction of interest.
5. The method of claim 1 , wherein entries in the CAM are created by the debugger application to identify a range of addresses in the memory to be monitored for debugging purposes.
6. The method of claim 1 , wherein the exception handler determines where, in application code, a particular variable is being modified during execution of the application code by recording a state of the variable at the time of the exception.
7. The method of claim 1 , wherein the exception handler checks for a race condition by checking whether the received instruction operates on a target address of a variable while a protecting synchronization object has been acquired by another thread prior to the received instruction attempting to access the target address of the variable, and wherein a race condition is not present when the received instruction does not operate on a target address of a variable for which a protecting synchronization object has been acquired by another thread prior to the received instruction attempting to access the target address of the variable.
8. The method of claim 1 , wherein the processor maintains a plurality of CAMs, one for each thread of execution supported by the processor.
9. The method of claim 1 , wherein entries in the CAM comprise a start address, a length, and one or more bits identifying types of instruction of interest, wherein a setting of the one or more bits to a predetermined value indicates that a corresponding type of instruction is an instruction of interest for which an exception is to be generated.
10. The method of claim 1 , wherein the exception handler checks whether the received instruction operates on a target address of a variable without procuring a corresponding synchronization object.
11. A data processing system, comprising:
a processor, comprising a hardware unit having a content addressable memory (CAM); and
a memory coupled to the processor, wherein the processor is configured to:
receive an instruction in the hardware unit of the processor, the instruction having a target memory address that the instruction is attempting to access in the memory of the data processing system;
search the CAM for an entry in the CAM corresponding to the target memory address;
determine, in response to an entry in the CAM corresponding to the target memory address being found, whether information in the entry identifies the received instruction as an instruction of interest; and
generate, in response to the entry identifying the received instruction as an instruction of interest, an exception and send the exception to one of an exception handler or a debugger application.
12. The system of claim 11 , wherein the processor searches the CAM by searching entries in the CAM for an entry having a starting address and length corresponding to a range of memory addresses within which the target memory address is present.
13. The system of claim 11 , wherein the processor determines if information in the entry identifies the instruction as an instruction of interest by:
determining a type of the received instruction;
determining if a value in the entry indicates that the type of the received instruction is a type of instruction for which an exception should be generated; and
determining that the received instruction is an instruction of interest in response to the value in the entry indicating that the type of the received instruction is a type of instruction for which an exception should be generated.
14. The system of claim 11 , wherein the type of received instruction is one of a load instruction or a store instruction, and wherein the value in the entry indicates whether a load instruction or a store instruction is an instruction of interest.
15. The system of claim 11 , wherein entries in the CAM are created by the debugger application to identify a range of addresses in the memory to be monitored for debugging purposes.
16. The system of claim 11 , wherein the exception handler determines where, in application code, a particular variable is being modified during execution of the application code by recording a state of the variable at the time of the exception.
17. The system of claim 11 , wherein the exception handler checks for a race condition by checking whether the received instruction operates on a target address of a variable while a protecting synchronization object has been acquired by another thread prior to the received instruction attempting to access the target address of the variable, and wherein a race condition is not present when the received instruction does not operate on a target address of a variable for which a protecting synchronization object has been acquired by another thread prior to the received instruction attempting to access the target address of the variable.
18. The system of claim 11 , wherein the processor maintains a plurality of CAMs, one for each thread of execution supported by the processor.
19. The system of claim 11 , wherein the hardware unit is a load/store unit of the processor.
20. The system of claim 11 , wherein entries in the CAM comprise a start address, a length, and one or more bits identifying types of instruction of interest, wherein a setting of the one or more bits to a predetermined value indicates that a corresponding type of instruction is an instruction of interest for which an exception is to be generated.
21. A computer program product comprising a computer recordable medium having a computer readable program recorded thereon, wherein the computer readable program, when executed on a computing device, causes the computing device to:
receive an instruction in a hardware unit of a processor of the computing device, the instruction having a target memory address that the instruction is attempting to access in a memory of the computing device;
search a content addressable memory (CAM) associated with the hardware unit for an entry in the CAM corresponding to the target memory address;
determine, in response to an entry in the CAM corresponding to the target memory address being found, whether information in the entry identifies the received instruction as an instruction of interest; and
generate, in response to the entry identifying the received instruction as an instruction of interest, an exception and send the exception to one of an exception handler or a debugger application.
22. The computer program product of claim 21, wherein searching the CAM comprises searching entries in the CAM for an entry having a starting address and length corresponding to a range of memory addresses within which the target memory address is present.
23. The computer program product of claim 21, wherein determining if information in the entry identifies the instruction as an instruction of interest comprises:
determining a type of the received instruction;
determining if a value in the entry indicates that the type of the received instruction is a type of instruction for which an exception should be generated; and
determining that the received instruction is an instruction of interest in response to the value in the entry indicating that the type of the received instruction is a type of instruction for which an exception should be generated.
24. The computer program product of claim 21, wherein the type of received instruction is one of a load instruction or a store instruction, and wherein the value in the entry indicates whether a load instruction or a store instruction is an instruction of interest.
25. The computer program product of claim 21, wherein:
entries in the CAM are created by the debugger application to identify a range of addresses in the memory to be monitored for debugging purposes,
the exception handler determines where, in application code, a particular variable is being modified during execution of the application code by recording a state of the variable at the time of the exception,
the exception handler checks for a race condition by checking whether the received instruction operates on a target address of a variable while a protecting synchronization object has been acquired by another thread prior to the received instruction attempting to access the target address of the variable, and
a race condition is not present when the received instruction does not operate on a target address of a variable for which a protecting synchronization object has been acquired by another thread prior to the received instruction attempting to access the target address of the variable.
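The flow recited in claims 20 through 25 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed hardware: the names `CamEntry`, `LoadStoreUnit`, `DebugException`, and `is_race` are hypothetical, the sequential loop stands in for the parallel match a real CAM would perform, and the lock owner is passed in directly rather than read from a real synchronization object.

```python
from dataclasses import dataclass
from typing import List, Optional

LOAD = "load"
STORE = "store"


@dataclass(frozen=True)
class CamEntry:
    """Hypothetical CAM entry: start address, length, and per-type interest bits."""
    start: int
    length: int
    on_load: bool = False
    on_store: bool = False

    def covers(self, address: int) -> bool:
        # The entry matches when the target address lies in [start, start + length).
        return self.start <= address < self.start + self.length

    def is_of_interest(self, kind: str) -> bool:
        # The bit for the instruction type decides whether an exception is wanted.
        if kind == LOAD:
            return self.on_load
        if kind == STORE:
            return self.on_store
        return False


class DebugException(Exception):
    """Stands in for the exception sent to the exception handler or debugger."""

    def __init__(self, thread_id: int, kind: str, address: int, entry: CamEntry):
        super().__init__(f"{kind} at {address:#x} by thread {thread_id}")
        self.thread_id = thread_id
        self.kind = kind
        self.address = address
        self.entry = entry


class LoadStoreUnit:
    """Software stand-in for the hardware unit and its associated CAM."""

    def __init__(self) -> None:
        self.cam: List[CamEntry] = []

    def add_entry(self, entry: CamEntry) -> None:
        # In the claims, the debugger application creates these entries.
        self.cam.append(entry)

    def search(self, address: int) -> Optional[CamEntry]:
        # Sequential scan; assumed here only because software has no parallel match.
        for entry in self.cam:
            if entry.covers(address):
                return entry
        return None

    def execute(self, thread_id: int, kind: str, address: int) -> None:
        entry = self.search(address)
        if entry is not None and entry.is_of_interest(kind):
            raise DebugException(thread_id, kind, address, entry)


def is_race(exc: DebugException, lock_owner: Optional[int]) -> bool:
    """Report a race when another thread holds the protecting synchronization object."""
    return lock_owner is not None and lock_owner != exc.thread_id
```

In this model, an exception handler would catch `DebugException`, record the variable's state, and call `is_race` with the identifier of whichever thread currently holds the lock protecting the monitored variable. An access made while no other thread holds that lock is not reported as a race, which mirrors the negative condition in claim 25.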
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/762,817 US20110258421A1 (en) | 2010-04-19 | 2010-04-19 | Architecture Support for Debugging Multithreaded Code |
CN2011800194904A CN102844744A (en) | 2010-04-19 | 2011-03-31 | Debugging multithreaded code |
GB1219670.5A GB2493861A (en) | 2010-04-19 | 2011-03-31 | Debugging multithreaded code |
PCT/EP2011/055029 WO2011131469A1 (en) | 2010-04-19 | 2011-03-31 | Debugging multithreaded code |
JP2013505387A JP5904993B2 (en) | 2010-04-19 | 2011-03-31 | Method, system, and computer program for debugging multithreaded code |
DE112011101364.7T DE112011101364B4 (en) | 2010-04-19 | 2011-03-31 | Troubleshooting multithreaded code |
US13/439,229 US8838939B2 (en) | 2010-04-19 | 2012-04-04 | Debugging multithreaded code by generating exception upon target address CAM search for variable and checking race condition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/762,817 US20110258421A1 (en) | 2010-04-19 | 2010-04-19 | Architecture Support for Debugging Multithreaded Code |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/439,229 Continuation US8838939B2 (en) | 2010-04-19 | 2012-04-04 | Debugging multithreaded code by generating exception upon target address CAM search for variable and checking race condition |
Publications (1)
Publication Number | Publication Date |
---|---|
US20110258421A1 (en) | 2011-10-20 |
Family
ID=43920700
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/762,817 Abandoned US20110258421A1 (en) | 2010-04-19 | 2010-04-19 | Architecture Support for Debugging Multithreaded Code |
US13/439,229 Expired - Fee Related US8838939B2 (en) | 2010-04-19 | 2012-04-04 | Debugging multithreaded code by generating exception upon target address CAM search for variable and checking race condition |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/439,229 Expired - Fee Related US8838939B2 (en) | 2010-04-19 | 2012-04-04 | Debugging multithreaded code by generating exception upon target address CAM search for variable and checking race condition |
Country Status (6)
Country | Link |
---|---|
US (2) | US20110258421A1 (en) |
JP (1) | JP5904993B2 (en) |
CN (1) | CN102844744A (en) |
DE (1) | DE112011101364B4 (en) |
GB (1) | GB2493861A (en) |
WO (1) | WO2011131469A1 (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100242026A1 (en) * | 2009-03-18 | 2010-09-23 | International Business Machines, Corporation | Enhanced thread stepping |
US20130159661A1 (en) * | 2011-12-16 | 2013-06-20 | Stmicroelectronics R&D Ltd | Hardware monitor |
US20130246736A1 (en) * | 2010-11-25 | 2013-09-19 | Toyota Jidosha Kabushiki Kaisha | Processor, electronic control unit and generating program |
US20140007054A1 (en) * | 2012-06-27 | 2014-01-02 | Youfeng Wu | Methods and systems to identify and reproduce concurrency violations in multi-threaded programs using expressions |
US20140115604A1 (en) * | 2011-12-21 | 2014-04-24 | Justin Gottschlich | Methods and systems to identify and reproduce concurrency violations in multi-threaded programs |
US20140163945A1 (en) * | 2012-12-07 | 2014-06-12 | International Business Machines Corporation | Memory frame proxy architecture for synchronization and check handling in a simulator |
US20140223137A1 (en) * | 2013-02-01 | 2014-08-07 | International Business Machines Corporation | Storing a system-absolute address (saa) in a first level translation look-aside buffer (tlb) |
US20140233573A1 (en) * | 2013-02-15 | 2014-08-21 | Broadcom Corporation | Out-of-order message filtering with aging |
US8838939B2 (en) | 2010-04-19 | 2014-09-16 | International Business Machines Corporation | Debugging multithreaded code by generating exception upon target address CAM search for variable and checking race condition |
US20140331206A1 (en) * | 2013-05-06 | 2014-11-06 | Microsoft Corporation | Identifying impacted tests from statically collected data |
US20150149984A1 (en) * | 2013-11-22 | 2015-05-28 | International Business Machines Corporation | Determining instruction execution history in a debugger |
US9117021B2 (en) | 2013-03-14 | 2015-08-25 | Intel Corporation | Methods and apparatus to manage concurrent predicate expressions |
US20160103682A1 (en) * | 2014-10-10 | 2016-04-14 | International Business Machines Corporation | Load and store ordering for a strongly ordered simultaneous multithreading core |
US20160139201A1 (en) * | 2014-11-14 | 2016-05-19 | Cavium, Inc. | Debug interface for multiple cpu cores |
US9582312B1 (en) | 2015-02-04 | 2017-02-28 | Amazon Technologies, Inc. | Execution context trace for asynchronous tasks |
CN110888773A (en) * | 2019-10-28 | 2020-03-17 | 北京字节跳动网络技术有限公司 | Method, device, medium and electronic equipment for obtaining thread identification |
CN111831464A (en) * | 2019-04-22 | 2020-10-27 | 阿里巴巴集团控股有限公司 | Data operation control method and device |
CN114003491A (en) * | 2021-10-15 | 2022-02-01 | 赛轮集团股份有限公司 | Test equipment parameter modification method and device, electronic equipment and storage medium |
CN115687159A (en) * | 2022-12-29 | 2023-02-03 | 飞腾信息技术有限公司 | Debugging method, debugging device and computer readable storage medium |
CN116955044A (en) * | 2023-09-12 | 2023-10-27 | 北京开源芯片研究院 | Method, device, equipment and medium for testing cache working mechanism of processor |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10031834B2 (en) | 2016-08-31 | 2018-07-24 | Microsoft Technology Licensing, Llc | Cache-based tracing for time travel debugging and analysis |
US10489273B2 (en) | 2016-10-20 | 2019-11-26 | Microsoft Technology Licensing, Llc | Reuse of a related thread's cache while recording a trace file of code execution |
EP4036737A1 (en) * | 2016-11-11 | 2022-08-03 | Microsoft Technology Licensing, LLC | Cache-based tracing for time travel debugging and analysis |
US10318332B2 (en) | 2017-04-01 | 2019-06-11 | Microsoft Technology Licensing, Llc | Virtual machine execution tracing |
GB2563587B (en) | 2017-06-16 | 2021-01-06 | Imagination Tech Ltd | Scheduling tasks |
GB2563588B (en) | 2017-06-16 | 2019-06-26 | Imagination Tech Ltd | Scheduling tasks |
GB2569275B (en) * | 2017-10-20 | 2020-06-03 | Graphcore Ltd | Time deterministic exchange |
CN111480150B (en) * | 2017-11-02 | 2024-07-16 | 芯力能简易股份公司 | Software environment for controlling engine debugging, testing, calibration and tuning |
CN109933517A (en) * | 2017-12-19 | 2019-06-25 | 成都鼎桥通信技术有限公司 | Test method, device and equipment based on android system |
US11907091B2 (en) | 2018-02-16 | 2024-02-20 | Microsoft Technology Licensing, Llc | Trace recording by logging influxes to an upper-layer shared cache, plus cache coherence protocol transitions among lower-layer caches |
US10565511B1 (en) * | 2018-10-01 | 2020-02-18 | Microsoft Technology Licensing, Llc | Reverse debugging of software failures |
US10691581B1 (en) | 2019-08-16 | 2020-06-23 | Sas Institute Inc. | Distributed software debugging system |
WO2022091651A1 (en) * | 2020-10-28 | 2022-05-05 | 日立Astemo株式会社 | Calculation device and inspection method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5420990A (en) * | 1993-06-17 | 1995-05-30 | Digital Equipment Corporation | Mechanism for enforcing the correct order of instruction execution |
US6101586A (en) * | 1997-02-14 | 2000-08-08 | Nec Corporation | Memory access control circuit |
US7757237B2 (en) * | 2004-06-16 | 2010-07-13 | Hewlett-Packard Development Company, L.P. | Synchronization of threads in a multithreaded computer program |
US8006075B2 (en) * | 2009-05-21 | 2011-08-23 | Oracle America, Inc. | Dynamically allocated store queue for a multithreaded processor |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS62154035A (en) * | 1985-12-27 | 1987-07-09 | Toshiba Corp | System developing supporting device |
JPH04145544A (en) * | 1990-10-05 | 1992-05-19 | Nec Corp | Debugging device |
JP2636101B2 (en) * | 1991-09-11 | 1997-07-30 | 工業技術院長 | Debug support device |
US6920634B1 (en) * | 1998-08-03 | 2005-07-19 | International Business Machines Corporation | Detecting and causing unsafe latent accesses to a resource in multi-threaded programs |
JP2002014843A (en) * | 2000-06-30 | 2002-01-18 | Mitsubishi Electric Corp | Program execution trace system |
JP2005070949A (en) * | 2003-08-21 | 2005-03-17 | Sanyo Electric Co Ltd | Program processing apparatus |
JP2007257397A (en) * | 2006-03-24 | 2007-10-04 | Fujitsu Ltd | Contention state detection process additional program, contention state detection process adding apparatus and contention state detection process adding method |
JP4930078B2 (en) * | 2007-01-31 | 2012-05-09 | 富士通株式会社 | Information processing method, information processing apparatus, information processing program, and recording medium recording the program |
US8032706B2 (en) * | 2008-08-05 | 2011-10-04 | Intel Corporation | Method and apparatus for detecting a data access violation |
US20110258421A1 (en) | 2010-04-19 | 2011-10-20 | International Business Machines Corporation | Architecture Support for Debugging Multithreaded Code |
2010
- 2010-04-19: US application US12/762,817, published as US20110258421A1 (not active, abandoned)

2011
- 2011-03-31: WO application PCT/EP2011/055029, published as WO2011131469A1 (active, application filing)
- 2011-03-31: DE application DE112011101364.7T, published as DE112011101364B4 (active)
- 2011-03-31: GB application GB1219670.5A, published as GB2493861A (not active, withdrawn)
- 2011-03-31: JP application JP2013505387A, published as JP5904993B2 (active)
- 2011-03-31: CN application CN2011800194904A, published as CN102844744A (active, pending)

2012
- 2012-04-04: US application US13/439,229, published as US8838939B2 (not active, expired, fee related)
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8327336B2 (en) * | 2009-03-18 | 2012-12-04 | International Business Machines Corporation | Enhanced thread stepping |
US20100242026A1 (en) * | 2009-03-18 | 2010-09-23 | International Business Machines, Corporation | Enhanced thread stepping |
US8838939B2 (en) | 2010-04-19 | 2014-09-16 | International Business Machines Corporation | Debugging multithreaded code by generating exception upon target address CAM search for variable and checking race condition |
US20130246736A1 (en) * | 2010-11-25 | 2013-09-19 | Toyota Jidosha Kabushiki Kaisha | Processor, electronic control unit and generating program |
US20130159661A1 (en) * | 2011-12-16 | 2013-06-20 | Stmicroelectronics R&D Ltd | Hardware monitor |
US9753870B2 (en) * | 2011-12-16 | 2017-09-05 | Stmicroelectronics (Research & Development) Limited | Hardware monitor with context switching and selection based on a data memory access and for raising an interrupt when a memory access address is outside of an address range of the selected context |
US20140115604A1 (en) * | 2011-12-21 | 2014-04-24 | Justin Gottschlich | Methods and systems to identify and reproduce concurrency violations in multi-threaded programs |
US10191834B2 (en) | 2011-12-21 | 2019-01-29 | Intel Corporation | Methods and systems to identify and reproduce concurrency violations in multi-threaded programs |
US9311143B2 (en) * | 2011-12-21 | 2016-04-12 | Intel Corporation | Methods and systems to identify and reproduce concurrency violations in multi-threaded programs |
US9135139B2 (en) * | 2012-06-27 | 2015-09-15 | Intel Corporation | Methods and systems to identify and reproduce concurrency violations in multi-threaded programs using expressions |
US20140007054A1 (en) * | 2012-06-27 | 2014-01-02 | Youfeng Wu | Methods and systems to identify and reproduce concurrency violations in multi-threaded programs using expressions |
US10387296B2 (en) | 2012-06-27 | 2019-08-20 | Intel Corporation | Methods and systems to identify and reproduce concurrency violations in multi-threaded programs using expressions |
US9323874B2 (en) * | 2012-12-07 | 2016-04-26 | International Business Machines Corporation | Simulation method using memory frame proxy architecture for synchronization and check handling |
US10204195B2 (en) * | 2012-12-07 | 2019-02-12 | International Business Machines Corporation | Simulation method using memory frame proxy architecture for synchronization and check handling |
US10204194B2 (en) * | 2012-12-07 | 2019-02-12 | International Business Machines Corporation | Memory frame proxy architecture for synchronization and check handling in a simulator |
US9336341B2 (en) * | 2012-12-07 | 2016-05-10 | International Business Machines Corporation | Memory frame proxy architecture for synchronization and check handling in a simulator |
US20140163945A1 (en) * | 2012-12-07 | 2014-06-12 | International Business Machines Corporation | Memory frame proxy architecture for synchronization and check handling in a simulator |
US9292453B2 (en) * | 2013-02-01 | 2016-03-22 | International Business Machines Corporation | Storing a system-absolute address (SAA) in a first level translation look-aside buffer (TLB) |
US20140223137A1 (en) * | 2013-02-01 | 2014-08-07 | International Business Machines Corporation | Storing a system-absolute address (saa) in a first level translation look-aside buffer (tlb) |
US9460023B2 (en) * | 2013-02-01 | 2016-10-04 | International Business Machines Corporation | Storing a system-absolute address (SAA) in a first level translation look-aside buffer (TLB) |
US20140233573A1 (en) * | 2013-02-15 | 2014-08-21 | Broadcom Corporation | Out-of-order message filtering with aging |
US9479622B2 (en) * | 2013-02-15 | 2016-10-25 | Broadcom Corporation | Out-of-order message filtering with aging |
US9117021B2 (en) | 2013-03-14 | 2015-08-25 | Intel Corporation | Methods and apparatus to manage concurrent predicate expressions |
US9830196B2 (en) | 2013-03-14 | 2017-11-28 | Intel Corporation | Methods and apparatus to manage concurrent predicate expressions |
US20140331206A1 (en) * | 2013-05-06 | 2014-11-06 | Microsoft Corporation | Identifying impacted tests from statically collected data |
US9389986B2 (en) * | 2013-05-06 | 2016-07-12 | Microsoft Technology Licensing, Llc | Identifying impacted tests from statically collected data |
US10372590B2 (en) * | 2013-11-22 | 2019-08-06 | International Business Corporation | Determining instruction execution history in a debugger |
US10977160B2 (en) * | 2013-11-22 | 2021-04-13 | International Business Machines Corporation | Determining instruction execution history in a debugger |
US10552297B2 (en) * | 2013-11-22 | 2020-02-04 | International Business Machines Corporation | Determining instruction execution history in a debugger |
US20150149984A1 (en) * | 2013-11-22 | 2015-05-28 | International Business Machines Corporation | Determining instruction execution history in a debugger |
US9940264B2 (en) * | 2014-10-10 | 2018-04-10 | International Business Machines Corporation | Load and store ordering for a strongly ordered simultaneous multithreading core |
US20160103681A1 (en) * | 2014-10-10 | 2016-04-14 | International Business Machines Corporation | Load and store ordering for a strongly ordered simultaneous multithreading core |
US20160103682A1 (en) * | 2014-10-10 | 2016-04-14 | International Business Machines Corporation | Load and store ordering for a strongly ordered simultaneous multithreading core |
US9886397B2 (en) * | 2014-10-10 | 2018-02-06 | International Business Machines Corporation | Load and store ordering for a strongly ordered simultaneous multithreading core |
US20160139201A1 (en) * | 2014-11-14 | 2016-05-19 | Cavium, Inc. | Debug interface for multiple cpu cores |
US9404970B2 (en) * | 2014-11-14 | 2016-08-02 | Cavium, Inc. | Debug interface for multiple CPU cores |
US9582312B1 (en) | 2015-02-04 | 2017-02-28 | Amazon Technologies, Inc. | Execution context trace for asynchronous tasks |
CN111831464A (en) * | 2019-04-22 | 2020-10-27 | 阿里巴巴集团控股有限公司 | Data operation control method and device |
CN110888773A (en) * | 2019-10-28 | 2020-03-17 | 北京字节跳动网络技术有限公司 | Method, device, medium and electronic equipment for obtaining thread identification |
CN114003491A (en) * | 2021-10-15 | 2022-02-01 | 赛轮集团股份有限公司 | Test equipment parameter modification method and device, electronic equipment and storage medium |
CN115687159A (en) * | 2022-12-29 | 2023-02-03 | 飞腾信息技术有限公司 | Debugging method, debugging device and computer readable storage medium |
CN116955044A (en) * | 2023-09-12 | 2023-10-27 | 北京开源芯片研究院 | Method, device, equipment and medium for testing cache working mechanism of processor |
Also Published As
Publication number | Publication date |
---|---|
GB2493861A (en) | 2013-02-20 |
US20120203979A1 (en) | 2012-08-09 |
JP5904993B2 (en) | 2016-04-20 |
WO2011131469A1 (en) | 2011-10-27 |
GB201219670D0 (en) | 2012-12-12 |
CN102844744A (en) | 2012-12-26 |
US8838939B2 (en) | 2014-09-16 |
DE112011101364B4 (en) | 2018-08-09 |
DE112011101364T5 (en) | 2013-03-28 |
JP2013528853A (en) | 2013-07-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8838939B2 (en) | Debugging multithreaded code by generating exception upon target address CAM search for variable and checking race condition | |
US7836430B2 (en) | Reversing execution of instructions in a debugger | |
Zhou et al. | iWatcher: Efficient architectural support for software debugging | |
US7950001B2 (en) | Method and apparatus for instrumentation in a multiprocessing environment | |
Lucia et al. | Colorsafe: architectural support for debugging and dynamically avoiding multi-variable atomicity violations | |
US6754856B2 (en) | Memory access debug facility | |
US8479173B2 (en) | Efficient and self-balancing verification of multi-threaded microprocessors | |
US8683185B2 (en) | Ceasing parallel processing of first set of loops upon selectable number of monitored terminations and processing second set | |
US20030135719A1 (en) | Method and system using hardware assistance for tracing instruction disposition information | |
EP3834083B1 (en) | Commit logic and precise exceptions in explicit dataflow graph execution architectures | |
KR102132805B1 (en) | Multicore memory data recorder for kernel module | |
US8850266B2 (en) | Effective validation of execution units within a processor | |
US20180267807A1 (en) | Precise exceptions for edge processors | |
US9081895B2 (en) | Identifying and tagging breakpoint instructions for facilitation of software debug | |
US8037366B2 (en) | Issuing instructions in-order in an out-of-order processor using false dependencies | |
US7984276B2 (en) | Method and system for altering processor execution of a group of instructions | |
CN108834427B (en) | Processing vector instructions | |
US20130019085A1 (en) | Efficient Recombining for Dual Path Execution | |
US7844859B2 (en) | Method and apparatus for instruction trace registers | |
US20240338220A1 (en) | Apparatus and method for implementing many different loop types in a microprocessor | |
Zhou et al. | A. Summary of the Work |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:ELNOZAHY, ELMOOTAZBELLAH N.;GHEITH, AHMED;REEL/FRAME:024254/0743 Effective date: 20100416 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO PAY ISSUE FEE |