US8229726B1 - System for application level analysis of hardware simulations - Google Patents
- Publication number
- US8229726B1 (application US11/544,341; US54434106A)
- Authority
- US
- United States
- Prior art keywords
- analysis
- data
- framework
- module
- hardware
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/32—Circuit design at the digital level
- G06F30/33—Design verification, e.g. functional simulation or model checking
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2117/00—Details relating to the type or aim of the circuit design
- G06F2117/08—HW-SW co-design, e.g. HW-SW partitioning
Definitions
- the invention relates generally to workload analysis, and more particularly, to systems and methods for providing software application-level workload analysis of simulated hardware systems.
- the present invention provides a computer system for performing application-level analysis of simulated hardware.
- the computer system comprises a hardware simulator, wherein the hardware simulator is capable of executing software on the simulated hardware and intercepting the interactions between the software and the simulated hardware.
- the computer system also comprises a framework, wherein the framework includes a plurality of analysis modules arranged in a multi-level configuration through which a data stream of one or more data entities corresponding to the interactions between the software and the simulated hardware travels and is processed at each level of the multi-level configuration.
- the present invention provides a method for enabling a computer system to perform application-level analysis of simulated hardware.
- the method comprises executing software on the simulated hardware using a hardware simulator and intercepting the interactions between the software and the simulated hardware.
- the method also comprises receiving, at a framework, a data stream of one or more data entities corresponding to the interactions between the software and the simulated hardware, the framework including a plurality of analysis modules arranged in a multi-level configuration.
- the method further comprises processing the one or more data entities, at each level of the multi-level configuration, using the plurality of analysis modules as the data stream travels through the multi-level configuration.
- the present invention provides a computer-readable medium for directing a computer system to perform application-level analysis of real or simulated hardware.
- the computer-readable medium comprises instructions for receiving the data stream of one or more data entities corresponding to interactions between software and the simulated hardware at a framework where the framework includes a plurality of analysis modules arranged in a multi-level configuration.
- the computer-readable medium also comprises instructions for processing the one or more data entities, at each level of the multi-level configuration, using the plurality of analysis modules as the data stream travels through the multi-level configuration.
- FIG. 1 is an illustration of a system for application-level analysis of a hardware simulation, in accordance with an embodiment of the present invention
- FIG. 2 is an illustration of an exemplary analysis tree of a workload analysis framework, in accordance with an embodiment of the present invention
- FIG. 3A is an illustration of an exemplary analyzer module configuration for analyzing cache misses in a hardware simulation, in accordance with an embodiment of the present invention
- FIG. 3B is an illustration of an exemplary analysis tree for providing instruction fetch analysis of a hardware simulation, in accordance with an embodiment of the present invention
- FIG. 4 is an illustration of a unified modeling language (UML) diagram of the analyzer and profile modules of the workload analysis framework, in accordance with an embodiment of the present invention
- FIG. 5A is an illustration of an exemplary system for performing application-level analysis from a trace-driven hardware simulation, in accordance with an embodiment of the present invention
- FIG. 5B is an illustration of an exemplary characterization of the segment working set growth analysis data produced by a workload analysis framework operating in a trace-driven simulation system, in accordance with an embodiment of the present invention
- FIG. 6A is an illustration of an exemplary system for performing application-level analysis from an execution-driven hardware simulation, in accordance with an embodiment of the present invention
- FIG. 6B is an illustration of an exemplary characterization of the function-level pipeline analysis data produced by the system illustrated in FIG. 6A , in accordance with an embodiment of the present invention
- FIG. 7A is a generalized diagram of a typical computer system suitable for use with the present invention.
- FIG. 7B shows subsystems in the typical computer system of FIG. 7A ;
- FIG. 7C is a generalized diagram of a typical network suitable for use with the present invention.
- Embodiments of the present invention provide a workload analysis framework that enables software engineers and hardware engineers to gain insight into the behavior of software applications on emerging chips and system architectures etc. before such chips and system architectures are built.
- the workload analysis framework of embodiments of the present invention reduces the need to extrapolate software performance characteristics from existing hardware product generations onto developing hardware architectures because analysis data can easily be collected directly from the simulation of developing hardware architectures, which significantly shortens software and hardware design improvement cycles.
- the workload analysis framework of embodiments of the present invention is an object-oriented analysis framework that is architected in a modular manner to allow a broad range of flexible analysis to be performed with the development of only a handful of analysis modules.
- the workload analysis framework of embodiments of the present invention is designed to be fully compatible for use with trace-driven or execution-driven (live) cycle-accurate etc. hardware simulators written for emerging or new chip and system architectures, and with software and hardware instruments of real (non-simulated) systems. As a result, new users need only understand individually relevant portions of the framework's structure and source code to both use and extend the toolset associated with the framework.
- FIG. 1 is an illustration of a System 100 for analyzing the performance of a future hardware system that includes a software-based Hardware Performance Simulator 102 and a Workload Analysis Framework 104 .
- the Hardware Performance Simulator 102 can be a trace-driven (e.g. functional) simulator that executes a Software Program 106 and outputs a trace of entities that include instructions, records, bus transactions, direct memory access (DMA) requests, interrupts, network packets, etc. executed during the hardware simulation.
- the outputs of the trace can then be sent to the Workload Analysis Framework 104 as an Entity Stream 108 of simulation data where analysis modules of the Workload Analysis Framework 104 can decode each entity of the Entity Stream 108 and perform analysis or filter functions on information contained in the Entity Stream 108 to produce application-level performance analysis data for an emerging hardware system.
- one or more instances of a Workload Analysis Framework 104 can be plugged in to a “live simulation” performed by an execution-driven (e.g., performance) Hardware Simulator 102 where, again, analysis modules of each instance of the Workload Analysis Framework 104 perform analysis or filter functions on the Entity Stream 108 generated at various stages of the “live simulation” to produce application-level performance analysis data for an emerging hardware system.
- the Workload Analysis Framework 104 can be used with a Hardware Simulator 102 that is capable of running in both trace-driven and execution-driven modes, or with any other type of hardware simulator that is capable of simulating emerging hardware architecture.
- software tools (not shown) or hardware instrumentation devices (not shown) can replace the Hardware Simulator 102 .
- the Workload Analysis Framework 104 can be integrated with one or more software tools that instrument real hardware by intercepting interactions (events) between software and hardware, and create a stream of Data Entities 108 for each instrumented event.
- the Workload Analysis Framework 104 can be used with one or more hardware instrumentation devices (not shown) that are capable of providing a trace of hardware events. These hardware events can be used to create a stream of Data Entities 108 for analysis of the software running on the real hardware.
- the Workload Analysis Framework 104 is an object-oriented software analysis framework that includes interchangeable analysis modules that are configured in a multi-level, tree-like functional structure to perform the application-level performance analysis for an emerging hardware system.
- the Workload Analysis Framework 104 of an embodiment of the present invention is capable of performing detailed application-level analysis on an emerging hardware system by constructing an Analysis Tree 203 of interchangeable analysis modules ( 204 , 206 ) through which simulation data (live, trace-based, etc.), the Entity Stream 108 , travels and is classified, analyzed, or filtered at each level of the Analysis Tree 203 .
- each node of the Analysis Tree is represented by an analysis module ( 204 , 206 ) that performs sub-analysis based on the classification performed by its parent. And, as the Entity Stream 108 data travels through an analysis module ( 204 , 206 ), the module ( 204 , 206 ) can add information objects to the Entity Stream 108 data for reference or analysis further down the analysis chain. At the end of a simulation, each node, starting at the “root” node, can recursively call on its children to dump their analysis to produce a categorized analysis report that contains all of the performance analysis data results for a targeted hardware system.
- the infrastructure of the Workload Analysis Framework 104 illustrated in FIG. 2 includes two primary categories of analysis modules, Analyzers 204 and Profilers 206 , which represent the foundation of any Analysis Tree 203 .
- Profilers 206 can be “leaf” modules that collect data on different events seen in the Entity Stream 108 .
- Analyzers 204 , also discussed in more detail below, can be specialized “non-leaf” modules that, like Profilers 206 , can collect data on different events seen in the Entity Stream 108 , but are also capable of classifying, annotating, or transforming the simulation data of the Entity Stream 108 for analysis by lower-level analysis modules.
- Analyzer modules 204 can be connected to other Analyzer modules 204 to classify an Entity Stream 108 further, or connected to Profiler modules 206 to perform backend analysis.
- the left-most analysis branch of the Analysis Tree 203 illustrated in FIG. 2 which includes a Standard Analyzer 204 a , a Processor ID Analyzer 204 b , a User/Kernel Analyzer 204 e , and an Opcode Count Profiler 206 a , is capable of producing one opcode count summary table for both user and kernel instructions for each processor identified in the Entity Stream 108 .
- Analyzers 204 can be specialized “non-leaf” modules that are capable of classifying, annotating, or transforming simulation Data 108 for analysis by lower-level modules. Specifically, in one embodiment of the present invention, Analyzers 204 are capable of dividing a stream of Data 108 into sub-streams of data to be analyzed separately by Analyzer 204 or Profiler 206 modules lower in the Analysis Tree 203 .
- an Analyzer module 204 can split the Entity Stream 108 into sub-streams based on classifications or characteristics that include, for example, software thread ID, CPU ID, function, memory segment, etc., so that successive analysis can be automatically performed based on a particular classification or characteristic of the Data 108 .
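- As a standalone illustration of this splitting idea (a sketch, not code from the patent; the entity fields such as cpuId and the class names are assumptions), a stream of simulation entities can be partitioned by any chosen characteristic so that each resulting sub-stream is handed to its own chain of lower-level modules:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class SubStreamDemo {
    // A toy stand-in for a simulation data entity; a real entity would carry many more fields.
    record SimEntity(int cpuId, long threadId, String opcode) {}

    public static void main(String[] args) {
        List<SimEntity> entityStream = List.of(
                new SimEntity(0, 10, "ld"),
                new SimEntity(1, 11, "st"),
                new SimEntity(0, 10, "add"));

        // Split the stream into per-CPU sub-streams; each sub-stream would then be
        // processed by whatever analyzers or profilers sit below the classifying module.
        Map<Integer, List<SimEntity>> byCpu = entityStream.stream()
                .collect(Collectors.groupingBy(SimEntity::cpuId));

        byCpu.forEach((cpu, entities) ->
                System.out.println("CPU " + cpu + ": " + entities.size() + " entities"));
    }
}
```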
- the Analyzer module 204 is also capable of annotating a data entity of the Entity Stream 108 with additional information that can be utilized by analysis modules further down the analysis chain.
- a Function Analyzer module can annotate incoming Data Entities 107 (not shown) of the Entity Stream 108 with the software application function that the entities of the Entity Stream 108 correspond to so that the software function can be identified by other analysis modules ( 204 , 206 ) without having to re-execute the entity-to-function mapping.
- the Analyzer module 204 is further capable of transforming the Entity Stream 108 by overwriting fields of a Data Entity 107 (not shown) or by substituting or adding new fields in a Data Entity 107 .
- a Data Entity 107 created from a memory access instruction in simulation can contain fields such as the context ID, instruction executed, virtual memory address accessed, physical memory accessed etc.
- a Physical Memory Mapping Analyzer module may dynamically change the value of the address field to reflect a newer memory mapping policy under study, resulting in lower level analysis being performed with the new mappings. Further, the Physical Memory Mapping Analyzer module can attach a new field, such as memory domain etc., to the list of fields in the Data Entity 107 for lower level analysis.
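- The following is a minimal, self-contained sketch of the transform-and-annotate idea described above; the field names (physAddr, memoryDomain, etc.) and the remapping policy are assumptions made for illustration, not details taken from the patent:

```java
import java.util.HashMap;
import java.util.Map;

public class PhysicalMappingDemo {
    // Toy remapping policy standing in for a "newer memory mapping policy under study".
    static long remap(long physAddr) {
        return physAddr ^ (((physAddr >>> 12) & 0x3) << 30);  // illustrative only
    }

    // Overwrite one field of a data entity and attach a new derived field for lower-level analysis.
    static Map<String, Object> remapEntity(Map<String, Object> entity) {
        Map<String, Object> out = new HashMap<>(entity);
        long newAddr = remap((Long) entity.get("physAddr"));
        out.put("physAddr", newAddr);                     // overwrite an existing field
        out.put("memoryDomain", (newAddr >>> 30) & 0x3);  // add a new field for modules further down
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> entity = new HashMap<>();
        entity.put("contextId", 7);
        entity.put("opcode", "ld");
        entity.put("virtAddr", 0x1000L);
        entity.put("physAddr", 0x8000L);
        System.out.println(remapEntity(entity));
    }
}
```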
- the Workload Analysis Framework 104 can include a set of “built-in” Analyzer modules 204 and “built-in” Profiler modules 206 that can be utilized in any combination based on the desired analysis.
- a set of built-in Analyzer modules 204 can include, but is not limited to:
- a Standard Analyzer that is a generic analysis module that can be used in experiments where there is no need to divide the simulation Data 108 into different categories.
- the Standard Analyzer is capable of profiling the total number of traps seen in an instruction trace or the total number of instructions fetched by all of the processors in a cycle accurate simulator.
- the Standard Analyzer can also be used as a “root” node of an Analysis Tree 203 when the desired analysis requires the use of multiple additional analysis modules in parallel (e.g. Processor ID Analyzer, Function Analyzer, Memory Segment Analyzer), as shown in FIG. 2 ;
- a Processor ID Analyzer that is capable of decoding information in the stream of Data 108 and categorizing entities of the Data 108 by processor ID.
- the Processor ID Analyzer can be particularly useful in the analysis of multiprocessor hardware systems;
- a Software Thread Analyzer that is capable of decoding the register information in the Entity Stream 108 and classifying the information in the data in the Entity Stream 108 by software thread ID. It is worth noting that the Software Thread Analyzer is different from the Processor ID Analyzer in that the Software Thread Analyzer is capable of analyzing multi-threaded software programs;
- a User/Kernel Analyzer that is capable of decoding the Entity Stream 108 and dividing it into sub-streams depending on whether a Data Entity 107 of the Entity Stream 108 corresponds to user or kernel instruction activity, which makes the User/Kernel Analyzer useful in determining the amount of operating system kernel activity that occurs during a workload's execution or in characterizing user code after all kernel activity has been filtered out;
- a Memory Segment Analyzer that is capable of categorizing Data Entities 107 in the Entity Stream 108 based on which virtual memory segments (e.g., stack, heap, text, data, library, etc.) the Data Entities 107 reference.
- the Memory Segment Analyzer can provide a way of relating various events, like cache misses and translation look-aside buffer (TLB) misses etc., to the application memory segment responsible for causing the misses etc.
- the Memory Segment Analyzer utilizes a configuration file that contains the process virtual address segment mappings for all active processes on a simulated machine from which a trace etc. is derived to identify the segment accessed by a particular instruction; and
- a Function Analyzer that is capable of decoding the address information in the Data 108 and categorizing the decoded data based on which source code function the address information maps to.
- the Function Analyzer offers a powerful way of mapping various events encountered in the execution (e.g. cache misses, TLB misses, branch mispredicts) to the application function responsible for causing the event.
- the Function Analyzer is capable of using a configuration file that includes application symbol table information and mapping each source code function to a virtual address range on the simulated machine and associating instructions in the trace etc. with those functions.
- Profiler modules 206 in one embodiment of the present invention, can be “leaf” modules that collect data on different events seen in the Entity Stream 108 , as illustrated in FIG. 2 .
- Profiler modules 206 form the leaves of an Analysis Tree 203 , and can be responsible for counting and/or profiling the Entity Stream 108 passed down by the Analyzer modules 204 .
- Profilers 206 are not chained to other Profilers 206 , but instead are connected to a parent Analyzer module 204 .
- a Profiler 206 can be as simple as an event counter.
- a Profiler's 206 utility increases significantly when the Profiler 206 is connected with different Analyzers 204 to form the Analysis Tree 203 .
- a simple Profiler 206 can be plugged into any combination of Analyzers 204 to generate a wide range of information within the same simulation without requiring any modifications to the Profiler 206 or Analyzers 204 . For example, as shown in FIG. 2 , a Working Set Profiler module 206 d , whose purpose is to characterize the working sets of the processes represented in the Entity Stream 108 , can be plugged into a Function Analyzer module 204 c and a Memory Segment Analyzer module 204 d to report working set growth on a per application function and per memory segment basis within the same hardware simulation.
- the Workload Analysis Framework 104 can include a set of built-in Profilers 206 that can be utilized with any combination of Analyzers 204 depending upon the particular requirements of the analysis being performed.
- This set of built-in Profilers 206 can include, but is not limited to:
- An Instruction Frequency Profiler that can decode each machine instruction in the Entity Stream 108 and report the number of instructions that fall into any number of broad categories including memops, branches, trapped instructions, arithmetic logic unit (ALU) instructions, no-ops, etc.;
- An Opcode Count Profiler that can decode each machine instruction in the stream of Data 108 and count the occurrence of each opcode.
- a Trap Profiler that can count and categorize each type of trap encountered in the Entity Stream 108 ;
- a Working Set Profiler that can decode the effective address information of each machine instruction encountered in the sequence of instructions identified in the Data 108 and report information about the growth of a working set over a period of time for the sequence of instructions;
- a Memory Stride Profiler that can decode each machine instruction encountered in the Entity Stream 108 and report a histogram summary of the distance between consecutive instructions and data memory accesses;
- a Load/Store Transition Profiler that can identify each memory operation encountered in the Entity Stream 108 and report a summary transition table (i.e. percentage of loads followed by a store, percentage of stores followed by another store, and so on);
- a Memory Reuse Profiler that can identify each memory operation encountered in the Entity Stream 108 and report a histogram summary of the number of machine instructions between accesses to the same memory location (i.e. temporal locality);
- a Context Switch Profiler that can identify changes in the memory management unit (MMU) context of the Entity Stream 108 and report a summary of machine instructions seen on a per context switch basis; and
- a Program Counter (PC) Check Profiler that can ensure that the Entity Stream 108 represents a legal SPARCTM (Scalable Processor Architecture) instruction sequence.
- the PC Check Profiler can check for legal PC sequences around branches, delay slots, traps, interrupts, etc. and report a corresponding error summary.
- the PC Check Profiler can be used as a child of the Processor ID Analyzer discussed above.
- in FIG. 3 , additional exemplary illustrations of Analysis Trees 203 according to embodiments of the present invention are shown.
- the FIG. 3 illustrations are provided for illustrative purposes only and are not meant to limit the scope of embodiments of the present invention.
- in FIG. 3A , an illustration of an Analyzer module 204 configuration that can be used to analyze cache misses is shown. Specifically, Data Entities 107 of the Entity Stream 108 which are created during cache miss events in a hardware simulation are passed to a custom Cache Line Analyzer module 204 f .
- the Cache Line Analyzer module 204 f is an analysis module that is capable of decoding an address in the Data Entities 107 of the Entity Stream 108 , and analyzing and classifying Data Entities 107 of the Entity Stream 108 according to the cache line that each cache miss corresponds to.
- Each Data Entity 107 is then passed to a Software Thread Analyzer module 204 g which further classifies the information in the Data Entity 107 according to the software thread that caused the cache miss.
- the Software Thread Analyzer module 204 g decodes the register information in the Data Entity 107 , and analyzes and classifies the information in the Data Entity 107 by software thread id.
- the Data Entity 107 is then passed to a Function Analyzer module 204 c that decodes the instruction information in the Data Entity 107 and analyzes the instruction information by mapping it back to the source code function that the instruction information corresponds to.
- the Data Entity 107 is also passed to a Memory Segment Analyzer module 204 d that decodes the address information in the Data Entity 107 and analyzes the virtual memory segment the address information belongs to.
- the Analyzer Modules ( 204 c , 204 d , 204 f , 204 g ) of the Analysis Tree 203 illustrated in FIG. 3A are capable of producing analysis data in which each miss to a cache line is mapped to the software thread that caused the miss, and in which, for each software thread, the software function and the virtual memory segment that caused the miss to a particular cache line are identified.
- in FIG. 3B , an illustration of a module configuration that can be used to analyze the instructions fetched in a hardware simulation is shown.
- Data Entities 107 of an Entity Stream 108 which are created during instruction fetch events in the hardware simulation are passed to a CPU Analyzer module 204 h which decodes the register information in each Data Entity 107 , and analyzes and classifies the Data Entity 107 by CPU.
- Each Data Entity 107 is then passed to a Software Thread Analyzer module 204 g that, as discussed above in FIG. 3A , decodes the register information in the Data Entity 107 , and analyzes and classifies information in the Data Entity 107 by software thread id.
- Each Data Entity 107 is then passed to a Working Set Profiler module 206 d (discussed above regarding FIG. 2 ) that decodes the instruction information in each Data Entity 107 and analyzes the working sets of the processes represented in the simulation Data Entity 107 and reports information about the growth of the working sets.
- each Data Entity 107 is also passed to a Function Analyzer module 204 c (discussed above in FIG. 2 ) that decodes the address information in the Data Entity 107 and categorizes the information based on which source code function the address information maps to, and passes the Data Entity 107 on to an Opcode Count Profiler module 206 a (discussed above in FIG. 2 ) and a Memory Segment Analyzer module 204 d (also discussed above in FIG. 2 ).
- the Opcode Count Profiler 206 a decodes the instruction information in the Data Entity 107 and counts the occurrence of each opcode.
- the Memory Segment Analyzer 204 d analyzes the information in the Data Entity 107 according to which memory segments the information references, and passes the Data Entity 107 to a Memory Op Profiler 206 e that decodes and analyzes the instruction information in the Data Entity 107 and reports the information for each memory operation identified in the Data Entity 107 .
- the working sets corresponding to each software thread are analyzed and further broken down by the source code functions associated with a software thread and, for each software function, the opcodes and the virtual memory segments corresponding to the software function are analyzed. And finally, for each virtual memory segment the memory operations that the virtual memory segment performs are analyzed.
- the object-oriented Workload Analysis Framework 104 of embodiments of the present invention can provide an environment in which Analyzer 204 and Profiler 206 modules can be reused and reconfigured.
- the Software Thread Analyzer 204 g , the Function Analyzer 204 c , the Memory Segment Analyzer 204 d , the Opcode Count Profiler 206 a , and the Working Set Profiler 206 d modules which are used in FIG. 3A are reused in FIG. 3B by linking the modules in varying orders to represent a different type of analysis.
- an event framework can be implemented within the Workload Analysis Framework 104 that allows an Analysis Tree 203 to be dynamically reconfigured based on trigger events.
- each Analyzer 204 or Profiler 206 module can generate events that are predicated on specified trigger conditions, thereby dynamically enabling the performance of different types and depths of analysis in response to simulation events.
- a trigger can be attached to a Software Thread Analyzer module 204 g shown in FIG. 2 which activates if an address falls within two bounds, etc., and an event handler registered with the trigger can then enable or disable a Function Analyzer 204 c branch of an Analysis Tree 203 so that more detailed analysis is performed only for addresses of interest.
- the Function Analyzer module 204 c of FIG. 2 can dynamically clone a copy of its child Working Set Profiler module 206 d for each new source code function that the Function Analyzer module 204 c encounters. This feature allows the same Analysis Tree 203 to execute on a variety of Entity Stream 108 inputs without modifying the Analysis Tree 203 .
- This ability to dynamically reconfigure an Analysis Tree 203 based on trigger events provides a further advantage of mitigating the problems that are routinely encountered with large volumes of analysis data.
- Conventional post-processing and data exploration analysis tools are incapable of efficiently handling very large data sets.
- the Workload Analysis Framework 104 of embodiments of the present invention resolves this issue by allowing data reduction at the source of the hardware simulation environment using the mechanism of filtering data based on trigger events. For example, a module can trigger analysis when a “miss ratio” of a target process's memory segment reaches a predetermined threshold, which can significantly reduce the amount of analysis data that needs to be stored and post-processed.
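- A self-contained sketch of this trigger-based data reduction is shown below; the class name, the warm-up count, and the threshold are assumptions for illustration rather than details from the patent. Downstream analysis is withheld until the observed miss ratio of the segment under study crosses the threshold, so entities seen before the trigger fires are never stored or post-processed:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class MissRatioTrigger {
    private final double threshold;                        // e.g. 0.10 for a 10% miss ratio
    private final List<Consumer<String>> downstream = new ArrayList<>();
    private long accesses;
    private long misses;
    private boolean enabled;

    public MissRatioTrigger(double threshold) { this.threshold = threshold; }

    public void addDownstream(Consumer<String> module) { downstream.add(module); }

    /** Called once per memory-access entity observed for the segment under study. */
    public void onAccess(String entityDescription, boolean wasMiss) {
        accesses++;
        if (wasMiss) misses++;
        // Fire only after a short warm-up so early noise does not enable detailed analysis.
        if (!enabled && accesses >= 100 && (double) misses / accesses >= threshold) {
            enabled = true;
        }
        if (enabled) downstream.forEach(d -> d.accept(entityDescription));
    }

    public static void main(String[] args) {
        MissRatioTrigger trigger = new MissRatioTrigger(0.10);
        trigger.addDownstream(e -> System.out.println("analyzing: " + e));
        for (int i = 0; i < 500; i++) {
            trigger.onAccess("access " + i, i % 5 == 0);   // synthetic stream, 20% miss ratio
        }
    }
}
```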
- the Analyzer 204 and Profiler 206 modules discussed above can be implemented as instances of a JavaTM-based abstract class called Module 402 . More specifically, a Profiler subclass 406 and an Analyzer subclass 404 each extend the abstract class Module 402 to create Analyzer 204 and Profiler 206 objects.
- the Module abstract class 402 implements methods for naming a module using a string identifier and for getting the position of a module in the analysis tree. It also defines abstract methods (to be implemented in the Analyzer subclass 404 and Profiler subclass 406 ) for initializing module specific data fields and structures, processing a Data Entity 108 and printing out analysis results.
- Analyzer subclass 404 implements a method for adding other Modules 402 to it.
- Analyzers 404 can be composed of zero or many other Modules 402 that perform analysis on Data Entities 108 received by the Analyzer 404 .
- a Module 402 may belong to one and only one Analyzer 404 .
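- A minimal Java sketch of the class structure described above is given below. The method names, signatures, and the delegation through an abstract classify hook are assumptions chosen to match this description (string naming, tree position, init/process/dump, addModule), not the patent's actual source code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One record, instruction, bus transaction, etc. drawn from the simulation. */
class Entity {
    final Map<String, Object> fields;
    Entity(Map<String, ?> fields) { this.fields = new HashMap<>(fields); }
}

abstract class Module {
    private final String name;   // string identifier for the module
    private Analyzer parent;     // a Module belongs to at most one Analyzer

    protected Module(String name) { this.name = name; }
    String getName() { return name; }
    /** Position of this module in the analysis tree, e.g. "root/cpu/opcode-count". */
    String getPosition() { return parent == null ? name : parent.getPosition() + "/" + name; }
    void setParent(Analyzer parent) { this.parent = parent; }

    abstract void init();             // initialize module-specific data fields and structures
    abstract void process(Entity e);  // examine one data entity
    abstract void dump();             // print out analysis results
}

abstract class Analyzer extends Module {
    private final List<Module> children = new ArrayList<>();

    protected Analyzer(String name) { super(name); }
    void addModule(Module m) { m.setParent(this); children.add(m); }

    /** Classify, annotate, or transform the entity; return null to filter it out. */
    protected abstract Entity classify(Entity e);

    @Override void init() { children.forEach(Module::init); }
    @Override void process(Entity e) {
        Entity out = classify(e);
        if (out != null) children.forEach(c -> c.process(out));
    }
    @Override void dump() { children.forEach(Module::dump); }  // recursive report dump
}

abstract class Profiler extends Module {  // leaf module: collects statistics only
    protected Profiler(String name) { super(name); }
}
```

- Under this sketch, an Analyzer is composed of zero or more child Modules, each child records its single parent Analyzer, and dumping the root recursively prints the results of the whole tree, mirroring the recursive report generation described for FIG. 2 .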
- Embodiments of the present invention are not limited to implementing the Analyzer 204 and the Profiler 206 modules using JavaTM-based class constructs, or to implementing the particular class construct shown in FIG. 4 . Rather, the Analyzer 204 and Profiler 206 modules of embodiments of the present invention can be implemented using any programming construct, including those provided in languages such as C, C++, Verilog, VHDL, etc.
- Profilers 206 can be developed for specific analysis needs without requiring any knowledge about the Analyzers 204 to which the Profilers 206 are connected, and Profilers 206 , as discussed above in FIG. 4 , can automatically benefit from existing and future Analyzers 204 because higher level analysis methods such as function analysis, memory segment analysis, etc. do not need to be implemented in the Profiler 206 itself. And when new Analyzers 204 are created, an existing Profiler 206 can automatically perform new analysis using them.
- the Workload Analysis Framework 104 of embodiments of the present invention can be used to gather workload analysis information not only from static traces, but also from detailed performance simulators of emerging hardware architectures.
- the Workload Analysis Framework 104 of embodiments of the present invention is capable of utilizing hardware simulations which run in trace-driven mode or in execution-driven mode, or both.
- in FIG. 5A , a System 500 for performing application-level analysis from a trace-driven hardware simulation is illustrated.
- the System 500 includes a Functional Simulator 502 that is capable of simulating emerging hardware architectures.
- the Functional Simulator 502 executes a Software Program 504 and outputs an instruction Trace 506 of records (e.g. register records, trap records, TLB records, cache forming records, etc.), bus transactions, direct memory access (DMA) requests, interrupts, network packets, instructions, etc. that represent a snapshot of all of the software and hardware activity and interaction that occurred during the Functional Simulator's 502 execution of the Software Program 504 .
- each instruction, record, bus transaction, DMA request, interrupt, network packet, etc. of the Trace 506 corresponds to a Data Entity 107 which can be implemented as an instance of a JavaTM class called “entity,” discussed above in FIG. 4 .
- the Data Entities 107 discussed above are sent to the Workload Analysis Framework 104 as an Entity Stream 108 where, as discussed in FIGS. 2 and 3 , the Workload Analysis Framework 104 of an embodiment of the present invention is capable of performing detailed application-level analysis on each Trace Data Entity 107 of the Entity Stream 108 by utilizing an Analysis Tree 203 of Analyzer 204 and Profiler 206 modules. The Entity Stream 108 travels through the Analysis Tree 203 and is classified, analyzed, or filtered at each level until the last Analyzer 204 module of each branch passes the last Data Entity 107 of the Entity Stream 108 to a Profiler 206 module, which outputs the performance analysis results of the simulation in Report(s) 512 .
- each Analyzer 204 and Profiler 206 module is capable of invoking a “process” operation that allows the Analyzer 204 or the Profiler 206 module to read the content of and analyze each Data Entity 107 which, as discussed above, can be implemented as a Java object represented by a class called “entity.”
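- Building on the hypothetical Module/Analyzer/Profiler sketch above (and again using assumed names such as StandardAnalyzer and an assumed "opcode" entity field), a trace-driven run then reduces to feeding each decoded Data Entity to the root module's process operation and dumping the tree at the end:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Assumes the Entity, Module, Analyzer, and Profiler classes from the earlier sketch.
class StandardAnalyzer extends Analyzer {
    StandardAnalyzer() { super("standard"); }
    @Override protected Entity classify(Entity e) { return e; }  // pass-through, no sub-streams
}

class OpcodeCountProfiler extends Profiler {
    private final Map<String, Long> counts = new HashMap<>();
    OpcodeCountProfiler() { super("opcode-count"); }
    @Override void init() { counts.clear(); }
    @Override void process(Entity e) {
        counts.merge((String) e.fields.getOrDefault("opcode", "unknown"), 1L, Long::sum);
    }
    @Override void dump() { System.out.println(getPosition() + ": " + counts); }
}

class TraceDriver {
    public static void main(String[] args) {
        Analyzer root = new StandardAnalyzer();
        root.addModule(new OpcodeCountProfiler());
        root.init();
        // In a real run these entities would be decoded from the simulator's trace.
        List<Entity> trace = List.of(
                new Entity(Map.of("opcode", "ld")),
                new Entity(Map.of("opcode", "add")),
                new Entity(Map.of("opcode", "ld")));
        trace.forEach(root::process);
        root.dump();  // recursively prints each module's report
    }
}
```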
- in FIG. 5B , an exemplary characterization of the performance analysis data reported by a Workload Analysis Framework 104 in a trace-driven hardware simulation environment is provided.
- FIG. 5B illustrates the results of a per-memory segment (e.g., application binary text, application binary data, heap space, and process stack memory segments of a virtual address space of a SolarisTM process etc.) working set trace analysis using the Standard Analyzer 204 a , Memory Segment Analyzer 204 d , and Working Set Profiler 206 d branch of Analysis Tree 203 illustrated in FIG. 2 for several of the integer benchmarks of the Standard Performance Evaluation Corporation (SPEC) CPU2000 suite.
- the data, the heap, and the stack memory segments display different growth patterns that would not be discernable if the growth patterns were analyzed as a combined group.
- the “crafty” benchmark does not display significant heap growth, but does steadily access new words in the data segment.
- most benchmarks demonstrate stepwise growth in their text working set, but the benchmarks' stack sizes remain steady. This indicates that even though new code paths continue to be executed, maximum function call depth is reached very early.
- the “gap” benchmark displays the longest function call chain while the “crafty” benchmark shows the largest text footprint.
- the performance analysis data of FIG. 5B also illustrates that even though the “bzip2” and the “gzip” benchmarks are in the same benchmark category (e.g. compression), their characteristics differ—bzip2 uses a larger heap and data segment than gzip and, though bzip2 displays phased behavior in the growth of the memory segments, gzip reaches steady state quickly.
- the Workload Analysis Framework 104 of embodiments of the present invention can be used to analyze memory access patterns of both commercial and scientific workloads and to gather statistics about workload traces that can be used by product design groups. And, as shown in FIG. 5B , these types of statistics can be used to verify that Traces 506 generated through a functional simulation accurately represent a workload execution on real hardware and to capture interesting or representative segments of a workload execution.
- the exemplary System 600 can include one or more instances of a Workload Analysis Framework (e.g., 104 a , 104 b , 104 c , 104 d ) which are selectively plugged into a multi-stage execution Pipeline 606 of a Performance Simulator 602 that performs performance model simulation of emerging or existing hardware architectures.
- each instance of the Workload Analysis Framework ( 104 a , 104 b , 104 c , 104 d ) includes an Analysis Tree 203 that can use any combination of interchangeable Analyzers 204 and Profilers 206 to perform performance analysis on the chip or system architecture simulated by the Performance Simulator 602 .
- the Performance Simulator 602 executes a Software Program 604 and produces an Entity Stream 108 which is a stream of Data Entities 107 (not shown) that, as discussed above in FIG. 4 , can each be implemented as a Java object represented by a class called “entity” and each represent records (e.g. register records, trap records, TLB records, cache forming records, etc.), bus transactions, direct memory access (DMA) requests, interrupts, network packets, or instructions, etc. of all of the software and hardware activity and interaction that occurred during the Performance Simulator's 602 execution of the Software Program 604 .
- the Entity Stream 108 flows between Blocks 608 of the Pipeline 606 .
- FIG. 6 demonstrates how the Workload Analysis Framework ( 104 a , 104 b , 104 c , 104 d ) can be hooked into a generic processor pipeline made of several blocks or stages, which include, but are not limited to, Fetch Block 608 g (pipeline stage or stages during which instruction fetch occurs), Decode Block 608 h (pipeline stage or stages during which instruction decode occurs), Execute Block 608 i (pipeline stage or stages during which instruction execution occurs), Memory Block 608 j (pipeline stage or stages during which data memory accesses occur to the memory system), Retire Block 608 k (pipeline stage or stages during which instruction retire occurs), and Commit Block 608 l (pipeline stage or stages during which instruction commit occurs).
- the Workload Analysis Framework ( 104 a , 104 b , 104 c , 104 d ) hooks similarly into different levels of the cache memory hierarchy ( 608 a , 608 b , 608 c , 608 d ). Instances of the Workload Analysis Framework ( 104 a , 104 b , 104 c , 104 d ) “snoop” the Stream 108 of Data Entities 107 flowing between Blocks 608 of the Pipeline 606 and perform analysis without impacting or modifying the simulation itself.
- the exemplary System 600 of FIG. 6A can be configured such that instances of the Workload Analysis Framework ( 104 a , 104 b , 104 c , 104 d ) each include an Analysis Tree 203 that contains a Function Analyzer 204 module which is connected to a simple Instruction Frequency Profiler 206 module.
- Workload Analysis Framework instance 104 a , which is plugged in between Execute Block 608 i and Mem Block 608 j , can characterize branch mispredicts encountered in the Entity Stream 108 , and Workload Analysis Framework instances 104 b , 104 c , and 104 d , which are respectively plugged in between Instruction Cache Block 608 d and Data Cache Block 608 e , L2 Cache Block 608 c , L3 Cache Block 608 b , and Memory Block 608 a , can characterize misses from each level of cache. As a result, these characterizations can generate function-level pipeline analysis data like that shown in FIG. 6B .
- FIG. 6B highlights the capability of the Workload Analysis Framework 104 of embodiments of the present invention to map important processor events back to the software source code.
- FIG. 6B shows the most frequently executed source code functions for each benchmark and the percentage of cache misses and branch mispredicts caused by each of those functions in simulation.
- the software function “primal_start_artifical( )” in the “mcf” benchmark might otherwise be viewed as insignificant, accounting for only approximately 1% of user instructions.
- the Workload Analysis Framework 104 of embodiments of the present invention is not limited to producing only the types of performance analysis data discussed above in FIGS. 1 through 6 .
- the Workload Analysis Framework 104 is capable of producing analysis data that can be leveraged for purposes of exploring synthetic workload inputs for wide processor and system design where trace-driven and execution-driven simulation methods can sometimes become impractical.
- the type of analysis data provided by the Workload Analysis Framework 104 of embodiments of the present invention can allow, among other things, hardware architects to understand and identify the types of software functions that behave poorly in different portions of a hardware system and to learn how proposed changes to the software and hardware design will affect benchmark behavior.
- the Workload Analysis Framework 104 of embodiments of the present invention also allows software architects to perform application tuning for emerging hardware architectures at a much earlier stage in the design cycle, which provides the software architects with more opportunity to influence the design of hardware systems.
- the Workload Analysis Framework 104 of embodiments of the present invention is based on an object-oriented infrastructure which provides a unique and modular framework for performance analysis in trace and execution etc. driven simulation environments.
- the Workload Analysis Framework 104 allows analysis modules (analyzers and profilers) to be written in isolation from other simulation-specific modules with no knowledge of the overall simulator architecture.
- the Workload Analysis Framework 104 of embodiments of the present invention also enables engineers and architects to leverage existing analysis modules without any insight into the existing module's implementation. As a result, productivity can be improved by reducing the time usually required for hardware and software development, and for providing valuable analysis data.
- the modular architecture of the Workload Analysis Framework 104 provides software engineers with the capability to provide analysis routines, specific to a small piece of their software application, that attach to the Workload Analysis Framework 104 without requiring knowledge of the larger simulation environment.
- without such a framework, it can take months to produce adequate analysis results because hardware simulation experts have to modify their simulation environments to accommodate the requirements of software experts.
- with the Workload Analysis Framework 104 , in contrast, detailed analysis results can be obtained in a matter of minutes.
- an analysis tree that performs complex analysis on a memory segment basis can be modified to perform analysis on a source code function basis by simply replacing a Memory Segment Analyzer module of the analysis tree with a Function Analyzer module, while keeping the other analysis modules the same.
- FIG. 7A is an illustration of an embodiment of an exemplary computer system 700 suitable for use with the present invention including display 703 having display screen 705 .
- Cabinet 707 houses standard computer components (not shown) such as a disk drive, CDROM drive, display adapter, network card, random access memory (RAM), central processing unit (CPU), and other components, subsystems and devices.
- User input devices such as a mouse 711 having buttons 713 , and keyboard 709 are shown.
- Computers can be configured with many different hardware components and can be made in many dimensions and styles (e.g. laptop, palmtop, pentop, server, workstation, mainframe). Any hardware platform suitable for performing the processing described herein is suitable for use with the present invention.
- FIG. 7B illustrates subsystems that might typically be found in a computer such as computer 700 .
- subsystems within box 720 are directly interfaced to internal bus 722 .
- Such subsystems typically are contained within the computer system such as within cabinet 707 of FIG. 7A .
- Subsystems include input/output (I/O) controller 724 , System Random Access Memory (RAM) 726 , Central Processing Unit (CPU) 728 , Display Adapter 730 , Serial Port 740 , Fixed Disk 742 and Network Interface Adapter 744 .
- The use of bus 722 allows each of the subsystems to transfer data among the subsystems and, most importantly, with the CPU.
- External devices can communicate with the CPU or other subsystems via the bus 722 by interfacing with a subsystem on the bus.
- Monitor 746 connects to the bus through Display Adapter 730 .
- a relative pointing device (RPD) 748 such as a mouse connects through Serial Port 740 .
- Some devices such as a Keyboard 750 can communicate with the CPU by direct means without using the main data bus as, for example, via an interrupt controller and associated registers (not shown).
- FIG. 7B is illustrative of but one suitable configuration. Subsystems, components or devices other than those shown in FIG. 7B can be added. A suitable computer system can be achieved without using all of the subsystems shown in FIG. 7B . For example, a standalone computer need not be coupled to a network so Network Interface 744 would not be required. Other subsystems such as a CDROM drive, graphics accelerator, etc. can be included in the configuration without affecting the performance of the system of the present invention.
- FIG. 7C is a generalized diagram of a typical network.
- the network system 780 includes several local networks coupled to the Internet.
- although specific network protocols, physical layers, topologies, and other network properties are presented herein, embodiments of the present invention are suitable for use with any network.
- in FIG. 7C , computer USER 1 is connected to Server 1 .
- This connection can be by a network such as Ethernet, Asynchronous Transfer Mode, IEEE standard 1553 bus, modem connection, Universal Serial Bus, etc.
- the communication link need not be wire but can be infrared, radio wave transmission, etc.
- Server 1 is coupled to the Internet.
- the Internet is shown symbolically as a collection of server routers 782 . Note that the use of the Internet for distribution or communication of information is not strictly necessary to practice the present invention but is merely used to illustrate embodiments, above. Further, the use of server computers and the designation of server and client machines are not critical to an implementation of the present invention.
- USER 1 Computer can be connected directly to the Internet.
- Server 1 's connection to the Internet is typically by a relatively high bandwidth transmission medium such as a T1 or T3 line.
- computers at 784 are shown utilizing a local network at a different location from USER 1 computer.
- the computers at 784 are coupled to the Internet via Server 2 .
- USER 3 and Server 3 represent yet a third installation.
- a server is a machine or process that is providing information to another machine or process, i.e., the “client,” that requests the information.
- a computer or process can be acting as a client at one point in time (because it is requesting information) and as a server at another point in time (because it is providing information).
- Some computers are consistently referred to as “servers” because they usually act as a repository for a large amount of information that is often requested. For example, a World Wide Web (WWW, or simply, “Web”) site is often hosted by a server computer with a large storage capacity, high-speed processor and Internet link having the ability to handle many high-bandwidth communication lines.
- WWW World Wide Web
- a server machine will most likely not be manually operated by a human user on a continual basis, but, instead, has software for constantly, and automatically, responding to information requests.
- some machines, such as desktop computers, are typically thought of as client machines because they are primarily used to obtain information from the Internet for a user operating the machine.
- the machine may actually be performing the role of a client or server, as the need may be.
- a user's desktop computer can provide information to another desktop computer.
- a server may directly communicate with another server computer.
- this is sometimes referred to as "peer-to-peer" communication.
- although processes of the present invention, and the hardware executing the processes, may be characterized by language common to a discussion of the Internet (e.g., “client,” “server,” “peer”), it should be apparent that software of the present invention can execute on any type of suitable hardware, including networks other than the Internet.
- although software of the present invention may be presented as a single entity, such software is readily able to be executed on multiple machines. That is, there may be multiple instances of a given software program, a single program may be executing on different physical machines, etc. Further, two different programs, such as a client and a server program, can be executing in a single machine, or in different machines. A single program can be operating as a client for one information transaction and as a server for a different information transaction.
- a “computer” for purposes of embodiments of the present invention may include any processor-containing device, such as a mainframe computer, personal computer, laptop, notebook, microcomputer, server, personal data manager (also referred to as a personal information manager or “PIM”), smart cellular or other phone, so-called smart card, set-top box, or the like.
- a “computer program” may include any suitable locally or remotely executable program or sequence of coded instructions which are to be inserted into a computer, well known to those skilled in the art. Stated more specifically, a computer program includes an organized list of instructions that, when executed, causes the computer to behave in a predetermined manner.
- a computer program contains a list of ingredients (called variables) and a list of directions (called statements) that tell the computer what to do with the variables.
- the variables may represent numeric data, text, audio or graphical images. If a computer is employed for synchronously presenting multiple video program ID streams, such as on a display screen of the computer, the computer would have suitable instructions (e.g., source code) for allowing a user to synchronously display multiple video program ID streams in accordance with the embodiments of the present invention.
- if a computer is employed for presenting other media via a suitable directly or indirectly coupled input/output (I/O) device, the computer would have suitable instructions for allowing a user to input or output (e.g., present) program code and/or data information, respectively, in accordance with the embodiments of the present invention.
- a “computer-readable medium” for purposes of embodiments of the present invention may be any medium that can contain, store, communicate, propagate, or transport the computer program for use by or in connection with the instruction execution system, apparatus, or device.
- the computer readable medium can be, by way of example only but not by limitation, an electronic, magnetic, optical, electromagnetic, or semiconductor system, apparatus, device, or computer memory.
- the computer readable medium may have suitable instructions for synchronously presenting multiple video program ID streams, such as on a display screen, or for providing for input or presenting in accordance with various embodiments of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/544,341 US8229726B1 (en) | 2006-10-05 | 2006-10-05 | System for application level analysis of hardware simulations |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/544,341 US8229726B1 (en) | 2006-10-05 | 2006-10-05 | System for application level analysis of hardware simulations |
Publications (1)
Publication Number | Publication Date |
---|---|
US8229726B1 true US8229726B1 (en) | 2012-07-24 |
Family
ID=46513948
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/544,341 Active 2028-06-16 US8229726B1 (en) | 2006-10-05 | 2006-10-05 | System for application level analysis of hardware simulations |
Country Status (1)
Country | Link |
---|---|
US (1) | US8229726B1 (en) |
2006-10-05: US application 11/544,341; patent US8229726B1 (en); status: Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6425762B1 (en) * | 1998-02-24 | 2002-07-30 | Wind River Systems, Inc. | System and method for cosimulation of heterogeneous systems |
US6263303B1 (en) * | 1998-10-26 | 2001-07-17 | Sony Corporation | Simulator architecture |
US6763452B1 (en) * | 1999-01-28 | 2004-07-13 | Ati International Srl | Modifying program execution based on profiling |
US6230114B1 (en) * | 1999-10-29 | 2001-05-08 | Vast Systems Technology Corporation | Hardware and software co-simulation including executing an analyzed user program |
US6973417B1 (en) * | 1999-11-05 | 2005-12-06 | Metrowerks Corporation | Method and system for simulating execution of a target program in a simulated target system |
Non-Patent Citations (2)
Title |
---|
Nogiec et al. "A Dynamically Reconfigurable Data Stream Processing System", Oct. 2004, Computing in High Energy Physics and Nuclear Physics 2004, pp. 429-432. * |
Nogiec et al. "Configuring systems from components: the EMS approach", Jul. 2004, Nuclear Instruments & Methods in Physics Research, pp. 101-105. * |
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090037888A1 (en) * | 2007-07-30 | 2009-02-05 | Fujitsu Limited | Simulation of program execution to detect problem such as deadlock |
US20090313312A1 (en) * | 2008-06-11 | 2009-12-17 | International Business Machines Corporation | Method of Enhancing De-Duplication Impact by Preferential Selection of Master Copy to be Retained |
US8682850B2 (en) * | 2008-06-11 | 2014-03-25 | International Business Machines Corporation | Method of enhancing de-duplication impact by preferential selection of master copy to be retained |
US20110276833A1 (en) * | 2010-05-04 | 2011-11-10 | Oracle International Corporation | Statistical analysis of heap dynamics for memory leak investigations |
US8504878B2 (en) * | 2010-05-04 | 2013-08-06 | Oracle International Corporation | Statistical analysis of heap dynamics for memory leak investigations |
US8522216B2 (en) | 2010-05-04 | 2013-08-27 | Oracle International Corporation | Memory leak detection |
US8423986B1 (en) * | 2011-10-12 | 2013-04-16 | Accenture Global Services Limited | Random utility generation technology |
US8966211B1 (en) * | 2011-12-19 | 2015-02-24 | Emc Corporation | Techniques for dynamic binding of device identifiers to data storage devices |
US12155538B2 (en) | 2012-09-28 | 2024-11-26 | Intel Corporation | Managing data center resources to achieve a quality of service |
US11722382B2 (en) | 2012-09-28 | 2023-08-08 | Intel Corporation | Managing data center resources to achieve a quality of service |
US10554505B2 (en) * | 2012-09-28 | 2020-02-04 | Intel Corporation | Managing data center resources to achieve a quality of service |
CN105247532B (en) * | 2013-03-18 | 2019-05-31 | The Trustees of Columbia University in the City of New York | Unsupervised detection of anomalous processes using hardware features |
US9996694B2 (en) | 2013-03-18 | 2018-06-12 | The Trustees Of Columbia University In The City Of New York | Unsupervised detection of anomalous processes using hardware features |
US10025929B2 (en) | 2013-03-18 | 2018-07-17 | The Trustees Of Columbia University In The City Of New York | Detection of anomalous program execution using hardware-based micro-architectural data |
WO2014152469A1 (en) * | 2013-03-18 | 2014-09-25 | The Trustees Of Columbia University In The City Of New York | Unsupervised anomaly-based malware detection using hardware features |
CN105247532A (en) * | 2013-03-18 | 2016-01-13 | The Trustees of Columbia University in the City of New York | Unsupervised anomaly-based malware detection using hardware features |
US10339229B1 (en) | 2013-05-31 | 2019-07-02 | Cadence Design Systems, Inc. | Simulation observability and control of all hardware and software components of a virtual platform model of an electronics system |
GB2524016A (en) * | 2014-03-11 | 2015-09-16 | Advanced Risc Mach Ltd | Hardware simulation |
GB2524016B (en) * | 2014-03-11 | 2021-02-17 | Advanced Risc Mach Ltd | Hardware simulation |
US10824451B2 (en) | 2014-03-11 | 2020-11-03 | Arm Limited | Hardware simulation |
US11487561B1 (en) * | 2014-12-24 | 2022-11-01 | Cadence Design Systems, Inc. | Post simulation debug and analysis using a system memory model |
US9507891B1 (en) | 2015-05-29 | 2016-11-29 | International Business Machines Corporation | Automating a microarchitecture design exploration environment |
US9665674B2 (en) * | 2015-05-29 | 2017-05-30 | International Business Machines Corporation | Automating a microarchitecture design exploration environment |
US10802852B1 (en) | 2015-07-07 | 2020-10-13 | Cadence Design Systems, Inc. | Method for interactive embedded software debugging through the control of simulation tracing components |
CN109213725A (en) * | 2017-06-30 | 2019-01-15 | Intel IP Corporation | Software-reconfigurable mobile device and method |
US20190179655A1 (en) * | 2017-12-12 | 2019-06-13 | Arch Systems, Inc. | System and method for physical machine monitoring and analysis |
US10437619B2 (en) * | 2017-12-12 | 2019-10-08 | Arch Systems Inc. | System and method for physical machine monitoring and analysis |
US10892971B2 (en) | 2019-03-12 | 2021-01-12 | Arch Systems Inc. | System and method for network communication monitoring |
US11580228B2 (en) * | 2019-11-22 | 2023-02-14 | Oracle International Corporation | Coverage of web application analysis |
US11604718B1 (en) | 2020-03-04 | 2023-03-14 | Elasticsearch B.V. | Profiling by unwinding stacks from kernel space using exception handling data |
US11720468B1 (en) * | 2020-03-04 | 2023-08-08 | Elasticsearch B.V. | Unwinding program call stacks for performance profiling |
CN112308222A (en) * | 2020-10-27 | 2021-02-02 | Zhejiang Lab | A full-system simulator based on RRAM storage and computing and its design method |
CN112308222B (en) * | 2020-10-27 | 2023-06-23 | Zhejiang Lab | A full-system simulator based on RRAM storage and computing and its design method |
US11928045B1 (en) * | 2021-04-21 | 2024-03-12 | Cadence Design Systems, Inc. | System and method for non-intrusive debugging at an embedded software breakpoint |
US12229043B1 (en) * | 2022-06-06 | 2025-02-18 | Cadence Design Systems, Inc. | Method and system for dynamic windows traffic in emulation systems |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US8229726B1 (en) | System for application level analysis of hardware simulations | |
Jia et al. | Characterizing data analysis workloads in data centers | |
US8166462B2 (en) | Method and apparatus for sorting and displaying costs in a data space profiler | |
US8640114B2 (en) | Method and apparatus for specification and application of a user-specified filter in a data space profiler | |
US8762951B1 (en) | Apparatus and method for profiling system events in a fine grain multi-threaded multi-core processor | |
Las-Casas et al. | Sifter: Scalable sampling for distributed traces, without feature engineering | |
US8813055B2 (en) | Method and apparatus for associating user-specified data with events in a data space profiler | |
US8627335B2 (en) | Method and apparatus for data space profiling of applications across a network | |
US8176475B2 (en) | Method and apparatus for identifying instructions associated with execution events in a data space profiler | |
US8032875B2 (en) | Method and apparatus for computing user-specified cost metrics in a data space profiler | |
US10289411B2 (en) | Diagnosing production applications | |
US8136124B2 (en) | Method and apparatus for synthesizing hardware counters from performance sampling | |
Gonzalez et al. | Profiling hyperscale big data processing | |
Jia et al. | Understanding big data analytics workloads on modern processors | |
US11580228B2 (en) | Coverage of web application analysis | |
Han et al. | Benchmarking big data systems: State-of-the-art and future directions | |
Lagraa et al. | Data mining MPSoC simulation traces to identify concurrent memory access patterns |
Wolf et al. | Large event traces in parallel performance analysis. | |
Mysore et al. | Profiling over adaptive ranges | |
Umar et al. | PTI-GPU: Kernel profiling and assessment on Intel GPUs |
Ryckbosch et al. | Analyzing performance traces using temporal formulas | |
Lagraa | New MP-SoC profiling tools based on data mining techniques | |
Hauswirth et al. | Temporal vertical profiling | |
Shaccour et al. | A loop-based methodology for reducing computational redundancy in workload sets | |
Zhai et al. | Performance Analysis of Parallel Applications for HPC |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SUN MICROSYSTEMS, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MAGDON-ISMAIL, TARIQ;CHEVERESAN, RAZVAN;RAMSAY, MATTHEW D.;REEL/FRAME:018400/0427 Effective date: 20061004 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: ORACLE AMERICA, INC., CALIFORNIA Free format text: MERGER AND CHANGE OF NAME;ASSIGNORS:ORACLE USA, INC.;SUN MICROSYSTEMS, INC.;ORACLE AMERICA, INC.;REEL/FRAME:037311/0171 Effective date: 20100212 |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 12 |