WO2006054265A2 - Co-simulation of a processor design - Google Patents

Co-simulation of a processor design

Info

Publication number
WO2006054265A2
Authority
WO
WIPO (PCT)
Prior art keywords
instructions
simulator
processor
simulation
execution
Prior art date
Application number
PCT/IB2005/053820
Other languages
French (fr)
Other versions
WO2006054265A3 (en)
Inventor
Robert J. De Gruijl
Original Assignee
Koninklijke Philips Electronics, N.V.
U.S. Philips Corporation
Application filed by Koninklijke Philips Electronics, N.V. and U.S. Philips Corporation
Publication of WO2006054265A2
Publication of WO2006054265A3

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00: Computer-aided design [CAD]
    • G06F30/30: Circuit design
    • G06F30/32: Circuit design at the digital level
    • G06F30/33: Design verification, e.g. functional simulation or model checking

Definitions

  • the present invention relates to processor simulation, and more particularly, but not exclusively, relates to co-simulation of a processor with different models.
  • Simulation techniques have been embraced to debug and otherwise evaluate the performance of electronic circuitry.
  • the simulation of complex programmable processor logic is often performed with several different simulation models. Unfortunately, these models sometimes fail to provide the desired trade-off between accuracy and speed of execution. Thus, there is an ongoing need for further contributions in this area of technology.
  • One embodiment of the present application is a unique processor simulation technique.
  • Other embodiments include unique methods, systems, devices, and apparatus to simulate a processor.
  • a further embodiment of the present application includes providing an instruction set architecture simulation and a processor microarchitecture simulation that can be interlaced together.
  • the instruction set architecture simulator stores respective instruction execution information in a first-in, first-out queue
  • the processor microarchitecture simulation accesses the queue in accordance with a sequence of instructions being simulated.
  • Another embodiment of the present application includes: providing a processor design including an instruction set architecture and a processor microarchitecture to implement the instruction set architecture, simulating the processor design by operating a first simulator to simulate execution of a sequence of processor instructions in accordance with the instruction set architecture and a second simulator to simulate performance of each of the instructions in accordance with the microarchitecture.
  • this simulation further includes storing respective instruction execution information determined with the first simulator in a queue and accessing the queue with the second simulator to evaluate execution timing of the instructions.
  • Still another embodiment includes a device with computer-executable logic operable to perform a simulation of a processor design that includes an instruction set architecture and a processor microarchitecture to implement the instruction set architecture.
  • the simulation includes a first simulator to simulate execution of a sequence of instructions in accordance with the instruction set architecture and a second simulator to simulate performance of each of the instructions in accordance with the microarchitecture.
  • the simulation stores respective instruction execution information determined with the first simulator in a queue for each of the instructions and accesses this queue to evaluate execution timing behavior of the sequence of instructions with the second simulator.
  • Yet another embodiment is directed to a system that includes an operator input device, a computer processor, and an output device.
  • the processor is responsive to the input device to execute simulation logic to evaluate a processor design.
  • the simulation logic defines a first simulator to simulate execution of a sequence of processor instructions in accordance with an instruction set architecture model of the processor design and a second simulator to simulate performance of each of the instructions with a microarchitecture model of the processor design; where the microarchitecture model is effective to implement the instruction set architecture model in the processor design.
  • the simulation logic defines a queue, stores respective instruction execution information in the queue for each of the instructions, and accesses the queue to evaluate execution timing behavior of the instructions as a function of the respective instruction execution information.
  • the output device provides an output representative of execution timing behavior of the instructions.
  • an apparatus includes means for performing a simulation of a processor executing a sequence of instructions based on an instruction set architecture model of the processor, means for storing respective instruction information in a queue for each one of the instructions simulated with the instruction set architecture model during the simulation, and means for determining execution timing behavior of the sequence of instructions by simulating instruction execution in accordance with a microarchitecture model of the processor as a function of the respective instruction information read from the queue for each one of the instructions.
  • One object of the present invention is to provide a unique processor simulation technique.
  • Fig. 1 is a schematic view of a computer system.
  • Fig. 2 is a diagrammatic view of a processor co-simulation model that is executed with the computer system of Fig. 1.
  • Figs. 3 and 4 depict processor simulation flowcharts corresponding to the processor model of Fig. 2.
  • One embodiment of the present application is a microprocessor simulation model that combines an instruction set architecture simulator with a microarchitecture simulator for fast and accurate hardware and software co-simulation.
  • a queue is utilized to pass trace records generated with the instruction set architecture simulator to the microarchitecture simulator.
  • Fig. 1 diagrammatically depicts computer system 20 of a further embodiment of the present application.
  • System 20 includes computer 21 with processor 22.
  • Processor 22 performs operations in accordance with programming instructions and/or another form of operating logic, and more particularly is of a type suitable to perform simulation techniques described hereinafter.
  • processor 22 is integrated circuit based, including one or more digital, solid-state central processing units each in the form of a microprocessor. It should be understood that while a single processor 22 is depicted, it is representative of multiprocessor arrangements as well as single processor arrangements. Further, processor 22 can be of a reduced instruction set (RISC) type, a complex instruction set (CISC) type, or a combination of both. For multiple processor forms, parallel and/or pipeline processing can be utilized as appropriate. Alternatively or additionally, processor 22 can be provided in the form of one or more components in a single unit or as multiple units. In one embodiment, processor 22 is in the form of one or more highly integrated, digital semiconductor devices.
  • System 20 also includes operator input devices 24 and operator output devices 26 operatively coupled to processor 22.
  • Input devices 24 include a conventional mouse 24a and keyboard 24b, and alternatively or additionally can include a trackball, light pen, voice recognition subsystem, and/or different input device type as would occur to those skilled in the art.
  • Output devices 26 include a conventional graphic display 26a, such as a color or noncolor plasma, Cathode Ray Tube (CRT), or Liquid Crystal Display (LCD) type, and printer 26b.
  • output devices 26 can include an aural output system and/or different output device type as would occur to those skilled in the art.
  • operator input devices 24 or operator output devices 26 may be utilized.
  • Memory 28 is operatively coupled to processor 22.
  • Memory 28 can be of one or more types, such as solid-state electronic memory, magnetic memory, optical memory, or a combination of these.
  • memory 28 includes a removable/portable memory device 28a that can be an optical disk (such as a CD ROM or DVD); a magnetically encoded hard disk, floppy disk, tape, or cartridge; and/or a different form as would occur to those skilled in the art.
  • at least a portion of memory 28 is operable to store operating logic for processor 22 in the form of programming instructions.
  • memory 28 can be arranged to store data other than programming instructions for processor 22.
  • memory 28 and/or portable memory device 28a may not be present.
  • System 20 also includes computer network 30, which can be a Local Area Network (LAN); Municipal Area Network (MAN); Wide Area Network (WAN), such as the Internet; another type as would occur to those skilled in the art; or a combination of these.
  • Network 30 couples computer 31 to computer 21; where computer 31 is remotely located relative to computer 21.
  • Computer 31 can include a processor, input devices, output devices, and/or memory as described in connection with computer 21; however these features of computer 31 are not shown to preserve clarity.
  • Computer 31 and computer 21 can be arranged as client and server, respectively, in relation to some or all of the processing performed with system 20. For this arrangement, it should be understood that many other remote computers 31 could be included as clients of computer 21, but are not shown to preserve clarity.
  • computer 21 and computer 31 can both be participating members of a distributed processing arrangement with one or more processing units at a different site relative to others.
  • the distributed processors of such an arrangement can be used collectively to execute routines according to the present invention.
  • remote computer 31 may be absent.
  • Operating logic for processor 22 is arranged to facilitate performance of various routines, subroutines, computer models, simulations, procedures, stages, operations, and/or conditionals described hereinafter.
  • This operating logic can be of a dedicated, hardwired variety and/or in the form of programming instructions as is appropriate for the particular processor arrangement.
  • Such logic can be at least partially encoded on device 28a for storage and/or transport to another computer.
  • the operating logic of computer 21 can be in the form of one or more signals carried by a transmission medium, such as network 30.
  • Hardware/software co-simulation is used to evaluate digital processor-based systems in which the components exist at various levels of modeling abstraction.
  • embedded software can be simulated together with components modeled in a Hardware Description Language (HDL) (such as Verilog or the like) or components modeled in a high-level programming language (such as SystemC, C++, or the like).
  • Processor simulation with components modeled at a high abstraction level is typically performed when a lower-level model (such as a detailed HDL model) is not yet available, or when it is desired to execute the simulation faster than would be possible with a standard lower-level model.
  • low level HDL modelling often reveals design shortcomings and provides greater accuracy that a high level model does not. Accordingly, the ability to combine elements of both approaches is sometimes desirable.
  • Fig. 2 depicts processor model 40 in diagrammatic form.
  • Model 40 is developed in accordance with a processor design to be simulated and is encoded as corresponding operating logic executable by computer system 20.
  • the processor design is directed to execution of programming instructions over one or more execution cycles determined relative to a system clock.
  • the design can be of a RISC variety.
  • a pipelined, scalar RISC processor design is simulated that is of an "unblocked" type. For this type as well as some others, under certain circumstances multiple instructions may be executed at the same time for one or more execution cycles or otherwise executed out of order relative to the programmed sequence.
  • Processor model 40 includes logic execution model 50.
  • Model 50 includes HDL memory model 52 and HDL peripheral hardware model 54.
  • model 50 is representative of logic and hardware that is independent of the processor design being simulated and may be optional. In other embodiments, model 50 may be provided in the form of actual operational hardware that is interfaced with computer system 20 in such a manner that the processor design is effectively emulated in accordance with model 40.
  • Model 40 further includes processor software execution model 60.
  • Model 60 is co-simulated with instruction set architecture simulator (ISS) 62 and microarchitecture simulator (MAS) 70.
  • Simulator 62 is directed to the simulation of program instructions according to the instruction set architecture (ISA) specified for the processor design under evaluation. The operation of simulator 62 generally mimics the operation visible to and expected by a programmer. This ISA simulation is defined at a higher level of abstraction and typically is coded in a high level programming language.
  • ISS 62 is operatively connected to model 50 by a bus interface 64 with an HDL-modeled system bus. Interface 64 provides for information transmission between simulator 62 and model 50.
  • programming instructions and data are stored in memory model 52 and input/output information is provided via peripheral model 54.
  • MAS simulator 70 is operatively linked to simulator 62 by an instruction queue 66.
  • Instruction queue 66 is arranged as a first-in, first-out buffer between simulator 62 and simulator 70.
  • MAS simulator 70 is modeled to simulate selected microarchitectural features of the processor design undergoing evaluation at a higher level of abstraction relative to typical low-level HDL processor models.
  • typical features selected for MAS simulation include caches, register dependency checking, branch prediction, and/or other features that tend to significantly impact execution performance.
  • the operation of simulator 70 is generally not of the type visible to a programmer making use of a device with the processor design. Further, simulator 70 synchronously operates in a sequential state machine fashion relative to system simulation clock 80.
  • Clock 80 is also coupled to model 50 to synchronize its operation. While simulator 62 interfaces with model 50 subject to the timing imposed by clock 80, simulator 62 asynchronously performs instruction execution simulation internal to the new design with respect to clock 80. Queue 66 is utilized to buffer timing differences between simulator 62 and simulator 70, and to loosely synchronize simulator 62 to clock 80, as is more fully described in connection with the flowcharts of Figs. 3 and 4 as follows.
  • Referring to Fig. 3, simulation 120 describes one mode of operating simulator 62. Simulation 120 begins with initiation of simulator 62 in operation 122. This initiation includes identification of the sequence of instructions to simulate, such as a benchmark program.
  • In operation 124, simulator 62 fetches and simulates execution of the next instruction, following the order of the instruction sequence.
  • operation 124 includes access to model 50
  • simulator 62 is subject to any timing constraints imposed by the synchronous HDL modeling.
  • various instructions in the simulated ISA may require one or more clocked processor interface (PI) cycles to load, store, send, and/or receive information with respect to model 50; where PI cycles are timed relative to clock 80.
  • Simulation 120 proceeds in operation 125 with generation of an instruction trace record for each instruction simulated with simulator 62.
  • a trace record is generated each time a given instruction is actually executed. Accordingly, a separate trace record results each time an instruction is executed in a repeated loop, and no record is generated for instructions present in the program that are never actually executed, such as may occur with conditional branching, jump instructions, or the like.
  • Table I below depicts a few representative instructions and the corresponding trace record information:
  • Table I also depicts the instruction sequence order in the first column, assembly-level coding in the second column, corresponding instruction description in the third column, Processor Interface (PI) cycle count in the fourth column, and internal processor execution cycle count in the fifth column for each of four different instructions in the sequence.
  • the Table I information corresponds to the pipelined, scalar, nonblocked RISC kind of design previously considered.
  • Conditional 130 tests if queue 66 is full. If this test is true (yes), simulation 120 returns via loop 132 to repeat conditional 130. If the test is false (no), simulation 120 proceeds to operation 130. In operation 130, the instruction trace record is stored in queue 66 on a first-in, first-out basis.
  • Conditional 140 tests whether to continue simulation 120. If the test is true (yes), simulation 120 returns via loop 142 to operation 124 to simulate the next instruction. If the test of conditional 140 is false (no), then simulation 120 halts.
  • simulator 62 may temporarily halt when simulating access to synchronous models, such as model 50, and/or when queue 66 is full per conditional 130. Otherwise, simulator 62 performs the instructions in sequence independent of clock 80.
  • the subject processor design is of the pipelined, scalar, nonblocked RISC architecture type.
  • the corresponding ISA includes a "load" instruction that requires multicycle, external memory access (memory model 52) and simultaneously executes subsequent instructions in a processor instruction cache as long as these subsequent instructions do not need the result of this load instruction.
  • the architecture can fully utilize any available Instruction Level Parallelism (ILP) in the software application.
  • simulator 62 does not model the out-of-order execution behavior of the design, instead delaying the execution of the subsequent instructions for the number of cycles it takes the load instruction to finish.
  • simulator 62 provides a relatively pessimistic view of design performance behavior for instructions that have to access the Processor Interface (PI), as is the case, for example, for load and store instructions.
  • the ISS can execute complex arithmetic instructions much faster (in terms of required clock cycles) than a processor modeled in a lower abstraction level.
  • the multiply (Mul) instruction from Table I requires 6 cycles to execute on a processor modeled in HDL. If simulator 62 can fetch one instruction per cycle from model 50, it is able to execute this multiply instruction in the same cycle, resulting in just one cycle of execution time. In this way, the ISS provides a relatively optimistic view of design performance behavior for arithmetic, multi-cycle instructions.
  • To improve cycle accuracy, at least a partial cycle-based model of the processor design microarchitecture is implemented in simulator 70.
  • One mode of operating simulator 70 is described in flowchart form as simulation 220.
  • Simulation 220 starts with operation 222 that initializes operation.
  • Simulation 220 continues with conditional 230 that tests if queue 66 is empty. If the test of conditional 230 is true (yes), simulation 220 repeats conditional 230 via loop 232 until the test is negative (no). From conditional 230, simulation 220 proceeds to operation 240.
  • simulator 70 reads the next corresponding instruction trace record in queue 66 on a first-in, first-out basis.
  • simulator 70 evaluates the trace record information to determine to what extent (if any) two or more instructions are executed simultaneously during one or more execution cycles for the processor design. This evaluation includes determining if a current instruction depends on completing execution of one or more prior instructions (i.e. instruction dependencies) and/or if there are other constraints on concurrent execution, such as limited processing hardware, etc.
  • simulator 70 updates and provides execution timing behavior of the processor design based on this modeling. This timing behavior may be provided to an operator with one or more of devices 26, along with any other simulation information of interest. Further, based on the cumulative timing behavior evaluation of all simulated instructions, the total execution time of the instruction sequence based on the MAS model is determined.
  • Simulation 220 proceeds from operation 246 to conditional 250. Conditional 250 tests if simulation 220 should continue. If the test of conditional 250 is true (yes), simulation 220 loops back to conditional 230 via loop 252. If the test of conditional 250 is false (no), then simulation 220 halts.
  • the last column of Table I represents four trace record entries in queue 66 from first-in (no. 1 in the first column) to last-in (no. 4 in the first column).
  • each trace record includes three types of data: (a) processor register updates resulting from the simulated execution of the instruction with simulator 62; (b) the instruction type/category executed, such as a load (1st entry), arithmetic-logic unit (ALU) (2nd and 4th entries), or multiplication dedicated operation (MAD) (3rd entry); and (c) the number of execution cycles the instruction was delayed (execution cycle count) because of processor interfacing access (e.g., a load instruction that requires external HDL memory model access to obtain data).
  • Execution of the four instructions in the second column of Table I is simulated with simulator 62 and the corresponding trace records shown in the last column are stored in queue 66 as space becomes available.
  • the instructions are all executed out of a dedicated instruction cache of the processor design, and the only microarchitectural features simulated by simulator 70 are for register dependency checking.
  • different MAS features, dependencies, and/or constraints can be modeled as desired.
  • MAS simulator 70 processes the first trace record from queue 66.
  • Simulator 70 is defined with data about the cycle count for each instruction type; in the instant case, simulator 70 accesses cycle count data for a LOAD type instruction and determines it does not need to wait before processing the next record. Because a nonblocking architecture is being modeled, simulator 70 need not yet take into account the four PI cycles it took the LOAD instruction to obtain its data from memory model 52 in this example.
  • simulator 70 then processes trace record 2, accessing cycle count data for the corresponding ALU instruction type to determine that an execution cycle count of one applies. Simulator 70 also tracks the PI cycle quantity for prior instructions, such as the LOAD instruction for the first trace record, to evaluate timing. Because there are no register dependencies between the ALU and the LOAD instructions, simulator 70 determines there is no need to wait before processing the next record.
  • simulator 70 processes trace record 3, and again there are no pending dependencies, so it proceeds immediately with trace record 4.
  • the cycle count for a MAD-type instruction is 6, as determined by accessing the cycle count data with simulator 70. Accordingly, simulator 70 detects a register dependency on register r5, and although the ALU type instruction of trace record 4 takes only one cycle, simulator 70 waits six cycles before processing the next record to assure use of the proper value for register r5 in the fourth instruction. After waiting six cycles, the processing of trace records 1-4 results in a cumulative cycle count of nine.
  • simulator 70 flags the MAD-type instruction as being complete, and proceeds with processing the next trace record from ISS simulator 62.
  • the total execution time equates to nine cycles for the four instructions tabulated.
  • a processor model 40 based only on ISS 62 results in a more optimistic execution time of seven cycles.
  • the application of the MAS 70 and the trace queue 66 improves the cycle accuracy of the processor model 40. It does so by inserting penalty cycles for instructions that were executed by the ISS 62 with an optimistic view of the design behavior, as is the case for trace record 3, and by ignoring penalty cycles for instructions that were executed by the ISS 62 with a pessimistic view of the design behavior, as is the case for trace record 1.
  • different processor design types, ISAs, cycle counts, different MAS features, dependencies, and/or constraints are simulated as desired with corresponding adaptations to simulator 62 and/or simulator 70.
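The cycle arithmetic of the four-instruction example (nine cycles with the MAS, seven with the ISS alone) can be reproduced with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: Table I itself is not reproduced in this text, so every register name other than r5, the `reads` field, the LOAD cycle count of one, and all function and field names are hypothetical. The sketch assumes in-order issue of one instruction per cycle, with a stall only when a source register is still pending.

```python
from dataclasses import dataclass

# Per-type execution cycle counts quoted in the text (ALU = 1, MAD = 6).
# LOAD = 1 is an assumption; its four PI cycles travel in the trace record.
CYCLES_BY_TYPE = {"LOAD": 1, "ALU": 1, "MAD": 6}


@dataclass(frozen=True)
class TraceRecord:
    kind: str            # instruction type/category: "LOAD", "ALU" or "MAD"
    writes: tuple        # registers updated by the instruction
    reads: tuple         # source registers (assumed to be visible to the MAS)
    pi_cycles: int = 0   # cycles the ISS was delayed by processor-interface access


def iss_only_cycles(trace):
    """ISS view: each instruction costs one cycle, or its PI delay if longer."""
    return sum(max(1, rec.pi_cycles) for rec in trace)


def mas_cycles(trace):
    """MAS view: in-order issue, one instruction per cycle, stalling only on
    a register whose producer has not finished (nonblocking loads)."""
    ready_at = {}        # register -> first cycle in which its value is usable
    prev_issue = 0
    last_done = 0
    for rec in trace:
        issue = prev_issue + 1
        for reg in rec.reads:
            issue = max(issue, ready_at.get(reg, 0))
        latency = max(CYCLES_BY_TYPE[rec.kind], rec.pi_cycles)
        for reg in rec.writes:
            ready_at[reg] = issue + latency
        last_done = max(last_done, issue + latency - 1)
        prev_issue = issue
    return last_done


# Four records mirroring the example; every register except r5 is hypothetical.
trace = [
    TraceRecord("LOAD", writes=("r1",), reads=("r2",), pi_cycles=4),
    TraceRecord("ALU", writes=("r3",), reads=("r4",)),
    TraceRecord("MAD", writes=("r5",), reads=("r6", "r7")),
    TraceRecord("ALU", writes=("r8",), reads=("r5",)),
]

print(iss_only_cycles(trace))  # 7
print(mas_cycles(trace))       # 9
```

In this sketch the ISS-only count charges the load its four PI cycles and the multiply a single cycle (4 + 1 + 1 + 1 = 7), whereas the MAS count overlaps the load with later instructions and makes the fourth instruction wait for r5 (issue in cycles 1, 2, 3 and 9).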

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A technique of the present invention includes performing a simulation according to a processor co-simulation model (40) of a processor. One form of this model is implemented by simulating execution of a sequence of instructions with an instruction set architecture simulator (62), storing respective instruction information in a queue (66) for each of the instructions simulated, and determining execution timing behavior of the sequence of instructions by simulating instruction execution with a microarchitecture simulator (70) as a function of the respective instruction information read from the queue (66) for each one of the instructions.

Description

CO-SIMULATION OF A PROCESSOR DESIGN
The present invention relates to processor simulation, and more particularly, but not exclusively, relates to co-simulation of a processor with different models. Simulation techniques have been embraced to debug and otherwise evaluate the performance of electronic circuitry. The simulation of complex programmable processor logic is often performed with several different simulation models. Unfortunately, these models sometimes fail to provide the desired trade-off between accuracy and speed of execution. Thus, there is an ongoing need for further contributions in this area of technology.
One embodiment of the present application is a unique processor simulation technique. Other embodiments include unique methods, systems, devices, and apparatus to simulate a processor.
A further embodiment of the present application includes providing an instruction set architecture simulation and a processor microarchitecture simulation that can be interlaced together. In one particular form, the instruction set architecture simulator stores respective instruction execution information in a first-in, first-out queue, and the processor microarchitecture simulation accesses the queue in accordance with a sequence of instructions being simulated. Another embodiment of the present application includes: providing a processor design including an instruction set architecture and a processor microarchitecture to implement the instruction set architecture, simulating the processor design by operating a first simulator to simulate execution of a sequence of processor instructions in accordance with the instruction set architecture and a second simulator to simulate performance of each of the instructions in accordance with the microarchitecture. In one particular form, this simulation further includes storing respective instruction execution information determined with the first simulator in a queue and accessing the queue with the second simulator to evaluate execution timing of the instructions.
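The interlacing of the two simulators through a first-in, first-out queue can be illustrated with a short Python sketch. It is a simplified, single-threaded approximation rather than the disclosed implementation: the queue depth, the `mas_period` parameter (used only to make the consumer slower than the producer), the dictionary-based record, and the function name are all hypothetical. The producer holds a record and retries while the queue is full, and the consumer does nothing while the queue is empty.

```python
from collections import deque


def co_simulate(instructions, capacity=2, mas_period=2):
    """Alternate one ISS step and one MAS step around a bounded FIFO.

    capacity and mas_period are hypothetical tuning values.
    Returns the order in which the MAS consumed records and the number
    of passes on which the ISS had to wait for queue space.
    """
    queue = deque()
    pending = deque(instructions)   # program order, as seen by the ISS
    held = None                     # trace record waiting for queue space
    consumed = []
    iss_stalls = 0
    tick = 0

    while pending or held is not None or queue:
        tick += 1

        # ISS step: simulate the next instruction, then try to enqueue its record.
        if held is None and pending:
            held = {"instruction": pending.popleft()}
        if held is not None:
            if len(queue) < capacity:
                queue.append(held)
                held = None
            else:
                iss_stalls += 1     # queue full: retry on the next pass

        # MAS step: runs only every mas_period passes, and only if a record is queued.
        if tick % mas_period == 0 and queue:
            consumed.append(queue.popleft()["instruction"])

    return consumed, iss_stalls


order, stalls = co_simulate(["load", "add", "mul", "sub", "store"])
print(order)   # ['load', 'add', 'mul', 'sub', 'store']
print(stalls)
```

Because records leave the queue in the order they entered, the second simulator always sees the instructions in simulated execution order, even when the first simulator runs ahead and has to wait for space.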
Still another embodiment includes a device with computer-executable logic operable to perform a simulation of a processor design that includes an instruction set architecture and a processor microarchitecture to implement the instruction set architecture. The simulation includes a first simulator to simulate execution of a sequence of instructions in accordance with the instruction set architecture and a second simulator to simulate performance of each of the instructions in accordance with the microarchitecture. The simulation stores respective instruction execution information determined with the first simulator in a queue for each of the instructions and accesses this queue to evaluate execution timing behavior of the sequence of instructions with the second simulator. Yet another embodiment is directed to a system that includes an operator input device, a computer processor, and an output device. The processor is responsive to the input device to execute simulation logic to evaluate a processor design. The simulation logic defines a first simulator to simulate execution of a sequence of processor instructions in accordance with an instruction set architecture model of the processor design and a second simulator to simulate performance of each of the instructions with a microarchitecture model of the processor design; where the microarchitecture model is effective to implement the instruction set architecture model in the processor design. The simulation logic defines a queue, stores respective instruction execution information in the queue for each of the instructions, and accesses the queue to evaluate execution timing behavior of the instructions as a function of the respective instruction execution information. The output device provides an output representative of execution timing behavior of the instructions.
In yet a further embodiment, an apparatus includes means for performing a simulation of a processor executing a sequence of instructions based on an instruction set architecture model of the processor, means for storing respective instruction information in a queue for each one of the instructions simulated with the instruction set architecture model during the simulation, and means for determining execution timing behavior of the sequence of instructions by simulating instruction execution in accordance with a microarchitecture model of the processor as a function of the respective instruction information read from the queue for each one of the instructions.
One object of the present invention is to provide a unique processor simulation technique.
Other objects include providing a unique method, system, device, or apparatus to simulate a processor. Further objects, embodiments, forms, aspects, benefits, advantages, and features of the present application and its inventions will become apparent from the figures and description provided herewith.
Fig. 1 is a schematic view of a computer system. Fig. 2 is a diagrammatic view of a processor co-simulation model that is executed with the computer system of Fig. 1.
Figs. 3 and 4 depict processor simulation flowcharts corresponding to the processor model of Fig. 2.
For the purpose of promoting an understanding of the principles of the invention, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended. Any alterations and further modifications in the described embodiments, and any further applications of the principles of the invention as described herein are contemplated as would normally occur to one skilled in the art to which the invention relates.
One embodiment of the present application is a microprocessor simulation model that combines an instruction set architecture simulator with a microarchitecture simulator for fast and accurate hardware and software co-simulation. In one form, a queue is utilized to pass trace records generated with the instruction set architecture simulator to the microarchitecture simulator.
Fig. 1 diagrammatically depicts computer system 20 of a further embodiment of the present application. System 20 includes computer 21 with processor 22. Processor 22 performs operations in accordance with programming instructions and/or another form of operating logic, and more particularly is of a type suitable to perform simulation techniques described hereinafter. In one form, processor 22 is integrated circuit based, including one or more digital, solid-state central processing units each in the form of a microprocessor. It should be understood that while a single processor 22 is depicted, it is representative of multiprocessor arrangements as well as single processor arrangements. Further, processor 22 can be of a reduced instruction set (RISC) type, a complex instruction set (CISC) type, or a combination of both. For multiple processor forms, parallel and/or pipeline processing can be utilized as appropriate. Alternatively or additionally, processor 22 can be provided in the form of one or more components in a single unit or as multiple units. In one embodiment, processor 22 is in the form of one or more highly integrated, digital semiconductor devices.
System 20 also includes operator input devices 24 and operator output devices 26 operatively coupled to processor 22. Input devices 24 include a conventional mouse 24a and keyboard 24b, and alternatively or additionally can include a trackball, light pen, voice recognition subsystem, and/or different input device type as would occur to those skilled in the art. Output devices 26 include a conventional graphic display 26a, such as a color or noncolor plasma, Cathode Ray Tube (CRT), or Liquid Crystal Display (LCD) type, and printer 26b. Alternatively or additionally output devices 26 can include an aural output system and/or different output device type as would occur to those skilled in the art.
Further, in other embodiments, more or fewer operator input devices 24 or operator output devices 26 may be utilized.
System 20 also includes memory 28 operatively coupled to processor 22. Memory 28 can be of one or more types, such as solid-state electronic memory, magnetic memory, optical memory, or a combination of these. As illustrated in Fig. 1, memory 28 includes a removable/portable memory device 28a that can be an optical disk (such as a CD ROM or DVD); a magnetically encoded hard disk, floppy disk, tape, or cartridge; and/or a different form as would occur to those skilled in the art. In one embodiment, at least a portion of memory 28 is operable to store operating logic for processor 22 in the form of programming instructions. Alternatively or additionally, memory 28 can be arranged to store data other than programming instructions for processor 22. In still other embodiments, memory 28 and/or portable memory device 28a may not be present.
System 20 also includes computer network 30, which can be a Local Area Network (LAN); Municipal Area Network (MAN); Wide Area Network (WAN), such as the Internet; another type as would occur to those skilled in the art; or a combination of these. Network 30 couples computer 31 to computer 21; where computer 31 is remotely located relative to computer 21. Computer 31 can include a processor, input devices, output devices, and/or memory as described in connection with computer 21; however these features of computer 31 are not shown to preserve clarity. Computer 31 and computer 21 can be arranged as client and server, respectively, in relation to some or all of the processing performed with system 20. For this arrangement, it should be understood that many other remote computers 31 could be included as clients of computer 21, but are not shown to preserve clarity. In another embodiment, computer 21 and computer 31 can both be participating members of a distributed processing arrangement with one or more processing units at a different site relative to others. The distributed processors of such an arrangement can be used collectively to execute routines according to the present invention. In still other embodiments, remote computer 31 may be absent. Operating logic for processor 22 is arranged to facilitate performance of various routines, subroutines, computer models, simulations, procedures, stages, operations, and/or conditionals described hereinafter. This operating logic can be of a dedicated, hardwired variety and/or in the form of programming instructions as is appropriate for the particular processor arrangement. Such logic can be at least partially encoded on device 28a for storage and/or transport to another computer. Alternatively or additionally, the operating logic of computer 21 can be in the form of one or more signals carried by a transmission medium, such as network 30.
Hardware/software co-simulation is used to evaluate digital processor-based systems in which the components exist at various levels of modeling abstraction. In a hardware/software co-simulation environment, embedded software can be simulated together with components modeled in a Hardware Description Language (HDL) (such as Verilog or the like) or components modeled in a high-level programming language (such as SystemC, C++, or the like). Processor simulation with components modeled at a high abstraction level is typically performed when a lower-level model (such as a detailed HDL model) is not yet available, or when it is desired to execute the simulation faster than would be possible with a standard lower-level model. On the other hand, low-level HDL modeling often reveals design shortcomings and provides greater accuracy that a high-level model does not. Accordingly, the ability to combine elements of both approaches is sometimes desirable.
One such combination is described in connection with Fig. 2. Fig. 2 depicts processor model 40 in diagrammatic form. Model 40 is developed in accordance with a processor design to be simulated and is encoded as corresponding operating logic executable by computer system 20. Typically, the processor design is directed to execution of programming instructions over one or more execution cycles determined relative to a system clock. For example, the design can be of a RISC variety. In one particular form, a pipelined, scalar RISC processor design is simulated that is of an "unblocked" type. For this type as well as some others, under certain circumstances multiple instructions may be executed at the same time for one or more execution cycles or otherwise executed out of order relative to the programmed sequence. Simulation is commonly performed to evaluate a processor or processor-based system design under development and make appropriate changes in response to simulation results. This evaluation often includes benchmark or other performance testing. Processor model 40 includes logic execution model 50. Model 50 includes HDL memory model 52 and HDL peripheral hardware model 54. Generally model 50 is representative of logic and hardware that is independent of the processor design being simulated and may be optional. In other embodiments, model 50 may be provided in the form of actual operational hardware that is interfaced with computer system 20 in such a manner that the processor design is effectively emulated in accordance with model 40.
Model 40 further includes processor software execution model 60. Model 60 is co-simulated with instruction set architecture simulator (ISS) 62 and microarchitecture simulator (MAS) 70. Simulator 62 is directed to the simulation of program instructions according to the instruction set architecture (ISA) specified for the processor design under evaluation. The operation of simulator 62 generally mimics the operation visible to and expected by a programmer. This ISA simulation is defined at a higher level of abstraction and typically is coded in a high level programming language. ISS 62 is operatively connected to model 50 by a bus interface 64 with an HDL-modeled system bus. Interface 64 provides for information transmission between simulator 62 and model 50. In a typical simulation, programming instructions and data are stored in memory model 52 and input/output information is provided via peripheral model 54.
MAS simulator 70 is operatively linked to simulator 62 by an instruction queue 66. Instruction queue 66 is arranged as a first-in, first-out buffer between simulator 62 and simulator 70. MAS simulator 70 is modeled to simulate selected microarchitectural features of the processor design undergoing evaluation at a higher level of abstraction relative to typical low level HDL processor models. For this approach, typical features selected for MAS simulation include caches, register dependency checking, branch prediction, and/or other features that tend to significantly impact execution performance. Unlike the operation of simulator 62, the operation of simulator 70 is generally not of the type visible to a programmer making use of a device with the processor design. Further, simulator 70 synchronously operates in a sequential state machine fashion relative to system simulation clock 80. Clock 80 is also coupled to model 50 to synchronize its operation. While simulator 62 interfaces with model 50 subject to the timing imposed by clock 80, simulator 62 asynchronously performs instruction execution simulation internal to the new design with respect to clock 80. Queue 66 is utilized to buffer timing differences between simulator 62 and simulator 70, and to loosely synchronize simulator 62 to clock 80, as is more fully described in connection with the flowcharts of Figs. 3 and 4 as follows.
Referring to Fig. 3, simulation 120 describes one mode of operating simulator 62. Simulation 120 begins with initiation of simulator 62 in operation 122. This initiation includes identification of the sequence of instructions to simulate, such as a benchmark program. In operation 124, simulator 62 fetches and simulates execution of the next instruction, following the order of the instruction sequence. To the extent operation 124 includes access to model 50, simulator 62 is subject to any timing constraints imposed by the synchronous HDL modeling.
For example, various instructions in the simulated ISA may require one or more clocked processor interface (PI) cycles to load, store, send, and/or receive information with respect to model 50; where PI cycles are timed relative to clock 80. Aside from such external accesses, simulation performed within simulator 62 is asynchronous relative to clock 80.
Simulation 120 proceeds in operation 125 with generation of an instruction trace record for each instruction simulated with simulator 62. A trace record is generated each time a given instruction is actually executed. Accordingly, a separate trace record results for each time an instruction is executed in a repeated loop, and does not include a record for any instructions present in the program that are not actually executed, such as may occur with conditional branching, jump instructions, or the like. Table I below depicts a few representative instructions and the corresponding trace record information:
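The point that a trace record follows each dynamic execution of an instruction, rather than each instruction in the program text, can be illustrated with a minimal sketch. The toy instruction names (`inc`, `loop_if_less`, `jump`, `nop`) and the `trace` function are hypothetical and are not part of the described ISA; they only show that a looped instruction yields several records while a skipped instruction yields none.

```python
def trace(program, max_steps=1000):
    """Return the program indices actually executed, in execution order."""
    regs = {"count": 0}
    pc, executed = 0, []
    while pc < len(program) and len(executed) < max_steps:
        op, arg = program[pc]
        executed.append(pc)          # one trace record per executed instruction
        if op == "inc":
            regs["count"] += 1
            pc += 1
        elif op == "jump":
            pc = arg
        elif op == "loop_if_less":   # branch back to arg[0] while count < arg[1]
            pc = arg[0] if regs["count"] < arg[1] else pc + 1
        else:                        # "nop"
            pc += 1
    return executed


PROGRAM = [
    ("inc", None),               # 0: loop body
    ("loop_if_less", (0, 3)),    # 1: repeat the body until count reaches 3
    ("jump", 4),                 # 2: skip over instruction 3
    ("nop", None),               # 3: present in the program, never executed
    ("nop", None),               # 4
]

records = trace(PROGRAM)
```

In this sketch instruction 0 contributes three records, one per loop iteration, and instruction 3 contributes none.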
[Table I: image imgf000009_0001 in the original publication]
Table I also depicts the instruction sequence order in the first column, assembly-level coding in the second column, corresponding instruction description in the third column, Processor Interface (PI) cycle count in the fourth column, and internal processor execution cycle count in the fifth column for each of four different instructions in the sequence. In one nonlimiting example, the Table I information corresponds to the pipelined, scalar, nonblocked RISC kind of design previously considered.
Next, conditional 130 tests if queue 66 is full. If this test is true (yes), simulation 120 returns via loop 132 to repeat conditional 130. If the test is false (no), simulation 120 proceeds to operation 130. In operation 130, the instruction trace record is stored in queue 66 on a first-in, first-out basis. Conditional 140 tests whether to continue simulation 120. If the test is true (yes), simulation 120 returns via loop 142 to operation 124 to simulate the next instruction. If the test of conditional 140 is false (no), then simulation 120 halts.
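The producer side of this flow can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `run_iss`, the queue capacity of 2, and the `drain` callback (a stand-in for the microarchitecture simulator removing a record while the ISS waits) are all hypothetical names and values chosen for the example.

```python
from collections import deque


def run_iss(instructions, queue, capacity, drain):
    """Simulate each instruction in order and enqueue its trace record.

    Returns how many times the ISS had to wait because the queue was full.
    """
    stalls = 0
    for instr in instructions:           # fetch and simulate the next instruction
        record = {"instr": instr}        # build the trace record for it
        while len(queue) >= capacity:    # queue-full test: wait for free space
            stalls += 1
            drain(queue)                 # assumed: the consumer frees one slot
        queue.append(record)             # store the record first-in, first-out
    return stalls


consumed = []
queue = deque()
stalls = run_iss(
    ["load", "add", "mul", "sub", "or"],
    queue,
    capacity=2,
    drain=lambda q: consumed.append(q.popleft()),
)
```

Because records leave the queue in the order they entered, the consumer sees the instructions in program execution order regardless of how often the producer stalls.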
It should be appreciated that asynchronous simulation performed with simulator 62 may temporarily halt when simulating access to synchronous models, such as model 50, and/or when queue 66 is full per conditional 130. Otherwise, simulator 62 performs the instructions in sequence independent of clock 80.
However, this high-level simulation may sacrifice a degree of cycle accuracy depending on the specifics of the instructions simulated and/or the processor design being evaluated. In one example, the subject processor design is of the pipelined, scalar, nonblocked RISC architecture type. The corresponding ISA includes a "load" instruction that requires multicycle, external memory access (memory model 52) and simultaneously executes subsequent instructions in a processor instruction cache as long as these subsequent instructions do not need the result of this load instruction. In other words, the architecture can fully utilize any available Instruction Level Parallelism (ILP) in the software application. For such an example, simulator 62 does not model the out-of-order execution behavior of the design, instead delaying the execution of the subsequent instructions for the number of cycles it takes the load instruction to finish. In this instance, simulator 62 provides a relatively pessimistic view of design performance behavior for instructions that have to access the Processor Interface (PI), as is the case, for example, for load and store instructions.
On the other hand, because of the high abstraction level at which the ISS is modeled, the ISS can execute complex arithmetic instructions much faster (in terms of required clock cycles) than a processor modeled at a lower abstraction level. For example, the multiply (Mul) instruction from Table I requires 6 cycles to execute on a processor modeled in HDL. If simulator 62 can fetch one instruction per cycle from model 50, it is able to execute this multiply instruction in the same cycle, resulting in just one cycle execution time. This way, the ISS provides a relatively optimistic view of design performance behavior for arithmetic, multi-cycle instructions.
To improve cycle accuracy, at least a partial cycle-based model of the processor design microarchitecture is implemented in simulator 70. One mode of operating simulator 70 is described in flowchart form as simulation 220 in Fig. 4. Simulation 220 starts with operation 222 that initializes operation. Simulation 220 continues with conditional 230 that tests if queue 66 is empty. If the test of conditional 230 is true (yes), simulation 220 repeats conditional 230 via loop 232 until the test is negative (no). From conditional 230, simulation 220 proceeds to operation 240. In operation 240, simulator 70 reads the next corresponding instruction trace record in queue 66 on a first-in, first-out basis. In operation 242, simulator 70 evaluates the trace record information to determine to what extent (if any) two or more instructions are executed simultaneously during one or more execution cycles for the processor design. This evaluation includes determining if a current instruction depends on completing execution of one or more prior instructions (i.e. instruction dependencies) and/or if there are other constraints on concurrent execution, such as limited processing hardware, etc. In operation 246, simulator 70 updates and provides execution timing behavior of the processor design based on this modeling. This timing behavior may be provided to an operator with one or more of devices 26, along with any other simulation information of interest. Further, based on the cumulative timing behavior evaluation of all simulated instructions, the total execution time of the instruction sequence based on the MAS model is determined. Simulation 220 proceeds from operation 246 to conditional 250. Conditional 250 tests if simulation 220 should continue. If the test of conditional 250 is true (yes), simulation 220 loops back to conditional 230 via loop 252. If the test of conditional 250 is false (no), then simulation 220 halts.
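The clocked consumer loop just described can be sketched as a small state machine that is advanced once per simulation clock cycle. The class name `MicroarchSim`, the tuple layout `(kind, dest, srcs)`, and the register names are assumptions made for illustration; the sketch also peeks at the oldest record before removing it, which is a simplification of the read-then-evaluate order in the description, and it models only register dependency checking.

```python
from collections import deque


class MicroarchSim:
    """Clocked consumer of trace records (hypothetical sketch)."""

    def __init__(self, queue, type_cycles):
        self.queue = queue              # FIFO of (kind, dest, srcs) tuples
        self.type_cycles = type_cycles  # cycle count per instruction type
        self.cycle = 0                  # clock cycles seen so far
        self.ready = {}                 # register -> cycle its value is usable
        self.processed = []             # (cycle, kind) per processed record

    def tick(self):
        """Advance one cycle of the simulation clock."""
        self.cycle += 1
        if not self.queue:                       # queue empty: nothing to do
            return
        kind, dest, srcs = self.queue[0]         # oldest record, not yet removed
        if any(self.ready.get(s, 0) > self.cycle for s in srcs):
            return                               # dependency pending: wait
        self.queue.popleft()                     # read the record (FIFO order)
        self.ready[dest] = self.cycle + self.type_cycles[kind]
        self.processed.append((self.cycle, kind))  # update timing behavior


q = deque([("MAD", "r5", ()), ("ALU", "r9", ("r5",))])
sim = MicroarchSim(q, {"MAD": 6, "ALU": 1})
for _ in range(8):
    sim.tick()
```

With a six-cycle MAD writing r5 and an ALU instruction reading r5, the second record in this sketch is held back until six cycles after the first.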
By coupling simulator 62 and simulator 70 with queue 66, instruction trace information is passed to the modeled microarchitectural features. This co-simulation approach provides the ability to flexibly blend high-level ISA simulation (simulator 62) with cycle-specific microarchitectural simulation (simulator 70). By focusing MAS at a higher abstraction level than standard HDL modeling, a better cycle accuracy than ISA modeling is possible without the complex intricacies of a complete low level HDL processor model. Correspondingly, simulation performance time is commensurately less. Moreover, adjustment to the processor design simulation can be readily translated into simulator 70 as the processor design is being developed, and/or before a complete HDL processor model is available. Indeed, one embodiment of co-simulation model 40 includes iterative performance of a design, test, and development sequence until the design achieves desired objectives. Nonetheless, in other embodiments, such features may be absent or differently realized.
Relative to the representative instructions of Table I, one nonlimiting example is next described in detail for a pipelined, scalar, nonblocked RISC processor design type; however, such instructions could be applicable to other designs, as well. The last column of Table I represents four trace record entries in queue 66 from first-in (no. 1 in the first column) to last-in (no. 4 in the first column). In this example, each trace record includes three types of data: (a) processor register updates resulting from the simulated execution of the instruction with simulator 62; (b) the instruction type/category executed, such as a load (1st entry), arithmetic-logic unit (ALU) (2nd and 4th entries), or multiplication dedicated operation (MAD) (3rd entry); and (c) the number of execution cycles the instruction was delayed (execution cycle count) because of processor interfacing access (e.g. a load instruction that requires external HDL memory model access to obtain data).
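One possible in-memory layout for such a trace record is sketched below. The names `TraceRecord`, `InstrType`, `register_updates`, and `pi_delay_cycles`, as well as the register value 42, are hypothetical; the description specifies only the three kinds of data, not their encoding.

```python
from dataclasses import dataclass
from enum import Enum


class InstrType(Enum):
    LOAD = "load"   # load from external memory
    ALU = "alu"     # arithmetic-logic unit operation
    MAD = "mad"     # multiplication dedicated operation


@dataclass
class TraceRecord:
    register_updates: dict      # register name -> value written by the ISS
    instr_type: InstrType       # instruction type/category executed
    pi_delay_cycles: int = 0    # cycles delayed by processor interface access


# Illustrative record for a multiply that writes r5 (the value is made up).
record = TraceRecord(register_updates={"r5": 42}, instr_type=InstrType.MAD)
```

Keeping the record this small is what lets the queue act as a lightweight interface between the two simulators.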
Execution of the four instructions in the second column of Table I is simulated with simulator 62 and the corresponding trace records shown in the last column are stored in queue 66 as space becomes available. For this example, the instructions are all executed out of a dedicated instruction cache of the processor design, and the only microarchitectural features simulated by simulator 70 are for register dependency checking. In other embodiments, different MAS features, dependencies, and/or constraints can be modeled as desired.
At the first available execution cycle, MAS simulator 70 processes the first trace record from queue 66. Simulator 70 is defined with data about the cycle count for each instruction type; in the instant case, simulator 70 accesses cycle count data for a LOAD type instruction and determines it does not need to wait before processing the next record. Because a nonblocking architecture is being modeled, simulator 70 need not yet take into account the four PI cycles it took the LOAD instruction to obtain its data from memory model 52 in this example. In the second execution cycle, simulator 70 then processes trace record 2, accessing cycle count data for the corresponding ALU instruction type to determine that an execution cycle count of one applies. Simulator 70 also tracks the PI cycle quantity for prior instructions, such as the LOAD instruction for the first trace record, to evaluate timing. Because there are no register dependencies between the ALU and the LOAD instructions, simulator 70 determines there is no need to wait before processing the next record.
In the third cycle, simulator 70 processes trace record 3, and again there are no pending dependencies, so it proceeds immediately with trace record 4. The cycle count for a MAD-type instruction is 6, as determined by accessing the cycle count data with simulator 70. Accordingly, simulator 70 detects a register dependency on register r5, and although the ALU type instruction of trace record 4 takes only one cycle, simulator 70 waits six cycles before processing the next record to assure use of the proper value for register r5 in the fourth instruction. After waiting six cycles, the processing of trace records 1-4 results in a cumulative cycle count of nine. In the ninth cycle, simulator 70 flags the MAD-type instruction as being complete, and proceeds with processing the next trace record from ISS simulator 62. Accordingly, for this example, based on simulator 70 the total execution time equates to nine cycles for the four instructions tabulated. In contrast, a processor model 40 based only on ISS 62 results in a more optimistic execution time of seven cycles. It should be noted that the application of MAS 70 and trace queue 66 improves the cycle accuracy of processor model 40. It does so by inserting penalty cycles for instructions that were executed by the ISS 62 with an optimistic view of the design behavior, as is the case for trace record 3, and by ignoring penalty cycles for instructions that were executed by the ISS 62 with a pessimistic view of the design behavior, as is the case for trace record 1. It should be appreciated that in other embodiments, different processor design types, ISAs, cycle counts, different MAS features, dependencies, and/or constraints are simulated as desired with corresponding adaptations to simulator 62 and/or simulator 70.
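The nine-cycle and seven-cycle totals of this example can be reproduced with a short sketch. Several details are assumptions rather than statements from the description: all register names other than r5 are invented (the assembly coding of Table I is not available here), the LOAD type is given a cycle count of one, a register written by an instruction is treated as usable a full latency after the instruction is processed, and the ISS-only estimate charges the PI cycle count for instructions that access the processor interface and one cycle otherwise.

```python
from dataclasses import dataclass

# MAD = 6 and ALU = 1 follow the text; LOAD = 1 is an assumption.
TYPE_CYCLES = {"LOAD": 1, "ALU": 1, "MAD": 6}


@dataclass
class TraceRecord:
    kind: str            # "LOAD", "ALU", or "MAD"
    dest: str            # register written (hypothetical names, except r5)
    srcs: tuple          # registers read (assumed to be available to the MAS)
    pi_cycles: int = 0   # PI cycles spent on external access


RECORDS = [
    TraceRecord("LOAD", "r1", ("r2",), pi_cycles=4),
    TraceRecord("ALU", "r3", ("r4", "r6")),
    TraceRecord("MAD", "r5", ("r7", "r8")),
    TraceRecord("ALU", "r9", ("r5", "r3")),   # depends on the MAD result in r5
]


def mas_total_cycles(records):
    """Cycle count with one record per cycle and register dependency checks."""
    ready = {}       # register -> first cycle in which its value is usable
    cycle = 0        # cycle in which the previous record was processed
    last_done = 0    # last cycle in which any instruction is still busy
    for rec in records:
        issue = cycle + 1
        for src in rec.srcs:
            issue = max(issue, ready.get(src, 0))    # wait for pending results
        latency = max(TYPE_CYCLES[rec.kind], rec.pi_cycles)
        ready[rec.dest] = issue + latency
        last_done = max(last_done, issue + latency - 1)
        cycle = issue
    return last_done


def iss_total_cycles(records):
    """ISS-only view: PI accesses stall execution, everything else is 1 cycle."""
    return sum(rec.pi_cycles if rec.pi_cycles else 1 for rec in records)


mas_cycles = mas_total_cycles(RECORDS)
iss_cycles = iss_total_cycles(RECORDS)
```

Under these assumptions the sketch processes the records in cycles 1, 2, 3, and 9, giving nine cycles for the microarchitecture view against seven for the ISS-only view: the four PI cycles of the load are overlapped with later instructions, while the six-cycle multiply delays the dependent fourth instruction.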
While the invention has been illustrated and described in detail in the drawings and foregoing description, the same is to be considered as illustrative and not restrictive in character, it being understood that only selected embodiments have been shown and described and that all changes, modifications and equivalents that come within the spirit of the inventions described heretofore and/or defined by the following claims are desired to be protected.

Claims

What is claimed is:
1. A method, comprising: providing a processor design including an instruction set architecture and a processor microarchitecture to implement the instruction set architecture; simulating the processor design by operating a first simulator (62) to simulate execution of a sequence of processor instructions in accordance with the instruction set architecture and a second simulator (70) to simulate performance of each of the instructions in accordance with at least a portion of the microarchitecture, which includes for each of the instructions: storing respective instruction execution information determined with the first simulator (62) in a queue (66); and in correspondence to the sequence, accessing the queue (66) with the second simulator (70) to evaluate execution timing of one of the instructions relative to one or more other of the instructions as a function of the respective instruction execution information.
2. The method of claim 1, which includes determining an execution time for the sequence of instructions based on the execution timing evaluated with the second simulator (70).
3. The method of claim 1, which includes: changing the second simulator (70) to simulate additional microarchitecture design features; and repeating simulation with the first simulator (62) and the second simulator (70) after the changing.
4. The method of claim 1, wherein operation of the first simulator (62) to simulate the execution of the sequence of instructions extends over a first period of time and operation of the second simulator (70) to simulate the performance of each of the instructions extends over a second period of time, and the first period and the second period at least partially overlap.
5. The method of claim 1, which includes determining if proper execution of each one of the instructions depends on the prior proper execution of one or more other of the instructions with the second simulator (70).
6. The method of claim 1, wherein the simulating with the first simulator (62) of processor instructions internally executed by the processor design is performed asynchronously relative to a simulation clock (80) and the simulating with the second simulator (70) is performed synchronously relative to the simulation clock (80).
7. The method of claim 1, wherein the simulating includes determining with the second simulator (70) that one of the instructions can execute concurrently with another of the instructions for at least one execution cycle of the microarchitecture.
8. The method of claim 1, wherein the processor design is of an unblocked RISC type and the queue (66) is of a first-in, first-out type.
9. The method of claim 1, which includes simulated loading of information from an HDL model of a memory (52) with the first simulator (62).
10. An apparatus, comprising: a device (22, 28, 28a, or 30) carrying computer-executable logic operable to perform a simulation of a processor design including an instruction set architecture and a processor microarchitecture to implement the instruction set architecture, the simulation includes a first simulator (62) operable to simulate execution of a sequence of processor instructions in accordance with the instruction set architecture and a second simulator (70) operable to simulate performance of each of the instructions in accordance with at least a portion of the microarchitecture, performance of the first simulator (62) overlapping in time performance of the second simulator (70) during operation of the simulation, the simulation stores respective instruction execution information determined with the first simulator (62) in a queue (66) for each of the instructions and accesses the queue (66) to evaluate execution timing behavior of the sequence of instructions with the second simulator (70) as a function of the respective instruction execution information for each one of the instructions.
11. The apparatus of claim 10, wherein the device (22, 28, 28a, or 30) is a removable memory device (28a) and the logic is encoded on the device (28a) as computer-executable instructions.
12. The apparatus of claim 10, wherein the device (22, 28, 28a, or 30) is at least a portion of a computer network (30) and the logic is encoded as a plurality of signals transmitted on the computer network (30).
13. The apparatus of claim 10, further comprising a computer system (20) with a memory (28), the logic being stored in the memory (28) as a number of computer program instructions.
14. A system, comprising: an operator input device (24); a computer processor (22) responsive to the input device (24) to execute simulation logic to evaluate a processor design, the simulation logic defining a first simulator (62) to simulate execution of a sequence of processor instructions in accordance with an instruction set architecture model of the processor design and a second simulator (70) to simulate performance of each of the instructions with a microarchitecture model for the processor design, the simulation logic being operable to define a queue (66), store respective instruction execution information determined with the first simulator (62) in the queue (66) for each of the instructions, and access the queue (66) to evaluate execution timing behavior of one or more of the instructions with the second simulator (70) as a function of the respective instruction execution information for each one of the instructions; and an output device (26) responsive to the computer processor (22) to provide an output representative of the execution timing behavior of the one or more of the instructions.
15. The system of claim 14, further comprising means for changing the microarchitecture model in response to the input device (24).
16. The system of claim 14, wherein the system is configured to provide a processor emulator that operates in accordance with the processor design.
17. The system of claim 14, wherein the second simulator (70) is operable to determine if proper execution of each one of the instructions depends on the prior proper execution of one or more other of the instructions.
18. The system of claim 14, wherein the second simulator (70) includes means for determining one of the instructions can execute concurrently with another of the instructions for at least one execution cycle of the microarchitecture model.
19. The system of claim 14, wherein the queue (66) is defined as a first-in, first-out type.
20. A method, comprising: performing a simulation of a processor (22) executing a sequence of processor instructions based on an instruction set architecture model of the processor (22); storing respective instruction information in a queue (66) for each of the instructions simulated with the instruction set architecture model during the simulation; and determining execution timing behavior of the sequence of instructions by simulating instruction execution in accordance with a microarchitecture model of the processor (22) as a function of the respective instruction information read from the queue (66) for each one of the instructions.
21. The method of claim 20, which includes determining execution time for the sequence of instructions in accordance with the execution timing for each respective one of the instructions.
22. The method of claim 20, which includes implementing the simulation with a first simulator (62) to simulate the instruction set architecture model and a second simulator (70) to simulate the microarchitecture model, the first simulator (62) and the second simulator (70) both operating during performance of at least a portion of the simulation.
23. The method of claim 20, wherein the simulating of instruction execution in accordance with the microarchitecture model includes determining if proper execution of a current one of the instructions depends on the prior proper execution of one or more other of the instructions, and the queue (66) is of a first-in, first-out type.
24. The method of claim 20, wherein the simulation according to the instruction set architecture model is performed asynchronously relative to a simulation clock (80) for processor instructions internally executed by the processor (22) and the simulation according to the microarchitecture model is performed synchronously relative to the simulation clock (80).
25. The method of claim 20, wherein the simulation includes determining one of the instructions can execute concurrently with another of the instructions for at least one execution cycle of the microarchitecture.
26. The method of claim 20, wherein the processor (22) is of an unblocked RISC type.
27. An apparatus, comprising: means for performing a simulation of a processor (22) executing a sequence of processor instructions based on an instruction set architecture model of the processor (22); means for storing respective instruction information in a queue (66) for each of the instructions simulated with the instruction set architecture model during the simulation; and means for determining execution timing behavior of the sequence of instructions by simulating instruction execution in accordance with a microarchitecture model of the processor (22) as a function of the respective instruction information read from the queue (66) for each one of the instructions in accordance with the sequence.
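The flow recited in claims 20 to 27 can be illustrated with a minimal sketch. It is not the patent's implementation: the three-opcode instruction set, the `InstrInfo` record, the `LATENCY` table, and the single-issue, in-order timing rule are all assumptions made for illustration. An instruction set architecture (functional) model executes each instruction and writes a record to a first-in, first-out queue; a microarchitecture (timing) model then reads the records in order, stalls an instruction until the results it depends on are available, and accumulates the execution time. For simplicity the two models run one after the other here, whereas claim 22 has both simulators operating during at least a portion of the simulation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InstrInfo:
    """Per-instruction record passed through the queue (hypothetical fields)."""
    opcode: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


# Assumed result latencies in cycles; not taken from the patent.
LATENCY = {"li": 1, "add": 1, "mul": 3}


def run_isa_model(program, queue):
    """Functional simulation: execute each instruction and enqueue its info."""
    regs = {}
    for opcode, dest, operands in program:
        if opcode == "li":            # load immediate: no register sources
            regs[dest] = operands[0]
            srcs = ()
        elif opcode == "add":
            regs[dest] = regs[operands[0]] + regs[operands[1]]
            srcs = tuple(operands)
        elif opcode == "mul":
            regs[dest] = regs[operands[0]] * regs[operands[1]]
            srcs = tuple(operands)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        queue.append(InstrInfo(opcode, dest, srcs))
    return regs


def run_timing_model(queue):
    """Timing simulation: consume the queue in FIFO order, one issue per cycle."""
    ready_at = {}    # register name -> cycle at which its value is available
    next_issue = 0   # earliest cycle at which the next instruction may issue
    total = 0        # cycle at which the last result becomes available
    timings = []
    while queue:
        info = queue.popleft()
        # Stall until every source register written earlier is ready.
        issue = max([next_issue] + [ready_at.get(r, 0) for r in info.srcs])
        done = issue + LATENCY[info.opcode]
        if info.dest is not None:
            ready_at[info.dest] = done
        next_issue = issue + 1
        total = max(total, done)
        timings.append((info.opcode, issue, done))
    return timings, total


program = [
    ("li", "r1", (6,)),
    ("li", "r2", (7,)),
    ("mul", "r3", ("r1", "r2")),
    ("add", "r4", ("r3", "r1")),   # depends on the multiply result
]
queue = deque()
regs = run_isa_model(program, queue)
timings, total = run_timing_model(queue)
```

In this sketch the final `add` cannot issue until the `mul` result is ready, so the timing model delays it even though the functional model already computed its value. Keeping the queue as the only link between the two models means the functional model needs no notion of a clock, while the timing model advances cycle by cycle.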
PCT/IB2005/053820 2004-11-19 2005-11-18 Co-simulation of a processor design WO2006054265A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US62986904P 2004-11-19 2004-11-19
US60/629,869 2004-11-19

Publications (2)

Publication Number Publication Date
WO2006054265A2 WO2006054265A2 (en) 2006-05-26
WO2006054265A3 WO2006054265A3 (en) 2006-11-16

Family

ID=36354104

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2005/053820 WO2006054265A2 (en) 2004-11-19 2005-11-18 Co-simulation of a processor design

Country Status (1)

Country Link
WO (1) WO2006054265A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8280713B2 (en) 2007-04-16 2012-10-02 International Business Machines Corporation Automatic generation of test suite for processor architecture compliance

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100057427A1 (en) * 2008-09-04 2010-03-04 Anthony Dean Walker Simulated processor execution using branch override

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIANWEI CHEN ET AL: "Integrating complete-system and user-level performance/power simulators: the SimWattch approach" PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE, 2003. ISPASS. 2003 IEEE INTERNATIONAL SYMPOSIUM ON MARCH 6-8, 2003, PISCATAWAY, NJ, USA, IEEE, 6 March 2003 (2003-03-06), pages 1-10, XP010637393 ISBN: 0-7803-7756-7 *
LARSON E ET AL: "MASE: a novel infrastructure for detailed microarchitectural modeling" PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE, 2001. ISPASS. 2001 IEEE INTERNATIONAL SYMPOSIUM ON NOV. 4-6, 2001, PISCATAWAY, NJ, USA, IEEE, 4 November 2001 (2001-11-04), pages 1-9, XP010583881 ISBN: 0-7695-7230-1 *
MOUDGILL M ET AL: "Environment for PowerPC microarchitecture exploration" IEEE MICRO IEEE USA, vol. 19, no. 3, May 1999 (1999-05), pages 15-25, XP002396655 ISSN: 0272-1732 *

Also Published As

Publication number Publication date
WO2006054265A3 (en) 2006-11-16

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KN KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

NENP Non-entry into the national phase in:

Ref country code: DE

121 EP: the EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase

Ref document number: 05813449

Country of ref document: EP

Kind code of ref document: A2