US20060117274A1 - Behavior processor system and method - Google Patents

Behavior processor system and method

Info

Publication number
US20060117274A1
Authority
US
United States
Prior art keywords
logic
system
signal
data
simulation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US09/918,600
Inventor
Ping-sheng Tseng
Yogesh Goel
Su-Jen Hwang
James Lee
Kun-Hsu Shen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Verisity Design Inc
Original Assignee
Verisity Design Inc
Priority to US09/144,222 (US6321366B1)
Priority to US37301499A
Priority to US09/900,124 (US20020152060A1)
Application filed by Verisity Design Inc
Priority to US09/918,600 (US20060117274A1)
Assigned to AXIS SYSTEMS, INC. Assignment of assignors interest (see document for details). Assignors: GOEL, YOGESH; HWANG, SU-JEN; LEE, JAMES; SHEN, KUN-HSU; TSENG, PING-SHENG
Priority claimed from US10/092,839 (US6754763B2)
Assigned to VERISITY DESIGN, INC. Merger (see document for details). Assignors: AXIS SYSTEMS, INC.
Publication of US20060117274A1
Priority claimed from US13/078,786 (US9195784B2)
Application status is Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50 Computer-aided design
    • G06F17/5009 Computer-aided design using simulation
    • G06F17/5022 Logic simulation, e.g. for logic circuit operation
    • G06F17/5027 Logic emulation using reprogrammable logic devices, e.g. field programmable gate arrays [FPGA]
    • G06F2217/00 Indexing scheme relating to computer aided design [CAD]
    • G06F2217/86 Hardware-Software co-design

Abstract

The debug system described in this patent specification generates hardware elements from normally non-synthesizable code elements for placement on an FPGA device. This particular FPGA device is called a Behavior Processor. The Behavior Processor executes in hardware those code constructs that were previously executed in software. When some condition is satisfied (e.g., in an if . . . then . . . else statement) which requires some intervention by the workstation or the software model, the Behavior Processor works with an Xtrigger device to send a callback signal to the workstation for immediate response.

Description

  • RELATED U.S. APPLICATION
  • This is a continuation-in-part of U.S. patent application Ser. No. 09/900,124, filed Jul. 6, 2001, entitled “Inter-Chip Communication System”; which is a continuation-in-part of U.S. patent application Ser. No. 09/373,014, filed Aug. 11, 1999, entitled “VCD-on-Demand System and Method”; which is a continuation-in-part of U.S. patent application Ser. No. 09/144,222, filed Aug. 31, 1998, entitled “Timing-Insensitive and Glitch-Free Logic System and Method”.
  • BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention generally relates to electronic design automation (EDA). More particularly, the present invention relates to dynamically changing the evaluation period to accelerate design debug sessions.
  • 2. Description of Related Art
  • In general, electronic design automation (EDA) is a computer-based tool configured in various workstations to provide designers with automated or semi-automated tools for designing and verifying users' custom circuit designs. EDA is generally used for creating, analyzing, and editing any electronic design for the purpose of simulation, emulation, prototyping, execution, or computing. EDA technology can also be used to develop systems (i.e., target systems) which will use the user-designed subsystem or component. The end result of EDA is a modified and enhanced design, typically in the form of discrete integrated circuits or printed circuit boards, that is an improvement over the original design while maintaining the spirit of the original design.
  • The value of software simulating a circuit design followed by hardware emulation is recognized in various industries that use and benefit from EDA technology. Nevertheless, current software simulation and hardware emulation/acceleration are cumbersome for the user because of the separate and independent nature of these processes. For example, the user may want to simulate or debug the circuit design using software simulation for part of the time, use those results and accelerate the simulation process using hardware models during other times, inspect various register and combinational logic values inside the circuit at select times, and return to software simulation at a later time, all in one debug/test session. Furthermore, as internal register and combinational logic values change as the simulation time advances, the user should be able to monitor these changes even if the changes are occurring in the hardware model during the hardware acceleration/emulation process.
  • Co-simulation arose out of a need to address some problems with the cumbersome nature of using two separate and independent processes of pure software simulation and pure hardware emulation/acceleration, and to make the overall system more user-friendly. However, co-simulators still have a number of drawbacks: (1) co-simulation systems require manual partitioning, (2) co-simulation uses two loosely coupled engines, (3) co-simulation speed is as slow as software simulation speed, and (4) co-simulation systems encounter race conditions.
  • First, partitioning between software and hardware is done manually, instead of automatically, further burdening the user. In essence, co-simulation requires the user to partition the design (starting with behavior level, then RTL, and then gate level) and to test the models themselves among the software and hardware at very large functional blocks. Such a constraint requires some degree of sophistication by the user.
  • Second, co-simulation systems utilize two loosely coupled and independent engines, which raise inter-engine synchronization, coordination, and flexibility issues. Co-simulation requires synchronization of two different verification engines—software simulation and hardware emulation. Even though the software simulator side is coupled to the hardware accelerator side, only external pin-out data is available for inspection and loading. Values inside the modeled circuit at the register and combinational logic level are not available for easy inspection and downloading from one side to the other, limiting the utility of these co-simulator systems. Typically, the user may have to re-simulate the whole design if the user switches from software simulation to hardware acceleration and back. Thus, if the user wanted to switch between software simulation and hardware emulation/acceleration during a single debug session while being able to inspect register and combinational logic values, co-simulator systems do not provide this capability.
  • Third, co-simulation speed is as slow as simulation speed. Co-simulation requires synchronization of two different verification engines—software simulation and hardware emulation. Each of the engines has its own control mechanism for driving the simulation or emulation. This implies that the synchronization between the software and hardware pushes the overall performance to a speed that is as low as software simulation. The additional overhead to coordinate the operation of these two engines adds to the slow speed of co-simulation systems.
  • Fourth, co-simulation systems encounter set-up, hold time, and clock glitch problems due to race conditions in the hardware logic element or hardware accelerator among clock signals. Co-simulators use hardware driven clocks, which may find themselves at the inputs to different logic elements at different times due to different wire line lengths. This raises the uncertainty level of evaluation results as some logic elements evaluate data at some time period and other logic elements evaluate data at different time periods, when these logic elements should be evaluating the data together.
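The race condition described above can be illustrated with a small behavioral model. The following Python sketch is illustrative only and is not taken from the specification: the function name `shift_register` and the parameters `data_delay` and `clock_skew` are hypothetical. It assumes a two-stage shift register in which the second flip-flop receives the clock edge `clock_skew` time units after the first, while the first flip-flop's new output reaches the second flip-flop's input after `data_delay` time units.

```python
def shift_register(inputs, data_delay, clock_skew):
    """Model a two-stage shift register over successive clock edges.

    data_delay: time for FF1's new output to reach FF2's input (hypothetical).
    clock_skew: how much later the clock edge reaches FF2 than FF1 (hypothetical).
    Returns the (q1, q2) pair observed after each clock edge.
    """
    q1 = q2 = 0
    trace = []
    for d in inputs:
        old_q1 = q1
        q1 = d  # FF1 captures its input on the clock edge
        if clock_skew > data_delay:
            # The edge reaches FF2 after the new q1 has already arrived,
            # so the data races through both stages in a single cycle.
            q2 = q1
        else:
            # FF2 samples the value q1 held before the edge, as intended.
            q2 = old_q1
        trace.append((q1, q2))
    return trace


if __name__ == "__main__":
    print(shift_register([1, 0, 1, 1], data_delay=2, clock_skew=0))
    print(shift_register([1, 0, 1, 1], data_delay=2, clock_skew=3))
```

With no skew, the second stage lags the first by one cycle. Once the skew exceeds the data delay, both stages hold the same value after every edge, which is the incorrect evaluation that results when logic elements see the clock at different times.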
  • Accordingly, a need exists in the industry for a system or method that addresses problems raised above by currently known simulation systems, hardware emulation systems, hardware accelerators, co-simulation, and coverification systems.
  • SUMMARY OF THE INVENTION
  • An object of the present invention is to use fewer hardware resources than the dedicated hardware cross-bar technology while achieving similar performance levels.
  • Another object of the present invention is to be more resourceful than the virtual wires technology without the decrease in performance arising from the use of extra evaluation cycles for the transfer of inter-chip data.
  • One embodiment of the present invention is an inter-chip communication system that transfers signals across FPGA chip boundaries only when these signals change values. This is accomplished with a series of event detectors that detect changes in signal values and packet schedulers which can then schedule the transfer of these changed signal values to another designated chip.
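The idea of transferring a signal across a chip boundary only when its value changes can be sketched in software. The Python model below is a minimal sketch under stated assumptions, not the patented circuit: the class names `EventDetector` and `PacketScheduler`, the `routes` table, and the function `run_cycles` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EventDetector:
    """Remembers the last value seen on each signal and reports changes."""
    last: dict = field(default_factory=dict)

    def changed(self, signals):
        """Return {name: value} for signals that differ from the previous cycle."""
        delta = {n: v for n, v in signals.items() if self.last.get(n) != v}
        self.last.update(delta)
        return delta


@dataclass
class PacketScheduler:
    """Queues changed signal values as packets addressed to a destination chip."""
    routes: dict  # signal name -> destination chip id (hypothetical routing table)
    queue: list = field(default_factory=list)

    def schedule(self, delta):
        for name, value in delta.items():
            self.queue.append((self.routes[name], name, value))

    def drain(self):
        """Hand over all queued packets and empty the queue."""
        sent, self.queue = self.queue, []
        return sent


def run_cycles(cycles, routes):
    """Return, for each evaluation cycle, the packets sent across the boundary."""
    detector = EventDetector()
    scheduler = PacketScheduler(routes)
    sent_per_cycle = []
    for signals in cycles:
        scheduler.schedule(detector.changed(signals))
        sent_per_cycle.append(scheduler.drain())
    return sent_per_cycle


if __name__ == "__main__":
    cycles = [{"a": 0, "b": 1}, {"a": 0, "b": 1}, {"a": 1, "b": 1}]
    for packets in run_cycles(cycles, {"a": "chip1", "b": "chip2"}):
        print(packets)
```

In this model the first cycle sends both signals because no previous value is known, the second cycle sends nothing, and the third sends only the signal that changed. The saving over sending every signal on every cycle grows with the fraction of signals that remain stable.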
  • These and other embodiments are fully discussed and illustrated in the following sections of the specification.
  • BRIEF DESCRIPTION OF THE FIGURES
  • The above objects and description of the present invention may be better understood with the aid of the following text and accompanying drawings.
  • FIG. 1 shows a high level overview of one embodiment of the present invention, including the workstation, reconfigurable hardware emulation model, emulation interface, and the target system coupled to a PCI bus.
  • FIG. 2 shows one particular usage flow diagram of the present invention.
  • FIG. 3 shows a high level diagram of the software compilation and hardware configuration during compile time and run time in accordance with one embodiment of the present invention.
  • FIG. 4 shows a flow diagram of the compilation process, which includes generating the software/hardware models and the software kernel code.
  • FIG. 5 shows the software kernel that controls the overall SEmulation system.
  • FIG. 6 shows a method of mapping hardware models to reconfigurable boards through mapping, placement, and routing.
  • FIG. 7 shows the connectivity matrix for the FPGA array shown in FIG. 8.
  • FIG. 8 shows one embodiment of the 4×4 FPGA array and their interconnections.
  • FIGS. 9(A), 9(B), and 9(C) illustrate one embodiment of the time division multiplexed (TDM) circuit which allows a group of wires to be coupled together in a time multiplexed fashion so that one pin, instead of a plurality of pins, can be used for this group of wires in a chip. FIG. 9(A) presents an overview of the pin-out problem, FIG. 9(B) provides a TDM circuit for the transmission side, and FIG. 9(C) provides a TDM circuit for the receiver side.
  • FIG. 10 shows a SEmulation system architecture in accordance with one embodiment of the present invention.
  • FIG. 11 shows one embodiment of the address pointer of the present invention.
  • FIG. 12 shows a state transition diagram of the address pointer initialization for the address pointer of FIG. 11.
  • FIG. 13 shows one embodiment of the MOVE signal generator for derivatively generating the various MOVE signals for the address pointer.
  • FIG. 14 shows the chain of multiplexed address pointers in each FPGA chip.
  • FIG. 15 shows one embodiment of the multiplexed cross chip address pointer chain in accordance with one embodiment of the present invention.
  • FIG. 16 shows a flow diagram of the clock/data network analysis that is critical for the software clock implementation and the evaluation of logic components in the hardware model.
  • FIG. 17 shows a basic building block of the hardware model in accordance with one embodiment of the present invention.
  • FIGS. 18(A) and 18(B) show the register model implementation for latches and flip-flops.
  • FIG. 19 shows one embodiment of the clock edge detection logic in accordance with one embodiment of the present invention.
  • FIG. 20 shows a four state finite state machine to control the clock edge detection logic of FIG. 19 in accordance with one embodiment of the present invention.
  • FIG. 21 shows the interconnection, JTAG, FPGA bus, and global signal pin designations for each FPGA chip in accordance with one embodiment of the present invention.
  • FIG. 22 shows one embodiment of the FPGA controller between the PCI bus and the FPGA array.
  • FIG. 23 shows a more detailed illustration of the CTRL_FPGA unit and data buffer which were discussed with respect to FIG. 22.
  • FIG. 24 shows the 4×4 FPGA array, its relationship to the FPGA banks, and expansion capability.
  • FIG. 25 shows one embodiment of the hardware start-up method.
  • FIG. 26 shows the HDL code for one example of a user circuit design to be modeled and simulated.
  • FIG. 27 shows a circuit diagram that symbolically represents the circuit design of the HDL code in FIG. 26.
  • FIG. 28 shows the component type analysis for the HDL code of FIG. 26.
  • FIG. 29 shows a signal network analysis of a structured RTL HDL code based on the user's custom circuit design shown in FIG. 26.
  • FIG. 30 shows the software/hardware partition result for the same hypothetical example.
  • FIG. 31 shows a hardware model for the same hypothetical example.
  • FIG. 32 shows one particular hardware model-to-chip partition result for the same hypothetical example of a user's custom circuit design.
  • FIG. 33 shows another particular hardware model-to-chip partition result for the same hypothetical example of a user's custom circuit design.
  • FIG. 34 shows the logic patching operation for the same hypothetical example of a user's custom circuit design.
  • FIGS. 35(A) to 35(D) illustrate the principle of “hops” and interconnections with two examples.
  • FIG. 36 shows an overview of the FPGA chip used in the present invention.
  • FIG. 37 shows the FPGA interconnection buses on the FPGA chip.
  • FIGS. 38(A) and 38(B) show side views of the FPGA board connection scheme in accordance with one embodiment of the present invention.
  • FIG. 39 shows a direct-neighbor and one-hop six-board interconnection layout of the FPGA array in accordance with one embodiment of the present invention.
  • FIGS. 40(A) and 40(B) show the FPGA inter-board interconnection scheme.
  • FIGS. 41(A) to 41(F) show top views of the board interconnection connectors.
  • FIG. 42 shows on-board connectors and some components in a representative FPGA board.
  • FIG. 43 shows a legend of the connectors in FIGS. 41(A) to 41(F) and 42.
  • FIG. 44 shows a direct-neighbor and one-hop dual-board interconnection layout of the FPGA array in accordance with another embodiment of the present invention.
  • FIG. 45 shows a workstation with multiprocessors in accordance with another embodiment of the present invention.
  • FIG. 46 shows an environment in accordance with another embodiment of the present invention in which multiple users share a single simulation/emulation system on a time-shared basis.
  • FIG. 47 shows a high level structure of the Simulation server in accordance with one embodiment of the present invention.
  • FIG. 48 shows the architecture of the Simulation server in accordance with one embodiment of the present invention.
  • FIG. 49 shows a flow diagram of the Simulation server.
  • FIG. 50 shows a flow diagram of the job swapping process.
  • FIG. 51 shows the signals between the device driver and the reconfigurable hardware unit.
  • FIG. 52 illustrates the time-sharing feature of the Simulation server for handling multiple jobs with different levels of priorities.
  • FIG. 53 shows the communication handshake signals between the device driver and the reconfigurable hardware unit.
  • FIG. 54 shows the state diagram of the communication handshake protocol.
  • FIG. 55 shows an overview of the client-server model of the Simulation server in accordance with one embodiment of the present invention.
  • FIG. 56 shows a high level block diagram of the Simulation system for implementing memory mapping in accordance with one embodiment of the present invention.
  • FIG. 57 shows a more detailed block diagram of the memory mapping aspect of the Simulation system with supporting components for the memory finite state machine (MEMFSM) and the evaluation finite state machine for each FPGA logic device (EVALFSMx).
  • FIG. 58 shows a state diagram of a finite state machine of the MEMFSM unit in the CTRL_FPGA unit in accordance with one embodiment of the present invention.
  • FIG. 59 shows a state diagram of a finite state machine in each FPGA chip in accordance with one embodiment of the present invention.
  • FIG. 60 shows the memory read data double buffer.
  • FIG. 61 shows the Simulation write/read cycle in accordance with one embodiment of the present invention.
  • FIG. 62 shows a timing diagram of the Simulation data transfer operation when the DMA read operation occurs after the CLK_EN signal.
  • FIG. 63 shows a timing diagram of the Simulation data transfer operation when the DMA read operation occurs near the end of the EVAL period.
  • FIG. 64 shows a typical user design implemented as a PCI add-on card.
  • FIG. 65 shows a typical hardware/software coverification system using an ASIC as the device-under-test.
  • FIG. 66 shows a typical coverification system using an emulator where the device-under-test is programmed in the emulator.
  • FIG. 67 shows a simulation system in accordance with one embodiment of the present invention.
  • FIG. 68 shows a coverification system without external I/O devices in accordance with one embodiment of the present invention, where the RCC computing system contains a software model of the various I/O devices and the target system.
  • FIG. 69 shows a coverification system with actual external I/O devices and the target system in accordance with another embodiment of the present invention.
  • FIG. 70 shows a more detailed logic diagram of the data-in portion of the control logic in accordance with one embodiment of the present invention.
  • FIG. 71 shows a more detailed logic diagram of the data-out portion of the control logic in accordance with one embodiment of the present invention.
  • FIG. 72 shows the timing diagram of the data-in portion of the control logic.
  • FIG. 73 shows the timing diagram of the data-out portion of the control logic.
  • FIG. 74 shows a board layout of the RCC hardware array in accordance with one embodiment of the present invention.
  • FIG. 75(A) shows an exemplary shift register circuit which will be used to explain the hold time and clock glitch problems.
  • FIG. 75(B) shows a timing diagram of the shift register circuit shown in FIG. 75(A) to illustrate hold time.
  • FIG. 76(A) shows the same shift register circuit of FIG. 75(A) placed across multiple FPGA chips.
  • FIG. 76(B) shows a timing diagram of the shift register circuit shown in FIG. 76(A) to illustrate hold time violation.
  • FIG. 77(A) shows an exemplary logic circuit which will be used to illustrate a clock glitch problem.
  • FIG. 77(B) shows a timing diagram of the logic circuit of FIG. 77(A) to illustrate the clock glitch problem.
  • FIG. 78 shows a prior art timing adjustment technique for solving the hold time violation problem.
  • FIG. 79 shows a prior art timing resynthesis technique for solving the hold time violation problem.
  • FIG. 80(A) shows the original latch and FIG. 80(B) shows a timing insensitive and glitch-free latch in accordance with one embodiment of the present invention.
  • FIG. 81(A) shows the original design flip-flop and FIG. 81(B) shows a timing insensitive and glitch-free design type flip-flop in accordance with one embodiment of the present invention.
  • FIG. 82 shows a timing diagram of the trigger mechanism of the timing insensitive and glitch-free latch and flip-flop in accordance with one embodiment of the present invention.
  • These figures will be discussed below with respect to several different aspects and embodiments of the present invention.
  • FIG. 83 shows a high level view of the components of the RCC system which incorporates one embodiment of the present invention.
  • FIG. 84 shows several simulation time periods to illustrate the VCD on-demand operation in accordance with one embodiment of the present invention.
  • FIG. 85 shows a single row interconnect layout in accordance with one embodiment of the present invention.
  • FIG. 86 shows a two-row interconnect layout in accordance with another embodiment of the present invention.
  • FIG. 87 shows a three-row interconnect layout in accordance with another embodiment of the present invention.
  • FIG. 88 shows a four-row interconnect layout in accordance with another embodiment of the present invention.
  • FIG. 89 shows a table that summarizes the interconnect layout scheme for a three-row board in accordance with one embodiment of the present invention.
  • FIG. 90 shows a system diagram of the dynamic logic evaluation system and method in accordance with one embodiment of the present invention.
  • FIG. 91 shows a detailed circuit diagram of the propagation detector in accordance with one embodiment of t