CN100578510C - Timing-insensitive glitch-free logic device - Google Patents

Publication number
CN100578510C
CN100578510C CN01822790A CN01822790A CN100578510C CN 100578510 C CN100578510 C CN 100578510C CN 01822790 A CN01822790 A CN 01822790A CN 01822790 A CN01822790 A CN 01822790A CN 100578510 C CN100578510 C CN 100578510C
Authority
CN
China
Prior art keywords
hardware
clock
chip
fpga
software
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN01822790A
Other languages
Chinese (zh)
Other versions
CN1491394A (en)
Inventor
曾平圣
梁小萍
沈崑旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cadence Design Systems Inc
Original Assignee
Verisity Design Inc
Application filed by Verisity Design Inc
Priority claimed from PCT/US2001/025546 (WO2003017148A1)
Publication of CN1491394A
Application granted
Publication of CN100578510C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00 Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/04 Generating or distributing clock signals or signals derived directly therefrom
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G06F30/33 Design verification, e.g. functional simulation or model checking
    • G06F30/3308 Design verification, e.g. functional simulation or model checking using simulation
    • G06F30/331 Design verification, e.g. functional simulation or model checking using simulation with hardware acceleration, e.g. by using field programmable gate array [FPGA] or emulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G06F30/327 Logic synthesis; Behaviour synthesis, e.g. mapping logic, HDL to netlist, high-level language to RTL or netlist

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A timing-insensitive glitch-free (TIGF) logic device can take the form of any latch or edge-triggered flip-flop. In one embodiment, a trigger signal is provided to update the TIGF logic device. The trigger signal is provided during a short trigger period that occurs adjacent in time to the evaluation period (Figure 59). In latch form, the TIGF latch includes a flip-flop that holds the current state of the TIGF latch until a trigger signal is received (Figure 59). A multiplexer is also provided to receive the new input value and the old stored value. The enable signal functions as the selector signal for the multiplexer. In flip-flop form, the TIGF flip-flop includes a first flip-flop that holds the new input value, a second flip-flop that holds the current stored value, and a clock edge detector. Hold time violations are avoided because one dedicated flip-flop stores the new input value, which effectively blocks input changes during evaluation.

Description

Timing-insensitive glitch-free logic device
Related U.S. application
This is a continuation-in-part of U.S. patent application Ser. No. 08/850,136, filed with the United States Patent and Trademark Office on May 2, 1997.
Technical field
The present invention relates generally to electronic design automation (EDA). In particular, the present invention relates to digital logic devices that solve hold time and clock glitch problems in a variety of applications, including simulation, hardware acceleration, and co-verification.
Background
In general, electronic design automation (EDA) is a computer-based tool, configured on various workstations, that provides designers with automated or semi-automated tools for designing and verifying user-defined circuit designs. EDA is generally used for creating, analyzing, and editing any electronic design for the purposes of simulation, emulation, prototyping, execution, or computing. EDA technology can also be used to develop systems (i.e., target systems) that will use the user-designed subsystem or component. The end result of EDA is a modified and enhanced design, typically in the form of discrete integrated circuits or printed circuit boards, that improves on the original design while maintaining its spirit. The value of hardware emulation following software simulation of a circuit design is recognized in the various industries that use and benefit from EDA technology. Nevertheless, current software simulation and hardware emulation/acceleration are cumbersome for the user because these processes are separate and independent. For example, a user may want to simulate or debug a circuit design using software simulation for part of the time, use those results to accelerate the simulation process with a hardware model at other times, inspect the values of selected registers and combinational logic in the circuit at selected times, and return to software simulation later, all within a single debug/test session. Moreover, as internal register and combinational logic values change while simulation time advances, the user should be able to monitor these changes even when they occur in the hardware model during hardware acceleration/emulation.
Co-simulation arose to address the cumbersome nature of using two separate and independent processes, pure software simulation and pure hardware emulation/acceleration, and to make the overall system more user-friendly. However, co-simulation still has several drawbacks: (1) co-simulation systems require manual partitioning, (2) co-simulation uses two loosely coupled engines, (3) co-simulation speed is as slow as software simulation speed, and (4) co-simulation systems encounter race conditions.
First, partitioning between software and hardware is done manually rather than automatically, which further burdens the user. In essence, co-simulation requires the user to partition the design (starting at the behavioral level, then the register transfer level (RTL), and then the gate level) and the test bench between software and hardware at the level of very large functional blocks. Such a constraint requires some degree of sophistication on the part of the user.
Second, co-simulation systems use two loosely coupled, independent engines, which raises problems of synchronization, coordination, and flexibility between the engines. Co-simulation requires synchronization between two different verification engines: software simulation and hardware emulation. Even though the software simulator side is coupled to the hardware accelerator side, only the external pin-out data is available for inspection and loading. Values inside the modeled circuit at the register and combinational logic level cannot easily be inspected or downloaded from one side to the other, which limits the usefulness of these co-simulation systems. Typically, the user must re-simulate the whole design when switching from software simulation to hardware acceleration and back. Thus, if the user wants to switch between software simulation and hardware emulation/acceleration within a single debug session while still being able to inspect register and combinational logic values, co-simulation systems do not provide this capability.
Third, co-simulation speed is as slow as software simulation speed. Co-simulation requires synchronization between two different verification engines: software simulation and hardware emulation. Each engine has its own control mechanism for driving the simulation or emulation. As a result, synchronizing the software and the hardware pushes overall performance down to a speed as low as that of software simulation. The additional overhead of coordinating the two engines makes co-simulation systems slower still.
Fourth, co-simulation systems encounter setup time, hold time, and clock glitch problems because of race conditions among clock signals. Co-simulators use hardware-driven clocks, which may arrive at the inputs of different logic elements at different times because of differing wire lengths. When these logic elements should evaluate data together, some elements evaluate data in one period while others evaluate it in a different period, which makes the evaluation results uncertain.
Accordingly, a need exists for a system or method that addresses the problems raised by currently known systems, including simulation systems, hardware emulation systems, hardware accelerators, co-simulation systems, and co-verification systems.
Summary of the invention
The present invention provides solutions to the aforementioned problems in the form of a flexible and fast simulation/emulation system, referred to here as the "SEmulation system," the "SEmulator system," or the co-verification system, which includes a reconfigurable computing system (RCC computing system) and a reconfigurable hardware array (RCC hardware array).
The SEmulation system and method of the present invention give users the ability to turn their electronic system designs into software and hardware representations for simulation. Generally, the SEmulation system is a software-controlled emulator or a hardware-accelerated simulator, and the method of the present invention is used within it. Thus, pure software simulation is possible, but the simulation can also be accelerated through the use of the hardware model. Hardware acceleration is available with software control for starting, stopping, asserting values, and inspecting values. An in-circuit emulation mode is also available for testing the user's circuit design in the environment of the circuit's target system. Again, software control is available.
At the core of the system is a software kernel that controls both the software and hardware models. It gives the user greater run-time flexibility by allowing the user to start, stop, assert values, inspect values, and switch among the various modes. The kernel controls the various modes by controlling data evaluation in the hardware through the enable inputs of the registers.
The SEmulation system and method in accordance with the present invention provide four modes of operation: (1) software simulation, (2) simulation via hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis. At a high level, the present invention may be embodied in each of these four modes or in various combinations of them, as follows: (1) software simulation alone; (2) simulation via hardware acceleration alone; (3) in-circuit emulation (ICE) alone; (4) post-simulation analysis alone; (5) software simulation and simulation via hardware acceleration; (6) software simulation and ICE; (7) simulation via hardware acceleration and ICE; (8) software simulation, simulation via hardware acceleration, and ICE; (9) software simulation and post-simulation analysis; (10) simulation via hardware acceleration and post-simulation analysis; (11) software simulation, simulation via hardware acceleration, and post-simulation analysis; (12) ICE and post-simulation analysis; (13) software simulation, ICE, and post-simulation analysis; (14) simulation via hardware acceleration, ICE, and post-simulation analysis; and (15) software simulation, simulation via hardware acceleration, ICE, and post-simulation analysis. Other combinations are possible and within the scope of the present invention.
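The fifteen enumerated embodiments correspond to every non-empty combination of the four modes (2^4 - 1 = 15). The short Python sketch below only illustrates that count; the mode strings and the function name are chosen here for illustration and do not come from the specification.

```python
from itertools import combinations

# Illustrative labels for the four operating modes named in the text.
MODES = (
    "software simulation",
    "simulation via hardware acceleration",
    "in-circuit emulation",
    "post-simulation analysis",
)

def mode_combinations(modes=MODES):
    """Return every non-empty combination of modes, smallest combinations first."""
    return [
        combo
        for size in range(1, len(modes) + 1)
        for combo in combinations(modes, size)
    ]

if __name__ == "__main__":
    combos = mode_combinations()
    print(len(combos))  # 15
    for combo in combos:
        print(" + ".join(combo))
```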
Each mode or combination of modes provides the following features or combinations of features: (1) switching among modes, manually or automatically; (2) usage: the user can switch among modes and can start, stop, assert values, inspect values, and single-step through the simulation or emulation process; (3) a compilation process that generates software models and hardware models; (4) a software kernel that controls all modes with a main control loop which, in one embodiment, includes the steps of initializing the system, evaluating active test-bench processes/components, evaluating clock components, detecting clock edges, updating registers and memories, propagating combinational components, advancing simulation time, and continuing the loop as long as active test-bench processes are present; (5) component type analysis for generating hardware models; (6) mapping hardware models to reconfigurable boards through, in one embodiment, clustering, placement, and routing; (7) a software clock setup that avoids race conditions through, in one embodiment, gated clock logic analysis and gated data logic analysis; (8) a software clock implementation that, in one embodiment, detects clock edges in the software model to trigger an enable signal in the hardware model, sends a signal from the primary clock to the clock input of the clock edge register via the gated clock logic, sends a clock enable signal to the enable inputs of the hardware model's registers, sends data from the primary clock register to the hardware model's registers via the gated data logic, and resets the clock edge register to disable the clock enable signal to the enable inputs of the hardware model's registers; (9) logging of selected data for debug sessions and post-simulation analysis; (10) combinational logic regeneration; (11) in one embodiment, a basic building block that is a D-type register with asynchronous inputs and synchronous outputs; (12) address pointers in each chip; (13) a multiplexed cross-chip address pointer chain; (14) an array of FPGA chips and its interconnection scheme; (15) banks of FPGA chips with a bus that tracks the performance of the PCI bus system; (16) FPGA banks that can be expanded via piggyback boards; and (17) time division multiplexing (TDM) circuits for optimal pin usage. Through its various embodiments, the present invention also provides other features that are described in this specification but are not listed above.
One embodiment of the present invention is a simulation system. The simulation system operates in a host computer system to simulate the behavior of a circuit. The host computer system includes a central processing unit (CPU), main memory, and a local bus that couples the CPU to main memory and allows communication between them. The circuit has a structure and a function specified in a hardware language, such as HDL, which is capable of describing the circuit as component types and connections. The simulation system includes a software model, software control logic, and a hardware logic element.
The software model of the circuit is coupled to the local bus. Typically, it resides in main memory. The software control logic is coupled to the software model and the hardware logic element to control the operation of both. The software control logic includes interface logic capable of receiving input data and a clock signal from an external process, and clock detection logic for detecting an active edge of the clock signal and generating a trigger signal. The hardware logic element is also coupled to the local bus and includes a hardware model of at least a portion of the circuit based on component type, and clock enable logic for evaluating data in the hardware model in response to the trigger signal.
The hardware logic element also includes a plurality of interconnected field-programmable devices arranged in an array. Each field-programmable device includes a portion of the hardware model of the circuit, so the combination of all the field-programmable devices includes the entire hardware model. A plurality of interconnections connect the portions of the hardware model together. Each interconnection represents a direct connection between any two field-programmable devices located in the same row or column. The shortest path between any two field-programmable devices in the array is at most two interconnections, or "hops."
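The two-hop bound can be checked with a small graph search. The Python sketch below assumes, as the paragraph above states, that every device connects directly to every other device in its row and in its column; the 4x4 size, the coordinate scheme, and the function names are illustrative assumptions rather than details of the actual board.

```python
from collections import deque
from itertools import product

def neighbors(chip, rows, cols):
    """Chips reachable in one hop: every other chip in the same row or column."""
    r, c = chip
    same_row = [(r, j) for j in range(cols) if j != c]
    same_col = [(i, c) for i in range(rows) if i != r]
    return same_row + same_col

def hops(src, dst, rows=4, cols=4):
    """Breadth-first search for the minimum number of interconnections."""
    distance = {src: 0}
    queue = deque([src])
    while queue:
        current = queue.popleft()
        if current == dst:
            return distance[current]
        for nxt in neighbors(current, rows, cols):
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    raise ValueError("destination is not reachable")

def max_hops(rows=4, cols=4):
    """Longest shortest path over all pairs of chips in the array."""
    chips = list(product(range(rows), range(cols)))
    return max(hops(a, b, rows, cols) for a in chips for b in chips)

if __name__ == "__main__":
    print(hops((0, 0), (0, 3)))  # same row: 1 hop
    print(hops((0, 0), (3, 3)))  # different row and column: 2 hops
    print(max_hops())            # worst case over the whole array: 2
```

Any chip in a different row and column is reached by first moving along the row and then along the column (or the reverse), which is why the worst case never exceeds two hops under this assumption.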
Another embodiment of the present invention is a system and method of simulating a circuit, where the circuit is modeled in software and at least a portion of the circuit is modeled in hardware. Data evaluation occurs in the hardware but is controlled in software via a software clock. Data to be evaluated propagates to the hardware model and stabilizes. When the software model detects an active clock edge, it sends an enable signal to the hardware model to activate data evaluation. The hardware model evaluates the data and then waits for new incoming data, which can be evaluated when the software model detects the next active clock edge.
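A toy shift register can illustrate why gating evaluation on a software-detected clock edge removes the race. The Python sketch below is a behavioral illustration only, not the patented circuit; the function names and the list-based register model are assumptions made here.

```python
def shift_with_software_clock(stages, serial_in):
    """All registers capture their settled D inputs on one shared enable pulse."""
    # Data has propagated and stabilized before the enable arrives, so each
    # stage sees the value its predecessor held before the clock edge.
    d_inputs = [serial_in] + list(stages[:-1])
    return d_inputs

def shift_with_skewed_clock(stages, serial_in, arrival_order):
    """Registers update one at a time, in the order the clock reaches them."""
    stages = list(stages)
    for index in arrival_order:
        # A stage clocked late may see a predecessor that has already updated.
        stages[index] = serial_in if index == 0 else stages[index - 1]
    return stages

if __name__ == "__main__":
    start = [1, 0, 0]
    print(shift_with_software_clock(start, 0))           # [0, 1, 0]
    print(shift_with_skewed_clock(start, 0, [2, 1, 0]))  # [0, 1, 0], lucky ordering
    print(shift_with_skewed_clock(start, 0, [0, 1, 2]))  # [0, 0, 0], data raced through
```

With the skewed clock the result depends on arrival order, which is the uncertainty described for hardware-driven clocks; with the shared enable the result is the same regardless of how long the data took to settle.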
Another embodiment of the present invention includes a software kernel that controls the operation of the software model and the hardware model. The software kernel includes the steps of evaluating active test-bench processes/components, evaluating clock components, detecting clock edges, updating registers and memories, propagating combinational components, advancing simulation time, and continuing the loop as long as active test-bench processes are present.
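The kernel steps listed above can be sketched as a simple loop. The Python example below is a minimal illustration under stated assumptions: the design is a hypothetical 4-bit counter, the test bench is a clock that toggles every time step, and the name `run_kernel` is invented here. It is not the kernel code of the described system.

```python
def run_kernel(num_steps):
    """Toy main control loop driving a 4-bit counter from a toggling clock."""
    time = 0            # initialize the system
    prev_clk = 0
    count = 0           # register state
    next_count = 1      # combinational value waiting at the register's D input
    trace = []
    while time < num_steps:                 # loop while the test bench is active
        # Evaluate active test-bench processes and clock components.
        clk = time % 2
        # Detect a rising clock edge.
        rising_edge = prev_clk == 0 and clk == 1
        # Update registers (and memories) only on a detected edge.
        if rising_edge:
            count = next_count
        # Propagate combinational components.
        next_count = (count + 1) % 16
        trace.append((time, clk, count))
        # Advance simulation time.
        prev_clk = clk
        time += 1
    return trace

if __name__ == "__main__":
    for time, clk, count in run_kernel(6):
        print(time, clk, count)
```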
Another embodiment of the present invention is a method of simulating a circuit, where the circuit has a structure and a function specified in a hardware language, such as HDL, which is capable of describing the circuit or reducing it to components. The method steps include: (1) determining component types in the hardware language; (2) generating a model of the circuit based on component type; and (3) simulating the behavior of the circuit with the model by providing input data to the model. Generating the model may include: (1) generating a software model of the circuit; and (2) generating a hardware model of the circuit based on component type.
In another embodiment, the present invention is a method of simulating a circuit. The steps include: (1) generating a software model of the circuit; (2) generating a hardware model of the circuit; (3) simulating the behavior of the circuit with the software model by providing input data to the software model; (4) selectively switching to the hardware model; (5) providing input data to the hardware model; and (6) simulating the behavior of the circuit with the hardware model by accelerating the simulation in the hardware model. The method may also include the following additional steps: (1) selectively switching to the software model; and (2) simulating the behavior of the circuit with the software model by providing input data to the software model. The simulation can also be stopped with the software model.
For the in-circuit emulation mode, the method includes: (1) generating a software model of the circuit; (2) generating a hardware model of at least a portion of the circuit; (3) supplying input signals from the target system to the hardware model; (4) providing output signals from the hardware model to the target system; and (5) simulating the behavior of the circuit with the hardware model, where the software model is capable of controlling the simulation/emulation cycle by cycle.
For post-simulation analysis, the method of simulating a circuit includes: (1) generating a model of the circuit; (2) simulating the behavior of the circuit with the model by providing input data to the model; and (3) logging selected input data and selected output data as logged points from the model. Both a software model and a hardware model may be generated. The method may further include the following steps: (1) selecting a desired time-dependent point in the simulation; (2) selecting a logged point at or before the selected time-dependent point; (3) providing input data to the hardware model; and (4) simulating the behavior of the circuit with the hardware model from the selected logged point.
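The idea of restarting from a logged point at or before the time of interest can be sketched as follows. This Python example is a simplified illustration: `step` is a hypothetical next-state function standing in for one simulated cycle, and the logging interval, data structures, and function names are assumptions made here rather than details from the specification.

```python
def step(state, value):
    """Hypothetical next-state function standing in for one simulated cycle."""
    return (state * 3 + value) % 256

def simulate_and_log(inputs, interval):
    """Run the full simulation, saving the state every `interval` cycles."""
    logged = {0: 0}          # cycle number -> saved state (initial state is 0)
    state = 0
    for cycle, value in enumerate(inputs, start=1):
        state = step(state, value)
        if cycle % interval == 0:
            logged[cycle] = state
    return logged

def state_at(cycle, inputs, logged):
    """Recover the state after `cycle` cycles by replaying from a logged point."""
    # Select the latest logged point at or before the requested cycle.
    start = max(point for point in logged if point <= cycle)
    state = logged[start]
    # Re-apply only the logged inputs between that point and the target.
    for value in inputs[start:cycle]:
        state = step(state, value)
    return state

if __name__ == "__main__":
    inputs = list(range(1, 21))
    logged = simulate_and_log(inputs, interval=5)
    print(sorted(logged))               # cycles at which state was saved
    print(state_at(13, inputs, logged)) # replays only cycles 11 to 13
```

Replaying from the nearest logged point avoids re-simulating from time zero, at the cost of storing the inputs and a periodic snapshot of the state.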
Another embodiment of the present invention is a method of generating models for a simulation system that simulates a circuit. The steps include: (1) generating a software model of the circuit; (2) generating a hardware model of at least a portion of the circuit based on component type, where the component types include register components and combinational components; and (3) generating a clock generation circuit in the hardware model to trigger data evaluation in the hardware model in response to clock edge detection in the software model.
Several embodiments of the present invention solve the problems described above by replacing standard flip-flops and latches with specially designed logic devices. One embodiment of the present invention is a timing-insensitive glitch-free (TIGF) logic device. The TIGF logic device can take the form of any latch or edge-triggered flip-flop. In one embodiment of the present invention, a trigger signal is provided to update the TIGF logic device. The trigger signal is provided during a short trigger period that occurs adjacent in time to the evaluation period.
In latch form, the TIGF latch includes a flip-flop that holds the current state of the TIGF latch until a trigger signal is received. A multiplexer is also provided to receive the new input value and the old stored value. The enable signal functions as the selector signal for the multiplexer. Because the trigger signal controls the updating of the TIGF latch, the data at the D input and the control data at the enable input can arrive in any order without causing hold time violations. Likewise, because the trigger signal controls the updating of the TIGF latch, the enable signal can glitch often without adversely affecting the proper operation of the TIGF latch.
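The latch structure described above (a state-holding flip-flop fed by a multiplexer that the enable signal selects, updated only on the trigger) can be modeled behaviorally. The Python class below is a sketch under that reading; the class and attribute names are assumptions, and it abstracts away all electrical timing.

```python
class TIGFLatch:
    """Behavioral sketch of a TIGF latch: state changes only on trigger()."""

    def __init__(self, q=0):
        self.q = q        # flip-flop holding the current latch state
        self.d = 0        # data input, may change at any time
        self.enable = 0   # enable input, may glitch during evaluation

    def mux_out(self):
        """Multiplexer: the new input if enabled, otherwise the old stored value."""
        return self.d if self.enable else self.q

    def trigger(self):
        """Trigger pulse: the flip-flop captures the multiplexer output."""
        self.q = self.mux_out()
        return self.q

if __name__ == "__main__":
    latch = TIGFLatch()
    latch.d = 1
    latch.enable = 1      # enable glitches high during evaluation...
    latch.enable = 0      # ...and settles low before the trigger
    print(latch.trigger())  # 0: the glitch had no effect
    latch.enable = 1        # enable and data may arrive in either order
    print(latch.trigger())  # 1: updated from the settled inputs
```

Only the values present when the trigger arrives matter, which is why the arrival order of data and enable, and any intermediate glitches, do not change the stored state in this model.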
In flip-flop form, the TIGF flip-flop includes a first flip-flop that holds the new input value, a second flip-flop that holds the current stored value, and a clock edge detector. The trigger signal controls all three components to update the TIGF flip-flop. A multiplexer is also provided, with the edge detector signal as its selector signal. Hold time violations are avoided because one dedicated flip-flop stores the new input value, which effectively blocks input changes during evaluation. Because the trigger signal controls the updating of the TIGF flip-flop, clock glitches do not affect a hardware model of the user's circuit design that uses TIGF flip-flops to emulate its flip-flops.
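A behavioral model can make the roles of the two flip-flops and the edge detector concrete. The Python sketch below rests on assumptions that go beyond the text: the edge detector is modeled as comparing the clock level seen at consecutive triggers, only rising edges are considered, and the class and attribute names are invented here.

```python
class TIGFFlipFlop:
    """Behavioral sketch of a TIGF D flip-flop updated only on trigger()."""

    def __init__(self, q=0):
        self.held_d = 0    # first flip-flop: input value captured at the last trigger
        self.q = q         # second flip-flop: current stored value
        self.prev_clk = 0  # edge detector state: clock level at the last trigger
        self.d = 0         # data input, free to change during evaluation
        self.clk = 0       # clock input, free to glitch during evaluation

    def trigger(self):
        """Trigger pulse: update the output, the held input, and the edge detector."""
        rising_edge = self.prev_clk == 0 and self.clk == 1
        if rising_edge:
            # Multiplexer selects the previously held input, not the live D pin,
            # so input changes made during this evaluation are blocked.
            self.q = self.held_d
        self.held_d = self.d
        self.prev_clk = self.clk
        return self.q

if __name__ == "__main__":
    ff = TIGFFlipFlop()
    ff.d = 1
    ff.trigger()           # no clock edge yet; the input value 1 is held
    ff.clk = 1
    ff.d = 0               # input changes in the same evaluation as the edge
    print(ff.trigger())    # 1: the held value wins, no hold time violation
    ff.clk = 0
    ff.clk = 1             # clock glitches low and back before the next trigger
    print(ff.trigger())    # 1: the glitch is never seen as an edge
```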
These and other embodiments are fully discussed and illustrated in the following sections of the specification.
Description of drawings
The above objects and the description of the present invention may be better understood with the aid of the following text and accompanying drawings.
Fig. 1 shows a high-level overview of one embodiment of the present invention, including the workstation, reconfigurable hardware emulation model, emulation interface, and target system coupled to a PCI bus.
Fig. 2 shows one particular usage flow diagram of the present invention.
Fig. 3 shows a high-level diagram of the software compilation and hardware configuration during compile time and run time in accordance with one embodiment of the present invention.
Fig. 4 shows a flow diagram of the compilation process, which includes generating the software/hardware models and the software kernel code.
Fig. 5 shows the software kernel that controls the overall SEmulation system.
Fig. 6 shows a method of mapping hardware models to reconfigurable boards through mapping, placement, and routing.
Fig. 7 shows the connectivity matrix for the FPGA (field-programmable gate array) array shown in Fig. 8.
Fig. 8 shows one embodiment of the 4x4 FPGA (field-programmable gate array) array and its interconnections.
Figs. 9(A), 9(B), and 9(C) illustrate one embodiment of the time division multiplexed (TDM) circuit, which allows a group of wires to be coupled together in a time-multiplexed fashion so that this group of wires in a chip can use one pin instead of a plurality of pins. Fig. 9(A) presents an overview of the pin-out problem, Fig. 9(B) shows a TDM circuit for the transmission side, and Fig. 9(C) shows a TDM circuit for the receiving side.
Figure 10 shows the SEmulation system architecture in accordance with one embodiment of the present invention.
Figure 11 shows one embodiment of the address pointer of the present invention.
Figure 12 shows a state transition diagram of the initialization of the address pointer of Figure 11.
Figure 13 shows one embodiment of the MOVE signal generator for deriving the various MOVE signals for the address pointer.
Figure 14 shows the chain of multiplexed address pointers in each FPGA chip.
Figure 15 shows one embodiment of the multiplexed cross-chip address pointer chain in accordance with one embodiment of the present invention.
Figure 16 shows a flow diagram of the clock/data network analysis that is critical to the software clock implementation and the evaluation of logic components in the hardware model.
Figure 17 shows the basic building blocks of the hardware model in accordance with one embodiment of the present invention.
Figures 18(A) and 18(B) show the register model implementation for latches and flip-flops.
Figure 19 shows one embodiment of the clock edge detection logic in accordance with one embodiment of the present invention.
Figure 20 shows a four-state finite state machine that controls the clock edge detection logic of Figure 19 in accordance with one embodiment of the present invention.
Figure 21 shows the interconnection, JTAG, FPGA bus, and global signal pin designations for each FPGA chip in accordance with one embodiment of the present invention.
Figure 22 shows one embodiment of the FPGA controller between the PCI bus and the FPGA array.
Figure 23 shows a more detailed view of the CTRL_FPGA unit and data buffer discussed with respect to Figure 22.
Figure 24 shows the 4x4 FPGA (field-programmable gate array) array, its relationship to the FPGA banks, and its expansion capability.
Figure 25 shows one embodiment of the hardware start-up method.
Figure 26 shows the HDL (hardware description language) code for one example of a circuit design to be modeled and simulated.
Figure 27 shows a circuit diagram that symbolically represents the circuit design of the HDL code in Figure 26.
Figure 28 shows the component type analysis for the HDL code of Figure 26.
Figure 29 shows a signal network analysis of the structured RTL HDL code based on the user-defined circuit design shown in Figure 26.
Figure 30 shows the software/hardware partition result for the same hypothetical example.
Figure 31 shows a hardware model for the same hypothetical example.
Figure 32 shows one particular hardware model-to-chip partition result for the same hypothetical example of a user-defined circuit design.
Figure 33 shows another particular hardware model-to-chip partition result for the same hypothetical example of a user-defined circuit design.
Figure 34 shows the logic patching operation for the same hypothetical example of a user-defined circuit design.
Figures 35(A) to 35(D) illustrate the principle of "hops" and interconnections with two examples.
Figure 36 shows an overview of the FPGA chip used in the present invention.
Figure 37 shows the FPGA interconnection buses on the FPGA chip.
Figures 38(A) and 38(B) show side views of the FPGA board connection scheme in accordance with one embodiment of the present invention.
Figure 39 shows a direct-neighbor and one-hop six-board interconnection layout of the FPGA array in accordance with one embodiment of the present invention.
Figures 40(A) and 40(B) show the FPGA inter-board interconnection scheme.
Figures 41(A) to 41(F) show top views of the board interconnection connectors.
Figure 42 shows the on-board connectors and some components of a representative FPGA board.
Figure 43 shows a legend of the connectors in Figures 41(A) to 41(F) and 42.
Figure 44 shows a direct-neighbor and one-hop dual-board interconnection layout of the FPGA array in accordance with one embodiment of the present invention.
Figure 45 shows a workstation with multiple processors in accordance with another embodiment of the present invention.
Figure 46 shows an environment in accordance with another embodiment of the present invention in which multiple users share a single simulation/emulation system on a time-shared basis.
Figure 47 shows a high-level structure of the simulation server in accordance with one embodiment of the present invention.
Figure 48 shows the architecture of the simulation server in accordance with one embodiment of the present invention.
Figure 49 shows a flow diagram of the simulation server.
Figure 50 shows a flow diagram of the job swapping process.
Figure 51 shows the signals between the device driver and the reconfigurable hardware unit.
Figure 52 illustrates the time-sharing feature of the simulation server for handling multiple jobs with different priority levels.
Figure 53 shows the communication handshake signals between the device driver and the reconfigurable hardware unit.
Figure 54 shows the state diagram of the communication handshake protocol.
Figure 55 shows an overview of the client-server model of the simulation server in accordance with one embodiment of the present invention.
Figure 56 shows a high-level block diagram of the simulation system for implementing memory mapping in accordance with one embodiment of the present invention.
Figure 57 shows a more detailed block diagram of the memory mapping aspect of the simulation system, with supporting components for the memory finite state machine (MEMFSM) and the evaluation finite state machine (EVALFSMx) for each FPGA logic device.
Figure 58 shows a state diagram of the finite state machine of the MEMFSM unit in the CTRL_FPGA unit in accordance with one embodiment of the present invention.
Figure 59 shows a state diagram of the finite state machine in each FPGA chip in accordance with one embodiment of the present invention.
Figure 60 shows the memory read data double buffer.
Figure 61 shows the simulation write/read cycle in accordance with one embodiment of the present invention.
Figure 62 shows a timing diagram of the simulation data transfer operation when the DMA (direct memory access) read operation occurs after the CLK_EN signal.
Figure 63 shows a timing diagram of the simulation data transfer operation when the DMA (direct memory access) read operation occurs near the end of the EVAL period.
Figure 64 has shown the typical user's design as the PCI additional card.
Figure 65 has shown the exemplary hardware/software collaboration check system of use ASIC (special IC) as device under test.
Figure 66 has shown the collaborative check system of the typical case who uses emulator, and wherein device under test is programmed among the emulator.
Figure 67 has shown simulation system according to an embodiment of the invention.
Figure 68 has shown the collaborative check system that does not have outside input-output apparatus according to an embodiment of the invention, and wherein rcc computing system comprises the software model and the goal systems of different input-output apparatus.
Figure 69 has shown the collaborative check system that has actual outside input-output apparatus and goal systems according to another embodiment of the present invention.
Figure 70 has shown the detail logic diagram of the data input unit of steering logic according to an embodiment of the invention.
Figure 71 has shown the detail logic diagram of the data output unit of steering logic according to an embodiment of the invention.
Figure 72 has shown the sequential chart of the data input unit of steering logic.
Figure 73 has shown the sequential chart of the data output unit of steering logic.
Figure 74 has shown the board design of RCC hardware array according to an embodiment of the invention.
Figure 75(A) shows an example shift register circuit used to explain the hold time and clock glitch problems.
Figure 75(B) shows a timing diagram of the shift register circuit of Figure 75(A), illustrating hold time.
Figure 76(A) shows the same shift register circuit of Figure 75(A) placed across multiple FPGA chips.
Figure 76(B) shows a timing diagram of the shift register circuit of Figure 76(A), illustrating a hold time violation.
Figure 77(A) shows an example logic circuit used to illustrate the clock glitch problem.
Figure 77(B) shows a timing diagram of the logic circuit of Figure 77(A), illustrating the clock glitch problem.
Figure 78 shows a prior-art timing adjustment technique for solving the hold time violation problem.
Figure 79 shows a prior-art timing resynthesis technique for solving the hold time violation problem.
Figure 80(A) shows an original latch according to one embodiment of the invention, and Figure 80(B) shows a timing-insensitive and glitch-free latch according to one embodiment of the invention.
Figure 81(A) shows an original design flip-flop according to one embodiment of the invention, and Figure 81(B) shows a timing-insensitive and glitch-free design-type flip-flop according to one embodiment of the invention.
Figure 82 shows a timing diagram of the trigger mechanism of the timing-insensitive and glitch-free latch and flip-flop according to one embodiment of the invention.
These figures are discussed below in connection with the various aspects and embodiments of the invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
This specification describes the various embodiments of the invention in the context of a system called the "SEmulator" or "SEmulation" system. Throughout the specification, the terms "SEmulation system," "SEmulator system," "SEmulator," or simply "system" may be used. These terms refer to the various apparatus and method embodiments of the invention for any combination of four operating modes: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis, including their respective setup or preprocessing stages. At other times, the term "SEmulation" may be used; it refers to the novel processes described herein.
Similarly, terms such as "reconfigurable computing (RCC) array system" or "RCC computing system" refer to the portion of the simulation/coverification system that contains the main processor, the software kernel, and the software model of the user design. Terms such as "reconfigurable hardware array" or "RCC hardware array" refer to the portion of the simulation/coverification system that contains the hardware model of the user design, which in one embodiment comprises an array of reconfigurable logic elements.
The specification also refers to a "user" and to the user's "circuit design" or "electronic design." The "user" is a person who uses the SEmulation system through its interfaces, and may be the designer of the circuit or a test/debug engineer who had little or no part in the design process. The "circuit design" or "electronic design" is a custom-designed system or component, in software or hardware, that the SEmulation system can model for test/debug purposes. In many cases, the "user" also designed the "circuit design" or "electronic design."
The specification also uses terms such as "wire," "wire line," "wire/bus line," and "bus." These terms refer to various electrically conducting lines. Each line may be a single wire between two points or several wires between points. The terms are interchangeable, since a "wire" may comprise one or more conducting lines and a "bus" may also comprise one or more conducting lines.
This specification is presented in outline form. First, it gives a general overview of the SEmulation system, including an overview of the four operating modes and the hardware implementation schemes. Second, it discusses the SEmulation system in detail. In some cases, a figure shows a variation of an embodiment shown in an earlier figure; in such cases, like reference numerals are used for like components/units/processes. The outline of the specification is as follows:
I. Overview
A. Simulation/Hardware Acceleration Modes
B. Emulation with Target System Mode
C. Post-Simulation Analysis Mode
D. Hardware Implementation Schemes
E. Simulation Server
F. Memory Simulation
G. Coverification System
II. System Description
III. Simulation/Hardware Acceleration Modes
IV. Emulation with Target System Mode
V. Post-Simulation Analysis Mode
VI. Hardware Implementation Schemes
A. Overview
B. Address Pointer
C. Gated Data/Clock Network Analysis
D. FPGA Array and Control
E. Alternate Embodiment Using Denser FPGA Chips
F. TIGF Logic Devices
VII. Simulation Server
VIII. Memory Simulation
IX. Coverification System
X. Examples
--------------------------------------------------------------
I. Overview
The various embodiments of the invention have four general modes of operation: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis. The various embodiments of the systems and methods covering these modes have at least some of the following features:
(1) a software and hardware model with a single tightly coupled simulation engine, a software kernel, that controls the software and hardware models cycle by cycle; (2) automatic component type analysis during the compilation process for software and hardware model generation and partitioning; (3) the ability to switch, cycle by cycle, among software simulation mode, simulation through hardware acceleration mode, in-circuit emulation mode, and post-simulation analysis mode; (4) full hardware model visibility through software combinational component regeneration; (5) double-buffered clock modeling with software clocks and gated clock/data logic to avoid race conditions; and (6) the ability to re-simulate or hardware-accelerate the user's circuit design from any selected point in a past simulation session. The end result is a flexible and fast simulator/emulator system and method with full HDL functionality and emulator execution performance.
A. Simulation/Hardware Acceleration Modes
Through automatic component type analysis, the SEmulator system can model the user's custom circuit design in both software and hardware. The entire user circuit design is modeled in software, whereas the evaluation components (i.e., register components and combinational components) are modeled in hardware. Hardware modeling is facilitated by the component type analysis.
A software kernel, residing in the main memory of the general-purpose processor system, serves as the main program of the SEmulator system and controls the overall operation and execution of its various modes and features. So long as any test-bench processes are active, the kernel evaluates the active test-bench components, evaluates clock components, detects clock edges to update registers and memories and to propagate combinational logic data, and advances the simulation time. This software kernel gives the simulator engine its tightly coupled relationship with the hardware acceleration engine. For the software/hardware boundary, the SEmulator system provides several I/O address spaces: REG (register), CLK (software clock), S2H (software to hardware), and H2S (hardware to software).
The SEmulator can selectively switch among the four modes of operation. The user of the system can start simulation, stop simulation, assert input values, inspect values, single-step cycle by cycle, and switch back and forth among the four different modes. For example, the system can simulate the circuit in software for a period of time, accelerate the simulation through the hardware model, and then return to software simulation mode.
Generally, the SEmulation system lets the user "see" every modeled component, whether it is modeled in software or in hardware. For several reasons, combinational components are not as "visible" as registers, so obtaining combinational component data is difficult. One reason is that the FPGAs used on the reconfigurable board to model the hardware portion of the user's circuit design typically model combinational components as look-up tables rather than as actual combinational components. Accordingly, the SEmulation system reads the register values and then regenerates the combinational components. Because regeneration carries some overhead, it is not performed all the time; it is performed only when the user requests it.
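The regeneration idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-register design, the net names, and the `read_registers_from_hardware` stand-in are hypothetical and do not come from the specification. The point is that only register values are read back from the hardware model, and the combinational nets are recomputed from a software model when the user asks for them.

```python
# Hypothetical example design: three registers (a, b, sel) feed three
# combinational nets. Only registers are readable from the hardware model.

def read_registers_from_hardware():
    """Stand-in for a read of the REG address space; the values are made up."""
    return {"a": 1, "b": 0, "sel": 1}

# Software model of the combinational nets, listed in dependency order.
COMBINATIONAL_MODEL = [
    ("and_ab", lambda v: v["a"] & v["b"]),
    ("or_ab", lambda v: v["a"] | v["b"]),
    ("mux_out", lambda v: v["or_ab"] if v["sel"] else v["and_ab"]),
]

def regenerate_combinational(register_values):
    """Recompute every combinational net from the register values."""
    values = dict(register_values)
    for name, evaluate in COMBINATIONAL_MODEL:
        values[name] = evaluate(values)
    return {name: values[name] for name, _ in COMBINATIONAL_MODEL}

# Performed only on request, since regeneration costs software time.
nets = regenerate_combinational(read_registers_from_hardware())
```

Evaluating the nets in dependency order keeps the sketch short; a real design would need a topological sort of its netlist.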
Because the software kernel resides on the software side, a clock edge detection mechanism is provided to trigger generation of a so-called software clock, which drives the enable input of each register in the hardware model. Timing is strictly controlled through a double-buffered circuit implementation, so that data enter the register models before the software clock enable signal does. Once the data inputs to these register models have stabilized, the software clock gates the data synchronously, ensuring that all data values are gated together without any risk of hold time violation.
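A minimal software analogy of the double-buffering scheme, assuming a simple shift register as the example circuit (the class and function names are illustrative, not taken from the specification): every register first captures its data input into a holding buffer, and only afterwards does a single enable step commit all buffers at once. Because no output changes during the capture phase, a downstream register can never pick up a value that was produced in the same cycle.

```python
class DoubleBufferedRegister:
    """Register model with a holding buffer in front of the visible output."""

    def __init__(self, initial=0):
        self.pending = initial  # first buffer: captures the data input
        self.q = initial        # second buffer: output seen by other logic

    def load(self, d):
        # Phase 1: capture the input; the visible output does not change yet.
        self.pending = d

    def clock_enable(self):
        # Phase 2: the software clock enable commits the captured value.
        self.q = self.pending


def shift_step(registers, serial_in):
    """Advance a shift register by one software clock cycle."""
    # Phase 1: all inputs are sampled from the current, stable outputs.
    inputs = [serial_in] + [reg.q for reg in registers[:-1]]
    for reg, d in zip(registers, inputs):
        reg.load(d)
    # Phase 2: every register updates together.
    for reg in registers:
        reg.clock_enable()
    return [reg.q for reg in registers]


chain = [DoubleBufferedRegister() for _ in range(3)]
history = [shift_step(chain, bit) for bit in (1, 0, 1)]
```

If the two phases were merged into a single loop, the second register would see the first register's new value within the same cycle, which is the software counterpart of a hold time violation.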
Software simulation is also fast because the system logs all input values but only selected register values/states, minimizing overhead by reducing the number of I/O operations. The user can select the logging frequency.
B. Emulation with Target System Mode
The SEmulation system can emulate the user's circuit within its target system environment. The target system outputs data to the hardware model for evaluation, and the hardware model also outputs data to the target system. In addition, the software kernel controls the operation of this mode, so the user can still start, stop, assert values, inspect values, single-step, and switch from one mode to another.
C. Post-Simulation Analysis Mode
Logs provide the user with a historical record of the simulation session. Unlike known simulation systems, the SEmulation system does not log every single value, internal state, or value change during the simulation process. It logs only selected values and states based on a logging frequency (i.e., one record every N cycles). During the post-simulation stage, if the user wants to examine data around a point X in the simulation session just completed, the user first goes to a logged point, say logged point Y, that is closest to point X and temporally before it. The user then simulates from the selected logged point Y to the desired point X to obtain the simulation results.
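The lookup of logged point Y and the replay up to point X can be sketched as follows. This is a simplified illustration: the function names are hypothetical, the "design state" is whatever the caller stores per logged point, and the single-argument `step` function stands in for one cycle of simulation (a fuller version would also feed in the logged input values for each replayed cycle).

```python
import bisect


def logged_points(total_cycles, n):
    """Cycles at which state was logged: one record every n cycles."""
    return list(range(0, total_cycles + 1, n))


def nearest_logged_point(points, x):
    """Return the logged point Y closest to X with Y <= X (points sorted)."""
    i = bisect.bisect_right(points, x)
    if i == 0:
        raise ValueError("no logged point at or before the requested cycle")
    return points[i - 1]


def replay(checkpoint_states, step, x):
    """Restore the state logged at Y, then simulate cycle by cycle up to X."""
    y = nearest_logged_point(sorted(checkpoint_states), x)
    state = checkpoint_states[y]
    for _ in range(x - y):
        state = step(state)
    return y, state


# Toy design: the state is just a cycle counter that increments each cycle.
points = logged_points(1000, 100)
checkpoints = {p: p for p in points}
start, state_at_x = replay(checkpoints, lambda s: s + 1, 250)
```

The trade-off is the one the text describes: a larger N means less logging overhead during simulation but a longer replay from Y to X afterwards.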
D. Hardware Implementation Schemes
The SEmulation system implements an array of FPGA chips on a reconfigurable board. Based on the hardware model, the SEmulation system partitions, maps, places, and routes each selected portion of the user's circuit design onto the FPGA chips. Thus, for example, a 4x4 array of 16 chips may model a large circuit spread out across those 16 chips. The interconnect scheme allows each chip to access another chip within two "jumps" or links.
Each FPGA chip provides an address pointer for each of the I/O address spaces (i.e., REG, CLK, S2H, H2S). The address pointers associated with a particular address space are chained together across all chips. During data transfer, word data in each chip are sequentially selected from/to the main FPGA bus and the PCI bus, one word at a time for the selected address space in each chip, and one chip at a time, until the desired word data have been accessed for that address space. This sequential selection of word data is accomplished by a propagating word selection signal. The word selection signal travels through the address pointer in one chip and then propagates to the address pointer in the next chip, continuing until the last chip is reached or the system re-initializes the address pointers.
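The chained address pointers behave much like a token that walks through every word of one address space in one chip and then moves on to the next chip. A rough software model, with hypothetical chip names and word values chosen only for illustration, might look like this:

```python
ADDRESS_SPACES = ("REG", "CLK", "S2H", "H2S")


class Chip:
    """One FPGA chip holding a list of words for each I/O address space."""

    def __init__(self, name, words_by_space):
        self.name = name
        self.words_by_space = words_by_space


def transfer_words(chips, space):
    """Yield (chip, word index, word) in the order the selection signal
    would visit them: word by word within a chip, then chip by chip."""
    if space not in ADDRESS_SPACES:
        raise ValueError("unknown address space: " + space)
    for chip in chips:  # the selection signal passes to the next chip
        for index, word in enumerate(chip.words_by_space.get(space, [])):
            yield chip.name, index, word  # one word on the bus at a time


# Hypothetical two-chip chain.
chain = [
    Chip("chip0", {"REG": [0xA, 0xB], "S2H": [0x1]}),
    Chip("chip1", {"REG": [0xC]}),
]
reg_sequence = list(transfer_words(chain, "REG"))
```

Because the order is fixed by the chain, the host never has to send a per-word address; it only selects the address space and restarts the chain when it wants to begin again.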
The FPGA bus system on the reconfigurable board operates at twice the PCI bus bandwidth but at half the PCI bus speed. The FPGA chips are therefore separated into banks to utilize the wider bus. The throughput of this FPGA bus system can track the throughput of the PCI bus system, so no performance is lost by reducing the bus speed. Expansion is possible through piggyback boards that extend the bank length.
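The claim that no throughput is lost follows from simple arithmetic, shown here as a short Python check. The numbers are assumptions for illustration only: "bandwidth" is read as bus width, the PCI bus is taken to be 32 bits wide, and the 33 MHz clock is the PCI revision 2.0 speed mentioned elsewhere in this specification.

```python
def throughput_bits_per_second(width_bits, clock_hz):
    """Peak throughput of a bus that moves one word per clock."""
    return width_bits * clock_hz


PCI_WIDTH_BITS = 32          # assumption: 32-bit PCI bus
PCI_CLOCK_HZ = 33_000_000    # 33 MHz

FPGA_BUS_WIDTH_BITS = 2 * PCI_WIDTH_BITS   # twice the width
FPGA_BUS_CLOCK_HZ = PCI_CLOCK_HZ // 2      # half the speed

pci = throughput_bits_per_second(PCI_WIDTH_BITS, PCI_CLOCK_HZ)
fpga = throughput_bits_per_second(FPGA_BUS_WIDTH_BITS, FPGA_BUS_CLOCK_HZ)
```

Doubling the width while halving the clock leaves the product unchanged, which is why the slower FPGA bus can keep pace with the PCI bus.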
In another embodiment of the present invention, denser FPGA chips are used. Examples of such denser chips are the Altera 10K130V and 10K250V. Using these chips alters the board design so that only four FPGA chips are used per board, instead of eight less dense FPGA chips (e.g., Altera 10K100).
The FPGA array in the simulation system is provided on the motherboard through a particular board interconnect structure. Each chip may have up to eight sets of interconnections, arranged as direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]), excluding the local bus connections, both within a single board and across different boards. Each chip can be interconnected directly to adjacent neighbor chips, or in one hop to a non-adjacent chip located above, below, to the left, or to the right. In the X direction (east-west), the array is a torus. In the Y direction (north-south), the array is a mesh.
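The difference between the torus (X) and mesh (Y) directions shows up when computing a chip's direct neighbors. The sketch below assumes a (row, column) coordinate system with row 0 at the north edge; that convention, and the function name, are illustrative choices rather than anything defined in the specification. One-hop interconnects are not modeled.

```python
def direct_neighbors(row, col, rows, cols):
    """Direct-neighbor chips of (row, col) in an array that wraps around
    east-west (torus) but not north-south (mesh)."""
    neighbors = {
        "W": (row, (col - 1) % cols),  # wraps to the last column
        "E": (row, (col + 1) % cols),  # wraps to the first column
    }
    if row > 0:
        neighbors["N"] = (row - 1, col)  # no wrap: top row has no N neighbor
    if row < rows - 1:
        neighbors["S"] = (row + 1, col)  # no wrap: bottom row has no S neighbor
    return neighbors


corner = direct_neighbors(0, 0, 4, 4)
interior = direct_neighbors(1, 2, 4, 4)
```

In a 4x4 array, a corner chip therefore still has both east and west neighbors but only one vertical neighbor.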
These interconnects can couple logic devices and other components within a single board. In addition, inter-board connectors are provided to couple these boards and interconnects together across different boards, carrying signals (1) between the PCI bus, via the motherboard, and the array boards, and (2) between any two array boards.
A motherboard connector connects a board to the motherboard and, hence, to the PCI bus, power, and ground. For some boards, the motherboard connector is not used for direct connection to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are directly connected to the motherboard, while boards 2, 4, and 6 rely on their neighbor boards for motherboard connectivity. Thus, every other board is directly connected to the motherboard, and the interconnects and local buses of these boards are coupled together via inter-board connectors arranged solder side to component side. PCI signals are routed through only one of the boards (typically the first board). Power and ground are applied to the other motherboard connectors for those boards. Placed solder side to component side, the various inter-board connectors allow communication among the PCI bus components, the FPGA logic devices, the memory devices, and the various simulation system control circuits.
E. Simulation Server
In another embodiment of the present invention, a simulation server is provided to allow multiple users to access the same reconfigurable hardware unit. In one system configuration, multiple workstations across a network, or multiple users/processes in a non-network environment, can access the same server-based reconfigurable hardware unit to review/debug the same or different user circuit designs. Access is accomplished via a time-shared process in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware model access among the scheduled users. In one scenario, each user accesses the server to map his/her separate user design to the reconfigurable hardware model for the first time, in which case the system compiles the design to generate the software and hardware models, performs the clustering operation, performs place-and-route operations, generates a bitstream configuration file, and reconfigures the FPGA chips in the reconfigurable hardware unit to model the hardware portion of the user's design. When one user has accelerated his design using the hardware model and downloaded the hardware state to his own memory for software simulation, the hardware unit can be released for access by another user.
The server gives multiple users or processes access to the reconfigurable hardware unit for acceleration and hardware state swapping purposes. The simulation server includes the scheduler, one or more device drivers, and the reconfigurable hardware unit. The scheduler in the simulation server is based on a preemptive round-robin algorithm. The server scheduler includes a simulation job queue table, a priority sorter, and a job swapper. The restore and playback function of the present invention facilitates both the non-network multiprocessing environment and the network multi-user environment: previous checkpoint state data can be downloaded and the entire simulation state associated with that checkpoint can be restored, for playback debugging or for cycle-by-cycle stepping.
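One way to picture a preemptive round-robin scheduler with a priority sorter and job swapper is the toy model below. The exact policy is an assumption made for illustration (the highest-priority queued job runs first, equal-priority jobs take turns, and a job that exhausts its time slice is swapped out and re-queued); the specification only names the components. Job names, priorities, and cycle counts are made up.

```python
from collections import deque


def run_scheduler(jobs, time_slice):
    """Simulate time-shared access to one hardware unit.

    jobs: iterable of (name, priority, cycles_needed); larger priority wins.
    Returns the sequence of (name, cycles_run) slices that were granted.
    """
    queue = deque(jobs)          # simulation job queue
    timeline = []
    while queue:
        # Priority sorter: pick the first queued job with the top priority.
        top = max(job[1] for job in queue)
        job = next(j for j in queue if j[1] == top)
        queue.remove(job)
        name, priority, remaining = job
        ran = min(time_slice, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            # Job swapper: the slice expired, so the job is preempted and
            # goes to the back of the queue with its remaining work.
            queue.append((name, priority, remaining - ran))
    return timeline


timeline = run_scheduler([("A", 1, 30), ("B", 2, 25), ("C", 2, 10)], 10)
```

In a real server, the swap step would also save the outgoing job's hardware state and restore the incoming job's state before granting the next slice.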
F. Memory Simulation
The memory simulation or memory mapping aspect of the present invention provides an effective way for the simulation system to manage the various memory blocks associated with the configured hardware model of the user's design, which is programmed into the array of FPGA chips in the reconfigurable hardware unit. The memory simulation aspect of the invention provides a structure and scheme in which the numerous memory blocks associated with the user's design are mapped into the SRAM memory devices of the simulation system instead of into the logic devices that are used to configure and model the user's design. The memory simulation system includes a memory state machine, an evaluation state machine, and their associated logic and interfaces, which control and interface with: (1) the main computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA buses in the simulation system, and (3) the FPGA logic devices, which contain the configured and programmed user design being debugged. The operation of the memory simulation system according to one embodiment of the invention is generally as follows. The simulation write/read cycle is divided into three periods: DMA (direct memory access) data transfer, evaluation, and memory access.
The FPGA logic device side of the memory simulation system includes an evaluation state machine, an FPGA bus driver, and a logic interface for each memory block N that interfaces with the user's own memory interface in the user design, to handle: (1) data evaluations among the FPGA logic devices, and (2) write/read memory access between the FPGA logic devices and the SRAM memory devices. In conjunction with the FPGA logic device side, the FPGA I/O controller side includes a memory state machine and interface logic to handle DMA, write, and read operations between (1) the main computing system and the SRAM memory devices, and (2) the FPGA logic devices and the SRAM memory devices.
G. Coverification System
One embodiment of the present invention is a coverification system that includes a reconfigurable computing system (hereinafter the "RCC computing system") and a reconfigurable computing hardware array (hereinafter the "RCC hardware array"). In some embodiments, the target system and external I/O devices are not necessary because they can be modeled in software. In other embodiments, the target system and external I/O devices are actually coupled to the coverification system to obtain speed and to use actual data rather than simulated test-bench data. Thus, a coverification system can incorporate the RCC computing system and the RCC hardware array along with other functionality to debug the software portion and hardware portion of a user's design while using the actual target system and/or I/O devices.
The RCC computing system also contains clock logic (for clock edge detection and software clock generation), test-bench processes for testing the user design, and device models for any I/O device that the user decides to model in software instead of using an actual physical I/O device. Of course, the user may decide to use both actual I/O devices and modeled I/O devices in one debug session. The software clock is provided to the external interface to function as the external clock source for the target system and the external I/O devices. Use of the software clock provides the synchronization necessary to process incoming and outgoing data. Because the software clock generated by the RCC computing system is the time base for the debug session, simulated and hardware-accelerated data are synchronized with any data delivered between the coverification system and the external interface.
When the target system and the external I/O devices are coupled to the coverification system, pin-out data must be provided between the coverification system and its external interface. The coverification system contains control logic that provides traffic control between (1) the RCC computing system and the RCC hardware array, and (2) the external interface (which is coupled to the target system and the external I/O devices) and the RCC hardware array. Because the RCC computing system has a model of the entire design in software, including the portion of the user design modeled in the RCC hardware array, the RCC computing system must also have access to all data that pass between the external interface and the RCC hardware array. The control logic ensures that the RCC computing system has access to these data.
II. System Description
Fig. 1 shows a high-level overview of one embodiment of the present invention. A workstation 10 is coupled to a reconfigurable hardware model 20 and an emulation interface 30 via PCI bus system 50. The reconfigurable hardware model 20 is coupled to the emulation interface 30 via PCI bus 50 and via cable 61. A target system 40 is coupled to the emulation interface 30 via cable 60. In other embodiments, when emulation of the user's circuit design within the target system's environment is not desired in a particular test/debug session, the in-circuit emulation setup 70 (shown in the dotted-line box), which includes the emulation interface 30 and the target system 40, is not provided. Without the in-circuit emulation setup 70, the reconfigurable hardware model 20 communicates with the workstation 10 via the PCI bus 50.
In combination with the in-circuit emulation setup 70, the reconfigurable hardware model 20 imitates or mimics the user's circuit design of some electronic subsystem in the target system. To ensure correct operation of the user's circuit design of the electronic subsystem within the target system's environment, the input and output signals between the target system 40 and the modeled electronic subsystem must be provided to the reconfigurable hardware model 20 for evaluation. Hence, the input and output signals between the target system 40 and the reconfigurable hardware model 20 are delivered via cable 60 through the emulation interface 30 and the PCI bus 50. Alternatively, the input/output signals of the target system 40 can be delivered to the reconfigurable hardware model 20 via the emulation interface 30 and cable 61.
The control data and a substantial portion of the simulation data pass between the reconfigurable hardware model 20 and the workstation 10 via the PCI bus 50. Indeed, the workstation 10 runs the software kernel that controls the operation of the entire SEmulation system and must have access (read/write) to the reconfigurable hardware model 20.
The workstation 10, complete with a computer, keyboard, mouse, monitor, and appropriate bus/network interface, allows a user to enter and modify data describing the circuit design of an electronic system. Exemplary workstations include a Sun Microsystems SPARC or ULTRA-SPARC workstation or an Intel/Microsoft-based computing station. As known to those ordinarily skilled in the art, the workstation 10 includes a CPU 11, a local bus 12, a host/PCI bridge 13, a memory bus 14, and main memory 15. The various software simulation, simulation by hardware acceleration, in-circuit emulation, and post-simulation analysis aspects of the present invention are provided in the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30. The algorithms embodied in software are stored in main memory 15 during a test/debug session and executed by the CPU 11 via the workstation's operating system.
As known to those ordinarily skilled in the art, after the operating system has been loaded into the memory of workstation 10 by the startup firmware, control passes to its initialization code to set up the necessary data structures and to load and initialize device drivers. Control then passes to the command line interpreter (CLI), which prompts the user to indicate the program to be run. The operating system then determines the amount of memory needed to run the program, locates or allocates a block of memory, and accesses the memory either directly or through the BIOS (basic input/output system). After completion of the memory loading process, the application program begins execution.
One embodiment of the present invention is a particular application of SEmulation. During the course of its execution, the application program may require numerous services from the operating system, including, but not limited to, reading from and writing to disk files, performing data communications, and interfacing with the display/keyboard/mouse.
The workstation 10 has the appropriate user interface to allow the user to enter circuit design data, edit the circuit design data, monitor the progress of simulations and emulations while obtaining results, and essentially control the simulation and emulation process. Although not shown in Fig. 1, the user interface includes user-accessible menu-driven options and command sets that can be entered with the keyboard and mouse and viewed on a monitor. The user typically uses a computing station 80 with a keyboard 90.
The user typically creates a particular circuit design for an electronic system and enters an HDL (hardware description language) code description of the designed system, usually structured RTL-level code, into the workstation 10. The SEmulation system of the present invention performs component type analysis, among other operations, to partition the modeling between software and hardware. The SEmulation system models behavior, RTL, and gate-level code in software. For hardware modeling, the system can model RTL and gate-level code; however, RTL-level code must be synthesized to gate level prior to hardware modeling. Gate-level code can be processed directly into a usable source design database format for hardware modeling. Using the RTL and gate-level code, the system automatically performs component type analysis to complete the partition step. Based on the partitioning analysis performed during software compile time, the system maps some portion of the circuit design into hardware for fast simulation via hardware acceleration. The user can also couple the modeled circuit design to the target system for in-circuit emulation in the real environment. Because the software simulation and hardware acceleration engines are tightly coupled through the software kernel, the user can simulate the overall circuit design using software simulation, accelerate the test/debug process by using the hardware model of the mapped circuit design, return to the simulation portion, and return to hardware acceleration until the test/debug process is complete. The ability to switch between software simulation and hardware acceleration cycle by cycle and at the user's discretion is one of the valuable features of this embodiment. This feature is particularly useful in the debug process: the user can go to a particular point or cycle very quickly using hardware acceleration mode and then use software simulation to examine various points thereafter to debug the circuit design. Moreover, the SEmulation system makes all components visible to the user, whether the internal realization of a component is in hardware or software. The SEmulation system accomplishes this by reading the register values from the hardware model and then rebuilding the combinational components using the software model when the user requests such a read. These and other features are discussed more fully later in the specification.
Workstation1 0 links to each other with bus system 50.Bus system can be any available bus system, and it makes different subjects, and for example workstation1 0, and reconfigurable hardware model 20 is realized exercisable the connection with emulation interface 30.Bus system is preferably enough fast, thinks that the user provides in real time or approaching real-time result.A kind of this type of bus system is the bus system described in peripheral component interconnect (PCI) standard, and its content is incorporated this paper by reference into.At present, 2.0 of the PCI standard editions bus speeds that 33MHz is provided.2.1 version provides the support to the 66MHz bus speed.Thereby, workstation1 0, reconfigurable hardware model 20 and emulation interface 30 will be followed the PCI standard.
In one embodiment, communication between workstation 10 and reconfigurable hardware model 20 is handled on the PCI bus. Other PCI-compliant devices may also be found on this bus system. These devices may be coupled to the PCI bus at the same level as workstation 10, reconfigurable hardware model 20, and emulation interface 30, or at a different level. A PCI bus at a different level, such as PCI bus 52, is coupled to a PCI bus at another level, such as PCI bus 50 (if present), through PCI-to-PCI bridge 51. Two PCI devices 53 and 54 are coupled to PCI bus 52.
Reconfigurable hardware model 20 comprises an array of field-programmable gate array (FPGA) chips that can be programmably configured and reconfigured to model the hardware portion of the user's electronic system design. In this embodiment the hardware model is reconfigurable; that is, it can reconfigure its hardware to suit the particular computation or user circuit design at hand. For example, if many adders and multipliers are needed, the system is configured with many adders and multipliers. As other computing elements or functions are needed, they too are modeled or formed in the system. In this way the system can be optimized to perform specialized computations or logic operations. A reconfigurable system is also flexible, so that the user can work around a small number of hardware faults that arise during manufacture, testing, or use. In one embodiment, reconfigurable hardware model 20 comprises a two-dimensional array of computing elements made up of FPGA chips, which provides computational resources for a variety of user circuit designs and applications. The hardware configuration process is discussed in more detail below.
Two such FPGA chips are those sold by Altera and Xilinx. In some embodiments, the reconfigurable hardware model is made reconfigurable through the use of field-programmable devices. Other embodiments of the present invention, however, may be implemented using application-specific integrated circuit (ASIC) technology, and still others may take the form of custom integrated circuits.
In a typical test/debug environment, reconfigurable devices are used to simulate/emulate the user's circuit design so that appropriate changes can be made before the actual prototype is manufactured. In some other cases, however, an actual ASIC or custom integrated circuit can be used, although this deprives the user of the ability to change a possibly non-functional circuit design quickly and economically for re-simulation and re-emulation. Sometimes, though, such an ASIC or custom integrated circuit has already been manufactured and is readily available, so that emulation with the non-reconfigurable chip may be preferable.
According to the present invention, the software in the workstation, in combination with its external hardware model, gives the end user greater flexibility, control, and performance than existing systems. To run simulation and emulation, a model of the circuit design and the associated parameters (e.g., input test-bench stimulus, overall system outputs, intermediate results) are determined and provided to the simulation software system. The user can define the system circuit design using either a schematic capture tool or a synthesis tool. The user starts with a circuit design of the electronic system, usually in the form of a draft schematic, and then uses a synthesis tool to convert it into HDL (hardware description language) form. The HDL can also be written directly by the user. Example HDL languages include Verilog and VHDL (VHSIC Hardware Description Language); however, other languages can also be used. A circuit design expressed in HDL contains many concurrent components. Each component is a sequence of code that either defines the behavior of a circuit element or controls the execution of the simulation.
The Analog Simulation System analyzes these components to determine their component types, and the compiler uses this component type information to build different execution models in software and hardware. Thereafter, the user can use the Analog Simulation System of the present invention. The designer can verify the accuracy of the circuit through simulation by applying various stimuli, such as input signals and test vector patterns, to the simulated model. If the circuit does not behave as planned during simulation, the user redefines the circuit by modifying the schematic or the HDL file.
The flow chart of Fig. 2 illustrates the use of this embodiment of the invention. The algorithm starts at step 100. After the HDL file is loaded into the system, the system compiles, partitions, and maps the circuit design onto the appropriate hardware model. The compilation, partitioning, and mapping steps are discussed in detail below.
Before the simulation runs, the system must run a reset sequence to remove all unknown "x" values in software before the hardware acceleration model can function. One embodiment of the present invention uses a 2-bit-wide data path to provide a 4-state value for the bus signal: "00" is logic low, "01" is logic high, "10" is "z", and "11" is "x". As is known to those of ordinary skill in the art, a software model can handle "0", "1", "x" (bus conflict or unknown value), and "z" (no driver, or high impedance). In contrast, hardware cannot handle the unknown value "x", so the reset sequence, which varies with the particular applicable code, resets the register values to all "0" or all "1".
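By way of illustration only, the 2-bit encoding of the four signal states can be sketched in Python as follows. Only the four codes are taken from the description above; the names ENCODE, pack_bus, unpack_bus, and reset_registers are hypothetical, and the reset is simplified to forcing every register bit to one chosen known value.

```python
# 2-bit codes for the four signal states, as given in the description.
ENCODE = {"0": 0b00, "1": 0b01, "z": 0b10, "x": 0b11}
DECODE = {code: state for state, code in ENCODE.items()}

def pack_bus(states):
    """Pack a string of 4-state values (first character = most significant)
    into one integer, using 2 bits per signal."""
    word = 0
    for state in states:
        word = (word << 2) | ENCODE[state]
    return word

def unpack_bus(word, width):
    """Recover the 4-state string for a bus that carries `width` signals."""
    return "".join(DECODE[(word >> (2 * i)) & 0b11] for i in reversed(range(width)))

def reset_registers(registers, reset_value="0"):
    """Hypothetical reset: force every bit of every register to one known value
    (all "0" or all "1") so that no unknown "x" reaches the hardware model."""
    return {name: reset_value * len(bits) for name, bits in registers.items()}
```

For example, under these assumptions pack_bus("01zx") yields the bit pattern 00 01 10 11, and reset_registers({"r": "x1zx"}) yields {"r": "0000"}.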
At step 105, the user decides whether to simulate the circuit design. Typically, the user starts the system with software simulation first. Thus, if the decision at step 105 is "yes", software simulation starts at step 110.
The user can stop the simulation to inspect values, as shown at step 115. In fact, the user can stop the simulation at any time during the test/debug session, as indicated by the dashed lines extending from step 115 to various nodes in the hardware acceleration mode, ICE mode, and post-simulation mode. Executing step 115 takes the user to step 160.
After stopping, the system kernel reads back the state of the hardware register components to regenerate the entire software model, including the combinational components if the user wishes to inspect combinational component values. After the entire software model has been restored, the user can inspect any signal value in the system. After stopping and inspecting, the user can continue to run in simulation-only mode or in hardware acceleration mode. As shown in the flow chart, step 115 branches to the stop/value-inspect routine, which starts at step 160. At step 165, the user must decide whether to stop the simulation at this point and inspect values. If the result of step 165 is "yes", step 170 stops the simulation currently under way and checks the various values to verify the correctness of the circuit design. At step 175, the algorithm returns to the branch point, step 115. Here, the user can continue to simulate and stop/inspect values for the remainder of the test/debug session, or proceed to the in-circuit emulation step.
Similarly, if the result of step 105 is "no", the algorithm proceeds to the hardware acceleration decision step 120. At step 120, the user decides whether to speed up the test/debug process by accelerating the simulation through the hardware portion of the modeled circuit design. If the result of step 120 is "yes", hardware model acceleration starts at step 125. During the system compilation process, the Analog Simulation System mapped some portions of the design into the hardware model. Here, when hardware acceleration is desired, the system moves the register and combinational components into the hardware model and moves the input and evaluation values into the hardware model. Thus, during hardware acceleration, evaluation takes place in the hardware model for a long period at an accelerated speed. The kernel writes the test-bench output to the hardware model, updates the software clock, and then reads the hardware model output values cycle by cycle. If the user desires, values from the entire software model of the user's circuit design (that is, the whole circuit design) can be made available by outputting the register values and the combinational components, the latter being regenerated from the register values. Because regenerating these combinational components requires software intervention, the values of the entire software model are not output at every cycle; rather, they are provided only when the user requests them. The regeneration process for combinational components is discussed later in this specification.
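The on-demand regeneration of combinational values from register values read back from the hardware model can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the regeneration process of the embodiment: the function name regenerate_combinational and the representation of the combinational logic as a topologically ordered list of (output, function, inputs) entries are hypothetical.

```python
def regenerate_combinational(register_values, comb_logic):
    """Rebuild combinational signal values in software from register values.

    register_values: dict of signal name -> value read back from the hardware model.
    comb_logic: list of (output_name, function, input_names) entries, assumed to be
                in topological order so every input is known before it is used.
    Returns a new dict holding both register and regenerated combinational values.
    """
    values = dict(register_values)  # copy; the read-back state is left untouched
    for output_name, function, input_names in comb_logic:
        values[output_name] = function(*(values[name] for name in input_names))
    return values

# Usage: two registers feed an AND gate, whose output feeds an inverter.
registers = {"a": 1, "b": 0}
logic = [
    ("n1", lambda a, b: a & b, ("a", "b")),
    ("n2", lambda n1: 1 - n1, ("n1",)),
]
full_model = regenerate_combinational(registers, logic)
```

Because the loop runs only when called, the cost of software intervention is paid only when the user asks to inspect values, which matches the behaviour described above.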
In addition, the user can stop the hardware acceleration mode at any time, as indicated by step 115. If the user wants to stop, the algorithm proceeds to steps 115 and 160 to branch to the stop/value-inspect routine. Here, as at step 115, the user can stop the hardware-accelerated simulation process at any time and inspect the resulting values, or the user can continue the hardware-accelerated simulation process. The stop/value-inspect routine branches to steps 160, 165, 170, and 175, which were described above. Returning to the main routine after step 125, the user can decide at step 135 whether to continue the hardware-accelerated simulation or to perform pure simulation. If the user wants to simulate further, the algorithm proceeds to step 105. If not, the algorithm proceeds to the post-simulation analysis at step 140.
At step 140, the Analog Simulation System provides a number of post-simulation analysis features. The system logs all inputs to the hardware model. For hardware model outputs, the system logs all values of the hardware register components at a user-defined logging frequency (e.g., 1/10,000 record/cycle). The logging frequency determines how often the output values are recorded. For a logging frequency of 1/10,000 record/cycle, the output values are recorded once every 10,000 cycles. The higher the logging frequency, the more information is recorded for later post-simulation analysis. Because the selected logging frequency has a causal relationship with the speed of the Analog Simulation System, the user should select the logging frequency with care. A higher logging frequency lowers that speed, because the system must spend time and resources performing I/O operations to memory to record the output data before further simulation can proceed.
For post-simulation analysis, the user selects a particular point at which simulation is desired. After the Analog Simulation System run, the user can then analyze the results by running the software simulation with the logged inputs to the hardware model, in order to compute the value changes and internal states of all hardware components. Note that the hardware accelerator is used to simulate the data from the selected logging point in order to analyze the simulation results. This post-simulation analysis method can be linked to any simulation waveform viewer. A more detailed discussion follows later.
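The trade-off between logging frequency and replay effort can be illustrated with a short Python sketch. It assumes a hypothetical single-step function step(state, input) and immutable state values; the names run_with_logging and replay_to are illustrative and do not appear in the specification. Every input is logged, register state is logged once every log_every cycles (log_every = 10,000 corresponds to 1/10,000 record/cycle), and a selected cycle is reconstructed from the nearest earlier record.

```python
def run_with_logging(step, state, inputs, log_every):
    """Run the model over all inputs, logging every input and a state
    snapshot once every `log_every` cycles (cycle 0 is always logged)."""
    input_log, snapshots = [], {0: state}
    for cycle, value in enumerate(inputs, start=1):
        input_log.append(value)
        state = step(state, value)
        if cycle % log_every == 0:
            snapshots[cycle] = state
    return input_log, snapshots

def replay_to(step, input_log, snapshots, target_cycle):
    """Reconstruct the state at `target_cycle` by starting from the nearest
    earlier snapshot and replaying the logged inputs from that point."""
    start = max(cycle for cycle in snapshots if cycle <= target_cycle)
    state = snapshots[start]
    for value in input_log[start:target_cycle]:
        state = step(state, value)
    return start, state

# Usage: an accumulator stands in for the design; state is logged every 4 cycles.
accumulate = lambda state, value: state + value
log, snaps = run_with_logging(accumulate, 0, list(range(1, 11)), log_every=4)
start_cycle, state_at_6 = replay_to(accumulate, log, snaps, target_cycle=6)
```

A smaller log_every shortens the replay needed to reach a selected point but costs more I/O during the accelerated run, which is the causal relationship noted above.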
At step 145, the user can choose to emulate the simulated circuit design within its target system environment. If the result of step 145 is "no", the algorithm ends, and the Analog Simulation System process ends at step 155. If emulation with the target system is desired, the algorithm proceeds to step 150. This step involves activating the emulation interface board, plugging the cable and chip pin adapter into the target system, and running the target system to obtain the system I/O from the target system. The system I/O from the target system comprises the signals exchanged between the target system and the emulation of the circuit design. The emulated circuit design receives input signals from the target system, processes them, sends them to the Analog Simulation System for further processing, and outputs the processed signals to the target system. Conversely, the emulated circuit design sends output signals to the target system, which processes them and may output processed signals back to the emulated circuit design. In this manner, the performance of the circuit design can be evaluated in its natural target system environment. After emulation with the target system, the user has results that either validate the circuit design or reveal its non-functional aspects. At this point, as shown at step 135, the user can simulate/emulate again, stop altogether to modify the circuit design, or proceed to integrated circuit fabrication based on the validated circuit design.
III. Simulation/Hardware Acceleration Mode
Fig. 3 shows a high-level diagram of the software compilation and hardware configuration during compile time and run time according to one embodiment of the present invention. Fig. 3 shows two sets of information: one set distinguishes the operations performed during compile time from those performed during simulation/emulation run time; the other shows the partitioning between the software model and the hardware model. At the outset, the Analog Simulation System according to one embodiment of the present invention requires the user's circuit design as input data 200. The user's circuit design is in some form of HDL file (e.g., Verilog, VHDL). The Analog Simulation System parses the HDL file so that behavior-level code, register transfer level code, and gate-level code can be reduced to a form usable by the Analog Simulation System. The system generates a source design database in front-end processing step 205. The processed HDL file is now usable by the Analog Simulation System. As is known to those of ordinary skill in the art, the parsing process converts ASCII data into an internal binary data structure. See ALFRED V. AHO, RAVI SETHI, and JEFFREY D. ULLMAN, "Compilers: Principles, Techniques, and Tools" (1988), which is incorporated herein by reference.
Compile time is represented by process 225, and run time is represented by process/element 230. As shown in process 225, during compile time the Analog Simulation System compiles the processed HDL file by performing component type analysis. The component type analysis classifies the HDL components into combinational components, register components, clock components, memory components, and test-bench components. Essentially, the system partitions the user's circuit design into control and evaluation components.
The Analog Simulation System compiler 210 essentially maps the control components of the simulation into software and the evaluation components into software and hardware. Compiler 210 generates a software model for all HDL components. The software model is expressed in code 215. In addition, compiler 210 uses the component type information of the HDL file, selects or generates hardware logic blocks/elements from a library or module generator, and generates a hardware model for certain HDL components. The end result is a so-called "bitstream" configuration file 220.
In preparation for run time, the software model in code form is stored in main memory, where the application program associated with the Analog Simulation System program according to one embodiment of the present invention is also stored. This code is processed in general-purpose processor or workstation 240. Substantially concurrently, configuration file 220 for the hardware model is used to map the user's circuit design into reconfigurable hardware board 250. Here, those portions of the circuit design that have been modeled in hardware are mapped and partitioned into the FPGA chips on reconfigurable hardware board 250.
As described above, user test-bench stimulus and test vector data, as well as other test-bench resources 235, are applied to general-purpose processor or workstation 240 for simulation purposes. In addition, the user can perform emulation of the circuit design under software control. Reconfigurable hardware board 250 contains the user's emulated circuit design. The Analog Simulation System lets the user switch selectively between software simulation and hardware emulation, and stop the simulation or emulation process at any time, cycle by cycle, to inspect the value of every component (register or combinational) in the model. Thus, the Analog Simulation System passes data between test bench 235 and processor/workstation 240 for simulation, and passes data between test bench 235 and reconfigurable hardware board 250, by way of data bus 245 and processor/workstation 240, for emulation. If a user target system 260 is involved, emulation data can pass between reconfigurable hardware board 250 and target system 260 through emulation interface 255 and data bus 245. The kernel resides in the software simulation model in the memory of processor/workstation 240, so data must pass between processor/workstation 240 and reconfigurable hardware board 250 over data bus 245.
Fig. 4 shows a flow chart of the compilation process according to one embodiment of the present invention. The compilation process is represented in Fig. 3 by processes 205 and 210. The compilation process of Fig. 4 starts at step 300. Step 301 processes the front-end information. Here, gate-level HDL code is generated. The user generates the gate-level HDL representation either by writing the code directly by hand or by using some form of schematic or synthesis tool to convert the initial circuit design into HDL form. The Analog Simulation System parses the HDL file (in ASCII format) into binary format so that behavior-level code, register transfer level (RTL) code, and gate-level code can be reduced to an internal data structure form usable by the Analog Simulation System. The system generates a source design database containing the parsed HDL code.
Step 302 performs component type analysis by classifying the HDL components into combinational components, register components, clock components, memory components, and test-bench components, as shown in component type resource 303. The Analog Simulation System generates hardware models for register and combinational components, subject to some exceptions discussed below. Test-bench and memory components are mapped into software. Some clock components (e.g., derived clocks) are modeled in hardware, while others reside at the software/hardware boundary (e.g., software clocks).
Combinational components are stateless logic components whose output values are a function of the current input values and do not depend on the history of the input values. Examples of combinational components include primitive gates (e.g., AND, OR, XOR, NOT), selectors, adders, multipliers, shifters, and bus drivers.
Register components are simple storage components. The state transition of a register is controlled by a clock signal. One form of register is edge-triggered, which may change state when an edge is detected. Another form of register is the latch, which is level-triggered. Examples include flip-flops (D-type, JK-type) and level-sensitive latches.
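The difference between the two register forms can be made concrete with a small behavioural sketch in Python. The class names DFlipFlop and DLatch and the tick method are hypothetical and serve only to illustrate edge-triggered versus level-triggered state updates.

```python
class DFlipFlop:
    """Edge-triggered register: captures d only on a rising clock edge."""
    def __init__(self):
        self.q = 0
        self._previous_clk = 0

    def tick(self, clk, d):
        if self._previous_clk == 0 and clk == 1:  # rising edge detected
            self.q = d
        self._previous_clk = clk
        return self.q

class DLatch:
    """Level-sensitive latch: follows d for as long as enable is high."""
    def __init__(self):
        self.q = 0

    def tick(self, enable, d):
        if enable == 1:  # transparent while the level is high
            self.q = d
        return self.q

# Usage: the same (clock, data) sequence is applied to both components.
sequence = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
flip_flop, latch = DFlipFlop(), DLatch()
flip_flop_outputs = [flip_flop.tick(clk, d) for clk, d in sequence]
latch_outputs = [latch.tick(clk, d) for clk, d in sequence]
```

The two output sequences diverge at the third sample, where the data input changes while the clock is still high: the latch follows the change, whereas the flip-flop holds the value captured at the edge.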
Clock components are components that deliver periodic signals to logic devices to control their activity. Generally, clock signals control the update of registers. Primary clocks are generated from self-timed test-bench processes. For example, a typical test-bench process for clock generation in Verilog is as follows:
always begin
    Clock = 0;
    #5;
    Clock = 1;
    #5;
end
According to this code, the clock signal starts at logic "0". After 5 time units, the clock signal changes to logic "1". After another 5 time units, the clock signal returns to logic "0". Primary clock signals are generally generated in software, and only a few (i.e., 1-10) primary clocks are present in a typical user circuit design. Derived or gated clocks are generated from a network of combinational logic and registers that is in turn driven by the primary clocks. Many (i.e., 1,000 or more) derived clocks are present in a typical user circuit design.
Memory components are block storage components with address and control lines for accessing individual data in specific memory locations. Examples include ROM (read-only memory), asynchronous RAM (random access memory), and synchronous RAM.
Test-bench components are software processes used to control and monitor the simulation process. Accordingly, these components are not part of the hardware circuit design under test. Test-bench components control the simulation by generating clock signals, initializing simulation data, and reading simulation test vector patterns from disk/memory. Test-bench components also monitor the simulation by checking for changes in value, performing value-change dumps, checking asserted constraints on signal value relations, writing output test vectors to disk/memory, and interfacing with various waveform viewers and debuggers.
The Analog Simulation System performs component type analysis as follows. The system examines the binary source design database. Based on the source design database, the system can characterize or classify each element as one of the component types above. Continuous assignment statements are classified as combinational components. By language definition, gate primitives are either of combinational type or of the latch form of register type. Initialization code is treated as a test bench of initialization type.
An always process that drives nets without using the nets is a test bench of driver type. An always process that reads nets without driving the nets is a test bench of monitor type. An always process with delay controls or multiple event controls is a test bench of general type.
An always process with a single event control that drives a single net can be one of the following: (1) If the event control is an edge-triggered event, the process is an edge-triggered register component. (2) If the net driven in the process is not defined in all possible execution paths, the net is a latch type of register. (3) If the net driven in the process is defined in all possible execution paths, the net is a combinational component.
An always process with a single event control that drives multiple nets can be decomposed into several processes, each driving one net separately, so that their respective component types can be derived. The decomposed processes can then be used to determine the component types.
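The classification rules above can be summarized, purely as an illustrative sketch, by the following Python function. The dictionary-based description of an HDL construct (keys such as kind, drives, reads, has_delay, event_controls, edge_triggered, and assigned_on_all_paths) is a hypothetical stand-in for the source design database, the order in which the rules are tested is an assumption, and gate primitives are omitted for brevity.

```python
def classify(construct):
    """Return the component type of one HDL construct, following the rules above."""
    kind = construct["kind"]
    if kind == "continuous_assign":
        return "combinational"
    if kind == "initial":
        return "test bench (initialization)"
    if kind != "always":
        raise ValueError(f"unsupported construct: {kind}")

    drives = construct.get("drives", [])  # nets assigned by the process
    reads = construct.get("reads", [])    # nets used by the process
    if drives and not reads:
        return "test bench (driver)"
    if reads and not drives:
        return "test bench (monitor)"
    if construct.get("has_delay", False) or construct.get("event_controls", 1) != 1:
        return "test bench (general)"
    if len(drives) != 1:
        # Processes driving several nets are first decomposed, one net each.
        raise ValueError("decompose into one process per driven net first")

    if construct.get("edge_triggered", False):
        return "register (edge-triggered)"
    if not construct.get("assigned_on_all_paths", True):
        return "register (latch)"
    return "combinational"

# Usage: the clock generator shown earlier drives Clock, reads no net, and uses delays.
clock_generator = {"kind": "always", "drives": ["Clock"], "reads": [], "has_delay": True}
clock_type = classify(clock_generator)
```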
Step 304 generates a software model for all HDL components, regardless of component type. With the appropriate user interface, the user can simulate the entire circuit design using the complete software model. Test-bench processes are used to drive the stimulus inputs and test vector patterns, control the overall simulation, and monitor the simulation process.
Step 305 performs clock analysis. Clock analysis comprises two general steps: (1) clock extraction and sequential mapping, and (2) clock network analysis. The clock extraction and sequential mapping step maps the user's register components into the hardware register model of the Analog Simulation System and then extracts the clock signals from the system's hardware register components. The clock network analysis step determines the primary clocks and derived clocks based on the extracted clock signals, and separates the gated clock network from the gated data network. A more detailed description is given with reference to Fig. 16.
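As a simplified illustration of separating primary clocks from derived clocks, the following Python sketch treats a clock net as primary when it is driven by a test-bench process and as derived otherwise. This criterion follows the earlier description of primary and derived clocks, but the function name analyze_clocks and its inputs are hypothetical, and the sketch does not attempt the separation of gated clock and gated data networks.

```python
def analyze_clocks(register_clock, testbench_driven):
    """Split extracted clock nets into primary and derived clocks.

    register_clock: dict of register name -> net connected to its clock input
                    (the result of clock extraction).
    testbench_driven: set of nets driven directly by test-bench processes.
    """
    extracted = set(register_clock.values())
    primary = sorted(net for net in extracted if net in testbench_driven)
    derived = sorted(extracted - set(primary))
    return primary, derived

# Usage: one test-bench clock and two clocks produced by logic in the design.
clock_pins = {"r1": "clk", "r2": "clk", "r3": "clk_div2", "r4": "gated_clk"}
primary_clocks, derived_clocks = analyze_clocks(clock_pins, {"clk", "reset"})
```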
Step 306 performs residence selection. In conjunction with the user, the system selects the components for the hardware model; that is, of all the hardware components that could be implemented in the hardware model of the user's circuit design, some cannot be modeled in hardware for a variety of reasons. These reasons include component type, hardware resource limitations (i.e., floating-point operations and large multiply operations stay in software), simulation and communication overhead (i.e., small bridge logic between test-bench processes stays in software, as do signals monitored by test-bench processes), and user preference. For a number of reasons, including performance and simulation monitoring, the user can force specific components that would otherwise be modeled in hardware to remain in software.
Step 307 maps the selected hardware models onto the reconfigurable hardware emulation board. Specifically, step 307 takes the netlist and maps the circuit design into specific FPGA chips. This step involves grouping or clustering logic elements together. The system then assigns each group to a unique FPGA chip, or assigns several groups to a single FPGA chip. The system may also split groups and assign them to different FPGA chips. In general, the system assigns the groups to FPGA chips. A more detailed discussion is given below with reference to Fig. 6. The system places the hardware model components into the mesh of FPGA chips so as to minimize inter-chip communication overhead. In one embodiment, the array comprises a 4x4 array of FPGAs, a PCI interface unit, and a software clock control unit. The FPGA array implements a portion of the user's hardware circuit design, as determined in steps 302-306 of this software compilation process. The PCI interface unit allows the reconfigurable hardware emulation model to communicate with the workstation over the PCI bus. The software clock avoids race conditions among the various clock signals entering the FPGA array. Furthermore, step 307 routes the FPGA chips according to the communication schedule among the hardware models.
Step 308 inserts the control circuits. These control circuits include the I/O address pointers and data bus logic used to communicate with the DMA engine to the simulator (discussed below with reference to Figs. 11, 12, and 14), and the evaluation control logic that controls hardware state transitions and wire multiplexing (discussed below with reference to Figs. 19 and 20). As is known to those of ordinary skill in the art, a direct memory access (DMA) unit provides an additional data channel between peripherals and main memory, through which a peripheral can access (i.e., read or write) main memory directly without CPU intervention. The address pointer in each FPGA chip allows data to move between the software model and the hardware model within the limits of the bus size. The evaluation control logic is essentially a finite state machine which ensures that the clock-enable inputs to registers are asserted before the clock and data inputs enter those registers.
Step 309 generates the configuration files for mapping the hardware model onto the FPGA chips. In essence, step 309 assigns circuit design components to specific cells or gate-level components on each chip. Whereas step 307 determines the mapping of hardware model groups to specific FPGA chips, step 309 takes this mapping result and generates a configuration file for each FPGA chip.
Step 310 generates the software kernel code. The kernel is a sequence of software code that controls the entire Analog Simulation System. The kernel cannot be generated until this point, because portions of its code need to update and evaluate hardware components. Only after step 309 has the proper mapping to the hardware model and FPGA chips taken place. A more detailed discussion is given below with reference to Fig. 5. Compilation ends at step 311.
As described above with reference to Fig. 4, the software kernel code is generated at step 310 after the software and hardware models have been determined. The kernel is the piece of software in the Analog Simulation System that controls the operation of the overall system. The kernel controls the execution of the software simulation as well as the hardware emulation. Because the kernel also resides at the center of the hardware model, the simulator is integrated with the emulator. In contrast to other known co-simulation systems, the Analog Simulation System according to one embodiment of the present invention does not require the simulator to interact with the emulator from the outside. One embodiment of the kernel is the control loop shown in Fig. 5.
Referring to Fig. 5, the kernel starts at step 330. Step 331 evaluates the initialization code. Beginning at step 332 and bounded by decision step 339, the control loop cycles repeatedly until the system observes no active test-bench processes, at which point the simulation or emulation session has completed. Step 332 evaluates the active test-bench components for the simulation or emulation.
Step 333 evaluates the clock components. These clock components come from the test-bench processes. Usually, the user dictates what type of clock signal will be generated for the simulation system. In one example (discussed above with respect to the component type analysis and reproduced here), the clock component designed by the user in the test-bench process is as follows:
always begin
    Clock = 0;
    #5;
    Clock = 1;
    #5;
end
In this clock component example, the user has decided that a logic "0" signal will be generated first; after 5 simulation time units, a logic "1" signal will be generated. This clock generation process cycles continually until stopped by the user. These simulation times are advanced by the kernel.
Decision step 334 inquires whether any active clock edge is detected, which would result in some kind of logic evaluation in the software and possibly in the hardware model (if emulation is running). The clock signal that the kernel uses to detect an active clock edge is the clock signal from the test-bench process. If decision step 334 evaluates to "no", the kernel proceeds to step 337. If decision step 334 evaluates to "yes", step 335 updates the registers and memories, and step 336 propagates the combinational components. Step 336 essentially takes care of the combinational logic, which needs some time to propagate values through the combinational logic network after a clock signal is asserted. Once the values have propagated through the combinational components and stabilized, the kernel proceeds to step 337.
Note that registers and combinational components are also modeled in hardware, and thus the kernel controls the emulator portion of the Analog Simulation System. In fact, the kernel can accelerate the evaluation of the hardware model in steps 334 and 335 whenever any active clock edge is detected. Hence, unlike the prior art, the Analog Simulation System according to one embodiment of the present invention can accelerate the hardware emulator through the software kernel and based on component type (e.g., register, combinational). Furthermore, the kernel controls the execution of the software and hardware models cycle by cycle. In essence, the emulator hardware model can be characterized as a simulation coprocessor to the general-purpose processor that runs the simulation kernel. The coprocessor speeds up the simulation task.
Step 337 evaluates the active test-bench components. Step 338 advances the simulation time. Step 339 provides the boundary for the control loop that begins at step 332. Step 339 determines whether any test-bench processes are active. If so, the simulation and/or emulation is still running and more data should be evaluated. The kernel therefore loops back to step 332 to evaluate any active test-bench components. If no test-bench processes are active, the simulation and emulation processes have completed. Step 340 ends the simulation and emulation process. In sum, the kernel is the main control loop that controls the operation of the overall Analog Simulation System. As long as any test-bench processes are active, the kernel evaluates the active test-bench components, evaluates the clock components, detects clock edges to update registers and memories and propagate combinational logic data, and advances the simulation time.
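The control loop of Fig. 5 can be sketched in Python as shown below. This is a minimal illustration under stated assumptions rather than the kernel code of the embodiment: run_kernel and the CounterSim class are hypothetical, the loop condition is tested at the top of a while loop rather than at the bottom, and the toy design is a counter clocked by the 10-time-unit clock of the test-bench example above.

```python
def run_kernel(sim):
    """Main control loop; comments give the corresponding step numbers of Fig. 5."""
    sim.initialize()                        # step 331: evaluate initialization code
    while sim.testbench_active():           # step 339: loop while a test bench is active
        sim.evaluate_testbench()            # step 332: evaluate active test-bench components
        if sim.evaluate_clocks():           # steps 333-334: evaluate clocks, detect active edge
            sim.update_registers()          # step 335: update registers and memories
            sim.propagate_combinational()   # step 336: propagate combinational values
        sim.evaluate_testbench()            # step 337: evaluate active test-bench components
        sim.advance_time()                  # step 338: advance simulation time
    return sim                              # step 340: simulation/emulation ends

class CounterSim:
    """Hypothetical toy model: a counter register incremented on each rising clock edge."""
    def __init__(self, end_time):
        self.end_time = end_time

    def initialize(self):
        self.time, self.clock = 0, 0
        self.count, self.next_count = 0, 1
        self.observed = 0

    def testbench_active(self):
        return self.time < self.end_time

    def evaluate_testbench(self):
        self.observed = self.count          # monitor-style test bench

    def evaluate_clocks(self):
        previous, self.clock = self.clock, (self.time // 5) % 2
        return previous == 0 and self.clock == 1   # rising edge is the active edge

    def update_registers(self):
        self.count = self.next_count

    def propagate_combinational(self):
        self.next_count = self.count + 1    # the incrementer is the combinational logic

    def advance_time(self):
        self.time += 5

# Usage: run for 40 time units; rising edges occur at times 5, 15, 25, and 35.
finished = run_kernel(CounterSim(end_time=40))
```

In an accelerated run, the register update and combinational propagation calls are the points at which the kernel would hand evaluation to the hardware model instead of executing it in software.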
Fig. 6 shows one embodiment of a method for automatically mapping a hardware model onto a reconfigurable board. A netlist file provides the input to the hardware implementation process. The netlist describes the logic functions and their interconnections. The hardware model-to-FPGA implementation process comprises three independent tasks: mapping, placement, and routing. The tools that perform them are commonly called "place-and-route" tools. The design tools used may be Viewlogic Viewdraw (a schematic capture system) with Xilinx Xact place-and-route software, or Altera's MAX+PLUS II system.
The mapping task partitions the circuit design into logic blocks, I/O blocks, and other FPGA resources. Although some logic functions, such as flip-flops and buffers, map directly onto corresponding FPGA resources, others, such as combinational logic, must be implemented in logic blocks using a mapping algorithm. The user can usually choose whether mapping optimizes for density or for performance.
The placement task takes the logic and input/output (I/O) blocks from the mapping task and assigns them to physical locations within the FPGA array. Current FPGA tools generally use some combination of three techniques: min-cut (mincut), simulated annealing, and general force-directed relaxation (GFDR). These techniques determine the optimal placement based mainly on various cost functions, which depend on, among other variables, the total net length of the interconnections or the delay along a set of critical signal paths. The Xilinx XC4000 series FPGA tools use a variation of the min-cut technique for initial placement and then apply the GFDR technique to fine-tune it.
The routing task determines the routing paths that interconnect the various mapped and placed blocks. One such router, called a maze router, finds the shortest path between two points. Because the routing task provides the direct interconnection between chips, the placement of the circuits relative to the chips is critical.
Initially, the hardware model may be described as a gate netlist 350 or in RTL 357. RTL-level code can be further synthesized into a gate-level netlist. During mapping, a synthesizer server 360, such as Altera's MAX+PLUS II programmable logic development tool system and software, can be used to produce output files for mapping purposes. The synthesizer server 360 can match the user's circuit design components with any standard existing logic elements in a library 361 (e.g., a standard adder or standard multiplier), generate any parameterized and frequently used logic modules 362 (e.g., a non-standard multiplexer or non-standard adder), and synthesize random logic elements 363 (e.g., look-up-table-based logic that implements a custom logic function). The synthesizer server also removes redundant logic and unused logic. The output files essentially synthesize or optimize the logic of the user's circuit design.
When some or all of the HDL is at the RTL level, the circuit design components are at a high enough level that the Analog Simulation System can easily model them using its own registers or components. When some or all of the HDL is at the gate netlist level, the circuit design components may be more specific to the circuit design, which makes mapping the user's circuit design components onto Analog Simulation System components more difficult. Accordingly, the synthesizer server is able to generate any logic element that is a variant of a standard logic element, as well as random logic elements that have no counterpart among those variants or among the standard logic elements in the library.
If the circuit design is in gate netlist form, the Analog Simulation System first performs a grouping or clustering operation 351. The hardware model construction is based on the clustering process because the combinational logic and registers are separated from the clock. Logic elements that share a common primary clock or gated clock signal are therefore better served by grouping them together and placing them together on a chip. The clustering algorithm is based on connectivity-driven, hierarchical extraction, and regular-structure extraction. If the description is in structured RTL 358, the Analog Simulation System can decompose the function into smaller units, as represented by the logic function decomposition operation 359. At any stage, if logic synthesis or logic optimization is needed, the synthesizer server 360 is available to transform the circuit design into a more efficient representation according to the user's instructions. For the clustering operation 351, the link to the synthesizer server is represented by dotted arrow 364. For structured RTL 358, the link to the synthesizer server 360 is represented by arrow 365. For the logic function decomposition operation 359, the link to the synthesizer server 360 is represented by arrow 366.
The clustering operation 351 groups logic components together in a selective manner based on function and size. Clustering may involve a single cluster for a small circuit design or several clusters for a large circuit design. In either case, these clusters of logic elements are mapped into designated FPGA chips in later steps; that is, one cluster is targeted for a particular chip, and another cluster is targeted for a different chip or possibly the same chip as the first cluster. Usually, the logic elements in a cluster stay together with that cluster in one chip, but for optimization purposes a cluster may have to be split across more than one chip.
After the clusters are formed in the clustering operation 351, the system performs place-and-route operations. First, a coarse-grain placement operation 352 places the clusters into the FPGA chips. The coarse-grain placement operation 352 initially places clusters of logic elements into selected FPGA chips. If necessary, the system makes the synthesizer server 360 available to the coarse-grain placement operation 352, as represented by arrow 367. A fine-grain placement operation follows the coarse-grain placement operation to fine-tune the initial placement. The Analog Simulation System uses a cost function based on pin usage requirements, gate usage requirements, and gate-to-gate hops to determine the optimal placement for both the coarse-grain and fine-grain placement operations.
How clusters are placed in particular chips is determined by the placement cost, which is calculated by a cost function f(P, G, D) for two or more circuits (i.e., CKTQ = CKT1, CKT2, ..., CKTN) and their respective locations in the FPGA chip array, where P generally denotes pin usage/utilization, G generally denotes gate usage/utilization, and D is the distance, or number of gate-to-gate "hops", as defined by the connection matrix M (shown in Figs. 7 and 8). The user's circuit design that is modeled in the hardware model comprises the total combination of circuits CKTQ. Each cost function is defined such that the computed placement cost tends to favor: (1) a minimum number of "hops" between any two circuits CKTN-1 and CKTN in the FPGA array, and (2) a placement of circuits CKTN-1 and CKTN in the FPGA array that yields minimum pin usage.
In one embodiment, the cost function f(P, G, D) is defined as:
$$f(P,G,D) = \left[ C_0 \cdot \max_{\text{each FPGA chip}} \left( \frac{P_{used}}{P_{available}} \right) \right] + \left[ C_1 \cdot \max_{\text{each FPGA chip}} \left( \frac{G_{used}}{G_{available}} \right) \right] + \left[ C_2 \cdot \sum_{(i,j) \in CKT} DIST(FPGA_i, FPGA_j) \right]$$
This equation can be reduced to the following form:
f(P,G,D)=C0*P+C1*G+C2*D
The first term (i.e., C0*P) generates a first placement cost value based on the number of pins used and the number of pins available. The second term (i.e., C1*G) generates a second placement cost value based on the number of gates used and the number of gates available. The third term (i.e., C2*D) generates a placement cost value based on the number of hops between the various interconnected gates in circuit CKTQ (i.e., CKT1, CKT2, ..., CKTN). The total placement cost value is produced by iteratively summing these three placement cost values. The constants C0, C1, and C2 are weighting constants that selectively skew the total placement cost value produced by this cost function toward the factor or factors (i.e., pin usage, gate usage, or gate-to-gate hops) that are most important in any given iteration of the placement cost calculation.
The placement cost is calculated repeatedly as the system selects different relative values for the weighting constants C0, C1, and C2. Thus, in one embodiment, during the coarse-grain placement operation, the system selects large values for C0 and C1 relative to C2. In this iteration, the system determines that, for the initial placement of circuit CKTQ in the FPGA chip array, optimizing pin usage/utilization and gate usage/utilization is more important than optimizing gate-to-gate hops. In a subsequent iteration, the system selects small values for C0 and C1 relative to C2. In that iteration, the system determines that optimizing gate-to-gate hops is more important than optimizing pin usage/utilization and gate usage/utilization.
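The effect of the weighting constants can be illustrated with a short Python sketch of the reduced form f(P,G,D)=C0*P+C1*G+C2*D. The function name `placement_cost`, the two candidate placements (layouts), and the specific weight values are hypothetical and chosen only for illustration; the patent does not give numeric weights.

```python
def placement_cost(chips, hops, c0, c1, c2):
    """Evaluate f(P, G, D) = C0*P + C1*G + C2*D for one candidate placement.

    chips: list of (pins_used, pins_available, gates_used, gates_available), one per chip
    hops:  hop counts for the gate pairs that need inter-chip interconnection
    """
    p = max(pu / pa for pu, pa, _, _ in chips)   # worst pin ratio over all chips
    g = max(gu / ga for _, _, gu, ga in chips)   # worst gate ratio over all chips
    d = sum(hops)                                # total gate-to-gate hops
    return c0 * p + c1 * g + c2 * d


# Hypothetical candidate A: lightly loaded chips, but one connection needs two hops.
chips_a = [(132, 264, 50_000, 100_000), (66, 264, 20_000, 100_000)]
hops_a = [1, 2]

# Hypothetical candidate B: more heavily loaded chips, but only single-hop connections.
chips_b = [(198, 264, 70_000, 100_000), (132, 264, 40_000, 100_000)]
hops_b = [1, 1]

# Early iteration: C0 and C1 large relative to C2 (pin and gate usage dominate).
early_a = placement_cost(chips_a, hops_a, c0=10, c1=10, c2=1)
early_b = placement_cost(chips_b, hops_b, c0=10, c1=10, c2=1)

# Later iteration: C0 and C1 small relative to C2 (hop count dominates).
late_a = placement_cost(chips_a, hops_a, c0=1, c1=1, c2=10)
late_b = placement_cost(chips_b, hops_b, c0=1, c1=1, c2=10)
```

Under these made-up numbers, candidate A has the lower cost in the early iteration and candidate B has the lower cost in the later one, which is the kind of shift in priority the weighting constants are meant to produce.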
During the fine-grain placement operation, the system uses the same cost function. In one embodiment, the iterative steps for selecting C0, C1, and C2 are the same as those of the coarse-grain placement operation. In another embodiment, the fine-grain placement operation has the system select small values for C0 and C1 relative to C2.
These variables and equations will now be explained. To decide whether to place a particular circuit CKTQ in FPGA chip x or FPGA chip y (among other FPGA chips), the cost function examines pin usage/utilization (P), gate usage/utilization (G), and gate-to-gate hops (D). Based on the cost function variables P, G, and D, the cost function f(P, G, D) generates a placement cost value for placing circuit CKTQ at a particular location in the FPGA array.
Pin usage/utilization P also represents I/O capacity. P_used is the number of pins used by circuit CKTQ in each FPGA chip. P_available is the number of pins available in an FPGA chip. In one embodiment, P_available is 264 (44 pins × 6 interconnections/chip), and in another embodiment, P_available is 265 (44 pins × 6 interconnections/chip + 1 extra pin). However, the specific number of available pins depends on the type of FPGA chip used, the total number of interconnections used per chip, and the number of pins used for each interconnection. Thus, P_available can vary considerably. To evaluate the first term of the cost function f(P, G, D) (i.e., C0*P), the ratio P_used/P_available is calculated for each FPGA chip. Thus, for a 4×4 array of FPGA chips, sixteen P_used/P_available ratios are calculated. For a given number of available pins, the more pins are used, the higher the ratio. Of the sixteen calculated ratios, the one with the highest value is selected. The first placement cost value is calculated from the first term C0*P by multiplying the selected maximum ratio P_used/P_available by the weighting constant C0. Because this first term depends on the calculated ratio P_used/P_available and on the particular maximum among the ratios calculated for each FPGA chip, the placement cost value is higher for higher pin usage, all other factors being equal. The system selects the placement that yields the lowest placement cost. The particular placement whose maximum ratio P_used/P_available is the lowest among all the maxima calculated for the various placements is generally considered the optimal placement in the FPGA array, all other factors being equal.
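The "lowest of the maxima" selection for the pin term can be made concrete with a small Python sketch. The value 264 follows the embodiment described above; the helper `pin_term` and the two sixteen-chip pin distributions are hypothetical.

```python
P_AVAILABLE = 44 * 6   # 264 pins per chip, as in the embodiment described above


def pin_term(pins_used_per_chip, p_available=P_AVAILABLE):
    """Return the maximum P_used / P_available ratio over all chips in the array."""
    return max(used / p_available for used in pins_used_per_chip)


# Two hypothetical placements on a 4x4 array (16 chips each).
placement_a = [66] * 15 + [264]    # one chip uses every available pin
placement_b = [132] * 16           # pin usage spread evenly across the array

candidates = {"A": placement_a, "B": placement_b}

# All other factors being equal, prefer the placement whose maximum ratio is lowest.
best = min(candidates, key=lambda name: pin_term(candidates[name]))
```

Here placement A has a maximum ratio of 1.0 because of its single saturated chip, placement B has a maximum ratio of 0.5, and so `best` is `"B"` even though both placements use similar totals.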
Gate usage/utilization G is based on the number of gates allowed per FPGA chip. In one embodiment, based on the location of circuit CKTQ in the array, if the number of gates used in each chip, G_used, is above a certain threshold, this second placement cost (C1*G) is assigned a value indicating that the placement is not feasible. Analogously, if the number of gates used in each chip containing circuit CKTQ is at or below the threshold, the second term (C1*G) is assigned a value indicating that the placement is feasible. Thus, if the system initially wants to place circuit CKT1 in a particular chip and that chip does not have enough gates to accommodate circuit CKT1, the system concludes from the cost function that this particular placement is not feasible. In general, a very high value for G (e.g., infinity) ensures that the cost function produces a high placement cost value, indicating that the desired placement of circuit CKTQ is not feasible and that an alternative placement should be determined.
In another embodiment, based on the location of circuit CKTQ in the array, the ratio G_used/G_available is calculated for each chip, where G_used is the number of gates used by circuit CKTQ in each FPGA chip and G_available is the number of gates available in an FPGA chip. In one embodiment, the system uses FLEX 10K100 chips for the FPGA array. The FLEX 10K100 chip contains approximately 100,000 gates. Thus, in this embodiment, G_available equals 100,000 gates. For a 4×4 array of FPGA chips, sixteen G_used/G_available ratios are calculated. For a given number of available gates, the more gates are used, the higher the ratio. Of the sixteen calculated ratios, the one with the highest value is selected. The second placement cost value is calculated from the second term C1*G by multiplying the selected maximum ratio G_used/G_available by the weighting constant C1. Because this second term depends on the calculated ratio G_used/G_available and on the particular maximum among the ratios calculated for each FPGA chip, the placement cost value is higher for higher gate usage, all other factors being equal. The system selects the placement that yields the lowest placement cost. The particular placement whose maximum ratio G_used/G_available is the lowest among all the maxima calculated for the various placements is generally considered the optimal placement in the FPGA array, all other factors being equal.
In another embodiment, the system initially selects some value for C1. If the ratio G_used/G_available is greater than "1", this particular placement is not feasible (i.e., at least one chip does not have enough gates for this particular placement of circuits). The system therefore changes C1 to a very large number (e.g., infinity), so the second term C1*G is also a very large number and the total placement cost value f(P, G, D) is very high. If, on the other hand, the ratio G_used/G_available is less than or equal to "1", this particular placement is feasible (i.e., each chip has enough gates to support the circuit implementation). The system then leaves C1 unchanged, and the second term C1*G resolves to a particular value.
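A minimal Python sketch of this feasibility rule follows. The function name `gate_term` is hypothetical, and returning infinity directly is a shortcut for "changing C1 to a very large number"; the 100,000-gate capacity matches the FLEX 10K100 figure quoted earlier.

```python
import math


def gate_term(gates_used_per_chip, g_available, c1):
    """Second cost term C1*G, with an infeasible placement mapped to infinity."""
    worst = max(used / g_available for used in gates_used_per_chip)
    if worst > 1:
        # At least one chip lacks enough gates: treat C1 as effectively infinite.
        return math.inf
    return c1 * worst   # feasible: C1 is left unchanged


infeasible = gate_term([50_000, 120_000], g_available=100_000, c1=2.0)
at_capacity = gate_term([50_000, 100_000], g_available=100_000, c1=2.0)
comfortable = gate_term([50_000, 25_000], g_available=100_000, c1=2.0)
```

Because any sum that includes infinity is itself infinite, an infeasible placement can never be selected as the lowest-cost candidate, regardless of how good its pin usage or hop count is.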
The third term C2*D represents the number of hops between all gates that require interconnection. The number of hops also depends on the interconnection matrix. The connection matrix provides the basis for determining the circuit path between any two gates that require chip-to-chip interconnection. Not every gate requires gate-to-gate interconnection. Depending on the user's original circuit design and the partitioning of the clusters into particular chips, some gates need no interconnection at all because the logic elements connected to their respective inputs and outputs are located in the same chip. Other gates, however, do need interconnection because the logic elements connected to their respective inputs and outputs are located in different chips.
To understand "hops", refer to the connection matrix shown in tabular form in Fig. 7 and in pictorial form in Fig. 8. In Fig. 8, each interconnection between chips, such as interconnection 602 between chip F11 and chip F14, represents 44 pins or 44 wire lines. In other embodiments, each interconnection represents more than 44 pins. In still other embodiments, each interconnection represents fewer than 44 pins.
With this interconnection scheme, data can pass from one chip to any other chip within two "hops" or "jumps". Thus, data can pass from chip F11 to chip F12 in one hop via interconnection 601, and data can pass from chip F11 to chip F33 in two hops via either interconnections 600 and 606 or interconnections 603 and 610. These example hops are the shortest hop paths between these sets of chips. In some cases, a signal may be routed through several chips such that the number of hops between a gate in one chip and a gate in another chip exceeds the shortest hop path. The only circuit paths that must be examined when determining the number of gate-to-gate hops are those that require interconnection.
Connectivity is represented by the sum of all hops between gates that require inter-chip interconnection. With the connection matrix of Figs. 7 and 8, the shortest path between any two chips is one or two "hops". However, for a particular hardware emulator, I/O capacity may limit the number of direct shortest-path connections between any two gates in the array, so those signals must be routed through longer paths (more than two hops) to reach their destinations. Accordingly, the number of hops may exceed two for some gate-to-gate connections. In general, all other things being equal, fewer hops result in a lower placement cost.
The third term (i.e., C2*D) is reproduced here in its detailed form:
$$f(P,G,D) = \ldots + \left[ C_2 \cdot \sum_{(i,j) \in CKT} DIST(FPGA_i, FPGA_j) \right]$$
This third term is the product of the weighting constant C2 and a summation component (Σ ...). The summation component is essentially the sum of all hops between each gate i and gate j in the user's circuit design that require chip-to-chip interconnection. As noted above, not all gates require inter-chip interconnection. For those gates i and j that do require inter-chip interconnection, the number of hops is determined. The hop counts for all such pairs of gate i and gate j are then added together.
The distance calculation can also be defined as follows:
$$DIST_{(i,j) \in CKT}(FPGA_i, FPGA_j) = \min_{k} \left( M^{k}_{i,j} = 1 \right)$$
Here, M is the connection matrix. One embodiment of the connection matrix is shown in Fig. 7. The distance is calculated for each gate-to-gate connection that requires interconnection. Thus, for each comparison of gate i with gate j, the connection matrix M is examined. More specifically,
$$M^{k}_{i,j} = \bigcup_{\forall k} \left( m_{i,l} \cap m_{l,j} \right)$$
A matrix is set up with all the chips in the array such that each chip has an identifiable number. These identification numbers are placed across the top of the matrix as column headers. Likewise, they are placed down the side of the matrix as row headers. The entry at the intersection of a row and a column provides the direct-connection data between the chip identified by the row and the chip identified by the column. For any distance calculation between chip i and chip j, the entry in matrix M_{i,j} contains either a "1" (direct connection) or a "0" (no direct connection). The index k refers to the number of hops needed to interconnect any gate in chip i with any gate in chip j that requires interconnection.
First, the connection matrix M_{i,j} for k=1 should be examined. If the entry is "1", a direct connection exists between the gate in chip i and the selected gate in chip j. The index or hop count k=1 is therefore designated as the result of M_{i,j}, and this result is the distance between the two gates. At this point, another gate-to-gate connection can be examined. If the entry is "0", however, no direct connection exists.
If no direct connection exists, the next k should be examined. This new k (i.e., k=2) can be computed by multiplying matrix M_{i,j} by itself; in other words, M^2 = M*M, where k=2.
This process of multiplying M by itself continues, for the particular row and column entry of chip i and chip j, until the calculated result is "1", at which point the index k is selected as the number of hops. The operation consists of ANDing matrices M and then ORing the results of the AND operations. If the result of the AND operation between matrix entries m_{i,l} and m_{l,j} is a logic "1", a connection exists between the selected gate in chip i and the selected gate in chip j through some chip l within hop count k; if not, no connection exists within this particular hop count k and further calculation is needed. The matrices m_{i,l} and m_{l,j} are the connection matrix M as defined for this hardware model. For any given gate i and gate j that require interconnection, the row of matrix m_{i,l} for the FPGA chip containing gate i is logically ANDed with the column of m_{l,j} for the FPGA chip containing gate j. The individual ANDed components are then ORed to determine whether the resulting M_{i,j} value for index or hop count k is "1" or "0". If the result is "1", a connection exists and the index k is designated as the number of hops. If the result is "0", no connection exists.
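The repeated AND/OR multiplication can be sketched in Python as a Boolean matrix power search. The function names and the four-chip example matrix are hypothetical; the matrix is not the one in Fig. 7, which is not reproduced here. The guard for i equal to j and the `None` result for unreachable chips are additions made for robustness and are not part of the described procedure.

```python
def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is the OR over l of (a[i][l] AND b[l][j])."""
    n = len(a)
    return [[int(any(a[i][l] and b[l][j] for l in range(n)))
             for j in range(n)]
            for i in range(n)]


def hop_distance(m, i, j):
    """Smallest k for which the (i, j) entry of the k-th Boolean power of m is 1."""
    if i == j:
        return 0                      # same chip: no inter-chip interconnection needed
    power = m                         # k = 1: the connection matrix itself
    for k in range(1, len(m)):        # a shortest path never needs more than n-1 hops
        if power[i][j]:
            return k
        power = bool_matmul(power, m) # move on to the next k
    return None                       # chips i and j are not connected at all


# Hypothetical connection matrix: four chips wired in a chain 0-1-2-3.
M = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

direct = hop_distance(M, 0, 1)
two_hops = hop_distance(M, 0, 2)
three_hops = hop_distance(M, 0, 3)
```

For this chain, chips 0 and 1 are one hop apart, chips 0 and 2 are two hops apart, and chips 0 and 3 are three hops apart.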
The following example illustrates these principles. Refer to Figs. 35(A) to 35(D). Fig. 35(A) shows a user's circuit design represented as a cloud 1090. This circuit design 1090 may be simple or complex. A portion of circuit design 1090 includes an OR gate 1091 and two AND gates 1092 and 1093. The outputs of AND gates 1092 and 1093 are coupled to the inputs of OR gate 1091. These gates 1091, 1092, and 1093 may also be coupled to other portions of circuit design 1090.
Referring to Fig. 35(B), the components of circuit 1090, including the portion containing the three gates 1091, 1092, and 1093, may be configured and placed in FPGA chips 1094, 1095, and 1096. This particular example array of FPGA chips has the interconnection scheme shown; that is, a set of interconnections 1097 couples chip 1094 to chip 1095, and another set of interconnections 1098 couples chip 1095 to chip 1096. No direct interconnection is provided between chip 1094 and chip 1096. When placing the components of circuit design 1090 into the chips, the system uses the pre-designed interconnection scheme to connect circuit paths across different chips.
Referring to Fig. 35(C), one possible configuration and placement has OR gate 1091 placed in chip 1094, AND gate 1092 placed in chip 1095, and AND gate 1093 placed in chip 1096. For clarity, other portions of circuit 1090 are not shown. The connection between OR gate 1091 and AND gate 1092 requires an interconnection because the gates are located in different chips, so the set of interconnections 1097 is used. The number of hops for this interconnection is "1". The connection between OR gate 1091 and AND gate 1093 also requires an interconnection, so the sets of interconnections 1097 and 1098 are used. The number of hops is "2". For this placement example, the total number of hops is "3", discounting the contribution from other gates and interconnections in the remainder of circuit 1090 that are not shown.
Fig. 35(D) shows another placement example. Here, OR gate 1091 is placed in chip 1094, and AND gates 1092 and 1093 are placed in chip 1095. Again, other portions of circuit 1090 are not shown for clarity. The connection between OR gate 1091 and AND gate 1092 requires an interconnection because the gates are located in different chips, so the set of interconnections 1097 is used. The number of hops for this interconnection is "1". The connection between OR gate 1091 and AND gate 1093 also requires an interconnection, so the set of interconnections 1097 is used. The number of hops is also "1". For this placement example, the total number of hops is "2", discounting the contribution from other gates and interconnections in the remainder of circuit 1090 that are not shown. Thus, based solely on the distance parameter D and assuming all other factors are equal, the cost function calculated for the placement example of Fig. 35(D) is lower than that for the placement example of Fig. 35(C). However, all other factors are not equal. More than likely, the cost function for Fig. 35(D) is also affected by gate usage/utilization G. In Fig. 35(D), chip 1095 uses one more gate than the same chip in Fig. 35(C). Furthermore, the pin usage/utilization P of chip 1095 in the placement example of Fig. 35(C) is greater than the pin usage/utilization of the same chip in the other placement example of Fig. 35(D).
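The hop totals for the two example placements of Figs. 35(C) and 35(D) can be checked with a few lines of Python. Because chips 1094, 1095, and 1096 form a simple chain, the sketch takes the hop count to be the difference in chain position instead of running the matrix procedure; that shortcut, and the names `CHAIN`, `NETS`, and `total_hops`, are assumptions made for this illustration only.

```python
# Chips in chain order: interconnections 1097 and 1098 link neighbouring chips only.
CHAIN = [1094, 1095, 1096]

# Gate-to-gate connections in the example: each AND gate output drives OR gate 1091.
NETS = [(1092, 1091), (1093, 1091)]


def hops(chip_a, chip_b):
    """Hop count between two chips on the chain (0 if they are the same chip)."""
    return abs(CHAIN.index(chip_a) - CHAIN.index(chip_b))


def total_hops(placement):
    """Sum of hops over all connections, given a mapping from gate to chip."""
    return sum(hops(placement[src], placement[dst]) for src, dst in NETS)


fig_35c = {1091: 1094, 1092: 1095, 1093: 1096}   # one gate per chip
fig_35d = {1091: 1094, 1092: 1095, 1093: 1095}   # both AND gates share chip 1095

d_35c = total_hops(fig_35c)
d_35d = total_hops(fig_35d)
```

The sketch gives a total of 3 hops for the Fig. 35(C) placement and 2 hops for the Fig. 35(D) placement, matching the totals stated in the text.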
After the coarse-grain placement, a fine-tuning of the flattened cluster placement further optimizes the placement result. This fine-grain placement operation 353 refines the placement initially selected by the coarse-grain placement operation 352. Here, initial clusters may be split up if doing so yields a more optimal placement. For example, assume that logic elements X and Y were originally part of cluster A and designated for FPGA chip 1. As a result of the fine-grain placement operation 353, logic elements X and Y may now be designated as a separate cluster B, or made part of another cluster C, and designated for placement in FPGA chip 2. An FPGA netlist 354 that ties the user's circuit design to specific FPGAs is then generated.
How clusters are split up and placed into particular chips is also determined by the placement cost, which is calculated by the cost function f(P, G, D) for circuit CKTQ. In one embodiment, the cost function used for the fine-grain placement process is the same as that used for the coarse-grain placement process. The only difference between the two placement processes is the size of the clusters placed, not the process itself. The coarse-grain placement process uses larger clusters than the fine-grain placement process. In other embodiments, the coarse-grain and fine-grain placement processes use different cost functions, as described above with respect to the selection of the weighting constants C0, C1, and C2.
Once placement is complete, the inter-chip routing task 355 is performed. If the number of routing wires needed to connect circuits located in different chips exceeds the number of pins these FPGA chips have allocated for circuit-to-circuit routing, time-division multiplexing (TDM) circuits can be used. For example, if each FPGA chip allows only 44 pins for connecting circuits located in two different FPGA chips, and a particular model implementation needs 45 wires between chips, a special time-division multiplexing circuit is implemented in each chip. This special TDM circuit couples together at least two of the wires. One embodiment of the TDM circuit is shown in Figs. 9(A), 9(B), and 9(C), which are discussed below. Thus, the routing task can always be completed because the pins can be arranged in time-division multiplexed fashion between chips.
Once the placement and routing of each FPGA has been determined, each FPGA can be configured into an optimized, working circuit, and the system accordingly generates "bitstream" configuration files 356. In Altera's terminology, the system generates one or more Programmer Object Files (.pof). Other generated files include SRAM Object Files (.sof), JEDEC Files (.jed), Hexadecimal (Intel-format) Files (.hex), and Tabular Text Files (.ttf). The Altera MAX+PLUS II Programmer uses the POF, SOF, and JEDEC files, together with Altera hardware programmers, to program the FPGA array. Alternatively, the system generates one or more raw binary files (.rbf). The CPU modifies the .rbf files and programs the FPGA array through the PCI bus.
At this point, the configured hardware is ready for hardware start-up 370. This completes the automatic construction of the hardware model on the reconfigurable board.
Returning to the TDM circuit: it couples a group of pin outputs together in time-division multiplexed fashion so that only one pin output is actually used. The TDM circuit is essentially a multiplexer with at least two inputs (for the two wires), one output, and a pair of registers configured in a loop that serves as the selector signal. If the Analog Simulation System needs more wires to be grouped together, more inputs and loop registers can be provided. As the selector signal for this TDM circuit, the registers configured in a loop provide the appropriate signals to the multiplexer so that during one time period one input is selected as the output, and during another time period the other input is selected as the output. The TDM circuit thus manages to use only one output wire between chips, so that, for this example, the hardware model of the circuit implemented in a particular chip can be completed with 44 pins instead of 45. The routing task can therefore always be completed because the pins can be arranged in time-division multiplexed fashion between chips.
Fig. 9(A) shows an overview of the pin-out problem. Because this problem calls for a TDM circuit, Fig. 9(B) provides the TDM circuit for the transmission side, and Fig. 9(C) provides the TDM circuit for the receiver side. These figures show only one particular example, in which the Analog Simulation System needs one wire between chips instead of two. If more than two wires must be coupled together in a time-multiplexed arrangement, one of ordinary skill in the art can make the appropriate modifications in light of the teachings below.
Fig. 9(A) shows one embodiment of the TDM circuit in which the Analog Simulation System couples two wires in a TDM configuration. Two chips, 990 and 991, are provided. Circuit 960, which is part of the complete user circuit design, is modeled and placed in chip 991. Circuit 973, which is also part of the complete user circuit design, is modeled and placed in chip 990. Several interconnections are provided between circuit 960 and circuit 973, including a group of interconnections 994, interconnection 992, and interconnection 993. In this example, the interconnections total 45. If, in one embodiment, each chip provides at most 44 pins for these interconnections, one embodiment of the invention arranges for at least two of the interconnections to be time-multiplexed so that they require only one interconnection between chips 990 and 991.
In this example, the group of interconnections 994 continues to use 43 pins. For the 44th and last pin, a TDM circuit according to one embodiment of the invention can couple interconnections 992 and 993 together in time-division multiplexed fashion.
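The pin arithmetic in this example (45 wires, 44 pins, 43 direct connections plus one shared pin) can be generalized in a short Python sketch. The function `tdm_plan` and its `ways` parameter (how many wires share one multiplexed pin) are hypothetical; the text only describes the two-wire case and leaves larger groupings to the practitioner.

```python
import math


def tdm_plan(wires, pins, ways=2):
    """Return (direct_pins, shared_pins) needed to carry `wires` signals over `pins` pins.

    Each shared pin carries up to `ways` wires in time-division multiplexed fashion.
    """
    if wires <= pins:
        return wires, 0                       # every wire gets its own pin
    # Each shared pin frees up (ways - 1) pins' worth of capacity.
    shared = math.ceil((wires - pins) / (ways - 1))
    if shared > pins:
        raise ValueError("not enough pins even with every pin multiplexed")
    return pins - shared, shared


direct, shared = tdm_plan(wires=45, pins=44)
```

For 45 wires and 44 pins with two wires per shared pin, the plan is 43 direct pins and 1 shared pin, which corresponds to the group of interconnections 994 plus the multiplexed pair 992 and 993.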
Fig. 9(B) shows one embodiment of the TDM circuit. Modeled circuit (or a portion thereof) 960 in FPGA chip 991 provides two signals on wires 966 and 967. To circuit 960, wires 966 and 967 are outputs. These outputs would normally be coupled to modeled circuit 973 in chip 990 (see Figs. 9(A) and 9(C)). However, the availability of only one pin for these two output wires 966 and 967 precludes a direct pin-to-pin connection. Because outputs 966 and 967 are transmitted unidirectionally to the other chip, appropriate transmission and receiver TDM circuits must be provided to couple these lines together. Fig. 9(B) shows one embodiment of the transmission-side TDM circuit.
The transmit-side TDM circuit includes AND gates 961 and 962, whose respective outputs 970 and 971 are coupled to the inputs of OR gate 963. The output 972 of OR gate 963 is the chip output assigned to the pin and is connected to the other chip 990. One set of inputs, 966 and 967, is provided to AND gates 961 and 962, respectively, by circuit model 960. The other set of inputs, 968 and 969, is provided by a looped register circuit and serves as the time-division multiplexing selector signal.
The looped register circuit includes registers 964 and 965. The output 995 of register 964 is provided to the input of register 965 and to input 968 of AND gate 961. The output 996 of register 965 is coupled to the input of register 964 and to input 969 of AND gate 962. Registers 964 and 965 are controlled by a common clock source. At any given instant, only one of outputs 995 and 996 is a logic "1"; the other is a logic "0". Thus, after each clock edge, the logic "1" shifts between output 995 and output 996. This in turn provides a logic "1" to either AND gate 961 or AND gate 962, "selecting" the signal on wire 966 or the signal on wire 967. The data on wire 972 therefore comes from circuit 960 by way of either wire 966 or wire 967.
Fig. 9(C) shows one embodiment of the receive-side portion of the TDM circuit. The signals from circuit 960 in chip 991 on wires 966 and 967 (Figs. 9(A) and 9(B)) must be coupled to the appropriate wire 985 or 986 to reach circuit 973 in Fig. 9(C). The time-division multiplexed signal from chip 991 enters on wire/pin 978. The receive-side TDM circuit couples the signals on wire/pin 978 to the appropriate wires 985 and 986 to reach circuit 973.
The TDM circuit includes input registers 974 and 975. The signal on wire/pin 978 is provided to input registers 974 and 975 through wires 979 and 980, respectively. The output 985 of input register 974 is provided to the appropriate port of circuit 973. Similarly, the output 986 of input register 975 is provided to the appropriate port of circuit 973. Input registers 974 and 975 are controlled by looped registers 976 and 977.
The output 984 of register 976 is coupled to the input of register 977 and to the clock input 981 of register 974. The output 983 of register 977 is coupled to the input of register 976 and to the clock input 982 of register 975. Registers 976 and 977 are controlled by a common clock source. At any given instant, only one of the enable inputs 981 and 982 is a logic "1"; the other is a logic "0". Thus, after each clock edge, the logic "1" shifts between enable input 981 and enable input 982. This in turn "selects" the signal on wire 979 or wire 980. The data from circuit 960 on wire 978 is therefore correctly coupled to circuit 973 through wire 985 or wire 986.
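The transmit-side and receive-side behavior described above can be sketched in a few lines of Python. This is a minimal behavioral sketch, not the circuit itself. The function name `tdm_link`, the assumption that both looped registers start in the same phase and share one clock, and the assumption that circuit 960 holds each pair of values for two link clock cycles are illustrative and are not stated in the text.

```python
def tdm_link(samples_966, samples_967):
    """Carry two one-bit signals over one pin and recover them on the far side.

    Assumptions (not from the text): both looped registers start with the
    logic 1 in the same position, and each input pair is held for two cycles.
    """
    tx_ring = [1, 0]  # outputs 995 and 996 of registers 964 and 965
    rx_ring = [1, 0]  # enables 981 and 982 driven by registers 976 and 977
    out_985 = out_986 = 0
    received = []
    for a, b in zip(samples_966, samples_967):
        for _ in range(2):  # two link clock cycles per pair of values
            # Transmit side: AND gates 961/962 feed OR gate 963 (pin 972).
            pin = (a & tx_ring[0]) | (b & tx_ring[1])
            # Receive side: only the enabled input register captures the pin.
            if rx_ring[0]:
                out_985 = pin
            if rx_ring[1]:
                out_986 = pin
            # Clock edge: the logic 1 moves to the other register in each ring.
            tx_ring.reverse()
            rx_ring.reverse()
        received.append((out_985, out_986))
    return received


recovered = tdm_link([1, 0, 1, 1], [0, 0, 1, 0])
```

In this sketch the pin carries the value of wire 966 in one cycle and the value of wire 967 in the next, so the link needs two cycles for every pair of values that a two-pin connection would carry in one.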
The address pointer according to one embodiment of the present invention, briefly introduced with reference to Fig. 4, will now be discussed in more detail. To reiterate, several address pointers are located in each FPGA chip in the hardware model. In general, the primary purpose of the address pointers is to enable the system to deliver data between the software model 315 and a specific FPGA chip in the hardware model 325 over the 32-bit PCI bus 328 (see Figure 10). More specifically, the primary purpose of the address pointers is to selectively control the delivery of data between each address space in the software/hardware boundary (i.e., REG, S2H, H2S, and CLK) and each FPGA chip in the banks of FPGA chips 326a-326d, in light of the bandwidth limitations of the 32-bit PCI bus. Even if a 64-bit PCI bus were implemented, these address pointers would still be needed to control the data delivery. Thus, if the software model has five address spaces (i.e., REG read, REG write, S2H read, H2S write, and CLK write), each FPGA chip has five address pointers corresponding to these five address spaces. Each FPGA needs these five address pointers because the particular word being processed in the selected address space may reside in any one or more of the FPGA chips.
The FPGA I/O controller 381 selects the particular address space (i.e., REG, S2H, H2S, or CLK) in the software/hardware boundary by using a SPACE index. Once the address space is selected, the address pointer corresponding to the selected address space in each FPGA chip selects the particular word corresponding to the same word in the selected address space. The maximum sizes of the address spaces in the software/hardware boundary and of the address pointers in each FPGA chip depend on the memory/word capacity of the selected FPGA chips. For example, one embodiment of the present invention uses the Altera FLEX 10K family of FPGA chips. Accordingly, the estimated maximum size of each address space is: REG, 3,000 words; CLK, 1 word; S2H, 10 words; H2S, 10 words. Each FPGA chip can hold approximately 100 words.
The simulation/emulation system also lets the user stop, assert input values, and inspect values at any time during the simulation/emulation process. To provide the flexibility of a simulator, the system must also make all components visible to the user regardless of whether a component is implemented internally in software or in hardware. In software, combinational components are modeled and their values are computed during the simulation process. These values are therefore clearly "visible" to the user and can be accessed at any time during the simulation process.
However, combinational component values in the hardware model are not so directly "visible". Although registers are readily and directly accessible (i.e., readable and writable) by the software kernel, combinational components are more difficult to determine. In FPGAs, most combinational components are modeled as look-up tables to achieve high gate utilization. The look-up table mapping thus provides efficient hardware modeling but loses visibility of most of the combinational logic signals.
Despite this lack of visibility of combinational components, the simulation/emulation system can rebuild, or regenerate, the combinational components for inspection by the user after a hardware acceleration run. If the user's circuit design has only combinational and register components, the values of all combinational components can be derived from the register components. That is, combinational components are constructed from, or contain, registers in various arrangements according to the particular logic function required by the circuit design. The system has hardware models of only the register and combinational components, so it reads all register values out of the hardware model and then rebuilds or regenerates all the combinational components. Because this regeneration process incurs overhead, regeneration of combinational components is not performed all the time; rather, it is performed only at the user's request. Indeed, one benefit of using the hardware model is to accelerate the simulation process. Determining combinational component values at every cycle (or even most cycles) would further reduce the simulation speed. In any event, inspection of register values alone should be sufficient for most simulation analyses.
The process of regenerating combinational component values from register values assumes that the simulation/emulation system was in hardware acceleration mode or ICE mode. Otherwise, software simulation already provides the combinational component values to the user. The system maintains the combinational component values and register values that resided in the software model before the onset of hardware acceleration. These values remain in the software model until they are overwritten by the system. Because the software model already holds the register and combinational component values from the time period immediately before the hardware acceleration run began, the regeneration process involves updating some or all of these values in the software model in response to the updated register values.
The combinational component regeneration process is as follows. First, if requested by the user, the software kernel reads the output values of the hardware register components from the FPGA chips into the REG buffer. This step involves a DMA (direct memory access) transfer of the register values in the FPGA chips, via the chain of address pointers, to the REG address space. Placing the register values of the hardware model in the REG buffer, which lies in the software/hardware boundary, allows the software model to access the data for further processing.
Second, the software kernel compares the register values before the hardware acceleration run with those after the run. If the register values before the run are the same as the values after the run, the values of the combinational components have not changed. Instead of expending time and resources to regenerate the combinational components, these values can be read from the software model, which already holds the combinational component values stored just before the hardware acceleration run. On the other hand, if one or more of the register values have changed, one or more combinational component values that depend on the changed register values may also have changed. These combinational components must then be regenerated by the third step below.
Third, for each register whose value after the acceleration run differs from its value before the run, the software kernel schedules the combinational components in its fan-out into the event queue. Here, the registers whose values changed during the acceleration run have each detected an event. Most likely, the combinational components that depend on these changed register values will produce different values. Regardless of how these combinational component values may change, the system ensures in the next step that these combinational components evaluate the changed register values.
Fourth, the software kernel then executes the standard event simulation algorithm to propagate the value changes from the registers to all the combinational components in the software model. In other words, the register values that changed during the interval between the start and the end of the acceleration run are propagated to all downstream combinational components that depend on them. These combinational components then evaluate the new register values. In accordance with fan-out and propagation principles, second-level combinational components located downstream of the first-level combinational components that depend directly on the changed register values must in turn evaluate the changed data, if any. This propagation of register values to other affected downstream components continues to the end of the fan-out network. Thus, only those combinational components in the software model that are downstream of, and affected by, the changed register values are updated. Not all combinational component values are affected. For example, if only one register value changed during the acceleration run, and only one combinational component is affected by that change, then only that combinational component re-evaluates its value in light of the changed register value. The rest of the modeled circuit is unaffected. For such a small change, the regeneration process is relatively fast.
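The second through fourth steps above can be illustrated with a small event-queue sketch in Python. It is only an illustration under stated assumptions: the function `regenerate`, the dictionary-based netlist, and the three-gate example circuit are hypothetical, the combinational logic is assumed to be free of loops, and the first step (the DMA read of register values) is represented simply by the `after` dictionary.

```python
from collections import deque


def regenerate(old_regs, new_regs, fanout, gates, values):
    """Update `values` in place; return the combinational components evaluated."""
    # Step 2: compare register values before and after the acceleration run.
    changed = [r for r in new_regs if new_regs[r] != old_regs[r]]
    values.update(new_regs)
    # Step 3: schedule the fan-out of every changed register.
    queue = deque(g for r in changed for g in fanout.get(r, ()))
    evaluated = []
    # Step 4: propagate events until the fan-out network is exhausted.
    while queue:
        gate = queue.popleft()
        func, inputs = gates[gate]
        result = func(*(values[i] for i in inputs))
        evaluated.append(gate)
        if result != values[gate]:
            values[gate] = result
            queue.extend(fanout.get(gate, ()))
    return evaluated


# Hypothetical netlist: n1 = r1 AND r2, n2 = NOT n1, n3 = NOT r2.
gates = {
    "n1": (lambda a, b: a & b, ("r1", "r2")),
    "n2": (lambda a: 1 - a, ("n1",)),
    "n3": (lambda a: 1 - a, ("r2",)),
}
fanout = {"r1": ["n1"], "r2": ["n1", "n3"], "n1": ["n2"]}
before = {"r1": 0, "r2": 1}  # register values held in the software model
after = {"r1": 1, "r2": 1}   # register values read back from the hardware
values = {"r1": 0, "r2": 1, "n1": 0, "n2": 1, "n3": 0}
touched = regenerate(before, after, fanout, gates, values)
```

Only register r1 changes in this example, so only n1 and n2 are re-evaluated; n3, which depends solely on the unchanged register r2, is never placed in the queue.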
Finally, when event propagation is complete, the system is ready to operate in any mode. Usually, the user wishes to inspect values after a long run. After the combinational component regeneration process, the user may continue with pure software simulation for debugging or testing. At other times, however, the user wishes to continue with hardware acceleration to the next desired point. In still other cases, the user wishes to proceed in ICE mode.
In summary, combinational component regeneration uses the register values to update the combinational component values in the software model. When any register value has changed, the changed value is propagated through the register's fan-out network, updating values along the way. When no register value has changed, the values in the software model do not change either, so the system does not need to regenerate the combinational components. Usually, a hardware acceleration run lasts for a considerable time. As a result, many register values will change, affecting many combinational component values located downstream in the fan-out networks of the changed registers. In this case, the regeneration process is relatively slow. In other cases, only a few register values change after a hardware acceleration run. The fan-out networks of the changed registers may be small, and the regeneration process is then relatively fast.
IV. Emulation with target system mode
Figure 10 shows the simulation/emulation system architecture according to one embodiment of the present invention. Figure 10 also shows the relationship among the software model, the hardware model, the emulation interface, and the target system when the system operates in in-circuit emulation mode. As described earlier, the simulation/emulation system includes a general-purpose microprocessor and a reconfigurable hardware board interconnected by a high-speed bus such as a PCI bus. The system compiles the user's circuit design and generates the emulation hardware configuration data for the mapping of the hardware model onto the reconfigurable board. The user can then simulate the circuit on the general-purpose processor, hardware-accelerate the simulation, emulate the circuit design with a target system through the emulation interface, and later perform post-simulation analysis.
The software model 315 and the hardware model 325 are determined during the compilation process. In in-circuit emulation mode, the emulation interface 382 and the target system 387 are also provided in the system. At the user's discretion, the emulation interface and the target system need not be coupled to the system at the outset.
The software model 315 includes the kernel 316, which controls the overall system, and four address spaces for the software/hardware boundary: REG, S2H, H2S, and CLK. The simulation/emulation system maps the hardware model into four address spaces in main memory according to component type and control function: the REG space 317 is designated for the register components; the CLK space 320 is designated for the software clocks; the S2H space 318 is designated for the outputs of the software test-bench components to the hardware model; and the H2S space 319 is designated for the outputs of the hardware model to the software test-bench components. These dedicated I/O buffer spaces are mapped to the kernel's main memory space at system initialization time.
The hardware model includes several banks 326a-326d of FPGA chips and the FPGA I/O controller 327. Each bank (e.g., 326b) contains at least one FPGA chip. In one embodiment, each bank contains four FPGA chips. In a 4x4 array of FPGA chips, banks 326b and 326d may be the low bank and banks 326a and 326c may be the high bank. The mapping, placement, and routing of specific hardware-modeled user circuit design elements to specific chips and their interconnections are discussed with reference to Fig. 6. The interconnect 328 between the software model 315 and the hardware model 325 is a PCI bus system. The hardware model also includes the FPGA I/O controller 327, which includes a PCI interface 380 and a control unit 381 for controlling the data traffic between the PCI bus and the banks of FPGA chips 326a-326d while maintaining the throughput of the PCI bus. Each FPGA chip further includes several address pointers, each corresponding to one address space (i.e., REG, S2H, H2S, and CLK) in the software/hardware boundary, to couple data between each of these address spaces and each FPGA chip in the banks 326a-326d.
Communication between the software model 315 and the hardware model 325 occurs through a DMA engine or the address pointers in the hardware model. Alternatively, communication occurs through both the DMA engine and the address pointers in the hardware model. The kernel initiates DMA transfers together with evaluation requests through direct mapped I/O control registers. The REG space 317, CLK space 320, S2H space 318, and H2S space 319 use I/O data path lines 321, 322, 323, and 324, respectively, for data delivery between the software model 315 and the hardware model 325.
All primary inputs of the S2H and CLK spaces require double buffering because these spaces take several clock cycles to complete the update process. Double buffering avoids disturbing the internal hardware model state, which could otherwise cause race conditions.
The S2H and CLK spaces are the primary inputs from the kernel to the hardware model. As described above, the hardware model holds substantially all of the register and combinational components of the user's circuit design. Moreover, the software clocks are modeled in software and provided in the CLK I/O address space to interface with the hardware model. The kernel advances simulation time, looks for active test-bench components, and evaluates clock components. When the kernel detects any clock edge, registers and memories are updated and values are propagated through the combinational components. Thus, if hardware acceleration mode is selected, any change of values in these spaces triggers the hardware model to change logic states.
For in-circuit emulation mode, the emulation interface 382 is coupled to the PCI bus 328 so that it can communicate with the hardware model 325 and the software model 315. The kernel 316 controls not only the software model but also the hardware model during hardware-accelerated simulation mode and in-circuit emulation mode. The emulation interface 382 is also coupled to the target system 387 by cable 390. The emulation interface 382 also includes the interface port 385, emulation I/O control 386, the target-to-hardware I/O buffer (T2H) 384, and the hardware-to-target I/O buffer (H2T) 383.
The target system 387 includes a connector 389, a signal-in/signal-out interface socket 388, and other modules or chips that are part of the target system 387. For example, the target system 387 could be an EGA video controller, and the user's circuit design could be a particular I/O controller circuit. The user's circuit design of the I/O controller for the EGA video controller is completely modeled in the software model 315 and partially modeled in the hardware model 325.
The kernel 316 in the software model 315 also controls in-circuit emulation mode. Control of the emulation clocks remains in software through the software clocks, the gated clock logic, and the gated data logic, so no set-up and hold-time problems arise during in-circuit emulation mode. Thus, the user can start, stop, single-step, assert values, and inspect values at any time during the in-circuit emulation process.
To make this work, all clock nodes between the target system and the hardware model are first identified. Clock generators in the target system are disabled, the clock ports from the target system are disconnected, or clock signals from the target system are otherwise prevented from reaching the hardware model. Instead, the clock signals originate from a test-bench process or another form of software-generated clock, so that the software kernel can detect active clock edges to trigger the data evaluation. Hence, in ICE mode, the simulation/emulation system uses the software clocks, rather than the target system's clocks, to control the hardware model.
To simulate the operation of the user's circuit design within the environment of the target system, the primary input (signal-in) and output (signal-out) signals between the target system 40 and the modeled circuit design are provided to the hardware model 325 for evaluation. This is accomplished through two buffers, the target-to-hardware buffer (T2H) 384 and the hardware-to-target buffer (H2T) 383. The target system 387 uses the T2H buffer 384 to apply input signals to the hardware model 325. The hardware model 325 uses the H2T buffer 383 to deliver output signals to the target system 387. In this in-circuit emulation mode, the hardware model sends and receives I/O signals through the T2H and H2T buffers instead of the S2H and H2S buffers, because the system now uses the target system 387, rather than the test-bench processes in the software model 315, to evaluate the data. Because the target system runs at a speed substantially higher than the speed of software simulation, in-circuit emulation mode also runs at a higher speed. The transmission of these input and output signals occurs on the PCI bus 328.
In addition, a bus 61 is provided between the emulation interface 382 and the hardware model 325. This bus is analogous to the bus 61 in Fig. 1. Bus 61 allows the emulation interface 382 and the hardware model 325 to communicate through the T2H buffer 384 and the H2T buffer 383.
Normally, the target system 387 is not coupled to the PCI bus. However, such a coupling is feasible if the emulation interface 382 is incorporated into the design of the target system 387. In this arrangement, cable 390 is not present. Signals between the target system 387 and the hardware model 325 still pass through the emulation interface.
V. Post-simulation analysis mode
The simulation/emulation system of the present invention can support value change dump (VCD), a widely used simulator function for post-simulation analysis. In essence, VCD provides a historical record of all inputs and selected register outputs of the hardware model, so that later, during post-simulation analysis, the user can review the various inputs and resulting outputs of the simulation process. To support VCD, the system logs all inputs to the hardware model. For outputs, the system logs all values of the hardware register components at a user-defined logging frequency (e.g., 1/10,000 record/cycle). The logging frequency determines how often the output values are recorded. At a logging frequency of 1/10,000 record/cycle, output values are recorded once every 10,000 cycles. The higher the logging frequency, the more information is recorded for later post-simulation analysis; the lower the logging frequency, the less information is stored. Because the selected logging frequency directly affects simulation/emulation speed, the user should select the logging frequency with care. A higher logging frequency decreases simulation/emulation speed, because the system must spend time and resources on I/O operations to memory to record the output data before further simulation can proceed.
For post-simulation analysis, the user selects the particular point at which simulation results are desired. If the logging frequency is 1/500 record/cycle, register values are recorded at points 0, 500, 1000, 1500, and so on, every 500 cycles. If the user wants the results at point 610, for example, the user selects the recorded point 500 and simulates forward in time until the simulation reaches point 610. During the analysis stage, the analysis speed is the same as the simulation speed, because the user first accesses the data at point 500 and then simulates forward to point 610. Note that at a higher logging frequency, more data is stored for post-simulation analysis. Thus, for a logging frequency of 1/300 record/cycle, data is stored at points 0, 300, 600, 900, and so on, every 300 cycles. To obtain the results at point 610, the user first selects the recorded point 600 and then simulates forward to point 610. Note that during post-simulation analysis the system reaches the desired point 610 faster when the logging frequency is 1/300 than when it is 1/500. However, this is not always the case. The particular analysis point, together with the logging frequency, determines how fast the post-simulation analysis point is reached. For example, the system reaches point 523 faster if the VCD logging frequency is 1/500 rather than 1/300.
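The arithmetic behind these examples can be checked with a short Python sketch. The function name `replay_plan` and its parameters are illustrative; the sketch assumes, as in the examples above, that recording starts at point 0 and that replay always begins at the most recent recorded point at or before the target.

```python
def replay_plan(target_cycle, record_interval):
    """Return (recorded point to load, number of cycles to simulate forward)."""
    # Most recent recorded point at or before the target, with records at
    # 0, record_interval, 2 * record_interval, and so on.
    checkpoint = (target_cycle // record_interval) * record_interval
    return checkpoint, target_cycle - checkpoint


plan_610_at_500 = replay_plan(610, 500)  # load point 500, replay 110 cycles
plan_610_at_300 = replay_plan(610, 300)  # load point 600, replay 10 cycles
plan_523_at_500 = replay_plan(523, 500)  # load point 500, replay 23 cycles
plan_523_at_300 = replay_plan(523, 300)  # load point 300, replay 223 cycles
```

The replay distance is the remainder of the target point divided by the recording interval, which is why a shorter interval helps for point 610 but not for point 523.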
The user can then perform post-simulation analysis by running the software simulation with the inputs logged from the hardware model to compute the value change dump of all hardware components. The user can also select any recorded register point in time and start the value change dump from that point forward in time. This value change dump method can be linked to any simulation waveform viewer for post-simulation analysis.
VI. Hardware implementation
A. Overview
The simulation/emulation system implements an array of FPGA chips on a reconfigurable board. Based on the hardware model, the system partitions, maps, places, and routes each selected portion of the user's circuit design onto the FPGA chips. Thus, for example, a 4x4 array of 16 chips may model a large circuit spread across these 16 chips. The interconnect scheme allows each chip to reach any other chip within two "jumps" or links.
Each FPGA chip implements an address pointer for each of the I/O address spaces (i.e., REG, S2H, H2S, and CLK). All the address pointers associated with a particular address space are chained together. Thus, during data transfer, word data in each chip is sequentially selected from, or written to, the main FPGA bus and the PCI bus, one word at a time for the selected address space in each chip and one chip at a time, until the desired word data for that selected address space have been accessed. This sequential selection of word data is accomplished by a propagating word selection signal. The word selection signal travels through the address pointer in one chip and then passes to the address pointer in the next chip, continuing until the last chip is reached or the system re-initializes the address pointers.
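The access order that results from chaining the address pointers can be sketched as follows. This is a simplified illustration only: the function `dma_read` and the nested-list representation of the chips are hypothetical, and bus timing and bank selection are not modeled.

```python
def dma_read(chips):
    """Collect the words of one address space in the order the chain visits them.

    `chips` is a list with one entry per FPGA chip; each entry lists the words
    that chip holds for the selected address space (illustrative data layout).
    """
    transfers = []
    for chip_index, chip_words in enumerate(chips):   # one chip at a time
        for word_index, data in enumerate(chip_words):  # one word at a time
            transfers.append((chip_index, word_index, data))
    return transfers


# Three chips holding 2, 1, and 3 words of the selected address space.
order = dma_read([[10, 11], [20], [30, 31, 32]])
```

Each tuple corresponds to one bus transfer, so the number of transfers equals the total number of words the chips hold for the selected address space.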
The FPGA bus system on the reconfigurable board operates at twice the bandwidth (bus width) of the PCI bus but at half the PCI bus speed. The FPGA chips are therefore separated into banks to make use of the wider bus. The throughput of this FPGA bus system matches the throughput of the PCI bus system, so no performance is lost because of the reduced bus speed. Expansion is possible through larger boards that contain more FPGA chips, or through piggyback boards that extend the bank length.
B. Address pointer
Figure 11 shows one embodiment of the address pointer of the present invention. All I/O operations go through DMA. Because the system has only one bus, it accesses data sequentially, one word at a time. One embodiment of the address pointer therefore uses a shift register chain to access the words in the selected address space sequentially. The address pointer 400 includes flip-flops 401-405, an AND gate 406, and a pair of control signals, INITIALIZE 407 and MOVE 408.
Each address pointer has n outputs (W0, W1, W2, ..., Wn-1) for selecting one word, out of n possible words in each FPGA chip, that corresponds to the same word in the selected address space. Depending on the particular user circuit design being modeled, the number of words n varies from one circuit design to another, and, for a given circuit design, n varies from one FPGA chip to another. In Figure 11, the address pointer 400 is only a five-word (i.e., n=5) address pointer. Thus, the particular FPGA chip that contains this five-word address pointer has only five words available for the particular address space. Needless to say, the address pointer 400 can implement any number of words n. The output signal Wn may also be called the word selection signal. When this word selection signal reaches the output of the last flip-flop in the address pointer, it is called the OUT signal and is propagated to the input of the address pointer of the next FPGA chip.
When the INITIALIZE signal is asserted, the address pointer is initialized. The first flip-flop 401 is set to "1" and all other flip-flops 402-405 are set to "0". At this point, initialization of the address pointer does not enable any word selection; that is, all Wn outputs remain at "0" after initialization. The initialization procedure for the address pointer is discussed with reference to Figure 12.
The MOVE signal controls the advance of the pointer for word selection. The MOVE signal is derived from the READ, WRITE, and SPACE index control signals from the FPGA I/O controller. Because every operation is essentially a read or a write, the SPACE index signal essentially determines which address pointer receives the MOVE signal. Thus, the system activates only one address pointer at a time, the one associated with the selected I/O address space, and during that time the system applies the MOVE signal only to that address pointer. Generation of the MOVE signal is discussed further with reference to Figure 13. Referring to Figure 11, when the MOVE signal is asserted, it is provided to an input of AND gate 406 and to the enable inputs of flip-flops 401-405. Hence, a logic "1" moves from word output Wi to Wi+1 on every system clock cycle; that is, the pointer moves from Wi to Wi+1 each cycle to select the particular word. When the shifting word selection signal reaches the output 413 of the last flip-flop 405 (labeled "OUT" here), this OUT signal should then pass to the next FPGA chip through the multiplexed cross-chip address pointer chain (discussed with reference to Figures 14 and 15), unless the address pointer is initialized again.
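The behavior of a single address pointer, as described for Figure 11, can be sketched in Python. This is a behavioral sketch under stated assumptions, not a gate-level model: the class `AddressPointer` and its method names are hypothetical, the exact wiring of AND gate 406 and flip-flops 401-405 is not reproduced, and the sketch assumes that one word selection output is asserted per clock cycle while MOVE is asserted.

```python
class AddressPointer:
    """One-hot word selector for one address space in one FPGA chip."""

    def __init__(self, n_words):
        self.n_words = n_words
        self.position = None  # no logic 1 is loaded until INITIALIZE

    def initialize(self):
        # INITIALIZE: the logic 1 is loaded at the head of the chain,
        # but no word selection output is asserted yet.
        self.position = 0

    def clock(self, move):
        """Return the outputs W0..Wn-1 for one system clock cycle."""
        outputs = [0] * self.n_words
        if move and self.position is not None and self.position < self.n_words:
            outputs[self.position] = 1  # select word Wi in this cycle
            self.position += 1          # the logic 1 moves on to Wi+1
        return outputs


pointer = AddressPointer(5)            # five-word pointer, as in Figure 11
pointer.initialize()
idle = pointer.clock(move=0)           # MOVE not asserted: nothing selected
selected = [pointer.clock(move=1) for _ in range(5)]
exhausted = pointer.clock(move=1)      # past the last word until re-initialized
```

In this sketch the last element of the outputs plays the role of the OUT signal; when it is asserted, the pointer of the next chip in the chain would take over.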
The initialization procedure for the address pointer is now described. Figure 12 shows the state transition diagram for initializing the address pointer of Figure 11. Initially, state 460 is idle. When DATA_XSFR is set to "1", the system enters state 461, in which the address pointer is initialized. Here, the INITIALIZE signal is asserted. The first flip-flop in each address pointer is set to "1" and all other flip-flops in the address pointer are set to "0". At this point, initialization of the address pointer does not enable any word selection; that is, all Wn outputs remain at "0". The next state is wait state 462, which holds while DATA_XSFR is still "1". When DATA_XSFR becomes "0", the address pointer initialization procedure is complete and the system returns to idle state 460.
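The three states of Figure 12 can be written as a small transition function. This is a sketch only: the names `next_state` and `run` are hypothetical, and the sketch assumes that the initialization state 461 lasts exactly one clock cycle and always proceeds to the wait state 462, which the text does not state explicitly.

```python
IDLE, INIT, WAIT = "idle_460", "init_461", "wait_462"


def next_state(state, data_xsfr):
    """Return the state after one clock cycle, given the DATA_XSFR input."""
    if state == IDLE:
        return INIT if data_xsfr else IDLE
    if state == INIT:
        return WAIT  # assumed: initialization takes a single cycle
    return WAIT if data_xsfr else IDLE  # wait until DATA_XSFR returns to 0


def run(inputs, state=IDLE):
    """Return the sequence of states visited for a sequence of DATA_XSFR values."""
    trace = []
    for data_xsfr in inputs:
        state = next_state(state, data_xsfr)
        trace.append(state)
    return trace


trace = run([0, 1, 1, 1, 0, 0])
initialize_asserted = [s == INIT for s in trace]  # INITIALIZE is high only in 461
```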
The MOVE signal generator that produces the various MOVE signals for the address pointers is now described. The SPACE index, generated by the FPGA I/O controller (item 327 in Figure 10; Figure 22), selects the particular address space (i.e., REG read, REG write, S2H read, H2S write, or CLK write). Within this address space, the system of the present invention sequentially selects the particular word to be accessed. The sequential word selection is accomplished in each address pointer by the MOVE signal.
Figure 13 shows one embodiment of the MOVE signal generator. Each FPGA chip 450 has address pointers corresponding to the various software/hardware boundary address spaces (i.e., REG, S2H, H2S, and CLK). In addition to the address pointers and the user's circuit design that is modeled and implemented in FPGA chip 450, FPGA chip 450 also contains the MOVE signal generator 470. The MOVE signal generator 470 includes an address space decoder 451 and several AND gates 452-456. The input signals are the FPGA read signal (F_RD) on wire line 457, the FPGA write signal (F_WR) on wire line 458, and the address space signal 459. The output MOVE signal for each address pointer corresponds to REGR-move on wire line 464, REGW-move on wire line 465, S2H-move on wire line 466, H2S-move on wire line 467, and CLK-move on wire line 468, depending on which address space's address pointer applies. These output signals correspond to the MOVE signal on wire line 408 (Figure 11).
Address space code translator 451 receives 3 input signals 459.It also can receive 2 input signals.2 signals provide 4 possible address spaces, and 3 signals provide 8 possible address spaces.In one embodiment, CLK is assigned as " 00 ", S2H is assigned as " 01 ", and H2S is assigned as " 10 ", and REG is assigned as " 11 ".According to input signal 459, the output of address space code translator is corresponding to REG, one " 1 " of output in wire line 460-463 one of S2H, H2S and CLK, and Sheng Xia wire line is set to " 0 " simultaneously.Therefore, if any of these output lead circuit 460-463 is " 0 ", the output of its corresponding AND gate 452-456 also is " 0 " so.Same, if any of these input lead circuit 460-463 is " 1 ", the output of its corresponding AND gate 452-456 also is " 1 " so.For example, if address space signal 459 is " 10 ", then selected address space H2S.Wire line 461 is that the wire line 460,462 and 463 that " 1 " is left is " 0 ".Accordingly, wire line 466 is that the wire line 464,465,467 and 468 that " 1 " is left is " 0 ".Equally, if wire line 460 is " 1 ", then having selected address space REG and having depended on selected is to read (F_RD) still to write (F_WR) operation, and REGR-move signal on the wire line 464 or the REGW-move signal on the wire line 465 will be " 1 ".
As explained above, the SPACE index is generated by the FPGA I/O controller. In code, the MOVE controls are:
REG space read pointer: REGR-move = (SPACE-index == #REG) & READ;
REG space write pointer: REGW-move = (SPACE-index == #REG) & WRITE;
S2H space read pointer: S2H-move = (SPACE-index == #S2H) & READ;
H2S space write pointer: H2S-move = (SPACE-index == #H2S) & WRITE;
CLK space write pointer: CLK-move = (SPACE-index == #CLK) & WRITE;
This is the code equivalent of the logic diagram of the MOVE signal generator shown in Figure 13.
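The MOVE controls above can be exercised with a short Python sketch. It assumes the 2-bit encoding given for one embodiment (CLK "00", S2H "01", H2S "10", REG "11") and represents F_RD and F_WR as the integers 0 and 1; the names `SPACE_CODES` and `move_signals` are hypothetical.

```python
# Assumed 2-bit address space encoding from the embodiment described in the text.
SPACE_CODES = {"CLK": 0b00, "S2H": 0b01, "H2S": 0b10, "REG": 0b11}


def move_signals(space_index, f_rd, f_wr):
    """Model the address space decoder followed by the five AND gates."""
    # Decoder: one-hot select line per address space.
    sel = {name: int(space_index == code) for name, code in SPACE_CODES.items()}
    # AND gates: each MOVE output needs its space selected and a read or write strobe.
    return {
        "REGR-move": sel["REG"] & f_rd,
        "REGW-move": sel["REG"] & f_wr,
        "S2H-move": sel["S2H"] & f_rd,
        "H2S-move": sel["H2S"] & f_wr,
        "CLK-move": sel["CLK"] & f_wr,
    }
```

For instance, `move_signals(0b10, 0, 1)` asserts only `H2S-move`, since H2S is a write space and F_WR is asserted.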
As mentioned above, each FPGA chip has the same number of address pointers as there are address spaces in the software/hardware boundary. If the software/hardware boundary has 4 address spaces (i.e., REG, S2H, H2S, and CLK), each FPGA chip has 4 address pointers corresponding to these 4 address spaces. Each FPGA needs these 4 address pointers because the particular selected word being processed in the selected address space may reside in any one or more of the FPGA chips, or because the data in the selected address space affects the various circuit elements modeled and implemented in each FPGA chip. To ensure that the selected word is processed with the appropriate circuit elements in the appropriate FPGA chips, each set of address pointers associated with a given software/hardware boundary address space (i.e., REG, S2H, H2S, and CLK) is "chained" together across several FPGA chips. The word selection mechanism described above in conjunction with Figure 11, in which the MOVE signal performs the shifting or propagation, is still used, except that in this "chain" embodiment the address pointer associated with a particular address space in one FPGA chip is chained to the address pointer associated with the same address space in the next FPGA chip.
The same purpose could be accomplished by chaining the address pointers with 4 input pins and 4 output pins. However, that implementation is too wasteful of resources; that is, 4 wires would be needed between two chips, and 4 input pins and 4 output pins would be needed in each chip. One embodiment of the system in accordance with the present invention uses a multiplexed cross-chip address pointer chain, which allows the hardware model to use only one wire between chips and only 1 input pin and 1 output pin in each chip (2 I/O pins per chip). One embodiment of the multiplexed cross-chip address pointer chain is shown in Figure 14.
In the embodiment shown in Figure 14, the user's circuit design has been mapped and partitioned into three FPGA chips 415-417 on the reconfigurable hardware board 470. The address pointers are represented by blocks 421-432. Each address pointer, for example address pointer 427, has a structure and function similar to the address pointer shown in Figure 11, except that the number of words Wn, and hence the number of flip-flops, may vary depending on how many words are implemented in each chip for the user's custom circuit design.
For the REGR address space, FPGA chip 415 has address pointer 421, FPGA chip 416 has address pointer 425, and FPGA chip 417 has address pointer 429. For the REGW address space, FPGA chip 415 has address pointer 422, FPGA chip 416 has address pointer 426, and FPGA chip 417 has address pointer 430. For the S2H address space, FPGA chip 415 has address pointer 423, FPGA chip 416 has address pointer 427, and FPGA chip 417 has address pointer 431. For the H2S address space, FPGA chip 415 has address pointer 424, FPGA chip 416 has address pointer 428, and FPGA chip 417 has address pointer 432.
Each chip 415-417 has a multiplexer 418-420, respectively. Note that these multiplexers 418-420 may be models, and the actual implementation may be a combination of registers and logic elements, as known to those ordinarily skilled in the art. For example, the multiplexer may be several AND gates feeding into an OR gate, as shown in Figure 15. The multiplexer 487 includes four AND gates 481-484 and one OR gate 485. The inputs to the multiplexer 487 are the OUT and MOVE signals from each address pointer in the chip. The output 486 of the multiplexer 487 is the chain-out signal, which is passed to the input of the next FPGA chip.
In Figure 15, this particular FPGA chip has four address pointers 475-478, corresponding to the I/O address spaces. The outputs of the address pointers, the OUT and MOVE signals, are the inputs to the multiplexer 487. For example, address pointer 475 has an OUT signal on wire line 479 and a MOVE signal on wire line 480. These signals are the inputs to AND gate 481. The output of AND gate 481 is an input to OR gate 485. The output of OR gate 485 is the output of this multiplexer 487. In operation, the OUT signal at the output of each address pointer 475-478, in combination with its corresponding MOVE signal and the SPACE index, serves as a selector signal for the multiplexer 487; that is, both the OUT and MOVE signals (the latter derived from the SPACE index signal) must be asserted active (i.e., logic "1") to propagate the word selection signal out of the multiplexer to the chain-out wire line. The MOVE signal is asserted periodically to move the word selection signal through the flip-flops in the address pointer, so that it can be characterized as the input MUX data signal.
Returning to Figure 14, these multiplexers 418-420 have four sets of inputs and one output. Each set of inputs includes: (1) the OUT signal found on the last output Wn-1 wire line (e.g., wire line 413 in the address pointer shown in Figure 11) of the address pointer associated with a particular address space, and (2) the MOVE signal. The output of each multiplexer 418-420 is the chain-out signal. The word selection signal Wn passes through the flip-flops in each address pointer and becomes the OUT signal when it reaches the output of the last flip-flop in the address pointer. The chain-out signal on wire lines 433-435 becomes "1" only when both an OUT signal and a MOVE signal associated with the same address pointer are asserted active (i.e., asserted "1").
For multiplexer 418, the inputs are MOVE signals 436-439 and OUT signals 440-443, corresponding to the OUT and MOVE signals from address pointers 421-424, respectively. For multiplexer 419, the inputs are MOVE signals 444-447 and OUT signals 452-455, corresponding to the OUT and MOVE signals from address pointers 425-428, respectively. For multiplexer 420, the inputs are MOVE signals 448-451 and OUT signals 456-459, corresponding to the OUT and MOVE signals from address pointers 429-432, respectively.
In operation, for any given shift of words Wn, only those address pointers or chains of address pointers associated with the selected I/O address space in the software/hardware boundary are active. Thus, in Figure 14, only the address pointers in chips 415, 416, and 417 associated with one of the address spaces REGR, REGW, S2H, or H2S are active for a given shift. Also, for a given shift of the word selection signal Wn through the flip-flops, the selected words are accessed sequentially because of limitations on the bus bandwidth. In one embodiment, the bus is 32 bits wide and a word is also 32 bits, so only one word can be accessed at a time and delivered to the appropriate resource.
While an address pointer is propagating or shifting the word selection signal through its flip-flops, the chain-out signal is not asserted (i.e., not "1"), and thus the multiplexer in this chip is not yet ready to propagate the word selection signal to the next FPGA chip. When the OUT signal is asserted active (i.e., "1"), the chain-out signal is asserted active (i.e., "1"), indicating that the system is ready to propagate or shift the word selection signal to the next FPGA chip. Thus, accesses occur one chip at a time; that is, the word selection signal is shifted through the flip-flops in one chip before the word selection shift operation is performed for another chip. Indeed, the chain-out signal is asserted only when the word selection signal reaches the end of the address pointer in each chip. In code, the chain-out signal is:
Chain-out=(REGR-move&REGR-out)|(REGW-move&REGW-out)|(S2H-move&S2H-out)|(H2S-move&H2S-out);
In sum, for X I/O address spaces in the system (i.e., REG, S2H, H2S, and CLK), each FPGA has X address pointers, one address pointer for each address space. The size of each address pointer depends on the number of words required for modeling the user's custom circuit design in each FPGA chip. Assuming n words for a particular FPGA chip, and hence n words for the address pointer, this particular address pointer has n outputs (i.e., W0, W1, W2, ..., Wn-1). These outputs Wi are also called word selection signals. When a particular word Wi is selected, the Wi signal is asserted active (i.e., "1"). This word selection signal shifts or propagates down the address pointer of this chip until it reaches the end of the address pointer in this chip, at which point it triggers the generation of a chain-out signal that starts the propagation of the word selection signal Wi through the address pointer in the next chip. In this manner, a chain of address pointers associated with a given I/O address space can be implemented across all of the FPGA chips on the reconfigurable hardware board.
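The chained word selection summarized above can be illustrated with a small behavioral model in Python. It is a simplified sketch under stated assumptions, not the disclosed circuit: only one address space is simulated, the MOVE signal is asserted on every step, the word selection token starts in the first chip, and the names `AddressPointer`, `chain_out`, and `access_sequence` are hypothetical.

```python
class AddressPointer:
    """One-hot shift register with one flip-flop per word in a chip."""

    def __init__(self, n_words, has_token=False):
        self.ff = [0] * n_words
        if has_token:
            self.ff[0] = 1  # assumed: only the first chip starts with the token

    def out(self):
        # OUT is the output of the last flip-flop.
        return self.ff[-1]

    def word_select(self, move):
        # Word selection signals W0..Wn-1, gated by MOVE.
        return [bit & move for bit in self.ff]

    def shift(self, chain_in):
        self.ff = [chain_in] + self.ff[:-1]


def chain_out(pairs):
    """OR of (move & out) over the address pointers in one chip."""
    result = 0
    for move, out in pairs:
        result |= move & out
    return result


def access_sequence(chips, n_moves):
    """Return the (chip, word) pairs selected over n_moves MOVE pulses."""
    selected = []
    for _ in range(n_moves):
        move = 1
        for c, ptr in enumerate(chips):
            for w, bit in enumerate(ptr.word_select(move)):
                if bit:
                    selected.append((c, w))
        # Chain-out of each chip is computed before any pointer shifts.
        outs = [chain_out([(move, ptr.out())]) for ptr in chips]
        for c, ptr in enumerate(chips):
            ptr.shift(outs[c - 1] if c > 0 else 0)
    return selected
```

With chips holding 2, 3, and 1 words, six MOVE pulses visit every word of the first chip, then the second, then the third, which is the one-chip-at-a-time behavior the text describes.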
C. Gated Data/Clock Network Analysis
Various embodiments of the present invention perform clock analysis in conjunction with gated data logic and gated clock logic analysis. The determination of the gated clock logic (or clock network) and the gated data network is critical to the successful implementation of the software clock and to the logic evaluation in the hardware model during emulation. As discussed in conjunction with Figure 4, the clock analysis is performed in step 305. To elaborate on this clock analysis process, Figure 16 shows a flow diagram in accordance with one embodiment of the present invention. Figure 16 also shows the gated data analysis.
The simulation/emulation system has the complete model of the user's circuit design in software and some portions of the user's circuit design in hardware. These hardware portions include the clock components, especially the derived clocks. Clock delivery across this boundary between software and hardware raises timing issues. Because the complete model is in software, the software can detect the clock edges that affect register values. In addition to the software model of the registers, these registers also physically reside in the hardware model. To ensure that the hardware registers also evaluate their respective inputs (i.e., move the data at the D input to the Q output), the software/hardware boundary includes a software clock. The software clock ensures that the registers in the hardware model evaluate correctly. The software clock essentially controls the enable input of the hardware register rather than the clock input of the hardware register component. This software clock thus avoids race conditions, and precise timing control to avoid hold-time violations is not needed. The clock network and gated data logic analysis process shown in Figure 16 provides a way of modeling and implementing the clock and data delivery system to the hardware registers such that race conditions are avoided and a flexible software/hardware boundary implementation is provided.
As discussed earlier, primary clocks are clock signals from test bench processes. All other clocks, such as those clock signals derived from combinational components, are derived or gated clocks. A primary clock can derive both gated clocks and gated data signals. For the most part, only a few (e.g., 1-10) derived or gated clocks are in the user's circuit design. These derived clocks can be implemented as software clocks and stay in software. If a relatively large number (e.g., more than 10) of derived clocks are present in the circuit design, the simulation/emulation system models them in hardware to reduce I/O overhead and maintain its performance. Gated data is a data or control input of a register, other than the clock, that is driven from the primary clock through some combinational logic.
The gated data/clock analysis process starts at step 500. Step 501 takes the usable source design database code generated from the HDL code and maps the user's register elements to the register components of the simulation/emulation system. This one-to-one mapping of user registers to system registers facilitates later modeling steps. In some cases, this mapping is necessary to handle user circuit designs that describe register elements with specific primitives. Thus, for RTL-level code, the system registers can be used quite readily because the RTL-level code is at a high enough level to allow for varying lower-level implementations. For a gate-level netlist, the simulation/emulation system accesses the cell library of components and modifies them to suit the logic elements specific to the particular circuit design.
Step 502 extracts clock signals from the register components of the hardware model. This step allows the system to determine primary clocks and derived clocks. It also determines all the clock signals needed by the various components in the circuit design. The information from this step facilitates the software/hardware clock modeling step.
Step 503 determines primary clocks and derived clocks. Primary clocks originate from test bench components and are modeled in software only. Derived clocks are derived from combinational logic, which is in turn driven by primary clocks. By default, the simulation/emulation system of the present invention keeps the derived clocks in software. If the number of derived clocks is small (e.g., fewer than 10), these derived clocks can be modeled as software clocks. Because the number of combinational components that generate these derived clocks is small, keeping these combinational components in software does not add significant I/O overhead. If, however, the number of derived clocks is large (e.g., more than 10), these derived clocks may be modeled in hardware to minimize I/O overhead. Sometimes the user's circuit design uses a large number of derived clock components derived from primary clocks. The system then builds the clocks in hardware to keep the number of software clocks small.
Decision step 504 requires the system to determine whether any derived clocks are found in the user's circuit design. If not, step 504 resolves to "No" and the clock analysis ends at step 508, because all the clocks in the user's circuit design are primary clocks and these clocks are simply modeled in software. If derived clocks are found in the user's circuit design, step 504 resolves to "Yes" and the algorithm proceeds to step 505.
Step 505 determines the fan-out combinational components from the primary clocks to the derived clocks. In other words, this step traces the clock signal data paths from the primary clocks through the combinational components. Step 506 determines the fan-in combinational components from the derived clocks. In other words, this step traces the clock signal data paths from the combinational components to the derived clocks. The fan-out and fan-in sets are determined recursively in software. The fan-in set of a net N is as follows:
Fan-in (FanIn) set of net N:
    find all components that drive net N;
    for each component X that drives net N do:
        if component X is not a combinational component then
            return;
        else
            for each input net Y of component X
                add the FanIn set W of net Y to the FanIn set of net N
            end for
            add component X to N;
        end if
    end for
The gated clock or data logic network is determined by recursively determining the fan-in set and fan-out set of net N and taking their intersection. The ultimate goal here is to determine the so-called fan-in set of net N. The net N is typically a clock input node for determining the gated clock logic from a fan-in perspective. For determining the gated data logic from a fan-in perspective, net N is a clock input node associated with the data input at hand. If the node is on a register, the net N is the clock input to that register for the data input associated with that register. The system finds all the components that drive net N. For each component X that drives net N, the system determines whether the component X is a combinational component. If no component X is a combinational component, then the fan-in set of net N has no combinational components and net N is a primary clock.
If, however, at least one component X is a combinational component, the system then determines the input nets Y of the component X. Here, the system looks further back in the circuit design by finding the input nodes to the component X. For each input net Y of each component X, a fan-in set W exists that is coupled to net Y. This fan-in set W of net Y is added to the fan-in set of net N, and then the component X is added to the set N.
The fan-out set of net N is determined in a similar manner. The fan-out set of net N is as follows:
Fan-out (FanOut) set of net N:
    find all components that use net N;
    for each component X that uses net N do:
        if component X is not a combinational component then
            return;
        else
            for each output net Y of component X
                add the FanOut set of net Y to the FanOut set of net N
            end for
            add component X to N;
        end if
    end for
Again, the gated clock or data logic network is determined by recursively determining the fan-in set and fan-out set of net N and taking their intersection. The ultimate goal here is to determine the so-called fan-out set of net N. The net N is typically a clock output node for determining the gated clock logic from a fan-out perspective. Thus, the set of all logic elements using net N is determined. For determining the gated data logic from a fan-out perspective, net N is a clock output node associated with the data output at hand. If the node is on a register, the net N is the output of that register for the primary clock-driven input associated with that register. The system finds all the components that use net N. For each component X that uses net N, the system determines whether the component X is a combinational component. If no component X is a combinational component, then the fan-out set of net N has no combinational components and net N is a primary clock.
If, however, at least one component X is a combinational component, the system then determines the output nets Y of the component X. Here, the system looks further forward from the primary clock in the circuit design by finding the output nodes from the component X. For each output net Y of each component X, a fan-out set W exists that is coupled to net Y. This fan-out set W of net Y is added to the fan-out set of net N, and then the component X is added to the set N.
Step 507 determines the clock network or gated clock logic. The clock network is the intersection of the fan-in and fan-out combinational components.
Similarly, the same fan-in and fan-out principle can be used to determine the gated data logic. Like the gated clocks, gated data is the data or control input of a register (other than the clock) driven by a primary clock through some combinational logic. The gated data logic is the intersection of the fan-in of the gated data and the fan-out from the primary clock. Thus, the clock analysis and gated data analysis yield a gated clock network/logic and a gated data logic through some combinational logic. As described later, the determination of the gated clock network and the gated data network is critical to the successful implementation of the software clock and to the logic evaluation in the hardware model during emulation. The clock/data network analysis ends at step 508.
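The fan-in and fan-out analysis of steps 505-507 can be sketched in Python as a graph traversal. This is an illustrative model only: the netlist representation, the example components in `NETLIST`, and the names `reach` and `clock_network` are hypothetical. The sketch also skips a non-combinational component instead of returning, and tracks visited components so that it terminates on combinational loops; both are assumptions beyond the pseudocode in the text.

```python
from collections import defaultdict

# Hypothetical netlist: component -> (is_combinational, input nets, output net).
NETLIST = {
    "and1": (True, ["clk", "en"], "gclk"),      # gates the primary clock
    "or1": (True, ["en_a", "en_b"], "en"),      # enable logic, not clock driven
    "buf1": (True, ["clk"], "clk_b"),           # clock buffer on another path
    "ff1": (False, ["d_n", "gclk"], "q1"),      # register clocked by gclk
}


def reach(start_net, comps_on_net, nets_of_comp, is_comb):
    """Collect combinational components reachable from start_net."""
    found, stack = set(), [start_net]
    while stack:
        net = stack.pop()
        for comp in comps_on_net.get(net, ()):
            if is_comb[comp] and comp not in found:
                found.add(comp)
                stack.extend(nets_of_comp[comp])
    return found


def clock_network(netlist, primary_clock, clock_input):
    """Gated clock logic: fan-in of the clock input intersected with
    the fan-out of the primary clock."""
    is_comb = {c: v[0] for c, v in netlist.items()}
    ins = {c: v[1] for c, v in netlist.items()}
    outs = {c: [v[2]] for c, v in netlist.items()}
    drivers, users = defaultdict(list), defaultdict(list)
    for comp, (_, in_nets, out_net) in netlist.items():
        drivers[out_net].append(comp)
        for net in in_nets:
            users[net].append(comp)
    fan_in = reach(clock_input, drivers, ins, is_comb)     # trace backward
    fan_out = reach(primary_clock, users, outs, is_comb)   # trace forward
    return fan_in & fan_out
```

In the example netlist, the fan-in of `gclk` contains `and1` and `or1`, the fan-out of `clk` contains `and1` and `buf1`, and only `and1` lies in both, so it alone forms the gated clock logic.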
Figure 17 shows the basic building block of the hardware model in accordance with one embodiment of the present invention. For the register component, the simulation/emulation system uses a D-type flip-flop with asynchronous load control as the basic block for building both edge-triggered (i.e., flip-flop) and level-sensitive (i.e., latch) register hardware models. This register model building block has the following ports: Q (the output state); A_E (asynchronous enable); A_D (asynchronous data); S_E (synchronous enable); S_D (synchronous data); and, of course, System.clk (system clock).
This register model is triggered by a positive edge of the system clock or a positive level of the asynchronous enable (A_E) input. When either trigger event occurs, the register model looks at the asynchronous enable (A_E) input. If the asynchronous enable (A_E) input is enabled, the output Q takes on the value of the asynchronous data (A_D); otherwise, if the synchronous enable (S_E) input is enabled, the output Q takes on the value of the synchronous data (S_D). If, on the other hand, neither the asynchronous enable (A_E) nor the synchronous enable (S_E) input is enabled, the output Q is not evaluated despite the detection of a positive edge of the system clock. In this way, the inputs to these enable ports control the operation of this basic building block register model.
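The priority between the asynchronous and synchronous ports can be captured in a few lines of Python. This is a behavioral sketch only; the class name `RegisterModel` and the `clk_posedge` flag (1 when a positive edge of the system clock occurs in this step) are assumptions made for illustration.

```python
class RegisterModel:
    """Behavioral model of the building-block register with ports
    Q, A_E, A_D, S_E, and S_D."""

    def __init__(self, q=0):
        self.q = q

    def evaluate(self, a_e, a_d, s_e, s_d, clk_posedge):
        if a_e:
            # A positive level on A_E triggers the model and has priority.
            self.q = a_d
        elif clk_posedge and s_e:
            # Otherwise a system clock edge loads S_D when S_E is enabled.
            self.q = s_d
        # With neither enable asserted, Q keeps its value even on a clock edge.
        return self.q
```

A call such as `RegisterModel().evaluate(a_e=0, a_d=0, s_e=0, s_d=1, clk_posedge=1)` leaves Q unchanged, which reflects the statement that Q is not evaluated when neither enable is asserted.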
The system uses software clocks, which are special enable registers, to control the enable inputs of these register models. A complex user circuit design may contain millions of elements, so the simulation/emulation system will implement millions of elements in the hardware model. Controlling all of these elements individually is costly because the overhead of sending millions of control signals to the hardware model would take longer than evaluating these elements in software. However, even such a complex circuit design usually calls for only a few (1-10) clocks, and clocks alone are sufficient to control the state transitions of a system having only registers and combinational components. The hardware model of the simulation/emulation system uses only registers and combinational components. The system also controls the evaluation of the hardware model through software clocks. In the simulation/emulation system, the hardware models for registers do not have their clocks directly connected to other hardware components; rather, the software kernel controls the values of all clocks. By controlling a few clock signals, the kernel has full control over the evaluation of the hardware model with a negligible amount of coprocessor intervention overhead.
Depending on whether the register model is used as a latch or a flip-flop, the software clock is input to either the asynchronous enable (A_E) or the synchronous enable (S_E) wire line. The application of the software clock from the software model to the hardware model is triggered by edge detection of clock components. When the software kernel detects an edge of a clock component, it sets the clock edge register through the CLK address space. This clock edge register controls the enable input, not the clock input, of the hardware register model. The global system clock still provides the clock input to the hardware register model. The clock edge register, however, provides the software clock signal to the hardware register model through a double-buffered interface. As explained below, the double-buffered interface from the software clock to the hardware model ensures that all the register models are updated synchronously with respect to the global system clock. Thus, the use of the software clock eliminates the risk of hold-time violations.
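Why a shared enable plus a common system clock removes the race can be seen in a toy Python comparison. This sketch is a deliberate simplification and does not model the double-buffered interface itself: it assumes two registers connected as a shift chain, and the function names `synchronous_update` and `skewed_update` are hypothetical.

```python
def synchronous_update(q, data_in, software_clock):
    """Both registers sample on the same system clock edge, gated by the
    software clock acting as their enable."""
    d = [data_in, q[0]]  # next-state inputs taken from pre-edge values
    return [d_i if software_clock else q_i for q_i, d_i in zip(q, d)]


def skewed_update(q, data_in):
    """Contrast case: the first register is clocked early, so the second
    register sees the new value instead of the old one (a race)."""
    q0 = data_in
    q1 = q0
    return [q0, q1]
```

Starting from `q = [0, 1]` with `data_in = 1`, the synchronous update produces `[1, 0]`, preserving the old first-stage value in the second stage, while the skewed update produces `[1, 1]` and loses it.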
Figures 18(A) and 18(B) show the implementation of the building block register model for latches and flip-flops. These register models are controlled by the software clock via the appropriate enable inputs. Depending on whether the register model is used as a flip-flop or a latch, one of the asynchronous ports (A_E, A_D) and synchronous ports (S_E, S_D) is used for the software clock and the other for I/O operations. Figure 18(A) shows the register model implementation when it is used as a latch. Latches are level-sensitive; that is, as long as the clock signal is asserted (e.g., "1"), the output Q follows the input (D). Here, the software clock signal is provided to the asynchronous enable (A_E) input, and the data input is provided to the asynchronous data (A_D) input. For I/O operations, the software kernel uses the synchronous enable (S_E) and synchronous data (S_D) inputs to download values into the Q port. The S_E port is used as a REG space address pointer, and the S_D port is used to access data to and from the local data bus.
Figure 18(B) shows the register model implementation when it is used as a design flip-flop. A design flip-flop uses the following ports to determine the next-state logic: data (D), set (S), reset (R), and enable (E). All the next-state logic of a design flip-flop is factored into a hardware combinational component that feeds into the synchronous data (S_D) input. The software clock is input to the synchronous enable (S_E) input. For I/O operations, the software kernel uses the asynchronous enable (A_E) and asynchronous data (A_D) inputs to download values into the Q port. The A_E port is used as a REG space write address pointer, and the A_D port is used to access data to and from the local data bus.
The software clock will now be discussed. One embodiment of the software clock of the present invention is a clock enable signal to the hardware register model, such that the data at the inputs to these hardware register models are evaluated together and synchronously with the system clock. This eliminates race conditions and hold-time violations. One embodiment of the software clock logic includes clock edge detection logic in software that triggers additional logic in the hardware upon clock edge detection. This enable signal logic generates an enable signal to the enable inputs of the hardware register models before the arrival of data at these hardware register models. The determination of the gated clock network and the gated data network is critical to the successful implementation of the software clock and to the logic evaluation in the hardware model in hardware acceleration mode. As explained earlier, the clock network or gated clock logic is the intersection of the fan-in of the gated clock and the fan-out of the primary clock. Similarly, the gated data logic is the intersection of the fan-in of the gated data and the fan-out of the primary clock for the data signals. These fan-in and fan-out concepts are discussed above in conjunction with Figure 16.
As discussed earlier, primary clocks are generated by test bench processes in software. Derived or gated clocks are generated from a network of combinational logic and registers that are in turn driven by the primary clocks. By default, the simulation/emulation system of the present invention also keeps the derived clocks in software. If the number of derived clocks is small (e.g., fewer than 10), these derived clocks can be modeled as software clocks. Because the number of combinational components that generate these derived clocks is small, modeling these combinational components in software does not add significant I/O overhead. If, however, the number of derived clocks is large (e.g., more than 10), these derived clocks may be modeled in hardware to minimize I/O overhead.
Ultimately, in accordance with one embodiment of the present invention, clock edge detection occurring in software (via the input to the primary clock) can be translated into clock detection in hardware (via the input to a clock edge register). Clock edge detection in software triggers an event in hardware so that the registers in the hardware model receive the clock enable signal before the data signal, ensuring that evaluation of the data signal occurs synchronously with the system clock and that hold-time violations are avoided.
As stated earlier, the simulation/emulation system has the complete model of the user's circuit design in software and some portions of the user's circuit design in hardware. As specified in the kernel, the software can detect the clock edges that affect hardware register values. To ensure that the hardware registers also evaluate their respective inputs, the software/hardware boundary includes a software clock. The software clock ensures that the registers in the hardware model evaluate synchronously with the system clock and without hold-time violations. The software clock essentially controls the enable input of the hardware register components rather than the clock input. The double-buffered approach to implementing the software clock ensures that the registers evaluate synchronously with the system clock, avoids race conditions, and eliminates the need for precise timing controls to avoid hold-time violations.
Figure 19 has shown the embodiment according to clock executive system of the present invention.During beginning, as described in conjunction with Figure 16, determine gated clock logic sum gate control data logic by the analog simulator system.Then separate gate clocked logic and gate data logic.When realizing double buffer, also must separate drive source and double buffering main logic.Therefore, according to fan-in and fan-out analysis, gate data logic 513 and gated clock logic 514 have been separated.
Modeled major clock register 510 comprises one first impact damper 511 and one second impact damper 512, and it is the D register.This major clock by modelling in software, but double buffer by modelling in software and hardware.The clock edge detects in the major clock register 510 that occurs in the software to trigger the software clock signal of hardware model generation to hardware model.Have the data and the address that enter first impact damper 511 on the wire line 519 and 520 respectively.The Q output of first impact damper 511 on wire line 521 links to each other with the D input of second impact damper 512.The Q output of first impact damper 511 also is provided for the clock input of gated clock logic 514 with first impact damper 516 of final drive clock edge register 515 by wire line 522.The Q output of second impact damper 512 is provided for gate data logic 513 with the inputs by the register 518 of wire line 530 final drivings in the circuit model of User Defined design by wire line 523.The startup of second impact damper 512 of major clock register 510 is input as on the wire line 533 the INPUT-EN signal from state machine, and its definite estimation cycle is also correspondingly controlled different signals.
The clock edge register 515 also includes a first buffer 516 and a second buffer 517. The clock edge register 515 is implemented in hardware. When clock edge detection occurs in software (via the input to the primary clock register 510), it can trigger the same clock edge detection in hardware (via the clock edge register 515). The D input to the first buffer 516 on wire line 524 is set to logic "1". The clock signal on wire line 525 is derived from the gated clock logic 514 and ultimately from the output of the first buffer 511 of the primary clock register 510 on wire line 522. This clock signal on wire line 525 is the gated clock signal. The enable input of the first buffer 516, on wire line 526, is the ~EVAL signal from the state machine that controls the I/O and evaluation cycles (discussed below). The first buffer 516 also has a RESET signal on wire line 527. The same RESET signal is also provided to the second buffer 517 of the clock edge register 515. The Q output of the first buffer 516 is provided on wire line 529 to the D input of the second buffer 517. The second buffer 517 also has an enable input for the CLK-EN signal on wire line 528 and a RESET input on wire line 527. The Q output of the second buffer 517 is provided via wire line 532 to the enable input of register 518 in the hardware model of the user's circuit design. Buffers 511, 512, and 517, along with register 518, are clocked by the system clock. Only buffer 516 in the clock edge register 515 is clocked by the gated clock from the gated clock logic 514.
Register 518 is a typical D-type register modeled in hardware and is part of the user's custom circuit design. Its evaluation is strictly controlled by this embodiment of the clock implementation scheme of the present invention. The ultimate goal of this clock setup is to ensure that the clock enable signal on wire line 532 arrives at register 518 before the data signal on wire line 530, so that the register evaluates the data signal in synchronization with the system clock and without race conditions.
To reiterate, the modeled primary clock register 510 is modeled in software, but its double buffer is modeled in both software and hardware. The clock edge register 515 is implemented in hardware. Based on fan-in and fan-out analysis, the gated data logic 513 and the gated clock logic 514 are also separated for modeling purposes, and they can be modeled in software (if the total number of gated data and gated clocks is small) or in hardware (if that number is large). Determining the gated clock network and the gated data network is critical to the successful implementation of the software clock and to the logic evaluation of the hardware model in hardware acceleration mode.
The software clock implementation relies primarily on the clock setup shown in Figure 19 and on the timing of the assertion of the ~EVAL, INPUT-EN, CLK-EN, and RESET signals. The primary clock register 510 detects clock edges to trigger the software clock generation for the hardware model. This clock edge detection event triggers the "activation" of the clock edge register 515 through wire line 522, the gated clock logic 514, and the clock input on wire line 525, so that the clock edge register 515 also detects the same clock edge. In this manner, clock edge detection occurring in software (via inputs 519 and 520 to the primary clock register 510) is translated into clock edge detection in hardware (via input 525 to the clock edge register 515). At this point, neither the INPUT-EN signal on wire line 533 to the second buffer 512 of the primary clock register 510 nor the CLK-EN signal on wire line 528 to the second buffer 517 of the clock edge register 515 has been asserted, so no data evaluation takes place. Thus, the clock edge is detected before the data are evaluated in the hardware register model. Note that at this stage, the data from the data bus on wire line 519 have not yet propagated to the gated data logic 513 and into the hardware-modeled user register 518. Indeed, the data have not even reached the second buffer 512 of the primary clock register 510, because the INPUT-EN signal on wire line 533 has not yet been asserted.
During the I/O stage, the ~EVAL signal on wire line 526 is asserted to enable the first buffer 516 in the clock edge register 515. The ~EVAL signal also passes through the gated clock logic 514, where the gated clock signal is monitored as it makes its way through the gated clock logic to the clock input of the first buffer 516 on wire line 525. Thus, as described below in connection with the four-state evaluation state machine, the ~EVAL signal can be held as long as necessary to stabilize the data and clock signals in the portion of the system illustrated in Figure 19.
Once the signals have stabilized, the I/O has concluded, or the system is otherwise ready to evaluate the data, the ~EVAL signal is deasserted to disable the first buffer 516. The CLK-EN signal is asserted and applied to the second buffer 517 via wire line 528 to enable the second buffer 517 and to pass the logic "1" value on wire line 529 to its Q output, which is connected via wire line 532 to the enable input of register 518. Register 518 is now enabled, and any data on wire line 530 will be clocked into register 518 in synchronization with the system clock. As the reader can observe, the enable signal reaches register 518 faster than the data signal to be evaluated by that register.
The INPUT-EN signal on wire line 533 has not yet been asserted to the second buffer 512. Next, the RESET edge-register signal on wire line 527 is asserted to buffers 516 and 517 in the clock edge register 515 to reset these buffers and ensure that their outputs are logic "0". The INPUT-EN signal is now asserted to buffer 512, and the data on wire line 521 propagate to the gated data logic 513 and on to the user's circuit register 518 via wire line 530. Because the enable input of register 518 is now logic "0", the data on wire line 530 cannot be clocked into register 518. The previous data, however, were already clocked in by the enable signal previously asserted on wire line 532, before the RESET signal was asserted to disable register 518. Thus, the input data to register 518, as well as the inputs to the other registers that are part of the user's hardware-modeled circuit design, stabilize at their respective register input ports. When a clock edge is subsequently detected in software, the primary clock register 510 and the clock edge register 515 in hardware activate the enable input of register 518, so that the data waiting at the input of register 518, together with the data waiting at the inputs of the other registers, are clocked in together and in synchronization with the system clock.
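The ordering described above (latch the data already waiting, remove the enable, then release new data) can be illustrated with a small behavioral sketch. This is not the patented circuit: the function names, the two-stage shift register, and the list-based representation are assumptions chosen only to show why a value cannot race through two registers on one edge.

```python
def software_clock_edge(q, d_pending, new_input):
    """Apply one software clock edge to a chain of enabled registers.

    q         -- values currently stored in the registers (Q outputs)
    d_pending -- values already waiting at the registers' D inputs
    new_input -- value released toward the first register by INPUT-EN
    (All three names are hypothetical, for illustration only.)
    """
    # CLK-EN asserted: every register latches the data already waiting at
    # its D input, all on the same system clock edge.
    q = list(d_pending)
    # RESET removes the enable, then INPUT-EN releases new data. The new
    # values travel to the D inputs and wait there until the next edge.
    d_pending = [new_input] + q[:-1]
    return q, d_pending


def shift_through(inputs, stages=2):
    """Feed a sequence of inputs into a shift register of `stages` registers."""
    q, d_pending = [0] * stages, [0] * stages
    history = []
    for value in inputs:
        q, d_pending = software_clock_edge(q, d_pending, value)
        history.append(tuple(q))
    return history
```

In this model each value advances exactly one stage per edge. A hold-time violation would show up as a value appearing in two consecutive stages after a single edge, which the latch-before-release ordering rules out.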
As mentioned above, the software clock implementation relies primarily on the clock setup shown in Figure 19 and on the timing of the assertion of the ~EVAL, INPUT-EN, CLK-EN, and RESET signals. Figure 20 shows a four-state finite state machine that controls the software clock logic of Figure 19 according to one embodiment of the present invention.
At state 540, the system is idle or some I/O operation is under way. The EVAL signal is logic "0". The EVAL signal determines the evaluation cycle; it is generated by the system controller and lasts as many clock cycles as needed to stabilize the logic in the system. Typically, the duration of the EVAL signal is determined by the placement scheme during compilation and is based on the length of the longest direct wire and the length of the longest segmented multiplexed wire (i.e., TDM circuit). During evaluation, the EVAL signal is logic "1".
At state 541, the clock is enabled. The CLK-EN signal is asserted at logic "1", and thus the enable signal to the hardware register model is asserted. Here, the previously gated data at the hardware register model are evaluated synchronously, without risk of hold-time violation.
At state 542, the new data are enabled when the INPUT-EN signal is asserted at logic "1". The RESET signal is also asserted to remove the enable signal from the hardware register model. The new data that have been enabled into the hardware register model through the gated data logic network, however, continue to propagate to their intended hardware register model destinations, or have reached their destinations and wait to be clocked into the hardware register model when the enable signal is asserted again.
At state 543, the propagating new data stabilize in the logic while the EVAL signal remains at logic "1". The multiplexed wires are also at logic "1", as explained above in connection with Figures 9(A), 9(B), and 9(C) in the description of the time-division multiplexed (TDM) circuit. When the EVAL signal is deasserted, or set to logic "0", the system returns to the idle state 540 and waits to evaluate upon detection of a clock edge by the software.
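The four states can be summarized as a small state machine sketch. The state numbers and signal names come from the description of Figure 20; the enum names, the per-state output table, and the `eval_hold_cycles` parameter are illustrative assumptions. In particular, treating EVAL as logic "1" in states 541 and 542 is an assumption; the text only states that it is "0" in state 540 and remains "1" in state 543.

```python
from enum import Enum


class State(Enum):
    IDLE = 540          # idle or I/O in progress
    CLOCK_ENABLE = 541  # CLK-EN asserted, waiting data are latched
    INPUT_ENABLE = 542  # INPUT-EN and RESET asserted, new data released
    EVAL = 543          # new data settle while EVAL is held


# Signal levels per state (EVAL = 1 in 541 and 542 is an assumption).
OUTPUTS = {
    State.IDLE:         dict(EVAL=0, CLK_EN=0, INPUT_EN=0, RESET=0),
    State.CLOCK_ENABLE: dict(EVAL=1, CLK_EN=1, INPUT_EN=0, RESET=0),
    State.INPUT_ENABLE: dict(EVAL=1, CLK_EN=0, INPUT_EN=1, RESET=1),
    State.EVAL:         dict(EVAL=1, CLK_EN=0, INPUT_EN=0, RESET=0),
}


def next_state(state, clock_edge_detected=False, eval_done=False):
    """Return the state that follows `state`."""
    if state is State.IDLE:
        return State.CLOCK_ENABLE if clock_edge_detected else State.IDLE
    if state is State.CLOCK_ENABLE:
        return State.INPUT_ENABLE
    if state is State.INPUT_ENABLE:
        return State.EVAL
    # State 543: stay until EVAL is deasserted, then return to idle.
    return State.IDLE if eval_done else State.EVAL


def run_cycle(eval_hold_cycles=3):
    """Trace one evaluation triggered by a software-detected clock edge."""
    trace = [State.IDLE]
    state = next_state(State.IDLE, clock_edge_detected=True)
    held = 0
    while state is not State.IDLE:
        trace.append(state)
        if state is State.EVAL:
            held += 1
        state = next_state(state, eval_done=(held >= eval_hold_cycles))
    trace.append(state)
    return trace
```

The hold count stands in for the EVAL duration, which in the described system is fixed at compile time from the longest wire and the longest TDM path.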
D. FPGA Array and Control
The simulation/emulation system first compiles the user's circuit design data into software and hardware models based on a variety of controls, including component type. During the hardware compilation process, as described in connection with Figure 6, the system performs the mapping, placement, and routing process to optimally partition, place, and interconnect the various components that make up the user's circuit design. Using known programming tools, the bitstream configuration files or Programmer Object Files (.pof) (or, alternatively, raw binary files (.rbf)) are used to reconfigure a hardware board containing a number of FPGA chips. Each chip contains a portion of the hardware model corresponding to the user's circuit design.
In one embodiment, the simulation/emulation system uses a 4x4 array of FPGA chips, for a total of 16 chips. Examples of FPGA chips include the Xilinx XC4000 series of FPGA logic devices and the Altera FLEX 10K devices.
The Xilinx XC4000 series of FPGAs can be used, including the XC4000, XC4000A, XC4000D, XC4000H, XC4000E, XC4000EX, XC4000L, and XC4000XL. Particular FPGAs include the Xilinx XC4005H, XC4025, and XC4028EX. The Xilinx XC4028EX FPGA supports nearly 500,000 gates on a single PCI board. Details of these Xilinx FPGAs can be found in the data book [Xilinx, The Programmable Logic Data Book] (9/96), which is incorporated herein by reference. Details of the Altera FPGAs can be found in the data book [Altera, 1996 Data Book] (June 1996), which is incorporated herein by reference.
A brief description of the XC4025 FPGA is provided here. Each array chip consists of a 240-pin Xilinx chip. An array board populated with Xilinx XC4025 chips contains approximately 440,000 configurable gates and can perform computationally intensive tasks. The Xilinx XC4025 chip contains 1,024 configurable logic blocks (CLBs). Each CLB can implement 32 bits of asynchronous SRAM, or a small amount of general Boolean logic, and two strobed registers. On the periphery of the chip, unstrobed I/O registers are provided. The XC4005H can be used as an alternative to the XC4025; this yields a lower-cost array board with 120,000 configurable gates. The XC4005H devices have high-power 24 mA drive circuits but lack the I/O flip-flops of the standard XC4000 series. Details of these and other Xilinx FPGAs are available in the publicly available data sheets, which are incorporated herein by reference.
The functionality of the Xilinx XC4000 series FPGAs can be customized by loading configuration data into internal memory cells. The values stored in these memory cells determine the logic functions and interconnections in the FPGA. The configuration data of these FPGAs can be stored in on-chip memory and can be loaded from external memory. The FPGA can either read its configuration data from an external serial or parallel PROM, or the configuration data can be written into the FPGA from an external device. These FPGAs can be reprogrammed many times, especially when the hardware is changed dynamically or when users want the hardware adapted to different applications.
The XC4000 series FPGAs generally have up to 1,024 CLBs. Each CLB has two levels of look-up tables, in which two 4-input look-up tables (function generators F and G) provide some of the inputs to a third, 3-input look-up table (function generator H), along with two flip-flops or latches. The outputs of these look-up tables can be driven independently of the flip-flops or latches. A CLB can implement the following combinations of arbitrary Boolean functions: (1) any function of four or five variables; (2) any function of four variables, any second function of up to four unrelated variables, and any third function of up to three unrelated variables; (3) one function of four variables and another function of six variables; (4) any two functions of four variables; and (5) some functions of nine variables. Two D-type flip-flops or latches are available for registering CLB inputs or for storing look-up table outputs. These flip-flops can be used independently of the look-up tables. DIN can be used as a direct input to either of these two flip-flops or latches, and H1 can drive the other through the H function generator.
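To make the two-level look-up structure concrete, the following sketch models F and G as 4-input truth tables feeding a 3-input table H, and uses 9-input parity as one example of a nine-variable function that fits in a single CLB. The helper names (`make_lut`, `clb_parity9`) and the dictionary representation of a look-up table are assumptions for illustration, not a description of the device's configuration format.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Precompute a truth table for an n-input Boolean function."""
    return {bits: func(*bits) & 1 for bits in product((0, 1), repeat=n_inputs)}


# Function generators F and G: 4-input look-up tables (here, 4-bit XOR).
F = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
G = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
# Function generator H: 3-input look-up table fed by F, G, and input H1.
H = make_lut(lambda f, g, h1: f ^ g ^ h1, 3)


def clb_parity9(f_in, g_in, h1):
    """Evaluate 9-input parity using F and G as the first level and H as the second."""
    return H[(F[tuple(f_in)], G[tuple(g_in)], h1)]
```

For example, `clb_parity9((1, 0, 0, 0), (1, 1, 0, 0), 1)` looks up F = 1 and G = 0, then H(1, 0, 1) = 0, which equals the parity of the four ones among the nine inputs.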
Each 4-input function generator in a CLB (i.e., F and G) contains dedicated arithmetic logic for the fast generation of carry and borrow signals, and can be configured as a 2-bit adder with carry-in and carry-out. These function generators can also be implemented as read/write random access memory (RAM). The four input lines are then used as address lines for the RAM.
The Altera FLEX 10K chips are somewhat similar in concept. These chips are SRAM-based programmable logic devices (PLDs) with multiple 32-bit buses. In particular, each FLEX 10K100 chip contains approximately 100,000 gates, 12 embedded array blocks (EABs), 624 logic array blocks (LABs), 8 logic elements (LEs) per LAB (4,992 LEs in total), 5,392 flip-flops or registers, 406 I/O pins, and 503 total pins.
The Altera FLEX 10K chips contain an embedded array of embedded array blocks (EABs) and a logic array of logic array blocks (LABs). An EAB can be used to implement various memory functions (e.g., RAM, ROM, FIFO) and complex logic functions (e.g., digital signal processors (DSPs), microcontrollers, multipliers, data transformation functions, state machines). As a memory function implementation, the EAB provides 2,048 bits. As a logic function implementation, the EAB provides 100 to 600 gates.
A LAB, via its LEs, can be used to implement medium-sized blocks of logic. Each LAB represents approximately 96 logic gates and contains 8 LEs and a local interconnect. An LE contains a 4-input look-up table, a programmable flip-flop, and dedicated signal paths for carry and cascade functions. Typical logic functions that can be built include counters, address decoders, and small state machines.
A more detailed description of the Altera FLEX 10K chips can be found in [Altera, 1996 Data Book] (June 1996), which is incorporated herein by reference. The data book also contains details on the supporting programming software.
Figure 8 shows one embodiment of the 4x4 FPGA array and its interconnections.
Note that this embodiment of the simulation/emulation system does not use crossbar or partial crossbar connections among the FPGA chips. The FPGA chips include chips F11 to F14 in the first row, chips F21 to F24 in the second row, chips F31 to F34 in the third row, and chips F41 to F44 in the fourth row. In one embodiment, each FPGA chip (e.g., chip F23) has the following pins for the interface to the FPGA I/O controller of the simulation/emulation system:
Interface                Pins
Data bus                 32
SPACE index              3
READ, WRITE, EVAL        3
DATA XSFR                1
Address pointer chain    1
Total                    41
Thus, in one embodiment, each FPGA chip uses only 41 pins for interfacing with the simulation/emulation system. These pins are discussed further in connection with Figure 22.
These FPGA chips are interconnected with one another via non-crossbar or non-partial-crossbar interconnections. Each interconnection between chips, such as interconnection 602 between chip F11 and chip F14, represents 44 pins or 44 wire lines. In other embodiments, each interconnection represents more than 44 pins. In still other embodiments, each interconnection represents fewer than 44 pins.
Each chip has six interconnections. For example, chip F11 has interconnections 600 to 605. Likewise, chip F33 has interconnections 606 to 611. These interconnections run horizontally along a row and vertically along a column. Each interconnection provides a direct connection between two chips in the same row or between two chips in the same column. Thus, for example, interconnection 600 directly connects chips F11 and F13; interconnection 601 directly connects chips F11 and F12; interconnection 602 directly connects chips F11 and F14; interconnection 603 directly connects chips F11 and F31; interconnection 604 directly connects chips F11 and F21; and interconnection 605 directly connects chips F11 and F41.
Similarly, for chip F33, which is not located at the edge of the array (unlike chip F11), interconnection 606 directly connects chips F33 and F13; interconnection 607 directly connects chips F33 and F23; interconnection 608 directly connects chips F33 and F34; interconnection 609 directly connects chips F33 and F43; interconnection 610 directly connects chips F33 and F31; and interconnection 611 directly connects chips F33 and F32.
Because chip F11 is located within one hop of chip F13, interconnection 600 is labeled "1". Because chip F11 is located within one hop of chip F12, interconnection 601 is labeled "1". Similarly, because chip F11 is located within one hop of chip F14, interconnection 602 is labeled "1". Likewise, for chip F33, all interconnections are labeled "1".
This interconnect scheme allows each chip to communicate with any other chip in the array within two "jumps" or interconnections. Thus, chip F11 can be connected to chip F33 through either of two paths: (1) interconnection 600 to interconnection 606; or (2) interconnection 603 to interconnection 610. In short, the path can be either (1) first along a row and then along a column, or (2) first along a column and then along a row.
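The two-hop property follows from the rule that every chip is wired directly to the other chips in its row and in its column. A minimal sketch, assuming chips are named F<row><column> as in Figure 8 (the function names are hypothetical):

```python
CHIPS = [f"F{r}{c}" for r in range(1, 5) for c in range(1, 5)]


def directly_connected(a, b):
    """Two distinct chips share an interconnection if they share a row or a column."""
    return a != b and (a[1] == b[1] or a[2] == b[2])


def paths(src, dst):
    """Return every route from src to dst that uses at most two interconnections."""
    if directly_connected(src, dst):
        return [[src, dst]]
    return [
        [src, mid, dst]
        for mid in CHIPS
        if directly_connected(src, mid) and directly_connected(mid, dst)
    ]
```

Calling `paths("F11", "F33")` yields the two routes described above: through F13 (along the row, then the column) and through F31 (along the column, then the row).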
Although Figure 8 shows the FPGA chips configured as a 4x4 array with horizontal and vertical interconnections, the actual physical implementation on the circuit board uses low and high banks with an expansion piggyback board. Thus, in one embodiment, chips F41-F44 and F21-F24 are in the low bank, and chips F31-F34 and F11-F14 are in the high bank. The piggyback board contains chips F11-F14 and chips F21-F24. Thus, to expand the array, piggyback boards containing a number of chips (e.g., 8) are added to the banks, above the row currently containing chips F11-F14. In another embodiment, the piggyback board expands the array below the row currently containing chips F41-F44. Other embodiments allow expansion to the right of chips F14, F24, F34, and F44. Still other embodiments allow expansion to the left of chips F11, F21, F31, and F41.
Figure 7 shows the connectivity matrix, expressed in terms of "0" and "1", for the 4x4 FPGA array shown in Figure 8. This connectivity matrix is used to generate the placement cost produced by the cost function used in the hardware mapping, placement, and routing process of the simulation/emulation system. The cost function was discussed above in connection with Figure 6. As an example, chip F11 is located within one hop of chip F13, so the connectivity matrix entry for F11-F13 is "1".
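A connectivity matrix of this kind can be generated directly from the row-and-column wiring rule. The sketch below is illustrative only: the function name is hypothetical, and setting the diagonal (a chip paired with itself) to "0" is an assumption, since the text does not say how Figure 7 treats those entries.

```python
def connection_matrix(rows=4, cols=4):
    """Build a 0/1 matrix where 1 means the two chips are within one hop."""
    chips = [(r, c) for r in range(1, rows + 1) for c in range(1, cols + 1)]
    names = [f"F{r}{c}" for r, c in chips]
    matrix = [
        # 1 when the chips differ and share a row or a column
        # (diagonal entries are set to 0 by assumption).
        [int(a != b and (a[0] == b[0] or a[1] == b[1])) for b in chips]
        for a in chips
    ]
    return names, matrix
```

Each row of the resulting 16x16 matrix contains six ones, matching the six interconnections per chip, and a placement cost function could treat a "0" entry as requiring two hops.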
Figure 21 shows the interconnect pin-outs of a single FPGA chip according to one embodiment of the present invention. Each chip has six sets of interconnections, where each set comprises a particular number of pins. In one embodiment, each set has 44 pins. The interconnections of each FPGA chip are oriented horizontally (east-west) and vertically (north-south). The set of interconnections toward the west is labeled W[43:0]. The set toward the east is labeled E[43:0]. The set toward the north is labeled N[43:0]. The set toward the south is labeled S[43:0]. These complete sets of interconnections are for connections to adjacent chips; that is, these interconnections do not "hop" over any chip. For example, in Figure 8, chip F33 has interconnection 607 for N[43:0], interconnection 608 for E[43:0], interconnection 609 for S[43:0], and interconnection 611 for W[43:0].
Returning to Figure 21, two additional sets of interconnections remain. One set is for the non-adjacent vertical interconnections, YH[21:0] and YH[43:22]. The other set is for the non-adjacent horizontal interconnections, XH[21:0] and XH[43:22]. Each set, YH[...] and XH[...], is divided into two halves of 22 pins each. This configuration allows every chip to be manufactured identically. Thus, each chip can be interconnected, in one hop, to a non-adjacent chip located above, below, to the left, or to the right. This FPGA chip also shows the pins for global signals, the FPGA bus, and JTAG signals.
The FPGA I/O controller will now be discussed. This controller was briefly introduced earlier, in Figure 10, as item 327. The FPGA I/O controller manages the data and control traffic between the PCI bus and the FPGA array.
Figure 22 shows one embodiment of the FPGA controller between the PCI bus and the FPGA array, along with the banks of FPGA chips. The FPGA I/O controller 700 includes the CTRL_FPGA unit 701, clock buffer 702, PCI controller 703, EEPROM 704, FPGA serial configuration interface 705, boundary scan test interface 706, and buffer 707. Appropriate power regulating circuitry, as known to those skilled in the art, is provided. Exemplary sources include Vcc coupled to a voltage detector/regulator and a sense amplifier, which substantially maintains the voltage under various environmental conditions. The Vcc to each FPGA chip is provided with fast-acting thin-film fuses. Vcc-HI is provided to the CONFIG# of all FPGA chips and to the LINTI# of the LOCAL_BUS 708.
The CTRL_FPGA unit 701 is the primary controller of the FPGA I/O controller 700, handling the various control, test, and read/write operations for substantial amounts of data among the various units and buses. The CTRL_FPGA unit 701 is coupled to the low and high banks of FPGA chips. FPGA chips F41-F44 and F21-F24 (i.e., the low bank) are coupled to the low FPGA bus 718. FPGA chips F31-F34 and F11-F14 (i.e., the high bank) are coupled to the high FPGA bus 719. These FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 correspond to the FPGA chips in Figure 8 and retain their reference numbers.
Between these FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 and the low bank bus 718 and high bank bus 719 are thick-film chip resistors for appropriate loading. Resistor group 713, coupled to the low bank bus 718, includes, for example, resistor 716 and resistor 717. Resistor group 712, coupled to the high bank bus 719, includes, for example, resistor 714 and resistor 715.
If expansion is desired, more FPGA chips may be installed on the low bank bus 718 and the high bank bus 719 to the right of FPGA chips F11 and F21. In one embodiment, expansion is accomplished through piggyback boards resembling piggyback board 720. Thus, if the banks of FPGA chips initially contained only the eight FPGA chips F41-F44 and F31-F34, further expansion is possible by adding piggyback board 720, which contains FPGA chips F24-F21 in the low bank and chips F14-F11 in the high bank. The piggyback board 720 also includes the additional low and high bank buses and the thick-film chip resistors.
The PCI controller 703 is the primary interface between the FPGA I/O controller 700 and the 32-bit PCI bus 709. If the PCI bus expands to 64 bits and/or 66 MHz, appropriate adjustments can be made to the system without departing from the spirit and scope of the present invention. These adjustments are discussed below. One example of a PCI controller 703 that may be used in the system is the PCI9080 or 9060 from PLX Technology. The PCI9080 has the appropriate local bus interface, control registers, FIFOs, and PCI interface to the PCI bus. The PLX Technology data book, [PCI9080 Data Sheet] (ver. 0.93, February 28, 1997), is incorporated herein by reference.
The PCI controller 703 passes data between the CTRL_FPGA unit 701 and the PCI bus 709 via the LOCAL_BUS 708. LOCAL_BUS includes a control bus portion, an address bus portion, and a data bus portion for control signals, address signals, and data signals, respectively. If the PCI bus expands to 64 bits, the data bus portion of LOCAL_BUS 708 can also expand to 64 bits. The PCI controller 703 is coupled to EEPROM 704, which contains the configuration data for the PCI controller 703. An exemplary EEPROM 704 is the National Semiconductor 93CS46.
The PCI bus 709 supplies a 33 MHz clock signal to the FPGA I/O controller 700. The clock signal is provided to clock buffer 702 via wire line 710 for synchronization purposes and for low timing skew. The output of this clock buffer 702 is the 33 MHz global clock (GL_CLK) signal, which is supplied to all the FPGA chips via wire line 711 and to the CTRL_FPGA unit 701 via wire line 721. If the PCI bus expands to 66 MHz, the clock buffer will also supply a 66 MHz signal to the system.
The FPGA serial configuration interface 705 provides configuration data to configure the FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44. The Altera data book, [Altera, 1996 Data Book] (June 1996), provides detailed information on the configuration devices and processes. The FPGA serial configuration interface 705 is also coupled to LOCAL_BUS 708 and the parallel port 721. In addition, the FPGA serial configuration interface 705 is coupled to the CTRL_FPGA unit 701 and to the FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 via CONF_INTF wire line 723.
The boundary scan test interface 706 provides a JTAG implementation of a specified test command set for externally checking the logic units and circuits of a processor or system by software. This interface 706 complies with the IEEE Std. 1149.1-1990 specification. See the Altera data book, [Altera, 1996 Data Book] (June 1996), and [Application Note 39] (JTAG Boundary-Scan Testing in Altera Devices) for more information; both are incorporated herein by reference. The boundary scan test interface 706 is also coupled to LOCAL_BUS 708 and the parallel port 722. In addition, the boundary scan test interface 706 is coupled to the CTRL_FPGA unit 701 and to the FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 via BST_INTF wire line 724.
The CTRL_FPGA unit 701 passes data to and from the low bank (chips F41-F44 and F21-F24) and the high bank (chips F31-F34 and F11-F14) of FPGA chips via the low bank 32-bit bus 718 and the high bank 32-bit bus 719, respectively, and buffer 707. F_BUS 725 carries the low bank 32 bits FD[31:0], and F_BUS 726 carries the high bank 32 bits FD[63:32].
In one embodiment, the low bank bus 718 and the high bank bus 719 together are twice as wide as the PCI bus 709. The PCI bus 709 is 32 bits wide at 33 MHz, so its throughput is 132 MB/s (= 33 MHz * 4 bytes). The low bank bus 718 is 32 bits wide at half the PCI bus frequency (33/2 MHz = 16.5 MHz). The high bank bus 719 is also 32 bits wide at half the PCI bus frequency (33/2 MHz = 16.5 MHz). The throughput of the combined 64-bit low and high bank buses is therefore also 132 MB/s (= 16.5 MHz * 8 bytes). Thus, the throughput of the low and high bank buses matches that of the PCI bus. In other words, any performance limitation lies in the PCI bus, not in the low and high bank buses.
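The throughput figures above can be checked with a short calculation. The helper name is hypothetical; the frequencies and bus widths are the ones stated in the text.

```python
def throughput_mb_per_s(freq_mhz, width_bits):
    """Peak throughput in megabytes per second for a bus of the given width."""
    return freq_mhz * width_bits / 8


pci_bus = throughput_mb_per_s(33, 32)        # 32-bit PCI bus at 33 MHz
fpga_banks = throughput_mb_per_s(33 / 2, 64)  # low + high banks, 64 bits at 16.5 MHz
```

Both values come to 132 MB/s: halving the clock while doubling the width leaves the peak rate unchanged.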
According to one embodiment of the present invention, the address pointers for each software/hardware boundary address space are implemented in each FPGA chip. These address pointers are chained across several FPGA chips through a multiplexed cross-chip address pointer chain. See the address pointer discussion above in connection with Figures 9, 11, 12, 14, and 15. To move the word selection signal across the chain of address pointers associated with a given address space and across several chips, chain-out wire lines must be provided. These chain-out wire lines are shown as arrows between the chips. One such chain-out wire line for the low bank is wire line 730 between chips F23 and F22. Another chain-out wire line, for the high bank, is wire line 731 between chips F31 and F32. The chain-out wire line 732 at the end of the low bank, at chip F21, is coupled to the CTRL_FPGA unit 701 as LAST_SHIFT_L. The chain-out wire line 733 at the end of the high bank, at chip F11, is coupled to the CTRL_FPGA unit 701 as LAST_SHIFT_H. These signals LAST_SHIFT_L and LAST_SHIFT_H are the word selection signals for their respective banks as the word selection signals propagate through the FPGA chips. When either of these signals presents a logic "1" to the CTRL_FPGA unit 701, the word selection signal has made its way to the end chip of its respective bank.
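The chained word selection can be pictured as a single token that moves from one address pointer to the next and crosses chip boundaries over the chain-out lines. The sketch below is a simplified behavioral model under stated assumptions: the function name, the per-chip pointer counts, and the chip ordering passed in are hypothetical, and the last-shift flag is raised only at the final pointer of the final chip.

```python
def shift_word_select(pointers_per_chip):
    """Trace the word selection token through a bank's address pointer chain.

    pointers_per_chip -- mapping of chip name to its number of address
                         pointers, listed in chain order (hypothetical data).
    Returns a list of (chip, pointer_index, last_shift) tuples, one per MOVE.
    """
    chips = list(pointers_per_chip)
    steps = []
    for i, chip in enumerate(chips):
        count = pointers_per_chip[chip]
        for pointer in range(count):
            # The last-shift flag goes to 1 only when the token reaches
            # the final pointer of the end chip in the bank.
            at_end = i == len(chips) - 1 and pointer == count - 1
            steps.append((chip, pointer, int(at_end)))
    return steps
```

With a hypothetical chain `{"F23": 2, "F22": 1, "F21": 2}`, the token visits five pointers and the flag is "1" only on the last step, at chip F21, which corresponds to LAST_SHIFT_L reaching the CTRL_FPGA unit.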
The CTRL_FPGA unit 701 exchanges the following signals with the FPGA chips: the write signal (F_WR) on wire line 734, the read signal (F_RD) on wire line 735, the DATA_XSFR signal on wire line 736, the EVAL signal on wire line 737, and the SPACE[2:0] signal on wire line 738. The CTRL_FPGA unit 701 receives the EVAL_REQ# signal on wire line 739. The write signal (F_WR), read signal (F_RD), DATA_XSFR signal, and SPACE[2:0] signal work together to operate the address pointers in the FPGA chips. The write signal (F_WR), read signal (F_RD), and SPACE[2:0] signal are used to generate the MOVE signal for the address pointers associated with the address space selected by the SPACE index (SPACE[2:0]). The DATA_XSFR signal is used to initialize the address pointers and begin the word-by-word data transfer process.
The EVAL_REQ# signal is used to restart the evaluation cycle if any of the FPGA chips asserts it. For example, to evaluate data, the data are transferred or written from main memory in the host processor's computing station to the FPGAs via the PCI bus. At the end of the transfer, the evaluation cycle begins, including address pointer initialization and the operation of the software clock to facilitate the evaluation process. For a variety of reasons, however, a particular FPGA chip may need to evaluate the data again. That FPGA chip asserts the EVAL_REQ# signal, and the CTRL_FPGA unit 701 starts the evaluation cycle again.
Figure 23 shows a more detailed view of the CTRL_FPGA unit 701 and buffer 707 of Figure 22. The same input/output signals and reference numbers used for the CTRL_FPGA unit 701 in Figure 22 are retained in Figure 23. However, additional signals and wire/bus lines not shown in Figure 22 are given new reference numbers, such as SEM_FPGA output enable 1016, local interrupt output (Local INTO) 708a, local read 708b, local address bus 708c, local interrupt input (Local INTI#) 708d, and local data bus 708e.
The CTRL_FPGA unit 701 includes transfer-done checking logic (XSFR_DONE logic) 1000, evaluation control logic (EVAL logic) 1001, DMA descriptor block 1002, control register 1003, evaluation timer logic (EVAL timer) 1004, address decoder 1005, write flag sequencer logic 1006, FPGA chip read/write control logic (SEM_FPGA R/W logic) 1007, demultiplexer and latch (DEMUX logic) 1008, and latches 1009-1012, which correspond to the buffer 707 of Figure 22. A global clock signal (CTRL_FPGA_CLK) on line/bus 721 is provided to all logic elements/blocks in the CTRL_FPGA unit 701.
The transfer-done checking logic (XSFR_DONE logic) 1000 receives LAST_SHIFT_H 733, LAST_SHIFT_L 732, and local INTO 708a. The XSFR_DONE logic 1000 outputs a transfer-done signal (XSFR_DONE) on line/bus 1013 to the EVAL logic 1001. Based on the reception of LAST_SHIFT_H 733 and LAST_SHIFT_L 732, the XSFR_DONE logic 1000 checks for the completion of the data transfer so that the evaluation cycle can begin when required.
The EVAL logic 1001 receives the EVAL_REQ# signal on line/bus 739 and the WR_XSFR/RD_XSFR signal on line/bus 1015, in addition to the transfer-done signal (XSFR_DONE) on line/bus 1013. The EVAL logic 1001 generates two output signals: Start EVAL on line/bus 1014 and DATA_XSFR on line/bus 736. The EVAL logic indicates when a data transfer between the FPGA bus and the PCI bus is about to begin, so that the address pointers can be initialized. It receives the XSFR_DONE signal when the data transfer is complete. The WR_XSFR/RD_XSFR signal indicates whether the transfer is a read or a write. Once the I/O cycle is complete (or before an I/O cycle begins), the EVAL logic can start the evaluation cycle by sending the Start EVAL signal to the EVAL timer. The EVAL timer dictates the duration of the evaluation cycle. It ensures the successful operation of the software clock mechanism by keeping the evaluation cycle active for as long as is needed for data to propagate and stabilize in all the registers and combinational components.
The DMA descriptor block 1002 receives the local bus address on line/bus 1019, a write enable signal on line/bus 1020 from the address decoder 1005, and local bus data on line/bus 1029 via the local data bus 708e. Its output is the DMA descriptor output on line/bus 1046, which goes to the DEMUX logic 1008 on line/bus 1045. The DMA descriptor block 1002 holds descriptor block information corresponding to that in host memory, including the PCI address, the local address, the transfer count, the transfer direction, and the address of the next descriptor block. The host sets up the address of the initial descriptor block in the descriptor pointer register of the PCI controller. A transfer is started by setting a control bit. The PCI controller loads the first descriptor block and begins the data transfer. It continues to load descriptor blocks and transfer data until it detects that the end-of-chain bit is set in the next descriptor pointer register.
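The chained-descriptor transfer described above can be sketched as a walk over a linked list. This is a minimal illustration only: the field names, the `end_of_chain` flag, the byte unit for the transfer count, and the dictionary standing in for memory are all assumptions made for the example, not the register layout of any particular PCI controller.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DmaDescriptor:
    pci_address: int     # address on the PCI (host memory) side
    local_address: int   # address on the local bus side
    transfer_count: int  # amount to move; bytes is an assumption
    to_local: bool       # transfer direction: True = PCI -> local bus
    next_address: int    # address of the next descriptor block
    end_of_chain: bool   # hypothetical model of the end-of-chain bit


def run_chain(descriptors: Dict[int, DmaDescriptor], first_address: int) -> List[str]:
    """Load descriptor blocks one after another until end-of-chain is seen."""
    log = []
    address = first_address  # value the host wrote to the descriptor pointer
    while True:
        d = descriptors[address]  # "load the descriptor block"
        direction = "PCI->local" if d.to_local else "local->PCI"
        log.append(
            f"{direction} {d.transfer_count} bytes "
            f"pci=0x{d.pci_address:08x} local=0x{d.local_address:04x}"
        )
        if d.end_of_chain:
            break
        address = d.next_address
    return log


if __name__ == "__main__":
    chain = {
        0x100: DmaDescriptor(0x80000000, 0x0000, 256, True, 0x120, False),
        0x120: DmaDescriptor(0x80000100, 0x0100, 128, False, 0, True),
    }
    for line in run_chain(chain, 0x100):
        print(line)
```

The point of the chain is that the host only has to write one pointer and set one control bit; every later block is found through the previous block's next-descriptor address.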
The address decoder 1005 receives and transmits local R/W control signals on bus 708b, and receives and transmits local address signals on bus 708c. The address decoder 1005 generates a write enable signal on line/bus 1020 to the DMA descriptor block 1002, a write enable signal on line/bus 1021 to the control register 1003, the FPGA address SPACE index on line/bus 738, a control signal on line/bus 1027, and another control signal on line/bus 1024 to the DEMUX logic 1008.
The control register 1003 receives the write enable signal on line/bus 1021 from the address decoder 1005, and data on line/bus 1030 via the local data bus 708e. The control register 1003 generates the WR_XSFR/RD_XSFR signal on line/bus 1015 to the EVAL logic 1001, a Set EVAL time signal on line/bus 1041 to the EVAL timer 1004, and a SEM_FPGA output enable signal on line/bus 1016 to the FPGA chips. The system uses the SEM_FPGA output enable signal to selectively turn on or enable each FPGA chip. Typically, the system enables one FPGA chip at a time.
The EVAL timer 1004 receives the Start EVAL signal on line/bus 1014 and the Set EVAL time signal on line/bus 1041. The EVAL timer 1004 generates the EVAL signal on line/bus 737, an evaluation-done (EVAL_DONE) signal on line/bus 1017, and a Start write flag signal on line/bus 1018 to the write flag sequencer logic 1006. In one embodiment, the EVAL timer is 6 bits long.
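A small behavioral model may help show what a 6-bit evaluation timer implies. The sketch below assumes a down-counter clocked by CTRL_FPGA_CLK that holds EVAL active for the programmed number of cycles and then raises EVAL_DONE; the counting direction, the method names, and the one-tick-per-clock granularity are illustrative assumptions, not details taken from the text.

```python
class EvalTimer:
    """Behavioral sketch of a 6-bit evaluation timer (assumed down-counter)."""

    WIDTH_BITS = 6
    MAX_COUNT = (1 << WIDTH_BITS) - 1  # largest value a 6-bit field can hold

    def __init__(self) -> None:
        self.eval_time = 0      # value loaded by the Set EVAL time signal
        self.remaining = 0      # cycles left in the current evaluation
        self.eval = False       # models the EVAL output
        self.eval_done = False  # models the EVAL_DONE output

    def set_eval_time(self, cycles: int) -> None:
        if not 0 < cycles <= self.MAX_COUNT:
            raise ValueError("evaluation time must fit in 6 bits and be non-zero")
        self.eval_time = cycles

    def start(self) -> None:
        """Models the Start EVAL signal."""
        self.remaining = self.eval_time
        self.eval = self.remaining > 0
        self.eval_done = False

    def tick(self) -> None:
        """Advance by one (assumed) CTRL_FPGA_CLK cycle."""
        if self.eval:
            self.remaining -= 1
            if self.remaining == 0:
                self.eval = False
                self.eval_done = True


if __name__ == "__main__":
    timer = EvalTimer()
    timer.set_eval_time(3)
    timer.start()
    for _ in range(3):
        timer.tick()
    print(timer.eval, timer.eval_done)
```

Under these assumptions the longest programmable evaluation window is `EvalTimer.MAX_COUNT` clock cycles, so the programmed value has to be long enough for data to settle but cannot exceed what six bits can express.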
The write flag sequencer logic 1006 receives the Start write flag signal on line/bus 1018 from the EVAL timer 1004. The write flag sequencer logic 1006 generates a local R/W control signal on line/bus 1022 to the local R/W line/bus 708b, local address signals on line/bus 1023 to the local address bus 708c, a local data signal on line/bus 1028 to the local data bus 708e, and local INTI# on line/bus 708d. On receiving the Start write flag signal, the write flag sequencer logic begins the sequence of control signals that starts the memory write cycles to the PCI bus.
The SEM_FPGA R/W control logic 1007 receives a control signal on line/bus 1027 from the address decoder 1005, and the local R/W control signal on line/bus 1047 via the local R/W control bus 708b. The SEM_FPGA R/W control logic 1007 generates an enable signal on line/bus 1035 to latch 1009, a control signal on line/bus 1025 to the DEMUX logic 1008, an enable signal on line/bus 1037 to latch 1011, an enable signal on line/bus 1040 to latch 1012, the F_WR signal on line/bus 734, and the F_RD signal on line/bus 735. The SEM_FPGA R/W control logic 1007 controls the various write and read data transfers to and from the FPGA low bank and high bank buses.
The DEMUX logic 1008 is a multiplexer and a latch that receives four sets of input signals and outputs one set of signals on line/bus 1026 to the local data bus 708e. The selector signals are the control signal on line/bus 1025 from the SEM_FPGA R/W control logic 1007 and the control signal on line/bus 1024 from the address decoder 1005. The DEMUX logic 1008 receives one set of inputs made up of the EVAL_DONE signal on line/bus 1042, the XSFR_DONE signal on line/bus 1043, and the EVAL signal on line/bus 1044. This single set of signals is labeled with reference number 1048. In any one time period, only one of these three signals (EVAL_DONE, XSFR_DONE, and EVAL) is provided to the DEMUX logic 1008. As the other three sets of input signals, the DEMUX logic 1008 also receives the DMA descriptor output on line/bus 1045 from the DMA descriptor block 1002, a data output on line/bus 1039 from latch 1012, and another data output on line/bus 1034 from latch 1010.
The data buffer between the CTRL_FPGA unit 701 and the FPGA low bank and high bank buses comprises latches 1009 to 1012. Latch 1009 receives local bus data on line/bus 1032 via line/bus 1031 and the local data bus 708e, and an enable signal on line/bus 1035 from the SEM_FPGA R/W control logic 1007. Latch 1009 outputs data on line/bus 1033 to latch 1010.
Latch 1010 receives data on line/bus 1033 from latch 1009, and an enable signal on line/bus 1036 via line/bus 1037 from the SEM_FPGA R/W control logic 1007. Latch 1010 outputs data on line/bus 725 to the FPGA low bank bus and on line/bus 1034 to the DEMUX logic 1008.
Latch 1011 receives data on line/bus 1031 from the local data bus 708e, and an enable signal on line/bus 1037 from the SEM_FPGA R/W control logic 1007. Latch 1011 outputs data on line/bus 726 to the FPGA high bank bus and on line/bus 1038 to latch 1012.
Latch 1012 receives data on line/bus 1038 from latch 1011, and an enable signal on line/bus 1040 from the SEM_FPGA R/W control logic 1007. Latch 1012 outputs data on line/bus 1039 to the DEMUX logic 1008.
Figure 24 shows the 4x4 FPGA (field programmable gate array) array, its relationship to the FPGA banks, and its expansion capability. Like Figure 8, Figure 24 shows the same 4x4 array. The CTRL_FPGA unit 740 is also shown. The low bank chips (chips F41-F44 and F21-F24) and the high bank chips (chips F31-F34 and F11-F14) are arranged in alternating rows. From the bottom row to the top row, the rows of FPGA chips are therefore: low bank, high bank, low bank, high bank. The data transfer chain follows the banks in a predetermined order. Arrow 741 represents the data transfer chain of the low bank. Arrow 742 represents the data transfer chain of the high bank. Arrow 743 represents the JTAG configuration chain, which runs through all 16 chips of the array, from F41 to F44, F34 to F31, F21 to F24, and F14 to F11, and then back to the CTRL_FPGA unit 740.
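The row-by-row order of the JTAG configuration chain and the alternating bank assignment can be reproduced with a few lines of code. The sketch below assumes the chip naming used in the text (`F<row><column>`, with row 4 at the bottom); the function names are hypothetical.

```python
from typing import List


def jtag_chain(rows: int = 4, cols: int = 4) -> List[str]:
    """Return chip names in serpentine order, starting at the bottom row."""
    order = []
    for i, row in enumerate(range(rows, 0, -1)):  # row 4 first, row 1 last
        if i % 2 == 0:
            columns = range(1, cols + 1)          # left to right: F41..F44
        else:
            columns = range(cols, 0, -1)          # right to left: F34..F31
        order.extend(f"F{row}{col}" for col in columns)
    return order


def bank(chip: str) -> str:
    """Even-numbered rows (F4x, F2x) are low bank, odd rows are high bank."""
    row = int(chip[1])  # works for single-digit row numbers only
    return "low" if row % 2 == 0 else "high"


if __name__ == "__main__":
    chain = jtag_chain()
    print(" -> ".join(chain))
    print({chip: bank(chip) for chip in ("F41", "F31", "F21", "F11")})
```

The serpentine order means each row's last chip sits directly above or below the next row's first chip, which keeps the chain segments between rows short.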
Expansion can be accomplished with piggyback boards. Assuming that the original FPGA chip array in Figure 24 comprises F41-F44 and F31-F34, the two additional rows of chips F21-F24 and F11-F14 can be added via piggyback board 745. The piggyback board 745 also carries the appropriate buses to extend the banks. Further expansion can be accomplished by stacking more piggyback boards on top of the boards already in the array.
Figure 25 shows one embodiment of the hardware start-up method. Step 800 begins with power-up or a warm boot sequence. In step 801, the PCI controller reads the EEPROM for initialization. Step 802 reads and writes the PCI controller registers according to the initialization sequence. Step 803 performs a boundary scan test on all the FPGA chips in the array. Step 804 configures the CTRL_FPGA unit in the FPGA I/O controller. Step 805 reads and writes the registers in the CTRL_FPGA unit. Step 806 sets up the PCI controller for DMA master read/write modes; thereafter, data is transferred and verified. Step 807 configures all the FPGA chips with a test design and verifies its correctness. At step 808, the hardware is ready for use. At this point, the system assumes that every preceding step has positively confirmed the operability of the hardware; otherwise, the system would never have reached step 808.
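The start-up flow is strictly sequential: the hardware is declared ready only if every earlier step succeeded. A minimal sketch of that gating follows. The `step_ok` callback is a hypothetical stand-in for whatever check each step performs, and stopping at the first failing step is an assumption consistent with the statement that step 808 is never reached otherwise.

```python
from typing import Callable

# Step numbers and descriptions follow the start-up method of Figure 25.
CHECKED_STEPS = [
    (800, "power-up or warm boot"),
    (801, "PCI controller reads EEPROM for initialization"),
    (802, "read/write PCI controller registers"),
    (803, "boundary scan test of all FPGA chips"),
    (804, "configure the CTRL_FPGA unit"),
    (805, "read/write CTRL_FPGA registers"),
    (806, "set PCI controller to DMA master mode, transfer and verify data"),
    (807, "configure all FPGA chips with a test design and verify it"),
]
READY_STEP = 808  # hardware ready for use


def run_boot(step_ok: Callable[[int], bool]) -> int:
    """Run the steps in order; return 808 on success or the failing step number."""
    for number, _description in CHECKED_STEPS:
        if not step_ok(number):
            return number  # stop here; later steps are never attempted
    return READY_STEP


if __name__ == "__main__":
    print(run_boot(lambda step: True))         # every check passes
    print(run_boot(lambda step: step != 803))  # boundary scan fails
```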
E. Alternate Embodiment Using Denser FPGA Chips
In one embodiment of the present invention, the FPGA logic devices are provided on individual boards. If more FPGA logic devices are needed to model the user's circuit design than one board provides, multiple boards carrying more FPGA logic devices can be provided. The ability to add more boards to the simulation system is a desirable feature of the present invention. In this embodiment, denser FPGA chips, such as the Altera 10K130V and 10K250V, are used. Use of these chips changes the board design so that each board carries only four of the denser FPGA chips instead of eight less dense FPGA chips (such as the Altera 10K100).
The coupling of these boards to the motherboard of the simulation system must therefore be addressed: the interconnection and connection schemes must compensate for the lack of a backplane. The FPGA array in the simulation system is provided on the motherboard through a particular board interconnect structure. Each chip may have up to eight sets of interconnections. These are arranged as adjacent direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]), excluding the local bus connections, both within a single board and across different boards. Each chip can be interconnected directly to its adjacent neighbor chips, or in one hop to a non-adjacent chip located above, below, to the left, or to the right. The array is a torus in the X direction (east-west) and a mesh in the Y direction (north-south).
The interconnects alone can couple logic devices and other components within a single board. Inter-board connectors, however, are provided to couple these boards and interconnects together across different boards, and to carry signals between the PCI bus (via the motherboard) and the array boards, and between any two array boards. Each board contains its own FPGA bus FD[63:0], which allows the FPGA logic devices to communicate with each other, with the SRAM memory devices, and with the CTRL_FPGA unit (FPGA I/O controller). The FPGA bus FD[63:0] is not provided across the multiple boards. The FPGA interconnects, however, provide connectivity among the FPGA logic devices across multiple boards, although these interconnects are not related to the FPGA bus. The local bus, on the other hand, is provided across all the boards.
A motherboard connector connects a board to the motherboard, and hence to the PCI bus, power, and ground. For some boards, the motherboard connector is not used for a direct connection to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are directly connected to the motherboard, while boards 2, 4, and 6 rely on their neighbor boards for motherboard connectivity. Thus, every other board is directly connected to the motherboard, and the interconnects and local buses of the boards are coupled together via inter-board connectors arranged solder side to component side. PCI signals are routed through only one of the boards (typically the first board). Power and ground are supplied through the other motherboard connectors. The inter-board connectors, mounted on the solder side and the component side, allow communication among the PCI bus components, the FPGA logic devices, the memory devices, and the various simulation system control circuits.
Figure 56 shows a high-level block diagram of an array of FPGA chips in accordance with one embodiment of the present invention. The CTRL_FPGA unit 1200 described above is coupled to bus 1210 via lines 1209. In one embodiment, the CTRL_FPGA unit 1200 is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K50 chip. Bus 1210 allows the CTRL_FPGA unit 1200 to be coupled to other simulation array boards and to other chips (e.g., PCI controller, EEPROM, clock buffer). Figure 56 also shows other major functional blocks in the form of logic devices and memory devices. In one embodiment, each logic device is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K130V or 10K250V chip. The 10K130V and 10K250V are pin compatible, and each comes in a 599-pin PGA package. Thus, unlike the array embodiment shown above with eight Altera FLEX 10K100 chips, this embodiment uses only four Altera FLEX 10K130 chips. One embodiment of the present invention describes the board containing these four logic devices and their interconnections.
Because the user's design is modeled and configured in any number of these logic devices in the array, communication between FPGA logic devices is necessary to connect one part of the user's circuit design to another. Initial configuration information and boundary scan tests are also supported by the inter-FPGA interconnects. Finally, the necessary simulation system control signals must be accessible between the simulation system and the FPGA logic devices.
Figure 36 shows the hardware architecture of an FPGA logic device used in the present invention. The FPGA logic device 1500 includes 102 top I/O pins, 102 bottom I/O pins, 111 left I/O pins, and 111 right I/O pins. The total number of interconnect pins is 425. In addition, 45 I/O pins are dedicated to GCLK, the FPGA bus FD[31:0] (for the high bank, FD[63:32] is dedicated), F_RD, F_WR, DATAXSFR, SHIFTIN, SHIFTOUT, SPACE[2:0], EVAL, EVAL_REQ_N, DEVICE_E (a signal from the CTRL_FPGA unit that turns on the output pins of the FPGA logic devices), and DEV_CLRN (a signal from the CTRL_FPGA unit that clears all internal flip-flops before simulation starts). The interconnects thus carry any data and control signals that cross between any two FPGA logic devices. The remaining pins are dedicated to power and ground.
Figure 37 shows the FPGA interconnect pin-outs for a single FPGA chip in accordance with one embodiment of the present invention. Each chip 1510 may have up to eight sets of interconnections, and each set has its own number of pins. Some chips may have fewer than eight sets, depending on their positions on the board. In the preferred embodiment, all chips have seven sets of interconnections, although the specific sets used may vary from chip to chip depending on their positions on the board. The interconnections of each FPGA chip are oriented horizontally (east-west) and vertically (north-south). The set of interconnections for the west direction is labeled W[73:0], for the east direction E[73:0], for the north direction N[73:0], and for the south direction S[73:0]. These sets connect only to adjacent chips; that is, they do not "hop" over any chip. For example, in Figure 39, chip 1570 has interconnection 1540 for N[73:0], interconnection 1542 for W[73:0], interconnection 1543 for E[73:0], and interconnection 1545 for S[73:0]. Note that this FPGA chip 1570, the FPGA2 chip, has all four sets of adjacent interconnections: N[73:0], S[73:0], W[73:0], and E[73:0]. The west interconnection of FPGA0 is connected to the east interconnection of FPGA3 by wire 1539 in torus fashion. Wire 1539 thus couples chips 1569 (FPGA0) and 1572 (FPGA3) directly, much as if the west and east ends of the board were wrapped around to meet each other.
Returning to Figure 37, four sets of "hopping" interconnections are also provided. Two sets are for non-adjacent interconnections running vertically, i.e., NH[27:0] and SH[27:0]. For example, Figure 39 shows NH interconnect 1541 and SH interconnect 1546 for FPGA2 chip 1570. The other two sets in Figure 37 are for non-adjacent interconnections running horizontally, i.e., XH[36:0] and XH[72:37]. For example, Figure 39 shows XH interconnect 1544 for FPGA2 chip 1570.
Turning to Figure 37, the vertical hopping interconnections NH[27:0] and SH[27:0] have 28 pins each. The horizontal interconnections XH[36:0] and XH[72:37] together have 73 pins. The horizontal interconnect pins XH[36:0] and XH[72:37] can be used on the west side (e.g., interconnect 1605 for FPGA3 chip 1576 in Figure 39) and/or on the east side (e.g., interconnect 1602 for FPGA0 chip 1573 in Figure 39). This arrangement allows every chip to be manufactured identically. Each chip can thus be interconnected in one hop to a non-adjacent chip located above, below, to the left, or to the right.
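The pin counts quoted for Figures 36 and 37 can be cross-checked with simple arithmetic: the eight interconnect sets add up to the 425 interconnect pins, and the listed dedicated signals add up to the 45 dedicated pins. The sketch below derives each width from the bus notation. It assumes that every dedicated signal written without a bit range occupies exactly one pin; the helper name `bus_width` is hypothetical.

```python
def bus_width(name: str) -> int:
    """Width of a signal written as NAME[hi:lo]; a plain name counts as 1 pin."""
    if "[" not in name:
        return 1
    hi, lo = name[name.index("[") + 1 : -1].split(":")
    return int(hi) - int(lo) + 1


# The eight interconnect sets of one FPGA chip.
INTERCONNECT_SETS = [
    "N[73:0]", "S[73:0]", "W[73:0]", "E[73:0]",  # direct neighbors
    "NH[27:0]", "SH[27:0]",                       # vertical one-hop
    "XH[36:0]", "XH[72:37]",                      # horizontal one-hop
]

# Dedicated I/O signals; one pin each unless a range is given (assumption).
DEDICATED_SIGNALS = [
    "GCLK", "FD[31:0]", "F_RD", "F_WR", "DATAXSFR", "SHIFTIN", "SHIFTOUT",
    "SPACE[2:0]", "EVAL", "EVAL_REQ_N", "DEVICE_E", "DEV_CLRN",
]

interconnect_pins = sum(bus_width(s) for s in INTERCONNECT_SETS)
dedicated_pins = sum(bus_width(s) for s in DEDICATED_SIGNALS)

if __name__ == "__main__":
    print(interconnect_pins)  # 4*74 + 2*28 + 37 + 36 = 425
    print(dedicated_pins)     # 32 + 3 + 10 single-pin signals = 45
```

Splitting the 73 horizontal hop pins into a 37-pin group and a 36-pin group is what lets the same package be wired toward the west, the east, or both, depending on where the chip sits on the board.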
Figure 39 shows a direct-neighbor and one-hop neighbor FPGA array layout of six boards on a single motherboard in accordance with one embodiment of the present invention. The figure is used to illustrate two possible configurations: a six-board system and a dual-board system. Position indicator 1550 shows that the "Y" direction is north-south and the "X" direction is east-west. In the X direction the array is a torus; in the Y direction the array is a mesh. Figure 39 shows only the boards, FPGA logic devices, interconnects, and connectors at a high level. The motherboard and other supporting components (e.g., SRAM memory devices) and wire lines (e.g., the FPGA bus) are not shown.
Note that Figure 39 provides an array view of the boards and their components, interconnects, and connectors. The actual physical configuration and installation involves standing these boards on their edges, component side to solder side. About half of the boards are connected directly to the motherboard, while the other half are connected to their respective neighbor boards.
In the six-board embodiment of the present invention, six boards 1551 (board 1), 1552 (board 2), 1553 (board 3), 1554 (board 4), 1555 (board 5), and 1556 (board 6) are provided on a motherboard (not shown) as part of the reconfigurable hardware unit 20 of Figure 1. Each board contains an almost identical set of components and connectors. For illustrative purposes, the sixth board 1556 contains FPGA logic devices 1565 to 1568 and connectors 1557 to 1560 and 1581; the fifth board 1555 contains FPGA logic devices 1569 to 1572 and connectors 1582 and 1583; and the fourth board 1554 contains FPGA logic devices 1573 to 1576 and connectors 1584 and 1585.
In this six-board configuration, the first board 1551 and the sixth board 1556 serve as "bookend" boards that contain the Y-mesh terminations, such as R-pack terminations 1557 to 1560 on the sixth board 1556 and terminations 1591 to 1594 on the first board 1551. Intermediately placed boards (i.e., boards 1552 (board 2), 1553 (board 3), 1554 (board 4), and 1555 (board 5)) are also provided to complete the array.
As explained above, the interconnections are arranged as adjacent direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]), excluding the local bus connections, both within a single board and across different boards. The interconnects alone can couple logic devices and other components within a single board. Inter-board connectors 1581 to 1590, however, allow the logic devices on different boards (board 1 through board 6) to communicate with each other. The FPGA bus is part of the inter-board connectors 1581 to 1590. These connectors 1581 to 1590 are 600-pin connectors that carry 520 signals and 80 power/ground connections between two adjacent array boards.
In Figure 39, the various boards are arranged non-symmetrically with respect to the inter-board connectors 1581 to 1590. For example, inter-board connectors 1589 and 1590 are provided between boards 1551 and 1552. Interconnect 1515 couples FPGA logic devices 1511 and 1577 together, and this connection is symmetrical with respect to connectors 1589 and 1590. Interconnect 1603, however, is not symmetrical: it couples an FPGA logic device on the third board 1553 to an FPGA logic device on board 1551, so with respect to connectors 1589 and 1590 the connection is non-symmetrical. Similarly, interconnect 1600 is non-symmetrical with respect to connectors 1589 and 1590, because it couples FPGA logic device 1577 to termination 1591, which in turn is coupled to FPGA logic device 1577 via interconnect 1601. Other interconnects show the same non-symmetry.
As a result of this non-symmetry, the interconnects are routed through the inter-board connectors in two different ways: one for symmetric interconnects such as interconnect 1515, and another for non-symmetric interconnects such as interconnects 1603 and 1600. The interconnect routing schemes are shown in Figures 40(A) and 40(B).
In Figure 39, an example of a direct-neighbor connection within a single board is interconnect 1543, which couples logic device 1570 to logic device 1571 on board 1555 along the east-west direction. Another example of a direct-neighbor connection within a single board is interconnect 1607, which couples logic device 1573 to logic device 1576 on board 1554. An example of a direct-neighbor connection between two different boards is interconnect 1545, which couples logic device 1570 on board 1555 to logic device 1574 on board 1554 via connectors 1583 and 1584 along the north-south direction. Here, the two inter-board connectors 1583 and 1584 carry the signals across.
An example of a one-hop interconnect within a single board is interconnect 1544, which couples logic device 1570 to logic device 1572 on board 1555 along the east-west direction. An example of a one-hop interconnect between two different boards is interconnect 1599, which couples logic device 1565 on board 1556 to logic device 1573 on board 1554 via connectors 1581 to 1584. Here, the four inter-board connectors 1581 to 1584 carry the signals across.
Some boards, especially those positioned at the north and south ends on the motherboard, also contain 10-ohm R-packs to terminate some connections. Thus, the sixth board 1556 contains 10-ohm R-pack connectors 1557 to 1560, and the first board 1551 contains 10-ohm R-pack connectors 1591 to 1594. On the sixth board 1556, R-pack connector 1557 terminates interconnects 1970 and 1971, R-pack connector 1558 terminates interconnects 1972 and 1541, R-pack connector 1559 terminates interconnects 1973 and 1974, and R-pack connector 1560 terminates interconnects 1975 and 1976. In addition, connectors 1561 to 1564 are not coupled to any device. Unlike the east-west torus-type interconnections, these north-south interconnections are arranged in mesh fashion.
These mesh terminations increase the number of direct interconnections in the north-south direction. Otherwise, the interconnections at the north and south edges of the FPGA mesh would be wasted. For example, in addition to one set of direct interconnections 1515, FPGA logic devices 1511 and 1577 are provided with additional interconnections by way of R-pack 1591 and interconnects 1600 and 1601. That is, R-pack 1591 connects interconnects 1600 and 1601 together. This increases the number of direct connections between FPGA logic devices 1511 and 1577.
Inter-board connections are also provided. Logic devices 1577, 1578, 1579, and 1580 on board 1551 are coupled to logic devices 1511, 1512, 1513, and 1514 on board 1552 via interconnects 1515, 1516, 1517, and 1518 and inter-board connectors 1589 and 1590. Thus, interconnect 1515 couples logic device 1511 on board 1552 to logic device 1577 on board 1551 via connectors 1589 and 1590; interconnect 1516 couples logic device 1512 on board 1552 to logic device 1578 on board 1551 via connectors 1589 and 1590; interconnect 1517 couples logic device 1513 on board 1552 to logic device 1579 on board 1551 via connectors 1589 and 1590; and interconnect 1518 couples logic device 1514 on board 1552 to logic device 1580 on board 1551 via connectors 1589 and 1590.
Some interconnects, such as interconnects 1595, 1596, 1597, and 1598, are not coupled to any device because they are not used. As mentioned above with respect to logic devices 1511 and 1577, however, R-pack 1591 connects interconnects 1600 and 1601 together to increase the number of north-south interconnections.
A dual-board embodiment of the present invention is shown in Figure 44. In the dual-board embodiment, only two boards are needed to model the user's design in the simulation system. Like the six-board configuration of Figure 39, the dual-board configuration of Figure 44 uses the same two boards as "bookends": board 1551 and board 1556. They are provided on a motherboard as part of the reconfigurable hardware unit of Figure 1. In Figure 44, one bookend board is the first board and the other is the sixth board. The sixth board is used in Figure 44 to show its similarity to the sixth board of Figure 39; that is, bookend boards such as the first board and the sixth board should have the terminations needed to terminate the north-south mesh connections.
This dual-board configuration contains FPGA logic devices 1577 (FPGA0), 1578 (FPGA1), 1579 (FPGA2), and 1580 (FPGA3) on the first board 1551, and FPGA logic devices 1565 (FPGA0), 1566 (FPGA1), 1567 (FPGA2), and 1568 (FPGA3) on the sixth board 1556. The two boards are connected by inter-board connectors 1581 and 1590.
These boards contain 10-ohm R-packs to terminate some connections. In the dual-board embodiment, both boards are "bookend" boards. Board 1551 contains 10-ohm R-pack connectors 1591, 1592, 1593, and 1594 as resistive terminations. The other board 1556 also contains 10-ohm R-pack connectors 1557 to 1560.
Board 1551 has connector 1590 and board 1556 has connector 1581 for inter-board communication. The interconnects that cross from one board to the other, such as interconnects 1600, 1971, 1977, 1541, and 1540, pass through connectors 1590 and 1581; in other words, the inter-board connectors 1590 and 1581 allow interconnects 1600, 1971, 1977, 1541, and 1540 to make connections between components on different boards. The inter-board connectors 1590 and 1581 carry control data and control signals on the FPGA bus.
In a four-board configuration, board 1 and board 6 are the bookend boards, while board 2 1552 and board 3 1553 (see Figure 39) are the intermediate boards. When coupled to the motherboard in accordance with the present invention (discussed with reference to Figures 38(A) and 38(B)), board 1 and board 2 are paired, and board 3 and board 6 are paired.
In a six-board configuration, as described above, board 1 and board 6 are the bookend boards, while board 2 1552, board 3 1553, board 4 1554, and board 5 1555 (see Figure 39) are the intermediate boards. When coupled to the motherboard in accordance with the present invention (discussed with reference to Figures 38(A) and 38(B)), board 1 and board 2 are paired, board 3 and board 4 are paired, and board 5 and board 6 are paired.
More boards can be provided as necessary. Regardless of the number of boards added to the system, however, the bookend boards (such as board 1 and board 6 of Figure 39) should have the terminations needed to complete the mesh array connections. In one embodiment, the minimum configuration is the dual-board configuration of Figure 44. More boards can be added in two-board increments. If the initial configuration has board 1 and board 6, a later change to a four-board configuration involves moving board 6 further out, as described above, pairing board 1 with board 2, and pairing board 3 with board 6.
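The pairing rule for the two-, four-, and six-board configurations can be written down compactly. The sketch below treats board 1 and board 6 as fixed bookends and fills the positions between them with boards 2, 3, and so on; the function name is hypothetical. Treating the dual-board case as a single (board 1, board 6) pair is an inference from the two-board increment rule rather than something the text states, and the sketch deliberately covers only the three configurations that are described.

```python
from typing import List, Tuple


def board_pairs(n_boards: int) -> List[Tuple[int, int]]:
    """Pair boards for the described 2-, 4-, and 6-board configurations.

    Board 1 is always the first bookend and board 6 the last; intermediate
    boards are numbered upward from 2.
    """
    if n_boards not in (2, 4, 6):
        raise ValueError("only 2-, 4-, and 6-board configurations are described")
    stack = list(range(1, n_boards)) + [6]  # e.g. [1, 2, 3, 6] for four boards
    return [(stack[i], stack[i + 1]) for i in range(0, n_boards, 2)]


if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, board_pairs(n))
```

For the six-board case, the first board of each pair is the one that connects directly to the motherboard (boards 1, 3, and 5), and its partner reaches the motherboard through it.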
As described above, each logic device is coupled to its immediately adjacent logic devices and to the non-adjacent logic devices that are one hop away. Thus, in Figure 39, logic device 1577 is coupled to immediately adjacent logic device 1578 through interconnection 1547. Logic device 1577 is also coupled to non-adjacent logic device 1579 through one-hop interconnection 1548. Logic device 1580, however, is considered adjacent to logic device 1577 because interconnection 1549 provides the connection of the wrap-around (torus) configuration.
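The adjacency relationships just described can be illustrated with a short Python sketch. It is a simplified model under stated assumptions: logic devices 1577 to 1580 are taken to lie, in that order, along a single wrap-around ring of four devices, and the function names are hypothetical.

```python
# Assumed ordering of the four logic devices along one ring of Figure 39.
DEVICES = [1577, 1578, 1579, 1580]


def classify(src_idx, dst_idx, ring_size=4):
    """Classify the link between two ring positions."""
    if src_idx == dst_idx:
        return "self"
    forward = (dst_idx - src_idx) % ring_size
    distance = min(forward, ring_size - forward)
    if distance == 1:
        # Positions that differ by more than 1 are neighbors only
        # because the ring wraps around.
        if abs(dst_idx - src_idx) != 1:
            return "adjacent (wrap-around)"
        return "adjacent"
    if distance == 2:
        return "one-hop"
    return "unconnected"


def relation(src, dst):
    """Classify the link between two logic devices by reference numeral."""
    return classify(DEVICES.index(src), DEVICES.index(dst), len(DEVICES))


if __name__ == "__main__":
    for dst in DEVICES[1:]:
        print(1577, "->", dst, ":", relation(1577, dst))
```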
Figure 42 shows a top view (component side) of the components and connectors on a single board. In one embodiment of the invention, only one board is needed to model the user's design in the simulation system. In other embodiments, multiple boards (that is, at least two boards) are needed. Thus, for example, Figure 39 shows six boards 1551 to 1556 coupled together through various 600-pin connectors 1581 to 1590. At the top and bottom ends, board 1551 and board 1556 are each terminated by a respective set of 10-ohm R-packs.
Turning to Figure 42, board 1820 contains four FPGA logic devices: 1822 (FPGA0), 1823 (FPGA1), 1824 (FPGA2) and a fourth device (FPGA3). Two SRAM memory devices 1828 and 1829 are also provided. These SRAM memory devices are used to map memory blocks from the logic devices on this board; that is, the memory simulation aspect of the present invention maps the memory blocks of the logic devices on this board into the SRAM memory devices on this board. Other boards contain other logic devices and memory devices that perform similar mapping operations. In one embodiment, memory mapping is board-dependent; that is, the memory mapping of the first board is limited to the logic devices and memory devices on that board and is unrelated to the other boards. In other embodiments, memory mapping is board-independent. In that case, a few large memory devices are used to map memory blocks from the logic devices on one board into the memory devices on another board.
Light emitting diodes (LEDs) 1821 are also provided to indicate certain selected states. According to one embodiment of the invention, the LED indications are as shown in Table A:
Table A: LED display
LED | Color | State | Description
LED1 | Green | On | +5 V and +3.3 V are normal.
LED1 | Green | Off | +5 V or +3.3 V is abnormal.
LED2 | Amber | Off | FPGA configuration on all boards is complete.
LED2 | Amber | Blinking | FPGA configuration on the board is not complete or has failed.
LED2 | Amber | On | FPGA configuration is in progress.
LED3 | Red | On | Data transfer is in progress.
LED3 | Red | Off | No data transfer.
LED3 | Red | Blinking | Diagnostic test has failed.
Various other control chips, such as PLX PCI controller 1826 and CTRL_FPGA unit 1827, control the communication between the FPGAs and the PCI bus. One example of a PLX PCI controller 1826 that may be used in the system is the PCI9080 or 9060 from PLX Technology. The PCI9080 has the appropriate local bus interface, control registers, FIFOs, and PCI interface to the PCI bus. The data book PLX Technology, PCI9080 Data Sheet (version 0.93, February 28, 1997) is incorporated herein by reference. One example of CTRL_FPGA unit 1827 is a programmable logic device in the form of an FPGA, such as an Altera chip. In multi-board configurations, only the first board, which is coupled to the PCI bus, contains a PCI controller.
Connector 1830 couples board 1820 to the motherboard (not shown) and, through it, to the PCI bus, power and ground. On some boards, connector 1830 is not used for direct connection to the motherboard. Thus, in a two-board configuration, only the first board is directly coupled to the motherboard. In a six-board configuration, only boards 1, 3 and 5 are directly coupled to the motherboard, while boards 2, 4 and 6 are coupled to the motherboard through their neighboring boards. Inter-board connectors J1 to J28 are also provided; as the name implies, connectors J1 to J28 establish connections across different boards.
Connector J1 is for external power and ground connections. Table B below shows the pins and corresponding descriptions of external power connector J1 according to one embodiment of the invention.
Table B: External power - J1
Pin number | Description
1 | VCC5V
2 | GND
3 | GND
4 | VCC3V
Connector J2 is for the parallel port connection. Connectors J1 and J2 are used for stand-alone boundary scan testing of a single board during production. Table C below shows the pins and corresponding descriptions of parallel JTAG port J2 according to one embodiment of the invention.
Table C: Parallel JTAG port - J2
J2 pin number | J2 signal | I/O from board | DB25 pin number | DB25 signal
3 | PARA_TCK | I | 2 | D0
5 | PARA_TMS | I | 3 | D1
7 | PARA_TKI | O | 4 | D2
9 | PARA_NR | O | 5 | D3
19 | PARA_TKO | O | 10 | NACK
10, 12, 14, 16, 18, 20, 22, 24 | GND | | 18-25 | GND
Connectors J3 and J4 are for local bus connections across boards. Connectors J5 to J16 are one set of FPGA interconnect connections. Connectors J17 to J28 are a second set of FPGA interconnect connections. When the boards are mounted component side to solder side, these connectors provide effective connections between components on different boards. Tables D and E below give a complete list and description of connectors J1 to J28 according to one embodiment of the invention.
Table D: Connectors J1-J28
Connector | Description | Type
J1 | +5V/+3V external power | 4-pin power RA header, component side
J2 | Parallel port | 0.1" pitch, 2-row thru-hole RA header, component side
J3 | Local bus | 0.05" pitch, 2×30 thru-hole RA header, SAMTEC, component side
J4 | Local bus | 0.05" pitch, 2×30 thru-hole RA header, SAMTEC, component side
J5 | Row A: NH[0], VCC3V, GND; Row B: J17 Row B, VCC3V, GND | 0.05" pitch, 2×30 SMD header, SAMTEC, component side
J6 | Row A: J5 Row B, VCC3V, GND; Row B: J5 Row A, VCC3V, GND | 0.05" pitch, 2×30 SMD receptacle, SAMTEC, solder side
J7 | Row A: N[0], 4 VCC3V, 4 GND, N[2]; Row B: N[0], 4 VCC3V, 4 GND, N[2] | 0.05" pitch, 2×45 thru-hole header, SAMTEC, component/solder side
J8 | Row A: N[0], 4 VCC3V, 4 GND, N[2]; Row B: N[0], 4 VCC3V, 4 GND, N[2] | 0.05" pitch, 2×45 thru-hole receptacle, SAMTEC, component/solder side
J9 | Row A: NH[2], LASTL, GND; Row B: J21 Row B, GND | 0.05" pitch, 2×30 SMD header, SAMTEC, component side
J10 | Row A: J9 Row B, FIRSTL, GND; Row B: J9 Row A, GND | 0.05" pitch, 2×30 SMD receptacle, SAMTEC, solder side
J11 | Row A: NH[1], VCC3V, GND; Row B: J23 Row B, VCC3V, GND | 0.05" pitch, 2×30 SMD header, SAMTEC, component side
J12 | Row A: J11 Row B, VCC3V, GND; Row B: J11 Row A, VCC3V, GND | 0.05" pitch, 2×30 SMD receptacle, SAMTEC, solder side
J13 | Row A: N[1], 4 VCC3V, 4 GND, N[3]; Row B: N[1], 4 VCC3V, 4 GND, N[3] | 0.05" pitch, 2×45 thru-hole header, SAMTEC, component/solder side
J14 | Row A: N[1], 4 VCC3V, 4 GND, N[3]; Row B: N[1], 4 VCC3V, 4 GND, N[3] | 0.05" pitch, 2×45 thru-hole receptacle, SAMTEC, component/solder side
J15 | Row A: NH[3], LASTH, GND; Row B: J27 Row B, GND | 0.05" pitch, 2×30 SMD header, SAMTEC, component side
J16 | Row A: J15 Row B, FIRSTH, GND; Row B: J15 Row A, GND | 0.05" pitch, 2×30 SMD receptacle, SAMTEC, solder side
J17 | Row A: SH[0], VCC3V, GND; Row B: J5 Row B, VCC3V, GND | 0.05" pitch, 2×30 SMD header, SAMTEC, component side
J18 | Row A: J17 Row B, VCC3V, GND; Row B: J17 Row A, VCC3V, GND | 0.05" pitch, 2×30 SMD receptacle, SAMTEC, solder side
J19 | Row A: S[0], 4 VCC3V, 4 GND, S[2]; Row B: S[0], 4 VCC3V, 4 GND, S[2] | 0.05" pitch, 2×45 thru-hole header, SAMTEC, component/solder side
J20 | Row A: S[0], 4 VCC3V, 4 GND, S[2]; Row B: S[0], 4 VCC3V, 4 GND, S[2] | 0.05" pitch, 2×45 thru-hole receptacle, SAMTEC, component/solder side
J21 | Row A: SH[2], LASTL, GND; Row B: J19 Row B, GND | 0.05" pitch, 2×30 SMD header, SAMTEC, component side
J22 | Row A: J21 Row B, FIRSTL, GND; Row B: J21 Row A, GND | 0.05" pitch, 2×30 SMD receptacle, SAMTEC, solder side
J23 | Row A: SH[1], VCC3V, GND; Row B: J11 Row B, VCC3V, GND | 0.05" pitch, 2×30 SMD header, SAMTEC, component side
J24 | Row A: J23 Row B, VCC3V, GND; Row B: J23 Row A, VCC3V, GND | 0.05" pitch, 2×30 SMD receptacle, SAMTEC, solder side
J25 | Row A: S[1], 4 VCC3V, 4 GND, S[3]; Row B: S[1], 4 VCC3V, 4 GND, S[3] | 0.05" pitch, 2×45 thru-hole header, SAMTEC, component/solder side
J26 | Row A: S[1], 4 VCC3V, 4 GND, S[3]; Row B: S[1], 4 VCC3V, 4 GND, S[3] | 0.05" pitch, 2×45 thru-hole receptacle, SAMTEC, component/solder side
J27 | Row A: SH[3], LASTH, GND; Row B: J15 Row B, GND | 0.05" pitch, 2×30 SMD header, SAMTEC, component side
J28 | Row A: J27 Row B, FIRSTH, GND; Row B: J27 Row A, GND | 0.05" pitch, 2×30 SMD receptacle, SAMTEC, solder side
The shaded connectors are of the thru-hole type. Note that in Table D, the number in brackets [] denotes the FPGA logic device number, 0 to 3. Thus, S[0] denotes the south interconnection of FPGA0 and its 74 bits (that is, S[73:0] in Figure 37).
Table E: Local bus connectors - J3, J4
Pin number | Signal name | I/O | Pin number | Signal name | I/O
A1 | GND | PWR | B1 | LRESET_N | I/O
A2 | | I/O | B2 | VCC5V | PWR
A3 | GND | PWR | B3 | LD0 | I/O
A4 | LD1 | I/O | B4 | LD2 | I/O
A5 | LD3 | I/O | B5 | LD4 | I/O
A6 | LD5 | I/O | B6 | LD6 | I/O
A7 | LD7 | I/O | B7 | LD8 | I/O
A8 | LD9 | I/O | B8 | LD10 | I/O
A9 | LD11 | I/O | B9 | GND | PWR
A10 | VCC3V | PWR | B10 | LD12 | I/O
A11 | LD13 | I/O | B11 | LD14 | I/O
A12 | LD15 | I/O | B12 | LD16 | I/O
A13 | LD17 | I/O | B13 | LD18 | I/O
A14 | LD19 | I/O | B14 | LD20 | PWR
A15 | LD21 | I/O | B15 | VCC3V | I/O
A16 | LD22 | I/O | B16 | LD23 | I/O
A17 | LD24 | I/O | B17 | LD25 | I/O
A18 | LD26 | I/O | B18 | LD27 | I/O
A19 | LD28 | I/O | B19 | LD29 | I/O
A20 | LD30 | I/O | B20 | LD31 | I/O
A21 | VCC3V | PWR | B21 | LHOLD | OT
A22 | ASD_N | I/O | B22 | GND | PWR
A23 | DEN_N | O | B23 | DTR_N | O
A24 | LA31 | O | B24 | LA30KTR | O
A25 | LA29 | O | B25 | LA28 | O
A26 | LA10 | O | B26 | LA7 | O
A27 | LA6 | O | B27 | LA5 | O
A28 | LA4 | O | B28 | LA3 | O
A29 | LA2 | O | B29 | Finish | OD
A30 | VCC5V | PWR | B30 | DCC5V | PWR
The I/O direction is given with respect to board 1.
Figure 43 is a legend for connectors J1 to J28 in Figures 41(A) to 41(F) and Figure 42. In general, a clear block indicates surface mount, while a block filled with gray indicates the thru-hole type. In addition, a block with a solid outline indicates a connector on the component side, and a block with a dashed outline indicates a connector on the solder side. Thus, clear block 1840 with a solid outline represents a 2×30 header, surface mount, located on the component side. Clear block 1841 with a dashed outline represents a 2×30 receptacle, surface mount, located on the solder side of the board. Gray block 1842 with a solid outline represents a 2×30 or 2×45 header, thru-hole, located on the component side. Gray block 1843 with a dashed outline represents a 2×30 or 2×45 receptacle, thru-hole, located on the solder side. In one embodiment, the simulation system uses Samtec's SFM and TFM series of 2×30 or 2×45 micro strip connectors, which are available in surface mount and thru-hole types. Cross-hatched block 1844 with a solid outline represents an R-pack, surface mount or thru-hole, located on the component side. Cross-hatched block 1845 with a dashed outline represents an R-pack, surface mount or thru-hole, located on the solder side. The specifications in the Samtec catalog on the Samtec website are incorporated herein by reference. Returning to Figure 42, connectors J3 to J28 are of the types indicated in the legend of Figure 43.
Figures 41(A) to 41(F) show top views of each board and its respective connectors. Figure 41(A) shows the connectors of the sixth board. Thus, board 1660 contains connectors 1661 to 1681 and motherboard connector 1682. Figure 41(B) shows the connectors of the fifth board. Thus, board 1690 contains connectors 1691 to 1708 and motherboard connector 1709. Figure 41(C) shows the connectors of the fourth board. Thus, board 1715 contains connectors 1716 to 1733 and motherboard connector 1734. Figure 41(D) shows the connectors of the third board. Thus, board 1740 contains connectors 1741 to 1758 and motherboard connector 1759. Figure 41(E) shows the connectors of the second board. Thus, board 1765 contains connectors 1766 to 1783 and motherboard connector 1784. Figure 41(F) shows the connectors of the first board. Thus, board 1790 contains connectors 1791 to 1812 and motherboard connector 1813. As indicated by the legend of Figure 43, these connectors are various combinations of (1) surface mount or thru-hole, (2) component side or solder side, and (3) header, receptacle or R-pack.
In one embodiment, these connectors are used for inter-board communication. Related buses and signals are grouped together and routed between any two boards through these inter-board connectors. In addition, only half of the boards are directly coupled to the motherboard. In Figure 41(A), the sixth board 1660 contains connectors 1661 to 1668 for one set of FPGA interconnections, connectors 1669 to 1674, 1676 and 1679 for another set of FPGA interconnections, and connector 1681 for the local bus. Because the sixth board 1660 is one of the end boards on the motherboard (the first board 1790 in Figure 41(F) being at the other end), connectors 1675, 1677, 1678 and 1680 are designated for the 10-ohm R-pack connections of certain north-south interconnections. In addition, motherboard connector 1682 is not used for the sixth board 1660, as shown in Figure 38(B), where the sixth board 1535 is coupled to the fifth board 1534 rather than directly to motherboard 1520.
In Figure 41(B), the fifth board 1690 contains connectors 1691 to 1698 for one set of FPGA interconnections, connectors 1699 to 1706 for another set of FPGA interconnections, and connectors 1707 and 1708 for the local bus. Connector 1709 is used to couple the fifth board 1690 to the motherboard.
In Figure 41(C), the fourth board 1715 contains connectors 1716 to 1723 for one set of FPGA interconnections, connectors 1724 to 1731 for another set of FPGA interconnections, and connectors 1732 and 1733 for the local bus. Connector 1734 is not used to couple the fourth board 1715 directly to the motherboard. This configuration is also shown in Figure 38(B), where the fourth board 1533 is coupled to the third board 1532 and the fifth board 1534 but is not directly coupled to motherboard 1520.
In Figure 41(D), the third board 1740 contains connectors 1741 to 1748 for one set of FPGA interconnections, connectors 1749 to 1756 for another set of FPGA interconnections, and connectors 1757 and 1758 for the local bus. Connector 1759 is used to couple the third board 1740 to the motherboard.
In Figure 41(E), the second board 1765 contains connectors 1766 to 1773 for one set of FPGA interconnections, connectors 1774 to 1781 for another set of FPGA interconnections, and connectors 1782 and 1783 for the local bus. Connector 1784 is not used to couple the second board 1765 to the motherboard. This configuration is also shown in Figure 38(B), where the second board 1525 is coupled to the third board 1532 and the first board 1526 but is not directly coupled to motherboard 1520.
In Figure 41(F), the first board 1790 contains connectors 1791 to 1798 for one set of FPGA interconnections, connectors 1799 to 1804, 1806 and 1809 for another set of FPGA interconnections, and connectors 1811 and 1812 for the local bus. Connector 1813 is used to couple the first board 1790 to the motherboard. Because the first board 1790 is one of the end boards on the motherboard (the sixth board 1660 in Figure 41(A) being at the other end), connectors 1805, 1807, 1808 and 1810 are designated for the 10-ohm R-pack connections of certain north-south interconnections.
In one embodiment of the invention, multiple boards are coupled to the motherboard and to each other in a unique manner. The boards are coupled together in sequence, component side to solder side. One of the boards, the first board, is coupled to the motherboard, and thus to the PCI bus, through a motherboard connector. The FPGA interconnect bus on the first board is coupled to the FPGA interconnect bus of another board (for example, the second board) through a pair of FPGA interconnect connectors. The FPGA interconnect connector of the first board is on its component side, and the FPGA interconnect connector of the second board is on its solder side. These component-side and solder-side connectors on the first board and the second board, respectively, allow the FPGA interconnect buses to be coupled together.
Similarly, the local buses of the two boards are coupled together through local bus connectors. The local bus connector on the first board is on its component side, and the local bus connector on the second board is on its solder side. These component-side and solder-side connectors on the first board and the second board, respectively, allow the local buses to be coupled together.
More boards can be added. A third board can be added with its solder side facing the component side of the second board. Similar FPGA interconnect and local bus inter-board connections are then made. The third board is also coupled to the motherboard through another connector, but that connector only provides power and ground to the third board, as discussed below.
The component-side-to-solder-side connectors of the two-board configuration are discussed with reference to Figure 38(A). This figure shows a side view of the FPGA board connections on the motherboard according to the present invention. Figure 38(A) shows the two-board configuration in which, as the name implies, only two boards are used. The two boards 1525 (second board) and 1526 (first board) in Figure 38(A) correspond to the two boards 1552 and 1551 in Figure 39. Reference numeral 1989 denotes the component side of boards 1525 and 1526. Reference numeral 1988 denotes the solder side of boards 1525 and 1526. As shown in Figure 38(A), boards 1525 and 1526 are coupled to motherboard 1520 through motherboard connector 1523. Other motherboard connectors 1521, 1522 and 1524 are also provided for expansion. Signals between the PCI bus and boards 1525 and 1526 are routed through motherboard connector 1523. PCI signals between the two-board structure and the PCI bus pass through the first board 1526 first. Thus, a signal from the PCI bus reaches the first board 1526 before it reaches the second board 1525. Similarly, signals from the two-board structure to the PCI bus are sent from the first board 1526. Power from a power supply (not shown) is also applied through motherboard connector 1523.
As shown in Figure 38(A), board 1526 contains several components and connectors. One such component is FPGA logic device 1530. Connectors 1528A and 1531A are also provided. Similarly, board 1525 contains several components and connectors. One such component is FPGA logic device 1529. Connectors 1528B and 1531B are also provided.
In one embodiment, connectors 1528A and 1528B are the inter-board connectors for the FPGA bus (such as 1590 and 1581; see Figure 44). These inter-board connectors provide the inter-board connections for the various FPGA interconnections, such as N[73:0], S[73:0], W[73:0], E[73:0], NH[27:0], SH[27:0], XH[36:0] and XH[72:37], but not the local bus connections.
In addition, connectors 1531A and 1531B are the inter-board connectors for the local bus. The local bus handles the signals between the PCI bus (through the PCI controller) and the FPGA bus (through the FPGA I/O controller (CTRL_FPGA) unit). The local bus also handles configuration and boundary scan test information among the PCI controller, the FPGA logic devices and the FPGA I/O controller (CTRL_FPGA) unit.
In summary, the motherboard connector couples one board of each pair of boards to the PCI bus and to power. One set of connectors couples the FPGA interconnections from the component side of one board to the solder side of another board. Another set of connectors couples the local bus from the component side of one board to the solder side of another board.
In another embodiment of the invention, more than two boards are used. Figure 38(B) shows a six-board configuration. This configuration is analogous to that of Figure 38(A): every other board is directly coupled to the motherboard, and the interconnections and local buses of the boards are coupled together through inter-board connectors arranged solder side to component side.
Figure 38(B) shows six boards: 1526 (first board), 1525 (second board), 1532 (third board), 1533 (fourth board), 1534 (fifth board) and 1535 (sixth board). These six boards are coupled to motherboard 1520 through the connectors on boards 1526 (first board), 1532 (third board) and 1534 (fifth board). The other boards, 1525 (second board), 1533 (fourth board) and 1535 (sixth board), are not directly coupled to the motherboard; they are coupled to it indirectly through their connections to the neighboring boards.
Various inter-board connectors are mounted between the solder sides and the component sides; they couple together the PCI bus components, the FPGA logic devices, the memory devices and the various simulation system control circuits. The first set of inter-board connectors 1990 corresponds to connectors J5 to J16 in Figure 42. The second set of inter-board connectors 1991 corresponds to connectors J17 to J28 in Figure 42. The third set of inter-board connectors 1992 corresponds to connectors J3 and J4 in Figure 42.
Motherboard connectors 1521 to 1524 on motherboard 1520 couple the motherboard (and the PCI bus) to the six boards. As described above, boards 1526 (first board), 1532 (third board) and 1534 (fifth board) are directly coupled to connectors 1523, 1522 and 1521, respectively. The other boards, 1525 (second board), 1533 (fourth board) and 1535 (sixth board), are not directly coupled to motherboard 1520. Because the six boards together need only one PCI controller, only the first board 1526 contains a PCI controller. Motherboard connector 1523, which is coupled to the first board 1526, provides the path to the PCI bus. Connectors 1522 and 1521 provide only power and ground. In one embodiment, the center-to-center spacing between adjacent motherboard connectors is approximately 20.32 mm.
For boards 1526 (first board), 1532 (third board) and 1534 (fifth board), which are directly coupled to motherboard connectors 1523, 1522 and 1521, respectively, connectors J5 to J16 are located on the component side, connectors J17 to J28 are located on the solder side, and local bus connectors J3 and J4 are located on the component side. For boards 1525 (second board), 1533 (fourth board) and 1535 (sixth board), which are not directly coupled to motherboard connectors 1523, 1522 and 1521, connectors J5 to J16 are located on the solder side, connectors J17 to J28 are located on the component side, and local bus connectors J3 and J4 are located on the solder side. For the end boards 1526 (first board) and 1535 (sixth board), some of connectors J17 to J28 are 10-ohm R-pack terminations.
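The side assignments described above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name and dictionary keys are hypothetical, and it assumes the six-board numbering of Figure 38(B), in which the odd-numbered boards plug directly into the motherboard.

```python
COMPONENT = "component side"
SOLDER = "solder side"


def connector_layout(board):
    """Return where each connector group sits on a given board (1 to 6)."""
    if board not in range(1, 7):
        raise ValueError("this sketch only covers boards 1 to 6")
    direct = board % 2 == 1  # boards 1, 3 and 5 plug into the motherboard
    return {
        "direct_to_motherboard": direct,
        "J5-J16": COMPONENT if direct else SOLDER,
        "J17-J28": SOLDER if direct else COMPONENT,
        "J3-J4": COMPONENT if direct else SOLDER,
        # End boards carry 10-ohm R-pack terminations on part of J17-J28.
        "r_pack_terminated": board in (1, 6),
    }


if __name__ == "__main__":
    for b in range(1, 7):
        print(b, connector_layout(b))
```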
Figures 40(A) and 40(B) show the array connections across different boards. To simplify production, a single layout design is used for all boards. As explained above, the boards are coupled to other boards through connectors, without a backplane. Figure 40(A) shows two exemplary boards 1611 (second board) and 1610 (first board). The component side of board 1610 faces the solder side of board 1611. Board 1611 contains numerous FPGA logic devices, other components and wiring. Particular nodes of these logic devices and other components on board 1611 are represented by node A' (reference numeral 1612) and node B' (reference numeral 1614). Node A' is coupled to connector pad 1616 through PCB trace 1620. Similarly, node B' is coupled to connector pad 1617 through PCB trace 1623.
Similarly, board 1610 also contains numerous FPGA logic devices, other components and wiring. Particular nodes of these logic devices and other components on board 1610 are represented by node A (reference numeral 1613) and node B (reference numeral 1615). Node A is coupled to connector pad 1618 through PCB trace 1625. Similarly, node B is coupled to connector pad 1619 through PCB trace 1622.
The routing of signals between nodes on different boards using surface mount connectors is now discussed. In Figure 40(A), the desired connections are (1) between node A' and node B, as indicated by imaginary paths 1620, 1621 and 1622, and (2) between node B' and node A, as indicated by imaginary paths 1623, 1624 and 1625. These connections are used for paths such as asymmetric interconnection 1600 between board 1551 and board 1552 in Figure 39. Other asymmetric interconnections include the NH-to-SH interconnections 1977, 1979 and 1981 on both sides of connectors 1589 and 1590.
A-A' and B-B' correspond to interconnections such as interconnection 1515 (N, S). The N and S interconnections use thru-hole connectors, whereas the asymmetric NH and SH interconnections use SMD connectors. See Table D for details.
The actual implementation using surface mount connectors is discussed below with reference to Figure 40(B), in which like reference numerals denote like parts. In Figure 40(B), node A' on the component side of board 1611 is coupled through PCB trace 1620 to connector pad 1636 on the component side. Connector pad 1636 on the component side is coupled to connector pad 1639 on the solder side through conductive path 1651. Connector pad 1639 on the solder side is coupled to connector pad 1642 on the component side of board 1610 through conductive path 1648. Finally, connector pad 1642 on the component side is coupled to node B through PCB trace 1622. In this way, node A' on board 1611 can be coupled to node B on board 1610.
Likewise, in Figure 40(B), node B' on the component side of board 1611 is coupled through PCB trace 1623 to connector pad 1638 on the component side. Connector pad 1638 on the component side is coupled to connector pad 1637 on the solder side through conductive path 1650. Connector pad 1637 on the solder side is coupled to connector pad 1640 on the component side of board 1610 through conductive path 1645. Finally, connector pad 1640 on the component side is coupled to node A through PCB trace 1625. In this way, node B' on board 1611 can be coupled to node A on board 1610. Because all of these boards share the same layout, conductive paths 1652 and 1653 can be used, in the same manner as conductive paths 1650 and 1651, for a board placed adjacent to board 1610. A unique inter-board connection scheme is thus provided that uses surface mount and thru-hole connectors rather than switching components.
F. Timing-Insensitive Glitch-Free Logic Device
One embodiment of the present invention solves both the hold time problem and the clock glitch problem. According to one embodiment of the invention, when the user design is configured into the hardware model of the reconfigurable computing system, the standard logic devices found in the user design (such as latches and flip-flops) are replaced with emulation logic devices, that is, timing-insensitive glitch-free (TIGF) logic devices. In one embodiment, a trigger signal incorporated into the EVAL signal is used to update the values stored in these TIGF logic devices. After the various inputs and other signals have propagated through the hardware model of the user design and reached steady state during the evaluation period, the trigger signal is generated to update the values stored or latched in the TIGF logic devices. A new evaluation period then begins. In one embodiment, this evaluation-then-trigger sequence repeats cyclically.
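The evaluate-then-trigger sequence described above can be illustrated with a behavioral sketch in Python. This is a toy software model under stated assumptions, not the circuit of the embodiment: the class and function names are hypothetical, and the evaluation period is reduced to a single pass in which every storage element samples the currently stored output of its upstream neighbor.

```python
class TIGFRegister:
    """Toy model of a storage element that changes only on a trigger."""

    def __init__(self, value=0):
        self.stored = value   # value seen by downstream logic
        self.pending = value  # value sampled during the evaluation period

    def evaluate(self, d):
        # May be called many times while signals settle (including with
        # glitchy values); the stored value is never disturbed here.
        self.pending = d

    def trigger(self):
        # Commit whatever the input had settled to.
        self.stored = self.pending


def run_cycle(registers, s_in):
    """One evaluation period followed by one trigger for a shift chain."""
    inputs = [s_in] + [r.stored for r in registers[:-1]]
    for reg, d in zip(registers, inputs):
        reg.evaluate(d)
    for reg in registers:
        reg.trigger()
    return [r.stored for r in registers]


if __name__ == "__main__":
    chain = [TIGFRegister(0), TIGFRegister(1), TIGFRegister(0)]
    print(run_cycle(chain, 1))
    print(run_cycle(chain, 1))
```

Because every element commits only on the trigger, the order in which the elements are evaluated in this sketch does not affect the result, which is the property the text attributes to TIGF devices.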
The hold time problem mentioned above is now discussed briefly. As those of ordinary skill in the art know, a common problem in logic circuit design is the hold time violation. The hold time is the minimum time for which the data input of a logic element must remain stable after a change at the control input (such as the clock input) has caused the value indicated by the data input to be latched, captured or stored; otherwise, the logic element will not operate properly.
The hold time requirement is now illustrated with the example of a shift register. Figure 75(A) shows an exemplary shift register in which three D flip-flops are connected in series; that is, the output of flip-flop 2400 is coupled to the input of flip-flop 2401, whose output is in turn coupled to the input of flip-flop 2402. The overall input signal Sin is coupled to the input of flip-flop 2400, and the overall output signal Sout is generated by flip-flop 2402. The three flip-flops receive a common clock signal at their respective clock inputs. This shift register is designed on the assumptions that (1) the clock signal reaches all three flip-flops at the same time, and (2) after a clock edge is detected, the input of a flip-flop does not change during the hold time.
The timing diagram of Figure 75(B) illustrates the hold time assumption for a system that does not violate the hold time requirement. Hold time varies from one logic element to another, but it is always specified in the data sheet. At time t0, the clock input changes from logic 0 to logic 1. As shown in Figure 75(A), the clock is applied to flip-flops 2400 to 2402. Starting from the clock edge at t0, the input S_in must remain stable for the hold time T_H, which lasts from time t0 to t1. Similarly, the inputs to flip-flop 2401 (i.e., D2) and flip-flop 2402 (i.e., D3) must also remain stable for the hold time that begins at the triggering edge of the clock signal. Because Figures 75(A) and 75(B) satisfy this requirement, the input S_in is shifted into flip-flop 2400, the input at D2 (logic 0) is shifted into flip-flop 2401, and the input at D3 (logic 1) is shifted into flip-flop 2402. As those of ordinary skill in the art know, if the hold time requirement is satisfied after the clock edge has triggered, the new values at flip-flop 2401 (logic 1 at input D2) and flip-flop 2402 (logic 0 at input D3) will be shifted into or stored in the next flip-flop at the next clock cycle. The following table summarizes the operation of the shift register for these example values:
                         D1   D2   D3   Q3
Before the clock edge    1    0    1    0
After the clock edge     1    1    0    1
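The table can be reproduced with a minimal behavioral sketch of an ideal, zero-skew three-stage shift register. The function name `clock_edge` and the tuple layout are assumptions made for illustration only; they do not come from the patent.

```python
def clock_edge(s_in, q):
    """Return (Q1, Q2, Q3) after one ideal clock edge.

    q is (Q1, Q2, Q3) before the edge. Every flip-flop samples its D input
    at the same instant, so each stage captures its neighbour's OLD output.
    """
    q1, q2, _q3 = q
    return (s_in, q1, q2)


# Values from the table: D1 = S_in = 1, D2 = Q1 = 0, D3 = Q2 = 1, Q3 = 0.
before = (0, 1, 0)               # (Q1, Q2, Q3) before the clock edge
after = clock_edge(1, before)
print(after)                     # (1, 0, 1), i.e. D2 = 1, D3 = 0, Q3 = 1
```

Because all three stages are updated from the old tuple in a single step, the sketch models the assumption that the clock reaches every flip-flop at the same time.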
In a practical implementation, the clock signal does not reach all logic elements at exactly the same time. Rather, the circuit is designed so that the clock signal reaches all logic elements almost or substantially simultaneously. The circuit must be designed so that the clock skew, i.e., the timing difference between the clock signals arriving at the individual flip-flops, is much smaller than the hold time requirement. All logic elements will then capture the appropriate input values. In the example illustrated in Figures 75(A) and 75(B) above, a hold time violation caused by the clock signal arriving at flip-flops 2400 to 2402 at different times can result in some flip-flops capturing the old input value while another flip-flop captures the new input value. As a result, the shift register does not operate properly.
When the same shift register design is implemented in a reconfigurable logic device (e.g., an FPGA), the circuit can be designed so that, if the clock is generated directly from a primary input, a low-skew network distributes the clock signal to all logic elements, which then detect the clock edge at substantially the same time. Primary clocks are generated by self-timed test-bench processes. Primary clock signals are usually generated in software, and only a few primary clocks (i.e., 1-10) are found in a typical user circuit design.
However, if the clock signal is generated by internal logic rather than from a primary input, hold time becomes a much more significant problem. Derived or gated clocks are generated from a network of combinational logic and registers that are in turn driven by the primary clocks. Many derived clocks (i.e., 1,000 or more) are found in a typical user circuit design. Without additional precautions or controls, these clock signals can reach the individual logic elements at different times, and the clock skew may be longer than the hold time. This can cause a circuit design, such as the shift register circuit illustrated in Figures 75(A) and 75(B), to fail.
A hold time violation will now be discussed using the same shift register circuit illustrated in Figure 75(A). This time, the individual flip-flops of the shift register circuit are spread across multiple reconfigurable logic chips (e.g., multiple FPGA chips), as shown in Figure 76(A). The first FPGA chip 2411 contains internally derived clock logic 2410, which feeds its clock signal CLK to components on FPGA chips 2412 to 2416. In this example, the internally generated clock signal CLK is provided to flip-flops 2400 to 2402 of the shift register circuit. Chip 2412 contains flip-flop 2400, chip 2415 contains flip-flop 2401, and chip 2416 contains flip-flop 2402. The other two chips 2413 and 2414 are used to illustrate the hold time concept.
The clock logic 2410 in chip 2411 receives a primary clock input (or possibly another derived clock input) and generates the internal clock signal CLK. This internal clock signal CLK travels to chip 2412, where it is labeled CLK1. The internal clock signal CLK from clock logic 2410 also travels to chip 2415 by way of chips 2413 and 2414, where it is labeled CLK2. As shown, CLK1 is input to flip-flop 2400 and CLK2 is input to flip-flop 2401. Both CLK1 and CLK2 experience wire trace delays, so their edges are delayed relative to the edge of the internal clock signal CLK. In addition, CLK2 experiences further delay because it passes through the two other chips 2413 and 2414.
Referring to the timing diagram of Figure 76(B), the internal clock signal CLK is generated and triggers at time t2. Because of wire trace delays, CLK1 does not arrive at flip-flop 2400 in chip 2412 until time t3; this delay is labeled T1. As shown in the table above, the Q1 output (or the D2 input) is at logic 0 before the CLK1 clock edge arrives. After flip-flop 2400 detects the edge of CLK1, the D1 input must remain stable for the required hold time H2 (i.e., until time t4). Flip-flop 2400 then shifts in or stores logic 1, so that the Q1 output (or D2) is at logic 1.
While this is occurring at flip-flop 2400, the clock signal CLK2 is making its way to flip-flop 2401 in chip 2415. The delay T2 caused by chips 2413 and 2414 makes CLK2 arrive at flip-flop 2401 at time t5. By this time, the D2 input is at logic 1. Once the hold time required by flip-flop 2401 has been satisfied, this logic 1 value appears at the Q2 output (or D3). Thus, the output Q2 is at logic 1 before CLK2 arrives and is still at logic 1 after CLK2 arrives. This result is incorrect; the shift register should have shifted in logic 0. While flip-flop 2400 correctly shifts in the old input value (logic 1), flip-flop 2401 incorrectly shifts in the new input value (logic 1). This is the typical faulty operation that occurs when the clock skew (or timing delay) is greater than the hold time. In this example, T2 > T1 + H2. In general, unless precautions are taken, hold time violations are likely to occur whenever a clock signal is generated in one chip and distributed to logic elements residing on different chips.
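The condition T2 > T1 + H2 can be illustrated with a small sketch. It assumes, as Figure 76(B) is described, that Q1 (and therefore D2) takes its new value at T1 + H2 after the CLK edge; the function name and the numeric delays are hypothetical, and the setup and hold window of flip-flop 2401 itself is ignored.

```python
def captured_by_2401(t1, t2, h2, old_q1, new_q1):
    """Return the value that flip-flop 2401 captures at its D2 input.

    t1: delay from CLK to CLK1 at flip-flop 2400
    t2: delay from CLK to CLK2 at flip-flop 2401
    h2: time after the CLK1 edge at which Q1 (= D2) takes its new value
    """
    d2_changes_at = t1 + h2
    # If CLK2 arrives after D2 has already changed, the new value is captured.
    return new_q1 if t2 > d2_changes_at else old_q1


# Hypothetical delays in arbitrary time units.
print(captured_by_2401(t1=2, t2=3, h2=2, old_q1=0, new_q1=1))  # 0: correct shift
print(captured_by_2401(t1=2, t2=9, h2=2, old_q1=0, new_q1=1))  # 1: hold time violation
```

In the second call the skew between CLK1 and CLK2 exceeds the hold time, so the second stage sees the value that the first stage has just stored rather than the value it held before the edge.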
The clock glitch problem will now be discussed with reference to Figures 77(A) and 77(B). In general, when the inputs of a circuit change, the outputs may change to some random value for a very brief time before settling to the correct value. If another circuit inspects the output at just the wrong time and reads the random value, the results can be incorrect and difficult to debug. Such a random value that adversely affects another circuit is called a glitch. In typical logic circuits, one circuit may generate the clock signal for another circuit. If uncompensated timing delays exist in one or both circuits, a clock glitch (i.e., an unplanned occurrence of a clock edge) may occur and cause an incorrect result. As with hold time violations, clock glitches arise because certain logic elements in the circuit design change value at different times.
Figure 77(A) shows an example logic circuit in which some logic elements generate a clock signal for another set of logic elements; that is, D flip-flop 2420, D flip-flop 2421, and exclusive-OR (XOR) gate 2422 generate a clock signal (CLK3) for D flip-flop 2423. Flip-flop 2420 receives its data input at D1 on line 2425 and provides its data output at Q1 on line 2427. It receives its clock input (CLK1) from clock logic 2424. CLK refers to the clock signal originally generated by clock logic 2424, and CLK1 refers to the same signal as delayed by the time it reaches flip-flop 2420.
Flip-flop 2421 receives its data input at D2 on line 2426 and provides its data output at Q2 on line 2428. It receives its clock input (CLK2) from clock logic 2424. As mentioned above, CLK refers to the clock signal originally generated by clock logic 2424, and CLK2 refers to the same signal as delayed by the time it reaches flip-flop 2421.
The outputs of flip-flops 2420 and 2421, on lines 2427 and 2428 respectively, are the inputs to XOR gate 2422. XOR gate 2422 outputs a signal labeled CLK3 to the clock input of flip-flop 2423. Flip-flop 2423 also receives its data input at D3 on line 2429 and provides its data output at Q3.
The clock glitch problem that this circuit can cause will now be discussed with reference to the timing diagram in Figure 77(B). The CLK signal triggers at time t0. By the time this clock signal (i.e., CLK1) reaches flip-flop 2420, it is already time t1. CLK2 does not reach flip-flop 2421 until time t2.
Assume that the inputs D1 and D2 are both at logic 1. When CLK1 reaches flip-flop 2420 at time t1, the output at Q1 goes to logic 1 (as shown in Figure 77(B)). CLK2 arrives at flip-flop 2421 somewhat later, at time t2, so the Q2 output on line 2428 remains at logic 0 from time t1 to time t2. XOR gate 2422 therefore presents a logic 1 at its output (CLK3) to the clock input of flip-flop 2423 from time t1 to time t2, even though the desired signal is logic 0 (1 XOR 1 = 0). The CLK3 pulse between time t1 and time t2 is a clock glitch. As a result, whatever logic value is present at D3 on input line 2429 of flip-flop 2423 is stored, whether this is desired or not, and flip-flop 2423 is then ready for the next input on line 2429. In a properly designed circuit, the delays of CLK1 and CLK2 would be minimized so that no clock glitch is generated, or at least so that the glitch is too short to affect the rest of the circuit.
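The glitch on CLK3 can be made visible with a short waveform sketch. It assumes integer time steps, that both D inputs are held at logic 1, and that both Q outputs start at logic 0; the function name `clk3_waveform` and the chosen arrival times are hypothetical.

```python
def clk3_waveform(t1, t2, t_end):
    """CLK3 = Q1 XOR Q2, sampled at integer time steps 0 .. t_end - 1.

    Q1 rises to 1 when CLK1 arrives at t1; Q2 rises to 1 when CLK2
    arrives at t2.
    """
    wave = []
    for t in range(t_end):
        q1 = 1 if t >= t1 else 0
        q2 = 1 if t >= t2 else 0
        wave.append(q1 ^ q2)     # XOR gate 2422
    return wave


print(clk3_waveform(t1=2, t2=5, t_end=8))  # [0, 0, 1, 1, 1, 0, 0, 0]
print(clk3_waveform(t1=2, t2=2, t_end=8))  # [0, 0, 0, 0, 0, 0, 0, 0]
```

With skewed arrival times, CLK3 pulses high between t1 and t2 and presents an unintended rising edge to flip-flop 2423. With equal arrival times, CLK3 stays at logic 0 throughout.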
Two known approaches to the hold time violation problem are (1) timing adjustment and (2) timing resynthesis. Timing adjustment, discussed in U.S. Patent No. 5,475,830, requires inserting enough delay elements (such as buffers) in certain signal paths to lengthen the hold times of the logic elements. For example, adding sufficient delay on inputs D2 and D3 in the shift register circuit can avoid hold time violations. Figure 78 accordingly shows the same shift register circuit with delay elements 2430 and 2431 added to inputs D2 and D3, respectively. Delay element 2430 can then be designed so that time t4 occurs after time t5, i.e., T2 < T1 + H2 (Figure 76(B)), and no hold time violation occurs.
One potential problem with the timing adjustment approach is that it relies too heavily on the data sheet of the FPGA chips. As those of ordinary skill in the art know, reconfigurable logic chips such as FPGA chips implement logic elements with look-up tables. The delay associated with a look-up table is given in the chip's data sheet, and designers who use timing adjustment to avoid hold time violations rely on this specified delay. However, this delay is only an estimate and varies from chip to chip. Another potential problem with the timing adjustment approach is that designers must also compensate for the wiring delays present throughout the circuit design. Although this is not impossible, estimating wiring delays is time-consuming and error-prone. More importantly, timing adjustment does not solve the clock glitch problem.
Another approach is timing resynthesis, a concept introduced by the VirtualWires technology of IKOS. The timing resynthesis concept involves transforming a user's circuit design into a functionally equivalent design while strictly controlling the timing of clock and pin-out signals through finite state machines and registers. Timing resynthesis retimes a user's circuit design by introducing a single high-speed clock. It also converts latches, gated clocks, and multiple synchronous and asynchronous clocks into a flip-flop-based, single-clock synchronous design. Timing resynthesis thus uses registers at the input and output pins of each chip to control inter-chip signal movement precisely, so that no inter-chip hold time violations occur. Timing resynthesis also uses a finite state machine in each chip to schedule, based on a reference clock, the inputs from other chips, the outputs to other chips, and the updates of internal flip-flops.
Figure 79 shows an example of a timing resynthesis circuit, using the same shift register circuit introduced in the discussion of Figures 75(A), 75(B), 76(A), and 76(B) above. The basic three-flip-flop shift register design has been transformed into a functionally equivalent design. Chip 2430 contains the original internal clock generation logic 2435, which is connected to a register 2443 by line 2448. Clock logic 2435 generates the CLK signal. A first finite state machine 2438 is also connected to register 2443, by line 2449. Both register 2443 and first finite state machine 2438 are controlled by a design-independent global reference clock.
The CLK signal also passes through chips 2432 and 2433 before reaching chip 2434. In chip 2432, a second finite state machine 2440 is connected to register 2445 by line 2462. The CLK signal travels from register 2443 to register 2445 on line 2461. Register 2445 outputs the signal to the next chip 2433 on line 2463. Chip 2433 contains a third finite state machine 2441, which controls register 2446 by line 2464. Register 2446 outputs the CLK signal to chip 2434.
Chip 2431 contains the original flip-flop 2436. Register 2444 receives the input S_in and outputs it to the D1 input of flip-flop 2436 on line 2452. The Q1 output of flip-flop 2436 is connected to register 2466 by line 2454. A fourth finite state machine 2439 controls register 2444 by line 2451, controls register 2466 by line 2455, and controls flip-flop 2436 by latch enable line 2453. Fourth finite state machine 2439 also receives the original clock signal CLK from chip 2430 on line 2450.
Chip 2434 contains the original flip-flop 2437, which receives the signal at its D2 input on line 2456 from register 2466 in chip 2431. The Q2 output of flip-flop 2437 is connected to register 2447 by line 2457. A fifth finite state machine 2442 controls register 2447 by line 2459 and controls flip-flop 2437 by latch enable line 2458. Fifth finite state machine 2442 also receives the original clock signal CLK from chip 2430 by way of chips 2432 and 2433.
With timing resynthesis, finite state machines 2438 to 2442, registers 2443 to 2447, and the single global reference clock are used to control signal flow across the multiple chips and to update the internal flip-flops. Thus, in chip 2430, the timing of the distribution of the CLK signal to other chips is controlled by first finite state machine 2438 through register 2443. Similarly, in chip 2431, fourth finite state machine 2439 schedules the delivery of the input S_in to flip-flop 2436 through register 2444, as well as the transfer of the Q1 output through register 2466. The latching function of flip-flop 2436 is also controlled by a latch enable signal from fourth finite state machine 2439. The same principle applies to the logic in the other chips 2432 to 2434. Because the inter-chip input delivery timing, the inter-chip output timing, and the updating of internal flip-flop state are all strictly controlled, inter-chip hold time violations are eliminated.
However, the timing resynthesis technique requires the user's circuit design to be transformed into a much larger, functionally equivalent circuit that includes the additional finite state machines and registers. Typically, the additional logic necessary to implement this technique can take up 20% of the useful logic in each chip. Moreover, this technique is not immune to the clock glitch problem. Designers using the timing resynthesis technique must take additional precautions to avoid clock glitches. A conservative approach is to design the circuit so that the inputs to logic devices using gated clocks cannot change at the same time. An aggressive approach is to use gate delays to filter out the glitches so that they do not affect the rest of the circuit. As stated above, however, timing resynthesis requires some additional non-trivial measures to avoid clock glitches.
Various embodiments of the present invention that solve both the hold time and clock glitch problems will now be discussed. During configuration mapping of the user design into the software model of the RCC computing system and the hardware model of the RCC array, the latch shown in Figure 18(A) is emulated with a timing-insensitive glitch-free (TIGF) latch in accordance with one embodiment of the present invention. Similarly, the design flip-flop shown in Figure 18(B) is emulated with a TIGF flip-flop in accordance with one embodiment of the present invention. These TIGF logic devices, whether in the form of a latch or a flip-flop, may also be called emulation logic devices. The updating of the TIGF latches and flip-flops is controlled by a global trigger signal.
In one embodiment of the present invention, not all of the logic devices found in the user design circuit are replaced with TIGF logic devices. A user design circuit contains portions that are enabled or clocked by primary clocks and other portions that are controlled by gated or derived clocks. Hold time violations and clock glitches are a problem for the latter, where logic devices are controlled by gated or derived clocks. In accordance with one embodiment of the present invention, only those particular logic devices that are controlled by gated or derived clocks are replaced with TIGF logic devices. In other embodiments, all of the logic devices found in the user design circuit are replaced with TIGF logic devices.
Before the TIGF latch and flip-flop embodiments of the present invention are discussed, the global trigger signal is discussed first. In general, the global trigger signal is used to make the TIGF latches and flip-flops keep their state (i.e., keep the old input value) during the evaluation period and update their state (i.e., store the new input value) during the short trigger period. In one embodiment, the global trigger signal, shown in Figure 82, is separate from and derived from the EVAL signal discussed above. In this embodiment, the global trigger signal tracks the EVAL signal during the evaluation period, and at the conclusion of the EVAL cycle a short trigger signal is generated to update the TIGF latches and flip-flops. In another embodiment, the EVAL signal itself is the global trigger signal; it is at one logic state (e.g., logic 0) during the evaluation period and at the other logic state (e.g., logic 1) during the non-evaluation, or TIGF latch/flip-flop update, period.
As discussed above with respect to the RCC computing system and the RCC hardware array, the evaluation period is used to propagate all changes in the primary inputs and in the flip-flop/latch devices through the entire user design, one simulation cycle at a time. During the propagation, the RCC system waits until all signals in the system have reached steady state. The evaluation period is calculated by the system after the user design has been mapped and configured into the appropriate reconfigurable logic devices (e.g., FPGA chips) of the RCC array. Accordingly, the evaluation period is design-specific; that is, the evaluation period for one user design may differ from that for another. The evaluation period should be long enough to ensure that all signals in the system propagate through the entire system and reach steady state before the next short trigger period.
As shown in Figure 82, the short trigger period is adjacent in time to the evaluation period. In one embodiment, the short trigger period follows the evaluation period. During the evaluation period that precedes the short trigger period, the input signals propagate through the hardware-model-configured portion of the user design circuit. In accordance with one embodiment of the present invention, the short trigger period, which is marked by a change in the logic state of the EVAL signal, controls all of the TIGF latches and flip-flops in the user design so that they are updated with the new values that have propagated during the evaluation period, after steady state has been reached. The short trigger signal is distributed globally over a low-skew network, and its duration (i.e., from t0 to t1 and from t2 to t3, as shown in Figure 82) can be as short as the reconfigurable logic devices allow for proper operation. During this short trigger period, the new primary inputs are sampled at the input stage of each TIGF latch and flip-flop, and the old values stored in the same TIGF latches and flip-flops are output to the next stage of the RCC hardware model of the user design. In the discussion below, the portion of the global trigger signal that occurs during the short trigger period will be referred to as the TIGF trigger, the TIGF trigger signal, the trigger signal, or simply the trigger.
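The alternation of evaluation period and short trigger period can be sketched in software as follows. This is only an illustrative model under stated assumptions: the hardware waits for a precomputed evaluation period rather than iterating to a fixed point, and the names `evaluate`, `trigger`, and `run`, as well as the two-bit counter used as the design, are hypothetical.

```python
def evaluate(signals, gates, max_passes=100):
    """Evaluation period: propagate values until no signal changes."""
    for _ in range(max_passes):
        changed = False
        for out, fn in gates.items():
            new = fn(signals)
            if signals.get(out) != new:
                signals[out] = new
                changed = True
        if not changed:
            return signals
    raise RuntimeError("signals did not reach steady state")


def trigger(signals, registers):
    """Short trigger period: every storage element captures its settled input."""
    captured = {q: signals[d] for q, d in registers.items()}
    signals.update(captured)
    return signals


def run(cycles):
    """Run a hypothetical two-bit counter for the given number of cycles."""
    signals = {"q0": 0, "q1": 0}
    gates = {
        "d0": lambda s: 1 - s["q0"],         # toggle the low bit
        "d1": lambda s: s["q1"] ^ s["q0"],   # carry into the high bit
    }
    registers = {"q0": "d0", "q1": "d1"}
    history = []
    for _ in range(cycles):
        evaluate(signals, gates)
        trigger(signals, registers)
        history.append((signals["q1"], signals["q0"]))
    return history


print(run(4))  # [(0, 1), (1, 0), (1, 1), (0, 0)]
```

All stored values are captured in one step inside `trigger`, which mirrors the idea that state changes only during the short trigger period, after every signal has settled.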
Figure 80(A) shows the latch 2470 originally shown in Figure 18(A). This latch operates as follows:
if (#S), Q ← 1
else if (#R), Q ← 0
else if (en), Q ← D
else Q keeps its old value.
Because this latch is level-sensitive and asynchronous, the output Q tracks the input D as long as the clock input is enabled and the latch enable input is enabled.
Figure 80(B) shows a TIGF latch in accordance with one embodiment of the present invention. Like the latch of Figure 80(A), the TIGF latch has a D input, an enable input, a set (S) input, a reset (R) input, and an output Q. In addition, it has a trigger input. The TIGF latch includes a D flip-flop 2471, a multiplexer 2472, an OR gate 2473, an AND gate 2474, and various interconnections.
D flip-flop 2471 receives its input from the output of AND gate 2474 on line 2476. The D flip-flop is also clocked by the trigger signal on line 2477, which is distributed globally by the RCC system according to a strict schedule that depends on the evaluation cycle. The output of D flip-flop 2471 is connected to one input of multiplexer 2472 by line 2478. The other input of multiplexer 2472 is connected to the TIGF latch D input on line 2475. The multiplexer is controlled by the enable signal on line 2484. The output of multiplexer 2472 is connected to one input of OR gate 2473 by line 2479. The other input of OR gate 2473 is connected to the set (S) input on line 2480. The output of OR gate 2473 is connected to one input of AND gate 2474 by line 2481. The other input of AND gate 2474 is connected to the reset (R) signal on line 2482. As mentioned above, the output of AND gate 2474 is fed back to the input of D flip-flop 2471 on line 2476.
The operation of this embodiment of the TIGF latch of the present invention will now be discussed. In this TIGF latch embodiment, D flip-flop 2471 holds the current state (i.e., old value) of the TIGF latch. Line 2476 at the input of D flip-flop 2471 presents the new input value that has yet to be latched into the TIGF latch. Line 2476 presents the new value because the primary input of the TIGF latch (the D input) on line 2475 eventually makes its way from multiplexer 2472 (once the proper enable signal is presented on line 2484), through OR gate 2473, and finally through AND gate 2474 onto line 2483, which feeds the new input signal of the TIGF latch back to D flip-flop 2471 on line 2476. The trigger signal on line 2477 updates the TIGF latch by clocking the new input value on line 2476 into D flip-flop 2471. Thus, the output of D flip-flop 2471 on line 2478 indicates the current state (i.e., old value) of the TIGF latch, while the input on line 2476 indicates the new input value that has yet to be latched by the TIGF latch.
Multiplexer 2472 receives the current state from D flip-flop 2471 and the new input value on line 2475. The enable signal on line 2484 serves as the selector signal for multiplexer 2472. Because the TIGF latch is not updated (i.e., does not store the new input value) until the trigger signal is presented on line 2477, the D input of the TIGF latch on line 2475 and the enable signal on line 2484 can arrive at the TIGF latch in any order. If this TIGF latch (and the other TIGF latches in the hardware model of the user design) encounters a situation that would normally cause a hold time violation in a circuit using conventional latches (as discussed above with respect to Figures 76(A) and 76(B), where one clock signal arrives much later than another), the TIGF latch operates properly by holding the correct old value until the trigger signal is presented on line 2477.
The trigger signal is distributed through the low-skew global clock network.
This TIGF latch also solves the clock glitch problem. Note that in the TIGF latch the clock signal is replaced by the enable signal. The enable signal on line 2484 may glitch during the evaluation period, but the TIGF latch continues to hold the current state. The only mechanism by which the TIGF latch can be updated is the trigger signal, which in one embodiment is presented after the evaluation period, when the signals have reached steady state.
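The structure of Figure 80(B) can be mirrored in a short behavioral sketch. The class and method names are hypothetical, and the signal polarities are assumptions: the set input is taken as active-high at OR gate 2473, and the reset input is modeled as an active-low `r_n` at AND gate 2474.

```python
class TIGFLatch:
    """Behavioral sketch of the TIGF latch of Figure 80(B)."""

    def __init__(self, state=0):
        self.state = state        # D flip-flop 2471: current (old) value
        self.next_value = state   # value on line 2476, waiting for the trigger

    def evaluate(self, d, en, s=0, r_n=1):
        """Recompute line 2476; may be called many times per evaluation period."""
        mux = d if en else self.state         # multiplexer 2472
        self.next_value = (mux | s) & r_n     # OR gate 2473, AND gate 2474
        return self.next_value

    def trigger(self):
        """Trigger signal on line 2477: store the settled value."""
        self.state = self.next_value
        return self.state


latch = TIGFLatch(state=0)
latch.evaluate(d=1, en=1)   # enable glitches high during evaluation
latch.evaluate(d=1, en=0)   # enable settles low before the trigger
print(latch.trigger())      # 0: the glitch left the stored state unchanged
latch.evaluate(d=1, en=1)   # enable is genuinely high at steady state
print(latch.trigger())      # 1
```

Only the values present when `trigger` is called matter, so the order in which D and the enable signal arrive during evaluation has no effect on the stored state.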
Figure 81(A) shows the flip-flop 2490 originally shown in Figure 18(B). This flip-flop operates as follows:
if (#S), Q ← 1
else if (#R), Q ← 0
else if (positive edge of CLK), Q ← D
else Q keeps its old value.
Because this flip-flop is edge-triggered, the output Q tracks the input D at the positive edge of the clock signal, as long as the flip-flop enable input is enabled.
Figure 81(B) shows a TIGF D flip-flop in accordance with one embodiment of the present invention. Like the flip-flop of Figure 81(A), the TIGF flip-flop has a D input, a clock input, a set (S) input, a reset (R) input, and an output Q. In addition, it has a trigger input. The TIGF flip-flop includes three D flip-flops 2491, 2492, and 2496, a multiplexer 2493, an OR gate 2494, two AND gates 2495 and 2497, and various interconnections.
Flip-flop 2491 receives the TIGF D input on line 2498 and the trigger input on line 2499, and provides its Q output on line 2500. This output line 2500 also serves as one of the inputs to multiplexer 2493. The other input of multiplexer 2493 comes from the Q output of flip-flop 2492 on line 2503. The output of multiplexer 2493 is connected to one input of OR gate 2494 by line 2505. The other input of OR gate 2494 is the set (S) signal on line 2506. The output of OR gate 2494 is connected to one input of AND gate 2495 by line 2507. The other input of AND gate 2495 is the reset (R) signal on line 2508. The output of AND gate 2495 (which is also the overall TIGF output Q) is connected to the input of flip-flop 2492 by line 2501. Flip-flop 2492 also has a trigger input on line 2502.
Returning to multiplexer 2493, its selector input is connected to the output of AND gate 2497 by line 2509. The inputs of AND gate 2497 are the CLK signal on line 2510 and the output of flip-flop 2496 on line 2512. Flip-flop 2496 also receives the CLK signal as its input on line 2511 and receives the trigger signal on line 2513.
The operation of this embodiment of the TIGF flip-flop of the present invention will now be discussed. In this embodiment, the TIGF flip-flop receives the trigger signal at three different points: D flip-flop 2491 via line 2499, D flip-flop 2492 via line 2502, and D flip-flop 2496 via line 2513.
The TIGF flip-flop stores the input value only when an edge of the clock signal has been detected. In accordance with one embodiment of the present invention, the required edge is the positive edge of the clock signal. An edge detector 2515 is provided to detect the positive edge of the clock signal. Edge detector 2515 includes a D flip-flop 2496 and an AND gate 2497. The edge detector is also updated by the trigger signal, which D flip-flop 2496 receives on line 2513.
D flip-flop 2491 holds the new input value of the TIGF flip-flop (the D input as sampled at the last trigger) and resists any changes at the D input on line 2498 until the trigger signal is presented on line 2499. Thus, before each evaluation period of the TIGF flip-flop, the new value is stored in D flip-flop 2491. The TIGF flip-flop thereby avoids hold time violations by pre-storing the new value until the TIGF flip-flop is updated by the trigger signal.
D flip-flop 2492 holds the current value (or old value) of the TIGF flip-flop until the trigger signal is presented on line 2502. This value is the state of the emulated TIGF flip-flop after it has been updated and before the next evaluation period. The input of D flip-flop 2492 on line 2501 holds the new value (which is also the value on line 2500 for a significant portion of the evaluation period).
Multiplexer 2493 receives the new input value on line 2500 and the old value currently stored in the TIGF flip-flop on line 2503. Based on the selector signal on line 2504, the multiplexer outputs either the new value (line 2500) or the old value (line 2503) as the output of the emulated TIGF flip-flop. This output may change with any clock glitches until all of the signals propagating through the hardware model of the user design approach steady state. Thus, by the end of the evaluation period, the input on line 2501 presents the new value stored in flip-flop 2491. When the TIGF flip-flop receives the trigger signal, flip-flop 2492 stores the new value that was present on line 2501, and flip-flop 2491 stores the next new value on line 2498. The TIGF flip-flop in accordance with one embodiment of the present invention is therefore not adversely affected by clock glitches.
To elaborate, this TIGF flip-flop also provides some immunity against clock glitches. As those of ordinary skill in the art will appreciate, if flip-flops 2420, 2421, and 2423 in Figure 77(A) are replaced with TIGF flip-flops, clock glitches will not affect any circuit that uses them. Referring again to Figures 77(A) and 77(B), the clock glitch adversely affects the circuit of Figure 77(A) because, between times t1 and t2, flip-flop 2423 clocks in a new value when it should not. The skewed nature of the CLK1 and CLK2 signals forces XOR gate 2422 to produce a logic 1 state between t1 and t2, and this state drives the clock line of the next flip-flop 2423. In accordance with one embodiment of the present invention, with TIGF flip-flops the clock glitch does not cause a new value to be clocked in. If flip-flop 2423 is replaced with a TIGF flip-flop, then once the signals have reached steady state during the evaluation period, the trigger signal in the short trigger period makes the TIGF flip-flop store the new value held in flip-flop 2491 (Figure 81(B)). Thereafter, any clock glitch, such as the one in Figure 77(B) between t1 and t2, will not clock in a new value. The TIGF flip-flop is updated only by the trigger signal, and the trigger signal is not presented to the TIGF flip-flop until after the evaluation period, by which time the signals propagating through the circuit have reached steady state.
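The TIGF flip-flop of Figure 81(B) can be sketched behaviorally in the same way. The class, method, and function names are hypothetical. Three details are assumptions rather than statements from the text: AND gate 2497 is taken to see the inverted output of flip-flop 2496 (needed for positive-edge detection), the set input is active-high at OR gate 2494, and the reset input is modeled as an active-low `r_n` at AND gate 2495.

```python
class TIGFFlipFlop:
    """Behavioral sketch of the TIGF D flip-flop of Figure 81(B)."""

    def __init__(self, q=0, d=0, clk=0):
        self.new_value = d    # flip-flop 2491: D as sampled at the last trigger
        self.old_value = q    # flip-flop 2492: current state
        self.prev_clk = clk   # flip-flop 2496: CLK as sampled at the last trigger
        self.q = q
        self._d, self._clk = d, clk

    def evaluate(self, d, clk, s=0, r_n=1):
        """Recompute output Q for the present D and CLK levels."""
        self._d, self._clk = d, clk
        edge = clk & (1 - self.prev_clk)                   # edge detector 2515
        mux = self.new_value if edge else self.old_value   # multiplexer 2493
        self.q = (mux | s) & r_n                           # OR 2494, AND 2495
        return self.q

    def trigger(self):
        """Short trigger period: all three internal flip-flops update."""
        self.old_value = self.q
        self.new_value = self._d
        self.prev_clk = self._clk


def skewed_shift():
    """Two-stage shift in which CLK2 arrives after Q1 has already changed."""
    ff1 = TIGFFlipFlop(q=0, d=1)     # like flip-flop 2400: old Q1 = 0, D1 = 1
    ff2 = TIGFFlipFlop(q=1, d=0)     # like flip-flop 2401: old Q2 = 1, D2 = 0
    q1 = ff1.evaluate(d=1, clk=1)    # CLK1 arrives first; Q1 becomes 1
    q2 = ff2.evaluate(d=q1, clk=1)   # CLK2 arrives late; D2 is already 1
    ff1.trigger()
    ff2.trigger()
    return q1, q2


def glitch_then_settle():
    """A clock pulse that disappears before the trigger leaves the state alone."""
    ff = TIGFFlipFlop(q=0, d=1)
    during = ff.evaluate(d=1, clk=1)    # glitch: output follows the new value
    settled = ff.evaluate(d=1, clk=0)   # clock settles low: old value again
    ff.trigger()
    return during, settled, ff.old_value


print(skewed_shift())         # (1, 0)
print(glitch_then_settle())   # (1, 0, 0)
```

In `skewed_shift`, the second stage outputs the pre-stored logic 0 even though its D input has already changed to logic 1, which is the behavior a conventional flip-flop fails to deliver when T2 > T1 + H2. In `glitch_then_settle`, the transient clock pulse changes the output only temporarily, and the state stored at the trigger is unaffected.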
Although this particular embodiment of the TIGF flip-flop is a D flip-flop, other flip-flops (e.g., T, JK, SR) are also within the scope of the present invention. Other types of edge-triggered flip-flops can be derived from the D flip-flop by adding AND/OR logic before the D input.
VII. Simulation Server
A simulation server in accordance with another embodiment of the present invention allows multiple users to access the same reconfigurable hardware unit so that the same or different user designs can be simulated and accelerated effectively in a time-shared manner. A high-speed simulation scheduler and state-swapping mechanisms keep the simulation server supplied with active simulation processes, which results in high throughput. The server gives multiple users or processes access to the reconfigurable hardware unit for acceleration and hardware state swapping. Once acceleration has been completed or the hardware state has been accessed, each user or process can simulate in software only, thereby releasing control of the reconfigurable hardware unit to other users or processes.
In the simulation server portion of this specification, terms such as "job" and "process" are used. In this specification, the terms "job" and "process" are generally used interchangeably. In the past, batch systems executed "jobs," while time-shared systems stored and executed "processes" or programs. In today's systems, these jobs and processes are similar. Thus, in this specification, the term "job" is not limited to batch-type systems and "process" is not limited to time-shared systems. Rather, at one extreme, a "job" is equivalent to a "process" if the "process" can be executed within one time slice or without interruption by any other time-shared intervenor. At the other extreme, a "job" is a subset of a "process" if the "job" requires multiple time slices to complete. So, if a "process" requires multiple time slices to execute to completion because other users or processes of equal priority are present, the "process" is divided into "jobs." Moreover, if a "process" does not require multiple time slices to execute to completion, because it belongs to the sole high-priority user or because it is short enough to complete within one time slice, the "process" is equivalent to a "job." Thus, a user can interact with one or more "processes" or programs that have been loaded and executed in the simulation system, and each "process" may require one or more "jobs" to complete on a time-shared system.
In one system configuration, multiple users at remote terminals can use the same multiprocessor workstation in a non-network environment to access the same reconfigurable hardware unit and review or debug the same or different user circuit designs. In a non-network environment, the remote terminals are connected to a main computing system to gain access to its processing functions. This non-network configuration allows multiple users to share access to the same user design for parallel debugging. Access is provided through a time-shared process in which a scheduler determines access priorities among the users, swaps jobs, and selectively locks hardware unit access among the scheduled users. In other instances, multiple users may access the same reconfigurable hardware unit through the server, each debugging his or her own separate and different user design. In this configuration, the multiple users or processes share the multiple microprocessors in the workstation with the operating system. In another configuration, multiple users or processes on separate microprocessor-based workstations access the same reconfigurable hardware unit across a network to review or debug the same or different user circuit designs. Similarly, access is provided through a time-shared process in which a scheduler determines access priorities among the users, swaps jobs, and selectively locks hardware unit access among the scheduled users. In a network environment, the scheduler listens for network requests through UNIX socket system calls. The operating system uses sockets to send commands to the scheduler.
As stated earlier, the simulation scheduler uses a preemptive multiple-priority round-robin algorithm. In other words, higher-priority users or processes are served first, until the user or process completes its job and ends the session. Among users or processes of equal priority, a preemptive round-robin algorithm is used in which each user or process is given an equal time slice in which to execute its operations, until completion. The time slice is short enough that multiple users or processes do not have to wait long before being served. The time slice is also long enough that sufficient operations are executed before the simulation server's scheduler interrupts one user or process in order to swap in and execute the new user's job. In one embodiment, the default time slice is 5 seconds and is user-settable. In one embodiment, the scheduler makes specific calls to the operating system's own built-in scheduler.
Figure 45 shows a non-network environment with a multiprocessor workstation in accordance with one embodiment of the present invention. Figure 45 is a variation of Figure 1, and accordingly, like reference numerals are used for like components and units. Workstation 1100 includes local bus 1105, a host/PCI bridge 1106, memory bus 1107, main memory 1108, and a cache memory subsystem (not shown). Other user interface units (e.g., monitor, keyboard) are also provided but are not shown in Figure 45. Workstation 1100 also includes multiple microprocessors 1101, 1102, 1103, and 1104, which are coupled to local bus 1105 via scheduler 1117 and connection/path 1118. As known to those of ordinary skill in the art, operating system 1121 provides the user-hardware interface foundation for the entire computing environment, managing files and allocating resources for the various users, processes, and devices in that environment. For conceptual purposes, operating system 1121 is shown along with bus 1122. References on operating systems can be found in Abraham Silberschatz and James L. Peterson, Operating System Concepts (1988), and William Stallings, Modern Operating Systems (1996), the contents of which are incorporated herein by reference.
In one embodiment, workstation 1100 is a Sun Microsystems Enterprise 450 system, which uses UltraSPARC processors. Instead of memory access via the local bus, the Sun 450 system lets the multiple processors access memory through dedicated buses that are connected to the memory through a crossbar switch. Thus, multiple processes can run with multiple microprocessors executing their respective instructions and accessing memory without going through the local bus. The Sun 450 system and UltraSPARC specifications are incorporated herein by reference. The Sun Ultra 60 system is another example of a microprocessor system, although it allows only two processors.
Scheduler 1117 provides time-shared access to reconfigurable hardware unit 20 via device driver 1119 and connection/path 1120. Scheduler 1117 is implemented mostly in software, to interact with the operating system of the host computing system, and partly in hardware, to interact with the simulation server by supporting simulation job interruption and the swapping in and out of simulation sessions. Scheduler 1117 and device driver 1119 are discussed in more detail below.
In workstation 1100, each of microprocessors 1101 to 1104 is capable of processing independently of the other microprocessors. In one embodiment of the present invention, workstation 1100 runs under a UNIX-based operating system, although in other embodiments workstation 1100 can run under a Windows-based or Macintosh-based operating system. For UNIX-based systems, the user is provided with X-Windows as the user interface for managing programs, tasks, and files as necessary. For details on the UNIX operating system, see Maurice J. Bach, The Design of the UNIX Operating System (1986).
In Figure 45, multiple users can access workstation 1100 through remote terminals. At times, a single user may use one particular CPU to run his or her processes. At other times, a single user uses different CPUs depending on resource limitations. Usually, operating system 1121 determines such access, and indeed the operating system itself may jump from one CPU to another to accomplish its tasks. To handle the time-sharing process, the scheduler listens for network requests through socket system calls and makes system calls to operating system 1121. Operating system 1121 in turn handles preemption by directing device driver 1119 to generate an interrupt signal to reconfigurable hardware unit 20. Generating the interrupt signal is one of many steps in the scheduling algorithm, which include stopping the current job, saving state information for the interrupted job, swapping jobs, and executing the new job. The server scheduling algorithm is discussed below.
Sockets and socket system calls are now discussed briefly. In one embodiment, the UNIX operating system can run in a time-sharing mode. The UNIX kernel allocates the CPU to a process for a period of time (i.e., a time slice) and, at the end of the time slice, preempts the process and schedules another process for the next time slice. The process preempted in the earlier time slice is rescheduled for execution in a later time slice.
One scheme for enabling and facilitating interprocess communication, and for allowing the use of sophisticated network protocols, is the socket. The kernel has three layers that function in the context of a client-server model: the socket layer, the protocol layer, and the device layer. The top layer, the socket layer, provides the interface between the system calls and the lower layers (the protocol layer and the device layer). Typically, a socket has end points that couple client processes with server processes. The socket end points can be on different machines. The middle layer, the protocol layer, provides the protocol modules for communication, such as TCP and IP. The bottom layer, the device layer, contains the device drivers that control the network devices. One example of a device driver is an Ethernet driver on an Ethernet-based network.
Processes communicate using the client-server model. In this model, the server process listens on a socket at one end point, and the client process communicates with the server process over another socket at the other end point of a two-way communication path. The kernel maintains the internal connections among the three layers of each client and server and routes data from client to server as needed.
The socket mechanism comprises several system calls, including the socket system call, which establishes the end points of a communication path. Many processes use the socket descriptor sd in many system calls. The bind system call associates a name with a socket descriptor sd. Other exemplary system calls include the connect system call, which requests that the kernel make a connection to a socket; the close system call, which closes a socket; the shutdown system call, which closes a socket connection; and the send and recv system calls, which transmit data over a connected socket.
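The system calls named above can be seen together in a short loopback exchange. The following is a minimal sketch using Python's standard `socket` module, whose methods wrap the corresponding system calls. The payload, the upper-casing "service", and the use of a background thread for the server are illustrative assumptions and are not taken from the specification.

```python
import socket
import threading


def serve_once(listener):
    """Server side: accept one client, read its request, send one reply."""
    conn, _addr = listener.accept()
    with conn:                               # close() is called on exit
        data = conn.recv(1024)               # recv: read from the connected socket
        conn.sendall(data.upper())           # send: write the reply
        conn.shutdown(socket.SHUT_WR)        # shutdown: end our side of the connection


# socket + bind + listen: create the server end point and give it a name.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))              # port 0 lets the kernel pick a free port
listener.listen(1)
port = listener.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# socket + connect: the client end point asks the kernel for a connection.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))
client.sendall(b"queue status")
reply = client.recv(1024)
client.close()                               # close: release the client socket

worker.join()
listener.close()
print(reply)
```

Both end points here are on the same machine, but only the address passed to `connect` would change if the server ran on a different workstation.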
Figure 46 shows another embodiment of the present invention, in which multiple workstations share a single simulation system on a time-shared basis across a network. The multiple workstations are coupled to the simulation system via scheduler 1117. Within the computing environment of the simulation system, a single CPU 11 is coupled to local bus 12 in workstation 1110. Multiple CPUs may also be provided in this system. As known to those of ordinary skill in the art, operating system 1121 is also provided, and nearly all programs and applications run on top of the operating system. For conceptual purposes, operating system 1121 is shown along with bus 1122.
In Figure 46, workstation 1110 includes the components and units found in Figure 1, along with scheduler 1117 and scheduler bus 1118, which are coupled to local bus 12 via operating system 1121. Scheduler 1117 controls time-shared access for user stations 1111, 1112, and 1113 by making socket calls to operating system 1121. Scheduler 1117 is implemented mostly in software and partly in hardware.
In this figure, only three users are shown as able to access the simulation system across the network. Of course, other system configurations provide for more or fewer than three users. Each user accesses the system through remote workstation 1111, 1112, or 1113. Remote user stations 1111, 1112, and 1113 are coupled to scheduler 1117 via network connections 1114, 1115, and 1116, respectively.
As known to those of ordinary skill in the art, device driver 1119 is coupled between PCI bus 50 and reconfigurable hardware unit 20. A connection or electrically conductive path 1120 is provided between device driver 1119 and reconfigurable hardware unit 20. In this network multi-user embodiment of the present invention, scheduler 1117 interfaces with device driver 1119 via operating system 1121 to communicate with and control reconfigurable hardware unit 20, for hardware acceleration and for simulation after hardware state restoration.
Again, in one embodiment, simulation workstation 1100 is a Sun Microsystems Enterprise 450 system, which uses UltraSPARC II multiprocessors. Instead of memory access via the local bus, the Sun 450 system lets the multiple processors access memory through dedicated buses connected to the memory through a crossbar switch, rather than tying up the local bus.
Figure 47 shows a high-level structure of the simulation server in accordance with the network embodiment of the present invention. The operating system is not shown explicitly but, as known to those of ordinary skill in the art, it is always present for file management and resource allocation to serve the various users, processes, and devices in the simulation computing environment. Simulation server 1130 includes scheduler 1137, one or more device drivers 1138, and reconfigurable hardware unit 1139. Although it is not explicitly shown as a single integral unit in Figures 45 and 46, the simulation server there comprises scheduler 1117, device driver 1119, and reconfigurable hardware unit 20. Returning to Figure 47, simulation server 1130 is coupled to three user workstations 1131, 1132, and 1133 via network connections/paths 1134, 1135, and 1136, respectively. As stated above, more or fewer than three workstations may be coupled to simulation server 1130.
The scheduler in the simulation server is based on a preemptive round-robin algorithm. In essence, the round-robin scheme allows several users or processes to take turns executing, in cyclic order, until they complete. Thus, each simulation job (which is associated with a workstation in a network environment, or with a user/process in a multiprocessing non-network environment) is assigned a priority level and a fixed time slice in which to execute.
Generally, higher-priority jobs execute first, to completion. At one extreme, if the different users each have a different priority, the user with the highest priority is served first until that user's job is completed, and the user with the lowest priority is served last. Here, no time slice is used, because each user has a different priority and the scheduler simply serves users in priority order. This scenario is analogous to having only one user access the simulation system until completion.
At the other extreme, the different users have equal priority. Here, the time slice concept is used together with a first-in, first-out (FIFO) queue. Among jobs of equal priority, each job executes until it completes or until its fixed time slice expires, whichever comes first. If a job does not execute to completion within its time slice, the simulation image associated with the tasks it has completed must be saved for later restoration and execution. The unfinished job is then placed at the end of the queue. The saved simulation image, if any, of the next job is then restored and executed in the next time slice.
A higher-priority job can preempt a lower-priority job. In other words, jobs of equal priority run in round-robin fashion until they execute to completion. Thereafter, jobs of lower priority run in round-robin fashion. If a higher-priority job is inserted into the queue while a lower-priority job is running, the higher-priority job preempts the lower-priority job until the higher-priority job executes to completion. Thus, higher-priority jobs run to completion before lower-priority jobs begin execution. If a lower-priority job has already begun execution, it is suspended and does not resume until the higher-priority job has executed to completion.
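The scheduling rules above (strict priority between levels, round robin with a fixed time slice within a level, and preemption when a higher-priority job arrives) can be sketched as a small tick-based simulation. This is an illustrative model only. The function name `run_schedule`, the tuple layout of a job, the convention that a larger number means higher priority, and the choice to give a preempted job a fresh time slice when it resumes are all assumptions, not details taken from the specification.

```python
from collections import deque


def run_schedule(jobs, time_slice):
    """Simulate preemptive multiple-priority round robin, one time unit per tick.

    jobs: iterable of (name, priority, work_units, arrival_time) tuples.
    Returns the list of job names in the order they occupied each tick.
    """
    pending = deque(sorted(jobs, key=lambda j: j[3]))  # ordered by arrival time
    queues = {}          # priority -> FIFO deque of [name, remaining_work]
    timeline = []
    now, used, last = 0, 0, None

    while pending or any(queues.values()):
        # Admit every job that has arrived by now to the tail of its queue.
        while pending and pending[0][3] <= now:
            name, prio, work, _arrival = pending.popleft()
            queues.setdefault(prio, deque()).append([name, work])

        ready = [p for p, q in queues.items() if q]
        if not ready:
            now = pending[0][3]      # idle until the next arrival
            continue

        queue = queues[max(ready)]   # highest priority level that has work
        job = queue[0]
        if job[0] != last:
            used = 0                 # a newly selected job starts a fresh slice

        job[1] -= 1
        used += 1
        now += 1
        timeline.append(job[0])
        last = job[0]

        if job[1] == 0:
            queue.popleft()          # finished
            last = None
        elif used == time_slice:
            queue.rotate(-1)         # slice expired: move to the end of the queue
            last = None
    return timeline


jobs = [("A", 1, 3, 0), ("B", 1, 3, 0), ("C", 2, 1, 4)]
print("".join(run_schedule(jobs, time_slice=2)))
```

With these sample jobs, A and B alternate in two-unit slices, the higher-priority job C runs as soon as it arrives at time 4, and A and B then finish their remaining work.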
In one embodiment, the UNIX operating system provides the basic, foundational preemptive round-robin scheduling algorithm. In accordance with one embodiment of the present invention, the simulation server's scheduling algorithm works in conjunction with the operating system's scheduling algorithm. In UNIX-based systems, the preemptive nature of the scheduling algorithm allows the operating system to preempt user-defined schedules. To enable the time-sharing scheme, the simulation scheduler uses a preemptive multiple-priority round-robin algorithm on top of the operating system's own scheduling algorithm.
In accordance with one embodiment of the present invention, the relationship between the multiple users and the simulation server follows a client-server model, in which the multiple users are the clients and the simulation server is the server. Communication between the user clients and the server takes place through socket calls. Referring briefly to Figure 55, the client includes client program 1109, a socket system call component 1123, UNIX kernel 1124, and a TCP/IP protocol component 1125. The server includes a TCP/IP protocol component 1126, a UNIX kernel 1127, socket system call component 1128, and simulation server 1129. Multiple clients may request that the server run simulation jobs through UNIX socket calls issued by the client application program.
In one embodiment, a typical sequence of events comprises multiple clients sending requests to the server via the UNIX socket protocol. For each request, the server acknowledges whether the command was executed successfully. For a request for server queue status, the server replies with the current queue state so that it can be displayed properly to the user. Table F below lists the client socket commands.
Table F: Client socket commands
Command   Description
0         Start simulation <design>
1         Pause simulation <simulation>
2         Exit simulation <design>
3         Re-assign priority to simulation session
4         Save design simulation state
5         Queue status
For each socket call, each command, encoded as an integer, may be followed by additional parameters, such as <design>, which represents the design name. The simulation server responds with 0 if the command was executed successfully and with 1 if it was not. For command 5, which requests the queue status, one embodiment of the command's return value is ASCII text terminated by a "\0" character, for display on the user's screen. With these system socket calls, the appropriate communication protocol signals are sent to and received from the reconfigurable hardware unit via the device driver.
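The command codes of Table F and the 0/1 acknowledgement can be sketched as follows. The specification gives the integer codes and the response convention but not the byte layout on the wire, so the space-separated, newline-terminated ASCII request format used here, the one-byte ASCII acknowledgements, and all function names are assumptions for illustration. Only the start and queue-status commands are modeled; the others are simply acknowledged.

```python
from enum import IntEnum


class Command(IntEnum):
    """Integer command codes from Table F."""
    START = 0
    PAUSE = 1
    EXIT = 2
    REPRIORITIZE = 3
    SAVE_STATE = 4
    QUEUE_STATUS = 5


def encode_request(command, *params):
    # Assumed wire format: "<code> <param> ...\n" in ASCII.
    return (" ".join([str(int(command)), *params]) + "\n").encode("ascii")


def decode_request(raw):
    fields = raw.decode("ascii").split()
    return Command(int(fields[0])), fields[1:]


def handle_request(raw, queue):
    """Return b"0" on success, b"1" on failure, or NUL-terminated queue text."""
    try:
        command, params = decode_request(raw)
    except (ValueError, IndexError):
        return b"1"                                # malformed or unknown command
    if command is Command.QUEUE_STATUS:
        return ("\n".join(queue) + "\0").encode("ascii")
    if command is Command.START:
        if not params:
            return b"1"                            # a design name is required
        queue.append(params[0])
    return b"0"                                    # other commands: acknowledged only


queue = []
print(handle_request(encode_request(Command.START, "cpu_core"), queue))
print(handle_request(encode_request(Command.QUEUE_STATUS), queue))
```

In a real deployment these byte strings would travel over the connected sockets described earlier; here the handler is called directly so the sketch stays self-contained.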
Figure 48 shows one embodiment of the simulation server architecture in accordance with the present invention. As explained above, a single simulation server can serve multiple users or multiple processes in a time-shared manner for simulation and hardware acceleration of user designs. Thus, users/processes 1147, 1148, and 1149 are coupled to simulation server 1140 via interprocess communication paths 1150, 1151, and 1152, respectively. These communication paths may reside within the same workstation, for a multiprocessor configuration, or in a network, for multiple workstations. Each simulation session contains software simulation states along with hardware states for communication with the reconfigurable hardware unit. Interprocess communication among the software sessions is performed using UNIX sockets or system calls, which allow a simulation session to reside on the same workstation in which the simulator plug-in card is installed, or on a separate workstation connected via a TCP/IP network. In this way, communication with the simulation server is initiated automatically.
In Figure 48, simulation server 1140 includes server monitor 1141, simulation job queue table 1142, priority sorter 1143, job swapper 1144, device driver 1145, and reconfigurable hardware unit 1146. Simulation job queue table 1142, priority sorter 1143, and job swapper 1144 make up scheduler 1137 shown in Figure 47.
Server monitor 1141 provides user interface functions for the system administrator. The user can monitor the status of the simulation server with commands that display the simulation job queue, the scheduling priorities, the usage history, and the simulation job swapping efficiency. Other utility functions include editing job priority, deleting simulation jobs, and resetting the simulation server state.
Simulation job queue table 1142 lists all outstanding simulation requests that the scheduler has inserted into the queue. Each table entry includes the job number, the software simulation process number, the software simulation image, the hardware simulation image file, the design configuration file, the priority number, the hardware size, the software size, the cumulative time of the simulation run, and the owner identification. The job queue is a first-in, first-out (FIFO) queue. Thus, when a new job is requested, it is placed at the end of the queue.
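The fields of a job queue table entry and the FIFO discipline can be sketched as a small data structure. The field names below follow the list in the paragraph above, but their types, the helper function names, and the sample values are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SimulationJob:
    """One entry of the simulation job queue table (field types are assumed)."""
    job_number: int
    process_number: int            # software simulation process number
    software_image: str            # path to the software simulation image
    hardware_image_file: str       # path to the hardware simulation image file
    design_config_file: str
    priority: int
    hardware_size: int
    software_size: int
    accumulated_run_time: float    # cumulative simulation run time
    owner: str


job_queue = deque()


def enqueue(job):
    """A newly requested job goes to the end of the queue."""
    job_queue.append(job)


def dequeue_next():
    """The job that has waited longest leaves the queue first."""
    return job_queue.popleft()


# Hypothetical sample entries.
enqueue(SimulationJob(1, 4012, "a.sw", "a.hw", "a.cfg", 1, 64, 128, 0.0, "alice"))
enqueue(SimulationJob(2, 4020, "b.sw", "b.hw", "b.cfg", 1, 32, 256, 0.0, "bob"))
print(dequeue_next().owner)
```

A priority sorter, as described next in the text, would reorder or select among these entries; this sketch covers only the table entry and the FIFO insertion order.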
Priority sorter 1143 decides which simulation job in the queue is executed. In one embodiment, the simulation job priority scheme is user-definable (i.e., it can be controlled and set by the system administrator) and governs which simulation process has priority for current execution. In one embodiment, priority levels are fixed according to the urgency of specific processes or the importance of specific users. More commonly, priority is set according to user ID. In a typical example, one user has high priority and all other users have lower but equal priority.
Priority levels are set by the system administrator. The simulation server obtains user information from the UNIX facility, typically from the UNIX user file named "/etc/passwd". Adding a new user to the simulation server follows the same procedure as adding a new user to the UNIX system. Once all users have been defined, the simulation server monitor can be used to adjust the priority levels of the different users.
Job swapper 1144 temporarily replaces a simulation job associated with one process or workstation with a simulation job associated with another process or workstation, according to the priority determination programmed into the scheduler. If multiple users are simulating the same design, the job swapper swaps in only the stored simulation state for the simulation session. If multiple users are simulating multiple designs, however, the job swapper loads the design into the hardware before swapping in the simulation state. In one embodiment, this job swapping mechanism enhances the performance of the time-sharing embodiment of the present invention because job swapping is needed only for access to the reconfigurable hardware unit. So, if one user needs software simulation for some period of time, the server swaps in another job for another user, so that the other user can access the reconfigurable hardware unit for hardware acceleration. The frequency of job swapping can be adjusted and set by the user. The device driver also communicates with the reconfigurable hardware unit to swap jobs.
The operation of the simulation server is now discussed. Figure 49 shows a flow diagram of the simulation server during its operation. Initially, at step 1160, the system is idle. Being idle at step 1160 does not necessarily mean that the simulation server is inactive or that no simulation task is running. Indeed, the idle state may mean one of the following: (1) no simulation is running; (2) only one user/workstation is active in a single-processor environment, so that time-sharing is not required; or (3) only one user/workstation is active in a multiprocessing environment, but only one process is running. In conditions 2 and 3 above, the simulation server has only one job to process, so that queuing jobs, determining priorities, and swapping jobs are unnecessary. The simulation server is essentially idle because it receives no requests from other workstations or processes.
When a simulation request arrives, in the form of one or more request signals from a workstation in a multi-user environment or from a microprocessor in a multiprocessor environment, the simulation server queues the incoming simulation job or jobs at step 1162. The scheduler maintains a simulation job queue table, into which it inserts every simulation request and which lists all outstanding simulation requests. For batch simulation jobs, the scheduler in the server queues all incoming simulation requests and processes the jobs automatically, without human intervention.
Then, at step 1163, the simulation server sorts the queued jobs to determine their priority. This step is particularly important when there are multiple jobs, because the server must then prioritize among them to grant access to the reconfigurable hardware unit. The priority sorter decides which simulation job is executed. In one embodiment, if there is resource contention, the system administrator controls the ordering of the simulation jobs and thereby determines which simulation process is currently executed.
After priorities have been determined at step 1163, the server swaps simulation jobs, if necessary, at step 1164. This step temporarily replaces a simulation job associated with one process or workstation with a simulation job associated with another process or workstation, according to the priority set by the scheduler in the server. If multiple users are simulating the same design, the job swapper swaps in only the stored simulation state for the simulation session. If multiple users are simulating different designs, the job swapper first loads the design and then swaps in the simulation state. Here, too, the device driver communicates with the reconfigurable hardware unit to swap jobs.
In one embodiment, the job swapping mechanism enhances the performance of the time-sharing embodiment of the present invention because job swapping is needed only for access to the reconfigurable hardware unit. So, if one user needs software simulation for some period of time, the server swaps in another job for another user, so that this second user can access the reconfigurable hardware unit for hardware acceleration. For example, assume that user 1 and user 2 both need to access the reconfigurable hardware unit through the simulation server. Initially, user 1 accesses the system for some period of time to debug his or her user design. If user 1 is debugging in software mode only, the server can release the reconfigurable hardware unit so that user 2 can access it. The server swaps in the job for user 2, and user 2 then proceeds in either software simulation mode or hardware acceleration mode. Depending on the relative priorities of user 1 and user 2, user 2 can continue to access the reconfigurable hardware unit for a predetermined time, or, if user 1 needs the reconfigurable hardware unit for acceleration, the server can preempt the job of user 2 so that the job of user 1 can be swapped in for hardware acceleration using the reconfigurable hardware unit. The predetermined time refers to the preemption of simulator jobs on the basis of multiple requests of the same priority. In one embodiment, the default time is 5 minutes, although the user can set it. This 5-minute setting represents one form of time-out timer. The simulation system of the present invention uses the time-out timer to stop execution of the current simulation job when that job is excessively time-consuming and the system decides that other pending jobs of equal priority should gain access to the reconfigurable hardware model.
After the job swapping step at step 1164 is complete, the device driver in the server locks the reconfigurable hardware unit so that only the currently scheduled user or process can simulate and use the hardware model. The locking and simulation step takes place at step 1165.
When the simulation completes, or when the currently simulating session pauses, at event 1166, the server returns to priority sorting step 1163 to determine the order of the pending simulation jobs and, if necessary, to swap simulation jobs. Similarly, the server may preempt the currently active simulation job at step 1167 and return to priority sorting state 1163. Such preemption occurs only under certain conditions. One such condition is that a higher-priority job is pending. Another is that the system is currently running a computationally intensive simulation task, in which case the scheduler can be programmed to use a time-out timer to preempt the currently running job and schedule another job of equal priority. In one embodiment, the time-out timer is set to 5 minutes. If the current job has executed for 5 minutes, the system preempts it and swaps in the pending job, even though the pending job is at the same priority level.
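The two preemption conditions just described can be captured in a single predicate. This is a minimal sketch: the function name, its parameters, and the convention that a larger number means higher priority are assumptions. The 300-second default corresponds to the 5-minute time-out mentioned in the text.

```python
def should_preempt(current_priority, elapsed_seconds, pending_priorities,
                   timeout_seconds=300):
    """Decide whether the currently running simulation job should be preempted.

    Condition 1: some pending job has a higher priority than the current job.
    Condition 2: the current job has run for at least the time-out period and
                 a pending job of equal priority is waiting.
    """
    if any(p > current_priority for p in pending_priorities):
        return True
    timed_out = elapsed_seconds >= timeout_seconds
    return timed_out and current_priority in pending_priorities


print(should_preempt(1, 10, [2]))    # higher-priority job pending
print(should_preempt(1, 10, [1]))    # equal priority, timer not yet expired
print(should_preempt(1, 300, [1]))   # equal priority, timer expired
```

A lower-priority pending job never triggers preemption in this sketch, regardless of how long the current job has been running.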
Figure 50 shows a flow chart of the job swapping process. The job swap function is performed in step 1164 of Figure 49 and is shown in the simulation server hardware as job swapper 1144 in Figure 48. In Figure 50, when a simulation job must be swapped with another simulation job, the job swapper sends an abort signal to the reconfigurable hardware unit in step 1180. If the reconfigurable hardware unit is not currently running any job (that is, the system is idle or the user is operating in software mode only, without any hardware acceleration), the abort signal immediately prepares the reconfigurable hardware unit for the job swap. If, however, the reconfigurable hardware unit is running a job and is in the middle of executing an instruction or processing data, the abort signal is recognized, but the reconfigurable hardware unit continues to execute the currently pending instruction and to process the data for the current job. If the reconfigurable hardware unit receives the abort signal while it is not in the middle of executing an instruction or processing data for the current job, the signal stops the operation of the reconfigurable hardware unit essentially immediately.
In step 1181, the simulation system saves the current simulation image (that is, the hardware and software states). By saving this image, the user can later resume the simulation without rerunning the whole simulation up to the point at which it was saved.
In step 1182, the simulation system configures the reconfigurable hardware unit with the new user design. This configuration step is necessary only if the user design associated with the new job differs from the user design already configured and loaded in the reconfigurable hardware unit whose execution has just been interrupted. After configuration, the saved hardware simulation image is reloaded in step 1183 and the saved software simulation image is reloaded in step 1184. If the new simulation job is associated with the same design, no reconfiguration is needed. For the same design, the simulation system loads the desired hardware simulation image associated with the new simulation job in step 1183, because the simulation image of the new job is likely to differ from that of the job just interrupted. The details of the configuration step are provided elsewhere in this patent specification. Thereafter, the associated software simulation image is reloaded in step 1184. After the hardware and software simulation images are reloaded, the new simulation job begins in step 1185, while the previously interrupted job can proceed only in software simulation mode, because it temporarily has no access to the reconfigurable hardware unit.
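The swap sequence of steps 1180 through 1184 can be sketched as follows. This is a toy model under stated assumptions: `Job`, `HardwareUnit`, and `swap_jobs` are hypothetical names, simulation images are represented as plain dictionaries, and the abort and configuration actions are only recorded in a log rather than performed on real hardware.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    design: str                                    # identifier of the user design
    hw_image: dict = field(default_factory=dict)   # saved hardware state
    sw_image: dict = field(default_factory=dict)   # saved software state


@dataclass
class HardwareUnit:
    """Toy stand-in for the reconfigurable hardware unit."""
    design: str = None
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


def swap_jobs(hw, outgoing, incoming, sw_state):
    """Swap `incoming` onto the hardware; return its software state."""
    hw.log.append("abort")                         # step 1180: abort signal
    if outgoing is not None:                       # step 1181: save images
        outgoing.hw_image = dict(hw.state)
        outgoing.sw_image = dict(sw_state)
    if hw.design != incoming.design:               # step 1182: only if design differs
        hw.design = incoming.design
        hw.log.append("configure " + incoming.design)
    hw.state = dict(incoming.hw_image)             # step 1183: reload hardware image
    return dict(incoming.sw_image)                 # step 1184: reload software image
```

When two jobs share a design, the sketch skips reconfiguration and reloads only the state images, which is the case the text singles out.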
Figure 51 shows the signals between the device driver and the reconfigurable hardware unit. Device driver 1171 provides the interface between scheduler 1170 and reconfigurable hardware unit 1172. As shown in Figures 45 and 46, device driver 1171 also provides the interface between the entire computing environment (that is, workstations, PCI bus, PCI devices) and reconfigurable hardware unit 1172, but Figure 51 shows only the simulation server portion. The signals between the device driver and the reconfigurable hardware unit include the bidirectional communication handshake signals, the unidirectional design configuration information sent from the computing environment through the scheduler to the reconfigurable hardware unit, the swapped-in simulation state information, the swapped-out simulation state information, and the abort signal sent from the device driver to the reconfigurable hardware unit so that simulation jobs can be swapped.
Line 1173 carries the bidirectional communication handshake signals. These signals and the handshake protocol are discussed below with reference to Figures 53 and 54.
Line 1174 carries the unidirectional design configuration information from the computing environment to reconfigurable hardware unit 1172 via scheduler 1170. Initial configuration information for modeling can be sent to reconfigurable hardware unit 1172 on this line 1174. In addition, when users are modeling and simulating different user designs, configuration information must be sent to reconfigurable hardware unit 1172 during a time slice. When different users are simulating the same user design, no new design configuration is needed; instead, the different simulation hardware states associated with the same design may need to be sent to reconfigurable hardware unit 1172 for the different simulation runs.
Line 1175 carries swapped-in simulation state information to reconfigurable hardware unit 1172. Line 1176 carries swapped-out simulation state information from the reconfigurable hardware unit to the computing environment (usually memory). The swapped-in simulation state information includes the previously saved hardware model state information and the hardware memory state that reconfigurable hardware unit 1172 needs to accelerate. The swapped-in state information is sent at the beginning of a time slice so that the scheduled current user can access the reconfigurable hardware unit for acceleration. The swapped-out state information includes hardware model and memory state information. When reconfigurable hardware unit 1172 receives an abort signal in order to move on to the next time slice, which is associated with a different user/process, this information must be saved in memory at the end of the current time slice. Saving the state information lets the current user/process restore this state at a later time, for example in the next time slice assigned to that user/process.
Line 1177 carries the abort signal from device driver 1171 to the reconfigurable hardware unit to swap simulation jobs. The abort signal is sent between time slices so that the current simulation job can be swapped out at the end of the current time slice and a new simulation job swapped in for the next time slice.
The communication handshake protocol according to the present invention will now be discussed with reference to Figures 53 and 54. Figure 53 shows the communication handshake signals between the device driver and the reconfigurable hardware unit via a handshake logic interface. Figure 54 is a state diagram of the communication protocol. Figure 51 shows the communication handshake signals on line 1173; Figure 53 is a detailed view of these signals between the device driver and the reconfigurable hardware unit.
In Figure 53, handshake logic interface 1234 is provided in reconfigurable hardware unit 1172. Alternatively, handshake logic interface 1234 can be installed external to reconfigurable hardware unit 1172. Four sets of signals run between device driver 1171 and handshake logic interface 1234: the 3-bit SPACE signal on line 1230, the 1-bit read/write signal on line 1231, the 4-bit COMMAND signal on line 1232, and the 1-bit DONE signal on line 1233. The handshake logic interface includes logic circuitry that processes these signals and places the reconfigurable hardware unit in the proper mode for the various operations that need to be performed. The interface is coupled to the CTRL_FPGA unit (or FPGA I/O controller).
For the 3-bit SPACE signal, data transfers over the PCI bus between the computing environment of the simulation system and the reconfigurable hardware unit are directed to specific I/O address spaces at the software/hardware boundary: REG (register), CLK (software clock), S2H (software to hardware), and H2S (hardware to software). As mentioned earlier, the simulation system maps the hardware model into four address spaces in main memory according to component type and control function: the REG space is assigned to register components; the CLK space is assigned to software clocks; the S2H space is assigned to the outputs of software test-bench components to the hardware model; and the H2S space is assigned to the outputs of the hardware model to the software test-bench components. During system initialization, these dedicated I/O buffer spaces are mapped into the kernel's main memory space.
Table G below lists and describes all the SPACE signals:
Table G: SPACE signal
SPACE   Description
000     Global (or CLK) space and software to hardware (DMA wr)
001     Register write (DMA wr)
010     Hardware to software (DMA rd)
011     Register read (DMA rd)
100     SRAM write (DMA wr)
101     SRAM read (DMA rd)
110     Unused
111     Unused
The read/write signal on line 1231 indicates whether the data transfer is a read or a write. The DONE signal on line 1233 indicates completion of a DMA data transfer.
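As an illustration, the SPACE encoding of Table G can be written as a lookup table. The sketch below only transcribes the table; the names `SPACE_TABLE` and `decode_space` are hypothetical and do not come from the specification.

```python
# 3-bit SPACE code -> (description, DMA direction), transcribed from Table G.
SPACE_TABLE = {
    0b000: ("global (CLK) space and software to hardware", "wr"),
    0b001: ("register write", "wr"),
    0b010: ("hardware to software", "rd"),
    0b011: ("register read", "rd"),
    0b100: ("SRAM write", "wr"),
    0b101: ("SRAM read", "rd"),
}


def decode_space(code):
    """Return (description, direction) for a SPACE code, or None if unused."""
    if not 0 <= code <= 0b111:
        raise ValueError("SPACE is a 3-bit signal")
    return SPACE_TABLE.get(code)  # codes 110 and 111 are unused
```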
The 4-bit COMMAND signal indicates whether the data transfer operation is a write, a read, the configuration of a new user design into the reconfigurable hardware unit, or an abort of the simulation. The command protocol is shown in Table H below:
Table H: COMMAND signal
COMMAND   Description
0000      Write to the specified space
0001      Read from the specified space
0010      Configure FPGA design
0011      Abort simulation
0100      Unused
The communication handshake protocol will now be discussed with reference to the state diagram in Figure 54. At state 1400, the simulation system at the device driver is idle. As long as no new command is issued, the system remains idle, as indicated by path 1401. When a new command is issued, the command processor processes it at state 1402. In one embodiment, the command processor is the FPGA I/O controller.
If COMMAND = 0000 or COMMAND = 0001, the system reads from or writes to the space designated by the SPACE index at state 1403. If COMMAND = 0010, the system either configures the FPGAs in the reconfigurable hardware unit with the user design for the first time or configures them with a new user design, at state 1404. The system sequences the configuration information for all the FPGAs so as to model the portion of the user design that can be modeled in hardware. If, however, COMMAND = 0011, the system aborts the reconfigurable hardware unit at state 1405 to interrupt the simulation, because the time slice has expired and a new user/process is to swap in a new simulation state. On completion of state 1403, 1404, or 1405, the simulation system proceeds to DONE state 1406 to generate the DONE signal and then returns to state 1400, where it remains idle until a new command appears.
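The state sequence of Figure 54 can be sketched as a function that returns the states visited for one command. This is an illustrative model only: the constant names and `handshake_trace` are assumptions, and the state numbers are used simply as labels taken from the figure.

```python
# State labels taken from the Figure 54 reference numerals.
IDLE, PROCESS, READ_WRITE, CONFIGURE, ABORT, DONE = 1400, 1402, 1403, 1404, 1405, 1406


def handshake_trace(command):
    """Return the list of states visited for one 4-bit COMMAND (None = no command)."""
    if command is None:
        return [IDLE]                        # path 1401: stay idle
    if command in (0b0000, 0b0001):
        work = READ_WRITE                    # write to / read from the SPACE index
    elif command == 0b0010:
        work = CONFIGURE                     # configure the FPGA design
    elif command == 0b0011:
        work = ABORT                         # abort for a job swap
    else:
        raise ValueError("unused COMMAND code")
    return [IDLE, PROCESS, work, DONE, IDLE]
```

Every valid command passes through the DONE state before the system returns to idle, which matches the single DONE line on the handshake interface.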
The time-sharing behavior of the simulation server when it handles multiple jobs with different priority levels will now be discussed. Figure 52 shows an example: four jobs (job A, job B, job C, job D) are pending in the simulation job queue. The priority levels of the four jobs differ, however: jobs A and B are assigned priority I, while jobs C and D are assigned priority II. As the timeline in Figure 52 shows, time-shared use of the reconfigurable hardware unit depends on the priority levels of the queued jobs. At time 1190, simulation begins with job A, which is given access to the reconfigurable hardware unit. At time 1191, job A is preempted by job B; because jobs A and B have the same priority, the scheduler gives both jobs equal time-shared access. Job B now accesses the reconfigurable hardware unit. At time 1192, job A preempts job B and runs to completion at time 1193. At time 1193, job B takes over and runs to completion at time 1194. At time 1194, job C, which is next in the queue but has a lower priority than jobs A and B, gains access to the reconfigurable hardware unit and begins executing. At time 1195, job D preempts job C for time-shared access, because the two jobs have the same priority. Job D retains access until time 1196, when it is preempted by job C. Job C runs to completion at time 1197, whereupon job D regains access and runs to completion at time 1198.
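The Figure 52 timeline can be reproduced with a simple round-robin model: jobs are served one priority level at a time, and jobs within a level alternate in fixed time slices. This is a sketch under simplifying assumptions that are not in the specification: all jobs are queued at the start, each job needs two slices of work, and `timeshare` is a hypothetical name.

```python
from collections import deque


def timeshare(jobs, quantum):
    """jobs: list of (name, priority, work). Returns the job name run in each slice.

    Assumption: a smaller priority number means a higher priority.
    """
    levels = {}
    for name, priority, work in jobs:
        levels.setdefault(priority, deque()).append([name, work])
    order = []
    for priority in sorted(levels):          # finish one level before the next
        queue = levels[priority]
        while queue:
            job = queue.popleft()
            order.append(job[0])             # this job holds the hardware for a slice
            job[1] -= quantum
            if job[1] > 0:
                queue.append(job)            # unfinished: preempted, back of the queue
    return order


# Jobs A and B at priority I, jobs C and D at priority II, two slices of work each.
schedule = timeshare([("A", 1, 2), ("B", 1, 2), ("C", 2, 2), ("D", 2, 2)], quantum=1)
```

With these inputs the slice order is A, B, A, B, C, D, C, D, which corresponds to the intervals between times 1190 and 1198 in the description.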
VIII. Memory Simulation
The memory simulation, or memory mapping, aspect of the present invention provides the simulation system with an efficient way to manage the various memory blocks associated with the configured hardware model of the user design, which is programmed into the array of FPGA chips in the reconfigurable hardware unit. In embodiments of the invention, the memory simulation scheme does not require any dedicated pins on the FPGA chips to handle memory accesses.
As used here, a "memory access" is a read or write access between the FPGA logic devices, in which the user design is configured, and the SRAM memory devices, which store all the memory blocks associated with the user design. Thus a write operation involves a data transfer from the FPGA logic devices to the SRAM memory devices, and a read operation involves a data transfer from the SRAM memory devices to the FPGA logic devices. With reference to Figure 56, the FPGA logic devices are 1201 (FPGA1), 1202 (FPGA3), 1203 (FPGA0), and 1204 (FPGA2), and the SRAM memory devices are memory devices 1205 and 1206.
Likewise, "DMA data transfer", in addition to its ordinary meaning as understood by those of ordinary skill in the art, refers to a data transfer between the computing system and the simulation system. The computing system shown in Figures 1, 45, and 46 is the entire PCI-based system, with memory, that supports the simulation system residing in software and in the reconfigurable hardware unit. Selected device drivers and socket/system calls to and from the operating system are also part of the simulation system; they provide the proper interface between the operating system and the reconfigurable hardware unit. In one embodiment of the invention, a DMA read transfer involves a data transfer from the FPGA logic devices (and from the FPGA SRAM memory devices, for initialization and memory content dump) to the host computing system. A DMA write transfer involves a data transfer from the host computing system to the FPGA logic devices (and to the FPGA SRAM memory devices, for initialization and memory content dump).
As used here, the terms "FPGA data bus", "FPGA bus", "FD bus", and variations thereof refer to the high bank bus FD[63:32] and the low bank bus FD[31:0], which couple the FPGA logic devices, containing the configured and programmed user design to be debugged, to the SRAM memory devices.
The memory simulation system includes a memory state machine, an evaluation state machine, and their associated logic to control and interface with: (1) the host computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA buses in the simulation system, and (3) the FPGA logic devices that contain the configured and programmed user design being debugged.
The FPGA logic device side of the memory simulation system includes an evaluation state machine, an FPGA bus driver, and, for each memory block N, a logic interface to the user's own memory interface in the user design. These handle (1) data evaluation among the FPGA logic devices and (2) read/write memory accesses between the FPGA logic devices and the SRAM memory devices. The FPGA I/O controller side, which is coupled to the FPGA logic device side, includes a memory state machine and interface logic to handle DMA, write, and read operations between (1) the host computing system and the SRAM memory devices and (2) the FPGA logic devices and the SRAM memory devices.
The operation of the memory simulation system in accordance with one embodiment of the invention is generally as follows. The simulation write/read cycle is divided into three phases: DMA data transfer, evaluation, and memory access. The DATAXSFR signal indicates the occurrence of the DMA data transfer phase, during which the computing system and the SRAM memory units transfer data to each other over the FPGA data bus, that is, high bank bus 1212 (FD[63:32]) and low bank bus 1213 (FD[31:0]).
During the evaluation phase, logic circuitry in each FPGA logic device generates the proper software clock, input-enable, and mux-enable signals to the user design logic for data evaluation. Communication among the FPGA logic devices takes place in this phase.
During the memory access phase, the memory simulation system waits for the high and low bank FPGA logic devices to put their respective address and control signals onto their respective FPGA data buses. The CTRL_FPGA unit latches these address and control signals. For a write, address, control, and data signals are transferred from the FPGA logic devices to their respective SRAM memory devices. For a read, address and control signals are provided to the designated SRAM memory devices, and data signals are transferred from the SRAM memory devices to their respective FPGA logic devices. After all desired memory blocks in all FPGA logic devices have been accessed, the memory simulation write/read cycle is complete, and the memory simulation system remains idle until the next memory simulation write/read cycle begins.
Figure 56 shows a high-level block diagram of the memory simulation configuration in accordance with one embodiment of the invention. Signals, connections, and buses that are not relevant to the memory simulation aspect of the invention are not shown. CTRL_FPGA unit 1200, described above, is coupled to bus 1210 via line 1209. In one embodiment, CTRL_FPGA unit 1200 is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K50 chip. Local bus 1210 allows CTRL_FPGA unit 1200 to be coupled to other simulation array boards (if any) and chips (for example, PCI controller, EEPROM, clock buffer). Line 1209 carries the DONE signal, which indicates completion of a simulation DMA data transfer phase.
Figure 56 also shows the other main functional blocks, in the form of logic devices and memory devices. In one embodiment of the invention, the logic devices are programmable logic devices (PLDs) in the form of FPGA chips, such as Altera 10K130 or 10K250 chips. Thus, whereas the embodiment shown earlier has eight Altera FLEX 10K100 chips in the array, this embodiment uses only four Altera FLEX 10K130 chips. The memory devices are synchronous pipelined cache SRAMs, such as Cypress 128Kx32 CY7C1335 or CY7C1336 chips. The logic devices are 1201 (FPGA1), 1202 (FPGA3), 1203 (FPGA0), and 1204 (FPGA2). The SRAM chips are low bank memory device 1205 (L-SRAM) and high bank memory device 1206 (H-SRAM).
These logic devices and memory devices are coupled to CTRL_FPGA unit 1200 via high bank bus 1212 (FD[63:32]) and low bank bus 1213 (FD[31:0]). Logic devices 1201 (FPGA1) and 1202 (FPGA3) are coupled to high bank bus 1212 via buses 1223 and 1225, respectively, while logic devices 1203 (FPGA0) and 1204 (FPGA2) are coupled to low bank bus 1213 via buses 1224 and 1226, respectively. High bank memory device 1206 is coupled to high bank bus 1212 via bus 1220, and low bank memory device 1205 is coupled to low bank bus 1213 via bus 1219. This dual-bank bus structure allows the simulation system to access the devices on the high bank and the devices on the low bank in parallel at improved throughput. The dual-bank data bus structure also carries other signals, such as control and address signals, so that the simulation write/read cycle can be controlled.
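To make the dual-bank arrangement concrete, the sketch below splits a 64-bit FD word into its high bank half FD[63:32] and low bank half FD[31:0], and joins the two halves again. The function names `split_fd` and `join_fd` are hypothetical, and the example treats the bus purely as a 64-bit integer; it says nothing about the actual signal timing.

```python
MASK32 = 0xFFFFFFFF


def split_fd(word64):
    """Split a 64-bit FD word into (high bank FD[63:32], low bank FD[31:0])."""
    return (word64 >> 32) & MASK32, word64 & MASK32


def join_fd(high, low):
    """Combine the high bank and low bank halves into one 64-bit FD word."""
    return ((high & MASK32) << 32) | (low & MASK32)


high, low = split_fd(0x1122334455667788)
```

Because the two halves are independent, a high bank device (FPGA1, FPGA3, H-SRAM) and a low bank device (FPGA0, FPGA2, L-SRAM) can each use its own 32 bits in the same cycle.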
Turning to Figure 61, each simulation write/read cycle includes a DMA data transfer phase, an evaluation phase, and a memory access phase. The combination of the various control signals controls and indicates which phase the simulation system is in. DMA data transfers between the host computing system and logic devices 1201 to 1204 in the reconfigurable hardware unit take place over the PCI bus (for example, bus 50 in Figure 46), local buses 1210 and 1236, and FPGA buses 1212 (FD[63:32]) and 1213 (FD[31:0]). Memory devices 1205 and 1206 are involved in DMA data transfers for initialization and memory content dump. Evaluation data transfers among logic devices 1201 to 1204 in the reconfigurable hardware unit take place over the interconnect (described above) and FPGA buses 1212 (FD[63:32]) and 1213 (FD[31:0]). Memory accesses between logic devices 1201 to 1204 and memory devices 1205 and 1206 take place over FPGA buses 1212 (FD[63:32]) and 1213 (FD[31:0]).
Returning to Figure 56, CTRL_FPGA unit 1200 provides and receives a number of control and address signals to control the simulation write/read cycle. CTRL_FPGA unit 1200 provides the DATAXSFR and EVAL signals on bus 1211 to logic devices 1201 and 1203 via bus 1221, and to logic devices 1202 and 1204 via bus 1222. CTRL_FPGA unit 1200 also provides the MA[18:2] signals to low bank memory device 1205 and high bank memory device 1206 via buses 1229 and 1214, respectively. To control the mode of these memory devices, CTRL_FPGA unit 1200 provides chip select, read, and write signals to low bank memory device 1205 and high bank memory device 1206 via buses 1216 and 1215, respectively. To indicate completion of a DMA data transfer, the memory simulation system sends and receives the DONE signal on bus 1209 to and from CTRL_FPGA unit 1200 and the computing system.
As discussed earlier with respect to Figures 9, 11, 12, 14, and 15, logic devices 1201 to 1204 are connected together by, among other things, the multiplexed cross-chip address pointer chain, represented in Figure 56 by two sets of SHIFTIN/SHIFTOUT lines: lines 1207, 1227, and 1218, and lines 1208, 1228, and 1217. Each set is initialized at the start of the chain, on lines 1207 and 1208. The SHIFTIN signal is sent from the preceding FPGA logic device in the bank to start the memory access for the current FPGA logic device. When the shifting through a given chain is complete, the last logic device generates a LAST signal (LASTL or LASTH) to CTRL_FPGA unit 1200. For the high bank, logic device 1202 generates the LASTH shift-out signal to CTRL_FPGA unit 1200 on line 1218; for the low bank, logic device 1204 generates the LASTL signal to CTRL_FPGA unit 1200 on line 1217.
As for board implementation and Figure 56, one embodiment of the invention incorporates the components (logic devices 1201 to 1204, memory devices 1205 and 1206, and CTRL_FPGA unit 1200) and the buses (FPGA buses 1212 and 1213 and local bus 1210) on one board. This board is coupled to a motherboard via motherboard connectors. Thus one board carries four logic devices (two per bank), two memory devices (one per bank), and the buses. A second board contains its own complement of logic devices (typically four), memory devices (typically two), an FPGA I/O controller (CTRL_FPGA unit), and buses. The PCI controller, however, is installed on the first board only. The inter-board connectors described above are provided between the boards so that the logic devices on all boards can be connected together and can communicate with each other during the evaluation phase, and the local bus is provided across all the boards. An FPGA bus FD[63:0] is provided on each board but does not extend across multiple boards.
In this board configuration, the simulation system performs memory mapping between the logic devices and the memory devices on each board; memory mapping across different boards is not supported. Thus the logic devices on board 5 can map memory blocks only to the memory devices on board 5, not to memory devices on other boards. In other embodiments, however, the simulation system can map memory blocks from the logic devices on one board to memory devices on another board.
The operation of the memory simulation system in accordance with one embodiment of the invention is generally as follows. The simulation write/read cycle is divided into three phases: DMA data transfer, evaluation, and memory access. To indicate completion of a simulation write/read cycle, the memory simulation system sends and receives the DONE signal on line 1209 to and from CTRL_FPGA unit 1200 and the computing system. The DATAXSFR signal on bus 1211 indicates the occurrence of the DMA data transfer phase. In this phase, the computing system and FPGA logic devices 1201 to 1204 transfer data to each other over the FPGA data bus, that is, high bank bus 1212 (FD[63:32]) and low bank bus 1213 (FD[31:0]). Generally, DMA transfers take place between the host computing system and the FPGA logic devices. For initialization and memory content dump, DMA transfers take place between the host computing system and SRAM memory devices 1205 and 1206.
During the evaluation phase, logic circuitry in each of FPGA logic devices 1201 to 1204 generates the proper software clock, input-enable, and mux-enable signals to the user design logic for data evaluation. Communication among the FPGA logic devices takes place in this phase. CTRL_FPGA unit 1200 also starts an evaluation counter to control the duration of the evaluation phase. The system sets the count, and hence the duration of the evaluation phase, by determining the longest signal path. The path length is associated with a particular number of steps. The system uses this step information to calculate the count needed for the evaluation cycle to run to completion.
During the memory access phase, the memory simulation system waits for the high and low bank FPGA logic devices 1201 to 1204 to put their respective address and control signals onto their respective FPGA data buses. CTRL_FPGA unit 1200 latches these address and control signals. For a write, address, control, and data signals are transferred from FPGA logic devices 1201 to 1204 to their respective SRAM memory devices 1205 and 1206. For a read, address and control signals are transferred from FPGA logic devices 1201 to 1204 to their respective SRAM memory devices 1205 and 1206, and data signals are transferred from SRAM memory devices 1205 and 1206 to their respective FPGA logic devices 1201 to 1204. On the FPGA logic device side, the FD bus driver places the address and control signals of a memory block onto the FPGA data bus (FD bus). For a write, the write data for that memory block is placed on the FD bus. For a read, the double buffer latches the data for that memory block that the SRAM memory device has placed on the FD bus. This operation is performed for each memory block in each FPGA logic device in sequence, one memory block at a time. When all desired memory blocks in an FPGA logic device have been accessed, the memory simulation system proceeds to the next FPGA logic device in each bank and begins accessing the memory blocks in that FPGA logic device. After all desired memory blocks in all FPGA logic devices 1201 to 1204 have been accessed, the memory simulation write/read cycle is complete, and the memory simulation system remains idle until the next memory simulation write/read cycle begins.
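The access order within the memory access phase can be sketched as two chains that advance in parallel, one per bank, each visiting one memory block at a time. This is a simplified model under stated assumptions: every access is taken to last one step, the chain order within a bank (FPGA1 then FPGA3, FPGA0 then FPGA2) is inferred from the SHIFTIN/SHIFTOUT description, and the function names are hypothetical.

```python
from itertools import zip_longest


def bank_sequence(chain, blocks):
    """List (fpga, block) accesses for one bank, FPGA by FPGA, block by block."""
    return [(fpga, block) for fpga in chain for block in blocks.get(fpga, [])]


def memory_access_phase(high_chain, low_chain, blocks):
    """Return one (high bank access, low bank access) pair per step.

    None in a pair means that bank has already finished its chain.
    """
    high = bank_sequence(high_chain, blocks)
    low = bank_sequence(low_chain, blocks)
    return list(zip_longest(high, low))


# Hypothetical block counts: FPGA1 holds two memory blocks, the others one each.
blocks = {"FPGA1": [0, 1], "FPGA3": [0], "FPGA0": [0], "FPGA2": [0]}
steps = memory_access_phase(["FPGA1", "FPGA3"], ["FPGA0", "FPGA2"], blocks)
```

In this model the phase ends when the longer of the two chains finishes, which corresponds to both LASTH and LASTL having been generated.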
Figure 57 shows a more detailed block diagram of the memory simulation aspect of the present invention, including a more detailed structure of CTRL_FPGA unit 1200 and of each logic device as it relates to memory simulation. Figure 57 shows CTRL_FPGA unit 1200 and a portion of logic device 1203 (which is structurally similar to logic devices 1201, 1202, and 1204). CTRL_FPGA unit 1200 includes memory finite state machine (MEMFSM) 1240, AND gate 1241, evaluation (EVAL) counter 1242, low bank memory address/control latch 1243, low bank address/control multiplexer 1244, address counter 1245, high bank memory address/control latch 1247, and high bank address/control multiplexer 1246. Each logic device, such as logic device 1203 shown in Figure 57, includes an evaluation finite state machine (EVALFSMx) 1248 and a data bus multiplexer 1249 (FDO-MUXx; FDO-MUX0 for logic device 1203, FPGA0). The "x" appended to EVALFSM identifies the particular logic device (FPGA0, FPGA1, FPGA2, FPGA3) with which it is associated; in this example, x is a number from 0 to 3. Thus EVALFSM0 is associated with logic device 1203 (FPGA0). In general, each logic device is associated with some number x, and for N logic devices, x is a number from 0 to N-1.
In each of logic devices 1201 to 1204, a number of memory blocks are associated with the configured and mapped user design. Memory block interface 1253 in the user logic therefore provides the means by which the computing system accesses the desired memory blocks in the array of FPGA logic devices. Memory block interface 1253 also provides memory write data on bus 1295 to FPGA data bus multiplexer (FDO-MUXx) 1249 and receives memory read data on bus 1297 from memory read data double buffer 1251.
Each FPGA logic device contains memory block data/logic interfaces 1298. Each memory block data/logic interface 1298 is coupled to FPGA data bus multiplexer (FDO-MUXx) 1249, evaluation finite state machine (EVALFSMx) 1248, and FPGA bus FD[63:0]. Memory block data/logic interface 1298 includes memory read data double buffer 1251, address offset unit 1250, memory model 1252, and memory block interface 1253 for each memory block N. These are repeated for each memory block N in any given FPGA logic device 1201 to 1204. Thus, for five memory blocks there are five sets of memory block data/logic interface 1298: five memory read data double buffers 1251, five address offset units 1250, five memory models 1252, and five memory block interfaces 1253.
As with EVALFSMx, the "x" in FDO-MUXx identifies the particular logic device (FPGA0, FPGA1, FPGA2, FPGA3) with which it is associated; here x is a number from 0 to 3. The output of FDO-MUXx 1249 is placed on bus 1282, which is coupled to high bank bus FD[63:32] or low bank bus FD[31:0], depending on which chip (FPGA0, FPGA1, FPGA2, FPGA3) FDO-MUXx 1249 is associated with. In Figure 57, FDO-MUXx is FDO-MUX0, which is associated with low bank logic device 1203 (FPGA0). The output on bus 1282 is therefore provided to low bank bus FD[31:0]. A portion of bus 1283 carries read data from high bank bus FD[63:32] or low bank bus FD[31:0] to memory read data double buffer 1251. Thus write data passes from the memory blocks in each of logic devices 1201 to 1204 through FDO-MUXx 1249 onto high bank bus FD[63:32] or low bank bus FD[31:0], while read data passes from high bank bus FD[63:32] or low bank bus FD[31:0] over read bus 1283 into memory read data double buffer 1251. The memory read data double buffer provides a double buffering mechanism: it latches data in a first buffer and then buffers the data again so that the latched data is released at the same time, minimizing skew. Memory read data double buffer 1251 is discussed in more detail below.
Returning to the memory model 1252, it converts the user's memory type to the SRAM type used by the memory simulation system. Because the memory types in a user's design can vary from one design to another, the memory block interface 1253 can also be specific to the user's design. For example, the user's memory type may be DRAM, volatile memory, or EEPROM. In all cases, however, the memory block interface 1253 carries memory addresses and control signals (e.g., read, write, chip select, mem_clk). In one embodiment of the memory simulation aspect of the present invention, the user's memory type is converted to the SRAM type used in the memory simulation system. If the user's memory type is already SRAM, the conversion to an SRAM-type memory model is straightforward. The memory addresses and control signals are thus provided on bus 1296 to the memory model 1252, which performs the conversion.
The memory model 1252 provides memory block address information on bus 1293 and control information on bus 1292. The address offset unit 1250 receives the address information for the various memory blocks and produces a modified offset address on bus 1291 from the original address on bus 1293. The offset is necessary because the address ranges of individual memory blocks may overlap. For example, one memory block may occupy the space 0-2K while another occupies the space 0-3K. Because the two blocks overlap in the space 0-2K, individual addresses cannot be read or written unambiguously without some form of address offsetting. With offsetting, the first memory block can occupy the space 0-2K while the second occupies the space from 2K up to 5K. The offset address from the address offset unit 1250 and the control signals on bus 1292 are combined and provided on bus 1299 to the FPGA data bus multiplexer (FDO_MUXx) 1249.
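As an illustration only, the address adjustment performed by unit 1250 can be sketched in Python. The function names, the simple back-to-back packing order, and the choice of K = 1024 are assumptions made for this sketch; the embodiment does not specify how the offsets are chosen.

```python
from typing import List

K = 1024  # assumed meaning of "K" in the 0-2K / 0-3K example


def assign_offsets(block_sizes: List[int]) -> List[int]:
    """Give each memory block a base offset so that the blocks occupy
    consecutive, non-overlapping ranges of the shared address space."""
    offsets = []
    next_free = 0
    for size in block_sizes:
        offsets.append(next_free)
        next_free += size
    return offsets


def offset_address(original: int, offset: int) -> int:
    """Map a block-local address (as on bus 1293) to the adjusted
    address (as on bus 1291)."""
    return original + offset


# Two blocks that both start at local address 0: one 2K long, one 3K long.
offsets = assign_offsets([2 * K, 3 * K])
first_of_block1 = offset_address(0, offsets[1])
last_of_block1 = offset_address(3 * K - 1, offsets[1])
```

With these sizes the first block keeps addresses 0 to 2K-1 and the second block is moved to 2K through 5K-1, matching the example in the text.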
The FPGA data bus multiplexer FDO_MUXx receives SPACE2 data on bus 1289, SPACE3 data on bus 1290, address/control signals on bus 1299, and memory write data on bus 1295. As described previously, SPACE2 and SPACE3 are specific SPACE indices. The SPACE index, which is generated by the FPGA I/O controller (item 327 in Figure 10; Figure 22), selects a particular address space (i.e., REG read, REG write, S2H read, H2S write, and CLK write). Within that address space, the system of the present invention sequentially selects the particular word to be accessed. SPACE2 refers to the memory space dedicated to DMA read transfers of hardware-to-software (H2S) data. SPACE3 refers to the memory space dedicated to DMA read transfers of REGISTER_READ data. See Table G above.
As its output, FDO_MUXx 1249 provides data on bus 1282 to either the low bank or the high bank bus. The selector signals are the output enable (output_en) signal on line 1284 and the select signal on line 1285, both from the EVALFSMx unit 1248. The output enable signal on line 1284 enables (or disables) the operation of FDO_MUXx 1249. For data accesses across the FPGA bus, the output enable signal is asserted to allow FDO_MUXx to function. The select signal on line 1285 is generated by the EVALFSMx unit 1248 to select among the inputs: SPACE2 data on bus 1289, SPACE3 data on bus 1290, address/control signals on bus 1299, and memory write data on bus 1295. The generation of the select signal by the EVALFSMx unit 1248 is discussed further below.
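The behavior of multiplexer 1249 can be summarized with a small behavioral model. This is only a sketch: the `Select` encoding is hypothetical (the text does not give the select codes), and returning `None` is simply one way to represent a device that is not driving bus 1282 while output_en is deasserted.

```python
from enum import Enum
from typing import Optional


class Select(Enum):
    # Hypothetical encoding of the select signal on line 1285.
    SPACE2 = 0
    SPACE3 = 1
    ADDR_CTRL = 2
    MEM_WRITE = 3


def fdo_mux(output_en: bool, select: Select, space2: int, space3: int,
            addr_ctrl: int, mem_write: int) -> Optional[int]:
    """Return the value driven onto bus 1282, or None when output_en
    (line 1284) is deasserted and the multiplexer drives nothing."""
    if not output_en:
        return None
    inputs = {
        Select.SPACE2: space2,        # bus 1289
        Select.SPACE3: space3,        # bus 1290
        Select.ADDR_CTRL: addr_ctrl,  # bus 1299
        Select.MEM_WRITE: mem_write,  # bus 1295
    }
    return inputs[select]


driven = fdo_mux(True, Select.MEM_WRITE, 0x11, 0x22, 0x33, 0x44)
idle = fdo_mux(False, Select.MEM_WRITE, 0x11, 0x22, 0x33, 0x44)
```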
The EVALFSMx unit 1248 is the operational core of each logic device 1201-1204 with respect to the memory simulation system. The EVALFSMx unit 1248 receives as inputs the SHIFTIN signal on line 1279, the EVAL signal from the CTRL_FPGA unit 1200 on line 1274, and the write signal wrx on line 1287. It outputs the SHIFTOUT signal on line 1280, the read latch signal rd_latx on line 1286 to the memory read data double buffer 1251, the output enable signal on line 1284 to FDO_MUXx 1249, the select signal on line 1285 to FDO_MUXx 1249, and three signals to the user logic (input_en, mux_en, and clk_en) on line 1281.
The operation of the FPGA logic devices 1201-1204 in the memory simulation system, in accordance with one embodiment of the present invention, is generally as follows. When the EVAL signal is at logic 1, data evaluation within the FPGA logic devices 1201-1204 takes place; otherwise, the simulation system is performing either a DMA data transfer or a memory access. At EVAL=1, the EVALFSMx unit 1248 generates the clk_en, input_en, and mux_en signals to allow the user logic to evaluate the data, latch relevant data, and multiplex signals across the logic devices, respectively. The EVALFSMx unit 1248 generates the clk_en signal to enable the second flip-flop of every clock edge register in the user's design logic (see Figure 19). The clk_en signal is otherwise known as the software clock. If the user's memory type is synchronous, clk_en also enables the second clock of the memory read data double buffer 1251 in each memory block. The EVALFSMx unit 1248 generates the input_en signal to the user's design logic to latch the input signals sent from the CPU to the user logic by DMA transfer. The input_en signal provides the enable input to the second flip-flop in the primary clock register (see Figure 19). Finally, the EVALFSMx unit 1248 generates the mux_en signal to turn on the multiplexing circuit in each FPGA logic device and so begin communication with the other FPGA logic devices in the array.
Thereafter, if an FPGA logic device 1201-1204 contains at least one memory block, the memory simulation system waits for the selected data to be shifted in to the selected FPGA logic device, and then generates the output_en and select signals so that the FPGA data bus driver places the address and control signals of the memory block interface 1253 (mem_block_N) on the FD bus.
If the write signal wrx on line 1287 is asserted (i.e., logic 1), the select and output_en signals are also asserted to place the write data on the low bank or high bank bus, depending on which bank the FPGA chip is coupled to. In Figure 57, logic device 1203 is FPGA0 and is coupled to the low bank bus FD[31:0]. If the write signal wrx on line 1287 is deasserted (i.e., logic 0), the select and output_en signals are deasserted, and the read latch signal rd_latx on line 1286 is asserted so that the memory read data double buffer 1251 can latch and double-buffer the selected data arriving from the SRAM over the low bank or high bank bus, depending on which bank the FPGA chip is coupled to. The wrx signal is the memory write signal derived from the memory interface of the user's design logic; the wrx signal on line 1287 comes from the memory model 1252 via control bus 1292.
This process of reading or writing data occurs in each FPGA logic device. After all the memory blocks have been processed via SRAM access, the EVALFSMx unit 1248 generates the SHIFTOUT signal to allow the next FPGA logic device in the chain to perform its SRAM access. Note that memory accesses for the devices on the high bank and on the low bank proceed in parallel. The memory accesses for one bank may sometimes complete before those for the other bank. For all of these accesses, appropriate wait cycles are inserted so that the logic processes data only when the logic is ready and the data is available.
On the CTRL_FPGA unit 1200 side, the MEMFSM 1240 is the core of the memory simulation aspect of the present invention. It sends and receives many control signals to control the activation of the simulation write/read cycle and the various operations supported by that cycle. The MEMFSM 1240 receives the DATAXSFR signal on line 1260 via line 1258. This signal is also provided to each logic device on line 1273. When DATAXSFR goes low (i.e., logic low), the DMA data transfer period ends and the evaluation and memory access periods begin.
The MEMFSM 1240 also receives a LASTH signal on line 1254 and a LASTL signal on line 1255, which indicate that the selected word associated with the selected address space has been accessed between the computing system and the simulation system via the PCI bus and the FPGA bus. The MOVE signal associated with this shift-out process propagates through each logic device (e.g., logic devices 1201-1204) until the desired word has been accessed, and the MOVE signal ultimately becomes the LAST signal (i.e., LASTH for the high bank and LASTL for the low bank) at the end of the chain. In the EVALFSM 1248 (Figure 57 shows EVALFSM0 for the FPGA0 logic device 1203), the corresponding LAST signal is the SHIFTOUT signal on line 1280. Because the particular logic device 1203 is not the last logic device in the low bank chain shown in Figure 56 (logic device 1204 is the last), the SHIFTOUT signal of EVALFSM0 is not the LAST signal. If the EVALFSM 1248 corresponded to EVALFSM2 in Figure 56, the SHIFTOUT signal on line 1280 would be the LASTL signal provided to the MEMFSM on line 1255. Otherwise, the SHIFTOUT signal on line 1280 is provided to logic device 1204 (see Figure 56). Similarly, the SHIFTIN signal on line 1279 is tied to Vcc for the FPGA0 logic device 1203 (see Figure 56).
The LASTL and LASTH signals are input to AND gate 1241 via lines 1256 and 1257, respectively. AND gate 1241 provides an open-drain output. The output of AND gate 1241 generates the DONE signal on line 1259, which is provided to the computing system and to the MEMFSM 1240. Thus, the AND gate outputs a logic high only when both the LASTL and LASTH signals are logic high, indicating the end of the shift-out chain process.
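The relationship between the per-device SHIFTIN/SHIFTOUT signals, the LASTL and LASTH signals, and DONE can be expressed as a short Python sketch. It is a simplification under stated assumptions: each device is reduced to a single "finished" flag, and the function names are hypothetical.

```python
from typing import Sequence


def bank_last(devices_finished: Sequence[bool]) -> bool:
    """Model one daisy chain. SHIFTIN of the first device is tied to Vcc
    (True); a device asserts SHIFTOUT only once it has received SHIFTIN
    and has finished its own accesses. The SHIFTOUT of the final device
    is the LAST signal for that chain."""
    shift = True  # SHIFTIN of the first device in the chain
    for finished in devices_finished:
        shift = shift and finished  # this device's SHIFTOUT
    return shift


def done(low_chain: Sequence[bool], high_chain: Sequence[bool]) -> bool:
    """DONE (line 1259) is the AND of LASTL and LASTH (gate 1241)."""
    return bank_last(low_chain) and bank_last(high_chain)


# One chain finished, the other still busy on its second device.
partial = done([True, True], [True, False])
# Both chains finished.
complete = done([True, True], [True, True])
```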
The MEMFSM 1240 generates a start signal on line 1261 to the EVAL counter 1242. As the name implies, the start signal triggers the EVAL counter 1242 and is sent after the DMA data transfer period completes. The start signal is generated upon detecting a high-to-low (1 to 0) transition of the DATAXSFR signal. The EVAL counter 1242 is a programmable counter that counts a predetermined number of clock cycles. The programmed count length in the EVAL counter 1242 determines the length of the evaluation period. The output of the EVAL counter 1242 on line 1274 is either logic 1 or logic 0, depending on whether the counter is counting. While the EVAL counter 1242 is counting, the output on line 1274 is at logic 1 and is provided to each FPGA logic device 1201-1204 via its EVALFSMx 1248. When EVAL=1, the FPGA logic devices 1201-1204 perform inter-FPGA communication to evaluate data in the user's design. The output of the EVAL counter 1242 is also fed back to the MEMFSM unit 1240 on line 1262 for its own tracking purposes. At the end of the programmed count, the EVAL counter 1242 generates a logic 0 signal on lines 1274 and 1262 to indicate the end of the EVAL period.
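A cycle-level sketch of the EVAL counter 1242 and its start condition is given below. The class and function names are hypothetical, and the exact cycle in which EVAL first rises relative to the DATAXSFR edge is an assumption of this model rather than something stated in the text.

```python
from typing import List


class EvalCounter:
    """Programmable counter: EVAL is 1 while counting, 0 otherwise."""

    def __init__(self, cycles: int) -> None:
        self.cycles = cycles      # programmed count length
        self.remaining = 0

    def start(self) -> None:
        self.remaining = self.cycles

    @property
    def eval(self) -> int:
        return 1 if self.remaining > 0 else 0

    def tick(self) -> None:
        if self.remaining > 0:
            self.remaining -= 1


def run(dataxsfr_trace: List[int], cycles: int) -> List[int]:
    """Return the EVAL value for each clock cycle of the DATAXSFR trace."""
    counter = EvalCounter(cycles)
    previous = 0
    eval_trace = []
    for dataxsfr in dataxsfr_trace:
        if previous == 1 and dataxsfr == 0:
            counter.start()       # start on the 1-to-0 transition
        eval_trace.append(counter.eval)
        counter.tick()
        previous = dataxsfr
    return eval_trace


eval_trace = run([0, 1, 1, 0, 0, 0, 0, 0], cycles=3)
```

In this trace DATAXSFR falls in the fourth cycle, so EVAL is 1 for the three programmed cycles that follow and 0 elsewhere.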
If memory access is not needed, the MEM_EN signal on line 1272 is set to logic 0 and provided to the MEMFSM unit 1240, in which case the memory simulation system waits for another DMA data transfer period. If memory access is needed, the MEM_EN signal on line 1272 is set to logic 1. The MEM_EN signal is essentially a control signal from the CPU that enables the FPGA logic devices to access the on-board SRAM memory devices. In this case, the MEMFSM unit 1240 waits for the FPGA logic devices 1201-1204 to place the address and control signals on the FPGA bus, FD[63:32] and FD[31:0].
The remaining functional units and their associated control signals and lines provide address/control information to the SRAM memory devices for writing and reading data. These units are the memory address/control latch 1243 for the low bank, the address/control mux 1244 for the low bank, the memory address/control latch 1247 for the high bank, the address/control mux 1246 for the high bank, and the address counter 1245.
The memory address/control latch 1243 for the low bank receives address and control signals from the FPGA bus FD[31:0] 1275, which corresponds to bus 1213, together with a latch signal on line 1263. The latch 1243 generates the mem_wr_L signal on line 1264 and passes the incoming address/control signals from the FPGA bus FD[31:0] to the address/control mux 1244 via bus 1266. This mem_wr signal is the same as the chip select write signal.
The address/control mux 1244 receives as inputs the address and control information on bus 1266 and the address information from the address counter 1245 on bus 1268. As its output, it sends address/control information on bus 1276 to the low bank SRAM memory device 1205. The select signal on line 1265 is supplied by the MEMFSM unit 1240. The address/control information on bus 1276 corresponds to the MA[18:2] and chip select read/write signals on buses 1229 and 1216 in Figure 56.
The address counter 1245 receives information from SPACE4 and SPACE5 via bus 1267. SPACE4 contains DMA write transfer information; SPACE5 contains DMA read transfer information. These DMA transfers take place between the computing system (via the cache/main memory of the workstation CPU) on the PCI bus and the simulation system (SRAM memory devices 1205, 1206). The address counter 1245 provides its output on buses 1288 and 1268 to the address/control muxes 1244 and 1246. Under the appropriate select signal on line 1265 for the low bank, the address/control mux 1244 places on bus 1276 either the address/control information from bus 1266, for write/read memory accesses between the SRAM device 1205 and the FPGA logic devices 1203, 1204, or alternatively the DMA write/read transfer data from SPACE4 or SPACE5 on bus 1267.
During the memory access period, the MEMFSM unit 1240 provides the latch signal on line 1263 to the memory address/control latch 1243 to capture the inputs from the FPGA bus FD[31:0]. The MEMFSM unit 1240 extracts the mem_wr_L control information from the address/control signals on FD[31:0] for further control. If the mem_wr_L signal on line 1264 is logic 1, a write operation is requested, and the MEMFSM unit 1240 generates the appropriate select signal on line 1265 to the address/control mux 1244 so that the address and control signals on bus 1266 are sent to the low bank SRAM on bus 1276. A write data transfer then takes place from the FPGA logic devices to the SRAM memory device. If the mem_wr_L signal on line 1264 is logic 0, a read operation is requested, and the simulation system waits for the SRAM memory device to place data on the FPGA bus FD[31:0]. Once the data is ready, the read data transfer takes place from the SRAM memory device to the FPGA logic devices.
A similar configuration and operation are provided for the high bank. The memory address/control latch 1247 for the high bank receives address and control signals from the FPGA bus FD[63:32] 1278, which corresponds to bus 1212, together with a latch signal on line 1270. The latch 1247 generates the mem_wr_H signal on line 1271 and passes the incoming address/control signals from the FPGA bus FD[63:32] to the address/control mux 1246 via bus 1239.
The address/control mux 1246 receives as inputs the address and control information on bus 1239 and the address information from the address counter 1245 on bus 1268. As its output, it sends address/control information on bus 1277 to the high bank SRAM memory device 1206. The select signal on line 1269 is supplied by the MEMFSM unit 1240. The address/control information on bus 1277 corresponds to the MA[18:2] and chip select read/write signals on buses 1214 and 1215 in Figure 56.
As described above, the address counter 1245 receives information from SPACE4 and SPACE5 via bus 1267 for DMA write and read transfers. The address counter 1245 provides its output on buses 1288 and 1268 to the address/control muxes 1244 and 1246. Under the appropriate select signal on line 1269 for the high bank, the address/control mux 1246 places on bus 1277 either the address/control information from bus 1239, for write/read memory accesses between the SRAM device 1206 and the FPGA logic devices 1201, 1202, or alternatively the DMA write/read transfer data from SPACE4 or SPACE5 on bus 1267.
During the memory access period, the MEMFSM unit 1240 provides the latch signal on line 1270 to the memory address/control latch 1247 to capture the inputs from the FPGA bus FD[63:32]. The MEMFSM unit 1240 extracts the mem_wr_H control information from the address/control signals on FD[63:32] for further control. If the mem_wr_H signal on line 1271 is logic 1, a write operation is requested, and the MEMFSM unit 1240 generates the appropriate select signal on line 1269 to the address/control mux 1246 so that the address and control signals on bus 1239 are sent to the high bank SRAM on bus 1277. A write data transfer then takes place from the FPGA logic devices to the SRAM memory device. If the mem_wr_H signal on line 1271 is logic 0, a read operation is requested, and the simulation system waits for the SRAM memory device to place data on the FPGA bus FD[63:32]. Once the data is ready, the read data transfer takes place from the SRAM memory device to the FPGA logic devices.
As shown in Figure 57, address and control signals are provided to the low bank SRAM memory device and the high bank SRAM memory device via buses 1276 and 1277, respectively. Bus 1276 for the low bank corresponds to the combination of buses 1229 and 1216 in Figure 56. Likewise, bus 1277 for the high bank corresponds to the combination of buses 1214 and 1215 in Figure 56.
The operation of the CTRL_FPGA unit 1200 in the memory simulation system, in accordance with one embodiment of the present invention, is generally as follows. The DONE signal on line 1259, which is provided to the computing system and to the MEMFSM unit 1240 in the CTRL_FPGA unit 1200, indicates the completion of a simulation write/read cycle. The DATAXSFR signal on line 1260 indicates the occurrence of the DMA data transfer period of the simulation write/read cycle. Memory address/control signals on the FPGA buses FD[31:0] and FD[63:32] are provided to the memory address/control latches 1243 and 1247 for the low and high banks, respectively. For either bank, the MEMFSM unit 1240 generates the latch signal (line 1263 or 1270) to latch the address and control information. This information is then provided to the SRAM memory devices. The mem_wr signal determines whether a write or a read operation is requested. If a write is requested, data is transferred from the FPGA logic devices 1201-1204 to the SRAM memory devices via the FPGA bus. If a read is requested, the simulation system waits for the SRAM memory device to place the requested data on the FPGA bus for transfer from the SRAM memory device to the FPGA logic devices. For DMA data transfers of SPACE4 and SPACE5, the select signals on lines 1265 and 1269 can select the output of the address counter 1245 as the data for transfers between the host computing system and the SRAM memory devices in the simulation system. For all of these accesses, appropriate wait cycles are inserted so that the logic processes data only when the logic is ready and the data is available.
Figure 60 shows a more detailed view of the memory read data double buffer 1251 (Figure 57). Each memory block N in each FPGA logic device has a double buffer that latches relevant data, which may arrive at different times, and then releases the latched data at the same time. In Figure 60, the double buffer 1391 for memory block 0 includes two D-type flip-flops 1340 and 1341. The output 1343 of the first D flip-flop 1340 is coupled to the input of the second D flip-flop 1341. The output 1344 of the second D flip-flop 1341 is the output of the double buffer and is provided to the memory block N interface in the user's design logic. The global clock input is provided to the first flip-flop 1340 on line 1393 and to the second flip-flop 1341 on line 1394.
The first D flip-flop 1340 receives its data input on line 1342 from the SRAM memory device via bus 1283 and the FPGA bus, FD[63:32] for the high bank and FD[31:0] for the low bank. Its enable input is coupled to line 1345, which receives the rd_latx signal (e.g., rd_lat0) from the EVALFSMx unit of each FPGA logic device. Thus, for a read operation (i.e., wrx=0), the EVALFSMx unit generates the rd_latx signal to latch the data on line 1342 through to line 1343. The input data for the double buffers of the various memory blocks may arrive at different times, and the double buffers ensure that all of the data is latched first. Once all of the data has been latched into the D flip-flops 1340, the clk_en signal (i.e., the software clock) is provided on line 1346 as the clock input to the second D flip-flop 1341. When the clk_en signal is asserted, the latched data on line 1343 is buffered into the D flip-flop 1341 and onto line 1344.
For the next memory block 1, another double buffer 1392, substantially equivalent to double buffer 1391, is provided. Data from the SRAM memory device is input on line 1396. The global clock signal is input on line 1397. The clk_en (software clock) signal is input on line 1398 to the second flip-flop (not shown) in double buffer 1392. These lines are coupled to the analogous signal lines of the first double buffer 1391 for memory block 0 and of the double buffers for the other memory blocks N. The double-buffered output data is provided on line 1399.
The rd_latx signal for the second double buffer 1392 (e.g., rd_lat1) is provided on line 1395 separately from the rd_latx signals of the other double buffers. Further double buffers are provided for the other memory blocks N.
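The effect of the double buffers 1391 and 1392 can be illustrated with a small behavioral model in Python. This is a sketch under assumptions: clk_en is modeled as an enable sampled on the shared global clock, the class and function names are hypothetical, and data values are arbitrary.

```python
from typing import Optional, Tuple


class DoubleBuffer:
    """Two flip-flops in series. The first captures data when rd_lat is
    asserted; the second passes the captured value on when clk_en is
    asserted. Both update on the same global clock edge."""

    def __init__(self) -> None:
        self.stage1: Optional[int] = None  # models line 1343
        self.stage2: Optional[int] = None  # models line 1344 (output)

    def clock(self, data_in: Optional[int], rd_lat: bool, clk_en: bool) -> None:
        # Compute both next values from the current state, then update.
        next1 = data_in if rd_lat else self.stage1
        next2 = self.stage1 if clk_en else self.stage2
        self.stage1, self.stage2 = next1, next2


def demo() -> Tuple[Tuple[Optional[int], Optional[int]],
                    Tuple[Optional[int], Optional[int]]]:
    buf0, buf1 = DoubleBuffer(), DoubleBuffer()
    # Cycle 0: data for memory block 0 arrives and is latched (rd_lat0).
    buf0.clock(0xA, rd_lat=True, clk_en=False)
    buf1.clock(None, rd_lat=False, clk_en=False)
    # Cycle 1: data for memory block 1 arrives later (rd_lat1).
    buf0.clock(None, rd_lat=False, clk_en=False)
    buf1.clock(0xB, rd_lat=True, clk_en=False)
    before = (buf0.stage2, buf1.stage2)
    # Cycle 2: clk_en releases both latched values at the same time.
    for buf in (buf0, buf1):
        buf.clock(None, rd_lat=False, clk_en=True)
    after = (buf0.stage2, buf1.stage2)
    return before, after


before, after = demo()
```

Although the two data words arrive in different cycles, neither reaches the user logic until clk_en is asserted, at which point both outputs change together.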
The state diagram of the MEMFSM unit 1240 will now be discussed in accordance with one embodiment of the present invention. Figure 58 shows a state diagram of the finite state machine of the MEMFSM unit in the CTRL_FPGA unit. The state diagram in Figure 58 is structured so that the three periods within a simulation write/read cycle are shown together with their corresponding states. Thus, states 1300-1301 correspond to the DMA data transfer period; states 1302-1304 correspond to the evaluation period; and states 1305-1314 correspond to the memory access period. Refer to Figure 57 in conjunction with Figure 58 in the discussion below.
In general, the sequence of signals for DMA transfer, evaluation, and memory access is fixed. In one embodiment, the sequence is as follows: DATA_XSFR triggers the DMA data transfer, if any. The LAST signals for both the high and low banks are generated at the completion of the DMA data transfer and trigger the DONE signal, which indicates the completion of the DMA data transfer period. The XSFR_DONE signal is then generated and the EVAL period begins. When EVAL ends, memory read/write can begin.
Turning to the top of Figure 58, state 1300 is the idle state, which holds whenever the DATAXSFR signal is at logic 0. This indicates that no DMA data transfer is taking place at that time. When the DATAXSFR signal is at logic 1, the MEMFSM unit 1240 proceeds to state 1301. Here, the computing system requires a DMA data transfer between the computing system (main memory in Figures 1, 45, and 46) and the simulation system (FPGA logic devices 1201-1204 or SRAM memory devices 1205, 1206 in Figure 56). Appropriate wait cycles are inserted until the DMA data transfer is complete. When the DMA transfer has completed, the DATAXSFR signal returns to logic 0.
When the DATAXSFR signal returns to logic 0, the generation of the start signal is triggered in the MEMFSM unit 1240 at state 1302. The start signal starts the EVAL counter 1242, which is a programmable counter. The programmed count duration of the EVAL counter equals the duration of the evaluation period. As long as the EVAL counter is counting at state 1303, the EVAL signal is asserted at logic 1 and provided to the EVALFSMx in each FPGA logic device and to the MEMFSM unit 1240. At the end of the count, the EVAL counter presents the EVAL signal at logic 0 to the EVALFSMx in each FPGA logic device and to the MEMFSM unit 1240. When the MEMFSM unit 1240 receives the logic 0 EVAL signal, it sets the EVAL_DONE flag at state 1304. The MEMFSM uses the EVAL_DONE flag to indicate that the evaluation period has ended and that the memory access period, if requested, can now proceed. Before starting the next DMA transfer, the CPU checks EVAL_DONE and XSFR_DONE by reading the XSFR_EVAL register (see Table K below) to confirm that the DMA transfer and EVAL have completed successfully.
In some cases, however, the simulation system may not need to perform a memory access at that moment. Here, the simulation system holds the memory enable signal MEM_EN at logic 0. This deasserted (logic 0) MEM_EN signal keeps the MEMFSM unit at the idle state 1300, where it waits for a DMA data transfer or for data evaluation by the FPGA logic devices. If, on the other hand, the memory enable signal MEM_EN is at logic 1, the simulation system is indicating that memory access is needed.
Below state 1304 in Figure 58, the state diagram divides into two sections that proceed in parallel. One section includes states 1305, 1306, 1307, 1308, and 1309 for low bank memory access. The other section includes states 1311, 1312, 1313, 1314, and 1309 for high bank memory access.
At state 1305, the simulation system waits one cycle for the currently selected FPGA logic device to place the address and control signals on the FPGA bus FD[31:0]. At state 1306, the MEMFSM generates the latch signal on line 1263 to the memory address/control latch 1243 to capture the inputs from FD[31:0]. The data corresponding to this particular captured address and control signal will be either read from or written to the SRAM memory device. To determine whether the simulation system requires a read operation or a write operation, the memory write signal mem_wr_L for the low bank is extracted from the address and control signals. If mem_wr_L=0, a read operation is requested. If mem_wr_L=1, a write operation is requested. As stated above, this mem_wr signal is equivalent to the chip select write signal.
At state 1307, the appropriate select signal for the address/control mux 1244 is generated to send the address and control signals to the low bank SRAM. The MEMFSM unit checks the mem_wr signal and the LASTL signal. If mem_wr_L=1 and LASTL=0, a write operation is requested but the last data in the FPGA logic device chain has not yet been shifted out. The simulation system therefore returns to state 1305, where it waits one cycle for the FPGA logic device to place more address and control signals on FD[31:0]. This process continues until the last data has been shifted out of the FPGA logic devices. If, however, mem_wr_L=1 and LASTL=1, the last data has been shifted out of the FPGA logic devices.
Similarly, if mem_wr_L=0, indicating a read operation, the MEMFSM proceeds to state 1308. At state 1308, the simulation system waits one cycle for the SRAM memory device to place the data on the FPGA bus FD[31:0]. If LASTL=0, the last data in the FPGA logic device chain has not yet been shifted out. The simulation system therefore returns to state 1305, where it waits one cycle for the FPGA logic device to place more address and control signals on FD[31:0]. This process continues until the last data has been shifted out of the FPGA logic devices. Note that write operations (mem_wr_L=1) and read operations (mem_wr_L=0) can be interleaved or otherwise alternated until LASTL=1.
When LASTL=1, the MEMFSM proceeds to state 1309, where it waits while DONE=0. When DONE=1, both LASTL and LASTH are at logic 1, and the simulation write/read cycle is therefore complete. The simulation system then proceeds to state 1300, where it remains idle as long as DATAXSFR=0.
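The loop through states 1305 to 1309 on the FD[31:0] side can be traced with a short Python sketch. The representation of each access as a (mem_wr_L, LASTL) pair and the function name are assumptions made for illustration; wait cycles inside state 1309 and the DONE handshake are not modeled.

```python
from typing import List, Tuple


def low_bank_trace(accesses: List[Tuple[int, int]]) -> List[int]:
    """Return the sequence of MEMFSM states visited for a list of
    accesses, each given as (mem_wr_L, LASTL) sampled for that access."""
    trace: List[int] = []
    for mem_wr, last in accesses:
        trace.append(1305)      # wait for FPGA to drive address/control
        trace.append(1306)      # latch address/control, extract mem_wr_L
        trace.append(1307)      # select mux 1244, send to SRAM
        if mem_wr == 0:
            trace.append(1308)  # read: wait for SRAM to drive data
        if last:
            trace.append(1309)  # wait here until DONE=1
            break
    return trace


# A write, then a read, then a final write with LASTL=1.
trace = low_bank_trace([(1, 0), (0, 0), (1, 1)])
```

The same structure applies to the FD[63:32] side with states 1311 to 1314 in place of 1305 to 1308.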
The same process applies to the high bank. At state 1311, the simulation system waits one cycle for the currently selected FPGA logic device to place the address and control signals on the FPGA bus FD[63:32]. At state 1312, the MEMFSM generates the latch signal on line 1270 to the memory address/control latch 1247 to capture the inputs from FD[63:32]. The data corresponding to this particular captured address and control signal will be either read from or written to the SRAM memory device. To determine whether the simulation system requires a read operation or a write operation, the memory write signal mem_wr_H for the high bank is extracted from the address and control signals. If mem_wr_H=0, a read operation is requested. If mem_wr_H=1, a write operation is requested.
At state 1313, the appropriate select signal for the address/control mux 1246 is generated to send the address and control signals to the high bank SRAM. The MEMFSM unit checks the mem_wr signal and the LASTH signal. If mem_wr_H=1 and LASTH=0, a write operation is requested but the last data in the FPGA logic device chain has not yet been shifted out. The simulation system therefore returns to state 1311, where it waits one cycle for the FPGA logic device to place more address and control signals on FD[63:32]. This process continues until the last data has been shifted out of the FPGA logic devices. If, however, mem_wr_H=1 and LASTH=1, the last data has been shifted out of the FPGA logic devices.
Similarly, if mem_wr_H=0, indicating a read operation, the MEMFSM proceeds to state 1314. At state 1314, the simulation system waits one cycle for the SRAM memory device to place the data on the FPGA bus FD[63:32]. If LASTH=0, the last data in the FPGA logic device chain has not yet been shifted out. The simulation system therefore returns to state 1311, where it waits one cycle for the FPGA logic device to place more address and control signals on FD[63:32]. This process continues until the last data has been shifted out of the FPGA logic devices. Note that write operations (mem_wr_H=1) and read operations (mem_wr_H=0) can be interleaved or otherwise alternated until LASTH=1.
When LASTH=1, the MEMFSM proceeds to state 1309, where it waits while DONE=0. When DONE=1, both LASTL and LASTH are at logic 1, and the simulation write/read cycle is therefore complete. The simulation system then proceeds to state 1300, where it remains idle as long as DATAXSFR=0.
Alternatively, in accordance with another embodiment of the present invention, states 1309 and 1310 are not implemented for either the high bank or the low bank. Thus, in the low bank, the MEMFSM proceeds directly to state 1300 after passing through state 1308 (LASTL=1) or state 1307 (MEM_WR_L=1 and LASTL=1). In the high bank, the MEMFSM proceeds directly to state 1300 after passing through state 1314 (LASTH=1) or state 1313 (MEM_WR_H=1 and LASTH=1).
The state diagram of the EVALFSM unit 1248 is now discussed in accordance with one embodiment of the invention. Figure 59 shows a state diagram of the EVALFSMx finite state machine in each FPGA chip. As in Figure 58, the state diagram of Figure 59 is arranged so that the two cycles within the simulation write/read cycle are shown together with their corresponding states. Thus, states 1320-1326 correspond to the evaluation cycle, and states 1326B-1336 correspond to the memory access cycle. The following discussion of Figure 59 also refers to Figure 57.
The EVALFSMx unit 1248 receives the EVAL signal on line 1274 from the CTRL_FPGA unit 1200 (see Figure 57). When EVAL=0, no data evaluation by the FPGA logic devices takes place. Thus, at state 1320, EVALFSMx is idle while EVAL=0. When EVAL=1, EVALFSMx proceeds to state 1321.
States 1321, 1322, and 1323 relate to inter-FPGA communication, in which data are evaluated by the user design via the FPGA logic devices. Here, EVALFSMx generates the signals input_en, mux_en, and clk_en (item 1281 in Figure 57) to the user logic. At state 1321, EVALFSMx generates the clk_en signal, which in this cycle enables the second flip-flop of all the clock edge register flip-flops in the user design logic (see Figure 19). The clk_en signal is otherwise known as the software clock. If the user memory type is synchronous, clk_en also enables the second clock of the memory read data double buffer 1251 in each memory block. In this cycle, the SRAM data output for each memory block is sent to the user design logic.
At state 1322, EVALFSMx generates the input_en signal to the user design logic to latch the input signals sent from the CPU to the user logic by DMA transfer. The input_en signal provides the enable input to the second flip-flop in the primary clock register (see Figure 19).
At state 1323, EVALFSMx generates the mux_en signal to turn on the multiplexing circuits in each FPGA logic device and begin communication with the other FPGA logic devices in the array. As explained earlier, inter-FPGA wire lines are often multiplexed so that the limited pin resources of each FPGA logic device chip are used efficiently.
At state 1324, EVALFSMx waits as long as EVAL=1. When EVAL=0, the evaluation cycle is complete, so state 1325 requires EVALFSMx to turn off the mux_en signal.
If the number of memory blocks M (where M is an integer, including 0) is zero, EVALFSMx returns to state 1320, where it remains idle while EVAL=0. In most cases M>0, and EVALFSMx therefore proceeds to state 1326A/1326B. "M" is the number of memory blocks in the FPGA logic device. It is a constant that is mapped from the user design and configured into the FPGA logic device; it does not count down. If M>0, the right-hand portion (memory access cycle) of Figure 59 is configured into the FPGA logic device. If M=0, only the left-hand portion (EVAL cycle) of Figure 59 is configured.
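The evaluation half of the state machine described above can be sketched as a small transition function. The state numbers and signal names are taken from the text; the function name, the assumption that states 1321 through 1323 each last exactly one step, and the use of the label "1326A/1326B" for the entry into the memory access half are illustrative assumptions.

```python
# Sketch of the evaluation-cycle half of EVALFSMx (states 1320-1325).
# Names are hypothetical; the per-state actions are paraphrased from the text.

EVAL_ACTIONS = {
    1321: "assert clk_en (software clock)",
    1322: "assert input_en (latch DMA inputs)",
    1323: "assert mux_en (start inter-FPGA communication)",
    1324: "wait while EVAL=1",
    1325: "deassert mux_en",
}

def evalfsm_eval_next(state, eval_sig, m):
    """Return the next state given the EVAL level and the memory block count M."""
    if state == 1320:                    # idle until EVAL=1
        return 1321 if eval_sig else 1320
    if state in (1321, 1322, 1323):      # assumed: one step per state
        return state + 1
    if state == 1324:                    # hold while the evaluation cycle lasts
        return 1324 if eval_sig else 1325
    if state == 1325:                    # no memory blocks: back to idle
        return 1320 if m == 0 else "1326A/1326B"
    raise ValueError(f"state {state} is not modeled in this sketch")
```

With M=0 the machine never leaves the left-hand portion of the diagram, which matches the statement that only the EVAL cycle is configured in that case.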
State 1327 keeps EVALFSMx in a wait state as long as SHIFTIN=0. When SHIFTIN=1, the previous FPGA logic device has completed its memory accesses, and the current FPGA logic device is ready to perform its own memory access tasks. Alternatively, when SHIFTIN=1, the current FPGA logic device is the first logic device in the group, and its SHIFTIN input line is tied to Vcc. In either case, receipt of the SHIFTIN=1 signal indicates that the current FPGA logic device is ready to perform memory accesses. At state 1328, the memory block number N is set to N=1. N is incremented on each pass through the loop so that the memory access for the particular memory block N can be completed. Initially N=1, so EVALFSMx proceeds to access memory for memory block 1.
At state 1329, EVALFSMx generates the select signal on line 1285 and the output_en signal on line 1284 to the FPGA bus driver FDO_MUXx 1249, so that the address and control signals of the Mem_Block_N interface 1253 are placed on FPGA bus FD[63:32] or FD[31:0]. If a write operation is required, wr=1; otherwise a read operation is required and wr=0. EVALFSMx receives the wr signal on line 1287 as one of its inputs. Based on this wr signal, the proper select signal on line 1285 is asserted.
When wr=1, EVALFSMx proceeds to state 1330. EVALFSMx generates the select and output_en signals for the FD bus driver so that the write data of Mem_Block_N 1253 are placed on FPGA bus FD[63:32] or FD[31:0]. Thereafter, EVALFSMx waits one cycle to let the SRAM memory device complete the write cycle. EVALFSMx then enters state 1335, where the memory block number N is incremented; that is, N=N+1.
If, however, wr=0 at state 1329, a read operation is requested, and EVALFSMx enters state 1332, where it waits one cycle, and then enters state 1333, where it waits another cycle. At state 1334, EVALFSMx generates the rd_latch signal on line 1286 so that the memory read data double buffer 1251 of memory block N takes in the SRAM data from the FD bus. EVALFSMx then proceeds to state 1335, where the memory block number N is incremented; that is, N=N+1. Thus, if N=1 before the increment at state 1335, N is now 2, and the subsequent memory access applies to memory block 2.
If the current memory block number N is less than or equal to the total number of memory blocks M in the user design (that is, N<=M), EVALFSMx proceeds to state 1329, where it generates the particular select and output_en signals for the FD bus driver depending on whether the operation is a write or a read. The write or read operation for the next memory block N then takes place.
If, however, the current memory block number N is greater than the total number of memory blocks M in the user design (that is, N>M), EVALFSMx proceeds to state 1336, where it asserts the SHIFTOUT output signal to allow the next FPGA logic device in the group to access the SRAM memory device. Thereafter, EVALFSMx proceeds to state 1320, where it remains idle until the simulation system requires data evaluation among the FPGA logic devices (that is, until EVAL=1).
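The per-block loop described above (wait for SHIFTIN, visit blocks 1 through M, then assert SHIFTOUT) can be traced with a short sketch. State numbers are those named in the text; the text does not number the one-cycle wait that follows state 1330, so it appears here under the placeholder label "wait". The function name and its argument are illustrative assumptions.

```python
# Sketch of the memory-access half of EVALFSMx for one FPGA logic device.
# wr_flags[n-1] is True if memory block n needs a write, False for a read.

def evalfsm_memory_trace(wr_flags):
    """Return the sequence of states visited once SHIFTIN=1 is received."""
    m = len(wr_flags)
    if m == 0:
        return [1320]                    # no memory blocks: stay in the EVAL half
    trace = [1327, 1328]                 # SHIFTIN=1 seen, then N is set to 1
    n = 1
    while n <= m:
        trace.append(1329)               # drive address/control onto the FD bus
        if wr_flags[n - 1]:
            trace += [1330, "wait"]      # drive write data, wait for SRAM write
        else:
            trace += [1332, 1333, 1334]  # two wait cycles, then rd_latch
        trace.append(1335)               # N = N + 1
        n += 1
    trace += [1336, 1320]                # assert SHIFTOUT, return to idle
    return trace
```

For a device with one written block followed by one read block, `evalfsm_memory_trace([True, False])` walks 1329 twice, once per block, before reaching state 1336.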
Figure 61 shows the simulation write/read cycle according to one embodiment of the invention. At reference numeral 1366, Figure 61 shows the three cycles within the simulation write/read cycle: the DMA data transfer cycle, the evaluation cycle, and the memory access cycle. Although not shown, it is implied that a prior DMA transfer, evaluation, and memory access may have taken place. In addition, the timing of data transfers to and from the low-end group SRAM may differ from that of the high-end group SRAM. For brevity, Figure 61 shows an example in which the access timing of the low-end and high-end groups is identical. Global clock GCLK 1350 provides the clock signal for all components in the system.
The DATAXSFR signal 1351 indicates the occurrence of the DMA data transfer cycle. When DATAXSFR=1 at trace 1367, a DMA data transfer is taking place between the host system and the FPGA logic devices or SRAM memory devices. Accordingly, data are provided on the FPGA high-end group bus FD[63:32] 1359 at trace 1369 and on the FPGA low-end group bus FD[31:0] 1358 at trace 1368. A logic 0-to-1 transition of the DONE signal 1364 indicates completion of the memory access cycle (for example, trace 1390) or otherwise indicates the duration of the simulation write/read cycle (for example, the interval between the edge at trace 1370 and the edge at trace 1390). During the DMA transfer cycle, the DONE signal is at logic 0.
When the DMA transfer cycle ends, the DATAXSFR signal goes from logic 1 to logic 0, which triggers the start of the evaluation cycle. Thus, EVAL 1352 is at logic 1, as indicated by trace 1371. The length of time that EVAL remains at logic 1 is predetermined and programmable. During this evaluation cycle, data in the user design logic are evaluated with the clk_en signal 1353, which is at logic 1 as indicated by trace 1372; the input_en signal 1354, which is also at logic 1 as indicated by trace 1373; and the mux_en signal 1355, which is also at logic 1 as indicated by trace 1374, but for a longer duration than clk_en and input_en. Data are evaluated within this particular FPGA logic device. When the mux_en signal 1355 goes from logic 1 to 0 at trace 1374 and at least one memory block is present in the FPGA logic device, the evaluation cycle ends and the memory access cycle begins.
The SHIFTIN signal 1356 is asserted at logic 1 at trace 1375. This indicates that the preceding FPGA has completed its evaluation and that all required data have been transferred to or from that preceding FPGA logic device. The next FPGA logic device in the group is now ready to begin its memory access.
The following nomenclature is used for traces 1377 to 1386. ACj_k denotes the address and control signals associated with FPGAj and memory block k, where j and k are integers, including 0. WDj_k denotes the write data for FPGAj and memory block k. RDj_k denotes the read data for FPGAj and memory block k. Thus, AC3_1 denotes the address and control signals associated with FPGA3 and memory block 1. Low-end group SRAM accesses and high-end group SRAM accesses 1361 are shown at trace 1387.
The next several traces, 1377 to 1387, show how the memory accesses are accomplished. Based on the logic level of the wrx signal sent to EVALFSMx and, consequently, of the mem_wr signal sent to MEMFSM, either a write or a read operation is performed. If a write operation is required, the memory model interface that interfaces with user memory block N (Mem_Block_N interface 1253 in Figure 57) provides wrx as one of its control signals. This control signal wrx is provided to the FD bus driver and to the EVALFSMx unit. If wrx is at logic 1, the proper select signal and output_en signal are provided to the FD bus driver so that the memory write data are placed on the FD bus. The same control signal, now on the FD bus, can be latched by the memory address/control latch in the CTRL_FPGA unit. The memory address/control latch sends the address and control signals to the SRAM via the MA[18:2]/control bus. The wrx control signal, at logic 1, is extracted from the FD bus and, because a write operation is requested, the data on the FD bus associated with the address and control signals are sent to the SRAM memory device.
Thus, as shown in Figure 61, this next FPGA logic device is logic device FPGA0 in the low-end group, which places AC0_0 on FD[31:0], as indicated by trace 1377. The simulation system performs a write operation for WD0_0. Then AC0_1 is placed on the FD[31:0] bus. If, however, a read operation had been requested, then after AC0_1 is placed on FD bus FD[31:0], some delay would occur before the SRAM memory device places RD0_0 (rather than the WD0_0 corresponding to AC0_0) on the FD bus.
Note that, as indicated by trace 1383, the placement of AC0_0 on the MA[18:2]/control bus is slightly delayed relative to the placement of the address, control, and data on the FD bus. This is because the MEMFSM unit needs time to latch the address/control signals from the FD bus, extract the mem_wr signal, and generate the proper select signal to the address/control mux so that the address/control signals can be placed on the MA[18:2]/control bus. In addition, after the address/control signals are placed on the MA[18:2]/control bus of the SRAM memory device, the simulation system must wait for the corresponding data from the SRAM memory device to be placed on the FD bus. One example is the time offset between trace 1384 and trace 1381, where RD1_1 is placed on the FD bus only after AC1_1 has been placed on the MA[18:2]/control bus.
On the high-end group, FPGA1 places AC1_0 on bus FD[63:32], followed by WD1_0. Thereafter, AC1_1 is placed on bus FD[63:32]. This is indicated by trace 1380. When AC1_1 is placed on the FD bus, the control signals indicate a read operation in this example. Therefore, as described above, when AC1_1 is placed on the MA[18:2]/control bus as shown at trace 1384, the appropriate wrx and mem_wr signals, at logic 0, are present in the address/control signals and are sent to the EVALFSMx and MEMFSM units. Because the simulation system knows that this is a read operation, no write data are transferred to the SRAM memory device; instead, the read data associated with AC1_1 are placed on the FD bus by the SRAM memory device so that the user design logic can subsequently read them via the simulation memory block interface. This is indicated by trace 1381 on the high-end group. On the low-end group, as indicated by trace 1378, RD0_1 is placed on the FD bus following AC0_1 on the MA[18:2]/control bus (not shown).
The read operation by the user design logic via the simulation memory block interface is completed when EVALFSMx generates the rd_lat0 signal 1362 to the memory read data double buffer in the simulation memory block interface, as indicated by trace 1388. This rd_lat0 signal is provided to the low-end group FPGA0 and the high-end group FPGA1.
Thereafter, the next memory block of each FPGA logic device is placed on the FD bus. AC2_0 is placed on the low-end group FD bus, and AC3_0 is placed on the high-end group FD bus. If a write operation is required, WD2_0 is placed on the low-end group FD bus, and WD3_0 is placed on the high-end group FD bus. As indicated by trace 1385, AC3_0 is placed on the high-end group MA[18:2]/control bus. This process continues so that the write and read operations are performed for the next memory block. Note that the write and read operations for the low-end and high-end groups may occur at different times and speeds; Figure 61 shows one particular example in which the timing of the low-end and high-end groups is identical. In addition, in this example the write operations on the low-end and high-end groups occur together, followed by the read operations on both groups. This is not necessarily always the case. The existence of low-end and high-end groups allows the devices connected to these groups to operate in parallel; that is, activity on the low-end group is independent of activity on the high-end group. Other scenarios can be envisioned, such as the low-end group performing a series of write operations while the high-end group performs a series of read operations in parallel.
When the last data in the last FPGA logic device of each group is encountered, the SHIFTOUT signal 1357 is asserted, as indicated by trace 1376. For the read operations, the rd_lat1 signal 1363, corresponding to FPGA2 on the low-end group and FPGA3 on the high-end group, is asserted as indicated by trace 1389 to read RD2_1 on trace 1379 and RD3_1 on trace 1382. Because the last data of the last FPGA units have been accessed, completion of the simulation write/read cycle is indicated by the DONE signal 1364, as indicated by trace 1390.
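The ACj_k / WDj_k / RDj_k naming can be illustrated by generating the order in which items appear on one group's FD bus. This is a sketch of ordering only, not of cycle-accurate timing; the function name, its arguments, and the choice of which blocks are written or read are assumptions made to mirror the Figure 61 example.

```python
# Sketch: order of address/control and data items on one group's FD bus.
# fpgas lists the FPGA indices of a group in SHIFTIN/SHIFTOUT chain order;
# is_write maps (fpga, block) to True for a write and False for a read.

def group_bus_sequence(fpgas, is_write):
    """Return labels such as 'AC0_0', 'WD0_0', 'AC0_1', 'RD0_1' in bus order."""
    sequence = []
    for j in fpgas:
        blocks = sorted(k for (fpga, k) in is_write if fpga == j)
        for k in blocks:
            sequence.append(f"AC{j}_{k}")            # address/control first
            prefix = "WD" if is_write[(j, k)] else "RD"
            sequence.append(f"{prefix}{j}_{k}")      # then write or read data
    return sequence

# Assumed pattern: block 0 of each FPGA is written, block 1 is read.
low_group = group_bus_sequence(
    [0, 2], {(0, 0): True, (0, 1): False, (2, 0): True, (2, 1): False})
high_group = group_bus_sequence(
    [1, 3], {(1, 0): True, (1, 1): False, (3, 0): True, (3, 1): False})
```

Because the two groups are independent, the two lists can be produced and consumed in parallel; they line up item for item only in the special case that Figure 61 depicts.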
Table H below lists and describes the various components on the simulation system board and their registers/memory, the corresponding PCI memory addresses, and the local addresses.
Table H: Memory map
| Component | Register/memory | PCI memory address (bit) | Local address (bit) | Description |
|---|---|---|---|---|
| PLX9080 | PCI configuration registers | 00h to 3Ch | | |
| PLX9080 | Local Config./runtime/DMA registers | Offset from PCI base address 0: 0-FFh | Offset from CS addr: 80h-80h | Accessible from PCI and local buses |
| CTRL_FPGA[6:1] | XSFR_EVAL register | Offset from PCI base address 2: 0h | 0h | In local space 0 |
| CTRL_FPGA1 | CONFIG_JTAG1 register | Offset from PCI base address 2: 10h | 10h | In local space 0 |
| CTRL_FPGA2 | CONFIG_JTAG2 register | Offset from PCI base address 2: 14h | 14h | In local space 0 |
| CTRL_FPGA3 | CONFIG_JTAG3 register | Offset from PCI base address 2: 18h | 18h | In local space 0 |
| CTRL_FPGA4 | CONFIG_JTAG4 register | Offset from PCI base address 2: 1Ch | 1Ch | In local space 0 |
| CTRL_FPGA5 | CONFIG_JTAG5 register | Offset from PCI base address 2: 18h | 20h | In local space 0 |
| CTRL_FPGA6 | CONFIG_JTAG6 register | Offset from PCI base address 2: 1Ch | 24h | In local space 0 |
| CTRL_FPGA1 | Local RAM | Offset from PCI base address 2: 400h-7FFh | 400h-7FFh | In local space 0 |
| FPGA[3:0] | SPACE0 | Offset from PCI base address for ch0 DMA: 0-FFFFFFFh | 80000000h to 8FFFFFFFh | DMA write transfer for GLOBAL and S2H data |
| FPGA[3:0] | SPACE1 | Offset from PCI base address for ch0 DMA: 0-FFFFFFFh | 90000000h to 9FFFFFFFh | DMA write transfer for REGISTER-WRITE data |
| FPGA[3:0] | SPACE2 | Offset from PCI base address for ch1 DMA: 0-FFFFFFFh | A0000000h to AFFFFFFFh | DMA read transfer for H2S data |
| FPGA[3:0] | SPACE3 | Offset from PCI base address for ch1 DMA: 0-FFFFFFFh | B0000000h to BFFFFFFFh | DMA read transfer for REGISTER-READ data |
| L-SRAM, H-SRAM | SPACE4 | Offset from PCI base address for ch0 DMA: 0-FFFFFFFh | C0000000h to CFFFFFFFh | DMA write transfer for SRAM |
| L-SRAM, H-SRAM | SPACE5 | Offset from PCI base address for ch1 DMA: 0-FFFFFFFh | D0000000h to DFFFFFFFh | DMA read transfer for SRAM |
| | SPACE6 | Offset from PCI base address for ch1 DMA: 0-FFFFFFFh | E0000000h to EFFFFFFFh | Reserved |
| | SPACE7 | Offset from PCI base address for ch1 DMA: 0-FFFFFFFh | F0000000h to FFFFFFFFh | Reserved |
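The SPACE0 through SPACE7 rows of Table H divide the upper half of the 32-bit local address range into eight regions selected by the top four address bits. The sketch below decodes a local address into its space name and listed purpose; the dictionary and function names are illustrative assumptions, and only the address ranges and descriptions come from the table.

```python
# Sketch: map a 32-bit local address to the SPACEn region listed in Table H.
# The top nibble (address bits 31:28) selects the region.

LOCAL_SPACES = {
    0x8: ("SPACE0", "DMA write transfer for GLOBAL and S2H data"),
    0x9: ("SPACE1", "DMA write transfer for REGISTER-WRITE data"),
    0xA: ("SPACE2", "DMA read transfer for H2S data"),
    0xB: ("SPACE3", "DMA read transfer for REGISTER-READ data"),
    0xC: ("SPACE4", "DMA write transfer for SRAM"),
    0xD: ("SPACE5", "DMA read transfer for SRAM"),
    0xE: ("SPACE6", "Reserved"),
    0xF: ("SPACE7", "Reserved"),
}

def decode_local_address(addr):
    """Return (space name, description, offset) or None below 80000000h."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("local address must fit in 32 bits")
    entry = LOCAL_SPACES.get(addr >> 28)
    if entry is None:
        return None                       # e.g. local space 0 registers
    name, description = entry
    return name, description, addr & 0x0FFFFFFF
```

For instance, `decode_local_address(0xA0001234)` falls in SPACE2 with offset 1234h, while the XSFR_EVAL register at local address 0h is outside these regions and yields `None`.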
Table J below shows the data format of the configuration file according to one embodiment of the invention. The CPU sends one word at a time over the PCI bus to configure one bit of each of the on-board FPGAs in parallel.
Table J: Configuration data format
| | Bit 0 | Bit 1 | Bit 2 | Bit 3 | Bits 16-31 |
|---|---|---|---|---|---|
| Word 0 | D0(FPGA0) | D0(FPGA1) | D0(FPGA2) | D0(FPGA3) | control/status |
| Word 1 | D1(FPGA0) | D1(FPGA1) | D1(FPGA2) | D1(FPGA3) | control/status |
| Word 2 | D2(FPGA0) | D2(FPGA1) | D2(FPGA2) | D2(FPGA3) | control/status |
| Word 3 | D3(FPGA0) | D3(FPGA1) | D3(FPGA2) | D3(FPGA3) | control/status |
| Word 4 | D4(FPGA0) | D4(FPGA1) | D4(FPGA2) | D4(FPGA3) | control/status |
| Word 5 | D5(FPGA0) | D5(FPGA1) | D5(FPGA2) | D5(FPGA3) | control/status |
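The layout in Table J interleaves the four FPGA bitstreams: word i carries bit i of each FPGA's configuration data in bit positions 0 through 3. A minimal sketch of that packing follows; the function name is hypothetical, and the control/status bits 16-31 are simply left at zero here because the table does not define their contents.

```python
# Sketch: interleave per-FPGA configuration bitstreams into Table J words.
# streams[k][i] is configuration bit Di of FPGAk (0 or 1).

def pack_config_words(streams):
    """Return one word per bit index, with FPGAk's bit placed at bit position k."""
    if len(streams) != 4:
        raise ValueError("Table J shows four FPGAs per word")
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all bitstreams must have the same length")
    words = []
    for i in range(length):
        word = 0
        for k, stream in enumerate(streams):
            word |= (stream[i] & 1) << k      # Di(FPGAk) goes to bit k
        words.append(word)                    # bits 16-31 left as 0 (assumption)
    return words
```

With streams `[[1, 0], [0, 1], [1, 1], [0, 0]]`, word 0 has bits 0 and 2 set and word 1 has bits 1 and 2 set, so each PCI write advances all four devices by one configuration bit.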
Table K below lists the XSFR_EVAL register. It resides on all the boards. The host computing system uses the XSFR_EVAL register to program the EVAL period, to control DMA read/write, and to read the state of the EVAL_DONE and XSFR_DONE fields. The host computing system also uses this register to enable memory access. The operation of the simulation system with respect to this register is described below in conjunction with Figures 62 and 63.
Table K: XSFR_EVAL register for all 6 boards (local address: 0h)
| Field | Signal | Description | R/W | Value after reset |
|---|---|---|---|---|
| 7:0 | EVALTIME[7:0] | Eval time in PCI clock cycles | R/W | 0h |
| 8 | EVAL_DONE | Eval-done flag. Cleared by setting the WR_XSFR bit. | R | 0 |
| 9 | XSFR_DONE | Xsfr-done flag for read and write. Cleared by writing the XSFR_EVAL register. | R | 0 |
| 10 | RD_XSFR_EN | Enables the DMA read transfer. Cleared by XSFR_DONE. | R/W | 0 |
| 11 | WR_XSFR_EN | Enables the DMA write transfer. Cleared by XSFR_DONE. When both WR_XSFR and RD_XSFR are set, CTRL_FPGA performs the DMA write transfer first and then automatically performs the DMA read transfer. | R/W | 0 |
| 19:12 | Reserved | | R/W | 0h |
| 20 | F_CLRN | Resets all FPGA[3:0] when low. | R/W | 0 |
| 21 | WAIT_EVAL | Effective when both RD_XSFR and WR_XSFR are set. When 1, the DMA read transfer begins after EVAL_DONE. When 0, the DMA read transfer begins after CLK_EN. | R/W | 0 |
| 22 | MEM_EN | Enables the on-board SRAM | R/W | 0 |
| 31:23 | Reserved | | | |
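The bit assignments in Table K can be exercised with a small encode/decode helper. This is only a sketch of the field layout: the dictionary and function names are illustrative assumptions, the field names are written with underscores, and nothing here models the hardware's set/clear side effects.

```python
# Sketch: pack and unpack the XSFR_EVAL register fields listed in Table K.
# Each entry is (least significant bit position, width in bits).

XSFR_EVAL_FIELDS = {
    "EVALTIME":   (0, 8),
    "EVAL_DONE":  (8, 1),
    "XSFR_DONE":  (9, 1),
    "RD_XSFR_EN": (10, 1),
    "WR_XSFR_EN": (11, 1),
    "F_CLRN":     (20, 1),
    "WAIT_EVAL":  (21, 1),
    "MEM_EN":     (22, 1),
}

def encode_xsfr_eval(**values):
    """Build a 32-bit register word from named field values."""
    word = 0
    for name, value in values.items():
        lsb, width = XSFR_EVAL_FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        word |= value << lsb
    return word

def decode_xsfr_eval(word):
    """Split a register word into a dict of field values."""
    return {name: (word >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in XSFR_EVAL_FIELDS.items()}
```

A host write that requests a DMA write followed by a DMA read, with the read deferred until EVAL_DONE, would set WR_XSFR_EN, RD_XSFR_EN, and WAIT_EVAL together in one word.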
Table L below lists the contents of the CONFIG_JTAG[6:1] registers. The CPU uses this register to configure the FPGA logic devices and to run boundary scan tests on them. Each board has its own dedicated register.
Table L: CONFIG_JTAG[6:1] register
| Field | Signal | Description | R/W | Value after reset |
|---|---|---|---|---|
| 15:0 | CONF_D[15:0] | Configuration data for FPGA[15:0] | R/W | 0h |
| 16 | NCONFIG | Starts configuration on a low-to-high transition | R/W | 0h |
| 17 | CONFDONE | Configuration done | R | |
| 18 | CONF_CLK | Configuration clock | R/W | 0 |
| 19 | NSTATUS | Configuration status; indicates an error when low | R | - |
| 20 | F_OE | Output enable for all on-board simulation FPGAs | R/W | 0h |
| 21 | JTAG_TCK | JTAG clock | R/W | 0 |
| 22 | JTAG_TMS | JTAG mode select | R/W | 0 |
| 23 | JTAG_TDI | JTAG data in; sent to the TDI of FPGA0 | R/W | 0 |
| 24 | JTAG_TDO | JTAG data out; from the TDO of FPGAS | R | - |
| 25 | JTAG_NR | Resets the JTAG test when low | R/W | 0 |
| 26 | LED2 | 1 = turn on LED2 for Config-status; 0 = off | R/W | 0 |
| 27 | LED3 | 1 = turn on LED3 for DataXsfr/Diag; 0 = off | R/W | 0 |
| 31:28 | Reserved | | | |
Figures 62 and 63 show timing diagrams for another embodiment of the invention. These two figures illustrate the operation of the simulation system with respect to the XSFR_EVAL register. The host computing system uses the XSFR_EVAL register to program the EVAL period, to control DMA read/write, and to read the state of the EVAL_DONE and XSFR_DONE fields. The host computing system also uses this register to enable memory access. One of the main differences between the two figures is the state of the WAIT_EVAL field. When the WAIT_EVAL field is set to "0", as in Figure 62, the DMA read transfer begins after CLK_EN. When the WAIT_EVAL field is set to "1", as in Figure 63, the DMA read transfer begins after EVAL_DONE.
In Figure 62, WR_XSFR_EN and RD_XSFR_EN are both set to "1". These two fields enable the DMA write and read transfers and are cleared by XSFR_DONE. Because both fields are set to "1", the CTRL_FPGA unit automatically performs the DMA write transfer first and then the DMA read transfer. The WAIT_EVAL field, however, is set to "0", which indicates that the DMA read transfer begins after CLK_EN is asserted (and after the DMA write operation has completed). Thus, in Figure 62, the DMA read operation takes place almost immediately after the DMA write operation completes, as soon as the CLK_EN signal (software clock) is detected. The DMA read transfer does not wait for the EVAL cycle to complete.
At the beginning of the timing diagram, note that the EVAL_REQ_N signal experiences contention if multiple FPGA logic devices contend for attention. As explained previously, the EVAL_REQ_N (or EVAL_REQ#) signal is used to start the evaluation cycle if any of the FPGA logic devices asserts this signal. At the end of the data transfer, the evaluation cycle begins, including address pointer initialization and operation of the software clock, to facilitate the evaluation process.
The DONE signal, which is generated at the completion of a DMA data transfer cycle, also experiences contention if multiple LAST signals (from the shiftin and shiftout signals at the output of each FPGA logic device) are generated and provided to the CTRL_FPGA unit. When all the LAST signals have been received and processed, the DONE signal is generated and a new DMA data transfer operation can begin. The EVAL_REQ_N signal and the DONE signal share the same wire on a time-shared basis, in a manner discussed below.
The system first automatically begins the DMA write transfer, as shown by the WR_XSFR signal at time 1409. The initial portion of the WR_XSFR signal includes some overhead associated with the PCI controller, which in one embodiment is a PCI9080 or 9060. Thereafter, the host computing system performs a DMA write operation, via local bus LD[31:0] and FPGA bus FD[63:0], to the FPGA logic devices connected to FPGA bus FD[63:0].
At time 1412, the WR_XSFR signal is deasserted, indicating completion of the DMA write operation. The EVAL signal is asserted for a predetermined time, from time 1412 to time 1410. The EVALTIME duration is programmable and is initially set to 8+X, where X is derived from the longest signal trace path. The XSFR_DONE signal is also asserted briefly to indicate completion of this DMA transfer operation, which here is the DMA write operation.
Also at time 1412, contention among the EVAL_REQ_N signals has ceased, but the wire that carries the DONE signal now delivers the EVAL_REQ_N signal to the CTRL_FPGA unit. For 3 clock cycles, the EVAL_REQ_N signal is processed via the wire that carries the DONE signal. After 3 clock cycles, the FPGA logic devices no longer generate EVAL_REQ_N signals, but the EVAL_REQ_N signals previously delivered to the CTRL_FPGA unit are still processed. For gated clocks, the maximum time during which the FPGA logic devices no longer generate EVAL_REQ_N signals is approximately 23 clock cycles. EVAL_REQ_N signals longer than this period are ignored.
At time 1413, approximately 2 clock cycles after time 1412 (the end of the DMA write operation), the CTRL_FPGA unit sends a write address strobe signal WPLXADS_N to the PCI controller (for example, PLX PCI9080) to begin the DMA read transfer. About 24 clock cycles after time 1413, the PCI controller starts the DMA read transfer process, and the DONE signal is also generated. At time 1414, before the PCI controller begins the DMA read process, the RD_XSFR signal is asserted to enable the DMA read transfer. Some PLX overhead data are transferred and processed first. At time 1415, while this overhead data is being processed, the DMA read data are placed on FPGA bus FD[63:0] and local bus LD[31:0]. At the end of the 24 clock cycles from time 1413, when the DONE signal is asserted and the FPGA logic devices generate the EVAL_REQ_N signal, the PCI controller processes the DMA read data by transferring the data from FPGA bus FD[63:0] and local bus LD[31:0] to the host computer system.
At time 1410, when the EVAL signal is deasserted, the DMA read data continue to be processed, and the EVAL_DONE signal is asserted to indicate completion of the EVAL cycle. Contention among the FPGA logic devices also begins as they generate EVAL_REQ_N signals.
At time 1417, just before the DMA read cycle ends at time 1416, the host computer system polls the PLX interrupt register to determine whether the DMA cycle is nearing its end. The PCI controller knows how many cycles are needed to complete the DMA data transfer. After a predetermined number of cycles, the PCI controller sets a particular bit in its interrupt register. The CPU in the host computer system polls this interrupt register in the PCI controller. If the bit is set, the CPU knows that the DMA cycle is almost complete. The CPU in the host system does not poll the interrupt register continuously, because doing so would tie up the PCI bus with read cycles. Thus, in one embodiment of the invention, the CPU in the host computer system is programmed to wait a certain number of cycles before polling the interrupt register.
A short time later, the DMA read cycle ends at time 1416: RD_XSFR is deasserted, and the DMA read data are no longer on FPGA bus FD[63:0] or local bus LD[31:0]. The XSFR_DONE signal is also asserted at time 1416, and contention among the LAST signals for generation of the DONE signal begins.
During the entire DMA cycle, from the generation of the WR_XSFR signal at time 1409 to time 1417, the CPU in the host computer system does not access the simulation hardware system. In one embodiment, the duration of this period is the sum of (1) the overhead time of the PCI controller multiplied by 2, (2) the number of words of WR_XSFR and RD_XSFR, and (3) the PCI overhead of the host computer system (for example, Sun ULTRASparc). The first access after the DMA cycle occurs at time 1419, when the processor polls the interrupt register in the PCI controller.
At time 1411, approximately 3 clock cycles after time 1416, the MEM_EN signal is asserted to enable the on-board SRAM memory devices so that memory access between the FPGA logic devices and the SRAM memory devices can begin. Memory access continues until time 1419 and, in one embodiment, each access requires 5 clock cycles. If no DMA read transfer is required, memory access can begin earlier, at time 1410 rather than at time 1411.
While memory access takes place between the FPGA logic devices and the SRAM memory devices over FPGA bus FD[63:0], the CPU in the host computer system can communicate with the PCI controller and the CTRL_FPGA unit via local bus LD[31:0] from time 1418 to time 1429. This occurs after the processor has finished polling the interrupt register of the PCI controller. The CPU writes data to various registers in preparation for the next data transfer. The duration of this period is greater than 4 µs. If the memory access is shorter than this period, no conflict arises on FPGA bus FD[63:0]. At time 1429, the XSFR_DONE signal is deasserted.
The timing diagram of Figure 63 differs from that of Figure 62 because, in Figure 63, the WAIT_EVAL field is set to "1". In other words, the DMA read transfer cycle starts after the EVAL_DONE signal has been asserted and the EVAL cycle is almost done. It waits until the EVAL cycle is nearly complete rather than starting immediately after the DMA write operation completes. The EVAL signal is asserted for the predetermined time from time 1412 to time 1410. At time 1410, the EVAL_DONE signal is asserted to indicate completion of the EVAL cycle.
In Figure 63, after the DMA write operation ends at time 1412, the CTRL_FPGA unit does not generate the write address strobe signal WPLXADS_N to the PCI controller until time 1420, which is approximately 16 clock cycles before the end of the EVAL cycle. The XSFR_DONE signal is also extended to time 1423. At time 1423, the XSFR_DONE field is set, and the WPLXADS_N signal is then generated to start the DMA read process.
At time 1420, approximately 16 clock cycles before the EVAL_DONE signal is asserted, the CTRL_FPGA unit sends the write address strobe signal WPLXADS_N to the PCI controller (for example, PLX PCI9080) to begin the DMA read transfer. About 24 clock cycles after time 1420, the PCI controller starts the DMA read transfer process, and the DONE signal is also generated. At time 1421, before the PCI controller begins the DMA read process, the RD_XSFR signal is asserted to enable the DMA read transfer. Some PLX overhead data are transferred and processed first. At time 1422, while this overhead data is being processed, the DMA read data are placed on FPGA bus FD[63:0] and local bus LD[31:0]. When the 24 clock cycles at time 14 end, the PCI controller processes the DMA read data by transferring the data from FPGA bus FD[63:0] and local bus LD[31:0] to the host computer system. The remainder of the timing diagram is the same as that of Figure 62.
Thus, the RD_XSFR signal is asserted later in Figure 63 than in Figure 62. In Figure 63, the RD_XSFR signal follows the near completion of the EVAL cycle, which delays the DMA read operation. In Figure 62, the RD_XSFR signal follows detection of the CLK_EN signal after completion of the DMA write transfer.
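The difference between the two timing diagrams can be put into numbers with a small sketch. It uses only the approximate figures given in the text (WPLXADS_N about 2 clock cycles after the end of the DMA write when WAIT_EVAL=0, about 16 clock cycles before the end of the EVAL cycle when WAIT_EVAL=1, and the DMA read process starting about 24 clock cycles after WPLXADS_N). The function name, the use of plain clock counts, and the assumption that the EVAL cycle is longer than 16 clock cycles are all illustrative.

```python
# Sketch: approximate clock at which the DMA read transfer process starts.
# write_end is the clock at which the DMA write ends (time 1412 in the text);
# eval_len is the programmed EVAL duration in clock cycles (assumed > 16).

def dma_read_schedule(write_end, eval_len, wait_eval):
    """Return (clock of WPLXADS_N, clock at which the DMA read process starts)."""
    eval_end = write_end + eval_len      # EVAL runs from the end of the write
    if wait_eval:
        ads = eval_end - 16              # Figure 63: near the end of EVAL
    else:
        ads = write_end + 2              # Figure 62: right after the write
    return ads, ads + 24                 # read process starts ~24 clocks later

fig62 = dma_read_schedule(write_end=100, eval_len=40, wait_eval=0)
fig63 = dma_read_schedule(write_end=100, eval_len=40, wait_eval=1)
```

With these example numbers, the Figure 62 setting starts the read process while EVAL is still in progress, whereas the Figure 63 setting starts it a few clock cycles after EVAL ends.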
IX. collaborative check system
Collaborative check system of the present invention can be by providing flexibly software simulation and originating from and use the faster speed of a hardware model to quicken the design/construction cycle to the deviser.The hardware and software part of design can both obtain check before ASIC makes, and to based on the collaborative verification instrument of emulator also without limits.Debug features is enhanced, and the comprehensive debug time also may be shortened significantly.
Conventional co-verification tool with the ASIC as the device under test
Figure 64 shows a typical final design, implemented as a PCI add-on card such as a video, multimedia, Ethernet, or SCSI card. The card 2000 includes a direct interface connector 2002 that allows communication with other peripheral devices. Connector 2002 is coupled to bus 2001 to carry video signals from a video recorder, camera, or TV tuner; video and audio outputs to a display or speakers; and signals to a communication or disk drive interface. Depending on the user design, those skilled in the art can anticipate other interface requirements. Most of the design's functionality resides in chip 2004, which is coupled to the interface connector 2002 via bus 2003, to a local oscillator 2005 that generates a local clock signal via bus 2007, and to memory 2006 via bus 2008. The add-on card 2000 also includes a PCI connector 2009 for connecting to the PCI bus 2010.
Before the design is implemented as an add-on card as shown in Figure 64, it is reduced to ASIC form for testing purposes. Figure 65 shows a conventional hardware/software co-verification tool. The user design is implemented as an ASIC, labeled as the device under test (or "DUT") 2024 in Figure 65. To obtain stimuli from the various sources the design is intended to interface with, the device under test 2024 is placed in a target system 2020, which is a combination of a central computing system 2021 on a motherboard and several peripherals. The central computing system 2021 includes a CPU and memory and runs under an operating system, such as Microsoft Windows or Sun Microsystems' Solaris, in order to run a number of applications. As known to those skilled in the art, Sun Microsystems' Solaris is an operating environment and a set of software products that support Internet, intranet, and enterprise computing. The Solaris operating environment is based on industry-standard UNIX System V Release 4 and is designed for client-server applications in a distributed networking environment, providing appropriate resources for smaller workgroups and the WebTone needed for electronic commerce.
A device driver 2022 for the device under test 2024 is included in the central computing system 2021 to enable communication between the operating system (and any applications) and the device under test 2024. As known to those skilled in the art, a device driver is special software that controls a hardware component or peripheral of a computer system. A device driver is responsible for accessing the device's hardware registers and often includes an interrupt handler to service interrupts generated by the device. Device drivers often form part of the lowest level of the operating system kernel, with which they are linked when the kernel is built. Some more recent systems have loadable device drivers, which can be installed from files after the operating system is running.
The device under test 2024 and the central computing system 2021 are connected to a PCI bus 2023. Other peripherals in the target system 2020 include an Ethernet PCI add-on card 2025 that connects the target system to a network 2030 via bus 2034; a SCSI PCI add-on card 2026 connected to SCSI drives 2027 and 2031 via buses 2036 and 2035; a video recorder 2028 connected via bus 2032 (if required by the design of the device under test 2024); and a display and/or speakers 2029 connected to the device under test 2024 via bus 2033 (if required by the design of the device under test 2024). As known to those skilled in the art, SCSI stands for "Small Computer System Interface", a processor-independent standard for system-level interfacing between a computer and intelligent devices such as hard disks, floppy disks, CD-ROMs, printers, scanners, and many more.
In this target system environment, the device under test 2024 can be examined with a variety of stimuli from the central computing system (that is, the operating system and applications) and the peripherals. If time is not a concern and the designers are only seeking a simple pass/fail test, this co-verification tool should be adequate for their needs. In most cases, however, a design project has a strict budget and a strict schedule before the product is released. As explained above, this particular ASIC-based co-verification tool is unsatisfactory because its debugging features are nonexistent (the designer has no sophisticated means of isolating the cause of a "failed" test), and because the number of "fixes" for each bug found cannot be predicted at the start of the project, so neither can the schedule or the budget.
Conventional co-verification tool with an emulator as the device under test
Figure 66 illustrates a conventional co-verification tool with an emulator. Unlike the setup explained above in Figure 64, the device under test is programmed into an emulator 2048 that is coupled to a target system 2040, some peripherals, and a test workstation 2052. The emulator 2048 includes an emulation clock 2066 and the device under test that has been programmed into the emulator.
The emulator 2048 is coupled to the target system 2040 via a PCI bus bridge 2044, a PCI bus 2057, and control lines 2056. The target system 2040 is a combination of a central computing system 2041 on a motherboard and several peripherals. The central computing system 2041 includes a processor and memory and runs under an operating system, such as Microsoft Windows or Sun Microsystems' Solaris, in order to run a number of applications. A device driver 2042 for the device under test 2024 is included in the central computing system 2041 to enable communication between the operating system (and any applications) and the device under test in the emulator 2048. To communicate with the emulator 2048 and with other devices that are part of this computing environment, the central computing system 2041 is connected to a PCI bus 2043. Other peripherals in the target system 2040 include an Ethernet PCI add-on card 2045 that connects the target system to a network 2049 via bus 2058, and a SCSI PCI add-on card 2046 connected to SCSI drives 2047 and 2050 via buses 2060 and 2059.
The emulator 2048 is also coupled to the test workstation 2052 via bus 2062. The test workstation 2052 includes a CPU and memory to perform its functions. The test workstation 2052 may also include test cases 2061 and device models 2068 for other devices that are modeled but not physically connected to the emulator 2048.
Finally, the emulator 2048 is coupled to some other peripherals, such as a frame buffer or traffic record/playback system 2051, via bus 2061. This frame buffer or traffic record/playback system 2051 may also be coupled to a communication device or channel 2053 via bus 2063, to a VCR 2054 via bus 2064, and to a display and/or speakers 2055 via bus 2065.
As known to those skilled in the art, the emulation clock runs much more slowly than the actual target system speed. Therefore, the shaded portions of Figure 66 run at emulation speed, while the unshaded portions run at actual target system speed.
As described above, this emulator-based co-verification tool has several limitations. When a logic analyzer or a sample-and-hold device is used to obtain internal state information from the device under test, the designer must compile the design so that the signals of interest for debugging are brought out to output pins for sampling. If the designer wants to debug a different part of the design, he must make sure that this part has output signals that can be sampled by the logic analyzer or sample-and-hold device; otherwise he must recompile the design in the emulator 2048 so that these signals are presented on output pins for sampling. Such recompilation can take days or weeks, which may be too long a delay for a time-sensitive design/development schedule. Moreover, because this co-verification tool works with signals, sophisticated circuitry must be provided to convert these signals to data or to provide signal-to-signal timing control. In addition, the many wires 2061 and 2062 required, one for each signal to be sampled, add to the burden and time of setting up for debugging.
Simulation with a reconfigurable computing array
As a brief review, Figure 67 illustrates a high-level configuration of the single-engine reconfigurable computing (RCC) array system of the present invention, which was described earlier in this patent specification. According to one embodiment of the present invention, this single-engine RCC system is incorporated into the co-verification system.
In Figure 67, the RCC array system 2080 includes an RCC computing system 2081, a reconfigurable computing (RCC) hardware array 2084, and a PCI bus 2089 that couples them together. Importantly, the RCC computing system 2081 contains a model of the entire user design in software, while the RCC hardware array 2084 contains a hardware model of the user design. The RCC computing system 2081 includes a CPU, memory, an operating system, and the software necessary to run the single-engine RCC system 2080. A software clock 2082 is provided so that the software model in the RCC computing system 2081 and the hardware model in the RCC hardware array 2084 remain under tight control. Test bench data 2083 is also stored in the RCC computing system 2081.
The RCC hardware array 2084 includes a PCI interface 2085, a set of RCC hardware array boards 2086, and various buses for interfacing purposes. The set of RCC hardware array boards 2086 contains at least the portion of the user design modeled in hardware (that is, the hardware model 2087) and memory 2088 for test bench data. In one embodiment, various portions of this hardware model are distributed among a plurality of reconfigurable logic elements (for example, FPGA chips) during configuration. As more reconfigurable logic elements or chips are used, more boards may be needed. In one embodiment, four reconfigurable logic elements are provided on a single board. In other embodiments, eight reconfigurable logic elements are provided on a single board. The capacity and capabilities of the reconfigurable logic elements on a four-chip board may differ significantly from those on an eight-chip board.
Bus 2090 carries the various clocks for the hardware model from the PCI interface 2085 to the hardware model 2087. Bus 2091 carries other input/output (I/O) data between the PCI interface 2085 and the hardware model 2087 via connector 2093 and internal bus 2094. Bus 2092 functions as the PCI bus between the PCI interface 2085 and the hardware model 2087. Test bench data can also be stored in memory in the hardware model 2087. As described above, the hardware model 2087 includes structures and functions other than the hardware model of the user design, which are needed to enable the hardware model to interface with the RCC computing system 2081.
The RCC system 2080 may be provided in a single workstation or, alternatively, coupled to a network of workstations in which each workstation is given access to the RCC system 2080 on a time-shared basis. In effect, the RCC array system 2080 serves as a simulation server with a simulation scheduler and a state swapping mechanism. The server gives each user at a workstation access to the RCC hardware array 2084 for high-speed acceleration and hardware state swapping. After acceleration and state swapping, each user can simulate the user design locally in software while releasing control of the RCC hardware array 2084 to other users at other workstations. This network model is also used for the co-verification system described below.
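The time-shared access described above can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the class `RccHardwareArray`, the function `accelerate`, and the counter used as "state" are all hypothetical names invented here, and a plain lock stands in for whatever scheduling the server actually performs.

```python
import threading

class RccHardwareArray:
    """Hypothetical stand-in for the shared hardware array (not the patent's API)."""
    def __init__(self):
        self._lock = threading.Lock()
        self.owner = None

    def acquire(self, user):
        self._lock.acquire()      # blocks while another user holds the array
        self.owner = user

    def release(self):
        self.owner = None
        self._lock.release()

def accelerate(array, user, state, cycles):
    """Hold the array only for the accelerated stretch, then swap state back out."""
    array.acquire(user)
    try:
        # Stand-in for hardware evaluation: a cycle counter advanced `cycles` times.
        new_state = {"cycle": state["cycle"] + cycles}
    finally:
        array.release()           # other workstations may now use the array
    return new_state

array = RccHardwareArray()
state_a = accelerate(array, "workstation_a", {"cycle": 0}, 1000)
state_b = accelerate(array, "workstation_b", {"cycle": 50}, 200)
```

Each user holds the array only while accelerating; the returned state is what that user would continue to simulate locally in software.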
The RCC array system 2080 gives the designer the power and flexibility to simulate an entire design, to accelerate part of the test points during selected cycles via the hardware model in the reconfigurable computing array, and to obtain internal state information for any part of the design at any time. Indeed, the single-engine reconfigurable computing array (RCC) system can be loosely described as a hardware-accelerated simulator that can perform the following tasks within a single debug session: (1) simulation alone; (2) simulation with hardware acceleration, in which the user can start, stop, assert values, and inspect the internal state of the design at any time; (3) post-simulation analysis; and (4) in-circuit emulation. Because both the software model and the hardware model are under the strict control of the single engine via a software clock, the hardware model in the reconfigurable computing array is tightly coupled to the software simulation model. This allows the designer to debug cycle by cycle, and to accelerate and decelerate the hardware model through a number of cycles, to obtain valuable internal state information. Furthermore, because this simulation system handles data rather than signals, no complex signal-to-data conversion/timing circuitry is needed. In addition, unlike typical emulation systems, the hardware model in the reconfigurable computing array does not need to be recompiled if the designer wishes to examine a different set of nodes. For further details, refer to the description above.
Co-verification system without external I/O
One embodiment of the present invention is a co-verification system that does not use actual, physical external I/O devices and target applications. Thus, a co-verification system according to one embodiment of the present invention can incorporate the RCC system along with other functionality to debug the software and hardware portions of a user design without using any actual target system or I/O devices. Instead, the target system and external I/O devices are modeled in software in the RCC computing system.
Referring to Figure 68, the co-verification system 2100 includes an RCC computing system 2101, an RCC hardware array 2108, and a PCI bus 2114 that couples them together. Importantly, the RCC computing system 2101 contains a model of the entire user design in software, while the reconfigurable computing array 2108 contains a hardware model of the user design. The RCC computing system 2101 includes a processor, memory, an operating system, and the software necessary to run the single-engine co-verification system 2100. A software clock 2104 is provided so that the software model in the RCC computing system 2101 and the hardware model in the reconfigurable computing array 2108 remain under tight control. Test cases 2103 are also stored in the RCC computing system 2101.
According to one embodiment of the present invention, the RCC computing system 2101 also includes the target applications 2102; a driver 2105 for the hardware model of the user design; a model of a device (for example, a video card) and its driver, in software labeled 2106; and a model of another device (for example, a display) and its driver, in software labeled 2107. In essence, the RCC computing system 2101 contains as many device models and drivers as are needed to convey to the software model and hardware model of the user design that an actual target system and other I/O devices are part of this computing environment.
The RCC hardware array 2108 includes a PCI interface 2109, a set of RCC hardware array boards 2110, and various buses for interfacing purposes. The set of RCC hardware array boards 2110 contains at least the portion of the user design modeled in hardware 2112 and memory 2113 for test bench data. As described above, each board contains a plurality of reconfigurable logic elements or chips.
Bus 2115 carries the various clocks for the hardware model from the PCI interface 2109 to the hardware model 2112. Bus 2116 carries other I/O data between the PCI interface 2109 and the hardware model 2112 via connector 2111 and internal bus 2118. Bus 2117 functions as the PCI bus between the PCI interface 2109 and the hardware model 2112. Test bench data can also be stored in memory 2113 in the hardware model. As described above, the hardware model includes structures and functions other than the hardware model of the user design, which are needed to enable the hardware model to interface with the RCC computing system 2101.
To compare the co-verification system of Figure 68 with a conventional emulator-based co-verification system, Figure 66 shows the emulator 2048 coupled to the target system 2040, some I/O devices (for example, a frame buffer or traffic record/playback system 2051), and a workstation 2052. This emulator configuration presents the designer with many difficulties and setup problems. The emulator needs a logic analyzer or a sample-and-hold device to measure the internal states of the user design modeled in the emulator. Because the logic analyzer and sample-and-hold device need signals, complex signal-to-data conversion circuitry is required. In addition, complex signal-to-signal timing control circuitry is also required. The many wires needed for each signal used to measure the internal states of the emulator further burden the user during setup. During debugging, each time the user wants to examine a different set of internal logic circuits, he must recompile the emulator so that the appropriate signals from that logic circuitry are provided as outputs for measurement and recording by the logic analyzer or sample-and-hold device. The long recompilation time is too costly.
In the co-verification system of the present invention where no external I/O devices are connected, the target system and other I/O devices are modeled in software, so that an actual physical target system and I/O devices are not required. Because the RCC computing system 2101 handles data, no signal-to-data conversion circuitry or signal-to-signal timing control circuitry is needed. The number of wires is also not tied to the number of signals, so setup is relatively simple. Furthermore, because the co-verification system handles data rather than signals, debugging a different part of the logic circuitry in the hardware model of the user design does not require recompilation. Because the RCC computing system controls the RCC hardware array with a software-controlled clock (that is, the software clock and clock edge detection circuitry), starting and stopping the hardware model is straightforward. Reading data from the hardware model is also easy, because a model of the entire user design is in software and the software clock enables synchronization. Thus, the user can debug by simulating in software only; by accelerating part or all of the design in hardware; by stepping cycle by cycle through the desired test points; and by inspecting the internal states (that is, register and combinational logic states) of the software and hardware models. For example, the user can simulate the design with some test bench data, download the internal state information to the hardware model, accelerate the design with various test bench data using the hardware model, inspect the resulting internal state values of the hardware model through register/combinational logic regeneration and loading of values from the hardware model into the software model, and finally simulate other portions of the user design in software using the results of the hardware acceleration.
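The example workflow at the end of the preceding paragraph (simulate, download state, accelerate, read state back, continue in software) can be sketched as follows. This is a toy illustration, not the patent's implementation: the `Model` class, its single 8-bit `acc` register, and the use of an ordinary Python object as the "hardware" model are all assumptions made here for readability.

```python
class Model:
    """Toy model of a user design: one 8-bit accumulator register (hypothetical)."""
    def __init__(self):
        self.regs = {"acc": 0}

    def step(self, data_in):
        # One evaluation cycle: the register latches acc + data_in, modulo 256.
        self.regs["acc"] = (self.regs["acc"] + data_in) & 0xFF

    def load_state(self, regs):
        self.regs = dict(regs)

    def read_state(self):
        return dict(self.regs)

software = Model()   # model of the whole design, held in software
hardware = Model()   # stand-in for the portion configured in the hardware array

# 1. Simulate a few cycles in software only.
for value in [1, 2, 3]:
    software.step(value)

# 2. Download the internal state information to the hardware model.
hardware.load_state(software.read_state())

# 3. Accelerate: run a longer stretch of test bench data on the hardware model.
for value in [5] * 10:
    hardware.step(value)

# 4. Load the resulting register values from the hardware model into software.
software.load_state(hardware.read_state())

# 5. Continue simulating in software from the accelerated result.
software.step(4)
```

Because both models exchange plain data (register values) rather than sampled signals, switching between them needs no recompilation in this sketch, which mirrors the point made in the paragraph above.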
As described above, however, a workstation is still needed to control the debug session. In a network configuration, a workstation may be remotely coupled to the co-verification system to access debug data remotely. In a non-network configuration, a workstation may be coupled locally to the co-verification system or, in some other embodiments, the workstation may incorporate the co-verification system internally so that debug data can be accessed locally.
Co-verification system with external I/O
In Figure 68, the various I/O devices and target applications are modeled in the RCC computing system 2101. However, when too many I/O devices and target applications run in the RCC computing system 2101, overall speed slows down. With only a single processor in the RCC computing system 2101, more time is needed to process the data streams from all the device models and target applications. To increase data throughput, actual I/O devices and target applications (rather than software models of them) can be physically coupled to the co-verification system.
One embodiment of the present invention is a co-verification system that uses actual, physical external I/O devices and target applications. Thus, a co-verification system can incorporate the RCC system along with other functionality to debug the software and hardware portions of a user design while using an actual target system and I/O devices. For testing, the co-verification system can use both test bench data from software and stimuli from the external interface (for example, the target system and external I/O devices). Test bench data can be used to provide test data not only to the pins of the user design but also to its internal nodes. Actual I/O signals from external I/O devices (or the target system) can only be directed to the pins of the user design. Thus, one main distinction between test data from an external interface (for example, the target system or external I/O devices) and test bench processes in software is that test bench data can be used to test the user design with stimuli applied to both pins and internal nodes, whereas actual data from the target system or external I/O devices can only be applied to the user design via its pins (or via nodes in the user design that represent pins). The following discussion presents the structure of the co-verification system and its configuration with respect to a target system and external I/O devices.
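The distinction drawn above between test bench data and external I/O data can be stated as a small rule. The Python sketch below is illustrative only; the node names, the `apply_stimulus` function, and the dictionary used as the "design" are hypothetical and do not come from the specification.

```python
# Hypothetical node names for a toy design; none come from the specification.
PINS = {"pci_ad", "video_in"}
INTERNAL_NODES = {"fifo_count", "state_reg"}

def apply_stimulus(design, source, node, value):
    """Drive `node` with `value`, enforcing where each source may reach."""
    if source not in ("testbench", "external"):
        raise ValueError(f"unknown stimulus source: {source}")
    if node not in PINS | INTERNAL_NODES:
        raise KeyError(node)
    # Actual I/O data may only enter through the pins of the user design.
    if source == "external" and node not in PINS:
        raise PermissionError("external I/O data can only be applied to pins")
    design[node] = value

design = {}
apply_stimulus(design, "testbench", "state_reg", 3)    # internal node: allowed
apply_stimulus(design, "testbench", "pci_ad", 0x10)    # pin: allowed
apply_stimulus(design, "external", "video_in", 0x7F)   # pin: allowed
```

An attempt such as `apply_stimulus(design, "external", "state_reg", 1)` raises `PermissionError` in this sketch, reflecting that only test bench data can reach internal nodes.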
In comparison with the system configuration of Figure 66, the co-verification system according to one embodiment of the present invention replaces the structure and functionality of the items within the dotted line 2070. In other words, whereas Figure 66 shows the emulator and workstation within the dotted line 2070, one embodiment of the present invention places the co-verification system 2140 (and its associated workstation), shown in Figure 69, within the dotted line 2070.
Referring to Figure 69, the co-verification system configuration according to one embodiment of the present invention includes a target system 2120, a co-verification system 2140, some optional I/O devices, and control/data buses 2131 and 2132 that couple them together. The target system 2120 includes a central computing system 2121, which includes a CPU and memory and runs under an operating system, such as Microsoft Windows or Sun Microsystems' Solaris, in order to run a number of applications 2122 and test cases 2123. A device driver 2124 for the hardware model of the user design is included in the central computing system 2121 to enable communication between the operating system (and any applications) and the user design. To communicate with the co-verification system and with other devices that are part of this computing environment, the central computing system 2121 is connected to a PCI bus 2129. Other peripherals in the target system 2120 include an Ethernet PCI add-on card 2125 that connects the target system to a network, a SCSI PCI add-on card 2126 connected to SCSI drive 2128 via bus 2130, and a PCI bus bridge 2127.
The co-verification system 2140 includes an RCC computing system 2141, an RCC hardware array 2190, an external interface 2139 in the form of an external I/O expander, and a PCI bus 2171 that couples the RCC computing system 2141 and the RCC hardware array 2190 together. The RCC computing system 2141 includes a CPU, memory, an operating system, and the software necessary to run the single-engine co-verification system 2140. Importantly, the RCC computing system 2141 contains a model of the entire user design in software, while the RCC hardware array 2190 contains a hardware model of the user design.
As discussed above, the single engine of the co-verification system derives its power and flexibility from a main software kernel, which resides in the main memory of the RCC computing system 2141 and controls the overall operation and execution of the co-verification system 2140. As long as any test bench process is active, or any signal from the external world is presented to the co-verification system, the kernel evaluates the active test bench components, evaluates the clock components, detects clock edges in order to update registers and memories and propagate combinational logic data, and advances the simulation time. This main software kernel provides the tightly coupled nature of the RCC computing system 2141 and the RCC hardware array 2190.
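The sequence of kernel steps listed above can be sketched in Python on a toy design. Everything in this block is an assumption made for illustration: the function `kernel_cycle`, the single register `q` followed by an inverter, and the list-based test bench are hypothetical and far simpler than the kernel the specification describes.

```python
def kernel_cycle(state, testbench, time):
    """One pass of a hypothetical kernel loop over a toy design."""
    # 1. Evaluate active test bench components: fetch this step's stimulus.
    clk, d = testbench[time]
    # 2. Evaluate clock components and detect a rising clock edge.
    rising = state["clk"] == 0 and clk == 1
    state["clk"] = clk
    # 3. Update registers only when a clock edge was detected.
    if rising:
        state["q"] = d
    # 4. Propagate combinational logic (here, an inverter on the register output).
    state["out"] = state["q"] ^ 1
    # 5. Advance the simulation time.
    return time + 1

state = {"clk": 0, "q": 0, "out": 1}
testbench = [(0, 1), (1, 1), (0, 0), (1, 0)]   # (clock level, data input) per step
time, q_trace = 0, []
while time < len(testbench):                   # loop while test bench data remains
    time = kernel_cycle(state, testbench, time)
    q_trace.append(state["q"])
```

In this sketch the register changes only on steps where the clock level goes from 0 to 1, so `q_trace` records the value latched at each rising edge and held in between.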
The software kernel generates software clock signals from a software clock source 2142, and these signals are provided to the RCC hardware array 2190 and the external world. The clock source 2142 can generate multiple clocks at different frequencies, depending on the destinations of these software clocks. In general, the software clock ensures that the registers in the hardware model of the user design evaluate in synchronization with the system clock and without any hold time violations. The software model can detect, in software, the clock edges that affect register values in the hardware model. Thus, a clock detection mechanism ensures that a clock edge detected in the main software model is translated into clock detection in the hardware model. For a more detailed discussion of the software clock and clock edge detection logic, refer to Figures 17-19 and the accompanying text in this patent specification.
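As a rough illustration of a software clock source that produces several clocks and reports their edges, consider the Python sketch below. The function names, the clock names, and the idea of deriving each clock from an integer tick count are assumptions introduced here; the actual mechanism is the one described with Figures 17-19.

```python
def software_clocks(tick, half_periods):
    """Level of each software clock at `tick`; a clock toggles every half period."""
    return {name: (tick // hp) % 2 for name, hp in half_periods.items()}

def rising_edges(prev, curr):
    """Names of clocks that went from 0 to 1 between two samples."""
    return sorted(name for name in curr if prev[name] == 0 and curr[name] == 1)

# Two hypothetical clocks at different frequencies.
half_periods = {"clk_fast": 1, "clk_slow": 2}

prev = software_clocks(0, half_periods)
triggers = []
for tick in range(1, 5):
    curr = software_clocks(tick, half_periods)
    # Each detected edge would be forwarded so the hardware model's registers
    # clocked by that signal evaluate on the same step.
    triggers.append(rising_edges(prev, curr))
    prev = curr
```

The list `triggers` then holds, per tick, the clocks whose rising edge was detected in software and would be passed on to the hardware model.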
According to one embodiment of the present invention, the RCC computing system 2141 may also include one or more models of a number of I/O devices, even though other actual physical I/O devices can be coupled to the co-verification system. For example, the RCC computing system 2141 may include a model of a device (for example, a speaker) along with its driver and test bench data in software labeled 2143, and a model of another device (for example, a graphics accelerator) along with its driver and test bench data in software labeled 2144. The user decides which devices (and their respective drivers and test bench data) will be modeled and incorporated into the RCC computing system 2141, and which devices will actually be coupled to the co-verification system.
The co-verification system includes control logic that provides traffic control between (1) the RCC computing system 2141 and the RCC hardware array 2190, and (2) the external interface (which is coupled to the target system and the external I/O devices) and the RCC hardware array 2190. Because some I/O devices may be modeled in the RCC computing system, some data passes between the RCC hardware array 2190 and the RCC computing system 2141. In addition, the RCC computing system 2141 holds a model of the entire design in software, including the portion of the user design that is modeled in the RCC hardware array 2190. As a result, the RCC computing system 2141 must also have access to all data that passes between the external interface and the RCC hardware array 2190. The control logic ensures that the RCC computing system 2141 has access to this data. The control logic is described in more detail below.
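The requirement that the software side see all data passing between the external interface and the hardware array can be pictured as a simple mirroring rule. The Python class below is a hypothetical sketch: its name, its two dictionaries, and the pin names are invented for illustration and say nothing about how the control logic is actually built.

```python
class ExternalIoControllerSketch:
    """Hypothetical model of traffic control for external stimulus."""
    def __init__(self):
        self.hardware_pins = {}   # values driven into the hardware model
        self.software_view = {}   # copy made visible to the software model

    def deliver(self, pin, value):
        # Forward the external value to the hardware model.
        self.hardware_pins[pin] = value
        # Mirror it so the software model has access to the same data.
        self.software_view[pin] = value

ctrl = ExternalIoControllerSketch()
ctrl.deliver("video_in", 0x2A)   # hypothetical pin names
ctrl.deliver("eth_rx", 0x01)
```

After both deliveries the two views hold identical values, which is the property the paragraph above asks of the control logic.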
The RCC hardware array 2190 includes a number of array boards. In the particular embodiment shown in Figure 69, the hardware array 2190 includes boards 2145-2149. Boards 2146-2149 contain the bulk of the configured hardware model. Board 2145 (or board m1) contains a reconfigurable computing element (for example, an FPGA chip) 2153, which the co-verification system can use to configure at least a portion of the hardware model, and an external I/O controller 2152, which directs traffic and data between the external interface (the target system and I/O devices) and the co-verification system 2140. Via the external I/O controller, board 2145 allows the RCC computing system 2141 to access all data transferred between the external world (that is, the target system and I/O devices) and the RCC hardware array 2190. This access is important because the RCC computing system 2141 in the co-verification system contains a model of the entire user design in software and can also control the functionality of the RCC hardware array 2190.
If stimulus from an external I/O device is provided to the hardware model, the software model must also have access to this stimulus so that the user of the co-verification system can selectively control the next debugging step, which may include inspecting the internal state values of the design that result from applying this stimulus. As discussed above in connection with the board layout and interconnection scheme, the first and last boards are included in the hardware array 2190. Thus, board 1 (labeled board 2146) and board 8 (labeled board 2149) are included in an eight-board hardware array (excluding board m1). In addition to boards 2145-2149, a board m2 (not shown in Figure 69, but see Figure 74) with a chip m2 may also be provided. Board m2 is similar to board m1 except that it has no external interface; it can be used for expansion if additional boards are needed.
The contents of these boards will now be discussed. Board 2145 (board m1) includes a PCI controller 2151, an external I/O controller 2152, a data chip (m1) 2153, memory 2154, and a multiplexer 2155. In one embodiment, this PCI controller is a PLX 9080. The PCI controller 2151 is coupled to the RCC computing system 2141 via bus 2171 and to a tri-state buffer 2179 via bus 2172.
The main communication controller in the co-verification system between the external world (target system 2120 and I/O devices) and RCC computing system 2141 is an external I/O controller 2152 (also referred to as "CTRLXM" in Figures 69, 71, and 73), which is connected to RCC computing system 2141, the other boards 2146-2149 in the RCC hardware array, target system 2120, and the actual external I/O devices. Of course, as described above, the main communication controller between RCC computing system 2141 and RCC hardware array 2190 remains the combination of the individual internal I/O controllers (for example, I/O controllers 2156 and 2158) in each array board 2146-2149 and PCI controller 2151. In one embodiment, these individual internal I/O controllers, such as controllers 2156 and 2158, are the FPGA I/O controllers described and illustrated above in exemplary figures such as Figure 22 (unit 700) and Figure 56 (unit 1200).
External I/O controller 2152 is connected to tri-state buffer 2179 so that the external I/O controller can interface with RCC computing system 2141. In one embodiment, tri-state buffer 2179 in some instances allows data from RCC computing system 2141 to pass to local bus 2180 while preventing data on the local bus from passing to RCC computing system 2141, and in other instances allows data to pass from local bus 2180 to RCC computing system 2141.
External I/O controller 2152 is also connected to chip (m1) 2153 and memory/external buffer 2154 via data bus 2176. In one embodiment, chip (m1) 2153 is a reconfigurable computing element, such as an FPGA chip, that can be used to configure at least part of the hardware model of the user design (or all of the hardware model, if the user design is small enough). In one embodiment, external buffer 2154 is a DRAM DIMM and can be used by chip 2153 for a variety of purposes. External buffer 2154 provides a large memory capacity, more than the individual SRAM memory devices that are locally connected to each reconfigurable logic element (for example, reconfigurable logic element 2157). This large memory capacity allows the RCC computing system to store large amounts of data, such as test bench data, embedded code for a microcontroller (if the user design is a microcontroller), and large look-up tables, in a memory device. As described above, external buffer 2154 can also be used to store data needed by the hardware model. In effect, external buffer 2154 can function in part like the high bank or low bank SRAM memory devices described and illustrated above, for example, in Figure 56 (SRAM 1205 and 1206), but with more memory. External buffer 2154 can also be used by the co-verification system to store data received from target system 2120 and the external I/O devices so that this data can be retrieved later by RCC computing system 2141. Chip m1 2153 and external buffer 2154 also contain the memory mapping logic described in the "Memory Simulation" section of this patent specification.
To access desired data in external buffer 2154, both chip 2153 and RCC computing system 2141 (via external I/O controller 2152) can supply the address of the desired data. Chip 2153 provides its address on address bus 2182, and external I/O controller 2152 provides its address on address bus 2177. Address buses 2182 and 2177 are inputs to multiplexer 2155, which provides the selected address on output line 2178, connected to external buffer 2154. The select signal for multiplexer 2155 is provided by external I/O controller 2152 via line 2181.
External I/O controller 2152 is also connected to the other boards 2146-2149 via bus 2180. In one embodiment, bus 2180 is the local bus described and illustrated above in exemplary figures such as Figure 22 (local bus 708) and Figure 56 (local bus 1210). In this embodiment, only five boards (including board 2145 (board m1)) are used; the actual number of boards is determined by the complexity and size of the user design to be modeled in hardware. A hardware model of a user design of medium complexity requires fewer boards than a hardware model of a user design of higher complexity.
To achieve scalability, boards 2146-2149 are substantially identical except for some inter-board interconnect lines. These interconnect lines enable one part of the user design's hardware model in one chip (for example, chip 2157 in board 2146) to communicate with another part of the same user design's hardware model that is physically located in another chip (for example, chip 2161 in board 2148). For the interconnect structure of this co-verification system, refer briefly to Figure 74, as well as to Figures 8 and 36-44 and their accompanying descriptions in this patent specification.
Board 2148 is a representative board. Board 2148 is the third board in this four-board layout (excluding board 2145 (board m1)). Accordingly, it is not an end board that requires appropriate terminations for the interconnect lines. Board 2148 comprises an internal I/O controller 2158, several reconfigurable logic elements (for example, FPGA chips) 2159-2166, high bank FD bus 2167, low bank FD bus 2168, high bank memory 2169, and low bank memory 2170. As noted above, in one embodiment, internal I/O controller 2158 is the FPGA I/O controller described and illustrated above in exemplary figures such as Figure 22 (unit 700) and Figure 56 (unit 1200). Likewise, high and low bank memory devices 2169 and 2170 are the SRAM memory devices described and illustrated above, for example, in Figure 56 (SRAM 1205 and 1206). In one embodiment, high and low bank FD buses 2167 and 2168 are the FD buses or FPGA buses described and illustrated above in exemplary figures such as Figure 22 (FPGA buses 718 and 719), Figure 56 (FD buses 1212 and 1213), and Figure 57 (FD bus 1282).
To connect co-verification system 2140 to target system 2120 and other I/O devices, an external interface 2139 in the form of an external I/O expander is provided. On the target system side, external I/O expander 2139 is connected to PCI bridge 2127 via secondary PCI bus 2132 and a control line 2131, which is used to deliver the software clock. On the I/O device side, external I/O expander 2139 is connected to the various I/O devices via buses 2136-2138 for pin-out data and control lines 2133-2135 for the software clock. The number of I/O devices that can be connected to I/O expander 2139 is determined by the user. In any case, as many data buses and software clock control lines as needed are provided in external I/O expander 2139, so that as many I/O devices as necessary can be connected to co-verification system 2140 for a successful debug session.
On the co-verification system 2140 side, external I/O expander 2139 is connected to external I/O controller 2152 via data bus 2175, software clock control line 2174, and scan control line 2173. Data bus 2175 is used to transfer pin-out data between the external world (target system 2120 and the external I/O devices) and co-verification system 2140. Software clock control line 2174 is used to deliver software clock data from RCC computing system 2141 to the external world.
The software clock on control lines 2174 and 2131 is generated by the main software kernel in RCC computing system 2141. RCC computing system 2141 delivers the software clock to external I/O expander 2139 via PCI bus 2171, PCI controller 2151, bus 2171, tri-state buffer 2179, local bus 2180, external I/O controller 2152, and control line 2174. From external I/O expander 2139, the software clock is provided as the clock input to target system 2120 (via PCI bridge 2127) and to the other external I/O devices via control lines 2133-2135. Because the software clock serves as the master clock source, target system 2120 and the I/O devices run at a slower speed. However, the data provided to target system 2120 and the external I/O devices is synchronized to the software clock speed, as are the software model in RCC computing system 2141 and the hardware model in RCC hardware array 2190. Likewise, data from target system 2120 and the external I/O devices is delivered to co-verification system 2140 in synchronization with the software clock.
Thus, I/O data transferred between the external interface and the co-verification system is synchronized with the software clock. In essence, the software clock keeps the operation of the external I/O devices and the target system synchronized with that of the co-verification system (in the RCC computing system and the RCC hardware array) whenever data is transferred between them. The software clock is used for both data input and data output operations. For a data input operation, when one pointer (discussed below) latches the software clock from RCC computing system 2141 to the external interface, other pointers latch the I/O data from the external interface to selected internal nodes of the hardware model in RCC hardware array 2190. Once the software clock has been delivered to the external interface, the pointers latch this I/O data one by one during that cycle. When all the data has been latched, the RCC computing system can generate another software clock to latch more data in another software clock cycle if needed. For a data output operation, the RCC computing system can deliver the software clock to the external interface and then, with the aid of the pointers, control the gating of data from the internal nodes of the hardware model in RCC hardware array 2190 to the external interface. Again, the pointers gate data from the internal nodes to the external interface one by one. If more data needs to be delivered to the external interface, the RCC computing system can generate another software clock cycle and then enable the selected pointers to gate data to the external interface. Generation of the software clock is strictly controlled, which allows the co-verification system to keep data delivery and data evaluation synchronized between itself and any external I/O device connected to the external interface.
Scan control line 2173 is used to allow co-verification system 2140 to scan data buses 2132, 2136, 2137, and 2138 for any data that may be present. Logic in external I/O controller 2152 supports the scan signal; it is pointer logic in which each of the various inputs is provided as the output for a particular time period, via a MOVE signal, before the logic advances to the next input. This logic is analogous to the scheme shown in Figure 11. In effect, the scan signal functions like the select signal of a multiplexer, except that it selects the multiplexer's various inputs in round-robin order. Thus, in one time period, the scan signal on scan control line 2173 samples data bus 2132 for data from target system 2120. In the next time period, the scan signal on scan control line 2173 samples data bus 2136 for data from an external I/O device that may be connected there. In the next time period, data bus 2137 is sampled, and so on, so that co-verification system 2140 can receive and process all pin-out data from target system 2120 or the external I/O devices during this debug session. Any data received by co-verification system 2140 from sampling data buses 2132, 2136, 2137, and 2138 is delivered to external buffer 2154 via external I/O controller 2152.
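The round-robin sampling described above can be illustrated with a minimal Python sketch. It is not the controller's actual logic: the function name `scan`, the use of per-bus queues to represent pending data, and the tuple format stored in the buffer are all assumptions made for illustration. Only the bus numbers and the one-bus-per-time-period order come from the description.

```python
from collections import deque
from itertools import cycle
from typing import Dict, Iterable, List, Tuple

# Data buses named in the description, in the order they are scanned.
BUSES = (2132, 2136, 2137, 2138)


def scan(pending: Dict[int, Iterable[int]], periods: int) -> List[Tuple[int, int]]:
    """Sample one bus per time period in round-robin order.

    `pending` maps a bus number to the values waiting on it (hypothetical
    representation). Returns the (bus, value) pairs that would be forwarded
    to the external buffer, in arrival order.
    """
    queues = {bus: deque(pending.get(bus, ())) for bus in BUSES}
    external_buffer: List[Tuple[int, int]] = []
    order = cycle(BUSES)  # each next() plays the role of one MOVE advance
    for _ in range(periods):
        bus = next(order)
        if queues[bus]:  # a bus with no data present contributes nothing
            external_buffer.append((bus, queues[bus].popleft()))
    return external_buffer


captured = scan({2132: [1, 2], 2137: [9]}, periods=5)
```

Over five periods the sketch visits buses 2132, 2136, 2137, 2138, and 2132 again, so the second word waiting on bus 2132 is captured only after every other bus has had its turn.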
Note that the configuration illustrated in Figure 69 assumes that target system 2120 contains the host CPU and that the user design is some peripheral device, such as a video controller, network adapter, graphics adapter, mouse, or some other support device, card, or logic. Thus, target system 2120 contains the target application (including the operating system) connected to primary PCI bus 2129, while co-verification system 2140 contains the user design and is connected to secondary PCI bus 2132. This configuration may differ markedly depending on the purpose of the user design. For example, if the user design is a CPU, the target application can run in RCC computing system 2141 of co-verification system 2140, and target system 2120 no longer contains central computer system 2121. Indeed, bus 2132 would now be the primary PCI bus and bus 2129 would be a secondary PCI bus. As a result, instead of the user design being a peripheral device that supports central computer system 2121, the user design is now the main computing center, and all other peripheral devices support the user design.
The control logic for data transfer between the external interface (external I/O expander 2139) and co-verification system 2140 is located in each of boards 2145-2149. The main portion of the control logic is located in external I/O controller 2152, but other portions are located in the various internal I/O controllers (for example, 2156 and 2158) and in the reconfigurable logic elements (for example, FPGA chips 2159 and 2165). For illustrative purposes, it is only necessary to show some portions of this control logic rather than the same repeated logic structure for every chip in every board. The portion of co-verification system 2140 within dashed line 2150 in Figure 69 contains one subset of the control logic. This control logic will now be discussed in more detail with reference to Figures 70-73.
The components in this particular subset of the control logic include external I/O controller 2152, tri-state buffer 2179, internal I/O controller 2156 (CTRL1), reconfigurable logic element 2157 (chip0_1, indicating chip 0 of board 1), and the various buses and portions of the control lines connected to these components. Specifically, Figure 70 illustrates the portion of the control logic used for data input cycles, in which data from the external interface (external I/O expander 2139) and RCC computing system 2141 is delivered to RCC hardware array 2190. Figure 72 illustrates the timing diagram for data input cycles. Figure 71 illustrates the portion of the control logic used for data output cycles, in which data from RCC hardware array 2190 is delivered to RCC computing system 2141 and the external interface (external I/O expander 2139). Figure 73 illustrates the timing diagram for data output cycles.
Data input
The data input control logic according to one embodiment of the invention is responsible for handling the transfer of data from the RCC computing system or the external interface to the RCC hardware array. One particular subset 2150 (see Figure 69) of the data input control logic is shown in Figure 70 and includes external I/O controller 2200, tri-state buffer 2202, internal I/O controller 2203, reconfigurable logic element 2204, and the various buses and control lines that allow data to be transferred among them. External buffer 2201 is also shown for this data input embodiment. This subset illustrates the logic necessary for data input operations, in which data from the external interface and the RCC computing system is delivered to the RCC hardware array. The data input control logic of Figure 70 and the data input timing diagram of Figure 72 will be discussed together.
Two types of data cycles are used in this data input embodiment of the invention: the global cycle and the software-to-hardware (S2H) cycle. The global cycle is used for any data directed to all chips in the RCC hardware array, such as clocks, resets, and some other S2H data directed to many different nodes in the RCC hardware array. For this latter "global" S2H data, it is more practical to send the data via the global cycle than via the sequential S2H cycle.
The software-to-hardware cycle is used to send data from the test bench process in the RCC computing system to the RCC hardware array sequentially, from one chip to the next across all boards. Because the hardware model of the user design is distributed across several boards, test bench data must be provided to every chip for data evaluation. Accordingly, data is delivered sequentially to each internal node in each chip, one internal node at a time. Sequential delivery allows a particular datum destined for a particular internal node to be processed by all chips in the RCC hardware array, since the hardware model is distributed among multiple chips.
For data evaluation, the co-verification system provides two address spaces: S2H and CLK. As noted above, the S2H and CLK spaces are the primary inputs from the kernel to the hardware model. The hardware model holds substantially all of the register components and combinational components of the user's circuit design. In addition, the software clock is modeled in software and is provided in the CLK I/O address space to interface with the hardware model. The kernel advances simulation time, looks for active test bench components, and evaluates clock components. When any clock edge is detected by the kernel, the registers are updated and values are propagated through the combinational components. Thus, if hardware acceleration mode is selected, any change in value in these spaces triggers the hardware model to change logic state.
During data transfer, the DATA_XSFR signal is at logic "1". During this time, local buses 2222-2230 are used by the co-verification system to transfer data according to the following data cycles: (1) global data from the RCC computing system to the RCC hardware array and the CLK space; (2) global data from the external interface to the RCC hardware array and the external buffer; and (3) S2H data from the RCC computing system to the RCC hardware array, one chip at a time in each board. Thus, the first two data cycles are part of the global cycle, and the last data cycle is part of the S2H cycle.
For the first part of the data input global cycle, in which global data from the RCC computing system is sent to the RCC hardware array, external I/O controller 2200 asserts the CPU_IN signal at logic "1" on line 2255. Line 2255 is connected to an enable input of tri-state buffer 2202. With logic "1" on line 2255, tri-state buffer 2202 allows data on local bus 2222 to pass to local buses 2223-2230 on the other side of tri-state buffer 2202. In this particular example, local buses 2223, 2224, 2225, 2226, 2227, 2228, 2229, and 2230 correspond to LD3, LD4 (from external I/O controller 2200), LD6 (from external I/O controller 2200), LD1, LD6, LD4, LD5, and LD7, respectively.
Global data is delivered from these local buses to bus lines 2231-2235 in internal I/O controller 2203, and then to FD bus lines 2236-2240. In this example, FD bus lines 2236, 2237, 2238, 2239, and 2240 correspond to FD bus lines FD1, FD6, FD4, FD5, and FD7, respectively.
These FD bus lines 2236-2240 are connected to the inputs of latches 2208-2213 in reconfigurable logic element 2204. In this example, the reconfigurable logic element corresponds to chip 0_1 (that is, chip 0 in board 1). FD bus line 2236 is connected to latch 2208, FD bus line 2237 to latches 2209 and 2211, FD bus line 2238 to latch 2210, FD bus line 2239 to latch 2212, and FD bus line 2240 to latch 2213.
The enable input of each of latches 2208-2213 is connected to one of several global pointers and software-to-hardware (S2H) pointers. The enable inputs of latches 2208-2211 are connected to global pointers, and the enable inputs of latches 2212-2213 are connected to S2H pointers. Exemplary global pointers include GLB_PTR0 on line 2241, GLB_PTR1 on line 2242, GLB_PTR2 on line 2243, and GLB_PTR3 on line 2244. Exemplary S2H pointers include S2H_PTR0 on line 2245 and S2H_PTR1 on line 2246. Because the enable inputs of these latches are connected to these pointers, a latch cannot latch data into its intended destination node in the hardware model of the user design without the appropriate pointer signal.
These global and S2H pointer signals are generated by a data input pointer state machine 2214 on output 2254. Data input pointer state machine 2214 is controlled by the DATA_XSFR and F_WR signals on line 2253, which are generated by internal I/O controller 2203. DATA_XSFR is at logic "1" whenever a data transfer between the RCC hardware array and either the RCC computing system or the external interface is needed. The F_WR signal, in contrast to the F_RD signal, is at logic "1" when a write to the RCC hardware array is needed. A read via the F_RD signal requires a data transfer from the RCC hardware array to the RCC computing system or the external interface. If both the DATA_XSFR and F_WR signals are at logic "1", the data input pointer state machine generates the appropriate global or S2H pointer signals in the proper programmed sequence.
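A minimal Python sketch of such a pointer state machine follows. It is a behavioral illustration only, under several assumptions: each call to `step` stands for one sampled F_WR pulse, the programmed sequence shown is just the six example pointers named in the description, and rewinding the sequence when DATA_XSFR drops is a modeling choice that the specification does not state. The class and method names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Example programmed order, taken from the pointer names in the description.
# The real order depends on the hardware model of each user design.
DEFAULT_SEQUENCE = (
    "GLB_PTR0", "GLB_PTR1", "GLB_PTR2", "GLB_PTR3", "S2H_PTR0", "S2H_PTR1",
)


class DataInPointerStateMachine:
    """Behavioral stand-in for data input pointer state machine 2214."""

    def __init__(self, sequence: Sequence[str] = DEFAULT_SEQUENCE) -> None:
        self.sequence = tuple(sequence)
        self.position = 0

    def step(self, data_xsfr: int, f_wr: int) -> Optional[str]:
        """Return the pointer asserted for this sample, or None."""
        if not data_xsfr:
            # Assumption: ending the transfer rewinds the sequence.
            self.position = 0
            return None
        if not f_wr or self.position >= len(self.sequence):
            return None  # no write requested, or sequence exhausted
        pointer = self.sequence[self.position]
        self.position += 1
        return pointer


def run(signals: Sequence[Tuple[int, int]]) -> List[Optional[str]]:
    """Feed (DATA_XSFR, F_WR) samples through a fresh state machine."""
    fsm = DataInPointerStateMachine()
    return [fsm.step(data_xsfr, f_wr) for data_xsfr, f_wr in signals]


trace = run([(0, 0), (1, 1), (1, 0), (1, 1), (0, 0), (1, 1)])
```

In the trace, a pointer is produced only on samples where both signals are high, and exactly one pointer is active at a time, which is what lets several latches share one FD bus line.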
The outputs 2247-2252 of these latches are connected to various internal nodes in the hardware model of the user design. Some of these internal nodes correspond to input pins of the user design. The user design has other internal nodes that are not normally accessible via pins, but these non-pin internal nodes serve other debug purposes, giving flexibility to designers who need to apply stimulus to various internal nodes in the user design whether or not they are input pins. For stimulus applied by the external interface to the hardware model of the user design, the data input logic and those internal nodes corresponding to input pins are involved. For example, if the user design is a CRTC 6845 video controller, some of the input pins might be as follows:
LPSTB - light pen strobe pin
~RESET - low-level signal that resets the 6845 controller
RS - register select
E - enable
CLK - clock
~CS - chip select
Other input pins are also available on this video controller. Based on the number of input pins that interface with the external world, the number of nodes, and hence the number of latches and pointers, can readily be determined. For example, some hardware models configured in the RCC hardware array will have 30 separate latches associated with each of GLB_PTR0, GLB_PTR1, GLB_PTR2, GLB_PTR3, S2H_PTR0, and S2H_PTR1, for a total of 180 latches (= 30 x 6). In other designs, more global pointers, such as GLB_PTR4 through GLB_PTR30, may be used as necessary. Likewise, more S2H pointers, such as S2H_PTR2 through S2H_PTR30, may be used as necessary. These pointers and their corresponding latches are based on the needs of the hardware model of each user design.
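The latch count in this example is a simple product, which the following Python sketch spells out. The helper name `total_latches` is hypothetical; the pointer names and the figure of 30 latches per pointer come from the example above.

```python
from typing import Sequence

# The six example pointers named in the description.
EXAMPLE_POINTERS = (
    "GLB_PTR0", "GLB_PTR1", "GLB_PTR2", "GLB_PTR3", "S2H_PTR0", "S2H_PTR1",
)


def total_latches(pointers: Sequence[str], latches_per_pointer: int) -> int:
    """Total latches when every pointer enables the same number of latches."""
    return len(pointers) * latches_per_pointer


example_total = total_latches(EXAMPLE_POINTERS, latches_per_pointer=30)
```

A design that needed more pointers would simply lengthen the pointer list; the count scales linearly as long as each pointer enables the same number of latches.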
Returning to Figures 70 and 72, data on the FD bus lines makes its way into these internal nodes only when the latches are enabled with the appropriate global pointer or S2H pointer signal. Otherwise, these internal nodes are not driven by any data on the FD bus. During the first half of the CPU_IN=1 time period, when F_WR is at logic "1", GLB_PTR0 is at logic "1" to drive the data on FD1 to the corresponding internal node via line 2247. If other latches are enabled by GLB_PTR0, those latches also latch data to their corresponding internal nodes. During the second half of the CPU_IN=1 time period, F_WR goes to logic "1" again, triggering GLB_PTR1 to rise to logic "1". This drives the data on FD6 to the internal node connected to line 2248. It also sends the software clock signal on line 2223 to be latched by latch 2205 onto line 2216, enabled by the GLB_PTR1 signal on line 2215. This software clock is delivered to the external clock inputs of the target system and the other external I/O devices. Since GLB_PTR0 and GLB_PTR1 are used only for the first part of the data input global cycle, CPU_IN returns to logic "0", which completes the delivery of global data from the RCC computing system to the RCC hardware array.
The second part of the data input global cycle will now be discussed, in which global data from the external interface is delivered to the RCC hardware array and the external buffer. Again, the various input pin signals from the target system or external I/O devices that are directed to the user design must be provided to both the hardware model and the software model. With the appropriate pointers, this data can be delivered to the hardware model and latched to drive internal nodes. The data is also delivered to the software model by first being stored in external buffer 2201, from which the RCC computing system later retrieves it to update the internal state of the software model.
CPU_IN is now at logic "0" and EXT_IN is at logic "1". Accordingly, tri-state buffer 2206 in external I/O controller 2200 is enabled to place data on the PCI bus lines (for example, bus lines 2217 and 2218). These PCI bus lines are also connected to FD bus lines 2219 for storage in external buffer 2201. During the first half of the time period in which the EXT_IN signal is at logic "1", GLB_PTR2 is at logic "1". This latches the data on FD4 (via bus lines 2217 and 2224 and local bus line 2228 (LD4)) into the internal node of the hardware model connected to line 2249.
During the second half of the time period in which the EXT_IN signal is at logic "1", GLB_PTR3 is at logic "1". This latches the data on FD6 (via bus lines 2218 and 2225 and local bus line 2227 (LD6)) into the internal node of the hardware model connected to line 2250.
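The pointer-gated latching of the global cycle can be sketched in Python as follows. The wiring table reproduces the latch, FD line, pointer, and node line numbers given in the description for latches 2208-2211; the `Latch` class, the `apply` helper, and the dictionary representation of the FD bus are hypothetical constructs introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Latch:
    """One pointer-enabled latch inside the reconfigurable logic element."""

    fd_line: str                 # FD bus line feeding the latch input
    pointer: str                 # pointer wired to the latch enable input
    node_line: int               # output line to the internal node
    value: Optional[int] = None  # value currently held for the internal node

    def clock(self, fd_bus: Dict[str, int], active_pointer: str) -> None:
        # The latch captures its input only while its own pointer is active.
        if active_pointer == self.pointer:
            self.value = fd_bus[self.fd_line]


# Wiring of the global-cycle latches as given in the description.
LATCHES = {
    2208: Latch("FD1", "GLB_PTR0", 2247),
    2209: Latch("FD6", "GLB_PTR1", 2248),
    2210: Latch("FD4", "GLB_PTR2", 2249),
    2211: Latch("FD6", "GLB_PTR3", 2250),
}


def apply(fd_bus: Dict[str, int], active_pointer: str) -> None:
    """Present one FD bus state to every latch with one pointer asserted."""
    for latch in LATCHES.values():
        latch.clock(fd_bus, active_pointer)


# Latches 2209 and 2211 share FD6 but capture it at different times.
apply({"FD1": 1, "FD4": 0, "FD6": 0}, "GLB_PTR1")
apply({"FD1": 0, "FD4": 0, "FD6": 1}, "GLB_PTR3")
```

After these two steps latch 2209 holds the FD6 value from the first step and latch 2211 holds the FD6 value from the second, while latches 2208 and 2210 remain undriven because their pointers were never asserted.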
As noted above, this data from the target system or other external I/O devices is also delivered to the software model by first being stored in external buffer 2201, from which the RCC computing system later retrieves it to update the internal state of the software model. The data on bus lines 2217 and 2218 is provided on FD bus FD[63:0] 2219 to external buffer 2201. The particular memory address at which each datum is stored in external buffer 2201 is provided by memory address counter 2207 to external buffer 2201 via bus 2220. To enable this storage, the WR_EXT_BUF signal is provided to external buffer 2201 via line 2221. Before external buffer 2201 becomes full, the RCC computing system reads the contents of external buffer 2201 so that the software model can be updated appropriately. Any data delivered to the various internal nodes of the hardware model in the RCC hardware array may cause some internal states of the hardware model to change. Because the RCC computing system has a model of the entire user design in software, these internal state changes in the hardware model should also be reflected in the software model. This completes the data input global cycle.
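The store-then-drain behavior of the external buffer can be modeled with a short Python sketch. Several details here are assumptions rather than statements from the specification: the counter is taken to increment by one per stored word, reading the buffer is taken to rewind the counter, and the one-word safety margin used to decide when to drain is arbitrary. The names `ExternalBufferModel` and `capture` are hypothetical.

```python
from typing import List, Optional


class ExternalBufferModel:
    """Toy model of external buffer 2201 with memory address counter 2207."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.memory: List[Optional[int]] = [None] * depth
        self.address_counter = 0  # next write address (assumed to count up by 1)

    def write(self, word: int, wr_ext_buf: int = 1) -> None:
        """Store one word at the counter address when WR_EXT_BUF is asserted."""
        if not wr_ext_buf:
            return
        if self.address_counter >= self.depth:
            raise OverflowError("external buffer overrun")
        self.memory[self.address_counter] = word
        self.address_counter += 1

    def nearly_full(self, margin: int = 1) -> bool:
        return self.address_counter >= self.depth - margin

    def drain(self) -> List[Optional[int]]:
        """Read out stored words; rewinding the counter is an assumption."""
        words = self.memory[: self.address_counter]
        self.address_counter = 0
        return words


def capture(words: List[int], depth: int) -> List[List[Optional[int]]]:
    """Store incoming words, draining to the software side before overflow."""
    buf = ExternalBufferModel(depth)
    batches = []
    for word in words:
        buf.write(word)
        if buf.nearly_full():
            batches.append(buf.drain())
    if buf.address_counter:
        batches.append(buf.drain())
    return batches


batches = capture([1, 2, 3, 4, 5, 6, 7], depth=4)
```

Each drained batch stands for one read by the RCC computing system, after which the software model would be updated with the captured values.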
The S2H cycle will now be discussed. The S2H cycle is used to deliver test bench data from the RCC computing system to the RCC hardware array, and then to move that data sequentially from one chip to the next in each board. When the EXT_IN signal goes to logic "0", the CPU_IN signal goes to logic "1", indicating that the data transfer is between the RCC computing system and the RCC hardware array. The external interface is not involved. The CPU_IN signal also enables tri-state buffer 2202 to allow data to pass from local bus 2222 to internal I/O controller 2203.
At the beginning of the CPU_IN=1 time period, S2H_PTR0 goes to logic "1", which latches the data on FD5 (via local bus 2222, local bus 2229, bus line 2234, and FD bus 2239) into the internal node of the hardware model connected to line 2251. During the second part of the CPU_IN=1 time period, S2H_PTR1 goes to logic "1", which latches the data on FD7 (via local bus 2222, local bus 2230, bus line 2235, and FD bus 2240) into the internal node of the hardware model connected to line 2252. During sequential data evaluation, data from the RCC computing system is delivered first to chip m1, then to chip 0_1 (that is, chip 0 on board 1), chip 1_1 (that is, chip 1 on board 1), and so on, until the last chip on the last board, chip 7_8 (that is, chip 7 on board 8). If chip m2 is available, the data is also moved into this chip.
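The chip visiting order for the S2H cycle can be written out with a small Python sketch. The function name `s2h_chip_order` and its parameters are hypothetical; the defaults of eight boards with eight chips each follow the chip 0_1 through chip 7_8 naming used above, and placing chip m2 last is an assumption based on the order in which the description mentions it.

```python
from typing import List


def s2h_chip_order(
    num_boards: int = 8, chips_per_board: int = 8, has_m2: bool = False
) -> List[str]:
    """List chips in the order S2H data reaches them.

    Chips are named "<chip>_<board>" as in the description, so "0_1" is
    chip 0 on board 1. Chip m1 always comes first.
    """
    order = ["m1"]
    for board in range(1, num_boards + 1):
        for chip in range(chips_per_board):
            order.append(f"{chip}_{board}")
    if has_m2:
        order.append("m2")  # assumption: m2, when present, is visited last
    return order


order = s2h_chip_order()
order_with_m2 = s2h_chip_order(has_m2=True)
```

With the defaults, the list starts with m1, 0_1, 1_1 and ends with 7_8, for 65 chips in total, or 66 when chip m2 is present.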
At the end of this data transfer, DATA_XSFR returns to logic "0". Note that the I/O data from the external interface is treated as global data and is handled during the global cycle. This completes the discussion of the data input control logic and the data input cycles.
Data output
An embodiment of the data output control logic of the invention will now be discussed. The data output control logic according to an embodiment of the invention is responsible for handling the transfer of data from the RCC hardware array to the RCC computing system and the external interface. In processing data in response to stimulus (external or otherwise), the hardware model generates certain output data that the target application or some I/O devices may need. This output data may be substantive data, addresses, control information, or other relevant information that another application or device may need for its own processing. This output data, destined for the RCC computing system (which may have models of other external I/O devices in software), the target system, or external I/O devices, is provided on various internal nodes. As explained above with reference to the data input logic, some of these internal nodes correspond to output pins of the user design. The user design has other internal nodes that are not normally accessible via pins, but these non-pin internal nodes serve other debug purposes, giving flexibility to designers who wish to read and analyze stimulus at various internal nodes in the user design whether or not they are output pins. For stimulus applied from the hardware model of the user design to the external interface or the RCC computing system (which may have models of other I/O devices in software), the data output logic and those internal nodes corresponding to output pins are involved.
For example, if the user design is a CRTC 6845 video controller, some of the output pins might include the following:
MA0-MA13 - memory address
D0-D7 - data bus
DE - display enable
CURSOR - cursor position
VS - vertical synchronization
HS - horizontal synchronization
Other output pins are also available on this video controller. Based on the number of output pins that interface with the external world, the number of nodes, and hence the number of gating logic elements and pointers, can readily be determined. Thus, output pins MA0-MA13 on the video controller provide the memory address for the video RAM. The VS output pin provides the signal for vertical synchronization and thus triggers a vertical retrace on the display. Output pins D0-D7 are eight terminals that form the bidirectional data bus through which the CPU in the target system accesses the internal 6845 registers. These output pins correspond to certain internal nodes in the hardware model. Of course, the number and nature of these internal nodes vary with the user design.
Data from these output pin-out internal nodes must be provided to the RCC computing system, because the RCC computing system contains a model of the entire user design in software, and any event that occurs in the hardware model must be communicated to the software model so that corresponding changes can be made. In this way the software model holds information consistent with the hardware model. Additionally, the RCC computing system may have device models of I/O devices that the user or designer has decided to model in software rather than connect as actual devices to one of the ports on the external I/O expander. For example, the user may decide that modeling a monitor or speaker in software is easier and more effective than plugging an actual monitor or speaker into one of the ports on the external I/O expander. Furthermore, data from these internal nodes of the hardware model must be provided to the target system and any other external I/O devices. To deliver the data on these output pin-out internal nodes to the RCC computing system, the target system, and the other external I/O devices, data-out control logic is provided in the co-verification system in accordance with one embodiment of the present invention.
The data-out control logic uses data-out cycles, which involve the transfer of data from the RCC hardware array 2190 to the RCC computing system 2141 and the external interface (external I/O expander 2139). In Figure 69, the control logic for transferring data between the external interface (external I/O expander 2139) and the co-verification system 2140 is found in each of boards 2145-2149. The primary portion of the control logic resides in the external I/O controller 2152, but other portions reside in the various internal I/O controllers (for example, 2156 and 2158) and in the reconfigurable logic elements (for example, FPGA chips 2159 and 2165). For illustrative purposes, it is only necessary to show some portion of this control logic rather than the same repeated logic structure for all chips in all boards. The portion of the co-verification system 2140 within dotted line 2150 of Figure 69 contains one subset of the control logic. This control logic is now discussed in detail with reference to Figures 71 and 73. Figure 71 illustrates the portion of the control logic used for the data-out cycle. Figure 73 illustrates the timing diagram of the data-out cycle.
A particular subset of the data-out control logic is shown in Figure 71. It includes the external I/O controller 2300, tri-state buffer 2301, internal I/O controller 2302, a reconfigurable logic element 2303, and the various buses and control lines that allow data to be transferred among them. This subset illustrates the logic needed for the data-out operation, in which data from the RCC hardware array are delivered to the RCC computing system and the external interface. The data-out control logic of Figure 71 and the data-out timing diagram of Figure 73 are discussed together.
In contrast to the two cycle types of the data-in cycle, the data-out cycle has only one type of cycle. The data-out control logic requires that data from the RCC hardware model be delivered sequentially to: (1) the RCC computing system, and then (2) the RCC computing system and the external interface (to the target system and the external I/O devices). Specifically, the data-out cycle requires that data from the internal nodes of the hardware model in the RCC hardware array be delivered first to the RCC computing system, and second to the RCC computing system and the external interface, for each chip, one chip at a time within each board and one board at a time.
As with the data-in control logic, pointers are used to select (or gate) data from the internal nodes to the RCC computing system and the external interface. In the embodiment illustrated in Figures 71 and 73, data-out pointer state machine 2319 generates five pointers H2S_PTR[4:0] on bus 2359 for both the hardware-to-software data and the hardware-to-external-interface data. Data-out pointer state machine 2319 is controlled by the DATA_XSFR and F_RD signals on line 2358, which are generated by internal I/O controller 2302. DATA_XSFR is at logic "1" whenever a data transfer between the RCC hardware array and the RCC computing system or external interface is desired. The F_RD signal, in contrast to the F_WR signal, is at logic "1" whenever a read from the RCC hardware array is desired. When both DATA_XSFR and F_RD are at logic "1", data-out pointer state machine 2319 generates the appropriate H2S pointer signals in the appropriate programmed sequence. Other embodiments may use more (or fewer) pointers as the user design requires.
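The pointer sequencing described above can be illustrated with a minimal behavioral sketch. It is not the patent's circuit: the function name `h2s_pointer_sequence` and the choice to model each sample as a `(DATA_XSFR, F_RD)` pair are assumptions made only for illustration. The sketch asserts exactly one pointer per sample in which both control signals are at logic 1, and leaves all pointers low otherwise.

```python
def h2s_pointer_sequence(signals, num_pointers=5):
    """Yield the H2S_PTR vector for each (DATA_XSFR, F_RD) sample.

    vector[i] stands for H2S_PTRi. This is a hypothetical model: one pointer
    is asserted per sample in which both control signals are 1, in order.
    """
    index = 0
    for data_xsfr, f_rd in signals:
        vector = [0] * num_pointers
        if data_xsfr and f_rd and index < num_pointers:
            vector[index] = 1   # assert the next pointer in the sequence
            index += 1
        yield vector


if __name__ == "__main__":
    samples = [(1, 1), (1, 0), (1, 1), (0, 1)]
    for vector in h2s_pointer_sequence(samples, num_pointers=3):
        print(vector)
```

Once every pointer of a chip has been asserted, the sketch simply emits all-zero vectors; in the described system control would pass to the next chip at that point.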
These H2S pointer signals are provided to gating logic. One set of inputs 2353-2357 to the gating logic is directed to AND gates 2314-2318. The other set of inputs 2348-2352 is coupled to the internal nodes of the hardware model. Thus, AND gate 2314 has input 2348 from an internal node and input 2353 from H2S_PTR0; AND gate 2315 has input 2349 from an internal node and input 2354 from H2S_PTR1; AND gate 2316 has input 2350 from an internal node and input 2355 from H2S_PTR2; AND gate 2317 has input 2351 from an internal node and input 2356 from H2S_PTR3; and AND gate 2318 has input 2352 from an internal node and input 2357 from H2S_PTR4. Without the appropriate H2S_PTR pointer signal, an internal node cannot be driven to the RCC computing system or the external interface.
The respective outputs 2343-2347 of AND gates 2314-2318 are coupled to OR gates 2310-2313. Thus, AND gate output 2343 is coupled to an input of OR gate 2310; AND gate output 2344 is coupled to an input of OR gate 2311; AND gate output 2345 is coupled to an input of OR gate 2311; AND gate output 2346 is coupled to an input of OR gate 2312; and AND gate output 2347 is coupled to an input of OR gate 2313. Note that output 2344 of AND gate 2315 is not coupled to an OR gate of its own; rather, output 2344 is coupled to OR gate 2311, which is also coupled to output 2345 of AND gate 2316. The other inputs 2360-2366 to OR gates 2310-2313 can be coupled to the outputs of other AND gates (not shown), which are themselves coupled to other internal nodes and H2S_PTR pointers. The use of these OR gates and their particular inputs depends on the user design and the hardware model configured for it. Thus, in other designs, more pointers may be used, and output 2344 of AND gate 2315 may be coupled to an OR gate other than OR gate 2311.
The outputs 2339-2342 of OR gates 2310-2313 are coupled to FD bus lines FD0, FD3, FD1, and FD4. In this particular example of a user design, only four output pin-out signals are delivered to the RCC computing system and the external interface. Thus, FD0 is coupled to the output of OR gate 2310; FD3 is coupled to the output of OR gate 2311; FD1 is coupled to the output of OR gate 2312; and FD4 is coupled to the output of OR gate 2313. These FD bus lines are coupled to local bus lines 2330-2333 via internal lines 2334-2338 in internal I/O controller 2302. In this embodiment, local bus line 2330 is LD0, local bus line 2331 is LD3, local bus line 2332 is LD1, and local bus line 2333 is LD4.
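The AND/OR gating just described can be sketched as a small behavioral model. The pointer-to-line wiring follows the text for Figure 71 (pointers 1 and 2 share the OR gate feeding FD3); the names `POINTER_TO_FD_LINE` and `drive_fd_bus` are hypothetical, and the other OR-gate inputs 2360-2366 are assumed to be idle.

```python
# Pointer index -> FD bus line, following the wiring described for Figure 71.
# H2S_PTR1 and H2S_PTR2 share the OR gate that feeds FD3.
POINTER_TO_FD_LINE = {0: "FD0", 1: "FD3", 2: "FD3", 3: "FD1", 4: "FD4"}


def drive_fd_bus(node_values, pointers):
    """Return the value on each FD line for the given node values and pointers.

    node_values[i] is the internal node gated by H2S_PTRi (a single bit);
    pointers[i] is H2S_PTRi.
    """
    fd = {line: 0 for line in sorted(set(POINTER_TO_FD_LINE.values()))}
    for i, (node, ptr) in enumerate(zip(node_values, pointers)):
        # AND gate (node & ptr) followed by the OR gate of the target line.
        fd[POINTER_TO_FD_LINE[i]] |= node & ptr
    return fd


if __name__ == "__main__":
    # Only H2S_PTR2 is asserted, so only the node on line 2350 reaches FD3.
    print(drive_fd_bus([1, 1, 1, 1, 1], [0, 0, 1, 0, 0]))
```

Because only one pointer is asserted at a time, sharing one OR gate between two nodes does not cause contention: the unselected node contributes a 0 to the OR.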
To enable the data on local bus lines 2330-2333 to be delivered to the RCC computing system, these local bus lines are coupled to tri-state buffer 2301. In its normal state, tri-state buffer 2301 allows data to pass from local bus lines 2330-2333 to local bus 2320. In contrast, during data-in, data are allowed to pass from the RCC computing system to the RCC hardware array only when the CPU_IN signal is provided to tri-state buffer 2301.
To enable the data on local bus lines 2330-2333 to be delivered to the external interface, lines 2321-2324 are provided. Line 2321 is coupled to line 2330 and to a latch (not shown) in external I/O controller 2300; line 2322 is coupled to line 2331 and to a latch (not shown) in external I/O controller 2300; line 2323 is coupled to line 2332 and to latch 2305 in external I/O controller 2300; and line 2324 is coupled to line 2333 and to latch 2306 in external I/O controller 2300.
The output of each of latches 2305 and 2306 is coupled to a buffer and then to the external interface, which is in turn coupled to the appropriate output pin-out of the target system or external I/O device. Thus, the output of latch 2305 is coupled to buffer 2307 and line 2327. Likewise, the output of latch 2306 is coupled to buffer 2308 and line 2328. The output of another latch (not shown) can be coupled to line 2329. In this example, lines 2327-2329 correspond to wire 1, wire 4, and wire 3, respectively, of the target system or some external I/O device. Ultimately, for the transfer of data from the hardware model to the external interface, the hardware model of the user design is configured so that the internal node coupled to line 2350 corresponds to wire 3 on line 2329, the internal node coupled to line 2351 corresponds to wire 1 on line 2327, and the internal node coupled to line 2352 corresponds to wire 4 on line 2328. Likewise, wire 3 corresponds to LD3 on line 2331, wire 1 corresponds to LD1 on line 2332, and wire 4 corresponds to LD4 on line 2333.
A lookup table 2309 is coupled to the enable inputs of latches 2305 and 2306. Lookup table 2309 is controlled by the F_RD signal on line 2367, which triggers the operation of lookup table address counter 2304. At each counter increment, the pointer enables a particular row of lookup table 2309. If an entry (or bit) in that row is at logic "1", the LUT output line coupled to that entry of lookup table 2309 enables its corresponding latch and drives the data onto the external interface and, ultimately, to the desired destination in the target system or some external I/O device. For example, LUT output line 2325 is coupled to the enable input of latch 2305, and LUT output line 2326 is coupled to the enable input of latch 2306.
In this example, rows 0-3 of lookup table 2309 are programmed to enable the latches corresponding to the output pin-out wires of the internal nodes in chip m1. Likewise, rows 4-6 are programmed to enable the latches corresponding to the output pin-out wires of the internal nodes in chip 0_1 (that is, chip 0 in board 1). In row 4, bit 3 is at logic "1". In row 5, bit 1 is at logic "1". In row 6, bit 4 is at logic "1". All other entries or bit positions are at logic "0". Because a single output pin-out wire cannot drive multiple I/O devices, only one entry is at logic "1" for any given bit position (or column) of the lookup table. In other words, an output pin-out internal node in the hardware model can provide data to only a single wire coupled to the external interface.
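The lookup-table behavior can be sketched as follows. Rows 4-6 use the bit positions given in the text; the contents of rows 0-3 (chip m1) are not stated there, so they are left empty as placeholders. The table width of five bit positions and the names `LOOKUP_TABLE`, `latch_enables`, and `columns_have_single_driver` are assumptions for illustration only.

```python
LUT_WIDTH = 5  # bit positions 0..4 (assumed width, matching LD0..LD4)

# Each row is the set of bit positions at logic 1.
LOOKUP_TABLE = [
    set(), set(), set(), set(),  # rows 0-3: chip m1, contents not given in the text
    {3},                         # row 4: bit 3
    {1},                         # row 5: bit 1
    {4},                         # row 6: bit 4
]


def latch_enables(table, row):
    """Return the latch-enable bit for each bit position when `row` is selected."""
    return [1 if bit in table[row] else 0 for bit in range(LUT_WIDTH)]


def columns_have_single_driver(table):
    """Return True if no bit position (column) is at logic 1 in more than one row."""
    seen = set()
    for row in table:
        if seen & row:
            return False
        seen |= row
    return True


if __name__ == "__main__":
    print(latch_enables(LOOKUP_TABLE, 4))
    print(columns_have_single_driver(LOOKUP_TABLE))
```

The second function checks the stated constraint that any given column holds at most one logic 1, which is what keeps a single output wire from being driven by more than one internal node.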
As described above, the data-out control logic requires that data from each reconfigurable logic element in each chip of the RCC hardware model be delivered sequentially to: (1) the RCC computing system, and then (2) the RCC computing system and the external interface (to the target system and the external I/O devices). The RCC computing system needs these data because it has models of some I/O devices in software; for data not destined for one of these modeled I/O devices, the RCC computing system still needs to monitor them so that its internal state remains consistent with the state of the hardware model in the RCC hardware array. In the example illustrated in Figures 71 and 73, only seven internal nodes are driven for output to the RCC computing system and the external interface. Two of those internal nodes are in chip m1, and the other five are in chip 0_1 (that is, chip 0 in board 1). Of course, other internal nodes in these and other chips may be required for this particular user design, but Figures 71 and 73 illustrate only these seven nodes.
During the data transfer, the DATA_XSFR signal is at logic "1". During this time, the co-verification system uses local bus lines 2330-2333 to deliver data sequentially from each chip in each board of the RCC hardware array to the RCC computing system and the external interface. The DATA_XSFR and F_RD signals control the operation of the data-out pointer state machine so that it generates the appropriate pointers H2S_PTR[4:0] for the appropriate gates of the output pin-out internal nodes. The F_RD signal also controls lookup table address counter 2304 for delivering the internal node data to the external interface.
The internal nodes in chip m1 are handled first. When F_RD rises to logic "1" at the beginning of the data transfer cycle, H2S_PTR0 in chip m1 goes to logic "1". This drives the data at those internal nodes of chip m1 that depend on H2S_PTR0 to the RCC computing system via tri-state buffer 2301 and local bus 2320. Lookup table address counter 2304 counts and points to row 0 of lookup table 2309 to latch the appropriate data of chip m1 to the external interface. When the F_RD signal goes to logic "1" again, the data at the internal nodes that can be driven by H2S_PTR1 are delivered to the RCC computing system and the external interface. H2S_PTR1 goes to logic "1" and, in response to the second F_RD signal, lookup table address counter 2304 counts and points to row 1 of lookup table 2309 to latch the appropriate data of chip m1 to the external interface.
The five internal nodes of reconfigurable logic element 2303 (that is, chip 0_1, or chip 0 in board 1) are handled next. In this example, data from the two internal nodes associated with H2S_PTR0 and H2S_PTR1 are delivered only to the RCC computing system. Data from the three internal nodes associated with H2S_PTR2, H2S_PTR3, and H2S_PTR4 are delivered to the RCC computing system and the external interface.
When F_RD rises to logic "1", H2S_PTR0 in chip 2303 goes to logic "1". This drives the data at those internal nodes of chip 2303 that depend on H2S_PTR0 to the RCC computing system via tri-state buffer 2301 and local bus 2320. In this example, the internal node coupled to line 2348 depends on H2S_PTR0 on line 2353. When the F_RD signal goes to logic "1" again, the data at the internal nodes that can be driven by H2S_PTR1 are delivered to the RCC computing system. Here, the internal node coupled to line 2349 is affected. These data are driven onto LD3 on lines 2331 and 2322.
When the F_RD signal goes to logic "1" again, H2S_PTR2 goes to logic "1" and the data at the internal node coupled to line 2350 are provided on LD3. These data are provided to both the RCC computing system and the external interface. Tri-state buffer 2301 allows the data to pass to local bus 2320 and then into the RCC computing system. As for the external interface, enabling the H2S_PTR2 signal drives these data onto LD3 on lines 2331 and 2322. In response to the F_RD signal, lookup table address counter 2304 counts and points to row 4 of lookup table 2309 to latch the appropriate data from the internal node coupled to line 2350 onto line 2329 (wire 3) at the external interface.
When the F_RD signal goes to logic "1" again, H2S_PTR3 goes to logic "1" and the data at the internal node coupled to line 2351 are provided on LD1. These data are provided to both the RCC computing system and the external interface. Tri-state buffer 2301 allows the data to pass to local bus 2320 and then into the RCC computing system. As for the external interface, enabling the H2S_PTR3 signal drives these data onto LD1 on lines 2332 and 2323. In response to the F_RD signal, lookup table address counter 2304 counts and points to row 5 of lookup table 2309 to latch the appropriate data from the internal node coupled to line 2351 onto line 2327 (wire 1) at the external interface.
When the F_RD signal goes to logic "1" again, H2S_PTR4 goes to logic "1" and the data at the internal node coupled to line 2352 are provided on LD4. These data are provided to both the RCC computing system and the external interface. Tri-state buffer 2301 allows the data to pass to local bus 2320 and then into the RCC computing system. As for the external interface, enabling the H2S_PTR4 signal drives these data onto LD4 on lines 2333 and 2324. In response to the F_RD signal, lookup table address counter 2304 counts and points to row 6 of lookup table 2309 to latch the appropriate data from the internal node coupled to line 2352 onto line 2328 (wire 4) at the external interface.
This process of driving the data at the internal nodes of a chip first to the RCC computing system, and then to the RCC computing system and the external interface, continues sequentially for the other chips. First, the internal nodes of chip m1 are driven. Second, the internal nodes of chip 0_1 (chip 2303) are driven. Next, the internal nodes of chip 1_1, if any, are driven. This continues until the last node of the last chip in the last board has been driven. Thus, the internal nodes of chip 7_8, if any, are driven. Finally, the internal nodes of chip m2, if any, are driven.
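The chip-by-chip scan order can be written out as a short sketch. The board numbers (1, 2, 3, 8) and the count of eight chips per board (chips 0-7) are assumptions inferred from the chip names mentioned in this description (0_1, 1_1, 7_8) and are not asserted as the actual hardware configuration; `chip_scan_order` is a hypothetical helper.

```python
def chip_scan_order(boards=(1, 2, 3, 8), chips_per_board=8):
    """Return chip names in the order their internal nodes are driven.

    Names follow the "<chip>_<board>" convention used in the text,
    with chip m1 first and chip m2 last.
    """
    order = ["m1"]
    for board in boards:                 # one board at a time
        for chip in range(chips_per_board):  # one chip at a time within a board
            order.append(f"{chip}_{board}")
    order.append("m2")
    return order


if __name__ == "__main__":
    order = chip_scan_order()
    print(order[:3], "...", order[-2:])
```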
Although Figure 71 shows the data-out control logic for driving the internal nodes of chip 2303 only, other chips also have internal nodes that may need to be driven to the RCC computing system and the external interface. Regardless of the number of internal nodes, the data-out logic drives data from one set of internal nodes of a chip to the RCC computing system and then, in another cycle, drives a different set of internal nodes of the same chip to the RCC computing system and the external interface together. The data-out control logic then moves on to the next chip and performs the same two-step operation: it first drives the data designated for the RCC computing system, and then drives the data designated for the external interface to both the RCC computing system and the external interface. Even when data are destined for the external interface, the RCC computing system must know about those data, because it holds a model of the entire user design in software, and that model must carry internal state information consistent with the hardware model in the RCC hardware array.
Board Layout
The board layout of the co-verification system in accordance with one embodiment of the present invention is now discussed with reference to Figure 74. The boards are installed in the RCC hardware array. The board layout is similar to that illustrated in Figures 8 and 36-44 and described in the accompanying text.
In one embodiment, the RCC hardware array includes six boards. Board m1 is coupled to board 1, and board m2 is coupled to board 8. The coupling and arrangement of board 1, board 2, board 3, and board 8 have been described above with reference to Figures 8 and 36-44.
Board m1 contains chip m1. The interconnect structure of board m1 with respect to the other boards is such that chip m1 is coupled to the south interconnects of chip 0, chip 2, chip 4, and chip 6 of board 1. Similarly, board m2 contains chip m2. The interconnect structure of board m2 with respect to the other boards is such that chip m2 is coupled to the south interconnects of chip 0, chip 2, chip 4, and chip 6 of board 8.
X. Example
To illustrate the operation of one embodiment of the present invention, a hypothetical user circuit design is used. In structural register transfer level (RTL) HDL code, the exemplary user circuit design is as follows:
module register (clock, reset, d, q);
  input clock, d, reset;
  output q;
  reg q;
  always @(posedge clock or negedge reset)
    if (~reset)
      q = 0;
    else
      q = d;
endmodule

module example;
  wire d1, d2, d3;
  wire q1, q2, q3;
  reg sigin;
  wire sigout;
  reg clk, reset;
  register reg1 (clk, reset, d1, q1);
  register reg2 (clk, reset, d2, q2);
  register reg3 (clk, reset, d3, q3);
  assign d1 = sigin ^ q3;
  assign d2 = q1 ^ q3;
  assign d3 = q2 ^ q3;
  assign sigout = q3;
  // a clock generator
  always
  begin
    clk = 0;
    #5;
    clk = 1;
    #5;
  end
  // a signal generator
  always
  begin
    #10;
    sigin = $random;
  end
  // initialization
  initial
  begin
    reset = 0;
    sigin = 0;
    #1;
    reset = 1;
    #5;
    $monitor($time, "%b,%b", sigin, sigout);
    #1000 $finish;
  end
endmodule
This code is reproduced in Figure 26. The particular functional details of this circuit design are not needed to understand the present invention. The reader should understand, however, that the user writes this HDL code to design a circuit for simulation. The circuit represented by this code performs the function designed by the user, responding to input signals and producing an output.
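For readers who want to follow the data flow of the example, the register behavior can be approximated by a minimal cycle-level sketch. It assumes that the operator in the three `assign` statements for d1-d3 is bitwise XOR, that all three registers start at 0 after the active-low reset, and that one `sigin` value is applied per rising clock edge; the function names `next_state` and `simulate` are illustrative and do not appear in the source.

```python
def next_state(q, sigin):
    """Return the values latched by reg1..reg3 on one rising clock edge."""
    q1, q2, q3 = q
    d1 = sigin ^ q3   # assign d1 (XOR assumed)
    d2 = q1 ^ q3      # assign d2
    d3 = q2 ^ q3      # assign d3
    return (d1, d2, d3)


def simulate(sigin_sequence):
    """Apply one sigin value per clock edge and collect sigout (= q3)."""
    q = (0, 0, 0)     # register state after reset
    sigout = []
    for sigin in sigin_sequence:
        q = next_state(q, sigin)
        sigout.append(q[2])
    return sigout


if __name__ == "__main__":
    print(simulate([1, 0, 0, 0]))   # -> [0, 0, 1, 1]
```

A single 1 applied on `sigin` takes three clock edges to reach `sigout`, after which the feedback from q3 starts to affect all three register inputs.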
Figure 27 shows the circuit diagram for the HDL code discussed with reference to Figure 26. In most cases, the user may actually produce a circuit diagram of this kind before expressing the design in HDL form. Some schematic capture tools allow pictorial circuit diagrams to be entered and, after processing, generate usable code.
As shown in Figure 28, the simulation system performs component type analysis. The HDL code, originally presented in Figure 26 as representing the user's particular circuit design, is now analyzed. The first few lines of code, beginning with "module register (clock, reset, d, q);" and ending with "endmodule", and further identified by reference number 900, form a register definition section.
The next few lines of code, reference number 907, represent wire interconnection information. Those skilled in the art will understand that wire variables in HDL represent the physical connections between structural entities such as gates. Because HDL is used primarily to model digital circuits, wire variables are essential. Usually, "q" (for example, q1, q2, q3) represents output wire lines and "d" (for example, d1, d2, d3) represents input wire lines. Reference number 908 shows "sigin", which is a test-bench output. Reference number 909 shows "sigout", which is a test-bench input.
Reference number 901 shows register components S1, S2, and S3. Reference number 902 shows combinational components S4, S5, S6, and S7. Note that combinational components S4-S7 have output variables d1, d2, and d3, which are inputs to register components S1-S3. Reference number 903 shows clock component S8.
The next series of code lines shows the test-bench components. Reference number 904 shows test-bench component (driver) S9. Reference number 905 shows test-bench components (initialization) S10 and S11. Reference number 904 shows test-bench component (monitor) S12.
The following table summarizes the component type analysis:
Component  Type
S1         Register
S2         Register
S3         Register
S4         Combinational
S5         Combinational
S6         Combinational
S7         Combinational
S8         Clock
S9         Test-bench (driver)
S10        Test-bench (initialization)
S11        Test-bench (initialization)
S12        Test-bench (monitor)
Based on the component type analysis, the system generates a software model for the entire circuit and a hardware model for the register and combinational components. S1-S3 are register components and S4-S7 are combinational components. These components are modeled in hardware so that the user of the simulation system can either simulate the entire circuit in software, or simulate in software and selectively accelerate in hardware. In either case, the user controls the simulation and hardware acceleration modes. Additionally, the user can emulate the circuit with a target system while retaining software control to start, stop, inspect values, and assert input values cycle by cycle.
Figure 29 shows a signal network analysis of the same structural RTL-level HDL code. As illustrated, S8, S9, S10, and S11 are modeled or provided in software. S9 is essentially the test-bench process that generates the sigin signal, and S12 is essentially the test-bench monitor process that receives the sigout signal. In this example, S9 generates a random sigin to simulate the circuit. Registers S1 to S3 and combinational components S4 to S7, however, are modeled in both hardware and software.
For the software/hardware boundary, the system allocates memory space for the various residual signals (that is, q1, q2, q3, CLK, sigin, sigout) that will be used to interface the software model with the hardware model. The memory space allocation is listed in the following table:
Signal   Memory address space
q1       REG
q2       REG
q3       REG
clk      CLK
sigin    S2H
sigout   H2S
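The allocation in the table can be captured as a simple mapping, which is convenient when checking which signals share an address space. This is only an illustrative data structure: `SIGNAL_SPACE` and `signals_by_space` are hypothetical names, and nothing here reflects how the system actually lays out memory.

```python
from collections import defaultdict

# Signal -> memory address space, copied from the table above.
SIGNAL_SPACE = {
    "q1": "REG",
    "q2": "REG",
    "q3": "REG",
    "clk": "CLK",
    "sigin": "S2H",
    "sigout": "H2S",
}


def signals_by_space(mapping):
    """Group signal names by the address space they are assigned to."""
    groups = defaultdict(list)
    for signal, space in mapping.items():
        groups[space].append(signal)
    return dict(groups)


if __name__ == "__main__":
    for space, signals in signals_by_space(SIGNAL_SPACE).items():
        print(space, signals)
```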
Figure 30 shows the software/hardware partition result for this example circuit design. Figure 30 is a more realistic illustration of the software/hardware partition. The software side 910 is coupled to the hardware side 912 through the software/hardware boundary 911 and PCI bus 913.
The software side 910 contains the software kernel and is controlled by it. In general, the kernel is the main control loop that governs the operation of the overall simulation system. As long as any test-bench process is active, the kernel evaluates the active test-bench components, evaluates the clock components, detects clock edges in order to update registers and memories and to propagate combinational logic data, and advances the simulation time. Even though the kernel resides in the software side, some of its operations or statements can be executed in hardware, because a hardware model exists for those statements and operations. Thus, the software controls both the software model and the hardware model.
The software side 910 includes the entire model of the user circuit, including S1-S12. The software/hardware boundary portion of the software side includes I/O buffers or address spaces S2H, CLK, H2S, and REG. Note that driver test-bench process S9 is coupled to the S2H address space, monitor test-bench process S12 is coupled to the H2S address space, and clock generator S8 is coupled to the CLK address space. The output signals q1-q3 of registers S1-S3 are assigned to the REG space.
Hardware model 912 has a model of combinational components S4-S7, which resides in the pure hardware side. On the software/hardware boundary portion of hardware model 912, sigout, sigin, register outputs q1-q3, and software clock 916 are implemented.
In addition to the model of the user's custom circuit design, the system generates software clocks and address pointers. The software clock provides signals to the enable inputs of registers S1-S3. As discussed above, software clocks in accordance with the present invention eliminate race conditions and hold-time violation problems. When the primary clock detects a clock edge in software, the detection logic triggers a corresponding detection logic in hardware. Clock edge register 916 then generates an enable signal to the register enable inputs, gating any data present at the register inputs into the registers.
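The idea of gating register inputs with an enable derived from a detected clock edge can be illustrated with a minimal behavioral sketch. It is not the patented circuit: `EdgeDetector` and `clock_registers` are hypothetical names, and the sketch only shows why latching every register from a snapshot of its inputs under a common enable avoids a value racing through more than one stage.

```python
class EdgeDetector:
    """Report a rising edge when the sampled level goes from 0 to 1."""

    def __init__(self, level=0):
        self.previous = level

    def rising(self, level):
        edge = self.previous == 0 and level == 1
        self.previous = level
        return edge


def clock_registers(q, d_inputs, enable):
    """Latch all D inputs at once when enable is high; otherwise hold q."""
    return tuple(d_inputs) if enable else tuple(q)


if __name__ == "__main__":
    detector = EdgeDetector()
    q = (1, 0, 0)                       # three-stage shift chain
    for clk in [0, 1, 1, 0, 1]:
        enable = detector.rising(clk)
        d = (0, q[0], q[1])             # inputs sampled before any register updates
        q = clock_registers(q, d, enable)
        print(clk, enable, q)
```

In the demonstration, the single 1 in the chain advances exactly one stage per detected edge, because every register input is sampled before any register output changes.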
Address pointer 914 is also shown for illustrative and conceptual purposes. Address pointers are actually implemented in each FPGA chip and allow data to be transferred to their destinations selectively and sequentially.
Combinational components S4-S7 are also coupled to register components S1-S3, sigin (signal input), and sigout (signal output). These signals travel on I/O bus 915 to and from PCI bus 913.
Figure 31 shows the complete hardware model, excluding the address pointers, prior to the mapping, placement, and routing steps. The system has not yet mapped the model to specific chips. Registers S1-S3 are provided, coupled to the I/O bus and to combinational components S4-S6. Combinational component S7 is simply the output q3 of register S3. Sigin, sigout, and software clock 920 are also modeled.
Once the hardware model has been determined, the system can map, place, and route the model into one or more chips. This particular example could in fact be implemented on a single Altera FLEX 10K chip, but for instructional purposes the example assumes that two chips are required to implement the hardware model. Figure 32 shows one particular hardware model-to-chip partition result for this example.
In Figure 32, the complete model (except for the I/O and clock edge register) is shown with the chip boundary represented by a dotted line. This result is produced by the compiler of the simulation system before the final configuration file is generated. As shown, the hardware model requires at least three wires between the two chips, at wire lines 921, 922, and 923. To minimize the number of pins/wires needed between the two chips (chip 1 and chip 2), either another model-to-chip partition can be generated or a multiplexing scheme can be used.
Analysis of the particular partition result shown in Figure 32 reveals that the number of wires between the two chips can be reduced to two by moving sigin wire line 923 from chip 2 to chip 1. Indeed, Figure 33 illustrates this partition. Although the partition of Figure 33 appears better than the partition of Figure 32 when judged solely by the number of wires, this example assumes that the simulation system has selected the partition of Figure 32 after the mapping, placement, and routing operations have been performed. The partition result of Figure 32 is used as the basis for generating the configuration file.
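Counting the wires that cross a chip boundary can be sketched as follows. The netlist is derived from the example HDL code, but the chip assignments are purely hypothetical: they are chosen only so that the cut size falls from three to two when the sigin source is moved, and are not claimed to be the actual partitions of Figures 32 and 33. The names `NETS`, `PLACEMENT_A`, `sigin_src`, and `cut_nets` are likewise assumptions.

```python
# Net name -> (driver, [sinks]); S7 is modeled as a sink of q3 (sigout = q3).
NETS = {
    "sigin": ("sigin_src", ["S4"]),
    "d1": ("S4", ["S1"]),
    "d2": ("S5", ["S2"]),
    "d3": ("S6", ["S3"]),
    "q1": ("S1", ["S5"]),
    "q2": ("S2", ["S6"]),
    "q3": ("S3", ["S4", "S5", "S6", "S7"]),
}

# Hypothetical two-chip placement with three nets crossing the boundary.
PLACEMENT_A = {
    "S1": 1, "S4": 1,
    "sigin_src": 2, "S2": 2, "S3": 2, "S5": 2, "S6": 2, "S7": 2,
}

# Same placement with the sigin source moved to chip 1.
PLACEMENT_B = dict(PLACEMENT_A, sigin_src=1)


def cut_nets(nets, placement):
    """Return the sorted names of nets whose driver and some sink are on different chips."""
    return sorted(
        name
        for name, (driver, sinks) in nets.items()
        if any(placement[sink] != placement[driver] for sink in sinks)
    )


if __name__ == "__main__":
    print(cut_nets(NETS, PLACEMENT_A))   # three crossing nets
    print(cut_nets(NETS, PLACEMENT_B))   # two crossing nets
```

With only two chips, each crossing net needs one inter-chip wire unless a multiplexing scheme is used, so the length of the returned list is the pin/wire count being minimized.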
Figure 34 shows the logic patching operation for the same hypothetical example, with the final realization in two chips. The system uses the partition result of Figure 32 to generate the configuration file. The address pointers are not shown, for simplicity. Two FPGA chips 930 and 940 are shown. Chip 930 includes, among other elements, a partitioned portion of the user circuit design, a TDM unit 931 (receiver side), software clock 932, and I/O bus 933. Chip 940 includes, among other elements, a partitioned portion of the user circuit design, a TDM unit 941 for the transmission side, software clock 942, and I/O bus 943. TDM units 931 and 941 were discussed with reference to Figures 9(A), 9(B), and 9(C).
Chips 930 and 940 have two interconnect wires 944 and 945 that tie the hardware model together. These two interconnect wires are part of the interconnect structure shown in Figure 8. Referring to Figure 8, this interconnect corresponds to interconnect 611 between chips F32 and F33. In one embodiment, the maximum number of wires/pins for each interconnect is 44. In Figure 34, the modeled circuit needs only two wires/pins between chips 930 and 940.
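As a rough illustration of why a multiplexing scheme reduces the pin count, the following Python sketch sends a list of logical signal values over a smaller number of physical wires, one frame per time slot. It is a generic time-division multiplexing model written for this explanation, not the TDM circuit of Figures 9(A) to 9(C); the function names and the three-signals-over-two-wires scenario are assumptions.

```python
import math
from typing import List


def slots_needed(num_signals: int, num_wires: int) -> int:
    """Number of time slots required to carry all signals over the wires."""
    return math.ceil(num_signals / num_wires)


def tdm_transmit(values: List[int], num_wires: int) -> List[List[int]]:
    """Transmission side: split the signal values into one frame per time slot."""
    return [values[i:i + num_wires] for i in range(0, len(values), num_wires)]


def tdm_receive(frames: List[List[int]]) -> List[int]:
    """Receiver side: concatenate the frames back into the original order."""
    return [value for frame in frames for value in frame]


if __name__ == "__main__":
    logical_values = [1, 0, 1]           # three logical inter-chip signals
    frames = tdm_transmit(logical_values, num_wires=2)
    print("time slots:", slots_needed(len(logical_values), 2))
    print("frames:", frames)
    print("recovered:", tdm_receive(frames))
```

The trade-off is time for pins: fewer physical wires are needed, but each evaluation of the inter-chip signals takes more than one transfer slot.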
Chips 930 and 940 are connected to group bus 950. Because only two chips need to be implemented, both chips may be placed in the same group, or each chip may reside in a different group. Ideally, one chip is connected to one group bus and the other chip to another group bus, so as to ensure that the throughput at the FPGA interface equals the throughput at the PCI interface.
The foregoing describes a preferred embodiment of the present invention, presented for purposes of illustration and description. It is not intended to be exhaustive, nor to limit the invention to the precise form disclosed. Obviously, many modifications and variations will be apparent to those skilled in the art. Those skilled in the art will readily recognize that other applications may be substituted for those set forth herein without departing from the spirit and scope of the present invention. Accordingly, the invention should be limited only by the claims below.

Claims (2)

1. A logic device, comprising:
a first logic having a first input for receiving first data having a first value, a second input, a first output, and a control input for receiving a control signal; and
a second logic for storing a current value, having a first trigger input, a second logic input coupled to the first output, and a second logic output coupled to the second input of the first logic, wherein, when a trigger signal is received at the trigger input, the second logic updates to the first value of the first data at the first output and provides the first data to the second input of the first logic, regardless of the order in which the control signal at the control input and the first data at the first input arrive at the first logic.
2. The device of claim 1, further comprising:
a third logic for storing a new value, having a fourth input, a second trigger input, and a third output, wherein the third output is coupled to the first input of the first logic; and
an edge detector comprising a clock input, a third trigger input, and a fourth output, wherein the fourth output is coupled to the control input, and wherein the trigger signal is applied to the second and third trigger inputs at a selected time to update the logic device.
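For readers who want an intuition for the order-independence recited in claim 1, a minimal behavioral sketch in Python follows. It is not part of the claims and does not define their scope. It assumes, as an interpretation only, that the first logic acts as a two-input selector (new data when the control signal is asserted, otherwise the fed-back current value) and that the second logic is a register updated only on the trigger; the class and method names are hypothetical.

```python
class LatchSketch:
    """Behavioral sketch of claim 1 under the selector/register assumption."""

    def __init__(self, initial: int = 0) -> None:
        self.current = initial  # second logic: stored current value
        self.data = initial     # first input of the first logic
        self.control = 0        # control input of the first logic

    def first_logic_output(self) -> int:
        # Assumed selection: pass new data when control is asserted,
        # otherwise feed back the stored current value.
        return self.data if self.control else self.current

    def trigger(self) -> int:
        # The stored value changes only here, so the arrival order of data
        # and control before the trigger cannot produce a glitch.
        self.current = self.first_logic_output()
        return self.current


def run(order: list) -> int:
    """Apply data and control in the given order, then fire the trigger."""
    device = LatchSketch(initial=0)
    for name in order:
        if name == "data":
            device.data = 1
        else:
            device.control = 1
    return device.trigger()


if __name__ == "__main__":
    print(run(["data", "control"]), run(["control", "data"]))
```

Both orderings yield the same stored value after the trigger, which is the property the claim language describes.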
CN01822790A 2001-08-14 2001-08-14 Timing-insensitive glitch-free logic device Expired - Fee Related CN100578510C (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2001/025546 WO2003017148A1 (en) 1997-05-02 2001-08-14 Timing-insensitive glitch-free logic system and method

Publications (2)

Publication Number Publication Date
CN1491394A CN1491394A (en) 2004-04-21
CN100578510C true CN100578510C (en) 2010-01-06

Family

ID=21742774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN01822790A Expired - Fee Related CN100578510C (en) 2001-08-14 2001-08-14 Timing-insensitive glitch-free logic device

Country Status (6)

Country Link
EP (1) EP1417605A4 (en)
JP (1) JP4125675B2 (en)
KR (1) KR20040028599A (en)
CN (1) CN100578510C (en)
CA (1) CA2420022A1 (en)
IL (2) IL154480A0 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6009256A (en) * 1997-05-02 1999-12-28 Axis Systems, Inc. Simulation/emulation system and method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5801955A (en) * 1996-05-31 1998-09-01 Mentor Graphics Corporation Method and apparatus for removing timing hazards in a circuit design
US5748911A (en) * 1996-07-19 1998-05-05 Compaq Computer Corporation Serial bus system for shadowing registers
US6134516A (en) * 1997-05-02 2000-10-17 Axis Systems, Inc. Simulation server system and method

Also Published As

Publication number Publication date
EP1417605A1 (en) 2004-05-12
CN1491394A (en) 2004-04-21
EP1417605A4 (en) 2009-07-15
IL154480A0 (en) 2003-09-17
JP4125675B2 (en) 2008-07-30
KR20040028599A (en) 2004-04-03
IL154480A (en) 2008-11-26
JP2005500625A (en) 2005-01-06
CA2420022A1 (en) 2003-02-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: CADENCE DESIGN SYSTEMS INC. (US)

Free format text: FORMER OWNER: VERISITY DESIGN INC.

Effective date: 20130301

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20130301

Address after: California, USA

Patentee after: Cadence Design Systems Inc. (US)

Address before: California, USA

Patentee before: Verisity Design Inc.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100106

Termination date: 20140814

EXPY Termination of patent right or utility model