KR20040028598A

KR20040028598A - VCD-on-demand system and method

Info

Publication number: KR20040028598A
Application number: KR10-2003-7002218A
Authority: KR
Inventors: Ping-Sheng Tseng; Yogesh Kumar Goel; Quincy Kun-Hsu Shen
Original assignee: Axis Systems, Inc.
Priority date: 2001-08-14
Filing date: 2001-08-14
Publication date: 2004-04-03
Also published as: CN1308819C; EP1417577A4; IL154481A; IL154481A0; CA2420027A1; WO2003017099A1; KR100928134B1; EP1417577A1; CN1491385A; JP4102752B2; IL160392A0; CA2420027C; JP2005500618A

Abstract

The present invention relates to so-called VCD on-demand. In a typical system, an EDA tool incorporating the VCD on-demand technology has the following high-level attributes: (1) RCC-based parallel simulation history compression and recording, (2) RCC-based parallel simulation history compression and VCD file generation, and (3) on-demand software regeneration for a selected simulation target range and design review without simulation rerun. Each of these attributes is described in detail. When the user selects a simulation session range, the RCC system records a highly compressed version of the primary inputs from the test bench process. The user then selects a narrower region within the simulation session range, called the simulation target range, for more focused analysis. The RCC system dumps the hardware state information (i.e., the primary inputs) of the hardware model into a VCD file. The RCC system then allows the user to proceed directly to viewing the VCD file from the beginning of the simulation target range, without having to rerun the entire simulation from the beginning of the simulation session range.

Description

VCD-on-demand system and method {VCD-ON-DEMAND SYSTEM AND METHOD}

In general, electronic design automation (EDA) is a computer-based tool configured on various workstations to provide designers with automated or semi-automated tools for designing and verifying a user's custom circuit designs. EDA is generally used for creating, analyzing, and editing any electronic design for the purpose of simulation, emulation, prototyping, execution, or computing. EDA technology is also used to develop systems (i.e., target systems) that will use the user-designed subsystem or component. The end result of EDA is a modified and enhanced design, typically in the form of discrete integrated circuits or printed circuit boards, that improves on the original design while maintaining its spirit.

The value of simulating a circuit design in software before hardware emulation is recognized in the various industries that use and benefit from EDA technology. Nevertheless, current software simulation and hardware emulation/acceleration are cumbersome for the user because these processes are separate and independent. For example, within a single debug/test session the user may want to debug or simulate the circuit design using software simulation for part of the time, use those results to accelerate the simulation process with a hardware model during other times, and return to software simulation later. Furthermore, because internal register and combinational logic values change as simulation time advances, the user should be able to monitor these changes even when they occur in the hardware model during the hardware acceleration/emulation process.

Co-simulation arose out of a need to address some of the problems caused by the cumbersome use of two separate and independent processes, pure software simulation and pure hardware emulation/acceleration, and to make the overall system more user-friendly. However, co-simulation still has several drawbacks: (1) co-simulation systems require manual partitioning, (2) co-simulation uses two loosely coupled engines, (3) co-simulation speed is as slow as software simulation speed, and (4) co-simulation systems encounter race conditions.

First, partitioning between software and hardware is done manually instead of automatically, which further burdens the user. In essence, co-simulation requires the user to partition the design (starting at the behavior level, then RTL, and then the gate level) and to test the models themselves across software and hardware in very large functional blocks. Such a constraint demands some degree of sophistication from the user.

Second, co-simulation systems use two loosely coupled and independent engines, which raises inter-engine synchronization, coordination, and flexibility issues. Co-simulation requires the synchronization of two different verification engines: software simulation and hardware emulation. Even though the software simulation side is coupled to the hardware acceleration side, only external pin-out data is available for inspection and loading. Values inside the modeled circuit at the register and combinational logic level are not available for easy inspection or for downloading from one side to the other, which limits the utility of these co-simulation systems. Typically, the user may have to re-simulate the whole design after switching from software simulation to hardware acceleration and back. Thus, if the user wants to switch between software simulation and hardware emulation during a single debug session while still being able to inspect register and combinational logic values, the co-simulation system does not provide this capability.

Third, co-simulation speed is as slow as simulation speed. Co-simulation requires the synchronization of two different verification engines: software simulation and hardware emulation. Each engine has its own control mechanism for driving the simulation or emulation. This means that the synchronization between software and hardware pushes overall performance down to a speed as low as that of software simulation. The additional overhead of coordinating the operation of these two engines adds to the slow speed of co-simulation systems.

Fourth, co-simulation systems encounter setup, hold time, and clock glitch problems caused by race conditions among clock signals. Co-simulation uses hardware-driven clocks, which may arrive at the inputs of different logic elements at different times because of differing wire lengths. When these logic elements should evaluate data together, this raises the uncertainty of the evaluation results, because some logic elements evaluate data during one time period while other logic elements evaluate data during a different time period.
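The effect of a late-arriving clock can be illustrated with a deliberately simplified sketch. It assumes a two-flip-flop shift register and a hypothetical flag, `clk2_late`, that stands in for wire-length skew on the second flip-flop's clock; neither the function nor the flag comes from the source.

```python
def clock_edge(d_in, q1, q2, clk2_late=False):
    """Apply one rising clock edge to a two-flip-flop shift register.

    Returns the new (q1, q2). With clk2_late=True, the second flip-flop's
    clock is assumed to arrive after the first flip-flop has already updated
    its output, so the second stage samples the new value instead of the old.
    """
    new_q1 = d_in
    new_q2 = new_q1 if clk2_late else q1
    return new_q1, new_q2


# Aligned clocks: the 1 moves one stage per edge.
aligned = clock_edge(1, 0, 0)

# Skewed clock: the 1 races through both stages on a single edge.
skewed = clock_edge(1, 0, 0, clk2_late=True)
```

With aligned clocks the second stage still holds its old input after the edge; with the skewed clock the same input reaches both stages at once, which is the kind of evaluation uncertainty the paragraph above describes.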

Another problem faced by a typical designer is the relatively slow process of isolating and identifying design problems during debug. Although the designer's own limited problem-solving ability may contribute to some of this slow pace, the major source of the problem is the simulator itself. Not only is the simulator slow because of its software-based engine, but debugging with the simulator also requires the entire simulation to be rerun. These problems are discussed further below.

A typical ASIC chip designer debugs with a simulator; that is, the designer simulates or tests the design using a test bench process in order to, among other things, observe its response to various stimuli. Based on a review of some key nodes of the design, the designer can generally determine whether the design has any flaws. Of course, if the design is at an early stage, it almost always has some problems.

However, locating the bugs is not easy. For a fairly large and complex design (e.g., several million gates or more), the simulator must run through numerous simulation time steps before one of the bugs manifests itself. Clearly, for such designs, the designer cannot be expected to review each and every simulation time step. Frankly, this task would be impossible given the short time span of a product design's development cycle.

Once the simulator reveals that bugs are present in general, the actual bugs must be located specifically so that they can be removed from the flawed design. When (i.e., at which simulation time step) did the problem occur? Did it occur early in the simulation (e.g., t10), in the middle (e.g., t1000), or late (e.g., t1000000)? Also, where (i.e., at which physical location in the circuit design) is the problem located, so that a fix can be provided? Initially, the designer cannot know exactly when (at which simulation time step) the bug occurred, but he can make a reasonable guess. The designer has several options for proceeding to the exact simulation time at which he thinks the problem is located. The simulator can assist the designer in this task by providing value change dump (VCD) files through one of two conventional methods: full VCD and selective VCD.

In the full VCD method, the simulator saves the entire simulation as a VCD file, from simulation time t0 to the end of the simulation. The designer then analyzes this VCD file to isolate the bug. The designer can make a reasonable guess as to the bug's general location and then home in on it with some fine stepping; that is, if the designer suspects that the bug occurred somewhere between simulation times t350 and t400, he will advance to a simulation time located just before the suspected times, such as t345. The designer will then proceed to examine this suspect region (i.e., t345 to t400) very carefully.
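For orientation, the kind of file a full dump produces can be sketched as follows. This is a minimal illustration, assuming 1-bit signals and the common textual VCD layout (a header of `$var` declarations followed by `#time` markers and value-change lines); the function name `write_vcd`, the module name `top`, and the sample signals are hypothetical and do not come from the source.

```python
from typing import Dict, List, Tuple


def write_vcd(signals: Dict[str, str],
              changes: List[Tuple[int, str, int]],
              timescale: str = "1ns") -> str:
    """Render 1-bit value changes as VCD-style text.

    signals maps a signal name to its short identifier code.
    changes is a list of (time, signal_name, value) samples.
    """
    lines = [f"$timescale {timescale} $end", "$scope module top $end"]
    for name, code in signals.items():
        lines.append(f"$var wire 1 {code} {name} $end")
    lines += ["$upscope $end", "$enddefinitions $end"]

    last_value = {}
    current_time = None
    for t, name, value in sorted(changes, key=lambda c: c[0]):
        if last_value.get(name) == value:
            continue  # only changes are stored, not every sample
        if t != current_time:
            lines.append(f"#{t}")  # timestamp line precedes its changes
            current_time = t
        lines.append(f"{value}{signals[name]}")
        last_value[name] = value
    return "\n".join(lines) + "\n"


samples = [(0, "clk", 0), (0, "q", 0), (5, "clk", 1),
           (10, "clk", 0), (10, "q", 0), (15, "clk", 1), (15, "q", 1)]
text = write_vcd({"clk": "!", "q": '"'}, samples)
```

Because every change of every signal from t0 onward is written out, the file grows with both design size and simulation length, which is the storage cost discussed in the paragraphs that follow.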

However, to reach this simulation time, the designer must rerun the entire simulation from the beginning (i.e., t0) with the VCD file, regardless of where the bug occurred. If his initial guess as to the location of the bug is wrong, he must make another guess and rerun the simulation from the beginning again. For designs with a million or more gates and a million or more simulation time steps, this debug process of rerunning the simulation from the beginning is very time-consuming, and wrong guesses make it worse.

Moreover, a design with a million or more gates and a million or more simulation time steps requires a great deal of disk space. A full VCD file of approximately 100 GB is not unusual. Accessing storage on this scale takes hours, and as a result the simulation must be paused until the storage operation at a given simulation time has completed. Today, the full VCD method is no longer practical.

In the selective VCD method, the entire simulation is not saved; rather, the simulator saves only the portion of the simulation selected by the designer. However, selective VCD does not relieve the designer of the need to rerun the entire simulation from the beginning. At the outset, the designer must run the simulation and observe the problems in the design. He then makes a guess as to where the problem is located. If the designer guesses that the problem occurred somewhere between simulation times t350 and t400, he reruns the simulation and instructs the simulator to save this simulation time range as a VCD file. Thereafter, the designer can examine the VCD file corresponding to his guess. If his guess as to the location of the problem was wrong, he must make another guess, instruct the simulator to save the new simulation range as a VCD file, and then rerun the simulation. The designer then analyzes the VCD file again.

Unlike the full VCD method, selective VCD does not require much disk space because the entire simulation is not saved. However, selective VCD still requires the entire simulation to be rerun. If the designer makes a wrong guess in locating the bug, he must rerun the simulation again to save the new simulation range in a VCD file. In either case, the selective VCD method still wastes time, and wrong guesses make the waste worse.
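The time cost of wrong guesses under selective VCD can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions: one initial full run to observe the problem, then one rerun per guess that starts at t0 and is optimistically allowed to stop at the end of the guessed window (the text above speaks of rerunning the entire simulation, so this is a lower bound). The function name and the example numbers are hypothetical.

```python
def selective_vcd_cost(total_steps, guesses):
    """Estimate simulated and dumped time steps for selective VCD debugging.

    total_steps: number of time steps in one complete simulation run.
    guesses: list of (start, end) windows, one per rerun, inclusive.
    Returns (steps_simulated, steps_dumped).
    """
    steps_simulated = total_steps            # initial run to observe the bug
    steps_dumped = 0
    for start, end in guesses:
        steps_simulated += end               # rerun from t0 up to window end
        steps_dumped += end - start + 1      # only the window is written out
    return steps_simulated, steps_dumped


# One wrong guess near the start, then a correct guess late in the run.
cost = selective_vcd_cost(1_000_000, [(350, 400), (900_000, 900_050)])
```

In this example only 102 time steps are dumped, so disk usage stays small, yet about 1.9 million steps are simulated, most of them spent getting back to the window of interest.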

Accordingly, a need exists in the industry for a system and method that address the problems raised by currently known simulation systems, hardware emulation systems, hardware accelerators, co-simulation, and coverification systems.

This application is related to a continuation-in-part (CIP) of U.S. patent application Ser. No. 09/144,222, filed with the United States Patent and Trademark Office (USPTO) on August 31, 1998.

The present invention relates generally to improvements to the value change dump (VCD) for accelerating design debug sessions.

FIG. 1 shows a high-level overview of one embodiment of the present invention, including the workstation, the reconfigurable hardware emulation model, the emulation interface, and the target system coupled to a PCI bus;

FIG. 2 shows one particular usage flow diagram of the present invention;

FIG. 3 shows a high-level diagram of the software compilation and hardware configuration during compile time and run time in accordance with one embodiment of the present invention;

FIG. 4 shows a flow diagram of the compilation process, which includes generating the software/hardware models and the software kernel code;

FIG. 5 shows the software kernel that controls the overall SEmulation system;

FIG. 6 shows a method of mapping hardware models onto reconfigurable boards through mapping, placement, and routing;

FIG. 7 shows the connectivity matrix for the FPGA array shown in FIG. 8;

FIG. 8 shows one embodiment of the 4x4 FPGA array and its interconnections;

FIGS. 9(A), 9(B), and 9(C) show one embodiment of the time division multiplexed (TDM) circuit, in which a group of wires is coupled together in a time-multiplexed fashion so that one pin, instead of a plurality of pins, can be used for that group of wires in a chip. FIG. 9(A) presents an overview of the pin-out problem, FIG. 9(B) provides a TDM circuit for the transmitting side, and FIG. 9(C) provides a TDM circuit for the receiving side;

FIG. 10 shows a SEmulation system architecture in accordance with one embodiment of the present invention;

FIG. 11 shows one embodiment of the address pointer of the present invention;

FIG. 12 shows a state transition diagram of the address pointer initialization for the address pointer of FIG. 11;

FIG. 13 shows one embodiment of the MOVE signal generator for derivatively generating the various MOVE signals for the address pointer;

FIG. 14 shows the chain of multiplexed address pointers in each FPGA chip;

FIG. 15 shows one embodiment of the multiplexed cross-chip address pointer chain in accordance with one embodiment of the present invention;

FIG. 16 shows a flow diagram of the clock/data network analysis that is critical to the software clock implementation and to the evaluation of logic components in the hardware model;

FIG. 17 shows a basic building block of the hardware model in accordance with one embodiment of the present invention;

FIGS. 18(A) and 18(B) show the register model implementation for latches and flip-flops;

FIG. 19 shows one embodiment of the clock edge detection logic in accordance with one embodiment of the present invention;

FIG. 20 shows a four-state finite state machine for controlling the clock edge detection logic of FIG. 19 in accordance with one embodiment of the present invention;

FIG. 21 shows the interconnection, JTAG, FPGA bus, and global signal pin designations for each FPGA chip in accordance with one embodiment of the present invention;

FIG. 22 shows one embodiment of the FPGA controller between the PCI bus and the FPGA array;

FIG. 23 shows a more detailed illustration of the CTRL_FPGA unit and data buffer discussed with respect to FIG. 22;

FIG. 24 shows the 4x4 FPGA array, its relationship to the FPGA, and its expansion capability;

FIG. 25 shows one embodiment of the hardware start-up method;

FIG. 26 shows the HDL code for one example of a user circuit design to be modeled and simulated;

FIG. 27 shows a circuit diagram that symbolically represents the circuit design of the HDL code of FIG. 26;

FIG. 28 shows the component type analysis for the HDL code of FIG. 26;

FIG. 29 shows a signal network analysis of the structured RTL HDL code based on the user's custom circuit design shown in FIG. 26;

FIG. 30 shows the software/hardware partition result for the same hypothetical example;

FIG. 31 shows a hardware model for the same hypothetical example;

FIG. 32 shows one particular hardware model-to-chip partition result for the same hypothetical example of a user's custom circuit design;

FIG. 33 shows another particular hardware model-to-chip partition result for the same hypothetical example of a user's custom circuit design;

FIG. 34 shows the logic patching operation for the same hypothetical example of a user's custom circuit design;

FIGS. 35(A) to 35(D) illustrate the principle of "hops" and interconnections with two examples;

FIG. 36 shows an overview of the FPGA chip used in the present invention;

FIG. 37 shows the FPGA interconnection buses on the FPGA chip;

FIGS. 38(A) and 38(B) show side views of the FPGA board connections in accordance with one embodiment of the present invention;

FIG. 39 shows a direct-neighbor and one-hop six-board interconnect layout of the FPGA array in accordance with one embodiment of the present invention;

FIGS. 40(A) and 40(B) show FPGA inter-board interconnect schemes;

FIGS. 41(A) to 41(F) show top views of the board interconnect connectors;

FIG. 42 shows the on-board connectors and some components of a representative FPGA board;

FIG. 43 shows a legend for the connectors of FIGS. 41(A) to 41(F) and FIG. 42;

FIG. 44 shows a direct-neighbor and one-hop dual-board interconnect layout of the FPGA array in accordance with another embodiment of the present invention;

FIG. 45 shows a workstation with multiprocessors in accordance with another embodiment of the present invention;

FIG. 46 shows an environment in accordance with another embodiment of the present invention in which multiple users share a single simulation/emulation system on a time-shared basis;

FIG. 47 shows a high-level structure of the simulation server in accordance with one embodiment of the present invention;

FIG. 48 shows the architecture of the simulation server in accordance with one embodiment of the present invention;

FIG. 49 shows a flow diagram of the simulation server;

FIG. 50 shows a flow diagram of the job swapping process;

FIG. 51 shows the signals between the device driver and the reconfigurable hardware unit;

FIG. 52 illustrates the time-sharing feature of the simulation server for handling multiple jobs with different levels of priority;

FIG. 53 shows the communication handshake signals between the device driver and the reconfigurable hardware unit;

FIG. 54 shows the state diagram of the communication handshake protocol;

FIG. 55 shows an overview of the client-server model of the simulation server in accordance with one embodiment of the present invention;

FIG. 56 shows a high-level block diagram of the simulation system for implementing memory mapping in accordance with one embodiment of the present invention;

FIG. 57 shows a more detailed block diagram of the memory mapping aspect of the simulation system, with the supporting components for the memory finite state machine (MEMFSM) and the evaluation finite state machine for each FPGA logic device (EVALFSMx);

FIG. 58 shows a state diagram of the finite state machine of the MEMFSM in the CTRL_FPGA unit in accordance with one embodiment of the present invention;

FIG. 59 shows a state diagram of the finite state machine in each FPGA chip in accordance with one embodiment of the present invention;

FIG. 60 shows the memory read data double buffer;

FIG. 61 shows the simulation write/read cycle in accordance with one embodiment of the present invention;

FIG. 62 shows a timing diagram of the simulation data transfer operation when the DMA read operation occurs after the CLK_EN signal;

FIG. 63 shows a timing diagram of the simulation data transfer operation when the DMA read operation occurs near the end of the EVAL period;

FIG. 64 shows a typical user design implemented as a PCI add-on card;

FIG. 65 shows a hardware/software coverification system using an ASIC as the device under test;

FIG. 66 shows a typical coverification system using an emulator, where the device under test is programmed into the emulator;

FIG. 67 shows a simulation system in accordance with one embodiment of the present invention;

FIG. 68 shows a coverification system without external I/O devices in accordance with one embodiment of the present invention, in which the RCC computing system contains software models of the various I/O devices and the target system;

FIG. 69 shows a coverification system with actual external I/O devices and a target system in accordance with another embodiment of the present invention;

FIG. 70 shows a more detailed logic diagram of the data-in portion of the control logic in accordance with one embodiment of the present invention;

FIG. 71 shows a more detailed logic diagram of the data-out portion of the control logic in accordance with one embodiment of the present invention;

FIG. 72 shows a timing diagram of the data-in portion of the control logic;

FIG. 73 shows a timing diagram of the data-out portion of the control logic;

FIG. 74 shows a board layout of the RCC hardware array in accordance with one embodiment of the present invention;

FIG. 75(A) shows an example of a shift register circuit that will be used to explain the hold time and clock glitch problems;

FIG. 75(B) shows a timing diagram of the shift register circuit shown in FIG. 76(A) to illustrate hold time;

FIG. 76(A) shows the same shift register circuit shown in FIG. 75(A) placed across multiple FPGA chips;

FIG. 76(B) shows a timing diagram of the shift register circuit shown in FIG. 76(A) to illustrate a hold time violation;

FIG. 77(A) shows an example of a logic circuit that will be used to explain the clock glitch problem;

FIG. 77(B) shows a timing diagram of the logic circuit of FIG. 77(A) to illustrate the clock glitch problem;

FIG. 78 shows a prior art timing adjustment technique for solving the hold time violation problem;

FIG. 79 shows a prior art timing resynthesis technique for solving the hold time violation problem;

FIG. 80(A) shows an original latch, and FIG. 80(B) shows a timing-insensitive and glitch-free latch in accordance with one embodiment of the present invention;

FIG. 81(A) shows an original design flip-flop, and FIG. 81(B) shows a timing-insensitive and glitch-free design type flip-flop in accordance with one embodiment of the present invention;

FIG. 82 shows a timing diagram of the trigger mechanism of the timing-insensitive and glitch-free latch and flip-flop in accordance with one embodiment of the present invention.

이들 특징은 본 발명의 여러가지 다양한 태양 및 실시예와 관련하여 이하에서 논의되리 것이다.These features will be discussed below in connection with various various aspects and embodiments of the invention.

도 83은 본 발명의 일실시예를 병합하는 RCC 시스템의 컴포넌트의 상위도를 나타내며,83 illustrates a top view of components of an RCC system incorporating an embodiment of the present invention.

도 84는 본 발명의 일실시예에 따라서 주문형 VCD 동작을 설명하는 몇가지 시뮬레이션 시간 기간을 나타낸다.84 illustrates several simulation time periods illustrating VCD operation on demand in accordance with an embodiment of the present invention.

One embodiment of the present invention provides VCD files on demand without rerunning the simulation. The VCD on-demand feature is incorporated into the RCC system, which includes an RCC computing system and an RCC hardware accelerator. The RCC computing system contains the computational resources the user needs to simulate the entire software-modeled design in software and to control hardware acceleration of the hardware-modeled portion of the design. The RCC hardware accelerator contains a reconfigurable array of logic elements (e.g., FPGAs) that can model at least a portion of the user's design in hardware so that the user can accelerate the debugging process. The RCC computing system is tightly coupled to the RCC hardware accelerator through the software clock.

VCD on-demand lets the user select a portion of the simulation history for detailed debugging analysis without rerunning the simulation. The RCC system lets the user select two simulation time ranges: a broader "simulation session range" and a narrower subset of that range called the "simulation target range." After the simulation session range is selected, the RCC system fast-simulates the design for the entire duration of the session range by delivering primary inputs from the test bench process to the hardware model in the RCC hardware accelerator for evaluation. These same primary inputs are also compressed and recorded in a simulation history file. With this simulation history file, the RCC system can reproduce any portion of the simulation within the simulation session range at any time.

At the beginning of the simulation session range, the RCC system saves the hardware state information of the design at that point so that the user can perform offline simulation when needed. At the end of the simulation session range, the RCC system again saves the hardware state information of the design, so that the user can return quickly to that point and continue simulating past the simulation session range at any time without rerunning the simulation.

When the user selects the "simulation target range," the RCC system decompresses the compressed primary inputs in the simulation history file and delivers the decompressed primary inputs to the RCC hardware accelerator for evaluation, fast-simulating up to the beginning of the simulation target range. Within the simulation target range, the RCC system dumps the evaluated results, or primary outputs, from the hardware model into a VCD file stored on the system disk. At the end of the simulation target range, the RCC system stops the dump process.
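The record, replay, and dump steps described above can be sketched as follows. This is a minimal illustration, not the patented implementation: `ToyHardwareModel`, `run_session`, and `dump_target_range` are hypothetical names, the "hardware model" is a single 8-bit accumulator, the history is compressed with `zlib` purely as a stand-in for whatever compression the RCC system uses, and the dump is a plain list rather than a real VCD file.

```python
import json
import zlib


class ToyHardwareModel:
    """Stand-in for the hardware model: a single 8-bit accumulator register."""

    def __init__(self, state=0):
        self.state = state

    def step(self, primary_input):
        # Evaluate one cycle and return the primary output.
        self.state = (self.state + primary_input) & 0xFF
        return self.state


def run_session(model, inputs):
    """Simulate the session range; return the saved start state and compressed history."""
    start_state = model.state  # hardware state saved at the start of the session range
    for value in inputs:
        model.step(value)
    history = zlib.compress(json.dumps(inputs).encode())
    return start_state, history


def dump_target_range(start_state, history, target_start, target_end):
    """Replay from the session start, dumping outputs only inside the target range."""
    inputs = json.loads(zlib.decompress(history).decode())
    model = ToyHardwareModel(start_state)
    dump = []
    for cycle, value in enumerate(inputs[:target_end + 1]):
        output = model.step(value)
        if cycle >= target_start:  # cycles before this are only fast-forwarded
            dump.append((cycle, output))
    return dump


inputs = [1, 2, 3, 4, 5, 6]
start_state, history = run_session(ToyHardwareModel(), inputs)
print(dump_target_range(start_state, history, 2, 4))
```

Choosing a different target range only requires another call to `dump_target_range` with the same saved state and history; the test bench itself is never rerun.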

Once the VCD file has been generated, the user can examine it with a waveform viewer to debug the design in more detail. If the bug is not located in the chosen simulation target range, the user can select another simulation target range within the same simulation session range. Once the new simulation target range has been selected, the RCC system generates a new VCD file in the manner described above. The user can then analyze this new VCD file to isolate the bug.

Once the bug has been isolated and fixed, the user can move past the current simulation session range and simulate the next simulation session range. The hardware state information saved at the end of the current simulation session range is loaded into the RCC system, and the user can then resume simulating. The VCD on-demand feature is available both online and offline.

These and other embodiments are fully discussed and illustrated in the detailed description that follows.

The above objects and description of the present invention may be better understood with the aid of the following text and accompanying drawings.

This specification describes the various embodiments of the present invention through and within the context of a system called the "SEmulator" or "SEmulation" system. Throughout the specification, the terms "SEmulator system," "SEmulator," or simply "system" may be used. These terms refer to various apparatus and method embodiments in accordance with the present invention for any combination of four operating modes: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis, including their respective set-up or pre-processing stages. At other times, the term "SEmulation" may be used. This term refers to the novel processes described herein.

Similarly, terms such as "reconfigurable computing (RCC) array system" or "RCC computing system" refer to the portion of the simulation/coverification system that contains the main processor, the software kernel, and the software model of the user design. Terms such as "reconfigurable hardware array" or "RCC hardware array" refer to the portion of the simulation/coverification system that contains the hardware model of the user design and that, in one embodiment, contains the array of reconfigurable logic elements.

The specification also refers to a "user" and to a user's "circuit design" or "electronic design." The "user" is a person who uses the SEmulation system through its interfaces and may be the designer of a circuit or a test/debugger who played little or no part in the design process. The "circuit design" or "electronic design" is a custom-designed system or component, whether software or hardware, that can be modeled by the SEmulation system for test/debug purposes. In many cases, the "user" also designed the "circuit design" or "electronic design."

The specification also uses the terms "wire," "wire line," "wire/bus line," and "bus." These terms refer to various electrically conducting lines. Each line may be a single wire between two points or several wires between points. The terms are interchangeable in that a "wire" may comprise one or more conducting lines and a "bus" may also comprise one or more conducting lines.

This specification is presented in outline form. First, the specification presents a general overview of the SEmulator system, including an overview of the four operating modes and the hardware implementation schemes. Second, the specification provides a detailed discussion of the SEmulator system. In some cases, one figure may provide a variation of an embodiment shown in a previous figure. In these cases, like reference numerals are used for like components, units, or processes. The outline of the specification is as follows:

I. Overview

A. Simulation/Hardware Acceleration Mode

B. Emulation with Target System Mode

C. Post-Simulation Analysis Mode

D. Hardware Implementation Schemes

E. Simulation Server

F. Memory Simulation

G. Coverification System

II. System Description

III. Simulation/Hardware Acceleration Mode

IV. Emulation with Target System Mode

V. Post-Simulation Analysis Mode

VI. Hardware Implementation Schemes

A. Overview

B. Address Pointer

C. Gated Data/Clock Network Analysis

D. FPGA Array and Control

E. Alternate Embodiment Using Higher-Density FPGA Chips

F. TIGF Logic Devices

VII. Simulation Server

VIII. Memory Simulation

IX. Coverification System

X. Examples

I. Overview

The various embodiments of the present invention have four general modes of operation: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation, and (4) post-simulation analysis. The various embodiments include systems and methods for these modes with at least some of the following features:

(1) a software and hardware model with a single tightly coupled simulation engine, the software kernel, which controls the software and hardware models cycle by cycle; (2) automatic component type analysis during the compilation process for software and hardware model generation and partitioning; (3) the ability to switch, cycle by cycle, among software simulation mode, simulation through hardware acceleration mode, in-circuit emulation mode, and post-simulation analysis mode; (4) full hardware model visibility through software regeneration of combinational components; (5) double-buffered clock modeling with software clocks and gated clock/data logic to avoid race conditions; and (6) the ability to re-simulate or hardware-accelerate the user's circuit design from any selected point in a past simulation session. The end result is a flexible and fast simulator/emulator system and method with full HDL functionality and emulator execution performance.

A. Simulation/Hardware Acceleration Mode

Through automatic component type analysis, the SEmulator system can model the user's custom circuit design in both software and hardware. The entire user circuit design is modeled in software, whereas evaluation components (i.e., register components and combinational components) are modeled in hardware. Hardware modeling is facilitated by the component type analysis.

A software kernel, residing in the main memory of the general-purpose processor system, serves as the main program of the SEmulator system and controls the overall operation and execution of its various modes and features. As long as any test bench process is active, the kernel evaluates active test bench components, evaluates clock components, detects clock edges to update registers and memories as well as to propagate combinational logic data, and advances the simulation time. This software kernel provides for the tightly coupled nature of the simulator engine with the hardware acceleration engine. At the software/hardware boundary, the SEmulator system provides a number of I/O address spaces: REG (register), CLK (software clock), S2H (software to hardware), and H2S (hardware to software).

The SEmulator can switch selectively among the four modes of operation. The user of the system can start simulation, stop simulation, assert input values, inspect values, single-step cycle by cycle, and switch back and forth among the four modes. For example, the system can simulate the circuit in software for a period of time, accelerate the simulation through the hardware model, and then return to software simulation mode.

Generally, the SEmulation system gives the user the ability to "see" every modeled component, regardless of whether it is modeled in software or hardware. For a variety of reasons, combinational components are not as "visible" as registers, so obtaining combinational component data is difficult. One reason is that the FPGAs used in the reconfigurable board to model the hardware portion of the user's circuit design typically model combinational components as look-up tables (LUTs) rather than as actual combinational components. Accordingly, the SEmulation system reads register values and then regenerates the combinational components in software. Because regenerating combinational components carries some overhead, this regeneration process is not performed all the time; rather, it is done only on the user's request.
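As a rough illustration of this regeneration step, the sketch below recomputes combinational net values from register values read back from hardware. The netlist representation (a list of gates in topological order) and the names `regenerate_combinational`, `gates`, `q0`, `n1`, and so on are assumptions made for the example, not structures named in the specification.

```python
def regenerate_combinational(register_values, gates):
    """Recompute combinational net values in software from register values.

    gates: list of (output_net, function, input_nets), assumed to be
    listed in topological order so every input is known before it is used.
    """
    nets = dict(register_values)
    for output_net, function, input_nets in gates:
        nets[output_net] = function(*(nets[name] for name in input_nets))
    return nets


# Hypothetical netlist: n1 = q0 AND q1, n2 = n1 XOR q2
gates = [
    ("n1", lambda a, b: a & b, ("q0", "q1")),
    ("n2", lambda a, b: a ^ b, ("n1", "q2")),
]
values = regenerate_combinational({"q0": 1, "q1": 1, "q2": 0}, gates)
print(values["n1"], values["n2"])
```

Because only the register values cross the hardware/software boundary, the cost of this recomputation is paid in software and only when the user asks for it.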

Because the software kernel resides on the software side, a clock edge detection mechanism is provided to trigger the generation of a so-called software clock, which drives the enable input of the various registers in the hardware model. The timing is strictly controlled through a double-buffered circuit implementation so that the software clock enable signal reaches the register models before the data does. Once the data inputs to these register models have stabilized, the software clock gates the data synchronously, ensuring that all data values are gated together without any risk of hold-time violations.
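The effect of the double-buffered scheme can be sketched with a toy shift register. This is an illustrative software model only, assuming a simple chain of registers; `skewed_update` and `double_buffered_update` are hypothetical names. The first function models registers that are clocked one after another, so new data races through several stages in a single cycle; the second captures every data input first and then updates all registers together.

```python
def skewed_update(q, data_in):
    """Registers clocked one after another: new data races through the chain."""
    q = list(q)
    q[0] = data_in
    for i in range(1, len(q)):
        q[i] = q[i - 1]  # sees the already-updated upstream value
    return q


def double_buffered_update(q, data_in):
    """Phase 1 captures every stable D input; phase 2 updates all registers together."""
    first_buffer = [data_in] + list(q[:-1])  # phase 1: latch inputs while Q is unchanged
    return first_buffer                      # phase 2: one software clock gates all values


print(skewed_update([0, 0, 0], 1))
print(double_buffered_update([0, 0, 0], 1))
```

In the skewed version the input bit appears in all three stages after one cycle, which is the software analogue of a hold-time violation; in the double-buffered version it advances exactly one stage.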

Software simulation is also fast because the system logs all input values but only selected register values and states, so overhead is reduced by decreasing the number of I/O operations. The user can select the logging frequency.

B. Emulation with Target System Mode

The SEmulation system can emulate the user's circuit within its target system environment. The target system outputs data to the hardware model for evaluation, and the hardware model also outputs data to the target system. In addition, the software kernel controls the operation of this mode so that the user still has the option to start, stop, assert values, inspect values, single-step, and switch from one mode to another.

C. Post-Simulation Analysis Mode

Logs provide the user with a historical record of the simulation session. Unlike known simulation systems, the SEmulation system does not log every single value, internal state, or value change during the simulation process. It logs only selected values and states based on a logging frequency (i.e., one record every N cycles). During the post-simulation stage, if the user wants to examine various data around point X in the simulation session just completed, the user goes to one of the logged points, say logged point Y, that is closest to and temporally earlier than point X. The user then simulates from the selected logged point Y to the desired point X to obtain the simulation results.
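The following sketch shows the idea of logging one record every N cycles and then re-simulating from the nearest earlier logged point Y to a requested point X. It assumes a toy design whose state is a running sum of its inputs; `simulate_with_logging` and `state_at` are hypothetical names used only for illustration.

```python
def simulate_with_logging(inputs, log_every):
    """Run a toy accumulator design, logging its state once every `log_every` cycles."""
    state, log = 0, {}
    for cycle, value in enumerate(inputs):
        if cycle % log_every == 0:
            log[cycle] = state  # state as it was just before this cycle
        state += value
    return log


def state_at(inputs, log, x):
    """Return (Y, state before cycle X), replaying from the nearest logged point Y <= X."""
    y = max(cycle for cycle in log if cycle <= x)
    state = log[y]
    for value in inputs[y:x]:  # re-simulate only the cycles between Y and X
        state += value
    return y, state


inputs = list(range(10))
log = simulate_with_logging(inputs, log_every=4)
print(state_at(inputs, log, 6))
```

A larger N shrinks the log and the I/O overhead during simulation, at the cost of a longer replay between Y and X during analysis.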

A VCD on-demand system is also described below. This VCD on-demand system lets the user view any simulation target range (i.e., any span of simulation time) on demand without rerunning the simulation.

D. Hardware Implementation Schemes

The SEmulation system implements an array of FPGA chips on a reconfigurable board. Based on the hardware model, the SEmulation system partitions, maps, places, and routes each selected portion of the user's circuit design onto the FPGA chips. Thus, for example, a 4x4 array of 16 chips may model a large circuit spread across those 16 chips. The interconnect scheme allows each chip to access another chip within two "jumps," or links.

Each FPGA chip implements an address pointer for each of the I/O address spaces (i.e., REG, CLK, S2H, H2S). The address pointers associated with a particular address space are chained together across all the chips. During a data transfer, word data in each chip is selected sequentially from or to the main FPGA bus and PCI bus, one word at a time for the selected address space in each chip, and one chip at a time, until the desired word data has been accessed for the selected address space. This sequential selection of word data is accomplished by a propagating word selection signal. The word selection signal travels through the address pointer in one chip and then propagates to the address pointer in the next chip, continuing until the last chip is reached or the system initializes the address pointers.
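The chained address pointers behave much like a single token that visits every word of one address space in order. A minimal sketch of that access order follows; the list-of-lists layout and the names `word_select_chain` and `read_address_space` are assumptions for illustration and do not model bus timing.

```python
def word_select_chain(chips):
    """Yield (chip_index, word_index, word) in the order the word select signal visits them."""
    for chip_index, words in enumerate(chips):      # one chip at a time
        for word_index, word in enumerate(words):   # one word at a time within the chip
            yield chip_index, word_index, word


def read_address_space(chips):
    """Read every word of one address space across all chips, in chain order."""
    return [word for _, _, word in word_select_chain(chips)]


# Hypothetical contents of one address space (e.g., H2S) in three chips.
chips = [[0xA, 0xB], [0xC], [0xD, 0xE]]
print(read_address_space(chips))
```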

The FPGA bus system on the reconfigurable board operates at twice the PCI bus width but at half the PCI bus speed. The FPGA chips are therefore separated into banks to take advantage of the wider bus. The throughput of this FPGA bus system can keep pace with the throughput of the PCI bus system, so no performance is lost by reducing the bus speed. Expansion is possible through piggyback boards that extend the bank length.
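The throughput claim follows from simple arithmetic: doubling the width while halving the clock leaves the product unchanged. The numbers below assume a 32-bit, 33 MHz PCI bus, which the passage above does not state; they are used only to make the relation concrete.

```latex
\begin{align*}
T &= w \cdot f
  && \text{(throughput = bus width $\times$ clock rate)} \\
T_{\text{FPGA}} &= (2w)\cdot\frac{f}{2} = w \cdot f = T_{\text{PCI}} \\
T_{\text{PCI}} &= 32~\text{bit} \times 33~\text{MHz} = 1056~\text{Mbit/s}
  && \text{(assumed PCI parameters)} \\
T_{\text{FPGA}} &= 64~\text{bit} \times 16.5~\text{MHz} = 1056~\text{Mbit/s}
\end{align*}
```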

In another embodiment of the present invention, higher-density FPGA chips are used. Examples of such higher-density chips are the Altera 10K130V and 10K250V. Using these chips changes the board design so that only four FPGA chips are used per board, instead of eight less dense FPGA chips (e.g., Altera 10K100).

The FPGA array of the simulation system is provided on the motherboard through a particular board interconnect structure. Each chip may have up to eight sets of interconnections, arranged as adjacent direct-neighbor interconnects (i.e., N[73:0], W[73:0], E[73:0]) and one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]), excluding the local bus connections, both within a single board and across different boards. Each chip can be interconnected directly to its adjacent neighbor chips, or in one hop to a non-adjacent chip located above, below, to the left, or to the right. In the X direction (east-west), the array is a torus. In the Y direction (north-south), the array is a mesh.
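A small sketch can make the torus/mesh distinction concrete. It assumes chips are addressed by (column, row), that a direct neighbor is one position away, and that a one-hop neighbor is two positions away in the same row or column; the function name `neighbors` and the coordinate convention are illustrative assumptions, and signal widths are ignored.

```python
def neighbors(x, y, width, height):
    """Return (direct, one_hop) neighbor sets for the chip at column x, row y."""
    direct, one_hop = set(), set()
    for distance, bucket in ((1, direct), (2, one_hop)):
        for step in (-distance, distance):
            bucket.add(((x + step) % width, y))  # X wraps around (torus)
            if 0 <= y + step < height:           # Y does not wrap (mesh)
                bucket.add((x, y + step))
    one_hop -= direct | {(x, y)}                 # drop overlaps on very small arrays
    return direct, one_hop


direct, one_hop = neighbors(0, 0, width=4, height=4)
print(sorted(direct), sorted(one_hop))
```

For the corner chip of a 4x4 array, the westward links wrap to the far column, while there are no northward links because the Y direction does not wrap.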

Interconnects alone can couple logic devices and other components within a single board. Inter-board connectors, however, are provided to couple these boards and their interconnects together across different boards, carrying signals between (1) the PCI bus, via the motherboard, and the array boards, and (2) any two array boards.

A motherboard connector connects a board to the motherboard and hence to the PCI bus, power, and ground. For some boards, the motherboard connector is not used for direct connection to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are directly connected to the motherboard, while the remaining boards 2, 4, and 6 rely on their neighbor boards for motherboard connectivity. Thus, every other board is directly connected to the motherboard, and the interconnects and local buses of these boards are coupled together through inter-board connectors arranged solder-side to component-side. PCI signals are routed through only one of the boards (typically the first board). Power and ground are applied to the other motherboard connectors for those boards. Placed solder-side to component-side, the various inter-board connectors allow communication among the PCI bus components, the FPGA logic devices, the memory devices, and the various simulation system control circuits.

E. Simulation Server

In another embodiment of the present invention, a simulation server is provided to allow multiple users to access the same reconfigurable hardware unit. In one system configuration, multiple workstations across a network, or multiple users/processes in a non-network environment, can access the same server-based reconfigurable hardware unit to review or debug the same or different user circuit designs. Access is accomplished through a time-sharing process in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware model access among the scheduled users. In one scenario, each user accesses the server to map his or her user design onto the reconfigurable hardware model for the first time, in which case the system compiles the design to generate the software and hardware models, performs the clustering operation, performs the place-and-route operation, generates a bitstream configuration file, and reconfigures the FPGA chips in the reconfigurable hardware unit to model the hardware portion of the user's design. When one user has accelerated a design using the hardware model and has downloaded the hardware state to his or her own memory for software simulation, the hardware unit can be released for access by another user.

The server allows multiple users or processes to access the reconfigurable hardware unit for acceleration and hardware state swapping. The simulation server includes a scheduler, one or more device drivers, and the reconfigurable hardware unit. The scheduler in the simulation server is based on a preemptive round-robin algorithm. The server scheduler includes a simulation job queue table, a priority sorter, and a job swapper. The restore and playback function of the present invention facilitates the non-network multiprocessing environment as well as the network multi-user environment: previous checkpoint state data can be downloaded, and the entire simulation state associated with that checkpoint can be restored for playback debugging or cycle-by-cycle stepping.
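A very small model of such a scheduler is sketched below. It assumes that priority only determines the initial queue order (lower number first) and that each job then receives fixed time slices in round-robin fashion; the actual priority and preemption rules are not given in this passage. The name `schedule`, the tuple layout, and the sample jobs are hypothetical.

```python
from collections import deque


def schedule(jobs, time_slice):
    """Return the sequence of (job name, cycles run) slices on the shared hardware.

    jobs: list of (name, priority, cycles_needed); lower priority number runs first.
    """
    queue = deque(sorted(jobs, key=lambda job: job[1]))  # priority sorter
    timeline = []
    while queue:
        name, priority, remaining = queue.popleft()
        ran = min(time_slice, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            # Job swapper: the preempted job's state is saved and it rejoins the queue.
            queue.append((name, priority, remaining - ran))
    return timeline


jobs = [("A", 1, 5), ("B", 0, 3), ("C", 1, 2)]
timeline = schedule(jobs, time_slice=2)
print(timeline)
```

In a real server, each swap would also save and restore the hardware state of the outgoing and incoming jobs, which is what the hardware state swapping described above makes possible.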

F. Memory Simulation

The memory simulation, or memory mapping, aspect of the present invention gives the simulation system an effective way to manage the various memory blocks associated with the configured hardware model of the user's design, which is programmed into the array of FPGA chips in the reconfigurable hardware unit. Under this scheme, the numerous memory blocks associated with the user's design are mapped into SRAM memory devices in the simulation system rather than inside the logic devices that are used to configure and model the user's design. The memory simulation system includes a memory state machine, an evaluation state machine, and their associated logic to control and interface with: (1) the main computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA buses in the simulation system, and (3) the FPGA logic devices that contain the configured and programmed user design being debugged. The operation of the memory simulation system in accordance with one embodiment of the present invention is generally as follows. The simulation write/read cycle is divided into three periods: DMA data transfer, evaluation, and memory access.

The FPGA logic device side of the memory simulation system includes an evaluation state machine, an FPGA bus driver, and a logic interface for each memory block N to interface with the user's own memory interface in the user design. Together these handle: (1) data evaluations among the FPGA logic devices, and (2) write/read memory accesses between the FPGA logic devices and the SRAM memory devices. In conjunction with the FPGA logic device side, the FPGA I/O controller side includes a memory state machine and interface logic to handle DMA, write, and read operations between: (1) the main computing system and the SRAM memory devices, and (2) the FPGA logic devices and the SRAM memory devices.

G. Coverification System

One embodiment of the present invention is a coverification system that includes a reconfigurable computing system (hereinafter "RCC computing system") and a reconfigurable computing hardware array (hereinafter "RCC hardware array"). In some embodiments, the target system and the external I/O devices are not necessary, since they can be modeled in software. In other embodiments, the target system and the external I/O devices are actually coupled to the coverification system to obtain speed and to use actual data rather than simulated test bench data. Thus, a coverification system can incorporate the RCC computing system and the RCC hardware array, along with other functionality, to debug the software portion and the hardware portion of a user's design while using the actual target system and/or I/O devices.

The RCC computing system also contains clock logic (for clock edge detection and software clock generation), test bench processes for testing the user design, and device models for any I/O device that the user decides to model in software instead of using an actual physical I/O device. Of course, the user may decide to use actual I/O devices as well as modeled I/O devices in one debug session. The software clock is provided to the external interface to function as the external clock source for the target system and the external I/O devices. The use of this software clock provides the synchronization necessary to process incoming and outgoing data. Because the software clock generated by the RCC computing system is the time base for the debug session, simulated and hardware-accelerated data are synchronized with any data that is delivered between the coverification system and the external interface.

When the target system and the external I/O devices are coupled to the coverification system, pin-out data must be provided between the coverification system and its external interface. The coverification system contains control logic that provides traffic control between: (1) the RCC computing system and the RCC hardware array, and (2) the external interface (which is coupled to the target system and the external I/O devices) and the RCC hardware array. Because the RCC computing system has a model of the entire design in software, including the portion of the user design modeled in the RCC hardware array, the RCC computing system must also have access to all data that passes between the external interface and the RCC hardware array. The control logic ensures that the RCC computing system has access to these data.

II. System Description

FIG. 1 shows a high-level overview of one embodiment of the present invention. A workstation 10 is coupled to a reconfigurable hardware model 20 and an emulation interface 30 via a PCI bus system 50. The reconfigurable hardware model 20 is coupled to the emulation interface 30 via the PCI bus 50, as well as a cable 61. A target system 40 is coupled to the emulation interface 30 via cables 60. In other embodiments, the in-circuit emulation setup 70 (shown in the dotted-line box), which includes the emulation interface 30 and the target system 40, is not provided when emulation of the user's circuit design within the target system's environment is not desired during a particular test/debug session. Without the in-circuit emulation setup 70, the reconfigurable hardware model 20 communicates with the workstation 10 via the PCI bus 50.

In combination with the in-circuit emulation setup 70, the reconfigurable hardware model 20 imitates or mimics the user's circuit design of some electronic subsystem in a target system. To ensure the correct operation of the user's circuit design of the electronic subsystem within the target system's environment, input and output signals between the target system 40 and the modeled electronic subsystem must be provided to the reconfigurable hardware model 20 for evaluation. Hence, the input and output signals of the target system 40 to and from the reconfigurable hardware model 20 are delivered via cables 60 through the emulation interface 30 and the PCI bus 50. Alternatively, input/output signals of the target system 40 can be delivered to the reconfigurable hardware model 20 via the emulation interface 30 and cables 61.

The control data and some substantive simulation data pass between the reconfigurable hardware model 20 and the workstation 10 via the PCI bus 50. Indeed, the workstation 10 runs the software kernel that controls the operation of the entire SEmulation system and that must have read/write access to the reconfigurable hardware model 20.

A workstation 10, complete with a computer, keyboard, mouse, monitor, and appropriate bus/network interface, allows a user to enter and modify data describing the circuit design of an electronic system. Exemplary workstations include a Sun Microsystems SPARC or ULTRA-SPARC workstation or an Intel/Microsoft-based computing station. As known to those ordinarily skilled in the art, the workstation 10 includes a CPU 11, a local bus 12, a host/PCI bridge 13, a memory bus 14, and a main memory 15. The various software simulation, simulation by hardware acceleration, in-circuit emulation, and post-simulation analysis aspects of the present invention are provided in the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30. The algorithms embodied in software are stored in the main memory 15 during a test/debug session and executed by the CPU 11 via the workstation's operating system.

As known to those ordinarily skilled in the art, after the operating system is loaded into the memory of the workstation 10 by the startup firmware, control passes to its initialization code, which sets up the necessary data structures and loads and initializes the device drivers. Control is then passed to the command line interpreter (CLI), which prompts the user to indicate the program to be run. The operating system then determines the amount of memory needed to run the program, locates or allocates a block of memory, and accesses the memory either directly or through the BIOS. After completion of the memory loading process, the application program begins execution.

One embodiment of the present invention is a particular application program for SEmulation. During the course of its execution, the application program may require numerous services from the operating system, including, but not limited to, reading from and writing to disk files, performing data communications, and interfacing with the display, keyboard, and mouse.

The workstation 10 has an appropriate user interface that allows the user to enter circuit design data, edit circuit design data, monitor the progress of the simulation and emulation while obtaining results, and essentially control the simulation and emulation process. Although not shown in FIG. 1, the user interface includes user-accessible menu-driven options and command sets that can be entered with the keyboard and mouse and viewed on the monitor.

The user typically creates a particular circuit design for an electronic system and enters an HDL (usually structured RTL level) code description of the designed system into the workstation 10. The SEmulation system of the present invention performs component type analysis, among other operations, to partition the modeling between software and hardware. The SEmulation system models behavior, RTL, and gate level code in software. For hardware modeling, the system can model RTL and gate level code; however, RTL level code must be synthesized to gate level prior to hardware modeling. Gate level code can be processed directly into a usable source design database format for hardware modeling.

Using the RTL and gate level code, the system automatically performs component type analysis to complete the partition step. Based on the partition analysis during software compile time, the system maps some portion of the circuit design into hardware for fast simulation via hardware acceleration. The user can also couple the modeled circuit design to the target system for a real-environment in-circuit emulation. Because the software simulation and the hardware acceleration engine are tightly coupled through the software kernel, the user can simulate the overall circuit design using software simulation, accelerate the test/debug process by using the hardware model of the mapped circuit design, return to the simulation portion, and return to hardware acceleration until the test/debug process is complete. The ability to switch between software simulation and hardware acceleration cycle by cycle and at the user's discretion is one of the valuable features of this embodiment. This feature is particularly useful in the debug process because it allows the user to reach a particular point or cycle very quickly using the hardware acceleration mode and then use software simulation to examine various points thereafter to debug the circuit design.

Moreover, the SEmulation system makes all components visible to the user whether the internal realization of a component is in hardware or software. The SEmulation system accomplishes this by reading the register values from the hardware model and then regenerating the combinational components using the software model when the user requests such a read. These and other features are discussed in more detail later in the specification.

The workstation 10 is coupled to a bus system 50. The bus system can be any available bus system that allows various agents, such as the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30, to be operably coupled together. Preferably, the bus system is fast enough to provide real-time or near real-time results to the user. One such bus system is the bus system specified in the Peripheral Component Interconnect (PCI) standard, which is incorporated herein by reference. Currently, revision 2.0 of the PCI standard provides for a 33 MHz bus speed. Revision 2.1 provides support for a 66 MHz bus speed. Accordingly, the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30 may comply with the PCI standard.

In one embodiment, communication between the workstation 10 and the reconfigurable hardware model 20 is handled on the PCI bus. Other PCI-compliant devices may be found in this bus system. These devices may be coupled to the PCI bus at the same level as the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30, or at other levels. Each PCI bus at a different level, such as PCI bus 52, is coupled to another PCI bus level, such as PCI bus 50, if it exists at all, via a PCI-to-PCI bridge 51. At PCI bus 52, two PCI devices 53 and 54 may be coupled.

The reconfigurable hardware model 20 comprises an array of field-programmable gate array (FPGA) chips that can be programmably configured and reconfigured to model the hardware portion of the user's electronic system design. In this embodiment, the hardware model is reconfigurable; that is, the hardware can be reconfigured to suit the particular computation or user circuit design at hand. For example, if many adders or multiplexers are required, the system is configured to include many adders and multiplexers. As other computing elements or functions are needed, they may also be modeled or formed in the system. In this way, the system can be optimized to perform specialized computations or logic operations. Reconfigurable systems are also flexible, so that users can work around minor hardware defects that arise during manufacture, testing, or use. In one embodiment, the reconfigurable hardware model 20 includes a two-dimensional array of computing elements consisting of FPGA chips to provide computational resources for various user circuit designs and applications. The hardware configuration process is described in more detail later.

Two such FPGA chips include those sold by Altera and Xilinx. In some embodiments, the reconfigurable hardware model is reconfigurable through the use of field-programmable devices. However, other embodiments of the present invention may be implemented using application-specific integrated circuit (ASIC) technology. Still other embodiments may take the form of a custom integrated circuit.

In a typical test/debug scenario, reconfigurable devices are used to simulate/emulate the user's circuit design so that appropriate changes can be made prior to actual prototype manufacturing. In some other instances, however, an actual ASIC or custom integrated circuit can be used, although this deprives the user of the ability to quickly and cost-effectively change a possibly non-functional circuit design for re-simulation and re-emulation. At times, though, such an ASIC or custom IC has already been manufactured and is readily available, so that emulation with the actual non-reconfigurable chip may be preferable.

In accordance with the present invention, the software in the workstation, together with its integration with the external hardware model, provides the end user with greater flexibility, control, and performance than existing systems. To run the simulation and emulation, a model of the circuit design and the relevant parameters (for example, input test-bench stimulus, overall system output, and intermediate results) are determined and provided to the simulation software system. The user can use either a schematic capture tool or a synthesis tool to define the system circuit design. The user typically starts with a circuit design of the electronic system in draft schematic form, which is then converted to HDL form using synthesis tools. The HDL can also be written directly by the user. Exemplary HDL languages include Verilog and VHDL; however, other languages are also available. A circuit design represented in HDL contains many concurrent components. Each component is a sequence of code that either defines the behavior of a circuit element or controls the execution of the simulation.

The SEmulation system analyzes these components to determine their component types, and the compiler uses this component type information to build different execution models in software and hardware. Thereafter, the user can use the SEmulation system of the present invention. The designer can verify the accuracy of the circuit through simulation by applying various stimuli, such as input signals and test vector patterns, to the simulated model. If, during the simulation, the circuit does not behave as planned, the user redefines the circuit by modifying the circuit schematic or the HDL file.

The use of this embodiment of the present invention is shown in the flowchart of FIG. 2. The algorithm begins at step 100. After loading the HDL file into the system, the system compiles, partitions, and maps the circuit design to appropriate hardware models. The compilation, partition, and mapping steps are discussed in more detail below.

Before the simulation runs, the system must run through a reset sequence to remove all unknown "x" values in software before the hardware acceleration model can function. One embodiment of the present invention uses a 2-bit wide data path to provide a 4-state value for the bus signal: "00" is logic low, "01" is logic high, "10" is "z", and "11" is "x". As known to those ordinarily skilled in the art, software models can deal with "0", "1", "x" (bus conflicts or unknown value), and "z" (no driver or high impedance). In contrast, hardware cannot deal with the unknown value "x", so the reset sequence, which varies depending on the particular applicable code, resets the register values to all "0" or all "1".
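The 2-bit, 4-state encoding and the effect of the reset sequence can be illustrated with a short Python sketch. The bit assignments follow the text; the function names and the dictionary of registers are hypothetical, and the reset shown here simply forces every register to one chosen value, whereas the text notes that the real sequence depends on the applicable code.

```python
# 2-bit codes for the four signal states, as given in the text.
LOW, HIGH, Z, X = 0b00, 0b01, 0b10, 0b11
_SYMBOLS = {LOW: "0", HIGH: "1", Z: "z", X: "x"}
_CODES = {symbol: code for code, symbol in _SYMBOLS.items()}

def encode(symbol):
    """Map '0', '1', 'z', or 'x' to its 2-bit code."""
    return _CODES[symbol]

def decode(bits):
    """Map a 2-bit code back to its symbol."""
    return _SYMBOLS[bits & 0b11]

def reset_registers(registers, reset_value="0"):
    """Force every register to all-'0' or all-'1' (hypothetical reset sequence)."""
    if reset_value not in ("0", "1"):
        raise ValueError("hardware registers can only hold '0' or '1'")
    code = encode(reset_value)
    return {name: code for name in registers}

def hardware_ready(registers):
    """True when no register holds 'x' or 'z', so hardware can take over."""
    return all(value in (LOW, HIGH) for value in registers.values())
```

For example, a register file containing an `x` value fails `hardware_ready` until `reset_registers` has been applied.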

At step 105, the user decides whether to simulate the circuit design. Typically, a user starts the system with software simulation first. Thus, if the decision at step 105 is "YES", software simulation occurs at step 110.

The user can stop the simulation to inspect values, as shown in step 115. Indeed, the user can stop the simulation at any time during the test/debug session, as shown by the dotted lines extending from step 115 to various nodes in the hardware acceleration mode, ICE mode, and post-simulation mode. Executing step 115 takes the user to step 160.

After stopping, if the user wants to inspect combinational component values, the system kernel reads back the state of the hardware register components to regenerate the entire software model, including the combinational components. After the entire software model is restored, the user can inspect any signal value in the system. After stopping and inspecting, the user can continue to run in simulation-only mode or in hardware acceleration mode. The stop/value inspect routine begins at step 160. At step 165, the user must decide whether to stop the simulation at this point and inspect values. If step 165 resolves to "YES", step 170 stops the simulation that may be currently under way and inspects various values to check the correctness of the circuit design. At step 175, the algorithm returns to the point from which it branched, which is step 115. Here, the user can continue to simulate and stop/inspect values for the remainder of the test/debug session, or proceed to the in-circuit emulation step.

Similarly, if step 105 resolves to "NO", the algorithm proceeds to the hardware acceleration decision step 120. At step 120, the user decides whether to accelerate the test/debug process by accelerating the simulation through the hardware portion of the modeled circuit design. If the decision at step 120 is "YES", hardware model acceleration occurs at step 125. During the system compilation process, the SEmulation system mapped some portions of the design into a hardware model. Here, when hardware acceleration is desired, the system moves the register and combinational components into the hardware model and moves the input and evaluation values to the hardware model. Thus, during hardware acceleration, evaluation occurs in the hardware model for a long period of time at accelerated speed. The kernel writes the test-bench output to the hardware model, updates the software clock, and then reads the hardware model output values cycle by cycle. If the user so desires, values from the entire software model of the user's circuit design, which is the entire circuit design, can be made available by reading out the register values and regenerating the combinational components from those register values.

Because software intervention is needed to regenerate these combinational components, output values for the entire software model are not provided at every cycle; rather, values are provided only when the user wants them. This specification discusses the combinational component regeneration process later.
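As a rough illustration of regenerating combinational values from register values read back from hardware, the Python sketch below evaluates each combinational net from its inputs. The netlist representation (a list of `(output, function, inputs)` tuples already in topological order) and all net names are assumptions for illustration, not the format used by the SEmulation system.

```python
def regenerate_combinational(register_values, comb_logic):
    """Rebuild combinational net values from register values.

    register_values: dict of register name -> 0 or 1, as read from the hardware model.
    comb_logic: list of (output_net, function, input_nets), assumed to be in
                topological order so every input is known before it is used.
    """
    values = dict(register_values)          # start from the hardware register state
    for output_net, function, input_nets in comb_logic:
        values[output_net] = function(*(values[net] for net in input_nets))
    return values

# Hypothetical three-gate design driven by three registers.
netlist = [
    ("n1", lambda a, b: a & b, ("reg_a", "reg_b")),   # AND gate
    ("n2", lambda x, c: x | c, ("n1", "reg_c")),      # OR gate
    ("n3", lambda x: 1 - x, ("n2",)),                 # inverter
]
signals = regenerate_combinational({"reg_a": 1, "reg_b": 0, "reg_c": 1}, netlist)
```

Because this evaluation runs in software, doing it only on request rather than at every cycle matches the trade-off described in the text.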

Again, the user can stop the hardware acceleration mode at any time, as indicated by step 115. If the user wants to stop, the algorithm proceeds to steps 115 and 160 to branch to the stop/value inspect routine. Here, as in step 115, the user can stop the hardware-accelerated simulation process at any time and inspect the values resulting from the simulation process, or the user can continue with the hardware-accelerated simulation process. The stop/value inspect routine branches to steps 160, 165, 170, and 175, which were discussed above in the context of stopping the simulation. Returning to the main routine after step 125, the user can decide at step 135 whether to continue with the hardware-accelerated simulation or to perform pure simulation instead. If the user wants to simulate further, the algorithm proceeds to step 105. If not, the algorithm proceeds to the post-simulation analysis at step 140.

At step 140, the SEmulation system provides a number of post-simulation analysis features. The system logs all inputs to the hardware model. For hardware model outputs, the system logs all values of the hardware register components at a user-defined logging frequency (for example, 1/10,000 record/cycles). The logging frequency determines how often the output values are recorded. For a logging frequency of 1/10,000 record/cycles, the output values are recorded once every 10,000 cycles. The higher the logging frequency, the more information is recorded for later post-simulation analysis. Because the selected logging frequency directly affects the SEmulation speed, the user selects the logging frequency with care. A higher logging frequency decreases the SEmulation speed because the system must spend time and resources recording the output data, by performing I/O operations to memory, before further simulation can be performed.
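The logging scheme described above (every input logged, register values logged once per fixed number of cycles) can be sketched in Python as follows. The `step` function stands in for one evaluation cycle of the hardware model, and `period` is the number of cycles between register snapshots; both names, and the use of in-memory Python containers for the logs, are assumptions for illustration.

```python
def run_and_log(step, state, inputs, period):
    """Run one cycle per input, logging every input and periodic register state.

    step:   function (state, stimulus) -> next state; stand-in for the hardware model.
    period: cycles between register snapshots (10,000 for 1/10,000 record/cycles).
    """
    input_log = []       # every input is recorded
    register_log = {}    # cycle number -> register state snapshot
    for cycle, stimulus in enumerate(inputs, start=1):
        input_log.append(stimulus)
        state = step(state, stimulus)
        if cycle % period == 0:
            register_log[cycle] = state
    return state, input_log, register_log

# Toy model: a single accumulator register that adds each input.
final_state, input_log, register_log = run_and_log(
    step=lambda acc, x: acc + x, state=0, inputs=[1] * 25, period=10
)
```

A smaller `period` yields more snapshots for later analysis at the cost of more time spent recording, which is the trade-off the text asks the user to weigh.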

With respect to the post-simulation analysis, the user selects a particular point at which simulation is desired. The user can then analyze the run after SEmulation by running software simulation with the input logs to the hardware model to compute the value changes and the internal states of all hardware components. Note that the hardware accelerator is used to simulate the data from the selected logging point in order to analyze the simulation results. This is discussed in more detail below.
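One way to picture this post-simulation step is to restart from the nearest logged register snapshot at or before the point of interest and replay the logged inputs from there. The Python sketch below does this under several assumptions: snapshots are keyed by cycle number, `input_log[k]` is the stimulus applied during cycle `k + 1`, and a plain `step` function stands in for the accelerator or software model that re-evaluates each cycle.

```python
def replay_to(target_cycle, input_log, register_log, step, initial_state):
    """Return (start_cycle, state at target_cycle) without replaying from cycle 0.

    register_log: dict of cycle number -> register state logged at that cycle.
    input_log:    list where input_log[k] is the stimulus for cycle k + 1.
    """
    logged = [cycle for cycle in register_log if cycle <= target_cycle]
    if logged:
        start = max(logged)                 # nearest snapshot at or before target
        state = register_log[start]
    else:
        start, state = 0, initial_state     # no snapshot yet: start from reset
    for cycle in range(start, target_cycle):
        state = step(state, input_log[cycle])
    return start, state

# Toy model: accumulator register; the stimulus for cycle k is simply k.
inputs = list(range(1, 31))
snapshots = {10: 55, 20: 210}               # accumulator value after cycles 10 and 20
start, state = replay_to(23, inputs, snapshots, lambda acc, x: acc + x, 0)
```

Here only cycles 21 through 23 are re-evaluated, since the snapshot at cycle 20 supplies the starting state.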

In step 145, the user can choose to emulate the simulated circuit design within its target system environment. If emulation with the target system is desired, the algorithm proceeds to step 150. This step includes activating the emulation interface board, plugging the cable and chip pin adapter into the target system, and running the target system to obtain system I/O from the target system. The system I/O from the target system includes the signals exchanged between the target system and the emulation of the circuit design. The emulated circuit design receives input signals from the target system, processes them, sends them to the SEmulation system for further processing, and outputs the processed signals to the target system.

Conversely, the emulated circuit design sends output signals to the target system, which processes them and outputs the processed signals back to the emulated circuit design. In this manner, the performance of the circuit design can be evaluated in its native target system environment. After emulation with the target system, the user has results that either validate the circuit design or reveal functional problems. At this point, the user can simulate/emulate again as indicated at step 135, stop altogether to modify the circuit design, or proceed to integrated circuit fabrication based on the validated circuit design.

III. Simulation/Hardware Acceleration Mode

A high level diagram of the software compilation and hardware configuration during compile time and run time, in accordance with one embodiment of the present invention, is shown in FIG. 3. FIG. 3 shows two sets of information: one set distinguishes the operations performed during compile time from those performed during simulation/emulation run time; the other set shows the partitioning between the software models and the hardware models. At the outset, the SEmulation system in accordance with one embodiment of the present invention requires the user circuit design as input data 200. The user circuit design is in the form of HDL files (e.g., Verilog, VHDL). The SEmulation system parses the HDL files so that behavior level code, register transfer level code, and gate level code can be reduced to a form usable by the SEmulation system. The system generates a source design database for front end processing step 205. The processed HDL files are then usable by the SEmulation system. The parsing process converts ASCII data to an internal binary data structure and is known to those ordinarily skilled in the art. See ALFRED V. AHO, RAVI SETHI, AND JEFFREY D. ULLMAN, COMPILERS: PRINCIPLES, TECHNIQUES, AND TOOLS (1988).

Compile time is represented by process 225, and run time is represented by process/elements 230. During compile time, as indicated by process 225, the SEmulation system compiles the processed HDL files by performing component type analysis. The component type analysis classifies HDL components into combinational components, register components, clock components, memory components, and test-bench components. In essence, the system partitions the user circuit design into control and evaluation components.

The SEmulation compiler 210 maps the control components of the simulation to software and the evaluation components to software and hardware. The compiler 210 generates a software model for all HDL components. The software model is cast in code 215. Additionally, the SEmulation compiler 210 uses the component type information of the HDL files, selects or generates hardware logic blocks/elements from a library or module generator, and generates a hardware model for certain HDL components. The end result is a so-called "bitstream" configuration file 220.

In preparation for run time, the software model in code form is stored in main memory, where the application program associated with the SEmulation program in accordance with one embodiment of the present invention is also stored. This code is processed in the general purpose processor or workstation 240. Substantially concurrently, the configuration file 220 for the hardware model is used to map the user circuit design onto the reconfigurable hardware board 250. Here, those portions of the circuit design that are modeled in hardware are mapped and partitioned into the FPGA chips on the reconfigurable hardware board 250.

As described above, user test-bench stimulus, test vector data, and other test-bench resources 235 are provided to the general purpose processor or workstation 240 for simulation purposes. Furthermore, the user can perform emulation of the circuit design under software control. The reconfigurable hardware board 250 contains the user's emulated circuit design. The SEmulation system lets the user switch selectively between software simulation and hardware emulation, stop either the simulation or the emulation process at any time, and inspect values from every component in the model. Accordingly, the SEmulation system passes data between the test-bench 235 and the processor/workstation 240 for simulation, and passes data between the test-bench 235 and the reconfigurable hardware board 250 via the data bus 245 and the processor/workstation 240 for emulation. If a user target system 260 is involved, emulation data can pass between the reconfigurable hardware board 250 and the target system 260 via the emulation interface 255 and the data bus 245. The kernel resides in the software simulation model in the memory of the processor/workstation 240, so data necessarily pass between the processor/workstation 240 and the reconfigurable hardware board 250 via the data bus 245.

FIG. 4 shows a flowchart of the compilation process in accordance with one embodiment of the present invention. The compilation process corresponds to processes 205 and 210 shown in FIG. 3. The compilation process of FIG. 4 starts at step 300. Step 301 processes the front end information. Here, gate level HDL code is generated. The user converts the initial circuit design into HDL form either by hand-writing the code directly or by using a schematic or synthesis tool to generate a gate level HDL representation of the code. The SEmulation system parses the HDL files (in ASCII format) into a binary format so that behavior level code, register transfer level (RTL) code, and gate level code can be reduced to an internal data structure form acceptable to the SEmulation system. The system generates a source design database containing the parsed HDL code.

Step 302 performs component type analysis by classifying the HDL components into combinational components, register components, clock components, memory components, and test-bench components, as shown in component type resource 303. The SEmulation system generates hardware models for register and combinational components, subject to the exceptions discussed below. Test-bench and memory components are mapped to software. Some clock components (e.g., derived clocks) are modeled in hardware, and others reside at the software/hardware boundary (e.g., software clocks).
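The default residence of each component type, as described above, can be summarized in a small lookup. This is only a sketch: the enum and table names are hypothetical, and the table ignores the exceptions the text mentions (residence selection may move components to software).

```python
from enum import Enum


class ComponentType(Enum):
    COMBINATIONAL = "combinational"
    REGISTER = "register"
    DERIVED_CLOCK = "derived clock"
    SOFTWARE_CLOCK = "software clock"
    MEMORY = "memory"
    TEST_BENCH = "test-bench"


# Default residence per component type as stated in the text.
# Every component also gets a software model; this table only records
# where the component is modeled by default beyond that.
DEFAULT_RESIDENCE = {
    ComponentType.COMBINATIONAL: "hardware",
    ComponentType.REGISTER: "hardware",
    ComponentType.DERIVED_CLOCK: "hardware",
    ComponentType.SOFTWARE_CLOCK: "software/hardware boundary",
    ComponentType.MEMORY: "software",
    ComponentType.TEST_BENCH: "software",
}


def default_residence(kind: ComponentType) -> str:
    """Look up where a component of the given type resides by default."""
    return DEFAULT_RESIDENCE[kind]
```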

Combinational components are stateless logic components whose output values are a function of the current input values and do not depend on the history of input values. Examples of combinational components include basic gates (e.g., AND, OR, XOR, NOT), selectors, adders, multiplexers, shifters, and bus drivers.

Register components are simple storage components. The state transition of a register is controlled by a clock signal. One form of register is the edge-triggered register, which changes state when an edge is detected. Another form of register is the latch, which is level-triggered. Examples include flip-flops (D-type, JK-type) and level-sensitive latches.

Clock components are components that deliver periodic signals to logic devices to control their activities. Typically, clock signals control the update of registers. Primary clocks are generated from self-timed test-bench processes. For example, a typical test-bench process for clock generation in Verilog is as follows:

    always begin
        Clock = 0;
        #5;
        Clock = 1;
        #5;
    end

According to this code, the clock signal is initially at logic "0". After 5 time units, the clock signal changes to logic "1". After another 5 time units, the clock signal reverts to logic "0". Typically, the primary clock signals are generated in software, and only a few (i.e., 1 to 10) primary clocks are found in a typical user circuit design. Derived or gated clocks are generated from a network of registers and combinational logic that is in turn driven by the primary clocks. Many (i.e., 1,000 or more) derived clocks are found in a typical user circuit design.
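The waveform produced by such a clock process can be modeled in a few lines. This is a minimal sketch: `clock_events` is a hypothetical helper, and the code only assumes a clock that starts at logic 0 and toggles every half period.

```python
def clock_events(half_period: int, end_time: int) -> list:
    """List (time, value) changes of a clock that starts at 0 and toggles
    every `half_period` time units, for times below `end_time`."""
    events = []
    value = 0
    for t in range(0, end_time, half_period):
        events.append((t, value))
        value ^= 1  # toggle between logic 0 and logic 1
    return events


# Half period of 5 time units, matching the #5 delays in the text.
events = clock_events(half_period=5, end_time=20)
```

With a half period of 5 the full clock period is 10 time units, so rising edges occur at times 5 and 15 in this window.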

Memory components are block storage components with address and control lines for accessing individual data at specific memory locations. Examples include ROMs, asynchronous RAMs, and synchronous RAMs.

Test-bench components are software processes used to control and monitor the simulation process. Accordingly, these components are not part of the hardware circuit design under test. Test-bench components control the simulation by generating clock signals, initializing simulation data, and reading simulation test vector patterns from disk/memory. Test-bench components also monitor the simulation by checking for value changes, performing value change dumps, checking asserted constraints on signal value relations, writing output test vectors to disk/memory, and interfacing with various waveform viewers and debuggers.

The SEmulation system performs component type analysis as follows. The system examines the binary source design database. Based on the source design database, the system can characterize or classify each element as one of the above component types. Continuous assignment statements are classified as combinational components. Gate primitives are, by language definition, either of combinational type or of the latch form of register type. Initialization code is treated as a test-bench of initialization type.

A process that drives nets without using the nets is a test-bench of driver type. A process that reads nets without driving the nets is a test-bench of monitor type. A process with delay controls or multiple event controls is a test-bench of general type.

A process with a single event control that drives a single net can be one of the following: (1) If the event control is an edge-triggered event, then the process is an edge-triggered type register component. (2) If the net driven in the process is not defined in all possible execution paths, then the net is a latch type of register. (3) If the net driven in the process is defined in all possible execution paths, then the net is a combinational component.

A process with a single event control that drives multiple nets can be decomposed into several processes, each driving one net separately, so that the respective component types can be derived separately. The decomposed processes can then be used to determine the component type.
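The classification rules for processes can be sketched as a decision function. This is an illustrative reading of the rules, not the system's implementation: the `Process` record, its field names, and the order in which the rules are tested are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """Hypothetical summary of one HDL process."""
    reads_nets: bool
    drives_nets: int                  # number of nets driven
    event_controls: int               # number of event controls
    has_delay_control: bool = False
    edge_triggered: bool = False
    defined_on_all_paths: bool = True


def classify(p: Process) -> str:
    # Delay controls or multiple event controls: general test-bench.
    if p.has_delay_control or p.event_controls > 1:
        return "test-bench (general)"
    # Drives nets without using any: driver test-bench.
    if p.drives_nets > 0 and not p.reads_nets:
        return "test-bench (driver)"
    # Reads nets without driving any: monitor test-bench.
    if p.reads_nets and p.drives_nets == 0:
        return "test-bench (monitor)"
    # Single event control driving a single net: rules (1) to (3).
    if p.event_controls == 1 and p.drives_nets == 1:
        if p.edge_triggered:
            return "register (edge-triggered)"
        if not p.defined_on_all_paths:
            return "register (latch)"
        return "combinational"
    # Single event control driving several nets: split per net first.
    if p.event_controls == 1 and p.drives_nets > 1:
        return "decompose per driven net"
    return "unclassified"
```

For example, `classify(Process(True, 1, 1, edge_triggered=True))` yields the edge-triggered register type, while the same process without the edge trigger and with the net assigned on every path is combinational.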

Step 304 generates a software model for all HDL components, regardless of component type. With the appropriate user interface, the user can simulate the entire circuit design using the complete software model. Test-bench processes are used to drive the stimulus inputs, apply test vector patterns, control the overall simulation, and monitor the simulation process.

Step 305 performs clock analysis. The clock analysis includes two general steps: (1) clock extraction and sequential mapping, and (2) clock network analysis. The clock extraction and sequential mapping step includes mapping the user's register components into the SEmulation system's hardware register model and then extracting the clock signals out of the system's hardware register components. The clock network analysis step includes determining the derived clocks based on the primary clocks and the extracted clock signals, and separating the gated clock network from the gated data network. A more detailed description is provided in connection with FIG. 16.

Step 306 performs residence selection. In conjunction with the user, the system selects the components for the hardware models; that is, of the hardware components that could be implemented in the hardware model of the user's circuit design, some will not be modeled in hardware for a variety of reasons. These reasons include component type, hardware resource limitations (i.e., floating point operations and large multiply operations stay in software), simulation and communication overhead (i.e., small bridge logic between test-bench processes stays in software, and signals monitored by test-bench processes stay in software), and user preference. For a variety of reasons, including performance and simulation monitoring, the user can force certain components that would otherwise be modeled in hardware to reside in software.

Step 307 maps the selected hardware models to a reconfigurable hardware emulation board. In particular, step 307 takes the netlist and maps the circuit design onto specific FPGA chips. This step involves grouping or clustering logic elements together. The system then assigns each group to its own FPGA chip, or possibly assigns several groups to a single FPGA chip. The system may also split groups in order to assign them to different FPGA chips. In general, the system assigns groups to FPGA chips. A more detailed discussion is provided in connection with FIG. 6. The system places the hardware model components on a mesh of FPGA chips so as to minimize inter-chip communication overhead. In one embodiment, the array includes a 4×4 array of FPGAs, a PCI interface unit, and a software clock control unit. The FPGA array implements a portion of the user's hardware circuit design, as determined in steps 302-306 of this software compilation process. The PCI interface unit allows the reconfigurable hardware emulation model to communicate with the workstation via the PCI bus. The software clock avoids race conditions among the various clock signals delivered to the FPGA array. Furthermore, step 307 routes the FPGA chips in accordance with the communication schedule among the hardware models.

Step 308 inserts the control circuits. These control circuits include the I/O address pointers and data bus logic for communicating with the DMA engine to the simulator (discussed below in connection with FIGS. 11, 12, and 14), and the evaluation control logic for controlling hardware state transitions and wire multiplexing (discussed below in connection with FIGS. 19 and 20). As known to those ordinarily skilled in the art, a direct memory access (DMA) unit provides an additional data channel between peripherals and main memory, over which a peripheral can directly access (i.e., read, write) main memory without intervention by the CPU. The address pointer in each FPGA chip allows data to move between the software model and the hardware model in light of the bus size limitations. The evaluation control logic is a finite state machine that ensures that the clock enable inputs to the registers are asserted before the clock and data inputs enter those registers.

Step 309 generates the configuration files for mapping the hardware model to the FPGA chips. In essence, step 309 assigns circuit design components to specific cells or gate level components within each chip. Whereas step 307 determines the mapping of hardware model groups to specific FPGA chips, step 309 takes this mapping result and generates a configuration file for each FPGA chip.

Step 310 generates the software kernel code. The kernel is a sequence of software code that controls the overall SEmulation system. The kernel cannot be generated before this point because portions of its code update and evaluate hardware components. Only after step 309 is the mapping to hardware models and FPGA chips complete. A more detailed discussion is provided below in connection with FIG. 5. The compilation ends at step 311.

As described above with respect to FIG. 4, the software kernel code is generated in step 310 after the software and hardware models have been determined. The kernel is the piece of software in the SEmulation system that controls the operation of the overall system. The kernel controls the execution of the software simulation as well as the hardware emulation. Because the kernel resides centrally in the software model, the simulator is integrated with the emulator. In contrast to other known co-simulation systems, the SEmulation system in accordance with one embodiment of the present invention does not require the simulator to interact with the emulator from the outside. One embodiment of the kernel is the control loop shown in FIG. 5.

Referring to FIG. 5, the kernel begins at step 330. Step 331 evaluates the initialization code. The control loop begins at step 332 and is bounded by decision step 339; it cycles repeatedly until the system observes no active test-bench processes, in which case the simulation or emulation session has completed. Step 332 evaluates the active test-bench components for the simulation or emulation.

Step 333 evaluates the clock components. These clock components come from the test-bench process. Usually, the user dictates what type of clock signal is to be generated for the simulation system. In one example (discussed above with respect to component type analysis and reproduced here), the clock component specified by the user in the test-bench process is as follows:

    always begin
        Clock = 0;
        #5;
        Clock = 1;
        #5;
    end

In this clock component, the user specifies that a logic "0" signal is generated first and that, after 5 simulation time units, a logic "1" signal is generated. This clock generation process cycles continually until stopped by the user. The simulation time is advanced by the kernel.

Decision step 334 inquires whether any active clock edge has been detected, which would result in some kind of logic evaluation in the software model and possibly the hardware model (if emulation is running). The clock signal that the kernel uses to detect an active clock edge is the clock signal from the test-bench process. If decision step 334 evaluates to "NO", the kernel proceeds to step 337. If decision step 334 evaluates to "YES", the kernel proceeds to step 335, which updates the registers and memories, and then to step 336, which propagates the combinational components. Step 336 handles combinational logic, which needs some time to propagate values through the combinational logic network after a clock signal is asserted. Once the values have propagated through the combinational components and stabilized, the kernel proceeds to step 337.

Note that registers and combinational components are also modeled in hardware, and thus the kernel controls the emulator portion of the SEmulation system. Indeed, the kernel can accelerate the evaluation of the hardware model in steps 334 and 335 whenever any active clock edge is detected. Hence, unlike the prior art, the SEmulation system in accordance with one embodiment of the present invention can accelerate the hardware emulator through the software kernel and based on component type (e.g., register, combinational). Furthermore, the kernel controls the execution of the software and hardware models cycle by cycle. In essence, the emulator hardware model can be characterized as a simulation coprocessor to the general-purpose processor that runs the simulation kernel. The coprocessor speeds up the simulation task.

Step 337 evaluates the active test-bench components. Step 338 advances the simulation time. Step 339 provides the boundary for the control loop that begins at step 332. Step 339 determines whether any test-bench processes are active. If so, the simulation and/or emulation is still running and more data should be evaluated. Thus, the kernel loops back to step 332 to evaluate any active test-bench components. If no test-bench processes are active, the simulation and emulation processes have completed. Step 340 ends the simulation/emulation process. In sum, the kernel is the main control loop that controls the operation of the overall SEmulation system. As long as any test-bench process is active, the kernel evaluates the active test-bench components, evaluates the clock components, detects clock edges in order to update registers, propagates and stores combinational logic data, and advances the simulation time.
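The control loop of FIG. 5 can be sketched as a toy model. The following is a minimal illustration under stated assumptions, not the actual kernel: the design is reduced to one register (`counter`) and one combinational value (`doubled`), the test-bench is considered active until `end_time`, and a rising edge is assumed to be the active clock edge. All names are hypothetical.

```python
def run_kernel(end_time: int, half_period: int = 5) -> dict:
    """Toy version of the kernel control loop (steps 330-340)."""
    time = 0
    clock = 0
    counter = 0   # stands in for a register component
    doubled = 0   # stands in for a combinational component (2 * counter)

    # Step 331: evaluate initialization code (nothing to do in this toy).
    testbench_active = time < end_time
    while testbench_active:                     # loop bounded by step 339
        # Step 332: evaluate active test-bench components (no-op here).
        # Step 333: evaluate clock components.
        new_clock = (time // half_period) % 2
        rising_edge = clock == 0 and new_clock == 1
        clock = new_clock
        if rising_edge:                         # step 334: active edge?
            counter += 1                        # step 335: update registers
            doubled = 2 * counter               # step 336: propagate logic
        # Step 337: evaluate active test-bench components (no-op here).
        time += 1                               # step 338: advance time
        testbench_active = time < end_time      # step 339: any still active?
    # Step 340: end of the simulation/emulation session.
    return {"time": time, "counter": counter, "doubled": doubled}


result = run_kernel(20)
```

With a half period of 5, rising edges occur at times 5 and 15, so the register is updated twice during a 20-unit run.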

FIG. 6 shows one embodiment of a method of automatically mapping hardware models to reconfigurable boards. A netlist file provides the input to the hardware implementation process. The netlist describes logic functions and their interconnections. The hardware model-to-FPGA implementation process includes three independent tasks: mapping, placement, and routing. The tools are generally referred to as "place-and-route" tools. The design tool used may be Viewlogic Viewdraw, a schematic capture system, together with Xilinx Xact place-and-route software, or Altera's MAX+PLUS II system.

The mapping task partitions the circuit design into logic blocks, I/O blocks, and other FPGA resources. Although some logic functions, such as flip-flops and buffers, map directly onto the corresponding FPGA resource, other logic functions, such as combinational logic, must be implemented in logic blocks using a mapping algorithm.

The placement task takes the logic and I/O blocks from the mapping task and assigns them to physical locations within the FPGA array. Current FPGA tools generally use some combination of three techniques: mincut, simulated annealing, and general force-directed relaxation (GFDR). These techniques determine the optimal placement based on various cost functions that depend on, among other variables, the total net length of the interconnections or the delay along a set of critical signal paths. The Xilinx XC4000 series FPGA tools use a variation of the mincut technique for initial placement, followed by a variation of GFDR for fine improvement of the placement.
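A cost function based on total net length can be sketched as follows. The text does not say how net length is measured; this example assumes the half-perimeter of each net's bounding box, a common estimate, and uses hypothetical block names and coordinates.

```python
def total_net_length(placement: dict, nets: list) -> int:
    """Sum of half-perimeter bounding-box lengths over all nets.

    `placement` maps a block name to its (x, y) location;
    `nets` lists, for each net, the blocks it connects.
    """
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


nets = [["a", "b"], ["b", "c", "d"]]
spread = {"a": (0, 0), "b": (3, 3), "c": (0, 3), "d": (3, 0)}
compact = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}

spread_cost = total_net_length(spread, nets)
compact_cost = total_net_length(compact, nets)
```

A placement algorithm such as simulated annealing would repeatedly perturb the placement and keep changes that tend to lower a cost of this kind; here the compact placement scores lower than the spread one.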

The routing task determines the routing paths used to interconnect the various mapped and placed blocks. Routers of this kind, such as the so-called maze router, seek the shortest path between two points. Because the routing task provides direct interconnection among the chips, the placement of the circuits with respect to the chips is critical.

At the outset, the hardware model may be described in a gate netlist 350 or in RTL 357. RTL-level code can be synthesized into a gate-level netlist. During the mapping process, a synthesizer server 360, such as the Altera MAX+PLUS II programmable logic development system and software, can be used to produce output files for mapping purposes. The synthesizer server 360 is able to match the user's circuit design components to any standard logic elements found in a library 361 (e.g., a standard adder or standard multiplier), to generate parameterized and frequently used logic modules 362 (e.g., a non-standard multiplexer or non-standard adder), and to synthesize random logic elements 363 (e.g., look-up-table-based logic that implements a customized logic function). The synthesizer server also removes redundant logic and unused logic. The output files synthesize or optimize the logic required by the user's circuit design.

When some or all of the HDL is at the RTL level, the circuit design components are at a high enough level that the SEmulation system can easily model them using SEmulation registers or components. When some or all of the HDL is at the gate netlist level, the circuit design components become more circuit-design-specific, which makes it more difficult to map the user's circuit design components onto SEmulation components. Accordingly, the synthesizer server can generate random logic elements that are based on variations of standard logic elements, as well as random logic elements that have no parallel in these variations or in the library's standard logic elements.

If the circuit design is in gate netlist form, the SEmulation system first performs a grouping or clustering operation 351. The hardware model construction is based on the clustering process because the combinational logic and registers are separated from the clock. Logic elements that share a common primary clock or gated clock signal are therefore better served by being grouped together and placed on a chip together. The clustering algorithm is based on connectivity-driven clustering, hierarchical extraction, and regular structure extraction. If the description is in structured RTL 358, the SEmulation system can decompose the function into smaller units, as represented by logic function decomposition operation 359. At any stage, if logic synthesis or logic optimization is required, the synthesizer server 360 can transform the circuit design into a more efficient representation based on user directives. For clustering operation 351, the link to the synthesizer server is represented by dashed arrow 364. For structured RTL 358, the link to the synthesizer server 360 is represented by arrow 365. For logic function decomposition operation 359, the link to the synthesizer server 360 is represented by arrow 366.

Clustering operation 351 groups the logic components in a selective manner based on function and size. Clustering may yield several clusters for a large circuit design or a single cluster for a small circuit design. These clusters of logic elements are used in later steps for mapping onto designated FPGA chips; that is, one cluster is targeted for a particular chip, and another cluster is targeted for a different chip or for the same chip as the first cluster. Usually the logic elements in a cluster stay together with the cluster in one chip, but for optimization purposes a cluster may be split across more than one chip.

After the clusters are formed in clustering operation 351, the system performs place-and-route operations. First, a coarse-grain placement operation 352 of the clusters into the FPGA chips is performed. The coarse-grain placement operation 352 initially places the clusters of logic elements into selected FPGA chips. If necessary, the system makes the synthesizer server 360 available to the coarse-grain placement operation 352, as represented by arrow 367. A fine-grain placement operation is performed after the coarse-grain placement operation to fine-tune the initial placement. The SEmulation system uses a cost function based on pin usage requirements, gate usage requirements, and gate-to-gate hops to determine the optimal placement for both the coarse-grain and fine-grain placement operations.

The determination of which clusters are placed in which chips is based on placement cost, which is calculated through a cost function f(P, G, D) for two or more circuits (i.e., CKTQ = CKT1, CKT2, ..., CKTN) and their respective locations in the array of FPGA chips. Here P is generally the pin usage/availability, G is generally the gate usage/availability, and D is the distance or number of gate-to-gate "hops" as defined by a connectivity matrix M (shown in FIG. 7 in conjunction with FIG. 8). The user's circuit design modeled in the hardware model is the total combination of circuits CKTQ. The cost function is defined such that the computed placement cost generally tends to favor (1) a minimum number of "hops" between any two circuits CKTN-1 and CKTN in the FPGA array and (2) a placement of circuits CKTN-1 and CKTN in the FPGA array that minimizes pin usage.

In one embodiment, the cost function f(P, G, D) is defined as follows.
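The equation referred to here is not reproduced in this text. A reconstruction consistent with the term-by-term description given later in this section (maximum pin ratio over the chips, maximum gate ratio over the chips, and the sum of hops over interconnected gate pairs) is sketched below. The operator name DIST and the index notation are assumptions made for illustration, not the original typography.

```latex
f(P, G, D) =
    C_0 \cdot \max_{\text{each FPGA chip}} \frac{P_{\text{used}}}{P_{\text{available}}}
  + C_1 \cdot \max_{\text{each FPGA chip}} \frac{G_{\text{used}}}{G_{\text{available}}}
  + C_2 \cdot \sum_{(i,j)\,\in\,\mathrm{CKT}} \mathrm{DIST}(\mathrm{FPGA}_i, \mathrm{FPGA}_j)
```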

This equation can be simplified as follows.

f(P, G, D) = C0*P + C1*G + C2*D

The first term (i.e., C0*P) generates a first placement cost value based on the number of pins used and the number of pins available. The second term (i.e., C1*G) generates a second placement cost value based on the number of gates used and the number of gates available. The third term (i.e., C2*D) generates a third placement cost value based on the number of hops present between the various interconnected gates in circuits CKTQ (i.e., CKT1, CKT2, ..., CKTN). The total placement cost value is generated by iteratively summing these three placement cost values. The constants C0, C1, and C2 are weighting constants that selectively skew the total placement cost value generated by the cost function toward the factor or factors (i.e., pin usage, gate usage, or gate-to-gate hops) that are most important during a given placement cost calculation.

The placement cost is calculated repeatedly as the system selects different relative values for the weighting constants C0, C1, and C2. Thus, in one embodiment, during the coarse-grain placement operation, the system selects large values for C0 and C1 relative to C2. In this iteration, the system treats the optimization of pin usage/availability and gate usage/availability as more important than the optimization of gate-to-gate hops in the initial placement of circuits CKTQ in the array of FPGA chips. In another iteration, in which C2 is weighted more heavily relative to C0 and C1, the system treats the optimization of gate-to-gate hops as more important than the optimization of pin usage/availability and gate usage/availability.
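The effect of the weighting constants can be illustrated with a minimal sketch. The two candidate placements, their P, G, and D values, and the weight values below are hypothetical numbers chosen only to show how the preferred placement changes when the weights shift from favoring C0 and C1 to favoring C2.

```python
def placement_cost(p, g, d, c0, c1, c2):
    """Simplified cost function f(P, G, D) = C0*P + C1*G + C2*D."""
    return c0 * p + c1 * g + c2 * d

# Hypothetical candidate placements, each given as (P, G, D).
candidates = {
    "A": (0.50, 0.60, 40),  # light pin and gate load, many hops
    "B": (0.90, 0.95, 10),  # heavy pin and gate load, few hops
}

def best(weights):
    """Return the name of the candidate with the lowest placement cost."""
    c0, c1, c2 = weights
    return min(candidates,
               key=lambda name: placement_cost(*candidates[name], c0, c1, c2))

coarse_weights = (100.0, 100.0, 0.1)  # C0 and C1 large relative to C2
hop_weights = (1.0, 1.0, 10.0)        # C0 and C1 small relative to C2

coarse_choice = best(coarse_weights)  # pin/gate usage dominates
hop_choice = best(hop_weights)        # gate-to-gate hops dominate
```

With these illustrative numbers, the pin- and gate-weighted iteration prefers placement A, while the hop-weighted iteration prefers placement B.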

During the fine-grain placement operation, the system uses the same cost function. In one embodiment, the iterative steps for selecting C0, C1, and C2 are the same as those for the coarse-grain operation. In another embodiment, the fine-grain placement operation involves the system selecting small values for C0 and C1 relative to C2.

These variables and equations are explained below. In determining whether to place a particular circuit CKTQ in FPGA chip x or FPGA chip y (among other FPGA chips), the cost function examines pin usage/availability (P), gate usage/availability (G), and gate-to-gate hops (D). Based on the cost function variables P, G, and D, the cost function f(P, G, D) generates a placement cost value for placing circuit CKTQ at a particular location in the FPGA array.

Pin usage/availability (P) also represents I/O capacity. P_used is the number of pins used by circuits CKTQ for each FPGA chip. P_available is the number of pins available in the FPGA chip. In one embodiment, P_available is 264 (44 pins × 6 interconnections per chip); in another embodiment, P_available is 265 (44 pins × 6 interconnections per chip + 1 extra pin). The specific number of available pins, however, depends on the type of FPGA chip used, the total number of interconnections used per chip, and the number of pins used for each interconnection, so P_available can vary considerably. To evaluate the first term of the cost function f(P, G, D) (i.e., C0*P), the ratio P_used/P_available is calculated for each FPGA chip. Thus, for a 4×4 array of FPGA chips, sixteen ratios P_used/P_available are calculated. The more pins that are used for a given number of available pins, the higher the ratio. Of the sixteen calculated ratios, the highest is selected. The first placement cost value is calculated from the first term C0*P by multiplying the selected maximum ratio P_used/P_available by the weighting constant C0. Because this first term depends on the calculated ratios P_used/P_available and on the particular maximum among the ratios calculated for each FPGA chip, the placement cost value is higher for higher pin usage, all other factors being equal. The system selects the placement that yields the lowest placement cost. The particular placement that yields the lowest maximum ratio P_used/P_available among all the maxima calculated for the various placements is generally considered the optimal placement in the FPGA array, all other factors being equal.
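As a minimal sketch of the first term, the following computes P as the maximum P_used/P_available ratio over a 4×4 array and multiplies it by C0. The per-chip pin counts and the value of C0 are hypothetical; only the figure of 264 available pins comes from the embodiment described above.

```python
P_AVAILABLE = 264  # 44 pins x 6 interconnections per chip (one embodiment)

def pin_term(pins_used, pins_available=P_AVAILABLE):
    """Return P, the highest P_used / P_available ratio over all chips."""
    return max(used / pins_available for used in pins_used)

# Hypothetical pin usage for the 16 chips of a 4x4 array.
pins_used = [132, 66, 0, 198, 44, 88, 110, 22,
             66, 33, 11, 176, 99, 55, 77, 121]

c0 = 2.0                 # hypothetical weighting constant C0
p = pin_term(pins_used)  # 198 / 264 is the largest ratio
first_cost = c0 * p      # first placement cost value, C0*P
```

Comparing candidate placements then amounts to computing this value for each one and keeping the placement whose maximum ratio, and therefore first cost term, is lowest.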

Gate usage/availability (G) is based on the number of gates allowable by each FPGA chip. In one embodiment, based on the location of circuits CKTQ in the array, if the number of gates used (G_used) in a chip is above a certain threshold, the second placement cost (C1*G) is assigned a value indicating that the placement is not feasible. Analogously, if the number of gates used in each chip containing circuits CKTQ is at or below that threshold, the second term (C1*G) is assigned a value indicating that the placement is feasible. Thus, if the system initially wants to place circuit CKTQ in a particular chip and that chip does not have enough gates to accommodate the circuit, the system concludes through the cost function that this placement is infeasible. Generally, a high number (e.g., infinity) for G ensures that the cost function generates a high placement cost value, indicating that the desired placement of circuit CKTQ is not feasible and that an alternative placement should be determined.

In another embodiment, based on the location of circuits CKTQ in the array, the ratio G_used/G_available is calculated for each chip, where G_used is the number of gates used by circuits CKTQ in each FPGA chip and G_available is the number of gates available in each chip. In one embodiment, the system uses FLEX 10K100 chips for the FPGA array. The FLEX 10K100 chip contains approximately 100,000 gates, so in this embodiment G_available is equal to 100,000 gates. Thus, for a 4×4 array of FPGA chips, sixteen ratios G_used/G_available are calculated. The more gates that are used for a given number of available gates, the higher the ratio. Of the sixteen calculated ratios, the highest is selected. The second placement cost value is calculated from the second term C1*G by multiplying the selected maximum ratio G_used/G_available by the weighting constant C1. Because this second term depends on the calculated ratios G_used/G_available and on the particular maximum among the ratios calculated for each FPGA chip, the placement cost value is higher for higher gate usage, all other factors being equal. The system selects the circuit placement that yields the lowest placement cost. The particular placement that yields the lowest maximum ratio G_used/G_available among all the maxima calculated for the various placements is generally considered the optimal placement in the FPGA array, all other factors being equal.

In another embodiment, the system initially selects some value for C1. If the ratio G_used/G_available is greater than "1", this particular placement is infeasible (i.e., at least one chip does not have enough gates for this particular placement of circuits). As a result, the system adjusts C1 to a very high number (e.g., infinity), so that the second term (C1*G) becomes very high and the total placement cost value f(P, G, D) becomes very high as well. If, on the other hand, the ratio G_used/G_available is less than or equal to "1", this particular placement is feasible (i.e., each chip has enough gates to support the circuit implementation). As a result, the system does not adjust C1, and the second term (C1*G) resolves to a particular number.
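The feasibility rule for the gate term can be sketched as follows. The function name, the example gate counts, and the default C1 are hypothetical; the figure of roughly 100,000 gates per chip is the FLEX 10K100 value given in the text, and infinity stands in for the "very high number".

```python
import math

G_AVAILABLE = 100_000  # approximate gate count of a FLEX 10K100 chip

def gate_term_cost(gates_used, c1=1.0, g_available=G_AVAILABLE):
    """Return C1*G, or infinity when any chip is over capacity.

    G is the highest G_used / G_available ratio over all chips. A ratio
    above 1 means at least one chip lacks the gates for this placement,
    so the cost is forced to infinity to rule the placement out.
    """
    g = max(used / g_available for used in gates_used)
    if g > 1:
        return math.inf
    return c1 * g

infeasible = gate_term_cost([50_000, 120_000])        # one chip over capacity
feasible = gate_term_cost([50_000, 80_000], c1=2.0)   # all chips within capacity
```

Returning infinity means the total cost f(P, G, D) for such a placement can never be the minimum, whatever the pin and hop terms are.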

The third term (C2*D) represents the number of hops between all gates that require interconnection. The number of hops also depends on the interconnection matrix. The connectivity matrix provides the basis for determining the circuit path between any two gates that need chip-to-chip interconnection. Not every gate needs gate-to-gate interconnection. Depending on the user's original circuit design and the partitioning of clusters into particular chips, some gates need no interconnection at all because the logic elements connected to their respective inputs and outputs are located in the same chip. Other gates, however, need interconnection because the logic elements connected to their respective inputs and outputs are located in different chips.

To understand "hops", refer to the connectivity matrix shown in table form in FIG. 7 and in pictorial form in FIG. 8. In FIG. 8, each interconnection between chips, such as interconnection 602 between chip F11 and chip F14, represents 44 pins or 44 wire lines. In other embodiments, each interconnection represents more than 44 pins. In still other embodiments, each interconnection represents fewer than 44 pins.

With this interconnection scheme, data can pass from one chip to another within two "hops" or "jumps". Thus, data can pass from chip F11 to chip F12 in one hop via interconnection 601, and from chip F11 to chip F33 in two hops via either interconnections 600 and 606 or interconnections 603 and 610. These exemplary hops are the shortest-path hops between these sets of chips. In some instances, signals are routed through various chips such that the number of hops between a gate in one chip and a gate in another chip exceeds the shortest path. The only circuit paths that must be examined in determining the number of gate-to-gate hops are those that need interconnection.

Connectivity is represented by the sum of all hops between the gates that need inter-chip interconnection. The shortest path between any two chips can be represented by one or two "hops" using the connectivity matrix of FIGS. 7 and 8. For certain hardware model implementations, however, the I/O capacity limits the number of direct shortest-path connections between any two gates in the array, so these signals must be routed through longer paths (more than two hops) to reach their destinations. Accordingly, the number of hops may exceed two for some gate-to-gate connections. Generally, all other things being equal, a smaller number of hops results in a lower placement cost.

The third term (i.e., C2*D) is expressed in long form as follows.

The third term is the product of the weighting constant C2 and a summation component (Σ ...). The summation component is the sum of all hops between each gate i and gate j in the user's circuit design that require chip-to-chip interconnection. As noted above, not all gates need inter-chip interconnection. For each gate i and gate j that do require inter-chip interconnection, the number of hops is determined. The hop counts for all such gate i and gate j pairs are then added together.

The distance calculation may also be defined as follows.

Here, M is the connectivity matrix. One embodiment of the connectivity matrix is shown in FIG. 7. The distance is calculated for each gate-to-gate connection requiring interconnection. Thus, for each gate i and gate j comparison, the connectivity matrix M is examined. More specifically,

The matrix is set up with all the chips in the array such that each chip is identifiably numbered. These identification numbers are set up at the top of the matrix as column headers and along the side of the matrix as row headers. A particular entry at the intersection of a row and a column provides the direct connectivity data between the chip identified by that row and the chip identified by that column. For any distance calculation between chip i and chip j, the entry in matrix Mij contains either a "1" for a direct connection or a "0" for no direct connection. The index k refers to the number of hops necessary to interconnect any gate in chip i with any gate in chip j that requires interconnection.

First, the connectivity matrix Mij for k=1 is examined. If the entry is "1", a direct connection exists between the selected gate in chip i and the selected gate in chip j. Thus, the index or hop count k=1 is designated as the result of Mij, and this result is the distance between these two gates. At this point, another gate-to-gate connection can be examined. If the entry is "0", however, no direct connection exists.

If no direct connection exists, the next k must be examined. The matrix for this new k (i.e., k=2) can be computed by multiplying matrix Mij by itself; that is, M^2 = M*M, where k=2.

This process of multiplying M by itself continues until the computed result at the particular row and column entry for chip i and chip j is "1", at which point the index k is selected as the number of hops. The operation consists of ANDing the matrices M together and then ORing the ANDed results. If the AND operation between matrices m_i,l and m_l,j results in a logic "1", then a connection exists between the selected gate in chip i and the selected gate in chip j through some chip l within k hops; otherwise, no connection exists within this particular hop count k and further computation is necessary. The matrices m_i,l and m_l,j are the connectivity matrix M defined for this hardware modeling. For any given gate i and gate j requiring interconnection, the row containing the FPGA chip for gate i in matrix m_i,l is logically ANDed with the column containing the FPGA chip for gate j in matrix m_l,j. The individual ANDed components are then ORed to determine whether the resulting Mij value for index or hop count k is "1" or "0". If the result is "1", a connection exists and index k is designated as the number of hops. If the result is "0", no connection exists.
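The AND/OR matrix procedure described above can be sketched as a Boolean matrix power search. The function name, the cap on k, and the four-chip chain used as test data are assumptions for illustration; the chain is not the FIG. 7 matrix, in which any two chips are within two hops.

```python
def hops(m, i, j, max_k=None):
    """Smallest k for which entry (i, j) of the Boolean power M^k is 1.

    m is a square 0/1 connectivity matrix. Each step forms the Boolean
    product: AND row entries with column entries, then OR the results.
    Returns None if no connection is found within max_k hops.
    """
    n = len(m)
    if max_k is None:
        max_k = n
    power = [row[:] for row in m]  # M^1
    for k in range(1, max_k + 1):
        if power[i][j]:
            return k
        # Next Boolean power: OR over l of (power[a][l] AND m[l][b]).
        power = [[int(any(power[a][l] and m[l][b] for l in range(n)))
                  for b in range(n)]
                 for a in range(n)]
    return None

# Hypothetical chain of four chips: 0 - 1 - 2 - 3.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

direct = hops(chain, 0, 1)    # adjacent chips
two_hop = hops(chain, 0, 2)   # through chip 1
three_hop = hops(chain, 0, 3) # through chips 1 and 2
```

Summing the returned hop count over every gate pair that needs inter-chip interconnection gives the distance D used in the third term of the cost function.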

The following example illustrates these principles. Refer to FIGS. 35(A) to 35(D). FIG. 35(A) shows a user's circuit design represented as a cloud 1090. This circuit design 1090 may be simple or complex. A portion of circuit design 1090 includes an OR gate 1091 and two AND gates 1092 and 1093. The outputs of AND gates 1092 and 1093 are connected to the inputs of OR gate 1091. These gates 1091, 1092, and 1093 may also be connected to other portions of circuit design 1090.

Referring to FIG. 35(B), the components of this circuit 1090, including the portion containing the three gates 1091, 1092, and 1093, may be configured and placed in FPGA chips 1094, 1095, and 1096. This particular array of FPGA chips has the interconnection scheme shown; that is, a set of interconnections 1097 connects chip 1094 to chip 1095, and another set of interconnections 1098 connects chip 1095 to chip 1096. No direct connection is provided between chip 1094 and chip 1096. When the components of this circuit design 1090 are placed in the chips, the system uses the predetermined interconnection scheme to connect circuit paths across different chips.

Referring to FIG. 35(C), one possible configuration and placement has OR gate 1091 placed in chip 1094, AND gate 1092 placed in chip 1095, and AND gate 1093 placed in chip 1096. The connection between OR gate 1091 and AND gate 1092 requires interconnection because the gates are located in different chips, so the set of interconnections 1097 is used. The number of hops for this interconnection is "1". The connection between OR gate 1091 and AND gate 1093 also requires interconnection, so the sets of interconnections 1097 and 1098 are used. The number of hops is "2". For this placement example, the total number of hops is "3", not counting the contributions of other gates and their interconnections in the remainder of circuit 1090, which is not shown.

FIG. 35(D) shows another placement example. Here, OR gate 1091 is placed in chip 1094, and AND gates 1092 and 1093 are placed in chip 1095. Again, other portions of circuit 1090 are omitted for illustrative purposes. The connection between OR gate 1091 and AND gate 1092 requires an interconnection because the gates are located in different chips, so the set of interconnections 1097 is used. The number of hops for this interconnection is "1". The connection between OR gate 1091 and AND gate 1093 also requires an interconnection, so the set of interconnections 1097 is used. The number of hops is also "1". For this placement example, the total number of hops is "2", not counting the contribution from other gates and their interconnections in the remainder of circuit 1090, which is not shown. So, on the basis of the distance parameter D alone, and assuming the other factors are equal, the cost function calculates a lower cost for the placement example of FIG. 35(D) than for that of FIG. 35(C). However, the other factors are not all equal. More than likely, the cost for FIG. 35(D) also depends on gate usage/availability G. In FIG. 35(D), one more gate is used in chip 1095 than is used in the same chip in FIG. 35(C). Furthermore, the pin usage/availability P for chip 1095 in the placement example of FIG. 35(C) is greater than the pin usage/availability for the same chip in the placement example of FIG. 35(D).
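The hop counts in the two placement examples can be reproduced with a short sketch. It assumes, as the text implies but does not state outright, that interconnection set 1097 joins chips 1094 and 1095 and that set 1098 joins chips 1095 and 1096. The names `LINKS`, `hops`, and `total_hops` are illustrative and do not come from the specification. Only the distance term D is modeled here, not the full cost function f(P, G, D).

```python
from collections import deque

# Assumed chip topology: 1094 <-(1097)-> 1095 <-(1098)-> 1096.
LINKS = {1094: [1095], 1095: [1094, 1096], 1096: [1095]}

def hops(src, dst):
    """Breadth-first search for the number of chip-to-chip hops."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        chip, dist = queue.popleft()
        if chip == dst:
            return dist
        for nxt in LINKS[chip]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("chips are not connected")

def total_hops(placement, nets):
    """Sum the hop counts over all gate-to-gate connections."""
    return sum(hops(placement[a], placement[b]) for a, b in nets)

# The two connections discussed in the text: OR 1091 drives both AND gates.
NETS = [("OR1091", "AND1092"), ("OR1091", "AND1093")]
PLACEMENT_C = {"OR1091": 1094, "AND1092": 1095, "AND1093": 1096}  # FIG. 35(C)
PLACEMENT_D = {"OR1091": 1094, "AND1092": 1095, "AND1093": 1095}  # FIG. 35(D)

print(total_hops(PLACEMENT_C, NETS), total_hops(PLACEMENT_D, NETS))
```

On distance alone the FIG. 35(D) placement scores lower, which is why the text goes on to weigh gate usage G and pin usage P before preferring either placement.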

After coarse-grain placement, fine tuning of the flattened cluster placement further optimizes the placement result. This fine-grain placement operation 353 refines the placement initially selected by the coarse-grain placement operation 352. Here, initial clusters may be split up if doing so improves the placement. For example, assume that logic elements X and Y are originally part of cluster A and designated for FPGA chip 10. As a result of the fine-grain placement operation 353, logic elements X and Y may now be designated as a separate cluster B, or made part of another cluster C, and designated for placement in FPGA chip 2. An FPGA netlist 354, which ties the user's circuit design to specific FPGAs, is then generated.

The determination of how clusters are split up and placed in particular chips is based on the placement cost, which is calculated through the cost function f(P, G, D) for the circuit CKTQ. In one embodiment, the cost function used for the fine-grain placement process is the same as the cost function used for the coarse-grain placement process. The only difference between the two placement processes is the size of the clusters that are placed, not the processes themselves. The coarse-grain placement process uses larger clusters than the fine-grain placement process. In other embodiments, the cost functions for the coarse-grain and fine-grain placement processes differ from each other, as described above with respect to selecting the weighting constants C0, C1, and C2.

Once placement is complete, a routing task 355 among the chips is performed. If the number of routing wires needed to connect circuits located in different chips exceeds the pins available in those FPGA chips for chip-to-chip routing, time division multiplexed (TDM) circuits are used. For example, if each FPGA chip allows only 44 pins for connecting circuits located in two different FPGA chips, and a particular model implementation requires 45 wires between the chips, a special time division multiplexed circuit is implemented in each chip. This special TDM circuit couples at least two wires together. One embodiment of the TDM circuit is shown in FIGS. 9(A), 9(B), and 9(C), which are discussed later. Thus, the routing task can always be completed because the pins can be arranged in time division multiplexed fashion among the chips.

Once the placement and routing for each FPGA are determined, each FPGA can be configured into optimized, working circuitry, and accordingly the system generates a "bitstream" configuration file 356. In Altera terminology, the system generates one or more Programmer Object Files (.pof). Other generated files include SRAM Object Files (.sof), JEDEC Files (.jed), Hexadecimal (Intel-format) Files (.hex), and Tabular Text Files (.ttf). The Altera MAX+PLUS II Programmer uses POF, SOF, and JEDEC files along with Altera hardware programmable devices to program the FPGA array. Alternatively, the system generates one or more raw binary files (.rbf). The CPU revises the .rbf files and programs the FPGA array through the PCI bus.

At this point, the configured hardware is ready for hardware start-up 370. This completes the automatic construction of the hardware model on the reconfigurable board.

Returning to the TDM circuit, which allows a group of pin outputs to be time multiplexed together so that only one pin output is actually used: the TDM circuit is essentially a multiplexer with at least two inputs (for the two wires), one output, and a pair of registers configured in a loop as the selector signal. If the SEmulation system requires more wires to be grouped together, more inputs and loop registers can be provided. As the selector signal for the TDM circuit, the registers configured in a loop provide the appropriate signals to the multiplexer so that during one time period one of the inputs is selected as the output, and during another time period another input is selected as the output. Thus, the TDM circuit manages to use only one output wire between chips, so that in this example the hardware model of the circuit implemented in a particular chip can be realized using 44 pins instead of 45. Thus, the routing task can always be completed because the pins can be arranged in time division multiplexed fashion among the chips.

FIG. 9(A) shows an overview of the pin-out problem. Since this problem calls for a TDM circuit, FIG. 9(B) provides the TDM circuit for the transmitting side, and FIG. 9(C) provides the TDM circuit for the receiving side. These figures show only one particular example, in which the SEmulation system requires one wire instead of two wires between chips. If more than two wires must be coupled together in a time multiplexed arrangement, one of ordinary skill in the art can make the appropriate modifications in light of the teachings below.

FIG. 9(A) shows one embodiment of the TDM circuit in which the SEmulation system couples two wires in a TDM configuration. Two chips, 990 and 991, are provided. Circuit 960, which is a portion of the complete user circuit design, is modeled and placed in chip 991. Circuit 973, which is also a portion of the complete user circuit design, is modeled and placed in chip 990. Several interconnections, including a group of interconnections 994, interconnection 992, and interconnection 993, are provided between circuits 960 and 973. In this example, the interconnections total 45. If, in one embodiment, each chip provides at most 44 pins for these interconnections, one embodiment of the present invention time multiplexes at least two of the interconnections so that they require only one interconnection between chips 990 and 991.

In this example, the group of interconnections 994 continues to use 43 pins. For the 44th and last pin, a TDM circuit according to one embodiment of the present invention can be used to couple interconnections 992 and 993 together in time division multiplexed fashion.

FIG. 9(B) shows one embodiment of the TDM circuit. The modeled circuit (or a portion thereof) 960 in FPGA chip 991 provides two signals on wires 966 and 967. With respect to circuit 960, wires 966 and 967 are outputs. These outputs would normally be coupled to modeled circuit 973 in chip 990 (see FIGS. 9(A) and 9(C)). However, the availability of only one pin for the two output wires 966 and 967 precludes a direct pin-to-pin connection. Because outputs 966 and 967 are transmitted unidirectionally to the other chip, appropriate transmitter and receiver TDM circuits must be provided to couple these lines together. One embodiment of the transmitting-side TDM circuit is shown in FIG. 9(B).

The transmitting-side TDM circuit includes AND gates 961 and 962, whose respective outputs 970 and 971 are coupled to the inputs of OR gate 963. The output 972 of OR gate 963 is the output of the chip; it is assigned to one pin and connected to the other chip 990. One set of inputs to AND gates 961 and 962, namely 966 and 967 respectively, is provided by circuit model 960. The other set of inputs, 968 and 969, is provided by a looped register scheme that functions as the time division multiplexed selector signal.

The looped register scheme includes registers 964 and 965. The output 995 of register 964 is provided to the input of register 965 and to the input 968 of AND gate 961. The output 996 of register 965 is provided to the input of register 964 and to the input 969 of AND gate 962. Registers 964 and 965 are both controlled by a common clock source. At any given time, only one of the outputs 995 and 996 provides a logic "1". The other output is at logic "0". Thus, after each clock edge, the logic "1" shifts back and forth between outputs 995 and 996. This in turn provides a "1" to either AND gate 961 or AND gate 962, "selecting" either wire 966 or wire 967. Thus, the data on wire 972 comes from circuit 960 on either wire 966 or wire 967.
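The transmitting-side behavior described above can be sketched as a small bit-level model. This is only an illustration under stated assumptions: the class name `TdmTransmitter` is hypothetical, the loop register is assumed to start with output 995 at "1" and output 996 at "0", and the second AND gate input is taken to be 969, consistent with the list of selector inputs 968 and 969.

```python
class TdmTransmitter:
    """Bit-level model of the transmit-side TDM circuit of FIG. 9(B)."""

    def __init__(self):
        # Loop registers 964 and 965; outputs 995 and 996 are assumed one-hot.
        self.q995, self.q996 = 1, 0

    def output(self, w966, w967):
        """Value driven onto wire/pin 972 for the current time slot."""
        g970 = w966 & self.q995  # AND gate 961 (select input 968)
        g971 = w967 & self.q996  # AND gate 962 (select input 969)
        return g970 | g971       # OR gate 963

    def clock(self):
        """Common clock edge: each loop register loads the other's output."""
        self.q995, self.q996 = self.q996, self.q995


tx = TdmTransmitter()
slot0 = tx.output(1, 0)  # wire 966 is selected in the first slot
tx.clock()
slot1 = tx.output(1, 0)  # wire 967 is selected in the second slot
print(slot0, slot1)
```

Extending the model to more than two wires would mean adding one AND gate and one loop register per extra wire, in line with the remark that more inputs and loop registers can be provided.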

One embodiment of the receiving side of the TDM circuit is shown in FIG. 9(C). The signals from wires 966 and 967 in chip 991 (FIGS. 9(A) and 9(B)) must be coupled to the appropriate wires 985 or 986 for circuit 973 in FIG. 9(C). The time division multiplexed signal from chip 991 enters on wire/pin 978. The receiving-side TDM circuit couples the signals on wire/pin 978 to the appropriate wires 985 and 986 for circuit 973.

The TDM circuit includes input registers 974 and 975. The signal on wire/pin 978 is provided to input registers 974 and 975 via wires 979 and 980, respectively. The output 985 of input register 974 is provided to the appropriate port of circuit 973. Similarly, the output 986 of input register 975 is provided to the appropriate port of circuit 973. Input registers 974 and 975 are controlled by loop registers 976 and 977.

The output 984 of register 976 is coupled to the input of register 977 and to the clock input 981 of register 974. The output 983 of register 977 is coupled to the input of register 976 and to the clock input 982 of register 975. Registers 976 and 977 are both controlled by a common clock source. At any given time, only one of the enable inputs 981 and 982 is at logic "1". The other input is at logic "0". Thus, after each clock edge, the logic "1" shifts back and forth between enable inputs 981 and 982. In this way the signal arriving on wire 979 or wire 980 is appropriately coupled to circuit 973.
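A matching sketch of the receiving side, under stated assumptions, shows how two bits travel over one pin in two time slots. The names `TdmReceiver` and `send_pair` are hypothetical. The sketch also assumes that the transmit and receive loop registers start in the same phase and share a clock, which the text does not spell out, and it models the enable inputs 981 and 982 as simple load enables.

```python
class TdmReceiver:
    """Bit-level model of the receive-side TDM circuit of FIG. 9(C)."""

    def __init__(self):
        # Loop registers 976 and 977 drive enables 981 and 982 (assumed one-hot).
        self.en981, self.en982 = 1, 0
        # Outputs of input registers 974 and 975.
        self.w985, self.w986 = 0, 0

    def clock(self, w978):
        """Latch pin 978 into the enabled input register, then rotate the loop."""
        if self.en981:
            self.w985 = w978
        if self.en982:
            self.w986 = w978
        self.en981, self.en982 = self.en982, self.en981


def send_pair(rx, w966, w967):
    """Hypothetical helper: move two bits across one pin in two time slots."""
    select = (1, 0)  # transmit-side loop register outputs 995 and 996
    for _ in range(2):
        w972 = (w966 & select[0]) | (w967 & select[1])  # AND/OR gating
        rx.clock(w972)                                   # pin 972 feeds pin 978
        select = (select[1], select[0])
    return rx.w985, rx.w986


rx = TdmReceiver()
print(send_pair(rx, 1, 0), send_pair(rx, 0, 1))
```

The cost of saving a pin is visible in the loop inside `send_pair`: each pair of values needs two clock slots to cross the chip boundary.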

As briefly discussed with respect to FIG. 4, the address pointers according to one embodiment of the present invention will now be discussed in greater detail. To reiterate, several address pointers are located in each FPGA chip of the hardware model. In general, the primary purpose of implementing the address pointers is to enable the system to deliver data between the software model 315 and a specific FPGA chip in the hardware model 325 via the 32-bit PCI bus 328 (see FIG. 10). More specifically, the primary purpose of the address pointers is to selectively control the delivery of data between each of the address spaces in the software/hardware boundary (i.e., REG, S2H, H2S, and CLK) and each FPGA chip among the banks 326a-326d of FPGA chips, in light of the bandwidth limitations of the 32-bit PCI bus. Even if a 64-bit PCI bus is implemented, these address pointers are still needed to control data delivery. Thus, if the software model has five address spaces (i.e., REG read, REG write, S2H read, H2S write, and CLK write), each FPGA chip has five address pointers corresponding to those five address spaces. Each FPGA needs these five address pointers because the particular selected word in the address space being processed may reside in any one or more of the FPGA chips.

The FPGA I/O controller 381 selects the particular address space (i.e., REG, S2H, H2S, or CLK) in the software/hardware boundary by using a space index. Once the address space is selected, the address pointer corresponding to the selected address space in each FPGA chip selects the particular word corresponding to the same word in the selected address space. The maximum sizes of the address spaces in the software/hardware boundary and of the address pointers in each FPGA chip depend on the memory/word capacity of the selected FPGA chip. For example, one embodiment of the present invention uses the Altera FLEX 10K family of FPGA chips. Accordingly, the estimated maximum sizes for each address space are: REG, 3,000 words; CLK, 1 word; S2H, 10 words; and H2S, 10 words. Each FPGA chip can hold approximately 100 words.

The SEmulation system has a feature that allows the user to start, assert input values, and inspect values at any time during the SEmulation process. To provide the flexibility of a simulator, the SEmulator must make all components visible to the user, regardless of whether the internal realization of a component is in software or hardware. In software, combinational components are modeled and their values are calculated during the simulation process. Thus, these values are fully "visible" for the user to access at any time during the simulation process.

However, combinational component values in the hardware model are not directly "visible". Although registers are readily and directly accessible (i.e., read/write) by the software kernel, combinational components are more difficult to determine. In the FPGA, most combinational components are modeled as look-up tables to achieve high gate utilization. As a result, the look-up table mapping provides efficient hardware modeling but loses visibility of most of the combinational logic signals.

Despite these problems with the lack of visibility of combinational components, the SEmulation system can rebuild or regenerate the combinational components for inspection by the user after the hardware acceleration mode. If the user's circuit design has only combinational and register components, the values of all combinational components can be derived from the register components. That is, combinational components are constructed from, or comprise, registers in various arrangements according to the particular logic function required by the circuit design. The SEmulator has a hardware model of only the registers and combinational components, so the SEmulator reads all the register values from the hardware model and then rebuilds or regenerates all the combinational components. Because of the overhead required to perform this regeneration process, combinational component regeneration is not performed all the time; rather, it is performed only upon request by the user. Indeed, one of the benefits of using the hardware model is to accelerate the simulation process. Determining combinational component values at every cycle (or even most cycles) would further reduce simulation speed. In any event, inspection of the register values alone should be sufficient for most simulation analyses.

The process of regenerating combinational component values from register values assumes that the SEmulation system was in hardware acceleration mode or ICE mode. Otherwise, software simulation already provides combinational component values to the user. The SEmulation system retains the combinational component values, as well as the register values, that resided in the software model prior to the onset of hardware acceleration. These values remain in the software model until they are overwritten by the system. Because the software model already holds register values and combinational component values from the time period immediately before the onset of hardware acceleration, the combinational component regeneration process involves updating some or all of these values in the software model in response to the updated input register values.

The combinational component regeneration process is as follows. First, if requested by the user, the software kernel reads all the output values of the hardware register components from the FPGA chips into the REG buffer. This process involves a DMA transfer of register values in the FPGA chips via the chain of address pointers to the REG address space. Placing the register values that were in the hardware model into the REG buffer, which is in the software/hardware boundary, allows the software model to access the data for further processing.

Second, the software kernel compares the register values from before the hardware acceleration run and after it. If the register values before the hardware acceleration run are the same as the values after it, the values of the combinational components have not changed. Instead of expending time and resources to regenerate combinational components, these values can be read from the software model, which already holds the combinational component values stored from the time immediately before the hardware acceleration run. On the other hand, if one or more of these register values have changed, one or more combinational components that depend on the changed register values may also have changed values. These combinational components must then be regenerated through the following third step.

Third, for registers whose values differ between the pre-acceleration and post-acceleration comparison, the software kernel schedules their fan-out combinational components into the event queue. Here, each register that changed value during the acceleration run is treated as an event. More than likely, the combinational components that depend on these changed register values will produce different values. Regardless of whether the values of these combinational components actually change, the system must ensure that they evaluate the changed register values in the next step.

Fourth, the software kernel then executes the standard event simulation algorithm to propagate the value changes from the registers to all the combinational components in the software model. In other words, the register values that changed during the interval from before acceleration to after acceleration are propagated to all downstream combinational components that depend on those register values. These combinational components then evaluate the new register values. In accordance with fan-out and propagation principles, second-level combinational components located downstream of the first-level combinational components, which in turn depend directly on the changed register values, must also evaluate the changed data, if any. This propagation of register values to other downstream components that may be affected continues to the end of the fan-out network. Thus, only those combinational components that are located downstream of, and affected by, the changed register values are updated in the software model. Not all combinational component values are affected. Thus, if only one register value changed during the interval from before acceleration to after acceleration, and only one combinational component is affected by that change, then only that combinational component re-evaluates its value in light of the changed register value. Other portions of the modeled circuit are unaffected. For such a small change, the combinational component regeneration process occurs relatively quickly.

Finally, when event propagation is complete, the system is ready for any mode of operation. Usually, the user wants to inspect values after a long run. After the combinational component regeneration process, the user may continue with pure software simulation for debug/test purposes. At other times, the user may wish to continue with hardware acceleration to the next desired point. In still other cases, the user may wish to proceed further to ICE mode.

In summary, combinational component regeneration involves using register values to update combinational component values in the software model. When any register value has changed, the changed value is propagated through that register's fan-out network as the values are updated. When no register value has changed, the values in the software model do not change either, so the system does not need to regenerate combinational components. Usually, a hardware acceleration run lasts for some time. As a result, many register values may change, affecting many combinational component values located downstream in the fan-out networks of the registers whose values changed. In this case, the combinational component regeneration process may be relatively slow. In other cases, only a few register values may change after a hardware acceleration run. The fan-out networks for the registers with changed values may be small, and the combinational component regeneration process may then be relatively fast.
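The regeneration steps summarized above can be sketched as a small event-driven update. This is a minimal illustration, not the system's actual kernel. The function name `regenerate`, the dictionary-based netlist, and the two-register example circuit are all hypothetical. The sketch also assumes the usual event-driven convention that propagation stops at a component whose recomputed value is unchanged.

```python
from collections import deque

# Hypothetical netlist: a = r1 AND r2, b = NOT a.
LOGIC = {
    "a": (lambda x, y: x & y, ("r1", "r2")),
    "b": (lambda x: 1 - x, ("a",)),
}
FANOUT = {"r1": ["a"], "r2": ["a"], "a": ["b"]}


def regenerate(values, hw_regs, fanout=FANOUT, logic=LOGIC):
    """Update software-model values from register values read back from hardware.

    values:  signal -> value held in the software model since just before
             acceleration (updated in place).
    hw_regs: register -> value read from the hardware model (the REG buffer).
    Returns the number of combinational evaluations performed.
    """
    queue = deque()
    # Steps 2 and 3: compare pre/post register values, schedule fan-out.
    for reg, new in hw_regs.items():
        if values[reg] != new:
            values[reg] = new
            queue.extend(fanout.get(reg, []))
    # Step 4: propagate through the fan-out network.
    evaluations = 0
    while queue:
        comp = queue.popleft()
        fn, inputs = logic[comp]
        new = fn(*(values[name] for name in inputs))
        evaluations += 1
        if new != values[comp]:
            values[comp] = new
            queue.extend(fanout.get(comp, []))
    return evaluations


model = {"r1": 0, "r2": 1, "a": 0, "b": 1}
print(regenerate(model, {"r1": 0, "r2": 1}))  # no register changed
print(regenerate(model, {"r1": 1, "r2": 1}))  # r1 changed
```

The returned count makes the cost argument concrete: an unchanged register set costs no evaluations, while each changed register costs work proportional to the size of its fan-out network.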

IV. Emulation with Target System Mode

FIG. 10 shows the SEmulation system architecture according to one embodiment of the present invention. FIG. 10 also shows the relationship among the software model, the hardware model, the emulation interface, and the target system when the system is operating in in-circuit emulation mode. As described above, the SEmulation system includes a general-purpose microprocessor and a reconfigurable hardware board interconnected by a high-speed bus such as a PCI bus. The SEmulation system compiles the user's circuit design and generates the emulation hardware configuration data for the hardware-model-to-reconfigurable-board mapping process. The user can then simulate the circuit through the general-purpose processor, hardware accelerate the simulation process, emulate the circuit design with the target system through the emulation interface, and later perform post-simulation analysis.

The software model 315 and the hardware model 325 are determined during the compilation process. The emulation interface 382 and the target system 387 are provided to the system for in-circuit emulation mode. At the user's discretion, the emulation interface and the target system need not be coupled to the system at the outset.

The software model 315 includes the kernel 316, which controls the overall system, and four address spaces for the software/hardware boundary: REG, S2H, H2S, and CLK. The SEmulation system maps the hardware model into four address spaces in main memory according to component type and control function: REG space 317 is designated for the register components; CLK space 320 is designated for the software clocks; S2H space 318 is designated for the output of the software test bench components to the hardware model; and H2S space 319 is designated for the output of the hardware model to the software test bench components. These dedicated I/O buffer spaces are mapped into the kernel's main memory space during system initialization.

The hardware model includes several banks 326a-326d of FPGA chips and an FPGA I/O controller 327. Each bank (e.g., 326b) contains at least one FPGA chip. In one embodiment, each bank consists of four FPGA chips. In a 4x4 array of FPGA chips, banks 326b and 326d may be the low bank and banks 326a and 326c may be the high bank. The mapping, placement, and routing of specific hardware-modeled user circuit design elements to specific chips and their interconnections are discussed with reference to FIG. 6. The interconnection 328 between the software model 315 and the hardware model 325 is a PCI bus system. The hardware model also includes the PCI interface 380 and the control unit 381, which control the data traffic between the PCI bus and the banks 326a-326d of FPGA chips while maintaining the throughput of the PCI bus. Each FPGA chip further contains several address pointers, each corresponding to one of the address spaces in the software/hardware boundary (i.e., REG, S2H, H2S, and CLK), to couple data between each of these address spaces and each FPGA chip in the banks 326a-326d.

Communication between the software model 315 and the hardware model 325 occurs through a DMA engine or the address pointers in the hardware model. Alternatively, communication occurs through both the address pointers and the DMA engine in the hardware model. The kernel initiates DMA transfers, together with evaluation requests, through directly mapped I/O control registers. REG space 317, CLK space 320, S2H space 318, and H2S space 319 use I/O data path lines 321, 322, 323, and 324, respectively, for data transfer between the software model 315 and the hardware model 325.

Double buffering is required for all primary inputs to the S2H and CLK spaces because these spaces take several clock cycles to complete the update process. Double buffering avoids disturbing the internal hardware model state, which could otherwise cause race conditions.
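
The idea behind double buffering can be sketched as follows. This is an illustrative software analogy only, assuming a hypothetical `DoubleBuffer` class; the actual hardware mechanism is not specified in this passage. Writes accumulate in a back buffer over several cycles, and the reader sees them only after a single commit, so it never observes a half-updated set of inputs.

```python
class DoubleBuffer:
    """Hypothetical two-stage buffer: partial updates stay hidden until commit."""

    def __init__(self, size: int):
        self._front = [0] * size  # stable copy seen by the consumer
        self._back = [0] * size   # staging copy being updated by the producer

    def write(self, index: int, value: int) -> None:
        """Stage one value; the consumer does not see it yet."""
        self._back[index] = value

    def commit(self) -> None:
        """Publish all staged values at once."""
        self._front = list(self._back)

    def read(self) -> tuple:
        """Return the last committed snapshot."""
        return tuple(self._front)


s2h = DoubleBuffer(size=2)
s2h.write(0, 1)            # update in progress
snapshot_before = s2h.read()
s2h.commit()               # update complete
snapshot_after = s2h.read()
```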

The S2H and CLK spaces are the primary inputs from the kernel to the hardware model. As described above, the hardware model holds substantially all of the register components and combinational components of the user's circuit design. In addition, the software clocks are modeled in software and provided in the CLK I/O address space to interface with the hardware model. The kernel advances simulation time, looks for active test bench components, and evaluates clock components. When the kernel detects any clock edge, registers and memories are updated and values are propagated through the combinational components. Thus, any change of value in these spaces triggers the hardware model to change logic state if the hardware acceleration mode is selected.
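
Clock-edge detection of the kind described above amounts to comparing clock levels between two successive evaluations. The sketch below is a simplified illustration; the function name `detect_clock_edges` and the dictionary representation of clock levels are assumptions made for this example, not the kernel's actual data structures.

```python
def detect_clock_edges(previous: dict, current: dict) -> dict:
    """Return {clock_name: "rising" | "falling"} for every clock whose level changed."""
    edges = {}
    for name, level in current.items():
        before = previous.get(name, level)  # an unseen clock is treated as unchanged
        if before != level:
            edges[name] = "rising" if level > before else "falling"
    return edges


# Any detected edge would be the cue to update registers and propagate values.
edges = detect_clock_edges({"clk_a": 0, "clk_b": 1}, {"clk_a": 1, "clk_b": 1})
```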

During in-circuit emulation mode, the emulation interface 382 is coupled to the PCI bus 328 so that it can communicate with the hardware model 325 and the software model 315. The kernel 316 controls not only the software model but also the hardware model during hardware-accelerated simulation mode and in-circuit emulation mode. The emulation interface 382 is coupled to the target system 387 via cable 390. The emulation interface 382 includes an interface port 385, emulation I/O control 386, a target-to-hardware I/O buffer (T2H) 384, and a hardware-to-target I/O buffer (H2T) 383.

The target system 387 includes a connector 389, a signal-in/signal-out interface socket 388, and other modules or chips that are part of the target system 387. For example, the target system 387 may be an EGA video controller, and the user's circuit design may be one particular I/O controller circuit. The user's circuit design of the I/O controller for the EGA video controller is completely modeled in the software model 315 and partially modeled in the hardware model 325.

The kernel 316 of the software model 315 also controls the in-circuit emulation mode. Control of the emulation clocks remains in software, via the software clocks, the gated clock logic, and the gated data logic, so that setup and hold-time problems do not arise during in-circuit emulation mode. Thus, the user can start, stop, single-step, assert values, and inspect values at any time during the in-circuit emulation process.

To make this work, all clock nodes between the target system and the hardware model are identified. Clock generators in the target system are disabled, clock ports from the target system are disconnected, or clock signals from the target system are otherwise prevented from reaching the hardware model. Instead, the clock signal originates from a test bench process or another form of software-generated clock, so that the software kernel can detect active clock edges and trigger data evaluation. Hence, in ICE mode, the SEmulation system uses the software clock, rather than the target system's clock, to control the hardware model.

To simulate the operation of the user's circuit design within the environment of the target system, the primary input (signal-in) and output (signal-out) signals between the target system 40 and the modeled circuit design are provided to the hardware model 325 for evaluation. This is accomplished through two buffers, the target-to-hardware buffer (T2H) 384 and the hardware-to-target buffer (H2T) 383. The target system 387 uses the T2H buffer 384 to apply input signals to the hardware model 325. The hardware model 325 uses the H2T buffer 383 to deliver output signals to the target system 387. In this in-circuit emulation mode, the hardware model sends and receives I/O signals through the T2H and H2T buffers instead of the S2H and H2S buffers, because the system now uses the target system 387, instead of the test bench processes in the software model 315, to evaluate the data. Because the target system runs at a speed substantially higher than the speed of software simulation, the in-circuit emulation mode will run at a higher speed. The transmission of these input and output signals occurs on the PCI bus 328.

In addition, a bus 61 is provided between the emulation interface 382 and the hardware model 325. This bus is analogous to the bus 61 in FIG. 1. The bus 61 allows the emulation interface 382 and the hardware model 325 to communicate via the T2H buffer 384 and the H2T buffer 383.

Typically, the target system 387 is not coupled to the PCI bus. However, such a coupling may be feasible if the emulation interface 382 is incorporated into the design of the target system 387. Signals between the target system 387 and the hardware model 325 will still pass through the emulation interface.

V. Post-Simulation Analysis Mode

The SEmulation system of the present invention supports value change dump (VCD), a widely used simulator function for post-simulation analysis. Essentially, the VCD provides a historical record of all inputs and selected register outputs of the hardware model so that later, during post-simulation analysis, the user can review the various inputs and resulting outputs of the simulation process. To support VCD, the system logs all inputs to the hardware model. For outputs, the system logs all values of the hardware register components at a user-defined logging frequency (e.g., 1/10,000 records/cycle). The logging frequency determines how often the output values are recorded. For a logging frequency of 1/10,000 records/cycle, output values are recorded once every 10,000 cycles. The higher the logging frequency, the more information is recorded for later post-simulation analysis; the lower the logging frequency, the less information is stored. Because the selected logging frequency directly affects SEmulation speed, the user should select it with care. A higher logging frequency decreases SEmulation speed because the system must spend time and resources recording the output data, by performing I/O operations to memory, before further simulation can be performed.

For post-simulation analysis, the user selects a particular point from which simulation is desired. If the logging frequency is 1/500 records/cycle, register values are recorded at points 0, 500, 1000, 1500, and so on, every 500 cycles. If the user wants results at point 610, for example, the user selects recorded point 500 and simulates forward in time until the simulation reaches point 610. During the analysis stage, the analysis speed is the same as the simulation speed, because the user first accesses the data for point 500 and then simulates forward to point 610. Note that at higher logging frequencies, more data is stored for post-simulation analysis. Thus, for a logging frequency of 1/300 records/cycle, data is recorded at points 0, 300, 600, 900, and so on, every 300 cycles. To obtain results at point 610, the user first selects recorded point 600 and simulates forward to point 610. The desired point 610 is reached faster during post-simulation analysis when the logging frequency is 1/300 rather than 1/500. However, this is not always the case. The particular analysis point, in conjunction with the logging frequency, determines how quickly the post-simulation analysis point is reached. For example, the system can reach point 523 faster if the VCD logging frequency is 1/500 rather than 1/300.
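
The arithmetic behind these examples is simple integer division. The sketch below (function names are illustrative, not from the specification) finds the nearest recorded point at or before the desired point and the number of cycles that must then be simulated forward.

```python
def nearest_log_point(target: int, interval: int) -> int:
    """Latest recorded point at or before `target`, with one record every `interval` cycles."""
    return (target // interval) * interval


def cycles_to_simulate(target: int, interval: int) -> int:
    """Cycles that must be simulated forward from the nearest recorded point."""
    return target - nearest_log_point(target, interval)


# Point 610: a 1/300 logging frequency gets there sooner than 1/500.
assert nearest_log_point(610, 500) == 500 and cycles_to_simulate(610, 500) == 110
assert nearest_log_point(610, 300) == 600 and cycles_to_simulate(610, 300) == 10

# Point 523: here the 1/500 logging frequency is the faster one.
assert cycles_to_simulate(523, 500) == 23
assert cycles_to_simulate(523, 300) == 223
```

As the two cases show, the cost depends on where the desired point falls relative to the recorded points, not on the logging frequency alone.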

The user then performs the analysis after SEmulation by running the software simulation with the logged inputs to the hardware model, in order to compute the value change dump of all hardware components. The user can also select any register log point in time and start the value change dump from that log point forward in time. This value change dump method can be linked to any simulation waveform viewer for post-simulation analysis.

VCD On-Demand System

One embodiment of the present invention is a system that generates VCD on demand without rerunning the simulation. According to one embodiment, the VCD on-demand technology described herein incorporates the following high-level attributes: (1) RCC-based parallel simulation history compression and recording, (2) RCC-based parallel simulation history decompression and VCD file generation, and (3) on-demand software regeneration for a selected simulation target range and design review without simulation rerun. Each of these attributes is described in more detail below.

During a debug session, the EDA tool (hereinafter referred to as the RCC system, which incorporates various aspects of the present invention) records the primary inputs from the test bench process so that any portion of the simulation can be reproduced. The user can then selectively command the EDA tool, or RCC system, to dump the hardware state information from any simulation time range into a VCD file for later analysis. The user can then immediately begin debugging his design within the selected simulation time range. If the selected simulation time range does not include the bug that the user is seeking to fix, he can select another simulation time range to dump into a VCD file and then analyze this new VCD file. With this VCD on-demand feature, the user can stop the simulation at any point and request the generation of another selective VCD file, on demand, from any desired simulation start time to any simulation end time.

In a typical debug session, the user debugs his design using the RCC system shown in FIG. 83. During the first simulation run, the user fast-simulates his design from a desired start simulation time to any desired end simulation time; this span is referred to herein as the simulation session range. During this fast simulation, a highly compressed form of the primary inputs is recorded in an "input history" file so that any portion of the simulation session can be reproduced. At the end of the simulation session range, the RCC system saves the hardware state information from this end point in a "simulation history" file so that the user can, if desired, return to debugging the design past this end point.

At the end of the fast simulation run, the user analyzes the results and invariably detects some problems with his design. The user then surmises that the cause of the problem (i.e., the bug) lies within a particular narrower simulation time range, hereinafter called the simulation target range, which is within the broader simulation session range. For example, if the simulation session range includes 1,000 simulation time steps, the narrower simulation target range might include only 100 simulation time steps at a particular location within the broader simulation session range.

Once the user has guessed the location of the simulation target range in order to isolate the bug, the RCC system fast-simulates from the beginning by decompressing the compressed primary inputs in the input history file and delivering the decompressed primary inputs to the hardware model for evaluation. When the RCC system reaches the simulation target range, it dumps the evaluated results (e.g., hardware node values and register states) into a VCD file. Thereafter, the user can analyze this region more carefully by replaying his design using the VCD file, which starts at the beginning of the simulation target range, rather than rerunning the simulation from the beginning of the simulation session range or even from the very beginning of the simulation. Saving the hardware states from the simulation target range as a VCD file saves a significant amount of debug time that would otherwise be wasted rerunning the simulation.
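
The replay-and-dump flow described above can be sketched in software. This is a simplified illustration under stated assumptions, not the RCC system's implementation: the input history is assumed to be a zlib-compressed JSON list with one entry per simulation time step, `evaluate` stands in for the hardware model, and the returned list of value changes stands in for a real VCD file. All names are hypothetical.

```python
import json
import zlib


def dump_target_range(history_blob, evaluate, state, t0, target_start, target_end):
    """Replay recorded inputs from t0 and collect value changes inside the target range.

    Returns a list of (time, state) pairs: the first state seen inside the
    target range, followed by every later change up to target_end.
    """
    primary_inputs = json.loads(zlib.decompress(history_blob))  # decompress the history
    changes = []
    for offset, step_inputs in enumerate(primary_inputs):
        t = t0 + offset
        if t > target_end:
            break  # nothing past the target range is needed
        state = evaluate(state, step_inputs)  # stand-in for hardware evaluation
        if t >= target_start and (not changes or state != changes[-1][1]):
            changes.append((t, state))
    return changes


# Toy model: a single accumulator register driven by one input per time step.
blob = zlib.compress(json.dumps([1, 0, 0, 2, 0]).encode())
vcd_changes = dump_target_range(blob, lambda s, x: s + x, 0,
                                t0=100, target_start=101, target_end=103)
```

Steps before the target range are still evaluated so that the state is correct on entry, but nothing is dumped for them, which is what keeps the dump small.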

Referring again to FIG. 83, a high-level view of the RCC system incorporating one embodiment of the present invention is shown. The RCC system includes an RCC computing system 2600 and an RCC hardware accelerator 2620. As described elsewhere in this patent specification, the RCC computing system 2600 contains the computational resources needed to let the user simulate the entire software-modeled design in software and to control the hardware acceleration of the hardware-modeled portion of the design. To this end, the RCC computing system 2600 includes a CPU 2601, the various clocks 2602 needed by the various components of the RCC system (including the software clock described elsewhere in this specification), test bench processes 2603, and a system disk 2604. In contrast to some conventional hardware-based event history buffers, the system disk, rather than a small hardware RAM buffer, is used to record the compressed data. Although not shown, the RCC computing system 2600 also includes other logic components and bus subsystems that give the circuit designer the computational power to run diagnostics, run various software, and manage files, among the other tasks a computing system performs.

The RCC hardware accelerator 2620, referred to as the RCC array in other sections of this patent specification, contains a reconfigurable array of logic elements (e.g., FPGAs) that can model at least a portion of the user's design in hardware so that the user can accelerate the debugging process. To this end, the RCC hardware accelerator 2620 includes an array of reconfigurable logic elements 2621 that provides the hardware model of a portion of the user's design. The RCC computing system 2600 is tightly coupled to the RCC hardware accelerator 2620 via the software clock and a bus system, as described elsewhere in this specification; portions of the bus are shown as lines 2610 and 2611 in FIG. 83.

The VCD on-demand feature of the present invention will be discussed with reference to FIG. 84. FIG. 84 shows a timeline of several simulation times: t0, t1, t2, and t3. The simulation session range lies between simulation time t0 and simulation time t3 and includes simulation times t1 and t2 along the way. Simulation time t0 represents the first simulation time in the simulation session range, at which fast simulation begins. It is the first simulation time of any separable simulation session, or simulation session range. For example, assume that today's debug session examines the simulation session range from t=10,000 to t=12,000, and that the user surmises that a particular bug lies between t=10,500 and t=10,750. For this simulation session range, simulation time t0 is t=10,000. Assume that the bug is located and fixed within the simulation session range t=10,000 to t=12,000. Tomorrow, the user moves on to the next simulation session range, t=12,000 to t=15,000. Here, simulation time t0 is t=12,000. In some cases, simulation time t0 is the very first simulation time of the first debug session for the user's design (i.e., t0 corresponds to t=0).
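
The relationship between a simulation session range and the narrower range suspected of containing the bug can be expressed as a containment check. The sketch below uses the numbers from this example; the `TimeRange` class is a hypothetical helper introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeRange:
    """An inclusive range of simulation times (hypothetical helper)."""
    start: int
    end: int

    def contains(self, other: "TimeRange") -> bool:
        """True if `other` lies entirely within this range."""
        return self.start <= other.start and other.end <= self.end


session_today = TimeRange(10_000, 12_000)     # t0 = 10,000
suspected_bug = TimeRange(10_500, 10_750)     # narrower range inside the session
session_tomorrow = TimeRange(12_000, 15_000)  # t0 = 12,000

assert session_today.contains(suspected_bug)
assert not session_tomorrow.contains(suspected_bug)
```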

Similarly, simulation time t3 represents the last simulation time in the selected simulation session range. For example, assume that today's debug session covers the simulation session range from t=14,555 to t=16,750. For this simulation session range, simulation time t3 is t=16,750. Assume that a particular bug is located and fixed within this simulation session range, t=14,555 to t=16,750. The user then moves on to the next simulation session range, t=16,750 to t=19,100. Here, simulation time t3 is t=19,100. In some cases, simulation time t3 is the very last simulation time of the designer's final debug session.

The user can continue simulating beyond simulation time t3 if desired, but for the moment he is focused on debugging his design from simulation time t0 to t3, the current simulation session range. Typically, once the bugs in the current simulation session range have been ironed out, the user will simulate his design in the next simulation session range, beyond simulation time t3.

In this summary representation of the simulation session range, the simulation times t0-t3 are not necessarily contiguous; that is, simulation times t0 and t1 are not immediately adjacent to each other. Indeed, simulation times t0 and t1 may be thousands of individual simulation time periods apart.

Because one embodiment of the present invention is implemented in the RCC system, reference will be made to various components of the RCC system shown in FIG. 83. First, the RCC system's input and simulation history generation operation will be discussed. This generation operation includes some form of data compression of the primary inputs and some form of recording of the compressed primary inputs. Second, the RCC system's VCD generation operation will be discussed. This operation includes decompressing the primary inputs to reproduce the simulation history and dumping the hardware states into a VCD file for the simulation target range. Third, the VCD review process will be discussed. Although the term "simulation history" is sometimes used, it does not mean that the entire debug session involves software simulation. Indeed, the RCC system generates the VCD file from the hardware states, and the software model is used only for later analysis of the VCD file.

Input and Simulation History Generation: Compression and Recording

Initially, the user models his design in software in the RCC computing system 2600 of FIG. 83. For some portion of the design, the RCC computing system 2600 generates a hardware model of the design based on the hardware description language (e.g., VHDL). The hardware model is configured in the array of reconfigurable logic elements 2621 that is part of the RCC hardware accelerator 2620. With this setup, the user can simulate the design in software in the RCC computing system 2600, accelerate a portion of the design (i.e., a range of simulation time steps or a distinct physical section of the circuit) using the RCC hardware accelerator 2620, or use a combination of simulation and hardware acceleration.

The user has just completed his latest circuit design. It is now time to debug the design to find defects. If the user has already debugged a previous version of the design, he has some idea of where bugs may be located. On the other hand, if this is the first debug session for a new design, the user must come up with some idea of where potential bugs lie. In either case, some guesswork is generally needed to locate bugs. For this discussion, assume that the user is debugging the design for the first time.

When debugging a design, the user selects a simulation session range. In theory, this simulation session range can be any length of simulation time. In practice, however, it should be short enough to isolate a few bugs in the design, yet long enough to move the debugging process along quickly and to minimize the number of debug sessions needed to fully debug the design. Clearly, a simulation session range of two or three simulation time steps will not reveal the existence of any bugs. Moreover, such a small simulation session range would force the user to perform many repetitive tasks that slow down the debug process. Conversely, if the selected simulation session range is a million simulation steps, too many bugs may manifest themselves, making it difficult for the user to mount a focused attack on the problem.

Once the user has selected the simulation session range, he commands the RCC system to fast-simulate from simulation time t0 to simulation time t3, as shown in FIG. 84. As noted above, the distinct simulation times t0 through t3 can span any selected range, but simulation time t0 represents the start of the simulation and simulation time t3 represents the last simulation time of the simulation session range.

At simulation time t0, fast simulation begins in the RCC computing system 2600. Fast simulation, rather than the normal simulation mode, is performed from simulation time t0 to simulation time t3 because a representation of the software model is not needed during this period. As discussed elsewhere in this patent specification, the regeneration operation requires the RCC computing system 2600 to receive hardware state information (e.g., node values, register states) so that finer logic elements (e.g., combinational logic) can be regenerated in software for further analysis by the user. Of course, some users may want to observe the software model during the simulation process, in which case the RCC computing system 2600 does not perform fast simulation. In that case, the simulation process is slower because of the additional time the RCC computing system 2600 needs to regenerate the software model from the primary outputs of the hardware model.

Initially, the complete state of the design, such as the software model state and the hardware model register and node values, is saved at simulation time t0 in a file on the system disk called the "simulation history" file. This allows the user to load the design state into the RCC system at any future time for debugging. During the fast simulation period covering the simulation session range from simulation time t0 to simulation time t3, the RCC computing system 2600 applies two distinct processes to the primary inputs (I_P) in parallel. The raw primary inputs from the test bench process 2603 are provided on line 2610 to the RCC hardware accelerator 2620 for evaluation. At the same time, the same primary inputs from the test bench process are compressed and recorded on the system disk in a separate file called the "input history" file, so that the entire history of the primary inputs is collected and the user can later replay any portion of the simulation. Specifically, the primary inputs corresponding to simulation time t0 through simulation time t3 are compressed and saved on the system disk.
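The two parallel uses of the primary (first) inputs described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the byte-per-signal encoding, and the use of zlib are hypothetical choices, and the sketch runs the two paths one after the other inside a loop rather than truly in parallel.

```python
import zlib

def fast_simulate(primary_inputs, evaluate):
    """Feed raw inputs to the hardware model while compressing the same stream.

    `evaluate` is a hypothetical stand-in for the hardware accelerator; the
    returned bytes stand in for the "input history" file on the system disk.
    """
    compressor = zlib.compressobj()
    history = bytearray()
    for step_inputs in primary_inputs:
        evaluate(step_inputs)                         # raw, uncompressed path
        history += compressor.compress(bytes(step_inputs))  # recorded path
    history += compressor.flush()
    return bytes(history)

def replay_inputs(history, width):
    """Decompress the recorded history back into per-step input vectors."""
    raw = zlib.decompress(history)
    return [list(raw[i:i + width]) for i in range(0, len(raw), width)]
```

Because the recorded stream decompresses to exactly the inputs that were evaluated, any portion of the session could later be replayed from it.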

When the RCC hardware accelerator 2620 receives the primary inputs (I_P) from the test bench process 2603, it processes them. As a result, the hardware state of the hardware model is likely to change as the various logic and other circuit devices evaluate the data. During the period from simulation time t0 to simulation time t3, the RCC system need not wait for the RCC computing system 2600 to perform logic regeneration, because the user gains nothing from debugging the design in fine detail during the fast simulation period. The RCC system also does not save the primary outputs (e.g., hardware node values and register states) at all. Note that while the RCC computing system 2600 compresses the primary inputs for recording in the "input history" file, the RCC hardware accelerator 2620 evaluates the raw, uncompressed primary inputs. In another embodiment, the RCC system does not compress the primary inputs before recording them in the input history file.

Why does the RCC computing system 2600 deliver the primary inputs to the RCC hardware accelerator for evaluation when the outputs are not saved at all during the fast simulation period? The RCC system needs to save the hardware state of the design at simulation time t3, and that state depends on the evaluation of the primary inputs from the start of the simulation up to simulation time t3. An accurate snapshot of the hardware model state cannot be obtained at simulation time t3 unless the hardware model has evaluated the entire history of primary inputs from the start up to t3, rather than only the inputs at simulation time t3. Logic circuits have memory: the order of the inputs affects the evaluation result. Thus, if only the primary inputs at simulation time t3 (or at a simulation time just before t3) are fed to the hardware model for evaluation, the hardware model will show an incorrect state at simulation time t3.
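A toy example may make the memory argument concrete. The 4-bit accumulator below is a hypothetical stand-in for a hardware model with state; it is not taken from the specification. Evaluating the full input history and evaluating only the last input leave the model in different states.

```python
def evaluate(state, inputs):
    """Toy sequential circuit: a 4-bit accumulator that depends on all past inputs."""
    for value in inputs:
        state = (state + value) & 0xF
    return state

inputs = [3, 5, 1, 6, 2]                      # primary inputs from t0 through t3

full_history_state = evaluate(0, inputs)      # every input evaluated in order
last_input_only = evaluate(0, inputs[-1:])    # only the input at t3 evaluated

print(full_history_state, last_input_only)    # prints "1 2"
```

Only the first result is a valid snapshot at t3, which is why the accelerator evaluates every input even though no outputs are kept.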

Why is the hardware model state saved at simulation time t3? A large design with more than one million gates and more than one million simulation steps cannot be debugged within a relatively short period. The user needs multiple simulation sessions to debug such a design. To move quickly from one simulation session to the next, the RCC system saves the hardware state at simulation time t3 (along with the compressed primary inputs), so that the user can debug the next simulation session range starting at simulation time t3. Because the hardware model state is saved, the user need not simulate from the very beginning of the simulation; rather, after debugging the design from simulation time t0 to simulation time t3, the user can quickly and conveniently return to simulation time t3. The hardware model state at simulation time t3, stored in the simulation history file, represents a correct snapshot of the design that reflects the entire history of primary inputs up to that point.

The hardware model in the RCC hardware accelerator 2620 provides its internal hardware state on line 2611 to the RCC computing system 2600, so that the RCC computing system 2600 can construct or regenerate the various logic elements (e.g., combinational logic) of the software model if necessary and desired by the user. As noted above, however, the user is not interested in observing the software simulation during fast simulation of the simulation session range. Because the user is not examining these internal hardware states for bugs at this time, the internal hardware states from the RCC hardware accelerator are not saved on the system disk.

At simulation time t3, the end of the simulation session range, this particular fast simulation operation stops. The evaluation results, or primary outputs (e.g., register values), from the hardware model of the design in the RCC hardware accelerator 2620 that correspond to simulation time t3 are saved in the simulation history file. This is done so that, once the user has debugged the design from simulation time t0 to simulation time t3, the user can proceed directly to simulation time t3 for further debugging as needed. The user does not have to rerun the simulation from simulation time t0 to debug the design at some point beyond simulation time t3.

In summary, from simulation time t0 to simulation time t3 (i.e., the simulation session range), the user fast simulates the design by feeding the primary inputs from the test bench process 2603 on line 2610 to the RCC hardware accelerator 2620 while, at the same time, compressing the same primary inputs and saving them on the system disk for future reference. The RCC computing system 2600 needs the primary inputs (compressed or uncompressed) saved in the input history file in order to replay the debug session. The compression operation occurs concurrently with the data evaluation in the RCC hardware accelerator 2620. Finally, at simulation time t3, the end of the simulation session range, the RCC system saves the state information of the hardware model in the simulation history file.

In one embodiment of the present invention, all of the recorded, compressed primary inputs from the simulation session range are part of the same file, which is later modified to include the hardware state information from simulation time t3. In another embodiment, the information saved from the simulation session range and the hardware state information at simulation time t3 are stored as distinct files on the system disk. Similarly, any of these files can be modified to include the VCD on-demand information that is generated later for the simulation target range. Alternatively, the VCD on-demand information can be stored as a distinct VCD file on the system disk, separate from the compressed primary input file and the file holding the hardware state information at simulation time t3. In other words, according to one embodiment of the present invention, the input history file, the simulation history file, and the VCD file can be integrated into a single file. In another embodiment, the input history file, the simulation history file, and the VCD file can be separate files. In addition, the input history file and the simulation history file can be combined into one file that is separate from the VCD file.

The compression scheme will now be discussed. According to one embodiment of the present invention, the compression in the RCC system achieves a compression ratio of 20X for primary input events, assuming 10% input events per simulation time step. A large ASIC design of more than one million gates may require 200 primary input events. With 10% input events per simulation time step, approximately 20 inputs need to be compressed and recorded. If each input signal takes 2 bytes, 20 input signals amount to 40 bytes of primary input data to process per simulation time step. With a compression ratio of 20X, the 40 bytes of data are compressed to 2 bytes per simulation time step. Thus, for a design that requires about one million simulation steps, the RCC system compresses the primary inputs into 2 megabytes of data. A file of this size can be handled easily by any computing file system and waveform viewer. In one embodiment, ZIP compression is used.
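The sizing argument above is plain arithmetic and can be checked directly. The variable names are illustrative; the figures are the ones quoted in the text, and a megabyte is taken as 1,000,000 bytes.

```python
primary_input_events = 200      # large ASIC design, as quoted above
activity_percent = 10           # share of inputs with an event per time step
bytes_per_input = 2
compression_ratio = 20
simulation_steps = 1_000_000

active_inputs = primary_input_events * activity_percent // 100      # 20 inputs
raw_bytes_per_step = active_inputs * bytes_per_input                # 40 bytes
compressed_bytes_per_step = raw_bytes_per_step // compression_ratio # 2 bytes
total_megabytes = compressed_bytes_per_step * simulation_steps / 1_000_000

print(active_inputs, raw_bytes_per_step, compressed_bytes_per_step, total_megabytes)
# prints "20 40 2 2.0"
```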

According to one embodiment, primary input compression is performed concurrently with the evaluation of the primary inputs by the RCC hardware accelerator 2620; that is, the input history file is generated while the primary inputs are being evaluated. The compression scheme therefore has no direct adverse effect on RCC system performance. One possible bottleneck is the process of writing the compressed primary inputs to the system disk. Because the data is highly compressed, however, the RCC system experiences a slowdown of less than 5% for most designs running at 50,000 simulation time steps per second.

As for the specific manner in which recording is controlled in the RCC system, according to one embodiment of the present invention the user must first use $rcc(record) to initialize the RCC record feature:

$rcc(record, name, <disk space>, <checkpoint control>);

The arguments name, <disk space>, and <checkpoint control> will now be explained. The "name" argument is the record name for the current simulation session range. Different names are required to distinguish different simulation runs of the same design. Distinct record names are required in particular for off-line VCD on-demand debugging.

The <disk space> argument is an optional parameter that specifies the maximum disk space (in MB) allocated to the RCC system record process. The default value is 100 MB. The RCC system records only the final portion of the current simulation session range that fits within the specified disk space. In other words, if the <disk space> value is specified as 100 MB but the current simulation session range requires 140 MB, the RCC system records only the last 100 MB and discards the first 40 MB of compressed primary inputs. This aspect of the invention is an advantage for failure analysis. In one embodiment of the present invention, the test bench process has self-checking features that detect a simulation failure and stop the simulation. The final portion of the RCC simulation history can provide most of the information needed for such failure analysis.
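The "keep only the last portion" behavior resembles a bounded log that drops its oldest data when the budget is exceeded. The class below is a minimal sketch of that idea, not the RCC implementation; the class name and the chunk granularity are assumptions, and a real recorder would presumably have to discard data at checkpoint boundaries so that the remaining history stays decodable.

```python
from collections import deque

class BoundedRecorder:
    """Hypothetical recorder that keeps only the most recent max_bytes of data."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.chunks = deque()
        self.size = 0

    def write(self, chunk):
        self.chunks.append(chunk)
        self.size += len(chunk)
        # Drop the oldest chunks until the recording fits the budget again.
        while self.size > self.max_bytes and self.chunks:
            dropped = self.chunks.popleft()
            self.size -= len(dropped)

# Scaled-down version of the 100 MB / 140 MB example: one byte stands for 1 MB.
recorder = BoundedRecorder(max_bytes=100)
for i in range(140):
    recorder.write(bytes([i]))
```

After the loop, the recorder holds chunks 40 through 139, mirroring the discarded first 40 MB in the text.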

The <checkpoint control> argument is an optional parameter that specifies the number of simulation time steps between full-state checkpoints. The default is 1,000,000 steps. As with most conventional compression algorithms, the compressed primary inputs are based on the state differences between successive simulation steps. For a long simulation run, checkpointing the full RCC state at a low frequency makes extraction of the simulation history easier. With a decompression rate of 20K to 200K simulation time steps per second in the RCC system and checkpoints placed every one million steps, the RCC system can extract any portion of the simulation history (i.e., replay the simulation from the primary inputs and generate the selected VCD file) within 5 to 50 seconds.
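The 5 to 50 second figure follows from the checkpoint spacing: in the worst case, replay starts at the nearest earlier checkpoint and runs for one full interval. A small sketch of that calculation, with hypothetical helper names:

```python
def worst_case_extraction_seconds(checkpoint_interval_steps, steps_per_second):
    """Time to replay one full checkpoint interval at the given decompression rate."""
    return checkpoint_interval_steps / steps_per_second

def nearest_checkpoint(target_step, checkpoint_interval_steps):
    """Latest checkpoint at or before target_step, assuming checkpoints at multiples."""
    return (target_step // checkpoint_interval_steps) * checkpoint_interval_steps

fastest = worst_case_extraction_seconds(1_000_000, 200_000)   # 5.0 seconds
slowest = worst_case_extraction_seconds(1_000_000, 20_000)    # 50.0 seconds
start = nearest_checkpoint(2_345_678, 1_000_000)              # step 2,000,000
```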

When this $rcc(record) command is invoked, the RCC system records the simulation history; that is, the primary inputs are compressed and recorded in a file for storage on the system disk. The primary outputs from the RCC hardware accelerator are ignored because software logic regeneration is not needed at this point. The record process can be ended with the command $rcc(stop) or $rcc(off), at which point the RCC system switches control of the simulation back to the software model. From this point on, the primary outputs are processed for software logic regeneration.

VCD Generation - Compression and Dump

As described above, the RCC system saves the software model and the hardware model at the start of the simulation session range at simulation time t0, records the compressed primary inputs for the entire simulation session range in the input history file, and saves the hardware model state of the design at the end of the simulation session range in the simulation history file. The user therefore has enough information to load the design as it was at the start of the simulation session range, using the design information saved at simulation time t0. With the compressed primary inputs, the user can software simulate any portion of the design. With the VCD on-demand feature, however, the user will probably not want to software simulate the design at this point. Rather, the user will want to generate a VCD file for a selected simulation target range so that a bug can be isolated and fixed through fine-grained analysis. Indeed, with the recorded compressed primary inputs, the RCC system can replay any point within the simulation session range. In addition, the RCC system can simulate beyond the current simulation session range, if desired, by loading the hardware state information previously saved at simulation time t3.

After fast simulating the design, the user reviews the results to determine whether a bug exists. If no bug is apparent to the user, the design may be free of bugs for the current simulation session range. The user then proceeds to simulate the next simulation session range beyond the current one, whatever range is selected. If, however, the user determines that the design has some kind of problem, the user must analyze the simulation more carefully to isolate and fix the bug. Because the entire simulation session range is too large for careful and detailed analysis, the user must target a specific, narrower range for in-depth study. Based on familiarity with the design and on past debugging efforts, the user makes a reasonable guess as to where the bug is located within the simulation session range. The user then focuses on a selected simulation target range that corresponds to this guess about the location of the bug (or where the bug manifests itself). The user determines that the simulation target range lies between simulation time t1 and simulation time t2, as shown in FIG. 84.

The RCC system loads the hardware model in the RCC hardware accelerator 2620 and the software model of the design in the RCC computing system 2600 with the configuration information previously saved at simulation time t0. The RCC system then fast simulates from simulation time t0 to simulation time t1. During this fast simulation operation, the RCC computing system loads the previously saved file containing the compressed primary inputs, decompresses them, and feeds the decompressed primary inputs to the RCC hardware accelerator 2620 for evaluation. As in the initial fast simulation operation, in which the primary inputs were compressed and saved for the simulation session range, the evaluated results, or primary outputs (e.g., hardware model node values and register states), are not saved during the fast simulation from simulation time t0 to simulation time t1.

Once the fast simulation operation reaches the beginning of the simulation target range, i.e., simulation time t1, the RCC system dumps the evaluated results (i.e., the primary outputs O_p) from the hardware model in the RCC hardware accelerator 2620 into a VCD file on the system disk. Unlike the initial fast simulation operation for the simulation session range, the RCC computing system 2600 does not perform any compression. Again, the RCC computing system 2600 does not perform the regeneration operation for the software model, because the user does not need to view the evaluation results at this time. By skipping regeneration of the software model, the RCC system can generate the VCD file quickly.

In another embodiment, however, the user can also view the software model of the user design during this simulation period from t1 to t2 while the primary outputs are being saved. In that case, the RCC computing system 2600 performs the software model regeneration operation so that the user can view any and all states from any aspect of the user design.

At simulation time t2, the RCC computing system 2600 stops saving the evaluated outputs from the RCC hardware accelerator 2620 to the VCD file. At this point, the user can stop fast simulating. The RCC system now has a complete VCD file for the simulation target range, and the user can proceed to analyze the VCD file in more detail.

When the user wants to analyze the VCD file, the user does not have to rerun the simulation from the very beginning (e.g., simulation time t0). Instead, the user can instruct the RCC system to load the saved hardware state information from the start of the simulation target range and view the simulated results with the software model. This is described in more detail below in the simulation history review section.

In analyzing the VCD file, the user may or may not find the bug. If the bug is found, the user will of course begin fixing the design. If the bug is not found, the user may have guessed wrong about the simulation target range suspected of containing the bug. The user must then repeat the same decompression and VCD file dump process described above, making another guess at a simulation target range within the simulation session range that will hopefully prove better. The RCC system fast simulates from the start of the simulation session range to the new simulation target range, decompressing the primary inputs and delivering them to the RCC hardware accelerator 2620 for evaluation. When the RCC system reaches the start of the new simulation target range, the primary outputs from the RCC hardware accelerator 2620 are dumped into a VCD file. At the end of the new simulation target range, the RCC system stops dumping hardware state information into the VCD file. At this point, the user can view the VCD file to isolate the bug.

In short, from simulation time t0 to simulation time t1, the RCC system fast simulates by decompressing the previously compressed primary inputs and delivering them to the hardware model for evaluation. During the simulation target range from simulation time t1 to simulation time t2, the RCC system dumps the primary outputs from the hardware model into a VCD file. At the end of the simulation target range, the user can stop fast simulating the design. The user can then view the VCD file by going directly to simulation time t1, without rerunning the simulation from the very beginning at simulation time t0.
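The replay-then-dump sequence summarized above can be sketched as a single loop. This is an illustration only: `accumulate` is a hypothetical stand-in for the hardware model, integer indices stand in for simulation times, and a list of tuples stands in for the VCD file.

```python
def accumulate(state, value):
    """Hypothetical hardware model: the new state depends on the old state and input."""
    return state + value

def vcd_on_demand(replayed_inputs, evaluate, initial_state, t1, t2):
    """Replay inputs from t0, but record outputs only inside the target range [t1, t2]."""
    state = initial_state
    dump = []
    for t, value in enumerate(replayed_inputs):
        state = evaluate(state, value)      # every step is evaluated
        if t1 <= t <= t2:
            dump.append((t, state))         # outputs kept only in the target range
        if t >= t2:
            break                           # fast simulation can stop at t2
    return dump

dump = vcd_on_demand([1, 2, 3, 4, 5, 6], accumulate, 0, t1=2, t2=4)
print(dump)   # prints "[(2, 6), (3, 10), (4, 15)]"
```

The states recorded at steps 2 to 4 already reflect the inputs at steps 0 and 1, even though nothing was dumped for those steps.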

When the review of this simulation target range is complete and the bug has been isolated and removed, the user can proceed to the next simulation session range. This new simulation session range begins at simulation time t3. The length of the new simulation session range, which can be the same as that of the previous simulation session range, is selected by the user. The RCC system loads the previously saved hardware state information corresponding to simulation time t3 and is then ready to fast simulate the new simulation session range. Note that this new simulation session range again corresponds to the range from simulation time t0 to t3, where the loaded hardware state now corresponds to simulation time t0. The fast simulation, VCD on-demand dump, and VCD review processes are similar to those described above.

According to one embodiment of the present invention, the decompression step does not negatively affect performance. The RCC system decompresses the simulation history (i.e., the compressed and recorded primary inputs) at a rate of 20,000 to 200,000 simulation time steps per second. With proper checkpoint control, the RCC system can extract the simulation history (i.e., replay the simulation from the primary inputs and generate the selected VCD file) within 50 seconds.

As for the specific manner in which the VCD on-demand feature is controlled in the RCC system, the user uses the $axis_rpd command. $axis_rpd is an interactive command that extracts the RCC evaluation records and generates a VCD file on demand. Unlike conventional simulation rewind techniques, executing the $axis_rpd command neither rewinds the internal simulation state nor corrupts the external PLI and file I/O states. The user can continue simulating after executing the $axis_rpd command in the same way as after a $stop command.

When no argument is specified, the $axis_rpd command displays all of the simulation time periods available within the simulation session range, from which the user can select a simulation target range. The time unit is the same as the time unit of the command line interface. An example simulation log follows:

C1>$rcc(record, r1);

C2>#1000 $rcc(xt0,run);

C3>#50000 $rcc(off);

C4>#50500 $rcc(run);

C5>#60000 $rcc(stop);

… start RCC engine at 100500

… back to SIM: stop RCC engine at 5000000

… start RCC engine at 5050500

… back to SIM: stop RCC engine at 60000.0000ns

C6>$axis_rpd;

Available simulation history:

1005.000000 to 50000.000000

50505.000000 to 60000.00000

Interrupt at simulation time 60000.0000ns

From this simulation log, the user ran the RCC engine from just after time 1000 to time 50000 and from just after time 50500 to time 60000. $axis_rpd therefore reports the recorded simulation windows.

To generate a VCD file from the simulation history, the user invokes the $axis_rpd command with the following control arguments:

$axis_rpd(start-time, end-time, "dump-file-name", <level and scope control>);

start-time and end-time define the simulation time window for the VCD file, i.e., the simulation target range. The time-control arguments use the same time unit as the command line interface. "dump-file-name" is the name of the VCD file. The dump <level and scope control> parameters are the same as those of the standard $dumpvars command in IEEE Verilog.

An example of the $axis_rpd command:

C7>$axis_rpd(50505,50600, "fl.dump");

… RCC VCD start at 50505.0100000 !!

… RCC VCD end at 50600.000000 !!

This $axis_rpd command generates a VCD file named "fl.dump" for the simulation target range from simulation time 50505 to 50600. As with $dumpvars, when no level and scope control parameters are given, the $axis_rpd command dumps the entire hardware state or the primary outputs.

Another example of using the $axis_rpd command:

C8>$axis_rpd(40444,50600,"fl.dump",2,dp0)

… RCC VCD start at 40000.000000 !!

… Skip at time 50000.000000

… Continue at time 50505.000000 !!

… RCC VCD end at 50600.000000 !!

This $axis_rpd command generates a two-level VCD file "f2.dump" on scope dp0 from time 40000 to 50600. Because the simulation was swapped back to software control from time 50000 to 50500, $axis_rpd skips that window: no simulation record is available for it.
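The skipping behavior described above can be pictured as clipping the requested window against the recorded windows. The following is a minimal Python sketch under stated assumptions: the function name clip_to_history and the representation of the history as (start, end) pairs are hypothetical, chosen only for illustration, and are not part of the RCC system.

```python
def clip_to_history(start, end, windows):
    """Return the parts of [start, end] that fall inside recorded windows.

    windows is a list of (window_start, window_end) pairs, as listed by
    $axis_rpd when it is called without arguments (hypothetical encoding).
    """
    segments = []
    for w_start, w_end in sorted(windows):
        lo = max(start, w_start)
        hi = min(end, w_end)
        if lo < hi:  # keep only non-empty overlaps
            segments.append((lo, hi))
    return segments


# Recorded windows taken from the "Available simulation history" listing.
history = [(1005.0, 50000.0), (50505.0, 60000.0)]

# A request spanning the unrecorded gap is split into two segments.
segments = clip_to_history(40000.0, 50600.0, history)
```

With the history from the log, a request for 40000 to 50600 yields one segment ending at 50000 and a second starting at 50505, which mirrors the "Skip" and "Continue" messages in the example.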

VCD on-demand is also available after the user has terminated the simulation process. To perform off-line VCD on-demand, the user starts the simulation program, named "vlg", with the +rccplay option. With this option, the RCC system is instructed to extract the simulation record instead of performing the normal initialization sequence for simulation. Once inside the simulation program, the user can use the same $axis_rpd command to obtain VCD on demand. An example of this procedure follows:

axis1:3-dpo_rtlc>vlg +rccplay_rl -s

… Start replaying record ./AxisWork/rl at time 100500

C1>$axis_rpd;

Available simulation history:

1005.000000 to 50000.000000

50505.000000 to 60000.000000

Interrupt at simulation time 100500

C2>$axis_rpd(40000,45000, "f2.dump");

… RCC VCD start at 40000.000000 !!

… RCC VCD end at 45000.000000 !!

Interrupt at simulation time 4500000

C3>

In the above example, the simulation record ("rl") is used to extract the simulation history and to generate a VCD for the entire design from time 40000 to 45000.

Simulation history review

Once the VCD file for the simulation target range (i.e., simulation time t1 to t2) has been generated by the RCC system, the user need not fast-simulate from simulation time t2 to t3. Instead, the RCC system allows the user to stop the simulation and proceed directly to the start of the simulation target range, simulation time t1. Thus, in contrast to the prior art, the user does not have to rerun the simulation from its beginning (e.g., simulation time t0). The hardware state dumped into the VCD file reflects the evaluation of the entire history of primary inputs from simulation time t0, including the primary inputs from simulation time t1 to t2.

The RCC system loads the VCD file. The stored primary outputs are then delivered to the RCC computing system 2600 so that the software model and all of its many combinational logic circuits can be regenerated with the correct state information. The user then views the software model with a waveform viewer for debugging. With the VCD, the user can step very carefully through the software model until the bug is isolated.

With this VCD on-demand feature, the user can select any simulation target range within the simulation session range. If the bug cannot be found within the selected simulation target range, the user can select a different simulation target range on demand. Because all primary inputs from the test bench process can be recorded for the entire simulation session range, any portion of the simulation can be regenerated and viewed on demand without rerunning the simulation. This feature lets the user focus repeatedly on multiple, different simulation target ranges until the bug within the simulation session range is fixed.
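The underlying idea, that a deterministic model plus a record of its primary inputs is enough to regenerate any target range, can be illustrated with a toy Python sketch. Everything here is an assumption for illustration: the 8-bit accumulator stands in for the hardware model, and the names step, run_and_record, and replay are hypothetical. It is not the RCC implementation, which evaluates the record in the reconfigurable hardware.

```python
def step(state, primary_input):
    """Toy deterministic design: an 8-bit accumulator (illustrative only)."""
    return (state + primary_input) & 0xFF


def run_and_record(initial_state, inputs):
    """Live session: advance the model and keep only the primary inputs."""
    state = initial_state
    for value in inputs:
        state = step(state, value)
    return list(inputs), state  # (record, final state)


def replay(initial_state, record, t1, t2):
    """Re-evaluate the record from t0 and dump states only for t1..t2."""
    state = initial_state
    dump = {}
    for t, value in enumerate(record[:t2 + 1]):
        state = step(state, value)
        if t >= t1:
            dump[t] = state
    return dump


record, final_state = run_and_record(0, [3, 5, 250, 7, 1, 9])

# Two different target ranges regenerated from the same record,
# without re-running the test bench that produced the inputs.
first = replay(0, record, 2, 3)
second = replay(0, record, 4, 5)
```

In this sketch the second target range is obtained from the same record as the first, which is the sense in which different target ranges can be examined repeatedly without rerunning the simulation.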

Moreover, this VCD on-demand feature is supported both on-line, during the simulation process, and off-line, after the simulation process has terminated. On-line support is possible because the hardware state at simulation time t0 can be saved to the system disk and the primary inputs can be compressed and recorded for any length of the simulation session range. The user can then specify a simulation target range for more focused analysis of the primary outputs.

Off-line support is possible because the entire primary inputs for the simulation session range, starting at simulation time t0, and the hardware state at simulation time t1 are all saved on the system disk. The user can therefore return to debugging the design by loading the design corresponding to simulation time t0 and then specifying a simulation target range. In addition, the user can proceed directly to the next simulation target range by loading the hardware state corresponding to simulation time t3.

Ⅵ. Hardware implementation

A. Overview

The SEmulation system implements an array of FPGA chips on a reconfigurable board. Based on the hardware model, the SEmulation system partitions, maps, places, and routes each selected portion of the user's circuit design onto the FPGA chips. Thus, for example, a 4x4 array of 16 chips can model a large circuit spread out across those 16 chips. The interconnect scheme allows each chip to access any other chip within two "jumps," or links.

Each FPGA chip implements an address pointer for each I/O address space (i.e., REG, CLK, S2H, H2S). All of the address pointers associated with a particular address space are chained together. During data transfer, word data in each chip is therefore selected sequentially from/to the main FPGA bus and the PCI bus, one word at a time within the selected address space on each chip, and one chip at a time, until the desired word data in that address space has been accessed. This sequential selection of word data is accomplished by a propagating word select signal. The word select signal travels through the address pointer in one chip, then propagates to the address pointer in the next chip, and continues on to the last chip or until the system initializes the address pointer.

The FPGA bus system on the reconfigurable board operates at twice the PCI bus bandwidth but at half the PCI bus speed. The FPGA chips are therefore separated into banks to use the larger-bandwidth bus. Because the throughput of this FPGA bus system keeps pace with the throughput of the PCI bus system, no performance is lost by reducing the bus speed. Expansion is possible through larger boards containing more FPGA chips, or through piggyback boards that extend the bank length.

B. Address pointer

FIG. 11 shows one embodiment of the address pointer of the present invention. All I/O operations are performed by DMA streaming. Because the system has only one bus, it accesses data sequentially, one word at a time. One embodiment of the address pointer therefore uses a shift register chain to access the selected words in these address spaces sequentially. Address pointer 400 includes flip-flops 401-405, an AND gate 406, and a pair of control signals, INITIALIZE 407 and MOVE 408.

Each address pointer has n outputs (W0, W1, W2, ..., Wn-1) for selecting one word out of n possible words in each FPGA chip corresponding to the same word in the selected address space. Depending on the particular user circuit design being modeled, the number of words n may vary from one circuit design to another and, for a given circuit design, n varies from FPGA chip to FPGA chip. In FIG. 11, address pointer 400 is only a 5-word (i.e., n=5) address pointer. The particular FPGA chip containing this 5-word address pointer for a particular address space therefore has only 5 words to select from. Of course, address pointer 400 can implement any number of words n. The output signal Wn may be called the word select signal. When the word select signal reaches the output of the last flip-flop in the address pointer, it is called the OUT signal, which is propagated to the input of the address pointer of the next FPGA chip.

When the INITIALIZE signal is asserted, the address pointer is initialized. The first flip-flop 401 is set to "1" and all other flip-flops 402-405 are set to "0". At this point, initialization of the address pointer does not enable any word selection; that is, all Wn outputs are still "0" after initialization. The address pointer initialization procedure is discussed with respect to FIG. 12.

The MOVE signal controls the advance of the pointer for word selection. The MOVE signal is derived from the READ, WRITE, and SPACE index control signals from the FPGA I/O controller. Because every operation is essentially a read or a write, the SPACE index signal essentially determines which address pointer receives the MOVE signal. The system thus activates only one address pointer at a time, the one associated with the selected I/O address space, and during that time provides the MOVE signal to that address pointer. MOVE signal generation is discussed below with respect to FIG. 13. Referring to FIG. 11, when the MOVE signal is asserted, it is provided to an input of AND gate 406 to enable the inputs of flip-flops 401-405. The logic "1" therefore moves from word output Wi to Wi+1 every system clock cycle; that is, the pointer moves from Wi to Wi+1 to select a particular word each cycle. When the shifting word select signal reaches the output 413 (labeled "OUT" herein) of the last flip-flop 405, the OUT signal must then proceed to the next FPGA chip through the multiplexed cross-chip address pointer chain described with respect to FIGS. 14 and 15, unless the address pointer is initialized again.
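The shift-register behavior of the address pointer can be approximated in software. The Python sketch below is one plausible reading of the text, not a gate-accurate model of FIG. 11: the class name AddressPointer is hypothetical, and the gating of the word select outputs by MOVE is an assumption made so that no word is selected immediately after initialization, as the description states.

```python
class AddressPointer:
    """One-hot shift-register address pointer with n word select outputs."""

    def __init__(self, n_words):
        self.flops = [0] * n_words

    def initialize(self):
        # INITIALIZE: the first flip-flop holds "1", all others "0".
        self.flops = [1] + [0] * (len(self.flops) - 1)

    def clock(self, move):
        """Advance one system clock cycle; return (word_selects, out)."""
        # Assumption: word selects are gated by MOVE, so all W outputs
        # stay "0" while MOVE is low (e.g., right after initialization).
        word_selects = [bit & move for bit in self.flops]
        out = self.flops[-1]  # OUT is the output of the last flip-flop
        if move:
            # The single "1" shifts one position toward the last flip-flop.
            self.flops = [0] + self.flops[:-1]
        return word_selects, out


pointer = AddressPointer(5)  # the 5-word case (n=5) of FIG. 11
pointer.initialize()
idle_selects, idle_out = pointer.clock(move=0)
trace = [pointer.clock(move=1) for _ in range(5)]
selected_words = [selects.index(1) for selects, _ in trace]
out_values = [out for _, out in trace]
```

In this model the five MOVE cycles select words W0 through W4 in order, and OUT is "1" only on the cycle in which the last word is selected, which is the point at which the word select signal would be handed to the next chip.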

The address pointer initialization procedure is now described. FIG. 12 shows a state transition diagram of address pointer initialization for the address pointer of FIG. 11. Initially, state 460 is idle. When DATA_XSFR is set to "1", the system proceeds to state 461, where the address pointer is initialized. Here the INITIALIZE signal is asserted. The first flip-flop in each address pointer is set to "1" and all other flip-flops in the address pointer are set to "0". At this point, initialization of the address pointer does not enable any word selection; that is, all Wn outputs are still "0". The next state is wait state 462, where DATA_XSFR is still "1". When DATA_XSFR becomes "0", the address pointer initialization procedure is complete and the system returns to idle state 460.
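The three-state initialization procedure can be written as a small state machine. The Python sketch below follows the transitions described for FIG. 12; the function names are hypothetical, and the unconditional transition from state 461 to state 462 is an assumption, since the text does not say what happens if DATA_XSFR drops during state 461.

```python
IDLE, INIT, WAIT = 460, 461, 462  # state numbers as labeled in FIG. 12


def next_state(state, data_xsfr):
    """Return the next initialization state for the current DATA_XSFR value."""
    if state == IDLE:
        return INIT if data_xsfr else IDLE
    if state == INIT:
        return WAIT  # assumption: 461 always proceeds to the wait state
    return WAIT if data_xsfr else IDLE  # leave 462 once DATA_XSFR is "0"


def initialize_asserted(state):
    """INITIALIZE is asserted only while in state 461."""
    return int(state == INIT)


state = IDLE
visited = []
for data_xsfr in [0, 1, 1, 1, 0, 0]:
    state = next_state(state, data_xsfr)
    visited.append(state)
init_pulses = [initialize_asserted(s) for s in visited]
```

For the input sequence shown, INITIALIZE is asserted for exactly one cycle, and the machine waits in state 462 until DATA_XSFR returns to "0".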

The MOVE signal generator, which generates the various MOVE signals for the address pointers, is now discussed. The SPACE index generated by the FPGA I/O controller (FIG. 10; item 327 in FIG. 22) selects a particular address space (i.e., REG read, REG write, S2H read, H2S write, and CLK write). Within this address space, the system of the present invention sequentially selects the particular word to be accessed. Sequential word selection is accomplished in each address pointer by the MOVE signal.

One embodiment of the MOVE signal generator is shown in FIG. 13. Each FPGA chip 450 has address pointers corresponding to the various software/hardware boundary address spaces (i.e., REG, S2H, H2S, and CLK). In addition to the address pointers and the user's circuit design modeled and implemented in FPGA chip 450, a MOVE signal generator 470 is provided in FPGA chip 450. MOVE signal generator 470 includes an address space decoder 451 and several AND gates 452-456. The input signals are the FPGA read signal (F_RD) on wire line 457, the FPGA write signal (F_WR) on wire line 458, and the address space signal 459. The output MOVE signals for the respective address pointers are REGR-move on wire line 464, REGW-move on wire line 465, S2H-move on wire line 466, H2S-move on wire line 467, and CLK-move on wire line 468, depending on which address space's address pointer is applicable. These output signals correspond to the MOVE signal on wire line 408 (FIG. 11).

Address space decoder 451 receives a 3-bit input signal 459. The decoder can also receive just a 2-bit input signal. A 2-bit signal provides four possible address spaces, whereas a 3-bit input provides eight possible address spaces. In one embodiment, CLK is assigned "00", S2H is assigned "01", H2S is assigned "10", and REG is assigned "11". Depending on input signal 459, the address space decoder outputs a "1" on one of the wire lines 460-463, corresponding to REG, H2S, S2H, and CLK respectively, while the remaining wire lines are set to "0". Thus, if any of these wire lines 460-463 is "0", the corresponding output of AND gates 452-456 is "0". Likewise, if any of these wire lines 460-463 is "1", the corresponding output of AND gates 452-456 is "1" when the matching read or write signal is also asserted. For example, if address space signal 459 is "10", address space H2S is selected. Wire line 461 is "1" while the remaining wire lines 460, 462, and 463 are "0". Accordingly, wire line 466 is "1" while the remaining output wire lines 464, 465, 467, and 468 are "0". Similarly, if wire line 460 is "1", the REG space is selected and, depending on whether a read (F_RD) or write (F_WR) operation is selected, either the REGR-move signal on wire line 464 or the REGW-move signal on wire line 465 will be "1".

As noted above, the SPACE index is generated by the FPGA I/O controller. In code, the MOVE controls are:

REG space read pointer: REGR-move = (SPACE-index==#REG) & READ;

REG space write pointer: REGW-move = (SPACE-index==#REG) & WRITE;

S2H space read pointer: S2H-move = (SPACE-index==#S2H) & READ;

H2S space write pointer: H2S-move = (SPACE-index==#H2S) & WRITE;

CLK space write pointer: CLK-move = (SPACE-index==#CLK) & WRITE;

This is the code equivalent of the logic diagram of the MOVE signal generator of FIG. 13.
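The MOVE controls above translate directly into an executable form. The Python sketch below is offered only as an illustration: it uses the 2-bit SPACE index encoding given for one embodiment (CLK "00", S2H "01", H2S "10", REG "11"), and the function name move_signals and the dictionary layout are hypothetical.

```python
# 2-bit SPACE index encoding from the embodiment described in the text.
CLK, S2H, H2S, REG = 0b00, 0b01, 0b10, 0b11


def move_signals(space_index, read, write):
    """Decode the SPACE index and gate it with READ/WRITE (0 or 1)."""
    def gate(selected, strobe):
        # Models one AND gate: decoder output AND read/write strobe.
        return int(selected and bool(strobe))

    return {
        "REGR-move": gate(space_index == REG, read),
        "REGW-move": gate(space_index == REG, write),
        "S2H-move": gate(space_index == S2H, read),
        "H2S-move": gate(space_index == H2S, write),
        "CLK-move": gate(space_index == CLK, write),
    }


h2s_write = move_signals(H2S, read=0, write=1)
reg_read = move_signals(REG, read=1, write=0)
```

At most one MOVE signal is high for a given SPACE index and a single read or write strobe, which matches the statement that only one address pointer is active at a time.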

As noted above, each FPGA chip has as many address pointers as there are address spaces at the software/hardware boundary. If the software/hardware boundary has four address spaces (i.e., REG, S2H, H2S, and CLK), each FPGA chip has four address pointers corresponding to those four address spaces. Each FPGA needs these four address pointers because the particular selected word in the selected address space being processed may reside in any one or more FPGA chips, or because the data in the selected address space affects the various circuit elements modeled and implemented in each FPGA chip. To ensure that the selected word is processed by the appropriate circuit element(s) in the appropriate FPGA chip(s), each set of address pointers associated with a given software/hardware boundary address space (i.e., REG, S2H, H2S, and CLK) is "chained" together across several FPGA chips. The particular shifting, or propagating, word selection mechanism via the MOVE signal described above with respect to FIG. 11 is still used, except that in this "chain" embodiment the address pointer associated with a particular address space in one FPGA chip is "chained" to the address pointer associated with the same address space in the next FPGA chip.

Implementing four input pins and four output pins to chain the address pointers would accomplish the same purpose. However, such an implementation would be too costly in terms of efficient use of resources; that is, four wires would be needed between two chips, and four input pins and four output pins would be needed on each chip. One embodiment of the present invention uses a multiplexed cross-chip address pointer chain, which allows the hardware model to use only one wire between chips and only one input pin and one output pin (two I/O pins per chip) on each chip. One embodiment of the multiplexed cross-chip address pointer chain is shown in FIG. 14.

In the embodiment shown in FIG. 14, the user's circuit design is mapped and partitioned across three FPGA chips 415-417 in reconfigurable hardware board 470. The address pointers are shown as blocks 421-432. Each address pointer, for example address pointer 427, has a structure and function similar to the address pointer shown in FIG. 11, except that the number of words Wn and the number of flip-flops may vary depending on how many words are implemented in each chip for the user's custom circuit design.

For the REGR address space, FPGA chip 415 has address pointer 421, FPGA chip 416 has address pointer 425, and FPGA chip 417 has address pointer 429. For the REGW address space, FPGA chip 415 has address pointer 422, FPGA chip 416 has address pointer 426, and FPGA chip 417 has address pointer 430. For the S2H address space, FPGA chip 415 has address pointer 423, FPGA chip 416 has address pointer 427, and FPGA chip 417 has address pointer 431. For the H2S address space, FPGA chip 415 has address pointer 424, FPGA chip 416 has address pointer 428, and FPGA chip 417 has address pointer 432.

Each chip 415-417 has a multiplexer 418-420, respectively. These multiplexers 418-420 may be models, and the actual implementation may be a combination of registers and logic elements, as known to those skilled in the art. For example, the multiplexer may be several AND gates feeding an OR gate, as shown in FIG. 15. Multiplexer 487 includes four AND gates 481-484 and an OR gate 485. The inputs to multiplexer 487 are the OUT and MOVE signals from each address pointer in the chip. The output 486 of multiplexer 487 is a chain-out signal that is passed to the input of the next FPGA chip.

In FIG. 15, this particular FPGA chip has four address pointers 475-478 corresponding to the I/O address spaces. The outputs of the address pointers, the OUT and MOVE signals, are the inputs to multiplexer 487. For example, address pointer 475 has an OUT signal on wire line 479 and a MOVE signal on wire line 480. These signals are inputs to AND gate 481. The output of AND gate 481 is an input to OR gate 485. The output of OR gate 485 is the output of multiplexer 487. In operation, the OUT signal at the output of each address pointer, together with the corresponding MOVE signal and SPACE index, acts as a selector signal for multiplexer 487; that is, both the OUT and MOVE signals (the latter derived from the SPACE index signal) must be asserted active (e.g., logic "1") to propagate the word select signal out of the multiplexer onto the chain-out wire line. The MOVE signal is asserted periodically to move the word select signal through the flip-flops in the address pointer so that it can be characterized as the input MUX data signal.

Referring to FIG. 14, each of these multiplexers 418-420 has four sets of inputs and one output. Each input set includes (1) the OUT signal found on the last output Wn-1 wire line (e.g., wire line 413 in the address pointer shown in FIG. 11) for the address pointer associated with a particular address space, and (2) the MOVE signal. The output of each multiplexer 418-420 is the chain-out signal. The word select signal Wn shifting through the flip-flops in each address pointer becomes the OUT signal when it reaches the output of the last flip-flop in the address pointer. The chain-out signal on wire lines 433-435 becomes "1" only when both the OUT signal and the MOVE signal associated with the same address pointer are asserted active (e.g., "1").
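The AND-OR structure of the multiplexer can be captured in a few lines. The Python sketch below assumes each address pointer contributes one (OUT, MOVE) pair; the function name chain_out and the tuple layout are hypothetical and serve only to illustrate the gating described for FIGS. 14 and 15.

```python
def chain_out(pointer_signals):
    """Compute the chain-out signal of one chip.

    pointer_signals is an iterable of (out, move) pairs, one pair per
    address pointer in the chip. Each pair feeds one AND gate, and the
    AND gate outputs feed a single OR gate.
    """
    return int(any(out and move for out, move in pointer_signals))


# Word select signal still shifting inside the active pointer: no OUT yet.
still_shifting = chain_out([(0, 1), (0, 0), (0, 0), (0, 0)])

# Active pointer has reached its last flip-flop while MOVE is asserted.
reached_end = chain_out([(1, 1), (0, 0), (0, 0), (0, 0)])

# OUT and MOVE asserted on different pointers do not produce chain-out.
mismatched = chain_out([(1, 0), (0, 1), (0, 0), (0, 0)])
```

Because OUT and MOVE are combined per pointer before the OR, a single chain-out wire per chip suffices regardless of how many address spaces the chip serves.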

For multiplexer 418, the inputs are MOVE signals 436-439 and OUT signals 440-443, corresponding to the OUT and MOVE signals from address pointers 421-424, respectively. For multiplexer 419, the inputs are MOVE signals 444-447 and OUT signals 452-455, corresponding to the OUT and MOVE signals from address pointers 425-428, respectively. For multiplexer 420, the inputs are MOVE signals 448-451 and OUT signals 456-459, corresponding to the OUT and MOVE signals from address pointers 429-432, respectively.

In operation, for any given shift of the word selection signal Wn, only those address pointers or chains of address pointers associated with the selected I/O address space in the software/hardware boundary are active. Thus, in FIG. 14, only the address pointers in chips 415, 416, and 417 associated with one of the address spaces REGR, REGW, S2H, or H2S are active for a given shift. Also, for a given shift of the word selection signal Wn through the flip-flops, the selected words are accessed sequentially because of limits on bus bandwidth. In one embodiment, the bus is 32 bits wide and a word is 32 bits, so only one word can be accessed at a time and delivered to the appropriate resource.

While an address pointer is propagating or shifting the word selection signal through its flip-flops, the output chain-out signal is not asserted (e.g., not "1"), so the multiplexer in this chip is not yet ready to propagate the word selection signal to the next FPGA chip. When the OUT signal is asserted (e.g., "1"), the chain-out signal is asserted (e.g., "1"), indicating that the system is ready to propagate or shift the word selection signal to the next FPGA chip. Thus, accesses occur one chip at a time; that is, the word selection signal is shifted through the flip-flops in one chip before the word selection shift operation is performed for another chip. The chain-out signal is asserted only when the word selection signal reaches the end of the address pointer in each chip. In code, the chain-out signal is:

chain-out = (REGR-move & REGR-out) | (REGW-move & REGW-out) | (S2H-move & S2H-out) | (H2S-move & H2S-out);
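The expression above can be read as a plain Boolean function. The following is a minimal sketch in Python, not code from the patent; the function name `chain_out` and the dictionary-based signal representation are illustrative assumptions.

```python
# Hypothetical sketch: the four address spaces named in the text.
ADDRESS_SPACES = ("REGR", "REGW", "S2H", "H2S")


def chain_out(move: dict, out: dict) -> bool:
    """Return True when some address space has both MOVE and OUT asserted.

    `move` and `out` map each address-space name to a Boolean signal level.
    """
    return any(move[space] and out[space] for space in ADDRESS_SPACES)


# Example: REGR has OUT asserted but not MOVE, and REGW has MOVE asserted but
# not OUT, so no single address pointer has both signals active.
move = {"REGR": False, "REGW": True, "S2H": False, "H2S": False}
out = {"REGR": True, "REGW": False, "S2H": False, "H2S": False}
print(chain_out(move, out))
```

Pairing MOVE and OUT per address space matters here: chain-out must not fire when the two signals come from different address pointers.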

In sum, for X I/O address spaces in the system (i.e., REG, H2S, S2H, CLK), each FPGA has X address pointers, one for each address space. The size of each address pointer depends on the number of words required to model the user's custom circuit design in each FPGA chip. Assuming n words for a particular FPGA chip, and hence n words for the address pointer, this address pointer has n outputs (i.e., W0, W1, W2, ..., Wn-1). These outputs Wi are also called word selection signals. When a particular word Wi is selected, the Wi signal is asserted (i.e., "1"). The word selection signal shifts or propagates down the address pointer of this chip until it reaches the end of the address pointer, at which point it triggers the generation of a chain-out signal that starts the propagation of the word selection signal Wi through the address pointer in the next chip. In this manner, a chain of address pointers associated with a given I/O address space can be implemented across all of the FPGA chips in this reconfigurable hardware board.
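The order in which words are visited along such a chain can be sketched in a few lines of Python. This is an illustrative model only: the function name `word_select_sequence` is hypothetical, and it assumes the MOVE signal of the selected address space stays active for the whole traversal.

```python
from typing import Iterator, List, Tuple


def word_select_sequence(words_per_chip: List[int]) -> Iterator[Tuple[int, int, bool]]:
    """Yield (chip, word, chain_out) for each shift of the word selection signal.

    words_per_chip[i] is n for chip i, i.e. the number of outputs W0..Wn-1 of
    that chip's address pointer for the selected address space.
    """
    for chip, n in enumerate(words_per_chip):
        for word in range(n):
            # Chain-out is asserted only when the signal sits on the last
            # output (Wn-1) of this chip's address pointer.
            yield chip, word, word == n - 1


# Example: a chain across two chips holding 2 and 3 words respectively.
for chip, word, chain in word_select_sequence([2, 3]):
    print(f"chip {chip} W{word} chain-out={int(chain)}")
```

Only one word is selected per shift, which matches the one-word-at-a-time bus access described earlier.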

C. Gated Data/Clock Network Analysis

The various embodiments of the present invention perform clock analysis in association with gated data logic and gated clock logic analysis. Determining the gated clock logic (or clock network) and the gated data network is critical to the successful implementation of the software clock and to logic evaluation in the hardware model during emulation. As described with respect to FIG. 4, clock analysis is performed in step 305. To elaborate on this clock analysis process, FIG. 16 shows a flow diagram in accordance with one embodiment of the present invention. FIG. 16 also shows the gated data analysis.

The SEmulation system has the complete model of the user's circuit design in software and some portions of the design in hardware. These hardware portions include the clock components, especially the derived clocks. Clock delivery timing issues arise because of this boundary between software and hardware. Because the complete model is in software, the software can detect the clock edges that affect register values. In addition to the software model of the registers, these registers are physically located in the hardware model. To ensure that the hardware registers also evaluate their respective inputs (i.e., move the data at the D input to the Q output), the software/hardware boundary includes a software clock. The software clock ensures that the registers in the hardware model evaluate correctly. In essence, the software clock controls the enable input of the hardware register rather than the clock input to the hardware register component. The software clock avoids race conditions, so precise timing control to avoid hold-time violations is not needed. The clock network and gated data logic analysis process shown in FIG. 16 provides a way of modeling and implementing the clock and data delivery system for the hardware registers such that race conditions are avoided and a flexible software/hardware boundary implementation is provided.

As discussed earlier, first clocks are clock signals from test-bench processes. All other clocks, such as clock signals derived from combinational components, are derived or gated clocks. A first clock can drive both gated clocks and gated data signals. For the most part, only a few (e.g., 1-10) derived or gated clocks are present in the user's circuit design. These derived clocks can be implemented as software clocks and will stay in software. If a relatively large number (e.g., 10 or more) of derived clocks is present in the circuit design, the SEmulation system will model them in hardware to reduce I/O overhead and maintain the SEmulation system's performance. Gated data is a data or control input of a register, other than the clock, that is driven from the first clock through some combinational logic.

The gated data/clock analysis process starts at step 500. Step 501 takes the usable source design database code generated from the HDL code and maps the user's register elements to the SEmulation system's register components. This one-to-one mapping of user registers to SEmulation registers facilitates later modeling steps. In some cases, this mapping is necessary to handle user circuit designs that describe register elements with specific primitives. Thus, for RTL-level code, SEmulation registers can be used quite readily because the RTL-level code is at a high enough level to allow varying lower-level implementations. For a gate-level netlist, the SEmulation system will access the cell library of components and modify them to suit the particular circuit design-specific logic elements.

Step 502 extracts clock signals from the register components of the hardware model. This step allows the system to determine the first clocks and the derived clocks. It also determines all the clock signals needed by the various components in the circuit design. The information from this step facilitates the software/hardware clock modeling step.

Step 503 determines the first clocks and the derived clocks. First clocks originate from test-bench components and are modeled in software only. Derived clocks are derived from combinational logic, which is in turn driven by first clocks. By default, the SEmulation system of the present invention keeps the derived clocks in software. If the number of derived clocks is small (e.g., 10 or fewer), these derived clocks can be modeled as software clocks. The number of combinational components needed to generate these derived clocks is small, so keeping them in software adds no significant I/O overhead. If, however, the number of derived clocks is large (e.g., 10 or more), these derived clocks may be modeled in hardware to minimize I/O overhead. Sometimes the user's circuit design uses a large number of derived clock components derived from the first clocks. The system therefore builds these clocks in hardware to keep the number of software clocks small.
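The placement rule in this step can be sketched as a small decision function. This is only an illustration under stated assumptions: the text gives 10 as an example figure for both the small and the large case, so the exact cutoff, the `threshold` parameter, and the function name `place_derived_clocks` are hypothetical choices, not values fixed by the patent.

```python
def place_derived_clocks(num_derived_clocks: int, threshold: int = 10) -> str:
    """Return where the derived clocks would be modeled.

    `threshold` is an assumed cutoff; the text only offers 10 as an example.
    """
    if num_derived_clocks < 0:
        raise ValueError("number of derived clocks cannot be negative")
    if num_derived_clocks == 0:
        return "none"        # every clock is a first clock, modeled in software
    if num_derived_clocks <= threshold:
        return "software"    # few derived clocks: keep them as software clocks
    return "hardware"        # many derived clocks: model them in hardware


print(place_derived_clocks(3))
print(place_derived_clocks(40))
```

The trade-off being modeled is I/O overhead: each software clock costs traffic across the software/hardware boundary, so a large population of derived clocks is cheaper to generate inside the hardware model.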

Decision step 504 requires the system to determine whether any derived clocks are found in the user's circuit design. If not, step 504 resolves to "NO" and the clock analysis ends at step 508, because all the clocks in the user's circuit design are first clocks, which are simply modeled in software. If derived clocks are found in the user's circuit design, step 504 resolves to "YES" and the algorithm proceeds to step 505.

Step 505 determines the fan-out combinational components from the first clocks to the derived clocks. In other words, this step traces the clock signal datapaths from the first clocks through the combinational components. Step 506 determines the fan-in combinational components from the derived clocks. In other words, this step traces the clock signal datapaths from the combinational components to the derived clocks. Determining the fan-out and fan-in sets in the system is done recursively in software. The fan-in set of a net N is as follows:

FanIn Set of a net N:
    find all the components driving net N;
    for each component X driving net N do:
        if the component X is not a combination component then
            return;
        else
            for each input net Y of the component X
                add the FanIn set W of net Y to the FanIn Set of net N
            end for
            add the component X into N;
        end if
    end for

The gated clock or data logic network is determined by recursively determining the fan-in set and the fan-out set of net N and taking their intersection. The ultimate goal here is to determine the so-called fan-in set of net N. Net N is typically a clock input node for determining the gated clock logic from a fan-in perspective. For determining the gated data logic from a fan-in perspective, net N is the clock input node associated with the data input at hand. If the node is on a register, net N is the clock input to that register for the data input associated with that register. The system finds all the components that drive net N. For each component X that drives net N, the system determines whether component X is a combinational component. If no component X is a combinational component, the fan-in set of net N has no combinational components and net N is a first clock.

If, however, at least one component X is a combinational component, the system determines the input nets Y of component X. Here, the system looks further back into the circuit design by finding the input nodes to component X. For each input net Y of each component X, there may be a fan-in set W associated with net Y. This fan-in set W of net Y is added to the fan-in set of net N, and then component X is added to the fan-in set of net N.
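The fan-in procedure described above can be sketched in Python as follows. This is a hedged illustration, not the patent's implementation: the `Component` record, the netlist layout, and the `visited` guard against combinational loops are assumptions. The pseudocode's `return` for a non-combinational driver is interpreted here as "do not trace through this driver", which follows the prose description.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    combinational: bool          # False for registers and test-bench sources
    inputs: Tuple[str, ...]      # names of input nets
    outputs: Tuple[str, ...]     # names of output nets


def fan_in_set(net: str, components: List[Component]) -> Set[str]:
    """Names of the combinational components in the fan-in cone of `net`."""
    result: Set[str] = set()
    visited: Set[str] = set()    # nets already expanded (assumed loop guard)

    def visit(n: str) -> None:
        if n in visited:
            return
        visited.add(n)
        for x in components:
            if n not in x.outputs:
                continue         # x does not drive net n
            if not x.combinational:
                continue         # stop at registers and first-clock sources
            for y in x.inputs:
                visit(y)         # add the fan-in set of each input net Y
            result.add(x.name)   # then add component X itself

    visit(net)
    return result


# Hypothetical design: a test-bench clock gated by an AND and then buffered.
design = [
    Component("tb_clk", False, (), ("clk",)),
    Component("and1", True, ("clk", "en"), ("gclk",)),
    Component("buf1", True, ("gclk",), ("gclk_b",)),
    Component("ff_en", False, ("d",), ("en",)),
]
print(sorted(fan_in_set("gclk_b", design)))
```

A net whose drivers are all non-combinational yields an empty set, which corresponds to the case where net N is a first clock.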

The fan-out set of net N is determined in a similar manner, as follows:

FanOut Set of a net N:
    find all the components using the net N;
    for each component X using net N do:
        if the component X is not a combination component then
            return;
        else
            for each output net Y of the component X
                add the FanOut Set of net Y to the FanOut Set of net N
            end for
            add the component X into N;
        end if
    end for

Again, the gated clock or data logic network is determined by recursively determining the fan-in set and the fan-out set of net N and taking their intersection. The ultimate goal here is to determine the so-called fan-out set of net N. Net N is typically a clock output node for determining the gated clock logic from a fan-out perspective. Thus, the set of all logic elements that use net N will be determined. For determining the gated data logic from a fan-out perspective, net N is the clock output node associated with the data output at hand. If the node is on a register, net N is the output of that register for the first clock-driven input associated with that register. The system finds all the components that use net N. For each component X that uses net N, the system determines whether component X is a combinational component. If no component X is a combinational component, the fan-out set of net N has no combinational components and net N is a first clock.

If, however, at least one component X is a combinational component, the system determines the output nets Y of component X. Here, the system looks further forward into the circuit design by finding the output nodes from component X. For each output net Y of each component X, there may be a fan-out set W associated with net Y. This fan-out set W of net Y is added to the fan-out set of net N, and then component X is added to the fan-out set of net N.

Step 507 determines the clock network or gated clock logic. The clock network is the intersection of the fan-in and fan-out combinational components.

Similarly, the same fan-in and fan-out principle is used to determine the gated data logic. Like the gated clocks, gated data is a data or control input (other than the clock) of a register driven by a first clock through some combinational logic. Gated data logic is the intersection of the fan-in of the gated data and the fan-out from the first clock. Thus, the clock analysis and the gated data analysis yield the gated clock network/logic and the gated data logic through some combinational logic. As described later, determining the gated clock network and the gated data network is critical to the successful implementation of the software clock and to logic evaluation in the hardware model during emulation. The clock/data network analysis ends at step 508.
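The intersection step can be illustrated with a short Python sketch. It is an assumption-laden model rather than the patent's code: the netlist is a dictionary mapping a component name to `(is_combinational, input_nets, output_nets)`, the helper `cone` is a hypothetical name, and the traversal is written iteratively with a worklist instead of recursively.

```python
from typing import Dict, Set, Tuple

Netlist = Dict[str, Tuple[bool, Tuple[str, ...], Tuple[str, ...]]]


def cone(net: str, comps: Netlist, forward: bool) -> Set[str]:
    """Combinational components reachable from `net`.

    forward=True follows the users of a net (fan-out);
    forward=False follows the drivers of a net (fan-in).
    """
    found: Set[str] = set()
    seen: Set[str] = set()
    stack = [net]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        for name, (combinational, ins, outs) in comps.items():
            touches = n in ins if forward else n in outs
            if touches and combinational:
                found.add(name)
                stack.extend(outs if forward else ins)
    return found


def gated_network(first_clock: str, gated_net: str, comps: Netlist) -> Set[str]:
    """Intersection of the first clock's fan-out and the gated net's fan-in."""
    return cone(first_clock, comps, True) & cone(gated_net, comps, False)


# Hypothetical design: `gclk` is a gated clock, `d1` is gated data.
comps: Netlist = {
    "and1": (True, ("clk", "en"), ("gclk",)),
    "inv_en": (True, ("en_n",), ("en",)),
    "or_d": (True, ("clk", "a"), ("d1",)),
    "ff1": (False, ("d1", "gclk"), ("q1",)),
}
print(gated_network("clk", "gclk", comps))   # gated clock logic
print(gated_network("clk", "d1", comps))     # gated data logic
```

In this example `inv_en` lies in the fan-in of `gclk` but is not driven by `clk`, so the intersection excludes it from the gated clock logic.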

FIG. 17 shows the basic building block of the hardware model in accordance with one embodiment of the present invention. For register components, the SEmulation system uses a D-type flip-flop with asynchronous load control as the basic block for building both edge-triggered (i.e., flip-flop) and level-sensitive (i.e., latch) register hardware models. This register model building block has the following ports: Q (the output state); A_E (asynchronous enable); A_D (asynchronous data); S_E (synchronous enable); S_D (synchronous data); and System.clk (system clock).

This SEmulation register model is triggered by a positive edge of the system clock or a positive level of the asynchronous enable (A_E) input. When either of these two triggering events occurs, the register model looks at the asynchronous enable (A_E) input. If the A_E input is enabled, the output Q takes the value of the asynchronous data (A_D); otherwise, if the synchronous enable (S_E) input is enabled, the output Q takes the value of the synchronous data (S_D). If neither the A_E nor the S_E input is enabled, the output Q is not evaluated despite the detection of a positive edge of the system clock. In this way, the inputs to these enable ports control the operation of this basic building block register model.
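The evaluation rule of this building block can be written down as a small behavioral model. The sketch below is illustrative only; the class name `RegisterModel` and the `evaluate` method are hypothetical, and the system clock edge is passed in as a Boolean rather than derived from a waveform.

```python
class RegisterModel:
    """Behavioral sketch of the D-type flip-flop with asynchronous load."""

    def __init__(self, q: int = 0) -> None:
        self.q = q  # output state Q

    def evaluate(self, a_e: bool, a_d: int, s_e: bool, s_d: int,
                 clk_posedge: bool) -> int:
        """Apply one evaluation and return the resulting Q."""
        if not (a_e or clk_posedge):
            return self.q            # no triggering event: Q holds
        if a_e:
            self.q = a_d             # asynchronous enable has priority
        elif s_e:
            self.q = s_d             # synchronous load on the clock edge
        return self.q                # neither enable set: Q is not evaluated


reg = RegisterModel()
reg.evaluate(a_e=True, a_d=5, s_e=False, s_d=9, clk_posedge=False)
reg.evaluate(a_e=False, a_d=5, s_e=True, s_d=9, clk_posedge=True)
print(reg.q)
```

Note that S_E alone does nothing without a positive edge of the system clock, whereas A_E takes effect on its level.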

The system uses the software clock, which is a special enable register, to control the enable inputs of these register models. In a complex user circuit design, millions of elements are found in the design, so the SEmulation system implements millions of elements in the hardware model. Controlling all of these elements individually is costly because the overhead of sending millions of control signals to the hardware model would take longer than evaluating these elements in software. However, even such a complex circuit design usually calls for only a few (1-10) clocks, and clocks alone are sufficient to control the state changes of a system with only registers and combinational components. The hardware model of the SEmulator system uses only registers and combinational components. The SEmulator system also controls the evaluation of the hardware model through the software clocks. In the SEmulation system, the hardware models of registers do not have the clock directly connected to other hardware components; rather, the software kernel controls the values of all clocks. By controlling a few clock signals, the kernel has full control over the evaluation of the hardware model with a negligible amount of coprocessor intervention overhead.

Depending on whether the register model is used as a latch or a flip-flop, the software clock is input to either the asynchronous enable (A_E) or the synchronous enable (S_E) wire line. The application of the software clock from the software model to the hardware model is triggered by edge detection of the clock components. When the software kernel detects an edge of a clock component, it sets the clock-edge register through the CLK address space. This clock-edge register controls the enable input, not the clock input, of the hardware register model. The global system clock still provides the clock input to the hardware register model. The clock-edge register, however, provides the software clock signal to the hardware register model through a double-buffered interface. As explained below, the double-buffered interface from the software clock to the hardware model ensures that all the register models are updated synchronously with respect to the global system clock. Thus, use of the software clock eliminates the risk of hold-time violations.
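The interplay between software edge detection, the clock-edge register, and the global system clock can be sketched as follows. This is a simplified illustration under several assumptions: the hardware model is reduced to a shift-register chain of flip-flops, the clock-edge register is a single flag that is cleared after one system clock tick, and the names `HardwareModel`, `detect_posedge`, and `run` are hypothetical.

```python
from typing import List


def detect_posedge(prev: int, cur: int) -> bool:
    """Software-side edge detection on a design clock."""
    return prev == 0 and cur == 1


class HardwareModel:
    """Shift-register chain whose flip-flops share one software clock."""

    def __init__(self, state: List[int]) -> None:
        self.q = list(state)     # Q outputs of the chained flip-flops
        self.clock_edge = 0      # clock-edge register driving the S_E inputs

    def system_clock_tick(self, serial_in: int) -> None:
        if self.clock_edge:
            # Every S_D is sampled from the old Q values before any Q changes,
            # so all registers update together on the system clock.
            self.q = [serial_in] + self.q[:-1]
            self.clock_edge = 0  # assumed: the enable lasts one evaluation


def run(clock: List[int], data: List[int], hw: HardwareModel) -> List[int]:
    prev = 0
    for clk, d in zip(clock, data):
        if detect_posedge(prev, clk):
            hw.clock_edge = 1    # kernel write through the CLK address space
        hw.system_clock_tick(d)
        prev = clk
    return hw.q


hw = HardwareModel([0, 0, 0])
print(run([0, 1, 0, 1, 1, 0, 1], [1, 1, 0, 0, 1, 1, 1], hw))
```

Because the design clock reaches the registers only as an enable, no register can see new data from its neighbor within the same update, which is the hold-time hazard the text describes.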

FIGS. 18(a) and 18(b) show the implementation of the building block register model for latches and flip-flops. These register models are software-clock controlled through the appropriate enable inputs. Depending on whether the register model is used as a flip-flop or a latch, the asynchronous ports (A_E, A_D) and synchronous ports (S_E, S_D) are used for either the software clock or I/O operations. FIG. 18(a) shows the register model implementation when it is used as a latch. Latches are level-sensitive; that is, as long as the clock signal is asserted (e.g., "1"), the output Q follows the input D. Here, the software clock signal is provided to the asynchronous enable (A_E) input and the data input is provided to the asynchronous data (A_D) input. For I/O operations, the software kernel uses the synchronous enable (S_E) and synchronous data (S_D) inputs to download values into the Q port. The S_E port is used as the REG space address pointer and the S_D port is used to access data to/from the local data bus.

FIG. 18(b) shows the register model implementation when it is used as a design flip-flop. A design flip-flop uses the following ports to determine the next-state logic: data (D), set (S), reset (R), and enable (E). All the next-state logic of a design flip-flop is factored into a hardware combinational component that feeds the synchronous data (S_D) input. The software clock is input to the synchronous enable (S_E) input. For I/O operations, the software kernel uses the asynchronous enable (A_E) and asynchronous data (A_D) inputs to download values into the Q port. The A_E port is used as the REG space write address pointer and the A_D port is used to access data to/from the local data bus.
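The two port assignments of FIGS. 18(a) and 18(b) can be contrasted in a short Python sketch. It is illustrative only: `register_eval`, `latch_step`, and `flipflop_step` are hypothetical names, and an I/O access by the software kernel is modeled simply as an optional value that, when present, asserts the corresponding enable.

```python
from typing import Optional


def register_eval(q: int, a_e: bool, a_d: Optional[int], s_e: bool,
                  s_d: Optional[int], clk_posedge: bool) -> Optional[int]:
    """Building-block rule: A_E acts on its level, S_E needs a clock edge."""
    if a_e:
        return a_d
    if clk_posedge and s_e:
        return s_d
    return q


def latch_step(q: int, sw_clock: bool, d: int, clk_posedge: bool,
               io_value: Optional[int] = None) -> Optional[int]:
    """Latch wiring: software clock on A_E, data on A_D, I/O on S_E/S_D."""
    io_select = io_value is not None     # REG address pointer selects the word
    return register_eval(q, sw_clock, d, io_select, io_value, clk_posedge)


def flipflop_step(q: int, sw_clock: bool, next_state: int, clk_posedge: bool,
                  io_value: Optional[int] = None) -> Optional[int]:
    """Flip-flop wiring: software clock on S_E, next state on S_D, I/O on A_E/A_D."""
    io_select = io_value is not None     # REG write address pointer selects the word
    return register_eval(q, io_select, io_value, sw_clock, next_state, clk_posedge)


print(latch_step(0, True, 7, clk_posedge=False))      # latch is transparent
print(flipflop_step(0, True, 5, clk_posedge=False))   # flip-flop waits for an edge
print(flipflop_step(0, True, 5, clk_posedge=True))
```

The same building block therefore yields level-sensitive or edge-triggered behavior purely through which enable port carries the software clock.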

The software clock is now discussed. One embodiment of the software clock of the present invention is a clock enable signal to the hardware register model, so that data at the inputs to these hardware register models are evaluated together with and synchronously to the system clock. This eliminates race conditions and hold-time violations. One embodiment of the software clock logic includes clock edge detection logic in software that, upon detecting a clock edge, triggers additional logic in hardware. Such enable signal logic generates an enable signal to the enable inputs of the hardware register models before data arrives at these hardware register models. Determining the gated clock network and the gated data network is critical to the successful implementation of the software clock and to logic evaluation in the hardware model during hardware acceleration mode. As explained earlier, the clock network or gated clock logic is the intersection of the fan-in of the gated clock and the fan-out of the first clock. Similarly, the gated data logic is the intersection of the fan-in of the gated data and the fan-out of the first clock for the data signals. These fan-in and fan-out concepts are discussed above with respect to FIG. 16.

As discussed earlier, first clocks are generated by test-bench processes in software. Derived or gated clocks are generated from a network of combinational logic and registers that are in turn driven by first clocks. By default, the SEmulation system of the present invention keeps the derived clocks in software. If the number of derived clocks is small (e.g., 10 or fewer), these derived clocks can be modeled as software clocks. The number of combinational components needed to generate these derived clocks is small, so modeling these combinational components in software adds no significant I/O overhead. If, however, the number of derived clocks is large (e.g., 10 or more), these derived clocks and their combinational components may be modeled in hardware to minimize I/O overhead.

Ultimately, in accordance with one embodiment of the present invention, clock edge detection occurring in software (via the input to the first clock) can be translated into clock detection in hardware (via the input to the clock-edge register). Clock edge detection in software triggers an event in hardware so that the registers in the hardware model receive the clock enable signal before the data signal, ensuring that evaluation of the data signal occurs in synchronization with the system clock and that hold-time violations are avoided.

As discussed earlier, the SEmulation system has the complete model of the user's circuit design in software and some portions of the user's circuit design in hardware. As specified in the kernel, the software can detect the clock edges that affect hardware register values. To ensure that the hardware registers also evaluate their respective inputs, the software/hardware boundary includes a software clock. The software clock ensures that the registers in the hardware model evaluate in synchronization with the system clock and without any hold-time violations. In essence, the software clock controls the enable input of the hardware register components rather than the clock input to the hardware register components. The double-buffered approach to implementing the software clock ensures that the registers evaluate in synchronization with the system clock to avoid race conditions, and it eliminates the need for precise timing control to avoid hold-time violations.

FIG. 19 shows one embodiment of the clock implementation system in accordance with the present invention. Initially, the gated clock logic and the gated data logic are determined by the SEmulation system as described above with respect to FIG. 16. The gated clock logic and the gated data logic are then separated. When implementing the double buffer, the driving source and the double-buffered first logic must also be separated. Thus, from the fan-in and fan-out analysis, gated data logic 513 and gated clock logic 514 are separated.

The modeled first clock register 510 includes a first buffer 511 and a second buffer 512, both of which are D registers. This first clock is modeled in software, but the double-buffer implementation is modeled in both software and hardware. Clock edge detection occurs in the first clock register 510 in software to trigger the hardware model to generate the software clock signal for the hardware model. Data and address enter the first buffer 511 on wire lines 519 and 529, respectively. The Q output of the first buffer 511 on wire line 521 is coupled to the D input of the second buffer 512. The Q output of the first buffer 511 is also provided on wire line 522 to the gated clock logic 514, to ultimately drive the clock input of the first buffer 516 of the clock edge register 515. The output of the second buffer 512 on wire line 523 is provided to the gated data logic 513, to ultimately drive the input of register 518 via wire line 530 in the user's custom-designed circuit model. The enable input to the second buffer 512 in the first clock register 510 is the INPUT-EN signal on wire line 533 from a state machine, which determines the evaluation cycle and controls the various signals accordingly.

The clock edge register 515 also includes a first buffer 516 and a second buffer 517. The clock edge register 515 is implemented in hardware. When clock edge detection occurs in software (via the input to the first clock register 510), it can trigger the same clock edge detection in hardware (via the clock edge register 515). The D input to the first buffer 516 on wire line 524 is set to logic "1".

For inter-board communication, board 1551 has connector 1590 and board 1556 has connector 1581. Interconnects that run from one board to another, such as interconnects 1600, 1971, 1977, 1541, and 1540, pass through these connectors 1590 and 1581. In other words, the inter-board connectors 1590 and 1581 allow interconnects 1600, 1971, 1977, 1541, and 1540 to make connections between a component on one board and a component on another board. The inter-board connectors 1590 and 1581 carry control data and control signals on the FPGA buses.

In a four-board configuration, board 1 and board 6 serve as the bookend boards, while board 2 (1552) and board 3 (1553) (see FIG. 39) are the intermediate boards. When coupled to the motherboard in accordance with the present invention (as described with reference to FIGS. 38A and 38B), board 1 and board 2 are paired, and board 3 and board 6 are paired.

In a six-board configuration, board 1 and board 6 serve as the bookend boards as described above, while board 2 (1552), board 3 (1553), board 4 (1554), and board 5 (1555) (see FIG. 39) are the intermediate boards. When coupled to the motherboard in accordance with the present invention (as described with reference to FIGS. 38A and 38B), board 1 and board 2 are paired, board 3 and board 4 are paired, and board 5 and board 6 are paired.

More boards can be provided as necessary. However, regardless of the number of boards added to the system, the bookend boards (such as board 1 and board 6 of FIG. 39) have the requisite terminations that complete the mesh array connections. In one embodiment, the minimum configuration is the dual-board configuration of FIG. 44. More boards can be added in two-board increments. If the initial configuration is board 1 and board 6, changing to a four-board configuration involves, as mentioned above, moving board 6 further out, pairing board 1 and board 2 together, and then pairing board 3 and board 6 together.
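The pairing rule in the preceding paragraphs can be summarized in a short Python sketch. The function name `board_pairs` is hypothetical, and the sketch covers only the two-, four-, and six-board cases that the text spells out, on the assumption stated in the text that board 6 always remains the outer bookend.

```python
def board_pairs(num_boards):
    """Return (stack order, pairs, bookends) for a 2-, 4-, or 6-board system.

    Assumption taken from the text: board 1 and board 6 are always the
    bookends, and intermediate boards 2..(n-1) are inserted between them.
    """
    if num_boards not in (2, 4, 6):
        raise ValueError("only the 2-, 4-, and 6-board cases are described")
    order = list(range(1, num_boards)) + [6]
    pairs = [(order[i], order[i + 1]) for i in range(0, num_boards, 2)]
    bookends = (order[0], order[-1])
    return order, pairs, bookends


order4, pairs4, bookends4 = board_pairs(4)
order6, pairs6, bookends6 = board_pairs(6)
```

Growing from two to four boards leaves the bookends unchanged and only changes which board each bookend is paired with.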

As described above, each logic device is coupled to its adjacent neighbor logic device and to the non-adjacent neighbor logic device one hop away. Thus, in FIGS. 39 and 44, logic device 1577 is coupled to adjacent neighbor logic device 1578 via interconnect 1547. Logic device 1577 is also coupled to non-adjacent logic device 1579 via one-hop interconnect 1548. However, logic device 1580 can be considered adjacent to logic device 1577 because of the wrap-around torus configuration, in which interconnect 1549 couples the two.
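The neighbor relationships in this paragraph can be sketched in Python as follows. This is only an illustration under stated assumptions: the function `ring_neighbors` is a hypothetical name, only a single row of the array is modeled, and the four devices 1577 to 1580 are assumed to sit in that order along the row.

```python
def ring_neighbors(index, size):
    """Return (adjacent, one_hop) positions for a device in a wrap-around row.

    adjacent: positions one step away, including the torus wrap-around.
    one_hop:  positions two steps away that are not already adjacent.
    """
    adjacent = {(index - 1) % size, (index + 1) % size} - {index}
    one_hop = {(index - 2) % size, (index + 2) % size} - {index} - adjacent
    return sorted(adjacent), sorted(one_hop)


# Assumed row order of the logic devices named in the text.
row = [1577, 1578, 1579, 1580]
adj_idx, hop_idx = ring_neighbors(row.index(1577), len(row))
adjacent_devices = [row[i] for i in adj_idx]
one_hop_devices = [row[i] for i in hop_idx]
```

For device 1577 this yields 1578 and 1580 as adjacent (the latter through the wrap-around) and 1579 as the one-hop neighbor, matching interconnects 1547, 1549, and 1548 respectively.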

FIG. 42 shows a top view (component side) of the on-board components and connectors for a single board. In one embodiment of the present invention, only one board is needed to model the user's design in the simulation system. In other embodiments, multiple boards (at least two) are needed. Thus, for example, FIG. 39 shows six boards 1551-1556 coupled together through several 600-pin connectors 1581-1590. At the top and bottom ends, board 1551 is terminated by one set of 10-ohm R-packs and board 1556 is terminated by another set of 10-ohm R-packs.

Referring again to FIG. 42, board 1820 contains four FPGAs: logic device 1822 (FPGA0), logic device 1823 (FPGA1), logic device 1824 (FPGA2), and logic device 1825 (FPGA3). Two SRAM memory devices 1828 and 1829 are also provided. The SRAM memory devices 1828 and 1829 are used to map memory blocks from the logic devices on the board; that is, the memory simulation aspect of the present invention maps memory blocks from the logic devices on this board to the SRAM memory devices on this board. Other boards may contain other logic devices and memory devices to accomplish a similar mapping operation. In one embodiment, the memory mapping is board-dependent; that is, the memory mapping for board 1 is limited to the logic devices and memory devices on board 1 and is independent of other boards. In other embodiments, the memory mapping is board-independent. In that case, a few large memory devices are used to map memory blocks from logic devices on one board to memory devices located on another board.
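The difference between board-dependent and board-independent memory mapping can be pictured with the small Python sketch below. It is a hedged illustration only: `MemoryDevice`, `candidate_memories`, and the board-2 device named `SRAM_X` are hypothetical, and the text does not describe how a specific memory device is actually selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryDevice:
    name: str
    board: int


def candidate_memories(logic_board, memories, board_dependent=True):
    """List memory devices a memory block on `logic_board` may be mapped to.

    board_dependent=True  -> only memory devices on the same board qualify.
    board_dependent=False -> memory devices on any board qualify.
    """
    if board_dependent:
        return [m for m in memories if m.board == logic_board]
    return list(memories)


memories = [
    MemoryDevice("SRAM_1828", board=1),
    MemoryDevice("SRAM_1829", board=1),
    MemoryDevice("SRAM_X", board=2),  # hypothetical device on another board
]
same_board_only = candidate_memories(1, memories, board_dependent=True)
any_board = candidate_memories(1, memories, board_dependent=False)
```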

Light-emitting diodes (LEDs) 1821 are also provided to visually indicate some selected activities. The LED display is shown in Table A in accordance with one embodiment of the present invention:

Table A: LED Display

| LED | Color | State | Description |
|---|---|---|---|
| LED1 | Green | On | +5 V and +3.3 V are normal |
| | | Off | +5 V or +3.3 V is abnormal |
| LED2 | Amber | Off | Configuration of all on-board FPGAs is done |
| | | Blinking | On-board FPGAs are not configured or configuration failed |
| | | On | FPGA configuration is in progress |
| LED3 | Red | On | Data transfer is in progress |
| | | Off | No data transfer |
| | | Blinking | Status check failed |
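As a reading aid, the LED states of Table A can be expressed as a lookup table. The Python sketch below is illustrative only; the dictionary `LED_STATUS` and the function `describe_led` are hypothetical names, and the descriptions are paraphrased from Table A.

```python
# (LED, state) -> meaning, paraphrased from Table A.
LED_STATUS = {
    ("LED1", "on"): "+5 V and +3.3 V are normal",
    ("LED1", "off"): "+5 V or +3.3 V is abnormal",
    ("LED2", "off"): "Configuration of all on-board FPGAs is done",
    ("LED2", "blinking"): "On-board FPGAs are not configured or configuration failed",
    ("LED2", "on"): "FPGA configuration is in progress",
    ("LED3", "on"): "Data transfer is in progress",
    ("LED3", "off"): "No data transfer",
    ("LED3", "blinking"): "Status check failed",
}


def describe_led(led, state):
    """Return the Table A meaning of an LED state; raises KeyError if undefined."""
    return LED_STATUS[(led.upper(), state.lower())]


amber_blink = describe_led("LED2", "Blinking")
```

Note that LED1 has no blinking state in Table A, so `describe_led("LED1", "blinking")` raises `KeyError` rather than guessing a meaning.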

Several other control chips, such as PLX PCI controller 1826 and CTRL_FPGA unit 1827, control the inter-FPGA and PCI communications. One example of a PLX PCI controller 1826 that may be used in the system is PLX Technology's PCI9080 or PCI9060. The PCI9080 has the appropriate local bus interface, control registers, FIFOs, and PCI interface to the PCI bus. The data book PLX Technology, PCI9080 Data Sheet (ver. 0.93, Feb. 28, 1997) is incorporated herein by reference. One example of the CTRL_FPGA unit 1827 is a programmable logic device (PLD) in the form of an FPGA, such as the Altera 10K50 chip. In multi-board configurations, only the first board coupled to the PCI bus contains the PCI controller.

Connector 1830 connects board 1820 to the motherboard (not shown) and hence to the PCI bus, power, and ground. For some boards, connector 1830 is not used for direct connection to the motherboard. Thus, in a dual-board configuration, only the first board is coupled to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are directly connected to the motherboard, while the remaining boards 2, 4, and 6 rely on their neighbor boards for motherboard connectivity. Inter-board connectors J1 to J28 are also provided. As the name implies, connectors J1 to J28 allow connections between different boards.
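The motherboard attachment rule stated above can be sketched as follows. The function `motherboard_attachment` is a hypothetical name. The text gives the result only for the dual-board and six-board cases; treating the first board of every pair as the directly attached one for other even board counts is an assumption of this sketch.

```python
def motherboard_attachment(num_boards):
    """Split board positions 1..n into directly and indirectly attached boards.

    Assumption: the first board of each pair (odd positions) plugs into the
    motherboard, and the second board of the pair relies on its neighbor.
    """
    if num_boards < 2 or num_boards % 2:
        raise ValueError("boards are described as being added in pairs")
    positions = range(1, num_boards + 1)
    direct = [p for p in positions if p % 2 == 1]
    via_neighbor = [p for p in positions if p % 2 == 0]
    return direct, via_neighbor


direct6, indirect6 = motherboard_attachment(6)
direct2, indirect2 = motherboard_attachment(2)
```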

Connector J1 is for the external power and ground connection. Table B below shows the pins and corresponding descriptions for the external power connector J1 in accordance with one embodiment of the present invention.

Table B: External Power - J1

| Pin number | Description |
|---|---|
| 1 | VCC 5V |
| 2 | GND |
| 3 | GND |
| 4 | VCC 3V |

Connector J2 is for the parallel port connection. Connectors J1 and J2 are used for the stand-alone single-board boundary scan test during production. Table C below shows the pins and corresponding descriptions for the parallel JTAG port connector J2 in accordance with one embodiment of the present invention.

Table C: Parallel JTAG Port - J2

| J2 pin number | J2 signal | I/O from board | DB25 pin number | DB25 signal |
|---|---|---|---|---|
| 3 | PARA_TCK | I | 2 | D0 |
| 5 | PARA_TMS | I | 3 | D1 |
| 7 | PARA_TDI | I | 4 | D2 |
| 9 | PARA_NR | I | 5 | D3 |
| 19 | PARA_TDO | O | 10 | NACK |
| 10, 12, 14, 16, 18, 20, 22, 24 | GND | | 18-25 | GND |

Connectors J3 and J4 are for the local bus connection across boards. Connectors J5 to J16 are one set of FPGA interconnect connections. Connectors J17 to J28 are a second set of FPGA interconnect connections. When the boards are placed component side to solder side, these connectors provide effective connections between a component on one board and a component on another board. Tables D and E below provide a complete list and description of connectors J1 to J28 in accordance with one embodiment of the present invention.

Table D: Connectors J1-J28

| Connector | Description | Type |
|---|---|---|
| J1 | +5V / +3V external power | 4-pin power RA header, comp side |
| J2 | Parallel port | 0.1" pitch, 2-row through-hole RA header, comp side |
| J3 | Local bus | 0.05" pitch, 2x30 through-hole header, SAMTEC, comp side |
| J4 | Local bus | 0.05" pitch, 2x30 through-hole receptacle, SAMTEC, solder side |
| J5 | Row A: NH[0], VCC3V, GND; Row B: J17 Row B, VCC3V, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J6 | Row A: J5 Row B, VCC3V, GND; Row B: J17 Row B, VCC3V, GND | 0.05" pitch, 2x30 receptacle, SAMTEC, solder side |
| J7 | Row A: N[0], 4x VCC3V, 4x GND, N[2]; Row B: N[0], 4x VCC3V, 4x GND, N[2] | 0.05" pitch, 2x45 through-hole header, SAMTEC, comp/solder side |
| J8 | Row A: N[0], 4x VCC3V, 4x GND, N[2]; Row B: N[0], 4x VCC3V, 4x GND, N[2] | 0.05" pitch, 2x45 through-hole receptacle, SAMTEC, comp/solder side |
| J9 | Row A: NH[2], LASTL, GND; Row B: J21 Row B, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J10 | Row A: J9 Row B, FIRSTL, GND; Row B: J9 Row A, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |
| J11 | Row A: NH[1], VCC3V, GND; Row B: J23 Row B, VCC3V, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J12 | Row A: J11 Row B, VCC3V, GND; Row B: J11 Row A, VCC3V, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |
| J13 | Row A: N[1], 4x VCC3V, 4x GND, N[3]; Row B: N[1], 4x VCC3V, 4x GND, N[3] | 0.05" pitch, 2x45 through-hole header, SAMTEC, comp/solder side |
| J14 | Row A: N[1], 4x VCC3V, 4x GND, N[3]; Row B: N[1], 4x VCC3V, 4x GND, N[3] | 0.05" pitch, 2x45 through-hole receptacle, SAMTEC, comp/solder side |
| J15 | Row A: NH[3], LASTH, GND; Row B: J27 Row B, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J16 | Row A: J15 Row B, FIRSTH, GND; Row B: J15 Row A, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |
| J17 | Row A: SH[0], VCC3V, GND; Row B: J5 Row B, VCC3V, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J18 | Row A: J17 Row B, VCC3V, GND; Row B: J17 Row A, VCC3V, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |
| J19 | Row A: S[0], 4x VCC3V, 4x GND, S[3]; Row B: S[0], 4x VCC3V, 4x GND, S[3] | 0.05" pitch, 2x45 through-hole header, SAMTEC, comp/solder side |
| J20 | Row A: S[0], 4x VCC3V, 4x GND, S[3]; Row B: S[0], 4x VCC3V, 4x GND, S[3] | 0.05" pitch, 2x45 through-hole receptacle, SAMTEC, comp/solder side |
| J21 | Row A: SH[2], LASTL, GND; Row B: J9 Row B, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J22 | Row A: J21 Row B, FIRSTL, GND; Row B: J21 Row A, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |
| J23 | Row A: SH[1], VCC3V, GND; Row B: J11 Row B, VCC3V, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J24 | Row A: J23 Row B, VCC3V, GND; Row B: J23 Row A, VCC3V, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |
| J25 | Row A: S[1], 4x VCC3V, 4x GND, S[3]; Row B: S[1], 4x VCC3V, 4x GND, S[3] | 0.05" pitch, 2x45 through-hole header, SAMTEC, comp/solder side |
| J26 | Row A: S[1], 4x VCC3V, 4x GND, S[3]; Row B: S[1], 4x VCC3V, 4x GND, S[3] | 0.05" pitch, 2x45 through-hole receptacle, SAMTEC, comp/solder side |
| J27 | Row A: SH[3], LASTH, GND; Row B: J15 Row B, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J28 | Row A: J27 Row B, FIRSTH, GND; Row B: J27 Row A, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |

Shaded connectors are of the through-hole type. In Table D, the number in brackets [] denotes the SMS FPGA logic device number (0 to 3). Thus, S[0] refers to the south interconnect (S[73:0] in FIG. 37) and its 74 bits for FPGA0.

Table E: Local Bus Connectors - J3, J4

| Pin number | Signal name | I/O | Pin number | Signal name | I/O |
|---|---|---|---|---|---|
| A1 | GND | PWR | B1 | LRESET_N | I/O |
| A2 | J3_CLK for J3, J4_CLK for J4 | I/O | B2 | VCC5V | PWR |
| A3 | GND | PWR | B3 | LD0 | I/O |
| A4 | LD1 | I/O | B4 | LD2 | I/O |
| A5 | LD3 | I/O | B5 | LD4 | I/O |
| A6 | LD5 | I/O | B6 | LD6 | I/O |
| A7 | LD7 | I/O | B7 | LD8 | I/O |
| A8 | LD9 | I/O | B8 | LD10 | I/O |
| A9 | LD11 | I/O | B9 | GND | PWR |
| A10 | VCC3V | PWR | B10 | LD12 | I/O |
| A11 | LD13 | I/O | B11 | LD14 | I/O |
| A12 | LD15 | I/O | B12 | LD16 | I/O |
| A13 | LD17 | I/O | B13 | LD18 | I/O |
| A14 | LD19 | I/O | B14 | LD20 | I/O |
| A15 | LD21 | I/O | B15 | VCC3V | PWR |
| A16 | LD22 | I/O | B16 | LD23 | I/O |
| A17 | LD24 | I/O | B17 | LD25 | I/O |
| A18 | LD26 | I/O | B18 | LD27 | I/O |
| A19 | LD28 | I/O | B19 | LD29 | I/O |
| A20 | LD30 | I/O | B20 | LD31 | I/O |
| A21 | VCC3V | PWR | B21 | LHOLD | OT |
| A22 | ADS_N | I/O | B22 | GND | PWR |
| A23 | DEN_N | OT | B23 | DTR_N | O |
| A24 | LA31 | O | B24 | LA30 | O |
| A25 | LA29 | O | B25 | LA28 | O |
| A26 | LA10 | O | B26 | LA7 | O |
| A27 | LA6 | O | B27 | LA5 | O |
| A28 | LA4 | O | B28 | LA3 | O |
| A29 | LA2 | O | B29 | DONE | OD |
| A30 | VCC5V | PWR | B30 | VCC5V | PWR |

The I/O direction is given with respect to board 1.

FIG. 43 shows the legend for the connectors J1 to J28 of FIGS. 41A-41F and 42. In general, a blank fill in a block indicates surface mount, and a green fill indicates the through-hole type. A solid-outline block represents a connector located on the component side, and a dotted-outline block represents a connector located on the solder side. Thus, the blank-filled, solid-outline block 1840 represents a 2x30 header that is surface mounted and located on the component side. The blank-filled, dotted-outline block 1841 represents a 2x30 receptacle that is surface mounted and located on the solder side of the board. The green-filled, solid-outline block 1842 represents a 2x30 or 2x45 header of the through-hole type located on the component side. The green-filled, dotted-outline block 1843 represents a 2x30 or 2x45 receptacle of the through-hole type located on the solder side. In one embodiment, the simulation system uses the Samtec SFM and TFM series of 2x30 or 2x45 micro strip connectors for both the surface mount and through-hole types. The grid-filled, solid-outline block 1844 is an R-pack that is surface mounted and located on the component side of the board. The grid-filled, dotted-outline block 1845 is an R-pack that is surface mounted and located on the solder side. The Samtec specification from the Samtec catalog on its website is incorporated herein by reference. Referring again to FIG. 42, connectors J3 to J28 are of the types indicated in the legend of FIG. 43.
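The FIG. 43 legend amounts to a two-key lookup on fill and outline style. The Python sketch below restates it under that reading; the names `FILL`, `OUTLINE`, and `decode_block` are hypothetical, and the header/receptacle split by side is inferred from legend blocks 1840 to 1843 rather than stated as a general rule.

```python
# Fill style -> (mount type, is the block an R-pack?)
FILL = {
    "blank": ("surface mount", False),
    "green": ("through hole", False),
    "grid": ("surface mount", True),
}

# Outline style -> board side
OUTLINE = {
    "solid": "component side",
    "dotted": "solder side",
}


def decode_block(fill, outline):
    """Decode a legend block into its mount type, board side, and part kind."""
    mount, is_rpack = FILL[fill]
    side = OUTLINE[outline]
    if is_rpack:
        part = "R-pack"
    else:
        # Inferred from blocks 1840-1843: headers sit on the component side,
        # receptacles on the solder side.
        part = "header" if outline == "solid" else "receptacle"
    return {"mount": mount, "side": side, "part": part}


block_1840 = decode_block("blank", "solid")
block_1843 = decode_block("green", "dotted")
block_1844 = decode_block("grid", "solid")
```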

FIGS. 41A-41F show top views of each board and its respective connectors. FIG. 41A shows the connectors for board 6. Thus, board 1660 contains connectors 1661-1681 along with motherboard connector 1682. FIG. 41B shows the connectors for board 5. Thus, board 1690 contains connectors 1691-1708 along with motherboard connector 1709. FIG. 41C shows the connectors for board 4. Thus, board 1715 contains connectors 1716-1733 along with motherboard connector 1734. FIG. 41D shows the connectors for board 3. Thus, board 1740 contains connectors 1741-1758 along with motherboard connector 1759. FIG. 41E shows the connectors for board 2. Thus, board 1765 contains connectors 1766-1783 along with motherboard connector 1784. FIG. 41F shows the connectors for board 1. Thus, board 1790 contains connectors 1791-1812 along with motherboard connector 1813. As indicated in the legend of FIG. 43, the connectors for the six boards are various combinations of (1) surface mount or through-hole type, (2) component side or solder side, and (3) header, receptacle, or R-pack.

In one embodiment, these connectors are used for inter-board communication. Related buses and signals are grouped together and supported by these inter-board connectors for routing signals between any two boards. Also, only half of the boards are directly coupled to the motherboard. In FIG. 41A, board 6 (1660) contains connectors 1661-1668 designated for one set of FPGA interconnects, connectors 1669-1674, 1676, and 1679 designated for another set of FPGA interconnects, and connector 1681 designated for the local bus. Because board 6 (1660) is positioned as one of the boards at the end of the motherboard (along with board 1 (1790) of FIG. 41F at the other end), connectors 1675, 1677, 1678, and 1680 are designated as 10-ohm R-packs for certain north-south interconnects. Also, motherboard connector 1682 is not used for board 6 (1660), as shown in FIG. 38B, where the sixth board 1535 is coupled to the fifth board 1534 but is not directly coupled to the motherboard 1520.

In FIG. 41B, board 5 (1690) contains connectors 1691-1698 designated for one set of FPGA interconnects, connectors 1699-1706 designated for another set of FPGA interconnects, and connector 1707 designated for the local bus. Connector 1709 is used to couple board 5 (1690) to the motherboard.

In FIG. 41C, board 4 (1715) contains connectors 1716-1723 designated for one set of FPGA interconnects, connectors 1724-1731 designated for another set of FPGA interconnects, and connectors 1732 and 1733 designated for the local bus. Connector 1734 is not used to couple board 4 (1715) directly to the motherboard. This configuration is shown in FIG. 38B, where the fourth board 1533 is coupled to the third board 1532 and the fifth board 1534 but is not directly coupled to the motherboard 1520.

In FIG. 41D, board 3 (1740) contains connectors 1741-1748 designated for one set of FPGA interconnects, connectors 1749-1756 designated for another set of FPGA interconnects, and connectors 1757 and 1758 designated for the local bus. Connector 1759 is used to couple board 3 (1740) to the motherboard.

In FIG. 41E, board 2 (1765) contains connectors 1766-1773 designated for one set of FPGA interconnects, connectors 1774-1781 designated for another set of FPGA interconnects, and connectors 1782 and 1783 designated for the local bus. Connector 1784 is not used to couple board 2 (1765) directly to the motherboard. This configuration is shown in FIG. 38B, where the second board 1525 is coupled to the third board 1532 and the first board 1526 but is not directly coupled to the motherboard 1520.

In FIG. 41F, board 1 (1790) contains connectors 1791-1798 designated for one set of FPGA interconnects, connectors 1799-1804, 1806, and 1809 designated for another set of FPGA interconnects, and connectors 1811 and 1812 designated for the local bus. Connector 1813 is used to couple board 1 (1790) to the motherboard. Because board 1 (1790) is positioned as one of the boards at the end of the motherboard (along with board 6 (1660) of FIG. 41A at the other end), connectors 1805, 1807, 1808, and 1810 are designated as 10-ohm R-packs for certain north-south interconnects.

In one embodiment of the present invention, multiple boards are coupled to the motherboard and to each other in a unique manner. Multiple boards are coupled together component side to solder side. One of the boards, namely the first board, is coupled to the motherboard and thus to the PCI bus through a motherboard connector. Also, the FPGA interconnect bus on the first board is coupled to the FPGA interconnect bus of another board, namely the second board, through a pair of FPGA interconnect connectors. The FPGA interconnect connector on the first board is on the component side, and the FPGA interconnect connector on the second board is on the solder side. The component-side and solder-side connectors on the first board and second board, respectively, allow the FPGA interconnect buses to be coupled together.

Similarly, the local buses on the two boards are coupled together through local bus connectors. The local bus connector on the first board is on the component side, and the local bus connector on the second board is on the solder side. Thus, the component-side connector on the first board and the solder-side connector on the second board allow the local buses to be coupled together.

More boards can be added. A third board can be added with its solder side facing the component side of the second board. The FPGA interconnect and local bus inter-board connections are made in the same way. As described below, the third board is also coupled to the motherboard through another connector, but that connector merely provides power and ground to the third board.

The component-side to solder-side connectors of the dual-board configuration will be described with reference to FIG. 38A. This figure shows a side view of the FPGA board connection on the motherboard in accordance with one embodiment of the present invention. FIG. 38A shows the dual-board configuration in which, as the name implies, only two boards are used. The two boards 1525 (board 2) and 1526 (board 1) shown in FIG. 38A correspond to the two boards 1552 and 1551 shown in FIG. 39. The component sides of boards 1525 and 1526 are indicated by reference numeral 1988. The solder sides of the two boards 1525 and 1526 are indicated by reference numeral 1988. As shown in FIG. 38A, the two boards 1525 and 1526 are coupled to the motherboard 1520 through motherboard connector 1523. Other motherboard connectors 1521, 1522, and 1524 may also be provided for expansion. Signals between the PCI bus and boards 1525 and 1526 are routed through motherboard connector 1523. PCI signals between the dual-board structure and the PCI bus are routed through the first board 1526 first. Thus, signals from the PCI bus encounter the first board 1526 before they travel to the second board 1525. Analogously, signals to the PCI bus from the dual-board structure are sent from the first board 1526. Power is also applied to boards 1525 and 1526 through motherboard connector 1523 from a power supply (not shown).

As shown in FIG. 38A, board 1526 includes several components and connectors. One such component is FPGA logic device 1530. Connectors 1528A and 1531A are also provided. Similarly, board 1525 includes several components and connectors. One such component is FPGA logic device 1529. Connectors 1528B and 1531B are also provided.

In one embodiment, connectors 1528A and 1528B are the inter-board connectors for FPGA buses such as 1590 and 1581 (of FIG. 44). These inter-board connectors provide the inter-board connections for the various FPGA interconnects, such as N[73:0], S[73:0], W[73:0], E[73:0], NH[27:0], SH[27:0], XH[36:0], and XH[72:37], excluding the local bus.

Furthermore, connectors 1531A and 1531B are the inter-board connectors for the local bus. The local bus handles signals between the PCI bus (via the PCI controller) and the FPGA bus (via the FPGA I/O controller (CTRL_FPGA) unit). The local bus also handles configuration and boundary scan test information between the PCI controller and the FPGA logic devices and FPGA I/O controller (CTRL_FPGA) units.

In sum, the motherboard connector couples one board of a board pair to the PCI bus and to power. One set of connectors couples the FPGA interconnects from the component side of one board to the solder side of the other board. Another set of connectors couples the local bus from the component side of one board to the solder side of the other board.

In another embodiment of the present invention, more than two boards are used. FIG. 38B shows a six-board configuration. The configuration is analogous to that of FIG. 38A: every other board is directly connected to the motherboard, and the interconnects and local buses of these boards are coupled together through inter-board connectors arranged solder side to component side.

FIG. 38B shows six boards: 1526 (first board), 1525 (second board), 1532 (third board), 1533 (fourth board), 1534 (fifth board), and 1535 (sixth board). The six boards are coupled to the motherboard 1520 through the connectors on boards 1526 (first board), 1532 (third board), and 1534 (fifth board). The other boards 1525 (second board), 1533 (fourth board), and 1535 (sixth board) are not directly coupled to the motherboard 1520; they are coupled to it indirectly through their connections to their respective neighboring boards.

With the boards placed solder side to component side, the various inter-board connectors allow communication among the PCI bus components, the FPGA logic devices, the memory devices, and the various simulation system control circuits. The first set of inter-board connectors 1990 corresponds to connectors J5-J16 of FIG. 42. The second set of inter-board connectors 1991 corresponds to connectors J17-J28 of FIG. 42. The third set of inter-board connectors 1992 corresponds to connectors J3-J4 of FIG. 42.

Motherboard connectors 1521-1524 are provided on the motherboard 1520 to couple the motherboard (and hence the PCI bus) to the six boards. As mentioned above, boards 1526 (first board), 1532 (third board), and 1534 (fifth board) are directly coupled to connectors 1523, 1522, and 1521, respectively. The other boards 1525 (second board), 1533 (fourth board), and 1535 (sixth board) are not directly coupled to the motherboard 1520. Because only one PCI controller is needed for all six boards, only the first board 1526 contains a PCI controller. Likewise, motherboard connector 1523, which is coupled to the first board 1526, provides access to and from the PCI bus. Connectors 1522 and 1521 are coupled only to power and ground. The center-to-center spacing between adjacent motherboard connectors is approximately 20.32 mm in one embodiment.

For boards 1526 (first board), 1532 (third board), and 1534 (fifth board), which are directly coupled to motherboard connectors 1523, 1522, and 1521, respectively, connectors J5-J16 are located on the component side, connectors J17-J28 are located on the solder side, and local bus connectors J3-J4 are located on the component side. For boards 1525 (second board), 1533 (fourth board), and 1535 (sixth board), which are not directly coupled to motherboard connectors 1523, 1522, and 1521, connectors J5-J16 are located on the solder side, connectors J17-J28 are located on the component side, and local bus connectors J3-J4 are located on the solder side. For the end boards 1526 (first board) and 1535 (sixth board), the pair of connectors J17-J28 are 10-ohm R-pack terminations.
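The placement rule above can be summarized as a small lookup. The following Python sketch is illustrative only: the function name `connector_sides` and the use of board numbers 1 to 6 as keys are assumptions introduced here, not part of the described system.

```python
def connector_sides(board_number):
    """Return the side on which each connector group sits for a given board.

    Assumption for illustration: boards are numbered 1..6 as in FIG. 38B.
    Odd-numbered boards (1, 3, 5) plug directly into the motherboard;
    even-numbered boards (2, 4, 6) do not.
    """
    directly_coupled = board_number % 2 == 1
    if directly_coupled:
        first, second = "component", "solder"
    else:
        first, second = "solder", "component"
    return {
        "J5-J16": first,    # first set of inter-board connectors
        "J17-J28": second,  # second set of inter-board connectors
        "J3-J4": first,     # local bus connectors
    }


for board in range(1, 7):
    print(board, connector_sides(board))
```

Under this rule, each connector group on the component side of one board has a counterpart on the solder side of its neighbor, which is consistent with a single board layout being reused for all boards.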

FIGS. 40A and 40B show array connections between different boards. To facilitate the manufacturing process, a single layout design is used for all boards. As described above, the boards connect to other boards through connectors, without a backplane. FIG. 40A shows two exemplary boards 1611 (board 2) and 1610 (board 1). The component side of board 1610 faces the solder side of board 1611. Board 1611 contains numerous FPGA logic devices, other components, and wire lines. Particular nodes of these logic devices and other components on board 1611 are designated node A' (reference numeral 1612) and node B' (reference numeral 1614). Node A' is coupled to connector pad 1616 via PCB trace 1620. Similarly, node B' is coupled to connector pad 1617 via PCB trace 1623.

Similarly, board 1610 contains numerous FPGA logic devices, other components, and wire lines. Particular nodes of these logic devices and other components on board 1610 are designated node A (reference numeral 1613) and node B (reference numeral 1615). Node A is coupled to connector pad 1618 via PCB trace 1625. Similarly, node B is coupled to connector pad 1619 via PCB trace 1622.

The routing of signals between nodes located on different boards using surface mount connectors will now be described. In FIG. 40A, the desired connections are: (1) between node A and node B', as indicated by imaginary paths 1620, 1621, and 1622, and (2) between node B and node A', as indicated by imaginary paths 1623, 1624, and 1625. These connections are for paths such as the asymmetric interconnect between board 1551 and board 1552 of FIG. 39. Other asymmetric interconnects include the NH-SH interconnects 1577, 1579, and 1581 on both sides of connectors 1589-1590.

A-A' and B-B' correspond to symmetric interconnections such as interconnect 1515 (N, S). The N and S interconnections use through-hole connectors, whereas the NH and SH asymmetric interconnections use SMD connectors. Refer to Table D.

The actual implementation using surface mount connectors will now be described with reference to FIG. 40B, using like numerals for like items. In FIG. 40B, board 1611 shows node A' on the component side coupled to component-side connector pad 1636 via PCB trace 1620. Component-side connector pad 1636 is coupled to solder-side connector pad 1639 via conductive path 1651. Solder-side connector pad 1639 is coupled to component-side connector pad 1642 on board 1610 via conductive path 1648. Finally, component-side connector pad 1642 is coupled to node B via PCB trace 1622. Thus, node A' on board 1611 can be coupled to node B on board 1610.

Likewise, in FIG. 40B, board 1611 shows node B' on the component side coupled to component-side connector pad 1638 via PCB trace 1623. Component-side connector pad 1638 is coupled to solder-side connector pad 1637 via conductive path 1650. Solder-side connector pad 1637 is coupled to component-side connector pad 1640 via conductive path 1645. Finally, component-side connector pad 1640 is coupled to node A via PCB trace 1625. Thus, node B' on board 1611 can be coupled to node A on board 1610. Because these boards share the same layout, conductive paths 1652 and 1653 can be used in the same manner as conductive paths 1650 and 1651 for another board placed adjacent to board 1610. Thus, a unique inter-board connection scheme is provided using surface mount and through-hole connectors, without the use of switching components.
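The two crossed paths described for FIG. 40B can be checked with a small connectivity model. The sketch below is an illustration under stated assumptions: the pad labels such as `pad1636` and the `reachable` helper are names invented here; only the reference numerals of the traces and conductive paths come from the description.

```python
from collections import deque

# Couplings described for FIG. 40B, keyed by the reference numeral of the
# PCB trace or conductive path that makes each one. Pad labels are
# hypothetical names built from the pad reference numerals.
LINKS = {
    1620: ("A'", "pad1636"),
    1651: ("pad1636", "pad1639"),
    1648: ("pad1639", "pad1642"),
    1622: ("pad1642", "B"),
    1623: ("B'", "pad1638"),
    1650: ("pad1638", "pad1637"),
    1645: ("pad1637", "pad1640"),
    1625: ("pad1640", "A"),
}


def reachable(start):
    """Return every node electrically reachable from `start` (breadth-first)."""
    neighbours = {}
    for a, b in LINKS.values():
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print("B" in reachable("A'"))   # node A' on board 1611 reaches node B on board 1610
print("A" in reachable("A'"))   # but not node A: the two paths stay separate
```

The model contains only passive links, which mirrors the point that the crossing is achieved by connector placement alone rather than by switching components.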

F. Timing-Insensitive Glitch-Free Logic Devices

One embodiment of the present invention solves both the hold time and clock glitch problems. In accordance with one embodiment of the present invention, when the user design is configured into the hardware model of the reconfigurable computing system, the standard logic devices (e.g., latches, flip-flops) found in the user design are replaced with emulation logic devices, namely timing-insensitive glitch-free (TIGF) logic devices. In one embodiment, a trigger signal incorporated in the EVAL signal is used to update the values stored in these TIGF logic devices. After waiting during an evaluation period for the various inputs and other signals to propagate through the hardware model of the user design and reach steady state, the trigger signal is provided to update the values stored or latched by the TIGF logic devices. A new evaluation period then begins. In one embodiment, this evaluation period-trigger period cycle is periodic.
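The evaluate-then-trigger idea can be illustrated with a simple two-phase software model. This is only a sketch under assumptions: the class `TwoPhaseRegister`, the function `shift_once`, and the three-stage shift register with initial contents (0, 1, 0) are hypothetical and do not represent the actual TIGF circuit. The model shows that when stored values change only on a common trigger, the order in which inputs settle during the evaluation period does not affect the result.

```python
from itertools import permutations


class TwoPhaseRegister:
    """Storage element whose visible value changes only on trigger()."""

    def __init__(self, value):
        self.value = value      # value seen by downstream logic
        self.pending = value    # value settled during the evaluation period

    def evaluate(self, data_in):
        # Evaluation period: record the settled input, do not expose it yet.
        self.pending = data_in

    def trigger(self):
        # Trigger period: commit the settled input as the stored value.
        self.value = self.pending


def shift_once(registers, s_in, evaluation_order):
    """Run one evaluation period followed by one trigger."""
    for index in evaluation_order:
        data_in = s_in if index == 0 else registers[index - 1].value
        registers[index].evaluate(data_in)
    for register in registers:
        register.trigger()
    return [register.value for register in registers]


# Try every possible settling order for a three-stage shift register.
results = set()
for order in permutations(range(3)):
    regs = [TwoPhaseRegister(v) for v in (0, 1, 0)]
    results.add(tuple(shift_once(regs, 1, order)))
print(results)  # {(1, 0, 1)}
```

Every settling order yields the same shifted contents, which is the property a timing-insensitive storage element is meant to provide.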

The hold time problem mentioned above will now be discussed briefly. As is well known, a common problem in logic circuit design is hold time violation. Hold time is defined as the minimum amount of time that the data input(s) of a logic element must remain stable after the control input (e.g., the clock input) changes to latch, capture, or store the value indicated by the data input(s); otherwise, the logic element will not operate properly.

A shift register example is used to illustrate the hold time requirement. FIG. 75A shows an exemplary shift register in which three D-type flip-flops are connected in series; that is, the output of flip-flop 2400 is coupled to the input of flip-flop 2401, whose output is in turn coupled to the input of flip-flop 2402. The overall input signal S_in is coupled to the input of flip-flop 2400, and the overall output signal S_out is generated from the output of flip-flop 2402. All three flip-flops receive a common clock signal at their respective clock inputs. This shift register design rests on the assumptions that (1) the clock signal reaches all the flip-flops at the same time, and (2) the input of each flip-flop does not change during the hold time after the edge of the clock signal is detected.

The timing diagram of FIG. 75B illustrates the hold time assumption for a system that does not violate the hold time requirement. The hold time varies from one logic element to the next, but it is not always specified in the specification sheets. At time t₀, the clock input changes from logic 0 to logic 1. As shown in FIG. 75A, the clock input is provided to each of the flip-flops 2400-2402. From this clock edge at t₀, the input S_in must remain stable for the hold time T_H, which lasts from time t₀ to time t₁. Similarly, the inputs of flip-flops 2401 (D₂) and 2402 (D₃) must also remain stable for the hold time measured from the triggering edge of the clock signal. Because this requirement is satisfied in FIGS. 75A and 75B, the input S_in is shifted into flip-flop 2400, the input at D₂ (logic 0) is shifted into flip-flop 2401, and the input at D₃ (logic 1) is shifted into flip-flop 2402. As is well known, after the clock edge has been triggered, the new values at the inputs of flip-flop 2401 (logic 1 at input D₂) and flip-flop 2402 (logic 0 at input D₃) will be shifted into or stored in the next flip-flop at the next clock cycle, assuming the hold time requirement is satisfied. The following table summarizes the operation of the shift register for these exemplary values.

                     D₁   D₂   D₃   Q₃
Before clock edge    1    0    1    0
After clock edge     1    1    0    1

In an actual implementation, the clock signal does not reach all the logic elements at exactly the same time; rather, the circuit is designed so that the clock signal reaches all the logic elements at nearly or substantially the same time. The circuit must be designed so that the clock skew, that is, the timing difference between the clock signals reaching each flip-flop, is much smaller than the hold time requirement. All the logic elements then capture the appropriate input values. In the example shown in FIGS. 75A and 75B, a hold time violation caused by the clock signal arriving at flip-flops 2400-2402 at different times may result in some flip-flops capturing the old input values while other flip-flops capture the new input values. As a result, the shift register does not operate properly.

In a reconfigurable logic (e.g., FPGA) implementation of the same shift register design, if the clock is generated directly from a primary input, the circuit can be designed so that a low-skew network distributes the clock signal to all the logic elements and the logic elements detect the clock edge at substantially the same time. Primary clocks are generated from self-timed test-bench processes. Usually, the primary clock signals are generated in software, and only a few (1 to 10) primary clocks are found in a typical user circuit design.

However, if the clock signal is generated from internal logic rather than from a primary input, hold time becomes more of an issue. Derived or gated clocks are generated from a network of combinational logic and registers that are in turn driven by the primary clocks. Many (1,000 or more) derived clocks are found in a typical user circuit design. Without extra precautions or additional controls, these clock signals may reach each logic element at different times, and the clock skew may be longer than the hold time. This can cause a circuit design such as the shift register circuit illustrated in FIGS. 75A and 75B to fail.

The hold time violation will now be described using the same shift register circuit illustrated in FIG. 75A. This time, however, the individual flip-flops of the shift register circuit are spread across multiple reconfigurable logic chips (e.g., multiple FPGA chips), as shown in FIG. 76A. The first FPGA chip 2411 contains the internally derived clock logic 2410 that provides the clock signal CLK to some components of FPGA chips 2412-2416. In this example, the internally generated clock signal CLK is provided to flip-flops 2400-2402 of the shift register circuit. Chip 2412 contains flip-flop 2400, chip 2415 contains flip-flop 2401, and chip 2416 contains flip-flop 2402. Two other chips, 2413 and 2414, are provided to illustrate the hold time violation concept.

The clock logic 2410 in chip 2411 receives a primary clock input (or possibly another derived clock input) to generate the internal clock signal CLK. This internal clock signal CLK travels to chip 2412, where it is labeled CLK1. The internal clock signal CLK from clock logic 2410 also travels to chip 2415 via chips 2413 and 2414, where it is labeled CLK2. As shown, CLK1 is input to flip-flop 2400 and CLK2 is input to flip-flop 2401. Both CLK1 and CLK2 experience wire trace delays, so the edges of CLK1 and CLK2 are delayed relative to the edge of the internal clock signal CLK. Furthermore, CLK2 experiences additional delay because it travels through two other chips, 2413 and 2414.

Referring to the timing diagram of FIG. 76B, the internal clock signal CLK is generated and triggered at time t₂. Because of the wire trace delay, CLK1 does not reach flip-flop 2400 in chip 2412 until time t₃, which is a delay of T₁. As shown in the table above, the output at Q₁ (or input D₂) is logic 0 before the clock edge of CLK1 arrives. After the edge of CLK1 is sensed at flip-flop 2400, the input at D₁ must remain stable for the required hold time H₂ (until time t₄). At that point, flip-flop 2400 shifts in, or stores, the input logic 1, so that the output at Q₁ (or D₂) becomes logic 1.

While this is occurring at flip-flop 2400, the clock signal CLK2 is making its way to flip-flop 2401 in chip 2415. The delay T₂ caused by chips 2413 and 2414 makes CLK2 arrive at flip-flop 2401 at time t₅. By this time the input at D₂ is logic 1, and once the hold time for flip-flop 2401 has been satisfied, this logic 1 appears at output Q₂ (or D₃). Thus, output Q₂ is logic 1 before the arrival of CLK2 and remains logic 1 after the arrival of CLK2. This is an incorrect result: the shift register should have shifted in logic 0. Flip-flop 2400 correctly shifts in the old input value (logic 1), whereas flip-flop 2401 incorrectly shifts in the new input value (logic 1). This incorrect operation generally occurs when the clock skew (or timing delay) is greater than the hold time. In this example, T₂ > T₁ + H₂. In sum, unless some precautionary measure is taken, a hold time violation is likely to occur when a clock signal is generated in one chip and distributed to other logic elements residing in different chips, as shown in FIG. 76A.
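The condition T₂ > T₁ + H₂ can be made concrete with a small timing model. This is a minimal sketch under assumptions: the function name `second_stage_capture`, the use of delays measured from the CLK edge at t₂, and the numeric values are all illustrative and are not taken from the figures.

```python
def second_stage_capture(old_d2, new_d2, t1_delay, t2_delay, hold):
    """Model which D2 value flip-flop 2401 captures.

    All times are measured from the CLK edge at t2 (illustrative units).
    t1_delay: delay T1 of CLK1 to flip-flop 2400
    t2_delay: delay T2 of CLK2 to flip-flop 2401
    hold:     hold time H2 after which Q1 (and therefore D2) changes
    """
    d2_changes_at = t1_delay + hold     # corresponds to t4
    clk2_arrives_at = t2_delay          # corresponds to t5
    violated = clk2_arrives_at > d2_changes_at   # T2 > T1 + H2
    captured = new_d2 if violated else old_d2
    return captured, violated


# Large skew: CLK2 arrives after D2 has already changed, so the new value
# (logic 1) is captured instead of the old value (logic 0).
print(second_stage_capture(old_d2=0, new_d2=1, t1_delay=1.0, t2_delay=5.0, hold=2.0))

# Small skew: CLK2 arrives while D2 still holds the old value.
print(second_stage_capture(old_d2=0, new_d2=1, t1_delay=1.0, t2_delay=2.0, hold=2.0))
```

The first call reproduces the failure described above, in which the second stage stores logic 1 although logic 0 should have been shifted in.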

The clock glitch problem mentioned above will now be described with reference to FIGS. 77A and 77B. In general, when the inputs of a circuit change, the outputs change to some random value for a very brief time before they settle to the correct value. If another circuit inspects the output at just the wrong time and reads the random value, the results can be incorrect and difficult to debug. A random value that adversely affects another circuit is called a glitch. In typical logic circuits, one circuit may generate the clock signal for another circuit. If uncompensated timing delay exists in one or both circuits, a clock glitch (an unplanned occurrence of a clock edge) may occur, which may cause an incorrect result. Like hold time violations, clock glitches occur because certain logic elements in the circuit design change values at different times.

FIG. 77A shows an exemplary logic circuit in which one set of logic elements generates a clock signal for another set of logic elements; that is, D-type flip-flop 2420, D-type flip-flop 2421, and exclusive-OR (XOR) gate 2422 generate the clock signal CLK3 for D-type flip-flop 2423. Flip-flop 2420 receives its data input at D₁ on line 2425 and outputs data at Q₁ on line 2427. It receives its clock input CLK1 from clock logic 2424. CLK refers to the clock signal as originally generated by clock logic 2424, and CLK1 refers to the same signal delayed in time by the time it reaches flip-flop 2420.

Flip-flop 2421 receives its data input at D₂ on line 2426 and outputs data at Q₂ on line 2428. It receives its clock input CLK2 from clock logic 2424. CLK refers to the clock signal as originally generated by clock logic 2424, and CLK2 refers to the same signal delayed in time by the time it reaches flip-flop 2421.

The outputs of flip-flops 2420 and 2421 on lines 2427 and 2428, respectively, are inputs to XOR gate 2422. XOR gate 2422 outputs the signal labeled CLK3 to the clock input of flip-flop 2423. Flip-flop 2423 also receives data at D₃ on line 2429 and outputs data at Q₃.

The clock glitch problem that may arise in this circuit will now be described with reference to the timing diagram of FIG. 77B. The CLK signal is triggered at time t₀. By the time this clock signal, as CLK1, reaches flip-flop 2420, the time is t₁. CLK2 does not reach flip-flop 2421 until time t₂.

Assume that the inputs at D₁ and D₂ are both logic 1. When CLK1 reaches flip-flop 2420 at time t₁, the output at Q₁ becomes logic 1 (as shown in FIG. 77B). CLK2 arrives at flip-flop 2421 somewhat later, at time t₂, so the output Q₂ on line 2428 remains at logic 0 from time t₁ to time t₂. During the period between time t₁ and time t₂, XOR gate 2422 therefore generates a logic 1 as CLK3 for presentation to the clock input of flip-flop 2423, even though the desired signal is logic 0 (1 XOR 1 = 0). The generation of CLK3 during the period between time t₁ and time t₂ is the clock glitch. Consequently, whatever logic value is present at D₃ on input line 2429 of flip-flop 2423 is stored, whether desired or not, and flip-flop 2423 is then ready for the next input on line 2429. If the circuit is designed properly, the time delays of CLK1 and CLK2 are minimized so that no clock glitch is generated, or at the very least, the clock glitch lasts for so short a duration that it does not affect the rest of the circuit. In the latter case, if the clock skew between CLK1 and CLK2 is short enough, the XOR gate delay is long enough to filter out the glitch, and the rest of the circuit is unaffected.
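The glitch window can be reproduced with a simple waveform model. The sketch below rests on assumptions introduced for illustration: both outputs start at logic 0, D₁ and D₂ are logic 1, the function name `clk3_glitch` is hypothetical, and `xor_filter_time` is a simplified stand-in for the XOR gate delay that filters out pulses shorter than it.

```python
def clk3_glitch(t1, t2, xor_filter_time=0.0):
    """Model CLK3 = Q1 XOR Q2 when Q1 rises at t1 and Q2 rises at t2.

    Returns the CLK3 waveform as a function of time, the width of the
    unwanted pulse, and whether that pulse survives the assumed filter time.
    """
    def clk3(t):
        q1 = 1 if t >= t1 else 0   # flip-flop 2420 output after CLK1 arrives
        q2 = 1 if t >= t2 else 0   # flip-flop 2421 output after CLK2 arrives
        return q1 ^ q2

    width = abs(t2 - t1)                    # clock skew between CLK1 and CLK2
    propagates = width > xor_filter_time    # short pulses are filtered out
    return clk3, width, propagates


clk3, width, propagates = clk3_glitch(t1=1.0, t2=3.0, xor_filter_time=0.5)
print([clk3(t) for t in (0.5, 2.0, 3.5)])  # [0, 1, 0]: a pulse between t1 and t2
print(width, propagates)                   # 2.0 True
```

With a skew shorter than the assumed filter time, `propagates` is false, which corresponds to the case where the glitch is too brief to affect the rest of the circuit.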

Two known solutions to the hold time violation problem are (1) timing adjustment and (2) timing synthesis. Timing adjustment, disclosed in U.S. Pat. No. 5,475,830, requires the insertion of sufficient delay elements (such as buffers) in certain signal paths to lengthen the hold time of a logic element. For example, adding sufficient delay on inputs D₂ and D₃ in the shift register circuit above can prevent hold time violations. Thus, FIG. 78 shows the same shift register circuit with delay elements (2430, 2430) added to inputs D₂ and D₃, respectively. As a result, delay element 2430 can be designed so that time t₄ occurs after time t₅, so that T2 < T1 + H2 (FIG. 76B) and no hold time violation occurs.

A potential problem with the timing adjustment solution is that it relies too heavily on the specification sheet of the FPGA chip. As is known, reconfigurable logic chips such as FPGA chips implement logic elements with look-up tables. The delay of a look-up table in the chip is given in the specification sheet, and a designer using the timing adjustment method to prevent hold time violations relies on this specified delay. However, the delay is only an estimate and varies from chip to chip. Another potential problem with the timing adjustment method is that the designer must also compensate for the wiring delays present throughout the circuit design. Although this is not an impossible task, estimating wiring delay is time-consuming and error-prone. Moreover, the timing adjustment method does not solve the clock glitch problem.

Another solution is timing synthesis, introduced in IKOS's VirtualWires technology. The timing synthesis concept involves transforming the user's circuit design into a functionally equivalent design while strictly controlling the timing of clock and pin-out signals through finite state machines and registers. Timing synthesis retimes the user's circuit design by introducing a single high-speed clock. It also converts latches, gated clocks, and multiple synchronous and asynchronous clocks into a flip-flop-based, single-clock synchronous design. Thus, timing synthesis uses registers at the input and output pin-outs of each chip to control inter-chip signal movement precisely so that no inter-chip hold time violation occurs. Timing synthesis also uses a finite state machine in each chip to schedule inputs from other chips, schedule outputs to other chips, and schedule updates of internal flip-flops based on a reference clock.

Using the same shift register circuit described with respect to FIGS. 75A, 75B, 76A, and 76B, FIG. 79 shows an example of a timing synthesis circuit. The basic three-flip-flop shift register design has been transformed into a functionally equivalent circuit. Chip 2430 includes the original internal clock generating logic 2435, which is coupled to register 2443 via line 2448. Clock logic 2435 generates the CLK signal. A first finite state machine 2438 is also coupled to register 2443 via line 2449. Register 2443 and first finite state machine 2438 are controlled by a design-independent global reference clock.

The CLK signal is also passed across chips 2432 and 2433 before reaching chip 2434. In chip 2432, a second finite state machine 2440 controls register 2445 via line 2462. The CLK signal is sent from register 2443 to register 2445 via line 2461. Register 2445 outputs the CLK signal to the next chip 2433 via line 2463. Chip 2433 includes a third finite state machine 2441, which controls register 2446 via line 2464. Register 2446 outputs the CLK signal to chip 2434.

Chip 2431 includes the original flip-flop 2436. Register 2444 receives the input S_in and outputs it to the D₁ input of flip-flop 2436 via line 2452. The Q₁ output of flip-flop 2436 is coupled to register 2466 via line 2454. A fourth finite state machine 2439 controls register 2444 via line 2451, register 2466 via line 2455, and flip-flop 2436 via latch enable line 2453. Fourth finite state machine 2439 also receives the original clock signal CLK from chip 2430 via line 2450.

Chip 2434 includes the original flip-flop 2437, which receives at its D₂ input, via line 2456, the signal from register 2466 of chip 2431. The Q₂ output of flip-flop 2437 is coupled to register 2447 via line 2457. A fifth finite state machine 2442 controls register 2447 via line 2459 and flip-flop 2437 via latch enable line 2458. Fifth finite state machine 2442 also receives the original clock signal CLK from chip 2430 by way of chips 2432 and 2433.

With timing synthesis, finite state machines 2438-2442, registers 2443-2447 and 2466, and the single global reference clock are used to control signal flow across multiple chips and to update the internal flip-flops. Thus, in chip 2430, distribution of the CLK signal to the other chips is scheduled by first finite state machine 2438 through register 2443. Similarly, in chip 2431, fourth finite state machine 2439 schedules delivery of the input S_in to flip-flop 2436 through register 2444 and delivery of the Q₁ output through register 2466. The latching function of flip-flop 2436 is likewise controlled by the latch enable signal from fourth finite state machine 2439. The same principle applies to the logic in the other chips 2432-2434. With this strict control of the inter-chip input delivery schedule, the inter-chip output delivery schedule, and internal flip-flop state updates, inter-chip hold time violations are eliminated.

However, the timing synthesis technique requires the user's circuit design to be transformed into a much larger functionally equivalent circuit that includes the added finite state machines and registers. Typically, the additional logic needed to implement this technique takes up 20% of the usable logic in each chip. Moreover, the technique is not immune to the clock glitch problem. To avoid clock glitches, a designer using timing synthesis must take additional precautions. One conventional design approach is to design the circuit so that the inputs of a logic device using a gated clock cannot change at the same time. A more aggressive approach uses gate delays to filter out the glitch so that the rest of the circuit is unaffected. As stated above, however, timing synthesis requires additional, nontrivial measures to avoid clock glitches.

Various embodiments of the present invention that solve both the hold time and clock glitch problems are now discussed. While the user design is mapped into the software model of the RCC computing system and the hardware model of the RCC array, a latch such as that shown in FIG. 18A is emulated with a timing insensitive glitch-free (TIGF) latch in accordance with one embodiment of the present invention. Similarly, a design flip-flop such as that shown in FIG. 18B is emulated with a TIGF flip-flop in accordance with one embodiment of the present invention. A TIGF logic device, whether in latch or flip-flop form, may also be called an emulation logic device. Updates of TIGF latches and flip-flops are controlled by a global trigger signal.

In one embodiment of the present invention, not every logic device found in the user design circuit is replaced with a TIGF logic device. A user design circuit includes portions that are enabled or clocked by a primary clock and other portions that are controlled by gated or derived clocks. Because hold time violations and clock glitches arise where logic devices are controlled by gated or derived clocks, only those logic devices controlled by gated or derived clocks are replaced with TIGF logic devices in this embodiment. In another embodiment, every logic device found in the user design circuit is replaced with a TIGF logic device.

Before the TIGF latch and flip-flop embodiments of the present invention are described, the global trigger signal is explained. In general, the global trigger signal causes the TIGF latches and flip-flops to hold their state (the old input value) during the evaluation period and to update their state (with the new input value) during the short trigger period. In one embodiment, the global trigger signal shown in FIG. 82 is separate from, and derived from, the EVAL signal described above. In this embodiment, the global trigger signal has a long evaluation period followed by a short trigger period. The global trigger signal tracks the EVAL signal during the evaluation period, and at the conclusion of the EVAL cycle a short trigger signal is generated to update the TIGF latches and flip-flops. In another embodiment, the EVAL signal itself is the global trigger signal: it is at one logic state (logic 0) during the evaluation period and at the other logic state (logic 1) during the non-evaluation, or TIGF latch/flip-flop update, period.

The evaluation period, described above with reference to the RCC computing system and the RCC hardware array, is used to propagate all primary inputs and flip-flop/latch device changes into the entire user design, one simulation cycle at a time. During propagation, the RCC system waits until all signals in the system reach steady state. The evaluation period is calculated after the user design has been mapped and placed into the appropriate reconfigurable logic devices (e.g., FPGA chips) of the RCC array. The evaluation period is therefore design-specific; that is, the evaluation period for one user design may differ from that for another. The evaluation period must be long enough to ensure that all signals propagate through the entire system and reach steady state before the next short trigger period.

The short trigger period occurs adjacent in time to the evaluation period, as shown in FIG. 82. In one embodiment of the present invention, the short trigger period occurs after the evaluation period. Before this short trigger period, the input signals propagate through the hardware-modeled portion of the user design circuit during the evaluation period. The short trigger period, indicated by a change in the logic state of the EVAL signal in accordance with one embodiment of the present invention, controls all TIGF latches and flip-flops in the user design so that they are updated with the new values propagated during the evaluation period, after steady state has been reached. The trigger signal for the short trigger period is distributed globally over a low-skew network, and the period can be as short as proper operation of the reconfigurable logic devices allows (i.e., the duration from t₀ to t₁ as well as from t₂ to t₃ shown in FIG. 82). During the short trigger period, the new primary inputs are sampled at every input stage of the TIGF latches and flip-flops, and the previously stored values of those same TIGF latches and flip-flops are exposed to the next stage of the RCC hardware model of the user design. In the description below, the portion of the global trigger signal that occurs during the short trigger period is referred to as the TIGF trigger, the TIGF trigger signal, the trigger signal, or simply the trigger.
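The evaluate-then-trigger discipline can be illustrated with a minimal two-phase update loop. This is a sketch under stated assumptions, not the RCC implementation: `simulate_cycle` and `shift_register` are hypothetical names, and a three-stage shift register stands in for the user design.

```python
def simulate_cycle(state, primary_inputs, next_state_fn):
    """Run one simulation cycle: an evaluation period, then a trigger.

    state          -- dict of stored values, frozen during evaluation
    primary_inputs -- dict of primary input values for this cycle
    next_state_fn  -- hypothetical function giving each element's new input
    """
    # Evaluation period: new inputs are computed from the OLD stored values.
    new_values = next_state_fn(state, primary_inputs)
    # Short trigger period: every storage element updates at the same moment.
    return dict(new_values)


def shift_register(state, inputs):
    """Three-stage shift register; each stage takes the previous stage's old value."""
    return {"q1": inputs["s_in"], "q2": state["q1"], "q3": state["q2"]}


state = {"q1": 0, "q2": 0, "q3": 0}
for bit in (1, 0, 0):
    state = simulate_cycle(state, {"s_in": bit}, shift_register)
    print(state)
```

Because every element is updated from values captured before the trigger, the input bit advances exactly one stage per cycle regardless of the order in which the stages are evaluated, which is the property a hold time violation would break.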

FIG. 80(A) shows the latch 2470 originally shown in FIG. 18(A). The latch operates as follows:

if (#S), Q ← 1

else if (#R), Q ← 0

else if (en), Q ← D

else Q retains its previous value

Because this latch is level-sensitive and asynchronous, the output Q tracks the input D as long as the clock input is enabled and the latch enable input is enabled.
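The latch rules above translate directly into a next-state function. In this sketch the function name and the boolean parameters are assumptions; `set_asserted` and `reset_asserted` stand for the conditions written as #S and #R, with signal polarity abstracted away.

```python
def latch_next(q, d, en, set_asserted=False, reset_asserted=False):
    """Return the next output of a level-sensitive latch.

    The priority order follows the rules in the text:
    set, then reset, then enable, otherwise hold.
    """
    if set_asserted:
        return 1
    if reset_asserted:
        return 0
    if en:
        return d  # transparent: Q tracks D while enabled
    return q      # opaque: Q retains its previous value


print(latch_next(q=0, d=1, en=True))   # 1
print(latch_next(q=0, d=1, en=False))  # 0
```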

FIG. 80(B) shows a TIGF latch in accordance with one embodiment of the present invention. Like the latch of FIG. 80(A), the TIGF latch has a D input, an enable input, a set (S), a reset (R), and an output Q. In addition, the TIGF latch has a trigger input. The TIGF latch includes a D flip-flop 2471, a multiplexer 2472, an OR gate 2473, an AND gate 2474, and various interconnections.

D flip-flop 2471 receives its input from the output of AND gate 2474 via line 2476. The D flip-flop is also triggered at its clock input by the trigger signal on line 2477, which is distributed globally by the RCC system according to a strict schedule based on the evaluation cycle. The output of D flip-flop 2471 is coupled to one input of multiplexer 2472 via line 2478. The multiplexer is controlled by the enable signal on line 2484. The output of multiplexer 2472 is coupled to one input of OR gate 2473 via line 2479. The other input of OR gate 2473 is coupled to the set (S) input on line 2480. The output of OR gate 2473 is coupled via line 2481 to one input of AND gate 2474, whose other input is the reset (R) signal. The output of AND gate 2474 is fed back to the input of D flip-flop 2471 via line 2476, as mentioned above.

The operation of this TIGF latch embodiment of the present invention is now described. In this embodiment, D flip-flop 2471 holds the current state (i.e., the old value) of the TIGF latch. Line 2476 at the input of D flip-flop 2471 presents the new input value that has yet to be latched into the TIGF latch. Line 2476 presents the new value because the main input (D input) of the TIGF latch on line 2475 ultimately makes its way from the input of multiplexer 2472 (given the proper enable signal on line 2484, which will eventually be presented) through OR gate 2473 and finally through AND gate 2474 onto line 2483, which feeds the new input signal of the TIGF latch back to D flip-flop 2471 on line 2476. The trigger signal on line 2477 updates the TIGF latch by clocking the new input value into D flip-flop 2471. Thus, the output of D flip-flop 2471 on line 2478 presents the current state (i.e., the old value) of the TIGF latch, while the input on line 2476 presents the new input value yet to be latched by the TIGF latch.

Multiplexer 2472 receives the current state from D flip-flop 2471 as well as the new input value on line 2475. Enable line 2484 serves as the selector for multiplexer 2472. Because the TIGF latch is not updated until the trigger signal is provided on line 2477, the D input of the TIGF latch on line 2475 and the enable input on line 2484 may arrive at the TIGF latch in any order. If this TIGF latch (and the other TIGF latches in the hardware model of the user design) encounters a situation that would normally cause a hold time violation in a circuit using conventional latches, such as that described with respect to FIGS. 76(A) and 76(B), where one clock signal arrives much later than another, the TIGF latch still functions properly by holding the proper old value until the trigger signal is provided on line 2477.

The trigger signal is distributed through a low-skew global clock network.

This TIGF latch also solves the clock glitch problem. Note that the clock signal is replaced by the enable signal in the TIGF latch. The enable signal on line 2484 may glitch often during the evaluation period, but the TIGF latch continues to hold its current state without error. The only mechanism by which the TIGF latch can be updated is the trigger signal, which in one embodiment is provided after the evaluation period, once the signals have reached steady state.
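A behavioral sketch of the TIGF latch makes the glitch immunity concrete. The class and method names are assumptions, and the polarities are inferred from the gate structure rather than stated in the text: S is treated as active high because it is ORed in, and R as active low because it is ANDed in.

```python
class TIGFLatch:
    """Behavioral model of the TIGF latch (hypothetical naming)."""

    def __init__(self):
        self.state = 0  # D flip-flop 2471: the current (old) value

    def new_value(self, d, en, s=0, r=1):
        """Combinational path: multiplexer, then OR with S, then AND with R."""
        mux = d if en else self.state  # enable selects new input or old state
        return (mux | s) & r           # value presented at the flip-flop input

    def trigger(self, d, en, s=0, r=1):
        """Trigger: clock the settled new value into the D flip-flop."""
        self.state = self.new_value(d, en, s, r)
        return self.state


latch = TIGFLatch()
for en in (1, 0, 1, 0):      # enable glitches during the evaluation period
    latch.new_value(1, en)   # combinational value changes, storage does not
print(latch.state)           # still 0: nothing was stored
print(latch.trigger(1, 1))   # 1: updated only at the trigger
```

The stored state changes only inside `trigger`, so the order in which `d` and `en` settle during evaluation has no effect on the result.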

FIG. 81(A) shows the flip-flop 2490 originally shown in FIG. 18(B). The flip-flop operates as follows:

if (#S), Q ← 1

else if (#R), Q ← 0

else if (rising edge of CLK), Q ← D

else Q retains its previous value

Because this flip-flop is edge-triggered, the output Q takes the value of the input D at the rising edge of the clock signal, as long as the flip-flop enable input is enabled.

FIG. 81(B) shows a TIGF D-type flip-flop in accordance with one embodiment of the present invention. Like the flip-flop of FIG. 81(A), the TIGF flip-flop has a D input, a clock input, a set (S), a reset (R), and an output Q.

In addition, the TIGF flip-flop has a trigger input. The TIGF flip-flop includes three D flip-flops 2491, 2492, and 2496, a multiplexer 2493, an OR gate 2494, two AND gates 2495 and 2497, and various interconnections.

Flip-flop 2491 receives the TIGF D input on line 2498 and the trigger input on line 2499, and provides its Q output on line 2500. This output line 2500 also serves as one of the inputs to multiplexer 2493. The other input to multiplexer 2493 comes from the Q output of flip-flop 2492 via line 2503. The output of multiplexer 2493 is coupled to one of the inputs of OR gate 2494 via line 2505. The other input of OR gate 2494 is the set (S) signal on line 2506. The output of OR gate 2494 is coupled to one of the inputs of AND gate 2495 via line 2507. The other input of AND gate 2495 is the reset (R) signal on line 2508. The output of AND gate 2495 (which is also the overall TIGF output Q) is coupled to the input of flip-flop 2492 via line 2501. Flip-flop 2492 also has a trigger input on line 2502.

Returning to multiplexer 2493, its selector input is coupled to the output of AND gate 2497 via line 2509. AND gate 2497 receives one of its inputs from the CLK signal on line 2510 and the other from the output of flip-flop 2496 via line 2512. Flip-flop 2496 also receives its input from the CLK signal on line 2511 and a trigger input on line 2513.

The operation of the TIGF flip-flop embodiment of the present invention is now described. In this embodiment, the TIGF flip-flop receives the trigger signal at three different points: D flip-flop 2491 via line 2499, D flip-flop 2492 via line 2502, and D flip-flop 2496 via line 2513.

The TIGF flip-flop stores the input value only when an edge of the clock signal has been detected. In accordance with one embodiment of the present invention, the required edge is the rising edge of the clock signal. To detect the rising edge of the clock signal, an edge detector 2515 is provided. Edge detector 2515 includes D flip-flop 2496 and AND gate 2497. Edge detector 2515 is also updated by the trigger signal on line 2513 of D flip-flop 2496.
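The edge detector can be sketched as a stored sample of CLK combined with the live CLK value. One point is an assumption here: the text names only a D flip-flop and an AND gate, so this sketch assumes the stored sample is inverted at the AND gate input, which is what rising-edge detection requires. The class and method names are hypothetical.

```python
class EdgeDetector:
    """Rising-edge detector built from one stored CLK sample and an AND."""

    def __init__(self):
        self.prev_clk = 0  # D flip-flop 2496: CLK as sampled at the last trigger

    def rising(self, clk):
        # Assumption: AND gate 2497 sees the stored sample inverted.
        return clk & (1 - self.prev_clk)

    def trigger(self, clk):
        """Trigger: remember the settled CLK value for the next cycle."""
        self.prev_clk = clk


det = EdgeDetector()
print(det.rising(1))  # 1: CLK is high and was low at the last trigger
det.trigger(1)
print(det.rising(1))  # 0: CLK is high but was already high
```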

D flip-flop 2491 holds the new input value of the TIGF flip-flop and is immune to any change of the D input on line 2498 until the trigger signal is provided on line 2499. Thus, before each evaluation period of the TIGF flip-flop, the new value is stored in D flip-flop 2491. The TIGF flip-flop therefore avoids hold time violations by storing the new value in advance, until the TIGF flip-flop is updated by the trigger signal.

D flip-flop 2492 holds the current value (or old value) of the TIGF flip-flop until the trigger signal is provided on line 2502. This value is the state of the emulated TIGF flip-flop after the TIGF flip-flop has been updated and before the next evaluation period. The input of D flip-flop 2492 on line 2501 holds the new value (which is the same as the value on line 2500 for a significant portion of the evaluation period).

Multiplexer 2493 receives the new input value on line 2500 and the old value stored in the TIGF flip-flop on line 2503. Based on the selector signal on line 2504, the multiplexer outputs either the new value (line 2500) or the old value (line 2503) as the output of the emulated TIGF flip-flop. Until all signals propagating through the hardware model of the user design reach steady state, this output may change with any clock glitch. By the end of the evaluation period, however, the input on line 2501 presents the new value stored in flip-flop 2491. When the trigger signal is received by the TIGF flip-flop, flip-flop 2492 stores the new value present on line 2501, and flip-flop 2491 stores the next new value on line 2498. Thus, the TIGF flip-flop in accordance with one embodiment of the present invention is not adversely affected by clock glitches.

In addition, the TIGF flip-flop provides immunity to clock glitches. Those skilled in the art will appreciate that if flip-flops 2420, 2421, and 2423 are replaced with the TIGF flip-flop of FIG. 81(B), clock glitches do not affect any circuit that uses this TIGF flip-flop. Referring to FIGS. 77(A) and 77(B), the clock glitch adversely affects the circuit of FIG. 77(A) because flip-flop 2423 clocks in a new value during the interval between t₁ and t₂, when it should not. The skew between CLK1 and CLK2 causes XOR gate 2422 to generate a logic 1 during the interval between t₁ and t₂, which drives the clock line of the next flip-flop 2423. In the TIGF flip-flop in accordance with one embodiment of the present invention, a clock glitch does not affect the clocking in of a new value. If flip-flop 2423 is replaced with a TIGF flip-flop, then once the signals have reached steady state during the evaluation period, the trigger signal during the short trigger period enables the TIGF flip-flop to store the new value in flip-flop 2491 (FIG. 81(B)). Thereafter, any clock glitch, such as the clock glitch of FIG. 77(B) during the interval between t₁ and t₂, does not clock in a new value. The TIGF flip-flop is updated only by the trigger signal, and the trigger signal is not presented to the TIGF flip-flop until after the evaluation period, when signal propagation through the circuit has reached steady state.
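The three-flip-flop structure of the TIGF flip-flop can be modeled behaviorally as follows. This is a sketch under assumptions: the names are hypothetical, S is taken as active high and R as active low based on the OR/AND structure, and the edge detector is assumed to invert the stored clock sample.

```python
class TIGFFlipFlop:
    """Behavioral model of the TIGF D flip-flop (hypothetical naming)."""

    def __init__(self):
        self.new = 0       # flip-flop 2491: D value sampled at the last trigger
        self.old = 0       # flip-flop 2492: current state of the emulated device
        self.prev_clk = 0  # flip-flop 2496: CLK sampled at the last trigger

    def q(self, clk, s=0, r=1):
        """Combinational output Q for the present (possibly glitching) CLK."""
        edge = clk & (1 - self.prev_clk)         # rising edge since last trigger
        mux = self.new if edge else self.old     # multiplexer 2493
        return (mux | s) & r                     # OR gate 2494, AND gate 2495

    def trigger(self, d, clk, s=0, r=1):
        """Trigger: all three flip-flops capture their settled inputs at once."""
        settled_q = self.q(clk, s, r)
        self.old, self.new, self.prev_clk = settled_q, d, clk


ff = TIGFFlipFlop()
ff.trigger(d=1, clk=0)                 # D = 1 is stored in advance, CLK still low
print([ff.q(c) for c in (1, 0, 1)])    # Q follows a glitching CLK: [1, 0, 1]
print(ff.old)                          # stored state is untouched: 0
ff.trigger(d=0, clk=1)                 # CLK has settled high: the edge is real
print(ff.old)                          # 1
```

If the clock had settled low instead (a pure glitch), the trigger would have captured the old value again and the emulated state would not have changed.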

Although the TIGF flip-flop in this particular embodiment is a D-type flip-flop, other flip-flops (e.g., T, JK, SR) are within the scope of the present invention. Other types of edge-triggered flip-flops can be derived from the D flip-flop by adding AND/OR logic in front of the D input.

VII. Simulation Server

A simulation server in accordance with another embodiment of the present invention allows multiple users to access the same reconfigurable hardware unit to simulate and accelerate the same or different user designs in a time-shared manner. A high-speed simulation scheduler and a state swapping mechanism are employed to feed the simulation server with active simulation processes, which results in high throughput. The server gives multiple users or processes access to the reconfigurable hardware unit for acceleration and hardware state swapping purposes. Once the acceleration has been accomplished or the hardware state has been accessed, each user or process can simulate in software only, thereby releasing control of the reconfigurable hardware unit to other users or processes.

In the simulation server portion of this specification, terms such as "job" and "process" are used. In this specification, the terms "job" and "process" are generally used interchangeably. In the past, batch systems executed "jobs" and time-shared systems stored and executed "processes" or programs. In today's systems, these jobs and processes are similar. Thus, in this specification, the term "job" is not limited to batch-type systems and "process" is not limited to time-shared systems. Rather, at one extreme, a "job" is equivalent to a "process" if the "process" can be executed within a time slice or without interruption by any other time-shared intervenor; at the other extreme, a "job" is a subset of a "process" if the "job" requires multiple time slices to complete. Thus, if a "process" requires multiple time slices to execute to completion because other users or processes of equal priority are present, the "process" is divided into "jobs." Moreover, if the "process" does not require multiple time slices to execute to completion, either because it belongs to the sole high-priority user or because it is short enough to complete within a time slice, the "process" is equivalent to a "job." Thus, a user can interact with one or more "processes" or programs that have been loaded and executed in the simulation system, and each "process" may require one or more "jobs" to complete on a time-shared system.

In one system configuration, multiple users at remote terminals can use the same microprocessor workstation in a non-network environment to access the same reconfigurable hardware unit and to review or debug the same or different user circuit designs. In a non-network environment, remote terminals are connected to a main computing system for access to its processing functions. This non-network configuration allows multiple users to share access to the same user design for parallel debugging purposes. The access is accomplished through a time-shared process in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware unit access among the scheduled users. In other instances, multiple users may access the same reconfigurable hardware unit through the server, each for his or her own separate and different user design, for debugging purposes. In another configuration, the multiple users or processes share the multiple microprocessors in the workstation with the operating system. In yet another configuration, multiple users or processes in separate microprocessor-based workstations can access the same reconfigurable hardware unit to review or debug the same or different user circuit designs across a network. Similarly, the access is accomplished through a time-shared process in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware unit access among the scheduled users. In a network environment, the scheduler listens for network requests through UNIX socket system calls. The operating system uses sockets to send commands to the scheduler.

As mentioned above, the simulation scheduler uses a preemptive multiple-priority round robin algorithm. In other words, higher priority users or processes are served first, until the user or process completes its job and ends its session. Among users or processes of equal priority, a preemptive round robin algorithm is used in which each user or process is assigned an equal time slice in which to execute its operations until completed. The time slice is short enough that multiple users or processes do not have to wait a long time before being served. The time slice is also long enough that sufficient operations are executed before the scheduler of the simulation server interrupts one user or process to swap in and execute a new user's job. In one embodiment of the present invention, the default time slice is 5 seconds and is user settable. In one embodiment, the scheduler makes specific calls into the operating system's built-in scheduler.

FIG. 45 shows a non-network environment with a multiprocessor workstation in accordance with one embodiment of the present invention. FIG. 45 is a variation of FIG. 1, and accordingly, like reference numerals are used for like components/units. Workstation 1100 includes a local bus 1105, a host/PCI bridge 1106, a memory bus 1107, and main memory 1108. A cache memory subsystem (not shown) may also be provided. Other user interface units (e.g., monitor, keyboard) are also provided but are not shown in FIG. 45. Workstation 1100 also includes multiple microprocessors 1101, 1102, 1103, and 1104 coupled to the local bus 1105 via a scheduler 1117 and connection/path 1118. As known to those skilled in the art, an operating system 1121 provides the user-hardware interface foundation for the entire computing environment, managing files and allocating resources for the various users, processes, and devices in the computing environment. For conceptual purposes, the operating system 1121 is shown along with a bus 1122. For operating systems, reference may be made to Abraham Silberschatz and James L. Peterson, OPERATING SYSTEM CONCEPTS (1988) and William Stallings, MODERN OPERATING SYSTEMS (1996).

In one embodiment of the present invention, the workstation 1100 is a Sun Microsystems Enterprise 450 system that uses UltraSPARC II processors. Instead of accessing memory via the local bus, the Sun 450 system allows the multiple processors to access memory via dedicated buses to the memory through a crossbar switch. Reference may be made to the specifications of the Sun 450 system and the Sun UltraSPARC multiprocessor. The Sun Ultra 60 system is another example of a multiprocessor system, although it allows only two processors.

The scheduler 1117 provides time-shared access to the reconfigurable hardware unit 20 via the device driver 1119 and connection/path 1120. The scheduler 1117 is implemented mostly in software, to interact with the operating system of the host computing system, and partly in hardware, to interact with the simulation server by supporting simulation job interruption and the swapping in and out of simulation sessions. The scheduler 1117 and the device driver 1119 are described in more detail below.

Each of the microprocessors 1101-1104 is capable of processing independently of the other microprocessors in the workstation 1100. In one embodiment of the present invention, the workstation 1100 operates under a UNIX-based operating system, although in other embodiments the workstation 1100 may operate under a Windows-based or Macintosh-based operating system. On UNIX-based systems, the user is provided with X-Windows for managing programs, tasks, and files as needed. For details on the UNIX operating system, reference may be made to Maurice J. Bach, THE DESIGN OF THE UNIX OPERATING SYSTEM (1986).

In FIG. 45, multiple users can access the workstation 1100 via remote terminals. At times, each user may be using a particular CPU to run its processes. At other times, each user uses different CPUs depending on resource limitations. Usually, the operating system 1121 determines such accesses, and the operating system itself may jump from one CPU to another to accomplish its tasks. To handle the time-sharing process, the scheduler listens for network requests through socket system calls and makes system calls to the operating system 1121, which in turn handles preemption by initiating the generation of interrupt signals by the device driver 1119 to the reconfigurable hardware unit 20. Such interrupt signal generation is one of many steps in the scheduling algorithm, which includes stopping the current job, saving state information for the currently interrupted job, swapping jobs, and executing the new job. The server scheduling algorithm is described below.

Sockets and socket system calls will now be described briefly. The UNIX operating system, in one embodiment, can operate in a time-sharing mode. The UNIX kernel allocates the CPU to a process for a period of time (e.g., a time slice) and, at the end of the time slice, preempts the process and schedules another one for the next time slice. The process preempted in the earlier time slice is rescheduled for execution in a later time slice.

One scheme for enabling and facilitating interprocess communication, and for allowing the use of sophisticated network protocols, is sockets. The kernel has three layers that function in the context of a client-server model. The top layer, the socket layer, provides the interface between the system calls and the lower layers (the protocol layer and the device layer). Typically, a socket has end points that couple client processes with server processes. The socket end points can also be on different machines. The middle layer, the protocol layer, provides the protocol modules for communication, such as TCP and IP. The bottom layer, the device layer, contains the device drivers that control the network devices. One example of a device driver is an Ethernet driver on an Ethernet-based network.

Processes communicate using the client-server model, in which the server process listens on a socket at one end point and a client process communicates with the server process over another socket at the other end point of the two-way communication path. The kernel maintains the internal connections among the three layers of each client and server and routes data from client to server as needed.

The socket interface comprises several system calls, including the socket system call, which establishes the end point of a communication path. Many processes use the socket descriptor in many system calls. The bind system call associates a name with a socket descriptor. Other exemplary system calls include the connect system call, which requests that the kernel make a connection to a socket; the close system call, which closes a socket; the shutdown system call, which closes a socket connection; and the send and recv system calls, which transmit data over a connected socket.
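The system calls listed above can be illustrated with a minimal sketch. It is not taken from the specification: it uses Python's standard `socket` module, which wraps the same UNIX calls, and the function names `serve_once` and `round_trip`, the loopback address, and the upper-casing reply are assumptions made only for illustration.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one client, read its whole request, reply in upper case."""
    conn, _addr = server_sock.accept()          # wait for a client connection
    with conn:                                  # leaving the block closes conn
        chunks = []
        while True:
            chunk = conn.recv(1024)             # recv: read from the socket
            if not chunk:                       # empty read: client shut down
                break                           # its sending side
            chunks.append(chunk)
        conn.sendall(b"".join(chunks).upper())  # send: write the reply


def round_trip(message):
    """Run one client/server exchange over the loopback interface."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket
    server_sock.bind(("127.0.0.1", 0))          # bind: port 0 = any free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()

    reply = b""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect(("127.0.0.1", port))     # connect to the server socket
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)         # shutdown: no more data to send
        while True:
            chunk = client.recv(1024)
            if not chunk:
                break
            reply += chunk

    worker.join()
    server_sock.close()                         # close the listening socket
    return reply
```

Calling `round_trip(b"start")` exercises socket, bind, connect, send, recv, shutdown, and close in one exchange between a server thread and a client.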

FIG. 46 shows another embodiment of the present invention in which multiple workstations share a single simulation system on a time-shared basis across a network. Within the computing environment of the simulation system, a single CPU 11 is coupled to the local bus 12 in workstation 1110. Multiple CPUs may also be provided in this system. As known to those skilled in the art, an operating system 1118 is also provided, and nearly all processes and applications reside on top of the operating system. For conceptual purposes, the operating system 1121 is shown along with a bus 1122.

In FIG. 46, workstation 1110 includes the components/units shown in FIG. 1, along with the scheduler 1117 and a scheduler bus 1118 coupled to the local bus 12 via the operating system 1121. The scheduler 1117 controls time-shared access for the user stations 1111, 1112, and 1113 by making socket calls to the operating system 1121. The scheduler 1117 is implemented mostly in software and partly in hardware.

In this figure, only three users are shown, each capable of accessing the simulation system across the network. Of course, other system configurations may provide for more than three users or fewer than three users. Each user accesses the system via remote station 1111, 1112, or 1113. Remote user stations 1111, 1112, and 1113 are coupled to the scheduler 1117 via network connections 1114, 1115, and 1116, respectively.

As known to those skilled in the art, the device driver 1119 is coupled between the PCI bus 50 and the reconfigurable hardware unit 20. A connection or electrically conductive path 1120 is provided between the device driver 1119 and the reconfigurable hardware unit 20. In this network multi-user embodiment of the present invention, the scheduler 1117 interfaces with the device driver 1119 via the operating system 1121 to communicate with and control the reconfigurable hardware unit 20 for hardware acceleration and for simulation after hardware state restoration.

Again, in one embodiment of the present invention, the simulation workstation 1100 is a Sun Microsystems Enterprise 450 system that uses UltraSPARC II multiprocessors. Instead of accessing memory via the local bus and tying up the local bus, the Sun 450 system allows the multiprocessors to access memory via dedicated buses to the memory through a crossbar switch.

FIG. 47 shows a high-level structure of the simulation server in accordance with the network embodiment of the present invention. Here, the operating system is not explicitly shown but, as known to those skilled in the art, it is always present for file management and resource allocation purposes to serve the various users, processes, and devices in the simulation computing environment. Simulation server 1130 includes the scheduler 1137, one or more device drivers 1138, and the reconfigurable hardware unit 1139. Although not expressly shown as a single integral unit in FIGS. 45 and 46, the simulation server comprises the scheduler 1117, the device driver 1119, and the reconfigurable hardware unit 20. Returning to FIG. 47, the simulation server 1130 is coupled to three workstations (or users) 1131, 1132, and 1133 via network connections/paths 1134, 1135, and 1136, respectively. As stated above, more than three or fewer than three workstations may be coupled to the simulation server 1130.

The scheduler in the simulation server is based on a preemptive round robin algorithm. In essence, the round robin scheme allows several users or processes to execute in turn, in cyclic fashion, until each runs to completion. Thus, each simulation job (which is associated with a workstation in a network environment or with a user/process in a multiprocessing non-network environment) is assigned a priority level and a fixed time slice in which to execute.

Generally, higher priority jobs execute first to completion. At one extreme, if different users each have different priorities, the user with the highest priority is served first until that user's job completes, and the user with the lowest priority is served last. Here, no time slice is used, because each user has a different priority and the scheduler merely serves users according to priority. This scenario is analogous to having only one user access the simulation system until completion.

At the other extreme, the different users have equal priority. In this case, the time slice concept is used with a first-in first-out (FIFO) queue. Among jobs of equal priority, the job that arrived first executes until it completes or until its fixed time slice expires, whichever comes first. The job is then placed at the end of the queue. The saved simulation image, if any, for the next job is then restored, and that job executes in the next time slice.

A higher priority job can preempt a lower priority job. In other words, jobs of equal priority run in round robin fashion until they execute through their time slices to completion. Thereafter, jobs of lower priority run in round robin fashion. If a job of higher priority is inserted in the queue while a lower priority job is running, the higher priority job preempts the lower priority job and executes to completion. If the lower priority job has already begun execution, it is not executed further toward completion until the higher priority job has executed to completion.
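The scheduling rules above (priority first, FIFO round robin among equal priorities, preemption by a newly arrived higher priority job) can be sketched as a small discrete-time simulation. This is an illustrative model only, not the scheduler of the specification: the names `Job` and `schedule`, the integer time units, the convention that a larger number means higher priority, and the choice to place a preempted job at the end of its own priority queue are all assumptions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int      # assumption: larger number = higher priority
    remaining: int     # work left, in seconds
    arrival: int = 0   # time at which the job is inserted in the queue


def schedule(jobs, time_slice=5):
    """Return the execution trace as (start_time, job_name, seconds_run)."""
    pending = sorted(jobs, key=lambda j: j.arrival)  # not yet arrived
    queues = {}                                      # priority -> FIFO deque
    trace = []
    now = 0
    while pending or any(queues.values()):
        # Admit every job that has arrived by now to its priority queue.
        while pending and pending[0].arrival <= now:
            job = pending.pop(0)
            queues.setdefault(job.priority, deque()).append(job)
        ready = [p for p, q in queues.items() if q]
        if not ready:
            now = pending[0].arrival                 # idle until next arrival
            continue
        top = max(ready)                             # highest priority first
        job = queues[top].popleft()                  # FIFO within a priority
        run = min(time_slice, job.remaining)
        # Preemption: a higher priority arrival cuts the slice short.
        for other in pending:
            if other.priority > top and other.arrival < now + run:
                run = other.arrival - now
                break
        trace.append((now, job.name, run))
        now += run
        job.remaining -= run
        if job.remaining > 0:
            queues[top].append(job)                  # back to end of queue
    return trace
```

With the 5-second default slice, two equal-priority jobs of 7 and 4 seconds alternate as A, B, A. A higher priority job arriving at time 3 interrupts A after 3 seconds and runs to completion before B and A resume.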

In one embodiment of the present invention, the UNIX operating system provides the basic, foundational preemptive round robin scheduling algorithm. The scheduling algorithm of the simulation server in accordance with one embodiment of the present invention works in conjunction with the operating system's scheduling algorithm. In UNIX-based systems, the preemptive nature of the scheduling algorithm allows the operating system to preempt user-defined schedules. To enable the time-sharing scheme, the simulation scheduler uses a preemptive multiple-priority round robin algorithm on top of the operating system's own scheduling algorithm.

The relationship between the multiple users and the simulation server in accordance with one embodiment of the present invention follows a client-server model, in which the multiple users are the clients and the simulation server is the server. Communication between the user clients and the server takes place via socket calls. Referring to FIG. 55, the client includes a client program 1109, a socket system call component 1123, a UNIX kernel 1124, and a TCP/IP protocol component 1125. The server includes a TCP/IP protocol component 1126, a UNIX kernel 1127, a socket system call component 1128, and the simulation server 1129. Multiple clients request simulation jobs to be simulated on the server through UNIX socket calls from the client application program.

In one embodiment of the present invention, a typical sequence of events includes multiple clients sending requests to the server via the UNIX socket protocol. For each request, the server acknowledges the request, indicating whether the command was executed successfully. For a request for the server queue status, the server replies with the current queue state so that it can be displayed appropriately to the user. Table F below lists the relevant socket commands from the client.

Table F: Client Socket Commands

Command    Description
0          Start simulation <design>
1          Stop simulation <design>
2          Exit simulation <design>
3          Reassign priority to simulation session
4          Save design simulation state
5          Queue status

For each socket call, each command, encoded as an integer, may be followed by additional parameters such as <design>, which represents the design name. The response from the simulation server is "0" if the command was executed successfully and "1" if the command failed. For command "5," which requests the queue status, one embodiment of the command's return response is ASCII text terminated by a "\0" character, for display on the user's screen. With these system socket calls, the appropriate communication protocol signals are transmitted to and received from the reconfigurable hardware unit via the device driver.
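As an illustration of the command set in Table F, the following sketch encodes and decodes client requests. The specification states only that commands are integers optionally followed by parameters, that the server answers 0 or 1, and that the queue status reply is NUL-terminated ASCII text. The byte layout used here (a 4-byte big-endian integer followed by a NUL-terminated design name) and all names such as `Command` and `encode_request` are assumptions made for the example.

```python
import struct
from enum import IntEnum


class Command(IntEnum):
    """Integer command codes from Table F."""
    START = 0
    STOP = 1
    EXIT = 2
    REASSIGN_PRIORITY = 3
    SAVE_STATE = 4
    QUEUE_STATUS = 5


def encode_request(command, design=""):
    """Pack a command code and an optional design name (assumed layout)."""
    return struct.pack("!i", int(command)) + design.encode("ascii") + b"\0"


def decode_request(payload):
    """Inverse of encode_request: return (Command, design name)."""
    (code,) = struct.unpack("!i", payload[:4])
    design = payload[4:].split(b"\0", 1)[0].decode("ascii")
    return Command(code), design


def command_succeeded(response_code):
    """The server replies 0 on success and 1 on failure."""
    return response_code == 0


def decode_queue_status(payload):
    """Return the ASCII status text that precedes the terminating NUL."""
    return payload.split(b"\0", 1)[0].decode("ascii")
```

For example, `encode_request(Command.START, "cpu_core")` yields four zero bytes followed by the design name and a terminating NUL, and `decode_request` recovers the original pair. The design name `cpu_core` is hypothetical.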

FIG. 48 shows one embodiment of the simulation server in accordance with the present invention. As described above, multiple users or multiple processes may be served by a single simulation server for simulation and hardware acceleration of user designs in a time-shared manner. Thus, users/processes 1147, 1148, and 1149 are coupled to the simulation server 1140 via interprocess communication paths 1150, 1151, and 1152, respectively. The interprocess communication paths 1150, 1151, and 1152 may reside within the same workstation, for multiprocessor configuration and operation, or in a network, for multiple workstations. Each simulation session contains software simulation states along with hardware states for communication with the reconfigurable hardware unit. Interprocess communication among the software sessions is performed using UNIX socket or system calls, which allow a simulation session to reside on the same workstation in which the simulator plug-in card is installed or on a separate workstation connected via a TCP/IP network. Communication with the simulation server is initiated automatically.

In FIG. 48, the simulation server 1140 includes the server monitor 1141, a simulation job queue table 1142, a priority sorter 1143, a job swapper 1144, device driver(s) 1145, and the reconfigurable hardware unit 1146. The simulation job queue table 1142, the priority sorter 1143, and the job swapper 1144 make up the scheduler 1137 shown in FIG. 47.

The server monitor 1141 provides user interface functions for the administrator of the system. The user can monitor the status of the simulation server by commanding the system to display the simulation jobs in the queue, the scheduling priority, the usage history, and the simulation job swapping efficiency. Other utility functions include editing job priority, deleting simulation jobs, and resetting the simulation server state.

The simulation job queue table 1142 keeps a list of all outstanding simulation requests in the queue, as inserted by the scheduler. The table entries include job number, software simulation process number, software simulation image, hardware simulation image file, design configuration file, priority number, hardware size, software size, cumulative time of the simulation run, and owner identification. The job queue is implemented as a first-in first-out (FIFO) queue. Thus, when a new job is requested, it is placed at the end of the queue.
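One way to picture the table entries and the FIFO behavior described above is the following sketch. The field names, their types, the default values, and the class names `JobEntry` and `JobQueueTable` are illustrative assumptions; the specification lists only what each entry contains, not how it is stored.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class JobEntry:
    """One row of the job queue table (field types are assumptions)."""
    job_number: int
    owner: str                        # owner identification
    design_config_file: str
    priority: int
    sw_process_number: int = 0        # software simulation process number
    sw_image: str = ""                # software simulation image
    hw_image_file: str = ""           # hardware simulation image file
    hw_size: int = 0
    sw_size: int = 0
    cumulative_run_time: float = 0.0  # seconds of simulation run so far


class JobQueueTable:
    """FIFO list of outstanding simulation requests."""

    def __init__(self):
        self._queue = deque()
        self._next_job_number = 1

    def request(self, owner, design_config_file, priority):
        """Create an entry for a new job and place it at the end of the queue."""
        entry = JobEntry(self._next_job_number, owner,
                         design_config_file, priority)
        self._next_job_number += 1
        self._queue.append(entry)
        return entry

    def delete(self, job_number):
        """Remove a job, as an administrator might; report whether it existed."""
        for entry in self._queue:
            if entry.job_number == job_number:
                self._queue.remove(entry)
                return True
        return False

    def outstanding(self):
        """Return the queued entries in arrival order."""
        return list(self._queue)
```

New requests always land at the tail, so `outstanding()` lists jobs in arrival order. Choosing which queued job runs next is left to a separate priority step.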

The priority sorter 1143 decides which simulation job in the queue to execute. In one embodiment, the simulation job priority scheme is user definable (i.e., controllable and definable by the system administrator) to control which simulation process has priority for current execution. In another embodiment, the priority levels are dynamic and can change during the course of the simulation. In the preferred embodiment, the priority is based on the user ID. Typically, one user has a high priority and all other users have a lower but equal priority.

Priority levels are settable by the system administrator. The simulation server obtains all user information from the UNIX machine, from the UNIX user file called "/etc/passwd". Adding a new user is consistent with the process of adding a new user within the UNIX system. After all users are defined, the simulation server monitor can be used to adjust the priority levels for the users.

The job swapper 1144 temporarily replaces one simulation job associated with one process or workstation with another simulation job associated with another process or workstation, based on the programmed priority determination. If multiple users are simulating the same design, the job swapper swaps in only the stored simulation state for the simulation session. However, if multiple users are simulating multiple designs, the job swapper loads the design for hardware configuration before swapping in the simulation state. In one embodiment, the job swapping mechanism enhances the performance of the time-sharing embodiment of the present invention because job swapping need only be done for reconfigurable hardware unit access. Thus, if one user needs software simulation for some period of time, the server swaps in another job for another user so that this other user can access the reconfigurable hardware unit for hardware acceleration. The frequency of job swapping is user adjustable and programmable. The device driver also communicates with the reconfigurable hardware unit to swap jobs.

The operation of the simulation server will now be discussed. FIG. 49 shows a flow diagram of the simulation server during its operation. Initially, at step 1160, the system is idle. When the system is idle at step 1160, the simulation server is not necessarily inactive, nor does it mean that no simulation task is running. Idleness may mean one of several things: (1) no simulation is running; (2) only one user/workstation is active in a single-processor environment, so that time-sharing is not required; or (3) only one user/workstation is active and only one process is running. Thus, in conditions 2 and 3 above, the simulation server has only one job to process, so that queuing jobs, determining priorities, and swapping jobs are not necessary; essentially, the simulation server is idle because it receives no requests (event 1161) from other workstations or processes.

When a simulation request occurs due to one or more request signals from a workstation in a multiuser environment or from a microprocessor in a multiprocessor environment, the simulation server queues the incoming simulation job or jobs at step 1162. The scheduler keeps a simulation job queue table to insert all outstanding simulation requests onto its queue and to list all outstanding simulation requests. For batch simulation jobs, the scheduler in the server queues all incoming simulation requests and automatically processes the tasks without human intervention.

The simulation server then sorts the queued jobs to determine priority at step 1163. This step is particularly important for multiple jobs, where the server has to prioritize among them to provide access to the reconfigurable hardware unit. The priority sorter decides which simulation job in the queue to execute. In one embodiment, the simulation job priority scheme is user definable (i.e., controllable and definable by the system administrator) to control which simulation process has priority for current execution if resource contention exists.

After priority sorting at step 1163, the server then swaps simulation jobs, if necessary, at step 1164. This step temporarily replaces one simulation job associated with one process or workstation with another simulation job associated with another process or workstation, based on the priority determination programmed for the scheduler in the server. If multiple users are simulating the same design, the job swapper swaps in only the stored simulation state for the simulation session. However, if multiple users are simulating multiple designs, the job swapper first loads the design before swapping in the simulation state. Here, the device driver also communicates with the reconfigurable hardware unit to swap jobs.

In one embodiment, the job swapping mechanism enhances the performance of the time-sharing embodiment of the present invention because job swapping need only be done for reconfigurable hardware unit access. Thus, if one user needs software simulation for some period of time, the server swaps in another job for another user so that this other user can access the reconfigurable hardware unit for hardware acceleration. For example, assume that two users, user 1 and user 2, are coupled to the simulation server for access to the reconfigurable hardware unit. At one time, user 1 has access to the system so that debugging can be performed for his or her user design. If user 1 is debugging in software mode only, the server can release the reconfigurable hardware unit so that user 2 can access it. The server swaps in the job for user 2, and user 2 can then either software simulate or hardware accelerate the model. Depending on the priorities between user 1 and user 2, user 2 can continue accessing the reconfigurable hardware unit for some predetermined time or, if user 1 needs the reconfigurable hardware unit for acceleration, the server can preempt the job for user 2 so that the job for user 1 can be swapped in for hardware acceleration using the reconfigurable hardware unit. The predetermined time refers to the preemption of simulation jobs based on multiple requests of the same priority. In one embodiment, the default time is 5 minutes, although this time is user settable. This 5-minute setting represents one form of time-out timer. The simulation system of the present invention uses the time-out timer to stop the execution of the current simulation job when that job is excessively time consuming and the system decides that other pending jobs of equal priority should gain access to the reconfigurable hardware model.

Upon completion of the job swapping step at step 1164, the device driver in the server locks the reconfigurable hardware unit so that only the currently scheduled user or process can simulate and use the hardware model. The locking and simulation step occurs at step 1165.

Upon the occurrence of event 1166, which is either the completion of the simulation or a pause in the currently simulating session, the server returns to the priority sorter step 1163 to determine the priority of the pending simulation jobs and later swap simulation jobs if necessary. Similarly, the server may preempt the running of the currently active simulation job at event 1167 to return the server to the priority sorter state 1163. Preemption occurs only under certain conditions. One such condition is when a higher-priority job is pending. Another such condition is when the system is currently running a computationally intensive simulation job, in which case the scheduler can be programmed to preempt the currently running job, using a time-out timer, in order to schedule a pending job of equal priority. In one embodiment, the time-out timer is set at 5 minutes; if the current job has executed for 5 minutes, the system preempts the current job and swaps in the pending job even though it is at the same priority level.
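The two preemption conditions can be expressed as a small predicate. This is only a sketch under stated assumptions: the function name `should_preempt`, the convention that a smaller number means a higher priority, and the use of seconds as the time unit are all hypothetical; the text specifies only the two conditions and the 5-minute default.

```python
DEFAULT_TIMEOUT_S = 5 * 60  # 5-minute default from the text; user settable


def should_preempt(current_priority, current_runtime_s, pending_priorities,
                   timeout_s=DEFAULT_TIMEOUT_S):
    """Decide whether the running job should be preempted (event 1167).

    Assumption: a smaller priority number means a higher priority.
    """
    if not pending_priorities:
        return False  # nothing is waiting, so there is nothing to swap in
    best_pending = min(pending_priorities)
    # Condition 1: a higher-priority job is pending.
    if best_pending < current_priority:
        return True
    # Condition 2: an equal-priority job is pending and the time-out has expired.
    if best_pending == current_priority and current_runtime_s >= timeout_s:
        return True
    return False
```

For instance, `should_preempt(1, 300, [1])` is true because an equal-priority job is waiting and the running job has used its full 5 minutes, whereas a lower-priority pending job never triggers preemption in this sketch.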

FIG. 50 shows a flow diagram of the job swapping process. The job swapping function is performed at step 1164 of FIG. 49 and is shown in the simulation server hardware as the job swapper 1144 of FIG. 48. In FIG. 50, when a simulation job needs to be swapped with another simulation job, the job swapper sends an interrupt to the reconfigurable hardware unit at step 1180. If the reconfigurable hardware unit is not currently running any job (i.e., the system is idle, or the user is operating in software simulation mode only, without any hardware acceleration intervention), the interrupt immediately prepares the reconfigurable hardware unit for job swapping. However, if the reconfigurable hardware unit is currently running a job and is in the midst of executing an instruction or processing data, the interrupt signal is recognized but the reconfigurable hardware unit continues to execute the instruction in progress and to process the data for the current job. If the reconfigurable hardware unit receives the interrupt signal while the current simulation job is not in the middle of executing an instruction or processing data, the interrupt signal essentially terminates the operation of the reconfigurable hardware unit immediately.

At step 1181, the simulation system saves the current simulation image (i.e., the hardware and software states). By saving this image, the user can later restore the simulation run without rerunning the whole simulation up to the saved point.

At step 1182, the simulation system configures the reconfigurable hardware unit with the new user design. This configuration step is required only if the new job is associated with a user design different from the one that is already configured and loaded in the reconfigurable hardware unit and whose execution has just been interrupted. After configuration, the saved hardware simulation image is reloaded at step 1183 and the saved software simulation image is reloaded at step 1184. If the new simulation job is associated with the same design, no additional configuration is needed. In the same-design case, the simulation image for the new job will generally differ from the simulation image for the interrupted job, so the simulation system loads the desired hardware simulation image associated with the new simulation job for that same design at step 1183. The details of the configuration step are described elsewhere in this specification. Thereafter, the associated software simulation image is reloaded at step 1184. After the hardware and software simulation images have been reloaded, simulation can begin for the new job, while the previously interrupted job can proceed only in software simulation mode because it has no access to the reconfigurable hardware unit for the time being.
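The sequence of steps 1180 through 1184 can be sketched as follows. This is an illustrative model, not the patented implementation: the `Job` and `Server` classes, their fields, and the string-valued "images" are hypothetical stand-ins, and skipping step 1181 when no job is loaded is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    design: str
    saved_hw: str = "reset"   # saved hardware simulation image (placeholder)
    saved_sw: str = "reset"   # saved software simulation image (placeholder)


@dataclass
class Server:
    loaded_design: Optional[str] = None
    hw_state: Optional[str] = None
    sw_state: Optional[str] = None
    current: Optional[Job] = None


def swap_job(server, incoming):
    """Swap `incoming` onto the hardware; return the steps that were performed."""
    steps = ["1180 interrupt"]
    if server.current is not None:
        # Save the interrupted job's image so it can resume later without a rerun.
        server.current.saved_hw = server.hw_state
        server.current.saved_sw = server.sw_state
        steps.append("1181 save image")
    if server.loaded_design != incoming.design:
        # Reconfiguration is needed only when the design actually changes.
        server.loaded_design = incoming.design
        steps.append("1182 configure")
    server.hw_state = incoming.saved_hw
    steps.append("1183 load hardware image")
    server.sw_state = incoming.saved_sw
    steps.append("1184 load software image")
    server.current = incoming
    return steps
```

Swapping between two jobs that share a design skips "1182 configure", which is the case the text describes as requiring only a state swap.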

FIG. 51 shows the signals between the device driver and the reconfigurable hardware unit. The device driver 1171 provides the interface between the scheduler 1170 and the reconfigurable hardware unit 1172. The device driver 1171 also provides the interface between the entire computing environment (i.e., workstation(s), PCI bus, PCI devices) and the reconfigurable hardware unit 1172, as shown in FIGS. 45 and 46; FIG. 51 shows only the simulation server portion. The signals between the device driver and the reconfigurable hardware unit include the bidirectional communication handshake signals, the unidirectional design configuration information sent from the computing environment via the scheduler to the reconfigurable hardware unit, the swapped-in simulation state information, the swapped-out simulation state information, and the interrupt signal sent from the device driver to the reconfigurable hardware unit so that simulation jobs can be swapped.

Line 1173 carries the bidirectional communication handshake signals. These signals and the handshake protocol will be discussed further with reference to FIGS. 53 and 54.

Line 1174 carries the unidirectional design configuration information from the computing environment, via the scheduler 1170, to the reconfigurable hardware unit 1172. Initial configuration information can be transmitted to the reconfigurable hardware unit 1172 for modeling purposes on this line 1174. Additionally, when users are modeling and simulating different user designs, the configuration information must be sent to the reconfigurable hardware unit 1172 during a time slice. When different users are modeling the same user design, no new design configuration is necessary; rather, the different simulation hardware states associated with the same design may need to be transmitted to the reconfigurable hardware unit 1172 for the different simulation runs.

Line 1175 carries swapped-in simulation state information to the reconfigurable hardware unit 1172. Line 1176 carries swapped-out simulation state information from the reconfigurable hardware unit to the computing environment (i.e., usually memory). The swapped-in simulation state information includes previously saved hardware model state information and the hardware memory state that the reconfigurable hardware unit 1172 needs in order to accelerate. The swapped-in state information is sent at the beginning of a time slice so that the currently scheduled user can access the reconfigurable hardware unit 1172 for acceleration. The swapped-out state information includes the hardware model and memory state information that must be saved in memory at the end of a time slice, when the reconfigurable hardware unit 1172 receives an interrupt signal to move on to the next time slice associated with a different user/process.

Line 1177 sends the interrupt signal from the device driver 1171 to the reconfigurable hardware unit so that simulation jobs can be swapped. This interrupt signal is sent between time slices to swap out the current simulation job of the current time slice and to swap in the new simulation job for the next time slice.

The communication handshake protocol in accordance with one embodiment of the present invention will now be discussed with reference to FIGS. 53 and 54. FIG. 53 shows the communication handshake signals between the device driver and the reconfigurable hardware unit via a handshake logic interface. FIG. 54 shows a state diagram of the communication protocol. FIG. 51 shows the communication handshake signals on line 1173. FIG. 53 is a detailed view of the communication handshake signals between the device driver 1171 and the reconfigurable hardware unit 1172.

In FIG. 53, a handshake logic interface 1234 is provided in the reconfigurable hardware unit 1172. Alternatively, the handshake logic interface 1234 can be installed external to the reconfigurable hardware unit 1172. Four sets of signals are provided between the device driver 1171 and the handshake logic interface 1234: the 3-bit SPACE signal on line 1230, the single-bit read/write signal on line 1231, the 4-bit COMMAND signal on line 1232, and the single-bit DONE signal on line 1233. The handshake logic interface includes logic circuitry that processes these signals to place the reconfigurable hardware unit in the proper mode for the various operations that need to be performed. The interface is coupled to the CTRL_FPGA unit (or FPGA I/O controller).

For the 3-bit SPACE signal, data transfers over the PCI bus between the simulation system's computing environment and the reconfigurable hardware unit are designated for certain I/O address spaces at the software/hardware boundary: REG (register), CLK (software clock), S2H (software to hardware), and H2S (hardware to software). As explained above, the simulation system maps the hardware model into four address spaces in main memory according to different component types and control functions. The CLK space is designated for the software clock, the S2H space for the output of the software test-bench components to the hardware model, and the H2S space for the output of the hardware model to the software test-bench components. These dedicated I/O buffer spaces are mapped to the kernel's main memory space during system initialization.

The following Table G describes each SPACE signal.

Table G: SPACE Signal

SPACE   Description
000     Global (or CLK) space and software to hardware (DMA wr)
001     Register write (DMA wr)
010     Hardware to software (DMA rd)
011     Register read (DMA rd)
100     SRAM write (DMA wr)
101     SRAM read (DMA rd)
110     Unused
111     Unused
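As a reading aid, the SPACE encoding in Table G can be written as a lookup table. This is a sketch only: the names `SPACE_TABLE` and `decode_space` and the short target labels are hypothetical, and each entry's direction is taken from its description (write for software-to-hardware and the write entries, read for hardware-to-software and the read entries).

```python
# 3-bit SPACE value -> (target address space, DMA direction), per Table G.
SPACE_TABLE = {
    0b000: ("CLK/S2H", "write"),  # global (or CLK) space and software to hardware
    0b001: ("REG", "write"),      # register write
    0b010: ("H2S", "read"),       # hardware to software
    0b011: ("REG", "read"),       # register read
    0b100: ("SRAM", "write"),     # SRAM write
    0b101: ("SRAM", "read"),      # SRAM read
}


def decode_space(bits):
    """Return (space, direction) for a 3-bit SPACE value, or None if unused."""
    if not 0 <= bits <= 0b111:
        raise ValueError("SPACE is a 3-bit signal")
    return SPACE_TABLE.get(bits)  # 110 and 111 are unused
```

Returning `None` for the two unused encodings is a choice made for this sketch; the text does not say how the hardware treats them.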

The read/write signal on line 1231 indicates whether the data transfer is a read or a write. The DONE signal on line 1233 indicates the completion of a DMA data transfer period.

The 4-bit COMMAND signal indicates whether the data transfer operation is a write, a read, the configuration of a new user design into the reconfigurable hardware unit, or an interrupt of the simulation. As shown in Table H, the COMMAND protocol is as follows.

Table H: COMMAND Signal

COMMAND   Description
0000      Write into designated space
0001      Read from designated space
0010      Configure FPGA design
0011      Interrupt simulation
0100      Unused

The communication handshake protocol will now be discussed with reference to the state diagram of FIG. 54. At state 1400, the simulation system at the device driver is idle. As long as no new command is presented, the system remains idle, as indicated by path 1401. When a new command is presented, the command processor processes the new command at state 1402. In one embodiment, the command processor is the FPGA I/O controller.

If COMMAND=0000 or COMMAND=0001, the system writes to or reads from the designated space, as indicated by the SPACE index, at state 1403. If COMMAND=0010, the system either initially configures the FPGAs in the reconfigurable hardware unit with a user design or configures the FPGAs with a new user design at state 1404. The system sequences the configuration information for all the FPGAs to model the portion of the user design that can be modeled in hardware. If, however, COMMAND=0011, the system interrupts the simulation running on the reconfigurable hardware unit at state 1405, because the time slice has expired and a new user/process needs to swap in a new simulation state. On completion of state 1403, 1404, or 1405, the simulation system proceeds to the DONE state 1406 to generate the DONE signal, and then returns to state 1400, where it waits until a new command is presented.
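The state diagram just described can be traced with a short function. This is a sketch for illustration: the function name `handshake_path`, the use of `None` to mean "no new command", and raising an error for unused encodings are assumptions; only the state numbers and COMMAND values come from the text.

```python
# State numbers from FIG. 54 as described in the text.
IDLE, PROCESS, ACCESS, CONFIGURE, INTERRUPT, DONE = 1400, 1402, 1403, 1404, 1405, 1406


def handshake_path(command):
    """Return the sequence of states visited for one 4-bit COMMAND.

    `command` is None when no new command is presented (path 1401).
    """
    if command is None:
        return [IDLE, IDLE]           # remain idle
    if command in (0b0000, 0b0001):   # write to / read from the designated space
        action = ACCESS
    elif command == 0b0010:           # configure the FPGAs with a user design
        action = CONFIGURE
    elif command == 0b0011:           # interrupt the simulation
        action = INTERRUPT
    else:
        raise ValueError("unused COMMAND encoding")
    # Every command ends in DONE (DONE signal generated), then returns to idle.
    return [IDLE, PROCESS, action, DONE, IDLE]
```

For example, a configuration command visits states 1400, 1402, 1404, 1406, and 1400 in that order.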

The time-sharing feature of the simulation server, which handles multiple jobs with different levels of priority, will now be discussed. FIG. 52 illustrates one such example. Four jobs (job A, job B, job C, and job D) are the incoming jobs in the simulation job queue. However, the priority levels of these four jobs differ: jobs A and B are assigned high priority level I, whereas jobs C and D are assigned lower priority level II. As shown in the time line chart of FIG. 52, use of the time-shared reconfigurable hardware unit depends on the priority levels of the queued incoming jobs. At time 1190, the simulation starts with job A given access to the reconfigurable hardware unit. At time 1191, job A is preempted by job B because job B has the same priority as job A and the scheduler provides equal time-shared access to the two jobs. Job B now has access to the reconfigurable hardware unit. At time 1192, job A preempts job B, and job A runs to completion at time 1193. At time 1193, job B takes over and runs to completion at time 1194. At time 1194, job C, which is next in the queue but has a lower priority than jobs A and B, gains access to the reconfigurable hardware unit for execution. At time 1195, job D preempts job C for time-shared access because job D has the same priority as job C. Job D has access until time 1196, when it is preempted by job C. Job C runs to completion at time 1197. Job D then takes over at time 1197 and runs to completion at time 1198.
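The order of hardware access in this example can be reproduced with a simple round-robin-within-priority model. This is a sketch under explicit assumptions: the function `time_share`, the unit-length time slice, the work amounts (two slices per job), and the convention that a smaller number means a higher priority are all hypothetical; the text gives only the resulting order of access, and all jobs are assumed to be queued before the run starts.

```python
from collections import deque


def time_share(jobs, slice_len=1):
    """Return the order in which jobs hold the hardware, as (name, units_run).

    `jobs` is a list of (name, priority, work_units) in queue order.
    Assumption: a smaller priority number means a higher priority.
    """
    rings = {}
    for name, priority, work in jobs:
        rings.setdefault(priority, deque()).append([name, work])
    timeline = []
    for priority in sorted(rings):        # finish each priority level in turn
        ring = rings[priority]
        while ring:
            job = ring.popleft()
            run = min(slice_len, job[1])
            job[1] -= run
            timeline.append((job[0], run))
            if job[1] > 0:
                ring.append(job)          # preempted: go to the back of its level
    return timeline
```

Calling `time_share([("A", 1, 2), ("B", 1, 2), ("C", 2, 2), ("D", 2, 2)])` yields access in the order A, B, A, B, C, D, C, D, which matches the sequence of times 1190 through 1198 described above.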

VIII. Memory Simulation

Memory simulation or memory mapping, in accordance with one aspect of the present invention, provides an effective way for the simulation system to manage the various memory blocks associated with the configured hardware model of the user's design, which is programmed into the array of FPGA chips in the reconfigurable hardware unit. By implementing an embodiment of the invention, the memory simulation scheme does not require any dedicated pins in the FPGA chips to handle memory access.

As used herein, the phrase "memory access" refers to either a write access or a read access between the FPGA logic devices, in which the user design is configured, and the SRAM memory devices, which store all the memory blocks associated with the user design. Thus, a write operation transfers data from the FPGA logic devices to the SRAM memory devices, while a read operation transfers data from the SRAM memory devices to the FPGA logic devices. Referring to FIG. 56, the FPGA logic devices include 1201 (FPGA 1), 1202 (FPGA 2), 1203 (FPGA 3), and 1204 (FPGA 4). The SRAM memory devices include memory devices 1205 and 1206.

Also, the phrase "DMA data transfer" refers to data transfer between the computing system and the simulation system, in addition to its common meaning among those ordinarily skilled in the art. The computing system is shown in FIGS. 1, 45, and 46 as the entire PCI-based system with memory that supports the simulation system, which resides in software and in the reconfigurable hardware unit. Selected device drivers and socket/system calls to and from the operating system are also part of the simulation system, allowing the proper interface with the operating system and the reconfigurable hardware unit. In one embodiment of the present invention, a DMA read transfer transfers data from the FPGA logic devices (and from the FPGA SRAM memory devices, for initialization and memory content dump) to the host computing system.

The terms "FPGA data bus," "FPGA bus," "FD bus," and variations thereof refer to the high bank bus FD[63:32] and the low bank bus FD[31:0] that couple the SRAM memory devices to the FPGA logic devices containing the configured and programmed user design to be debugged.

The memory simulation system includes a memory state machine, an evaluation state machine, and the associated logic for controlling and interfacing with the following: (1) the main computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA buses in the simulation system, and (3) the FPGA logic devices containing the configured and programmed user design to be debugged.

The FPGA logic device side of the memory simulation system includes an evaluation state machine, an FPGA bus driver, and, for each memory block N, a logic interface that interfaces with the user's own memory interface in the user design. Together these handle (1) data evaluation among the FPGA logic devices and (2) write/read memory access between the FPGA logic devices and the SRAM memory devices. In conjunction with the FPGA logic device side, the FPGA I/O controller side includes a memory state machine and interface logic that handle DMA, write, and read operations between (1) the main computing system and the SRAM memory devices and (2) the FPGA logic devices and the SRAM memory devices.

The operation of memory simulation according to one embodiment of the present invention is generally as follows. The simulation write/read cycle is divided into three periods: DMA data transfer, evaluation, and memory access. The DATAXSFR signal indicates the DMA data transfer period, during which the computing system and the SRAM memory units transfer data to each other over the FPGA bus, that is, the high bank bus (FD[63:32]) 1212 and the low bank bus (FD[31:0]) 1213.

During the evaluation period, the logic circuitry in each FPGA logic device generates the appropriate software clock, input enable, and MUX enable signals for the user design logic to perform data evaluation. Inter-FPGA logic device communication occurs during this period.

During the memory access period, the memory simulation system waits for the high and low bank FPGA logic devices to place their respective address and control signals on their respective FPGA data buses. If the operation is a write, the address, control, and data signals are transferred from the FPGA logic device to its SRAM memory device. If the operation is a read, the address and control signals are provided to the designated SRAM memory device, and the data signals are transferred from the SRAM memory device to the respective FPGA logic device. Once the desired memory blocks in all FPGA logic devices have been accessed, the memory simulation write/read cycle is complete, and the memory simulation system waits for the onset of the next memory simulation write/read cycle.

FIG. 56 shows a high-level block diagram of a memory simulation configuration according to one embodiment of the present invention. Signals, connections, and buses that are not relevant to the features of the present invention are omitted. The CTRL_FPGA unit 1200 described above is connected to bus 1210 via line 1209. In one embodiment, the CTRL_FPGA unit 1200 is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K50 chip. The local bus 1210 allows the CTRL_FPGA unit 1200 to be connected to other simulation array boards (if any) and to other chips (e.g., PCI controller, EEPROM, clock buffer). Line 1209 carries the DONE signal, which indicates completion of a simulation DMA data transfer period.

FIG. 56 also shows the other major functional blocks, in the form of logic devices and memory devices. In one embodiment, each logic device is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K130 or 10K250 chip. Thus, instead of the embodiment described above with eight Altera FLEX 10K100 chips in the array, an embodiment with only four Altera FLEX 10K130 chips can be used. The memory devices are synchronous pipelined cache SRAMs, such as Cypress 128Kx32 CY7C1335 or CY7C1336 chips. The SRAM chips include a low bank memory device 1205 (L_SRAM) and a high bank memory device 1206 (H_SRAM).

These logic devices and memory devices are connected to the CTRL_FPGA unit 1200 through the high bank bus 1212 (FD[63:32]) and the low bank bus 1213 (FD[31:0]). Logic devices 1201 (FPGA 1) and 1202 (FPGA 2) are connected to the high bank bus 1212 via bus 1223 and bus 1225, respectively, and logic devices 1203 (FPGA 1) and 1204 (FPGA 3) are connected to the low bank bus 1213 via buses 1224 and 1226, respectively. The high bank memory device 1206 is connected to the high bank bus 1212 via bus 1220, and the low bank memory device 1205 is connected to the low bank bus 1213 via bus 1219. The dual bank bus structure allows the simulation system to access the devices on the high bank and the devices on the low bank in parallel, improving throughput. The dual bank data bus structure also carries other signals, such as control and address signals, so that the simulation write/read cycle can be controlled.
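The dual bank arrangement above can be pictured as a 64-bit FD word whose upper and lower halves travel on separate buses. The following is a minimal sketch under that reading; the function names and the device grouping table are illustrative assumptions, not part of the disclosed hardware.

```python
# Sketch only: models FD[63:0] as a Python int split into the two banks.
# Names (split_fd_word, join_fd_word, BANK_DEVICES) are hypothetical.

BANK_MASK = 0xFFFFFFFF   # 32 bits per bank
HIGH_SHIFT = 32          # FD[63:32] sits above FD[31:0]

# Reference numerals of the devices attached to each bank bus, per the text.
BANK_DEVICES = {
    "high": {"bus": 1212, "logic": (1201, 1202), "sram": 1206},
    "low": {"bus": 1213, "logic": (1203, 1204), "sram": 1205},
}


def split_fd_word(fd: int) -> tuple[int, int]:
    """Return (FD[63:32], FD[31:0]) for a 64-bit FD value."""
    if not 0 <= fd < (1 << 64):
        raise ValueError("FD word must fit in 64 bits")
    return (fd >> HIGH_SHIFT) & BANK_MASK, fd & BANK_MASK


def join_fd_word(high: int, low: int) -> int:
    """Recombine the two 32-bit bank values into one 64-bit FD value."""
    return ((high & BANK_MASK) << HIGH_SHIFT) | (low & BANK_MASK)
```

Because each half has its own bus, a high bank access and a low bank access can proceed in the same cycle without contending for the same wires.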

Referring again to FIG. 61, each simulation write/read cycle includes a DMA data transfer period, an evaluation period, and a memory access period. Various combinations of control signals determine and indicate which of these mutually exclusive periods the simulation system is in. DMA data transfers between the host computer and the logic devices 1201 to 1204 in the reconfigurable hardware unit take place over the PCI bus (e.g., bus 50 in FIG. 46), the local buses 1210 and 1236, and the FPGA buses 1212 (FD[63:32]) and 1213 (FD[31:0]). The memory devices 1205 and 1206 take part in DMA data transfers for initialization and memory content dump. Evaluation data transfers among the logic devices 1201 to 1204 in the reconfigurable hardware unit take place over the interconnect and the FPGA buses 1212 (FD[63:32]) and 1213 (FD[31:0]).

Referring to FIG. 56, the CTRL_FPGA unit 1200 sends and receives a number of control and address signals to control the simulation write/read cycle. The CTRL_FPGA unit 1200 provides the DATAXSFR and EVAL signals on line 1211, which are delivered to logic devices 1201 and 1203 via line 1221 and to logic devices 1202 and 1204 via line 1222. The CTRL_FPGA unit 1200 also provides the memory address signals MA[18:2] to the low bank memory device 1205 and the high bank memory device 1206 via buses 1229 and 1214, respectively. To control the mode of these memory devices, the CTRL_FPGA unit 1200 provides chip select write (and read) signals to the low bank memory device 1205 and the high bank memory device 1206 via lines 1216 and 1215, respectively. To indicate completion of a DMA data transfer, the memory simulation system can send and receive the DONE signal on line 1209 between the CTRL_FPGA unit 1200 and the computing system.

As described above in connection with FIGS. 9, 11, 12, 14, and 15, the logic devices 1201 to 1204 are connected to one another by a multiplexed cross chip address pointer chain, represented in FIG. 56 by two sets of SHIFTIN/SHIFTOUT lines (lines 1207, 1227, and 1218, and lines 1208, 1228, and 1217). Each set is initialized at the start of its chain by Vcc on lines 1207 and 1208. The SHIFTIN signal is sent from the preceding FPGA logic device in the bank to start memory access for the current FPGA logic device. When the shift has propagated through a given chain, the last logic device generates a LAST signal (i.e., LASTL or LASTH) and sends it to the CTRL_FPGA unit 1200. For the high bank, logic device 1202 generates the LASTH shiftout signal on line 1218 and sends it to the CTRL_FPGA unit 1200; for the low bank, logic device 1204 generates the LASTL signal on line 1217 and sends it to the CTRL_FPGA unit 1200.
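The SHIFTIN/SHIFTOUT chain described above behaves like a token passed down each bank: a device accesses memory only while it holds the token, then hands it on, and the final hand-off is reported as LASTH or LASTL. The sketch below models that ordering in software only. The function name, the per-device block counts, and the return shape are assumptions made for illustration.

```python
# Sketch only: software model of one bank's address pointer chain.
# run_bank_chain and its arguments are hypothetical names.

def run_bank_chain(devices, blocks_per_device, last_signal):
    """Walk one bank's chain and return (access order, LAST signal name).

    devices: reference numerals in chain order, e.g. (1201, 1202).
    blocks_per_device: number of memory blocks mapped into each device.
    last_signal: name raised by the final device ("LASTH" or "LASTL").
    """
    accesses = []
    shiftin = True  # head of the chain is tied to Vcc
    for device in devices:
        if not shiftin:
            break  # a device without SHIFTIN never starts its accesses
        for block in range(blocks_per_device[device]):
            accesses.append((device, block))
        shiftin = True  # this device's SHIFTOUT feeds the next SHIFTIN
    return accesses, last_signal if shiftin else None


high_order, high_last = run_bank_chain(
    (1201, 1202), {1201: 2, 1202: 1}, "LASTH")
low_order, low_last = run_bank_chain(
    (1203, 1204), {1203: 1, 1204: 3}, "LASTL")
```

In this model the two banks are walked independently, which mirrors the text's point that each bank has its own chain and its own LAST signal into the CTRL_FPGA unit.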

With regard to FIG. 56 and the board implementation, one embodiment of the present invention integrates the components (e.g., logic devices 1201 to 1204, memory devices 1205 and 1206, and the CTRL_FPGA unit 1200) and the buses (e.g., FPGA buses 1212 and 1213 and local bus 1210) on a single board. This board is connected to the motherboard through a motherboard connector. One board therefore provides four logic devices (two per bank), two memory devices (one per bank), and the buses. A second board contains its own complement of logic devices (typically four), memory devices (typically two), an FPGA I/O controller (CTRL_FPGA unit), and buses. The inter-board connectors described above are provided between the boards so that the logic devices on all boards can be connected to one another and communicate during the evaluation period, and the local bus connects all of these boards together. The FPGA bus FD[63:0] is provided separately on each board and does not span multiple boards.

In the board implementation, the simulation system performs memory mapping between the logic devices and the memory devices on each board. No memory mapping is performed across different boards. Thus, a logic device on board 5 maps its memory blocks to the memory devices on board 5 and not to memory devices on other boards. In other embodiments of the present invention, however, the simulation system may map memory blocks from logic devices on one board to memory devices on another board.

The operation of memory simulation according to one embodiment of the present invention is generally as follows. The simulation write/read cycle is divided into three periods: DMA data transfer, evaluation, and memory access. To indicate completion of a simulation write/read cycle, the memory simulation system can send and receive the DONE signal on line 1209 between the CTRL_FPGA unit 1200 and the computing system. The DATAXSFR signal on bus 1211 indicates the occurrence of the DMA data transfer period, during which the computing system and the FPGA logic devices 1201 to 1204 transfer data to each other over the FPGA data bus, that is, the high bank bus (FD[63:32]) 1212 and the low bank bus (FD[31:0]) 1213. In general, DMA transfers occur between the host computing system and the FPGA logic devices. For initialization and memory content dump, DMA transfers occur between the host computing system and the SRAM memory devices 1205 and 1206.

During the evaluation period, the logic circuitry in each FPGA logic device 1201 to 1204 generates the appropriate software clock, input enable, and MUX enable signals for the user design to perform data evaluation. Inter-FPGA logic device communication occurs during this period. The CTRL_FPGA unit 1200 also runs an evaluation counter that sets the duration of the evaluation period. The number of counts, and hence the duration of the evaluation period, is set by the system by determining the longest signal path. The path length corresponds to a specific number of steps. The system uses this step information to calculate the number of counts needed for the evaluation cycle to run to completion.
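One way to picture how an evaluation count could be derived from the longest signal path is a longest-path computation over a dependency graph. The sketch below assumes the design can be abstracted as a directed acyclic graph in which every edge costs one step, and that a fixed number of counter ticks is allotted per step; both are illustrative assumptions, as are all names used.

```python
# Sketch only: longest path (in steps) through an acyclic signal graph.
# The graph format and cycles_per_step are assumptions, not from the source.
from graphlib import TopologicalSorter


def evaluation_count(predecessors, cycles_per_step=1):
    """Return the counter value needed to cover the longest signal path.

    predecessors maps each node to the set of nodes that drive it.
    """
    depth = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(predecessors).static_order():
        drivers = predecessors.get(node, ())
        depth[node] = 1 + max((depth[d] for d in drivers), default=-1)
    return max(depth.values(), default=0) * cycles_per_step


# a drives b and c; b also drives c, so the longest path a -> b -> c is 2 steps.
example = {"b": {"a"}, "c": {"a", "b"}}
```

With this abstraction, `evaluation_count(example)` gives the step count of the longest path, and `cycles_per_step` scales it to counter ticks.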

During the memory access period, the memory simulation system waits for the high and low bank FPGA logic devices 1201 to 1204 to place their respective address and control signals on the FPGA data bus. These address and control signals are latched by the CTRL_FPGA unit 1200. If the operation is a write, the address, control, and data signals are transferred from the FPGA logic devices 1201 to 1204 to their respective SRAM memory devices 1205 and 1206. If the operation is a read, the address and control signals are transferred from the FPGA logic devices 1201 to 1204 to their respective SRAM memory devices 1205 and 1206, and the data signals are transferred from the SRAM memory devices 1205 and 1206 to the respective FPGA logic devices 1201 to 1204. On the FPGA logic device side, the FD bus driver places the address and control signals of a memory block on the FPGA data bus (FD bus). If the operation is a write, the write data for that memory block is placed on the FD bus. If the operation is a read, the double buffer latches the data for that memory block from the FD bus, as driven by the SRAM memory device. This operation is performed sequentially for each memory block in each FPGA logic device. When all desired memory blocks in an FPGA logic device have been accessed, the memory simulation system proceeds to the next FPGA logic device in each bank and begins accessing the memory blocks in that device. After all desired memory blocks in all FPGA logic devices 1201 to 1204 have been accessed, the memory simulation write/read cycle is complete, and the memory simulation system waits for the onset of the next memory simulation write/read cycle.

FIG. 57 shows a block diagram of memory simulation according to the present invention, including a more detailed structural diagram of the CTRL_FPGA unit 1200 and of each logic device as it relates to memory simulation. FIG. 57 shows the CTRL_FPGA unit 1200 and logic device 1203 (which is structurally similar to the other logic devices 1201, 1202, and 1204). The CTRL_FPGA unit 1200 includes a memory finite state machine (MEMFSM) 1240, an AND gate 1241, an evaluation (EVAL) counter 1242, a low bank memory address/control latch 1243, a low bank address/control multiplexer 1244, an address counter 1245, a high bank memory address/control latch 1247, and a high bank address/control multiplexer 1246. Each logic device, such as logic device 1203 shown in FIG. 57, includes an evaluation finite state machine (EVALFSMx) 1248 and a data bus multiplexer 1249 (FDO_MUXx, for the FPGA 0 logic device 1203). The "x" at the end of EVALFSM identifies the particular logic device involved (FPGA 0, FPGA 1, FPGA 2, FPGA 3); in this embodiment "x" is 0, 1, 2, or 3. Thus, EVALFSM 0 is associated with the FPGA 0 logic device 1203. In general, each logic device is associated with its own number x, and when N logic devices are used, "x" ranges from 0 to N-1.

In each logic device 1201 to 1204, various memory blocks are associated with the configured and mapped user design. The memory block interface 1253 in the user logic therefore provides a means for the computing system to access the desired memory blocks in the FPGA logic device. The memory block interface 1253 also provides memory write data on bus 1295 to the FPGA data bus multiplexer (FDO_MUXx) 1249 and receives memory read data on bus 1297 from the memory read data double buffer 1251.

A memory block data/logic interface 1298 is provided in each FPGA logic device. Each memory block data/logic interface 1298 is connected to the FPGA data bus multiplexer (FDO_MUXx) 1249, the evaluation finite state machine (EVALFSMx) 1248, and the FPGA bus FD[63:0]. The memory block data/logic interface 1298 includes a memory read data double buffer 1251, an address offset unit 1250, a memory model 1252, and a memory block interface 1253 for each memory block N (mem_block_N), all of which are repeated for each memory block N in a given FPGA logic device 1201 to 1204. Thus, for five memory blocks, five sets of the memory block data/logic interface 1298 are provided: that is, five sets of the memory read data double buffer 1251, the address offset unit 1250, the memory model 1252, and the memory block interface 1253, one set per memory block N (mem_block_N).

As with EVALFSMx, the "x" in FDO_MUXx identifies the particular logic device involved (FPGA 0, FPGA 1, FPGA 2, FPGA 3), where "x" is 0, 1, 2, or 3. The output of FDO_MUXx 1249 is provided on bus 1282, which is connected to either the high bank bus FD[63:32] or the low bank bus FD[31:0], depending on which chip (FPGA 0, FPGA 1, FPGA 2, FPGA 3) FDO_MUXx 1249 belongs to. In FIG. 57, FDO_MUXx is FDO_MUX0, which is associated with the low bank logic device FPGA 0 1203. The output on bus 1282 is therefore provided to the low bank bus FD[31:0]. A portion of bus 1283 serves as a read bus that carries read data from the high bank FD[63:32] or low bank FD[31:0] bus to the input of the memory read data double buffer 1251. Write data is therefore transferred from the memory blocks of each logic device 1201 to 1204 through FDO_MUX0 1249 to the high bank FD[63:32] or low bank FD[31:0] bus, and read data is transferred from the high bank FD[63:32] or low bank FD[31:0] bus through the read bus 1283 to the memory read data double buffer 1251. The memory read data double buffer provides a double-buffered mechanism: data is latched in a first buffer and then buffered again so that the latched data is delivered at the same time, minimizing skew. The memory read data double buffer 1251 is described in more detail below.
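The skew-minimizing role of the double buffer can be illustrated with a two-stage latch: the first stage captures data whenever it arrives from the FD bus, and the second stage releases all captured values on a common enable. The sketch below is a behavioral model only. It assumes the first stage is clocked by the read latch signal rd_latx and the second by clk_en, which matches the signal names in this description but simplifies the actual timing; the class and method names are hypothetical.

```python
# Sketch only: behavioral model of a two-stage read data buffer.
# ReadDoubleBuffer and its methods are hypothetical names.

class ReadDoubleBuffer:
    def __init__(self):
        self.first_stage = 0   # captured from the FD bus
        self.output = 0        # what the user's memory interface sees

    def rd_lat(self, fd_data):
        """First stage: latch read data when it appears on the FD bus."""
        self.first_stage = fd_data

    def clk_en(self):
        """Second stage: present the latched data to the user logic."""
        self.output = self.first_stage
        return self.output


# Two memory blocks are read at different times during memory access...
buf_a, buf_b = ReadDoubleBuffer(), ReadDoubleBuffer()
buf_a.rd_lat(0xAA)
outputs_mid = (buf_a.output, buf_b.output)   # nothing visible yet
buf_b.rd_lat(0xBB)
# ...but both values become visible together on the common enable.
outputs_final = (buf_a.clk_en(), buf_b.clk_en())
```

The point of the second stage is that the user logic never observes one block's new data alongside another block's stale data.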

The memory model 1252 converts the user's memory type into the SRAM type of the memory simulation system. Because the memory type in a user design can vary from one design to another, the memory block interface 1253 may also be unique to the user design. For example, the user's memory type may be DRAM, flash memory, or EEPROM. In all variations of the memory block interface 1253, however, memory address and control signals (e.g., read, write, chip select, mem_clk) are provided. One embodiment of memory simulation according to the present invention converts the user's memory type into the SRAM type used in the memory simulation system. If the user's memory type is SRAM, the conversion to the SRAM-type memory model is unique. The memory address and control signals are therefore provided on bus 1296 to the memory model 1252, which performs the conversion.

The memory model 1252 provides memory block address information on bus 1293 and control information on bus 1292. The address offset unit 1250 receives the address information for the various memory blocks and produces a modified offset address on bus 1291 from the original address on bus 1293. The offset is necessary because the addresses of different memory blocks may overlap. For example, one memory block may use and reside in the space 0-2K, while another memory block may use and reside in the space 0-3K. Because the two memory blocks overlap in the space 0-2K, addressing them individually would be difficult without some address offset mechanism. With offsetting, the first memory block can use and reside in the space 0-2K, while the second memory block uses and resides in the space from 2K to 5K. The offset address from the offset unit 1250 and the control signals on bus 1292 are combined and provided on bus 1299 to the FPGA bus multiplexer (FDO_MUXx) 1249.
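The 2K/3K example above amounts to packing the memory blocks back to back and adding each block's base to its local address. A minimal sketch of that arithmetic follows, assuming blocks are packed in order with no alignment padding; the function names are hypothetical and the real offset unit may assign bases differently.

```python
# Sketch only: back-to-back packing of memory blocks into one SRAM space.
# assign_offsets and translate are hypothetical names.
from itertools import accumulate

K = 1024


def assign_offsets(block_sizes):
    """Return the base offset of each block when packed in order."""
    return [0, *accumulate(block_sizes)][:-1]


def translate(block_index, local_address, block_sizes):
    """Map a block-local address to its offset address in the shared SRAM."""
    size = block_sizes[block_index]
    if not 0 <= local_address < size:
        raise ValueError("address outside the memory block")
    return assign_offsets(block_sizes)[block_index] + local_address


sizes = [2 * K, 3 * K]          # the two blocks from the example
bases = assign_offsets(sizes)   # first block at 0, second block at 2K
```

Here the second block's local range 0 to 3K-1 lands at 2K to 5K-1, so the two blocks no longer collide.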

The FPGA data bus multiplexer FDO_MUXx receives SPACE2 data on bus 1289, address/control signals on bus 1299, and memory write data on bus 1295. As described above, SPACE2 and SPACE3 are specific space indices. The SPACE index generated by the FPGA I/O controller (reference numeral 327 in FIGS. 10 and 22) selects a particular address space (e.g., REG read, REG write, S2H read, H2S write, and CLK write). Within that address space, the system according to the present invention selects the particular word to be addressed. SPACE2 refers to the memory space dedicated to DMA read transfers of hardware-to-software (H2S) data. SPACE3 refers to the memory space dedicated to DMA read transfers of REGISTER_READ data. See Table G above.

FDO_MUXx 1249 outputs data on bus 1282 to the low bank or high bank bus. Its selector signals are the select signal on line 1285 and the output enable (output_en) signal on line 1284, both from the EVALFSMx unit 1248. The output enable signal on line 1284 enables (or disables) the operation of FDO_MUXx 1249. For data access to the FPGA bus, the output enable signal is asserted so that FDO_MUXx can function. The select signal on line 1285 is generated by the EVALFSMx unit 1248 and selects one of the multiple inputs: SPACE2 data on bus 1289, SPACE3 data on bus 1290, address/control signals on bus 1299, and memory write data on bus 1295. The generation of the select signal by the EVALFSMx unit 1248 is described below.
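The multiplexer behavior described above is a four-way selection gated by an output enable. The sketch below models it as a pure function. The numeric encoding of the select values and the use of `None` to stand for an undriven bus are assumptions for illustration; the source does not specify either.

```python
# Sketch only: behavioral model of the FDO_MUXx selection.
# The Select encoding and the None "not driving" value are assumptions.
from enum import Enum


class Select(Enum):
    SPACE2 = 0      # data on bus 1289
    SPACE3 = 1      # data on bus 1290
    ADDR_CTRL = 2   # address/control on bus 1299
    MEM_WRITE = 3   # memory write data on bus 1295


def fdo_mux(select, output_en, space2, space3, addr_ctrl, mem_write):
    """Return the value driven onto bus 1282, or None when disabled."""
    if not output_en:
        return None  # multiplexer disabled: this device does not drive the bus
    inputs = {
        Select.SPACE2: space2,
        Select.SPACE3: space3,
        Select.ADDR_CTRL: addr_ctrl,
        Select.MEM_WRITE: mem_write,
    }
    return inputs[select]


# A write access first drives address/control, then the write data.
write_sequence = [
    fdo_mux(Select.ADDR_CTRL, True, 0x10, 0x20, 0x30, 0x40),
    fdo_mux(Select.MEM_WRITE, True, 0x10, 0x20, 0x30, 0x40),
]
```

Gating with the output enable matters because several logic devices share each bank bus, so only the device whose turn it is should drive it.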

The EVALFSMx unit 1248 is at the heart of the operation of each logic device 1201 to 1204 with respect to the memory simulation system. As inputs, the EVALFSMx unit 1248 receives the SHIFTIN signal on line 1279, the EVAL signal from the CTRL_FPGA unit 1200 on line 1274, and the write signal wrx on line 1287. As outputs, the EVALFSMx unit 1248 provides the SHIFTOUT signal on line 1280, the read latch signal rd_latx on line 1286 for the memory read data double buffer 1251, the output enable signal on line 1284 for FDO_MUXx 1249, the select signal on line 1285 for FDO_MUXx 1249, and three signals for the user logic (input_en, mux_en, and clk_en) on line 1281.

The operation of FPGA logic devices 1201 to 1204 for the memory simulation system in accordance with the present invention is as follows. When the EVAL signal is logic 1, data evaluation is performed within FPGA logic devices 1201 to 1204; otherwise, the simulation system performs either a DMA data transfer or a memory access. When EVAL=1, EVALFSMx unit 1248 generates the clk_en, input_en, and mux_en signals, which allow the user logic to evaluate data, latch the relevant data, and multiplex signals across the logic devices, respectively. EVALFSMx unit 1248 generates the clk_en signal to enable the second flip-flop of every clock edge register in the user design (see FIG. 19). The clk_en signal is otherwise known as the software clock. If the user's memory type is synchronous, clk_en also enables the second clock of memory read data double buffer 1251 in each memory block. EVALFSMx unit 1248 generates the input_en signal for the user design to latch the input signals sent from the CPU to the user logic by DMA transfer. The input_en signal provides the enable input to the second flip-flop of the primary clock register (see FIG. 19). Finally, EVALFSMx unit 1248 generates the mux_en signal to turn on the multiplexing circuitry within each FPGA logic device and start communication with the other FPGA logic devices in the array.

Thus, if FPGA logic devices 1201 to 1204 contain at least one memory block, the memory simulation system waits for the selected data to be shifted in to the selected FPGA logic device and then generates the select and output_en signals for the FPGA data bus driver, which places the address and control signals of memory block interface 1253 (mem_block_N) on the FD bus.

If the write signal wrx on line 1287 is enabled (i.e., logic 1), the select and output_en signals are enabled so that the write data is placed on the low bank or high bank bus, depending on which bank the FPGA chip is coupled to. In FIG. 57, logic device 1203 is FPGA0 and is coupled to the low bank bus FD[31:0]. If the write signal wrx on line 1287 is disabled (i.e., logic 0), the select and output_en signals are disabled, and the read latch signal rd_latx on line 1286 causes memory read data double buffer 1251 to latch and double buffer the selected data from the SRAM via the low bank or high bank bus, again depending on which bank the FPGA chip is coupled to. The wrx signal is the memory write signal sent from the memory interface of the user design logic. The wrx signal on line 1287 comes from memory model 1252 via control bus 1292.

This process of writing or reading data occurs in each FPGA logic device. After all memory blocks have been processed via the SRAM access, EVALFSMx unit 1248 generates the SHIFTOUT signal so that the next FPGA logic device in the chain can access the SRAM. Memory accesses for the devices on the high and low banks occur in parallel. At times, the memory access for one bank may complete before the memory access for the other bank. For all of these accesses, appropriate wait cycles are inserted so that the logic processes data only when the logic is ready and the data is available.

On the CTRL_FPGA unit 1200 side, MEMFSM 1240 is at the heart of the memory simulation according to the present invention. It sends and receives many control signals to control the activation of the memory simulation write/read cycle and the various operations supported by that cycle. MEMFSM 1240 receives the DATAXSFR signal on line 1260 via line 1258. This signal is also provided to each logic device on line 1273. When DATAXSFR goes low (i.e., logic 0), the DMA data transfer period ends and the evaluation and memory access periods begin.

MEMFSM 1240 also receives the LASTH signal on line 1254 and the LASTL signal on line 1255, which indicate that the selected word associated with the selected address space has been accessed between the computing system and the simulation system via the PCI bus and the FPGA bus. The MOVE signal associated with this shift-out process is propagated through each logic device (e.g., logic devices 1201 to 1204) until the desired word has been accessed, and at the end of the chain the MOVE signal ultimately becomes the LAST signal (i.e., LASTH for the high bank and LASTL for the low bank). In EVALFSM 1248 (FIG. 57 shows EVALFSM0 for FPGA0 logic device 1203), the corresponding LAST signal is the SHIFTOUT signal on line 1280. Because logic device 1203 is not the last logic device in the low bank chain shown in FIG. 56 (logic device 1204 is), the SHIFTOUT signal for EVALFSM0 is not the LAST signal. If EVALFSM 1248 corresponded to EVALFSM2 of FIG. 56, the SHIFTOUT signal on line 1280 would be the LASTL signal provided on line 1255 to the MEMFSM. Otherwise, the SHIFTOUT signal on line 1280 is provided to logic device 1204 (see FIG. 56). Similarly, the SHIFTIN signal on line 1279 represents Vcc for FPGA0 logic device 1203 (see FIG. 56).

The LASTL and LASTH signals are input to AND gate 1241 via lines 1256 and 1257, respectively. The output of AND gate 1241 is the DONE signal on line 1259, which is provided to the computing system and to MEMFSM 1240. Thus, only when both LASTL and LASTH are logic high, indicating that the end of each shift-out chain has been reached, does the AND gate output go to logic high.
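As an illustration of how the two LAST signals combine into DONE, the following sketch models each bank's shift-out chain very coarsely. It assumes, purely for the example, that every device holds the shift token for one cycle per word it transfers; the function names and the per-device word counts are hypothetical and not taken from the description.

```python
from typing import Sequence


def last_signal(words_per_device: Sequence[int], cycles_elapsed: int) -> bool:
    """True once the shift token has passed every device in one bank's chain.

    Assumption: a device keeps the token for one cycle per word, then raises
    SHIFTOUT so that the next device in the chain may proceed.
    """
    return cycles_elapsed >= sum(words_per_device)


def done(low_chain: Sequence[int], high_chain: Sequence[int],
         cycles_elapsed: int) -> bool:
    """DONE is the AND of LASTL and LASTH, as produced by AND gate 1241."""
    lastl = last_signal(low_chain, cycles_elapsed)
    lasth = last_signal(high_chain, cycles_elapsed)
    return lastl and lasth
```

With a low chain of `[2, 1]` words and a high chain of `[1, 1]`, the high bank finishes first, and DONE is only raised once the slower low bank has also finished.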

MEMFSM 1240 generates the start signal on line 1261 for EVAL counter 1242. As the name implies, the start signal triggers EVAL counter 1242 to begin counting and is sent after the DMA data transfer period completes. The start signal is generated when a high-to-low (1 to 0) transition of the DATAXSFR signal is detected. EVAL counter 1242 is a programmable counter that counts a predetermined number of clock cycles. The duration of the programmed count in EVAL counter 1242 determines the duration of the evaluation period. The output of EVAL counter 1242 on line 1274 is logic 1 or logic 0 depending on whether or not the counter is counting. While EVAL counter 1242 is counting, the output on line 1274 is logic 1, which is provided to each FPGA logic device 1201 to 1204 via its EVALFSMx 1248. When EVAL=1, FPGA logic devices 1201 to 1204 perform inter-FPGA communication to evaluate data in the user design. The output of EVAL counter 1242 is also fed back on line 1262 to MEMFSM unit 1240 for its own tracking purposes. At the end of the programmed count, EVAL counter 1242 drives logic 0 on lines 1274 and 1262 to indicate the end of the evaluation period.
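The relationship between DATAXSFR, the start signal, and the EVAL output can be sketched as a small cycle-based model. The class name `EvalCounter` and its `tick` method are illustrative assumptions; the exact cycle alignment between the start signal and the first EVAL=1 cycle is not specified in the text and is chosen here for simplicity.

```python
class EvalCounter:
    """Toy model of a programmable counter whose output is EVAL."""

    def __init__(self, programmed_count: int) -> None:
        self.programmed_count = programmed_count
        self.remaining = 0          # cycles left in the evaluation period
        self._prev_dataxsfr = 0     # previous sample, for edge detection

    def tick(self, dataxsfr: int) -> int:
        """Advance one clock cycle and return the EVAL level (1 while counting)."""
        if self._prev_dataxsfr == 1 and dataxsfr == 0:
            # High-to-low transition of DATAXSFR: the start signal fires.
            self.remaining = self.programmed_count
        elif self.remaining > 0:
            self.remaining -= 1
        self._prev_dataxsfr = dataxsfr
        return 1 if self.remaining > 0 else 0
```

For a programmed count of 3 and the DATAXSFR sequence 1, 0, 0, 0, 0, 0, this model yields EVAL = 0, 1, 1, 1, 0, 0, so the evaluation period lasts exactly as long as the programmed count.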

If no memory access is desired, the MEM_EN signal on line 1272 is held at logic 0 and provided to MEMFSM unit 1240. In this case, the memory simulation system waits for another DMA data transfer period. If memory access is desired, the MEM_EN signal on line 1272 is set to logic 1. Here, MEM_EN is a control signal from the CPU that enables the FPGA logic devices to access the on-board SRAM memory devices. MEMFSM unit 1240 then waits for FPGA logic devices 1201 to 1204 to place the address and control signals on the FPGA buses FD[63:32] and FD[31:0].

The remaining functional units and their associated control signals and lines deliver address/control information to the SRAM memory devices for writing and reading data. These units are the memory address/control latch 1243 for the low bank, the address/control mux 1244 for the low bank, the memory address/control latch 1247 for the high bank, the address/control mux 1246 for the high bank, and the address counter 1245.

The memory address/control latch 1243 for the low bank receives the address and control signals from FPGA bus FD[31:0], which coincides with bus 1213, and the latch signal on line 1263. Latch 1243 generates the mem_wr_L signal on line 1264 and passes the address/control signals received from FPGA bus FD[31:0] to address/control mux 1244 via bus 1266. This mem_wr signal is the same as the chip select write signal.

Address/control mux 1244 receives as inputs the address and control information on bus 1266 and the address information from address counter 1245 on bus 1268. It outputs the address/control information on bus 1276 to the low bank SRAM memory device 1205. The select signal on line 1265 is provided by MEMFSM unit 1240. The address/control information on bus 1276 corresponds to MA[18:2] and the chip select read/write signals on buses 1229 and 1216 shown in FIG. 56.

Address counter 1245 receives information from SPACE4 and SPACE5 via bus 1267. SPACE4 contains the DMA write transfer information, and SPACE5 contains the DMA read transfer information. These DMA transfers occur over the PCI bus between the computing system (cache/main memory via the workstation CPU) and the simulation system (SRAM memory devices 1205, 1206). Address counter 1245 provides its output on buses 1288 and 1268 to address/control muxes 1244 and 1246. With the appropriate select signal on line 1265 for the low bank, address/control mux 1244 places on bus 1276 either the address/control information on bus 1266, for write/read memory accesses between SRAM device 1205 and FPGA logic devices 1203, 1204, or alternatively the DMA write/read transfer data from SPACE4 or SPACE5 on bus 1267.

During the memory access period, MEMFSM unit 1240 provides the latch signal on line 1263 to memory address/control latch 1243 to fetch the inputs from FPGA bus FD[31:0]. MEMFSM unit 1240 extracts the mem_wr_L control information from the address/control signals on FD[31:0] for further control. If the mem_wr_L signal on line 1264 is logic 1, a write operation is desired, and the appropriate select signal on line 1265 is generated by MEMFSM unit 1240 and sent to address/control mux 1244 so that the address/control signals on bus 1266 are passed to the low bank SRAM on bus 1276. Thereafter, a write data transfer occurs from the FPGA logic devices to the SRAM memory device. If the mem_wr_L signal on line 1264 is logic 0, a read operation is desired, and the simulation system waits for the data placed on FPGA bus FD[31:0] by the SRAM memory device. Once the data is ready, a read data transfer occurs from the SRAM memory device to the FPGA logic devices.
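The latch, extract, and dispatch sequence for the low bank can be sketched as follows. The bit position of mem_wr_L and the width of the address field are hypothetical: the description states only that mem_wr_L is extracted from the address/control word and that the SRAM address lines are MA[18:2]. The class `LowBankModel` is a toy stand-in, not the circuit itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

MEM_WR_BIT = 31        # hypothetical bit position of mem_wr_L within FD[31:0]
ADDR_MASK = 0x1FFFF    # hypothetical 17-bit address field, sized after MA[18:2]


@dataclass
class LowBankModel:
    """Toy model of latch 1243 plus the low bank SRAM."""
    sram: Dict[int, int] = field(default_factory=dict)

    def access(self, fd_addr_ctrl: int, fd_data: int = 0) -> Optional[int]:
        latched = fd_addr_ctrl & 0xFFFFFFFF       # latch signal captures FD[31:0]
        mem_wr_l = (latched >> MEM_WR_BIT) & 1    # extract the write flag
        address = latched & ADDR_MASK
        if mem_wr_l:                              # logic 1: FPGA-to-SRAM write
            self.sram[address] = fd_data & 0xFFFFFFFF
            return None
        return self.sram.get(address, 0)          # logic 0: SRAM-to-FPGA read
```

A write followed by a read of the same address returns the written word; the wait cycles that the real system inserts before read data is ready are deliberately omitted.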

A similar configuration and similar operations are provided for the high bank. The memory address/control latch 1247 for the high bank receives the address and control signals from FPGA bus FD[63:32], which coincides with bus 1212, and the latch signal on line 1270. Latch 1247 generates the mem_wr_H signal on line 1271 and passes the incoming address/control signals from FPGA bus FD[63:32] to address/control mux 1246 via bus 1239.

Address/control mux 1246 receives as inputs the address and control information on bus 1239 and the address information from address counter 1245 on bus 1268. Address/control mux 1246 outputs the address/control information on bus 1277 to the high bank SRAM memory device 1206. The select signal on line 1269 is provided by MEMFSM unit 1240. The address/control information on bus 1277 corresponds to MA[18:2] and the chip select read/write signals on buses 1214 and 1215 shown in FIG. 56.

As with the DMA write and read transfers described above, address counter 1245 receives information from SPACE4 and SPACE5 via bus 1267. Address counter 1245 provides its output on buses 1288 and 1268 to address/control muxes 1244 and 1246. With the appropriate select signal on line 1269 for the high bank, address/control mux 1246 places on bus 1277 either the address/control information on bus 1239, for write/read memory accesses between SRAM device 1206 and FPGA logic devices 1201, 1202, or alternatively the DMA write/read transfer data from SPACE4 or SPACE5 on bus 1267.

During the memory access period, MEMFSM unit 1240 provides the latch signal on line 1270 to memory address/control latch 1247 to fetch the inputs from FPGA bus FD[63:32]. MEMFSM unit 1240 extracts the mem_wr_H control information from the address/control signals on FD[63:32] for further control. If the mem_wr_H signal on line 1271 is logic 1, a write operation is desired, and the appropriate select signal on line 1269 is generated by MEMFSM unit 1240 and sent to address/control mux 1246 so that the address/control signals on bus 1239 are passed to the high bank SRAM on bus 1277. Thereafter, a write data transfer occurs from the FPGA logic devices to the SRAM memory device. If the mem_wr_H signal on line 1271 is logic 0, a read operation is desired, and the simulation system waits for the data placed on FPGA bus FD[63:32] by the SRAM memory device. Once the data is ready, a read data transfer occurs from the SRAM memory device to the FPGA logic devices.

As shown in FIG. 57, the address and control signals are provided to the low bank SRAM memory device and the high bank SRAM memory device via buses 1276 and 1277, respectively. Bus 1276 for the low bank corresponds to the combination of buses 1229 and 1216 in FIG. 56. Similarly, bus 1277 for the high bank corresponds to the combination of buses 1214 and 1215 in FIG. 56.

The operation of CTRL_FPGA unit 1200 for the memory simulation system in accordance with the present invention is generally as follows. The DONE signal on line 1259, provided to the computing system and to MEMFSM unit 1240 in CTRL_FPGA unit 1200, indicates the completion of a simulation write/read cycle. The DATAXSFR signal on line 1260 indicates the occurrence of the DMA data transfer period of the simulation write/read cycle. The memory address/control signals on FPGA buses FD[31:0] and FD[63:32] are provided to memory address/control latches 1243 and 1247 for the low and high banks, respectively. For each bank, MEMFSM unit 1240 generates a latch signal (1263 or 1270) to latch the address and control information. This information is then passed to the SRAM memory devices. The mem_wr signal determines whether a write or a read operation is desired. If a write is desired, data is transferred from FPGA logic devices 1201 to 1204 to the SRAM memory devices via the FPGA bus. If a read is desired, the simulation system waits for the SRAM memory device to place the requested data on the FPGA bus for transfer from the SRAM memory device to the FPGA logic devices. For DMA data transfers of SPACE4 and SPACE5, the select signals on lines 1265 and 1269 select the output of address counter 1245 whenever data is transferred between the main computing system and the SRAM memory devices of the simulation system. For all of these accesses, appropriate wait cycles are inserted so that the logic processes data only when the logic is ready and the data is available.

FIG. 60 shows memory read data double buffer 1251 (FIG. 57) in more detail. Each memory block N in each FPGA logic device has a double buffer that latches the relevant data, which may arrive at different times, and then buffers all of the latched data at the same time. In FIG. 60, double buffer 1391 for memory block 0 includes two D-type flip-flops 1340 and 1341. The output 1343 of the first D-type flip-flop 1340 is coupled to the input of the second D-type flip-flop 1341. The output 1344 of the second D-type flip-flop 1341 is the output of the double buffer, which is provided to the memory block N interface of the user design. The global clock input is provided to the first flip-flop 1340 on line 1393 and to the second flip-flop 1341 on line 1394.

The first D flip-flop receives its data input on line 1342 from the SRAM memory device via bus 1283 and FPGA bus FD[63:32] for the high bank or FPGA bus FD[31:0] for the low bank. Its enable input is coupled to line 1345, which receives the rd_latx (e.g., rd_lat0) signal from the EVALFSMx unit of the respective FPGA logic device. Thus, for a read operation (i.e., wrx=0), the EVALFSMx unit generates the rd_latx signal to latch the data on line 1342 onto line 1343. The input data for the double buffers of all memory blocks may arrive at different times, and the double buffer ensures that all of the data is latched first. Once all the data has been latched into D flip-flop 1340, the clk_en signal (i.e., the software clock) is provided on line 1346 as the clock input to the second D flip-flop 1341. When the clk_en signal is asserted, the latched data on line 1343 is buffered into D flip-flop 1341 and appears on line 1344.
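The two-stage behavior can be illustrated with a short model in which each memory block latches its read data whenever its own rd_latx arrives, and a shared clk_en then moves every latched word to the output stage together. Treating clk_en as a clock enable on a common global clock is an assumption of this sketch, and the class name `DoubleBuffer` is hypothetical.

```python
class DoubleBuffer:
    """Toy model of one memory block's read data double buffer."""

    def __init__(self) -> None:
        self.stage1 = 0  # first flip-flop output (line 1343 in FIG. 60)
        self.stage2 = 0  # second flip-flop output (line 1344 in FIG. 60)

    def clock(self, data_in: int, rd_lat: bool, clk_en: bool) -> int:
        """Apply one global clock edge and return the buffered output."""
        # Both stages sample on the same edge, so the second stage sees the
        # value the first stage held before this edge.
        next_stage2 = self.stage1 if clk_en else self.stage2
        if rd_lat:
            self.stage1 = data_in
        self.stage2 = next_stage2
        return self.stage2


# Two blocks whose read data arrives on different cycles.
b0, b1 = DoubleBuffer(), DoubleBuffer()
b0.clock(0x11, rd_lat=True, clk_en=False)
b1.clock(0x00, rd_lat=False, clk_en=False)
b0.clock(0x00, rd_lat=False, clk_en=False)
b1.clock(0x22, rd_lat=True, clk_en=False)
# A single shared clk_en presents both words to the user design together.
outputs = (b0.clock(0, False, True), b1.clock(0, False, True))
```

In this run neither output changes until the shared clk_en edge, at which point both blocks present their latched words simultaneously.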

For the next memory block 1, another double buffer 1392 is provided that is substantially equivalent to double buffer 1391. Data from the SRAM memory device is input on line 1396. The global clock signal is input on line 1397. The clk_en (software clock) signal is input on line 1398 to the second D flip-flop (not shown) of double buffer 1392. These lines are coupled to the analogous signal lines of the first double buffer 1391 for memory block 0 and of all other double buffers for the other memory blocks N. The double buffered data output is provided on line 1399.

The rd_latx signal (e.g., rd_lat0) for the second double buffer 1392 is provided on line 1395 separately from the rd_latx signals of the other double buffers. Further double buffers are provided for the other memory blocks N.

The state diagram of MEMFSM unit 1240 in accordance with one embodiment of the present invention is described below. FIG. 58 shows the state diagram of the finite state machine of the MEMFSM unit in the CTRL_FPGA unit. The state diagram of FIG. 58 is organized so that each of the three periods within the simulation write/read cycle has its corresponding states. Thus, states 1300 to 1301 correspond to the DMA data transfer period, states 1302 to 1303 correspond to the evaluation period, and states 1305 to 1314 correspond to the memory access period. Refer to FIG. 57 in conjunction with FIG. 58 in the discussion below.

In general, the sequence of signals for DMA transfer, evaluation, and memory access is fixed. In one embodiment, the sequence is as follows: DATA_XSFR triggers the DMA data transfer, if any. The LAST signals for both the high and low banks are generated when the DMA data transfer completes, which triggers the DONE signal to indicate the end of the DMA data transfer period. The XSFR_DONE signal is then generated and the EVAL cycle begins. When EVAL completes, memory write/read begins.

Referring to FIG. 58, state 1300 is idle whenever the DATAXSFR signal is logic 0, which means that no DMA data transfer is occurring. When the DATAXSFR signal is logic 1, MEMFSM unit 1240 proceeds to state 1301. Here, the computing system requests a DMA data transfer between the computing system (main memory in FIGS. 1, 45, and 46) and the simulation system (FPGA logic devices 1201 to 1204 or SRAM memory device 1205 in FIG. 56). Appropriate wait cycles are inserted until the DMA data transfer is complete. When the DMA transfer completes, the DATAXSFR signal returns to logic 0.

When the DATAXSFR signal returns to logic 0, generation of the start signal is triggered in the MEMFSM unit at state 1302. The start signal starts EVAL counter 1242, which is a programmable counter. The duration of the programmed count in the EVAL counter equals the duration of the evaluation period. While the EVAL counter is counting at state 1303, the EVAL signal is asserted at logic 1 and provided to MEMFSM unit 1240 and to the EVALFSMx in each FPGA logic device. When the count is complete, the EVAL counter drives the EVAL signal to logic 0 and provides it to MEMFSM unit 1240 and to the EVALFSMx in each FPGA logic device. When MEMFSM unit 1240 receives the logic 0 EVAL signal, it turns on the EVAL_DONE flag at state 1304. The EVAL_DONE flag is used by the MEMFSM to indicate that the evaluation period has ended and that the memory access period can proceed. The CPU checks EVAL_DONE and XSFR_DONE by reading the XSFR_EVAL register (see Table K below) to confirm that the DMA transfer and EVAL completed successfully before starting the next DMA transfer.

In some cases, however, the simulation system may not want to perform a memory access at this time. Here, the simulation system keeps the memory enable signal MEM_EN at logic 0. This disabled (logic 0) MEM_EN signal keeps the MEMFSM unit in idle state 1300, where it waits for a DMA data transfer or for data evaluation by the FPGA logic devices. On the other hand, if the memory enable signal MEM_EN is logic 1, the simulation system is indicating that it wants to perform a memory access.
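The top-level flow through states 1300 to 1304 can be summarized as a small transition function. This is a sketch under stated assumptions: the description does not give cycle-exact transition conditions, so the single-cycle step from state 1302 to 1303 and the return to idle when MEM_EN is 0 are illustrative choices, and the bank-access states 1305 to 1314 are collapsed into a single placeholder.

```python
from enum import Enum


class State(Enum):
    IDLE = 1300
    DMA = 1301
    START_EVAL = 1302
    EVAL = 1303
    EVAL_DONE = 1304
    MEM_ACCESS = 1305  # placeholder for the parallel low/high bank states


def next_state(state: State, dataxsfr: int, eval_sig: int, mem_en: int) -> State:
    """Return the next MEMFSM state for one clock cycle of this toy model."""
    if state is State.IDLE:
        return State.DMA if dataxsfr else State.IDLE
    if state is State.DMA:
        # Stay here (wait cycles) until DATAXSFR returns to logic 0.
        return State.DMA if dataxsfr else State.START_EVAL
    if state is State.START_EVAL:
        return State.EVAL            # start signal issued to the EVAL counter
    if state is State.EVAL:
        return State.EVAL if eval_sig else State.EVAL_DONE
    if state is State.EVAL_DONE:
        # Assumption: with MEM_EN low, the machine goes back to idle.
        return State.MEM_ACCESS if mem_en else State.IDLE
    return State.IDLE                # memory access states are not modeled here
```

Stepping the function with DATAXSFR high, then low, then EVAL high for a few cycles and finally low walks the machine through the DMA, evaluation, and EVAL_DONE states in order.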

Below state 1304 in FIG. 58, the state diagram divides into two sections that proceed in parallel. One section includes states 1305, 1306, 1307, 1308, and 1309 for low bank memory access. The other section includes states 1311, 1312, 1313, 1314, and 1309 for high bank memory access.

At state 1305, the simulation system waits one cycle for the currently selected FPGA logic device to place the address and control signals on the FPGA bus FD[31:0]. At state 1306, the MEMFSM generates a latch signal on line 1263 to the memory address/control latch 1243 to fetch the inputs from FD[31:0]. The data corresponding to this fetched address and control signal is either read from or written to the SRAM memory device. To determine whether the simulation system requires a read operation or a write operation, the memory write signal mem_wr_L for the low bank is extracted from the address and control signals. If mem_wr_L = 0, a read operation is requested. If mem_wr_L = 1, a write operation is requested. As stated earlier, the mem_wr_L signal is equivalent to the chip select write signal.

At state 1307, the appropriate select signal for the address/control mux 1244 is generated to send the address and control signals to the low bank SRAM. The MEMFSM unit checks the mem_wr signal and the LASTL signal. If mem_wr_L = 1 and LASTL = 0, a write operation is requested but the last data in the chain of FPGA logic devices has not yet been shifted. The simulation system therefore returns to state 1305, where it waits one cycle for the FPGA logic device to place more address and control signals on FD[31:0]. This process continues until the last data has been shifted. If, however, mem_wr_L = 1 and LASTL = 1, the last data has been shifted.

Similarly, if mem_wr_L = 0, indicating a read operation, the MEMFSM proceeds to state 1308. At state 1308, the simulation system waits one cycle for the SRAM memory device to place the data on the FPGA bus FD[31:0]. If LASTL = 0, the last data in the chain of FPGA logic devices has not yet been shifted. The simulation system therefore returns to state 1305, where it waits one cycle for the FPGA logic device to place more address and control signals on FD[31:0]. This process continues until the last data has been shifted. Write operations (mem_wr_L = 1) and read operations (mem_wr_L = 0) can be interleaved or can otherwise alternate until LASTL = 1.

When LASTL = 1, the MEMFSM proceeds to state 1309, where it waits while DONE = 0. When DONE = 1 and both LASTL and LASTH are at logic 1, the simulation write/read cycle is complete. The simulation system then proceeds to state 1300, where it remains idle while DATAXSFR = 0.

A similar process applies to the high bank. At state 1311, the simulation system waits one cycle for the currently selected FPGA logic device to place the address and control signals on the FPGA bus FD[63:32]. At state 1312, the MEMFSM generates a latch signal on line 1270 to the memory address/control latch 1247 to fetch the inputs from FD[63:32]. The data corresponding to this fetched address and control signal is either read from or written to the SRAM memory device. To determine whether the simulation system requires a read operation or a write operation, the memory write signal mem_wr_H for the high bank is extracted from the address and control signals. If mem_wr_H = 0, a read operation is requested. If mem_wr_H = 1, a write operation is requested.

At state 1313, the appropriate select signal for the address/control mux 1246 is generated to send the address and control signals to the high bank SRAM. The MEMFSM unit checks the mem_wr signal and the LASTH signal. If mem_wr_H = 1 and LASTH = 0, a write operation is requested but the last data in the chain of FPGA logic devices has not yet been shifted. The simulation system therefore returns to state 1311, where it waits one cycle for the FPGA logic device to place more address and control signals on FD[63:32]. This process continues until the last data has been shifted. If, however, mem_wr_H = 1 and LASTH = 1, the last data has been shifted.

Similarly, if mem_wr_H = 0, indicating a read operation, the MEMFSM proceeds to state 1314. At state 1314, the simulation system waits one cycle for the SRAM memory device to place the data on the FPGA bus FD[63:32]. If LASTH = 0, the last data in the chain of FPGA logic devices has not yet been shifted. The simulation system therefore returns to state 1311, where it waits one cycle for the FPGA logic device to place more address and control signals on FD[63:32]. This process continues until the last data has been shifted. Write operations (mem_wr_H = 1) and read operations (mem_wr_H = 0) can be interleaved or can otherwise alternate until LASTH = 1.

When LASTH = 1, the MEMFSM proceeds to state 1309, where it waits while DONE = 0. When DONE = 1 and both LASTL and LASTH are at logic 1, the simulation write/read cycle is complete. The simulation system then proceeds to state 1300, where it remains idle while DATAXSFR = 0.
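The per-bank loop of the MEMFSM described above can be sketched in Python as a small state walker. The function name `memfsm_bank_trace`, the `BANK_STATES` table, and the representation of each access as a `(mem_wr, last)` pair are illustrative assumptions; only the state numbers and branch conditions are taken from the description of FIG. 58.

```python
from typing import Dict, List, Tuple

# State numbers per bank, in the order: wait, latch, select, read-wait, done-wait.
BANK_STATES: Dict[str, Tuple[int, int, int, int, int]] = {
    "low": (1305, 1306, 1307, 1308, 1309),
    "high": (1311, 1312, 1313, 1314, 1309),
}


def memfsm_bank_trace(bank: str, accesses: List[Tuple[int, int]]) -> List[int]:
    """Return the states visited for a sequence of accesses on one bank.

    Each access is a hypothetical (mem_wr, last) pair, where mem_wr is 1 for
    a write and 0 for a read, and last is the LASTL/LASTH value seen with it.
    """
    wait, latch, select, read_wait, done_wait = BANK_STATES[bank]
    visited: List[int] = []
    for mem_wr, last in accesses:
        # Wait one cycle for address/control, latch it, drive the mux select.
        visited += [wait, latch, select]
        if mem_wr == 0:
            # Read: wait one cycle for the SRAM to drive data on the FD bus.
            visited.append(read_wait)
        if last == 1:
            # Last data handled: wait in state 1309 for DONE.
            visited.append(done_wait)
            break
        # Otherwise loop back to the wait state for the next access.
    return visited
```

For example, a write followed by a final read on the low bank visits 1305, 1306, 1307 for the write, loops back, and then passes through 1308 before reaching 1309.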

Alternatively, states 1309 and 1320 for the high bank and low bank may be handled according to another embodiment of the present invention. Thus, in the low bank, the MEMFSM proceeds directly to state 1300 after passing state 1308 (LASTL = 1) or state 1307 (MEM_WR_L = 1 and LASTL = 1). In the high bank, the MEMFSM proceeds directly to state 1300 after passing state 1314 (LASTH = 1) or state 1313 (MEM_WR_H = 1 and LASTH = 1).

The state diagram of the EVALFSM unit 1248 according to one embodiment of the present invention is described next. FIG. 59 shows the state diagram of the EVALFSMx finite state machine in each FPGA chip. Like FIG. 58, the state diagram of FIG. 59 is organized so that the two periods within the simulation write/read cycle are shown with their corresponding states. Thus, states 1320 to 1326A correspond to the evaluation period, and states 1326B to 1336 correspond to the memory access period. Refer to FIG. 57 in conjunction with FIG. 58 in the discussion below.

The EVALFSMx unit 1248 receives the EVAL signal on line 1274 from the CTRL_FPGA unit 1200 (see FIG. 57). When EVAL = 0, no data evaluation by the FPGA logic device takes place. Thus, at state 1320, EVALFSMx waits while EVAL = 0. When EVAL = 1, EVALFSMx proceeds to state 1321.

States 1321, 1322, and 1323 relate to inter-FPGA communication, in which data is evaluated by the user design through the FPGA logic devices. Here, EVALFSMx generates the signals input_en, mux_en, and clk_en (item 1281 in FIG. 57) and sends them to the user's logic. At state 1321, EVALFSMx generates the clk_en signal, which in this cycle enables the second flip-flop of all the clock edge register flip-flops in the user design logic. Alternatively, the clk_en signal can be provided as software. If the user's memory type is synchronous, the clk_en signal can also enable the second clock of the memory read data double buffer 1251 in each memory block. The SRAM data output for each memory block is sent to the user design logic in this cycle.

At state 1322, EVALFSMx generates the input_en signal to the user design logic to latch the input signals sent from the CPU to the user logic by DMA transfer. The input_en signal provides the enable input to the second flip-flop in the primary clock register (see FIG. 19).

At state 1323, EVALFSMx generates the mux_en signal, which turns on the multiplexing circuit in each FPGA logic device to start communication with the other FPGA logic devices in the array. As explained earlier, inter-FPGA wire lines are often multiplexed to make efficient use of the limited pin resources in each FPGA logic device chip.

At state 1324, EVALFSMx waits while EVAL = 1. When EVAL = 0, the evaluation period is complete, and state 1325 requires EVALFSMx to turn off the mux_en signal.

If the number of memory blocks M (where M is an integer, including 0) is zero, EVALFSMx returns to state 1320, where it waits while EVAL = 0. In most cases M > 0, so EVALFSMx proceeds to state 1326A/1326B. "M" is the number of memory blocks in the FPGA logic device. It is a constant derived from the user design that is mapped and implemented in the FPGA logic device; it does not count down. If M > 0, the right portion of FIG. 59 (the memory access period) is implemented in the FPGA logic device. If M = 0, only the left portion of FIG. 59 (the EVAL period) is implemented.

State 1327 keeps EVALFSMx waiting while SHIFTIN = 0. When SHIFTIN = 1, the previous FPGA logic device has completed its memory access and the current FPGA logic device is ready to perform its own memory access tasks. Alternatively, SHIFTIN = 1 because the current FPGA logic device is the first logic device in the bank and its SHIFTIN input line is tied to Vcc. In either case, receipt of the SHIFTIN = 1 signal indicates that the current FPGA logic device is ready to perform memory accesses. At state 1328, the memory block number N is set to N = 1. This number N is incremented on each pass through the loop so that the memory access for the particular memory block N can be performed. Initially, N = 1, so EVALFSMx performs the memory access for memory block 1.

At state 1329, EVALFSMx generates a select signal on line 1285 and an output_en signal on line 1284 to the FPGA bus driver FDO_MUXx 1249 in order to place the address and control signals of the Mem_Block_N interface 1253 on the FPGA bus FD[63:32] or FD[31:0]. If a write operation is required, wr = 1; if a read operation is required, wr = 0. EVALFSMx receives the wr signal on line 1287 as one of its inputs. Based on the wr signal, the appropriate select signal is asserted on line 1285.

When wr = 1, EVALFSMx proceeds to state 1330. EVALFSMx generates the select and output_en signals for the FD bus driver to place the write data of Mem_Block_N 1253 on the FPGA bus FD[63:32] or FD[31:0]. EVALFSMx then waits one cycle for the SRAM memory device to complete the write cycle. EVALFSMx then proceeds to state 1335, where the memory block number N is incremented by one; that is, N = N + 1.

If wr = 0, however, a read operation is requested, and EVALFSMx proceeds to state 1332, where it waits one cycle, and then to state 1333, where it waits another cycle. At state 1334, EVALFSMx generates the rd_latch signal on line 1286 so that the memory read data double buffer 1251 of memory block N fetches the SRAM data on the FD bus. EVALFSMx then proceeds to state 1335, where the memory block number N is incremented by one; that is, N = N + 1. Thus, if N = 1 before increment state 1335, N becomes 2 and the subsequent memory access is performed for memory block 2.

If the current memory block number N is less than or equal to the total number of memory blocks M in the user design (i.e., N ≤ M), EVALFSMx proceeds to state 1329, where it generates the particular select and output_en signals for the FD bus driver based on whether the operation is a write or a read. The write or read operation for this next memory block N then takes place.

If, however, the current memory block number N is greater than the total number of memory blocks M in the user design (i.e., N > M), EVALFSMx proceeds to state 1336, where it turns on the SHIFTOUT output signal to allow the next FPGA logic device in the bank to access the SRAM memory devices. EVALFSMx then proceeds to state 1320, where it waits until the simulation system requests data evaluation in the FPGA logic devices (i.e., EVAL = 1).
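The memory access half of the EVALFSMx state diagram amounts to a loop over memory blocks 1 through M. The following Python sketch walks that loop under stated assumptions: the function name `evalfsm_memory_access` and the list-of-`wr`-values input are hypothetical, and the one-cycle wait that follows state 1330 is not given its own state number because the text does not name one.

```python
from typing import List


def evalfsm_memory_access(wr_per_block: List[int]) -> List[int]:
    """Return the EVALFSMx states visited during one memory access period.

    wr_per_block[i] is the wr signal for memory block i + 1
    (1 = write, 0 = read); its length plays the role of the constant M.
    """
    m = len(wr_per_block)
    if m == 0:
        # M = 0: only the evaluation half of FIG. 59 is implemented.
        return []
    visited = [1327, 1328]  # SHIFTIN = 1 received, then N is set to 1
    n = 1
    while n <= m:           # loop while N <= M
        visited.append(1329)  # drive address/control onto the FD bus
        if wr_per_block[n - 1] == 1:
            visited.append(1330)  # drive write data (then wait one cycle)
        else:
            visited += [1332, 1333, 1334]  # two wait cycles, then rd_latch
        visited.append(1335)  # N = N + 1
        n += 1
    visited.append(1336)      # N > M: turn on SHIFTOUT for the next FPGA
    return visited
```

With two memory blocks, one written and one read, the walk passes through state 1329 twice and reaches state 1336 once N exceeds M.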

FIG. 61 shows the simulation write/read cycle according to one embodiment of the present invention. At reference numeral 1366, FIG. 61 shows the three periods within the simulation write/read cycle: the DMA data transfer period, the evaluation period, and the memory access period. Although not shown, it is implied that a prior DMA transfer, evaluation, and memory access have also taken place. Furthermore, the data transfer timing for the low bank SRAM may differ from that for the high bank SRAM. For simplicity, FIG. 61 illustrates one case in which the access times for the low and high banks are identical. The global clock GCLK 1350 provides the clocking signal for all components in the system.

The DATAXSFR signal 1351 indicates the occurrence of the DMA data transfer period. When DATAXSFR = 1, as at trace 1367, a DMA data transfer takes place between the main computing system and the FPGA logic devices or SRAM memory devices. Data is therefore present on the FPGA high bank bus FD[63:32] 1359 (trace 1369) as well as on the FPGA low bank bus FD[31:0] 1358 (trace 1368). The DONE signal 1364 indicates completion of the memory access period by transitioning from logic 0 to logic 1 (e.g., trace 1390); otherwise, it remains at logic 0 for the duration of the simulation write/read cycle (e.g., the span between the edge of trace 1370 and the edge of trace 1390). During the DMA transfer period, the DONE signal is at logic 0.

When the DMA transfer period completes, the DATAXSFR signal goes from logic 1 to logic 0, which triggers the onset of the evaluation period. EVAL 1352 is therefore at logic 1, as indicated by trace 1371. The duration of the EVAL signal at logic 1 is predetermined and programmable. During this evaluation period, the data in the user design logic is evaluated using the clk_en signal 1353 (at logic 1, as indicated by trace 1372), the input_en signal 1354 (also at logic 1, as indicated by trace 1373), and the mux_en signal 1355 (also at logic 1, as indicated by trace 1374, and held longer than the clk_en and input_en signals). Data is evaluated within this particular FPGA logic device. When the mux_en signal 1355 goes from logic 1 to logic 0 at trace 1374 and at least one memory block is present in the FPGA logic device, the evaluation period is complete and the memory access period begins.

The SHIFTIN signal 1356 is asserted at logic 1 at trace 1375. This indicates that the preceding FPGA has completed its evaluation and that all desired data has been accessed for that preceding FPGA logic device. The next FPGA logic device in the bank is now ready to begin its memory accesses.

Traces 1377 to 1386 use the following nomenclature. ACj_k denotes the address and control signals associated with FPGAj and memory block k, where j and k are integers, including 0. WDj_k denotes the write data for FPGAj and memory block k. RDj_k denotes the read data for FPGAj and memory block k. Thus, AC3_1 denotes the address and control signals associated with FPGA3 and memory block 1. Low bank SRAM accesses and high bank SRAM accesses 1361 are shown as trace 1387.
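The ACj_k, WDj_k, and RDj_k nomenclature is regular enough to decode mechanically. The short Python sketch below parses such a label into its kind, FPGA index j, and memory block index k; the names `BusItem` and `parse_label` are hypothetical helpers introduced only for illustration.

```python
import re
from typing import NamedTuple


class BusItem(NamedTuple):
    kind: str   # "address/control", "write data", or "read data"
    fpga: int   # index j of FPGAj
    block: int  # index k of memory block k


_KINDS = {"AC": "address/control", "WD": "write data", "RD": "read data"}
_LABEL = re.compile(r"(AC|WD|RD)(\d+)_(\d+)")


def parse_label(label: str) -> BusItem:
    """Decode a trace label such as 'AC3_1' into its three components."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not an ACj_k / WDj_k / RDj_k label: {label!r}")
    prefix, j, k = match.groups()
    return BusItem(_KINDS[prefix], int(j), int(k))
```

For instance, `parse_label("AC3_1")` yields the address/control item for FPGA3 and memory block 1, matching the example given in the text.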

Traces 1377 to 1387 show how the memory accesses take place. A write or read operation is performed depending on the logic level of the wrx signal to EVALFSMx and the logic level of the mem_wr signal to the MEMFSM. If a write operation is desired, the memory model interfaces with the user's memory block N interface (Mem_Block_N interface 1253 in FIG. 57) and provides wrx as one of the control signals. This control signal wrx is provided to the FD bus driver as well as to the EVALFSMx unit. If wrx is at logic 1, the appropriate select signal and output_en signal are provided to the FD bus driver to place the memory write data on the FD bus. The same control signal, now present on the FD bus, can be latched by the memory address/control latch in the CTRL_FPGA unit. The memory address/control latch sends the address and control signals to the SRAM via MA[18:2]. The wrx control signal, at logic 1, is extracted from the FD bus, and because a write operation is requested, the data associated with the address and control signals on the FD bus is sent to the SRAM memory device.

Thus, as shown in FIG. 61, the next FPGA logic device (logic device FPGA0 on the low bank) places AC0_0 on FD[31:0], as indicated by trace 1377. The simulation system performs a write operation for WD0_0. AC0_1 is then placed on FD[31:0]. If a read operation is requested instead, however, there will be some time delay involving AC0_1 on FD bus FD[31:0] before the SRAM memory device places RD0_0, rather than the WD0_0 corresponding to AC0_0, on the FD bus.

As indicated by trace 1383, the placement of AC0_0 on the MA[18:2]/control bus lags slightly behind the placement of the address, control, and data on the FD bus. This is because the MEMFSM unit needs time to latch the address/control signals from the FD bus, extract the mem_wr signal, and generate the appropriate select signal to the address/control mux so that the address/control signals can be placed on the MA[18:2]/control bus. In addition, after the address/control signals on the MA[18:2]/control bus are presented to the SRAM memory device, the simulation system must wait for the corresponding data from the SRAM memory device to be placed on the FD bus. One example is the time offset between trace 1384 and trace 1381, where RD1_1 is placed on the FD bus after AC1_1 is placed on the MA[18:2]/control bus.

On the high bank, FPGA1 places AC1_0 on bus FD[63:32], followed by WD1_0. Thereafter, AC1_1 is placed on bus FD[63:32]. This is indicated by trace 1380. When AC1_1 is placed on the FD bus, the control signals in this example indicate a read operation. Thus, as described above, when AC1_1 is placed on the MA[18:2]/control bus as indicated by trace 1384, the appropriate wrx and mem_wr signals, at logic 0, are presented as part of the address/control signals to the EVALFSMx and MEMFSM units. Because the simulation system knows that this is a read operation, no write data is sent to the SRAM memory device; rather, the read data associated with AC1_1 is placed on the FD bus by the SRAM memory device for subsequent reading by the user design logic via the simulation memory block interface. This is indicated by trace 1381 on the high bank. On the low bank, RD0_1 is placed on the FD bus, as indicated by trace 1378, following the placement of AC0_1 on the MA[18:2]/control bus (not shown).

The read operation by the user design logic via the simulation memory block interface is accomplished when EVALFSMx generates the rd_lat0 signal 1362 to the memory read data double buffer in the simulation memory block interface, as indicated by trace 1388. This rd_lat0 signal is provided to both the low bank FPGA0 and the high bank FPGA1.

Thereafter, the next memory block for each FPGA logic device is placed on the FD bus. AC2_0 is placed on the low bank FD bus, while AC3_0 is placed on the high bank FD bus. If a write operation is required, WD2_0 is placed on the low bank FD bus and WD3_0 is placed on the high bank FD bus. AC3_0 is placed on the high bank MA[18:2]/control bus, as indicated by trace 1385. This process continues for the next memory blocks for write and read operations. The write and read operations for the low bank and the high bank can occur at different times and speeds; FIG. 61 shows one particular example in which the timing for the low bank and the high bank is the same. In addition, in this example the write operations on the low bank and the high bank occur together, followed by the read operations on both banks. This is not always the case. The existence of separate low and high banks allows the devices coupled to those banks to operate in parallel; that is, activity on the low bank is independent of activity on the high bank. In another scenario, the low bank performs a series of write operations while, in parallel, the high bank performs a series of read operations.

When the last data of the last FPGA logic device in each bank is encountered, the SHIFTOUT signal 1357 is asserted, as indicated by trace 1376. For read operations, the rd_lat signal 1363 corresponding to FPGA2 on the low bank and FPGA3 on the high bank is asserted, as indicated by trace 1389, to read RD2_1 on trace 1379 and RD3_1 on trace 1382. Because the last data for the last FPGA unit has been accessed, completion of the simulation write/read cycle is indicated by the DONE signal 1364, as indicated by trace 1390.

Table H below lists the various components on the simulation system board and their corresponding registers/memory, PCI memory addresses, and local addresses.

Table H: Memory Map

The data format for the configuration file is shown in Table J below, in accordance with one embodiment of the present invention. The CPU sends one word at a time through the PCI bus to configure one bit for all on-board FPGAs in parallel.
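The one-word-per-bit configuration scheme can be illustrated with a minimal Python sketch. It assumes that each FPGA has its own serial configuration bitstream of equal length and that bit i of each word carries the current bit for FPGA i. That bit assignment and the function name `pack_config_words` are hypothetical; the actual layout is the one defined by Table J.

```python
from typing import List


def pack_config_words(bitstreams: List[List[int]]) -> List[int]:
    """Interleave per-FPGA serial bitstreams into one word per bit position.

    Assumption (not from the source): bit i of every word is the
    configuration bit destined for FPGA i.
    """
    if not bitstreams:
        return []
    length = len(bitstreams[0])
    if any(len(stream) != length for stream in bitstreams):
        raise ValueError("all bitstreams must have the same length")

    words = []
    for position in range(length):
        word = 0
        for fpga_index, stream in enumerate(bitstreams):
            # Place this FPGA's current bit in its own bit lane of the word.
            word |= (stream[position] & 1) << fpga_index
        words.append(word)
    return words
```

Under these assumptions, each word written over the PCI bus advances the configuration of every FPGA by one bit, so the number of words equals the bitstream length rather than the bitstream length times the number of FPGAs.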

Table J: Configuration Data Format

Table K below lists the XSFR_EVAL register, which resides on every board. The host computing system uses the XSFR_EVAL register to program the EVAL period, control DMA read/write, and read the status of the EVAL_DONE and XSFR_DONE fields. The host computing system also uses this register to enable memory access. The operation of the simulation system with respect to this register is described below in conjunction with FIGS. 62 and 63.

Table K: XSFR_EVAL Register for All Six Boards (Local Address: 0h)

Table L below lists the contents of the CONFIG_JTAG[6:1] registers. Through these registers, the CPU configures the FPGA logic devices and runs boundary scan tests on them. Each board has one dedicated register.

Table L: CONFIG_JTAG[6:1] Register

FIGS. 62 and 63 show timing diagrams for another embodiment of the present invention. The two figures show the operation of the simulation system with respect to the XSFR_EVAL register. The host computing system uses the XSFR_EVAL register to program the EVAL period, control DMA read/write, and read the status of the EVAL_DONE and XSFR_DONE fields. The host computing system also uses this register to enable memory access. One of the main differences between the two figures is the state of the WAIT_EVAL field. In FIG. 62, the WAIT_EVAL field is set to "0" and the DMA read transfer starts after CLK_EN. In FIG. 63, the WAIT_EVAL field is set to "1" and the DMA read transfer starts after EVAL_DONE.

In FIG. 62, WR_XSFR_EN and RD_XSFR_EN are both set to "1". These two fields enable DMA write/read transfers and can be cleared by XSFR_DONE. Because both fields are set to "1", the CTRL_FPGA unit automatically performs the DMA write transfer first and the DMA read transfer second. The WAIT_EVAL field, however, is set to "0", which indicates that the DMA read transfer starts after the assertion of CLK_EN (and after completion of the DMA write operation). Thus, in FIG. 62, the DMA read operation occurs almost immediately after completion of the DMA write operation, as soon as the CLK_EN signal (the software clock) is detected. The DMA read transfer does not wait for the EVAL period to complete.
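The ordering rules described for the WR_XSFR_EN, RD_XSFR_EN, and WAIT_EVAL fields can be summarized in a small Python sketch. The class `XsfrEval`, the function `transfer_sequence`, and the step names are hypothetical and chosen only for illustration; the real behavior is implemented in the CTRL_FPGA hardware.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class XsfrEval:
    """Illustrative subset of the XSFR_EVAL register fields."""
    wr_xsfr_en: bool
    rd_xsfr_en: bool
    wait_eval: bool


def transfer_sequence(reg: XsfrEval) -> List[str]:
    """Return the order of events implied by the register fields."""
    steps = []
    if reg.wr_xsfr_en:
        # The DMA write transfer always runs first when it is enabled.
        steps.append("dma_write")
    if reg.rd_xsfr_en:
        # WAIT_EVAL = 0: the read starts after CLK_EN is asserted (FIG. 62).
        # WAIT_EVAL = 1: the read starts after EVAL_DONE (FIG. 63).
        steps.append("wait_eval_done" if reg.wait_eval else "wait_clk_en")
        steps.append("dma_read")
    return steps
```

For example, `transfer_sequence(XsfrEval(True, True, False))` models the FIG. 62 case, and setting the last argument to `True` models the FIG. 63 case.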

At the beginning of the timing diagram, the EVAL_REQ_N signal experiences contention as multiple FPGA logic devices compete for attention. As described previously, the EVAL_REQ_N (or EVAL_REQ#) signal starts an evaluation cycle if any FPGA logic device asserts it. At the end of the data transfer, the evaluation cycle begins, including address pointer initialization and operation of the software clock to facilitate the evaluation process.

The DONE signal, generated at the end of a DMA data transfer period, also experiences contention as multiple LAST signals (derived from the shift-in and shift-out signals at the output of each FPGA logic device) are generated and provided to the CTRL_FPGA unit. When all LAST signals have been received and processed, the DONE signal is generated and a new DMA data transfer operation can begin. The EVAL_REQ_N signal and the DONE signal share the same wire on a time-shared basis, in the manner described below.
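The two aggregation rules in the preceding paragraphs (an evaluation cycle is requested if any device asserts EVAL_REQ_N, while DONE requires LAST from every device) can be sketched in Python as follows. The function names are hypothetical, and treating EVAL_REQ_N as active-low is an assumption based only on the "_N" suffix. The time sharing of the single wire is not modeled here.

```python
from typing import Sequence


def done(last_signals: Sequence[bool]) -> bool:
    """DONE is generated only after every FPGA logic device has reported LAST."""
    return len(last_signals) > 0 and all(last_signals)


def eval_requested(eval_req_n_levels: Sequence[bool]) -> bool:
    """An evaluation cycle is requested if any device asserts EVAL_REQ_N.

    Assumption: the signal is active-low, so a False (low) level from any
    device counts as an assertion.
    """
    return any(not level for level in eval_req_n_levels)
```

The asymmetry is the point of the sketch: one requesting device is enough to start evaluation, but a single device that has not yet signaled LAST holds off DONE.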

The system first starts a DMA write transfer, as shown by the WR_XSFR signal at time 1409. The initial portion of the WR_XSFR signal includes some overhead associated with the PCI controller (a PCI9080 or PCI9060 in one embodiment). Thereafter, the host computing system performs a DMA write operation, via the local bus LD[31:0] and the FPGA bus FD[63:0], to the FPGA logic devices coupled to the FPGA bus FD[63:0].

At time 1412, the WR_XSFR signal is deactivated, indicating completion of the DMA write operation. The EVAL signal is activated for a predetermined period, from time 1412 to time 1410. The duration of EVALTIME is programmable and is initially set to 8+X, where X is derived from the longest signal trace path. The XSFR_DONE signal is also activated briefly to indicate completion of this DMA transfer operation, which in this case is the DMA write.

Also at time 1412, contention among the EVAL_REQ_N signals stops, and the wire that carries the DONE signal now delivers the EVAL_REQ_N signal to the CTRL_FPGA unit. For 3 clock cycles, EVAL_REQ_N signals are processed via the wire that carries the DONE signal. After 3 clock cycles, the FPGA logic devices no longer generate EVAL_REQ_N signals, but EVAL_REQ_N signals already delivered to the CTRL_FPGA unit will still be processed. The maximum time during which the FPGA logic devices no longer generate the EVAL_REQ_N signal for the gated clock is approximately 23 clock cycles. EVAL_REQ_N signals lasting longer than this period are ignored.

At time 1413, approximately 2 clock cycles after time 1412 (the end of the DMA write operation), the CTRL_FPGA unit sends a write address strobe signal, WPLX ADS_N, to the PCI controller to initiate the DMA read transfer. About 24 clock cycles after time 1413, the PCI controller starts the DMA read transfer process and the DONE signal is also generated. At time 1414, before the PCI controller starts the DMA read process, the RD_XSFR signal is activated to enable the DMA read transfer. Some PLX overhead data is transmitted and processed first. At time 1415, while this overhead data is being processed, the DMA read data is placed on the FPGA bus FD[63:0] and the local bus LD[31:0]. At the end of the 24 clock cycles from time 1413, when the DONE signal is activated and the FPGA logic devices generate EVAL_REQ_N signals, the PCI controller processes the DMA read data by transferring it from the FPGA bus FD[63:0] and the local bus LD[31:0] to the host computer system.

At time 1410, the DMA read data may still be in the course of being processed, while the EVAL signal is deactivated and the EVAL_DONE signal is activated to indicate completion of the EVAL cycle. Contention among the FPGA logic devices also begins as they generate EVAL_REQ_N signals.

At time 1417, just before the DMA read period completes at time 1416, the host computer system polls the PLX interrupt register to determine whether the end of the DMA cycle is near. The PCI controller knows how many cycles are needed to complete the DMA data transfer process. After a predetermined number of cycles, the PCI controller sets a specific bit in its interrupt register. The CPU of the host computer polls this interrupt register. If the bit is set, the CPU knows that the DMA period is almost done. The CPU of the host system does not poll the interrupt register continuously, because doing so would tie up the PCI bus with read cycles. Thus, in one embodiment of the present invention, the CPU of the host computer system is programmed to wait a certain number of cycles before polling the interrupt register.
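The benefit of waiting before polling can be shown with a minimal Python sketch. `FakePciController` is a hypothetical stand-in for the PCI controller's interrupt register, not a real driver API, and the cost of one cycle per register read is an arbitrary assumption made only so the effect is visible.

```python
class FakePciController:
    """Hypothetical model: the interrupt bit becomes set after a fixed cycle count."""

    def __init__(self, cycles_until_bit_set: int) -> None:
        self.cycle = 0
        self.cycles_until_bit_set = cycles_until_bit_set
        self.bus_reads = 0  # number of PCI read cycles spent on polling

    def idle(self, cycles: int) -> None:
        # The CPU waits without touching the PCI bus.
        self.cycle += cycles

    def read_interrupt_bit(self) -> bool:
        # Each poll is a PCI read; assume it costs one cycle.
        self.bus_reads += 1
        self.cycle += 1
        return self.cycle >= self.cycles_until_bit_set


def wait_for_dma(controller: FakePciController, wait_cycles: int,
                 max_polls: int = 1000) -> int:
    """Wait a programmed number of cycles, then poll until the bit is set.

    Returns the number of PCI reads that polling consumed.
    """
    controller.idle(wait_cycles)
    for _ in range(max_polls):
        if controller.read_interrupt_bit():
            return controller.bus_reads
    raise TimeoutError("interrupt bit was never set")
```

In this model, a transfer that sets the bit at cycle 100 costs 100 bus reads when polled from the start, but only 5 when the CPU first waits 95 cycles, which mirrors the reason the source gives for delaying the poll.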

A short time later, the end of the DMA read period occurs at time 1416, when RD_XSFR is deactivated and the DMA read data is no longer on the FPGA bus FD[63:0] or the local bus LD[31:0]. The XSFR_DONE signal is also activated at time 1416, and contention among the LAST signals for generation of the DONE signal begins.

During the entire DMA period, from the generation of the WR_XSFR signal at time 1409 to time 1417, the CPU of the host computer system does not access the simulation hardware system. In one embodiment, the duration of this period is the sum of (1) the overhead time of the PCI controller, times 2, (2) the number of words in WR_XSFR and RD_XSFR, and (3) the PCI overhead of the host computer system (e.g., Sun ULTRASparc). The first access after the DMA period occurs at time 1419, when the CPU polls the interrupt register of the PCI controller.

At time 1411, 3 clock cycles after time 1416, the MEM_EN signal is activated to enable the on-board SRAM memory devices so that memory access between the FPGA logic devices and the SRAM memory devices can begin. Memory access continues until time 1419 and, in one embodiment, requires 5 clock cycles per access. If no DMA read transfer is required, memory access can begin earlier, at time 1410 instead of time 1411.

While memory access takes place between the FPGA logic devices and the SRAM memory devices across the FPGA bus FD[63:0], the CPU of the host computer system can communicate with the PCI controller and the CTRL_FPGA unit via the local bus LD[31:0] from time 1418 to time 1429. This occurs after the CPU has finished polling the interrupt register of the PCI controller. The CPU writes data to various registers in preparation for the next data transfer. The duration of this period is greater than 4 µs. If the memory access is shorter than this period, the FPGA bus FD[63:0] will not experience any conflict. At time 1429, the XSFR_DONE signal is deactivated.
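The no-conflict condition above reduces to a short arithmetic check, sketched here in Python. The 5 cycles per access and the 4 µs lower bound come from the text; the clock frequency is not stated in this section, so `clock_hz` is a caller-supplied assumption, and the function name is hypothetical.

```python
CYCLES_PER_ACCESS = 5        # "5 clock cycles per access" in one embodiment
CPU_WINDOW_SECONDS = 4e-6    # the CPU communication period lasts more than 4 microseconds


def memory_access_fits(num_accesses: int, clock_hz: float) -> bool:
    """Return True if the memory accesses finish within the 4 microsecond bound.

    clock_hz is an assumed parameter; this section does not state the
    clock frequency used for memory access.
    """
    if num_accesses < 0 or clock_hz <= 0:
        raise ValueError("num_accesses must be >= 0 and clock_hz must be > 0")
    duration = num_accesses * CYCLES_PER_ACCESS / clock_hz
    return duration < CPU_WINDOW_SECONDS
```

With an assumed 33 MHz clock, 20 accesses take about 3.0 µs and fit, while 30 accesses take about 4.5 µs and exceed the guaranteed lower bound.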

In FIG. 63, the timing diagram differs somewhat from FIG. 62 in that the WAIT_EVAL field is set to "1". In other words, the DMA read transfer period starts after the EVAL_DONE signal has been activated and the EVAL period is nearly complete. Instead of starting right after completion of the DMA write operation, it waits until near the completion of the EVAL period. The EVAL signal is activated for a preset period, from time 1412 to time 1410. At time 1410, the EVAL_DONE signal is activated to indicate completion of the EVAL period.

In FIG. 63, after the DMA write operation at time 1412, the CTRL_FPGA unit does not generate the write address strobe signal WPLX ADS_N to the PCI controller until time 1420, which is about 16 clock cycles before the end of the EVAL period. The XSFR_DONE signal is also extended to time 1423. At time 1423, the XSFR_DONE field is set, and the WPLX ADS_N signal can then be generated to start the DMA read process.

At time 1420, about 16 clock cycles before activation of the EVAL_DONE signal, the CTRL_FPGA unit sends the write address strobe signal WPLX ADS_N to the PCI controller (e.g., a PLX PCI9080) to initiate the DMA read transfer. About 24 clock cycles after time 1420, the PCI controller starts the DMA read transfer and the DONE signal is also generated. At time 1421, before the PCI controller starts the DMA read process, the RD_XSFR signal is activated to enable the DMA read transfer. Some PLX overhead data is transmitted and processed first. At time 1422, while this overhead data is being processed, the DMA read data is placed on the FPGA bus FD[63:0] and the local bus LD[31:0]. At time 1424, at the end of the 24 clock cycles, the PCI controller processes the DMA read data by transferring it from the FPGA bus FD[63:0] and the local bus LD[31:0] to the host computer system. The remainder of the timing diagram is the same as in FIG. 62.

Thus, the RD_XSFR signal in FIG. 63 is activated later than in FIG. 62. In FIG. 63, the RD_XSFR signal appears only when the EVAL period is nearly complete, so the DMA read operation is delayed. In FIG. 62, the RD_XSFR signal appears after the CLK_EN signal is detected following completion of the DMA write transfer.

IX. COVERIFICATION SYSTEM

The coverification system of the present invention can accelerate design and development by giving designers both the flexibility of software simulation and the higher speed obtained by using a hardware model. Both the hardware portion and the software portion of a design can be verified before ASIC fabrication, without the limitations of emulator-based coverification tools. Debugging features are improved, and overall debugging time can be reduced significantly.

Conventional Coverification Tool with an ASIC as the Device-Under-Test

FIG. 64 shows a typical final design implemented as a PCI add-on card, such as a video, multimedia, Ethernet, or SCSI card. The card 2000 includes a direct interface connector 2002 that allows it to communicate with other peripheral devices. The connector 2002 is coupled to bus 2001 to carry video signals from a VCR, camera, or television tuner; video and audio outputs to a monitor or speakers; and signals to a communications or disk drive interface. Depending on the user design, one skilled in the art can anticipate other interface requirements. The bulk of the design functionality resides in chip 2004, which is coupled to the interface connector 2002 via bus 2003, to a local oscillator 2005 (which generates a local clock signal) via bus 2007, and to memory 2006 via bus 2008. The add-on card 2000 also includes a PCI connector 2009 for coupling to the PCI bus 2010.

Before the design is implemented as an add-on card as shown in FIG. 64, it is reduced to ASIC form for testing purposes. A conventional hardware/software coverification tool is shown in FIG. 65. The user design is implemented in the form of an ASIC, labeled in FIG. 65 as the device-under-test (or "DUT"). To obtain stimulus from the various sources it is designed to interface with, the device-under-test 2024 is placed in a target system 2020, which is a combination of a central computing system 2021 on a motherboard and several peripheral devices. The target system 2020 includes the central computing system 2021, which contains a CPU and memory and runs under an operating system such as Microsoft Windows or Sun Microsystems' Solaris to execute a number of applications. As known in the art, Sun Microsystems' Solaris is an operating environment and a set of software products that support Internet, intranet, and enterprise-wide computing. The Solaris operating environment is based on the industry-standard UNIX System V Release 4, is designed for client-server applications in a distributed networking environment, provides appropriate resources for smaller workgroups, and provides the WebTone required for electronic commerce.

A device driver 2022 for the device-under-test 2024 is included in the central computing system 2021 to enable communication between the operating system (and any applications) and the device-under-test 2024. As known in the art, a device driver is a specific piece of software that controls a hardware component or peripheral device of a computer system. A device driver accesses the device's hardware registers and often includes an interrupt handler to service interrupts generated by the device. Device drivers often form part of the lowest level of the operating system kernel, with which they are linked when the kernel is built. Many more recent systems have loadable device drivers that can be installed from files after the operating system is running.

The device-under-test 2024 and the central computing system 2021 are coupled to the PCI bus 2023. Other peripheral devices in the target system 2020 include an Ethernet PCI add-on card 2025, used to couple the target system to a network 2030 via bus 2034; a SCSI PCI add-on card 2026, coupled to SCSI drives 2027 and 2031 via buses 2036 and 2035; a VCR 2028, coupled to the device-under-test 2024 via bus 2032 (if required by the design of the device-under-test 2024); and a monitor and/or speakers 2029, coupled to the device-under-test 2024 via bus 2033 (if required by the design of the device-under-test 2024). As known in the art, "SCSI" stands for "Small Computer Systems Interface", a processor-independent standard for system-level interfacing between a computer and intelligent devices such as hard disks, floppy disks, CD-ROMs, printers, scanners, and many other devices.

In this target system environment, the device-under-test 2024 can be examined with a variety of stimuli from the central computing system (i.e., the operating system and applications) and the peripheral devices. If time is not a concern and the designers are looking only for a simple pass/fail test, this coverification tool should be adequate for their needs. In most situations, however, a design project is strictly budgeted and scheduled ahead of the product's release. As explained above, this particular ASIC-based coverification tool is unsatisfactory because it has no debugging features: the designer cannot isolate the cause of a "failed" test without sophisticated techniques, and the number of "fixes" needed for every detected bug cannot be predicted at the start of the project, which makes the schedule and budget unpredictable.

Conventional Coverification Tool with an Emulator as the Device-Under-Test

FIG. 66 shows a conventional coverification tool that uses an emulator. Unlike the set-up shown in FIG. 64 and described above, the device-under-test is programmed into an emulator 2048 that is coupled to the target system 2040, certain peripheral devices, and a test workstation 2052. The emulator 2048 includes an emulation clock 2066 and the device-under-test programmed into the emulator.

The emulator 2048 is coupled to the target system 2040 via a PCI bus bridge 2044, a PCI bus 2057, and control lines 2056. The target system 2040 is a combination of a central computing system 2041 on a motherboard and several peripheral devices. The central computing system 2041 contains a CPU and memory and runs under an operating system such as Microsoft Windows or Sun Microsystems' Solaris to execute a number of applications. A device driver 2042 for the device-under-test is included in the central computing system 2041 to enable communication between the operating system (and any applications) and the device-under-test in the emulator 2048. To communicate with the emulator 2048, as well as with the other devices that are part of this computing environment, the central computing system 2041 is coupled to the PCI bus 2043. Other peripheral devices in the target system 2040 include an Ethernet PCI add-on card 2045, used to couple the target system to a network 2049 via bus 2058, and a SCSI PCI add-on card 2046, coupled to SCSI drives 2047 and 2050 via buses 2060 and 2059.

The emulator 2048 is also coupled to the test workstation 2052 via bus 2062. The test workstation 2052 includes a CPU and memory to perform its functions. The test workstation 2052 also includes data cases 2061 and device models 2068 for other devices that are modeled but not physically coupled to the emulator 2048.

Finally, the emulator 2048 is coupled via bus 2061 to other peripheral devices, such as a frame buffer or data stream record/play system 2051. This frame buffer or data stream record/play system 2051 can in turn be coupled to a communication device or channel 2053 via bus 2063, to a VCR 2054 via bus 2064, and to a monitor and/or speakers 2055 via bus 2065.

As known in the art, the emulation clock runs at a speed much slower than the actual target system speed. Thus, the shaded portions of FIG. 66 run at emulation speed, and the remaining unshaded portions run at the actual target system speed.

As described above, this emulator-based coverification tool has several limitations. When a logic analyzer or sample-and-hold device is used to obtain internal state information from the device-under-test, the designer must compile the design so that the signals of interest for debugging are brought out to output pins for sampling. To debug a different part of the design, the designer must make sure that part has output signals that the logic analyzer or sample-and-hold device can sample; otherwise, the design in the emulator 2048 must be recompiled so that these signals are provided on output pins for sampling. Such recompilation can take days or weeks, which can unduly delay a time-sensitive design and development schedule. In addition, because this coverification tool works with signals, sophisticated circuitry must be provided to convert those signals into data or to provide some signal-to-signal timing control. Furthermore, the need for the numerous wires 2061 and 2062, one for each signal to be sampled, increases the debug set-up burden and time.

Simulation with Reconfigurable Computing Array

As a brief review, FIG. 67 illustrates a high-level configuration of the single-engine reconfigurable computing (RCC) array system of the present invention, which was described earlier in this patent specification. This single-engine RCC system will be incorporated into the coverification system according to one embodiment of the present invention.

In FIG. 67, the RCC array system 2080 includes an RCC computing system 2081, a reconfigurable computing (RCC) hardware array 2084, and a PCI bus 2089 coupling them together. Importantly, the RCC computing system 2081 contains the entire model of the user design in software, and the RCC hardware array 2084 contains a hardware model of the user design. The RCC computing system 2081 includes the CPU, memory, an operating system, and the software necessary to run the single-engine RCC system 2080. A software clock 2082 is provided to enable tight control of the software model in the RCC computing system 2081 and the hardware model in the RCC hardware array. Test bench data 2083 are also stored in the RCC computing system 2081.

The RCC hardware array system 2084 includes a PCI interface 2085, a set of RCC hardware array boards 2086, and various buses for interface purposes. The set of RCC hardware array boards 2086 includes at least a portion of the user design modeled in hardware (hardware model 2087) and memory for test bench data. In one embodiment, various portions of this hardware model are distributed among a plurality of reconfigurable logic elements at configuration time. As more reconfigurable logic elements are used, more boards may be needed. In one embodiment, four reconfigurable logic elements are provided on one board. In another embodiment, eight reconfigurable logic elements are provided on one board. The capacity and capabilities of the reconfigurable logic elements on a 4-chip board can differ markedly from those of the reconfigurable logic elements on an 8-chip board.

Bus 2090 provides various clocks for the hardware model from the PCI interface 2085 to the hardware model 2087. Bus 2091 provides other I/O data between the PCI interface 2085 and the hardware model 2087 via connector 2093 and internal bus 2094. Bus 2092 functions as the PCI bus between the PCI interface 2085 and the hardware model 2087. Test bench data can also be stored in memory in the hardware model 2087. As described above, the hardware model 2087 includes structures and functions, other than the hardware model of the user design itself, that are needed to enable the hardware model to interface with the RCC computing system 2081.

The RCC system 2080 may be provided as a single workstation or, alternatively, may be coupled to a network of workstations in which each workstation accesses the RCC system 2080 on a time-shared basis. In effect, the RCC array system 2080 functions as a simulation server with a simulation scheduler and a state swapping mechanism. The server allows each user at a workstation to access the RCC hardware array 2084 for high-speed acceleration and hardware state swapping. After acceleration and state swapping, each user can locally simulate the user design in software while releasing control of the RCC hardware array 2084 to other users at other workstations. This network model will also be used for the coverification system described below.

The RCC array system 2080 gives the designer the power and flexibility to simulate the entire design, to accelerate part of the test points during selected cycles via the hardware model in the reconfigurable computing array, and to obtain internal state information for virtually any part of the design at any time. Indeed, the single-engine reconfigurable computing array (RCC) system can be loosely described as a hardware-accelerated simulator that can be used to perform the following tasks in a single debug session: (1) simulation, (2) simulation with hardware acceleration, in which the user can start, stop, assert values, and inspect the internal state of the design at any time, (3) post-simulation analysis, and (4) in-circuit emulation. Because both the software model and the hardware model are under the strict control of a single engine via the software clock, the hardware model in the reconfigurable computing array is tightly coupled to the software simulation model. This allows the designer to debug cycle by cycle and to accelerate and decelerate the hardware model through multiple cycles to obtain valuable internal state information. Moreover, because this simulation system handles data instead of signals, no complex signal-to-data conversion or timing circuitry is needed. In addition, unlike in a typical emulation system, the hardware model in the reconfigurable computing array need not be recompiled if the designer wants to examine a different set of nodes. For further details, refer to the description above.

Coverification System without External I/O

One embodiment of the present invention is a coverification system that uses no actual physical external I/O devices or target applications. Thus, a coverification system according to one embodiment of the present invention can integrate the RCC system with other functionality to debug the software portion and the hardware portion of a user design without using any actual target system or I/O devices. Instead, the target system and external I/O devices are modeled in software in the RCC computing system.

Referring to FIG. 68, the coverification system 2100 includes an RCC computing system 2101, an RCC hardware array 2108, and a PCI bus 2114 coupling them together. Importantly, the RCC computing system 2101 contains the entire model of the user design in software, and the reconfigurable computing array 2108 contains a hardware model of the user design. The RCC computing system 2101 includes the CPU, memory, an operating system, and the software necessary to run the single-engine coverification system 2100. A software clock 2104 is provided to enable tight control of the software model in the RCC computing system 2101 and the hardware model in the reconfigurable computing array 2108. Test cases 2103 are also stored in the RCC computing system 2101.

According to one embodiment of the present invention, the RCC computing system 2101 also includes the target applications 2102, a driver 2105 for the hardware model of the user design, a model of a device (e.g., a video card) and its driver in software, labeled 2106, and a model of another device (e.g., a monitor) and its driver in software, labeled 2107. Essentially, the RCC computing system 2101 includes as many device models and drivers as are needed to convey to the software model and hardware model of the user design that an actual target system and other I/O devices are part of this computing environment.

The RCC hardware array 2108 includes a PCI interface 2109, a set of RCC hardware array boards 2110, and various buses for interface purposes. The set of RCC hardware array boards 2110 includes at least a portion of the user design modeled in hardware 2112 and memory 2113 for test bench data. As described above, each board contains a plurality of reconfigurable logic elements or chips.

Bus 2115 provides various clocks for the hardware model from the PCI interface 2109 to the hardware model 2112. Bus 2116 provides I/O data between the PCI interface 2109 and the hardware model 2112 via connector 2111 and internal bus 2118. Bus 2117 functions as the PCI bus between the PCI interface 2109 and the hardware model 2112. Test bench data can also be stored in the memory 2113 of the hardware model. As described above, the hardware model includes structures and functions, other than the hardware model of the user design itself, that are needed to enable the hardware model to interface with the RCC computing system 2101.

To compare the coverification system of FIG. 68 with a conventional emulator-based coverification system, FIG. 66 shows an emulator 2048 coupled to a target system 2040, certain I/O devices (e.g., a frame buffer or data stream record/play system 2051), and a workstation 2052. This emulator configuration presents the designer with many problems and setup issues. The emulator needs a logic analyzer or sample-and-hold device to measure the internal states of the user design modeled in the emulator. Because the logic analyzer and sample-and-hold device need signals, complex signal-to-data conversion circuitry is required. Complex signal-to-signal timing control circuitry is also required. The numerous wires needed, one for every signal used to measure the internal states of the emulator, burden the user during setup. During a debug session, the user must recompile the emulator each time he wants to examine a different set of internal logic circuitry, so that the appropriate signals from that logic circuitry are provided as outputs for measurement and recording by the logic analyzer or sample-and-hold device. Such long recompilation times are very costly.

In the coverification system of the present invention in which no external I/O devices are coupled, the target system and other I/O devices are modeled in software, so that an actual physical target system and I/O devices are not needed. Because the RCC computing system 2101 processes data, no complex signal-to-data conversion circuitry or signal-to-signal timing control circuitry is needed. The number of wires is also not tied to the number of signals, so setup is relatively simple. In addition, debugging a different portion of the logic circuitry in the hardware model of the user design does not require recompilation, because the coverification system processes data and not signals. Because the RCC computing system controls the RCC hardware array with the software-controlled clock (i.e., the software clock and clock edge detection circuitry), starting and stopping the hardware model is straightforward. Reading data from the hardware model is also easy, because a model of the entire user design resides in software and the software clock enables synchronization. Thus, the user can debug by software simulation alone, accelerate all or part of the design in hardware, step through various desired test points cycle by cycle, and inspect the internal states of the software and hardware models (e.g., register and combinational logic states). For example, the user can simulate the design with some test bench data, download the internal state information to the hardware model, accelerate the design using various test bench data with the hardware model, inspect the resulting internal state values of the hardware model through register/combinational logic regeneration and the loading of values from the hardware model into the software model, and finally simulate other portions of the user design using the results of the hardware model acceleration.
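The simulate, download, accelerate, and read-back sequence described above can be illustrated with a minimal Python sketch. It assumes a toy user design (an 8-bit counter with a parity wire); the names `Model`, `swap_state`, `cycle`, and `combinational` are hypothetical and are not taken from this specification. The "hardware" model here is ordinary software standing in for the RCC hardware array.

```python
from typing import Dict


class Model:
    """Toy user design: one 8-bit counter register and a parity wire.

    Hypothetical stand-in used for both the software model and the
    hardware model; a real hardware model would live in the RCC array.
    """

    def __init__(self) -> None:
        self.registers: Dict[str, int] = {"count": 0}

    def cycle(self, increment: int) -> None:
        # One clock cycle: update the register from test bench data.
        self.registers["count"] = (self.registers["count"] + increment) % 256

    def combinational(self) -> Dict[str, int]:
        # Combinational values are regenerated from register state,
        # so only register state needs to be transferred between models.
        return {"parity": self.registers["count"] & 1}


def swap_state(source: Model, destination: Model) -> None:
    """Copy register state from one model to the other."""
    destination.registers = dict(source.registers)


software, hardware = Model(), Model()

for _ in range(3):                 # simulate a few cycles in software
    software.cycle(1)
swap_state(software, hardware)     # download internal state to hardware

for _ in range(1000):              # accelerate many cycles in hardware
    hardware.cycle(1)
swap_state(hardware, software)     # load register values back into software

print(software.registers, software.combinational())
```

Only register values are copied in this sketch; the parity value is recomputed on the software side, which mirrors the idea of regenerating combinational logic state from register state.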

As described above, however, a workstation is still needed for debug session control. In a networked configuration, the workstation can be coupled to the coverification system remotely to access debug data remotely. In a non-networked configuration, the workstation can be coupled locally to the coverification system or, in some other embodiments, the workstation can incorporate the coverification system internally so that debug data can be accessed locally.

Coverification System with External I/O

In FIG. 68, the various I/O devices and target applications were modeled in the RCC computing system 2101. However, when too many I/O devices and target applications run in the RCC computing system 2101, the overall speed slows down. With only a single CPU in the RCC computing system 2101, more time is needed to process the various data from all the device models and target applications. To increase data throughput, actual I/O devices and target applications (instead of software models of these I/O devices and target applications) can be physically coupled to the coverification system.

One embodiment of the present invention is a coverification system that uses actual, physical external I/O devices and target applications. Thus, the coverification system can combine the RCC system with other functionality to debug the software portion and the hardware portion of a user design while using an actual target system and/or I/O devices. For testing, the coverification system can use both test bench data from software and stimuli from the external interface (e.g., the target system and external I/O devices). Test bench data can be used to provide test data not only to the pin-outs of the user design but also to internal nodes of the user design. Actual I/O signals from external I/O devices (or the target system) can be directed only to the pin-outs of the user design. Thus, one main difference between test data from the external interface (e.g., a target system or external I/O device) and the test bench processes in software is that test bench data can be used to test the user design with stimuli applied to both pin-outs and internal nodes, whereas data from the target system or external I/O device can be applied to the user design only via its pin-outs (or nodes in the user design that represent pin-outs). The following description presents the structure of the coverification system and its configuration with respect to the target system and the external I/O devices.
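As an illustration of the distinction drawn above between test bench stimuli and external I/O stimuli, the following Python sketch encodes the rule as a small check. The node names and the `can_drive` function are hypothetical and serve only to make the rule concrete.

```python
# Hypothetical node names for a toy user design (not from the specification).
PIN_OUTS = {"pci_ad", "pci_frame"}
INTERNAL_NODES = {"fifo_full", "state_reg"}


def can_drive(source: str, node: str) -> bool:
    """Return True if a stimulus source may be applied to a design node.

    Test bench data may drive pin-outs and internal nodes; an external
    interface (target system or external I/O device) may drive pin-outs only.
    """
    if source == "testbench":
        return node in PIN_OUTS or node in INTERNAL_NODES
    if source == "external_io":
        return node in PIN_OUTS
    raise ValueError(f"unknown stimulus source: {source}")


print(can_drive("testbench", "state_reg"))    # internal node, test bench
print(can_drive("external_io", "state_reg"))  # internal node, external I/O
print(can_drive("external_io", "pci_ad"))     # pin-out, external I/O
```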

In comparison to the system configuration of FIG. 66, a coverification system according to one embodiment of the present invention replaces the structure and functionality of the items within dotted line 2070. In other words, whereas FIG. 66 shows the emulator and the workstation within dotted line 2070, one embodiment of the present invention places the coverification system 2140 (and its associated workstation), shown in FIG. 69, within dotted line 2070.

Referring to FIG. 69, the coverification system configuration according to one embodiment of the present invention includes a target system 2120, the coverification system 2140, some optional I/O devices, and control/data buses 2131 and 2132 for coupling them together. The target system 2120 includes a central computing system 2121, which includes a CPU and memory and operates under an operating system such as Microsoft Windows or Sun Microsystems' Solaris to run a number of applications 2122 and test cases 2123. A device driver 2124 for the hardware model of the user design is included in the central computing system to enable communication between the operating system (and any applications) and the user design. To communicate with the coverification system and the other devices that are part of this computing environment, the central computing system 2121 is coupled to PCI bus 2129. Other peripherals in the target system 2120 include an Ethernet PCI add-on card 2125 used to couple the target system to a network, a SCSI PCI add-on card 2126 coupled to SCSI drive 2128 via bus 2130, and a PCI bus bridge 2127.

The coverification system 2140 includes an RCC computing system 2141, an RCC hardware array 2190, an external interface 2139 in the form of an external I/O expander, and a PCI bus 2171 coupling the RCC computing system 2141 and the RCC hardware array 2190 together. The RCC computing system 2141 includes the CPU, memory, an operating system, and the software necessary to run the single-engine coverification system 2140. Importantly, the RCC computing system 2141 contains the entire model of the user design in software, and the RCC hardware array 2190 contains a hardware model of the user design.

As described above, the single engine of the coverification system derives its power and flexibility from a main software kernel that resides in the main memory of the RCC computing system 2141 and controls the overall operation and execution of the coverification system 2140. So long as any test bench process is active or any signal from the external world is presented to the coverification system, the kernel evaluates active test bench components, evaluates clock components, detects clock edges to update registers and memories and to propagate combinational logic data, and advances the simulation time. This main software kernel provides for the tightly coupled nature of the RCC computing system 2141 and the RCC hardware array 2190.
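The kernel steps listed above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the kernel of this specification: the design is a toy with two registers and one adder, the software clock toggles once per loop iteration, registers update on rising edges only, and the function name `run_kernel` and its parameters are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def run_kernel(
    testbench: Callable[[int], Dict[str, int]], steps: int
) -> Tuple[Dict[str, int], List[int]]:
    """Run a toy single-engine loop for a fixed number of time steps."""
    clock = 0                                  # software clock level
    registers = {"a": 0, "b": 0}
    outputs: List[int] = []
    for time in range(steps):                  # advance simulation time
        stimulus = testbench(time)             # evaluate test bench components
        previous, clock = clock, clock ^ 1     # evaluate the clock component
        if previous == 0 and clock == 1:       # detect a rising clock edge
            registers.update(stimulus)         # update registers on the edge
        # Propagate combinational logic data from the register values.
        outputs.append(registers["a"] + registers["b"])
    return registers, outputs


final_registers, trace = run_kernel(lambda t: {"a": t, "b": 2 * t}, steps=4)
print(final_registers, trace)
```

In this sketch the registers change only at time steps 0 and 2, where the clock rises, so the adder output holds its value across the falling-edge steps in between.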

The software kernel generates software clock signals from a software clock source 2142, and these signals are provided to the RCC hardware array 2190 and the external world. Clock source 2142 can generate multiple clocks at different frequencies depending on the purpose of these software clocks. Generally, the software clock ensures that the registers in the hardware model of the user design evaluate in synchronization with the system clock and without any hold-time violations. The software model can detect clock edges in software that affect hardware model register values. Accordingly, a clock detection mechanism ensures that clock edge detection in the main software model can be translated to clock detection in the hardware model. For a more detailed discussion of the software clock and clock edge detection logic, see FIGS. 17-19 and the accompanying text in this patent specification.

According to one embodiment of the present invention, the RCC computing system 2141 may also include one or more models of a number of I/O devices, even though other actual physical I/O devices may be coupled to the coverification system. For example, the RCC computing system 2141 may include a model of a device (e.g., a speaker) together with its driver and test bench data in software, labeled 2143, and a model of another device (e.g., a graphics accelerator) together with its driver and test bench data in software, labeled 2144. The user decides which devices (and their respective drivers and test bench data) will be modeled and incorporated into the RCC computing system 2141 and which devices will actually be coupled to the coverification system.

The coverification system includes control logic that provides traffic control between (1) the RCC computing system 2141 and the RCC hardware array 2190, and (2) the external interface (which is coupled to the target system and the external I/O devices) and the RCC hardware array 2190. Some data passes between the RCC hardware array 2190 and the RCC computing system 2141 because some I/O devices may be modeled in the RCC computing system. In addition, the RCC computing system 2141 holds a model of the entire design in software, including the portion of the user design modeled in the RCC hardware array 2190. As a result, the RCC computing system 2141 must also have access to all data that passes between the external interface and the RCC hardware array 2190. The control logic ensures that the RCC computing system 2141 has access to these data. The control logic is described in more detail below.
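The requirement that the RCC computing system see all data exchanged between the external interface and the hardware array can be pictured with a small Python sketch. The class `TrafficControl` and its method names are hypothetical; the sketch models only the visibility property and says nothing about how the actual control logic is built.

```python
from typing import List, Tuple


class TrafficControl:
    """Hypothetical stand-in for the control logic (illustration only)."""

    def __init__(self) -> None:
        self.hardware_inbox: List[int] = []    # data delivered to the array
        self.external_outbox: List[int] = []   # data delivered to the outside
        # Every transfer is also recorded for the software model.
        self.software_view: List[Tuple[str, int]] = []

    def external_to_hardware(self, word: int) -> None:
        self.hardware_inbox.append(word)
        self.software_view.append(("ext->hw", word))

    def hardware_to_external(self, word: int) -> None:
        self.external_outbox.append(word)
        self.software_view.append(("hw->ext", word))


control = TrafficControl()
control.external_to_hardware(0xA5)   # stimulus from a target system
control.hardware_to_external(0x3C)   # response from the hardware model
print(control.software_view)
```

Because both directions are mirrored into `software_view`, the software side of this sketch can inspect every word that crossed the boundary, in order.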

The RCC hardware array 2190 includes a number of array boards. In the particular embodiment shown in FIG. 69, the hardware array 2190 includes boards 2145-2149. Boards 2146-2149 contain most of the configured hardware model. Board 2145 (or board m1) contains a reconfigurable computing element (e.g., an FPGA chip) 2153, which the coverification system can use to configure at least a portion of the hardware model, and an external I/O controller 2152, which directs traffic and data between the external interface (target system and I/O devices) and the coverification system 2140. Board 2145, via the external I/O controller, allows the RCC computing system 2141 to access all data transported between the external world (i.e., the target system and I/O devices) and the RCC hardware array 2190. This access is important because the RCC computing system 2141 in the coverification system holds a model of the entire user design in software and can also control the functionality of the RCC hardware array 2190.

If stimuli from an external I/O device are provided to the hardware model, the software model must have access to these stimuli as well, so that the user of the coverification system can selectively control the next debug step, which may include inspecting the internal state values of the design that result from the applied stimuli. As described above with respect to the board layout and interconnection scheme, the first and last boards are included in the hardware array 2190. Thus, board 1 (labeled board 2146) and board 8 (labeled board 2149) are included in an eight-board hardware array (excluding board m1). In addition to boards 2145-2149, a board m2 (not shown in FIG. 69; see FIG. 74) with chip m2 may also be provided. Board m2 is similar to board m1 except that it has no external interface, and it can be used for expansion if additional boards are needed.

The contents of these boards will now be described. Board 2145 (board m1) includes a PCI controller 2151, an external I/O controller 2152, data chip (m1) 2153, memory 2154, and multiplexer 2155. In one embodiment, the PCI controller is a PLX 9080. The PCI controller 2151 is coupled to the RCC computing system 2141 via bus 2171 and to a tri-state buffer 2179 via bus 2172.

The main traffic controller in the coverification system between the external world (target system 2120 and I/O devices) and the RCC computing system 2141 is the external I/O controller 2152 (labeled "CTRLXM" in FIGS. 69, 71, and 73), which is coupled to the RCC computing system 2141, the other boards 2146-2149 of the RCC hardware array, the target system 2120, and the actual external I/O devices. Of course, the main traffic controller between the RCC computing system 2141 and the RCC hardware array 2190 has always been, as described above, the combination of the individual internal I/O controllers (e.g., I/O controllers 2156 and 2158) on each array board 2146-2149 and the PCI controller 2151. In one embodiment, these individual internal I/O controllers, such as controllers 2156 and 2158, are the FPGA I/O controllers described and illustrated above in such exemplary figures as FIG. 22 (unit 700) and FIG. 56 (unit 1200).

The external I/O controller 2152 is coupled to the tri-state buffer 2179 so that the external I/O controller can interface with the RCC computing system 2141. In one embodiment, the tri-state buffer 2179 in some instances allows data from the RCC computing system 2141 to pass to the local bus 2180 while preventing data on the local bus from passing to the RCC computing system 2141, and in other instances allows data to pass from the local bus 2180 to the RCC computing system 2141.

The external I/O controller 2152 is also coupled to chip (m1) 2153 and memory/external buffer 2154 via data bus 2176. In one embodiment, chip (m1) 2153 is a reconfigurable computing element, such as an FPGA chip, that can be used to configure at least a portion of the hardware model of the user design (or all of the hardware model, if the user design is small enough). External buffer 2154 is a DRAM DIMM in one embodiment and can be used by chip 2153 for a variety of purposes. The external buffer 2154 provides far more memory capacity than the individual SRAM memory devices coupled locally to each reconfigurable logic element (e.g., reconfigurable logic element 2157). This large memory capacity allows the RCC computing system to store large amounts of data, such as test bench data, microcontroller code (if the user design is a microcontroller), and a large look-up table, in one memory device. The external buffer 2154 can also be used to store data needed for hardware modeling, as described above. In essence, this external buffer 2154 can function partly like the other high or low bank SRAM memory devices described and illustrated above in, for example, FIG. 56 (SRAM 1205 and 1206), but with more memory. The external buffer 2154 can also be used by the coverification system to store data received from the target system 2120 and the external I/O devices so that the data can later be retrieved by the RCC computing system 2141. Chip m1 2153 and external buffer 2154 also contain the memory mapping logic described in this specification under the section "Memory Simulation."

To access the desired data in the external buffer 2154, both the chip 2153 and the RCC computing system 2141 (via the external I/O controller 2152) can deliver the address of the desired data. The chip 2153 provides its address on address bus 2182, and the external I/O controller 2152 provides its address on address bus 2177. These address buses 2182 and 2177 are inputs to the multiplexer 2155, which provides the selected address on output line 2178 coupled to the external buffer 2154. The select signal for the multiplexer 2155 is provided by the external I/O controller 2152 via line 2181.
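The address selection performed by multiplexer 2155 can be sketched in a few lines of Python. This is only an illustrative model, not the patented circuit: the function name is hypothetical, and the encoding of the select signal (0 for the chip address, 1 for the controller address) is an assumption, since the text does not specify it.

```python
def select_buffer_address(chip_addr: int, ctrl_addr: int, select: int) -> int:
    """Illustrative model of multiplexer 2155.

    chip_addr -- address driven by chip m1 (2153) on address bus 2182
    ctrl_addr -- address driven by external I/O controller 2152 on address bus 2177
    select    -- select signal on line 2181
                 (assumed encoding: 0 = chip address, 1 = controller address)

    Returns the address that appears on output line 2178.
    """
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return ctrl_addr if select == 1 else chip_addr


# Example: the controller drives the select line to pick its own address.
address_on_2178 = select_buffer_address(chip_addr=0x0100, ctrl_addr=0x0200, select=1)
```

Because the external I/O controller drives the select line, it decides at any moment whether the chip or the RCC computing system gets to address the external buffer.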

The external I/O controller 2152 is also coupled to the other boards 2146-2149 via bus 2180. In one embodiment, bus 2180 is the local bus described and illustrated above in such exemplary figures as FIG. 22 (local bus 708) and FIG. 56 (local bus 1210). In this embodiment, only five boards (including board 2145 (board m1)) are used. The actual number of boards is determined by the complexity and size of the user design to be modeled in hardware. A hardware model of a user design of medium complexity requires fewer boards than a hardware model of a user design of high complexity.

To enable scalability, the boards 2146-2149 are substantially identical to one another except for some inter-board interconnect lines. These interconnect lines enable one portion of the hardware model of the user design in one chip (e.g., chip 2157 on board 2146) to communicate with another portion of the hardware model of the same user design that is physically located in another chip (e.g., chip 2161 on board 2148). For the interconnect structure of this coverification system, refer briefly to FIG. 74, as well as to FIGS. 8 and 36-44 and their accompanying descriptions in the specification.

Board 2148 is a representative board. Board 2148 is the third board in this four-board layout (excluding board 2145 (board m1)). It is therefore not an end board that requires appropriate termination of the interconnect lines. Board 2148 includes an internal I/O controller 2158, several reconfigurable logic elements (e.g., FPGA chips) 2159-2166, a high bank FD bus 2167, a low bank FD bus 2168, a high bank memory 2169, and a low bank memory 2170. As stated above, the internal I/O controller 2158 is, in one embodiment, the FPGA I/O controller described and illustrated above in such exemplary figures as FIG. 22 (unit 700) and FIG. 56 (unit 1200). Similarly, the high and low bank memory devices 2169 and 2170 are the SRAM memory devices described and illustrated above in, for example, FIG. 56 (SRAM 1205 and 1206). In one embodiment, the high and low bank FD buses 2167 and 2168 are the FD buses or FPGA buses described and illustrated above in such exemplary figures as FIG. 22 (FPGA buses 718 and 719), FIG. 56 (FD buses 1212 and 1213), and FIG. 57 (FD bus 1282).

To couple the coverification system 2140 to the target system 2120 and other I/O devices, an external interface 2139 in the form of an external I/O expander is provided. On the target system side, the external I/O expander 2139 is coupled to the PCI bridge 2127 via secondary PCI bus 2132 and control line 2131, which is used to deliver the software clock. On the I/O device side, the external I/O expander 2139 is coupled to various I/O devices via buses 2136-2138 for pin-out data and control lines 2133-2135 for the software clock. The number of I/O devices that can be coupled to the I/O expander 2139 is determined by the user. In any event, the external I/O expander 2139 is provided with as many data buses and software clock control lines as are needed to couple to the coverification system 2140 as many I/O devices as a successful debug session requires.

On the coverification system 2140 side, the external I/O expander 2139 is coupled to the external I/O controller via data bus 2175, software clock control line 2174, and scan control line 2173. Data bus 2175 is used to pass pin-out data between the external devices (target system 2120 and external I/O devices) and the coverification system 2140. Software clock control line 2174 is used to deliver software clock data from the RCC computing system 2141 to the external devices.

The software clock present on control lines 2174 and 2131 is generated by the main software kernel in the RCC computing system 2141. The RCC computing system 2141 delivers the software clock to the external I/O expander 2139 via PCI bus 2171, PCI controller 2151, bus 2172, tri-state buffer 2179, local bus 2180, external I/O controller 2152, and control line 2174. From the external I/O expander 2139, the software clock is provided as the clock input to the target system 2120 (via the PCI bridge 2127) and to the other external I/O devices via control lines 2133-2135. Because the software clock functions as the main clock source, the target system 2120 and the I/O devices run at a slower speed. However, the data provided to the target system 2120 and the external I/O devices is synchronized to the software clock, as are the software model in the RCC computing system 2141 and the hardware model in the RCC hardware array 2190. Similarly, data from the target system 2120 and the external I/O devices is delivered to the coverification system 2140 in synchronization with the software clock.

Thus, I/O data passed between the external interface and the coverification system is synchronized to the software clock. In essence, the software clock synchronizes the operation of the external I/O devices and the target system with that of the coverification system (the RCC computing system and the RCC hardware array) whenever data is passed between them. The software clock is used for both data-in and data-out operations. For data-in operations, as one pointer (discussed later) latches the software clock from the RCC computing system 2141 to the external interface, other pointers latch the I/O data-in from the external interface to selected internal nodes in the hardware model in the RCC hardware array 2190. One by one, the pointers latch this I/O data-in during the cycle in which the software clock was delivered to the external interface. Once all the data has been latched in, the RCC computing system can generate another software clock to latch in more data in another software clock cycle, if desired. For data-out operations, the RCC computing system can deliver the software clock to the external interface and subsequently, with the aid of pointers, control the gating of data from the internal nodes of the hardware model in the RCC hardware array 2190 to the external interface. Again, one by one, the pointers gate data from the internal nodes to the external interface. If more data needs to be delivered to the external interface, the RCC computing system can generate another software clock cycle and then activate selected pointers to gate data out to the external interface. The generation of the software clock is strictly controlled, which allows the coverification system to synchronize data delivery and data evaluation between the coverification system and any external I/O devices coupled to the external interface.

The scan control line 2173 is used to allow the coverification system 2140 to scan the data buses 2132, 2136, 2137, and 2138 for any data that may be present. The logic in the external I/O controller 2152 that supports the scan signal is pointer logic in which each of several inputs is provided as the output for a specific time period before the logic moves to the next input via a MOVE signal. This logic is analogous to the scheme shown in FIG. 11. In effect, the scan signal functions like a select signal for a multiplexer, except that it selects the multiplexer inputs in round-robin order. Thus, in one time period, the scan signal on scan control line 2173 samples data bus 2132 for data that may be coming from the target system 2120. In the next time period, the scan signal samples data bus 2136 for data that may be coming from an external I/O device coupled to it. In the next time period, data bus 2137 is sampled, and so on, so that the coverification system 2140 can receive and process all pin-out data coming from the target system 2120 or the external I/O devices during this debug session. Any data received by the coverification system 2140 from the data buses 2132, 2136, 2137, and 2138 is delivered to the external buffer 2154 via the external I/O controller 2152.
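The round-robin scan described above can be illustrated with a short Python sketch. It is a behavioral model under stated assumptions: the function names are hypothetical, each time period samples exactly one bus, and a plain list stands in for external buffer 2154. The MOVE signal is modeled simply as advancing to the next bus.

```python
from itertools import cycle, islice

# Data buses scanned by the external I/O controller, in the order given in the text.
DATA_BUSES = (2132, 2136, 2137, 2138)


def scan_order(periods, buses=DATA_BUSES):
    """Return the bus sampled in each of `periods` consecutive time periods."""
    return list(islice(cycle(buses), periods))


def scan_into_buffer(pending, periods):
    """Sample one bus per time period and collect any data found.

    pending -- dict mapping a bus number to a list of values waiting on that bus
               (hypothetical stand-in for pin-out data from the target system
               or an external I/O device)
    Returns a list of (bus, value) pairs standing in for external buffer 2154.
    """
    external_buffer = []
    for bus in scan_order(periods):
        queue = pending.get(bus)
        if queue:  # data is present on this bus during this period
            external_buffer.append((bus, queue.pop(0)))
    return external_buffer


captured = scan_into_buffer({2132: ["b", "c"], 2136: ["a"]}, periods=5)
```

In this example the scan visits 2132, 2136, 2137, 2138, and then 2132 again, so the second value waiting on bus 2132 is only picked up on the next pass.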

The configuration illustrated in FIG. 69 assumes that the target system 2120 contains the primary CPU and that the user design is some peripheral device, such as a video controller, network adapter, graphics adapter, mouse, or some other support device, card, or logic. Thus, the target system 2120 contains the target application (including the operating system) coupled to the primary PCI bus 2129, and the coverification system 2140 contains the user design and is coupled to the secondary PCI bus 2132. The configuration may be quite different depending on the nature of the user design. For example, if the user design is a CPU, the target application would run in the RCC computing system 2141 of the coverification system 2140, while the target system 2120 would no longer contain the central computing system 2121. In that case, bus 2132 would be the primary PCI bus and bus 2129 would be the secondary PCI bus. In effect, instead of the user design being one of the peripheral devices supporting the central computing system 2121, the user design is the main computing center and all other peripheral devices support the user design.

The control logic for delivering data between the external interface (external I/O expander 2139) and the coverification system 2140 is found on each of the boards 2145-2149. The primary portion of the control logic is found in the external I/O controller 2152, but other portions are found in the various internal I/O controllers (e.g., 2156 and 2158) and the reconfigurable logic elements (e.g., FPGA chips 2159 and 2165). For instructional purposes, it is only necessary to show some portion of this control logic rather than the same repeated logic structure for every chip on every board. The portion of the coverification system 2140 within the dotted line 2150 of FIG. 69 contains one subset of the control logic. This control logic will now be discussed in greater detail with respect to FIGS. 70-73.

The components in this particular subset of the control logic include the external I/O controller 2152, the tri-state buffer 2179, the internal I/O controller 2156 (CTRL 1), the reconfigurable logic element 2157 (chip0_1, indicating chip 0 of board 1), and parts of the various buses and control lines coupled to these components. Specifically, FIG. 70 illustrates the portion of the control logic used for data-in cycles, in which data from the external interface (external I/O expander 2139) and the RCC computing system 2141 is delivered to the RCC hardware array 2190. FIG. 72 illustrates the timing diagram of the data-in cycles. FIG. 71 illustrates the portion of the control logic used for data-out cycles, in which data from the RCC hardware array 2190 is delivered to the RCC computing system 2141 and the external interface (external I/O expander 2139). FIG. 73 illustrates the timing diagram of the data-out cycles.

Data-In

The data-in control logic in accordance with one embodiment of the present invention is responsible for handling data delivered from either the RCC computing system or the external interface to the RCC hardware array. One particular subset 2150 (see FIG. 69) of the data-in control logic is shown in FIG. 70 and includes the external I/O controller 2200, tri-state buffer 2202, internal I/O controller 2203, reconfigurable logic element 2204, and the various buses and control lines that allow data delivery among them. The external buffer 2201 is also shown for this data-in embodiment. This subset illustrates the logic necessary for data-in operations, in which data from the external interface and the RCC computing system is delivered to the RCC hardware array. The data-in control logic of FIG. 70 and the data-in timing diagram of FIG. 72 will be discussed together.

Two types of data cycles are used in this data-in embodiment of the present invention: a global cycle and a software-to-hardware (S2H) cycle. The global cycle is used for any data that is directed to all the chips in the RCC hardware array, such as clocks, resets, and certain other S2H data directed at many different nodes in the RCC hardware array. For this latter "global" S2H data, it is more practical to send the data out via global cycles than via sequential S2H cycles.

The software-to-hardware cycle is used to send data from the test bench process in the RCC computing system to the RCC hardware array sequentially, from one chip to the next across all the boards. Because the hardware model of the user design is distributed across several boards, the test bench data must be provided to every chip for data evaluation. Thus, the data is delivered sequentially to each internal node in each chip, one internal node at a time. Because the hardware model is distributed among a plurality of chips, sequential delivery allows the particular data designated for a particular internal node to be processed by all the chips in the RCC hardware array.

For this data evaluation, the coverification system provides two address spaces: S2H and CLK. As described above, the S2H and CLK spaces are the primary inputs from the kernel to the hardware model. The hardware model holds all the register components and combinational components of the user's circuit design. Furthermore, the software clock is modeled in software and provided in the CLK I/O address space to interface with the hardware model. The kernel advances simulation time, looks for active test bench components, and evaluates clock components. When any clock edge is detected by the kernel, registers and memories are updated and values are propagated through the combinational components. Thus, any change of values in these spaces will trigger the hardware model to change logic states if the hardware acceleration mode is selected.

During data transfer, the DATA_XSFR signal is at logic "1." During this time, the local bus 2222-2230 is used by the coverification system to deliver data in the following data cycles: (1) global data from the RCC computing system to the RCC hardware array and the CLK space; (2) global data from the external interface to the RCC hardware array and the external buffer; and (3) S2H data from the RCC computing system to the RCC hardware array, one chip at a time on each board. Thus, the first two data cycles are part of the global cycle and the last data cycle is part of the S2H cycle.

For the first part of the data-in global cycle, in which global data from the RCC computing system is sent to the RCC hardware array, the external I/O controller 2200 drives the CPU_IN signal on line 2255 to logic "1." Line 2255 is coupled to the enable input of the tri-state buffer 2202. With logic "1" on line 2255, the tri-state buffer 2202 allows data on local bus 2222 to pass to local bus lines 2223-2230 on the other side of the tri-state buffer 2202. In this particular example, local bus lines 2223, 2224, 2225, 2226, 2227, 2228, 2229, and 2230 correspond to LD3, LD4 (from the external I/O controller 2200), LD6 (from the external I/O controller 2200), LD1, LD6, LD4, LD5, and LD7, respectively.
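The gating role of the tri-state buffer in this step can be shown with a minimal Python sketch. This is an illustrative abstraction only: the function name is hypothetical, and `None` is used as an assumed stand-in for an undriven (high-impedance) output.

```python
HIGH_Z = None  # assumed stand-in for a high-impedance (undriven) output


def tri_state_buffer(cpu_in, data_on_2222):
    """Illustrative model of tri-state buffer 2202.

    cpu_in       -- CPU_IN signal on line 2255 (enable input of the buffer)
    data_on_2222 -- value driven by the RCC computing system on local bus 2222

    Returns the value passed to the far side of the buffer, or HIGH_Z when
    the buffer is disabled.
    """
    return data_on_2222 if cpu_in == 1 else HIGH_Z


passed = tri_state_buffer(cpu_in=1, data_on_2222=0xAB)   # data passes through
blocked = tri_state_buffer(cpu_in=0, data_on_2222=0xAB)  # output left undriven
```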

Global data travels from these local bus lines to bus lines 2231-2235 in the internal I/O controller 2203 and then to FD bus lines 2236-2240. In this example, FD bus lines 2236, 2237, 2238, 2239, and 2240 correspond to FD bus lines FD1, FD6, FD4, FD5, and FD7, respectively.

These FD bus lines 2236-2240 are coupled to the inputs of latches 2208-2213 in the reconfigurable logic element 2204. In this example, the reconfigurable logic element corresponds to chip0_1 (i.e., chip 0 of board 1). FD bus line 2236 is coupled to latch 2208, and FD bus line 2237 is coupled to latches 2209 and 2211. FD bus line 2238 is coupled to latch 2210, FD bus line 2239 is coupled to latch 2212, and FD bus line 2240 is coupled to latch 2213.

The enable inputs of these latches 2208-2213 are coupled to several global pointers and software-to-hardware (S2H) pointers. The enable inputs of latches 2208-2211 are coupled to the global pointers, and the enable inputs of latches 2212-2213 are coupled to the S2H pointers. Exemplary global pointers include GLB_PTR0 on line 2241, GLB_PTR1 on line 2242, GLB_PTR2 on line 2243, and GLB_PTR3 on line 2244. Exemplary S2H pointers include S2H_PTR0 on line 2245 and S2H_PTR1 on line 2246. Because the enable inputs of these latches are coupled to these pointers, no latch can latch data through to its designated destination node in the hardware model of the user design without the proper pointer signal.
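The relationship between FD bus lines, latches, and pointer enables can be sketched as follows. The FD-line-to-latch wiring is taken from the description above. The one-to-one pairing of each latch with a specific pointer (2208 with GLB_PTR0 through 2213 with S2H_PTR1) is an assumption made for illustration; the text states only that latches 2208-2211 use global pointers and latches 2212-2213 use S2H pointers. All names in the code are hypothetical.

```python
# FD bus line feeding each latch input (from the description of FIG. 70).
LATCH_SOURCE = {2208: 2236, 2209: 2237, 2210: 2238,
                2211: 2237, 2212: 2239, 2213: 2240}

# Pointer enabling each latch. ASSUMED pairing, for illustration only.
LATCH_ENABLE = {2208: "GLB_PTR0", 2209: "GLB_PTR1", 2210: "GLB_PTR2",
                2211: "GLB_PTR3", 2212: "S2H_PTR0", 2213: "S2H_PTR1"}


def latch_step(state, fd_values, active_pointer):
    """Return the latch outputs after one step with a single pointer asserted.

    state          -- dict: latch number -> currently held value
    fd_values      -- dict: FD bus line number -> value currently on that line
    active_pointer -- name of the pointer asserted in this step
    Only the latch whose enable matches the active pointer captures new data.
    """
    new_state = dict(state)
    for latch, fd_line in LATCH_SOURCE.items():
        if LATCH_ENABLE[latch] == active_pointer:
            new_state[latch] = fd_values[fd_line]
    return new_state


initial = {latch: 0 for latch in LATCH_SOURCE}
fd_bus = {2236: 1, 2237: 1, 2238: 1, 2239: 1, 2240: 1}
after = latch_step(initial, fd_bus, "GLB_PTR1")
```

Latches 2209 and 2211 share FD bus line 2237, yet only latch 2209 changes here, which shows why the pointer, not the bus line, determines the destination node.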

These global and S2H pointer signals are generated by a data-in pointer state machine 2214 on output 2254. The data-in pointer state machine 2214 is controlled by the DATA_XSFR and F_WR signals on line 2253. The internal I/O controller 2203 generates the DATA_XSFR and F_WR signals on line 2253. The DATA_XSFR signal is at logic "1" whenever a data transfer between the RCC hardware array and either the RCC computing system or the external interface is desired. The F_WR signal, in contrast to the F_RD signal, is at logic "1" whenever a write to the RCC hardware array is desired. A read via the F_RD signal requires the delivery of data from the RCC hardware array to either the RCC computing system or the external interface. If both the DATA_XSFR and F_WR signals are at logic "1," the data-in pointer state machine can generate the proper global or S2H pointer signals in the proper programmed sequence.
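A minimal behavioral sketch of the data-in pointer state machine is given below. It assumes that the programmed sequence asserts the global pointers first and the S2H pointers afterward, one pointer per step; the text says only that the pointers are generated in a programmed sequence, so this ordering and the function name are illustrative assumptions.

```python
GLOBAL_POINTERS = ("GLB_PTR0", "GLB_PTR1", "GLB_PTR2", "GLB_PTR3")
S2H_POINTERS = ("S2H_PTR0", "S2H_PTR1")


def data_in_pointer_sequence(data_xsfr, f_wr):
    """Yield the pointer asserted at each step of a data-in transfer.

    Pointers are produced only when both DATA_XSFR and F_WR are at logic 1.
    ASSUMED order: all global pointers, then all S2H pointers.
    """
    if not (data_xsfr == 1 and f_wr == 1):
        return  # no write transfer requested: no pointer is asserted
    for name in GLOBAL_POINTERS + S2H_POINTERS:
        yield name


write_sequence = list(data_in_pointer_sequence(data_xsfr=1, f_wr=1))
idle_sequence = list(data_in_pointer_sequence(data_xsfr=1, f_wr=0))
```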

The outputs 2247-2252 of these latches are coupled to various internal nodes of the hardware model of the user design. Some of these internal nodes correspond to input pin-outs of the user design. The user design generally has other internal nodes that cannot be accessed through pin-outs; these non-pin-out internal nodes serve other debugging purposes, giving flexibility to a designer who wants to apply stimuli to various internal nodes of the user design regardless of whether they are input pin-outs. For stimuli provided by the external interface to the hardware model of the user design, the data-in logic and the internal nodes corresponding to input pin-outs are used. For example, if the user design is a CRTC 6845 video controller, some of the input pin-outs are as follows:

LPSTPB - light pen strobe pin

~RESET - low-level signal for resetting the 6845 controller

RS - register select

E - enable

CLK - clock

~CS - chip select

Other input pin-outs may also be used in this video controller. The number of latches and pointers can be readily determined from the number of input pin-outs that interface with the outside world. Some hardware models configured in the RCC hardware array have, for example, 30 separate latches associated with each of GLB_PTR0, GLB_PTR1, GLB_PTR2, GLB_PTR3, S2H_PTR0, and S2H_PTR1, for a total of 180 latches (= 30 x 6). In other designs, more global pointers, such as GLB_PTR4 to GLB_PTR30, may be used as needed. Similarly, more S2H pointers, such as S2H_PTR2 to S2H_PTR30, may be used as needed. The pointers and their corresponding latches depend on the requirements of the hardware model of each user design.
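The latch count in this example is plain multiplication, which the following Python sketch makes explicit. The function name and the assumption that every pointer enables the same number of latches are illustrative only; a real design could attach a different number of latches to each pointer.

```python
def total_latches(pointers, latches_per_pointer):
    """Total latch count when every pointer enables the same number of latches."""
    return len(pointers) * latches_per_pointer


# The six pointers named in the example above.
EXAMPLE_POINTERS = [
    "GLB_PTR0", "GLB_PTR1", "GLB_PTR2", "GLB_PTR3", "S2H_PTR0", "S2H_PTR1",
]

# 30 latches per pointer, as in the example.
EXAMPLE_TOTAL = total_latches(EXAMPLE_POINTERS, 30)
```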

Referring to FIGS. 70 and 72, data on the FD bus lines reaches these internal nodes only if the corresponding latch is enabled by the proper global pointer or S2H pointer signal. Otherwise, the internal nodes are not driven by any data on the FD bus. When F_WR is at logic "1" during the first half of the CPU_IN = 1 time period, GLB_PTR0 goes to logic "1" and drives the data on FD1 to the corresponding internal node via line 2247. If other latches depend on GLB_PTR0 for enabling, those latches also latch data to their corresponding internal nodes. During the second half of the CPU_IN = 1 time period, F_WR goes to logic "1" again, which triggers GLB_PTR1 to rise to logic "1". This drives the data on FD6 to the internal node coupled to line 2248. It also sends the software clock signal on line 2223 to be latched onto line 2216 by latch 2205, using the GLB_PTR1 signal on enable line 2215. The software clock is delivered to the external clock inputs of the target system and other external I/O devices. Because GLB_PTR0 and GLB_PTR1 are used only for the first part of the data-in global cycle, CPU_IN then returns to logic "0", which completes the transfer of global data from the RCC computing system to the RCC hardware array.

The second part of the data-in global cycle is now described, in which global data from the external interface is delivered to the RCC hardware array and the external buffer. Again, the various input pin-out signals from the target system or external I/O devices that are directed to the user design must be provided to both the hardware model and the software model. This data is delivered to the hardware model by using the proper pointers and is latched to drive the internal nodes. It can be delivered to the software model by first storing it in the external buffer 2201 for later retrieval by the RCC computing system, which uses it to update the internal state of the software model.

CPU_IN is now at logic "0" and EXT_IN is at logic "1". Accordingly, the tri-state buffer 2206 in the external I/O controller 2200 is enabled so that data is placed on the PCI bus lines, such as bus lines 2217 and 2218. These PCI bus lines are also coupled to FD bus lines 2219 for storage in the external buffer 2201. During the first half of the time period in which the EXT_IN signal is at logic "1", GLB_PTR2 is at logic "1". This causes the data on FD4 (via bus lines 2217 and 2224 and local bus line 2228 (LD4)) to be latched to the internal node in the hardware model coupled to line 2249.

During the second half of the time period in which the EXT_IN signal is at logic "1", GLB_PTR3 is at logic "1". This causes the data on FD5 (via bus lines 2218 and 2225 and local bus line 2227 (LD6)) to be latched to the internal node in the hardware model coupled to line 2250.
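The four half-period steps of the data-in global cycle can be summarized as a schedule that pairs each global pointer with the FD line it samples and the hardware-model line it drives. The Python sketch below is a simplified behavioral model, not the circuit itself; the function name and the `fd_snapshots` argument (a mapping from a pointer name to the FD bus values present while that pointer is high) are assumptions made for illustration.

```python
# (pointer, FD line, hardware-model node line), in the order given in the text.
GLOBAL_CYCLE_SCHEDULE = [
    ("GLB_PTR0", "FD1", 2247),  # CPU_IN = 1, first half
    ("GLB_PTR1", "FD6", 2248),  # CPU_IN = 1, second half
    ("GLB_PTR2", "FD4", 2249),  # EXT_IN = 1, first half
    ("GLB_PTR3", "FD5", 2250),  # EXT_IN = 1, second half
]


def run_global_cycle(fd_snapshots):
    """Return {node line: latched value} for every pointer that was asserted."""
    latched = {}
    for pointer, fd_line, node_line in GLOBAL_CYCLE_SCHEDULE:
        snapshot = fd_snapshots.get(pointer)
        if snapshot is None:
            continue  # pointer never asserted, so the node is not driven
        latched[node_line] = snapshot[fd_line]
    return latched
```

Nodes whose pointer never goes high are simply absent from the result, matching the statement that such nodes are not driven by the FD bus.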

As described above, this data from the target system or some other external I/O device can be delivered to the software model by first storing it in the external buffer 2201 for later retrieval by the RCC computing system, which uses it to update the internal state of the software model. The data on bus lines 2217 and 2218 is provided on the FD bus FD[63:0] to the external buffer 2201. The particular memory address at which each data item is stored in the external buffer 2201 is provided by memory address counter 2201 via bus 2220. To enable such storage, a WR_EXT_BUF signal is provided to the external buffer 2201 via line 2221. Before the external buffer 2201 becomes full, the RCC computing system reads its contents so that the appropriate updates can be made to the software model. Any data delivered to the internal nodes of the hardware model in the RCC hardware array is likely to cause some internal state change in the hardware model. Because the RCC computing system holds a model of the entire user design in software, these internal state changes in the hardware model must also be reflected in the software model. This concludes the data-in global cycle.
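The external buffer behavior described here (a write strobe, an address counter that advances with each stored word, and a read-out before the buffer fills) can be sketched in Python as follows. The class, its method names, and the buffer depth are hypothetical; only the 64-bit FD bus width and the WR_EXT_BUF strobe come from the text.

```python
FD_MASK = (1 << 64) - 1  # FD[63:0] is a 64-bit bus


class ExternalBuffer:
    """Behavioral model of a buffer filled from the FD bus."""

    def __init__(self, depth: int) -> None:
        self.words = [0] * depth
        self.address = 0  # models the memory address counter

    def write(self, fd_word: int, wr_ext_buf: int) -> bool:
        # Data is stored only while the WR_EXT_BUF strobe is asserted.
        if wr_ext_buf != 1:
            return False
        if self.address >= len(self.words):
            raise OverflowError("buffer must be read out before it fills")
        self.words[self.address] = fd_word & FD_MASK
        self.address += 1
        return True

    def drain(self) -> list:
        # The computing system reads the stored words, then the counter restarts.
        contents = self.words[: self.address]
        self.address = 0
        return contents
```

In this model the software side would call `drain()` and apply the returned words to the software model of the design.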

The S2H cycle is described next. The S2H cycle is used to deliver test bench data from the RCC computing system to the RCC hardware array, moving the data sequentially from one chip to the next on each board. The CPU_IN signal goes to logic "1" while the EXT_IN signal goes to logic "0", indicating that the data transfer is between the RCC computing system and the RCC hardware array. The external interface is not involved. The CPU_IN signal also enables tri-state buffer 2202 so that data can pass from the local bus 2222 to the internal I/O controller 2203.

At the beginning of the CPU_IN = 1 time period, S2H_PTR0 goes to logic "1", which latches the data on FD5 (via local bus 2222, local bus line 2229, bus line 2234, and FD bus 2239) to the internal node in the hardware model coupled to line 2251. In the first part of the CPU_IN = 1 time period, S2H_PTR1 goes to logic "1", which latches the data on FD7 (via local bus 2222, local bus line 2230, bus line 2235, and FD bus 2240) to the internal node in the hardware model coupled to line 2252. During sequential data evaluation, the data from the RCC computing system is delivered first to chip m1, then to chip0_1 (i.e., chip 0 on board 1), chip1_1 (i.e., chip 1 on board 1), and so on, until the last chip on the last board, chip7_8 (i.e., chip 7 on board 8). If chip m2 is available, the data can also be moved to that chip.
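The chip visiting order during sequential data evaluation can be written out as a short Python sketch. The function name and its parameters are hypothetical, and the defaults of eight boards with eight chips each are inferred from the chip names in the text (chip0_1 through chip7_8) rather than stated outright.

```python
def chip_order(boards=8, chips_per_board=8, has_m2=False):
    """List chips in the order data is moved: m1, each board's chips, then m2."""
    order = ["m1"]
    for board in range(1, boards + 1):        # boards are numbered from 1
        for chip in range(chips_per_board):   # chips are numbered from 0
            order.append(f"chip{chip}_{board}")
    if has_m2:
        order.append("m2")  # visited last, and only when present
    return order
```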

At the end of the data transfer, DATA_XSFR returns to logic "0". Note that I/O data from the external interface is treated as global data and is handled during the global cycle. This concludes the discussion of the data-in control logic and the data-in cycles.

Data-Out

The data-out control logic embodiment of the present invention is now described. The data-out control logic in accordance with an embodiment of the present invention is responsible for handling data delivered from the RCC hardware array to the RCC computing system and the external interface. While processing data in response to stimuli (external or otherwise), the hardware model generates output data that the target application or some I/O devices may need. This output data may be standalone data, addresses, control information, or other relevant information that another application or device needs for its own processing. The output data destined for the RCC computing system (which may hold models of other external I/O devices in software), the target system, or the external I/O devices is presented on various internal nodes. As described above for the data-in logic, some of these internal nodes correspond to output pin-outs of the user design. The user design has other internal nodes that are not normally accessible through pin-outs; these non-pin-out internal nodes serve other debugging purposes, giving flexibility to a designer who wants to read and analyze stimulus responses at various internal nodes of the user design regardless of whether they are output pin-outs. For responses provided by the hardware model of the user design to the RCC computing system (which may hold models of other I/O devices in software) or to the external interface, the data-out logic and the internal nodes corresponding to output pin-outs are used.

For example, if the user design is a CRTC 6845 video controller, some of the output pin-outs are as follows:

MA0-MA13 - memory address

D0-D7 - data bus

DE - display enable

CURSOR - cursor position

VS - vertical synchronization

HS - horizontal synchronization

Other output pin-outs may also be used in this video controller. The number of nodes, and therefore the amount of gating logic and the number of pointers, can be readily determined from the number of output pin-outs that interface with the outside world. For example, output pin-outs MA0-MA13 on the video controller provide the memory address for the video RAM. The VS output pin-out provides the vertical synchronization signal, which causes a vertical retrace on the monitor. Output pin-outs D0-D7 are the eight terminals that form a bidirectional data bus through which the CPU in the target system accesses the internal 6845 registers. These output pin-outs correspond to particular internal nodes in the hardware model. Of course, the number and nature of these internal nodes vary with the user design.

Because the RCC computing system contains a model of the entire user design in software, the data from these output pin-out internal nodes must be provided to the RCC computing system, so that any event that occurs can be communicated to the software model and the corresponding changes can be made there. In this way, the software model stays consistent with the hardware model. In addition, the RCC computing system may hold device models of I/O devices that the user or designer has chosen to model in software rather than connect as actual devices to one of the ports on the external I/O expander. For example, the user may decide that it is easier and more effective to model a monitor or speaker in software than to plug an actual monitor or speaker into an external I/O expander port. Furthermore, the data from these internal nodes in the hardware model must be provided to the target system and other external I/O devices. So that the data from these output pin-out internal nodes can be delivered to the RCC computing system as well as to the target system and other external I/O devices, data-out control logic in accordance with one embodiment of the present invention is provided in the coverification system.

The data-out control logic uses a data-out cycle, which involves data transfer from the RCC hardware 2190 to the RCC computing system 2141 and the external interface (external I/O expander 2139). In FIG. 69, control logic for data transfer between the external interface (external I/O expander 2139) and the coverification system 2140 is found on each of the boards 2145-2149. A major part of the control logic is found in the external I/O controller 2152, but other parts are found in the various internal I/O controllers (e.g., 2156 and 2158) and reconfigurable logic elements (e.g., FPGA chips 2159 and 2165). Again, for illustrative purposes, it is necessary to show only a portion of this control logic rather than the same repeated logic structure for every chip on every board. The portion of the coverification system 2140 within dotted line 2150 of FIG. 69 includes a subset of the control logic. This control logic is described with reference to FIGS. 71 and 73. FIG. 71 shows the portion of the control logic used for the data-out cycle. FIG. 73 shows the timing diagram of the data-out cycle.

A particular subset of the data-out control logic is shown in FIG. 71. It includes an external I/O controller 2300, a tri-state buffer 2301, an internal I/O controller 2302, a reconfigurable logic element 2303, and various buses and control lines that allow data transfer among them. This subset shows the logic needed for the data-out operation, in which data from the RCC hardware array is delivered to the RCC computing system and the external interface. The data-out control logic of FIG. 71 and the data-out timing diagram of FIG. 73 are described together.

In contrast to the two types of cycles in the data-in operation, the data-out operation involves only one type of cycle. The data-out control logic requires that data from the RCC hardware model be delivered sequentially (1) to the RCC computing system, and then (2) to the RCC computing system and the external interface (the target system and the external I/O devices). In particular, the data-out cycle requires that data from the internal nodes of the hardware model in the RCC hardware array be delivered first to the RCC computing system, and then to the RCC computing system and the external interface, for each chip in turn, one chip at a time within each board and one board at a time.

As in the data-in control logic, pointers are used to select (or gate) data from the internal nodes to the RCC computing system and the external interface. In the embodiment shown in FIGS. 71 and 73, a data-out pointer state machine 2319 generates five pointers H2S_PTR[4:0] on bus 2359 for both hardware-to-software data and hardware-to-external-interface data. The data-out pointer state machine 2319 is controlled by the DATA_XSFR and F_RD signals on line 2358. DATA_XSFR is at logic "1" whenever a data transfer between the RCC hardware array and the RCC computing system or the external interface is desired. F_RD, in contrast to F_WR, is at logic "1" whenever a read from the RCC hardware array is desired. When both DATA_XSFR and F_RD are at logic "1", the data-out pointer state machine 2319 can generate the proper H2S pointer signals in the proper programmed sequence. Other embodiments may use more (or fewer) pointers as the user design requires.

These H2S pointer signals are provided to gating logic. One set of inputs 2353-2357 to the gating logic is directed to several AND gates 2314-2318. The other set of inputs 2348-2352 is coupled to the internal nodes of the hardware model. Thus, AND gate 2314 has input 2348 from an internal node and input 2353 from H2S_PTR0; AND gate 2315 has input 2349 from an internal node and input 2354 from H2S_PTR1; AND gate 2316 has input 2350 from an internal node and input 2355 from H2S_PTR2; AND gate 2317 has input 2351 from an internal node and input 2356 from H2S_PTR3; and AND gate 2318 has input 2352 from an internal node and input 2357 from H2S_PTR4. Without the proper H2S_PTR pointer signal, an internal node cannot drive its data to either the RCC computing system or the external interface.

The respective outputs 2343-2347 of AND gates 2314-2318 are coupled to OR gates 2310-2313. Thus, AND gate output 2343 is coupled to an input of OR gate 2310; AND gate output 2344 is coupled to an input of OR gate 2311; AND gate output 2345 is coupled to an input of OR gate 2312; and AND gate output 2346 is coupled to an input of OR gate 2313. Note that output 2344 of AND gate 2315 is not coupled to an OR gate of its own; instead, it is coupled to OR gate 2311, which is also coupled to output 2345 of AND gate 2316. The other inputs 2360-2366 to OR gates 2310-2313 may be coupled to the outputs of other AND gates (not shown), which are themselves coupled to other internal nodes and H2S_PTR pointers. The use of these OR gates and their particular inputs depends on the user design and the configured hardware model. Thus, in other designs, more pointers may be used, and output 2344 of AND gate 2315 may be coupled to a different OR gate rather than OR gate 2311.
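The AND-OR gating described above reduces to a simple Boolean expression: each FD bus line carries the OR of several (internal node AND pointer) terms. The Python sketch below shows that relationship for one FD line. The function name and the list-of-pairs input are illustrative assumptions, and the sketch does not reproduce the specific gate-to-gate wiring of FIG. 71.

```python
def fd_line_value(terms):
    """Compute one FD line from (node_value, pointer) pairs feeding its OR gate.

    Each pair models one AND gate: the node value passes only while its
    H2S pointer is at logic 1. The OR gate merges all AND outputs.
    """
    value = 0
    for node_value, pointer in terms:
        value |= node_value & pointer
    return value
```

Because only one pointer is asserted at a time in the programmed sequence, several internal nodes can share one OR gate, and therefore one FD line, without contention.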

The outputs 2339-2342 of OR gates 2310-2313 are coupled to FD bus lines FD0, FD3, FD1, and FD4. In this particular example of a user design, only four output pin-out signals are delivered to the RCC computing system and the external interface. Thus, FD0 is coupled to the output of OR gate 2310; FD3 is coupled to the output of OR gate 2311; FD1 is coupled to the output of OR gate 2312; and FD4 is coupled to the output of OR gate 2313. The FD bus lines are coupled to local bus lines 2330-2333 via internal lines 2334-2338 in the internal I/O controller 2302. In this embodiment, local bus line 2330 is LD0, local bus line 2331 is LD3, local bus line 2332 is LD1, and local bus line 2333 is LD4.

To deliver the data on local bus lines 2330-2333 to the RCC computing system, these local bus lines are coupled to tri-state buffer 2301. In its normal state, tri-state buffer 2301 allows data to pass from local bus lines 2330-2333 to local bus 2320. In contrast, during data-in, data is allowed to pass from the RCC computing system to the RCC hardware array only when the CPU_IN signal is provided to the tri-state buffer 2301.

To deliver the data on local bus lines 2330-2333 to the external interface, lines 2321-2324 are provided. Line 2321 is coupled to line 2330 and to a latch (not shown) in the external I/O controller 2300; line 2322 is coupled to line 2331 and to a latch (not shown) in the external I/O controller 2300; line 2323 is coupled to line 2332 and to latch 2305 in the external I/O controller 2300; and line 2324 is coupled to line 2333 and to latch 2306 in the external I/O controller 2300.

The output of each of latches 2305 and 2306 is coupled to a buffer and then to the external interface, which in turn is coupled to the appropriate output pin-outs of the target system or external I/O devices. Thus, the output of latch 2305 is coupled to buffer 2307 and line 2327, and the output of latch 2306 is coupled to buffer 2308 and line 2328. Another output of another latch (not shown) may be coupled to line 2329. In this example, lines 2327-2329 correspond to wire 1, wire 4, and wire 3, respectively, of the target system or some external I/O device. Finally, for data transfer from the hardware model to the external interface, the hardware model of the user design is configured so that the internal node coupled to line 2350 corresponds to wire 3 on line 2329, the internal node coupled to line 2351 corresponds to wire 1 on line 2327, and the internal node coupled to line 2352 corresponds to wire 4 on line 2328. Similarly, wire 3 corresponds to LD3 on line 2331, wire 1 corresponds to LD1 on line 2332, and wire 4 corresponds to LD4 on line 2333.

A lookup table 2309 is coupled to the enable inputs of these latches 2305 and 2306. The lookup table 2309 is controlled by the F_RD signal on line 2367, which triggers the operation of lookup table address counter 2304. Each time the counter increments, its pointer enables a particular row of the lookup table 2309. If an entry (or bit) in that row is at logic "1", the LUT output line coupled to that entry enables its corresponding latch and drives the data to the external interface, and ultimately to the desired destination in the target system or some external I/O device. For example, LUT output line 2325 is coupled to the enable input of latch 2305, and LUT output line 2326 is coupled to the enable input of latch 2306.

In this example, rows 0-3 of the lookup table 2309 are programmed to enable the latches corresponding to the output pin-out wires for the internal nodes of chip m1. Similarly, rows 4-6 are programmed to enable the latches corresponding to the output pin-out wires for the internal nodes of chip0_1 (i.e., chip 0 on board 1). In row 4, bit 3 is at logic "1". In row 5, bit 1 is at logic "1". In row 6, bit 4 is at logic "1". All other entries or bit positions are at logic "0". For any given bit position of the lookup table, only one entry is at logic "1", because a single output pin-out wire cannot drive multiple I/O devices. In other words, each output pin-out internal node in the hardware model provides data to only a single wire coupled to the external interface.
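The lookup table entries given for chip0_1 can be written down as bit masks, which makes the "one logic 1 per bit position" constraint easy to check. In the Python sketch below, the table entries selected by the address counter are called rows; the 5-bit width, the dictionary layout, and both function names are assumptions, and the contents of entries 0-3 are left out because the text does not specify them.

```python
LUT_WIDTH = 5  # assumed width: bit positions 0..4 cover the bits named in the text

# Entries 4-6 as described for chip0_1; unspecified entries are treated as all zeros.
LUT_ROWS = {
    4: 1 << 3,  # bit 3 at logic 1
    5: 1 << 1,  # bit 1 at logic 1
    6: 1 << 4,  # bit 4 at logic 1
}


def enabled_bits(row):
    """Bit positions at logic 1 in the given row, i.e. the latches it enables."""
    word = LUT_ROWS.get(row, 0)
    return [bit for bit in range(LUT_WIDTH) if (word >> bit) & 1]


def at_most_one_row_per_bit():
    """Check that no bit position is set to logic 1 in more than one row."""
    for bit in range(LUT_WIDTH):
        hits = sum((word >> bit) & 1 for word in LUT_ROWS.values())
        if hits > 1:
            return False
    return True
```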

As mentioned above, the data-out control logic requires that the data in each reconfigurable logic element of each chip in the RCC hardware model be delivered sequentially (1) to the RCC computing system, and (2) to the RCC computing system and the external interface (the target system and the external I/O devices) together. The RCC computing system needs this data because it holds models of some I/O devices in software; for data not destined for one of the modeled I/O devices, the RCC computing system still needs to monitor it so that its internal state remains consistent with that of the hardware model in the RCC hardware array. In the example illustrated in FIGS. 71 and 73, only seven internal nodes are driven for output to the RCC computing system and the external interface. Two of these internal nodes are in chip m1, and the other five are in chip0_1 (i.e., chip 0 on board 1). Of course, other internal nodes in these chips and in other chips may be required for a particular user design, but FIGS. 71 and 73 illustrate only seven nodes.

During data transfer, the DATA_XSFR signal is logic "1". During this time, the local buses 2330-2333 are used to transmit data sequentially from each chip of each board in the RCC hardware array to the RCC computing system and the external interface. The DATA_XSFR and F_RD signals control the operation of the data-out pointer state machine, which generates the appropriate pointer signals H2S_PTR[4:0] to the appropriate gates leading to the output pin-out internal nodes. The F_RD signal also controls lookup table address counter 2304 for delivering internal node data to the external interface.

The internal nodes in chip m1 are handled first. When F_RD goes to logic "1" in the data transfer cycle, H2S_PTR0 in chip m1 goes to logic "1". This drives the data at those internal nodes in chip m1 that rely on H2S_PTR0 to the RCC computing system via tri-state buffer 2301 and local bus 2320. Lookup table address counter 2304 counts and points to row 0 of lookup table 2309 to latch the appropriate data from chip m1 into the external interface. When the F_RD signal goes to logic "1" again, the data at the internal nodes that can be driven by H2S_PTR1 is delivered to the RCC computing system and the external interface. H2S_PTR1 goes to logic "1" and, in response to the second F_RD signal, lookup table address counter 2304 counts and points to row 1 of lookup table 2309 to latch the appropriate data from chip m1 into the external interface.

Reconfigurable logic element 2303 (i.e., chip0_1, or chip 0 on board 1) is processed next. In this example, the data from the two internal nodes associated with H2S_PTR0 and H2S_PTR1 is delivered to the RCC computing system only. The data from the three internal nodes associated with H2S_PTR2, H2S_PTR3, and H2S_PTR4 is delivered to both the RCC computing system and the external interface.

When F_RD goes to logic "1", H2S_PTR0 of chip 2303 goes to logic "1". This drives the data at those internal nodes in chip 2303 that rely on H2S_PTR0 to the RCC computing system via tri-state buffer 2301 and local bus 2320. In this example, the internal node coupled to line 2348 relies on H2S_PTR0 on line 2353. When the F_RD signal goes to logic "1" again, the data at the internal nodes that can be driven by H2S_PTR1 is delivered to the RCC computing system. Here, the internal node coupled to line 2349 is affected. This data is driven to LD3 on lines 2331 and 2322.

When the F_RD signal goes to logic "1" again, H2S_PTR2 goes to logic "1" and the data at the internal node coupled to line 2350 is provided on LD3. This data is provided to both the RCC computing system and the external interface. Tri-state buffer 2301 allows the data to pass to local bus 2320 and then to the RCC computing system. For the external interface, this data is driven onto LD3 on lines 2331 and 2322 by the enabling H2S_PTR2 signal. In response to the F_RD signal, lookup table address counter 2304 counts and points to row 4 of lookup table 2309, so that the appropriate data from this internal node, coupled to line 2350, is latched onto line 2329 (wire3) at the external interface.

When the F_RD signal goes to logic "1" again, H2S_PTR3 goes to logic "1" and the data at the internal node coupled to line 2351 is provided on LD1. This data is provided to both the RCC computing system and the external interface. Tri-state buffer 2301 allows the data to pass to local bus 2320 and then to the RCC computing system. For the external interface, this data is driven onto LD1 on lines 2332 and 2323 by the enabling H2S_PTR3 signal. In response to the F_RD signal, lookup table address counter 2304 counts and points to row 5 of lookup table 2309, so that the appropriate data from this internal node, coupled to line 2351, is latched onto line 2327 (wire1) at the external interface.

When the F_RD signal goes to logic "1" again, H2S_PTR4 goes to logic "1" and the data at the internal node coupled to line 2352 is provided on LD4. This data is provided to both the RCC computing system and the external interface. Tri-state buffer 2301 allows the data to pass to local bus 2320 and then to the RCC computing system. For the external interface, this data is driven onto LD4 on lines 2333 and 2324 by the enabling H2S_PTR4 signal. In response to the F_RD signal, lookup table address counter 2304 counts and points to row 6 of lookup table 2309, so that the appropriate data from this internal node, coupled to line 2352, is latched onto line 2328 (wire4) at the external interface.
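The F_RD-driven sequence for chip0_1 can be summarized in a short behavioral sketch. It is not the data-out pointer state machine itself; it only assumes that each F_RD pulse advances one pointer, that every node value reaches the RCC computing system, and that a value also reaches the external interface when a lookup table row is associated with that pointer. The structure `CHIP0_1_STEPS` and the function `run_data_out` are hypothetical names introduced for illustration.

```python
# One entry per F_RD pulse for chip0_1 (chip 2303), as narrated in the text.
# "wire" is None when the data goes to the RCC computing system only.
CHIP0_1_STEPS = [
    {"pointer": "H2S_PTR0", "node_line": 2348, "lut_row": None, "wire": None},
    {"pointer": "H2S_PTR1", "node_line": 2349, "lut_row": None, "wire": None},
    {"pointer": "H2S_PTR2", "node_line": 2350, "lut_row": 4, "wire": "wire3"},
    {"pointer": "H2S_PTR3", "node_line": 2351, "lut_row": 5, "wire": "wire1"},
    {"pointer": "H2S_PTR4", "node_line": 2352, "lut_row": 6, "wire": "wire4"},
]


def run_data_out(steps, node_values):
    """Advance one pointer per F_RD pulse and record where each value goes."""
    to_computing_system = []  # every node value is delivered here
    to_external = {}          # wire name -> value latched at the external interface
    for step in steps:        # one loop iteration models one F_RD pulse
        value = node_values[step["node_line"]]
        to_computing_system.append((step["pointer"], value))
        if step["wire"] is not None:
            to_external[step["wire"]] = value
    return to_computing_system, to_external


# Example node values keyed by the line each internal node is coupled to.
values = {2348: 1, 2349: 0, 2350: 1, 2351: 0, 2352: 1}
seen_by_software, latched = run_data_out(CHIP0_1_STEPS, values)
```

With these example values, all five nodes are observed by the RCC computing system, while only wire3, wire1, and wire4 receive data at the external interface.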

This process of driving the data at the internal nodes of a chip first to the RCC computing system, and then to the RCC computing system and the external interface, continues sequentially for the other chips. First, the internal nodes of chip m1 are driven. Second, the internal nodes of chip0_1 (chip 2303) are driven. Next, the internal nodes of chip1_1, if any, are driven. This continues until the last nodes in the last chip of the last board have been driven. Thus, the internal nodes of chip7_8, if any, are driven. Finally, the internal nodes of chip m2, if any, are driven.
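The chip ordering described above may be sketched as follows. The sketch assumes eight boards of eight chips each and the naming convention chipX_Y for chip X of board Y, both inferred from the names chip0_1 and chip7_8; the function name `chip_service_order` is hypothetical.

```python
def chip_service_order(num_boards=8, chips_per_board=8):
    """Return the order in which chips drive out their internal node data.

    Assumption: boards are numbered 1..num_boards and chips 0..chips_per_board-1,
    with chip m1 served first and chip m2 served last.
    """
    order = ["chip_m1"]
    for board in range(1, num_boards + 1):
        for chip in range(chips_per_board):
            order.append(f"chip{chip}_{board}")
    order.append("chip_m2")
    return order


order = chip_service_order()
```

Chips that have no internal nodes to drive would simply be skipped in an actual sequence; the list only fixes the relative order.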

Although FIG. 71 shows the data-out control logic for driving only the internal nodes of chip 2303, other chips may also have internal nodes that need to be driven to the RCC computing system and the external interface. Regardless of the number of internal nodes, the data-out logic drives the data from one set of internal nodes in a chip to the RCC computing system and then, in another cycle, drives a different set of internal nodes in the same chip to the RCC computing system and the external interface. The data-out control logic then moves on to the next chip and performs the same two-step operation: it first drives the data designated for the RCC computing system, and then drives the data designated for the external interface to both the RCC computing system and the external interface. Even when data is intended for the external interface, the RCC computing system must know about it, because the RCC computing system holds a software model of the entire user design whose internal state information must be consistent with that of the hardware model in the RCC hardware array.

Board layout

The board layout of the coverification system according to one embodiment of the present invention will now be described with reference to FIG. 74. The boards may be installed in the RCC hardware array. The board layout is similar to that described with reference to FIGS. 8 and 36-44, and is described below.

In one embodiment, the RCC hardware array includes six boards. Board m1 is coupled to board 1, and board m2 is coupled to board 8. The arrangement and connections of board 1, board 2, board 3, and board 8 have been described with reference to FIGS. 8 and 36-44.

Board m1 contains chip m1. With respect to the other boards, the interconnect structure of board m1 is such that chip m1 is coupled through south interconnects to chip 0, chip 2, chip 4, and chip 6 of board 1. Similarly, board m2 contains chip m2. With respect to the other boards, the interconnect structure of board m2 is such that chip m2 is coupled through south interconnects to chip 0, chip 2, chip 4, and chip 6 of board 8.

Example

To illustrate the operation of one embodiment of the present invention, a hypothetical user circuit design will be used. In structured register transfer level (RTL) HDL code, the exemplary user circuit design is as follows:

module register (clock, reset, d, q);
  input clock, d, reset;
  output q;
  reg q;
  always @(posedge clock or negedge reset)
    if (~reset)
      q = 0;
    else
      q = d;
endmodule

module example;
  wire d1, d2, d3;
  wire q1, q2, q3;
  reg sigin;
  wire sigout;
  reg clk, reset;

  register reg1 (clk, reset, d1, q1);
  register reg2 (clk, reset, d2, q2);
  register reg3 (clk, reset, d3, q3);

  assign d1 = sigin ^ q3;
  assign d2 = q1 ^ q3;
  assign d3 = q2 ^ q3;
  assign sigout = q3;

  // a clock generator
  always
  begin
    clk = 0;
    #5;
    clk = 1;
    #5;
  end

  // a signal generator
  always
  begin
    #10;
    sigin = $random;
  end

  // initialization
  initial
  begin
    reset = 0;
    sigin = 0;
    #1;
    reset = 1;
    #5;
    $monitor($time, "%b, %b", sigin, sigout);
    #1000 $finish;
  end
endmodule

The code above is reproduced in FIG. 26. The particular functional details of this circuit design are not necessary to understand the present invention. The reader should understand, however, that the user generates this HDL code to design a circuit for simulation. The circuit represented by this code performs some function designed by the user in response to input signals and generates an output.
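Although the functional details are not essential, a reader who wishes to follow the signal values may find a cycle-level model helpful. The following is a minimal sketch in Python, not part of the specification, which assumes that reset has already been released and that sigin is stable at each rising clock edge. The function names `step` and `run` are hypothetical.

```python
def step(state, sigin):
    """Model one rising clock edge; all three registers load simultaneously."""
    q1, q2, q3 = state
    d1 = sigin ^ q3  # assign d1 = sigin ^ q3;
    d2 = q1 ^ q3     # assign d2 = q1 ^ q3;
    d3 = q2 ^ q3     # assign d3 = q2 ^ q3;
    return (d1, d2, d3)


def run(sigin_stream, state=(0, 0, 0)):
    """Apply one sigin value per clock edge and collect sigout (= q3)."""
    sigout = []
    for sigin in sigin_stream:
        state = step(state, sigin)
        sigout.append(state[2])
    return sigout


trace = run([1, 0, 0, 0])
```

Starting from the reset state (0, 0, 0), a single "1" on sigin takes three clock edges to reach q3, after which the XOR feedback from q3 affects all three register inputs.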

FIG. 27 shows the circuit diagram corresponding to the HDL code of FIG. 26. In most cases, the user actually draws a circuit diagram of this nature before expressing it in HDL form. Some schematic capture tools allow a schematic circuit diagram to be entered and, after processing, produce the usable code.

As shown in FIG. 28, the simulation system performs component type analysis. The HDL code originally presented in FIG. 26, which represents the user's particular circuit design, is now analyzed. The first few lines of code, beginning with "module register (clock, reset, d, q);" and ending with "endmodule" (the portion identified by reference number 900), form the register definition section.

The next few lines of code, reference number 907, represent the wire interconnection information. Wire variables in HDL, as known to those skilled in the art, are used to represent physical connections between structural entities such as gates. Because HDL is primarily used to model digital circuits, wire variables are necessary variables. Usually, "q" (e.g., q1, q2, q3) represents output wire lines and "d" (e.g., d1, d2, d3) represents input wire lines.

Reference number 908 shows "sigin", which is a test-bench output. Reference number 909 shows "sigout", which is a test-bench input.

Reference number 901 shows register components S1, S2, and S3. Reference number 902 shows combinational components S4, S5, S6, and S7. Note that combinational components S4-S7 have output variables d1, d2, and d3, which are inputs to the register components S1-S3. Reference number 903 shows clock component S8.

The next series of code line reference numbers shows the test-bench components. Reference number 904 shows test-bench component (driver) S9. Reference number 905 shows test-bench components (initialization) S10 and S11. Reference number 906 shows test-bench component (monitor) S12.

The component type analysis is summarized in the following table:

Component | Type
S1 | Register
S2 | Register
S3 | Register
S4 | Combinational
S5 | Combinational
S6 | Combinational
S7 | Combinational
S8 | Clock
S9 | Test-bench (driver)
S10 | Test-bench (initialization)
S11 | Test-bench (initialization)
S12 | Test-bench (monitor)

Based on the component type analysis, the system generates a software model for the entire circuit and a hardware model for the register and combinational components. S1-S3 are register components and S4-S7 are combinational components. These components are modeled in hardware so that a user of the simulation system can either simulate the entire circuit in software, or simulate in software and selectively accelerate in hardware. In addition, the user can emulate the circuit with a target system while retaining software control to start, stop, inspect values, and assert input values cycle by cycle.
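The outcome of the component type analysis, and the resulting choice of which components receive a hardware model, can be expressed as a small sketch. The dictionary mirrors the component table of this example; the names `COMPONENT_TYPES`, `HARDWARE_TYPES`, and `partition` are hypothetical and are not taken from the specification.

```python
# Component types for the example circuit, as listed in the component table.
COMPONENT_TYPES = {
    "S1": "register", "S2": "register", "S3": "register",
    "S4": "combinational", "S5": "combinational",
    "S6": "combinational", "S7": "combinational",
    "S8": "clock",
    "S9": "test-bench (driver)",
    "S10": "test-bench (initialization)",
    "S11": "test-bench (initialization)",
    "S12": "test-bench (monitor)",
}

# Only register and combinational components are given a hardware model.
HARDWARE_TYPES = {"register", "combinational"}


def partition(component_types):
    """Return (software_model, hardware_model) as ordered component lists."""
    # The software model always covers the entire circuit.
    software = sorted(component_types, key=lambda name: int(name[1:]))
    hardware = [n for n in software if component_types[n] in HARDWARE_TYPES]
    return software, hardware


software_model, hardware_model = partition(COMPONENT_TYPES)
```

Here the software model contains all of S1-S12, while the hardware model contains only S1-S7, consistent with the description above.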

FIG. 29 shows the signal network analysis of the same structured RTL-level HDL code. As shown, S8, S9, S10, and S11 are modeled or provided in software. S9 is essentially the test-bench process that generates the sigin signal, and S12 is essentially the test-bench monitor process that receives the sigout signal. In this example, S9 generates a random sigin to simulate the circuit. Registers S1 to S3 and combinational components S4 to S7, however, are modeled in both hardware and software.

At the hardware/software boundary, the system allocates memory space for the various residual signals (i.e., q1, q2, q3, clk, sigin, and sigout) that are used to interface the software model to the hardware model. The memory space allocation is shown in the following table:

Signal | Memory address space
q1 | REG
q2 | REG
q3 | REG
clk | CLK
sigin | S2H
sigout | H2S
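The allocation in the table above can be represented as a simple mapping from boundary signal to address space. This is only an illustrative sketch; the names `ADDRESS_SPACE` and `signals_in` are hypothetical, and reading S2H and H2S as "software to hardware" and "hardware to software" is an assumption based on the direction of sigin and sigout.

```python
# Boundary signals and the address space each one is allocated to.
# Assumption: S2H carries software-to-hardware data, H2S the reverse.
ADDRESS_SPACE = {
    "q1": "REG",
    "q2": "REG",
    "q3": "REG",
    "clk": "CLK",
    "sigin": "S2H",
    "sigout": "H2S",
}


def signals_in(space):
    """List the boundary signals allocated to a given address space."""
    return sorted(sig for sig, sp in ADDRESS_SPACE.items() if sp == space)
```

For example, `signals_in("REG")` returns the three register outputs, and `signals_in("S2H")` returns only sigin.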

FIG. 30 shows the software/hardware partition result for this example circuit design. FIG. 30 is a more realistic illustration of the software/hardware partition. The software side 910 is coupled to the hardware side 912 through the software/hardware boundary 911 and the PCI bus 913.

The software side 910 contains the software kernel and is controlled by it. In general, the kernel is the main control loop that controls the overall operation of the SEmulation system. So long as any test-bench process is active, the kernel evaluates the active test-bench components, evaluates the clock components, detects clock edges in order to update registers and memories as well as propagate combinational logic data, and advances the simulation time. Although the kernel resides on the software side, some of its operations or statements can be executed in hardware, because a hardware model exists for those statements and operations. Thus, the software controls both the software and hardware models.

The software side 910 contains the entire model of the user circuit, including S1-S12. The software/hardware boundary portion of the software side includes the I/O buffers, or address spaces, S2H, CLK, H2S, and REG. The driver test-bench process S9 is coupled to the S2H address space, the monitor test-bench process S12 is coupled to the H2S address space, and the clock generator S8 is coupled to the CLK address space. The output signals q1-q3 of registers S1-S3 are assigned to the REG space.

The hardware model 912 contains a model of the combinational components S4-S7, which reside on the pure hardware side. In the software/hardware boundary portion of the hardware model 912, sigout, sigin, the register outputs q1-q3, and the software clock 196 are implemented.

In addition to the model of the user circuit design, the system generates the software clock and the address pointers. The software clock provides signals to the enable inputs of registers S1-S3. As described above, the software clock according to the present invention eliminates race conditions and hold-time violation issues. When a clock edge is detected in software on the primary clock, the detection logic triggers the corresponding detection logic in hardware. In time, the clock edge register 916 generates an enable signal to the register enable inputs to gate in whatever data is present at the register inputs.
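The benefit of gating every register with a common enable can be illustrated with a simplified two-phase update. This sketch is not the software clock circuit of the specification; it only assumes that, when the enable is asserted, every register samples the value already settled at its input before any output changes. The class `EnabledRegister` and the function `clock_edge` are hypothetical names.

```python
class EnabledRegister:
    """A register whose output changes only when the common enable fires."""

    def __init__(self):
        self.d = 0  # data waiting at the register input
        self.q = 0  # stored output


def clock_edge(registers):
    """Model the enable generated on a detected clock edge."""
    sampled = [reg.d for reg in registers]     # phase 1: sample every input
    for reg, value in zip(registers, sampled):  # phase 2: update every output
        reg.q = value


# Two registers in series: r2 takes its input from the output of r1.
r1, r2 = EnabledRegister(), EnabledRegister()
r1.d = 1
r2.d = r1.q            # input settled before the edge (still 0)
clock_edge([r1, r2])
first_edge = (r1.q, r2.q)

r2.d = r1.q            # combinational path settles again (now 1)
clock_edge([r1, r2])
second_edge = (r1.q, r2.q)
```

Because all inputs are sampled before any output is updated, the new value of r1 cannot race through to r2 within the same edge, which is the kind of hold-time problem the common enable is meant to avoid.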

The address pointer 194 is also shown for illustrative and conceptual purposes. Address pointers are actually implemented in each FPGA chip and allow the data to be delivered selectively and sequentially to its destination.

The combinational components S4-S7 are also coupled to the register components S1-S3, sigin, and sigout. These signals travel on the I/O bus 915 to and from the PCI bus.

The complete hardware model prior to the mapping, placement, and routing steps is shown in FIG. 31, excluding the address pointers. The system has not yet mapped the model to specific chips. Registers S1-S3 are provided, coupled to the I/O bus and to the combinational components S4-S6. Combinational component S7 is simply the output q3 of register S3. The sigin, sigout, and software clock 920 are also modeled.

Once the hardware model has been determined, the system maps, places, and routes the model into one or more chips. This particular example could actually be implemented in a single Altera FLEX 10K chip, but for pedagogic purposes this example assumes that two chips are required to implement the hardware model. FIG. 32 shows one particular model-to-chip partition result for this example.

In FIG. 32, the complete model (except for the I/O and the clock edge register) is shown with the chip boundary represented by a dotted line. This result is produced by the compiler of the SEmulation system before the final configuration file is generated. Thus, the hardware model requires at least three wires between the two chips, for wire lines 921, 922, and 923. To minimize the number of pins/wires needed between the two chips (chip 1 and chip 2), either another model-to-chip partition should be generated or a multiplexing scheme should be used.

Analyzing the particular partition result shown in FIG. 32, the number of wires between the two chips can be reduced by moving the sigin wire line 923 from chip 2 to chip 1. FIG. 33 shows this partition. Although the particular partition of FIG. 33 appears to be a better partition than that of FIG. 32 based on the number of wires alone, this example assumes that the SEmulation system has selected the partition of FIG. 32 after the mapping, placement, and routing operations have been performed. The resulting partition of FIG. 32 will be used as the basis for generating the configuration file.
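The effect of a partition on the number of inter-chip wires can be checked by counting the nets that span both chips. The sketch below derives its netlist from the example HDL code, but the assignment of S4-S6 to d1-d3 and both chip placements are hypothetical; they are chosen only so that the counts match the three wires and the reduced count discussed above, and they are not the actual partitions of FIGS. 32 and 33.

```python
# Netlist of the example design: net -> (driver, [loads]).
# "SIGIN" stands for the point where sigin enters the hardware model.
NETS = {
    "sigin": ("SIGIN", ["S4"]),
    "q1": ("S1", ["S5"]),
    "q2": ("S2", ["S6"]),
    "q3": ("S3", ["S4", "S5", "S6", "S7"]),
    "d1": ("S4", ["S1"]),
    "d2": ("S5", ["S2"]),
    "d3": ("S6", ["S3"]),
}


def cut_nets(placement):
    """Return the nets whose driver and loads sit on more than one chip."""
    cut = []
    for net, (driver, loads) in NETS.items():
        chips = {placement[driver]} | {placement[load] for load in loads}
        if len(chips) > 1:
            cut.append(net)
    return sorted(cut)


# Hypothetical placement with sigin entering on chip 2.
PLACEMENT_A = {"S1": 1, "S4": 1, "S2": 1, "S5": 1,
               "S3": 2, "S6": 2, "S7": 2, "SIGIN": 2}
# Same placement with the sigin entry point moved to chip 1.
PLACEMENT_B = dict(PLACEMENT_A, SIGIN=1)
```

Under these assumptions, `cut_nets(PLACEMENT_A)` reports three inter-chip nets and `cut_nets(PLACEMENT_B)` reports two, which mirrors the trade-off between the two partitions described in the text.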

FIG. 34 shows the logic patching operation for the same hypothetical example, in which the final realization in two chips is shown. The system used the partition result of FIG. 32 to generate the configuration file. The address pointers, however, are not shown for simplicity. The two FPGA chips 930 and 940 are shown. Chip 930 includes, among other elements, a partitioned portion of the user circuit design, a TDM unit 931 (receiver side), the software clock 932, and I/O bus 933. Chip 940 includes, among other elements, a partitioned portion of the user circuit design, a TDM unit 941 for the transmission side, the software clock 942, and I/O bus 943. The TDM units 931 and 941 were described with reference to FIGS. 9A, 9B, and 9C.

Chips 930 and 940 have interconnect wires 944 and 945 that couple the hardware model together. These two interconnect wires are part of the interconnect shown in FIG. 8. Referring to FIG. 8, one such interconnect is interconnect 611, located between chip F32 and chip F33. In one embodiment, the maximum number of wires/pins for each interconnect is 44. In FIG. 34, the modeled circuit needs only two wires/pins between chips 930 and 940.

Chips 930 and 940 are coupled to the bank bus 90. Because only two chips are implemented, both chips may be in the same bank or each may reside in a different bank. Ultimately, one chip is coupled to one bank bus and the other chip is coupled to a different bank bus, so that the output at the FPGA interface is the same as the output at the PCI interface.

The foregoing description of preferred embodiments of the invention has been presented for purposes of illustration and description and is not intended to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to those skilled in the art, who may make various modifications without departing from the scope and spirit of the invention. Accordingly, the invention is limited only by the claims appended hereto.

Claims

A method of generating an on-demand VCD file of a modeled design, comprising the steps of:

selecting a simulation session range starting at simulation time t0 and ending at simulation time t3;

selecting a simulation target range starting at simulation time t1 and ending at simulation time t2, wherein simulation time t1 is equal to or greater than simulation time t0 and simulation time t2 is equal to or less than simulation time t3;

generating a VCD file of the modeled design for the selected simulation target range; and

accessing the VCD file directly from simulation time t1 to debug the modeled design.

The method of claim 1, further comprising the steps of:

providing primary inputs to the modeled design for evaluation; and

recording a simulation history during the simulation session range.

The method of claim 2, further comprising the steps of:

processing the simulation history; and

evaluating the processed simulation history in the modeled design from simulation time t0 to simulation time t2.

The method of claim 3, wherein the step of generating the VCD file comprises:

generating evaluated results from the modeled design based on the processed simulation history; and

storing the evaluated results for the simulation target range in the VCD file.

The method of claim 4, further comprising the steps of:

compressing the primary inputs; and

recording the compressed primary inputs as the simulation history.

The method of claim 4, wherein the processing step comprises:

decompressing the compressed primary inputs; and

providing the decompressed primary inputs to the modeled design for evaluation as the processed simulation history.

The method of claim 4, wherein the recording step comprises:

recording the primary inputs as the simulation history.

The method of claim 1, further comprising the steps of:

storing state information of the modeled design at simulation time t0 in a first file; and

storing state information of the modeled design at simulation time t3 in a second file.

An electronic design automation system for verifying a user design, comprising:

a computing system including a memory and a central processing unit for modeling the user design in software;

an internal bus system coupled to the computing system;

reconfigurable hardware logic coupled to the internal bus system for modeling at least a portion of the user design in hardware;

control logic coupled to the internal bus system for controlling data transfer between the reconfigurable hardware logic and the computing system; and

VCD on-demand logic for recording a simulation history for a selected simulation session range and for dumping state information from the hardware model into a VCD file for a selected simulation target range, wherein the simulation target range is within the simulation session range.

The system of claim 9, wherein the on-demand VCD logic comprises:

first range selection logic for selecting the simulation session range starting at simulation time t0 and ending at simulation time t3;

second range selection logic for selecting the simulation target range starting at simulation time t1 and ending at simulation time t2, wherein simulation time t1 is greater than or equal to simulation time t0 and simulation time t2 is less than simulation time t3;

dump logic for generating a VCD file of the hardware-modeled design for the selected simulation target range; and

connection logic for accessing the VCD file directly from simulation time t1 to debug the user design.

The system of claim 10, wherein the on-demand VCD logic further comprises:

a test bench process for providing primary inputs to the hardware-modeled design for evaluation; and

recording logic in the computing system for recording the simulation history for the simulation session range.

The system of claim 11, wherein the on-demand VCD logic further comprises:

processing logic in the computing system for processing the simulation history; and

evaluation logic in the reconfigurable hardware logic for evaluating the processed simulation history in the hardware-modeled design from simulation time t0 to simulation time t2.

The system of claim 12,

wherein the dump logic dumps evaluation results from the hardware-modeled design into the VCD file, based on the processed simulation history, for the simulation target range.

The system of claim 13, wherein the recording logic further comprises:

compression logic for compressing the primary inputs; and

write logic for recording the compressed primary inputs as the simulation history.

The system of claim 14, wherein the processing logic further comprises:

decompression logic for decompressing the compressed primary inputs; and

data transfer logic for passing the decompressed primary inputs into the hardware-modeled design as the processed simulation history for evaluation.

The system of claim 13,

wherein the recording logic further comprises write logic for recording the primary inputs as the simulation history.

The system of claim 9,

further comprising state storage logic for storing state information of the hardware-modeled design at simulation time t0 in a first file and storing state information of the hardware-modeled design at simulation time t3 in a second file.

An on-demand VCD system for providing evaluated information for a selected simulation target range of simulation times, the evaluation taking place in a modeled design, comprising:

first logic for selecting a simulation session range starting at simulation time t0 and ending at simulation time t3;

second logic for selecting a simulation target range starting at simulation time t1 and ending at simulation time t2, wherein simulation time t1 is greater than or equal to simulation time t0 and simulation time t2 is less than or equal to simulation time t3;

generation logic for generating a VCD file of the evaluated information for the selected simulation target range; and

connection logic for accessing the VCD file directly from simulation time t1 to debug the modeled design.

The system of claim 18, further comprising:

compression logic for receiving and compressing primary input data during the simulation session range; and

decompression logic for decompressing the compressed primary input data and passing the decompressed primary input data into the modeled design for evaluation.

The system of claim 19, wherein the generation logic comprises

a dump circuit for dumping the evaluated information into the VCD file, the evaluated information being generated by evaluating the decompressed primary inputs in the modeled design.