KR100928134B1

KR100928134B1 - VCD-on-demand system and method

Info

Publication number: KR100928134B1
Application number: KR1020037002218A
Authority: KR
Inventors: Ping-Sheng Tseng; Yogesh Kumar Goel; Quincy Kun-Hsu Shen
Original assignee: Verisity Design, Inc.
Priority date: 2001-08-14
Filing date: 2001-08-14
Publication date: 2009-11-25
Also published as: CA2420027A1; IL154481A0; EP1417577A1; WO2003017099A1; IL154481A; IL160392A0; JP2005500618A; CA2420027C; JP4102752B2; EP1417577A4; CN1491385A; CN1308819C; KR20040028598A

Abstract

The present invention relates to so-called VCD on demand. In a typical system, an EDA tool that incorporates VCD-on-demand technology has the following high-level attributes: (1) RCC-based parallel simulation history compression and recording, (2) RCC-based parallel simulation history compression and VCD file generation, and (3) on-demand software regeneration for a selected simulation target range and design review without rerunning the simulation. Each of these attributes is described in detail. When the user selects a simulation session range, the RCC system records a highly compressed version of the primary inputs from the test bench process. The user then selects a narrower region within the simulation session range, called the simulation target range, for more focused analysis. The RCC system dumps the hardware state information (i.e., primary inputs) of the hardware model into a VCD file. The RCC system then lets the user proceed directly to viewing the VCD file from the beginning of the simulation target range, without rerunning the complete simulation from the beginning of the simulation session range.

Description

VCD-on-demand system and method {VCD-ON-DEMAND SYSTEM AND METHOD}

This application is related to a continuation-in-part (CIP) of U.S. patent application Ser. No. 09/144,222, filed with the United States Patent and Trademark Office (USPTO) on August 31, 1998.

The present invention relates generally to electronic design automation (EDA). More specifically, the present invention relates to improvements in value change dump (VCD) handling that accelerate design debug sessions.

In general, electronic design automation (EDA) is a computer-based tool implemented on various workstations to provide designers with automated or semi-automated tools for designing and verifying the user's custom circuit designs. EDA is generally used to create, analyze, and compile any electronic design for the purpose of simulation, emulation, prototyping, execution, or computing. EDA technology can also be used to develop systems (i.e., target systems) that will use the user-designed subsystem or component. The end result of EDA is a modified and enhanced design, typically in the form of discrete integrated circuits or printed circuit boards, that improves on the original design while preserving its intent.

The value of software simulation of a circuit design followed by hardware emulation is recognized in the various industries that use and benefit from EDA technology. Nevertheless, current software simulation and hardware emulation/acceleration are cumbersome for the user because these processes are separate and independent. For example, within a single debug/test session the user may want to simulate or debug the circuit design in software for part of the time, use those results to accelerate the simulation with a hardware model for another part of the time, and then return to software simulation later. Furthermore, because internal register and combinational logic values change as simulation time advances, the user should be able to monitor these changes even when they occur in the hardware model during hardware acceleration/emulation.

Co-simulation arose from the need to address some of the cumbersome aspects of using two separate and independent processes, pure software simulation and pure hardware emulation/acceleration, and to make the overall system more user-friendly. However, co-simulation still has many drawbacks: (1) co-simulation systems require manual partitioning, (2) co-simulation uses two loosely coupled engines, (3) co-simulation speed is as slow as software simulation speed, and (4) co-simulation systems encounter race conditions.

First, partitioning between software and hardware is done manually rather than automatically, which further burdens the user. In essence, co-simulation requires the user to partition the design (starting at the behavioral level, then RTL, then gate level) into very large functional blocks and to test the resulting models across software and hardware. Such a constraint demands some degree of sophistication from the user.

Second, co-simulation systems use two loosely coupled and independent engines, which raises inter-engine synchronization, coordination, and flexibility issues. Co-simulation requires synchronizing two different verification engines: software simulation and hardware emulation. Even though the software simulation side is coupled to the hardware acceleration side, only external pin-out data is available for inspection and loading. Values inside the modeled circuit, at the register and combinational logic level, are not available for easy inspection or for downloading from one side to the other, which limits the usefulness of these co-simulation systems. Typically, the user may have to re-simulate the whole design when switching from software simulation to hardware acceleration. Thus, if the user wants to switch between software simulation and hardware emulation during a single debug session while inspecting register and combinational logic values, a co-simulation system does not provide that capability.

Third, co-simulation is as slow as software simulation. Co-simulation requires synchronizing two different verification engines: software simulation and hardware emulation. Each engine has its own control mechanism for driving the simulation or emulation. As a result, synchronization between software and hardware holds overall performance down to a speed as low as that of software simulation. The additional overhead of coordinating the two engines slows the co-simulation system further.

Fourth, co-simulation systems suffer setup time, hold time, and clock glitch problems caused by race conditions among clock signals. Co-simulation uses hardware-driven clocks, which may reach different logic elements at different times because of differing wire lengths. When these logic elements are supposed to evaluate data together, some evaluate the data in one time period and others in another, which makes the evaluation results unpredictable.

Another problem faced by a typical designer is the relatively slow process of isolating and identifying design problems during debugging. While the designer's own limited problem-solving skills may contribute to this slow pace, the main cause is the simulator itself. Not only is the simulator slow because of its software-based engine, but debugging with it also requires rerunning the entire simulation. These problems are discussed further below.

A typical ASIC chip designer uses a simulator to debug a design; that is, the designer simulates or tests the design with a test bench process to observe, among other things, its response to various stimuli. Based on an examination of some key nodes and outputs of the design, the designer can generally determine whether the design has a defect. Of course, a design in its early stages almost always has some problems.

However, locating a bug is not easy. For a fairly large and complex design (e.g., several million gates or more), the simulator may have to step through millions of simulation time steps before one of the bugs manifests itself. Clearly, for such a design, the designer cannot be expected to review every simulation time step. Such a task would simply be impossible within the short time allotted in a product design's development cycle.

Once the simulator has detected the general presence of a bug, the actual bug must be pinpointed so that the defective part of the design can be corrected. When (i.e., at which simulation time step) did the problem occur? Did it occur early in the simulation (e.g., t10), in the middle (e.g., t1000), or late (e.g., t1000000)? And where (i.e., at which location in the circuit design) is the problem, so that a fix can be applied? Initially, the designer does not know exactly when (at which simulation time step) the bug occurred, but can make a reasonable guess. The designer has a few ways to get to the simulation time at which the problem is believed to lie. The simulator can assist in this task by providing a VCD (value change dump) file through one of two conventional methods: full VCD and selective VCD.

In the full VCD method, the simulator saves the entire simulation, from simulation time t0 to the end of the simulation, as a VCD file. The designer then analyzes this VCD file to isolate the bug. The designer can make a reasonable guess about the bug's general location and examine that location with fine-grained stepping. For example, if the designer suspects that the bug occurred somewhere between simulation times t350 and t400, the designer fast-forwards to a simulation time just before the suspected region, such as t345, and then steps very carefully through the suspected region (i.e., t345 to t400).
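For readers unfamiliar with the format, the following is a minimal sketch of what a value change dump contains: a header that declares each signal, followed by a timestamp line (`#t`) and the new value of every signal that changed at that time. The function name `dump_vcd`, the single `top` scope, and the one-bit signals are assumptions made for illustration; this is not the simulator described in the source.

```python
def dump_vcd(signals, trace, timescale="1ns"):
    """Return VCD text for `trace`, a list of {signal: 0 or 1} dicts, one per time step."""
    # Short printable identifier codes, one per signal: '!', '"', '#', ...
    ids = {name: chr(33 + i) for i, name in enumerate(signals)}
    lines = [f"$timescale {timescale} $end", "$scope module top $end"]
    for name in signals:
        lines.append(f"$var wire 1 {ids[name]} {name} $end")
    lines += ["$upscope $end", "$enddefinitions $end"]

    last = {}
    for t, values in enumerate(trace):
        # Only signals whose value differs from the previous step are written.
        changes = [f"{values[n]}{ids[n]}" for n in signals if last.get(n) != values[n]]
        if changes:
            lines.append(f"#{t}")
            lines.extend(changes)
        last = dict(values)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [{"clk": 0, "q": 0}, {"clk": 1, "q": 0}, {"clk": 0, "q": 1}]
    print(dump_vcd(["clk", "q"], demo))
```

Because every change of every dumped signal produces a line, a full dump of a large design over millions of time steps grows very quickly.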

However, to reach that simulation time, the designer must rerun the entire simulation with the VCD file from the beginning (i.e., t0), regardless of where the bug occurred. If the initial guess about the bug's location is wrong, the designer must make another guess and rerun the simulation from the beginning again. For a design with more than a million gates and more than a million simulation time steps, this debugging process of rerunning the simulation from the beginning wastes an enormous amount of time, and each wrong guess makes it worse.

Moreover, a design with more than a million gates and more than a million simulation time steps requires a great deal of disk space. Full VCD files of roughly 100 GB are common. Such a VCD file is too large for most file systems, and it is also too large for most waveform viewers to handle.

In addition, the simulation process runs about three times slower with full VCD. After each simulation time step (or whenever a value changes), full VCD requires the state values to be written out. Accessing storage takes time, so the simulation must pause at each simulation time until the storage operation completes. Today, the full VCD method is no longer practical.

In the selective VCD method, the entire simulation is not saved; instead, the simulator saves only the portion of the simulation that the designer selects. However, selective VCD does not relieve the designer of having to rerun the entire simulation from the beginning. Initially, the designer runs the simulation and observes that the design has a problem. The designer then makes a guess as to where the problem lies. If the guess is that the problem occurs somewhere between simulation times t350 and t400, the designer reruns the simulation and instructs the simulator to save that simulation time range as a VCD file. The designer can then examine the VCD file corresponding to the guess. If the guess turns out to be wrong, the designer must make another guess, instruct the simulator to save the new simulation range as a VCD file, and rerun the simulation. The designer then analyzes the VCD file again.

Unlike the full VCD method, selective VCD does not require much disk space because the entire simulation is not saved. However, selective VCD still requires rerunning the entire simulation. If the designer guesses wrong about the bug's location, the simulation must be rerun again to save the new simulation range to a VCD file. In either case, the selective VCD method still wastes time, and wrong guesses make the waste worse.
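The cost of wrong guesses under selective VCD can be made concrete with a little arithmetic. The sketch below assumes, for illustration only, that every guess forces a rerun from t0 through the end of the guessed window and that each time step costs the same; the function names and the example windows are hypothetical.

```python
def rerun_steps(guesses):
    """Time steps simulated when each (start, end) guess reruns from t0 through `end`."""
    return sum(end for _start, end in guesses)


def dumped_steps(guesses):
    """Time steps actually written to VCD files across all guesses."""
    return sum(end - start for start, end in guesses)


if __name__ == "__main__":
    # First guess t350..t400 misses the bug; second guess t600..t650 finds it.
    guesses = [(350, 400), (600, 650)]
    print(rerun_steps(guesses), "steps simulated for", dumped_steps(guesses), "steps dumped")
```

Under these assumptions, two guesses simulate 1050 time steps in order to inspect only 100 of them, and the gap widens as the bug sits later in the run.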

Accordingly, a need exists in the industry for a system or method that addresses the problems raised by currently known simulation systems, hardware emulation systems, hardware accelerators, co-simulation systems, and co-verification systems.

One embodiment of the present invention provides VCD files on demand without rerunning the simulation. The VCD-on-demand feature is incorporated into an RCC system, which includes an RCC computing system and an RCC hardware accelerator. The RCC computing system contains the computing resources the user needs to simulate the user's entire software-modeled design in software and to control hardware acceleration of the hardware-modeled portion of the design. The RCC hardware accelerator contains a reconfigurable array of logic elements (e.g., FPGAs) that models at least a portion of the user's design in hardware so that the user can accelerate the debugging process. The RCC computing system is tightly coupled to the RCC hardware accelerator through a software clock.

VCD on demand lets the user select a portion of the simulation history for detailed debugging analysis without rerunning the simulation. The RCC system lets the user select two simulation time ranges: a broader "simulation session range" and a narrower subset of it called the "simulation target range." A VCD file is generated for this narrower simulation target range. After the simulation session range is selected, the RCC system fast-simulates the design over the entire session range by providing primary inputs from the test bench process to the hardware model in the RCC hardware accelerator for evaluation. These same primary inputs are also compressed and recorded in a simulation history file. With this simulation history file, the RCC system can reproduce any portion of the simulation within the simulation session range at any time.
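The recording step can be sketched as follows. This is a minimal illustration, not the RCC implementation: the source does not state the compression scheme or file layout at this point, so `zlib` and a JSON payload are stand-ins, and the names `record_history` and `load_history` are hypothetical.

```python
import json
import zlib


def record_history(primary_inputs, session_start, session_end):
    """Compress the per-step primary inputs for [session_start, session_end] into one blob."""
    window = primary_inputs[session_start:session_end + 1]
    payload = json.dumps({"start": session_start, "inputs": window}).encode("utf-8")
    return zlib.compress(payload)


def load_history(blob):
    """Inverse of record_history: return (session_start, list of per-step inputs)."""
    data = json.loads(zlib.decompress(blob).decode("utf-8"))
    return data["start"], data["inputs"]


if __name__ == "__main__":
    # Hypothetical test bench stimulus: one dict of primary inputs per time step.
    stimulus = [{"d": t % 2, "en": 1} for t in range(1000)]
    blob = record_history(stimulus, 100, 899)
    start, window = load_history(blob)
    print(start, len(window), len(blob))
```

Only the stimulus is stored, not the internal state at every step, which is why the history can stay small while still allowing any part of the session range to be reproduced.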

At the beginning of the simulation session range, the RCC system saves the hardware state information of the design at that point so that the user can perform offline simulation if needed. At the end of the simulation session range, the RCC system again saves the hardware state information of the design at that point so that the user can, at any time, quickly return to where the simulation left off and continue simulating beyond this session range without rewinding the simulation.
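The two saved states can be pictured as checkpoints taken on entry to and exit from the session range. The sketch below uses a toy software model in place of the hardware model; `ToyModel`, `run_session`, and the dictionary-based state are assumptions for illustration only.

```python
import copy


class ToyModel:
    """Stand-in for the hardware model: an 8-bit register that accumulates its input."""

    def __init__(self):
        self.state = {"acc": 0}

    def step(self, value):
        self.state["acc"] = (self.state["acc"] + value) & 0xFF
        return self.state["acc"]


def run_session(model, inputs, session_start, session_end):
    """Run steps 0..session_end; snapshot state entering session_start and after session_end."""
    checkpoints = {}
    for t in range(session_end + 1):
        if t == session_start:
            checkpoints["start"] = copy.deepcopy(model.state)
        model.step(inputs[t])
    checkpoints["end"] = copy.deepcopy(model.state)
    return checkpoints


if __name__ == "__main__":
    inputs = [1] * 20
    cps = run_session(ToyModel(), inputs, 5, 9)

    # Later: continue past the session range by loading the saved end state.
    resumed = ToyModel()
    resumed.state = copy.deepcopy(cps["end"])
    for t in range(10, 20):
        resumed.step(inputs[t])
    print(cps, resumed.state)
```

The start checkpoint is what makes replay inside the session range possible without starting from t0; the end checkpoint is what lets simulation continue past the session range.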

When the user selects the simulation target range, the RCC system fast-simulates to the beginning of the target range by decompressing the compressed primary inputs in the simulation history file and providing the decompressed inputs to the RCC hardware accelerator for evaluation. Within the simulation target range, the RCC system dumps the primary outputs, or evaluation results, from the hardware model into a VCD file stored on the system disk. At the end of the simulation target range, the RCC system stops dumping.
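The replay-then-dump sequence described above can be sketched in software. In the source the replay runs on the RCC hardware accelerator; here a toy 4-bit counter stands in for the hardware model, and the blob layout, the function names, and the single dumped signal (identifier `!`) are assumptions made for illustration.

```python
import json
import zlib


def step(state, inp):
    """Toy hardware model: a 4-bit counter that increments when `en` is 1."""
    return (state + inp["en"]) & 0xF


def vcd_on_demand(history_blob, start_state, session_start, target_start, target_end):
    """Replay recorded inputs from the session start; emit value changes only in the target range."""
    inputs = json.loads(zlib.decompress(history_blob))  # one entry per step of the session range
    state = start_state  # hardware state saved at the start of the session range
    lines, last = [], None
    for t in range(session_start, target_end + 1):
        state = step(state, inputs[t - session_start])
        # Before target_start this is a fast replay with no dumping at all.
        if t >= target_start and state != last:
            lines.append(f"#{t}")
            lines.append(f"b{state:04b} !")
            last = state
    return lines


if __name__ == "__main__":
    session = [{"en": 1 if t % 2 == 0 else 0} for t in range(100, 200)]
    blob = zlib.compress(json.dumps(session).encode("utf-8"))
    print("\n".join(vcd_on_demand(blob, 0, 100, 150, 153)))
```

Choosing a different target range only changes the last two arguments; the recorded history and the saved start state are reused, so no test bench rerun from t0 is needed.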

Once the VCD file has been generated, the user can examine it in detail with a waveform viewer to debug the design. This is accomplished without rerunning the simulation. If the bug does not lie in this simulation target range, the user can select another simulation target range within the same simulation session range. Once the new simulation target range is selected, the RCC system generates a new VCD file in the manner described above. The user can then analyze this new VCD file to isolate the bug.

Once the bug has been isolated and fixed, the user can continue simulating past the current simulation session range into the next one. The hardware state information saved at the end of the current simulation session range is loaded into the RCC system, and the user can then start simulating. The VCD-on-demand feature is available both online and offline.

These and other embodiments are fully discussed and illustrated in the detailed description below.

FIG. 1 shows a high-level overview of one embodiment of the present invention, including the workstation, the reconfigurable hardware emulation model, the emulation interface, and the target system coupled to a PCI bus.

FIG. 2 shows one particular usage flow diagram of the present invention.

FIG. 3 shows a high-level diagram of the software compilation and hardware configuration during compile time and run time, in accordance with one embodiment of the present invention.

FIG. 4 shows a flow diagram of the compilation process, which includes generating the software/hardware models and the software kernel code.

FIG. 5 shows the software kernel that controls the overall SEmulation system.

FIG. 6 shows a method of mapping hardware models to reconfigurable boards through mapping, placement, and routing.

FIG. 7 shows the connectivity matrix for the FPGA array shown in FIG. 8.

FIG. 8 shows one embodiment of the 4x4 FPGA array and its interconnections.

FIGS. 9(A), 9(B), and 9(C) show one embodiment of the time division multiplexed (TDM) circuit, in which a group of wires is coupled together in a time-multiplexed fashion so that one pin, instead of a plurality of pins, can be used for that group of wires in a chip. FIG. 9(A) gives an overview of the pin-out problem, FIG. 9(B) shows the TDM circuit on the transmitting side, and FIG. 9(C) shows the TDM circuit on the receiving side.

FIG. 10 shows the SEmulation system architecture in accordance with one embodiment of the present invention.

FIG. 11 shows one embodiment of the address pointer of the present invention.

FIG. 12 shows a state transition diagram of the address pointer initialization for the address pointer of FIG. 11.

FIG. 13 shows one embodiment of the MOVE signal generator, which derivatively generates the various MOVE signals for the address pointer.

FIG. 14 shows the chain of multiplexed address pointers in each FPGA chip.

FIG. 15 shows one embodiment of the multiplexed cross-chip address pointer chain in accordance with one embodiment of the present invention.

FIG. 16 shows a flow diagram of the clock/data network analysis that is critical to the software clock implementation and to the evaluation of logic components in the hardware model.

FIG. 17 shows a basic building block of the hardware model in accordance with one embodiment of the present invention.

FIGS. 18(A) and 18(B) show the register model implementation for latches and flip-flops.

FIG. 19 shows one embodiment of the clock edge detection logic in accordance with one embodiment of the present invention.

FIG. 20 shows a four-state finite state machine for controlling the clock edge detection logic of FIG. 19 in accordance with one embodiment of the present invention.

FIG. 21 shows the interconnect, JTAG, FPGA bus, and global signal pin designations for each FPGA chip in accordance with one embodiment of the present invention.

FIG. 22 shows one embodiment of the FPGA controller between the PCI bus and the FPGA array.

FIG. 23 shows a more detailed illustration of the CTRL_FPGA unit and data buffer discussed with respect to FIG. 22.

FIG. 24 shows the 4x4 FPGA array, its relationship to the FPGA, and its expansion capability.

FIG. 25 shows one embodiment of the hardware start-up method.

FIG. 26 shows the HDL code for one example of a user circuit design to be modeled and simulated.

FIG. 27 shows a circuit diagram that symbolically represents the circuit design of the HDL code of FIG. 26.

FIG. 28 shows the component type analysis for the HDL code of FIG. 26.

FIG. 29 shows a signal network analysis of structured RTL HDL code based on the user's custom circuit design shown in FIG. 26.

FIG. 30 shows the software/hardware partition result for the same hypothetical example.

FIG. 31 shows a hardware model for the same hypothetical example.

FIG. 32 shows one particular hardware model-to-chip partition result for the same hypothetical example of a user's custom circuit design.

FIG. 33 shows another particular hardware model-to-chip partition result for the same hypothetical example of a user's custom circuit design.

FIG. 34 shows the logic patching operation for the same hypothetical example of a user's custom circuit design.

FIGS. 35(A) to 35(D) illustrate the principle of "hops" and interconnections with two examples.

FIG. 36 shows an overview of the FPGA chip used in the present invention.

FIG. 37 shows the FPGA interconnect buses on the FPGA chip.

FIGS. 38(A) and 38(B) show side views of the FPGA board connection scheme in accordance with one embodiment of the present invention.

FIG. 39 shows a direct-neighbor and one-hop six-board interconnect layout of the FPGA array in accordance with one embodiment of the present invention.

FIGS. 40(A) and 40(B) show the FPGA inter-board interconnect scheme.

FIGS. 41(A) to 41(F) show top views of the board interconnect connectors.

FIG. 42 shows the on-board connectors and some components of a representative FPGA board.

FIG. 43 shows a legend for the connectors of FIGS. 41(A) to 41(F) and FIG. 42.

FIG. 44 shows a direct-neighbor and one-hop dual-board interconnect layout of the FPGA array in accordance with another embodiment of the present invention.

FIG. 45 shows a workstation with multiple processors in accordance with another embodiment of the present invention.

FIG. 46 shows an environment, in accordance with another embodiment of the present invention, in which multiple users share a single simulation/emulation system on a time-shared basis.

FIG. 47 shows a high-level structure of the simulation server in accordance with one embodiment of the present invention.

FIG. 48 shows the architecture of the simulation server in accordance with one embodiment of the present invention.

FIG. 49 shows a flow diagram of the simulation server.

FIG. 50 shows a flow diagram of the job swapping process.

도 51은 디바이스 드라이버와 리컨피규러블 하드웨어 유닛 사이의 신호를 나타내며, 51 illustrates a signal between a device driver and a reconfigurable hardware unit,

도 52는 다양한 레벨의 우선 순위를 가진 다중 잡(job)을 조정하기 위한 시뮬레이션 서버의 시간 공유 특징을 도시하며, FIG. 52 illustrates a time sharing feature of a simulation server for coordinating multiple jobs with various levels of priority,

도 53은 디바이스 드라이버와 리컨피규러블 하드웨어 유닛 사이의 통신 핸드쉐이크 신호를 나타내고,53 illustrates a communication handshake signal between a device driver and a reconfigurable hardware unit,

도 54는 통신 핸드쉐이크 프로토콜의 상태도를 나타내며, 54 shows a state diagram of the communication handshake protocol,

도 55는 본 발명의 일실시예에 따라서 시뮬레이션 서버의 클라이언트-서버 모델의 개략도를 나타내고, 55 shows a schematic diagram of a client-server model of a simulation server according to an embodiment of the present invention,

FIG. 56 shows a high-level block diagram of the simulation system for implementing memory mapping in accordance with one embodiment of the present invention;

FIG. 57 shows a more detailed block diagram of the memory mapping aspect of the simulation system, with supporting components for the memory finite state machine (MEMFSM) and the evaluation finite state machine (EVALFSM_x) for each FPGA logic device;

FIG. 58 shows a state diagram of the finite state machine of the MEMFSM in the CTRL_FPGA unit in accordance with one embodiment of the present invention;

FIG. 59 shows a state diagram of the finite state machine in each FPGA chip in accordance with one embodiment of the present invention;

FIG. 60 shows the memory read data double buffer;

FIG. 61 shows the simulation write/read cycle in accordance with one embodiment of the present invention;

FIG. 62 shows a timing diagram of the simulation data transfer operation when the DMA read operation occurs after the CLK_EN signal;

FIG. 63 shows a timing diagram of the simulation data transfer operation when the DMA read operation occurs near the end of the EVAL period;

FIG. 64 shows a typical user design implemented as a PCI add-on card;

FIG. 65 shows a hardware/software coverification system using an ASIC as the device under test;

FIG. 66 shows a typical coverification system using an emulator, where the device under test is programmed into the emulator;

FIG. 67 shows a simulation system in accordance with one embodiment of the present invention;

FIG. 68 shows a coverification system without external I/O devices in accordance with one embodiment of the present invention, where the RCC computing system contains software models of the various I/O devices and the target system;

FIG. 69 shows a coverification system with actual external I/O devices and the target system in accordance with another embodiment of the present invention;

FIG. 70 shows a more detailed logic diagram of the data-in portion of the control logic in accordance with one embodiment of the present invention;

FIG. 71 shows a more detailed logic diagram of the data-out portion of the control logic in accordance with one embodiment of the present invention;

FIG. 72 shows a timing diagram of the data-in portion of the control logic;

FIG. 73 shows a timing diagram of the data-out portion of the control logic;

FIG. 74 shows a board layout of the RCC hardware array in accordance with one embodiment of the present invention;

FIG. 75(A) shows an example of a shift register circuit that will be used to explain the hold time and clock glitch problems;

FIG. 75(B) shows a timing diagram of the shift register circuit shown in FIG. 75(A) to illustrate hold time;

FIG. 76(A) shows the same shift register circuit of FIG. 75(A) placed across multiple FPGA chips;

FIG. 76(B) shows a timing diagram of the shift register circuit shown in FIG. 76(A) to illustrate a hold time violation;

FIG. 77(A) shows an example of a logic circuit that will be used to illustrate the clock glitch problem;

FIG. 77(B) shows a timing diagram of the logic circuit of FIG. 77(A) to illustrate the clock glitch problem;

FIG. 78 shows a prior-art timing adjustment technique for solving the hold time violation problem;

FIG. 79 shows a prior-art timing resynthesis technique for solving the hold time violation problem;

FIG. 80(A) shows an original latch, and FIG. 80(B) shows a timing-insensitive and glitch-free (TIGF) latch in accordance with one embodiment of the present invention;

FIG. 81(A) shows an original design flip-flop, and FIG. 81(B) shows a timing-insensitive and glitch-free design-type flip-flop in accordance with one embodiment of the present invention;

FIG. 82 shows a timing diagram of the trigger mechanism of the timing-insensitive and glitch-free latch and flip-flop in accordance with one embodiment of the present invention.

These figures will be discussed below in connection with various aspects and embodiments of the present invention.

FIG. 83 shows a high-level view of the components of an RCC system incorporating one embodiment of the present invention;

FIG. 84 shows several simulation time periods that illustrate on-demand VCD operation in accordance with one embodiment of the present invention.

The above objects and description of the present invention will be more clearly understood from the following text and the accompanying drawings.

This specification describes the various embodiments of the present invention through, and within the context of, a system called the "SEmulator" or "SEmulation" system. Throughout the specification, the terms "SEmulator system", "SEmulator", or simply "system" may be used. These terms refer to various apparatus and method embodiments in accordance with the present invention for any combination of four operating modes: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis, including their respective setup or pre-processing stages. At other times, the term "SEmulation" may be used; it refers to the novel processes described herein.

Similarly, terms such as "reconfigurable computing (RCC) array system" or "RCC computing system" refer to the portion of the simulation/coverification system that contains the main processor, the software kernel, and the software model of the user design. Terms such as "reconfigurable hardware array" or "RCC hardware array" refer to the portion of the simulation/coverification system that contains the hardware model of the user design and that, in one embodiment, contains the array of reconfigurable logic elements.


This specification also refers to a "user" and to the user's "circuit design" or "electronic design". The "user" is a person who uses the SEmulation system through its interfaces; this may be the designer of a circuit, or a tester/debugger who played little or no part in the design process. The "circuit design" or "electronic design" is a custom-designed system or component, whether software or hardware, that the SEmulation system can model for test/debug purposes. In many cases, the "user" also designed the "circuit design" or "electronic design".

This specification also uses the terms "wire", "wire line", "wire/bus line", and "bus". These terms refer to various electrically conducting lines. Each line may be a single wire between two points or several wires between points. The terms are interchangeable in that a "wire" may comprise one or more conducting lines and a "bus" may also comprise one or more conducting lines.

This specification is presented in outline form. First, it presents a general overview of the SEmulator system, including an overview of the four operating modes and the hardware implementation schemes. Second, it provides a detailed discussion of the SEmulator system. In some cases, a figure may show a variation of an embodiment shown in an earlier figure. In those cases, like reference numerals are used for like components/units/processes. The outline of the specification is as follows:

I. Overview

A. Simulation/Hardware Acceleration Mode

B. Emulation with Target System Mode

C. Post-Simulation Analysis Mode

D. Hardware Implementation Schemes

E. Simulation Server

F. Memory Simulation

G. Coverification System

II. System Description

III. Simulation/Hardware Acceleration Mode

IV. Emulation with Target System Mode

V. Post-Simulation Analysis Mode

VI. Hardware Implementation Schemes

A. Overview

B. Address Pointer

C. Gated Data/Clock Network Analysis

D. FPGA Array and Control

E. Alternative Embodiment Using Denser FPGA Chips

F. TIGF Logic Devices

VII. Simulation Server

VIII. Memory Simulation

IX. Coverification System

X. Examples

I. Overview

Various embodiments of the present invention have four general modes of operation: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation, and (4) post-simulation analysis. The various embodiments include systems and methods for these modes with at least some of the following features:

(1) a software and hardware model with a single tightly coupled simulation engine, the software kernel, which controls the software and hardware models cycle by cycle; (2) automatic component type analysis during the compilation process for software and hardware model generation and partitioning; (3) the ability to switch, cycle by cycle, among software simulation mode, simulation through hardware acceleration mode, in-circuit emulation mode, and post-simulation analysis mode; (4) full hardware model visibility through software regeneration of combinational components; (5) double-buffered clock modeling with software clocks and gated clock/data logic to avoid race conditions; and (6) the ability to re-simulate or hardware-accelerate the user's circuit design from any selected point in a past simulation session. The end result is a flexible and fast simulator/emulator system and method with full HDL functionality and emulator execution performance.

A. Simulation/Hardware Acceleration Mode

Through automatic component type analysis, the SEmulator system can model the user's custom circuit design in both software and hardware. The entire user circuit design is modeled in software, whereas the evaluation components (i.e., register components and combinational components) are modeled in hardware. Hardware modeling is facilitated by the component type analysis.

A software kernel, resident in the main memory of the general-purpose processor system, serves as the main program of the SEmulator system and controls the overall operation and execution of its various modes and features. As long as any test bench process is active, the kernel evaluates active test bench components, evaluates clock components, detects clock edges in order to update registers and memories and to propagate combinational logic data, and advances the simulation time. This software kernel is what tightly couples the simulator engine to the hardware acceleration engine. For the software/hardware boundary, the SEmulator system provides a number of I/O address spaces: REG (register), CLK (software clock), S2H (software to hardware), and H2S (hardware to software).
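The order of operations in the kernel loop described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the class `ToyKernel`, its single counter register, the `half_period` parameter, and the `run` helper are hypothetical and do not come from the specification. The sketch only mirrors the stated sequence: take test bench input, evaluate the clock, detect an edge, update registers, propagate combinational data, and advance time.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKernel:
    """Hypothetical stand-in for the kernel's per-step evaluation loop."""
    half_period: int = 1          # assumed: clock toggles every time unit
    time: int = 0
    clk: int = 0
    counter: int = 0              # one modeled register
    trace: list = field(default_factory=list)

    def evaluate_clock(self) -> bool:
        """Evaluate the clock component and report a rising edge."""
        new_clk = (self.time // self.half_period) % 2
        rising = self.clk == 0 and new_clk == 1
        self.clk = new_clk
        return rising

    def step(self, enable: int) -> None:
        # Test bench input for this step arrives as `enable`.
        if self.evaluate_clock():
            # Clock edge detected: update the register.
            self.counter += enable
        # Propagate combinational logic derived from the register.
        is_even = self.counter % 2 == 0
        self.trace.append((self.time, self.clk, self.counter, is_even))
        # Advance simulation time.
        self.time += 1


def run(stimulus):
    """Loop for as long as the (toy) test bench supplies stimulus."""
    kernel = ToyKernel()
    for enable in stimulus:
        kernel.step(enable)
    return kernel
```

In this toy model the register changes only on steps where a rising edge is detected, which is the property the paragraph attributes to the kernel's edge detection.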

The SEmulator can selectively switch among the four modes of operation. The user of the system can start simulation, stop simulation, assert input values, inspect values, single-step cycle by cycle, and switch back and forth among the four modes. For example, the system can simulate the circuit in software for a period of time, accelerate the simulation through the hardware model, and then return to software simulation mode.

Generally, the SEmulation system lets the user "see" every modeled component, whether it is modeled in software or in hardware. For a variety of reasons, combinational components are not as "visible" as registers, so obtaining combinational component data is difficult. One reason is that the FPGAs used on the reconfigurable board to model the hardware portion of the user's circuit design typically model combinational components as look-up tables (LUTs) rather than as actual combinational components. Accordingly, the SEmulation system reads the register values and then regenerates the combinational components. Because regeneration incurs some overhead, it is not performed all the time; it is done only at the user's request.

Because the software kernel resides on the software side, a clock edge detection mechanism is provided to trigger the generation of a so-called software clock that drives the enable inputs of the various registers in the hardware model. Timing is strictly controlled through a double-buffered circuit implementation so that the software clock enable signal enters the register models before the data to these models. Once the data inputs to the register models have stabilized, the software clock gates the data synchronously to ensure that all data values are gated in together without any risk of hold time violations.
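A software analogy may help show why gating all data values together matters. The sketch below is not the double-buffered circuit of the specification; it is a hypothetical three-stage shift register in Python, with the function names `naive_update` and `double_buffered_update` invented for illustration. The naive version lets a new value race through several stages in one clock, which is the software counterpart of a hold time violation; the double-buffered version first captures every data input and only then commits them all at once.

```python
def naive_update(regs, data_in):
    """Update stages in place, front to back.

    Each stage reads the NEW value of the stage before it, so the input
    races through the whole chain in a single clock.
    """
    regs = list(regs)
    prev = data_in
    for i in range(len(regs)):
        regs[i] = prev
        prev = regs[i]
    return regs


def double_buffered_update(regs, data_in):
    """Two-phase update.

    Phase 1: capture every stage's data input while all outputs are stable.
    Phase 2: commit all captured values together (the 'software clock').
    """
    buffered = [data_in] + list(regs[:-1])   # phase 1: sample inputs
    return list(buffered)                    # phase 2: gate them in together
```

With an initial state of `[0, 0, 0]` and an input of 1, the naive update fills every stage with 1, whereas the double-buffered update shifts the 1 into the first stage only.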

Software simulation is also fast because the system logs all input values but only selected register values/states, so overhead is minimized by reducing the number of I/O operations. The user can select the logging frequency.

B. Emulation with Target System Mode

The SEmulation system can emulate the user's circuit within its target system environment. The target system outputs data to the hardware model for evaluation, and the hardware model in turn outputs data to the target system. In addition, the software kernel controls the operation of this mode so that the user still has the option to start, stop, assert values, single-step, and switch from one mode to another.

C. Post-Simulation Analysis Mode

Logs provide the user with a historical record of the simulation session. Unlike known simulation systems, the SEmulation system does not log every single value, internal state, or value change during the simulation process. It logs only selected values and states based on a logging frequency (i.e., one record every N cycles). During the post-simulation stage, if the user wants to examine various data around a point X in the simulation session just completed, the user goes to one of the logged points, say logged point Y, that is closest to and temporally prior to point X. The user then simulates from the selected logged point Y to the desired point X to obtain the simulation results.
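The log-then-replay idea above can be sketched in a few lines. This is a toy model under stated assumptions: the "design" is a single integer state updated by an arbitrary formula, and the names `advance`, `run_with_logging`, and `state_at` are hypothetical. What it does mirror from the text is that inputs are available for every cycle, state is recorded only once every N cycles, and a point X is reached by starting at the nearest earlier logged point Y and simulating forward.

```python
import bisect


def advance(state, inputs, start, stop):
    """Toy design: fold the inputs of cycles [start, stop) into the state."""
    for cycle in range(start, stop):
        state = (state * 31 + inputs[cycle]) % 1_000_003
    return state


def run_with_logging(inputs, n):
    """Run the whole session, recording the state once every n cycles."""
    log = {}
    state = 0
    for cycle in range(len(inputs)):
        if cycle % n == 0:
            log[cycle] = state            # state just before this cycle runs
        state = advance(state, inputs, cycle, cycle + 1)
    return log


def state_at(log, inputs, x):
    """Return (y, state at cycle x), replaying from logged point y <= x."""
    points = sorted(log)
    y = points[bisect.bisect_right(points, x) - 1]   # nearest point at or before x
    return y, advance(log[y], inputs, y, x)
```

For a 100-cycle session logged every 16 cycles, asking for cycle 50 replays only cycles 48 and 49 instead of all 50.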

An on-demand VCD system is also described below. It lets the user view any simulation target range (i.e., span of simulation time) on demand without rerunning the simulation.

D. Hardware Implementation Schemes

The SEmulation system implements an array of FPGA chips on a reconfigurable board. Based on the hardware model, the SEmulation system partitions, maps, places, and routes each selected portion of the user's circuit design onto the FPGA chips. Thus, for example, a 4x4 array of 16 chips may model a large circuit spread across these 16 chips. The interconnect scheme allows each chip to access any other chip within two "jumps" or links.

Each FPGA chip implements an address pointer for each of the I/O address spaces (i.e., REG, CLK, S2H, H2S). The address pointers associated with a particular address space are chained together. Thus, during data transfer, word data in each chip is sequentially selected from/to the main FPGA bus and PCI bus, one word at a time for the selected address space in each chip, and one chip at a time, until the desired word data have been accessed for that address space. This sequential selection of word data is accomplished by a propagating word selection signal. The word selection signal travels through the address pointer in a chip and then propagates to the address pointer in the next chip, continuing until the last chip is reached or the system initializes the address pointer.
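The chained address pointers behave like a token that visits every word of one chip before moving to the next chip. The sketch below models only that visiting order; the data layout (a list of per-chip word lists) and the function names `sequential_words` and `read_until` are assumptions made for illustration and say nothing about the actual pointer circuitry.

```python
def sequential_words(chips):
    """Yield (chip_index, word_index, word) in word-selection order.

    `chips` holds, for one address space, the list of words in each chip.
    The selection signal walks the words of a chip, then moves on to the
    next chip in the chain.
    """
    for chip_index, words in enumerate(chips):
        for word_index, word in enumerate(words):
            yield chip_index, word_index, word


def read_until(chips, wanted):
    """Step the selection signal until `wanted` = (chip, word) is reached.

    Returns the word and the number of selection steps taken.
    """
    for steps, (chip, index, word) in enumerate(sequential_words(chips), start=1):
        if (chip, index) == wanted:
            return word, steps
    raise KeyError(wanted)
```

With three chips holding two, one, and three words respectively, the second word of the third chip is reached on the fifth selection step.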

The FPGA bus system on the reconfigurable board operates at twice the PCI bus bandwidth but at half the PCI bus speed. The FPGA chips are therefore separated into banks to make use of the larger-bandwidth bus. The throughput of this FPGA bus system can keep up with the throughput of the PCI bus system, so no performance is lost by reducing the bus speed. Expansion is possible through piggyback boards that extend the bank length.
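The throughput claim above follows from simple arithmetic. As a sketch, assume that "bandwidth" refers to the data path width; the symbols $W$ (PCI bus width in bits) and $f$ (PCI clock frequency), and the numeric values in the comments, are illustrative assumptions that do not appear in this section.

```latex
\begin{align*}
T_{\mathrm{PCI}}  &= W \cdot f \\
T_{\mathrm{FPGA}} &= (2W) \cdot \tfrac{f}{2} = W \cdot f = T_{\mathrm{PCI}}
\end{align*}
% Illustrative values (assumed): W = 32 bits, f = 33 MHz
% T_PCI  = 32 bits x 33 MHz   = 1056 Mbit/s = 132 MB/s
% T_FPGA = 64 bits x 16.5 MHz = 1056 Mbit/s = 132 MB/s
```

Doubling the width exactly offsets halving the clock, which is why the slower bus does not become the bottleneck.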

In another embodiment of the present invention, denser FPGA chips are used. Examples of such denser chips are the Altera 10K130V and 10K250V chips. Using these chips changes the board design so that only four FPGA chips are used per board instead of eight less dense FPGA chips (Altera 10K100).

The FPGA array of the simulation system is provided on the motherboard through a particular board interconnect structure. Each chip may have eight sets of interconnections, arranged as adjacent direct-neighbor interconnects (i.e., N[73:0], W[73:0], E[73:0]) and one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]), excluding the local bus connections, within a single board and across different boards. Each chip can be interconnected directly to its adjacent neighbor chips, or in one hop to a non-adjacent chip located above, below, left, or right. In the X direction (east-west), the array is a torus. In the Y direction (north-south), the array is a mesh.
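The difference between the torus (X) and mesh (Y) directions shows up at the array edges. The following sketch computes direct neighbors only, for an assumed 4x4 array; the function name `direct_neighbors`, the (row, column) coordinates, the choice of row 0 as the north edge, and the "S" label for the southern neighbor are all assumptions made for illustration.

```python
def direct_neighbors(row, col, rows=4, cols=4):
    """Return the direct neighbors of the chip at (row, col).

    East-west links wrap around (torus); north-south links do not (mesh).
    Row 0 is assumed to be the north edge.
    """
    neighbors = {
        "E": (row, (col + 1) % cols),   # wraps from last column to first
        "W": (row, (col - 1) % cols),   # wraps from first column to last
    }
    if row > 0:
        neighbors["N"] = (row - 1, col)
    if row < rows - 1:
        neighbors["S"] = (row + 1, col)
    return neighbors
```

A corner chip therefore still has both an east and a west neighbor, but only one neighbor in the north-south direction.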

The interconnects can couple logic devices and other components within a single board. In addition, inter-board connectors are provided to couple these boards and interconnects together across different boards, to carry signals between (1) the PCI bus, via the motherboard, and the array boards, and (2) any two array boards.

The motherboard connector connects a board to the motherboard and hence to the PCI bus, power, and ground. For some boards, the motherboard connector is not used for direct connection to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are directly connected to the motherboard, while the remaining boards 2, 4, and 6 rely on their neighbor boards for motherboard connectivity. Thus, every other board is directly connected to the motherboard, and the interconnects and local buses of these boards are coupled together via inter-board connectors arranged solder-side to component-side. PCI signals are routed through only one of the boards (typically the first board). Power and ground are applied to the other motherboard connectors for those boards. Placed solder-side to component-side, the various inter-board connectors allow communication among the PCI bus components, the FPGA logic devices, the memory devices, and various simulation system control circuits.

E. Simulation Server

In another embodiment of the present invention, a simulation server is provided to allow multiple users to access the same reconfigurable hardware unit. In one system configuration, multiple workstations across a network, or multiple users/processes in a non-network environment, can access the same server-based reconfigurable hardware unit to review/debug the same or different user circuit designs. The access is accomplished through a time-sharing process in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware model access among the scheduled users. In one scenario, each user accesses the server to map his or her separate user design to the reconfigurable hardware model for the first time, in which case the system compiles the design to generate the software and hardware models, performs the clustering operation, performs place-and-route operations, generates a bitstream configuration file, and reconfigures the FPGA chips in the reconfigurable hardware unit to model the hardware portion of the user's design. Once a user has accelerated his or her design using the hardware model and downloaded the hardware state to his or her own memory for software simulation, the hardware unit can be released for access by another user.

The server gives multiple users or processes access to the reconfigurable hardware unit for acceleration and hardware state swapping purposes. The simulation server includes the scheduler, one or more device drivers, and the reconfigurable hardware unit. The scheduler in the simulation server is based on a preemptive round robin algorithm. The server scheduler includes a simulation job queue table, a priority sorter, and a job swapper. The restore and playback function of the present invention facilitates the non-network multiprocessing environment as well as the network multi-user environment: previous checkpoint state data can be downloaded, and the entire simulation state associated with that checkpoint can be restored for playback debugging or cycle-by-cycle stepping.
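A minimal sketch of time-sliced round robin scheduling with priorities is shown below. It is a simplification under stated assumptions, not the server's scheduler: all jobs are assumed to be queued at the start, priorities are static with a larger number meaning higher priority, and a job is swapped out only when its time slice expires. The function name `schedule` and the job tuple layout are hypothetical.

```python
from collections import deque


def schedule(jobs, time_slice):
    """Return the execution order as a list of (job_name, units_run).

    `jobs` is a list of (name, priority, work_units). Jobs are sorted into
    per-priority queues; within a priority level they take turns, each
    running for at most `time_slice` units before being swapped out.
    """
    queues = {}
    for name, priority, work in jobs:
        queues.setdefault(priority, deque()).append([name, work])

    timeline = []
    for priority in sorted(queues, reverse=True):    # highest priority first
        queue = queues[priority]
        while queue:
            job = queue.popleft()
            run = min(time_slice, job[1])
            timeline.append((job[0], run))
            job[1] -= run
            if job[1] > 0:
                queue.append(job)    # slice expired: swap out, requeue at the back
    return timeline
```

With jobs A (priority 1, 3 units), B (priority 2, 2 units), and C (priority 1, 1 unit) and a slice of 2, B runs first, then A and C alternate until A finishes its last unit.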

F. 메모리 시뮬레이션F. Memory Simulation

본 발명의 메모리 시뮬레이션 또는 메모리 맵핑 실시예는 리컨피규러블 하드웨어 유닛내의 FPGA 칩 어레이로 프로그래밍된, 사용자 설계의 구성된 하드웨어 모델과 연동되는 다양한 메모리 블럭을 관리하기 위하여 시뮬레이션 시스템에 대하여 효과적인 방안을 제공한다. 본 발명의 상기 메모리 시뮬레이션 실시예는 사용자 설계와 연동되는 수많은 메모리 블럭이 로직 소자 내부 대신에 시뮬레이션 시스템 의 SRAM 메모리 소자들로 맵핑되며, 그것은 사용자 설계를 구성하고 모델링하는데 이용된다. 메모리 시뮬레이션 시스템은 메모리 상태 머신, 평가 상태 머신, 및 제어와 인터페이스를 위해 연동되는 로직: (1) 메인 컴퓨팅 시스템 및 그와 연관된 메모리 시스템, (2) 시뮬레이션 시스템내의 FPGA 버스에 결합된 SRAM 메모리 소자, 및 (3) 디버깅되는 구성(configured) 및 프로그래밍된 사용자 설계를 포함하는 FPGA 로직 소자를 포함한다. 본 발명의 일 실시예에 따른 메모리 시뮬레이션 시스템의 동작은 일반적으로 이하와 같다. 시뮬레이션 기록(write)/판독(read) 사이클은 3주기-DMA 데이터 전송, 평가 및 메모리 액세스로 분할된다. The memory simulation or memory mapping embodiment of the present invention provides an effective approach for a simulation system to manage various memory blocks associated with a user-configured hardware model programmed with an FPGA chip array in a reconfigurable hardware unit. In the memory simulation embodiment of the present invention, a number of memory blocks associated with a user design are mapped to SRAM memory elements of the simulation system instead of inside the logic element, which is used to construct and model the user design. The memory simulation system comprises logic that interoperates with the memory state machine, evaluation state machine, and control: (1) the main computing system and its associated memory system, (2) an SRAM memory device coupled to the FPGA bus within the simulation system, And (3) FPGA logic elements that include a debugged configured and programmed user design. Operation of the memory simulation system according to an embodiment of the present invention is generally as follows. The simulation write / read cycle is divided into three cycles-DMA data transfer, evaluation and memory access.

The FPGA logic device side of the memory simulation system includes an evaluation state machine, an FPGA bus driver, and a logic interface for each memory block N that interfaces with the user's own memory interface in the user design. Together these handle (1) data evaluation among the FPGA logic devices, and (2) write/read memory access between the FPGA logic devices and the SRAM memory devices. In conjunction with the FPGA logic device side, the FPGA I/O controller side includes a memory state machine and interface logic to handle DMA, write, and read operations between (1) the main computing system and the SRAM memory devices, and (2) the FPGA logic devices and the SRAM memory devices.

G. Coverification System

One embodiment of the present invention is a coverification system that includes a reconfigurable computing system (hereinafter "RCC computing system") and a reconfigurable computing hardware array (hereinafter "RCC hardware array"). In some embodiments, the target system and the external I/O devices are not necessary because they can be modeled in software. In other embodiments, the target system and the external I/O devices are actually coupled to the coverification system to gain speed and to use actual data rather than simulated test bench data. Thus, a coverification system can incorporate the RCC computing system and the RCC hardware array, along with other functionality, to debug the software portion and the hardware portion of a user's design while using the actual target system and/or I/O devices.

The RCC computing system also contains clock logic (for clock edge detection and software clock generation), test bench processes for testing the user design, and device models for any I/O device that the user decides to model in software instead of using an actual physical I/O device. Of course, the user may decide to use actual I/O devices as well as modeled I/O devices in one debug session. The software clock is provided to the external interface to function as the external clock source for the target system and the external I/O devices. The use of this software clock provides the synchronization necessary to process incoming and outgoing data. Because the software clock generated by the RCC computing system is the time base for the debug session, simulated and hardware-accelerated data are synchronized with any data delivered between the coverification system and the external interface.

When the target system and the external I/O devices are coupled to the coverification system, pin-out data must be provided between the coverification system and its external interface. The coverification system contains control logic that provides traffic control between (1) the RCC computing system and the RCC hardware array, and (2) the external interface (which is coupled to the target system and the external I/O devices) and the RCC hardware array. Because the RCC computing system holds a model of the entire design in software, including the portion of the user design modeled in the RCC hardware array, the RCC computing system must also have access to all data that passes between the external interface and the RCC hardware array. The control logic ensures that the RCC computing system has access to these data.

II. System Description

FIG. 1 shows a high-level overview of one embodiment of the present invention. A workstation 10 is coupled to a reconfigurable hardware model 20 and an emulation interface 30 via a PCI bus system 50. The reconfigurable hardware model 20 is coupled to the emulation interface 30 via the PCI bus 50 as well as via cable 61. A target system 40 is coupled to the emulation interface 30 via cable 60. In other embodiments, the in-circuit emulation set-up 70 (shown in the dotted-line box), which comprises the emulation interface 30 and the target system 40, is not provided when emulation of the user's circuit design within the target system's environment is not desired during a particular test/debug session. Without the in-circuit emulation set-up 70, the reconfigurable hardware model 20 communicates with the workstation 10 via the PCI bus 50.

In combination with the in-circuit emulation set-up 70, the reconfigurable hardware model 20 imitates or mimics the user's circuit design of some electronic subsystem in a target system. To ensure the correct operation of the user's circuit design of the electronic subsystem within the target system's environment, input and output signals between the target system 40 and the modeled electronic subsystem must be provided to the reconfigurable hardware model 20 for evaluation. Hence, the input and output signals of the target system 40 to and from the reconfigurable hardware model 20 are delivered via cable 60 through the emulation interface 30 and the PCI bus 50. Alternatively, input/output signals of the target system 40 can be delivered to the reconfigurable hardware model 20 via the emulation interface 30 and cable 61.

The control data and some substantive simulation data pass between the reconfigurable hardware model 20 and the workstation 10 via the PCI bus 50. Indeed, the workstation 10 runs the software kernel that controls the operation of the entire SEmulation system and must have read/write access to the reconfigurable hardware model 20.

A workstation 10, complete with a computer, keyboard, mouse, monitor, and appropriate bus/network interface, allows a user to enter and modify data describing the circuit design of an electronic system. Exemplary workstations include a Sun Microsystems SPARC or ULTRA-SPARC workstation or an Intel/Microsoft-based computing station. As known to those ordinarily skilled in the art, the workstation 10 comprises a CPU 11, a local bus 12, a host/PCI bridge 13, a memory bus 14, and main memory 15. The various software simulation, simulation by hardware acceleration, in-circuit emulation, and post-simulation analysis aspects of the present invention are provided in the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30. The algorithm embodied in software is stored in main memory 15 during a test/debug session and executed by the CPU 11 under the workstation's operating system.

As known to those ordinarily skilled in the art, after the operating system is loaded into the memory of the workstation 10 by the start-up firmware, control passes to its initialization code, which sets up the necessary data structures and loads and initializes device drivers. Control is then passed to the command line interpreter (CLI), which prompts the user to indicate the program to be run. The operating system then determines the amount of memory needed to run the program, locates or allocates a block of memory, and accesses the memory either directly or through the BIOS. After the memory loading process completes, the application program begins execution.

One embodiment of the present invention is a particular application program for SEmulation. During the course of its execution, the application program may require numerous services from the operating system, including, but not limited to, reading from and writing to disk files, performing data communications, and interfacing with the display, keyboard, and mouse.

The workstation 10 has an appropriate user interface that allows the user to enter the circuit design data, compile the circuit design data, monitor the progress of simulation and emulation while obtaining results, and essentially control the simulation and emulation process. Although not shown in FIG. 1, the user interface includes user-accessible menu-driven options and command sets that can be entered with the keyboard and mouse and viewed on a monitor. Typically, the user uses a computing system 80 with a keyboard 90.

The user typically creates a particular circuit design for an electronic system and enters the HDL (usually structured RTL level) code description of the designed system into the workstation 10. The SEmulation system of the present invention performs component type analysis, among other operations, to partition the modeling between software and hardware. In software, the SEmulation system models behavior, RTL, and gate level code. For hardware modeling, the system can model RTL and gate level code; however, RTL level code must be synthesized to gate level before hardware modeling. Gate level code can be processed directly into a usable source design database format for hardware modeling. Using the RTL and gate level code, the system automatically performs component type analysis to complete the partition step. Based on the partition analysis during software compile time, the system maps some portion of the circuit design into hardware for fast simulation via hardware acceleration. The user can also couple the modeled circuit design to the target system for real-environment in-circuit emulation. Because the software simulation and the hardware acceleration engine are tightly coupled through the software kernel, the user can simulate the overall circuit design using software simulation, accelerate the test/debug process using the hardware model of the mapped circuit design, return to the simulation portion, and return to hardware acceleration until the test/debug process is complete. The ability to switch between software simulation and hardware acceleration cycle by cycle, at the user's discretion, is one of the valuable features of this embodiment. This feature is particularly useful in the debug process because it allows the user to reach a particular point or cycle very quickly in hardware acceleration mode and then use software simulation to examine various points thereafter and debug the circuit design. Moreover, the SEmulation system makes all components visible to the user whether the internal realization of a component is in hardware or in software. The SEmulation system accomplishes this by reading the register values from the hardware model when the user requests such a read, and then rebuilding the combinational components using the software model. These and other features are discussed more fully later in the specification.

The workstation 10 is coupled to a bus system 50. The bus system can be any available bus system that allows various agents, such as the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30, to be operably coupled together. Preferably, the bus system is fast enough to provide real-time or near real-time results to the user. One such bus system is the bus system specified in the Peripheral Component Interconnect (PCI) standard, which is incorporated herein by reference. Currently, revision 2.0 of the PCI standard provides for a 33 MHz bus speed. Revision 2.1 provides support for a 66 MHz bus speed. Accordingly, the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30 conform to the PCI standard.

In one embodiment, communication between the workstation 10 and the reconfigurable hardware model 20 is handled on the PCI bus. Other PCI-compliant devices may be found in this bus system. These devices may be coupled to the PCI bus at the same level as the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30, or at other levels. Each PCI bus at a different level, such as PCI bus 52, is coupled to another PCI bus level, such as PCI bus 50, if it exists at all, through a PCI-to-PCI bridge 51. Two PCI devices 53 and 54 may be coupled to PCI bus 52.


The reconfigurable hardware model 20 comprises an array of field-programmable gate array (FPGA) chips that can be programmably configured and reconfigured to model the hardware portion of the user's electronic system design. In this embodiment, the hardware model is reconfigurable; that is, its hardware can be reconfigured to suit the particular computation or user circuit design at hand. For example, if many adders or multiplexers are required, the system is configured to include many adders and multiplexers. As other computing elements or functions are needed, they may also be modeled or formed in the system. In this way, the system can be optimized to perform specialized computations or logic operations. Reconfigurable systems are also flexible, so that users can work around minor hardware defects that occur during manufacture, testing, or use. In one embodiment, the reconfigurable hardware model 20 comprises a two-dimensional array of computing elements consisting of FPGA chips to provide computational resources for various user circuit designs and applications. More details on the hardware configuration process are provided later.

Two such FPGA chips include those sold by Altera and Xilinx. In some embodiments, the reconfigurable hardware model is reconfigurable through the use of field-programmable devices. However, other embodiments of the present invention may be implemented using application-specific integrated circuit (ASIC) technology. Still other embodiments may take the form of a custom integrated circuit.

In a typical test/debug scenario, reconfigurable devices are used to simulate/emulate the user's circuit design so that appropriate changes can be made before actual prototype manufacturing. In some other instances, however, an actual ASIC or custom integrated circuit can be used, although this deprives the user of the ability to quickly and cost-effectively change a possibly non-functional circuit design for re-simulation and re-emulation. At times, though, such an ASIC or custom IC has already been manufactured and is readily available, so that emulation with an actual non-reconfigurable chip may be preferable.

In accordance with the present invention, the software in the workstation, through its integration with the external hardware model, provides a greater degree of flexibility, control, and performance for the end user than existing systems. To run the simulation and emulation, a model of the circuit design and the relevant parameters (e.g., input test-bench stimulus, overall system output, intermediate results) are determined and provided to the simulation software system. The user can use either schematic capture tools or synthesis tools to define the system circuit design. The user typically starts with a circuit design of an electronic system in draft schematic form, which is then converted to HDL using synthesis tools. The HDL can also be written directly by the user. Exemplary HDL languages include Verilog and VHDL; however, other languages are also available. A circuit design represented in HDL contains many concurrent components. Each component is a sequence of code that either defines the behavior of a circuit element or controls the execution of the simulation.

The SEmulation system analyzes these components to determine their component types, and the compiler uses this component type information to build different execution models in software and hardware. Thereafter, the user can use the SEmulation system of the present invention. The designer can verify the accuracy of the circuit through simulation by applying various stimuli, such as input signals and test vector patterns, to the simulated model. If, during the simulation, the circuit does not behave as planned, the user redefines the circuit by modifying the circuit schematic or the HDL file.

The use of this embodiment of the present invention is shown in the flow chart of FIG. 2. The algorithm starts at step 100. After the HDL file is loaded into the system, the system compiles, partitions, and maps the circuit design to appropriate hardware models. The compilation, partition, and mapping steps are discussed in more detail below.

Before the simulation runs, the system must run a reset sequence to remove all unknown "x" values in software before the hardware acceleration model can function. One embodiment of the present invention uses a 2-bit wide data path to provide a 4-state value for the bus signal: "00" is logic low, "01" is logic high, "10" is "z", and "11" is "x". As known to those ordinarily skilled in the art, software models can deal with "0", "1", "x" (bus conflict or unknown value), and "z" (no driver or high impedance). In contrast, hardware cannot deal with the unknown value "x", so the reset sequence, which varies depending on the particular applicable code, resets the register values to all "0" or all "1".
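As a hedged illustration of the 2-bit encoding just described, the sketch below packs a string of 4-state values into an integer using the stated codes ("00" low, "01" high, "10" z, "11" x). The function names, the most-significant-first packing order, and the `reset` helper are assumptions for illustration only; the source does not specify how values are laid out on the data path.

```python
# Codes taken from the text: 00 = logic low, 01 = logic high, 10 = z, 11 = x.
ENCODE = {"0": 0b00, "1": 0b01, "z": 0b10, "x": 0b11}
DECODE = {code: value for value, code in ENCODE.items()}


def pack(signal):
    """Pack a string of 4-state values into an int, 2 bits per value (first char is most significant)."""
    word = 0
    for ch in signal:
        word = (word << 2) | ENCODE[ch]
    return word


def unpack(word, width):
    """Inverse of pack() for a bus of `width` signals."""
    return "".join(DECODE[(word >> (2 * i)) & 0b11] for i in reversed(range(width)))


def has_unknown(signal):
    """True if any value is 'x', which the hardware model cannot represent."""
    return "x" in signal


def reset(width, fill="0"):
    """Stand-in for a reset sequence: drive every register to all '0' or all '1'."""
    return fill * width
```

For example, `pack("01zx")` yields the bit pattern `00 01 10 11`, and `reset(4)` gives a register image with no unknown values that could be handed to a hardware model.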

At step 105, the user decides whether to simulate the circuit design. Typically, the user starts the system with software simulation first. Thus, if the decision at step 105 is "YES," software simulation occurs at step 110.

The user can stop the simulation to inspect values, as shown in step 115. Indeed, the user can stop the simulation at any time during the test/debug session, as shown by the dotted lines extending from step 115 to various nodes in the hardware acceleration mode, ICE mode, and post-simulation mode. Executing step 115 takes the user to step 160.

After stopping, if the user wants to inspect combinational component values, the system kernel reads back the state of the hardware register components to regenerate the entire software model, including the combinational components. After the entire software model has been restored, the user can inspect any signal value in the system. After stopping and inspecting, the user can continue to run in simulation-only mode or in hardware acceleration mode. As shown in the flow chart, step 115 branches to the stop/value inspect routine. The stop/value inspect routine starts at step 160. At step 165, the user must decide whether to stop the simulation at this point and inspect values. If step 165 is decided in the affirmative, step 170 stops the simulation that may be currently running and inspects various values to check the correctness of the circuit design. At step 175, the algorithm returns to the point of the branch, which is step 115. Here, the user can continue to simulate and stop/inspect values for the remainder of the test/debug session or proceed to the in-circuit emulation step.

Similarly, if step 105 is decided in the negative, the algorithm proceeds to the hardware acceleration decision step 120. At step 120, the user decides whether to accelerate the test/debug process by accelerating the simulation through the hardware portion of the modeled circuit design. If the decision at step 120 is "YES," hardware model acceleration occurs at step 125. During the system compilation process, the SEmulation system mapped some portions of the design into the hardware model. Here, when hardware acceleration is desired, the system moves the register and combinational components into the hardware model and moves the input and evaluation values to the hardware model. Thus, during hardware acceleration, evaluation occurs in the hardware model for a long time period at the accelerated speed. The kernel writes the test-bench output to the hardware model, updates the software clock, and then reads the hardware model output values cycle by cycle. If the user so desires, values from the entire software model of the user's circuit design, which is the entire circuit design, can be made available by outputting the register values and regenerating the combinational components from those register values. Because software intervention is needed to regenerate these combinational components, values for the entire software model are not output at every cycle; rather, they are provided only when the user wants them. The combinational component regeneration process is discussed later in this specification.
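The acceleration loop and on-demand regeneration described above can be sketched as follows. This is a toy model under stated assumptions, not the patented kernel: `HardwareModel` is a hypothetical stand-in holding a single 4-bit counter register, and `regenerate_combinational` is a made-up software model of two combinational signals derived from that register. The point is only the division of labor: each cycle the kernel writes inputs, advances the software clock, and reads outputs, while combinational values are rebuilt in software from register values only when requested.

```python
class HardwareModel:
    """Toy stand-in for the reconfigurable hardware: one 4-bit counter register."""

    def __init__(self):
        self.count = 0
        self.enable = 0

    def write_inputs(self, enable):
        self.enable = enable

    def clock_edge(self):
        if self.enable:
            self.count = (self.count + 1) % 16

    def read_outputs(self):
        return self.count

    def read_registers(self):
        return {"count": self.count}


def regenerate_combinational(regs):
    """Software model of the combinational logic, evaluated from register values only."""
    count = regs["count"]
    return {"is_odd": count & 1, "at_max": int(count == 15)}


def accelerate(hw, stimulus):
    """Kernel loop: write test-bench output, update the software clock, read hardware outputs."""
    software_clock = 0
    outputs = []
    for enable in stimulus:
        hw.write_inputs(enable)
        hw.clock_edge()
        software_clock += 1
        outputs.append(hw.read_outputs())
    return software_clock, outputs
```

A caller would run `accelerate(hw, stimulus)` for as many cycles as needed and call `regenerate_combinational(hw.read_registers())` only after stopping, which mirrors why full-model values are not produced at every cycle.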


다시, 사용자는 단계 115에 개시된 바와 같이, 임의의 시간에 하드웨어 가속 모드를 중단시킬 수 있다. 만약 사용자가 중단하기를 원한다면, 알고리즘은 중단/값 검사 루틴을 분기하기 위하여 단계 115 및 160을 진행한다. 여기서, 단계 115에서와 같이, 사용자는 임의의 시간에 하드웨어 가속된 시뮬레이션 프로세스를 중단할 수 있으며, 시뮬레이션 프로세스로부터 발생되는 값을 검사할 수 있거나, 또는 사용자는 하드웨어-가속 시뮬레이션 프로세스를 계속할 수 있다. 중단/값 검사 루틴은 단계 160, 165, 170 및 175에 분기되며, 이러한 단계는 시뮬레이션 중단에서 언급되었다. 단계 125후에 주요 루틴으로 리턴하며, 사용자는 하드웨어-가속 시뮬레이션을 계속할 것인지 또는 대신에 단계 135에서 순수 시뮬레이션을 실행할 것인지를 결정할 수 있다. 만약 사용자가 시뮬레이션을 더 하기를 원하면, 알고리즘은 단계 105를 진행한다. 만약, 그렇지 않다면, 알고리즘은 단계 140에서 포스트-시뮬레이션 분석을 진행한다.
단계 140에서, SEmulation 시스템은 많은 포스트-시뮬레이션 분석 특징으로 제공한다. 시스템은 모든 입력은 하드웨어 모델에 로그(log)한다. 하드웨어 모델 출력을 위하여, 시스템은 사용자 정의 로깅 주파수(예를 들면, 1/10,000 기록/사이클)에서 하드웨어 레지스터 컴포넌트의 모든 값을 기입한다. 로깅 주파수는 출력값이 얼마나 자주 기록되는지를 결정한다. 1/10,000 기록/사이클의 로깅 주파수 동안에, 출력값은 10,000 사이클마다 한 번 기록된다. 로깅 주파수가 더 높으면, 나중의 포스트-시뮬레이션 분석 동안에 더 많은 정보가 기록된다. 선택된 로깅 주파수는 SEmulation 속도와 임시의 관계를 갖기 때문에, 사용자는 주의 깊게 로깅 주파수를 선택한다. 시스템은 더 많은 시뮬레이션이 실행되기 전에 I/O 동작을 메모리에 실행함으로써 출력 데이터를 기록하기 위한 자원과 시간을 소비해야만 하기 때문에, 더 높은 로깅 주파수는 SEmulation 속도를 감소시킬 것이다.
포스트-시뮬레이션 분석에 관하여, 사용자는 시뮬레이션이 요구하는 특정 지점을 선택한다. 그리고 나서, 사용자는 SEmulation 후에, 값 변화와 모든 하드웨어 컴포넌트의 내부 상태를 계산하기 위하여 하드웨어 모델에 입력 로그를 갖는 소프트웨어 시뮬레이션을 실행시킴으로써 분석한다. 시뮬레이션 결과를 분석하기 위하여 선택된 로깅 지점으로부터 데이터를 시뮬레이션하기 위하여 하드웨어 가속기가 사용된다는 것을 유의한다. 이러한 포스트-시뮬레이션 분석 방법은 포스트-시뮬레이션을 위한 어떤 시뮬레이션 파형 뷰어로 링크될 수 있다. 이하에서 더욱 상세히 논의될 것이다.
단계 145에서, 사용자는 타겟 시스템 환경내에서 시뮬레이션된 회로 설계를 에뮬레이션하기 위하여 선택할 수 있다. 만약 단계 145가 "아니오"로 결정하면, 알고리즘은 종료되고 SEmulation 프로세스는 단계 155에서 종료된다. 만약 타겟 시스템을 갖는 에뮬레이션이 요구된다면, 알고리즘은 단계 150을 진행한다. 이러한 단계는 에뮬레이션 인터페이스 보드 액티브화, 케이블 및 칩 핀 어댑터를 타겟 시스템에 플러깅 및 타겟 시스템으로부터 시스템 I/O를 획득하기 위하여 타겟 시스템을 실행하는 단계를 포함한다. 타겟 시스템으로부터 시스템 I/O는 타겟 시스템 및 회로 설계의 에뮬레이션 사이의 신호를 포함한다. 에뮬레이션된 회로 설계는 타겟 시스템으로부터 입력 신호를 수신하고, 이것을 처리하며, 다른 프로세싱을 위하여 SEmulation 시스템에 보내며, 처리된 신호를 타겟 시스템에 출력한다.
이와는 반대로, 에뮬레이션된 회로 설계는 출력 신호를 타겟 시스템에 보내며, 이러한 신호를 처리하며, 에뮬레이션된 회로 설계에 처리된 신호를 역 출력한다. 이러한 방식으로, 회로 설계의 성능은 본래 타겟 시스템 환경에서 평가될 수 있다. 타겟 시스템으로 에뮬레이션을 한 후에, 사용자는 회로 설계를 확인하거나 기능적인 면을 나타내는 결과를 갖는다. 이러한 지점에서, 사용자는 단계 135에서 개시된 바와 같이, 다시 시뮬레이션/에뮬레이션하거나, 회로 설계를 변형하기 위하여 중단하거나, 또는 유효한 회로 설계에 근거하여 집적 회로 제조를 진행할 수 있다.Again, the user can stop the hardware acceleration mode at any time, as disclosed in step 115. If the user wants to abort, the algorithm proceeds to steps 115 and 160 to branch the abort / value checking routine. Here, as in step 115, the user can stop the hardware accelerated simulation process at any time, check the values resulting from the simulation process, or the user can continue the hardware-accelerated simulation process. The stop / value checking routine branches to steps 160, 165, 170, and 175, which are mentioned in the simulation stop. Returning to the main routine after step 125, the user can decide whether to continue the hardware-accelerated simulation or instead run a pure simulation in step 135. If the user wants to add more simulations, the algorithm proceeds to step 105. If not, the algorithm proceeds to post-simulation analysis in step 140.
In step 140, the SEmulation system provides a number of post-simulation analysis features. The system logs all inputs to the hardware model. For hardware model outputs, the system logs the values of all hardware register components at a user-defined logging frequency (e.g., 1/10,000 records/cycle). The logging frequency determines how often the output values are recorded. At a logging frequency of 1/10,000 records/cycle, the output values are recorded once every 10,000 cycles. The higher the logging frequency, the more information is recorded for later post-simulation analysis. Because the selected logging frequency directly affects SEmulation speed, the user should select it with care. A higher logging frequency reduces SEmulation speed because the system must spend time and resources recording the output data, performing I/O operations to memory before further simulation can proceed.
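As a rough illustration of this trade-off (a sketch only; the function name and the record counting are assumptions made for illustration and are not part of the described system), the following Python snippet counts how many records a run would write when inputs are logged every cycle and register values are logged once every `log_interval` cycles:

```python
def count_log_records(num_cycles, log_interval):
    """Return (input_records, register_records) for a run.

    Inputs are logged on every cycle. Register values are logged once
    every `log_interval` cycles, i.e. a logging frequency of
    1/log_interval records per cycle.
    """
    input_records = 0
    register_records = 0
    for cycle in range(num_cycles):
        input_records += 1                # every input is logged
        if cycle % log_interval == 0:     # periodic register snapshot
            register_records += 1
    return input_records, register_records


if __name__ == "__main__":
    # 1/10,000 versus 1/1,000 records per cycle over 100,000 cycles
    print(count_log_records(100_000, 10_000))
    print(count_log_records(100_000, 1_000))
```

Raising the logging frequency tenfold writes ten times as many register snapshots, which is the extra I/O that slows the run.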
For post-simulation analysis, the user selects a particular point at which simulation is desired. After SEmulation, the user then performs the analysis by running a software simulation with the logged inputs to the hardware model, which computes the value changes and internal states of all hardware components. Note that the hardware accelerator is used to simulate the data from the selected logging point in order to analyze the simulation results. This post-simulation analysis method can be linked to any simulation waveform viewer. It is discussed in more detail below.
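The idea of recomputing state from a logged point can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper names `record_run` and `replay_to`, the dictionary of snapshots, and the toy accumulator model are all hypothetical and stand in for the real logged register values and design model.

```python
def record_run(inputs, log_interval, step, initial_state):
    """Run the model once, saving the state every `log_interval` cycles."""
    snapshots = {}
    state = initial_state
    for cycle, value in enumerate(inputs):
        if cycle % log_interval == 0:
            snapshots[cycle] = state      # state before this cycle's input
        state = step(state, value)
    return snapshots


def replay_to(target_cycle, log_interval, snapshots, inputs, step):
    """Recompute the state just before `target_cycle` from the nearest
    earlier snapshot, replaying only the logged inputs in between."""
    start = (target_cycle // log_interval) * log_interval
    state = snapshots[start]
    for cycle in range(start, target_cycle):
        state = step(state, inputs[cycle])
    return state


if __name__ == "__main__":
    def accumulate(state, value):         # toy stand-in for the design model
        return state + value

    inputs = list(range(50))
    snapshots = record_run(inputs, 10, accumulate, 0)
    print(replay_to(37, 10, snapshots, inputs, accumulate))
```

Only the cycles between the nearest snapshot and the selected point are re-evaluated, rather than the whole run.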
In step 145, the user can choose to emulate the simulated circuit design within its target system environment. If step 145 resolves to "NO," the algorithm terminates and the SEmulation process ends at step 155. If emulation with the target system is desired, the algorithm proceeds to step 150. This step involves activating the emulation interface board, plugging the cable and chip pin adapter into the target system, and running the target system to obtain the system I/O from the target system. The system I/O from the target system includes the signals between the target system and the emulation of the circuit design. The emulated circuit design receives input signals from the target system, processes them, sends them to the SEmulation system for further processing, and outputs the processed signals to the target system.
Conversely, the emulated circuit design sends output signals to the target system, which processes them and outputs the processed signals back to the emulated circuit design. In this way, the performance of the circuit design can be evaluated in its native target system environment. After emulation with the target system, the user has results that confirm the circuit design or reveal aspects of its functional behavior. At this point, the user may simulate/emulate again as indicated at step 135, stop in order to modify the circuit design, or proceed to integrated circuit fabrication based on the validated circuit design.

III. Simulation/Hardware Acceleration Mode

A high-level block diagram of the software compilation and hardware configuration during compile time and run time, in accordance with one embodiment of the present invention, is shown in FIG. 3. FIG. 3 shows two sets of information: one set distinguishes the operations performed during compile time from those performed during simulation/emulation run time; the other set shows the partitioning between software models and hardware models. At the outset, the SEmulation system in accordance with one embodiment of the present invention requires the user circuit design as input data 200. The user circuit design is in the form of HDL files (e.g., Verilog, VHDL). The SEmulation system parses the HDL files so that behavior level code, register transfer level code, and gate level code can be reduced to a form usable by the SEmulation system. The system generates a source design database for front-end processing step 205. The processed HDL files are then usable by the SEmulation system. The parsing process converts ASCII data into an internal binary data structure and is known to those skilled in the art. See ALFRED V. AHO, RAVI SETHI, AND JEFFREY D. ULLMAN, COMPILERS: PRINCIPLES, TECHNIQUES, AND TOOLS (1988), which is incorporated herein by reference.

Compile time is represented by process 225, and run time is represented by process/elements 230. During compile time, as indicated by process 225, the SEmulation system compiles the processed HDL files by performing component type analysis. Component type analysis classifies HDL components into combinational components, register components, clock components, memory components, and test-bench components. In essence, the system partitions the user circuit design into control and evaluation components.

The SEmulation compiler 210 maps the control component of the simulation to software and the evaluation component to software and hardware. The compiler 210 generates a software model for all HDL components. The software model is cast in code 215. In addition, the SEmulation compiler 210 uses the component type information of the HDL files, selects or generates hardware logic blocks/elements from a library or module generator, and generates a hardware model for certain HDL components. The end result is a so-called "bitstream" configuration file 220.

In preparation for run time, the software model in code form is stored in main memory, where the application program associated with the SEmulation program in accordance with one embodiment of the present invention is also stored. This code is processed in the general-purpose processor or workstation 240. At substantially the same time, the configuration file 220 for the hardware model is used to map the user circuit design to the reconfigurable hardware boards 250. Here, the portions of the circuit design that have been modeled in hardware are mapped and partitioned into the FPGA chips in the reconfigurable hardware boards 250.

As described above, user test-bench stimulus and test vector data, as well as other test-bench resources 235, are applied to the general-purpose processor or workstation 240 for simulation purposes. In addition, the user can perform emulation of the circuit design under software control. The reconfigurable hardware boards 250 contain the user's emulated circuit design. The SEmulation system lets the user selectively switch between software simulation and hardware emulation, stop either the simulation or the emulation process at any time, and inspect values from every component in the model. Thus, the SEmulation system passes data between the test-bench 235 and the processor/workstation 240 for simulation, and between the test-bench 235 and the reconfigurable hardware boards 250 via the data bus 245 and the processor/workstation 240 for emulation. If a user target system 260 is involved, emulation data can pass between the reconfigurable hardware boards 250 and the target system 260 via the emulation interface 255 and the data bus 245. The kernel resides in the software simulation model in the memory of the processor/workstation 240, so data necessarily passes between the processor/workstation 240 and the reconfigurable hardware boards 250 via the data bus 245.

FIG. 4 shows a flowchart of the compilation process in accordance with one embodiment of the present invention. The compilation process corresponds to processes 205 and 210 shown in FIG. 3. The compilation process of FIG. 4 begins at step 300. Step 301 processes the front-end information. Here, gate level HDL code is generated. The user has converted the initial circuit design into HDL form either by writing the code directly or by using a schematic or synthesis tool to generate a gate level HDL representation of the design. The SEmulation system parses the HDL files (in ASCII format) into a binary format so that behavior level code, register transfer level (RTL) code, and gate level code can be reduced to an internal data structure form usable by the SEmulation system. The system generates a source design database containing the parsed HDL code.

Step 302 performs component type analysis by classifying HDL components into combinational components, register components, clock components, memory components, and test-bench components, as shown in component type resource 303. The SEmulation system generates hardware models for register and combinational components, subject to the exceptions discussed below. Test-bench and memory components are mapped to software. Some clock components (e.g., derived clocks) are modeled in hardware, while others reside at the software/hardware boundary (e.g., software clocks).
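The default residence of each component type described above can be summarized in a short Python sketch. The enum, the function name `default_residence`, and the string labels are illustrative assumptions; the sketch ignores the exceptions mentioned in the text, and a software model is still generated for every component regardless of where it resides.

```python
from enum import Enum


class ComponentType(Enum):
    COMBINATIONAL = "combinational"
    REGISTER = "register"
    CLOCK = "clock"
    MEMORY = "memory"
    TEST_BENCH = "test-bench"


def default_residence(component_type, derived_clock=False):
    """Return where a component is modeled by default.

    Exceptions (resource limits, user overrides) are ignored here.
    """
    if component_type in (ComponentType.REGISTER, ComponentType.COMBINATIONAL):
        return "hardware"
    if component_type in (ComponentType.TEST_BENCH, ComponentType.MEMORY):
        return "software"
    # Clock components: derived clocks go to hardware, software clocks
    # sit at the software/hardware boundary.
    return "hardware" if derived_clock else "software/hardware boundary"


if __name__ == "__main__":
    for ctype in ComponentType:
        print(ctype.value, "->", default_residence(ctype))
```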

Combinational components are stateless logic components whose output values are a function of the current input values and do not depend on past input values. Examples of combinational components include primitive gates (e.g., AND, OR, XOR, NOT), selectors, adapters, multiplexers, shifters, and bus drivers.

Register components are simple storage components. The state transition of a register is controlled by a clock signal. One form of register is the edge-triggered register, whose state may change when an edge is detected. Another form of register is the latch, which is level-triggered. Examples include flip-flops (D-type, JK-type) and level-sensitive latches.
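The difference between the two register forms can be shown with a small behavioral model. This Python sketch is illustrative only; the class names and the `tick` method are assumptions, and a rising edge is assumed to be the active edge.

```python
class EdgeTriggeredRegister:
    """Captures `d` only when the clock goes from 0 to 1."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def tick(self, clk, d):
        if self._prev_clk == 0 and clk == 1:   # rising edge detected
            self.q = d
        self._prev_clk = clk
        return self.q


class LevelSensitiveLatch:
    """Follows `d` for as long as the enable level is 1."""

    def __init__(self):
        self.q = 0

    def tick(self, enable, d):
        if enable == 1:                        # transparent while enabled
            self.q = d
        return self.q


if __name__ == "__main__":
    reg, latch = EdgeTriggeredRegister(), LevelSensitiveLatch()
    for clk, d in [(0, 1), (1, 2), (1, 3), (0, 4)]:
        print(clk, d, reg.tick(clk, d), latch.tick(clk, d))
```

With the same clock and data sequence, the register holds the value sampled at the edge, while the latch keeps tracking the data input until the level drops.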

Clock components are components that deliver periodic signals to logic devices to control their operation. Typically, clock signals control the updating of registers. Primary clocks are generated from self-timed test-bench processes. For example, a typical test-bench process for clock generation in Verilog is as follows:

always begin
    Clock = 0;   // drive the clock to logic 0
    #5;          // wait 5 time units
    Clock = 1;   // drive the clock to logic 1
    #5;          // wait 5 time units, then repeat
end

According to this code, the clock signal is initially at logic "0." After 5 time units, the clock signal changes to logic "1." After another 5 time units, the clock signal reverts to logic "0." Typically, the primary clock signals are generated in software, and only a few (i.e., 1-10) primary clocks are found in a typical user circuit design. Derived or gated clocks are generated from a network of registers and combinational logic that is in turn driven by the primary clocks. Many (i.e., 1,000 or more) derived clocks are found in a typical user circuit design.
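The waveform produced by this test-bench process, and the simplest case of a clock derived from it, can be modeled in a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, and gating with a single enable signal is only the most basic example of a derived clock.

```python
def primary_clock(t, half_period=5):
    """Logic 0 for the first half-period, logic 1 for the second, repeating."""
    return (t // half_period) % 2


def gated_clock(t, enable):
    """A derived clock: the primary clock passed through an AND gate."""
    return primary_clock(t) & enable


if __name__ == "__main__":
    print([primary_clock(t) for t in range(0, 20)])
    print([gated_clock(t, enable=1 if t >= 10 else 0) for t in range(0, 20)])
```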

Memory components are block storage components with address and control lines for accessing individual data in specific memory locations. Examples include ROMs, asynchronous RAMs, and synchronous RAMs.

Test-bench components are software processes used to control and monitor the simulation process. Accordingly, these components are not part of the hardware under test. Test-bench components control the simulation by controlling clock signals, initializing simulation data, and reading simulation test vector patterns from disk/memory. Test-bench components also monitor the simulation by checking for changes in value, performing value change dumps, checking constraints on signal value relations, writing output test vectors to disk/memory, and interfacing with various waveform viewers and debuggers.

The SEmulation system performs component type analysis as follows. The system examines the binary source design database. Based on the source design database, the system classifies each element as one of the above component types. Continuous assignment statements are classified as combinational components. Primitive gates are, by language definition, either combinational components or latch-type register components. Initialization code is treated as a test-bench of the initialization type.

A process that drives nets without using them is a test-bench of the driver type. A process that reads nets without driving them is a test-bench of the monitor type. A process with delay controls or multiple event controls is a test-bench of the general type.

A process with a single event control that drives a single net can be one of the following: (1) If the event control is an edge-triggered event, the process is an edge-triggered register component. (2) If the net driven in the process is not defined on all possible execution paths, the net is a latch-type register. (3) If the net driven in the process is defined on all possible execution paths, the net is a combinational component.

A process with a single event control that drives multiple nets can be decomposed into several processes, each driving one net, so that the component type of each net is derived separately. The decomposed processes can then be used to determine the component types.
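The classification rules above can be expressed as a small decision procedure. The following Python sketch is illustrative: the `Process` record, its field names, and the order in which the rules are tried are assumptions, since the text does not state a precedence among the rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    drives: frozenset                        # nets the process assigns
    reads: frozenset                         # nets the process reads
    event_controls: int = 0                  # number of event controls
    has_delay_control: bool = False
    edge_triggered: bool = False
    fully_defined: frozenset = frozenset()   # nets assigned on every path


def decompose(p):
    """Split a multi-net process into one single-net process per net."""
    return [
        Process(frozenset([net]), p.reads, p.event_controls,
                p.has_delay_control, p.edge_triggered,
                p.fully_defined & {net})
        for net in sorted(p.drives)
    ]


def classify(p):
    """Classify a process; multi-net processes should be decomposed first."""
    if p.drives and not p.reads:
        return "test-bench (driver)"
    if p.reads and not p.drives:
        return "test-bench (monitor)"
    if p.has_delay_control or p.event_controls > 1:
        return "test-bench (general)"
    if p.event_controls == 1 and len(p.drives) == 1:
        (net,) = p.drives
        if p.edge_triggered:
            return "register (edge-triggered)"
        if net not in p.fully_defined:
            return "register (latch)"
        return "combinational"
    return "unclassified"


if __name__ == "__main__":
    flop = Process(frozenset({"q"}), frozenset({"d"}),
                   event_controls=1, edge_triggered=True)
    both = Process(frozenset({"y", "z"}), frozenset({"a"}),
                   event_controls=1, fully_defined=frozenset({"y"}))
    print(classify(flop))
    print([classify(part) for part in decompose(both)])
```

Decomposing first keeps the single-net rules simple: each net of a multi-net process is judged on its own execution paths.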

Step 304 generates a software model for all HDL components, regardless of component type. With the appropriate user interface, the user can simulate the entire circuit design using the complete software model. Test-bench processes are used to drive the stimulus inputs and test vector patterns, control the overall simulation, and monitor the simulation process.

Step 305 performs clock analysis. Clock analysis includes two general steps: (1) clock extraction and sequential mapping, and (2) clock network analysis. The clock extraction and sequential mapping step maps the user's register components into the SEmulation system's hardware register model and then extracts the clock signals out of the system's hardware register components. The clock network analysis step determines the derived clocks from the primary clocks and the extracted clock signals, and separates the gated clock network from the gated data network. A more detailed description is provided with respect to FIG. 16.

Step 306 performs residence selection. The system, in conjunction with the user, selects the components for the hardware model; that is, of all the hardware components that could be implemented in the hardware model of the user's circuit design, some will not be modeled in hardware for a variety of reasons. These reasons include component type, hardware resource limitations (e.g., floating-point operations and large multiply operations stay in software), simulation and communication overhead (e.g., small bridge logic between test-bench processes, and signals monitored by test-bench processes, stay in software), and user preference. For a variety of reasons, including performance and simulation monitoring, the user can force certain components that would otherwise be modeled in hardware to reside in software.

Step 307 maps the selected hardware models to a reconfigurable hardware emulation board. In particular, step 307 takes the netlist and maps the circuit design into specific FPGA chips. This step involves grouping or clustering logic elements together. The system then assigns each group to its own FPGA chip, or assigns several groups to a single FPGA chip. The system may also split groups in order to assign them to different FPGA chips. In general, the system assigns groups to FPGA chips. A more detailed discussion is provided with respect to FIG. 6. The system places the hardware model components into a mesh of FPGA chips so as to minimize inter-chip communication overhead. In one embodiment, the array comprises a 4×4 array of FPGAs, a PCI interface unit, and a software clock control unit. The FPGA array implements a portion of the user's hardware circuit design, as determined in steps 302-306 of this software compilation process. The PCI interface unit allows the reconfigurable hardware emulation model to communicate with the workstation via the PCI bus. The software clock avoids race conditions among the various clock signals entering the FPGA array. Furthermore, step 307 routes the FPGA chips according to the communication schedule among the hardware models.

Step 308 inserts control circuits. These control circuits include I/O address pointers and data bus logic for communicating with the DMA engine to the simulator (discussed below with respect to FIGS. 11, 12, and 14), and evaluation control logic for controlling hardware state transitions and wire multiplexing (discussed below with respect to FIGS. 19 and 20). As known to those skilled in the art, a direct memory access (DMA) unit provides an additional data channel between peripherals and main memory, through which the peripherals can access (i.e., read, write) main memory directly without the intervention of the CPU. The address pointer in each FPGA chip allows data to move between the software model and the hardware model within the bus size limitations. The evaluation control logic is essentially a finite state machine that ensures that the clock enable inputs to registers are asserted before the clock and data inputs enter those registers.

Step 309 generates the configuration files for mapping the hardware model to the FPGA chips. In essence, step 309 assigns circuit design components to specific cells or gate level components in each chip. Whereas step 307 determines the mapping of hardware model groups to specific FPGA chips, step 309 takes this mapping result and generates a configuration file for each FPGA chip.

Step 310 generates the software kernel code. The kernel is a sequence of software code that controls the overall SEmulation system. The kernel cannot be generated before this point because portions of its code update and evaluate hardware components, and the proper mapping to the hardware models and FPGA chips exists only after step 309. A more detailed discussion is provided below with respect to FIG. 5. Compilation ends at step 311.

As described above with respect to FIG. 4, the software kernel code is generated in step 310 after the software and hardware models have been determined. The kernel is the piece of software in the SEmulation system that controls the operation of the overall system. The kernel controls the execution of both the software simulation and the hardware emulation. Because the kernel also resides at the center of the hardware model, the simulator is integrated with the emulator. Unlike other known co-simulation systems, the SEmulation system in accordance with one embodiment of the present invention does not require the simulator to interact with the emulator from the outside. One embodiment of the kernel is the control loop shown in FIG. 5.

Referring to FIG. 5, the kernel begins at step 330. Step 331 evaluates the initialization code. The control loop begins at step 332 and is bounded by decision step 339; it cycles repeatedly until the system observes no active test-bench processes, at which point the simulation or emulation session is complete. Step 332 evaluates the active test-bench components for the simulation or emulation.

Step 333 evaluates clock components. These clock components come from the test-bench process. Usually, the user dictates what type of clock signal will be generated for the simulation system. In one example (discussed above with respect to component type analysis and reproduced here), a clock component specified by the user in the test-bench process is as follows:

always begin
    Clock = 0;   // drive the clock to logic 0
    #5;          // wait 5 time units
    Clock = 1;   // drive the clock to logic 1
    #5;          // wait 5 time units, then repeat
end

In this clock component example, the user has decided that a logic "0" signal will be generated first and that, 5 simulation time units later, a logic "1" signal will be generated. This clock generation process cycles continually until stopped by the user. The simulation time is advanced by the kernel.

Decision step 334 asks whether any active clock edge has been detected, which would result in some kind of logic evaluation in the software model and possibly the hardware model (if emulation is running). The clock signal that the kernel uses to detect an active clock edge is the clock signal from the test-bench process. If decision step 334 evaluates to "NO," the kernel proceeds to step 337. If decision step 334 evaluates to "YES," the kernel proceeds to step 335, which updates registers and memories, and then to step 336, which propagates combinational components. Step 336 takes care of combinational logic that needs some time to propagate values through the combinational logic network after a clock signal has been asserted. Once the values have propagated through the combinational components and stabilized, the kernel proceeds to step 337.
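The two ideas in steps 334 and 336, detecting an active clock edge and propagating values until the combinational network stabilizes, can be sketched in Python as follows. This is an illustration under assumptions: a rising edge is taken to be the active edge, the netlist is a hypothetical dictionary of gate functions, and the pass limit is arbitrary.

```python
def rising_edge(prev_clk, clk):
    """Step 334 (assumed rising-edge active): was an active edge seen?"""
    return prev_clk == 0 and clk == 1


def propagate(values, gates, max_passes=100):
    """Step 336: re-evaluate gates until no output changes.

    values: dict mapping net name -> current value
    gates:  dict mapping output net -> (function, tuple of input nets)
    """
    for _ in range(max_passes):
        changed = False
        for out, (fn, ins) in gates.items():
            new = fn(*(values[name] for name in ins))
            if values.get(out) != new:
                values[out] = new
                changed = True
        if not changed:
            return values                 # network has stabilized
    raise RuntimeError("combinational network did not stabilize")


if __name__ == "__main__":
    nets = {"a": 1, "b": 1, "c": 0, "n1": 0, "y": 0}
    gates = {
        "y": (lambda x, z: x | z, ("n1", "c")),    # y = n1 OR c
        "n1": (lambda x, z: x & z, ("a", "b")),    # n1 = a AND b
    }
    if rising_edge(0, 1):
        print(propagate(nets, gates))
```

Because `y` is listed before the gate that feeds it, the first pass leaves `y` stale and a second pass is needed, which is why the loop repeats until nothing changes.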

Note that registers and combinational components are also modeled in hardware, and thus the kernel controls the emulator portion of the SEmulation system. Indeed, the kernel can accelerate the evaluation of the hardware model in steps 334 and 335 whenever any active clock edge is detected. Hence, unlike the prior art, the SEmulation system in accordance with one embodiment of the present invention can accelerate the hardware emulator through the software kernel, based on component type (e.g., register, combinational). Furthermore, the kernel controls the execution of the software and hardware models cycle by cycle. In essence, the emulator hardware model can be characterized as a simulation coprocessor to the general-purpose processor running the simulation kernel. The coprocessor speeds up the simulation task.
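The control loop of FIG. 5 as a whole, including steps 337 through 339 that the text turns to next, might be sketched as follows. This Python sketch is an assumption-laden illustration, not the kernel code the compiler generates: the callback signatures, the integer time step, and the rising-edge convention are all hypothetical, and the initialization of step 331 is represented by the initial `state` passed in.

```python
def run_kernel(state, test_benches, clock, update_registers, propagate):
    """Toy control loop modeled on steps 332-340 of FIG. 5.

    test_benches:     callables (time, state) -> bool, True while active
    clock:            callable time -> 0 or 1
    update_registers: callable state -> None, run on an active clock edge
    propagate:        callable state -> None, settles combinational logic
    """
    time = 0
    prev_clk = clock(0)
    while True:
        for tb in test_benches:                 # step 332
            tb(time, state)
        clk = clock(time)                       # step 333
        if prev_clk == 0 and clk == 1:          # step 334: active edge?
            update_registers(state)             # step 335
            propagate(state)                    # step 336
        prev_clk = clk
        active = [tb(time, state) for tb in test_benches]   # step 337
        time += 1                               # step 338
        if not any(active):                     # step 339
            return time                         # step 340


if __name__ == "__main__":
    def stimulus(time, state):                  # drives input d for 30 time units
        state["d"] = time
        return time < 30

    def clock(time):                            # 0 for 5 units, 1 for 5 units
        return (time // 5) % 2

    def update_registers(state):
        state["q"] = state["d"]

    def propagate(state):
        state["y"] = state["q"] + 1

    state = {"d": 0, "q": 0, "y": 1}
    end_time = run_kernel(state, [stimulus], clock, update_registers, propagate)
    print(state, end_time)
```

In the described system the register update and propagation callbacks are where the hardware model would be evaluated, which is the sense in which the emulator acts as a coprocessor to the kernel.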

Step 337 evaluates the active test-bench components. Step 338 advances the simulation time. Step 339 provides the boundary for the control loop that begins at step 332. Step 339 determines whether any test-bench processes are active. If so, the simulation and/or emulation is still running and more data must be evaluated. Thus, the kernel loops back to step 332 to evaluate any active test-bench components. If no test-bench processes are active, the simulation and emulation processes are complete. Step 340 ends the simulation/emulation process. In sum, the kernel is the main control loop that controls the operation of the entire SEmulation system. As long as any test-bench processes are active, the kernel evaluates the active test-bench components, evaluates the clock components, detects clock edges in order to update registers and memories and to propagate combinational logic data, and advances the simulation time.
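The control loop bounded by steps 332 and 339 can be pictured with a minimal sketch. The class `BenchProcess` and the function `run_kernel` are hypothetical names introduced only for illustration, and the clock evaluation, edge detection, and register updates that the kernel also performs are omitted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BenchProcess:
    """Hypothetical test-bench process that stays active for a fixed number of steps."""
    remaining_steps: int

    @property
    def active(self) -> bool:
        return self.remaining_steps > 0

    def evaluate(self) -> None:
        # Stand-in for evaluating one active test-bench component.
        self.remaining_steps -= 1


def run_kernel(processes: List[BenchProcess], time_step: int = 1) -> int:
    """Loop while any test-bench process is active; return the final simulation time."""
    sim_time = 0
    while any(p.active for p in processes):   # step 339: any process still active?
        for p in processes:
            if p.active:
                p.evaluate()                  # step 337: evaluate active components
        sim_time += time_step                 # step 338: advance simulation time
    return sim_time                           # step 340: simulation/emulation ends
```

In this sketch the activity check sits at the top of the loop, which is a simplification of the flow in which step 339 closes the loop.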
FIG. 6 illustrates one embodiment of a method for automatically mapping a hardware model to the reconfigurable boards. A netlist file provides the input to the hardware implementation process. The netlist describes the logic functions and their interconnections. The hardware model-to-FPGA implementation process includes three independent tasks: mapping, placement, and routing. The tools are generally referred to as "place-and-route" tools. The design tools used may be Viewlogic Viewdraw, a schematic capture system, and Xilinx Xact place-and-route software, or Altera's MAX+PLUS II system.
The mapping task partitions the circuit design into logic blocks, I/O blocks, and other FPGA resources. Some logic functions, such as flip-flops and buffers, map directly onto the corresponding FPGA resources, while other logic functions, such as combinational logic, must be implemented in logic blocks using a mapping algorithm. The user can generally select a mapping for optimal density or for optimal performance.
The placement task takes the logic and I/O blocks from the mapping task and assigns them to physical locations within the FPGA array. Current FPGA placement tools generally use a combination of three techniques: mincut, simulated annealing, and general force-directed relaxation (GFDR). These techniques determine the optimal placement based on various cost functions that depend on, among other variables, the total net length of the interconnections or the delay along a set of critical signal paths. For Xilinx XC4000 series FPGA devices, a variation of the mincut technique is used for initial placement, followed by GFDR to refine the placement.
The routing task determines the routing paths used to interconnect the various mapped and placed blocks. One such router, called a maze router, seeks the shortest path between two points. Because the routing task provides direct interconnection among the chips, the placement of the circuits with respect to the chips is critical.
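The shortest-path search performed by a maze router can be illustrated with a breadth-first (Lee-style) search over a grid. This is a generic sketch under stated assumptions, not the router of any tool named above: the grid encoding (0 = free, 1 = blocked) and the name `maze_route` are hypothetical, and the start cell is assumed to be free.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def maze_route(grid: List[List[int]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Return a shortest 4-connected path from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start to recover the path.
            path = []
            node: Optional[Cell] = cell
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None
```

Because breadth-first search expands cells in order of distance from the start, the first time the goal is dequeued the recovered path is a shortest one.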
Initially, the hardware model can be described in either gate netlist 350 or RTL 357. The RTL-level code can be further synthesized into a gate-level netlist. During the mapping process, a synthesizer server 360, such as the Altera MAX+PLUS II programmable logic development tool system and software, can be used to produce output files for mapping purposes. The synthesizer server 360 has the ability to match the user's circuit design components to any standard logic elements (e.g., a standard adder or standard multiplier) found in library 361, to generate any parameterized and frequently used logic module 362 (e.g., non-standard multiplexers or non-standard adders), and to synthesize random logic elements 363 (e.g., look-up-table-based logic that implements a customized logic function). The synthesizer server also removes redundant logic and unused logic. The output files essentially synthesize or optimize the logic required by the user's circuit design.
When some or all of the HDL is at the RTL level, the circuit design components are at a high enough level that the SEmulation system can easily model them using SEmulation registers or components. When some or all of the HDL is at the gate netlist level, the circuit design components may be more circuit design-specific, which makes mapping the user circuit design components to SEmulation components more difficult. Accordingly, the synthesizer server is capable of generating any logic element based on variations of the standard logic elements, as well as random logic elements that have no parallel in these variations or in the library's standard logic elements.
If the circuit design is in gate netlist form, the SEmulation system will first perform a grouping or clustering operation 351. The hardware model construction is based on the clustering process because the combinational logic and registers are separated from the clock. Thus, logic elements that share a common primary clock or gated clock signal may be better served by being grouped together and placed on a chip together. The clustering algorithm is based on connectivity-driven extraction, hierarchical extraction, and regular structure extraction. If the description is in structured RTL 358, the SEmulation system can decompose the function into smaller units, as represented by the logic function decomposition operation 359. At any stage, if logic synthesis or logic optimization is required, the synthesizer server 360 is available to transform the circuit design into a more efficient representation based on user directives. For the clustering operation 351, the link to the synthesizer server is represented by dotted arrow 364. For the structured RTL 358, the link to the synthesizer server 360 is represented by arrow 365. For the logic function decomposition operation 359, the link to the synthesizer server 360 is represented by arrow 366.


The clustering operation 351 groups the logic components together in a selective manner based on function and size. The clustering may involve only one cluster for a small circuit design or several clusters for a large circuit design. In either case, these clusters of logic elements will be used in later steps to map them into the designated FPGA chips; that is, one cluster will be targeted for a particular chip and another cluster will be targeted for a different chip or possibly the same chip as the first cluster. Usually, the logic elements in a cluster stay together with the cluster in a chip, but for optimization purposes a cluster may have to be split across more than one chip.
After the clusters are formed in the clustering operation 351, the system performs a place-and-route operation. First, a coarse-grain placement operation 352 of the clusters into the FPGA chips is performed. The coarse-grain placement operation 352 initially places clusters of logic elements into selected FPGA chips. If necessary, the system makes the synthesizer server 360 available to the coarse-grain placement operation 352, as represented by arrow 367. A fine-grain placement operation is performed after the coarse-grain placement operation to fine-tune the initial placement. The SEmulation system uses a cost function based on pin usage requirements, gate usage requirements, and gate-to-gate hops to determine the optimal placement for both the coarse-grain and fine-grain placement operations.
The determination of how clusters are placed in particular chips is based on placement cost, which is calculated through a cost function f(P, G, D) for two or more circuits (i.e., CKTQ = CKT1, CKT2, ..., CKTN) and their respective locations in the array of FPGA chips, where P is generally the pin usage/availability, G is generally the gate usage/availability, and D is the gate-to-gate distance, or number of "hops", as defined by the connectivity matrix M (shown in FIG. 7 in conjunction with FIG. 8). The user's circuit design that is modeled in the hardware model is the entire combination of circuits CKTQ. Each cost function is defined such that the calculated value of the placement cost generally tends toward (1) a minimum number of "hops" between any two circuits CKTN-1 and CKTN in the FPGA array, and (2) a placement of circuits CKTN-1 and CKTN in the FPGA array such that pin usage is minimized.


In one embodiment, the cost function f(P, G, D) is defined as follows.
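The full-form equation is not reproduced in this text. The following is a reconstruction, offered as an assumption rather than as the original figure, built from the term-by-term description in this section: a maximum pin ratio over the chips, a maximum gate ratio over the chips, and a sum of hops over gate pairs requiring interconnection. The notation Dist(FPGA_i, FPGA_j) for the hop count between the chips holding gates i and j is assumed.

```latex
f(P, G, D) = C_0 \cdot \max_{\text{each FPGA}} \frac{P_{\text{used}}}{P_{\text{available}}}
           + C_1 \cdot \max_{\text{each FPGA}} \frac{G_{\text{used}}}{G_{\text{available}}}
           + C_2 \cdot \sum_{(i,j) \in \mathrm{CKT}} \mathrm{Dist}(\mathrm{FPGA}_i, \mathrm{FPGA}_j)
```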

This equation can be simplified as follows.

f(P, G, D) = C0*P + C1*G + C2*D

The first term (i.e., C0*P) generates a first placement cost value based on the number of pins used and the number of pins available. The second term (i.e., C1*G) generates a second placement cost value based on the number of gates used and the number of gates available. The third term (i.e., C2*D) generates a placement cost value based on the number of hops present between the various interconnecting gates in the circuits CKTQ (i.e., CKT1, CKT2, ..., CKTN). The total placement cost value is generated by iteratively summing these three placement cost values. The constants C0, C1, and C2 are weighting constants that selectively skew the total placement cost value generated by the cost function toward the factor or factors (i.e., pin usage, gate usage, or gate-to-gate hops) that are most important during any given placement cost calculation.

The placement cost is calculated repeatedly as the system selects different relative values for the weighting constants C0, C1, and C2. Thus, in one embodiment, during the coarse-grain placement operation, the system selects large values for C0 and C1 relative to C2. In this iteration, the system determines that optimizing pin usage/availability and gate usage/availability is more important than optimizing gate-to-gate hops in the initial placement of the circuits CKTQ in the array of FPGA chips. In a subsequent iteration, the system selects small values for C0 and C1 relative to C2. In this iteration, the system determines that optimizing gate-to-gate hops is more important than optimizing pin usage/availability and gate usage/availability.
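The effect of the weighting constants can be sketched as follows. The specific weight values, the candidate placements "A" and "B", and the names `Weights`, `placement_cost`, and `best_placement` are hypothetical; only the form f = C0*P + C1*G + C2*D comes from the text.

```python
from typing import Dict, NamedTuple, Tuple


class Weights(NamedTuple):
    c0: float  # weight on pin usage/availability (P)
    c1: float  # weight on gate usage/availability (G)
    c2: float  # weight on gate-to-gate hops (D)


def placement_cost(p: float, g: float, d: float, w: Weights) -> float:
    """Simplified cost function f(P, G, D) = C0*P + C1*G + C2*D."""
    return w.c0 * p + w.c1 * g + w.c2 * d


def best_placement(candidates: Dict[str, Tuple[float, float, float]], w: Weights) -> str:
    """Return the name of the candidate (P, G, D) triple with the lowest cost."""
    return min(candidates, key=lambda name: placement_cost(*candidates[name], w))


# Assumed example weights: an early iteration favors P and G, a later one favors D.
EARLY = Weights(c0=10.0, c1=10.0, c2=1.0)
LATER = Weights(c0=1.0, c1=1.0, c2=10.0)

candidates = {
    "A": (0.5, 0.5, 6.0),  # low pin/gate ratios, many hops
    "B": (0.9, 0.8, 2.0),  # high pin/gate ratios, few hops
}
print(best_placement(candidates, EARLY))  # A
print(best_placement(candidates, LATER))  # B
```

With the early weights, candidate A costs 16 against 19 for B; with the later weights, B costs 21.7 against 61 for A, so the preferred placement changes with the weighting.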

During the fine-grain placement operation, the system uses the same cost function. In one embodiment, the iterative steps for selecting C0, C1, and C2 are the same as for the coarse-grain operation. In another embodiment, the fine-grain placement operation involves the system selecting small values for C0 and C1 relative to C2.

These variables and equations are explained below. In determining whether to place a certain circuit CKTQ in FPGA chip x or FPGA chip y (among other FPGA chips), the cost function examines pin usage/availability (P), gate usage/availability (G), and gate-to-gate hops (D). Based on the cost function variables P, G, and D, the cost function f(P, G, D) generates a placement cost value for placing the circuit CKTQ at a particular location in the FPGA array.

Pin usage/availability (P) also represents the I/O capacity. P_used is the number of pins used by the circuits CKTQ for each FPGA chip. P_available is the number of pins available in the FPGA chip. In one embodiment, P_available is 264 (44 pins × 6 interconnections/chip), while in another embodiment P_available is 265 (44 pins × 6 interconnections/chip + 1 extra pin). However, the specific number of available pins depends on the type of FPGA chip used, the total number of interconnections used per chip, and the number of pins used for each interconnection. Thus, P_available can vary considerably. To evaluate the first term of the cost function f(P, G, D) (i.e., C0*P), the ratio P_used/P_available is calculated for each FPGA chip. Thus, for a 4×4 array of FPGA chips, sixteen ratios P_used/P_available are calculated. The more pins are used for a given number of available pins, the higher the ratio. Of the sixteen calculated ratios, the highest is selected. The first placement cost value is calculated from the first term C0*P by multiplying the selected maximum ratio P_used/P_available by the weighting constant C0. Because this first term depends on the ratios P_used/P_available calculated for each FPGA chip and on the particular maximum among them, the placement cost value will be higher for higher pin usage, all other factors being equal. The system selects the placement yielding the lowest placement cost. The particular placement whose maximum ratio P_used/P_available is the lowest among the maximums calculated for the various placements is generally considered the optimum placement in the FPGA array, all other factors being equal.
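The first-term calculation can be sketched as follows, assuming the 264-pin embodiment and a 4×4 array. The per-chip pin counts and the name `pin_term` are hypothetical values chosen for illustration.

```python
from typing import Sequence


def pin_term(pins_used_per_chip: Sequence[int],
             pins_available: int = 264,
             c0: float = 1.0) -> float:
    """First term C0*P: C0 times the largest P_used/P_available ratio over all chips."""
    ratios = [used / pins_available for used in pins_used_per_chip]
    return c0 * max(ratios)


# Two assumed candidate placements on a 4x4 array (16 chips each).
placement_a = [66] * 15 + [132]   # one heavily used chip
placement_b = [99] * 16           # pin usage spread evenly

print(pin_term(placement_a))  # 0.5
print(pin_term(placement_b))  # 0.375
```

All other factors being equal, the second placement would be preferred here because its worst-case pin ratio is lower.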

Gate usage/availability (G) is based on the number of gates allowable by each FPGA chip. In one embodiment, based on the location of the circuits CKTQ in the array, if the number of gates used (G_used) in each chip is above a certain threshold, then the second placement cost (C1*G) is assigned a value indicating that the placement is not feasible. Analogously, if the number of gates used in each chip containing the circuits CKTQ is at or below a certain threshold, then this second term (C1*G) is assigned a value indicating that the placement is feasible. Thus, if the system initially wants to place circuit CKTQ in a particular chip and that chip does not have enough gates to accommodate circuit CKT1, then the system concludes through the cost function that this placement is not feasible. Generally, a high number (e.g., infinity) for G ensures that the cost function generates a high placement cost value, indicating that the desired placement of the circuits CKTQ is not feasible and that an alternative placement should be determined.

In another embodiment, based on the location of the circuits CKTQ in the array, the ratio G_used/G_available is calculated for each chip, where G_used is the number of gates used by the circuits CKTQ in each FPGA chip and G_available is the number of gates available in each chip. In one embodiment, the system uses FLEX 10K100 chips for the FPGA array. The FLEX 10K100 chip contains approximately 100,000 gates. Thus, in this embodiment, G_available is equal to 100,000 gates. For a 4×4 array of FPGA chips, sixteen ratios G_used/G_available are therefore calculated. The more gates are used for a given number of available gates, the higher the ratio. Of the sixteen calculated ratios, the highest is selected. The second placement cost value is calculated from the second term C1*G by multiplying the selected maximum ratio G_used/G_available by the weighting constant C1. Because this second term depends on the ratios G_used/G_available calculated for each FPGA chip and on the particular maximum among them, the placement cost value will be higher for higher gate usage, all other factors being equal. The system selects the circuit placement yielding the lowest placement cost. The particular placement whose maximum ratio G_used/G_available is the lowest among the maximums calculated for the various placements is generally considered the optimum placement in the FPGA array, all other factors being equal.

In another embodiment, the system selects some value for C1. If the ratio G_used/G_available is greater than "1", this particular placement is infeasible (i.e., at least one chip does not have enough gates for this particular placement of circuits). As a result, the system modifies C1 to a very high number (e.g., infinity), so the second term (C1*G) also becomes a very high number and the overall placement cost value f(P, G, D) becomes very high. If, on the other hand, the ratio G_used/G_available is less than or equal to "1", this particular placement is feasible (i.e., each chip has enough gates to support the circuit implementation). As a result, the system does not modify C1, and the second term (C1*G) resolves to a particular number.
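The feasibility rule for the second term can be sketched as follows, assuming the 100,000-gate figure given for the FLEX 10K100 embodiment. The function name `gate_term` is hypothetical, and returning infinity directly stands in for raising C1 to a very high number.

```python
import math
from typing import Sequence


def gate_term(gates_used_per_chip: Sequence[int],
              gates_available: int = 100_000,
              c1: float = 1.0) -> float:
    """Second term C1*G, with an infinite cost when any chip is over capacity."""
    worst_ratio = max(used / gates_available for used in gates_used_per_chip)
    if worst_ratio > 1.0:
        # At least one chip lacks enough gates: the placement is infeasible.
        return math.inf
    return c1 * worst_ratio


print(gate_term([50_000, 120_000]))  # inf
print(gate_term([50_000, 100_000]))  # 1.0
```

An infinite second term makes the total placement cost infinite, so any feasible alternative placement compares as cheaper.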

The third term (C2*D) represents the number of hops between all gates that require interconnection. The number of hops also depends on the interconnection matrix. The connectivity matrix provides the basis for determining circuit paths between any two gates that require chip-to-chip interconnection. Not every gate needs gate-to-gate interconnection. Based on the user's original circuit design and the partitioning of clusters into particular chips, some gates do not need any interconnection because the logic elements connected to their respective inputs and outputs are located in the same chip. Other gates, however, need interconnection because the logic elements connected to their respective inputs and outputs are located in different chips.

To understand "hops", refer to the connectivity matrix shown in table form in FIG. 7 and in pictorial form in FIG. 8. In FIG. 8, each interconnection between chips, such as interconnection 602 between chip F11 and chip F14, represents 44 pins or 44 wire lines. In other embodiments, each interconnection represents more than 44 pins. In still other embodiments, each interconnection represents fewer than 44 pins.

Using this interconnection scheme, data can pass from one chip to another within two "hops" or "jumps". Thus, data can pass from chip F11 to chip F12 in one hop via interconnection 601, and data can pass from chip F11 to chip F33 in two hops via either interconnections 600 and 606 or interconnections 603 and 610. These exemplary hops are the shortest-path hops between these sets of chips. In some instances, signals may be routed through various chips such that the number of hops between a gate in one chip and a gate in another chip exceeds the shortest path. The only circuit paths that must be examined in determining the number of gate-to-gate hops are those that require interconnection.

The connectivity is represented by the sum of all hops between the gates that require inter-chip interconnection. The shortest path between any two chips can be represented by one or two "hops" using the connectivity matrix of FIGS. 7 and 8. However, for certain hardware model implementations, I/O capacity may limit the number of direct shortest-path connections between any two gates in the array, and hence these signals must be routed through longer paths (more than two hops) to reach their destinations. Accordingly, the number of hops may exceed two for some gate-to-gate connections. Generally, all things being equal, a smaller number of hops results in a smaller placement cost.

The third term (i.e., C2*D) is expressed in long form as follows.

The third term is the product of a weighting constant (C2) and a summation component (Σ ...). The summation component is essentially the sum of all hops between each gate i and gate j in the user's circuit design that require chip-to-chip interconnection. As discussed above, not all gates require inter-chip interconnection. For those gates i and j that do require inter-chip interconnection, the number of hops is determined. For all such gates i and j, the total number of hops is added together.
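The long-form expression itself is not reproduced in this text. A reconstruction consistent with the description above, with the notation Dist(FPGA_i, FPGA_j) assumed for the number of hops between the chips that hold gate i and gate j, would read:

```latex
C_2 \cdot D = C_2 \cdot \sum_{(i,j) \in \mathrm{CKT}} \mathrm{Dist}(\mathrm{FPGA}_i, \mathrm{FPGA}_j)
```

Here the sum runs only over gate pairs (i, j) that require chip-to-chip interconnection.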

The distance calculation can also be defined as follows.

Here, M is the connectivity matrix. One embodiment of the connectivity matrix is shown in FIG. 7. The distance is calculated for each gate-to-gate connection requiring interconnection. Thus, for each gate i and gate j comparison, the connectivity matrix M is examined. More specifically,
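The equations that belong at this point are not reproduced in this text. The following reconstruction is an assumption derived from the surrounding description of the index k and of the AND/OR matrix operation; the symbols Dist and the Boolean power notation are not confirmed by the source.

```latex
\mathrm{Dist}(\mathrm{FPGA}_i, \mathrm{FPGA}_j) = \min \{\, k \ge 1 : (M^{k})_{i,j} = 1 \,\}

(M^{k})_{i,j} = \bigvee_{l} \left( m_{i,l} \wedge (M^{k-1})_{l,j} \right), \qquad M^{1} = M
```

Under this reading, the distance is the smallest number of Boolean multiplications of M needed before the entry for chips i and j becomes "1".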

The matrix is set up with all the chips in the array such that each chip is identifiably numbered. These identification numbers are set up at the top of the matrix as column headers. Similarly, these identification numbers are set up along the side of the matrix as row headers. A particular entry at the intersection of a row and a column in this matrix provides the direct connectivity data between the chip identified by the row and the chip identified by the column at which the intersection occurs. For any distance calculation between chip i and chip j, the entry in the matrix Mij contains either a "1" for a direct connection or a "0" for no direct connection. The index k refers to the number of hops necessary to interconnect any gate in chip i with any gate in chip j requiring interconnection.

First, the connectivity matrix Mij for k=1 should be examined. If the entry is "1", a direct connection exists between this gate in chip i and the selected gate in chip j. Thus, the index or hop k=1 is designated as the result of Mij, and this result is the distance between these two gates. At this point, another gate-to-gate connection can be examined. However, if the entry is "0", then no direct connection exists.
If no direct connection exists, the next k should be examined. This new k (i.e., k=2) can be computed by multiplying the matrix Mij by itself; that is, M² = M*M, where k=2.
This process of multiplying by M continues until the particular row and column entry for chip i and chip j, i.e., the computed result, is "1", at which point the index k is selected as the number of hops. The operation includes ANDing the matrices M together and then ORing the ANDed results. If the AND operation between the matrix entries m_i,l and m_l,j results in a logic "1" value, then a connection exists between the selected gate in chip i and the selected gate in chip j through some chip l within hop k; if not, no connection exists within this particular hop k and further computation is necessary. The matrices m_i,l and m_l,j are the connectivity matrix M as defined for this hardware modeling. For any given gate i and gate j requiring interconnection, the row containing the FPGA chip for gate i in matrix m_i,l is logically ANDed with the column containing the FPGA chip for gate j in matrix m_l,j. The individual ANDed components are ORed to determine whether the resulting Mij value for index or hop k is "1" or "0". If the result is "1", then a connection exists and the index k is designated as the number of hops. If the result is "0", then no connection exists.
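The AND/OR procedure described above amounts to repeated Boolean matrix multiplication. A minimal sketch, assuming a square 0/1 connectivity matrix and chips identified by row/column index, is shown here; the names `bool_product` and `hops`, and the cap on the number of multiplications, are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[int]]


def bool_product(a: Matrix, b: Matrix) -> Matrix:
    """Boolean matrix product: entry (i, j) is the OR over l of a[i][l] AND b[l][j]."""
    n = len(a)
    return [[int(any(a[i][l] and b[l][j] for l in range(n))) for j in range(n)]
            for i in range(n)]


def hops(m: Matrix, i: int, j: int, max_hops: Optional[int] = None) -> Optional[int]:
    """Smallest k such that entry (i, j) of the k-th Boolean power of m is 1, else None."""
    limit = max_hops if max_hops is not None else len(m)
    power = m                      # k = 1: the connectivity matrix itself
    for k in range(1, limit + 1):
        if power[i][j]:
            return k
        power = bool_product(power, m)
    return None


# Assumed example: three chips connected in a chain 0 - 1 - 2.
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
print(hops(chain, 0, 1))  # 1
print(hops(chain, 0, 2))  # 2
```

The sketch is intended for gates located in different chips; gates in the same chip need no interconnection and would not be passed to `hops`.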


The following example illustrates these principles. Refer to FIGS. 35(A) to 35(D). FIG. 35(A) shows a user's circuit design represented as a cloud 1090. This circuit design 1090 may be simple or complex. A portion of circuit design 1090 includes an OR gate 1091 and two AND gates 1092 and 1093. The outputs of AND gates 1092 and 1093 are coupled to the inputs of OR gate 1091. These gates 1091, 1092, and 1093 may also be coupled to other portions of circuit design 1090.

Referring to FIG. 35(B), the components of this circuit 1090, including the portion containing the three gates 1091, 1092, and 1093, may be configured and placed in FPGA chips 1094, 1095, and 1096. This particular array of FPGA chips has the interconnection scheme shown; that is, a set of interconnections 1097 couples chip 1094 to chip 1095, and another set of interconnections 1098 couples chip 1095 to chip 1096. No direct interconnection is provided between chip 1094 and chip 1096. When the components of this circuit design 1090 are placed into chips, the system uses the pre-designed interconnection scheme to connect circuit paths across different chips.

Referring to FIG. 35(C), one possible configuration and placement has OR gate 1091 placed in chip 1094, AND gate 1092 placed in chip 1095, and AND gate 1093 placed in chip 1096. For clarity of illustration, other portions of circuit 1090 are not shown. Because OR gate 1091 and AND gate 1092 are located in different chips, the connection between them requires an interconnection, so the set of interconnects 1097 is used. The number of hops for this interconnection is "1". The connection between OR gate 1091 and AND gate 1093 also requires interconnection, and here the sets of interconnects 1097 and 1098 are both used. The number of hops is "2". For this placement example, the total number of hops is "3", not counting the contribution of other gates and interconnections in the remainder of circuit 1090, which is not shown.
FIG. 35(D) shows another placement example. Here, OR gate 1091 is placed in chip 1094, and AND gates 1092 and 1093 are both placed in chip 1095. Again, for clarity of illustration, other portions of circuit 1090 are not shown. Because OR gate 1091 and AND gate 1092 are located in different chips, the connection between them requires an interconnection, so the set of interconnects 1097 is used. The number of hops for this interconnection is "1". The connection between OR gate 1091 and AND gate 1093 also requires an interconnection, and the set of interconnects 1097 is used. The number of hops is also "1". For this placement example, the total number of hops is "2", not counting the contribution of other gates and interconnections in the remainder of circuit 1090, which is not shown. Thus, on the basis of the distance parameter D alone, and assuming all other factors are equal, the cost function yields a lower cost for the placement of FIG. 35(D) than for the placement of FIG. 35(C). However, the other factors are not all equal. More than likely, the cost of the placement of FIG. 35(D) also depends on the gate usage/availability G: in FIG. 35(D), one more gate is used in chip 1095 than in the same chip in FIG. 35(C). In addition, the pin usage/availability P for chip 1095 in the placement of FIG. 35(C) is greater than the pin usage/availability for the same chip in the placement of FIG. 35(D).
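The hop totals for the two placements of FIGS. 35(C) and 35(D) can be checked with a small breadth-first search over the chip interconnections. This is a sketch for illustration only: the dictionary layout, the gate labels such as `"OR1091"`, and the function names are hypothetical, and only the distance contribution D is computed, not the full cost function.

```python
from collections import deque

# Chip-level links from FIG. 35(B): 1094 <-> 1095 (set 1097), 1095 <-> 1096 (set 1098).
LINKS = {1094: [1095], 1095: [1094, 1096], 1096: [1095]}

# Gate-level connections from FIG. 35(A): each AND output feeds the OR gate.
NETS = [("AND1092", "OR1091"), ("AND1093", "OR1091")]


def hops(src, dst):
    """Minimum number of chip-to-chip hops between two chips (0 if same chip)."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        chip, dist = queue.popleft()
        for nxt in LINKS[chip]:
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # chips are not connected


def total_hops(placement):
    """Sum the hop counts over all gate-to-gate connections for one placement."""
    return sum(hops(placement[a], placement[b]) for a, b in NETS)


placement_c = {"OR1091": 1094, "AND1092": 1095, "AND1093": 1096}  # FIG. 35(C)
placement_d = {"OR1091": 1094, "AND1092": 1095, "AND1093": 1095}  # FIG. 35(D)
```

Here `total_hops(placement_c)` is 3 and `total_hops(placement_d)` is 2, matching the totals stated for the two figures.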
After coarse-grain placement, fine tuning of the flattened cluster placement further optimizes the placement result. This fine-grain placement operation 353 refines the placement initially selected by the coarse-grain placement operation 352. Here, initial clusters may be split up if doing so improves the placement. For example, assume that logic elements X and Y are originally part of cluster A and designated for FPGA chip 10. As a result of the fine-grain placement operation 353, logic elements X and Y may now be designated as a separate cluster B, or made part of another cluster C, and designated for placement in FPGA chip 2. An FPGA netlist 354, which ties the user's circuit design to specific FPGAs, is then generated.
The decision of how to split clusters and in which chip to place them is based on the placement cost computed through the cost function f(P, G, D) for the circuit CKTQ. In one embodiment, the cost function used for the fine-grain placement process is the same as the cost function used for the coarse-grain placement process. The only difference between the two placement processes is the size of the clusters being placed, not the process itself: the coarse-grain placement process uses larger clusters than the fine-grain placement process. In other embodiments, the cost functions for the coarse-grain and fine-grain placement processes differ from each other, as described above with respect to selecting the weighting constants C0, C1, and C2.


When placement is complete, the chip-to-chip routing task 355 is performed. If the number of routing wires needed to connect circuits placed in different chips exceeds the number of pins available in the FPGA chips for chip-to-chip routing, time division multiplex (TDM) circuits are used. For example, if each FPGA chip allows only 44 pins for connecting circuits placed in two different FPGA chips, and a particular model implementation requires 45 wires between the chips, a special time division multiplex circuit is implemented in each chip. This TDM circuit groups at least two wires together. One embodiment of the TDM circuit is shown in FIGS. 9(A), 9(B), and 9(C), which are discussed later. The routing task can therefore always be completed, because the pins between chips can be arranged in a time-division multiplexed fashion.
Once the placement and routing of each FPGA are determined, each FPGA can be configured into optimized, working circuits, and the system therefore generates a "bitstream" configuration file 356. In Altera terminology, the system creates one or more Programmer Object Files (.pof). Other generated files include SRAM Object Files (.sof), JEDEC Files (.jed), Hexadecimal (Intel-Format) Files (.hex), and Tabular Text Files (.ttf). The Altera MAX+PLUS II Programmer uses POF, SOF, and JEDEC files, together with Altera hardware programming devices, to program the FPGA array. Alternatively, the system generates one or more raw binary files (.rbf). The CPU modifies the .rbf files and programs the FPGA array through the PCI bus.
At this point, the configured hardware is ready for hardware start-up 370. This completes the automatic construction of the hardware model on the reconfigurable board.
Returning to the TDM circuit, which allows a group of pin outputs to be time-division multiplexed together so that only one pin output is actually used: the TDM circuit is essentially a multiplexer with at least two inputs (for the two wires), one output, and a pair of registers configured in a loop that supplies the selector signals. If the SEmulation system needs to group more wires together, more inputs and loop registers can be provided. As the selector signals of the TDM circuit, the registers configured in the loop provide the appropriate signals to the multiplexer so that in one time period one of the inputs is selected as the output, and in another time period another input is selected as the output. The TDM circuit thus manages with only one output wire between chips, so that, for this example, the hardware model of the circuit implemented in a particular chip can be accomplished with 44 pins instead of 45. The routing task can therefore always be completed, because the pins between chips can be arranged in a time-division multiplexed fashion.


FIG. 9(A) shows an overview of the pin-out problem. Because this problem calls for a TDM circuit, FIG. 9(B) provides a TDM circuit for the transmission side and FIG. 9(C) provides a TDM circuit for the receiver side. These figures show only one particular example, in which the SEmulation system requires one wire between chips instead of two. If more than two wires must be grouped together in a time-multiplexed arrangement, one of ordinary skill in the art can make the appropriate modifications in light of the teachings below.

FIG. 9(A) shows one embodiment in which the SEmulation system groups two wires in a TDM configuration. Two chips, 990 and 991, are provided. A circuit 960, which is a portion of the complete user circuit design, is modeled and placed in chip 991. A circuit 973, which is also a portion of the complete user circuit design, is modeled and placed in chip 990. Several interconnections, including a group of interconnections 994, an interconnection 992, and an interconnection 993, are provided between circuits 960 and 973. In this example, the interconnections total 45. If, in one embodiment, each chip provides only 44 pins for these interconnections, one embodiment of the present invention time-division multiplexes at least two of the interconnections so that they require only one interconnection between chips 990 and 991.

In this example, the group of interconnections 994 continues to use 43 pins. For the 44th and last pin, a TDM circuit in accordance with one embodiment of the present invention can be used to couple interconnections 992 and 993 together in a time-division multiplexed fashion.
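The pin arithmetic in this example (45 wires, 44 pins, hence 43 direct pins plus one shared pin carrying two wires) can be generalized in a short sketch. The function `plan_tdm_pins`, its parameters, and the idea of a uniform `group_size` per shared pin are assumptions introduced for illustration; the specification itself only describes the two-wire case.

```python
def plan_tdm_pins(wires, pins, group_size=2):
    """Decide how many pins stay direct and how many are time-shared.

    wires:      number of chip-to-chip wires the model needs
    pins:       number of pins available for chip-to-chip routing
    group_size: maximum number of wires multiplexed onto one shared pin (assumed)
    """
    if group_size < 2:
        raise ValueError("a shared pin must carry at least two wires")
    if wires <= pins:
        return {"direct_pins": wires, "shared_pins": 0, "muxed_wires": 0}
    extra = wires - pins
    # Each shared pin absorbs (group_size - 1) extra wires; round up.
    shared = -(-extra // (group_size - 1))
    if shared > pins:
        raise ValueError("group_size too small for this pin budget")
    direct = pins - shared
    return {
        "direct_pins": direct,
        "shared_pins": shared,
        "muxed_wires": wires - direct,
    }
```

For the case in the text, `plan_tdm_pins(45, 44)` gives 43 direct pins, 1 shared pin, and 2 multiplexed wires.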

FIG. 9(B) shows one embodiment of the TDM circuit. The modeled circuit (or a portion thereof) 960 within FPGA chip 991 provides two signals on wires 966 and 967. To circuit 960, these wires 966 and 967 are outputs. These outputs would normally be coupled to the modeled circuit 973 in chip 990 (see FIGS. 9(A) and 9(C)). However, the availability of only one pin for these two output wires 966 and 967 precludes a direct pin-to-pin connection. Because outputs 966 and 967 are transmitted unidirectionally to the other chip, appropriate transmitter and receiver TDM circuits are provided to couple these lines together. One embodiment of the transmission-side TDM circuit is shown in FIG. 9(B).

The transmission-side TDM circuit includes AND gates 961 and 962, whose respective outputs 970 and 971 are coupled to the inputs of OR gate 963. The output 972 of OR gate 963 is the output of the chip; it is assigned to one pin and connected to the other chip 990. One set of inputs, 966 and 967, to AND gates 961 and 962, respectively, is provided by circuit model 960. The other set of inputs, 968 and 969, is provided by a looped register scheme that serves as the time-division multiplex selector signal.

The looped register scheme includes registers 964 and 965. The output 995 of register 964 is provided to the input of register 965 and to input 968 of AND gate 961. The output 996 of register 965 is provided to the input of register 964 and to input 969 of AND gate 962. Both registers 964 and 965 are controlled by a common clock source. At any given time, only one of the outputs 995 and 996 provides a logic "1"; the other provides a logic "0". After each clock edge, the logic "1" therefore shifts between outputs 995 and 996. This in turn presents a "1" to either AND gate 961 or AND gate 962, "selecting" either the signal on wire 966 or the signal on wire 967. Thus, the data on wire 972 comes from circuit 960 on either wire 966 or wire 967.
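The behavior of the transmission-side circuit of FIG. 9(B) can be illustrated with a small cycle-level model. This is a behavioral sketch only: the class name, the choice that register 964 initially holds the logic "1", and the use of Python integers for logic levels are assumptions, since the text does not state the reset values of the loop registers.

```python
class TdmTransmitter:
    """Behavioral model of AND gates 961/962, OR gate 963, and loop registers 964/965."""

    def __init__(self):
        # One-hot loop: exactly one register holds 1 (initial phase is assumed).
        self.r964 = 1  # output 995 -> input 968 of AND gate 961
        self.r965 = 0  # output 996 -> second input of AND gate 962

    def output(self, w966, w967):
        """Value on wire/pin 972 for the current select phase."""
        n970 = w966 & self.r964  # AND gate 961
        n971 = w967 & self.r965  # AND gate 962
        return n970 | n971       # OR gate 963

    def clock(self):
        """Common clock edge: the logic 1 shifts to the other loop register."""
        self.r964, self.r965 = self.r965, self.r964


def transmit(samples):
    """Send each (w966, w967) pair over two consecutive clock periods."""
    tx = TdmTransmitter()
    stream = []
    for w966, w967 in samples:
        stream.append(tx.output(w966, w967))  # phase selecting wire 966
        tx.clock()
        stream.append(tx.output(w966, w967))  # phase selecting wire 967
        tx.clock()
    return stream
```

For instance, `transmit([(1, 0), (0, 1)])` produces `[1, 0, 0, 1]`: each pair of wire values is serialized onto the single pin over two clock periods.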

One embodiment of the receiver side of the TDM circuit is shown in FIG. 9(C). The signals from circuit 960 on wires 966 and 967 in chip 991 (FIGS. 9(A) and 9(B)) must be coupled to the appropriate wires 985 and 986 for circuit 973 in FIG. 9(C). The time-division multiplexed signals from chip 991 enter on wire/pin 978. The receiver-side TDM circuit couples these signals on wire/pin 978 to the appropriate wires 985 and 986 for circuit 973.

The TDM circuit includes input registers 974 and 975. The signals on wire/pin 978 are provided to these input registers 974 and 975 via wires 979 and 980, respectively. The output 985 of input register 974 is provided to the appropriate port of circuit 973. Similarly, the output 986 of input register 975 is provided to the appropriate port of circuit 973. These input registers 974 and 975 are controlled by looped registers 976 and 977.

The output 984 of register 976 is coupled to the input of register 977 and to the clock input 981 of register 974. The output 983 of register 977 is coupled to the input of register 976 and to the clock input 982 of register 975. Both registers 976 and 977 are controlled by a common clock source. At any given time, only one of the enable inputs 981 and 982 is a logic "1"; the other is a logic "0". After each clock edge, the logic "1" therefore shifts between enable inputs 981 and 982. This in turn alternately "selects" the signal on wire 979 or wire 980. Thus, data from circuit 960 arriving on wire 978 can be properly coupled to circuit 973 via wire 985 or wire 986.
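The receiver side of FIG. 9(C) can be modeled in the same behavioral style. This sketch treats inputs 981 and 982 as clock enables sampled on the common clock, assumes register 976 initially holds the logic "1", and assumes the receiver's select phase is aligned with the sender's; none of these details are spelled out in the text, and the class and method names are hypothetical.

```python
class TdmReceiver:
    """Behavioral model of input registers 974/975 and loop registers 976/977."""

    def __init__(self):
        # One-hot loop registers (initial phase is an assumption of this sketch).
        self.r976 = 1  # output 984 -> enable 981 of input register 974
        self.r977 = 0  # output 983 -> enable 982 of input register 975
        self.r974 = 0  # drives wire 985 into circuit 973
        self.r975 = 0  # drives wire 986 into circuit 973

    def clock(self, pin978):
        """Common clock edge: the enabled input register captures the pin value."""
        if self.r976:
            self.r974 = pin978
        else:
            self.r975 = pin978
        # The logic 1 shifts to the other loop register.
        self.r976, self.r977 = self.r977, self.r976

    def outputs(self):
        """Current values on wires 985 and 986."""
        return (self.r974, self.r975)


def receive(stream):
    """Demultiplex a serialized pin stream back into (w985, w986) pairs."""
    rx = TdmReceiver()
    pairs = []
    for index, bit in enumerate(stream):
        rx.clock(bit)
        if index % 2 == 1:  # both input registers refreshed after two clocks
            pairs.append(rx.outputs())
    return pairs
```

Feeding the serialized stream `[1, 0, 0, 1]` to `receive` recovers the pairs `[(1, 0), (0, 1)]`, that is, the original values of the two multiplexed wires.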
As briefly discussed with respect to FIG. 4, the address pointers in accordance with one embodiment of the present invention are now discussed in more detail. To reiterate, several address pointers are located in each FPGA chip of the hardware model. In general, the primary purpose of implementing the address pointers is to enable the system to transfer data between the software model 315 and a specific FPGA chip in the hardware model 325 via the 32-bit PCI bus 328 (see FIG. 10). More specifically, because of the bandwidth limitations of the 32-bit PCI bus, the primary purpose of the address pointers is to selectively control the transfer of data between each address space in the software/hardware boundary (i.e., REG, S2H, H2S, and CLK) and each FPGA chip in the banks 326a-326d of FPGA chips. Even if a 64-bit PCI bus is implemented, these address pointers are still needed to control data transfer. Thus, if the software model has five address spaces (i.e., REG read, REG write, S2H read, H2S write, and CLK write), each FPGA chip has five address pointers corresponding to these five address spaces. Each FPGA needs these five address pointers because the particular selected word in the selected address space being processed may reside in any one or more of the FPGA chips.
The FPGA I/O controller 381 selects the particular address space (i.e., REG, S2H, H2S, or CLK) corresponding to the software/hardware boundary by using a SPACE index. Once the address space is selected, the address pointer corresponding to that address space in each FPGA chip selects the particular word corresponding to the same word in the selected address space. The maximum sizes of the address spaces in the software/hardware boundary and of the address pointers in each FPGA chip depend on the memory/word capacity of the selected FPGA chip. For example, one embodiment of the present invention uses the Altera FLEX 10K family of FPGA chips. Accordingly, the estimated maximum size of each address space is: REG, 3,000 words; CLK, 1 word; S2H, 10 words; and H2S, 10 words. Each FPGA chip can hold approximately 100 words.


The SEmulation system has features that allow the user to start, stop, assert input values, and inspect values at any time during the SEmulation process. To provide the flexibility of a simulator, the SEmulator must make all components visible to the user, regardless of whether a component's internal realization is in software or hardware. In software, combinational components are modeled and their values are computed during the simulation process. These values are therefore fully "visible" for the user to access at any time during the simulation process.

In the hardware model, however, combinational components are not directly "visible". Although registers are readily and directly accessible (i.e., readable and writable) by the software kernel, the values of combinational components are more difficult to determine. In the FPGA, most combinational components are modeled as look-up tables in order to achieve high gate utilization. As a result, look-up table mapping provides efficient hardware modeling but sacrifices the visibility of most combinational logic signals.

Despite these problems arising from the lack of visibility of combinational components, the SEmulation system can rebuild, or regenerate, the combinational components for inspection by the user after the hardware acceleration mode. If the user's circuit design has only combinational and register components, the values of all combinational components can be derived from the register components. That is, the combinational components are built from, and fed by, registers in various arrangements according to the particular logic functions required by the circuit design. The SEmulator has hardware models of only the register and combinational components, and as a result the SEmulator reads all register values from the hardware model and then rebuilds or regenerates all the combinational components. Because of the overhead required to perform this regeneration process, combinational component regeneration is not performed all the time; rather, it is performed only on request by the user. Indeed, one of the benefits of using the hardware model is to accelerate the simulation process, and determining combinational component values at every cycle (or even most cycles) would reduce simulation speed. In any event, inspection of register values alone should be sufficient for most simulation analyses.

The process of regenerating combinational component values from register values presupposes that the SEmulation system was in the hardware acceleration mode or ICE mode. Otherwise, software simulation already provides combinational component values to the user. The SEmulation system maintains the combinational component values, as well as the register values, that resided in the software model before the onset of hardware acceleration. These values remain in the software model until they are overwritten by the system. Because the software model already holds register values and combinational component values from the time period immediately before the onset of hardware acceleration, the combinational component regeneration process involves updating some or all of these values in the software model in response to the updated register values.

The combinational component regeneration process is as follows. First, if requested by the user, the software kernel reads all the output values of the hardware register components from the FPGA chips into the REG buffer. This process involves a DMA transfer of the register values in the FPGA chips, via the chain of address pointers, to the REG address space. Placing the register values from the hardware model into the REG buffer, which lies in the software/hardware boundary, allows the software model to access the data for further processing.

Second, the software kernel compares the register values from before the hardware acceleration run with those from after the run. If the register values before the hardware acceleration run are the same as the values after it, the values of the combinational components have not changed. Instead of expending time and resources to regenerate the combinational components, these values can be read from the software model, which already holds the combinational component values stored from the time immediately before the hardware acceleration run. On the other hand, if one or more of these register values have changed, one or more combinational components that depend on the changed register values may also have changed value. These combinational components must be regenerated through the third step below.

Third, for registers whose values differ between the pre-acceleration and post-acceleration comparison, the software kernel schedules their fan-out combinational components into the event queue. Here, a register that changed value during the acceleration run constitutes an event. More than likely, the combinational components that depend on these changed register values will produce different values. Whether or not the values of these combinational components actually change, the system ensures that these combinational components evaluate the changed register values in the next step.

Fourth, the software kernel then executes the standard event simulation algorithm to propagate the value changes from the registers to all the combinational components in the software model. In other words, the register values that changed during the interval from pre-acceleration to post-acceleration are propagated to all downstream combinational components that depend on these register values. These combinational components then evaluate the new register values. In accordance with fan-out and propagation principles, second-level combinational components located downstream of the first-level combinational components, which in turn depend directly on the changed register values, must also evaluate the changed data. This propagation of register values to other downstream components that may be affected continues to the end of the fan-out network. Thus, only those combinational components that lie downstream of, and are affected by, the changed register values are updated in the software model. No other combinational component values are affected. Accordingly, if only one register value changed during the interval from pre-acceleration to post-acceleration, and only one combinational component is affected by this change, then only this combinational component re-evaluates its value based on the changed register value. Other portions of the modeled circuit are unaffected. For such a small change, the combinational component regeneration process is relatively fast.
Finally, when event propagation is complete, the system is ready for any mode of operation. Typically, the user wishes to inspect values after a long run. After the combinational component regeneration process, the user may continue with pure software simulation for debug or test purposes. At other times, the user may wish to resume hardware acceleration to the next desired point. In still other cases, the user may wish to proceed to ICE mode.
In summary, combinational component regeneration involves using register values to update combinational component values in the software model. When any register value has changed, the changed register value is propagated through the register's fan-out network as the values are updated. When no register value has changed, the values in the software model do not change either, so the system does not need to regenerate the combinational components. Typically, a hardware acceleration run lasts for some time. As a result, many register values may change, affecting many combinational component values located downstream in the fan-out networks of the registers with changed values. In this case, the combinational component regeneration process may be relatively slow. In other cases, only a few register values may have changed after a hardware acceleration run. The fan-out network of the registers with changed values may then be small, and the combinational component regeneration process may be relatively fast.
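The regeneration steps summarized above (compare register values, schedule the fan-out of changed registers, then propagate events) can be sketched as follows. This is a minimal illustration under stated assumptions: the netlist representation, the function name `regenerate`, and the signal names are hypothetical, the combinational logic is assumed to be acyclic, and a plain FIFO queue stands in for the kernel's "standard event simulation algorithm", whose details are not given here.

```python
from collections import deque


def regenerate(netlist, values, new_regs):
    """Update combinational values in the software model after an acceleration run.

    netlist:  {component: (function, [input signal names])}  (hypothetical format)
    values:   software-model values from just before acceleration (mutated in place)
    new_regs: register values read back from the hardware model
    Returns the list of components that were re-evaluated, in order.
    """
    # Build the fan-out map: signal name -> components that read it.
    fanout = {}
    for name, (_, inputs) in netlist.items():
        for src in inputs:
            fanout.setdefault(src, []).append(name)

    # Step 2: compare pre- and post-acceleration register values.
    changed = [reg for reg, val in new_regs.items() if values.get(reg) != val]
    for reg in changed:
        values[reg] = new_regs[reg]

    # Step 3: schedule the fan-out components of changed registers.
    queue = deque(comp for reg in changed for comp in fanout.get(reg, []))

    # Step 4: propagate events until the fan-out network is exhausted.
    evaluated = []
    while queue:
        comp = queue.popleft()
        fn, inputs = netlist[comp]
        new_value = fn(*(values[src] for src in inputs))
        evaluated.append(comp)
        if new_value != values.get(comp):
            values[comp] = new_value
            queue.extend(fanout.get(comp, []))
    return evaluated


# Hypothetical example: n1 = r1 AND r2, n2 = n1 OR r3, n3 = NOT r3.
example_netlist = {
    "n1": (lambda a, b: a & b, ["r1", "r2"]),
    "n2": (lambda a, b: a | b, ["n1", "r3"]),
    "n3": (lambda a: 1 - a, ["r3"]),
}
example_values = {"r1": 0, "r2": 1, "r3": 0, "n1": 0, "n2": 0, "n3": 1}
example_result = regenerate(example_netlist, example_values, {"r1": 1, "r2": 1, "r3": 0})
```

In this example only register `r1` changed, so only `n1` and its downstream component `n2` are re-evaluated; `n3` is never touched, which mirrors the point that the cost of regeneration scales with the size of the affected fan-out network.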

Ⅳ. Emulation with Target System Mode

FIG. 10 illustrates the SEmulation system architecture in accordance with one embodiment of the present invention. FIG. 10 shows the relationship among the software model, the hardware model, the emulation interface, and the target system when the system operates in in-circuit emulation mode. As described above, the SEmulation system includes a general-purpose microprocessor and a reconfigurable hardware board interconnected by a high-speed bus such as a PCI bus. The SEmulation system compiles the user's circuit design and generates the emulation hardware configuration data during the hardware-model-to-reconfigurable-board mapping process. The user can then simulate the circuit on the general-purpose processor, hardware-accelerate the simulation process, emulate the circuit design with the target system via the emulation interface, and later perform post-simulation analysis.

The software model 315 and the hardware model 325 are determined during the compilation process. The emulation interface 382 and the target system 387 are provided to the system during in-circuit emulation mode. At the user's discretion, the emulation interface and the target system need not be coupled to the system at the outset.

The software model 315 includes a kernel 316 that controls the overall system and the four address spaces for the software/hardware boundary: REG, S2H, H2S, and CLK. The SEmulation system maps the hardware model into four address spaces in main memory according to component type and control function: REG space 317 is designated for register components; CLK space 320 is designated for the software clock; S2H space 318 is designated for the output of the software test bench components to the hardware model; and H2S space 319 is designated for the output of the hardware model to the software test bench components. These dedicated I/O buffer spaces are mapped into the kernel's main memory space during system initialization.
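As a rough illustration of the four boundary spaces, the following sketch assigns each space a contiguous range inside one buffer at initialization time. The sizes, the ordering, and the names `Space` and `map_spaces` are assumptions made for the example; the specification does not give a concrete layout here.

```python
from enum import Enum

class Space(Enum):
    REG = "register components"
    S2H = "software test bench outputs to the hardware model"
    H2S = "hardware model outputs to the software test bench"
    CLK = "software clock"

def map_spaces(sizes, base=0):
    """Assign each space a contiguous [start, end) range, in enum order."""
    layout = {}
    offset = base
    for space in Space:
        size = sizes[space]
        layout[space] = (offset, offset + size)
        offset += size
    return layout

# Hypothetical sizes in bytes, chosen only for illustration.
layout = map_spaces({Space.REG: 256, Space.S2H: 64, Space.H2S: 64, Space.CLK: 8})
```

With these assumed sizes, REG occupies offsets 0 to 255 and CLK the final 8 bytes; a real implementation would derive the sizes from the compiled design.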

The hardware model includes several banks 326a-326d of FPGA chips and an FPGA I/O controller 327. Each bank (e.g., 326b) contains at least one FPGA chip. In one embodiment, each bank consists of four FPGA chips. In a 4x4 array of FPGA chips, banks 326b and 326d may be the low banks and banks 326a and 326c may be the high banks. The mapping, placement, and routing of specific hardware-modeled user circuit design elements to specific chips and their interconnections are discussed with reference to FIG. 6. The interconnect 328 between the software model 315 and the hardware model 325 is a PCI bus system. The hardware model also includes a PCI interface 380 and a control unit 381 to control data traffic between the PCI bus and the banks 326a-326d of FPGA chips while maintaining the throughput of the PCI bus. Each FPGA chip further contains several address pointers, where each address pointer corresponds to one address space (i.e., REG, S2H, H2S, and CLK) in the software/hardware boundary, to couple data between each of these address spaces and each FPGA chip in the banks 326a-326d.

Communication between the software model 315 and the hardware model 325 occurs through a DMA engine or address pointers in the hardware model. Alternatively, communication occurs through both the address pointers and the DMA engine in the hardware model. The kernel initiates DMA transfers together with evaluation requests through direct-mapped I/O control registers. The REG space 317, CLK space 320, S2H space 318, and H2S space 319 use I/O data path lines 321, 322, 323, and 324, respectively, for data transfer between the software model 315 and the hardware model 325.

Double buffering is required for all primary inputs in the S2H and CLK spaces because these spaces take several clock cycles to complete the update process. Double buffering avoids disturbing the internal hardware model state, which could otherwise cause race conditions.
The S2H and CLK spaces are the primary inputs from the kernel to the hardware model. As described above, the hardware model holds substantially all of the combinational components and register components of the user's circuit design. In addition, the software clock is modeled in software and provided in the CLK I/O address space to interface with the hardware model. The kernel advances simulation time, looks for active test bench components, and evaluates clock components. When the kernel detects any clock edge, the registers and memories are updated and the values are propagated through the combinational components. Thus, any change of values in these spaces triggers the hardware model to change logic states if the hardware acceleration mode is selected.
During in-circuit emulation mode, the emulation interface 382 is coupled to the PCI bus 328 so that it can communicate with the hardware model 325 and the software model 315. The kernel 316 controls not only the software model but also the hardware model during hardware-accelerated simulation mode and in-circuit emulation mode. The emulation interface 382 is coupled to the target system 387 via a cable 390. The emulation interface 382 includes an interface port 385, an emulation I/O control 386, a target-to-hardware I/O buffer (T2H) 384, and a hardware-to-target I/O buffer (H2T) 383.
The target system 387 includes a connector 389, a signal-in/signal-out interface socket 388, and other modules or chips that are part of the target system 387. For example, the target system 387 may be an EGA video controller, and the user's circuit design may be one particular I/O controller circuit. The user's circuit design of the I/O controller for the EGA video controller is modeled completely in the software model 315 and partially in the hardware model 325.
The kernel 316 in the software model 315 controls the in-circuit emulation mode. Control of the emulation clock remains in software via the software clock, the gated clock logic, and the gated data logic, so that setup and hold-time problems do not arise during in-circuit emulation mode. Thus, the user can start, stop, single-step, assert values, and inspect values at any time during the in-circuit emulation process.
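The double buffering of the S2H and CLK primary inputs described earlier in this section can be sketched as a two-stage buffer: a multi-cycle update is written into a staging copy and becomes visible to the reader only once it is complete. The class name `DoubleBuffer` and its methods are hypothetical; this is a software analogy under stated assumptions, not the hardware implementation.

```python
class DoubleBuffer:
    """Two-stage buffer: writes land in a staging copy and become visible
    to the reader only when commit() is called."""

    def __init__(self, width):
        self._staging = [0] * width
        self._active = [0] * width

    def write(self, index, value):
        # Partial update; the reader cannot observe it yet.
        self._staging[index] = value

    def commit(self):
        # Publish the complete update in one step.
        self._active = list(self._staging)

    def read(self):
        return tuple(self._active)

s2h = DoubleBuffer(4)
s2h.write(0, 1)
s2h.write(3, 1)
before_commit = s2h.read()  # still the old, consistent state
s2h.commit()
after_commit = s2h.read()   # the whole update appears at once
```

Because the reader never sees a half-written set of inputs, the model state is not disturbed while an update spanning several cycles is in progress.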

For this to work, all clock nodes between the target system and the hardware model are identified. The clock generators in the target system are disabled, the clock ports from the target system are disconnected, or the clock signals from the target system are otherwise prevented from reaching the hardware model. Instead, the clock signal originates from the test bench process or another form of software-generated clock, so that the software kernel can detect active clock edges and trigger data evaluation accordingly. Thus, in ICE mode, the SEmulation system uses the software clock, rather than the target system's clock, to control the hardware model.
To simulate the operation of the user's circuit design within the environment of the target system, the primary input (signal-in) and output (signal-out) signals between the target system 40 and the modeled circuit design are provided to the hardware model 325 for evaluation. This is accomplished through two buffers: the target-to-hardware buffer (T2H) 384 and the hardware-to-target buffer (H2T) 383. The target system 387 uses the T2H buffer 384 to apply input signals to the hardware model 325. The hardware model 325 uses the H2T buffer 383 to deliver output signals to the target system 387. In this in-circuit emulation mode, the hardware model sends and receives I/O signals through the T2H and H2T buffers instead of the S2H and H2S buffers, because the system now uses the target system 387, rather than the test bench processes in the software model 315, to evaluate the data. Because the target system generally runs faster than software simulation, the in-circuit emulation mode also runs at a higher speed. The transfer of these input and output signals occurs on the PCI bus 328.
In addition, a bus 61 is provided between the emulation interface 382 and the hardware model 325. This bus is analogous to the bus 61 of FIG. 1. The bus 61 allows the emulation interface 382 and the hardware model 325 to communicate through the T2H buffer 384 and the H2T buffer 383.
Typically, the target system 387 is not coupled to the PCI bus. However, such coupling is feasible if the emulation interface 382 is incorporated into the design of the target system 387. In this setup, the cable 390 is not present. Signals between the target system 387 and the hardware model 325 still pass through the emulation interface.

Ⅴ. Post-Simulation Analysis Mode

The SEmulation system of the present invention supports value change dump (VCD), a simulator function widely used for post-simulation analysis. Essentially, the VCD provides a historical record of all inputs and selected register outputs of the hardware model so that, during later post-simulation analysis, the user can review the various inputs and resulting outputs of the simulation process. To support VCD, the system logs all inputs to the hardware model. For outputs, the system logs all values of the hardware register components at a user-defined logging frequency (e.g., 1/10,000 records/cycle). The logging frequency determines how often the output values are recorded. For a logging frequency of 1/10,000 records/cycle, the output values are recorded once every 10,000 cycles. The higher the logging frequency, the more information is recorded for later post-simulation analysis. The lower the logging frequency, the less information is stored for later post-simulation analysis. Because the selected logging frequency directly affects the SEmulation speed, the user should select the logging frequency with care. A higher logging frequency decreases the SEmulation speed because the system must spend time and resources recording the output data through I/O operations to memory before further simulation can be performed.

For post-simulation analysis, the user selects a particular point from which simulation is desired. If the logging frequency is 1/500 records/cycle, register values are recorded every 500 cycles, at points 0, 500, 1000, 1500, and so on. If the user wants the results at point 610, for example, the user selects recorded point 500 and simulates forward until the simulation reaches point 610. During the analysis stage, the analysis speed is the same as the simulation speed because the user first accesses the data for point 500 and then simulates forward to point 610. Note that at higher logging frequencies, more data is stored for post-simulation analysis. Thus, for a logging frequency of 1/300 records/cycle, data is recorded every 300 cycles, at points 0, 300, 600, 900, and so on. To obtain the results at point 610, the user first selects recorded point 600 and simulates forward to point 610. The desired point 610 is therefore reached more quickly during post-simulation analysis when the logging frequency is 1/300 rather than 1/500. However, this is not always the case. The particular analysis point, in conjunction with the logging frequency, determines how quickly the post-simulation analysis point is reached. For example, the system can reach point 523 more quickly if the VCD logging frequency is 1/500 rather than 1/300.
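The arithmetic in these examples can be checked with a short sketch. It assumes register values are logged at cycles 0, N, 2N, and so on for a logging frequency of 1/N records/cycle; the function names are hypothetical.

```python
def nearest_checkpoint(target, interval):
    """Return the last logged cycle at or before `target`, assuming
    logging at cycles 0, interval, 2 * interval, and so on."""
    if target < 0 or interval <= 0:
        raise ValueError("target must be >= 0 and interval must be > 0")
    return (target // interval) * interval

def replay_cycles(target, interval):
    """Number of cycles to re-simulate from the checkpoint to `target`."""
    return target - nearest_checkpoint(target, interval)

# Point 610: logging every 300 cycles needs less replay than every 500.
replay_610 = (replay_cycles(610, 500), replay_cycles(610, 300))
# Point 523: the coarser 1/500 logging happens to be closer.
replay_523 = (replay_cycles(523, 500), replay_cycles(523, 300))
```

For point 610 the replay distances are 110 cycles (1/500) and 10 cycles (1/300); for point 523 they are 23 cycles (1/500) and 223 cycles (1/300), which is why a higher logging frequency does not always reach the analysis point sooner.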

The user then performs the analysis by running the software simulation with the logged inputs to the hardware model in order to compute the value change dump (VCD) of all hardware components after SEmulation. The user can also select any register log point in time and start the value change dump from that log point forward in time. This value change dump (VCD) method can link to any simulation waveform viewer for post-simulation analysis.
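For readers unfamiliar with the file format, the following minimal sketch emits a small value change dump in the textual VCD layout defined by the Verilog standard (header declarations followed by timestamped value changes). The function `to_vcd`, the signal names, and the sample data are hypothetical, and the sketch supports only up to 94 signals because it uses single printable characters as identifier codes.

```python
def to_vcd(signals, samples, timescale="1ns", scope="top"):
    """Build VCD text.

    signals: list of (name, bit_width)
    samples: list of (time, {name: int_value}), in increasing time order
    Only values that differ from the previous sample are emitted.
    """
    ids = {name: chr(33 + i) for i, (name, _) in enumerate(signals)}
    lines = [f"$timescale {timescale} $end", f"$scope module {scope} $end"]
    for name, width in signals:
        lines.append(f"$var wire {width} {ids[name]} {name} $end")
    lines += ["$upscope $end", "$enddefinitions $end"]
    last = {}
    for time, values in samples:
        changes = []
        for name, width in signals:
            if name in values and values[name] != last.get(name):
                last[name] = values[name]
                if width == 1:
                    changes.append(f"{values[name]}{ids[name]}")      # scalar
                else:
                    changes.append(f"b{values[name]:b} {ids[name]}")  # vector
        if changes:
            lines.append(f"#{time}")
            lines.extend(changes)
    return "\n".join(lines) + "\n"

vcd_text = to_vcd(
    [("clk", 1), ("count", 4)],
    [(0, {"clk": 0, "count": 0}),
     (5, {"clk": 1, "count": 0}),
     (10, {"clk": 0, "count": 1})],
)
```

At time 5 only `clk` is written because `count` did not change, which is the property that keeps a value change dump compact and lets waveform viewers reconstruct every signal from the changes alone.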

On-Demand VCD System

One embodiment of the present invention is a system that generates VCD on demand without simulation reruns. In accordance with one embodiment of the present invention, the on-demand VCD technology described herein incorporates the following high-level attributes: (1) RCC-based parallel simulation history compression and recording, (2) RCC-based parallel simulation history decompression and VCD file generation, and (3) on-demand software regeneration for a selected simulation target range and design review without simulation rerun. Each of these attributes is described in more detail below.

During a debug session, the EDA tool (hereinafter referred to as the RCC system, which incorporates various aspects of the present invention) records the primary inputs from the test bench process so that any portion of the simulation can be replayed. During later analysis, the user can selectively command the EDA tool, or RCC system, to dump the hardware state information from any simulation time range into a VCD file. The user can then immediately begin debugging the design within the selected simulation time range. If the selected simulation time range does not contain the bug that the user is trying to fix, the user can select another simulation time range to dump into a VCD file and then analyze this new VCD file. With this on-demand VCD feature, the user can stop the simulation at any point and request on-demand generation of another selective VCD file from any desired simulation start time to any simulation end time.

In a typical debug session, the user debugs the design using the RCC system shown in FIG. 83. During the first simulation, the user fast-simulates the design from a desired start simulation time to any desired end simulation time, referred to herein as the simulation session range. During this fast simulation, a highly compressed form of the primary inputs is recorded in an "input history" file so that any portion of the simulation session can be replayed. At the end of the simulation session range, the RCC system saves the hardware state information from this end point in a "simulation history" file so that the user can, if desired, return to debug the design past this end point.

At the end of the fast simulation run, the user analyzes the results and invariably detects some problems with the design. The user then guesses that the cause of the problem (i.e., the bug) lies within a specific narrower simulation time range, hereinafter called the simulation target range, which lies within the broader simulation session range. For example, if the simulation session range covers 1,000 simulation time steps, the narrower simulation target range might cover only 100 simulation time steps at a particular location within the broader simulation session range.

Once the user has guessed the location of the simulation target range in order to isolate the bug, the RCC system fast-simulates from the beginning by decompressing the compressed primary inputs in the input history file and delivering the decompressed primary inputs to the hardware model for evaluation. When the RCC system reaches the simulation target range, it dumps the evaluated results (e.g., hardware node values and register states) into a VCD file. Thereafter, the user can analyze this region more carefully by replaying the design using the VCD file, starting at the beginning of the simulation target range, or even from just after the start of the simulation, rather than rerunning the simulation from the beginning of the simulation session range. This feature of saving the hardware state for the simulation target range as a VCD file saves a significant amount of debugging time that would otherwise be wasted on simulation reruns.
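The record, replay, and dump flow just described can be sketched in a few lines. This is a toy software analogy under explicit assumptions: the "hardware model" is a hypothetical 8-bit accumulator, the compression uses Python's `zlib` on JSON-encoded inputs (the specification does not name a compression scheme at this point), and time indices are relative to the start of the session.

```python
import json
import zlib

def step(state, primary_input):
    """Toy stand-in for the hardware model: an 8-bit accumulator."""
    return (state + primary_input) & 0xFF

def record_session(inputs):
    """Compress the primary inputs of a whole session (the input history)."""
    return zlib.compress(json.dumps(inputs).encode("utf-8"))

def dump_target_range(input_history, t1, t2, initial_state=0):
    """Replay from the session start, keeping state only for t1 <= t <= t2."""
    inputs = json.loads(zlib.decompress(input_history).decode("utf-8"))
    state = initial_state
    dump = []
    for t, primary_input in enumerate(inputs):
        state = step(state, primary_input)  # fast replay, no dumping overhead
        if t1 <= t <= t2:
            dump.append((t, state))         # state dumped only inside the target range
        if t >= t2:
            break                           # nothing past the target range is needed
    return dump

history = record_session([1, 2, 3, 4, 5, 6])
target_dump = dump_target_range(history, 2, 4)
other_dump = dump_target_range(history, 0, 1)  # a second request reuses the same history
```

The second call illustrates the on-demand aspect: a different target range is served from the same recorded input history without recording the session again.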

Referring again to FIG. 83, a high-level view of an RCC system incorporating one embodiment of the present invention is shown. The RCC system includes an RCC computing system 2600 and an RCC hardware accelerator 2620. As described elsewhere in this patent specification, the RCC computing system 2600 contains the computing resources necessary to allow the user to simulate the user's entire software-modeled design in software and to control the hardware acceleration of the hardware-modeled portion of the design. To this end, the RCC computing system 2600 includes a CPU 2601, the various clocks 2602 needed by the various components of the RCC system (including the software clock described elsewhere in this patent specification), the test bench processes 2603, and a system disk 2604. In contrast to some conventional hardware-based event history buffers, the system disk, rather than a small hardware RAM buffer, is used to record the compressed data. Although not shown, the RCC computing system 2600 includes other logic components and bus subsystems that give the circuit designer the computing power to run diagnostics and various software and to manage files, among the other tasks that a computing system performs.

The RCC hardware accelerator 2620, referred to as the RCC array elsewhere in this patent specification, contains a reconfigurable array of logic elements (e.g., FPGAs) in which the user can model at least a portion of the design in hardware to accelerate the debugging process. To this end, the RCC hardware accelerator 2620 includes an array of reconfigurable logic elements 2621 that provides the hardware model of a portion of the user's design. The RCC computing system 2600 is tightly coupled to the RCC hardware accelerator 2620 via the software clock and a bus system, as described elsewhere in this specification; portions of this bus are shown as lines 2610 and 2611 in FIG. 83.

The on-demand VCD of the present invention will now be discussed with reference to FIG. 84. FIG. 84 shows a timeline with several simulation times t0, t1, t2, and t3. The simulation session range lies between simulation time t0 and simulation time t3 and includes simulation times t1 and t2 along the way. Simulation time t0 represents the first simulation time in the simulation session range, where fast simulation begins. This simulation time t0 represents the first simulation time for any separable simulation session, or simulation session range. In other words, assume that today's debug session examines the simulation session range t=10,000 to t=12,000, and the user suspects that a particular bug is located between t=10,500 and t=10,750. For this simulation session range, simulation time t0 is t=10,000. Assume that the bug is located and fixed within the simulation session range t=10,000 to t=12,000. Tomorrow, the user moves on to the next simulation session range, t=12,000 to t=15,000. Here, simulation time t0 is t=12,000. In some cases, simulation time t0 represents the very first simulation time of the first debug session for the user's design (i.e., t0 corresponds to t=0).

Similarly, simulation time t3 represents the last simulation time of the selected simulation session range. In other words, assume that today's debug session covers a simulation session range extending from t=14,555 to t=16,750. For this simulation session range, simulation time t3 is t=16,750. Assume that a particular bug is located and fixed within this simulation session range t=14,555 to t=16,750. The user then moves on to the next simulation session range, t=16,750 to t=19,100. Here, simulation time t3 is t=19,100. In some cases, simulation time t3 represents the last simulation time of the final debug session for the user's design.

The user can continue simulating beyond simulation time t3 if desired, but for now the user is focused on debugging the design within simulation times t0 to t3, that is, the current simulation session range. Typically, once the bugs in the current simulation session range have been ironed out, the user simulates the design beyond simulation time t3 into the next simulation session range.

In this abstract representation of the simulation session range, the simulation times t0-t3 are not necessarily adjacent to one another; that is, simulation times t0 and t1 are not immediately next to each other. In fact, simulation times t0 and t1 can be thousands of simulation time steps apart.
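The relationship among the four times on the timeline can be captured in a small data structure. The sketch below assumes that t1 and t2 bound the narrower region of interest inside the session range t0 to t3, and reuses the numbers from the t=10,000 to t=12,000 example; the class name `SessionRange` is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SessionRange:
    t0: int  # first simulation time of the session range
    t1: int  # start of the narrower region of interest (assumed)
    t2: int  # end of the narrower region of interest (assumed)
    t3: int  # last simulation time of the session range

    def __post_init__(self):
        if not (self.t0 <= self.t1 <= self.t2 <= self.t3):
            raise ValueError("expected t0 <= t1 <= t2 <= t3")

    @property
    def session_steps(self):
        return self.t3 - self.t0

    @property
    def target_steps(self):
        return self.t2 - self.t1

today = SessionRange(t0=10_000, t1=10_500, t2=10_750, t3=12_000)
```

Here the session spans 2,000 time steps while the suspected region spans only 250, and t0 and t1 are 500 steps apart rather than adjacent.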

Because one embodiment of the present invention is implemented in the RCC system, reference will be made to the various components of the RCC system shown in FIG. 83. First, the input and simulation history generation operation of the RCC system will be discussed. This generation operation involves some form of data compression of the primary inputs and some form of recording of the compressed primary inputs. Second, the VCD generation operation of the RCC system will be discussed. This VCD generation operation involves decompressing the primary inputs to replay the simulation history and dumping the hardware state for the simulation target range into a VCD file. Third, the VCD review process will be discussed. Although the term "simulation history" is sometimes used, it does not mean that the entire debug session involves software simulation. Indeed, the RCC system generates the VCD file from the hardware state, and the software model is used only for the later analysis of the VCD file.

Input and Simulation History Generation: Compression and Recording

Initially, the user models his design in software in the RCC computing system 2600 of FIG. 83. For some portion of the design, the RCC computing system 2600 generates a hardware model of the design based on a hardware description language (e.g., VHDL). The hardware model is configured in the array of reconfigurable logic elements 2621 that is part of the RCC hardware accelerator 2620. With this setup, the user can simulate the design in software in the RCC computing system 2600, accelerate a portion of the design (i.e., particular simulation time steps or a distinct physical section of the circuit) using the RCC hardware accelerator 2620, or combine simulation and hardware acceleration.

The user has just finished his latest circuit design. It is now time to debug the design to find defects. If the user has already debugged earlier versions of the design, he has some idea of where the bugs may be located. On the other hand, if this is the first debug session for a new design, the user must still form some idea about the location of potential bugs. In either case, some guesswork is usually needed to locate a bug. For this discussion, assume that the design is being debugged for the first time.

When debugging the design, the user selects a simulation session range. In theory, this range can be any length of simulation time. In practice, however, the simulation session range should be short enough to isolate a few bugs in the design and long enough to move the debugging process along quickly and to minimize the number of debug sessions needed to debug the design fully. Clearly, a simulation session range of two or three simulation time steps will not reveal any bugs. Moreover, such a small range forces the user to perform many repetitive tasks that slow the debugging process. If the selected simulation session range is a million simulation steps, so many bugs may manifest themselves that the user has difficulty mounting a focused attack on the problem areas.

Once the user has selected the simulation session range, he commands the RCC system to fast simulate from simulation time t0 to simulation time t3, as shown in FIG. 84. As noted above, the separation between simulation times t0 and t3 can be any selected range; simulation time t0 represents the start of the simulation and simulation time t3 represents the last simulation time of the simulation session range.

At simulation time t0, fast simulation begins in the RCC computing system 2600. Fast simulation, rather than the normal simulation mode, is performed from simulation time t0 to simulation time t3 because regeneration of the software model is not needed during this period. As discussed elsewhere in this patent specification, the regeneration operation requires the RCC computing system 2600 to receive hardware state information (e.g., node values, register states) so that additional logic elements (e.g., combinational logic) can be regenerated in software for further analysis by the user. Of course, some users want to observe the software model during the simulation process, in which case the RCC computing system 2600 does not perform fast simulation. The simulation process is then slower because of the additional time the RCC computing system 2600 needs to regenerate the software model from the primary outputs of the hardware model.

First, the complete state of the design, such as the software model state and the hardware model register and node values, is saved at simulation time t0 to a file on the system disk called the "simulation history" file. This allows the user to load the design state into the RCC system at any future time for debugging. During the fast simulation period covering the simulation session range from simulation time t0 to simulation time t3, the RCC computing system 2600 applies two distinct processes to the primary inputs (I_P) in parallel. The raw primary inputs from the test bench process 2603 are provided on line 2610 to the RCC hardware accelerator 2620 for evaluation. At the same time, the same primary inputs from the test bench process are compressed and recorded on the system disk in a separate file called the "input history" file, so that the entire history of the primary inputs is collected and the user can later replay any portion of the simulation. Specifically, the primary inputs corresponding to simulation time t0 through simulation time t3 are compressed and saved on the system disk.
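The two paths applied to the primary inputs can be pictured with a minimal Python sketch. It is an illustration only, not the RCC implementation: `fast_simulate`, `evaluate`, and the byte-string encoding of each step's inputs are hypothetical, zlib merely stands in for whatever compressor an embodiment uses, and the loop handles the two paths one after the other where the RCC system runs them in parallel.

```python
import zlib


def fast_simulate(primary_inputs, evaluate, state):
    """Evaluate each raw input and record a compressed copy of the same input.

    primary_inputs: iterable of bytes objects, one per simulation time step
    evaluate:       hypothetical stand-in for the hardware model, (state, input) -> state
    state:          design state at simulation time t0
    """
    compressor = zlib.compressobj()
    input_history = bytearray()
    for step_input in primary_inputs:
        # Path 1: the raw, uncompressed input is evaluated by the hardware model.
        state = evaluate(state, step_input)
        # Path 2: the same input is compressed for the input history file.
        input_history += compressor.compress(step_input)
    input_history += compressor.flush()
    # Only the final state (time t3) and the compressed inputs are kept;
    # no primary outputs are stored along the way.
    return state, bytes(input_history)
```

Decompressing the returned history yields the original input stream, which is what later makes replay of any portion of the session possible.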

When the RCC hardware accelerator 2620 receives the primary inputs (I_P) from the test bench process 2603, it processes them. As a result, the hardware state of the hardware model changes as the various logic and other circuit elements evaluate the data. During the period from simulation time t0 to simulation time t3, the RCC system does not have to wait for the RCC computing system 2600 to perform logic regeneration, because the user has no interest in fine-grained debugging of the design during the fast simulation period. The RCC system also does not save the primary outputs (e.g., hardware node values and register states) at all. Note that the RCC hardware accelerator 2620 evaluates the raw, uncompressed primary inputs while the RCC computing system 2600 compresses the primary inputs for recording in the "input history" file. In another embodiment, the RCC system does not compress the primary inputs before recording them in the input history file.

Why does the RCC computing system 2600 deliver the primary inputs to the RCC hardware accelerator for evaluation when no outputs are saved during the fast simulation period? The RCC system needs to save the hardware state of the design at simulation time t3, and that state depends on the evaluation of the primary inputs from the start of the simulation up to simulation time t3. An accurate snapshot of the hardware model state cannot be obtained at simulation time t3 unless the hardware model has evaluated the entire history of primary inputs from the start up to t3, not merely the inputs at simulation time t3. Logic circuits have memory attributes, so evaluation results depend on the order of the inputs. Thus, if only the primary inputs at simulation time t3 (or at the simulation time just before t3) were fed to the hardware model for evaluation, the hardware model would exhibit an incorrect state at simulation time t3.
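A toy example makes the dependence on input history concrete. The 4-bit shift register below is a hypothetical circuit chosen for illustration, not part of the described system; its state after the last input depends on every earlier input.

```python
def step(state, bit):
    """Toy 4-bit shift register: the next state depends on the old state and the input."""
    return ((state << 1) | bit) & 0xF


def run(inputs, state=0):
    """Evaluate a sequence of input bits in order, starting from the given state."""
    for bit in inputs:
        state = step(state, bit)
    return state


inputs = [1, 0, 1, 1, 0, 1]
full_history = run(inputs)       # every input from the start is evaluated
last_only = run(inputs[-1:])     # only the final input is evaluated

print(full_history, last_only)   # prints "13 1": the two states differ
```

Feeding only the final input leaves the register in a state that never occurred in the real run, which is why the hardware model must evaluate the whole input history before its state is saved.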

Why is the hardware model state saved for simulation time t3? A large design with more than a million gates that needs more than a million simulation steps cannot be debugged in a relatively short period of time. The user needs multiple simulation sessions to debug such a design. To move quickly from one simulation session to the next, the RCC system saves the hardware state at simulation time t3 (along with the compressed primary inputs), so that the user can debug the next simulation session range starting at simulation time t3. With the hardware model state saved, the user does not need to simulate from the very beginning of the simulation; rather, after debugging the design from simulation time t0 to simulation time t3, the user can return to simulation time t3 quickly and conveniently. The hardware model state at simulation time t3 saved in the simulation history file is a correct snapshot of the design that reflects the entire history of primary inputs up to that point.

The hardware model in the RCC hardware accelerator 2620 provides internal hardware states on line 2611 to the RCC computing system 2600, so that the RCC computing system 2600 can build or regenerate various logic elements (e.g., combinational logic) of the software model if necessary and desired by the user. As noted above, however, the user is not interested in observing the software simulation during fast simulation of the simulation session range. Accordingly, because the user is not examining the internal hardware states for bugs at this time, these internal hardware states from the RCC hardware accelerator are not saved on the system disk.

At simulation time t3, the end of the simulation session range, this particular fast simulation operation stops. The evaluation results or primary outputs (e.g., register values) from the hardware model of the design in the RCC hardware accelerator 2620 that correspond to simulation time t3 are saved in the simulation history file. This is done so that, once the user has debugged the design from simulation time t0 to simulation time t3, the user can proceed directly to simulation time t3 for further debugging as needed. The user does not have to rerun the simulation from simulation time t0 in order to debug the design at some point beyond simulation time t3.

In summary, from simulation time t0 to simulation time t3 (i.e., the simulation session range), the user accelerates the design by supplying the primary inputs from the test bench process 2603 on line 2610 to the RCC hardware accelerator 2620 while, at the same time, the same primary inputs are compressed and saved on the system disk for future reference. The RCC computing system 2600 needs the primary inputs (compressed or uncompressed) saved in the input history file in order to replay the debug session. The compression operation occurs concurrently with data evaluation in the RCC hardware accelerator 2620. Finally, at simulation time t3, the end of the simulation session range, the RCC system saves the state information of the hardware model in the simulation history file.

In one embodiment of the present invention, all of the recorded, compressed primary inputs from the simulation session range are part of the same file that is later modified to include the hardware state information from simulation time t3. In another embodiment, the information saved from the simulation session range and the hardware state information at simulation time t3 are saved as distinct files on the system disk. Similarly, any of the files mentioned above can be modified to include the VCD on-demand information that is generated later for the simulation target range. Alternatively, the VCD on-demand information can be saved in a distinct VCD file on the system disk, separate from the compressed primary input file and from the file holding the hardware state information at simulation time t3. In other words, according to one embodiment of the present invention, the input history file, the simulation history file, and the VCD file can be integrated into a single file. In another embodiment, the input history file, the simulation history file, and the VCD file can be separate files. In addition, the input history file and the simulation history file can be integrated into one file that is separate from the VCD file.

The compression method is now discussed. According to one embodiment of the present invention, the compression in the RCC system achieves a compression ratio of 20X for primary input events when 10% of the inputs have events per simulation time step. For example, a large ASIC design of more than a million gates may require 200 primary inputs. With 10% input events per simulation time step, approximately 20 inputs need to be compressed and recorded per step. If each input signal takes 2 bytes, 20 input signals amount to 40 bytes of primary input data to be processed per simulation time step. At a compression ratio of 20X, the 40 bytes are compressed to 2 bytes per simulation time step. Thus, for a design that requires about a million simulation steps, the RCC system compresses the primary inputs into 2 megabytes of data. A file of this size is easily managed by any computing file system and waveform viewer. In one embodiment, ZIP compression is used.
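The size estimate above can be reproduced with a few lines of Python. The function and parameter names are hypothetical; the default values are the figures quoted in the text.

```python
def input_history_size(primary_inputs=200, activity_percent=10,
                       bytes_per_input=2, compression_ratio=20,
                       steps=1_000_000):
    """Return (active inputs, raw bytes/step, compressed bytes/step, total bytes)."""
    # Inputs that carry an event in a given simulation time step.
    active_inputs = primary_inputs * activity_percent // 100
    # Uncompressed primary input data per step.
    raw_per_step = active_inputs * bytes_per_input
    # Data per step after compression at the stated ratio.
    compressed_per_step = raw_per_step // compression_ratio
    # Size of the input history over the whole run.
    total_bytes = compressed_per_step * steps
    return active_inputs, raw_per_step, compressed_per_step, total_bytes


print(input_history_size())  # prints "(20, 40, 2, 2000000)"
```

With the defaults, 20 active inputs give 40 raw bytes per step, 2 compressed bytes per step, and 2,000,000 bytes (2 megabytes) for a million steps.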

According to one embodiment, compression of the primary inputs is performed concurrently with evaluation of the primary inputs by the RCC hardware accelerator 2620; that is, generation of the input history file occurs at the same time as primary input evaluation. The compression method therefore has no direct negative impact on RCC system performance. One possible bottleneck is the process of writing the compressed primary inputs to the system disk. However, because the data is highly compressed, the RCC system experiences a slowdown of less than 5% for most designs running at 50,000 simulation time steps per second.

As for the specific manner in which recording is controlled in the RCC system, according to one embodiment of the present invention the user must first use $rcc(record) to initialize the RCC recording feature:

$rcc(record, name, <disk space>, <checkpoint control>);

The arguments name, <disk space>, and <checkpoint control> are now described. The "name" argument is the record name for the current simulation session range. Different names are required to distinguish different simulation runs of the same design. A distinct record name is also required for off-line VCD on-demand debugging.

The <disk space> argument is an optional parameter that specifies the maximum disk space (in MB) allocated to the RCC system recording process. The default value is 100 MB. The RCC system records only the final portion of the current simulation session range that fits within the specified disk space. In other words, if the <disk space> value is specified as 100 MB but the current simulation session range occupies 140 MB, the RCC system records only the last 100 MB and discards the first 40 MB of compressed primary inputs. This aspect of the invention is an advantage for failure analysis. In one embodiment of the present invention, the test bench process has a self-checking function that detects a simulation failure and stops the simulation. The most recent history of the RCC simulation can provide most of the information needed to analyze such a failure.
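The keep-only-the-tail behavior of <disk space> can be sketched as a bounded buffer. This is a minimal illustration under stated assumptions: `BoundedRecorder` is a hypothetical name, data is held in memory instead of on disk, and whole chunks are discarded from the front. A real recorder would presumably also have to discard at points from which the remaining data can still be decompressed; that detail is not modeled here.

```python
from collections import deque


class BoundedRecorder:
    """Keep only the most recent `limit_bytes` of recorded input data."""

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.chunks = deque()
        self.size = 0

    def write(self, chunk):
        """Append a chunk, then drop the oldest chunks until the budget is met."""
        self.chunks.append(chunk)
        self.size += len(chunk)
        while self.size > self.limit and self.chunks:
            self.size -= len(self.chunks.popleft())


# Scaled-down version of the 140 MB / 100 MB example: one byte stands for 1 MB.
recorder = BoundedRecorder(limit_bytes=100)
for i in range(140):
    recorder.write(bytes([i]))
```

After the loop the recorder holds units 40 through 139: the first 40 units were discarded and the last 100 were kept, matching the example in the text.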

The <checkpoint control> argument is an optional parameter that specifies the number of simulation time steps between full-state checkpoints. The default is 1,000,000 steps. As with most conventional compression algorithms, the compressed primary inputs are based on the state differences between consecutive simulation steps. For a long simulation run, checkpointing the full RCC state at some low frequency can facilitate extraction of the simulation history. For an RCC system with a decompression rate of 20K to 200K simulation time steps per second and a checkpoint placed every one million steps, the RCC system can extract any simulation history (i.e., replay the simulation from the primary inputs and generate the selected VCD file) within 5 to 50 seconds.
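The interplay of difference-based recording and periodic checkpoints can be sketched as follows. The XOR delta encoding, the integer representation of each step's input vector, and all function names are assumptions made for illustration; the source does not specify the encoding.

```python
def record(vectors, checkpoint_interval):
    """Store each step as a difference from the previous step, plus periodic full copies."""
    checkpoints, deltas, previous = {}, [], 0
    for step, vector in enumerate(vectors):
        if step % checkpoint_interval == 0:
            checkpoints[step] = vector          # full-state checkpoint
        deltas.append(vector ^ previous)        # bits that changed since the last step
        previous = vector
    return checkpoints, deltas


def extract(checkpoints, deltas, target, checkpoint_interval):
    """Rebuild the vector at `target` from the nearest earlier checkpoint."""
    base = (target // checkpoint_interval) * checkpoint_interval
    vector = checkpoints[base]
    for step in range(base + 1, target + 1):
        vector ^= deltas[step]
    return vector, target - base                # value and number of steps replayed


def worst_case_seconds(checkpoint_interval, steps_per_second):
    """Upper bound on replay time: at most one full interval must be replayed."""
    return checkpoint_interval / steps_per_second
```

With a checkpoint every 1,000,000 steps, `worst_case_seconds` gives 5.0 at 200,000 steps per second and 50.0 at 20,000 steps per second, which is the 5 to 50 second range quoted above.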

When this $rcc(record) command is invoked, the RCC system records the simulation history; that is, the primary inputs are compressed and recorded in a file for storage on the system disk. The primary outputs from the RCC hardware accelerator are ignored, because software logic regeneration is not needed at this point. The recording process can be terminated with the $rcc(stop) or $rcc(off) command, at which point the RCC system switches control of the simulation back to the software model. From that point on, the primary outputs are processed for software logic regeneration.

VCD Generation - Decompression and Dump

As described above, the RCC system saves the software model and the hardware model at the beginning of the simulation session range at simulation time t0, records the compressed primary inputs for the entire simulation session range in the input history file, and saves the hardware model state of the design at the end of the simulation session range in the simulation history file. From the design information saved at simulation time t0, the user has enough information to load the design as it was at the start of the simulation session range. With the compressed primary inputs, the user can software simulate any portion of the design. With VCD on-demand, however, the user will probably not want to software simulate the design at this point. Rather, the user wants to generate a VCD file for a selected simulation target range so that it can be analyzed in detail to isolate and fix the bug. Indeed, because the compressed primary inputs were recorded, the RCC system can replay any point within the simulation session range. Moreover, if desired, the RCC system can simulate beyond the current simulation session range by loading the previously saved hardware state information from simulation time t3.

After fast simulating the design, the user reviews the results to determine whether a bug exists. If no bug is apparent to the user, the design may be free of bugs within the current simulation session range. The user then proceeds to simulate the next simulation session range beyond the current one, whatever range is selected. If, however, the user determines that the design has some kind of problem, the user must analyze the simulation more carefully to isolate and fix the bug. Because the entire simulation session range is too large for careful, detailed analysis, the user must target a specific, narrower range for in-depth study. Based on familiarity with the design and on past debugging efforts, the user makes a reasonable guess as to the location of the bug within the simulation session range. The user focuses on a selected simulation target range, which should correspond to the user's guess about where the bug is located (or where the bug will manifest itself). Here, the user decides that the simulation target range lies between simulation time t1 and simulation time t2, as shown in FIG. 84.

The RCC system loads the software model of the design in the RCC computing system 2600 and the hardware model in the RCC hardware accelerator 2620 with the configuration information previously saved at simulation time t0. The RCC system then fast simulates from simulation time t0 to simulation time t1. During this fast simulation operation, the RCC computing system loads the previously saved file containing the compressed primary inputs. The RCC computing system decompresses the compressed primary inputs and delivers the decompressed primary inputs to the RCC hardware accelerator 2620 for evaluation. As in the initial fast simulation operation, in which the primary inputs were compressed and saved for the simulation session range, the evaluation results, or primary outputs (e.g., hardware model node values and register states), are not saved during the fast simulation from simulation time t0 to simulation time t1.

Once the fast simulation operation reaches the beginning of the simulation target range, that is, simulation time t1, the RCC system dumps the evaluation results (i.e., the primary outputs O_p) from the hardware model in the RCC hardware accelerator 2620 into a VCD file on the system disk. Unlike the initial fast simulation operation for the simulation session range, the RCC computing system 2600 does not perform any compression. Again, the RCC computing system 2600 does not perform a regeneration operation for the software model, because the user does not need to view the evaluation results at this time. By not performing any regeneration operation for the software model, the RCC system can generate the VCD file quickly.

In another embodiment, however, the user can also view the software model of the user design during the simulation time period from t1 to t2 while the primary outputs are being saved. In that case, the RCC computing system 2600 performs the software model regeneration operation so that the user can view any and all states of any part of the user design.

At simulation time t2, the RCC computing system 2600 stops saving the evaluation outputs from the RCC hardware accelerator 2620 into the VCD file. At this point, the user can stop the fast simulation. The RCC system now has a complete VCD file for the simulation target range, and the user can proceed to analyze the VCD file in more detail.

When the user wants to analyze the VCD file, the user does not have to rerun the simulation from the very beginning (e.g., simulation time t0). Instead, the user can instruct the RCC system to load the saved hardware state information from the beginning of the simulation target range and view the simulated results with the software model. This is described in more detail in the simulation history review section below.

On analyzing the VCD file, the user may or may not find the bug. If the bug is found, the user will of course begin fixing the design. If the bug is not found, the user may have guessed wrong about the simulation target range suspected of containing the bug. The user must then go through the same process described above for decompression and VCD file dump, this time with another guess at a hopefully better simulation target range within the simulation session range. The RCC system then fast simulates from the beginning of the simulation session range to the new simulation target range, decompressing the primary inputs and delivering them to the RCC hardware accelerator 2620 for evaluation. When the RCC system reaches the beginning of the new simulation target range, the primary outputs from the RCC hardware accelerator 2620 are dumped into a VCD file. At the end of the new simulation target range, the RCC system stops dumping hardware state information into the VCD file. At this point, the user can view the VCD file to isolate the bug.

In short, from simulation time t0 to simulation time t1, the RCC system fast simulates by decompressing the previously compressed primary inputs and delivering them to the hardware model for evaluation. During the simulation target range from simulation time t1 to simulation time t2, the RCC system dumps the primary outputs from the hardware model into a VCD file. At the end of the simulation target range, the user can stop fast simulating the design. At this point, the user can view the VCD file by going directly to simulation time t1, without rerunning the simulation from the very beginning at simulation time t0.
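The fast-forward-then-dump behavior summarized here can be illustrated with a small sketch. It is only a toy model under stated assumptions: `ToyModel`, `replay_with_dump`, and the accumulator logic are hypothetical names invented for illustration, and decompression of the recorded primary inputs is omitted.

```python
class ToyModel:
    """Hypothetical stand-in for the hardware model: a running sum of inputs."""

    def __init__(self):
        self.acc = 0

    def evaluate(self, primary_input):
        # The state depends on every input seen so far, as in a real design.
        self.acc += primary_input
        return self.acc


def replay_with_dump(recorded_inputs, evaluate, t1, t2):
    """Evaluate every recorded input from t0, but keep outputs only for t1..t2."""
    dump = []
    for t, primary_input in recorded_inputs:
        if t > t2:
            break  # end of the simulation target range: stop fast simulating
        output = evaluate(primary_input)  # evaluation happens at every time step
        if t >= t1:
            dump.append((t, output))  # only the target range reaches the "VCD"
    return dump


recorded = [(t, 1) for t in range(10)]  # primary inputs recorded from t0 = 0
vcd = replay_with_dump(recorded, ToyModel().evaluate, t1=4, t2=6)
print(vcd)
```

The dumped values for the target range reflect the whole input history from t0, even though nothing before t1 is written out.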

When the review of the simulation target range is complete and the bug has been isolated and removed, the user can proceed to the next simulation session range. This new simulation session range starts at simulation time t3. A new simulation target range of a particular length, which may be the same length as the previous simulation session range, is selected by the user. The RCC system loads the previously saved hardware state information corresponding to simulation time t3 and is then ready to fast simulate this new simulation session range. Note that this new simulation session range plays the same role as the earlier range from simulation time t0 to t3, with the loaded hardware state now corresponding to simulation time t0. The fast simulation, on-demand VCD dump, and VCD review process are similar to those described above.

According to one embodiment of the present invention, the decompression step does not negatively affect performance. The RCC system decompresses the simulation history (i.e., the compressed and recorded primary inputs) at a rate of 20,000 to 200,000 simulation time steps per second. With proper checkpoint control, the RCC system can extract the simulation history (i.e., replay the simulation from the primary inputs and regenerate the selected VCD file) within 50 seconds.

As for the specific way in which the on-demand VCD feature is controlled in the RCC system, the user uses the $axis_rpd command. $axis_rpd is an interactive command for extracting the RCC evaluation record and generating a VCD file on demand. Unlike conventional simulation rewind techniques, executing the $axis_rpd command neither rewinds the internal simulation state nor corrupts the external PLI and file I/O state. After executing the $axis_rpd command, the user can continue simulating in the same way as after a $stop command.

When no argument is specified, the $axis_rpd command displays all available simulation time periods within the simulation session range, from which the user can select a simulation target range. The time unit is the same time unit used in the command line interface. An example simulation log follows:

C1>$rcc(record, r1);

C2>#1000 $rcc(xt0,run);

C3>#50000 $rcc(off);

C4>#50500 $rcc(run);

C5>#60000 $rcc(stop);

… Start RCC engine at 100500

… Return to SIM: Stop RCC engine at 5000000

… Start RCC engine at 5050500

… Return to SIM: Stop RCC engine at 6000000

Interrupt at simulation time 60000.0000ns

C6>$axis_rpd;

Available simulation history:

1005.000000 to 50000.000000

50505.000000 to 60000.000000

Interrupt at simulation time 60000.0000ns

This simulation log shows that the user ran the RCC engine from just after time 1000 to time 50000 and from just after time 50500 to time 60000. The $axis_rpd command therefore reports these recorded simulation windows.

To generate a VCD file from the simulation history, the user uses the $axis_rpd command with the following control arguments:

$axis_rpd(start-time, end-time, "dump-file-name", <level and scope control>);

The start-time and end-time arguments specify the simulation time window for the VCD file, that is, the simulation target range. The time control arguments use the same time unit as the command line interface. The "dump-file-name" argument is the name of the VCD file. The <level and scope control> parameters are identical to those of the standard $dumpvars command in IEEE Verilog.

An example of the $axis_rpd command:

C7>$axis_rpd(50505,50600, "fl.dump");

… Start RCC VCD at 50505.010000 !!

… End RCC VCD at 50600.000000 !!

This $axis_rpd command generates a VCD file named "fl.dump" for the simulation target range from simulation time 50505 to 50600. As with $dumpvars, if no <level and scope control> parameter is provided, the $axis_rpd command dumps the entire hardware state or primary outputs.

Another example of using the $axis_rpd command:

C8>$axis_rpd(40444,50600,"fl.dump",2,dp0)

… Start RCC VCD at 40000.000000 !!

… Skip from time 50000.000000

… Continue at time 50505.000000 !!

… End RCC VCD at 50600.000000 !!

This $axis_rpd command generates a two-level VCD file "f2.dump" on scope dp0 for time 40000 to 50600. Because the simulation was switched back to software control from time 50000 to 50500, $axis_rpd skips that window, for which no simulation record is available.
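The window-skipping behavior can be pictured as clipping the requested range against the recorded history windows. The sketch below is a hypothetical illustration, not the RCC system's actual implementation; the function name `clip_to_history` is invented here, and the history values are the two windows reported by $axis_rpd in the log above.

```python
def clip_to_history(start, end, history):
    """Return the parts of [start, end] that are covered by recorded windows.

    `history` is a list of (window_start, window_end) pairs. Gaps between
    windows (time spent under software control) are skipped.
    """
    segments = []
    for lo, hi in sorted(history):
        seg_lo, seg_hi = max(start, lo), min(end, hi)
        if seg_lo < seg_hi:  # keep only non-empty overlaps
            segments.append((seg_lo, seg_hi))
    return segments


history = [(1005.0, 50000.0), (50505.0, 60000.0)]
for seg in clip_to_history(40000, 50600, history):
    print(seg)
```

For a request covering time 40000 to 50600, the two resulting segments line up with the "Start", "Skip", "Continue", and "End" messages in the log.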

On-demand VCD is also available after the user has terminated the simulation process. To perform off-line on-demand VCD, the user starts the simulation program, called "vlg", with the +rccplay option. With this option, the RCC system is instructed to extract the simulation record instead of performing the usual initialization sequence for simulation. Once inside the simulation program, the user can use the same $axis_rpd command to obtain a VCD file on demand. An example of this procedure follows:

axis15:3-dpo_rtlc>vlg +rccplay+rl -s

… Start replaying record ./AxisWork/rl at time 100500

C1>$axis_rpd;

Available simulation history:

1005.000000 to 50000.000000

50505.000000 to 60000.000000

Interrupt at simulation time 100500

C2>$axis_rpd(40000,45000, "f2.dump");

… Start RCC VCD at 40000.000000 !!

… End RCC VCD at 45000.000000 !!

Interrupt at simulation time 4500000

C3>

In this example, the simulation record "rl" is used to extract the simulation history and generate a VCD file for the entire design from time 40000 to 45000.

Simulation History Review

Once the VCD file for the simulation target range (i.e., simulation time t1 to t2) has been generated by the RCC system, the user does not need to fast simulate from simulation time t2 to t3. Instead, the RCC system allows the user to stop the simulation and proceed directly to the start of the simulation target range, that is, simulation time t1. Thus, in contrast to the prior art, the user does not need to rerun the simulation from the beginning (e.g., simulation time t0). The hardware state dumped into the VCD file reflects the evaluation of the entire history of primary inputs from simulation time t0, including the primary inputs from simulation time t1 to t2.

The RCC system loads the VCD file. The saved primary outputs are then delivered to the RCC computing system 2600 so that the software model, including its many combinational logic circuits, can be regenerated with accurate state information. The user then views the software model with a waveform viewer for debugging. With the VCD file, the user can step through the software model very carefully until the bug is isolated.

With this on-demand VCD feature, the user can select any simulation target range within the simulation session range. If the bug cannot be found within the selected simulation target range, the user can select a different simulation target range on demand. Because all primary inputs from the test bench process are recorded for the entire simulation session range, any portion of the simulation can be replayed and viewed on demand without rerunning the simulation. This feature lets the user focus repeatedly on multiple, different simulation target ranges until the bug within the simulation session range is fixed.

Moreover, this on-demand VCD feature is supported not only on-line during the simulation process but also off-line after the simulation process has terminated. On-line support is possible because the hardware state at simulation time t0 can be saved on the system disk and the primary inputs can be compressed and recorded for a simulation session range of any length. The user can then specify a simulation target range for more focused analysis of the primary outputs.

Off-line support is possible because the entire set of primary inputs for the simulation session range starting at simulation time t0 and the hardware state at simulation time t1 are all saved on the system disk. The user can therefore return to debugging the design by loading the design corresponding to simulation time t0 and then specifying a simulation target range. The user can also proceed directly to the next simulation target range by loading the hardware state corresponding to simulation time t3.

VI. Hardware Implementation Schemes

A. Overview

The SEmulation system implements an array of FPGA chips on a reconfigurable board. Based on the hardware model, the SEmulation system partitions, maps, places, and routes each selected portion of the user's circuit design onto the FPGA chips. Thus, for example, a 4x4 array of 16 chips can model a large circuit spread across those 16 chips. The interconnect scheme allows each chip to reach any other chip within two "jumps" or links.

Each FPGA chip implements an address pointer for each I/O address space (i.e., REG, CLK, S2H, H2S). All the address pointers associated with a particular address space can be chained together. Thus, during data transfer, the word data in each chip is selected sequentially from/to the main FPGA bus and the PCI bus, one word at a time for the selected address space within a chip and one chip at a time, until the desired word data has been accessed for that selected address space. This sequential selection of word data is accomplished by a propagating word select signal. The word select signal travels through the address pointer in one chip, then propagates to the address pointer in the next chip, and continues on to the last chip or until the system initializes the address pointers.

The FPGA bus system on the reconfigurable board operates at twice the PCI bus bandwidth but at half the PCI bus speed. The FPGA chips are therefore separated into banks to make use of the wider bus. Because the throughput of this FPGA bus system tracks the throughput of the PCI bus system, performance is not lost by reducing the bus speed. Expansion is possible through larger boards that contain more FPGA chips or through piggyback boards that extend the bank length.
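As a quick arithmetic check, doubling a bus's width while halving its clock rate leaves raw throughput unchanged. The sketch below uses assumed figures (a 32-bit, 33 MHz PCI bus and a 64-bit, 16.5 MHz FPGA bus) that are not taken from this description; they serve only to illustrate the trade-off, and the function name is hypothetical.

```python
def throughput_bytes_per_s(width_bits, clock_hz):
    """Peak throughput of a parallel bus: bytes per transfer times transfers per second."""
    return width_bits // 8 * clock_hz


# Assumed example figures, for illustration only.
pci = throughput_bytes_per_s(width_bits=32, clock_hz=33_000_000)
fpga = throughput_bytes_per_s(width_bits=64, clock_hz=16_500_000)
print(pci, fpga)
```

Both calls yield the same peak figure, which is why the slower but wider FPGA bus can keep pace with the PCI bus.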

B. Address Pointer

FIG. 11 shows one embodiment of the address pointer of the present invention. All I/O operations are performed by DMA streaming. Because the system has only one bus, it accesses data sequentially, one word at a time. One embodiment of the address pointer therefore uses a shift register chain to access the selected words in these address spaces sequentially. The address pointer 400 includes flip-flops 401-405, an AND gate 406, and a pair of control signals, INITIALIZE 407 and MOVE 408.

Each address pointer has n outputs (W0, W1, W2, ..., Wn-1) for selecting one word out of n possible words in each FPGA chip corresponding to the same word in the selected address space. Depending on the particular user circuit design being modeled, the number of words n varies from design to design and, for a given design, from FPGA chip to FPGA chip. In FIG. 11, the address pointer 400 is a 5-word (i.e., n=5) address pointer. Thus, the particular FPGA chip that contains this 5-word address pointer for a particular address space has only five words to select from. Of course, the address pointer 400 can implement any number of words n. Each output signal Wn is called a word select signal. When the word select signal reaches the output of the last flip-flop in the address pointer, it is called the OUT signal, which is propagated to the input of the address pointer in the next FPGA chip.

When the INITIALIZE signal is asserted, the address pointer is initialized. The first flip-flop 401 is set to "1" and all other flip-flops 402-405 are set to "0". At this point, initialization of the address pointer does not enable any word selection; that is, all Wn outputs are still "0" after initialization. The address pointer initialization procedure is discussed with respect to FIG. 12.

The MOVE signal controls the advance of the pointer for word selection. The MOVE signal is derived from the READ, WRITE, and SPACE index control signals from the FPGA I/O controller. Because every operation is essentially a read or a write, the SPACE index signal essentially determines which address pointer receives the MOVE signal. The system therefore activates only one address pointer at a time, the one associated with the selected I/O address space, and during that time provides the MOVE signal only to that address pointer. MOVE signal generation is discussed below with respect to FIG. 13. Referring to FIG. 11, when the MOVE signal is asserted, it is provided to an input of AND gate 406 to enable the inputs of flip-flops 401-405. A logic "1" therefore moves from word output Wi to Wi+1 every system clock cycle; that is, the pointer moves from Wi to Wi+1 to select a particular word every cycle. When the shifting word select signal reaches the output 413 of the last flip-flop 405 (labeled "OUT" herein), this OUT signal must then proceed to the next FPGA chip through the multiplexed cross-chip address pointer chain described with respect to FIGS. 14 and 15, unless the address pointer is initialized again.
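The INITIALIZE and MOVE behavior can be sketched as a small behavioral model. This is an illustration under assumptions rather than the circuit of FIG. 11: the class name `AddressPointer` is invented, the word-select outputs are assumed to be gated by MOVE (which is one way all Wn outputs can remain "0" right after initialization), and OUT is assumed to be asserted in the same cycle that the last word is selected.

```python
class AddressPointer:
    """Behavioral model of an n-word shift-register address pointer."""

    def __init__(self, n_words=5):
        self.bits = [0] * n_words  # one flip-flop per word

    def initialize(self):
        # INITIALIZE: first flip-flop set to 1, all others cleared.
        self.bits = [1] + [0] * (len(self.bits) - 1)

    def clock(self, move):
        """Advance one clock cycle; return (word-select outputs, OUT)."""
        if not move:
            # Assumption: outputs are gated by MOVE, so no word is selected.
            return [0] * len(self.bits), 0
        selects = list(self.bits)          # word holding the "1" is selected
        out = self.bits[-1]                # "1" leaving the last flip-flop
        self.bits = [0] + self.bits[:-1]   # shift the "1" one position onward
        return selects, out


pointer = AddressPointer(5)
pointer.initialize()
print(pointer.clock(move=False))       # nothing selected without MOVE
for _ in range(5):
    print(pointer.clock(move=True))    # W0, W1, ..., W4 selected in turn
```

After the fifth MOVE cycle the "1" has left the pointer, which corresponds to handing the selection on to the next chip in the chain.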

The address pointer initialization procedure is now described. FIG. 12 shows a state transition diagram of the address pointer initialization for the address pointer of FIG. 11. Initially, state 460 is the idle state. When DATA_XSFR is set to "1", the system proceeds to state 461, where the address pointer is initialized. Here, the INITIALIZE signal is asserted. The first flip-flop in each address pointer is set to "1" and all other flip-flops in the address pointer are set to "0". At this point, initialization of the address pointer does not enable any word selection; that is, all Wn outputs are still "0". The next state is the wait state 462, in which DATA_XSFR is still "1". When DATA_XSFR becomes "0", the address pointer initialization procedure is complete and the system returns to the idle state 460.
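The three-state initialization sequence can be written out as a small state machine. The sketch is illustrative only: the names `State`, `next_state`, and `run` are hypothetical, and it assumes the initialization state lasts exactly one cycle before the wait state is entered.

```python
from enum import Enum


class State(Enum):
    IDLE = 460   # idle state
    INIT = 461   # INITIALIZE asserted, pointers loaded with 1, 0, 0, ...
    WAIT = 462   # waiting for DATA_XSFR to return to 0


def next_state(state, data_xsfr):
    """One transition of the initialization state machine."""
    if state is State.IDLE:
        return State.INIT if data_xsfr else State.IDLE
    if state is State.INIT:
        return State.WAIT  # assumption: initialization takes a single cycle
    return State.WAIT if data_xsfr else State.IDLE


def run(data_xsfr_trace):
    """Return the state reached after each sample of DATA_XSFR."""
    state, visited = State.IDLE, []
    for bit in data_xsfr_trace:
        state = next_state(state, bit)
        visited.append(state)
    return visited


print([s.name for s in run([0, 1, 1, 1, 0])])
```

In this model INITIALIZE would be asserted exactly when the machine is in `State.INIT`.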

The MOVE signal generator, which generates the various MOVE signals for the address pointers, is now discussed. The SPACE index generated by the FPGA I/O controller (FIG. 10; item 327 in FIG. 22) selects a particular address space (i.e., REG read, REG write, S2H read, H2S write, and CLK write). Within that address space, the system of the present invention sequentially selects the particular word to be accessed. Sequential word selection is accomplished within each address pointer by the MOVE signal.

One embodiment of the MOVE signal generator is shown in FIG. 13. Each FPGA chip 450 has address pointers corresponding to the various software/hardware boundary address spaces (i.e., REG, S2H, H2S, and CLK). In addition to the address pointers and the user's circuit design modeled and implemented in the FPGA chip 450, a MOVE signal generator 470 is provided in the FPGA chip 450. The MOVE signal generator 470 includes an address space decoder 451 and several AND gates 452-456. The input signals are the FPGA read signal (F_RD) on wire line 457, the FPGA write signal (F_WR) on wire line 458, and the address space signal 459. The output MOVE signals, one per address pointer, are REGR-move on wire line 464, REGW-move on wire line 465, S2H-move on wire line 466, H2S-move on wire line 467, and CLK-move on wire line 468, depending on which address space's address pointer is applicable. These output signals correspond to the MOVE signal on wire line 408 (FIG. 11).

The address space decoder 451 receives a 3-bit input signal 459. It can also receive just a 2-bit input signal. A 2-bit signal provides four possible address spaces, whereas a 3-bit input provides eight. In one embodiment, CLK is assigned "00", S2H is assigned "01", H2S is assigned "10", and REG is assigned "11". Depending on the input signal 459, the address space decoder outputs a "1" on one of the wire lines 460-463, which correspond to REG, H2S, S2H, and CLK respectively, while the remaining wire lines are set to "0". Thus, if any of these output wire lines 460-463 is "0", the corresponding output of AND gates 452-456 is "0". Likewise, if any of these wire lines 460-463 is "1", the corresponding output of AND gates 452-456 is "1". For example, if the address space signal 459 is "10", the address space H2S is selected. Wire line 461 is "1" while the remaining wire lines 460, 462, and 463 are "0". Accordingly, wire line 466 is "1" while the remaining output wire lines 464, 465, 467, and 468 are "0". Similarly, if wire line 460 is "1", the REG space is selected and, depending on whether a read (F_RD) or write (F_WR) operation is selected, either the REGR-move signal on wire line 464 or the REGW-move signal on wire line 465 is "1".

As described above, the SPACE index is generated by the FPGA I/O controller. In code, the MOVE controls are:

REG space read pointer: REGR-move = (SPACE-index==#REG) & READ;

REG space write pointer: REGW-move = (SPACE-index==#REG) & WRITE;

S2H space read pointer: S2H-move = (SPACE-index==#S2H) & READ;

H2S space write pointer: H2S-move = (SPACE-index==#H2S) & WRITE;

CLK space write pointer: CLK-move = (SPACE-index==#CLK) & WRITE;

This is the code equivalent of the logic diagram of the MOVE signal generator of FIG. 13.
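The MOVE equations can also be written as a small executable sketch. The two-bit encoding follows the embodiment described earlier (CLK = 00, S2H = 01, H2S = 10, REG = 11); the function name `move_signals` and the dictionary layout are illustrative assumptions.

```python
# Two-bit SPACE index encoding from the embodiment described in the text.
SPACE = {"CLK": 0b00, "S2H": 0b01, "H2S": 0b10, "REG": 0b11}


def move_signals(space_index, read, write):
    """Decode the SPACE index and gate it with READ/WRITE, one MOVE per pointer."""
    read, write = bool(read), bool(write)
    return {
        "REGR-move": space_index == SPACE["REG"] and read,
        "REGW-move": space_index == SPACE["REG"] and write,
        "S2H-move": space_index == SPACE["S2H"] and read,
        "H2S-move": space_index == SPACE["H2S"] and write,
        "CLK-move": space_index == SPACE["CLK"] and write,
    }


# A write to the H2S space activates only the H2S pointer.
active = [name for name, on in move_signals(0b10, read=0, write=1).items() if on]
print(active)
```

Because the decoder output is one-hot and each pointer is tied to either READ or WRITE, at most one MOVE signal is active for a single read or write operation.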

As described above, each FPGA chip has as many address pointers as there are address spaces at the software/hardware boundary. If the software/hardware boundary has four address spaces (i.e., REG, S2H, H2S, and CLK), each FPGA chip has four address pointers corresponding to those four address spaces. Each FPGA chip needs these four address pointers because the particular selected word in the address space being processed may reside in any one or more of the FPGA chips, or because the data in the selected address space affects various circuit elements modeled and implemented in each FPGA chip. To ensure that the selected word is processed by the appropriate circuit element(s) in the appropriate FPGA chip(s), each set of address pointers associated with a given software/hardware boundary address space (i.e., REG, S2H, H2S, and CLK) is "chained" together across several FPGA chips. The shifting or propagating word selection mechanism driven by the MOVE signal, as described above with respect to FIG. 11, is still used, except that in this "chain" embodiment the address pointer associated with a particular address space in one FPGA chip is "chained" to the address pointer associated with the same address space in the next FPGA chip.

Implementing four input pins and four output pins to chain the address pointers would accomplish the same purpose. However, such an implementation would be too costly in terms of efficient use of resources: four wires would be needed between two chips, and each chip would need four input pins and four output pins. One embodiment of the present invention uses a multiplexed cross-chip address pointer chain, which lets the hardware model use only one wire between chips and only one input pin and one output pin (two I/O pins per chip) on each chip. One embodiment of the multiplexed cross-chip address pointer chain is shown in FIG. 14.

In the embodiment shown in FIG. 14, the user's circuit design is mapped and partitioned across three FPGA chips 415-417 on the reconfigurable hardware board 470. The address pointers are shown as blocks 421-432. Each address pointer, for example address pointer 427, has a structure and function similar to the address pointer shown in FIG. 11, except that the number of words Wn, and hence the number of flip-flops, may vary depending on how many words are implemented in each chip for the user's custom circuit design.
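The effect of chaining pointers of different lengths across chips can be sketched as follows. This is a behavioral illustration with invented names (`chained_access_order`) and hypothetical word counts; it assumes every chip holds at least one word for the address space and that MOVE stays asserted throughout.

```python
def chained_access_order(words_per_chip):
    """Return (chip, word) pairs in the order a chained pointer selects them."""
    order = []
    chain_in = 1  # the "1" loaded into the first chip's pointer by INITIALIZE
    for chip, n_words in enumerate(words_per_chip):
        bits = [chain_in] + [0] * (n_words - 1)  # this chip's shift register
        chain_in = 0
        for _ in range(n_words):            # one MOVE cycle per word
            order.append((chip, bits.index(1)))
            chain_in = bits[-1]             # OUT of this chip feeds the next chip
            bits = [0] + bits[:-1]          # shift the "1" along
    return order


# Hypothetical word counts for three chips sharing one address space.
print(chained_access_order([2, 3, 1]))
```

Words are visited one at a time within a chip and one chip at a time, which is the sequential access pattern that a single shared bus requires.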

For the REGR address space, FPGA chip 415 has address pointer 421, FPGA chip 416 has address pointer 425, and FPGA chip 417 has address pointer 429. For the REGW address space, FPGA chip 415 has address pointer 422, FPGA chip 416 has address pointer 426, and FPGA chip 417 has address pointer 430. For the S2H address space, FPGA chip 415 has address pointer 423, FPGA chip 416 has address pointer 427, and FPGA chip 417 has address pointer 431. For the H2S address space, FPGA chip 415 has address pointer 424, FPGA chip 416 has address pointer 428, and FPGA chip 417 has address pointer 432.

Each chip 415-417 has a multiplexer 418-420, respectively. These multiplexers 418-420 may be models, and the actual implementation may be a combination of registers and logic elements, as known to those skilled in the art. For example, the multiplexer may be several AND gates feeding into an OR gate, as shown in FIG. 15. The multiplexer 487 includes four AND gates 481-484 and an OR gate 485. The inputs to the multiplexer 487 are the OUT and MOVE signals from each address pointer in the chip. The output 486 of the multiplexer 487 is a chain-out signal, which is passed to the input of the next FPGA chip.

In FIG. 15, this particular FPGA chip has four address pointers 475-478, corresponding to the I/O address spaces. The outputs of the address pointers, the OUT and MOVE signals, are the inputs to the multiplexer 487. For example, address pointer 475 has an OUT signal on wire line 479 and a MOVE signal on wire line 480. These signals are the inputs to AND gate 481. The output of AND gate 481 is an input to OR gate 485. The output of OR gate 485 is the output of the multiplexer 487. In operation, the OUT signal at the output of each address pointer, together with its corresponding MOVE signal and the SPACE index, serves as a selector signal for the multiplexer 487; that is, both the OUT signal and the MOVE signal (which is derived from the SPACE index signal) must be asserted active (e.g., logic "1") to propagate the word selection signal out of the multiplexer to the chain-out wire line. The MOVE signal is asserted periodically to move the word selection signal through the flip-flops in the address pointer, so that it can be characterized as an input MUX data signal.

Referring to FIG. 14, each of these multiplexers 418-420 has four sets of inputs and one output. Each set of inputs includes (1) the OUT signal found on the last output Wn-1 wire line (e.g., wire line 413 in the address pointer shown in FIG. 11) of the address pointer associated with a particular address space, and (2) the MOVE signal. The output of each multiplexer 418-420 is the chain-out signal. The word selection signal Wn propagating through the flip-flops in each address pointer becomes the OUT signal when it reaches the output of the last flip-flop in the address pointer. The chain-out signal on wire lines 433-435 will be "1" only when an OUT signal and a MOVE signal associated with the same address pointer are both asserted active (e.g., "1").

For multiplexer 418, the inputs are MOVE signals 436-439 and OUT signals 440-443, corresponding to the MOVE and OUT signals from address pointers 421-424, respectively. For multiplexer 419, the inputs are MOVE signals 444-447 and OUT signals 452-455, corresponding to the MOVE and OUT signals from address pointers 425-428, respectively. For multiplexer 420, the inputs are MOVE signals 448-451 and OUT signals 456-459, corresponding to the MOVE and OUT signals from address pointers 429-432, respectively.

In operation, for any given shift of words Wn, only the address pointer, or chain of address pointers, associated with the selected I/O address space at the software/hardware boundary is active. Thus, in FIG. 14, only the address pointers in chips 415, 416, and 417 associated with one of the address spaces REGR, REGW, S2H, or H2S are active for a given shift. Also, for a given shift of the word selection signal Wn through the flip-flops, the selected words are accessed sequentially because of the limitation on bus bandwidth. In one embodiment, the bus is 32 bits wide and a word is 32 bits, so that only one word can be accessed at a time and delivered to the appropriate resource.

While an address pointer is still propagating or shifting the word selection signal through its flip-flops, the chain-out signal is not asserted (e.g., not "1"), and so the multiplexer in this chip is not yet ready to propagate the word selection signal to the next FPGA chip. When the OUT signal is asserted active (e.g., "1"), the chain-out signal is asserted active (e.g., "1"), indicating that the system is ready to propagate or shift the word selection signal to the next FPGA chip. Thus, accesses occur one chip at a time; that is, the word selection signal is shifted through the flip-flops in one chip before the word selection shift operation is performed for another chip. The chain-out signal is asserted only when the word selection signal reaches the end of the address pointer in each chip. In code, the chain-out signal is:

Chain-out = (REGR-move & REGR-out) | (REGW-move & REGW-out) | (S2H-move & S2H-out) | (H2S-move & H2S-out);
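The AND-OR structure of the chain-out expression can be illustrated in software. The following Python fragment is only a sketch; the function name `chain_out` and the dictionary layout are assumptions made for illustration and do not appear in the specification.

```python
from typing import Dict, Tuple

# Address spaces named in the chain-out expression.
SPACES = ("REGR", "REGW", "S2H", "H2S")


def chain_out(signals: Dict[str, Tuple[bool, bool]]) -> bool:
    """Model the AND-OR multiplexer: one AND gate per address space, OR-ed together.

    `signals` maps each address space to its (MOVE, OUT) pair (hypothetical layout).
    """
    return any(move and out for move, out in (signals[s] for s in SPACES))


idle = {s: (False, False) for s in SPACES}
s2h_done = dict(idle, S2H=(True, True))      # S2H pointer reached its last flip-flop
mismatch = dict(idle, REGR=(True, False), H2S=(False, True))
print(chain_out(idle), chain_out(s2h_done), chain_out(mismatch))
```

Chain-out is asserted only when MOVE and OUT belong to the same address pointer, which is why the `mismatch` case stays inactive.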

In sum, for X number of I/O address spaces (i.e., REG, H2S, S2H, CLK) in the system, each FPGA has X address pointers, one address pointer for each address space. The size of each address pointer depends on the number of words required for modeling the user's custom circuit design in each FPGA chip. Assuming n words for a particular FPGA chip and hence n words for the address pointer, this particular address pointer has n outputs (i.e., W0, W1, W2, ..., Wn-1). These outputs Wi are also called word selection signals. When a particular word Wi is selected, the Wi signal is asserted active (i.e., "1"). This word selection signal shifts or propagates down the address pointer of this chip until it reaches the end of the address pointer in this chip, at which point it triggers the generation of a chain-out signal that starts the propagation of the word selection signal Wi through the address pointer in the next chip. In this manner, a chain of address pointers associated with a given I/O address space can be implemented across all of the FPGA chips in this reconfigurable hardware board.
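The chained propagation summarized above may be easier to follow with a small behavioral model. The Python sketch below is an illustration under stated assumptions, not the patented circuit: the function name `trace_word_select`, the starting position at W0 of the first chip, and the simplification that every chip shifts on every MOVE pulse are all hypothetical.

```python
from typing import List, Tuple


def trace_word_select(words_per_chip: List[int], moves: int) -> List[Tuple[int, int]]:
    """Return the (chip, word) selected initially and after each MOVE pulse."""
    # One flip-flop per word; at most one flip-flop holds the active "1".
    chips = [[False] * n for n in words_per_chip]
    chips[0][0] = True  # assumption: selection starts at W0 of the first chip
    selected = [(0, 0)]
    for _ in range(moves):
        chain_in = False  # nothing feeds the first chip in this sketch
        for flops in chips:
            chain_out = flops[-1]               # OUT active at the last flip-flop
            flops[:] = [chain_in] + flops[:-1]  # shift one position down the pointer
            chain_in = chain_out                # single wire to the next chip
        for chip, flops in enumerate(chips):
            if True in flops:
                selected.append((chip, flops.index(True)))
    return selected


# Three chips holding 2, 3, and 1 words of one address space.
print(trace_word_select([2, 3, 1], 5))
```

The trace visits every word of one chip before the chain-out hands the selection to the next chip, so only one wire between neighboring chips is needed per multiplexed chain.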

C. Gated Data/Clock Network Analysis

Various embodiments of the present invention perform clock analysis in association with gated data logic and gated clock logic analysis. The gated clock logic (or clock network) and gated data network determinations are critical to the successful implementation of the software clock and the logic evaluation in the hardware model during emulation. As discussed with respect to FIG. 4, the clock analysis is performed in step 305. To elaborate on this clock analysis process, FIG. 16 shows a flow diagram in accordance with one embodiment of the present invention. FIG. 16 also shows the gated data analysis.

The SEmulation system has the complete model of the user's circuit design in software and some portions of the user's circuit design in hardware. These hardware portions include the clock components, especially the derived clocks. Clock delivery timing issues arise because of this boundary between software and hardware. Because the complete model is in software, the software can detect clock edges that affect register values. In addition to the software model of the registers, these registers are physically located in the hardware model. To ensure that the hardware registers also evaluate their respective inputs (i.e., move the data at the D input to the Q output), the software/hardware boundary includes a software clock. The software clock ensures that the registers in the hardware model evaluate correctly. The software clock essentially controls the enable input of the hardware register rather than the clock input to the hardware register component. This software clock avoids race conditions, and accordingly, precise timing control to avoid hold-time violations is not needed. The clock network and gated data logic analysis process shown in FIG. 16 provides a way of modeling and implementing the clock and data delivery system to the hardware registers such that race conditions are avoided and a flexible software/hardware boundary implementation is provided.

As discussed earlier, primary clocks are clock signals from the test-bench processes. All other clocks, such as those clock signals derived from combinational components, are derived or gated clocks. Both gated clocks and gated data signals can be derived from a primary clock. For the most part, only a few (e.g., 1-10) derived or gated clocks are present in the user's circuit design. These derived clocks can be implemented as software clocks and will stay in software. If a relatively large number (e.g., more than 10) of derived clocks is present in the circuit design, the SEmulation system will model them in hardware to reduce I/O overhead and maintain the SEmulation system's performance. Gated data is a data or control input of a register, other than the clock, that is driven by the primary clock through some combinational logic.

The gated data/clock analysis process starts at step 500. Step 501 takes the usable source design database code generated from the HDL code and maps the user's register elements to the SEmulation system's register components. This one-to-one mapping of user registers to SEmulation registers facilitates later modeling steps. In some cases, this mapping is necessary to handle user circuit designs that describe register elements with specific primitives. Thus, for RTL-level code, SEmulation registers can be used quite readily because RTL-level code is at a high enough level to allow varying lower-level implementations. For a gate-level netlist, the SEmulation system will access the cell library of components and modify them to suit the particular circuit design-specific logic elements.

Step 502 extracts clock signals from the hardware model's register components. This step allows the system to determine the primary clocks and the derived clocks. This step also determines all the clock signals needed by the various components in the circuit design. The information from this step facilitates the software/hardware clock modeling step.

Step 503 determines the primary clocks and the derived clocks. Primary clocks originate from test-bench components and are modeled in software only. Derived clocks are derived from combinational logic, which is in turn driven by primary clocks. By default, the SEmulation system of the present invention keeps the derived clocks in software. If the number of derived clocks is small (e.g., 10 or fewer), these derived clocks can be modeled as software clocks. The number of combinational components needed to generate these derived clocks is small, so keeping these combinational components in software does not add significant I/O overhead. If, however, the number of derived clocks is large (e.g., more than 10), these derived clocks may be modeled in hardware to minimize I/O overhead. Often, the user's circuit design uses a large number of derived clock components derived from primary clocks. The system therefore builds the clocks in hardware to keep the number of software clocks small.
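The placement policy of step 503 can be expressed as a simple threshold test. The Python sketch below is illustrative only; the constant `SOFTWARE_CLOCK_LIMIT`, the function name, and the treatment of the boundary case of exactly 10 clocks are assumptions, since the text gives 10 only as an example figure.

```python
from typing import Dict, List

# Illustrative threshold taken from the "e.g., 10" figure in the text.
SOFTWARE_CLOCK_LIMIT = 10


def place_derived_clocks(derived_clocks: List[str]) -> Dict[str, str]:
    """Keep a small set of derived clocks in software; move a large set to hardware."""
    if len(derived_clocks) <= SOFTWARE_CLOCK_LIMIT:  # boundary handling is an assumption
        target = "software"
    else:
        target = "hardware"
    return {name: target for name in derived_clocks}


few = [f"gclk{i}" for i in range(3)]
many = [f"gclk{i}" for i in range(25)]
print(place_derived_clocks(few)["gclk0"], place_derived_clocks(many)["gclk0"])
```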

Decision step 504 requires the system to determine whether any derived clocks are found in the user's circuit design. If not, step 504 resolves to "NO" and the clock analysis ends at step 508, because all the clocks in the user's circuit design are primary clocks and these clocks are simply modeled in software. If derived clocks are found in the user's circuit design, step 504 resolves to "YES" and the algorithm proceeds to step 505.

Step 505 determines the fan-out combinational components from the primary clocks to the derived clocks. In other words, this step traces the clock signal datapaths from the primary clocks through the combinational components. Step 506 determines the fan-in combinational components from the derived clocks. In other words, this step traces the clock signal datapaths from the combinational components to the derived clocks. Determining the fan-out and fan-in sets in the system is done recursively in software. The fan-in set of a net N is as follows:

FanIn Set of a net N:
    find all the components driving net N;
    for each component X driving net N do:
        if the component X is not a combinational component then
            return;
        else
            for each input net Y of the component X
                add the FanIn Set W of net Y to the FanIn Set of net N
            end for
            add the component X into N;
        end if
    end for

A gated clock or data logic network is determined by recursively determining the fan-in set and the fan-out set of net N and determining their intersection. The ultimate goal here is to determine the so-called fan-in set of net N. Net N is typically a clock input node when determining the gated clock logic from a fan-in perspective. When determining the gated data logic from a fan-in perspective, net N is the clock input node associated with the data input at hand. If the node is on a register, net N is the clock input to that register for the data input associated with that register. The system finds all the components that drive net N. For each component X that drives net N, the system determines whether component X is a combinational component. If none of the components X is a combinational component, then the fan-in set of net N has no combinational components and net N is a primary clock.

If, however, at least one component X is a combinational component, the system then determines the input nets Y of component X. Here, the system looks further back in the circuit design by finding the input nodes to component X. For each input net Y of each component X, a fan-in set W may exist that is coupled to net Y. This fan-in set W of net Y is added to the fan-in set of net N, and then component X is added to set N.
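The recursive fan-in computation described above can be sketched in Python. This is a sketch under assumptions, not the SEmulation implementation: the netlist dictionary format and the name `fan_in_set` are hypothetical, non-combinational drivers are skipped rather than ending the whole procedure with `return`, and a `seen` set is added to guard against combinational loops.

```python
from typing import Dict, Optional, Set

# Hypothetical netlist format: component name -> {"comb", "inputs", "outputs"}.
Netlist = Dict[str, dict]


def fan_in_set(netlist: Netlist, net: str, seen: Optional[Set[str]] = None) -> Set[str]:
    """Collect the combinational components in the fan-in cone of `net`."""
    seen = set() if seen is None else seen
    if net in seen:  # loop guard (an addition, not in the pseudocode)
        return set()
    seen.add(net)
    result: Set[str] = set()
    for name, comp in netlist.items():
        if net not in comp["outputs"]:
            continue  # component X does not drive this net
        if not comp["comb"]:
            continue  # registers and test-bench sources end the trace
        for y in comp["inputs"]:
            result |= fan_in_set(netlist, y, seen)  # add fan-in set W of net Y
        result.add(name)  # then add component X itself
    return result


NETLIST = {
    "tb_clk": {"comb": False, "inputs": [], "outputs": ["clk"]},
    "reg_en": {"comb": False, "inputs": ["d0"], "outputs": ["en"]},
    "and1": {"comb": True, "inputs": ["clk", "en"], "outputs": ["gclk"]},
    "reg1": {"comb": False, "inputs": ["gclk", "d1"], "outputs": ["q1"]},
}
print(fan_in_set(NETLIST, "gclk"))  # combinational logic gating the clock
print(fan_in_set(NETLIST, "clk"))   # empty: clk behaves as a primary clock
```

An empty result corresponds to the case in the text where net N has no combinational drivers and is therefore a primary clock.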

The fan-out set of a net N is determined in a similar manner, as follows:

FanOut Set of a net N:
    find all the components using the net N;
    for each component X using net N do:
        if the component X is not a combinational component then
            return;
        else
            for each output net Y of the component X
                add the FanOut Set of net Y to the FanOut Set of net N
            end for
            add the component X into N;
        end if
    end for

Again, the gated clock or data logic network is determined by recursively determining the fan-in set and the fan-out set of net N and determining their intersection. The ultimate goal here is to determine the so-called fan-out set of net N. Net N is typically a clock output node when determining the gated clock logic from a fan-out perspective. Thus, the set of all logic elements using net N will be determined. When determining the gated data logic from a fan-out perspective, net N is the clock output node associated with the data output at hand. If the node is on a register, net N is the output of that register for the primary clock-driven input associated with that register. The system finds all the components that use net N. For each component X that uses net N, the system determines whether component X is a combinational component. If none of the components X is a combinational component, then the fan-out set of net N has no combinational components and net N is a primary clock.

If, however, at least one component X is a combinational component, the system then determines the output nets Y of component X. Here, the system looks further forward in the circuit design by finding the output nodes from component X. For each output net Y from each component X, a fan-out set W may exist that is coupled to net Y. This fan-out set W of net Y is added to the fan-out set of net N, and then component X is added to set N.
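The fan-out traversal mirrors the fan-in traversal, following outputs instead of inputs. The Python sketch below makes the same assumptions as any software rendering of the pseudocode would need to state: the netlist format and the name `fan_out_set` are hypothetical, non-combinational users are skipped rather than ending the procedure, and a `seen` set guards against loops.

```python
from typing import Dict, Optional, Set

Netlist = Dict[str, dict]  # hypothetical: name -> {"comb", "inputs", "outputs"}


def fan_out_set(netlist: Netlist, net: str, seen: Optional[Set[str]] = None) -> Set[str]:
    """Collect the combinational components in the fan-out cone of `net`."""
    seen = set() if seen is None else seen
    if net in seen:  # loop guard (an addition, not in the pseudocode)
        return set()
    seen.add(net)
    result: Set[str] = set()
    for name, comp in netlist.items():
        if net not in comp["inputs"] or not comp["comb"]:
            continue  # not a user of this net, or a register ending the trace
        for y in comp["outputs"]:
            result |= fan_out_set(netlist, y, seen)  # add fan-out set of net Y
        result.add(name)  # then add component X itself
    return result


CHAIN = {
    "tb_clk": {"comb": False, "inputs": [], "outputs": ["clk"]},
    "buf1": {"comb": True, "inputs": ["clk"], "outputs": ["clk_b"]},
    "and1": {"comb": True, "inputs": ["clk_b", "en"], "outputs": ["gclk"]},
    "reg1": {"comb": False, "inputs": ["gclk", "d1"], "outputs": ["q1"]},
}
print(fan_out_set(CHAIN, "clk"))
```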

Step 507 determines the clock network or gated clock logic. The clock network is the intersection of the fan-in and fan-out combinational components.

Similarly, the same fan-in and fan-out principles are used to determine the gated data logic. Like a gated clock, gated data is the data or control input (other than the clock) of a register driven by a primary clock through some combinational logic. The gated data logic is the intersection of the fan-in of the gated data and the fan-out from the primary clock. Thus, the clock analysis and gated data analysis yield the gated clock network/logic and the gated data logic through some combinational logic. As described later, the gated clock network and gated data network determinations are critical to the successful implementation of the software clock and the logic evaluation in the hardware model during emulation. The clock/data network analysis ends at step 508.
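The intersection of step 507 can be demonstrated on a small netlist. The Python sketch below is an illustration only: the netlist format, the generic helper `cone`, and the name `gated_clock_logic` are hypothetical, and the traversal skips non-combinational components and guards against loops as a simplification of the pseudocode.

```python
from typing import Dict, Optional, Set

Netlist = Dict[str, dict]  # hypothetical: name -> {"comb", "inputs", "outputs"}


def cone(netlist: Netlist, net: str, near: str, far: str,
         seen: Optional[Set[str]] = None) -> Set[str]:
    """Combinational components reachable from `net`, touching it via port list `near`."""
    seen = set() if seen is None else seen
    if net in seen:  # loop guard (an addition)
        return set()
    seen.add(net)
    found: Set[str] = set()
    for name, comp in netlist.items():
        if comp["comb"] and net in comp[near]:
            for y in comp[far]:
                found |= cone(netlist, y, near, far, seen)
            found.add(name)
    return found


def gated_clock_logic(netlist: Netlist, primary: str, gated: str) -> Set[str]:
    """Intersection of the gated clock's fan-in and the primary clock's fan-out."""
    fan_in = cone(netlist, gated, "outputs", "inputs")
    fan_out = cone(netlist, primary, "inputs", "outputs")
    return fan_in & fan_out


DESIGN = {
    "tb_clk": {"comb": False, "inputs": [], "outputs": ["clk"]},
    "inv1": {"comb": True, "inputs": ["clk"], "outputs": ["nclk"]},
    "or1": {"comb": True, "inputs": ["a", "b"], "outputs": ["en"]},
    "and1": {"comb": True, "inputs": ["nclk", "en"], "outputs": ["gclk"]},
    "mux1": {"comb": True, "inputs": ["clk", "sel"], "outputs": ["other"]},
    "reg1": {"comb": False, "inputs": ["gclk", "d1"], "outputs": ["q1"]},
}
print(gated_clock_logic(DESIGN, "clk", "gclk"))
```

In this example `or1` lies in the fan-in of the gated clock but is not driven by the primary clock, and `mux1` is driven by the primary clock but does not feed the gated clock, so the intersection excludes both.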

FIG. 17 shows the basic building block of the hardware model in accordance with one embodiment of the present invention. For the register component, the SEmulation system uses a D-type flip-flop with asynchronous load control as the basic block for building both edge-triggered (i.e., flip-flop) and level-sensitive (i.e., latch) register hardware models. This register model building block has the following ports: Q (the output state); A_E (asynchronous enable); A_D (asynchronous data); S_E (synchronous enable); S_D (synchronous data); and System.clk (system clock).

This SEmulation register model is triggered by a positive edge of the system clock or a positive level of the asynchronous enable (A_E) input. When either of these two triggering events occurs, the register model checks the asynchronous enable (A_E) input. If the asynchronous enable (A_E) input is enabled, the output Q takes on the value of the asynchronous data (A_D); otherwise, if the synchronous enable (S_E) input is enabled, the output Q takes on the value of the synchronous data (S_D). If, on the other hand, neither the asynchronous enable (A_E) nor the synchronous enable (S_E) input is enabled, the output Q is not evaluated despite the detection of a positive edge of the system clock. In this way, the inputs to these enable ports control the operation of this basic building block register model.
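The priority among the enable inputs can be captured in a short behavioral model. The Python class below is a sketch only; the class name `RegisterModel`, the method `evaluate`, and the `clk_posedge` flag are assumptions introduced for illustration, while the port names follow the text.

```python
class RegisterModel:
    """Behavioral sketch of the D-type flip-flop with asynchronous load control."""

    def __init__(self, q: int = 0) -> None:
        self.q = q  # output state Q

    def evaluate(self, a_e: bool, a_d: int, s_e: bool, s_d: int,
                 clk_posedge: bool) -> int:
        """Apply one evaluation; `clk_posedge` marks a rising System.clk edge."""
        if not (a_e or clk_posedge):
            return self.q  # no triggering event, Q holds
        if a_e:
            self.q = a_d   # asynchronous enable has priority
        elif s_e:
            self.q = s_d   # synchronous load on the clock edge
        return self.q      # neither enable set: Q is not evaluated


reg = RegisterModel()
print(reg.evaluate(a_e=False, a_d=1, s_e=True, s_d=1, clk_posedge=False))  # holds 0
print(reg.evaluate(a_e=False, a_d=0, s_e=True, s_d=1, clk_posedge=True))   # loads S_D
print(reg.evaluate(a_e=True, a_d=0, s_e=True, s_d=1, clk_posedge=False))   # loads A_D
```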

The system uses software clocks, which are special enable registers, to control the enable inputs of these register models. In a complex user circuit design, millions of elements are found in the circuit design, and accordingly the SEmulation system implements millions of elements in the hardware model. Controlling all of these elements individually is costly, because the overhead of sending millions of control signals to the hardware model would take longer than evaluating these elements in software. However, even such a complex circuit design typically calls for only a few (1-10) clocks, and clocks alone are sufficient to control the state changes of a system with only register and combinational components. The hardware model of the SEmulator system uses only register and combinational components. The SEmulator system also controls the evaluation of the hardware model through software clocks. In the SEmulation system, the hardware models for registers do not have the clock directly connected to other hardware components; rather, the software kernel controls the value of all clocks. By controlling a few clock signals, the kernel has full control over the evaluation of the hardware model with a negligible amount of coprocessor intervention overhead.

Depending on whether the register model is used as a latch or a flip-flop, the software clock is an input to either the asynchronous enable (A_E) or the synchronous enable (S_E) wire line. The application of the software clock from the software model to the hardware model is triggered by edge detection of clock components. When the software kernel detects an edge of a clock component, it sets the clock-edge register through the CLK address space. This clock-edge register controls the enable input, not the clock input, of the hardware register model. The global system clock still provides the clock input to the hardware register model. The clock-edge register, however, provides the software clock signal to the hardware register model through a double-buffered interface. As explained later, the double-buffered interface from the software clock to the hardware model ensures that all the register models are updated synchronously with respect to the global system clock. Thus, the use of the software clock eliminates the risk of hold-time violations.

FIGS. 18(a) and 18(b) show the implementation of the building block register model for latches and flip-flops. These register models are software-clock controlled via the appropriate enable inputs. Depending on whether the register model is used as a flip-flop or a latch, the asynchronous ports (A_E, A_D) and the synchronous ports (S_E, S_D) are used either for the software clock or for I/O operations. FIG. 18(a) shows the register model implementation when it is used as a latch. A latch is level-sensitive; that is, as long as the clock signal is asserted (e.g., "1"), the output Q follows the input D. Here, the software clock signal is provided to the asynchronous enable (A_E) input and the data input is provided to the asynchronous data (A_D) input. For I/O operations, the software kernel uses the synchronous enable (S_E) and synchronous data (S_D) inputs to download values into the Q port. The S_E port is used as a REG space address pointer, and the S_D port is used to access data to and from the local data bus.

FIG. 18(b) shows the register model implementation when it is used in a flip-flop design. A flip-flop design uses the following ports to determine the next state logic: data (D), set (S), reset (R), and enable (E). All of the next state logic of the flip-flop design is factored into a hardware combinational component that feeds the synchronous data (S_D) input. The software clock is the input to the synchronous enable (S_E) input. For I/O operations, the software kernel uses the asynchronous enable (A_E) and asynchronous data (A_D) inputs to download values into the Q port. The A_E port is used as a REG space write address pointer and the A_D port is used to access data to and from the local data bus.

The software clock will now be discussed. One embodiment of the software clock of the present invention is a clock enable signal to the hardware register models, such that the data at the inputs to these hardware register models are evaluated together and synchronously with the system clock. This eliminates race conditions and hold-time violations. One embodiment of the software clock logic includes clock edge detection logic in software that triggers additional logic in hardware upon clock edge detection. This enable signal logic generates an enable signal to the enable inputs of the hardware register models before data arrives at those hardware register models. The determination of the gated clock network and the gated data network is critical to the successful implementation of the software clock and to logic evaluation in the hardware model during hardware acceleration mode. As explained earlier, the clock network or gated clock logic is the intersection of the fan-in of the gated clock and the fan-out of the primary clock. Similarly, the gated data logic is the intersection of the fan-in of the gated data and the fan-out of the primary clock for the data signals. These fan-in and fan-out concepts are described above with respect to FIG. 16.

As discussed earlier, the primary clocks are generated by test-bench processes in software. The derived or gated clocks are generated from a network of combinational logic and registers that are in turn driven by the primary clocks. By default, the SEmulation system of the present invention keeps the derived clocks in software. If the number of derived clocks is small (e.g., 10 or fewer), these derived clocks can be modeled as software clocks. The number of combinational components needed to generate these derived clocks is then also small, so modeling these combinational components in software adds no significant I/O overhead. If, however, the number of derived clocks is large (e.g., more than 10), these derived clocks and their combinational components may be modeled in hardware to minimize I/O overhead.

Ultimately, in accordance with one embodiment of the present invention, clock edge detection occurring in software (via the input to the primary clock) can be translated into clock detection in hardware (via the input to the clock edge register). Clock edge detection in software triggers an event in hardware so that the registers in the hardware model receive the clock enable signal before the data signal. Evaluation of the data signal therefore occurs in synchronization with the system clock, which avoids hold-time violations.

As stated earlier, the SEmulation system has the complete model of the user's circuit design in software and some portion of the user's circuit design in hardware. As specified in the kernel, the software can detect clock edges that affect hardware register values. To ensure that the hardware registers also evaluate their respective inputs, the software/hardware boundary includes a software clock. The software clock ensures that the registers in the hardware model evaluate in synchronization with the system clock and without any hold-time violations. The software clock essentially controls the enable input of the hardware register components, rather than controlling the clock input to the hardware register components. The double-buffered approach to implementing the software clock ensures that the registers evaluate in synchronization with the system clock to avoid race conditions, and eliminates the need for precise timing controls to avoid hold-time violations.

FIG. 19 shows one embodiment of the clock implementation system in accordance with the present invention. Initially, the gated clock logic and the gated data logic are determined by the SEmulation system as described above with respect to FIG. 16. The gated clock logic and the gated data logic are then separated. When implementing the double buffer, the driving source and the double-buffered primary logic must also be separated. Thus, from the fan-in and fan-out analysis, gated data logic 513 and gated clock logic 514 are separated.

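The modeled primary clock register 510 includes a first buffer 511 and a second buffer 512, both of which are D registers. The primary clock is modeled in software, but its double-buffer implementation is modeled in both software and hardware. Clock edge detection occurs at the primary clock register 510 in software and triggers the hardware model to generate the software clock signal for the hardware model. Data and address enter the first buffer 511 on wire lines 519 and 520, respectively. The Q output of this first buffer 511 on wire line 521 is coupled to the D input of the second buffer 512. The Q output of this first buffer 511 is also provided on wire line 522 to the gated clock logic 514, to ultimately drive the clock input of the first buffer 516 of the clock edge register 515. The output of the second buffer 512 on wire line 523 is provided to the gated data logic 513, to ultimately drive the input of register 518 via wire line 530 in the user's custom-designed circuit model. The enable input to the second buffer 512 in the primary clock register 510 is the INPUT-EN signal on wire line 533 from a state machine, which determines the evaluation cycles and controls the various signals accordingly.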
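The clock edge register 515 also includes a first buffer 516 and a second buffer 517. The clock edge register 515 is implemented in hardware. When clock edge detection occurs in software (via the input to the primary clock register 510), this can trigger the same clock edge detection in hardware (via the clock edge register 515). The D input to the first buffer 516 on wire line 524 is set to logic "1". The clock signal on wire line 525 is derived from the gated clock logic 514 and ultimately from the primary clock register 510, at the output of the first buffer 511 on wire line 522. This clock signal on wire line 525 is the gated clock signal. The enable wire line 526 for the first buffer 516 carries the ~EVAL signal from the state machine that controls the I/O and evaluation cycles (discussed later). The first buffer 516 also has a RESET signal on wire line 527. This same RESET signal is also provided to the second buffer 517 in the clock edge register 515. The Q output of the first buffer 516 on wire line 529 is provided to the D input of the second buffer 517. The second buffer 517 also has an enable input on wire line 528 for the CLK-EN signal and a RESET input on wire line 527. The Q output of the second buffer 517 on wire line 532 is provided to the enable input of register 518 in the user's custom-designed circuit model. Buffers 511, 512, and 517, along with register 518, are clocked by the system clock. Only buffer 516 in the clock edge register 515 is clocked by the gated clock from the gated clock logic 514.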
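Register 518 is a typical D-type register model that is modeled in hardware and is part of the user's custom circuit design. Its evaluation is strictly controlled by this embodiment of the clock implementation scheme of the present invention. The ultimate goal of this clock set-up is to ensure that the clock enable signal on wire line 532 arrives at register 518 before the data signal on wire line 530, so that evaluation of the data signal by this register is synchronized with the system clock and free of race conditions.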
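To reiterate, the modeled primary clock register 510 is modeled in software, but its double-buffer implementation is modeled in both software and hardware. The clock edge register 515 is implemented in hardware. From the fan-in and fan-out analysis, gated data logic 513 and gated clock logic 514 are separated for modeling purposes, and can be modeled in software (if the number of gated data and gated clocks is small) or in hardware (if the number of gated data and gated clocks is large). The determination of the gated clock network and the gated data network is critical to the successful implementation of the software clock and to logic evaluation in the hardware model during hardware acceleration mode.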
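Software clock implementation relies primarily on the clock set-up shown in FIG. 19, along with the timing of the assertions of the ~EVAL, INPUT-EN, CLK-EN, and RESET signals. The primary clock register 510 detects clock edges to trigger software clock generation for the hardware model. This clock edge detection event triggers the "activation" of the clock edge register 515 via wire line 522, the gated clock logic 514, and the clock input on wire line 525, so that the clock edge register 515 also detects the same clock edge. In this way, clock detection occurring in software (via inputs 519 and 520 to the primary clock register 510) can be translated into clock edge detection in hardware (via input 525 to the clock edge register 515). At this point, the INPUT-EN wire line 533 to the second buffer 512 in the primary clock register 510 and the CLK-EN wire line 528 to the second buffer 517 in the clock edge register 515 have not been asserted, so no data evaluation takes place. Thus, the clock edge is detected before the data are evaluated in the hardware register model. Note that at this stage, the data from the data bus on wire line 519 has not yet propagated out to the gated data logic 513 and into the hardware-modeled user register 518. Indeed, the data has not even reached the second buffer 512 in the primary clock register 510, because the INPUT-EN signal on wire line 533 has not yet been asserted.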
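During the I/O stage, the ~EVAL signal on wire line 526 is asserted to enable the first buffer 516 in the clock edge register 515. The ~EVAL signal also goes through the gated clock logic 514 to monitor the gated clock signal as it makes its way through the gated clock logic to the clock input of the first buffer 516 on wire line 525. Thus, as will be explained later with respect to the four-state evaluation state machine, the ~EVAL signal can be maintained for as long as necessary to stabilize the data and clock signals through the portion of the system illustrated in FIG. 19.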
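When the signals have stabilized, I/O has concluded, or the system is otherwise ready to evaluate the data, the ~EVAL signal is deasserted to disable the first buffer 516. The CLK-EN signal is asserted and applied to the second buffer 517 via wire line 528 to enable the second buffer 517, which sends the logic "1" value on wire line 529 to its Q output on wire line 532 and on to the enable input of register 518. Register 518 is now enabled, and any data present on wire line 530 is synchronously clocked into register 518 by the system clock. As the reader can observe, the enable signal to register 518 arrives ahead of the evaluation of the data signal to this register 518.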
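The INPUT-EN signal on wire line 533 is now asserted to the second buffer 512. The RESET edge register signal on wire line 527 is also asserted to buffers 516 and 517 in the clock edge register 515 to reset these buffers and ensure that their outputs are logic "0". Now that the INPUT-EN signal has been asserted for buffer 512, the data on wire line 521 propagates through the gated data logic 513 to the user's circuit register 518 on wire line 530. Because the enable input to this register 518 is now at logic "0", the data on wire line 530 cannot be clocked into register 518. The previous data, however, has already been clocked in by the enable signal previously asserted on wire line 532, before the RESET signal was asserted to disable register 518. Thus, the input data to register 518, as well as the inputs to the other registers that are part of the user's hardware-modeled circuit design, stabilize at their respective register input ports. When a clock edge is subsequently detected in software, the primary clock register 510 and the clock edge register 515 in hardware activate the enable input to register 518, so that the data waiting at the input of register 518 and the data waiting at the inputs of the other registers are clocked in together and synchronously by the system clock.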
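As discussed earlier, the software clock implementation relies primarily on the clock set-up shown in FIG. 19, along with the timing of the assertions of the ~EVAL, INPUT-EN, CLK-EN, and RESET signals. FIG. 20 shows a four-state finite state machine that controls the software clock logic of FIG. 19 in accordance with one embodiment of the present invention.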
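At state 540, the system is idle or some I/O operation is under way. The ~EVAL signal is logic "0". The ~EVAL signal determines the evaluation cycle, is generated by the system controller, and lasts as many clock cycles as needed to stabilize the logic in the system. Usually, the duration of the ~EVAL signal is determined by the placement scheme during compilation and is based on the length of the longest direct wire and the length of the longest segmented multiplexed wire (i.e., TDM circuit). During evaluation, the ~EVAL signal is at logic "1".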
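At state 541, the clock is enabled. The CLK-EN signal is asserted at logic "1", and thus the enable signal to the hardware register model is asserted. Here, the previously gated data at the hardware register model is evaluated synchronously without risk of hold-time violation.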
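At state 542, the new data is enabled when the INPUT-EN signal is asserted at logic "1". The RESET signal is also asserted to remove the enable signal from the hardware register model. The new data that has been enabled into the hardware register model through the gated logic network, however, either continues to propagate to its intended hardware register model destination, or has reached it and waits to be clocked into the hardware register model when the enable signal is asserted again.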
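At state 543, the new data propagates and stabilizes in the logic while the EVAL signal remains at logic "1". The multiplexer wire is also at logic "1" for the time division multiplexed (TDM) circuit, as described above with respect to FIGS. 9(A), 9(B), and 9(C). When the ~EVAL signal is deasserted or set to logic "0", the system returns to the idle state 540 and waits to evaluate upon the detection of a clock edge by the software.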
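The four states described above can be sketched as a small state machine. This is only an illustrative model, not the circuit of FIG. 20: the state names, the `run` helper, the one-cycle dwell in states 541 and 542, and the `settle_cycles` parameter are assumptions introduced here. The signal sets record only what the text says is asserted in each state.

```python
from enum import Enum


class State(Enum):
    IDLE = 540          # idle, or I/O in progress
    CLOCK_ENABLE = 541  # CLK-EN asserted
    INPUT_ENABLE = 542  # INPUT-EN and RESET asserted
    EVALUATE = 543      # EVAL held while new data settles


# Signals the text describes as asserted in each state (illustrative only).
ASSERTED = {
    State.IDLE: set(),
    State.CLOCK_ENABLE: {"CLK-EN"},
    State.INPUT_ENABLE: {"INPUT-EN", "RESET"},
    State.EVALUATE: {"EVAL"},
}


def run(edge_cycles, settle_cycles, total_cycles):
    """Return the state occupied on each system-clock cycle.

    edge_cycles   -- cycles on which software detects a clock edge (assumed input)
    settle_cycles -- cycles EVAL is held so the logic can stabilize (assumed input)
    """
    state = State.IDLE
    remaining = 0
    trace = []
    for cycle in range(total_cycles):
        trace.append(state)
        if state is State.IDLE:
            if cycle in edge_cycles:
                state = State.CLOCK_ENABLE
        elif state is State.CLOCK_ENABLE:
            state = State.INPUT_ENABLE
        elif state is State.INPUT_ENABLE:
            state = State.EVALUATE
            remaining = settle_cycles
        else:  # EVALUATE: stay until the settle time has elapsed
            remaining -= 1
            if remaining <= 0:
                state = State.IDLE
    return trace


trace = run(edge_cycles={1}, settle_cycles=2, total_cycles=7)
print([s.name for s in trace])
```

In this sketch the enable (state 541) always precedes the release of new data (state 542), which is the ordering the software clock relies on.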
D. FPGA Array and Control
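The SEmulator system initially compiles the user circuit design data into software and hardware models based on a variety of controls, including component type. During the hardware compilation process, the system performs the mapping, placement, and routing process described above with respect to FIG. 6 to optimally partition, place, and interconnect the various components that make up the user's circuit design. Using known programming tools, the bitstream configuration files or Programmer Object Files (.pof) (or alternatively, raw binary files (.rbf)) are referenced to reconfigure a hardware board containing a number of FPGA chips. Each chip contains a portion of the hardware model corresponding to the user's circuit design.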
In one embodiment, the SEmulator system uses a 4×4 array of FPGA chips, totaling 16 chips. Exemplary FPGA chips include the Xilinx XC4000 series family of FPGA logic devices and the Altera FLEX 10K devices.
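The Xilinx XC4000 series of FPGAs can be used, including the XC4000, XC4000A, XC4000D, XC4000H, XC4000E, XC4000EX, XC4000L, and XC4000XL. Particular FPGAs include the Xilinx XC4005H, XC4025, and Xilinx 4028EX. The Xilinx 4028EX FPGA engines approach half a million gates in capacity on a single PCI board. Details of these Xilinx FPGAs can be found in their data book, Xilinx, The Programmable Logic Data Book (9/96), which is incorporated herein by reference. Details of the Altera FPGAs can be found in their data book, Altera, The 1996 Data Book (June 1996), which is incorporated herein by reference.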
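A brief general description of the XC4025 FPGA is provided here. Each array chip consists of a 240-pin Xilinx chip. The array board populated with Xilinx XC4025 chips contains 440,000 configurable gates and is capable of performing computationally intensive tasks. The Xilinx XC4025 FPGA consists of 1024 configurable logic blocks (CLBs). Each CLB can implement 32 bits of asynchronous SRAM, or a small amount of general Boolean logic, and two strobed registers. On the periphery of the chip, unstrobed I/O registers are provided. The XC4005H can be used instead of the XC4025. This gives a relatively low-cost version of the array board with 120,000 configurable gates. The XC4005H devices have high-power 24 mA drive circuits but lack the input/output flip-flops of the standard XC4000 series. Details of these and other Xilinx FPGAs can be obtained through their publicly available data sheets, which are incorporated herein by reference.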
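The functionality of Xilinx XC4000 series FPGAs can be customized by loading configuration data into internal memory cells. The values stored in these memory cells determine the logic functions and interconnections in the FPGA. The configuration data of these FPGAs can be stored in on-chip memory and can be loaded from external memory. The FPGAs can either read their configuration data from an external serial or parallel PROM, or the configuration data can be written into the FPGAs from an external device. These FPGAs can be reprogrammed an unlimited number of times, especially where the hardware changes dynamically or where users want to adapt the hardware to different applications.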
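Generally, the XC4000 series FPGAs have up to 1024 CLBs. Each CLB has two levels of look-up tables, with two four-input look-up tables (or function generators F and G) providing some of the inputs to a third three-input look-up table (or function generator H), and two flip-flops or latches. The outputs of these look-up tables can be driven independently of the flip-flops or latches. The CLB can implement the following combinations of arbitrary Boolean functions: (1) any function of four or five variables, (2) any function of four variables, any second function of up to four unrelated variables, and any third function of up to three unrelated variables, (3) one function of four variables and another function of six variables, (4) any function of four variables, and (5) some functions of nine variables. Two D-type flip-flops or latches are available for registering CLB inputs or for storing look-up table outputs. These flip-flops can be used independently of the look-up tables. DIN can be used as a direct input to either one of these two flip-flops or latches, and H1 can drive the other through the H function generator.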
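The two-level look-up table arrangement described above can be illustrated with a short sketch. This is not Xilinx configuration code: the `lut` and `clb` helpers, the integer truth-table encoding, and the example functions loaded into F, G, and H are all assumptions chosen for illustration. The sketch only shows how two four-input tables can feed a three-input table together with an extra input such as H1.

```python
def lut(truth_table, inputs):
    """Evaluate a look-up table.

    truth_table -- integer whose bit i is the output for input pattern i
    inputs      -- list of 0/1 values, most significant input first
    """
    index = 0
    for bit in inputs:
        index = (index << 1) | (bit & 1)
    return (truth_table >> index) & 1


# Hypothetical contents for the three function generators.
F_TABLE = 1 << 15        # 4-input AND: only pattern 1111 yields 1
G_TABLE = 0xFFFE         # 4-input OR: every pattern except 0000 yields 1
H_TABLE = 0x96           # 3-input XOR (odd parity of F, G, H1)


def clb(f_inputs, g_inputs, h1):
    """Combine F and G through H, as in the two-level CLB structure."""
    f_out = lut(F_TABLE, f_inputs)
    g_out = lut(G_TABLE, g_inputs)
    return lut(H_TABLE, [f_out, g_out, h1])


print(clb([1, 1, 1, 1], [0, 0, 0, 0], 0))  # F=1, G=0, H1=0
print(clb([1, 1, 1, 1], [1, 0, 0, 0], 0))  # F=1, G=1, H1=0
```

Changing the three table constants changes the Boolean function without changing the structure, which is the sense in which loading memory cells customizes the device.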
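Each four-input function generator in the CLB (i.e., F and G) contains dedicated arithmetic logic for the fast generation of carry and borrow signals, and can be configured to implement a two-bit adder with carry-in and carry-out. These function generators can also be implemented as read/write random access memory (RAM). The four-input wire lines are then used as address lines for the RAM.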
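The Altera FLEX 10K chips are somewhat similar in concept. These chips are SRAM-based programmable logic devices (PLDs) having multiple 32-bit buses. In particular, each FLEX 10K100 chip contains approximately 100,000 gates, 12 embedded array blocks (EABs), 624 logic array blocks (LABs), 8 logic elements (LEs) per LAB (or 4,992 LEs), 5,392 flip-flops or registers, 406 I/O pins, and 503 total pins.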
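The Altera FLEX 10K chips contain an embedded array of embedded array blocks (EABs) and a logic array of logic array blocks (LABs). An EAB can be used to implement various memory (e.g., RAM, ROM, FIFO) and complex logic functions (e.g., digital signal processors (DSPs), microcontrollers, multipliers, data transformation functions, state machines). As a memory function implementation, the EAB provides 2,048 bits. As a logic function implementation, the EAB provides 100 to 600 gates.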
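A LAB, via its LEs, can be used to implement medium-sized blocks of logic. Each LAB represents approximately 96 logic gates and contains 8 LEs and a local interconnect. An LE contains a four-input look-up table, a programmable flip-flop, and dedicated signal paths for carry and cascade functions. Typical logic functions that can be created include counters, address decoders, or small state machines.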
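More detailed descriptions of the Altera FLEX 10K chips can be found in Altera, 1996 DATA BOOK (June 1996), which is incorporated herein by reference. The data book also contains details on the supporting programming software.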
FIG. 8 shows one embodiment of the 4×4 FPGA array and its interconnections.
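Note that this embodiment of the SEmulator does not use cross bar or partial cross bar connections for the FPGA chips. The FPGA chips include chips F11 to F14 in the first row, chips F21 to F24 in the second row, chips F31 to F34 in the third row, and chips F41 to F44 in the fourth row. In one embodiment, each FPGA chip (e.g., chip F23) has the following pins for interfacing with the FPGA I/O controller of the SEmulator system: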
Interface                 Pins
Data bus                  32
SPACE index               3
READ, WRITE, EVAL         3
DATA XSFR                 1
Address pointer chain     2
TOTAL                     41
Thus, in one embodiment, each FPGA chip uses only 41 pins for interfacing with the SEmulator system. These pins are described further with respect to FIG. 22.
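These FPGA chips are interconnected to each other via non-crossbar or non-partial-crossbar interconnections. Each interconnection between chips, such as interconnection 602 between chip F11 and chip F14, represents 44 pins or 44 wire lines. In other embodiments, each interconnection represents more than 44 pins. In still other embodiments, each interconnection represents fewer than 44 pins.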
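Each chip has six interconnections. For example, chip F11 has interconnections 600 to 605. Likewise, chip F33 has interconnections 606 to 611. These interconnections run horizontally along a row and vertically along a column. Each interconnection provides a direct connection between two chips along a row or between two chips along a column. Thus, for example, interconnection 600 directly connects chip F11 and chip F13; interconnection 601 directly connects chip F11 and chip F12; interconnection 602 directly connects chip F11 and chip F14; interconnection 603 directly connects chip F11 and chip F31; interconnection 604 directly connects chip F11 and chip F21; and interconnection 605 directly connects chip F11 and chip F41.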
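Similarly, for chip F33, which is not located on the edge of the array (unlike, e.g., chip F11), interconnection 606 directly connects chip F33 and chip F13; interconnection 607 directly connects chip F33 and chip F23; interconnection 608 directly connects chip F33 and chip F34; interconnection 609 directly connects chip F33 and chip F43; interconnection 610 directly connects chip F33 and chip F31; and interconnection 611 directly connects chip F33 and chip F32.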
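Because chip F11 is located within one hop of chip F13, interconnection 600 is labeled "1". Because chip F11 is located within one hop of chip F12, interconnection 601 is labeled "1". Similarly, because chip F11 is located within one hop of chip F14, interconnection 602 is labeled "1". Likewise, for chip F33, all interconnections are labeled "1".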
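This interconnect scheme allows each chip to communicate with any other chip in the array within two "jumps" or interconnections. Thus, chip F11 is connected to chip F33 through either of the following two paths: (1) interconnection 600 to interconnection 606; or (2) interconnection 603 to interconnection 610. In short, the path can run either (1) along a row first and then along a column, or (2) along a column first and then along a row.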
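Although FIG. 8 shows the FPGA chips configured in a 4×4 array with horizontal and vertical interconnections, the actual physical implementation on a board is through low and high banks with an expansion piggyback board. So, in one embodiment, chips F41-F44 and chips F21-F24 are in the low bank. Chips F31-F34 and chips F11-F14 are in the high bank. The piggyback board contains chips F11-F14 and chips F21-F24. Thus, to expand the array, piggyback boards containing a number (e.g., 8) of chips are added to the banks, extending the array above the row currently containing chips F11-F14. In other embodiments, the piggyback board expands the array below the row currently containing chips F41-F44. Further embodiments expand to the right of chips F14, F24, F34, and F44. Still other embodiments expand to the left of chips F11, F21, F31, and F41.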
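FIG. 7 shows the connectivity matrix for the 4×4 FPGA array of FIG. 8 in terms of "1" or "0". This connectivity matrix is used to generate a placement cost result from a cost function used in the hardware mapping, placement, and routing process of the SEmulation system. The cost function was discussed above with respect to FIG. 6. As an example, chip F11 is located within one hop of chip F13, so the connectivity matrix entry for F11-F13 is "1".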
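The interconnect scheme and its "1"/"0" connectivity matrix can be reconstructed in a few lines. This is a sketch under one assumption drawn from the interconnections listed for chips F11 and F33: two chips are directly connected exactly when they share a row or a column. The function names and the tuple encoding of chip Fij as `(i, j)` are introduced here for illustration and are not taken from FIG. 7.

```python
from itertools import product

N = 4  # 4x4 array; chip Fij is encoded as the tuple (i, j)
CHIPS = list(product(range(1, N + 1), repeat=2))


def directly_connected(a, b):
    """Assumed rule: distinct chips in the same row or the same column."""
    return a != b and (a[0] == b[0] or a[1] == b[1])


# Connectivity matrix with entries 1 (one hop) or 0 (no direct interconnection).
MATRIX = {(a, b): int(directly_connected(a, b)) for a in CHIPS for b in CHIPS}


def intermediates(a, b):
    """Chips through which a two-hop path from a to b can pass."""
    return [m for m in CHIPS if MATRIX[(a, m)] and MATRIX[(m, b)]]


def hops(a, b):
    """Number of interconnections needed to get from a to b."""
    if a == b:
        return 0
    if MATRIX[(a, b)]:
        return 1
    return 2 if intermediates(a, b) else None


print(MATRIX[((1, 1), (1, 3))])        # F11-F13 entry
print(intermediates((1, 1), (3, 3)))   # row-first via F13, column-first via F31
print(max(hops(a, b) for a in CHIPS for b in CHIPS))
```

Under this rule each chip has six direct interconnections and every pair of chips is at most two hops apart, which agrees with the description of FIG. 8.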
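FIG. 21 shows the interconnect pin-outs for a single FPGA chip in accordance with one embodiment of the present invention. Each chip has six sets of interconnections, where each set comprises a particular number of pins. In one embodiment, each set has 44 pins. The interconnections for each FPGA chip are oriented horizontally (east-west) and vertically (north-south). The set of interconnections for the west direction is labeled W[43:0]. The set for the east direction is labeled E[43:0]. The set for the north direction is labeled N[43:0]. The set for the south direction is labeled S[43:0]. These complete sets of interconnections are for connections to adjacent chips; that is, these interconnections do not "hop" over any chip. For example, in FIG. 8, chip F33 has interconnection 607 for N[43:0], interconnection 608 for E[43:0], interconnection 609 for S[43:0], and an interconnection for W[43:0].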
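Returning to FIG. 21, two additional sets of interconnections remain. One set is for the non-adjacent interconnections running vertically: YH[21:0] and YH[43:22]. The other set is for the non-adjacent interconnections running horizontally: XH[21:0] and XH[43:22]. Each set, YH[...] and XH[...], is divided in two, so that each half of a set contains 22 pins. This configuration allows each chip to be manufactured identically. Thus, each chip can be interconnected in one hop to a non-adjacent chip located above, below, to the left, and to the right. This FPGA chip also shows the pin(s) for global signals, the FPGA bus, and JTAG signals.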
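The FPGA I/O controller will now be discussed. This controller was first briefly introduced in FIG. 10 as item 327. The FPGA I/O controller manages the data and controls the traffic between the PCI bus and the FPGA array.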
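FIG. 22 shows one embodiment of the FPGA controller between the PCI bus and the FPGA array, along with the banks of FPGA chips. The FPGA I/O controller 700 includes CTRL_FPGA unit 701, clock buffer 702, PCI controller 703, EEPROM 704, FPGA serial configuration interface 705, boundary scan test interface 706, and buffer 707. Appropriate power/voltage regulating circuitry, as known to those skilled in the art, is provided. Exemplary sources include Vcc coupled to a voltage detector/regulator and a sense amplifier to substantially maintain the voltage under various environmental conditions. The Vcc to each FPGA chip is provided with fast-acting thin-film fuses in between. Vcc-HI is provided to the CONFIG# of all FPGA chips and to LINTI# of the LOCAL_BUS 708.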
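CTRL_FPGA unit 701 is the primary controller of FPGA I/O controller 700; it handles the various control, test, and read/write data among the various units and buses. CTRL_FPGA unit 701 is coupled to the low and high banks of FPGA chips. FPGA chips F41-F44 and F21-F24 (i.e., the low bank) are coupled to low FPGA bus 718. FPGA chips F31-F34 and F11-F14 (i.e., the high bank) are coupled to high FPGA bus 719. These FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 correspond to the FPGA chips in FIG. 8 and retain their reference designations.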
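Between these FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 and the low bank bus 718 and high bank bus 719 are thick film chip resistors for appropriate loading. The group of resistors 713 coupled to the low bank bus 718 includes, for example, resistor 716 and resistor 717. The group of resistors 712 coupled to the high bank bus 719 includes, for example, resistor 714 and resistor 715.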
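If expansion is desired, more FPGA chips may be installed on the low bank bus 718 and the high bank bus 719 in the direction to the right of FPGA chips F11 and F21. In one embodiment, expansion is done through piggyback boards resembling piggyback board 720. Thus, if these banks of FPGA chips initially had only eight FPGA chips, F41-F44 and F31-F34, further expansion is possible by adding piggyback board 720, which contains FPGA chips F24-F21 in the low bank and chips F14-F11 in the high bank. The piggyback board 720 also includes the additional low and high bank buses and the thick film chip resistors.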
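The PCI controller 703 is the primary interface between the FPGA I/O controller 700 and the 32-bit PCI bus 709. If the PCI bus expands to 64 bits and/or 66 MHz, appropriate adjustments can be made in this system without departing from the spirit and scope of the present invention. These adjustments are discussed below. One example of a PCI controller 703 that may be used in the system is PLX Technology's PCI 9080 or 9060. The PCI 9080 has the appropriate local bus interface, control registers, FIFOs, and PCI interface to the PCI bus. The data book PLX Technology, PCI 9080 Data Sheet (ver. 0.93, Feb. 28, 1997) is incorporated herein by reference.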
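The PCI controller 703 passes data between the CTRL_FPGA unit 701 and the PCI bus 709 via the LOCAL_BUS 708. LOCAL_BUS includes a control bus portion, an address bus portion, and a data bus portion for control signals, address signals, and data signals, respectively. If the PCI bus expands to 64 bits, the data bus portion of LOCAL_BUS 708 can also expand to 64 bits. The PCI controller 703 is coupled to EEPROM 704, which contains the configuration data for the PCI controller 703. An exemplary EEPROM 704 is National Semiconductor's 93CS46.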
The PCI bus 709 supplies a 33 MHz clock signal to the FPGA I/O controller 700. The clock signal is provided to clock buffer 702 via wire line 710 for synchronization purposes and for low timing skew. The output of this clock buffer 702 is the 33 MHz global clock (GL_CLK) signal supplied to all the FPGA chips via wire line 711 and to the CTRL_FPGA unit 701 via wire line 721. If the PCI bus expands to 66 MHz, the clock buffer will also supply 66 MHz to the system.
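FPGA serial configuration interface 705 provides configuration data to configure the FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44. The Altera data book, Altera, 1996 DATA BOOK (June 1996), provides detailed information on the configuration devices and processes. FPGA serial configuration interface 705 is also coupled to LOCAL_BUS 708 and the parallel port 721. In addition, the FPGA serial configuration interface 705 is coupled to CTRL_FPGA unit 701 and the FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 via CONF_INTF wire line 723.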
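The boundary scan test interface 706 provides JTAG implementations of a specified test command set for externally checking a processor's or system's logic units and circuits by software. This interface 706 complies with the IEEE Std. 1149.1-1990 specification. Refer to the Altera data book, Altera, 1996 DATA BOOK (June 1996), and Application Note 39 (JTAG Boundary-Scan Testing in Altera Devices), both of which are incorporated herein by reference, for more information. Boundary scan test interface 706 is coupled to CTRL_FPGA unit 701 and the FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 via BST_INTF wire line 724.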
CTRL_FPGA unit 701, along with buffer 707, passes data to and from the low (chips F41-F44 and F21-F24) and high (chips F31-F34 and F11-F14) banks of FPGA chips via the low bank 32-bit bus 718 and the high bank 32-bit bus 719, respectively, using F_BUS 725 for the low bank 32 bits FD[31:0] and F_BUS 726 for the high bank 32 bits FD[63:32].
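In one embodiment, the low bank bus 718 and the high bank bus 719 together match the throughput of the PCI bus 709. The PCI bus 709 is 32 bits wide at 33 MHz. Its throughput is thus 132 MB/s (= 33 MHz × 4 bytes). The low bank bus 718 is 32 bits wide at half the PCI bus frequency (33/2 MHz = 16.5 MHz). The high bank bus 719 is also 32 bits wide at half the PCI bus frequency (33/2 MHz = 16.5 MHz). The throughput of the combined 64-bit low bank and high bank buses is also 132 MB/s (= 16.5 MHz × 8 bytes). Thus, the performance of the low and high bank buses tracks the performance of the PCI bus. In other words, the performance limitation lies in the PCI bus, not in the low and high bank buses.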
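The throughput comparison above is simple arithmetic and can be checked directly. The helper name `throughput_mb_per_s` is introduced here for illustration; the figures are the ones stated in the text (33 MHz and 32 bits for the PCI bus, 16.5 MHz and 2 × 32 bits for the two bank buses), with throughput taken as frequency times width in bytes.

```python
def throughput_mb_per_s(freq_mhz, width_bits):
    """Peak throughput in MB/s: one transfer of width_bits per clock cycle."""
    return freq_mhz * width_bits / 8


pci_bus = throughput_mb_per_s(33, 32)            # 33 MHz x 4 bytes
bank_buses = throughput_mb_per_s(33 / 2, 2 * 32)  # 16.5 MHz x 8 bytes

print(pci_bus, bank_buses)
```

Halving the clock while doubling the width leaves the product unchanged, which is why the bank buses keep pace with the PCI bus.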
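In accordance with one embodiment of the present invention, address pointers are also implemented in each FPGA chip for each software/hardware boundary address space. These address pointers are chained across several FPGA chips through the multiplexed cross chip address pointer chain. Refer to the address pointer discussion above with respect to FIGS. 9, 11, 12, 14, and 15. To move the word selection signal along the chain of address pointers associated with a given address space and across several chips, chain-out wire lines must be provided. These chain-out wire lines are shown as arrows between the chips. One such chain-out wire line for the low bank is wire line 730 between chips F23 and F22. Another such chain-out wire line, for the high bank, is wire line 731 between chips F31 and F32. The chain-out wire line 732 at the end of low bank chip F21 is coupled to the CTRL_FPGA unit 701 as LAST_SHIFT_L. The chain-out wire line 733 at the end of high bank chip F11 is coupled to the CTRL_FPGA unit 701 as LAST_SHIFT_H. These signals LAST_SHIFT_L and LAST_SHIFT_H are the word selection signals for their respective banks as the word selection signals propagate through the FPGA chips. When either of these signals presents a logic "1" to the CTRL_FPGA unit 701, this indicates that the word selection signal has made its way to the end of its respective bank of chips.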
The CTRL_FPGA unit 701 provides a write signal (F_WR) on wire line 734, a read signal (F_RD) on wire line 735, a DATA_XSFR signal on wire line 736, and a SPACE[2:0] signal on wire line 738 to and from the FPGA chips. The CTRL_FPGA unit 701 receives the EVAL_REQ# signal on wire line 739. The write signal (F_WR), read signal (F_RD), DATA_XSFR signal, and SPACE[2:0] signal work together for the address pointers in the FPGA chips. They are used to generate the MOVE signal for the address pointer associated with the selected address space, as determined by the SPACE index (SPACE[2:0]). The DATA_XSFR signal is used to initialize the address pointers and begin the word-by-word transfer process.
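The EVAL_REQ# signal is used to start the evaluation cycle all over again if any of the FPGA chips asserts this signal. For example, to evaluate data, the data is transferred or written from main memory in the host processor's computing station to the FPGAs via the PCI bus. At the end of the transfer, the evaluation cycle begins, including address pointer initialization and the operation of the software clocks to facilitate the evaluation process. For a variety of reasons, however, a particular FPGA chip may need to evaluate the data all over again. This FPGA chip asserts the EVAL_REQ# signal and the CTRL_FPGA unit 701 starts the evaluation cycle all over again.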
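FIG. 23 shows a more detailed illustration of the CTRL_FPGA unit 701 and buffer 707 of FIG. 22. The same input/output signals and corresponding reference numbers shown for CTRL_FPGA unit 701 in FIG. 22 are retained and used in FIG. 23. However, additional signals and wire/bus lines not shown in FIG. 22 are described with new reference numbers, such as SEM_FPGA output enable 1016, local interrupt output (Local INTO) 708a, local read/write control signals 708b, local address bus 708c, local interrupt input (Local INTI#) 708d, and local data bus 708e.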
CTRL_FPGA unit 701 contains Transfer Done Checking Logic (XSFR_DONE Logic) 1000, Evaluation Control Logic (EVAL Logic) 1001, DMA Descriptor Block 1002, Control Register 1003, Evaluation Timer Logic (EVAL timer) 1004, Address Decoder 1005, Write Flag Sequencer Logic 1006, FPGA Chip Read/Write Control Logic (SEM_FPGA R/W Logic) 1007, Demultiplexer and Latch (DEMUX logic) 1008, and latches 1009-1012, which correspond to buffer 707 of FIG. 22. A global clock signal (CTRL_FPGA_CLK) on wire/bus 721 is provided to all logic elements/blocks in CTRL_FPGA unit 701.
The Transfer Done Checking Logic (XSFR_DONE) 1000 receives LAST_SHIFT_H 733, LAST_SHIFT_L 732, and local INTO 708a. XSFR_DONE logic 1000 outputs a transfer done signal (XSFR_DONE) on wire/bus 1013 to EVAL Logic 1001. Based on the reception of LAST_SHIFT_H 733 and LAST_SHIFT_L 732, the XSFR_DONE logic 1000 checks for the completion of the data transfer so that the evaluation cycle can begin, if desired.
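The EVAL Logic 1001 receives the transfer done signal (XSFR_DONE) on wire/bus 1013, together with the EVAL_REQ# signal on wire/bus 739 and the WR_XSFR/RD_XSFR signal on wire/bus 1015. EVAL Logic 1001 generates two output signals: Start EVAL on wire/bus 1014 and DATA_XSFR on wire/bus 736. The EVAL logic indicates when a data transfer between the FPGA bus and the PCI bus begins, in order to initialize the address pointers. It receives the XSFR_DONE signal when the data transfer is complete. The WR_XSFR/RD_XSFR signal indicates whether the transfer is a read or a write. Once the I/O cycle is complete (or before the onset of an I/O cycle), the EVAL logic can start the evaluation cycle with the Start EVAL signal to the EVAL timer. The EVAL timer dictates the duration of the evaluation cycle and ensures the successful operation of the software clock mechanism by keeping the evaluation cycle active for as long as necessary to stabilize the data propagation to all the registers and combinational components.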
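DMA descriptor block 1002 receives the local bus address on wire/bus 1019, a write enable signal on wire/bus 1020 from address decoder 1005, and local bus data on wire/bus 1029 via local data bus 708e. Its output is the DMA descriptor output on wire/bus 1046, which goes to DEMUX logic 1008 on wire/bus 1045. The DMA descriptor block 1002 contains descriptor block information corresponding to that in host memory, including the PCI address, local address, transfer count, transfer direction, and address of the next descriptor block. The host also sets up the address of the initial descriptor block in the descriptor pointer register of the PCI controller. Transfers can be initiated by setting a control bit. The PCI controller loads the first descriptor block and initiates the data transfer. The PCI controller continues to load descriptor blocks and transfer data until it detects that the end of chain bit is set in the next descriptor pointer register.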
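The chained-descriptor behavior described above can be sketched as a walk over a linked list held in host memory. This is a simplified illustration only: the `Descriptor` fields mirror the items named in the text, but their names, types, and the dictionary used to stand in for host memory are assumptions, and the sketch does not reproduce the register layout of any actual PCI controller.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Descriptor:
    pci_address: int
    local_address: int
    transfer_count: int           # amount to move (unit assumed to be bytes)
    to_local: bool                # transfer direction: True = PCI bus to local bus
    next_address: Optional[int]   # host-memory address of the next descriptor
    end_of_chain: bool            # set on the last descriptor of the chain


def run_chain(host_memory: Dict[int, Descriptor],
              first_address: int) -> List[Tuple[int, int, int, bool]]:
    """Load descriptors one by one and record each transfer until end of chain."""
    transfers = []
    address = first_address
    while True:
        d = host_memory[address]
        transfers.append((d.pci_address, d.local_address, d.transfer_count, d.to_local))
        if d.end_of_chain:
            break
        address = d.next_address
    return transfers


# Hypothetical three-descriptor chain set up by the host.
host_memory = {
    0x100: Descriptor(0x8000_0000, 0x0000, 64, True, 0x120, False),
    0x120: Descriptor(0x8000_0040, 0x0040, 32, True, 0x140, False),
    0x140: Descriptor(0x9000_0000, 0x0000, 16, False, None, True),
}
transfers = run_chain(host_memory, first_address=0x100)
total_bytes = sum(count for _, _, count, _ in transfers)
print(len(transfers), total_bytes)
```

The host only supplies the address of the first descriptor; every later transfer is found by following `next_address`, so a scattered set of buffers can be moved without further host intervention.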
Address decoder 1005 receives and transmits the local R/W control signals on bus 708b, and receives and transmits the local address signals on bus 708c. The address decoder 1005 generates a write enable signal on wire/bus 1020 to the DMA descriptor 1002, a write enable signal on wire/bus 1021 to control register 1003, the FPGA address SPACE index on wire/bus 738, a control signal on wire/bus 1027, and another control signal on wire/bus 1024 to DEMUX logic 1008.
Control register 1003 receives the write enable signal on wire/bus 1021 from address decoder 1005, and data from wire/bus 1030 via local data bus 708e. The control register 1003 generates the WR_XSFR/RD_XSFR signal on wire/bus 1015 to EVAL logic 1001, a Set EVAL time signal to EVAL timer 1004, and the SEM_FPGA output enable signal on wire/bus 1016 to the FPGA chips. The system uses the SEM_FPGA output enable signal to selectively turn on or enable each FPGA chip. Typically, the system enables the FPGA chips one at a time.
EVAL timer 1004 receives the Start EVAL signal on wire/bus 1014 and the Set EVAL time on wire/bus 1041. EVAL timer 1004 generates the ~EVAL signal on wire/bus 737, an evaluation done (EVAL_DONE) signal on wire/bus 1017, and a Start write flag signal on wire/bus 1018 to the Write Flag Sequencer logic 1006. In one embodiment, the EVAL timer is 6 bits long.
The Write Flag Sequencer logic 1006 receives the Start write flag signal on wire/bus 1018 from EVAL timer 1004. The Write Flag Sequencer logic 1006 generates a local R/W control signal on wire/bus 1022 to local R/W wire/bus 708b, a local address signal on wire/bus 1023 to local address bus 708c, a local data signal on wire/bus 1028 to local data bus 708e, and local INTI# on wire/bus 708d. Upon receiving the Start write flag signal, the write flag sequencer logic begins the sequence of control signals that starts the memory write cycles to the PCI bus.
The SEM_FPGA R/W Control logic 1007 receives a control signal on wire/bus 1027 from the address decoder 1005, and the local R/W control signal on wire/bus 1047 via local R/W control bus 708b. The SEM_FPGA R/W Control logic 1007 generates an enable signal on wire/bus 1035 to latch 1009, a control signal on wire/bus 1025 to the DEMUX logic 1008, an enable signal on wire/bus 1037 to latch 1011, an enable signal on wire/bus 1040 to latch 1012, the F_WR signal on wire/bus 734, and the F_RD signal on wire/bus 735. The SEM_FPGA R/W Control logic 1007 controls the various write and read data transfers to and from the FPGA low bank and high bank buses.
The DEMUX logic 1008 is a multiplexer and a latch that receives four sets of input signals and outputs one set of signals on wire/bus 1026 to the local data bus 708e. The selector signals are the control signal on wire/bus 1025 from SEM_FPGA R/W Control logic 1007 and the control signal on wire/bus 1024 from address decoder 1005. The DEMUX logic 1008 receives one set of inputs from the EVAL_DONE signal on wire/bus 1042, the XSFR_DONE signal on wire/bus 1043, and the ~EVAL signal on wire/bus 1044. This single set of signals is labeled with reference number 1048. In any one time period, only one of these three signals (EVAL_DONE, XSFR_DONE, and ~EVAL) is provided to DEMUX logic 1008 for possible selection. The DEMUX logic 1008 also receives, as the other three sets of input signals, the DMA descriptor output signal on wire/bus 1045 from the DMA descriptor block 1002, a data output on wire/bus 1039 from latch 1012, and another data output on wire/bus 1034 from latch 1010.
The data buffer between the CTRL_FPGA unit 701 and the low and high FPGA bank buses comprises latches 1009 to 1012. Latch 1009 receives local bus data on wire/bus 1032 via wire/bus 1031 and local data bus 708e, and an enable signal on wire/bus 1035 from SEM_FPGA R/W Control logic 1007. Latch 1009 outputs data on wire/bus 1033 to latch 1010.
Latch 1010 receives data on wire/bus 1033 from latch 1009, and an enable signal on wire/bus 1036 via wire/bus 1037 from SEM_FPGA R/W Control logic 1007. Latch 1010 outputs data on wire/bus 725 to the FPGA low bank bus and, via wire/bus 1034, to the DEMUX logic 1008.
Latch 1011 receives data on wire/bus 1031 from local data bus 708e, and an enable signal on wire/bus 1037 from SEM_FPGA R/W Control logic 1007. Latch 1011 outputs data on wire/bus 726 to the FPGA high bank bus and on wire/bus 1038 to latch 1012.
Latch 1012 receives data on wire/bus 1038 from latch 1011, and an enable signal on wire/bus 1040 from SEM_FPGA R/W Control logic 1007. Latch 1012 outputs data on wire/bus 1039 to DEMUX 1008.
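FIG. 24 shows the 4×4 FPGA array, its relationship to the FPGA banks, and the expansion capability. Like FIG. 8, FIG. 24 shows the same 4×4 array. The CTRL_FPGA unit 740 is also shown. Low bank chips (chips F41-F44 and F21-F24) and high bank chips (chips F31-F34 and F11-F14) are arranged in an alternating manner. Thus, the rows of FPGA chips from the bottom row to the top row are characterized as: low bank, high bank, low bank, high bank. The data transfer chain follows the banks in a predetermined order. The data transfer chain for the low bank is shown by arrow 741. The data transfer chain for the high bank is shown by arrow 742. The JTAG configuration chain is shown by arrow 743, which runs through the entire array of chips from F41 to F44, F34 to F31, F21 to F24, and F14 to F11, and back to the CTRL_FPGA unit 740.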
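Expansion can be accomplished with piggyback boards. Assuming that the original array of FPGA chips in FIG. 24 included F41-F44 and F31-F34, the addition of two more rows of chips, F21-F24 and F11-F14, can be accomplished with piggyback board 745. The piggyback board 745 also includes the appropriate buses to extend the banks. Further expansion can be accomplished with more piggyback boards placed one on top of another in the array.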
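FIG. 25 shows one embodiment of the hardware start-up method. Step 800 initiates the power-on or warm boot sequence. In step 801, the PCI controller reads the EEPROM for initialization. Step 802 reads and writes PCI controller registers in light of the initialization sequence. Step 803 runs a boundary scan test on all the FPGA chips in the array. Step 804 configures the CTRL_FPGA unit in the FPGA I/O controller. Step 805 reads and writes the registers in the CTRL_FPGA unit. Step 806 sets up the PCI controller for DMA master read/write modes. Thereafter, data is transferred and verified. Step 807 configures all the FPGA chips with a test design and verifies its correctness. At step 808, the hardware is ready for use. At this point, the system assumes that every step resulted in a positive confirmation of the operability of the hardware; otherwise, the system would never reach step 808.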
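The start-up flow of FIG. 25 is a strictly ordered sequence in which step 808 is reached only if every earlier step succeeds. The sketch below captures just that control flow. The `run_startup` function and the `step_passes` callback are hypothetical stand-ins introduced here; the actual checks performed at each step are hardware operations that this sketch does not model.

```python
# Step numbers and summaries follow the description of FIG. 25.
STEPS = [
    (800, "power-on or warm boot"),
    (801, "PCI controller reads the EEPROM"),
    (802, "read and write PCI controller registers"),
    (803, "boundary scan test on all FPGA chips"),
    (804, "configure the CTRL_FPGA unit"),
    (805, "read and write CTRL_FPGA unit registers"),
    (806, "set up PCI controller for DMA master read/write, transfer and verify data"),
    (807, "configure all FPGA chips with a test design and verify it"),
]
READY = 808  # hardware ready for use


def run_startup(step_passes):
    """Run the steps in order; stop at the first one that fails.

    step_passes -- callable taking a step number and returning True on success
                   (a hypothetical hook standing in for the real hardware check)
    Returns (last step reached, whether the hardware is ready).
    """
    for number, _description in STEPS:
        if not step_passes(number):
            return number, False
    return READY, True


print(run_startup(lambda step: True))          # every check succeeds
print(run_startup(lambda step: step != 803))   # boundary scan fails
```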
E. Alternate Embodiment Using Denser FPGA Chips
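In one embodiment of the present invention, the FPGA logic devices are provided on individual boards. If more FPGA logic devices are required to model the user's circuit design than are provided on one board, multiple boards with more FPGA logic devices can be provided. The ability to add more boards to the Simulation system is a desirable feature of the present invention. In this embodiment, denser FPGA chips, such as the Altera 10K130V and 10K250V, are used. Use of these chips alters the board design such that only four FPGA chips, instead of eight less dense FPGA chips (e.g., Altera 10K100), are used per board.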
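Coupling these boards to the motherboard of the Simulation system presents a challenge. The interconnection and connection scheme must compensate for the lack of a backplane. The FPGA array in the Simulation system is provided on the motherboard through a particular board interconnect structure. Each chip may have up to eight sets of interconnections, where the interconnections are arranged according to adjacent direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]), excluding the local bus connections, within a single board and across different boards. Each chip can be interconnected directly to adjacent neighbor chips, or in one hop to a non-adjacent chip located above, below, to the left, or to the right. In the X direction (east-west), the array is a torus. In the Y direction (north-south), the array is a mesh.
The interconnects alone can couple logic devices and other components within a single board. However, inter-board connectors are provided to couple these boards and interconnects together across different boards, carrying signals between (1) the PCI bus via the motherboard and the array boards, and (2) any two array boards. Each board contains its own FPGA bus FD[63:0], which allows the FPGA logic devices to communicate with each other, with the SRAM memory devices, and with the CTRL_FPGA unit (FPGA I/O controller). The FPGA bus FD[63:0] is not provided across multiple boards. The FPGA interconnects, however, provide connectivity among the FPGA logic devices across multiple boards, although these interconnects are not related to the FPGA bus. The local bus, on the other hand, is provided across all the boards.
A motherboard connector connects a board to the motherboard and hence to the PCI bus, power, and ground. For some boards, the motherboard connector is not used for direct connection to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are directly connected to the motherboard, while the remaining boards 2, 4, and 6 rely on their neighboring boards for motherboard connectivity. Thus, every other board is directly connected to the motherboard, and the interconnects and local buses of these boards are coupled together via inter-board connectors arranged solder-side to component-side. PCI signals are routed through only one of the boards (typically the first board). Power and ground are applied to the other motherboard connectors for those boards. Placed solder-side to component-side, the various inter-board connectors provide communication among the PCI bus components, the FPGA logic devices, the memory devices, and the various Simulation system control circuits.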
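FIG. 56 shows a high-level block diagram of the array of FPGA chips in accordance with one embodiment of the present invention. The CTRL_FPGA unit 1200 described above is coupled to bus 1210 via lines 1209 and 1236. In one embodiment, the CTRL_FPGA unit 1200 is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K50 chip. Bus 1210 is coupled to the other Simulation array boards (if any) and to other chips (e.g., PCI controller, EEPROM, clock buffer). FIG. 56 also shows the other major functional blocks in the form of logic devices and memory devices. In one embodiment, the logic device is a programmable logic device (PLD) in the form of an Altera 10K130V or 10K250V chip. The 10K130V and 10K250V are pin compatible, and each is a 599-pin PGA package. Thus, instead of the embodiment described above with eight Altera FLEX 10K100 chips in the array, this embodiment uses only four Altera FLEX 10K130 chips. One embodiment of the present invention describes the board containing these four logic devices and their interconnections.
Because the user design is modeled and configured in any number of these logic devices in the array, inter-FPGA logic device communication is necessary to connect one part of the user's circuit design to another part. In addition, initial configuration information and boundary scan tests are supported by the inter-FPGA interconnects. Finally, the necessary Simulation system control signals must be accessible between the Simulation system and the FPGA logic devices.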
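FIG. 36 shows the hardware architecture of the FPGA logic device used in the present invention. The FPGA logic device 1500 includes 102 top I/O pins, 102 bottom I/O pins, 111 left I/O pins, and 110 right I/O pins. Thus, the total number of interconnect pins is 425. In addition, 45 further I/O pins are used for GCLK, FPGA bus FD[31:0] (for the high bank, FD[63:32] is used), F_RD, F_WR, DATAXSFR, SHIFTIN, SHIFTOUT, SPACE[2:0], ~EVAL, EVAL_REQ_N, DEVICE_OE (a signal from the CTRL_FPGA unit to turn on the output pins of the FPGA logic devices), and DEV_CLRN (a signal from the CTRL_FPGA unit to clear all internal flip-flops before simulation starts). Thus, any data and control signals that cross between any two FPGA logic devices are carried by these interconnections. The remaining pins are used for power and ground.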
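FIG. 37 shows the FPGA interconnect pin-out for a single FPGA chip in accordance with one embodiment of the present invention. Each chip 1510 may have up to eight sets of interconnections, where each set comprises a particular number of pins. Some chips may have fewer than eight sets of interconnections, depending on their respective positions on the board. In the preferred embodiment, all chips have seven sets of interconnections, although the specific sets used may vary from chip to chip depending on their respective locations on the board. The interconnections for each FPGA chip are oriented horizontally (east-west) and vertically (north-south). The set of interconnections for the west direction is labeled W[73:0]. The set for the east direction is labeled E[73:0]. The set for the north direction is labeled N[73:0]. The set for the south direction is labeled S[73:0]. These complete sets of interconnections are for the adjacent chips; that is, these interconnections do not "hop" over any chip. For example, in FIG. 39, chip 1570 has interconnect 1540 for N[73:0], interconnect 1542 for W[73:0], interconnect 1543 for E[73:0], and interconnect 1545 for S[73:0]. FPGA chip 1570, which is the FPGA2 chip, has all four sets of adjacent interconnections: N[73:0], S[73:0], W[73:0], and E[73:0]. The west interconnections of FPGA0 are connected to the east interconnections of FPGA3 through wire 1539 by way of the torus-style interconnection. Thus, wire 1539 allows chips 1569 (FPGA0) and 1572 (FPGA3) to be directly coupled to each other, as if the west and east ends of the board were wrapped around to meet each other.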
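Referring to FIG. 37, four sets of "hopping" interconnections are provided. Two sets are for the non-adjacent interconnections running vertically: NH[27:0] and SH[27:0]. For example, the FPGA2 chip in FIG. 39 shows NH interconnect 1541 and SH interconnect 1546. Referring again to FIG. 37, the other two sets are for the non-adjacent interconnections running horizontally: XH[36:0] and XH[72:37]. For example, FPGA chip 1570 in FIG. 39 shows XH interconnect 1544.
Referring to FIG. 37, the vertical hopping interconnections NH[27:0] and SH[27:0] have 28 pins each. The horizontal hopping interconnections have 73 pins in total (XH[36:0] and XH[72:37]). The horizontal interconnection pins XH[36:0] and XH[72:37] can be used on the west side (e.g., interconnect 1605 in FIG. 39 for FPGA3 chip 1576) and/or on the east side (e.g., interconnect 1602 in FIG. 39 for FPGA0 chip 1573). This configuration allows each chip to be manufactured identically. Thus, each chip can be connected in one hop to a non-adjacent chip located above, below, to the left, or to the right.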
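As a consistency check, the widths of the eight interconnection sets named above can be summed and compared with the 425 interconnect pins stated for the FPGA logic device. The sketch below is illustrative arithmetic only; the dictionary keys `XH_low` and `XH_high` are hypothetical labels for XH[36:0] and XH[72:37].

```python
def width(high: int, low: int) -> int:
    """Number of pins in a bus range written as NAME[high:low]."""
    return high - low + 1


# Direct-neighbor sets: N, S, W, E are each [73:0].
ADJACENT = {name: (73, 0) for name in ("N", "S", "W", "E")}

# One-hop sets: NH[27:0], SH[27:0], XH[36:0], XH[72:37].
HOPPING = {
    "NH": (27, 0),
    "SH": (27, 0),
    "XH_low": (36, 0),    # hypothetical label for XH[36:0]
    "XH_high": (72, 37),  # hypothetical label for XH[72:37]
}


def total_interconnect_pins() -> int:
    """Sum the pin counts of all eight interconnection sets."""
    ranges = list(ADJACENT.values()) + list(HOPPING.values())
    return sum(width(high, low) for high, low in ranges)


if __name__ == "__main__":
    print(total_interconnect_pins())
```

Four 74-pin direct sets contribute 296 pins and the hopping sets contribute 28 + 28 + 73 = 129 pins, which together match the stated total of 425.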
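FIG. 39 shows the direct-neighbor and one-hop neighbor FPGA array layout of six boards on a single motherboard in accordance with one embodiment of the present invention. This figure is used to describe two possible configurations: a six-board system and a dual-board system. Position indicator 1550 shows that the "Y" direction is north-south and the "X" direction is east-west. In the X direction, the array is a torus. In the Y direction, the array is a mesh. In FIG. 39, only the boards, FPGA logic devices, interconnects, and high-level connectors are shown. The motherboard, other supporting components (e.g., SRAM memory devices), and wire lines (e.g., the FPGA bus) are not shown.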
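The torus-in-X, mesh-in-Y arrangement can be sketched with a small neighbor function. The coordinate convention is an assumption made for illustration: `x` is the chip position on a board (0 to 3, i.e. FPGA0 to FPGA3) and `y` is the board position in the stack (0 to 5 for six boards). The extra connections created by the R-pack terminations on the bookend boards are not modeled.

```python
Coord = tuple[int, int]


def neighbors(x: int, y: int, nx: int = 4, ny: int = 6) -> tuple[set[Coord], set[Coord]]:
    """Return (direct, one_hop) neighbor sets for the chip at (x, y).

    X wraps around (torus); Y stops at the array edges (mesh).
    """
    direct: set[Coord] = set()
    one_hop: set[Coord] = set()
    for distance, bucket in ((1, direct), (2, one_hop)):
        for dx in (-distance, distance):
            bucket.add(((x + dx) % nx, y))  # east-west with wrap-around
        for dy in (-distance, distance):
            if 0 <= y + dy < ny:            # north-south without wrap-around
                bucket.add((x, y + dy))
    one_hop -= direct
    one_hop.discard((x, y))
    return direct, one_hop


if __name__ == "__main__":
    direct, one_hop = neighbors(0, 0)
    print(sorted(direct), sorted(one_hop))
```

With four chips per board, FPGA0 is directly adjacent to FPGA1 and, through the wrap-around, to FPGA3, while FPGA2 is reached in one hop from either side.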
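FIG. 39 shows an array view of the boards and their components, interconnects, and connectors. The actual physical configuration and installation involves placing these boards on their respective edges, component-side to solder-side. Approximately half of the boards are directly connected to the motherboard, while the other half are connected to their respective neighboring boards.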
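In the six-board embodiment of the present invention, six boards, 1551 (board1), 1552 (board2), 1553 (board3), 1554 (board4), 1555 (board5), and 1556 (board6), are provided on a motherboard (not shown) that is part of the reconfigurable hardware of FIG. 1. Each board contains an almost identical set of components and connectors. Thus, for illustrative purposes, the sixth board 1556 contains FPGA logic devices 1565 to 1568 and connectors 1557 to 1560 and 1581; the fifth board 1555 contains FPGA logic devices 1569 to 1572 and connectors 1582 and 1583; and the fourth board 1554 contains FPGA logic devices 1573 to 1576 and connectors 1584 and 1585.
In this six-board configuration, board 1551 and board 1556 are provided as "bookend" boards that contain the Y-mesh terminations, such as R-pack terminations 1557 to 1560 on board6 (1556) and terminations 1591 to 1594 on board1 (1551). The intermediately placed boards (i.e., boards 1552 (board2), 1553 (board3), 1554 (board4), and 1555 (board5)) are provided to complete the array.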
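As described above, the interconnections are arranged according to adjacent direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]), excluding the local bus connections, within a single board and across different boards. The interconnects alone can couple logic devices and other components only within a single board. The inter-board connectors 1581 to 1590, however, enable communication among the FPGA logic devices across different boards (i.e., board1 to board6). The FPGA bus is part of the inter-board connectors 1581 to 1590. These connectors 1581 to 1590 are 600-pin connectors that carry 520 signals and 80 power/ground connections between two adjacent array boards.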
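In FIG. 39, the various boards are arranged in a non-symmetrical manner with respect to the inter-board connectors 1581 to 1590. For example, inter-board connectors 1589 and 1590 are provided between board 1551 and board 1552. Interconnect 1515 connects FPGA logic devices 1511 and 1577 together and, with respect to connectors 1589 and 1590, this connection is symmetrical. Interconnect 1603, however, is not symmetrical; it connects an FPGA in the third board 1553 to an FPGA logic device in board 1551. With respect to connectors 1589 and 1590, such an interconnect is non-symmetrical. Similarly, interconnect 1600 is non-symmetrical with respect to connectors 1589 and 1590 because it connects FPGA logic device 1511 to termination 1591, which in turn connects to FPGA logic device 1577 via interconnect 1601. Other similar interconnects exist that further show the non-symmetry.
As a result of this non-symmetry, the interconnects are routed through the inter-board connectors in two different ways: one for symmetric interconnects such as interconnect 1515, and another for non-symmetric interconnects such as interconnects 1603 and 1600. The interconnect routing scheme is shown in FIGS. 40(A) and 40(B).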
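In FIG. 39, an example of a direct-neighbor connection within a single board is interconnect 1543, which couples logic device 1570 to logic device 1571 along the east-west direction in board 1555. Another example of a direct-neighbor connection within a single board is interconnect 1607, which couples logic device 1573 to logic device 1576 in board 1554. An example of a direct-neighbor connection between two different boards is interconnect 1545, which couples logic device 1570 in board 1555 to logic device 1574 in board 1554 via connectors 1583 and 1584 along the north-south direction. Here, two inter-board connectors, 1583 and 1584, are used to carry the signals.
An example of a one-hop interconnect within a single board is interconnect 1544, which couples logic device 1570 to logic device 1572 in board 1555 along the east-west direction. An example of a one-hop interconnect between two different boards is interconnect 1599, which couples logic device 1565 in board 1556 to logic device 1573 in board 1554 via connectors 1581 to 1584. Here, four inter-board connectors, 1581 to 1584, are used to carry the signals across.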
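Some boards, especially those positioned at the north-south ends of the motherboard, contain 10-ohm R-packs to terminate some connections. Thus, the sixth board 1556 contains 10-ohm R-pack connectors 1557 to 1560, and the first board 1551 contains 10-ohm R-pack connectors 1591 to 1594. The sixth board 1556 contains R-pack connector 1557 for interconnects 1970 and 1971, R-pack connector 1558 for interconnects 1972 and 1541, R-pack connector 1559 for interconnects 1973 and 1974, and R-pack connector 1560 for interconnects 1975 and 1976. Furthermore, interconnects 1561 to 1564 are not connected to anything. Unlike the east-west torus-type interconnections, the north-south interconnections are arranged in a mesh-type fashion.
These mesh terminations increase the number of north-south direct interconnections. Otherwise, the interconnections at the north and south edges of the FPGA mesh would be wasted. For example, FPGA logic devices 1511 and 1577 already have one set of direct interconnections 1515. Additional interconnections are also provided for these two FPGA logic devices via R-pack 1591 and interconnects 1600 and 1601; that is, R-pack 1591 connects interconnects 1600 and 1601 together. This increases the number of direct connections between FPGA logic devices 1511 and 1577.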
Inter-board connections are also provided. Logic devices 1577, 1578, 1579, and 1580 on board 1551 are coupled to logic devices 1511, 1512, 1513, and 1514 on board 1552 via interconnects 1515, 1516, 1517, and 1518. Thus, interconnect 1515 couples logic device 1511 on board 1552 to logic device 1577 on board 1551 via connectors 1589 and 1590; interconnect 1516 couples logic device 1512 on board 1552 to logic device 1578 on board 1551 via connectors 1589 and 1590; interconnect 1517 couples logic device 1513 on board 1552 to logic device 1579 on board 1551 via connectors 1589 and 1590; and interconnect 1518 couples logic device 1514 on board 1552 to logic device 1580 on board 1551 via connectors 1589 and 1590.
Some interconnects, such as interconnects 1595, 1596, 1597, and 1598, are not coupled to anything because they are not used. However, as described above with respect to logic devices 1511 and 1577, R-pack 1591 connects interconnects 1600 and 1601 to increase the number of north-south interconnections.
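A dual-board embodiment of the present invention is shown in FIG. 44. In the dual-board embodiment, only two boards are necessary to model the user's design in the Simulation system. Like the six-board configuration of FIG. 39, the dual-board configuration of FIG. 44 uses the same two boards for the "bookends," board1 (1551) and board6 (1556), which are provided on a motherboard that is part of the reconfigurable hardware unit 20 of FIG. 1. In FIG. 44, one bookend board is board1 and the second bookend board is board6. Board6 is used in FIG. 44 to show its similarity to board6 in FIG. 39; that is, bookend boards such as board1 and board6 have the requisite terminations for the north-south mesh connections.
This dual-board configuration contains four FPGA logic devices, 1577 (FPGA0), 1578 (FPGA1), 1579 (FPGA2), and 1580 (FPGA3), on board1 (1551), and four FPGA logic devices, 1565 (FPGA0), 1566 (FPGA1), 1567 (FPGA2), and 1568 (FPGA3), on board6 (1556). These two boards are connected by inter-board connectors 1581 and 1590.
These boards contain 10-ohm R-packs to terminate some connections. In the dual-board embodiment, both boards are "bookend" boards. Board 1551 contains 10-ohm R-pack connectors 1591, 1592, 1593, and 1594 as resistive terminations. The second board 1556 also contains 10-ohm R-pack connectors 1557 to 1560.
For inter-board communication, board 1551 has connector 1590 and board 1556 has connector 1581. The interconnects that cross from one board to another, such as interconnects 1600, 1971, 1977, 1541, and 1540, go through these connectors 1590 and 1581; that is, the inter-board connectors 1590 and 1581 enable interconnects 1600, 1971, 1977, 1541, and 1540 to make the connection between a component on one board and a component on the other board. The inter-board connectors 1590 and 1581 carry control data and control signals on the FPGA buses.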
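In a four-board configuration, board1 and board6 serve as the bookend boards, while board2 (1552) and board3 (1553) (see FIG. 39) are the intermediate boards. When coupled to the motherboard in accordance with the present invention (as described with reference to FIGS. 38A and 38B), board1 and board2 are paired, and board3 and board6 are paired.
In a six-board configuration, as described above, board1 and board6 serve as the bookend boards, while board2 (1552), board3 (1553), board4 (1554), and board5 (1555) (see FIG. 39) are the intermediate boards. When coupled to the motherboard in accordance with the present invention (as described with reference to FIGS. 38A and 38B), board1 and board2 are paired, board3 and board4 are paired, and board5 and board6 are paired.
More boards can be provided as necessary. However, regardless of the number of boards added to the system, the bookend boards (such as board1 and board6 of FIG. 39) have the requisite terminations that complete the mesh array connections. In one embodiment, the minimum configuration is the dual-board configuration of FIG. 44. More boards can be added in two-board increments. If the initial configuration has board1 and board6, changing to a four-board configuration involves, as mentioned above, moving board6 further out, pairing board1 and board2 together, and then pairing board3 and board6 together.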
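The pairing rule for the dual-, four-, and six-board configurations can be summarized in a short sketch. The function name `board_pairs` is hypothetical, and the restriction to at most six boards is an assumption made here because the text does not say how boards beyond board6 would be labeled.

```python
def board_pairs(n_boards: int) -> list[tuple[int, int]]:
    """Return the board pairs for a 2-, 4-, or 6-board configuration.

    Board1 and board6 are always the bookends; intermediate boards are
    numbered 2 upward, and boards are added in two-board increments.
    """
    if n_boards not in (2, 4, 6):
        raise ValueError("this sketch only covers 2, 4, or 6 boards")
    # Boards in stack order: 1, 2, ..., n-1, then bookend board6 at the far end.
    stack = list(range(1, n_boards)) + [6]
    return [(stack[i], stack[i + 1]) for i in range(0, n_boards, 2)]


if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, board_pairs(n))
```

Growing from two to four boards leaves board1 in place and moves board6 to the end of the stack, which is why board6 always appears in the last pair.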
As described above, each logic device is coupled to its adjacent neighbor logic devices and, within one hop, to its non-adjacent neighbor logic devices. Thus, in FIGS. 39 and 44, logic device 1577 is coupled to adjacent neighbor logic device 1578 via interconnect 1547. Logic device 1577 is also coupled to non-adjacent logic device 1579 via one-hop interconnect 1548. Logic device 1580, however, is considered adjacent to logic device 1577 because of the wrap-around torus configuration, with interconnect 1549 providing the coupling.
The modeled first clock register 510 includes a first buffer 511 and a second buffer 512, both of which are D registers. This first clock is modeled in software, but the double-buffer implementation is modeled in both software and hardware. Clock edge detection occurs in the first clock register 510 in software and triggers the hardware model to generate the software clock signal for the hardware model. Data and address enter the first buffer 511 on wire lines 519 and 520, respectively. The Q output of the first buffer 511 on wire line 521 is coupled to the D input of the second buffer 512. The Q output of the first buffer 511 is also provided on wire line 522 to the gated clock logic 514 to ultimately drive the clock input of the first buffer 516 of the clock edge register 515. The Q output of the second buffer 512 on wire line 523 is provided to the gated data logic 513 to ultimately drive the input of register 518, via wire line 530, in the user's custom-designed circuit model. The enable input to the second buffer 512 in the first clock register 510 is the INPUT-EN signal on wire line 533 from the state machine, which determines the evaluation cycles and controls the various signals accordingly.
Clock edge register 515 also includes a first buffer 516 and a second buffer 517. Clock edge register 515 is implemented in hardware. When clock edge detection occurs in software (via the input to the first clock register 510), it can trigger the same clock edge detection in hardware (via clock edge register 515). The D input to the first buffer 516 on wire line 524 is set to logic "1". The clock signal on wire line 525 is derived from the gated clock logic 514 and ultimately from the first clock register 510 at the output of the first buffer 511 on wire line 522. This clock signal on wire line 525 is the gated clock signal. The enable on wire line 526 for the first buffer 516 is the ~EVAL signal from the state machine that controls the I/O and evaluation cycles (discussed below). The first buffer 516 also has a RESET signal on wire line 527. The same RESET signal is provided to the second buffer 517 in clock edge register 515. The Q output of the first buffer 516 on wire line 529 is provided to the D input of the second buffer 517. The second buffer 517 also has an enable input for the CLK-EN signal on wire line 528 and a RESET input on wire line 527. The Q output of the second buffer 517 on wire line 532 is provided to the enable input of register 518 in the user's custom-designed circuit model. Buffers 511, 512, and 517, along with register 518, are clocked by the system clock. Only buffer 516 in clock edge register 515 is clocked by the gated clock from the gated clock logic 514.
Register 518 is a conventional D-type register model that is modeled in hardware and is part of the user's custom circuit design. Its evaluation is strictly controlled by this embodiment of the clock implementation scheme of the present invention. The ultimate goal of this clock set-up is to ensure that the clock enable signal on wire line 532 arrives at register 518 before the data signal on wire line 530, so that evaluation of the data signal by the register is synchronized by the system clock and free of race conditions.
To reiterate, the modeled first clock register 510 is modeled in software, but its double-buffer implementation is modeled in both software and hardware. Clock edge register 515 is implemented in hardware. Based on fan-in and fan-out analysis, the gated data logic 513 and gated clock logic 514 are separated for modeling purposes and can be modeled in software (if the number of gated data and gated clock signals is small) or in hardware (if the number is large). Determining the gated clock network and the gated data network is important to the successful implementation of the software clock and to the logic evaluation in the hardware model during hardware acceleration mode.
The software clock implementation relies primarily on the clock set-up shown in FIG. 19, along with the timing of the assertion of the ~EVAL, INPUT-EN, CLK-EN, and RESET signals. The first clock register 510 detects clock edges to trigger software clock generation for the hardware model. This clock edge detection event triggers the "activation" of clock edge register 515 via wire line 522, the gated clock logic 514, and the clock input on wire line 525, so that clock edge register 515 also detects the same clock edge. In this way, clock detection that occurs in software (via inputs 519 and 520 to the first clock register 510) can be translated into clock edge detection in hardware (via input 525 to clock edge register 515). At this point, the INPUT-EN wire line 533 to the second buffer 512 in the first clock register 510 and the CLK-EN wire line 528 to the second buffer 517 in clock edge register 515 have not been asserted, so no data evaluation takes place. Thus, the clock edge is detected before the data is evaluated in the hardware register model. Note that at this stage, the data from the data bus on wire line 519 has not yet propagated out through the gated data logic 513 and into the hardware-modeled user register 518. Indeed, the data has not even reached the second buffer 512 in the first clock register 510, because the INPUT-EN signal on wire line 533 has not yet been asserted.
During the I/O phase, the ~EVAL signal on wire line 526 is asserted to enable the first buffer 516 in clock edge register 515. The ~EVAL signal also goes to the gated clock logic 514 to monitor the gated clock signal as it makes its way through the gated clock logic to the clock input on wire line 525 of the first buffer 516. Thus, as will be described below with respect to the four-state evaluation state machine, the ~EVAL signal can be maintained for as long as is required to stabilize the data and clock signals through the portion of the system shown in FIG. 19.
When the signals have stabilized, I/O has concluded, or the system is otherwise ready for data evaluation, ~EVAL is deasserted to disable the first buffer 516. The CLK-EN signal is asserted and applied to the second buffer 517 via wire line 528 to enable the second buffer 517 and pass the logic "1" value on wire line 529 to its Q output on wire line 532, which drives the enable input of register 518. Register 518 is now enabled, and any data present on wire line 530 is synchronously clocked into register 518 by the system clock. As the reader can observe, the enable signal reaches register 518 before the data signal to register 518 is evaluated.
The INPUT-EN signal on wire line 533 is now asserted to the second buffer 512. The RESET edge register signal on wire line 527 is also asserted to buffers 516 and 517 in clock edge register 515 to reset these buffers and force their outputs to logic "0". Now that the INPUT-EN signal has been asserted for buffer 512, the data on wire line 521 propagates through the gated data logic 513 to the user's circuit register 518 on wire line 530. Because the enable input to register 518 is now logic "0", the data on wire line 530 cannot be clocked into register 518. The previous data, however, was already clocked in by the enable signal asserted earlier on wire line 532, before the RESET signal was asserted to disable register 518. Thus, the input data to register 518, as well as the inputs to the other registers that are part of the user's hardware-modeled circuit design, stabilize at their respective register input ports. When a clock edge is subsequently detected in software, the first clock register 510 and the clock edge register 515 in hardware activate the enable input to register 518, so that the data waiting at the input of register 518 and the data waiting at the inputs of the other registers are clocked in together, synchronously, by the system clock.
As described above, the software clock implementation relies primarily on the clock set-up shown in FIG. 19, along with the timing of the assertion of the ~EVAL, INPUT-EN, CLK-EN, and RESET signals. FIG. 20 illustrates a four-state finite state machine for controlling the software clock logic of FIG. 19 in accordance with an embodiment of the present invention.
In state 540, the system is idle or some I/O operation is in progress. The ~EVAL signal is logic "0". The ~EVAL signal determines the evaluation cycle; it is generated by the system controller and lasts as many clock cycles as are required to stabilize the logic in the system. Typically, the duration of the ~EVAL signal is determined by the placement scheme during compilation and is based on the length of the longest direct wire and the length of the longest segmented multiplexed wire (i.e., TDM circuit). During evaluation, the ~EVAL signal is logic "1".
In state 541, the clock is enabled. The CLK-EN signal is asserted at logic "1", and thus the enable signal to the hardware register model is asserted. Here, previously gated data at the hardware register model is evaluated synchronously without the risk of hold-time violations.
In state 542, new data is enabled when the INPUT-EN signal is asserted at logic "1". The RESET signal is also asserted to remove the enable signal from the hardware register model. The new data that has been enabled through the gated data logic network either continues to propagate toward its intended hardware register model destination, or has reached its destination and is waiting to be clocked into the hardware register model if and when the enable signal is asserted again.
In state 543, the propagating new data stabilizes in the logic while the ~EVAL signal is held at logic "1". The multiplexed wires, described above for the time-division multiplexed (TDM) circuits in connection with FIGS. 9(A), 9(B), and 9(C), are also at logic "1". When the ~EVAL signal is deasserted, or set to logic "0", the system returns to idle state 540 and waits to evaluate upon detection of a clock edge by software.
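The four states of FIG. 20 can be sketched as a small state machine. This is only an illustration under stated assumptions: the state names, the `next_state` function, and its `clock_edge` and `eval_deasserted` arguments are hypothetical. The table lists only the signals that the text explicitly ties to each state and leaves the others unspecified, and the unconditional 541 to 542 to 543 progression is assumed.

```python
from enum import Enum


class State(Enum):
    IDLE = 540          # idle or I/O in progress
    CLOCK_ENABLE = 541  # CLK-EN asserted, previously gated data clocked in
    INPUT_ENABLE = 542  # INPUT-EN and RESET asserted, new data released
    EVALUATE = 543      # new data settles while ~EVAL is held at "1"


# Signals the text names as asserted in each state (others are unspecified).
ASSERTED = {
    State.IDLE: set(),
    State.CLOCK_ENABLE: {"CLK-EN"},
    State.INPUT_ENABLE: {"INPUT-EN", "RESET"},
    State.EVALUATE: {"~EVAL"},
}


def next_state(state: State, clock_edge: bool = False,
               eval_deasserted: bool = False) -> State:
    """Advance the machine by one step.

    IDLE waits for a software-detected clock edge; EVALUATE waits for
    ~EVAL to be deasserted. The two middle transitions are assumed to be
    unconditional.
    """
    if state is State.IDLE:
        return State.CLOCK_ENABLE if clock_edge else State.IDLE
    if state is State.CLOCK_ENABLE:
        return State.INPUT_ENABLE
    if state is State.INPUT_ENABLE:
        return State.EVALUATE
    return State.IDLE if eval_deasserted else State.EVALUATE


if __name__ == "__main__":
    s = State.IDLE
    for edge, done in [(True, False), (False, False), (False, False), (False, True)]:
        s = next_state(s, clock_edge=edge, eval_deasserted=done)
        print(s.name, sorted(ASSERTED[s]))
```

The ordering matters: the enable (state 541) always precedes the release of new data (state 542), which is how the scheme avoids hold-time violations.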
D. FPGA Arrays and Controls
The SEmulator system initially compiles the user circuit design data into software and hardware models based on various controls including component types. During the hardware compilation process, as described above in FIG. 6, the system performs a mapping, placement, and routing process for the optimal partitioning, placement, and interconnection of the various components that make up the user circuit design. Using known programming tools, bitstream configuration files or programmer object files (.pof) (or, optionally, raw binary files (.rbf)) are referenced to reconfigure a hardware board containing many FPGA chips. Each chip contains part of a hardware model that corresponds to a user circuit design.
In one embodiment, the SEmulator system uses a 4×4 array of FPGA chips, for a total of 16 chips. Exemplary FPGA chips include the Xilinx XC4000 series family of FPGA logic devices and the Altera FLEX 10K devices.
The Xilinx XC4000 series of FPGAs can be used, including the XC4000, XC4000A, XC4000D, XC4000H, XC4000E, XC4000EX, XC4000L, and XC4000XL. Particular FPGAs include the Xilinx XC4005H, XC4025, and XC4028EX. The Xilinx XC4028EX FPGA engines approach 500,000 gates of capacity on a single PCI board. Details of these Xilinx FPGAs can be found in the Xilinx data book, The Programmable Logic Data Book (9/96), which is incorporated herein by reference. Details of the Altera FPGAs can be found in the Altera data book, 1996 Data Book (June 1996), which is incorporated herein by reference.
A brief general description of the XC4025 FPGA follows. Each array chip is a 240-pin Xilinx chip. An array board populated with Xilinx XC4025 chips contains approximately 440,000 configurable gates and is capable of performing computationally intensive tasks. The Xilinx XC4025 FPGA consists of 1024 configurable logic blocks (CLBs). Each CLB can implement 32 bits of asynchronous SRAM, or a small amount of general Boolean logic, and two strobed registers. On the periphery of the chip, unstrobed I/O registers are provided. The XC4005H can be used instead of the XC4025. This yields a relatively low-cost version of the array board with 120,000 configurable gates. The XC4005H devices have high-power 24 mA drive circuits but lack the input/output flip-flops of the standard XC4000 series. Details of these and other Xilinx FPGAs can be obtained from their publicly available data sheets, which are incorporated herein by reference.
The functionality of the Xilinx XC4000 series FPGAs can be customized by loading configuration data into internal memory cells. The values stored in these memory cells determine the logic functions and interconnections in the FPGA. The configuration data of these FPGAs can be stored in on-chip memory and can be loaded from external memory. The FPGAs can either read their configuration data from an external serial or parallel PROM, or the configuration data can be written into the FPGAs from an external device. These FPGAs can be reprogrammed an unlimited number of times, which is especially useful where the hardware changes dynamically or where users want the hardware adapted to different applications.
In general, the XC4000 series FPGAs have up to 1024 CLBs. Each CLB has two levels of look-up tables: two four-input look-up tables (function generators F and G) provide some of the inputs to a third, three-input look-up table (function generator H). Each CLB also has two flip-flops or latches. The outputs of these look-up tables can be driven independently of the flip-flops or latches. The CLB can implement the following combinations of arbitrary Boolean functions: (1) any function of four or five variables; (2) any function of four variables, any second function of up to four unrelated variables, and any third function of up to three unrelated variables; (3) one function of four variables and another function of six variables; (4) any two functions of four variables; and (5) some functions of nine variables. Two D-type flip-flops or latches are available for registering CLB inputs or for storing look-up table outputs. These flip-flops can be used independently of the look-up tables. DIN can be used as a direct input to either of these two flip-flops or latches, and H1 can drive the other through the H function generator.
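One way to see why two four-input look-up tables feeding a three-input look-up table suffice for any function of five variables is a Shannon decomposition on the fifth variable. The sketch below is an illustrative model, not taken from the patent or a vendor data sheet: it assumes F and G share the same four inputs and that H is programmed as a 2:1 multiplexer selected by the fifth input. The name `make_clb` is hypothetical.

```python
from itertools import product
from typing import Callable

Bit = int


def make_clb(f5: Callable[[Bit, Bit, Bit, Bit, Bit], Bit]) -> Callable[..., Bit]:
    """Build F, G, and H truth tables that together realize f5(a, b, c, d, e)."""
    # F holds the cofactor with e = 0, G the cofactor with e = 1.
    f_table = {bits: f5(*bits, 0) for bits in product((0, 1), repeat=4)}
    g_table = {bits: f5(*bits, 1) for bits in product((0, 1), repeat=4)}
    # H is a three-input table acting as a multiplexer: pick G when e is 1.
    h_table = {(f, g, e): (g if e else f) for f, g, e in product((0, 1), repeat=3)}

    def clb(a: Bit, b: Bit, c: Bit, d: Bit, e: Bit) -> Bit:
        return h_table[(f_table[(a, b, c, d)], g_table[(a, b, c, d)], e)]

    return clb


if __name__ == "__main__":
    parity = lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e
    clb = make_clb(parity)
    print(all(clb(*bits) == parity(*bits) for bits in product((0, 1), repeat=5)))
```

The same table structure, with H programmed differently, covers the other listed combinations, such as two independent four-variable functions taken directly from F and G.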
Each four-input function generator in the CLB (i.e., F and G) contains dedicated arithmetic logic for the fast generation of carry and borrow signals, and can be configured to implement a two-bit adder with carry-in and carry-out. These function generators can also be implemented as read/write random access memory (RAM). The four input wire lines are then used as address lines for the RAM.
The Altera FLEX 10K chips are somewhat similar in concept. These chips are SRAM-based programmable logic devices (PLDs) with multiple 32-bit buses. Specifically, each FLEX 10K100 chip contains approximately 100,000 gates, 12 embedded array blocks (EABs), 624 logic array blocks (LABs), 8 logic elements (LEs) per LAB (or 4,992 LEs), 5,392 flip-flops or registers, 406 I/O pins, and a total of 503 pins.
The Altera FLEX 10K chips contain an embedded array of embedded array blocks (EABs) and a logic array of logic array blocks (LABs). An EAB can be used to implement various memory functions (e.g., RAM, ROM, FIFO) and complex logic functions (e.g., digital signal processors (DSPs), microcontrollers, multipliers, data transformation functions, state machines). When implementing a memory function, the EAB provides 2,048 bits. When implementing a logic function, the EAB provides 100 to 600 gates.
Through its LEs, a LAB can be used to implement medium-sized blocks of logic. Each LAB represents approximately 96 logic gates and contains 8 LEs and a local interconnect. An LE contains a four-input look-up table, a programmable flip-flop, and dedicated signal paths for carry and cascade functions. Typical logic functions that can be created include counters, address decoders, and small state machines.
A more detailed description of the Altera FLEX10K chip can be found in Altera, 1996 databook (June 1996), and is incorporated herein by reference. The databook also includes details of supported programming software.
FIG. 8 illustrates one embodiment of a 4×4 FPGA array and its interconnections.
Note that this embodiment of the SEmulator does not use crossbar or partial crossbar connections for the FPGA chips. The FPGA chips include chips F11 through F14 in the first row, chips F21 through F24 in the second row, chips F31 through F34 in the third row, and chips F41 through F44 in the fourth row. In one embodiment, each FPGA chip (e.g., chip F23) has the following pins to interface with the FPGA I/O controller of the SEmulator system:
Interface                Pins
Data bus                 32
SPACE index              3
READ, WRITE, EVAL        3
DATA XSFR                1
Address pointer chain    2
Total                    41
Thus, in one embodiment, each FPGA chip uses only 41 pins to interface with the SEmulator system. These pins are further described in FIG. 22.
These FPGA chips are interconnected with each other through interconnections that are neither crossbars nor partial crossbars. Each interconnection between chips, such as interconnect 602 between chip F11 and chip F14, represents 44 pins or 44 wire lines. In another embodiment, each interconnection represents more than 44 pins. In yet another embodiment, each interconnection represents fewer than 44 pins.
Each chip has six interconnections. For example, chip F11 has interconnects 600 to 605, and chip F33 has interconnects 606 to 611. These interconnections run horizontally along a row or vertically along a column. Each interconnection provides a direct connection between two chips along a row or between two chips along a column. Thus, for example, interconnect 600 directly connects chip F11 and chip F13; interconnect 601 directly connects chip F11 and chip F12; interconnect 602 directly connects chip F11 and chip F14; interconnect 603 directly connects chip F11 and chip F31; interconnect 604 directly connects chip F11 and chip F21; and interconnect 605 directly connects chip F11 and chip F41.
Similarly, for a chip such as F33 that is not located on the edge of the array (unlike, e.g., chip F11), interconnect 606 directly connects chip F33 and chip F13; interconnect 607 directly connects chip F33 and chip F23; interconnect 608 directly connects chip F33 and chip F34; interconnect 609 directly connects chip F33 and chip F43; interconnect 610 directly connects chip F33 and chip F31; and interconnect 611 directly connects chip F33 and chip F32.
Because chip F11 is located within one hop of chip F13, interconnect 600 is labeled "1". Because chip F11 is located within one hop of chip F12, interconnect 601 is labeled "1". Similarly, because chip F11 is located within one hop of chip F14, interconnect 602 is labeled "1". Likewise, all the interconnections for chip F33 are labeled "1".
This interconnect scheme allows each chip to communicate with any other chip in the array within two "jumps," or interconnections. Thus, chip F11 is connected to chip F33 through either of the following two paths: (1) interconnect 600 to interconnect 606; or (2) interconnect 603 to interconnect 610. In short, the path can run either (1) along a row first and then along a column, or (2) along a column first and then along a row.
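The two-jump property follows from every chip being wired directly to all other chips in its own row and column. A minimal sketch of the path enumeration is shown below; the function names are hypothetical, and chips are addressed as (row, column) pairs matching the F<row><column> labels.

```python
Chip = tuple[int, int]  # (row, column), both 1..4


def label(chip: Chip) -> str:
    """Format a chip coordinate as the F<row><column> label used in the text."""
    return f"F{chip[0]}{chip[1]}"


def paths(src: Chip, dst: Chip) -> list[list[Chip]]:
    """Shortest paths when every chip links to all chips in its row and column."""
    if src == dst:
        return []
    (r1, c1), (r2, c2) = src, dst
    if r1 == r2 or c1 == c2:
        return [[src, dst]]        # one jump: same row or same column
    return [
        [src, (r1, c2), dst],      # along the row first, then the column
        [src, (r2, c1), dst],      # along the column first, then the row
    ]


if __name__ == "__main__":
    for path in paths((1, 1), (3, 3)):
        print(" -> ".join(label(chip) for chip in path))
```

For F11 to F33 this yields the routes through F13 and through F31, corresponding to interconnects 600 then 606, and 603 then 610.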
Although FIG. 8 shows the FPGA chips configured in a 4×4 array with horizontal and vertical interconnections, the actual physical implementation on a board is through low and high banks with an expansion piggyback board. Thus, in one embodiment, chips F41-F44 and F21-F24 are in the low bank, and chips F31-F34 and F11-F14 are in the high bank. The piggyback board contains chips F11-F14 and chips F21-F24. Thus, to expand the array, piggyback boards containing a number of chips (e.g., eight) are added to the banks, and hence above the row that currently contains chips F11-F14. In another embodiment, the piggyback board expands the array below the row that currently contains chips F41-F44. Another embodiment allows expansion to the right of chips F14, F24, F34, and F44. Yet another embodiment allows expansion to the left of chips F11, F21, F31, and F41.
FIG. 7 shows the connectivity matrix for the 4×4 FPGA array of FIG. 8 in terms of "1" or "0". This connectivity matrix is used to generate a placement cost result from the cost function used in the hardware mapping, placement, and routing process of the SEmulation system. The cost function was described above with respect to FIG. 6. As an example, chip F11 is located within one hop of chip F13, so the connectivity matrix entry for F11-F13 is "1".
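A connectivity matrix of this kind can be generated directly from the row and column wiring rule. The sketch below is illustrative: the chip ordering (F11, F12, ..., F44) and the choice of "0" on the diagonal are assumptions, since the layout of FIG. 7 itself is not reproduced in this text.

```python
# Chips listed in row-major order: F11, F12, ..., F44 (assumed ordering).
CHIPS = [(row, col) for row in range(1, 5) for col in range(1, 5)]


def connectivity_matrix() -> list[list[int]]:
    """Entry is 1 when two distinct chips share a row or a column, else 0."""
    return [
        [1 if a != b and (a[0] == b[0] or a[1] == b[1]) else 0 for b in CHIPS]
        for a in CHIPS
    ]


def entry(matrix: list[list[int]], a: tuple[int, int], b: tuple[int, int]) -> int:
    """Look up the matrix entry for a pair of chips given as (row, column)."""
    return matrix[CHIPS.index(a)][CHIPS.index(b)]


if __name__ == "__main__":
    m = connectivity_matrix()
    for row in m:
        print("".join(str(bit) for bit in row))
```

Each row of the matrix sums to six, matching the six interconnections per chip, and a "0" entry such as F11-F33 marks a pair that needs two jumps.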
FIG. 21 illustrates the interconnect pin-out of a single FPGA chip in accordance with an embodiment of the present invention. Each chip has six sets of interconnections, each set containing a particular number of pins. In one embodiment, each set has 44 pins. The interconnections of each FPGA chip are oriented horizontally (east-west) and vertically (north-south). The set of interconnections for the west direction is labeled W[43:0]. The set for the east direction is labeled E[43:0]. The set for the north direction is labeled N[43:0]. The set for the south direction is labeled S[43:0]. These complete sets of interconnections are for connecting adjacent chips; that is, these interconnections do not "hop" over any chip. For example, in FIG. 8, chip F33 has interconnect 607 for N[43:0], interconnect 608 for E[43:0], interconnect 609 for S[43:0], and interconnect 611 for W[43:0].
Returning to FIG. 21, two additional sets of interconnections remain. One set is for the non-adjacent interconnections running vertically: YH[21:0] and YH[43:22]. The other set is for the non-adjacent interconnections running horizontally: XH[21:0] and XH[43:22]. Each set, YH[...] and XH[...], is divided into two halves of 22 pins each. This configuration allows each chip to be manufactured identically. Thus, each chip can be interconnected in one hop to non-adjacent chips located above, below, to the left, and to the right. FIG. 21 also shows the FPGA chip pins for global signals, the FPGA bus, and JTAG signals.
The FPGA I/O controller is described next. This controller was first briefly introduced as item 327 in FIG. The FPGA I/O controller manages data and control traffic between the PCI bus and the FPGA array.
FIG. 22 illustrates one embodiment of an FPGA I/O controller between a PCI bus and an FPGA array with banks of FPGA chips. The FPGA I/O controller 700 includes a CTRL_FPGA unit 701, a clock buffer 702, a PCI controller 703, an EEPROM 704, an FPGA serial configuration interface 705, a boundary scan test interface 706, and a buffer 707. Appropriate power/voltage regulation circuits known to those skilled in the art are also provided. Typical supplies include Vcc, which is coupled to a voltage detector/regulator and a sense amplifier to keep the voltage substantially constant under various environmental conditions. The Vcc line to each FPGA chip has a fast-acting thin film fuse in series. Vcc-HI is provided to CONFIG# of all FPGA chips and to LINTI# of LOCAL_BUS 708.
The CTRL_FPGA unit 701 is the primary controller of the FPGA I/O controller 700 and handles the various control, test, and read/write data among the various units and buses. CTRL_FPGA unit 701 is coupled to the low and high banks of FPGA chips. The FPGA chips F41-F44 and F21-F24 (i.e., the low bank) are connected to the low FPGA bus 718. The FPGA chips F31-F34 and F11-F14 (i.e., the high bank) are connected to the high FPGA bus 719. These FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 correspond to the FPGA chips of FIG. 8 and retain their labels.
The low bank bus 718 and high bank bus 719 serving these FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 carry thick film chip resistors for proper loading. The resistor group 713 connected to the low bank bus 718 includes, for example, a resistor 716 and a resistor 717. The resistor group 712 connected to the high bank bus 719 includes, for example, a resistor 714 and a resistor 715.
If expansion is desired, more FPGA chips may be installed on the low bank bus 718 and high bank bus 719 to the right of FPGA chips F11 and F21. In one embodiment, expansion is through a piggyback substrate similar to piggyback substrate 720. Therefore, if these banks of FPGA chips initially had only eight FPGA chips (F41-F44 and F31-F34), further expansion is possible by adding piggyback substrate 720, which contains the low bank FPGA chips (F24-F21) and the high bank chips (F14-F11). Piggyback substrate 720 also includes the additional low and high bank buses and thick film chip resistors.
PCI controller 703 is the main interface between FPGA I / O controller 700 and 32-bit PCI bus 709. If the PCI bus is extended to 64 bits and / or 66 MHz, appropriate adjustments can be made in the system without departing from the spirit and scope of the present invention. This adjustment will be explained below. One example of a PCI controller 703 that can be used in the system is PCI 9080 or 9060 from PLX Technology. The PCI 9080 has a suitable local bus interface, control registers, FIFOs, and a PCI interface to the PCI bus. Data Book PLX Technology, PCI 9080 Data Sheet (February 28, 1997 version 0.93) is incorporated herein by reference.
The PCI controller 703 passes data between the CTRL_FPGA unit 701 and the PCI bus 709 via the LOCAL_BUS 708. LOCAL_BUS 708 includes a control bus portion, an address bus portion, and a data bus portion for control signals, address signals, and data signals, respectively. If the PCI bus is extended to 64 bits, the data bus portion of LOCAL_BUS 708 can also be extended to 64 bits. PCI controller 703 is coupled to EEPROM 704, which contains configuration data for PCI controller 703. An exemplary EEPROM 704 is the 93CS46 from National Semiconductor.
The PCI bus 709 supplies a 33 MHz clock signal to the FPGA I / O controller 700. The clock signal is provided to the clock buffer 702 via wire line 710 for synchronization purposes and for low timing skew. The output of this clock buffer 702 is a 33 MHz global clock (GL_CLK) signal supplied to all FPGA chips via wire line 711 and to CTRL_FPGA unit 701 via wire line 721. If the PCI bus is extended to 66MHz, the clock buffer will also supply 66MHz to the system.
The FPGA serial configuration interface 705 provides configuration data to configure the FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44. The Altera Data Book, Altera, 1996 DATA BOOK (June 1996), provides detailed information on configuration devices and processes. FPGA serial configuration interface 705 is also coupled to LOCAL_BUS 708 and parallel port 721. In addition, the FPGA serial configuration interface 705 is coupled to the CTRL_FPGA unit 701 and the FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 via the CONF_INTF wire line 723.
The boundary scan test interface 706 provides a JTAG implementation of a specific set of test instructions so that software can externally check the logic units and circuits of a processor or system. This interface 706 complies with the IEEE Std. 1149.1-1990 specification. See the Altera Data Book, Altera, 1996 DATA BOOK (June 1996) and Application Note 39 (JTAG Boundary-Scan Testing in Altera Devices) for more information; both references are incorporated herein by reference. The boundary scan test interface 706 is coupled to the CTRL_FPGA unit 701 and the FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 via the BST_INTF wire line 724.
The CTRL_FPGA unit 701, together with the buffer 707, transfers data to and from the low bank (chips F41-F44 and F21-F24) and the high bank (chips F31-F34 and F11-F14) over the low bank 32-bit bus 718 and the high bank 32-bit bus 719, respectively, using F_BUS 725 for the low bank 32 bits FD[31:0] and F_BUS 726 for the high bank 32 bits FD[63:32].
One embodiment matches the throughput of the PCI bus 709 on the low bank bus 718 and high bank bus 719. PCI bus 709 is 32 bits wide at 33 MHz. Thus, its throughput is 132 MB/s (= 33 MHz * 4 bytes). The low bank bus 718 is 32 bits wide at half the PCI bus frequency (33/2 MHz = 16.5 MHz). The high bank bus 719 is also 32 bits wide at half the PCI bus frequency (33/2 MHz = 16.5 MHz). The throughput of the combined 64-bit low bank bus and high bank bus is therefore also 132 MB/s (= 16.5 MHz * 8 bytes). Thus, the performance of the low bank bus and high bank bus tracks the performance of the PCI bus. In other words, the performance limit lies in the PCI bus, not in the low bank bus and the high bank bus.
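The throughput arithmetic in this paragraph can be checked with a short sketch. The helper name `throughput_mb_per_s` is hypothetical; the clock rates and bus widths are the ones stated above.

```python
def throughput_mb_per_s(clock_mhz, width_bits):
    """Peak throughput in MB/s for one transfer per clock cycle."""
    return clock_mhz * width_bits / 8


# PCI bus: 32 bits wide at 33 MHz.
pci = throughput_mb_per_s(33, 32)

# Low bank and high bank buses together: 2 x 32 bits at half the PCI clock.
fpga_banks = throughput_mb_per_s(33 / 2, 32 + 32)

print(pci)         # 132.0
print(fpga_banks)  # 132.0
```

Halving the clock while doubling the aggregate width leaves the peak rate unchanged, which is why the FPGA bank buses keep pace with the PCI bus.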
According to one embodiment of the invention, an address pointer is also implemented in each FPGA chip for each software/hardware boundary address space. These address pointers are chained across several FPGA chips through a multiplexed cross-chip address pointer chain. See the address pointer discussion above with respect to FIGS. 9, 11, 12, 14 and 15. To move the word select signal along the chain of address pointers associated with a given address space across several chips, chain-out wire lines must be provided. These chain-out wire lines are shown as arrows between chips. One such chain-out wire line for the low bank is the wire line 730 between chip F23 and chip F22. Another such chain-out wire line, for the high bank, is the wire line 731 between chip F31 and chip F32. The chain-out wire line 732 at the end of the low bank, at chip F21, is coupled to the CTRL_FPGA unit 701 as LAST_SHIFT_L. The chain-out wire line 733 at the end of the high bank, at chip F11, is coupled to CTRL_FPGA unit 701 as LAST_SHIFT_H. These signals LAST_SHIFT_L and LAST_SHIFT_H are the word select signals of their respective banks once the word select signal has propagated through the FPGA chips. When either of these signals LAST_SHIFT_L and LAST_SHIFT_H presents a logic "1" to CTRL_FPGA unit 701, this indicates that the word select signal has reached the end of the corresponding bank of chips.
CTRL_FPGA unit 701 provides the write signal F_WR on wire line 734, the read signal F_RD on wire line 735, the DATA_XSFR signal on wire line 736, and the SPACE[2:0] signal on wire line 738 to the FPGA chips. CTRL_FPGA unit 701 receives the EVAL_REQ# signal on wire line 739. The write signal F_WR, read signal F_RD, DATA_XSFR signal, and SPACE[2:0] signal work together with the address pointers in the FPGA chips. Together, these signals are used to generate the MOVE signal for the address pointer associated with the address space selected by the SPACE index SPACE[2:0]. The DATA_XSFR signal is used to initialize the address pointer and start the word-by-word data transfer process.
The EVAL_REQ# signal is used to restart the evaluation cycle from the beginning if any of the FPGA chips asserts this signal. For example, to evaluate data, the data is transferred or written from the main memory of the host processor's computing station to the FPGAs via the PCI bus. At the end of the transfer, the evaluation cycle begins, including address pointer initialization and operation of the software clock to facilitate the evaluation process. However, for various reasons, a particular FPGA chip may need to evaluate the data all over again. That FPGA chip asserts the EVAL_REQ# signal, and the CTRL_FPGA unit 701 restarts the evaluation cycle from the beginning.
FIG. 23 shows a more detailed example of the CTRL_FPGA unit 701 and buffer 707 of FIG. 22. The input/output signals and corresponding reference numbers for the CTRL_FPGA unit 701 shown in FIG. 22 are retained in this figure. However, additional signals and wire/bus lines not shown in FIG. 22 are described using new reference numbers, such as SEM_FPGA output enable 1016, local interrupt output (Local INTO) 708a, local read/write control signal 708b, local address bus 708c, local interrupt input (Local INTI#) 708d, and local data bus 708e.
The CTRL_FPGA unit 701 includes transfer done checking logic (XSFR_DONE logic) 1000, evaluation control logic (EVAL logic) 1001, DMA descriptor block 1002, control register 1003, evaluation timer logic (EVAL timer) 1004, address decoder 1005, write flag sequencer logic 1006, FPGA chip read/write control logic (SEM_FPGA R/W logic) 1007, demultiplexer and latch (DEMUX logic) 1008, and latches 1009-1012, which correspond to the buffer 707 of FIG. 22. The global clock signal CTRL_FPGA_CLK on wire/bus 721 is provided to all logic elements/blocks of CTRL_FPGA unit 701.
Transfer done checking logic (XSFR_DONE) 1000 receives LAST_SHIFT_H 733, LAST_SHIFT_L 732 and local INTO 708a. The XSFR_DONE logic 1000 outputs the transmission end signal XSFR_DONE on the wire / bus 1013 to the EVAL logic 1001. Based on the reception of LAST_SHIFT_H 733 and LAST_SHIFT_L 732, the XSFR_DONE logic 1000 checks the end of the data transfer so that the evaluation cycle can begin if necessary.
EVAL logic 1001 receives the transfer done signal XSFR_DONE on wire/bus 1013, along with the EVAL_REQ# signal on wire/bus 739 and the WR_XSFR/RD_XSFR signal on wire/bus 1015. EVAL logic 1001 generates two output signals: Start EVAL on wire/bus 1014 and DATA_XSFR on wire/bus 736. The EVAL logic indicates when a data transfer between the FPGA bus and the PCI bus is to begin, so that the address pointers are initialized. The EVAL logic receives the XSFR_DONE signal when the data transfer ends. The WR_XSFR/RD_XSFR signal indicates whether the transfer is a read or a write. Once the I/O cycle ends (or before the onset of the I/O cycle), the EVAL logic can begin the evaluation cycle by sending the Start EVAL signal to the EVAL timer. The EVAL timer ensures the successful operation of the software clock mechanism by defining the duration of the evaluation cycle and keeping the evaluation cycle active for as long as necessary to stabilize data propagation through all registers and combinational components.
DMA descriptor block 1002 receives the local bus address on wire/bus 1019, the write enable signal on wire/bus 1020 from address decoder 1005, and local bus data on wire/bus 1029 via local data bus 708e. Its output is the DMA descriptor output on wire/bus 1046, which is directed to DEMUX logic 1008 on wire/bus 1045. The DMA descriptor block 1002 holds descriptor block information corresponding to the information in host memory, which includes a PCI address, a local address, a transfer count, a transfer direction, and the address of the next descriptor block. The host also sets up the address of the initial descriptor block in the descriptor pointer register of the PCI controller. A transfer is initiated by setting a control bit. The PCI controller loads the first descriptor block and initiates the data transfer. The PCI controller continues to load descriptor blocks and transfer data until it detects that the end-of-chain bit is set in the next descriptor pointer register.
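The chained descriptor scheme described above can be sketched as a linked list that is walked until an end-of-chain marker is found. This is a minimal illustration under stated assumptions, not the PCI controller's actual register layout: the `Descriptor` class, its field names, and `walk_chain` are hypothetical, and memory is modeled as a plain dictionary keyed by descriptor address.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Descriptor:
    """Hypothetical model of one DMA descriptor block."""
    pci_address: int
    local_address: int
    transfer_count: int            # number of bytes to move
    to_local: bool                 # transfer direction: True = PCI -> local bus
    next_address: Optional[int]    # address of the next descriptor block
    end_of_chain: bool             # set on the last descriptor in the chain


def walk_chain(memory: Dict[int, Descriptor], first_address: int) -> List[Descriptor]:
    """Load descriptors one after another until the end-of-chain bit is seen."""
    processed = []
    address = first_address
    while True:
        descriptor = memory[address]
        processed.append(descriptor)   # a real controller would move data here
        if descriptor.end_of_chain:
            break
        address = descriptor.next_address
    return processed


# Two chained descriptors; the addresses are made up for the example.
memory = {
    0x100: Descriptor(0x8000_0000, 0x0000, 256, True, 0x120, False),
    0x120: Descriptor(0x8000_0100, 0x0100, 128, True, None, True),
}
chain = walk_chain(memory, 0x100)
print(len(chain))                            # 2
print(sum(d.transfer_count for d in chain))  # 384
```

Chaining lets the host queue several scattered transfers with a single start command, since the controller fetches each following descriptor on its own.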
The address decoder 1005 receives and forwards local R/W control signals on bus 708b and receives and forwards local address signals on bus 708c. The address decoder 1005 generates a write enable signal on wire/bus 1020 for the DMA descriptor block 1002, a write enable signal on wire/bus 1021 for the control register 1003, the FPGA address SPACE index on wire/bus 738, a control signal on wire/bus 1027, and another control signal on wire/bus 1024 for DEMUX logic 1008.
The control register 1003 receives the write enable signal on wire/bus 1021 from the address decoder 1005 and data on wire/bus 1030 via the local data bus 708e. Control register 1003 generates the WR_XSFR/RD_XSFR signal on wire/bus 1015 for EVAL logic 1001, the Set EVAL time signal on wire/bus 1041 for EVAL timer 1004, and the SEM_FPGA output enable signal on wire/bus 1016 for the FPGA chips. The system uses the SEM_FPGA output enable signal to selectively turn on or enable each FPGA chip. Typically, the system enables one FPGA chip at a time.
EVAL timer 1004 receives the Start EVAL signal on wire/bus 1014 and the Set EVAL time signal on wire/bus 1041. EVAL timer 1004 generates the ~EVAL signal on wire/bus 737, the evaluation done (EVAL_DONE) signal on wire/bus 1017, and the Start write flag signal on wire/bus 1018 for Write Flag Sequencer logic 1006. In one embodiment, the EVAL timer is 6 bits long.
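A 6-bit timer can hold counts from 0 to 63, which bounds the programmable length of the evaluation cycle. The following is only a behavioral sketch under assumptions that the text does not state: the timer is modeled as a down-counter loaded from Set EVAL time, ~EVAL is treated as active low while counting, and EVAL_DONE is raised when the count expires. The function name `run_eval_timer` is hypothetical.

```python
TIMER_BITS = 6
MAX_COUNT = (1 << TIMER_BITS) - 1   # largest value a 6-bit timer can hold


def run_eval_timer(set_eval_time):
    """Return the per-clock signal values from Start EVAL until EVAL_DONE.

    Assumed behavior (not confirmed by the specification): the timer counts
    down from set_eval_time; ~EVAL is 0 while evaluating and 1 afterwards.
    """
    if not 0 <= set_eval_time <= MAX_COUNT:
        raise ValueError("Set EVAL time must fit in 6 bits")
    trace = []
    count = set_eval_time
    while count > 0:
        trace.append({"n_eval": 0, "eval_done": 0})  # evaluation in progress
        count -= 1
    trace.append({"n_eval": 1, "eval_done": 1})      # evaluation finished
    return trace


trace = run_eval_timer(5)
print(MAX_COUNT)               # 63
print(len(trace))              # 6: five evaluation clocks plus the done clock
print(trace[-1]["eval_done"])  # 1
```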
Write Flag Sequencer logic 1006 receives the Start write flag signal on wire/bus 1018 from EVAL timer 1004. Write Flag Sequencer logic 1006 generates a local R/W control signal on wire/bus 1022 for local R/W wire/bus 708b, a local address signal on wire/bus 1023 for local address bus 708c, a local data signal on wire/bus 1028 for local data bus 708e, and the local INTI# signal on wire/bus 708d. Upon receiving the Start write flag signal, the write flag sequencer logic begins the sequence of control signals that starts a memory write cycle to the PCI bus.
SEM_FPGA R/W control logic 1007 receives control signals on wire/bus 1027 from address decoder 1005 and the local R/W control signal on wire/bus 1047 via local R/W control bus 708b. SEM_FPGA R/W control logic 1007 generates an enable signal on wire/bus 1035 for latch 1009, a control signal on wire/bus 1025 for DEMUX logic 1008, an enable signal on wire/bus 1037 for latch 1011, an enable signal on wire/bus 1040 for latch 1012, the F_WR signal on wire/bus 734, and the F_RD signal on wire/bus 735. SEM_FPGA R/W control logic 1007 controls the various write and read data transfers to and from the FPGA low bank and high bank buses.
DEMUX logic 1008 is a multiplexer and latch that receives four sets of input signals and outputs one set of signals on wire/bus 1026 to local data bus 708e. The selector signals are the control signal on wire/bus 1025 from SEM_FPGA R/W control logic 1007 and the control signal on wire/bus 1024 from address decoder 1005. DEMUX logic 1008 receives one set of inputs made up of the EVAL_DONE signal on wire/bus 1042, the XSFR_DONE signal on wire/bus 1043, and the ~EVAL signal on wire/bus 1044. This single set of signals is labeled 1048. In any one time interval, only one of these three signals, EVAL_DONE, XSFR_DONE, and ~EVAL, is provided to DEMUX logic 1008 for possible selection. The DEMUX logic 1008 also receives the other three sets of input signals: the DMA descriptor output signal on wire/bus 1045 from the DMA descriptor block 1002, a data output on wire/bus 1039 from latch 1012, and another data output on wire/bus 1034 from latch 1010.
The data buffer between CTRL_FPGA unit 701 and the low and high FPGA bank buses includes latches 1009-1012. Latch 1009 receives local bus data on wire/bus 1032 via wire/bus 1031 and local data bus 708e, and an enable signal on wire/bus 1035 from SEM_FPGA R/W control logic 1007. Latch 1009 outputs data on wire/bus 1033 to latch 1010.
Latch 1010 receives data on wire/bus 1033 from latch 1009 and an enable signal on wire/bus 1036 via wire/bus 1037 from SEM_FPGA R/W control logic 1007. Latch 1010 outputs data on wire/bus 725 to the FPGA low bank bus and to DEMUX logic 1008 via wire/bus 1034.
Latch 1011 receives data on wire / bus 1031 from local data bus 708e and enable signal on wire / bus 1037 from SEM_FPGA R / W control logic 1007. Latch 1011 outputs data on wire / bus 726 to the FPGA high bank bus and outputs data on wire / bus 1038 to latch 1012.
Latch 1012 receives data on wire / bus 1038 from latch 1011 and enable signal on wire / bus 1040 from SEM_FPGA R / W control logic 1007. Latch 1012 outputs data on wire / bus 1039 to DEMUX 1008.
FIG. 24 shows a 4x4 FPGA array, the relationship of the FPGA array to the FPGA banks, and the expansion capability. Like FIG. 8, FIG. 24 shows the same 4x4 array. Also shown is CTRL_FPGA unit 740. The low bank chips (chips F41-F44 and F21-F24) and the high bank chips (chips F31-F34 and F11-F14) are arranged in alternating rows. Thus, from the bottom row to the top row, the rows of FPGA chips are: low bank, high bank, low bank, high bank. The data transfer chain follows the banks in a predetermined order. The data transfer chain for the low bank is shown by arrow 741. The data transfer chain for the high bank is shown by arrow 742. The JTAG configuration chain is shown by arrow 743, which runs through the entire array from chips F41 through F44, F34 through F31, F21 through F24, and F14 through F11, and returns to CTRL_FPGA unit 740.
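The bank assignment and the serpentine JTAG order described above can be written out as a short sketch. The function names `bank_of` and `jtag_chain` are hypothetical; the row-to-bank mapping and the chain order are the ones given in the paragraph.

```python
def bank_of(row):
    """Rows 4 and 2 belong to the low bank; rows 3 and 1 to the high bank."""
    return "low" if row % 2 == 0 else "high"


def jtag_chain():
    """Return chip labels in JTAG order: F41..F44, F34..F31, F21..F24, F14..F11."""
    chain = []
    left_to_right = True
    for row in (4, 3, 2, 1):               # bottom row first
        cols = range(1, 5) if left_to_right else range(4, 0, -1)
        chain.extend(f"F{row}{col}" for col in cols)
        left_to_right = not left_to_right  # reverse direction on each row
    return chain


chain = jtag_chain()
print(chain[:5])    # ['F41', 'F42', 'F43', 'F44', 'F34']
print(chain[-1])    # F11
print([bank_of(r) for r in (4, 3, 2, 1)])  # ['low', 'high', 'low', 'high']
```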
Expansion can be accomplished with a piggyback board. Assuming that the original array of FPGA chips in FIG. 24 includes F41-F44 and F31-F34, the eight further chips F21-F24 and F11-F14 can be added with the piggyback board 745. Piggyback board 745 includes the appropriate buses to extend the banks. Further expansion can be achieved with additional piggyback boards stacked on top of one another in the array.
FIG. 25 illustrates one embodiment of a hardware start-up method. Step 800 initiates the power-up or warm boot sequence. In step 801, the PCI controller reads the EEPROM for initialization. Step 802 reads and writes the PCI controller registers in light of the initialization sequence. Step 803 performs boundary scan tests on all FPGA chips in the array. Step 804 configures the CTRL_FPGA unit of the FPGA I/O controller. Step 805 reads and writes the registers of the CTRL_FPGA unit. Step 806 sets up the PCI controller for DMA master read/write mode. The data is then transferred and verified. Step 807 configures all FPGAs with a test design and verifies its correctness. In step 808, the hardware is ready for use. At this point, the system assumes that every step produced a positive confirmation that the hardware is operational; otherwise, the system never reaches step 808.
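The gating behavior of this start-up method, where step 808 is reached only if every earlier step succeeds, can be sketched as follows. This is an illustrative model only; the `check` callback stands in for whatever hardware test each step performs and is a hypothetical name, as are `STARTUP_STEPS` and `start_up`.

```python
# Step numbers and summaries follow FIG. 25 as described above.
STARTUP_STEPS = [
    (800, "initiate power-up or warm boot sequence"),
    (801, "PCI controller reads the EEPROM"),
    (802, "read and write PCI controller registers"),
    (803, "boundary scan test of all FPGA chips"),
    (804, "configure the CTRL_FPGA unit"),
    (805, "read and write CTRL_FPGA registers"),
    (806, "set up PCI controller for DMA master read/write mode"),
    (807, "configure all FPGAs with a test design and verify"),
]
READY_STEP = 808


def start_up(check):
    """Run the steps in order.

    `check(step_number)` is a hypothetical callback returning True when the
    step succeeds. Returns 808 when the hardware is ready, otherwise the
    number of the first step that failed.
    """
    for number, _description in STARTUP_STEPS:
        if not check(number):
            return number
    return READY_STEP


print(start_up(lambda step: True))         # 808
print(start_up(lambda step: step != 803))  # 803
```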
E. Optional Embodiments Using Highly Integrated FPGA Chips
In one embodiment of the invention, FPGA logic devices are provided on individual boards. If more FPGA logic devices are needed to model the user's circuit design than are provided on one board, multiple boards with more FPGA logic devices can be provided. The ability to add more boards to the simulation system is a desirable feature of the present invention. In this embodiment, denser FPGA chips such as the Altera 10K130V and 10K250V are used. Use of these chips changes the board design so that only four FPGA chips are used per board, instead of eight less dense FPGA chips (e.g., Altera 10K100).
Coupling these boards to the motherboard of the simulation system presents a challenge. The interconnection and connection scheme must compensate for the lack of a backplane. In the simulation system, the FPGA array is provided on the motherboard through a particular board interconnect structure. Each chip can have up to eight sets of interconnects, where, excluding local bus connections, the interconnects within a single board and across different boards are arranged as direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]). Each chip is interconnected directly to its adjacent neighboring chips, or in one hop to the non-adjacent chips located above, below, to the left, and to the right. In the X direction (east-west), the array is a torus. In the Y direction (north-south), the array is a mesh.
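The difference between the torus in X and the mesh in Y can be made concrete with a small neighbor calculation. The sketch below assumes a coordinate system that the text does not define: x in 0..3 for the four chips on a board and y in 0..5 for six boards. The function name `neighbors` is hypothetical.

```python
COLS, ROWS = 4, 6   # assumed: 4 chips per board (X), 6 boards (Y)


def neighbors(x, y, distance):
    """Chips reachable from (x, y) at the given distance along X or Y.

    distance=1 models direct-neighbor interconnects (N, S, W, E);
    distance=2 models one-hop interconnects (NH, SH, XH).
    """
    result = set()
    for dx in (-distance, distance):
        result.add(((x + dx) % COLS, y))   # X wraps around: torus
    for dy in (-distance, distance):
        if 0 <= y + dy < ROWS:             # Y stops at the edges: mesh
            result.add((x, y + dy))
    result.discard((x, y))
    return result


print(sorted(neighbors(0, 0, 1)))  # [(0, 1), (1, 0), (3, 0)]
print(sorted(neighbors(0, 0, 2)))  # [(0, 2), (2, 0)]
print(sorted(neighbors(1, 3, 1)))  # [(0, 3), (1, 2), (1, 4), (2, 3)]
```

A chip on the west edge still has a west neighbor because X wraps, whereas a chip on a north or south edge board has no neighbor beyond the edge; with only four columns, the east and west one-hop neighbors are the same chip.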
The interconnects couple logic devices and other components within a single board. However, inter-board connectors are provided to couple these boards and their interconnects together across different boards, carrying signals between (1) the PCI bus, via the motherboard, and the array boards, and (2) any two array boards. Each board includes its own FPGA bus FD[63:0], which allows the FPGA logic devices to communicate with each other, with the SRAM memory devices, and with the CTRL_FPGA unit (FPGA I/O controller). The FPGA bus FD[63:0] is not provided across multiple boards. However, the FPGA interconnects provide connectivity between FPGA logic devices across multiple boards, even though these interconnects are not related to the FPGA bus. On the other hand, the local bus is provided across all boards.
Motherboard connectors connect a board to the motherboard, and thereby to the PCI bus, power, and ground. For some boards, the motherboard connectors are not used for direct connection to the motherboard. In a six board configuration, only boards 1, 3, and 5 are directly connected to the motherboard, and the remaining boards 2, 4, and 6 depend on their neighboring boards for motherboard connectivity. Thus, every other board is directly connected to the motherboard, and the interconnects and local buses of these boards are coupled together through inter-board connectors arranged solder side to component side. PCI signals are routed through only one of the boards (typically the first board). Power and ground are applied to the other motherboard connectors for these boards. The various inter-board connectors, placed solder side to component side, provide communication among the PCI bus components, the FPGA logic devices, the memory devices, and the various simulation system control circuits.
FIG. 56 shows a high level block diagram of an array of FPGA chip configurations in accordance with an embodiment of the present invention. The CTRL_FPGA unit 1200 described above is coupled to bus 1210 via lines 1209 and 1236. In one embodiment, CTRL_FPGA unit 1200 is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K50 chip. Bus 1210 is coupled to the other simulation array boards (if any) and to other chips (e.g., PCI controller, EEPROM, clock buffer). FIG. 56 also shows the other major functional blocks in the form of logic devices and memory devices. In one embodiment, each logic device is a programmable logic device (PLD) in the form of an Altera 10K130V or 10K250V chip. The 10K130V and 10K250V are pin compatible, each in a 599-pin PGA package. Thus, instead of the embodiment described above with eight Altera FLEX 10K100 chips in the array, this embodiment uses only four Altera FLEX 10K130 chips. One embodiment of the present invention describes a board comprising these four logic devices and their connections.
Because the user design is modeled and configured in an arbitrary number of these logic devices in the array, inter-FPGA logic device communication is needed to connect one part of the user's circuit design to another part. In addition, initial configuration information and boundary scan tests are supported by the inter-FPGA interconnects. Finally, the required simulation system control signals must be accessible between the simulation system and the FPGA logic devices.
Figure 36 illustrates the hardware architecture of the FPGA logic device used in the present invention. FPGA logic device 1500 includes 102 upper I/O pins, 102 lower I/O pins, 111 left I/O pins, and 110 right I/O pins. Thus, the total number of interconnect pins is 425. In addition, 45 further I/O pins are dedicated to GCLK, FPGA bus FD[31:0] (for the high bank, FD[63:32] is used), F_RD, F_WR, DATAXSFR, SHIFTIN, SHIFTOUT, SPACE[2:0], ~EVAL, EVAL_REQ_N, DEVICE_OE (a signal from the CTRL_FPGA unit that turns on the output pins of the FPGA logic device), and DEV_CLRN (a signal from the CTRL_FPGA unit that clears all internal flip-flops before simulation starts). Thus, any data and control signals that cross between any two FPGA logic devices are carried by these interconnects. The remaining pins are used for power and ground.
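The pin counts can be cross-checked with a short tally. This sketch takes the right side as 110 pins, which is the value required for the four sides to add up to the stated total of 425; the interconnect set widths come from the bus ranges used elsewhere in this section (for example N[73:0] is 74 pins and XH[72:0] is 73 pins). The dictionary names are hypothetical.

```python
# Pins per side of the package (right side assumed to be 110, see lead-in).
side_pins = {"top": 102, "bottom": 102, "left": 111, "right": 110}

# Widths of the seven interconnect sets, derived from their bit ranges.
interconnect_sets = {
    "N": 74, "S": 74, "W": 74, "E": 74,   # direct neighbors, [73:0]
    "NH": 28, "SH": 28,                   # vertical one-hop, [27:0]
    "XH": 73,                             # horizontal one-hop, [72:0]
}

# Dedicated control and bus pins listed in the paragraph above.
dedicated_pins = {
    "GCLK": 1, "FD": 32, "F_RD": 1, "F_WR": 1, "DATAXSFR": 1,
    "SHIFTIN": 1, "SHIFTOUT": 1, "SPACE": 3, "EVAL_N": 1,
    "EVAL_REQ_N": 1, "DEVICE_OE": 1, "DEV_CLRN": 1,
}

print(sum(side_pins.values()))          # 425
print(sum(interconnect_sets.values()))  # 425
print(sum(dedicated_pins.values()))     # 45
```

Both tallies of interconnect pins agree at 425, and the dedicated signals account for exactly the 45 additional I/O pins.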
Figure 37 illustrates the FPGA interconnect pin-out for a single FPGA chip in accordance with an embodiment of the present invention. Each chip 1510 may have up to eight sets of interconnects, where each set includes a certain number of pins, depending on the chip's location on the board. In a preferred embodiment, every chip has seven sets of interconnects, although the particular sets used may vary from chip to chip depending on each chip's location on the board. The interconnects for each FPGA chip are oriented horizontally (east-west) and vertically (north-south). The interconnect set for the west direction is labeled W[73:0]. The interconnect set for the east direction is labeled E[73:0]. The interconnect set for the north direction is labeled N[73:0]. The interconnect set for the south direction is labeled S[73:0]. These complete sets of interconnects are for connections to adjacent chips; that is, these interconnects do not "hop" over any chip. For example, in FIG. 39, chip 1570 has interconnect 1540 for N[73:0], interconnect 1542 for W[73:0], interconnect 1543 for E[73:0], and interconnect 1545 for S[73:0]. The FPGA chip 1570, an FPGA2 chip, has all four adjacent interconnect sets: N[73:0], S[73:0], W[73:0], and E[73:0]. The west interconnect of FPGA0 is connected to the east interconnect of FPGA3 via wire 1539 in a torus-style interconnection. Thus, wire 1539 allows chips 1569 (FPGA0) and 1572 (FPGA3) to be directly coupled to each other, as though the west and east ends of the board wrapped around to meet.
Referring to FIG. 37, four sets of "hopping" interconnects are provided. Two sets are for non-adjacent interconnects that run vertically: NH[27:0] and SH[27:0]. For example, the FPGA2 chip of FIG. 39 shows NH interconnect 1541 and SH interconnect 1546. Referring to FIG. 37, the other two sets are for non-adjacent interconnects that run horizontally: XH[36:0] and XH[72:37]. For example, the FPGA chip 1570 of FIG. 39 shows an XH interconnect 1544.
Referring to FIG. 37, the vertical hopping interconnects NH[27:0] and SH[27:0] each have 28 pins. The horizontal interconnect has 73 pins (XH[36:0] and XH[72:37]). The horizontal interconnect pins XH[36:0] and XH[72:37] may be used on the west side (e.g., interconnect 1605 of FIG. 39, for FPGA3 chip 1576) and/or on the east side (e.g., interconnect 1602, for FPGA0 chip 1573). This structure allows each chip to be manufactured identically. Thus, each chip can be connected in one hop to the non-adjacent chips located above, below, to the left, and to the right.
FIG. 39 illustrates the direct-neighbor and one-hop neighbor FPGA array layout of six boards on a single motherboard in accordance with an embodiment of the present invention. This figure is used to describe two possible configurations: a six board system and a dual board system. Position indicator 1550 indicates that the "Y" direction is north-south and the "X" direction is east-west. In the X direction, the array is a torus. In the Y direction, the array is a mesh. In FIG. 39, only the boards, FPGA logic devices, interconnects, and connectors are shown at a high level. The motherboard and other supporting components (e.g., SRAM memory devices) and wiring (e.g., the FPGA bus) are not shown.
Figure 39 illustrates the array layout together with the components, interconnects, and connectors of the boards. In the actual physical configuration, these boards are installed on their edges, component side to solder side. Roughly half of the boards are connected directly to the motherboard, while the other half are connected to their respective neighboring boards.
In an embodiment using six boards according to the present invention, six boards 1551 (board 1), 1552 (board 2), 1553 (board 3), 1554 (board 4), 1555 (board 5), and 1556 (board 6) are provided on a motherboard (not shown) which is part of the reconfigurable hardware of FIG. Each board contains a nearly identical set of components and connectors. Thus, for purposes of explanation, sixth board 1556 includes FPGA logic devices 1565-1568 and connectors 1557-1560 and 1581, fifth board 1555 includes FPGA logic devices 1569-1572 and connectors 1582 and 1583, and fourth board 1554 includes FPGA logic devices 1573-1576 and connectors 1584 and 1585.
In this six board structure, board 1551 and board 1556 are provided as "bookend" boards that include the Y-mesh terminations, such as R-pack terminations 1557-1560 on board 6 (1556) and terminations 1591-1594 on board 1 (1551). The intermediate boards (i.e., boards 1552 (board 2), 1553 (board 3), 1554 (board 4), and 1555 (board 5)) are provided to complete the array.
As noted above, excluding local bus connections, the interconnects within a single board and across different boards are arranged as direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]). The interconnects alone can only couple logic devices and other components within a single board. However, board-to-board connectors 1581-1590 enable communication between FPGA logic devices across multiple boards (i.e., boards 1-6). The FPGA bus is part of the board-to-board connectors 1581-1590. These connectors 1581-1590 are 600-pin connectors that carry 520 signals and 80 power/ground connections between two neighboring array boards.
In FIG. 39, the various boards are arranged asymmetrically with respect to the board-to-board connectors 1581-1590. For example, board-to-board connectors 1589 and 1590 are provided between board 1551 and board 1552. Interconnect 1515 connects FPGA logic devices 1511 and 1577 together and, with respect to connectors 1589 and 1590, this connection is symmetrical. However, interconnect 1603 is not symmetrical; it connects an FPGA logic device of the third board 1553 to an FPGA logic device of board 1551. With respect to connectors 1589 and 1590, such an interconnect is asymmetrical. Similarly, interconnect 1600 is asymmetrical with respect to connectors 1589 and 1590 because it connects FPGA logic device 1511 to termination 1591, which in turn connects to FPGA logic device 1577 through interconnect 1601. Other similar interconnects show the same asymmetry.
As a result of this asymmetry, the interconnects are routed through the board-to-board connectors in two ways: one for symmetrical interconnects such as interconnect 1515, and the other for asymmetrical interconnects such as interconnects 1603 and 1600. The interconnect routing schemes are shown in FIGS. 40(A) and 40(B).
In FIG. 39, an example of a direct-neighbor connection within a single board is interconnect 1543, which couples logic device 1570 to logic device 1571 along the east-west direction on board 1555. Another example of a direct-neighbor connection within a single board is interconnect 1607, which couples logic device 1573 to logic device 1576 on board 1554. An example of a direct-neighbor connection between two different boards is interconnect 1545, which couples logic device 1570 on board 1555 to logic device 1574 on board 1554 via connectors 1583 and 1584 along the north-south direction. Here, the two board-to-board connectors 1583 and 1584 are used to carry the signals across.
An example of a one-hop interconnect within a single board is interconnect 1544, which couples logic element 1570 to logic element 1552 on board 1555 along the east-west direction. An example of a one-hop interconnect between two different boards is interconnect 1599, which couples logic element 1556 of board 1556 to logic element 1573 of board 1554 via connectors 1581-1584. Here, four board-to-board connectors 1581-1584 are used to carry the signals across.
In particular, the boards located at the north and south ends of the motherboard include 10 Ω R-packs to terminate certain connections. Thus, the sixth board 1556 includes 10 Ω R-pack connectors 1557-1560, and the first board 1551 includes 10 Ω R-pack connectors 1591-1594. The sixth board 1556 includes R-pack connector 1557 for interconnects 1970 and 1971, R-pack connector 1558 for interconnects 1972 and 1541, R-pack connector 1559 for interconnects 1973 and 1974, and R-pack connector 1560 for interconnects 1975 and 1976. Moreover, interconnects 1561-1564 are not connected to anything. Unlike the east-west torus-type interconnection, the north-south interconnection is arranged in a mesh-type manner.
This mesh arrangement increases the number of north-south interconnections; otherwise, the interconnects at the north and south edges of the FPGA mesh would all go unused. For example, FPGA logic elements 1511 and 1577 already have one set of direct interconnects 1515. Additional interconnects are also provided to these two FPGA logic elements via R-pack 1591 and interconnects 1600 and 1601; that is, R-pack 1591 connects interconnects 1600 and 1601 together. This increases the number of direct connections between FPGA logic elements 1511 and 1577.
Inter-board connections are also provided. Logic elements 1577, 1578, 1579, and 1580 on board 1551 are coupled to logic elements 1511, 1512, 1513, and 1514 on board 1552 through interconnects 1515, 1516, 1517, and 1518. Thus, interconnect 1515 couples logic element 1511 on board 1552 to logic element 1577 on board 1551 through connectors 1589 and 1590; interconnect 1516 couples logic element 1512 on board 1552 to logic element 1578 on board 1551 through connectors 1589 and 1590; interconnect 1517 couples logic element 1513 on board 1552 to logic element 1579 on board 1551 through connectors 1589 and 1590; and interconnect 1518 couples logic element 1514 on board 1552 to logic element 1580 on board 1551 through connectors 1589 and 1590.
Some interconnects, such as interconnects 1595, 1596, 1597, and 1598, are not connected to anything because they are unused. However, as described above with respect to logic elements 1511 and 1577, R-pack 1591 connects interconnects 1600 and 1601 to increase the number of north-south interconnections.
A dual-board embodiment of the present invention is shown in FIG. 44. In this embodiment, only two boards are needed to model the user's design in the simulation system. Like the six-board structure of FIG. 39, the dual-board structure of FIG. 44 uses the same two boards (board 1 (1551) and board 6 (1556)) as the "bookends", which are provided on a motherboard that is part of the reconfigurable hardware unit 20. In FIG. 44, one bookend board is board 1 and the second bookend board is board 6. The designation board 6 is used in FIG. 44 to show its correspondence to board 6 in FIG. 39; that is, bookend boards such as boards 1 and 6 have the terminations needed for the north-south mesh connections.
This dual-board structure comprises four FPGA logic elements 1577 (FPGA0), 1578 (FPGA1), 1579 (FPGA2), and 1580 (FPGA3) on board 1 (1551), and four FPGA logic elements 1565 (FPGA0), 1566 (FPGA1), 1567 (FPGA2), and 1568 (FPGA3) on board 6 (1556). These two boards are connected by inter-board connectors 1581 and 1590.
These boards include 10 Ω R-packs to terminate certain connections. In the dual-board embodiment, both boards are "bookend" boards. Board 1551 includes 10 Ω R-pack connectors 1591, 1592, 1593, and 1594 as resistive terminations. The second board 1556 also includes 10 Ω R-pack connectors 1557-1560.
Board 1551 has connector 1590 and board 1556 has connector 1581 for inter-board communication. Interconnects that run from one board to the other, such as interconnects 1600, 1971, 1977, 1541, and 1540, pass through these connectors 1590 and 1581; in other words, the inter-board connectors 1590 and 1581 allow interconnects 1600, 1971, 1977, 1541, and 1540 to connect a component on one board to a component on the other board. The inter-board connectors 1590 and 1581 carry control data and control signals on the FPGA bus.
In a four-board configuration, board 1 and board 6 serve as the bookend boards, while board 2 (1552) and board 3 (1553) (see FIG. 39) are the intermediate boards. When coupled to the motherboard in accordance with the present invention (as described with reference to FIGS. 38A and 38B), board 1 and board 2 are paired, and board 3 and board 6 are paired.
In a six-board configuration, board 1 and board 6 serve as the bookend boards as described above, while board 2 (1552), board 3 (1553), board 4 (1554), and board 5 (1555) (see FIG. 39) are the intermediate boards. When coupled to the motherboard in accordance with the present invention (as described with reference to FIGS. 38A and 38B), board 1 and board 2 are paired, board 3 and board 4 are paired, and board 5 and board 6 are paired.
More boards may be provided as needed. However, regardless of the number of boards added to the system, the bookend boards (such as board 1 and board 6 in FIG. 39) have the terminations needed to complete the mesh array connections. In one embodiment, the minimum configuration is the dual-board configuration of FIG. 44. More boards can be added in increments of two boards. If the initial configuration consists of board 1 and board 6, changing to a four-board configuration involves moving board 6 further out, pairing board 1 and board 2 together, and then pairing board 3 and board 6 together, as mentioned above.
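The growth rule described above can be summarized in a short sketch. The Python below is illustrative only: the function name `board_pairs` and the list representation are assumptions made for this example and are not part of the described system. It assumes the board labels of FIG. 39, with board 1 and board 6 always acting as the bookends, and covers only the 2-, 4-, and 6-board cases that the text describes.

```python
def board_pairs(num_boards):
    """Return the board pairs for a 2-, 4-, or 6-board configuration.

    Hypothetical helper: board 1 and board 6 are always the bookend
    boards, and intermediate boards 2..5 are filled in between them.
    """
    if num_boards not in (2, 4, 6):
        raise ValueError("this sketch covers only 2, 4, or 6 boards")
    # Bookend board 1, then the intermediate boards, then bookend board 6.
    stack = [1] + list(range(2, num_boards)) + [6]
    # Boards are paired two at a time in stack order.
    return [(stack[i], stack[i + 1]) for i in range(0, num_boards, 2)]
```

For example, `board_pairs(4)` yields the pairs (1, 2) and (3, 6), matching the four-board arrangement in which board 6 moves outward.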
As described above, each logic element is coupled to its adjacent neighbor logic element and to a non-adjacent neighbor logic element within one hop. Thus, in FIGS. 39 and 44, logic element 1577 is coupled to the adjacent neighbor logic element 1578 through interconnect 1547. Logic element 1577 is also coupled to the non-adjacent logic element 1579 through one-hop interconnect 1548. However, logic element 1580 may also be considered adjacent to logic element 1577 because of the wrap-around torus configuration, with interconnect 1549 providing the coupling.
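The adjacency rule for the four logic elements on a board can be illustrated with a small sketch. It assumes, for illustration, that FPGA0 through FPGA3 form a single east-west ring; the function name `east_west_neighbors` and the index-based representation are hypothetical and do not come from the description.

```python
FPGAS_PER_BOARD = 4  # FPGA0..FPGA3 on each board, per the description


def east_west_neighbors(fpga):
    """Classify the other FPGAs on a board relative to `fpga`.

    Returns (adjacent, one_hop) sets of FPGA indices, treating the
    east-west direction as a ring (torus) so FPGA3 wraps to FPGA0.
    """
    adjacent, one_hop = set(), set()
    for other in range(FPGAS_PER_BOARD):
        if other == fpga:
            continue
        # Ring distance: the shorter way around the torus.
        d = abs(other - fpga)
        d = min(d, FPGAS_PER_BOARD - d)
        (adjacent if d == 1 else one_hop).add(other)
    return adjacent, one_hop
```

With FPGA0 standing in for logic element 1577, the sketch reports FPGA1 and FPGA3 as adjacent (the latter through the wrap-around) and FPGA2 as the one-hop neighbor.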

FIG. 42 shows a top view (component side) of the on-board components and connectors for a single board. In one embodiment of the invention, only one board is needed to model the user design in the simulation system. In other embodiments, several boards (at least two) are needed. Thus, for example, FIG. 39 shows six boards 1551-1556 coupled together through several 600-pin connectors 1581-1590. At the upper and lower ends, board 1551 is terminated by one set of 10 Ω R-packs and board 1556 is terminated by another set of 10 Ω R-packs.

Referring back to FIG. 42, the board 1820 contains four FPGAs: logic element 1822 (FPGA0), logic element 1823 (FPGA1), logic element 1824 (FPGA2), and logic element 1825 (FPGA3). Two SRAM memory elements 1828 and 1829 are also provided. The SRAM memory elements 1828 and 1829 are used to map memory blocks from the logic elements on the board; that is, the memory simulation feature of the present invention maps memory from the logic elements on the board to the SRAM memory elements on the board. Other boards may have other logic elements and memory elements to accomplish similar mapping operations. In one embodiment, the memory mapping is board-dependent; that is, the memory mapping for board 1 is limited to the logic elements and memory elements on board 1 and is independent of other boards. In another embodiment, the memory mapping is board-independent. Thus, a few large memory elements are used to map memory blocks from logic elements on one board to memory elements located on another board.

Light emitting diodes (LEDs) 1821 are also provided to indicate certain selected operations visually. The LED display is shown in Table A in accordance with one embodiment of the present invention:

Table A: LED Display

| LED | Color | State | Description |
| --- | --- | --- | --- |
| LED1 | Green | On | +5 V and +3.3 V are normal |
| | | Off | +5 V or +3.3 V is abnormal |
| LED2 | Amber | Off | Configuration of all on-board FPGAs is done |
| | | Flashing | On-board FPGAs are not configured or configuration failed |
| | | On | FPGA configuration is in progress |
| LED3 | Red | On | Data transfer is in progress |
| | | Off | No data transfer |
| | | Flashing | Status check failed |
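As an illustration of how the Table A indications might be decoded in diagnostic software, the following sketch transcribes the table into a lookup. The dictionary name `LED_STATUS` and the function `describe_led` are hypothetical names chosen for this example; the description does not specify any such software interface.

```python
# (LED, state) -> meaning, transcribed from Table A.
LED_STATUS = {
    ("LED1", "on"): "+5 V and +3.3 V are normal",
    ("LED1", "off"): "+5 V or +3.3 V is abnormal",
    ("LED2", "off"): "Configuration of all on-board FPGAs is done",
    ("LED2", "flashing"): "On-board FPGAs are not configured or configuration failed",
    ("LED2", "on"): "FPGA configuration is in progress",
    ("LED3", "on"): "Data transfer is in progress",
    ("LED3", "off"): "No data transfer",
    ("LED3", "flashing"): "Status check failed",
}


def describe_led(led, state):
    """Return the Table A description for an LED and its observed state."""
    key = (led.upper(), state.lower())
    if key not in LED_STATUS:
        raise KeyError(f"no Table A entry for {led} in state {state}")
    return LED_STATUS[key]
```

Note that LED1 has no flashing entry in Table A, so `describe_led("LED1", "flashing")` raises an error rather than guessing a meaning.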

Several other control chips, such as the PLX PCI controller 1826 and the CTRL_FPGA unit 1827, control the inter-FPGA and PCI communications. One example of a PLX PCI controller 1826 that can be used in the system is the PLX Technology PCI9080 or PCI9060. The PCI9080 has the appropriate local bus interface, control registers, FIFOs, and PCI interface to the PCI bus. The data book PLX Technology, PCI9080 Data Sheet (ver. 0.93, Feb. 28, 1997) is incorporated herein by reference. One example of the CTRL_FPGA unit 1827 is a programmable logic device (PLD) in the form of an FPGA, such as an Altera 10K50 chip. In a multi-board configuration, only the first board coupled to the PCI bus has the PCI controller.

Connector 1830 connects the board 1820 to the motherboard (not shown) and to the PCI bus, power, and ground. For some boards, connector 1830 is not used for direct connection to the motherboard. Thus, in a dual-board configuration, only the first board is coupled to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are directly connected to the motherboard, while the remaining boards 2, 4, and 6 rely on their neighboring boards for motherboard access. Inter-board connectors J1-J28 are also provided. As the name implies, connectors J1-J28 enable connections between different boards.
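The alternating pattern of direct and indirect motherboard connections can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `motherboard_access` is hypothetical, and the choice to route each indirectly connected board through the board it is paired with is an assumption made for the example, since the text says only that such boards rely on neighboring boards.

```python
def motherboard_access(stack):
    """Map each board to the board through which it reaches the motherboard.

    `stack` lists board labels in pairing order, e.g. [1, 2, 3, 4, 5, 6].
    Boards in the 1st, 3rd, 5th, ... positions connect directly; each
    remaining board is assumed to go through its paired neighbor.
    """
    access = {}
    for pos, board in enumerate(stack):
        if pos % 2 == 0:
            access[board] = board           # direct motherboard connection
        else:
            access[board] = stack[pos - 1]  # assumed: via the paired board
    return access
```

For the six-board stack, boards 1, 3, and 5 map to themselves, which agrees with the statement that only half of the boards connect to the motherboard directly.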

Connector J1 is for external power and ground connections. Table B below shows the pins and corresponding descriptions for the external power connector J1 in accordance with one embodiment of the present invention.

Table B: External Power - J1

| Pin number | Description |
| --- | --- |
| 1 | VCC 5V |
| 2 | GND |
| 3 | GND |
| 4 | VCC 3V |

Connector J2 is for the parallel port connection. Connectors J1 and J2 are used for stand-alone single-board boundary scan testing during production. Table C below shows the pins and corresponding descriptions for the parallel JTAG port connector J2 in accordance with one embodiment of the present invention.

Table C: Parallel JTAG Port - J2

| J2 pin number | J2 signal | I/O from board | DB25 pin number | DB25 signal |
| --- | --- | --- | --- | --- |
| 3 | PARA_TCK | I | 2 | D0 |
| 5 | PARA_TMS | I | 3 | D1 |
| 7 | PARA_TDI | I | 4 | D2 |
| 9 | PARA_NR | I | 5 | D3 |
| 19 | PARA_TDO | O | 10 | NACK |
| 10, 12, 14, 16, 18, 20, 22, 24 | GND | | 18-25 | GND |
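The Table C wiring can be captured as a small lookup, which may be convenient when building or checking a test cable. The names `J2_TO_DB25`, `J2_GROUND_PINS`, and `describe_j2_pin` are hypothetical and introduced only for this sketch; the pin data is transcribed from Table C.

```python
# J2 pin -> (J2 signal, direction seen from the board, DB25 pin, DB25 signal)
J2_TO_DB25 = {
    3: ("PARA_TCK", "I", 2, "D0"),
    5: ("PARA_TMS", "I", 3, "D1"),
    7: ("PARA_TDI", "I", 4, "D2"),
    9: ("PARA_NR", "I", 5, "D3"),
    19: ("PARA_TDO", "O", 10, "NACK"),
}
# J2 ground pins; Table C ties these to DB25 pins 18-25 as a group.
J2_GROUND_PINS = {10, 12, 14, 16, 18, 20, 22, 24}


def describe_j2_pin(pin):
    """Return the J2 signal name for a pin, or 'GND' for a ground pin."""
    if pin in J2_GROUND_PINS:
        return "GND"
    if pin in J2_TO_DB25:
        return J2_TO_DB25[pin][0]
    raise KeyError(f"J2 pin {pin} is not listed in Table C")


def db25_pin_for(j2_pin):
    """Return the DB25 pin wired to a J2 signal pin listed in Table C."""
    return J2_TO_DB25[j2_pin][2]
```

Only PARA_TDO is an output from the board, so it is the one signal that lands on a DB25 status line (NACK) rather than a data line.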

Connectors J3 and J4 are for the local bus connections across boards. Connectors J5-J16 are one set of FPGA interconnect connections. Connectors J17-J28 are a second set of FPGA interconnect connections. When the boards are placed component side to solder side, these connectors provide effective connections between a component on one board and a component on another board. Tables D and E below show the complete list and description of connectors J1-J28 in accordance with one embodiment of the present invention.

Table D: Connectors J1-J28

| Connector | Description | Type |
| --- | --- | --- |
| J1 | +5V / +3V external power | 4-pin power RA header, comp side |
| J2 | Parallel port | 0.1" pitch, 2-row through-hole RA header, comp side |
| J3 | Local bus | 0.05" pitch, 2x30 through-hole header, SAMTEC, comp side |
| J4 | Local bus | 0.05" pitch, 2x30 through-hole receptacle, SAMTEC, solder side |
| J5 | Row A: NH[0], VCC3V, GND; Row B: J17 Row B, VCC3V, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J6 | Row A: J5 Row B, VCC3V, GND; Row B: J5 Row B, VCC3V, GND | 0.05" pitch, 2x30 receptacle, SAMTEC, solder side |
| J7 | Row A: N[0], 4x VCC3V, 4x GND, N[2]; Row B: N[0], 4x VCC3V, 4x GND, N[2] | 0.05" pitch, 2x45 through-hole header, SAMTEC, comp/solder side |
| J8 | Row A: N[0], 4x VCC3V, 4x GND, N[2]; Row B: N[0], 4x VCC3V, 4x GND, N[2] | 0.05" pitch, 2x45 through-hole receptacle, SAMTEC, comp/solder side |
| J9 | Row A: NH[2], LASTL, GND; Row B: J21 Row B, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J10 | Row A: J9 Row B, FIRSTL, GND; Row B: J9 Row A, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |
| J11 | Row A: NH[1], VCC3V, GND; Row B: J23 Row B, VCC3V, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J12 | Row A: J11 Row B, VCC3V, GND; Row B: J11 Row A, VCC3V, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |
| J13 | Row A: N[1], 4x VCC3V, 4x GND, N[3]; Row B: N[1], 4x VCC3V, 4x GND, N[3] | 0.05" pitch, 2x45 through-hole header, SAMTEC, comp/solder side |
| J14 | Row A: N[1], 4x VCC3V, 4x GND, N[3]; Row B: N[1], 4x VCC3V, 4x GND, N[3] | 0.05" pitch, 2x45 through-hole receptacle, SAMTEC, comp/solder side |
| J15 | Row A: NH[3], LASTH, GND; Row B: J27 Row B, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J16 | Row A: J15 Row B, FIRSTH, GND; Row B: J15 Row A, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |
| J17 | Row A: SH[0], VCC3V, GND; Row B: J5 Row B, VCC3V, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J18 | Row A: J17 Row B, VCC3V, GND; Row B: J17 Row A, VCC3V, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |
| J19 | Row A: S[0], 4x VCC3V, 4x GND, S[2]; Row B: S[0], 4x VCC3V, 4x GND, S[2] | 0.05" pitch, 2x45 through-hole header, SAMTEC, comp/solder side |
| J20 | Row A: S[0], 4x VCC3V, 4x GND, S[2]; Row B: S[0], 4x VCC3V, 4x GND, S[2] | 0.05" pitch, 2x45 through-hole receptacle, SAMTEC, comp/solder side |
| J21 | Row A: SH[2], LASTL, GND; Row B: J9 Row B, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J22 | Row A: J21 Row B, FIRSTL, GND; Row B: J21 Row A, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |
| J23 | Row A: SH[1], VCC3V, GND; Row B: J11 Row B, VCC3V, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J24 | Row A: J23 Row B, VCC3V, GND; Row B: J23 Row A, VCC3V, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |
| J25 | Row A: S[1], 4x VCC3V, 4x GND, S[3]; Row B: S[1], 4x VCC3V, 4x GND, S[3] | 0.05" pitch, 2x45 through-hole header, SAMTEC, comp/solder side |
| J26 | Row A: S[1], 4x VCC3V, 4x GND, S[3]; Row B: S[1], 4x VCC3V, 4x GND, S[3] | 0.05" pitch, 2x45 through-hole receptacle, SAMTEC, comp/solder side |
| J27 | Row A: SH[3], LASTH, GND; Row B: J15 Row B, GND | 0.05" pitch, 2x30 SMD header, SAMTEC, comp side |
| J28 | Row A: J27 Row B, FIRSTH, GND; Row B: J27 Row A, GND | 0.05" pitch, 2x30 SMD receptacle, SAMTEC, solder side |

The shaded connectors are of the through-hole type. In Table D, the number in brackets [] indicates the FPGA logic element number (0-3). Thus, S[0] represents the south interconnect (S[73:0] in FIG. 37) and its 74 bits for FPGA0.
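The bracket notation used in Table D can be decoded mechanically. The sketch below is an illustration only: the function name `parse_label` and the two lookup tables are hypothetical, and the bit widths are taken from the bus ranges quoted in this description (N[73:0] and S[73:0] for the direct-neighbor buses, NH[27:0] and SH[27:0] for the one-hop buses).

```python
import re

# Bus group -> human-readable direction (names chosen for this sketch).
DIRECTIONS = {
    "N": "north",
    "S": "south",
    "NH": "north one-hop",
    "SH": "south one-hop",
}
# Bus group -> width in bits, from the ranges N[73:0], S[73:0], NH[27:0], SH[27:0].
WIDTHS = {"N": 74, "S": 74, "NH": 28, "SH": 28}


def parse_label(label):
    """Split a Table D label such as 'S[0]' into (direction, FPGA number, bits)."""
    m = re.fullmatch(r"(NH|SH|N|S)\[([0-3])\]", label)
    if m is None:
        raise ValueError(f"not a Table D interconnect label: {label!r}")
    group, fpga = m.group(1), int(m.group(2))
    return DIRECTIONS[group], fpga, WIDTHS[group]
```

For instance, `parse_label("S[0]")` returns the south direction, FPGA number 0, and a width of 74 bits, which is the reading given in the text for S[0].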

Table E: Local Bus Connectors - J3, J4

| Pin number | Signal name | I/O | Pin number | Signal name | I/O |
| --- | --- | --- | --- | --- | --- |
| A1 | GND | PWR | B1 | LRESET_N | I/O |
| A2 | J3_CLK for J3; J4_CLK for J4 | I/O | B2 | VCC5V | PWR |
| A3 | GND | PWR | B3 | LD0 | I/O |
| A4 | LD1 | I/O | B4 | LD2 | I/O |
| A5 | LD3 | I/O | B5 | LD4 | I/O |
| A6 | LD5 | I/O | B6 | LD6 | I/O |
| A7 | LD7 | I/O | B7 | LD8 | I/O |
| A8 | LD9 | I/O | B8 | LD10 | I/O |
| A9 | LD11 | I/O | B9 | GND | PWR |
| A10 | VCC3V | PWR | B10 | LD12 | I/O |
| A11 | LD13 | I/O | B11 | LD14 | I/O |
| A12 | LD15 | I/O | B12 | LD16 | I/O |
| A13 | LD17 | I/O | B13 | LD18 | I/O |
| A14 | LD19 | I/O | B14 | LD20 | I/O |
| A15 | LD21 | I/O | B15 | VCC3V | PWR |
| A16 | LD22 | I/O | B16 | LD23 | I/O |
| A17 | LD24 | I/O | B17 | LD25 | I/O |
| A18 | LD26 | I/O | B18 | LD27 | I/O |
| A19 | LD28 | I/O | B19 | LD29 | I/O |
| A20 | LD30 | I/O | B20 | LD31 | I/O |
| A21 | VCC3V | PWR | B21 | LHOLD | OT |
| A22 | ADS_N | I/O | B22 | GND | PWR |
| A23 | DEN_N | OT | B23 | DTR_N | O |
| A24 | LA31 | O | B24 | LA30 | O |
| A25 | LA29 | O | B25 | LA28 | O |
| A26 | LA10 | O | B26 | LA7 | O |
| A27 | LA6 | O | B27 | LA5 | O |
| A28 | LA4 | O | B28 | LA3 | O |
| A29 | LA2 | O | B29 | End | OD |
| A30 | VCC5V | PWR | B30 | VCC5V | PWR |

The I/O direction is given with respect to board 1.

FIG. 43 shows a legend for the connectors J1-J28 of FIGS. 41A-41F and 42. In general, blank blocks indicate surface-mount connectors, and green-filled blocks indicate through-hole connectors. Solid-outline blocks represent connectors located on the component side, and dotted-outline blocks represent connectors located on the solder side. Thus, the blank solid-outline block 1840 represents a surface-mount 2x30 header located on the component side. The blank dotted-outline block 1841 represents a surface-mount 2x30 receptacle located on the solder side of the board. The green-filled solid-outline block 1842 represents a through-hole 2x30 or 2x45 header located on the component side. The green-filled dotted-outline block 1843 represents a through-hole 2x30 or 2x45 receptacle located on the solder side. In one embodiment, the simulation system uses the Samtec SFM and TFM series of 2x30 or 2x45 micro strip connectors for both the surface-mount and through-hole types. The cross-hatched solid-outline block 1844 is a surface-mount R-pack located on the component side of the board. The cross-hatched dotted-outline block 1845 is a surface-mount R-pack located on the solder side. The Samtec specifications in the Samtec catalog on its website are incorporated herein by reference. Referring again to FIG. 42, the connectors J3-J28 are of the types indicated in the legend of FIG. 43.

FIGS. 41A-41F show top views of each board and its respective connectors. FIG. 41A shows the connectors for board 6. Thus, board 1660 includes connectors 1661-1681 along with motherboard connector 1682. FIG. 41B shows the connectors for board 5. Thus, board 1690 includes connectors 1691-1708 along with motherboard connector 1709. FIG. 41C shows the connectors for board 4. Thus, board 1715 includes connectors 1716-1733 along with motherboard connector 1734. FIG. 41D shows the connectors for board 3. Thus, board 1740 includes connectors 1741-1758 along with motherboard connector 1759. FIG. 41E shows the connectors for board 2. Thus, board 1765 includes connectors 1766-1783 along with motherboard connector 1784. FIG. 41F shows the connectors for board 1. Thus, board 1790 includes connectors 1791-1812 along with motherboard connector 1813. As shown in the legend of FIG. 43, the connectors for the six boards are various combinations of (1) surface-mount or through-hole type, (2) component side or solder side, and (3) header, receptacle, or R-pack.

In one embodiment, these connectors are used for inter-board communication. Related buses and signals are grouped together and supported by these inter-board connectors to route signals between any two boards. Also, only half of the boards are directly coupled to the motherboard. In FIG. 41A, board 6 (1660) includes connectors 1661-1668 designated for one set of FPGA interconnects, connectors 1669-1674, 1676, and 1679 designated for another set of FPGA interconnects, and connector 1681 designated for the local bus. Because board 6 (1660) is positioned as one of the boards at the end of the motherboard (along with board 1 (1790) of FIG. 41F at the other end), connectors 1675, 1677, 1678, and 1680 are designated for the 10 Ω R-packs for certain north-south interconnections. Also, motherboard connector 1682 is not used for board 6 (1660), as shown in FIG. 38B, where the sixth board 1535 is coupled to the fifth board 1534 but not directly to the motherboard 1520.

In FIG. 41B, board 5 (1690) includes connectors 1691-1698 designated for one set of FPGA interconnects, connectors 1699-1706 designated for another set of FPGA interconnects, and connector 1707 designated for the local bus. Connector 1709 is used to couple board 5 (1690) to the motherboard.

In FIG. 41C, board 4 (1715) includes connectors 1716-1723 designated for one set of FPGA interconnects, connectors 1724-1731 designated for another set of FPGA interconnects, and connectors 1732 and 1733 designated for the local bus. Connector 1734 is not used to couple board 4 (1715) directly to the motherboard. This configuration is shown in FIG. 38B, where the fourth board 1533 is coupled to the third board 1532 and the fifth board 1534 but not directly to the motherboard 1520.

In FIG. 41D, board 3 (1740) includes connectors 1741-1748 designated for one set of FPGA interconnects, connectors 1749-1756 designated for another set of FPGA interconnects, and connectors 1757 and 1758 designated for the local bus. Connector 1759 is used to couple board 3 (1740) to the motherboard.

In FIG. 41E, board 2 (1765) includes connectors 1766-1773 designated for one set of FPGA interconnects, connectors 1774-1781 designated for another set of FPGA interconnects, and connectors 1782 and 1783 designated for the local bus. Connector 1784 is not used to couple board 2 (1765) directly to the motherboard. This configuration is shown in FIG. 38B, where the second board 1525 is coupled to the third board 1532 and the first board 1526 but not directly to the motherboard 1520.

In FIG. 41F, board 1 (1790) includes connectors 1791-1798 designated for one set of FPGA interconnects, connectors 1799-1804, 1806, and 1809 designated for another set of FPGA interconnects, and connectors 1811 and 1812 designated for the local bus. Connector 1813 is used to couple board 1 (1790) to the motherboard. Because board 1 (1790) is positioned as one of the boards at the end of the motherboard (along with board 6 (1660) of FIG. 41A at the other end), connectors 1805, 1807, 1808, and 1810 are designated for the 10 Ω R-packs for certain north-south interconnections.

In one embodiment of the invention, multiple boards are coupled to the motherboard and to each other in a unique manner. The boards are coupled together component side to solder side. One of the boards, the first board, is coupled to the motherboard and, through the motherboard connector, to the PCI bus. The FPGA interconnect bus on the first board is also coupled to the FPGA interconnect bus of another board, the second board, through a pair of FPGA interconnect connectors. The FPGA interconnect connector on the first board is on the component side, and the FPGA interconnect connector on the second board is on the solder side. The component-side and solder-side connectors on the first and second boards, respectively, allow the FPGA interconnect buses to be coupled together.

Similarly, the local buses on the two boards are coupled together through local bus connectors. The local bus connector on the first board is on the component side, and the local bus connector on the second board is on the solder side. Thus, the component-side and solder-side connectors on the first and second boards, respectively, allow the local buses to be coupled together.

More boards can be added. A third board can be added with its solder side facing the component side of the second board. Similar FPGA interconnect and local bus inter-board connections are also made. As described below, the third board is also coupled to the motherboard through another connector, but this connector merely supplies power and ground to the third board.

The component-side to solder-side connectors of the dual-board configuration will be described with reference to FIG. 38A. This figure shows a side view of the FPGA board connection on the motherboard in accordance with one embodiment of the present invention. FIG. 38A shows the dual-board configuration in which, as the name implies, only two boards are used. These two boards 1525 (board 2) and 1526 (board 1) shown in FIG. 38A correspond to the two boards 1552 and 1551 shown in FIG. 39. The component side of boards 1525 and 1526 is indicated by reference numeral 1988. The solder side of the two boards 1525 and 1526 is indicated by reference numeral 1988. As shown in FIG. 38A, these two boards 1525 and 1526 are coupled to the motherboard 1520 through motherboard connector 1523. Other motherboard connectors 1521, 1522, and 1524 may also be provided for expansion. Signals between the PCI bus and the boards 1525 and 1526 are routed through motherboard connector 1523. PCI signals are routed between the dual-board structure and the PCI bus through the first board 1526 first. Thus, signals from the PCI bus encounter the first board 1526 before they travel to the second board 1525. Analogously, signals sent to the PCI bus from the dual-board structure leave from the first board 1526. Power is also applied to the boards 1525 and 1526 through motherboard connector 1523 from a power supply (not shown).

As shown in FIG. 38A, board 1526 contains several components and connectors. One such component is FPGA logic device 1530. Connectors 1528A and 1531A are also provided. Similarly, board 1525 contains several components and connectors. One such component is FPGA logic device 1529. Connectors 1528B and 1531B are also provided.

In one embodiment, connectors 1528A and 1528B are the inter-board connectors for the FPGA buses, such as 1590 and 1581 (of FIG. 44). These inter-board connectors provide the inter-board connectivity for the various FPGA interconnects, such as N[73:0], S[73:0], W[73:0], E[73:0], NH[27:0], SH[27:0], XH[36:0], and XH[72:37], but not for the local bus connections.

Furthermore, connectors 1531A and 1531B are the inter-board connectors for the local bus. The local bus handles the signals between the PCI bus (via the PCI controller) and the FPGA bus (via the FPGA I/O controller (CTRL_FPGA) unit). The local bus also handles configuration and boundary scan test information between the PCI controller, the FPGA logic devices, and the FPGA I/O controller (CTRL_FPGA) unit.

In sum, the motherboard connector couples one board of a board pair to the PCI bus and to power. One set of connectors couples the FPGA interconnect on the component side of one board to the solder side of the other board. Another set of connectors couples the local bus on the component side of one board to the solder side of the other board.

In another embodiment of the present invention, two or more boards are used. For example, FIG. 38B shows a six-board configuration. The configuration is analogous to that of FIG. 38A: every other board is directly connected to the motherboard, and the interconnects and local buses of the boards are coupled together through inter-board connectors placed solder-side to component-side.

FIG. 38B shows six boards: 1526 (first board), 1525 (second board), 1532 (third board), 1533 (fourth board), 1534 (fifth board), and 1535 (sixth board). These six boards are coupled to the motherboard 1520 through the connectors on boards 1526 (first board), 1532 (third board), and 1534 (fifth board). The other boards, 1525 (second board), 1533 (fourth board), and 1535 (sixth board), are not directly coupled to the motherboard 1520; instead, they are coupled to it indirectly through their connections to their respective neighboring boards.

With the boards placed solder-side to component-side, the various inter-board connectors enable communication among the PCI bus components, the FPGA logic devices, the memory devices, and the various simulation system control circuits. The first set of inter-board connectors 1990 corresponds to connectors J5-J16 of FIG. 42. The second set of inter-board connectors 1991 corresponds to connectors J17-J28 of FIG. 42. The third set of inter-board connectors 1992 corresponds to connectors J3-J4 of FIG. 42.

Motherboard connectors 1521-1524 are provided on the motherboard 1520 to couple the motherboard (and hence the PCI bus) to the six boards. As mentioned above, boards 1526 (first board), 1532 (third board), and 1534 (fifth board) are directly coupled to connectors 1523, 1522, and 1521, respectively. The other boards, 1525 (second board), 1533 (fourth board), and 1535 (sixth board), are not directly coupled to the motherboard 1520. Because only one PCI controller is needed for all six boards, only the first board 1526 contains a PCI controller. Likewise, only motherboard connector 1523, which is coupled to the first board 1526, provides access to and from the PCI bus. Connectors 1522 and 1521 are coupled only to power and ground. The center-to-center spacing between adjacent motherboard connectors is approximately 20.32 mm in one embodiment.

For the boards 1526 (first board), 1532 (third board), and 1534 (fifth board), which are directly coupled to motherboard connectors 1523, 1522, and 1521, respectively, connectors J5-J16 are located on the component side, connectors J17-J28 are located on the solder side, and the local bus connectors J3-J4 are located on the component side. For the boards 1525 (second board), 1533 (fourth board), and 1535 (sixth board), which are not directly coupled to motherboard connectors 1523, 1522, and 1521, connectors J5-J16 are located on the solder side, connectors J17-J28 are located on the component side, and the local bus connectors J3-J4 are located on the solder side. For the end boards 1526 (first board) and 1535 (sixth board), the pair of J17-J28 connectors are 10 Ω R-pack terminations.
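The placement rule described above depends only on whether a board sits in an odd or even position in the stack. The following minimal Python sketch tabulates that rule; the function and dictionary names are hypothetical and chosen only for illustration, while the reference numerals are those given for FIG. 38B.

```python
# Board positions are 1-based; reference numerals are taken from FIG. 38B.
BOARDS = {1: 1526, 2: 1525, 3: 1532, 4: 1533, 5: 1534, 6: 1535}

# Only odd-position boards plug directly into a motherboard connector.
MOTHERBOARD_CONNECTOR = {1: 1523, 3: 1522, 5: 1521}


def connector_sides(position):
    """Return which board side carries each connector group."""
    direct = position % 2 == 1  # odd boards couple directly to the motherboard
    return {
        "J5-J16": "component" if direct else "solder",
        "J17-J28": "solder" if direct else "component",
        "J3-J4": "component" if direct else "solder",
    }


def describe(position):
    """Summarize one board: numeral, motherboard connector, PCI access, sides."""
    connector = MOTHERBOARD_CONNECTOR.get(position)  # None for even boards
    return {
        "board": BOARDS[position],
        "motherboard_connector": connector,
        # Only connector 1523 (first board) carries PCI signals;
        # connectors 1522 and 1521 carry power and ground only.
        "pci_access": connector == 1523,
        "sides": connector_sides(position),
    }
```

Calling `describe(3)` reports that board 1532 uses motherboard connector 1522 without PCI access, which matches the statement that only the first board carries the PCI controller.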

FIGS. 40A and 40B show the array connections across different boards. To facilitate the manufacturing process, a single layout design is used for all the boards. As explained above, the boards connect to other boards through connectors without a backplane. FIG. 40A shows two exemplary boards 1611 (board 2) and 1610 (board 1). The component side of board 1610 faces the solder side of board 1611. Board 1611 contains numerous FPGA logic devices, other components, and wire lines. Particular nodes of these logic devices and other components on board 1611 are denoted node A' (reference numeral 1612) and node B' (reference numeral 1614). Node A' is coupled to connector pad 1616 via PCB trace 1620. Similarly, node B' is coupled to connector pad 1617 via PCB trace 1623.

Similarly, board 1610 contains numerous FPGA logic devices, other components, and wire lines. Particular nodes of these logic devices and other components on board 1610 are denoted node A (reference numeral 1613) and node B (reference numeral 1615). Node A is coupled to connector pad 1618 via PCB trace 1625. Similarly, node B is coupled to connector pad 1619 via PCB trace 1622.

The routing of signals between nodes located on different boards using surface mount connectors will now be described. In FIG. 40A, the desired connections are (1) between node A and node B', as indicated by imaginary paths 1620, 1621, and 1622, and (2) between node B and node A', as indicated by imaginary paths 1623, 1624, and 1625. These connections are for paths such as the asymmetric interconnects between board 1551 and board 1552 of FIG. 39. Other asymmetric interconnects include the NH-SH interconnects 1577, 1579, and 1581 on both sides of connectors 1589-1590.

A-A' and B-B' correspond to symmetric interconnects such as interconnect 1515 (N, S). The N and S interconnects use through-hole connectors, whereas the NH and SH asymmetric interconnects use SMD connectors. See Table D.

The actual implementation using surface mount connectors will now be described with reference to FIG. 40B, using like numbers for like items. In FIG. 40B, board 1611 shows node A' on the component side coupled to component-side connector pad 1636 via PCB trace 1620. Component-side connector pad 1636 is coupled to solder-side connector pad 1639 via conductive path 1651. Solder-side connector pad 1639 is coupled to component-side connector pad 1642 on board 1610 via conductive path 1648. Finally, component-side connector pad 1642 is coupled to node B via PCB trace 1622. Thus, node A' on board 1611 can be coupled to node B on board 1610.

Likewise, in FIG. 40B, board 1611 shows node B' on the component side coupled to component-side connector pad 1638 via PCB trace 1623. Component-side connector pad 1638 is coupled to solder-side connector pad 1637 via conductive path 1650. Solder-side connector pad 1637 is coupled to component-side connector pad 1640 via conductive path 1645. Finally, component-side connector pad 1640 is coupled to node A via PCB trace 1625. Thus, node B' on board 1611 can be coupled to node A on board 1610. Because these boards share the same layout, conductive paths 1652 and 1653 can be used in the same manner as conductive paths 1650 and 1651 for another board placed adjacent to board 1610. Thus, a unique inter-board connection scheme is provided using surface mount and through-hole connectors without the use of switching components.

F. Timing-Insensitive Glitch-Free (TIGF) Logic Devices

One embodiment of the present invention solves both the hold time and the clock glitch problems. In accordance with one embodiment of the invention, when the user design is configured into the hardware model of the reconfigurable computing system, the standard logic devices (e.g., latches, flip-flops) found in the user design are replaced with emulation logic devices, namely timing-insensitive glitch-free (TIGF) logic devices. In one embodiment, a trigger signal incorporated into the EVAL signal is used to update the values stored in these TIGF logic devices. After waiting during the evaluation period for the various inputs and other signals to propagate through the hardware model of the user design and reach steady state, the trigger signal is provided to update the values stored or latched by the TIGF logic devices. Thereafter, a new evaluation period begins. In one embodiment, this evaluation period-trigger period cycle is periodic.
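The evaluate-then-trigger discipline can be illustrated with a small behavioral model. The sketch below is not the TIGF circuit itself; it is a generic two-phase register model, with hypothetical class and function names, that shows why separating input capture from the output update makes the result independent of the order (and hence the timing) in which the storage elements are reached.

```python
class TwoPhaseRegister:
    """Behavioral stand-in for a storage element updated by a trigger."""

    def __init__(self, q=0):
        self.q = q          # visible output
        self._pending = q   # value captured during the evaluation period

    def evaluate(self, d):
        # Evaluation period: record the settled input, leave the output alone.
        self._pending = d

    def trigger(self):
        # Trigger: the output takes the value captured during evaluation.
        self.q = self._pending


def step(regs, s_in, order):
    """One evaluation period followed by one trigger for a serial chain."""
    for i in order:  # the visiting order models arbitrary arrival times
        d = s_in if i == 0 else regs[i - 1].q
        regs[i].evaluate(d)
    for reg in regs:
        reg.trigger()
    return [reg.q for reg in regs]


def naive_step(q, s_in, order):
    """Single-phase update: each element changes as soon as it is visited."""
    q = list(q)
    for i in order:
        q[i] = s_in if i == 0 else q[i - 1]
    return q
```

With initial outputs `[0, 1, 0]` and `s_in = 1`, `step` returns `[1, 0, 1]` for any visiting order, whereas `naive_step` with order `[0, 1, 2]` lets the new value ripple through and returns `[1, 1, 1]`.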

The hold time problem mentioned above will now be discussed briefly. As is known to those skilled in the art, a common problem in logic circuit design is the hold time violation. Hold time is defined as the minimum amount of time that the data input(s) of a logic element must be held stable after the control input (e.g., the clock input) changes to latch, capture, or store the value indicated by the data input(s); otherwise, the logic element will not operate properly.

A shift register example is used to illustrate the hold time requirement. FIG. 75A shows an exemplary shift register in which three D-type flip-flops are connected serially; that is, the output of flip-flop 2400 is coupled to the input of flip-flop 2401, whose output is in turn coupled to the input of flip-flop 2402. The overall input signal S_in is coupled to the input of flip-flop 2400, and the overall output signal S_out is generated from the output of flip-flop 2402. All three flip-flops receive a common clock signal at their respective clock inputs. This shift register design is based on the assumptions that (1) the clock signal reaches all the flip-flops at the same time, and (2) after the edge of the clock signal is detected, the input of each flip-flop does not change for the duration of the hold time.

The timing diagram of FIG. 75B illustrates the hold time assumption for a system that does not violate the hold time requirement. The hold time varies from one logic element to the next and is not always specified in the specification sheet. At time t₀, the clock input changes from logic 0 to logic 1. As shown in FIG. 75A, the clock input is provided to each of the flip-flops 2400-2402. From this clock edge at t₀, the input S_in must remain stable for the hold time T_H, which lasts from time t₀ to time t₁. Similarly, the inputs of flip-flops 2401 (D₂) and 2402 (D₃) must also remain stable for the hold time from the trigger edge of the clock signal. Because this condition is satisfied in FIGS. 75A and 75B, the input S_in is shifted into flip-flop 2400, the input at D₂ (logic 0) is shifted into flip-flop 2401, and the input at D₃ (logic 1) is shifted into flip-flop 2402. As is known to those skilled in the art, after the clock edge has triggered, the new values at the input of flip-flop 2401 (logic 1 at input D₂) and at the input of flip-flop 2402 (logic 0 at input D₃) will be shifted into or stored in the next flip-flop at the next clock cycle, assuming the hold time requirement is satisfied. The following table summarizes the operation of the shift register for these exemplary values.

                  | D₁ | D₂ | D₃ | Q₃
Before clock edge | 1  | 0  | 1  | 0
After clock edge  | 1  | 1  | 0  | 1
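The table can be reproduced with a few lines of Python. This is a minimal behavioral sketch of an ideal three-stage shift register with a common clock and no hold time violation; the function names are hypothetical, and the initial outputs (Q₁ = 0, Q₂ = 1, Q₃ = 0) are inferred from the "before" row of the table.

```python
def data_inputs(q, s_in):
    """Return (D1, D2, D3) for outputs q = (Q1, Q2, Q3) and serial input s_in."""
    q1, q2, _q3 = q
    return (s_in, q1, q2)


def clock_edge(q, s_in):
    """Ideal clock edge: every flip-flop captures its own D input at once."""
    return data_inputs(q, s_in)


def table_row(q, s_in):
    """Return (D1, D2, D3, Q3), the columns used in the table."""
    return data_inputs(q, s_in) + (q[2],)


before = (0, 1, 0)             # Q1, Q2, Q3 before the clock edge
after = clock_edge(before, 1)  # S_in is held at logic 1
```

Here `table_row(before, 1)` gives `(1, 0, 1, 0)` and `table_row(after, 1)` gives `(1, 1, 0, 1)`, matching the two rows of the table.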

In an actual implementation, the clock signal does not reach all the logic elements at exactly the same time; rather, the circuit is designed so that the clock signal reaches all the logic elements nearly or substantially simultaneously. The circuit must be designed so that the clock skew, that is, the difference in arrival time of the clock signal at each flip-flop, is smaller than the hold time requirement. All the logic elements then capture the proper input values. In the example shown in FIGS. 75A and 75B, a hold time violation caused by the clock signal arriving at flip-flops 2400-2402 at different times can result in some flip-flops capturing the old input values while other flip-flops capture the new input values. As a result, the shift register will not operate properly.

In a reconfigurable logic (e.g., FPGA) implementation of the same shift register design, if the clock is generated directly from a primary input, the circuit can be designed so that a low-skew network distributes the clock signal to all the logic elements, which then detect the clock edge at substantially the same time. Primary clocks are generated from self-timed test-bench processes. Typically, the primary clock signals are generated in software, and only a few (i.e., 1-10) primary clocks are found in a typical user circuit design.

However, if the clock signal is generated from internal logic instead of a primary input, the hold time becomes more of an issue. Derived or gated clocks are generated from a network of combinational logic and registers that are in turn driven by the primary clocks. Many (i.e., 1,000 or more) derived clocks are found in a typical user circuit design. Without extra precautions or additional controls, these clock signals may reach each logic element at a different time, and the clock skew may be longer than the hold time. This can cause a circuit design, such as the shift register circuit illustrated in FIGS. 75A and 75B, to fail.

The hold time violation will now be discussed using the same shift register circuit illustrated in FIG. 75A. This time, however, the individual flip-flops of the shift register circuit are spread across multiple reconfigurable logic chips (e.g., multiple FPGA chips), as shown in FIG. 76A. The first FPGA chip 2411 contains internally derived clock logic 2410 that provides a clock signal CLK to some components of FPGA chips 2412-2416. In this example, the internally generated clock signal CLK is provided to flip-flops 2400-2402 of the shift register circuit. Chip 2412 contains flip-flop 2400, chip 2415 contains flip-flop 2401, and chip 2416 contains flip-flop 2402. Two other chips 2413 and 2414 are provided to illustrate the hold time violation concept.

The clock logic 2410 in chip 2411 receives a primary clock input (or possibly another derived clock input) to generate the internal clock signal CLK. This internal clock signal CLK travels to chip 2412 and is labeled CLK1. The internal clock signal CLK from clock logic 2410 also travels to chip 2415, via chips 2413 and 2414, and is labeled CLK2. As shown, CLK1 is input to flip-flop 2400 and CLK2 is input to flip-flop 2401. Both CLK1 and CLK2 experience wire trace delays, so that the edges of CLK1 and CLK2 are delayed from the edge of the internal clock signal CLK. Furthermore, CLK2 experiences additional delays because it travels through the two other chips 2413 and 2414.

Referring to the timing diagram of FIG. 76B, the internal clock signal CLK is generated and triggers at time t₂. Because of wire trace delays, CLK1 does not arrive at flip-flop 2400 in chip 2412 until time t₃, which is a delay of T₁. As shown in the table above, the output at Q₁ (or input D₂) is logic 0 before the arrival of the clock edge of CLK1. After the edge of CLK1 is detected at flip-flop 2400, the input at D₁ must remain stable for the requisite hold time H₂ (i.e., until time t₄). At that point, flip-flop 2400 shifts in or stores the input logic 1, so that the output at Q₁ (or D₂) is logic 1.

While this is occurring at flip-flop 2400, the clock signal CLK2 is making its way to flip-flop 2401 in chip 2415. The delay T₂ caused by chips 2413 and 2414 causes CLK2 to arrive at flip-flop 2401 at time t₅. By then the input at D₂ is logic 1, and once the hold time is satisfied for flip-flop 2401, this logic 1 value appears at output Q₂ (or D₃). Thus, output Q₂ is logic 1 before the arrival of CLK2 and continues to be logic 1 after the arrival of CLK2. This is an incorrect result: the shift register should have shifted in logic 0. Flip-flop 2400 correctly shifted in the old input value (logic 1), but flip-flop 2401 incorrectly shifted in the new input value (logic 1). This incorrect operation generally occurs when the clock skew (or timing delay) is greater than the hold time. In this example, T₂ > T₁ + H₂. In sum, unless some precautionary measures are taken, hold time violations are likely to occur when a clock signal is generated in one chip and distributed to other logic elements residing in different chips, as shown in FIG. 76A.
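The failure condition T₂ > T₁ + H₂ can be checked with a short timing model. The following Python sketch is an illustration under simplifying assumptions: all times are measured from the CLK edge at t₂, the output of flip-flop 2400 is taken to change at t₄ = T₁ + H₂ as in the description of FIG. 76B, and the function names and numeric values are hypothetical.

```python
def hold_violation(t1_delay, t2_delay, hold):
    """True when CLK2 reaches flip-flop 2401 after D2 has already changed.

    t1_delay: delay T1 of CLK1 to flip-flop 2400
    t2_delay: delay T2 of CLK2 to flip-flop 2401
    hold:     hold time H2 after which Q1 (= D2) takes its new value
    """
    return t2_delay > t1_delay + hold


def value_captured_by_2401(t1_delay, t2_delay, hold, old_d2=0, new_d2=1):
    """Value that flip-flop 2401 stores when its clock edge finally arrives."""
    t4 = t1_delay + hold  # instant at which D2 switches to the new value
    t5 = t2_delay         # instant at which CLK2 triggers flip-flop 2401
    return new_d2 if t5 > t4 else old_d2
```

For example, with T₁ = 1, H₂ = 1, and T₂ = 5 (arbitrary units), `hold_violation` is true and flip-flop 2401 captures the new value 1 instead of the old value 0; with T₂ = 1 the old value 0 is captured, as intended.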

The clock glitch problem mentioned above will now be discussed with reference to FIGS. 77A and 77B. Generally, when the inputs of a circuit change, the outputs change to some random value for a very brief time before they settle to the correct value. If another circuit inspects the output at just the wrong time and reads the random value, the results can be incorrect and difficult to debug. A random value that adversely affects another circuit is called a glitch. In typical logic circuits, one circuit may generate the clock signal for another circuit. If uncompensated timing delays exist in one or both circuits, a clock glitch (i.e., an unplanned occurrence of a clock edge) may occur, which may cause an incorrect result. Like hold time violations, clock glitches occur because certain logic elements in the circuit design change values at different times.

FIG. 77A shows an exemplary logic circuit in which some logic elements generate a clock signal for another set of logic elements; that is, D-type flip-flop 2420, D-type flip-flop 2421, and exclusive-OR (XOR) gate 2422 generate a clock signal (CLK3) for D-type flip-flop 2423. Flip-flop 2420 receives its data input at D₁ on line 2425 and outputs data at Q₁ on line 2427. It receives its clock input (CLK1) from clock logic 2424. CLK refers to the clock signal originally generated by clock logic 2424, and CLK1 refers to the same signal delayed in time by the time it reaches flip-flop 2420.

Flip-flop 2421 receives its data input at D₂ on line 2426 and outputs data at Q₂ on line 2428. It receives its clock input (CLK2) from clock logic 2424. CLK refers to the clock signal originally generated by clock logic 2424, and CLK2 refers to the same signal delayed in time by the time it reaches flip-flop 2421.

The outputs from flip-flops 2420 and 2421 on lines 2427 and 2428, respectively, are input to XOR gate 2422. XOR gate 2422 outputs data, labeled CLK3, to the clock input of flip-flop 2423. Flip-flop 2423 also receives data at D₃ on line 2429 and outputs data at Q₃.

The clock glitch problem that may arise in this circuit will now be discussed with reference to the timing diagram illustrated in FIG. 77B. The CLK signal triggers at time t₀. By the time this clock signal (i.e., CLK1) reaches flip-flop 2420, it is already time t₁. CLK2 does not reach flip-flop 2421 until time t₂.

Assume that the inputs at D₁ and D₂ are both logic 1. When CLK1 reaches flip-flop 2420 at time t₁, the output at Q₁ becomes logic 1 (as shown in FIG. 77B). CLK2 arrives at flip-flop 2421 somewhat later, at time t₂, so the output Q₂ on line 2428 remains at logic 0 from time t₁ to time t₂. During the time period between t₁ and t₂, XOR gate 2422 therefore generates a logic 1 as CLK3 at the clock input of flip-flop 2423, even though the desired signal is logic 0 (1 XOR 1 = 0). This generation of CLK3 during the time period between t₁ and t₂ is a clock glitch. Consequently, whatever logic value is present at D₃ on input line 2429 of flip-flop 2423 is stored, whether this is desired or not, and flip-flop 2423 is then ready for the next input on line 2429. In a properly designed circuit, the time delays of CLK1 and CLK2 would be minimized so that no clock glitch is generated, or at least the clock glitch would last for such a short duration that it would not affect the rest of the circuit. In the latter case, if the clock skew between CLK1 and CLK2 is short enough, the XOR gate delay is long enough to filter out the glitch, and the rest of the circuit is not affected.
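The glitch window can be modeled directly from the arrival times of CLK1 and CLK2. The sketch below is a simplified illustration with hypothetical function names: both flip-flop outputs are assumed to start at logic 0, clock-to-output delay is ignored, and the filtering effect of the XOR gate is approximated by treating any pulse no wider than the gate delay as suppressed.

```python
def q_at(t, t_clk, d, q_initial=0):
    """Flip-flop output at time t: it holds q_initial until its clock arrives."""
    return d if t >= t_clk else q_initial


def clk3_at(t, t1, t2, d1=1, d2=1):
    """XOR of Q1 and Q2 at time t; t1 and t2 are the CLK1/CLK2 arrival times."""
    return q_at(t, t1, d1) ^ q_at(t, t2, d2)


def glitch_width(t1, t2, d1=1, d2=1):
    """Width of the unintended CLK3 pulse when both outputs rise from 0 to 1."""
    if d1 == 1 and d2 == 1:
        return abs(t2 - t1)  # CLK3 is 1 only while exactly one output has risen
    return 0


def glitch_propagates(t1, t2, xor_delay):
    """Assumed filter model: pulses no wider than the gate delay are absorbed."""
    return glitch_width(t1, t2) > xor_delay
```

With t₁ = 2 and t₂ = 5 (arbitrary units), `clk3_at` is 0 before t₁, 1 between t₁ and t₂, and 0 again afterwards, so flip-flop 2423 sees a spurious clock pulse three units wide.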

Two known solutions to the hold time violation problem are (1) timing adjustment and (2) timing resynthesis. Timing adjustment, disclosed in U.S. Pat. No. 5,475,830, requires inserting sufficient delay elements (such as buffers) in certain signal paths to lengthen the hold time of the logic elements. For example, adding sufficient delay on inputs D₂ and D₃ of the shift register circuit can prevent a hold time violation. Accordingly, FIG. 78 shows the same shift register circuit with delay elements 2430 and 2430 added to inputs D₂ and D₃, respectively. The delay elements 2430 can then be designed so that time t₄ occurs after time t₅, such that T2<T1+H2 (FIG. 76B) and no hold time violation occurs.

One potential problem with the timing adjustment solution is that it relies too heavily on the spec sheet of the FPGA chip. As is known to those of ordinary skill in the art, reconfigurable logic chips such as FPGA chips implement logic elements with look-up tables. The delay of the look-up tables in a chip is given in the spec sheet, and a designer who uses the timing adjustment method to prevent hold time violations relies on that specified delay. However, the delay is only an estimate and varies from chip to chip. Another potential problem with the timing adjustment method is that the designer must also compensate for the wiring delays present throughout the circuit design. Although this is not an impossible task, estimating wiring delay is time-consuming and error-prone. Moreover, the timing adjustment method does not solve the clock glitch problem.

Another solution is timing resynthesis, introduced in the VirtualWires technology of IKOS. The timing resynthesis concept involves transforming the user's circuit design into a functionally equivalent design while strictly controlling the timing of clocks and pin-out signals through finite state machines and registers. Timing resynthesis retimes the user's circuit design so that it is driven by a single high-speed clock. It also converts latches, gated clocks, and multiple synchronous and asynchronous clocks into a flip-flop-based, single-clock synchronous design. Thus, timing resynthesis uses registers at the input and output pin-outs of each chip to control inter-chip signal movement precisely so that no inter-chip hold time violation occurs. Timing resynthesis also uses a finite state machine in each chip to schedule inputs from other chips, schedule outputs to other chips, and schedule updates of internal flip-flops based on the reference clock.
Using the same shift register circuit described with respect to FIGS. 75A, 75B, 76A, and 76B, FIG. 79 shows an example of a timing resynthesis circuit. The basic three-flip-flop shift register design has been transformed into a functionally equivalent circuit. Chip 2430 includes the original internal clock generation logic 2435, coupled to register 2443 via line 2448. Clock logic 2435 generates the CLK signal. A first finite state machine 2438 is also coupled to register 2443 via line 2449. Register 2443 and the first finite state machine 2438 are controlled by a design-independent global reference clock.
The CLK signal is also passed through chips 2432 and 2433 before reaching chip 2434. In chip 2432, a second finite state machine 2440 controls register 2445 via line 2462. The CLK signal is sent from register 2443 to register 2445 via line 2461. Register 2445 outputs the CLK signal via line 2463 to the next chip 2433. Chip 2433 includes a third finite state machine 2441 that controls register 2446 via line 2464. Register 2446 outputs the CLK signal to chip 2434.
Chip 2431 includes the original flip-flop 2436. Register 2444 receives input S_in and passes it to the D₁ input of flip-flop 2436 via line 2452. The Q₁ output of flip-flop 2436 is coupled to register 2466 via line 2454. A fourth finite state machine 2439 controls register 2444 via line 2451, register 2466 via line 2455, and flip-flop 2436 via latch enable line 2453. The fourth finite state machine 2439 also receives the original clock signal CLK from chip 2430 via line 2450.
Chip 2434 includes the original flip-flop 2437, which receives at its D₂ input, via line 2456, the signal from register 2466 of chip 2431. The Q₂ output of flip-flop 2437 is coupled to register 2447 via line 2457. A fifth finite state machine 2442 controls register 2447 via line 2459 and flip-flop 2437 via latch enable line 2458. The fifth finite state machine 2442 also receives the original clock signal CLK from chip 2430 by way of chips 2432 and 2433.
With timing resynthesis, the finite state machines 2438-2442, the registers 2443-2447 and 2466, and a single global reference clock are used to control signal flow across multiple chips and to update the internal flip-flops. Thus, in chip 2430, distribution of the CLK signal to the other chips is scheduled by the first finite state machine 2438 through register 2443. Similarly, in chip 2431, the fourth finite state machine 2439 schedules delivery of input S_in to flip-flop 2436 through register 2444 and delivery of the Q₁ output through register 2466. The latching function of flip-flop 2436 is also controlled by the latch enable signal from the fourth finite state machine 2439. The same principle applies to the logic in the other chips 2432-2434. With this strict control of the inter-chip input delivery schedule, the inter-chip output delivery schedule, and the internal flip-flop state updates, inter-chip hold time violations are eliminated.
However, the timing resynthesis technique requires the user's circuit design to be transformed into a functionally equivalent circuit that includes finite state machines and registers. Typically, the additional logic needed to implement this technique amounts to 20% of the logic available in each chip. Moreover, the technique does not by itself guard against the clock glitch problem. To avoid clock glitches, designers who use the timing resynthesis technique must take additional precautionary steps. One conventional design approach is to design the circuit so that the inputs to a logic element driven by a gated clock never change at the same time. An improved approach uses gate delay to filter out glitches so that the rest of the circuit is unaffected. As described above, however, timing resynthesis still requires additional measures to avoid clock glitches.

Various embodiments of the present invention that solve both the hold time and clock glitch problems are now discussed. During configuration mapping of the user design into the software model of the RCC computing system and the hardware model of the RCC array, the latch shown in FIG. 18A is emulated with a timing insensitive glitch-free (TIGF) latch in accordance with one embodiment of the present invention. Similarly, the flip-flop design shown in FIG. 18B is emulated with a TIGF flip-flop in accordance with one embodiment of the present invention. These TIGF logic elements, whether in the form of a latch or a flip-flop, may also be called emulation logic elements. Updates of the TIGF latches and flip-flops are controlled by a global trigger signal.
In one embodiment of the present invention, not all logic elements found in the user design circuit are replaced with TIGF logic elements. A user design circuit includes portions that are enabled or clocked by the primary clocks and other portions that are controlled by gated or derived clocks. Because hold time violations and clock glitches arise where logic elements are controlled by gated or derived clocks, only those particular logic elements controlled by gated or derived clocks are replaced with TIGF logic elements in accordance with one embodiment of the present invention. In another embodiment, all logic elements found in the user design circuit are replaced with TIGF logic elements.
Before the TIGF latch and flip-flop embodiments of the present invention are described, the global trigger signal is described. In general, the global trigger signal causes the TIGF latches and flip-flops to hold their state (the old input value) during the evaluation period and to update their state (to the new input value) during the short trigger period. In one embodiment, the global trigger signal, shown in FIG. 82, is separate from and derived from the EVAL signal described above. In this embodiment, the global trigger signal has a long evaluation period followed by a short trigger period. The global trigger signal tracks the EVAL signal during the evaluation period, and at the conclusion of the EVAL cycle a short trigger signal is generated to update the TIGF latches and flip-flops. In another embodiment, the EVAL signal itself is the global trigger signal: it has one logic state (logic 0) during the evaluation period and another logic state (logic 1) during the non-evaluation, or TIGF latch/flip-flop update, period.

The evaluation period, described above with reference to the RCC computing system and the RCC hardware array, is used to propagate all primary inputs and flip-flop/latch device changes through the entire user design, one simulation cycle at a time. During this propagation, the RCC system waits until all signals in the system reach steady state. The evaluation period is calculated after the user design has been mapped and placed into the appropriate reconfigurable logic elements (e.g., FPGA chips) of the RCC array. The evaluation period is therefore design-specific; that is, the evaluation period for one user design may differ from the evaluation period for another user design. The evaluation period must be long enough to ensure that all signals in the system propagate through the entire system and reach steady state before the next short trigger period.
The short trigger period occurs adjacent in time to the evaluation period, as shown in FIG. 82. In one embodiment of the present invention, the short trigger period occurs after the evaluation period. Before this short trigger period, the input signals propagate through the hardware-model-configured portion of the user design circuit during the evaluation period. The short trigger period, marked in accordance with one embodiment of the present invention by a change in the logic state of the EVAL signal, controls all the TIGF latches and flip-flops in the user design so that they are updated with the new values that have propagated from the evaluation period once steady state has been reached. The short trigger period is distributed globally over a low-skew network and can be as short as the reconfigurable logic elements allow for proper operation (i.e., the duration from t₀ to t₁ as well as the duration from t₂ to t₃ shown in FIG. 82). During the short trigger period, new primary inputs are sampled at every input stage of the TIGF latches and flip-flops, and the previously stored values of the same TIGF latches and flip-flops are exposed to the next stage of the RCC hardware model of the user design. In the description below, the portion of the global trigger signal that occurs during the short trigger period is referred to as the TIGF trigger, the TIGF trigger signal, the trigger signal, or simply the trigger.
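The relationship between a design-specific evaluation period and the short trigger period that follows it can be sketched as below. This is an illustrative assumption only: the specification does not give a formula, so the "longest path delay plus margin" rule, the function names, and all numeric values are hypothetical.

```python
def evaluation_period_ns(path_delays_ns, margin_ns=1.0):
    """Hypothetical rule: cover the slowest propagation path plus a margin,
    so every signal reaches steady state before the trigger."""
    if not path_delays_ns:
        raise ValueError("at least one path delay is required")
    return max(path_delays_ns) + margin_ns


def trigger_windows(eval_ns, trigger_ns, cycles):
    """Return (start, end) of each short trigger period.

    Each cycle is a long evaluation period followed by a short trigger period.
    """
    period = eval_ns + trigger_ns
    return [(i * period + eval_ns, (i + 1) * period) for i in range(cycles)]


# Two hypothetical user designs mapped onto the same hardware.
design_a = evaluation_period_ns([12.0, 30.5, 18.0])  # 31.5
design_b = evaluation_period_ns([45.0, 22.0])        # 46.0

print(trigger_windows(design_a, trigger_ns=2.0, cycles=2))
# [(31.5, 33.5), (65.0, 67.0)]
```

The two designs end up with different evaluation periods, which mirrors the statement that the evaluation period is design-specific, while the trigger width stays fixed and short.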

FIG. 80(A) shows the latch 2470 originally shown in FIG. 18(A). The latch operates as follows:

if (#S), Q ← 1
else if (#R), Q ← 0
else if (en), Q ← D
else Q retains its previous value.

Because this latch is level-sensitive and asynchronous, output Q follows input D for as long as the clock input is enabled and the latch enable input is asserted.
FIG. 80(B) shows a TIGF latch in accordance with one embodiment of the present invention. Like the latch of FIG. 80(A), the TIGF latch has a D input, an enable input, a set (S), a reset (R), and an output Q. In addition, the TIGF latch has a trigger input. The TIGF latch includes a D flip-flop 2471, a multiplexer 2472, an OR gate 2473, an AND gate 2474, and various interconnections.
D flip-flop 2471 receives its input from the output of AND gate 2474 via line 2476. The D flip-flop is triggered at its clock input by the trigger signal on line 2477, which is distributed globally by the RCC system according to a strict schedule based on the evaluation cycle. The output of D flip-flop 2471 is coupled to one input of multiplexer 2472 via line 2478. The other input of multiplexer 2472 is coupled to the TIGF latch D input on line 2475. The multiplexer is controlled by the enable signal on line 2484. The output of multiplexer 2472 is coupled to one input of OR gate 2473 via line 2479. The other input of OR gate 2473 is coupled to the set (S) input on line 2480. The output of OR gate 2473 is coupled to one input of AND gate 2474 via line 2481. The other input of AND gate 2474 is coupled to the reset (R) signal on line 2482. The output of AND gate 2474 is fed back to the input of D flip-flop 2471 via line 2476, as mentioned above.
The operation of this TIGF latch in accordance with an embodiment of the present invention is now described. In this embodiment, D flip-flop 2471 holds the current state (i.e., the old value) of the TIGF latch. Line 2476 at the input of D flip-flop 2471 presents the new input value that is yet to be latched into the TIGF latch. Line 2476 presents the new value because the main input (D input) of the TIGF latch on line 2475 ultimately makes its way from the input of multiplexer 2472 (given the appropriate enable signal on line 2484, which is eventually presented) through OR gate 2473 and finally through AND gate 2474 on line 2483, which feeds the new input signal of the TIGF latch back to D flip-flop 2471 on line 2476. The trigger signal on line 2477 updates the TIGF latch by clocking the new input value into D flip-flop 2471. Thus, the output on line 2478 of D flip-flop 2471 represents the current state (i.e., the old value) of the TIGF latch, while the input on line 2476 represents the new input value to be latched by the TIGF latch.
Multiplexer 2472 receives the current state from D flip-flop 2471 as well as the new input value on line 2475. Enable line 2484 functions as the selector for multiplexer 2472. Because the TIGF latch is not updated until the trigger signal is provided on line 2477, the D input of the TIGF latch on line 2475 and the enable input on line 2484 can arrive at the TIGF latch in any order. If this TIGF latch (and the other TIGF latches in the hardware model of the user design) encounters a situation that would normally cause a hold time violation in a circuit built with conventional latches, as described with respect to FIGS. 76(A) and 76(B), where one clock signal arrives much later than another, the TIGF latch functions properly by holding the proper old value until the trigger signal is provided on line 2477.
The trigger signal is distributed through a low-skew global clock network.
This TIGF latch also solves the clock glitch problem. Note that the clock signal is replaced by the enable signal in the TIGF latch. The enable signal on line 2484 may glitch often during the evaluation period, but the TIGF latch continues to hold its current state without error. The only mechanism by which the TIGF latch can be updated is the trigger signal, which in one embodiment is provided after the evaluation period, once the signals have reached steady state.
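The difference between a conventional level-sensitive latch and the TIGF latch under a glitching enable can be illustrated with a behavioral sketch. It is a simplified model under stated assumptions: the next-value logic follows the behavioral listing for the latch (S, then R, then enable) rather than the exact gate polarities, which are not given in this passage, and the class names and waveform are hypothetical.

```python
class ConventionalLatch:
    """Level-sensitive latch: Q follows D whenever enable is high."""

    def __init__(self):
        self.q = 0

    def apply(self, d, en, s=0, r=0):
        if s:
            self.q = 1
        elif r:
            self.q = 0
        elif en:
            self.q = d
        return self.q


class TIGFLatch:
    """Behavioral sketch of the TIGF latch: state changes only on trigger."""

    def __init__(self):
        self.state = 0  # held by D flip-flop 2471

    def next_value(self, d, en, s=0, r=0):
        """Combinational value presented at the D flip-flop input (line 2476)."""
        if s:
            return 1
        if r:
            return 0
        return d if en else self.state  # multiplexer 2472 selected by enable

    def trigger(self, d, en, s=0, r=0):
        """Short trigger period: clock the settled value into the flip-flop."""
        self.state = self.next_value(d, en, s, r)
        return self.state


# Evaluation period: enable glitches high once while D = 1, then settles low.
enable_waveform = [0, 1, 0, 0]
conv, tigf = ConventionalLatch(), TIGFLatch()
for en in enable_waveform:
    conv.apply(d=1, en=en)
    tigf.next_value(d=1, en=en)  # transient value only; nothing is stored
tigf.trigger(d=1, en=enable_waveform[-1])  # commit the steady-state value

print(conv.q)      # 1: the glitch was captured
print(tigf.state)  # 0: the glitch was ignored
```

Only the value present at steady state, when the trigger arrives, is stored, so the order in which D and enable arrive during evaluation does not matter in this model.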

FIG. 81(A) shows the flip-flop 2490 originally shown in FIG. 18(B). The flip-flop operates as follows:
if (#S), Q ← 1
else if (#R), Q ← 0
else if (positive edge of CLK), Q ← D
else Q retains its previous value.

Because this flip-flop is edge-triggered, output Q follows input D at the positive edge of the clock signal for as long as the flip-flop enable input is asserted.

FIG. 81(B) shows a TIGF D-type flip-flop in accordance with one embodiment of the present invention. Like the flip-flop of FIG. 81(A), the TIGF flip-flop has a D input, a clock input, a set (S), a reset (R), and an output Q. In addition, the TIGF flip-flop has a trigger input. The TIGF flip-flop includes three D flip-flops 2491, 2492, and 2496, a multiplexer 2493, an OR gate 2494, two AND gates 2495 and 2497, and various interconnections.
Flip-flop 2491 receives the TIGF D input on line 2498 and the trigger input on line 2499, and provides a Q output on line 2500. This output line 2500 also serves as one of the inputs of multiplexer 2493. The other input of multiplexer 2493 comes from the Q output of flip-flop 2492 via line 2503. The output of multiplexer 2493 is coupled via line 2505 to one of the inputs of OR gate 2494. The other input of OR gate 2494 is the set (S) signal on line 2506. The output of OR gate 2494 is coupled via line 2507 to one of the inputs of AND gate 2495. The other input of AND gate 2495 is the reset (R) signal on line 2508. The output of AND gate 2495 (which is also the overall TIGF output Q) is coupled to the input of flip-flop 2492 via line 2501. Flip-flop 2492 also has a trigger input on line 2502.
Returning to multiplexer 2493, its selector input is coupled to the output of AND gate 2497 via line 2509. AND gate 2497 receives one of its inputs from the CLK signal on line 2510 and the other from the output of flip-flop 2496 via line 2512. Flip-flop 2496 also receives an input from the CLK signal on line 2511 and a trigger input on line 2513.
The operation of the TIGF flip-flop in accordance with an embodiment of the present invention is now described. In this embodiment, the TIGF flip-flop receives the trigger signal at three different points: D flip-flop 2491 via line 2499, D flip-flop 2492 via line 2502, and D flip-flop 2496 via line 2513.
The TIGF flip-flop stores the input value only when an edge of the clock signal has been detected. In accordance with one embodiment of the present invention, the required edge is the rising edge of the clock signal. An edge detector 2515 is provided to detect the rising edge of the clock signal. Edge detector 2515 includes D flip-flop 2496 and AND gate 2497. Edge detector 2515 is also updated by the trigger signal on line 2513 of D flip-flop 2496.

D flip-flop 2491 holds the new input value of the TIGF flip-flop and resists any change on the D input on line 2498 until the trigger signal is provided on line 2499. Thus, before each evaluation period of the TIGF flip-flop, the new value is stored in D flip-flop 2491. Because the new value is stored in advance and held until the TIGF flip-flop is updated by the trigger signal, the TIGF flip-flop avoids hold time violations.
D flip-flop 2492 holds the current value (or old value) of the TIGF flip-flop until the trigger signal is provided on line 2502. This value is the state of the emulated TIGF flip-flop after the TIGF flip-flop has been updated and before the next evaluation period. The input of D flip-flop 2492 on line 2501 holds the new value (which is the same as the value on line 2500 for a significant portion of the evaluation period).
Multiplexer 2493 receives the new input value on line 2500 and the old value currently stored in the TIGF flip-flop on line 2503. Based on the selector signal on line 2504, the multiplexer outputs either the new value (line 2500) or the old value (line 2503) as the output of the emulated TIGF flip-flop. Before all the propagated signals in the hardware model of the user design reach steady state, this output may change with any clock glitch. Thus, at the end of the evaluation period, the input on line 2501 presents the new value stored in flip-flop 2491. When the trigger signal is received by the TIGF flip-flop, flip-flop 2492 stores the new value present on line 2501, and flip-flop 2491 stores the next new value on line 2498. The TIGF flip-flop in accordance with one embodiment of the present invention is therefore not adversely affected by clock glitches.

In addition, the TIGF flip-flop provides immunity to clock glitches. Those of ordinary skill in the art will appreciate that if flip-flops 2420, 2421, and 2423 are replaced with the TIGF flip-flop of FIG. 81(B), clock glitches do not affect any circuit that uses the TIGF flip-flop. Referring to FIGS. 77(A) and 77(B), the clock glitch adversely affects the circuit of FIG. 77(A) because flip-flop 2423 clocks in a new value during the interval between t₁ and t₂, when it should not. The skew between CLK1 and CLK2 causes XOR gate 2422 to generate a logic 1 state during the interval between t₁ and t₂, which drives the clock line of the next flip-flop 2423. In a TIGF flip-flop in accordance with one embodiment of the present invention, a clock glitch has no effect on the clocking in of new values. If flip-flop 2423 is replaced with a TIGF flip-flop, then once the signals have reached steady state during the evaluation period, the trigger signal in the short trigger period enables the TIGF flip-flop to store the new value in flip-flop 2491 (FIG. 81(B)). Thereafter, any clock glitch, such as the one in FIG. 77(B) during the interval between t₁ and t₂, does not clock in a new value. The TIGF flip-flop updates only on the trigger signal, and the trigger signal is not presented to the TIGF flip-flop until after the evaluation period, when signal propagation through the circuit has reached steady state.

In this particular embodiment the TIGF flip-flop is a D-type flip-flop, but other flip-flops (e.g., T, JK, SR) are within the scope of the present invention. Other types of edge-triggered flip-flops can be derived from the D flip-flop by adding AND/OR logic in front of the D input.
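The derivation of other flip-flop types from a D flip-flop can be illustrated with a short behavioral sketch. The equations used here are the standard textbook input equations, not logic taken from the patent figures, and all class and function names are hypothetical. The sketch models only the AND/OR front end, not the TIGF trigger mechanism.

```python
class DFlipFlop:
    """Edge-triggered D flip-flop: Q takes the value of D on each clock edge."""

    def __init__(self, q: int = 0) -> None:
        self.q = q

    def clock(self, d: int) -> int:
        self.q = d & 1
        return self.q


def jk_input(j: int, k: int, q: int) -> int:
    # D = (J AND NOT Q) OR (NOT K AND Q)
    return (j & (1 - q)) | ((1 - k) & q)


def t_input(t: int, q: int) -> int:
    # D = (T AND NOT Q) OR (NOT T AND Q), i.e. toggle when T is 1
    return (t & (1 - q)) | ((1 - t) & q)


def sr_input(s: int, r: int, q: int) -> int:
    # D = S OR (NOT R AND Q); S = R = 1 is normally disallowed
    return s | ((1 - r) & q)


class DerivedFlipFlop:
    """A D flip-flop with combinational AND/OR logic in front of its D input."""

    def __init__(self, input_logic, q: int = 0) -> None:
        self.input_logic = input_logic
        self.dff = DFlipFlop(q)

    def clock(self, *inputs: int) -> int:
        d = self.input_logic(*inputs, self.dff.q)
        return self.dff.clock(d)


jk = DerivedFlipFlop(jk_input)
print([jk.clock(1, 0), jk.clock(0, 0), jk.clock(1, 1), jk.clock(0, 1)])
```

Because only the logic in front of the D input changes, the same storage element, and in principle the same trigger scheme, can be reused for each derived type.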

VII. Simulation Server

A simulation server according to another embodiment of the present invention allows multiple users to access the same reconfigurable hardware unit in order to simulate and accelerate the same or different user designs in a time-shared manner. A high-speed simulation scheduler and a hardware state swapping mechanism are used to feed the simulation server with active simulation processes, which results in high throughput. The server lets multiple users or processes access the reconfigurable hardware unit for acceleration and hardware state swapping purposes. Once acceleration has been accomplished or the hardware state has been accessed, each user or process can simulate in software only, thereby releasing control of the reconfigurable hardware unit to other users or processes.

In the simulation server portion of this specification, terms such as "job" and "process" are used. In this specification, the terms "job" and "process" are generally used interchangeably. In the past, batch systems executed "jobs" and time-shared systems stored and executed "processes" or programs. In today's systems, jobs and processes are similar. Thus, in this specification, the term "job" is not limited to batch-type systems and "process" is not limited to time-shared systems. Rather, at one extreme, a "job" is the same as a "process" if the "process" can be executed within a time slice or without being interrupted by any other time-sharing interrupter; at the other extreme, when completion requires multiple time slices, a "job" is a subset of a "process". So, if a "process" requires multiple time slices to execute to completion because other equal-priority users/processes are present, the "process" is divided into "jobs". Moreover, if the "process" does not require multiple time slices to execute to completion, either because it belongs to the sole high-priority user or because it is short enough to complete within a time slice, the "process" is the same as a "job". Thus, a user can interact with one or more "processes" or programs that have been loaded and executed in the simulation system, and each "process" may require one or more "jobs" to complete on the time-shared system.

In one configuration of the present invention, multiple users at remote terminals can use the same microprocessor workstation in a non-network environment to access the same reconfigurable hardware unit and to review/debug the same or different user circuit designs. In a non-network environment, the remote terminals are connected to a main computing system for access to its operational functions. This non-network configuration allows multiple users to share access to the same user design for parallel debugging purposes. Access is accomplished through a time-sharing process in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware unit access among the scheduled users. In other cases, multiple users may access the same reconfigurable hardware unit through the server for their own separate and different user designs for debugging purposes. In another configuration, multiple users or processes share the multiple microprocessors in the workstation along with the operating system. In yet another configuration, multiple users or processes on separate microprocessor-based workstations can access the same reconfigurable hardware unit across a network to review/debug the same or different user circuit designs. Similarly, access is accomplished through a time-sharing process in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware unit access among the scheduled users. In a network environment, the scheduler receives network requests through UNIX socket system calls. The operating system uses sockets to send commands to the scheduler.

As stated above, the simulation scheduler uses a preemptive multi-priority round-robin algorithm. In other words, higher-priority users or processes are served first, until the user or process completes its job and ends the session. Among users or processes of equal priority, a preemptive round-robin algorithm is used in which each user or process is assigned an equal time slice in which to execute its operations until completion. The time slice is short enough that multiple users or processes do not have to wait long before being served. The time slice is also long enough that sufficient operations are executed before the scheduler of the simulation server interrupts one user or process in order to swap in and execute a new user's job. In one embodiment of the present invention, the default time slice is 5 seconds and is user-settable. In one embodiment, the scheduler makes specific calls to the operating system's built-in scheduler.

FIG. 45 shows a non-network environment with a multiprocessor workstation in accordance with one embodiment of the present invention. FIG. 45 is a variation of FIG. 1, and accordingly, like reference numerals are used for like components/units. Workstation 1100 includes local bus 1105, host/PCI bridge 1106, memory bus 1107, and main memory 1108. A cache memory system (not shown) may also be provided. Other user interface units (e.g., monitor, keyboard) are also provided but are not shown in FIG. 45. Workstation 1100 also includes multiple microprocessors 1101, 1102, 1103, and 1104 coupled to local bus 1105 via scheduler 1117 and connection/path 1118. As known to those skilled in the art, operating system 1121 provides the user-hardware interface foundation for the entire computing environment, managing files and allocating resources for the various users, processes, and devices in the computing environment. Operating system 1121 and bus 1122 are shown for conceptual purposes. For references on operating systems, see Abraham Silberschatz and James L. Peterson, OPERATING SYSTEM CONCEPTS (1988) and William Stallings, MODERN OPERATION SYSTEM (1996).

In one embodiment of the present invention, workstation 1100 is a Sun Microsystems Enterprise 450 system using UltraSPARC II processors. Instead of accessing memory via the local bus, the Sun 450 system allows the multiple processors to access memory through dedicated buses to the memory via a crossbar switch. Thus, multiple processes can be run by multiple microprocessors that execute their respective instructions and access memory without going through the local bus. Reference is made to the specifications of the Sun 450 system and the Sun UltraSPARC multiprocessor. The Sun Ultra 60 system is another example of a microprocessor system, although it allows only two processors.

Scheduler 1117 provides time-shared access to reconfigurable hardware unit 20 via device driver 1119 and connection/path 1120. Scheduler 1117 is implemented mostly in software, to interact with the operating system of the host computing system, and partly in hardware, to interact with the simulation server by supporting the interruption of simulation jobs and the swapping of simulation sessions in and out. Scheduler 1117 and device driver 1119 are discussed in more detail below.

Each microprocessor 1101-1104 is capable of processing independently of the other microprocessors in workstation 1100. In one embodiment of the present invention, workstation 1100 runs a UNIX-based operating system, although in other embodiments workstation 1100 may run a Windows-based or Macintosh-based operating system. On UNIX-based systems, the user is provided with X-Windows for managing programs, tasks, and files as needed. For details on the UNIX operating system, see Maurice J. Bach, THE DESIGN OF THE UNIX OPERATING SYSTEM (1986).

In FIG. 45, multiple users can access workstation 1100 via remote terminals. At times, each user may be using a particular CPU to run its processes. At other times, each user uses different CPUs depending on resource limitations. Usually, operating system 1121 makes such access determinations, and the operating system itself may jump from one CPU to another to accomplish its tasks. To handle the time-sharing process, the scheduler receives network requests through socket system calls and makes system calls to operating system 1121, which in turn handles preemption by initiating the generation of interrupt signals by device driver 1119 to reconfigurable hardware unit 20. Such interrupt signal generation is one of several steps in the scheduling algorithm, which include stopping the current job, saving state information for the currently interrupted job, swapping jobs, and executing the new job. The server scheduling algorithm is discussed below.

Sockets and socket system calls will now be discussed briefly. The UNIX operating system, in one embodiment, can operate in a time-sharing mode. The UNIX kernel allocates the CPU to a process for a period of time (e.g., a time slice) and, at the end of the time slice, preempts the process and schedules another one for the next time slice. The process preempted from the previous time slice is rescheduled for execution in a later time slice.

One scheme for enabling and facilitating interprocess communication and for allowing the use of sophisticated network protocols is the socket. The kernel has three layers that function in the context of a client-server model: the socket layer, the protocol layer, and the device layer. The top layer, the socket layer, provides the interface between the system calls and the lower layers (the protocol layer and the device layer). Typically, a socket has end points that couple client processes with server processes. The socket end points can be on different machines. The middle layer, the protocol layer, provides the protocol modules for communication, such as TCP and IP. The bottom layer, the device layer, contains the device drivers that control the network devices. One example of a device driver is an Ethernet driver on an Ethernet-based network.

Processes communicate using the client-server model, in which a server process listens on a socket at one end point and a client process communicates with the server process over another socket at the other end point of a two-way communication path. The kernel maintains internal connections among the three layers of each client and server and routes data from client to server as needed.

The socket mechanism comprises several system calls, including the socket system call, which establishes the end points of a communication path. Many processes use the socket descriptor in many system calls. The bind system call associates a name with a socket descriptor. Other exemplary system calls include the connect system call, which requests that the kernel make a connection to a socket; the close system call, which closes a socket; the shutdown system call, which closes a socket connection; and the send and receive system calls, which transmit data over a connected socket.
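The system calls listed above map directly onto the methods of Python's standard `socket` module, so a minimal loopback sketch can show the order in which they are typically used. The upper-casing "server" and the message text are illustrative assumptions only; they do not represent the simulation server's actual behavior.

```python
import socket
import threading


def recv_all(sock: socket.socket) -> bytes:
    """Read until the peer signals end-of-stream."""
    chunks = []
    while True:
        chunk = sock.recv(1024)          # recv()
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def serve_once(listener: socket.socket) -> None:
    """Accept one client, read its request, reply in upper case."""
    conn, _addr = listener.accept()
    with conn:                           # close() on exit
        conn.sendall(recv_all(conn).upper())   # send()


def demo() -> bytes:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket()
    listener.bind(("127.0.0.1", 0))      # bind(): port 0 lets the OS pick a port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))  # connect()
    client.sendall(b"status")            # send()
    client.shutdown(socket.SHUT_WR)      # shutdown(): client has finished sending
    reply = recv_all(client)
    client.close()                       # close()
    worker.join()
    listener.close()
    return reply


if __name__ == "__main__":
    print(demo())  # b'STATUS'
```

Here `shutdown` closes only the sending direction of the connection, which lets the server detect the end of the request while the client can still read the reply; `close` then releases the descriptor.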

FIG. 46 shows another embodiment of the present invention in which multiple workstations share a single simulation system on a time-shared basis across a network. The multiple workstations are coupled to the simulation system via scheduler 1117. Within the computing environment of the simulation system, a single CPU 11 is coupled to local bus 12 in station 1110. Multiple CPUs may also be provided in this system. As known to those skilled in the art, an operating system 1118 is also provided, and nearly all processes and applications reside on top of the operating system. Operating system 1121 and bus 1122 are shown for conceptual purposes.

In FIG. 46, workstation 1110 includes the components/units shown in FIG. 1, along with a scheduler bus 1118 coupled to local bus 12 via scheduler 1117 and operating system 1121. Scheduler 1117 controls time-shared access for user stations 1111, 1112, and 1113 by making socket calls to operating system 1121. Scheduler 1117 is implemented mostly in software and partly in hardware.

In this figure, only three users are shown, each able to access the simulation system across the network. Of course, other system configurations may provide for more than three users or fewer than three users. Each user accesses the system via remote station 1111, 1112, or 1113. Remote user stations 1111, 1112, and 1113 are coupled to scheduler 1117 via network connections 1114, 1115, and 1116, respectively.

As known to those skilled in the art, device driver 1119 is coupled between PCI bus 50 and reconfigurable hardware unit 20. A connection or electrically conductive path 1120 is provided between device driver 1119 and reconfigurable hardware unit 20. In this network multi-user embodiment of the present invention, scheduler 1117 interfaces with device driver 1119 via operating system 1121 to communicate with and control reconfigurable hardware unit 20 for hardware acceleration and for simulation after hardware state restoration.

Again, in one embodiment of the present invention, simulation workstation 1100 is a Sun Microsystems Enterprise 450 system using UltraSPARC II multiprocessors. Instead of accessing memory via the local bus, the Sun 450 system allows the multiprocessors to access memory via dedicated buses to the memory through a crossbar switch.

FIG. 47 shows a high-level structure of the simulation server in accordance with the network embodiment of the present invention. Here, the operating system is not explicitly shown but, as known to those skilled in the art, it is always present for file management and resource allocation purposes to serve the various users, processes, and devices in the simulation computing environment. Simulation server 1130 includes scheduler 1137, one or more device drivers 1138, and reconfigurable hardware unit 1139. Although not expressly shown as a single integral unit in FIGS. 45 and 46, the simulation server there comprises scheduler 1117, device driver 1119, and reconfigurable hardware unit 20. Returning to FIG. 47, simulation server 1130 is coupled to three workstations (or users) 1131, 1132, and 1133 via network connections/paths 1134, 1135, and 1136, respectively. As stated above, more than three or fewer than three workstations may be coupled to simulation server 1130.

The scheduler in the simulation server is based on a preemptive round-robin algorithm. In essence, the round-robin scheme allows several users or processes to execute sequentially to completion in a cyclic fashion. Thus, each simulation job (which is associated with a workstation in a network environment or with a user/process in a multiprocessing non-network environment) is assigned a priority level and a fixed time slice in which to execute.

Generally, higher-priority jobs execute first, to completion. At one extreme, if different users each have different priorities, the user with the highest priority is served first until that user's job completes, and the user with the lowest priority is served last. Here, no time slice is used, because each user has a different priority and the scheduler merely serves users according to priority. This scenario is analogous to having only one user accessing the simulation system until completion.

At the other extreme, the different users all have the same priority. Thus, the time-slice concept with a first-in first-out (FIFO) queue is employed. Among equal-priority jobs, each job executes until it completes or until the fixed time slice expires, whichever comes first. If a job does not execute to completion during its time slice, the simulation image associated with whatever work it has completed must be saved for later restoration and execution. The job is then placed at the end of the queue. The saved simulation image, if any, for the next job is then restored, and that job is executed in the next time slice.

A higher-priority job can preempt a lower-priority job. In other words, jobs of equal priority run in round-robin fashion, time slice by time slice, until they execute to completion. Thereafter, jobs of lower priority run in round-robin fashion. If a higher-priority job is inserted in the queue while a lower-priority job is running, the higher-priority job preempts the lower-priority job and runs until it executes to completion. Thus, jobs of higher priority execute to completion before jobs of lower priority begin execution. If the lower-priority job has already begun execution, it is not executed further until the higher-priority job has executed to completion.
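The scheduling rules described above can be exercised with a small discrete-time simulation. This is a sketch under stated assumptions rather than the patented implementation: time advances in abstract units, a larger number means a higher priority, a preempted job is placed at the back of its own priority queue (the text specifies this only for slice expiry), and the names `Job` and `schedule` are hypothetical. Saving and restoring simulation images is represented only by the job keeping its `remaining` count.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int      # assumption: a larger number means a higher priority
    remaining: int     # time units of simulation work still to do
    arrival: int = 0   # time at which the job is inserted in the queue


def schedule(jobs, time_slice):
    """Simulate the scheduler; return (name, start, end) execution intervals."""
    pending = deque(sorted(jobs, key=lambda j: j.arrival))
    ready = {}                       # priority level -> FIFO queue of jobs
    trace = []
    now, current, used, start = 0, None, 0, 0

    def highest():
        return max((p for p, q in ready.items() if q), default=None)

    def record():
        # Merge intervals when the same job simply continued running.
        if trace and trace[-1][0] == current.name and trace[-1][2] == start:
            trace[-1] = (current.name, trace[-1][1], now)
        else:
            trace.append((current.name, start, now))

    while pending or current is not None or any(ready.values()):
        while pending and pending[0].arrival <= now:
            job = pending.popleft()
            ready.setdefault(job.priority, deque()).append(job)

        if current is not None:
            top = highest()
            finished = current.remaining == 0
            preempted = top is not None and top > current.priority
            expired = used >= time_slice
            if finished or preempted or expired:
                record()
                if not finished:
                    # Unfinished job: state is kept, job rejoins its queue.
                    ready.setdefault(current.priority, deque()).append(current)
                current = None

        if current is None:
            top = highest()
            if top is None:
                now += 1             # idle until the next arrival
                continue
            current, used, start = ready[top].popleft(), 0, now

        current.remaining -= 1       # run the selected job for one time unit
        used += 1
        now += 1
    return trace


jobs = [Job("A", 1, 3), Job("B", 1, 2), Job("C", 2, 2, arrival=1)]
print(schedule(jobs, time_slice=2))
# [('A', 0, 1), ('C', 1, 3), ('B', 3, 5), ('A', 5, 7)]
```

In the sample run, job C arrives at time 1 with a higher priority and preempts A; once C completes, the equal-priority jobs B and A share the hardware in FIFO order.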


In one embodiment of the present invention, the UNIX operating system provides the basic, foundational preemptive round-robin scheduling algorithm. The scheduling algorithm of the simulation server according to one embodiment of the present invention works in conjunction with the operating system's scheduling algorithm. In UNIX-based systems, the preemptive nature of the scheduling algorithm allows the operating system to preempt user-defined schedules. To enable the time-sharing scheme, the simulation scheduler uses a preemptive multi-priority round-robin algorithm on top of the operating system's own scheduling algorithm.

The relationship between the multiple users and the simulation server according to one embodiment of the present invention follows a client-server model, in which the multiple users are the clients and the simulation server is the server. Communication between the user clients and the server occurs via socket calls. Referring to FIG. 55, the client includes client program 1109, socket system call component 1123, UNIX kernel 1124, and TCP/IP protocol component 1125. The server includes TCP/IP protocol component 1126, UNIX kernel 1127, socket system call component 1128, and simulation server 1129. Multiple clients may request that simulation jobs be simulated on the server through UNIX socket calls from the client application program.

In one embodiment of the present invention, a typical sequence of events includes multiple clients sending requests to the server via the UNIX socket protocol. For each request, the server acknowledges the request by indicating whether the command was executed successfully. For a server queue status request, the server instead replies with the current queue state so that it can be properly displayed to the user. Table F below lists the relevant socket commands from the client.


Table F: Client Socket Commands

Command    Description
0          Start simulation <design>
1          Stop simulation <design>
2          Exit simulation <design>
3          Reassign priority to simulation session
4          Save design simulation state
5          Queue status

For each socket call, each command, encoded as an integer, may be followed by additional parameters, such as <design>, which represents the design name. The response from the simulation server is "0" if the command was executed successfully and "1" if the command failed. For command "5", which requests the queue status, one embodiment of the command's return response is ASCII text terminated by a "\0" character, for display on the user's screen. With these system socket calls, the appropriate communication protocol signals are transmitted to and received from the reconfigurable hardware unit via the device driver.
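The command set of Table F and the response convention described above can be sketched as a small encoder and decoder. The text specifies only that commands are integer-encoded, that parameters may follow, and that the reply is "0", "1", or NUL-terminated ASCII text. The byte layout below (ASCII digit, a space, the parameter, a terminating NUL) and all identifier names are assumptions made for illustration.

```python
from enum import IntEnum


class Command(IntEnum):
    """Client socket commands from Table F (member names are hypothetical)."""
    START = 0
    STOP = 1
    EXIT = 2
    REASSIGN_PRIORITY = 3
    SAVE_STATE = 4
    QUEUE_STATUS = 5


def encode_request(command: Command, design: str = "") -> bytes:
    # Assumed wire format: "<command>[ <design>]\0" in ASCII.
    text = str(int(command))
    if design:
        text += " " + design
    return text.encode("ascii") + b"\0"


def decode_request(payload: bytes):
    text = payload.rstrip(b"\0").decode("ascii")
    number, _, design = text.partition(" ")
    return Command(int(number)), design


def decode_response(command: Command, payload: bytes):
    """Return the queue text for QUEUE_STATUS, otherwise True on success."""
    if Command(command) == Command.QUEUE_STATUS:
        # ASCII text terminated by "\0", intended for display to the user.
        return payload.split(b"\0", 1)[0].decode("ascii")
    return payload[:1] == b"0"       # "0" = success, "1" = failure


request = encode_request(Command.START, "cpu_core")
print(request)                       # b'0 cpu_core\x00'
print(decode_request(request))
print(decode_response(Command.STOP, b"1"))   # False
```

A client would send the encoded request over a connected socket and pass the bytes it reads back to `decode_response`.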

FIG. 48 shows one embodiment of the simulation server in accordance with the present invention. As described above, multiple users or multiple processes may be served by a single simulation server for simulation and hardware acceleration of user designs in a time-shared manner. Thus, users/processes 1147, 1148, and 1149 are coupled to simulation server 1140 via interprocess communication paths 1150, 1151, and 1152, respectively. Interprocess communication paths 1150, 1151, and 1152 may reside in the same workstation, for multiprocessor configuration and operation, or in a network, for multiple workstations. Each simulation session contains a software simulation state along with a hardware state for communication with the reconfigurable hardware unit. Interprocess communication among the software sessions is performed using UNIX sockets or system calls, which make it possible for a simulation session to reside on the same workstation in which the simulator plug-in card is installed or on a separate workstation connected via a TCP/IP network. Communication with the simulation server is initiated automatically.

In FIG. 48, simulation server 1140 includes server monitor 1141, simulation job queue table 1142, priority sorter 1143, job swapper 1144, device driver 1145, and reconfigurable hardware unit 1146. Simulation job queue table 1142, priority sorter 1143, and job swapper 1144 make up scheduler 1137 shown in FIG. 47.

The server monitor 1141 provides user interface functions for the administrator of the system. The user can monitor the status of the simulation server by commanding the system to display the simulation jobs in the queue, scheduling priorities, usage history, and simulation job swapping efficiency. Other utility functions include compiling job priorities, deleting simulation jobs, and resetting the simulation server state.

The simulation job queue table 1142 keeps a list of all outstanding simulation requests that the scheduler has inserted into the queue. The table entries include job number, software simulation process number, software simulation image, hardware simulation image file, design configuration file, priority number, hardware size, software size, cumulative time of the simulation run, and owner identification. The job queue is implemented as a first-in first-out (FIFO) queue. Thus, when a new job is requested, it is placed at the end of the queue.
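
The queue table described above can be pictured as a list of records held in arrival order. The following is only a minimal sketch under stated assumptions: the class names `JobEntry` and `JobQueueTable`, the field names, and the field types are illustrative choices that do not appear in the specification, which lists the table entries but does not define their representation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class JobEntry:
    """One row of the simulation job queue table (field names are hypothetical)."""
    job_number: int
    sw_process_number: int
    sw_image: str             # software simulation image
    hw_image_file: str        # hardware simulation image file
    design_config_file: str   # design configuration file
    priority: int
    hw_size: int
    sw_size: int
    cumulative_run_time: float
    owner: str


class JobQueueTable:
    """FIFO list of outstanding simulation requests."""

    def __init__(self):
        self._queue = deque()

    def request(self, entry: JobEntry) -> None:
        # A newly requested job is placed at the end of the queue.
        self._queue.append(entry)

    def outstanding(self) -> list:
        # All outstanding requests, oldest first.
        return list(self._queue)

    def pop_oldest(self) -> JobEntry:
        # Remove and return the job that has waited longest.
        return self._queue.popleft()
```

A caller would create one `JobEntry` per incoming request and pass it to `request`; the priority sorter would then inspect `outstanding()` rather than always taking the oldest job.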

The priority sorter 1143 decides which job in the queue is executed. In one embodiment, the simulation job priority scheme is user definable (i.e., controllable and definable by the system administrator) to control which simulation process has priority for current execution. In one embodiment, the priority levels can be modified based on the urgency of specific processes or the importance of specific users. In another embodiment, the priority levels are dynamic and can change during the course of the simulation. In a preferred embodiment, the priority is based on the user ID. Typically, one user has a high priority and all other users have a lower but equal priority.
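
A user-ID-based priority decision of this kind can be sketched as follows. This is an illustration only, assuming that a smaller number means a higher priority and that jobs of equal priority are served in queue order; the names `pick_next_job`, `HIGH`, and `LOW` and the tuple layout `(job_number, owner)` are hypothetical and not taken from the specification.

```python
HIGH, LOW = 1, 2  # assumed encoding: smaller number = higher priority


def pick_next_job(queue, user_priority, default=LOW):
    """Return the queued job whose owner has the highest priority.

    queue is a list of (job_number, owner) tuples in arrival order.
    user_priority maps a user ID to a priority level; users that are
    not listed receive the default (low) level.
    """
    if not queue:
        return None
    # min() returns the first minimal element, so jobs with equal
    # priority keep their FIFO order.
    return min(queue, key=lambda job: user_priority.get(job[1], default))
```

With one user marked `HIGH` and everyone else left at the default, the sketch reproduces the typical arrangement described above: the high-priority user's job is chosen first, and the remaining jobs are taken in arrival order.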

The priority levels are settable by the system administrator. The simulation server obtains all user information from the UNIX facility, typically found in the UNIX user file called "/etc/passwd". Adding a new user follows the same process as adding a new user to the UNIX system. After all users are defined, the simulation server monitor can be used to adjust the priority levels for the users.
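
As a rough illustration of how user information in the "/etc/passwd" format could seed a priority table, the sketch below parses passwd-style text (colon-separated fields, with the login name first) and gives every user the same low level, which an administrator could then adjust. The function names, the sample text, and the numeric levels are assumptions made for this example; the specification does not describe how the server reads the file.

```python
def users_from_passwd(text):
    """Extract login names from passwd-format text (name is the first field)."""
    users = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        users.append(line.split(":", 1)[0])
    return users


def default_priorities(users, low=2):
    """Give every known user the same low priority level (assumed value 2)."""
    return {user: low for user in users}


# Hypothetical sample in the standard passwd layout:
# name:password:UID:GID:comment:home:shell
SAMPLE = """\
alice:x:1001:1001:Alice:/home/alice:/bin/sh
bob:x:1002:1002:Bob:/home/bob:/bin/sh
"""

priorities = default_priorities(users_from_passwd(SAMPLE))
priorities["alice"] = 1  # the administrator raises one user's priority
```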

The job swapper 1144 temporarily replaces one simulation job associated with one process or one workstation with another simulation job associated with another process or workstation, based on the priority determination programmed for the user. If multiple users are simulating the same design, the job swapper swaps in only the stored simulation state for the simulation session. However, if multiple users are simulating multiple designs, the job swapper loads the design for hardware configuration before swapping in the simulation state. In one embodiment, the job swapping mechanism enhances the performance of the time-sharing embodiment of the present invention because job swapping need only be done for reconfigurable hardware unit access. Thus, if one user needs software simulation for some period of time, the server swaps in another job for another user so that this other user can access the reconfigurable hardware unit for hardware acceleration. The frequency of job swapping is user adjustable and programmable. The device driver also communicates with the reconfigurable hardware unit to swap jobs.

The operation of the simulation server will now be discussed. FIG. 49 shows a flow diagram of the simulation server during operation. Initially, at step 1160, the system is idle. When the system is idle at step 1160, the simulation server is not necessarily inactive, nor does it follow that no simulation job is running. Idleness may mean one of the following: (1) no simulation is running; (2) only one user/workstation is active in a single-processor environment, so that time-sharing is not required; or (3) only one user/workstation is active but only one process is running. Thus, conditions 2 and 3 above indicate that the simulation server has only one job to process, so that queuing jobs, determining priorities, and swapping jobs are not necessary; in essence, the simulation server is idle because it receives no requests (event 1161) from other workstations or processes.

When a simulation request occurs due to one or more request signals from a workstation in a multiuser environment or from a microprocessor in a multiprocessor environment, the simulation server queues the incoming simulation job or jobs at step 1162. The scheduler maintains the simulation job queue table, inserting all outstanding simulation requests into its queue and listing them. For batch simulation jobs, the scheduler in the server queues all incoming simulation requests and processes the jobs automatically without human intervention.


Next, the simulation server sorts the queued jobs to determine priority at step 1163. This step is particularly important for multiple jobs, where the server must prioritize among them to provide access to the reconfigurable hardware unit. The priority sorter decides which job in the queue is executed. In one embodiment, the simulation job priority scheme is user definable (i.e., controllable and definable by the system administrator) to control which simulation process has priority for current execution when resource contention exists.

After the priority sorting at step 1163, the server then swaps simulation jobs, if necessary, at step 1164. This step temporarily replaces one simulation job associated with one process or one workstation with another simulation job associated with another process or workstation, based on the priority determination programmed for the scheduler in the server. If multiple users are simulating the same design, the job swapper swaps in only the stored simulation state for the simulation session. However, if multiple users are simulating multiple designs, the job swapper first loads the design before swapping in the simulation state. Here, the device driver also communicates with the reconfigurable hardware unit to swap jobs.

In one embodiment, the job swapping mechanism enhances the performance of the time-sharing embodiment of the present invention because job swapping need only be done for reconfigurable hardware unit access. Thus, if one user needs software simulation for some period of time, the server swaps in another job for another user so that this other user can access the reconfigurable hardware unit for hardware acceleration. For example, assume that two users, user 1 and user 2, are coupled to the simulation server for access to the reconfigurable hardware unit. At one time, user 1 has access to the system so that debugging can be performed on his or her user design. If user 1 is debugging in software mode only, the server can release the reconfigurable hardware unit so that user 2 can access it. The server swaps in the job for user 2, and user 2 can then either simulate in software or accelerate in hardware. Depending on the priorities between user 1 and user 2, user 2 can continue to access the reconfigurable hardware unit for some predetermined time or, if user 1 needs the reconfigurable hardware unit for acceleration, the server can preempt the job for user 2 so that the job for user 1 can be swapped in for hardware acceleration using the reconfigurable hardware unit. The predetermined time refers to the preemption of simulator jobs based on multiple requests of the same priority. In one embodiment, the default time is 5 minutes, although this time is user settable. This 5-minute setting represents one form of a time-out timer. The simulation system of the present invention uses the time-out timer to stop the execution of the current simulation job when that job is excessively time consuming and the system decides that other pending jobs of equal priority should gain access to the reconfigurable hardware model.

Upon completion of the job swapping step at step 1164, the device driver in the server locks the reconfigurable hardware unit so that only the currently scheduled user or process can simulate and use the hardware model. The locking and simulation step occurs at step 1165.

Upon either the completion of the simulation or a pause in the current simulation session at event 1166, the server returns to the priority sorter step 1163 to determine the priority of pending simulation jobs and later swap simulation jobs if necessary. Similarly, the server may preempt the running of the currently active simulation job at event 1167 to return the server to the priority sorter state 1163. Preemption occurs only under certain conditions. One such condition is when a higher-priority job is pending. Another such condition is when the system is currently running a computationally intensive simulation job, in which case the scheduler can be programmed to preempt the currently running job, by means of a time-out timer, in order to schedule a job of equal priority. In one embodiment, the time-out timer is set at 5 minutes; if the current job has executed for 5 minutes, the system preempts the current job and swaps in the pending job even though it is at the same priority level.

FIG. 50 shows a flow diagram of the job swapping process. The job swapping function is performed at step 1164 of FIG. 49 and is shown in the simulation server hardware as the job swapper of FIG. 48. In FIG. 50, when a simulation job needs to be swapped with another simulation job, the job swapper sends an interrupt to the reconfigurable hardware unit at step 1180. If the reconfigurable hardware unit is not currently running any job (i.e., the system is idle or the user is operating in software simulation mode only, without any hardware acceleration), the interrupt immediately prepares the reconfigurable hardware unit for job swapping. However, if the reconfigurable hardware unit is currently running a job and is in the midst of executing an instruction or processing data, the interrupt signal is recognized but the reconfigurable hardware unit continues to execute the current instruction and process the data for the current job. If the reconfigurable hardware unit receives the interrupt signal while the current simulation job is not in the middle of executing an instruction or processing data, the interrupt signal essentially terminates the operation of the reconfigurable hardware unit immediately.

At step 1181, the simulation system saves the current simulation image (i.e., the hardware and software states). By saving this image, the user can later restore the simulation run without re-running the whole simulation up to the saved point.

At step 1182, the simulation system configures the reconfigurable hardware unit with the new user design. This configuration step is necessary only if the new job is associated with a user design different from the one already configured and loaded in the reconfigurable hardware unit, whose execution has just been interrupted. After configuration, the saved hardware simulation image is reloaded at step 1183 and the saved software simulation image is reloaded at step 1184. If the new simulation job is associated with the same design, no additional configuration is needed. For the same design, the simulation system loads the desired hardware simulation image associated with the new simulation job at step 1183, because the simulation image for the new job is probably different from the simulation image for the interrupted job. The details of the configuration step are described elsewhere in this specification. Thereafter, the associated software simulation image is reloaded at step 1184. After the hardware and software simulation images are reloaded, the simulation can begin for the new job, while the previously interrupted job can proceed in software simulation mode only, because it has no access to the reconfigurable hardware unit for the time being.

FIG. 51 shows the signals between the device driver and the reconfigurable hardware unit. The device driver 1171 provides the interface between the scheduler 1170 and the reconfigurable hardware unit 1172. The device driver 1171 also provides the interface between the entire computing environment (i.e., workstation, PCI bus, PCI devices) and the reconfigurable hardware unit 1172, as shown in FIGS. 45 and 46; FIG. 51 shows the simulation server portion only. The signals between the device driver and the reconfigurable hardware unit include the bidirectional communication handshake signals, the unidirectional design configuration information from the computing environment via the scheduler to the reconfigurable hardware unit, the swapped-in simulation state information, the swapped-out simulation state information, and the interrupt signal from the device driver to the reconfigurable hardware unit so that simulation jobs can be swapped.

Line 1173 carries the bidirectional communication handshake signals. These signals and the handshake protocol are discussed further with reference to FIGS. 53 and 54.

Line 1174 carries the unidirectional design configuration information from the computing environment via the scheduler 1170 to the reconfigurable hardware unit 1172. Initial configuration information can be transmitted to the reconfigurable hardware unit 1172 for modeling purposes on this line 1174. In addition, when users are modeling and simulating different user designs, the configuration information must be sent to the reconfigurable hardware unit 1172 during a time slice. When different users are modeling the same user design, no new design configuration is necessary; rather, different simulation hardware states associated with the same design may need to be transmitted to the reconfigurable hardware unit 1172 for the different simulation runs.

Line 1175 carries the swapped-in simulation state information to the reconfigurable hardware unit 1172. Line 1176 carries the swapped-out simulation state information from the reconfigurable hardware unit to the computing environment (i.e., usually memory). The swapped-in simulation state information includes previously saved hardware model state information and the hardware memory state that the reconfigurable hardware unit 1172 needs in order to accelerate. The swapped-in state information is sent at the beginning of a time slice so that the scheduled current user can access the reconfigurable hardware unit 1172 for acceleration. The swapped-out state information includes the hardware model and memory state information that must be saved in memory at the end of a time slice, when the reconfigurable hardware unit 1172 receives an interrupt signal to move on to the next time slice associated with a different user/process. Saving the state information allows the current user/process to restore this state at a later time, such as the next time slice assigned to that user/process.

Line 1177 carries the interrupt signal from the device driver 1171 to the reconfigurable hardware unit so that simulation jobs can be swapped. This interrupt signal is sent between time slices to swap out the current simulation job of the current time slice and swap in the new simulation job for the next time slice.

The communication handshake protocol in accordance with one embodiment of the present invention will now be discussed with reference to FIGS. 53 and 54. FIG. 53 shows the communication handshake signals between the device driver and the reconfigurable hardware unit via a handshake logic interface. FIG. 54 shows a state diagram of the communication protocol. FIG. 51 shows the communication handshake signals on line 1173. FIG. 53 is a detailed view of the communication handshake signals between the device driver 1171 and the reconfigurable hardware unit 1172.
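
To tie the job swapping steps of FIG. 50 to the signal lines of FIG. 51, the sketch below lists which line would be used, and in what order, when one job is swapped for another. It is a simplified illustration, assuming that a job is currently running on the reconfigurable hardware unit; the function name `swap_plan`, the action labels, and the use of design names to compare designs are hypothetical and not part of the specification.

```python
def swap_plan(current_design, new_design):
    """Return the ordered (action, line) pairs for one job swap.

    Line numbers follow FIG. 51: 1177 interrupt, 1176 swapped-out state,
    1174 design configuration, 1175 swapped-in state.
    """
    plan = [
        ("interrupt", 1177),        # step 1180: interrupt the hardware unit
        ("swap_out_state", 1176),   # step 1181: save the current image
    ]
    if new_design != current_design:
        # Step 1182: reconfiguration is needed only for a different design.
        plan.append(("configure_design", 1174))
    plan.append(("swap_in_state", 1175))  # step 1183: load the new job's state
    return plan
```

Comparing `swap_plan("cpu_core", "cpu_core")` with `swap_plan("cpu_core", "dsp_block")` shows the point made in the text: users who share a design exchange only simulation state, while a change of design adds a configuration transfer before the state is swapped in.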


In FIG. 53, a handshake logic interface 1234 is provided in the reconfigurable hardware unit 1172. Alternatively, the handshake logic interface 1234 can be installed external to the reconfigurable hardware unit 1172. Four sets of signals are provided between the device driver 1171 and the handshake logic interface 1234. These signals are the 3-bit SPACE signal on line 1230, the single-bit read/write signal on line 1231, the 4-bit COMMAND signal on line 1232, and the single-bit DONE signal on line 1233. The handshake logic interface includes logic circuitry that processes these signals to place the reconfigurable hardware unit in the proper mode for the various operations that need to be performed. The interface is coupled to the CTRL_FPGA unit (or FPGA I/O controller).

For the 3-bit SPACE signal, data transfers over the PCI bus between the computing environment of the simulation system and the reconfigurable hardware unit are designated for particular I/O address spaces at the software/hardware boundary: REG (register), CLK (software clock), S2H (software to hardware), and H2S (hardware to software). As described above, the simulation system maps the hardware model into four address spaces in main memory according to component type and control function: the REG space is dedicated to the register components; the CLK space is dedicated to the software clocks; the S2H space is dedicated to the output of the software test-bench components to the hardware model; and the H2S space is dedicated to the output of the hardware model to the software test-bench components. These dedicated I/O buffer spaces are mapped to the kernel's main memory space during system initialization.

Table G below describes each SPACE signal.

Table G: SPACE Signal

SPACE   Description
000     Global (or CLK) space and software to hardware (DMA wr)
001     Register write (DMA wr)
010     Hardware to software (DMA rd)
011     Register read (DMA rd)
100     SRAM write (DMA wr)
101     SRAM read (DMA rd)
110     Unused
111     Unused
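
The encodings of Table G can be written down as a small lookup, shown here as an illustrative sketch. The enum member names, the `DMA_DIRECTION` mapping, and the `decode_space` helper are hypothetical names chosen for this example; only the bit patterns and the DMA directions come from the table.

```python
from enum import IntEnum


class Space(IntEnum):
    """3-bit SPACE codes from Table G (member names are illustrative)."""
    GLOBAL_S2H = 0b000   # global (or CLK) space and software to hardware
    REG_WRITE = 0b001
    H2S = 0b010          # hardware to software
    REG_READ = 0b011
    SRAM_WRITE = 0b100
    SRAM_READ = 0b101


# DMA direction for each code, as listed in Table G.
DMA_DIRECTION = {
    Space.GLOBAL_S2H: "wr",
    Space.REG_WRITE: "wr",
    Space.H2S: "rd",
    Space.REG_READ: "rd",
    Space.SRAM_WRITE: "wr",
    Space.SRAM_READ: "rd",
}


def decode_space(bits):
    """Map a 3-bit value to a Space member, or None for the unused codes."""
    if not 0 <= bits <= 0b111:
        raise ValueError("SPACE is a 3-bit signal")
    try:
        return Space(bits)
    except ValueError:
        return None  # 110 and 111 are unused
```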

The read/write signal on line 1231 indicates whether the data transfer is a read or a write. The DONE signal on line 1233 indicates the completion of a DMA data transfer period.

The 4-bit COMMAND signal indicates whether the data transfer operation is to be a write, a read, a configuration of a new user design into the reconfigurable hardware unit, or an interrupt of the simulation. As shown in Table H below, the COMMAND protocol is as follows.

Table H: COMMAND Signal

COMMAND   Description
0000      Write into designated space
0001      Read from designated space
0010      Configure FPGA design
0011      Interrupt simulation
0100      Unused
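
The COMMAND encodings of Table H can be captured in the same way. In this sketch the enum member names and the `decode_command` helper are hypothetical; Table H marks only 0100 as unused, so treating every other unlisted 4-bit code as undefined is an assumption made here for illustration.

```python
from enum import IntEnum


class Command(IntEnum):
    """4-bit COMMAND codes from Table H (member names are illustrative)."""
    WRITE = 0b0000           # write into designated space
    READ = 0b0001            # read from designated space
    CONFIGURE_FPGA = 0b0010  # configure FPGA design
    INTERRUPT_SIM = 0b0011   # interrupt simulation


def decode_command(bits):
    """Map a 4-bit value to a Command member.

    Returns None for 0100 (unused in Table H) and, by assumption,
    for any other code the table does not define.
    """
    if not 0 <= bits <= 0b1111:
        raise ValueError("COMMAND is a 4-bit signal")
    try:
        return Command(bits)
    except ValueError:
        return None
```

A write or read command would be interpreted together with the SPACE signal, which selects the address space that the transfer targets.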

통신 핸드쉐이크 프로토콜이 도54의 상태도를 참조하여 이하에서 설명될 것이다. 상태(1400)에서, 장치 드라이버의 시뮬레이션 시스템은 휴면상태이다. 새로운 명령이 표시되지 않는 한은, 시스템은 경로(1401)에 표시된 바와 같이 휴면상태가 된다. 새로운 명령이 표시되면, 명령 프로세서는 상태(1402)에서 새로운 명령을 처리한다. 일 실시예에서, 명령 프로세서는 FPGA I/O 컨트롤러이다.
COMMAND=0000 또는 COMMAND=0001 이면, 시스템은 상태(1403)에서 SPACE 지수에 의해 지적된 바와 같이 지정 공간에 기록하거나 이로부터 판독한다. 만약 COMMAND=0010이면, 시스템은 사용자 설계로서 리컨피규러블 하드웨어 유닛에서 FPGA을 구성하거나 또는 상태 1404에서 새로운 사용자 설계로 FPGA를 구성한다. 모든 FPGA에 대한 시스템 시퀀스 구성 정보는 하드웨어로 모델링될 수 있는 사용자 설계의 부분을 모델링한다. 그러나, 만약 COMMAND=0011이면, 시스템은 시뮬레이션 시스템을 인터럽트하기 위하여 상태 1405에서 사용자 설계의 부분을 인터럽트하는데, 이는 새로운 시뮬레이션 상태에서 스와핑하기 위하여 시간 슬라이스가 새로운 사용자/프로세스에 대하여 시간이 경과하기 때문이다. 이러한 상태 1403, 1404 또는 1405가 완성되면, 시뮬레이션 시스템은 DONE 상태 1406으로 진행되어 DONE 신호를 생성하고, 상태 1400으로 리턴되어 새로운 커멘드가 존재할 때까지 대기한다.
상이한 레벨의 우선순위로서 다양한 작업을 처리하는 시뮬레이션 서버의 시분할 특징을 이제 설명한다. 도 52는 일 실시예를 도시한다. 4개의 작업(작업 A, 작업 B, 작업 C, 작업 D)이 시뮬레이션 작업 큐의 입력 작업이다. 그러나, 이러한 4개의 작업에 대한 우선순위는 상이하다. 즉, 작업 A 및 B는 높은 우선순위 I에 할당되고, 작업 C 및 D는 낮은 우선순위 II에 할당된다. 도 52의 시간 라인 차트에 도시된 바와 같이, 시분할 리컨피규러블 하드웨어 유닛는 큐된 입력 작업의 우선순위에 따라 수행한다. 시간 1190에서, 시뮬레이션은 리컨피규러블 하드웨어 액세스하는 작업 A로 시작된다. 시간 1191에서, 작업 A는 작업 B에 의해 선점(preempt)되는데, 이는 작업 B가 작업 A와 동일한 우선순위를 가지며 스케쥴러가 2개의 작업에 동일한 시분할 액세스를 제공하기 때문이다. 이제 작업 B는 리컨피규러블 하드웨어 유닛에 액세스되어야 한다. 시간 1192에서, 작업 A는 작업 B를 선취하고 작업 A는 시간 1193에서 실행된다. 시간 1193에서, 작업 B가 인수하여 시간 1194까지 실행 완료한다. 시간 1194에서, 작업 C(이는 큐에서 다음에 있지만 작업 A 및 B보다는 우선순위가 낮다)가 이제 실행을 위하여 리컨피규러블 하드웨어 유닛에 액세스한다. 시간 1195에서, 작업 D는 시분할 액세스를 위하여 작업 C를 선취하는데, 이는 작업 D가 작업 C와 동일한 우선순위를 가지기 때문이다. 작업 D는 이제 작업 C에 의해 선취되는 시간 1196까지 액세스한다. 작업 C는 시간 1197ㅇ서 완성된다. 시간 1197에서 작업 D가 인수하고, 시간 1198까지 실행 완료한다.
VIII. 메모리 시뮬레이션
본 발명의 일 실시예에 따른 메모리 시뮬레이션 또는 메모리 맵핑은 시뮬레이션 시스템이 사용자 설계의 구성 하드웨어 모델과 관련된 다양한 메모리 블럭을 관리하는 효율적 방법을 제공한다. 상기 사용자 설계는 리컨피규러블 하드웨어 유닛에서 FPGA의 어레이로 프로그램된다. 본 발명의 일 실시예를 구현함으로써, 메모리 시뮬레이션 수단은 메모리 액세스를 처리하는 FPGA 칩에서의 어떠한 전용 핀을 요구하지 않는다.
여기서 사용된 바와 같이, "메모리 액세스"라는 용어는 사용자 설계가 구현된 FPGA 로직 소자 및 사용자 설계와 관련된 모든 메모리 블럭을 저장하는 SRAM 메모리 소자 사이의 기록 액세스 또는 판독 액세스를 말한다. 따라서, 기록 동작은 FPGA 로직 소자에서 SRAM 메모리 소자로 데이터를 전송하는 한편, 판독 동작은 SRAM 메모리 소자에서 FPGA 로직 소자로 데이터를 전송한다. 도 56을 참조하면, FPGA 로직 소자는 1201(FPGA 1), 1202(FPGA 2), 1203(FPGA 3), 및 1204(FPGA 4)를 포함한다. SRAM 메모리 소자는 메모리 1205 및 1206을 포함한다.
또한, "DMA 데이터 전송"이라는 용어는 당업자에게 통용되는 일반적 의미외에도 컴퓨팅 시스템 및 시뮬레이션 시스템 사이의 데이터 전송을 의미한다. 컴퓨팅 시스템은 도 1, 45, 46에, 리컨피규러블 하드웨어 유닛 및 소프트웨어에 존재하는, 시뮬레이션 시스템을 지원하는 메모리를 구비한 전체 PCI-기초 시스템으로서 도시되어 있다. 선택된 디바이스에서, 운영 시스템으로/으로부터 요청하는 소켓/시스템은 또한 리컨피규러블 하드웨어 유닛 및 운영 시스템과의 적절한 인터페이스를 허용하는 시뮬레이션 시스템의 부분이다. 본 발명의 일 실시예에서, DMA 판독 전송은 FPGA 로직 소자(및 초기화와 메모리 컨텐츠 덤프를 위한 FPGA SRAM 메모리 소자)에서 호스트 컴퓨팅 시스템으로의 데이터 전송을 포함한다. DMA 기록 전송은 호스트 컴퓨팅 시스템에서 FPGA 로직 소자(및 초기화와 메모리 컨텐츠 덤프를 위한 FPGA SRAM 메모리 소자)로의 데이터 전송을 포함한다.
"FPGA 데이터 버스", "FPGA 버스", "FD 버스" 및 이들의 유사어는, 디버깅될 구성 및 프로그램된 사용자 설계를 포함하는 FPGA 로직 소자 및 SRAM 메모리 소자를 커플링하는 하이 뱅크 버스 FD[63:32] 및 로우 뱅크 버스 FD[31:0]을 말한다.
메모리 시뮬레이션 시스템은 메모리 상태 머신, 평가 상태 머신, 및 하기 장치에 대한 제어 및 인터페이스를 위한 관련 로직을 포함한다: (1) 메인 컴퓨팅 시스템 및 관련 메모리 시스템, (2) 시뮬레이션 시스템에서 FPGA 버스와 커플링된 SRAM 메모리 소자, 및 (3) 디버깅될 구성 및 프로그램된 사용자 설계를 포함하는 FPGA 로직 소자.A communication handshake protocol will be described below with reference to the state diagram of FIG. In state 1400, the simulation system of the device driver is dormant. Unless a new command is displayed, the system sleeps as indicated by path 1401. If a new command is displayed, the command processor processes the new command in state 1402. In one embodiment, the command processor is an FPGA I / O controller.
If COMMAND = 0000 or COMMAND = 0001, the system writes to or reads from the designated space as indicated by the SPACE index in state 1403. If COMMAND = 0010, the system configures the FPGA in the reconfigurable hardware unit as a user design, or configures the FPGA as a new user design in state 1404. The system sequence configuration information for every FPGA models the part of the user design that can be modeled in hardware. However, if COMMAND = 0011, the system interrupts part of the user's design in state 1405 to interrupt the simulation system, because the time slice will time out for the new user / process to swap in the new simulation state. . Upon completion of this state 1403, 1404 or 1405, the simulation system proceeds to DONE state 1406 to generate a DONE signal and returns to state 1400 to wait for a new command to exist.
The time-sharing features of the simulation server that handles various tasks with different levels of priority are now described. 52 illustrates one embodiment. Four tasks (Task A, Task B, Task C, Task D) are the input tasks of the simulation work queue. However, the priorities for these four tasks are different. That is, jobs A and B are assigned to high priority I and jobs C and D are assigned to low priority II. As shown in the time line chart of FIG. 52, the time division reconfigurable hardware unit performs according to the priority of the queued input task. At time 1190, the simulation begins with task A accessing the reconfigurable hardware. At time 1191, task A is preempted by task B, because task B has the same priority as task A and the scheduler provides the same time division access to the two tasks. Task B now needs to access the reconfigurable hardware unit. At time 1192, task A preempts task B and task A is executed at time 1193. At time 1193, task B takes over and completes execution until time 1194. At time 1194, task C (which is next in the queue but of lower priority than tasks A and B) now accesses the reconfigurable hardware unit for execution. At time 1195, task D preempts task C for time division access, because task D has the same priority as task C. Task D now accesses up to time 1196 preempted by task C. Task C is completed at time 1197. Task D takes over at time 1197 and finishes running until time 1198.
VIII. Memory simulation
Memory simulation, or memory mapping, in accordance with one embodiment of the present invention provides an efficient way for the simulation system to manage the various memory blocks associated with the configured hardware model of a user design, which is programmed into the array of FPGA chips in the reconfigurable hardware unit. In this embodiment, the memory simulation scheme does not require any dedicated pins on the FPGA chips to handle memory access.
As used herein, the term "memory access" refers to either a write access or a read access between the FPGA logic devices, in which the user design is configured, and the SRAM memory devices, which store all the memory blocks associated with the user design. Thus, a write operation transfers data from the FPGA logic devices to the SRAM memory devices, while a read operation transfers data from the SRAM memory devices to the FPGA logic devices. Referring to FIG. 56, the FPGA logic devices include 1201 (FPGA1), 1202 (FPGA3), 1203 (FPGA0), and 1204 (FPGA2). The SRAM memory devices include memory devices 1205 and 1206.
The term "DMA data transfer", in addition to its ordinary meaning to those skilled in the art, refers to data transfer between the computing system and the simulation system. The computing system is shown in FIGS. 1, 45, and 46 as the complete PCI-based system, with memory, that supports the simulation system, which resides in software and in the reconfigurable hardware unit. Selected device drivers and socket/system calls to and from the operating system are also part of the simulation system, allowing a proper interface between the operating system and the reconfigurable hardware unit. In one embodiment of the invention, a DMA read transfer transfers data from the FPGA logic devices (and from the FPGA SRAM memory devices, for initialization and memory content dump) to the host computing system. A DMA write transfer transfers data from the host computing system to the FPGA logic devices (and to the FPGA SRAM memory devices, for initialization and memory content dump).
The terms "FPGA data bus", "FPGA bus", "FD bus", and variations thereof refer to the high bank bus FD[63:32] and the low bank bus FD[31:0].
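As a small illustration of the bank naming, the sketch below splits a 64-bit FD[63:0] value into its high bank and low bank halves and joins them again. The helper names `split_fd` and `join_fd` are hypothetical and serve only to make the bit ranges concrete.

```python
MASK32 = 0xFFFF_FFFF


def split_fd(fd: int) -> tuple[int, int]:
    """Split FD[63:0] into (high bank FD[63:32], low bank FD[31:0])."""
    return (fd >> 32) & MASK32, fd & MASK32


def join_fd(high: int, low: int) -> int:
    """Recombine the two 32-bit banks into one 64-bit FD value."""
    return ((high & MASK32) << 32) | (low & MASK32)


high, low = split_fd(0x1122334455667788)
print(hex(high), hex(low))
```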
The memory simulation system includes a memory state machine, an evaluation state machine, and associated logic to control and interface with the following devices: (1) the main computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA buses in the simulation system, and (3) the FPGA logic devices, which contain the configured and programmed user design being debugged.

The FPGA logic device side of the memory simulation system includes an evaluation state machine, an FPGA bus driver, and, for each memory block N, a logic interface to the user's own memory interface in the user design. This side handles (1) data evaluation among the FPGA logic devices and (2) write/read memory accesses between the FPGA logic devices and the SRAM memory devices. In conjunction with the FPGA logic device side, the FPGA I/O controller side includes a memory state machine and interface logic to handle DMA, write, and read operations between (1) the main computing system and the SRAM memory devices and (2) the FPGA logic devices and the SRAM memory devices.
The operation of memory simulation according to one embodiment of the present invention is generally as follows. The simulation write/read cycle is divided into three periods: DMA data transfer, evaluation, and memory access. The DATAXSFR signal indicates the DMA data transfer period, during which the computing system and the SRAM memory units transfer data to each other via the FPGA bus (high bank bus FD[63:32] 1212 and low bank bus FD[31:0] 1213).
During the evaluation period, logic circuitry in each FPGA logic device generates the proper software clock, input enable, and MUX enable signals to the user's design logic for data evaluation. Inter-FPGA logic device communication occurs during this period.
During the memory access period, the memory simulation system waits for the high and low bank FPGA logic devices to put their respective address and control signals onto their respective FPGA data buses. These address and control signals are latched by the CTRL_FPGA unit. If the operation is a write, address, control, and data signals are transferred from the FPGA logic devices to their respective SRAM memory devices. If the operation is a read, address and control signals are provided to the designated SRAM memory devices, and data signals are transferred from the SRAM memory devices to their respective FPGA logic devices. After the desired memory blocks in all the FPGA logic devices have been accessed, the memory simulation write/read cycle is complete, and the memory simulation system waits until the start of the next memory simulation write/read cycle.
FIG. 56 shows a high-level block diagram of the memory simulation configuration in accordance with one embodiment of the present invention. Signals, connections, and buses that are not relevant to the memory simulation aspect of the invention are omitted. The CTRL_FPGA unit 1200 described above is coupled to bus 1210 via line 1209. In one embodiment, the CTRL_FPGA unit 1200 is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K50 chip. Local bus 1210 allows the CTRL_FPGA unit 1200 to be coupled to other simulation array boards (if any) and to other chips (e.g., PCI controller, EEPROM, clock buffer). Line 1209 carries the DONE signal, which indicates the completion of a simulation DMA data transfer period.
FIG. 56 also shows other major functional blocks in the form of logic devices and memory devices. In one embodiment, each logic device is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K130 or 10K250 chip. Thus, instead of the embodiment described above with eight Altera FLEX 10K100 chips in the array, this embodiment uses only four Altera FLEX 10K130 chips. The memory devices are synchronous pipelined cache SRAMs, such as Cypress 128Kx32 CY7C1335 or CY7C1336 chips. The logic devices include 1201 (FPGA1), 1202 (FPGA3), 1203 (FPGA0), and 1204 (FPGA2). The SRAM chips include low bank memory device 1205 (L_SRAM) and high bank memory device 1206 (H_SRAM).
These logic devices and memory devices are coupled to the CTRL_FPGA unit 1200 via high bank bus 1212 (FD[63:32]) and low bank bus 1213 (FD[31:0]). Logic devices 1201 (FPGA1) and 1202 (FPGA3) are coupled to the high bank bus 1212 via bus 1223 and bus 1225, respectively, while logic devices 1203 (FPGA0) and 1204 (FPGA2) are coupled to the low bank bus 1213 via bus 1224 and bus 1226, respectively. High bank memory device 1206 is coupled to the high bank bus 1212 via bus 1220, and low bank memory device 1205 is coupled to the low bank bus 1213 via bus 1219. The dual bank bus structure allows the simulation system to access the devices on the high bank and the devices on the low bank in parallel, improving throughput. The dual bank data bus structure also carries other signals, such as control and address signals, so that the simulation write/read cycle can be controlled.

Referring again to FIG. 61, each simulation write/read cycle includes a DMA data transfer period, an evaluation period, and a memory access period. The combination of the various control signals controls and indicates which of these periods the simulation system is in. DMA data transfer between the host computer and the logic devices 1201 to 1204 in the reconfigurable hardware unit occurs across the PCI bus (e.g., bus 50 in FIG. 46), local buses 1210 and 1236, and FPGA buses 1212 (FD[63:32]) and 1213 (FD[31:0]). The memory devices 1205 and 1206 take part in DMA data transfer for initialization and memory content dump. Evaluation data transfer among the logic devices 1201 to 1204 in the reconfigurable hardware unit occurs across the interconnects and FPGA buses 1212 (FD[63:32]) and 1213 (FD[31:0]). Memory access between the logic devices 1201 to 1204 and the memory devices 1205 and 1206 occurs across FPGA buses 1212 (FD[63:32]) and 1213 (FD[31:0]).
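One way to picture how control signals identify the current period is the Python sketch below. It assumes, for illustration only, that DATAXSFR marks the DMA data transfer period, that EVAL marks the evaluation period, that neither being asserted means memory access, and that both being asserted together is invalid; the description names these signals but does not give this exact decoding. The bus lists restate the paths given in the paragraph above.

```python
def current_period(dataxsfr: bool, eval_: bool) -> str:
    """Decode the period of the simulation write/read cycle (assumed encoding)."""
    if dataxsfr and eval_:
        raise ValueError("DATAXSFR and EVAL asserted together (assumed invalid)")
    if dataxsfr:
        return "dma_data_transfer"
    if eval_:
        return "evaluation"
    return "memory_access"


# Data paths used in each period, as listed in the description.
BUSES = {
    "dma_data_transfer": ["PCI bus", "local buses 1210/1236", "FD[63:32]", "FD[31:0]"],
    "evaluation": ["interconnect", "FD[63:32]", "FD[31:0]"],
    "memory_access": ["FD[63:32]", "FD[31:0]"],
}

period = current_period(dataxsfr=False, eval_=True)
print(period, BUSES[period])
```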

Referring to FIG. 56, the CTRL_FPGA unit 1200 provides and receives many control and address signals to control the simulation write/read cycle. The CTRL_FPGA unit 1200 provides the DATAXSFR and EVAL signals on line 1211, which are delivered to logic devices 1201 and 1203 via line 1221 and to logic devices 1202 and 1204 via line 1222. The CTRL_FPGA unit 1200 also provides memory address signals MA[18:2] to low bank memory device 1205 and high bank memory device 1206 via buses 1229 and 1214, respectively. To control the modes of these memory devices, the CTRL_FPGA unit 1200 provides chip select, write (and read) signals to low bank memory device 1205 and high bank memory device 1206 via lines 1216 and 1215, respectively. To indicate the completion of a DMA data transfer, the memory simulation system can send and receive the DONE signal on line 1209 between the CTRL_FPGA unit 1200 and the computing system.

As discussed above with respect to FIGS. 9, 11, 12, 14, and 15, the logic devices 1201 to 1204 are connected together by the multiplexed cross-chip address pointer chain, represented in FIG. 56 by two sets of SHIFTIN/SHIFTOUT lines (lines 1207, 1227, and 1218, and lines 1208, 1228, and 1217). Each set is initialized at the start of the chain by Vcc, at lines 1207 and 1208. The SHIFTIN signal is sent from the preceding FPGA logic device in the bank to start the memory access for the current FPGA logic device. When the shift has propagated through a given chain, the last logic device generates a LAST signal (i.e., LASTL or LASTH) and sends it to the CTRL_FPGA unit 1200. For the high bank, logic device 1202 generates the LASTH shift-out signal on line 1218 and sends it to the CTRL_FPGA unit 1200; for the low bank, logic device 1204 generates the LASTL signal on line 1217 and sends it to the CTRL_FPGA unit 1200.
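The token-passing behavior of one SHIFTIN/SHIFTOUT chain can be sketched as a Python generator that lists the events in order. The device reference numerals and the LASTH/LASTL names come from the text; the generator `shift_chain`, the event strings, and the assumption that 1201 precedes 1202 (and 1203 precedes 1204) in their chains are illustrative only.

```python
from typing import Iterator


def shift_chain(bank: str, devices: list[int]) -> Iterator[str]:
    """Yield the events of one bank's SHIFTIN/SHIFTOUT chain, in order."""
    yield f"{bank}: SHIFTIN of {devices[0]} tied to Vcc"
    for i, dev in enumerate(devices):
        # A device starts its memory accesses once its SHIFTIN arrives.
        yield f"{bank}: {dev} performs its memory accesses"
        if i + 1 < len(devices):
            yield f"{bank}: {dev} SHIFTOUT -> {devices[i + 1]} SHIFTIN"
    # The last device in the chain reports completion to CTRL_FPGA 1200.
    yield f"{bank}: {devices[-1]} raises LAST{bank[0].upper()}"


for event in shift_chain("high", [1201, 1202]):
    print(event)
for event in shift_chain("low", [1203, 1204]):
    print(event)
```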

With respect to FIG. 56 and the board implementation, one embodiment of the present invention integrates the components (e.g., logic devices 1201 to 1204, memory devices 1205 and 1206, and CTRL_FPGA unit 1200) and the buses (e.g., FPGA buses 1212 and 1213 and local bus 1210) on one board. This one board is coupled to the motherboard through motherboard connectors. Thus, one board carries four logic devices (two per bank), two memory devices (one per bank), and the buses. A second board would contain its own complement of logic devices (typically four), memory devices (typically two), an FPGA I/O controller (CTRL_FPGA unit), and buses. The PCI controller, however, is installed on the first board only. Inter-board connectors, discussed above, are provided between the boards so that the logic devices on all the boards can be connected together and communicate with each other during the evaluation period, and so that the local bus is provided across all the boards. The FPGA bus FD[63:0] is provided within each board only and does not extend across multiple boards.
In this board configuration, the simulation system performs memory mapping between the logic devices and the memory devices on each board. Memory mapping across different boards is not performed. Thus, the logic devices on board 5 map memory blocks to the memory devices on board 5, not to memory devices on other boards. In other embodiments of the present invention, however, the simulation system may map memory blocks from the logic devices on one board to memory devices on another board.
The operation of memory simulation according to one embodiment of the present invention is generally as follows. The simulation write/read cycle is divided into three periods: DMA data transfer, evaluation, and memory access. To indicate the completion of a simulation write/read cycle, the memory simulation system can send and receive the DONE signal on line 1209 between the CTRL_FPGA unit 1200 and the computing system. The DATAXSFR signal on bus 1211 indicates the occurrence of the DMA data transfer period, during which the computing system and the FPGA logic devices 1201 to 1204 transfer data to each other via the FPGA data bus (high bank bus FD[63:32] 1212 and low bank bus FD[31:0] 1213). In general, DMA transfers occur between the host computing system and the FPGA logic devices. For initialization and memory content dump, DMA transfers occur between the host computing system and the SRAM memory devices 1205 and 1206.
During the evaluation period, logic circuitry in each FPGA logic device 1201 to 1204 generates the proper software clock, input enable, and MUX enable signals to the user's design logic for data evaluation. Inter-FPGA logic device communication occurs during this period. The CTRL_FPGA unit 1200 also starts an evaluation counter that controls the duration of the evaluation period. The count, and hence the duration of the evaluation period, is set by the system by determining the longest signal path. The path length is associated with a specific number of steps. The system uses the step information to calculate the count necessary for the evaluation cycle to run to completion.
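The description says only that the longest signal path, measured in steps, determines the evaluation count. As a hedged sketch of how such a count might be derived, the Python below computes the longest path through a small acyclic graph of signal hops. The node names, step costs, and the function `eval_count` are all hypothetical.

```python
from functools import lru_cache


def eval_count(hops: dict[str, list[tuple[str, int]]]) -> int:
    """Return the longest path, in steps, through an acyclic graph of hops.

    hops maps a node to its outgoing (successor, steps) edges.
    """
    @lru_cache(maxsize=None)
    def longest_from(node: str) -> int:
        # A node with no outgoing edges contributes zero further steps.
        return max((steps + longest_from(nxt)
                    for nxt, steps in hops.get(node, [])), default=0)

    return max(longest_from(node) for node in hops)


# Illustrative signal paths between logic devices.
paths = {
    "in": [("fpga0", 1), ("fpga1", 1)],
    "fpga0": [("fpga2", 2)],
    "fpga1": [("fpga2", 1)],
    "fpga2": [("out", 1)],
}
count = eval_count(paths)
print(count)
```

Here the path in, fpga0, fpga2, out takes 1 + 2 + 1 = 4 steps, which is longer than the 3-step path through fpga1, so the counter would be loaded with 4.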
During the memory access period, the memory simulation system waits for the high and low bank FPGA logic devices 1201 to 1204 to put their respective address and control signals onto their respective FPGA data buses. These address and control signals are latched by the CTRL_FPGA unit 1200. If the operation is a write, address, control, and data signals are transferred from the FPGA logic devices 1201 to 1204 to their respective SRAM memory devices 1205 and 1206. If the operation is a read, address and control signals are transferred from the FPGA logic devices 1201 to 1204 to their respective SRAM memory devices 1205 and 1206, and data signals are transferred from the SRAM memory devices 1205 and 1206 to their respective FPGA logic devices 1201 to 1204. On the FPGA logic device side, the FD bus driver places the address and control signals of a memory block onto the FPGA data bus (FD bus). If the operation is a write, the write data for that memory block is placed on the FD bus. If the operation is a read, the double buffer latches the data for that memory block from the SRAM memory device on the FD bus. This operation continues for each memory block in each FPGA logic device, one memory block at a time. When all the desired memory blocks in an FPGA logic device have been accessed, the memory simulation system proceeds to the next FPGA logic device in each bank and begins accessing the memory blocks in that FPGA logic device. After all the desired memory blocks in all the FPGA logic devices 1201 to 1204 have been accessed, the memory simulation write/read cycle is complete, and the memory simulation system waits until the start of the next memory simulation write/read cycle.
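The sequential walk through devices and memory blocks during the memory access period can be sketched in Python as follows. This is a behavioral model for one bank only; the `Access` and `LogicDevice` classes, the dictionary standing in for the SRAM, and the default read value of 0 for unwritten locations are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Access:
    write: bool
    address: int
    data: int = 0  # only meaningful for writes


@dataclass
class LogicDevice:
    name: str
    accesses: list[Access]  # one pending access per memory block
    read_buffer: list[int] = field(default_factory=list)


def memory_access_period(bank: list[LogicDevice], sram: dict[int, int]) -> None:
    """Serve every memory block of every device in one bank, in chain order."""
    for device in bank:              # next device starts after this one finishes
        for acc in device.accesses:  # one memory block at a time
            if acc.write:
                sram[acc.address] = acc.data  # FPGA logic device -> SRAM
            else:
                # SRAM -> FPGA logic device (latched by its read buffer)
                device.read_buffer.append(sram.get(acc.address, 0))


low_sram: dict[int, int] = {}
low_bank = [
    LogicDevice("1203", [Access(True, 0x10, 0xAB)]),
    LogicDevice("1204", [Access(False, 0x10)]),
]
memory_access_period(low_bank, low_sram)
print(low_bank[1].read_buffer)
```

In the actual system the high and low banks proceed in parallel on their own buses; running this function once per bank with separate SRAM dictionaries approximates that behavior.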

FIG. 57 shows a block diagram of the memory simulation aspect of the present invention, including a more detailed structural diagram of the CTRL_FPGA unit 1200 and of each logic device as it relates to memory simulation. FIG. 57 shows the CTRL_FPGA unit 1200 and logic device 1203 (which is structurally similar to the other logic devices 1201, 1202, and 1204). The CTRL_FPGA unit 1200 includes a memory finite state machine (MEMFSM) 1240, an AND gate 1241, an evaluation (EVAL) counter 1242, a low bank memory address/control latch 1243, a low bank address/control multiplexer 1244, an address counter 1245, a high bank memory address/control latch 1247, and a high bank address/control multiplexer 1246. Each logic device, such as logic device 1203 shown in FIG. 57, includes an evaluation finite state machine (EVALFSMx) 1248 and a data bus multiplexer 1249 (FDO_MUXx; FDO_MUX0 for the FPGA0 logic device 1203). The "x" at the end of EVALFSM identifies the particular logic device (FPGA0, FPGA1, FPGA2, FPGA3) with which it is associated; in this example, "x" is a number from 0 to 3. Thus, EVALFSM0 is associated with the FPGA0 logic device 1203. In general, each logic device is associated with some number x, and when N logic devices are used, "x" ranges from 0 to N-1.
In each logic device 1201 to 1204, various memory blocks are associated with the configured and mapped user design. Thus, a memory block interface 1253 in the user's logic provides a means for the computing system to access the desired memory block in the array of FPGA logic devices. The memory block interface 1253 also provides memory write data on bus 1295 to the FPGA data bus multiplexer (FDO_MUXx) 1249 and receives memory read data on bus 1297 from the memory read data double buffer 1251.
A memory block data/logic interface 1298 is provided in each FPGA logic device. Each memory block data/logic interface 1298 is coupled to the FPGA data bus multiplexer (FDO_MUXx) 1249, the evaluation finite state machine (EVALFSMx) 1248, and the FPGA bus FD[63:0]. The memory block data/logic interface 1298 includes a memory read data double buffer 1251, an address offset unit 1250, a memory model 1252, and a memory block interface 1253 for each memory block N (mem_block_N), all of which are repeated in a given FPGA logic device 1201 to 1204 for each memory block N. Thus, for five memory blocks, five sets of the memory block data/logic interface 1298 are provided; that is, five sets of the memory read data double buffer 1251, the address offset unit 1250, the memory model 1252, and the memory block interface 1253 (mem_block_N).

As with EVALFSMx, the "x" in FDO_MUXx identifies the particular logic device (FPGA0, FPGA1, FPGA2, FPGA3) with which it is associated, where "x" is a number from 0 to 3. The output of FDO_MUXx 1249 is provided on bus 1282, which is coupled to the high bank bus FD[63:32] or the low bank bus FD[31:0], depending on which chip (FPGA0, FPGA1, FPGA2, FPGA3) is associated with FDO_MUXx 1249. In FIG. 57, FDO_MUXx is FDO_MUX0, which is associated with low bank logic device FPGA0 1203. Hence, the output on bus 1282 is provided to the low bank bus FD[31:0]. Read bus 1283 carries read data from the high bank FD[63:32] or low bank FD[31:0] bus for input to the memory read data double buffer 1251. Hence, write data is transferred out via FDO_MUX0 1249 from the memory block in each logic device 1201 to 1204 to the high bank FD[63:32] or low bank FD[31:0] bus, and read data is transferred via read bus 1283 from the high bank FD[63:32] or low bank FD[31:0] bus into the memory read data double buffer 1251. The memory read data double buffer provides a double-buffered mechanism: data is latched in a first buffer and then buffered again so that the latched data is released at the same time, minimizing skew. The memory read data double buffer 1251 is discussed in more detail below.
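The skew-minimizing role of the memory read data double buffer can be pictured with the short Python sketch below: each block's read data is captured in a first stage as it arrives, and a later common enable moves all captured values to the second stage together. The class `DoubleBuffer` and its `latch` and `release` methods are hypothetical names, and the initial value of 0 is an assumption.

```python
class DoubleBuffer:
    """Two-stage read buffer for one memory block (behavioral sketch)."""

    def __init__(self) -> None:
        self.first = 0   # captures read data as it arrives from the FD bus
        self.second = 0  # value presented to the user logic

    def latch(self, data: int) -> None:
        """First stage: capture this block's read data when it arrives."""
        self.first = data

    def release(self) -> None:
        """Second stage: pass the captured data on to the user logic."""
        self.second = self.first


buffers = [DoubleBuffer() for _ in range(3)]
for value, buf in zip([0x11, 0x22, 0x33], buffers):
    buf.latch(value)  # blocks are read one after another

seen_before = [b.second for b in buffers]  # outputs not yet updated
for buf in buffers:
    buf.release()  # one common enable updates every output together
seen_after = [b.second for b in buffers]
print(seen_before, seen_after)
```

Because the user logic only ever observes the second stage, it sees all read values change at the same moment even though the reads themselves were serialized.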
Turning to the memory model 1252, it converts the user's memory type to the memory simulation system's SRAM type. Because the memory types in a user's design can vary from one type to another, the memory block interface 1253 can also be unique to the user's design. For example, the user's memory type may be DRAM, flash memory, or EEPROM. In all variations of the memory block interface 1253, however, memory address and control signals (e.g., read, write, chip select, mem_clk) are provided. One embodiment of the memory simulation aspect of the present invention converts the user's memory type to the SRAM type used in the memory simulation system. If the user's memory type is SRAM, the conversion to the SRAM-type memory model is straightforward. The memory address and control signals are provided on bus 1296 to the memory model 1252, which performs the conversion.
Memory model 1252 provides memory block address information on bus 1293 and control information on bus 1292. Address offset unit 1250 receives the address information for the various memory blocks and derives, from the original address on bus 1293, a corrected offset address that it provides on bus 1291. The offset is necessary because the address ranges of different memory blocks can overlap. For example, one memory block may use and reside in space 0-2K, while another memory block may use and reside in space 0-3K. Since the two memory blocks overlap in space 0-2K, addressing them individually would be difficult without an address offset mechanism. With the offset, the first memory block uses and resides in space 0-2K, while the second memory block uses and resides in space 2K-5K. The offset address from offset unit 1250 and the control signals on bus 1292 are combined, provided on bus 1299, and sent to the FPGA bus multiplexer (FDO_MUXx) 1249.
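The address offset mechanism described above can be illustrated with a minimal software sketch. It is not the circuit of offset unit 1250; the function names `assign_offsets` and `offset_address` are hypothetical, and the sketch assumes that memory blocks are simply packed back to back in the shared SRAM address space.

```python
K = 1024  # 1K locations


def assign_offsets(block_sizes):
    """Pack memory blocks back to back and return one (base, limit) range per block.

    Assumption: blocks are placed contiguously in the order given.
    """
    ranges = []
    base = 0
    for size in block_sizes:
        ranges.append((base, base + size))
        base += size
    return ranges


def offset_address(ranges, block_index, local_address):
    """Translate a block-local address into a non-overlapping SRAM address."""
    base, limit = ranges[block_index]
    if not 0 <= local_address < limit - base:
        raise ValueError("address outside memory block")
    return base + local_address


# Two user memory blocks that both start at local address 0 (sizes 2K and 3K).
ranges = assign_offsets([2 * K, 3 * K])
first_word_of_block_1 = offset_address(ranges, 1, 0)
```

With these inputs the first block occupies 0-2K and the second 2K-5K, matching the example in the text, so local address 0 of the second block no longer collides with local address 0 of the first.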
The FPGA data bus multiplexer FDO_MUXx receives SPACE2 data on bus 1289, SPACE3 data on bus 1290, address/control signals on bus 1299, and memory write data on bus 1295. As mentioned above, SPACE2 and SPACE3 are specific space indices. The SPACE index generated by the FPGA I/O controller (reference numeral 327 in FIGS. 10 and 22) selects a particular address space (e.g., REG read, REG write, S2H read, H2S write, and CLK write). Within that address space, the system according to the invention selects the particular word to be addressed. SPACE2 represents the memory space dedicated to DMA read transfers of hardware-to-software (H2S) data. SPACE3 represents the memory space dedicated to DMA read transfers of REGISTER_READ data. See Table G above.
FDO_MUXx 1249 outputs data on bus 1282 to either the low bank or the high bank bus. Its control inputs are the select signal on line 1285 and the output enable signal on line 1284, both from EVALFSMx unit 1248. The output enable signal on line 1284 enables (or disables) the operation of FDO_MUXx 1249. For data accesses over the FPGA bus, the output enable signal is asserted so that FDO_MUXx can function. The select signal on line 1285 is generated by EVALFSMx unit 1248 to select one output from among the plurality of inputs: the SPACE2 data on bus 1289, the SPACE3 data on bus 1290, the address/control signals on bus 1299, and the memory write data on bus 1295. The generation of the select signal by EVALFSMx unit 1248 is described below.
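As a rough behavioral illustration of the multiplexer just described, the following sketch models a four-input selector gated by an output enable. The function name `fdo_mux`, the string-valued select codes, and the use of `None` for an undriven bus are assumptions made for this example only; the specification does not give the encoding of the select signal.

```python
def fdo_mux(select, output_en, space2, space3, addr_ctrl, write_data):
    """Behavioral model of a 4-input bus multiplexer with output enable.

    Returns the selected input when output_en is true, or None to model
    the bus not being driven by this multiplexer.
    """
    if not output_en:
        return None
    inputs = {
        "SPACE2": space2,          # bus 1289
        "SPACE3": space3,          # bus 1290
        "ADDR_CTRL": addr_ctrl,    # bus 1299
        "WRITE_DATA": write_data,  # bus 1295
    }
    return inputs[select]


driven = fdo_mux("ADDR_CTRL", True, 0x1, 0x2, 0x3, 0x4)
undriven = fdo_mux("SPACE2", False, 0x1, 0x2, 0x3, 0x4)
```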
The EVALFSMx unit 1248 is at the heart of the operation of each logic element 1201 to 1204 with respect to the memory simulation system. The EVALFSMx unit 1248 receives as inputs the SHIFTIN signal on line 1279, the EVAL signal from CTRL_FPGA unit 1200 on line 1274, and the write signal wrx on line 1287. The EVALFSMx unit 1248 outputs the SHIFTOUT signal on line 1280, the read latch signal rd_latx on line 1286 for memory read data double buffer 1251, the output enable signal on line 1284 for FDO_MUXx 1249, the select signal on line 1285 for FDO_MUXx 1249, and three signals for the user logic (input_en, mux_en, clk_en) on line 1281.
The operation of FPGA logic elements 1201 to 1204 for the memory simulation system according to the present invention will now be described. When the EVAL signal is logic 1, data evaluation is performed within FPGA logic elements 1201 to 1204. Otherwise, the simulation system performs either DMA data transfer or memory access. When EVAL = 1, the EVALFSMx unit 1248 generates the clk_en, input_en, and mux_en signals to allow the user logic to evaluate the data, latch the relevant data, and multiplex signals across the logic elements, respectively. The EVALFSMx unit 1248 generates the clk_en signal to enable the second flip-flop of all clock edge registers in the user design (see FIG. 19). The clk_en signal is otherwise known as the software clock. If the user's memory type is synchronous, clk_en also enables the second flip-flop of the memory read data double buffer 1251 of each memory block. The EVALFSMx unit 1248 generates the input_en signal so that the user design latches the input signals sent from the CPU to the user logic by DMA transfer. The input_en signal provides the enable input for the second flip-flop of the primary clock register (see FIG. 19). Finally, the EVALFSMx unit 1248 generates the mux_en signal, which turns on the multiplexing circuitry within each FPGA logic element to start communication with the other FPGA logic elements in the array.
Thus, if FPGA logic elements 1201 to 1204 include at least one memory block, the memory simulation system waits for the selected data to be shifted into the selected FPGA logic element, and then generates the select signal and the output_en signal for the FPGA data bus driver so that the address and control signals of memory block interface 1253 (mem_block_N) are placed on the FD bus.

If the write signal wrx on line 1287 is enabled (i.e., logic 1), the select signal and the output_en signal are enabled so that the write data is placed on the low bank or high bank bus, depending on which bank the FPGA chip is connected to. In FIG. 57, logic element 1203 is FPGA0 and is connected to the low bank bus FD[31:0]. If the write signal wrx on line 1287 is disabled (i.e., logic 0), the select signal and the output_en signal are disabled, and the read latch signal rd_latx on line 1286 causes memory read data double buffer 1251 to latch and double buffer the selected data from the SRAM via the low bank or high bank bus, depending on which bank the FPGA chip is connected to. The wrx signal is the memory write signal issued by the memory interface of the user design logic. The wrx signal on line 1287 is obtained from memory model 1252 via control bus 1292.
This process of writing or reading data takes place in each FPGA logic element. After all of its memory blocks have been processed through the SRAM, the EVALFSMx unit 1248 generates the SHIFTOUT signal so that the SRAM can be accessed by the next FPGA logic element in the chain. Memory accesses for devices on the high and low banks are performed in parallel. Occasionally, the memory access for one bank may finish before the memory access for the other bank. For all such accesses, appropriate wait cycles are inserted so that the logic processes data only when the logic is ready and the data is available.

On the CTRL_FPGA unit 1200 side, the MEMFSM 1240 is at the heart of the memory simulation according to the present invention. It sends and receives a number of control signals that control the activation of the memory simulation write/read cycle and the various operations supported by that cycle. MEMFSM 1240 receives the DATAXSFR signal on line 1260 via line 1258. This signal is also provided to each logic element on line 1273. When DATAXSFR goes low (i.e., logic low), the DMA data transfer period ends and the evaluation and memory access periods begin.

MEMFSM 1240 also receives a LASTH signal on line 1254 and a LASTL signal on line 1255, which indicate that the selected word associated with the selected address space has been accessed between the computing system and the simulation system via the PCI bus and the FPGA bus. The MOVE signal associated with this shift-out process is propagated through each logic element (e.g., logic elements 1201 to 1204) until the desired word has been accessed, and at the end of the chain the MOVE signal ultimately becomes the LAST signal (i.e., LASTH for the high bank and LASTL for the low bank). In EVALFSM 1248 (FIG. 57 shows EVALFSM0 for FPGA0 logic element 1203), the corresponding LAST signal is the SHIFTOUT signal on line 1280. Because the particular logic element 1203 is not the last logic element in the low bank chain shown in FIG. 56 (logic element 1204 is the last device in the low bank chain in FIG. 56), the SHIFTOUT signal for EVALFSM0 is not the LAST signal. If EVALFSM 1248 corresponds to EVALFSM2 of FIG. 56, the SHIFTOUT signal on line 1280 is the LASTL signal provided on line 1255 to the MEMFSM. Otherwise, the SHIFTOUT signal on line 1280 is provided to logic element 1204 (see FIG. 56). Similarly, the SHIFTIN signal on line 1279 represents Vcc for FPGA0 logic element 1203 (see FIG. 56).
The LASTL and LASTH signals are input to AND gate 1241 through lines 1256 and 1257, respectively. AND gate 1241 provides an open-drain output. The output of AND gate 1241 generates the DONE signal on line 1259, which is provided to the computing system and to MEMFSM 1240. Thus, only when both the LASTL and LASTH signals are logic high, indicating that the end of each shift-out chain has been reached, does the AND gate output go to logic high.
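The relationship between the shift chains and the DONE signal can be sketched in software as follows. This is a simplified illustration, not the hardware: the helper names `shift_chain` and `done_signal` are hypothetical, each logic element is reduced to a single flag saying whether it has finished and passed the token on, and the head of each chain is assumed to see a constant logic 1 (Vcc) on its SHIFTIN input.

```python
def shift_chain(finished_flags, shift_in=True):
    """Propagate a token through one bank's chain of logic elements.

    Each element forwards the token (SHIFTOUT) only once it has finished.
    The SHIFTOUT of the final element plays the role of the LAST signal.
    """
    token = shift_in
    for finished in finished_flags:
        token = token and finished
    return token


def done_signal(low_bank_flags, high_bank_flags):
    """DONE is the AND of LASTL and LASTH."""
    lastl = shift_chain(low_bank_flags)
    lasth = shift_chain(high_bank_flags)
    return lastl and lasth


both_banks_finished = done_signal([True, True], [True, True])
low_bank_still_busy = done_signal([True, False], [True, True])
```

The second call shows the case mentioned in the text where one bank finishes before the other: DONE stays low until both chains have shifted out.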
MEMFSM 1240 generates a start signal on line 1261 for EVAL counter 1242. As the name implies, the start signal triggers the start of EVAL counter 1242, and it is sent after the end of the DMA data transfer period. The start signal is generated when a high-to-low (1 to 0) transition of the DATAXSFR signal is detected. EVAL counter 1242 is a programmable counter that counts a predetermined number of clock cycles. The duration of the count programmed into EVAL counter 1242 determines the duration of the evaluation period. The output of EVAL counter 1242 on line 1274 is logic 1 or logic 0 depending on whether or not the counter is counting. While EVAL counter 1242 is counting, the output on line 1274 is logic 1, and it is provided to each FPGA logic element 1201 to 1204 via EVALFSMx 1248. When EVAL = 1, FPGA logic elements 1201 to 1204 perform inter-FPGA communication to evaluate the data in the user design. The output of EVAL counter 1242 is also fed back on line 1262 to MEMFSM unit 1240 for its own tracking purposes. At the end of the programmed count, EVAL counter 1242 generates a logic 0 signal on lines 1274 and 1262 to indicate the end of the evaluation period.
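A cycle-by-cycle sketch of this behavior is given below. It is an illustrative software model only: the class `EvalCounter`, the function `simulate_eval`, and the exact cycle in which EVAL rises relative to the DATAXSFR edge are assumptions, since the text does not specify the timing at that level of detail.

```python
class EvalCounter:
    """Programmable down-counter; EVAL is 1 while the counter is counting."""

    def __init__(self, programmed_count):
        self.programmed_count = programmed_count
        self.remaining = 0

    def start(self):
        self.remaining = self.programmed_count

    def tick(self):
        if self.remaining > 0:
            self.remaining -= 1

    @property
    def eval_out(self):
        return 1 if self.remaining > 0 else 0


def simulate_eval(dataxsfr_trace, programmed_count):
    """Return the EVAL value for each clock cycle of a DATAXSFR trace."""
    counter = EvalCounter(programmed_count)
    previous = 0
    eval_trace = []
    for current in dataxsfr_trace:
        counter.tick()
        if previous == 1 and current == 0:
            counter.start()  # start signal on the 1 -> 0 transition
        eval_trace.append(counter.eval_out)
        previous = current
    return eval_trace


# DMA transfer for two cycles, then an evaluation period of three cycles.
eval_trace = simulate_eval([1, 1, 0, 0, 0, 0, 0], programmed_count=3)
```

In this model EVAL stays high for exactly the programmed number of cycles after DATAXSFR falls, which is the property the paragraph above relies on.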

If no memory access is desired, the MEM_EN signal on line 1272 is logic 0 and is provided to MEMFSM unit 1240. In this case, the memory simulation system waits for another DMA data transfer period. If memory access is desired, the MEM_EN signal on line 1272 is logic 1. Essentially, the MEM_EN signal is a control signal from the CPU that enables the on-board SRAM memory devices for access by the FPGA logic elements. In that case, MEMFSM unit 1240 waits for FPGA logic elements 1201 to 1204 to place the address and control signals on the FPGA buses FD[63:32] and FD[31:0].

The remaining functional units, together with their associated control signals and lines, deliver the address/control information to the SRAM memory devices for writing and reading data. These units include memory address/control latch 1243 for the low bank, address/control mux 1244 for the low bank, memory address/control latch 1247 for the high bank, address/control mux 1246 for the high bank, and address counter 1245.

Memory address/control latch 1243 for the low bank receives the address and control signals from FPGA bus FD[31:0], which coincides with bus 1213, and the latch signal on line 1263. Latch 1243 generates the mem_wr_L signal on line 1264 and passes the address/control signals received from FPGA bus FD[31:0] to address/control mux 1244 via bus 1266. This mem_wr signal is the same as the chip select write signal.
Address/control mux 1244 receives as inputs the address information from address counter 1245 via bus 1268 and the address and control information on bus 1266. Address/control mux 1244 outputs the address/control information on bus 1276 to low bank SRAM memory device 1205. The select signal on line 1265 is provided by MEMFSM unit 1240. The address/control information on bus 1276 corresponds to MA[18:2] and the chip select read/write signals on buses 1229 and 1216 shown in FIG. 56.
Address counter 1245 receives information from SPACE4 and SPACE5 via bus 1267. SPACE4 contains the DMA write transfer information. SPACE5 contains the DMA read transfer information. These DMA transfers take place over the PCI bus between the computing system (cache/main memory via the workstation CPU) and the simulation system (SRAM memory devices 1205, 1206). Address counter 1245 provides its output on buses 1288 and 1268 to address/control muxes 1244 and 1246. Depending on the select signal on line 1265 for the low bank, address/control mux 1244 places on bus 1276 either the address/control information on bus 1266, for read/write memory accesses between SRAM device 1205 and FPGA logic elements 1203 and 1204, or alternatively the DMA write/read transfer data from SPACE4 or SPACE5 on bus 1267.
During the memory access period, MEMFSM unit 1240 provides the latch signal on line 1263 to memory address/control latch 1243 to fetch the inputs from FPGA bus FD[31:0]. MEMFSM unit 1240 extracts the mem_wr_L control information from the address/control signals on FD[31:0] for further control. If the mem_wr_L signal on line 1264 is logic 1, a write operation is intended; the appropriate select signal on line 1265 is generated by MEMFSM unit 1240 and sent to address/control mux 1244, so that the address/control signals on bus 1266 are passed to the low bank SRAM on bus 1276. The write data is then transferred from the FPGA logic element to the SRAM memory device. If the mem_wr_L signal on line 1264 is logic 0, a read operation is intended, and the simulation system waits for the SRAM memory device to place the data on FPGA bus FD[31:0]. Once the data is ready, the read data is transferred from the SRAM memory device to the FPGA logic element.
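The low bank datapath described above can be summarized in a small behavioral sketch. The names `select_address` and `low_bank_access` are hypothetical, the SRAM is modeled as a Python dictionary, and the boolean `dma_active` stands in for the select signal on line 1265, whose actual encoding is not given in the text.

```python
def select_address(dma_active, latched_address, counter_address):
    """Model of the address/control mux: DMA uses the address counter,
    ordinary memory accesses use the address latched from the FPGA bus."""
    return counter_address if dma_active else latched_address


def low_bank_access(sram, address, mem_wr_l, write_data=None):
    """Perform one access: write when mem_wr_L is 1, read when it is 0."""
    if mem_wr_l == 1:
        sram[address] = write_data
        return None
    return sram.get(address)


sram = {}
addr = select_address(dma_active=False, latched_address=0x10, counter_address=0x00)
low_bank_access(sram, addr, mem_wr_l=1, write_data=0xAB)  # FPGA -> SRAM
read_back = low_bank_access(sram, addr, mem_wr_l=0)       # SRAM -> FPGA
```

The same structure applies to the high bank with mem_wr_H and its own latch and mux.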

A similar configuration and operation are provided for the high bank. Memory address/control latch 1247 for the high bank receives the address and control signals from FPGA bus FD[63:32], which coincides with bus 1212, and the latch signal on line 1270. Latch 1247 generates the mem_wr_H signal on line 1271 and passes the address/control signals coming from FPGA bus FD[63:32] to address/control mux 1246 via bus 1239.

Address/control mux 1246 receives as inputs the address information from address counter 1245 via bus 1268 and the address and control information on bus 1239. Address/control mux 1246 outputs the address/control information on bus 1277 to high bank SRAM memory device 1206. The select signal on line 1269 is provided by MEMFSM unit 1240. The address/control information on bus 1277 corresponds to MA[18:2] and the chip select read/write signals on buses 1214 and 1215 shown in FIG. 56.

As described above for DMA write and read transfers, address counter 1245 receives information from SPACE4 and SPACE5 via bus 1267. Address counter 1245 provides its output on buses 1288 and 1268 to address/control muxes 1244 and 1246. Depending on the select signal on line 1269 for the high bank, address/control mux 1246 places on bus 1277 either the address/control information on bus 1239, for read/write memory accesses between SRAM device 1206 and FPGA logic elements 1201 and 1202, or alternatively the DMA write/read transfer data from SPACE4 or SPACE5 on bus 1267.

During the memory access period, MEMFSM unit 1240 provides the latch signal on line 1270 to memory address/control latch 1247 to fetch the inputs from FPGA bus FD[63:32]. MEMFSM unit 1240 extracts the mem_wr_H control information from the address/control signals on FD[63:32] for further control. If the mem_wr_H signal on line 1271 is logic 1, a write operation is intended; the appropriate select signal on line 1269 is generated by MEMFSM unit 1240 and sent to address/control mux 1246, so that the address/control signals on bus 1239 are passed to the high bank SRAM on bus 1277. The write data is then transferred from the FPGA logic element to the SRAM memory device. If the mem_wr_H signal on line 1271 is logic 0, a read operation is intended, and the simulation system waits for the SRAM memory device to place the data on FPGA bus FD[63:32]. Once the data is ready, the read data is transferred from the SRAM memory device to the FPGA logic element.

As shown in FIG. 57, the address and control signals are provided to the low bank SRAM memory device and the high bank SRAM memory device via buses 1276 and 1277, respectively. Bus 1276 for the low bank corresponds to the combination of buses 1229 and 1216 in FIG. 56. Similarly, bus 1277 for the high bank corresponds to the combination of buses 1214 and 1215 in FIG. 56.

The operation of CTRL_FPGA unit 1200 for the memory simulation system according to the present invention is generally as follows. The DONE signal on line 1259, which is provided to MEMFSM unit 1240 of CTRL_FPGA unit 1200 and to the computing system, indicates the completion of a simulation write/read cycle. The DATAXSFR signal on line 1260 indicates the occurrence of the DMA data transfer period of the simulation write/read cycle. The memory address/control signals on FPGA buses FD[31:0] and FD[63:32] are provided to memory address/control latches 1243 and 1247 for the low and high banks, respectively. For each bank, MEMFSM unit 1240 generates the latch signal (on line 1263 or 1270) to latch the address and control information. This information is then passed to the SRAM memory devices. The mem_wr signal is used to determine whether a write or a read operation is desired. If a write operation is desired, data is transferred from FPGA logic elements 1201 to 1204 to the SRAM memory devices over the FPGA bus. If a read operation is desired, the simulation system waits for the SRAM memory device to place the requested data on the FPGA bus for transfer from the SRAM memory device to the FPGA logic element. For the DMA data transfers of SPACE4 and SPACE5, the select signals on lines 1265 and 1269 select the output of address counter 1245 while data is transferred between the SRAM memory devices of the simulation system and the main computing system. For all such accesses, appropriate wait cycles are inserted so that the logic processes data only when the logic is ready and the data is available.

FIG. 60 shows memory read data double buffer 1251 (FIG. 57) in more detail. Each memory block N in each FPGA logic element has a double buffer that latches the relevant data, which may arrive at different times, and then buffers all of the latched data at the same time. In FIG. 60, double buffer 1391 for memory block 0 includes two D-type flip-flops 1340 and 1341. The output 1343 of the first D-type flip-flop 1340 is connected to the input of the second D-type flip-flop 1341. The output 1344 of the second D-type flip-flop 1341 is the output of the double buffer, which is provided to the memory block N interface of the user design. The global clock input is provided to the first flip-flop 1340 on line 1393 and to the second flip-flop 1341 on line 1394.

The first D flip-flop 1340 receives on line 1342 the data input from the SRAM memory device, via bus 1283 and FPGA bus FD[63:32] for the high bank or FPGA bus FD[31:0] for the low bank. Its enable input is connected to line 1345, which receives the rd_latx (e.g., rd_lat0) signal from the EVALFSMx unit of each FPGA logic element. Thus, for a read operation (i.e., wrx = 0), the EVALFSMx unit generates the rd_latx signal to latch the data on line 1342 onto line 1343. The input data for the double buffers of all memory blocks may arrive at different times, and the double buffer ensures that all of the data is latched first. Once all of the data has been latched into D flip-flop 1340, the clk_en signal (i.e., the software clock) is provided on line 1346 as the clock input to the second D flip-flop 1341. When the clk_en signal is asserted, the latched data on line 1343 is buffered through D flip-flop 1341 onto line 1344.
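The two-stage behavior described above can be modeled with the following sketch. It is a behavioral illustration only: the class `DoubleBuffer` and the helper `read_all` are hypothetical names, and the global clock is abstracted away so that only the rd_latx enable and the clk_en software clock remain.

```python
class DoubleBuffer:
    """Two-stage buffer: stage 1 latches on rd_lat, stage 2 updates on clk_en."""

    def __init__(self):
        self.first = None   # output of the first flip-flop
        self.second = None  # output of the second flip-flop (to user logic)

    def latch(self, data, rd_lat):
        if rd_lat:
            self.first = data

    def clock(self, clk_en):
        if clk_en:
            self.second = self.first


def read_all(buffers, arrivals):
    """Latch data that arrives at different times, then release it together.

    Returns (outputs seen before clk_en, outputs seen after clk_en).
    """
    for buffer, data in zip(buffers, arrivals):
        buffer.latch(data, rd_lat=1)  # each arrival is latched separately
    before = [buffer.second for buffer in buffers]
    for buffer in buffers:
        buffer.clock(clk_en=1)        # one software clock updates every output
    after = [buffer.second for buffer in buffers]
    return before, after


before, after = read_all([DoubleBuffer(), DoubleBuffer()], [0x11, 0x22])
```

Until clk_en is asserted, the user logic sees none of the new data; afterwards it sees all of it at once, which is how the double buffer limits skew between memory blocks.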

For the next memory block 1, another double buffer 1392 is provided, which is substantially identical to double buffer 1391. The data from the SRAM memory device is input on line 1396. The global clock signal is input on line 1397. The clk_en (software clock) signal is input on line 1398 to the second D flip-flop (not shown) of double buffer 1392. These lines are connected to the analogous signal lines of the first double buffer 1391 for memory block 0 and of all other double buffers for the other memory blocks N. The double-buffered data output is provided on line 1399.
The rd_latx signal (e.g., rd_lat0) for the second double buffer 1392 is provided on line 1395, separately from the rd_latx signals for the other double buffers. Further double buffers are provided for the other memory blocks N.
The state diagram of MEMFSM unit 1240 according to one embodiment of the present invention will now be described. FIG. 58 shows the state diagram of the finite state machine of the MEMFSM unit in the CTRL_FPGA unit. The state diagram of FIG. 58 is organized so that the three periods within the simulation write/read cycle are shown with their corresponding states. Thus, states 1300 to 1301 correspond to the DMA data transfer period, states 1302 to 1304 correspond to the evaluation period, and states 1305 to 1314 correspond to the memory access period. Refer to FIG. 57 in conjunction with the discussion of FIG. 58 below.

In general, the sequence of signals for DMA transfer, evaluation, and memory access is fixed. In one embodiment, the sequence is as follows. DATA_XSFR triggers the DMA data transfer, if any. The LAST signals for the high and low banks are generated when the DMA data transfer completes, and they trigger the DONE signal to indicate the end of the DMA data transfer period. The XSFR_DONE signal is then generated and the EVAL cycle begins. When EVAL ends, the memory write/read can begin.

Referring to FIG. 58, state 1300 is the idle state, which is held whenever the DATAXSFR signal is logic 0. This means that no DMA data transfer is taking place. When the DATAXSFR signal becomes logic 1, MEMFSM unit 1240 proceeds to state 1301. Here, the computing system requests a DMA data transfer between the computing system (main memory in FIGS. 1, 45, and 46) and the simulation system (FPGA logic elements 1201 to 1204 or SRAM memory device 1205 in FIG. 56). Appropriate wait cycles are inserted until the DMA data transfer completes. When the DMA transfer completes, the DATAXSFR signal returns to logic 0.

When the DATAXSFR signal returns to logic 0, the MEMFSM unit generates a start signal in state 1302. The start signal starts the EVAL counter 1242, which is a programmable counter. The duration of the programmed count in the EVAL counter equals the duration of the evaluation period. While the EVAL counter is counting in state 1303, the EVAL signal is at logic 1 and is provided to the MEMFSM unit 1240 and to the EVALFSMx in each FPGA logic device. When the count ends, the EVAL counter drives the EVAL signal to logic 0 and sends it to the MEMFSM unit 1240 and to the EVALFSMx in each FPGA logic device. When the MEMFSM unit 1240 receives the logic 0 EVAL signal, it turns on the EVAL_DONE flag in state 1304. The MEMFSM uses the EVAL_DONE flag to indicate that the evaluation period has ended and that the memory access period may proceed. The CPU checks EVAL_DONE and XSFR_DONE by reading the XSFR_EVAL register (see Table K below) to confirm that the DMA transfer and EVAL completed successfully before the next DMA transfer.
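As an illustration only, the DMA and evaluation portion of the state sequence described above (states 1300 through 1304) can be sketched in Python. The function name, the per-clock sampling of DATAXSFR, and the list-based history are assumptions made for this sketch and are not part of the embodiment; the sketch also assumes a programmed count of at least one clock.

```python
from enum import Enum


class State(Enum):
    IDLE_1300 = 1300       # waits while DATAXSFR = 0
    DMA_1301 = 1301        # DMA data transfer in progress
    START_1302 = 1302      # start signal issued to the EVAL counter
    EVAL_1303 = 1303       # EVAL counter running, EVAL = 1
    EVAL_DONE_1304 = 1304  # EVAL_DONE flag turned on


def run_front_end(dataxsfr_trace, eval_count):
    """Step states 1300-1304 once per clock (hypothetical software model).

    dataxsfr_trace: iterable of 0/1 samples of DATAXSFR, one per clock.
    eval_count: programmed EVAL counter value, in clocks (assumed >= 1).
    Returns a list of (state, EVAL level) pairs, one per clock.
    """
    state = State.IDLE_1300
    remaining = 0
    history = []
    for dataxsfr in dataxsfr_trace:
        if state is State.IDLE_1300:
            if dataxsfr:
                state = State.DMA_1301
        elif state is State.DMA_1301:
            if not dataxsfr:            # DMA finished, DATAXSFR back to 0
                state = State.START_1302
        elif state is State.START_1302:
            remaining = eval_count      # start signal loads the counter
            state = State.EVAL_1303
        elif state is State.EVAL_1303:
            remaining -= 1
            if remaining <= 0:          # count ended, EVAL drops to 0
                state = State.EVAL_DONE_1304
        history.append((state, 1 if state is State.EVAL_1303 else 0))
        if state is State.EVAL_DONE_1304:
            break
    return history


history = run_front_end([0, 1, 1, 0, 0, 0, 0, 0, 0], eval_count=3)
eval_high_clocks = sum(level for _, level in history)
```

In this model EVAL stays at logic 1 for exactly the programmed count, which mirrors the statement that the programmed count equals the duration of the evaluation period.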

In some cases, however, the simulation system may not want to perform a memory access at this time. In that case, the simulation system holds the memory enable signal MEM_EN at logic 0. The disabled (logic 0) MEM_EN signal keeps the MEMFSM unit in idle state 1300, where it waits for the DMA data transfer and for data evaluation by the FPGA logic devices. If, on the other hand, the memory enable signal MEM_EN is at logic 1, the simulation system is indicating that a memory access is desired.

Below state 1304 in FIG. 58, the state diagram divides into two sections that proceed in parallel. One section contains states 1305, 1306, 1307, 1308, and 1309 for low bank memory access. The other section contains states 1311, 1312, 1313, 1314, and 1309 for high bank memory access.

In state 1305, the simulation system waits one cycle for the currently selected FPGA logic device to place its address and control signals on the FPGA bus FD[31:0]. In state 1306, the MEMFSM generates a latch signal on line 1263 and sends it to the memory address/control latch 1243 to fetch the inputs from FD[31:0]. The data corresponding to this fetched address and control signal is read from or written to the SRAM memory device. To determine whether the simulation system requires a read operation or a write operation, the memory write signal mem_wr_L for the low bank is extracted from the address and control signals. If mem_wr_L = 0, a read operation is requested. If mem_wr_L = 1, a write operation is requested. As described above, the mem_wr_L signal is equivalent to the chip select write signal.

In state 1307, the appropriate select signal for the address/control mux 1244 is generated to send the address and control signals to the low bank SRAM. The MEMFSM unit checks the mem_wr_L signal and the LASTL signal. If mem_wr_L = 1 and LASTL = 0, a write operation is requested but the last data in the chain of FPGA logic devices has not yet been shifted out. The simulation system therefore returns to state 1305, where it waits one cycle for the FPGA logic device to place more address and control signals on FD[31:0]. This process continues until the last data has been shifted out of the FPGA logic devices. If, however, mem_wr_L = 1 and LASTL = 1, the last data has been shifted out of the FPGA logic devices.

Similarly, if mem_wr_L = 0, indicating a read operation, the MEMFSM proceeds to state 1308. In state 1308, the simulation system waits one cycle for the SRAM memory device to place the data on the FPGA bus FD[31:0]. If LASTL = 0, the last data in the chain of FPGA logic devices has not yet been shifted. The simulation system therefore returns to state 1305, where it waits one cycle for the FPGA logic device to place more address and control signals on FD[31:0]. This process continues until the last data has been shifted into the FPGA logic devices. Write operations (mem_wr_L = 1) and read operations (mem_wr_L = 0) may be interleaved or may otherwise alternate until LASTL = 1.

When LASTL = 1, the MEMFSM proceeds to state 1309, where it waits while DONE = 0. When DONE = 1 and both LASTL and LASTH are at logic 1, the simulation write/read cycle is complete. The simulation system then proceeds to state 1300, where it idles while DATAXSFR = 0.
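The low bank loop through states 1305 to 1309 can be traced with a short Python sketch. It is a simplified model under stated assumptions: each request is represented as a hypothetical pair (mem_wr_L, LASTL) for one latched address/control word, the function name is invented for illustration, and wait cycles inside a state are not modeled.

```python
def low_bank_walk(requests):
    """Return the state numbers visited for a list of low bank requests.

    requests: list of (mem_wr_l, lastl) pairs, one per address/control
    word latched from FD[31:0]. This representation is an assumption of
    the sketch, not part of the described hardware.
    """
    visited = []
    for mem_wr_l, lastl in requests:
        # 1305: wait for address/control on FD[31:0]
        # 1306: latch it; 1307: drive the address/control mux
        visited += [1305, 1306, 1307]
        if mem_wr_l == 0:
            visited.append(1308)   # read: wait for SRAM data on FD[31:0]
        if lastl == 1:
            visited.append(1309)   # last data handled: wait for DONE
            break
        # LASTL = 0: loop back to state 1305 for the next word
    return visited


# One write, one read, then a final write with LASTL = 1.
walk = low_bank_walk([(1, 0), (0, 0), (1, 1)])
```

The same structure would apply to the high bank with states 1311 to 1314 and the signals mem_wr_H and LASTH.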

A similar process applies to the high bank. In state 1311, the simulation system waits one cycle for the currently selected FPGA logic device to place its address and control signals on the FPGA bus FD[63:32]. In state 1312, the MEMFSM generates a latch signal on line 1270 and sends it to the memory address/control latch 1247 to fetch the inputs from FD[63:32]. The data corresponding to this fetched address and control signal is read from or written to the SRAM memory device. To determine whether the simulation system requires a read operation or a write operation, the memory write signal mem_wr_H for the high bank is extracted from the address and control signals. If mem_wr_H = 0, a read operation is requested. If mem_wr_H = 1, a write operation is requested.

In state 1313, the appropriate select signal for the address/control mux 1246 is generated to send the address and control signals to the high bank SRAM. The MEMFSM unit checks the mem_wr_H signal and the LASTH signal. If mem_wr_H = 1 and LASTH = 0, a write operation is requested but the last data in the chain of FPGA logic devices has not yet been shifted out. The simulation system therefore returns to state 1311, where it waits one cycle for the FPGA logic device to place more address and control signals on FD[63:32]. This process continues until the last data has been shifted out of the FPGA logic devices. If, however, mem_wr_H = 1 and LASTH = 1, the last data has been shifted out of the FPGA logic devices.

Similarly, if mem_wr_H = 0, indicating a read operation, the MEMFSM proceeds to state 1314. In state 1314, the simulation system waits one cycle for the SRAM memory device to place the data on the FPGA bus FD[63:32]. If LASTH = 0, the last data in the chain of FPGA logic devices has not yet been shifted. The simulation system therefore returns to state 1311, where it waits one cycle for the FPGA logic device to place more address and control signals on FD[63:32]. This process continues until the last data has been shifted. Write operations (mem_wr_H = 1) and read operations (mem_wr_H = 0) may be interleaved or may otherwise alternate until LASTH = 1.

When LASTH = 1, the MEMFSM proceeds to state 1309, where it waits while DONE = 0. When DONE = 1 and both LASTL and LASTH are at logic 1, the simulation write/read cycle is complete. The simulation system then proceeds to state 1300, where it idles whenever DATAXSFR = 0.
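Because the low bank uses FD[31:0] and the high bank uses FD[63:32], a software model of the bus can treat the two banks as the lower and upper halves of a 64-bit word. The following Python sketch shows that split; the function name and the idea of representing the FD bus as a single integer are assumptions made for illustration.

```python
MASK_32 = 0xFFFFFFFF


def split_fd(fd_word):
    """Split a 64-bit FD bus value into (low_bank, high_bank) halves.

    low_bank  corresponds to FD[31:0]
    high_bank corresponds to FD[63:32]
    """
    low_bank = fd_word & MASK_32
    high_bank = (fd_word >> 32) & MASK_32
    return low_bank, high_bank


low, high = split_fd(0x1122334455667788)
```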

Optionally, the high bank and low bank states 1309 and 1320 may be performed in accordance with another embodiment of the present invention. Thus, in the low bank, after passing through state 1308 (LASTL = 1) or state 1307 (MEM_WR_L = 1 and LASTL = 1), the MEMFSM proceeds directly to state 1300. In the high bank, after passing through state 1314 (LASTH = 1) or state 1313 (MEM_WR_H = 1 and LASTH = 1), the MEMFSM proceeds directly to state 1300.

Hereinafter, a state diagram of the EVALFSM unit 1248 according to an embodiment of the present invention will be described. FIG. 59 shows a state diagram of the EVALFSMx finite state machine in each FPGA chip. As in FIG. 58, the state diagram of FIG. 59 is organized so that each of the two periods within the simulation write/read cycle is shown with its corresponding states. Thus, states 1320 to 1326A correspond to the evaluation period, and states 1326B to 1336 correspond to the memory access period. Refer to FIG. 57 in conjunction with FIG. 58 in the discussion below.

The EVALFSMx unit 1248 receives the EVAL signal on line 1274 from the CTRL_FPGA unit 1200 (see FIG. 57). When EVAL = 0, no data evaluation by the FPGA logic device takes place. Thus, in state 1320, the EVALFSMx idles while EVAL = 0. When EVAL = 1, the EVALFSMx proceeds to state 1321.

States 1321, 1322, and 1323 relate to inter-FPGA communication, in which data is evaluated by the user design across the FPGA logic devices. Here, the EVALFSMx generates the signals input_en, mux_en, and clk_en (item 1281 in FIG. 57) and sends them to the user's logic. In state 1321, the EVALFSMx generates the clk_en signal, which in this cycle enables the second flip-flop of every clock edge register in the user design logic. The clk_en signal is otherwise known as the software clock. If the user's memory type is synchronous, the clk_en signal may also enable the second clock of the memory read data double buffer 1251 in each memory block. The SRAM data output for each memory block is sent to the user design logic in this cycle.

In state 1322, the EVALFSMx generates the input_en signal for the user design logic in order to latch the input signals sent from the CPU to the user logic by DMA transfer. The input_en signal provides the enable input to the second flip-flop in the primary clock register (see FIG. 19).

In state 1323, the EVALFSMx generates the mux_en signal, which turns on the multiplexing circuitry in each FPGA logic device to begin communication with the other FPGA logic devices in the array. As mentioned above, inter-FPGA wire lines are often multiplexed to make efficient use of the limited pin resources of each FPGA logic device chip.

In state 1324, the EVALFSMx waits while EVAL = 1. When EVAL = 0, the evaluation period is over, and state 1325 requires the EVALFSMx to turn off the mux_en signal.


If the number of memory blocks M (where M is an integer, including 0) is zero, the EVALFSMx returns to state 1320, where it idles while EVAL = 0. In most cases M > 0, so the EVALFSMx proceeds to state 1326A/1326B. "M" is the number of memory blocks in the FPGA logic device. It is a constant determined by the user design as mapped and implemented in that FPGA logic device; it does not count down. If M > 0, the right-hand portion of FIG. 59 (the memory access period) is implemented in the FPGA logic device. If M = 0, only the left-hand portion of FIG. 59 (the EVAL period) is implemented.

State 1327 keeps the EVALFSMx waiting while SHIFTIN = 0. When SHIFTIN = 1, the preceding FPGA logic device has finished its memory accesses and the current FPGA logic device is ready to perform its own. Alternatively, SHIFTIN = 1 because the current FPGA logic device is the first logic device in the bank and its SHIFTIN input line is tied to Vcc. In either case, receipt of SHIFTIN = 1 indicates that the current FPGA logic device is ready to perform memory accesses. In state 1328, the memory block number N is set to N = 1. This number N is incremented on each pass through the loop so that the memory access for that particular memory block N can be performed. Initially N = 1, so the EVALFSMx performs the memory access for memory block 1.

In state 1329, the EVALFSMx generates a select signal on line 1285 and an output_en signal on line 1284 and sends them to the FPGA bus driver FDO_MUXx 1249 in order to place the address and control signals of the Mem_Block_N interface 1253 on the FPGA bus FD[63:32] or FD[31:0]. If a write operation is requested, wr = 1; otherwise, a read operation is requested and wr = 0. The EVALFSMx receives the wr signal on line 1287 as one of its inputs. Based on the wr signal, the appropriate select signal is asserted on line 1285.

When wr = 1, the EVALFSMx proceeds to state 1330. The EVALFSMx generates the select and output_en signals for the FD bus driver to place the write data of Mem_Block_N 1253 on the FPGA bus FD[63:32] or FD[31:0]. The EVALFSMx then waits one cycle to allow the SRAM memory device to complete the write cycle. The EVALFSMx then proceeds to state 1335, where the memory block number N is incremented by one; that is, N = N + 1.

If, however, wr = 0, a read operation is requested and the EVALFSMx proceeds to state 1332, where it waits one cycle, and then to state 1333, where it waits another cycle. In state 1334, the EVALFSMx generates the rd_latch signal on line 1286 so that the memory read data double buffer 1251 of memory block N fetches the SRAM data on the FD bus. The EVALFSMx then proceeds to state 1335, where the memory block number N is incremented by one; that is, N = N + 1. Thus, if N = 1 before the increment in state 1335, N becomes 2 and the subsequent memory access is performed for memory block 2.

If the current memory block number N is less than or equal to the total number of memory blocks M in the user design (i.e., N ≤ M), the EVALFSMx proceeds to state 1329, where it generates the particular select and output_en signals for the FD bus driver based on whether the operation is a write or a read. The write or read operation for the next memory block N is then performed.

If, however, the current memory block number N is greater than the total number of memory blocks M in the user design (i.e., N > M), the EVALFSMx proceeds to state 1336, where it turns on the SHIFTOUT output signal to allow the next FPGA logic device in the bank to access the SRAM memory devices. The EVALFSMx then proceeds to state 1320, where it idles until the simulation system requests data evaluation in the FPGA logic devices (i.e., EVAL = 1).
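The memory access loop of the EVALFSMx (states 1327 to 1336) can be summarized with the following Python sketch. It lists only the states named in the description; the input format (one wr value per memory block) and the function name are assumptions of the sketch, and waiting on SHIFTIN in state 1327 is reduced to a single visit.

```python
def evalfsm_memory_access(wr_per_block):
    """Return the EVALFSMx states visited during the memory access period.

    wr_per_block: list of wr values (1 = write, 0 = read) for memory
    blocks 1..M of one FPGA logic device (hypothetical input format).
    """
    m = len(wr_per_block)          # M: constant number of memory blocks
    if m == 0:
        return [1320]              # M = 0: straight back to idle state 1320

    states = [1327, 1328]          # SHIFTIN = 1 received, then N is set to 1
    n = 1
    while n <= m:                  # N <= M: access memory block N
        states.append(1329)        # drive address/control onto the FD bus
        if wr_per_block[n - 1] == 1:
            states.append(1330)    # write: drive write data onto the FD bus
        else:
            states += [1332, 1333, 1334]  # read: two waits, then rd_latch
        states.append(1335)        # N = N + 1
        n += 1
    states += [1336, 1320]         # N > M: assert SHIFTOUT, return to idle
    return states


# Two memory blocks: block 1 is written, block 2 is read.
trace = evalfsm_memory_access([1, 0])
```

Note that M is never modified inside the loop, consistent with the statement that M is a constant and does not count down; only N advances.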


FIG. 61 shows a simulation write/read cycle according to an embodiment of the present invention. At reference numeral 1366, FIG. 61 shows the three periods within the simulation write/read cycle: the DMA data transfer period, the evaluation period, and the memory access period. Although not shown, it is implied that earlier DMA transfers, evaluations, and memory accesses have also taken place. Furthermore, the data transfer timing for the low bank SRAM may differ from that of the high bank SRAM. For simplicity, FIG. 61 illustrates one case in which the low and high banks have identical access times. The global clock GCLK 1350 provides the clocking signal for all components in the system.

The DATAXSFR signal 1351 indicates the occurrence of the DMA data transfer period. When DATAXSFR = 1 at trace 1367, a DMA data transfer takes place between the main computing system and the FPGA logic devices or SRAM memory devices. Accordingly, data is present on the FPGA high bank bus FD[63:32] 1359 at trace 1369 and on the FPGA low bank bus FD[31:0] 1358 at trace 1368. The DONE signal 1364 indicates the end of the memory access period by a transition from logic 0 to logic 1 (e.g., trace 1390), and otherwise indicates the duration of the simulation write/read cycle by remaining at logic 0 (e.g., the span between the edge of trace 1370 and the edge of trace 1390). During the DMA transfer period, the DONE signal is at logic 0.

When the DMA transfer period ends, the DATAXSFR signal transitions from logic 1 to logic 0, which triggers the onset of the evaluation period. EVAL 1352 is therefore at logic 1, as indicated by trace 1371. The duration of the EVAL signal at logic 1 can be predetermined and is programmable. During this evaluation period, the data in the user design logic is evaluated using the clk_en signal 1353 (at logic 1 as indicated by trace 1372), the input_en signal 1354 (also at logic 1, as indicated by trace 1373), and the mux_en signal 1355 (also at logic 1, as indicated by trace 1374, and lasting longer than the clk_en and input_en signals). The data is evaluated within this particular FPGA logic device. When the mux_en signal 1355 transitions from logic 1 to logic 0 at trace 1374 and at least one memory block is present in the FPGA logic device, the evaluation period ends and the memory access period begins.

The SHIFTIN signal 1356 is asserted at logic 1 at trace 1375. This indicates that the preceding FPGA has completed its evaluation and that all desired data has been accessed for that preceding FPGA logic device. The next FPGA logic device in the bank is now ready to begin its memory accesses.

Traces 1377 to 1386 use the following nomenclature. ACj_k denotes the address and control signals associated with FPGAj and memory block k, where j and k are integers, including 0. WDj_k denotes the write data for FPGAj and memory block k. RDj_k denotes the read data for FPGAj and memory block k. Thus, AC3_1 denotes the address and control signals associated with FPGA3 and memory block 1. The low bank SRAM access and the high bank SRAM access 1361 are shown as trace 1387.
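The ACj_k, WDj_k, and RDj_k nomenclature is regular enough to decode mechanically. The following Python sketch, offered only as an illustration, parses such a label into its kind, FPGA index j, and memory block index k; the function name and the returned tuple format are assumptions.

```python
import re

# Prefix -> meaning, as defined for the traces of FIG. 61.
KINDS = {
    "AC": "address/control",
    "WD": "write data",
    "RD": "read data",
}

_LABEL = re.compile(r"^(AC|WD|RD)(\d+)_(\d+)$")


def parse_label(label):
    """Decode a trace label such as 'AC3_1' into (kind, fpga_j, block_k)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not an ACj_k / WDj_k / RDj_k label: {label!r}")
    prefix, j, k = match.groups()
    return KINDS[prefix], int(j), int(k)


decoded = parse_label("AC3_1")
```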

Traces 1377 to 1387 show how the memory accesses are carried out. A write or read operation is performed according to the logic level of the wrx signal to the EVALFSMx and the logic level of the mem_wr signal to the MEMFSM. If a write operation is desired, the memory model interfaces with the user's memory block N interface (Mem_Block_N interface 1253 in FIG. 57) to provide wrx as one of the control signals. The control signal wrx is provided to the FD bus driver and to the EVALFSMx unit. If wrx is at logic 1, the appropriate select signal and output_en signal are provided to the FD bus driver to place the memory write data on the FD bus. This same control signal, now on the FD bus, can be latched by the memory address/control latch in the CTRL_FPGA unit. The memory address/control latch sends the address and control signals to the SRAM via MA[18:2]. The wrx control signal, at logic 1, is extracted from the FD bus, and because a write operation is requested, the data associated with the address and control signals on the FD bus is sent to the SRAM memory device.

Thus, as shown in FIG. 61, the next FPGA logic device (logic device FPGA0 in the low bank) places AC0_0 on FD[31:0], as indicated by trace 1377. The simulation system performs the write operation for WD0_0. AC0_1 is then placed on FD[31:0]. If a read operation were requested instead, the placement of AC0_1 on the FD bus FD[31:0] would be followed by some time delay before the SRAM memory device places RD0_0, rather than the WD0_0 corresponding to AC0_0, on the FD bus.

As indicated by trace 1383, the placement of AC0_0 on the MA[18:2]/control bus lags slightly behind the placement of the address, control, and data on the FD bus. This is because the MEMFSM unit needs time to latch the address/control signals from the FD bus, extract the mem_wr signal, and generate the appropriate select signal for the address/control mux so that the address/control signals can be placed on the MA[18:2]/control bus. In addition, after the address/control signals on the MA[18:2]/control bus have been presented to the SRAM memory device, the simulation system must wait for the corresponding data from the SRAM memory device to be placed on the FD bus. One example is the time offset between trace 1384 and trace 1381, where RD1_1 is placed on the FD bus after AC1_1 is placed on the MA[18:2]/control bus.

On the high bank, FPGA1 places AC1_0 on bus FD[63:32], followed by WD1_0. Thereafter, AC1_1 is placed on bus FD[63:32]. This is indicated by trace 1380. When AC1_1 is placed on the FD bus, the control signal in this example indicates a read operation. Thus, as described above, when AC1_1 is placed on the MA[18:2]/control bus as indicated by trace 1384, the appropriate wrx and mem_wr signals at logic 0 are provided within the address/control signals to the EVALFSMx and MEMFSM units. Because the simulation system knows that this is a read operation, no write data is sent to the SRAM memory device; instead, the read data associated with AC1_1 is placed on the FD bus by the SRAM memory device for subsequent reading by the user design logic via the simulation memory block interface. This is indicated by trace 1381 on the high bank. On the low bank, RD0_1 is placed on the FD bus as indicated by trace 1378, following the placement of AC0_1 on the MA[18:2]/control bus (not shown).

The read operation by the user design logic via the simulation memory block interface is accomplished when the EVALFSMx generates the rd_lat0 signal 1362 to the memory read data double buffer of the simulation memory block interface, as indicated by trace 1388. This rd_lat0 signal is provided to both the low bank FPGA0 and the high bank FPGA1.

Thereafter, the next memory block for each FPGA logic device is placed on the FD bus. AC2_0 is placed on the low bank FD bus, while AC3_0 is placed on the high bank FD bus. If a write operation is requested, WD2_0 is placed on the low bank FD bus and WD3_0 is placed on the high bank FD bus. AC3_0 is placed on the high bank MA[18:2]/control bus, as indicated by trace 1385. This process continues with the next memory block for write and read operations. Write and read operations for the low bank and the high bank can occur at different times and speeds; FIG. 61 shows one particular example in which the timing for the low bank and the high bank is the same. In addition, in this example the write operations for the low bank and the high bank occur together, followed by read operations on both banks. This is not always the case. The existence of a low bank and a high bank allows the devices coupled to these banks to operate in parallel; that is, activity on the low bank is independent of activity on the high bank. In another scenario, the low bank performs a series of write operations in parallel while the high bank performs a series of read operations.

When the last data of the last FPGA logic device for each bank is encountered, the SHIFTOUT signal 1357 is asserted, as indicated by trace 1376. For read operations, the rd_lat signal 1363, corresponding to FPGA2 on the low bank and FPGA3 on the high bank, is asserted as indicated by trace 1389 to read RD2_1 on trace 1379 and RD3_1 on trace 1382. Because the last data for the last FPGA unit has been accessed, the completion of the simulation write/read cycle is indicated by the DONE signal 1364, as shown by trace 1390.

Table H below lists the various components on the simulation system boards and their corresponding registers/memory, PCI memory addresses, and local addresses.

Table H: Memory Map

The data format for the configuration file is shown in Table J below, in accordance with one embodiment of the present invention. The CPU sends one word through the PCI bus at a time to configure one bit for all on-board FPGAs in parallel.
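The bit-parallel configuration scheme described above can be illustrated with a short sketch. It is not the format of Table J, whose contents are not reproduced here. It assumes, purely for illustration, a 32-bit word in which bit position i carries the current configuration bit for FPGA i; the function name `interleave_config_bits` and this bit assignment are hypothetical.

```python
from typing import List, Sequence


def interleave_config_bits(bitstreams: Sequence[Sequence[int]]) -> List[int]:
    """Pack the k-th configuration bit of every FPGA into word k.

    Assumption (not from the source): FPGA i occupies bit position i
    of each word, and a word is 32 bits wide.
    """
    if not bitstreams:
        return []
    length = len(bitstreams[0])
    if any(len(stream) != length for stream in bitstreams):
        raise ValueError("all bitstreams must have the same length")
    if len(bitstreams) > 32:
        raise ValueError("a 32-bit word can carry at most 32 FPGA bits")

    words = []
    for k in range(length):
        word = 0
        for fpga_index, stream in enumerate(bitstreams):
            # Place this FPGA's k-th bit in its assumed bit lane.
            word |= (stream[k] & 1) << fpga_index
        words.append(word)
    return words


if __name__ == "__main__":
    # Two FPGAs with three configuration bits each: three words are sent.
    print(interleave_config_bits([[1, 0, 1], [0, 1, 1]]))
```

With this layout, the number of words sent equals the length of the longest per-FPGA bitstream rather than the total number of configuration bits, which is the point of setting one bit on all FPGAs per word.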

Table J: Configuration Data Format

Table K below lists the XSFR_EVAL register, which resides on every board. The host computing system uses the XSFR_EVAL register to program the EVAL period, to control DMA read/write, and to read the status of the EVAL_DONE and XSFR_DONE fields. The host computing system also uses this register to enable memory access. The operation of the simulation system with respect to this register is described below in conjunction with FIGS. 62 and 63.

Table K: XSFR_EVAL Register for All Six Boards (Local Address: 0h)

Table L below lists the contents of the CONFIG_JTAG[6:1] registers. Through these registers, the CPU configures the FPGA logic devices and runs boundary scan tests on them. Each board has one dedicated register.

Table L: CONFIG_JTAG[6:1] Register

FIGS. 62 and 63 show timing diagrams for another embodiment of the present invention. The two figures show the operation of the simulation system with respect to the XSFR_EVAL register. The host computing system uses the XSFR_EVAL register to program the EVAL period, to control DMA read/write, and to read the status of the EVAL_DONE and XSFR_DONE fields. The host computing system also uses this register to enable memory access. One of the main differences between the two figures is the state of the WAIT_EVAL field. In FIG. 62, the WAIT_EVAL field is set to "0" and the DMA read transfer starts after CLK_EN. In FIG. 63, the WAIT_EVAL field is set to "1" and the DMA read transfer starts after EVAL_DONE.

In FIG. 62, both WR_XSFR_EN and RD_XSFR_EN are set to "1". These two fields enable DMA write/read transfers and can be cleared by XSFR_DONE. Because both fields are set to "1", the CTRL_FPGA unit automatically performs the DMA write transfer first and then the DMA read transfer. The WAIT_EVAL field, however, is set to "0", indicating that the DMA read transfer starts after the assertion of CLK_EN (and after the completion of the DMA write operation). Thus, in FIG. 62, the DMA read operation occurs almost immediately after the completion of the DMA write operation, as soon as the CLK_EN signal (software clock) is detected. The DMA read transfer operation does not wait for the end of the EVAL period.

At the beginning of the timing diagram, the EVAL_REQ_N signal experiences contention because multiple FPGA logic devices compete for attention. As explained previously, the EVAL_REQ_N (or EVAL_REQ#) signal is used to start the evaluation cycle if any of the FPGA logic devices asserts it. At the end of the data transfer, the evaluation cycle begins, including address pointer initialization and the operation of the software clocks to facilitate the evaluation process.

The DONE signal, which is generated at the conclusion of a DMA data transfer period, also experiences contention as multiple LAST signals (derived from the shift-in and shift-out signals at the output of each FPGA logic device) are generated and provided to the CTRL_FPGA unit. When all the LAST signals have been received and processed, the DONE signal is generated and a new DMA data transfer operation can begin. The EVAL_REQ_N signal and the DONE signal use the same wire on a time-shared basis, in the manner described below.

The system first starts the DMA write transfer, as shown by the WR_XSFR signal at time 1409. The initial portion of the WR_XSFR signal includes some overhead associated with the PCI controller (the PCI 9080 or PCI 9060 in one embodiment). Thereafter, the host computing system performs the DMA write operation, via the local bus LD[31:0] and the FPGA bus FD[63:0], to the FPGA logic devices coupled to the FPGA bus FD[63:0].

At time 1412, the WR_XSFR signal is deasserted, indicating the completion of the DMA write operation. The EVAL signal is asserted for a predetermined time from time 1412 to time 1410. The duration of EVALTIME is programmable and is initially set to 8+X, where X is derived from the longest signal trace path. The XSFR_DONE signal is also asserted for a brief time to indicate the completion of this DMA transfer operation, which in this case is the DMA write.

Also at time 1412, the contention among the LAST signals ends and the wire that carried the DONE signal now delivers the EVAL_REQ_N signal to the CTRL_FPGA unit. For 3 clock cycles, the EVAL_REQ_N signals are processed over the wire that carried the DONE signal. After 3 clock cycles, the FPGA logic devices no longer generate EVAL_REQ_N signals, but the EVAL_REQ_N signals that have already been delivered to the CTRL_FPGA unit will be processed. The maximum time during which EVAL_REQ_N signals are no longer generated by the FPGA logic devices for gated clocks is roughly 23 clock cycles. EVAL_REQ_N signals longer than this period will be ignored.

At time 1413, approximately 2 clock cycles after time 1412 (the completion of the DMA write operation), the CTRL_FPGA unit sends the write address strobe WPLX ADS_N signal to the PCI controller to initiate the DMA read transfer. About 24 clock cycles after time 1413, the PCI controller will start the DMA read transfer process and the DONE signal is also generated. At time 1414, prior to the start of the DMA read process by the PCI controller, the RD_XSFR signal is asserted to enable the DMA read transfer. Some PLX overhead data is transmitted and processed first. At time 1415, while this overhead data is being processed, the DMA read data is placed on the FPGA bus FD[63:0] and the local bus LD[31:0]. At the end of the 24 clock cycles from time 1413, when the DONE signal is asserted and the FPGA logic devices generate EVAL_REQ_N signals, the PCI controller processes the DMA read data by transferring it from the FPGA bus FD[63:0] and the local bus LD[31:0] to the host computer system.

At time 1410, the DMA read data may continue to be processed, while the EVAL signal is deasserted and the EVAL_DONE signal is asserted to indicate the completion of the EVAL cycle. Contention among the FPGA logic devices also begins as they generate EVAL_REQ_N signals.

At time 1417, just prior to the completion of the DMA read period at time 1416, the host computer system polls the PLX interrupt register to determine whether the end of the DMA cycle is near. The PCI controller knows how many cycles are necessary to complete the DMA data transfer process. After a predetermined number of cycles, the PCI controller sets a particular bit in its interrupt register. The CPU of the host computer polls this interrupt register in the PCI controller; if the bit is set, the CPU knows that the DMA period is almost complete. The CPU does not poll the interrupt register continuously, because doing so would tie up the PCI bus with read cycles. Thus, in one embodiment of the present invention, the CPU of the host computer system is programmed to wait a certain number of cycles before it polls the interrupt register.

A short time later, the end of the DMA read period occurs at time 1416, when RD_XSFR is deasserted and the DMA read data is no longer on the FPGA bus FD[63:0] or the local bus LD[31:0]. The XSFR_DONE signal is also asserted at time 1416, and contention among the LAST signals for the generation of the DONE signal begins.
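The delayed polling of the PCI controller's interrupt register described above can be sketched as a small host-side routine. This is a minimal illustration, not the driver code of the embodiment: `read_interrupt_bit` and `wait_one_cycle` are hypothetical callbacks standing in for the PCI register read and for a delay, and `idle_cycles` stands for the programmed wait.

```python
from typing import Callable


def wait_for_dma_done(read_interrupt_bit: Callable[[], bool],
                      idle_cycles: int,
                      wait_one_cycle: Callable[[], None],
                      max_polls: int = 1000) -> int:
    """Wait idle_cycles without touching the bus, then poll until the bit is set.

    Returns the number of register reads (polls) that were issued.
    All names here are illustrative assumptions, not identifiers from the source.
    """
    # Stay off the PCI bus while the DMA transfer is known to be in progress.
    for _ in range(idle_cycles):
        wait_one_cycle()

    # Only now start issuing read cycles to the interrupt register.
    for polls in range(1, max_polls + 1):
        if read_interrupt_bit():
            return polls
        wait_one_cycle()
    raise TimeoutError("interrupt bit was not set within max_polls reads")


if __name__ == "__main__":
    clock = {"t": 0}

    def tick() -> None:
        clock["t"] += 1

    def bit_is_set() -> bool:
        # Simulated controller: sets the bit 10 cycles into the transfer.
        return clock["t"] >= 10

    print(wait_for_dma_done(bit_is_set, idle_cycles=8, wait_one_cycle=tick))
```

In the simulated run, waiting 8 cycles first reduces the register reads from 11 (polling from cycle 0) to 3, which mirrors the stated motivation of keeping read cycles off the PCI bus during the transfer.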


During the entire DMA period, from the assertion of the WR_XSFR signal at time 1409 to time 1417, the CPU of the host computer system does not access the simulation hardware system. In one embodiment, the duration of this period is the sum of (1) the overhead time for the PCI controller times 2, (2) the number of words in WR_XSFR and RD_XSFR, and (3) the PCI overhead of the host computer system (for example, a Sun UltraSPARC). The first access after the DMA period occurs at time 1419, when the CPU polls the interrupt register of the PCI controller.
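The three-term sum above can be written out as a small estimate. The sketch assumes that every term is expressed in clock cycles, that each transferred word costs one cycle, and that the PCI controller overhead is counted twice (once for the write and once for the read); none of these units are stated in the text, and the function name is hypothetical.

```python
def dma_period_cycles(pci_controller_overhead: int,
                      write_words: int,
                      read_words: int,
                      host_pci_overhead: int) -> int:
    """Estimate the length of the DMA period during which the host stays idle.

    Assumptions (not from the source): all quantities are in clock cycles
    and one word is transferred per cycle.
    """
    for value in (pci_controller_overhead, write_words,
                  read_words, host_pci_overhead):
        if value < 0:
            raise ValueError("cycle and word counts must be non-negative")
    return (2 * pci_controller_overhead   # term (1): controller overhead, twice
            + write_words + read_words    # term (2): WR_XSFR and RD_XSFR words
            + host_pci_overhead)          # term (3): host-side PCI overhead


if __name__ == "__main__":
    # Purely illustrative numbers.
    print(dma_period_cycles(10, 16, 16, 5))
```

An estimate of this kind is what a host program could use to choose how long to wait before its first poll of the interrupt register.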

At time 1411, three clock cycles after time 1416, the MEM_EN signal is asserted to enable the on-board SRAM memory devices so that memory access between the FPGA logic devices and the SRAM memory devices can begin. Memory access continues until time 1419 and, in one embodiment, requires 5 clock cycles per access. If no DMA read transfer is needed, the memory access can begin earlier, at time 1410 instead of time 1411.

While memory access takes place between the FPGA logic devices and the SRAM memory devices across the FPGA bus FD[63:0], the CPU of the host computer system can communicate with the PCI controller and the CTRL_FPGA unit via the local bus LD[31:0] from time 1418 to time 1429. This occurs after the CPU has finished polling the interrupt register of the PCI controller. The CPU writes data into various registers in preparation for the next data transfer. The duration of this period is greater than 4 µs. If the memory access is shorter than this period, the FPGA bus FD[63:0] will not experience any conflict. At time 1429, the XSFR_DONE signal is deasserted.

In FIG. 63, the timing diagram differs somewhat from that of FIG. 62 in that the WAIT_EVAL field is set to "1". In other words, the DMA read transfer period starts only once the EVAL period is essentially complete and the EVAL_DONE signal has been asserted. It waits for the end of the EVAL period instead of starting immediately after the completion of the DMA write operation. The EVAL signal is asserted for a predetermined time from time 1412 to time 1410. At time 1410, the EVAL_DONE signal is asserted to indicate the end of the EVAL period.

In FIG. 63, after the DMA write operation ends at time 1412, the CTRL_FPGA unit does not generate the write address strobe signal WPLX ADS_N to the PCI controller until time 1420, which is about 16 clock cycles before the end of the EVAL period. The XSFR_DONE signal is also extended to time 1423. At time 1423, the XSFR_DONE field is set, and the WPLX ADS_N signal can then be generated to start the DMA read process.

At time 1420, about 16 clock cycles before the assertion of the EVAL_DONE signal, the CTRL_FPGA unit sends the write address strobe WPLX ADS_N signal to the PCI controller (for example, the PLX PCI 9080) to initiate the DMA read transfer. About 24 clock cycles after time 1420, the PCI controller will start the DMA read transfer and the DONE signal is also generated. At time 1421, prior to the start of the DMA read process by the PCI controller, the RD_XSFR signal is asserted to enable the DMA read transfer. Some PLX overhead data is transmitted and processed first. At time 1422, while this overhead data is being processed, the DMA read data is placed on the FPGA bus FD[63:0] and the local bus LD[31:0]. At the end of the 24 clock cycles, at time 1424, the PCI controller processes the DMA read data by transferring it from the FPGA bus FD[63:0] and the local bus LD[31:0] to the host computer system. The remainder of the timing diagram is the same as in FIG. 62.

Thus, the RD_XSFR signal in FIG. 63 is asserted later than in FIG. 62. In FIG. 63, the RD_XSFR signal is asserted near the end of the EVAL period, so the DMA read operation is delayed. In FIG. 62, the RD_XSFR signal is asserted following the detection of the CLK_EN signal after the completion of the DMA write transfer.
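The difference between the two timing diagrams comes down to how three XSFR_EVAL fields select the transfer sequence. The sketch below models only that selection. It ignores the cycle-level detail (for example, the roughly 16-cycle and 24-cycle offsets around WPLX ADS_N), and the function name and returned strings are illustrative, not taken from the register definition.

```python
from typing import List


def plan_transfers(wr_xsfr_en: bool, rd_xsfr_en: bool, wait_eval: bool) -> List[str]:
    """Return the ordered DMA steps implied by the three control fields.

    Simplified model: a write, if enabled, always precedes a read, and
    WAIT_EVAL selects which event the read waits for.
    """
    steps = []
    if wr_xsfr_en:
        steps.append("DMA write")
    if rd_xsfr_en:
        # WAIT_EVAL = 0: read follows CLK_EN (the FIG. 62 case).
        # WAIT_EVAL = 1: read follows EVAL_DONE (the FIG. 63 case).
        trigger = "EVAL_DONE" if wait_eval else "CLK_EN"
        steps.append(f"DMA read after {trigger}")
    return steps


if __name__ == "__main__":
    print(plan_transfers(True, True, wait_eval=False))  # FIG. 62 configuration
    print(plan_transfers(True, True, wait_eval=True))   # FIG. 63 configuration
```

Setting WAIT_EVAL to "0" lets the read transfer overlap the evaluation period, while "1" trades that overlap for a read that starts only once evaluation has finished.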

IX. COVERIFICATION SYSTEM

The coverification system of the present invention can accelerate design and development by giving designers the flexibility of software simulation together with the faster speed obtained from using a hardware model. Both the hardware and the software portions of a design can be verified prior to ASIC fabrication, without the limitations of emulator-based coverification tools. Debugging capability is enhanced and overall debugging time can be reduced significantly.

Conventional coverification tool with an ASIC as the device-under-test

FIG. 64 shows a typical final design implemented as a PCI add-on card, such as a video, multimedia, Ethernet, or SCSI card. The card 2000 includes a direct interface connector 2002 that allows communication with other peripheral devices. The connector 2002 is coupled to bus 2001 to carry video signals from a VCR, camera, or television tuner, video and audio outputs to a monitor or speaker, and signals to a communication or disk drive interface. Depending on the user design, one skilled in the art can anticipate other interface requirements. The bulk of the functionality of the design resides in chip 2004, which is coupled to the interface connector 2002 via bus 2003, to a local oscillator 2005 (which generates the local clock signal) via bus 2007, and to memory 2006 via bus 2008. The add-on card 2000 also includes a PCI connector 2009 for coupling to a PCI bus 2010.

Before the design is implemented as an add-on card as shown in FIG. 64, it is reduced to ASIC form for testing purposes. A conventional hardware/software coverification tool is shown in FIG. 65. The user design is embodied in the form of an ASIC labeled as the device-under-test (or "DUT") 2024 in FIG. 65. To obtain stimulus from the variety of sources it is designed to interface with, the device-under-test 2024 is placed in a target system 2020, which is a combination of a central computing system 2021 on a motherboard and several peripherals. The target system 2020 includes the central computing system 2021, which contains a CPU and memory and operates under an operating system such as Microsoft Windows or Sun Microsystems' Solaris to run a number of applications. As known to those skilled in the art, Sun Microsystems' Solaris is an operating environment and a set of software products that support Internet, intranet, and enterprise-wide computing. The Solaris operating environment is based on industry-standard UNIX System V Release 4, is designed for client-server applications in a distributed networking environment, provides the appropriate resources for smaller workgroups, and provides the WebTone required for electronic commerce.

A device driver 2022 for the device-under-test 2024 is included in the central computing system 2021 to enable communication between the operating system (and any applications) and the device-under-test 2024. As known to those skilled in the art, a device driver is software that controls a hardware component or peripheral device of a computer system. A device driver accesses the hardware registers of the device and often includes an interrupt handler to service interrupts generated by the device. Device drivers often form part of the lowest level of the operating system kernel, with which they are linked when the kernel is built. Some more recent systems have loadable device drivers that can be installed from files after the operating system is running.

The device-under-test 2024 and the central computing system 2021 are coupled to a PCI bus 2023. Other peripherals in the target system 2020 include an Ethernet PCI add-on card 2025, which couples the target system to a network 2030 via bus 2034; a SCSI PCI add-on card 2026, which is coupled to SCSI drives 2027 and 2031 via buses 2036 and 2035; a VCR 2028 coupled to the device-under-test 2024 via bus 2032 (if needed by the design of the device-under-test 2024); and a monitor and/or speaker 2029 coupled to the device-under-test 2024 via bus 2033 (if needed by the design of the device-under-test 2024). As known to those skilled in the art, "SCSI" stands for "Small Computer Systems Interface," a processor-independent standard for system-level interfacing between a computer and intelligent devices such as hard disks, floppy disks, CD-ROMs, printers, scanners, and many more.

In this target system environment, the device-under-test 2024 can be examined with a variety of stimuli from the central computing system (that is, the operating system and applications) and from the peripheral devices. If time is not a concern and the designers are looking only for a simple pass/fail test, this coverification tool should be adequate for their needs. In most situations, however, a design project is strictly budgeted and scheduled ahead of product release. As explained earlier, this particular ASIC-based coverification tool is unsatisfactory because its debugging features are nonexistent: the designer cannot isolate the cause of a "failed" test without sophisticated techniques, and the number of "fixes" needed for each detected bug cannot be predicted at the outset of the project, which makes the schedule and budget unpredictable.

Conventional coverification tool with an emulator as the device-under-test

FIG. 66 shows a conventional coverification tool with an emulator. Unlike the setup shown in FIG. 64 and described above, the device-under-test is programmed into an emulator 2048, which is coupled to the target system 2040, to certain peripheral devices, and to a test workstation 2052. The emulator 2048 includes an emulation clock 2066 and the device-under-test that has been programmed into the emulator.

The emulator 2048 is coupled to the target system 2040 via a PCI bus bridge 2044, a PCI bus 2057, and control lines 2056. The target system 2040 is a combination of a central computing system 2041 on a motherboard and several peripherals. The central computing system 2041 contains a CPU and memory and operates under an operating system such as Microsoft Windows or Sun Microsystems' Solaris to run a number of applications. A device driver 2042 for the device-under-test is included in the central computing system 2041 to enable communication between the operating system (and any applications) and the device-under-test in the emulator 2048. To communicate with the emulator 2048, as well as with the other devices that are part of this computing environment, the central computing system 2041 is coupled to a PCI bus 2043. Other peripherals in the target system 2040 include an Ethernet PCI add-on card 2045, which couples the target system to a network 2049 via bus 2058, and a SCSI PCI add-on card 2046, which is coupled to SCSI drives 2047 and 2050 via buses 2060 and 2059.

The emulator 2048 is also coupled to the test workstation 2052 via bus 2062. The test workstation 2052 includes a CPU and memory to perform its functions. The test workstation 2052 may also include test cases 2061 and device models 2068 for other devices that are modeled but not physically coupled to the emulator 2048.

Finally, the emulator 2048 is coupled via bus 2061 to other peripheral devices, such as a frame buffer or data stream record/play system 2051. This frame buffer or data stream record/play system 2051 may in turn be coupled to a communication device or channel 2053 via bus 2063, to a VCR 2054 via bus 2064, and to a monitor and/or speaker 2055 via bus 2065.

As known to those skilled in the art, the emulation clock runs at a speed much slower than the actual target system speed. Thus, the shaded portions of FIG. 66 run at emulation speed, while the remaining unshaded portions run at the actual target system speed.

As described above, this coverification tool with an emulator has several limitations. When a logic analyzer or sample-and-hold device is used to obtain internal state information from the device-under-test, the designer must compile the design so that the signals of interest for debugging are provided on output pins for sampling. If the designer wants to debug a different part of the design, the designer must verify that this part has output signals that can be sampled by the logic analyzer or sample-and-hold device, or else recompile the design in the emulator 2048 so that these signals are provided on output pins for sampling. Such recompilation can take days or weeks, which can unduly delay a time-sensitive design and development schedule. In addition, because this coverification tool works with signals, sophisticated circuitry must be provided either to convert these signals to data or to provide some signal-to-signal timing control. Furthermore, the need for numerous wires 2061 and 2062, one for each signal to be sampled, increases the debug setup burden and time.

Simulation with Reconfigurable Computing Arrays

As a brief review, FIG. 67 shows a high-level configuration of the single-engine reconfigurable computing (RCC) array system of the present invention, which was described earlier in this patent specification. This single-engine RCC system is incorporated into the co-verification system in accordance with one embodiment of the present invention.

In FIG. 67, the RCC array system 2080 includes an RCC computing system 2081, a reconfigurable computing (RCC) hardware array 2084, and a PCI bus 2089 that couples them together. Importantly, the RCC computing system 2081 contains a model of the entire user design in software, and the RCC hardware array 2084 contains a hardware model of the user design. The RCC computing system 2081 includes a CPU, memory, an operating system, and the software necessary to run the single-engine RCC system 2080. A software clock 2082 is provided to allow tight control of the software model in the RCC computing system 2081 and the hardware model in the RCC hardware array 2084. Test bench data 2083 is also stored in the RCC computing system 2081.

The RCC hardware array 2084 includes a PCI interface 2085, a set of RCC hardware array boards 2086, and various buses for interface purposes. The set of RCC hardware array boards 2086 contains at least a portion of the user design modeled in hardware (hardware model 2087) and memory for test bench data. In one embodiment, the various portions of this hardware model are distributed among a plurality of reconfigurable logic elements at configuration time. As more reconfigurable logic elements are used, more boards may be required. In one embodiment, four reconfigurable logic elements are provided on one board; in another embodiment, eight are provided on one board. The capacity and capabilities of the reconfigurable logic elements on a 4-chip board can differ considerably from those on an 8-chip board.

Bus 2090 carries the various clocks for the hardware model from the PCI interface 2085 to the hardware model 2087. Bus 2091 carries other I/O data between the PCI interface 2085 and the hardware model 2087 via connector 2093 and internal bus 2094. Bus 2092 functions as the PCI bus between the PCI interface 2085 and the hardware model 2087. Test bench data can also be stored in the memory of the hardware model 2087. As described above, the hardware model 2087 includes structures and functions, other than the hardware model of the user design itself, that are needed for the hardware model to interface with the RCC computing system 2081.

The RCC system 2080 may be provided as a single workstation or, alternatively, may be coupled to a network of workstations in which each workstation accesses the RCC system 2080 on a time-shared basis. In effect, the RCC array system 2080 functions as a simulation server with a simulation scheduler and a state swapping mechanism. The server gives each user at a workstation access to the RCC hardware array 2084 for high-speed acceleration and hardware state swapping. After acceleration and state swapping, each user can simulate the user design locally in software while releasing control of the RCC hardware array 2084 to other users at other workstations. This network model is also used in the co-verification system described below.
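
The time-shared access described above can be illustrated with a small sketch. It is only a toy model under stated assumptions: the names `UserSession`, `HardwareArray`, and `serve`, the round-robin policy, and the fixed time slice are all hypothetical and are not taken from this specification, which does not fix a scheduling policy here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class UserSession:
    """One workstation user's job (hypothetical structure)."""
    name: str
    cycles_wanted: int  # hardware cycles still requested
    state: dict = field(default_factory=lambda: {"cycle": 0})


class HardwareArray:
    """Stand-in for the shared hardware array: holds one design state at a time."""

    def __init__(self):
        self.state = None

    def load(self, state):
        self.state = dict(state)  # swap the user's saved state in

    def run(self, cycles):
        self.state["cycle"] += cycles  # pretend to accelerate the design

    def unload(self):
        state, self.state = self.state, None
        return state  # swap the state back out to the user


def serve(sessions, time_slice):
    """Round-robin the array among sessions, swapping state on every turn."""
    array, queue, log = HardwareArray(), deque(sessions), []
    while queue:
        s = queue.popleft()
        array.load(s.state)
        step = min(time_slice, s.cycles_wanted)
        array.run(step)
        s.state = array.unload()
        s.cycles_wanted -= step
        log.append((s.name, s.state["cycle"]))
        if s.cycles_wanted > 0:
            queue.append(s)  # not finished: wait for the next turn
    return log
```

Between turns, each session keeps its own saved state, which corresponds to the user continuing local software simulation while another user holds the hardware array.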

The RCC array system 2080 gives the designer the power and flexibility to simulate an entire design, to accelerate part of the test points during selected cycles through the hardware model in the reconfigurable computing array, and to obtain internal state information for virtually any part of the design at any time. Indeed, the single-engine reconfigurable computing array (RCC) system can be described as a hardware-accelerated simulator that can perform the following tasks in a single debug session: (1) simulation; (2) simulation with hardware acceleration, in which the user can start, stop, assert values, and inspect the internal state of the design at any time; (3) post-simulation analysis; and (4) in-circuit emulation. Because both the software model and the hardware model are under the strict control of a single engine through the software clock, the hardware model in the reconfigurable computing array is tightly coupled to the software simulation model. This allows the designer to debug cycle by cycle and to accelerate and decelerate the hardware model through multiple cycles to obtain valuable internal state information. Moreover, because this simulation system handles data instead of signals, no complex signal-to-data conversion or timing circuitry is needed. In addition, unlike a conventional emulation system, the hardware model in the reconfigurable computing array does not need to be recompiled when the designer wants to examine a different set of nodes. For more detail, refer to the description above.

Co-Verification System without External I/O

One embodiment of the present invention is a co-verification system that uses no actual physical external I/O devices or target applications. Thus, a co-verification system in accordance with one embodiment of the present invention can integrate the RCC system with other functions to debug the software and hardware portions of a user design without using any actual target system or I/O device. Instead, the target system and the external I/O devices are modeled in software in the RCC computing system.

Referring to FIG. 68, the co-verification system 2100 includes an RCC computing system 2101, an RCC hardware array 2108, and a PCI bus 2114 that couples them together. Importantly, the RCC computing system 2101 contains a model of the entire user design in software, and the reconfigurable computing array 2108 contains a hardware model of the user design. The RCC computing system 2101 includes a CPU, memory, an operating system, and the software necessary to run the single-engine co-verification system 2100. A software clock 2104 is provided to allow tight control of the software model in the RCC computing system 2101 and the hardware model in the reconfigurable computing array 2108. Test cases 2103 are also stored in the RCC computing system 2101.

In accordance with one embodiment of the present invention, the RCC computing system 2101 also includes a target application 2102, a driver 2105 for the hardware model of the user design, a model of a device (e.g., a video card) together with its driver in software, labeled 2106, and a model of another device (e.g., a monitor) together with its driver in software, labeled 2107. In essence, the RCC computing system 2101 contains as many device models and drivers as are needed to convey to the software model and hardware model of the user design that an actual target system and other I/O devices are part of this computing environment.

The RCC hardware array 2108 includes a PCI interface 2109, a set of RCC hardware array boards 2110, and various buses for interface purposes. The set of RCC hardware array boards 2110 contains at least a portion of the user design modeled in hardware 2112 and memory 2113 for test bench data. As described above, each board contains a plurality of reconfigurable logic elements or chips.

Bus 2115 carries the various clocks for the hardware model from the PCI interface 2109 to the hardware model 2112. Bus 2116 carries I/O data between the PCI interface 2109 and the hardware model 2112 via connector 2111 and internal bus 2118. Bus 2117 functions as the PCI bus between the PCI interface 2109 and the hardware model 2112. Test bench data can also be stored in the memory 2113 of the hardware model. As described above, the hardware model includes structures and functions, other than the hardware model of the user design itself, that are needed for the hardware model to interface with the RCC computing system 2101.

To compare the co-verification system of FIG. 68 with a conventional emulator-based co-verification system, recall that FIG. 66 shows an emulator 2048 coupled to a target system 2040, certain I/O devices (e.g., a frame buffer or data stream record/play system 2051), and a workstation 2052. This emulator configuration presents the designer with many problems and setup issues. The emulator needs a logic analyzer or sample-and-hold device to measure the internal state of the user design modeled in the emulator. Because the logic analyzer and sample-and-hold device require signals, complex signal-to-data conversion circuitry is needed, as is complex signal-to-signal timing control circuitry. The many wires required, one for every signal used to measure the internal state of the emulator, burden the user during setup. During a debug session, the user must recompile the emulator each time he wants to examine a different set of internal logic circuits, so that the appropriate signals from those logic circuits are provided as outputs for measurement and recording by the logic analyzer or sample-and-hold device. Such long recompilation times are very costly.

In the co-verification system of the present invention with no external I/O devices attached, the target system and other I/O devices are modeled in software, so no actual physical target system or I/O device is needed. Because the RCC computing system 2101 processes data, no complex signal-to-data conversion circuitry or signal-to-signal timing control circuitry is needed. Because the number of wires is not tied to the number of signals, setup is relatively simple. In addition, debugging a different part of the logic circuitry in the hardware model of the user design does not require recompilation, because the co-verification system processes only data, not signals. Because the RCC computing system controls the RCC hardware array with software-controlled clocks (i.e., the software clock and clock edge detection circuitry), starting and stopping the hardware model is straightforward. Reading data from the hardware model is also easy, because a model of the entire user design exists in software and the software clock enables synchronization. Thus, the user can debug by software simulation alone, accelerate all or part of the hardware model, step through various desired test points cycle by cycle, and inspect the internal state of the software and hardware models (e.g., register and combinational logic states). For example, the user can simulate the design with some test bench data, download the internal state information to the hardware model, accelerate the design with the hardware model using various test bench data, inspect the resulting internal state values of the hardware model through register/combinational logic regeneration and through values loaded from the hardware model into the software model, and finally simulate other parts of the user design using the results of the hardware model acceleration.
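
The example debug flow at the end of the preceding paragraph (simulate, download state, accelerate, read state back, continue simulating) can be sketched as follows. This is a minimal illustration only: the one-register `Model` class and the function `debug_session` are hypothetical stand-ins, and both "models" here are plain Python objects rather than a real simulator and hardware array.

```python
class Model:
    """Toy design model with a single accumulator register (hypothetical)."""

    def __init__(self):
        self.regs = {"acc": 0}

    def step(self, stimulus):
        # One evaluation cycle: the register accumulates the stimulus value.
        self.regs["acc"] += stimulus


def debug_session(sim_data, accel_data, tail_data):
    """Walk through the five phases of the debug flow on toy models."""
    software, hardware = Model(), Model()

    for value in sim_data:                 # 1. simulate in software
        software.step(value)

    hardware.regs = dict(software.regs)    # 2. download state to the hardware model

    for value in accel_data:               # 3. accelerate with the hardware model
        hardware.step(value)

    software.regs = dict(hardware.regs)    # 4. load hardware state back into software
    inspected = dict(software.regs)        #    and inspect it

    for value in tail_data:                # 5. continue simulating in software
        software.step(value)

    return inspected, software.regs
```

The point of the sketch is that state moves between the two models as data, so inspecting a different register would require no rebuild of either model.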

As described above, however, a workstation is still needed for debug session control. In a network configuration, the workstation can be coupled remotely to the co-verification system to access debug data remotely. In a non-network configuration, the workstation can be coupled locally to the co-verification system, or in certain other embodiments the workstation can incorporate the co-verification system internally, so that debug data can be accessed locally.

Co-Verification System with External I/O

In FIG. 68, the various I/O devices and target applications were modeled in the RCC computing system 2101. However, when too many I/O devices and target applications run in the RCC computing system 2101, overall speed drops. With only one CPU in the RCC computing system 2101, more time is needed to process the various data from all of the device models and target applications. To increase data throughput, actual I/O devices and target applications (instead of software models of these I/O devices and target applications) can be physically coupled to the co-verification system.

One embodiment of the present invention is a co-verification system that uses actual physical external I/O devices and target applications. Such a co-verification system can combine the RCC system with other functions to debug the software and hardware portions of a user design while using an actual target system and/or I/O devices. For testing, the co-verification system can use both test bench data from software and stimulus from the external interface (e.g., the target system and external I/O devices). Test bench data can be used to supply test data not only to the pin-outs of the user design but also to internal nodes of the user design. Actual I/O signals from an external I/O device (or target system) can be directed only to the pin-outs of the user design. Thus, one main difference between test data from the external interface (e.g., the target system or an external I/O device) and the test bench processes in software is that test bench data can test the user design with stimulus applied to both pin-outs and internal nodes, whereas stimulus from the target system or an external I/O device can be applied to the user design only through its pin-outs (or the nodes of the user design that represent the pin-outs). The following description presents the structure of the co-verification system and its configuration with respect to the target system and the external I/O devices.
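
The restriction stated above (test bench data may reach pin-outs and internal nodes, external stimulus may reach pin-outs only) can be expressed as a simple check. This is a hedged sketch: the node names, the `source` labels, and the function `apply_stimulus` are invented for illustration and do not appear in this specification.

```python
PIN_OUTS = {"data_in", "irq"}      # hypothetical pin-out node names
INTERNAL_NODES = {"fifo_count"}    # hypothetical internal node name


def apply_stimulus(design, source, node, value):
    """Drive `node` with `value`, enforcing the per-source restriction."""
    if source not in ("test_bench", "external"):
        raise ValueError(f"unknown stimulus source: {source}")
    if node not in PIN_OUTS | INTERNAL_NODES:
        raise KeyError(node)
    # External devices and the target system can only reach pin-outs.
    if source == "external" and node not in PIN_OUTS:
        raise PermissionError(
            f"external stimulus cannot reach internal node {node!r}"
        )
    design[node] = value
    return design
```

A test bench call such as `apply_stimulus(design, "test_bench", "fifo_count", 3)` succeeds, while the same call with `"external"` as the source is rejected.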

Compared with the system configuration of FIG. 66, the co-verification system in accordance with one embodiment of the present invention replaces the structure and function of the items inside dotted line 2070. In other words, whereas FIG. 66 shows an emulator and a workstation inside the boundary of dotted line 2070, one embodiment of the present invention places the co-verification system 2140 (and its associated workstation), as shown in FIG. 69, inside dotted line 2070.

Referring to FIG. 69, a co-verification system configuration in accordance with one embodiment of the present invention includes a target system 2120, a co-verification system 2140, certain optional I/O devices, and control/data buses 2131 and 2132 for coupling them together. The target system 2120 includes a central computing system 2121, which includes a CPU and memory and operates under an operating system such as Microsoft Windows or Sun Microsystems' Solaris to run a number of applications 2122 and test cases 2123. A device driver 2124 for the hardware model of the user design is included in the central computing system to enable communication between the operating system (and any applications) and the user design. To communicate with the co-verification system and the other devices that are part of this computing environment, the central computing system 2121 is coupled to a PCI bus 2129. Other peripherals in the target system 2120 include an Ethernet PCI add-on card 2125 used to couple the target system to a network, a SCSI PCI add-on card 2126 coupled to a SCSI driver 2128 via bus 2130, and a PCI bus bridge 2127.

The co-verification system 2140 includes an RCC computing system 2141, an RCC hardware array 2190, an external interface 2139 in the form of an external I/O expander, and a PCI bus 2171 that couples the RCC computing system 2141 and the RCC hardware array 2190 together. The RCC computing system 2141 includes a CPU, memory, an operating system, and the software necessary to run the single-engine co-verification system 2140. Importantly, the RCC computing system 2141 contains a model of the entire user design in software, and the RCC hardware array 2190 contains a hardware model of the user design.

As described above, the single engine of the co-verification system derives its power and flexibility from a main software kernel that resides in the main memory of the RCC computing system 2141 and controls the overall operation and execution of the co-verification system 2140. As long as any test bench process is active or any signal from the outside world is presented to the co-verification system, the kernel evaluates the active test bench components, evaluates the clock components, detects clock edges in order to update registers and memories and to propagate combinational logic data, and advances the simulation time. This main software kernel provides for the tightly coupled nature of the RCC computing system 2141 and the RCC hardware array 2190.

The software kernel generates software clock signals from a software clock source 2142, and these signals are provided to the RCC hardware array 2190 and to the outside world. The clock source 2142 can generate multiple clocks at different frequencies, depending on the destination of these software clocks. In general, the software clock ensures that the registers in the hardware model of the user design evaluate in synchronization with the system clock and without any hold-time violations. The software model can detect clock edges in software that affect hardware model register values. Accordingly, the clock detection mechanism ensures that clock edge detection in the main software model can be translated into clock detection in the hardware model. For a more detailed description of the software clock and the clock edge detection logic, see FIGS. 17-19 and the accompanying text in this patent specification.
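
The kernel behavior described above (evaluate test bench inputs, detect a clock edge, update registers, propagate combinational logic, advance simulation time) can be sketched as a small loop. This is an assumption-laden toy, not the kernel of this specification: the function `run_kernel`, the single register `q`, the inverter used as combinational logic, and the rising-edge convention are all chosen only for illustration.

```python
def run_kernel(test_bench):
    """Run a toy kernel loop over (clk, d) samples, one sample per time step.

    The toy design is one register `q` that captures `d` on a rising clock
    edge, followed by an inverter as the only combinational logic.
    """
    q = 0            # register state
    prev_clk = 0     # previous clock level, used for edge detection
    sim_time = 0
    trace = []

    for clk, d in test_bench:          # evaluate active test bench components
        rising_edge = prev_clk == 0 and clk == 1   # detect the clock edge
        if rising_edge:
            q = d                      # registers update only on the edge
        out = q ^ 1                    # propagate combinational logic
        trace.append((sim_time, q, out))
        prev_clk = clk
        sim_time += 1                  # advance simulation time

    return trace
```

Because the register changes only when the loop itself sees an edge, every register update is ordered by the kernel, which is the property the text attributes to the software clock.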

In accordance with one embodiment of the present invention, the RCC computing system 2141 can also contain models of one or more of the I/O devices, even though other actual physical I/O devices may be coupled to the co-verification system. For example, the RCC computing system 2141 can contain a model of a device (e.g., a speaker) along with its driver and test bench data in software, labeled 2143, and a model of another device (e.g., a graphics accelerator) along with its driver and test bench data in software, labeled 2144. The user decides which devices (and their associated drivers and test bench data) are modeled and incorporated into the RCC computing system 2141 and which devices are actually coupled to the co-verification system.

The co-verification system includes control logic that provides traffic control (1) between the RCC computing system 2141 and the RCC hardware array 2190, and (2) between the external interface (which is coupled to the target system and the external I/O devices) and the RCC hardware array 2190. Some data passes between the RCC hardware array 2190 and the RCC computing system 2141 because some I/O devices may be modeled in the RCC computing system. In addition, the RCC computing system 2141 holds the model of the entire design in software, including the portion of the user design that is modeled in the RCC hardware array 2190. As a result, the RCC computing system 2141 must also have access to all data that passes between the external interface and the RCC hardware array 2190. The control logic ensures that the RCC computing system 2141 has access to this data. The control logic is described in more detail below.
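
The two traffic paths, and the requirement that the RCC computing system see everything the external interface sends to the hardware array, can be pictured with the following sketch. The class `ControlLogic`, its method names, and the list-based queues are hypothetical; the actual control logic is hardware described later in the specification, and this toy only models the visibility rule.

```python
class ControlLogic:
    """Toy traffic controller; names and structure are hypothetical."""

    def __init__(self):
        self.to_hardware = []   # data delivered to the hardware array
        self.to_software = []   # external data made visible to the software model

    def from_external(self, pin, value):
        """Path (2): external interface to hardware array, mirrored to software."""
        self.to_hardware.append((pin, value))
        self.to_software.append((pin, value))

    def from_software(self, node, value):
        """Path (1): computing system to hardware array (e.g. modeled-device data)."""
        self.to_hardware.append((node, value))
```

Mirroring on the external path is the design choice that keeps the software model of the entire design consistent with the hardware model when real devices supply the stimulus.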

The RCC hardware array 2190 includes a number of array boards. In the particular embodiment shown in FIG. 69, the hardware array 2190 includes boards 2145-2149. Boards 2146-2149 contain most of the configured hardware model. Board 2145 (or board m1) contains a reconfigurable computing element (e.g., an FPGA chip) 2153, which the co-verification system can use to configure at least a portion of the hardware model, and an external I/O controller 2152, which directs traffic and data between the external interface (the target system and I/O devices) and the co-verification system 2140. Through the external I/O controller, board 2145 allows the RCC computing system 2141 to access all data that passes between the outside world (i.e., the target system and I/O devices) and the RCC hardware array 2190. This access is important because the RCC computing system 2141 of the co-verification system contains a model of the entire user design in software and because the RCC computing system 2141 can also control the functionality of the RCC hardware array 2190.

If stimulus from an external I/O device is provided to the hardware model, the software model must have access to this stimulus as well, so that the user of the co-verification system can selectively control the next debug step, which may include inspecting the internal state values of the design that result from the applied stimulus. As described above in connection with the board layout and interconnect overview, the first and last boards are included in the hardware array 2190. Thus, board 1 (labeled board 2146) and board 8 (labeled board 2149) are included in an 8-board hardware array (excluding board m1). In addition to boards 2145-2149, a board m2 (not shown in FIG. 69; see FIG. 74) with a chip m2 may also be provided. Board m2 is similar to board m1, except that board m2 has no external interface and can be used for expansion if additional boards are needed.

The contents of these boards will now be described. Board 2145 (board m1) includes a PCI controller 2151, an external I/O controller 2152, a data chip (m1) 2153, a memory 2154, and a multiplexer 2155. In one embodiment, the PCI controller is a PLX 9080. The PCI controller 2151 is coupled to the RCC computing system 2141 via bus 2171 and to a tri-state buffer 2179 via bus 2172.

The main traffic controller in the co-verification system between the external world (target system 2120 and I/O devices) and the RCC computing system 2141 is the external I/O controller 2152 (labeled "CTRLXM" in FIGS. 69, 71, and 73), which is coupled to the RCC computing system 2141, the other boards 2146-2149 of the RCC hardware array, the target system 2120, and the actual external I/O devices. Of course, the main traffic controller between the RCC computing system 2141 and the RCC hardware array 2190 has always been, as described above, the combination of the PCI controller 2151 and the individual internal I/O controllers (e.g., I/O controllers 2156 and 2158) on each array board 2146-2149. In one embodiment, these individual internal I/O controllers, such as controllers 2156 and 2158, are the FPGA I/O controllers described and illustrated above in exemplary figures such as FIG. 22 (unit 700) and FIG. 56 (unit 1200).

The external I/O controller 2152 is coupled to the tri-state buffer 2179 so that the external I/O controller can interface with the RCC computing system 2141. In one embodiment, the tri-state buffer 2179 in some instances allows data from the RCC computing system 2141 to pass to the local bus 2180 while preventing data on the local bus from passing to the RCC computing system 2141, and in other instances allows data to pass from the local bus 2180 to the RCC computing system 2141.

The external I/O controller 2152 is also coupled to chip (m1) 2153 and the memory/external buffer 2154 via data bus 2176. In one embodiment, chip (m1) 2153 is a reconfigurable computing element, such as an FPGA chip, that can be used to configure at least a portion of the hardware model of the user design (or the entire hardware model, if the user design is small enough). The external buffer 2154 is a DRAM DIMM in one embodiment and can be used by chip 2153 for a variety of purposes. The external buffer 2154 provides far more memory capacity than the individual SRAM memory devices that are locally coupled to each reconfigurable logic element (e.g., reconfigurable logic element 2157). This large memory capacity allows the RCC computing system to store large amounts of data, such as test bench data, implementation code for a microcontroller (if the user design is a microcontroller), and a large look-up table, in one memory device. The external buffer 2154 can also be used to store data needed for hardware modeling, as described above. In essence, the external buffer 2154 can function in part like the other high-bank or low-bank SRAM memory devices described and illustrated above, for example in FIG. 56 (SRAMs 1205 and 1206), but with more memory. The external buffer 2154 can also be used by the co-verification system to store data received from the target system 2120 and the external I/O devices so that the data can later be retrieved by the RCC computing system 2141. Chip m1 2153 and the external buffer 2154 also contain the memory mapping logic described herein under the section "Memory Simulation."

To access desired data in the external buffer 2154, both chip 2153 and the RCC computing system 2141 (via the external I/O controller 2152) can supply an address for the desired data. Chip 2153 provides its address on address bus 2182, and the external I/O controller 2152 provides its address on address bus 2177. These address buses 2182 and 2177 are inputs to the multiplexer 2155, which provides the selected address on output line 2178 coupled to the external buffer 2154. The select signal for the multiplexer 2155 is provided by the external I/O controller 2152 via line 2181.
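The address selection described above can be sketched in a few lines of Python. This is only an illustrative behavioral model, not the circuit of the embodiment: the function names, the dictionary standing in for the external buffer, and the polarity of the select signal (true selects the controller's address) are all assumptions made for the example.

```python
def select_buffer_address(chip_addr: int, ctrl_addr: int, select_ctrl: bool) -> int:
    """Model multiplexer 2155: choose which address reaches external buffer 2154.

    chip_addr   -- address driven by chip 2153 on address bus 2182
    ctrl_addr   -- address driven by external I/O controller 2152 on bus 2177
    select_ctrl -- select signal on line 2181 (polarity is an assumption)
    """
    return ctrl_addr if select_ctrl else chip_addr


def read_external_buffer(buffer: dict, chip_addr: int, ctrl_addr: int,
                         select_ctrl: bool):
    """Return the word stored at the address chosen by the multiplexer."""
    return buffer[select_buffer_address(chip_addr, ctrl_addr, select_ctrl)]


# Hypothetical buffer contents used only for illustration.
example_buffer = {0x10: "lookup-table word", 0x20: "test bench word"}
chip_view = read_external_buffer(example_buffer, 0x10, 0x20, select_ctrl=False)
ctrl_view = read_external_buffer(example_buffer, 0x10, 0x20, select_ctrl=True)
```

Because the select signal comes from the external I/O controller, the controller effectively arbitrates between its own accesses (on behalf of the RCC computing system) and those of chip 2153.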

The external I/O controller 2152 is also coupled to the other boards 2146-2149 via bus 2180. In one embodiment, bus 2180 is the local bus described and illustrated above in the exemplary FIG. 22 (local bus 708) and FIG. 56 (local bus 1210). In this embodiment, only five boards (including board 2145 (board m1)) are used. The actual number of boards is determined by the complexity and size of the user design to be modeled in hardware. A hardware model of a user design of medium complexity requires fewer boards than a hardware model of a highly complex user design.

To enable scalability, boards 2146-2149 are substantially identical to one another except for certain inter-board interconnect lines. These interconnect lines allow one portion of the hardware model of the user design in one chip (e.g., chip 2157 on board 2146) to communicate with another portion of the hardware model of the same user design that is physically located in another chip (e.g., chip 2161 on board 2148). For the interconnect structure of this co-verification system, refer briefly to FIG. 74, as well as to FIGS. 8 and 36-44 and their accompanying descriptions in the specification.

Board 2148 is a representative board. Board 2148 is the third board in this four-board layout (excluding board 2145 (board m1)). Accordingly, it is not an end board that requires appropriate terminations for the interconnect lines. Board 2148 includes an internal I/O controller 2158, several reconfigurable logic elements (e.g., FPGA chips) 2159-2166, a high-bank FD bus 2167, a low-bank FD bus 2168, a high-bank memory 2169, and a low-bank memory 2170. As described above, in one embodiment the internal I/O controller 2158 is the FPGA I/O controller described and illustrated above in the exemplary FIG. 22 (unit 700) and FIG. 56 (unit 1200). Similarly, the high- and low-bank memory devices 2169 and 2170 are the SRAM memory devices described and illustrated above, for example in FIG. 56 (SRAMs 1205 and 1206). In one embodiment, the high- and low-bank FD buses 2167 and 2168 are the FD bus or FPGA bus described and illustrated above in the exemplary FIG. 22 (FPGA buses 718 and 719), FIG. 56 (FD buses 1212 and 1213), and FIG. 57 (FD bus 1282).

To couple the co-verification system 2140 to the target system 2120 and the other I/O devices, an external interface 2139 in the form of an external I/O expander is provided. On the target system side, the external I/O expander 2139 is coupled to the PCI bridge 2127 via the secondary PCI bus 2132 and a control line 2131, which is used to deliver the software clock. On the I/O device side, the external I/O expander 2139 is coupled to various I/O devices via buses 2136-2138 for pin-out data and control lines 2133-2135 for the software clock. The number of I/O devices that can be coupled to the I/O expander 2139 is determined by the user. In any event, the external I/O expander 2139 is provided with as many data buses and software clock control lines as there are I/O devices that must be coupled to the co-verification system 2140 to run a successful debug session.

On the co-verification system 2140 side, the external I/O expander 2139 is coupled to the external I/O controller via data bus 2175, software clock control line 2174, and scan control line 2173. Data bus 2175 is used to pass pin-out data between the external world (target system 2120 and external I/O devices) and the co-verification system 2140. Software clock control line 2174 is used to deliver the software clock from the RCC computing system 2141 to the external world.

The software clock present on control lines 2174 and 2131 is generated by the main software kernel in the RCC computing system 2141. The RCC computing system 2141 delivers the software clock to the external I/O expander 2139 via PCI bus 2171, PCI controller 2151, bus 2172, tri-state buffer 2179, local bus 2180, external I/O controller 2152, and control line 2174. From the external I/O expander 2139, the software clock is provided as the clock input to the target system 2120 (via PCI bridge 2127) and to the other external I/O devices via control lines 2133-2135. Because the software clock serves as the main clock source, the target system 2120 and the I/O devices run at a slower speed. However, the data provided to the target system 2120 and the external I/O devices are synchronized to the same software clock rate as the software model in the RCC computing system 2141 and the hardware model in the RCC hardware array 2190. Similarly, data from the target system 2120 and the external I/O devices are delivered to the co-verification system 2140 in synchronization with the software clock.

Thus, I/O data passed between the external interface and the co-verification system are synchronized with the software clock. In essence, the software clock synchronizes the operation of the target system and the external I/O devices with that of the co-verification system (the RCC computing system and the RCC hardware array) whenever data are passed between them. The software clock is used for both data input and data output operations. For a data input operation, as a pointer (discussed later) latches the software clock from the RCC computing system 2141 to the external interface, other pointers latch the I/O data inputs from the external interface to selected internal nodes in the hardware model of the RCC hardware array 2190. One by one, the pointers latch these I/O data inputs during the cycle in which the software clock is delivered to the external interface. Once all the data have been latched, the RCC computing system can generate another software clock to latch more data in another software clock cycle, if desired. For a data output operation, the RCC computing system can deliver the software clock to the external interface and subsequently, with the aid of pointers, control the gating of data from internal nodes of the hardware model in the RCC hardware array 2190 to the external interface. Again, one by one, the pointers gate data from the internal nodes to the external interface. If more data need to be delivered to the external interface, the RCC computing system can generate another software clock cycle and then activate selected pointers to gate the data out to the external interface. The generation of the software clock is strictly controlled, which allows the co-verification system to synchronize data delivery and data evaluation between the co-verification system and whatever external I/O devices are coupled to the external interface.

Scan control line 2173 is used to allow the co-verification system 2140 to scan the data buses 2132, 2136, 2137, and 2138 for any data that may be present. The logic in the external I/O controller 2152 that supports the scan signal is pointer logic in which each of several inputs is provided as the output for a specified time period before a MOVE signal shifts to the next input. This logic is analogous to the scheme shown in FIG. 11. In effect, the scan signal functions like the select signal of a multiplexer, except that it selects the multiplexer inputs in round-robin order. Thus, in one time period, the scan signal on scan control line 2173 samples data bus 2132 for data that may be coming from the target system 2120. In the next time period, the scan signal samples data bus 2136 for data that may be coming from an external I/O device. In the following time period, data bus 2137 is sampled, and so on, so that the co-verification system 2140 can receive and process all pin-out data originating from the target system 2120 or the external I/O devices during this debug session. Any data received by the co-verification system 2140 from the data buses 2132, 2136, 2137, and 2138 are transferred to the external buffer 2154 via the external I/O controller 2152.
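The round-robin scanning described above can be illustrated with a short behavioral sketch in Python. It is a simplified model under stated assumptions, not the pointer logic of FIG. 11: each scan period samples at most one pending item, the queues standing in for the data buses and the function name `scan_buses` are hypothetical, and the returned list stands in for data forwarded to the external buffer 2154.

```python
from collections import deque

# Order in which the scan signal visits the data buses, per the description.
SCAN_ORDER = (2132, 2136, 2137, 2138)


def scan_buses(buses: dict, periods: int) -> list:
    """Sample one data bus per time period in round-robin order.

    buses   -- maps a bus number to a deque of pending pin-out data
               (hypothetical stand-in for data present on that bus)
    periods -- number of scan time periods to simulate
    Returns the (bus, data) pairs captured, in capture order.
    """
    captured = []
    pointer = 0
    for _ in range(periods):
        bus_id = SCAN_ORDER[pointer]
        pending = buses.get(bus_id)
        if pending:  # data is present on the sampled bus
            captured.append((bus_id, pending.popleft()))
        pointer = (pointer + 1) % len(SCAN_ORDER)  # MOVE to the next input
    return captured


example = {2132: deque(["target word"]), 2137: deque(["io word 1", "io word 2"])}
captured = scan_buses(example, periods=8)
```

In this model a bus with several pending items is drained over successive passes, which mirrors the idea that every bus gets its turn before any bus is sampled again.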

The configuration shown in FIG. 69 assumes that the target system 2120 contains the primary CPU and that the user design is some peripheral device, such as a video controller, network adapter, graphics adapter, mouse, or some other support device, card, or logic. Thus, the target system 2120 contains the target application (including the operating system) coupled to the primary PCI bus 2129, and the co-verification system 2140 contains the user design and is coupled to the secondary PCI bus 2132. The configuration may differ considerably depending on the subject of the user design. For example, if the user design is a CPU, the target application would run in the RCC computing system 2141 of the co-verification system 2140, while the target system 2120 would no longer contain the central computing system 2121. In that case, bus 2132 would be the primary PCI bus and bus 2129 would be the secondary PCI bus. In effect, instead of the user design being one of the peripherals that support the central computing system 2121, the user design is the main computing center and all other peripherals support the user design.

The control logic for passing data between the external interface (external I/O expander 2139) and the co-verification system 2140 is found on each of boards 2145-2149. The primary portion of the control logic is formed in the external I/O controller 2152, but other portions are formed in the various internal I/O controllers (e.g., 2156, 2158) and reconfigurable logic elements (e.g., FPGA chips 2159, 2165). For instructional purposes, it is necessary to show only a portion of this control logic rather than the same repeated logic structure for all chips on all boards. The portion of the co-verification system 2140 within dashed line 2150 of FIG. 69 contains one subset of the control logic. This control logic is discussed in more detail with respect to FIGS. 70-73.

The elements of this particular subset of the control logic include the external I/O controller 2152, tri-state buffer 2179, internal I/O controller 2156 (CTRL 1), reconfigurable logic element 2157 (chip0_1, denoting chip 0 on board 1), and portions of the various buses and control lines coupled to these elements. Specifically, FIG. 70 shows the portion of the control logic used for data input cycles, in which data from the external interface (external I/O expander 2139) and the RCC computing system 2141 are delivered to the RCC hardware array 2190. FIG. 72 shows the timing diagram of the data input cycles. FIG. 71 shows the portion of the control logic used for data output cycles, in which data from the RCC hardware array 2190 are delivered to the RCC computing system 2141 and the external interface (external I/O expander 2139). FIG. 73 shows the timing diagram of the data output cycles.

Data Input

The data input control logic in accordance with one embodiment of the present invention handles data delivered from the RCC computing system or the external interface to the RCC hardware array. One particular subset 2150 (see FIG. 69) of the data input control logic is shown in FIG. 70 and includes external I/O controller 2200, tri-state buffer 2202, internal I/O controller 2203, reconfigurable logic element 2204, and the various buses and control lines that allow data to be delivered. External buffer 2201 is also shown for this data input embodiment. This subset illustrates the logic needed for data input operations, in which data from the external interface and the RCC computing system are delivered to the RCC hardware array. The data input control logic of FIG. 70 and the data input timing diagram of FIG. 72 are discussed together.

Two types of data cycles are used in this data input embodiment of the present invention: a global cycle and a software-to-hardware (S2H) cycle. The global cycle is used for any data directed to all the chips in the RCC hardware array, such as clocks, resets, and certain other S2H data directed at many different nodes in the RCC hardware array. For the latter "global" S2H data, it is more practical to send the data out via the global cycle than via the sequential S2H cycle.

The software-to-hardware cycle is used to send data from the test bench process in the RCC computing system to the RCC hardware array sequentially, from one chip to the next across all boards. Because the hardware model of the user design is distributed across several boards, the test bench data must be provided to every chip for data evaluation. Hence, the data are delivered sequentially to each internal node in each chip, one internal node at a time. Because the hardware model is distributed among a plurality of chips, this sequential delivery allows particular data intended for a particular internal node to be processed by all the chips in the RCC hardware array.

For this data evaluation, the co-verification system provides two address spaces: S2H and CLK. As described above, the S2H and CLK spaces are the primary inputs from the kernel to the hardware model. The hardware model holds all the register components and combinational components of the user's circuit design. Furthermore, the software clock is modeled in software and provided in the CLK I/O address space to interface with the hardware model. The kernel advances simulation time, looks for active test bench components, and evaluates clock components. When any clock edge is detected by the kernel, registers and memories are updated and values propagate through the combinational components. Thus, any change of values in these spaces will trigger the hardware model to change its logic state if the hardware acceleration mode is selected.

During data transfer, the DATA_XSFR signal is at logic "1." During this time, the local buses 2222-2230 are used by the co-verification system to deliver data in the following data cycles: (1) global data from the RCC computing system to the RCC hardware array and the CLK space; (2) global data from the external interface to the RCC hardware array and the external buffer; and (3) S2H data from the RCC computing system to the RCC hardware array, one chip at a time on each board. Thus, the first two data cycles are part of the global cycle and the last data cycle is part of the S2H cycle.

For the first part of the data input global cycle, in which global data from the RCC computing system are sent to the RCC hardware array, the external I/O controller 2200 enables the CPU_IN signal on line 2255 to logic "1." Line 2255 is coupled to the enable input of the tri-state buffer 2202. With logic "1" on line 2255, the tri-state buffer 2202 allows data on local bus 2222 to pass to local buses 2223-2230 on the other side of the tri-state buffer 2202. In this particular example, local buses 2223, 2224, 2225, 2226, 2227, 2228, 2229, and 2230 correspond to LD3, LD4 (from external I/O controller 2200), LD6 (from external I/O controller 2200), LD1, LD6, LD4, LD5, and LD7, respectively.
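As a rough illustration of the gating just described, the following Python sketch models the tri-state buffer as a function that forwards the bus contents only while its enable input is at logic 1. The function name, the use of a dictionary for bus lines, and the use of `None` to represent an undriven (high-impedance) line are assumptions made for the example; the sketch covers only the CPU_IN direction discussed in this paragraph.

```python
def tri_state_buffer(cpu_in: int, bus_2222: dict) -> dict:
    """Forward local bus 2222 data downstream only while CPU_IN is logic 1.

    cpu_in   -- enable input on line 2255 (1 enables, anything else disables)
    bus_2222 -- maps a line name to the bit driven by the RCC computing system
    Returns the values seen on the far side of the buffer; None models a
    line that the buffer is not driving (high impedance).
    """
    if cpu_in == 1:
        return dict(bus_2222)
    return {line: None for line in bus_2222}


# Hypothetical global data driven by the RCC computing system.
upstream = {"LD1": 1, "LD4": 0, "LD5": 1, "LD6": 1, "LD7": 0}
enabled = tri_state_buffer(1, upstream)
disabled = tri_state_buffer(0, upstream)
```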

The global data proceed from these local bus lines to bus lines 2231-2235 in the internal I/O controller 2203 and then to FD bus lines 2236-2240. In this example, FD bus lines 2236, 2237, 2238, 2239, and 2240 correspond to FD bus lines FD1, FD6, FD4, FD5, and FD7, respectively.

These FD bus lines 2236-2240 are coupled to the inputs of latches 2208-2213 in the reconfigurable logic element 2204. In this example, the reconfigurable logic element corresponds to chip0_1 (i.e., chip 0 on board 1). Specifically, FD bus line 2236 is coupled to latch 2208, FD bus line 2237 is coupled to latches 2209 and 2211, FD bus line 2238 is coupled to latch 2210, FD bus line 2239 is coupled to latch 2212, and FD bus line 2240 is coupled to latch 2213.

The enable input of each of these latches 2208-2213 is coupled to one of several global pointers and software-to-hardware (S2H) pointers. The enable inputs of latches 2208-2211 are coupled to the global pointers, and the enable inputs of latches 2212-2213 are coupled to the S2H pointers. Some exemplary global pointers include GLB_PTR0 on line 2241, GLB_PTR1 on line 2242, GLB_PTR2 on line 2243, and GLB_PTR3 on line 2244. Some exemplary S2H pointers include S2H_PTR0 on line 2245 and S2H_PTR1 on line 2246. Because the enable inputs of these latches are coupled to these pointers, a latch cannot latch data to its intended destination node in the hardware model of the user design without the proper pointer signal.
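The pointer-gated latching described above can be sketched as a small Python model. The FD line feeding each latch follows the connections given in the text; the one-to-one pairing of each latch with a particular pointer (latch 2208 with GLB_PTR0, and so on in numerical order) is an assumption made for illustration, as are the function and table names.

```python
# FD bus line feeding each latch input, per the connections described above.
FD_LINE_OF_LATCH = {
    2208: "FD1", 2209: "FD6", 2210: "FD4",
    2211: "FD6", 2212: "FD5", 2213: "FD7",
}

# Pointer driving each latch enable. The text says latches 2208-2211 use
# global pointers and 2212-2213 use S2H pointers; the exact pairing below
# is an assumption.
POINTER_OF_LATCH = {
    2208: "GLB_PTR0", 2209: "GLB_PTR1", 2210: "GLB_PTR2",
    2211: "GLB_PTR3", 2212: "S2H_PTR0", 2213: "S2H_PTR1",
}


def latch_step(state: dict, fd_bus: dict, active_pointer) -> dict:
    """Return the latch outputs after one step with one pointer active.

    Only the latch whose enable is tied to active_pointer captures the value
    on its FD line; every other latch holds its previous output.
    """
    new_state = dict(state)
    for latch, pointer in POINTER_OF_LATCH.items():
        if pointer == active_pointer:
            new_state[latch] = fd_bus[FD_LINE_OF_LATCH[latch]]
    return new_state


initial = {latch: 0 for latch in POINTER_OF_LATCH}
fd_bus = {"FD1": 1, "FD4": 1, "FD5": 1, "FD6": 1, "FD7": 1}
after_glb1 = latch_step(initial, fd_bus, "GLB_PTR1")
after_idle = latch_step(initial, fd_bus, None)
```

Note that latches 2209 and 2211 share FD line FD6 in this example, so it is the pointer, not the bus line, that determines which internal node receives a given value.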

These global and S2H pointer signals are generated by the data input pointer state machine 2214 on output 2254. The data input pointer state machine 2214 is controlled by the DATA_XSFR and F_WR signals on line 2253. The internal I/O controller 2203 generates the DATA_XSFR and F_WR signals on line 2253. The DATA_XSFR signal is at logic "1" whenever data transfer between the RCC hardware array and either the RCC computing system or the external interface is desired. The F_WR signal, in contrast to the F_RD signal, is at logic "1" whenever a write to the RCC hardware array is desired. A read via the F_RD signal involves delivery of data from the RCC hardware array to either the RCC computing system or the external interface. When both the DATA_XSFR and F_WR signals are at logic "1," the data input pointer state machine can generate the proper global or S2H pointer signals in the proper programmed sequence.
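A minimal behavioral sketch of such a state machine is given below in Python. It captures only the enabling condition stated above (both DATA_XSFR and F_WR at logic 1) and the idea of a programmed pointer sequence. The class name, the one-pointer-per-step timing, and the choice to restart the sequence whenever the enabling condition drops are assumptions for illustration and are not taken from FIG. 72.

```python
class DataInPointerStateMachine:
    """Toy model of a data input pointer state machine."""

    def __init__(self, sequence):
        # Programmed order in which pointer signals are asserted.
        self.sequence = tuple(sequence)
        self.index = 0

    def step(self, data_xsfr: int, f_wr: int):
        """Advance one step and return the asserted pointer name, or None.

        A pointer is produced only while DATA_XSFR and F_WR are both 1.
        When either is 0, the machine idles and (by assumption) restarts
        its sequence.
        """
        if data_xsfr != 1 or f_wr != 1:
            self.index = 0
            return None
        if self.index == len(self.sequence):
            return None  # programmed sequence exhausted
        pointer = self.sequence[self.index]
        self.index += 1
        return pointer


machine = DataInPointerStateMachine(["GLB_PTR0", "GLB_PTR1", "S2H_PTR0"])
blocked = machine.step(data_xsfr=1, f_wr=0)   # write not requested
issued = [machine.step(1, 1) for _ in range(4)]
```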

The outputs 2247-2252 of these latches are coupled to various internal nodes of the hardware model of the user design. Some of these internal nodes correspond to input pin-outs of the user design. The user design has other internal nodes that are normally not accessible via pin-outs, but these non-pin-out internal nodes serve other debugging purposes, giving flexibility to a designer who wants to apply stimulus to various internal nodes in the user design whether or not they are input pin-outs. For stimulus supplied by the external interface to the hardware model of the user design, the data-in logic and those internal nodes that correspond to input pin-outs are implemented. For example, if the user design is a CRTC 6845 video controller, some input pin-outs are as follows:

LPSTPB - light pen strobe pin

~RESET - low-level signal to reset the 6845 controller

RS - register select

E - enable

CLK - clock

~CS - chip select

Other input pin-outs may also be used in this video controller. Based on the number of input pin-outs that interface with the outside world, the number of latches and pointers can be readily determined. Some hardware models configured in the RCC hardware array have, for example, 30 separate latches associated with each of GLB_PTR0, GLB_PTR1, GLB_PTR2, GLB_PTR3, S2H_PTR0, and S2H_PTR1, for a total of 180 latches (= 30 x 6). In other designs, more global pointers, such as GLB_PTR4 through GLB_PTR30, may be used as needed. Similarly, more S2H pointers, such as S2H_PTR2 through S2H_PTR30, may be used as needed. These pointers and their corresponding latches depend on the requirements of the hardware model of each user design.

Referring to FIGS. 70 and 72, data on the FD bus lines makes its way to these internal nodes only if the latches are enabled with the proper global pointer or S2H pointer signal. Otherwise, these internal nodes are not driven by any data on the FD bus. When F_WR is at logic "1" during the first half of the CPU_IN=1 time period, GLB_PTR0 goes to logic "1" and drives the data on FD1 to the corresponding internal node via line 2247. If other latches depend on GLB_PTR0 for their enable, those latches will also latch data to their corresponding internal nodes. During the second half of the CPU_IN=1 time period, F_WR goes to logic "1" again, triggering GLB_PTR1 to rise to logic "1". This drives the data on FD6 to the internal node coupled to line 2248. It also sends the software clock signal on line 2223 to be latched onto line 2216 by latch 2205, which receives the GLB_PTR1 signal on enable line 2215. The software clock is delivered to the external clock input of the target system and to other external I/O devices. Because GLB_PTR0 and GLB_PTR1 are used only for the first part of the data-in global cycle, CPU_IN returns to logic "0", which completes the transfer of global data from the RCC computing system to the RCC hardware array.
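
As an informal illustration of pointer-enabled latching, the sketch below models the two latches named in this paragraph. The table `LATCHES` and the function `latch_cycle` are hypothetical names chosen for the example; only the FD line, node line, and pointer pairings come from the text.

```python
from typing import Dict, List, Tuple

# (FD bus line, internal-node line) pairs enabled by each pointer,
# taken from the paragraph above.
LATCHES: Dict[str, List[Tuple[str, int]]] = {
    "GLB_PTR0": [("FD1", 2247)],
    "GLB_PTR1": [("FD6", 2248)],
}


def latch_cycle(fd_bus: Dict[str, int], pointer: str,
                nodes: Dict[int, int]) -> Dict[int, int]:
    """Return the node values after one pointer is asserted.

    Only latches enabled by `pointer` capture FD bus data; every other
    internal node keeps its previous value.
    """
    updated = dict(nodes)
    for fd_line, node_line in LATCHES.get(pointer, []):
        updated[node_line] = fd_bus[fd_line]
    return updated
```

Asserting a pointer with no associated latch leaves all nodes unchanged, which mirrors the statement that internal nodes are not driven without the proper pointer signal.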

The second part of the data-in global cycle will now be described, in which global data from the external interface is transferred to the RCC hardware array and the external buffer. Again, the various input pin-out signals from the target system or external I/O devices that are directed to the user design must be provided to both the hardware model and the software model. This data is delivered to the hardware model by using the appropriate pointers and is latched to drive the internal nodes. The data can be delivered to the software model by first storing it in external buffer 2201 for later retrieval by the RCC computing system, which uses it to update the internal state of the software model.

CPU_IN is now at logic "0" and EXT_IN is at logic "1". Accordingly, tri-state buffer 2206 in external I/O controller 2200 is enabled to place data on the PCI bus lines, such as bus lines 2217 and 2218. These PCI bus lines are also coupled to FD bus lines 2219 for storage in external buffer 2201. During the first half of the time period in which the EXT_IN signal is at logic "1", GLB_PTR2 is at logic "1". This causes the data on FD4 (via bus lines 2217 and 2224 and local bus line 2228 (LD4)) to be latched to the internal node in the hardware model coupled to line 2249.

During the second half of the time period in which the EXT_IN signal is at logic "1", GLB_PTR3 is at logic "1". This causes the data on FD5 (via bus lines 2218 and 2225 and local bus line 2227 (LD6)) to be latched to the internal node in the hardware model coupled to line 2250.

As described above, this data from the target system or some other external I/O device can be delivered to the software model by first storing it in external buffer 2201 for later retrieval by the RCC computing system, which uses it to update the internal state of the software model. The data on bus lines 2217 and 2218 is provided to external buffer 2201 on FD bus FD[63:0]. The particular memory address at which each data item is stored in external buffer 2201 is provided to external buffer 2201 over bus 2220 by memory address counter 2201. To enable this storage, the WR_EXT_BUF signal is provided to external buffer 2201 via line 2221. Before external buffer 2201 fills up, the RCC computing system reads the contents of external buffer 2201 so that the appropriate updates can be made to the software model. Any data delivered to the various internal nodes of the hardware model in the RCC hardware array will cause some internal state change in the hardware model. Because the RCC computing system holds a model of the entire user design in software, these internal state changes in the hardware model must be reflected in the software model. This concludes the data-in global cycle.
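
The buffering step can be pictured with a small software model. This is a sketch under stated assumptions: the class `ExternalBuffer`, its `depth` parameter, and the `drain` method are hypothetical, and the behavior when the buffer is full (the write is ignored) is chosen for the example rather than taken from the description. Only the 64-bit FD word width, the address counter, and the WR_EXT_BUF write enable come from the text.

```python
from typing import List, Optional


class ExternalBuffer:
    """Hypothetical model of an external buffer with an address counter."""

    def __init__(self, depth: int) -> None:
        self.memory: List[Optional[int]] = [None] * depth
        self.address = 0  # plays the role of the memory address counter

    def clock(self, fd_word: int, wr_ext_buf: int) -> None:
        # Store the FD[63:0] word only while WR_EXT_BUF is asserted.
        if wr_ext_buf and self.address < len(self.memory):
            self.memory[self.address] = fd_word & (2**64 - 1)
            self.address += 1

    def drain(self) -> List[int]:
        """Read back the stored words and reset the address counter."""
        words = [w for w in self.memory[:self.address] if w is not None]
        self.memory = [None] * len(self.memory)
        self.address = 0
        return words
```

In this model the computing system would call `drain` before the buffer fills and apply the returned words to the software model.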

The S2H cycle will now be described. The S2H cycle is used to transfer test bench data from the RCC computing system to the RCC hardware array, and then to move the data sequentially from one chip to the next for each board. The CPU_IN signal goes to logic "1" while the EXT_IN signal goes to logic "0", indicating that the data transfer is between the RCC computing system and the RCC hardware array. The external interface is not involved. The CPU_IN signal also enables tri-state buffer 2202 to allow data to pass from local bus 2222 to internal I/O controller 2203.

At the beginning of the CPU_IN=1 time period, S2H_PTR0 goes to logic "1", which latches the data on FD5 (via local bus 2222, local bus line 2229, bus line 2234, and FD bus 2239) to the internal node in the hardware model coupled to line 2251. In the first part of the CPU_IN=1 time period, S2H_PTR1 goes to logic "1", which latches the data on FD7 (via local bus 2222, local bus line 2230, bus line 2235, and FD bus 2240) to the internal node in the hardware model coupled to line 2252. During sequential data evaluation, data from the RCC computing system is delivered first to chip m1, then to chip0_1 (i.e., chip 0 on board 1), chip1_1 (i.e., chip 1 on board 1), and so on until the last chip on the last board, chip7_8 (i.e., chip 7 on board 8). If chip m2 is available, the data can also be moved to that chip.
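
The chip ordering for sequential data evaluation can be written out as a short enumeration. This is a sketch only: the function name `chip_order` and its parameters are hypothetical, the default of eight boards with eight chips each is inferred from the chip7_8 example, and placing chip m2 at the end of the order is an assumption.

```python
from typing import List


def chip_order(boards: int = 8, chips_per_board: int = 8,
               has_m2: bool = False) -> List[str]:
    """List chips in the order data is delivered during an S2H cycle."""
    order = ["m1"]  # chip m1 receives the data first
    for board in range(1, boards + 1):
        for chip in range(chips_per_board):
            # Names follow the chip<chip>_<board> convention of the text.
            order.append(f"chip{chip}_{board}")
    if has_m2:
        order.append("m2")  # assumed position: after the last board
    return order
```

Calling `chip_order()` starts with `m1`, `chip0_1`, `chip1_1` and ends with `chip7_8`.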

At the end of the data transfer, DATA_XSFR returns to logic "0". Note that I/O data from the external interface is treated as global data and is handled during the global cycle. This concludes the discussion of the data-in control logic and the data-in cycles.

Data-Out

A data-out control logic embodiment of the present invention will now be described. The data-out control logic according to an embodiment of the present invention is responsible for handling data delivered from the RCC hardware array to the RCC computing system and the external interface. While processing data in response to stimulus (external or otherwise), the hardware model generates certain output data that the target application or some I/O devices may need. This output data may be independent data, addresses, control information, or other relevant information that another application or device needs for its own processing. This output data, destined for the RCC computing system (which may hold models of other external I/O devices in software), the target system, or external I/O devices, is provided at various internal nodes. As described above for the data-in logic, some of these internal nodes correspond to output pin-outs of the user design. The user design has other internal nodes that are normally not accessible via pin-outs, but these non-pin-out internal nodes serve other debugging purposes, giving flexibility to a designer who wants to read and analyze stimulus responses at various internal nodes in the user design whether or not they are output pin-outs. For stimulus responses delivered from the hardware model of the user design to the RCC computing system (which may hold models of other I/O devices in software) or the external interface, the data-out logic and those internal nodes that correspond to output pin-outs are implemented.

For example, if the user design is a CRTC 6845 video controller, some output pin-outs are as follows:

MA0-MA13 - memory address

D0-D7 - data bus

DE - display enable

CURSOR - cursor position

VS - vertical synchronization

HS - horizontal synchronization

Other output pin-outs may be used in this video controller. Based on the number of output pin-outs that interface with the outside world, the number of nodes, and hence the number of gate logic elements and pointers, can be readily determined. For instance, output pin-outs MA0-MA13 on the video controller provide the memory address for the video RAM. The VS output pin-out provides the vertical synchronization signal and thus causes a vertical retrace on the monitor. Output pin-outs D0-D7 are eight terminals that form a bidirectional data bus through which the CPU in the target system accesses the internal 6845 registers. These output pin-outs correspond to particular internal nodes in the hardware model. Of course, the number and nature of these internal nodes vary with the user design.

Because the RCC computing system contains a model of the entire user design in software, the data at these output pin-out internal nodes must be provided to the RCC computing system and communicated to the software model so that the corresponding changes can be made there. In this way, the software model will hold information consistent with the hardware model. In addition, the RCC computing system may hold device models of I/O devices that the user or designer has decided to model in software instead of connecting an actual device to one of the ports on the external I/O expander. For example, the user may decide that it is easier and more efficient to model a monitor or speaker in software than to plug an actual monitor or speaker into an external I/O expander port. Moreover, data from these internal nodes in the hardware model must be provided to the target system and other external I/O devices. To allow the data at these output pin-out internal nodes to be delivered to the RCC computing system as well as to the target system and other external I/O devices, data-out control logic according to one embodiment of the present invention is provided in the coverification system.

The data-out control logic uses a data-out cycle that involves data transfer from RCC hardware 2190 to RCC computing system 2141 and the external interface (external I/O expander 2139). In FIG. 69, the control logic for data transfer between the external interface (external I/O expander 2139) and coverification system 2140 is found on each board 2145-2149. The main part of the control logic is found in external I/O controller 2152, while other parts are found in the various internal I/O controllers (e.g., 2156 and 2158) and reconfigurable logic elements (e.g., FPGA chips 2159 and 2165). Again, for illustrative purposes, it is only necessary to show a portion of this control logic rather than the same repeating logic structure for every chip on every board. The portion of coverification system 2140 within dotted line 2150 of FIG. 69 contains a subset of the control logic. This control logic will be described with reference to FIGS. 71 and 73. FIG. 71 shows the control logic used for the data-out cycle. FIG. 73 shows the timing diagram of the data-out cycle.

A particular subset of the data-out control logic is shown in FIG. 71 and includes external I/O controller 2300, tri-state buffer 2301, internal I/O controller 2302, reconfigurable logic element 2303, and the various buses and control lines that enable data transfer among them. This subset illustrates the logic needed for the data-out operation, in which data from the RCC hardware array is delivered to the RCC computing system and the external interface. The data-out control logic of FIG. 71 and the data-out timing diagram of FIG. 73 will be described together.

Compared with the two types of data-in cycles, the data-out cycle involves only one type of cycle. The data-out control logic requires that data from the RCC hardware model be delivered sequentially to (1) the RCC computing system, and then (2) the RCC computing system and the external interface (the target system and external I/O devices). Specifically, the data-out cycle requires that data from the internal nodes of the hardware model in the RCC hardware array be delivered first to the RCC computing system and then to the RCC computing system and the external interface, for each chip in turn, one chip at a time within each board and one board at a time.

As with the data-in control logic, pointers are used to select (or gate) data from the internal nodes to the RCC computing system and the external interface. In the embodiment shown in FIGS. 71 and 73, data-out pointer state machine 2319 generates five pointers H2S_PTR[4:0] on bus 2359 for both hardware-to-software data and hardware-to-external-interface data. Data-out pointer state machine 2319 is controlled by the DATA_XSFR and F_RD signals on line 2358. Internal I/O controller 2302 generates the DATA_XSFR and F_RD signals on line 2358. DATA_XSFR is at logic "1" whenever data transfer between the RCC hardware array and either the RCC computing system or the external interface is desired. The F_RD signal, in contrast to the F_WR signal, is at logic "1" whenever a read from the RCC hardware array is desired. When both DATA_XSFR and F_RD are at logic "1", data-out pointer state machine 2319 can generate the proper H2S pointer signals in the proper programmed sequence. Other embodiments may use more (or fewer) pointers as the user design requires.

These H2S pointer signals are provided to gate logic. One set of inputs 2353-2357 to the gate logic carries the pointers to several AND gates 2314-2318. The other set of inputs 2348-2352 is coupled to internal nodes of the hardware model. Thus, AND gate 2314 has input 2348 from an internal node and input 2353 from H2S_PTR0; AND gate 2315 has input 2349 from an internal node and input 2354 from H2S_PTR1; AND gate 2316 has input 2350 from an internal node and input 2355 from H2S_PTR2; AND gate 2317 has input 2351 from an internal node and input 2356 from H2S_PTR3; and AND gate 2318 has input 2352 from an internal node and input 2357 from H2S_PTR4. Without the proper H2S_PTR pointer signal, no internal node can be driven to either the RCC computing system or the external interface.

The respective outputs 2343-2347 of AND gates 2314-2318 are coupled to OR gates 2310-2313. Thus, AND gate output 2343 is coupled to an input of OR gate 2310; AND gate outputs 2344 and 2345 are both coupled to inputs of OR gate 2311; AND gate output 2346 is coupled to an input of OR gate 2312; and AND gate output 2347 is coupled to an input of OR gate 2313. Note that output 2344 of AND gate 2315 is not coupled to an OR gate of its own; rather, it is coupled to OR gate 2311, which also receives output 2345 of AND gate 2316. The other inputs 2360-2366 to OR gates 2310-2313 are coupled to outputs of other AND gates (not shown), which are themselves coupled to other internal nodes and H2S_PTR pointers. The use of the OR gates and their particular inputs depends on the user design and the configured hardware model. Thus, in other designs, more pointers may be used, and output 2344 from AND gate 2315 may be coupled to a different OR gate rather than OR gate 2311.

Outputs 2339-2342 of OR gates 2310-2313 are coupled to FD bus lines FD0, FD3, FD1, and FD4. In this particular example of a user design, only four output pin-out signals will be delivered to the RCC computing system and the external interface. Thus, FD0 is coupled to the output of OR gate 2310; FD3 is coupled to the output of OR gate 2311; FD1 is coupled to the output of OR gate 2312; and FD4 is coupled to the output of OR gate 2313. These FD bus lines are coupled to local bus lines 2330-2333 via internal lines 2334-2338 of internal I/O controller 2302. In this embodiment, local bus line 2330 is LD0, local bus line 2331 is LD3, local bus line 2332 is LD1, and local bus line 2333 is LD4.
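
The AND/OR gating onto the FD bus lines can be sketched as follows. This is an illustrative model under stated assumptions: the names `AND_GATES`, `AND_TO_OR`, `OR_TO_FD`, and `drive_fd` are hypothetical; AND gate 2318 is assumed to be gated by the fifth pointer (index 4 of H2S_PTR[4:0]); and AND gates 2315 and 2316 are assumed to share OR gate 2311, with AND gates 2317 and 2318 feeding OR gates 2312 and 2313, which is the reading consistent with the FD line and wire assignments in the text.

```python
from typing import Dict

# AND gate -> (internal-node input line, index of the gating H2S pointer)
AND_GATES = {2314: (2348, 0), 2315: (2349, 1), 2316: (2350, 2),
             2317: (2351, 3), 2318: (2352, 4)}
# AND gate -> OR gate (assumed: 2315 and 2316 share OR gate 2311)
AND_TO_OR = {2314: 2310, 2315: 2311, 2316: 2311, 2317: 2312, 2318: 2313}
# OR gate -> FD bus line, as listed in the text
OR_TO_FD = {2310: "FD0", 2311: "FD3", 2312: "FD1", 2313: "FD4"}


def drive_fd(nodes: Dict[int, int], active_ptr: int) -> Dict[str, int]:
    """Compute the FD line values when one H2S pointer is asserted."""
    fd = {name: 0 for name in OR_TO_FD.values()}
    for gate, (node_line, ptr) in AND_GATES.items():
        # AND stage: the node value passes only if its pointer is active.
        and_out = nodes[node_line] & int(ptr == active_ptr)
        # OR stage: merge onto the FD line shared by this OR gate.
        fd[OR_TO_FD[AND_TO_OR[gate]]] |= and_out
    return fd
```

Because only one pointer is active at a time, two nodes that share an OR gate can take turns driving the same FD line without conflict.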

To allow the data on local bus lines 2330-2333 to be delivered to the RCC computing system, these local bus lines are coupled to tri-state buffer 2301. In its normal state, tri-state buffer 2301 allows data to pass from local bus lines 2330-2333 to local bus 2320. In contrast, during data-in, data is allowed to pass from the RCC computing system to the RCC hardware array only when the CPU_IN signal is provided to tri-state buffer 2301.

To allow the data on local bus lines 2330-2333 to be delivered to the external interface, lines 2321-2324 are provided. Line 2321 is coupled to line 2330 and to some latch (not shown) in external I/O controller 2300; line 2322 is coupled to line 2331 and to some latch (not shown) in external I/O controller 2300; line 2323 is coupled to line 2332 and to latch 2305 in external I/O controller 2300; and line 2324 is coupled to line 2333 and to latch 2306 in external I/O controller 2300.

The output of each of latches 2305 and 2306 is coupled to a buffer and then to the external interface, which is coupled to the appropriate output pin-out wires of the target system or external I/O devices. Thus, the output of latch 2305 is coupled to buffer 2307 and line 2327. Likewise, the output of latch 2306 is coupled to buffer 2308 and line 2328. Another output of another latch (not shown) may be coupled to line 2329. In this example, lines 2327-2329 correspond to wire1, wire4, and wire3, respectively, of the target system or some external I/O device. Ultimately, for data transfer from the hardware model to the external interface, the hardware model of the user design is configured such that the internal node coupled to line 2350 corresponds to wire3 on line 2329, the internal node coupled to line 2351 corresponds to wire1 on line 2327, and the internal node coupled to line 2352 corresponds to wire4 on line 2328. Similarly, wire3 corresponds to LD3 on line 2331, wire1 corresponds to LD1 on line 2332, and wire4 corresponds to LD4 on line 2333.
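
The correspondences in this paragraph can be chained to trace where a given internal node ends up. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the node-to-local-bus pairing is derived by combining the node-to-wire and wire-to-LD statements above (reading the local bus line for wire3 as LD3).

```python
from typing import Tuple

# Internal-node line -> local bus line, derived from the text.
NODE_TO_LD = {2350: "LD3", 2351: "LD1", 2352: "LD4"}
# Local bus line -> external wire name.
LD_TO_WIRE = {"LD3": "wire3", "LD1": "wire1", "LD4": "wire4"}
# External wire name -> external interface line number.
WIRE_TO_LINE = {"wire1": 2327, "wire4": 2328, "wire3": 2329}


def route(node_line: int) -> Tuple[str, str, int]:
    """Trace an internal node to its local bus line, wire, and output line."""
    ld = NODE_TO_LD[node_line]
    wire = LD_TO_WIRE[ld]
    return ld, wire, WIRE_TO_LINE[wire]
```

For instance, `route(2351)` follows the node on line 2351 through LD1 to wire1 on line 2327.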

Look-up table 2309 is coupled to the enable inputs of these latches 2305 and 2306. Look-up table 2309 is controlled by the F_RD signal on line 2367, which triggers the operation of look-up table address counter 2304. With each counter increment, the pointer enables a particular row of look-up table 2309. If an entry (or bit) in that row is at logic "1", the LUT output line coupled to that entry enables its corresponding latch and drives the data to the external interface and ultimately to the desired destination in the target system or some external I/O device. For example, LUT output line 2325 is coupled to the enable input of latch 2305, and LUT output line 2326 is coupled to the enable input of latch 2306.

In this example, rows 0-3 of look-up table 2309 are programmed to enable the latches corresponding to the output pin-out wires for the internal nodes of chip m1. Similarly, rows 4-6 are programmed to enable the latches corresponding to the output pin-out wires for the internal nodes of chip0_1 (i.e., chip 0 on board 1). In row 4, bit 3 is at logic "1". In row 5, bit 1 is at logic "1". In row 6, bit 4 is at logic "1". All other entries or bit positions are at logic "0". For any given bit position of the look-up table, only one entry is at logic "1", because a single output pin-out wire cannot drive multiple I/O devices. In other words, an output pin-out internal node in the hardware model provides data to only a single wire coupled to the external interface.
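
The look-up table contents given for rows 4-6 and the one-entry-per-bit-position rule can be checked with a small sketch. The function names and the table size (7 rows by 5 bits) are assumptions made for the example; the entries for rows 0-3 are not given in the text, so they are left at 0 here as placeholders.

```python
from typing import List


def build_lut(rows: int = 7, bits: int = 5) -> List[List[int]]:
    """Build a table holding only the entries stated for rows 4-6."""
    lut = [[0] * bits for _ in range(rows)]
    lut[4][3] = 1  # row 4, bit 3
    lut[5][1] = 1  # row 5, bit 1
    lut[6][4] = 1  # row 6, bit 4
    return lut


def enabled_bits(lut: List[List[int]], row: int) -> List[int]:
    """Return the bit positions whose latch enables are set in a row."""
    return [bit for bit, value in enumerate(lut[row]) if value]


def one_entry_per_bit(lut: List[List[int]]) -> bool:
    """Check that no bit position is set in more than one row."""
    return all(sum(column) <= 1 for column in zip(*lut))
```

A table that set bit 3 in two different rows would fail `one_entry_per_bit`, which corresponds to two rows trying to enable the same wire.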

As noted above, the data-out control logic requires that data from each reconfigurable logic element in each chip of the RCC hardware model be delivered sequentially to (1) the RCC computing system, and (2) the RCC computing system and the external interface (the target system and external I/O devices) together. The RCC computing system needs this data because it holds models of certain I/O devices in software; even for data that is not destined for one of the modeled I/O devices, the RCC computing system needs to monitor it so that its internal state stays consistent with that of the hardware model in the RCC hardware array. In the example illustrated in FIGS. 71 and 73, only seven internal nodes will be driven for output to the RCC computing system and the external interface. Two of those internal nodes are in chip m1 and the other five are in chip0_1 (i.e., chip 0 on board 1). Of course, other internal nodes in these and other chips may be required for a particular user design, but FIGS. 71 and 73 illustrate only seven nodes.

During data transfer, the DATA_XSFR signal is at logic "1". During this time, local buses 2330-2333 are used to transport data sequentially from each chip of each board in the RCC hardware array to both the RCC computing system and the external interface. The DATA_XSFR and F_RD signals control the operation of the data-out pointer state machine, which generates the proper pointer signals H2S_PTR[4:0] to the proper gates coupled to the output pin-out internal nodes. The F_RD signal also controls look-up table address counter 2304 for delivering internal node data to the external interface.

The internal nodes in chip m1 are handled first. When F_RD goes to logic "1" in the data transfer cycle, H2S_PTR0 in chip m1 goes to logic "1". This drives the data at those internal nodes of chip m1 that depend on H2S_PTR0 to the RCC computing system via tri-state buffer 2301 and local bus 2320. Look-up table address counter 2304 counts and points to row 0 of look-up table 2309 to latch the appropriate data from chip m1 to the external interface. When the F_RD signal goes to logic "1" again, the data at the internal nodes that can be driven by H2S_PTR1 is delivered to the RCC computing system and the external interface. H2S_PTR1 goes to logic "1" and, in response to this second F_RD signal, look-up table address counter 2304 counts and points to row 1 of look-up table 2309 to latch the appropriate data from chip m1 to the external interface.

Reconfigurable logic element 2303 (i.e., chip 0_1, or chip 0 of board 1) is processed next. In this example, the two internal nodes associated with H2S_PTR0 and H2S_PTR1 are delivered to the RCC computing system only. The data from the three internal nodes associated with H2S_PTR2, H2S_PTR3, and H2S_PTR4 is delivered to both the RCC computing system and the external interface.

When F_RD goes to logic "1", H2S_PTR0 of chip 2303 goes to logic "1". This drives the data at those internal nodes in chip 2303 that depend on H2S_PTR0 to the RCC computing system via tri-state buffer 2301 and local bus 2320. In this example, this is the internal node coupled to line 2348, which depends on H2S_PTR0 on line 2353. When the F_RD signal goes to logic "1" again, the data at the internal nodes that can be driven by H2S_PTR1 is delivered to the RCC computing system. Here, the internal node coupled to line 2349 is affected. This data is driven onto LD3 via lines 2331 and 2322.

When the F_RD signal goes to logic "1" again, H2S_PTR2 goes to logic "1" and the data at the internal node coupled to line 2350 is provided on LD3. This data is provided to both the RCC computing system and the external interface. Tri-state buffer 2301 allows the data to pass to local bus 2320 and then to the RCC computing system. For the external interface, this data is driven onto LD3 via lines 2331 and 2322 by enabling the H2S_PTR2 signal. In response to the F_RD signal, look-up table address counter 2304 counts and points to row 4 of look-up table 2309, which latches the data from this internal node (coupled to line 2350) onto line 2329 (wire 3) at the external interface.

When the F_RD signal goes to logic "1" again, H2S_PTR3 goes to logic "1" and the data at the internal node coupled to line 2351 is provided on LD1. This data is provided to both the RCC computing system and the external interface. Tri-state buffer 2301 allows the data to pass to local bus 2320 and then to the RCC computing system. For the external interface, this data is driven onto LD1 via lines 2332 and 2323 by enabling the H2S_PTR3 signal. In response to the F_RD signal, look-up table address counter 2304 counts and points to row 5 of look-up table 2309, which latches the data from this internal node (coupled to line 2351) onto line 2327 (wire 1) at the external interface.

When the F_RD signal goes to logic "1" again, H2S_PTR4 goes to logic "1" and the data at the internal node coupled to line 2352 is provided on LD4. This data is provided to both the RCC computing system and the external interface. Tri-state buffer 2301 allows the data to pass to local bus 2320 and then to the RCC computing system. For the external interface, this data is driven onto LD4 via lines 2333 and 2324 by enabling the H2S_PTR4 signal. In response to the F_RD signal, look-up table address counter 2304 counts and points to row 6 of look-up table 2309, which latches the data from this internal node (coupled to line 2352) onto line 2328 (wire 4) at the external interface.
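The five F_RD pulses for chip 2303 walked through above can be summarized as a small behavioral model. This is a sketch under stated assumptions rather than the data-out control logic itself: the names `CHIP_2303_NODES`, `ROW_TO_WIRE`, and `run_data_out` are hypothetical, one list entry stands for one F_RD pulse, and the look-up table address counter is simplified to a direct row number per pointer.

```python
from typing import Dict, List, Optional, Tuple

# (pointer index, look-up table row). A row of None means the node is
# delivered to the RCC computing system only, as for H2S_PTR0 and H2S_PTR1.
CHIP_2303_NODES: List[Tuple[int, Optional[int]]] = [
    (0, None),
    (1, None),
    (2, 4),
    (3, 5),
    (4, 6),
]

# External wire selected by each look-up table row in the walkthrough
# (row 4 -> wire 3, row 5 -> wire 1, row 6 -> wire 4).
ROW_TO_WIRE: Dict[int, int] = {4: 3, 5: 1, 6: 4}


def run_data_out(nodes: List[Tuple[int, Optional[int]]]) -> List[dict]:
    """Produce one event per F_RD pulse, in pointer order."""
    events = []
    for ptr, lut_row in nodes:
        destinations = ["rcc_computing_system"]  # every node reaches the host
        wire = None
        if lut_row is not None:
            destinations.append("external_interface")
            wire = ROW_TO_WIRE[lut_row]
        events.append(
            {
                "pointer": f"H2S_PTR{ptr}",
                "destinations": destinations,
                "lut_row": lut_row,
                "wire": wire,
            }
        )
    return events


if __name__ == "__main__":
    for event in run_data_out(CHIP_2303_NODES):
        print(event)
```

Every node reaches the RCC computing system, while only the last three also reach an external wire, which matches the two-step ordering the text describes.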

This process of driving the data at a chip's internal nodes first to the RCC computing system and then to both the RCC computing system and the external interface continues sequentially for the other chips. First, the internal nodes of chip m1 are driven. Second, the internal nodes of chip 0_1 (chip 2303) are driven. Next, the internal nodes of chip 1_1, if any, are driven. This continues until the last nodes in the last chip of the last board are driven. Thus, the internal nodes of chip 7_8, if any, are driven. Finally, the internal nodes of chip m2, if any, are driven.
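The chip ordering just described can be sketched as a generator. The naming follows the text's convention that chip X_Y is chip X of board Y. The defaults of eight boards with eight chips each are an assumption inferred only from chip 7_8 being named as the last regular chip; the function name is hypothetical, and the sketch does not model chips that are absent.

```python
from typing import Iterator


def chip_drive_order(num_boards: int = 8, chips_per_board: int = 8) -> Iterator[str]:
    """Yield chip names in the order their internal nodes are driven."""
    yield "m1"  # chip m1 is handled first
    for board in range(1, num_boards + 1):
        for chip in range(chips_per_board):
            yield f"{chip}_{board}"  # chip X of board Y
    yield "m2"  # chip m2 is handled last


if __name__ == "__main__":
    order = list(chip_drive_order())
    print(order[:3], "...", order[-2:])
```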

Although FIG. 71 shows the data-out control logic only for driving the internal nodes of chip 2303, other chips may also have internal nodes that need to be driven to the RCC computing system and the external interface. Regardless of the number of internal nodes, the data-out logic drives the data from one set of internal nodes in a chip to the RCC computing system and then, in another cycle, drives a different set of internal nodes in the same chip to both the RCC computing system and the external interface. The data-out control logic then moves on to the next chip and performs the same two-step operation: it first drives the data designated for the RCC computing system, and then drives the data designated for the external interface to both the RCC computing system and the external interface. Even when data is destined for the external interface, the RCC computing system must know about it, because the RCC computing system holds a software model of the entire user design whose internal state information must be consistent with that of the hardware model in the RCC hardware array.

Board layout

The board layout of the coverification system in accordance with one embodiment of the present invention will now be described with reference to FIG. 74. The boards are installed in the RCC hardware array. The board layout is similar to that illustrated in FIGS. 8 and 36-44 and to that described below.

In one embodiment, the RCC hardware array includes six boards. Board m1 is coupled to board 1, and board m2 is coupled to board 8. The arrangement and connections of board 1, board 2, board 3, and board 8 have been described with reference to FIGS. 8 and 36-44.

Board m1 contains chip m1. With respect to the other boards, the interconnect structure of board m1 is such that chip m1 is coupled via south interconnects to chip 0, chip 2, chip 4, and chip 6 of board 1. Similarly, board m2 contains chip m2. With respect to the other boards, the interconnect structure of board m2 is such that chip m2 is coupled via south interconnects to chip 0, chip 2, chip 4, and chip 6 of board 8.

Example

To illustrate the operation of one embodiment of the present invention, a hypothetical user circuit design will be used. In structured register transfer level (RTL) HDL code, the exemplary user circuit design is as follows:

module register(clock, reset, d, q);
input clock, d, reset;
output q;
reg q;
always @(posedge clock or negedge reset)
  if (~reset)
    q = 0;
  else
    q = d;
endmodule

module example;
wire d1, d2, d3;
wire q1, q2, q3;
reg sigin;
wire sigout;
reg clk, reset;

register reg1(clk, reset, d1, q1);
register reg2(clk, reset, d2, q2);
register reg3(clk, reset, d3, q3);

assign d1 = sigin ^ q3;
assign d2 = q1 ^ q3;
assign d3 = q2 ^ q3;
assign sigout = q3;

// a clock generator
always
begin
  clk = 0;
  #5;
  clk = 1;
  #5;
end

// a signal generator
always
begin
  #10;
  sigin = $random;
end

// initialization
initial
begin
  reset = 0;
  sigin = 0;
  #1;
  reset = 1;
  #5;
  $monitor($time, "%b, %b", sigin, sigout);
  #1000 $finish;
end
endmodule

This code is shown in FIG. 26. The particular functional details of this circuit design are not necessary to understand the present invention. The reader should understand, however, that the user generates this HDL code to design a circuit for simulation. The circuit represented by this code performs some function designed by the user: it responds to input signals and generates an output.
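Although the functional details are not essential, it can help to see what the listing computes. The following Python sketch models one clock edge of the three registers and their XOR feedback, using the assign statements from the listing. The function names and the tuple representation of state are hypothetical, and the asynchronous reset is simplified to a flag evaluated at the edge.

```python
from typing import Tuple

State = Tuple[int, int, int]  # (q1, q2, q3)


def next_inputs(state: State, sigin: int) -> State:
    """Combinational logic: compute (d1, d2, d3) from the current outputs."""
    q1, q2, q3 = state
    d1 = sigin ^ q3  # assign d1 = sigin ^ q3;
    d2 = q1 ^ q3     # assign d2 = q1 ^ q3;
    d3 = q2 ^ q3     # assign d3 = q2 ^ q3;
    return (d1, d2, d3)


def clock_edge(state: State, sigin: int, reset: int = 1) -> State:
    """Return the register outputs after one rising clock edge.

    reset is active low, as in the listing: reset == 0 clears all registers.
    """
    if not reset:
        return (0, 0, 0)
    return next_inputs(state, sigin)


def sigout(state: State) -> int:
    """assign sigout = q3;"""
    return state[2]


if __name__ == "__main__":
    state: State = (0, 0, 0)
    for stimulus in [1, 0, 0, 0, 0]:
        state = clock_edge(state, stimulus)
        print(state, "sigout =", sigout(state))
```

All three registers are updated from values sampled before the edge, which is why `next_inputs` reads the old state in full before the new one is returned.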

FIG. 27 shows the circuit diagram of the HDL code described with respect to FIG. 26. In most cases, the user may actually draw a circuit diagram of this kind before representing it in HDL form. Some schematic capture tools allow pictorial circuit diagrams to be entered and, after processing, produce the usable code.

As shown in FIG. 28, the simulation system performs component type analysis. The HDL code, originally presented in FIG. 26 as representing the user's particular circuit design, is now analyzed. The first few lines of code, beginning with "module register (clock, reset, d, q);" and ending with "endmodule" (the portion identified by reference numeral 900), form the register definition section.

The next few lines of code, reference numeral 907, represent wire interconnection information. As known to those skilled in the art, wire variables in HDL represent physical connections between structural entities such as gates. Because HDL is primarily used to model digital circuits, wire variables are necessary. Usually, "q" (e.g., q1, q2, q3) represents an output wire line and "d" (e.g., d1, d2, d3) represents an input wire line.

Reference numeral 908 shows "sigin", which is a test-bench output. Reference numeral 909 shows "sigout", which is a test-bench input.

Reference numeral 901 shows register components S1, S2, and S3. Reference numeral 902 shows combinational components S4, S5, S6, and S7. Note that combinational components S4-S7 have output variables d1, d2, and d3, which are inputs to register components S1-S3. Reference numeral 903 shows clock component S8.

The next series of reference numerals shows the test-bench components. Reference numeral 904 shows test-bench component (driver) S9. Reference numeral 905 shows test-bench components (initialization) S10 and S11. Reference numeral 906 shows test-bench component (monitor) S12.

The component type analysis is summarized in the following table.

Component    Type
S1           Register
S2           Register
S3           Register
S4           Combinational
S5           Combinational
S6           Combinational
S7           Combinational
S8           Clock
S9           Test-bench (driver)
S10          Test-bench (initialization)
S11          Test-bench (initialization)
S12          Test-bench (monitor)

Based on the component type analysis, the system generates a software model for the entire circuit and a hardware model for the register and combinational components. S1-S3 are register components and S4-S7 are combinational components. These components are modeled in hardware so that the user of the simulation system can either simulate the entire circuit in software, or simulate in software and selectively accelerate in hardware. In either case, the user controls the simulation and hardware acceleration modes. Additionally, the user can emulate the circuit with a target system while retaining software control to start, stop, inspect values, and assert input values cycle by cycle.
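The outcome of the component type analysis can be pictured as a simple classification step. In the Python sketch below, the component names and types come from the table above, while the dictionary layout, the `HARDWARE_TYPES` set, and the `partition` function are hypothetical illustrations rather than the system's data structures.

```python
from typing import Dict, List, Tuple

# Component types as listed in the component type analysis table.
COMPONENT_TYPES: Dict[str, str] = {
    "S1": "register",
    "S2": "register",
    "S3": "register",
    "S4": "combinational",
    "S5": "combinational",
    "S6": "combinational",
    "S7": "combinational",
    "S8": "clock",
    "S9": "test-bench (driver)",
    "S10": "test-bench (initialization)",
    "S11": "test-bench (initialization)",
    "S12": "test-bench (monitor)",
}

# Only register and combinational components receive a hardware model.
HARDWARE_TYPES = {"register", "combinational"}


def partition(component_types: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (software_model, hardware_model) component lists.

    The software model covers the entire circuit; the hardware model
    covers only the register and combinational components.
    """
    software = list(component_types)
    hardware = [name for name, kind in component_types.items() if kind in HARDWARE_TYPES]
    return software, hardware


if __name__ == "__main__":
    sw, hw = partition(COMPONENT_TYPES)
    print("software:", sw)
    print("hardware:", hw)
```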

FIG. 29 shows a signal network analysis of the same structured RTL-level HDL code. As illustrated, S8, S9, S10, and S11 are modeled or provided in software. S9 is essentially the test-bench process that generates the sigin signal, and S12 is essentially the test-bench monitor process that receives the sigout signal. In this example, S9 generates a random sigin to simulate the circuit. Registers S1-S3 and combinational components S4-S7, however, are modeled in both hardware and software.

At the software/hardware boundary, the system allocates memory space for the various residue signals (i.e., q1, q2, q3, CLK, sigin, sigout) that are used to interface the software model to the hardware model. The memory space allocation is shown in the table below:

Signal    Memory address space
q1        REG
q2        REG
q3        REG
clk       CLK
sigin     S2H
sigout    H2S
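The allocation in this table is a plain mapping from boundary signals to address spaces. The Python sketch below records it and groups the signals by space; the dictionary and the function name are hypothetical, and the sketch says nothing about actual addresses, which this passage does not give.

```python
from collections import defaultdict
from typing import Dict, List

# Boundary signal -> memory address space, as in the allocation table.
SIGNAL_SPACE: Dict[str, str] = {
    "q1": "REG",
    "q2": "REG",
    "q3": "REG",
    "clk": "CLK",
    "sigin": "S2H",   # written by the software driver, read by hardware
    "sigout": "H2S",  # written by hardware, read by the software monitor
}


def signals_by_space(allocation: Dict[str, str]) -> Dict[str, List[str]]:
    """Group boundary signals by the address space they are assigned to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for signal, space in allocation.items():
        grouped[space].append(signal)
    return dict(grouped)


if __name__ == "__main__":
    print(signals_by_space(SIGNAL_SPACE))
```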

FIG. 30 shows the software/hardware partition result for this example circuit design. FIG. 30 is a more realistic illustration of the software/hardware partition. Software side 910 is coupled to hardware side 912 through software/hardware boundary 911 and PCI bus 913.

Software side 910 contains, and is controlled by, the software kernel. In general, the kernel is the main control loop that controls the overall operation of the SEmulation system. As long as any test-bench process is active, the kernel evaluates the active test-bench components, evaluates the clock components, detects clock edges to update registers and memories as well as to propagate combinational logic data, and advances the simulation time. Even though the kernel resides on the software side, some of its operations or statements can be executed in hardware, because hardware models exist for those statements and operations. Thus, the software controls both the software and hardware models.

Software side 910 contains the entire model of the user's circuit, including S1-S12. The software/hardware boundary portion on the software side includes the I/O buffers or address spaces S2H, CLK, H2S, and REG. Driver test-bench process S9 is coupled to the S2H address space, monitor test-bench process S12 is coupled to the H2S address space, and clock generator S8 is coupled to the CLK address space. The output signals q1-q3 of registers S1-S3 are assigned to the REG space.

Hardware model 912 contains a model of combinational components S4-S7, which reside on the pure hardware side. In the software/hardware boundary portion of hardware model 912, sigout, sigin, register outputs q1-q3, and software clock 196 are implemented.

In addition to the model of the user's circuit design, the system generates a software clock and address pointers. The software clock provides signals to the enable inputs of registers S1-S3. As discussed above, the software clock in accordance with the present invention eliminates race conditions and hold-time violation issues. When a clock edge of the primary clock is detected in software, the detection logic triggers the corresponding detection logic in hardware. In time, clock edge register 916 generates an enable signal to the register enable inputs to gate in whatever data is present at the inputs of the registers.
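The enable-based update described above can be illustrated behaviorally. The Python sketch below is an assumption-laden simplification, not the clock edge register hardware: the class and function names are hypothetical, only rising edges are detected (matching the posedge in the example listing), and the enable is modeled as a single boolean applied to all registers at once.

```python
from typing import List


class SoftwareClock:
    """Detects rising edges of the primary clock from successive samples."""

    def __init__(self) -> None:
        self.prev = 0

    def rising_edge(self, clk: int) -> bool:
        """Return True only on a 0 -> 1 transition of clk."""
        edge = self.prev == 0 and clk == 1
        self.prev = clk
        return edge


def update_registers(q: List[int], d: List[int], enable: bool) -> List[int]:
    """Gate the data present at the register inputs when enable is asserted.

    All registers sample together from already-settled inputs, which is the
    property that avoids race conditions between them.
    """
    return list(d) if enable else list(q)


if __name__ == "__main__":
    clock = SoftwareClock()
    q = [0, 0, 0]
    d = [1, 0, 1]  # values assumed to have settled at the register inputs
    for sample in [0, 1, 1, 0]:
        q = update_registers(q, d, clock.rising_edge(sample))
        print(sample, q)
```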

Address pointer 194 is also shown for illustrative and conceptual purposes. Address pointers are actually implemented in each FPGA chip and allow data to be delivered selectively and sequentially to its destination.

Combinational components S4-S7 are also coupled to register components S1-S3, sigin, and sigout. These signals travel on I/O bus 915 to and from the PCI bus.

Prior to the mapping, placement, and routing steps, the complete hardware model, except for the address pointers, is shown in FIG. 31. The system has not yet mapped the model to specific chips. Registers S1-S3 are provided, coupled to the I/O bus and to combinational components S4-S6. Combinational component S7 is simply the output q3 of register S3. The sigin, sigout, and software clock 920 are also modeled.

Once the hardware model has been determined, the system maps, places, and routes the model into one or more chips. This particular example could actually be implemented in a single Altera FLEX 10K chip, but for pedagogic purposes the example assumes that two chips are required to implement the hardware model. FIG. 32 shows one particular chip partition result of the hardware model for this example.

In FIG. 32, the complete model (except for the I/O and clock edge register) is shown with the chip boundary represented by a dotted line. This result is produced by the compiler of the SEmulation system before the final configuration file is generated. As shown, the hardware model requires at least three wires between the two chips, for wire lines 921, 922, and 923. To minimize the number of pins/wires needed between the two chips (chip 1 and chip 2), either another model-to-chip partition must be generated or a multiplexing scheme must be used.

Analyzing the particular partition result shown in FIG. 32, the number of wires between the two chips can be reduced by moving sigin wire line 923 from chip 2 to chip 1. FIG. 33 shows this partition. Although the partition of FIG. 33 appears better than that of FIG. 32 based on the number of wires alone, this example assumes that the SEmulation system selected the partition of FIG. 32 after performing the mapping, placement, and routing operations. The resulting partition of FIG. 32 is used as the basis for generating the configuration file.

FIG. 34 shows the logic patching operation for the same hypothetical example, with the final realization in two chips. The system used the partition result of FIG. 32 to generate the configuration file. The address pointers, however, are not shown for simplicity. Two FPGA chips 930 and 940 are shown. Chip 930 includes, among other components, a partitioned portion of the user's circuit design, TDM unit 931 (receiver side), software clock 932, and I/O bus 933. Chip 940 includes, among other components, a partitioned portion of the user's circuit design, TDM unit 941 (transmitter side), software clock 942, and I/O bus 943. TDM units 931 and 941 were described with reference to FIGS. 9A, 9B, and 9C.

Chips 930 and 940 have two interconnect wires 944 and 945 that couple the hardware model together. These two interconnect wires are part of the interconnect shown in FIG. 8. Referring to FIG. 8, one such interconnect is interconnect 611, located between chip F32 and chip F33. In one embodiment, the maximum number of wires/pins for each interconnect is 44. In FIG. 34, the modeled circuit needs only two wires/pins between chips 930 and 940.

Chips 930 and 940 are coupled to bank bus 90. Because only two chips are implemented, both chips may be in the same bank or each may reside in a different bank. Ultimately, one chip is coupled to one bank bus and the other chip is coupled to a different bank bus, so that the output at the FPGA interface is the same as the output at the PCI interface.

The foregoing description of a preferred embodiment of the invention has been presented for purposes of illustration and description. It is not intended to limit the invention to the precise forms disclosed. Obviously, many modifications and variations will be apparent to those skilled in the art, who may make various modifications without departing from the scope and spirit of the invention. Accordingly, the invention is to be limited only by the claims appended hereto.

Claims

A method of generating a value change dump (VCD) file on demand for a modeling design, the method comprising:

selecting a simulation session range starting at simulation time t0 of the simulation of the modeling design and ending at simulation time t3;

selecting a simulation target range starting at simulation time t1 of the simulation of the modeling design and ending at simulation time t2, wherein the simulation time t1 is greater than or equal to the simulation time t0 and the simulation time t2 is less than or equal to the simulation time t3;

generating a VCD file of the modeling design during the selected target range of the simulation of the modeling design; and

directly accessing the VCD file at the simulation time t1 to debug the modeling design being simulated.
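The range relationship recited in this claim reduces to a pair of inequalities. The Python sketch below checks them; the function name is hypothetical, and the additional requirement that t1 not exceed t2 is an assumption of this sketch (a target range that does not run backward), not a limitation stated in the claim.

```python
def target_range_is_valid(t0: int, t1: int, t2: int, t3: int) -> bool:
    """Check that the target range [t1, t2] lies within the session range [t0, t3].

    The claim states t1 >= t0 and t2 <= t3. The middle comparison (t1 <= t2)
    is added here as an assumption so that the target range is well ordered.
    """
    return t0 <= t1 <= t2 <= t3


if __name__ == "__main__":
    print(target_range_is_valid(0, 10, 20, 100))   # target inside session
    print(target_range_is_valid(0, 10, 200, 100))  # target ends after session
```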

The method of claim 1, further comprising:

simulating the modeling design by providing primary inputs to the modeling design for evaluation by the modeling design; and

recording simulation history during the simulation session range.

The method of claim 2, further comprising:

simulating the modeling design by evaluating a simulation history of the modeling design from the simulation time t0 to the simulation time t2.

The method of claim 3, wherein generating the VCD file comprises:

generating evaluated results from the modeling design based on the simulation history; and

storing the evaluated results in the VCD file during the simulation target range.

The method of claim 4, wherein recording the simulation history comprises:

compressing the primary inputs; and

recording the compressed primary inputs as the simulation history.

The method of claim 5, wherein processing the simulation history comprises:

decompressing the compressed primary inputs; and

providing the decompressed primary inputs to the modeling design as the processed simulation history for evaluation.
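The record-then-replay idea in the two preceding claims can be sketched as a round trip. This Python example is only an illustration: the claims do not name a compression scheme, so zlib from the standard library is swapped in as a stand-in, the function names are hypothetical, and each primary-input sample is assumed to fit in one byte.

```python
import zlib
from typing import List


def record_history(primary_inputs: List[int]) -> bytes:
    """Compress a sequence of primary-input samples (each 0..255) for storage."""
    raw = bytes(primary_inputs)
    return zlib.compress(raw)


def replay_history(history: bytes) -> List[int]:
    """Decompress the recorded history back into primary-input samples."""
    return list(zlib.decompress(history))


if __name__ == "__main__":
    inputs = [0, 1, 1, 0, 1] * 100  # hypothetical stimulus values
    history = record_history(inputs)
    print(len(inputs), "samples stored in", len(history), "bytes")
    print(replay_history(history) == inputs)
```

Replaying the decompressed inputs into the model from the start of the session range is what allows the target range to be re-evaluated without rerunning the test bench.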

The method of claim 4, wherein recording the simulation history comprises:

recording the primary inputs as the simulation history.

The method of claim 1, further comprising:

storing state information of the modeling design in a first file at the simulation time t0; and

storing state information of the modeling design in a second file at the simulation time t3.

An electronic design automation system for verifying a user design, comprising:

a computing system including a central processing unit (CPU) and a memory, for modeling the user design in software to simulate the user design;

an internal bus system coupled to the computing system;

reconfigurable hardware logic coupled to the internal bus system, for modeling at least a portion of the user design in hardware;

control logic coupled to the internal bus system, for controlling data transfer between the reconfigurable hardware logic and the computing system; and

custom VCD logic for recording a simulation history for a selected simulation session range and dumping state information from the hardware model to a VCD file for a selected simulation target range, wherein the simulation target range is within the simulation session range.

The electronic design automation system of claim 9, wherein the custom VCD logic further comprises:

first range selection logic for selecting the simulation session range starting at simulation time t0 and ending at simulation time t3;

second range selection logic for selecting the simulation target range starting at simulation time t1 and ending at simulation time t2, wherein the simulation time t1 is greater than or equal to the simulation time t0 and the simulation time t2 is less than or equal to the simulation time t3;

dump logic for generating the VCD file of the user design during the selected simulation target range; and

access logic for directly accessing the VCD file at the simulation time t1 to debug the user design.

The electronic design automation system of claim 10, wherein the custom VCD logic further comprises:

a test bench process for simulating the user design by providing primary inputs to the user design for evaluation by the user design; and

recording logic in the computing system for recording the simulation history during the simulation session range.

The electronic design automation system of claim 11, wherein the custom VCD logic further comprises:

evaluation logic in the reconfigurable hardware logic for simulating the user design by evaluating the simulation history of the user design from the simulation time t0 to the simulation time t2.

The electronic design automation system of claim 12, wherein the dump logic dumps evaluation results from the user design into the VCD file during the simulation target range, based on the simulation history.

The electronic design automation system of claim 13, wherein the recording logic further comprises:

compression logic for compressing the primary inputs; and

write logic for writing the compressed primary inputs as the simulation history.

The electronic design automation system of claim 14, wherein the processing logic further comprises:

decompression logic for decompressing the compressed primary inputs; and

data transfer logic for passing the decompressed primary inputs to the user design as the simulation history for evaluation.

The electronic design automation system of claim 13, wherein the recording logic further comprises write logic for writing the primary inputs as the simulation history.

The electronic design automation system of claim 9, further comprising state storing logic for storing state information of the user design in a first file at the simulation time t0 and storing state information of the user design in a second file at the simulation time t3.

A custom VCD system for providing evaluation information from a modeling design simulated during a selected simulation target range of simulation times, comprising:

first logic for selecting a simulation session range starting at simulation time t0 and ending at simulation time t3;

second logic for selecting a simulation target range starting at simulation time t1 and ending at simulation time t2, wherein the simulation time t1 is greater than or equal to the simulation time t0 and the simulation time t2 is less than or equal to the simulation time t3;

generation logic for generating a VCD file of the evaluation information during the selected simulation target range; and

access logic for directly accessing the VCD file at the simulation time t1 to debug the modeling design.

The custom VCD system of claim 18, further comprising:

compression logic for receiving and compressing primary input data during the time period of the simulation session range; and

decompression logic for decompressing the compressed primary input data and passing the decompressed primary input data to the modeling design for evaluation.

The custom VCD system of claim 19, wherein the generation logic further comprises dump logic for dumping the evaluated information into the VCD file, the evaluated information being generated by the modeling design evaluating the decompressed primary input data.