KR20080096723A

KR20080096723A - Distributed parallel simulation method based on adaptive synchronization/communication scheme

Info

Publication number: KR20080096723A
Application number: KR1020070041672A
Authority: KR
Inventors: 양세양
Original assignee: 양세양
Priority date: 2007-04-29
Filing date: 2007-04-29
Publication date: 2008-11-03

Abstract

A distributed parallel simulation method based on an adaptive synchronization/communication scheme is provided, which improves distributed parallel simulation speed by effectively reducing synchronization overhead and communication overhead. The distributed parallel simulation is performed while adaptively changing the communication/synchronization points that control it. The method predicts future synchronization points from the experience accumulated up to the present and advances the simulation in variable intervals. As a result, communication overhead and synchronization overhead are minimized, and the distributed parallel simulation speed is gradually improved.

Description

Distributed Parallel Simulation Method Based on Variable Synchronization/Communication Scheme {Distributed Parallel Simulation Method Based on Adaptive Synchronization/Communication Scheme}

FIG. 1 schematically illustrates several examples of logical connection structures of the two or more local computers installed in two or more computers for distributed parallel simulation in this patent.

FIG. 2 schematically illustrates an example of a distributed parallel simulation environment built from two or more computers and two or more HDL simulators installed on those computers.

FIG. 3 schematically illustrates an example of the overall flow of a conventional distributed parallel simulation.

<Description of the symbols for the main parts of the drawings>

32: Verification software

34: HDL simulator

35: Computer

62: Additional code added to the design code under verification by the verification software

64: Communication and synchronization module for distributed parallel simulation

333: SW server module residing on the central computer, which controls the local simulations and connects them when distributed parallel simulation is run in a star connection structure

343: Simulator that performs a local simulation in the distributed parallel simulation environment

353: Central computer

354: Peripheral computer

644: Local simulation run-time module for distributed parallel simulation

646: Communication and synchronization module for simulation acceleration

648: Hardware-based verification platform

650: Simulation acceleration run-time module

660: Design object within the model, partitioned to run on a local simulator in distributed parallel simulation

670: VPI/PLI/FLI

674: Socket API

676: TCP/IP socket

678: Device API

680: Device driver

682: HAL (Hardware Abstraction Layer)

684: Gigabit LAN card

The present invention relates to a technique for systematically verifying, by means of simulation, a design from the electronic system level (hereinafter abbreviated as ESL) down to the gate level, and in particular to a verification method that increases the performance and efficiency of verification when verifying digital systems of several million gates or more.

In semiconductor design verification, simulation is the process of constructing, in software, a computer-executable model of the DUV (Design Under Verification), or of one or more design objects (defined later) within the DUV, together with the testbench that drives it; converting this computer-executable model into a sequence of machine instructions through a simulation compilation step; and executing that sequence on a computer. The execution of a simulation is therefore fundamentally a sequential execution of machine instructions. Many simulation techniques currently exist, including event-driven simulation, cycle-based simulation, compiled simulation, interpreted simulation, and co-simulation. In other words, simulation is a collective term for all the processes in which a design object that is to be designed or implemented is modeled at an appropriate abstraction level (in semiconductor design there are various abstraction levels such as gate level, register transfer level, transaction level, architecture level, behavioral level, and algorithm level) and then executed in software on a computer, so that the functional behavior or operating characteristics of the design object are reproduced on the computer.

The advantage of simulation is that the functional behavior or operating characteristics of a design object can be predicted virtually on a computer before the object is physically implemented, and that, being a software approach, it offers high flexibility. The disadvantage is that, because a simulation is ultimately carried out as a sequential execution of machine instructions, it runs very slowly when the design object being simulated is highly complex (many recent semiconductor designs exceed 100 million gates). For example, if such a 100-million-gate design is simulated with event-driven simulation at a speed of 1 cycle/sec and the simulation must run for 100,000,000 cycles, it takes about 3.2 years.

Accordingly, simulation in this patent refers to every method of modeling the DUV, or one or more design objects within the DUV, in software at an appropriate abstraction level and executing that model in software. More specifically, simulation is defined as any process in which the behavior, at a particular abstraction level, of the DUV or of one or more design objects within the DUV is ultimately expressed as a particular computer data structure and a set of operations on that data structure, thereby made computer-executable, and then run on a computer while input values are applied to it; that is, any process involving a series of computations or processing steps on a computer driven by input values. This definition therefore covers not only simulation performed with commercial simulators but also simulation performed with in-house simulators, and any software process that is executed virtually on a computer through modeling equivalent to the simulation process described above.
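The 3.2-year figure quoted above can be checked with a few lines of arithmetic. The following Python sketch is only an illustration; the function name `simulation_years` and the 365-day year are assumptions made here, not part of the patent.

```python
# Illustrative check of the run-time estimate; names are hypothetical.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s, leap years ignored


def simulation_years(total_cycles: int, cycles_per_second: float) -> float:
    """Wall-clock years needed to simulate total_cycles at the given speed."""
    return total_cycles / cycles_per_second / SECONDS_PER_YEAR


# 100,000,000 cycles at 1 cycle/sec, as in the example in the text.
years = simulation_years(100_000_000, 1.0)
print(f"{years:.2f} years")  # prints "3.17 years", i.e. roughly 3.2 years
```

The same function shows why speed matters: at 100 cycles/sec the same run would take about 11.6 days instead of years.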

With the recent rapid progress in integrated circuit (IC) design and semiconductor process technology, digital circuit and digital system designs have grown to at least tens of millions of gates and up to hundreds of millions of gates, their structure has become extremely complex, and this trend continues to intensify. In particular, most recent system-level integrated circuits, commonly called SOCs (System On Chip), embed one or more processor cores (RISC cores or DSP cores; specific examples are ARM's ARM11 core and CEVA's Teak DSP core) and implement a substantial part of the chip's functionality in software. Moreover, because market competition is becoming ever fiercer and superior products must be developed quickly, shortening the design period has become a critical factor in a product's success. For this reason, the ESL design methodology is receiving a great deal of attention from industry as a new approach to chip design. For chips designed with the ESL methodology, whose abstraction level (explained later) is higher than that of the RTL (Register Transfer Level) methodology used in traditional digital hardware design, the software that runs on the chip must be developed concurrently with the chip itself.

To allow software development to proceed in parallel with hardware design, the recent trend is to build a virtual platform (hereinafter abbreviated as VP) that models the hardware in software and to use it as a system-level model (ESL model) for architecture exploration, software development, hardware/software co-verification, and system verification; the VP also serves as an executable specification (that is, as a reference model). Because a VP is built at a raised abstraction level, it can be created quickly. In addition, if a VP for the DUV is created before the implementable DUV is designed, the VP can be used to verify the TB before the implementable DUV exists, which is advantageous in several respects. The VP plays an important role in platform-based design (PBD), which has become common in current SOC design. A VP generally takes as its core component a bus model in which the on-chip bus is modeled at the transaction level according to a given bus protocol (a model written at the transaction level in this way is called a TLM model). The design blocks connected to the bus are also modeled at the transaction level and communicate with the bus model according to the abstracted bus protocol, which enables a relatively high simulation speed (roughly 100 to 10,000 times that of an RTL model). Commercial tools for creating and running such VPs currently include MaxSim from ARM, ConvergenSC from CoWare, Incisive from Cadence, VisualElite from Summit Design, VSP from Vast Systems Technology, SystemStudio from Synopsys, VTOC from TenisonEDA, VSP from Carbon Design Systems, and VirtualPlatform from Virtutech.

In SOC design, the most important property of a VP is an execution speed high enough for software development. VPs are therefore not modeled at the register transfer level (RTL) in languages such as Verilog or VHDL, but at the transaction level or algorithmic level, which are more abstract than RTL, in languages such as C/C++ or SystemC.

The abstraction level, a very important concept in system design, expresses how concretely a design object (explained later) is described. For digital systems, the levels from lowest to highest abstraction are layout level, transistor level, gate level, RTL (register transfer level), transaction level, and algorithm level. That is, gate level is less abstract than RTL, RTL is less abstract than transaction level, and transaction level is less abstract than algorithm level. Thus, if design object A is at the transaction level and design object B is a more concrete RTL description of A, design object A is defined as having a higher abstraction level than design object B. Likewise, if design object X contains design objects A and C, and design object Y contains design object C and design object B, which is a refinement of A, then design object X is defined as having a higher abstraction level than design object Y. Furthermore, within the same gate level or the same RTL, the abstraction level can be ranked by how accurate the delay model is: the more accurate the delay model, the lower the abstraction level. For example, even at the same gate level, a netlist with a zero-delay model is defined as more abstract than a netlist with a unit-delay model, and a netlist with a unit-delay model is defined as more abstract than a netlist with a full-timing model using SDF (Standard Delay Format).

Recent SOC design can be defined as a process in which the object that must ultimately be implemented as a chip is defined as the initial design object, and this initial design object is refined from its first abstraction level (for example, transaction level) through a progressive refinement process down to the final target abstraction level (for example, gate level) (see FIG. 14). Design by progressive refinement, together with platform-based design, is the only design methodology that can cope efficiently with the design complexity of recent SOCs, and most SOC designs proceed in this way. The essence of design by progressive refinement can be summarized as follows: the design blocks within a design object MODEL_DUV(HIGH), modeled at a higher abstraction level, are refined step by step to obtain a design object MODEL_DUV(LOW), modeled at a lower abstraction level than MODEL_DUV(HIGH), by an automated method (for example, logic synthesis or high-level synthesis), by hand, or by a combination of the two.

As specific examples, in the refinement step from ESL to RTL, in which an implementable RTL model is obtained from an ESL model (currently done by hand, by high-level synthesis, or by a combination of the two), the ESL model is MODEL_DUV(HIGH) and the implementable RTL model is MODEL_DUV(LOW). In the refinement step from RTL to gate level, in which a gate-level model (that is, a gate-level netlist) is obtained from the implementable RTL model (currently done mostly by logic synthesis), the implementable RTL model is MODEL_DUV(HIGH) and the gate-level model is MODEL_DUV(LOW). The gate-level model becomes a timing-accurate gate-level model when the delay information extracted during placement and routing (expressed in Standard Delay Format) is back-annotated onto it. (Hereinafter, unless stated otherwise, a model is defined as including both the DUV (Design Under Verification, explained later) and the TB (Test Bench, explained later).)
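The ordering of abstraction levels defined above, including the delay-model tie-break and the comparison of design objects X and Y, can be sketched in Python. This is a minimal illustration under stated assumptions: the enum names, the `(Level, Delay)` ranking tuple, and the function `more_abstract` are hypothetical constructs introduced here, and attaching a delay model to non-gate-level blocks is done only to keep the example uniform.

```python
from enum import IntEnum


class Level(IntEnum):
    """Abstraction levels from lowest to highest, as listed in the text."""
    LAYOUT = 0
    TRANSISTOR = 1
    GATE = 2
    RTL = 3
    TRANSACTION = 4
    ALGORITHM = 5


class Delay(IntEnum):
    """Delay models; a more accurate model means a LOWER abstraction level."""
    FULL_TIMING = 0  # e.g. SDF back-annotated
    UNIT = 1
    ZERO = 2


def more_abstract(x: dict, y: dict) -> bool:
    """True if model x is more abstract than model y.

    Both map block name -> (Level, Delay). Assumption: x counts as more
    abstract when no block is less abstract and at least one block is
    strictly more abstract than its counterpart in y.
    """
    if x.keys() != y.keys():
        raise ValueError("models must contain the same blocks")
    return all(x[k] >= y[k] for k in x) and any(x[k] > y[k] for k in x)


# X contains A (transaction level) and C; Y contains B (an RTL refinement
# of A, stored under the same block name "A") and the same C.
X = {"A": (Level.TRANSACTION, Delay.ZERO), "C": (Level.RTL, Delay.ZERO)}
Y = {"A": (Level.RTL, Delay.ZERO), "C": (Level.RTL, Delay.ZERO)}
print(more_abstract(X, Y))  # True
```

Because tuples compare element by element, a gate-level netlist with a zero-delay model ranks above one with a unit-delay model, which in turn ranks above a full-timing netlist, matching the rule in the text.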

One point to note here is that not every design object inside an ESL model has to be at the system level, and not every design object inside an RTL model has to be at the register transfer level. Even in an ESL model, a small number of design objects may be at the register transfer level; by enclosing them in abstraction wrappers, their abstraction level is matched to that of the many other design objects at the system level, and the model as a whole can be treated as an ESL model. Similarly, even in an RTL model, a small number of design objects may be at the gate level, and the model can still be treated as an RTL model together with the many other design objects at the RTL level. For the same reason, a small number of design objects in a GL model (for example, memory blocks for which logic synthesis does not generate a gate-level netlist) may be at the register transfer level.

Accordingly, in this patent "a model at a specific abstraction level" denotes any one of the models of the design that can exist at the various abstraction levels arising during progressive refinement from ESL to GL (not only the ESL, RTL, and GL abstraction levels but also mixed levels such as ESL/RTL, RTL/GL, and ESL/RTL/GL). It should also be emphasized that the general term "abstraction level" covers not only ESL, RTL, and GL but also all the abstraction levels that can arise during progressive refinement from ESL to GL (for example, the mixed ESL/RTL, RTL/GL, and ESL/RTL/GL abstraction levels). For example, suppose a DUV contains four design objects A, B, C, and D as submodules, with A and B described at ESL, C at RTL, and D at GL. Although this DUV is a model at a mixed ESL/RTL/GL abstraction level, it can still be called a model at a specific abstraction level (and it can also be called, more specifically, a model at the mixed ESL/RTL/GL abstraction level). Hereinafter, when it is necessary to state explicitly that a model mixes abstraction levels, it will be called a mixed higher/lower abstraction level model or simply a mixed abstraction level model.

A transaction, an important concept in ESL, is the counterpart of a signal or pin in RTL. Whereas the information on a signal or pin can only be expressed as a bit or bit vector, a transaction is information expressed by treating a group of logically related signals or pins as a single unit, and it is conveyed by function calls. For example, in a design consisting of a processor module and a memory module, there are always N address signals, M data signals, and P control signals, (N+M+P) signals in total. If these are organized into logically related buses, namely an N-bit address bus, an M-bit data bus, and a P-bit control bus, then each cycle can be represented not by an (N+M+P)-bit binary vector whose meaning is hard to interpret, but by a meaningful symbol such as READ(ADDR(address_value), DATA(data_value)), WRITE(ADDR(address_value), DATA(data_value)), READ-WAIT(ADDR(address_value), DATA(data_value)), or WRITE-WAIT(ADDR(address_value), DATA(data_value)). Such a symbol is called a transaction.

A transaction can be defined over a single cycle (called a cycle-accurate transaction and abbreviated as ca-transaction), or extended over multiple cycles. A multi-cycle transaction goes by several names, such as timed transaction, cycle-count transaction, or PV-T transaction; this patent uses the single term timed-transaction. A timed-transaction can be written as Transaction_name(start_time, end_time, other_attributes). Transactions also include those with no notion of time (abbreviated as untimed-transactions). There is in fact no standardized, consistent definition of a transaction, but the division into untimed-transactions, timed-transactions, and ca-transactions described above is the most common. Thus, even among transactions, the abstraction level varies: untimed-transactions have the highest abstraction level and the lowest timing accuracy, ca-transactions have the lowest abstraction level and the highest timing accuracy, and timed-transactions lie between the two in both respects.
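The contrast between an (N+M+P)-bit vector and a transaction symbol can be made concrete with a small Python sketch. Everything specific here is an assumption for illustration: the bus widths (N=16, M=8, P=2), the 2-bit control encoding of the four transaction kinds, and the names `Transaction`, `to_bits`, and `from_bits` do not come from the patent.

```python
from dataclasses import dataclass

# Hypothetical bus widths: N address bits, M data bits, P control bits.
N, M, P = 16, 8, 2
# Hypothetical control encoding: list index = value on the control bus.
KINDS = ["READ", "WRITE", "READ-WAIT", "WRITE-WAIT"]


@dataclass(frozen=True)
class Transaction:
    """Single-cycle (ca-) transaction: one meaningful symbol per cycle."""
    kind: str
    addr: int
    data: int


def to_bits(t: Transaction) -> str:
    """Pin-level view: one (N+M+P)-bit vector, hard to read by eye."""
    ctrl = KINDS.index(t.kind)
    return f"{t.addr:0{N}b}{t.data:0{M}b}{ctrl:0{P}b}"


def from_bits(bits: str) -> Transaction:
    """Recover the transaction symbol from the raw bit vector."""
    addr = int(bits[:N], 2)
    data = int(bits[N:N + M], 2)
    ctrl = int(bits[N + M:], 2)
    return Transaction(KINDS[ctrl], addr, data)


t = Transaction("WRITE", addr=0x1234, data=0xAB)
bits = to_bits(t)
print(bits)             # 00010010001101001010101101
print(from_bits(bits))  # Transaction(kind='WRITE', addr=4660, data=171)
```

A timed-transaction in the sense of the text would add `start_time` and `end_time` fields to the same record, and an untimed-transaction would carry no timing fields at all.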

Refinement proceeds progressively: the transaction-level design objects in the VP are converted step by step into RTL design objects that have at least bit-level cycle accuracy. At the end of this conversion, every transaction-level design object in the VP has been converted into an RTL design object (so the transaction-level VP has been converted into an implementable RTL model). Likewise, the RTL design objects in the implementable RTL model are converted step by step into gate-level design objects that have bit-level timing accuracy. At the end of this conversion, every RTL design object in the RTL model has been converted into a gate-level design object (so the RTL model has been converted into a gate-level model).

FIG. 14 shows an example of this process. If the transaction-level DUV(ESL) contains four transaction-level design objects DO_esl_1, DO_esl_2, DO_esl_3, and DO_esl_4 as sub-blocks, these are replaced step by step, through progressive refinement, by the RTL design objects DO_rtl_1, DO_rtl_2, DO_rtl_3, and DO_rtl_4, and the result is a register-transfer-level DUV(RTL) consisting only of the RTL design objects DO_rtl_1, DO_rtl_2, DO_rtl_3, and DO_rtl_4. Similarly, if the register-transfer-level DUV(RTL) contains the four design objects DO_rtl_1, DO_rtl_2, DO_rtl_3, and DO_rtl_4 as sub-blocks, these are replaced step by step, through progressive refinement, by the gate-level design objects DO_gl_1, DO_gl_2, DO_gl_3, and DO_gl_4, and the result is a gate-level DUV(GL) consisting only of the gate-level design objects DO_gl_1, DO_gl_2, DO_gl_3, and DO_gl_4.
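The step-by-step replacement of DO_esl_i by DO_rtl_i, and then by DO_gl_i, can be sketched as follows. This is a toy model under assumptions made here: design objects are represented only by their names, one object is replaced per step in list order, and the helpers `refine_stepwise` and `model_level` are hypothetical. It also shows how the intermediate models are the mixed abstraction level models discussed earlier.

```python
ORDER = ["esl", "rtl", "gl"]  # from highest to lowest abstraction level


def model_level(model: list) -> str:
    """Name the (possibly mixed) abstraction level of a model, e.g. 'ESL/RTL'."""
    present = {name.split("_")[1] for name in model}
    return "/".join(level.upper() for level in ORDER if level in present)


def refine_stepwise(model: list, target: str):
    """Yield the model after each design object is replaced by its refinement."""
    current = list(model)
    for i, name in enumerate(model):
        prefix, _, index = name.split("_")
        current[i] = f"{prefix}_{target}_{index}"
        yield list(current)


duv_esl = ["DO_esl_1", "DO_esl_2", "DO_esl_3", "DO_esl_4"]
for step in refine_stepwise(duv_esl, "rtl"):
    print(model_level(step), step)
# The first three steps are mixed ESL/RTL models; the last is a pure RTL model.
```

Running the same generator on the final RTL model with target `"gl"` produces the RTL/GL mixed models and finally DUV(GL).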

In SOC design there are two things to be designed: the DUV (Design Under Verification) and the testbench (hereinafter abbreviated as TB) used to simulate the DUV. The DUV is the design that is ultimately fabricated as a chip through the semiconductor manufacturing process. The testbench models the environment in which the fabricated chip will be mounted and operate, and is used in simulating the DUV. When the DUV is simulated, the testbench generally applies inputs to the DUV and receives and processes the outputs that the DUV produces in response. The DUV and the testbench are usually hierarchical, containing one or more submodules. Each such submodule can be called a design block; design blocks contain design modules, and design modules contain submodules. In this patent, these design blocks, design modules, and submodules, the DUV, and the testbench, whether taken individually, in part, in combination, or as part of a combination, are all called design objects (specific examples are a module in Verilog, an entity in VHDL, and an sc_module in SystemC). A VP is therefore also a design object, as are a part of a VP, one or more design blocks in a VP, parts of those design blocks, the design modules inside a block, parts of those modules, the submodules inside a design module, and parts of those submodules. (That is, the whole of the DUV and TB, or any part inside the DUV and TB, can be regarded as a design object.)

However, in design by such progressive refinement, verification at a high level of abstraction runs very fast while verification at a low level of abstraction runs relatively slowly, so verification speed drops sharply as the refinement process moves to lower abstraction levels. One way to raise verification speed compared with ordinary single simulation is to run two or more simulators in a distributed parallel manner. (In this patent, single simulation is defined broadly: it covers not only the use of one simulator but also the use of two or more simulators, for example a Verilog simulator together with a Vera simulator, when they all run on one CPU.) Examples of such simulators are HDL (Hardware Description Language) simulators such as Cadence NC-Verilog/Verilog-XL and X-sim, Synopsys VCS, Mentor ModelSim, Aldec Riviera/Active-HDL, and Fintronic FinSim; HVL (Hardware Verification Language) simulators such as the Cadence e simulator and the Synopsys Vera simulator; and SDL (System Description Language) simulators such as SystemC simulators and the Cadence Incisive simulator. Simulators can also be classified as event-driven or cycle-based, and the simulators referred to in this patent include all of these. When two or more simulators are used, any of the kinds mentioned above may be used, and different kinds may be mixed freely. Distributed parallel simulation (also called parallel distributed simulation, or simply parallel simulation; this patent uses the term distributed parallel simulation) is the distributed-processing form of simulation and the most common parallel simulation method: the DUV and TB to be simulated (that is, a model at a particular abstraction level) are divided into two or more design objects, and each design object is assigned to and run on a separate simulator (see FIG. 5). Distributed parallel simulation therefore requires a partition step that divides the model to be simulated into two or more design objects. In this patent, a design object that the partition step assigns to a particular local simulation (defined later) is called a local design object.
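The partition step can be illustrated with a minimal Python sketch. It is not taken from the patent: the design-object names, the net list, and the function `cross_partition_signals` are hypothetical, and the sketch only shows how a given partition determines which connection signals have to be exchanged between local simulations.

```python
from collections import defaultdict


def cross_partition_signals(partition, nets):
    """Group the signals whose driver and load sit in different local simulations.

    partition: dict mapping design-object name -> local simulation id
    nets: iterable of (signal, driver_object, load_object) tuples
    Returns a dict keyed by (source_simulation, destination_simulation).
    """
    crossing = defaultdict(list)
    for signal, driver, load in nets:
        src, dst = partition[driver], partition[load]
        if src != dst:
            # This signal must be communicated between two local simulations.
            crossing[(src, dst)].append(signal)
    return dict(crossing)


# Hypothetical model: a testbench and three DUV blocks split over
# two local simulations named "A" and "B".
partition = {"tb": "A", "cpu": "A", "bus": "B", "mem": "B"}
nets = [
    ("stim", "tb", "cpu"),   # stays inside local simulation A
    ("req", "cpu", "bus"),   # crosses from A to B
    ("ack", "bus", "cpu"),   # crosses from B to A
    ("rdata", "mem", "bus"), # stays inside local simulation B
]
crossing = cross_partition_signals(partition, nets)
print(crossing)
```

Only the crossing signals contribute to communication and synchronization between local simulators, so a partition with fewer of them would be expected to incur less of that overhead.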

Recently it has become possible to perform distributed parallel simulation either by connecting two or more computers with a high-speed network such as Gigabit Ethernet and running a simulator on each computer, or by running a simulator on each CPU of a multiprocessor computer equipped with two or more central processing units (CPUs). (In this patent, the simulation run by each of the two or more simulators that make up a distributed parallel simulation is called a local simulation, and the simulator running it is called a local simulator. A multiprocessor computer can be built, for example, from Pentium dual-core or AMD dual-core chips, each of which has two processor cores, or by mounting several CPU chips on one or more system boards.) However, the performance gain of such conventional distributed parallel simulation is severely limited by the communication overhead and synchronization overhead between simulators. There are two main synchronization methods for distributed parallel simulation: the conservative (also called pessimistic) method and the optimistic method. Conservative synchronization always preserves the causality relation of simulation events across simulators, so no rollback is needed, but the speed of the distributed parallel simulation is bounded by the slowest local simulation and excessive synchronization occurs. Optimistic synchronization allows the causality relation of simulation events to be violated temporarily between simulators and requires rollback to correct the violation, so the total number of rollbacks strongly affects the performance of the distributed parallel simulation. In existing optimistic distributed parallel simulation, however, the simulation times up to which each local simulation advances without synchronizing with the other local simulations are not chosen with any particular regard to minimizing rollbacks; this causes excessive rollback and greatly degrades overall simulation performance. The detailed procedures and implementations of conventional optimistic and conservative distributed parallel simulation are well known from the literature, so a detailed description is omitted here. One further point: to maximize performance, the number of processors performing the distributed parallel simulation should preferably equal the number of local simulations. However, as long as there are two or more processors (that is, two or more computers connected by a network, or two or more processors in a multiprocessor computer), distributed parallel simulation is still possible even when there are more than two local simulations, by having one processor run two or more of them. In conclusion, both the current conservative and the current optimistic synchronization and communication methods are constraints that greatly degrade the performance of distributed parallel simulation using two or more simulators.
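The remark that one processor may run two or more local simulations can be sketched as follows. The round-robin policy and the name `assign_to_processors` are assumptions made for illustration; the description only requires that at least two processors exist and does not prescribe how local simulations are distributed among them.

```python
def assign_to_processors(local_sims, num_processors):
    """Spread local simulations over processors in round-robin order.

    Round-robin is an illustrative choice, not a policy stated in the patent.
    """
    if num_processors < 2:
        raise ValueError("distributed parallel simulation needs at least two processors")
    assignment = {p: [] for p in range(num_processors)}
    for index, sim in enumerate(local_sims):
        assignment[index % num_processors].append(sim)
    return assignment


# Five hypothetical local simulations on a two-processor machine:
# each processor ends up running two or more of them.
plan = assign_to_processors(["S0", "S1", "S2", "S3", "S4"], 2)
print(plan)
```

When the number of processors equals the number of local simulations, the same function gives each processor exactly one local simulation, which is the configuration the description calls preferable.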

Furthermore, in design by progressive refinement it is very important that the model at the higher abstraction level serve as a reference model for the model at the lower abstraction level, and this requires maintaining model consistency between the two. However, current progressive refinement methods offer no effective way to maintain model consistency among two or more models that exist at different abstraction levels.

In addition, the debugging process for removing design errors found during design by progressive refinement is not systematic and therefore takes a great deal of time.

An object of the present invention is to provide a method that raises the speed of distributed parallel simulation by effectively reducing its synchronization overhead and communication overhead.

To achieve the above object, a design verification apparatus for applying the design verification method according to the present invention may consist of verification software and one or more computers on which one or more simulators are installed. Another design verification apparatus to which the method can be applied consists of verification software, one or more computers on which one or more simulators are installed, and one or more simulation accelerators or FPGA boards connected to those computers. The verification software runs on a computer. If the apparatus has two or more computers, they are connected by a network (for example Ethernet or Gigabit Ethernet) so that files and data can be moved between them over the network. The one or more simulators used for design verification may consist only of event-driven simulators (parallel simulation in which the distributed parallel simulation is made up only of event-driven simulators is called PDES, Parallel Discrete Event Simulation), of event-driven and cycle-based simulators together, only of cycle-based simulators, of cycle-based and transaction-based simulators together, only of transaction-based simulators, of event-driven and transaction-based simulators together, or of event-driven, cycle-based, and transaction-based simulators together; the simulators in this patent can thus be configured in many ways. When the two or more simulators consist of event-driven and cycle-based simulators, the distributed parallel simulation proceeds in co-simulation mode, with one part run as event-driven simulation and another part as cycle-based simulation. Likewise, when the two or more simulators consist of event-driven, cycle-based, and transaction-based simulators, the distributed parallel simulation can proceed in co-simulation mode with one part run as event-driven simulation, another as cycle-based simulation, and another as transaction-based simulation.

Patent 10-2006-92574 and PCT/KR2006/004059 proposed a simulation method that minimizes the communication overhead and synchronization overhead of each local simulation of a distributed parallel simulation by using expected inputs and expected outputs obtained from a simulation with a higher-abstraction-level model, from a simulation performed before a design change, or from a higher-abstraction-level model itself. Because that method needs expected inputs and expected outputs, a simulation with a higher-abstraction-level model, a simulation before the design change, or a higher-abstraction-level model must be available beforehand. This patent therefore proposes a new distributed parallel simulation method that can minimize communication overhead and synchronization overhead both in the simulation needed to obtain the expected inputs and expected outputs for the distributed-processing parallel simulation proposed in Patent 10-2006-92574 and PCT/KR2006/004059 and in conventional distributed parallel simulation.

In the distributed parallel simulation method of this patent, the synchronization point at which each local simulation synchronizes with the other local simulations is determined dynamically while the simulation runs. Specifically, at the current simulation time Ti, each local simulation Sl(j) estimates the time Tn of the next event that will arrive from the other local simulations (an input event coming from local design objects run in other local simulations into the local design object run in Sl(j)). It takes the shortest interval n among all events that the other local simulation has sent so far, from simulation time 0 to Ti, where the interval of an event is its distance from the immediately preceding event; this patent calls n the "external event minimum interval". Sl(j) then assumes, at time Ti, that Tn = Ti + n. Let To be the time of the next event that must go from Sl(j) to other local simulations (an output event leaving the local design object run in Sl(j)). The local simulation proceeds with the earlier of Tn and To as its expected synchronization point Te, so the next expected synchronization point after Ti is Te = min(Ti + n, To). The local simulation need not actually advance as far as To to find Te: while advancing its time from Ti toward Ti + n, if it reaches To first then Te is To, and if it reaches Ti + n before To then Te is Ti + n. If the assumption turns out to be wrong during the actual run, the actual synchronization point Ts lies earlier than min(Ti + n, To), and a rollback is performed as in optimistic distributed parallel simulation: the local simulation rolls back to a checkpoint at or before Ts and synchronizes when its simulation time reaches Ts. For this purpose, each local simulation takes checkpoints at one or more times during the run so that it can roll back when necessary. In this patent the checkpoints may be taken periodically, or variably, by widening or narrowing the checkpoint interval as simulation time advances. Changing the checkpoint interval dynamically in this way can minimize the simulation overhead due to checkpointing.
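The prediction rule Te = min(Ti + n, To) and the choice of rollback target can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patent's implementation: the function names are hypothetical, and treating n as unknown (so that Te falls back to To) until at least two external events have been seen is an assumption of the sketch.

```python
from bisect import bisect_right


def external_event_min_interval(arrival_times):
    """Shortest gap between consecutive external events seen so far.

    Returns None while fewer than two events have arrived (assumption:
    no interval can be measured yet in that case).
    """
    gaps = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
    return min(gaps) if gaps else None


def expected_sync_point(ti, to, n):
    """Te = min(Ti + n, To); uses To alone when no interval n is known."""
    if n is None or n <= 0:
        return to
    return min(ti + n, to)


def rollback_checkpoint(checkpoints, ts):
    """Latest checkpoint time at or before the actual sync point Ts.

    checkpoints must be sorted in increasing simulation time.
    """
    position = bisect_right(checkpoints, ts)
    if position == 0:
        raise ValueError("no checkpoint at or before Ts")
    return checkpoints[position - 1]


# A local simulation at Ti=30 has received external events at 0, 10, 20
# and will emit its own next output event at To=40.
n = external_event_min_interval([0, 10, 20])
te = expected_sync_point(30, 40, n)
# If the other side actually forces a synchronization at Ts=35 (< Te),
# the simulation rolls back to its latest checkpoint at or before 35.
target = rollback_checkpoint([0, 20, 30], 35)
print(n, te, target)
```

A sorted checkpoint list with binary search is only one convenient way to find the rollback target; any structure that returns the latest checkpoint at or before Ts would serve.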

The distributed parallel simulation method proposed in this patent is now described concretely through an example. To keep things simple, assume there are two local simulations. In a distributed parallel simulation consisting of local simulation A and local simulation B, suppose events sent from A to B occur at simulation times 0, 10, 20, 35, 45, 50, 60, 70, and events sent from B to A occur at simulation times 10, 20, 30, 40, 45, 50, 60, 65, 75. The method then determines the synchronization points between the local simulations dynamically, as follows.

First, at Ti=0, local simulation A has To=10 and, because nothing has yet been delivered from B to A, n=0 (no interval has been observed), so Te=To=10; local simulation B likewise has To=10 and, because nothing has yet been delivered from A to B, n=0, so Te=10 (the actual Ts is also 10, so no rollback occurs).

At Ti=10, local simulation A has To=20 and n=10, so Te=20; local simulation B also has To=20 and n=10, so Te=20 (the actual Ts is also 20, so no rollback occurs).

At Ti=20, local simulation A has To=35 and n=10, so Te=30 (A already knows at Ti=20 that n=10, so although To is 35 it does not advance to simulation time 35 but synchronizes at simulation time 30); local simulation B has To=30 and n=10, so Te=30 (the actual Ts is also 30, so no rollback occurs).

At Ti=30, local simulation A has To=35 and n=10, so Te=35; local simulation B has To=40 and n=10, so Te=40 (the actual Ts is 35, so a rollback occurs in B).

At Ti=35, local simulation A has To=45 and n=5, so Te=40; local simulation B has To=40 (its next event to A in the list above) and n=5, so Te=40 (the actual Ts is also 40, so no rollback occurs).

At Ti=40, local simulation A has To=45 and n=5, so Te=45; local simulation B also has To=45 and n=5, so Te=45 (the actual Ts is also 45, so no rollback occurs).

At Ti=45, local simulation A has To=50 and n=5, so Te=50; local simulation B also has To=50 and n=5, so Te=50 (the actual Ts is also 50, so no rollback occurs).

At Ti=50, local simulation A has To=60 and n=5, so Te=55; local simulation B also has To=60 and n=5, so Te=55 (the actual Ts is also 55, so no rollback occurs).

At Ti=55, local simulation A has To=60 and n=5, so Te=60; local simulation B also has To=60 and n=5, so Te=60 (the actual Ts is also 60, so no rollback occurs).

At Ti=60, local simulation A has To=70 and n=5, so Te=65; local simulation B has To=65 and n=5, so Te=65 (the actual Ts is also 65, so no rollback occurs).

At Ti=65, local simulation A has To=70 and n=5, so Te=70; local simulation B has To=75 and n=5, so Te=70 (the actual Ts is also 70, so no rollback occurs).
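The walk-through above can be replayed with a short Python sketch. Several things in it are assumptions made for illustration: To is derived from the two event lists as the first outgoing event after Ti, the values of n are taken as stated in the walk-through rather than recomputed, n=0 is read as "no interval known, so Te=To", and the actual synchronization point Ts is taken as the earlier of the two predictions, which is a simplification suited to this two-simulation example.

```python
A_TO_B = [0, 10, 20, 35, 45, 50, 60, 70]      # events sent from A to B
B_TO_A = [10, 20, 30, 40, 45, 50, 60, 65, 75]  # events sent from B to A


def next_output(events, ti):
    """To: the first outgoing event strictly after the current time Ti."""
    return next((t for t in events if t > ti), float("inf"))


def stated_n(ti):
    """n as stated in the walk-through (not recomputed from the event lists)."""
    if ti < 10:
        return 0
    return 10 if ti < 35 else 5


def expected(ti, to, n):
    """Te = min(Ti + n, To); with n == 0 nothing is known yet, so Te = To."""
    return to if n == 0 else min(ti + n, to)


def replay(end_time=70):
    """Return rows of (Ti, Te of A, Te of B, Ts, simulations that roll back)."""
    trace, ti = [], 0
    while ti < end_time:
        n = stated_n(ti)
        te_a = expected(ti, next_output(A_TO_B, ti), n)
        te_b = expected(ti, next_output(B_TO_A, ti), n)
        ts = min(te_a, te_b)  # the earlier prediction is where the two meet
        rolled_back = [name for name, te in (("A", te_a), ("B", te_b)) if te > ts]
        trace.append((ti, te_a, te_b, ts, rolled_back))
        ti = ts
    return trace


trace = replay()
for row in trace:
    print(row)
```

Under these assumptions the replay synchronizes at 10, 20, 30, 35, 40, 45, 50, 55, 60, 65 and 70, and the only rollback is the one in B at Ti=30.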

In this example, if the event intervals of neither A nor B become smaller than 5 after simulation time 65, then from simulation time 65 onward the distributed parallel simulation proceeds with synchronization every 5 time units. When the distributed parallel simulation targets a DUV described at the register transfer level, this interval of 5 is very likely the period of the shortest-period user clock among the user clocks in the DUV that drive the connection signals between the design objects run on local simulations A and B. That is, under the synchronization method of this patent, the interval that settles in as the synchronization interval between local simulations after some simulation time has elapsed is very likely the period of the shortest-period user clock among the user clocks in the DUV that drive the connection signals between the design objects run by the local simulations, and from then on the distributed parallel simulation can run with the probability of rollback close to zero. Accordingly, if, while the distributed parallel simulation runs by this method, it is judged that this point has been reached, then from that point on each local simulation can dynamically and substantially widen the interval at which it takes checkpoints in preparation for rollback, which also minimizes checkpoint overhead. (The software performing the distributed parallel simulation can make this judgment automatically. Even if the judgment is wrong, it does not affect the correctness of the simulation results; it has only the limited effect of making the performance gain somewhat smaller than expected.)
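One way to widen the checkpoint interval once synchronization has settled is sketched below. The policy is an assumption made purely for illustration (double the interval after a run of rollback-free synchronizations, reset it after a rollback); the description says only that the interval can be widened or narrowed dynamically, and the class and parameter names are hypothetical.

```python
class CheckpointScheduler:
    """Variable-interval checkpointing driven by rollback history (illustrative)."""

    def __init__(self, base_interval, max_interval, stable_syncs=4):
        self.base_interval = base_interval  # interval used while rollbacks are likely
        self.max_interval = max_interval    # upper bound once the run looks stable
        self.stable_syncs = stable_syncs    # clean syncs required before widening
        self.interval = base_interval
        self._clean_streak = 0
        self._last_checkpoint = 0

    def on_sync(self, rolled_back):
        """Update the interval after each synchronization."""
        if rolled_back:
            # A misprediction occurred: fall back to frequent checkpoints.
            self.interval = self.base_interval
            self._clean_streak = 0
            return
        self._clean_streak += 1
        if self._clean_streak >= self.stable_syncs:
            self.interval = min(self.interval * 2, self.max_interval)
            self._clean_streak = 0

    def should_checkpoint(self, now):
        """True when at least one interval has elapsed since the last checkpoint."""
        if now - self._last_checkpoint >= self.interval:
            self._last_checkpoint = now
            return True
        return False


scheduler = CheckpointScheduler(base_interval=5, max_interval=40, stable_syncs=2)
for _ in range(4):
    scheduler.on_sync(rolled_back=False)
widened = scheduler.interval          # doubled twice: 5 -> 10 -> 20
scheduler.on_sync(rolled_back=True)
after_rollback = scheduler.interval   # reset to the base interval
print(widened, after_rollback)
```

A wrong stability judgment under such a policy costs only performance: a later rollback simply has to return to an older checkpoint and re-simulate a longer stretch, without affecting the correctness of the results.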

As described in detail above, at simulation time Ti of a particular local simulation Sl(j), this patent sets the next expected synchronization point Te to the smaller (earlier) of Tn, the time of the next event that will arrive from other local simulations, and To, the time of the next event generated in Sl(j) that must go to other local simulations. In doing so, Tn is taken to be the time advanced from Ti by n, the "event minimum interval", which is the shortest of all intervals between events sent by the other local simulation so far (from simulation time 0 to Ti); that is, Tn is assumed to be Ti + n. This patent calls this the "future synchronization point prediction method based on the experience to date". The practice of changing the checkpoint interval during the distributed parallel simulation in support of that method is called the "checkpoint method proceeding at variable intervals".

Other objects and advantages of the present invention besides the above object will become apparent from the further description made with reference to the accompanying drawings.

FIG. 1 schematically shows several examples of logical connection structures among two or more local simulators installed on two or more computers for simulation by the distributed-processing parallel execution scheme of this patent.

FIG. 2 schematically shows an example of a distributed parallel simulation environment built from two or more computers and two or more HDL simulators installed on them.

FIG. 3 schematically shows an example of the overall flow of a conventional distributed parallel simulation. Many other flows for the overall procedure are of course possible. Among the steps of this procedure, in step S106 a local simulation is run for each local design object on each local simulator, and the local simulation proceeds while determining synchronization points with the synchronization point determination method proposed in this patent.

As described above, the present invention has the effect of raising the speed of distributed parallel simulation by effectively reducing its synchronization overhead and communication overhead.

From the above description, those skilled in the art will understand that various changes and modifications are possible without departing from the technical idea of the present invention. The technical scope of the present invention is therefore not limited to what is described in the embodiments but must be determined by the claims.

Claims

A distributed parallel simulation using two or more simulators, or one or more simulators and one or more simulation accelerators or hardware emulators, characterized in that the distributed parallel simulation uses a "future synchronization point prediction method based on the experience to date".

Distributed parallel simulation, characterized in that the distributed parallel simulation uses a "checkpoint method proceeding at variable intervals".

A distributed parallel simulation characterized in that, in at least one of the two or more local simulations performing the distributed parallel simulation, the expected synchronization point of that local simulation, which may become the next synchronization point while the simulation runs, is obtained using the "external event minimum interval" derived from all events delivered from other local simulations up to that simulation time.

The method of claim 3, wherein a rollback of the corresponding local simulation is performed when the actual synchronization point lies earlier in simulation time than the expected synchronization point predicted in that local simulation.