KR100594593B1 - Parallel simulation method for rapid verifying design of semiconductor

Info

Publication number: KR100594593B1
Application number: KR1020040117803A
Authority: KR
Inventors: 이윤식; 김우성
Original assignee: 한국전자통신연구원; 호서대학교 산학협력단
Priority date: 2004-12-31
Filing date: 2004-12-31
Publication date: 2006-06-30

Abstract

The present invention relates to a high-speed parallel verification method for semiconductor device design. To overcome the limitations of conventional software-based verification, it implements a hardware accelerator together with a software algorithm for operating the accelerator optimally, so that predicted values close to the actual results are computed quickly and can be reflected in the design immediately.

Semiconductor device, design verification, high-speed parallel verification method, hardware accelerator.

Description

Parallel simulation method for rapid verifying design of semiconductor

FIGS. 1 to 28 are diagrams for explaining the conventional technology.

FIGS. 29 to 50 are diagrams for explaining the technique of the present invention.

In the past, a system was built from several semiconductor chips. Today, the inefficiencies caused by control signals, synchronization signals, and the like between chips are removed by integrating those chips into a single chip, which eliminates the separate control signals and synchronization elements. As a result, the SoC (System on a Chip) has come into use to meet the demands of consumers and of a consumer electronics industry that requires small size and low power.

However, the design goal of an SoC is a means of narrowing the gap between functional demand and design efficiency, and it is defined from the standpoint of production cost and efficiency rather than as a way of overcoming technical limits. Design cost and design efficiency are therefore also very important factors.

Because of its functional characteristics, an SoC must be developed and produced at the time the market demands it. About 40% of the total time spent on SoC design goes to verifying the characteristics of the chip, so shortening this verification time is the most important factor in SoC development and production.

Logic gate simulation, compiled simulation, and event-driven simulation are the methods mainly used to verify conventional semiconductor designs.

Logic simulation analyzes a design drawing described in logic and, before the functions in the drawing are implemented directly on a chip, applies virtual input values on a computer and compares the resulting output of the drawing with the expected output values, thereby verifying whether the drawing contains functional errors. Because it uses expected output values for each logic element, the gates may not be simulated in the correct order with respect to the whole circuit.

FIG. 1 is an example for explaining the logic models and timing of logic simulation. It is a simple circuit with feedback and shows that the output O differs completely depending on the model. With the zero-delay model the output stays at 1 from time 0, whereas with the unit-delay and multi-delay models the output begins to oscillate from time unit 2 and time unit 3, respectively. In such a case the simulator makes the designer aware of this possibility and provides information so that circuit malfunction can be prevented. In this way, problems that cannot be found with the zero-delay model can be found with the unit-delay or multi-delay model. Problems such as static and dynamic hazards, which are hard to find with the unit-delay model, can also be found with the multi-delay model; the drawback is that the simulation is complex and takes a long time.

FIG. 2 is a basic logic gate circuit, and FIG. 3 is an example of a pseudo simulation program for analyzing the circuit of FIG. 2.

The simulation program of FIG. 3 uses the variable X2 before its value has been stored, so the result may be wrong. For example, if every variable is initialized to "0", the computer simulation cannot run correctly.

That is, for the inputs A=1, B=0, C=0, D=1, E=1, the simulation of FIG. 3 yields Q1=0, Q2=0, which differs from the correct result Q1=0, Q2=1.

To solve this problem, levelization is performed to determine the order in which the gates are simulated, and a level number is assigned to each gate and net.

FIG. 4 is an example of levelization. Level numbering starts by assigning level 0 to the primary input nets. For each gate, the levels of its input nets are collected, the maximum is taken, and the maximum plus 1 is assigned as the level of that gate.
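The level-numbering rule can be illustrated with a minimal Python sketch. The netlist below is hypothetical (it is not the circuit of FIG. 2 or FIG. 4, which are not reproduced here), and the names `levelize`, `schedule`, and `NETLIST` are assumptions made for illustration only.

```python
from collections import defaultdict


def levelize(gates, primary_inputs):
    """Assign a level to every gate: max(level of its input nets) + 1.

    gates: gate name -> (tuple of input nets, output net)
    primary_inputs: nets that receive level 0
    """
    net_level = {net: 0 for net in primary_inputs}
    gate_level = {}
    pending = dict(gates)
    while pending:
        progressed = False
        for name, (inputs, output) in list(pending.items()):
            # A gate can be levelized once all of its input nets have a level.
            if all(net in net_level for net in inputs):
                level = max(net_level[net] for net in inputs) + 1
                gate_level[name] = level
                net_level[output] = level
                del pending[name]
                progressed = True
        if not progressed:
            raise ValueError("feedback loop: circuit cannot be levelized")
    return gate_level


def schedule(gate_level):
    """Group gates by level, lowest level first."""
    by_level = defaultdict(list)
    for name, level in gate_level.items():
        by_level[level].append(name)
    return [sorted(by_level[level]) for level in sorted(by_level)]


# Hypothetical five-gate netlist (an assumption, not taken from the figures).
NETLIST = {
    "G1": (("A", "B"), "X1"),
    "G2": (("X1", "C"), "X2"),
    "G3": (("X2", "D"), "Q1"),
    "G4": (("D", "E"), "X3"),
    "G5": (("X2", "X3"), "Q2"),
}

order = schedule(levelize(NETLIST, ["A", "B", "C", "D", "E"]))
print(order)  # [['G1', 'G4'], ['G2'], ['G3', 'G5']]
```

Simulating the gates in this order guarantees that every net is written before any gate reads it, which is the property the levelization step is meant to provide.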

Once a level has been assigned to every gate, the gates are sorted by level and the computer simulation proceeds in that order. For the example of FIG. 2, simulation is performed in the order {G1,G4}→G2→{G3,G5}, which resolves the X2 problem that caused the error in the simulation program of FIG. 3.

FIG. 5 is an example of a levelization program that assigns a level to each gate.

In addition, for the computer simulation, the circuit is analyzed internally to build the required data structures, which are stored in table form.

FIG. 6 is an example of the gate table resulting from the circuit analysis of the example circuit of FIG. 2.

Following the above procedure, logic gate simulation runs the simulation using the circuit and the input vectors and then compares the values of the output nets with the expected values, thereby verifying the design drawing and its functions. If a functional error is found, the design goes through redesign and re-verification.

FIG. 7 shows the algorithm of interpretive simulation, which is one logic gate simulation method.

However, interpretive simulation must execute each of the above operations for every gate and, in particular, carries out complicated steps such as storing in a table each output value for every change of the input values. It is therefore very difficult to optimize as an analysis method for highly integrated SoCs composed of a very large number of gates.

To make up for this drawback, the compiled simulation method simulates the entire circuit using the loop constructs of C or assembly language, so it can be optimized and made faster than the interpretive method.

FIG. 9 is an example of a program using the improved compiled simulation method, and FIG. 9 shows the algorithm of the compiled simulation method.

FIG. 10 is a gate simulation program for the example circuit of FIG. 2 using the compiled simulation method.

However, both the interpretive and the compiled method must run every case that follows from the input changes, and so they perform unnecessary work by simulating even the cases in which an input change causes no output change.

Event-driven simulation achieves high-speed verification by eliminating this unnecessary work.

The event-driven simulation method is described below using the circuit of FIG. 11.

In the circuit of FIG. 2, when the input values (A,B,C,D) are (0,0,0,0) and then (0,0,0,1), the values of A, B, and C do not change, so gates G1 and G2 need not be simulated, and the simulation of G4 can accordingly be omitted without testing X1 and X2.

FIG. 12 shows the data structure used to implement an event. It consists of scheduling information, bit information, and the new bit value.

Unlike the level-based method, the event-driven method uses dynamic scheduling. With dynamic scheduling, the order in which the gates of the circuit will execute cannot be predicted at circuit analysis time, so a queue (a typical way of storing and retrieving computer data) is used to make dynamic scheduling possible.

That is, when an event is detected because an input value of the circuit has changed, an event data structure (FIG. 12) is created with the information of the corresponding input net, and the event queue stores the created events until they are processed.

FIG. 13 is an example of a program showing how input events are processed.

The method is described in more detail below by applying the program of FIG. 13 to the example circuit of FIG. 14.

Assume the two input vectors (1,1,0,0,1,0) and (0,0,0,0,1,1). Three inputs, A, B, and F, change between the two vectors, so three events occur. These events are stored in the queue, and each must then be processed. Event processing begins by copying the net value from the event data structure to the area that holds the permanent value.

Next, every gate connected to that net is simulated. The queue that handles all of these gates is the gate queue.

FIG. 15 is an example of an event processing program. Event processing continues until all generated events have been processed. Each processed event is removed from the queue and, at the same time, the corresponding gate is stored in the gate queue. Usually several gates are stored in the gate queue.

FIG. 16 shows the contents of the gate queue produced by this processing.

FIG. 17 shows that, after all events have been processed, each gate in the gate queue is simulated, and the gate output, the net output, and the scheduling of that output net are handled. There is one simulation subroutine per gate in the circuit drawing, and it performs the simulation.

In this way the gate processing program creates new events, stores them in the event queue, and calls the event processing program to process them. This is repeated until there are no more events to process.
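The alternation between event processing and gate processing can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program of FIGS. 13 to 18: the three-gate circuit, the function name `simulate_unit_delay`, and the tables `GATES` and `FANOUT` are all hypothetical.

```python
from collections import deque


def simulate_unit_delay(gates, fanout, values, changes):
    """Run the two-phase event loop until no events remain.

    gates:   gate name -> (function, input nets, output net)
    fanout:  net name  -> gates that read the net
    values:  net name  -> current value (updated in place)
    changes: net name  -> new primary-input value
    Returns the number of gate evaluations performed.
    """
    # Only inputs whose value actually changes become events.
    events = deque((n, v) for n, v in changes.items() if values[n] != v)
    evaluations = 0
    while events:
        # Event phase: commit net values and collect the affected gates.
        gate_queue = []
        while events:
            net, value = events.popleft()
            values[net] = value
            for gate in fanout.get(net, ()):
                if gate not in gate_queue:
                    gate_queue.append(gate)
        # Gate phase: evaluate queued gates; changed outputs become events.
        for gate in gate_queue:
            func, inputs, output = gates[gate]
            new_value = func(*(values[n] for n in inputs))
            evaluations += 1
            if new_value != values[output]:
                events.append((output, new_value))
    return evaluations


# Hypothetical circuit: X1 = A AND B, X2 = X1 OR C, Q = X2 AND D.
GATES = {
    "G1": (lambda a, b: a & b, ("A", "B"), "X1"),
    "G2": (lambda x, c: x | c, ("X1", "C"), "X2"),
    "G3": (lambda x, d: x & d, ("X2", "D"), "Q"),
}
FANOUT = {"A": ["G1"], "B": ["G1"], "C": ["G2"], "D": ["G3"],
          "X1": ["G2"], "X2": ["G3"]}

values = dict.fromkeys(["A", "B", "C", "D", "X1", "X2", "Q"], 0)
only_d = simulate_unit_delay(GATES, FANOUT, values, {"D": 1})
a_and_b = simulate_unit_delay(GATES, FANOUT, values, {"A": 1, "B": 1})
print(only_d, a_and_b, values["Q"])  # 1 3 1
```

When only D changes, a single gate is evaluated and G1 and G2 are never touched, which is the saving the event-driven method aims for.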

The simulation of one input vector thus consists of the input processing program followed by several runs of the gate and event processing programs.

FIG. 18 is the main program of this simulation program and shows its overall structure.

The timing models used to implement event-driven simulation are described below.

Level-based simulation basically uses the zero-delay model, whereas event-driven simulation uses the unit-delay model or the multi-delay (arbitrary delay) model.

FIG. 19 is an example of event-driven simulation. The unit-delay model simulates every gate with a delay of 1, whereas the arbitrary-delay model defines the delay as an integer that depends on the characteristics of the gate and on the number and type of its inputs. The figure shows the event-driven algorithm applied to the case in which (A,B) changes from (0,0) to (1,1).

There are two important points about event-driven simulation here. First, gate G2 is simulated immediately in response to the change of input A, which is the same as in level-based simulation. Second, G2 is simulated twice, which differs from level-based simulation: net Q has two events, so the value of Q changes 0->1->0. If net Q is the input of a T flip-flop, the level-based and event-driven simulations therefore give different results.

The two phases of the event-driven algorithm, the event (or net) processing program and the gate processing program, together represent one time step of the simulation. Each execution of the event processing program therefore represents the passage of one unit of time: the first execution corresponds to time 0, and the time is incremented by 1 on every subsequent execution.

Unlike the unit-delay model, the multi-delay model simulates with an arbitrary delay value assigned to each gate, and it must take into account differences in gate characteristics that depend on the number of inputs, the fanout count, and so on.

FIG. 20 shows a simple circuit that uses arbitrary delays. For example, if the values of A and B change at t=0, the value of net F does not change before t=3, and the value of net E does not change until t=4. This behavior can cause problems in the event-driven algorithm.

In the event-driven algorithm, gates G1 and G4 are scheduled for simulation at t=0. The scheduled gates G1 and G4 are simulated, and the events for nets C and E are scheduled at t=1. Something must be done, however, so that the value of net E changes at the correct time. One approach is to delay the simulation of gate G4 until t=3, but if net B changes at t=1, 2, or 3, gate G4 would then be simulated with incorrect input values. For a correct simulation, gate G4 must be evaluated at t=0 and the event on the net must be processed at t=4. This difference between the time an event is generated and the time it is processed comes from the gate delay, so two kinds of time exist within the simulation.

Using an unsuitable algorithm in event-driven simulation seriously affects simulation performance and results. Suppose the simple event queue of unit-delay simulation is used for multi-delay simulation. Every event is stored in the event queue as it is generated, and when time T arrives the simulator searches the queue for the events of that time. This search (a repeated minimum sort) takes O(n²) time, and storing the events as they are generated (an insertion sort) also takes O(n²) time, where n is the number of events.

In addition, events are not generated in time order. Events must therefore be sorted by execution time before they are processed; only then is the result correct. The most efficient methods are the timing wheel, a variant of the bucket sort (O(n)) algorithm, and the priority queue, a variant of the heap sort (O(n log₂ n)) algorithm.
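The priority-queue option can be sketched with Python's standard `heapq` module, which keeps a binary heap so that each insertion and each removal costs O(log n). The class name `PriorityEventQueue` and the event tuples are assumptions made for illustration; the source does not specify an implementation.

```python
import heapq
import itertools


class PriorityEventQueue:
    """Event queue ordered by execution time, backed by a binary heap."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so events with equal times keep their insertion order.
        self._counter = itertools.count()

    def push(self, time, net, value):
        heapq.heappush(self._heap, (time, next(self._counter), net, value))

    def pop(self):
        time, _, net, value = heapq.heappop(self._heap)
        return time, net, value

    def __len__(self):
        return len(self._heap)


# Events are generated out of time order but come back sorted by time.
queue = PriorityEventQueue()
queue.push(4, "E", 0)
queue.push(1, "C", 1)
queue.push(3, "F", 1)
ordered = [queue.pop() for _ in range(len(queue))]
print(ordered)  # [(1, 'C', 1), (3, 'F', 1), (4, 'E', 0)]
```

Compared with searching an unsorted list for the earliest event, the heap avoids the quadratic cost described above while still accepting events in any order.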

The basic idea of the timing wheel is to treat the event queue as a set of several event queues, one per unit of time; in data structure terms, an array of queues.

FIG. 21 shows an example of the structure of a timing wheel. Let D be the maximum gate delay. If the simulation is currently at time T, the latest time at which an event I can be processed is T+D, so no event can exist at time T+D+1 or later. Because no gate is simulated until all events have been processed, no event is stored in a queue for time T or earlier. The maximum number of queues can therefore be defined as D. For an event that occurs at simulation time T, the queue index is found with the expression "position in Q = T % D", where % is the modulo operator. The simulator records and maintains the "current time" relative to the start of the simulation, and the event processing program is the same as in the unit-delay model.
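A timing wheel with D slots and the index rule T % D might look like the following Python sketch. The class `TimingWheel` and its methods `schedule`, `pop_current`, and `advance` are hypothetical names, and the event labels are placeholders. With exactly D slots, the sketch relies on the condition stated above: the current slot is emptied before any gate schedules a new event, so an event with delay D can safely reuse that slot.

```python
class TimingWheel:
    """Array of per-time-unit event queues indexed by time modulo D."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.slots = [[] for _ in range(max_delay)]
        self.now = 0

    def schedule(self, delay, event):
        """Store an event due at now + delay.

        Must be called only after pop_current() has drained the current
        slot, because a delay of max_delay maps back onto that slot.
        """
        if not 1 <= delay <= self.max_delay:
            raise ValueError("delay must be between 1 and max_delay")
        self.slots[(self.now + delay) % self.max_delay].append(event)

    def pop_current(self):
        """Remove and return every event due at the current time."""
        index = self.now % self.max_delay
        due, self.slots[index] = self.slots[index], []
        return due

    def advance(self):
        self.now += 1


wheel = TimingWheel(max_delay=4)
log = []
for t in range(6):
    due = wheel.pop_current()       # event phase: drain the current slot
    log.append((t, due))
    if t == 0:                      # gate phase: schedule future events
        wheel.schedule(3, "F")
        wheel.schedule(4, "E")
    wheel.advance()
print(log)  # [(0, []), (1, []), (2, []), (3, ['F']), (4, ['E']), (5, [])]
```

Both scheduling and retrieval are constant-time array operations, which is why the wheel behaves like a bucket sort keyed on time.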

FIG. 22 is an example of a program showing the structure of the simulation algorithm for the arbitrary-delay model. The underlined parts are those changed relative to the unit-delay model.

FIG. 23 is an example of a gate simulation program for gate evaluation.

However, a problem arises in this simulation when two events for the same net are stored in the queue awaiting processing.

FIG. 24 is a circuit showing an example of this problem; gate G is one part of a large circuit. Assume that the input and output values of gate G shown are those produced at t=100. An event is processed at t=101 and changes the value of net a from 1 to 0, and an event for net b is scheduled at t=105; that is, the value of b is due to change from 0 to 1 at t=105. Now suppose an event at t=102 changes the value of net a from 0 back to 1. The event for t=105 is still in the queue and the value of net b has not yet changed. G is simulated at t=102 with the input value 1, and because the output value of G is already 0, no new event is generated. With no new event, the event at t=105 is processed and changes net b to 1, so that the input and the output of gate G are both 1, which is a problem.

This problem arises because the result of the gate simulation at t=102 is compared with the value of net b, even though the value of net b changes at t=106. One solution is to examine the events for net b that are in the queue and compare the new value against them; another is to defer the comparison between the old and new values of the net. The latter is simpler and easier to implement than the former.

FIG. 25 is an example of a program that performs this deferred comparison between the old and new values of a net.
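The difference between comparing at evaluation time and deferring the comparison can be reproduced with a small Python sketch. It is not the program of FIG. 25. It assumes that gate G is an inverter with a delay of 4 time units, which is consistent with the values described above but is not stated explicitly; the function name `run_inverter` and its parameters are hypothetical.

```python
def run_inverter(input_changes, delay, deferred, a=1, b=0):
    """Simulate one inverter (b = NOT a) with a fixed gate delay.

    input_changes: time -> new value of net a
    deferred=False: compare the gate result with net b when the gate is
                    evaluated and schedule an event only if they differ.
    deferred=True:  always schedule the result and compare old and new
                    values only when the event is processed.
    Returns the final (a, b).
    """
    pending = {}  # due time -> new value of net b
    start, stop = min(input_changes), max(input_changes) + delay
    for t in range(start, stop + 1):
        # Event phase: apply a due event if it really changes net b.
        if t in pending:
            new_b = pending.pop(t)
            if new_b != b:
                b = new_b
        # Gate phase: re-evaluate the inverter when its input changes.
        if t in input_changes:
            a = input_changes[t]
            result = 1 - a
            if deferred or result != b:
                pending[t + delay] = result
    return a, b


changes = {101: 0, 102: 1}
print(run_inverter(changes, delay=4, deferred=False))  # (1, 1): inconsistent
print(run_inverter(changes, delay=4, deferred=True))   # (1, 0): consistent
```

With the immediate comparison the second evaluation is discarded and the inverter ends with input and output both 1. Deferring the comparison queues a second event for t=106 that restores the correct output.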

FIG. 26 is an example of the event-driven arbitrary-delay model, given for a more detailed explanation. Assume the gate delays are 2 time units for G1, 2 time units for G3, and 1 time unit for G2; the inputs are given at random and take the two values 0 and 1; and an event-driven logic simulation is performed. The simulator algorithm for this case is as follows.

(a) Input TEST (GET A, B, C);                    input event processing
(b) Net_D:                                       unit program for net D
(c)   if new_value != net_value then             event processing (net verification) step
(d)     PUSH Gate_G3;                            gates connected to net D (fanout gates)
(e)     goto WHEEL[current_time]->procedure      timing wheel
(f)   endif
(g) POP GATE_QUEUE
(h) Gate_G3:                                     gate simulation step
(i)   temp = net_D & net_E;                      perform the logic function
(j)   QUEUE_INDEX = (current_time + GATE_DELAY) MOD Size_of_Queue
(k)   PUSH net_O and temp onto WHEEL[QUEUE_INDEX];
(l)   goto Gate_Handler

Functionally, the algorithm divides into four parts. The first is part (a), which interprets the input and checks whether the given vector is an event. The second is (b) to (f), the unit program for net D. The third is (h) to (l), the unit programs for the individual gates. The fourth is the scheduler that, in the compiled program of (g) to (l) (compiled simulation), manages and runs the unit programs for the gates and nets. Net E of the circuit has a structure similar to the second part, and the complete simulator contains unit programs for nets E and O in addition to net D. As explained in the description of the prior art above, the compiled method analyzes the circuit, represents it in the internal structures of the simulator, interprets it, and generates optimized C code to verify the drawing.

In the example circuit, gates G1 and G2 are executed for every input vector. The output of gate G2 then changes the value of D; step (c) checks whether this is an event, and if it is, the fanout gate G3 is stored for the gate execution part, which performs the logic function in step (h).

The main components of the simulator are the timing wheel, which can store events, and the gate queue. The timing wheel stores the values of the input terminals (A, B, C) and, after a gate has been simulated, the values of the gate output terminals (nets D, E, O). To save memory in the simulator, the size of the timing wheel is set to the largest gate delay in the circuit plus 1, and each index of the timing wheel represents one unit of time.

FIGS. 27 and 28 show the components of the simulator that runs this algorithm. The hatched portion marks the current time. When the current time exceeds the size of the wheel, the MOD operator is used to wrap around to the first slot of the timing wheel. In the example, the wheel position for the current time is "current time MOD 4" (the wheel size).

In FIG. 27, following the change of input C of FIG. 26, gate G2, which takes C as an input, is stored in the gate queue for use in step 2, and the current event C is removed. When there are no more events at the current time 0, the simulator switches to the gate verification phase. This is the condition for moving to step 2, in which any gate stored in the gate queue is simulated. Gate G2 is executed, and the value 1 of its output, net E, is stored at the wheel position given by the current time 0 plus the gate delay of 1, to be processed in the next net verification phase. As shown in step 3, net E is stored at time 1 of the wheel; the current value of the net and the gate simulation value are stored with it. When no gates remain, the current time is incremented and step 3, a net verification phase, is executed. In step 3 the net values are compared as in step 1, and if they differ, gate G3, which takes this net as an input, is stored in the gate queue. Processing then continues in the same way as in steps 1 and 2.

In FIG. 28, when no nets or gates are stored, as in steps 5 and 6, the net and gate simulation phases are simply repeated and only the current time is incremented. When step 9 has run and no events or gates remain, the simulation of the input given in step 1 of FIG. 27 ends.

In this way, current logic simulation repeats the net verification phase and the gate execution phase; when nothing remains in either queue, the simulation of one input vector ends and the required output is produced.
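The repeated net verification and gate execution phases can be put together in a compact Python sketch. It is an illustration under assumptions, not the simulator of FIGS. 26 to 28: the gate delays and the AND function of G3 follow the description, but the functions of G1 and G2, the connectivity, and all identifiers (`simulate_multi_delay`, `GATES_MD`, `FANOUT_MD`) are hypothetical. The wheel has (largest delay + 1) slots and every gate result is scheduled, with the old and new values compared only when the event is processed.

```python
def simulate_multi_delay(gates, fanout, values, input_changes):
    """Event-driven multi-delay simulation of one input vector.

    gates:  gate name -> (function, input nets, output net, delay)
    fanout: net name  -> gates that read the net
    values: net name  -> current value (updated in place)
    Returns a trace of (time, net, new value) for every real change.
    """
    size = max(delay for _, _, _, delay in gates.values()) + 1
    wheel = [[] for _ in range(size)]
    wheel[0] = list(input_changes.items())
    pending = len(wheel[0])
    time, trace = 0, []
    while pending:
        # Net verification phase: apply events due at the current time.
        gate_queue = []
        slot = wheel[time % size]
        for net, new_value in slot:
            if new_value != values[net]:        # deferred comparison
                values[net] = new_value
                trace.append((time, net, new_value))
                for gate in fanout.get(net, ()):
                    if gate not in gate_queue:
                        gate_queue.append(gate)
        pending -= len(slot)
        slot.clear()
        # Gate execution phase: schedule each result at time + delay.
        for gate in gate_queue:
            func, inputs, output, delay = gates[gate]
            result = func(*(values[n] for n in inputs))
            wheel[(time + delay) % size].append((output, result))
            pending += 1
        time += 1
    return trace


# Assumed circuit: D = A OR B (delay 2), E = NOT C (delay 1), O = D AND E (delay 2).
GATES_MD = {
    "G1": (lambda a, b: a | b, ("A", "B"), "D", 2),
    "G2": (lambda c: 1 - c, ("C",), "E", 1),
    "G3": (lambda d, e: d & e, ("D", "E"), "O", 2),
}
FANOUT_MD = {"A": ["G1"], "B": ["G1"], "C": ["G2"], "D": ["G3"], "E": ["G3"]}

state = {"A": 0, "B": 0, "C": 1, "D": 0, "E": 0, "O": 0}
trace = simulate_multi_delay(GATES_MD, FANOUT_MD, state, {"A": 1, "C": 0})
print(trace)
# [(0, 'A', 1), (0, 'C', 0), (1, 'E', 1), (2, 'D', 1), (4, 'O', 1)]
```

In this run G3 is evaluated twice, at t=1 and t=2. The first result (0) equals the current value of net O when it comes due at t=3 and is dropped, and the second result changes O to 1 at t=4.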

상기한 바와 같이 여러 방법을 통하여 좀 더 빠르고 정확한 시뮬레이션을 위한 방법들이 고안되어 사용되고 있으나 이는 대부분 소프트웨어적인 프로그램의 분석에 의한 방법으로, 분석을 위해 여전히 많은 시간이 소요되고 있다.As described above, methods for faster and more accurate simulation have been devised and used through various methods, but most of them are by analysis of software programs, which still takes a long time for analysis.

본 발명은 상기한 종래 기술의 문제점을 해결하기 위하여 안출된 것으로서, 반도체 회로 도면의 기능 검증을 위하여 하드웨어적인 방법과 이에 최적화된 소프트웨어적인 방법을 병용함으로써 종래의 기술에 비하여 매우 빠른 검증이 가능토록 하는데 있다.The present invention has been made to solve the above problems of the prior art, by using a hardware method and an optimized software method for the functional verification of the semiconductor circuit drawings to enable a very quick verification compared to the prior art. have.

To achieve the above object, the high-speed parallel simulation method for design verification of a semiconductor device according to the present invention verifies a circuit more quickly by applying hardware parallel processing together with software improved to match it.

Hereinafter, the most preferred embodiment of the present invention is described in enough detail that a person of ordinary skill in the art to which the invention pertains can readily practice its technical idea.

An event-driven program for logic verification of a semiconductor circuit design consists of a main program that manages input/output processing, execution of the event processing program, execution of the simulation program, the simulation time and its advancement, and termination of the simulation. In particular, the main program manages the timing wheel that schedules events and gate simulations, and the timing wheel and scheduling dominate simulation performance. This is because the scheduling operations, together with the preparation work needed before the gate simulation program and the event processing program can run, outnumber the actual executions of gate simulation and event processing.

The PC-set (Potential Change Set) algorithm applied in the present invention uses a levelization technique and static scheduling so that it can be applied to both multi-delay and unit-delay simulation. The key technique required for static scheduling is determining the simulation times.

Simulation times are determined in a manner similar to level assignment. The purpose is to extract, for each net, the times at which it may need to be simulated within the circuit.

As with level assignment, the PC-set calculation starts from the input terminals. Because an input can change (an event can occur) only at t=0, the PC-set of each primary input is {0}. The multi-delay model includes the unit-delay model (delta = 1) as a special case, so each gate delay is written as D. If an input of a gate changes at time t, the output may change at time t+D. Conversely, if no input changes at time t, the output keeps the same value at t+D. Computing the PC-set of a gate output therefore requires the PC-sets of the gate's inputs.

The PC-sets of the gates and nets in the circuit are determined in the manner shown in FIG. 29. For a wired-OR connection, or for a net driven by more than one source, the PC-set is calculated from the PC-sets of both the gates and the nets. If a net is driven by two or more gates, its PC-set is the set-theoretic union of the PC-sets of those gates.
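The calculation just described can be sketched as follows. The function name `pc_sets`, the tuple layout of `gates`, and the small example circuit are assumptions made for illustration; the gates are assumed to be listed in level order so that every input PC-set is known before it is used.

```python
def pc_sets(primary_inputs, gates):
    """Compute the PC-set of every net.

    gates: list of (output net, input nets, delay D) in level order.
    """
    pc = {net: {0} for net in primary_inputs}     # inputs can change only at t=0
    for out, ins, delay in gates:
        # An input change at time t may change the output at time t + D.
        gate_pc = {t + delay for net in ins for t in pc[net]}
        # A net driven by several gates takes the union of their PC-sets.
        pc[out] = pc.get(out, set()) | gate_pc
    return pc

# Hypothetical unit-delay circuit: X = G1(A, B), Y = G2(X, C).
unit = pc_sets(["A", "B", "C"], [("X", ["A", "B"], 1), ("Y", ["X", "C"], 1)])

# Hypothetical net W driven by two gates with delays 1 and 3.
wired = pc_sets(["A", "B"], [("W", ["A"], 1), ("W", ["B"], 3)])
```

In the first circuit, Y can change at t=1 (through C) or at t=2 (through X), so its PC-set holds both times.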

FIG. 30 is an example of this method; the PC-sets of the gates are used to generate the simulation program.

Here, the PC-sets of the gates and nets serve as elements for dynamic simulation, but additional work is required to cover every time at which a gate's input nets may change.

As the example of FIG. 31 shows, the gate's PC-set contains the element 4, which means the gate must be simulated at t=4. For that simulation the gate uses the value of one input net at t=3. The gate's other input net, however, does not need a new value until t=7, so its value for the t=4 simulation must be taken from the previous input vector. In such a case a "zero insertion" tag is used to mark the net as one that preserves its previous value, and the element 0 is added to that net's PC-set.

FIG. 33 shows an example of PC-sets and of the "0" insertion correction. The element 0 added to a corrected PC-set is not propagated through the circuit. A 0 is inserted into a net only when the net is an input of a gate G that has two or more inputs and the smallest element of G's PC-set is not larger than the smallest element of that net's PC-set, so that the net has no new value available for the gate's first evaluation.

FIG. 32 shows the algorithm of the PC-set "0" insertion rule that implements this method.
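The sketch below is one possible reading of zero insertion, based on the FIG. 31 example rather than on the algorithm of FIG. 32, which is not reproduced in the text. A net is tagged when it has no value old enough for the gate's first evaluation. The test is written for a general gate delay D; with unit delay it reduces to comparing the smallest elements of the two PC-sets. The function name `insert_zeros` and the data layout are assumptions.

```python
def insert_zeros(pc, gates):
    """Return (corrected PC-sets, tagged nets) after "0" insertion.

    pc:    net -> PC-set before correction
    gates: list of (output net, input nets, delay D)
    """
    fixed = {net: set(times) for net, times in pc.items()}
    tagged = set()
    for out, ins, delay in gates:
        if len(ins) < 2:
            continue                      # only gates with two or more inputs
        first_eval = min(t + delay for net in ins for t in pc[net])
        for net in ins:
            # No element at or before (first evaluation - D): the gate must
            # read the value left over from the previous input vector.
            if min(pc[net]) > first_eval - delay:
                fixed[net].add(0)
                tagged.add(net)
    return fixed, tagged

# Values modeled on the FIG. 31 discussion: one input changes at t=3,
# the other at t=7, and the gate (delay 1) is first simulated at t=4.
pc = {"X": {3}, "Y": {7}, "Z": {4, 8}}
fixed, tagged = insert_zeros(pc, [("Z", ["X", "Y"], 1)])
```

The uncorrected sets in `pc` are used for every test, so an inserted 0 never propagates to later gates.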

Generating simulation code from the PC-sets begins with creating variables for the net values. A set of variables is created for each net, including the variables required by the "0" insertion rule.

FIG. 34 shows an example of the generated variables. Each variable name has the form N_k, where N is the net name and k is an element of the net's PC-set.

The variables N_k are used, together with the gates' PC-sets, to generate the simulation program. Gates are processed in level order, and one simulation statement is generated for each element of a gate's PC-set. Each statement reads the values of the gate's input nets. For each input net, the variable chosen is the one for the largest PC-set element that precedes the time being simulated (t=3 for the t=4 statement in the FIG. 31 example). The "0" insertion guarantees that such an element always exists.

FIG. 35 shows the algorithm for generating the simulation code. The algorithm assumes that every gate is a two-input AND gate only to keep the explanation short; other gate types can be handled by a similar procedure.
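As a rough illustration of this code generation step for two-input AND gates, the sketch below emits one statement per PC-set element, using variables of the form N_k. It is not the algorithm of FIG. 35: the function name `generate_and_code`, the statement syntax, and the rule of reading the latest input element at or before the simulated time minus the gate delay are assumptions.

```python
def generate_and_code(pc, gates):
    """Emit one statement per PC-set element of each two-input AND gate.

    pc:    net -> PC-set after "0" insertion
    gates: list of (output net, (input a, input b), delay) in level order
    """
    lines = []
    for out, (a, b), delay in gates:
        # An inserted 0 on the output net is not a simulation time.
        for t in sorted(k for k in pc[out] if k > 0):
            # Latest value of each input that can affect the output at time t.
            ka = max(k for k in pc[a] if k <= t - delay)
            kb = max(k for k in pc[b] if k <= t - delay)
            lines.append(f"{out}_{t} = {a}_{ka} & {b}_{kb}")
    return lines

# PC-sets modeled on the FIG. 31 discussion, with 0 inserted on net Y.
pc = {"X": {3}, "Y": {0, 7}, "Z": {4, 8}}
code = generate_and_code(pc, [("Z", ("X", "Y"), 1)])
```

For the t=4 statement the second operand is `Y_0`, the variable that holds the value preserved from the previous input vector.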

Although the PC-set supports the multi-delay model, its structure and theory are suited mainly to the unit-delay model. As explained in the description of the prior art, the unit-delay model represents circuit behavior better than the zero-delay model, but it is still inadequate for verifying complex circuit states or exact functionality. The multi-delay model is therefore used to verify complex circuit states, and an improved form of the PC-set, the EPC-set (Enhanced or Extended PC-set), is applied.

The EPC-set algorithm represents each gate by a set of ordered triples. The first element has the same meaning as in the PC-set, and the second and third elements are the minimum and maximum path delays of the signals reaching the gate. The PC-set element value here is the length of the signal path from a primary input to the gate or net, that is, the number of gates on the path.

FIG. 36 is an example of EPC-sets.

The EPC-set algorithm starts by assigning the value {(0,0,0)} to each primary input. As with the PC-set, the EPC-set of a gate is determined once the EPC-sets of all of its input nets have been defined. The gate's EPC-set is first calculated as the union of the EPC-sets of its input nets. If several triples share the same first element, only one is kept and the rest are removed: its second element is the minimum of the second elements of those triples, and its third element is the maximum of their third elements. Once all duplicated first elements have been merged, the first element of each triple is increased by 1 and the second and third elements are increased by the gate delay, which yields the EPC-set of the gate.

Table 1 describes, step by step, the algorithm for obtaining the EPC-sets of the example of FIG. 36.

Table 1. EPC-set calculation steps
Step 1 (primary inputs): A - {(0,0,0)}, B - {(0,0,0)}, C - {(0,0,0)}, D - {(0,0,0)}, E - {(0,0,0)}
Step 2 (gates): G1 - {(1,1,1)}, G2 - {(1,2,2)}, G3 - {(1,3,3)}
Step 2 (nets): I1 - {(1,1,1)}, I2 - {(1,2,2)}, I3 - {(1,3,3)}
Step 3 (gate): G4 - {(1,1,1), (1,2,2), (1,3,3)}, then G4 - {(1,1,3)}, then G4 - {(2,5,7)}
Step 3 (net): Q - {(2,5,7)}
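The merge-then-increment rule can be checked against Table 1 with a short sketch. The function name `gate_epc` is an assumption, as is the wiring of the primary inputs to G1, G2, and G3, which the text does not give. The delays 1, 2, 3, and 4 for G1 through G4 are inferred from the values in Table 1.

```python
def gate_epc(input_epcs, delay):
    """EPC-set of a gate from the EPC-sets of its input nets.

    Each EPC-set is a set of (pc, min_delay, max_delay) triples.
    """
    merged = {}
    for epc in input_epcs:
        for pc, lo, hi in epc:
            if pc in merged:
                old_lo, old_hi = merged[pc]
                # Same first element: keep one triple with min and max delay.
                merged[pc] = (min(lo, old_lo), max(hi, old_hi))
            else:
                merged[pc] = (lo, hi)
    # One more gate on the path; both delays grow by the gate delay.
    return {(pc + 1, lo + delay, hi + delay) for pc, (lo, hi) in merged.items()}

PI = {(0, 0, 0)}
i1 = gate_epc([PI, PI], 1)       # G1, assumed to read two primary inputs
i2 = gate_epc([PI, PI], 2)       # G2, assumed to read two primary inputs
i3 = gate_epc([PI], 3)           # G3, assumed to read one primary input
q = gate_epc([i1, i2, i3], 4)    # G4 reads I1, I2, I3
```

For G4 the union {(1,1,1), (1,2,2), (1,3,3)} collapses to {(1,1,3)} before the increment, which matches the intermediate G4 entries of Table 1.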

After all EPC-sets have been calculated, the required bit-field width is calculated using Equation 1 below.

[Equation 1]

W_i = M_i - m_i + 1

Here, i is a PC-set value, M_i is the maximum delay length, and m_i is the minimum delay length.

The largest of the W values produced by this calculation becomes the bit-field size used for simulation. Because a fixed bit-field size is simple and easy to process, every net and gate uses this bit-field size.

Along with the bit-field size, net alignment is a basic requirement of the simulation, because alignment is closely tied to time. When an event on a net is processed, the EPC-set value is used relative to the simulator's current time. Nets are therefore aligned according to the minimum delay value (d) of the EPC-set, and the new value of the net is stored in the bit-field.

This procedure has a problem, however. If the aligned time of a net is later than the current time, a past value would have to be produced by a shift operation. Because past information cannot be generated, the EPC-set algorithm adds a new element k, where each k value is defined by Equation 2 below.

[Equation 2]

k_i = \min(\{ m_j \mid j \ge i \})

Here the k_i values increase with the PC-set value. A similar problem arises when i > j and M_i < M_j: an event that occurs at time j may be lost from the bit-field at time i, because a left shift discards the information for j (the left-shift problem). To prevent this, the EPC-set algorithm adds another new element K_i, expressed by Equation 3.

[Equation 3]

K_i = \max(\{ M_j \mid j \le i \})

The calculation and handling of the k and K values can be readily understood from Table 2 below. Each row shows one EPC-set entry with its k and K values added.

Table 2. Example of an EPC-set
PC   m    M    k    K
1    2    2    2    2
2    7    9    5    9
4    5    6    5    9
8    6    10   6    10
9    8    9    8    10
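The quantities W, k, and K can be recomputed from the PC, m, and M columns of Table 2. In the sketch below, k_i is taken as the minimum of m_j over j >= i and K_i as the maximum of M_j over j <= i. This index convention is the one that reproduces the k and K columns of Table 2. The function name `bitfield_params` is an assumption.

```python
def bitfield_params(epc):
    """Return rows (pc, m, M, k, K, W) for an EPC-set of (pc, m, M) triples."""
    rows = sorted(epc)                              # ascending PC value
    out = []
    for idx, (pc, m, M) in enumerate(rows):
        k = min(r[1] for r in rows[idx:])           # smallest m from here on
        K = max(r[2] for r in rows[:idx + 1])       # largest M up to here
        out.append((pc, m, M, k, K, M - m + 1))     # W = M - m + 1
    return out

table2 = {(1, 2, 2), (2, 7, 9), (4, 5, 6), (8, 6, 10), (9, 8, 9)}
params = bitfield_params(table2)
k_column = [row[3] for row in params]
K_column = [row[4] for row in params]
widths = [row[5] for row in params]
bitfield_size = max(widths)
```

With this convention k never decreases and K never decreases as the PC value grows, which is the property the alignment procedure relies on.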

When the nets and gates of the circuit are simulated, a bit-field of fixed size is used, and time and net values are represented by means of shift operations. When a gate is simulated, the bit-fields of its input nets must be aligned to the value k_i. This alignment is established by the "path-tracing" algorithm, which eliminates left shifts.

The path-tracing algorithm starts from the gates connected to the circuit outputs and works toward the inputs. For each PC value i (the first element of an EPC-set entry), it picks out, among the k values, the one that belongs to the smaller PC value i-1. To avoid a left shift, that k value must be less than or equal to k_i - d, where d is the gate delay. If it is not, the EPC-set element is set to k_i - d so that the condition holds. In addition, to keep the bit-field from growing, the K value is reduced by the same amount. This may appear to lose the values of the upper bits, but it does not: because all bit-fields have the same size, and because the m and M values of the EPC-set are propagated from the gate's input nets, the new K_i can still hold every event on the input net.

FIG. 37 shows an example of the path-tracing algorithm, which proceeds from output Q using the k and K values. Output net Q and gate G4 are aligned to t=5 (see Table 1), and the input nets of gate G4 are aligned to t=1, the result of subtracting the delay from the k value of G4. The k values of nets I1 and I2 are unchanged, but that of I3 must be changed to avoid a left shift, so the k value of I3 becomes 1. This change propagates to input E, the input net of G3, and changes the k value of E to -2.

Adjusting the k value for a particular time i is complicated. If, at t=i, one EPC-set of a gate is changed while its other input nets have different EPC-set values, a value equal to or smaller than i must be found among the values of those other nets. Moreover, when a k value in an EPC-set is changed, the elements for earlier times must also be changed so that the k values remain non-decreasing. The simulation algorithm using the EPC-set can therefore be expressed as follows. The programs that read inputs and write outputs are the same as in event-driven simulation, so only the changed parts are shown.

Gate_simulation Program:

1. Locate EPC-set at time = i

2. A = k value of EPC-set

3. For all input nets N: if alignment != A, Align N = A

4. R = Simulation function of all inputs

5. Scheduling Event with R

Process_Event Program:

1. Locate EPC-set at time = i of net N

2. A = k value of EPC-set

3. Align N = A

4. Event = Compare N with R

5. If event = 0, discard; else copy N = R

6. Scheduling the fan-out Gate

Table 3 shows the results of MDP (Multi-delay Parallel) simulation using the EPC-set and path tracing described above.

Table 3. Comparison of results
Circuit   Existing method (interpretive), time   MDP simulation, time   MDP bit-field size   Speedup
C432      30.0 s                                 14.1 s                 8                    2.1x
C499      30.1 s                                 14.6 s                 3                    2.1x
C880      50.2 s                                 32.9 s                 8                    1.5x
C1355     107.0 s                                62.0 s                 4                    1.7x
C1908     206.2 s                                115.9 s                13                   1.8x
C2670     222.0 s                                139.2 s                11                   1.6x
C3540     373.0 s                                208.3 s                18                   1.8x
C5315     595.4 s                                367.2 s                12                   1.6x
C6288     5666.4 s                               2663.5 s               5                    2.1x
C7552     962.7 s                                573.1 s                14                   1.7x

The simulations were run on a SUN workstation, and the example circuits were the ISCAS-85 combinational benchmark circuits provided by the IEEE. These circuits have been used as benchmarks for more than a decade and are therefore well suited to comparing performance among researchers and developers.

The simulation method based on the PC-set and the "0" insertion rule is an oblivious simulation, that is, one that uses the static scheduling described above rather than event-driven dynamic scheduling. Its greatest advantage is that several input vectors (PI values) can be packed into one word and simulated at the same time. If successive input vectors were unrelated, this ordinary packing would be ideal. In the multi-delay and unit-delay models, however, the relationship between consecutive input vectors must be taken into account. Static and dynamic hazards can be detected only by analyzing consecutive input vectors, so the relationship between them must be preserved.

FIG. 38 illustrates this problem. Assume that 16 input vectors (A, B, C, ..., N, O, P) are applied to the circuit in that order and, to keep the example simple, that each word is 4 bits long. With ordinary packing, four input vectors form one word, and the vector that precedes F is then B rather than E, so the relationship between consecutive input vectors cannot be maintained. Because such ordinary, horizontal packing, as with the example circuit of FIG. 37, cannot preserve the relationship between input vectors, the vertical packing shown in FIG. 38 is used instead.

A problem still remains, however: the relationship between input vectors D and E, that is, the continuity of the pairs (D, E), (H, I), and (L, M), cannot be preserved.

FIG. 39 shows how the continuity problem of FIG. 38 is solved by adding an initial input vector. In this packing scheme, an initial vector that maintains the relationship between input vectors is generated and simulated first, and the actual input vectors are simulated afterward.

Because the word size of the system that runs the actual simulation is 32 bits, one additional input vector is generated for every 32 input vectors. By packing the input vectors as in FIG. 39, the simulation can therefore run about 32 times faster.
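One way to picture the vertical packing with an added initial vector is the symbolic sketch below, which uses the 16 vectors and 4-bit words of the example. Each list stands for one packed simulation step and each position in the list for one bit lane. The function name `pack_vertical`, the placeholder `"init"` for the vector that precedes A, and the exact lane layout are assumptions, since FIG. 38 and FIG. 39 are not reproduced in the text.

```python
def pack_vertical(vectors, lanes, initial):
    """Pack a vector sequence so that each bit lane sees consecutive vectors.

    Assumes len(vectors) is a multiple of `lanes`.
    """
    run = len(vectors) // lanes
    runs = [vectors[i * run:(i + 1) * run] for i in range(lanes)]
    # Extra first word: each lane starts from the vector preceding its run.
    first = [initial] + [runs[i - 1][-1] for i in range(1, lanes)]
    return [first] + [[r[step] for r in runs] for step in range(run)]

words = pack_vertical(list("ABCDEFGHIJKLMNOP"), lanes=4, initial="init")
lane1 = [word[1] for word in words]     # sequence seen by the second bit lane
```

The second lane now sees D before E, so the pair (D, E) is simulated in order, and the same holds for (H, I) and (L, M) in the other lanes.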

The simulation technique of the present invention described above requires several gate simulation statements for each gate of the design. These additional gate simulations consume a serious amount of time when SoC or large VLSI designs are processed. To reduce this time, a parallel processing method was devised in addition to the simulation method described above.

In this parallel processing method, each bit of a word represents one unit of simulation time: the LSB of the word corresponds to the earliest time and the MSB to the latest time of the simulation, and each bit holds the simulation result for its time.

FIG. 40 shows the structure of this word. Each bit position represents a time: the LSB (bit 0) corresponds to time t, the next position to t+1, and so on. Simulation time is expressed in terms of the bit-field size rather than the word size, and the bit-field size (M+1) is determined by analyzing the circuit. Here M, obtained from the circuit analysis, expresses in bits the per-time net values that each net must hold; it is the number of bits needed to simulate one input vector.

FIG. 41 shows an example of parallel gate simulation: an 8-bit AND operation with its inputs and output. For a two-input AND gate, the values of the two input nets are defined from t=0, and after the gate delay (d=1) the simulation result appears on the output net starting at t=1. Each bit holds the AND result for its time. The bit-field must therefore be realigned after the simulation: after the bitwise AND, all bits are shifted left so that the start time is again t=0. Because t=0 is the time at which the current input vector is applied, the value placed in the LSB by the shift is the result for the previous input vector, which corresponds to t=-1.

For this reason, the bit-fields of all nets in the circuit must be initialized each time a simulation is run: the value from the previous input is stored at the t=0 position, and all other positions are initialized to the net value "0".
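The bitwise AND followed by the realigning shift can be sketched as below, with bit i of a field holding the net value at time t=i. The function name `and_gate` and its parameters are assumptions. The previous-vector result is filled into the low-order bits that precede the gate's response.

```python
def and_gate(a, b, prev_out, delay=1, width=8):
    """Bit-parallel AND: bit i of each field is the net value at time i.

    prev_out is the output value left by the previous input vector.
    """
    mask = (1 << width) - 1
    # The AND of the inputs at time i appears on the output at time i + delay.
    shifted = ((a & b) << delay) & mask
    # Times earlier than `delay` still show the previous vector's result.
    fill = (1 << delay) - 1 if prev_out else 0
    return shifted | fill

rising = and_gate(0b11111111, 0b11111111, prev_out=0)    # output was 0 before
falling = and_gate(0b11111111, 0b00000000, prev_out=1)   # output was 1 before
late = and_gate(0b11111110, 0b11111111, prev_out=0)      # input a rises at t=1
```

A single word operation thus evaluates the gate for all eight time steps at once, which is where the parallelism of the method comes from.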

FIG. 42 shows the simulation with the bit-field adjusted by this left shift.

One gate simulation program is generated for each gate, in level order. Before the gate simulation programs run, the input ports and the net values are initialized. For a primary input (PI), the input vector value is replicated to every bit position. For an internal net, the final value of the net from the simulation of the previous input vector is placed in the LSB. That is, internal nets and output nets are initialized by moving the MSB of the bit-field to the LSB with a shift operation, because the MSB holds the final value of the simulation.

Input nets are handled differently. The value of an input net is set at t=0 and does not change during the simulation of that input vector, so every bit of its bit-field holds the same value.
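The two initialization rules can be written as a pair of small helpers. The function names `init_primary_input` and `init_internal_net` are assumptions, and the 3-bit width in the example is chosen only for illustration.

```python
def init_primary_input(value, width):
    """Replicate the input value to every bit position of the field."""
    return (1 << width) - 1 if value else 0

def init_internal_net(prev_field, width):
    """Move the MSB (final value of the previous run) to the LSB, clear the rest."""
    return (prev_field >> (width - 1)) & 1

pi_field = init_primary_input(1, 3)          # input held at 1 for the whole run
net_high = init_internal_net(0b100, 3)       # previous run ended with the net at 1
net_low = init_internal_net(0b011, 3)        # previous run ended with the net at 0
```

After initialization, bit 0 of every internal net carries the state left by the previous input vector, so hazards between consecutive vectors remain visible.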

FIG. 43 shows a simple example of simulation using the parallel processing method. To explain its operation, assume that the circuit has already been simulated with the input vector A=0, B=1, C=1 and is now to be simulated with the new input vector (A=1, B=1, C=0). The value "M" for this circuit is 2, so a bit-field of 3 bits is sufficient. A larger bit-field adds nothing, because all bits beyond the third hold the same value.

FIG. 44 shows the initial net values and, for the AND gate at the first stage of the example circuit, how the net values change with bit position, together with the simulation results. The simulation result of the OR gate also reveals a static hazard.

FIG. 45 shows the simulation code generated for the example circuit. The generated code reads the input values and stores each input in the low-order position of the corresponding variable.

In the above method, eliminating the shift operations further improves simulation performance; the EPC-set and path-tracing algorithms can be used to align the bit-fields and remove the left-shift problem.

The hardware implementation of the present invention, together with the software for its optimal operation, consists of four basic functional blocks: an input test hardware unit, a gate simulation hardware unit, an event processing hardware unit, and an initialization unit.

FIG. 46 shows the overall structure of this hardware accelerator, whose internal operation uses the parallel processing technique (MDP method) described above.

FIG. 47 shows the internal data structure of the hardware accelerator. To operate, the accelerator must be given basic information about the gates, the nets, and the connections between them. This basic information is represented using descriptors. Before simulation starts, the host computer analyzes the semiconductor schematic, generates the descriptors from the result, and stores them in the memory of the hardware accelerator. One descriptor is generated for each gate and each net in the schematic.
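
As a rough illustration of one descriptor per gate and per net, the following Python sketch builds descriptor tables from a small netlist. The field names, the netlist format, and `build_descriptors` are hypothetical; the actual descriptor layout of FIG. 47 is not given in the text.

```python
from dataclasses import dataclass, field

@dataclass
class NetDescriptor:
    name: str
    value: int = 0
    fanout: list = field(default_factory=list)  # indices of gates this net drives

@dataclass
class GateDescriptor:
    kind: str        # e.g. "AND", "OR"
    inputs: list     # indices of the input net descriptors
    output: int      # index of the output net descriptor

def build_descriptors(netlist):
    """netlist: list of (kind, [input net names], output net name)."""
    nets, net_index, gates = [], {}, []

    def net_id(name):
        # Create a net descriptor the first time a net name is seen.
        if name not in net_index:
            net_index[name] = len(nets)
            nets.append(NetDescriptor(name))
        return net_index[name]

    for kind, ins, out in netlist:
        gate_id = len(gates)
        in_ids = [net_id(n) for n in ins]
        for i in in_ids:
            nets[i].fanout.append(gate_id)   # record the connection info
        gates.append(GateDescriptor(kind, in_ids, net_id(out)))
    return gates, nets

gates, nets = build_descriptors([
    ("AND", ["A", "B"], "D"),
    ("OR", ["D", "C"], "E"),
])
print(len(gates), len(nets))
```

Storing indices rather than object references loosely mirrors descriptors that point to each other in accelerator memory, although that correspondence is an assumption here.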

The hardware accelerator and the host computer interface through memory-based queue operations, which carry the data exchange and the start and end of simulation. The hardware accelerator and the host computer communicate through a host queue, while a separate universal queue is used inside the accelerator.

The host computer passes the net values of the input terminals to the accelerator through the host queue.

FIG. 48 shows the data transfer operation of the host queue. Each host queue entry consists of three elements: the previous value of the input terminal, its current value, and a pointer to the net descriptor. After the host computer stores the data, the data is passed to the hardware accelerator and the simulation is performed. "Qstart" and "Qend" designate the start and end of the information being transferred.
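
The three-element host queue entry and the "Qstart"/"Qend" markers could be modelled as in the sketch below. Only the three fields and the two marker names come from the text; the class, its methods, the fixed-capacity array, and the use of plain integers as net descriptor pointers are assumptions for illustration.

```python
from collections import namedtuple

# One host queue entry: previous value, current value, net descriptor pointer.
HostEntry = namedtuple("HostEntry", "previous current net_ptr")

class HostQueue:
    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.qstart = 0   # first entry not yet consumed by the accelerator
        self.qend = 0     # one past the last entry written by the host

    def host_write(self, previous, current, net_ptr):
        if self.qend >= len(self.slots):
            raise OverflowError("host queue is full")
        self.slots[self.qend] = HostEntry(previous, current, net_ptr)
        self.qend += 1

    def accelerator_read(self):
        """Return the next pending entry, or None when the queue is drained."""
        if self.qstart == self.qend:
            return None
        entry = self.slots[self.qstart]
        self.qstart += 1
        return entry

queue = HostQueue(capacity=8)
old, new = (0, 1, 1), (1, 1, 0)          # input vectors for A, B, C
for ptr, (prev, cur) in enumerate(zip(old, new)):
    queue.host_write(prev, cur, ptr)

changed = []
while (entry := queue.accelerator_read()) is not None:
    if entry.previous != entry.current:  # a simple comparison finds the events
        changed.append(entry.net_ptr)
print(changed)
```

Carrying both the previous and the current value lets the receiving side detect input events with a plain comparison, without looking up the old value elsewhere.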

FIG. 49 shows the operation between the universal queue of the hardware accelerator and its internal devices. The universal queue operates much like the host queue: the pointers "Qgate" and "Qnet" mark the start and end of the information in the queue, which serves as the mechanism for passing information between the gate simulation device and the event processing device. Net events produced by gate simulation are passed from the gate simulation device to the event processing device (net processing device) through this queue.

In the same way, the event processing device uses the queue to pass the fanout gates that arise from processing a net event to the gate simulation device. When no data remains in the universal queue there is nothing left to process, so the accelerator performs the termination procedure and raises an interrupt to the host computer.
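
The exchange between the gate simulation device and the event processing device, ending when the universal queue runs empty, can be sketched as a simple event loop. This is a zero-delay software model under assumed names (`run`, `GATE_FUNCS`, the dictionary-based netlist); it does not model the "Qgate" and "Qnet" pointers, the bitfields, or the hardware timing.

```python
from collections import deque

GATE_FUNCS = {
    "AND": lambda xs: int(all(xs)),
    "OR": lambda xs: int(any(xs)),
}

def run(gates, fanout, values, input_changes):
    """gates: {gate: (kind, [input nets], output net)}
    fanout: {net: [gates driven by the net]}"""
    queue = deque(("net", net, val) for net, val in input_changes.items())
    processed = 0
    while queue:                       # an empty queue ends the simulation
        tag, name, payload = queue.popleft()
        processed += 1
        if tag == "net":               # role of the event processing device
            if values[name] != payload:
                values[name] = payload
                for gate in fanout.get(name, []):
                    queue.append(("gate", gate, None))
        else:                          # role of the gate simulation device
            kind, ins, out = gates[name]
            new = GATE_FUNCS[kind]([values[i] for i in ins])
            if new != values[out]:
                queue.append(("net", out, new))
    return values, processed           # here the host would be interrupted

gates = {"g1": ("AND", ["A", "B"], "D"), "g2": ("OR", ["D", "C"], "E")}
fanout = {"A": ["g1"], "B": ["g1"], "C": ["g2"], "D": ["g2"]}
values = {"A": 0, "B": 1, "C": 1, "D": 0, "E": 1}
final, processed = run(gates, fanout, values, {"A": 1, "C": 0})
print(final, processed)
```

Termination needs no separate counter in this model: the loop stops exactly when neither device has queued further work, which matches the stop condition described above.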

FIG. 50 shows the basic structure of the gate simulation device, the net verification (event processing) device, and the initialization device within the overall hardware accelerator for logic simulation of FIG. 46. The three devices share the same structure, and their operation differs according to the program in ROM and the structure of the SUBCIRCUIT. In the gate simulation device, the SUBCIRCUIT and FIRMWARE form the circuitry that performs the gate function and stores the result in the universal queue. The net verification (event processing) device processes the net values in the universal queue, processes the gates connected to each net, and stores the event processing results in the universal queue so that the simulation continues. The initialization device has the same structure and uses it, when one simulation ends, to restore the previous net values, alignment values, and initial values so that the next simulation can be performed.

Within the overall structure of the hardware accelerator, the input device takes the input terminal values from the host queue and links them to the universal queue, and it also handles the function that the event processing device would otherwise perform first. Because its circuitry is less complex than that of the other three devices, it consists of simple comparators and registers.

As described above, the method for analyzing the voltage drop and power consumption values of a semiconductor device according to the present invention is not limited by the embodiments and drawings disclosed herein. Those skilled in the art may make various modifications within the scope of the technical idea of the invention, and such modifications should be construed as falling within the scope of the present invention.

In the high-speed parallel simulation method for design verification of a semiconductor device described above, applying the hardware accelerator and the optimized software increases the design verification speed. As an example, with the ISCAS-85 benchmark circuit C432.CKT and the hardware accelerator operating at 25 MHz, the accelerator can process 523,000 events per second, which corresponds to 905,000 gates per second. The hardware accelerator was implemented and verified in HDL. Because the net descriptor can be represented in 32 bits, the maximum processing capability is 16,730,000 events per second (29,000,000 gates per second), so every benchmark circuit compared in Table 3 can be processed within one second. This is up to 5,600 times or more faster than verification in software.
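
The throughput figures quoted above can be related to one another with a short calculation. The variable names are illustrative, and the derived ratios are plain arithmetic on the stated numbers rather than additional measurements.

```python
clock_hz = 25_000_000            # accelerator clock from the text
events_per_s = 523_000           # measured on C432.CKT
gates_per_s = 905_000
max_events_per_s = 16_730_000    # stated maximum capability
max_gates_per_s = 29_000_000

cycles_per_event = clock_hz / events_per_s          # clock cycles per event
gates_per_event = gates_per_s / events_per_s        # gate evaluations per event
event_scale = max_events_per_s / events_per_s       # maximum vs. measured
gate_scale = max_gates_per_s / gates_per_s

print(f"{cycles_per_event:.1f} cycles/event, "
      f"{gates_per_event:.2f} gates/event, "
      f"scale {event_scale:.1f}x / {gate_scale:.1f}x")
```

Both maximum figures come out at roughly 32 times the measured C432.CKT figures, which appears consistent with the 32-bit remark in the text, although the text does not spell out that derivation.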

Claims

In the analysis method for design verification of a semiconductor device,

A high-speed parallel simulation method that uses an accelerator designed in hardware and a software analysis method that processes event-based simulation in parallel by applying an EPC-set algorithm and a path-tracing algorithm.

The method of claim 1,

wherein, in the EPC-set algorithm, when entries have the same first element among the values of the input nets, only one is kept and the rest are removed; for the second element, the minimum of the second elements among entries with the same first element is selected; for the third element, the maximum of the third elements among entries with the same first element is selected; and once the duplicated first elements have been processed, the first element of the EPC-set is increased by 1 and the second and third elements are increased by the delay value of the gate, thereby producing the EPC-set value of the corresponding gate.

The method of claim 1,

wherein the path-tracing algorithm proceeds from the gates connected to the output terminals of the circuit toward the input terminals, so that past information is not destroyed by the left shift when the EPC-set algorithm is applied.

The method of claim 1,

wherein the hardware accelerator comprises an input test device, a gate simulation device, an event processing device, and an initialization device.

In the analysis for design verification of a semiconductor device,

A hardware acceleration device comprising an input test unit, a gate simulation unit, an event processing unit, and an initialization unit.

The device of claim 5,

wherein the input test unit takes the values of the input terminals of the circuit under analysis from the host queue, links them to the universal queue, and handles the function that the event processing unit would otherwise perform first.

The device of claim 5,

wherein, in the gate simulation unit, the subcircuit and the firmware perform the gate function and store the result in the universal queue.

The device of claim 5,

wherein the event processing unit keeps the simulation running by processing the net values in the universal queue, processing the gates connected to each net, and storing the event processing results in the universal queue.

The device of claim 5,

wherein the initialization unit restores the previous net values, alignment values, and initial values when one simulation ends, so that the next simulation can be performed.