KR20120040630A

KR20120040630A - Reconfigurable processor and method for processing loop having memory dependency

Info

Publication number: KR20120040630A
Application number: KR1020100109998A
Authority: KR
Inventors: 안희진; 유동훈; 이강웅; 안민욱; 이진석; 김태송; 김원섭
Original assignee: 삼성전자주식회사
Priority date: 2010-10-19
Filing date: 2010-11-05
Publication date: 2012-04-27
Also published as: KR101722695B1

Abstract

PURPOSE: A reconfigurable processor for handling a loop having memory dependency, and a method thereof, are provided that analyze the dependencies between memory access instructions and allocate the instructions to a plurality of processing elements based on the analysis result. CONSTITUTION: A simulation unit (130) simulates the instructions. An extraction unit (140) extracts an execution trace from the simulation result. A scheduler (150) analyzes the memory dependencies between the instructions of the iterations based on the part of the execution trace that corresponds to memory access instructions. The scheduler analyzes the memory dependencies between the instructions of the iterations that fall within a generated iteration window.

Description

RECONFIGURABLE PROCESSOR AND METHOD FOR PROCESSING LOOP HAVING MEMORY DEPENDENCY

The following description relates to a technique for assigning instructions to a plurality of processing elements so that operations are executed correctly when the operations of loop iterations are executed in parallel.

A reconfigurable architecture is an architecture in which the hardware configuration of a computing device can be changed so that it is optimized for each task to be performed.

If a task is processed only in hardware, the fixed function of the hardware makes it difficult to handle the task efficiently once even a small change is made to it. If, on the other hand, a task is processed only in software, the software can be changed to suit the task, but processing is slower than in hardware.

A reconfigurable architecture can provide the advantages of both hardware and software. Such architectures have attracted particular attention in digital signal processing, where the same operation is performed repeatedly.

Digital signal processing, by its nature, generally includes many loop operations in which the same task is repeated. Loop-level parallelism (LLP) is widely used to speed up loop operations, and software pipelining is a representative LLP technique.

Software pipelining relies on the principle that operations belonging to different iterations can be processed simultaneously as long as there is no dependency between the iterations. Software pipelining performs better when combined with a reconfigurable array. For example, operations that can be parallelized can be processed simultaneously in the individual processing units that make up the reconfigurable array.

Recently, there has been a growing need for research into techniques that assign instructions to a plurality of processing elements so that the instructions of a loop having memory dependency are computed correctly when pipelining is performed.

The following description relates to a reconfigurable processor that can reduce erroneous operations by analyzing the dependencies between memory access instructions and assigning the instructions to a plurality of processing elements based on the analysis result.

A reconfigurable processor for processing a loop having memory dependency according to an embodiment of the present invention includes an extraction unit that extracts an execution trace from a simulation result, and a scheduler that analyzes the memory dependencies between instructions included in iterations based on the part of the execution trace that corresponds to memory access instructions.

The reconfigurable processor may further include a simulation unit that simulates the instructions.

The scheduler may generate an iteration window corresponding to the processing time of the instructions included in one iteration, and may analyze the memory dependencies between the instructions included in the iterations that fall within the generated iteration window.

The scheduler may calculate a minimum iteration distance (MII) between the iterations based on the analyzed memory dependencies.

The scheduler may assign the instructions to processing elements while increasing an iteration distance (II) from the calculated MII, based on the analyzed memory dependencies.

The execution trace may include at least one of a register address, a value stored in a register, a memory address, and a value stored in memory.

A method of processing a loop having memory dependency according to an embodiment of the present invention includes extracting an execution trace from a simulation result, and analyzing the memory dependencies between instructions included in iterations based on the part of the execution trace that corresponds to memory access instructions.

The method may further include simulating the instructions.

The analyzing may include generating an iteration window corresponding to the processing time of the instructions included in one iteration, and analyzing the memory dependencies between the instructions included in the iterations that fall within the generated iteration window.

The method may further include calculating a minimum iteration distance (MII) between the iterations based on the analyzed memory dependencies.

The method may further include assigning the instructions to processing elements, taking the analyzed memory dependencies into account, while increasing an iteration distance (II) value from the calculated MII.

The execution trace may include at least one of a register address, a value stored in a register, a memory address, and a value stored in memory.

According to the disclosure, memory dependencies between instructions are extracted from an execution trace obtained through profiling, and the instructions included in the iterations are assigned to a plurality of processing elements based on the extracted memory dependencies. This improves the accuracy of the computation compared with a case in which memory dependencies are not considered.

In addition, analyzing the dependencies between memory access instructions using an iteration window reduces the dependency analysis time.

FIG. 1 is a diagram illustrating a reconfigurable processor according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an iteration window according to an embodiment of the present invention.
FIGS. 3A and 3B are diagrams illustrating the MII.
FIG. 4 is a flowchart illustrating a control method of a reconfigurable processor according to an embodiment of the present invention.

Hereinafter, specific details for carrying out the invention are described with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a reconfigurable processor according to an embodiment of the present invention.

Referring to FIG. 1, the reconfigurable processor 100 includes a reconfigurable array 110, a memory 120, a simulation unit 130, an extraction unit 140, and a scheduler 150.

Hereinafter, an iteration refers to each individual execution of a loop when the loop is executed multiple times. For example, when a loop is executed three times, the first execution of the loop is the first iteration, the second execution is the second iteration, and the third execution is the third iteration. The instructions belonging to the iterations are mapped to different processing elements, and the processing elements operate simultaneously, so the instructions can be processed in parallel. This improves the computation speed.

The reconfigurable processor 100 may operate in a coarse-grained array (CGA) mode, a very long instruction word (VLIW) mode, and the like. For example, the reconfigurable processor 100 may process loop operations in the CGA mode, and general operations or loop operations in the VLIW mode. Loop operations can be performed in the VLIW mode, but less efficiently than in the CGA mode. When a single program is executed, for example, the reconfigurable processor 100 may alternate between the CGA mode and the VLIW mode.

The reconfigurable array 110 includes a register file 111 and a plurality of processing elements (PEs) 112. The hardware configuration of the reconfigurable array 110 can be changed so that an operation is performed optimally. For example, the reconfigurable array 110 may change the connections between the processing elements according to the type of operation.

The register file 111 is used to transfer data between the processing elements 112 and stores various data needed to execute instructions. For example, each processing element 112 can access the register file 111 to read or write the data used when executing an instruction. However, because not all of the processing elements 112 are connected to one another, a particular processing element may have to go through another processing element to access the register file 111.

The processing elements 112 execute the instructions assigned to them. The connections between the processing elements 112 and their order of operation may be changed according to the task to be processed.

The memory 120 may store information needed for processing, such as the instructions and information about the connections between the processing elements 112, as well as the results of processing. For example, the memory 120 may store data to be processed or store processing results. As another example, the memory 120 may store information needed to run the reconfigurable processor 100, connection state information of the reconfigurable array, and information about how the reconfigurable array operates.

The simulation unit 130 may run a simulation by applying the instructions to be executed on the processing elements to a test file. For example, the simulation unit 130 may run a simulation in which the instructions process a test file (for example, an MP3 file or a video file).

The extraction unit 140 may extract an execution trace from the result of the simulation run by the simulation unit 130. This is also referred to as profiling. The execution trace is a chronological record of information about the instructions executed during the simulation, and the recorded data may be the values of variables at the moment each instruction was executed. For example, the execution trace may include register addresses, values stored in registers, memory addresses, and values stored in memory.
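As a minimal sketch of what one record of such an execution trace might look like, the following Python fragment models the four kinds of data named above. The class name `TraceRecord`, its field names, and the helper `memory_records` are hypothetical and chosen only for illustration; the source does not specify a trace format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceRecord:
    """One entry of an execution trace (hypothetical layout, not from the source)."""
    iteration: int                    # loop iteration that executed the instruction
    label: str                        # instruction label within the loop body, e.g. "A"
    opcode: str                       # e.g. "ld_i", "add", "st_i", "sub"
    reg: Optional[str] = None         # register address (name), e.g. "r20"
    reg_value: Optional[int] = None   # value stored in that register
    mem_addr: Optional[int] = None    # memory address accessed, if any
    mem_value: Optional[int] = None   # value stored at that memory address


def memory_records(trace):
    """Keep only the records that belong to memory access instructions."""
    return [record for record in trace if record.mem_addr is not None]


# Records are appended in the order the simulated instructions executed.
trace = [
    TraceRecord(0, "A", "ld_i", reg="r20", mem_addr=0x50),
    TraceRecord(0, "B", "add", reg="r2"),
    TraceRecord(0, "C", "st_i", reg="r8", mem_addr=0x100),
]
```

Filtering with `memory_records(trace)` leaves only the records for A and C, which is the subset a scheduler would need for memory dependency analysis.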

The scheduler 150 may analyze the memory dependencies between instructions based on the part of the execution trace that corresponds to memory access instructions (the 'traced variable values'). If instructions included in the respective iterations access the same memory location, a memory dependency exists between those instructions. In that case, the instructions must be processed serially for the computation to be correct. A memory access instruction is an instruction that stores data to the memory 120 or reads data from the memory 120. For example:

k-th iteration:

A: ld_i r20 <- M[0x50]

B: add r2 <- r4 + r5

C: st_i M[0x100] <- r8

D: sub r1 <- r4 - r5

E: st_i M[0x1000] <- r10

(k+1)-th iteration:

A: ld_i r20 <- M[0x100]

B: add r2 <- r4 + r5

C: st_i M[0x150] <- r8

D: sub r1 <- r4 - r5

E: st_i M[0x1000] <- r10

Here, ld denotes a load instruction, add an addition instruction, st a store instruction, and sub a subtraction instruction. A memory access instruction is an instruction that contains M[].
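The rule that a memory access instruction is one containing M[] can be illustrated with a short Python sketch. It assumes the k-th iteration's instructions are available as plain text in the form `label: opcode operands`; the names `LISTING`, `MEM_OPERAND`, and `parse_memory_accesses` are hypothetical.

```python
import re

# The k-th iteration from the example, written in an assumed plain-text form.
LISTING = """\
A: ld_i r20 <- M[0x50]
B: add r2 <- r4 + r5
C: st_i M[0x100] <- r8
D: sub r1 <- r4 - r5
E: st_i M[0x1000] <- r10
"""

# Matches a memory operand such as M[0x100] and captures the address.
MEM_OPERAND = re.compile(r"M\[(0x[0-9a-fA-F]+)\]")


def parse_memory_accesses(listing):
    """Return (label, kind, address) for every instruction containing M[...]."""
    accesses = []
    for line in listing.splitlines():
        label, _, body = line.partition(":")
        match = MEM_OPERAND.search(body)
        if match is None:
            continue  # add/sub touch registers only, so they are skipped
        opcode = body.split()[0]
        kind = "load" if opcode.startswith("ld") else "store"
        accesses.append((label.strip(), kind, int(match.group(1), 16)))
    return accesses


accesses = parse_memory_accesses(LISTING)
```

Only A, C, and E are returned; B and D operate on registers and are not memory access instructions.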

Register (r) dependencies can be identified easily, simply by comparing register names. Memory dependencies, in contrast, can be identified only by comparing the memory address values held in registers (for example, '0x100' and '0x150'). Memory dependency analysis is therefore relatively more difficult than register dependency analysis.

The execution trace may include the register addresses ('r1, r2, r4, r5, r8, r10, r20'), the values stored in the registers, and the memory addresses, or the values stored at those memory addresses, that correspond to the memory access instructions included in the k-th and (k+1)-th iterations.

The scheduler 150 may analyze the dependencies between instructions based on the trace corresponding to the memory access instructions. For example, when the memory address values of memory access instructions are identical, the scheduler 150 may determine that a memory dependency exists between those memory access instructions. Here, the memory address value '0x100' of instruction C in the k-th iteration is identical to the memory address value '0x100' of instruction A in the (k+1)-th iteration, so the scheduler 150 may determine that instruction C of the k-th iteration and instruction A of the (k+1)-th iteration are dependent. As another example, when the value stored in M[0x100] for instruction C of the k-th iteration and the value stored in M[0x100] for instruction A of the (k+1)-th iteration are identical, the scheduler 150 may determine that the two instructions are dependent. The scheduler 150 may analyze such dependencies based on the simulation results. Although the determination has been described for the k-th and (k+1)-th iterations, the scheduler 150 also determines the memory dependencies between the k-th and (k+2)-th iterations, between the k-th and (k+3)-th iterations, and so on.
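The address comparison described here can be sketched in Python as follows. The tuple layout `(iteration, label, kind, address)` and the function name `find_memory_dependencies` are hypothetical. Skipping pairs of loads is an added assumption that is common in dependency analysis but is not stated in the source.

```python
from itertools import combinations

# (iteration, label, kind, address) for each memory access taken from the trace.
# Iteration 0 stands for the k-th iteration, iteration 1 for the (k+1)-th.
accesses = [
    (0, "A", "load", 0x50),
    (0, "C", "store", 0x100),
    (0, "E", "store", 0x1000),
    (1, "A", "load", 0x100),
    (1, "C", "store", 0x150),
    (1, "E", "store", 0x1000),
]


def find_memory_dependencies(accesses):
    """Pair up accesses from different iterations that touch the same address."""
    deps = []
    # Sorting puts earlier iterations first, so each pair is (earlier, later).
    for earlier, later in combinations(sorted(accesses), 2):
        it_a, label_a, kind_a, addr_a = earlier
        it_b, label_b, kind_b, addr_b = later
        if it_a == it_b or addr_a != addr_b:
            continue
        if kind_a == "load" and kind_b == "load":
            continue  # assumption: two reads of one address need no ordering
        deps.append(((it_a, label_a), (it_b, label_b), it_b - it_a))
    return deps


deps = find_memory_dependencies(accesses)
```

For the example listing this reports the dependency between C of the k-th iteration and A of the (k+1)-th iteration on address 0x100, and also the pair of stores to 0x1000 by instruction E in both iterations. The third element of each result is the distance in iterations between the two accesses.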

The scheduler 150 may calculate a minimum iteration distance (MII) based on the memory dependencies.

The scheduler 150 may assign the instructions to the processing elements, taking the analyzed memory dependencies into account, while increasing the II value from the calculated MII. For example, the scheduler 150 may increase the iteration distance (II) by 1 at a time, starting from the MII. This is also referred to as a trial-and-error method. The trial-and-error method is only one embodiment, however, and other methods of determining the II value from the calculated MII may be used.
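The trial-and-error search can be sketched in Python under a deliberately simplified model. The model assumes that each instruction has a fixed start cycle within its iteration, that every instruction takes one cycle, and that any processing element can run any instruction. None of these assumptions, nor the names `fits` and `search_ii`, come from the source.

```python
def fits(ii, offsets, deps, num_pes):
    """Check one candidate II against dependencies and PE count (toy model)."""
    # Dependency check: the dependent instruction must start after its producer.
    for src, dst, distance in deps:
        if offsets[dst] + distance * ii <= offsets[src]:
            return False
    # Resource check: instructions sharing a slot (cycle mod II) run in the
    # same cycle in steady state, so each slot needs at most num_pes of them.
    slots = {}
    for label, cycle in offsets.items():
        slots.setdefault(cycle % ii, []).append(label)
    return all(len(labels) <= num_pes for labels in slots.values())


def search_ii(mii, offsets, deps, num_pes, max_ii=64):
    """Trial and error: start at MII and raise II by 1 until a schedule fits."""
    for ii in range(mii, max_ii + 1):
        if fits(ii, offsets, deps, num_pes):
            return ii
    return None  # no feasible II found within the search bound


# Start cycle of each instruction within one iteration (assumed).
offsets = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}
# B of the next iteration (distance 1) depends on D of the current one.
deps = [("D", "B", 1)]

ii_two_pes = search_ii(mii=3, offsets=offsets, deps=deps, num_pes=2)
ii_one_pe = search_ii(mii=3, offsets=offsets, deps=deps, num_pes=1)
```

With two processing elements the first candidate already fits, so the search stops at II = 3. With a single processing element the candidates 3 and 4 fail the resource check and the search settles on II = 5.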

The scheduler 150 may generate an iteration window corresponding to the processing time of the instructions included in one iteration, and may analyze the dependencies between the instructions included in the iterations that fall within the generated iteration window. The scheduler 150 may apply the iteration window to the sequentially input iterations to analyze the dependencies between the instructions.

Because the scheduler 150 analyzes the dependencies between memory access instructions using the iteration window, the dependency analysis time can be reduced. That is, the iteration window makes it possible to skip the analysis of dependencies between instructions in iterations that do not need to be analyzed.

FIG. 2 is a diagram illustrating an iteration window according to an embodiment of the present invention.

In this embodiment, it is assumed that one iteration is input every cycle (II = 1) and that the processing time of the instructions included in one iteration is 10 cycles.

Referring to FIGS. 1 and 2, the iteration window 200 may be generated with a size equal to or larger than the processing time of the instructions included in one iteration. For example, since the processing time of the instructions included in one iteration is 10 cycles, the iteration window 200 may be sized to hold the 10 iterations that correspond to 10 cycles, or more than 10 iterations. It takes 10 cycles for 10 iterations to be input.

The scheduler 150 may analyze the dependencies between the instructions included in the iterations within the iteration window. For example, the dependency between the currently input iteration (the 'first iteration') and an iteration that is input after the processing time (10 cycles) of the instructions in the current iteration has elapsed (the 'eleventh iteration') does not need to be analyzed. This is because the first and eleventh iterations are not computed simultaneously (in parallel) but sequentially (serially): the eleventh iteration is computed after the first iteration has been computed. Accordingly, the size of the iteration window may be set equal to or larger than the size corresponding to the processing time of the instructions included in one iteration.
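The effect of the iteration window on the amount of analysis can be sketched in Python. Sizing the window as the iteration latency divided by II, rounded up, is an assumption that reproduces the 10-iteration window of this embodiment. The names `window_size` and `pairs_to_analyze` are hypothetical.

```python
import math


def window_size(iteration_latency, ii):
    """Number of iterations that can be in flight at the same time (assumed rule)."""
    return math.ceil(iteration_latency / ii)


def pairs_to_analyze(num_iterations, window):
    """Yield (earlier, later) iteration pairs that fall inside one window."""
    for first in range(num_iterations):
        last = min(first + window, num_iterations)
        for second in range(first + 1, last):
            yield (first, second)


# One iteration enters every cycle (II = 1) and takes 10 cycles to complete.
window = window_size(iteration_latency=10, ii=1)
pairs = list(pairs_to_analyze(num_iterations=12, window=window))
```

Iterations are indexed from 0 here, so the first iteration (index 0) is paired with indices 1 through 9 but not with index 10, the eleventh iteration, which matches the case described above.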

The scheduler 150 may analyze the instructions to be executed on the reconfigurable array 110 and assign the instructions to the plurality of processing elements 112 based on the analysis result.

The scheduler 150 may calculate the MII of the iterations. The scheduler 150 may assign the instructions to the processing elements, using the analyzed memory dependencies, while increasing the iteration distance (II) from the calculated minimum iteration distance (MII).

FIGS. 3A and 3B are diagrams illustrating the MII.

FIG. 3A illustrates a case in which the MII is 1; A, B, C, D, and E denote instructions.

Referring to FIG. 3A, the scheduler 150 may assign the instructions to the processing elements, taking the analyzed memory dependencies into account, while increasing the iteration distance (II) by 1 at a time from the minimum iteration distance (MII). For example, when a dependency exists between instruction A of the first iteration 200a and instruction B of the second iteration 210a, the scheduler 150 may calculate the MII value so that instruction B of the second iteration 210a is executed after instruction A of the first iteration 200a has been executed. The scheduler 150 may then assign the instructions to the processing elements, taking the analyzed memory dependencies into account, while increasing the II value from the calculated MII.

FIG. 3B illustrates a case in which the MII is 3; A, B, C, D, and E denote instructions.

Referring to FIG. 3B, the scheduler 150 may assign the instructions to the processing elements, taking the analyzed memory dependencies into account, while increasing the iteration distance (II) by 1 at a time from the minimum iteration distance (MII). For example, the II may be increased by 1 at a time starting from the MII value of 3. When a dependency exists between instruction D of the first iteration 200a and instruction B of the second iteration 210a, for example, the scheduler 150 may calculate the MII value so that instruction B of the second iteration 210a is executed after instruction D of the first iteration 200a has been executed. The scheduler 150 may then assign the instructions to the processing elements, taking the analyzed memory dependencies into account, while increasing the II value from the calculated MII.
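The MII values of 1 and 3 in FIGS. 3A and 3B can be reproduced with a small Python sketch. It assumes that instructions A to E start at cycles 0 to 4 of their iteration and take one cycle each, which the source does not state explicitly. The function name `minimum_ii` is hypothetical.

```python
import math


def minimum_ii(offsets, deps):
    """Smallest II at which every dependent instruction starts after its producer."""
    mii = 1
    for src, dst, distance in deps:
        # Requirement: offsets[dst] + distance * II >= offsets[src] + 1.
        needed = math.ceil((offsets[src] + 1 - offsets[dst]) / distance)
        mii = max(mii, needed)
    return mii


# Start cycle of each instruction within one iteration (assumed).
offsets = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}

# FIG. 3A: B of the second iteration depends on A of the first iteration.
fig_3a = minimum_ii(offsets, [("A", "B", 1)])
# FIG. 3B: B of the second iteration depends on D of the first iteration.
fig_3b = minimum_ii(offsets, [("D", "B", 1)])
```

Under these assumptions `fig_3a` is 1, because A has already finished when B of the next iteration starts, and `fig_3b` is 3, because B of the second iteration has to wait until D of the first iteration has executed at cycle 3.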

FIG. 4 is a flowchart illustrating a control method of a reconfigurable processor according to an embodiment of the present invention.

Referring to FIG. 4, the instructions to be executed on the plurality of processing elements are applied to a test file and simulated (400). An execution trace is extracted from the simulation result (410). Based on the part of the execution trace that corresponds to memory access instructions, the memory dependencies between the instructions included in the iterations are analyzed (420). For example, an iteration window corresponding to the processing time of the instructions included in one iteration may be generated, and the dependencies between the instructions included in the iterations that fall within the generated iteration window may be analyzed. The minimum iteration distance (MII) between the iterations is calculated based on the analyzed memory dependencies (430). The instructions are assigned to the processing elements, taking the memory dependencies into account, while the II value is increased from the calculated MII (440).

The control method of the reconfigurable processor can improve the accuracy of the computation by assigning the iterations to the plurality of processing elements based on the dependencies between memory access instructions.

In addition, the control method of the reconfigurable processor can reduce the dependency analysis time by analyzing the dependencies between memory access instructions using an iteration window that corresponds to the processing time of one iteration.

The embodiments described above may be configured by selectively combining all or part of each embodiment so that various modifications can be made.

It should also be noted that the embodiments are provided for explanation and not for limitation. Those of ordinary skill in the art will understand that various embodiments are possible within the scope of the technical idea of the present invention.

According to an embodiment of the present invention, the method described above can be implemented as processor-readable code on a medium on which a program is recorded. Examples of processor-readable media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include implementations in the form of carrier waves (for example, transmission over the Internet).

Claims

A reconfigurable processor for processing a loop having memory dependency, comprising:
an extraction unit configured to extract an execution trace from a simulation result; and
a scheduler configured to analyze memory dependencies between instructions included in iterations, based on a trace corresponding to memory access instructions among the execution trace.

The reconfigurable processor of claim 1, further comprising a simulation unit configured to simulate the instructions.

The reconfigurable processor of claim 1, wherein the scheduler generates an iteration window corresponding to the processing time of the instructions included in one iteration, and analyzes the memory dependencies between the instructions included in the iterations that fall within the generated iteration window.

The reconfigurable processor of claim 1, wherein the scheduler calculates a minimum iteration distance (MII) between the iterations based on the analyzed memory dependencies.

The reconfigurable processor of claim 4, wherein the scheduler assigns the instructions to processing elements while increasing an iteration distance (II) from the calculated MII, based on the analyzed memory dependencies.

The reconfigurable processor of claim 1, wherein the execution trace comprises at least one of a register address, a value stored in a register, a memory address, and a value stored in memory.

A method of processing a loop having memory dependency, the method comprising:
extracting an execution trace from a simulation result; and
analyzing memory dependencies between instructions included in iterations, based on a trace corresponding to memory access instructions among the execution trace.

The method of claim 7, further comprising simulating the instructions.

The method of claim 7, wherein the analyzing comprises generating an iteration window corresponding to the processing time of the instructions included in one iteration, and analyzing the memory dependencies between the instructions included in the iterations that fall within the generated iteration window.

The method of claim 7, further comprising calculating a minimum iteration distance (MII) between the iterations based on the analyzed memory dependencies.

The method of claim 10, further comprising assigning the instructions to processing elements, taking the analyzed memory dependencies into account, while increasing an iteration distance (II) value from the calculated MII.

The method of claim 7, wherein the execution trace comprises at least one of a register address, a value stored in a register, a memory address, and a value stored in memory.