KR100639985B1

KR100639985B1 - Apparatus for generating an on-chip network topology and method therefor

Info

Publication number: KR100639985B1
Application number: KR1020050013904A
Authority: KR
Inventors: 배영환; 장준영; 한진호; 조한진
Original assignee: 한국전자통신연구원
Priority date: 2004-12-14
Filing date: 2005-02-19
Publication date: 2006-10-31
Also published as: KR20060067070A

Abstract

An apparatus and method for generating an on-chip network topology are disclosed. Reference code that implements an algorithm-level design specification is executed to analyze the communication requirements between IP modules, and a binary tree having the IP modules as its leaf nodes is generated based on those requirements. The tree is then restructured by repeating, up to the root node of the binary tree, a process that selects, among all possible merges of a given intermediate node with the lower nodes connected to it, the merge that minimizes a cost function defined in terms of area and communication delay. An on-chip network topology that minimizes area and communication delay can thereby be generated.

On-chip network topology, IP module, communication requirement, binary tree

Description

Apparatus for generating an on-chip network topology and method therefor

FIGS. 1A to 1C illustrate various topologies of an on-chip network;

FIG. 2 illustrates a crossbar switch used in a star network topology;

FIGS. 3A to 3C show an example of the data communication requirements between components in an actual design circuit;

FIGS. 4A to 4C are graphs comparing the chip area of a single crossbar switch structure with that of a tree structure;

FIG. 5A illustrates the configuration of an embodiment of an on-chip network topology generating apparatus according to the present invention;

FIGS. 6A, 6B, and 7A to 7E illustrate an embodiment of the binary tree generation process;

FIGS. 8A to 8D illustrate an embodiment of the tree optimization method;

FIGS. 9A to 9D illustrate each step of generating an on-chip network topology; and

FIGS. 10A and 10B illustrate an example of critical path search and path addition.

The present invention relates to an apparatus and method for generating an on-chip network topology.

In System-on-Chip (SoC) design, emphasis is placed on platform-based design, which allows SoCs for a wide variety of systems to be developed in a short time. Within a platform, the data communication structure is, together with the processor, the core of the design. As the number of devices integrated on an SoC grows, the amount of data that the constituent modules must exchange is increasing rapidly, and designing a communication structure for it is becoming more difficult.

On-chip buses such as Multi-layer AMBA and Silicon Backplane, which are currently widely used in embedded systems, have limited data bandwidth and have therefore become a major performance bottleneck for applications such as multimedia that require large volumes of data communication.

Because the existing on-chip bus structure shares a limited communication medium among multiple communicating entities by time-based multiplexing, it is not suitable for applications that require real-time data transmission and reception.

In addition, an on-chip bus shared by multiple core components suffers from performance degradation, poor reusability caused by differing interface requirements, and limited data bandwidth scalability. As the number of connected core components increases, the load capacitance grows and more power is consumed. Furthermore, to obtain the same data bandwidth as an on-chip network, the clock frequency that drives the bus must be raised, so power consumption is higher for the same performance.

Therefore, SoC designs that integrate various processor cores, memories, and intellectual property (IP) blocks on a single chip require on-chip network technology with scalability and reusability. In addition, enabling high-performance SoC design with on-chip networks requires design automation methods based on on-chip network architecture design and verification technology.

Unlike a general-purpose computer network in the real world, an on-chip network is implemented on a single semiconductor chip, and, given the design purpose of the chip, the communication order and data volume of the requests between the IP modules are nearly constant over a fixed period. Implementing a network structure specialized for each design can therefore greatly reduce chip area, which decisively affects the manufacturing cost of the chip, while improving performance and reducing power consumption.

FIGS. 1A to 1C illustrate various topologies of an on-chip network.

FIG. 1A illustrates the topology of a star on-chip network, FIG. 1B that of a mesh on-chip network, and FIG. 1C that of a torus on-chip network.

When the number of components that need to communicate with one another is relatively small, the star topology is advantageous in performance and chip area; when the number is large, the mesh and torus topologies are highly advantageous in scalability. The star topology is expected to be the most advantageous for SoC design sizes worldwide over the next five years, and it is being applied on a trial basis in several designs. The mesh and torus topologies are very general-purpose structures that will become applicable in the future when the number of components grows very large, but at present their communication delay is too long for practical use.

FIG. 2 illustrates a crossbar switch used in a star network topology.

FIGS. 3A to 3C show an example of the data communication requirements between components in an actual design circuit.

FIGS. 4A to 4C are graphs comparing the chip area of a single crossbar switch structure with that of a tree structure.

Referring to FIG. 2, simply connecting components to the ports of a crossbar switch requires no further communication structure design, which shortens design time, and because many components can communicate simultaneously, performance is better than that of an existing on-chip bus or an on-chip network of another topology.

In general, however, the communication requirements between IPs are not uniform, as the example of FIG. 3 shows. Designing with a single crossbar switch therefore wastes data bandwidth, and any unused ports waste chip area.

In addition, as FIG. 4 shows, the area of a crossbar switch is proportional to the square of its port count, so as the number of connected IPs increases, the clock speed of the switch falls and the performance of the whole network degrades. If a pre-designed switch library has no switch with the desired port count, some ports go unused and valuable chip resources are wasted. An application-specific on-chip network optimized for the characteristics of each design is therefore needed.
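The quadratic growth of crossbar area can be made concrete with a small calculation. The sketch below only assumes that area is proportional to the square of the port count, as stated above; the function name `crossbar_area`, the `unit_area` constant, and the port counts are hypothetical and chosen purely for illustration.

```python
def crossbar_area(ports: int, unit_area: float = 1.0) -> float:
    """Relative area of a crossbar switch, assumed proportional to ports squared."""
    return unit_area * ports ** 2


# Eight IP modules on one 8-port crossbar switch.
single_switch = crossbar_area(8)

# The same eight modules split over two switches: each switch serves
# four modules and needs one extra port for the link to the other switch.
clustered = 2 * crossbar_area(4 + 1)

print(single_switch, clustered)
```

Under this assumption the two smaller switches occupy less total area than the single large one, which is the motivation for clustering IP modules onto several switches.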

However, analyzing the communication requirements between the constituent components of each design and finding the best-suited communication structure is very difficult and time-consuming. When the design is done by hand, the designer therefore generally uses a single switch or a structure based on experience, which is very inefficient.

A design automation tool that automatically synthesizes an on-chip-network-based communication structure optimized for each design is called an on-chip network compiler. Little research has been done on such tools so far, but one example is the Xpipes OCN (On-Chip Network) compiler from Stanford University in the United States. The Xpipes compiler supports on-chip networks of various topologies and outputs SystemC code as the result of synthesis. However, although Xpipes supports various network topologies, it maps the design onto a predetermined topology and therefore cannot generate the network best customized for each design.

The technical problem the present invention seeks to solve is to provide an apparatus and method for automatically generating an on-chip network topology optimized to minimize chip area and communication delay.

Another technical problem the present invention seeks to solve is to provide a computer-readable recording medium storing a program that causes a computer to execute a method of automatically generating an on-chip network topology optimized to minimize chip area and communication delay.

To achieve the above technical object, an embodiment of the on-chip network topology generating apparatus according to the present invention includes: a traffic analysis unit that analyzes the communication requirements between IP modules by executing reference code that implements an algorithm-level design specification; a tree generation unit that generates, based on the communication requirements between the IP modules, a binary tree having the IP modules as its leaf nodes; and a tree optimization unit that, among all possible merges of a given intermediate node of the binary tree with the lower nodes connected to it, selects the merge that minimizes the value of a cost function defined in terms of area and communication delay, and repeats this selection up to the root node of the binary tree.

To achieve the above technical object, an embodiment of the on-chip network topology generation method according to the present invention includes: analyzing the communication requirements between IP modules by executing reference code that implements an algorithm-level design specification; generating, based on the communication requirements between the IP modules, a binary tree having the IP modules as its leaf nodes; and restructuring the tree by repeating, up to the root node of the binary tree, a process that selects, among all possible merges of a given intermediate node of the binary tree with the lower nodes connected to it, the merge that minimizes the value of a cost function defined in terms of area and communication delay.

An on-chip network topology that minimizes area and communication delay can thereby be generated.

Hereinafter, the apparatus and method for generating an on-chip network topology according to the present invention are described in detail with reference to the accompanying drawings.

FIG. 5A illustrates the configuration of an embodiment of an on-chip network topology generating apparatus according to the present invention.

Referring to FIG. 5A, the on-chip network topology generating apparatus comprises a traffic analysis unit 500, a tree generation unit 510, a tree optimization unit 520, a critical path search unit 530, and a topology generation unit 540.

The traffic analysis unit 500 executes reference code that implements the algorithm-level design specification and calculates how much communication is required between the IP modules in an actual execution environment. In other words, the traffic analysis unit 500 runs a program that executes the reference code, written in C or SystemC, and analyzes the communication requirements between the constituent IP modules through an application profiling step that generates the communication requests of the actual execution environment.

That is, the traffic analysis unit 500 accumulates, function by function, the amount of data moved between the functions in the reference C code that correspond to the respective IP modules, and outputs the final totals. The present invention optimizes the on-chip network topology based on the traffic volumes calculated by the traffic analysis unit 500.
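The accumulation step can be sketched as follows. This is a minimal illustration, not the profiler described in the patent: it assumes the profiling run has already produced a trace of `(source, destination, bytes)` records, and it sums traffic per unordered pair of modules. The function name `accumulate_traffic`, the trace format, and the byte counts are all hypothetical.

```python
from collections import defaultdict


def accumulate_traffic(transfers):
    """Sum the bytes moved between each unordered pair of IP modules."""
    totals = defaultdict(int)
    for src, dst, nbytes in transfers:
        # Assumption: direction is ignored, so p1->m1 and m1->p1 share one entry.
        totals[frozenset((src, dst))] += nbytes
    return dict(totals)


# Hypothetical trace recorded while running the reference code.
trace = [("p1", "m1", 64), ("m1", "p1", 32), ("p1", "ip1", 8)]
traffic = accumulate_traffic(trace)

for pair, volume in traffic.items():
    print(sorted(pair), volume)
```

Keying on an unordered pair keeps the later clustering step simple; a design that distinguishes read and write traffic would need a directed key instead.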

After the traffic analysis unit 500 has finished analyzing the communication requirements between the IP modules, the tree generation unit 510 clusters the IP modules so that communication delay and traffic on the communication links are minimized, by placing IP modules with heavy mutual communication close together and, where possible, assigning them to the same crossbar switch. For this clustering, the tree generation unit 510 uses a binary tree partitioning method that builds a tree by grouping IP modules two at a time in a bottom-up manner.

Specifically, the tree generation unit 510 first groups the IP modules with heavy mutual communication into pairs, introduces a parent node that has each pair as its child nodes, and assigns all communication requirements of the child nodes to the parent node. The tree generation unit 510 then pairs the parent nodes in the same way, in descending order of communication requirement, and introduces a next-higher parent node for each pair. It repeats this process until all IP blocks are included in the tree, producing a binary tree. An embodiment of binary tree generation is shown in FIGS. 6A, 6B, and 7A to 7E.

An embodiment of binary tree generation is described first with reference to FIGS. 6A to 7E. Referring to FIG. 6A, there are six IP modules, p1 (600), p2 (602), ip1 (604), ip2 (606), m1 (608), and m2 (610), and the communication requirements between them, as analyzed by the traffic analysis unit 500, are shown. FIG. 6B shows the binary tree that the tree generation unit 510 generates for FIG. 6A. The generation process from FIG. 6A to FIG. 6B is shown in FIGS. 7A to 7E.

Referring to FIG. 7A, IP modules with heavy mutual communication are first grouped into pairs based on the communication requirements between the IP modules, and parent nodes (700, 702) are introduced and connected to them. Then, as in FIG. 7B, the parent nodes and the remaining ungrouped IP modules (604, 606) are grouped into pairs based on their communication requirements, and a parent node (704) is again introduced and connected (FIGS. 7B, 7C, and 7D). Repeating this procedure finally completes the binary tree shown in FIG. 7E.
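The bottom-up pairing can be sketched in a few lines. The version below is a simplification: instead of pairing a whole level at once, it repeatedly merges the single heaviest pair, which is one common greedy variant and not necessarily the exact procedure of FIGS. 7A to 7E. The function name `build_binary_tree` and the traffic volumes are hypothetical; only the module names come from FIG. 6A.

```python
from itertools import combinations


def build_binary_tree(modules, traffic):
    """Greedy bottom-up pairing: merge the two nodes with the heaviest mutual traffic.

    traffic maps frozenset({a, b}) to a volume. Leaves are module names;
    every parent node is represented as a (left, right) tuple.
    """
    nodes = list(modules)
    demand = dict(traffic)
    while len(nodes) > 1:
        a, b = max(combinations(nodes, 2),
                   key=lambda pair: demand.get(frozenset(pair), 0))
        parent = (a, b)
        nodes = [n for n in nodes if n not in (a, b)]
        # The parent inherits its children's traffic toward every other node.
        for other in nodes:
            demand[frozenset((parent, other))] = (
                demand.get(frozenset((a, other)), 0)
                + demand.get(frozenset((b, other)), 0))
        nodes.append(parent)
    return nodes[0]


modules = ["p1", "p2", "ip1", "ip2", "m1", "m2"]
# Illustrative volumes only; the values of FIG. 6A are not given in the text.
traffic = {
    frozenset(("p1", "m1")): 100,
    frozenset(("p2", "m2")): 90,
    frozenset(("ip1", "ip2")): 40,
    frozenset(("p1", "p2")): 10,
    frozenset(("m1", "ip1")): 5,
}
tree = build_binary_tree(modules, traffic)
print(tree)
```

With these volumes the two processor-memory pairs are grouped first, then the two IP blocks, and the processor clusters are joined before the root is formed.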

When the intermediate nodes of the binary tree produced by the tree generation unit 510 are implemented as 2x1 crossbar switches, IP modules assigned to the same crossbar switch can communicate quickly. In some cases, however, communication must pass through several crossbar switches, and using an excessive number of crossbar switches along a communication channel may also increase chip area.

A tree optimization process that minimizes the area occupied by the switches and the communication delay is therefore required.

For this tree optimization, the tree optimization unit 520 uses a dynamic programming technique to merge intermediate nodes of the tree as needed, minimizing area and delay. To provide a measure for the optimization, a cost function such as Equation 1 below is first defined, and optimization is performed by finding, at each tree optimization step, the solution that minimizes the defined cost function.

Here, the constants a and b are determined experimentally.
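The formula for Equation 1 is not reproduced here. One form consistent with the description (a cost defined from area and communication delay, with constants a and b), assumed for illustration and not confirmed by the source, is a weighted sum:

\[
\mathrm{Cost} = a \cdot \mathrm{Area} + b \cdot \mathrm{Delay}
\]

Under this assumed form, a larger a favors topologies with smaller switches, while a larger b favors topologies with fewer switch hops.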

In detail, the tree optimization unit 520 traverses the binary tree by depth-first search, starting from the leaf nodes and working toward the root node. At each intermediate node, it uses the solutions of all possible node merges computed for the subtrees below to find the merge solution that minimizes the cost function at that intermediate node. This node merging process continues over the intermediate nodes until the tree that minimizes the cost function of the whole tree is obtained.

An example of the tree optimization method is shown in FIGS. 8A to 8D. Referring to FIGS. 8A to 8D, there are four ways to merge intermediate nodes of the binary tree. First, as in FIG. 8A, the two child nodes (802, 804), which are the root nodes of the lower subtrees, are not merged with the parent node (800), which is the root node of the subtree one level up that contains them. Second, as in FIG. 8B, the parent node is merged with the right child node (804). Third, as in FIG. 8C, it is merged with the left child node (802). Fourth, as in FIG. 8D, the parent node (800) is merged with both child nodes (802, 804).

The tree optimization unit 520 evaluates the cost function for each of the merges shown in FIGS. 8A to 8D and selects the lowest-cost one as the solution for the current subtree. Once the minimum-cost solutions of all subtrees at one level are determined, the four merge options of FIGS. 8A to 8D are evaluated again at the subtree one level up, using the cost-function solutions of the subtrees rooted at its child nodes, and this search for the minimum-cost solution is repeated until the entire binary tree is covered.
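A bottom-up dynamic program over the merge choices can be sketched as below. The cost model is an assumption made for this illustration, not the patent's Equation 1: each switch costs `a * ports**2` (area) plus `b` (a crude per-switch delay proxy). At every intermediate node, each child switch is either kept separate or merged into the parent, which yields the four combinations of FIGS. 8A to 8D. The function name `optimize` and the example tree are hypothetical.

```python
def optimize(tree, a=1.0, b=1.0):
    """Return the minimum assumed cost over all merges of intermediate nodes.

    tree is a (left, right) tuple whose elements are leaf names (str) or
    further tuples; it must contain at least one intermediate node.
    """
    def closed_cost(ports):
        # Assumed cost of one finished switch: quadratic area plus a delay term.
        return a * ports ** 2 + b

    def solve(node):
        # Maps the downward port count of the still-open switch at `node`
        # to the cheapest total cost of the finished switches below it.
        if isinstance(node, str):
            return None  # a leaf is an IP module, not a switch
        options = {0: 0.0}
        for child in node:
            sub = solve(child)
            if sub is None:
                choices = [(1, 0.0)]  # leaf: occupies one port, no switch cost
            else:
                # Keep the child as its own switch (its ports plus one uplink),
                # which then occupies a single port on this node ...
                keep = min(c + closed_cost(k + 1) for k, c in sub.items())
                # ... or merge it into this node, inheriting its open ports.
                choices = [(1, keep)] + list(sub.items())
            merged = {}
            for ports, cost in options.items():
                for k, c in choices:
                    key = ports + k
                    merged[key] = min(merged.get(key, float("inf")), cost + c)
            options = merged
        return options

    root = solve(tree)
    return min(c + closed_cost(k) for k, c in root.items())


small = (("a", "b"), ("c", "d"))
large = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
small_cost = optimize(small)
large_cost = optimize(large)
print(small_cost, large_cost)
```

With four modules the cheapest solution under this model is a single 4-port switch, while with eight modules two 5-port switches beat one 8-port switch, matching the intuition that merging pays off only up to a point.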

FIGS. 9A to 9D illustrate an example of each step of generating an optimized on-chip network topology by the traffic analysis unit 500, the tree generation unit 510, and the tree optimization unit 520 described with reference to FIG. 5A. FIG. 9A shows the result of the traffic analysis unit 500 analyzing the traffic between the IP modules (see FIG. 6A); FIG. 9B shows the binary tree that the tree generation unit 510 generates from the communication requirements between the IP modules (see FIGS. 6B and 7E); FIG. 9C shows the tree obtained when the tree optimization unit 520 optimizes the binary tree; and FIG. 9D shows the on-chip network topology implemented by replacing the intermediate nodes of the optimized tree of FIG. 9C with crossbar switches.

Depending on the communication characteristics between IP modules, some paths carry little traffic yet cause malfunction if their communication delay exceeds a certain threshold. For such a critical path, the critical path search unit 530 inserts a path between the IP modules concerned to reduce the critical path delay. Because the communication channel is connected to the crossbar switches to which the respective IP modules belong, the port count of each of the two crossbar switches increases by one, so overall communication delay decreases but area increases.

Accordingly, when connecting a communication channel between crossbar switches for a given critical path, the critical path search unit 530 keeps the newly inserted channel if overall communication delay decreases and the total area does not exceed a preset limit; otherwise it removes the added path. This greedy optimization continues until no critical path remains to be selected or no further improvement in communication delay is obtained.
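The accept-or-revert loop can be sketched as follows. The sketch assumes each candidate critical path is summarized as `(switch_a, switch_b, delay_gain)` and that total area is the sum of squared port counts; the function name `add_critical_paths`, the candidate format, and all numbers are hypothetical.

```python
def add_critical_paths(ports, candidates, area_limit):
    """Greedily add switch-to-switch links for critical paths.

    ports maps switch name to port count; candidates is a list of
    (switch_a, switch_b, delay_gain) tuples. Area is assumed to be the
    sum of squared port counts.
    """
    ports = dict(ports)
    accepted = []
    # Try the candidates with the largest delay improvement first.
    for sw_a, sw_b, delay_gain in sorted(candidates, key=lambda c: -c[2]):
        if delay_gain <= 0:
            break  # no further delay improvement is available
        trial = dict(ports)
        trial[sw_a] += 1  # each end of the new link costs one port
        trial[sw_b] += 1
        if sum(p ** 2 for p in trial.values()) <= area_limit:
            ports = trial
            accepted.append((sw_a, sw_b))
        # Otherwise the trial is discarded, i.e. the added path is removed.
    return ports, accepted


switch_ports = {"s0": 5, "s1": 5, "s2": 3}
candidates = [("s0", "s2", 3), ("s0", "s1", 2), ("s1", "s2", 0)]
new_ports, links = add_critical_paths(switch_ports, candidates, area_limit=80)
print(new_ports, links)
```

Here the first link fits within the area limit and is kept, the second would exceed it and is dropped, and the third offers no delay gain, so the loop stops.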

FIGS. 10A and 10B illustrate an example of critical path search and path addition by the critical path search unit 530. Referring to FIGS. 10A and 10B, the critical path search unit 530 finds the critical path between the left leaf node (1000) and the node (1010) with which its communication delay is equal to or greater than a predetermined threshold. It then adds a path between the intermediate nodes (1010, 1012) connected to the critical path. As a result, when the intermediate nodes are implemented as crossbar switches, a communication channel between the crossbar switches is added.

The topology generation unit 540 takes the tree that has been optimized by the tree optimization unit 520 and, where necessary, has had additional paths inserted by the critical path search unit 530, and replaces each intermediate node with a crossbar switch having as many ports as there are nodes connected to that intermediate node. It then connects the IP modules to the crossbar switches in the same arrangement as the tree and outputs the resulting on-chip network topology.
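The node-to-switch replacement can be sketched as a single tree walk. The sketch assumes the optimized tree is given as nested tuples of arbitrary arity with module names at the leaves, and it ignores extra critical-path links for brevity; the function name `generate_topology` and the `sw0`, `sw1`, ... naming scheme are hypothetical.

```python
from itertools import count


def generate_topology(tree):
    """Replace every intermediate node with a crossbar switch.

    Returns a dict mapping each switch name to the list of nodes attached
    to it; the length of that list is the switch's port count.
    """
    ids = count()
    switches = {}

    def visit(node, parent):
        if isinstance(node, str):
            return node  # an IP module keeps its own name
        name = f"sw{next(ids)}"
        links = [] if parent is None else [parent]  # uplink to the parent switch
        switches[name] = links
        for child in node:
            links.append(visit(child, name))
        return name

    visit(tree, None)
    return switches


# Hypothetical optimized tree: the root switch serves ip1 and ip2 directly.
optimized = ("ip1", "ip2", ("p1", "m1"), ("p2", "m2"))
topology = generate_topology(optimized)
port_counts = {name: len(links) for name, links in topology.items()}
print(topology)
print(port_counts)
```

Each switch's port count falls out of the adjacency list, so a 4-port root switch and two 3-port leaf switches are produced for this example.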

도 5b는 본 발명에 따른 온칩 네트워크 토폴로지 생성 방법의 일 실시예의 흐름을 도시한 흐름도이다.5B is a flowchart illustrating a flow of an embodiment of an on-chip network topology generation method according to the present invention.

도 5b를 참조하면, 먼저, C 코드 또는 SystmeC 코드로 쓰여진 알고리즘 단계의 설계 사양이 구현된 레퍼런스 코드를 수행하여 실제 수행 환경에서의 통신 요구를 발생시키는 프로파일링을 수행한다(S550). 그리고, 프로파일링 단계를 통해 각 구성 IP 모듈들 간의 통신 요구량을 분석하고(S555), 분석한 통신 요구량을 기초로 IP 모듈들을 최하위 노드로 하는 이진 트리를 생성한다(S560). Referring to FIG. 5B, first, a profiling that generates a communication request in an actual execution environment is performed by performing a reference code in which a design specification of an algorithm step written in C code or SystmeC code is implemented (S550). The profiling step analyzes the communication requirements between the respective IP modules (S555), and generates a binary tree including the IP modules as the lowest nodes based on the analyzed communication requirements (S560).

이진 트리를 동적 프로그래밍 기법을 응용하여 면적 및 통신 지연 시간이 최소가 되도록 이진 트리를 최적화하고(S565), 통신 지연 시간이 소정 임계치 이상인 IP 모듈간의 임계 경로를 탐색하여 그 IP 모듈을 직접 연결하는 경로를 추가한다(S570). 그리고, 최종적으로 트리에 대한 OCN(On Chip Network) SystemC 코드를 생성하고(S575), 성능을 시뮬레이션한다(S580).The binary tree is optimized by applying a dynamic programming technique to optimize the binary tree to minimize the area and communication delay time (S565), and search for critical paths between IP modules having a communication delay time greater than or equal to a predetermined threshold to directly connect the IP modules. Add (S570). And finally, generates an On Chip Network (OCN) SystemC code for the tree (S575), and simulates the performance (S580).

본 발명은 또한 컴퓨터로 읽을 수 있는 기록매체에 컴퓨터가 읽을 수 있는 코드로서 구현하는 것이 가능하다. 컴퓨터가 읽을 수 있는 기록매체는 컴퓨터 시스템에 의하여 읽혀질 수 있는 데이터가 저장되는 모든 종류의 기록장치를 포함한다. 컴퓨터가 읽을 수 있는 기록매체의 예로는 ROM, RAM, CD-ROM, 자기 테이프, 플로피디스크, 광데이터 저장장치 등이 있으며, 또한 캐리어 웨이브(예를 들어 인터넷을 통한 전송)의 형태로 구현되는 것도 포함한다. 또한 컴퓨터가 읽을 수 있는 기록매체는 네트워크로 연결된 컴퓨터 시스템에 분산되어 분산방식으로 컴퓨터가 읽을 수 있는 코드가 저장되고 실행될 수 있다.The invention can also be embodied as computer readable code on a computer readable recording medium. The computer-readable recording medium includes all kinds of recording devices in which data that can be read by a computer system is stored. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage, and the like, and may also be implemented in the form of a carrier wave (for example, transmission over the Internet). Include. The computer readable recording medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.

이제까지 본 발명에 대하여 그 바람직한 실시예들을 중심으로 살펴보았다. 본 발명이 속하는 기술 분야에서 통상의 지식을 가진 자는 본 발명이 본 발명의 본질적인 특성에서 벗어나지 않는 범위에서 변형된 형태로 구현될 수 있음을 이해할 수 있을 것이다. 그러므로 개시된 실시예들은 한정적인 관점이 아니라 설명적인 관점에서 고려되어야 한다. 본 발명의 범위는 전술한 설명이 아니라 특허청구범위에 나타나 있으며, 그와 동등한 범위 내에 있는 모든 차이점은 본 발명에 포함된 것으로 해석되어야 할 것이다.So far I looked at the center of the preferred embodiment for the present invention. Those skilled in the art will appreciate that the present invention can be implemented in a modified form without departing from the essential features of the present invention. Therefore, the disclosed embodiments should be considered in descriptive sense only and not for purposes of limitation. The scope of the present invention is shown in the claims rather than the foregoing description, and all differences within the scope will be construed as being included in the present invention.

According to the present invention, reference code that implements the algorithm-level design specification is executed, and the resulting communication pattern is analyzed to automatically generate an optimal on-chip network. A communication structure optimized into a network topology that minimizes chip area and maximizes performance can therefore be output in SystemC and VHDL, enabling both simulation and hardware synthesis.

Claims

An on-chip network topology generating apparatus comprising:

a traffic analysis unit that analyzes communication requirements between IP modules by executing reference code in which an algorithm-level design specification is implemented;

a tree generation unit that generates a binary tree having the IP modules as its lowest child nodes, based on the communication requirements between the IP modules; and

a tree optimization unit that reconstructs the tree by performing, up to the root node of the binary tree, a process of selecting, from among all possible merges between a predetermined intermediate node of the binary tree and the lower nodes connected to it, the merge that minimizes the value of a cost function defined based on area and communication delay time.

The apparatus of claim 1,

further comprising a critical path search unit that, after the node merging process of the binary tree is completed, inserts a path between IP modules whose communication delay time is greater than or equal to a predetermined threshold.

The apparatus of claim 1 or 2,

further comprising a topology generation unit that replaces the intermediate nodes of the tree produced by the node merging process of the binary tree with crossbar switches, each having as many ports as there are nodes connected to the corresponding intermediate node, and connects the IP modules to the crossbar switches in the same manner as the connection structure of the tree.
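The replacement of intermediate nodes by crossbar switches can be sketched in Python as follows. This is an illustration only: the tree is assumed to be given as nested tuples of module names, the port count of a switch is taken as its number of children plus one port toward its parent (none for the root), and the switch names `sw0`, `sw1`, and the function `to_crossbars` are hypothetical.

```python
def to_crossbars(tree):
    """Return (switches, links) for a tree given as nested tuples of module names.

    `switches` maps a generated switch name to its port count;
    `links` lists (switch, attached node) connections in visit order.
    """
    switches, links = {}, []

    def visit(node, parent):
        if isinstance(node, str):
            links.append((parent, node))  # IP module attaches to its switch
            return
        name = f"sw{len(switches)}"
        # One port per child, plus one toward the parent unless this is the root.
        switches[name] = len(node) + (0 if parent is None else 1)
        if parent is not None:
            links.append((parent, name))
        for child in node:
            visit(child, name)

    visit(tree, None)
    return switches, links

switches, links = to_crossbars((("a", "b"), "c", "d"))
```

In this example the root becomes a three-port switch serving `c`, `d`, and the lower switch, and the lower switch also needs three ports: two for `a` and `b` and one uplink.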

The apparatus of claim 1, wherein the traffic analysis unit

accumulates, for each function, the amount of data moved between the functions of the reference code, and thereby analyzes the communication requirements between the IP modules.
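The accumulation of data movement between functions can be sketched in Python as follows. The sketch assumes that the reference code has been instrumented to emit a trace of (source function, destination function, byte count) records and that each function corresponds to one IP module; the trace format and the names `accumulate_traffic` and `trace` are hypothetical.

```python
from collections import defaultdict

def accumulate_traffic(trace):
    """Sum the data moved between each unordered pair of functions.

    `trace` is an iterable of (src, dst, nbytes) records; this record
    format is an assumption made for illustration.
    """
    totals = defaultdict(int)
    for src, dst, nbytes in trace:
        if src == dst:
            continue  # transfers inside one function need no network link
        pair = tuple(sorted((src, dst)))  # treat A->B and B->A as one link
        totals[pair] += nbytes
    return dict(totals)

trace = [
    ("dct", "quant", 64),
    ("quant", "dct", 16),
    ("quant", "vlc", 32),
    ("dct", "quant", 64),
]
traffic = accumulate_traffic(trace)
```

Folding both directions into one unordered pair is a design choice of this sketch; a directional analysis would key the totals on `(src, dst)` instead.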

The apparatus of claim 1, wherein the tree generation unit

generates the binary tree by pairing the IP modules in descending order of the communication requirement between them, introducing for each pair a parent node having the two IP modules as its child nodes, assigning the communication requirements of the child nodes to the parent node, and repeating a step of pairing the parent nodes in descending order of the communication requirement between them and introducing a higher-level parent node for each pair.
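The level-by-level pairing can be sketched in Python as follows. Several details are assumptions of this sketch: the requirement between two parent nodes is taken as the sum of the requirements between their leaf modules, a node left unpaired at a level is carried up unchanged, ties are broken by input order, and module names are plain strings. The names `build_binary_tree`, `leaves`, and `demand` are hypothetical.

```python
from itertools import combinations

def leaves(node):
    """List the IP modules under a node (a node is a module name or a pair)."""
    if isinstance(node, tuple):
        return leaves(node[0]) + leaves(node[1])
    return [node]

def demand(a, b, traffic):
    """Communication requirement between two nodes, summed over their modules."""
    return sum(traffic.get(frozenset((x, y)), 0)
               for x in leaves(a) for y in leaves(b))

def build_binary_tree(modules, traffic):
    """Pair nodes level by level, highest communication requirement first."""
    nodes = list(modules)
    while len(nodes) > 1:
        pairs = sorted(combinations(nodes, 2),
                       key=lambda p: demand(p[0], p[1], traffic),
                       reverse=True)
        paired, parents = [], []
        for a, b in pairs:
            if a not in paired and b not in paired:
                paired += [a, b]
                parents.append((a, b))  # new parent with children a and b
        # With an odd node count, the unpaired node moves up unchanged.
        parents += [n for n in nodes if n not in paired]
        nodes = parents
    return nodes[0]

traffic = {
    frozenset(("cpu", "mem")): 100,
    frozenset(("dsp", "dma")): 80,
    frozenset(("cpu", "dsp")): 10,
    frozenset(("mem", "dma")): 5,
}
tree = build_binary_tree(["cpu", "dsp", "mem", "dma"], traffic)
```

The heaviest communicating pairs (`cpu` with `mem`, `dsp` with `dma`) end up as siblings, so their traffic stays within the lowest level of the tree.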

The apparatus of claim 1, wherein the tree optimization unit

selects the merge that minimizes the value of the cost function from among four cases: merging the predetermined intermediate node with the right lower node connected to it, merging it with the left lower node connected to it, merging it with both the right and left lower nodes connected to it, and merging it with neither lower node.
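The four-case selection can be sketched in Python as a bottom-up pass over the binary tree. The cost function here is hypothetical, since this section does not give its form: area is approximated by the squared port count of each switch and delay by the depth of each IP module, weighted by `alpha` and `beta`. Merging a lower node into the intermediate node is modeled as lifting that lower node's children into the intermediate node. The pass is a greedy simplification and is not claimed to reproduce the patented optimization exactly.

```python
def cost(node, depth=0, alpha=1.0, beta=1.0):
    """Hypothetical cost: alpha * sum(ports^2) + beta * sum(module depths)."""
    if isinstance(node, str):
        return beta * depth
    return alpha * len(node) ** 2 + sum(
        cost(child, depth + 1, alpha, beta) for child in node)

def expand(child, merge):
    """Lift a lower switch's children into its parent when merging."""
    if merge and not isinstance(child, str):
        return tuple(child)
    return (child,)

def optimize(node, alpha=1.0, beta=1.0):
    """Choose the cheapest of the four merge cases, from the leaves to the root."""
    if isinstance(node, str):
        return node
    left, right = (optimize(child, alpha, beta) for child in node)
    # Cases in order: merge neither, right only, left only, both.
    candidates = [expand(left, ml) + expand(right, mr)
                  for ml in (False, True) for mr in (False, True)]
    return min(candidates, key=lambda n: cost(n, 0, alpha, beta))

tree = (("a", "b"), ("c", "d"))
flat = optimize(tree, beta=3.0)   # delay weighted heavily
deep = optimize(tree, beta=0.1)   # area weighted heavily
```

With a large delay weight the pass collapses the tree into one four-port switch, while a small delay weight keeps the original two-level structure of two-port switches.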

An on-chip network topology generation method comprising:

analyzing communication requirements between IP modules by executing reference code in which an algorithm-level design specification is implemented;

generating a binary tree having the IP modules as its lowest child nodes, based on the communication requirements between the IP modules; and

reconstructing the tree by performing, up to the root node of the binary tree, a process of selecting, from among all possible merges between a predetermined intermediate node of the binary tree and the lower nodes connected to it, the merge that minimizes the value of a cost function defined based on area and communication delay time.

The method of claim 7,

further comprising inserting a path between IP modules whose communication delay time is greater than or equal to a predetermined threshold.

The method of claim 7 or 8,

further comprising generating a topology by replacing the intermediate nodes of the reconstructed tree with crossbar switches, each having as many ports as there are nodes connected to the corresponding intermediate node, and connecting the IP modules to the crossbar switches in the same manner as the connection structure of the reconstructed tree.

The method of claim 7, wherein the analyzing of the communication requirements comprises

accumulating, for each function, the amount of data moved between the functions of the reference code, and thereby analyzing the communication requirements between the IP modules.

The method of claim 7, wherein the generating of the binary tree comprises

pairing the IP modules in descending order of the communication requirement between them, introducing for each pair a parent node having the two IP modules as its child nodes, assigning the communication requirements of the child nodes to the parent node, and repeating a step of pairing the parent nodes in descending order of the communication requirement between them and introducing a higher-level parent node for each pair.

The method of claim 7, wherein the reconstructing of the tree comprises

selecting the merge that minimizes the value of the cost function from among four cases: merging the predetermined intermediate node with the right lower node connected to it, merging it with the left lower node connected to it, merging it with both the right and left lower nodes connected to it, and merging it with neither lower node.