TW201114224A

TW201114224A - NoC-centric system exploration platform and the parallel application communication mechanism description format used in the exploration platform

Info

Publication number: TW201114224A
Application number: TW098133472A
Authority: TW
Inventors: Yar-Sun Hsu; Chi-Fu Chang
Original assignee: Nat Univ Tsing Hua
Priority date: 2009-10-02
Filing date: 2009-10-02
Publication date: 2011-04-16

Abstract

Network-on-Chip (NoC) is a popular solution to the communication performance bottleneck in Systems-on-Chip (SoC), and NoC performance depends significantly on the application traffic. The present invention establishes a system framework that spans multiple layers from a system viewpoint, and defines the interface function behaviors and traffic patterns of each layer. System-level simulation is used to omit unnecessary detail. The present invention provides an application modeling method in which the task graph of a parallel application is described in text, called the Parallel Application Communication Mechanism Description Format (PACMDF), thereby reducing coding overhead. The present invention further provides a system-level NoC simulation framework, called the NoC-centric System Exploration Platform, which defines the service space of each layer in order to separate the traffic patterns and allow the layers to be designed independently. As a result, a new design can be simulated without modifying the simulator framework or the interface designs. The present invention therefore enlarges the design space and increases the design flexibility of NoC simulation, simplifies the design flow before a new NoC design is constructed, and provides a model for evaluating NoC performance at the design stage.

Description

VI. Description of the Invention:

[Technical Field]
The present invention relates to Network-on-Chip (NoC), and more particularly to a system exploration platform for NoC. The platform partitions the system into layers and keeps the simulation models of the layers independent of one another, and it describes the task graph of a parallel application in text form.
[Prior Art]
Advances in very-large-scale integration (VLSI) process technology have made Systems-on-Chip (SoC) increasingly complex. As the number of components such as multi-core processors, IP units, and function controllers grows, the performance bottleneck of an SoC shifts from the computation circuits to the communication circuits. The communication bottleneck becomes increasingly severe, making the communication circuitry the key to SoC design. Network-on-Chip (NoC) is a popular solution to this bottleneck and has become an emerging research area. NoC shifts SoC design from a computation-oriented approach to a communication-oriented one, and it solves many problems of today's mainstream bus architectures, such as poor scalability and low throughput. However, a NoC often uses more network resources, such as buffers and switches, and requires more complex and power-hungry circuits, such as routing units. Design exploration and simulation before a NoC is actually built are therefore increasingly important.

The simulation environment and flow of conventional NoC research are shown in Fig. 1. The application model block 11 describes the traffic pattern; the NoC design block 12 describes the NoC components, computation nodes, adaptors, and so on; and the message characteristics block 13 describes bus transactions, packet formats, flow control units, and so on. These blocks are the inputs to a NoC simulator 14, which outputs a simulation report 15 when the simulation completes. However, the conventional environment of Fig. 1 has no unified specification for describing the inputs of the application model block 11, the NoC design block 12, and the message characteristics block 13. Whenever one block must be replaced to try a different design, the previously built blocks have poor reusability and the design has little flexibility, so the design exploration space is limited.

In addition, CoWare Convergence SC from CoWare and SoC Designer from ARM provide complete frameworks of processing element, IP unit, and bus models. However, because these frameworks use cycle-accurate hardware models and instruction-accurate software models, simulating a complex NoC with them is quite time-consuming. Moreover, these conventional techniques require considerable effort to construct a new application as executable code for input, and to describe a new NoC through a bus interface.

To reduce these overheads, Xu et al., in the IEEE paper "A Methodology for Design, Modeling, and Analysis of Networks-on-Chip", IEEE International Symposium on Circuits and Systems (ISCAS 2005), use a computation-communication network model to construct application traffic patterns. However, the simulation environment they provide is split into many different steps, each using a different simulation tool and different evaluation metrics, and information is lost between steps, so information about the whole system cannot be obtained. In addition, Kangas et al., in "UML-based multiprocessor SoC design framework", ACM Transactions on Embedded Computing Systems (TECS), 2006, Vol. 5, No. 2, use the Unified Modeling Language (UML) and take task-graph-based applications and NoC modules as input. However, their environment cannot directly reuse simulation models written in SystemC, even though SystemC is a language commonly chosen for hardware-software co-simulation.

[Summary of the Invention]
One object of the present invention is to provide a design framework governed by system specifications. It is not a complete NoC simulator; rather, it simplifies the simulation of NoC details that have little effect on performance, increasing simulation speed. The NoC system exploration platform of the present invention simplifies the system design and construction process, supports customized designs, and does not require trivial system-design details to be considered. The present invention is particularly suited to the early stage of system design, where it is used to explore the NoC design space before the software and hardware specifications have been established. It also aims to expand the simulation environment so that the design exploration space increases. The models of the present invention and [...] are themselves independent of any programming language, so [...].

Another object of the present invention is to provide a way of defining applications.
The present invention uses a task-graph-based application model, called the Parallel Application Communication Mechanism Description Format (PACMDF), to generate traffic patterns similar to those of an instruction-set simulator while avoiding the complexity of instruction-accurate modeling, thereby reducing the coding burden.

A further object of the present invention is to provide a system framework in which performance can be evaluated during design. The framework avoids register-transfer-level (RTL) design and does not use a cycle-accurate design; instead, it adopts a cycle-approximate, event-driven design. The present invention also adopts a fully parameterized latency model, so the benefit of each design element to the entire system can be evaluated quantitatively. NoC design must carefully weigh various trade-offs and select the design that is most effective for the system (an ad hoc design), rather than simply applying every network design technique on the chip, because a NoC has more limited resources than a traditional network environment. Through simulation, the benefit of a given part of the communication architecture design to the whole NoC can be evaluated and the most cost-effective option selected. The simulation framework provided by the present invention is used to revise the NoC design during the design process, rather than to simulate it only after the design is complete. More specifically, the present invention can mix different network layers and different granularities for simultaneous verification and redesign; when the network system is designed in layers, the simulation framework can be used [...] to examine the traffic patterns generated by real applications [...].

For convenience of explanation and understanding, the embodiments in this specification are divided into six parts, described in detail separately:
1. System exploration platform for NoC;
2. Performance evaluation;
3. System layering;
4. Application modeling;
5. Parallel Application Communication Mechanism Description Format (PACMDF); and
6. Intermediate-layer modeling.

[System Exploration Platform for NoC]
The "system exploration" of the present invention is defined as [...] at the granularity of transaction-level modeling, [...] the impact on overall NoC performance. [...] The NoC-centric System Exploration Platform is abbreviated as Nocsep; in this specification, "NoC system exploration platform" and "Nocsep" are synonymous and are used interchangeably.

Unlike a general NoC simulator, the NoC system exploration platform of the present invention is intended for exploration rather than for building a more precise [...]. It is used while the system specification is still undetermined, and it explores the possible NoC design space through systematization, standardization, modeling, simulation, and revision. A final design is then selected based on the performance of the various implementations of the design. Furthermore, the word "system" in the name "system exploration platform" means that system design specifications are used to eliminate unnecessary simulation detail and to plan feasible NoC designs in advance.

Nocsep of the present invention is physically divided into three parts, briefly described as follows:
1. Model design: the software models, hardware models, and communication message models required by a NoC-centric system. They model the NoC system across network layers (cross-layer) and at multiple abstraction levels, and can be further divided into two modules:
 a. NoC services: the communication message models, which describe the content of requests issued for network services, together with the control and protocol information of the network resource interfaces. A service denotes any information that flows within a layer or between layers.
 b. NoC service handlers: the NoC software and hardware models, which describe how NoC services are generated or processed.
2. System framework design: a simplified cross-layer system framework is built from the system specification, defining the function behavior of each layer interface and how NoC communication messages are transmitted. Its purpose is to establish traffic patterns from the top-most to the bottom-most system layer.
3. Simulation environment: a simulation environment that implements the Nocsep models and the Nocsep system framework for a system chip, and provides simulation and performance evaluation of the NoC system built with them.

Fig. 2 shows the Nocsep simulation environment.
Compared with the conventional architecture of Fig. 1, the present invention provides several unified, specification-based input descriptions, including the Nocsep application regulations 21, the Nocsep service handler regulations 22, and the Nocsep service regulations 23, from which the framework 24 of the present invention is constructed. Simulation is then performed from these unified input descriptions, and a simulation report is finally obtained. As will be described later, the Nocsep application regulations 21 use the Parallel Application Communication Mechanism Description Format (PACMDF) proposed by the present invention to describe the task graph of a parallel application (shown in Table [...], detailed later), and the Nocsep service regulations 23 correspond to the message layering of the present invention.

The unified specification descriptions of Nocsep have the following advantages:
1. The scale of simulation can be extended to the system level;
2. [...];
3. [...] the simulator implementation, so [...] need not be modified.

[Performance Evaluation]
When a new NoC system is designed, its performance must be understood. It is commonly evaluated by the total execution time the NoC system needs to complete a given application. Most existing simulators estimate NoC design performance from the latency and simulated behavior between the generation of NoC traffic and its completion, using the average throughput, average communication latency, and average contention rate of the NoC as estimation metrics. Typical simulation is random-number simulation driven by the statistical features of an application. In practice, however, most application behavior is not random. Real program traffic must account for interactions between different layers; for example, the order of requests seen by the on-chip communication architecture (OCCA) is affected by how multiple processing units in the system multitask.
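The three metrics named above (average throughput, average communication latency, and average contention rate) can be computed from a simulation trace roughly as follows. This is a minimal sketch assuming a hypothetical per-transfer record, TransferRecord, holding injection time, arrival time, size in flits, and a contention flag; Nocsep's actual report format is not given in this section, and defining the contention rate as the fraction of transfers that lost arbitration at least once is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TransferRecord:
    """One completed transfer in a simulation trace (hypothetical format)."""
    inject_time: float  # when the traffic was generated
    arrive_time: float  # when it reached its destination
    size_flits: int     # amount of data moved, in flits
    contended: bool     # assumption: True if it lost arbitration at least once


def noc_metrics(records: List[TransferRecord], sim_time: float) -> Tuple[float, float, float]:
    """Return (average throughput, average latency, contention rate)."""
    if not records or sim_time <= 0:
        raise ValueError("need at least one record and a positive sim_time")
    throughput = sum(r.size_flits for r in records) / sim_time  # flits per time unit
    latency = sum(r.arrive_time - r.inject_time for r in records) / len(records)
    contention = sum(r.contended for r in records) / len(records)  # fraction of transfers
    return throughput, latency, contention


trace = [
    TransferRecord(0, 10, 4, False),
    TransferRecord(2, 14, 4, True),
    TransferRecord(5, 11, 2, False),
]
print(noc_metrics(trace, sim_time=20))
```

For this trace, 10 flits over 20 time units give a throughput of 0.5, the mean latency is (10 + 12 + 6) / 3, and one transfer in three was contended. As the text notes, such averages say little about real applications unless the trace itself comes from realistic, layered traffic.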
Therefore, Nocsep does not consider the design of the single NoC layer alone; it adds models of higher network layers, such as the task layer, thread layer, node layer, and adaptor layer, covering designs from the software level down to the OCCA level, so that the Nocsep software models can generate traffic patterns closer to those of real applications. When evaluating NoC performance, Nocsep adds the latency of the application's task operations to the simulation time, using combinations of latency functions to approximate the time the corresponding hardware would actually take.

The present invention estimates the execution time of an application by decomposing the application's behavior into many services, preserving the ordering relations among the services, and feeding them into a NoC system composed of a plurality of service handlers. Services denote all information flowing within and between layers, including hardware interface specifications, hardware control signals, software data content, and firmware jobs or tasks. Different network layers use services at different abstraction levels. A service handler is software or hardware that can process or forward services. In other words, the total execution time is evaluated as the sum of a plurality of service handling latencies; where latencies overlap, Nocsep of the present invention takes the overlap into account as well.

The present invention divides the NoC design space into many logic blocks [...], and the hardware details are encapsulated [...]. Viewed microscopically, each service handling behavior is still described with cycle-approximation latency modeling and can be further divided into sub-behaviors, which may execute in parallel or sequentially.
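Estimating the total execution time as the sum of service handling latencies can be sketched as below. The Service and ServiceHandler classes, the linear latency function, and all numbers are hypothetical illustrations of a parameterized latency model, not the patent's own models; overlapping latencies, which Nocsep also accounts for, are deliberately ignored here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Service:
    """Information that flows within or between layers (hypothetical fields)."""
    name: str
    size_bytes: int


@dataclass
class ServiceHandler:
    """Software or hardware that processes or forwards a service."""
    name: str
    latency: Callable[[Service], float]  # parameterized latency function


def linear_latency(fixed: float, per_byte: float) -> Callable[[Service], float]:
    """Hypothetical latency function: a fixed overhead plus a size-dependent term."""
    return lambda s: fixed + per_byte * s.size_bytes


def execution_time(services: List[Service], path: List[ServiceHandler]) -> float:
    """Each service, in order, passes through every handler on the path; the
    estimate is the sum of all handling latencies (overlap is ignored here)."""
    return sum(h.latency(s) for s in services for h in path)


path = [
    ServiceHandler("adaptor", linear_latency(fixed=2.0, per_byte=0.0)),
    ServiceHandler("switch", linear_latency(fixed=1.0, per_byte=0.25)),
]
services = [Service("msg0", 8), Service("msg1", 16)]
print(execution_time(services, path))
```

With these parameters, msg0 costs 2 + (1 + 0.25 x 8) = 5 and msg1 costs 2 + (1 + 0.25 x 16) = 7, so the estimate is 12. Changing a design then means swapping a latency function, not rewriting the simulator.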
Some sub-behaviors can start only after one or more events have occurred, and this waiting time also counts as latency. A tree-like structure is therefore formed, and the latency of the top-level behavior is the sum of the latencies in the tree.

To make the cycle-approximate latency model more concrete: the total execution time of an application consists of all of its behaviors, i.e., execution time = {computation behavior, communication behavior}. Each behavior can be decomposed into many sub-behaviors, e.g., communication behavior = {adaptor elapsed time, switch elapsed time, ...}, and the switch elapsed time can be decomposed further as switch elapsed time = {routing, resource allocation, ...}. This forms a tree-like structure, and the latency of the top-level behavior is the total latency of this tree expanded level by level.

In the present invention, the cycle-approximate model does not use clock-edge events to describe behavior; in a cycle-accurate transaction-level model, by contrast, the clock signal is used to record when events occur. The cycle-approximate model approximates a simplified asynchronous-circuit description. From the hardware viewpoint, the cycle-approximate latency model does not simulate every detail, but it estimates the latency incurred by a service sufficiently well. For example, a switch only needs to be simulated up to the point where the route is available: the new route may be produced by a routing computation unit after the message enters the switch, or it may come from source routing carried in a message fragment. Whichever way the route is produced, the routing component model of the present invention directly assigns the latency and then attaches the routing information to the message.
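The latency tree described above can be evaluated recursively. In this sketch the Behavior class and the cycle counts are hypothetical; sequential sub-behaviors are summed as the text describes, while treating parallel sub-behaviors as costing only their slowest branch (max) is an added assumption about how overlapping latencies might be handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Behavior:
    """A node of the latency tree. A leaf carries its own latency (for example an
    event wait); an inner node combines its sub-behaviors."""
    name: str
    latency: float = 0.0    # used only by leaves, in cycles (hypothetical values)
    parallel: bool = False  # assumption: parallel sub-behaviors overlap fully
    children: List["Behavior"] = field(default_factory=list)


def total_latency(b: Behavior) -> float:
    """Sequential sub-behaviors add up; parallel ones cost as much as the slowest."""
    if not b.children:
        return b.latency
    parts = [total_latency(c) for c in b.children]
    return max(parts) if b.parallel else sum(parts)


# execution = {computation, communication}
# communication = {adaptor elapsed, switch elapsed}
# switch elapsed = {routing, resource allocation}
switch = Behavior("switch", children=[Behavior("routing", 2), Behavior("resource allocation", 3)])
communication = Behavior("communication", children=[Behavior("adaptor", 4), switch])
execution = Behavior("execution", children=[Behavior("computation", 10), communication])
print(total_latency(execution))
```

The example tree evaluates to 10 + (4 + (2 + 3)) = 19 cycles; marking the top node `parallel=True` would instead give max(10, 9) = 10.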
[Nocsep System Layering]
To approach more realistic traffic patterns, the present invention considers, in addition to the NoC layer, models of higher network layers, such as the task layer, thread layer, node layer, and adaptor layer.

Accordingly, the present invention divides a NoC system into five layers, as shown in Fig. 3, whose characteristics are briefly as follows:
1. Task layer 30: task graphs describe the application's characteristics, including the amount and generation conditions of computation and communication services, and external I/O behavior is virtualized. In the present invention, tasks describe the sources of all services in the system. The task layer is contained in the thread layer.
2. Thread layer 31: thread modules describe inter-task communication, task grouping, thread mapping, and the program parallelism model.
3. Node layer 32: node modules describe in detail task arbitration and task scheduling, the multi-tasking strategy of the operating system, the computing power of the computation unit, and the behavior of the communication unit.
4. Adaptor layer 33: adaptor modules specify the on-chip communication architecture (OCCA) interface design and the different OCCA elements, such as circuit-switched networks, packet-switched networks, and bus-like communication architectures.
5. OCCA layer 34: all objects and sub-objects that make up the OCCA are placed at this layer, which corresponds to the communication architecture design of the NoC system. The name "on-chip communication architecture" indicates that, besides NoC, this layer also supports general communication architectures.

Fig. 3 is a schematic diagram of the Nocsep service model layering. The blocks represent the layers defined by Nocsep, and the arrows between blocks represent traffic formats. The input to task 30 in Fig. 3 is the source of all traffic. The "channels" shown in Fig. 3, namely blocks 36, 37, and 38, can be regarded as the "interfaces" of different hardware designs; all components below a channel implement the functions of that interface. If different hardware designs must be simulated at the same network layer, the hardware models of the other layers need not change as long as the interface stays the same.

Tasks 30 are contained within threads 31 and are passed to node 32 in the "message" traffic format; at each layer of Fig. 3, information is converted into a different traffic format. Node 32 converts a message into several units in the "stream" traffic format, and the streams reach the adaptor layer 33 through the process channel 36. The process channel is a pseudo-channel concept that can actually be implemented as a thread-to-thread, node-to-node, or adaptor-to-adaptor channel. Passing through the adaptor layer 33, streams are converted into the "transfer package" traffic format; the real network channel 37 is the input/output interface of the OCCA layer 34. After passing through the OCCA layer 34, transfer packages are converted into physical channel units and reach the physical layer 35 through the lowest-level physical channel 38. When higher-layer traffic in Fig. 3 is converted into lower-layer traffic units, the resulting lower-layer units collectively hold all the content of their source traffic format.

The present invention partitions the NoC system design space by network layering, building a standardized view of the NoC. Each network layer design is then modeled with different simulation models at different abstraction levels, and progressively finer simulations are completed one by one.
The purpose of layering in the present invention is to make the service design space of each layer independent of the others, so that each kind of service handler knows only the information of its own layer. Because the service design spaces of the layers are independent, a single service model suffices, and a specific layer is simulated by disabling the layers that are not of interest. There is therefore no need to modify the framework or interface design of the simulator: a new service design can be simulated, and its hardware functions used, simply by changing the data structure corresponding to the service. This reduces the coding burden and increases design space and flexibility. See Table 1 below for examples of the service types and service content at each layer.
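The Fig. 3 conversion chain (message, stream, transfer package, physical channel unit) and the idea of disabling layers that are not of interest can be sketched together as follows. This is a minimal sketch, not the Nocsep implementation: the TrafficUnit type, the LAYERS table, the byte-based unit sizes, and the functions convert, simulate, and reassemble are all hypothetical. Each lower-layer unit keeps a link to its parent, and the children of one parent jointly carry all of its content, as the text requires.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TrafficUnit:
    """One unit of some layer's traffic format, linked to the unit it came from."""
    fmt: str
    payload: bytes
    parent: Optional["TrafficUnit"] = None
    seq: int = 0  # position among its siblings


def convert(unit: TrafficUnit, fmt: str, chunk: int) -> List[TrafficUnit]:
    """Split one upper-layer unit into lower-layer units of at most `chunk` bytes;
    together the children carry all of the parent's content."""
    return [TrafficUnit(fmt, unit.payload[i:i + chunk], unit, n)
            for n, i in enumerate(range(0, len(unit.payload), chunk))]


# Hypothetical layer table: output format of each layer and its unit size in bytes.
LAYERS: Dict[str, Tuple[str, int]] = {
    "node": ("stream", 8),
    "adaptor": ("transfer package", 4),
    "occa": ("physical channel unit", 2),
}


def simulate(message: TrafficUnit, enabled: set) -> List[TrafficUnit]:
    """Push one message down the layers in order. A disabled layer passes its
    input through unchanged, so the same service model is reused and only the
    `enabled` set changes between experiments."""
    units = [message]
    for layer, (fmt, chunk) in LAYERS.items():  # dict order = layer order
        if layer in enabled:
            units = [child for u in units for child in convert(u, fmt, chunk)]
    return units


def reassemble(siblings: List[TrafficUnit]) -> bytes:
    """Rebuild a parent's payload from its children (all must share one parent)."""
    return b"".join(u.payload for u in sorted(siblings, key=lambda u: u.seq))


msg = TrafficUnit("message", b"hello network-on-chip")  # 21 bytes
print(len(simulate(msg, {"node"})), len(simulate(msg, set(LAYERS))))
```

Enabling only the node layer yields three streams; enabling all three layers yields the full chain down to physical channel units. Focusing on, say, the adaptor layer is just `simulate(msg, {"adaptor"})`, with no change to the layer code itself.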

Table 1
Level | Service type | Service content
Task layer | Trace line of task | 1. Task type; 2. Computation service content; 3. Communication service content
Node layer | Message | Task group ID
Adaptor layer | Stream | 1. Stream data size; 2. High-level protocol information; 3. QoS constraints; 4. ID of the reserved virtual network (virtual channel)
OCCA layer | Packet, flow control unit (flit) | 1. Packetization; 2. Routing assignment information; 3. Flow unit priority; 4. ID of the reserved real network resource (e.g., a virtual channel)
Physical layer | Physical channel unit, buffer item | 1. Time-division multiplexing unit; 2. Broken rate and correction overhead; 3. Detailed design at the bit level, e.g., the first 5 bits record the route, the next 25 bits record the content, and the last 2 bits are used for error checking
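The bit-level example in the physical-layer row of Table 1 (5 route bits, 25 content bits, 2 error-checking bits, 32 bits in total) can be illustrated with a pack/unpack pair. Placing the route field at the most significant end, and the 2-bit check itself (number of 1 bits modulo 4), are assumptions made for this sketch; the table does not specify the error-checking code.

```python
ROUTE_BITS, DATA_BITS, CHECK_BITS = 5, 25, 2  # field widths from the Table 1 example
assert ROUTE_BITS + DATA_BITS + CHECK_BITS == 32


def check_bits(route: int, data: int) -> int:
    """Hypothetical 2-bit check: number of 1 bits in route+content, modulo 4."""
    return bin((route << DATA_BITS) | data).count("1") % 4


def pack(route: int, data: int) -> int:
    """Build a 32-bit physical channel unit: route | content | check (MSB first)."""
    if not (0 <= route < 1 << ROUTE_BITS and 0 <= data < 1 << DATA_BITS):
        raise ValueError("field out of range")
    word = (route << (DATA_BITS + CHECK_BITS)) | (data << CHECK_BITS)
    return word | check_bits(route, data)


def unpack(word: int):
    """Split a 32-bit unit into (route, content, check_ok)."""
    route = word >> (DATA_BITS + CHECK_BITS)
    data = (word >> CHECK_BITS) & ((1 << DATA_BITS) - 1)
    ok = (word & ((1 << CHECK_BITS) - 1)) == check_bits(route, data)
    return route, data, ok


unit = pack(0b10110, 12345)
print(f"{unit:032b}", unpack(unit))
```

A single flipped bit in the route or content field changes the count of 1 bits by one, so `unpack` reports `ok=False` for any single-bit error in those fields.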
In the present invention, "services" denote all the information that flows within and between levels. This includes hardware interface specifications, hardware control signals, software data contents, and software jobs or tasks. Different network levels use services at different levels of abstraction. A service handler is the software or hardware that processes or delivers services.

[Application modeling]

Tasks (task layer) are contained within threads (thread layer). The task is the traffic source of the present invention. It is defined as every input, at any point during execution, that is relevant to the system simulation. Software and hardware information that enters the NoC from outside is contained in the task information (or trace-line information). Examples are the application at the top of the network hierarchy and the system's I/O components.

Figure 4A shows the application model of the present invention. A thread can generate traffic in three ways: random traffic, application-driven traffic and event-triggered traffic. Figure 4A shows all three: application-driven traffic G1, random traffic G2 and event-triggered traffic G3.
- Random traffic G2 is a software or hardware service generated randomly according to the statistical features of the traffic.
- Event-triggered traffic G3 is a software or hardware service produced by triggering. It is generated in response to certain events received by the thread, such as a data request.
- Application-driven traffic G1 is generated by an application. In the present invention it is described with the Parallel Application Communication Mechanism Description Format (PACMDF), which is explained in detail later.

Any tasks can be combined into a task group, and all tasks in a group share the same task group ID. In Figure 4A, application-driven traffic G1 contains three task groups; the circles represent computation tasks and the arrows represent communication tasks. A thread consists of several tasks. Figure 4A shows five threads, T1, T2, T3, T4 and T5, of which threads T1, T2 and T3 each contain several task groups.

Application traffic originates from tasks and passes through the thread layer and the node layer. For the transmitted content and the process, see the detailed description under "NoC system layering" in the embodiments. Figure 4A also shows four nodes, N1, N2, N3 and N4, of which node N3 contains two threads, T3 and T4. The node layer is described in detail later.

[Parallel Application Communication Mechanism Description Format (PACMDF)]

The present invention also proposes a format for describing the task graph of a parallel application, that is, the application-driven traffic G1 in Figure 4A. The format is called the Parallel Application Communication Mechanism Description Format, abbreviated PACMDF.
In this specification, "Parallel Application Communication Mechanism Description Format" and "PACMDF" are equivalent and are used interchangeably. PACMDF is a format for describing the communication features of parallel applications. It expresses those features in a defined description format, so it is easy to write and to modify. A NoC system is highly correlated with the applications it executes. Simulating a NoC therefore requires not only a hardware model but also a software application model that interacts with it, so that software and hardware can be simulated together. PACMDF describes each task with one line of text. This removes the complex information of a diagram and generates the input code of an application in text form. PACMDF divides the tasks of an application's task graph into eight kinds, summarized in Table 2.

Table 2
Category | Sub-category | Content
Computation task | Computation task | Computation units consumed
Communication task | Data sending task | Size of the data sent to the thread
Communication task | Notification sending task | Sends a non-data message, such as an acknowledgement packet or a control packet
Communication task | Memory read | Reads from an address in memory; includes the read address and the amount of data
Communication task | Memory write | Writes to an address in memory; includes the address and the amount of data
Task graph control | Thread re-run | Not shown in the task graph; covers repeated execution (a specified number of times, or while a repeat condition holds), unlimited repetition, and a limited number of repetitions followed by ending the whole task graph
Task graph control | Supplemental information of the previous task | Supplementary field
Task graph control | Thread forced to idle for a while | Suspends the application for a period of time without occupying NoC resources
A PACMDF line consists of several fields, and each task kind listed in Table 2 uses a different set of PACMDF fields. The PACMDF fields are listed in Table 3.

Table 3
Field | Meaning | Remarks or example
Mark | Line mark | "#": the line is a comment; ";": the line must be executed
Type | Task type | busy: computation or I/O access; send: transmission of messages, including data instructions; Ctrl: control signal
Source | Source task | Task number
Destination | Destination task | Task number; used for communication tasks
Size/Execution time | Size or execution time | Operation value of a computation task, or number of bytes transmitted by a communication task
Id | Task code | -
Triggering source | Source of the trigger | Trigger that the task needs in order to execute
Triggering task id | Number of the triggering task | Points to the code of the task that causes the trigger
Effective | Effectiveness | Describes the validity of the task, e.g., the probability of executing the line or the condition for execution control

Table 3 lists only the basic fields; in practice, fields can be added as needed. Table 3 serves only to explain PACMDF and is not intended to limit the invention.

Figure 4B is an example task graph. It shows eight blocks representing eight computation tasks: computation tasks 41, 42, 43, 44, 45, 46, 47 and 48. The content of each block gives an operation unit and an operation value; for example, inp1000 denotes adding 1000. In Figure 4B, each line segment represents a communication task.

The number on each segment gives the length of the transmitted data in bytes. Computation task 41 is triggered by itself. Computation task 48 is triggered by any one of the preceding computation tasks 45, 46 or 47. This parallel pipeline application stops after computation task 48 has executed 1000 times.

Table 4 is the PACMDF representation of Figure 4B. For convenience of explanation, a column giving the row number has been added to Table 4; in actual use, this first column can be omitted. Each row of Table 4 represents one task:
- rows whose Type is busy are computation tasks;
- rows whose Type is send are communication tasks;
- rows whose Type is Ctrl are control signals.

The tasks related to computation task 41 decompose into eight rows, rows 1 to 8 of Table 4.
- Row 1 starts with #, so it is a comment and is not executed.
- Row 2 initializes a computation task.
- Row 3 then executes the operation in the block, inp1000, i.e., adds 1000.
- After the operation finishes, row 4 transmits 64 bytes of data to the destination block 42. The Id field of row 4 is S2. The S before the 2 means that this row triggers the tasks of the rows whose Triggering task id field is 2, namely rows 13, 19 and 25 of Table 4. Those rows must therefore wait for the task in row 4 to finish.
- The Effective field of row 4 is p1, meaning that the row is executed with probability 1.

In the Effective field of a control row, such as the 3000 in row 52, the value gives the number of times the row is executed repeatedly. In rows 49 to 51, the Size/Execution time field gives a feature of the task. The value "w or" corresponds to Figure 4B: computation task 48 is executed only after any one of computation tasks 45, 46 or 47 has arrived.
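
A minimal sketch follows of how PACMDF lines could be read, and of how the S-prefixed Id and the "w or" condition could be interpreted. It assumes, hypothetically, that the fields appear in the order of Table 3 and are separated by tab characters, since the patent does not specify a separator. The class and function names are illustrative. The sample lines reproduce rows 4, 8, 13, 19 and 25 of the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple, Union

# Field order of Table 3; tab-separated cells are an assumption.
FIELDS = ("mark", "type", "source", "destination", "size",
          "id", "trig_source", "trig_task_id", "effective")


@dataclass
class TraceLine:
    mark: str              # "#" comment, ";" line must be executed
    type: str = ""         # busy / send / Ctrl / para (comment text for "#")
    source: str = ""
    destination: str = ""
    size: str = ""         # bytes for send; operation such as inp1000 for busy
    id: str = ""           # task code; "S<n>" also triggers rows whose trig id is n
    trig_source: str = ""
    trig_task_id: str = ""
    effective: str = ""    # "p<x>" = probability, plain integer = repeat count

    def trigger_key(self) -> Optional[str]:
        return self.id[1:] if self.id.startswith("S") else None

    def effective_value(self) -> Tuple[str, Optional[Union[int, float]]]:
        if self.effective.startswith("p"):
            return ("probability", float(self.effective[1:]))
        if self.effective.isdigit():
            return ("repeat", int(self.effective))
        return ("none", None)


def parse_line(text: str) -> TraceLine:
    if text.startswith("#"):
        return TraceLine(mark="#", type=text[1:].strip())
    cells = [c.strip() for c in text.split("\t")][:len(FIELDS)]
    cells += [""] * (len(FIELDS) - len(cells))
    return TraceLine(*cells)


def rows_triggered_by(trace: List[TraceLine], line: TraceLine) -> List[TraceLine]:
    """Rows that wait for `line` because their Triggering task id matches its S<n>."""
    key = line.trigger_key()
    return [t for t in trace if key is not None and t.trig_task_id == key]


def wait_or_satisfied(conditions: List[TraceLine], finished: Set[str]) -> bool:
    """'w or' rows: the waiting task may run once ANY listed trigger has finished."""
    return any(c.trig_task_id in finished for c in conditions if c.size == "w or")


if __name__ == "__main__":
    def row(*cells: str) -> str:
        return "\t".join(cells)

    trace = [parse_line(t) for t in (
        "# task T41",
        row(";", "send", "41", "42", "64", "S2", "", "", "p1"),      # row 4
        row(";", "Ctrl", "41", "end", "", "3", "", "", "1000"),      # row 8
        row(";", "busy", "42", "", "inp1000", "5", "", "2", "p1"),   # row 13
        row(";", "busy", "43", "", "inp1000", "9", "", "2", "p1"),   # row 19
        row(";", "busy", "44", "", "inp1000", "13", "", "2", "p1"),  # row 25
    )]
    print([t.source for t in rows_triggered_by(trace, trace[1])])  # ['42', '43', '44']
    print(trace[2].effective_value())                               # ('repeat', 1000)
```

In this reading, a line is parsed on its own and dependencies are resolved afterwards by matching keys. This mirrors the way the text says row 4 (Id S2) releases the rows whose Triggering task id is 2.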

The value complex in the Triggering task id field means that the task waits for the conditions listed in the rows that follow it; for example, row 48 waits for the conditions in rows 49 to 51. Task graphs with other special conditions can likewise be expressed in text form with PACMDF, as in Table 4. Table 4 can be understood by reading it against Figure 4B, so it is not described further.

Table 4
Row | Mark | Type | Source | Destination | Size/Execution time | Id | Triggering source | Triggering task id | Effective
1 | # | task T41 | - | - | - | - | - | - | -
2 | ; | busy | 41 | - | 1 | 1 | Initial | - | -
3 | ; | busy | 41 | - | inp1000 | 1 | Initial | - | -
4 | ; | send | 41 | 42 | 64 | S2 | - | - | p1
5 | ; | send | 41 | 43 | 64 | S2 | - | - | p1
6 | ; | send | 41 | 44 | 64 | S2 | - | - | p1
7 | ; | busy | 41 | - | inp1000 | 1 | - | - | p1
8 | ; | Ctrl | 41 | end | - | 3 | - | - | 1000
9 | # | task T42 | - | - | - | 4 | - | - | -
10 | ; | busy | 42 | - | 1 | 1 | Initial | - | -
11 | ; | busy | 42 | - | inp1000 | 5 | Initial | - | -
12 | ; | send | 42 | 45 | 64 | S6 | - | - | p1
13 | ; | busy | 42 | - | inp1000 | 5 | - | 2 | p1
14 | ; | Ctrl | 42 | end | - | 7 | - | - | 1000
15 | # | task T43 | - | - | - | 8 | - | - | -
16 | ; | busy | 43 | - | 1 | 1 | Initial | - | -
17 | ; | busy | 43 | - | inp1000 | 9 | Initial | - | -
18 | ; | send | 43 | 46 | 64 | S10 | - | - | p1
19 | ; | busy | 43 | - | inp1000 | 9 | - | 2 | p1
20 | ; | Ctrl | 43 | End | - | 11 | - | - | 1000
21 | # | task T44 | - | - | - | 12 | - | - | -
22 | ; | busy | 44 | - | 1 | 1 | Initial | - | -
23 | ; | busy | 44 | - | inp1000 | 13 | Initial | - | -
24 | ; | send | 44 | 47 | 64 | S14 | - | - | p1
25 | ; | busy | 44 | - | inp1000 | 13 | - | 2 | p1
26 | ; | Ctrl | 44 | End | - | 15 | - | - | 1000
27 | # | task T45 | - | - | - | 16 | - | - | -
28 | ; | busy | 45 | - | 1 | 1 | Initial | - | -
29 | ; | busy | 45 | - | inp1000 | 16 | Initial | - | -
30 | ; | send | 45 | 48 | 64 | S18 | - | - | p1
31 | ; | busy | 45 | - | inp1000 | 16 | - | 6 | p1
32 | ; | Ctrl | 45 | End | - | 19 | - | - | 1000
33 | # | task T46 | - | - | - | 20 | - | - | -
34 | ; | busy | 46 | - | 1 | 1 | Initial | - | -
35 | ; | busy | 46 | - | inp1000 | 21 | Initial | - | -
36 | ; | send | 46 | 48 | 64 | S22 | - | - | p1
37 | ; | busy | 46 | - | inp1000 | 21 | - | 10 | p1
38 | ; | Ctrl | 46 | End | - | 23 | - | - | 1000
39 | # | task T47 | - | - | - | 24 | - | - | -
40 | ; | busy | 47 | - | 1 | 1 | Initial | - | -
41 | ; | busy | 47 | - | inp1000 | 25 | Initial | - | -
42 | ; | send | 47 | 48 | 64 | S26 | - | - | p1
43 | ; | busy | 47 | - | inp1000 | 25 | - | 14 | p1
44 | ; | Ctrl | 47 | End | - | 27 | - | - | 1000
45 | # | task T48 | - | - | - | 28 | - | - | -
46 | ; | busy | 48 | - | 1 | 1 | Initial | - | -
47 | ; | busy | 48 | - | inp1000 | 29 | Initial | - | -
48 | ; | busy | 48 | - | inp1000 | 29 | - | complex | p1
49 | ; | para | 48 | - | w or | 29 | 13 | 18 | -
50 | ; | para | 48 | - | w or | 29 | 14 | 22 | -
51 | ; | para | 48 | - | w or | 29 | 15 | 26 | -
52 | ; | Ctrl | 48 | End | - | 31 | - | - | 3000
53 | # | task T49 | - | - | - | 32 | - | - | -
54 | ; | Ctrl | 49 | End | - | 35 | - | - | 1
55 | # | END OF TRACE FILE | - | - | - | 36 | - | - | -

[Middle layer modeling]
The present invention provides detailed modeling of the middle layer. The middle layer is the set of levels between the application layer and the NoC, and its modeling comprises node modeling and adaptor modeling. A node combines the processing element structure with operating-system process handling. The node layer emphasizes the behaviors that significantly affect traffic, so many details of the real system that are not needed can be omitted. Figure 5 shows the node model. Tasks coming from threads enter the request table 51, a list that temporarily holds all incoming messages. The request table 51 contains a plurality of slots 511; each slot 511 is assigned to a fixed thread ID and a specified task priority. Figure 5 also shows three core units 55: one computation core and two communication cores.
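
The request table and the core units can be sketched as follows. The sketch also includes one possible form of the arbitration performed by the kernel manager described in the next paragraph. Two rules in it are assumptions made for illustration: (1) a higher priority number wins, with ties going to the lower thread ID; (2) each kernel manager dispatches one message per round. The names `RequestTable`, `Message` and `kernel_step` are hypothetical. The sketch reproduces two of the traffic-distortion cases that the text lists: an occupied slot rejects a message, and too few kernel managers or cores leave messages idle.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

SlotKey = Tuple[int, int]  # (thread ID, task priority) assigned to one slot 511


@dataclass
class Message:
    thread_id: int
    priority: int
    kind: str  # "computation" or "communication"


class RequestTable:
    """Request table 51: holds at most one pending message per slot."""

    def __init__(self, keys: List[SlotKey]) -> None:
        self.slots: Dict[SlotKey, Optional[Message]] = {k: None for k in keys}

    def offer(self, msg: Message) -> bool:
        key = (msg.thread_id, msg.priority)
        if key not in self.slots or self.slots[key] is not None:
            return False  # no free slot: the message cannot be served
        self.slots[key] = msg
        return True

    def pick(self, kind: str) -> Optional[Message]:
        # Assumed arbitration: highest priority first, ties go to the lower thread ID.
        ready = [k for k, m in self.slots.items() if m is not None and m.kind == kind]
        if not ready:
            return None
        key = max(ready, key=lambda k: (k[1], -k[0]))
        msg, self.slots[key] = self.slots[key], None
        return msg


def kernel_step(table: RequestTable, free_cores: Dict[str, int], managers: int) -> List[Message]:
    """One round: each kernel manager dispatches at most one message to a free
    core of the matching kind; anything left over stays idle in the table."""
    dispatched: List[Message] = []
    for kind, cores in free_cores.items():
        while cores > 0 and len(dispatched) < managers:
            msg = table.pick(kind)
            if msg is None:
                break
            dispatched.append(msg)
            cores -= 1
    return dispatched


if __name__ == "__main__":
    table = RequestTable([(1, 0), (1, 1), (2, 0)])
    print(table.offer(Message(1, 1, "computation")))  # True
    print(table.offer(Message(1, 1, "computation")))  # False: slot already occupied
    table.offer(Message(2, 0, "communication"))
    print(kernel_step(table, {"computation": 1, "communication": 2}, managers=1))
```

The numbers of slots, kernel managers and cores are constructor or call arguments. This matches the statement that these quantities can be parameterized.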
A kernel manager 52 is a software unit responsible for arbitration. It selects a message from the request table 51 and, through a task scheduler 54, assigns it to a core unit 55, which then processes the message.
- If the core unit 55 is a computation unit, it resolves the computation task assigned to it by delaying for a period determined by the computing capability configured for that unit. At the same time it generates the related status events, which are returned through the corresponding ports 56 to the thread from which the message came.
- If the core unit 55 is a communication unit, it generates the data required by the task and passes them through a port to the adaptor. The port communicates with the adaptor, and the data are converted into the NoC traffic format in the adaptor by a mechanism described later.

Figure 5 also contains an event collector and trigger unit 53. When event triggering is required, this unit collects the events for the corresponding threads. Note that a message can only be processed after it has been selected by the kernel manager 52. The ports 56 in Figure 5 are not physical ports but pseudo ports. The node modeling of the present invention is also flexible, so the number of kernel managers, computation cores and communication cores in Figure 5 can all be parameterized. Figure 5 serves only as an illustrative example and is not intended to limit the invention. Under the node model of Figure 5, traffic distortion arises in the following cases:
1. If a slot 511 is occupied, a message cannot be served.
2. If there are not enough kernel managers 52 or core units 55, messages are left idle.
3. Even the time-sharing mechanism of the core units 55 affects traffic.

The adaptor separates the traffic between the node and the NoC.
Because of this layer, different NoC designs can be compared under the same simulation conditions. Figure 6 shows the adaptor model 6. The communication core of a node 66 (shown in Figure 5) must use a manager allocator 61 and a buffer resource allocator 63 to allocate a manager resource 62 and a buffer resource 64. This allocation determines whether a stream can be sent or must keep waiting because resources are insufficient. The manager resource 62 contains a plurality of stream managers, and the buffer resource 64 contains a plurality of package queues. Once a stream has been allocated and starts transmitting, the communication core of node 66 sends its data to a package queue in the buffer resource. In that queue, the stream is converted into NoC transfer packages.

A NoC transfer package is a data structure that the NoC can transmit. A packet-switched network or a flit-based direct network uses packets or flits as transfer packages. A circuit-switched NoC or a direct network of that kind uses transaction units as transfer packages. The adaptor 6 contains a port 651. It is responsible for encapsulating the transfer packages, sending them from port 651 to port 652 of the NoC component 67, and maintaining end-to-end flow control. When port 652 of the NoC is busy or a package queue is full, the adaptor 6 must wait. If the application is latency-sensitive or the system's buffer space is very limited, the adaptor 6 has a particularly important effect on performance.
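
The allocation and waiting behavior of the adaptor can be sketched as follows. This is a hypothetical model, not the patent's implementation. It assumes that each stream needs exactly one stream manager and one package queue, that a stream is cut into fixed-size packages, and that both resources are released once the last package has left the queue. The names `Adaptor`, `Stream`, `admit`, `fill` and `send` are illustrative.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Tuple

Package = Tuple[int, int]  # (stream ID, payload bytes)


@dataclass
class Stream:
    stream_id: int
    size: int  # bytes to send


class Adaptor:
    """Hypothetical adaptor 6: one stream manager plus one package queue per stream."""

    def __init__(self, managers: int, queues: int, max_queue_len: int, package_bytes: int) -> None:
        self.free_managers = managers                      # manager resource 62
        self.free_queues: List[int] = list(range(queues))  # buffer resource 64
        self.queues: Dict[int, Deque[Package]] = {q: deque() for q in range(queues)}
        self.max_queue_len = max_queue_len
        self.package_bytes = package_bytes
        self.active: Dict[int, List[int]] = {}  # stream ID -> [queue ID, bytes not yet queued]

    def admit(self, stream: Stream) -> bool:
        """Allocators 61 and 63; False means the stream must keep waiting."""
        if self.free_managers == 0 or not self.free_queues:
            return False
        self.free_managers -= 1
        self.active[stream.stream_id] = [self.free_queues.pop(0), stream.size]
        return True

    def fill(self, stream_id: int) -> int:
        """Cut stream data into packages until the queue is full; return how many were added."""
        state = self.active[stream_id]
        queue, added = self.queues[state[0]], 0
        while state[1] > 0 and len(queue) < self.max_queue_len:
            chunk = min(state[1], self.package_bytes)
            queue.append((stream_id, chunk))
            state[1] -= chunk
            added += 1
        return added

    def send(self, stream_id: int, port_busy: bool) -> Optional[Package]:
        """Send one package to the NoC port; None means the adaptor must wait."""
        q, remaining = self.active[stream_id]
        queue = self.queues[q]
        if port_busy or not queue:
            return None
        package = queue.popleft()
        if remaining == 0 and not queue:  # stream finished: release its resources
            del self.active[stream_id]
            self.free_managers += 1
            self.free_queues.append(q)
        return package


if __name__ == "__main__":
    a = Adaptor(managers=1, queues=2, max_queue_len=2, package_bytes=32)
    print(a.admit(Stream(1, 80)), a.admit(Stream(2, 10)))  # True False
    print(a.fill(1), a.send(1, port_busy=True))            # 2 None
```

The maximum queue length and the amount of manager and buffer resources are constructor arguments. This is in line with the parameterization of this level described in the text.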
Traffic throughput is also affected. At this level, the package generation rate, the maximum queue length, the handling latency of each step, and the total buffer resources are all parameterized.

The NoC system design space of the present invention is explicitly partitioned. The system is divided into a plurality of layers, and each layer is divided into a plurality of components. Each component is further divided into models at a plurality of abstraction levels, which are completed with a plurality of latency parameters. The invention thus cuts the NoC design space into many design blocks and models them at abstraction levels. At an abstraction level, hardware details are wrapped inside higher-level components to form a package. Viewed microscopically, such a package still retains the good characteristics of the hardware design, while the detail of hardware construction and the simulation time are reduced.

The embodiments disclosed above serve only to illustrate the features and spirit of the invention and are not intended to limit the invention itself. The scope of protection of the invention shall be defined by the following claims, given the broadest interpretation in light of the specification. All equivalent changes or modifications covered by it shall fall within the scope protected by the invention.

[Brief description of the drawings]
To make the above and other objects, features, advantages and embodiments of the invention easier to understand, the drawings are described as follows:
Figure 1 is a schematic diagram of a conventional NoC simulation environment;
Figure 2 is a schematic diagram of the NoC simulation environment of the present invention;
Figure 3 is a layering diagram of the NoC system of the present invention;
Figure 4A shows the application model of the present invention;
Figure 4B shows an example task graph;
Figure 5 shows the node model of the present invention;
Figure 6 shows the adaptor model of the present invention.

[Description of the main reference numerals]
11 Application
12 NoC design
13 Message characteristics
14 NoC simulator
15 Simulation report
21 Application specification
22 Service handler specification
23 Service specification
24 NoC system exploration framework
30 Task
31 Thread
32 Node
33 Adaptor
34 On-chip communication architecture
35 Physical layer
36 Process channel
37 Real network channel
38 Lowest-level physical channel
41, 42, 43, 44, 45, 46, 47, 48 Computation tasks
51 Request table
511 Slot
52 Kernel manager
53 Event collector and trigger unit
54 Task scheduler
55 Core unit
56 Port
6 Adaptor
61 Manager allocator
62 Manager resource
63 Buffer resource allocator
64 Buffer resource
651, 652 Ports
66 Node
67 NoC component

Claims

VII. Claims:
1. A system exploration platform for a network-on-chip (NoC), comprising:
a. a model design, which models a NoC-centric system by integrating a software model, a hardware model and a service information model, wherein the service model describes a plurality of services of the NoC and the hardware model describes how the services are generated and processed; and
b. a system framework setting, which divides the NoC system into layers and defines the functional behavior and the information transmission method of each layer;
whereby the platform provides a performance evaluation method from the top level to the bottom level of the hierarchy. Together with the model design, this allows unnecessary design details of the NoC to be neglected, simplifying the design of the NoC and providing performance evaluation during the design phase.
2. The NoC system exploration platform of claim 1, wherein the layers of the system framework include:
a. a task layer, in which the application is described as a plurality of tasks;
b. a thread layer, each thread comprising at least one task, the tasks being transferred to the node layer;
c. a node layer, comprising at least one kernel manager, at least one computation core and at least one communication core, the node layer further collecting at least one event; and
d. an adaptor layer, which packages the output of the node layer, the adaptor module comprising a manager allocator for allocating a manager resource and a buffer resource allocator for allocating a buffer resource, the data being transferred to the chip communication architecture layer and converted into transfer packages.
3. The system of claim 2, wherein the latency ... wherein the task graph of a parallel application is written in text form as a plurality of rows, each row representing one task, and the fields include:
a. task type, indicating a computation task, a communication task or a control task;
b. task source, describing the source of the task;
c. task destination, used in communication tasks to describe the transmission destination of the communication task;
d. size, describing the number of bytes transmitted by a communication task or the operation value of a computation task;
e. trigger feature, describing the triggering condition of the task;
f. execution condition, describing the number of executions of the task or to perform