CN114915586A - Network-on-chip topology generation - Google Patents

Network-on-chip topology generation

Info

Publication number: CN114915586A
Authority: CN (China)
Prior art keywords: router, identified, traffic, routers, packet rate
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202210113322.1A
Other languages: Chinese (zh)
Inventors: Narayana Sri Harsha Gade, Honnahuggi Harinath Venkata Naga Ambica Prasad, Anup Gangwar, Nitin Kumar Agarwal, Ravishankar Sridharan
Current Assignee: ARM Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: ARM Ltd
Priority claimed from: US 17/171,408 (now US 11,329,690 B2)
Application filed by: ARM Ltd
Publication of: CN114915586A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H04L45/021 Ensuring consistency of routing table updates, e.g. by using epoch numbers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/39 Circuit design at the physical level
    • G06F30/394 Routing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7825 Globally asynchronous, locally synchronous, e.g. network on chip
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/30 Circuit design
    • G06F30/32 Circuit design at the digital level
    • G06F30/337 Design optimisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12 Discovery or management of network topologies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/10 Packet switching elements characterised by the switching fabric construction
    • H04L49/109 Integrated on microchip, e.g. switch-on-chip
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2115/00 Details relating to the type of the circuit
    • G06F2115/02 System on chip [SoC] design
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2117/00 Details relating to the type or aim of the circuit design
    • G06F2117/04 Clock gating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2117/00 Details relating to the type or aim of the circuit design
    • G06F2117/10 Buffer insertion

Abstract

The present disclosure provides a computer-based method and system for synthesizing a NoC that advantageously generates a balanced NoC topology without end-to-end fairness or local credit-based arbitration, and that improves NoC performance when the target device bridge port supports only one incoming physical link per channel. More specifically, certain routers are assigned a clock domain that satisfies the router's minimum frequency while reducing clock domain transitions to neighboring routers, and the traffic flows received by those routers are balanced based on traffic flow packet rates.

Description

Network-on-chip topology generation
Cross Reference to Related Applications
This application is a continuation-in-part (CIP) of U.S. patent application serial No. 17/076,403 (filed 10/21/2020), which is a continuation-in-part (CIP) of U.S. patent application serial No. 16/518,254 (filed 7/22/2019, now U.S. patent No. 10,817,627), the disclosures of which are incorporated herein by reference in their entirety.
Background
The present disclosure relates to networks. More particularly, the present disclosure relates to networks-on-chip (NoCs).
NoCs are network-based communication subsystems implemented on Integrated Circuits (ICs), such as systems-on-chip (SoCs), that enable IC modules to exchange data more efficiently than traditional bus or crossbar architectures. More specifically, NoCs are router-based packet-switched networks that connect IC modules, such as Intellectual Property (IP) cores. NoCs include various components, such as routers, resizers or serializers/deserializers (SerDes), Physical Clock Domain Crossing (PCDC) buffers, pipeline elements, and so forth. NoC synthesis is the process of laying out and configuring NoC components on an IC based on a NoC input specification. Generally, a NoC design must accommodate the data or traffic communicated between IC modules while satisfying various design constraints, such as power, performance and area (PPA), wiring costs, etc., which may conflict with one another.
NoC synthesis includes, among other things, generating a topology for the NoC, which is an arrangement of routers, connections, and traffic paths or routes between IC modules. A poorly designed NoC topology may significantly impact the PPA, wiring costs, etc. of the NoC, and may cause head-of-line (HoL) blocking across traffic classes. HoL blocking occurs when a sequence of packets from one traffic class is blocked by packets from another traffic class even when the route of the blocked traffic class is clear. HoL blocking across traffic classes can degrade NoC performance.
Importantly, a properly designed NoC is balanced in order to minimize interference between different input ports that send traffic to the same output port at each arbitration point (such as, for example, a router). In NoCs without end-to-end quality of service (QoS) enforcement and local fair arbitration schemes, an imbalance between arrival rates at input ports sharing the same output port on a router may result in arbitration losses and a subsequent loss of performance. The presence of burstiness in traffic or variable packet sizes imposes additional constraints on designing topologies that meet performance requirements.
Drawings
Fig. 1 depicts a block diagram of a NoC synthesis system, according to one embodiment of the present disclosure.
Fig. 2 depicts a flow diagram for NoC synthesis according to one embodiment of the present disclosure.
Fig. 3 depicts functionality associated with determining the topology of a NoC according to one embodiment of the present disclosure.
Fig. 4 depicts a graphical representation of an input specification of a NoC, according to one embodiment of the present disclosure.
Fig. 5 depicts an HoL Conflict Graph (HCG) of a NoC, according to one embodiment of the present disclosure.
Fig. 6A depicts a traffic graph (TG) of a NoC according to one embodiment of the present disclosure.
Fig. 6B-6F depict a series of graphs, meshes, and topologies of the TG depicted in fig. 6A according to one embodiment of the present disclosure.
Fig. 7 depicts router consolidation for a merged candidate topology according to one embodiment of the present disclosure.
Fig. 8 depicts a reference topology of a NoC according to one embodiment of the present disclosure.
Fig. 9A depicts a traffic flow view of traffic flows within a NoC, according to one embodiment of the present disclosure.
Fig. 9B depicts a traffic flow view of traffic flows on a reference topology of a NoC, according to one embodiment of the present disclosure.
Fig. 10A depicts a traffic flow view of traffic flows on a reference topology of a NoC, according to one embodiment of the present disclosure.
Fig. 10B depicts a traffic flow view of traffic flows on a first variant topology of a NoC, according to one embodiment of the present disclosure.
Fig. 10C depicts a traffic flow view of traffic flows on a second variant topology of a NoC, according to one embodiment of the present disclosure.
Fig. 10D depicts a traffic flow view of traffic flows on a final topology of a NoC, according to one embodiment of the present disclosure.
Fig. 11 depicts a final topology of a NoC according to one embodiment of the present disclosure.
Fig. 12A, 12B, and 12C depict a flow diagram representing functionality associated with synthesizing a NoC, according to one embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will now be described with reference to the drawings, wherein like reference numerals refer to like parts throughout.
Embodiments of the present disclosure advantageously provide a computer-based method and system for synthesizing a NoC that generates a balanced topology without implementing end-to-end fairness or local credit-based arbitration, and that improves NoC performance when the target device bridge port supports only one incoming physical link per channel. More specifically, embodiments of the present disclosure assign to certain routers a clock domain that satisfies the router's minimum frequency while reducing clock domain transitions to neighboring routers, and balance the traffic flows received by these routers based on traffic flow packet rates.
In one embodiment, a computer-based method for synthesizing a NoC is provided. The method includes determining, based on an input specification of the NoC, physical data, device data, bridge data, and traffic data, the physical data including the dimensions of the NoC, the device data including a plurality of devices, each device having a location and dimensions, the bridge data including a plurality of bridge ports, each bridge port associated with one of the devices and having a location, and the traffic data including a plurality of traffic flows, each traffic flow having a packet rate. A Virtual Channel (VC) is assigned to each traffic flow to create a plurality of VC assignments. A reference topology is generated based on the physical data, the device data, the bridge data, the traffic data, and the VC assignments, the reference topology including the plurality of bridge ports, a plurality of routers, and a plurality of connections, each router having one or more input ports and one or more output ports. Each router having at least one output port shared by traffic flows received on at least two input ports is identified. For each identified router, a minimum frequency for the identified router is calculated based on the packet rates of the traffic flows received by the identified router, and a clock domain is assigned to the identified router based on the minimum frequency. The traffic flows received by the identified routers are balanced based on their packet rates. A final topology is generated based on the reference topology and the balanced traffic flows of the identified routers.
Fig. 1 depicts a block diagram of a NoC synthesis system 10, according to one embodiment of the present disclosure.
Computer 100 includes a bus 110, a processor 120, a storage element or memory 130, I/O interfaces 140, a display interface 150, and one or more communication interfaces 160. Generally, I/O interface 140 is coupled to I/O device 142 using a wired or wireless connection, display interface 150 is coupled to display 152, and communication interface 160 is connected to network 20 using a wired or wireless connection.
Bus 110 is a communication system that transfers data between processor 120, memory 130, I/O interface 140, display interface 150, and communication interface 160, as well as other components not shown in FIG. 1. Power connector 112 is coupled to bus 110 and a power source (not shown).
The processor 120 includes one or more general-purpose or special-purpose microprocessors that execute instructions to perform control, computing, input/output, etc. functions of the computer 100. The processor 120 may comprise a single integrated circuit such as a microprocessor device, or multiple integrated circuit devices and/or circuit boards working in cooperation to implement the functions of the processor 120. Additionally, the processor 120 may execute computer programs or modules stored within the memory 130, such as an operating system 132, a NoC synthesis module 134, other software modules 136, and so forth.
Generally, memory 130 stores instructions and data for execution by processor 120. Memory 130 may include various non-transitory computer-readable media that are accessible by processor 120. In various embodiments, memory 130 may include volatile and nonvolatile media, non-removable media, and/or removable media. For example, memory 130 may include any combination of Random Access Memory (RAM), Dynamic RAM (DRAM), Static RAM (SRAM), Read Only Memory (ROM), flash memory, cache memory, and/or any other type of non-transitory computer-readable medium.
Memory 130 contains various components for retrieving, presenting, modifying and storing data. For example, memory 130 stores software modules that provide functionality when executed by processor 120. The software modules include an operating system 132 that provides operating system functionality for the computer 100. The software modules also include a NoC synthesis module 134 that provides functionality for synthesizing a NoC architecture. In certain embodiments, the NoC synthesis module 134 may include a plurality of modules, each providing a particular individual function for synthesizing the NoC architecture, such as, for example, an input module, a VC module, a topology module, a routing module, a network generation module, a PCDC module, a link size and resizer module, a pipeline and timing component module, an output module, and so forth. Other software modules 136 may cooperate with NoC synthesis module 134 to provide functionality for synthesizing NoC architecture.
The data 138 may include data associated with the operating system 132, the NoC synthesis module 134, other software modules 136, and the like.
The I/O interface 140 is configured to transmit and/or receive data from the I/O device 142. The I/O interface 140 enables a connection between the processor 120 and the I/O device 142 by encoding data to be transmitted from the processor 120 to the I/O device 142 and decoding data received from the I/O device 142 for use by the processor 120. Generally, data may be transmitted over a wired and/or wireless connection. For example, I/O interface 140 may include one or more wired communication interfaces (such as USB, ethernet, etc.) and/or one or more wireless communication interfaces coupled to one or more antennas (such as WiFi, bluetooth, cellular, etc.).
Generally speaking, the I/O devices 142 provide input to the computer 100 and/or output from the computer 100. As discussed above, the I/O devices 142 are operatively connected to the computer 100 using wired and/or wireless connections. The I/O device 142 may include a local processor coupled to a communication interface configured to communicate with the computer 100 using wired and/or wireless connections. For example, the I/O devices 142 may include a keyboard, mouse, touchpad, joystick, or the like.
The display interface 150 is configured to transmit image data from the computer 100 to a monitor or display 152.
Communication interface 160 is configured to transfer data to and from network 20 using one or more wired and/or wireless connections. The network 20 may include one or more local area networks, wide area networks, the internet, etc., which may implement various network protocols such as, for example, wired and/or wireless ethernet, bluetooth, etc. The network 20 may also include various combinations of wired and/or wireless physical layers, such as, for example, copper or coaxial cable networks, fiber optic networks, bluetooth wireless networks, WiFi wireless networks, CDMA, FDMA and TDMA cellular wireless networks, and the like.
Fig. 2 depicts a NoC synthesis flow diagram 200, while fig. 3 depicts functionality at 230 associated with determining the topology of a NoC, according to embodiments of the present disclosure.
As discussed above, the software modules include a NoC synthesis module 134 that provides functionality for synthesizing a NoC architecture. In certain embodiments, the NoC synthesis module 134 includes a plurality of modules, each providing a particular individual function for synthesizing the NoC architecture, such as, for example, an input module, a VC module, a topology module, a routing module, a network generation module, a PCDC module, a link size and resizer module, a pipeline and timing component module, an output module, and so forth.
At 210, the NoC input specifications 202 are retrieved from the memory 130 and design information for the NoC is determined. For example, the NoC input specification 202 may be received over the network 20 and then stored as data 138 in the memory 130. In another example, the NoC input specification 202 may be created by a NoC designer using one or more software modules 136 and then stored as data 138 in the memory 130.
Design information of a NoC includes, for example, physical data, device data, bridge data, traffic data, and the like. Additional design information may include voltage domain data, power domain data, clock domain data, address region data, synthesis constraints, and the like.
The physical data includes the dimensions of the NoC and a list of non-routable regions. NoC components (such as bridges, routers, pipelines, resizers, connections, etc.) are typically not located within non-routable regions. In one example, the NoC is modeled as an array of cells arranged in rows and columns. The number of rows is defined by the height (in units of cells) and the number of columns is defined by the width (in units of cells). A cell width in millimeters, micrometers, inches, etc. may also be provided. The cells are numbered in order, starting from the top left corner of the array. The data for each non-routable region includes a location (cell number) and dimensions, such as a width (in units of cells) and a height (in units of cells). In another example, the NoC is modeled as a grid defined by Cartesian coordinates (X, Y), with the origin located in the lower left corner of the grid. The height and width are provided in normalized units, and a normalization factor may also be provided. The data for each non-routable region includes a location (X, Y) and dimensions, such as a width (X) and a height (Y).
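As a non-authoritative illustration of the cell-numbering model just described, the following Python sketch converts a cell number to row and column coordinates and tests whether a cell falls inside a non-routable region; the function names and the region representation are assumptions, not part of the patent.

```python
def cell_to_row_col(cell: int, noc_width: int) -> tuple[int, int]:
    """Cells are numbered in order, starting from the top left corner."""
    return cell // noc_width, cell % noc_width

def in_region(cell: int, noc_width: int, origin: int,
              width: int, height: int) -> bool:
    """True if `cell` lies inside the region anchored at cell `origin`."""
    row, col = cell_to_row_col(cell, noc_width)
    r0, c0 = cell_to_row_col(origin, noc_width)
    return r0 <= row < r0 + height and c0 <= col < c0 + width

# In an 8x8 NoC, a region 2 cells wide and 3 cells high anchored at cell 9
# covers cells 9, 10, 17, 18, 25, and 26.
assert in_region(17, 8, 9, 2, 3)
assert not in_region(11, 8, 9, 2, 3)
```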
The device data includes a list of devices, such as IP cores, IC modules, etc., located within the NoC. Each device includes one or more bridge ports (i.e., signal interfaces). The data for each device may include name, location (cell number, X-Y coordinates, etc.), dimensions (including width (in units of cells, X dimensions, etc.) and height (in units of cells, Y dimensions, etc.)), power domain, etc.
The bridge data includes a list of bridge ports of the device. The data for each bridge port may include a name, associated device name, location (cell number, X-Y coordinates, etc.), data width (in bits), low/high line indicators, etc.
In many embodiments, NoCs are packet-switched networks that divide data packets into sequences of message flow control units or flits. Each flit has the same size (in bits) and is divided into a sequence of data transmissions across a physical connection or link. A physical unit or phit is the number of bits (i.e., the bit width of a link) that can be transmitted in parallel across a physical connection in a single data transmission cycle. In one example, the flit size of a NoC is 128 bits. A bridge port 32 bits wide (the phit size) requires 4 data transfer cycles to transfer each flit. In the context of the present disclosure, the link size of this bridge port is 4 (each unit of link size is 32 bits). Similarly, a bridge port with a data width of 16 bits requires 8 data transfer cycles to transfer each flit, for a link size of 8 (each unit of link size is 16 bits), while a bridge port with a data width of 64 bits requires 2 data transfer cycles to transfer each flit, for a link size of 2 (each unit of link size is 64 bits). Other flit sizes may also be used, such as, for example, 32 bits, 64 bits, 256 bits, 512 bits, etc. Different flow control techniques may be used in alternative embodiments.
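The link size arithmetic described above can be summarized in a short sketch, assuming the flit size is an integer multiple of the bridge port data width:

```python
def link_size(flit_size_bits: int, data_width_bits: int) -> int:
    """Number of data transfer cycles (phits) needed to move one flit."""
    assert flit_size_bits % data_width_bits == 0
    return flit_size_bits // data_width_bits

# With a 128-bit flit: a 32-bit port needs 4 cycles, a 16-bit port needs 8,
# and a 64-bit port needs 2, matching the examples above.
assert link_size(128, 32) == 4
assert link_size(128, 16) == 8
assert link_size(128, 64) == 2
```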
The traffic data includes a list of traffic flows for the NoC. The data for each traffic flow includes a source bridge port, a destination bridge port, a peak traffic rate, an average traffic rate, and a traffic class. The source bridge port and the destination bridge port are contained within the bridge port list. The peak traffic rate and average traffic rate are provided in bits or bytes per second, such as, for example, b/s, Kb/s, Mb/s, Gb/s, Tb/s, B/s, KB/s, MB/s, GB/s, TB/s, etc. Generally, the traffic class provides one or more metrics that differentiate the level of NoC performance that may be provided for each traffic flow. In many embodiments, the traffic class includes a quality of service (QoS) metric and a latency sensitivity (LS) metric. The QoS metric provides a mechanism to prioritize traffic within the NoC, while the LS metric indicates the sensitivity of traffic to network latency. For example, 8 different traffic classes are provided by an integer QoS metric having four possible values (e.g., 0 to 3) and a Boolean LS metric having two possible values (e.g., true or false). In this example, a QoS value of 0 and an LS value of true provide the best potential NoC performance. Other metrics are also contemplated. Additionally, a message type may also be provided, such as, for example, read request, write request, read/write request, and the like.
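A minimal sketch of this traffic class encoding is shown below. The four QoS values and Boolean LS flag yield 4 x 2 = 8 distinct classes; the ordering rule is an assumption for illustration, since the text only states that QoS 0 with LS true offers the best potential performance.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrafficClass:
    qos: int   # 0 (highest priority) to 3
    ls: bool   # True if latency sensitive

    def rank(self) -> tuple[int, int]:
        # Lower tuples sort first: QoS dominates, LS breaks ties.
        return (self.qos, 0 if self.ls else 1)

classes = sorted(
    (TrafficClass(q, ls) for q in range(4) for ls in (True, False)),
    key=TrafficClass.rank,
)
assert len(classes) == 8
assert classes[0] == TrafficClass(qos=0, ls=True)  # best potential performance
```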
Fig. 4 depicts a graphical representation of a NoC input specification 202, according to one embodiment of the present disclosure. The user may view the NoC 300 on the display 152.
NoC 300 has a NoC height and a NoC width, and includes a non-routable region 301 located within a central portion of NoC 300. Nine devices are dispersed in NoC 300, none of which are located within the non-routable region 301. Each device includes at least one bridge port ("P"). For ease of illustration, each device has a single bridge port. Device 310 includes bridge port 310a, device 311 includes bridge port 311a, device 312 includes bridge port 312a, device 313 includes bridge port 313a, device 314 includes bridge port 314a, device 315 includes bridge port 315a, device 316 includes bridge port 316a, device 317 includes bridge port 317a, and device 318 includes bridge port 318a. Generally speaking, the location of each bridge port is limited by the location of the associated device and the footprint (i.e., device width and height) of the device within the NoC. For example, for an 8 cell x 8 cell NoC, a device with a 1 cell width and a 3 cell height located at cell number 9 supports one or more bridge ports located at cell number 9, cell number 17, and/or cell number 25.
Eight sets of traffic flows between devices are depicted; each traffic flow set includes at least one traffic flow. For example, a set of traffic flows may include a traffic flow that defines a read request and a traffic flow that defines a write request. Traffic flow set 320 flows between bridge port 310a and bridge port 318a. Traffic flow set 321 flows between bridge port 311a and bridge port 318a. Traffic flow set 322 flows between bridge port 312a and bridge port 318a. Traffic flow set 323 flows between bridge port 313a and bridge port 318a. Traffic flow set 324 flows between bridge port 314a and bridge port 318a. Traffic flow set 325 flows between bridge port 315a and bridge port 318a. Traffic flow set 326 flows between bridge port 316a and bridge port 318a. Traffic flow set 327 flows between bridge port 317a and bridge port 318a.
In many embodiments, devices 310, 311, 312, 313, 314, 315, 316, and 317 may be AXI Slave Network Interfaces (ASNIs) and device 318 may be an AXI Master Network Interface (AMNI). Generally, an AMNI may send data to an ASNI or may request data from an ASNI. For ease of explanation, device 310 is labeled "S0," device 311 is labeled "S1," device 312 is labeled "S2," device 313 is labeled "S3," device 314 is labeled "S4," device 315 is labeled "S5," device 316 is labeled "S6," device 317 is labeled "S7," and device 318 is labeled "M0." Other configurations and device types may be accommodated.
Referring back to fig. 2, at 220, each traffic flow is assigned a VC. Generally, VCs are assigned to reduce conflicts and simplify subsequent topology generation. In one embodiment, VCs are assigned using an iterative evaluation process that performs an assign-evaluate-refine loop until the evaluation results show no significant improvement. Other assignment methods are also contemplated.
At 230, the topology of the NoC is determined.
Referring back to fig. 3, at 232, the HCG is constructed based on the traffic data and VC allocations.
Fig. 5 depicts a HCG 400 of a NoC 300 according to one embodiment of the present disclosure. The user may view HCG 400 on display 152.
In this embodiment, HCG 400 includes traffic nodes 410 through 417 and has no HoL edges. Each traffic node represents a traffic flow, and each HoL edge represents a HoL conflict. A HoL conflict is defined as two traffic flows assigned to the same VC but having different traffic classes, such as, for example, different QoS values and/or different LS values. For purposes of illustration only, each set of traffic flows 320, 321, 322, 323, 324, 325, 326, and 327 has a single traffic flow from the respective slave device to the master device 318, which results in eight traffic nodes 410 through 417. Each traffic node 410 through 417 is then assigned a color to minimize HoL conflicts, where neighboring traffic nodes receive different colors. In some embodiments, minimum vertex coloring is used to find the minimum number of colors to assign to traffic nodes 410 through 417. Since there are no HoL conflicts, HCG 400 includes eight traffic nodes 410 through 417 in one color (white).
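A hedged sketch of HCG construction and coloring follows: flows that share a VC but carry different traffic classes are in HoL conflict and must receive different colors. Minimum vertex coloring is NP-hard in general, so a greedy heuristic is shown here purely for illustration; the patent does not prescribe a particular algorithm.

```python
def build_hcg(flows):
    """flows: list of (vc, traffic_class) tuples, one per traffic node."""
    edges = set()
    for i in range(len(flows)):
        for j in range(i + 1, len(flows)):
            same_vc = flows[i][0] == flows[j][0]
            different_class = flows[i][1] != flows[j][1]
            if same_vc and different_class:   # HoL conflict
                edges.add((i, j))
    return edges

def greedy_color(n, edges):
    neighbors = {i: set() for i in range(n)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    colors = {}
    for node in range(n):
        used = {colors[m] for m in neighbors[node] if m in colors}
        colors[node] = next(c for c in range(n) if c not in used)
    return colors

# Eight flows on one VC with identical traffic classes, as in HCG 400:
# no conflict edges, so every traffic node receives the same color.
flows = [(0, (0, True))] * 8
assert greedy_color(len(flows), build_hcg(flows)) == {i: 0 for i in range(8)}
```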
Referring back to fig. 3, at 234, a TG is constructed for each color based on the physical data, bridge data, traffic data, and colored HCG. In other words, a plurality of traffic graphs are constructed based on the physical data, the bridge data, the traffic data, and the colored HCG.
Fig. 6A depicts a TG 500 of NoC 300 according to one embodiment of the present disclosure. TG 500 is associated with one color from HCG 400, i.e., white. The user may view TG 500 on the display 152.
TG 500 includes nodes 510 to 518 and edges 520, 521, 522, 523, 524, 525, 526, and 527. Each node 510 to 518 is associated with a different bridge port, and each edge 520, 521, 522, 523, 524, 525, 526, and 527 connects a pair of nodes and is associated with a set of traffic flows between the two bridge ports. As discussed above, each set of traffic flows includes at least one traffic flow.
More specifically, node 510 is associated with bridge port 310a, node 511 is associated with bridge port 311a, node 512 is associated with bridge port 312a, node 513 is associated with bridge port 313a, node 514 is associated with bridge port 314a, node 515 is associated with bridge port 315a, node 516 is associated with bridge port 316a, node 517 is associated with bridge port 317a, and node 518 is associated with bridge port 318a. Similarly, edge 520 is associated with traffic flow set 320, edge 521 is associated with traffic flow set 321, edge 522 is associated with traffic flow set 322, edge 523 is associated with traffic flow set 323, edge 524 is associated with traffic flow set 324, edge 525 is associated with traffic flow set 325, edge 526 is associated with traffic flow set 326, and edge 527 is associated with traffic flow set 327.
At 236, candidate topologies for each color are generated based on the respective TG. In other words, a candidate topology is generated for each TG. The candidate topology includes bridge ports, routers, and connections.
Fig. 6B-6F depict a series of meshes and topologies for TG 500 according to one embodiment of the present disclosure. The user may view these meshes and topologies on the display 152.
First, a mesh is generated based on the TG. The mesh includes nodes and intersections formed by grid lines passing through each node. Each node is associated with a different bridge port and is located at a different intersection. In one embodiment, the mesh is a Hanan grid formed of orthogonal vertical grid lines and horizontal grid lines. Other types of grids may also be generated, such as lattice, square, or unit-distance grids.
Generally, the functions at 234 and 236 are performed for each color. In one embodiment, the function at 234 is performed for all colors, and then the function at 236 is performed for all colors. In another embodiment, the function at 234 is performed for a first color, and then the function at 236 is performed for the first color. Next, the function at 234 is performed for the second color, then the function at 236 is performed for the second color, and so on.
Fig. 6B depicts a Hanan grid 501 for TG 500.
Nodes 510 to 518 are located at respective intersections and a router is added to the mesh at each intersection not occupied by a node. In this embodiment, 27 routers (i.e., routers R01 through R27) are added to the mesh. The neighboring nodes and routers are then connected to create an initial mesh or topology.
Fig. 6C depicts an initial mesh or topology 502 of the TG 500.
The node 510 is connected to routers R14, R19, and R23. Node 511 is connected to routers R06, R10, and R14. Node 512 is connected to routers R02, R06, R07, and R10. Node 513 is connected to routers R02, R03, and R07. Node 514 is connected to routers R03, R07, R08, and R12. Node 515 is connected to routers R08, R12, R13, and R18. Node 516 is connected to routers R13, R18, and R22. Node 517 is connected to routers R18, R21, R22, and R26. The node 518 is connected to routers R20, R24, and R25.
The router R01 is connected to the routers R02 and R06. Router R02 is connected to nodes 512 and 513 and router R01. Router R03 is connected to nodes 513 and 514 and router R04. The router R04 is connected to the routers R03, R05 and R08. The router R05 is connected to the routers R04 and R09. Router R06 is connected to nodes 511 and 512 and router R01. Router R07 is connected to nodes 512, 513, and 514 and router R11. Router R08 is connected to nodes 514 and 515 and routers R04 and R09. The router R09 is connected to the routers R05, R08, and R13. Router R10 is connected to nodes 511 and 512 and routers R11 and R15. The router R11 is connected to the routers R07, R10, R12, and R16. Router R12 is connected to nodes 514 and 515 and routers R11 and R17. Router R13 is connected to nodes 515 and 516 and router R09. Router R14 is connected to nodes 510 and 511 and router R15. The router R15 is connected to the routers R10, R14, R16, and R19. The router R16 is connected to the routers R11, R15, R17, and R20. The router R17 is connected to the routers R12, R16, R18, and R21. Router R18 is connected to nodes 515, 516, and 517 and router R17. The router R19 is connected to the node 510 and to the routers R15, R20, and R24. The router R20 is connected to the node 518 and to the routers R16, R19 and R21. Router R21 is connected to node 517 and routers R17, R20 and R25. Router R22 is connected to nodes 516 and 517 and router R27. Router R23 is connected to node 510 and router R24. Router R24 is connected to node 518 and routers R19 and R23. The router R25 is connected to the node 518 and to the routers R21 and R26. The router R26 is connected to the node 517 and to the routers R25 and R27. The router R27 is connected to the routers R22 and R26.
The weight of each connection is then calculated based on the traffic data to create a weighted mesh or topology. In one embodiment, a traffic criticality index (TCI) is calculated for each traffic flow, and the TCI for each traffic flow is then added to the heat index of each connection that falls within the straight bounding box of that traffic flow. The TCI may be based on traffic criticality and rate. The straight bounding box for a particular traffic flow is defined by the source node (source bridge port) and the target node (target bridge port) of that traffic flow. In one embodiment, the weight of each connection is inversely proportional to the heat index of that connection, while in another embodiment, the weight is proportional to the heat index. The weights are then applied to the initial mesh or topology to create a weighted mesh or topology.
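The following sketch illustrates one reading of this weighting scheme; the TCI values, the grid representation, and the inverse-proportionality rule are assumptions consistent with the text rather than the patent's exact formulation.

```python
def heat_indices(connections, flows):
    """connections: {name: ((x1, y1), (x2, y2))} mesh segments.
    flows: iterable of (src_xy, dst_xy, tci) traffic flows."""
    heat = {name: 0.0 for name in connections}
    for (sx, sy), (dx, dy), tci in flows:
        lo_x, hi_x = sorted((sx, dx))
        lo_y, hi_y = sorted((sy, dy))
        for name, endpoints in connections.items():
            # A connection falls within a flow's straight bounding box when
            # both of its endpoints lie inside the box defined by the flow's
            # source and target nodes.
            if all(lo_x <= x <= hi_x and lo_y <= y <= hi_y
                   for x, y in endpoints):
                heat[name] += tci
    return heat

def connection_weights(heat, eps=1e-9):
    # One embodiment: weight inversely proportional to the heat index.
    return {name: 1.0 / (h + eps) for name, h in heat.items()}
```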
Fig. 6D depicts a weighted mesh or topology 503 of TG 500.
The different weights of the connections are represented by different line weights. The thinnest lines indicate connections through which no traffic flows. For example, edge 520 is associated with traffic flow set 320, which includes at least one traffic flow between node 510 (bridge port 310a) and node 518 (bridge port 318a). The straight bounding box of traffic flow set 320 is defined by node 510 and node 518 and is represented by connections 520a, 520b, 520c, and 520d. Connections 520c and 520d have the lowest weight, connection 520b is weighted more heavily than connections 520c and 520d, and connection 520a has the highest weight. Notably, the weight of connection 520a includes contributions from the remaining edges 521, 522, 523, 524, 525, 526, and 527, while the weight of connection 520b includes contributions from edge 521.
A degree-constrained minimum cost mesh or topology is then determined based on the weighted mesh or topology, which includes removing one or more connections and one or more routers. In one embodiment, degree-constrained minimum cost Steiner tree generation creates a plurality of trees based on the degree and the number of nodes, and then selects the lowest-cost tree. The connections and routers through which no traffic flows are then removed from the degree-constrained minimum cost mesh or topology.
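As a rough illustration (not the patent's algorithm), networkx's Steiner tree approximation can produce a low-cost tree over the weighted mesh; the library call does not enforce degree constraints, so the check below stands in for the degree-constrained search described above, which generates several trees and keeps the lowest-cost one that satisfies the constraint.

```python
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

def candidate_tree(mesh: nx.Graph, terminals, max_degree: int = 8):
    """Approximate a low-cost tree spanning the bridge port nodes."""
    tree = steiner_tree(mesh, terminals, weight="weight")
    if max(dict(tree.degree).values()) > max_degree:
        raise ValueError("degree constraint violated; try another tree")
    return tree
```

Connections and routers through which no traffic flows would then be pruned from the resulting tree, as described above.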
Fig. 6E depicts a degree-constrained minimum cost grid or topology 504 for the TG 500.
The degree constrained minimum cost topology 504 includes nodes 510 through 518, and routers R07 and R10 through R21. The connection weights are the same as in fig. 6D.
Node 510 is connected to router R19, node 511 is connected to router R10, nodes 512, 513, and 514 are connected to router R07, node 515 is connected to router R12, node 516 is connected to router R18, node 517 is connected to router R21, and node 518 is connected to router R20.
Router R07 is connected to nodes 512, 513, and 514 and router R11. Router R10 is connected to node 511 and router R15. Router R11 is connected to routers R07, R12, and R16. Router R12 is connected to node 515 and router R11. Router R15 is connected to routers R10 and R19. Router R16 is connected to routers R11, R17, and R20. Router R17 is connected to routers R16 and R18. Router R18 is connected to node 516 and router R17. Router R19 is connected to node 510 and routers R15 and R20. Router R20 is connected to node 518 and routers R16, R19, and R21. Router R21 is connected to node 517 and router R20.
Candidate topologies are then generated from the degree-constrained minimum cost tree.
Fig. 6F depicts a candidate topology 505 of the TG 500.
The candidate topology 505 includes nodes 510 through 518, and routers R07, R10, R11, R12, and R16 through R21. The connections between the nodes and the routers are the same as in fig. 6D. Generally, a user may view the grid 501 and topologies 502-505 on the display 152.
Referring back to FIG. 3, at 238, a base topology is generated.
The candidate topologies are then merged to create a merged candidate topology. In this embodiment, the candidate topology 505 is also the merged candidate topology 505.
The routers are then merged and a reference topology is generated.
Fig. 7 depicts router consolidation for a consolidated candidate topology 505 according to one embodiment of the present disclosure.
The router merge graph 506 shows a process for merging routers in the merged candidate topology 505. Generally, router consolidation reduces the number of routers in a topology by merging two or more routers into one router. The merged router may also be relocated, i.e., placed in a location that does not correspond to any mesh location of the original routers. Router relocation may occur after the candidate topologies for each color are merged, and/or during a subsequent optimization process.
Routers R10 and R19 have been incorporated into router R15, also labeled as router 540 for clarity. Routers R07, R12, R16, R17, R18, and R21 have been incorporated into router R11, also labeled router 542 for clarity. For clarity, router R20 is also labeled as router 544.
Fig. 8 depicts a reference topology 507 of a NoC 300 according to one embodiment of the present disclosure. The user may view the reference topology 507 on the display 152.
Reference topology 507 has the same NoC height and width as NoC 300 and includes the non-routable region 301 located within a central portion of reference topology 507. Device 310 is connected to router 540 through bridge port 310a. Device 311 is connected to router 540 through bridge port 311a. Device 312 is connected to router 542 through bridge port 312a. Device 313 is connected to router 542 through bridge port 313a. Device 314 is connected to router 542 through bridge port 314a. Device 315 is connected to router 542 through bridge port 315a. Device 316 is connected to router 542 through bridge port 316a. Device 317 is connected to router 542 through bridge port 317a. Device 318 is connected to router 544 through bridge port 318a.
The reference topology 507 may be determined by the NoC synthesis module 134 based on the methods described above. Alternatively, the reference topology 507 along with traffic data and the like may be developed by different software modules 136, different computer systems, and the like, and retrieved from memory 130, received by computer 100, and the like.
In many embodiments, for various reasons, neither end-to-end QoS support nor local credit-based arbitration (i.e., the use of local credits to mitigate arbitration losses) can be applied to the reference topology 507. Instead, arbitration at each arbitration point (e.g., each router) is based solely on local information, such as, for example, Least Recently Used (LRU) arbitration, Round Robin (RR) arbitration, and the like. For a router with multiple input ports sharing a single output port, each arbitration grants access to only one of the input ports per cycle.
Embodiments of the present disclosure advantageously generate a balanced topology for a NoC without end-to-end fairness or arbitration based on local credits, and improve NoC performance when a target device bridge port supports only one incoming physical link per channel. More specifically, embodiments of the present disclosure allocate clock domains for certain routers that meet a minimum frequency for the router while reducing transitions of the clock domains to neighboring routers and balancing traffic flows received by these routers based on traffic flow packet rates.
Referring back to fig. 2, at 240, an effective clock domain may be determined for each location and an initial clock domain may be assigned according to traffic flow and topology.
Each router having an output port and VC combination shared by traffic flows received on at least two input ports is identified. A minimum frequency for each identified router is calculated based on the traffic flow packet rates at its input ports. Each identified router is then assigned a clock domain that satisfies the router's minimum frequency.
The minimum frequency is not determined for the remaining "unidentified" routers and their clock domains may be assigned based on location, traffic flow, topology, etc.
After the initial clock domains have been assigned at 240, the degree-constrained minimum cost Steiner tree computation at 236 can be revisited by adding performance costs to identify performance violations and enhance the PPA cost model used during the initial topology exploration. For those embodiments that receive the reference topology 507 from an external source, the degree-constrained minimum cost Steiner tree analysis at 236 may be performed using the performance costs to identify performance violations.
The performance costs include a router Clock Domain (CD) violation cost and a packet rate balancing cost. The router CD violation cost penalizes routers that violate their minimum frequency requirement, since all such routers may incur bandwidth loss, which must be minimized. The packet rate balancing cost penalizes an imbalance between the packet rates on router input ports that share the same output port, since such an imbalance may cause arbitration losses at the router, resulting in a performance loss under policies that cannot recover those losses. Generally, the performance costs do not impact unidentified routers.
Flow then proceeds to 238, where the reference topology 507 is modified based on the modified minimum cost Steiner tree computation, and then to 240, where the clock domain assignments of the routers are modified for the modified topology. This flow may be repeated until no router violates its clock domain constraint and the packet arrival rates on all router input ports are balanced.
Fig. 9A depicts a traffic flow view 600 of traffic flows within NoC 300, according to one embodiment of the present disclosure.
The traffic flow view 600 presents a slightly different view of TG 500 (the corresponding nodes and edges of TG 500 are indicated in parentheses). The traffic flow view 600 depicts the devices 310 through 318 and their corresponding bridge ports, the operating frequency of each device, the sets of traffic flows 320, 321, 322, 323, 324, 325, 326, and 327, and the number of transactions per second for each traffic flow.
Devices 310, 311, and 318 operate at 1 GHz, while devices 312, 313, 314, 315, 316, and 317 operate at 0.5 GHz. Thus, the effective clock domains of devices 310, 311, and 318 are 1 GHz clock domains, while the effective clock domains of devices 312, 313, 314, 315, 316, and 317 are 0.5 GHz clock domains. Each set of traffic flows represents a single unidirectional traffic flow of 100×10^6 transactions (or packets) per second from each device 310, 311, 312, 313, 314, 315, 316, and 317 to device 318.
The traffic flow sets 322, 323, 324, 325, 326, and 327 must cross clock domain boundaries because devices 312, 313, 314, 315, 316, and 317 are located in clock domains having different clock speeds than the clock domain in which device 318 is located. In many embodiments, the traffic flow sets 320 and 321 do not cross clock domain boundaries because devices 310, 311, and 318 are located in the same clock domain. In other embodiments, the traffic flow sets 320 and 321 may cross clock domain boundaries if devices 310, 311, and 318 are located in different clock domains having the same clock speed.
In many embodiments, the minimum router frequency may be determined as follows.
For each edge in the TG, all source-destination (SD) endpoint pairs that use the edge are identified, and all packet rates from the identified SD pairs are summed to generate a total packet rate for the edge. For the traffic flow view 600, the edges 520, 521, 522, 523, 524, 525, 526, and 527 each have a total packet rate of 100×10^6 packets/second.
All routers having at least one output port shared by a plurality of input ports are then identified, and a number of determinations are then performed for each identified router. For each output port O_i, the packet rates on all input ports sharing that output port are summed to generate a total output packet rate, and a minimum frequency O_i-min is calculated for the output port based on the total output packet rate. The router minimum frequency r_min is then set to the maximum of the minimum frequencies O_i-min over all output ports. The clock domains that satisfy r_min are then identified, and an initial router clock domain is assigned.
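A minimal sketch of this computation follows, assuming each output port forwards one packet per cycle, which matches the worked examples below (e.g., a total rate of 200×10^6 packets/second yields a minimum frequency of 0.2 GHz). The data structures are illustrative assumptions.

```python
def router_min_frequency(output_ports) -> float:
    """output_ports: {port_id: [packet rates, in packets/s, of the input
    ports sharing that output port]}. Returns r_min in Hz."""
    o_min = {port: sum(rates) for port, rates in output_ports.items()}
    return max(o_min.values())

def feasible_domains(r_min: float, available: list[float]) -> list[float]:
    """Clock domains that satisfy r_min; the final assignment additionally
    considers the operating frequencies of the attached devices."""
    return [f for f in available if f >= r_min]

# Router 540: two 100e6 packets/s inputs share one output port.
r_min = router_min_frequency({"O1": [100e6, 100e6]})
assert r_min == 200e6                                 # 0.2 GHz
assert feasible_domains(r_min, [0.5e9, 1e9]) == [0.5e9, 1e9]
```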
Fig. 9B depicts a traffic flow view 610 of traffic flow on the reference topology 507 of the NoC 300 in accordance with one embodiment of the present disclosure.
Generally, in order to meet timing requirements, a router may have up to 8 input ports and up to 8 output ports, and may support up to 4 VCs. In reference topology 507, routers 540 and 544 each receive traffic on 2 input ports and transmit traffic on 1 output port and 1 VC. Router 542 receives traffic on 6 input ports and transmits traffic on 1 output port and 1 VC. In other words, routers 540, 542, and 544 each have at least one output port shared by multiple input ports.
For router 540, the sum of the packet rates on the two input ports sharing a single output port is 200×10^6 packets/second. The minimum frequency of the single output port is determined to be 0.2 GHz, which is also the router minimum frequency r_min. The clock domains that satisfy r_min are then identified as 0.5 GHz and 1 GHz, and an initial router clock domain of 1 GHz is then assigned based on r_min and the operating frequency of devices 310 and 311.
For router 542, the sum of the packet rates on the six input ports sharing a single output port is 600×10^6 packets/second. The minimum frequency of the single output port is determined to be 0.6 GHz, which is also the router minimum frequency r_min. The available clock domains are identified as 0.5 GHz and 1 GHz, and an initial router clock domain of 0.5 GHz is then assigned based on r_min and the operating frequencies of devices 312, 313, 314, 315, 316, and 317.
For router 544, the sum of the packet rates on the two input ports sharing a single output port is 800×10^6 packets/second. The minimum frequency of the single output port is determined to be 0.8 GHz, which is also the router minimum frequency r_min. The available clock domains are identified as 0.5 GHz and 1 GHz, and an initial router clock domain of 1 GHz is then assigned based on r_min and the operating frequency of device 318.
Fig. 10A depicts a traffic flow view 620 of traffic flow on the reference topology 507 of the NoC 300 in accordance with one embodiment of the present disclosure.
Although PCDC buffers may be added at a later stage (e.g., at 270 below), in many implementations the PCDC buffers may be added earlier to help assign and optimize the clock domains. In this embodiment, PCDC buffer P11 has been added to the 1 GHz clock domain and is located between router 542 and router 544. PCDC buffer P15 has been added to the 1 GHz clock domain for flexibility and is located between router 540 and router 544.
Under this initial allocation of clock domains, the normalized packet rates on the links between the devices, the routers, and the PCDC buffers may be calculated based on the traffic flows of 100×10^6 transactions (or packets) per second originating from each device 310, 311, 312, 313, 314, 315, 316, and 317 and the clock domain frequencies. The normalized packet rate for a particular link may be defined as the number of transactions per second divided by the frequency of the clock domain in which the link is located. For example, the normalized packet rate on the link between device 310 and router 540 is 100×10^6 / 1×10^9, or 0.1 packets/cycle, the normalized packet rate on the link between device 312 and router 542 is 100×10^6 / 0.5×10^9, or 0.2 packets/cycle, and so on.
More specifically, router 540 receives 0.1 packets/cycle from device 310 via a first input port, receives 0.1 packets/cycle from device 311 via a second input port, and transmits 0.2 packets/cycle to PCDC buffer P15 via 1 output port and 1 VC. PCDC buffer P15 transmits 0.2 packets/cycle to router 544. These normalized packet rates are determined with respect to the 1 GHz clock domain, in which 1 cycle equals 1×10^-9 seconds.
Router 542 receives 0.2 packets/cycle from device 312 via a first input port, 0.2 packets/cycle from device 313 via a second input port, 0.2 packets/cycle from device 314 via a third input port, 0.2 packets/cycle from device 315 via a fourth input port, 0.2 packets/cycle from device 316 via a fifth input port, and 0.2 packets/cycle from device 317 via a sixth input port. Router 542 transmits 1.2 packets/cycle to PCDC buffer P11 via 1 output port and 1 VC. PCDC buffer P11 transmits 0.6 packets/cycle to router 544. These normalized packet rates are determined with respect to the 0.5 GHz clock domain, in which 1 cycle equals 2×10^-9 seconds.
Router 544 receives 0.2 packets/cycle from PCDC buffer P15 through a first input port, receives 0.6 packets/cycle from PCDC buffer P11 through a second input port, and transmits 0.8 packets/cycle to device 318. These normalized packet rates are determined with respect to the 1 GHz clock domain, in which 1 cycle equals 1×10^-9 seconds.
Under reference topology 507, router 542 violates the router minimum frequency requirement because it has been assigned an initial clock domain of 0.5 GHz but has a minimum frequency r_min of 0.6 GHz. This results in a normalized packet rate of 1.2 packets/cycle being transmitted from a single output port and VC, which is greater than the maximum possible normalized packet rate of 1 packet/cycle. This clock domain violation is indicated in fig. 10A.
In addition, router 542 provides a normalized packet rate of 0.6 packets/cycle to one of the input ports of router 544, which creates a packet rate imbalance compared to the normalized packet rate of 0.2 packets/cycle provided by router 540 to another of the input ports of router 544. This packet rate imbalance is also indicated in fig. 10A. Advantageously, the performance costs may be used to eliminate the clock domain violation and balance the normalized packet rates on the input ports of router 544.
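Both conditions can be checked mechanically, as in the hedged sketch below: a shared output port whose summed normalized input rate exceeds 1 packet/cycle marks a clock domain violation, and differing normalized rates on input ports sharing an output port mark a packet rate imbalance.

```python
def normalized_rate(packets_per_s: float, domain_hz: float) -> float:
    """Packets per cycle on a link in the given clock domain."""
    return packets_per_s / domain_hz

# Router 542 in the reference topology: six 100e6 packets/s inputs, 0.5 GHz.
inputs_542 = [normalized_rate(100e6, 0.5e9)] * 6   # 0.2 packets/cycle each
total_542 = sum(inputs_542)
assert total_542 > 1.0                   # 1.2: clock domain violation

# Router 544: 0.2 packets/cycle from router 540, 0.6 from router 542.
inputs_544 = [0.2, 0.6]
imbalance = max(inputs_544) - min(inputs_544)
assert imbalance > 0.0                   # 0.4: packet rate imbalance
```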
Generally, for routers assigned a frequency less than their minimum frequency (i.e., r_min), a maximum normalized packet rate difference parameter (i.e., P_router) is determined. The P_router parameters are then summed to generate the packet rate balancing cost for the reference topology and its explored variants.
More specifically, for each output port and VC combination (port, VC), i.e., OVC, on the router, the difference between the normalized packet rates, i.e., P_ij, is calculated for each pair (or combination) of input ports i, j that share the OVC, and the difference between the maximum P_ij and the minimum P_ij, i.e., P_OVC, is then determined. The maximum P_OVC for the router, i.e., P_router, is then determined. Finally, the P_router values for the routers are summed to generate the packet rate balancing cost.
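Read literally, the cost can be sketched as below; the data structures are assumptions, and the handling of an OVC with only one input port pair (where the max and min P_ij coincide) is an illustrative choice.

```python
from itertools import combinations

def p_ovc(input_rates) -> float:
    """Spread of pairwise normalized packet rate differences for one OVC."""
    p_ij = [abs(a - b) for a, b in combinations(input_rates, 2)]
    if not p_ij:
        return 0.0
    return p_ij[0] if len(p_ij) == 1 else max(p_ij) - min(p_ij)

def balancing_cost(routers) -> float:
    """routers: iterable of {ovc: [normalized input port rates]} dicts."""
    return sum(
        max((p_ovc(rates) for rates in router.values()), default=0.0)
        for router in routers
    )

# Router 544 in fig. 10A (inputs 0.2 and 0.6 packets/cycle) versus the
# final topology of fig. 10D (inputs 0.2, 0.3, and 0.3 packets/cycle).
assert round(balancing_cost([{"O1/VC0": [0.2, 0.6]}]), 3) == 0.4
assert round(balancing_cost([{"O1/VC0": [0.2, 0.3, 0.3]}]), 3) == 0.1
```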
Fig. 10B depicts a traffic flow view 630 of traffic flows on a first variant topology of the NoC 300, according to one embodiment of the present disclosure.
In this embodiment, router 542 has been relocated to the 1 GHz clock domain, and PCDC buffers P2, P3, P4, P5, P6, and P7 have been added to the 1 GHz clock domain. PCDC buffers P2, P3, P4, P5, P6, and P7 are located between router 542 and devices 312, 313, 314, 315, 316, and 317, respectively, to address the clock domain violation.
More specifically, PCDC buffer P2 receives 0.2 packets/cycle from device 312 and transmits 0.1 packets/cycle to router 542. PCDC buffer P3 receives 0.2 packets/cycle from device 313 and transmits 0.1 packets/cycle to router 542. PCDC buffer P4 receives 0.2 packets/cycle from device 314 and transmits 0.1 packets/cycle to router 542. PCDC buffer P5 receives 0.2 packets/cycle from device 315 and transmits 0.1 packets/cycle to router 542. PCDC buffer P6 receives 0.2 packets/cycle from device 316 and transmits 0.1 packets/cycle to router 542. PCDC buffer P7 receives 0.2 packets/cycle from device 317 and transmits 0.1 packets/cycle to router 542. Router 542 transmits 0.6 packets/cycle to router 544 via a single output port and VC.
The traffic flows from devices 310 and 311 and router 540 remain the same as in reference topology 507. More specifically, router 540 receives 0.1 packets/cycle from device 310 via a first input port, receives 0.1 packets/cycle from device 311 via a second input port, and transmits 0.2 packets/cycle to PCDC buffer P15 via 1 output port and 1 VC. PCDC buffer P15 transmits 0.2 packets/cycle to router 544. Router 544 receives 0.2 packets/cycle from PCDC buffer P15 through a first input port, receives 0.6 packets/cycle from router 542 through a second input port, and transmits 0.8 packets/cycle to device 318.
Although the clock domain violation has been eliminated, the packet rate imbalance at router 544 has not been resolved, and is indicated in fig. 10B. Additionally, adding six PCDC buffers to the 1 GHz clock domain may not meet other design requirements of the NoC, such as, for example, PPA design constraints, etc.
Fig. 10C depicts a traffic flow view 640 of traffic flows on a second variant topology of the NoC 300, according to an embodiment of the present disclosure.
In this embodiment, router 542 has been relocated to the 0.5 GHz clock domain and split into two routers, router 542 (R11.1) and router 543 (R11.2), to address the clock domain violation. PCDC buffers P11.1 and P11.2 have also been introduced in the 1 GHz clock domain.
Router 540 receives 0.1 packets/cycle from device 310 through a first input port and 0.1 packets/cycle from device 311 through a second input port, and transmits 0.2 packets/cycle to PCDC buffer P15 through 1 output port and 1 VC. PCDC buffer P15 transmits 0.2 packets/cycle to router 544.
Router 542 receives 0.2 packets/cycle from device 312 via a first input port, 0.2 packets/cycle from device 313 via a second input port, 0.2 packets/cycle from device 314 via a third input port, 0.2 packets/cycle from device 315 via a fourth input port, and 0.2 packets/cycle from device 316 via a fifth input port. Router 542 transmits 1.0 packets/cycle to PCDC buffer P11.1 via 1 output port and 1 VC. PCDC buffer P11.1 transmits 0.5 packets/cycle to router 544.
Router 543 receives 0.2 packets/cycle from device 317 through a first input port and transmits 0.2 packets/cycle to PCDC buffer P11.2 through 1 output port and 1 VC. PCDC buffer P11.2 transmits 0.1 packets/cycle to router 544.
Router 544 receives 0.2 packets/cycle from PCDC buffer P15 through a first input port, 0.5 packets/cycle from router 542 through a second input port, and 0.1 packets/cycle from router 543 through a third input port, and transmits 0.8 packets/cycle to device 318.
Although the clock domain violation has been eliminated, the normalized packet rate output by router 542 equals the maximum possible normalized packet rate of 1 packet/cycle. This does not provide an optimal solution, since any processing delay that occurs at router 542 will result in lost packet transmissions to PCDC buffer P11.1.
In addition, the packet rate imbalance at router 544 has not been addressed. Router 542 provides a normalized packet rate of 0.5 packets/cycle to the second input port of router 544, which creates a packet rate imbalance compared to the normalized packet rate of 0.2 packets/cycle provided by router 540 to the first input port of router 544 and the normalized packet rate of 0.1 packets/cycle provided by router 543 to the third input port of router 544. This packet rate imbalance is indicated in fig. 10C.
Table 1 summarizes the normalized packet rates and packet transmission cycles for some of the links depicted in fig. 10C. The PCDC buffer P15-to-router 544 link is identified as "P15 to R20", the PCDC buffer P11.1-to-router 544 link is identified as "P11.1 to R20", the PCDC buffer P11.2-to-router 544 link is identified as "P11.2 to R20", and the router 542-to-PCDC buffer P11.1 link is identified as "R11.1 to P11.1".
Link              Normalized packet rate    Packet transmission cycles
P15 to R20        0.2 packets/cycle         0, 5, 10, 15, 20, ...
P11.1 to R20      0.5 packets/cycle         0, 2, 4, 6, 8, 10, 12, 14, ...
P11.2 to R20      0.1 packets/cycle         0, 10, 20, ...
R11.1 to P11.1    1.0 packets/cycle         0, 1, 2, 3, 4, 5, ...
TABLE 1
In this embodiment, during cycles 0, 10, 20, etc., arbitration at router 544 (R20) forces a packet transmitted over the P11.1 to R20 link to wait two cycles, which in turn causes a packet transmitted over the R11.1 to P11.1 link to wait one cycle. The performance loss on the R11.1 to P11.1 link is 20%.
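The cycle-level effect of this arbitration can be illustrated with a short Python sketch. The specification does not state the arbitration policy, so the sketch below assumes a fixed-priority arbiter that forwards one packet per cycle; the simulate() helper and link names are illustrative:

from collections import deque

# A minimal sketch, assuming a fixed-priority arbiter (earlier entries in
# `periods` win ties) that forwards one packet per cycle. The arbitration
# policy and all names are assumptions, not from the specification.
def simulate(periods, horizon):
    """periods: {link: packet period in cycles}. Returns total cycles
    that packets spend queued at the arbiter over `horizon` cycles."""
    queues = {link: deque() for link in periods}
    waited = 0
    for cycle in range(horizon):
        for link, period in periods.items():
            if cycle % period == 0:
                queues[link].append(cycle)      # packet arrives on link
        for link in periods:                    # fixed priority order
            if queues[link]:
                waited += cycle - queues[link].popleft()
                break                           # one packet per cycle
    return waited

# Fig. 10C: the P11.1 packets arriving at cycles 0, 10, 20, ... collide
# with the P15 and P11.2 packets, so some packets queue and the stall
# propagates back over the R11.1 to P11.1 link.
print(simulate({"P15": 5, "P11.1": 2, "P11.2": 10}, horizon=20))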
Fig. 10D depicts a traffic flow view 650 of traffic flows on the final topology of the NoC 300, according to one embodiment of the present disclosure.
In this embodiment, two traffic flows (i.e., traffic flow sets 325 and 326) have been reassigned from router 542 to router 543.
Router 540 receives 0.1 packets/cycle from device 310 through a first input port and 0.1 packets/cycle from device 311 through a second input port, and transmits 0.2 packets/cycle to PCDC buffer P15 through 1 output port and 1 VC. PCDC buffer P15 transmits 0.2 packets/cycle to router 544.
Router 542 receives 0.2 packets/cycle from device 312 via the first input port, 0.2 packets/cycle from device 313 via the second input port, and 0.2 packets/cycle from device 314 via the third input port. Router 542 transmits 0.6 packets/cycle to PCDC buffer P11.1 via 1 output port and 1 VC. PCDC buffer P11.1 transmits 0.3 packets/cycle to router 544.
Router 543 receives 0.2 packets/cycle from device 315 through the first input port, 0.2 packets/cycle from device 316 through the second input port, and 0.2 packets/cycle from device 317 through the third input port. Router 543 transmits 0.6 packets/cycle to PCDC buffer P11.2 through 1 output port and 1 VC. PCDC buffer P11.2 transmits 0.3 packets/cycle to router 544.
Router 544 receives 0.2 packets/cycle from PCDC buffer P15 through the first input port, 0.3 packets/cycle from router 542 through the second input port, and 0.3 packets/cycle from router 543 through the third input port, and transmits 0.8 packets/cycle to device 318.
Table 2 summarizes the normalized packet rates and packet transmission cycles for some of the links depicted in fig. 10D. The PCDC buffer P15-to-router 544 link is identified as "P15 to R20", the PCDC buffer P11.1-to-router 544 link is identified as "P11.1 to R20", the PCDC buffer P11.2-to-router 544 link is identified as "P11.2 to R20", the router 542-to-PCDC buffer P11.1 link is identified as "R11.1 to P11.1", and the router 543-to-PCDC buffer P11.2 link is identified as "R11.2 to P11.2".
Link              Normalized packet rate    Packet transmission cycles
P15 to R20        0.2 packets/cycle         0, 5, 10, 15, 20, ...
P11.1 to R20      0.3 packets/cycle         0, 3, 6, 10, 13, 16, 20, ...
P11.2 to R20      0.3 packets/cycle         0, 3, 6, 10, 13, 16, 20, ...
R11.1 to P11.1    0.6 packets/cycle         0, 2, 4, 5, 6, 8, 10, 12, 14, ...
R11.2 to P11.2    0.6 packets/cycle         0, 2, 4, 5, 6, 8, 10, 12, 14, ...
TABLE 2
In this embodiment, during cycles 0, 10, 20, etc., arbitration at router 544 (R20) forces packets transmitted over the P11.1 to R20 link or the P11.2 to R20 link to wait one cycle, but no cycle loss occurs for packets transmitted over the R11.1 to P11.1 link or the R11.2 to P11.2 link. No performance loss occurs.
The final topology is a balanced NoC topology that eliminates clock domain violations and packet rate imbalances present in reference topology 507.
At 250, a final route for each traffic flow is determined. In one embodiment, shortest path routing is used, and optional constraints may be applied to the generated topology. Different routing methods may be employed, such as, for example, XY-YX routing, turn-prohibited routing, etc.
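As a rough illustration of the shortest path option, a breadth-first search over an unweighted graph of bridge ports, PCDC buffers, and routers yields a minimum-hop route. The function and node names below are illustrative assumptions, not from the specification:

from collections import deque

# A minimal sketch, assuming the generated topology is an unweighted
# adjacency dict. Names ("B312", "R11.1", ...) are illustrative.
def shortest_route(adj, src, dst):
    """BFS over adjacency dict `adj`; returns the hop list src..dst."""
    prev = {src: None}
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            route = []
            while node is not None:     # walk predecessors back to src
                route.append(node)
                node = prev[node]
            return route[::-1]
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                frontier.append(nxt)
    return None  # no route (should not occur in a connected topology)

# Final topology of fig. 11: device 312's bridge reaches device 318's
# bridge via router R11.1, PCDC buffer P11.1, and router R20.
adj = {"B312": ["R11.1"], "R11.1": ["B312", "P11.1"],
       "P11.1": ["R11.1", "R20"], "R20": ["P11.1", "B318"],
       "B318": ["R20"]}
print(shortest_route(adj, "B312", "B318"))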
At 260, a configuration network is generated. In many embodiments, the configuration network may be used for debugging purposes. The configuration network includes bridge ports, routers, connections, and routes. In one embodiment, the configuration network emulates a data network. In addition, the configuration network can be independently optimized in a manner similar to a data network. The delay and performance requirements of the configuration network are generally relaxed to produce the simplest design with the smallest area.
At 270, a PCDC buffer is added to each connection between a bridge or router in a synchronous clock domain and an adjacent bridge or router in an asynchronous clock domain, and the clock domain allocation may be refined. Link sizes are also determined for each router in each route, and resizers are added between bridges and routers, or between adjacent routers, that have different link sizes. Generally, bridge data, traffic data, VC allocation, and topology are used to determine link sizes that collectively meet average traffic performance requirements and individually meet peak traffic performance requirements. In addition, the number of resizers added to the NoC is minimized in order to reduce the delay experienced by the traffic flows. In certain embodiments, certain bridge ports may be allowed to peak at the same time.
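One plausible reading of the link sizing step can be sketched as follows. The supported widths, the bits-per-packet capacity model, and all names are assumptions for illustration only, not the method of the specification:

# A hedged sketch: choose the smallest supported link width whose
# per-cycle capacity covers the aggregate average packet rate on the
# link, and flag a resizer wherever adjacent link widths differ.
SUPPORTED_WIDTHS = (32, 64, 128, 256)  # bits; illustrative values

def link_width(total_rate, bits_per_packet):
    demand = total_rate * bits_per_packet   # average bits per cycle
    for width in SUPPORTED_WIDTHS:
        if width >= demand:
            return width
    return SUPPORTED_WIDTHS[-1]             # clamp to the widest link

def needs_resizer(width_a, width_b):
    return width_a != width_b               # resizer joins unequal links

print(link_width(0.6, 128))                 # 0.6 packets/cycle -> 128 bits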
At 280, pipeline and retiming components are added based on timing. To meet timing, pipeline components are added at appropriate locations to keep slack (i.e., the difference between the required time and the arrival time) within appropriate limits. For example, one or more components may be relocated, and one or more pipeline components may be added if the relocated components cannot meet timing. Component relocation may be based on force-directed placement and the like. In certain embodiments, blocks 260, 270, and 280 may be repeated until the NoC has been optimized.
At 290, NoC output specification 292 is generated and then stored in memory 130. In addition, the NoC output specification 292 may be transmitted over the network 20, provided to software modules 136 used by NoC designers, and the like. For example, the NoC output specification 292 may be provided as an input to a NoC manufacturing process of a chip foundry. Reports 294 may also be generated and then stored in memory 130. For example, the report 294 may include components used in the design (e.g., routers, resizers, PCDCs, pipelines, etc.), traffic on each link, link utilization, delays across paths, etc.
Fig. 11 depicts a final topology 508 of a NoC 300 according to one embodiment of the present disclosure. The user may view the final topology 508 on the display 152.
NoC 300 now includes clock domains 302 and 303. Clock domain 302 is a 0.5GHz clock domain, while clock domain 303 is a 1GHz clock domain.
Devices 310 and 311 are connected to router 540. Devices 312, 313, and 314 are connected to router 542. Devices 315, 316, and 317 are connected to router 543. Device 318 is connected to router 544. The router 540 is connected to the router 544 through the PCDC buffer P15. The router 542 is connected to the router 544 through the PCDC buffer P11.1. The router 543 is connected to the router 544 through the PCDC buffer P11.2.
Fig. 12A, 12B, and 12C depict a flow diagram representing functions associated with synthesizing a NoC, according to one embodiment of the present disclosure. Fig. 12A depicts flowchart 600, fig. 12B depicts flowchart 601, and fig. 12C depicts flowchart 602.
At 610, physical data, device data, bridge data, and traffic data are determined based on the input specifications of the NoC. The physical data includes a dimension of the NoC, the device data includes a plurality of devices, the bridge data includes a plurality of bridge ports, and the traffic data includes a plurality of traffic flows. Each device has a location and a dimension, each bridge port is associated with one of the devices and has a location, and each traffic flow has a packet rate.
At 620, a Virtual Channel (VC) is assigned for each traffic flow to create a plurality of VC allocations.
At 630, a reference topology is generated based on the physical data, the device data, the bridge data, the traffic data, and the VC allocation. The reference topology includes bridge ports, routers, and connections. Each router has one or more input ports and one or more output ports.
In some embodiments, the functions at 631 through 635 are performed to generate the reference topology.
At 631, a head-of-line (HoL) conflict graph (HCG) is constructed based on the traffic data and the VC assignments, including creating a plurality of nodes, creating a plurality of edges, and assigning a color to each node to minimize HoL conflicts. Each node represents a traffic flow, and each edge represents an HoL conflict.
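For illustration, the coloring at 631 can be realized with a greedy first-fit graph coloring. The specification only requires that HoL conflicts be minimized, so the concrete algorithm and names below are assumptions:

# A minimal sketch: nodes are traffic flows, edges are HoL conflicts,
# and a greedy first-fit coloring keeps conflicting flows apart.
def color_hcg(nodes, edges):
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    color = {}
    for n in nodes:                       # first-fit over a fixed order
        used = {color[m] for m in adj[n] if m in color}
        c = 0
        while c in used:
            c += 1
        color[n] = c
    return color  # flows with the same color share one traffic graph (TG)

print(color_hcg(["f1", "f2", "f3"], [("f1", "f2"), ("f2", "f3")]))
# {'f1': 0, 'f2': 1, 'f3': 0}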
At 632, a plurality of traffic graphs (TGs) are constructed based on the physical data, the bridge data, the traffic data, and the HCG, including constructing a TG for each color of the HCG.
At 633, candidate topologies are generated for each TG. Each candidate topology includes at least two bridge ports, at least one router, and at least two connections.
At 634, the candidate topologies are merged to create a reference topology.
At 635, the routers within the reference topology are merged. Flow then proceeds to 640.
In other embodiments, the functions at 610, 620, and 630 are not performed. Instead, at 605, traffic data, VC allocation, and reference topology are received by computer 100, or alternatively, retrieved from memory 130, and flow proceeds to 640.
At 640, each router having at least one output port shared by traffic flows received on at least two input ports is identified.
The functions at 650 and 651 are repeated for each identified router.
At 650, a minimum frequency for the identified router is calculated based on the packet rate of the traffic flow received by the identified router.
At 651, a clock domain is assigned to the identified router based on the minimum frequency of the identified router.
At 660, the traffic flows received by the identified routers are balanced based on their packet rates.
At 670, a final topology is generated based on the reference topology and the balanced traffic flows of the identified routers.
Embodiments of the present disclosure advantageously provide a computer-based method and system for synthesizing a network on chip (NoC). The embodiments described above and outlined below are combinable.
In one embodiment, a computer-based method for synthesizing a NoC is provided. The method includes determining, based on an input specification of the NoC, physical data, device data, bridge data, and traffic data, the physical data including dimensions of the NoC, the device data including a plurality of devices, each device having a location and a dimension, the bridge data including a plurality of bridge ports, each bridge port associated with one of the devices and having a location, and the traffic data including a plurality of traffic flows, each traffic flow having a packet rate. Virtual Channels (VCs) are assigned for each traffic flow to create a plurality of VC assignments. A reference topology is generated based on the physical data, the device data, the bridge data, the traffic data, and the VC assignments, the reference topology including a plurality of bridge ports, a plurality of routers, and a plurality of connections, each router having one or more input ports and one or more output ports. Each router having at least one output port shared by traffic flows received on at least two input ports is identified. For each identified router, a minimum frequency for the identified router is calculated based on the packet rate of the traffic flow received by the identified router, and a clock domain is assigned to the identified router based on the minimum frequency for the identified router. The traffic flows received by the identified routers are balanced based on their packet rates. A final topology is generated based on the reference topology and the balanced traffic flows of the identified routers.
In another embodiment of the method, each shared output port transmits the traffic flow over a single VC.
In another embodiment of the method, calculating the minimum frequency of the identified router includes, for each output port: summing the packet rates of the traffic flows received on the input ports that share the output port to generate a total output packet rate, and calculating a minimum frequency for the output port based on the total output packet rate; determining a maximum frequency among the minimum frequencies of the output ports; and setting the minimum frequency of the identified router to the maximum frequency.
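A minimal Python sketch of this minimum-frequency rule, assuming packet rates are expressed in packets/cycle of a reference clock; the function name and the GHz values are illustrative:

# A minimal sketch, assuming each inner list holds the packet rates of
# the traffic flows feeding one shared output port, expressed in
# packets/cycle of a reference clock running at f_ref_ghz.
def min_router_frequency(shared_ports, f_ref_ghz=1.0):
    """Lowest clock frequency (GHz) at which no shared output port of
    the router must emit more than one packet per cycle."""
    port_min_freqs = []
    for rates in shared_ports:
        total = sum(rates)                   # total output packet rate
        port_min_freqs.append(total * f_ref_ghz)
    return max(port_min_freqs)               # max over all output ports

# Six flows of 0.2 packets/cycle in a 0.5 GHz domain share one port, so
# the router needs at least 1.2 x 0.5 GHz = 0.6 GHz -- which is why it
# ends up in the faster 1 GHz clock domain in the figures.
print(min_router_frequency([[0.2] * 6], f_ref_ghz=0.5))   # ~0.6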
In another embodiment of the method, the method further comprises: adding a Physical Clock Domain Crossing (PCDC) buffer to each link between two routers allocated to different clock domains; determining a normalized packet rate for traffic flows between the bridge port, the router, and the PCDC buffer based on the assigned clock domain; and identifying a clock domain violation for the identified router based on the assigned clock domain and the normalized packet rate.
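The PCDC insertion portion of this embodiment reduces to a scan over the links; the following sketch assumes illustrative data structures and names:

# A minimal sketch: a PCDC buffer is required on every link whose two
# endpoints are assigned to different clock domains. Structures are
# illustrative, not from the specification.
def pcdc_links(links, domain):
    """links: iterable of (node_a, node_b); domain: node -> clock domain.
    Returns the links that must receive a PCDC buffer."""
    return [(a, b) for a, b in links if domain[a] != domain[b]]

domain = {"R11.1": "0.5GHz", "R11.2": "0.5GHz", "R20": "1GHz"}
print(pcdc_links([("R11.1", "R20"), ("R11.2", "R20")], domain))
# both links cross domains, matching buffers P11.1 and P11.2 in fig. 11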
In another embodiment of the method, balancing the traffic flows comprises: identifying a packet rate imbalance for the identified router based on the normalized packet rate; determining a performance cost based on the assigned clock domain and the normalized packet rate; and correcting clock domain violations and packet rate imbalances based on the performance cost.
In another embodiment of the method, the performance cost includes a router clock domain violation cost and a packet rate balancing cost.
In another embodiment of the method, correcting the clock domain violation and the packet rate imbalance includes at least one of: assigning different clock domains to one or more of the identified routers; routing one or more traffic flows to different identified routers; and adding one or more of the identified routers.
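For illustration, the cost-guided correction in this embodiment can be sketched as evaluating candidate corrections and keeping the cheapest. The cost weights and the imbalance figures below are assumptions, not values from the specification:

# A minimal sketch: score each candidate correction by a weighted sum of
# router clock domain violations and residual packet rate imbalance,
# then keep the cheapest. The weights are assumptions.
def performance_cost(violations, imbalance, w_violation=10.0, w_balance=1.0):
    return w_violation * violations + w_balance * imbalance

def pick_correction(candidates):
    """candidates: (name, violations, imbalance) tuples describing the
    topology after each candidate correction; returns the cheapest."""
    return min(candidates, key=lambda c: performance_cost(c[1], c[2]))

# Fig. 10C -> 10D: rerouting two flows from R11.1 to R11.2 removes the
# 0.5 vs 0.2 imbalance at router 544 without creating any violation.
print(pick_correction([("keep topology", 0, 0.3),
                       ("reroute two flows", 0, 0.0)]))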
In another embodiment of the method, generating the reference topology comprises: constructing a head-of-line (HoL) conflict graph (HCG) based on the traffic data and the VC allocation, including creating a plurality of nodes, each node representing a traffic flow, creating a plurality of edges, each edge representing an HoL conflict, and assigning a color to each HCG node to minimize HoL conflicts; constructing a plurality of traffic graphs (TGs) based on the physical data, the bridge data, the traffic data, and the HCG, including constructing a TG for each color of the HCG; generating candidate topologies for each TG, each candidate topology comprising at least two bridge ports, at least one router, and at least two connections; merging the candidate topologies to create the reference topology; and merging routers within the reference topology.
In one embodiment, a system for synthesizing a NoC comprises: a memory for storing an input specification of the NoC; and a processor coupled to the memory. The processor is configured to: determine, based on the input specification of the NoC, physical data, device data, bridge data, and traffic data, the physical data including dimensions of the NoC, the device data including a plurality of devices, each device having a location and a dimension, the bridge data including a plurality of bridge ports, each bridge port associated with one of the devices and having a location, and the traffic data including a plurality of traffic flows, each traffic flow having a packet rate; allocate a Virtual Channel (VC) for each traffic flow to create a plurality of VC allocations; generate a reference topology based on the physical data, the device data, the bridge data, the traffic data, and the VC assignments, the reference topology comprising a plurality of bridge ports, a plurality of routers, and a plurality of connections, each router having one or more input ports and one or more output ports; identify each router having at least one output port that is shared by traffic flows received on at least two input ports; for each identified router, calculate a minimum frequency for the identified router based on the packet rate of the traffic flow received by the identified router, and assign a clock domain for the identified router based on the minimum frequency for the identified router; balance the traffic flows received by the identified routers based on their packet rates; and generate a final topology based on the reference topology and the balanced traffic flows of the identified routers.
In another embodiment of the system, each shared output port transmits the traffic flow over a single VC.
In another embodiment of the system, calculating the minimum frequency for the identified router includes, for each output port: summing the packet rates of the traffic flows received on the input ports that share the output port to generate a total output packet rate, and calculating a minimum frequency for the output port based on the total output packet rate; determining a maximum frequency among the minimum frequencies of the output ports; and setting the minimum frequency of the identified router to the maximum frequency.
In another embodiment of the system, the processor is further configured to: adding a Physical Clock Domain Crossing (PCDC) buffer to each link between two routers allocated to different clock domains; determining a normalized packet rate for traffic flows between the bridge port, the router, and the PCDC buffer based on the assigned clock domain; and identifying a clock domain violation for the identified router based on the assigned clock domain and the normalized packet rate.
In another embodiment of the system, balancing the traffic flows comprises: identifying a packet rate imbalance for the identified router based on the normalized packet rate; determining a performance cost based on the assigned clock domain and the normalized packet rate; and correcting clock domain violations and packet rate imbalances based on the performance cost.
In another embodiment of the system, the performance cost includes a router clock domain violation cost and a packet rate balancing cost.
In another embodiment of the system, correcting the clock domain violation and the packet rate imbalance includes at least one of: assigning different clock domains to one or more of the identified routers; routing one or more traffic flows to different identified routers; and adding one or more of the identified routers.
In another embodiment of the system, generating the reference topology comprises: constructing a head-of-line (HoL) conflict graph (HCG) based on the traffic data and the VC allocation, including creating a plurality of nodes, each node representing a traffic flow, creating a plurality of edges, each edge representing an HoL conflict, and assigning a color to each HCG node to minimize HoL conflicts; constructing a plurality of traffic graphs (TGs) based on the physical data, the bridge data, the traffic data, and the HCG, including constructing a TG for each color of the HCG; generating candidate topologies for each TG, each candidate topology comprising at least two bridge ports, at least one router, and at least two connections; merging the candidate topologies to create the reference topology; and merging routers within the reference topology.
In one embodiment, another computer-based method for synthesizing NoCs is provided. Traffic data, Virtual Channel (VC) allocations, and a reference topology are received, the traffic data comprising a plurality of traffic flows, each traffic flow having a packet rate, and the reference topology comprising a plurality of bridge ports, a plurality of routers, and a plurality of connections, each router having one or more input ports and one or more output ports. Each router having at least one output port shared by traffic flows received on at least two input ports is identified. For each identified router, a minimum frequency for the identified router is calculated based on the packet rate of the traffic flow received by the identified router, and a clock domain is assigned to the identified router based on the minimum frequency for the identified router. The traffic flows received by the identified routers are balanced based on their packet rates. A final topology is generated based on the reference topology and the balanced traffic flows of the identified routers.
In another embodiment of the method, each shared output port transmits the traffic flow over a single VC, and calculating the minimum frequency of the identified router includes, for each output port: summing the packet rates of the traffic flows received on the input ports that share the output port to generate a total output packet rate, and calculating a minimum frequency for the output port based on the total output packet rate; determining a maximum frequency among the minimum frequencies of the output ports; and setting the minimum frequency of the identified router to the maximum frequency.
In another embodiment, the method further comprises: adding a Physical Clock Domain Crossing (PCDC) buffer to each link between two routers allocated to different clock domains; determining a normalized packet rate for traffic flows between the bridge ports, routers, and PCDC buffers based on the assigned clock domains; and identifying a clock domain violation for the identified router based on the assigned clock domain and the normalized packet rate, wherein balancing the traffic flows comprises: identifying a packet rate imbalance for the identified router based on the normalized packet rate, determining a performance cost based on the assigned clock domain and the normalized packet rate, the performance cost including a router clock domain violation cost and a packet rate balancing cost, and correcting the clock domain violation and the packet rate imbalance based on the performance cost.
In another embodiment of the method, correcting the clock domain violation and the packet rate imbalance includes at least one of: assigning different clock domains to one or more of the identified routers; routing one or more traffic flows to different identified routers; and adding one or more identified routers.
While the present disclosure is susceptible of embodiment in many different forms, there is shown in the drawings and will herein be described in detail specific embodiments, with the understanding that the present disclosure is to be considered as an exemplification of the principles of the disclosure and is not intended to limit the disclosure to the specific embodiments illustrated and described. In the description above, like reference numerals may be used to describe the same, similar or corresponding parts in the several views of the drawings.
In this document, relational terms such as first and second, top and bottom, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," "includes," "including," "has," "having," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by "comprises ... a" does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.
Reference throughout this document to "one embodiment," "certain embodiments," "an embodiment," "a specific implementation," "aspect," or similar terminology means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of such phrases in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments without limitation.
As used herein, the term "or" is to be interpreted as inclusive, meaning any one or any combination. Thus, "A, B or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C." An exception to this definition will occur only when a combination of elements, functions, steps or acts are in some way inherently mutually exclusive. Furthermore, grammatical conjunctions are intended to convey any and all disjunctive and conjunctive combinations of coupled clauses, sentences, words, and the like, unless otherwise indicated or clear from the context. Thus, the term "or" should generally be understood to mean "and/or" and the like. Reference to an item in the singular is to be understood as including the plural, and vice versa, unless explicitly stated otherwise or clear from the context.
Recitation of ranges of values herein are not intended to be limiting, but rather are individually intended to mean any and all values falling within the range, unless otherwise indicated herein, and each separate value within such range is incorporated into the specification as if it were individually recited herein. When accompanying numerical values, the words "about," "approximately," and the like are to be understood as indicating a deviation that, as one of ordinary skill in the art would appreciate, operates satisfactorily for its intended purpose. Values and/or ranges of values are provided herein as examples only and do not constitute limitations on the scope of the described embodiments. The use of any and all examples, or exemplary language ("e.g.," "such as," etc.) provided herein is intended merely to better illuminate the embodiments and does not pose a limitation on the scope of the embodiments. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the embodiments.
For simplicity and clarity of illustration, reference numerals may be repeated among the figures to indicate corresponding or analogous elements. Numerous details are set forth to provide an understanding of the embodiments described herein. Embodiments may be practiced without these specific details. In other instances, well-known methods, procedures, and components have not been described in detail so as not to obscure the embodiments. The description should not be considered as limiting the scope of the embodiments described herein.
In the following description, it is to be understood that such terms as "first," "second," "top," "bottom," "up," "down," "over," "under," and the like are words of convenience and are not to be construed as limiting terms. In addition, the terms device, apparatus, system, and the like may be used interchangeably herein.
The many features and advantages of the disclosure are apparent from the detailed specification, and thus, it is intended by the appended claims to cover all such features and advantages of the disclosure which fall within the scope of the disclosure. Further, since numerous modifications and variations will readily occur to those skilled in the art, it is not desired to limit the disclosure to the exact construction and operation illustrated and described, and accordingly, all suitable modifications and equivalents may be resorted to, falling within the scope of the disclosure.

Claims (20)

1. A computer-based method for synthesizing a network on chip (NoC), the computer-based method comprising:
determining, based on input specifications of the NoC, physical data, device data, bridge data, and traffic data, the physical data including dimensions of the NoC, the device data including a plurality of devices, each device having a location and a dimension, the bridge data including a plurality of bridge ports, each bridge port associated with one of the devices and having a location, the traffic data including a plurality of traffic flows, each traffic flow having a packet rate;
allocating a Virtual Channel (VC) for each traffic flow to create a plurality of VC allocations;
generating a reference topology based on the physical data, the device data, the bridge data, the traffic data, and the VC allocation, the reference topology comprising the plurality of bridge ports, a plurality of routers, and a plurality of connections, each router having one or more input ports and one or more output ports;
identifying each router having at least one output port that is shared by traffic flows received on at least two input ports;
for each identified router:
calculating a minimum frequency for the identified router based on the packet rate of the traffic flow received by the identified router, and
assigning a clock domain for the identified router based on the minimum frequency of the identified router;
balancing the traffic flows received by the identified routers based on the packet rates of the traffic flows; and
generating a final topology based on the reference topology and the balanced traffic flows of the identified routers.
2. The computer-based method of claim 1, wherein each shared output port transmits the traffic flow over a single VC.
3. The computer-based method of claim 2, wherein the calculating the minimum frequency of the identified router comprises:
for each output port:
summing the packet rates of the traffic flows received on the input ports that share the output port to generate a total output packet rate, and
calculating a minimum frequency for the output port based on the total output packet rate;
determining a maximum frequency among the minimum frequencies of the output ports; and
setting the minimum frequency of the identified router to the maximum frequency.
4. The computer-based method of claim 3, further comprising:
adding a Physical Clock Domain Crossing (PCDC) buffer to each link between two routers assigned to different clock domains;
determining a normalized packet rate for the traffic flow between the bridge port, router, and PCDC buffer based on the assigned clock domain; and
identifying a clock domain violation for the identified router based on the assigned clock domain and the normalized packet rate.
5. The computer-based method of claim 4, wherein the balancing the traffic flows comprises:
identifying a packet rate imbalance for the identified router based on the normalized packet rate;
determining a performance cost based on the assigned clock domain and the normalized packet rate; and
correcting the clock domain violation and the packet rate imbalance based on the performance cost.
6. The computer-based method of claim 5, wherein the performance cost comprises a router clock domain violation cost and a packet rate balancing cost.
7. The computer-based method of claim 6, wherein the correcting the clock domain violation and the packet rate imbalance comprises at least one of:
assigning different clock domains to one or more of the identified routers;
routing one or more traffic flows to different identified routers; and
adding one or more identified routers.
8. The computer-based method of claim 1, wherein the generating a reference topology comprises:
constructing a head-of-line (HoL) conflict graph (HCG) based on the traffic data and the VC assignments, comprising:
creating a plurality of nodes, each node representing a traffic flow,
creating a plurality of edges, each edge representing an HoL conflict, and
assigning colors to each HCG node to minimize HoL conflicts;
constructing a plurality of traffic graphs (TGs) based on the physical data, the bridge data, the traffic data, and the HCG, including constructing a TG for each color of the HCG;
generating candidate topologies for each TG, each candidate topology comprising at least two bridge ports, at least one router, and at least two connections;
merging the candidate topologies to create the reference topology; and
merging routers within the reference topology.
9. A system for synthesizing a network on chip (NoC), the system comprising:
a memory for storing an input specification of a NoC; and
a processor coupled to the memory, the processor configured to:
determining, based on input specifications of the NoC, physical data, device data, bridge data, and traffic data, the physical data including dimensions of the NoC, the device data including a plurality of devices, each device having a location and a dimension, the bridge data including a plurality of bridge ports, each bridge port associated with one of the devices and having a location, the traffic data including a plurality of traffic flows, each traffic flow having a packet rate,
assigning a Virtual Channel (VC) for each traffic flow to create a plurality of VC assignments,
generating a reference topology based on the physical data, the device data, the bridge data, the traffic data, and the VC assignments, the reference topology comprising the plurality of bridge ports, a plurality of routers, and a plurality of connections, each router having one or more input ports and one or more output ports,
identifying each router having at least one output port that is shared by traffic flows received on at least two input ports,
for each identified router:
calculating a minimum frequency for the identified router based on the packet rate of the traffic flow received by the identified router, and
assigning a clock domain for the identified router based on the minimum frequency of the identified router,
balancing the traffic flows received by the identified routers based on the packet rates of the traffic flows, and
generating a final topology based on the reference topology and the balanced traffic flows of the identified routers.
10. The system of claim 9, wherein each shared output port transmits the traffic flow over a single VC.
11. The system of claim 10, wherein the calculating the minimum frequency of the identified router comprises:
for each output port:
summing the packet rates of the traffic flows received on the input ports that share the output port to generate a total output packet rate, and
calculating a minimum frequency for the output port based on the total output packet rate;
determining a maximum frequency among the minimum frequencies of the output ports; and
setting the minimum frequency of the identified router to the maximum frequency.
12. The system of claim 11, wherein the processor is further configured to:
adding a Physical Clock Domain Crossing (PCDC) buffer to each link between two routers assigned to different clock domains;
determining a normalized packet rate for the traffic flow between the bridge port, router, and PCDC buffer based on the assigned clock domain; and
identifying a clock domain violation for the identified router based on the assigned clock domain and the normalized packet rate.
13. The system of claim 12, wherein the balancing the traffic flows comprises:
identifying a packet rate imbalance for the identified router based on the normalized packet rate;
determining a performance cost based on the assigned clock domain and the normalized packet rate; and
correcting the clock domain violation and the packet rate imbalance based on the performance cost.
14. The system of claim 13, wherein the performance cost comprises a router clock domain violation cost and a packet rate balancing cost.
15. The system of claim 14, wherein the correcting the clock domain violation and the packet rate imbalance comprises at least one of:
assigning different clock domains to one or more of the identified routers;
routing one or more traffic flows to different identified routers; and
adding one or more identified routers.
16. The system of claim 9, wherein the generating a reference topology comprises:
constructing a head-of-line (HoL) conflict graph (HCG) based on the traffic data and the VC assignments, comprising:
creating a plurality of nodes, each node representing a traffic flow,
creating a plurality of edges, each edge representing an HoL conflict, and
assigning colors to each HCG node to minimize HoL conflicts;
constructing a plurality of traffic graphs (TGs) based on the physical data, the bridge data, the traffic data, and the HCG, including constructing a TG for each color of the HCG;
generating candidate topologies for each TG, each candidate topology comprising at least two bridge ports, at least one router, and at least two connections;
merging the candidate topologies to create the reference topology; and
merging routers within the reference topology.
17. A computer-based method for synthesizing a network on chip (NoC), the computer-based method comprising:
receiving traffic data, a Virtual Channel (VC) allocation, and a reference topology, the traffic data comprising a plurality of traffic flows, the reference topology comprising a plurality of bridge ports, a plurality of routers, and a plurality of connections, each router having one or more input ports and one or more output ports, each traffic flow having a packet rate;
identifying each router having at least one output port that is shared by traffic flows received on at least two input ports;
for each identified router:
calculating a minimum frequency for the identified router based on the packet rate of the traffic flow received by the identified router, and
assigning a clock domain for the identified router based on the minimum frequency of the identified router;
balancing the traffic flows received by the identified routers based on the packet rates of the traffic flows; and
generating a final topology based on the reference topology and the balanced traffic flows of the identified routers.
18. The computer-based method of claim 17, wherein each shared output port transmits the traffic flow over a single VC, and the calculating the minimum frequency for the identified router comprises:
for each output port:
summing the packet rates of the traffic flows received on the input ports that share the output port to generate a total output packet rate, and
calculating a minimum frequency for the output port based on the total output packet rate;
determining a maximum frequency among the minimum frequencies of the output ports; and
setting the minimum frequency of the identified router to the maximum frequency.
19. The computer-based method of claim 18, the computer-based method further comprising:
adding a Physical Clock Domain Crossing (PCDC) buffer to each link between two routers assigned to different clock domains;
determining a normalized packet rate for the traffic flow between the bridge port, router and PCDC buffer based on the assigned clock domain; and
identifying a clock domain violation for the identified router based on the assigned clock domain and the normalized packet rate,
wherein said balancing said traffic flows comprises:
identifying a packet rate imbalance for the identified router based on the normalized packet rate,
determining a performance cost based on the assigned clock domain and the normalized packet rate, the performance cost comprising a router clock domain violation cost and a packet rate balancing cost, and
correcting the clock domain violation and the packet rate imbalance based on the performance cost.
20. The computer-based method of claim 19, wherein the correcting the clock domain violation and the packet rate imbalance comprises at least one of:
assigning different clock domains to one or more of the identified routers;
routing one or more traffic flows to different identified routers; and
adding one or more of the identified routers.
CN202210113322.1A 2021-02-09 2022-01-30 Network-on-chip topology generation Pending CN114915586A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US17/171,408 2021-02-09
US17/171,408 US11329690B2 (en) 2019-07-22 2021-02-09 Network-on-Chip topology generation

Publications (1)

Publication Number Publication Date
CN114915586A true CN114915586A (en) 2022-08-16

Family

ID=80461567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210113322.1A Pending CN114915586A (en) 2021-02-09 2022-01-30 Network-on-chip topology generation

Country Status (2)

Country Link
CN (1) CN114915586A (en)
GB (1) GB2607653B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11194950B2 (en) * 2019-07-22 2021-12-07 Arm Limited Network-on-chip topology generation

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115643167A (en) * 2022-12-14 2023-01-24 摩尔线程智能科技(北京)有限责任公司 Network-on-chip configuration method and device, and storage medium
CN115643167B (en) * 2022-12-14 2023-03-10 摩尔线程智能科技(北京)有限责任公司 Network-on-chip configuration method and device, and storage medium

Also Published As

Publication number Publication date
GB2607653A (en) 2022-12-14
GB2607653B (en) 2023-06-14
GB202200723D0 (en) 2022-03-09

Similar Documents

Publication Publication Date Title
KR101652490B1 (en) Automatic noc topology generation
US11329690B2 (en) Network-on-Chip topology generation
US8819616B2 (en) Asymmetric mesh NoC topologies
US11194950B2 (en) Network-on-chip topology generation
US9244880B2 (en) Automatic construction of deadlock free interconnects
US10817627B1 (en) Network on-chip topology generation
Abdallah et al. Basic network-on-chip interconnection for future gigascale MCSoCs applications: Communication and computation orthogonalization
US11283729B2 (en) Network-on-chip element placement
US20220210056A1 (en) Network-On-Chip Topology Generation
US11310169B2 (en) Network-on-chip topology generation
Manevich et al. Best of both worlds: A bus enhanced NoC (BENoC)
CN114915586A (en) Network-on-chip topology generation
US10419300B2 (en) Cost management against requirements for the generation of a NoC
Göhringer et al. Heterogeneous and runtime parameterizable star-wheels network-on-chip
US10547514B2 (en) Automatic crossbar generation and router connections for network-on-chip (NOC) topology generation
US20180198682A1 (en) Strategies for NoC Construction Using Machine Learning
Zitouni et al. Communication architecture synthesis for multi-bus SoC
Johari et al. Master-based routing algorithm and communication-based cluster topology for 2D NoC
Alimi et al. Network-on-Chip Topologies: Potentials, Technical Challenges, Recent Advances and Research Direction
Bamberg et al. Interconnect architectures for 3d technologies
Gugulothu et al. Design and Implementation of various topologies for Networks on Chip and its performance evolution
Sasakawa et al. LEF: long edge first routing for two-dimensional mesh network on chip
e Fizardo et al. State of art of network on chip
Sahu Bidirectional Network-on-Chip Router Implementation Using VHDL
Umamaheswari et al. Dynamic buffer management to improve the performance of fault tolerance adaptive network-on-chip applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination