CN112631985B - Network-on-chip for link sharing - Google Patents

Publication number
CN112631985B
Authority
CN
China
Prior art keywords
data packet
virtual channel
chip
classifier
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011528831.8A
Other languages
Chinese (zh)
Other versions
CN112631985A (en)
Inventor
胡东伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 54 Research Institute
Original Assignee
CETC 54 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 54 Research Institute
Priority to CN202011528831.8A
Publication of CN112631985A
Application granted
Publication of CN112631985B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356 Indirect interconnection networks
    • G06F15/17368 Indirect interconnection networks non hierarchical topologies
    • G06F15/17381 Two dimensional, e.g. mesh, torus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7825 Globally asynchronous, locally synchronous, e.g. network on chip
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a link-sharing network-on-chip and belongs to the technical field of chip design. The network-on-chip comprises a plurality of processor cores arranged in an array. A physical link connects each pair of adjacent processor cores in the array, and each end of every physical link is provided with an arbitration and multiplexing selector, a classifier and two virtual channel buffers. Each processor core forwards a data packet to the next-hop processor core along the routing path until the data packet reaches the receiving-end processor core. By adopting virtual channel technology, the invention keeps the two service flows independent while sharing the physical links. In addition, an unbalanced arbiter allows the arbiter to be matched to the service flows, which helps improve network-on-chip utilization.

Description

Network-on-chip for link sharing
Technical Field
The invention relates to the technical field of chip design, and in particular to a link-sharing network-on-chip.
Background
Network-on-chip is an important method of implementing multi-core/many-core processors. With network-on-chip technology, hundreds or thousands of processor cores within a single chip can be interconnected. Modern high-end server chips all employ network-on-chip technology. However, companies do not disclose how they implement it, so each company has its own network-on-chip implementation.
For multi-core/many-core processors, a network-on-chip not only lets each processor access off-chip memory but also lets the processors communicate with one another. These are two different traffic flows. At present, there are two ways to support both access from any processor to off-chip memory and mutual access between any two processors. The first uses two completely independent networks-on-chip, one per traffic flow. For example, the TILE64 architecture claims to have 3 separate networks-on-chip. Because a network-on-chip requires a large number of on-chip traces, this approach is costly to implement and consumes considerable chip area. The second mixes the two traffic flows together and carries both over a single network-on-chip. This approach appears in a number of academic papers. Its disadvantage is that when one traffic flow is blocked, the other is blocked as well, which degrades system performance.
Disclosure of Invention
In view of this, the present invention proposes a link-sharing network-on-chip that adopts virtual channel technology. It separates the service flow that accesses memory from the service flow that carries communication between processor cores while the two share the same physical links, which both preserves performance and reduces the area overhead of the network-on-chip.
To achieve the above purpose, the invention provides the following technical scheme:
A link-sharing network-on-chip comprises a plurality of processor cores arranged in an array. A physical link is arranged between every two adjacent processor cores in the array, and each end of every physical link is provided with an arbitration and multiplexing selector, a classifier, and two virtual channel buffers that store off-chip memory access service flow data packets and inter-processor communication service flow data packets, respectively. According to the routing path, the processor core sends the data packets of the off-chip memory access service flow and of the inter-processor communication service flow to their corresponding virtual channel buffers. The arbitration and multiplexing selector selects one data packet from the two virtual channel buffers and sends it over the corresponding physical link to the classifier of the next-hop processor core. That classifier stores the received data packet in the corresponding virtual channel buffer, and the arbitration and multiplexing selector there forwards it over the corresponding physical link to the classifier of the following next-hop processor core. This repeats until the data packet reaches the classifier of the receiving-end processor core, which sends off-chip memory access service flow data packets to the off-chip memory and delivers inter-processor communication service flow data packets to the receiving-end processor core.
Further, the arbitration and multiplexing selector comprises a register and an inverter. The register is reset to 0 initially and, once started, toggles continuously between 0 and 1 to generate the multiplexing selection signal, so that data packets in the two virtual channel buffers are sent in round-robin fashion.
Further, the arbitration and multiplexing selector comprises a cyclic shift register and a configurable selector. The cyclic shift register is an N-bit register chain, N ≥ 2. When the register chain is reset, exactly one bit is 1 and all other bits are 0; thereafter the cyclic shift register shifts right by one bit on every clock cycle. The selector selects a fixed M bits from the N-bit register chain, M < N, and ORs these M bits to obtain the selection signal of the multiplexer.
Further, each virtual channel buffer is a first-in first-out buffer.
Further, the packet header of the data packet has a type field and a control field, where the type field is used to indicate the traffic flow type of the data packet, and the control field is used to indicate the access type, that is, whether the data packet belongs to a read request, a read response, a write request or a write response.
Further, the virtual channel buffer has a data packet validity indication signal. After receiving a data packet from the opposite end, the classifier compares the type field of the data packet with the service flow types of the two local virtual channel buffers to obtain two corresponding Boolean values. It then ANDs each Boolean value with the data packet validity indication signal from the opposite end to obtain the validity signals of the two local virtual channel buffers, and sends the data packet to both local virtual channel buffers simultaneously; only the local virtual channel buffer whose validity signal is asserted actually receives the data packet.
From the above description, the technical scheme of the invention has the following beneficial effects:
1. The invention adopts virtual channel technology so that, within the network-on-chip, the two service flows (processor cores accessing off-chip memory and processor cores accessing one another) are independent while the physical links are shared. On the one hand, separating the memory access service flow from the inter-processor communication service flow improves communication performance; on the other hand, sharing the physical links between the two service flows reduces on-chip wiring overhead and therefore chip area.
2. Through the programmable unbalanced arbiter, the invention can match the arbiter to the service flows, which helps improve network-on-chip utilization.
Drawings
For a clearer description of the present patent, one or more drawings are provided below, which are intended to aid in the description of the background, principles, and/or certain embodiments of the present patent.
Fig. 1 is a schematic diagram of a network on chip in an embodiment of the present invention.
Fig. 2 is a schematic diagram of a link sharing structure in an embodiment of the present invention.
Fig. 3 is a schematic diagram of a format of a data packet according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of one implementation of an arbitration and multiplexing selector in an embodiment of the present invention.
Fig. 5 is a schematic diagram of another implementation of an arbitration and multiplexing selector in an embodiment of the present invention.
Fig. 6 is a schematic diagram of one implementation of a classifier in an embodiment of the invention.
Detailed Description
To help those skilled in the art understand the technical solution of the present patent, to make its technical purpose, technical solution and beneficial effects clearer, and to ensure that the scope of the claims is fully supported, the technical solution is described in further detail below by way of specific embodiments.
As shown in Fig. 1 and Fig. 2, a link-sharing network-on-chip includes a plurality of processor cores arranged in an array. A physical link is arranged between every two adjacent processor cores in the array, and each end of every physical link is provided with an arbitration and multiplexing selector, a classifier, and two virtual channel buffers that store off-chip memory access service flow data packets and inter-processor communication service flow data packets, respectively. According to the routing path, the processor core sends the data packets of the off-chip memory access service flow (service flow 1) to virtual channel buffer A at the local end and the data packets of the inter-processor communication service flow (service flow 2) to virtual channel buffer B at the local end. The arbitration and multiplexing selector selects one data packet from the two virtual channel buffers and transmits it over the corresponding physical link to the classifier of the next-hop processor core. According to the service flow type, that classifier stores data packets of service flow 1 in virtual channel buffer A and data packets of service flow 2 in virtual channel buffer B of the next-hop physical link at its local end. The arbitration and multiplexing selector of that link then transmits the data packet over the next-hop physical link to the classifier of the following processor core, and so on, until the data packet reaches the virtual channel buffer at the receiving-end processor core. The classifier at the receiving-end processor core sends off-chip memory access service flow data packets to the off-chip memory and delivers inter-processor communication service flow data packets to the receiving-end processor core.
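The single-hop behaviour described above can be sketched in a few lines of Python. This is only an illustrative software model, not the patented hardware: the names `LinkEnd`, `send_one` and `demo`, the dictionary packet layout, and the numeric flow codes 1 and 2 are all assumptions made for the example.

```python
from collections import deque

MEM, IPC = 1, 2  # assumed codes: service flow 1 (off-chip memory), service flow 2 (inter-processor)


class LinkEnd:
    """One end of a physical link: virtual channel buffers A and B behind a classifier."""

    def __init__(self):
        self.vc = {MEM: deque(), IPC: deque()}

    def classify(self, packet):
        # Store the packet in the buffer that matches its service flow type.
        self.vc[packet["type"]].append(packet)


def send_one(src, dst, flow):
    """Move at most one packet of the chosen flow across the shared link."""
    if src.vc[flow]:
        dst.classify(src.vc[flow].popleft())
        return True
    return False  # nothing queued for this flow in this cycle


def demo():
    core_a, core_b = LinkEnd(), LinkEnd()
    # The sending core places its own packets into the local buffers.
    for i in range(2):
        core_a.classify({"type": MEM, "id": f"mem{i}"})
    core_a.classify({"type": IPC, "id": "ipc0"})
    # Alternate between the two flows, as a simple arbiter would.
    for cycle in range(4):
        send_one(core_a, core_b, MEM if cycle % 2 == 0 else IPC)
    return [p["id"] for p in core_b.vc[MEM]], [p["id"] for p in core_b.vc[IPC]]
```

In this model the two flows travel over one link but arrive in separate buffers at the next hop, so a backlog in one buffer does not sit in front of packets of the other flow.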
Further, as shown in Fig. 3, the packet header of the data packet has a type field and other control fields. The type field indicates the service flow type of the data packet; the control fields indicate the access type, that is, whether the data packet is a read request, a read response, a write request or a write response.
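As a rough illustration of such a header, the sketch below packs a flow type and an access type into a small integer. The patent does not give field widths or encodings, so the 2-bit access field, the numeric values and the names `FlowType`, `AccessType`, `PacketHeader`, `encode` and `decode` are assumptions chosen only for this example.

```python
from dataclasses import dataclass
from enum import Enum


class FlowType(Enum):
    MEMORY_ACCESS = 1    # service flow 1 (assumed code)
    INTER_PROCESSOR = 2  # service flow 2 (assumed code)


class AccessType(Enum):
    READ_REQUEST = 0
    READ_RESPONSE = 1
    WRITE_REQUEST = 2
    WRITE_RESPONSE = 3


@dataclass(frozen=True)
class PacketHeader:
    flow: FlowType
    access: AccessType


def encode(header):
    # Assumed layout: flow type in the upper bits, 2-bit access type in the low bits.
    return (header.flow.value << 2) | header.access.value


def decode(bits):
    return PacketHeader(FlowType(bits >> 2), AccessType(bits & 0b11))
```

A classifier only needs the flow part of the header to steer a packet; the access part matters to whichever endpoint finally consumes it.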
Further, as shown in Fig. 4, the arbitration and multiplexing selector includes a register Reg and an inverter INV. The register Reg is reset to 0 initially and, once started, toggles continuously between 0 and 1, generating the multiplexing selection signal S for the selector MUX. According to the value of S, the selector MUX selects the corresponding data packet, Pkt1 or Pkt2, from the two first virtual channel buffers and outputs it as the data packet Pkt.
With this arbitration and multiplexing selector, the selection keeps switching regardless of whether a data packet is actually sent. Thus, when the data packets of one virtual channel are congested, the selector can still attempt to send data packets of the other virtual channel. Because the arbitration and multiplexing selector picks a single data packet at a time, the physical link needs to be only one data packet wide, which reduces the implementation cost of the physical link.
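A minimal behavioural sketch of this register-plus-inverter arbiter follows. The class name `ToggleArbiter`, the `tick` method, and the convention that S = 0 selects Pkt1 and S = 1 selects Pkt2 are assumptions; the text does not say which value of S maps to which buffer.

```python
class ToggleArbiter:
    """Software model of a one-bit register whose next value is its own inverse."""

    def __init__(self):
        self.reg = 0  # the register is reset to 0

    def tick(self):
        s = self.reg   # selection signal S for this clock cycle
        self.reg ^= 1  # the inverter output is latched on the next clock
        return s


def mux(s, pkt1, pkt2):
    # Assumed mapping: S = 0 picks Pkt1, S = 1 picks Pkt2.
    return pkt1 if s == 0 else pkt2
```

Calling `tick()` repeatedly yields 0, 1, 0, 1, and so on, so each virtual channel is offered the link on alternate cycles whether or not it has a packet to send.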
Further, as shown in Fig. 5, the arbitration and multiplexing selector includes a cyclic shift register and a configurable selector. The cyclic shift register is a 17-bit register chain. When the register chain is reset, exactly one bit is 1 and all other bits are 0; thereafter the cyclic shift register shifts right by one bit on every clock cycle. The selector selects a fixed 10 bits from the 17-bit register chain and ORs these 10 bits to obtain the selection signal S of the multiplexer. According to the value of S, the selector MUX selects the corresponding data packet, Pkt1 or Pkt2, from the two first virtual channel buffers and outputs it as the data packet Pkt.
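The shift-register arbiter can be modelled the same way. In the sketch below the class name `ShiftRegisterArbiter`, the list representation of the register chain, and the choice of which positions are tapped are assumptions; the text only fixes N = 17 and M = 10 for this embodiment.

```python
class ShiftRegisterArbiter:
    """Model of a one-hot cyclic shift register with M tapped bits ORed into S."""

    def __init__(self, n_bits, selected_positions):
        self.selected = tuple(selected_positions)
        if n_bits < 2:
            raise ValueError("N must be at least 2")
        if not 0 < len(set(self.selected)) < n_bits:
            raise ValueError("M must satisfy 0 < M < N")
        if any(not 0 <= p < n_bits for p in self.selected):
            raise ValueError("tap position out of range")
        self.bits = [1] + [0] * (n_bits - 1)  # reset state: exactly one bit is 1

    def tick(self):
        # OR of the tapped bits gives the selection signal for this cycle.
        s = int(any(self.bits[p] for p in self.selected))
        # Cyclic shift right by one position per clock.
        self.bits = [self.bits[-1]] + self.bits[:-1]
        return s
```

With `ShiftRegisterArbiter(17, range(10))`, S is 1 for ten consecutive cycles and 0 for the next seven, then the pattern repeats. Tapping positions that are spread out instead of adjacent would interleave the two flows more finely while keeping the same overall share.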
In many systems, the traffic volume of off-chip memory access and that of inter-processor communication are unequal. In most systems, for example, off-chip memory access traffic is much greater than inter-processor communication traffic; that is, service flow 1 carries much more traffic than service flow 2. An unbalanced arbiter such as the one described above can therefore be placed between service flow 1 and service flow 2. With a 3:1 ratio, for example, one service flow 2 data packet is sent for every three service flow 1 data packets. In this way the efficiency of the network-on-chip can be effectively improved.
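The arbitration ratio of the shift-register arbiter follows directly from N and M. Because the single 1 visits each of the N positions exactly once every N clock cycles, S is 1 in M of every N cycles. The relation below is a worked sketch under two assumptions: every offered slot is actually used, and the flow selected by S = 1 is the one meant to receive the larger share.

```latex
\[
\frac{\text{cycles with } S = 1}{N} = \frac{M}{N},
\qquad
\frac{\text{cycles with } S = 0}{N} = \frac{N - M}{N},
\qquad
\text{ratio} = M : (N - M).
\]
\[
N = 17,\ M = 10 \;\Rightarrow\; 10 : 7,
\qquad\qquad
N = 4,\ M = 3 \;\Rightarrow\; 3 : 1.
\]
```

So the 17-bit, 10-tap configuration of the embodiment gives a 10:7 split, while a 3:1 split like the one in the example could be obtained with, for instance, a 4-bit chain and 3 taps.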
Further, each virtual channel buffer is a first-in first-out (FIFO) buffer, which is the typical implementation of a virtual channel buffer. The FIFO may be synchronous or asynchronous. The Valid interface signal at the transmitting end indicates that a data packet is valid, and the Ready interface signal at the receiving end indicates that the receiving end is ready to receive a data packet. When the Valid and Ready signals are both asserted (high at the same time), the data packet is transferred from the transmitting end to the receiving end.
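The Valid/Ready handshake can be illustrated with a small bounded FIFO model. The class name `VirtualChannelFifo`, the `depth` parameter and the `push`/`pop` methods are assumptions for this sketch; the patent does not specify a buffer depth or an interface beyond the two signals.

```python
from collections import deque


class VirtualChannelFifo:
    """Bounded FIFO that transfers a packet only when Valid and Ready are both high."""

    def __init__(self, depth):
        self.depth = depth  # assumed capacity in packets
        self.slots = deque()

    @property
    def ready(self):
        # Ready: the buffer has room for one more packet.
        return len(self.slots) < self.depth

    @property
    def valid(self):
        # Valid: the buffer holds a packet it can offer downstream.
        return len(self.slots) > 0

    def push(self, packet, valid):
        """Write side: accept the packet only if the sender's Valid and our Ready are high."""
        if valid and self.ready:
            self.slots.append(packet)
            return True
        return False

    def pop(self, ready):
        """Read side: release the oldest packet only if our Valid and the receiver's Ready are high."""
        if self.valid and ready:
            return self.slots.popleft()
        return None
```

A full buffer simply deasserts Ready, so the sender holds its packet and retries; nothing is dropped in this model.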
Specifically, as shown in Fig. 6, after receiving the data packet Pkt, the classifier compares the type field of Pkt with the service flow types of the two second virtual channel buffers to obtain two corresponding Boolean values. It then ANDs each Boolean value with the data packet validity indication signal Vld from the opposite end to obtain the validity signals Vld1 and Vld2 of the two second virtual channel buffers, and sends the data packet Pkt to both second virtual channel buffers simultaneously; only the second virtual channel buffer whose validity signal is asserted actually receives the data packet.
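The classifier logic amounts to two comparisons and two AND gates. The sketch below models it in Python; the function names `classify` and `deliver`, the output names `vld1` and `vld2`, and the default flow codes `(1, 2)` are assumptions made for the example.

```python
def classify(pkt_type, vld, vc_types=(1, 2)):
    """Return the validity signals for the two local virtual channel buffers."""
    match1 = pkt_type == vc_types[0]  # Boolean: packet belongs to buffer 1's flow
    match2 = pkt_type == vc_types[1]  # Boolean: packet belongs to buffer 2's flow
    vld1 = match1 and vld             # AND with the incoming validity signal
    vld2 = match2 and vld
    return vld1, vld2


def deliver(packet, pkt_type, vld, buffer1, buffer2):
    """Present the packet to both buffers; only the one with an asserted signal stores it."""
    vld1, vld2 = classify(pkt_type, vld)
    if vld1:
        buffer1.append(packet)
    if vld2:
        buffer2.append(packet)
```

Because the packet is driven to both buffers and only the validity signals differ, the classifier needs no data-path multiplexer of its own.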
Currently, in a multi-core/many-core processor, each processor core accesses off-chip memory through the network-on-chip, and the processors also access one another through it. Consider, for example, a multi-core/many-core processor system with 64 processors divided into 16 processor clusters that are connected by a 4 × 4 network-on-chip, with a memory controller attached at each of the 4 corners of the network-on-chip. Any processor can reach the 4 memory controllers through the network-on-chip and thereby access off-chip memory; any two processors can access each other through the network-on-chip. This produces two communication service flows of different natures. When the two flows are mixed together, they easily congest each other and system performance falls; when the two flows are completely separated onto two independent sets of physical links, the network-on-chip is expensive to implement. The invention therefore adopts virtual channel technology, which keeps the two service flows independent while sharing the physical links. In addition, an unbalanced arbiter allows the arbiter to be matched to the service flows, which helps improve network-on-chip utilization.
It should be understood that the foregoing description of specific embodiments is merely illustrative, intended to help those of ordinary skill in the art understand the present patent, and does not imply that the scope of protection is limited to these examples. Without inventive effort, a person of ordinary skill in the art can, on the basis of a full understanding of the technical solution, obtain further embodiments by combining technical features of the listed examples, substituting some technical features, or adding further technical features. All such embodiments fall within the coverage of the claims and should therefore also be within the scope of protection of the present patent.

Claims (4)

1. A link shared network on chip comprising a plurality of processor cores; the system is characterized in that the processor cores are arranged in an array form, a physical link is arranged between two adjacent processor cores in the array, and two ends of each physical link are respectively provided with an arbitration and multiplexing selector, a classifier and two virtual channel buffers for storing an off-chip memory access service flow data packet and an inter-processor communication service flow data packet; the processor core respectively sends the data packets of the off-chip memory access service flow and the inter-processor communication service flow to corresponding virtual channel buffers according to the routing path, the arbitration and multiplexing selector selects one data packet from the two virtual channel buffers, the data packet is sent to the classifier of the next-hop processor core through a corresponding physical link, the classifier of the next-hop processor core stores the received data packet in the corresponding virtual channel buffer, and then the arbitration and multiplexing selector sends the received data packet to the classifier of the next-hop processor core through the corresponding physical link until the data packet reaches the classifier of the processor core at the receiving end, the data packet of the off-chip memory access service flow is sent to the off-chip memory by the classifier, and the inter-processor communication service flow data packet is provided to the processor core at the receiving end;
the arbitration and multiplexing selector comprises a register and an inverter, wherein the register is reset to 0 initially and, once started, toggles continuously between 0 and 1 to generate a multiplexing selection signal, so that data packets in the two virtual channel buffers are sent in round-robin fashion;
the arbitration and multiplexing selector comprises a cyclic shift register and a configurable selector; the cyclic shift register is an N-bit register chain, N ≥ 2; when the register chain is reset, exactly one bit is 1 and all other bits are 0, and thereafter the cyclic shift register shifts right by one bit on every clock cycle; the selector selects a fixed M bits from the N-bit register chain, M < N, and performs an OR operation on the M bits to obtain the selection signal of the multiplexer.
2. The network on chip of claim 1, wherein each virtual channel buffer is a first-in-first-out buffer.
3. A link shared network on chip as claimed in claim 1, wherein the packet header has a type field for indicating the traffic flow type of the packet and a control field for indicating the access type, i.e. whether the packet belongs to a read request, a read response, a write request or a write response.
4. A network-on-chip for link sharing according to claim 3, wherein the virtual channel buffers have data packet validity indication signals; after receiving a data packet from the opposite end, the classifier compares the type field of the data packet with the service flow types of the two local virtual channel buffers to obtain corresponding Boolean values, then performs an AND operation on each of the two Boolean values and the data packet validity indication signal of the opposite end to obtain the validity signals of the two local virtual channel buffers, and then sends the data packet to the two local virtual channel buffers simultaneously, wherein only the local virtual channel buffer whose validity signal is valid can successfully receive the data packet.
CN202011528831.8A 2020-12-22 2020-12-22 Network-on-chip for link sharing Active CN112631985B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011528831.8A CN112631985B (en) 2020-12-22 2020-12-22 Network-on-chip for link sharing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011528831.8A CN112631985B (en) 2020-12-22 2020-12-22 Network-on-chip for link sharing

Publications (2)

Publication Number Publication Date
CN112631985A CN112631985A (en) 2021-04-09
CN112631985B true CN112631985B (en) 2023-05-23

Family

ID=75321634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011528831.8A Active CN112631985B (en) 2020-12-22 2020-12-22 Network-on-chip for link sharing

Country Status (1)

Country Link
CN (1) CN112631985B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103581031A (en) * 2013-10-15 2014-02-12 复旦大学 Configurable on-chip router model used for heterogeneous multi-core on-chip network modeling
CN103778374A (en) * 2014-02-19 2014-05-07 邹候文 Trusted terminal, double-channel card, anti-cloning chip, chip fingerprint and channel attack resistance method
CN109189720A (en) * 2018-08-22 2019-01-11 曙光信息产业(北京)有限公司 Stratification Survey on network-on-chip topology and its method for routing
CN111104775A (en) * 2019-11-22 2020-05-05 核芯互联科技(青岛)有限公司 Network-on-chip topological structure and implementation method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
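Design of an FPGA-based network-on-chip virtual channel controller; Zhang Wang, Wang Jinhui, Hou Ligang, Wu Wuchen; Microelectronics & Computer (《微电子学与计算机》); 2012-05-31; Vol. 29, No. 5; pp. 10-14 *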

Also Published As

Publication number Publication date
CN112631985A (en) 2021-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant