CN112631985A - Link-shared network-on-chip - Google Patents


Info

Publication number
CN112631985A
Authority
CN
China
Prior art keywords
data packet
virtual channel
chip
packet
link
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011528831.8A
Other languages
Chinese (zh)
Other versions
CN112631985B (en)
Inventor
胡东伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 54 Research Institute
Original Assignee
CETC 54 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-12-22
Filing date
2020-12-22
Publication date
2021-04-09
Application filed by CETC 54 Research Institute
Priority to CN202011528831.8A
Publication of CN112631985A
Application granted
Publication of CN112631985B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356 Indirect interconnection networks
    • G06F15/17368 Indirect interconnection networks non hierarchical topologies
    • G06F15/17381 Two dimensional, e.g. mesh, torus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7825 Globally asynchronous, locally synchronous, e.g. network on chip
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a link-shared network on chip and belongs to the technical field of chip design. The network on chip comprises a plurality of processor cores arranged in an array; a physical link is arranged between every two adjacent processor cores in the array, and an arbitration and multiplexer, a classifier and two virtual channel buffers are arranged at each end of each physical link. Each processor core forwards a data packet to the next-hop processor core along the routing path until the data packet reaches the processor core at the receiving end. By adopting virtual channel technology, the invention keeps the two traffic flows independent while they share the physical links. In addition, an unbalanced arbiter allows the arbitration ratio to be matched to the traffic flows, which helps improve the utilization of the network on chip.

Description

Link-shared network-on-chip
Technical Field
The invention relates to the technical field of chip design, in particular to a link-shared network on chip.
Background
Network-on-chip technology is an important method for implementing multi-core/many-core processors. With it, hundreds of processor cores can be interconnected within a single chip. Modern high-end server chips all adopt network-on-chip technology. However, companies do not disclose how they implement it, so each company has its own network-on-chip implementation techniques.
For a multi-core/many-core processor, the network on chip lets each processor access off-chip memory and also lets the processors communicate with one another. These are two different traffic flows. Currently, there are two approaches to supporting both. One approach uses two completely independent networks on chip, one per traffic flow. For example, the TILE64 architecture claims to have 3 sets of independent networks on chip. Because a network on chip requires a large amount of on-chip wiring, this approach is costly to implement and consumes a large chip area. The other approach mixes the two traffic flows and carries both over a single network on chip. This approach appears in a number of academic papers. Its disadvantage is that when one traffic flow is blocked, the other is blocked as well, which degrades system performance.
Disclosure of Invention
In view of this, the present invention provides a link-shared network on chip that uses virtual channel technology to separate the traffic flow for memory access from the traffic flow for communication between processor cores while the two share the same physical links, thereby both ensuring performance and reducing the area overhead of implementing the network on chip.
Based on the above purpose, the technical scheme provided by the invention is as follows:
A link-shared network on chip comprising a plurality of processor cores; the processor cores are arranged in an array, a physical link is arranged between every two adjacent processor cores in the array, and each end of each physical link is provided with an arbitration and multiplexer, a classifier and two virtual channel buffers, which respectively store off-chip memory access traffic flow data packets and inter-processor communication traffic flow data packets; the processor core sends the data packets of the off-chip memory access traffic flow and of the inter-processor communication traffic flow to the corresponding virtual channel buffers according to the routing path; the arbitration and multiplexer selects one data packet from the two virtual channel buffers and transmits it over the corresponding physical link to the classifier of the next-hop processor core; that classifier stores the received data packet in the corresponding virtual channel buffer, from which the arbitration and multiplexer again transmits it over the corresponding physical link to the classifier of the following next-hop processor core, and so on until the data packet reaches the classifier of the processor core at the receiving end; that classifier sends off-chip memory access traffic flow data packets to the off-chip memory and provides inter-processor communication traffic flow data packets to the processor core at the receiving end.
Furthermore, the arbitration and multiplexer comprises a register and an inverter; the register is reset to 0 initially and, once started, toggles continuously between 0 and 1 to generate the multiplexer select signal, so that data packets from the two virtual channel buffers are sent alternately without interruption.
Further, the arbitration and multiplexer comprises a cyclic shift register and a configurable selector; the cyclic shift register is an N-bit register chain, N ≥ 2; at reset exactly one bit of the chain is 1 and all other bits are 0, and on each clock cycle the chain is then rotated right by one bit; the selector selects M fixed bits from the N-bit register chain, M < N, and the OR of these M bits is the select signal of the multiplexer.
Further, each virtual channel buffer is a first-in first-out buffer.
Further, the packet header of the data packet has a type field and a control field, the type field is used for indicating the traffic type of the data packet, and the control field is used for indicating the access type, that is, whether the data packet belongs to a read request, a read response, a write request, or a write response.
Furthermore, the virtual channel buffers have data packet validity indication signals; after receiving a data packet from the opposite end, the classifier compares the type field of the data packet with the traffic flow types of the two local-end virtual channel buffers to obtain two Boolean values, and ANDs each Boolean value with the data packet validity indication signal of the opposite end to obtain the validity signals of the two local-end virtual channel buffers; the data packet is then sent to both local-end virtual channel buffers simultaneously, and only the local-end virtual channel buffer whose validity signal is valid receives it.
As can be seen from the above description, the technical scheme of the invention has the beneficial effects that:
1. The invention adopts virtual channel technology so that the two traffic flows of the network on chip, namely the access of each processor core to the off-chip memory and the communication of the processor cores with one another, are independent while sharing the physical links. On the one hand, the memory access traffic flow and the inter-processor communication traffic flow are separated, which improves communication performance; on the other hand, the two traffic flows share the physical links, which avoids the wiring overhead of a second set of links and reduces the chip area.
2. Through the programmable unbalanced arbiter, the invention can match the arbitration ratio to the traffic flows, which helps improve the utilization of the network on chip.
Drawings
To more clearly describe this patent, one or more drawings are provided below to assist in explaining the background, technical principles and/or certain embodiments of this patent.
Fig. 1 is a schematic structural diagram of a network on chip in an embodiment of the present invention.
Fig. 2 is a schematic diagram of a link sharing structure in the embodiment of the present invention.
Fig. 3 is a diagram illustrating a format of a data packet according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of an implementation of the arbitration and multiplexer according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of another implementation of the arbitration and multiplexer according to the embodiment of the present invention.
Fig. 6 is a schematic diagram of one implementation of a classifier in an embodiment of the invention.
Detailed Description
In order to facilitate understanding of the technical solutions of the present patent by those skilled in the art, and to make the technical objects, technical solutions and advantages of the present patent more apparent and fully support the scope of the claims, the technical solutions of the present patent are described in detail in the following embodiments.
As shown in Fig. 1 and Fig. 2, a link-shared network on chip includes a plurality of processor cores arranged in an array. A physical link is arranged between every two adjacent processor cores in the array, and each end of each physical link is provided with an arbitration and multiplexer, a classifier, and two virtual channel buffers, which respectively store off-chip memory access traffic flow data packets and inter-processor communication traffic flow data packets. According to the routing path, the processor core sends data packets of the off-chip memory access traffic flow (traffic flow 1) to virtual channel buffer A at the local end and data packets of the inter-processor communication traffic flow (traffic flow 2) to virtual channel buffer B at the local end.
The arbitration and multiplexer selects a data packet from the two virtual channel buffers and transmits it over the corresponding physical link to the classifier of the next-hop processor core. According to the traffic flow type, that classifier stores data packets of traffic flow 1 in virtual channel buffer A of the next-hop physical link at its local end and data packets of traffic flow 2 in virtual channel buffer B of that link; the arbitration and multiplexer there then transmits the data packet over the next-hop physical link to the classifier of the following processor core, and so on until the data packet reaches the processor core at the receiving end. The classifier at the receiving-end processor core sends off-chip memory access traffic flow data packets to the off-chip memory and provides inter-processor communication traffic flow data packets to the receiving-end processor core.
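The hop-by-hop forwarding described above can be sketched in Python as follows. This is a minimal behavioural model under stated assumptions, not the patented circuit: the names `LinkEnd` and `send_along_path`, the type encodings `MEM` and `IPC`, the one-hop-per-cycle timing, and the use of a simple alternating arbiter at every link are all illustrative choices that the text does not prescribe.

```python
from collections import deque

MEM, IPC = 0, 1  # assumed encodings for traffic flow 1 and traffic flow 2


class LinkEnd:
    """Sending end of one physical link: two virtual channel buffers plus an arbiter."""

    def __init__(self):
        self.vc = (deque(), deque())  # buffer A (MEM) and buffer B (IPC)
        self.select = 0               # alternating select signal (assumed arbiter)

    def enqueue(self, pkt):
        """Classifier role: store the packet in the buffer matching its type."""
        self.vc[pkt["type"]].append(pkt)

    def arbitrate(self):
        """Pick from the currently selected buffer, then flip the select signal."""
        s = self.select
        self.select ^= 1
        return self.vc[s].popleft() if self.vc[s] else None


def send_along_path(packets, hops, max_cycles=100):
    """Move packets across `hops` consecutive links and return them in arrival order."""
    ends = [LinkEnd() for _ in range(hops)]
    for pkt in packets:
        ends[0].enqueue(pkt)
    delivered = []
    for _ in range(max_cycles):
        # Walk the links from last to first so a packet advances one hop per cycle.
        for i in reversed(range(hops)):
            pkt = ends[i].arbitrate()
            if pkt is None:
                continue
            if i + 1 < hops:
                ends[i + 1].enqueue(pkt)  # classifier at the next-hop core
            else:
                delivered.append(pkt)     # receiving end: memory or local core
        if len(delivered) == len(packets):
            break
    return delivered
```

In this model every packet eventually arrives and the order within each traffic flow is preserved, because each virtual channel buffer is drained first-in first-out.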
Further, as shown in fig. 3, the packet header of the data packet has a type field for indicating the traffic type of the data packet and other control fields for indicating the access type, i.e. whether the data packet belongs to a read request, a read response, a write request or a write response.
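As an illustration of such a header, the Python sketch below models the type field and the access-type control field. The field widths (1 bit and 2 bits), the numeric encodings, the bit layout, and the names `TrafficType`, `AccessType` and `Packet` are assumptions made for the example; the text only states which fields exist and what they indicate.

```python
from dataclasses import dataclass
from enum import IntEnum


class TrafficType(IntEnum):
    """Type field: which traffic flow the packet belongs to (encoding assumed)."""
    MEMORY_ACCESS = 0    # traffic flow 1: off-chip memory access
    INTER_PROCESSOR = 1  # traffic flow 2: inter-processor communication


class AccessType(IntEnum):
    """Control field: the access type of the packet (encoding assumed)."""
    READ_REQUEST = 0
    READ_RESPONSE = 1
    WRITE_REQUEST = 2
    WRITE_RESPONSE = 3


@dataclass(frozen=True)
class Packet:
    traffic_type: TrafficType
    access_type: AccessType
    payload: bytes = b""

    def header_bits(self) -> int:
        """Pack the header as [type: 1 bit][access: 2 bits] (assumed layout)."""
        return (int(self.traffic_type) << 2) | int(self.access_type)

    @staticmethod
    def from_header_bits(bits: int, payload: bytes = b"") -> "Packet":
        """Decode a header packed by header_bits()."""
        return Packet(TrafficType((bits >> 2) & 0x1), AccessType(bits & 0x3), payload)
```

A classifier only needs the type field to pick a virtual channel buffer; the access type matters to the endpoint that finally consumes the packet.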
Further, as shown in Fig. 4, the arbitration and multiplexer includes a register Reg and an inverter INV. The register Reg is reset to 0 initially and, once started, toggles continuously between 0 and 1, generating a select signal S that is fed to the selector MUX. According to the value of S, the selector MUX selects the corresponding data packet, Pkt1 or Pkt2, from the two first virtual channel buffers and outputs it as the data packet Pkt.
With this arbitration and multiplexer, the selection keeps toggling regardless of whether a packet is actually sent. Thus, when the packet of one virtual channel is blocked, the packet of the other virtual channel can still be tried on the next cycle. Because only one packet is selected after arbitration, the physical link needs to be only one data packet wide, which reduces the implementation overhead of the physical link.
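The toggling behaviour can be sketched in Python as follows. This is a behavioural model only, with hypothetical names (`ToggleArbiter`, `run_link`); it assumes that S = 0 selects the first buffer and S = 1 the second, and it represents an unused slot as `None`.

```python
from collections import deque


class ToggleArbiter:
    """A one-bit register whose output is fed back through an inverter."""

    def __init__(self):
        self.reg = 0  # reset value

    def tick(self) -> int:
        """Return the current select signal S, then invert the register."""
        s = self.reg
        self.reg ^= 1
        return s


def run_link(vc_a, vc_b, cycles):
    """Return what the link carries on each cycle; None marks an idle slot."""
    arbiter = ToggleArbiter()
    buffers = (deque(vc_a), deque(vc_b))
    sent = []
    for _ in range(cycles):
        s = arbiter.tick()  # S toggles whether or not a packet is sent
        sent.append(buffers[s].popleft() if buffers[s] else None)
    return sent
```

For instance, `run_link(["a0", "a1"], ["b0", "b1"], 4)` interleaves the two buffers, while an empty second buffer leaves every other slot idle, which is the cost that motivates the unbalanced arbiter.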
Further, as shown in Fig. 5, the arbitration and multiplexer includes a cyclic shift register and a configurable selector. The cyclic shift register is a 17-bit register chain; at reset exactly one bit of the chain is 1 and all other bits are 0, and on each clock cycle the chain is then rotated right by one bit. The selector selects 10 fixed bits from the 17-bit register chain and ORs them to obtain the select signal S of the multiplexer, so S is 1 in 10 of every 17 clock cycles. According to the value of S, the selector MUX selects the corresponding data packet, Pkt1 or Pkt2, from the two first virtual channel buffers and outputs it as the data packet Pkt.
In many systems, the off-chip memory access traffic and the inter-processor communication traffic are not equal. For example, in most systems the off-chip memory access traffic is much larger than the inter-processor communication traffic, i.e., traffic flow 1 is much larger than traffic flow 2. An unbalanced arbiter of the kind described above can therefore be placed between traffic flow 1 and traffic flow 2. With a 3:1 arbiter, for example, one packet of traffic flow 2 is sent for every three packets of traffic flow 1. This effectively improves the efficiency of the network on chip.
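The select sequence produced by such a cyclic shift register can be sketched in Python as follows. The function name `select_sequence`, the reset position of the single 1 bit (index 0), and the particular tap positions used in the examples are assumptions; the text fixes only the one-hot reset state, the right rotation, and the OR of M chosen bits.

```python
def select_sequence(n_bits, taps, cycles):
    """Return the select signal S for each clock cycle.

    n_bits: length N of the register chain (N >= 2).
    taps:   the M fixed bit positions fed to the OR gate (0 < M < N).
    """
    taps = sorted(set(taps))
    if n_bits < 2:
        raise ValueError("N must be at least 2")
    if not 0 < len(taps) < n_bits or taps[0] < 0 or taps[-1] >= n_bits:
        raise ValueError("need M distinct taps with 0 < M < N, each in range")

    chain = [0] * n_bits
    chain[0] = 1  # reset state: exactly one bit is 1 (position assumed)
    sequence = []
    for _ in range(cycles):
        s = 0
        for t in taps:
            s |= chain[t]                 # OR of the selected bits
        sequence.append(s)
        chain = [chain[-1]] + chain[:-1]  # rotate right by one bit
    return sequence
```

With N = 4 and three taps, `select_sequence(4, [0, 1, 2], 8)` gives S = 1 for three cycles out of every four, i.e. the 3:1 ratio mentioned above if S = 1 is taken to select traffic flow 1 (the mapping of S to a flow is an assumption). With N = 17 and 10 taps, S is 1 in 10 of every 17 cycles, so changing the tap configuration changes the ratio without changing the hardware structure.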
Further, each virtual channel buffer is a first-in first-out (FIFO) buffer, which may be either a synchronous FIFO or an asynchronous FIFO. The Valid interface signal of the sending end indicates that a data packet is valid, and the Ready interface signal of the receiving end indicates that the receiving end is ready to receive a data packet. When the Valid signal and the Ready signal are active at the same time (both high), the data packet is transferred from the sending end to the receiving end.
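The Valid/Ready handshake on the write side of such a buffer can be sketched in Python as follows. The class name `VirtualChannelFifo`, the fixed `depth` parameter, and the rule that Ready is high whenever the buffer is not full are assumptions for illustration; the text does not specify a depth or how Ready is derived.

```python
from collections import deque


class VirtualChannelFifo:
    """Behavioural model of a synchronous FIFO with a Valid/Ready write port."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    @property
    def ready(self) -> bool:
        """Ready is high while there is room for one more packet (assumed rule)."""
        return len(self.items) < self.depth

    def clock(self, valid, packet) -> bool:
        """One clock edge: the packet transfers only if Valid and Ready are both high."""
        if valid and self.ready:
            self.items.append(packet)
            return True
        return False

    def pop(self):
        """Read side: remove and return the oldest packet, or None if empty."""
        return self.items.popleft() if self.items else None
```

When the buffer is full, Ready goes low and the sender must hold the packet and keep Valid asserted until a slot frees up, which is how back-pressure propagates along one virtual channel without affecting the other.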
Specifically, as shown in Fig. 6, after receiving the data packet Pkt, the classifier compares the type field of Pkt with the traffic flow types of the two second virtual channel buffers to obtain two Boolean values. It then ANDs each Boolean value with the data packet validity indication signal Vld of the opposite end to obtain the validity signals Vld1 and Vld2 of the two second virtual channel buffers. The data packet Pkt is then sent to both second virtual channel buffers simultaneously, and only the second virtual channel buffer whose validity signal is valid receives it.
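The classifier logic reduces to two comparisons and two AND gates, as the Python sketch below shows. The function names `classify` and `deliver` and the default type encodings `(0, 1)` are assumptions; buffers are modelled as plain lists.

```python
def classify(pkt_type, vld, vc_types=(0, 1)):
    """Return (Vld1, Vld2) for the two local virtual channel buffers.

    pkt_type: type field taken from the packet header.
    vld:      validity indication signal Vld from the opposite end.
    vc_types: traffic flow type served by each local buffer (encoding assumed).
    """
    match1 = pkt_type == vc_types[0]
    match2 = pkt_type == vc_types[1]
    return (bool(match1 and vld), bool(match2 and vld))


def deliver(packet, pkt_type, vld, buffers, vc_types=(0, 1)):
    """Present the packet to both buffers; only the one whose Vld is high stores it."""
    for buffer, buffer_vld in zip(buffers, classify(pkt_type, vld, vc_types)):
        if buffer_vld:
            buffer.append(packet)
```

Broadcasting the packet to both buffers and gating only the validity signals means the data path needs no demultiplexer; the selection is carried entirely by Vld1 and Vld2.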
Currently, in a multi-core/many-core processor, each processor core needs to access the off-chip memory through the network on chip, and the processors also need to access one another. For example, consider a multi-core/many-core processor system containing 64 processors, divided into 16 processor clusters that are connected by a 4 × 4 network on chip. A memory controller is connected at each of the 4 corners of the network on chip. Any processor can reach the 4 memory controllers through the network on chip and thereby access the off-chip memory; any two processors can also access each other through the network on chip. This results in two traffic flows that differ in nature. When the two flows are mixed together, they easily congest each other and system performance drops; when the two flows are completely separated onto two independent sets of physical links, the implementation cost of the network on chip is large. The invention therefore adopts virtual channel technology, so that the two traffic flows are independent while the physical link is shared. In addition, an unbalanced arbiter allows the arbitration ratio to be matched to the traffic flows, which helps improve the utilization of the network on chip.
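The congestion argument can be illustrated with a small Python comparison between a single mixed buffer and two virtual channel buffers when one traffic flow is blocked downstream. This is a simplified model under stated assumptions: packets are `(type, name)` tuples, type 0 stands for memory access and type 1 for inter-processor communication, a "blocked" flow can never be sent during the simulated window, and the arbiter simply alternates. The function names are hypothetical.

```python
from collections import deque


def drain_shared(packets, blocked_type, cycles):
    """Single mixed FIFO: a blocked packet at the head stalls everything behind it."""
    fifo = deque(packets)
    sent = []
    for _ in range(cycles):
        if fifo and fifo[0][0] != blocked_type:
            sent.append(fifo.popleft())
    return sent


def drain_virtual_channels(packets, blocked_type, cycles):
    """Two virtual channel buffers sharing one link through an alternating arbiter."""
    channels = {0: deque(), 1: deque()}
    for pkt in packets:
        channels[pkt[0]].append(pkt)
    sent = []
    select = 0
    for _ in range(cycles):
        if channels[select] and select != blocked_type:
            sent.append(channels[select].popleft())
        select ^= 1  # the select signal toggles every cycle regardless
    return sent
```

With the memory flow blocked and a memory packet at the head of the queue, the mixed buffer delivers nothing, whereas the virtual channel version still delivers the inter-processor packets over the same physical link.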
It should be understood that the above description of the embodiments of the present patent is only an exemplary description for facilitating the understanding of the patent scheme by the person skilled in the art, and does not imply that the scope of protection of the patent is only limited to these examples, and that the person skilled in the art can obtain more embodiments by combining technical features, replacing some technical features, adding more technical features, and the like to the various embodiments listed in the patent without any inventive effort on the premise of fully understanding the patent scheme, and therefore, the new embodiments are also within the scope of protection of the patent.

Claims (6)

1. A link-shared network on chip comprising a plurality of processor cores, characterized in that the processor cores are arranged in an array, a physical link is arranged between every two adjacent processor cores in the array, and each end of each physical link is provided with an arbitration and multiplexer, a classifier and two virtual channel buffers, which respectively store off-chip memory access traffic flow data packets and inter-processor communication traffic flow data packets; the processor core sends the data packets of the off-chip memory access traffic flow and of the inter-processor communication traffic flow to the corresponding virtual channel buffers according to the routing path; the arbitration and multiplexer selects one data packet from the two virtual channel buffers and transmits it over the corresponding physical link to the classifier of the next-hop processor core; that classifier stores the received data packet in the corresponding virtual channel buffer, from which the arbitration and multiplexer again transmits it over the corresponding physical link to the classifier of the following next-hop processor core, and so on until the data packet reaches the classifier of the processor core at the receiving end; that classifier sends off-chip memory access traffic flow data packets to the off-chip memory and provides inter-processor communication traffic flow data packets to the processor core at the receiving end.
2. The link-shared network-on-chip as claimed in claim 1, wherein the arbitration and multiplexer comprises a register and an inverter, the register is reset to 0 initially and toggles between 0 and 1 after activation to generate the multiplexing signal, thereby transmitting the data packets in the two virtual channel buffers alternately without interruption.
3. A link-shared network-on-chip according to claim 1, wherein said arbitration and multiplexer comprises a cyclic shift register and a configurable selector; the cyclic shift register is an N-bit register chain, N ≥ 2; at reset exactly one bit of the chain is 1 and all other bits are 0, and on each clock cycle the chain is then rotated right by one bit; the selector selects M fixed bits from the N-bit register chain, M < N, and the OR of these M bits is the select signal of the multiplexer.
4. The link-shared network-on-chip of claim 1, wherein each virtual channel buffer is a first-in-first-out buffer.
5. The link-sharing network on chip of claim 1, wherein the packet header of the data packet has a type field and a control field, the type field is used to indicate the traffic type of the data packet, and the control field is used to indicate the access type, i.e. whether the data packet belongs to a read request, a read response, a write request or a write response.
6. The network on chip of claim 5, wherein the virtual channel buffers have data packet validity indication signals; after receiving a data packet from the opposite end, the classifier compares the type field of the data packet with the traffic flow types of the two local-end virtual channel buffers to obtain two Boolean values, and ANDs each Boolean value with the data packet validity indication signal of the opposite end to obtain the validity signals of the two local-end virtual channel buffers; the data packet is then sent to both local-end virtual channel buffers simultaneously, and only the local-end virtual channel buffer whose validity signal is valid receives it.
CN202011528831.8A 2020-12-22 2020-12-22 Network-on-chip for link sharing Active CN112631985B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011528831.8A CN112631985B (en) 2020-12-22 2020-12-22 Network-on-chip for link sharing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011528831.8A CN112631985B (en) 2020-12-22 2020-12-22 Network-on-chip for link sharing

Publications (2)

Publication Number Publication Date
CN112631985A true CN112631985A (en) 2021-04-09
CN112631985B CN112631985B (en) 2023-05-23

Family

ID=75321634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011528831.8A Active CN112631985B (en) 2020-12-22 2020-12-22 Network-on-chip for link sharing

Country Status (1)

Country Link
CN (1) CN112631985B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115766576A (en) * 2022-12-01 2023-03-07 电子科技大学 Angle router of network on chip based on dimension split type router

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103581031A (en) * 2013-10-15 2014-02-12 复旦大学 Configurable on-chip router model used for heterogeneous multi-core on-chip network modeling
CN103778374A (en) * 2014-02-19 2014-05-07 邹候文 Trusted terminal, double-channel card, anti-cloning chip, chip fingerprint and channel attack resistance method
CN109189720A (en) * 2018-08-22 2019-01-11 曙光信息产业(北京)有限公司 Stratification Survey on network-on-chip topology and its method for routing
CN111104775A (en) * 2019-11-22 2020-05-05 核芯互联科技(青岛)有限公司 Network-on-chip topological structure and implementation method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张旺、汪金辉、侯立刚、吴武臣: "Design of an FPGA-based network-on-chip virtual channel controller" (基于FPGA的片上网络虚拟通道控制器的设计), 《微电子学与计算机》 (Microelectronics & Computer) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115766576A (en) * 2022-12-01 2023-03-07 电子科技大学 Angle router of network on chip based on dimension split type router
CN115766576B (en) * 2022-12-01 2024-05-28 电子科技大学 Angle router of network on chip based on dimension split router

Also Published As

Publication number Publication date
CN112631985B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
US4965788A (en) Self-routing switch element for an asynchronous time switch
JP3816530B2 (en) Low latency, high clock frequency, pre-geo asynchronous packet-based crossbar switching chip system and method
Galles Spider: A high-speed network interconnect
US6160813A (en) Fibre channel switching system and method
US8085801B2 (en) Resource arbitration
US20140133488A1 (en) Backplane interface adapter with error control and redundant fabric
EP0581486A2 (en) High bandwidth packet switch
EP0588104A2 (en) Multipath torus switching apparatus
JPH02131048A (en) Packet transfer method between adapter, contention eliminating device and token-ring device
JP2003508967A (en) Network switch using network processor and method
US7439763B1 (en) Scalable shared network memory switch for an FPGA
JP2003508954A (en) Network switch, components and operation method
JP2003508851A (en) Network processor, memory configuration and method
JPH08265270A (en) Transfer line assignment system
JP2003508957A (en) Network processor processing complex and method
US20010030961A1 (en) High-speed router
CN213024387U (en) Data redundancy transmission device based on RapidIO bus
US20020150056A1 (en) Method for avoiding broadcast deadlocks in a mesh-connected network
CN112631985B (en) Network-on-chip for link sharing
US7568074B1 (en) Time based data storage for shared network memory switch
KR20170015000A (en) On-chip network and communication method thereof
US20020172156A1 (en) Adaptive control of multiplexed input buffer channels
Omang et al. Scalability of SCI workstation clusters, a preliminary study
El-Moursy et al. High throughput architecture for OCTAGON network on chip
JP3492539B2 (en) Flow control method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant