CN112631985B - Network-on-chip for link sharing

Info

Publication number: CN112631985B
Application number: CN202011528831.8A
Authority: CN
Inventors: 胡东伟
Original assignee: CETC 54 Research Institute
Current assignee: CETC 54 Research Institute
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2023-05-23
Anticipated expiration: 2040-12-22
Also published as: CN112631985A

Abstract

The invention discloses a link-sharing network-on-chip and belongs to the technical field of chip design. The network-on-chip comprises a plurality of processor cores arranged in an array. A physical link is provided between every two adjacent processor cores in the array, and each end of each physical link is provided with an arbitration and multiplexing selector, a classifier and two virtual channel buffers. Each processor core forwards a data packet to the next-hop processor core along the routing path until the data packet reaches the receiving-end processor core. By adopting virtual channel technology, the invention keeps the two service flows independent while sharing the physical links. In addition, an unbalanced arbiter allows the arbitration ratio to be matched to the service flows, which helps improve network-on-chip utilization.

Description

Network-on-chip for link sharing

Technical Field

The invention relates to the technical field of chip design, in particular to a network-on-chip with link sharing.

Background

Network-on-chip is an important method of implementing multi-core/many-core processors. With network-on-chip technology, hundreds or thousands of processor cores within a single chip may be interconnected. Modern high-end server chips all employ network-on-chip technology. However, companies do not disclose how they implement it, so each company uses its own network-on-chip implementation technology.

For multi-core/many-core processors, network-on-chip technology enables each processor not only to access off-chip memory but also to communicate with the other processors. These are two different traffic flows. At present, there are two methods of realizing both the access of any processor to the off-chip memory and the mutual access between any two processors. One method uses two completely independent networks-on-chip, one for each traffic flow. For example, the TILE64 architecture claims to have 3 separate networks-on-chip. Since implementing a network-on-chip involves a large number of on-chip traces, the disadvantage of this method is its high implementation cost, which results in a large chip area. The other method mixes the two traffic flows together and carries both over a single network-on-chip. This method appears in a number of academic papers. Its disadvantage is that when one traffic flow is blocked, the other traffic flow is blocked as well, which degrades system performance.

Disclosure of Invention

In view of this, the present invention proposes a link-sharing network-on-chip that adopts virtual channel technology to separate the service flow that accesses memory from the service flow that carries communication between processor cores while letting the two share physical links, thereby maintaining performance and reducing the area overhead of the network-on-chip.

To achieve the above purpose, the technical scheme provided by the invention is as follows:

A link-sharing network-on-chip comprises a plurality of processor cores. The processor cores are arranged in an array, a physical link is provided between every two adjacent processor cores in the array, and each end of each physical link is provided with an arbitration and multiplexing selector, a classifier, and two virtual channel buffers for storing off-chip memory access service flow data packets and inter-processor communication service flow data packets, respectively. According to the routing path, the processor core sends the data packets of the off-chip memory access service flow and of the inter-processor communication service flow to the corresponding virtual channel buffers. The arbitration and multiplexing selector selects one data packet from the two virtual channel buffers and sends it over the corresponding physical link to the classifier of the next-hop processor core. The classifier of the next-hop processor core stores the received data packet in the corresponding virtual channel buffer, and the arbitration and multiplexing selector there forwards it over the corresponding physical link to the classifier of the following processor core, and so on until the data packet reaches the classifier of the receiving-end processor core. That classifier sends data packets of the off-chip memory access service flow to the off-chip memory and provides data packets of the inter-processor communication service flow to the receiving-end processor core.

Further, the arbitration and multiplexing selector comprises a register and an inverter. The register is reset to 0 initially and, once started, toggles continuously between 0 and 1 to generate a multiplexing selection signal, so that the data packets in the two virtual channel buffers are transmitted alternately in a round-robin manner.

Further, the arbitration and multiplexing selector comprises a cyclic shift register and a configurable selector. The cyclic shift register is an N-bit register chain, where N is greater than or equal to 2. When the register chain is reset, exactly one bit is 1 and all other bits are 0; thereafter the cyclic shift register shifts right by one bit on every clock cycle. The selector selects a fixed set of M bits from the N-bit register chain, M < N, and ORs the M bits to obtain the selection signal of the multiplexer.

Further, each virtual channel buffer is a first-in first-out buffer.

Further, the packet header of the data packet has a type field and a control field, where the type field is used to indicate the traffic flow type of the data packet, and the control field is used to indicate the access type, that is, whether the data packet belongs to a read request, a read response, a write request or a write response.

Further, the virtual channel buffer has a data packet validity indication signal. After the classifier receives a data packet from the opposite end, it compares the type field of the data packet with the service flow types of the two local virtual channel buffers to obtain two corresponding Boolean values, and then ANDs each Boolean value with the data packet validity indication signal of the opposite end to obtain the validity signals of the two local virtual channel buffers. The data packet is then sent to both local virtual channel buffers simultaneously, and only the local virtual channel buffer whose validity signal is asserted successfully receives it.

From the above description, the technical scheme of the invention has the following beneficial effects:

1. The invention adopts virtual channel technology so that, within the network on chip, the service flow of processor cores accessing off-chip memory and the service flow of processor cores accessing one another are independent while sharing the physical links. On one hand, separating the memory access service flow from the inter-processor communication service flow improves communication performance; on the other hand, sharing the physical links between the two service flows reduces on-chip wiring overhead and thus chip area.

2. Through the programmable unbalanced arbiter, the invention can match the arbitration ratio to the service flows, which helps improve network-on-chip utilization.

Drawings

For a clearer description of the present patent, one or more drawings are provided below, which are intended to aid in the description of the background, principles, and/or certain embodiments of the present patent.

Fig. 1 is a schematic diagram of a network on chip in an embodiment of the present invention.

Fig. 2 is a schematic diagram of a link sharing structure in an embodiment of the present invention.

Fig. 3 is a schematic diagram of a format of a data packet according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of one implementation of an arbitration and multiplexing selector in an embodiment of the present invention.

Fig. 5 is a schematic diagram of another implementation of an arbitration and multiplexing selector in an embodiment of the present invention.

Fig. 6 is a schematic diagram of one implementation of a classifier in an embodiment of the invention.

Detailed Description

To help those skilled in the art understand the technical solution of the present patent, to make its technical purpose, technical solution and beneficial effects clearer, and to fully support the scope of protection of the claims, the technical solution is described in further detail below by way of specific examples.

As shown in fig. 1 and 2, a link-sharing network on chip includes a plurality of processor cores arranged in an array. A physical link is provided between every two adjacent processor cores in the array, and each end of each physical link is provided with an arbitration and multiplexing selector, a classifier, and two virtual channel buffers for storing off-chip memory access service flow data packets and inter-processor communication service flow data packets, respectively. According to the routing path, the processor core sends data packets of the off-chip memory access service flow (i.e. service flow 1) to virtual channel buffer A at the local end and sends data packets of the inter-processor communication service flow (i.e. service flow 2) to virtual channel buffer B at the local end. The arbitration and multiplexing selector selects a data packet from the two virtual channel buffers and transmits it over the corresponding physical link to the classifier of the next-hop processor core. According to the service flow type, that classifier stores data packets of service flow 1 into virtual channel buffer A of the next-hop physical link at its local end and data packets of service flow 2 into virtual channel buffer B of the next-hop physical link at its local end. The arbitration and multiplexing selector there then transmits the data packet over the next-hop physical link to the classifier of the following processor core, and so on until the data packet reaches the virtual channel buffer at the receiving-end processor core. The classifier at the receiving-end processor core sends off-chip memory access service flow data packets to the off-chip memory and provides inter-processor communication service flow data packets to the receiving-end processor core.
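The benefit of keeping the two service flows in separate virtual channel buffers can be illustrated with a small behavioral sketch. It is not taken from the patent: the function names, the tuple packet representation `(flow, label)`, and the `blocked_flow` parameter (a flow whose downstream buffer is assumed to be full) are all hypothetical. It compares a single mixed queue with two virtual channel buffers served alternately over one link.

```python
from collections import deque

def shared_fifo(packets, blocked_flow, cycles):
    """Single mixed queue: a blocked packet at the head stalls everything behind it."""
    queue = deque(packets)
    delivered = []
    for _ in range(cycles):
        if queue and queue[0][0] != blocked_flow:
            delivered.append(queue.popleft())
    return delivered

def virtual_channels(packets, blocked_flow, cycles):
    """Two virtual channel buffers served alternately over one physical link."""
    buffers = {
        1: deque(p for p in packets if p[0] == 1),  # buffer A: service flow 1
        2: deque(p for p in packets if p[0] == 2),  # buffer B: service flow 2
    }
    delivered = []
    for cycle in range(cycles):
        flow = 1 if cycle % 2 == 0 else 2  # alternate between the two buffers
        queue = buffers[flow]
        if queue and flow != blocked_flow:
            delivered.append(queue.popleft())
    return delivered

packets = [(1, "m0"), (2, "c0"), (2, "c1")]
print(shared_fifo(packets, blocked_flow=1, cycles=4))       # prints []
print(virtual_channels(packets, blocked_flow=1, cycles=4))  # prints [(2, 'c0'), (2, 'c1')]
```

With the mixed queue, the blocked memory access packet holds up both communication packets; with two buffers, service flow 2 keeps moving on the slots assigned to it.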

Further, as shown in fig. 3, the packet header of the data packet has a type field for indicating the traffic flow type of the data packet and other control fields for indicating the access type, i.e. whether the data packet belongs to a read request, a read response, a write request or a write response.
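As a minimal sketch of such a header, the following Python models the type field and the access-type control field. The patent does not give field widths, bit positions or encodings, so the 1-bit type field, the 2-bit control field, the numeric values and all class names here are assumptions made only for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum

class FlowType(IntEnum):
    MEMORY_ACCESS = 0    # service flow 1: off-chip memory access
    INTER_PROCESSOR = 1  # service flow 2: inter-processor communication

class AccessType(IntEnum):
    READ_REQUEST = 0
    READ_RESPONSE = 1
    WRITE_REQUEST = 2
    WRITE_RESPONSE = 3

@dataclass(frozen=True)
class PacketHeader:
    flow_type: FlowType
    access_type: AccessType

    def encode(self) -> int:
        # Assumed layout: bit 2 holds the type field, bits 1..0 the control field.
        return (int(self.flow_type) << 2) | int(self.access_type)

    @staticmethod
    def decode(bits: int) -> "PacketHeader":
        return PacketHeader(FlowType((bits >> 2) & 0x1), AccessType(bits & 0x3))

header = PacketHeader(FlowType.INTER_PROCESSOR, AccessType.WRITE_REQUEST)
print(bin(header.encode()))  # prints 0b110
```

Only the type field is needed by the classifier at each hop; the control field matters to the endpoint that finally consumes the packet.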

Further, as shown in fig. 4, the arbitration and multiplexing selector includes a register Reg and an inverter INV. The register Reg is reset to 0 initially and, once started, toggles continuously between 0 and 1 to generate the multiplexing selection signal S for the selector MUX. According to the value of S, the selector MUX selects one of the data packets Pkt1 and Pkt2 from the two first virtual channel buffers (those at the sending end of the link) and outputs it as the data packet Pkt.

With this arbitration and multiplexing selector, the selection toggles continuously regardless of whether a data packet is actually sent. Thus, when the data packets of one virtual channel are congested, the link can attempt to send data packets of the other virtual channel. Because the arbitration and multiplexing selector outputs only one data packet at a time, the physical link needs to be only one data packet wide, which reduces the implementation cost of the physical link.
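The register-and-inverter arbiter can be modeled behaviorally in a few lines. This is a sketch under assumptions: the polarity (S = 0 selects Pkt1, S = 1 selects Pkt2) is not stated in the text, an empty buffer is modeled as an idle slot (`None`), and the function names are hypothetical.

```python
from collections import deque

def toggle_select(cycles):
    """Selection signal S per clock cycle: Reg resets to 0, then INV flips it each cycle."""
    reg = 0
    signal = []
    for _ in range(cycles):
        signal.append(reg)  # S drives the MUX during this cycle
        reg ^= 1            # the inverter feeds the complement back into Reg
    return signal

def run_link(vc1, vc2, cycles):
    """Send one packet per cycle from the buffer chosen by S; None marks an idle slot."""
    sent = []
    for s in toggle_select(cycles):
        source = vc1 if s == 0 else vc2  # assumed polarity of S
        sent.append(source.popleft() if source else None)
    return sent

print(run_link(deque(["a0", "a1"]), deque(["b0"]), 4))
# prints ['a0', 'b0', 'a1', None]
```

The last slot stays idle because the toggle does not look at buffer occupancy; that simplicity is what keeps the hardware to one register and one inverter.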

Alternatively, as shown in fig. 5, the arbitration and multiplexing selector includes a cyclic shift register and a configurable selector. The cyclic shift register is a 17-bit register chain. When the register chain is reset, exactly one bit is 1 and all other bits are 0; thereafter the cyclic shift register shifts right by one bit on every clock cycle. The selector selects a fixed set of 10 bits from the 17-bit register chain and ORs these 10 bits to obtain the selection signal S of the multiplexer. According to the value of S, the selector MUX selects one of the data packets Pkt1 and Pkt2 from the two first virtual channel buffers (those at the sending end of the link) and outputs it as the data packet Pkt.

In many systems, the volume of off-chip memory access traffic and that of inter-processor communication traffic are unequal. For example, in most systems off-chip memory access traffic is much greater than inter-processor communication traffic, i.e., the volume of service flow 1 is much greater than that of service flow 2. The unbalanced arbiter described above can therefore be configured between service flow 1 and service flow 2. For example, with a ratio of 3:1, one data packet of service flow 2 is sent for every three data packets of service flow 1. In this way, network-on-chip efficiency can be effectively improved.
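The unbalanced arbiter can be sketched as a one-hot cyclic shift register whose tapped bits are ORed together. The function name, the `taps` parameter, the choice of bit 0 as the reset position, and the convention that S = 1 selects service flow 1 are assumptions for illustration; the 4-bit, 3-tap configuration is simply one way to obtain the 3:1 ratio mentioned above.

```python
def shift_register_select(n, taps, cycles):
    """Selection signal from an n-bit one-hot cyclic shift register.

    taps: positions (0..n-1) of the M bits the configurable selector ORs together.
    """
    taps = set(taps)
    if n < 2 or not 0 < len(taps) < n or not all(0 <= t < n for t in taps):
        raise ValueError("need N >= 2 and 0 < M < N taps inside the chain")
    bits = [1] + [0] * (n - 1)  # after reset exactly one bit is 1
    signal = []
    for _ in range(cycles):
        signal.append(int(any(bits[t] for t in taps)))  # OR of the tapped bits
        bits = [bits[-1]] + bits[:-1]                   # cyclic right shift by one
    return signal

# 4-bit chain with 3 taps: S is 1 on three of every four cycles (3:1 ratio).
print(shift_register_select(4, {0, 1, 2}, 8))  # prints [1, 1, 1, 0, 1, 1, 1, 0]

# 17-bit chain with 10 taps, as in the embodiment: S is 1 on 10 of every 17 cycles.
print(sum(shift_register_select(17, range(10), 17)))  # prints 10
```

Because only one bit is ever set, the OR is 1 exactly when that bit sits under a tap, so the share of cycles given to each service flow is M : (N - M), and the order of the taps sets how the slots are interleaved.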

Further, each virtual channel buffer is a first-in first-out (FIFO) buffer, which may be a synchronous FIFO or an asynchronous FIFO. The Valid interface signal at the transmitting end indicates that a data packet is valid, and the Ready interface signal at the receiving end indicates that the receiving end is ready to receive a data packet. When the Valid and Ready signals are both active (high at the same time), the data packet is transferred from the transmitting end to the receiving end.
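A behavioral sketch of a virtual channel buffer with a Valid/Ready handshake might look as follows. The class name, the fixed `depth` parameter and the method names are hypothetical, and clocking is abstracted away: each method call stands for one cycle on that port.

```python
from collections import deque

class VirtualChannelFifo:
    """Bounded FIFO; a transfer happens only when Valid and Ready are both high."""

    def __init__(self, depth):
        self.depth = depth  # assumed fixed capacity in packets
        self._queue = deque()

    @property
    def ready(self):
        """Write side: high while there is room for another packet."""
        return len(self._queue) < self.depth

    @property
    def valid(self):
        """Read side: high while a packet is waiting at the head."""
        return bool(self._queue)

    def push(self, valid, packet):
        """Sender presents a packet with its Valid signal; returns True if accepted."""
        if valid and self.ready:
            self._queue.append(packet)
            return True
        return False

    def pop(self, ready):
        """Receiver presents its Ready signal; returns the head packet or None."""
        if self.valid and ready:
            return self._queue.popleft()
        return None

fifo = VirtualChannelFifo(depth=2)
print(fifo.push(True, "p0"), fifo.push(True, "p1"), fifo.push(True, "p2"))
# prints True True False
print(fifo.pop(ready=False), fifo.pop(ready=True))
# prints None p0
```

The third push is refused because the buffer is full and Ready is low, which is how congestion in one virtual channel back-pressures only its own service flow.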

Specifically, as shown in fig. 6, after receiving the data packet Pkt, the classifier compares the type field of Pkt with the service flow types of the two second virtual channel buffers (those local to the classifier) to obtain two corresponding Boolean values. It then ANDs each Boolean value with the data packet validity indication signal Vld of the opposite end to obtain the validity signals Vld1 and Vld2 of the two second virtual channel buffers. The data packet Pkt is then sent to both second virtual channel buffers simultaneously, and only the second virtual channel buffer whose validity signal is asserted successfully receives it.
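The classifier logic reduces to two comparisons and two AND gates, which the following sketch mirrors. The function names, the integer type codes and the list-based buffers are assumptions for illustration only.

```python
def classify(pkt_type, vld, type_1, type_2):
    """Return (vld1, vld2): a type match ANDed with the upstream validity signal."""
    match_1 = pkt_type == type_1
    match_2 = pkt_type == type_2
    return (match_1 and vld, match_2 and vld)

def deliver(pkt, pkt_type, vld, buffer_1, buffer_2, type_1=0, type_2=1):
    """Present pkt to both buffers; only the one whose validity signal is high stores it."""
    vld1, vld2 = classify(pkt_type, vld, type_1, type_2)
    if vld1:
        buffer_1.append(pkt)
    if vld2:
        buffer_2.append(pkt)

buf_a, buf_b = [], []
deliver("mem-read", pkt_type=0, vld=True, buffer_1=buf_a, buffer_2=buf_b)
deliver("core-msg", pkt_type=1, vld=True, buffer_1=buf_a, buffer_2=buf_b)
deliver("stale", pkt_type=1, vld=False, buffer_1=buf_a, buffer_2=buf_b)
print(buf_a, buf_b)  # prints ['mem-read'] ['core-msg']
```

Broadcasting the packet and gating only the validity signals means the wide data path needs no demultiplexer; the steering is done on two single-bit signals.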

Currently, in a multi-core/many-core processor, each processor core accesses off-chip memory through the network on chip, and the processors also access one another through it. For example, consider a multi-core/many-core processor system comprising 64 processors divided into 16 processor clusters, which are connected by a 4 x 4 network-on-chip. A memory controller is connected at each of the 4 corners of the network on chip. Any processor can access the 4 memory controllers through the network on chip and thereby access the off-chip memory; any two processors can access each other through the network on chip. This results in two communication service flows of different natures. When the two service flows are mixed together, they easily congest each other, which reduces system performance; when the two are completely separated onto two independent sets of physical links, the implementation cost of the network on chip is high. Therefore, the invention adopts virtual channel technology, which keeps the two service flows independent while sharing the physical links. In addition, an unbalanced arbiter allows the arbitration ratio to be matched to the service flows, which helps improve network-on-chip utilization.
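The wiring saving in this example can be estimated with simple arithmetic. The sketch below assumes that the array is a 2D mesh in which "adjacent" means horizontal or vertical neighbors and that one physical link is counted per neighbor pair; the patent does not state either point, and the function names are hypothetical.

```python
def mesh_link_count(rows, cols):
    """Physical links between horizontally or vertically adjacent nodes of a mesh."""
    return rows * (cols - 1) + cols * (rows - 1)

def corner_nodes(rows, cols):
    """Coordinates of the four corners, where the memory controllers attach."""
    return [(0, 0), (0, cols - 1), (rows - 1, 0), (rows - 1, cols - 1)]

ROWS, COLS, PROCESSORS = 4, 4, 64
clusters = ROWS * COLS                        # 16 processor clusters
processors_per_cluster = PROCESSORS // clusters
shared_links = mesh_link_count(ROWS, COLS)    # one shared set of links
separate_links = 2 * shared_links             # two independent networks

print(processors_per_cluster, shared_links, separate_links)  # prints 4 24 48
print(corner_nodes(ROWS, COLS))  # prints [(0, 0), (0, 3), (3, 0), (3, 3)]
```

Under these assumptions, sharing the links halves the number of inter-cluster link bundles compared with two fully separate networks, at the cost of two small buffers, an arbiter and a classifier per link end.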

It should be understood that the foregoing description of specific embodiments is merely illustrative and is intended to help those of ordinary skill in the art understand the present patent; it does not imply that the scope of protection of the present patent is limited to these examples. Without any inventive effort, a person of ordinary skill in the art who fully understands the technical solution of the present patent can obtain further specific embodiments by combining technical features of the examples listed herein, substituting some technical features, adding further technical features, and so on. All such embodiments are covered by the claims of the present patent and therefore also fall within its scope of protection.

Claims

1. A link shared network on chip comprising a plurality of processor cores; the system is characterized in that the processor cores are arranged in an array form, a physical link is arranged between two adjacent processor cores in the array, and two ends of each physical link are respectively provided with an arbitration and multiplexing selector, a classifier and two virtual channel buffers for storing an off-chip memory access service flow data packet and an inter-processor communication service flow data packet; the processor core respectively sends the data packets of the off-chip memory access service flow and the inter-processor communication service flow to corresponding virtual channel buffers according to the routing path, the arbitration and multiplexing selector selects one data packet from the two virtual channel buffers, the data packet is sent to the classifier of the next-hop processor core through a corresponding physical link, the classifier of the next-hop processor core stores the received data packet in the corresponding virtual channel buffer, and then the arbitration and multiplexing selector sends the received data packet to the classifier of the next-hop processor core through the corresponding physical link until the data packet reaches the classifier of the processor core at the receiving end, the data packet of the off-chip memory access service flow is sent to the off-chip memory by the classifier, and the inter-processor communication service flow data packet is provided to the processor core at the receiving end;

the arbitration and multiplexing selector comprises a register and an inverter, wherein the register is reset to 0 at the beginning and, after being started, toggles continuously between 0 and 1 to generate a multiplexing selection signal, so that the data packets in the two virtual channel buffers are transmitted alternately in a round-robin manner;

the arbitration and multiplexing selector comprises a cyclic shift register and a configurable selector; the cyclic shift register is an N-bit register chain, N being greater than or equal to 2; when the register chain is reset, only 1 bit is 1 and the remaining bits are all 0, and thereafter the cyclic shift register shifts right by one bit on each clock cycle; the selector selects fixed M bits from the N-bit register chain, M < N, and performs an OR operation on the M bits to obtain the selection signal of the multiplexer.

2. The network on chip of claim 1, wherein each virtual channel buffer is a first-in-first-out buffer.

3. A link shared network on chip as claimed in claim 1, wherein the packet header has a type field for indicating the traffic flow type of the packet and a control field for indicating the access type, i.e. whether the packet belongs to a read request, a read response, a write request or a write response.

4. A network-on-chip for link sharing according to claim 3, wherein the virtual channel buffers have data packet validity indication signals; after receiving a data packet from the opposite end, the classifier compares the type field of the data packet with the service flow types of the two local virtual channel buffers to obtain corresponding Boolean values, then performs an AND operation on each of the two Boolean values and the data packet validity indication signal of the opposite end to obtain validity signals of the two local virtual channel buffers, and then sends the data packet to the two local virtual channel buffers simultaneously, wherein only the local virtual channel buffer whose validity signal is valid can successfully receive the data packet.