CN112631985A - Link-shared network-on-chip

Info

Publication number: CN112631985A
Application number: CN202011528831.8A
Authority: CN
Inventors: 胡东伟
Original assignee: CETC 54 Research Institute
Current assignee: CETC 54 Research Institute
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2021-04-09
Anticipated expiration: 2040-12-22
Also published as: CN112631985B

Abstract

The invention discloses a link-shared network on chip, and belongs to the technical field of chip design. The system comprises a plurality of processor cores which are arranged in an array form, wherein a physical link is arranged between two adjacent processor cores in the array, and an arbitration and multiplexer, a classifier and two virtual channel buffers are arranged at two ends of each physical link; and the processor core transmits the data packet to the next hop of processor core according to the routing path until the data packet reaches the processor core at the receiving end. The invention adopts the virtual channel technology, can realize the independence of two service flows and the sharing of a physical link. In addition, by adopting the unbalanced arbiter, the matching of the arbiter and the service flow can be realized, which is beneficial to improving the utilization rate of the network on chip.

Description

Link-shared network-on-chip

Technical Field

The invention relates to the technical field of chip design, in particular to a link-shared network on chip.

Background

A network on chip is an important method for implementing multi-core/many-core processors. With network-on-chip technology, hundreds of cores can be interconnected within a single chip. Modern high-end server chips all adopt network-on-chip technology. However, companies do not disclose how they implement it, so each company has its own network-on-chip implementation techniques.

For a multi-core/many-core processor, the network on chip allows each processor to access off-chip memory and also allows the processors to communicate with one another. These are two different traffic flows. Currently, there are two approaches to supporting both. One approach is to use two completely independent networks on chip, one for each traffic flow. For example, the TILE64 architecture claims to have 3 sets of independent networks on chip. Because implementing a network on chip requires a large amount of on-chip wiring, the disadvantage of this approach is its high implementation cost, which translates into a large chip area. The other approach is to mix the two traffic flows and carry both over a single network on chip. This approach is seen in a number of academic papers. Its disadvantage is that when one traffic flow is blocked, the other is blocked as well, which degrades system performance.

Disclosure of Invention

In view of this, the present invention provides a link-shared network on chip, which uses a virtual channel technology, and can implement separation of a service flow for accessing a memory and a service flow for communication between processor cores, and the physical links of the two are shared, thereby not only ensuring performance, but also reducing area overhead for implementing the network on chip.

Based on the above purpose, the technical scheme provided by the invention is as follows:

A link-shared network on chip comprises a plurality of processor cores. The processor cores are arranged in an array, a physical link is arranged between every two adjacent processor cores in the array, and each end of each physical link is provided with an arbitration and multiplexer, a classifier, and two virtual channel buffers, which respectively store off-chip memory access service flow data packets and inter-processor communication service flow data packets. According to the routing path, the processor core sends the data packets of the off-chip memory access service flow and of the inter-processor communication service flow to the corresponding virtual channel buffers; the arbitration and multiplexer selects one data packet from the two virtual channel buffers and transmits it over the corresponding physical link to the classifier of the next-hop processor core; the classifier of the next-hop processor core stores the received data packet in the corresponding virtual channel buffer, from which the arbitration and multiplexer of that hop again transmits it over the corresponding physical link to the classifier of the following processor core, and so on until the data packet reaches the classifier of the processor core at the receiving end; that classifier sends off-chip memory access service flow data packets to the off-chip memory and delivers inter-processor communication service flow data packets to the processor core at the receiving end.

Furthermore, the arbitration and multiplexer comprises a register and an inverter, wherein the register is reset to 0 initially and, once started, toggles continuously between 0 and 1 to generate a multiplexing signal, so that the data packets in the two virtual channel buffers are sent alternately without interruption.

Further, the arbitration and multiplexer comprises a cyclic shift register and a configurable selector; the cyclic shift register is an N-bit register chain, N ≥ 2; when the register chain is reset, exactly one bit is 1 and the remaining bits are all 0, and thereafter the register chain is cyclically shifted right by one bit on every clock; the selector selects a fixed M bits from the N-bit register chain, M < N, and the M bits are ORed to obtain the selection signal of the multiplexer.

Further, each virtual channel buffer is a first-in first-out buffer.

Further, the packet header of the data packet has a type field and a control field, the type field is used for indicating the traffic type of the data packet, and the control field is used for indicating the access type, that is, whether the data packet belongs to a read request, a read response, a write request, or a write response.

Furthermore, the virtual channel buffers have data packet validity indication signals. After receiving a data packet from the opposite end, the classifier compares the type field of the data packet with the service flow types of the two local-end virtual channel buffers to obtain two Boolean values, and then ANDs each Boolean value with the data packet validity indication signal of the opposite end to obtain the validity signals of the two local-end virtual channel buffers. The data packet is then sent to both local-end virtual channel buffers at the same time, and only the local-end virtual channel buffer whose validity signal is valid successfully receives the data packet.

As can be seen from the above description, the technical scheme of the invention has the beneficial effects that:

1. The invention adopts the virtual channel technology so that the two service flows of the network on chip, namely the access of each processor core to the off-chip memory and the mutual access among the processor cores, are kept independent while sharing the physical links. On the one hand, the memory access service flow and the inter-processor communication service flow are separated, which improves communication performance; on the other hand, the two service flows share the physical links, which avoids additional wiring overhead in the chip and reduces the chip area.

2. The invention can realize the matching of the service flow and the arbitrator through the programmable non-equilibrium arbitrator, and is beneficial to improving the utilization rate of the network on chip.

Drawings

To more clearly describe this patent, one or more drawings are provided below to assist in explaining the background, technical principles and/or certain embodiments of this patent.

Fig. 1 is a schematic structural diagram of a network on chip in an embodiment of the present invention.

Fig. 2 is a schematic diagram of a link sharing structure in the embodiment of the present invention.

Fig. 3 is a diagram illustrating a format of a data packet according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of an implementation of the arbitration and multiplexer according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of another implementation of the arbitration and multiplexer according to the embodiment of the present invention.

Fig. 6 is a schematic diagram of one implementation of a classifier in an embodiment of the invention.

Detailed Description

In order to facilitate understanding of the technical solutions of the present patent by those skilled in the art, and to make the technical objects, technical solutions and advantages of the present patent more apparent and fully support the scope of the claims, the technical solutions of the present patent are described in detail in the following embodiments.

As shown in Fig. 1 and Fig. 2, a link-shared network on chip includes a plurality of processor cores. The processor cores are arranged in an array, a physical link is arranged between every two adjacent processor cores in the array, and each end of each physical link is provided with an arbitration and multiplexer, a classifier, and two virtual channel buffers, which respectively store off-chip memory access service flow data packets and inter-processor communication service flow data packets. According to the routing path, the processor core sends data packets of the off-chip memory access service flow (service flow 1) to the local virtual channel buffer A and data packets of the inter-processor communication service flow (service flow 2) to the local virtual channel buffer B.

The arbitration and multiplexer selects one data packet from the two virtual channel buffers and transmits it over the corresponding physical link to the classifier of the next-hop processor core. According to the service flow type, that classifier stores data packets of service flow 1 in virtual channel buffer A of its own next-hop physical link and data packets of service flow 2 in virtual channel buffer B of that link; the arbitration and multiplexer there then transmits the data packet over the next-hop physical link to the classifier of the following processor core, and so on until the data packet reaches the virtual channel buffer at the processor core of the receiving end. The classifier at the receiving-end processor core sends off-chip memory access service flow data packets to the off-chip memory and delivers inter-processor communication service flow data packets to the receiving-end processor core.
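The single-hop behaviour described above can be illustrated with a small behavioural sketch in Python. It is not the patented circuit: the class and function names, the buffer depth, and the dictionary packet representation are all assumptions made for illustration. It shows one link end feeding the next, and that a blocked service flow 1 does not prevent service flow 2 from using the shared link.

```python
from collections import deque

FLOW_MEM, FLOW_IPC = 1, 2  # service flow 1 and service flow 2, as labelled in the text


class LinkEnd:
    """One end of a physical link with two virtual channel buffers (hypothetical model)."""

    def __init__(self, depth=2):
        self.depth = depth  # buffer depth is an assumption; the text gives none
        self.vc = {FLOW_MEM: deque(), FLOW_IPC: deque()}

    def can_accept(self, flow):
        return len(self.vc[flow]) < self.depth

    def classify(self, pkt):
        # Classifier: store the packet in the buffer that matches its flow type.
        self.vc[pkt["type"]].append(pkt)


def transfer(sender, receiver, select):
    """Move at most one packet of the selected flow across the link.

    Returns the packet moved, or None if that channel is empty or blocked.
    """
    queue = sender.vc[select]
    if queue and receiver.can_accept(select):
        pkt = queue.popleft()
        receiver.classify(pkt)
        return pkt
    return None


tx, rx = LinkEnd(), LinkEnd(depth=2)
# Receiver-side buffer A is already full, so service flow 1 is blocked.
rx.vc[FLOW_MEM].extend({"type": FLOW_MEM, "id": f"old{i}"} for i in range(2))
tx.vc[FLOW_MEM].append({"type": FLOW_MEM, "id": "m0"})
tx.vc[FLOW_IPC].extend({"type": FLOW_IPC, "id": f"p{i}"} for i in range(2))

moved = []
select = FLOW_MEM
for _ in range(4):
    pkt = transfer(tx, rx, select)
    if pkt is not None:
        moved.append(pkt["id"])
    # Alternate between the two flows on every cycle.
    select = FLOW_IPC if select == FLOW_MEM else FLOW_MEM

print(moved)  # the two service flow 2 packets get through; m0 stays queued
```

In this sketch the blocked memory-access packet simply waits in its own buffer, which is the independence property the virtual channels are meant to provide.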

Further, as shown in fig. 3, the packet header of the data packet has a type field for indicating the traffic type of the data packet and other control fields for indicating the access type, i.e. whether the data packet belongs to a read request, a read response, a write request or a write response.
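As an illustration of such a header, the following Python sketch packs a type field and a control field into a small integer. The field widths (1 bit and 2 bits), the bit positions, and the numeric encodings are assumptions chosen for the example; the text does not specify them.

```python
from enum import IntEnum


class FlowType(IntEnum):
    MEM_ACCESS = 0   # off-chip memory access service flow (encoding assumed)
    INTER_CORE = 1   # inter-processor communication service flow (encoding assumed)


class Access(IntEnum):
    READ_REQ = 0
    READ_RESP = 1
    WRITE_REQ = 2
    WRITE_RESP = 3


TYPE_BITS, CTRL_BITS = 1, 2  # hypothetical field widths


def pack_header(flow, access):
    """Place the type field above the control field (layout is an assumption)."""
    return (int(flow) << CTRL_BITS) | int(access)


def unpack_header(header):
    flow = FlowType((header >> CTRL_BITS) & ((1 << TYPE_BITS) - 1))
    access = Access(header & ((1 << CTRL_BITS) - 1))
    return flow, access


hdr = pack_header(FlowType.INTER_CORE, Access.WRITE_REQ)
print(hdr, unpack_header(hdr))
```

Only the type field is needed by the classifier; the control field is interpreted at the endpoints.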

Further, as shown in Fig. 4, the arbitration and multiplexer includes a register Reg and an inverter INV. The register Reg is reset to 0 initially and, once started, toggles continuously between 0 and 1, generating a multiplexing signal S that is sent to the selector MUX. According to the value of S, the selector MUX selects the corresponding data packet, Pkt1 or Pkt2, from the two first virtual channel buffers and outputs it as the data packet Pkt.

With this arbitration and multiplexer, the selection keeps switching regardless of whether a packet is actually sent. Thus, when a packet of one virtual channel is blocked, an attempt can be made to transmit a packet of the other virtual channel. After arbitration and multiplexing, the physical link only needs to be one data packet wide, which reduces the implementation overhead of the physical link.
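The register-plus-inverter arbiter can be modelled behaviourally in a few lines of Python. This is a sketch only: the class name is hypothetical, and the mapping of S = 0 to Pkt1 and S = 1 to Pkt2 is an assumption, since the text does not say which value selects which buffer.

```python
class ToggleArbiter:
    """Register Reg fed by its own inverted output (behavioural model)."""

    def __init__(self):
        self.reg = 0  # reset value stated in the text

    def tick(self):
        """Return the current select signal S, then toggle the register."""
        s = self.reg
        self.reg ^= 1  # the inverter INV drives the register input
        return s


def mux(s, pkt1, pkt2):
    # Assumed mapping: S == 0 selects Pkt1, S == 1 selects Pkt2.
    return pkt1 if s == 0 else pkt2


arb = ToggleArbiter()
selects = [arb.tick() for _ in range(6)]
chosen = [mux(s, "Pkt1", "Pkt2") for s in selects]
print(selects)
print(chosen)
```

The select signal alternates on every clock whether or not a transfer happens, which matches the continuous switching described above.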

Further, as shown in Fig. 5, the arbitration and multiplexer includes a cyclic shift register and a configurable selector. The cyclic shift register is a 17-bit register chain; when the register chain is reset, exactly one bit is 1 and the remaining bits are all 0, and thereafter the register chain is cyclically shifted right by one bit on every clock. The selector selects a fixed 10 bits from the 17-bit register chain and ORs them to obtain the selection signal S of the multiplexer. Because a single 1 circulates through the chain, S is 1 in 10 of every 17 clock cycles. According to the value of S, the selector MUX selects the corresponding data packet, Pkt1 or Pkt2, from the two first virtual channel buffers and outputs it as the data packet Pkt.

In many systems, the off-chip memory access traffic and the inter-processor communication traffic are not equal. For example, in most systems the off-chip memory access traffic is much larger than the inter-processor communication traffic, i.e., service flow 1 is much larger than service flow 2. An unbalanced arbiter of the kind described above can therefore be placed between service flow 1 and service flow 2. With a 3:1 arbiter, for example, 1 packet of service flow 2 is sent for every 3 packets of service flow 1. This effectively improves the efficiency of the network on chip.
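The unbalanced arbiter can be sketched behaviourally as follows. The class name, the `taps` parameter (the bit positions picked by the configurable selector), and the choice of which positions to tap are assumptions for illustration; the text only fixes N, M, the one-hot reset state, the right rotation, and the OR.

```python
class ShiftRegisterArbiter:
    """N-bit one-hot cyclic shift register with an OR over M selected bits."""

    def __init__(self, n, taps):
        self.taps = sorted(set(taps))
        # The text requires N >= 2 and M < N.
        assert n >= 2 and 0 < len(self.taps) < n
        self.bits = [1] + [0] * (n - 1)  # exactly one bit is 1 after reset

    def tick(self):
        """Return select signal S for this clock, then rotate right by one bit."""
        s = int(any(self.bits[i] for i in self.taps))  # OR of the selected bits
        self.bits = [self.bits[-1]] + self.bits[:-1]   # cyclic right shift
        return s


# A 3:1 arbiter as in the example: N = 4, M = 3 (tap positions assumed).
arb = ShiftRegisterArbiter(4, taps=[0, 1, 2])
seq = [arb.tick() for _ in range(8)]
print(seq)

# The 17-bit chain with 10 selected bits from the embodiment.
arb17 = ShiftRegisterArbiter(17, taps=range(10))
ones = sum(arb17.tick() for _ in range(17))
print(ones, "of 17 cycles select the favoured flow")
```

Choosing which M positions to tap changes how the favoured flow's slots are interleaved within a period but not the overall M : (N - M) ratio.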

Further, each virtual channel buffer is a first-in first-out (FIFO) buffer, which may be a synchronous FIFO or an asynchronous FIFO. The Valid interface signal of the sending end indicates that a data packet is valid, and the Ready interface signal of the receiving end indicates that the receiving end is ready to receive a data packet. When the Valid signal and the Ready signal are active at the same time (both high), the data packet is transferred from the sending end to the receiving end.
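The Valid/Ready transfer rule can be illustrated with a minimal Python model of two FIFOs. The class and function names and the buffer depths are hypothetical; the only behaviour taken from the text is that a packet moves when Valid and Ready are both active.

```python
from collections import deque


class Fifo:
    """Bounded first-in first-out buffer exposing Valid and Ready (sketch)."""

    def __init__(self, depth):
        self.depth = depth  # depth is an assumption; the text gives none
        self.q = deque()

    @property
    def valid(self):
        # Sending side: a packet is available at the head.
        return len(self.q) > 0

    @property
    def ready(self):
        # Receiving side: there is room for one more packet.
        return len(self.q) < self.depth

    def push(self, pkt):
        self.q.append(pkt)

    def pop(self):
        return self.q.popleft()


def handshake(src, dst):
    """Transfer one packet only when Valid and Ready are both active."""
    if src.valid and dst.ready:
        dst.push(src.pop())
        return True
    return False


src, dst = Fifo(depth=4), Fifo(depth=2)
for name in ("a", "b", "c"):
    src.push(name)

results = [handshake(src, dst) for _ in range(3)]
print(results, list(dst.q), list(src.q))
```

The third attempt fails because the receiving FIFO is full and Ready is low, so the packet stays at the sending end in order.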

Specifically, as shown in Fig. 6, after receiving the data packet Pkt, the classifier compares the type field of Pkt with the service flow types of the two second virtual channel buffers to obtain two Boolean values, and then ANDs each Boolean value with the data packet validity indication signal Vld of the opposite end to obtain the validity signals Vld1 and Vld2 of the two second virtual channel buffers. The data packet Pkt is then sent to both second virtual channel buffers at the same time, and only the second virtual channel buffer whose validity signal is asserted successfully receives the data packet.
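The compare-and-AND logic of the classifier can be written out as a short Python sketch. The function names and the use of the values 1 and 2 as the service flow types of the two buffers are assumptions; Vld, Vld1 and Vld2 follow the signal names in the text.

```python
def classify(pkt_type, vld, vc_types=(1, 2)):
    """Derive Vld1 and Vld2 from the packet type field and the incoming Vld."""
    match1 = pkt_type == vc_types[0]  # comparison against buffer 1's flow type
    match2 = pkt_type == vc_types[1]  # comparison against buffer 2's flow type
    vld1 = match1 and vld             # AND with the opposite end's Vld
    vld2 = match2 and vld
    return vld1, vld2


def deliver(pkt, pkt_type, vld, buf1, buf2):
    """Present the packet to both buffers; only the one with a valid signal stores it."""
    vld1, vld2 = classify(pkt_type, vld)
    if vld1:
        buf1.append(pkt)
    if vld2:
        buf2.append(pkt)


buf1, buf2 = [], []
deliver("mem-read", 1, True, buf1, buf2)    # service flow 1 packet
deliver("core-msg", 2, True, buf1, buf2)    # service flow 2 packet
deliver("stale", 2, False, buf1, buf2)      # Vld low: neither buffer accepts it
print(buf1, buf2)
```

Because the packet is broadcast and only the validity signals differ, no separate demultiplexer on the data path is needed in this model.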

Currently, in a multi-core/many-core processor, each processor core needs to access off-chip memory through the network on chip, and the processors also need to access one another. Consider, for example, a multi-core/many-core processor system containing 64 processors, divided into 16 processor clusters that are connected by a 4 × 4 network on chip, with a memory controller connected at each of the 4 corners of the network. Any processor can reach the 4 memory controllers through the network on chip and thereby access the off-chip memory, and any two processors can access each other through the network on chip. This results in two traffic flows that differ in nature. When the two flows are mixed together, they easily congest each other, which reduces system performance; when the two flows are completely separated onto two sets of independent physical links, the implementation cost of the network on chip is large. The invention therefore adopts the virtual channel technology, so that the two service flows are independent while the physical link is shared. In addition, by adopting the unbalanced arbiter, the arbiter can be matched to the service flows, which helps improve the utilization rate of the network on chip.

It should be understood that the above description of the embodiments of the present patent is only an exemplary description for facilitating the understanding of the patent scheme by the person skilled in the art, and does not imply that the scope of protection of the patent is only limited to these examples, and that the person skilled in the art can obtain more embodiments by combining technical features, replacing some technical features, adding more technical features, and the like to the various embodiments listed in the patent without any inventive effort on the premise of fully understanding the patent scheme, and therefore, the new embodiments are also within the scope of protection of the patent.

Claims

1. A link-shared network on chip comprising a plurality of processor cores, characterized in that the processor cores are arranged in an array, a physical link is arranged between every two adjacent processor cores in the array, and each end of each physical link is provided with an arbitration and multiplexer, a classifier, and two virtual channel buffers, which respectively store off-chip memory access service flow data packets and inter-processor communication service flow data packets; according to the routing path, the processor core sends the data packets of the off-chip memory access service flow and of the inter-processor communication service flow to the corresponding virtual channel buffers; the arbitration and multiplexer selects one data packet from the two virtual channel buffers and transmits it over the corresponding physical link to the classifier of the next-hop processor core; the classifier of the next-hop processor core stores the received data packet in the corresponding virtual channel buffer, from which the arbitration and multiplexer of that hop again transmits it over the corresponding physical link to the classifier of the following processor core, and so on until the data packet reaches the classifier of the processor core at the receiving end; that classifier sends off-chip memory access service flow data packets to the off-chip memory and delivers inter-processor communication service flow data packets to the processor core at the receiving end.

2. The link-shared network-on-chip as claimed in claim 1, wherein the arbitration and multiplexer comprises a register and an inverter, the register is reset to 0 initially and toggles between 0 and 1 after activation to generate the multiplexing signal, thereby transmitting the data packets in the two virtual channel buffers alternately without interruption.

3. A link-shared network-on-chip according to claim 1, wherein said arbitration and multiplexer comprises a circular shift register and a configurable selector; the cyclic shift register is a register chain with N bits, N is more than or equal to 2, when the register chain is reset, only 1 bit is 1, and the rest bits are all 0, and then each clock of the cyclic shift register circularly shifts one bit to the right; the selector selects fixed M bits from the register chain with N bits, M < N, and the M bits are subjected to OR operation to obtain a selection signal of the multiplexer.

4. The link-shared network-on-chip of claim 1, wherein each virtual channel buffer is a first-in-first-out buffer.

5. The link-shared network-on-chip of claim 1, wherein the packet header of the data packet has a type field and a control field, the type field is used to indicate the traffic type of the data packet, and the control field is used to indicate the access type, i.e. whether the data packet belongs to a read request, a read response, a write request or a write response.

6. The link-shared network-on-chip of claim 5, wherein the virtual channel buffers have packet validity indication signals; after receiving a packet from the opposite end, the classifier compares the type field of the packet with the service flow types of the two local-end virtual channel buffers to obtain two Boolean values, and then ANDs each Boolean value with the packet validity indication signal of the opposite end to obtain the validity signals of the two local-end virtual channel buffers; the packet is then sent to both local-end virtual channel buffers at the same time, and only the local-end virtual channel buffer whose validity signal is valid successfully receives the packet.