CN114024920B

CN114024920B - Data packet routing method for on-chip message network

Info

Publication number: CN114024920B
Application number: CN202111404105.XA
Authority: CN
Inventors: Qin Xiaoyang (秦晓阳); Xu Peixin (徐培欣)
Original assignee: Suzhou Blizzard Electronic Technology Co ltd
Current assignee: Suzhou Blizzard Electronic Technology Co ltd
Priority date: 2021-11-24
Filing date: 2021-11-24
Publication date: 2023-10-27
Anticipated expiration: 2041-11-24
Also published as: CN114024920A

Abstract

The invention discloses a data packet routing method for an on-chip message network that uses several measures to increase transmission speed. For example, low-level switches, such as level-1 and level-2 switches, forward data directly without buffering. High-level switches can use multiple channels to relieve communication bottlenecks, and a multidimensional network-on-chip can be built to increase transmission speed further. The method supports a layered implementation of network-on-chip communication protocols. The network-layer packet routing and forwarding method of the invention keeps the number of network-on-chip wires, the chip resources occupied, and the power consumption as low as possible.

Description

Data packet routing method for on-chip message network

Technical Field

The invention relates to the technical field of processors, in particular to a data packet routing method for an on-chip message network.

Background

Power consumption and manufacturing process limits constrain further performance gains in single-core processors, whose clock frequencies are approaching their limit. Multi-core or many-core technology is the most effective way to improve processor performance while reducing power consumption.

However, as more and more processor cores are integrated on a single chip, how to ensure efficient communication between the individual processor cores becomes a significant issue.

The design of multi-core or many-core processors falls broadly into three categories: bus- or switch-interconnected designs, stream processors and graphics processors, and network-interconnected processors. Multi-core designs that use a bus or switch as the basic interconnect have several advantages. Every processor core has the same path to memory, and the shared memory can support a cache coherency protocol based on bus snooping. Each core resembles a traditional single-core processor with relatively powerful computing capability, trimmed only in some respects to reduce power consumption. Consequently, an unmodified single-threaded application can run on a new multi-core processor, possibly with improved performance.

However, the major disadvantage of this architecture is that the bus or switch becomes a system bottleneck, affecting both system performance and power consumption.

Stream processors and GPGPUs (general-purpose graphics processors) follow a massively data-parallel computing model. They typically organize computation on instructions or data using SIMD (single instruction stream, multiple data streams) techniques. This architecture efficiently supports massive data-parallel computation, but the GPU can exploit its advantage only for applications with large amounts of regular data parallelism. Branches in programs and data sharing between threads are weak points of the GPU and are supported inefficiently, if at all; running a web server on a GPU, for example, is essentially wishful thinking. Second, applications must be extensively optimized to expose their parallelism to the GPU, which requires a deep understanding of both the GPU architecture and the program being optimized.

Network-interconnected processors: neither bus- or switch-based designs nor stream processors can fundamentally improve the scalability of multi-core, let alone many-core, processor designs. A network-on-chip replaces the traditional interconnect so that processor cores communicate with one another in a distributed manner, avoiding the performance bottleneck and high power consumption of a centralized interconnect. However, as the number of cores integrated on a chip grows, organizing this functionality is not a simple matter.

Disclosure of Invention

To overcome the above-mentioned drawbacks of the prior art, the present invention provides a data packet routing method for an on-chip messaging network.

In order to achieve the above purpose, the present invention provides the following technical solutions:

A data packet routing method for an on-chip message network, comprising:

1. Caching of switches

There are two kinds of switches: one without an internal buffer and one with an internal buffer. A switch without an internal buffer consists of combinational logic and passes data straight through with no delay; it is suited to low-level switches such as level-1 or level-2 switches. A switch with an internal buffer buffers the data it receives in order to break the combinational path between sender and receiver and prevent overly long combinational logic paths.

There are two types of buffer. The first is a register-based, real-time forwarding buffer in which the internal register is as wide as the received data. Data received in each clock cycle is first placed in the buffer register and then forwarded to the switch port of the destination address, so there is a one-clock-cycle delay between reception and retransmission. The second type is a buffer memory that receives a whole data packet and then forwards it to the switch port of the destination address; this buffer must be able to hold the maximum packet length. Its retransmission time is not fixed: transmission may proceed concurrently with reception, or the packet may be received in full first and then transmitted.
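The two buffer types can be contrasted with a minimal behavioural sketch. The class names `RegisterBuffer` and `PacketBuffer` are hypothetical, `None` stands for a cycle with no data, and the packet buffer models only the receive-first-then-send policy, not the overlapped variant the text also allows.

```python
from collections import deque


class RegisterBuffer:
    """Register-based forwarding buffer: one register as wide as the data path.

    A beat received in one cycle is emitted in the next, giving the fixed
    one-cycle delay described above (None models an empty cycle).
    """

    def __init__(self):
        self.reg = None

    def tick(self, beat_in):
        beat_out, self.reg = self.reg, beat_in
        return beat_out


class PacketBuffer:
    """Whole-packet buffer sized for the maximum packet length.

    Assumption: receive the complete packet first, then forward it.
    """

    def __init__(self, max_packet_len):
        self.max_packet_len = max_packet_len
        self.beats = deque()

    def receive(self, beat):
        if len(self.beats) >= self.max_packet_len:
            raise OverflowError("packet longer than buffer capacity")
        self.beats.append(beat)

    def drain(self):
        """Forward the buffered packet one beat at a time, in arrival order."""
        while self.beats:
            yield self.beats.popleft()
```

Feeding `"a", "b", "c", None` through a `RegisterBuffer` yields `None, "a", "b", "c"`: every beat leaves one cycle after it arrives.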

2. Address representation

A processor address consists of several fields. In a system with N levels of processor groups, the address of a processor is {level-(N-1) group address field, …, level-2 group address field, level-1 group address field, address field within the level-1 group}. The width of each field is determined by the maximum number of identical units at that level. For example, if a level-1 processor group contains at most 4 processors, the address field within the level-1 group is 2 bits; if a level-2 processor group contains at most 4 level-1 processor groups, the level-1 group address field is also 2 bits.
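A minimal sketch of this field layout follows, assuming each field is just wide enough to number its units and that fields are concatenated with the highest-level group field in the most significant position. The function names `field_width` and `pack_address` are hypothetical.

```python
def field_width(count):
    """Bits needed to number `count` identical units at one level (count >= 1)."""
    if count < 1:
        raise ValueError("a level must contain at least one unit")
    return (count - 1).bit_length()


def pack_address(fields, widths):
    """Concatenate address fields, highest-level group field first.

    Order follows the text: {level-(N-1) group field, ..., level-1 group field,
    field within the level-1 group}.
    """
    if len(fields) != len(widths):
        raise ValueError("need exactly one width per field")
    addr = 0
    for value, width in zip(fields, widths):
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        addr = (addr << width) | value
    return addr
```

For example, `field_width(4)` is 2, and `pack_address([0b11, 0b00], [2, 2])` gives `0b1100`.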

Consider an on-chip many-core system with two levels of processor groups (FIG. 1). It has 16 processors in total, so a processor address is 4 bits wide. Each level-1 processor group contains 4 processors; the lower 2 bits of a processor address are the address field within the level-1 group, and the upper 2 bits are the level-1 group address field.

For example, processor P1 has address field 0b01 within its level-1 group and level-1 group address field 0b00, so the address of P1 is 0b0001. The address of P12 is 0b1100: its address field within the level-1 group is 0b00, and its level-1 group address field is 0b11.

Each processor group also has a group address, in which the address field within the group is all zeros. For example, G1.1 has group address 0b0100, and G1.3 has group address 0b1100.
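The 16-processor example can be checked with a short sketch. It assumes a 2-bit field within each level-1 group; `processor_address` and `group_address` are hypothetical helper names.

```python
INTRA_BITS = 2  # 4 processors per level-1 group -> 2-bit field within the group


def processor_address(level1_group, index_in_group):
    """4-bit address {level-1 group field, field within the level-1 group}."""
    return (level1_group << INTRA_BITS) | index_in_group


def group_address(level1_group):
    """Group address of a level-1 group: field within the group is all zeros."""
    return processor_address(level1_group, 0)
```

P1 is `processor_address(0, 1)`, which is `0b0001`, and P12 is `processor_address(3, 0)`, which is `0b1100`. G1.1 and G1.3 have group addresses `0b0100` and `0b1100`.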

Note: in the numbers used here, a 0b prefix denotes a binary number, and each digit after 0b is one bit.

3. Transmission mode

Data messages are transmitted in one of three modes: unicast, multicast, and broadcast. Unicast is one-to-one transmission: one processor of the on-chip many-core system sends data to another processor. Multicast is one-to-many transmission: one processor sends the same data packet to all processors in a particular processor group. Broadcast means that one processor sends the same data packet to all processors of the on-chip many-core system.

A "type" field at the transmitting and receiving ends indicates the transmission mode. The width of type equals the number of levels of the highest-level processor group of the on-chip many-core system, and each bit of type controls multicast at one level. For example, if the highest-level processor group is level 3, type has 3 bits. type = 0 indicates unicast. type[0] = 0b1 indicates multicast to every processor in the level-1 processor group containing the destination address. type[1:0] = 0b11 indicates multicast to all processors and processor groups in the level-2 processor group containing the destination address, and so on.

As an illustration, take a 3-bit type. The switch at each level decides how to forward a received data packet according to its destination address and transmission mode. type = 0 means unicast, i.e., one-to-one transmission. type = 0b111, i.e., all bits 1, means broadcast. type = 0b001 means multicast to each processor in the level-1 processor group containing the destination address, and type = 0b011 means multicast to all processors and processor groups in the level-2 processor group containing the destination address.
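A minimal decoder for the type field follows. It assumes that only the patterns named in the text are valid: zero, all ones, or a contiguous run of low-order ones. Rejecting any other pattern, such as 0b010, is this sketch's own choice. `decode_type` is a hypothetical name.

```python
def decode_type(type_bits, levels):
    """Interpret the `type` field for a system with `levels` group levels."""
    all_ones = (1 << levels) - 1
    if not 0 <= type_bits <= all_ones:
        raise ValueError("type has more bits than the system has levels")
    if type_bits == 0:
        return "unicast"
    if type_bits == all_ones:
        return "broadcast"
    # Multicast: the low k bits are set, where k is the level of the target group.
    k = (type_bits + 1).bit_length() - 1
    if type_bits != (1 << k) - 1:
        raise ValueError("multicast type must have only its low bits set")
    return f"multicast to level-{k} group"
```

With 3 levels, `0b001` decodes to level-1 multicast and `0b011` to level-2 multicast.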

In broadcast mode, no destination address is needed. In level-1 multicast mode, the address field within the level-1 group may be omitted from the destination address. In level-2 multicast mode, both the level-1 group address field and the address field within the level-1 group may be omitted, and so on.

4. Forwarding rules for switches

Each port of the switch has a register PADDR that stores the address of that port. The switch also has a register that stores the group address GADDR of its processor group.

Each switch also has an address comparison mask register PMASK, whose width equals the address width. PMASK masks out fields that need not be compared and is mainly used in switches of level 2 and above. For a level-1 group switch, PMASK may be omitted or set to all ones.

For example, suppose a level-2 switch has 4 intra-group ports connected to 4 level-1 group switches, with port addresses 0b0100XX, 0b0101XX, 0b0110XX, and 0b0111XX. Here X denotes a don't-care bit that may be 0 or 1. The highest 2 bits of each port address are the level-2 group address field, and the middle 2 bits identify the port. Because this is a level-2 switch, the lowest 2 bits (the address field within a level-1 group) are not needed, so the PMASK of this level-2 switch is 0b111100. By the same reasoning, the PMASK of a level-3 switch is 0b110000.
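The PMASK values above follow one rule: a level-k switch ignores the k-1 lowest address fields. A minimal sketch, assuming equal-width fields (2 bits each in the 3-level example). `pmask` is a hypothetical helper name.

```python
def pmask(switch_level, levels=3, field_bits=2):
    """Comparison mask for a switch at `switch_level` (1 = lowest level).

    The address has `levels` fields of `field_bits` bits each, counting the
    field within the level-1 group. A level-k switch ignores the k-1 lowest fields.
    """
    width = levels * field_bits
    ignored = (switch_level - 1) * field_bits
    return ((1 << width) - 1) & ~((1 << ignored) - 1)
```

This gives `0b111111`, `0b111100`, and `0b110000` for the level-1, level-2, and level-3 switches.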

For unicast transmission, when a message is received on one port, its destination address TADDR is compared with the address PADDR of each of the other intra-group ports using the rule

TADDR & PMASK == PADDR & PMASK

If the two sides are equal, the destination address matches that port, and the message is sent to it.

If the destination address matches none of the intra-group port addresses, the message is forwarded to the out-of-group port. If the out-of-group port is occupied, the message waits. If there are several out-of-group ports, the message is sent through any free one.

For example, suppose a switch has 4 intra-group ports with the following addresses: port a: 0b110000, port b: 0b110100, port c: 0b111000, port d: 0b111100.

Case 1: a message arriving on port a has destination address 0b111000. Comparing it with the other 3 ports shows that it matches the address of port c, so the message is sent to port c.

Case 2: a message arriving on port b has destination address 0b000000, which matches none of the other 3 ports, so the message is sent to the out-of-group port of the switch.

Case 3: a message arriving on port c has destination address 0b111000, which matches none of the other 3 ports, so the message is sent to the out-of-group port of the switch.

Case 4: a message arriving on port c has destination address 0b111000. It matches none of the other 3 ports but does match the address of port c itself. Such a message can be understood as a loopback message, i.e., one that is sent out and received back again. Supporting loopback messages requires one additional address comparison: the destination address of a message received on a port is compared not only with the addresses of the other ports but also with the address of the receiving port itself. If it matches that port, the message is sent back through it. If this comparison logic is to be saved, or loopback messages are not supported, the message is treated like any destination that matches no other port and is forwarded to the out-of-group port of the switch. Such loopback messages are then forwarded level by level until they are discarded at the out-of-group port of the highest-level switch.
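The unicast rule and the four cases can be sketched as follows. `unicast_port`, the `loopback` flag, and the port names are hypothetical. Waiting for a busy out-of-group port and choosing among several out-of-group ports are not modelled.

```python
OUT = "out-of-group"


def unicast_port(taddr, src, paddr, pmask, loopback=False):
    """Choose the output port for a unicast message that arrived on port `src`.

    paddr maps intra-group port names to their PADDR values. The arrival port
    is compared only when loopback support is enabled.
    """
    for port, addr in paddr.items():
        if port == src and not loopback:
            continue
        if (taddr & pmask) == (addr & pmask):
            return port
    return OUT  # no intra-group match: send toward the higher level
```

Assuming a level-2 mask of `0b111100` for the example ports, the four cases behave as described:

```python
ports = {"a": 0b110000, "b": 0b110100, "c": 0b111000, "d": 0b111100}
unicast_port(0b111000, "a", ports, 0b111100)                 # case 1: "c"
unicast_port(0b000000, "b", ports, 0b111100)                 # case 2: OUT
unicast_port(0b111000, "c", ports, 0b111100)                 # case 3: OUT
unicast_port(0b111000, "c", ports, 0b111100, loopback=True)  # case 4: "c"
```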

For broadcast transmission, the switch performs no address comparison. It simply copies the message to the other intra-group ports and to the out-of-group ports.

For multicast transmission, each bit of type controls multicast at one level. The switch first replicates each bit of the received type to the width of the corresponding level's address field and then inverts it. The result is a value CMPTYPE whose width equals the address width. For example, consider a system with three levels of groups: 4 processors per level-1 group, 4 level-1 groups per level-2 group, and 4 level-2 groups in the level-3 group. Here type is 3 bits, and each bit is replicated twice and inverted:

CMPTYPE = {~type[2], ~type[2], ~type[1], ~type[1], ~type[0], ~type[0]}

The multicast address comparison is

TADDR & PMASK & CMPTYPE == GADDR & PMASK & CMPTYPE

If the two sides are equal, the destination address falls within this group, and the message is multicast in the group. Otherwise, the message is compared and forwarded as in unicast mode.
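CMPTYPE construction and the multicast hit test can be sketched as follows, assuming 2-bit fields and 3 levels as in the example. `cmptype` and `multicast_hit` are hypothetical names.

```python
def cmptype(type_bits, levels=3, field_bits=2):
    """Replicate each type bit field_bits times and invert it.

    The result is {~t[L-1] x w, ..., ~t[0] x w}, with t[L-1] most significant.
    """
    result = 0
    for level in reversed(range(levels)):
        bit = (type_bits >> level) & 1
        chunk = 0 if bit else (1 << field_bits) - 1  # inverted, replicated bit
        result = (result << field_bits) | chunk
    return result


def multicast_hit(taddr, gaddr, pmask, cmp):
    """True when the destination falls inside this switch's group."""
    return (taddr & pmask & cmp) == (gaddr & pmask & cmp)
```

`cmptype(0b001)` gives `0b111100` and `cmptype(0b011)` gives `0b110000`, which matches the values used in the multicast examples of the detailed description.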

Once a switch determines that a multicast message is to be multicast within its group, it normally changes the message's type to broadcast (all bits 1), drops the destination address, and sends the message to its intra-group ports. The copy sent to the out-of-group port, however, is not modified.

Whether a multicast message is also copied to the out-of-group port depends on the type field. For example, at a level-1 switch, type[0] controls level-1 multicast and type[1] controls level-2 multicast. Suppose the current level-1 group is to be multicast (type[0] = 1). If type[1] is also 1, the higher-level group is multicast as well, and the message is copied to the out-of-group port. If type[1] is 0, the higher-level group is not multicast, and the message is not copied to the out-of-group port.
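The copies a switch emits after a multicast hit can be sketched as follows. `multicast_actions` and the port names are hypothetical. The sketch also assumes two behaviours the text does not state: no copy goes back to the arrival port, and a message that arrived from the out-of-group port is never copied back upward.

```python
OUT = "out-of-group"


def multicast_actions(switch_level, type_bits, levels, intra_ports, src):
    """Copies emitted by a switch once the multicast comparison has hit.

    Returns a list of (port, new_type, keeps_destination) tuples. switch_level
    is 1-based, and type bit `switch_level` controls the next level up.
    """
    broadcast = (1 << levels) - 1
    # Intra-group copies become broadcasts and drop the destination address.
    actions = [(p, broadcast, False) for p in intra_ports if p != src]
    # Copy upward, unmodified, only if the next higher level is also multicast.
    higher = (type_bits >> switch_level) & 1
    if higher and src != OUT:
        actions.append((OUT, type_bits, True))
    return actions
```

At a level-1 switch with `type = 0b011`, a message from one processor produces three broadcast copies plus one unmodified upward copy. With `type = 0b001`, it produces only the three broadcast copies.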

5. Data packet format

Besides the payload, a message transmitted at the network layer must carry two elements: the destination address the message is to reach and the transmission mode. A source address may also be included as needed. The source address indicates which processor issued the data message, and the destination address indicates which processor the message is to be sent to.
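The mandatory and optional elements can be expressed as a small record type. This is only a sketch: the class `Packet` and its field names are hypothetical, and the text does not prescribe any software representation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Packet:
    """Network-layer message: destination and mode are mandatory, source optional."""
    dest_addr: int                    # processor (or group) address to reach
    type_bits: int                    # transmission mode: 0 = unicast, all ones = broadcast
    payload: List[int] = field(default_factory=list)  # data beats
    src_addr: Optional[int] = None    # issuing processor, if the design carries it
```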

6. Interface for a switch

The switch interface has two forms.

The first form is full duplex, i.e., it supports both transmission and reception (see FIG. 2). send_req and re_req are 2-bit control signals, and send_ack and re_ack are 1-bit control signals. send_data and re_data are buses that carry data. In FIG. 2, send_data and re_data are both 32 bits wide, so 32 bits of data can be transferred in each clock cycle; this data width can be changed.

On the transmitting side, send_req = 0 means the transmitter is idle. send_req = 1 means the transmitter is busy but has inserted an idle clock cycle in the middle of a packet: the packet is not yet complete, but the current data is invalid. send_req = 2 marks the start of a new packet: the current data is valid and is the first beat of the new packet. send_req = 3 means the current data is valid but is not the first beat of the packet.

send_ack = 1 means the receiving side has accepted the current data, and the next data can be sent in the next beat. send_ack = 0 means the receiving side has not accepted the current data, which must be held unchanged in the next beat.
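A cycle-by-cycle sketch of the transmitting side of this handshake follows. The function `transmit` and its `ack`/`gaps` inputs are hypothetical. Two policies are this sketch's own assumptions: acks missing beyond the end of the list count as 1, and an idle (send_req = 1) beat is inserted only after the previous beat was accepted.

```python
IDLE, GAP, FIRST, CONT = 0, 1, 2, 3  # send_req encodings from the text


def transmit(beats, ack, gaps=()):
    """Return the (send_req, send_data) pair driven in each cycle for one packet.

    ack[c] is the receiver's send_ack in cycle c. gaps lists the cycles in which
    the sender inserts an idle beat in the middle of the packet.
    """
    trace, i, cycle, stalled = [], 0, 0, False
    while i < len(beats):
        if i > 0 and cycle in gaps and not stalled:
            trace.append((GAP, None))  # still busy, but data invalid this cycle
        else:
            trace.append((FIRST if i == 0 else CONT, beats[i]))
            accepted = ack[cycle] if cycle < len(ack) else 1
            stalled = not accepted  # if not accepted, hold the same beat next cycle
            if accepted:
                i += 1
        cycle += 1
    trace.append((IDLE, None))
    return trace
```

With `transmit(["A", "B", "C"], [1, 0, 1, 1])`, beat "B" is driven twice because the receiver did not acknowledge it the first time.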

The receiving side mirrors the transmitting side in data format and function, so a transmitting end and a receiving end can be connected directly (see FIG. 3).

With this form of interface, the destination address and transmission mode of a data packet must be placed in its first beat of data so that the switch can compare addresses and determine the forwarding port. The switch also stores the destination address and transmission mode of the packet currently being transmitted so that it can forward the packet's subsequent data.
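The header latching described above can be sketched as follows. The bit layout of the first beat ({type, destination address} in the low-order bits) is purely an assumption, since the text fixes only that both must appear in the first beat. `pack_header` and `HeaderLatch` are hypothetical names.

```python
FIRST = 2  # send_req value marking the first beat of a packet


def pack_header(dest, type_bits, addr_bits=6, type_width=3):
    """Build a first beat with the assumed layout {type, destination address}."""
    if not 0 <= dest < (1 << addr_bits) or not 0 <= type_bits < (1 << type_width):
        raise ValueError("field out of range")
    return (type_bits << addr_bits) | dest


class HeaderLatch:
    """Switch-side copy of the in-flight packet's destination and mode."""

    def __init__(self, addr_bits=6, type_width=3):
        self.addr_bits, self.type_width = addr_bits, type_width
        self.dest = self.type_bits = None

    def on_beat(self, send_req, data):
        # Only the first beat carries the header; later beats reuse the latch.
        if send_req == FIRST:
            self.dest = data & ((1 << self.addr_bits) - 1)
            self.type_bits = (data >> self.addr_bits) & ((1 << self.type_width) - 1)
        return self.dest, self.type_bits
```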

The second form of switch interface is shown in FIG. 4.

The second form differs from the first in that separate address lines and transmission-mode lines carry the destination address and transmission mode outside the data packet, so they are transmitted independently. The advantage is that the destination address and transmission mode need not be carried in the data packet, and the switch need not buffer them. The disadvantage is that more wires are needed.

On the sending side, send_req becomes a 1-bit signal. send_req = 0 means the sender is idle. send_req = 1 means data is being transmitted: send_addr carries the destination address of the packet, send_type carries the transmission mode, and send_data carries the data being transmitted. send_addr is a bus as wide as the address, send_type is a bus as wide as the transmission-mode field, and send_data is a bus that carries data.

The signal connections of the receiving end and the transmitting end are also corresponding.

The technical effects and advantages of the invention are as follows:

1. The invention provides a network-layer packet routing and forwarding method for a network-on-chip that supports a layered implementation of network-on-chip communication protocols. The method keeps the number of network-on-chip wires, the chip resources occupied, and the power consumption as low as possible.

2. Several measures are taken to increase transmission speed. For example, low-level switches, such as level-1 and level-2 switches, forward data directly without buffering. High-level switches can use multiple channels to relieve communication bottlenecks, and a multidimensional network-on-chip can be built to increase transmission speed further.

Drawings

FIG. 1 is a schematic diagram of an on-chip many-core system with two levels of processor groups;

FIG. 2 is a schematic diagram of an example interface between a processor and a switch;

FIG. 3 is a schematic diagram of the connections between a processor and a switch and between two switches;

FIG. 4 is a schematic diagram of the second form of switch interface;

FIG. 5 is a schematic diagram of an on-chip many-core network system with three levels of processor groups and 64 processors;

FIG. 6 is a diagram of the level-2 and level-3 switches;

FIG. 7 is a schematic diagram of a system with a two-dimensional network-on-chip.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of protection of the present invention.

1. The on-chip multi-core or many-core structure to which this design applies is as follows:

1. In an on-chip many-core system, every processor has a unique address. A processor address is a single number whose width is determined by the total number of processors on the chip. For example, with 256 processors on a chip, the processor address is at least 8 bits wide.

2. Processors are organized in hierarchical groups. Several processor cores form a processor group called a level-1 processor group. Several level-1 processor groups form a level-2 processor group, and several level-2 processor groups form a level-3 processor group. This continues up to the whole on-chip many-core structure.

3. To support this many-core architecture, each processor core has a message interface connected to a switch in the network-on-chip. The address of a processor is in fact the address, or number, of the switch port connected to that processor. Processor addresses may or may not be modifiable; if they are, each address is stored in a port address register of the connected switch. The switch also has a register-configuration interface through which an on-chip processor or an authorized off-chip application can read and write the switch's registers.

4. Each processor group has at least one switch for data exchange within the group. The message interface of each processor is fixedly connected to one interface of the switch. Besides the interfaces connected to the processors in the group, the switch has at least one interface to the enclosing higher-level processor group. Each switch interface is a set of signals and is bidirectional, operating in either full-duplex or half-duplex mode.

5. Communication in the on-chip many-core message network can be layered. Common layers include the physical, link, network, transport, and application layers. The network layer forwards communication messages between processors. This patent describes only the routing method that the network layer uses to transmit data messages.

2. Caching of switches

3. Address representation

For example, consider an on-chip many-core system with two levels of processor groups. It has 16 processors in total, so a processor address is 4 bits wide. Each level-1 processor group contains 4 processors; the lower 2 bits of a processor address are the address field within the level-1 group, and the upper 2 bits are the level-1 group address field.

4. Transmission mode

5. Forwarding rules for switches

6. Data packet format

7. Interface for a switch

The switch interface has two forms; FIG. 2 shows the first.

FIG. 2 is an example of the interface between a processor and a switch. It is full duplex, i.e., it supports both transmission and reception. send_req and re_req are 2-bit control signals, and send_ack and re_ack are 1-bit control signals. send_data and re_data are buses that carry data. In the example of FIG. 2, send_data and re_data are both 32 bits wide, so 32 bits of data can be transferred in each clock cycle; this data width can be changed.

The receiving side mirrors the transmitting side in data format and function, so a transmitting end and a receiving end can be connected directly. FIG. 3 shows the connections between a processor and a switch and between two switches.

A second form of interface for a switch is shown in fig. 4.

On the sending side, send_req becomes a 1-bit signal. send_req = 0 means the sender is idle. send_req = 1 means data is being transmitted: send_addr carries the destination address of the packet, send_type carries the transmission mode, and send_data carries the data being transmitted. send_addr is a bus as wide as the address, send_type is a bus as wide as the transmission-mode field, and send_data is a bus that carries data.

The signals of the receiving end and the transmitting end are correspondingly connected.

8. Transmission examples

FIG. 5 shows an on-chip many-core network system with three levels of processor groups and 64 processors. Each level-1 processor group contains 4 processors, and each level-2 processor group contains 4 level-1 groups. The highest-level (level-3) processor group contains 4 level-2 groups. Each group address field is 2 bits wide, as is the address field within a level-1 group, so a processor address is 6 bits wide.

For example, P27 has an address of 0b011011 and P57 has an address of 0b111001.
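The 6-bit addresses can be decomposed with a short sketch. It assumes that processor Pn has address n and that groups are numbered consecutively. Both assumptions are consistent with P27 = 0b011011, P57 = 0b111001, and P56 lying in G1.14 and G2.3 in the examples below. `fields` and `groups` are hypothetical names.

```python
FIELD_BITS = 2                    # every field is 2 bits wide in FIG. 5
MASK = (1 << FIELD_BITS) - 1


def fields(n):
    """Split Pn's address into (level-2 group, level-1 group, within level-1 group) fields."""
    return ((n >> 2 * FIELD_BITS) & MASK, (n >> FIELD_BITS) & MASK, n & MASK)


def groups(n):
    """Global names of the level-1 and level-2 groups that contain Pn."""
    return f"G1.{n >> FIELD_BITS}", f"G2.{n >> 2 * FIELD_BITS}"
```

`fields(27)` gives `(0b01, 0b10, 0b11)`, and `groups(56)` gives `("G1.14", "G2.3")`.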

The main unicast paths in the system are the following:

1. From the source processor to a level-1 switch, then a level-2 switch, the level-3 switch, a level-2 switch, a level-1 switch, and finally the destination processor.

For example, P1 to P56. The address of P56 is 0b111000. After P1 sends the packet, it reaches the level-1 group switch S1.0. By comparison, S1.0 finds that the destination address 0b111000 matches none of its intra-group ports and sends the packet directly to the out-of-group port.

At the level-2 group switch S2.0, address comparison likewise finds that the destination address matches no intra-group port, so the packet is sent to the out-of-group port.

At the level-3 group switch S3.0, address comparison finds that the packet is to be sent to the port of level-2 processor group G2.3 and thus to the level-2 group switch S2.3 of G2.3.

The level-2 group switch S2.3 finds by address comparison that the packet is to be sent to the port of level-1 processor group G1.14 and thus to the level-1 group switch S1.14 of G1.14.

At S1.14, address comparison finds that the packet is addressed to P56, so it is sent to the port connected to P56.

2. From the source processor to a level-1 switch, then a level-2 switch, a level-1 switch, and finally the destination processor.

For example, P1 to P12. The address of P12 is 0b001100. After P1 sends the packet, it reaches the level-1 group switch S1.0. By comparison, S1.0 finds that the destination address 0b001100 matches none of its intra-group ports and sends the packet directly to the out-of-group port.

At the level-2 group switch S2.0, address comparison finds that the packet is to be sent to the port of level-1 processor group G1.3 and thus to the level-1 group switch S1.3 of G1.3.

At S1.3, address comparison finds that the packet is addressed to P12, so it is sent to the port connected to P12.

3. From the source processor to a level-1 switch and then directly to the destination processor.

For example, P1 to P3. The address of P3 is 0b000011. After P1 sends the packet, it reaches the level-1 group switch S1.0. By comparison, S1.0 finds that the destination address 0b000011 is the address of the intra-group port connected to P3, and it sends the packet directly to that port.
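All three unicast paths follow one pattern: the message climbs until the source and destination share a group and then descends. A minimal sketch, assuming a fan-out of 4 at every level and the switch naming used in these examples (Sk.j serves the j-th level-k group). `route` is a hypothetical name.

```python
def route(src, dst, fanout=4):
    """Switches visited by a unicast message from Psrc to Pdst, in order."""
    up, down = [], []
    level, s, d = 1, src, dst
    while True:
        s, d = s // fanout, d // fanout     # index of the enclosing level-k group
        up.append(f"S{level}.{s}")          # switch on the way up (or the turning point)
        if s == d:                          # common group reached: start descending
            return up + down
        down.insert(0, f"S{level}.{d}")     # matching switch on the way down
        level += 1
```

`route(1, 56)` gives `["S1.0", "S2.0", "S3.0", "S2.3", "S1.14"]`, `route(1, 12)` gives `["S1.0", "S2.0", "S1.3"]`, and `route(1, 3)` gives `["S1.0"]`.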

The main multicast cases in the system are the following:

1. Multicast within the sender's own level-1 processor group.

For example, P0 sends a multicast message to be multicast within its own level-1 group. send_type is set to 0b001 and send_addr to 0b0000XX, where XX are don't-care bits that may take any value. The level-1 group switch S1.0 detects that send_addr is within its group and that bit 0 of send_type is 1, indicating level-1 multicast. It therefore sends 3 copies of the message to the other 3 ports in the group.

When type = 0b001, the multicast address comparison at S1.0 uses CMPTYPE = 0b111100 and the level-1 PMASK = 0b111111. The group address of the level-1 group containing P0 is GADDR = 0b000000. Then TADDR & PMASK & CMPTYPE = 0b0000XX & 0b111111 & 0b111100 = 0b000000 and GADDR & PMASK & CMPTYPE = 0b000000 & 0b111111 & 0b111100 = 0b000000, so the two are equal.

Because bit 1 of send_type is 0, indicating that the level-2 group is not multicast, the message is not copied to the out-of-group port of S1.0.

2. Multicast to a different level-1 processor group within the same level-2 processor group.

For example, P1 sends a multicast message to be multicast in group G1.3. send_type is set to 0b001 and send_addr to 0b0011XX. When the message reaches the level-1 switch S1.0, address comparison finds that the destination address is not in its group, so the message is forwarded to the out-of-group port.

When the level-2 switch S2.0 receives the message, bit 1 of send_type is 0, meaning no level-2 multicast. Address comparison finds that the destination address belongs to the port connected to G1.3, so the message is forwarded to that port.

When S1.3 receives the message, it detects that send_addr is within its group and that bit 0 of send_type is 1, indicating level-1 multicast. It therefore sends 4 copies of the message to the 4 ports in the group.

3. Multicast within the sender's own level-2 processor group.

For example, P2 sends a multicast message to every processor in the level-2 group G2.0. send_type is set to 0b011 and send_addr to 0b00AAXX. Here AA must be 00, because the sender's own level-1 group is also to be multicast.

After the message reaches the level 1 switch G1.0, the address comparison finds that the destination address is in the group, and the 0 th bit of send_type is 1, which means level 1 multicast, so that 3 copies of the message can be sent to the other 3 intra-group ports in the group. Note that when replication is sent to the intra-group port, send_type is modified to 0b111, i.e., broadcast message, while the destination address is discarded.

Since bit 1 of send_type is 1, meaning that level 2 group is also multicast, the message is also copied to the out-of-group port of S1.0. Note that when copying to the out-of-group port, the send_type will not be modified and the destination address will be preserved.

After S2.0 is reached, a multicast address comparison is performed: cmptype=0b110000, 2-level group pmask=0b111100, and G2.0 is found at a 2-level group address of gaddr=0b000000, taddr & PMASK & cmptype=0b0000xx &0b111100& 0b110000=0b000000, GADDR & PMASK & cmptype=0b000000 &0b111100& 0b110000=0b 000000, and so both are equal. Multicast within the group is required. The message transmission method is changed into broadcasting and 3 copies are duplicated and sent to all the intra-group ports.

Since bit 2 of the send_type is 0, i.e., indicates that the level 3 group is not multicast, the message is not copied to the out-of-group port of S2.0.
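The copy rule applied once a switch has decided to multicast within its group can be sketched as below. It assumes the 3-level system, so the broadcast type is 0b111; the port numbering, the `"out"` label and the function name are hypothetical.

```python
from typing import List, Optional, Tuple, Union

BROADCAST = 0b111  # 3-level system: every type bit set

Copy = Tuple[Union[int, str], int, Optional[int]]  # (port, send_type, send_addr)

def multicast_copies(level: int, n_intra: int, in_port: Optional[int],
                     send_type: int, taddr: int) -> List[Copy]:
    """Copies emitted by a level `level` switch that multicasts within its group.

    in_port is the intra-group port the message arrived on, or None if it came
    in through the out-of-group port (it is then never sent back out).
    """
    copies: List[Copy] = []
    for port in range(n_intra):
        if port != in_port:
            copies.append((port, BROADCAST, None))  # in-group: broadcast, address dropped
    next_level_bit = (send_type >> level) & 1       # type[level] controls the next level up
    if in_port is not None and next_level_bit:
        copies.append(("out", send_type, taddr))    # out-of-group copy left unmodified
    return copies

# Example 3 at S1.0 (P2 on port 2, type 0b011): 3 broadcast copies plus 1 unchanged copy out.
assert len(multicast_copies(1, 4, 2, 0b011, 0b000000)) == 4
# Example 3 at S2.0 (arrived on port 0): type[2] = 0, so only 3 in-group copies.
assert len(multicast_copies(2, 4, 0, 0b011, 0b000000)) == 3
```

Dropping the address on the in-group copies means the lower-level switches treat them as plain broadcasts and need no further comparison.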

4. Multicast within another level 2 processor group.

For example, P16 sends a multicast message to every processor in the level 2 group G2.2. send_type is set to 0b011 and send_addr to 0b10XXXX, the group address of G2.2.

When the message reaches the level 1 switch S1.4, the address comparison finds that the group address of G1.4, 0b0100XX, clearly differs from send_addr; the destination address is not in the group, so the message is sent to the out-of-group port.

When the message reaches S2.1, the address comparison again finds that the destination address is not in the group, and the message is sent to the out-of-group port.

When the message reaches S3.0, the address comparison finds that the destination address lies at the interface connected to G2.2, and the message is sent to S2.2.

When the message reaches S2.2, type[1] = 1, so the multicast address comparison is performed: CMPTYPE = 0b110000, the level 2 group PMASK = 0b111100, and the level 2 group address of G2.2 is GADDR = 0b100000. TADDR & PMASK & CMPTYPE = 0b10XXXX & 0b111100 & 0b110000 = 0b100000, and GADDR & PMASK & CMPTYPE = 0b100000 & 0b111100 & 0b110000 = 0b100000, so the two are equal and the message must be multicast within the group. It is converted to a broadcast, copied 4 times and sent to all 4 intra-group ports.

Because the message arrived through the out-of-group port of S2.2, it is not sent back to that port. Even if the network on chip allowed loopback messages, type[2] = 0 here, so the message would still not be copied to the out-of-group port.

5. Multicast within a level 1 processor group that belongs to another level 2 processor group.

For example, P17 sends a multicast message to every processor in the level 1 group G1.12. send_type is set to 0b001 and send_addr to 0b1100XX, the group address of G1.12.

When the message reaches the level 1 switch S1.4, the address comparison finds that the destination address is not in the group, and the message is sent to the out-of-group port. S2.1 likewise finds that the destination address is not in its group and forwards the message to its out-of-group port.

When the message reaches S3.0, the address comparison finds that the destination address lies at the interface connected to G2.3, and the message is sent to S2.3.

When the message reaches S2.3, the address comparison finds that the destination address lies at the interface connected to G1.12, and the message is sent to S1.12.

When the message reaches S1.12, type[0] = 1, so the multicast address comparison is performed: CMPTYPE = 0b111100, the level 1 group PMASK = 0b111111, and the level 1 group address of G1.12 is GADDR = 0b110000. TADDR & PMASK & CMPTYPE = 0b1100XX & 0b111111 & 0b111100 = 0b110000, and GADDR & PMASK & CMPTYPE = 0b110000 & 0b111111 & 0b111100 = 0b110000, so the two are equal and the message must be multicast within the group. It is converted to a broadcast, copied 4 times and sent to all 4 intra-group ports.

Because the message arrived through the out-of-group port of S1.12, it is not sent back to that port.
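Examples 2, 4 and 5 can be traced end to end with the Python sketch below. It assumes the 3-level topology of these examples (S3.0 over S2.0 to S2.3, each over 4 level 1 switches, each over 4 processors), numbers processors so that Pn has address n, writes don't-care bits as 0, and assumes a level 3 PMASK of 0b110000, which the text does not state. As in the worked examples, a level k switch multicasts within its group only when type[k-1] = 1 and the multicast comparison matches; otherwise it falls back to the unicast comparison, skipping the port the message arrived on. The helper names are hypothetical, and the function stops at the first switch that multicasts, so it does not model the extra out-of-group copies of example 3.

```python
from typing import List

FIELD_BITS, LEVELS, FANOUT = 2, 3, 4
PMASK = {1: 0b111111, 2: 0b111100, 3: 0b110000}  # level 3 value is an assumption

def make_cmptype(send_type: int) -> int:
    # Replicate each type bit over its level's field and invert it.
    cmptype = 0
    for level in range(LEVELS):
        if not (send_type >> level) & 1:
            cmptype |= ((1 << FIELD_BITS) - 1) << (level * FIELD_BITS)
    return cmptype

def group_addr(level: int, index: int) -> int:
    # GADDR of the group served by S<level>.<index>; fields below it are 0.
    return index << (FIELD_BITS * level)

def port_addr(level: int, index: int, port: int) -> int:
    # PADDR of intra-group port `port`: it leads to unit index*FANOUT + port one level down.
    return (index * FANOUT + port) << (FIELD_BITS * (level - 1))

def trace(src: int, send_type: int, taddr: int) -> List[str]:
    """Switches visited by a unicast/multicast message from processor `src`.

    Stops at the first switch that multicasts within its group, at unicast
    delivery to a processor, or at the top-level out-of-group port.
    """
    level, index, in_port = 1, src // FANOUT, src % FANOUT
    path: List[str] = []
    while True:
        path.append(f"S{level}.{index}")
        pmask, cmptype = PMASK[level], make_cmptype(send_type)
        own_bit = (send_type >> (level - 1)) & 1
        if own_bit and (taddr & pmask & cmptype) == (group_addr(level, index) & pmask & cmptype):
            return path                               # multicast inside this group
        for port in range(FANOUT):                    # unicast-style comparison
            if port != in_port and (taddr & pmask) == (port_addr(level, index, port) & pmask):
                if level == 1:
                    return path                       # delivered to a processor
                level, index, in_port = level - 1, index * FANOUT + port, None
                break
        else:                                         # no intra-group match: go up
            if level == LEVELS:
                return path                           # discarded or sent off-chip
            level, index, in_port = level + 1, index // FANOUT, index % FANOUT

assert trace(17, 0b001, 0b110000) == ["S1.4", "S2.1", "S3.0", "S2.3", "S1.12"]  # example 5
```

Note that at S2.3 in example 5 the multicast comparison alone would also match (G2.3's group address has the same upper bits as G1.12), so the type[k-1] gate is what keeps S2.3 forwarding instead of broadcasting to all of G2.3.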

There is only one broadcast mode in this system. For example, P3 sends a broadcast message with send_type set to 0b111; send_addr is ignored. When the message reaches the level 1 switch S1.0, send_type = 0b111 denotes an unconditional broadcast, so S1.0 copies the message 4 times without any address comparison and sends the copies to the other 3 intra-group ports and to the out-of-group port.

On receiving the broadcast message, S2.0 broadcasts it directly: it copies the message 4 times and sends the copies to the other 3 intra-group ports and to the out-of-group port. S1.1, S1.2 and S1.3 each copy the received message 4 times and send it to the 4 intra-group ports in their group.

S3.0 likewise broadcasts the message directly, copying it 4 times and sending the copies to the other 3 intra-group ports and to the out-of-group port. S2.1, S2.2 and S2.3 each copy the received message 4 times and send it to their 4 intra-group ports. Because S3.0 is the highest-level switch, the copy sent to its out-of-group port can simply be discarded; alternatively, that port can be connected to chip pins, linking the on-chip network to off-chip resources and even allowing the on-chip networks of multiple chips to be interconnected.

Likewise, every other level 1 switch broadcasts the message directly upon receiving it.
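The broadcast walk-through can be checked with a small flood simulation. It assumes the same 3-level topology with 4 units per group and one out-of-group port per switch; each switch copies the message to every port except the one it arrived on, and the copy leaving the top-level switch is counted and then dropped. The function name is illustrative.

```python
from collections import Counter, deque
from typing import Dict, Tuple

FANOUT, LEVELS = 4, 3  # assumed: 3 levels, 4 units per group, 1 out-of-group port per switch

def broadcast(src: int) -> Tuple[Counter, Dict[str, int]]:
    """Flood a type=0b111 message from processor src.

    Returns (copies received per processor, copies made per switch).
    """
    delivered: Counter = Counter()
    copies: Dict[str, int] = {}
    # Work items: (level, switch index, arrival port or None for the out-of-group port)
    queue = deque([(1, src // FANOUT, src % FANOUT)])
    while queue:
        level, index, in_port = queue.popleft()
        sent = 0
        for port in range(FANOUT):              # every intra-group port but the arrival one
            if port == in_port:
                continue
            sent += 1
            child = index * FANOUT + port
            if level == 1:
                delivered[child] += 1           # child is a processor
            else:
                queue.append((level - 1, child, None))
        if in_port is not None:                 # arrived from below: copy out of the group
            sent += 1
            if level < LEVELS:
                queue.append((level + 1, index // FANOUT, index % FANOUT))
            # at the top level this copy is dropped here (or could go off-chip)
        copies[f"S{level}.{index}"] = sent
    return delivered, copies

received, made = broadcast(3)                   # P3 broadcasts, as in the text
assert len(received) == 63 and set(received.values()) == {1}
assert set(made.values()) == {4}                # every switch makes 4 copies
```

Never sending a copy back through the arrival port is what prevents duplicates: every processor other than the sender receives exactly one copy.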

9. Multi-channel interface to mitigate communication bottlenecks

A switch may have more than one out-of-group port and more than one intra-group port; when there are several, each port serves as 1 channel. High-level switches in particular carry heavy data traffic, and adding out-of-group or intra-group ports raises their communication speed.

As shown in fig. 6, each level 2 and level 3 switch has 2 out-of-group ports, i.e., 2 channels, and 2 level 3 switches are provided, which together increase the communication speed. When a switch has several out-of-group ports, a message destined for the out-of-group side may be sent through any one of them, so the switch only needs to find 1 free out-of-group port to transmit it.

Likewise, a switch may be connected to a given processor through several intra-group ports; in that case a message for that processor only needs to be sent through any free one of those ports.
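Choosing among equivalent channels, and the gain from a second channel, can be sketched as follows. It assumes all messages are ready at cycle 0 and each one occupies a channel for a fixed number of cycles, which the text does not specify; picking the lowest-numbered free port is an arbitrary choice, since any free port is acceptable. The function names are hypothetical.

```python
from typing import List, Optional

def pick_free_port(busy: List[bool]) -> Optional[int]:
    """Any free port among equivalent channels, or None if the message must wait."""
    for port, is_busy in enumerate(busy):
        if not is_busy:
            return port
    return None

def drain_time(n_messages: int, n_channels: int, cycles_per_message: int) -> int:
    """Cycles to send n messages through n_channels equivalent out-of-group ports."""
    free_at = [0] * n_channels  # cycle at which each channel becomes free
    for _ in range(n_messages):
        ch = min(range(n_channels), key=lambda i: free_at[i])  # earliest-free channel
        free_at[ch] += cycles_per_message
    return max(free_at)

assert pick_free_port([True, False]) == 1
assert drain_time(8, 1, 5) == 40 and drain_time(8, 2, 5) == 20
```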

10. Establishing a multidimensional network on chip to increase communication speed

To further increase the communication speed, a multi-dimensional network on chip may be established.

Fig. 7 shows a system with a 2-dimensional network on chip. Each processor has 2 interfaces, each connected to one of the two 1-dimensional networks. In fig. 7, the interface between the level 1 switch and the level 2 switch of the 2-dimensional network is also a 2-channel interface.

When a processor needs to send a message, it can use any idle interface; alternatively, messages can be classified and each class always sent and received through a fixed 1-dimensional network. Which approach is used is decided by the system or the application.
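A minimal sketch of the two interface-selection policies for a processor with one interface per 1-dimensional network follows. The class names `"control"` and `"bulk"`, the `class_map` table and the function name are hypothetical; the text leaves the classification to the system or the application.

```python
from typing import Dict, List, Optional

def choose_network(idle: List[bool], msg_class: Optional[str] = None,
                   class_map: Optional[Dict[str, int]] = None) -> Optional[int]:
    """Pick the 1-dimensional network (processor interface) for a message.

    If class_map fixes a network for msg_class, that network is returned even if it
    is busy (the caller then waits for it). Otherwise the first idle interface is
    returned, or None if every interface is busy.
    """
    if class_map is not None and msg_class in class_map:
        return class_map[msg_class]
    for dim, is_idle in enumerate(idle):
        if is_idle:
            return dim
    return None

CLASS_MAP = {"control": 0, "bulk": 1}  # hypothetical message classes
assert choose_network([False, True]) == 1                    # any idle interface
assert choose_network([True, True], "bulk", CLASS_MAP) == 1  # fixed by class
```

Fixing a class to one network keeps messages of that class in order on a single path, at the cost of sometimes waiting while the other network is idle.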

The foregoing describes only preferred embodiments of the invention and is not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A data packet routing method for an on-chip message network, comprising:

(1) Transmission mode determination

A type field is provided at the sending end and the receiving end to indicate the transmission mode; the bit width of type equals the number of levels of the highest-level processor group of the on-chip many-core system, and each bit of type provides the multicast control for one level; type = 0 denotes unicast; type[0] = 0b1 denotes multicast to every processor in the level 1 processor group containing the destination address; type[1:0] = 0b11 denotes multicast to all processors and processor groups in the level 2 processor group containing the destination address; and so on;

when the transmission mode is broadcast, no destination address is needed; when the transmission mode is level 1 processor group multicast, the intra-level-1-group address bit segment of the destination address is omitted; when the transmission mode is level 2 processor group multicast, both the level 1 processor group address bit segment and the intra-level-1-group address bit segment of the destination address are omitted; and so on;

(2) Forwarding rule determination for the switch

The switch has 1 register PMASK whose bit width equals the address bit width; PMASK masks out fields that need not be compared and is used mainly in switches of level 2 and above; for a level 1 group switch, PMASK is omitted or all of its bits are set to 1;

for the unicast transmission mode, after a message is received from one port, its destination address TADDR is compared with the address PADDR of each other intra-group port using the rule TADDR & PMASK == PADDR & PMASK; if the two are equal, the destination address matches that port and the message is sent to it; if the destination address differs from the address of every intra-group port, the message is forwarded to the out-of-group port; if the out-of-group port is occupied, the message waits; if there are several out-of-group ports, any free one is used;

for the broadcast transmission mode, the switch copies the message directly and forwards it to the other intra-group ports and the out-of-group port without address comparison;

for the multicast transmission mode, each bit of type provides the multicast control for one level; the switch first replicates each bit of the received type to the width of the address field of the corresponding level and inverts it, forming a value CMPTYPE with the same bit width as the address;

the multicast address comparison is TADDR & PMASK & CMPTYPE == GADDR & PMASK & CMPTYPE; if the two sides are equal, the destination address hits this group and the message is multicast within the group; otherwise, address comparison and forwarding proceed as in unicast mode;

if it is determined during multicast that the message is to be multicast within the group, the switch changes the type of the message to the broadcast mode, in which every bit is 1, discards its destination address and sends it to the intra-group ports of the switch; a copy sent to the out-of-group port, however, is not modified;

whether the message is copied to the out-of-group port during multicast depends on the type field: for a level 1 switch, type[0] is the level 1 multicast control and type[1] the level 2 multicast control; if the current level 1 group is to be multicast, i.e., type[0] = 1, and type[1] is also 1, the higher-level group is multicast as well and the message is copied to the out-of-group port; if type[1] is 0, the higher-level group is not multicast and the message is not copied to the out-of-group port.

2. The data packet routing method for an on-chip message network according to claim 1, wherein each port of the switch has 1 register PADDR that stores the address of that port, and the switch also has 1 register that stores the group address GADDR of its processor group.

3. The data packet routing method for an on-chip message network according to claim 1, wherein, for the unicast transmission mode, if the switch has 4 intra-group ports with addresses port A: 0b110000, port B: 0b110100, port C: 0b111000 and port D: 0b111100, the following 4 cases arise:

case 1: a message entering from port A has destination address 0b111000; comparing it with the addresses of the other 3 ports shows that it matches the address of port C, so the message is sent to port C;

case 2: a message entering from port B has destination address 0b000000, which matches none of the other 3 port addresses, so the message is sent to the out-of-group port of the switch;

case 3: a message entering from port C has destination address 0b111000, which matches none of the other 3 port addresses, so the message is sent to the out-of-group port of the switch;

case 4: a message entering from port C has destination address 0b111000, which matches none of the other 3 port addresses but does match the address of port C itself; to support loopback messages, 1 additional port address comparison is needed, i.e., the destination address of a message received from a port is compared both with the addresses of the other ports and with the address of the receiving port itself, and if it matches the receiving port, the message is sent back through that port; if loopback messages are not supported, the destination address is treated as differing from all other port addresses, the message is forwarded to the out-of-group port of the switch, and it is then forwarded level by level until it is discarded at the out-of-group port of the highest-level switch.
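The four cases of claim 3 can be reproduced with a short Python sketch of the unicast comparison. PMASK is assumed to be all ones because the claim does not give it, and the optional loopback comparison of case 4 is modeled as a flag; the function name is illustrative.

```python
from typing import List, Union

def unicast_route(taddr: int, in_port: int, paddrs: List[int],
                  pmask: int = 0b111111, loopback: bool = False) -> Union[int, str]:
    """Intra-group port whose PADDR matches TADDR under PMASK, else 'out'.

    Without loopback support the arrival port is skipped; with it, the arrival
    port's own address is compared too (the extra comparator of case 4).
    """
    for port, paddr in enumerate(paddrs):
        if port == in_port and not loopback:
            continue
        if (taddr & pmask) == (paddr & pmask):
            return port
    return "out"

A, B, C, D = range(4)
PORTS = [0b110000, 0b110100, 0b111000, 0b111100]
assert unicast_route(0b111000, A, PORTS) == C                 # case 1
assert unicast_route(0b000000, B, PORTS) == "out"             # case 2
assert unicast_route(0b111000, C, PORTS) == "out"             # case 3
assert unicast_route(0b111000, C, PORTS, loopback=True) == C  # case 4
```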

4. The data packet routing method for an on-chip message network according to claim 1, wherein, for the multicast transmission mode:

if the system has 3 levels of groups, in which each level 1 group has 4 processors, each level 2 group has 4 level 1 groups and each level 3 group has 4 level 2 groups, type is 3 bits wide; each bit of type is then replicated twice and inverted, i.e., CMPTYPE = {!type[2], !type[2], !type[1], !type[1], !type[0], !type[0]}, so that CMPTYPE has the same bit width as the address; the multicast address comparison is TADDR & PMASK & CMPTYPE == GADDR & PMASK & CMPTYPE; if the two sides are equal, the destination address hits this group and the message is multicast within the group; otherwise, address comparison and forwarding proceed as in unicast mode.

5. The data packet routing method for an on-chip message network according to claim 1, wherein there are 2 types of switch, one without an internal buffer and one with an internal buffer; the bufferless switch consists of combinational logic and passes data through directly without delay, which suits lower-level switches; the buffered switch can buffer received data, mainly to break the combinational path between sender and receiver and so prevent overly long combinational logic paths; the buffer is of 2 kinds: the first is a register-based real-time forwarding buffer whose register width equals the width of the received data, in which the data received in each clock cycle is first placed in the buffer register and then forwarded to the switch interface where the destination address is located, so that there is a delay of 1 clock cycle between receiving and resending the data; the second is a buffer that receives the whole data packet and then forwards it to the switch interface where the destination address is located, and such a buffer must be able to hold the maximum packet length; with this second kind the resend time is not fixed: sending and receiving may proceed simultaneously, or the packet may be received first and sent afterwards.
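The timing difference between the switch variants of claim 5 can be illustrated with a simple cycle-count model. It assumes one data word arrives per clock cycle starting at cycle 0 and, for the packet buffer, the receive-first option with sending starting in the cycle after the last word arrives; the mode names and function name are hypothetical.

```python
from typing import List

def output_cycles(n_words: int, mode: str) -> List[int]:
    """Cycle in which each word of an n-word packet leaves the switch.

    Words arrive one per cycle from cycle 0. Modes (names are hypothetical):
      'none'     bufferless combinational switch: a word leaves in the cycle it arrives
      'register' register-based real-time forwarding: every word is delayed 1 cycle
      'packet'   whole-packet buffer, receive-first option: sending starts in the
                 cycle after the last word has arrived
    """
    arrivals = range(n_words)
    if mode == "none":
        return list(arrivals)
    if mode == "register":
        return [t + 1 for t in arrivals]
    if mode == "packet":
        return [n_words + t for t in arrivals]
    raise ValueError(f"unknown mode: {mode!r}")

assert output_cycles(4, "register") == [1, 2, 3, 4]
```

The register variant adds a fixed one-cycle latency but cuts the combinational path at every hop; the packet buffer adds latency proportional to packet length and needs storage for the longest packet.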

6. The data packet routing method for an on-chip message network according to claim 1, wherein the address is represented as follows: a processor address consists of several fields, and in a system with N levels of processor groups the address of a processor is {level N-1 processor group address bit segment, …, level 2 processor group address bit segment, level 1 processor group address bit segment}; the bit width of each bit segment is determined by how many identical units there are at that level.
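The address layout of claim 6 can be sketched as packing one index per level, highest level first, with each field just wide enough to number the identical units at that level. For the 3-level example system (4 units at every level) this gives 6-bit addresses in which P17 is 0b010001; the helper names are illustrative.

```python
from typing import List

def field_widths(units_per_level: List[int]) -> List[int]:
    """Bits per address field: just enough to number the identical units at that level."""
    return [(n - 1).bit_length() for n in units_per_level]

def make_address(indices: List[int], units_per_level: List[int]) -> int:
    """Pack per-level indices (highest level first) into one processor address."""
    addr = 0
    for idx, n, width in zip(indices, units_per_level, field_widths(units_per_level)):
        if not 0 <= idx < n:
            raise ValueError(f"index {idx} out of range for a level with {n} units")
        addr = (addr << width) | idx
    return addr

# P17: level 2 group 1 (G2.1), level 1 group 0 within it (G1.4), processor 1.
assert make_address([1, 0, 1], [4, 4, 4]) == 0b010001
```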

7. The data packet routing method for an on-chip message network according to claim 1, wherein the switch has one or more out-of-group ports and one or more intra-group ports; when there are several, each port is 1 channel; high-level switches carry heavy data traffic, so their communication speed is increased by adding out-of-group or intra-group ports.

8. The data packet routing method for an on-chip message network according to claim 1, wherein, to further increase communication speed, a multidimensional network on chip is established; when a processor is to send a message, it either sends it through any idle interface, or messages are classified and each class is always sent and received through a fixed 1-dimensional network, as decided by the system or the application.