CN114024920A

CN114024920A - Data packet routing method for on-chip message network

Info

Publication number: CN114024920A
Application number: CN202111404105.XA
Authority: CN
Inventors: 秦晓阳; 徐培欣
Original assignee: Suzhou Blizzard Electronic Technology Co ltd
Current assignee: Suzhou Blizzard Electronic Technology Co ltd
Priority date: 2021-11-24
Filing date: 2021-11-24
Publication date: 2022-02-08
Anticipated expiration: 2041-11-24
Also published as: CN114024920B

Abstract

The invention discloses a data packet routing method for an on-chip message network that adopts several measures to improve transmission speed. For example, low-level switches, such as level 1 and level 2 switches, forward directly without caching; high-level switches can relieve communication bottlenecks by adopting a multi-channel mode; and a multi-dimensional network on chip may also be built to further increase the transmission speed. The on-chip network communication protocol is implemented in layers. By using the network layer data packet routing and forwarding method provided by the invention, the number of connecting lines of the network on chip, the chip resources occupied, and the power consumption are all reduced as far as possible.

Description

Data packet routing method for on-chip message network

Technical Field

The invention relates to the technical field of processors, in particular to a data packet routing method for an on-chip message network.

Background

The approach of continuously improving the performance of a single-core processor is limited by power consumption and the manufacturing process, and the clock frequency of a single core is close to its limit. Multi-core or many-core technology is the most effective way to improve processor performance while reducing power consumption.

However, as more and more processor cores are integrated on a single chip, how to guarantee efficient communication between the processor cores becomes an important issue.

The design of multi-core or many-core processors can be broadly divided into three categories: bus or switch interconnect designs, stream processors and graphics processors, and network-interconnected processors. Multi-core designs that use a bus or switch as the basic interconnection architecture have the following advantages: every processor core has the same memory access path, and the shared memory supports a cache coherence protocol based on bus snooping; each processor core resembles a traditional single-core processor, has powerful computing capability, and only needs to be trimmed to optimize factors such as power consumption. As a result, even a single-threaded application may see improved performance when run unchanged on a new multi-core processor.

A significant disadvantage of this architecture, however, is that the bus or switch becomes a system bottleneck, which shows up in both system performance and power consumption.

Stream processors and GPGPUs (general-purpose graphics processors) follow a large-scale data-parallel computing model. SIMD techniques, i.e. single instruction stream, multiple data streams, are often employed to organize the computation of instructions or data. The advantage of this architecture is that it efficiently supports massive data-parallel computing; the disadvantage is that the GPU can exploit this advantage only when the application exhibits a large amount of regular data parallelism. Branches in the program and data sharing among threads are weak points of the GPU, which supports them only inefficiently. Running a web server on a GPU, for instance, is essentially wishful thinking. In addition, the application must be heavily optimized to expose its parallelism on the GPU, and this optimization requires a deep understanding of both the GPU architecture and the program being optimized.

Network-interconnected processors. Neither bus and switch designs nor stream processors fundamentally change the poor scalability of multi-core and many-core processor designs. As an alternative to traditional interconnects, a network on chip has been proposed, in which the processor cores communicate with each other in a distributed manner, thereby avoiding the system performance bottleneck and the large power consumption overhead caused by a centralized interconnect design. However, as the number of cores integrated on a chip increases, how to organize this communication is not a simple matter.

Disclosure of Invention

To overcome the above-mentioned drawbacks of the prior art, the present invention provides a data packet routing method for an on-chip message network.

In order to achieve the purpose, the invention provides the following technical scheme:

A data packet routing method for an on-chip message network, comprising:

First, cache of the switch

There are 2 kinds of switch: one has no internal cache and the other has an internal cache. The switch without an internal cache consists of combinational logic and transfers data directly without delay; it is suitable for lower-level switches, such as level 1 or level 2 switches. The switch with an internal cache buffers the received data; its main purpose is to cut the combinational path between sender and receiver and prevent that path from becoming too long. There are 2 caching modes. The first is register-based real-time forwarding: the internal cache register has the same number of bits as the received data, the data received in each clock cycle is first placed in the cache register and then forwarded to the switch interface where the destination address is located, and there is a delay of 1 clock cycle between receiving and forwarding. The second is to provide a buffer memory that holds the whole data packet, which is then forwarded to the switch interface where the destination address is located. In this mode the buffer memory must be sized to fit the maximum packet length. The forwarding time is not fixed: forwarding may take place while the packet is still being received, or after reception is complete.
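
The 1-clock-cycle delay of the register-based mode can be illustrated with a minimal Python sketch. It is only a behavioral model under stated assumptions: the function name `registered_forward` is hypothetical, one list element stands for the data word of one clock cycle, and `None` stands for "no data".

```python
def registered_forward(inputs):
    """Model a switch with a single data-width cache register.

    Each word appears at the output one clock cycle after it was received.
    """
    reg = None          # the cache register, empty at reset
    outputs = []
    # One extra cycle is simulated so the last word can leave the register.
    for word in list(inputs) + [None]:
        outputs.append(reg)  # output shows what was latched in the previous cycle
        reg = word           # latch the word received in this cycle
    return outputs


print(registered_forward([0xA, 0xB, 0xC]))
```

The uncached switch corresponds to returning the inputs unchanged in the same cycle, which is why it adds no delay but lengthens the combinational path.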

Second, address representation

The processor address is composed of several fields. The address of a processor in a system with N levels of processor groups is {level N-1 processor group address field, ..., level 1 processor group address field, intra-group address field of the level 1 processor group}. The bit width of each field is determined by how many identical units exist at that level. For example, if a level 1 processor group contains at most 4 processors, the intra-group address field of the level 1 processor group is 2 bits; if a level 2 processor group contains at most 4 level 1 processor groups, the level 1 processor group address field is also 2 bits.

Consider an on-chip many-core system with a 2-level processor group hierarchy. There are 16 processors in total, so the processor address is represented by 4 bits. Each level 1 processor group has 4 processors; the low 2 bits of each processor address are the intra-group address field of the level 1 processor group, and the high 2 bits are the level 1 processor group address field.

For example, the intra-group address field of processor P1 is 0b01 and its level 1 processor group address field is 0b00, so the address of P1 is 0b0001; the intra-group address field of processor P12 is 0b00 and its level 1 processor group address field is 0b11, so the address of P12 is 0b1100.

In addition, each processor group has a group address, in which the intra-group address bits are all 0. For example, the group address of G1.1 is 0b0100 and the group address of G1.3 is 0b1100.
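
The field layout described above can be sketched in Python as follows. This is a minimal illustration, assuming every field is 2 bits wide as in the 16-processor example; the names `FIELD_BITS`, `make_address`, `split_address` and `group_address` are hypothetical and do not appear in the original text.

```python
FIELD_BITS = 2  # assumed width of every address field (4 units per level)


def make_address(fields):
    """Concatenate fields, highest level first, into one integer address."""
    addr = 0
    for f in fields:
        if not 0 <= f < (1 << FIELD_BITS):
            raise ValueError("field out of range")
        addr = (addr << FIELD_BITS) | f
    return addr


def split_address(addr, num_fields):
    """Split an address into num_fields fields, highest level first."""
    mask = (1 << FIELD_BITS) - 1
    return [(addr >> (FIELD_BITS * i)) & mask
            for i in reversed(range(num_fields))]


def group_address(addr, level):
    """Group address of the level-`level` group containing addr.

    All fields below the group's own field are cleared to 0.
    """
    low_bits = FIELD_BITS * level
    return (addr >> low_bits) << low_bits


print(bin(make_address([0b11, 0b00])))   # address of P12 in the example
print(bin(group_address(0b0111, 1)))     # group address of G1.1
```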

Note: in the numerical notation used herein, a number beginning with 0b is a binary number, and each digit following 0b is 1 binary bit.

Third, transmission mode

The transmission modes of a data message include unicast, multicast and broadcast. Unicast is one-to-one transmission, in which 1 processor in the on-chip many-core system sends data to 1 other processor. Multicast is one-to-many transmission, in which 1 processor sends the same packet to all processors in a certain processor group. Broadcast means that 1 processor sends the same packet to all processors in the on-chip many-core system.

A "type" field is provided at the transmitting and receiving ends to indicate the transmission mode. The bit width of type equals the level number of the highest-level processor group of the on-chip many-core system, and each bit of type corresponds to the multicast control of one level. For example, if the highest processor group level of an on-chip many-core system is 3, type has 3 bits. type = 0 indicates unicast; when type[0] is 0b1, the message is multicast to every processor in the level 1 processor group where the destination address is located; when type[1:0] is 0b11, the message is multicast to all processors and processor groups in the level 2 processor group where the destination address is located; and so on.

The encoding of type is illustrated with a 3-bit type. The switch at each level determines its forwarding rule from the destination address and transmission mode of the received data packet. type = 0 indicates unicast, i.e. one-to-one transmission. type = 0b111, i.e. all bits are 1, indicates broadcast. type = 0b001 indicates multicast to every processor in the level 1 processor group where the destination address is located; type = 0b011 indicates multicast to all processors and processor groups in the level 2 processor group where the destination address is located.

When the transmission mode is broadcast, no destination address is needed; when the transmission mode is multicast within a level 1 processor group, the intra-group address field of the level 1 processor group in the destination address can be omitted; when the transmission mode is multicast within a level 2 processor group, both the intra-group address field of the level 1 processor group and the level 1 processor group address field can be omitted; and so on.
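
The type encoding described above can be summarized with a short Python sketch. It assumes a 3-level system by default and treats only the values named in the text (0, a run of low-order 1 bits, or all 1s) as valid; the function name `describe_type` and the rejection of other values are assumptions made for illustration.

```python
def describe_type(type_bits, levels=3):
    """Classify a type value for a system with `levels` group levels."""
    all_ones = (1 << levels) - 1
    if not 0 <= type_bits <= all_ones:
        raise ValueError("type does not fit in the type field")
    if type_bits == 0:
        return "unicast"
    if type_bits == all_ones:
        return "broadcast"
    # The multicast values given in the text are runs of low-order 1 bits:
    # 0b001 -> level 1 group, 0b011 -> level 2 group, and so on.
    level = type_bits.bit_length()
    if type_bits != (1 << level) - 1:
        # Assumption: other bit patterns are not described, so reject them.
        raise ValueError("type value not covered by the text")
    return f"multicast within level {level} group"


for t in (0b000, 0b001, 0b011, 0b111):
    print(format(t, "03b"), describe_type(t))
```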

Fourth, forwarding rule of the switch

Each port of the switch has 1 register PADDR. PADDR is used to store the address of this port. At the same time, the switch will also set 1 register to store the group address GADDR of this processor group.

A switch also has 1 address comparison mask register PMASK. The bit width of PMASK is the same as the address bit width. PMASK masks the fields that do not need to be compared and is used mainly by switches at level 2 and above. For a level 1 group switch, PMASK may be omitted, or every bit of it may be set to 1.

For example, a level 2 switch has 4 ports, each connected to one of 4 level 1 group switches. The addresses of the 4 ports are 0b0100XX, 0b0101XX, 0b0110XX and 0b0111XX respectively, where "X" marks a don't-care bit that may be 0 or 1. The highest 2 bits of the port address are the level 2 processor group address field and the middle 2 bits are the address of each port. The lowest 2 bits would be the intra-group address field of the level 1 processor group, but because this is a level 2 switch they are not needed, so the PMASK of the level 2 switch is 0b111100. By analogy, the PMASK of a level 3 switch is 0b110000.
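
The PMASK values in this example follow a regular pattern, which the following Python sketch makes explicit. It assumes a 3-level system with 2-bit fields (a 6-bit address); the function names `pmask` and `port_matches` are hypothetical.

```python
def pmask(level, levels=3, field_bits=2):
    """Mask for a level-`level` switch.

    Keeps the fields the switch compares and clears the lower fields
    that it ignores.
    """
    width = levels * field_bits
    ignored = (level - 1) * field_bits
    return ((1 << width) - 1) & ~((1 << ignored) - 1)


def port_matches(taddr, paddr, mask):
    """Masked comparison of a destination address with a port address."""
    return (taddr & mask) == (paddr & mask)


for lvl in (1, 2, 3):
    print(lvl, format(pmask(lvl), "06b"))
```

With the level 2 mask, a destination such as 0b010111 matches the port 0b0101XX regardless of its lowest 2 bits.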

For the unicast transmission mode, after a packet is received from one port, its destination address TADDR is compared with the addresses PADDR of the other ports in the group. The comparison rule is TADDR & PMASK == PADDR & PMASK. If the two sides are equal, the destination address matches the address of that port.

If the destination address matches the address of a port, the packet is forwarded to that port; if it matches none of the ports in the group, the packet is forwarded to the out-of-group port. If the out-of-group port is occupied, the packet waits. If there are several out-of-group ports, any free one is chosen and the packet is sent out through it.

For example, a switch has 4 intra-group ports with the following addresses: port A: 0b110000, port B: 0b110100, port C: 0b111000, port D: 0b111100.

Case 1: a message arriving from port A has destination address 0b111000. The destination address is compared with the other 3 ports and found to match the address of port C, so the message is sent to port C.

Case 2: a message arriving from port B has destination address 0b000000. The destination address matches none of the other 3 ports, so the message is sent to the out-of-group port of the switch.

Case 3: a message arriving from port C has destination address 0b111000. The destination address matches none of the other 3 ports, so the message is sent to the out-of-group port of the switch.

Case 4: a message arriving from port C has destination address 0b111000. The destination address matches none of the other 3 ports but does match the address of port C itself. This can be understood as a loopback message, i.e. one that is sent out and then received back. Supporting loopback messages requires 1 additional port address comparison: the destination address of a message received from a port is compared not only with the addresses of the other ports but also with the address of the receiving port itself. If it matches the receiving port, the message is sent back through that port. If, on the other hand, the extra comparison logic is to be saved, or loopback messages are not to be supported, the destination address matches none of the other ports and the message is forwarded to the out-of-group port of the switch. In that case the loopback message is forwarded upward level by level until it is discarded at the out-of-group port of the highest-level switch.
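
The four cases can be reproduced with a minimal Python sketch of the unicast forwarding decision. The text does not state the PMASK of this example switch; the sketch assumes 0b111100, since the four port addresses differ only in bits [3:2]. The names `forward_unicast`, `OUT_OF_GROUP` and the `loopback` flag are hypothetical.

```python
OUT_OF_GROUP = "out-of-group"

PORTS = {"A": 0b110000, "B": 0b110100, "C": 0b111000, "D": 0b111100}
PMASK = 0b111100  # assumed mask; the low 2 bits are not compared


def forward_unicast(ports, mask, in_port, taddr, loopback=False):
    """Return the name of the port a unicast packet is forwarded to.

    ports maps each intra-group port name to its PADDR.
    """
    for name, paddr in ports.items():
        if name == in_port and not loopback:
            continue  # the receiving port is compared only if loopback is supported
        if (taddr & mask) == (paddr & mask):
            return name
    return OUT_OF_GROUP  # no intra-group port matched


print(forward_unicast(PORTS, PMASK, "A", 0b111000))                 # Case 1
print(forward_unicast(PORTS, PMASK, "B", 0b000000))                 # Case 2
print(forward_unicast(PORTS, PMASK, "C", 0b111000))                 # Case 3
print(forward_unicast(PORTS, PMASK, "C", 0b111000, loopback=True))  # Case 4
```

Waiting for a busy out-of-group port and choosing among several out-of-group ports are left out of this sketch.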

For the broadcast transmission mode, the switch does not compare addresses; it directly copies the message and forwards it to the other in-group ports and to the out-of-group port.

For the multicast transmission mode, each bit of type corresponds to the multicast control of one level. The switch first replicates each bit of the received type to the width of the address field of the corresponding level and inverts it bitwise, forming a number CMPTYPE with the same number of bits as the address. For example, suppose a system has 3 levels of groups, with 4 processors in a level 1 group, 4 level 1 groups in a level 2 group, and 4 level 2 groups in the level 3 group. Then type in this system has 3 bits, and each bit of type is replicated 2 times and inverted, i.e. CMPTYPE = {!type[2], !type[2], !type[1], !type[1], !type[0], !type[0]}. CMPTYPE has the same bit width as the address. In this case, the multicast address comparison is TADDR & PMASK & CMPTYPE == GADDR & PMASK & CMPTYPE. If the two sides are equal, the destination address hits this group and the message must be multicast within the group. If not, address comparison and forwarding proceed as in unicast mode.
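
The construction of CMPTYPE and the multicast comparison can be sketched in Python as follows. The sketch assumes the 3-level system with 2-bit fields used in the example; the function names `cmptype` and `multicast_hit` are hypothetical.

```python
def cmptype(type_bits, levels=3, field_bits=2):
    """Replicate each type bit across its address field, then invert it."""
    out = 0
    for level in reversed(range(levels)):  # highest field first
        bit = (type_bits >> level) & 1
        # A type bit of 1 becomes an all-0 field, a type bit of 0 an all-1 field.
        field = 0 if bit else (1 << field_bits) - 1
        out = (out << field_bits) | field
    return out


def multicast_hit(taddr, gaddr, pmask, type_bits):
    """True if TADDR & PMASK & CMPTYPE == GADDR & PMASK & CMPTYPE."""
    c = cmptype(type_bits)
    return (taddr & pmask & c) == (gaddr & pmask & c)


print(format(cmptype(0b001), "06b"))
print(format(cmptype(0b011), "06b"))
```

Fields whose type bit is 1 are removed from the comparison, so every address inside the multicast group compares equal to the group address.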

If in-group multicast is determined, the switch changes the type of the message to broadcast mode, with every bit set to 1, discards the destination address of the message, and then sends the message to the in-group ports of the switch. The copy sent to the out-of-group port, however, is not modified.

Whether the message is also copied to the out-of-group port during multicast depends on the type field. For example, suppose the current switch is a level 1 switch, type[0] is the level 1 multicast control and type[1] is the level 2 multicast control. If multicast is determined for the current level 1 group, i.e. type[0] is 1, and type[1] is also 1, the higher-level group is multicast as well, and the message is copied to the out-of-group port; if type[1] is 0, the higher-level group is not multicast, and the message is not copied to the out-of-group port.
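
The handling of a multicast hit can be sketched in Python as follows. The sketch assumes a 3-level system and represents a message header as a (type, address) tuple, with `None` standing for a discarded address; the function name `multicast_copies` and this representation are hypothetical.

```python
def multicast_copies(type_bits, taddr, switch_level, levels=3):
    """Headers produced by a switch at which the multicast comparison hit.

    Returns (in_group_header, out_header). out_header is None when the
    message is not copied to the out-of-group port.
    """
    broadcast = (1 << levels) - 1
    # In-group copies: type rewritten to all 1s, destination address dropped.
    in_group = (broadcast, None)
    # type[switch_level] is the multicast control of the next higher level.
    goes_up = ((type_bits >> switch_level) & 1) == 1
    # The out-of-group copy, if any, keeps the original type and address.
    out = (type_bits, taddr) if goes_up else None
    return in_group, out


print(multicast_copies(0b001, 0b000000, switch_level=1))
print(multicast_copies(0b011, 0b000000, switch_level=1))
```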

Fifth, data packet format

Besides the data payload, a message transmitted by the network layer must carry the destination address it is to reach; these are the 2 required elements. A source address element may also be included as needed. The source address indicates which processor sent the data message, and the destination address indicates which processor the data message is intended for.

Sixth, interface of the switch

The interface of the switch has 2 forms.

The first form is a full-duplex mode, i.e. it supports transmission and reception at the same time, as shown in fig. 2. send_req and rec_req are 2 lines each, and send_ack and rec_ack are 1 line each; these are the control signals. send_data and rec_data are each a set of lines for transmitting data. In fig. 2, 32 bits of data can be transferred in 1 clock cycle because both send_data and rec_data are 32 bits wide. The data bit width can be changed.

For the sender, send_req = 0 indicates that the sender is idle. send_req = 1 indicates that the sender is busy: an idle clock cycle has been inserted in the middle of a packet transmission, so the packet has not been completely transmitted but the current data is invalid. send_req = 2 indicates the start of a new packet: the current data is valid and is the first beat of the new packet. send_req = 3 indicates that the current data is valid but is not the first beat of the packet.

When send_ack is 1, the current data has been received by the receiver, and the next data can be transmitted in the next beat. When send_ack is 0, the current data has not been received by the receiver, and the sender must hold the data unchanged in the next beat.
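
The send_req encoding and the send_ack handshake can be illustrated with a small Python model of the receiving side. This is a sketch under stated assumptions: each beat is given as a (send_req, send_data, send_ack) tuple, the names `SendReq` and `collect_packets` are hypothetical, and because the text does not say how the end of a packet is signaled, the sketch simply opens a new packet at every start beat.

```python
from enum import IntEnum


class SendReq(IntEnum):
    IDLE = 0          # sender idle
    BUSY_INVALID = 1  # idle cycle inside a packet, current data invalid
    START = 2         # valid data, first beat of a new packet
    DATA = 3          # valid data, not the first beat


def collect_packets(beats):
    """Group accepted data words into packets.

    beats is an iterable of (send_req, send_data, send_ack) per clock cycle.
    """
    packets = []
    current = None
    for req, data, ack in beats:
        req = SendReq(req)
        if req in (SendReq.IDLE, SendReq.BUSY_INVALID) or not ack:
            # No valid data, or the receiver has not accepted it yet;
            # in the latter case the sender holds the same data next beat.
            continue
        if req is SendReq.START:
            current = [data]
            packets.append(current)
        elif current is not None:
            current.append(data)
    return packets


beats = [(0, 0, 0), (2, 0xA, 1), (1, 0, 0), (3, 0xB, 0), (3, 0xB, 1),
         (0, 0, 0), (2, 0xC, 1)]
print(collect_packets(beats))
```

In the sample sequence the word 0xB is presented twice because send_ack was 0 on its first beat, and it is counted once.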

The data format and function of the receiving end correspond exactly to those of the transmitting end, so a transmitting end and a receiving end can be connected directly, as shown in fig. 3.

When this form of interface is used, the destination address and transmission mode of a data packet must be placed in the first beat of the packet, so that the switch can compare addresses and determine the forwarding port. The switch also stores the destination address and transmission mode of the packet currently being transmitted, for use in forwarding the subsequent data of that packet.

The second form of interface of the switch is shown in fig. 4:

The difference from the first form is that separate address lines and transmission mode lines are added, and the destination address and transmission mode are taken out of the data packet and transmitted separately. The advantage is that the destination address and transmission mode need not be placed in the packet, and the switch needs no buffer for storing them; the disadvantage is that the number of transmission lines increases.

For the sender, send_req becomes 1 line. send_req = 0 indicates that the sender is idle. send_req = 1 indicates that data is being transmitted; at this time send_addr carries the destination address of the packet, send_type carries the transmission mode, and send_data carries the data to be transmitted. send_addr is a set of lines whose bit width equals the address bit width. send_type is also a set of lines, whose bit width equals the bit width of the transmission type. send_data is likewise a set of lines used to transfer data.

The signals of the receiving end and the transmitting end are connected correspondingly.

The invention has the technical effects and advantages that:

1. The invention provides a network layer data packet routing and forwarding method for a network on chip, which supports a layered implementation of the on-chip network communication protocol. By using the network layer data packet routing and forwarding method provided by the invention, the number of connecting lines of the network on chip, the chip resources occupied, and the power consumption are all reduced as far as possible.

2. Several measures are taken to increase the transmission speed. For example, low-level switches, such as level 1 and level 2 switches, forward directly without caching; high-level switches can relieve communication bottlenecks by adopting a multi-channel mode; and a multi-dimensional network on chip may also be built to further increase the transmission speed.

Drawings

FIG. 1 is a schematic diagram of an on-chip many-core system with a 2-level processor group hierarchy;

FIG. 2 is a schematic diagram of one example of an interface of a processor and a switch;

FIG. 3 is a schematic diagram of the connection between a processor and a switch, and between switches;

FIG. 4 is a schematic diagram of a second form of interface for a switch;

FIG. 5 is a schematic diagram of a 64-processor on-chip many-core network system with a 3-level processor group hierarchy;

FIG. 6 is a relationship diagram for each level 2 switch and level 3 switch;

fig. 7 is a schematic diagram of a system with a 2-dimensional network on chip.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

First, the on-chip multi-core or many-core structure to which this design scheme applies is as follows:

1. In the on-chip many-core system, each processor has 1 unique address, and no two processors share an address. The address of a processor is a number whose bit width is determined by the total number of processors in the on-chip many-core system. For example, if there are 256 processors on a chip, the processor address is at least 8 bits wide.

2. The processors are arranged in hierarchical groups. That is, several processor cores form a processor group called a level 1 processor group. Several level 1 processor groups form a new processor group called a level 2 processor group. Several level 2 processor groups form a level 3 processor group, and so on, to form the on-chip many-core structure.

3. Each processor core has 1 message interface supporting the many-core structure, used to connect to a switch in the network on chip. The address of a processor is in fact the address or number of the switch port to which it is connected. The processor address may be modifiable or fixed. If it is modifiable, the address is stored in a port address register of the connected switch. In that case the switch must provide an interface through which its registers can be configured, so that an on-chip processor or an off-chip application with the necessary authority can read and write the switch registers.

4. There is at least 1 switch for data exchange within each processor group. The message interface of each processor is directly and permanently connected to one interface of the switch. In addition to the interfaces to each processor in the group, the switch has at least 1 interface to the external, higher-level processor group. Each interface of the switch is a set of signals, and each interface is bidirectional, in either full-duplex or half-duplex mode.

5. Communication in the on-chip many-core message network may be layered; common layers include a physical layer, a link layer, a network layer, a transport layer, and an application layer. The network layer forwards communication messages among the processors. This patent describes only the routing method used by the network layer to transmit data messages.

Second, cache of the switch

Third, address representation

FIG. 1 shows an on-chip many-core system with a 2-level processor group hierarchy. There are 16 processors in total, so the processor address is represented by 4 bits. Each level 1 processor group has 4 processors; the low 2 bits of each processor address are the intra-group address field of the level 1 processor group, and the high 2 bits are the level 1 processor group address field.

Fourth, transmission mode

Fifth, forwarding rule of the switch

Sixth, data packet format

Seventh, interface of the switch

The interface of the switch has 2 forms; fig. 2 shows the first form:

Fig. 2 is an example of the interface between a processor and a switch. This is a full-duplex mode, i.e. it supports transmission and reception at the same time. send_req and rec_req are 2 lines each, and send_ack and rec_ack are 1 line each; these are the control signals. send_data and rec_data are each a set of lines for transmitting data. In the example of fig. 2, 32 bits of data can be transferred in 1 clock cycle because both send_data and rec_data are 32 bits wide. The data bit width can be changed.

The data format and function of the receiving end correspond exactly to those of the transmitting end, so a transmitting end and a receiving end can be connected directly. Fig. 3 shows the connection between a processor and a switch, and between switches.

A second form of interface for the switch is shown in figure 4.

For the sender, send_req becomes 1 line. send_req = 0 indicates that the sender is idle. send_req = 1 indicates that data is being transmitted; at this time send_addr carries the destination address of the packet, send_type carries the transmission mode, and send_data carries the data to be transmitted. send_addr is a set of lines whose bit width equals the address bit width. send_type is also a set of lines, whose bit width equals the bit width of the transmission type. send_data is likewise a set of lines used to transfer data.

The signals of the receiving end and the transmitting end are connected correspondingly.

Eighth, transmission examples

FIG. 5 shows a 64-processor on-chip many-core network with a 3-level processor group hierarchy. There are 4 processors in each level 1 processor group, 4 level 1 processor groups in each level 2 processor group, and 4 level 2 processor groups in the highest, level 3 processor group. The group address field at each level is 2 bits wide, and the intra-group address field of the level 1 processor group is also 2 bits, so a processor address is 6 bits.

For example, the address of P27 is 0b011011, and the address of P57 is 0b111001.

Unicast communication in this system mainly takes the following forms:

1. Processor to level 1 switch, to level 2 switch, to level 3 switch, to level 2 switch, to level 1 switch, and then to the destination processor.

For example, P1 to P56. The address of P56 is 0b111000, and P1 sends the packet to the level 1 group switch S1.0. S1.0 compares addresses, finds that the destination address 0b111000 matches no port in its group, and sends the packet directly to the out-of-group port.

When the packet arrives at the level 2 group switch S2.0, S2.0 likewise finds by address comparison that the destination address matches no port in its group, and sends the packet to the out-of-group port.

When the packet arrives at the level 3 group switch S3.0, S3.0 finds by address comparison that the packet should go to the port of the level 2 processor group G2.3, and sends it to the level 2 group switch S2.3 of G2.3.

The level 2 group switch S2.3 finds by address comparison that the packet should go to the port of the level 1 processor group G1.14, and sends it to the level 1 group switch S1.14 of G1.14.

When the packet arrives at S1.14, address comparison shows that it is destined for P56, and it is sent to the port connected to P56.

2. Processor to level 1 switch, to level 2 switch, to level 1 switch, and then to the destination processor.

For example, P1 to P12. The address of P12 is 0b001100, and P1 sends the packet to the level 1 group switch S1.0. S1.0 compares addresses, finds that the destination address 0b001100 matches no port in its group, and sends the packet directly to the out-of-group port.

When the packet arrives at the level 2 group switch S2.0, S2.0 finds by address comparison that the packet should go to the port of the level 1 processor group G1.3, and sends it to the level 1 group switch S1.3 of G1.3.

When the packet arrives at S1.3, address comparison shows that it is destined for P12, and it is sent to the port connected to P12.

3. Processor to level 1 switch, and then to the destination processor.

For example, P1 to P3. The address of P3 is 0b000011, and P1 sends the packet to the level 1 group switch S1.0. S1.0 compares addresses, finds that the destination address 0b000011 is the address of the in-group port connected to P3, and sends the packet directly to that port.
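
The three unicast paths above can be reproduced with a short Python sketch that lists the switches a packet visits. It assumes the 64-processor system of FIG. 5 with 2-bit fields, and it assumes that a switch named S<level>.<index> serves the group whose address bits equal <index>, as the names in the examples suggest; the function name `unicast_path` is hypothetical.

```python
def unicast_path(src, dst, levels=3, field_bits=2):
    """List the switches visited by a unicast packet from src to dst."""
    top = 1
    # Climb until source and destination fall inside the same group.
    while top < levels and (src >> (field_bits * top)) != (dst >> (field_bits * top)):
        top += 1
    # Upward leg: switches of the groups containing the source.
    up = [f"S{lvl}.{src >> (field_bits * lvl)}" for lvl in range(1, top + 1)]
    # Downward leg: switches of the groups containing the destination.
    down = [f"S{lvl}.{dst >> (field_bits * lvl)}" for lvl in range(top - 1, 0, -1)]
    return up + down


print(unicast_path(1, 56))  # P1 to P56
print(unicast_path(1, 12))  # P1 to P12
print(unicast_path(1, 3))   # P1 to P3
```

The sketch only names the switches on the path; each individual hop is decided by the masked address comparison described in the forwarding rule.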

Multicast communication in this system mainly takes the following forms:

1. Multicast within the sender's own level 1 processor group.

For example, P0 sends a multicast message to be multicast within its own level 1 group. send_type is set to 0b001 and send_addr is set to 0b0000XX, where XX is a don't-care field that may take any value or may be absent. The level 1 group switch S1.0 detects that send_addr is an address within the group and that bit 0 of send_type is 1, indicating level 1 group multicast, and then copies the message 3 times to the other 3 ports in the group.

When type is 0b001, the multicast address comparison proceeds as follows. CMPTYPE = 0b111100, the level 1 group PMASK = 0b111111, and the address of the level 1 group containing P0 is GADDR = 0b000000. Then:

TADDR & PMASK & CMPTYPE = 0b0000XX & 0b111111 & 0b111100 = 0b000000

GADDR & PMASK & CMPTYPE = 0b000000 & 0b111111 & 0b111100 = 0b000000

The two results are equal, so the destination address hits this group.

Since bit 1 of send_type is 0, indicating that the level 2 group is not multicast, the message is not copied to the out-of-group port of S1.0.

2. Multicast to a different level 1 processor group within the same level 2 processor group.

For example, P1 sends a multicast message to be multicast within group G1.3. send_type is set to 0b001 and send_addr is set to 0b0011XX. After the message reaches the level 1 switch S1.0, address comparison shows that the destination address is not in the group, and the message is forwarded to the out-of-group port.

After the level 2 switch S2.0 receives the packet, since bit 1 of send_type is 0, indicating that the level 2 group is not multicast, it compares addresses as in unicast, finds that the destination address belongs to the port connected to G1.3, and forwards the packet to that port.

After the level 1 switch S1.3 receives the message, it detects that send_addr is an address within the group and that bit 0 of send_type is 1, indicating level 1 group multicast, and then sends 4 copies of the message to the 4 ports in the group.

3. Multicast within the present level 2 processor group.

For example, P2 sends a multicast message to be multicast to every processor in the level 2 group G2.0. send_type is set to 0b011 and send_addr is set to 0b00AAXX. Since the sender's own level 1 group is multicast at the same time, AA must be 00 here.

After the message reaches the level 1 switch of G1.0, address comparison finds that the destination address is in the group, and bit 0 of send_type is 1, which indicates level 1 group multicast, so 3 copies of the message are sent to the other 3 ports in the group. Note that in the copies sent to the intra-group ports, send_type is changed to 0b111, i.e. the message becomes a broadcast, and the destination address is discarded.

Since bit 1 of send_type is 1, indicating that the level 2 group is also multicast, the message is also copied to the out-of-group port of S1.0. Note that in the copy sent to the out-of-group port, send_type is not modified and the destination address is preserved.

After the message reaches S2.0, the multicast address comparison is performed: CMPTYPE = 0b110000, the level 2 group PMASK = 0b111100, and the address of level 2 group G2.0 is GADDR = 0b000000. TADDR & PMASK & CMPTYPE = 0b0000XX & 0b111100 & 0b110000 = 0b000000, and GADDR & PMASK & CMPTYPE = 0b000000 & 0b111100 & 0b110000 = 0b000000, so the two are equal and the message is multicast within this group. The transmission mode of the message is then changed to broadcast, and 3 copies are sent to the other 3 ports in the group.

Since bit 2 of send_type is 0, indicating that the level 3 group is not multicast, the message is not copied to the out-of-group port of S2.0.
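The copy rules used in cases 1 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the circuit of the invention: it assumes 3 levels, 4 intra-group ports per switch and 2 address bits per level, and the function `multicast_copies`, the label `OUTER` and the tuple layout are hypothetical names introduced here. When the comparison misses, the function returns an empty list, and the message would instead be forwarded by the unicast address comparison.

```python
from typing import List, Optional, Tuple, Union

LEVELS = 3                      # assumed: 3 group levels
BITS_PER_LEVEL = 2              # assumed: 2 address bits per level
FANOUT = 4                      # assumed: 4 intra-group ports per switch
BROADCAST = (1 << LEVELS) - 1   # send_type with every bit set (0b111)
OUTER = "outer"                 # hypothetical label for the out-of-group port

Port = Union[int, str]
Copy = Tuple[Port, int, Optional[int]]  # (output port, send_type, send_addr)


def cmptype(send_type: int) -> int:
    """Replicate each type bit to its address field width and invert it."""
    field = (1 << BITS_PER_LEVEL) - 1
    return sum(field << (lvl * BITS_PER_LEVEL)
               for lvl in range(LEVELS) if not (send_type >> lvl) & 1)


def multicast_copies(level: int, in_port: Port, send_type: int,
                     send_addr: int, gaddr: int, pmask: int) -> List[Copy]:
    """Copies emitted by a level-`level` switch on a multicast hit, else []."""
    if not (send_type >> (level - 1)) & 1:
        return []                                   # this level is not multicast
    c = cmptype(send_type)
    if (send_addr & pmask & c) != (gaddr & pmask & c):
        return []                                   # destination is another group
    # Intra-group copies become broadcasts and lose their destination address.
    copies: List[Copy] = [(p, BROADCAST, None)
                          for p in range(FANOUT) if p != in_port]
    # The out-of-group copy is unmodified and only made if the next level is multicast.
    if level < LEVELS and (send_type >> level) & 1 and in_port != OUTER:
        copies.append((OUTER, send_type, send_addr))
    return copies


# Case 3: P2 (intra-group port 2 of S1.0) multicasts to level 2 group G2.0.
at_s1 = multicast_copies(1, 2, 0b011, 0b000000, gaddr=0b000000, pmask=0b111111)
# S2.0 receives the unmodified copy on the port connected to S1.0 (port 0).
at_s2 = multicast_copies(2, 0, 0b011, 0b000000, gaddr=0b000000, pmask=0b111100)
print(at_s1)  # [(0, 7, None), (1, 7, None), (3, 7, None), ('outer', 3, 0)]
print(at_s2)  # [(1, 7, None), (2, 7, None), (3, 7, None)]
```

The first result contains the 3 broadcast copies plus the unmodified out-of-group copy; the second contains only the 3 intra-group copies, because bit 2 of send_type is 0.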

4. Multicast within another level 2 processor group.

For example, P16 sends a multicast message to be multicast to every processor in the level 2 group G2.2. send_type is set to 0b011, and send_addr is set to the group address of G2.2, 0b10XXXX.

After the message reaches the level 1 switch S1.4, address comparison shows that the group address of G1.4 is 0b0100XX, which clearly differs from send_addr; the destination address is therefore not in the group, and the message is sent to the out-of-group port.

After the message reaches S2.1, address comparison again finds that the destination address is not in the group, and the message is sent to the out-of-group port.

After the message reaches S3.0, address comparison finds that the destination address lies on the interface connected to G2.2, and the message is sent to S2.2.

After the message reaches S2.2, since type[1] is 1, the multicast address comparison is performed: CMPTYPE = 0b110000, the level 2 group PMASK = 0b111100, and the address of level 2 group G2.2 is GADDR = 0b100000. TADDR & PMASK & CMPTYPE = 0b10XXXX & 0b111100 & 0b110000 = 0b100000, and GADDR & PMASK & CMPTYPE = 0b100000 & 0b111100 & 0b110000 = 0b100000, so the two are equal and the message is multicast within this group. The transmission mode of the message is then changed to broadcast, and 4 copies are made and sent to all the ports in the group.

Since the message arrived through the out-of-group port of S2.2, it is not sent back to that port. Even if the network on chip allowed loopback messages, type[2] is 0, so the message would not be copied to the out-of-group port.

5. Multicast to a level 1 processor group within another level 2 processor group.

For example, P17 sends a multicast message to be multicast to every processor in the level 1 group G1.12. send_type is set to 0b001, and send_addr is set to the group address of G1.12, 0b1100XX.

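After the message reaches the level 1 switch S1.4, address comparison finds that the destination address is not in the group, and the message is sent to the out-of-group port.

After the message reaches S2.1, address comparison likewise finds that the destination address is not in the group, and the message is sent to the out-of-group port.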

After the message reaches S3.0, address comparison finds that the destination address lies on the interface connected to G2.3, and the message is sent to S2.3.

After the message reaches S2.3, address comparison finds that the destination address lies on the interface connected to G1.12, and the message is sent to S1.12.

After the message reaches S1.12, since type[0] is 1, the multicast address comparison is performed: CMPTYPE = 0b111100, the level 1 group PMASK = 0b111111, and the address of level 1 group G1.12 is GADDR = 0b110000. TADDR & PMASK & CMPTYPE = 0b1100XX & 0b111111 & 0b111100 = 0b110000, and GADDR & PMASK & CMPTYPE = 0b110000 & 0b111111 & 0b111100 = 0b110000, so the two are equal and the message is multicast within this group. The transmission mode of the message is then changed to broadcast, and 4 copies are made and sent to all the ports in the group.

Since the message arrived through the out-of-group port of S1.12, it is not sent back to that port.
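The switch sequences traversed in cases 2, 4 and 5 can be derived from the addresses alone. The sketch below is only an illustration: it assumes that processor Pn has address n and that switch Sk.m serves the group whose address prefix is m, which matches the group addresses quoted above; `switch_name` and `switch_path` are hypothetical helper names. It traces only the route to the target group's switch and ignores copies made along the way.

```python
from typing import List

LEVELS = 3          # assumed: 3 group levels
BITS_PER_LEVEL = 2  # assumed: 2 address bits per level


def switch_name(level: int, addr: int) -> str:
    """Name of the level-`level` switch whose group contains address `addr`."""
    return f"S{level}.{addr >> (BITS_PER_LEVEL * level)}"


def switch_path(src_addr: int, dst_addr: int, dst_level: int) -> List[str]:
    """Switches visited from a processor to the level-`dst_level` target switch."""
    level = 1
    path = [switch_name(level, src_addr)]
    # Climb through out-of-group ports until the destination is inside the group.
    while level < LEVELS:
        shift = BITS_PER_LEVEL * level
        in_group = (src_addr >> shift) == (dst_addr >> shift)
        if in_group and level >= dst_level:
            break
        level += 1
        path.append(switch_name(level, src_addr))
    # Descend through intra-group ports toward the target group's switch.
    while level > dst_level:
        level -= 1
        path.append(switch_name(level, dst_addr))
    return path


print(switch_path(1, 0b001100, 1))   # case 2: ['S1.0', 'S2.0', 'S1.3']
print(switch_path(16, 0b100000, 2))  # case 4: ['S1.4', 'S2.1', 'S3.0', 'S2.2']
print(switch_path(17, 0b110000, 1))  # case 5: ['S1.4', 'S2.1', 'S3.0', 'S2.3', 'S1.12']
```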

There is only one broadcast mode in this system. For example, P3 sends a broadcast message. send_type is set to 0b111, and send_addr can be ignored. After the message arrives at the level 1 switch S1.0, because send_type is 0b111, which denotes unconditional broadcast, S1.0 makes 4 copies directly without any address comparison and sends them to the other 3 intra-group ports and the out-of-group port.

After S2.0 receives the broadcast message, it also broadcasts directly: it makes 4 copies of the message and sends them to the other 3 intra-group ports and the out-of-group port. After S1.1, S1.2 and S1.3 receive the message, they likewise broadcast directly, making 4 copies and sending them to the 4 intra-group ports of their own groups.

After S3.0 receives the broadcast message, it also broadcasts directly: it makes 4 copies of the message and sends them to the other 3 intra-group ports and the out-of-group port. After S2.1, S2.2 and S2.3 receive the message, they likewise broadcast directly, making 4 copies and sending them to the 4 intra-group ports of their own groups. Because S3.0 is the highest-level switch, the message sent to its out-of-group port can simply be discarded. Alternatively, that port can be connected to chip pins, which connects the network on chip to off-chip resources and can even interconnect the networks on chip of multiple chips.

Similarly, the other level 1 switches broadcast directly when they receive the broadcast message.
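The broadcast procedure above can be checked with a small simulation. It is a sketch under the same assumptions as the example system (3 levels, 4 intra-group ports per switch, 64 processors numbered by address); the function `broadcast` and its helpers are hypothetical names. A switch that receives the message on an intra-group port copies it to its other intra-group ports and to its out-of-group port; a switch that receives it on its out-of-group port copies it to all intra-group ports.

```python
from collections import Counter

LEVELS = 3   # assumed: 3 group levels
FANOUT = 4   # assumed: 4 intra-group ports per switch


def broadcast(src: int) -> Counter:
    """Flood a broadcast from processor `src`; count copies seen per processor."""
    received: Counter = Counter()

    def deliver(level: int, child: int) -> None:
        # A level 1 switch delivers to a processor; higher levels pass it down.
        if level == 1:
            received[child] += 1
        else:
            down(level - 1, child)

    def down(level: int, index: int) -> None:
        # Message arrived on the out-of-group port: copy to every intra-group port.
        for child in range(index * FANOUT, (index + 1) * FANOUT):
            deliver(level, child)

    def up(level: int, index: int, from_child: int) -> None:
        # Message arrived on an intra-group port: copy to the other intra-group ports...
        for child in range(index * FANOUT, (index + 1) * FANOUT):
            if child != from_child:
                deliver(level, child)
        # ...and to the out-of-group port (discarded at the highest level).
        if level < LEVELS:
            up(level + 1, index // FANOUT, index)

    up(1, src // FANOUT, src)
    return received


copies = broadcast(3)
print(len(copies), set(copies.values()))  # 63 {1}
```

Under these assumptions every processor other than the sender receives exactly one copy, and no copy returns to P3.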

Nine, multi-channel interface to alleviate communication bottlenecks

A switch may have more than one out-of-group port and more than one intra-group port. When there are multiple interfaces, each interface is one channel. High-level switches in particular carry heavy data traffic, and several out-of-group or intra-group ports can be added to them to improve the communication speed.

As shown in fig. 6, the level 2 switches and the level 3 switches each have 2 out-of-group ports, i.e. 2 channels. Two level 3 switches are provided at the same time, which increases the communication speed. When a switch has multiple out-of-group ports, a message bound for the out-of-group side can be sent through any of them, so it is only necessary to find one free out-of-group port.

A switch may also have several intra-group ports connected to the same processor. In that case, the message can likewise be sent through any free interface.
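The port selection described in this section amounts to picking any free channel. A minimal sketch follows; the function name `pick_free_port` and the representation of port state as a list of busy flags are assumptions made for illustration, and the choice of the lowest-numbered free port is arbitrary, since the text allows any free port.

```python
from typing import Optional, Sequence


def pick_free_port(busy: Sequence[bool]) -> Optional[int]:
    """Return the index of any free channel, or None if the message must wait."""
    for port, is_busy in enumerate(busy):
        if not is_busy:
            return port
    return None


# A switch with 2 out-of-group ports (2 channels):
print(pick_free_port([True, False]))  # 1: channel 0 is occupied, use channel 1
print(pick_free_port([True, True]))   # None: both occupied, so the message waits
```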

Ten, establishing a multidimensional network on chip to improve communication speed

To further increase the communication speed, a multidimensional network on chip may be established.

Fig. 7 shows a system with a 2-dimensional network on chip. Every processor has 2 interfaces, and each interface is connected to the network of one dimension. The level 1 and level 2 switches of the second dimension in fig. 7 also have 2-channel interfaces.

When a processor needs to send a message, it may send it through any idle interface. Alternatively, messages may be classified so that a given class of message is always sent and received through the network of one particular dimension, as determined by the system or application.
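The two selection policies just described can be sketched as follows. This is an assumption-laden illustration: the function `choose_network`, the `fixed` mapping and the message class names "data" and "control" are hypothetical and are not defined in the original text.

```python
from typing import Dict, Optional, Sequence


def choose_network(idle: Sequence[bool],
                   msg_class: Optional[str] = None,
                   fixed: Optional[Dict[str, int]] = None) -> Optional[int]:
    """Return the dimension (network index) to send on, or None to wait."""
    # Policy 2: a class bound to one dimension always uses that dimension.
    if fixed is not None and msg_class in fixed:
        net = fixed[msg_class]
        return net if idle[net] else None
    # Policy 1: otherwise use any idle interface.
    for net, is_idle in enumerate(idle):
        if is_idle:
            return net
    return None


binding = {"data": 1, "control": 0}   # hypothetical class-to-dimension binding
print(choose_network([False, True]))                    # 1
print(choose_network([True, True], "data", binding))    # 1
print(choose_network([True, False], "data", binding))   # None
```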

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that are within the spirit and principle of the present invention are intended to be included in the scope of the present invention.

Claims

1. A data packet routing method for an on-chip message network, comprising:

(one) Transmission mode determination

Setting a 'type' field at the transmitting and receiving ends to indicate the transmission mode; the bit width of type equals the number of levels of the highest processor group of the on-chip many-core system, and each bit of type corresponds to the multicast control of one level; unicast may be indicated by type == 0; when type[0] is 0b1, the message is multicast to every processor in the level 1 processor group where the destination address is located; when type[1:0] is 0b11, the message is multicast to all processors and processor groups in the level 2 processor group where the destination address is located; and so on;

when the transmission mode is broadcast, no destination address is needed; when the transmission mode is level 1 processor group broadcast, the 'address bit field in the level 1 processor group' of the destination address may be omitted; when the transmission mode is level 2 processor group broadcast, the 'address bit field in the level 1 processor group' and the 'address bit field in the level 2 processor group' of the destination address may be omitted; and so on;

(two) Forwarding rule determination for the switch

The switch has one register, PMASK; the bit width of PMASK is the same as the address bit width; PMASK mainly serves to mask the fields that do not need to be compared and is mainly used in switches of level 2 and above; for a level 1 group switch, PMASK may be omitted, or every bit of it may be set to 1;

for the unicast transmission mode, after a message is received from one port, the destination address TADDR of the message is compared with the addresses PADDR of the other ports in the group; the comparison rule is TADDR & PMASK == PADDR & PMASK; if the two sides are equal for some port, the destination address matches that port's address and the message is forwarded to that port; if the destination address matches none of the ports in the group, the message is forwarded to the out-of-group port; if the out-of-group port is occupied, the message waits; if there are several out-of-group ports, any free one is found and used;

for the broadcast transmission mode, the switch does not compare addresses, and directly copies the message and forwards the message to other ports in the group and ports out of the group;

for the multicast transmission mode, each bit of type corresponds to the multicast control of one level; the switch first replicates each bit of the received type to the width of that level's address bit field and inverts it, forming a value CMPTYPE with the same number of bits as the address;

if intra-group multicast is determined, the switch changes the type of the message to the broadcast mode in which every bit is 1, discards the destination address of the message, and sends the message to the intra-group ports of the switch; however, the copy sent to the out-of-group port is not modified;

whether the message is copied to the out-of-group port during multicast depends on the type field; if the current switch is a level 1 switch, type[0] is the level 1 multicast control and type[1] is the level 2 multicast control; when the current level 1 group is determined to be multicast, i.e. type[0] is 1, then if type[1] is also 1, the higher-level group is multicast as well, and the message is copied to the out-of-group port; if type[1] is 0, the higher-level group is not multicast, and the message is not copied to the out-of-group port.

2. The data packet routing method for an on-chip message network according to claim 1, wherein: each port of the switch is provided with one register, PADDR, which stores the address of that port; the switch is also provided with one register that stores the group address GADDR of its processor group.

3. The data packet routing method for an on-chip message network according to claim 1, wherein: for the unicast transmission mode, if the switch has 4 intra-group ports whose addresses are respectively port A: 0b110000, port B: 0b110100, port C: 0b111000, port D: 0b111100, the following 4 cases are included:

case 1: a message sent from port A has destination address 0b111000; the destination address is compared with the other 3 ports and matches the address of port C, so the message is sent to port C;

case 2: a message sent from port B has destination address 0b000000; the destination address matches none of the other 3 ports, so the message is sent to the out-of-group port of the switch;

case 3: a message sent from port C has destination address 0b111000; the destination address matches none of the other 3 ports, so the message is sent to the out-of-group port of the switch;

case 4: a message sent from port C has destination address 0b111000; the destination address matches none of the other 3 ports but matches the address of port C itself; if loopback messages are to be supported, one more port address comparison must be added, i.e. the destination address of a message received from a port is compared with the address of that port as well as with the addresses of the other ports, and if it matches the address of that port, the message is sent back to that port; otherwise, since the destination address of the loopback message differs from the other port addresses, the message is forwarded to the out-of-group port of the switch and is forwarded level by level up to the out-of-group port of the highest-level switch, where it is discarded.

4. The data packet routing method for an on-chip message network according to claim 1, wherein: for the multicast transmission mode, if the system has 3 levels of groups, each level 1 group has 4 processors, each level 2 group has 4 level 1 groups, and the level 3 group has 4 level 2 groups, then type in the system is 3 bits wide, and each bit of type is replicated 2 times and inverted, i.e., CMPTYPE = {!type[2], !type[2], !type[1], !type[1], !type[0], !type[0]}; the CMPTYPE bit width is the same as the address bit width; the multicast address comparison is then TADDR & PMASK & CMPTYPE == GADDR & PMASK & CMPTYPE; if the two sides are equal, the destination address hits this group and the message must be multicast within the group; if they are not equal, address comparison and forwarding are carried out in the unicast mode.

5. The data packet routing method for an on-chip message network according to claim 1, wherein: there are 2 kinds of switches, one without an internal buffer and one with an internal buffer; the switch without an internal buffer consists of combinational logic, forwards data directly without delay, and is suitable for lower-level switches; the switch with an internal buffer buffers the received data, mainly in order to cut the combinational path between sender and receiver and prevent the combinational logic path from becoming too long; there are 2 buffering modes: one is register-based real-time forwarding, in which the number of register bits of the internal buffer equals the bit width of the received data, the data received in each clock cycle is first placed in the buffer register and then forwarded to the switch interface where the destination address is located, and there is a delay of 1 clock cycle between receiving and forwarding the data; the other is to provide a buffer memory that receives the whole data packet and then forwards it to the switch interface where the destination address is located; in this mode the buffer memory should be sized for the maximum packet length, and the forwarding time is not fixed: forwarding may proceed while reception is still in progress, or after reception is complete.

6. The data packet routing method for an on-chip message network according to claim 1, wherein the address is represented as follows: the address of a processor is composed of several fields, and the address of a processor in a system with N levels of processor groups is {N-1 level processor group address bit field, ..., level 2 processor group address bit field, level 1 processor group address bit field}; the bit width of each bit field is determined by the number of identical units at that level.

7. The data packet routing method for an on-chip message network according to claim 1, wherein: the switch has one or more out-of-group ports and one or more intra-group ports; when there are multiple interfaces, each interface is one channel; for a high-level switch, where the data traffic is large, several out-of-group ports or intra-group ports can be added to improve the communication speed.

8. The data packet routing method for an on-chip message network according to claim 1, wherein: in order to further improve the communication speed, a multidimensional network on chip is established; when a processor needs to send a message, it may send it through any idle interface, or messages may be classified so that a given class of message is always sent and received through the network of one particular dimension, as determined by the system or application.