CN110011938B - Reordering circuit and method with variable stage number applied to network on chip

Info

Publication number: CN110011938B
Application number: CN201910280264.XA
Authority: CN
Inventors: 李桢旻; 杜高明; 范人士; 王晓蕾; 邢琨; 邓宸
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2019-04-09
Filing date: 2019-04-09
Publication date: 2021-01-15
Anticipated expiration: 2039-04-09
Also published as: CN110011938A

Abstract

The invention discloses a reordering circuit and method with a variable number of stages for a network on chip. The circuit comprises one final-stage reordering circuit and n secondary reordering circuits. The final-stage reordering circuit is arranged between a computing node and a communication node of the network on chip; each secondary reordering circuit is arranged between any two communication nodes of the network on chip. The final-stage and secondary reordering circuits each include an input buffer module and a reassembly buffer module; the reassembly buffer module consists of a reading module, a sequence generation module, a judgment module, a storage module, a direct output module and a buffered output module. The final-stage reordering circuit additionally includes a packet unpacking module. Compared with a conventional sorting circuit, the invention sorts and outputs data at the same time, sorts faster, consumes less power and balances the network-on-chip load.

Description

Reordering circuit and method with variable stage number applied to network on chip

Technical Field

The invention belongs to the technical field of on-chip network communication for integrated circuits, and particularly relates to a reordering circuit and a reordering method with a variable number of stages applied to a network on chip.

Background

The network on chip is a feasible alternative to bus-based interconnect in a system on chip, and it is gradually evolving from single-path routing to multi-path routing. Under multi-path routing a traffic flow is split into several sub-flows, and contention differs on each path, so the time taken from source to destination differs per sub-flow. The upper bound on the flow's delay therefore varies widely, and the data packets it carries arrive out of order. In-order packet delivery is very important for many applications, such as multimedia data transmission and cache coherence.

The sorting circuit proposed in 'Memory-Efficient On-Chip Network With Adaptive Interfaces', published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 1, pp. 146-159, has high sorting delay and high power consumption.

Disclosure of Invention

To overcome the out-of-order packet problem of existing network-on-chip (NoC) platforms, the invention provides a reordering circuit and a reordering method with a variable number of stages applied to a network on chip, so that data can be sorted and output at the same time. Through multi-stage reordering, the utilization of the storage units is improved, the sorting time is shortened, the power consumption is reduced, and the network load is better balanced.

The technical scheme adopted by the invention to achieve the aim is as follows:

The invention relates to a reordering circuit with a variable number of stages applied to a network on chip, characterized in that it comprises a final-stage reordering circuit and n secondary reordering circuits;

the final-stage reordering circuit is arranged between a computing node and a communication node of the network on chip;

the secondary reordering circuit is arranged between any two communication nodes of the network on chip;

the final-stage reordering circuit and the secondary reordering circuits each include an input buffer module and a reassembly buffer module; the reassembly buffer module consists of a reading module, a sequence generation module, a judgment module, a storage module, a direct output module and a buffered output module; the final-stage reordering circuit further comprises a packet unpacking module;

the counter pack_cnt of the sequence generation module in the final-stage reordering circuit is initialized to "1";

the counter pack_cnt of the sequence generation module in each secondary reordering circuit is initialized according to the splitting ratio information preset for the x direction and the y direction of each communication node;

the register group mem in the storage module is initialized to "0"; the data address addr in the storage module is initialized to "0";

the input buffer module in a secondary reordering circuit receives data packets transmitted over the network on chip from the previous-stage communication node and stores them in a synchronous FIFO; the input buffer module in the final-stage reordering circuit receives data packets transmitted over the network on chip from the destination communication node and stores them in a synchronous FIFO; when the synchronous FIFO is full, a full signal Full is fed back to the corresponding communication node so that it stops sending packets;

when the reading module detects that the synchronous FIFO is not empty, it generates a read enable rd_en to read the head flit of a data packet from the synchronous FIFO and extracts the packet sequence information pack_id from the head flit;

when the sequence generation module detects that a head flit has entered, or receives the request mark seq fed back by the buffered output module, the counter pack_cnt is incremented by "1";

the judgment module reads the value of the counter pack_cnt from the sequence generation module and compares it with the packet sequence information pack_id; if the two are equal, it generates an output mark out and sends it to the direct output module; if they differ, it generates a storage mark store and sends it to the storage module;

after the direct output module in the final-stage reordering circuit reads the output mark out, it outputs the complete data packet (head flit, body flits and tail flit) to the packet unpacking module, counting flits with its own counter outdata_cnt;

after the direct output module in a secondary reordering circuit reads the output mark out, it outputs the complete data packet (head flit, body flits and tail flit) to the destination communication node, counting flits with its own counter outdata_cnt;

after reading the storage mark store, the storage module uses its own counter store_cnt to increment the data address addr in steps of 1 while storing the head flit, body flits and tail flit of the corresponding data packet sequentially into the RAM; the storage module also stores the address of the head flit in the register of the register group mem at the position corresponding to the packet sequence information pack_id;

the buffered output module reads the value of the counter pack_cnt from the sequence generation module and looks up the register at the corresponding position in the register group mem; when the register data found is not 0, it takes that data as the head flit address, outputs the head flit, body flits and tail flit of the data packet sequentially from the RAM using its own counter RAM_cnt, and feeds the request mark seq back to the sequence generation module;

the buffered output module in the final-stage reordering circuit outputs the data packet to the packet unpacking module;

the buffered output module in a secondary reordering circuit outputs the data packet to the next-stage communication node or the destination communication node;

and the packet unpacking module receives the data packet output by the reassembly buffer module, extracts the valid data of the body flits and the tail flit in the data packet, and sends the valid data to the computing node of the network on chip.
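The handshake between the input buffer module and the reading module can be sketched in software as follows. This is only a behavioural illustration, not the patented circuit: the class name `SyncFifo`, its methods and the depth parameter are hypothetical, and the Full and rd_en signals are modelled as a property and a method call rather than as clocked wires.

```python
from collections import deque


class SyncFifo:
    """Hypothetical software stand-in for the synchronous FIFO of the input buffer module."""

    def __init__(self, depth):
        self.depth = depth      # assumed capacity in flits; the source gives no value
        self.flits = deque()

    @property
    def full(self):
        # Models the Full signal fed back to the upstream communication node.
        return len(self.flits) >= self.depth

    @property
    def empty(self):
        return not self.flits

    def write(self, flit):
        """Upstream node pushes a flit; refused while Full is asserted."""
        if self.full:
            return False
        self.flits.append(flit)
        return True

    def read(self):
        """Reading module asserts rd_en only when the FIFO is not empty."""
        if self.empty:
            return None
        return self.flits.popleft()


fifo = SyncFifo(depth=2)
accepted = [fifo.write(f) for f in ("head", "body", "tail")]
print(accepted)     # the third write is refused because Full is asserted
print(fifo.read())  # flits leave in arrival order
```

A real implementation would register these signals on a clock edge; the model only shows that the sender stalls while Full is asserted and that reads are gated on the FIFO being non-empty.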

The invention also relates to a reordering method with a variable number of stages applied to a network on chip, characterized by comprising the following steps:

step 1, defining a counter pack_cnt and initializing it according to the splitting ratio information preset for the x direction and the y direction of each communication node;

defining a register group mem and initializing it to "0"; defining a data address addr and initializing it to "0";

defining the stage number as i; defining n as the preset maximum number of stages;

step 2, receiving data packets transmitted over the network on chip from the i-th stage communication node and storing them in a synchronous FIFO; when the synchronous FIFO is full, feeding a full signal Full back to the communication node so that it stops sending packets;

step 3, when the synchronous FIFO is detected to be not empty, generating a read enable rd_en to read the head flit of a data packet from the synchronous FIFO and extracting the packet sequence information pack_id from the head flit;

step 4, when a head flit is detected to have entered, or a fed-back request mark seq is received, incrementing the counter pack_cnt by "1";

step 5, comparing the value of the counter pack_cnt with the packet sequence information pack_id; if the two are the same, generating an output mark out and executing step 6; if they differ, generating a storage mark store and executing step 7;

step 6, outputting the complete data packet (head flit, body flits and tail flit) to the next-stage communication node, counting flits with the counter outdata_cnt;

step 7, using the counter store_cnt to increment the data address addr in steps of 1 while storing the head flit, body flits and tail flit of the corresponding data packet sequentially into the RAM; at the same time, storing the address of the head flit in the register of the register group mem at the position corresponding to the packet sequence information pack_id;

step 8, looking up the register at the corresponding position in the register group mem according to the value of the counter pack_cnt; when the register data found is not "0", taking that data as the head flit address and outputting the head flit, body flits and tail flit of the data packet sequentially from the RAM to the (i+1)-th stage communication node using the counter RAM_cnt;

step 9, judging whether i is greater than or equal to n; if so, executing step 10; otherwise, returning to step 2;

and step 10, extracting the valid data of the body flits and the tail flit in the data packet and sending it to the computing node of the network on chip.
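Steps 3 to 8 for a single stage can be sketched as a packet-level model. The sketch below rests on stated assumptions: the class `ReorderStage` and its `receive` method are hypothetical names, a dictionary stands in for the RAM together with the register group mem, and pack_cnt is read as "the next expected packet number", advancing after each in-order delivery (one possible interpretation of step 4).

```python
class ReorderStage:
    """Hypothetical packet-level model of one reordering stage (steps 3 to 8)."""

    def __init__(self, first_expected=1):
        self.pack_cnt = first_expected  # counter of the sequence generation step
        self.buffered = {}              # stands in for RAM plus register group mem

    def receive(self, pack_id, payload):
        """Accept one packet and return the packets released in order."""
        released = []
        if pack_id == self.pack_cnt:
            # Output mark "out": an in-order packet bypasses the buffer.
            released.append((pack_id, payload))
            self.pack_cnt += 1
        else:
            # Storage mark "store": an out-of-order packet is parked.
            self.buffered[pack_id] = payload
        # Buffered output: drain every parked packet that is now in order.
        while self.pack_cnt in self.buffered:
            released.append((self.pack_cnt, self.buffered.pop(self.pack_cnt)))
            self.pack_cnt += 1
        return released


stage = ReorderStage()
delivered = []
for pid in [2, 1, 4, 3, 5]:          # hypothetical out-of-order arrival
    delivered += [p for p, _ in stage.receive(pid, f"data{pid}")]
print(delivered)  # [1, 2, 3, 4, 5]
```

Packets that arrive in order never touch the buffer, which is the behaviour the method relies on to sort and output at the same time.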

Compared with the prior art, the beneficial technical effects of the invention are as follows:

1. The reassembly buffer module provided by the invention passes in-order data packets straight through during sorting and buffers only the out-of-order data packets, so it uses fewer hardware resources and reduces the overall delay and power consumption of the circuit compared with the conventional sorting approach.

2. With multi-stage reordering, the variable-stage reordering circuit provided by the invention distributes the sorting load among the routing nodes, and the reordering circuits work simultaneously. This keeps the circuit load balanced, prevents large numbers of out-of-order data packets from accumulating at the final-stage reordering node, and reduces the overall sorting delay.
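The load-distribution effect described in point 2 can be illustrated with a toy comparison of peak buffer occupancy. Everything in this sketch is an assumption made for illustration: the function `reorder`, the two sub-flows and their arrival orders are hypothetical and are not taken from the experiments of the invention, and link timing is ignored.

```python
def reorder(arrivals, first_expected):
    """Return (in-order output, peak number of buffered packets) for one stage."""
    expected, buffered, out, peak = first_expected, set(), [], 0
    for pid in arrivals:
        if pid == expected:
            out.append(pid)              # in-order packet: direct output
            expected += 1
        else:
            buffered.add(pid)            # out-of-order packet: buffered
            peak = max(peak, len(buffered))
        while expected in buffered:      # drain packets that became in-order
            buffered.remove(expected)
            out.append(expected)
            expected += 1
    return out, peak


# Hypothetical sub-flows: packets 1..4 on one path, packets 5..8 on another.
path_a = [2, 1, 4, 3]
path_b = [8, 7, 6, 5]

# Single stage: the final stage sees both sub-flows interleaved and unsorted.
single_out, single_peak = reorder([8, 2, 7, 1, 6, 4, 5, 3], first_expected=1)

# Two stages: a secondary stage on path_b sorts packets 5..8 first
# (its counter is assumed to start at 5), then the final stage merges.
b_sorted, secondary_peak = reorder(path_b, first_expected=5)
final_out, final_peak = reorder(path_a + b_sorted, first_expected=1)

print(single_peak, secondary_peak, final_peak)
```

In this toy case the single final stage has to hold more packets at once than either stage of the two-stage arrangement, which mirrors the claim that out-of-order packets no longer pile up at the final-stage node.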

Drawings

FIG. 1 is a circuit diagram of an overall platform of the present invention;

FIG. 2 is a circuit diagram of a reorganization cache module according to the present invention;

FIG. 3 is a schematic diagram of a multi-stage recombination sequence of the present invention;

FIG. 4 is a graph of comparative experimental results for different buffer sizes of the present invention;

FIG. 5 is a graph of comparative experimental results for different packet transmission rates (processing delays) in accordance with the present invention;

FIG. 6 is a graph showing the results of a comparison of single-stage reordering and multi-stage reordering performed in the present invention.

Detailed Description

In this example, as shown in fig. 1, the network on chip is a 5x5 two-dimensional network;

a data packet consists of 1 head flit, 3 body flits and 1 tail flit, and the flit width is 54 bits. Bits 53 and 52 of a flit are the flit flag bits, where 01 denotes a head flit, 11 a body flit and 10 a tail flit; bits 51 to 46 give the sequence number of the flit within the data packet; bits 45 to 32 give the sequence number of the data packet; and bits 31 to 0 carry the data contained in the flit, as shown below:

Packet = {flit_head[53:52], sequence[51:46], packet_id[45:32], data[31:0]}
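The 54-bit layout above can be exercised with a short bit-packing sketch. The field positions and flag encodings come from the format just given; the function names `pack_flit` and `unpack_flit` and the constant names are hypothetical.

```python
# Flit flag encodings from the format above (bits 53:52).
HEAD, BODY, TAIL = 0b01, 0b11, 0b10


def pack_flit(flit_head, sequence, packet_id, data):
    """Assemble a 54-bit flit: flag[53:52], sequence[51:46], packet_id[45:32], data[31:0]."""
    if not (0 <= flit_head < 1 << 2 and 0 <= sequence < 1 << 6):
        raise ValueError("flag must fit in 2 bits and sequence in 6 bits")
    if not (0 <= packet_id < 1 << 14 and 0 <= data < 1 << 32):
        raise ValueError("packet_id must fit in 14 bits and data in 32 bits")
    return (flit_head << 52) | (sequence << 46) | (packet_id << 32) | data


def unpack_flit(flit):
    """Split a 54-bit flit back into its four fields."""
    return {
        "flit_head": (flit >> 52) & 0x3,
        "sequence": (flit >> 46) & 0x3F,
        "packet_id": (flit >> 32) & 0x3FFF,
        "data": flit & 0xFFFFFFFF,
    }


flit = pack_flit(HEAD, sequence=0, packet_id=7, data=0xDEADBEEF)
fields = unpack_flit(flit)
print(fields["packet_id"])  # 7
```

The 14-bit packet_id field is what the reading module extracts as pack_id from a head flit.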

in this example, the reordering circuit with a variable number of stages applied to the network on chip includes a final-stage reordering circuit and n secondary reordering circuits. As shown in fig. 3, the source node (1,1) is the routing node that sends data packets and the destination node (5,5) is the routing node that receives them; the target data flow follows the paths shown in fig. 3, and 50 data packets are injected at a time; the number between two routing nodes in fig. 3 is the number of data packets transmitted between those two nodes;

the remaining 9 data flows in the network on chip are interference flows, whose source and destination nodes are, respectively, (1,2) to (4,3), (1,4) to (5,3), (3,1) to (4,5), (4,2) to (1,5), (2,2) to (5,4), (3,3) to (4,6), (1,3) to (6,6), (4,1) to (2,5), and (2,1) to (4,4);

the final-stage reordering circuit is arranged between the computing node and the communication node of the network on chip at node (5,5), and completes the ordering of data packets No. 1 to No. 40;

the secondary reordering circuit is arranged between two communication nodes of the network on chip, namely between communication node (5,4) and communication node (5,5), and completes the ordering of data packets No. 40 to No. 50;

as shown in fig. 1, the final-stage reordering circuit and the secondary reordering circuit each include an input buffer module and a reassembly buffer module; as shown in fig. 2, the reassembly buffer module consists of a reading module, a sequence generation module, a judgment module, a storage module, a direct output module and a buffered output module; the final-stage reordering circuit further comprises a packet unpacking module;

the counter pack_cnt of the sequence generation module in the final-stage reordering circuit is initialized to "1";

the counter pack_cnt of the sequence generation module in the secondary reordering circuit is initialized to "1" according to the splitting ratio information preset for the x direction and the y direction of each communication node;

the input buffer module in the secondary reordering circuit receives data packets transmitted over the network on chip from the previous-stage communication node (5,4) and stores them in a synchronous FIFO (first in, first out); the input buffer module in the final-stage reordering circuit receives data packets transmitted over the network on chip from the destination communication node (5,5) and stores them in a synchronous FIFO; when the synchronous FIFO is full, a full signal Full is fed back to the corresponding communication node so that it stops sending packets;

when the reading module detects that the synchronous FIFO is not empty, it generates a read enable rd_en to read the head flit of a data packet from the synchronous FIFO and extracts the packet sequence information pack_id from bits 45 to 32 of the head flit;

when the sequence generation module detects that a head flit has entered, or receives the request mark seq fed back by the buffered output module, the counter pack_cnt is incremented by "1";

the judgment module reads the value of the counter pack_cnt from the sequence generation module and compares it with the packet sequence information pack_id; if the two are equal, it generates an output mark out and sends it to the direct output module; if they differ, it generates a storage mark store;

after the direct output module in the final-stage reordering circuit reads the output mark out, it outputs the complete data packet (1 head flit, 3 body flits and 1 tail flit) to the packet unpacking module, counting flits with its own counter outdata_cnt;

after the direct output module in the secondary reordering circuit reads the output mark out, it outputs the complete data packet (1 head flit, 3 body flits and 1 tail flit) to the destination communication node, counting flits with its own counter outdata_cnt;

after reading the storage mark store, the storage module uses its own counter store_cnt to increment the data address addr in steps of 1 while storing the 1 head flit, 3 body flits and 1 tail flit of the corresponding data packet sequentially into the RAM; the storage module also stores the address of the head flit in the register of the register group mem at the position corresponding to the packet sequence information pack_id;

the buffered output module reads the value of the counter pack_cnt from the sequence generation module and looks up the register at the corresponding position in the register group mem; when the register data found is not 0, it takes that data as the head flit address, outputs the 1 head flit, 3 body flits and 1 tail flit of the data packet sequentially from the RAM using its own counter RAM_cnt, and feeds the request mark seq back to the sequence generation module;

the buffered output module in the final-stage reordering circuit outputs the data packet to the packet unpacking module;

the buffered output module in the secondary reordering circuit outputs the data packet to the destination communication node (5,5);

and the packet unpacking module receives the data packet output by the reassembly buffer module, extracts the valid data in bits 31 to 0 of the body flits and the tail flit in the data packet, and sends it to the computing node of the network on chip.
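The interplay of the RAM, the data address addr and the register group mem can be sketched at flit level. This is a hypothetical model under several assumptions that the text does not state: addr is incremented before each write (so address 0 is never used and a mem entry of 0 can mean "empty"), a mem entry is cleared once its packet has been output, and the RAM depth and the absence of address wrap-around are arbitrary simplifications.

```python
FLITS_PER_PACKET = 5  # 1 head + 3 body + 1 tail, as in this example


class StorageModel:
    """Hypothetical flit-level model of the storage and buffered output modules."""

    def __init__(self, ram_depth=64, max_pack_id=63):
        self.ram = [None] * ram_depth        # RAM holding buffered flits
        self.addr = 0                        # data address addr, initialized to 0
        self.mem = [0] * (max_pack_id + 1)   # mem[pack_id] = head flit address, 0 = empty

    def store(self, pack_id, flits):
        """Store one out-of-order packet and record its head flit address."""
        if len(flits) != FLITS_PER_PACKET:
            raise ValueError("expected 5 flits per packet")
        if self.addr + FLITS_PER_PACKET >= len(self.ram):
            raise OverflowError("RAM full (this sketch has no address wrap-around)")
        for k, flit in enumerate(flits):
            self.addr += 1                   # assumption: increment before writing
            self.ram[self.addr] = flit
            if k == 0:
                self.mem[pack_id] = self.addr

    def fetch(self, pack_cnt):
        """Return the packet numbered pack_cnt if it is buffered, else None."""
        head = self.mem[pack_cnt]
        if head == 0:
            return None                      # register data is 0: nothing buffered
        self.mem[pack_cnt] = 0               # assumption: entry cleared after output
        return self.ram[head:head + FLITS_PER_PACKET]


store = StorageModel()
store.store(3, ["h3", "b3a", "b3b", "b3c", "t3"])
store.store(2, ["h2", "b2a", "b2b", "b2c", "t2"])
print(store.mem[3], store.mem[2])  # head flit addresses 1 and 6
print(store.fetch(2))
```

Because mem is indexed by pack_id, the buffered output module needs only one register read to decide whether the packet that pack_cnt is waiting for is already in the RAM.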

In this example, as shown in fig. 3, the reordering method with a variable number of stages applied to a network on chip is performed as follows:

step 1, defining a counter pack_cnt and initializing it to "1" according to the splitting ratio information preset for the x direction and the y direction of each communication node;

defining the stage number as i; defining n = 2 as the preset maximum number of stages;

step 2, receiving data packets transmitted over the network on chip from the stage-1 communication node (5,4) and storing them in the synchronous FIFO; when the synchronous FIFO is full, feeding a full signal Full back to the communication node so that it stops sending packets;

step 3, when the synchronous FIFO is detected to be not empty, generating a read enable rd_en to read the head flit of a data packet from the synchronous FIFO and extracting the packet sequence information pack_id from bits 45 to 32 of the head flit;

step 4, when a head flit is detected to have entered, or a fed-back request mark seq is received, incrementing the counter pack_cnt by "1";

step 5, comparing the value of the counter pack_cnt with the packet sequence information pack_id; if the two are the same, generating an output mark out and executing step 6; if they differ, generating a storage mark store and executing step 7;

step 6, outputting the complete data packet (1 head flit, 3 body flits and 1 tail flit) to the next-stage communication node (5,5), counting flits with the counter outdata_cnt;

step 7, using the counter store_cnt to increment the data address addr in steps of 1 while storing the 1 head flit, 3 body flits and 1 tail flit of the corresponding data packet sequentially into the RAM; at the same time, storing the address of the head flit in the register of the register group mem at the position corresponding to the packet sequence information pack_id;

step 8, looking up the register at the corresponding position in the register group mem according to the value of the counter pack_cnt; when the register data found is not 0, taking that data as the head flit address and outputting the 1 head flit, 3 body flits and 1 tail flit of the data packet sequentially from the RAM to the stage-2 communication node using the counter RAM_cnt;

step 9, judging whether i ≥ n holds (here i = 2 and n = 2); if so, executing step 10; otherwise, returning to step 2;

and step 10, extracting the valid data in bits 31 to 0 of the body flits and the tail flit in the data packet and sending it to the computing node of the network on chip.
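Step 10 can be sketched as a filter over the five flits of a packet. The flag encodings and the 32-bit data field follow the flit format of this example; the function names `make_flit` and `unpack_payload` are hypothetical helpers introduced only for this illustration.

```python
# Flit flag encodings from the flit format of this example (bits 53:52).
HEAD, BODY, TAIL = 0b01, 0b11, 0b10


def make_flit(kind, sequence, packet_id, data):
    """Hypothetical helper: build a 54-bit flit for the demonstration below."""
    return (kind << 52) | (sequence << 46) | (packet_id << 32) | data


def unpack_payload(packet):
    """Return the 32-bit data words of the body flits and the tail flit."""
    payload = []
    for flit in packet:
        kind = (flit >> 52) & 0x3
        if kind in (BODY, TAIL):             # the head flit carries no payload here
            payload.append(flit & 0xFFFFFFFF)
    return payload


packet = [
    make_flit(HEAD, 0, 9, 0),
    make_flit(BODY, 1, 9, 0x11),
    make_flit(BODY, 2, 9, 0x22),
    make_flit(BODY, 3, 9, 0x33),
    make_flit(TAIL, 4, 9, 0x44),
]
payload = unpack_payload(packet)
print([hex(word) for word in payload])  # ['0x11', '0x22', '0x33', '0x44']
```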

Taking the sorting circuit of 'Memory-Efficient On-Chip Network With Adaptive Interfaces' (IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 31, no. 1, pp. 146-159, January 2012) as the reference experiment, fig. 4 and fig. 5 compare the delay under different buffer sizes and different packet sending rates: the sorting delay of the circuit of the invention is lower, with average improvements of 15.58% and 32.65% respectively. When the circuit operates at 1 GHz, the power consumption of the reordering circuit is 24 mW, a reduction of 35.14% relative to the 37 mW of the reference experiment.

As shown in fig. 6, in this example the multi-stage reordering circuit, with both secondary and final stages, has lower delay than a single-stage reordering circuit with only the final stage, an average improvement of 12.15%.
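The quoted power saving follows directly from the two power figures; the short check below recomputes it (the variable names are illustrative only).

```python
baseline_mw = 37.0   # power of the reference sorting circuit
proposed_mw = 24.0   # power of the reordering circuit of the invention at 1 GHz

reduction_pct = (baseline_mw - proposed_mw) / baseline_mw * 100
print(f"{reduction_pct:.2f}%")  # 35.14%
```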

Claims

1. A reordering circuit with variable stage number applied to a network on chip, comprising: a final reordering circuit and n secondary reordering circuits;

2. A reordering method with variable stage number applied to a network on chip, characterized by comprising the following steps:

defining the stage number as i; defining n as a preset maximum stage number;