CN104636085A

CN104636085A - Storage management module in on-chip network message buffering area

Info

Publication number: CN104636085A
Application number: CN201510040741.7A
Authority: CN
Inventors: 王一拙; 王小军; 石峰; 计卫星; 高玉金
Original assignee: Beijing Institute of Technology BIT
Current assignee: Beijing Institute of Technology BIT
Priority date: 2015-01-27
Filing date: 2015-01-27
Publication date: 2015-05-20
Anticipated expiration: 2035-01-27
Also published as: CN104636085B

Abstract

The invention relates to a memory management module for a network-on-chip message buffer and belongs to the field of computer architecture and chip design. The module allocates and reclaims the storage units of the buffer when the ports of a network-on-chip computing node receive and forward messages. It uses a multi-port synchronous access mode, so that multiple ports can request or release storage units in the message buffer at the same time. Requests are served by a pre-allocation strategy: a storage unit is not allocated at the moment a port requests it; instead, one unit is pre-allocated to each port during initialization, and each time a port makes a request, a unit is pre-allocated for that port's next request. Compared with the prior art, multi-port concurrent access and pre-allocation shorten the time needed to allocate and reclaim storage units, which reduces the store-and-forward delay of messages and improves storage efficiency. A circular queue holds the addresses of the free storage units in the buffer, which simplifies the circuit design.

Description

A memory management module for a network-on-chip message buffer

Technical field

The present invention relates to a memory management module for a network-on-chip message buffer. It belongs to the field of computer architecture and chip design and is applicable to the construction of the inter-core interconnect units of multi-core and many-core processors.

Background art

With the development of multi-core and many-core processor architectures, more and more processing cores are integrated on a single chip. This places great demands on inter-core communication, and the network-on-chip is being widely studied and adopted as an effective solution. In a network-on-chip architecture, data is packed into messages that travel between processing cores (computing nodes) in store-and-forward fashion. The routing module of each computing node is connected to its own computing node and to the routing modules of the neighboring computing nodes; for example, the routing module of a two-dimensional mesh network-on-chip has four ports (east, west, south and north) plus a local port. When the routing module receives a message on a port, it must write the message into the message buffer; afterwards, according to the target route, it reads the message from the buffer and sends it to some port.

Because the size of the message buffer is fixed and several ports may receive or send messages concurrently at the same moment, the storage space of the message buffer must be managed automatically. This is the job of the memory management module, which is responsible for allocating and reclaiming message buffer storage space: when a port writes a message, the memory management module supplies the address of a free storage unit in the message buffer; when a port reads a message, the address of the storage unit that held the message is saved back into the memory management module, indicating that the corresponding storage unit is available again.

At present, memory management modules for network-on-chip message buffers all adopt an instant allocation strategy: only when a port receives a message and requests message buffer space does the memory management module obtain the address of a free unit internally and place it on the data bus of the read-out channel. Because space is requested only when the message arrives, the port must wait for a series of operations inside the memory management module, which delays the store-and-forward process of the message by at least one clock cycle.

The present invention adopts a different strategy, pre-allocation: the memory management module places the address of a free message buffer unit on the data bus of the read-out channel in advance, so that when a message arrives the address can be taken directly from that data bus. This avoids the time spent waiting for the internal operations of the memory management module.

In addition, existing memory management modules for network-on-chip message buffers all adopt a multi-port asynchronous access mode: when several ports receive messages at the same time, the memory management module supplies free unit addresses to the ports one by one. This mode is simple to implement but inefficient. To address this, the present invention adopts a synchronous access mode in which multiple ports can read and write the memory management module simultaneously, that is, several storage units of the message buffer can be allocated and reclaimed at the same time.

Summary of the invention

The object of the invention is to solve the problem of large delay in existing memory management modules by providing a memory management module for a network-on-chip message buffer. The module adopts a multi-port synchronous access mode, so that multiple ports can simultaneously request or release storage units in the message buffer, and it serves requests with a pre-allocation strategy, which reduces the store-and-forward delay of messages.

The object of the invention is achieved through the following technical solution:

A memory management module for a network-on-chip message buffer comprises two or more ports that can simultaneously request or release storage units in the message buffer. Requests for storage units are served by a pre-allocation strategy: during initialization a storage unit is allocated to each port in advance, and afterwards, each time a port requests a storage unit, a storage unit is allocated in advance for its next request.

Preferably, the memory management module consists of a read address generation module IMG_RdAGen, a write address generation module IMG_WrAGen, an entry storage module IMG_Mem5x32 and a table length calculation module IMG_QueLen. The read and write address generation modules are each connected to the entry storage module, and the table length calculation module is connected to the read and write address generation modules. The entry storage module stores the addresses of all free storage units of the message buffer in the form of a circular queue. The read address generation module generates, for the read channel of each port, the entry address at which to read the circular queue. The write address generation module assigns, according to the requests on the write channels of the ports, the entry addresses at which the data to be written are stored in the circular queue. The table length calculation module obtains the current queue head address Head from the read address generation module and the current queue tail address Tail from the write address generation module, and from them calculates the current queue length. A port requests a message buffer storage unit through the read address generation module, and releases (returns) a message buffer storage unit through the write address generation module.

Preferably, the entry storage module is a multi-port memory that stores the addresses of all free storage units of the message buffer in the form of a circular queue. The total number of entries the circular queue can hold equals the number of message buffer storage units plus one. No address conflict arbitration is needed in the entry storage module, and multiple ports can access it synchronously.
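As an illustration only, the circular queue of free addresses can be modeled in a few lines of Python. The class and method names below are hypothetical and do not appear in the patent; the sketch assumes that Head is the next entry to read, that Tail is the next entry to write, and that all units are free at start-up. It shows why one redundant entry is enough to tell a full queue from an empty one.

```python
class FreeAddressQueue:
    """Software model (not the patented circuit) of the free-address circular queue."""

    def __init__(self, units=31):
        self.slots = units + 1        # one redundant entry, as in the description
        self.mem = [0] * self.slots
        self.head = 0                 # assumed: next entry to read (allocate)
        self.tail = 0                 # assumed: next entry to write (release)
        for addr in range(units):     # assumed: every unit is free at start-up
            self.release(addr)

    def length(self):
        # Number of free addresses currently queued.
        return (self.tail - self.head) % self.slots

    def is_empty(self):
        # With units + 1 entries, head == tail can only mean "empty".
        return self.head == self.tail

    def allocate(self):
        if self.is_empty():
            return None
        addr = self.mem[self.head]
        self.head = (self.head + 1) % self.slots
        return addr

    def release(self, addr):
        # Cannot overflow: at most `units` addresses exist for units + 1 entries.
        self.mem[self.tail] = addr
        self.tail = (self.tail + 1) % self.slots
```

With 31 units and 32 entries, a queue holding every free address has a length of 31 and its head and tail still differ, so no separate full flag is required.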

Preferably, the read address generation module generates, for the read channel of each port, the entry address at which to read the circular queue, and maintains the queue head address Head. When N channels request a read simultaneously, read addresses are generated in one of two ways: when N is less than the number of free message buffer storage units, Head, Head+1, ..., Head+N-1 are assigned to the channels as read addresses in order of channel priority; when N is greater than the number of free storage units, addresses are granted in order of channel priority, so that only the higher-priority channels obtain an allocation.
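The two cases can be sketched as a single priority-ordered loop. This is a behavioral model under stated assumptions, not the circuit: the function name and parameters are hypothetical, `priority` lists channel numbers from highest to lowest priority, and the case where N equals the number of free units (which the text leaves open) is treated as fully satisfiable.

```python
def assign_read_addresses(head, free_entries, requests, priority, slots=32):
    """Grant queue entry addresses Head, Head+1, ... to requesting channels.

    requests: one boolean per channel (True = channel requests a read).
    priority: channel numbers ordered from highest to lowest priority.
    Returns ({channel: entry_address}, new_head).
    """
    granted = {}
    offset = 0
    for ch in priority:
        # Stop granting once the free entries are used up; lower-priority
        # channels then receive nothing in this cycle.
        if requests[ch] and offset < free_entries:
            granted[ch] = (head + offset) % slots
            offset += 1
    return granted, (head + offset) % slots
```

For example, with Head = 30, channels 0, 2 and 3 requesting and only 2 free entries, channels 0 and 2 would be granted entries 30 and 31 and channel 3 would have to request again.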

Preferably, when N channels request a write simultaneously, the write address generation module assigns the current queue tail address Tail, Tail+1, ..., Tail+N-1 to the channels as write addresses in ascending order of channel number; the N channels write their data to these addresses of the circular queue simultaneously under the same clock, and the queue tail address is updated to Tail+N.
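A minimal behavioral sketch of this rule follows. The function name is hypothetical, `None` stands for "no address assigned", and the modulo-32 wrap-around is an assumption based on the 5-bit addresses used in the 4-channel example below.

```python
def assign_write_addresses(tail, wr, slots=32):
    """Assign Tail, Tail+1, ... to requesting channels in ascending channel order.

    wr: one flag per channel (truthy = channel requests a write).
    Returns (per-channel address or None, new_tail).
    """
    addresses = []
    offset = 0
    for requesting in wr:             # channel 0 first, then 1, 2, ...
        if requesting:
            addresses.append((tail + offset) % slots)
            offset += 1
        else:
            addresses.append(None)    # channel did not request a write
    return addresses, (tail + offset) % slots
```

Because every requesting channel receives a distinct address in the same step, the writes can take place in one clock cycle without arbitration.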

The principle of the invention is described below, taking as an example a memory management module circuit with 4 channels that can be read and written simultaneously:

The external interface of the memory management module comprises 4 write data buses Dwr0(4:0) ~ Dwr3(4:0) and 4 read data buses Drd0(4:0) ~ Drd3(4:0); 4 write control signals WR0 ~ WR3 associated with the 4 write data buses; 4 read control signals RD0 ~ RD3 and status indication signals RE0 ~ RE3 associated with the 4 read data buses (RE indicates whether the corresponding read data bus Drd0(4:0) ~ Drd3(4:0) carries valid data); and a reset signal RST and a clock signal CLK.

The top-level structure of the invention is shown in Figure 2. It comprises four modules: the write address generation module IMG_WrAGen, the read address generation module IMG_RdAGen, the entry storage module IMG_Mem5x32 and the table length calculation module IMG_QueLen. When any channel requests space, a free address is read from near the head of the queue in the entry storage module IMG_Mem5x32; once read, the address is no longer in the entry storage module, indicating that the corresponding storage space has been allocated. Once the message stored at an address that a channel read from the entry storage module has been sent, the space occupied by that message is released, that is, the corresponding address is written near the tail of the queue in the entry storage module, for use by later requests from other channels.

The modules are connected as follows. The entry storage module IMG_Mem5x32 is connected through its Awr0(4:0) ~ Awr3(4:0) interfaces to the Awr0(4:0) ~ Awr3(4:0) interfaces of the write address generation module IMG_WrAGen, and through its Ard0(4:0) ~ Ard3(4:0) interfaces to the Ard0(4:0) ~ Ard3(4:0) interfaces of the read address generation module IMG_RdAGen. The table length calculation module IMG_QueLen is connected through its Tail(4:0) interface to the CurTail(4:0) interface of IMG_WrAGen, through its Len(4:0) interface to the QuLen(4:0) interface of IMG_RdAGen, and through its Head(4:0) interface to the Head(4:0) interface of IMG_RdAGen. The external interfaces RST and CLK are connected to three modules: IMG_WrAGen, IMG_RdAGen and IMG_Mem5x32. The write control signals WR0 ~ WR3 are connected to IMG_WrAGen and IMG_Mem5x32; the read control signals RD0 ~ RD3 and the status indication signals RE0 ~ RE3 are connected to IMG_RdAGen; the 4 write data buses Dwr0(4:0) ~ Dwr3(4:0) and the 4 read data buses Drd0(4:0) ~ Drd3(4:0) are connected to IMG_Mem5x32.

The circuit structure of the write address generation module is shown in Figure 3. The 4 write channels of IMG_WrAGen operate in parallel. IMG_WrAGen assigns write addresses to the data to be written according to the requests on the write channels: WRi = 1 indicates that write channel i requests to write data, and IMG_AddressInc supplies the write addresses assigned to the channels. When N channels request a write simultaneously, the current queue tail address Tail, Tail+1, ..., Tail+N-1 are assigned to the channels as write addresses in ascending order of channel number; the N channels write their data to these addresses of the circular queue simultaneously under the same clock, and the queue tail address is updated to Tail+N. Because the circular queue can hold one entry more than the number of message buffer storage units, and a write actually corresponds to the release and reclamation of a message buffer storage unit (each reclamation writes one queue entry, each allocation takes one away), a write never encounters a full queue.

IMG_M5_1x5e is a 5-to-1 multiplexer. There are 5 of them in the figure, namely the modules TailSel and CH0 ~ CH3, interconnected through D0(4:0) ~ D4(4:0). The E interface of the TailSel multiplexer is tied high; TailSel and CH0 ~ CH3 are connected through their respective S(2:0) interfaces to the TailSel(2:0) and AwrSet0(2:0) ~ AwrSet3(2:0) interfaces of module IMG_SelGenWR. The address decoding module IMG_SelGenWR is also connected through its WrCtrl0 ~ WrCtrl3 interfaces to the external write control signals WR0 ~ WR3; by decoding WR0 ~ WR3 it produces the write address selection signal of each channel and the selection signal used to compute the next queue tail address Tail. The latch IMG_FDx5pe latches the current queue tail address and isolates it from the next queue tail address; its D(4:0) interface is connected to the O(4:0) output of the TailSel multiplexer, its Q(4:0) interface is connected to the Din(4:0) interface of the address increment module IMG_AddressInc, and the external write control signals WR0 ~ WR3 are connected to its CE interface through the OR gate OR4. The external write control signals WR0 ~ WR3 are also connected to the E interfaces of the CH0 ~ CH3 multiplexers respectively. IMG_AddressInc is an address increment adder that provides Din+1, Din+2, Din+3 and Din+4: it takes the current queue tail address CurTail as input and outputs CurTail+0, CurTail+1, CurTail+2, CurTail+3 and CurTail+4, which are connected through interfaces O0(4:0) ~ O4(4:0) to the corresponding interfaces D0(4:0) ~ D4(4:0) of the 5 multiplexers.

The read address generation module IMG_RdAGen supplies the queue head entry addresses of the entry storage module IMG_Mem5x32 and controls read operations so as to avoid read overflow; its circuit structure is shown in Figure 5. It generates, for the read channel of each port, the entry address at which to read the circular queue; a read is in effect the allocation of a message buffer storage unit to a port that is receiving a message. The module maintains the queue head address Head. When N channels request a read simultaneously, Head, Head+1, ..., Head+N-1 are assigned to the channels as read addresses in order of channel priority, and a new queue head address is obtained. If the number of simultaneously requesting channels is less than the number of free message buffer storage units, every requesting channel is assigned an address; if it exceeds the number of free units, only the higher-priority channels obtain an allocation. Channel priorities are set statically by the user, for example from high to low in ascending order of channel number, and can also be modified dynamically at run time.

Reads use the pre-allocation strategy. At system initialization all storage units of the message buffer are free, so during initialization a message buffer storage unit is allocated to each port in turn: a valid message buffer storage unit address is output on the read data bus of each port, and the status indication signal RE of each port is set to 1 (readable). After initialization, when a port requests a read, the external circuit takes the pre-allocated address directly from the read data bus; at the same time the read address generation module generates the entry address for reading the circular queue, so as to pre-allocate a message buffer storage unit address for the next read request of that port. RE indicates whether the data on the read data bus is valid, so when RE is 0 the data on the bus may be any value; in addition, for every channel that does not request a read, the corresponding output data and RE remain unchanged. These two properties can be used to optimize the logic design of the read address generation module.

Unlike the write channels, the read channels must take into account the priority of simultaneous read requests and the control of read overflow, so their internal logic is more complex than that of the write channels. It depends on which of three states the system is in. First, the system initialization state: all message storage is free, so a message storage address can be pre-allocated to every input channel during initialization, regardless of whether the channel has received a message. Second, message storage space has been allocated: when the read-out channel receives a message, it finds that storage space has already been allocated. Whether an input channel that requests message storage space is granted it depends on the number of ports requesting simultaneously and on the priority of each channel: if the number of simultaneously requesting channels is less than the number of free storage spaces, every requesting channel is assigned an address; if it exceeds the number of free storage spaces, only the higher-priority channels obtain an allocation. Third, no message storage space has been allocated: the channel should keep requesting message storage space until it is assigned some. The connections of the read address generation module IMG_RdAGen are as follows:

Modules CH0 ~ CH3 are 5-to-1 multiplexers of type IMG_M5_1x5e. Their D0(4:0) ~ D4(4:0) interfaces are connected to the O0(4:0) ~ O4(4:0) interfaces of the address increment module IMG_AddressInc, which are likewise connected to the D0(4:0) ~ D4(4:0) interfaces of module HS, another 5-to-1 multiplexer. IMG_AddressInc is connected through its Din(4:0) interface to the Q(4:0) interface of the HeadLatch module, and module HS is connected through its O(4:0) interface to the D(4:0) interface of the HeadLatch module. The IMG_SelGenRD module named OpSel generates the selection signals for the read entry addresses and the queue head pointer; it is connected through its HSel(2:0) and ArdSet0(2:0) ~ ArdSet3(2:0) interfaces to the S(2:0) interfaces of the HS and CH0 ~ CH3 modules respectively, and through its PreRE0 ~ PreRE3 interfaces to the NxtRE interfaces of the C0 ~ C3 modules. The CH0 ~ CH3 modules are connected through their O(4:0) interfaces to the NxtA(4:0) interfaces of the corresponding C0 ~ C3 modules. The external interfaces RST and CLK are connected to the corresponding interfaces of the IMG_FDPEx5 module named HeadLatch and of the IMG_RdALatch modules C0 ~ C3. The external read control signals RD0 ~ RD3 are connected to the RdCtrl0 ~ RdCtrl3 interfaces of the IMG_SelGenRD module named OpSel, to the CE interfaces of the corresponding C0 ~ C3 modules and to the E interfaces of the CH0 ~ CH3 modules; the four read control lines also pass through the OR gate OR4 to the CE interface of the IMG_FDPEx5 module named HeadLatch. The IMG_RdALatch latch modules C0 ~ C3 output the initialization signals Pre0(31:0) ~ Pre3(31:0) and the signals Ard0(4:0) ~ Ard3(4:0) to the entry storage module, and output the signals RE0 ~ RE3 externally.
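The pre-allocation behavior of the read channels described above can be sketched in software as follows. This is a simplified sequential model, not the circuit: the class and method names are hypothetical, ports are served one call at a time (so channel priority is not modeled), and a plain `deque` stands in for the circular queue. `drd` and `re` mirror the per-port read data bus and the RE status signal.

```python
from collections import deque


class PreallocatingAllocator:
    """Sequential model of pre-allocation with a per-port RE (valid) flag."""

    def __init__(self, units=31, ports=4):
        self.free = deque(range(units))   # all units are free at initialization
        self.drd = [0] * ports            # address shown on each read data bus
        self.re = [0] * ports             # 1 = the address on drd is valid
        for port in range(ports):         # state one: pre-allocate for every port
            self._prefetch(port)

    def _prefetch(self, port):
        if self.free:
            self.drd[port] = self.free.popleft()
            self.re[port] = 1
        else:
            self.re[port] = 0             # drd may hold any value while RE is 0

    def read(self, port):
        """A message arrives on `port`; return its buffer address or None."""
        if not self.re[port]:
            # State three: nothing pre-allocated, so apply again and retry later.
            self._prefetch(port)
            return None
        # State two: the address is already waiting on the bus, no lookup delay.
        addr = self.drd[port]
        self._prefetch(port)              # pre-allocate for the next request
        return addr

    def release(self, addr):
        """The message stored at `addr` has been forwarded; reclaim the unit."""
        self.free.append(addr)
```

The point of the model is that `read` returns an address that was fetched before the message arrived; the queue access for the following request happens afterwards, off the critical path.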

The circuit structure of the entry storage module IMG_Mem5x32 is shown in Figure 8. It is a multi-port memory whose per-port read and write address signals are supplied by the read address generation module and the write address generation module respectively. These two modules guarantee that the ports access different addresses at any given moment, so no address conflict arbitration is needed in the entry storage module and multiple ports can access it synchronously. The memory stores the addresses of all free storage units of the message buffer in the form of a circular queue, and its size is determined by the number of message buffer storage units. To avoid having to decide whether the queue is full or empty when the head and tail pointers coincide, the total number of entries the circular queue can hold is set to the number of message buffer storage units plus one.

The external lines WR0 ~ WR3 and Awr0(4:0) ~ Awr3(4:0) are connected to the E and A(4:0) interfaces of the corresponding submodules CHwr0 ~ CHwr3 respectively. The external lines Dwr0 ~ Dwr3 are connected to the Din0 ~ Din3 interfaces of the submodule named CHwrExt. The initialization address signal Pre(31:0) and the external signals RST and CLK are connected to the Pre(31:0), Reset and C interfaces of the submodule named MEM. The external lines Ard0(4:0) ~ Ard3(4:0) are connected as inputs to the A0(4:0) ~ A3(4:0) interfaces of the submodule named CHrdExt. Submodules CHwr0 ~ CHwr3 are connected through their D(31:0) ports to WrCtrl0(31:0) ~ WrCtrl3(31:0) of submodule CHwrExt. Submodule CHwrExt is connected through its Data(31:0) and WrEn(31:0) interfaces to the D(31:0) and CE(31:0) interfaces of submodule MEM, and submodule MEM is connected through its Q(31:0) interface to the D(31:0) interface of submodule CHrdExt.

The table length calculation module IMG_QueLen, whose circuit structure is shown in Figure 9, is a 5-bit binary subtracter. It obtains the current queue head address Head from the read address generation module and the current queue tail address Tail from the write address generation module, and calculates the current queue length. The read address generation module uses this length to judge whether there are enough queue entries when several channels request reads simultaneously, that is, whether the message buffer has enough free storage units when several ports receive messages at the same time. Together, the table length calculation module, the read address generation module and the write address generation module in effect form the access control of the multi-port memory in the entry storage module, that is, the dequeue and enqueue control of the circular queue. The circuit connections of the table length calculation module IMG_QueLen are as follows:

The queue tail pointer lines Tail(0) ~ Tail(3) and the queue head pointer lines Head(0) ~ Head(3) of the circular queue are connected to the A0 ~ A3 and B0 ~ B3 interfaces of the upper ADSU4 submodule respectively; the tail pointer line Tail(4) and the head pointer line Head(4) are connected to the A0 and B0 interfaces of the lower ADSU4 submodule. The S0 ~ S3 interfaces of the upper ADSU4 submodule produce the queue length lines Len(0) ~ Len(3), and the S0 interface of the lower ADSU4 submodule produces the queue length line Len(4). The ADD interface of the upper ADSU4 submodule, together with the A2, A3, B1 ~ B3 and ADD interfaces of the lower ADSU4 submodule, is connected to the " " signal. The upper ADSU4 submodule is connected through its CO interface to the CI interface of the lower ADSU4 submodule, and its CI interface, together with the A1 interface of the lower ADSU4 submodule, is connected to VCC. The OFL interface of the upper ADSU4 submodule and the OFL, CO and S1 ~ S3 interfaces of the lower ADSU4 submodule are all left unconnected.
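In software terms, a 5-bit subtracter whose carry-out is left unconnected computes the difference modulo 32. The sketch below rests on two assumptions that the text implies but does not state as a formula: Tail is the next entry to be written, Head is the next entry to be read, and the length is Tail minus Head. The function names are hypothetical.

```python
def queue_length(tail, head, width=5):
    """Queue length as a width-bit subtraction that discards the borrow."""
    return (tail - head) & ((1 << width) - 1)


def enough_entries(tail, head, requesting_channels):
    """True if every simultaneously requesting read channel can be served."""
    return requesting_channels <= queue_length(tail, head)
```

For instance, Tail = 00011 and Head = 11111 give a length of 4 even though Tail is numerically smaller than Head, because the pointers wrap around the 32-entry queue.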

Beneficial effect

Compared with prior art, the present invention has the following advantages:

1) The invention supports concurrent requests for and releases of message buffer storage space by multiple ports, improving the efficiency of storage management;

2) The invention adopts a pre-allocation strategy for message buffer storage unit requests, shortening the time needed to obtain a storage unit and thus the store-and-forward delay of messages;

3) The invention stores the addresses of free message buffer storage units in a circular queue. Requests and releases operate on the head and the tail of the queue respectively and therefore do not conflict, and the one redundant queue entry removes the need to decide whether the queue is full or empty when the head and tail pointers coincide, which simplifies the circuit design.

Brief description of the drawings

Fig. 1 is a schematic diagram of the external interface signals of the embodiment of the invention;

Fig. 2 is a schematic diagram of the structure of the embodiment of the invention;

Fig. 3 is a circuit structure diagram of the write address generation module;

Fig. 4 is an operation timing diagram of the write address generation module on the circular queue;

Fig. 5 is a circuit structure diagram of the read address generation module;

Fig. 6 is an operation timing diagram of the read address generation module on the circular queue when enough entries are available;

Fig. 7 is an operation timing diagram of the read address generation module on the circular queue when too few entries are available;

Fig. 8 is a circuit structure diagram of module IMG_Mem1x32;

Fig. 9 is a circuit structure diagram of the table length calculation module.

Embodiment

Below in conjunction with the drawings and specific embodiments, the present invention is described in further detail.

The circuit of the invention is described below, based on the Xilinx software platform, for a network-on-chip with base-three interconnection.

In a network-on-chip with base-three interconnection, each node has three external ports and one local port, so the total number of ports is 4; the number of message buffer storage units is 31, and the address bus width is 5 bits.

The external interface signals of the embodiment are shown in Figure 1. The 4 ports correspond to 4 read-out channels and 4 write channels. Associated with the 4 write channels are the write control signals WR0 ~ WR3 and the write data buses Dwr0 ~ Dwr3; associated with the 4 read-out channels are the read control signals RD0 ~ RD3, the read data buses Drd0 ~ Drd3 and the status indication signals RE0 ~ RE3; in addition there are the reset signal RST and the clock signal CLK. A read may fail, namely when the message buffer has no free storage unit left to allocate, so the signals RE0 ~ RE3 are needed to indicate whether the read data buses Drd0 ~ Drd3 carry valid data (the address of a free message buffer storage unit). A write corresponds to the reclamation of a message buffer storage unit: the address of the reclaimed unit is written into the circular queue in the memory management module. Because the total length of the circular queue equals the number of message buffer storage units plus one, a write can never fail, and no further status indication signal is needed.

The structure of the memory management module of the embodiment is shown in Figure 2. It comprises four modules: the write address generation module IMG_WrAGen, the read address generation module IMG_RdAGen, the entry storage module IMG_Mem5x32 and the table length calculation module IMG_QueLen. The entry storage module IMG_Mem5x32 is connected through its Awr0(4:0) ~ Awr3(4:0) interfaces to the Awr0(4:0) ~ Awr3(4:0) interfaces of IMG_WrAGen, and through its Ard0(4:0) ~ Ard3(4:0) interfaces to the Ard0(4:0) ~ Ard3(4:0) interfaces of IMG_RdAGen. The table length calculation module IMG_QueLen is connected through its Tail(4:0) interface to the CurTail(4:0) interface of IMG_WrAGen, through its Len(4:0) interface to the QuLen(4:0) interface of IMG_RdAGen, and through its Head(4:0) interface to the Head(4:0) interface of IMG_RdAGen. The external interfaces RST and CLK are connected to three modules: IMG_WrAGen, IMG_RdAGen and IMG_Mem5x32. The write control signals WR0 ~ WR3 are connected to the corresponding interfaces of IMG_WrAGen and IMG_Mem5x32; the read control signals RD0 ~ RD3 and the status indication signals RE0 ~ RE3 are connected to the corresponding interfaces of IMG_RdAGen; the 4 write data buses Dwr0(4:0) ~ Dwr3(4:0) and the 4 read data buses Drd0(4:0) ~ Drd3(4:0) are connected to IMG_Mem5x32. The structure and timing of each module are described in detail as follows:

1) Write address generation module IMG_WrAGen; its circuit structure is shown in Fig. 3. At the initial moment, the RST signal initializes the tail address held in the latch to all high levels, i.e. 11111.

The write control signals WR0~WR3 are routed three ways. First, they enter module IMG_SelGenWR through interfaces WrCtrl0~WrCtrl3; the decoded results leave through output interfaces TailSel(2:0) and AwrSel0(2:0)~AwrSel3(2:0) and drive the S(2:0) interfaces of the five multiplexer modules named TailSel and CH0~CH3, serving as the next-cycle tail address select signal and the four channel write address select signals. Second, WR0~WR3 are connected to the E interfaces of the corresponding CH0~CH3 modules, where they indicate whether the write address of the corresponding port is valid. Third, WR0~WR3 pass through the OR gate OR4, whose output is connected to the CE interface of latch IMG_FDx5pe, so that the latch is enabled whenever at least one write request is present.

The five 5-to-1 multiplexers IMG_M5_1x5e are connected to each other through their D0(4:0)~D4(4:0) interfaces, where D0(4:0)~D3(4:0) are the address signals of the four ports and D4(4:0) is the tail address signal. Once D0(4:0)~D4(4:0) of module TailSel carry input signals, the signal on its S(2:0) interface decides which input is output through O(4:0) as the next tail address NxtTail(4:0) to the D(4:0) interface of latch IMG_FDx5pe. The latch holds the current tail address: in the next clock cycle, if the signals on its CE and C interfaces are valid (that is, a write control signal is active), the latched NxtTail(4:0) value is output through Q(4:0) as the current tail address CurTail(4:0) to module IMG_AddressInc.

IMG_AddressInc is an address increment adder. It takes the current tail address CurTail(4:0) of the circular queue on its Din(4:0) interface and outputs CurTail+0, CurTail+1, CurTail+2, CurTail+3 and CurTail+4 on interfaces O0(4:0)~O4(4:0). Under the control of module IMG_SelGenWR, the IMG_M5_1x5e multiplexers select among these values and assign them to the channels according to which channels currently request a write.

For example, when WR0=WR1=1 and WR2=WR3=0, channels 0 and 1 request a write. Suppose the tail address of the circular queue is 00011 (a 5-bit binary number). The five values on the IMG_AddressInc output ports O0~O4 are then 00011, 00100, 00101, 00110 and 00111. Based on the decoding of WR0~WR3, IMG_SelGenWR drives the select signal of multiplexer CH0 (channel 0) so that it outputs the value 00011 on its D0 input, drives the select signal of multiplexer CH1 (channel 1) so that it outputs the value 00100 on its D1 input, and drives the select signal of multiplexer TailSel (upper left corner of Fig. 3) so that it outputs the value 00101 on its D2 input as the next tail address.

The CurTail(4:0) signal output by latch module TL is also connected to the queue length calculation module, where it serves as the current tail address for computing the length of the circular queue.
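The selection performed by IMG_SelGenWR and the multiplexers can be summarized behaviourally. The following Python sketch is not the circuit of Fig. 3; it only reproduces the address assignment described above. The function name `allocate_write_addresses` and the use of `None` for a channel whose multiplexer is disabled are illustrative assumptions.

```python
ADDR_BITS = 5
QUEUE_SIZE = 1 << ADDR_BITS  # 32 entries in the circular queue


def allocate_write_addresses(cur_tail, wr):
    """Return (addresses, next_tail) for one clock cycle.

    wr holds one 0/1 write request per channel (WR0, WR1, ...).
    addresses[i] is the queue entry assigned to channel i, or None
    when channel i did not request a write (hypothetical marker).
    """
    addresses = []
    offset = 0  # which IMG_AddressInc output is used: CurTail + offset
    for request in wr:
        if request:
            addresses.append((cur_tail + offset) % QUEUE_SIZE)
            offset += 1
        else:
            addresses.append(None)
    # The tail advances by the number of channels that requested a write.
    next_tail = (cur_tail + offset) % QUEUE_SIZE
    return addresses, next_tail


if __name__ == "__main__":
    addresses, next_tail = allocate_write_addresses(0b00011, [1, 1, 0, 0])
    print(addresses, format(next_tail, "05b"))
```

With the values of the example above (tail 00011, channels 0 and 1 requesting), channel 0 receives 00011, channel 1 receives 00100, and the next tail address is 00101.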

Fig. 4 shows the operation timing of the write address generation module IMG_WrAGen. Before the first clock pulse arrives, RST is active and the tail address in the latch is initialized to 11111; since none of Wr0~Wr3 carries a write request at this point, ports Awr0(4:0)~Awr3(4:0) hold the invalid value 00000. When the first pulse arrives, all of Wr0~Wr3 carry write requests, so starting from tail address 11111 the ports Awr0(4:0)~Awr3(4:0) are assigned 11111, 00000, 00001 and 00010 in turn, and the new tail address 00011 is stored in the latch. In the second clock cycle, when the second clock pulse arrives, the latch supplies the new tail address 00011. Because Wr0~Wr2 carry write requests, ports Awr0(4:0)~Awr2(4:0) are assigned 00011~00101 in turn starting from tail address 00011; because Wr3 carries no request, port Awr3(4:0) is set to the invalid address. Allocation continues in the same way in every clock cycle, and in each cycle the tail address of the previous cycle is replaced by the new tail address.

2) Read address generation module IMG_RdAGen; its circuit structure is shown in Fig. 5. The QuLen(4:0) signal produced by the queue length calculation module IMG_QueLen is connected to the QuLen(4:0) interface of module IMG_SelGenRD, which uses it to judge whether the length of the circular queue can satisfy the number of read requests on ports RD0~RD3.

The read request signals RD0~RD3 are connected to RdCtrl0~RdCtrl3 of module IMG_SelGenRD. Decoding produces ArdSel0(2:0)~ArdSel3(2:0), which act as select logic and are connected to the S(2:0) interfaces of multiplexers CH0~CH3. RD0~RD3 also serve as enable signals: they are connected to the E interfaces of the corresponding multiplexers CH0~CH3 and to the CE interfaces of the corresponding latches C0~C3. The value selected by each of CH0~CH3 is the table entry address that the channel will read at its next read request; it is latched by C0~C3 at the next CLK leading edge and appears at outputs Ard0~Ard3.

The decoded signal HSel(2:0) is connected to the S(2:0) interface of multiplexer HS and is used to generate the head address for the next cycle's prefetch. Because there are 4 ports, the number of read requests lies between 0 and 4, so 3 binary bits are required. For example, if three ports (ports 2, 3 and 4, say) have read requests, the request count is 3 and the next head address is Head+3. Decoding also produces PreRE0~PreRE3, which are connected to the NxtRE ports of latches C0~C3 and latch the valid flag of the pre-allocated next read address. RD0~RD3 additionally pass through the OR gate OR4, whose output is connected to the CE interface of latch HeadLatch and indicates whether any valid read request exists.

According to its inputs S(2:0) and D0(4:0)~D4(4:0), multiplexer HS selects one address as the next head address and outputs it through O(4:0) to the D(4:0) interface of latch HeadLatch, where it is latched. Under the control of the CE, CLK and RST interfaces, HeadLatch outputs the head address Head(4:0) of the circular queue through its Q(4:0) interface to the Din(4:0) port of module IMG_AddressInc. This module is an address increment adder: it takes the current head address Head as input and outputs Head+0, Head+1, Head+2, Head+3 and Head+4 through O0(4:0)~O4(4:0) to the D0(4:0)~D4(4:0) interfaces of multiplexers HS and CH0~CH3.

Each multiplexer then uses its own S(2:0) signal to output a next-cycle address through O(4:0): HS outputs NxtH(4:0) (described above), and CH0~CH3 output NxtArd0(4:0)~NxtArd3(4:0) to the NxtA(4:0) interfaces of latches C0~C3. At the next valid moment, i.e. when the next read request arrives, and depending on the NxtRE, CE, CLK and RST signals, each latch outputs its address through its CurA(4:0) interface and the external signal RE0~RE3 for that cycle through its CurRE interface. The PreData(4:0) interface of each of C0~C3 supplies the initialization value for the table entry memory module: because the head address of the circular queue starts at 00000, the initial values of C0~C3 are 00000~00011, and these signals are expanded into the Pre(31:0) signals.

This group of latches is used at the outputs because of the pre-allocation strategy. Whenever a channel requests a read, outputs Ard0~Ard3 already hold prefetched table entry addresses of the circular queue. These addresses must not be overwritten by the addresses newly allocated in the current CLK before they have been read, so the newly allocated addresses are latched and only reach the outputs after the pre-allocated addresses have been read at the next CLK.

The Head(4:0) signal output by latch HeadLatch is also connected to the queue length calculation module, where it serves as the current head address for computing the length of the circular queue.

Fig. 6 shows the operation timing of the read address generation module IMG_RdAGen when enough table entries are available. It shows the case in which, after initialization, the number of channels simultaneously requesting a read at the trailing edge of successive clock (CLK) cycles is 4, 3, 2, 1 and 0 respectively. At the end of initialization, i.e. before 100 ns in the figure, Ard3=0x03, Ard2=0x02, Ard1=0x01, Ard0=0x00, Head=0x04 and RE3=RE2=RE1=RE0=1. When the read control signals RD0~RD3 of all channels are set to 1 (read requested), the circular queue entries 0x00~0x03 are read out, and at the next CLK leading edge (at 140 ns) the new address values 0x04~0x07 are output to Ard0~Ard3. At the trailing edge of this CLK the number of channels requesting a read drops to 3, with RD3=0. Because the RD signal of each channel is the enable signal of that channel's output latch, the value of Ard3 in Fig. 6 stays unchanged once RD3=0. For every read request in Fig. 6 the queue length exceeds the number of requesting channels, so RE0~RE3 remain 1 throughout.
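The pre-allocation behaviour in the sufficient-entries case can be modelled in a few lines. The Python sketch below is a behavioural approximation, not the circuit of Fig. 5: the class name `ReadAddressPrefetcher` is hypothetical, the model assumes that enough entries are always available (RE0~RE3 stay 1), and it assumes that requesting channels receive Head, Head+1, ... in increasing channel order.

```python
QUEUE_SIZE = 32  # 5-bit table entry addresses


class ReadAddressPrefetcher:
    """Behavioural model of the Ard output latches and the head latch."""

    def __init__(self, channels=4):
        # State at the end of initialization: channel i already holds
        # entry address i, and Head points just past those entries.
        self.ard = list(range(channels))
        self.head = channels % QUEUE_SIZE

    def clock(self, rd):
        """Apply one CLK edge; rd[i] is 1 when channel i requests a read.

        Returns the prefetched addresses consumed in this cycle
        (None for channels that did not request).
        """
        consumed = [a if r else None for a, r in zip(self.ard, rd)]
        offset = 0
        for i, request in enumerate(rd):
            if request:
                # RD enables this channel's latch, so it takes a new address.
                self.ard[i] = (self.head + offset) % QUEUE_SIZE
                offset += 1
            # An idle channel keeps its previously prefetched address.
        self.head = (self.head + offset) % QUEUE_SIZE
        return consumed


if __name__ == "__main__":
    p = ReadAddressPrefetcher()
    print(p.clock([1, 1, 1, 1]), p.ard, p.head)
    print(p.clock([1, 1, 1, 0]), p.ard, p.head)
```

In this model the first cycle consumes 0x00~0x03 and loads 0x04~0x07 into Ard0~Ard3; in the second cycle channel 3 is idle, so Ard3 keeps the value 0x07, which matches the behaviour described for Fig. 6.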

Fig. 7 shows the operation timing of the read address generation module IMG_RdAGen when there are not enough table entries. At about 340 ns in the figure, all 4 channels start requesting a read at the same time while the queue length is QueLen=3 and the head address is Head=0x0E. At the next CLK leading edge (at 390 ns), channels 0, 1 and 2 have been assigned addresses (0x0E, 0x0F, 0x10), whereas channel 3 receives no valid address because too few entries remain, and RE3 is set to 0. As a result of an optimization in the logic implementation, the internal logic still assigns the next address to channel 3 even though the queue has no remaining entry, so Ard3=0x11 at this point; only the corresponding RE3=0 indicates that this address is invalid. Similar situations (with queue lengths of 2, 1 and 0 respectively) occur near 490 ns, 790 ns and 890 ns. The waveform between 490 ns and 790 ns shows the cases in which the queue length is 2 and the number of channels simultaneously requesting a read is 4, 3 and 2 respectively; because writes take place at the same time during this interval, the queue length stays at 2.
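The relationship between the queue length and the RE flags can be sketched as follows. This Python fragment is an illustration under stated assumptions: the function name `prefetch_with_validity` is hypothetical, requesting channels are ranked in increasing channel order, and a channel's address is marked valid only if its rank is below the queue length. How Head advances when entries run short is not stated in the text, so the sketch does not model it.

```python
QUEUE_SIZE = 32  # 5-bit table entry addresses


def prefetch_with_validity(head, queue_len, rd):
    """Return one (address, re) pair per channel for a single cycle.

    Idle channels yield (None, None), since their latches simply keep
    whatever they held before (hypothetical marker).
    """
    result = []
    rank = 0  # position of this channel among the requesting channels
    for request in rd:
        if not request:
            result.append((None, None))
            continue
        # An address is always produced, even when no entry remains ...
        address = (head + rank) % QUEUE_SIZE
        # ... but RE marks it valid only while entries are available.
        valid = 1 if rank < queue_len else 0
        result.append((address, valid))
        rank += 1
    return result


if __name__ == "__main__":
    for address, re in prefetch_with_validity(0x0E, 3, [1, 1, 1, 1]):
        print(hex(address), re)
```

With Head=0x0E, a queue length of 3 and all four channels requesting, channels 0~2 receive 0x0E, 0x0F and 0x10 with RE=1, while channel 3 receives 0x11 with RE3=0, as in the situation at 390 ns.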

3) Table entry memory module IMG_Mem5x32. It stores the addresses of the free storage units of the message buffer. The message buffer has 31 storage units, so a unit address is 5 bits wide; to avoid having to judge whether the head and tail of the circular queue coincide, the circular queue has 32 entries. The table entry storage space is therefore 5 × 32 bits. In addition, because this embodiment has 4 ports, the table entry memory module is a multiport memory with 4 inputs and 4 outputs.

IMG_Mem5x32 consists of 5 IMG_Mem1x32 modules connected in parallel; each of the 5 modules stores one of the 5 bits of every table entry. The description below covers a single IMG_Mem1x32 circuit, whose structure is shown in Fig. 8.

When the circular queue starts operating, every storage location must be initialized, with addresses running from 00000 to 11111. Pre(31:0) is the signal that supplies the initialization value to each IMG_Mem1x32 module: the module is initialized through the Pre(31:0) interface of the IMG_FDPEx32 module, i.e. after power-up or under the action of the external signal RST, the Pre(31:0) data is written into the memory.

The external signals WR0~WR3 and the signals Awr0(4:0)~Awr3(4:0) from the write address module are connected to the enable interface E and the A(4:0) interface, respectively, of modules CHwr0~CHwr3 (of type IMG_D5_32E), which expand each 5-bit address into a 32-bit address. For example, if the write addresses Awr1~Awr3 are 00100~00110, the expanded addresses are 00000000000000000000000000010000 ~ 00000000000000000000000001000000.

The CHwrExt module expands and merges the write address control signals and the write data into the control signals and data required by the 1x32 MEM module. The expanded addresses leave CHwr0~CHwr3 through their D(31:0) interfaces and enter the WrCtrl0(31:0)~WrCtrl3(31:0) interfaces of CHwrExt as the write address control signal of each port. CHwrExt merges these expanded signals into a single write address control signal and outputs it through WrEn(31:0); in the example, 00000000000000000000000000010000 ~ 00000000000000000000000001000000 are merged into 00000000000000000000000001110000, which is output to the CE(31:0) interface of the MEM module. One bit of each port's write data Dwr0(4:0)~Dwr3(4:0) is connected to the Din0~Din3 interfaces of CHwrExt and expanded into 32-bit data. In the example, ports 1, 2 and 3 have write requests, so the data on Din1~Din3 is written to the corresponding address locations according to the write address control signal. CHwrExt then sends the 32-bit control signal and data through its WrEn(31:0) and Data(31:0) interfaces to the CE(31:0) and D(31:0) interfaces of the MEM module, which completes the write for the write address module.

The CHrdExt module (of type IMG_RdExt4P1x32) reads from the MEM module, for each port, the bit stored at the address given by Ard0(4:0)~Ard3(4:0) of the read address module and outputs it as one bit of the 5-bit entry. Ard0(4:0)~Ard3(4:0) come from the CurA(4:0) interfaces of C0~C3 in the read address module and are connected to the A0(4:0)~A3(4:0) interfaces of CHrdExt; they indicate from which address of the MEM module data is to be read. The lookup and read are carried out over the connection between the D(31:0) interface of CHrdExt and the Q(31:0) interface of MEM, which also involves the corresponding address expansion. For example, if the read addresses Ard0(4:0)~Ard3(4:0) are 01000~01011, the four channels read the data at the 9th to 12th positions (bit indices 8 to 11) of the 1x32 MEM module. All data of the 1x32 MEM module is read at once over the D(31:0)/Q(31:0) connection, and CHrdExt then outputs the bit at each port's position as an external signal through D0~D3.
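The address expansion, the merging of the write enables and the bit-sliced storage can be illustrated with a small behavioural model. The Python sketch below is not the circuit of Fig. 8; the names `one_hot` and `BitSlicedMemory` are hypothetical, and the model relies on the write address generator supplying distinct addresses to the requesting ports, so no two ports ever write the same entry in one cycle.

```python
ADDR_BITS = 5
DEPTH = 1 << ADDR_BITS  # 32 entries per bit slice


def one_hot(address, enable):
    """5-to-32 decode with enable, in the manner of CHwr0~CHwr3."""
    return (1 << address) if enable else 0


class BitSlicedMemory:
    """Five 32-bit slices; slice b stores bit b of every table entry."""

    def __init__(self):
        self.slices = [0] * ADDR_BITS

    def write(self, wr, awr, dwr):
        """Write up to four entries in one cycle; returns the merged mask."""
        masks = [one_hot(a, e) for a, e in zip(awr, wr)]
        write_enable = 0
        for m in masks:
            write_enable |= m  # merged enable, like WrEn(31:0)
        for b in range(ADDR_BITS):
            data = 0
            for m, d in zip(masks, dwr):
                if (d >> b) & 1:
                    data |= m  # put bit b of this port's data at its address
            # Only enabled positions change; all others keep their value.
            self.slices[b] = (self.slices[b] & ~write_enable) | data
        return write_enable

    def read(self, address):
        """Reassemble one 5-bit entry from the five slices."""
        return sum(((self.slices[b] >> address) & 1) << b
                   for b in range(ADDR_BITS))


if __name__ == "__main__":
    mem = BitSlicedMemory()
    mask = mem.write([0, 1, 1, 1], [0, 0b00100, 0b00101, 0b00110], [0, 21, 10, 31])
    print(format(mask, "032b"), mem.read(0b00100))
```

For the example in the text (ports 1~3 writing to 00100~00110), the individual masks are 0x10, 0x20 and 0x40, and the merged enable is 0x70, i.e. 00000000000000000000000001110000. The data values 21, 10 and 31 are arbitrary illustration values.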

4) Queue length calculation module IMG_QueLen; its circuit structure is shown in Fig. 9. It is a 5-bit binary subtracter used to compute the length of the circular queue. When the read address module requests addresses, the queue length calculation module subtracts the head address Head(0)~Head(4) from the current tail address Tail(0)~Tail(4) of the circular queue, bit position by bit position, to determine whether enough entries are available to satisfy the request of the read address module.

Because a single ADSU4 module is a 4-bit binary subtracter, 2 ADSU4 modules are needed for 5 bits. The upper ADSU4 submodule subtracts the low four bits Head(0)~Head(3) of the head pointer from the low four bits Tail(0)~Tail(3) of the tail address and yields the low four bits Len(0)~Len(3) of the queue length. The lower ADSU4 submodule subtracts the most significant bit Head(4) of the head pointer from the most significant bit Tail(4) of the tail address and yields the most significant bit Len(4) of the queue length. The two submodules together produce the queue length.

For example, suppose the head address is 00010 and the tail address is 11111. The upper ADSU4 subtracts the low four bits of the head address from the low four bits of the tail address and obtains 1101; since no borrow is produced, this value is output directly on S0~S3. The lower ADSU4 likewise computes the most significant bit, 1, and outputs it on S0. Combining the values produced by the two ADSU4 modules gives the final queue length 11101.

Because ADSU4 is a standard component provided by the design software, its wiring is simply shown in Fig. 9. Even when the head address is greater than the tail address, the module has a corresponding internal mechanism that produces the correct result; this is not described further here.
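The two-stage subtraction can be checked with a short model. The Python sketch below is an interpretation, not the ADSU4 wiring of Fig. 9: it assumes that the low-nibble stage passes a borrow to the high-bit stage and that the overall result is the difference Tail − Head taken modulo 32, which is one way the head-greater-than-tail case could yield a correct length. The function name `queue_length` is hypothetical.

```python
def queue_length(tail, head):
    """5-bit queue length computed as two cascaded subtractions."""
    # Upper stage: low four bits, Tail(3:0) - Head(3:0).
    low = (tail & 0xF) - (head & 0xF)
    borrow = 1 if low < 0 else 0
    low &= 0xF
    # Lower stage: most significant bit, Tail(4) - Head(4) - borrow.
    high = ((tail >> 4) - (head >> 4) - borrow) & 1
    return (high << 4) | low


if __name__ == "__main__":
    print(format(queue_length(0b11111, 0b00010), "05b"))
```

For the example in the text (tail 11111, head 00010) the low stage gives 1101 without a borrow and the high stage gives 1, so the result is 11101. Under the modulo-32 assumption, a tail of 00011 and a head of 11110 would give a length of 00101.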

Of course, the present invention may also have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art can make various corresponding changes and modifications according to the present invention, but all such changes and modifications shall fall within the protection scope of the claims appended to the present invention.

Claims

1. A memory management module for a network-on-chip message buffer, characterized in that: it comprises two or more ports that can simultaneously apply for or release storage units in the message buffer; the process of applying for a storage unit adopts a pre-allocation strategy, namely, a storage unit is allocated in advance to each port during initialization, and thereafter, each time a port applies for a storage unit, a storage unit is allocated in advance for that port's next application.

2. The memory management module for a network-on-chip message buffer according to claim 1, characterized in that: it consists of a read address generation module, a write address generation module, a table entry memory module and a queue length calculation module; the read and write address generation modules are each connected to the table entry memory module, and the queue length calculation module is connected to the read and write address generation modules; the table entry memory module stores the addresses of all free storage units of the message buffer in the form of a circular queue; the read address generation module generates, for the read channel of each port, the table entry address at which the circular queue is read; the write address generation module allocates, according to the requests on the write channels of the ports, the table entry addresses of the circular queue to which the data is to be written; the queue length calculation module obtains the current head address Head of the circular queue from the read address generation module and the current tail address Tail of the circular queue from the write address generation module, and then calculates the queue length at the current moment; each port applies for message buffer storage units through the read address generation module, and releases and returns message buffer storage units through the write address generation module.

3. The memory management module for a network-on-chip message buffer according to claim 2, characterized in that: the table entry memory module is a multiport memory that stores the addresses of all free storage units of the message buffer in the form of a circular queue; the total number of entries that the circular queue can store equals the number of message buffer storage units plus one; no address conflict arbitration is required in the table entry memory module, and multiple ports can access it simultaneously.

4. The memory management module for a network-on-chip message buffer according to claim 2, characterized in that: the read address generation module generates, for the read channel of each port, the table entry address at which the circular queue is read, and maintains the head address Head of the circular queue; for N channels applying to read simultaneously, read addresses are generated in two cases: when N is less than the number of free storage units of the message buffer, Head, Head+1, ..., Head+N-1 are allocated to the channels as read addresses according to channel priority; when N is greater than the number of free storage units of the message buffer, the channels with higher priority obtain the allocation, in order of channel priority.

5. The memory management module for a network-on-chip message buffer according to claim 2, characterized in that: for N channels applying to write simultaneously, the write address generation module allocates the current tail address Tail, Tail+1, ..., Tail+N-1 to the channels as write addresses in ascending order of channel number; the N channels write their data to these addresses of the circular queue simultaneously under the same clock, and the tail address is changed to Tail+N.

6. A circuit implementing a memory management module for a network-on-chip message buffer with 4 simultaneous read/write channels, characterized in that: the external interface of the memory management module comprises 4 write data buses Dwr0(4:0)~Dwr3(4:0) and 4 read data buses Drd0(4:0)~Drd3(4:0); 4 write control signals WR0~WR3 associated with the 4 write data buses; 4 read control signals RD0~RD3 and read status indication signals RE0~RE3 associated with the 4 read data buses (RE0~RE3 indicate whether valid data is present on the corresponding read data bus Drd0(4:0)~Drd3(4:0)); and a reset signal RST and a clock signal CLK; internally, the memory management module consists of four modules: the write address generation module IMG_WrAGen, the read address generation module IMG_RdAGen, the table entry memory module IMG_Mem5x32 and the queue length calculation module IMG_QueLen; the connections are as follows:

The table entry memory module IMG_Mem5x32 is connected through its Awr0(4:0)~Awr3(4:0) interfaces to the Awr0(4:0)~Awr3(4:0) interfaces of the write address generation module IMG_WrAGen, and through its Ard0(4:0)~Ard3(4:0) interfaces to the Ard0(4:0)~Ard3(4:0) interfaces of the read address generation module IMG_RdAGen; the queue length calculation module IMG_QueLen is connected through its Tail(4:0) interface to the CurTail(4:0) interface of the write address generation module IMG_WrAGen, through its Len(4:0) interface to the QuLen(4:0) interface of the read address generation module IMG_RdAGen, and through its Head(4:0) interface to the Head(4:0) interface of the read address generation module IMG_RdAGen; the external interfaces RST and CLK are connected to three modules, namely the write address generation module IMG_WrAGen, the read address generation module IMG_RdAGen and the table entry memory module IMG_Mem5x32; the write control signals WR0~WR3 are connected to the write address generation module IMG_WrAGen and the table entry memory module IMG_Mem5x32; the read control signals RD0~RD3 and the read status indication signals RE0~RE3 are connected to the read address generation module IMG_RdAGen; the 4 write data buses Dwr0(4:0)~Dwr3(4:0) and the 4 read data buses Drd0(4:0)~Drd3(4:0) are connected to the table entry memory module IMG_Mem5x32;

The write address generation module IMG_WrAGen consists of five 5-to-1 multiplexers IMG_M5_1x5e (modules TailSel and CH0~CH3), an address decoding module IMG_SelGenWR, an OR gate OR4, a latch IMG_FDx5pe (module TL) and an address increment adder IMG_AddressInc; the 5 IMG_M5_1x5e modules are connected to each other through D0(4:0)~D4(4:0); the E interface of module TailSel is tied to high level, and TailSel and modules CH0~CH3 are connected through their respective S(2:0) interfaces to the interfaces TailSel(2:0) and AwrSet0(2:0)~AwrSet3(2:0) of module IMG_SelGenWR; IMG_SelGenWR is also connected to the external write control signals WR0~WR3 through interfaces WrCtrl0~WrCtrl3 and, mainly by decoding WR0~WR3, provides the write address select signal of each channel and the select signal used to calculate the next tail address Tail; the latch IMG_FDx5pe latches the current tail address and at the same time isolates it from the next tail address; it is connected through its D(4:0) interface to the O(4:0) interface of the 5-to-1 multiplexer TailSel and through Q(4:0) to the Din(4:0) interface of the address increment adder IMG_AddressInc; the external write control signals WR0~WR3 are connected to its CE interface through the OR gate OR4, and are also connected to the E interfaces of the 5-to-1 multiplexers CH0~CH3 respectively; IMG_AddressInc is an address increment adder that provides the results Din+1, Din+2, Din+3 and Din+4; it takes the current tail address CurTail of the circular queue as input and outputs CurTail+0, CurTail+1, CurTail+2, CurTail+3 and CurTail+4, and these values are connected through interfaces O0(4:0)~O4(4:0) to the corresponding interfaces D0(4:0)~D4(4:0) of the five 5-to-1 multiplexers;

The read address generation module IMG_RdAGen consists of five 5-to-1 multiplexers IMG_M5_1x5e (modules HS and CH0~CH3), an address decoding module IMG_SelGenRD (OpSel), an OR gate OR4, a latch IMG_FDPEx5 (module HeadLatch), an address increment adder IMG_AddressInc and 4 IMG_RdALatch modules (modules C0~C3); the D0(4:0)~D4(4:0) interfaces of the 5 IMG_M5_1x5e modules are connected to each other and to the O0(4:0)~O4(4:0) interfaces of IMG_AddressInc; IMG_AddressInc is connected through its Din(4:0) interface to the Q(4:0) interface of module HeadLatch; module HS is connected through its O(4:0) interface to the D(4:0) interface of module HeadLatch; module OpSel generates the various select signals for the read table entry addresses and the head pointer; it is connected through its HSel(2:0) and ArdSet0(2:0)~ArdSet3(2:0) interfaces to the S(2:0) interfaces of module HS and modules CH0~CH3 respectively, and through its PreRE0~PreRE3 interfaces to the NxtRE interfaces of modules C0~C3 respectively; modules CH0~CH3 are connected through their respective O(4:0) interfaces to the NxtA(4:0) interfaces of the corresponding modules C0~C3; the external interfaces RST and CLK are connected to the corresponding interfaces of module HeadLatch and modules C0~C3; the external read control signals RD0~RD3 are connected to the RdCtrl0~RdCtrl3 interfaces of module OpSel, to the CE interfaces of the corresponding modules C0~C3 and to the E interfaces of modules CH0~CH3, and these four read control signal lines are also connected, after the OR gate OR4, to the CE interface of module HeadLatch; modules C0~C3 output the initialization signals Pre0(31:0)~Pre3(31:0) and the signals Ard0(4:0)~Ard3(4:0) to the table entry memory module, and output the signals RE0~RE3 externally;

The table entry memory module IMG_Mem5x32 consists of 5 IMG_Mem1x32 modules connected in parallel, each of which stores one of the 5 bits of every table entry; an IMG_Mem1x32 module consists of 4 expansion modules IMG_D5_32E (modules CHwr0~CHwr3), an address and data merging module IMG_WrExt4P1x32 (module CHwrExt), a memory IMG_FDPEx32 (module MEM) and a data read module IMG_RdExt4P1x32 (module CHrdExt); the external lines WR0~WR3 and Awr0(4:0)~Awr3(4:0) are connected to the E interfaces and A(4:0) interfaces of modules CHwr0~CHwr3 respectively; the external lines Dwr0~Dwr3 are connected to the Din0~Din3 interfaces of module CHwrExt; the initialization address signal Pre(31:0) and the external signals RST and CLK are connected to the Pre(31:0), Reset and C interfaces of module MEM; the external lines Ard0(4:0)~Ard3(4:0) are connected as inputs to the A0(4:0)~A3(4:0) interfaces of module CHrdExt; modules CHwr0~CHwr3 are connected through their respective D(31:0) ports to WrCtrl0(31:0)~WrCtrl3(31:0) of module CHwrExt; module CHwrExt is connected through its Data(31:0) and WrEn(31:0) interfaces to the D(31:0) and CE(31:0) interfaces of module MEM; module MEM is connected through its Q(31:0) interface to the D(31:0) interface of module CHrdExt;

The queue length calculation module IMG_QueLen consists of two 4-bit subtracter ADSU4 modules connected in series; the external tail pointer data lines Tail(0)~Tail(3) and head pointer data lines Head(0)~Head(3) of the circular queue are connected to the A0~A3 and B0~B3 interfaces of the upper ADSU4 submodule respectively, and the tail pointer data line Tail(4) and head pointer data line Head(4) are connected to the A0 and B0 interfaces of the lower ADSU4 submodule; the S0~S3 interfaces of the upper ADSU4 module produce the queue length lines Len(0)~Len(3), and the S0 interface of the lower ADSU4 submodule produces the queue length line Len(4); the ADD interface of the upper ADSU4 submodule and the A2, A3, B1~B3 and ADD interfaces of the lower ADSU4 submodule are connected to the " " signal; in addition, the upper ADSU4 submodule is connected through its CO interface to the C1 interface of the lower ADSU4 submodule, and its C1 interface and the A1 interface of the lower ADSU4 submodule are connected to VCC; the OFL interface of the upper ADSU4 submodule and the OFL, CO and S1~S3 interfaces of the lower ADSU4 submodule are all left unconnected.