CN102013984B

CN102013984B - Two-dimensional mesh network-on-chip system

Info

Publication number: CN102013984B
Application number: CN2010105072008A
Authority: CN
Inventors: 蔡觉平; 魏洁; 李赞; 姚磊; 王韶力; 郝跃
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2010-10-14
Filing date: 2010-10-14
Publication date: 2012-05-09
Anticipated expiration: 2030-10-14
Also published as: CN102013984A

Abstract

The invention discloses a two-dimensional mesh network-on-chip system that addresses the long transmission delay and high power consumption of multi-core systems-on-chip when they process massive amounts of data. In the technical scheme, the level-2 cache L2 is placed outside the cores, and a new switch with a memory access port is used, so that L2 exchanges data with the processing units PE through the memory access port of the switch and is shared by all PEs. A write/read operation between PEs, which the traditional two-dimensional mesh network-on-chip performs directly, is divided into two steps: first from a PE to the shared L2, and then from the shared L2 to a PE. The system relieves the congestion among PEs caused by concentrated read/write requests and reduces the transmission time and power consumption of the network-on-chip. It is intended for processing large-scale data.

Description

Two-dimensional mesh network-on-chip

Technical field

The invention belongs to the technical field of integrated circuits and relates to the network-on-chip structure of multi-core processor chips. It can be used to process the large-scale data produced by multimedia technology, wireless applications and similar workloads.

Background art

A network-on-chip (NoC) applies interconnection-network design to systems-on-chip to solve the problem of communication between on-chip components. Compared with traditional structures such as buses and crossbars, it offers high reliability, strong scalability and low power consumption.

The traditional two-dimensional mesh network-on-chip has a regular structure, is simple and easy to implement, and has good durability. It is therefore the most commonly used network-on-chip structure in current research; its structure is shown in Fig. 1. Each routing node is connected to one core and to four adjacent routing nodes, and each routing node is a switch S. Each core integrates a level-2 cache L2, a processing unit PE, a level-1 cache L1 and a network adapter NI.

The structure of switch S is shown in Fig. 2. Switch S consists of four I/O ports (North, South, East, West), a processing-unit access port (PE port), five multiplexers MUX, five selection units, five FIFO queues (Queue) and a crossbar array. Each of the North, South, East, West and PE ports consists of an input port and an output port. Each input port is connected to its own FIFO queue; each output port is connected to the MUX in its direction; each MUX is also connected to the selection unit in its direction and, through the crossbar array, to the MUXes and FIFO queues of all other directions.

Switch S transfers data from an input port to one or more output ports, which implements data transfer across the network-on-chip. The transfer proceeds as follows: data enters at an input port and is buffered in that port's FIFO queue; the crossbar array then determines the transmission path; the MUX, under control of its selection unit, selects among the data arriving through the crossbar; finally, the selected data leaves through the output port.
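The transfer sequence above (FIFO buffering, crossbar path setup, MUX selection under a selection unit, output) can be sketched behaviourally. This is a minimal sketch under stated assumptions, not the patent's circuit: the patent does not say how the selection unit arbitrates, so round-robin arbitration is assumed, and each buffered item is assumed to carry the output port it has already been routed to. The class `Switch` and its methods are hypothetical names.

```python
from collections import deque

# The five ports of the conventional switch S (Fig. 2).
PORTS = ("North", "South", "East", "West", "PE")


class Switch:
    """Behavioural sketch of switch S: one FIFO per input port and, per
    output port, a selection unit driving a MUX, joined by a crossbar.
    Round-robin arbitration is an assumption, not taken from the patent."""

    def __init__(self, ports=PORTS):
        self.ports = ports
        self.queues = {p: deque() for p in ports}   # input FIFO queues
        self.last_grant = {p: -1 for p in ports}    # round-robin pointer per output

    def receive(self, in_port, data, out_port):
        # Input stage: buffer the data in the FIFO of its input port, tagged
        # with the output port the crossbar must connect it to.
        self.queues[in_port].append((out_port, data))

    def step(self):
        """One transfer cycle; returns {output port: (input port, data)}."""
        # Snapshot the FIFO heads so each input sends at most once per cycle.
        heads = {p: q[0][0] for p, q in self.queues.items() if q}
        n = len(self.ports)
        sent = {}
        for out in self.ports:
            # Crossbar: inputs whose head-of-line data requests this output.
            requests = [i for i, p in enumerate(self.ports) if heads.get(p) == out]
            if not requests:
                continue
            # Selection unit: grant the first requester after the last winner.
            start = self.last_grant[out]
            winner = min(requests, key=lambda i: (i - start - 1) % n)
            self.last_grant[out] = winner
            # MUX and output port: forward the selected data.
            in_port = self.ports[winner]
            _, data = self.queues[in_port].popleft()
            sent[out] = (in_port, data)
        return sent


sw = Switch()
sw.receive("West", "a", "East")
sw.receive("North", "b", "East")
print(sw.step())  # {'East': ('North', 'b')}
print(sw.step())  # {'East': ('West', 'a')}
```

Two inputs competing for the East output are served on consecutive cycles; inputs bound for different outputs would both be forwarded in the same cycle.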

Based on Pande's performance model, the network-on-chip transmission-delay models for write/read operations between processing units PE are as follows.

Write operation: as shown in Fig. 3(a), when the i-th processing unit PE_i writes data to the j-th processing unit PE_j, PE_i first sends a write request to PE_j, PE_j then responds to the request, and PE_i then begins writing data to PE_j. The network-on-chip transmission delay T_noc,write of PE_i's write operation can therefore be expressed as:

$$T_{\text{noc,write}} = T_h + T_S + T_C + T_W = H t_r + L/b + T_C + T_W$$

where T_h, T_S, T_C and T_W are the header delay, serialization delay, communication delay and response time, respectively; H is the number of hops, t_r is the routing delay, L is the packet length and b is the bandwidth.

Read operation: as shown in Fig. 3(b), when PE_i reads data from PE_j, PE_i first sends a read request to PE_j, PE_j then responds to the request, and PE_j then begins sending data to PE_i. The network-on-chip transmission delay T_noc,read of PE_i's read operation can therefore be expressed as:

$$T_{\text{noc,read}} = 2T_h + T_S + 2T_C + T_W = 2H t_r + L/b + 2T_C + T_W$$
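The two traditional delay formulas translate directly into code. This sketch assumes all quantities are in consistent units (for example, cycles for delays, bits for L and bits per cycle for b); the parameter values in the usage lines are hypothetical and are not taken from the patent.

```python
def noc_write_delay(H, t_r, L, b, T_C, T_W):
    """Traditional PE-to-PE write: T_h + T_S + T_C + T_W."""
    T_h = H * t_r   # header delay: H hops at routing delay t_r per hop
    T_S = L / b     # serialization delay: packet length over bandwidth
    return T_h + T_S + T_C + T_W


def noc_read_delay(H, t_r, L, b, T_C, T_W):
    """Traditional PE-to-PE read: the request and the data both cross the
    network, so header and communication delays count twice:
    2*T_h + T_S + 2*T_C + T_W."""
    T_h = H * t_r
    T_S = L / b
    return 2 * T_h + T_S + 2 * T_C + T_W


# Hypothetical example: 3 hops, 2 cycles per hop, 64-bit packet,
# 32 bits per cycle, T_C = 4 cycles, T_W = 10 cycles.
print(noc_write_delay(3, 2, 64, 32, 4, 10))  # 22.0
print(noc_read_delay(3, 2, 64, 32, 4, 10))   # 32.0
```

With these values the read costs exactly T_h + T_C = 10 cycles more than the write, matching the extra request traversal in the model.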

In the traditional two-dimensional mesh network-on-chip, read/write requests that concentrate on particular processing units PE cause congestion, and the system must wait for each PE to respond to write/read requests. Both the communication delay T_C and the response time T_W are therefore large, which makes the transmission delay and power consumption of the network-on-chip high. The problem is especially pronounced when processing large-scale data, and the system cannot meet the requirement of processing massive data promptly within a short time.

Summary of the invention

The objective of the invention is to overcome the above shortcomings of the prior art by providing a new two-dimensional mesh network-on-chip that reduces transmission delay and power consumption and thereby meets the requirement of processing massive data promptly within a short time.

The technical idea for achieving this objective is to place the level-2 cache L2 outside the cores and to adopt a new switch with a memory access port, so that L2 is shared, and to replace direct data transfer between processing units PE with a transfer mode that uses L2 as the intermediary, achieving low transmission delay and low power consumption. The whole network-on-chip comprises N cores, N routing nodes (N ≥ 2) and one level-2 cache L2. Each routing node is connected to one core and to four adjacent routing nodes. Each core consists of a processing unit PE, a level-1 cache L1 and a network adapter NI. Each routing node is a switch S, which consists of four I/O ports (North, South, East, West), a memory access port L2port, a processing-unit access port PE port, a crossbar array, six multiplexers MUX, six selection units and six FIFO queues. The level-2 cache L2 is placed outside the cores so that it is shared; it is connected to all routing nodes and exchanges data with the processing unit PE in each core through the memory access port of switch S, achieving low transmission delay.
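The topology just described (N routing nodes in a 2D mesh, one core per node, and a single L2 attached to every node) can be modelled as a small graph. This sketch assumes a rows × cols mesh with row-major node numbering and row 0 at the north edge, and assumes dimension-order (XY) routing so that the hop count H of the delay model is the Manhattan distance between routers; the patent specifies none of these, and `build_mesh` and `hops` are hypothetical names.

```python
def build_mesh(rows, cols):
    """Return {node: {port: neighbour}} for a rows x cols 2D mesh (row-major ids).
    Every node also gets a PE port (its own core) and an L2port (the shared L2)."""
    mesh = {}
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            links = {"PE": ("core", node), "L2port": ("L2", None)}
            # Edge routers simply lack the links that would leave the mesh.
            if r > 0:
                links["North"] = node - cols
            if r < rows - 1:
                links["South"] = node + cols
            if c < cols - 1:
                links["East"] = node + 1
            if c > 0:
                links["West"] = node - 1
            mesh[node] = links
    return mesh


def hops(src, dst, cols):
    """Hop count H under XY routing: Manhattan distance between routers."""
    (r1, c1), (r2, c2) = divmod(src, cols), divmod(dst, cols)
    return abs(r1 - r2) + abs(c1 - c2)


mesh = build_mesh(3, 3)       # N = 9 cores and routing nodes
print(sorted(mesh[4]))        # ['East', 'L2port', 'North', 'PE', 'South', 'West']
print(hops(0, 8, cols=3))     # 4
```

The centre router of a 3 × 3 mesh has all six ports of the new switch, while corner routers have only two mesh neighbours.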

The processing unit PE and level-1 cache L1 in each core are connected to other routing nodes through the four I/O ports of switch S, and to the shared level-2 cache L2 outside the core through the memory access port of switch S. A write/read operation is thus performed in two steps: first from the i-th processing unit PE_i to the shared L2, then from the shared L2 to the j-th processing unit PE_j.
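The ordering of the two-step operation can be illustrated with a toy shared L2 that has one region per PE, so that region j plays the role of the part of L2 used for PE_j. This is a minimal sketch assuming a dictionary-backed store and per-PE address spaces; `SharedL2`, `write` and `read` are hypothetical names and say nothing about the real cache organization. It only shows that PE_i's write completes at L2 without any action by PE_j, which picks the data up later.

```python
class SharedL2:
    """Toy shared level-2 cache: one region per PE (region j stands for L2_j)."""

    def __init__(self, num_pes):
        self.regions = [dict() for _ in range(num_pes)]

    def write(self, region, addr, value):
        self.regions[region][addr] = value

    def read(self, region, addr):
        return self.regions[region][addr]


l2 = SharedL2(num_pes=4)

# PE_i writes to PE_j with i = 0, j = 3.
# Step 1: PE_0 writes into region 3 of the shared L2; PE_3 takes no part,
# so nothing waits for PE_3 to respond.
l2.write(region=3, addr=0x10, value=42)
# Step 2: PE_3 later fetches the data from its region of the shared L2.
print(l2.read(region=3, addr=0x10))  # 42

# PE_i reads from PE_j with i = 1, j = 2: PE_2 has already placed its data
# in region 2, so PE_1 reads it directly from L2 without waiting for PE_2.
l2.write(region=2, addr=0x20, value=7)
print(l2.read(region=2, addr=0x20))  # 7
```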

Each of the North, South, East and West I/O ports, the memory access port L2port and the processing-unit access port PE port consists of an input port and an output port. Each input port is connected to the FIFO queue in its direction; each output port is connected to the MUX in its direction; each MUX is connected, through the crossbar array, to the MUXes and FIFO queues of all other directions, and also to the selection unit in its own direction.

Compared with the prior art, the invention has the following advantages:

(1) The switch includes a memory access port that connects the PE and L1 inside each core to the shared L2 outside the cores, and four I/O ports that connect the core and L2 to other routing nodes. A write/read operation between PEs, performed directly in the traditional two-dimensional mesh network-on-chip, is divided into two steps: from a PE to L2, and then from L2 to a PE. This relieves the congestion caused by overly concentrated read/write requests to PEs and reduces the communication delay between PEs, which in turn reduces the transmission delay of the network-on-chip and also lowers power consumption.

(2) Because L2 is placed outside the cores and shared, and the shared L2 exchanges data with the PEs through the memory access port of the switch, there is no response time T_W. This further reduces the transmission delay and power consumption of the network-on-chip and meets the requirement of processing massive data promptly within a short time.

Description of drawings

Fig. 1 is a schematic diagram of the structure of a conventional two-dimensional mesh network-on-chip system;

Fig. 2 is a schematic diagram of the switch structure in the conventional two-dimensional mesh network-on-chip;

Fig. 3 is a schematic diagram of the read/write operation delay model of processing units PE in the conventional two-dimensional mesh network-on-chip;

Fig. 4 is a schematic diagram of the structure of the two-dimensional mesh network-on-chip of the invention;

Fig. 5 is a schematic diagram of the switch structure in the two-dimensional mesh network-on-chip of the invention;

Fig. 6 is a schematic diagram of the read/write operation delay model of processing units PE in the two-dimensional mesh network-on-chip of the invention.

Embodiment

Referring to Fig. 4, the two-dimensional mesh network-on-chip of the invention consists of N cores, N routing nodes (N ≥ 2) and one level-2 cache L2. Each routing node is connected to one core and to four adjacent routing nodes. Each core consists of a processing unit PE, a level-1 cache L1 and a network adapter NI. The level-2 cache L2, which the traditional structure integrates into each core, is instead placed outside the cores and connected to all routing nodes, so that it is shared. The shared L2 is connected to the PE and L1 in each core through the memory access port L2port of switch S, enabling the two-step write/read operation: first from the i-th processing unit PE_i to the shared L2, then from the shared L2 to the j-th processing unit PE_j. Each routing node is a switch S, whose structure is shown in Fig. 5.

Referring to Fig. 5, switch S of the invention comprises four I/O ports (North, South, East, West), a memory access port L2port, a processing-unit access port PE port, six multiplexers MUX, six selection units, six FIFO queues and a crossbar array. Each of the North, South, East and West ports, L2port and PE port consists of an input port and an output port. Each input port is connected to the FIFO queue in its direction; each output port is connected to the MUX in its direction; each MUX is also connected to the selection unit in its direction and, through the crossbar array, to the MUXes and FIFO queues of all other directions.
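The wiring rule in this paragraph (every input FIFO reaches the MUX of every other direction through the crossbar) fixes how many crossbar connections the switch needs. A quick count, assuming exactly that rule and no path from a port back to itself, shows what adding L2port costs; `crossbar_links` is a hypothetical helper.

```python
from itertools import permutations

CONVENTIONAL = ("North", "South", "East", "West", "PE")          # Fig. 2
PROPOSED = ("North", "South", "East", "West", "PE", "L2port")    # Fig. 5


def crossbar_links(ports):
    """(input FIFO, output MUX) pairs: each FIFO feeds the MUX of every
    other direction through the crossbar, never its own direction."""
    return list(permutations(ports, 2))


print(len(crossbar_links(CONVENTIONAL)))             # 20
print(len(crossbar_links(PROPOSED)))                 # 30
print(("PE", "L2port") in crossbar_links(PROPOSED))  # True
```

Going from five to six ports raises the crosspoint count from 5 × 4 to 6 × 5, and the PE port to L2port path used for PE and L2 data exchange is one of the new connections.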

Switch S transfers data from an input port to one or more output ports. The process is: data enters at an input port and is buffered in the FIFO queue of that direction; the crossbar array determines the transmission path of the data; the MUX, under control of its selection unit, then selects among the data arriving through the crossbar; finally, the selected data leaves through the output port. When data is transferred between the PE port and L2port, the network-on-chip carries out a data exchange between the processing unit PE and the shared level-2 cache L2.
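The output-port decision inside the six-port switch can be sketched as a routing function. The patent does not state a routing algorithm, so this sketch assumes XY (dimension-order) routing on a row-major mesh with row 0 at the north edge, and assumes that traffic for L2_j (the part of the shared L2 used for PE_j) travels to router j and leaves through that router's L2port; `route`, `path` and the ("PE", j) / ("L2", j) destination encoding are hypothetical.

```python
def route(current, dest, cols):
    """Pick the output port of switch S at router `current`.

    dest is ("PE", j) for processing unit PE_j or ("L2", j) for L2_j,
    assumed to be reached through router j's L2port.
    """
    kind, target = dest
    (r, c), (tr, tc) = divmod(current, cols), divmod(target, cols)
    if c < tc:
        return "East"      # correct the column (X) first
    if c > tc:
        return "West"
    if r < tr:
        return "South"     # then correct the row (Y)
    if r > tr:
        return "North"
    # Arrived at router j: eject to the shared L2 or to the local PE.
    return "L2port" if kind == "L2" else "PE"


def path(src, dest, cols):
    """Follow route() hop by hop; returns the output ports used, ending with ejection."""
    step = {"East": 1, "West": -1, "South": cols, "North": -cols}
    ports, node = [], src
    while True:
        port = route(node, dest, cols)
        ports.append(port)
        if port in ("PE", "L2port"):
            return ports
        node += step[port]


# PE_0 writes to L2_5 in a 3 x 3 mesh: three hops, then out through L2port.
print(path(0, ("L2", 5), cols=3))  # ['East', 'East', 'South', 'L2port']
# At router 5 itself the data goes straight from the PE port to L2port.
print(path(5, ("L2", 5), cols=3))  # ['L2port']
```

The second call is the local case described in the paragraph above: a transfer between the PE port and L2port of one switch, which is the PE and shared-L2 data exchange.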

The effect of the invention is further illustrated by the following theoretical analysis and simulation results.

1. Theoretical analysis

In the invention, a write/read operation between processing units PE is divided into a network-on-chip transmission process from a PE to L2 and a data-reception process from L2 to a PE. The PE response time T_W, which affects the network-on-chip transmission time in the traditional structure, affects only the data-reception time in the new structure and does not affect the network-on-chip transmission time. Only the network-on-chip transmission time is considered in this analysis.

Referring to Fig. 6, the delay models for write/read operations between the i-th processing unit PE_i and the j-th processing unit PE_j in the network-on-chip of the invention are established as follows.

Write operation: as shown in Fig. 6(a), when PE_i writes data to PE_j, PE_i first sends a write request to the level-2 cache L2_j allocated to PE_j and then writes the data to L2_j. The network-on-chip transmission delay T_SM,noc,write of PE_i's write operation is:

$$T_{\text{SM,noc,write}} = T_h + T_S + T_C = H t_r + L/b + T_C \tag{1}$$

where T_h, T_S and T_C are the header delay, serialization delay and communication delay, respectively; H is the number of hops, t_r is the routing delay, L is the packet length and b is the bandwidth.

Read operation: as shown in Fig. 6(b), when PE_i reads data from PE_j, PE_i first sends a read request to the level-2 cache L2_j allocated to PE_j and then reads the data directly from L2_j. The network-on-chip transmission delay T_SM,noc,read of PE_i's read operation is:

$$T_{\text{SM,noc,read}} = 2T_h + T_S + 2T_C = 2H t_r + L/b + 2T_C \tag{2}$$
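Equations (1) and (2) can be evaluated the same way as the traditional model; the only structural difference is the absence of T_W. This sketch assumes consistent units (cycles for delays, bits for L, bits per cycle for b), and the example values are hypothetical, not from the patent. Keeping T_C as a parameter lets one plug in a smaller communication delay for the shared-L2 scheme, which the patent argues it achieves by relieving congestion.

```python
def sm_write_delay(H, t_r, L, b, T_C):
    """Shared-L2 scheme, PE_i -> L2_j write, eq. (1): T_h + T_S + T_C (no T_W)."""
    T_h = H * t_r   # header delay: H hops at routing delay t_r per hop
    T_S = L / b     # serialization delay: packet length over bandwidth
    return T_h + T_S + T_C


def sm_read_delay(H, t_r, L, b, T_C):
    """Shared-L2 scheme, PE_i <- L2_j read, eq. (2): 2*T_h + T_S + 2*T_C (no T_W)."""
    T_h = H * t_r
    T_S = L / b
    return 2 * T_h + T_S + 2 * T_C


# Hypothetical example: 3 hops, 2 cycles per hop, 64-bit packet,
# 32 bits per cycle, T_C = 4 cycles.
print(sm_write_delay(3, 2, 64, 32, 4))  # 12.0
print(sm_read_delay(3, 2, 64, 32, 4))   # 22.0
```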

From the background section, the network-on-chip transmission delays T_noc,write and T_noc,read of write/read operations in the traditional network-on-chip are, respectively:

$$T_{\text{noc,write}} = T_h + T_S + T_C + T_W = H t_r + L/b + T_C + T_W \tag{3}$$

$$T_{\text{noc,read}} = 2T_h + T_S + 2T_C + T_W = 2H t_r + L/b + 2T_C + T_W \tag{4}$$

Comparing equation (1) with (3), and (2) with (4): in the invention, the network-on-chip transmission process is a data exchange between PE_i and the shared L2, which does not have to wait for PE_j to respond to the write/read request. There is therefore no response time T_W, and the network-on-chip transmission delay is reduced. Moreover, because data travels first from a PE to the shared L2 and then from L2 to a PE, rather than directly between PEs as in the traditional network-on-chip, the congestion caused by overly concentrated read/write requests between PEs is relieved and the communication delay T_C becomes smaller, further reducing the transmission delay of the network-on-chip.
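Subtracting (1) from (3) and (2) from (4) makes the saving explicit. Here T_C' denotes the (smaller) communication delay of the shared-L2 scheme, a symbol introduced only for this comparison, and H, t_r, L and b are assumed to be the same in both structures:

```latex
\begin{aligned}
T_{\text{noc,write}} - T_{\text{SM,noc,write}}
  &= (H t_r + L/b + T_C + T_W) - (H t_r + L/b + T_C') \\
  &= T_W + (T_C - T_C'), \\
T_{\text{noc,read}} - T_{\text{SM,noc,read}}
  &= (2H t_r + L/b + 2T_C + T_W) - (2H t_r + L/b + 2T_C') \\
  &= T_W + 2\,(T_C - T_C').
\end{aligned}
```

With T_C' ≤ T_C both differences are at least T_W, and a read gains twice the communication-delay reduction of a write because its communication delay is counted twice in the model.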

2. Simulation experiment

The simulation uses the SMIC 0.13 µm process with a 1.1 V supply voltage. Using the OPNET-based MPSOCS simulation software, the transmission delay and power consumption of three decoding algorithms (H.264, M-JPEG and MP3) were simulated on both the traditional two-dimensional mesh network-on-chip and the two-dimensional mesh network-on-chip of the invention. The simulation results are shown in Table 1.

Table 1: Comparison of simulation results

As Table 1 shows, compared with the traditional two-dimensional mesh network-on-chip, the two-dimensional mesh network-on-chip of the invention reduces transmission delay by 37.6% and power consumption by 33.7% on average.

Claims

1. A two-dimensional mesh network-on-chip, comprising N cores, N routing nodes and one level-2 cache L2, where N ≥ 2, each routing node being connected to one core and to four adjacent routing nodes, characterized in that: each core consists of a processing unit PE, a level-1 cache L1 and a network adapter NI; each routing node is a switch S consisting of four I/O ports (North, South, East, West), a processing-unit access port PE port, a memory access port L2port, six multiplexers MUX, six selection units, a crossbar array and six FIFO queues; the level-2 cache L2 is placed outside the cores so that it is shared, is connected to all routing nodes, and exchanges data with the processing unit PE in each core through the memory access port L2port, achieving low transmission delay; the memory access port L2port and the processing-unit access port PE port each consist of an input port and an output port, each input port being connected to the FIFO queue in its direction, each output port being connected to the MUX in its direction, and each MUX being connected through the crossbar array to the MUXes and FIFO queues of all other directions and also to the selection unit in its own direction; the processing unit PE and level-1 cache L1 in each core are connected to other routing nodes through the North, South, East and West I/O ports of switch S and to the shared level-2 cache L2 outside the core through the memory access port L2port of switch S, so that a write/read operation is performed in two steps: first from the i-th processing unit PE_i to the shared L2, then from the shared L2 to the j-th processing unit PE_j.