CN102013984B - Two-dimensional net network-on-chip system - Google Patents

Info

Publication number
CN102013984B
Authority
CN
China
Prior art keywords
processing unit
mux
cache device
kernel
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2010105072008A
Other languages
Chinese (zh)
Other versions
CN102013984A (en)
Inventor
蔡觉平
魏洁
李赞
姚磊
王韶力
郝跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN2010105072008A
Publication of CN102013984A
Application granted
Publication of CN102013984B
Expired - Fee Related (current legal status)
Anticipated expiration

Abstract

The invention discloses a two-dimensional mesh network-on-chip system that addresses the long transmission delay and high power consumption of multi-core systems-on-chip when they process large volumes of data. The technical scheme is as follows: the L2 cache is placed outside the cores; a novel switch with a memory access port is used, so that the L2 cache exchanges data with each processing unit PE through the memory access port in the switch; all processing units PE share the L2 cache; and a write/read operation between processing units PE, as performed in the traditional two-dimensional mesh network-on-chip system, is split into two steps: first from the source processing unit PE to the shared L2 cache, then from the shared L2 cache to the destination processing unit PE. The system relieves the congestion among processing units PE caused by concentrated read/write requests and reduces the transmission delay and power consumption of the network-on-chip system. It is suited to processing large-scale data.

Description

Two-dimensional mesh network-on-chip system
Technical field
The invention belongs to the technical field of integrated circuits and relates to the network-on-chip structure of multi-core processor chips. It can be used to process the large-scale data produced by multimedia technology, wireless applications and the like.
Background art
A network-on-chip (NoC) applies interconnection networks to system-on-chip design in order to solve the problem of communication between on-chip components. Compared with traditional structures such as buses and crossbars, it offers high reliability, strong scalability and low power consumption.
The traditional two-dimensional mesh network-on-chip has a regular structure, is simple and easy to implement, and has good durability. The two-dimensional mesh is therefore the most commonly used network-on-chip structure in current research; its structure is shown in Fig. 1. Each routing node is connected to four adjacent routing nodes and to one core; each routing node is a switch S; and each core integrates an L2 cache, a processing unit PE, an L1 cache and a network adapter NI.
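The mesh connectivity described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `mesh_neighbors` and `hop_count` are hypothetical, the grid is assumed to be indexed by (row, column) with row 0 at the North edge, nodes on the boundary are assumed to simply have fewer neighbours, and the hop count assumes minimal (shortest-path) routing.

```python
def mesh_neighbors(row, col, rows, cols):
    """Return the adjacent routing nodes of node (row, col), keyed by direction.

    Assumption: row 0 is the North edge, column 0 is the West edge, and
    boundary nodes have fewer than four neighbours.
    """
    candidates = {
        "North": (row - 1, col),
        "South": (row + 1, col),
        "East": (row, col + 1),
        "West": (row, col - 1),
    }
    # Keep only the candidates that fall inside the rows x cols grid.
    return {
        direction: (r, c)
        for direction, (r, c) in candidates.items()
        if 0 <= r < rows and 0 <= c < cols
    }


def hop_count(src, dst):
    """Hops between two routing nodes, assuming minimal routing (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])
```

For an interior node of a 3 x 3 mesh, `mesh_neighbors(1, 1, 3, 3)` returns all four directions, while a corner node returns only two.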
The structure of the switch S is shown in Fig. 2. It consists of four input/output ports (North, South, East and West), a processing unit access port (PE port), five multiplexers MUX, five selection units, five FIFO queues and one crossbar switch array. Each of the four I/O ports and the PE port comprises an input port and an output port. Each input port is connected to the FIFO queue for that input port; each output port is connected to the multiplexer MUX for that output direction; each multiplexer MUX is connected to the selection unit for its own direction and, through the crossbar switch array, to the multiplexers MUX and FIFO queues of all other directions.
The switch S transfers data from one input port to one or more output ports, thereby carrying the data traffic of the network-on-chip. The transfer proceeds as follows: data enter through an input port and are buffered by the FIFO queue; the crossbar switch array then determines the transmission path; next, under the control of the selection unit, the multiplexer MUX selects among the data delivered to it; finally, the selected data leave through the output port.
According to Pande's performance model, a network-on-chip transmission delay model is set up for write/read operations between processing units PE:
Write operation: as shown in Fig. 3(a), when the i-th processing unit $PE_i$ writes data to the j-th processing unit $PE_j$, $PE_i$ first sends a write request to $PE_j$, $PE_j$ then responds to the request, and $PE_i$ then begins writing data to $PE_j$. The network-on-chip transmission delay of the write operation of $PE_i$, $T_{\text{noc write}}$, can therefore be expressed as:
$$T_{\text{noc write}} = T_h + T_S + T_C + T_W = H t_r + L/b + T_C + T_W$$
where $T_h$, $T_S$, $T_C$ and $T_W$ are the header delay, serialization delay, communication delay and response time respectively, $H$ is the hop count, $t_r$ is the routing delay, $L$ is the packet length and $b$ is the bandwidth.
Read operation: as shown in Fig. 3(b), when the i-th processing unit $PE_i$ reads data from the j-th processing unit $PE_j$, $PE_i$ first sends a read request to $PE_j$, $PE_j$ then responds to the request, and $PE_j$ then begins sending data to $PE_i$. The network-on-chip transmission delay of the read operation of $PE_i$, $T_{\text{noc read}}$, can therefore be expressed as:
$$T_{\text{noc read}} = 2T_h + T_S + 2T_C + T_W = 2H t_r + L/b + 2T_C + T_W$$
where $T_h$, $T_S$, $T_C$ and $T_W$ are the header delay, serialization delay, communication delay and response time respectively, $H$ is the hop count, $t_r$ is the routing delay, $L$ is the packet length and $b$ is the bandwidth.
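The two delay formulas of the traditional system can be evaluated numerically. The following Python sketch is only an illustration: the class name `LinkParams`, the function names and every numeric value in the usage note are assumptions chosen for the example, not figures from the patent.

```python
from dataclasses import dataclass


@dataclass
class LinkParams:
    """Parameters of the delay model (field names are hypothetical)."""
    hops: int             # H, hop count
    route_delay: float    # t_r, routing delay per hop
    packet_len: float     # L, packet length
    bandwidth: float      # b, bandwidth
    comm_delay: float     # T_C, communication delay
    response_time: float  # T_W, response time of the target PE


def noc_write_delay(p):
    """T_noc_write = H*t_r + L/b + T_C + T_W (traditional PE-to-PE write)."""
    head = p.hops * p.route_delay        # T_h
    serial = p.packet_len / p.bandwidth  # T_S
    return head + serial + p.comm_delay + p.response_time


def noc_read_delay(p):
    """T_noc_read = 2*H*t_r + L/b + 2*T_C + T_W (traditional PE-to-PE read)."""
    head = p.hops * p.route_delay        # T_h
    serial = p.packet_len / p.bandwidth  # T_S
    return 2 * head + serial + 2 * p.comm_delay + p.response_time
```

With the illustrative values `LinkParams(hops=3, route_delay=2.0, packet_len=64.0, bandwidth=16.0, comm_delay=5.0, response_time=10.0)`, the write delay is 6 + 4 + 5 + 10 = 25 and the read delay is 12 + 4 + 10 + 10 = 36, in whatever time unit the inputs use.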
In the traditional two-dimensional mesh network-on-chip system, requests to processing units PE are overly concentrated and cause congestion, and the system must wait for a processing unit PE to respond to each write/read request. The communication delay $T_C$ and response time $T_W$ are therefore large, which makes the transmission delay and power consumption of the network-on-chip large. The problem is particularly evident when processing large-scale data, and the system cannot meet the requirement of processing mass data promptly within a short time.
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art by providing a novel two-dimensional mesh network-on-chip system that reduces transmission delay and power consumption and meets the requirement of processing mass data promptly within a short time.
The technical idea for achieving this object is to place the L2 cache outside the cores and to adopt a novel switch with a memory access port, so that the L2 cache is shared, and to change the data transmission mode between processing units PE into one that uses the L2 cache as an intermediary, thereby achieving low transmission delay and low power consumption. The whole network-on-chip system comprises N cores, N routing nodes (N ≥ 2) and one L2 cache. Each routing node is connected to four adjacent routing nodes and to one core. Each core consists of a processing unit PE, an L1 cache and a network adapter NI. Each routing node is a switch S, which consists of four I/O ports (North, South, East and West), a memory access port (L2 port), a processing unit access port (PE port), a crossbar switch array, six multiplexers MUX, six selection units and six FIFO queues. The L2 cache is placed outside the cores so that it is shared; it is connected to all routing nodes and exchanges data with the processing units PE in the cores through the memory access port in each switch S, achieving low transmission delay.
The processing unit PE and L1 cache in each core are connected to the other routing nodes through the four I/O ports of the switch S, and to the L2 cache outside the cores through the memory access port of the switch S. A write/read operation is thus carried out in two steps: first from the i-th processing unit $PE_i$ to the shared L2 cache, then from the shared L2 cache to the j-th processing unit $PE_j$.
Each of the four I/O ports (North, South, East and West), the memory access port (L2 port) and the processing unit access port (PE port) comprises an input port and an output port. Each input port is connected to the FIFO queue for that input direction; each output port is connected to the multiplexer MUX for that output direction; each multiplexer MUX is connected, through the crossbar switch array, to the multiplexers MUX and FIFO queues of all other directions, and is also connected to the selection unit for its own direction.
Compared with the prior art, the invention has the following advantages:
(1) Because the switch is provided with a memory access port, the processing unit PE and L1 cache in each core are connected to the shared L2 cache outside the cores, and the four I/O ports connect each core and the L2 cache to the other routing nodes. A write/read operation between processing units PE in the traditional two-dimensional mesh network-on-chip system is split into two steps, first from a processing unit PE to the L2 cache and then from the L2 cache to a processing unit PE. This relieves the congestion caused by overly concentrated read/write requests to processing units PE and reduces the communication delay between processing units PE, which in turn reduces the transmission delay of the network-on-chip system; power consumption decreases as well.
(2) Because the L2 cache is placed outside the cores and shared, and the shared L2 cache exchanges data with the processing units PE through the memory access port in the switch, no response time $T_W$ is incurred. This further reduces the transmission delay and power consumption of the network-on-chip system and meets the requirement of processing mass data promptly within a short time.
Description of drawings
Fig. 1 is a schematic diagram of the structure of the traditional two-dimensional mesh network-on-chip system;
Fig. 2 is a schematic diagram of the structure of the switch in the traditional two-dimensional mesh network-on-chip system;
Fig. 3 is a schematic diagram of the read/write delay model of a processing unit PE in the traditional two-dimensional mesh network-on-chip system;
Fig. 4 is a schematic diagram of the structure of the two-dimensional mesh network-on-chip system of the invention;
Fig. 5 is a schematic diagram of the structure of the switch in the two-dimensional mesh network-on-chip system of the invention;
Fig. 6 is a schematic diagram of the read/write delay model of a processing unit PE in the two-dimensional mesh network-on-chip system of the invention.
Embodiment
With reference to Fig. 4, the two-dimensional mesh network-on-chip system of the invention consists of N cores, N routing nodes (N ≥ 2) and one L2 cache. Each routing node is connected to four adjacent routing nodes and to one core. Each core consists of a processing unit PE, an L1 cache and a network adapter NI. The L2 cache, which the traditional structure integrates inside each core, is placed outside the cores and connected to all routing nodes, so that it is shared. The shared L2 cache is connected to the processing unit PE and L1 cache in each core through the memory access port (L2 port) of the switch S, so that a write/read operation is carried out in two steps: first from the i-th processing unit $PE_i$ to the shared L2 cache, then from the shared L2 cache to the j-th processing unit $PE_j$. Each routing node is a switch S, whose structure is shown in Fig. 5.
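The two-step exchange through the shared L2 cache can be sketched in Python as follows. This is a functional illustration only, not the patented hardware: the class `SharedL2`, the functions `pe_write` and `pe_fetch`, and the key/value storage are hypothetical. The one-area-per-PE layout follows the theoretical analysis later in the description, which refers to the L2 cache $L2_j$ allocated to $PE_j$.

```python
class SharedL2:
    """Shared L2 cache modelled as one storage area per processing unit.

    Assumption: area j stands for L2_j, the part of the shared L2 cache
    allocated to PE_j; a dict is used purely for illustration.
    """

    def __init__(self, num_pes):
        self.areas = [dict() for _ in range(num_pes)]

    def write(self, owner, key, value):
        self.areas[owner][key] = value

    def read(self, owner, key):
        # Returns None when nothing has been stored under this key.
        return self.areas[owner].get(key)


def pe_write(l2, dst_pe, key, value):
    """Step 1: the source PE writes into the area allocated to the destination PE.

    The destination PE takes no part in this step, so the source PE
    does not wait for its response.
    """
    l2.write(dst_pe, key, value)


def pe_fetch(l2, pe, key):
    """Step 2: the destination PE later receives the data from its own area."""
    return l2.read(pe, key)
```

A short usage example: after `l2 = SharedL2(4)` and `pe_write(l2, 2, "block0", b"payload")`, the call `pe_fetch(l2, 2, "block0")` returns the payload, while the areas of the other processing units are unaffected.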
With reference to Fig. 5, the switch S of the invention comprises four I/O ports (North, South, East and West), a memory access port (L2 port), a processing unit access port (PE port), six multiplexers MUX, six selection units, six FIFO queues and one crossbar switch array. Each of the four I/O ports, the L2 port and the PE port comprises an input port and an output port. Each input port is connected to the FIFO queue for that input direction; each output port is connected to the multiplexer MUX for that output direction; each multiplexer MUX is connected to the selection unit for its own direction and, through the crossbar switch array, to the multiplexers MUX and FIFO queues of all other directions.
The switch S transfers data from one input port to one or more output ports. The transfer proceeds as follows: data enter through an input port and are buffered by the FIFO queue for that input direction; the crossbar switch array then determines the transmission path of the data; next, under the control of the selection unit, the multiplexer MUX selects among the data delivered to it; finally, the selected data leave through the output port. When data are transferred between the PE port and the L2 port, the network-on-chip system carries out the data exchange between the processing unit PE and the shared L2 cache.
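The data path of the six-port switch can be modelled behaviourally. The Python sketch below is a simplification under stated assumptions: the class `Switch` and its methods are hypothetical, the selection units are modelled as a fixed-priority choice in port order (the patent does not specify an arbitration policy), and one call to `step` stands for one transfer cycle.

```python
from collections import deque

# Port names follow the description; "PE" and "L2" stand for the PE port and L2 port.
PORTS = ("North", "South", "East", "West", "PE", "L2")


class Switch:
    """Behavioural model of switch S: one FIFO per input, one selector per output."""

    def __init__(self, ports=PORTS):
        self.in_queues = {p: deque() for p in ports}  # FIFO queue per input port
        self.outputs = {p: [] for p in ports}         # data that left each output port

    def receive(self, in_port, data, out_ports):
        """Buffer incoming data together with the output port(s) it is destined for."""
        self.in_queues[in_port].append((data, tuple(out_ports)))

    def step(self):
        """Run one cycle: each output port accepts at most one item.

        Assumed arbitration: inputs are served in PORTS order; an input whose
        head item needs an output already taken this cycle keeps waiting.
        """
        busy = set()
        for queue in self.in_queues.values():
            if not queue:
                continue
            data, out_ports = queue[0]
            if busy.isdisjoint(out_ports):
                queue.popleft()
                for out in out_ports:
                    self.outputs[out].append(data)
                busy.update(out_ports)
```

If the PE port and the North port both hold data for the L2 port, the first `step` forwards the North item (earlier in the assumed priority order) and the second `step` forwards the PE item. Data received on the L2 port with `out_ports=["PE", "East"]` is delivered to both outputs in a single cycle.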
The effect of the invention is further illustrated by the following theoretical analysis and simulation results:
1. Theoretical analysis
In the invention, a write/read operation between processing units PE is divided into a network-on-chip transmission process from a processing unit PE to the L2 cache and a data reception process from the L2 cache to a processing unit PE. The response time $T_W$ of the processing unit PE, which affects the network-on-chip transmission time in the traditional structure, affects only the data reception time in the new structure and not the network-on-chip transmission time. The analysis here considers only the network-on-chip transmission time.
With reference to Fig. 6, a delay model is set up for write/read operations from the i-th processing unit $PE_i$ to the j-th processing unit $PE_j$ in the network-on-chip system of the invention, where:
Write operation: as shown in Fig. 6(a), when the i-th processing unit $PE_i$ writes data to the j-th processing unit $PE_j$, $PE_i$ first sends a write request to the L2 cache $L2_j$ allocated to $PE_j$, and then writes the data to $L2_j$. The network-on-chip transmission delay of the write operation of $PE_i$, $T_{\text{SM noc write}}$, is:
$$T_{\text{SM noc write}} = T_h + T_S + T_C = H t_r + L/b + T_C \qquad (1)$$
where $T_h$, $T_S$ and $T_C$ are the header delay, serialization delay and communication delay respectively, $H$ is the hop count, $t_r$ is the routing delay, $L$ is the packet length and $b$ is the bandwidth.
Read operation: as shown in Fig. 6(b), when the i-th processing unit $PE_i$ reads data from the j-th processing unit $PE_j$, $PE_i$ first sends a read request to the L2 cache $L2_j$ allocated to $PE_j$, and then reads the data directly from $L2_j$. The network-on-chip transmission delay of the read operation of $PE_i$, $T_{\text{SM noc read}}$, is:
$$T_{\text{SM noc read}} = 2T_h + T_S + 2T_C = 2H t_r + L/b + 2T_C \qquad (2)$$
where $T_h$, $T_S$ and $T_C$ are the header delay, serialization delay and communication delay respectively, $H$ is the hop count, $t_r$ is the routing delay, $L$ is the packet length and $b$ is the bandwidth.
As given in the background section, the network-on-chip transmission delays of the write/read operations of the traditional network-on-chip system, $T_{\text{noc write}}$ and $T_{\text{noc read}}$, are respectively:
$$T_{\text{noc write}} = T_h + T_S + T_C + T_W = H t_r + L/b + T_C + T_W \qquad (3)$$
$$T_{\text{noc read}} = 2T_h + T_S + 2T_C + T_W = 2H t_r + L/b + 2T_C + T_W \qquad (4)$$
where $T_h$, $T_S$, $T_C$ and $T_W$ are the header delay, serialization delay, communication delay and response time respectively, $H$ is the hop count, $t_r$ is the routing delay, $L$ is the packet length and $b$ is the bandwidth.
Comparing equation (1) with (3) and equation (2) with (4): the network-on-chip transmission process of the invention exchanges data between the i-th processing unit $PE_i$ and the shared L2 cache, and this process does not need to wait for the j-th processing unit $PE_j$ to respond to the write/read request. The response time $T_W$ therefore does not appear, which reduces the transmission delay of the network-on-chip. In addition, the network-on-chip of the invention transfers data first from a processing unit PE to the shared L2 cache and then from the L2 cache to a processing unit PE. Compared with the direct PE-to-PE transmission mode of the traditional network-on-chip, this relieves the congestion caused by overly concentrated read/write requests between processing units PE and makes the communication delay $T_C$ of the network-on-chip smaller, which further reduces the transmission delay of the network-on-chip.
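Subtracting equation (1) from (3) and equation (2) from (4) makes the saving explicit. The following is a sketch under the assumption that $H$, $t_r$, $L$ and $b$ are identical in both systems; the symbols $T_C^{\text{trad}}$ and $T_C^{\text{SM}}$ are introduced here only to distinguish the communication delay of the traditional system from that of the shared-L2 system and do not appear in the original text.

$$T_{\text{noc write}} - T_{\text{SM noc write}} = T_W + \left(T_C^{\text{trad}} - T_C^{\text{SM}}\right)$$

$$T_{\text{noc read}} - T_{\text{SM noc read}} = T_W + 2\left(T_C^{\text{trad}} - T_C^{\text{SM}}\right)$$

If the reduced congestion gives $T_C^{\text{SM}} \le T_C^{\text{trad}}$, as the analysis argues, each saving is at least $T_W$, and the read operation benefits twice from any reduction in communication delay.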
2. Simulation experiment
The simulation experiment uses the SIMC 0.13 µm process and a 1.1 V supply voltage. MPSOCS simulation system software based on OPNET is used to simulate the transmission delay and power consumption of three decoding algorithms (H.264, M-JPEG and MP3) on the traditional two-dimensional mesh network-on-chip system and on the two-dimensional mesh network-on-chip system of the invention. The simulation results are shown in Table 1.
Table 1. Comparison of simulation results
[Table 1 is an image in the original publication (Figure BDA0000028249200000071) and is not reproduced here.]
As Table 1 shows, compared with the traditional two-dimensional mesh network-on-chip system, the two-dimensional mesh network-on-chip system of the invention reduces transmission delay by 37.6% and power consumption by 33.7% on average.

Claims (1)

1. A two-dimensional mesh network-on-chip system, comprising N cores, N routing nodes and one L2 cache, where N ≥ 2, each routing node being connected to four adjacent routing nodes and to one core, characterized in that: each core consists of a processing unit PE, an L1 cache and a network adapter NI; each routing node is a switch S, which consists of four I/O ports (North, South, East and West), a processing unit access port (PE port), a memory access port (L2 port), six multiplexers MUX, six selection units, a crossbar switch array and six FIFO queues; the L2 cache is placed outside the cores so that it is shared, is connected to all routing nodes, and exchanges data with the processing units PE in the cores through the memory access port (L2 port), achieving low transmission delay; the memory access port (L2 port) and the processing unit access port (PE port) each comprise an input port and an output port; each input port is connected to the FIFO queue for that input direction; each output port is connected to the multiplexer MUX for that output direction; each multiplexer MUX is connected, through the crossbar switch array, to the multiplexers MUX and FIFO queues of all other directions, and is also connected to the selection unit for its own direction; the processing unit PE and L1 cache in each core are connected to the other routing nodes through the North, South, East and West I/O ports of the switch S, and to the shared L2 cache outside the cores through the memory access port (L2 port) of the switch S, so that a write/read operation is carried out in two steps: first from the i-th processing unit $PE_i$ to the shared L2 cache, then from the shared L2 cache to the j-th processing unit $PE_j$.
CN2010105072008A 2010-10-14 2010-10-14 Two-dimensional net network-on-chip system Expired - Fee Related CN102013984B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105072008A CN102013984B (en) 2010-10-14 2010-10-14 Two-dimensional net network-on-chip system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010105072008A CN102013984B (en) 2010-10-14 2010-10-14 Two-dimensional net network-on-chip system

Publications (2)

Publication Number Publication Date
CN102013984A CN102013984A (en) 2011-04-13
CN102013984B true CN102013984B (en) 2012-05-09

Family

ID=43844014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105072008A Expired - Fee Related CN102013984B (en) 2010-10-14 2010-10-14 Two-dimensional net network-on-chip system

Country Status (1)

Country Link
CN (1) CN102013984B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103188158B (en) * 2011-12-28 2016-07-20 清华大学 A kind of network-on-chip router and method for routing
CN102868604B (en) * 2012-09-28 2015-05-06 中国航空无线电电子研究所 Two-dimension Mesh double buffering fault-tolerant route unit applied to network on chip
CN105812063B (en) * 2016-03-22 2018-08-03 西安电子科技大学 Network on mating plate system based on statistic multiplexing and communication means
CN108897701B (en) * 2018-06-20 2020-07-14 珠海市杰理科技股份有限公司 cache storage device
CN113162906B (en) * 2021-02-26 2023-04-07 西安微电子技术研究所 NoC transmission method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101232456A (en) * 2008-01-25 2008-07-30 浙江大学 Distributed type testing on-chip network router
CN101383712A (en) * 2008-10-16 2009-03-11 电子科技大学 Routing node microstructure for on-chip network
CN101582854A (en) * 2009-06-12 2009-11-18 华为技术有限公司 Data exchange method, device and system thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070280224A1 (en) * 2006-06-05 2007-12-06 Via Technologies System and method for an output independent crossbar
US8102884B2 (en) * 2008-10-15 2012-01-24 International Business Machines Corporation Direct inter-thread communication buffer that supports software controlled arbitrary vector operand selection in a densely threaded network on a chip

Also Published As

Publication number Publication date
CN102013984A (en) 2011-04-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120509

Termination date: 20151014

EXPY Termination of patent right or utility model