CN102013984A - Two-dimensional net network-on-chip system - Google Patents


Publication number
CN102013984A
Authority
CN
China
Prior art keywords
processing unit
mux
cache device
kernel
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010105072008A
Other languages
Chinese (zh)
Other versions
CN102013984B (en)
Inventor
蔡觉平
魏洁
李赞
姚磊
王韶力
郝跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN2010105072008A
Publication of CN102013984A
Application granted
Publication of CN102013984B
Expired - Fee Related
Anticipated expiration


Abstract

The invention discloses a two-dimensional mesh network-on-chip system that addresses the long transmission delay and high power consumption of multi-core systems-on-chip when processing massive data. The technical scheme is as follows: the level-2 cache L2 is placed outside the cores; a novel switch with a memory access port is used, so that the L2 cache exchanges data with the processing units PE through the memory access port of the switch; all processing units PE share the L2 cache; and each write/read operation between processing units PE in the conventional two-dimensional mesh network-on-chip system is split into two steps, first from a processing unit PE to the shared L2 cache and then from the shared L2 cache to a processing unit PE. The system relieves the congestion among processing units PE caused by concentrated read/write requests, reduces the transmission delay and power consumption of the network-on-chip system, and is suited to processing large-scale data.

Description

Two-dimensional mesh network-on-chip system
Technical field
The invention belongs to the technical field of integrated circuits and relates to the network-on-chip structure of multi-core processor chips. It can be used to process the large-scale data produced by multimedia, wireless, and similar applications.
Background
A network-on-chip (NoC) applies interconnection networks to system-on-chip design to solve the problem of communication between on-chip components. Compared with conventional structures such as buses and crossbars, it offers high reliability, good scalability, and low power consumption.
The conventional two-dimensional mesh network-on-chip has a regular structure, is simple and easy to implement, and has good durability; it is therefore the most commonly used network-on-chip structure in current research. Its structure is shown in Fig. 1. Each routing node is connected to its four adjacent routing nodes and to one core, and each routing node is a switch S. In each core, the level-2 cache L2 is integrated together with the processing unit PE, the level-1 cache L1, and the network adapter NI.
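The connectivity described above can be sketched in a few lines of Python. This is only an illustration: the function name `mesh_neighbors`, the (row, col) coordinates, and the convention that row 0 is the northern edge are assumptions, as is the treatment of edge nodes, which in a mesh without wrap-around have fewer than four neighbours.

```python
from typing import Dict, Tuple


def mesh_neighbors(row: int, col: int, rows: int, cols: int) -> Dict[str, Tuple[int, int]]:
    """Return the routing nodes adjacent to (row, col) in a rows x cols mesh.

    Keys are the port names used in the text. Row 0 is assumed to be the
    northern edge; nodes on an edge simply lack the corresponding neighbour.
    """
    candidates = {
        "North": (row - 1, col),
        "South": (row + 1, col),
        "West": (row, col - 1),
        "East": (row, col + 1),
    }
    # Keep only the candidates that fall inside the mesh.
    return {
        port: (r, c)
        for port, (r, c) in candidates.items()
        if r in range(rows) and c in range(cols)
    }
```

For an interior node of a 3 x 3 mesh, `mesh_neighbors(1, 1, 3, 3)` yields all four neighbours; for the corner `(0, 0)` it yields only South and East.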
The structure of the switch S is shown in Fig. 2. The switch S consists of four I/O ports (North, South, East, West), a processing unit access port (PE port), five multiplexers MUX, five selection units, five FIFO queues, and one crossbar switch array. Each of the four I/O ports and the PE port consists of an input port and an output port. Each input port is connected to the FIFO queue of that input port; each output port is connected to the multiplexer MUX of that output direction; each multiplexer MUX is also connected to the selection unit of its own direction; and, through the crossbar switch array, each multiplexer MUX is connected to the multiplexers MUX and FIFO queues of all other directions.
The switch S transfers data from one input port to one or more output ports, which is how data moves through the network-on-chip. The transfer proceeds as follows: data enter through an input port and are buffered in that port's FIFO queue; the crossbar switch array then determines the transmission path; the multiplexer MUX, under the control of its selection unit, selects among the data delivered to it; and finally the selected data leave through the output port.
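The data path just described (input FIFO, crossbar, multiplexer under a selection unit, output port) can be modelled with a small Python sketch. It is a toy, cycle-level illustration and not the patented circuit: the class `Switch`, its methods, and in particular the fixed-priority arbitration used by the selection step are assumptions made here for the example.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Port names follow the text; everything else here is illustrative.
PORTS = ("North", "South", "East", "West", "PE")


class Switch:
    """Toy model of the switch S: FIFO queues -> crossbar -> MUX -> output."""

    def __init__(self, ports: Tuple[str, ...] = PORTS) -> None:
        self.ports = ports
        # One FIFO queue per input port buffers incoming data.
        self.queues: Dict[str, Deque[Tuple[str, List[str]]]] = {
            p: deque() for p in ports
        }

    def receive(self, in_port: str, data: str, out_ports: List[str]) -> None:
        """Buffer data arriving on in_port, destined for one or more outputs."""
        self.queues[in_port].append((data, list(out_ports)))

    def step(self) -> Dict[str, str]:
        """Advance one cycle and return {output port: data sent}."""
        # Crossbar: offer the head of each input queue to the outputs it wants.
        requests: Dict[str, List[str]] = {p: [] for p in self.ports}
        for in_port in self.ports:
            if self.queues[in_port]:
                _, pending = self.queues[in_port][0]
                for out_port in pending:
                    requests[out_port].append(in_port)

        # MUX + selection unit: each output picks one requesting input.
        # Fixed priority in port order is an assumption, not from the text.
        sent: Dict[str, str] = {}
        for out_port, inputs in requests.items():
            if inputs:
                data, pending = self.queues[inputs[0]][0]
                sent[out_port] = data
                pending.remove(out_port)

        # An input's head leaves its queue once every destination is served.
        for q in self.queues.values():
            if q and not q[0][1]:
                q.popleft()
        return sent
```

When two inputs contend for the same output, the lower-priority input keeps its data at the head of its FIFO queue and is served on a later call to `step()`, which is one simple way to picture how concentrated requests turn into waiting time.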
Based on Pande's performance model, a network-on-chip transmission delay model is established for write/read operations between processing units PE:
Write operation: as shown in Fig. 3(a), when the i-th processing unit $PE_i$ writes data to the j-th processing unit $PE_j$, $PE_i$ first sends a write request to $PE_j$; $PE_j$ then responds to the request; and $PE_i$ then begins writing data to $PE_j$. The network-on-chip transmission delay of the write operation of $PE_i$, $T_{\mathrm{noc\,write}}$, can therefore be expressed as:
$T_{\mathrm{noc\,write}} = T_h + T_s + T_c + T_w = H t_r + L/b + T_c + T_w$
where $T_h$, $T_s$, $T_c$, and $T_w$ are the header delay, serialization delay, communication delay, and response time, respectively; $H$ is the hop count; $t_r$ is the routing delay; $L$ is the packet length; and $b$ is the bandwidth.
Read operation: as shown in Fig. 3(b), when the i-th processing unit $PE_i$ reads data from the j-th processing unit $PE_j$, $PE_i$ first sends a read request to $PE_j$; $PE_j$ then responds to the request; and $PE_j$ then begins sending data to $PE_i$. The network-on-chip transmission delay of the read operation of $PE_i$, $T_{\mathrm{noc\,read}}$, can therefore be expressed as:
$T_{\mathrm{noc\,read}} = 2T_h + T_s + 2T_c + T_w = 2H t_r + L/b + 2T_c + T_w$
where $T_h$, $T_s$, $T_c$, and $T_w$ are the header delay, serialization delay, communication delay, and response time, respectively; $H$ is the hop count; $t_r$ is the routing delay; $L$ is the packet length; and $b$ is the bandwidth.
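As a worked illustration of the two formulas above, the following Python sketch evaluates them directly. The function names and the sample values are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the patent or its simulations, and any consistent time unit may be used.

```python
def noc_write_delay(H: float, t_r: float, L: float, b: float,
                    T_c: float, T_w: float) -> float:
    """Conventional write delay: H*t_r + L/b + T_c + T_w."""
    return H * t_r + L / b + T_c + T_w


def noc_read_delay(H: float, t_r: float, L: float, b: float,
                   T_c: float, T_w: float) -> float:
    """Conventional read delay: 2*H*t_r + L/b + 2*T_c + T_w.

    The header and communication terms are doubled because the request
    and the returned data each cross the network.
    """
    return 2 * H * t_r + L / b + 2 * T_c + T_w


# Hypothetical values: 3 hops, routing delay 2, packet length 64,
# bandwidth 8, communication delay 5, response time 10.
params = dict(H=3, t_r=2, L=64, b=8, T_c=5, T_w=10)
write_delay = noc_write_delay(**params)  # 6 + 8 + 5 + 10 = 29
read_delay = noc_read_delay(**params)    # 12 + 8 + 10 + 10 = 40
```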
In the conventional two-dimensional mesh network-on-chip system, requests to the processing units PE are overly concentrated and cause congestion, and the system must wait for a processing unit PE to respond to each write/read request. The communication delay $T_c$ and the response time $T_w$ are therefore large, which makes the transmission delay and power consumption of the network-on-chip large. The problem is especially pronounced when processing large-scale data, so the system cannot process massive data in a timely manner.
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art by providing a novel two-dimensional mesh network-on-chip system that reduces transmission delay and power consumption, so that the system can process massive data in a short time.
The technical idea of the invention is to place the level-2 cache L2 outside the cores and to adopt a novel switch with a memory access port, so that the L2 cache is shared and the data transfer mode between processing units PE becomes one that uses the L2 cache as an intermediary, thereby achieving low transmission delay and low power consumption. The whole network-on-chip system comprises N cores, N routing nodes (N ≥ 2), and one level-2 cache L2. Each routing node is connected to its four adjacent routing nodes and to one core. Each core consists of a processing unit PE, a level-1 cache L1, and a network adapter NI. Each routing node is a switch S, which consists of four I/O ports (North, South, East, West), a memory access port L2port, a processing unit access port PE port, a crossbar switch array, six multiplexers MUX, six selection units, and six FIFO queues. The level-2 cache L2 is placed outside the cores so that it is shared; it is connected to all routing nodes and exchanges data with the processing units PE in the cores through the memory access port of the switch S, achieving low transmission delay.
The processing unit PE and level-1 cache L1 in each core are connected to the other routing nodes through the four I/O ports of the switch S, and to the level-2 cache L2 outside the core through the memory access port of the switch S. A write/read operation is thus carried out in two steps: first from the i-th processing unit $PE_i$ to the shared L2 cache, then from the shared L2 cache to the j-th processing unit $PE_j$.
Each of the four I/O ports (North, South, East, West), the memory access port L2port, and the processing unit access port PE port consists of an input port and an output port. Each input port is connected to the FIFO queue of that input direction; each output port is connected to the multiplexer MUX of that output direction; and each multiplexer MUX is connected, through the crossbar switch array, to the multiplexers MUX and FIFO queues of all other directions, and also to the selection unit of its own direction.
Compared with the prior art, the invention has the following advantages:
(1) The switch of the invention provides a memory access port that connects the processing unit PE and level-1 cache L1 in each core to the shared L2 cache outside the cores, and four I/O ports that connect the core and the L2 cache to the other routing nodes. A write/read operation between processing units PE in the conventional two-dimensional mesh network-on-chip system is thereby split into two steps, first from a processing unit PE to the L2 cache and then from the L2 cache to a processing unit PE. This relieves the congestion caused by overly concentrated read/write requests to the processing units PE and reduces the communication delay between them, which reduces the transmission delay of the network-on-chip system and, with it, the power consumption.
(2) Because the L2 cache is placed outside the cores and shared, and exchanges data with the processing units PE through the memory access port of the switch, there is no response time $T_w$. This further reduces the transmission delay and power consumption of the network-on-chip system and allows the system to process massive data in a short time.
Brief description of the drawings
Fig. 1 is a schematic diagram of the structure of the conventional two-dimensional mesh network-on-chip system;
Fig. 2 is a schematic diagram of the structure of the switch in the conventional two-dimensional mesh network-on-chip system;
Fig. 3 is a schematic diagram of the read/write delay model of the processing units PE in the conventional two-dimensional mesh network-on-chip system;
Fig. 4 is a schematic diagram of the structure of the two-dimensional mesh network-on-chip system of the invention;
Fig. 5 is a schematic diagram of the structure of the switch in the two-dimensional mesh network-on-chip system of the invention;
Fig. 6 is a schematic diagram of the read/write delay model of the processing units PE in the two-dimensional mesh network-on-chip system of the invention.
Detailed description of the embodiments
Referring to Fig. 4, the two-dimensional mesh network-on-chip system of the invention consists of N cores, N routing nodes (N ≥ 2), and one level-2 cache L2. Each routing node is connected to its four adjacent routing nodes and to one core. Each core consists of a processing unit PE, a level-1 cache L1, and a network adapter NI. The level-2 cache L2, which is integrated inside the core in the conventional structure, is placed outside the cores and connected to all routing nodes, so that it is shared. The shared L2 cache is connected to the processing unit PE and level-1 cache L1 in each core through the memory access port L2port of the switch S, so that a write/read operation is carried out in two steps: first from the i-th processing unit $PE_i$ to the shared L2 cache, then from the shared L2 cache to the j-th processing unit $PE_j$. Each routing node is a switch S, whose structure is shown in Fig. 5.
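The two-step transfer through the shared L2 cache can be pictured with the following Python sketch. It is a functional illustration under stated assumptions, not the hardware described here: the classes `SharedL2` and `ProcessingUnit`, the string tags used to address data, and the dictionary-backed storage are all hypothetical. The per-PE portion L2_j of the shared cache follows the delay-model discussion later in the description.

```python
from typing import Dict, Optional


class SharedL2:
    """Shared L2 cache with one portion L2_j allocated to each PE_j."""

    def __init__(self, num_pes: int) -> None:
        self.regions: Dict[int, Dict[str, bytes]] = {
            j: {} for j in range(num_pes)
        }

    def write(self, j: int, tag: str, data: bytes) -> None:
        self.regions[j][tag] = data

    def read(self, j: int, tag: str) -> Optional[bytes]:
        return self.regions[j].get(tag)


class ProcessingUnit:
    """A PE that reaches the shared L2 cache through its switch's L2 port."""

    def __init__(self, index: int, l2: SharedL2) -> None:
        self.index = index
        self.l2 = l2

    def write_to(self, j: int, tag: str, data: bytes) -> None:
        """Step 1 of a write to PE_j: deposit the data in L2_j."""
        self.l2.write(j, tag, data)

    def collect(self, tag: str) -> Optional[bytes]:
        """Step 2: PE_j picks the data up from its own portion of L2."""
        return self.l2.read(self.index, tag)

    def read_from(self, j: int, tag: str) -> Optional[bytes]:
        """Read operation: fetch from L2_j without waiting for PE_j."""
        return self.l2.read(j, tag)
```

In this picture the writer never waits for the destination PE: `pe0.write_to(1, "frame", data)` completes against the cache, and PE 1 calls `collect("frame")` whenever it is ready.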
Referring to Fig. 5, the switch S of the invention comprises four I/O ports (North, South, East, West), a memory access port L2port, a processing unit access port PE port, six multiplexers MUX, six selection units, six FIFO queues, and one crossbar switch array. Each of the four I/O ports, the memory access port L2port, and the processing unit access port PE port consists of an input port and an output port. Each input port is connected to the FIFO queue of that input direction; each output port is connected to the multiplexer MUX of that output direction; each multiplexer MUX is also connected to the selection unit of its own direction; and, through the crossbar switch array, each multiplexer MUX is connected to the multiplexers MUX and FIFO queues of all other directions.
The switch S transfers data from one input port to one or more output ports. The transfer proceeds as follows: data enter through an input port and are buffered in the FIFO queue of that input direction; the crossbar switch array then determines the transmission path of the data; the multiplexer MUX, under the control of its selection unit, selects among the data delivered to it; and finally the selected data leave through the output port. When data are transferred between the processing unit access port PE port and the memory access port L2port, the network-on-chip system carries out a data exchange between the processing unit PE and the shared L2 cache.
The effect of the invention is further illustrated by the following theoretical analysis and simulation results:
1. Theoretical analysis
In the invention, a write/read operation between processing units PE is divided into a network-on-chip transmission process, from the processing unit PE to the L2 cache, and a data reception process, from the L2 cache to the processing unit PE. The response time $T_w$ of the processing unit PE, which affects the network-on-chip transmission time in the conventional structure, affects only the data reception time in the new structure and not the network-on-chip transmission time. The analysis here considers only the network-on-chip transmission time.
Referring to Fig. 6, a delay model is established for write/read operations from the i-th processing unit $PE_i$ to the j-th processing unit $PE_j$ in the network-on-chip system of the invention:
Write operation: as shown in Fig. 6(a), when the i-th processing unit $PE_i$ writes data to the j-th processing unit $PE_j$, $PE_i$ first sends a write request to the L2 cache $L2_j$ allocated to $PE_j$, and then writes the data to $L2_j$. The network-on-chip transmission delay of the write operation of $PE_i$, $T_{\mathrm{SM\,noc\,write}}$, is:
$T_{\mathrm{SM\,noc\,write}} = T_h + T_s + T_c = H t_r + L/b + T_c$ (1)
where $T_h$, $T_s$, and $T_c$ are the header delay, serialization delay, and communication delay, respectively; $H$ is the hop count; $t_r$ is the routing delay; $L$ is the packet length; and $b$ is the bandwidth.
Read operation: as shown in Fig. 6(b), when the i-th processing unit $PE_i$ reads data from the j-th processing unit $PE_j$, $PE_i$ first sends a read request to the L2 cache $L2_j$ allocated to $PE_j$, and then reads the data directly from $L2_j$. The network-on-chip transmission delay of the read operation of $PE_i$, $T_{\mathrm{SM\,noc\,read}}$, is:
$T_{\mathrm{SM\,noc\,read}} = 2T_h + T_s + 2T_c = 2H t_r + L/b + 2T_c$ (2)
where $T_h$, $T_s$, and $T_c$ are the header delay, serialization delay, and communication delay, respectively; $H$ is the hop count; $t_r$ is the routing delay; $L$ is the packet length; and $b$ is the bandwidth.
As given in the Background, the network-on-chip transmission delays of the write and read operations in the conventional network-on-chip system, $T_{\mathrm{noc\,write}}$ and $T_{\mathrm{noc\,read}}$, are:
$T_{\mathrm{noc\,write}} = T_h + T_s + T_c + T_w = H t_r + L/b + T_c + T_w$ (3)
$T_{\mathrm{noc\,read}} = 2T_h + T_s + 2T_c + T_w = 2H t_r + L/b + 2T_c + T_w$ (4)
where $T_h$, $T_s$, $T_c$, and $T_w$ are the header delay, serialization delay, communication delay, and response time, respectively; $H$ is the hop count; $t_r$ is the routing delay; $L$ is the packet length; and $b$ is the bandwidth.
Comparing equation (1) with (3) and equation (2) with (4): the network-on-chip transmission process of the invention is a data exchange between the i-th processing unit $PE_i$ and the shared L2 cache. This process does not need to wait for the j-th processing unit $PE_j$ to respond to the write/read request, so there is no response time $T_w$, and the network-on-chip transmission delay is reduced. In addition, the network-on-chip of the invention transfers data first from a processing unit PE to the shared L2 cache and then from the L2 cache to a processing unit PE. Compared with direct transfer between processing units PE in the conventional network-on-chip, this relieves the congestion caused by overly concentrated read/write requests between processing units PE and makes the communication delay $T_c$ smaller, further reducing the network-on-chip transmission delay.
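The comparison can be written out as a short derivation. It is a sketch under two stated assumptions: $H$, $t_r$, $L$, and $b$ are taken to be the same in both systems, and the symbol $T_c'$ is introduced here (it is not in the original) for the smaller communication delay of the shared-L2 structure.

```latex
\begin{align*}
T_{\mathrm{noc\,write}} - T_{\mathrm{SM\,noc\,write}}
  &= (H t_r + L/b + T_c + T_w) - (H t_r + L/b + T_c') \\
  &= T_w + (T_c - T_c') \\[4pt]
T_{\mathrm{noc\,read}} - T_{\mathrm{SM\,noc\,read}}
  &= (2H t_r + L/b + 2T_c + T_w) - (2H t_r + L/b + 2T_c') \\
  &= T_w + 2\,(T_c - T_c')
\end{align*}
```

If the communication delay were unchanged ($T_c' = T_c$), the saving would be exactly the response time $T_w$ for both operations; any reduction in congestion adds $(T_c - T_c')$ to the write saving and twice that to the read saving.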
2. Simulation experiment
The simulation uses the SIMC 0.13 µm process and a 1.1 V supply voltage. OPNET-based MPSoC simulation software is used to simulate the transmission delay and power consumption of three decoding algorithms, H.264, M-JPEG, and MP3, on both the conventional two-dimensional mesh network-on-chip system and the two-dimensional mesh network-on-chip system of the invention. The simulation results are shown in Table 1.
Table 1. Comparison of simulation results
As Table 1 shows, compared with the conventional two-dimensional mesh network-on-chip system, the system of the invention reduces transmission delay by 37.6% and power consumption by 33.7% on average.

Claims (3)

1. A two-dimensional mesh network-on-chip system, comprising N cores, N routing nodes (N ≥ 2), and one level-2 cache L2, each routing node being connected to its four adjacent routing nodes and to one core, characterized in that: each core consists of a processing unit PE, a level-1 cache L1, and a network adapter NI; each routing node is a switch S, which consists of four I/O ports (North, South, East, West), a processing unit access port PE port, a memory access port L2port, six multiplexers MUX, six selection units, a crossbar switch array, and six FIFO queues; and the level-2 cache L2 is placed outside the cores so that it is shared, is connected to all routing nodes, and exchanges data with the processing units PE in the cores through the memory access port L2port, achieving low transmission delay.
2. The two-dimensional mesh network-on-chip system according to claim 1, characterized in that the processing unit PE and level-1 cache L1 in each core are connected to the other routing nodes through the four I/O ports (North, South, East, West) of the switch S, and to the shared level-2 cache L2 outside the core through the memory access port L2port of the switch S, so that a write/read operation is carried out in two steps: first from the i-th processing unit $PE_i$ to the shared L2 cache, then from the shared L2 cache to the j-th processing unit $PE_j$.
3. The two-dimensional mesh network-on-chip system according to claim 1, characterized in that each of the four I/O ports (North, South, East, West), the memory access port L2port, and the processing unit access port PE port consists of an input port and an output port; each input port is connected to the FIFO queue of that input direction; each output port is connected to the multiplexer MUX of that output direction; and each multiplexer MUX is connected, through the crossbar switch array, to the multiplexers MUX and FIFO queues of all other directions, and also to the selection unit of its own direction.
CN2010105072008A 2010-10-14 2010-10-14 Two-dimensional net network-on-chip system Expired - Fee Related CN102013984B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105072008A CN102013984B (en) 2010-10-14 2010-10-14 Two-dimensional net network-on-chip system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010105072008A CN102013984B (en) 2010-10-14 2010-10-14 Two-dimensional net network-on-chip system

Publications (2)

Publication Number Publication Date
CN102013984A true CN102013984A (en) 2011-04-13
CN102013984B CN102013984B (en) 2012-05-09

Family

ID=43844014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105072008A Expired - Fee Related CN102013984B (en) 2010-10-14 2010-10-14 Two-dimensional net network-on-chip system

Country Status (1)

Country Link
CN (1) CN102013984B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102868604A (en) * 2012-09-28 2013-01-09 中国航空无线电电子研究所 Two-dimension Mesh double buffering fault-tolerant route unit applied to network on chip
CN103188158A (en) * 2011-12-28 2013-07-03 清华大学 On-chip network router and on-chip network routing method
CN105812063A (en) * 2016-03-22 2016-07-27 西安电子科技大学 Optical network on chip (ONoC) system based on statistical multiplexing and communication method
CN108897701A (en) * 2018-06-20 2018-11-27 珠海市杰理科技股份有限公司 Cache storage architecture
CN113162906A (en) * 2021-02-26 2021-07-23 西安微电子技术研究所 NoC transmission method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101025822A (en) * 2006-06-05 2007-08-29 威盛电子股份有限公司 Switch system with separate output and its method
CN101232456A (en) * 2008-01-25 2008-07-30 浙江大学 Distributed type testing on-chip network router
CN101383712A (en) * 2008-10-16 2009-03-11 电子科技大学 Routing node microstructure for on-chip network
CN101582854A (en) * 2009-06-12 2009-11-18 华为技术有限公司 Data exchange method, device and system thereof
US20100091787A1 (en) * 2008-10-15 2010-04-15 International Business Machines Corporation Direct inter-thread communication buffer that supports software controlled arbitrary vector operand selection in a densely threaded network on a chip


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103188158A (en) * 2011-12-28 2013-07-03 清华大学 On-chip network router and on-chip network routing method
CN103188158B (en) * 2011-12-28 2016-07-20 清华大学 A kind of network-on-chip router and method for routing
CN102868604A (en) * 2012-09-28 2013-01-09 中国航空无线电电子研究所 Two-dimension Mesh double buffering fault-tolerant route unit applied to network on chip
CN102868604B (en) * 2012-09-28 2015-05-06 中国航空无线电电子研究所 Two-dimension Mesh double buffering fault-tolerant route unit applied to network on chip
CN105812063A (en) * 2016-03-22 2016-07-27 西安电子科技大学 Optical network on chip (ONoC) system based on statistical multiplexing and communication method
CN105812063B (en) * 2016-03-22 2018-08-03 西安电子科技大学 Network on mating plate system based on statistic multiplexing and communication means
CN108897701A (en) * 2018-06-20 2018-11-27 珠海市杰理科技股份有限公司 Cache storage architecture
CN108897701B (en) * 2018-06-20 2020-07-14 珠海市杰理科技股份有限公司 cache storage device
CN113162906A (en) * 2021-02-26 2021-07-23 西安微电子技术研究所 NoC transmission method
CN113162906B (en) * 2021-02-26 2023-04-07 西安微电子技术研究所 NoC transmission method

Also Published As

Publication number Publication date
CN102013984B (en) 2012-05-09


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120509

Termination date: 20151014

EXPY Termination of patent right or utility model