CN104008084A - Extensible 2.5-dimensional multi-core processor architecture - Google Patents

Publication number
CN104008084A
Authority
CN
China
Prior art keywords
chip
sheet
processor
data
interface
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410237881.9A
Other languages
Chinese (zh)
Other versions
CN104008084B (en)
Inventor
虞志益
林杰
朱世凯
俞剑明
周炜
周力君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN201410237881.9A
Publication of CN104008084A
Application granted
Publication of CN104008084B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Multi Processors (AREA)

Abstract

The invention belongs to the technical field of multi-core processors and specifically relates to an extensible 2.5-dimensional (2.5D) multi-core processor architecture. Its core is a multi-core processor chip interconnected by a network-on-chip with a two-dimensional mesh structure, which communicates with extension chips over high-speed data transmission channels provided by SerDes interfaces. In the vertical direction, the processor performs single-word reads and writes and direct memory access (DMA) operations on an off-chip memory through an off-chip memory interface; in the horizontal direction, the processor exchanges control information and data with an off-chip accelerator through an off-chip accelerator interface. By configuring in software the data selectors at the inter-chip interfaces, the architecture also supports multi-core chip extension in both the vertical and horizontal directions. The interconnected chips are bonded onto the same substrate using 2.5D technology and integrated in a single package. The architecture flexibly supports extension of the storage space of a traditional two-dimensional multi-core processor, coupling of a variety of accelerators, and extension of core computing resources; it improves the reusability of chip-level IP and the reconfigurability of system-level design, shortens the design cycle of large chips, and lowers manufacturing cost.

Description

An extensible 2.5D multi-core processor architecture
Technical field
The invention belongs to the technical field of multi-core processors and specifically relates to an extensible 2.5D multi-core processor architecture.
Background
Since Intel released the 4004, the world's first commercial microprocessor chip, in 1971, processor performance has risen steadily, driven by both rapidly advancing integrated-circuit manufacturing technology and pipeline design techniques. On the one hand, under Moore's Law, each new process node lowers channel delay, which raises the processor clock frequency, and the smaller feature size permits higher integration density and more complex circuit designs. On the other hand, processor designers have proposed and implemented many sophisticated pipeline techniques to raise instruction throughput, such as very long instruction word (VLIW) and superscalar designs for exploiting instruction-level parallelism (ILP), and dynamic branch prediction for improving pipeline utilization. However, the continual rise in processor performance has brought a power problem that cannot be ignored. Taking the Intel Pentium 4 as an example, when it runs at 3.8 GHz its power consumption rises above 100 W, which causes severe heating and challenges conventional air cooling. People therefore began to use task-level parallelism across multiple simple cores to match the performance of a single complex processor, and multi-core processor architectures emerged.
Multi-core design lightens the load on each core, which simplifies its circuit design and lowers its frequency and power consumption. By partitioning tasks and processing them in parallel, a multi-core processor achieves higher energy efficiency. Over roughly the past decade, multi-core processor design has flourished, with several clear trends: the number of cores is gradually increasing, memory capacity keeps growing, and accelerators are becoming more varied. These trends also bring shortcomings and challenges: chip area keeps growing, which raises tape-out cost; the physical design workload increases and the design cycle lengthens; and traditional two-dimensional (2D) chips offer limited extensibility and reconfigurability.
The 2.5D packaging technology that has emerged in recent years largely overcomes these shortcomings and challenges. It uses a micro-bump (μ-bump) process to bond already-fabricated multi-core, memory, and accelerator chips onto the same substrate, connects them with transmission lines known as a Through Silicon Interposer (TSI), and finally assembles them inside a single package. Fig. 1 shows a schematic of this process. 2.5D technology thus allows chips inside a package to be connected flexibly and extended freely, and the shorter interconnects give higher inter-chip communication speed and bandwidth.
Summary of the invention
The object of the present invention is to provide an extensible 2.5D multi-core processor architecture that flexibly supports extension of the storage space of a traditional 2D multi-core processor, coupling of multiple accelerators, and extension of core computing resources. Its advantages include improved reusability of chip-level IP, improved reconfigurability of system-level design, a shorter design cycle for large chips, and lower manufacturing cost.
To this end, the present invention proposes a 2.5D multi-core processor architecture whose overall structure is shown in Fig. 2. It consists of a multi-core processor chip, an extension memory chip, and an extension accelerator chip; the chips communicate over high-speed data transmission channels provided by SerDes interfaces, are bonded onto the same substrate by a 2.5D process, and are integrated inside a single package. The core of the architecture is a multi-core processor chip interconnected by a network-on-chip (NoC) with a two-dimensional mesh structure, which communicates with the extension chips over the high-speed channels provided by the SerDes interfaces. In the vertical direction, the processor performs single-word reads and writes and direct memory access (DMA) operations on the off-chip memory through an off-chip memory interface module, which extends the local storage space. In the horizontal direction, the processor exchanges control information and data with the off-chip accelerator through an off-chip accelerator interface module, which extends the set of coupled accelerators. By configuring in software the data selectors (MUXes) at the inter-chip interfaces, the invention also supports multi-core chip extension in both the vertical and horizontal directions.
In the present invention, the multi-core processor is interconnected by a network-on-chip (NoC) with a two-dimensional mesh structure, and every 4 processors form a cluster. Each processor is connected to local on-chip memory. The on-chip memory is divided into two kinds, one storing instructions and the other storing a small amount of data, and both are small in capacity. The data memory can be shared with other cores by message passing over the network-on-chip formed by the routers.
In the present invention, the architecture of the 2.5D storage extension function is shown in Fig. 3. In the vertical direction, the processor performs word reads and writes and DMA operations on the off-chip memory through the off-chip memory interface module. When the off-chip memory interface module detects that the address of a load/store instruction in the pipeline falls within the address space of the local off-chip memory, or when it receives a DMA configuration signal from the processor, it encodes and packs the corresponding command and sends it off-chip through the SerDes interface. Fig. 4 shows the structure of the configuration packet produced by the off-chip memory interface. The several highest bits form the opcode, which the designer can define as word read, word write, DMA read, or DMA write. For word read and word write, the middle bits give the address to access and the several lowest bits are reserved. For DMA read and DMA write, the middle bits give the DMA start address and the several lowest bits give the DMA end address. The off-chip memory controller receives the configuration packet from its own SerDes interface, decodes it, and then controls the word read/write or DMA operation on the off-chip memory.
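The packet layout described here can be illustrated with a small encoder and decoder. This is only a sketch: the patent says "several" bits for each field without giving widths, so the 4-bit opcode, the two 14-bit fields, the opcode numbering, and the names `Op`, `encode_packet`, and `decode_packet` are all assumptions made for illustration.

```python
from enum import IntEnum

# Hypothetical field widths; the patent does not specify them.
OPCODE_BITS = 4
FIELD_BITS = 14
FIELD_MASK = (1 << FIELD_BITS) - 1


class Op(IntEnum):
    """Opcodes for the four operations (numbering is an assumption)."""
    WORD_READ = 0
    WORD_WRITE = 1
    DMA_READ = 2
    DMA_WRITE = 3


def encode_packet(op: Op, addr: int, end_addr: int = 0) -> int:
    """Pack opcode (high bits), address (middle bits), and tail (low bits)."""
    if not 0 <= addr <= FIELD_MASK or not 0 <= end_addr <= FIELD_MASK:
        raise ValueError("address does not fit in the packet field")
    if op in (Op.WORD_READ, Op.WORD_WRITE):
        end_addr = 0  # low bits are reserved for word operations
    return (int(op) << (2 * FIELD_BITS)) | (addr << FIELD_BITS) | end_addr


def decode_packet(word: int):
    """Unpack a configuration packet into (opcode, middle field, tail field)."""
    op = Op(word >> (2 * FIELD_BITS))
    addr = (word >> FIELD_BITS) & FIELD_MASK
    tail = word & FIELD_MASK
    return op, addr, tail
```

For a DMA packet the tail field carries the end address; for a word packet it is forced to zero, matching the reserved bits in the description.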
In the present invention, the 2.5D coupled-accelerator extension function works on the same principle as the storage extension. The difference is that, to balance the chip layout and the distribution of TSI lines in the 2.5D package, the accelerator extension is placed in the horizontal direction. The processor exchanges control information and data with the off-chip accelerator through the off-chip accelerator interface module. The specific control information is encapsulated in a configuration packet, which defines items such as the kind of accelerator to call and the length of the data to be computed. Fig. 5 shows the structure of the configuration packet produced by the off-chip accelerator interface.
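As a rough illustration of such a packet, the sketch below packs an accelerator kind and a data length into one word. The patent does not give the layout in the text, so the 8-bit kind field, the 24-bit length field, the unit of the length, and the name `AcceleratorConfig` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical layout: accelerator kind in the high bits, data length below.
KIND_BITS = 8
LENGTH_BITS = 24
LENGTH_MASK = (1 << LENGTH_BITS) - 1


@dataclass(frozen=True)
class AcceleratorConfig:
    kind: int    # which accelerator to call (numbering is an assumption)
    length: int  # length of the data to compute (unit is an assumption)

    def pack(self) -> int:
        """Encode the control information into a single packet word."""
        if not 0 <= self.kind < (1 << KIND_BITS):
            raise ValueError("kind out of range")
        if not 0 <= self.length <= LENGTH_MASK:
            raise ValueError("length out of range")
        return (self.kind << LENGTH_BITS) | self.length

    @classmethod
    def unpack(cls, word: int) -> "AcceleratorConfig":
        """Decode a packet word on the accelerator side."""
        return cls(kind=word >> LENGTH_BITS, length=word & LENGTH_MASK)
```

A real packet would likely carry more fields; only the two items named in the text are modeled here.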
In the present invention, the interface and configuration of the 2.5D multi-core chip extension function are shown in Fig. 6. At the outer boundary of the processor, a two-to-one data selector (MUX) is placed in each of the vertical and horizontal directions. One input of each MUX is connected to a router; the other input is connected to the off-chip memory interface module (vertical MUX) or to the off-chip accelerator interface module (horizontal MUX). The output of each MUX is connected to a SerDes interface. Software configures the select input of the MUX at the inter-chip interface, which determines whether the vertical or horizontal direction operates in multi-core chip extension mode. Specifically, software configures a multi-core extension configurator module, which produces two kinds of signals. The first is the control signal of the data selector, which switches the MUX output to the router so that the network-on-chip formed by the extended routers realizes the extension of the multi-core system. The second is the router address configuration signal: in a network-on-chip with a two-dimensional mesh structure, each router has a unique address number that the routing algorithm uses for path computation, so new addresses must be configured after the network-on-chip is extended.
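The two kinds of configurator signals can be sketched as two small functions. This is a simplified model under stated assumptions: the names `MuxSelect`, `mux_controls`, and `router_addresses` are hypothetical, and the use of per-chip coordinate offsets to keep addresses unique in the extended mesh is one plausible scheme rather than something the patent specifies.

```python
from enum import Enum


class MuxSelect(Enum):
    INTERFACE = 0  # SerDes carries the off-chip memory/accelerator interface
    ROUTER = 1     # SerDes carries the boundary router (multi-core extension)


def mux_controls(extend_vertical: bool, extend_horizontal: bool) -> dict:
    """First signal kind: select values for the two boundary MUXes."""
    return {
        "vertical": MuxSelect.ROUTER if extend_vertical else MuxSelect.INTERFACE,
        "horizontal": MuxSelect.ROUTER if extend_horizontal else MuxSelect.INTERFACE,
    }


def router_addresses(width: int, height: int,
                     x_offset: int = 0, y_offset: int = 0) -> dict:
    """Second signal kind: a unique (x, y) address for each router on a chip.

    Keys are positions local to the chip; values are addresses in the
    extended mesh, shifted by the chip's offset (an assumed scheme).
    """
    return {
        (x, y): (x + x_offset, y + y_offset)
        for y in range(height)
        for x in range(width)
    }
```

For example, a second 2x2 chip attached to the right of a first 2x2 chip would be configured with `router_addresses(2, 2, x_offset=2)`, so that no address repeats across the two chips.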
The architecture of the present invention flexibly supports extension of the storage space of a traditional 2D multi-core processor, coupling of multiple accelerators, and extension of core computing resources. It greatly improves the reusability of chip-level IP and the reconfigurability of system-level design, shortens the design cycle of large chips, and reduces manufacturing cost. In addition, the use of SerDes interfaces raises the speed and bandwidth of inter-chip communication.
Brief description of the drawings
Fig. 1 is a schematic diagram of 2.5D chip packaging.
Fig. 2 is the overall structure diagram of the extensible 2.5D multi-core processor.
Fig. 3 is the architecture diagram of the storage extension of the 2.5D multi-core processor.
Fig. 4 is a schematic diagram of the configuration packet structure produced by the off-chip memory interface.
Fig. 5 is a schematic diagram of the configuration packet structure produced by the off-chip accelerator interface.
Fig. 6 is the interface configuration diagram for multi-core chip extension.
Detailed description of the embodiments
The present invention proposes a novel extensible 2.5D multi-core processor architecture that supports three major kinds of extension: memory, accelerator, and multi-core chip. The embodiment of each is described below.
2.5D storage extension: First, as shown in Fig. 6, software directs the multi-core extension configurator to produce its signals. On the one hand, the output of the vertical MUX is switched to the off-chip memory interface; on the other hand, an address is generated from the XY coordinates of each router's position in the two-dimensional mesh topology and loaded into that router. Then, while the processor is running, when the off-chip memory interface module detects that the address of a load/store instruction in the pipeline falls within the address space of the local off-chip memory, or when it receives a DMA configuration signal from the processor, it encodes and packs the corresponding command and sends it off-chip through the SerDes interface. The structure of the configuration packet produced by the off-chip memory interface is shown in Fig. 4. The several highest bits form the opcode, which the designer can define as word read, word write, DMA read, or DMA write. For word read and word write, the middle bits give the address to access and the several lowest bits are reserved. For DMA read and DMA write, the middle bits give the DMA start address and the several lowest bits give the DMA end address. The four operations are carried out as follows:
1) Word read. The off-chip memory interface sends a word-read configuration packet off-chip. After receiving it, the off-chip memory controller reads the memory cell at the address given in the packet and returns the data to the processor through the SerDes interface.
2) Word write. The off-chip memory interface sends a word-write configuration packet off-chip and then sends the data to be written. The off-chip memory controller writes the data into the memory cell at the address given in the packet.
3) DMA read. The off-chip memory interface sends a DMA-read configuration packet off-chip. After receiving it, the off-chip memory controller reads memory cells in sequence beginning at the start address in the packet and returns the data to the processor, until the address has incremented to the end address in the packet.
4) DMA write. The off-chip memory interface sends a DMA-write configuration packet off-chip and then sends the data to be written. The off-chip memory controller writes the data into successive memory cells beginning at the start address in the packet, until the address has incremented to the end address in the packet.
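The four operations above can be modeled with a toy memory controller. This is a behavioral sketch only: the class name `OffChipMemoryController`, the list-backed memory, and the choice to treat the DMA end address as inclusive are assumptions, and SerDes transport and packet decoding are left out.

```python
class OffChipMemoryController:
    """Toy behavioral model of the off-chip memory controller."""

    def __init__(self, size: int):
        self.mem = [0] * size  # word-addressed storage

    def word_read(self, addr: int) -> int:
        # Return the word at the address carried in the configuration packet.
        return self.mem[addr]

    def word_write(self, addr: int, value: int) -> None:
        # Store the word that follows the word-write configuration packet.
        self.mem[addr] = value

    def dma_read(self, start: int, end: int) -> list:
        # Read cells in sequence until the address reaches `end`
        # (treated as inclusive here, which is an assumption).
        return [self.mem[a] for a in range(start, end + 1)]

    def dma_write(self, start: int, end: int, data: list) -> None:
        # Write the data stream into successive cells from `start` to `end`.
        if len(data) != end - start + 1:
            raise ValueError("data length does not match address range")
        for offset, value in enumerate(data):
            self.mem[start + offset] = value
```

A short usage example: after `ctrl = OffChipMemoryController(16)`, calling `ctrl.dma_write(4, 7, [1, 2, 3, 4])` fills addresses 4 through 7, and `ctrl.dma_read(4, 7)` returns the same four words.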
2.5D accelerator extension: First, as shown in Fig. 6, software directs the multi-core extension configurator to produce its signals. On the one hand, the output of the horizontal MUX is switched to the off-chip accelerator interface; on the other hand, an address is generated from the XY coordinates of each router's position in the two-dimensional mesh topology and loaded into that router. The processor then exchanges control information and data with the off-chip accelerator through the off-chip accelerator interface module. The specific control information is encapsulated in a configuration packet, which defines items such as the kind of accelerator to call and the length of the data to be computed. The structure of the configuration packet produced by the off-chip accelerator interface is shown in Fig. 5.
2.5D multi-core chip extension: First, as shown in Fig. 6, software directs the multi-core extension configurator to produce its signals. On the one hand, the outputs of the horizontal and vertical MUXes are switched to the routers; on the other hand, an address is generated from the XY coordinates of each router's position in the extended two-dimensional mesh topology and loaded into that router. The network-on-chip formed by the extended routers thus realizes the extension of the multi-core system, and data from the cores can be delivered through the extended router network to processors on different chips.
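To show how a message might cross from one chip to another in the extended mesh, the sketch below routes over a mesh built from chips placed side by side. The patent mentions XY coordinates and a routing algorithm but does not name the algorithm, so dimension-ordered XY routing is an assumption here, as are the 4-router chip width, horizontal-only extension, and the names `xy_route` and `chip_crossings`.

```python
CHIP_WIDTH = 4  # hypothetical: each chip contributes 4 router columns


def xy_route(src, dst):
    """Dimension-ordered routing: travel along X first, then along Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def chip_crossings(path):
    """Count hops whose two routers sit on different chips.

    Only horizontal extension is modeled, so the chip index of a router
    is its X coordinate divided by CHIP_WIDTH.
    """
    return sum(
        1
        for (ax, _), (bx, _) in zip(path, path[1:])
        if ax // CHIP_WIDTH != bx // CHIP_WIDTH
    )
```

In this model, each counted crossing corresponds to a hop that would travel over the SerDes link between two chips rather than over an on-chip link.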

Claims (5)

1. An extensible 2.5D multi-core processor architecture, characterized in that it consists of a multi-core processor chip, an extension memory chip, and an extension accelerator chip, the chips communicating over high-speed data transmission channels provided by SerDes interfaces; the chips are bonded onto the same substrate by a 2.5D process and integrated inside a single package;
in the vertical direction, the processor performs single-word reads and writes and direct memory access operations on the off-chip memory through an off-chip memory interface module, realizing extension of the local storage space; in the horizontal direction, the processor exchanges control information and data with the off-chip accelerator through an off-chip accelerator interface module, realizing extension of the coupled accelerators.
2. The 2.5D multi-core processor architecture according to claim 1, characterized in that: the multi-core processor is interconnected by a network-on-chip with a two-dimensional mesh structure, and every 4 processors form a cluster; each processor is connected to local on-chip memory, and the on-chip memory is divided into two kinds, one storing instructions and the other storing a small amount of data; the data memory is shared with other cores by message passing over the network-on-chip formed by the routers.
3. The 2.5D multi-core processor architecture according to claim 2, characterized in that: in the vertical direction, the processor performs word reads and writes and DMA operations on the off-chip memory through the off-chip memory interface module; when the off-chip memory interface module detects that the address of a load/store instruction in the pipeline falls within the address space of the local off-chip memory, or receives a DMA configuration signal from the processor, it encodes and packs the corresponding command and sends it off-chip through the SerDes interface; the off-chip memory controller receives the configuration packet from its own SerDes interface, decodes it, and then controls the word read/write or DMA operation on the off-chip memory.
4. The 2.5D multi-core processor architecture according to claim 3, characterized in that: in the horizontal direction, the processor exchanges control information and data with the off-chip accelerator through the off-chip accelerator interface module; the specific control information is encapsulated in a configuration packet, which defines the kind of accelerator to call and the length of the data to be computed.
5. The 2.5D multi-core processor architecture according to claim 4, characterized in that: at the outer boundary of the processor, a two-to-one data selector is placed in each of the vertical and horizontal directions; one input of each data selector is connected to a router, and the other input is connected to the off-chip memory interface module or the off-chip accelerator interface module respectively; the output of each data selector is connected to a SerDes interface; software configures the select input of the MUX at the inter-chip interface to determine whether the vertical or horizontal direction operates in multi-core chip extension mode.
CN201410237881.9A 2014-06-02 2014-06-02 Extensible 2.5-dimensional multi-core processor architecture Expired - Fee Related CN104008084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410237881.9A CN104008084B (en) 2014-06-02 2014-06-02 Extensible 2.5-dimensional multi-core processor architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410237881.9A CN104008084B (en) 2014-06-02 2014-06-02 Extensible 2.5-dimensional multi-core processor architecture

Publications (2)

Publication Number Publication Date
CN104008084A true CN104008084A (en) 2014-08-27
CN104008084B CN104008084B (en) 2017-01-18

Family

ID=51368743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410237881.9A Expired - Fee Related CN104008084B (en) 2014-06-02 2014-06-02 Extensible 2.5-dimensional multi-core processor architecture

Country Status (1)

Country Link
CN (1) CN104008084B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739241A (en) * 2008-11-12 2010-06-16 中国科学院微电子研究所 On-chip multi-core DSP cluster and application extension method
CN101753388B (en) * 2008-11-28 2011-08-31 中国科学院微电子研究所 Router and interface device suitable for on-chip and inter-chip extension of multi-core processors
CN103345461B (en) * 2013-04-27 2016-01-20 电子科技大学 FPGA-based multi-core processor network-on-chip system with accelerators
CN103425620B (en) * 2013-08-20 2018-01-12 复旦大学 Coupling structure of accelerator and processor based on multiple token rings

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794100A (en) * 2015-05-06 2015-07-22 西安电子科技大学 Heterogeneous multi-core processing system based on on-chip network
CN104794100B (en) * 2015-05-06 2017-06-16 西安电子科技大学 Heterogeneous multi-core processing system based on network-on-chip
CN106294274A (en) * 2016-08-01 2017-01-04 严媚 Multi-core processor based on 2.5D advanced packaging
CN109086228A (en) * 2018-06-26 2018-12-25 深圳市安信智控科技有限公司 High-speed memory chip with multiple independent access channels
CN109144943A (en) * 2018-06-26 2019-01-04 深圳市安信智控科技有限公司 Computing chip and memory chip combined system based on high-speed serial channel interconnection
CN109240980A (en) * 2018-06-26 2019-01-18 深圳市安信智控科技有限公司 Memory-access-intensive algorithm acceleration chip with multiple high-speed serial memory access channels
CN113138955A (en) * 2020-01-20 2021-07-20 北京灵汐科技有限公司 Network-on-chip interconnection structure of many-core system and data transmission method
WO2021147721A1 (en) * 2020-01-20 2021-07-29 北京灵汐科技有限公司 Network-on-chip interconnection structure of many-core system, data transmission method, board card, electronic device, and computer-readable storage medium
CN113138955B (en) * 2020-01-20 2024-04-02 北京灵汐科技有限公司 Network-on-chip interconnection structure of many-core system and data transmission method

Also Published As

Publication number Publication date
CN104008084B (en) 2017-01-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170118

Termination date: 20190602