CN104008084B - Extensible 2.5-dimensional multi-core processor architecture - Google Patents
- Publication number
- CN104008084B (application number CN201410237881.9A)
- Authority
- CN
- China
- Prior art keywords
- chip
- piece
- processor
- interface
- data
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Multi Processors (AREA)
Abstract
The invention belongs to the technical field of multi-core processors, and in particular relates to an extensible 2.5-dimensional (2.5D) multi-core processor architecture. At its core is a multi-core processor chip whose cores are interconnected by a two-dimensional mesh network-on-chip (NoC); this chip communicates with extension chips over high-speed data transfer channels provided by SerDes interfaces. In the vertical (longitudinal) direction, the processor performs single-word reads/writes and direct memory access (DMA) on off-chip memory through an off-chip memory interface; in the horizontal (transverse) direction, the processor exchanges control information and data with off-chip accelerators through an off-chip accelerator interface. By configuring, in software, data selectors (multiplexers) at the inter-chip interfaces, the architecture also supports vertical and horizontal extension with additional multi-core chips. The interconnected chips are bonded onto the same substrate using 2.5D technology and integrated in a single package. The architecture flexibly supports extending the storage space of a conventional two-dimensional multi-core processor, coupling a variety of accelerators, and extending core computing resources; it improves the reusability of chip-level IP and the reconfigurability of system-level design, shortens the design cycle of large chips, and lowers manufacturing cost.
Description
Technical field
The invention belongs to the technical field of multi-core processors, and specifically relates to an extensible 2.5D multi-core processor architecture.
Background art
Since Intel released the world's first commercial microprocessor chip, the 4004, in 1971, processor performance has risen steadily, driven by both rapidly advancing integrated-circuit manufacturing technology and pipeline design techniques. On one hand, under Moore's Law, each new process node reduces channel delay and thus raises the processor's operating frequency, while smaller feature sizes allow higher integration density and more complex circuit designs. On the other hand, processor designers have proposed and put into practice many sophisticated pipelining techniques to raise instruction throughput, such as very long instruction word (VLIW) and superscalar designs that exploit instruction-level parallelism (ILP), and dynamic branch prediction to improve pipeline utilization. However, these continued performance gains have brought serious power problems. The Intel Pentium 4, for example, consumes more than 100 W when running at 3.8 GHz, which produces severe heating and challenges conventional air cooling. Designers therefore turned to task parallelism across multiple simple cores to match the performance of a single complex processor, and the multi-core processor architecture emerged.
Multi-core design relieves the burden on each individual core and thus simplifies its circuit design, allowing each core's frequency and power consumption to be lowered. By partitioning tasks and processing them in parallel, a multi-core processor achieves higher energy efficiency. Over the past ten years or so, multi-core processor design has flourished, with several clear trends: the number of cores keeps increasing, memory capacity keeps expanding, and the variety of accelerators keeps growing. These trends, however, also bring shortcomings and challenges. Chip area keeps increasing, which raises tape-out cost; the physical-design workload grows and the design cycle lengthens; and conventional two-dimensional (2D) chips offer little extensibility or reconfigurability.
2.5D packaging technology, which has emerged in recent years, largely overcomes these shortcomings and challenges. It uses micro-bumps (μ-bumps) to bond separately fabricated multi-core chips, memory chips, and accelerator chips onto the same substrate, connects them through transmission lines in what is called a TSI (through-silicon interposer), and finally packages them together in a single package; this process is shown schematically in Fig. 1. 2.5D technology therefore allows chips within a package to be connected flexibly and extended freely, and the shorter interconnects provide higher inter-chip communication speed and bandwidth.
Summary of the invention
The object of the invention is to provide an extensible 2.5D multi-core processor architecture that flexibly supports extending the storage space of a conventional 2D multi-core processor, coupling multiple accelerators, and extending core computing resources. Its advantages include improving the reusability of chip-level IP and the reconfigurability of system-level design, shortening the design cycle of large chips, and reducing manufacturing cost.
To achieve this object, the invention proposes a 2.5D multi-core processor architecture whose overall structure is shown in Fig. 2. It consists of a multi-core processor chip, extension memory chips, and extension accelerator chips, which communicate with one another over high-speed data transfer channels provided by SerDes interfaces; the chips are bonded onto the same substrate by a 2.5D process and integrated inside a single package. Its core is a multi-core processor chip interconnected by a network-on-chip (NoC) with a two-dimensional mesh structure, which communicates with the extension chips over the high-speed data transfer channels provided by the SerDes interfaces. In the vertical (longitudinal) direction, the processor performs single-word reads/writes and direct memory access (DMA) operations on off-chip memory through an off-chip memory interface module, extending the local storage space. In the horizontal (transverse) direction, the processor exchanges control information and data with off-chip accelerators through an off-chip accelerator interface module, extending the set of coupled accelerators. By configuring, in software, the data selectors (MUXes) at the chip interface, the invention supports vertical and horizontal extension with additional multi-core chips.
In the invention, the processor cores are interconnected by a NoC with a two-dimensional mesh structure, and every 4 processors form a cluster. Each processor is connected to its own local on-chip memory, which is divided into two types, both of small capacity, used respectively for storing instructions and a small amount of data. The data memory can be shared with other cores by message passing over the NoC formed by the routers.
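The patent does not name the routing algorithm used on the mesh, though the embodiments below assign each router an address derived from its xy coordinates. As a minimal Python sketch under that assumption, dimension-order (XY) routing, a common choice for 2D mesh NoCs, shows how a message-passing request from one core could reach the router of a core whose data memory it shares. The `Coord` convention and the function name `xy_route` are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the 2D mesh (assumed convention)


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-order (XY) routing on a 2D mesh.

    The message first travels along X until its column matches the
    destination, then along Y. Returns every router visited, src and dst
    included. XY routing is an illustrative assumption; the patent does
    not name its routing algorithm.
    """
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step_y
        path.append((x, y))
    return path


# A core at router (0, 0) reaches the data memory shared by the core at (2, 1).
print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

XY routing is deterministic and deadlock-free on a mesh, which is one reason it is often paired with coordinate-based router addresses; whether this design actually uses it is not stated.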
In the invention, the 2.5D storage extension function has the architecture shown in Fig. 3. In the vertical direction, the processor performs word read/write and DMA operations on off-chip memory through the off-chip memory interface module. When the off-chip memory interface module detects that the address of a pipeline load/store instruction falls within the address space of the local off-chip memory, or when it receives a DMA configuration signal sent by the processor, it encodes and packs the corresponding command and sends it off-chip through the SerDes interface.
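The address check that triggers forwarding can be modelled as below. This is a sketch only: the window constants `OFFCHIP_BASE` and `OFFCHIP_SIZE`, the request-kind strings, and `classify_request` are all hypothetical, since the patent specifies neither the memory map nor the interface's signal names.

```python
from typing import Optional

# Hypothetical window of the address space mapped to the local off-chip memory.
# The patent gives no concrete addresses; these constants are placeholders.
OFFCHIP_BASE = 0x8000_0000
OFFCHIP_SIZE = 0x1000_0000


def in_offchip_window(addr: int) -> bool:
    """True if an address falls inside the local off-chip memory window."""
    return OFFCHIP_BASE <= addr < OFFCHIP_BASE + OFFCHIP_SIZE


def classify_request(kind: str, addr: int) -> Optional[str]:
    """Decide what the off-chip memory interface forwards over SerDes.

    kind is "load" or "store" for pipeline instructions, or "dma_read" /
    "dma_write" for a DMA configuration signal from the processor.
    Returns the request type to encode, or None if the access stays on-chip.
    """
    if kind in ("dma_read", "dma_write"):
        return kind.upper()  # DMA configuration signals are always forwarded
    if kind not in ("load", "store"):
        raise ValueError(f"unknown request kind: {kind}")
    if not in_offchip_window(addr):
        return None  # served by the local on-chip memory
    return "WORD_READ" if kind == "load" else "WORD_WRITE"
```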
The structure of the configuration packet generated by the off-chip memory interface is shown in Fig. 4. The highest few bits are the operation code, which the designer can define as word read, word write, DMA read, or DMA write. For word reads and writes, the middle bits give the address to be read or written and the last few bits are reserved. For DMA reads and writes, the middle bits give the DMA start address and the last few bits give the DMA end address. The off-chip memory controller receives the configuration packet transmitted by its own SerDes interface, decodes it, and then controls the word read/write or DMA operation on the off-chip memory.
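A packet laid out as described for Fig. 4 can be sketched as an integer with the opcode in the high bits, followed by the middle field and the trailing field. The 2-bit opcode values, the 32-bit address width, and the names `MemOp`, `MemPacket`, `encode`, and `decode` are assumptions for illustration; the patent leaves the opcode encoding to the designer and does not give field widths.

```python
from enum import IntEnum
from typing import NamedTuple

# Assumed field widths; the patent only fixes the field order
# (opcode in the high bits, then the middle field, then the trailing field).
ADDR_BITS = 32
ADDR_MASK = (1 << ADDR_BITS) - 1


class MemOp(IntEnum):
    # Opcode values are designer-defined in the patent; these are placeholders.
    WORD_READ = 0b00
    WORD_WRITE = 0b01
    DMA_READ = 0b10
    DMA_WRITE = 0b11


DMA_OPS = (MemOp.DMA_READ, MemOp.DMA_WRITE)


class MemPacket(NamedTuple):
    op: MemOp
    addr: int          # word address, or DMA start address
    end_addr: int = 0  # DMA end address; the reserved field (0) for word operations


def encode(pkt: MemPacket) -> int:
    """Pack a packet as [opcode | middle field | trailing field], high bits first."""
    tail = pkt.end_addr if pkt.op in DMA_OPS else 0
    for value in (pkt.addr, tail):
        if not 0 <= value <= ADDR_MASK:
            raise ValueError("address does not fit in the assumed field width")
    return (int(pkt.op) << (2 * ADDR_BITS)) | (pkt.addr << ADDR_BITS) | tail


def decode(word: int) -> MemPacket:
    """Inverse of encode(), as the off-chip memory controller might apply it."""
    op = MemOp(word >> (2 * ADDR_BITS))
    addr = (word >> ADDR_BITS) & ADDR_MASK
    tail = word & ADDR_MASK
    return MemPacket(op, addr, tail if op in DMA_OPS else 0)
```

Keeping the opcode in the top bits lets the receiver select the decoding rule for the remaining fields before interpreting them, which matches the description's order of fields.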
In the invention, the 2.5D accelerator coupling extension function works on a principle similar to storage extension. The difference is that, considering the chip layout and the distribution of TSI lines in the 2.5D package, accelerators are extended horizontally. The processor exchanges control information and data with the off-chip accelerator through the off-chip accelerator interface module. The specific control information is encapsulated in a configuration packet and defines, among other things, which type of accelerator is called and the length of the data to be computed. The structure of the configuration packet generated by the off-chip accelerator interface is shown in Fig. 5.
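The patent states what the accelerator configuration packet carries (the accelerator type and the data length, among other information) but not its bit layout. The hypothetical sketch below assumes an 8-bit type field above a 24-bit length field; both widths and the function names are illustrative only.

```python
from typing import Tuple

# Assumed layout [accelerator type | data length]; widths are illustrative.
ACCEL_TYPE_BITS = 8
LENGTH_BITS = 24
LENGTH_MASK = (1 << LENGTH_BITS) - 1


def encode_accel_packet(accel_type: int, length: int) -> int:
    """Pack the accelerator type (high bits) and the data length (low bits)."""
    if not 0 <= accel_type < (1 << ACCEL_TYPE_BITS):
        raise ValueError("accelerator type out of range")
    if not 0 <= length <= LENGTH_MASK:
        raise ValueError("data length out of range")
    return (accel_type << LENGTH_BITS) | length


def decode_accel_packet(word: int) -> Tuple[int, int]:
    """Recover (accelerator type, data length) on the accelerator side."""
    return word >> LENGTH_BITS, word & LENGTH_MASK
```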
In the invention, the interface and configuration of the 2.5D multi-core chip extension function are shown in Fig. 6. At the outer boundary of the processor, a 2-to-1 data selector (MUX) is placed in each of the vertical and horizontal directions. One input of each MUX connects to a router; the other input connects to the off-chip memory interface module (vertical MUX) or to the off-chip accelerator interface module (horizontal MUX). The MUX output connects to the SerDes interface. Software configures the select inputs of the MUXes at the chip interface to determine whether the chip operates in the vertical or horizontal multi-core chip extension state. Specifically, software configures a multi-core extension configurator module, which generates two types of signals. The first is the data-selector control signal, which routes the MUX output to the router, so that the NoC formed by the extended routers extends the multi-core system. The second is the router address configuration signal: in a NoC formed by a two-dimensional mesh, each router corresponds to a unique address number that the routing algorithm uses for path computation, so after the NoC is extended, each router must be configured with a new address.
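The two signal classes produced by the multi-core extension configurator can be modelled roughly as follows. The enum names, the `COORD_BITS` width, and the choice to pack `y` above `x` in the router address are assumptions; the patent says only that each router receives a unique address derived from its xy coordinates.

```python
from dataclasses import dataclass
from enum import Enum


class VerticalSel(Enum):
    """Inputs of the vertical (longitudinal) 2-to-1 MUX."""
    ROUTER = 0
    MEMORY_IF = 1  # off-chip memory interface module


class HorizontalSel(Enum):
    """Inputs of the horizontal (transverse) 2-to-1 MUX."""
    ROUTER = 0
    ACCEL_IF = 1  # off-chip accelerator interface module


COORD_BITS = 4  # assumed: at most 16 routers along each mesh dimension


@dataclass(frozen=True)
class ExtensionConfig:
    vertical: VerticalSel
    horizontal: HorizontalSel


def mux_config(extend_memory: bool, extend_accel: bool) -> ExtensionConfig:
    """First signal class: MUX selects. In this model, an edge not used for
    memory or accelerator expansion stays on the router, so it can carry the
    NoC across to a neighbouring processor chip."""
    return ExtensionConfig(
        vertical=VerticalSel.MEMORY_IF if extend_memory else VerticalSel.ROUTER,
        horizontal=HorizontalSel.ACCEL_IF if extend_accel else HorizontalSel.ROUTER,
    )


def router_address(x: int, y: int) -> int:
    """Second signal class: a unique router address derived from (x, y).
    Packing y above x is an assumed encoding."""
    limit = 1 << COORD_BITS
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("coordinate outside the assumed mesh size")
    return (y << COORD_BITS) | x
```

Under this model, the memory-extension embodiment roughly corresponds to `mux_config(True, False)`, the accelerator-extension embodiment to `mux_config(False, True)`, and multi-core chip extension to `mux_config(False, False)`, where both MUX outputs go to the routers.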
The architecture of the invention flexibly supports extending the storage space of a conventional 2D multi-core processor, coupling multiple accelerators, and extending core computing resources. It greatly improves the reusability of chip-level IP and the reconfigurability of system-level design, shortens the design cycle of large chips, and reduces manufacturing cost. In addition, the SerDes interfaces improve inter-chip communication speed and bandwidth.
Brief description of the drawings
Fig. 1 Schematic diagram of 2.5D chip packaging.
Fig. 2 Overall architecture of the extensible 2.5D multi-core processor.
Fig. 3 Architecture of storage extension for the 2.5D multi-core processor.
Fig. 4 Structure of the configuration packet generated by the off-chip memory interface.
Fig. 5 Structure of the configuration packet generated by the off-chip accelerator interface.
Fig. 6 Interface configuration for multi-core chip extension.
Detailed description of embodiments
The invention proposes a novel extensible 2.5D multi-core processor architecture that supports three major kinds of extension: memory, accelerators, and multi-core chips. The embodiment of each is described in turn below.
2.5D storage extension: First, as shown in Fig. 6, software directs the multi-core extension configurator to generate signals that, on one hand, gate the output of the vertical MUX to the off-chip memory interface and, on the other hand, generate an address from the xy coordinates of the router's position in the two-dimensional mesh topology and load it into that router. Then, while the processor is running, when the off-chip memory interface module detects that the address of a pipeline load/store instruction falls within the address space of the local off-chip memory, or receives a DMA configuration signal sent by the processor, it encodes and packs the corresponding command and sends it off-chip through the SerDes interface. The structure of the configuration packet generated by the off-chip memory interface is shown in Fig. 4: the highest few bits are the operation code, which the designer can define as word read, word write, DMA read, or DMA write. For word reads and writes, the middle bits give the address to be read or written and the last few bits are reserved. For DMA reads and writes, the middle bits give the DMA start address and the last few bits give the DMA end address. The four operations are carried out as follows:
1) Word read. The off-chip memory interface sends the word-read configuration packet off-chip. After the off-chip memory controller receives it, the controller reads the memory cell at the address given in the packet and returns the data to the processor through the SerDes interface.
2) Word write. The off-chip memory interface sends the word-write configuration packet off-chip, followed by the data to be written. The off-chip memory controller writes the data into the memory cell at the address given in the packet.
3) DMA read. The off-chip memory interface sends the DMA-read configuration packet off-chip. After the off-chip memory controller receives it, the controller reads the memory cells one after another, starting from the start address in the packet, and returns the data to the processor until the address has been incremented to the end address in the packet.
4) DMA write. The off-chip memory interface sends the DMA-write configuration packet off-chip, followed by the data to be written. Starting from the start address in the packet, the off-chip memory controller writes the data into successive memory cells until the address has been incremented to the end address in the packet.
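The four operations above can be captured in a small behavioural model of the off-chip memory controller. It is a sketch under stated assumptions: storage is word-addressed (the address advances by one per word), unwritten cells read as 0, the DMA end address is treated as inclusive, and the class and method names are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Optional


class MemOp(Enum):
    WORD_READ = "word_read"
    WORD_WRITE = "word_write"
    DMA_READ = "dma_read"
    DMA_WRITE = "dma_write"


class OffChipMemoryController:
    """Behavioural model of the four operations handled by the off-chip memory
    controller after it decodes a configuration packet."""

    def __init__(self) -> None:
        self.cells: Dict[int, int] = {}  # sparse word-addressed storage; unwritten cells read as 0

    def handle(self, op: MemOp, addr: int, end_addr: int = 0,
               data: Optional[List[int]] = None) -> List[int]:
        """Execute one operation and return the words fed back to the processor
        (empty for writes). `data` stands in for the write data that follows a
        write packet over SerDes."""
        if op is MemOp.WORD_READ:
            return [self.cells.get(addr, 0)]
        if op is MemOp.WORD_WRITE:
            if not data:
                raise ValueError("word write needs one data word")
            self.cells[addr] = data[0]
            return []
        # DMA: step the address from start to end; the end address is assumed inclusive.
        span = range(addr, end_addr + 1)
        if op is MemOp.DMA_READ:
            return [self.cells.get(a, 0) for a in span]
        if data is None or len(data) != len(span):
            raise ValueError("DMA write needs one data word per address")
        for a, word in zip(span, data):
            self.cells[a] = word
        return []
```

For example, after `handle(MemOp.DMA_WRITE, 0x10, 0x13, [1, 2, 3, 4])`, a DMA read of `0x11` to `0x12` returns `[2, 3]`.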
2.5D accelerator extension: First, as shown in Fig. 6, software directs the multi-core extension configurator to generate signals that, on one hand, gate the output of the horizontal MUX to the off-chip accelerator interface and, on the other hand, generate an address from the xy coordinates of the router's position in the two-dimensional mesh topology and load it into that router. The processor exchanges control information and data with the off-chip accelerator through the off-chip accelerator interface module. The specific control information is encapsulated in a configuration packet and defines, among other things, which type of accelerator is called and the length of the data to be computed. The structure of the configuration packet generated by the off-chip accelerator interface is shown in Fig. 5.
2.5D multi-core chip extension: First, as shown in Fig. 6, software directs the multi-core extension configurator to generate signals that, on one hand, gate the outputs of both the horizontal and vertical MUXes to the routers and, on the other hand, generate an address from the xy coordinates of the router's position in the extended two-dimensional mesh topology and load it into that router. The NoC formed by the extended routers thus extends the multi-core system, and messages can be delivered through the extended router network to processors on different chips.
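Re-addressing after multi-core chip extension can be sketched as a mapping from each chip's local router coordinates to coordinates in the enlarged mesh, which the configurator would then encode into new router addresses. The sketch assumes identical rectangular chips placed on a package-level grid; `extended_coordinates` and the chip labels are hypothetical.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]


def extended_coordinates(chip_positions: Dict[str, Coord], width: int,
                         height: int) -> Dict[Tuple[str, Coord], Coord]:
    """Map every (chip, local router coordinate) to its coordinate in the
    extended mesh formed by joining identical width x height chips.

    chip_positions gives each chip's slot in the package-level grid, e.g.
    {"A": (0, 0), "B": (1, 0)} for horizontal extension with chip B east of A.
    """
    mapping: Dict[Tuple[str, Coord], Coord] = {}
    for chip, (cx, cy) in chip_positions.items():
        for y in range(height):
            for x in range(width):
                mapping[(chip, (x, y))] = (cx * width + x, cy * height + y)
    return mapping


# Two 4x4 chips joined horizontally form an 8x4 mesh.
coords = extended_coordinates({"A": (0, 0), "B": (1, 0)}, width=4, height=4)
print(coords[("B", (0, 0))])  # (4, 0): just east of chip A's router at (3, 0)
```

Because the mapping is a fixed offset per chip, every router keeps a unique coordinate in the enlarged mesh, which is the property the routing algorithm relies on.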
Claims (3)
1. An extensible 2.5D multi-core processor architecture, characterized in that it consists of a multi-core processor chip, extension memory chips, and extension accelerator chips, which communicate with one another over high-speed data transfer channels provided by SerDes interfaces;
the chips are bonded onto the same substrate by a 2.5D process and integrated inside a single package;
in the vertical direction, the processor performs single-word read/write and direct memory access operations on off-chip memory through an off-chip memory interface module, extending the local storage space; in the horizontal direction, the processor exchanges control information and data with an off-chip accelerator through an off-chip accelerator interface module, extending the coupled accelerators;
the processor cores are interconnected by a network-on-chip with a two-dimensional mesh structure, and 4 processors form a cluster; each processor is connected to local on-chip memory, which is divided into two types used respectively for storing instructions and a small amount of data, wherein the on-chip memory storing the small amount of data is shared with other cores by message passing over the network-on-chip formed by routers;
in the vertical direction, the processor performs word read/write and DMA operations on the off-chip memory through the off-chip memory interface module; when the off-chip memory interface module detects that the address of a pipeline load/store instruction falls within the address space of the local off-chip memory, or receives a DMA configuration signal sent by the processor, it encodes and packs the corresponding command and sends it off-chip through the SerDes interface; the off-chip memory controller receives the configuration packet transmitted by its own SerDes interface and, after decoding, controls the word read/write or DMA operation of the off-chip memory.
2. The 2.5D multi-core processor architecture according to claim 1, characterized in that: in the horizontal direction, the processor exchanges control information and data with the off-chip accelerator through the off-chip accelerator interface module; the specific control information is encapsulated in a configuration packet and defines the type of accelerator called and the length of the data to be computed.
3. The 2.5D multi-core processor architecture according to claim 1, characterized in that: at the outer boundary of the processor, a 2-to-1 data selector is placed in each of the vertical and horizontal directions; one input of each data selector connects to a router, and the other input connects to the off-chip memory interface module or the off-chip accelerator interface module, respectively; the output of each data selector connects to the SerDes interface; and software configures the select inputs of the MUXes at the chip interface to determine whether the multi-core processor operates in the vertical or horizontal chip extension state.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410237881.9A (CN104008084B) | 2014-06-02 | 2014-06-02 | Extensible 2.5-dimensional multi-core processor architecture |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104008084A | 2014-08-27 |
CN104008084B | 2017-01-18 |
Family ID: 51368743
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410237881.9A (CN104008084B, Expired - Fee Related) | Extensible 2.5-dimensional multi-core processor architecture | 2014-06-02 | 2014-06-02 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104008084B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104794100B (en) * | 2015-05-06 | 2017-06-16 | 西安电子科技大学 | Heterogeneous polynuclear processing system based on network-on-chip |
CN106294274A (en) * | 2016-08-01 | 2017-01-04 | 严媚 | A kind of polycaryon processor based on 2.5D Advanced Packaging |
CN109144943A (en) * | 2018-06-26 | 2019-01-04 | 深圳市安信智控科技有限公司 | Computing chip and memory chip combined system based on high-speed serial channel interconnection |
CN109086228B (en) * | 2018-06-26 | 2022-03-29 | 深圳市安信智控科技有限公司 | High speed memory chip with multiple independent access channels |
CN109240980A (en) * | 2018-06-26 | 2019-01-18 | 深圳市安信智控科技有限公司 | Memory access intensity algorithm with multiple high speed serialization Memory access channels accelerates chip |
CN113138955B (en) * | 2020-01-20 | 2024-04-02 | 北京灵汐科技有限公司 | Network-on-chip interconnection structure of many-core system and data transmission method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101739241A (en) * | 2008-11-12 | 2010-06-16 | 中国科学院微电子研究所 | On-chip multi-core DSP cluster and application extension method |
CN101753388A (en) * | 2008-11-28 | 2010-06-23 | 中国科学院微电子研究所 | Routing and interface device suitable for on-chip and inter-chip extension of multi-core processor |
CN103345461A (en) * | 2013-04-27 | 2013-10-09 | 电子科技大学 | Multi-core processor on-chip network system based on FPGA and provided with accelerator |
CN103425620A (en) * | 2013-08-20 | 2013-12-04 | 复旦大学 | Coupled structure of accelerator and processor based on multiple Token-Rings |
- 2014-06-02: application CN201410237881.9A filed in CN (CN104008084B); status: Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN104008084A (en) | 2014-08-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170118; termination date: 20190602 |