CN104008084B - Extensible 2.5-dimensional multi-core processor architecture - Google Patents

Publication number
CN104008084B
CN104008084B CN201410237881.9A CN201410237881A CN104008084B CN 104008084 B CN104008084 B CN 104008084B CN 201410237881 A CN201410237881 A CN 201410237881A CN 104008084 B CN104008084 B CN 104008084B
Authority
CN
China
Prior art keywords
chip
piece
processor
interface
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410237881.9A
Other languages
Chinese (zh)
Other versions
CN104008084A (en)
Inventor
虞志益
林杰
朱世凯
俞剑明
周炜
周力君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN201410237881.9A
Publication of CN104008084A
Application granted
Publication of CN104008084B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Multi Processors (AREA)

Abstract

The invention belongs to the technical field of multi-core processors and specifically relates to an extensible 2.5-dimensional (2.5D) multi-core processor architecture. The core of the architecture is a multi-core processor chip interconnected by a network-on-chip with a two-dimensional mesh structure, which communicates with extension chips over high-speed data channels provided by SerDes interfaces. In the vertical direction, a processor performs single-word reads and writes and direct memory access on off-chip memory through an off-chip memory interface; in the horizontal direction, a processor exchanges control information and data with an off-chip accelerator through an off-chip accelerator interface. By configuring in software the data selectors at the inter-chip interfaces, the architecture also supports vertical and horizontal multi-core chip extension. The interconnected chips are bonded onto the same substrate by a 2.5D process and integrated in one package. The architecture flexibly supports extending the storage space of a conventional 2D multi-core processor, coupling multiple kinds of accelerators, and extending core computing resources. It improves the reusability of chip-level IP and the reconfigurability of system-level design, shortens the design cycle of large chips, and lowers manufacturing cost.

Description

An extensible 2.5D multi-core processor architecture
Technical Field
The invention belongs to the technical field of multi-core processors and specifically relates to an extensible 2.5D multi-core processor architecture.
Background
Since Intel released the world's first commercial microprocessor, the 4004, in 1971, processor performance has risen steadily, driven by both rapidly advancing integrated-circuit manufacturing technology and pipeline design techniques. On the one hand, driven by Moore's Law, the lower channel delay of each new process node has raised processor clock frequencies, and smaller feature sizes have allowed higher integration density and greater circuit design complexity. On the other hand, processor designers have proposed and implemented many sophisticated pipeline techniques to raise instruction throughput, such as very long instruction word (VLIW) and superscalar designs, proposed to exploit instruction-level parallelism (ILP), and dynamic branch prediction, proposed to improve pipeline efficiency. However, the continued rise in processor performance has brought serious power problems. Taking the Intel Pentium 4 as an example, when it runs at 3.8 GHz its power consumption exceeds 100 W, which causes severe heating and challenges conventional air cooling. Designers therefore turned to running tasks in parallel on multiple simple cores to match the performance of a single complex processor, and the multi-core processor architecture emerged.
Multi-core design lightens the burden on each individual core, which simplifies its circuit design and lets its frequency and power consumption come down. By partitioning tasks and processing them in parallel, a multi-core processor achieves higher energy efficiency. Over the past ten-plus years, multi-core processor design has developed vigorously, with clear trends: core counts keep growing, memory capacity keeps expanding, and the variety of accelerators keeps increasing. These trends also bring shortcomings and challenges: chip area keeps growing, which raises tape-out cost; the physical design workload grows and design cycles lengthen; and conventional two-dimensional (2D) chips offer limited extensibility and reconfigurability.
The 2.5D packaging technology that has emerged in recent years largely overcomes these shortcomings and challenges. It uses micro-bumps (u-bumps) to bond finished multi-core chips, memory chips, and accelerator chips onto the same substrate, connects them through transmission lines in what is called a through-silicon interposer (TSI), and finally assembles them inside a single package. Fig. 1 illustrates this process. 2.5D technology thus allows chips within a package to be connected flexibly and extended freely, and the shorter interconnects provide higher inter-chip communication speed and bandwidth.
Summary of the Invention
The object of the invention is to provide an extensible 2.5D multi-core processor architecture that flexibly supports extending the storage space of a conventional 2D multi-core processor, coupling multiple kinds of accelerators, and extending core computing resources. Its advantages include improved reusability of chip-level IP, improved reconfigurability of system-level design, shorter design cycles for large chips, and lower manufacturing cost.
To this end, the invention proposes a 2.5D multi-core processor architecture whose overall structure is shown in Fig. 2. It consists of a multi-core processor chip, extension memory chips, and extension accelerator chips, which communicate over high-speed data channels provided by SerDes interfaces. The chips are bonded onto the same substrate by a 2.5D process and integrated inside one package. The core of the architecture is a multi-core processor chip interconnected by a network-on-chip (NoC) with a two-dimensional mesh structure; it communicates with the extension chips over the high-speed data channels provided by the SerDes interfaces. In the vertical direction, a processor performs single-word reads and writes and direct memory access (DMA) operations on off-chip memory through an off-chip memory interface module, which extends the local storage space. In the horizontal direction, a processor exchanges control information and data with an off-chip accelerator through an off-chip accelerator interface module, which extends the set of coupled accelerators. By configuring in software the data selectors (MUXes) at the inter-chip interfaces, the invention also supports vertical and horizontal multi-core chip extension.
In the invention, the processors are interconnected by a network-on-chip (NoC) with a two-dimensional mesh structure, and every four processors form a cluster. Each processor is connected to local on-chip memories of two kinds, one storing instructions and one storing a small amount of data; both have small capacity. The data memory can be shared with other cores by message passing over the network-on-chip formed by the routers.
The 2.5D memory extension function has the architecture shown in Fig. 3. In the vertical direction, a processor performs word reads and writes and DMA operations on off-chip memory through the off-chip memory interface module. When the off-chip memory interface module detects that the address of a pipeline load/store instruction falls within the address space of the local off-chip memory, or when it receives a DMA configuration signal from the processor, it encodes and packs the corresponding command and sends it off chip through the SerDes interface. Fig. 4 shows the structure of the configuration packet generated by the off-chip memory interface. The several highest bits are the operation code, which the designer can define as word read, word write, DMA read, or DMA write. For word reads and word writes, the middle bits hold the access address and the lowest bits are reserved. For DMA reads and writes, the middle bits hold the DMA start address and the lowest bits hold the DMA end address. The off-chip memory controller receives the configuration packet through its own SerDes interface, decodes it, and then controls the word read/write or DMA operation on the off-chip memory.
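The packet layout described above can be sketched in Python. This is only an illustration under stated assumptions: the text fixes the field order (operation code in the highest bits, address or DMA start address in the middle, reserved bits or DMA end address in the lowest bits) but not the widths, so `OPCODE_BITS`, `FIELD_BITS`, the opcode values, and all names below are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum

# Assumed widths: the source gives only the field order, not the sizes.
OPCODE_BITS = 2
FIELD_BITS = 15  # hypothetical width of each of the two address fields


class Op(IntEnum):
    """The four operations the designer may assign to the opcode (values assumed)."""
    WORD_READ = 0
    WORD_WRITE = 1
    DMA_READ = 2
    DMA_WRITE = 3


@dataclass(frozen=True)
class MemConfigPacket:
    op: Op
    addr: int          # word address, or DMA start address
    end_addr: int = 0  # DMA end address; reserved for word accesses


def encode(pkt: MemConfigPacket) -> int:
    """Pack the fields into one integer with the opcode in the highest bits."""
    limit = 1 << FIELD_BITS
    if not (0 <= pkt.addr < limit and 0 <= pkt.end_addr < limit):
        raise ValueError("address does not fit in the assumed field width")
    is_dma = pkt.op in (Op.DMA_READ, Op.DMA_WRITE)
    low = pkt.end_addr if is_dma else 0  # low bits are reserved for word accesses
    return (int(pkt.op) << (2 * FIELD_BITS)) | (pkt.addr << FIELD_BITS) | low


def decode(word: int) -> MemConfigPacket:
    """Recover the fields, as the off-chip memory controller would after reception."""
    mask = (1 << FIELD_BITS) - 1
    op = Op((word >> (2 * FIELD_BITS)) & ((1 << OPCODE_BITS) - 1))
    return MemConfigPacket(op, (word >> FIELD_BITS) & mask, word & mask)
```

With these assumed widths a packet occupies 32 bits; a real design would choose the widths to match its address space and SerDes framing.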
The 2.5D accelerator extension function works on a principle similar to memory extension. The difference is that, to balance chip placement against the routing of TSI lines in the 2.5D package, accelerators are extended in the horizontal direction. A processor exchanges control information and data with an off-chip accelerator through the off-chip accelerator interface module. The specific control information is encapsulated in a configuration packet, which defines information such as the type of accelerator to call and the length of the data to be processed. Fig. 5 shows the structure of the configuration packet generated by the off-chip accelerator interface.
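As a rough illustration of such a configuration packet, the Python sketch below packs an accelerator type and a data length into one word. The actual layout is given only in Fig. 5, so the field widths, the field order, and the accelerator names here are all assumptions made for the example.

```python
from __future__ import annotations

from enum import IntEnum

# Assumed layout: accelerator type in the high bits, data length in the low bits.
KIND_BITS = 4
LENGTH_BITS = 28


class AcceleratorKind(IntEnum):
    """Hypothetical accelerator types; the source does not name any."""
    FFT = 0
    FIR = 1
    AES = 2


def encode_accel_config(kind: AcceleratorKind, length: int) -> int:
    """Build the control word sent to the off-chip accelerator."""
    if not 0 <= length < (1 << LENGTH_BITS):
        raise ValueError("data length does not fit in the assumed field width")
    return (int(kind) << LENGTH_BITS) | length


def decode_accel_config(word: int) -> tuple[AcceleratorKind, int]:
    """Split a control word back into accelerator type and data length."""
    kind = AcceleratorKind((word >> LENGTH_BITS) & ((1 << KIND_BITS) - 1))
    return kind, word & ((1 << LENGTH_BITS) - 1)
```

For example, `encode_accel_config(AcceleratorKind.FIR, 1024)` yields a word that `decode_accel_config` turns back into `(AcceleratorKind.FIR, 1024)`.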
The interface and configuration for the 2.5D multi-core chip extension function are shown in Fig. 6. At the outer boundary of the processor, a two-to-one data selector (MUX) is placed in each of the vertical and horizontal directions. One input of each MUX connects to a router; the other input connects to the off-chip memory interface module (vertical) or the off-chip accelerator module (horizontal). The MUX output connects to the SerDes interface. Configuring the select input of a MUX at the inter-chip interface in software determines whether that direction operates in multi-core chip extension mode. Specifically, software configures a multi-core extension configurator module, which produces two kinds of signals. The first is the data selector control signal, which switches the output to the router so that the network-on-chip formed by the extended routers realizes the extension of the multi-core system. The second is the router address configuration signal: in a network-on-chip with a two-dimensional mesh, each router has a unique address number that the routing algorithm uses for path computation, so after the network-on-chip is extended, the routers must be configured with new addresses.
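The two kinds of signals produced by the configurator can be modeled in a few lines of Python. This is a sketch, not the patented circuit: the names are hypothetical, and the row-major numbering used for router addresses is an assumption, since the text only requires that each router receive a unique address.

```python
from enum import Enum
from typing import Dict, Tuple


class MuxSelect(Enum):
    ROUTER = "router"        # direction used for multi-core chip extension
    INTERFACE = "interface"  # off-chip memory (vertical) or accelerator (horizontal)


def select_signals(extend_cores_vertical: bool,
                   extend_cores_horizontal: bool) -> Dict[str, MuxSelect]:
    """First signal class: the select value for each boundary MUX."""
    return {
        "vertical": MuxSelect.ROUTER if extend_cores_vertical else MuxSelect.INTERFACE,
        "horizontal": MuxSelect.ROUTER if extend_cores_horizontal else MuxSelect.INTERFACE,
    }


def assign_router_addresses(cols: int, rows: int) -> Dict[Tuple[int, int], int]:
    """Second signal class: one unique address per router in a cols x rows mesh.

    Row-major numbering is an assumption made for this example.
    """
    return {(x, y): y * cols + x for y in range(rows) for x in range(cols)}
```

Under this numbering, widening a 4 x 4 mesh to 8 x 4 changes the address of the router at (1, 1) from 5 to 9, which illustrates why the routers need new addresses after the network is extended.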
The architecture flexibly supports extending the storage space of a conventional 2D multi-core processor, coupling multiple kinds of accelerators, and extending core computing resources. It improves the reusability of chip-level IP and the reconfigurability of system-level design, shortens the design cycle of large chips, and reduces manufacturing cost. Using SerDes interfaces also raises inter-chip communication speed and bandwidth.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of a 2.5D chip package.
Fig. 2 is the overall architecture diagram of the extensible 2.5D multi-core processor.
Fig. 3 is the architecture diagram of memory extension for the 2.5D multi-core processor.
Fig. 4 is a schematic diagram of the configuration packet structure generated by the off-chip memory interface.
Fig. 5 is a schematic diagram of the configuration packet structure generated by the off-chip accelerator interface.
Fig. 6 is the interface configuration diagram for multi-core chip extension.
Detailed Description
The invention proposes a novel extensible 2.5D multi-core processor architecture that supports three kinds of extension: memory, accelerator, and multi-core chip. The embodiment of each is described below.
2.5D memory extension: First, as shown in Fig. 6, software directs the multi-core extension configurator to generate signals. These connect the output of the vertical MUX to the off-chip memory interface and, in addition, generate an address from the XY coordinates of each router's position in the two-dimensional mesh topology and load it into that router. Then, during operation, when the off-chip memory interface module detects that the address of a pipeline load/store instruction falls within the address space of the local off-chip memory, or when it receives a DMA configuration signal from the processor, it encodes and packs the corresponding command and sends it off chip through the SerDes interface. Fig. 4 shows the structure of the configuration packet generated by the off-chip memory interface. The several highest bits are the operation code, which the designer can define as word read, word write, DMA read, or DMA write. For word reads and word writes, the middle bits hold the access address and the lowest bits are reserved. For DMA reads and writes, the middle bits hold the DMA start address and the lowest bits hold the DMA end address. The four operations are carried out as follows:
1) Word read. The off-chip memory interface sends a word-read configuration packet off chip. After receiving it, the off-chip memory controller reads the memory cell at the address given in the packet and returns the data to the processor through the SerDes interface.
2) Word write. The off-chip memory interface sends a word-write configuration packet off chip and then sends the data to be written. The off-chip memory controller writes the data into the memory cell at the address given in the packet.
3) DMA read. The off-chip memory interface sends a DMA-read configuration packet off chip. After receiving it, the off-chip memory controller reads memory cells in sequence, beginning at the start address given in the packet, and returns the data to the processor until the address has incremented to the end address given in the packet.
4) DMA write. The off-chip memory interface sends a DMA-write configuration packet off chip and then sends the data to be written. Beginning at the start address given in the packet, the off-chip memory controller writes the data in sequence into the indicated memory cells until the address has incremented to the end address given in the packet.
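The four operations can be illustrated with a small behavioral model in Python. It is a sketch under assumptions: the class and method names are hypothetical, memory is modeled as a dictionary that reads as zero where nothing was written, and the DMA end address is treated as inclusive, which the text suggests but does not state outright.

```python
from typing import Dict, Iterable, List


class OffChipMemoryController:
    """Behavioral model of the controller on the extension memory chip."""

    def __init__(self) -> None:
        self.mem: Dict[int, int] = {}  # address -> word; unwritten cells read as 0

    def word_read(self, addr: int) -> int:
        # 1) Read one cell and return it to the processor.
        return self.mem.get(addr, 0)

    def word_write(self, addr: int, data: int) -> None:
        # 2) Write the word that follows the configuration packet.
        self.mem[addr] = data

    def dma_read(self, start: int, end: int) -> List[int]:
        # 3) Stream cells from start up to end (assumed inclusive).
        return [self.mem.get(a, 0) for a in range(start, end + 1)]

    def dma_write(self, start: int, end: int, data: Iterable[int]) -> None:
        # 4) Store the streamed words at consecutive addresses, start to end.
        values = list(data)
        if len(values) != end - start + 1:
            raise ValueError("data length does not match the DMA address range")
        for offset, value in enumerate(values):
            self.mem[start + offset] = value
```

A short usage example: after `ctrl.dma_write(10, 12, [1, 2, 3])`, calling `ctrl.dma_read(10, 12)` returns the same three words in order.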
2.5D accelerator extension: First, as shown in Fig. 6, software directs the multi-core extension configurator to generate signals. These connect the output of the horizontal MUX to the off-chip accelerator interface and, in addition, generate an address from the XY coordinates of each router's position in the two-dimensional mesh topology and load it into that router. A processor then exchanges control information and data with an off-chip accelerator through the off-chip accelerator interface module. The specific control information is encapsulated in a configuration packet, which defines information such as the type of accelerator to call and the length of the data to be processed. Fig. 5 shows the structure of the configuration packet generated by the off-chip accelerator interface.
2.5D multi-core chip extension: First, as shown in Fig. 6, software directs the multi-core extension configurator to generate signals. These connect the outputs of the horizontal and vertical MUXes to the routers and, in addition, generate an address from the XY coordinates of each router's position in the extended two-dimensional mesh topology and load it into that router. In this way, the network-on-chip formed by the extended routers realizes the extension of the multi-core system, and data can be delivered through the extended router network to multi-core processors on different chips.
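To make the role of the XY coordinates concrete, the Python sketch below routes a packet across an extended mesh and counts how many hops cross a chip boundary, that is, how many times it would traverse a SerDes link. The patent does not name its routing algorithm; dimension-ordered XY routing is assumed here only as a common choice for two-dimensional meshes, and the function names and per-chip mesh size are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Dimension-ordered route: move along X first, then along Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def chip_of(coord: Coord, chip_cols: int, chip_rows: int) -> Coord:
    """Which chip a router belongs to, given the mesh size of one chip."""
    return (coord[0] // chip_cols, coord[1] // chip_rows)


def serdes_crossings(path: List[Coord], chip_cols: int, chip_rows: int) -> int:
    """Count the hops in a path that pass from one chip to another."""
    chips = [chip_of(c, chip_cols, chip_rows) for c in path]
    return sum(1 for a, b in zip(chips, chips[1:]) if a != b)
```

For two 4 x 4 chips placed side by side, a packet from router (1, 1) to router (5, 2) crosses one chip boundary, between columns 3 and 4.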

Claims (3)

1. An extensible 2.5D multi-core processor architecture, characterized in that it consists of a multi-core processor chip, extension memory chips, and extension accelerator chips, the chips communicating over high-speed data channels provided by SerDes interfaces; the chips are bonded onto the same substrate by a 2.5D process and integrated inside one package;
in the vertical direction, a processor performs single-word reads and writes and direct memory access operations on off-chip memory through an off-chip memory interface module, extending the local storage space; in the horizontal direction, a processor exchanges control information and data with an off-chip accelerator through an off-chip accelerator interface module, extending the coupled accelerators;
the processors are interconnected by a network-on-chip with a two-dimensional mesh structure, and every four processors form a cluster; each processor is connected to local on-chip memories of two kinds, used respectively to store instructions and a small amount of data; the on-chip memory that stores the small amount of data is shared with other cores by message passing over the network-on-chip formed by the routers;
in the vertical direction, a processor performs word reads and writes and DMA operations on off-chip memory through the off-chip memory interface module; when the off-chip memory interface module detects that the address of a pipeline load/store instruction falls within the address space of the local off-chip memory, or receives a DMA configuration signal from the processor, it encodes and packs the corresponding command and sends it off chip through the SerDes interface; the off-chip memory controller receives the configuration packet through its own SerDes interface, decodes it, and controls the word read/write or DMA operation on the off-chip memory.
2. The 2.5D multi-core processor architecture according to claim 1, characterized in that: in the horizontal direction, a processor exchanges control information and data with an off-chip accelerator through the off-chip accelerator interface module; the specific control information is encapsulated in a configuration packet, which defines the type of accelerator to call and the length of the data to be processed.
3. The 2.5D multi-core processor architecture according to claim 1, characterized in that: at the outer boundary of the processor, a two-to-one data selector is placed in each of the vertical and horizontal directions; one input of each data selector connects to a router, and the other input connects to the off-chip memory interface module or the off-chip accelerator interface module, respectively; the output of the data selector connects to the SerDes interface; configuring the select input of the MUX at the inter-chip interface in software determines whether the vertical or horizontal direction operates in multi-core processor chip extension mode.
CN201410237881.9A 2014-06-02 2014-06-02 Extensible 2.5-dimensional multi-core processor architecture Expired - Fee Related CN104008084B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410237881.9A CN104008084B (en) 2014-06-02 2014-06-02 Extensible 2.5-dimensional multi-core processor architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410237881.9A CN104008084B (en) 2014-06-02 2014-06-02 Extensible 2.5-dimensional multi-core processor architecture

Publications (2)

Publication Number Publication Date
CN104008084A (en) 2014-08-27
CN104008084B (en) 2017-01-18

Family

ID=51368743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410237881.9A Expired - Fee Related CN104008084B (en) 2014-06-02 2014-06-02 Extensible 2.5-dimensional multi-core processor architecture

Country Status (1)

Country Link
CN (1) CN104008084B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794100B (en) * 2015-05-06 2017-06-16 西安电子科技大学 Heterogeneous polynuclear processing system based on network-on-chip
CN106294274A (en) * 2016-08-01 2017-01-04 严媚 A kind of polycaryon processor based on 2.5D Advanced Packaging
CN109240980A (en) * 2018-06-26 2019-01-18 深圳市安信智控科技有限公司 Memory access intensity algorithm with multiple high speed serialization Memory access channels accelerates chip
CN109086228B (en) * 2018-06-26 2022-03-29 深圳市安信智控科技有限公司 High speed memory chip with multiple independent access channels
CN109144943A (en) * 2018-06-26 2019-01-04 深圳市安信智控科技有限公司 Computing chip and memory chip combined system based on high-speed serial channel interconnection
CN113138955B (en) * 2020-01-20 2024-04-02 北京灵汐科技有限公司 Network-on-chip interconnection structure of many-core system and data transmission method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101739241A (en) * 2008-11-12 2010-06-16 中国科学院微电子研究所 On-chip multi-core DSP cluster and application extension method
CN101753388A (en) * 2008-11-28 2010-06-23 中国科学院微电子研究所 Router and interface device suitable for the extending on and among sheets of polycaryon processor
CN103345461A (en) * 2013-04-27 2013-10-09 电子科技大学 Multi-core processor on-chip network system based on FPGA and provided with accelerator
CN103425620A (en) * 2013-08-20 2013-12-04 复旦大学 Coupled structure of accelerator and processor based on multiple Token-Rings

Also Published As

Publication number Publication date
CN104008084A (en) 2014-08-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170118

Termination date: 20190602