CN106294274A - Multi-core processor based on 2.5D advanced packaging - Google Patents
- Publication number
- CN106294274A (application CN201610619205.7A)
- Authority
- CN
- China
- Prior art keywords
- chip
- core
- processor
- multi-core processor
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/167—Interprocessor communication using a common memory, e.g. mailbox
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Design And Manufacture Of Integrated Circuits (AREA)
Abstract
The invention discloses a multi-core processor based on 2.5D advanced packaging, in which μ-bump technology is used to bond several chips with high-speed interconnections side by side onto a single silicon interposer. The core chip is a clustered 8-core, 32-bit MIPS processor architecture: it uses a dual-layer network-on-chip in which circuit switching is controlled by packet switching, is expanded laterally with accelerators and longitudinally with memory, and also supports a core-to-core expansion mode. It can support H.264 entropy decoding for multimedia, and 64-point FFT, complex multiplication, and similar kernels for communications. The chips are interconnected through inter-chip interface circuits, giving the design a high degree of functional flexibility, architectural reconfigurability, and silicon reusability. The inter-chip interface circuit based on 2.5D packaging addresses the difficulties a multi-core processor faces in core-count expansion, memory-system expansion, accelerator expansion, and inter-chip interconnection.
Description
Technical field
The invention relates to a multi-core processor based on 2.5D advanced packaging and belongs to the technical field of multi-core processors.
Background technology
In recent years, smart devices have developed rapidly. Smart products such as tablet computers, smartphones, and wearable devices have advanced quickly in mobile intelligence and the mobile Internet, changing how people live and, more profoundly, how they think. The processor, as the core of a smart device, has a broad market. As smart technology develops, however, the requirements on processors become stricter: high speed, low power consumption, and broad compatibility are the directions in which future processors must develop. Multi-core processors are favored by designers for their parallel processing capability and high energy efficiency, but they in turn require inter-chip interfaces with high speed, high bandwidth, and high reliability. As application software grows more complex, the core count of multi-core processors keeps rising, and memory and acceleration hardware keep changing.
2.5D advanced packaging, which has emerged in recent years, uses μ-bump technology to bond several chips with high-speed interconnections side by side onto a single silicon interposer, integrating multiple chips in one package. It combines the flexibility of multi-chip PCB designs and the speed of 2D on-chip interconnect with the heat dissipation that 3D/TSV stacking lacks. The aim is to apply an inter-chip interface circuit based on 2.5D advanced packaging to multi-core processor design, and thereby to identify and solve the problems that a multi-core processor built around a 2.5D inter-chip interface faces in core-count expansion, memory expansion, accelerator expansion, and high-speed, high-bandwidth, high-reliability inter-chip interconnection.
Summary of the invention
Object of the invention: the invention applies 2.5D advanced packaging to bond multi-core chips side by side onto a single silicon interposer, integrating multiple chips in one package, and for the first time applies an inter-chip interface circuit based on 2.5D integration to a multi-core processor to achieve high-speed interconnection between chips. It then identifies, studies, and solves the difficulties and key technologies that multi-core processor design around a 2.5D inter-chip interface circuit encounters in core-count expansion, memory-system expansion, accelerator expansion, and inter-chip interconnection. By proposing a chip-level 2.5D multi-core architecture with a high degree of functional flexibility, architectural reconfigurability, and chip reusability, it explores how multi-core processor architectures may evolve. The main innovation is the first application of a TRX inter-chip interface circuit in a 2.5D-integrated multi-core processor, which provides flexibility, scalability, and reconfigurability in function, and high speed, high bandwidth, and high energy efficiency in circuit performance.
Technical solution
The object of the invention is achieved by the following measures:
A. Multi-core processor and expansion architecture
The core of the multi-core processor architecture is an 8-core, 32-bit MIPS processor 101, which communicates with the outside through the TRX circuit 105 and the asynchronous FIFO 106 and can operate as a minimal system. To improve data locality, the 8 cores are divided into two clusters 102 of 4 cores each. Cores within a cluster communicate through shared memory, and clusters exchange messages over a dual-layer network-on-chip in which circuit switching is controlled by packet switching. Each MIPS core has a private instruction memory of 1.5K words and an on-chip shared data memory of 1K words. The system can also be expanded longitudinally with 8 SRAM memory modules 103 of 16 KB each, which can be accessed by both the processors in a cluster and the global DMA. To accelerate kernel processing in communication and multimedia applications, the system is expanded laterally with 4 accelerator modules 104, including H.264 entropy decoding, 16-point FFT, and complex multiplication modules. The core-memory and core-accelerator connections use the same interface circuit. An additional data selector (multiplexer) logic block in the core-memory interface allows the longitudinal connection to be configured as an extension of the dual-layer network-on-chip, so that the system can be scaled core-to-core and its computing capability doubled.
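The figures quoted for the core chip and its expansions can be collected in a small model. The following is a minimal sketch, not the patent's implementation: the class and field names are hypothetical, a word is assumed to be 32 bits (matching the 32-bit MIPS cores), "K" is assumed to mean 1024, and one inter-chip link per expansion module is an assumption that happens to agree with the 12 bidirectional channels mentioned later in the description.

```python
from dataclasses import dataclass

WORD_BYTES = 4  # assumption: one word is 32 bits, as in a 32-bit MIPS core


@dataclass(frozen=True)
class CoreChipConfig:
    """Hypothetical summary of the parameters stated in the text."""
    clusters: int = 2
    cores_per_cluster: int = 4
    instr_words_per_core: int = 1536  # 1.5K words of private instruction memory
    data_words_per_core: int = 1024   # 1K words of shared data memory
    sram_modules: int = 8             # longitudinal (memory) expansion
    sram_module_kib: int = 16
    accelerator_modules: int = 4      # lateral (accelerator) expansion

    @property
    def cores(self) -> int:
        return self.clusters * self.cores_per_cluster

    @property
    def on_chip_memory_kib(self) -> float:
        # Instruction plus data memory summed over all cores.
        words = self.cores * (self.instr_words_per_core + self.data_words_per_core)
        return words * WORD_BYTES / 1024

    @property
    def expansion_memory_kib(self) -> int:
        return self.sram_modules * self.sram_module_kib

    @property
    def inter_chip_links(self) -> int:
        # Assumption: each expansion module uses one bidirectional link.
        return self.sram_modules + self.accelerator_modules


cfg = CoreChipConfig()
print(cfg.cores, cfg.on_chip_memory_kib, cfg.expansion_memory_kib, cfg.inter_chip_links)
```

Under these assumptions the model gives 8 cores, 80 KiB of on-chip core memory, 128 KiB of expansion SRAM, and 12 inter-chip links.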
B. Full-custom high-speed TRX inter-chip interface circuit
The full-custom high-speed TRX module 105 consists mainly of a serializer 803, a current-mode logic buffer 804, a TSI channel 806, a sampler 809, a decoder 810, a TX voltage-controlled oscillator (VCO) 805, a clock and data recovery (CDR) circuit 811, and an RX voltage-controlled oscillator (VCO) 812;
The 8:1 serializer 803 converts parallel data into a serial sequence, and the output of the serializer is connected to the input of the current-mode logic (CML) buffer 804;
The output of the CML buffer 804 drives the transmission line (T-line) of the 2.5D TSI 806;
The other end of the TSI 806 T-line is connected to the sampler; after receiving the serial data, the sampler passes it to the decoder 810, which converts the serial data into parallel data to complete the transfer;
The voltage-controlled oscillators generate the clock signals. A CDR circuit 811 based on a delay-locked loop (DLL) adjusts the skew of the sampling clock: a phase detector built from two XOR gates determines the position of the sampling clock relative to the input data and produces "early" and "late" pulses. In addition, a charge pump converts these pulses into a multi-level control signal for the DLL delay line, which adjusts the delay phase of the clock and feeds it back to the sampler 809 as the sampling clock signal.
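The early/late feedback loop can be illustrated with a behavioural model. This is a sketch under stated assumptions, not the patented circuit: the source only says the phase detector uses two XOR gates, and an Alexander-style (bang-bang) arrangement is assumed here; the function names, the ideal square-wave data model, and the `step` gain standing in for the charge pump and delay line are all hypothetical.

```python
import math


def phase_detect(prev_data: int, edge: int, curr_data: int) -> tuple[int, int]:
    """Assumed Alexander-style detector built from two XORs; returns (early, late)."""
    late = prev_data ^ edge   # transition seen before the edge sample: clock is late
    early = edge ^ curr_data  # transition seen after the edge sample: clock is early
    return early, late


def sample(bits: list[int], t: float) -> int:
    """Ideal data waveform: bit k occupies the time interval [k, k+1)."""
    return bits[math.floor(t)]


def run_cdr(bits: list[int], phase_err: float, step: float = 0.02) -> float:
    """Track the data edges; phase_err is the clock offset in unit intervals."""
    for n in range(1, len(bits) - 1):
        prev_d = sample(bits, n - 0.5 + phase_err)
        edge = sample(bits, n + phase_err)
        curr_d = sample(bits, n + 0.5 + phase_err)
        early, late = phase_detect(prev_d, edge, curr_d)
        # Charge pump plus delay line, reduced to one gain: an early clock
        # gets more delay, a late clock gets less.
        phase_err += step * (early - late)
    return phase_err


pattern = [0, 1] * 100
print(abs(run_cdr(pattern, 0.3)) <= 0.0201)
```

In this model the loop walks the sampling phase toward the data edge and then dithers within about one `step` of it, which is the usual behaviour of a bang-bang loop.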
Beneficial effect
The invention uses 2.5D advanced packaging and μ-bump technology to bond several chips with high-speed interconnections side by side onto a single silicon interposer, integrating multi-core chips in one package. 2.5D advanced packaging combines the flexibility of multi-chip PCB interconnect and the speed of 2D on-chip interconnect with the heat dissipation that 3D/TSV stacking lacks. Introducing 2.5D advanced packaging into multi-core processor design also makes it possible to identify, study, and solve the difficulties and key technologies encountered in core-count expansion, memory-system expansion, accelerator expansion, and inter-chip interconnection when designing around a 2.5D inter-chip interface. This in turn enables the exploration of a chip-level 2.5D multi-core architecture with a high degree of functional flexibility, architectural reconfigurability, and silicon reusability.
Description of the drawings
1. Fig. 1 shows the basic architecture of the 2.5D multi-core processor;
2. Fig. 2 shows the full-custom high-speed inter-chip interface TRX circuit;
3. Fig. 3 is a schematic diagram of the 2.5D integration.
Wherein: 101 is the two-cluster 8-core processor model; 102 is the core 0 processor within a cluster; 103 is off-chip memory 0; 104 is the off-chip accelerator; 105 is the TRX custom circuit; 106 is the asynchronous FIFO; 107 is the core 1 processor within a cluster; 108 is the core 2 processor within a cluster; 109 is the core 3 processor within a cluster; 110 is off-chip memory 1; 111 is off-chip memory 2; 112 is off-chip memory 3.
801 is the TxD circuit module; 802 is the TX data; 803 is the serializer; 804 is the current-mode logic buffer; 805 is the TX voltage-controlled oscillator (VCO); 806 is the TSI channel;
807 is the RxD circuit module;
808 is the RX data; 809 is the sampler; 810 is the decoder; 811 is the clock and data recovery (CDR) circuit;
812 is the RX voltage-controlled oscillator (VCO).
301 is chip 1; 302 is chip 2; 303 is chip 3; 304 is the TSI; 305 is the silicon interposer; 306 is the inter-chip interface circuit; 307 is the UART control circuit.
Detailed description of the invention
In accordance with the foregoing, the 2.5D multi-core processor was designed in the GlobalFoundries 65 nm LPE process, and the physical implementation of all digital parts was carried out with conventional 2D chip EDA tools. The design relied mainly on a hierarchical design flow and mixed-signal design methods, with extensive use of parameterized design and scripts generated automatically in Perl, which ensured that the tape-out was completed on schedule. The multi-core chip measures 3.29 × 2.34 mm² and contains 1.27 million equivalent logic gates. The IC Compiler timing report shows an operating frequency of 500 MHz at 1.2 V, and PrimeTime PX power analysis shows a typical single-core power consumption of 25.5 mW, for an energy efficiency of 51.0 pJ/op. The off-chip memory chip measures 1.30 × 0.83 mm² and operates at 719 MHz; the off-chip accelerator chip measures 1.30 × 0.83 mm².
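The quoted energy efficiency follows from the single-core power and the clock frequency. A minimal sketch of the arithmetic, assuming one operation per core per clock cycle (the source does not state how operations are counted); the function name is hypothetical.

```python
def energy_per_op_pj(power_mw: float, freq_mhz: float, ops_per_cycle: float = 1.0) -> float:
    """Energy per operation in picojoules.

    mW divided by MHz gives nanojoules per cycle; multiplying by 1e3 converts to pJ.
    ops_per_cycle = 1 is an assumption, not a figure from the source.
    """
    return power_mw / (freq_mhz * ops_per_cycle) * 1e3


# Single-core figures from the text: 25.5 mW at 500 MHz.
print(round(energy_per_op_pj(25.5, 500.0), 1))
```

With these inputs the result is approximately 51.0 pJ per operation, matching the figure in the text.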
First, the core is an 8-core, 32-bit MIPS processor 101, which communicates with the outside through the TRX circuit 105 and the asynchronous FIFO 106 and can operate as a minimal system. To improve data locality, the 8 cores are divided into two clusters 102 of 4 cores each; cores within a cluster communicate through shared memory, and clusters exchange messages over the dual-layer network-on-chip in which circuit switching is controlled by packet switching. Each MIPS core has a private instruction memory of 1.5K words and an on-chip shared data memory of 1K words. The system can also be expanded longitudinally with 8 SRAM memory modules 103 of 16 KB each, accessible to both the processors in a cluster and the global DMA. To accelerate kernel processing in communication and multimedia applications, the system is expanded laterally with 4 accelerator modules 104, including H.264 entropy decoding, 16-point FFT, and complex multiplication modules.
Fig. 3 is a schematic diagram of the 2.5D system integration. A conventional interface circuit is built on a 2D basis and directly implements high-speed interconnection between modules on the same chip. The invention, by contrast, applies the inter-chip interface circuit 306 to a 2.5D multi-core processor for the first time. Unlike a 2D interface circuit, 2.5D packaging uses a micro-bump (μ-bump) process to bond several finished dies 301, 302, 303 onto the same substrate 305, interconnects them with transmission lines called TSI 304 (Through Silicon Interposer), and finally seals them inside a single package. Introducing the inter-chip interface circuit 306 based on 2.5D technology into multi-core processor design addresses the difficulties encountered in core-count expansion, memory-system expansion, and accelerator expansion. The result is a chip-level 2.5D multi-core architecture with a high degree of functional flexibility, architectural reconfigurability, and silicon reusability.
The inter-chip interface circuit 306, one of the innovations of this design, provides the interconnection and communication between chips; this is the first application of such a circuit to a 2.5D multi-core processor. Because 2.5D packaging uses μ-bump-based flip-chip technology, 246 octagonal metal blocks (75 µm long, at a pitch of 160 µm) are placed on the top metal layer of the multi-core chip in accordance with the DRC rules, for bonding to the silicon interposer 305. During back-end layout, all inter-chip interconnect signals and all signals routed to package pins are connected to these top-metal blocks. The system supports 12 bidirectional inter-chip transmission channels, and according to the allocation of inter-chip I/O resources each unidirectional transmission channel has only 5 physical channels, while the top-level port of the inter-chip interface circuit 306 is abstracted as a 32-bit asynchronous FIFO. This ensures both that the 5 physical channels correctly transfer 32-bit data and that inter-chip signal integrity is maintained.
The inter-chip interface circuit is applied for the first time in a 2.5D multi-core processor architecture, where it achieves the high speed, high bandwidth, and high energy efficiency of 2D on-chip interconnect (small parasitic RC, few driver circuits). The full-custom high-speed TRX circuit is the bottom level of the whole interface and connects directly to the 2.5D TSI 806 physical channel. Its key parts are the transmitter, composed of the serializer 803, the current-mode logic buffer 804, and related blocks, and the receiver, composed of the sampler 809, the deserializer 810, and related blocks. The transmitter uses the 8:1 serializer 803 to convert 8-bit parallel data into a serial sequence. Four D flip-flops form a shift-register chain for each group of even (D0, D2, D4, D6) or odd (D1, D3, D5, D7) data bits, followed by a 2:1 multiplexer that combines the two groups. The output of the current-mode logic (CML) circuit 804 drives the T-line of the TSI 806 between the transmitter and the receiver. The transmitter driver of the multi-core I/O interface consists of two cascaded CML buffer stages. To reduce impedance mismatch, 50 Ω resistors are used to match the transmission-line impedance. At the receiver, the sampler 809 is connected to the clock and data recovery (CDR) module 811 and converts the current-mode signal into a digital CMOS-level signal, which the deserializer 810 then converts into 8-bit parallel data. The CDR circuit 811, based on a delay-locked loop (DLL), adjusts the skew of the sampling clock: a phase detector built from two XOR gates determines the position of the sampling clock relative to the input data and produces "early" and "late" pulses. In addition, a charge pump converts these pulses into a multi-level control signal for the DLL delay line, which adjusts the delay phase of the clock and feeds it back to the sampler 809 as the sampling clock signal.
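The even/odd serializer structure can be modelled at the bit level. The sketch below is a behavioural illustration only: the function names are hypothetical, and the bit order (D0 transmitted first) is an assumption, since the source does not state which bit leaves the serializer first.

```python
def serialize_8to1(word: int) -> list[int]:
    """Model of an 8:1 serializer built from two 4-bit chains and a 2:1 mux."""
    bits = [(word >> i) & 1 for i in range(8)]  # D0..D7
    even_chain = bits[0::2]  # D0, D2, D4, D6 held in one 4-flip-flop chain
    odd_chain = bits[1::2]   # D1, D3, D5, D7 held in the other chain
    stream = []
    for even_bit, odd_bit in zip(even_chain, odd_chain):
        stream.append(even_bit)  # mux selects the even chain on one clock phase
        stream.append(odd_bit)   # and the odd chain on the other phase
    return stream


def deserialize_1to8(stream: list[int]) -> int:
    """Inverse operation at the receiver: rebuild the 8-bit word from the stream."""
    if len(stream) != 8:
        raise ValueError("expected exactly 8 serial bits")
    return sum(bit << i for i, bit in enumerate(stream))


word = 0b10110010
assert deserialize_1to8(serialize_8to1(word)) == word
```

Splitting the word into two half-rate chains means each flip-flop chain shifts at half the line rate, with only the final multiplexer switching at full rate; this is a common reason for the arrangement, although the source does not give its rationale.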
At an operating rate of 8 Gbps, the module's TxD power consumption is 15.24 mW with a delay of 1.16 ns, and its RxD power consumption is 7.10 mW with a delay of 2.69 ns. Because the 2.5D multi-core chip proposed here supports 12 channels transmitting in both directions simultaneously, the peak inter-chip data bandwidth of the system is 24 GB/s, of which 16 GB/s is for memory access.
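The bandwidth figures can be checked with simple arithmetic. The sketch below assumes that every channel runs at 8 Gbps in each direction and that 8 of the 12 channels serve the memory modules; the second assumption is an inference from the 8 SRAM modules and the 16 GB/s figure, not something the source states. The function names are hypothetical.

```python
def peak_bandwidth_gbytes(channels: int, gbps_per_direction: float, directions: int = 2) -> float:
    """Aggregate peak bandwidth in GB/s for full-duplex channels (8 bits per byte)."""
    return channels * gbps_per_direction * directions / 8


def energy_per_bit_pj(power_mw: float, gbps: float) -> float:
    """mW divided by Gbps equals picojoules per bit."""
    return power_mw / gbps


total = peak_bandwidth_gbytes(channels=12, gbps_per_direction=8.0)
memory = peak_bandwidth_gbytes(channels=8, gbps_per_direction=8.0)  # assumed 8 memory channels
print(total, memory)
print(energy_per_bit_pj(15.24, 8.0), energy_per_bit_pj(7.10, 8.0))
```

Under these assumptions the totals are 24 GB/s and 16 GB/s, consistent with the text, and the quoted TxD and RxD powers correspond to roughly 1.9 pJ/bit and 0.9 pJ/bit respectively.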
Claims (2)
1. A multi-core processor based on 2.5D advanced packaging, characterized by comprising a processor architecture design, wherein the processor architecture is designed as follows:
A clustered 8-core, 32-bit MIPS processor (101) is the core unit of the whole system and communicates with the outside through a full-custom high-speed TRX module (105) and an S/P asynchronous FIFO (106); the 8-core processor (102) is divided into two clusters of 4 cores each; shared-memory communication is possible within a cluster, and information is transferred between clusters over a dual-layer network-on-chip in which circuit switching is controlled by packet switching;
The system is expanded longitudinally in memory space: 8 SRAM memory modules (103) of 16 KB each are added remotely off-chip and connected to the processor through the full-custom high-speed TRX module (105) and the S/P asynchronous FIFO (106);
The system is expanded laterally: 4 accelerator modules (104) are added and likewise connected to the processor through the full-custom high-speed TRX module (105) and the S/P asynchronous FIFO (106).
2. The multi-core processor according to claim 1, characterized in that the full-custom high-speed TRX module (105) consists of a serializer (803), a current-mode logic buffer (804), a TSI channel (806), a sampler (809), and a decoder (810);
The 8:1 serializer (803) converts parallel data into a serial sequence, and the output of the serializer is connected to the input of the current-mode logic (CML) buffer (804);
The output of the CML buffer (804) drives the transmission line (T-line) of the 2.5D TSI (806);
The other end of the TSI (806) T-line is connected to the sampler; after receiving the serial data, the sampler passes it to the decoder (810), which converts the serial data into parallel data to complete the transfer;
The other modules of the TRX circuit include a TX voltage-controlled oscillator (VCO) (805), a clock and data recovery (CDR) circuit (811), and an RX voltage-controlled oscillator (VCO) (812); the voltage-controlled oscillators generate the clock signals, and the CDR circuit (811) adjusts the skew of the sampling clock.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610619205.7A CN106294274A (en) | 2016-08-01 | 2016-08-01 | Multi-core processor based on 2.5D advanced packaging |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610619205.7A CN106294274A (en) | 2016-08-01 | 2016-08-01 | Multi-core processor based on 2.5D advanced packaging |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106294274A true CN106294274A (en) | 2017-01-04 |
Family
ID=57663625
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610619205.7A Pending CN106294274A (en) | 2016-08-01 | 2016-08-01 | Multi-core processor based on 2.5D advanced packaging |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106294274A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6323679B1 (en) * | 1999-11-12 | 2001-11-27 | Sandia Corporation | Flexible programmable logic module |
CN104008084A (en) * | 2014-06-02 | 2014-08-27 | 复旦大学 | Extensible 2.5-dimensional multi-core processor architecture |
CN104035896A (en) * | 2014-06-10 | 2014-09-10 | 复旦大学 | Off-chip accelerator applicable to fusion memory of 2.5D (2.5 dimensional) multi-core system |
-
2016
- 2016-08-01 CN CN201610619205.7A patent/CN106294274A/en active Pending
Non-Patent Citations (1)
Title |
---|
JIE LIN 等: "A Scalable and Reconfigurable 2.5D Integrated Multicore Processor on Silicon Interposer", 《CUSTOM INTEGRATED CIRCUITS CONFERENCE》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101951313B (en) | FPGA-based SFI4.1 device | |
PD et al. | A scalable network-on-chip microprocessor with 2.5 D integrated memory and accelerator | |
CN103885919A (en) | Multi-DSP and multi-FPGA parallel processing system and implement method | |
CN102999467A (en) | High-speed interface and low-speed interface switching circuit and method based on FPGA (Field Programmable Gate Array) | |
CN110334044A (en) | A kind of MIPI DPHY transmitting line and equipment | |
Höppner et al. | An energy efficient multi-Gbit/s NoC transceiver architecture with combined AC/DC drivers and stoppable clocking in 65 nm and 28 nm CMOS | |
Rahmani et al. | BBVC-3D-NoC: an efficient 3D NoC architecture using bidirectional bisynchronous vertical channels | |
CN106649155A (en) | SDR SDRAM (Single Data Rate Synchronous Dynamic Random Access Memory) controller with low power consumption and high data throughout and working method thereof | |
CN104184456B (en) | For the low frequency multi-phase differential clock tree-shaped high-speed low-power-consumption serializer of I/O interface | |
US10924096B1 (en) | Circuit and method for dynamic clock skew compensation | |
CN106294274A (en) | A kind of polycaryon processor based on 2.5D Advanced Packaging | |
TWI810962B (en) | Semiconductor die, electronic component, electronic device and manufacturing method thereof | |
Dang et al. | FPGA implementation of a low latency and high throughput network-on-chip router architecture | |
Liao et al. | Exploring AMBA AXI on-chip interconnection for TSV-based 3D SoCs | |
CN104394072A (en) | Double-pumped vertical channel for three dimensional Network on chip | |
CN108717400A (en) | A kind of field programmable gate array and communication means | |
Goyal et al. | Neksus: An interconnect for heterogeneous system-in-package architectures | |
CN106708768A (en) | Aurora interface binding method and device based on shared phase-locked loops | |
Sundaram et al. | A reconfigurable asynchronous SERDES for heterogenous chiplet interconnects | |
CN219842685U (en) | FPGA platform LVDS parallel bus bandwidth acceleration device | |
Ning et al. | Design of a GALS Wrapper for Network on Chip | |
Canegallo et al. | System on chip with 1.12 mW-32Gb/s AC-coupled 3D memory interface | |
Wey et al. | A 2Gb/s high-speed scalable shift-register based on-chip serial communication design for SoC applications | |
Chen et al. | Design of SRAM Based Interface Module with DMA in Inductive-Coupling 3D Stacked IoT Chips | |
Mas et al. | Network-on-chip: The intelligence is in the wire |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20170104 |