CN106294274A - A multi-core processor based on 2.5D advanced packaging - Google Patents

A multi-core processor based on 2.5D advanced packaging

Info

Publication number
CN106294274A
CN106294274A CN201610619205.7A CN201610619205A CN106294274A CN 106294274 A CN106294274 A CN 106294274A CN 201610619205 A CN201610619205 A CN 201610619205A CN 106294274 A CN106294274 A CN 106294274A
Authority
CN
China
Prior art keywords
chip
core
processor
multi-core processor
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610619205.7A
Other languages
Chinese (zh)
Inventor
余浩
虞志益
严媚
伊颖颖
刘旭
黄汐威
许鹤
李硕
李烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201610619205.7A
Publication of CN106294274A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/167 Interprocessor communication using a common memory, e.g. mailbox

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

The invention discloses a multi-core processor based on 2.5D advanced packaging, in which μ-bump technology is used to bond several chips with high-speed interconnections side by side onto the same silicon interposer. The core is a clustered 8-core 32-bit MIPS processor architecture: it uses a circuit-switched double-layer network-on-chip controlled by packet switching, is extended laterally with accelerators and longitudinally with memory, and also supports a core-to-core extension mode. It supports H.264 entropy decoding for multimedia, and 64-point FFT and complex multiplication for communications, among others. The chips are interconnected through an inter-chip interface circuit, which gives the design a high degree of functional flexibility, architectural reconfigurability and silicon reusability. The inter-chip interface circuit based on 2.5D packaging addresses the difficulties a multi-core processor faces in core-count extension, memory-system extension, accelerator extension and inter-chip interconnection.

Description

A multi-core processor based on 2.5D advanced packaging
Technical field
The present invention relates to a multi-core processor based on 2.5D advanced packaging and belongs to the technical field of multi-core processors.
Background art
In recent years, smart devices have developed rapidly. Smart products such as tablet computers, smartphones and wearable devices have advanced quickly in mobile intelligence and the mobile Internet, changing how people live and also profoundly influencing how they think. As the core of a smart device, the processor has a broad application market. As smart technology develops, however, the requirements placed on processors become stricter. High speed, low power consumption and broad compatibility are the directions of future processor development. Multi-core processors are favored by designers for their parallel processing capability and high energy efficiency, which in turn calls for inter-chip interfaces with high speed, high bandwidth and high reliability. As application software grows more complex, the core count of multi-core processors keeps rising, and memories and acceleration hardware keep changing.
The 2.5D advanced packaging technology that has emerged in recent years uses μ-bump technology to bond several chips with high-speed interconnections side by side onto the same silicon interposer, integrating multiple chips in a single package. 2.5D advanced packaging combines the flexibility of multi-chip PCB interconnection, the high speed of 2D on-chip interconnection, and the heat dissipation that 3D/TSV lacks. By applying an inter-chip interface circuit based on 2.5D advanced packaging to multi-core processor design, the aim is to identify and solve the problems that a multi-core processor with a 2.5D inter-chip interface faces in core-count extension, memory extension, accelerator extension, and high-speed, high-bandwidth, high-reliability inter-chip interconnection.
Summary of the invention
Object of the invention: the object of the invention is to apply 2.5D advanced packaging to bond multi-core chips side by side onto the same silicon interposer, integrating multiple chips in a single package, and, for the first time, to apply an inter-chip interface circuit based on 2.5D integration to a multi-core processor to achieve high-speed interconnection between the chips. The invention further identifies, studies and solves the difficulties and key technologies in core-count extension, memory-system extension, accelerator extension and inter-chip interconnection that arise when designing a multi-core processor around a 2.5D inter-chip interface circuit. By proposing a chip-level 2.5D multi-core architecture with a high degree of functional flexibility, architectural reconfigurability and chip reusability, it explores the evolution of multi-core processor architectures. The main innovation is that a TRX inter-chip interface circuit is applied for the first time in a 2.5D-integrated multi-core processor, which gives flexibility, extensibility and reconfigurability in function, and high speed, high bandwidth and high energy efficiency in circuit performance.
Technical solution
The object of the invention is achieved by the following measures:
A. Multi-core processor and extension architecture
The core of the multi-core processor architecture of the invention is an 8-core 32-bit MIPS processor 101, which communicates with the outside through the TRX circuit 105 and the asynchronous FIFO 106 and can operate as a minimum system. To further improve data locality, the 8 cores are divided into two clusters 102 of 4 cores each; cores within a cluster communicate through shared memory, and the clusters exchange messages over a circuit-switched double-layer network-on-chip controlled by packet switching. Each MIPS core contains a private instruction memory of 1.5K words and an on-chip shared data memory of 1K words. The system can also be extended longitudinally with eight 16 KB SRAM memory modules 103, which can be shared by the processors in a cluster and by the global DMA. To accelerate kernel processing in communication and multimedia applications, the system is also extended laterally with 4 accelerator modules 104, including H.264 entropy decoding, 16-point FFT and complex multiplication modules. The core-memory and core-accelerator connections use the same interface circuit. A data selector (multiplexer) logic block added to the core-memory interface allows the longitudinal connection to be configured as an extension of the double-layer network-on-chip, so that the system can be extended core-to-core and its computing capability doubled.
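The figures in the architecture description above can be collected into a small configuration model. This is only an illustrative sketch: the class and function names are hypothetical, and it assumes that "1.5K words" means 1536 words, that a word is 4 bytes (32-bit MIPS), that cores are numbered consecutively within a cluster, and that each external memory or accelerator die uses one inter-chip link.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MulticoreConfig:
    """Hypothetical container for the figures quoted in the description."""
    clusters: int = 2
    cores_per_cluster: int = 4
    word_bytes: int = 4                 # assumption: 32-bit word = 4 bytes
    imem_words_per_core: int = 1536     # assumption: 1.5K words = 1536 words
    dmem_words_per_core: int = 1024     # 1K words
    ext_sram_modules: int = 8
    ext_sram_bytes_each: int = 16 * 1024
    accelerators: int = 4

    @property
    def cores(self) -> int:
        return self.clusters * self.cores_per_cluster

    @property
    def on_chip_memory_bytes(self) -> int:
        words = self.imem_words_per_core + self.dmem_words_per_core
        return self.cores * words * self.word_bytes

    @property
    def external_sram_bytes(self) -> int:
        return self.ext_sram_modules * self.ext_sram_bytes_each

    @property
    def inter_chip_links(self) -> int:
        # Assumption: one link per external memory or accelerator die.
        return self.ext_sram_modules + self.accelerators


def comm_mode(cfg: MulticoreConfig, core_a: int, core_b: int) -> str:
    """Shared memory inside a cluster, network-on-chip between clusters."""
    same = core_a // cfg.cores_per_cluster == core_b // cfg.cores_per_cluster
    return "shared-memory" if same else "network-on-chip"


cfg = MulticoreConfig()
print(cfg.cores, cfg.on_chip_memory_bytes, cfg.external_sram_bytes)
print(comm_mode(cfg, 0, 3), comm_mode(cfg, 3, 4))
```

Under these assumptions the model gives 8 cores, 80 KB of per-core on-chip memory in total, 128 KB of external SRAM, and 12 inter-chip links, which matches the 12 bidirectional channels mentioned later in the description.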
B. Full-custom high-speed TRX inter-chip interface circuit
The full-custom high-speed TRX module 105 mainly consists of a serializer 803, a current-mode logic (CML) buffer 804, a TSI channel 806, a sampler 809, a decoder 810, a TX voltage-controlled oscillator (VCO) 805, a clock and data recovery (CDR) circuit 811 and an RX voltage-controlled oscillator (VCO) 812;
The 8:1 serializer 803 converts parallel data into a serial sequence, and the output of the serializer is connected to the input of the CML buffer 804;
The output of the CML buffer 804 drives the transmission line (T-line) of the 2.5D TSI 806;
The other end of the TSI 806 T-line is connected to the sampler; after receiving the serial data, the sampler passes it to the decoder 810, which converts the serial data into parallel data to complete the transfer;
The voltage-controlled oscillators generate the clock signals. The CDR circuit 811, based on a delay-locked loop (DLL), adjusts the skew of the sampling clock: a phase detector built from two XOR gates determines the position of the sampling clock phase relative to the input data and produces "early" and "late" pulses. In addition, a charge pump converts these pulses into a multi-level control voltage for the DLL delay line, which adjusts the delay phase of the clock and feeds it back to the sampler 809 as the sampling clock signal.
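The early/late loop described above can be illustrated with a simple behavioral model. This is a sketch under stated assumptions, not the circuit itself: the XOR-gate phase detector is reduced to a comparison, phases are measured in unit intervals (UI), the delay line is assumed to track the control level one-to-one, and all names and step sizes are hypothetical.

```python
def phase_detect(clock_phase: float, eye_center: float) -> int:
    """Return +1 for an 'early' pulse, -1 for a 'late' pulse, 0 when aligned."""
    if clock_phase < eye_center:
        return 1     # sampling edge arrives before the eye center: early
    if clock_phase > eye_center:
        return -1    # sampling edge arrives after the eye center: late
    return 0


def run_cdr(eye_center: float = 0.5, start_phase: float = 0.1,
            pump_step: float = 0.01, cycles: int = 200) -> list:
    """Iterate the early/late loop and return the sampling phase per cycle."""
    control = start_phase            # control level held by the loop
    history = []
    for _ in range(cycles):
        pulse = phase_detect(control, eye_center)
        control += pump_step * pulse  # charge pump: add or remove a fixed step
        history.append(control)       # delay line: phase follows the control level
    return history


phases = run_cdr()
print(round(phases[0], 3), round(phases[-1], 3))
```

In this model the sampling phase walks toward the eye center one pump step at a time and then dithers around it by at most one step, which is the usual behavior of an early/late (bang-bang) loop.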
Beneficial effects
The invention uses 2.5D advanced packaging and μ-bump technology to bond several chips with high-speed interconnections side by side onto the same silicon interposer, integrating multi-core chips in a single package. 2.5D advanced packaging combines the flexibility of multi-chip PCB interconnection, the high speed of 2D on-chip interconnection, and the heat dissipation that 3D/TSV lacks. Introducing 2.5D advanced packaging into multi-core processor design also makes it possible to identify, study and solve the difficulties and key technologies in core-count extension, memory-system extension, accelerator extension and inter-chip interconnection that arise when designing a multi-core processor around a 2.5D inter-chip interface. This in turn leads to a chip-level 2.5D multi-core architecture with a high degree of functional flexibility, architectural reconfigurability and silicon reusability.
Brief description of the drawings
Fig. 1 shows the basic architecture of the 2.5D multi-core processor;
Fig. 2 shows the full-custom high-speed inter-chip interface TRX circuit;
Fig. 3 is a schematic of the 2.5D integration.
Where: 101 is the two-cluster 8-core processor model; 102 is core 0 within a cluster; 103 is off-chip memory 0; 104 is an off-chip accelerator; 105 is the TRX custom circuit; 106 is an asynchronous FIFO; 107 is core 1 within a cluster; 108 is core 2 within a cluster; 109 is core 3 within a cluster; 110 is off-chip memory 1; 111 is off-chip memory 2; 112 is off-chip memory 3.
801 is the TxD circuit module; 802 is TX data; 803 is the serializer; 804 is the current-mode logic buffer; 805 is the TX voltage-controlled oscillator (VCO); 806 is the TSI channel;
807 is the RxD circuit module;
808 is RX data; 809 is the sampler; 810 is the decoder; 811 is the clock and data recovery (CDR) circuit;
812 is the RX voltage-controlled oscillator (VCO).
301 is chip 1; 302 is chip 2; 303 is chip 3; 304 is the TSI; 305 is the silicon interposer; 306 is the inter-chip interface circuit; 307 is the UART control circuit.
Detailed description of the invention
In accordance with the foregoing, this 2.5D multi-core processor is designed in the GlobalFoundries 65 nm LPE process, and the physical implementation of all digital parts of the chip was carried out with conventional 2D chip EDA tools. The work relied mainly on a hierarchical design flow and a mixed-signal design methodology, with extensive use of parameterized design and scripts generated automatically in Perl, which helped keep the tape-out on schedule. The multi-core chip area is 3.29 × 2.34 mm², with an equivalent gate count of 1.27 million. The IC Compiler timing report shows an operating frequency of 500 MHz at 1.2 V, and the PrimeTime PX power analysis shows a typical single-core power of 25.5 mW, for an energy efficiency of 51.0 pJ/op. The off-chip memory chip area is 1.30 × 0.83 mm², with an operating frequency of 719 MHz; the off-chip accelerator chip area is 1.30 × 0.83 mm².
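The quoted energy efficiency follows from the single-core power and the clock frequency if one operation per cycle is assumed. The short check below makes that arithmetic explicit; the function names are hypothetical and the one-operation-per-cycle figure is an assumption, since the source does not state how operations are counted.

```python
def energy_per_op_pj(power_mw: float, freq_mhz: float,
                     ops_per_cycle: float = 1.0) -> float:
    """Energy per operation in picojoules.

    mW divided by MHz gives nJ per cycle; multiplying by 1000 converts to pJ.
    """
    return power_mw / (freq_mhz * ops_per_cycle) * 1000.0


def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Rectangular die area in square millimetres."""
    return width_mm * height_mm


print(round(energy_per_op_pj(25.5, 500.0), 1))   # single core at 500 MHz
print(round(die_area_mm2(3.29, 2.34), 2))        # multi-core die
print(round(die_area_mm2(1.30, 0.83), 2))        # memory or accelerator die
```

With 25.5 mW at 500 MHz this reproduces the 51.0 pJ/op figure, and the multi-core die works out to roughly 7.70 mm² against about 1.08 mm² for each memory or accelerator die.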
First, the core is an 8-core 32-bit MIPS processor 101, which communicates with the outside through the TRX circuit 105 and the asynchronous FIFO 106 and can operate as a minimum system. To further improve data locality, the 8 cores are divided into two clusters 102 of 4 cores each; cores within a cluster communicate through shared memory, and the clusters exchange messages over a circuit-switched double-layer network-on-chip controlled by packet switching. Each MIPS core contains a private instruction memory of 1.5K words and an on-chip shared data memory of 1K words. The system can also be extended longitudinally with eight 16 KB SRAM memory modules 103, which can be shared by the processors in a cluster and by the global DMA. To accelerate kernel processing in communication and multimedia applications, the system is also extended laterally with 4 accelerator modules 104, including H.264 entropy decoding, 16-point FFT and complex multiplication modules.
Fig. 3 is a schematic of the 2.5D system integration. A conventional on-chip interface circuit provides high-speed interconnection between different modules directly on the same chip, on a 2D basis. The present invention is the first to apply an inter-chip interface circuit 306 to a 2.5D multi-core processor. Unlike a 2D interface circuit, 2.5D packaging uses a micro-bump (μ-bump) process to bond several fabricated dies 301, 302, 303 onto the same substrate 305, interconnects them with transmission lines called TSI 304 (Through Silicon Interposer), and finally assembles them inside a single package. Introducing the inter-chip interface circuit 306 based on 2.5D technology into multi-core processor design addresses the difficulties of core-count extension, memory-system extension and accelerator extension in a 2.5D multi-core design. The result is a chip-level 2.5D multi-core architecture with a high degree of functional flexibility, architectural reconfigurability and silicon reusability.
The inter-chip interface circuit 306 is one of the innovations of this design and provides interconnection and communication between chips; this is the first time such a circuit has been applied to a 2.5D multi-core processor. Because 2.5D packaging uses flip-chip technology based on μ-bumps, 246 octagonal metal pads (75 µm long, 160 µm pitch) are placed on the top metal layer of the multi-core chip according to the DRC rules, for bonding to the silicon interposer 305. During back-end layout, all inter-chip interconnect signals and all signals routed to package pins are connected to these top-metal pads. Because the system supports 12 bidirectional inter-chip transmission channels, the allocation of inter-chip I/O resources leaves only 5 lines for each unidirectional physical transmission channel, while the top-level port of the inter-chip interface circuit 306 is abstracted as a 32-bit asynchronous FIFO. This ensures that the 5 physical lines carry the 32-bit data correctly and also preserves inter-chip signal integrity.
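The width conversion between the 32-bit FIFO port and the 8-bit input of the 8:1 serializer can be sketched as follows. The source does not say how the five physical lines are used or in which order the bytes are sent, so the least-significant-byte-first order and the function names here are assumptions made purely for illustration.

```python
def split_word(word: int) -> list:
    """Split a 32-bit FIFO word into four 8-bit frames, least significant first."""
    if not 0 <= word < (1 << 32):
        raise ValueError("word must fit in 32 bits")
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def join_word(frames: list) -> int:
    """Rebuild the 32-bit word from four 8-bit frames in the same order."""
    if len(frames) != 4:
        raise ValueError("expected exactly four frames")
    word = 0
    for i, frame in enumerate(frames):
        word |= (frame & 0xFF) << (8 * i)
    return word


frames = split_word(0x12345678)
print([hex(f) for f in frames])
print(hex(join_word(frames)))
```

Each frame would then pass through one 8:1 serialization, so a 32-bit word needs four serializer cycles under this assumed framing.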
For the first time, an inter-chip interface circuit is applied to a 2.5D multi-core processor architecture; it achieves the high speed, high bandwidth and high energy efficiency of 2D on-chip interconnect (small parasitic RC, few driver circuits). The full-custom high-speed TRX circuit is the lowest level of the whole interface and connects directly to the physical channel of the 2.5D TSI 806. Its key parts are the transmitter, made up of the serializer 803, the current-mode logic buffer 804 and so on, and the receiver, made up of the sampler 809, the deserializer 810 and so on. The transmitter uses the 8:1 serializer 803 to convert 8-bit parallel data into a serial sequence. Four D flip-flops form a shift-register chain for each group of even (D0, D2, D4, D6) or odd (D1, D3, D5, D7) data bits, followed by a 2:1 selector that combines the two chains. The output of the current-mode logic (CML) circuit 804 drives the T-line of the TSI 806 between transmitter and receiver. The transmitter driver of the multi-core I/O interface consists of two cascaded CML buffer stages. To reduce impedance mismatch, a 50 Ω resistor is used to match the transmission line impedance. At the receiver, the sampler 809 is connected to the clock and data recovery (CDR) module 811 and converts the current-mode signal into a digital CMOS-level signal, which the deserializer 810 then converts into 8-bit parallel data. The CDR circuit 811, based on a delay-locked loop (DLL), adjusts the skew of the sampling clock: a phase detector built from two XOR gates determines the position of the sampling clock relative to the input data and produces "early" and "late" pulses. In addition, a charge pump converts these pulses into a multi-level control voltage for the DLL delay line, which adjusts the delay phase of the clock and feeds it back to the sampler 809 as the sampling clock signal.
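The even/odd structure of the 8:1 serializer can be modeled at the bit level. The sketch below is a behavioral illustration only: it assumes that D0 is the least significant bit, that the 2:1 selector picks the even chain first, and that the receiver reassembles bits in arrival order. The function names are hypothetical.

```python
def serialize_byte(byte: int) -> list:
    """Model the 8:1 serializer: two 4-bit chains merged by a 2:1 selector."""
    bits = [(byte >> i) & 1 for i in range(8)]   # D0..D7, D0 assumed to be the LSB
    even_chain = bits[0::2]                      # D0, D2, D4, D6
    odd_chain = bits[1::2]                       # D1, D3, D5, D7
    stream = []
    for even_bit, odd_bit in zip(even_chain, odd_chain):
        stream.append(even_bit)   # selector picks the even chain on one phase
        stream.append(odd_bit)    # and the odd chain on the other phase
    return stream


def deserialize_bits(stream: list) -> int:
    """Model the receiver side: rebuild the byte from eight serial bits."""
    if len(stream) != 8:
        raise ValueError("expected eight serial bits")
    byte = 0
    for i, bit in enumerate(stream):
        byte |= (bit & 1) << i
    return byte


stream = serialize_byte(0xA5)
print(stream)
print(hex(deserialize_bits(stream)))
```

Interleaving the two half-rate chains yields the bits in the order D0 to D7, so each chain only has to shift at half the line rate, which is the usual motivation for this split.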
At an operating rate of 8 Gbps, the TxD power of this module is 15.24 mW with a delay of 1.16 ns, and the RxD power is 7.10 mW with a delay of 2.69 ns. Because the 2.5D multi-core chip proposed here supports simultaneous bidirectional transfer on 12 channels, the peak inter-chip data bandwidth of the system is 24 GB/s, of which 16 GB/s is for memory access.
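The bandwidth figures are consistent with one 8 Gbps serial lane per direction on each channel. The check below reproduces them under that assumption, together with the further assumption that 8 of the 12 channels serve the memory modules and 4 serve the accelerators; the function name is hypothetical.

```python
def peak_bandwidth_gbytes(channels: int, lane_gbps: float = 8.0,
                          directions: int = 2) -> float:
    """Peak bandwidth in GB/s, assuming one serial lane per direction per channel."""
    return channels * directions * lane_gbps / 8.0   # 8 bits per byte


print(peak_bandwidth_gbytes(12))   # all inter-chip channels
print(peak_bandwidth_gbytes(8))    # channels assumed to serve the memory modules
print(peak_bandwidth_gbytes(4))    # channels assumed to serve the accelerators
```

Twelve channels give 24 GB/s and eight give 16 GB/s, matching the totals stated above; the remaining 8 GB/s would correspond to the four accelerator channels.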

Claims (2)

1. A multi-core processor based on 2.5D advanced packaging, characterized by its processor architecture, which is designed as follows:
A clustered 8-core 32-bit MIPS processor (101) is the core unit of the whole system and communicates with the outside through the full-custom high-speed TRX module (105) and the S/P asynchronous FIFO (106); the 8 processor cores (102) are divided into two clusters of 4 cores each; cores within a cluster can communicate through shared memory, and the clusters exchange information over a circuit-switched double-layer network-on-chip controlled by packet switching;
The system extends its memory space longitudinally: eight 16 KB SRAM memory modules (103) are added off-chip and connected to the processor through the full-custom high-speed TRX module (105) and the S/P asynchronous FIFO (106);
The system is extended laterally with 4 accelerator modules (104), likewise connected to the processor through the full-custom high-speed TRX module (105) and the S/P asynchronous FIFO (106).
2. The multi-core processor according to claim 1, characterized in that the full-custom high-speed TRX module (105) consists of a serializer (803), a current-mode logic buffer (804), a TSI channel (806), a sampler (809) and a decoder (810);
The 8:1 serializer (803) converts parallel data into a serial sequence, and the output of the serializer is connected to the input of the current-mode logic (CML) buffer (804);
The output of the CML buffer (804) drives the transmission line (T-line) of the 2.5D TSI (806);
The other end of the TSI (806) T-line is connected to the sampler; after receiving the serial data, the sampler passes it to the decoder (810), which converts the serial data into parallel data to complete the transfer;
The other modules of the TRX circuit include a TX voltage-controlled oscillator (VCO) (805), a clock and data recovery (CDR) circuit (811) and an RX voltage-controlled oscillator (VCO) (812); the voltage-controlled oscillators generate the clock signals, and the CDR circuit (811) adjusts the skew of the sampling clock.
CN201610619205.7A 2016-08-01 2016-08-01 A multi-core processor based on 2.5D advanced packaging Pending CN106294274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610619205.7A CN106294274A (en) 2016-08-01 2016-08-01 A multi-core processor based on 2.5D advanced packaging

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610619205.7A CN106294274A (en) 2016-08-01 2016-08-01 A multi-core processor based on 2.5D advanced packaging

Publications (1)

Publication Number Publication Date
CN106294274A 2017-01-04

Family

ID=57663625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610619205.7A Pending CN106294274A (en) 2016-08-01 2016-08-01 A multi-core processor based on 2.5D advanced packaging

Country Status (1)

Country Link
CN (1) CN106294274A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6323679B1 (en) * 1999-11-12 2001-11-27 Sandia Corporation Flexible programmable logic module
CN104008084A (en) * 2014-06-02 2014-08-27 复旦大学 Extensible 2.5-dimensional multi-core processor architecture
CN104035896A (en) * 2014-06-10 2014-09-10 复旦大学 Off-chip accelerator applicable to fusion memory of 2.5D (2.5 dimensional) multi-core system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIE LIN 等: "A Scalable and Reconfigurable 2.5D Integrated Multicore Processor on Silicon Interposer", 《CUSTOM INTEGRATED CIRCUITS CONFERENCE》 *

Similar Documents

Publication Publication Date Title
CN101951313B (en) FPGA-based SFI4.1 device
PD et al. A scalable network-on-chip microprocessor with 2.5 D integrated memory and accelerator
CN103885919A (en) Multi-DSP and multi-FPGA parallel processing system and implement method
CN102999467A (en) High-speed interface and low-speed interface switching circuit and method based on FPGA (Field Programmable Gate Array)
CN110334044A (en) A kind of MIPI DPHY transmitting line and equipment
Höppner et al. An energy efficient multi-Gbit/s NoC transceiver architecture with combined AC/DC drivers and stoppable clocking in 65 nm and 28 nm CMOS
Rahmani et al. BBVC-3D-NoC: an efficient 3D NoC architecture using bidirectional bisynchronous vertical channels
CN106649155A (en) SDR SDRAM (Single Data Rate Synchronous Dynamic Random Access Memory) controller with low power consumption and high data throughout and working method thereof
CN104184456B (en) For the low frequency multi-phase differential clock tree-shaped high-speed low-power-consumption serializer of I/O interface
US10924096B1 (en) Circuit and method for dynamic clock skew compensation
CN106294274A (en) A kind of polycaryon processor based on 2.5D Advanced Packaging
TWI810962B (en) Semiconductor die, electronic component, electronic device and manufacturing method thereof
Dang et al. FPGA implementation of a low latency and high throughput network-on-chip router architecture
Liao et al. Exploring AMBA AXI on-chip interconnection for TSV-based 3D SoCs
CN104394072A (en) Double-pumped vertical channel for three dimensional Network on chip
CN108717400A (en) A kind of field programmable gate array and communication means
Goyal et al. Neksus: An interconnect for heterogeneous system-in-package architectures
CN106708768A (en) Aurora interface binding method and device based on shared phase-locked loops
Sundaram et al. A reconfigurable asynchronous SERDES for heterogenous chiplet interconnects
CN219842685U (en) FPGA platform LVDS parallel bus bandwidth acceleration device
Ning et al. Design of a GALS Wrapper for Network on Chip
Canegallo et al. System on chip with 1.12 mW-32Gb/s AC-coupled 3D memory interface
Wey et al. A 2Gb/s high-speed scalable shift-register based on-chip serial communication design for SoC applications
Chen et al. Design of SRAM Based Interface Module with DMA in Inductive-Coupling 3D Stacked IoT Chips
Mas et al. Network-on-chip: The intelligence is in the wire

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170104