WO2023030051A1 - A Stacked Chip - Google Patents

A Stacked Chip

Info

Publication number
WO2023030051A1
Authority
WO
WIPO (PCT)
Prior art keywords
array component
storage
programmable gate
gate array
control unit
Prior art date
Application number
PCT/CN2022/113699
Other languages
English (en)
French (fr)
Inventor
郭一欣
江喜平
左丰国
王嵩
周骏
Original Assignee
西安紫光国芯半导体有限公司 (Xi'an UniIC Semiconductors Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 西安紫光国芯半导体有限公司
Publication of WO2023030051A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • G06F2015/761 Indexing scheme relating to architectures of general purpose stored programme computers
    • G06F2015/763 ASIC
    • G06F2015/768 Gate array
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The invention relates to the technical field of integrated circuits, and in particular to a stacked chip.
  • The invention provides a stacked chip that achieves high-bandwidth, low-power memory access.
  • One technical solution provided by the present invention is a stacked chip including a first programmable gate array component and a first storage array component. The first programmable gate array component includes a first interface module embedded in it, and the first interface module includes a first bonding lead-out area. The first storage array component is provided with a second bonding lead-out area. The first bonding lead-out area and the second bonding lead-out area are bonded together to connect the interconnect signals of the first programmable gate array component and the first storage array component.
  • The first programmable gate array component includes a plurality of functional modules and at least one first interface module; the first interface module is located between the functional modules and is connected to them through an interface routing unit.
  • The functional modules are laid out internally in strips, and the first interface module is arranged to follow the layout of the strip-shaped functional modules.
  • The functional modules are connected to the interface routing unit through internal metal layers, and the first interface module is likewise interconnected with the interface routing unit through internal metal layers.
  • The first programmable gate array component includes a programmable routing network; the functional modules are interconnected with the programmable routing network through internal metal layers and are connected to the interface routing unit through the programmable routing network.
  • The stacked chip further includes a physical layer arranged on the first interface module, which performs level conversion between the first programmable gate array component and the first storage array component.
  • The functional modules include any one or any combination of programmable logic blocks LAB (Logic Array Block)/CLB (Configurable Logic Block), storage blocks BRAM (Block Random Access Memory), multiplication units DSP (Digital Signal Processor), and multiply-accumulate units MAC (Multiply Accumulate).
  • The functional modules may further include an ASIC array unit, which is a hardened circuit dedicated to a fixed computation target.
  • The storage blocks are connected to the programmable logic blocks through a storage routing unit.
  • The first programmable gate array component includes a field-programmable gate array (FPGA) or an embedded field-programmable gate array (eFPGA).
  • The stacked chip further includes a storage control unit, which is arranged on the first interface module, at a position on the first programmable gate array component close to the first interface module, or on the first storage array component; the storage control unit controls memory access by the first programmable gate array component to the first storage array component.
  • The stacked chip further includes a second storage array component, which is arranged on the side of the first programmable gate array component away from the first storage array component and is provided with a third bonding lead-out area.
  • The first interface module includes a fourth bonding lead-out area, and the first programmable gate array component and the second storage array component are bonded together through the third bonding lead-out area and the fourth bonding lead-out area.
  • The stacked chip further includes a second storage array component arranged on the side of the first storage array component away from the first programmable gate array component, and the second storage array component is provided with a third bonding lead-out area; the first storage array component includes a fourth bonding lead-out area, and the first storage array component and the second storage array component are bonded together through the fourth bonding lead-out area and the third bonding lead-out area.
  • The stacked chip further includes a storage control unit disposed on the first interface module; the storage control unit controls access by the first programmable gate array component to both the first storage array component and the second storage array component.
  • The first programmable gate array component further includes a programmable logic unit that is connected to the storage control unit and outputs a logic signal; based on the logic signal, the storage control unit selects, in a time-division manner, whether the first programmable gate array component accesses the first storage array component or the second storage array component.
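The time-division scheme can be modelled behaviourally in a few lines. This is a minimal sketch, not the circuit in the patent: the class and function names are hypothetical, the two storage arrays are plain Python lists, and `sel` stands in for the logic signal from the programmable logic unit (0 selects the first storage array, 1 the second). Each time slot serves exactly one access, which is what sharing one storage control unit implies.

```python
from typing import Optional


class SharedStorageControlUnit:
    """Hypothetical model: one storage control unit in front of two storage arrays.

    `sel` plays the role of the logic signal: 0 selects the first storage
    array, 1 selects the second.
    """

    def __init__(self, words_per_array: int):
        self.arrays = [[0] * words_per_array, [0] * words_per_array]

    def access(self, sel: int, op: str, addr: int, value: Optional[int] = None) -> Optional[int]:
        """Perform one read ('r') or write ('w') on the selected array."""
        if op == "w":
            self.arrays[sel][addr] = value
            return None
        return self.arrays[sel][addr]


def run_time_division(ctrl: SharedStorageControlUnit, requests):
    """Serve exactly one request per time slot, on whichever array `sel` picks."""
    trace = []
    for slot, (sel, op, addr, value) in enumerate(requests):
        trace.append((slot, sel, ctrl.access(sel, op, addr, value)))
    return trace


ctrl = SharedStorageControlUnit(words_per_array=8)
trace = run_time_division(ctrl, [
    (0, "w", 3, 11),    # slot 0: write 11 to address 3 of the first array
    (1, "w", 3, 22),    # slot 1: write 22 to address 3 of the second array
    (0, "r", 3, None),  # slot 2: read back from the first array
    (1, "r", 3, None),  # slot 3: read back from the second array
])
print(trace)  # [(0, 0, None), (1, 1, None), (2, 0, 11), (3, 1, 22)]
```

Because only one array is touched per slot, four accesses split across two arrays take four slots.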
  • The stacked chip further includes a first storage control unit and a second storage control unit, both arranged on the first interface module; the first storage control unit controls access by the first programmable gate array component to the first storage array component, and the second storage control unit controls access by the first programmable gate array component to the second storage array component.
  • The first programmable gate array component further includes a programmable logic unit that is connected to the first storage control unit and the second storage control unit and outputs a logic signal; based on the logic signal, the first storage control unit controls access by the first programmable gate array component to the first storage array component while the second storage control unit simultaneously controls its access to the second storage array component.
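With two storage control units, each storage array has its own controller, so accesses to the two arrays can proceed in the same slot. The sketch below is a hypothetical behavioural model; the names and the one-access-per-controller-per-slot assumption are illustrative and not taken from the patent. It also contrasts slot counts with the shared, time-division case.

```python
from typing import Optional


class StorageControlUnit:
    """Hypothetical controller that owns one storage array and serves one access per slot."""

    def __init__(self, words: int):
        self.mem = [0] * words

    def access(self, op: str, addr: int, value: Optional[int] = None) -> Optional[int]:
        if op == "w":
            self.mem[addr] = value
            return None
        return self.mem[addr]


def run_parallel(ctrl_a, ctrl_b, reqs_a, reqs_b):
    """In every slot, both controllers serve their next request at the same time."""
    trace = []
    for slot in range(max(len(reqs_a), len(reqs_b))):
        ra = ctrl_a.access(*reqs_a[slot]) if slot < len(reqs_a) else None
        rb = ctrl_b.access(*reqs_b[slot]) if slot < len(reqs_b) else None
        trace.append((slot, ra, rb))
    return trace


def slots_needed(n_first: int, n_second: int, independent: bool) -> int:
    """Slots to finish all accesses, assuming one access per controller per slot."""
    return max(n_first, n_second) if independent else n_first + n_second


first, second = StorageControlUnit(8), StorageControlUnit(8)
trace = run_parallel(first, second, [("w", 0, 5), ("r", 0)], [("w", 1, 7), ("r", 1)])
print(trace)                                               # [(0, None, None), (1, 5, 7)]
print(slots_needed(2, 2, True), slots_needed(2, 2, False))  # 2 4
```

Under these assumptions the independent arrangement halves the slot count for a balanced workload, at the cost of a second controller.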
  • The stacked chip further includes a second programmable gate array component disposed on the side of the first programmable gate array component away from the first storage array component.
  • The second programmable gate array component includes a second interface module with a third bonding lead-out area, and the first interface module includes a fourth bonding lead-out area; the third bonding lead-out area is bonded to the fourth bonding lead-out area to connect the second programmable gate array component to the first programmable gate array component. The first and second programmable gate array components either share the same storage control unit to access the same storage unit of the first storage array component, or use independent storage control units to access different storage units of the first storage array component.
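When the two programmable gate array components share one storage control unit, simultaneous requests must be arbitrated. The patent does not specify an arbitration policy; the sketch below assumes a simple round-robin arbiter purely for illustration, and its names are hypothetical. In the independent mode, each component has its own storage control unit and storage unit, so no arbitration is needed.

```python
from typing import List, Optional


def round_robin_grants(req_pga1: List[bool], req_pga2: List[bool]) -> List[Optional[int]]:
    """Hypothetical arbiter for one storage control unit shared by two PGA components.

    Each list holds one flag per time slot (True = that component requests access).
    Returns, per slot, which component is granted (1 or 2), or None if neither asked.
    When both request in the same slot, the grant alternates, starting with PGA 1.
    """
    grants: List[Optional[int]] = []
    last = 2  # pretend PGA 2 was served last, so PGA 1 wins the first tie
    for r1, r2 in zip(req_pga1, req_pga2):
        if r1 and r2:
            grant = 1 if last == 2 else 2
        elif r1:
            grant = 1
        elif r2:
            grant = 2
        else:
            grant = None
        if grant is not None:
            last = grant
        grants.append(grant)
    return grants


print(round_robin_grants([True, True, True, False], [True, True, False, True]))
# [1, 2, 1, 2]
```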
  • The stacked chip further includes a second programmable gate array component disposed on the side of the first storage array component away from the first programmable gate array component.
  • The second programmable gate array component includes a second interface module with a third bonding lead-out area, and the first storage array component includes a fourth bonding lead-out area; the third bonding lead-out area is bonded to the fourth bonding lead-out area to connect the second programmable gate array component to the first storage array component. The first and second programmable gate array components either share the same storage control unit to access the same storage unit of the first storage array component, or use independent storage control units to access different storage units of the first storage array component.
  • The first programmable gate array component includes programmable logic blocks and a programmable routing network; the programmable logic blocks are interconnected through the programmable routing network so that they can be configured as a number of programmable functional modules, and at least part of the programmable routing network extends to the interface routing unit.
  • Unlike the prior art, the present invention has the following beneficial effects.
  • The stacked chip of the present invention connects the interconnect signals of the first programmable gate array component and the first storage array component through the first bonding lead-out area and the second bonding lead-out area, and embeds the first interface module carrying the first bonding lead-out area into the first programmable gate array component, thereby realizing a three-dimensional heterogeneous integrated structure that provides high-bandwidth, low-power memory access.
  • FIG. 1 is a schematic structural diagram of a first embodiment of the stacked chip of the present invention;
  • FIG. 2 is a schematic plan view of the first programmable gate array component of the present invention;
  • FIG. 3 is a schematic diagram of the memory access structure from the first programmable gate array component to the first storage array component in FIG. 1;
  • FIG. 4 is a schematic structural diagram of a second embodiment of the stacked chip of the present invention;
  • FIG. 5 is a schematic structural diagram of shared memory access by the first programmable gate array component and the second programmable gate array component to the first storage array component in FIG. 4;
  • FIG. 6 is a schematic structural diagram of independent memory access by the first programmable gate array component and the second programmable gate array component to the first storage array component in FIG. 4;
  • FIG. 7 is a schematic structural diagram of a third embodiment of the stacked chip of the present invention;
  • FIG. 8 is a schematic structural diagram of shared memory access by the first programmable gate array component to the first storage array component and the second storage array component in FIG. 7;
  • FIG. 9 is a schematic structural diagram of independent memory access by the first programmable gate array component to the first storage array component and the second storage array component in FIG. 7;
  • FIG. 10 is a schematic structural diagram of a fourth embodiment of the stacked chip of the present invention;
  • FIG. 11 is a schematic structural diagram of shared memory access by the first programmable gate array component to the first storage array component and the second storage array component in FIG. 10;
  • FIG. 12 is a schematic structural diagram of independent memory access by the first programmable gate array component to the first storage array component and the second storage array component in FIG. 10;
  • FIG. 13 is a schematic structural diagram of a programmable routing network and programmable logic blocks;
  • FIG. 14 is a schematic diagram of a three-dimensional heterogeneous integration structure among functional components 210, 220, and 230.
  • The terms “first”, “second”, and “third” in the present invention are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Thus, a feature defined as “first”, “second”, or “third” may explicitly or implicitly include at least one such feature.
  • “Plurality” means at least two, such as two or three, unless otherwise specifically defined. All directional indications (such as up, down, left, right, front, and back) in the embodiments of the present invention are only used to explain the relative positional relationships, movements, and so on between components in a certain posture (as shown in the accompanying drawings); if that posture changes, the directional indications change accordingly.
  • FIG. 1 is a schematic structural diagram of a first embodiment of the stacked chip of the present invention.
  • The stacked chip includes a first programmable gate array component 1 and a first storage array component 2.
  • The first programmable gate array component 1 and the first storage array component 2 are hybrid-bonded and integrated by means of three-dimensional heterogeneous integration.
  • Three-dimensional heterogeneous integration directly interconnects the internal metal layers of two chip components across the chip boundary.
  • The physical and electrical parameters of these interconnects follow the characteristics of the semiconductor manufacturing process, so the interconnect density is greatly improved compared with interconnection through I/O interfaces and/or I/O circuits, and the stacked chips can be interconnected as if internally, giving the stacked chip high bandwidth and low power consumption.
  • The first storage array component 2 can be DRAM (dynamic random access memory); in another embodiment, it can be SRAM (static random access memory). Considering the iterative development of technology, the first storage array component 2 can also be another type of memory, such as flash memory (Flash), resistive memory (RRAM or ReRAM), magnetoresistive memory (MRAM), ferroelectric memory (FeRAM), oxide resistive memory (OxRAM), conductive-bridging memory (CBRAM), phase change memory (PCM), spin-transfer torque memory (STT-MRAM), or electrically erasable programmable read-only memory (EEPROM).
  • Each of the above memories has its own characteristics and advantages, and may require a storage controller as its memory access interface.
  • The storage controller implements functions such as the physical interface, data reading and writing, data buffering, data prefetching, data refreshing, and data block remapping.
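To illustrate one of these functions, data block remapping, the following sketch keeps a logical-to-physical block table and redirects a failing block to a spare. It is a hypothetical model under assumed parameters (block and spare counts, a dict as the backing store); the patent does not describe how its storage controller implements remapping.

```python
from typing import Dict, List, Optional


class BlockRemapper:
    """Hypothetical sketch of the 'data block remapping' function of a storage controller."""

    def __init__(self, n_blocks: int, n_spares: int):
        self.table: Dict[int, int] = {b: b for b in range(n_blocks)}  # logical -> physical
        self.spares: List[int] = list(range(n_blocks, n_blocks + n_spares))
        self.storage: Dict[int, bytes] = {}  # physical block -> contents

    def physical(self, logical: int) -> int:
        return self.table[logical]

    def write(self, logical: int, data: bytes) -> None:
        self.storage[self.physical(logical)] = data

    def read(self, logical: int) -> Optional[bytes]:
        return self.storage.get(self.physical(logical))

    def retire(self, logical: int) -> int:
        """Move a failing block's data to the next spare and redirect the mapping."""
        if not self.spares:
            raise RuntimeError("no spare blocks left")
        old = self.table[logical]
        new = self.spares.pop(0)
        if old in self.storage:
            self.storage[new] = self.storage.pop(old)
        self.table[logical] = new
        return new


remap = BlockRemapper(n_blocks=4, n_spares=2)
remap.write(2, b"abc")
print(remap.retire(2))    # 4: logical block 2 now lives in spare block 4
print(remap.read(2))      # b'abc': the data survived the remap
print(remap.physical(1))  # 1: untouched blocks keep their identity mapping
```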
  • The first programmable gate array component 1 includes a first interface module 11 embedded in it.
  • The first interface module 11 includes a first bonding lead-out area 111.
  • The first storage array component 2 is provided with a second bonding lead-out area 21.
  • The first bonding lead-out area 111 and the second bonding lead-out area 21 are bonded together through a three-dimensional heterogeneous integrated bonding structure, realizing the three-dimensional heterogeneous integration of the first programmable gate array component 1 and the first storage array component 2 and thereby the high-bandwidth, low-power, programmable static storage-and-computing integrated structure of the stacked chip.
  • Three-dimensional heterogeneous integrated bonding can greatly increase the interconnect density between the first programmable gate array component 1 and the first interface module 11, and further increase the interconnect density between the first programmable gate array component 1 and the first storage array component 2, reducing interconnect distributed (parasitic) parameters, increasing interconnect bandwidth, and reducing interconnect power consumption.
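The density argument can be made concrete with simple arithmetic: on a regular grid, the number of bond connections in a fixed area scales with the inverse square of the bond pitch, so halving the pitch quadruples the connection count. The region size and pitches below are illustrative values chosen for the example, not figures from the patent.

```python
def bond_count(width_um: int, height_um: int, pitch_um: int) -> int:
    """Connections on a regular square grid with the given pitch (illustrative model)."""
    return (width_um // pitch_um) * (height_um // pitch_um)


def aggregate_bandwidth_gbit_s(connections: int, gbit_s_per_connection: float) -> float:
    """Peak aggregate bandwidth if every connection carries one signal at the given rate."""
    return connections * gbit_s_per_connection


# Hypothetical 1 mm x 1 mm bonding region; the pitches are illustrative only.
for pitch in (40, 20, 10):
    print(pitch, bond_count(1000, 1000, pitch))
# 40 625
# 20 2500
# 10 10000
```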
  • The first programmable gate array component 1 includes a plurality of functional modules 13, and the first interface module 11 is located between them. An interface routing unit 137 is provided on the side of the first interface module 11 near the functional modules 13 and connects the functional modules 13 to the first interface module 11.
  • The functional modules 13 are connected to the interface routing unit 137 through internal metal layers.
  • The first interface module 11 is connected to the interface routing unit 137 through internal metal layers.
  • In one embodiment there is one first interface module 11; in another embodiment there are at least two first interface modules 11, which are interspersed among the functional modules 13.
  • Although FIG. 1 shows only one first interface module 11, other embodiments may include multiple first interface modules 11; the present application is not limited in this respect, and the number is set according to requirements.
  • FIG. 2 is a schematic plan view of the first programmable gate array component 1.
  • The functional modules 13 include programmable logic blocks 133 (Logic Array Block, LAB / Configurable Logic Block, CLB), storage blocks 134 (Block Random Access Memory, BRAM), multiplication units 135 (DSP, Digital Signal Processor), and multiply-accumulate units 138 (Multiply Accumulate, MAC).
  • The multiplication unit 135 is not a digital signal processor chip but an embedded programmable multiplication unit.
  • The functional modules 13 can be configured as required; the present application is not limited in this respect.
  • The first bonding lead-out area 111 is the three-dimensional heterogeneous integration interconnect resource of the first programmable gate array component 1. By bonding the first bonding lead-out area 111 directly to the second bonding lead-out area 21 of the first storage array component 2, the first programmable gate array component 1 achieves high-density, low-parasitic direct metal-layer interconnection and memory access, avoiding interconnection with the first storage array component 2 through IO interfaces and IO interface circuits, and thereby achieving high bandwidth and low power consumption.
  • The first programmable gate array component 1 further includes a programmable routing network.
  • The functional modules 13 are interconnected with the programmable routing network through internal metal layers and are connected to the interface routing unit 137 through the programmable routing network.
  • The programmable routing network uses the internal metal layers of the first programmable gate array component 1 to establish, in a programmable manner, the interconnection and data exchange among all resources inside the component; through it, the functional modules 13 establish extensive, reconfigurable, high-bandwidth data interconnections with one another and with storage devices.
  • The programmable routing network is connected to the storage routing unit 136, and the storage blocks BRAM 134 are interconnected with the storage routing unit 136 and thereby connected to the programmable routing network, so that all functional modules 13 in the first programmable gate array component 1 can access all storage blocks BRAM 134 through the storage routing unit 136.
  • The programmable routing network is connected to the interface routing unit 137, and the first storage array component 2 is interconnected with the interface routing unit 137 through the first interface module 11 and thereby connected to the programmable routing network, so that all functional modules 13 in the first programmable gate array component 1 can access all storage arrays on the first storage array component 2 through the interface routing unit 137.
  • All functional modules 13 on the first programmable gate array component 1 are connected to the interface routing unit 137 through the programmable routing network, and the interface routing unit 137 is connected to the three-dimensional heterogeneous integrated bonding structure corresponding to the first interface module 11, which establishes memory access from the functional modules 13 to all storage arrays on the first storage array component 2. Because the programmable routing network is distributed throughout the first programmable gate array component 1 and is programmable, every functional module 13, whether near to or far from the first interface module 11, can establish a high-density on-chip metal-layer interconnection with the interface routing unit 137 through the programmable routing network.
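The claim that distance from the first interface module does not prevent access can be illustrated with a graph search. The sketch below assumes, purely for illustration, that the programmable routing network is a 2D grid of tiles linked to their four neighbours and that one column of tiles forms the interface routing unit; a multi-source breadth-first search then gives each tile's hop count to that column. Real routing fabrics are far more complex; the sketch only shows that every tile is reachable.

```python
from collections import deque
from typing import List, Optional


def hops_to_interface(rows: int, cols: int, interface_col: int) -> List[List[Optional[int]]]:
    """Multi-source BFS over a hypothetical grid-shaped routing network.

    Every tile in column `interface_col` is treated as part of the interface
    routing unit (distance 0); every other tile gets its minimum hop count.
    """
    dist: List[List[Optional[int]]] = [[None] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        dist[r][interface_col] = 0
        queue.append((r, interface_col))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


for row in hops_to_interface(rows=3, cols=5, interface_col=2):
    print(row)
# [2, 1, 0, 1, 2]  (printed once per row)
```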
  • Through the first bonding lead-out area 111 and the second bonding lead-out area 21, the first interface module 11 achieves direct cross-chip metal-layer interconnection with the first storage array component 2 at high density and with low distributed parameters. This avoids the low interconnect density, low interconnect speed, and high interconnect power consumption caused by IO interfaces and IO interface circuits, and establishes high-bandwidth, low-power memory access from all functional modules 13 to all storage arrays on the first storage array component 2.
  • In a programmable gate array component, the storage blocks BRAM are connected to the programmable routing network through the storage routing unit to provide high-bandwidth storage resources for the functional modules.
  • However, the capacity of the storage blocks BRAM is usually only tens of thousands to millions of bits, which cannot meet the needs of conventional applications.
  • Large-capacity storage is therefore usually added through the IO of the programmable gate array component in the form of external memory outside the component, and the storage blocks BRAM inside the component usually serve as a cache for this external large-capacity storage.
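The conventional role of BRAM as a cache for external memory can be sketched as a direct-mapped cache. This is a generic textbook scheme chosen for illustration, not something the patent specifies; the line count, line size, and names are hypothetical, and only reads are modelled.

```python
from typing import List, Optional


class DirectMappedCache:
    """Hypothetical sketch: BRAM acting as a direct-mapped, read-only cache
    in front of a large external memory (a Python list here)."""

    def __init__(self, backing: List[int], n_lines: int = 4, line_words: int = 4):
        self.backing = backing
        self.n_lines = n_lines
        self.line_words = line_words
        self.tags: List[Optional[int]] = [None] * n_lines
        self.data = [[0] * line_words for _ in range(n_lines)]
        self.hits = 0
        self.misses = 0

    def read(self, addr: int) -> int:
        block = addr // self.line_words
        index = block % self.n_lines
        tag = block // self.n_lines
        if self.tags[index] != tag:
            # Miss: fill the whole line from external memory.
            self.misses += 1
            start = block * self.line_words
            self.data[index] = self.backing[start:start + self.line_words]
            self.tags[index] = tag
        else:
            self.hits += 1
        return self.data[index][addr % self.line_words]


external = list(range(64))
cache = DirectMappedCache(external, n_lines=4, line_words=4)
values = [cache.read(a) for a in (0, 1, 2, 3, 16, 0)]
print(values, cache.hits, cache.misses)  # [0, 1, 2, 3, 16, 0] 3 3
```

Addresses 0 and 16 map to the same line, so alternating between them evicts each other; this conflict behaviour is one reason a small on-chip cache cannot substitute for large-capacity storage.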
  • In this application, the first interface module 11 is instead connected to the first storage array component 2 through three-dimensional heterogeneous integration; that is, the first programmable gate array component 1 and the first storage array component 2 establish a high-density inter-chip metal-layer interconnection whose physical and electrical parameters follow the characteristics of the semiconductor manufacturing process. This inherits the high-density, high-bandwidth, and low-power advantages of the on-chip interconnection between the storage blocks BRAM 134 and the functional modules 13 through the storage routing unit 136, while allowing the storage capacity to be expanded almost without limit.
  • The programmable logic blocks LAB/CLB 133, storage blocks BRAM 134, multiplication units DSP 135, and multiply-accumulate units MAC 138 in the functional modules 13 all have strip-shaped layouts, as does the storage routing unit 136.
  • The programmable logic blocks LAB/CLB 133, storage blocks BRAM 134, multiplication units DSP 135, multiply-accumulate units MAC 138, storage routing unit 136, and so on are arranged in the first programmable gate array component 1 according to requirements; as shown in FIG. 2, these strips can be repeated and combined arbitrarily, and programmable interconnections among them are established through the programmable routing network. The specific combination is not limited by this application.
  • The first interface module 11 is shaped to match the functional modules 13 and is likewise strip-shaped so that it can be embedded between the functional modules 13.
  • The strip-shaped first interface module 11 is sized according to the functional modules 13 and is extended, with its capacity expanded, together with the functional modules 13 along the strip length direction.
  • Similarly, the interface routing unit 137 is shaped to match the functional modules 13 and is strip-shaped so that it can be embedded between them. Because the interface routing unit 137 is designed according to the size of the functional modules 13 along the strip length direction, the bus bit width can be greatly increased. The interface routing unit 137 is directly connected to the three-dimensional heterogeneous integrated bonding structure and, through this interconnect structure, to the first storage array component 2, enabling access to the large-capacity storage array.
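Why a strip that grows with the functional modules widens the bus can be seen with a simple linear model: if bonds sit at a fixed pitch along the strip in a fixed number of rows, each carrying one bit, bus width grows in proportion to strip length. The pitch, row count, and clock rate below are hypothetical values, and the bandwidth formula assumes a single-data-rate bus.

```python
def strip_bus_width(strip_length_um: int, bond_pitch_um: int, rows: int) -> int:
    """Bits available along a strip-shaped interface routing unit, assuming one bit
    per bond and `rows` rows of bonds across the strip (all values hypothetical)."""
    return (strip_length_um // bond_pitch_um) * rows


def bus_bandwidth_gbit_s(width_bits: int, clock_mhz: int) -> float:
    """Peak bandwidth of a single-data-rate bus: bits per cycle times clock rate."""
    return width_bits * clock_mhz / 1000.0


width = strip_bus_width(strip_length_um=4000, bond_pitch_um=10, rows=4)
print(width)                             # 1600
print(bus_bandwidth_gbit_s(width, 500))  # 800.0
print(strip_bus_width(8000, 10, 4))      # 3200: doubling the strip doubles the width
```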
  • Providing the first interface module 11 on the first programmable gate array component 1 for memory access to the first storage array component 2 differs from the conventional approach, in which a programmable gate array component connects to large-capacity external memory through internal IO circuits and external IO interfaces.
  • The stacked chip of this embodiment therefore saves IO resources of the first programmable gate array component 1, provides an external-storage interconnect density much higher than that achievable through IO, increases memory access bandwidth, and reduces memory access power consumption.
  • A global bus, such as a NoC, AXI, or AHB bus, can also be provided on the first programmable gate array component 1 to enable cross-region memory access by the programmable logic on the component.
  • The global bus can be placed near the first interface module 11 or at other locations associated with memory access; this is not specifically limited.
  • An ASIC array unit 139 can also be provided in the first programmable gate array component 1. The ASIC array unit 139 includes hardened computing/processing units (processing elements), such as multiply-accumulate arrays, multiplication arrays, systolic processor arrays, hash computation arrays, various encoder arrays, machine-learning dedicated layer arrays, retrieval function arrays, image/video processing arrays, and hardened CPUs and MCUs, in any one or any combination.
  • The ASIC array unit 139 is arranged in strips in the first programmable gate array component 1 so that it is embedded between the functional modules 13; its size is extended, and its capacity expanded, together with the functional modules 13 along the strip length direction, and it is widely interconnected through the programmable routing network to serve as a hardened computing/processing extension of the functional modules 13.
  • The ASIC array unit 139 has limited or no programmability and is used to accelerate computation/processing for specific requirements. Compared with the arbitrarily programmable functional modules 13, its computation/processing density is much higher, which significantly increases the computation/processing density of the stacked chip.
  • (1) Design the ASIC array unit 139 to include ASIC-implemented hard-core operation/processing units, such as one or any combination of multiply-add calculation arrays, multiplication calculation arrays, systolic processor arrays, hash calculation arrays, various encoder arrays, machine-learning dedicated layer arrays, retrieval function arrays, image/video processing arrays, and hard-core computing/processing units such as CPUs and MCUs;
  • (2) Design an operation/processing interface module on the first programmable gate array component 1, and establish a high-density cross-chip interconnection with the operation/processing units in the ASIC array unit 139 through three-dimensional heterogeneous integration;
  • (3) Design an operation/processing interface routing unit on the first programmable gate array component 1, and establish high-density on-chip metal-layer interconnection between the programmable routing network and the operation/processing interface module.
  • In this way, the functional modules 13 on the first programmable gate array component 1 schedule the computing/processing units of the ASIC array unit 139 over the high-density three-dimensional heterogeneous integration, and the calculation inputs and results of those units are mapped to the large-capacity storage array on the first storage array component 2 through storage access that is also based on high-density three-dimensional heterogeneous integration.
  • The stacked chip further includes a storage control unit 113 for controlling storage access by the first programmable gate array component 1 to the first storage array component 2.
  • The storage control unit 113 may be arranged on the first interface module 11, near the first interface module 11 on the first programmable gate array component 1, or on the first storage array component 2.
  • The stacked chip of this embodiment avoids interconnection through physical IO interfaces, which saves IO resources, provides an interconnection density much higher than that of IO interfaces, increases storage access bandwidth, and reduces storage access power consumption.
  • This realizes high-density, short-distance interconnection of the internal signals of the first programmable gate array component 1 with the first storage array component 2.
  • In one embodiment, the storage control unit 113 is arranged on the first interface module 11. Since the programmable gate array component must pass through the first interface module 11 to access the storage array component, this placement benefits the data flow. In a preferred embodiment, the storage control unit 113 is arranged on the first programmable gate array component 1; because the process performance of the programmable gate array component is better than that of the storage array component, higher density and speed can be obtained. In a further preferred embodiment, the storage control unit 113 is arranged near the first interface module 11, which inherits the process performance of the programmable gate array component for higher density and speed and also reduces the cost of the first interface module 11.
  • the storage control unit 113 can also be combined with the programmable feature of the functional module 13, so that some functions and/or parameters of the storage control unit 113 can be programmed.
  • In another embodiment, the storage control unit 113 is arranged on the storage array component. Because the storage array process costs less per unit area than the programmable gate array process, this reduces implementation cost and relatively increases the density of the programmable gate array component.
  • The stacked chip further includes a physical layer 114. When the core voltages of the first programmable gate array component 1 and the first storage array component 2 differ, the physical layer 114 performs level conversion for the three-dimensional heterogeneous integrated interconnection between them.
  • the physical layer 114 may be set on the first interface module 11 .
  • The physical layer 114 can also be designed on the first programmable gate array component 1, usually on or near the first interface module 11, so as to inherit the process performance of the first programmable gate array component 1 and obtain higher density and speed. Alternatively, the physical layer 114 can be designed on the first storage array component 2, usually on or near the vertical projection area of the first interface module 11, to save area on the first programmable gate array component 1 and increase its computing/processing density.
  • the physical and electrical parameters of the cross-chip three-dimensional heterogeneous integration of the first programmable gate array component 1 and the first memory array component 2 follow the characteristics of the semiconductor manufacturing process.
  • The number of interconnections (that is, the storage access bandwidth) between the first programmable gate array component 1 and the first storage array component 2 is increased by 2 to 4 orders of magnitude.
  • The first programmable gate array component 1 and the first storage array component 2 are interconnected directly, without passing through IO interfaces and/or IO circuits. The interconnection distance is therefore shorter and the interconnection distribution parameters are lower (in particular the distributed capacitance of the interconnection to the reference ground), so the power consumption overhead of memory access is significantly reduced.
  • A near-memory storage access architecture is formed between the first programmable gate array component 1 and the first storage array component 2. It lets the functional modules 13 on the first programmable gate array component 1 access storage nearby and avoids storage access conflicts and the resulting loss of efficiency, and it saves the IO overhead that conventional technology spends on interconnecting the first programmable gate array component 1 with an external large-capacity storage device.
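The power argument above can be made concrete with the textbook first-order relation for CMOS dynamic switching power. This is general background rather than a formula from the patent; it ignores leakage and short-circuit current, and C stands for the switched capacitance of one memory-access net, including its distributed capacitance to the reference ground.

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V^{2} \, f,
\qquad
\frac{P_{\mathrm{dyn}}'}{P_{\mathrm{dyn}}} = \frac{C'}{C}\left(\frac{V'}{V}\right)^{2}
\quad \text{for fixed } \alpha \text{ and } f
```

Here α is the switching activity factor, V the signal swing, and f the toggle frequency. Power falls linearly with the interconnect capacitance C and quadratically with V, which matters here because the prior-art IO circuits listed in this description also perform external level step-up and step-down.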
  • The following description takes, as an example, the case in which the storage control unit is arranged on the first interface module.
  • the storage control unit H21 is disposed on the first interface module H17.
  • The first storage array component 2 includes a storage unit G13 on which the second bonding lead-out area G14 is arranged. The storage control unit H21 is connected to the first bonding lead-out area H19, and the first bonding lead-out area H19 is connected to the second bonding lead-out area G14 on the first storage array component 2.
  • the first programmable gate array component 1 is provided with a programmable logic unit K23, and the programmable logic unit K23 is connected to the storage control unit H21 through the interface routing unit H22.
  • the programmable logic unit K23 leads out a logic signal, and the storage control unit H21 controls the first programmable gate array component 1 to perform storage access to the first storage array component 2 based on the logic signal.
  • The number and positions of the first programmable gate array component 1 and the first storage array component 2 can be set as required. Figure 4 is a schematic structural diagram of the second embodiment of the stacked chip of the present invention.
  • The stacked chip of this embodiment further includes a second programmable gate array component 3.
  • The second programmable gate array component 3 is disposed on the side of the first programmable gate array component 1 away from the first storage array component 2.
  • The second programmable gate array component 3 includes a second interface module 31.
  • The second interface module 31 includes a third bonding lead-out area 32.
  • The first interface module 11 further includes a fourth bonding lead-out area 12, and the third bonding lead-out area 32 is bonded to the fourth bonding lead-out area 12, so that the second programmable gate array component 3 and the first programmable gate array component 1 are bonded together.
  • The stacked chip of this embodiment is thus provided with two layers of programmable gate array components, namely the second programmable gate array component 3 and the first programmable gate array component 1, which are bonded to each other through the third bonding lead-out area 32 and the fourth bonding lead-out area 12.
  • The third bonding lead-out area 32 is the three-dimensional heterogeneous interconnection resource of the second programmable gate array component 3. Through it, the second programmable gate array component 3 connects directly to the first interface module 11, and then interconnects with the first storage array component 2 through the interconnection resource in the first programmable gate array component 1 (the first bonding lead-out region 111) to perform storage access. This avoids interconnecting the second programmable gate array component 3 with the first storage array component 2 through its IO interfaces, achieving high bandwidth and low power consumption, with the advantages of high density of programmable resources, low distribution parameters, and fast storage access.
  • In the stacked chip, adjacent components are interconnected through three-dimensional heterogeneous integration, and high-density on-chip metal-layer interconnection is established layer by layer.
  • The components of the stacked chip are designed and packaged in the same stacked chip, so they need none of the functions provided by IO circuits in the prior art, such as drivers, external level step-up (for output), external level step-down (for input), three-state control, electrostatic discharge (ESD) protection, and surge protection circuits. Instead of interconnecting through prior-art IO interfaces and/or IO circuits, high-density metal-layer interconnection is established directly across components.
  • This reduces the use of IO structures on the programmable gate array component and increases the interconnection density and speed between the programmable gate array component and the storage array component. Because the three-dimensional heterogeneous integrated interconnection bypasses the traditional IO structure and the interconnection distance is short, the communication power consumption between the chips is also reduced. Overall, the integration of the stacked chip and the interconnection frequency between the programmable gate array components and the storage array components are improved, and the interconnection power consumption is reduced.
  • The programmable routing network that widely interconnects the programmable resources on the programmable gate array component extends across chips to the large-capacity storage array on the memory chip and forms extensive interconnections with it, so that the programmable resources access that storage in a high-bandwidth, programmable manner.
  • The multi-layer chip therefore combines the large capacity of external memory with the key advantages of BRAM interconnected by the programmable routing network on a programmable gate array component (large bit width and high bandwidth, though BRAM has small capacity in current technology). This fundamentally breaks through the IO-count, memory-access-bandwidth, and memory-access-power bottlenecks that existing programmable gate array chips face when expanding to large-scale memory.
  • the stacked chips of this embodiment can further increase the calculation density, which is beneficial to more complex reconfigurable calculations.
  • more programmable gate array components can be provided according to requirements, so as to increase the density of the programmable gate array components in the stacked chips.
  • the second programmable gate array component 3 can also be different from the first programmable gate array component 1 , and it can be provided with different functional modules according to actual needs.
  • The functional modules of the first programmable gate array component 1 include programmable functional modules, which include but are not limited to any combination of programmable logic blocks (LAB/CLB), memory blocks (BRAM), multiplication units (DSP), and multiply-accumulate units (MAC);
  • the functional modules of the second programmable gate array component 3 may partly or entirely include an ASIC array unit, which includes but is not limited to one or any combination of multiply-add calculation arrays, multiplication calculation arrays, systolic processor arrays, hash calculation arrays, multiple encoder arrays, machine-learning dedicated layer arrays, retrieval function arrays, image/video processing arrays, and hard-core computing/processing units such as CPUs and MCUs.
  • the first programmable gate array component 1 and the second programmable gate array component 3 share the same storage control unit 113 to access the same storage unit of the first storage array component 2 .
  • the storage control unit 113 can be set on or near the first interface module 11; the storage control unit 113 can also be set on or near the second interface module 31; or, the storage control unit 113 can also be set on the first storage array component 2.
  • The first programmable gate array component 1 further includes a first programmable logic unit, which is connected to the storage control unit 113 and leads out a first logic signal.
  • the second programmable gate array component 3 further includes: a second programmable logic unit connected to the storage control unit 113 , and the second programmable logic unit leads out a second logic signal.
  • the storage control unit 113 selects the first programmable gate array component 1 to access the first storage array component 2 or selects the second programmable gate array component 3 to access the first storage array component 2 based on the first logic signal and the second logic signal.
  • The following description takes, as an example, the case in which the storage control unit H21 is arranged on the first interface module H17.
  • The first storage array component 2 includes a storage unit G13 on which the second bonding lead-out area G14 is arranged. The first bonding lead-out area H19 is arranged on the first interface module H17 and is bonded to the second bonding lead-out area G14.
  • the storage control unit H21 is disposed on the first interface module H17, and the storage control unit H21 is connected to the first bonding lead-out area H19.
  • the first interface module H17 is also provided with a fourth bonding lead-out area H24, and the fourth bonding lead-out area H24 is connected to the storage control unit H21.
  • a third bond extraction area I28 is disposed on the second interface module I27, and the third bond extraction area I28 is connected to the fourth bond extraction area H24.
  • the first programmable gate array component 1 further includes a first programmable logic unit H23, and the first programmable logic unit H23 is connected to the storage control unit H21.
  • The second programmable gate array component 3 further includes a second programmable logic unit I32, which is connected to the third bonding lead-out area I28.
  • When the first programmable gate array component 1 needs to access the first storage array component 2, the first programmable logic unit H23 sends the first logic signal to the storage control unit H21. Based on the first logic signal, the storage control unit H21 controls the first programmable gate array component 1 to access the storage unit G13 on the first storage array component 2 through the first bonding lead-out area H19 and the second bonding lead-out area G14.
  • When the second programmable gate array component 3 needs to access the first storage array component 2, the second programmable logic unit I32 sends a second logic signal to the storage control unit H21.
  • the storage control unit H21 controls the second programmable gate array assembly 3 to access the storage unit G13 on the first storage array assembly 2 through the third bonding lead-out area I28 and the fourth bonding lead-out area H24 based on the second logic signal.
  • the storage control unit selects the first programmable gate array component 1 to access the first storage array component 2 or the second programmable gate array component 3 to access the first storage array component 2 based on the first logic signal and the second logic signal.
  • Only one storage control unit H21 is designed. It may be located on or near the first interface module H17, on or near the second interface module I27, or on the first storage array component 2; this is not specifically limited.
  • All storage units G13 on the first storage array component 2 are connected to the storage control unit H21 through the second bonding lead-out area G14 and the first bonding lead-out area H19. The storage control unit H21 can directly connect to two groups of storage access interfaces (such as H19 and H24 in FIG. 5), and multiple groups of programmable gate array components share storage access to the storage unit G13 through these interfaces.
  • the first programmable logic unit H23 and the second programmable logic unit I32 include any combination of programmable logic blocks, storage blocks, multiplication units, multiply-accumulate units, and hard core operation/processing units.
  • The first programmable logic unit H23 leads out a first logic signal, and the second programmable logic unit I32 leads out a second logic signal.
  • Based on these logic signals, the storage control unit H21 switches its storage access interface either to the bonding direction of the first bonding lead-out area H19 and the second bonding lead-out area G14, or to the bonding direction of the fourth bonding lead-out area H24 and the third bonding lead-out area I28, so that the first programmable logic unit H23 and the second programmable logic unit I32 use it in time-sharing, realizing shared storage access.
  • the third bonding lead-out area I28 is connected to the interface routing unit I30.
  • the interface routing unit I30 connects the second programmable logic unit I32 to the fourth bonding lead-out area H24.
  • Because only one storage control unit H21 is shared, it occupies little area.
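The time-shared switching of the single storage control unit H21 can be sketched as a small arbiter that picks one of its two access interfaces per time slot. The description does not state which arbitration policy H21 uses when both logic signals are asserted; the round-robin alternation below is an assumption, and `Port`, `SharedStorageControl`, and `grant` are hypothetical names.

```python
from __future__ import annotations

from enum import Enum


class Port(Enum):
    """The two storage access interfaces of the shared storage control unit H21."""

    FIRST = "H19/G14 direction, used by first programmable logic unit H23"
    SECOND = "H24/I28 direction, used by second programmable logic unit I32"


class SharedStorageControl:
    """Hypothetical behavioral model of a single storage control unit that is
    time-shared by two programmable gate array components."""

    def __init__(self) -> None:
        self.last_grant: Port | None = None

    def grant(self, first_signal: bool, second_signal: bool) -> Port | None:
        """Return the interface that owns the storage access path in this time slot."""
        if first_signal and second_signal:
            # Both logic signals asserted: alternate between them (assumed round-robin policy).
            choice = Port.SECOND if self.last_grant is Port.FIRST else Port.FIRST
        elif first_signal:
            choice = Port.FIRST
        elif second_signal:
            choice = Port.SECOND
        else:
            return None  # No request: the interface stays idle.
        self.last_grant = choice
        return choice


if __name__ == "__main__":
    h21 = SharedStorageControl()
    for slot in range(4):
        print(slot, h21.grant(True, True).name)  # FIRST, SECOND, FIRST, SECOND
```

A round-robin rule is shown only because it prevents either component from starving the other; a fixed-priority rule would fit the same interface.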
  • the first programmable gate array component 1 and the second programmable gate array component 3 respectively use independent storage control units to access different storage units of the first storage array component 2 .
  • The stacked chip includes a first storage control unit and a second storage control unit. The first programmable gate array component 1 uses the first storage control unit to access the storage units of the first storage array component 2, and the second programmable gate array component 3 uses the second storage control unit to access the storage units of the first storage array component 2.
  • The first storage control unit is disposed on or near the first interface module 11, and the second storage control unit is disposed on or near the second interface module 31.
  • The first programmable gate array component 1 further includes a first programmable logic unit, which is connected to the first storage control unit and leads out a first logic signal.
  • the second programmable gate array component 3 further includes: a second programmable logic unit, the second programmable logic unit is connected to the second storage control unit, and the second programmable logic unit leads out a second logic signal.
  • Based on the first logic signal, the first storage control unit controls the first programmable gate array component 1 to access the storage unit at a first time; based on the second logic signal, the second storage control unit controls the second programmable gate array component 3 to access the storage unit at a second time.
  • When the first storage control unit and the second storage control unit respectively control different storage units of the first storage array component, they can simultaneously control the first programmable gate array component 1 and the second programmable gate array component 3 to access different storage units of the first storage array component 2.
  • When the first storage control unit and the second storage control unit both control all storage units of the first storage array component 2 and the first programmable gate array component 1 and the second programmable gate array component 3 need to access the same storage unit, the first storage control unit and the second storage control unit respectively control the first programmable gate array component 1 and the second programmable gate array component 3 to access that storage unit.
  • The first storage control unit controls the first programmable gate array component 1 to access the storage unit at a first time based on the first logic signal, and the second storage control unit controls the second programmable gate array component 3 to access it at a second time based on the second logic signal. Different programmable gate arrays thus access the same storage unit in time-sharing, which eliminates access conflicts.
  • The first programmable gate array component 1 may include arbitration logic for the storage unit, which selects, based on the first logic signal and the second logic signal, whether the storage unit is accessed through the first storage control unit or the second storage control unit.
  • When the first storage control unit of the first programmable gate array component 1 and the second storage control unit of the second programmable gate array component 3 access the same area of the same storage unit of the first storage array component 2 at the same time, the storage-unit arbitration logic in the first programmable gate array component 1 establishes, in time division and based on the first logic signal and the second logic signal, access through either the first storage control unit of the first programmable gate array component 1 or the second storage control unit of the second programmable gate array component 3.
  • The storage-unit arbitration logic can also be arranged on the first storage array component 2 or on the second programmable gate array component 3 instead of the first programmable gate array component 1. In each case, the arbitration logic selects, in time division, whether the first programmable gate array component 1 or the second programmable gate array component 3 accesses the first storage array component 2.
  • When the first storage control unit and the second storage control unit respectively control different storage units of the first storage array component, they simultaneously control the first programmable gate array component 1 and the second programmable gate array component 3 to access different storage units of the first storage array component 2.
  • In this case, the storage-unit arbitration logic in the first programmable gate array component 1 can, based on the first logic signal and the second logic signal, simultaneously establish access by the first storage control unit of the first programmable gate array component 1 and by the second storage control unit of the second programmable gate array component 3 to the storage units of the first storage array component 2.
  • Each logic component has an independent storage access interface, which gives the highest memory access bandwidth.
  • When the specific units accessed in the storage array differ, they can be accessed at the same time; when writing to a shared area of the storage array, a conflict occurs if the specific units are the same, and arbitration and time-sharing access are required.
  • both the first storage control unit and the second storage control unit control all the storage units of the first storage array assembly 2, if the same storage unit is accessed at the same time, time-sharing access is required.
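The conflict rule above (different storage units in parallel, the same storage unit in time-sharing) can be sketched for a single cycle of requests from the two independent storage control units. This is a hedged illustration, not the patent's arbitration logic: `arbitrate` and its parameters are hypothetical, storage units are identified by integer indices, and the fixed priority used to order a conflicting pair is an assumption.

```python
from __future__ import annotations


def arbitrate(unit1: int | None, unit2: int | None, first_has_priority: bool = True) -> list[list[str]]:
    """Arbitrate one cycle of requests from the two independent storage control units.

    unit1 / unit2: index of the storage unit targeted by the first / second storage
    control unit, or None if that control unit has no request this cycle.
    Returns the time slots needed, each listing the control units served in it.
    """
    requesters = [name for name, unit in (("first", unit1), ("second", unit2)) if unit is not None]
    if len(requesters) < 2 or unit1 != unit2:
        # Zero or one request, or different storage units: no conflict, one slot suffices.
        return [requesters] if requesters else []
    # Same storage unit at the same time: conflict, so serve the two requests in two slots.
    order = ["first", "second"] if first_has_priority else ["second", "first"]
    return [[order[0]], [order[1]]]


if __name__ == "__main__":
    print(arbitrate(3, 7))  # different units: both served in one slot
    print(arbitrate(3, 3))  # same unit: time-shared over two slots
```

Only the unit-level comparison matters for the bandwidth claim: as long as the two control units target different storage units, no cycle is lost to arbitration.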
  • In another embodiment, time-sharing access is not required.
  • The first storage control unit is disposed on or near the first interface module 11, and the second storage control unit is disposed on or near the second interface module 31.
  • Based on the first logic signal, the first storage control unit controls the first programmable gate array component 1 to access some storage units of the first storage array component 2; based on the second logic signal, the second storage control unit controls the second programmable gate array component 3 to access the remaining storage units of the first storage array component 2. The areas of the first storage array component 2 accessed by the first programmable gate array component 1 and by the second programmable gate array component 3 do not overlap.
  • the first programmable logic unit utilizes the first storage control unit, and the second programmable logic unit utilizes the second storage control unit to independently and simultaneously access different storage units on the corresponding first storage array assembly 2 .
  • Each logic component has an independent storage access interface, which gives the highest memory access bandwidth. The first storage array component 2 is divided among the different programmable logic units through the combination of storage control units, so different programmable logic units perform concurrent storage access without the loss of storage access efficiency caused by arbitration and time-sharing access.
  • The first storage array component 2 includes a storage unit G13 on which two second bonding lead-out areas are arranged, namely the second bonding lead-out area G14 and the second bonding lead-out area G12.
  • the second bonding lead-out region G14 is connected to the first bonding lead-out region H19 on the first interface module H17 on the first programmable gate array component 1 .
  • the first interface module H17 of the first programmable gate array component 1 is provided with a first storage control unit H20 for controlling the first programmable gate array component 1 to access the first storage array component 2 .
  • the first storage control unit H20 is connected to the first bonding lead-out region H19.
  • the first programmable logic unit H23 is provided on the first programmable gate array component 1, and the first programmable logic unit H23 is connected to the first storage control unit H20 through the interface routing unit H22.
  • When the first programmable gate array component 1 accesses the first storage array component 2, the first programmable logic unit H23 sends the first logic signal to the first storage control unit H20, and based on it the first storage control unit H20 controls the first programmable gate array component 1 to access part of the storage units G13 of the first storage array component 2 through the first bonding lead-out area H19 and the second bonding lead-out area G14.
  • The second bonding lead-out area G12 is connected to the first bonding lead-out area H18 on the first interface module H17, and the first bonding lead-out area H18 is connected to the third bonding lead-out area I28 on the second programmable gate array component 3.
  • the second programmable gate array assembly 3 also includes a second programmable logic unit I32, and the second programmable logic unit I32 is connected to the second interface module I27 located on the second programmable gate array assembly 3 through the interface routing unit I31.
  • The second programmable gate array component 3 also includes a second storage control unit I29.
  • When the second programmable gate array component 3 accesses the first storage array component 2, the second programmable logic unit I32 sends a second logic signal to the second storage control unit I29, and based on it the second storage control unit I29 controls the second programmable gate array component 3 to access the remaining storage units G13 of the first storage array component 2 through the third bonding lead-out area I28, the first bonding lead-out area H18, and the second bonding lead-out area G12.
  • the independent storage access of the first programmable gate array component 1 and the second programmable gate array component 3 to the first storage array component 2 is realized through the connection mode shown in FIG. 6 .
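The non-overlapping division of the first storage array component 2 between the two storage control units can be sketched as a partition table with a disjointness check. This is a minimal sketch under assumptions: `Partition`, `check_disjoint`, and `may_access` are hypothetical names, storage units are modeled as integer indices, and the 512/512 split in the usage example is invented for illustration; only the owner labels H20 and I29 come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """A contiguous range of storage units owned by one storage control unit."""

    owner: str  # e.g. "H20" or "I29", the control-unit labels used in the description
    start: int  # first storage-unit index, inclusive
    end: int    # last storage-unit index, exclusive


def check_disjoint(parts: list[Partition]) -> None:
    """Raise if any two partitions overlap; disjointness is what removes arbitration."""
    ordered = sorted(parts, key=lambda p: p.start)
    for a, b in zip(ordered, ordered[1:]):
        if a.end > b.start:
            raise ValueError(f"partitions of {a.owner} and {b.owner} overlap")


def may_access(parts: list[Partition], owner: str, unit: int) -> bool:
    """True if storage control unit `owner` is assigned storage unit `unit`."""
    return any(p.owner == owner and p.start <= unit < p.end for p in parts)


if __name__ == "__main__":
    parts = [Partition("H20", 0, 512), Partition("I29", 512, 1024)]  # hypothetical split
    check_disjoint(parts)  # passes: the two access areas do not overlap
    print(may_access(parts, "H20", 10), may_access(parts, "I29", 10))
```

Because the partitions never overlap, every pair of simultaneous requests is to different storage units, so both control units can proceed concurrently without the time-sharing path.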
  • The programmable gate array components can also have 3 or 4 layers; this is not specifically limited.
  • The first programmable gate array component 1 and the second programmable gate array component 3 of the present application may be FPGAs (Field Programmable Gate Arrays) or eFPGAs (embedded Field Programmable Gate Arrays).
  • The storage access of the second programmable gate array component 3 to the first storage array component 2 does not go through IO interfaces and/or IO circuits, so the interconnection distance is shorter and the interconnection distribution parameters are lower, which significantly reduces the power consumption overhead of memory access.
  • The second programmable gate array component 3 and the first programmable gate array component 1 can be produced at the same time, bonded to each other, and then bonded to the first storage array component 2, which reduces process complexity and saves cost.
  • the storage access of the second programmable gate array component 3 to the first storage array component 2 needs to go through the first interface module 11 and the second interface module 31 , which will cause a slight area loss.
  • the present application also proposes another embodiment.
  • Multiple programmable gate array components are used with at least one storage array component, and shared or independent storage control units are designed by mixing the approaches shown in FIG. 5 and FIG. 6 to realize hybrid storage access.
  • the programmable logic units in some areas use the multiplexed storage control unit shown in Figure 5 to implement storage access; the programmable logic units in some areas use the independent storage control unit shown in Figure 6 .
  • The present application also proposes another embodiment.
  • The second programmable gate array component 3 is disposed on the side of the first storage array component 2 away from the first programmable gate array component 1; that is, the first storage array component 2 is located between the second programmable gate array component 3 and the first programmable gate array component 1.
  • The first storage array component 2 includes a fourth bonding lead-out region, and the fourth bonding lead-out region and the third bonding lead-out region form a three-dimensional heterogeneous integration interconnection.
  • Both the second programmable gate array component 3 and the first programmable gate array component 1 can thus be interconnected directly with the first storage array component 2, which increases programmable processing density and supports larger storage access bandwidth.
  • Storage access by the first programmable gate array component 1 to the first storage array component 2 needs to pass only through the first interface module 11, and storage access by the second programmable gate array component 3 to the first storage array component 2 needs to pass only through the second interface module 31.
  • This structure shortens the interconnect distance between the second programmable gate array component 3 and the first storage array component 2 and can further reduce storage access power consumption.
  • In this structure, the second programmable gate array component 3 must first be bonded to the first storage array component 2 and then bonded to the first programmable gate array component 1.
  • FIG. 7 is a schematic structural diagram of the third embodiment of the stacked chip of the present invention.
  • The stacked chip of this embodiment further includes a second storage array component 4.
  • The second storage array component 4 is disposed on the side of the first storage array component 2 away from the first programmable gate array component 1, and is provided with a third bonding lead-out region 41.
  • The first storage array component 2 further includes a fourth bonding lead-out region 12, and the third bonding lead-out region 41 and the fourth bonding lead-out region 12 form a three-dimensional heterogeneous integration interconnection.
  • Integrating more storage array components helps increase storage density and achieve greater storage access bandwidth.
  • Integrating more storage array components also helps reduce cost: the storage array components can be uniformly produced and tested as a standard product before being integrated with the logic components.
  • The first programmable gate array component 1 shares the same storage control unit to access the first storage array component 2 and the second storage array component 4.
  • The storage control unit can, by time division, selectively control the first programmable gate array component 1 to access either the first storage array component 2 or the second storage array component 4.
  • The stacked chip further includes a storage control unit H21, which is disposed on the first interface module H17.
  • The first interface module H17 includes two first bonding lead-out regions, H19 and H18.
  • A plurality of storage units G13 are arranged on the first storage array component 2, and two second bonding lead-out regions, G12 and G14, are provided on the storage units G13.
  • A plurality of storage units F01 are arranged on the second storage array component 4, and a third bonding lead-out region I28 is provided on the storage units F01.
  • The first bonding lead-out region H18 is connected to the second bonding lead-out region G14.
  • The storage control unit H21 is connected to the first bonding lead-out region H18, so that it can control access by the first programmable gate array component 1 to the first storage array component 2 through H18 and G14.
  • The first bonding lead-out region H19 is connected to the second bonding lead-out region G12, and G12 is connected to the third bonding lead-out region I28.
  • The storage control unit H21 can therefore control access by the first programmable gate array component 1 to the second storage array component 4 through H19, G12 and I28.
  • The second bonding lead-out region G12 is not connected to the storage units G13; it only passes signals through to the second storage array component 4.
  • The first programmable gate array component 1 further includes a programmable logic unit K23, which is connected to the storage control unit H21 through the interface routing unit H22 and outputs a logic signal.
  • Based on the logic signal, the storage control unit H21 selectively controls, by time division, the first programmable gate array component 1 to access either the first storage array component 2 or the second storage array component 4. Specifically, H21 controls the first programmable gate array component 1 to access the first storage array component 2 at a first time and to access the second storage array component 4 at a second time.
  • In another implementation, the first programmable gate array component 1 uses two different storage control units to access the first storage array component 2 and the second storage array component 4 respectively. Because there is then no access conflict, both accesses can be controlled simultaneously: the first storage control unit controls access by the first programmable gate array component 1 to the first storage array component 2, while the second storage control unit controls its access to the second storage array component 4.
  • The stacked chip further includes a first storage control unit H20 and a second storage control unit I29, both arranged on the first interface module H17.
  • The first interface module H17 includes two first bonding lead-out regions, H19 and H18.
  • A plurality of storage units G13 are arranged on the first storage array component 2, and two second bonding lead-out regions, G12 and G14, are provided on the storage units G13.
  • A plurality of storage units F01 are arranged on the second storage array component 4, and a third bonding lead-out region I28 is provided on the storage units F01.
  • The first storage control unit H20 is connected to the first bonding lead-out region H18, and H18 is connected to the second bonding lead-out region G14.
  • The first storage control unit H20 can therefore control access by the first programmable gate array component 1 to the first storage array component 2 through H18 and G14.
  • The second storage control unit I29 is connected to the first bonding lead-out region H19, H19 is connected to the second bonding lead-out region G12, and G12 is connected to the third bonding lead-out region I28.
  • The second storage control unit I29 can therefore control access by the first programmable gate array component 1 to the second storage array component 4 through H19, G12 and I28.
  • The second bonding lead-out region G12 is not connected to the storage units G13.
  • The first programmable gate array component 1 further includes a programmable logic unit K23. K23 is connected to the first storage control unit H20 and the second storage control unit I29 through the interface routing unit H22 and outputs a logic signal.
  • Based on the logic signal, the first storage control unit H20 controls the first programmable gate array component 1 to access the first storage array component 2. At the same time, and also based on the logic signal, the second storage control unit I29 controls it to access the second storage array component 4.
  • The present application also proposes another embodiment.
  • A plurality of storage array components are accessed by at least one programmable gate array component, and multiplexed or independent storage control units are designed by combining the approaches shown in FIG. 8 and FIG. 9 to realize hybrid storage access.
  • The programmable logic units in some regions use the multiplexed storage control unit shown in FIG. 8 to implement storage access, while those in other regions use the independent storage control units shown in FIG. 9.
  • Alternatively, the second storage array component 4 may be disposed on the side of the first programmable gate array component 1 away from the first storage array component 2.
  • In that case, the first interface module 11 further includes a fourth bonding lead-out region 12, and the third bonding lead-out region 41 and the fourth bonding lead-out region 12 form a three-dimensional heterogeneous integration interconnection.
  • Integrating more storage array components helps increase storage density.
  • Because the first storage array component 2 and the second storage array component 4 are both connected directly to the first programmable gate array component 1, fewer levels of three-dimensional heterogeneous integration are traversed. The interconnect and storage access distances are shorter and the distributed parameters are smaller, which gives the best storage access frequency and power consumption.
  • The first programmable gate array component 1 shares the same storage control unit to access the first storage array component 2 and the second storage array component 4.
  • The storage control unit can, by time division, selectively control the first programmable gate array component 1 to access either the first storage array component 2 or the second storage array component 4.
  • The stacked chip further includes a storage control unit H21, which is disposed on the first interface module H17.
  • The first interface module H17 includes two first bonding lead-out regions, H19 and H18.
  • A plurality of storage units G13 are arranged on the first storage array component 2, and a second bonding lead-out region G14 is provided on the storage units G13.
  • A plurality of storage units F01 are arranged on the second storage array component 4, and a third bonding lead-out region I28 is provided on the storage units F01.
  • The first bonding lead-out region H18 is connected to the second bonding lead-out region G14.
  • The storage control unit H21 is connected to the first bonding lead-out region H18, so that it can control access by the first programmable gate array component 1 to the first storage array component 2 through H18 and G14.
  • The storage control unit H21 also connects through the first bonding lead-out region H19, which is connected to the third bonding lead-out region I28. H21 can therefore control access by the first programmable gate array component 1 to the second storage array component 4 through H19 and I28.
  • The first programmable gate array component 1 further includes a programmable logic unit K23, which is connected to the storage control unit H21 through the interface routing unit H22 and outputs a logic signal.
  • Based on the logic signal, the storage control unit H21 selectively controls, by time division, the first programmable gate array component 1 to access either the first storage array component 2 or the second storage array component 4. Specifically, H21 controls the first programmable gate array component 1 to access the first storage array component 2 at a first time and to access the second storage array component 4 at a second time.
  • In another implementation, the first programmable gate array component 1 uses two different storage control units to access the first storage array component 2 and the second storage array component 4 respectively. Because there is then no access conflict, both accesses can be controlled simultaneously: the first storage control unit controls access by the first programmable gate array component 1 to the first storage array component 2, while the second storage control unit controls its access to the second storage array component 4.
  • The stacked chip further includes a first storage control unit H20 and a second storage control unit I29, both arranged on the first interface module H17.
  • The first interface module H17 includes two first bonding lead-out regions, H19 and H18.
  • A plurality of storage units G13 are arranged on the first storage array component 2, and a second bonding lead-out region G14 is provided on the storage units G13.
  • A plurality of storage units F01 are arranged on the second storage array component 4, and a third bonding lead-out region I28 is provided on the storage units F01.
  • The first storage control unit H20 is connected to the first bonding lead-out region H18, and H18 is connected to the second bonding lead-out region G14.
  • The first storage control unit H20 can therefore control access by the first programmable gate array component 1 to the first storage array component 2 through H18 and G14.
  • The second storage control unit I29 is connected to the first bonding lead-out region H19, and H19 is connected to the third bonding lead-out region I28. I29 can therefore control access by the first programmable gate array component 1 to the second storage array component 4 through H19 and I28.
  • The first programmable gate array component 1 further includes a programmable logic unit K23. K23 is connected to the first storage control unit H20 and the second storage control unit I29 through the interface routing unit H22 and outputs a logic signal.
  • Based on the logic signal, the first storage control unit H20 controls the first programmable gate array component 1 to access the first storage array component 2. At the same time, and also based on the logic signal, the second storage control unit I29 controls it to access the second storage array component 4.
  • The present application also proposes another embodiment.
  • A plurality of storage array components are accessed by at least one programmable gate array component, and multiplexed or independent storage control units are designed by combining the approaches shown in FIG. 11 and FIG. 12 to realize hybrid storage access.
  • The programmable logic units in some regions use the multiplexed storage control unit shown in FIG. 11 to implement storage access, while those in other regions use the independent storage control units shown in FIG. 12.
  • The storage array component may be a multi-layer chip whose layers are joined by three-dimensional heterogeneous integration bonding.
  • The application-specific integrated circuit (ASIC) array component may include any one or any combination of the following: a multiply-add computing array, a multiplication computing array, a systolic processor array, a hash computing array, various encoder arrays, dedicated machine-learning layer arrays, retrieval function arrays, image/video processing arrays, and hard-core computing/processing units such as CPUs and MCUs. It is used together with the programmable gate array components to increase the processing density of the stacked chip.
  • A component may be at least one of a die (die or chip) and a wafer. It is not limited to these and may be any substitute conceivable by those skilled in the art.
  • The storage array component of the present application may be a memory array die (DRAM die or DRAM chip) or a memory array wafer (DRAM wafer).
  • An embodiment of the present invention also provides a three-dimensional heterogeneously integrated stacked chip structure.
  • The stacked chip contains hierarchically stacked components that are interconnected by three-dimensional heterogeneous integration. These components may be any of the components described above.
  • When the stacked chips are manufactured, they may also be prepared directly with the wafer as the unit and then three-dimensionally heterogeneously integrated.
  • Alternatively, part of a wafer may be used as the unit for preparation and three-dimensional heterogeneous integration.
  • Consider a stacked chip composed of multiple layers of programmable gate array components and at least one layer of storage array components, as shown in FIG. The multiple programmable gate array layers may first be integrated to form an intermediate product with increased interconnection density. This intermediate product is then three-dimensionally heterogeneously integrated with the intermediate product formed by the at least one storage array layer to obtain the stacked chip.
  • Alternatively, the multiple programmable gate array layers may be three-dimensionally heterogeneously integrated with the wafer as the unit to form an intermediate product.
  • After this intermediate product is diced and tested, it is integrated die-to-die with the diced and tested intermediate product formed by the at least one storage array layer to obtain the stacked chip. Because the finished product is built by three-dimensional heterogeneous integration of components that have already been diced and tested, yield is significantly improved.
  • Likewise, a diced and tested intermediate product may be integrated die-to-die with the diced and tested intermediate product formed by at least one layer of programmable gate array components to obtain the stacked chip, with the same yield benefit.
  • The number and order of the programmable gate array layers and storage array layers in a stacked chip depend on trade-offs among application scenarios, engineering requirements, production cost and production yield, and there is no single optimal configuration.
  • The required production processes are correspondingly diverse, and the design and reuse of storage controllers differ markedly between configurations.
  • FIG. 13 shows the extensive interconnection between the programmable functional modules and the programmable routing network in the programmable gate array component.
  • The programmable gate array component extends field-programmable gate array (FPGA) / embedded field-programmable gate array (eFPGA) technology. It includes programmable logic blocks 11A and a programmable routing network 11B (interconnect). The programmable logic blocks 11A are interconnected through the programmable routing network 11B and configured as several programmable functional modules. At least part of the programmable routing network 11B can be extended to interface routing units. Through three-dimensional heterogeneous integration, these units connect across layers to large-capacity storage arrays, providing large-capacity, high-bandwidth, programmable storage access.
  • Three-dimensional heterogeneous integration is an interconnect bonding technology for stacked chips, for example the hybrid bonding process.
  • The stacked chip shown in FIG. 14 includes a functional component 210, a functional component 220 and a functional component 230.
  • Functional components 210, 220 and 230 may be programmable gate array components and/or storage array components.
  • Functional components 210, 220 and 230 each include a top metal layer, internal metal layers, an active layer and a substrate. The top and internal metal layers provide signal interconnection within the component, the active layer contains the transistors that implement the component's functions, and the substrate protects the component and provides mechanical support.
  • On the top-metal sides of functional components 210 and 220, three-dimensional heterogeneous integration bonding layers are fabricated by a back-end-of-line (BEOL) process and bonded together to form a face-to-face interconnection structure. On the substrate side of functional component 220 and the top-metal side of functional component 230, bonding layers are likewise fabricated by a BEOL process and bonded to form a back-to-face (or face-to-back) interconnection structure. Cross-component signal interconnection can be established among functional components 210, 220 and 230 through any of these three-dimensional heterogeneous integration interfaces. Two interconnection schemes apply, depending on whether the core voltages of functional components 210, 220 and 230 are the same.
  • Functional circuit 1 is located in functional component 210. Its outgoing signal on the internal metal layer passes through the top metal of functional component 210 to the face-to-face three-dimensional heterogeneous integration bonding structure between functional components 210 and 220, where it connects to the top metal of functional component 220. The signal then passes through the internal metal layers of functional component 220 and a through-silicon via (TSV) that penetrates its active layer and thinned substrate, reaching the back-to-face three-dimensional heterogeneous integration bonding structure between functional components 220 and 230.
  • The level conversion circuit 2 may also be relocated into functional component 230 or functional component 220 by means of three-dimensional heterogeneous integration interconnection.
  • Storage access from the programmable gate array component and the ASIC array component to the storage array component does not pass through IO interfaces and/or IO circuits. The interconnect distance is therefore shorter, and the power overhead of storage access is significantly reduced.
  • Three-dimensional heterogeneous integration bonding thus realizes a high-bandwidth, low-power programmable storage-computing integrated structure.
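The yield argument above can be illustrated with a simple compound-yield model. That argument holds that dicing and testing intermediate products before die-to-die integration improves yield. This is a minimal sketch, not the applicant's method. It assumes that defects are independent and that testing catches every defective part, and all yield values in it are hypothetical. Under these assumptions, wafer-level stacking multiplies every layer yield together. Stacking only known-good parts leaves the final bonding step as the only yield loss.

```python
# Minimal compound-yield sketch. Assumes independent defects, perfect test
# coverage, and hypothetical yield values (none are from the source).

def unscreened_stack_yield(layer_yields, bond_yield, n_bonds):
    """Wafer-level stacking without pre-test: a stacked chip is good only if
    every layer and every bond is good, so all yields multiply."""
    y = bond_yield ** n_bonds
    for layer_yield in layer_yields:
        y *= layer_yield
    return y

def screened_stack_yield(bond_yield, n_bonds):
    """Die-to-die stacking of diced-and-tested (known-good) parts: layer
    defects were screened out before stacking, so only the remaining
    bonding steps can fail."""
    return bond_yield ** n_bonds

# Hypothetical example: two logic layers (90 % each), one memory layer
# (80 %), 99 % per bonding step.
full_wafer = unscreened_stack_yield([0.9, 0.9, 0.8], 0.99, n_bonds=2)
final_step = screened_stack_yield(0.99, n_bonds=1)
print(f"unscreened: {full_wafer:.3f}, screened final step: {final_step:.3f}")
```

Parts rejected at test still consume wafer area. The screened flow therefore accepts scrap at the intermediate stage in exchange for a higher yield of finished stacked chips.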

Abstract

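A stacked chip comprises a first programmable gate array component and a first storage array component. The first programmable gate array component includes a first interface module embedded within it, and the first interface module includes a first bonding lead-out region. The first storage array component is provided with a second bonding lead-out region. The first and second bonding lead-out regions are bonded together to connect the interconnection signals on the first programmable gate array component and the first storage array component, achieving high-bandwidth, low-power storage access.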

Description

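Stacked Chip
Cross-Reference to Related Application
This application claims priority to Chinese Patent Application No. 202111028371.7, filed on September 2, 2021, the entire contents of which are incorporated herein by reference.
[Technical Field]
The present invention relates to the technical field of integrated circuits, and in particular to a stacked chip.
[Background]
As the scale of application computing grows rapidly, the bandwidth and energy cost of storage access have become important factors limiting the development of large-scale computing circuits.
[Summary]
The present invention provides a stacked chip capable of high-bandwidth, low-power storage access.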
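To solve the above technical problem, one technical solution provided by the present invention is a stacked chip comprising a first programmable gate array component and a first storage array component. The first programmable gate array component includes a first interface module embedded within it, and the first interface module includes a first bonding lead-out region. The first storage array component is provided with a second bonding lead-out region. The first and second bonding lead-out regions are bonded together to connect the interconnection signals on the first programmable gate array component and the first storage array component.
The first programmable gate array component includes a plurality of functional modules. There is at least one first interface module, located among the functional modules and connected to them through an interface routing unit.
The functional modules have a strip-shaped internal layout, and the first interface module is laid out to extend along with the strip-shaped functional modules.
The functional modules are connected to the interface routing unit through internal metal layers, and the first interface module is interconnected with the interface routing unit through internal metal layers.
The first programmable gate array component includes a programmable routing network. The functional modules are interconnected with the programmable routing network through internal metal layers and are connected to the interface routing unit through the programmable routing network.
The stacked chip further includes a physical layer that performs level conversion between the first programmable gate array component and the first storage array component; the physical layer is disposed on the first interface module.
The functional modules include any one or any combination of programmable logic blocks LAB (Logic Array Block)/CLB (Configurable Logic Block), block memories BRAM (Block Random Access Memory), multiplication units DSP (Digital Signal Processor) and multiply-accumulate units MAC (Multiply Accumulate).
The functional modules further include a combination of application-specific integrated circuit array units, which are hardened circuits that perform fixed computing tasks.
The block memories are connected to the programmable logic blocks through a storage routing unit.
The first programmable gate array component includes a field-programmable gate array (FPGA) or an embedded field-programmable gate array (eFPGA).
The stacked chip further includes a storage control unit, which controls storage access by the first programmable gate array component to the first storage array component. The storage control unit may be disposed on the first interface module, at a position on the first programmable gate array component close to the first interface module, or on the first storage array component.
The stacked chip further includes a second storage array component disposed on the side of the first programmable gate array component away from the first storage array component. The second storage array component is provided with a third bonding lead-out region, and the first interface module includes a fourth bonding lead-out region. The first programmable gate array component and the second storage array component are bonded through the third and fourth bonding lead-out regions.
Alternatively, the stacked chip further includes a second storage array component disposed on the side of the first storage array component away from the first programmable gate array component. The second storage array component is provided with a third bonding lead-out region, and the first storage array component includes a fourth bonding lead-out region. The first and second storage array components are bonded through the fourth and third bonding lead-out regions.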
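The two placements of the second storage array component described above can be compared by counting the bonding interfaces between the logic layer and each memory layer. In one placement it sits beyond the first programmable gate array component; in the other it sits beyond the first storage array component. This is a minimal sketch. It assumes exactly one bonding interface between each pair of adjacent layers, and the layer labels are hypothetical.

```python
# Hypothetical layer labels; each stack is listed from one end to the other.
THIRD_EMBODIMENT = ["fpga_1", "mem_2", "mem_4"]   # mem_4 beyond mem_2
FOURTH_EMBODIMENT = ["mem_4", "fpga_1", "mem_2"]  # mem_4 beyond fpga_1

def bond_hops(stack, a, b):
    """Bonding interfaces crossed between layers a and b, assuming each pair
    of adjacent layers is joined by exactly one bonding interface."""
    return abs(stack.index(a) - stack.index(b))

for name, stack in [("third", THIRD_EMBODIMENT), ("fourth", FOURTH_EMBODIMENT)]:
    print(name, {m: bond_hops(stack, "fpga_1", m) for m in ("mem_2", "mem_4")})
```

When the second memory is bonded directly to the logic die, both memories are one hop away. This is where the shorter interconnect distance attributed to that arrangement comes from.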
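The stacked chip further includes a storage control unit disposed on the first interface module. The storage control unit controls access by the first programmable gate array component to the first storage array component and the second storage array component.
The first programmable gate array component further includes a programmable logic unit, which is connected to the storage control unit and outputs a logic signal. Based on the logic signal, the storage control unit selectively controls, by time division, the first programmable gate array component to access either the first storage array component or the second storage array component.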
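The time-division access described above can be modeled in software to show how one shared storage control unit serializes requests to two arrays. This is a behavioral sketch, not a hardware description. The class name, the array labels `array_2` and `array_4`, and the one-request-per-slot rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class TimeDivisionController:
    """Hypothetical model of one shared storage control unit (cf. H21).
    Each time slot it serves exactly one request, steered to the array named
    by the logic signal from the programmable logic unit (cf. K23)."""
    arrays: dict = field(default_factory=lambda: {"array_2": {}, "array_4": {}})

    def serve(self, select, op, addr, data=None):
        target = self.arrays[select]  # the logic signal selects the array
        if op == "write":
            target[addr] = data
            return None
        return target.get(addr)

def run_slots(ctrl, requests):
    """Issue one request per time slot; return (slot, array, result) tuples."""
    return [(slot, sel, ctrl.serve(sel, op, addr, data))
            for slot, (sel, op, addr, data) in enumerate(requests)]

ctrl = TimeDivisionController()
trace = run_slots(ctrl, [
    ("array_2", "write", 0x10, "a"),  # first time slot: first storage array
    ("array_4", "write", 0x10, "b"),  # second time slot: second storage array
    ("array_2", "read", 0x10, None),
    ("array_4", "read", 0x10, None),
])
print(trace)
```

Because only one request is served per slot, the same address can hold different data in each array without conflict. Throughput to the two arrays together, however, is bounded by one access per slot.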
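The stacked chip further includes a first storage control unit and a second storage control unit, both disposed on the first interface module. The first storage control unit controls access by the first programmable gate array component to the first storage array component, and the second storage control unit controls access by the first programmable gate array component to the second storage array component.
The first programmable gate array component further includes a programmable logic unit, which is connected to the first and second storage control units and outputs a logic signal. Based on the logic signal, the first storage control unit controls the first programmable gate array component to access the first storage array component. At the same time, the second storage control unit, also based on the logic signal, controls it to access the second storage array component.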
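The difference between one time-division controller and two independent controllers (cf. H20 and I29) shows up in how many cycles a mixed request stream needs. This is a minimal sketch under two simplifying assumptions, both hypothetical: each controller issues at most one request per cycle, and the shared controller alternates round-robin between the arrays.

```python
from itertools import zip_longest

def schedule_shared(reqs_a, reqs_b):
    """One time-division controller: one request per cycle, alternating
    between the two arrays while both still have pending work."""
    qa, qb = list(reqs_a), list(reqs_b)
    out, turn = [], "A"
    while qa or qb:
        if (turn == "A" and qa) or not qb:
            out.append(("A", qa.pop(0)))
        else:
            out.append(("B", qb.pop(0)))
        turn = "B" if turn == "A" else "A"
    return out

def schedule_independent(reqs_a, reqs_b):
    """Two controllers: each cycle both may issue one request to their own
    array at the same time; None marks an idle controller."""
    return [(cycle, a, b) for cycle, (a, b) in enumerate(zip_longest(reqs_a, reqs_b))]

a_reqs, b_reqs = ["a0", "a1", "a2"], ["b0"]
print(len(schedule_shared(a_reqs, b_reqs)), len(schedule_independent(a_reqs, b_reqs)))
```

With a shared controller the cycle count is the sum of both request streams. With independent controllers it is set by the longer stream, because there is no access conflict between the arrays.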
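The stacked chip further includes a second programmable gate array component disposed on the side of the first programmable gate array component away from the first storage array component. The second programmable gate array component includes a second interface module with a third bonding lead-out region, and the first interface module includes a fourth bonding lead-out region. The third and fourth bonding lead-out regions are bonded to join the second programmable gate array component to the first programmable gate array component. The first and second programmable gate array components either share the same storage control unit to access the same storage unit of the first storage array component, or each use an independent storage control unit to access different storage units of the first storage array component.
Alternatively, the stacked chip further includes a second programmable gate array component disposed on the side of the first storage array component away from the first programmable gate array component. The second programmable gate array component includes a second interface module with a third bonding lead-out region, and the first storage array component includes a fourth bonding lead-out region. The third and fourth bonding lead-out regions are bonded to join the second programmable gate array component to the first storage array component. The first and second programmable gate array components either share the same storage control unit to access the same storage unit of the first storage array component, or each use an independent storage control unit to access different storage units of the first storage array component.
The first programmable gate array component includes programmable logic blocks and a programmable routing network. The programmable logic blocks are interconnected through the programmable routing network and configured as several programmable functional modules, and at least part of the programmable routing network can be extended to the interface routing unit.
Beneficial effects: unlike the prior art, the stacked chip of the present invention connects the interconnection signals on the first programmable gate array component and the first storage array component through the first and second bonding lead-out regions. It also embeds the first interface module, which carries the first bonding lead-out region, into the first programmable gate array component. This forms a three-dimensional heterogeneous integration structure and achieves high-bandwidth, low-power storage access.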
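[Brief Description of the Drawings]
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention. A person of ordinary skill in the art may derive other drawings from them without creative effort. In the drawings:
FIG. 1 is a schematic structural diagram of a first embodiment of the stacked chip of the present invention;
FIG. 2 is a schematic plan view of the first programmable gate array component of the present invention;
FIG. 3 is a schematic diagram of the storage access structure from the first programmable gate array component to the first storage array component in FIG. 1;
FIG. 4 is a schematic structural diagram of a second embodiment of the stacked chip of the present invention;
FIG. 5 is a schematic diagram of shared storage access by the first and second programmable gate array components to the first storage array component in FIG. 4;
FIG. 6 is a schematic diagram of independent storage access by the first and second programmable gate array components to the first storage array component in FIG. 4;
FIG. 7 is a schematic structural diagram of a third embodiment of the stacked chip of the present invention;
FIG. 8 is a schematic diagram of shared storage access by the first programmable gate array component to the first and second storage array components in FIG. 7;
FIG. 9 is a schematic diagram of independent storage access by the first programmable gate array component to the first and second storage array components in FIG. 7;
FIG. 10 is a schematic structural diagram of a fourth embodiment of the stacked chip of the present invention;
FIG. 11 is a schematic diagram of shared storage access by the first programmable gate array component to the first and second storage array components in FIG. 10;
FIG. 12 is a schematic diagram of independent storage access by the first programmable gate array component to the first and second storage array components in FIG. 10;
FIG. 13 is a schematic structural diagram of the programmable routing network and programmable logic blocks;
FIG. 14 is a schematic diagram of the three-dimensional heterogeneous integration structure among functional components 210, 220 and 230.
[Detailed Description]
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The terms "first", "second" and "third" in the present invention are used for descriptive purposes only. They shall not be understood as indicating or implying relative importance, or as implicitly indicating the number of the technical features concerned. A feature qualified by "first", "second" or "third" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means at least two, for example two or three, unless otherwise specifically defined. All directional indications in the embodiments (such as up, down, left, right, front, back...) serve only to explain the relative positions and movements of components in a particular posture (as shown in the drawings). If that posture changes, the directional indication changes accordingly. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not limited to the listed steps or units. It may optionally include steps or units that are not listed, or other steps or units inherent to the process, method, product or device.
Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Occurrences of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.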
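Refer to FIG. 1, which is a schematic structural diagram of the first embodiment of the stacked chip of the present invention. The stacked chip includes a first programmable gate array component 1 and a first storage array component 2. In this application, the two components are integrated by hybrid bonding using three-dimensional heterogeneous integration. Three-dimensional heterogeneous integration directly interconnects the internal metal layers of two chip components across the chips, and its physical and electrical parameters follow the characteristics of the semiconductor fabrication process. Compared with interconnection through input/output (I/O) interfaces and/or I/O circuits, it greatly increases interconnection density and speed. The internal interconnections of the stacked chip can therefore achieve high bandwidth and low power consumption.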
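The density argument can be made concrete with simple arithmetic. An area array of bond pads at pitch p (in µm) gives (1000/p)^2 connections per mm^2. Peak parallel bandwidth is the number of data-carrying connections multiplied by the per-connection rate. The sketch below assumes a square grid. The pitch, data-pad fraction and per-signal rate are hypothetical values, not figures from this application.

```python
def bonds_per_mm2(pitch_um: float) -> float:
    """Connections per mm^2 for a square area array of pads at the given pitch."""
    per_side = 1000.0 / pitch_um  # 1 mm = 1000 um
    return per_side * per_side

def peak_bandwidth_gbps(n_data_signals: int, gbps_per_signal: float) -> float:
    """Peak bandwidth when every data connection transfers in parallel."""
    return n_data_signals * gbps_per_signal

# Hypothetical parameters, for illustration only.
pitch_um = 10.0       # bond-pad pitch
data_fraction = 0.5   # share of pads carrying data (rest: power, ground, control)
rate_gbps = 1.0       # per-connection data rate
n_data = int(bonds_per_mm2(pitch_um) * data_fraction)
print(n_data, peak_bandwidth_gbps(n_data, rate_gbps))  # per mm^2 of bonded area
```

Halving the pitch quadruples the pad count per unit area. Area-array bonding therefore scales far faster than a perimeter I/O ring, whose pin count grows only with die edge length.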
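In one embodiment, the first storage array component 2 may be a DRAM (Dynamic Random Access Memory). In another embodiment, it may be an SRAM (Static Random Access Memory). As technology evolves, the first storage array component 2 may also be another type of memory, such as flash memory (Flash), resistive memory (RRAM or ReRAM), magnetoresistive memory (MRAM), ferroelectric memory (FeRAM), oxide resistive memory (OxRAM), conductive-bridge memory (CBRAM), phase-change memory (PCM), spin-transfer torque memory (STT-MRAM) or electrically erasable memory (EEPROM). Each of these memories has its own advantages and may require a storage controller as its storage access interface. The storage controller implements functions such as the physical interface, data reading and writing, data buffering, data prefetching, data refresh and data block remapping.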
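Several of the storage controller functions listed above (read/write, data block remapping and refresh) can be sketched as a small bookkeeping class. This is a hypothetical illustration only. The class and method names are invented, refresh is modeled as a per-row timer in abstract ticks, and the sketch does not model the protocol of any real DRAM or Flash controller.

```python
class StorageControllerSketch:
    """Hypothetical bookkeeping for three controller duties:
    read/write, data block remapping, and refresh scheduling."""

    def __init__(self, n_rows: int, refresh_interval: int):
        self.blocks = {}                  # physical block -> stored data
        self.remap = {}                   # logical block -> spare physical block
        self.refresh_interval = refresh_interval  # ticks between row refreshes
        self.last_refresh = [0] * n_rows  # tick at which each row was refreshed
        self.now = 0

    def remap_block(self, logical: int, spare: int) -> None:
        """Redirect a (bad) logical block to a spare physical block."""
        self.remap[logical] = spare

    def _physical(self, logical: int) -> int:
        return self.remap.get(logical, logical)

    def write(self, logical: int, data) -> None:
        self.blocks[self._physical(logical)] = data

    def read(self, logical: int):
        return self.blocks.get(self._physical(logical))

    def tick(self) -> list:
        """Advance one tick; refresh and return the rows that came due."""
        self.now += 1
        due = [row for row, t in enumerate(self.last_refresh)
               if self.now - t >= self.refresh_interval]
        for row in due:
            self.last_refresh[row] = self.now
        return due

ctrl = StorageControllerSketch(n_rows=2, refresh_interval=3)
ctrl.remap_block(5, 100)   # logical block 5 now lives in spare block 100
ctrl.write(5, "x")
print(ctrl.read(5), [ctrl.tick() for _ in range(3)])
```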
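Specifically, as shown in FIG. 1, the first programmable gate array component 1 includes a first interface module 11 embedded within it. The first interface module 11 includes a first bonding lead-out region 111, and the first storage array component 2 is provided with a second bonding lead-out region 21. The first bonding lead-out region 111 and the second bonding lead-out region 21 are bonded together by a three-dimensional heterogeneous integration bonding structure. This realizes three-dimensional heterogeneous integration of the first programmable gate array component 1 and the first storage array component 2, and with it a high-bandwidth, low-power, programmable static storage-computing integrated structure for the stacked chip. Three-dimensional heterogeneous integration bonding greatly increases the interconnection density between the first programmable gate array component 1 and the first interface module 11. It further increases the interconnection density between the first programmable gate array component 1 and the first storage array component 2, reduces interconnect distributed parameters, raises interconnect bandwidth and lowers interconnect power consumption.
Specifically, the first programmable gate array component 1 includes a plurality of functional modules 13, and the first interface module 11 is located among them. An interface routing unit 137 is arranged on the side of the first interface module 11 close to the functional modules 13 and connects the functional modules 13 to the first interface module 11. The functional modules 13 are connected to the interface routing unit 137 through internal metal layers, and so is the first interface module 11. In one specific embodiment there is one first interface module 11. In another embodiment there are at least two first interface modules 11, interspersed among the functional modules 13 and connected to them through interface routing units 137. The embodiment of FIG. 1 shows only one first interface module 11, but other embodiments may have several. The present application is not limited in this respect, and the number is set as required.
In one embodiment, as shown in FIG. 2 (a schematic plan view of the first programmable gate array component 1), the functional modules 13 include programmable logic blocks (Logic Array Block, LAB / Configurable Logic Block, CLB) 133, block memories (Block Random Access Memory, BRAM) 134, multiplication units (DSP, Digital Signal Processor) 135 and multiply-accumulate units (Multiply Accumulate, MAC) 138. Note that the multiplication unit 135 is not a digital signal processor chip but an embedded programmable multiplication unit. In specific embodiments the functional modules 13 may be configured as required and are not limited by the present application.
In this embodiment, the first bonding lead-out region 111 is the three-dimensional heterogeneous integration interconnect resource of the first programmable gate array component 1. The first programmable gate array component 1 is bonded directly to the second bonding lead-out region 21 of the first storage array component 2 through the first bonding lead-out region 111. This forms direct metal-layer interconnections with high density and low distributed parameters for storage access. It avoids connecting the first programmable gate array component 1 to the first storage array component 2 through IO interfaces and IO interface circuits, and so achieves high bandwidth and low power consumption with high density and low distributed parameters.
In one embodiment, the first programmable gate array component 1 further includes a programmable routing network. The functional modules 13 are interconnected with the programmable routing network through internal metal layers and are connected through it to the interface routing unit 137. The programmable routing network uses the internal metal layers of the first programmable gate array component 1, in a programmable way, to establish interconnection and data exchange among all resources in the component. Through it, the functional modules 13 establish extensive, reconfigurable, high-bandwidth data interconnections between modules and from modules to storage devices. As shown in FIG. 2, the programmable routing network is connected to a storage routing unit 136. The block memories BRAM 134 are interconnected with the storage routing unit 136 and connected to the programmable storage routing network, so that all functional modules 13 in the first programmable gate array component 1 can access all block memories BRAM 134 through the storage routing unit 136. The programmable routing network is also connected to the interface routing unit 137. The first storage array component 2 is interconnected with the interface routing unit 137 through the first interface module 11 and connected to the programmable storage routing network. All functional modules 13 in the first programmable gate array component 1 can thus access the storage arrays on the first storage array component 2 through the interface routing unit 137.
Specifically, all functional modules 13 on the first programmable gate array component 1 are connected to the interface routing unit 137 through the programmable routing network. The interface routing unit 137 is connected to the three-dimensional heterogeneous integration bonding structure corresponding to the first interface module 11, which establishes storage access from the functional modules 13 to all storage arrays on the first storage array component 2. The programmable routing network is distributed throughout the first programmable gate array component 1 and is programmable. Every functional module 13, whether close to or far from the first interface module 11, can therefore establish high-density on-chip metal-layer interconnection with the interface routing unit 137 through it. The first interface module 11 forms direct cross-chip metal-layer interconnections with the first storage array component 2 through the first bonding lead-out region 111 and the second bonding lead-out region 21, with high density and low distributed parameters. This avoids the low interconnect density, low interconnect speed and high interconnect power of IO interfaces and IO interface circuits. It establishes high-bandwidth, low-power storage access from all functional modules 13 to all storage arrays on the first storage array component 2.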
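The access path described above can be represented as a directed graph. It runs from a functional module through the programmable routing network, interface routing unit 137, first interface module 11 and bonding regions 111/21 to the storage array. A breadth-first search then shows that a module near the interface and a module far from it reach the storage array through the same routing network. The node names are hypothetical labels keyed to the reference numerals, and the graph is a simplification, not a netlist.

```python
from collections import deque

# Hypothetical connectivity graph; labels follow the reference numerals.
EDGES = {
    "module_13_near": ["routing_network"],
    "module_13_far": ["routing_network"],
    "routing_network": ["storage_routing_unit_136", "interface_routing_unit_137"],
    "storage_routing_unit_136": ["bram_134"],
    "interface_routing_unit_137": ["interface_module_11"],
    "interface_module_11": ["bond_region_111"],
    "bond_region_111": ["bond_region_21"],  # cross-chip bonded link
    "bond_region_21": ["memory_array_2"],
}

def find_path(src, dst, edges=EDGES):
    """Breadth-first search for a route from src to dst; None if unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in edges.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

print(find_path("module_13_far", "memory_array_2"))
```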
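The block memories BRAM on a programmable gate array component are connected to the programmable routing network through storage routing units and provide high-bandwidth storage resources for the functional modules. Because of the area constraints of the programmable gate array component, BRAM capacity is typically tens of thousands to a few million storage bits (bit, memory cell), which cannot meet the needs of typical applications. In the prior art, large-capacity storage is added outside the programmable gate array component through its IO and an external memory, and the internal BRAMs are usually used as a cache for that external storage. Limited by the interconnection technology used for such external expansion, external storage access bandwidth is far lower than internal bandwidth, and storage access power is higher. The present application has two advantages that overcome these shortcomings. First, mirroring the interconnection and storage access structure between the functional modules and the BRAMs, the interface routing unit 137 and the first interface module 11 are designed so that all functional modules 13 can establish high-density on-chip metal-layer interconnection with the interface routing unit 137 through the programmable routing network. All functional modules 13 can also be interconnected with the first interface module 11 through the interface routing unit 137. Second, the first interface module 11 is connected to the first storage array component 2 by three-dimensional heterogeneous integration. In effect, the first programmable gate array component 1 and the first storage array component 2 therefore establish high-density inter-chip metal-layer interconnection whose physical and electrical parameters follow the characteristics of the semiconductor process. This interconnection inherits the high-density, high-speed bandwidth and low-power advantages of the on-chip link between the BRAMs 134 and the functional modules 13 via the storage routing unit 136, while extending storage capacity almost without limit. As shown in FIG. 2, the programmable logic blocks LAB/CLB 133, block memories BRAM 134, multiplication units DSP 135 and multiply-accumulate units MAC 138 in the functional modules 13 all use strip-shaped layouts, as does the storage routing unit 136. These units can be repeated and combined arbitrarily in the strip pattern of FIG. 2 within the first programmable gate array component 1 as required, with programmable interconnection established through the programmable routing network. The specific combination is not limited by the present application. In this embodiment, the first interface module 11 is shaped to match the functional modules 13. It also uses a strip-shaped layout so that it is embedded among the functional modules 13, and it extends along the strip length direction together with them to expand its capacity. In one specific embodiment, the interface routing unit 137 is likewise strip-shaped to match the functional modules 13 and is embedded among them. It extends along the strip length direction with the first interface module 11 to support that module's capacity expansion. In this way, large-capacity storage access interconnection can be formed between the functional modules 13 and the first storage array component 2. Its interconnect density is far greater than that of connecting to external large-capacity memory through FPGA internal IO circuits and/or external IO interfaces, giving the stacked chip high-bandwidth, low-power storage access.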
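The strip (column) layout argument can be sketched numerically. If the first interface module occupies one or more strip columns, its bond-pad capacity grows linearly with strip length. This is a minimal sketch that assumes a hypothetical column sequence, pad pitch and number of pads across a strip. "IRU" stands for the interface routing unit 137 and "IF" for the first interface module 11.

```python
# Hypothetical strip-column sequence across the die (cf. FIG. 2).
COLUMNS = ["LAB", "LAB", "BRAM", "IRU", "IF", "IRU", "DSP", "LAB", "MAC"]

def interface_pad_capacity(columns, strip_length_um, pad_pitch_um, pads_across):
    """Bond pads available in the interface-module ("IF") strips.

    pads_along grows with strip length, so capacity scales linearly as the
    interface module is extended along the strip with the functional modules.
    """
    pads_along = int(strip_length_um // pad_pitch_um)
    return columns.count("IF") * pads_along * pads_across

print(interface_pad_capacity(COLUMNS, 1000, 10, 4))
print(interface_pad_capacity(COLUMNS, 2000, 10, 4))
```

Adding a second "IF" column would double capacity in the same way. This corresponds to interspersing more than one first interface module among the functional modules.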
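With its interface routing unit 137, the stacked chip of this embodiment can greatly increase bus width. Because the interface routing unit 137 connects directly to the three-dimensional heterogeneous integration bonding structure and, through that interconnect, to the first memory array component 2, large-capacity memory arrays can be accessed.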
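Why a wider bus matters follows from peak bandwidth = bus width × transfer rate. The widths and rates below are hypothetical illustrations, not values from this application; the sketch only shows how a wide bonded interface at a modest rate can outpace a narrow, fast off-chip bus.

```python
def peak_bandwidth_gbytes(width_bits: int, transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s (10**9 bytes/s) = width in bits x transfer rate / 8."""
    return width_bits * transfers_per_s / 8 / 1e9

# Hypothetical: a 4096-bit bonded interface at 500 MT/s vs a 64-bit off-chip bus at 3200 MT/s.
bonded = peak_bandwidth_gbytes(4096, 500e6)
off_chip = peak_bandwidth_gbytes(64, 3200e6)
print(f"bonded {bonded:.1f} GB/s, off-chip {off_chip:.1f} GB/s, ratio {bonded / off_chip:.0f}x")
# → bonded 256.0 GB/s, off-chip 25.6 GB/s, ratio 10x
```

The formula ignores protocol overhead and refresh, so it bounds achievable bandwidth from above rather than predicting it.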
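In this embodiment, the first interface module 11 on the first programmable gate array component 1 provides memory access to the first memory array component 2. Unlike conventional designs, in which the first programmable gate array component 1 connects to large-capacity external memory through internal IO circuits and external IO interfaces, the stacked chip of this embodiment saves IO resources of the first programmable gate array component 1, provides a memory interconnect density far higher than IO-based connections, increases memory access bandwidth, and reduces memory access power.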
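In an embodiment, a global bus, for example a NoC, AXI, or AHB bus, may also be provided on the first programmable gate array component 1 to allow the programmable logic to perform cross-region memory access. The global bus may be placed near the first interface module 11 or at other locations associated with memory access; this is not limited.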
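In an embodiment, as shown in FIG. 2, the first programmable gate array component 1 may further include an ASIC array unit 139 comprising hard-core computing/processing units (processing elements) implemented as application-specific integrated circuits, for example any one or any combination of multiply-add arrays, multiplier arrays, systolic processor arrays, hash arrays, various encoder arrays, dedicated machine-learning layer arrays, search arrays, image/video processing arrays, and hard-core CPUs and MCUs. Like the functional modules 13, the ASIC array unit 139 is laid out as a strip embedded between functional modules 13, extends and scales in capacity along the strip length with them, and is widely connected to the programmable routing network, acting as a hard-core computing/processing extension of the functional modules 13. The ASIC array unit 139 has limited or no programmability and accelerates specific computing/processing tasks; its compute density is much higher than that of the fully programmable functional modules 13, which significantly increases the compute density of the stacked chip.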
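In an embodiment, when an application places high demand on the ASIC array unit 139, the ASIC array unit 139 may be extended across chips in the same way that the first memory array component 2 extends the storage capacity of the first programmable gate array component 1: (1) the ASIC array unit 139 is designed to include hard-core computing/processing units implemented as ASICs, for example multiply-add arrays, multiplier arrays, systolic processor arrays, hash arrays, various encoder arrays, dedicated machine-learning layer arrays, search arrays, image/video processing arrays, and hard-core CPUs and MCUs, alone or in any combination; (2) a computing/processing interface module is designed on the first programmable gate array component 1 and establishes high-density cross-chip interconnects with the computing/processing units in the ASIC array unit 139 by three-dimensional heterogeneous integration; (3) a computing/processing interface routing unit is designed on the first programmable gate array component 1 to establish high-density on-chip metal-layer interconnects between the programmable routing network and the computing/processing interface module. The functional modules 13 on the first programmable gate array component 1 can then schedule the computing/processing units on the ASIC array unit 139 over high-density three-dimensional heterogeneous integration, and the inputs and results of those units are mapped, through memory access over the same kind of integration, into the large-capacity memory arrays on the first memory array component 2.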
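In an embodiment, the stacked chip further includes a memory control unit 113, which controls storage to and access of the first memory array component 2 by the first programmable gate array component 1. The memory control unit 113 may be located on the first interface module 11, on the first programmable gate array component 1 near the first interface module 11, or on the first memory array component 2. The stacked chip of this embodiment avoids interconnection through physical IO interfaces, which saves IO resources, provides interconnect density far higher than IO interfaces, increases memory access bandwidth, and reduces memory access power. It gives internal signals of the first programmable gate array component 1 a high-density, short-distance connection to the first memory array component 2.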
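In one preferred embodiment, the memory control unit 113 is located on the first interface module 11; because every access from the programmable gate array component to the memory array component passes through the first interface module 11, this favours data flow. In another preferred embodiment, the memory control unit 113 is located on the first programmable gate array component 1; because the process used for the programmable gate array component outperforms that of the memory array component, higher density and speed can be achieved. In another preferred embodiment, the memory control unit 113 is located near the first interface module 11. This inherits the process performance of the programmable gate array component for higher density and speed, reduces the area of the first interface module 11 and thus the area overhead of the three-dimensional heterogeneous integration interconnect region, and lets the memory control unit 113 be combined with the programmable features of the functional modules 13 so that some of its functions and/or parameters are programmable. In yet another preferred embodiment, the memory control unit 113 is located on the memory array component; because the memory array process is cheaper per unit area than the programmable gate array process, this lowers cost and relatively increases the density of the programmable gate array component.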
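In an embodiment, the stacked chip further includes a physical layer 114, which performs level conversion on the three-dimensional heterogeneous integration interconnect between the first programmable gate array component 1 and the first memory array component 2 when their core voltages differ. In an embodiment, as shown in FIG. 1, the physical layer 114 may be located on the first interface module 11. In another embodiment, the physical layer 114 may be designed on the first programmable gate array component 1, typically on or near the first interface module 11, to inherit the process performance of the first programmable gate array component 1 for higher density and speed. Alternatively, it may be designed on the first memory array component 2, typically on or near the vertical projection of the first interface module 11, to save area on the first programmable gate array component 1 and increase its compute/processing density.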
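In the present application, the physical and electrical parameters of the cross-chip three-dimensional heterogeneous integration interconnect between the first programmable gate array component 1 and the first memory array component 2 follow semiconductor process characteristics. Compared with conventional PCB or 2.5D packaging, the number of interconnects between the two components (and hence the memory access bandwidth) increases by two to four orders of magnitude. Also unlike conventional PCB or 2.5D packaging, the two components are interconnected directly, without IO interfaces and/or IO circuits, so the interconnect distance is shorter and the distributed parasitics are lower (in particular the distributed capacitance of the interconnect lines to reference ground), which significantly reduces memory access power. A near-memory access architecture is formed between the first programmable gate array component 1 and the first memory array component 2: the functional modules 13 on the first programmable gate array component 1 access nearby memory, avoiding the access conflicts and efficiency loss of a conventional shared bus, and the IO overhead conventionally spent on connecting the first programmable gate array component 1 to external large-capacity memory devices is saved.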
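The "two to four orders of magnitude" figure can be put in perspective with a back-of-the-envelope density calculation: for a square grid of bond pads at pitch p (in µm), there are (1000/p)² connections per mm². The pitch values below are hypothetical illustrations chosen for the arithmetic, not figures from this application.

```python
import math

def connections_per_mm2(pitch_um: float) -> float:
    """Bond pads per square millimetre for a square grid at the given pitch (micrometres)."""
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    return (1000.0 / pitch_um) ** 2

def density_ratio(fine_pitch_um: float, coarse_pitch_um: float) -> float:
    """How many times more connections per unit area the finer pitch provides."""
    return connections_per_mm2(fine_pitch_um) / connections_per_mm2(coarse_pitch_um)

# Hypothetical pitches: fine bonded pitches vs a 100 um-class conventional connection.
for fine in (1.0, 10.0):
    ratio = density_ratio(fine, 100.0)
    print(f"{fine:g} um vs 100 um: {ratio:g}x, {math.log10(ratio):g} orders of magnitude")
```

Because density scales with the inverse square of pitch, a 10x finer pitch alone yields a 100x (two orders of magnitude) gain, and a 100x finer pitch yields four.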
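In an embodiment of the present application, shown in FIG. 3, the memory control unit is located on the first interface module. Specifically, the memory control unit H21 is located on the first interface module H17. The first memory array component 2 includes a memory unit G13, on which the second bonding lead-out region G14 is located. The memory control unit H21 is connected to the first bonding lead-out region H19, which is connected to the second bonding lead-out region G14 on the first memory array component 2.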
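Further, the first programmable gate array component 1 is provided with a programmable logic unit K23, which is connected to the memory control unit H21 through the interface routing unit H22. The programmable logic unit K23 outputs a logic signal, and based on that signal the memory control unit H21 controls memory access by the first programmable gate array component 1 to the first memory array component 2.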
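In the present application, the number and positions of first programmable gate array components 1 and first memory array components 2 can be set as required. FIG. 4 is a schematic structural diagram of a second embodiment of the stacked chip of the present invention. Compared with the first embodiment shown in FIG. 1, the difference is that the stacked chip of this embodiment further includes a second programmable gate array component 3, located on the side of the first programmable gate array component 1 away from the first memory array component 2. Specifically, the second programmable gate array component 3 includes a second interface module 31, which includes a third bonding lead-out region 32. In this embodiment, the first interface module 11 further includes a fourth bonding lead-out region 12, and the third bonding lead-out region 32 is bonded to the fourth bonding lead-out region 12, bonding the second programmable gate array component 3 to the first programmable gate array component 1.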
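The stacked chip of this embodiment has two programmable gate array layers, the second programmable gate array component 3 and the first programmable gate array component 1, bonded through the third bonding lead-out region 32 and the fourth bonding lead-out region 12. Here the third bonding lead-out region 32 is the three-dimensional heterogeneous interconnect resource of the second programmable gate array component 3: the second component connects through it directly to the first interface module 11 and then, through the interconnect resource of the first programmable gate array component 1 (the first bonding lead-out region 111), to the first memory array component 2 for memory access. This avoids connecting the second programmable gate array component 3 to the first memory array component 2 through its IO interfaces, achieving high bandwidth and low power, with the further advantages of high programmable-resource density, low distributed parasitics, and fast memory access.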
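In the stacked chip, adjacent components are interconnected by three-dimensional heterogeneous integration, building high-density metal-layer interconnects layer by layer. Because the constituent components are stacked and packaged within one stacked chip, the functions provided by conventional IO circuits (drive, external level up-shifting on output, external level down-shifting on input, tri-state control, ESD protection, surge protection, and so on) are not needed; instead of connecting through conventional IO interfaces and/or IO circuits, high-density cross-component metal-layer interconnects are built directly. This reduces use of the IO structures of the programmable gate array component and increases the interconnect density and speed between the programmable gate array component and the memory array component. At the same time, because the three-dimensional heterogeneous integration interconnect bypasses conventional IO structures and is short, inter-chip communication power is reduced. Integration density and the interconnect frequency between the programmable gate array component and the memory array component therefore rise, and interconnect power falls. The programmable routing network, which widely interconnects the programmable resources on the programmable gate array component, thus extends across chips to the large-capacity memory arrays on the memory chip, letting the programmable resources access those arrays through three-dimensional heterogeneous integration with high bandwidth and in a programmable manner. The multi-layer chip combines the large capacity of external memory with the key advantages, wide bit width and high bandwidth, of the BRAM interconnected through the programmable routing network on a programmable gate array component (prior art, small capacity). This fundamentally removes the IO-count, access-bandwidth, and access-power bottlenecks that limit prior-art programmable gate array chips when extended with large memories.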
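Compared with the first embodiment shown in FIG. 1, the stacked chip of this embodiment further increases compute density, which benefits more complex reconfigurable computing. Building on this embodiment, more programmable gate array components can be added as needed to increase the programmable gate array density of the stacked chip.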
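It should be noted that the second programmable gate array component 3 may also differ from the first programmable gate array component 1 and may have different functional modules according to actual needs. For example, in an embodiment, the functional modules of the first programmable gate array component 1 include programmable functional modules, including but not limited to any combination of programmable logic blocks LAB/CLB, memory blocks BRAM, multiplication units DSP, and multiply-accumulate units MAC; some or all of the functional modules of the second programmable gate array component 3 may be ASIC array units, including but not limited to any one or any combination of multiply-add arrays, multiplier arrays, systolic processor arrays, hash arrays, various encoder arrays, dedicated machine-learning layer arrays, search arrays, image/video processing arrays, and hard-core CPUs and MCUs.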
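In this embodiment, the first programmable gate array component 1 and the second programmable gate array component 3 share one memory control unit 113 to access the same memory unit of the first memory array component 2. The memory control unit 113 may be located on or near the first interface module 11, on or near the second interface module 31, or on the first memory array component 2.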
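Specifically, in an embodiment, the first programmable gate array component 1 further includes a first programmable logic unit, which is connected to the memory control unit 113 and outputs a first logic signal. The second programmable gate array component 3 further includes a second programmable logic unit, which is connected to the memory control unit 113 and outputs a second logic signal. Based on the first and second logic signals, the memory control unit 113 selects either the first programmable gate array component 1 or the second programmable gate array component 3 to access the first memory array component 2.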
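Specifically, FIG. 5 takes the memory control unit H21 located on the first interface module H17 as an example. The first memory array component 2 includes a memory unit G13, on which the second bonding lead-out region G14 is located. The first bonding lead-out region H19 is located on the first interface module H17 and is bonded to the second bonding lead-out region G14. The memory control unit H21 is located on the first interface module H17 and connected to the first bonding lead-out region H19. The first interface module H17 further carries a fourth bonding lead-out region H24, connected to the memory control unit H21. The second interface module I27 carries a third bonding lead-out region I28, connected to the fourth bonding lead-out region H24. Further, in this embodiment, the first programmable gate array component 1 includes a first programmable logic unit H23 connected to the memory control unit H21, and the second programmable gate array component 3 includes a second programmable logic unit I32 connected to the third bonding lead-out region I28.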
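For example, in an embodiment, when the first programmable gate array component 1 needs to access the first memory array component 2, the first programmable logic unit H23 sends a first logic signal to the memory control unit H21, which then controls the first programmable gate array component 1 to access the memory unit G13 on the first memory array component 2 through the first bonding lead-out region H19 and the second bonding lead-out region G14. When the second programmable gate array component 3 needs to access the first memory array component 2, the second programmable logic unit I32 sends a second logic signal to the memory control unit H21, which then controls the second programmable gate array component 3 to access the memory unit G13 through the third bonding lead-out region I28 and the fourth bonding lead-out region H24. In this way the memory control unit selects, based on the first and second logic signals, whether the first programmable gate array component 1 or the second programmable gate array component 3 accesses the first memory array component 2.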
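In this embodiment only one memory control unit H21 is designed. It may be located on or near the first interface module H17, on or near the second interface module I27, or on the first memory array component 2; this is not limited. The memory unit G13 on the first memory array component 2 is connected in its entirety to the memory control unit H21 through the second bonding lead-out region G14 and the first bonding lead-out region H19. The memory control unit H21 can directly connect two sets of memory access interfaces (for example H19 and H24 in FIG. 5), through which multiple programmable gate array components share access to the memory unit G13.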
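In an embodiment, the first programmable logic unit H23 and the second programmable logic unit I32 include any combination of programmable logic blocks, memory blocks, multiplication units, multiply-accumulate units, hard-core computing/processing units, and the like. The first programmable logic unit H23 outputs the first logic signal, and the second programmable logic unit I32 outputs the second logic signal. Based on these signals, the memory control unit H21 switches its memory access interface either toward the bond between the first bonding lead-out region H19 and the second bonding lead-out region G14, or toward the bond between the fourth bonding lead-out region H24 and the third bonding lead-out region I28, so that H23 and I32 use it in a time-shared manner, achieving shared memory access.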
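The time-shared selection performed by H21 can be sketched as a two-port arbiter. This is a hypothetical behavioural model: the request format, the one-access-per-cycle assumption, and the alternating-priority policy are assumptions for illustration only; the application does not specify how H21 decides between simultaneous first and second logic signals.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class Request:
    address: int
    write: bool = False
    data: int = 0

class SharedController:
    """Toy model of one memory control unit (H21) shared by two access ports.

    Port "H19" stands for requests from the first programmable logic unit H23,
    port "H24" for requests arriving from the second programmable logic unit I32.
    One request is served per cycle; when both ports wait, priority alternates.
    """
    PORTS = ("H19", "H24")

    def __init__(self, size: int) -> None:
        self.mem = [0] * size             # stands in for memory unit G13
        self.last: Optional[str] = None   # port granted in the previous cycle

    def step(self, pending: Dict[str, Request]) -> Optional[Tuple[str, Optional[int]]]:
        waiting = [p for p in self.PORTS if p in pending]
        if not waiting:
            return None
        # Alternate priority so neither programmable gate array layer starves.
        port = waiting[1] if len(waiting) == 2 and self.last == waiting[0] else waiting[0]
        self.last = port
        req = pending.pop(port)
        if req.write:
            self.mem[req.address] = req.data
            return port, None
        return port, self.mem[req.address]

ctrl = SharedController(size=16)
pending = {"H19": Request(3, write=True, data=7), "H24": Request(3)}
grants = []
while pending:
    grants.append(ctrl.step(pending))
print(grants)  # → [('H19', None), ('H24', 7)]
```

Alternating priority is one simple fairness policy; a fixed-priority or weighted scheme would fit the same interface.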
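Note that in this embodiment the third bonding lead-out region I28 is connected to an interface routing unit I30, which connects the second programmable logic unit I32 to the fourth bonding lead-out region H24.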
Because this embodiment shares a single memory control unit H21, it occupies little area.
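In another embodiment, the first programmable gate array component 1 and the second programmable gate array component 3 each use an independent memory control unit to access different memory units of the first memory array component 2. Specifically, the stacked chip includes a first memory control unit and a second memory control unit; the first programmable gate array component 1 accesses memory units of the first memory array component 2 through the first memory control unit, and the second programmable gate array component 3 does so through the second memory control unit.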
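In this embodiment, the second memory control unit is located on or near the second interface module 31, and the first memory control unit on or near the first interface module 11. The first programmable gate array component 1 further includes a first programmable logic unit, connected to the first memory control unit, which outputs a first logic signal; the second programmable gate array component 3 further includes a second programmable logic unit, connected to the second memory control unit, which outputs a second logic signal.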
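When both the first and the second memory control unit control all memory units of the first memory array component 2, and the first programmable gate array component 1 and the second programmable gate array component 3 access the same memory unit at the same time, the first memory control unit, based on the first logic signal, lets the first programmable gate array component 1 access the memory unit at a first time, and the second memory control unit, based on the second logic signal, lets the second programmable gate array component 3 access it at a second time. When the first and second memory control units control different memory units of the first memory array component, they simultaneously let the first programmable gate array component 1 and the second programmable gate array component 3 access those different memory units of the first memory array component 2.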
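Specifically, in this embodiment, if both memory control units control all memory units of the first memory array component 2 and both programmable gate array components access the same memory unit at the same time, each memory control unit controls its own programmable gate array component's access to that unit: the first memory control unit, based on the first logic signal, lets the first programmable gate array component 1 access the unit at a first time, and the second memory control unit, based on the second logic signal, lets the second programmable gate array component 3 access it at a second time. Different programmable gate arrays thus access the same memory unit in a time-shared manner, which eliminates access conflicts.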
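Specifically, the first programmable gate array component 1 may contain arbitration logic for the memory unit that selects, based on the first and second logic signals, whether the first or the second memory control unit is granted access. When the first memory control unit of the first programmable gate array component 1 and the second memory control unit of the second programmable gate array component 3 simultaneously access the same region of the same memory unit of the first memory array component 2, this arbitration logic grants access to the first or the second memory control unit in a time-shared manner, based on the first and second logic signals. The arbitration logic may alternatively be located on the first memory array component 2 or on the second programmable gate array component 3. In other words, the arbitration logic lets the first programmable gate array component 1 and the second programmable gate array component 3 access the first memory array component 2 in a time-shared manner.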
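In another embodiment, when the first and second memory control units control different memory units of the first memory array component, they simultaneously let the first programmable gate array component 1 and the second programmable gate array component 3 access different memory units of the first memory array component 2.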
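Specifically, when the first memory control unit of the first programmable gate array component 1 and the second memory control unit of the second programmable gate array component 3 simultaneously access different memory units of the first memory array component 2, the two memory control units are independent, so the arbitration logic in the first programmable gate array component 1 can, based on the first and second logic signals, grant both memory control units access to the memory units of the first memory array component 2 at the same time.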
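In this embodiment each logic component has its own memory access interface, which gives the highest memory access bandwidth. Accesses to different units of the memory array can proceed simultaneously; writes to a shared region of the memory array that target the same unit conflict and require arbitration and time-shared access. Specifically, when both memory control units control all memory units of the first memory array component 2, simultaneous accesses to the same memory unit must be time-shared; when the two memory control units control different memory units, no time sharing is needed.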
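Assuming each memory unit serves at most one access per cycle, the conflict rule above can be sketched as a small scheduler: accesses to different units go out in the same cycle, while a second access to an already-busy unit is deferred. The controller names and the fixed first-come priority are hypothetical; the application leaves the arbitration policy open.

```python
from typing import List, Tuple

Access = Tuple[str, int]  # (memory control unit name, memory unit index)

def schedule(requests: List[Access]) -> List[List[Access]]:
    """Group same-cycle requests into cycles, at most one access per memory unit per cycle.

    Earlier entries win ties (fixed priority); deferred requests keep their order.
    """
    cycles: List[List[Access]] = []
    pending = list(requests)
    while pending:
        busy = set()
        granted: List[Access] = []
        deferred: List[Access] = []
        for ctrl, unit in pending:
            if unit in busy:
                deferred.append((ctrl, unit))  # conflict: unit already granted this cycle
            else:
                busy.add(unit)
                granted.append((ctrl, unit))
        cycles.append(granted)
        pending = deferred
    return cycles

print(schedule([("ctrl_1", 0), ("ctrl_2", 0)]))  # → [[('ctrl_1', 0)], [('ctrl_2', 0)]]
print(schedule([("ctrl_1", 0), ("ctrl_2", 5)]))  # → [[('ctrl_1', 0), ('ctrl_2', 5)]]
```

The first call shows time-shared access to one unit; the second shows the concurrent case with no arbitration penalty.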
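In this embodiment, the second memory control unit is located on or near the second interface module 31, and the first memory control unit on or near the first interface module 11. Based on the first logic signal, the first memory control unit lets the first programmable gate array component 1 access part of the memory units of the first memory array component 2; based on the second logic signal, the second memory control unit lets the second programmable gate array component 3 access the remaining memory units. The memory units accessed by the second programmable gate array component 3 do not overlap those accessed by the first programmable gate array component 1. The first programmable logic unit, through the first memory control unit, and the second programmable logic unit, through the second memory control unit, access their own different memory units on the first memory array component 2 independently and simultaneously.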
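In this embodiment each logic component has its own memory access interface, which gives the highest memory access bandwidth, and the first memory array component 2 is partitioned among the pairings of programmable logic unit and memory control unit. Different programmable logic units thus access memory concurrently, without the efficiency loss caused by arbitration and time-shared access.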
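The partitioning can be sketched as a disjoint split of memory-unit indices between the two controllers. The controller names follow FIG. 6 (H20, I29); the contiguous split point and the helper names are assumptions made for illustration.

```python
from typing import Dict

def partition_units(num_units: int, split: int) -> Dict[str, range]:
    """Give units [0, split) to the first controller and [split, num_units) to the second.

    Controller names follow FIG. 6; the contiguous split is a hypothetical choice.
    """
    if not 0 < split < num_units:
        raise ValueError("split must leave at least one unit for each controller")
    return {"H20": range(0, split), "I29": range(split, num_units)}

def owner(unit: int, partition: Dict[str, range]) -> str:
    """Return the controller responsible for a memory unit."""
    for ctrl, units in partition.items():
        if unit in units:
            return ctrl
    raise KeyError(unit)

part = partition_units(num_units=8, split=5)
assert not set(part["H20"]) & set(part["I29"])  # disjoint, so no arbitration is needed
print(owner(2, part), owner(6, part))  # → H20 I29
```

Because the two ranges never intersect, any pair of simultaneous accesses necessarily targets different units, which is the condition under which both controllers can proceed concurrently.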
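Specifically, referring to FIG. 6, the first memory array component 2 includes a memory unit G13 provided with two second bonding lead-out regions, G14 and G12. The second bonding lead-out region G14 is connected to the first bonding lead-out region H19 on the first interface module H17 of the first programmable gate array component 1. A first memory control unit H20 is located on the first interface module H17 of the first programmable gate array component 1 and controls access by the first programmable gate array component 1 to the first memory array component 2; it is connected to the first bonding lead-out region H19. A first programmable logic unit H23 on the first programmable gate array component 1 is connected to the first memory control unit H20 through the interface routing unit H22. When the first programmable gate array component 1 accesses the first memory array component 2, the first programmable logic unit H23 sends a first logic signal to the first memory control unit H20, which, based on that signal, lets the first programmable gate array component 1 access part of the memory unit G13 of the first memory array component 2 through the first bonding lead-out region H19 and the second bonding lead-out region G14.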
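In addition, the second bonding lead-out region G12 is connected to a first bonding lead-out region H18 on the first interface module H17, and H18 is connected to the third bonding lead-out region I28 on the second programmable gate array component 3. The second programmable gate array component 3 further includes a second programmable logic unit I32, connected through the interface routing unit I31 to a second memory control unit I29 located on the second interface module I27 of the second programmable gate array component 3. When the second programmable gate array component 3 accesses the first memory array component 2, the second programmable logic unit I32 sends a second logic signal to the second memory control unit I29, which, based on that signal, lets the second programmable gate array component 3 access the remaining part of the memory unit G13 of the first memory array component 2 through the third bonding lead-out region I28, the first bonding lead-out region H18, and the second bonding lead-out region G12.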
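The connections shown in FIG. 6 give the first programmable gate array component 1 and the second programmable gate array component 3 independent memory access to the first memory array component 2. It will be understood that there may also be three or four layers of programmable gate array components; the number is not limited.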
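It should be noted that the first programmable gate array component 1 and the second programmable gate array component 3 of the present application may be FPGAs (field-programmable gate arrays) or eFPGAs (embedded field-programmable gate arrays). In a preferred embodiment, the first programmable gate array component 1 and the second programmable gate array component 3 are FPGAs or eFPGAs.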
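In the stacked chip of this embodiment, memory access by the second programmable gate array component 3 to the first memory array component 2 does not pass through IO interfaces and/or IO circuits, so the interconnect distance is shorter, the distributed parasitics are lower, and memory access power is significantly reduced. During manufacturing, the second programmable gate array component 3 and the first programmable gate array component 1 can be produced together, bonded to each other, and then bonded to the first memory array component 2, which reduces process complexity and cost. However, memory access by the second programmable gate array component 3 to the first memory array component 2 must pass through both the first interface module 11 and the second interface module 31, which costs a small amount of area.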
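The present application further proposes an embodiment in which multiple programmable gate array components and at least one memory array component achieve hybrid memory access by combining the methods of FIG. 5 and FIG. 6, that is, shared and independent memory control units. In the same stacked chip, programmable logic units in some regions access memory through a shared memory control unit as in FIG. 5, while programmable logic units in other regions use independent memory control units as in FIG. 6.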
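The present application further proposes an embodiment in which the second programmable gate array component 3 is located on the side of the first memory array component 2 away from the first programmable gate array component 1, so that the first memory array component 2 lies between the second programmable gate array component 3 and the first programmable gate array component 1. The first memory array component 2 includes a fourth bonding lead-out region, which forms a three-dimensional heterogeneous integration interconnect with the third bonding lead-out region. In this embodiment, both the second programmable gate array component 3 and the first programmable gate array component 1 interconnect directly with the first memory array component 2, which increases programmable processing density and allows greater memory access bandwidth.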
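In this embodiment, memory access by the first programmable gate array component 1 to the first memory array component 2 passes only through the first interface module 11, and memory access by the second programmable gate array component 3 to the first memory array component 2 passes only through the second interface module 31. This structure shortens the interconnect distance between the second programmable gate array component 3 and the first memory array component 2, further reducing memory access power. However, manufacturing this structure requires bonding the second programmable gate array component 3 to the first memory array component 2 first and then bonding the result to the first programmable gate array component 1.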
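FIG. 7 is a schematic structural diagram of a third embodiment of the stacked chip of the present invention. Compared with the first embodiment shown in FIG. 1, the difference is that the stacked chip of this embodiment further includes a second memory array component 4, located on the side of the first memory array component 2 away from the first programmable gate array component 1 and provided with a third bonding lead-out region 41. In this embodiment, the first memory array component 2 further includes a fourth bonding lead-out region 12, and the third bonding lead-out region 41 and the fourth bonding lead-out region 12 form a three-dimensional heterogeneous integration interconnect.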
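In this embodiment, integrating more memory array components increases storage density and provides greater memory access bandwidth. Moreover, multiple memory array components can be produced and tested together as a standard product before being integrated with the logic component, which helps reduce cost.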
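In an embodiment, the first programmable gate array component 1 uses one shared memory control unit to access both the first memory array component 2 and the second memory array component 4. To avoid access conflicts, the memory control unit selects, in a time-shared manner, whether the first programmable gate array component 1 accesses the first memory array component 2 or the second memory array component 4.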
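Referring to FIG. 8, in this embodiment the stacked chip further includes a memory control unit H21 located on the first interface module H17. The first interface module H17 includes two first bonding lead-out regions, H19 and H18. The first memory array component 2 is provided with multiple memory units G13, each with two second bonding lead-out regions, G12 and G14. The second memory array component 4 is provided with multiple memory units F01, on which a third bonding lead-out region I28 is located.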
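Specifically, the first bonding lead-out region H18 is connected to the second bonding lead-out region G14, and the memory control unit H21 is connected to H18. The memory control unit H21 can thus control access by the first programmable gate array component 1 to the first memory array component 2 through the first bonding lead-out region H18 and the second bonding lead-out region G14.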
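The first bonding lead-out region H19 is connected to the second bonding lead-out region G12, which is connected to the third bonding lead-out region I28. The memory control unit H21 can thus control access by the first programmable gate array component 1 to the second memory array component 4 through the first bonding lead-out region H19, the second bonding lead-out region G12, and the third bonding lead-out region I28. Note that the second bonding lead-out region G12 is not connected to the memory unit G13.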
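In this embodiment, the first programmable gate array component 1 further includes a programmable logic unit K23, which is connected to the memory control unit H21 through the interface routing unit H22 and outputs a logic signal. Based on the logic signal, the memory control unit H21 lets the first programmable gate array component 1 access either the first memory array component 2 or the second memory array component 4, selectively and in a time-shared manner: at a first time it controls access to the first memory array component 2, and at a second time access to the second memory array component 4.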
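The shared-controller arrangement of FIG. 8 can be sketched as an address decoder in front of two fixed signal paths, serving one access per cycle. The flat address map (low addresses to the first memory array component 2, the rest to the second memory array component 4), the helper names, and the one-access-per-cycle model are assumptions; the application does not say how H21 maps addresses.

```python
from typing import List, Tuple

# Signal paths read off FIG. 8 (descriptive labels only).
PATH_TO_ARRAY_2 = ["H21", "H18", "G14", "G13"]
PATH_TO_ARRAY_4 = ["H21", "H19", "G12", "I28", "F01"]  # G12 only passes the signal through

def route(address: int, array2_words: int) -> Tuple[str, int, List[str]]:
    """Hypothetical flat address map: low addresses -> array 2, the rest -> array 4."""
    if address < 0:
        raise ValueError("negative address")
    if address < array2_words:
        return "array_2", address, PATH_TO_ARRAY_2
    return "array_4", address - array2_words, PATH_TO_ARRAY_4

def serve_time_shared(addresses: List[int], array2_words: int) -> List[Tuple[int, str, int]]:
    """One shared controller: one access per cycle, whichever array it targets."""
    return [(cycle, *route(a, array2_words)[:2]) for cycle, a in enumerate(addresses)]

print(serve_time_shared([10, 1030, 11], array2_words=1024))
# → [(0, 'array_2', 10), (1, 'array_4', 6), (2, 'array_2', 11)]
```

Each access occupies its own cycle regardless of which array it targets, which is the cost of sharing H21 between the two memory array components.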
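In an embodiment, the first programmable gate array component 1 uses two different memory control units to access the first memory array component 2 and the second memory array component 4 respectively. Since there is then no access conflict, the first programmable gate array component 1 can access the first memory array component 2 and the second memory array component 4 at the same time: the first memory control unit controls access to the first memory array component 2, and the second memory control unit controls access to the second memory array component 4.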
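Referring to FIG. 9, in this embodiment the stacked chip further includes a first memory control unit H20 and a second memory control unit I29, both located on the first interface module H17. The first interface module H17 includes two first bonding lead-out regions, H19 and H18. The first memory array component 2 is provided with multiple memory units G13, each with two second bonding lead-out regions, G12 and G14. The second memory array component 4 is provided with multiple memory units F01, on which a third bonding lead-out region I28 is located.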
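In this embodiment, the first memory control unit H20 is connected to the first bonding lead-out region H18, which is connected to the second bonding lead-out region G14. The first memory control unit H20 can thus control access by the first programmable gate array component 1 to the first memory array component 2 through the first bonding lead-out region H18 and the second bonding lead-out region G14.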
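Further, the second memory control unit I29 is connected to the first bonding lead-out region H19, which is connected to the second bonding lead-out region G12, which in turn is connected to the third bonding lead-out region I28. The second memory control unit I29 can thus control access by the first programmable gate array component 1 to the second memory array component 4 through the first bonding lead-out region H19, the second bonding lead-out region G12, and the third bonding lead-out region I28. Note that the second bonding lead-out region G12 is not connected to the memory unit G13.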
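In this embodiment, the first programmable gate array component 1 further includes a programmable logic unit K23, which is connected through the interface routing unit H22 to both the first memory control unit H20 and the second memory control unit I29 and outputs a logic signal. Based on the logic signal, the first memory control unit H20 controls access by the first programmable gate array component 1 to the first memory array component 2 while, at the same time, the second memory control unit I29 controls its access to the second memory array component 4.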
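Under a simple model of one access per controller per cycle (an assumption that ignores latency, refresh, and bank effects), the difference between one shared controller and one controller per memory array component reduces to counting cycles for a trace of accesses. The trace and helper names are hypothetical.

```python
from collections import Counter
from typing import List

def cycles_shared(targets: List[str]) -> int:
    """One controller (FIG. 8 style): every access takes its own cycle."""
    return len(targets)

def cycles_independent(targets: List[str]) -> int:
    """One controller per array (FIG. 9 style): arrays proceed in parallel,
    so the busiest array sets the cycle count."""
    return max(Counter(targets).values(), default=0)

trace = ["array_2", "array_4", "array_2", "array_4", "array_2"]
print(cycles_shared(trace), cycles_independent(trace))  # → 5 3
```

The independent arrangement approaches a 2x speed-up when traffic is evenly split between the two arrays and gains nothing when all traffic targets one array.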
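The present application further proposes an embodiment in which multiple memory array components and at least one programmable gate array component achieve hybrid memory access by combining the methods of FIG. 8 and FIG. 9, that is, shared and independent memory control units. In the same stacked chip, programmable logic units in some regions access memory through a shared memory control unit as in FIG. 8, while programmable logic units in other regions access memory through independent memory control units as in FIG. 9.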
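In another embodiment, as shown in FIG. 10, the second memory array component 4 may instead be located on the side of the first programmable gate array component 1 away from the first memory array component 2. In this embodiment, the first interface module 11 further includes a fourth bonding lead-out region 12, and the third bonding lead-out region 41 and the fourth bonding lead-out region 12 form a three-dimensional heterogeneous integration interconnect.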
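In this embodiment, integrating more memory array components increases storage density. Moreover, because both the first memory array component 2 and the second memory array component 4 connect directly to the first programmable gate array component 1, fewer relay hops through three-dimensional heterogeneous integration are needed: the interconnect and memory access distances are shorter, parasitics are smaller, and memory access frequency and power are optimal.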
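In an embodiment, the first programmable gate array component 1 uses one shared memory control unit to access both the first memory array component 2 and the second memory array component 4. To avoid access conflicts, the memory control unit selects, in a time-shared manner, whether the first programmable gate array component 1 accesses the first memory array component 2 or the second memory array component 4.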
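Referring to FIG. 11, in this embodiment the stacked chip further includes a memory control unit H21 located on the first interface module H17. The first interface module H17 includes two first bonding lead-out regions, H19 and H18. The first memory array component 2 is provided with multiple memory units G13, each with a second bonding lead-out region G14. The second memory array component 4 is provided with multiple memory units F01, on which a third bonding lead-out region I28 is located.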
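Specifically, the first bonding lead-out region H18 is connected to the second bonding lead-out region G14, and the memory control unit H21 is connected to H18. The memory control unit H21 can thus control access by the first programmable gate array component 1 to the first memory array component 2 through the first bonding lead-out region H18 and the second bonding lead-out region G14.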
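The memory control unit H21 is also connected to the first bonding lead-out region H19, which is connected to the third bonding lead-out region I28. The memory control unit H21 can thus control access by the first programmable gate array component 1 to the second memory array component 4 through the first bonding lead-out region H19 and the third bonding lead-out region I28.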
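In this embodiment, the first programmable gate array component 1 further includes a programmable logic unit K23, which is connected to the memory control unit H21 through the interface routing unit H22 and outputs a logic signal. Based on the logic signal, the memory control unit H21 lets the first programmable gate array component 1 access either the first memory array component 2 or the second memory array component 4, selectively and in a time-shared manner: at a first time it controls access to the first memory array component 2, and at a second time access to the second memory array component 4.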
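In an embodiment, the first programmable gate array component 1 uses two different memory control units to access the first memory array component 2 and the second memory array component 4 respectively. Since there is then no access conflict, the first programmable gate array component 1 can access the first memory array component 2 and the second memory array component 4 at the same time: the first memory control unit controls access to the first memory array component 2, and the second memory control unit controls access to the second memory array component 4.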
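Referring to FIG. 12, in this embodiment the stacked chip further includes a first memory control unit H20 and a second memory control unit I29, both located on the first interface module H17. The first interface module H17 includes two first bonding lead-out regions, H19 and H18. The first memory array component 2 is provided with multiple memory units G13, each with a second bonding lead-out region G14. The second memory array component 4 is provided with multiple memory units F01, on which a third bonding lead-out region I28 is located.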
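In this embodiment, the first memory control unit H20 is connected to the first bonding lead-out region H18, which is connected to the second bonding lead-out region G14. The first memory control unit H20 can thus control access by the first programmable gate array component 1 to the first memory array component 2 through the first bonding lead-out region H18 and the second bonding lead-out region G14.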
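Further, the second memory control unit I29 is connected to the first bonding lead-out region H19, which is connected to the third bonding lead-out region I28. The second memory control unit I29 can thus control access by the first programmable gate array component 1 to the second memory array component 4 through the first bonding lead-out region H19 and the third bonding lead-out region I28.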
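In this embodiment, the first programmable gate array component 1 further includes a programmable logic unit K23, which is connected through the interface routing unit H22 to both the first memory control unit H20 and the second memory control unit I29 and outputs a logic signal. Based on the logic signal, the first memory control unit H20 controls access by the first programmable gate array component 1 to the first memory array component 2 while, at the same time, the second memory control unit I29 controls its access to the second memory array component 4.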
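The present application further proposes an embodiment in which multiple memory array components and at least one programmable gate array component achieve hybrid memory access by combining the methods of FIG. 11 and FIG. 12, that is, shared and independent memory control units. In the same stacked chip, programmable logic units in some regions access memory through a shared memory control unit as in FIG. 11, while programmable logic units in other regions access memory through independent memory control units as in FIG. 12.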
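In the present application, a memory array component may consist of multiple chip layers combined by three-dimensional heterogeneous integration bonding. An ASIC array component may contain any one or any combination of multiply-add arrays, multiplier arrays, systolic processor arrays, hash arrays, various encoder arrays, dedicated machine-learning layer arrays, search arrays, image/video processing arrays, and hard-core CPUs and MCUs, and is used together with programmable gate array components to increase the processing density of the stacked chip.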
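Specifically, a component may be at least one of a die (or chip) and a wafer, but is not limited to these and may be any substitute conceivable to those skilled in the art. A wafer is the silicon slice on which silicon semiconductor circuits are fabricated; a chip or die is a piece of silicon obtained by dicing such a wafer. For example, the memory array component of the present application may be a memory array die (DRAM die or DRAM chip) or a memory array wafer (DRAM wafer).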
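Based on the same inventive concept as the method, an embodiment of the present invention further provides a three-dimensional heterogeneously integrated stacked chip structure. The stacked chip contains hierarchically stacked components, which may be any of the components described above, interconnected by three-dimensional heterogeneous integration. The stacked chip may also be fabricated, and three-dimensionally heterogeneously integrated, directly at wafer level.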
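The stacked chip may also be fabricated partly at wafer level, in one of two ways: (1) some wafer layers are first integrated by three-dimensional heterogeneous integration into an intermediate product, and the remaining wafer layers are then integrated with the intermediate product iteratively until fabrication is complete; or (2) some wafer layers are first integrated into an intermediate product, which is then diced into dies and integrated die-to-die with dies of the other components to complete fabrication.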
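Specifically, a stacked chip consisting of multiple programmable gate array layers and at least one memory array layer, as described for FIG. 4, can be fabricated in two ways. In the first, the programmable gate array layers are integrated at wafer level into an intermediate product, which increases interconnect density, and this intermediate product is then integrated with the intermediate product formed from the at least one memory array layer to obtain the stacked chip. In the second, the programmable gate array layers are integrated at wafer level into an intermediate product, which is diced and tested; the resulting dies are then integrated die-to-die with the diced and tested intermediate products formed from the at least one memory array layer. Because the finished product is assembled from diced and tested components, yield is significantly improved.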
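Likewise, a stacked chip consisting of multiple memory array layers and at least one programmable gate array layer, as described for FIG. 7, can be fabricated in two ways. In the first, the memory array layers are integrated at wafer level into an intermediate product, which increases interconnect density, and this intermediate product is then integrated with the intermediate product formed from the at least one programmable gate array layer to obtain the stacked chip. In the second, the memory array layers are integrated at wafer level into an intermediate product, which is diced and tested; the resulting dies are then integrated die-to-die with the diced and tested intermediate products formed from the at least one programmable gate array layer. Because the finished product is assembled from diced and tested components, yield is significantly improved.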
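The number and order of programmable gate array and memory array layers in the stacked chip depend on a complex trade-off among application scenario, engineering requirements, production cost, and production yield, and there is no single optimal result. Target products with different layer counts and orders require diverse fabrication processes and differ significantly in the design and reuse of the memory controller.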
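For the extensive interconnection between programmable functional modules and the programmable routing network in the programmable gate array component, see FIG. 13. The programmable gate array component is an extension of field-programmable gate array (FPGA) or embedded field-programmable gate array (eFPGA) technology and includes programmable logic blocks 11A and a programmable routing network 11B (interconnect). The programmable logic blocks 11A are interconnected through the programmable routing network 11B and configured as a number of programmable functional modules, and at least part of the programmable routing network 11B can be extended to the interface routing unit and then, through three-dimensional heterogeneous integration, interconnected across layers with large-capacity memory arrays, providing large-capacity, high-bandwidth, programmable memory access.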
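Three-dimensional heterogeneous integration is a technique for interconnecting and bonding stacked chips, for example by a hybrid bonding process. On top of already-fabricated chips (for example programmable gate array components or memory array components), a three-dimensional heterogeneous integration bonding layer is manufactured in back-end-of-line (BEOL) processing, providing high-density signal interconnection between the chips and yielding the stacked chip.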
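FIG. 14 serves as an example. In FIG. 14, the stacked chip includes functional components 210, 220, and 230, each of which may be a programmable gate array component and/or a memory array component. Each contains a top metal layer, internal metal layers, an active layer, and a substrate: the top and internal metal layers carry signal interconnects within the component, the active layer implements the transistors that make up the module functions, and the substrate protects the modules and provides mechanical support. Functional components 210 and 220 each have a three-dimensional heterogeneous integration bonding layer, manufactured by BEOL processing on the side near the top metal layer, and these are interconnected to form a face-to-face interconnect structure. Functional component 220, on the side near its substrate, and functional component 230, on the side near its top metal layer, likewise have bonding layers manufactured by BEOL processing and interconnected, forming a back-to-face (or face-to-back) interconnect structure. Cross-component signal interconnects can be established arbitrarily among functional components 210, 220, and 230 by three-dimensional heterogeneous integration. Two interconnect techniques apply, depending on whether the core voltages of functional components 210, 220, and 230 are the same.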
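When functional components 210 and 230 have the same core voltage, consider functional circuit 1 in functional component 210 needing a cross-component interconnect to functional circuit 10 in functional component 230. The signal that functional circuit 1 brings out on the internal metal layers of functional component 210 passes through the top metal of 210 to the face-to-face bonding structure between 210 and 220, and thence to the top metal of 220. It then travels through the internal metal layers of 220 and through through-silicon vias (TSVs) that penetrate the active layer and thinned substrate of 220 to the back-to-face bonding structure between 220 and 230, and thence to the top metal layer of 230. Finally, it passes through the internal metal layers of 230 to reach functional circuit 10 in functional component 230.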
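When functional components 210 and 230 have different core voltages, consider functional circuit 2 in functional component 210 needing a cross-component interconnect to functional circuit 10 in functional component 230. A level-shifting circuit 2 is designed in functional component 210 and interconnected there with functional circuit 2. After level-shifting circuit 2 converts the signal of functional circuit 2 to match the core voltage of functional component 230, the signal is routed across components to functional circuit 10 in functional component 230 by the method described above. Level-shifting circuit 2 may also be moved, through three-dimensional heterogeneous integration interconnects, into functional component 230 or functional component 220.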
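The routing of FIG. 14 can be sketched as an ordered list of the structures a signal traverses, with a level shifter inserted only when the source and destination core voltages differ. The voltages, the `Component` type, and `plan_route` are hypothetical; the sketch records traversal order, not electrical behaviour.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Component:
    name: str
    core_voltage: float  # volts

def plan_route(src: Component, mid: Component, dst: Component) -> List[str]:
    """Hops for a signal from src to dst through mid in the FIG. 14 stack:
    face-to-face bond src/mid, TSV through mid, back-to-face bond mid/dst."""
    hops = [f"{src.name} internal metal"]
    if src.core_voltage != dst.core_voltage:
        # Shifter placed in the source component here; it could also sit in mid or dst.
        hops.append(f"{src.name} level shifter {src.core_voltage} V -> {dst.core_voltage} V")
    hops += [
        f"{src.name} top metal",
        f"F2F bond {src.name}/{mid.name}",
        f"{mid.name} top metal",
        f"{mid.name} internal metal",
        f"{mid.name} TSV",
        f"B2F bond {mid.name}/{dst.name}",
        f"{dst.name} top metal",
        f"{dst.name} internal metal",
    ]
    return hops

c210, c220 = Component("210", 0.8), Component("220", 0.8)
print(plan_route(c210, c220, Component("230", 0.8)))  # same voltage: no shifter
print(plan_route(c210, c220, Component("230", 1.1)))  # different voltage: shifter added
```

Placing the shifter in the source component mirrors the first option described above; moving it into 220 or 230 would only change where the extra hop appears in the list.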
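In the stacked chip provided by the present application, memory access by the programmable gate array components and ASIC array components to the memory array components does not pass through IO interfaces and/or IO circuits, so the interconnect distance is shorter and memory access power is significantly reduced. Three-dimensional heterogeneous integration bonding yields a high-bandwidth, low-power structure that integrates programmable logic and memory.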
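The above are merely embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.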

Claims (20)

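  1. A stacked chip, characterized by comprising:
    a first programmable gate array component, the first programmable gate array component comprising a first interface module embedded in the first programmable gate array component, the first interface module comprising a first bonding lead-out region;
    a first memory array component provided with a second bonding lead-out region;
    wherein the first bonding lead-out region and the second bonding lead-out region are bonded so as to connect interconnect signals on the first programmable gate array component and on the first memory array component together.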
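  2. The stacked chip according to claim 1, wherein the first programmable gate array component comprises a plurality of functional modules,
    at least one first interface module is provided, and the first interface module is located between the plurality of functional modules and is connected to the functional modules through an interface routing unit.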
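  3. The stacked chip according to claim 2, wherein the functional modules are internally strip-shaped, and the first interface module is laid out extending along the layout of the strip-shaped functional modules.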
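  4. The stacked chip according to claim 2, wherein the functional modules are connected to an interface routing unit through internal metal layers, and the first interface module is interconnected with the interface routing unit through internal metal layers.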
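  5. The stacked chip according to claim 4, wherein the first programmable gate array component comprises a programmable routing network, and the plurality of functional modules are interconnected with the programmable routing network through internal metal layers and are connected to the interface routing unit through the programmable routing network.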
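  6. The stacked chip according to claim 1, further comprising:
    a physical layer configured to perform level conversion between the first programmable gate array component and the first memory array component;
    the physical layer being located on the first interface module.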
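  7. The stacked chip according to claim 2, wherein the functional modules comprise any one or any combination of: programmable logic blocks LAB (Logic Array Block)/CLB (Configurable Logic Block), memory blocks BRAM (Block Random Access Memory), multiplication units DSP (Digital Signal Processor), and multiply-accumulate units MAC (Multiply Accumulate).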
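  8. The stacked chip according to claim 7, wherein the functional modules further comprise:
    a combination of ASIC array units, an ASIC array unit being a hardened circuit for performing a fixed computing objective.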
  9. The stacked chip according to claim 7, wherein the memory blocks are connected to the programmable logic blocks through a memory routing unit.
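  10. The stacked chip according to claim 1, wherein the first programmable gate array component comprises a field-programmable gate array (Field-Programmable Gate Array, FPGA) or an embedded field-programmable gate array (Embedded Field-Programmable Gate Array, eFPGA).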
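  11. The stacked chip according to claim 1, further comprising:
    a memory control unit located on the first interface module; or,
    the memory control unit located on the first programmable gate array component near the first interface module; or, the memory control unit located on the first memory array component;
    wherein the memory control unit controls memory access by the first programmable gate array component to the first memory array component.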
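  12. The stacked chip according to claim 1, further comprising:
    a second memory array component located on the side of the first programmable gate array component away from the first memory array component;
    the second memory array component being provided with a third bonding lead-out region;
    wherein the first interface module comprises a fourth bonding lead-out region, and the first programmable gate array component and the second memory array component are bonded through the third bonding lead-out region and the fourth bonding lead-out region.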
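  13. The stacked chip according to claim 1, further comprising:
    a second memory array component located on the side of the first memory array component away from the first programmable gate array component;
    the second memory array component being provided with a third bonding lead-out region;
    wherein the first memory array component comprises a fourth bonding lead-out region, and the first memory array component and the second memory array component are bonded through the fourth bonding lead-out region and the third bonding lead-out region.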
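  14. The stacked chip according to claim 12, further comprising:
    a memory control unit located on the first interface module;
    wherein the memory control unit controls access by the first programmable gate array component to the first memory array component and to the second memory array component.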
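  15. The stacked chip according to claim 14, wherein the first programmable gate array component further comprises:
    a programmable logic unit connected to the memory control unit, the programmable logic unit outputting a logic signal;
    wherein the memory control unit, based on the logic signal, selectively and in a time-shared manner controls the first programmable gate array component to access the first memory array component or to access the second memory array component.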
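  16. The stacked chip according to claim 12, further comprising:
    a first memory control unit located on the first interface module;
    a second memory control unit located on the first interface module;
    wherein the first memory control unit controls access by the first programmable gate array component to the first memory array component, and the second memory control unit controls access by the first programmable gate array component to the second memory array component.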
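  17. The stacked chip according to claim 16, wherein the first programmable gate array component further comprises:
    a programmable logic unit connected to the first memory control unit and the second memory control unit, the programmable logic unit outputting a logic signal;
    wherein the first memory control unit, based on the logic signal, controls access by the first programmable gate array component to the first memory array component, and the second memory control unit, simultaneously and based on the logic signal, controls access by the first programmable gate array component to the second memory array component.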
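  18. The stacked chip according to claim 1, further comprising:
    a second programmable gate array component located on the side of the first programmable gate array component away from the first memory array component;
    the second programmable gate array component comprising a second interface module, the second interface module comprising a third bonding lead-out region, the first interface module comprising a fourth bonding lead-out region, and the third bonding lead-out region being bonded to the fourth bonding lead-out region so as to bond the second programmable gate array component to the first programmable gate array component;
    wherein the first programmable gate array component and the second programmable gate array component share one memory control unit to access the same memory unit of the first memory array component; or
    the first programmable gate array component and the second programmable gate array component each use an independent memory control unit to access different memory units of the first memory array component.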
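  19. The stacked chip according to claim 1, further comprising:
    a second programmable gate array component located on the side of the first memory array component away from the first programmable gate array component;
    the second programmable gate array component comprising a second interface module, the second interface module comprising a third bonding lead-out region, the first memory array component comprising a fourth bonding lead-out region, and the third bonding lead-out region being bonded to the fourth bonding lead-out region so as to bond the second programmable gate array component to the first memory array component;
    wherein the first programmable gate array component and the second programmable gate array component share one memory control unit to access the same memory unit of the first memory array component; or
    the first programmable gate array component and the second programmable gate array component each use an independent memory control unit to access different memory units of the first memory array component.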
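  20. The stacked chip according to claim 1, wherein the first programmable gate array component comprises programmable logic blocks and a programmable routing network;
    the programmable logic blocks are interconnected through the programmable routing network and thereby configured as a number of programmable functional modules, and at least part of the programmable routing network is extendable to an interface routing unit.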
PCT/CN2022/113699 2021-09-02 2022-08-19 Stacked chip WO2023030051A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111028371.7 2021-09-02
CN202111028371.7A CN113626374A (zh) 2021-09-02 2021-09-02 Stacked chip

Publications (1)

Publication Number Publication Date
WO2023030051A1 true WO2023030051A1 (zh) 2023-03-09

Family

ID=78388996

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/113699 WO2023030051A1 (zh) 2021-09-02 2022-08-19 Stacked chip

Country Status (2)

Country Link
CN (1) CN113626374A (zh)
WO (1) WO2023030051A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116828866A (zh) * 2023-06-07 2023-09-29 阿里巴巴达摩院(杭州)科技有限公司 Integrated circuit component, processor, and system on chip

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN113626374A (zh) * 2021-09-02 2021-11-09 西安紫光国芯半导体有限公司 Stacked chip
CN116246963A (zh) * 2023-01-31 2023-06-09 北京清微智能科技有限公司 Reconfigurable 3D chip and integration method thereof

Citations (5)

Publication number Priority date Publication date Assignee Title
US9374094B1 (en) * 2014-08-27 2016-06-21 Altera Corporation 3D field programmable gate array system with reset manufacture and method of manufacture thereof
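CN110192269A (zh) * 2019-04-15 2019-08-30 长江存储科技有限责任公司 Integration of three-dimensional NAND memory devices with multiple functional chips
CN110870062A (zh) * 2019-04-30 2020-03-06 长江存储科技有限责任公司 Bonded semiconductor devices having programmable logic device and NAND flash memory and methods for forming the same
CN111727503A (zh) * 2019-04-15 2020-09-29 长江存储科技有限责任公司 Unified semiconductor devices having programmable logic device and heterogeneous memories and methods for forming the same
CN113626374A (zh) * 2021-09-02 2021-11-09 西安紫光国芯半导体有限公司 Stacked chip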

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
JP3918818B2 (ja) * 2004-02-16 2007-05-23 ソニー株式会社 Semiconductor device
US11244738B2 (en) * 2019-01-11 2022-02-08 Samsung Electronics Co., Ltd. Multi-chip package
US10797037B1 (en) * 2019-07-15 2020-10-06 Xilinx, Inc. Integrated circuit device having a plurality of stacked dies
WO2021159028A1 (en) * 2020-02-07 2021-08-12 Sunrise Memory Corporation High capacity memory circuit with low effective latency
CN111564429A (zh) * 2020-04-29 2020-08-21 北京大学深圳研究生院 Integrated-circuit three-dimensional heterogeneous integrated chip and packaging method
CN216118778U (zh) * 2021-09-02 2022-03-22 西安紫光国芯半导体有限公司 Stacked chip

Also Published As

Publication number Publication date
CN113626374A (zh) 2021-11-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22863190

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22863190

Country of ref document: EP

Kind code of ref document: A1