CN117581366A - 3D semiconductor devices and structures - Google Patents

3D semiconductor devices and structures

Info

Publication number
CN117581366A
Authority
CN
China
Prior art keywords
level
transistor
circuit
ecus
memory
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180096166.6A
Other languages
Chinese (zh)
Inventor
Zvi Or-Bach (兹维·奥尔巴赫)
Jin-Woo Han (韩金宇)
Brian Cronquist (布莱恩·克洛奎斯特)
Aaron Carpenter (亚伦·卡彭特)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
3d Monocrystalline
Original Assignee
3d Monocrystalline
Application filed by 3d Monocrystalline
Priority claimed from PCT/US2021/044110 (WO2022159141A1)
Publication of CN117581366A


Landscapes

  • Semiconductor Integrated Circuits (AREA)

Abstract

The invention relates to a 3D device, the device comprising: a first level including a first transistor, the first level including a first interconnect; a second level including a second transistor, the second level overlying the first level; a third level including a third transistor, the third level overlying the second level; and a plurality of Electronic Circuit Units (ECUs), wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus, and wherein each of the ECUs comprises at least one high resistivity trap rich layer.

Description

3D semiconductor devices and structures

Technical field

The present application relates to the general field of integrated circuit (IC) devices and fabrication methods, and more specifically to multi-layer or three-dimensional integrated memory circuit (3D memory) and three-dimensional integrated logic circuit (3D logic) devices and fabrication methods.

Background art

Over the past 40 years, the functionality and performance of integrated circuits (ICs) have improved dramatically. This is largely due to the phenomenon of "scaling"; that is, with each successive technology generation, component dimensions within an IC, such as lateral and vertical dimensions, are reduced ("scaled"). There are two main classes of components in complementary metal oxide semiconductor (CMOS) ICs, namely transistors and wires. With "scaling", transistor performance and density typically improve, contributing to the previously mentioned increases in IC performance and functionality. However, the wires (interconnects) that connect the transistors together degrade in performance with "scaling". Today, wires dominate the performance, functionality, and power consumption of ICs.

3D stacking of semiconductor devices or chips is one avenue for tackling the wire issue. By arranging transistors in three dimensions instead of two (as was the case in the 1990s), the transistors in an IC can be placed closer to each other. This reduces wire lengths and keeps wiring delay low.
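As a rough, back-of-the-envelope illustration of why stacking shortens wires (a sketch under simplifying assumptions, not a method from this application): if a fixed total active area A is split evenly over k stacked levels, each level's square footprint has side sqrt(A/k), so the longest lateral corner-to-corner Manhattan route shrinks by a factor of sqrt(k). The function names and the 100 mm² example are hypothetical, and via/TSV area overhead and the length of the vertical hop are ignored.

```python
import math


def footprint_side_mm(total_area_mm2: float, levels: int) -> float:
    """Side of a square footprint when total_area_mm2 is split evenly over `levels` tiers."""
    if levels < 1:
        raise ValueError("levels must be >= 1")
    return math.sqrt(total_area_mm2 / levels)


def max_lateral_route_mm(total_area_mm2: float, levels: int) -> float:
    """Corner-to-corner Manhattan distance across one level (vertical hop ignored)."""
    return 2 * footprint_side_mm(total_area_mm2, levels)


if __name__ == "__main__":
    # Hypothetical 100 mm^2 of active area, spread over 1, 2, 4 and 8 levels.
    for k in (1, 2, 4, 8):
        print(f"{k} level(s): max lateral route = {max_lateral_route_mm(100.0, k):.2f} mm")
```

Under these assumptions, going from one level to four halves the worst-case lateral route (20 mm to 10 mm); real gains depend on partitioning, via pitch, and routing.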

There are many techniques for constructing 3D stacked integrated circuits or chips, including:

·Through-silicon via (TSV) technology: Multiple layers of chips are constructed separately. Thereafter, they can be bonded to each other and connected to each other with through-silicon vias (TSVs).

·Monolithic 3D technology: With this approach, multiple layers of transistors and wires can be built monolithically. Some monolithic 3D and 3D IC approaches are described in U.S. Patents 8,273,610, 8,298,875, 8,362,482, 8,378,715, 8,379,458, 8,450,804, 8,557,632, 8,574,929, 8,581,349, 8,642,416, 8,669,778, 8,674,470, 8,687,399, 8,742,476, 8,803,206, 8,836,073, 8,902,663, 8,994,404, 9,023,688, 9,029,173, 9,030,858, 9,117,749, 9,142,553, 9,219,005, 9,385,058, 9,406,670, 9,460,978, 9,509,313, 9,640,531, 9,691,760, 9,711,407, 9,721,927, 9,799,761, 9,871,034, 9,953,870, 9,953,994, 10,014,292, 10,014,318, 10,515,981, and 10,892,016; in pending U.S. patent application publications and applications 14/642,724, 15/150,395, 15/173,686, 16/337,665, 16/558,304, 16/649,660, 16/836,659, 17/151,867, 62/651,722, 62/681,249, 62/713,345, 62/770,751, 62/952,222, 62/824,288, 63/075,067, 63/091,307, 63/115,000, and 2020/0013791; and in PCT applications (and publications) PCT/US2010/052093, PCT/US2011/042071 (WO2012/015550), PCT/US2016/52726 (WO2017053329), PCT/US2017/052359 (WO2018/071143), PCT/US2018/016759 (WO2018144957), and PCT/US2018/52332 (WO 2019/060798). The contents of the above patents, publications, and applications are incorporated herein by reference in their entirety.

·Electro-optical: There is also work on monolithic 3D integration that includes layers of different crystal materials, such as U.S. Patents 8,283,215, 8,163,581, 8,753,913, 8,823,122, 9,197,804, 9,419,031, 9,941,319, 10,679,977, and 10,943,934. The contents of the above patents are incorporated herein by reference in their entirety.

In addition, U.S. Patent Application Publication 2018/0350823 and U.S. Patent Applications 62/963,166, 62/963,270, 62/983,559, 62/986,772, 63/108,433, 63/118,908, 63/123,464, 63/144,970, 63/151,664, and 17/151,867 are incorporated herein by reference in their entirety.

In addition, 3D technology according to some embodiments of the present invention may enable some very innovative alternatives for IC devices, with reduced development costs, novel and simpler process flows, increased yield, and other illustrative advantages.

Summary of the invention

The present invention relates to multilayer or three-dimensional integrated circuit (3D IC) devices and fabrication methods. An important aspect of 3D ICs is the technology that enables layer transfer. This includes technologies that support reuse of the donor wafer, and technologies that support fabricating active devices on the transferred layer so that they are transferred together with it.

Brief description of the drawings

Various embodiments of the present invention will be understood and appreciated more fully from at least the following detailed description, taken in conjunction with the drawings, in which:

Figure 1 is an example illustration of a 7nm 6T SRAM bit cell layout;

Figure 2 is an example illustration of a memory structure with memory cells arranged in a 2D repeating pattern;

Figures 3A-3D are example illustrations of various arrangements and customizations of the 2D repeating-pattern memory structure of Figure 2;

Figures 4A-4C are example cross-sectional illustrations of Figures 3B-3D, showing various cell-to-cell and intra-cell connections;

Figures 5A-5E are example illustrations of word-line pin/pad connection layouts and bit-line pin/pad connection layouts;

Figure 6 is an example illustration of extending the memory layout concept to a multi-level memory structure;

Figure 7A is an illustration of Figure 43E of U.S. Application 16/558,304;

Figure 7B is an example illustration of a memory unit including a memory and a memory controller;

Figure 7C is an example illustration of four of the memory units of Figure 7B formed into an array;

Figure 7D is an example illustration of a wafer-sized array of memory units;

Figures 7E-7G are example cross-sectional illustrations of a memory layer formation process, in which the memory layer may be stored and then bonded to other device structures to form a system;

Figure 8 is an example illustration of an overall process flow for designing logic and memory;

Figures 9A-9G are example illustrations of a 3D layer formation flow that may form a 3D computing device;

Figures 10A-10D are example illustrations of various power delivery substrate architectures for efficiently delivering power to multiple levels of active devices through heterogeneous integration;

Figure 10E is an example illustration of utilizing buck converters or voltage regulators, which may be distributed through the 3D system to each region;

Figure 10F is an example illustration of an example technique for providing an intentional stress-relief mechanism that may be integrated into a 3D system;

Figure 11 is an example table illustrating that wafer processing cost is highly dependent on the type of process line used;

Figures 12A-12B are example illustrations of a coupling level prepared for hybrid bonding to pin/pad structures on a circuit;

Figures 13A-13B are example illustrations of staged integration of various 3D systems and formation of various M levels;

Figures 14A-14E are example illustrations of various level integrations forming various types of 3D systems;

Figure 14F is an example illustration of various ESD protection functions connected to nano-TSVs and internal logic;

Figures 15A-15D are example illustrations of a DieM level as part of a 3D system with photonic X-Y connections;

Figure 15E is a copy of Figures 8 and 9 of Tam, Sai Wang, et al., "Wireline/wireless RF-Interconnect for future SoC," 2011 IEEE International Symposium on Radio-Frequency Integration Technology, IEEE, 2011;

Figure 15F is a schematic illustration of various structures and data of an on-chip transmission/interconnect network;

Figure 15G is a schematic illustration of an exemplary 3D system, similar to the system shown in Figure 14E, with an additional RF-M level;

Figure 15H is an example illustration of an alternative 3D system device and structure to the meandering structure shown in Figure 15F;

Figure 15I is an example block diagram of a per-unit TL processor in the base layer of the RF-M level, with DMA options;

Figure 15J is an example illustration of an alternative schematic depicting connections and logic for changing the data transfer direction;

Figure 15K is a simplified example illustration of a connection between two TLs;

Figure 15L shows a modified block diagram of Figure 15I, in which four units are aggregated to communicate with the RF-I fabric;

Figure 15M shows an example of an oversized reticle and the use of different field sizes for the TL and the (processor) circuits;

Figure 15N is an example illustration of on-chip interconnects that may extend more than 10 mm and of multiple chips that may be grouped (e.g., four chips);

Figure 15O is Figure 1 of Du, Jieqiong, et al., "A 28-mW 32-Gb/s/pin 16-QAM Single-Ended Transceiver for High-Speed Memory Interface," 2020 IEEE Symposium on VLSI Circuits, IEEE, 2020;

Figure 15P shows a cross-sectional view of a multi-layer TL, showing two levels of X-direction TL and multiple levels of Y-direction TL;

Figure 15Q shows an example of the wafer-scale engine presented in I. Cutress, "Hot Chips 31 Live Blogs: Cerebras' 1.2 Trillion Transistor Deep Learning Processor"; to save wasted area, a non-rectangular 3D system may be considered;

Figure 15R shows that not only a whole circular wafer but also a half or a quarter of a wafer may be used, without losing any die near the edge;

Figure 15S is an example illustration showing that the TL level may be placed below and/or above the processor and memory levels;

Figure 15T is an example illustration showing various current panel areas; however, the smallest panel size is still larger than a 300 mm wafer;

Figure 15U is an example illustration of long-haul TL using a NoC-type topology, namely a 3D torus;

Figure 15V is an example illustration of long-haul TL using a NoC-type topology, namely a butterfly;

Figure 15W is an example illustration of congestion-aware routing;

Figures 16A-16E are example illustrations of various heat-dissipation techniques and structures built into a 3D system; for example, a SubstrateM-Level in a 3D system may include multiple compute levels and memory levels with an X-Y connection level in between, and the system heat may be managed by liquid cooling;

Figure 16F is an example illustration showing a wireless connection of a 3D system to external devices, which may be attractive because it may share the NoC and the resources for connecting to external devices;

Figure 16G is an example illustration of a 3D system with additional coolant distribution structures;

Figure 16H is an additional example illustration of a 3D system with additional coolant distribution structures;

Figures 17A-17D are example illustrations of forming a full M level through multiple steps of simple bonding and thinning, then forming vertical bus pillars through the level stack using a TSV process, and then forming pins/pads;

Figures 18A-18F are example illustrations of process steps that may be used to form Schottky barrier S/D junctions and a uniform silicide layer;

Figure 18G is an example illustration of bow and warp variations at the top and bottom of a stacked 3D NAND structure;

Figures 19A-19E are example illustrations of methods and structures for implementing a 3D NOR-P wafer with a stack of multiple, correctly connected 3D NOR-P blocks;

Figures 19F-19H are example illustrations of methods and structures for implementing a 3D NOR-P wafer with metal-induced recrystallized silicon channels;

Figures 20A and 20B are example illustrations of some advantages of metal bit lines in 3D NOR-P structures and devices;

Figure 20C is an example illustration of a 2x2 array 3D NOR-P structure used to illustrate the various types of unselected cells that share various WL/BL/SL combinations;

Figure 20D is an example illustration of write voltages for memory cells of the type shown in Figure 20C, illustrated for four cases on a single cell;

Figure 20E is an example illustration of a 2x2 FG memory array connected to sense amplifiers in different ways;

Figure 20F is an example illustration of the voltage development on the BL at the S/A as a function of read time;

Figure 20G is an example illustration of example voltage conditions and associated energy band diagrams for the read operation of the example cell of Figure 20C and for read inhibition of unselected cells;

Figure 20H is an example illustration of bit-line current versus word-line voltage characteristics for different storage states and read conditions of the example cell of Figure 20C;

Figure 21A is an example illustration of the Cerebras Systems wafer-scale engine;

Figure 21B is an example illustration of a wafer-scale 3D system that may include I/O pads along the circumference of the wafer edge;

Figures 21C-21G are example illustrations of fixtures for a wafer-scale 3D system that may leave the system "bare" and provide better heat dissipation;

Figures 21H-21I are example illustrations of various configurations of liquid cooling tanks for wafer-scale 3D systems;

Figure 21J is an example illustration of a fixture for a bare wafer-scale 3D system that includes an Ethernet port and a power port;

Figure 21K is an example illustration of a wafer-scale 3D system designed to be cut in half; and

Figure 21L is an example illustration of the wafer-scale 3D system of Figure 21K arranged on a printed circuit board.

Detailed description

An embodiment of the present invention is now described with reference to the drawing figures. Persons of ordinary skill in the art will appreciate that the description and figures illustrate rather than limit the invention and that, in general, the figures are not drawn to scale for clarity of presentation. Such skilled persons will also realize that many more embodiments are possible by applying the inventive principles contained herein and that such embodiments fall within the scope of the invention, which is not to be limited except by any appended claims.

Some drawing figures may describe process flows for building devices. A process flow, which may be a series of steps for building a device, may have many structures, numerals, and labels that are common between two or more adjacent steps. In such cases, some labels, numerals, and structures used in the figure for a certain step may have been described in the figures of previous steps.

The use of layer transfer in the construction of 3D IC-based systems may enable heterogeneous integration, wherein each layer may include one or more of MEMS sensors, image sensors, CMOS SoCs, volatile memory (such as DRAM and SRAM), persistent memory, and non-volatile memory (such as flash and OTP). This may include adding memory control circuits, also known as peripheral circuits, on top of or below the memory array. The memory layer may contain only memory cells and no control logic, in which case the control logic may be included on a separate layer. Alternatively, the memory layer may contain memory cells and simple control logic, where the control logic on that layer may include at least one of a decoder, a buffer memory, and a sense amplifier. The circuits may include charge pumps and high-voltage transistors, which may be fabricated on a layer using silicon transistors or other transistor types (e.g., SiGe, Ge, CNT) on a fabrication process line different from the one used for the low-voltage control circuits. Analog circuits, such as those used for sense amplifiers and other sensitive linear circuits, may also be processed independently and transferred to the 3D structure. Such 3D construction may include the "smart alignment" techniques presented herein, or may exploit the repetitive nature of memory arrays to reduce the impact of wafer-bonder misalignment on integration effectiveness.
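One way to picture how the repetitive nature of a memory array can absorb wafer-bonder misalignment: if the array repeats with pitch P along an axis, the connections can be routed to whichever repeat lies closest, so only the misalignment reduced to within about half a pitch has to be tolerated. The sketch below is a hypothetical illustration under that assumption (integer nanometres, x and y treated independently, function names invented for this example); it is not the alignment technique of the referenced patents.

```python
def reduce_offset(misalignment_nm: int, pitch_nm: int) -> tuple[int, int]:
    """Split a misalignment into whole repeats plus a small residual.

    Returns (shift_in_repeats, residual_nm) such that
    misalignment_nm == shift_in_repeats * pitch_nm + residual_nm and
    -(pitch_nm // 2) <= residual_nm < pitch_nm - pitch_nm // 2.
    """
    if pitch_nm <= 0:
        raise ValueError("pitch must be positive")
    # Biasing by half a pitch before floor division rounds to the nearest repeat.
    shift = (misalignment_nm + pitch_nm // 2) // pitch_nm
    return shift, misalignment_nm - shift * pitch_nm


def reduce_xy(dx_nm: int, dy_nm: int, px_nm: int, py_nm: int):
    """Apply the reduction independently along x and y."""
    return reduce_offset(dx_nm, px_nm), reduce_offset(dy_nm, py_nm)


if __name__ == "__main__":
    # Hypothetical: 1234 nm x-error, -130 nm y-error, 500 nm x 100 nm repeat pitch.
    print(reduce_xy(1234, -130, 500, 100))  # ((2, 234), (-1, -30))
```

In this picture a 1.234 µm x-error becomes a 234 nm residual once the connections are shifted by two repeats; what remains must still be covered by pad overlap or design margin.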

For example, U.S. Patent Application No. 15/173,395, among other patents, presents a layer transfer technique called epitaxial layer transfer (ELTRAN), which may be part of a 3D IC formation process. ELTRAN technology utilizes one or more epitaxial processes over a porous layer. Alternatively, other epitaxy-based structures may be formed to support layer transfer techniques by exploiting the etch selectivity of epitaxial layers, such as the very high etch selectivity of SiGe versus silicon, and variations such as silicon (single-crystal, polycrystalline, or amorphous), SiGe (a mixture of silicon and germanium), P-doped silicon, and N-doped silicon. These layers may be combined with various types of separation processes, such as "cold splitting" (for example, the Siltectra stress polymer) and cryogenic shock treatment, to provide a thin-layer transfer process.

Recently, this has become a very attractive concept for processing horizontal gate-all-around transistors, and it has become the target flow for next-generation devices such as the 5 nm technology node. Some work on selective etching of SiGe versus silicon is presented in a paper by Jang-Gn Yun et al. titled "Single-Crystalline Si Stacked Array (STAR) NAND Flash Memory," IEEE Transactions on Electron Devices, vol. 58, no. 4, April 2011. More recent work is presented in an article by K. Wostyn et al. titled "Selective Etch of Si and SiGe for Gate All-Around Device Architecture," ECS Transactions, 69(8) 147-152 (2015), and in an article by V. Destefanis et al. titled "HCl Selective Etching of Si1-xGex versus Si for Silicon On Nothing and Multi Gate Devices," ECS Transactions, 16(10) 427-438 (2008); all of the foregoing are incorporated herein by reference. As SiGe processing on silicon substrates matures, it facilitates the use of SiGe layers as sacrificial layers for production-worthy 3D layer transfer.

In at least U.S. Patent 8,669,778, incorporated herein by reference, with respect to at least Figure 22, a technique is presented in which a generic memory array, such as SRAM, DRAM, FRAM, RRAM, or MRAM, is customized for a specific application and integrated as part of a 3D device flow. In at least U.S. Patent 9,021,414, incorporated herein by reference, flows and techniques for using electronic design automation ("EDA") tools for such 3D structures are presented. In at least U.S. Patent Application 16/558,304, incorporated herein by reference, with respect to Figures 21A-25J, techniques are presented for integrating generic memory arrays with logic using hybrid bonding as part of a 3D device flow. Further variations of these concepts are presented herein. A 3D device may include a custom-designed logic level integrated with memory levels using 3D integration such as hybrid bonding. The memory level may be fully customized to match the underlying custom logic, or it may be a generic memory level that, as shown herein, has been customized with a few added steps to match the underlying custom logic. The memory level may be formed as an array of units, where each unit is an array of bit cells. The underlying custom logic may include memory control circuits such as decoders and sense amplifiers.

Several considerations are regarded as important drivers in the following memory stacking alternatives. First, the goal is to keep the overall investment in using memory stacking for custom devices low, or to minimize it. Therefore, the memory array may be designed as a generic structure that can be customized with few customization steps, such as one or two metal layers and their associated via layers. Second, the generic memory structure uses conventional, simple copper interconnects, which are typically defined by chemical mechanical polishing (CMP) rather than by etching. In other words, the generic memory structure may be provided by a dedicated supplier, such as a semiconductor foundry, and purchased and customized by many customers according to their needs, with reduced mask costs and other non-recurring engineering ("NRE") costs.

Therefore, a generic memory structure may be designed as an array of units. Each unit may be a small two-dimensional array of bit cells in the plane of the wafer. Later, if a product or customer requires a higher bit-cell density than a single 2D die provides, multiple generic memory wafers can be stacked to form a 3D-stacked generic memory structure. When generic memory wafers of the same design and processing are stacked, the memory units repeat in the vertical direction, that is, out of the wafer plane. Typically, the number of rows in a unit may range from 32 to 1028, and the number of columns may range from 32 to 1028. To give customers flexibility and versatility with minimal cost, power, and performance penalties, relatively small unit sizes such as 32x32 or 64x64 may be favored over 512x512. Herein, the smallest unit size will be referred to as a "primitive unit". If 3D stacking of generic memory wafers is to be considered, adjacent primitive units may include some extra space for through-silicon vias or through-layer vias.

Customization of the memory unit size may be provided by adding a few custom process steps to the generic memory wafer before the wafer stacking step. The customization step may be an additional metallization step processed on the generic memory wafer, which bridges and stitches several primitive units into a memory structure of the desired size. Multiple primitive units stitched together to form a target size will be referred to as a "stitched unit". For example, four 32x32 primitive units may be connected to form a 64x64 stitched unit. In addition to the stitching process, a pin/pad formation step may also be part of these additional metal customization steps. The customized memory wafer may then be flipped and bonded to the logic substrate using, for example, hybrid bonding, with connections made to predefined pads on the logic substrate so that the memory is connected to the logic.
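The stitching arithmetic can be sketched as follows. This is a hypothetical helper (its name and parameters are invented for illustration), assuming a regular grid of primitive units, 32x32 by default as in the example above, and a target size that is a whole multiple of the primitive size; it lists which primitive units the added metal layer would need to bridge, starting from a chosen origin unit.

```python
def stitch_plan(target_rows: int, target_cols: int,
                prim_rows: int = 32, prim_cols: int = 32,
                origin: tuple[int, int] = (0, 0)) -> list[tuple[int, int]]:
    """Return (row, col) grid indices of the primitive units forming one stitched unit."""
    if target_rows % prim_rows or target_cols % prim_cols:
        raise ValueError("target size must be a whole multiple of the primitive unit")
    units_down = target_rows // prim_rows      # primitive units stacked along the rows
    units_across = target_cols // prim_cols    # primitive units side by side along the columns
    r0, c0 = origin
    return [(r0 + r, c0 + c) for r in range(units_down) for c in range(units_across)]


if __name__ == "__main__":
    # The example from the text: four 32x32 primitive units form a 64x64 stitched unit.
    print(stitch_plan(64, 64))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Keeping the origin as a parameter reflects that, on a generic wafer, the same primitive-unit grid could be tiled into stitched units at different positions by the customization mask alone.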

The smallest memory structure may be designed taking into account the bit-cell size and the precision of the hybrid bonding process, which defines the minimum pitch and size of the pads. The units may be designed according to this smallest memory structure, or even smaller, allowing more flexible placement and finer grid granularity.

Consider a bit cell of width W and length L, with a total area W*L. Assuming a hybrid bonding process, the minimum pitch H implies an area H*H per connection, where one connection area includes the actual pad and the spacing used for bonding. Assume the memory is a 6T SRAM with one word line per cell width and two bit lines per bit cell length. Assume the minimum array is m cells wide and n cells long. The following formula then expresses the requirement for such a structure:

$$m W \cdot 2 n L \;\ge\; m H^2 + n H^2$$

As can be seen, the number of pads, and thus the area required for the pads, grows as m+n, while the cell array area grows as m*n. Therefore, for a specific bit cell and hybrid bonding process, and given a choice of cell counts and aspect ratio, a minimum array size can be defined.
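A minimal sketch of evaluating the inequality above, exactly as it is written, for a candidate array of m by n bit cells. The function name `pad_budget_ok` is hypothetical, and the sample values reuse the 7nm bit cell dimensions the document gives for Figure 1 (W = 108 nm, L = 250 nm) and the 1 micron bonding pitch cited below. The sketch only checks a given (m, n); it does not model the aspect-ratio choice the text uses to arrive at its specific minimum.

```python
def pad_budget_ok(m: int, n: int, w: float, l: float, h: float) -> bool:
    """Check m*W*2n*L >= m*H*H + n*H*H (all lengths in the same unit)."""
    array_area = m * w * 2 * n * l   # scales with m*n
    pad_area = (m + n) * h * h       # scales with m+n
    return array_area >= pad_area


if __name__ == "__main__":
    W, L, H = 0.108, 0.250, 1.0  # microns (assumed sample values, see lead-in)
    for m, n in [(10, 10), (100, 85)]:
        print(m, n, pad_budget_ok(m, n, W, L, H))
```

Because the left side grows as m*n and the right side only as m+n, small arrays fail the check and large arrays pass it, which is the scaling argument made in the text.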

For example, recent reports on hybrid bonding, such as Jouve, A., et al., "1µm pitch direct hybrid bonding with <300nm wafer-to-wafer overlay accuracy," 2017 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), IEEE, 2017, and the article published by GLOBALFOUNDRIES on August 7, 2019, titled "GLOBALFOUNDRIES and Arm Demonstrate High-Density 3D Stacking Test Chip for High-Performance Computing Applications," show hybrid bonding at a 1 micron pitch (H = 1 micron). Kim, Soon Wook, et al., "Novel Cu/SiCN surface topography control for 1µm pitch hybrid wafer-to-wafer bonding," 2020 IEEE 70th Electronic Components and Technology Conference (ECTC), IEEE, 2020, also presents similar results and bonding techniques. All of the above are incorporated herein by reference in their entirety.

An example of a 7nm 6T SRAM bit cell layout is shown in Figure 1, with W = 108nm and L = 250nm. Based on the above formula and an approximately square memory structure, the smallest memory structure that can be used with hybrid bonding may have:

m~100 and n~85.

Figure 2 shows an exemplary memory structure with primitive cells 202. For this example, each primitive cell may be configured as a minimum array of about 100x85 bit cells. The cells may be placed with bit-cell-sized spaces 204 between them, forming a two-dimensional repeating pattern 200 of general-purpose memory.

FIG. 3A shows four cells 202 of the cell array of Figure 2, arranged in an example 2x2 configuration.

Figure 3B shows the four general-purpose cells 202 customized to act as a single 2x2 memory structure by forming "bridges" 304 (or strap connections) between them. The bridges connect the word lines and bit lines of adjacent cells, so that the four cells are controlled as one memory structure. The bridges may be copper, tungsten, or another metal or conductive material that conducts as well as or better than copper.

Figure 3C shows a further customization example, in which pads or pins 306 are added in preparation for the subsequent hybrid bonding step. The pads or pins can be copper, aluminum, or other metals. The pad/pin 306 layer can be processed in the same step as the bridge layer. Alternatively, the pad/pin 306 layer may be formed at a higher level than the bridge layer, so that when the pad/pin 306 layer is exposed, the bridge layer remains inside the dielectric layer.

Figure 3D shows an extension of the structure, with an additional 2x2 memory structure (two 2x2 memory structures in total) and an unbridged space 308 between them. In this example, each 2x2 memory structure has four general-purpose cells 202.

Figure 4A shows a cross-sectional view of the area marked by ellipse 322 in Figure 3A, including a gap 450 in memory control line 402. Memory control line 402 may be a bit line, a word line, or in some cases another type of memory control line. Memory control lines 402 extend beyond the outer boundary of the bit cell array.

Figure 4B shows a cross-sectional view of the area marked by ellipse 302 in Figure 3C. Bridge 404, with vias 409, spans gap 450 and connects control line 402 of one cell 202 to control line 402 of another cell 202. Pads/pins 406 and 408 illustrate potential conductive hybrid bonding points. An exemplary via 407 [e.g., a through-silicon via (TSV) or a through-layer via (TLV)] may connect pad/pin 406 to an underlying control line within exemplary cell 202. Likewise, the conductive connection from pad/pin 408 through via 407 to another control line 410 from the edge is further shown in Figure 4D.

Figure 4C shows a cross-sectional view of the area marked by ellipse 312 in Figure 3D. It shows how control line 404 at the cell edge is connected, for future bonding, to pad/pin 406 through via 412, wire 414, and via 407. Figures 4A, 4B, and 4C show a portion of the edge of general-purpose cell 202 and are exemplary in nature. Engineering design choices can create many variations of the connectivity concepts presented herein to optimize speed, power, and cost for the envisioned system or device. For example, the pads/pins, vias, control line segments, etc., shown in Figures 4A-4C need not be symmetrical with respect to gap 450, and each portion of cell 202 may have completely different connections. Likewise, the connections between cells 202 may be programmable, for example, by laser cutting or blowing fuses, or they may be electrically programmed to be conductive using antifuses or non-conductive using fuses.

Figures 5A-5C show an example of a memory cell with a 3D pin/pad connection structure that uses two metal layers and one pin/pad layer.

Figure 5A shows the pin/pad layer on top of a grid representing the underlying bit lines 502, with pitch 510 ("BLP"), and word lines 504, with pitch 509 ("WLP"). The hybrid bonding pin/pad pitch is 501 in the W-E direction and 503 in the N-S direction, approximately 4 times BLP and WLP, respectively. The basic concept is to redistribute control lines that may have tight pitch requirements into a two-dimensional array of metal pads at a larger pitch that suits the hybrid bonding capability. In this example, the bit lines are the top metal of the memory and run in the W-E cardinal direction 500, and the word lines run in the N-S cardinal direction 500. In this example, the bit cell size is about 2x BLP*WLP (two grid squares) if a complementary cell requires both BL and BL/, as in SRAM, or about BLP*WLP (one grid square) if the cell has only one BL, as in DRAM, MRAM, PRAM, or RRAM. For simplicity of the following explanation, assume that one bit cell occupies one grid square. The bonding alignment calls for pads/pins of approximately two BLP by two WLP, i.e., 2x2 grid squares 505, 507, which implies a total area per pad/pin of a 4x4 grid square. In other words, one pad/pin occupies the area of 16 bit cells: 4 bit-cell areas for the pin/pad itself and 12 bit-cell areas for the required spacing. In this example, each bit cell has one word line and one bit line, as found, for example, in DRAM bit cells. The calculation of the minimum cell size can be adapted accordingly to other types of memory cells. In this example, the cell aspect ratio is approximately one, i.e., square. For the following calculation, BLP is approximately equal to WLP, denoted P. Therefore, the area of one pin/pad is 4x4xP² = 16P², and for a square cell structure the number of word lines equals the number of bit lines, m, which gives:

$$16P^2\,(m + m) \;\le\; (mP)(mP) \quad\Longrightarrow\quad m \ge 32$$

Thus, the example of Figure 5A illustrates a minimum cell of 32 bit lines by 32 word lines.
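The minimum square cell can be checked with a short search. This is an illustrative sketch: `min_square_cell` is a hypothetical name, lengths are expressed in units of the line pitch P, and `pad_area_p2` is the area one pin/pad needs (pad plus spacing) in units of P², which is 16 for the 4x4-grid-square pad of Figure 5A. It searches for the smallest m satisfying pad_area*(m + m) <= m*m, i.e., m word-line pads plus m bit-line pads fitting over an m by m array.

```python
def min_square_cell(pad_area_p2: int = 16) -> int:
    """Smallest m with pad_area*(m + m) <= m*m, all areas in units of P^2."""
    if pad_area_p2 <= 0:
        raise ValueError("pad area must be positive")
    m = 1
    while pad_area_p2 * (m + m) > m * m:  # m word-line pads + m bit-line pads
        m += 1
    return m


if __name__ == "__main__":
    print(min_square_cell())    # 4x4 grid-square pads, as in Figure 5A
    print(min_square_cell(9))   # hypothetical 3x3 grid-square pads
```

The search agrees with the closed form m >= 2 x pad_area, so 16P² pads lead to the 32x32 minimum cell stated above; a smaller pad footprint shrinks the minimum cell proportionally.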

Figure 5A shows 32 pins/pads 507 for the bit-line connections, the first 16 of which are labeled with bit-line addresses 508, and 32 pins/pads 505 for the word-line connections. Dashed lines 506 divide the top surface into four areas: two for the word-line connection pins/pads and two for the bit-line connections. The specific pin/pad arrangement of Figures 5A-5C is exemplary, and the arrangement may be designed according to engineering trade-offs such as lithography and bonding alignment accuracy and precision, critical speed nets, and memory cell size and aspect ratio.

Figure 5B shows the metal connections of the word lines. There is a connection between each word line and one corresponding pin/pad. The connections are divided into two groups: the even-numbered word lines 526 are connected from the south side using vias 516, while the odd-numbered ones are connected from the north side. This takes full advantage of the availability of each word line at both sides of the cell. The connection layout is only an example; a suitable layout can be designed by a layout practitioner in the field, taking into account the design rules of the specific process. Such a layout may include enlarging the cell size to accommodate the layout constraints of the particular case. In Figure 5B, the top metal is assigned to pads/pins 522, 524. These connect through via 520 to the underlying metal layer 514, oriented W-E, and through via 518 to the metal layer 512 below it, oriented S-N.
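The even/odd split of the connections can be expressed as a simple assignment rule. This sketch is hypothetical: the function name `split_by_parity` is illustrative, and it assumes the control lines are numbered from 0, with even-numbered lines routed to pads from the south edge and odd-numbered lines from the north edge, as in the Figure 5B example.

```python
def split_by_parity(num_lines: int) -> dict[str, list[int]]:
    """Assign each control line to the cell edge its pad connection leaves from."""
    sides: dict[str, list[int]] = {"south": [], "north": []}
    for line in range(num_lines):
        # Even-numbered lines exit south, odd-numbered lines exit north.
        sides["south" if line % 2 == 0 else "north"].append(line)
    return sides


if __name__ == "__main__":
    print(split_by_parity(6))
```

Splitting by parity halves the number of connections that must be routed past any one edge, which is why using both ends of each line eases the layout.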

The connection layout of the bit lines (not shown) can be done in a similar manner in the area reserved for it, or, taking advantage of the bit lines running W-E at the top of the memory array, by using direct vias instead of access vias at the west and east sides.

Figure 5C shows the top three connection layers of Figure 5B without the grid, to better show the word-line pin/pad connection layout. The drawing symbol legend is the same for Figures 5B and 5C.

Although not drawn, many memory bit cells, such as SRAM, require power and ground lines. It should be understood that the pads for power and ground can be allocated on top of bridge area 304 (Figure 3B). Power and ground lines are typically biased at static voltages rather than being individually controlled per row or column, so power and ground lines from multiple rows or columns can be grouped together and only a few pads are needed.

The top surface of the logic wafer will have a pad/pin layout that mirrors that of the memory wafer or chip, so that the two can later be properly bonded face-to-face (F2F) and electrically connected. The pads/pins of the logic wafer will be connected to the sense amplifiers for the bit lines and to the multiplexers for the word-line pads.
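Mirroring a pad layout for F2F bonding can be sketched as a coordinate reflection. This is a minimal illustration under stated assumptions: `mirror_for_f2f` is a hypothetical name, pad positions are (x, y) centers in each wafer's own face-up coordinates, and the memory wafer is assumed to be flipped about an N-S axis, so x maps to width - x while y is unchanged (a flip about the W-E axis would mirror y instead).

```python
Pad = tuple[float, float]  # (x, y) pad center in the wafer's face-up coordinates


def mirror_for_f2f(pads: list[Pad], width: float) -> list[Pad]:
    """Return the partner-wafer pad layout for face-to-face bonding.

    Assumes a flip about an N-S axis: x -> width - x, y unchanged.
    """
    return [(width - x, y) for x, y in pads]


if __name__ == "__main__":
    memory_pads = [(0.5, 1.0), (2.0, 3.0)]
    logic_pads = mirror_for_f2f(memory_pads, 4.0)
    print(logic_pads)
```

Applying the mirror twice returns the original layout, which is a quick consistency check that the logic-side and memory-side layouts will land on each other after the flip.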

Another option is to use a slightly larger cell size to allow a regular array of pins/pads for the cell connections. This allows one metal layer to be used for routing and another for the pin/pad layer. To illustrate this alternative, Figure 5D shows the cell structure of Figure 5A with a finer hybrid bonding pitch ("H"). The bond pad pitch 541, 543, including pad/pin 547, is, for example, three times the word-line pitch WLP 549 and the bit-line pitch BLP 550 of the memory array. The hybrid bonding connection structure is similar to the structure shown in Figures 21A-21C of PCT/US2017/052359, incorporated herein by reference, folded over the memory cell as shown in Figure 5D. The ratio H/BLP between the hybrid bonding pitch H and the bit-line pitch BLP determines the number of columns of bonding pads/pins for the bit lines (rounded), as shown in Figure 5D. Similarly, the ratio H/WLP between the hybrid bonding pitch H and the word-line pitch WLP determines the number of rows of bonding pads/pins for the word lines (rounded). As a result, the numbers of pad rows and columns for the WLs and for the BLs are each determined. The top surface of the cell may thus be divided into four similarly sized quadrants by N-S dashed line 545 and W-E dashed line 546. The N-E quadrant 542 can be used for the bonding pads/pins of one half of the bit lines 554, while the W-S quadrant can be used for the pads/pins of the other half of the bit lines 552; similarly, for the word lines 552, the W-N quadrant is used for the first half and the S-E quadrant for the other half.
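The pad-track count can be sketched as a ceiling of the pitch ratios. This is an illustrative sketch, not the patent's procedure: `pad_tracks` is a hypothetical name, the "rounded" ratios are taken here as rounded up (so that each of the lines spanned by one pad pitch gets its own pad), and pitches are passed as integer nanometers to avoid floating-point rounding.

```python
def pad_tracks(h_nm: int, blp_nm: int, wlp_nm: int) -> tuple[int, int]:
    """Return (pad columns for bit lines, pad rows for word lines).

    Uses ceil(H/BLP) and ceil(H/WLP) on integer nanometer pitches.
    """
    if min(h_nm, blp_nm, wlp_nm) <= 0:
        raise ValueError("pitches must be positive")
    bl_columns = -(-h_nm // blp_nm)  # ceiling division
    wl_rows = -(-h_nm // wlp_nm)
    return bl_columns, wl_rows


if __name__ == "__main__":
    # Figure 5D example: bonding pitch three times the line pitches.
    print(pad_tracks(300, 100, 100))
```

With H equal to three line pitches, as in the Figure 5D example, this gives three pad columns for the bit lines and three pad rows for the word lines.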

To evaluate the smaller cell size for this pin/pad connection scheme, the following can be considered. Dashed line 556 represents the south edge of the N-E quadrant structure, and dashed line 557 represents the north edge of the S-E quadrant connection structure. A distance 558 between these structures is needed so that they do not come too close to each other. The length of the N-E quadrant structure (in the N-S direction) is approximately n/2*BLP + H, where n is the number of bit lines in the cell. The width of the S-E quadrant structure (in the N-S direction) is approximately ceil(H/WLP)*H. For simplicity, assume that the word-line pitch is approximately equal to the bit-line pitch, denoted P. The cell size in the N-S direction is approximately n*P. The condition regarding distance 558 can therefore be written as:

$$\frac{H}{P}\,H + H + \frac{n}{2}\,P \;<\; nP \quad\Longleftrightarrow\quad n \;>\; \frac{2H^2}{P^2} + \frac{2H}{P}$$

For example, with H = 1 micron and P = 0.1 micron, n > 220. Therefore, a memory array constructed as an array of cells of 200µ*200µ with a control-line pitch of 0.1µ has n ≈ 2000 ≫ 220, and thus enough area on top of each cell to form the pin/pad connection structure shown in Figure 5D. FIG. 5E is a schematic of FIG. 5D with the grid and other markings removed and markings added for the boundaries of the underlying memory cells 560.
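The spacing condition can be checked numerically with exact rational arithmetic. In this sketch the names `fits` and `min_lines` are hypothetical; `fits` evaluates the original form (H/P)*H + H + (n/2)*P < n*P directly, and `min_lines` uses the equivalent bound n > 2H²/P² + 2H/P. H/P is treated as exact; the text rounds it up, but for H = 1 micron and P = 0.1 micron it is already the integer 10.

```python
import math
from fractions import Fraction


def fits(n: int, h: Fraction, p: Fraction) -> bool:
    """Spacing condition from the text: (H/P)*H + H + (n/2)*P < n*P."""
    return (h / p) * h + h + Fraction(n, 2) * p < n * p


def min_lines(h: Fraction, p: Fraction) -> int:
    """Smallest integer n with n > 2H^2/P^2 + 2H/P."""
    r = h / p
    return math.floor(2 * r * r + 2 * r) + 1


if __name__ == "__main__":
    H, P = Fraction(1), Fraction(1, 10)  # microns
    n_min = min_lines(H, P)
    print(n_min, fits(n_min - 1, H, P), fits(n_min, H, P))
    print(fits(2000, H, P))  # 200 micron cell at 0.1 micron pitch
```

The two forms agree at the boundary: n = 220 fails the strict inequality and n = 221 satisfies it, while the 200 micron cell with n = 2000 lines clears the bound by a wide margin.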

Figure 6 illustrates the extension of the described concept to a multi-level memory structure. This approach can be used when memory requirements are very high and a single level of memory cannot provide sufficient capacity. Figure 6 is similar to Figure 22F of US application 16/558,304, incorporated herein by reference. The vertical pillars of the global control lines, such as 2246 and 2258 in Figure 22F of 16/558,304, are replaced by two sets of vertical pillars: 645, 646 replace 2246, and 655, 656 replace 2258, and so on. The bridging concept of Figure 4B or the pad/pin extension for bonding of Figure 4C can be used to customize multi-level memory structures. Each level select 647, 657 can be connected to control logic to enable full control of the specific level selected.

Figure 7A is a copy of Figure 43E of US application 16/558,304, incorporated herein by reference. Figure 7A illustrates a multi-level device that may include logic levels; custom memory levels, as described herein, to support the logic levels, such as Cache 1, Cache 2, or last-level-cache type memory; additional memory levels; and memory control levels including decoding and sense amplifier circuits. These levels may be in the form of multi-level stacks of high-speed memory such as DRAM, memory structures such as 3D NOR, and memory structures such as 3D NAND. Additionally, levels of global X-Y interconnect can use electromagnetic waves on transmission lines, or waveguides with supporting RF or optical circuits. The various levels may include feed-through connections to allow vertical connections across levels. Using layer transfer in the construction of such 3D IC based systems enables heterogeneous integration, where each layer/stratum/level can include, for example, one or more of MEMS sensors, image sensors, CMOS SoCs, volatile memories such as DRAM and SRAM, persistent memory, ferroelectric memory, and non-volatile memory such as flash and OTP. This can include adding memory control circuitry, also known as peripheral circuitry, above or below the memory array. A memory layer may contain only memory cells and no control logic, in which case the control logic may be included on a separate layer. Alternatively, a memory layer may contain memory cells and simple control logic, where the control logic on that layer may include at least one of a decoder, a buffer memory, and a sense amplifier. Peripheral/control circuitry may include charge pumps and high-voltage transistors, which may be fabricated on their own layer using silicon transistors or other transistor types (e.g., SiGe, Ge, CNT), on a fabrication process line that can be, and often is, different from the process line used for the low-voltage control circuitry. Analog circuits, such as those used in sense amplifiers, and other sensitive linear circuits can also be processed independently and layer-transferred onto the 3D structure.

Such 3D construction can use the "smart alignment" techniques presented in the incorporated references, or bonding that exploits the repetitive nature of memory arrays to reduce the impact of wafer bonder misalignment on integration effectiveness, as described in PCT/US2017/052359 (WO2018/071143), the entire contents of which are incorporated herein by reference, specifically with respect to FIGS. 11A to 12J thereof; or it can use a hybrid bonding technique as shown in FIGS. 20A to 25J thereof. Hybrid bonding between levels reduces the process steps required for this 3D integration but provides less flexibility to overcome misalignment. "Smart alignment" techniques can overcome the alignment challenge, but they require etching and deposition steps at that level, adding steps to the stacking process. The vertical connection challenges can differ greatly between the levels of a 3D stacked structure. Stacked memory levels without intra-level decoders may require vertical connections to the decoder level at word-line and bit-line pitch, which is considerably more demanding than the connections between other levels in the stack. Therefore, the stacking process can differ to accommodate the alignment requirements between specific levels. Furthermore, if the wafers come from the same process line, the sources of alignment error may be different, so that the errors are sometimes smaller, as may be expected for memory levels (e.g., due to stepper matching). These choices and the 3D engineering design may use the various 3D integration techniques incorporated herein by reference, as known to those skilled in the art.

Memory layers can include a variety of memory types and technologies and can be placed at different levels of the 3D device structure, as shown in Figure 7A. They can include high-speed memory closer to the computing logic and high-density memory closer to the X-Y interconnect structure. The high-density level could be in a form similar to what is known in the industry as 3D NAND, V-NAND, X-point memory, or Optane, while the high-speed memory could be similar to the so-called 3D NOR-P presented in PCT/US2018/016759 and 62/952,222, both incorporated herein by reference. A memory layer may be structured as an array of cells. Figure 7B shows such a cell, which may have a size of about 0.04 mm², about 0.1 mm², about 0.4 mm², or even larger than about 1 mm². The layer can be a structured array of cells such as 2x2, 4x4, 8x8, 32x32, 256x256, 1024x1024, or any combination of these numbers, such as 16x64. The memory level may include memory control circuits 710, 714 (also referred to as memory peripheral circuits) and approximately 100 feed-throughs 718 per cell to support vertical connectivity throughout the 3D structure 700. The control circuitry may be configured so that each cell has its own control above (710) and/or below (714) the memory array 712. Connections between the memory controller and the memory array may use hybrid bonding and a pad/pin structure, as shown herein with reference to Figures 5A-5C, or other structures shown in the incorporated prior art, such as Figures 21A-21C of PCT/US2017/052359, incorporated herein by reference. Connections from control circuitry 714 to other device levels, such as computing logic 716, may be relatively easier, since only tens or hundreds of connections may be required per cell area, because the memory control circuitry includes the address decoders for the address portion of the vertical bus. Thus, within a cell, the connections from the memory control circuitry to the memory array (2D or 3D) may include thousands of connections to the bit lines and word lines, and, in the case of 3D memory, about a hundred connections for feed-through and tens to hundreds of connections for layer selection. A few hundred extra connections can be added on top of the cell or along its sides, since even with pads/pins at a 1 micron pitch, for a cell with a side length of 200 microns or more this adds only about 1% overhead area to the structure. The memory layer can be a standard module integrated with other structures to form a custom or semi-custom product. The structure size may be a full wafer or any smaller structure, such as a single field, even smaller than 100 mm² in size, as described in PCT/US2018/52332, incorporated herein by reference. The memory controller may include built-in tests and redundancy activation, to be operated during device setup and operation. Activation and reporting of these built-in tests and redundancy can be part of the function of these hundreds of connections and feed-through connections.
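The overhead estimate can be reproduced with a one-line area ratio. In this sketch `pad_overhead` is a hypothetical name, each extra pad is charged one pitch-by-pitch footprint (pad plus spacing), the cell is assumed square, and 400 pads is an assumed stand-in for "a few hundred".

```python
def pad_overhead(num_pads: int, pitch_um: float, cell_side_um: float) -> float:
    """Fraction of a square cell's area consumed by extra bonding pads."""
    if cell_side_um <= 0:
        raise ValueError("cell side must be positive")
    return num_pads * pitch_um ** 2 / cell_side_um ** 2


if __name__ == "__main__":
    print(pad_overhead(400, 1.0, 200.0))  # a few hundred pads on a 200 um cell
```

400 pads at a 1 micron pitch occupy 400 µm² of a 40,000 µm² cell, about 1%, consistent with the estimate above; the fraction falls further as the cell grows.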

The buses of such cells may differ for different cells throughout the structure, and the sizes of the cells in the structure may also differ. A bus can be 1, 2, 4, 8, 16, 32, or 64 bits wide, as is common in the industry, but can also be an extremely wide bus of hundreds or even thousands of bits, to support processor designs with extremely wide data buses, or with additional on-chip buffers to increase data speed from memory to the processor level.

Figure 7C shows tiling of the cell structure of Figure 7B to form an array of cells 740. This tiling can span the entire wafer or any part of it. Figures 7A-7C are side views in the X-Z 702 direction. Figure 7D is a plan view, in the X-Y 703 direction, of a wafer-scale array of cells 704.

The process flow to form full 3D heterogeneous integration, as shown in Figure 7A, may include several steps of wafer bonding and substrate removal, such as a "cut" using a cut layer, or thinning using grinding and etching, which may include using the cut layer as an etch-stop layer. Forming such a 3D structure can include mixing and matching the bonding of different level types, such as generic levels, semi-custom levels, and fully custom levels. Forming the memory layer may include the steps of forming a 3D NOR memory array and then bonding a memory control level onto it. Figures 7E-7G illustrate this process flow using small sections of X-Z 702 cross-sectional views.

Figure 7E shows a small portion of memory control circuit 739 (peripheral circuit). The section corresponds to the edges of two cells. It shows four top bonding pins/pads 736, 738 to be bonded to the memory pins/pads, as shown in Figure 5E. It also shows feed-through structure 735 and two bottom pins/pads 737 designated to connect to the logic level. The substrate may include base silicon 742 and cut/etch-stop layer 740. The bottom bonding pads can be placed in the areas between cells, which can be kept free of active circuitry. The bottom pins/pads 737 may be part of the first metal or contact layer. Alternatively, taking advantage of the etch selectivity of cut layer 740, they may even be formed below cut layer 740 (not shown), to simplify the subsequent steps of exposing them and preparing them as pins/pads. Other options exist, including allocating more area to these pins/pads and using the technique known as TSV. The structure includes a top oxide 733 for protection, which will be part of the future hybrid bonding.

Figure 7F shows memory control circuit 744 (from Figure 7E) flipped and bonded on top of memory layer 743. Memory layer 743 may be an array 752 of 3D NOR-P or any other memory option previously discussed. Memory layer 743 may be formed on substrate 756 with its own cut layer (which may also be referred to as an etch-stop layer) 752. Feed-throughs 755 may be placed between memory cells. The memory pins/pads 735, 736, 738 of Figure 7E may be connected to control-level pins/pads 750 using hybrid bonding.

Figure 7G shows the structure after memory control circuit 744 is thinned to form memory control 745, using techniques such as grinding and etch-back with an etch-stop layer, or any other cut technique previously presented herein or in the cited references. Pins/pads 758, 760 are exposed, or are formed by opening top vias and forming metal top pins/pads using conventional semiconductor processes. The process steps required to form these pins/pads 758, 760 are part of the overall flow design. If the wafer processing shown in Figure 7E includes forming metal connections down to contact level 733, this may be a relatively simple process; otherwise, it may include more steps to form vias through the backside of the wafer down to the appropriate metal level of the appropriate signal lines. Figure 7G shows a small portion of a complete memory layer, with the memory array and its control, ready to be bonded on top of the logic wafer to form the type of structure shown in Figure 7A. It is contemplated that the number of connections from the memory control layer to the memory array 750 may be several thousand per cell, to provide control of the word lines, bit lines, and other memory control lines. As mentioned previously, the number of feed-throughs 755, 758 per cell may be in the tens, as may the number of connections 760 to the processor logic level.

The memory controller may be integrated using bonding or other technologies, such as those common to 3D NAND with periphery under cell ("PUC").

The memory layer may be configured to function as a dual-port memory. For example, one memory controller 714 may be controlled by the underlying processing logic, while the upper controller 710 may be controlled by upper-level processing circuitry, which may be part of the circuitry that moves data into or out of the structure ("I/O").
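As a behavioral illustration of the dual-port arrangement, the sketch below models one array shared by two ports, one standing in for controller 714 (lower processing logic) and one for controller 710 (upper I/O circuitry). The class name, the request format, and the conflict policy (port "a" wins a same-cycle write to the same address) are hypothetical assumptions. The source does not specify arbitration.

```python
from typing import Optional, Tuple

Request = Optional[tuple]  # None, ("r", addr) or ("w", addr, value)


class DualPortMemory:
    """Toy dual-port memory shared by two controllers.

    Port "a" stands in for controller 714 (lower processing logic) and port
    "b" for controller 710 (upper I/O circuitry). Hypothetical policy: on a
    same-cycle write conflict to one address, port "a" wins.
    """

    def __init__(self, size: int):
        self.cells = [0] * size

    def cycle(self, a: Request = None, b: Request = None) -> Tuple[Optional[int], Optional[int]]:
        # Reads sample the array before any write of this cycle lands.
        reads = tuple(
            self.cells[req[1]] if req is not None and req[0] == "r" else None
            for req in (a, b)
        )
        # Apply port b's write first so port a overwrites it on a conflict.
        for req in (b, a):
            if req is not None and req[0] == "w":
                self.cells[req[1]] = req[2]
        return reads


mem = DualPortMemory(8)
mem.cycle(a=("w", 3, 10), b=("w", 3, 20))    # conflict: port a wins
print(mem.cycle(a=("r", 3), b=("w", 3, 7)))  # (10, None)
print(mem.cycle(b=("r", 3)))                 # (None, 7)
```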

The memory layer may be configured to function as a content-addressable memory (CAM).
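A CAM inverts the usual memory access: instead of supplying an address and receiving data, the controller supplies a data word and receives every address that holds it. The sketch below is a behavioral model only. Hardware compares all entries in parallel, while the sketch uses a loop. The optional `care_mask` (ternary-CAM style "don't care" bits) is an illustrative extension that the source does not specify.

```python
from typing import List, Optional


class ContentAddressableMemory:
    """Toy CAM: look up addresses by stored content."""

    def __init__(self, size: int, width: int = 32):
        self.full_mask = (1 << width) - 1
        self.words: List[Optional[int]] = [None] * size  # None = empty entry

    def write(self, addr: int, word: int) -> None:
        self.words[addr] = word & self.full_mask

    def search(self, key: int, care_mask: Optional[int] = None) -> List[int]:
        """Return every address whose word matches key on the cared-for bits."""
        mask = self.full_mask if care_mask is None else care_mask
        return [
            addr
            for addr, word in enumerate(self.words)
            if word is not None and ((word ^ key) & mask) == 0
        ]


cam = ContentAddressableMemory(size=4, width=8)
cam.write(0, 0xAB)
cam.write(2, 0xAB)
cam.write(3, 0xA0)
print(cam.search(0xAB))                  # [0, 2]
print(cam.search(0xA5, care_mask=0xF0))  # [0, 2, 3]
```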

The stacking may utilize pin/pad connections as shown in Figures 5A-5E, or other techniques such as Smart Alignment and electronic alignment, which are incorporated by reference, or any mix and match of these techniques.

Figure 8 illustrates an exemplary overall process flow: designing the logic wafer 802 and processing the logic wafer 804, as well as designing the customization 822 of the memory wafer. There may be a full set of generic wafers, offering multiple process nodes and other memory options such as high density and high speed, for the designer to choose from. The selected generic memory wafer may then be customized 824 for the specific design and then flipped and bonded to the logic wafer using, for example, hybrid bonding 828.

The logic wafer and generic wafer structures may also include power line connections using hybrid bonding. These power connections may be made at the unit level, the memory structure level, and/or the chip level. The figures do not show these power connections. Final processing may include back grinding, dicing, and packaging.

Generic memory may be customized, using techniques incorporated by reference, to support more than one level of memory.

EDA tools for such 3D logic-memory designs may incorporate at least the techniques presented in U.S. Patent 9,021,414, incorporated herein by reference. For the flow shown in Figure 8, the EDA tool may include a grid for memory decoder placement to support this unit-based generic memory structure.

There are many options for forming a 3D system using the techniques presented herein or incorporated by reference. These may include adding pins/pads on the memory units as shown in Figures 5A-5D. This may include stacking several memory levels, one on top of another, to form a 3D memory layer; the stacked levels may be 2D levels, or 3D levels that are themselves multilayer memories such as 3D NAND or 3D NOR. Such 3D structures may share global memory control lines common to all levels, while each level has its own level select signal. Such a 3D memory structure may be controlled by one or several memory control layers, which control each of the memory layers using common memory control pillars and individual level selects. This 3D layer formation flow is illustrated in Figures 9A-9F.

Figure 9A shows an X-Z 902 cross-sectional view of a unit region of a memory control layer similar to that of Figure 7E. The structure includes a substrate 912 with an etch stop layer 910 and memory control circuitry 909. The memory control circuit 909 structure may include "bottom" connections 904, 907 between units, for future connection to the processor logic level, and feedthroughs 905. It also includes, on the control circuit, pins/pads 906, 908 for the "global pillars" of the memory control lines. The global memory control connections do not look like pillars because they remain folded over the top of the unit surface to accommodate the relatively low pitch associated with hybrid bonding.

Figure 9B shows an X-Z 902 cross-sectional view of a unit region of a 3D memory 922 built on a substrate 926 with an etch stop "cut layer" 924. Again, the structure includes unit feedthroughs 925 and on-unit bonding pins/pads 920.

Figure 9C shows the structure after transferring memory structure 913 onto the memory control structure of Figure 9A and removing substrate 926, for example by grinding followed by wet and/or dry etching, with the etch controlled to stop on etch stop layer 924.

Figure 9D shows the structure after adding pins/pads 928 on the 3D memory 922 units using the layout shown in Figures 5A-5E.

Figure 9E shows an X-Z 902 cross-sectional view of additional unit regions of memory 914 built on a substrate with an etch stop "cut layer", with feedthroughs between the units and bonding pins/pads on the units.

Figure 9F shows the structure after transferring 3D memory structure 914 onto the structure of Figure 9D, using hybrid bonding to connect the various memory control lines (such as word lines and bit lines) and to connect the feedthroughs. Accordingly, memory control circuit 909 may be used to control the overlying memory units of the first layer 922 and of the overlying memory layer 914. The memory layers may be designed with the same unit size and the same number of memory control lines, and use a standard pin/pad layout, to enable this system-level integration using hybrid bonding. These memory layers may be 2D memory arrays or 3D memory arrays. They may use very similar memory technologies or, in other cases, different memory technologies. The memory control may be a 2D structure or a 3D structure. Many mix-and-match combinations may be constructed. As discussed previously, using global bit lines in a 3D structure requires level select control. Such level control needs to be properly connected in memory control circuit 909. There are several options for doing this, for example:

A. Individual layer selects connected directly to the memory control circuit. In this case, the internal level select may be connected to the global level select of the memory circuit.

B. Each memory layer may have dedicated connections for its level selects. The number of level selects is expected to be less than 100, so allocating pin/pad area to each of them would be a reasonable area overhead.

C. Where the goal is to stack the same type of memory layer multiple times, a good option is to use the technique presented in Figures 22A-22B of PCT/US2017/052359, incorporated herein by reference.
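The level select requirement can be illustrated with a behavioral model of a global bit line shared by stacked levels: the control circuit decodes a level address into one-hot selects, and exactly one level may drive the shared line. The function names and the pad-area model for option B (one pitch-by-pitch pad per level select) are hypothetical assumptions for illustration.

```python
from typing import List, Sequence


def decode_level_select(level: int, num_levels: int) -> List[bool]:
    """Turn a binary level address into one-hot level select signals."""
    if not 0 <= level < num_levels:
        raise ValueError("level out of range")
    return [i == level for i in range(num_levels)]


def read_global_bitline(level_bits: Sequence[int], level_select: Sequence[bool]) -> int:
    """A global bit line shared by all stacked levels: exactly one may drive it."""
    active = [i for i, sel in enumerate(level_select) if sel]
    if len(active) != 1:
        raise ValueError(f"expected one active level select, got {len(active)}")
    return level_bits[active[0]]


def select_pad_area_um2(num_levels: int, pad_pitch_um: float) -> float:
    """Option B: one dedicated pad per level select, pitch x pitch each (hypothetical)."""
    return num_levels * pad_pitch_um ** 2


bits_at_same_row_col = [0, 1, 0, 1]  # one bit per stacked level
print(read_global_bitline(bits_at_same_row_col, decode_level_select(1, 4)))  # 1
print(select_pad_area_um2(64, 1.0))  # 64.0
```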

Figure 9G shows the structure after removing the top substrate 913, adding pins/pads, and repeating the flow to add more memory layers 934, 932, 930. Accordingly, the structure may include a carrier substrate 942, a memory control layer 940, a first 3D memory layer 938, and a stack of four memory layers 930, 932, 934, 936. The structure of Figure 9G may be used as a memory building block, integrated with a compute logic layer to form a 3D compute structure.

One of the challenges of 3D systems with multiple layers of active devices is power delivery. The concept of heterogeneous integration may be extended to include a substrate designed to support power delivery. Figure 10A is a vertical cross-sectional view 1002 showing a substrate similar to that shown in Figure 7A, with added global power delivery structures. These may include deep trench capacitors 1016 and a power distribution network ("PDN") 1014. The deep trench capacitors may be formed within the silicon wafer; in this case, silicon substrate 1001 would be heavily doped to form the bottom electrode of the trench capacitors, as shown in Figure 10A. Alternatively, the deep trench capacitors may be formed within an oxide, in which case the bottom electrode may be a metal liner (not drawn). The capacitor structure may be planar, crown, pillar, or cylinder type. For the cylinder type, the top plate electrode may be heavily doped (for example, phosphorus- or boron-doped) polysilicon or silicon germanium. One side, capacitor electrode 1014A, would be connected to the ground/power line, while the other side, capacitor electrode 1012B, would be connected to the power/ground line. This may be formed using one thick metal layer or multiple metal layers. Integrating trench capacitors in the PDN may be an effective way to reduce local voltage variations caused by circuit operation. Figure 10B shows the structure after adding the various levels of logic, memory, EM interconnect level and IO level 1022 shown in Figure 7A. This may include hybrid bonding and multiple level transfers.
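How much trench capacitance the PDN needs can be estimated to first order from charge balance: a decoupling capacitor that alone supplies a current step I for a time Δt sags by ΔV = I·Δt/C, so C ≥ I·Δt/ΔV. The sketch below applies that estimate together with a parallel-plate approximation of one trench's sidewall. The current, duration, droop budget, trench geometry, and dielectric constant are all illustrative assumptions, not values from the source.

```python
import math

EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m


def required_decap_f(current_a: float, duration_s: float, max_droop_v: float) -> float:
    """Charge-balance estimate: C >= I * dt / dV."""
    return current_a * duration_s / max_droop_v


def trench_cap_f(depth_m: float, perimeter_m: float, dielectric_m: float, eps_r: float) -> float:
    """Parallel-plate approximation of one trench (sidewall area only)."""
    return EPSILON_0 * eps_r * depth_m * perimeter_m / dielectric_m


def trenches_needed(c_required_f: float, c_per_trench_f: float) -> int:
    return math.ceil(c_required_f / c_per_trench_f)


# Illustrative numbers only: a 1 A step lasting 1 ns, 50 mV allowed droop.
c_req = required_decap_f(1.0, 1e-9, 0.05)
print(f"{c_req * 1e9:.1f} nF")  # 20.0 nF
# Hypothetical trench: 10 um deep, 4 um perimeter, 10 nm high-k film (eps_r 25).
print(trenches_needed(c_req, trench_cap_f(10e-6, 4e-6, 10e-9, 25)))
```

The estimate ignores the PDN's resistance and inductance, so it only bounds the capacitance that must sit close to the load.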

Another embodiment of the invention is the integration of inductors for the power delivery network. These may include MEMS- or CMOS-BEOL-based inductors 1017, whose core may be air, oxide, iron, or ferrite. When a ferrite core is used, the core material may be manganese-zinc, nickel-zinc, iron-silicon, or iron-silicon-aluminum. The inductor structure may be a spiral thin film. As shown in Figure 10C, one side of the inductor, electrode 1014A, would be connected to the ground/power line, and the other side, electrode 1014B, would be connected to the power/ground line. Figure 10D shows the structure after adding the various logic, memory, EM interconnect levels and IO level 1022 shown in Figure 7A. This may include hybrid bonding and multiple level transfers.
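For sizing a spiral thin-film inductor with an air or oxide core, one widely used closed-form estimate is the modified Wheeler expression published by Mohan et al. (1999): L = K1·μ0·n²·d_avg / (1 + K2·ρ), where d_avg is the mean of the outer and inner diameters and ρ = (d_out − d_in)/(d_out + d_in) is the fill ratio. The source does not prescribe any formula. This sketch is only a first-order estimate and does not model a ferrite core, which would raise the inductance. The geometry in the example is hypothetical.

```python
import math

MU_0 = 4 * math.pi * 1e-7  # vacuum permeability, H/m (classical value)

# (K1, K2) layout coefficients of the modified Wheeler expression.
WHEELER_COEFFS = {
    "square": (2.34, 2.75),
    "hexagonal": (2.33, 3.82),
    "octagonal": (2.25, 3.55),
}


def spiral_inductance_h(turns: int, d_out_m: float, d_in_m: float, shape: str = "square") -> float:
    """Estimate a planar spiral's inductance (air/oxide core) in henries."""
    if not 0 < d_in_m < d_out_m:
        raise ValueError("need 0 < inner diameter < outer diameter")
    k1, k2 = WHEELER_COEFFS[shape]
    d_avg = (d_out_m + d_in_m) / 2
    fill_ratio = (d_out_m - d_in_m) / (d_out_m + d_in_m)
    return k1 * MU_0 * turns ** 2 * d_avg / (1 + k2 * fill_ratio)


# Hypothetical 5-turn square spiral, 200 um outer and 100 um inner diameter.
print(f"{spiral_inductance_h(5, 200e-6, 100e-6) * 1e9:.2f} nH")
```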

Another embodiment of the invention integrates both the capacitor shown in Figure 10A and the inductor shown in Figure 10C into the power delivery network.

Power distribution is the basic function of supplying the required voltages and currents to the various units of a 3D wafer-level system ("3D system"). The supply voltage may be designed to be constant, with a narrow range of variation across the whole 3D system. The power distribution system of a 3D system is critical to its reliable operation. From a static operating perspective, IR drop is highest near the center of the 3D system and lowest near the Vdd and Vss connections. However, because of the time-varying power demand of the individual circuit/memory blocks, IR drop is a dynamic phenomenon. U.S. Patent 8,273,610, with reference to its Figure 162, describes a hierarchical power distribution system for a 3D system; 8,273,610 is incorporated herein by reference in its entirety. An alternative technique is to distribute a supply voltage at least 10% higher than the voltage required by the circuit blocks. The overdriven supply can accommodate the worst-case dynamic IR drop. For example, as shown in Figure 10E, this may include buck converters or voltage regulators distributed in each region of the 3D system. A region here may refer to a block, a chip, or a unit. The voltage regulator ("VR") may be a DC-DC converter, a low-dropout regulator (LDO), or another form of power regulator. Such a voltage regulator may be dedicated to stabilizing the power supply of its own non-isolated region. In this way, power management can be highly granular and individually controlled for different load blocks. Another alternative is a multi-level distributed power architecture, in which another level of power distribution, or a single intermediate bus converter, is added between the external power source and the regional VR points. The intermediate bus converter may be a separate (isolated) voltage regulator, such as a DC-DC converter, an LDO, or another form of power regulator. The intermediate bus converter may be a discrete module that regulates the external voltage relatively loosely, and it may be integrated within a hierarchical power backplane. Alternatively, the regional VRs may have a digital interface and some programmability, and may be controlled through that interface by a central controller within the 3D system, or by external control signals going into and out of the 3D system.
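The overdrive approach and the regional LDO option trade margin for efficiency. A supply distributed 10% high survives any IR drop smaller than that margin, but an LDO that regulates the excess away dissipates the headroom, so its efficiency is at most V_out/V_in. The sketch below checks both. The voltages, currents, and IR-drop values are illustrative assumptions.

```python
def meets_target(v_required: float, overdrive: float, worst_ir_drop_v: float) -> bool:
    """True if a supply distributed at v_required * (1 + overdrive) still
    delivers v_required after the worst-case IR drop."""
    return v_required * (1.0 + overdrive) - worst_ir_drop_v >= v_required


def ldo_efficiency(v_out: float, v_in: float, i_load: float, i_quiescent: float = 0.0) -> float:
    """Linear-regulator efficiency: Vout*Iload / (Vin*(Iload + Iq))."""
    if v_in < v_out:
        raise ValueError("an LDO cannot step voltage up")
    return (v_out * i_load) / (v_in * (i_load + i_quiescent))


print(meets_target(1.0, 0.10, 0.08))           # True
print(meets_target(1.0, 0.10, 0.12))           # False
print(f"{ldo_efficiency(1.0, 1.1, 1.0):.3f}")  # 0.909
```

With a 10% overdrive, a regional LDO therefore wastes roughly a tenth of the delivered power as heat. A switching (buck) regulator avoids most of that loss at the cost of area and inductors, which connects to the integrated inductor option described earlier.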

In a 3D system, the donor wafer and the host wafer may have different coefficients of thermal expansion. The donor-to-host wafer bonding process should control stress and warpage to produce reliable donor-to-host connections. The host wafer here may refer to a single wafer or a multilayer stacked structure. As studied in Qiu, Yuanying, et al., "Thermal Analysis of Si/GaAs Bonding Wafers and Mitigation Strategies of the Bonding Stresses," Advances in Materials Science and Engineering 2017 (2017), incorporated herein by reference in its entirety, bonding at room temperature reduces stress compared with processes that require elevated temperatures; the study also showed that bonding thinner wafers, for example less than 1 μm thick, reduces stress compared with bonding thicker wafers. Stress can also arise dynamically during field operation of a 3D system, because the structure is heated non-uniformly during operation. Furthermore, when multiple wafers are stacked, stress may accumulate, growing at least proportionally with the number of levels. Techniques that help through intentional stress-relief mechanisms may be integrated into such a 3D system, as shown in Figure 10F. Step s1 shows a host wafer with three levels stacked and connected to adjacent levels with through-wafer vias, such as through-silicon vias (TSVs) or nano-TSVs. To incorporate a damping function that mitigates stress accumulation, an array of stress-damping trenches arranged in the horizontal and vertical directions is patterned, as shown in step s2. Then, as shown in step s3, another wafer is bonded and transferred onto the host wafer. Depending on the number of layers required, stress-damping trenches are formed on the uppermost layer, as shown in step s4, followed by the next wafer transfer. Other types of stress-relief (SR) structures, such as trenches or holes, may also be formed, for example: discontinuous linear SR trenches 1042, square or rectangular SR trenches with rounded or chamfered corners 1044, circular or oval SR trenches 1046, and stress-damping SR holes 1048.
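Why room-temperature bonding helps can be seen from the standard thin-film estimate of thermal-mismatch stress, σ = E/(1 − ν)·|Δα|·|ΔT|: with no temperature excursion between bonding and use, the mismatch term vanishes. The sketch below applies that formula and a linear lower bound for accumulation across levels, following the "at least proportionally" statement above. The material constants (Si ≈ 2.6 ppm/K, GaAs ≈ 5.7 ppm/K, an assumed E and ν) are approximate, illustrative values. This simple thin-film-on-rigid-substrate model does not capture the thickness dependence reported in the cited study.

```python
def thermal_mismatch_stress_pa(e_pa: float, poisson: float, alpha_a: float, alpha_b: float, delta_t_k: float) -> float:
    """Magnitude of biaxial thermal-mismatch stress in a thin bonded layer:
    sigma = E / (1 - nu) * |alpha_a - alpha_b| * |dT|."""
    return e_pa / (1.0 - poisson) * abs(alpha_a - alpha_b) * abs(delta_t_k)


def stacked_lower_bound_pa(per_level_stress_pa: float, levels: int) -> float:
    """Linear accumulation, matching 'at least proportionally' in the text."""
    return per_level_stress_pa * levels


# Illustrative values: Si ~2.6 ppm/K, GaAs ~5.7 ppm/K; E and nu assumed.
sigma_room = thermal_mismatch_stress_pa(130e9, 0.28, 2.6e-6, 5.7e-6, 0.0)
sigma_hot = thermal_mismatch_stress_pa(130e9, 0.28, 2.6e-6, 5.7e-6, 200.0)
print(sigma_room)                   # 0.0
print(f"{sigma_hot / 1e6:.0f} MPa")  # 112 MPa
print(f"{stacked_lower_bound_pa(sigma_hot, 4) / 1e6:.0f} MPa")
```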

Level transfer and hybrid bonding may require special interconnect layers to form the pads/pins, as shown in Figures 5A-5E. Forming such structures beneath active circuitry may require first performing a level transfer and substrate removal, as shown for example in Figures 9B-9D. Wafer processing cost is highly dependent on the type of process line used, as shown in the table of Figure 11. The table of Figure 11 was published in April 2020 in a report titled "AI Chips: What They Are and Why They Matter, An AI Chips Reference," by Saif M. Khan and Alexander Mann, incorporated herein by reference. It shows order-of-magnitude cost and price differences between a 90 nm process line and a 5 nm process line. Therefore, it may be useful to construct a special coupling level, which may include electronic alignment capabilities similar to those of Figures 1A-3C of PCT/US2018/052332, incorporated herein by reference. Such a coupling level may help build heterogeneously integrated 3D systems in which a memory level has between-unit bottom pins, like 907 of Figure 9A, and on-unit top pads, such as 906, 908. Using the coupling level, the between-unit pins 907 may be coupled to the on-circuit pad structure, as shown in Figure 5A, using only hybrid bonding.

A 3D system like 700 may be built with all levels customized for that particular system, or with many of its levels generic, using agreed standards for pin/pad locations and unit size. Accordingly, the coupling level may be made to comply with such 3D heterogeneous integration standards. In some cases, the on-circuit pin/pad locations may be part of the standard, while the pins/pads or control lines within the unit may be customized to better fit a specific memory or other type of circuit technology.

Figure 12A is an X-Z 1202 cross-sectional view of the coupling level. On a removable substrate 1204, switchable bottom pins/pads 1218 are constructed according to the concepts shown in Figures 1A-3C of PCT/US2018/052332. The transistor select 1216, shown in Figure 12B, may be similar to that of Figure 2E of PCT/US2018/052332. The selected signals, labeled BL1-BL4, may be connected to on-circuit pin/pad structures 1214, which may be formed according to the standard. The coupling level may have very simple control circuitry 1212 to perform the electronic alignment selection, such as selecting between GLS and GRS. The structure may include larger pins/pads for power connections (not shown). Control circuit 1212 may use two connected test pins/pads in the target level to measure connectivity, and select between GLS and GRS accordingly. Additional larger pins/pads may be used to connect optional level select control pins. Such level select control signals may be used to disable GLS and GRS. This kind of level select is useful where it is difficult to form a level select in the target wafer, as illustrated by the DRAM discussed with reference to Figure 26A of US 16/558,304, incorporated herein by reference.
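The electronic alignment decision made by control circuit 1212 can be sketched as a simple rule: check which shifted path shows continuity on the test pins/pads, then route the signals BL1-BL4 onto the bottom pads with the matching shift. Everything below is a hypothetical behavioral model. It assumes a one-pad shift and one spare pad, and the actual GLS/GRS circuits are defined in the referenced PCT application, not here.

```python
from typing import List, Optional


def select_alignment(gls_test_ok: bool, grs_test_ok: bool) -> str:
    """Hypothetical rule: pick the shift whose test pin/pad shows continuity
    to the target level; refuse to guess if the result is ambiguous."""
    if gls_test_ok and not grs_test_ok:
        return "GLS"
    if grs_test_ok and not gls_test_ok:
        return "GRS"
    raise ValueError("connectivity test ambiguous or failed")


def route_signals(signals: List[str], choice: str) -> List[Optional[str]]:
    """Place signals on bottom pads; one spare pad lets GLS/GRS shift by one."""
    offset = {"GLS": 0, "GRS": 1}[choice]
    pads: List[Optional[str]] = [None] * (len(signals) + 1)
    for i, name in enumerate(signals):
        pads[i + offset] = name
    return pads


choice = select_alignment(gls_test_ok=False, grs_test_ok=True)
print(choice)  # GRS
print(route_signals(["BL1", "BL2", "BL3", "BL4"], choice))
# [None, 'BL1', 'BL2', 'BL3', 'BL4']
```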

Using a coupling level with level select, or the technique discussed with reference to at least Figure 26A of U.S. Patent Application 16/558,304 (U.S. Patent Publication 2020/0176420), is an alternative to level select within the memory level. Nevertheless, it may be preferable to add the processing steps needed for level select to the memory level processing itself. The type of level select may be designed as part of such an M-level design. Such a design may accommodate a single transistor type, such as n-type, and somewhat relaxed select transistor specifications compensated by other elements of the M level, such as the sense amplifier design, to support in-memory level select, as shown in at least Figures 22C-22E of U.S. Patent Application 16/558,304 (U.S. Patent Publication 2020/0176420A1).

Figure 12A illustrates the coupling level prepared for hybrid bonding to an on-circuit pin/pad structure. If bonding to a between-pin/pad structure is required, a carrier wafer may be used to flip the structure so that it bonds first to the between-pin/pad structure.

The use of level transfer in 3D integration is often referred to as parallel device integration, as opposed to sequential integration. In parallel device integration, two wafers are processed separately (usually through transistor formation and some metallization) and then integrated using a main process step (e.g., hybrid bonding). This concept may be further extended to methods of integrating a 3D system, for example with reference to Figure 7A herein. Such a 3D system may accordingly use more than one type of memory and memory technology. The most common memories in computing systems are SRAM, or the ferroelectric memory developed by FMC, for ultra-fast memory such as caches; DRAM for most fast memory, such as main memory; and NAND flash for high-density memory, such as data storage. Systems may also include NOR-type flash for program code storage, and other types of memory such as cross-point memory, MRAM, or RRAM. In a 3D heterogeneous integrated system, these memories may be integrated by level transfer of memory wafers, each processed on an appropriate wafer fabrication line using the specific processing its memory technology requires. A parallel integration process may be used to complete the integration in stages. The first stage may be processing the required wafers in the appropriate wafer fabrication lines, which may include front-end-of-line processing (transistors) and back-end-of-line processing (interconnects). The second stage may include level transfer to form a "master level" or "memory level", which may be referred to as the M level.

Thus, a memory control wafer (possibly formed through the first stage) may be transferred on top of a memory wafer (possibly formed through the first stage, possibly in a different fab line) to form an M-level wafer. M-level wafers may be stored while awaiting use in 3D systems. After the M-level wafers are formed, and possibly stored, they may be transferred and integrated to form the desired 3D system, as shown in Figure 13. The memory control may include circuits (also called the "memory periphery", especially in 2D devices) such as decoders, sense amplifiers, charge pumps, self-test logic, and similar memory control circuits. It may include vertical connections to the memory level, providing word lines, bit lines, level selects, and so on. The memory control may use hybrid bonding connection techniques, for example those presented with reference to Figures 4A-4C, 5A-5E, and 12A-12B herein, Figures 21A-27D of U.S. Patent Application 16/558,304 (Publication 2020/0176420), and Figures 1A-3C of PCT application PCT/US2018/52332, all of which are incorporated herein by reference in their entirety.

Figure 13A is an X-Z 1302 side view of a wafer region. It illustrates the staged integration of a 3D system. In the first stage, each wafer is processed in its respective process line, such as a logic line for processor level 1320, a DRAM line for fast memory 1318, DRAM memory control 1316, a 3D NAND line for high-density memory 1314, and 3D NAND control logic circuit 1312. Alternatively, the DRAM memory control logic wafer 1316 may be processed in a logic fab different from the DRAM line. Likewise, the 3D NAND control logic wafer 1312 may be processed in a logic fab different from the 3D NAND line. DRAM memory wafer 1318 may include only memory cells. Alternatively, DRAM memory wafer 1318 may include memory cells and some core logic functions, such as sense amplifiers and row/column decoders. 3D NAND wafer 1314 may include only memory cells. Alternatively, 3D NAND wafer 1314 may include memory cells and some core logic functions, such as sense amplifiers, row/column decoders, and control line select gates. DRAM memory control logic circuit 1316 and 3D NAND control logic circuit 1311 include at least one of a data buffer, an address buffer, a control buffer, a mode register, an error correction control circuit, and a built-in test.

In the second stage, the DRAM M level 1334 is formed by flipping and bonding (hybrid bonding) DRAM control circuit 1316 onto DRAM circuit 1318, cutting the substrate backside (for example by etching, grinding, or polishing the DRAM control substrate) to produce bonded structure 1324, and adding a pin/pad level. Similarly, the 3D NAND M level 1332 is formed by flipping and bonding 3D NAND control circuit 1312 onto 3D NAND circuit 1314, cutting the substrate backside (for example by etching, grinding, or polishing the 3D NAND control substrate) to produce bonded structure 1322, and adding a pin/pad level. Then, in the third stage, DRAM M level 1334 is flipped and bonded onto processor level 1320 and the DRAM substrate is cut, giving bonded structure 1330; pin/pad structures are added as needed; then NAND M level 1332 is flipped and bonded onto structure 1330 and the NAND substrate is cut, giving bonded structure 1340.

Memory control signals, such as data paths, addresses, and command lines, may be shared between DRAM M level 1334 and 3D NAND M level 1332. DRAM M level 1334 and 3D NAND M level 1332 may also have their own dedicated control signals.

Figure 13B is an X-Z 1302 side view of an alternative staged integration forming a 3D system. DRAM M level 1334 and 3D NAND M level 1332 may be processed separately and possibly stored. DRAM M level 1334 and 3D NAND M level 1332 are then flipped and bonded onto processor level 1320, for example side by side, to form 3D system structure 1350. Other arrangements are possible, such as touching edges or touching at only one corner, all determined by engineering and manufacturing considerations.

It should be noted that DRAM and 3D NAND are used herein as representative of high-speed/volatile memory and high-density/non-volatile storage, respectively. As other memory technologies become useful, such as SRAM, cross-point memory, PCRAM, RRAM, FRAM, and MRAM, these memories may likewise be integrated into 3D systems using the presented concepts.

As mentioned previously, 3D systems can be built using industry standards for unit size and pin/pad locations. Using a structure such as the M-level may allow compliance with such standards while maintaining flexibility in system architecture. This may be achieved by using the M-level control circuits to aggregate multiple units into an M-level for a specific application.

Many variations of such a flow are possible, including multiple memory levels within one M-level that are first bonded together to form a 3D memory structure, for example as shown in at least Figures 21H, 25C, 25J and 26A of U.S. Patent Application 16/558,304, Publication 2020/0176420, which is incorporated herein by reference in its entirety.

With M-level integration, the per-unit vertical connections of the 3D system can be reduced to a bus format. The vertical bus connections may, for example, include address lines, which may be decoded into word lines and bit lines by the memory control circuit of each M-level. The system-level vertical connections per unit may therefore number around a hundred lines rather than thousands. Feedthrough concepts, such as the per-unit feedthrough 718 in Figure 7B, can be used for such vertical per-unit buses. The vertical lines or pillars could be assigned, for example, as 32 data, 34 address, 4 system type, 16 control, and 14 feedthrough pillars. Specific systems may use more or fewer than 100 pillar lines per unit bus. Such a vertical bus can utilize computer-system bus techniques common in the industry, such as multiplexed data or address lines, or use industry standards such as AMBA, Avalon, and others. Milica Mitić and Mile Stojčev review a range of industrial on-chip bus standards in "An Overview of On-Chip Buses," Facta Universitatis Series: Electronics and Energetics 19.3 (2006): 405-428, which is incorporated herein by reference in its entirety.
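To illustrate the bus format, the sketch below tallies the example pillar allocation from the paragraph above and shows how an M-level memory control circuit might split a bus address into word-line and bit-line indices. It is a minimal sketch under stated assumptions: `BUS_ALLOCATION`, `decode_address`, and the row/column bit split are hypothetical illustrations, not the circuit described in the text.

```python
# Example per-unit vertical bus allocation taken from the text:
# 32 data, 34 address, 4 system type, 16 control, 14 feedthrough pillars.
BUS_ALLOCATION = {
    "data": 32,
    "address": 34,
    "system_type": 4,
    "control": 16,
    "feedthrough": 14,
}


def total_pillars(allocation: dict[str, int]) -> int:
    """Total number of vertical pillars occupied by one unit's bus."""
    return sum(allocation.values())


def decode_address(address: int, row_bits: int, col_bits: int) -> tuple[int, int]:
    """Split a bus address into (word line, bit line) indices.

    Hypothetical model of an M-level memory control circuit that decodes
    the shared address pillars locally. The row/column split is an assumption.
    """
    if address >> (row_bits + col_bits):
        raise ValueError("address does not fit in row_bits + col_bits")
    word_line = address >> col_bits               # upper bits select the word line
    bit_line = address & ((1 << col_bits) - 1)    # lower bits select the bit line
    return word_line, bit_line


print(total_pillars(BUS_ALLOCATION))                        # 100
print(decode_address(0b1010_0011, row_bits=4, col_bits=4))  # (10, 3)
```

Decoding locally in each M-level is what keeps the vertical bus at roughly a hundred pillars. Only the compact address crosses the levels, while the word and bit lines stay inside the M-level.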

Figures 14A-14B are vertical X-Z 1402 cross-sectional views of a region of such a 3D system at different scale factors. Figure 14A shows several units 1406 with a vertical bus 1408 between them. The 3D system can be constructed on a functional substrate 1403, which may include heat removal structures, trench capacitors or integrated inductors, and a power distribution network, as previously described. It also includes a stack 1404 that heterogeneously integrates levels and M-levels, as previously discussed. In addition to the vertical bus, the system may include power buses to support power distribution to the various levels. The vertical power bus can be on the same side of the unit or on another side. Other vertical common pillars may also be used, for example for common clocks and test signals. The unit side dimension may be 200 µm, as often mentioned herein, or other dimensions, including different dimensions in the X and Y directions. Examples include about 0.1 mm, about 0.2-0.4 mm, about 0.4-0.8 mm, about 0.8-1.2 mm, about 1.2-1.6 mm, about 1.6-2.2 mm, about 2.2-3.5 mm, or even greater than about 3.5 mm.
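The choice of unit size sets how many units, and therefore how many per-unit buses, a given area holds. The arithmetic below is a sketch that assumes a hypothetical 10 mm × 10 mm region. It works in integer micrometres and combines the unit count with the roughly 100 pillars per unit bus mentioned earlier in this section.

```python
def units_in_region(region_x_um: int, region_y_um: int,
                    unit_x_um: int, unit_y_um: int) -> int:
    """Number of whole units (possibly non-square) that tile a rectangular region."""
    return (region_x_um // unit_x_um) * (region_y_um // unit_y_um)


# Hypothetical 10 mm x 10 mm region tiled with 200 um x 200 um units.
n = units_in_region(10_000, 10_000, 200, 200)
print(n)         # 2500
print(n * 100)   # 250000 vertical bus pillars at ~100 pillars per unit
# Different X and Y dimensions are allowed, e.g. 200 um x 400 um units.
print(units_in_region(10_000, 10_000, 200, 400))  # 1250
```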

Figure 14A illustrates the use of redundancy for the vertical pillars 1414 of this common vertical bus connection. Figure 14B shows three vertical pillars 1414 carrying the same vertical bus signal. These pillars are connected together to provide a common horizontal signal 1416 for use by the M-level. The M-level control circuits may include decoders and other control circuits, such as bus multiplexing, level selection, and power generation including voltage pump circuits. They may also include other circuits, such as those commonly referred to as memory peripheral circuits.
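The benefit of tying several pillars to one signal can be estimated with a simple probability model. This sketch assumes that pillar open defects are independent and occur with the same hypothetical probability. Under that assumption, a signal is lost only if every one of its redundant pillars is open.

```python
def signal_loss_probability(p_pillar_open: float, redundancy: int) -> float:
    """Probability that a bus signal is lost when `redundancy` pillars carry it.

    The pillars are tied together (as pillars 1414 are), so the signal fails only
    if all of them are open. Assumes independent defects (an idealisation).
    """
    return p_pillar_open ** redundancy


def bus_yield(p_pillar_open: float, redundancy: int, signals: int) -> float:
    """Probability that every signal of one per-unit bus survives."""
    return (1.0 - signal_loss_probability(p_pillar_open, redundancy)) ** signals


# Hypothetical 1% open-pillar probability, 100-signal bus.
print(f"{bus_yield(0.01, 1, 100):.3f}")  # 0.366  (no redundancy)
print(f"{bus_yield(0.01, 3, 100):.6f}")  # 0.999900  (three pillars per signal)
```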

Figure 14B shows a portion of a 3D system with a functional substrate 1411, a processor level 1420, a high-speed M-level 1422, a high-density memory M-level 1424, a horizontal electromagnetic interconnect M-level 1426, and an input/output M-level 1428 for connecting the 3D system to external devices. The 3D system may also include a thermal isolation layer 1421 and a shielding layer 1425. The thermal isolation layer isolates the processor heat from the overlying memory levels. The shielding layer protects the underlying levels from EMI noise that may be associated with the electromagnetic interconnect M-level 1426.

The 3D system of Figures 14A-14B may include coupling levels as previously discussed. It may also include coupling levels that connect the industry standard used in the system to levels or M-levels built for other standards. Such a coupling level can be considered a standard-to-standard coupling level.

Figure 14C is a horizontal X-Y 1432 cross-sectional view of a region of the 3D system. It shows a sub-array of 6x3 units 1438 and the associated side vertical pillars of their buses 1434 and 1436, which may include their redundancy pillars. Figure 14C shows the vertical bus pillars 1434, 1436 as blocking the gaps between units. Even so, it is contemplated that such a 3D system can be designed to support X-Y connections between adjacent units and across units (not shown). Such designs can be made by engineers in the field. They must accommodate the trade-offs associated with the number of vertical pillars and their redundancy, pin/pad design rules, unit size, and other system considerations and design rules.

An additional alternative for accommodating bonding misalignment while still using hybrid bonding is the set of techniques presented with reference to at least Figures 93A-94C of U.S. Patent 8,395,191, which is incorporated herein by reference in its entirety.

Another advantage of using the M-level concept is pre-testing. The concept of contactless or wireless testing has been presented with reference to at least Figure 86C of U.S. Patent 8,395,191, which is incorporated herein by reference in its entirety. This could be used to test M-levels designated for integration into the 3D system. Probe testing or other forms of testing, including self-test and scan-based testing, can be used to test a level. Such tests can flag any unit with faults that cannot be overcome by unit-level redundancy. Pre-testing of this kind can be an important part of 3D system integration for achieving overall system yield. Additionally, the M-level can include post-package repair capabilities. These may take the form of redundant rows and columns of memory cells, address mapping/remapping blocks, built-in test, and antifuses. The M-level can further include soft post-package repair circuitry, as well as on-chip error correction circuitry.
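One way to picture an address remapping block with redundant rows is shown below. This is a minimal sketch: the `RowRepairMap` class, its sizes, and its interface are hypothetical. During pre-test, each faulty row is redirected to a spare row. Once the spares run out, the unit is reported as not repairable, which is when it would be flagged.

```python
class RowRepairMap:
    """Hypothetical address remapping block with a fixed pool of spare rows."""

    def __init__(self, num_rows: int, num_spares: int):
        self.num_rows = num_rows
        # Spare rows are numbered after the regular rows.
        self.free_spares = list(range(num_rows, num_rows + num_spares))
        self.remap: dict[int, int] = {}

    def repair(self, faulty_row: int) -> bool:
        """Map a faulty row to a spare row; return False if no spare is left."""
        if faulty_row in self.remap:
            return True                      # already repaired
        if not self.free_spares:
            return False                     # unit cannot be repaired: flag it
        self.remap[faulty_row] = self.free_spares.pop(0)
        return True

    def physical_row(self, logical_row: int) -> int:
        """Translate a logical row address into the physical row actually used."""
        return self.remap.get(logical_row, logical_row)


m = RowRepairMap(num_rows=1024, num_spares=2)
print([m.repair(r) for r in (17, 300, 512)])                   # [True, True, False]
print(m.physical_row(17), m.physical_row(300), m.physical_row(5))  # 1024 1025 5
```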

In such a manufacturing operation, several advantageous options become available once level and M-level testing has been performed, and before 3D integration using, for example, hybrid bonding. One option is to select high-yield levels and M-levels for 3D integration. Lower-yield levels may then be used for other applications, such as standard memory products or other standard functions. Lower-yield levels could also be integrated, using 3D technology, into structures with fewer levels, where such a yield loss can be accepted or repaired. Another option is to match levels to maximize 3D system yield. Levels are paired so that their faults align and many faulty units overlay other faulty units, which minimizes yield loss. In a unit-based 3D system architecture, each unit has its own vertical connections and power delivery. Such an architecture can support the functionality of the entire system even if some units are faulty and need to be disabled. This can be considered a form of redundancy, or agile system reconfiguration. Accordingly, using tests such as scan-based or other types of built-in self-test ("BIST"), the system may disable units that cannot be repaired through their built-in redundancy.
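Matching levels to align their faults can be framed as a pairing problem over per-level fault maps. The sketch below assumes each tested level is summarized as a set of faulty unit positions. A stacked unit position is good only if no level in the stack is faulty there. The pairing search is exhaustive, so it suits only small pools. For larger inventories an assignment solver (for example, the Hungarian algorithm) would be the usual replacement. The function names and fault maps are hypothetical.

```python
from itertools import permutations


def stack_good_units(fault_maps: list[set[int]], num_units: int) -> int:
    """A unit position is good only if no level in the stack is faulty there."""
    faulty = set().union(*fault_maps)
    return num_units - len(faulty)


def best_pairing(pool_a: list[set[int]], pool_b: list[set[int]], num_units: int):
    """Pair each level in pool_a with one level in pool_b (equal pool sizes),
    maximising the total number of good stacked units. Exhaustive search."""
    best_total, best_pairs = -1, None
    for order in permutations(range(len(pool_b))):
        total = sum(stack_good_units([pool_a[i], pool_b[j]], num_units)
                    for i, j in enumerate(order))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(order))
    return best_total, best_pairs


# Hypothetical fault maps: aligning the faults saves 3 units versus naive pairing.
pool_a = [{1, 2}, {7}]
pool_b = [{7}, {1, 2}]
print(best_pairing(pool_a, pool_b, num_units=10))  # (17, [(0, 1), (1, 0)])
```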

Figure 14E is a vertical X-Z 1442 cross-sectional view of a region of an alternative 3D system 1450. The system includes mixed-granularity M-levels 1444 and a functional substrate 1443. In such an alternative, the upper region of the M-levels may have coarser unit partitions 1448 than the lower partitions 1446. The upper M-levels may include one or more levels comprising high-density memory, for example 3D NAND-type memory. Such memory is associated with longer access times and can support system performance with fewer vertical connections. Other variations of 3D system modularity may be useful in some applications.

Another option for the 3D system, as illustrated in Figure 14E, is to move the process rather than the data, i.e., a memory-centric architecture. For many years, the common practice has been to bring data to the processor to execute the required instructions. As the amount of data keeps growing, another approach may be more effective: bringing the processing to the data. In the 3D systems described herein, a large amount of data can be stored in the 3D system, forming a pooled memory. For example, data related to the United States may be stored at the location marked US data (bubble) 1452. Data related to Europe may be stored at the location marked European data (bubble) 1453. If a search or other operation is to be performed on the US data, the appropriate program may be transferred to a processor close to that data, marked close processor (bubble) 1454. In some systems, the processors may include programmable logic, such as FPGA gates and the associated programmable-logic structures. In that case, the appropriate bitstream for programming the configurable logic can be transferred to the close processor (bubble) 1454 near the designated data 1452. It may also be more efficient to store the processed data near the original data (bubble) 1452. The new program would then be moved to the close processor (bubble) 1454 for the next processing step. Because the data and the processor are in close proximity, the processing energy could be significantly lower and raw performance could be higher.
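The energy argument can be made concrete with a toy cost model. This is a sketch under loudly stated assumptions. Units sit on an X-Y grid, and moving a byte costs one unit of energy per hop. The coordinates and data/program sizes are hypothetical. The code compares shipping the data to a distant processor against shipping a small program to the processor nearest the data, which then reads the data over a single hop.

```python
def hops(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Manhattan distance between two unit coordinates on the X-Y grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_processor(data_loc, processors):
    """Choose the processor unit with the fewest hops to the data."""
    return min(processors, key=lambda p: hops(p, data_loc))


def transfer_cost(num_bytes: int, src, dst, cost_per_byte_hop: int = 1) -> int:
    """Relative energy to move num_bytes from src to dst (hypothetical linear model)."""
    return num_bytes * hops(src, dst) * cost_per_byte_hop


us_data = (1, 1)                                 # hypothetical location of "US data" 1452
host = (5, 4)                                    # a distant processor unit
processors = [(0, 0), (2, 1), (5, 4)]
data_bytes, program_bytes = 1_000_000, 10_000    # hypothetical sizes

near = nearest_processor(us_data, processors)    # plays the role of close processor 1454
move_data = transfer_cost(data_bytes, us_data, host)
move_program = (transfer_cost(program_bytes, host, near)     # ship the program
                + transfer_cost(data_bytes, us_data, near))  # local data access
print(near, move_data, move_program)  # (2, 1) 7000000 1060000
```

In this model the saving grows with the data-to-program size ratio and with the distance to the original processor. Real costs would also include bus arbitration and memory access energy, which are not modelled here.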

The processor level itself may be an M-level with processor program memory. The L1 cache level may be integrated using 3D technology, as described herein and in the techniques incorporated by reference.

The 3D systems presented herein with reference to Figures 13A-14E relate to various heterogeneous structures of modular 3D systems. An M-level may have very high connectivity between the memory control level and the memory level. This can amount to hundreds or thousands of vertical connections per unit for bit lines and word lines, plus additional controls as needed, such as level selection. Such vertical connections may utilize hybrid bonding and pin/pad structures similar to those presented herein with reference to Figures 5A-5E. They may also use those presented with reference to at least Figures 21H, 25C, 25J and 26A of U.S. Patent Application 16/558,304, Publication 2020/0176420, the entire contents of which are incorporated herein by reference. They may also use techniques cited herein, for example electronic alignment. Other techniques, such as what is referred to as smart alignment in the techniques incorporated by reference, may also be used. Such rich per-unit vertical connectivity can be used within the M-level. The 3D system as a whole can instead use looser vertical connectivity by leveraging the per-unit vertical bus concept, for example as described with reference to Figures 14A-14C. Accordingly, each level in the 3D system can support the per-unit vertical bus connections. Some levels may support the bus only as a feedthrough. Other levels may also connect to the bus, or connect buses between levels of the system. Referring to Figure 7G, a vertical bus signal 758 is shown feeding the memory control 739 of the M-level, and also feeding the memory level 752 via 755. As mentioned previously, the design of the memory levels can include feedthrough pillars to support connection of the vertical per-unit system bus. Accordingly, a 3D system can include modest vertical connectivity per unit, such as about a hundred pillars per unit bus. It can also include rich connectivity within the M-level, such as a thousand pillars per unit, to connect the memory control level to the memory array word lines and bit lines 753. Different vertical connection and alignment techniques may be used for the vertical bus and for the internal vertical connections within each M-level.

In some 3D systems, the vertical connections may include more than one vertical bus per unit. These vertical buses can have different functions. For example, a vertical bus connecting the memory M-levels to the processor level may be called the M bus. An additional vertical bus connecting the X-Y connectivity M-level to the processor level may be called the C bus. In some systems the M bus might not even extend to the X-Y connectivity M-level. In some systems the C bus may do more than merely feed through the memory M-levels. The C bus may be similar to the M bus, or quite different, for example utilizing a different industry bus standard. Each functional bus can be further split, for example into a bus for high-speed memory, which may be called the SM bus, and a bus for high-density memory, which may be called the DM bus. The SM bus could be designed for high speed, for example using a wide data path of more than 16 data pillars within the bus. The DM bus could instead be designed for high integrity, for example with built-in redundancy and error correction features.
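Which levels must terminate a bus and which only pass it through follows from the bus endpoints and the stack order. The sketch below uses the bottom-to-top order of Figure 14B. It assumes hypothetical endpoints for the SM, DM and C buses. It lists the levels that need only feedthrough pillars for a bus, and the buses whose pillars reach a given level.

```python
# Bottom-to-top level order, following the stack of Figure 14B.
STACK = ["processor", "SM memory", "DM memory", "X-Y connect", "I/O"]

# Hypothetical bus endpoints: (lowest level served, highest level served).
BUSES = {
    "SM": ("processor", "SM memory"),
    "DM": ("processor", "DM memory"),
    "C": ("processor", "X-Y connect"),
}


def feedthrough_levels(bus: str) -> list[str]:
    """Levels strictly between a bus's endpoints; they only pass the bus through."""
    lo, hi = sorted(STACK.index(level) for level in BUSES[bus])
    return STACK[lo + 1:hi]


def buses_reaching(level: str) -> list[str]:
    """Buses whose pillars reach the given level (as feedthrough or termination)."""
    i = STACK.index(level)
    result = []
    for name, (a, b) in BUSES.items():
        lo, hi = sorted((STACK.index(a), STACK.index(b)))
        if lo < i <= hi:
            result.append(name)
    return result


print(feedthrough_levels("C"))     # ['SM memory', 'DM memory']
print(buses_reaching("DM memory"))  # ['DM', 'C']
```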

In some systems, a unit may have sub-units, as shown in Figure 14D. This X-Y 1442 cross-sectional view shows an example unit 1430 with a communication processor sub-unit 1432 and 16 AI processor sub-units 1436. The communication processor 1432 may have a C bus 1434 for communicating with the X-Y connectivity M-level. It may also have an M bus connecting it to its overlying memory. Each AI processor 1436 may have an M bus connecting it to its overlying memory. Additionally, the processor level may have a horizontal bus (not shown) connecting the AI processors to the communication processor 1432. The sub-units 1436 may have a side dimension of 100 µm, or other dimensions as mentioned previously for unit dimensions. The system may include a mix of different unit types optimized for different types of tasks. Engineers in the field can design many other variations of these concepts. Such variations can yield 3D systems capable of efficient parallel processing, as well as serial processing with effective connectivity across the system.

An additional alternative is to extend the M bus to a larger number of data pillars, for example 80, 160, or even more than 320. Such an extended M bus increases data communication between the processing level and the memory levels, supporting higher overall processing speed and performance.
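The effect of widening the M bus is linear in the number of data pillars. The sketch below assumes a hypothetical per-pillar signalling rate of 2 Gb/s and ignores protocol overhead. It computes the peak per-unit bandwidth for the data widths mentioned above.

```python
def unit_bandwidth_gbytes(data_pillars: int, gbit_per_pillar: float) -> float:
    """Peak per-unit M bus bandwidth in GB/s (no protocol overhead)."""
    return data_pillars * gbit_per_pillar / 8


for width in (32, 80, 160, 320):
    print(width, unit_bandwidth_gbytes(width, 2.0))
# 32 8.0
# 80 20.0
# 160 40.0
# 320 80.0
```

Because every unit has its own M bus, the aggregate memory bandwidth of the 3D system scales with both the bus width and the number of units.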

Thanks to ultra-wide data buses and unit-level partitioning of the memory array, memory levels based on 3D NAND technology could provide reasonable data rates and serve the system as high-speed memory. Such 3D NAND technology could be modified to use extremely thin tunnel oxides. This trades retention time for faster write and erase times and better endurance, as discussed in at least U.S. Patent 10,515,981 and PCT Application PCT/US2018/016759, which are incorporated herein by reference. Samsung has practiced modifying 3D NAND technology into ultra-low-latency memory with its product line called Z-NAND. Such a concept could be further enhanced by using extremely thin tunnel oxides and very wide data buses. The memory array could also be partitioned into hundreds of units that leverage stacking the memory control over the 3D NAND memory array, as described herein and in some of the references incorporated by reference.

In general, the 3D systems presented herein may resemble existing systems that use printed circuit boards ("PCBs") to connect chips and packages. Many system architectures of such PCB-integrated systems could be mapped onto the vertical 3D systems presented herein.

The M-level concept can be extended beyond memory to other functional elements of the 3D system, for example an X-Y interconnect using electromagnetic waves. A connectivity M-level may include a control level, a modulation and decoding level, and a transmission line/waveguide level. The X-Y interconnect controller can then use the vertical bus connections and propagate the information to the X connection channels and the Y connection channels.
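The text does not specify how the X-Y interconnect controller chooses between the X and Y channels. One common network-on-chip scheme is dimension-order routing, sketched below purely as an assumption: a message first travels along the X channel to the destination column, then along the Y channel to the destination row. The `xy_route` helper is hypothetical.

```python
def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Dimension-order route: move along X first, then along Y.

    Returns every unit coordinate visited, including src and dst.
    """
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:                 # X connection channel
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:                 # Y connection channel
        y += step_y
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Dimension-order routing is deterministic and deadlock-free on a mesh. That makes it a simple baseline, but a real controller could use adaptive routing instead.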

For thin-wafer transfer processes, electrostatic chucks are becoming increasingly popular as wafer handlers. Their advantages are long-term electrostatic holding capability and reversible attachment of thin wafers. When wafers are stacked to form a wafer-level 3D system, as shown in Figure 13A, degradation of transistor oxides may occur due to electrostatic discharge (ESD) current stress. Such gate oxide degradation can lead to functional failure and a shortened lifetime. To protect the gate oxide during the hybrid bonding process of the nano-TSVs, an ESD protection function can be connected to the nano-TSVs. Figure 14F shows an enlarged view of the ESD protection functions connected to a nano-TSV and the internal logic. A variety of ESD structures are known in the art and can be used to support 3D system manufacturing; these ESD structures are usually quite large. Herein, the terms nano-TSV and vertical bus pillar may be used interchangeably. For 3D systems and their vertical bus pillars, as shown in Figure 14B, these structures can be scaled down by 1:10 or even 1:100 to fit within a rectangle of approximately 1×1 µm². Alternatively, multiple vertical pillars 1414 may be designated to carry the same signal for redundancy, as in Figure 14B. In that case a single shared ESD structure could support them, with a rectangular footprint of about 2×2 µm². Figure 14F shows structure 1460 with multiple alternative ESD structures 1462, 1464, 1466, 1468, 1470, 1472. Depending on which wafer surfaces are bonded, the connections can be front-to-front, front-to-back, or back-to-back.

The ESD protection function can be a single device or a circuit, and it shunts or bypasses ESD current with a limited voltage drop. In conventional CMOS, ESD protection of I/O pads is widely used. Examples are described in Lin, Chun-Yu, "Low-C ESD Protection Design in CMOS Technology," Electrostatic Discharge: From Electrical Breakdown in Micro-gaps to Nano-generators, IntechOpen, 2019. Further examples appear in Wang, Albert Z. H., On-Chip ESD Protection for Integrated Circuits: An IC Design Perspective, Vol. 663, Springer Science & Business Media, 2006. Both are incorporated herein by reference. Similar ESD devices or ESD circuits may be used in the present invention. Assuming the internal circuit operates between Vdd and Vss, the ESD function does not conduct in the normal voltage range. For ESD-induced voltages greater than Vdd, the ESD function shunts the ESD current. Common ESD protection functions include diodes, MOSFETs, silicon-controlled rectifiers (SCRs), and combinations thereof, as shown in Figure 14F. Several options (without limitation) are shown there as cross-sectional views.

One option for the ESD function is a grounded-gate N-type MOSFET (GGNMOS) 1464. Its gate, source, and body are grounded to keep it off during normal operation. This type of ESD was studied in Lee, Jian Xing, et al., "Dynamic Current Distribution of Multi-Finger GGNMOS under High Current Stress and HBM ESD Events," 2006 IEEE International Reliability Physics Symposium Proceedings, IEEE, 2006, which is incorporated herein by reference. Another option is a silicon-controlled rectifier (SCR) 1468, consisting of a PNP BJT and an NPN BJT. The positive feedback of the cross-coupled PNP and NPN produces the ESD shunting. The NPN and PNP BJTs can be defined by shallow trench oxide or dummy gates. This type of ESD was studied in Ker, Ming-Dou, and K.-C. Hsu, "Overview of On-Chip Electrostatic Discharge Protection Design with SCR-Based Devices in CMOS Integrated Circuits," IEEE Transactions on Device and Materials Reliability 5.2 (2005): 235-249, which is incorporated herein by reference. A further option is an SCR device with a trigger terminal connected to the base of one of the BJTs. The trigger terminal can be further coupled to another device, such as the gate of a MOSFET, the substrate, or a diode. A GGNMOS-coupled SCR is drawn as example 1464.

Diode-type ESD forms a unidirectional discharge path. The dual-diode ESD 1470 uses two unidirectional ESD devices to form a bidirectional discharge path. In CMOS processes, an n-well with a p+ diffusion and a p-well with an n+ diffusion can form the two diodes of a bidirectional ESD. The n+ and p+ diffusion regions can also be separated by STI or dummy gates. In another example, diodes can be stacked to reduce parasitic capacitance or provide a higher trigger voltage, as shown with the stacked bidirectional diodes. This type of ESD was studied in Son, Minoh, and Changkun Park, "Electrostatic Discharge Protection Device Using Distributed Cell-Based Diode Series," Electronics Letters 50.3 (2014): 168-170, which is incorporated herein by reference. In another example, an SCR can be embedded in stacked diodes, as shown in the stacked diodes with embedded SCR 1462. This structure was studied in Lin, Chun-Yu, et al., "Improving ESD Robustness of Stacked Diodes with Embedded SCR for RF Applications in 65-nm CMOS," 2014 IEEE International Reliability Physics Symposium, IEEE, 2014, which is incorporated herein by reference. This example provides bidirectional discharge in a single ESD device. The cross-sectional view of the SCR and diode shows that the SCR path handles ESD discharge between the nTSV and Vss. The diode path handles ESD discharge between the nTSV and Vdd.
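The operating rule described above can be captured in an idealized behavioral model of the dual-diode ESD 1470. In the normal window the clamp stays off; once the nTSV voltage leaves that window by about a diode drop, one branch conducts. This is a sketch with hypothetical values (Vdd = 1.0 V, Vss = 0 V, 0.7 V forward drop). It ignores on-resistance, trigger dynamics and the SCR variants.

```python
def dual_diode_path(v_tsv: float, vdd: float = 1.0, vss: float = 0.0,
                    v_forward: float = 0.7) -> str:
    """Which branch of an idealised dual-diode ESD clamp conducts.

    Assumed model: the diode from the nTSV to Vdd turns on once the nTSV rises
    one forward drop above Vdd. The diode from Vss to the nTSV turns on once the
    nTSV falls one forward drop below Vss. Neither conducts in the normal range.
    """
    if v_tsv > vdd + v_forward:
        return "shunt to Vdd"
    if v_tsv < vss - v_forward:
        return "shunt to Vss"
    return "off"


for v in (0.5, 5.0, -3.0):
    print(v, dual_diode_path(v))
# 0.5 off
# 5.0 shunt to Vdd
# -3.0 shunt to Vss
```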

A small ESD structure, for example about 1:100 the size of a conventional ESD, can be integrated in each M-level to support its vertical bus pillars. M-levels can be designed for pre-testing to support the various integration strategies discussed earlier. The test process may involve electrostatic charge, so appropriate ESD protection may be important to support M-level pre-testing and integration strategies. If needed, a full level with conventional ESD can be integrated as an ESD level, for applications that may require protection from high-voltage ESD. ESD design engineers can design the scaled-down ESD to support such an M-level. The design would be based on the chosen final thickness of the M-level silicon substrate, after transfer bonding and thinning.

The wafer-level 3D systems presented herein may require redundancy and yield repair, or yield flexibility, to become a commercially viable technology. This has been addressed herein and in the techniques incorporated by reference. Examples include those presented with reference to Figures 35A-35C and 38A-38C of U.S. Patent Application 16/558,304 (Publication 2020/0176420), incorporated herein by reference. Additional 3D-based redundancy and repair techniques are presented with reference to Figures 17 and 24A-44B of U.S. Patent 8,994,404, incorporated herein by reference. Each M-level in the 3D system may include its own self-test and repair techniques, as is known in the art for memories and mission-critical circuits. Additional techniques for 3D systems may include adding redundant M-levels, such as a second, backup X-Y connectivity M-level, or adding a redundant vertical bus to each unit. These redundant levels can be connected so as to enhance the system and provide fault tolerance, defect tolerance, and graceful aging.

The 3D system described herein utilizes many units, each having a processor and memory, which can be interconnected using an X-Y connection level. Such systems are sometimes called a "network on a chip" (NoC). Such a system can manage defects by activating spare units to replace defective units, or by providing task-allocation capability that assigns workloads to the available good units. Complex systems with such self-healing and operational flexibility are well known in the art and are used, for example, in server farms and other multi-computer systems. These techniques can include a monitor circuit, commonly called a "watchdog," which a properly operating unit periodically triggers to announce that it is in good operating condition. If the watchdog goes too long without such a trigger, it may activate a unit failsafe mode. Once a faulty unit is detected, the watchdog circuit can activate a controlled vertical-bus disconnect to isolate the faulty processor from the vertical bus, thereby preventing the faulty unit from affecting the operation of the other units of the 3D system. The circuit can also initiate a processor restart to overcome a temporary failure and restore unit operation. If the fault is permanent, then in addition to bus isolation, the watchdog circuit can control the processor's main operating clock circuit to limit further damage to the failed unit's processor and to reduce its power consumption. Additionally, the 3D system may include a system process in which each unit is periodically pinged by the 3D system's task-allocator processor. If the task-allocator processor deems a unit faulty, a recovery operation can be activated to assign a spare unit to replace the failed unit. Alternatively, the 3D system may have the flexibility to redistribute system tasks among the operating units. Those skilled in the field of large-scale multi-computer systems can engineer such built-in test, detection, and recovery techniques into the design of the 3D system.
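
The watchdog-and-spare behavior described above can be sketched as a small software model. This is a simplification under stated assumptions: the tick-based timeout, the single restart attempt before declaring a permanent fault, and the names `Watchdog`, `trigger`, `tick`, and `reassign` are hypothetical, chosen only to show the sequence of isolating the unit from the vertical bus, attempting a restart, gating the clock on a permanent fault, and having the task allocator move work onto a spare unit.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Set


class UnitState(Enum):
    OK = auto()
    RESTARTING = auto()  # isolated from the vertical bus while restarting
    FAILED = auto()      # permanent fault: bus isolated and clock gated


@dataclass
class Watchdog:
    timeout: int           # ticks allowed between "alive" triggers
    max_restarts: int = 1  # restarts attempted before declaring a permanent fault
    counter: int = 0
    restarts: int = 0
    state: UnitState = UnitState.OK
    bus_connected: bool = True
    clock_enabled: bool = True

    def trigger(self) -> None:
        """Called periodically by a healthy unit processor."""
        if self.state is UnitState.FAILED:
            return
        self.counter = 0
        if self.state is UnitState.RESTARTING:
            # The restart worked: reconnect the unit to the vertical bus.
            self.state = UnitState.OK
            self.bus_connected = True

    def tick(self) -> None:
        """Advance the watchdog by one clock tick."""
        if self.state is UnitState.FAILED:
            return
        self.counter += 1
        if self.counter <= self.timeout:
            return
        # Timed out: isolate the unit from the vertical bus first.
        self.bus_connected = False
        self.counter = 0
        if self.restarts < self.max_restarts:
            self.restarts += 1
            self.state = UnitState.RESTARTING
        else:
            self.state = UnitState.FAILED
            self.clock_enabled = False  # stop the clock: less damage, less power


def reassign(tasks: Dict[str, int], failed: Set[int],
             spares: List[int]) -> Dict[str, int]:
    """Task-allocator step: move tasks off failed units onto spare units."""
    free = list(spares)
    result = {}
    for task, unit in tasks.items():
        if unit in failed:
            if not free:
                raise RuntimeError("no spare unit left")
            unit = free.pop(0)
        result[task] = unit
    return result
```

With `timeout=3`, a fourth tick without a trigger isolates the unit and starts a restart; a trigger afterwards reconnects it, while a second timeout marks it as failed and disables its clock. `reassign({"a": 1, "b": 2}, {2}, [9])` would then move task `b` to spare unit 9.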

Another alternative for such a 3D system is to construct a level by transferring multiple dies rather than one wafer, as shown in Figures 43A-43E of U.S. Patent Application No. 16/558,304 (Publication No. 2020/0176420), incorporated herein by reference. Such die-level transfer can also use a technique called "collective die-to-wafer direct bonding," presented in Inoue, Fumihiro, et al., "Advanced dicing technology for wafer-to-wafer and collective chip-to-wafer direct bonding," 2019 IEEE 69th Electronic Components and Technology Conference (ECTC), IEEE, 2019; see also Nick Flaherty, "Collective die-to-wafer bonding for 3D packaging with below 2 µm accuracy," eeNews Europe, October 19, 2020; and Brandstatter, Birgit, et al., "High-speed ultra-accurate direct C2W bonding," 2020 IEEE 70th Electronic Components and Technology Conference (ECTC), IEEE, 2020; all of the above are incorporated herein by reference in their entirety. Such die-level transfer can use the M-level concept: dies are transferred to a base layer, forming what may be called a die-level M-level (DieM level), which is then transferred as a whole onto the 3D system stack.

This DieM-level concept can be used for an X-Y connection M-level utilizing lasers, photodetectors, and waveguides, as shown in at least Figures 35A-37B of U.S. Patent Application No. 16/558,304 (Publication No. 2020/0176420), incorporated herein by reference. Such a DieM level can be implemented with silicon photonics, including photodetectors made of silicon-germanium alloys. The wavelength of the photonic connection may be about 1.3 µm or about 1.5 µm, although other useful wavelengths are possible. Such a DieM level may be part of a 3D system, such as reference numeral 1447 in Figure 14E herein. An example is given with reference to Figures 15A-15D, which are X-Z cross-sectional views 1502.

Figure 15A shows a drive and control wafer 1504 with waveguides 1512 positioned over control and drive circuitry 1514, which sits on a cut layer (e.g., SiGe 1516) over a substrate 1518. The drive and control wafer 1504 may include vertical connection pads 1506 for connecting it to one or more laser diode chips 1520, which may be bonded on top, and transparent vias 1508 for directing the laser beam to a beam splitter and direction-changing element 1510, and thereby directing the laser beam(s) into the appropriate waveguide. Techniques for processing such waveguide and optical interconnect structures are known in the art and are presented, for example, in at least U.S. Patents 5,485,021, 5,987,196, 6,791,675, 7,203,387, 8,548,288, and 9,197,804, and in Lo, Shih-Shou, Mou-Sian Wang, and Chii-Chang Chen, "Semiconductor hollow optical waveguides formed by omni-directional reflectors," Optics Express 12.26 (2004): 6589-6593, all of which are incorporated herein by reference. Laser diode chip 1520 may also be built on a substrate 1530 with an optional cut layer 1528. Laser diode chip 1520 may include many diodes, with each diode's pins/pads connected to the transparent-via outputs and to supporting structures such as ground/power connections. The laser diodes may be built in a crystal 1526 that is well suited to laser generation, such as GaAs, InP, GaSb, or GaN. Crystal layer 1526 may be a different material than substrate 1530. For example, crystal layer 1526 may be a crystalline direct-bandgap semiconductor grown on silicon substrate 1530 through a buffer layer. Alternatively, a die of crystalline direct-bandgap semiconductor may be transferred and bonded onto silicon substrate 1530. The laser diode chip may include pins 1522 and transparent vias 1524. In many cases, the crystals used for laser diodes are not available as 300 mm wafers, so die-level transfer may be preferable for 3D integration applications.

Figure 15B shows several laser diode chips 1520 bonded on top of the drive and control wafer 1504.

Figure 15C shows the bonded structure 1540 after thinning the substrate of laser diode chips 1520. If laser diode chips 1520 have a built-in cut layer, such as cut layer 1528 shown in Figure 15A, it can be used for the thinning step. Many crystals used for laser diodes are built by epitaxial growth on another crystal; this process can be used to form an etch-stop cut layer between the carrier substrate and the laser diode crystal. After thinning, other process steps can be used, such as conformal oxide deposition to fill the gaps between the laser diode chips, followed by CMP for planarization. If desired, connection pins/pads can then be formed on the new top surface.

Figure 15D shows the structure of Figure 15C after it has been flipped onto another substrate 1548 with cut layer 1546 and its substrate 1538 has been removed. The structure of Figure 15D can be prepared as a DieM level by adding pads/pins (not shown) for future vertical connection to the C bus.

Optical X-Y interconnects can also be built in silicon, making them easier to integrate into the 3D systems presented here, as described in Xu, Kaikai, et al., "Silicon light-emitting devices for fast optical interconnection and fast sensing applications in the GHz frequency range in standard IC technology," Optoelectron. Adv. Mater.-Rapid Commun. 11 (2017): 164-166, and in Snyman, Lukas W., et al., "Stimulating 700-900 nm wavelength light emission from Si AMLEDs and coupling into Si3N4 waveguides using RF silicon integrated circuit processes," OSA Continuum 3.4 (2020): 798-813, both of which are incorporated herein by reference in their entirety.

As shown between Figures 15B and 15C, after bonding the chip substrate to the target wafer, the chip substrate can be thinned by grinding and wet-chemical/plasma etch-back. For silicon-based chips, a SiGe-based cut layer can enable extreme thinning, even to a final thickness below 500 nm. In some cases, thinning of the chip substrate may use other forms of etch stop, or may be carried out without a cut layer to a less extreme thickness, such as 20 or 10 µm. The process would be engineered to best fit the specific product and structural requirements. Many of these engineering trade-offs and possibilities have been discussed in the various works incorporated by reference.

Some techniques and process flows for integrating optical waveguides have been presented in U.S. patents, such as at least 10,587,026 and 10,770,414, both of which are incorporated herein by reference in their entirety.

In some 3D systems, the need for X-Y connections can justify adding an RF-connection M-level, referred to herein as the RF-M level. While optical connections can provide excellent bandwidth and minimal crosstalk, they are relatively bulky and use components that are uncommon in silicon processing. Optical connections can be used for longer X-Y connections, for example longer than 150 mm, while RF connections may be preferred in the range of 10 mm to 300 mm. Some 3D systems may use more than one technology for X-Y connectivity, just as more than one type of memory technology may be needed, as shown in Figure 15E, which reproduces Figures 8 and 9 of Tam, Sai-Wang, "Wired/wireless RF interconnects for future SoCs," 2011 IEEE International Symposium on Radio-Frequency Integration Technology, IEEE, 2011, incorporated herein by reference.

As described in the following publications, radio-frequency interconnect (RF-I) has been considered a technology of choice for on-chip multicore interconnect supporting NoCs: Kaplan, Adam Blake, and Glenn Reinman, "Architectural integration of RF-interconnect to enhance on-chip communication for many-core chip multiprocessors," University of California, Los Angeles, 2008; Chang, M-C. Frank, et al., "RF interconnects for communications on-chip," Proceedings of the 2008 International Symposium on Physical Design, 2008; Chang, M. Frank, et al., "CMP network-on-chip overlaid with multi-band RF-interconnect," 2008 IEEE 14th International Symposium on High Performance Computer Architecture, IEEE, 2008; LaRocca, Tim, Jenny Yi-Chun Liu, and Mau-Chung Frank Chang, "60 GHz CMOS amplifiers using transformer-coupling and artificial dielectric differential transmission lines for compact design," IEEE Journal of Solid-State Circuits 44.5 (2009): 1425-1435; Tam, Sai-Wang, et al., "Simultaneous tri-band on-chip RF-interconnect for future network-on-chip," 2009 Symposium on VLSI Circuits, IEEE, 2009; and Wu, Hao, et al., "A 60 GHz on-chip RF-interconnect with λ/4 coupler for 5 Gbps bi-directional communication and multi-drop arbitration," Proceedings of the IEEE 2012 Custom Integrated Circuits Conference, IEEE, 2012; all of the above are incorporated herein by reference in their entirety. Items 1551 and 1552 of Figure 15F illustrate transmission line configurations presented in, and reproduced from, this work. Such transmission lines ("TLs") may have a pitch of approximately 12 µm and may be formed on top of the active CMOS circuitry using upper metal layers such as M7 and M8. These publications show that a set of such TLs can be arranged in a meandering topology connecting many computing cores, as shown at 1557 in Figure 15F herein.

Additional work is presented in the following publications: Maekawa, Tomoaki, Hiroyuki Ito, and Kazuya Masu, "8 Gbps 2.5 mW on-chip pulsed-current-mode transmission line interconnect with stacked-switch Tx," ESSCIRC 2008, 34th European Solid-State Circuits Conference, IEEE, 2008; Carpenter, Aaron, et al., "A case for globally shared-medium on-chip interconnect," Proceedings of the 38th Annual International Symposium on Computer Architecture, 2011; Carpenter, Aaron, et al., "Enhancing effective throughput for transmission line-based bus," 2012 39th Annual International Symposium on Computer Architecture (ISCA), IEEE, 2012; and Carpenter, Aaron, et al., "Using transmission lines for global on-chip communication," IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2.2 (2012): 183-193; all of which are incorporated herein by reference in their entirety. These works present a similar concept but with wider TLs, as shown at 1554 in Figure 15F, with a pitch of approximately 45 µm.

Follow-up work, such as Drillet, Frédéric, et al., "Flexible radio interface for NoC RF-interconnect," 2014 17th Euromicro Conference on Digital System Design, IEEE, 2014, incorporated herein by reference in its entirety, demonstrates a similar concept with still wider TLs, as shown at 1556 in Figure 15F, with a pitch of approximately 75 µm.

These on-chip RF-I techniques can be applied to the 3D system described herein, as shown in Figure 15G. The meandering concept can be changed to an X-Y connection structure similar to that shown in Figure 33A of U.S. Application 16/558,304, incorporated herein by reference. The choice of TL configuration can be engineered to suit the X-Y dimensions of the 3D system and other parameters, such as the unit size and the X-Y connectivity requirements. Although the TLs shown in Figure 15F are coplanar strips, other variants have been developed and could be integrated into such a 3D system. The TLs can be constructed as repeating bands in the X direction overlaid by repeating bands in the Y direction. The bands may include a mixture of TL shapes: tight pitch for shorter distances, and wide pitch with low attenuation for longer distances. These bands of TLs routed over the units may include drop connections to every unit, or some TLs may skip some units, providing options for the system architecture and usage protocols. An important aspect of such TLs is the strong dependence of attenuation per millimeter of TL length on the carrier frequency and the TL pitch, as shown in graph 1555 of Figure 15F.

Engineering such 3D systems can include using high frequencies for relatively short connections and low frequencies for relatively long connections. In some cases, a TL can carry a high-frequency local connection in one region while a distant region reuses the same frequency for another local connection, exploiting the attenuation of the first region's signal. The width and thickness of the TL can be tuned similarly, using wider, thicker lines for longer distances and higher frequencies.
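
The distance-versus-frequency trade-off and the frequency-reuse idea can be sketched as follows. The attenuation table is entirely hypothetical placeholder data (it is not taken from graph 1555), and the linear dB-per-mm loss model, the loss budget, and the isolation threshold are assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical attenuation (dB per mm) by carrier frequency (GHz) for one TL
# geometry. Placeholder numbers only; real values depend on TL pitch and
# thickness and would come from measurement or simulation.
ATTEN_DB_PER_MM: Dict[int, float] = {10: 0.05, 30: 0.10, 60: 0.20, 90: 0.30}


def choose_carrier(distance_mm: float, loss_budget_db: float,
                   atten: Dict[int, float] = ATTEN_DB_PER_MM) -> Optional[int]:
    """Highest carrier whose end-to-end loss fits the budget (None if none fits)."""
    fitting = [f for f, a in atten.items() if a * distance_mm <= loss_budget_db]
    return max(fitting) if fitting else None


def can_reuse(separation_mm: float, carrier_ghz: int, isolation_db: float,
              atten: Dict[int, float] = ATTEN_DB_PER_MM) -> bool:
    """True if a carrier has decayed enough over the separation to be reused."""
    return atten[carrier_ghz] * separation_mm >= isolation_db


print(choose_carrier(5, 3.0))    # short link
print(choose_carrier(20, 3.0))   # longer link
print(can_reuse(100, 90, 20.0))  # distant region reusing the same carrier
```

Under these placeholder numbers a 5 mm link can use the 90 GHz carrier while a 20 mm link falls back to 30 GHz, and a 90 GHz carrier has decayed by about 30 dB after 100 mm, which would allow it to be reused for another local link if 20 dB of isolation were enough.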

Figure 15G is an X-Z cross-sectional view 1503 of a 3D system similar to that shown in Figure 14E. Figure 15G shows an additional RF-M level 1570 disposed below the optical X-Y interconnect M-level 1547. The process of forming RF-M level 1570 can be similar to the process shown in Figure 13A, except that 1314 can be a level of an array of RF receivers, transceivers, and supporting circuitry, with TL metal layers in the X direction and in the Y direction, and 1312 can be a level of RF-oriented processors designed to support the communication protocols of the various communication links. Thus, RF-M level 1570 may include level 1572 of RF-oriented processors; level 1573, which may provide RF shielding and RF matching, such as a high-defect substrate; and level 1574, which may provide RF receivers, transceivers, and supporting circuitry. This level can utilize RF-oriented crystals such as SiGe. Level 1575 may provide a bundle of Y-oriented TLs, level 1576 a bundle of X-oriented TLs, level 1577 a bundle of Y-oriented TLs, and level 1578 a bundle of X-oriented TLs. Substrates designed to support RF circuitry on top are known in the art and are commonly offered as SOI high-resistivity (HR) substrates or trap-rich HR substrates, as presented, for example, in Neve, Cesar Roda, and Jean-Pierre Raskin, "RF harmonic distortion of CPW lines on HR-Si and trap-rich HR-Si substrates," IEEE Transactions on Electron Devices 59.4 (2012): 924-932, incorporated herein by reference in its entirety. Another option is to use porous layers, as described in Sarafis, Panagiotis, and Androula G. Nassiopoulou, "Porous silicon as a substrate for the monolithic integration of RF and millimeter-wave passive devices (transmission lines, inductors, filters, and antennas): Current state of the technology and perspectives," Applied Physics Reviews 4.3 (2017): 031102, and in Gautier, Gael, and Philippe Leduc, "Porous silicon for electrical isolation in radio frequency devices: A review," Applied Physics Reviews 1.1 (2014): 011101, all of which are incorporated herein by reference in their entirety. Integrating such RF-friendly substrates can help reduce power and improve the performance of such 3D systems, and can take advantage of the architecture and layer-transfer techniques presented herein and in the incorporated art.

One of the advantages of RF X-Y interconnects is that signals can be dropped from and injected into them relatively easily. TLs maintain a relatively large spacing between conductors, as shown in Figure 15F, so there is enough room for a via (e.g., via 1553) to pass through the TL layers from the transceiver or to the receiver. Multiple levels of TLs can be placed one above another and connected to level 1574, the RF receiver, transceiver, and supporting circuitry. These vertical connections are short, so they can have acceptable attenuation even with conductor widths of about 1 µm, and they can be impedance-matched for optimal power transfer.

The connection to a TL can include a transistor switch to reduce the effect of inactive connections, as shown in Figures 2, 6, and 7 of Hamieh, Mohamad, et al., "A new interconnect approach to RF intra-chip communications using transistor-based distributed access," Microwave and Optical Technology Letters 61.2 (2019): 297-302, incorporated herein by reference in its entirety. Another alternative for multi-drop TLs uses λ/4 directional couplers, as presented in Wu, Hao, et al., "A 60 GHz on-chip RF-interconnect with λ/4 coupler for 5 Gbps bi-directional communication and multi-drop arbitration," Proceedings of the IEEE 2012 Custom Integrated Circuits Conference, IEEE, 2012, incorporated herein by reference in its entirety. Electromagnetic coupling to a TL is also presented in U.S. Patent 7,889,022, incorporated herein by reference in its entirety.

Level 1572 of RF-oriented processors and level 1574 of RF receivers, transceivers, and supporting circuitry can be structured as units aligned with, and corresponding to, the underlying unit structure, and connected to it using vertical buses. The BEOL (back end of line) may be adapted to support the TLs, for example with a thickness greater than 0.5 µm for the layers assigned to the TLs. The TL metal thickness can be set greater than about 0.8 µm, about 1.4 µm, or even greater than 2 µm. The TLs shown in Figure 15F and in the related publications were designed to suit general-purpose CMOS process lines, such as the IBM 90 nm process, to support integration with the underlying SoC circuitry. For a 3D system as shown in Figure 15G, a dedicated process flow supporting RF-M level processing may be preferable.

As shown at 1557 of Figure 15F, the TL bundle is a meandering connection for a 64-core processor with 16 communication nodes. Several publications proposed meandering topologies, while others proposed U-shaped, Z-shaped, and X-shaped ones. These publications suggest using RF-I at the chip level, preferably with TLs connecting all central nodes in the system. Such an approach can be applied to the 3D system proposed herein. However, the 3D system herein can be made far larger than the chip sizes previously discussed, for example larger than the reticle size, or 50 mm x 50 mm, 100 mm x 100 mm, or larger than 200 mm x 200 mm, such as full wafer or panel size. For such large-area 3D systems, and especially for large arrays of relatively small units with an area such as 200 µm x 200 µm, a meandering approach may be less attractive. An alternative is shown in Figure 15H, top view X-Y 1580, with TLs extending in the X direction 1582 and TLs extending in the Y direction 1581, each of which can be connected to its underlying units (not shown). Over a length of 200 mm, a TL could have 1,000 drop connections to its underlying units. To provide a connection between two units within the 3D system, one X-direction TL and one Y-direction TL can be used.
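
A connection between two units over the grid of Figure 15H can be sketched as a dimension-ordered route: first along an X-direction TL, then, after a direction change at one intermediate unit, along a Y-direction TL. The one-TL-per-row and one-TL-per-column layout, the X-then-Y order, and the function names are hypothetical assumptions. The helper `drops_per_tl` shows the underlying arithmetic: with 200 µm units, a 200 mm TL passes over 200 mm / 0.2 mm = 1,000 units.

```python
from typing import List, Tuple

Cell = Tuple[int, int]              # (column, row) index of a unit
Hop = Tuple[str, int, Cell, Cell]   # (direction, TL index, entry unit, exit unit)


def xy_route(src: Cell, dst: Cell) -> List[Hop]:
    """Route along the source row's X TL, then the destination column's Y TL.

    Assumes (hypothetically) one X-direction TL per row of units and one
    Y-direction TL per column; the turn happens at unit (dst column, src row).
    """
    (xs, ys), (xd, yd) = src, dst
    hops: List[Hop] = []
    if xs != xd:
        hops.append(("X", ys, (xs, ys), (xd, ys)))
    if ys != yd:
        hops.append(("Y", xd, (xd, ys), (xd, yd)))
    return hops


def drops_per_tl(tl_length_mm: float, unit_pitch_um: float) -> int:
    """Number of units (potential drop points) a straight TL passes over."""
    return int(tl_length_mm * 1000 // unit_pitch_um)


print(xy_route((1, 2), (5, 7)))
print(drops_per_tl(200, 200))
```

For a source at column 1, row 2 and a destination at column 5, row 7, the route rides the X TL of row 2 to unit (5, 2), where a direction-change processor would hand the data to the Y TL of column 5.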

The X-direction and Y-direction TLs can be used together through a direct connection or through the underlying TL processors of level 1572.

Figure 15I shows a block diagram of a per-unit TL processor in the base level 1572 of RF-M level 1570. It may include a processor 1583 that connects to the per-unit vertical bus 1587 and, correspondingly, to one of the overlying TLs, to transmit data to, or receive data from, other units in the system. The 3D system can use a communication protocol for each type of communication link in use, for example a protocol for vertical bus 1597 and a protocol for inter-unit RF interconnection.

An additional processor 1584 may be a direct memory access (DMA) processor. Such a processor can use direct access to the underlying memory to transfer blocks of data to or from other units in the system. The 3D system may include an additional memory control level 1571 and dedicated memory access buses 1569 and 1588 to provide a second port to the memory banks.
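
The DMA role of processor 1584 can be sketched as a descriptor that a hypothetical DMA engine splits into bursts sized for the TL link, copying each burst through the second memory port. The descriptor fields, the burst size, and the use of `bytearray` objects to stand in for unit memories are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DmaDescriptor:
    src_unit: int
    src_addr: int
    dst_unit: int
    dst_addr: int
    length: int  # bytes


def split_into_bursts(d: DmaDescriptor, max_burst: int) -> List[DmaDescriptor]:
    """Break one block transfer into bursts no longer than max_burst bytes."""
    if max_burst <= 0:
        raise ValueError("max_burst must be positive")
    bursts = []
    offset = 0
    while offset < d.length:
        n = min(max_burst, d.length - offset)
        bursts.append(DmaDescriptor(d.src_unit, d.src_addr + offset,
                                    d.dst_unit, d.dst_addr + offset, n))
        offset += n
    return bursts


def run_dma(memories: Dict[int, bytearray], d: DmaDescriptor,
            max_burst: int) -> int:
    """Copy the block burst by burst via each unit's (modelled) second port.

    Returns the number of bursts that would be sent over the TL link.
    """
    bursts = split_into_bursts(d, max_burst)
    for b in bursts:
        chunk = memories[b.src_unit][b.src_addr:b.src_addr + b.length]
        memories[b.dst_unit][b.dst_addr:b.dst_addr + b.length] = chunk
    return len(bursts)
```

Copying 5 bytes with a 2-byte burst size takes three bursts of 2, 2, and 1 bytes.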

Additional processors 1586 may be used to facilitate data transfer from X-direction TLs to Y-direction TLs. These processors can handle system-level interconnection, leaving the base-level processors free for data processing. This approach can help provide system-level integration for 3D systems with 10,000 processing units, each with its own memory and communication links.

There are several options for transferring data, for example from an X-direction TL to a Y-direction TL. Once the system selects the unit at which the direction change should occur, the corresponding direction-change processor 1586 controls and manages the X-to-Y connection. Figure 15J illustrates some alternative schematics for this connection. A simple method is to use a transfer gate 1591. In some cases, such as when a single frequency is used on the TL, an amplifier can be added at the signal drop point with minimal overhead. Such a circuit functions much like a programmable via; it is simple, but may be less attractive if the TL carries multiple data signals modulating multiple carrier frequencies. A method that can be used for such multi-band connections is frequency-band allocation, as presented, for example, in U.S. Patent 9,806,787 and in Oh, Jungju, Milos Prvulovic, and Alenka Zajic, "TLSync: Support for multiple fast barriers using on-chip transmission lines," 2011 38th Annual International Symposium on Computer Architecture (ISCA), IEEE, 2011, which are incorporated herein by reference in their entirety.

An alternative concept is to take the attenuated signal from the source TL, use a programmable band-pass filter (BPF) to pass only the selected carrier frequencies, re-amplify the signal using, for example, a programmable gain amplifier 1592, and then couple the amplified signal onto the destination TL. A method that can be used for such programmable TL connections is to allocate designated frequency bands to signals intended to cross TL connections. This may include allocating lower frequency bands to TL connections such as X-Y, since lower frequencies have lower attenuation and are better suited to longer signal paths. The concept can be extended to designate lower bands as broadcast (one-to-many) channels. The low bands can then be used for longer signal paths, such as X-Y connections, and the mid and high bands for connections within a single TL. In some cases, such as when a single frequency is used on the TL, the signal drop point can be an amplifier with minimal overhead. The amplifier or BPF and the transfer gate can be designed together for each direction (X-to-Y and Y-to-X), allowing one direction to be used at a time while still taking advantage of the bidirectionality of the TL waveguide.
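
The band-allocation approach for a programmable BPF connection can be sketched with a hypothetical band plan. The specific band edges and the names `BAND_PLAN`, `link_type_of`, and `junction_forward` are assumptions; the sketch only shows the rule stated above, that lower bands are reserved for broadcast and X-Y traffic, which the crossing filter passes, while higher bands stay on their own TL.

```python
from enum import Enum
from typing import Dict, List, Tuple


class LinkType(Enum):
    BROADCAST = "broadcast"   # one-to-many
    XY_TURN = "xy_turn"       # crosses from an X TL onto a Y TL (long path)
    SINGLE_TL = "single_tl"   # stays on one TL


# Hypothetical band plan in GHz, [low, high): lowest bands for broadcast and
# long X-Y paths, higher bands for traffic that stays on a single TL.
BAND_PLAN: Dict[LinkType, Tuple[float, float]] = {
    LinkType.BROADCAST: (10.0, 20.0),
    LinkType.XY_TURN: (20.0, 40.0),
    LinkType.SINGLE_TL: (40.0, 100.0),
}


def link_type_of(carrier_ghz: float) -> LinkType:
    """Classify a carrier by the band it falls in."""
    for link, (lo, hi) in BAND_PLAN.items():
        if lo <= carrier_ghz < hi:
            return link
    raise ValueError(f"carrier {carrier_ghz} GHz is outside the band plan")


def junction_forward(carriers_ghz: List[float]) -> List[float]:
    """Model the programmable BPF at an X/Y crossing: pass only carriers in
    bands designated to cross (broadcast and X-Y). A gain stage, such as
    amplifier 1592, would then restore their level on the destination TL."""
    crossing = {LinkType.BROADCAST, LinkType.XY_TURN}
    return [f for f in carriers_ghz if link_type_of(f) in crossing]
```

Given carriers at 15, 30, and 60 GHz on an X TL, only the 15 GHz (broadcast) and 30 GHz (X-Y) carriers would be passed onto the Y TL.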

Another alternative is to first reconstruct the data stream from the attenuated signal of the source TL, then re-modulate the data onto the target carrier frequency 1593 and couple it onto the target TL. In this way, the system can select one carrier for the first TL and a different carrier for the second TL. Greater flexibility can be achieved with more supporting circuitry in the direction-change processor 1586, which can include full data drop and storage followed by full data retransmission. In this way, when data is transferred from the first TL to the second TL, both the carrier frequency and the transmission time slot can change: the data is demodulated and routed as digital data, which can include short-term storage commonly referred to as queues. This is very common in data switches and routers and also appears in on-chip circuits, as shown in U.S. Patent 7,362,125 and in Deb, Dipika, et al., "Cost effective routing techniques in 2D mesh NoC using on-chip transmission lines," Journal of Parallel and Distributed Computing 123 (2019): 118-129, which are incorporated herein by reference in their entirety.
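
The store-and-forward option, in which data is demodulated, queued, and re-modulated with a new carrier and time slot, can be sketched as a small queueing model. The time-division slot scheme, the carrier value, and the class and field names are hypothetical assumptions; the physical demodulation and modulation are abstracted away.

```python
from collections import deque
from dataclasses import dataclass, replace
from typing import Deque, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    dst: Tuple[int, int]   # destination unit (column, row)
    payload: bytes
    carrier_ghz: float     # carrier on the TL it is currently travelling on
    slot: int              # TDM slot on that TL


class DirectionChangeRouter:
    """Demodulate from the source TL, queue digitally, re-modulate onto the
    destination TL using a new carrier and one of this router's time slots."""

    def __init__(self, out_carrier_ghz: float, num_slots: int,
                 my_slots: Iterable[int]) -> None:
        self.queue: Deque[Packet] = deque()
        self.out_carrier_ghz = out_carrier_ghz
        self.num_slots = num_slots
        self.my_slots = set(my_slots)  # slots owned on the destination TL

    def receive(self, pkt: Packet) -> None:
        # After demodulation the data is digital; store it in the queue.
        self.queue.append(pkt)

    def tick(self, t: int) -> Optional[Packet]:
        """Send at most one queued packet if time t falls in an owned slot."""
        slot = t % self.num_slots
        if self.queue and slot in self.my_slots:
            pkt = self.queue.popleft()
            return replace(pkt, carrier_ghz=self.out_carrier_ghz, slot=slot)
        return None
```

With four slots of which the router owns slots 1 and 3, two queued packets arriving on a 60 GHz carrier leave at times 1 and 3 on a 25 GHz carrier.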

Figure 15K is a simplified schematic diagram of a connection between two TLs 15810, 15820. For example, in Figure 15H, any intersection between an X-direction TL 1582 and a Y-direction TL 1581 may include a programmable connection option that uses the underlying unit processor 1586 to form a data transfer connection. The connection may use techniques such as those discussed with reference to Figure 15J. The system may include data routing connections that change direction, or that change TL while continuing in the same direction. The taps and conductors 15811, 15821 to and from the data routing processor 15861 may include transistor-controlled connections to the TL or other tap/connection techniques, such as the V4 discussed previously. Connection lines 15811, 15821 carry the signal and its return; for a differential-pair TL, they carry both signals and their returns.

As shown in Figure 15E, the preferred interconnect technology depends largely on the length of the route, that is, the distance between the connected elements. For the 3D systems described herein, X-Y connections can use RC wires for relatively short interconnects, such as between adjacent units (0.2 to 2 mm), RF-I across longer routes (1 to 300 mm), and optical interconnects for the longest routes (over 100 mm). These technologies can also be combined into hybrid techniques, for example using RF signals to modulate optical signals, as proposed in U.S. Patent 10,502,987, which is incorporated herein by reference in its entirety.
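The distance ranges quoted above overlap, so a given route length can have more than one suitable technology. A minimal sketch, using only the ranges stated in the text, that lists every candidate technology for a route; the function name is hypothetical, and choosing among overlapping candidates is left as a design trade-off.

```python
from typing import Dict, List, Optional, Tuple

# Indicative reach of each technology as quoted in the text (mm).
# None means the text gives no upper bound.
RANGES_MM: Dict[str, Tuple[float, Optional[float]]] = {
    "RC wire": (0.2, 2.0),
    "RF-I (TL)": (1.0, 300.0),
    "optical": (100.0, None),
}


def candidate_technologies(route_mm: float) -> List[str]:
    """Every technology whose indicative range covers the route length."""
    out = []
    for tech, (lo, hi) in RANGES_MM.items():
        if route_mm >= lo and (hi is None or route_mm <= hi):
            out.append(tech)
    return out


for length in (0.5, 1.5, 150.0, 400.0):
    print(length, candidate_technologies(length))
```

Routes of 1 to 2 mm and of 100 to 300 mm fall in overlap zones, which is where hybrid approaches such as RF-modulated optical links become relevant.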

X-Y connections can use multiple levels, each supporting one of the connection technologies. Much of the art incorporated herein by reference suggests RF-I between nodes, with wired (RC) connections to those nodes. Figure 15L shows a modified block diagram of Figure 15I in which 4 units are aggregated to communicate with the RF-I fabric. Processor 15830 connects the four vertical buses of the four aggregated units to one of the overlying TLs, in order to transmit data to, or receive data from, other units in the system. Processor 15840 may be a direct memory access (DMA) processor. Such a processor can use direct access to the memory of the four underlying units to transfer blocks of data to or from other units in the system. An additional processor 15860 may be used to facilitate data transfer from an X-oriented TL to a Y-oriented TL, or between TLs with the same orientation but different capabilities (for example, long connections and short connections). Engineering the X-Y interconnect of a 3D system may include using wires to aggregate units in 2x2, 3x3, 4x4, or other configurations before moving to TL connections. Such engineering can take into account the size of the unit and the performance cost of adding drop-off/pick-up points on the TL. A 3D system can combine these techniques, including connections to the TL both from individual units and from unit aggregators.
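The trade-off between aggregation size and the number of TL drop-off/pick-up points can be made concrete with a little arithmetic. A minimal sketch, assuming a rectangular array of units in which every k x k block shares a single TL drop point (partial blocks at the array edge still need their own point); the function names are hypothetical.

```python
from typing import Tuple


def aggregator_of(x: int, y: int, k: int) -> Tuple[int, int]:
    """Index of the k x k aggregation group that unit (x, y) belongs to."""
    return (x // k, y // k)


def drop_points(cols: int, rows: int, k: int) -> int:
    """TL drop/pick-up points needed when each k x k group shares one point."""
    groups_x = -(-cols // k)  # ceiling division: partial groups at the edge
    groups_y = -(-rows // k)
    return groups_x * groups_y


for k in (1, 2, 3, 4):
    print(f"{k}x{k}: {drop_points(100, 100, k)} drop points")
```

For a 100 x 100 array, moving from 1x1 to 4x4 aggregation cuts the drop points from 10,000 to 625, at the cost of longer RC wiring inside each group; this is the kind of balance between unit size and TL drop-point overhead the paragraph describes.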

In addition, a standardized structure for the vertical bus positions enables mixing and matching of elements in the 3D system. Levels of processors and memory can be combined with different X-Y connection levels, different memory levels, and so on, and the modularity of the concept can be extended to support different 3D system sizes by dicing according to application needs. The 3D system architecture presented here can support X-Y and Z modularity and customization.

The interconnect layer can also be connected to its own built-in self-test (BIST) and monitor circuitry, because large RF wires can suffer transient or permanent internal errors. The interconnect layer connected to the BIST can further include a training module and on-chip termination. Termination resistors for impedance matching of the transmission lines can be located inside the silicon chip, and their impedance can be coarse- and fine-tuned to account for process, voltage, and temperature (PVT) variations. The training and calibration circuit determines the optimal termination impedance to reduce signal reflection and compensate for variability. The monitor circuits can send test messages and, when an error is detected, redirect traffic away from the failed line. Redundant interconnect lines may be available, because the RF level will contain sufficient area to accommodate duplication. Since these large wafer-scale systems need long service lives, such fault-tolerance testing and redundancy will extend their lifespan and allow graceful degradation.
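The calibration goal described above can be expressed through the standard voltage reflection coefficient of a resistive termination, Γ = (Z_T − Z_0)/(Z_T + Z_0). A minimal sketch, assuming a hypothetical trim network (a 20 Ω base plus 8 coarse steps of 10 Ω and 8 fine steps of 1.25 Ω) and modeling PVT variation as a single scale factor on the actual resistance; a real training circuit would measure reflections rather than compute them.

```python
from typing import Optional, Tuple


def reflection_coefficient(z_term: float, z0: float) -> float:
    """Voltage reflection coefficient of a resistive termination on a line of impedance z0."""
    return (z_term - z0) / (z_term + z0)


def calibrate(z0: float, pvt_scale: float,
              base_ohm: float = 20.0, coarse_ohm: float = 10.0, fine_ohm: float = 1.25,
              coarse_codes: int = 8, fine_codes: int = 8) -> Tuple[float, int, int, float]:
    """Search every (coarse, fine) trim code and keep the one with the lowest |Gamma|.

    pvt_scale models a process/voltage/temperature shift of the real resistance
    relative to its nominal value (1.0 = nominal). Returns (|Gamma|, coarse, fine, ohms).
    """
    best: Optional[Tuple[float, int, int, float]] = None
    for c in range(coarse_codes):
        for f in range(fine_codes):
            z = (base_ohm + c * coarse_ohm + f * fine_ohm) * pvt_scale
            gamma = abs(reflection_coefficient(z, z0))
            if best is None or gamma < best[0]:
                best = (gamma, c, f, z)
    assert best is not None
    return best


# With a 10 % PVT shift, the code that is ideal at nominal (coarse 3, fine 0)
# now gives 55 ohms and reflects about 4.8 % of the incident voltage.
print(abs(reflection_coefficient(50.0 * 1.1, 50.0)))
print(calibrate(z0=50.0, pvt_scale=1.1))
```

Under these assumptions, recalibration picks coarse 2, fine 4 (about 49.5 Ω), reducing |Γ| to roughly 0.5 %, which illustrates why both coarse and fine adjustment are useful.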

Traditionally, the exposure field area of the photomask (reticle) used in each photolithography step is the same throughout the process, and this determines the maximum size of the chip. To implement TLs that span multiple chips, the field area of the reticle used to pattern the TL can be larger than that of the processor chip 15902, because TL features have relatively low resolution requirements. Figure 15M shows an example of an oversized reticle 15904. In addition, multiple oversized TL reticle patterns can be stitched together to pattern the entire TL layer. Figure 15M shows a large-area chip 4 times the reticle size. Alternatively, one-to-one contact aligners or proximity (non-contact) aligners can be used for TL lithography.

The TL layer can be processed in the same fab as the processor wafer; alternatively, it can be processed in a different fab, such as a dedicated packaging fab. Such TL layer processing can be done as part of redistribution layer (RDL) processing. TLs implemented in a dedicated packaging fab can use thicker metals and dielectrics than those available in logic or RF process lines. In this case, an organic resin such as polyimide can be used as the dielectric. A thick resin layer can help isolate the TL layer from the silicon substrate; the resulting higher Q factor can reduce power consumption and transmission attenuation. Such TL processing has been proposed in: Balachandran, Jayaprakash et al., "Expanding on-chip routing levels with wafer-level packaging concepts," Proceedings of the IEEE 2004 International Interconnect Technology Conference (IEEE Cat. No. 04TH8729), IEEE, 2004; Balachandran, Jayaprakash et al., "Wafer-level package interconnect options," IEEE Transactions on Very Large Scale Integration (VLSI) Systems 14.6 (2006): 654-659; Itoi, Kazuhisa et al., "On-chip high-Q spiral Cu inductors embedded in wafer-level chip-scale packages for silicon RF applications," 2004 IEEE MTT-S International Microwave Symposium Digest (IEEE Cat. No. 04CH37535), Vol. 1, IEEE, 2004; and Lahiji, R. R. et al., "Low-loss coplanar waveguide transmission lines and vertical interconnects on multilayer polyethylene-N," 2009 IEEE Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems, IEEE, 2009, all of which are incorporated herein by reference in their entirety.

In one embodiment of the invention, a hybrid interconnect across devices is proposed. As shown in Figure 15N, the hybrid multi-chip interconnect uses direct RC interconnects through scribe lines 15912 to connect adjacent chips, and multi-layer overlying TL interconnects to connect separated chips 15910. The RC part may be referred to as the scribe-line interconnect, and the TL part as the on-chip interconnect 15918. Scribe-line interconnects span short distances, such as less than 1 mm, for which ordinary RC interconnects may be effective. Scribe-line link protocols can include high-speed parallel interconnects. On-chip interconnects can extend beyond 10 mm, and multiple chips, for example the 4 chips 15914 shown in Figure 15N or more than 4, can be grouped into a single TX/RX unit for the TL interconnect 15916. TL link protocols can use high-speed SerDes links, such as those used for DDR memory buses, USB, and PCIe.
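The routing choice in this hybrid scheme can be sketched as a small decision function. This is a minimal illustration under stated assumptions: chips sit on a regular grid, distance is counted in Manhattan hops, and every 2 x 2 block of chips (the 4-chip grouping of Figure 15N) shares one TL TX/RX unit. The function name and return values are hypothetical.

```python
from typing import Tuple

Chip = Tuple[int, int]


def plan_link(src: Chip, dst: Chip, group: int = 2) -> Tuple[str, object]:
    """Choose scribe-line RC or on-chip TL for a chip-to-chip transfer.

    Chips are on a grid; every group x group block of chips shares one TL TX/RX unit.
    """
    (sx, sy), (dx, dy) = src, dst
    hops = abs(sx - dx) + abs(sy - dy)
    if hops == 0:
        return ("local", 0)
    if hops == 1:
        return ("scribe-line RC", 1)      # adjacent chips: short parallel RC link
    gs, gd = (sx // group, sy // group), (dx // group, dy // group)
    if gs == gd:
        return ("scribe-line RC", hops)   # same group: a few RC hops
    return ("on-chip TL", (gs, gd))       # distant chips: TL between group TX/RX units


for pair in (((0, 0), (1, 0)), ((1, 0), (2, 0)), ((0, 0), (1, 1)), ((0, 0), (5, 5))):
    print(pair, "->", plan_link(*pair))
```

Note that adjacent chips in different groups, such as (1, 0) and (2, 0), still use the scribe-line link; only transfers between non-adjacent chips in different groups are lifted onto the TL.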

In another embodiment of the invention, a multi-layer TL is proposed. A TL channel may have 1, 2, 3, 4, 6, or x64 lanes, while long-haul connections 15922, which use thicker conductors and larger spacing between conductors, can use fewer TL lanes, such as x1 or x2 lanes. Figure 15P shows a cross-sectional view of the multi-layer TL, with two levels: an X-direction TL and a Y-direction TL. Each of the various technologies, especially TLs and photonics, can also be arranged in different topologies depending on the needs of the system or subsystem, including high-radix routers. These routers can be placed in the communication layer, where area is available, to amplify and steer the signal. Such topologies serve as possible alternatives to direct X-Y routing. The long-distance TLs shown in Figures 15P and 15S may use additional NoC-type topologies, such as the 3D torus shown in Figure 15U or the butterfly topology shown in Figure 15V, and other network topologies may also be used. Research on these networks can be found in: Jiao et al., "Performance analysis and optimization of homogeneous multicore systems based on 3D torus network-on-chip," IEEE NEWCAS 2010; and Kim et al., "Flattened butterfly topology for on-chip networks," IEEE Computer Architecture Letters 2007, all of which are incorporated herein by reference in their entirety. Traditionally, semiconductor processors, including so-called wafer-scale engines, have been rectangular. Figure 15Q shows an example of the wafer-scale engine presented in I. Cutress, "Hot Chips 31 Live Blog: Cerebras' 1.2 Trillion Transistor Deep Learning Processor," which is incorporated herein by reference. However, to recover the wasted area shown in Figure 15Q, a non-rectangular 3D system can be used, in which multiple chips are connected via on-wafer TLs. As shown in Figure 15R, even for large-area chips, the entire circular wafer can be used, as can half or a quarter of the wafer, without losing any chips near the edge.
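To illustrate why the 3D torus mentioned above is attractive for long-distance TL networks, the following minimal sketch computes a node's neighbours and the minimal hop count in a 3D torus, and compares the hop count with a plain 3D mesh of the same size. The functions are hypothetical helpers, and each axis is assumed to have at least 3 nodes so that the six neighbours are distinct.

```python
from typing import List, Tuple

Node = Tuple[int, int, int]


def torus_neighbors(node: Node, dims: Node) -> List[Node]:
    """The six neighbours of a node in a 3D torus (wrap-around in every axis)."""
    out = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(node)
            n[axis] = (n[axis] + step) % dims[axis]
            out.append((n[0], n[1], n[2]))
    return out


def torus_hops(a: Node, b: Node, dims: Node) -> int:
    """Minimal hop count: each axis ring can be traversed in either direction."""
    return sum(min(abs(p - q), d - abs(p - q)) for p, q, d in zip(a, b, dims))


def mesh_hops(a: Node, b: Node) -> int:
    """Hop count in a 3D mesh with the same dimensions but no wrap-around links."""
    return sum(abs(p - q) for p, q in zip(a, b))


dims = (4, 4, 4)
print(torus_neighbors((0, 0, 0), dims))
print(torus_hops((0, 0, 0), (3, 3, 3), dims), mesh_hops((0, 0, 0), (3, 3, 3)))
```

In a 4 x 4 x 4 network, opposite corners are 9 hops apart in a mesh but only 3 hops apart in a torus, because the long wrap-around links, which TLs can implement with low loss, shortcut each axis.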

The TL levels can be placed below and/or above the processor and memory levels, as shown in Figure 15S, using the layer transfer techniques described herein. This use of circular wafers, or portions of circular wafers, further benefits from small unit sizes, such as approximately 200 × 200 µm² or approximately 150 × 150 µm². Using arrays of relatively small units further improves how well rectangular units fill a circular area. Additionally, wireless interconnects, as shown herein with reference to Figure 16F, in which mating wafers provide connections between the 3D system and other systems, enable full use of the circular wafer without wasting the wafer portions associated with a square end-product shape.
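The benefit of small units on a circular wafer can be estimated by counting how many square units fit entirely inside the circle. A minimal sketch, assuming a grid with one grid corner at the wafer centre and no edge exclusion (an optimized grid offset could fit somewhat more); the 20 mm unit is a hypothetical large-chip comparison, while 0.2 mm and 0.15 mm correspond to the unit sizes mentioned above.

```python
import math


def units_in_wafer(diameter_mm: float, unit_mm: float) -> int:
    """Square units lying fully inside a circular wafer.

    Assumes a grid with one grid corner at the wafer centre and no edge exclusion.
    """
    r = diameter_mm / 2
    count = 0
    i = 0
    while (i + 1) * unit_mm <= r:
        x = (i + 1) * unit_mm                      # outer edge of column i (one quadrant)
        count += math.floor(math.sqrt(r * r - x * x) / unit_mm)
        i += 1
    return 4 * count                               # four symmetric quadrants


def utilization(diameter_mm: float, unit_mm: float) -> float:
    """Fraction of the wafer area covered by whole units."""
    n = units_in_wafer(diameter_mm, unit_mm)
    return n * unit_mm ** 2 / (math.pi * (diameter_mm / 2) ** 2)


for size in (20.0, 0.2, 0.15):
    print(f"{size} mm units: {units_in_wafer(300.0, size)} units, "
          f"{utilization(300.0, size):.1%} of the wafer")
```

Under these assumptions, 148 units of 20 mm cover only about 84 % of a 300 mm wafer, while 0.2 mm units cover more than 99 %, which is the effect the paragraph describes.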

Recently, panel-level packaging (PLP), a technology that goes beyond wafer-level packaging (WLP), has been emerging to provide lower-cost fan-out. PLP uses a large square panel or LCD substrate. Various panel sizes are offered; however, even the smallest panel is larger than a 300 mm wafer, as shown in Figure 15T. Multilayer TLs spanning multiple chips can be fabricated on a PLP substrate, and an entire CMOS wafer can then be transferred and bonded to the PLP substrate.

Transferring a data flow from one TL to another may be part of the overall system data routing. It can be part of a change in direction, such as from X to Y (or from Y to X), or a move in the same direction from a long TL to a short TL (or from a short TL to a long TL). As shown in plot 1555 of Figure 15F, TLs can be designed with different spacings, trading off density against attenuation. A 3D system can be built from TLs designed for long-distance data transmission, like a federal highway, with the option to transition to shorter TLs, like a state highway, and then, once converted back to the voltage-mode signals of RC lines, to connections that act like local roads. This concept may include conversion to optical waveguide 1547 and from optical waveguide 1546 for very long data transmission. Another conversion can use a structure such as 1593 to move data from one carrier frequency to another.

Another aspect RF-I may include is the use of low-frequency carriers to support system management, including, for example, route allocation. As the frequency-versus-attenuation plot 1555 (part of Figure 15F) shows, low carrier frequencies can have very low attenuation, making them well suited to system control and broadcast functions. Therefore, in such a 3D system, a source unit can broadcast data using the TL of its column in one direction, and the data can then be rebroadcast on the orthogonal TL of each row. These broadcasts can be structured, for example, as system-wide broadcasts or as selective broadcasts to defined regions.

For broadcasts, the TL network can use its broadcast capability to spread packets quickly in the X direction and then fan them out in the Y direction. The same operation can be performed in reverse for all-to-one, all-to-all, or other multicast operations in which multiple senders send related messages (such as acknowledgments or barrier synchronization). As shown in Figure 15G, this can be achieved with the existing structures.
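The two-phase broadcast described above can be sketched on a grid of units. The sketch assumes, as the text suggests, that a single transmission on a TL reaches every drop point along it, and it ignores contention and timing; the function name and grid size are hypothetical. Running the same two phases in reverse (gather along each column, then along the row) gives the all-to-one case.

```python
from typing import Dict, Tuple

Unit = Tuple[int, int]


def two_phase_broadcast(cols: int, rows: int, src: Unit) -> Tuple[Dict[Unit, int], int]:
    """Broadcast along the source row's X-direction TL, then on every Y-direction TL.

    Returns the phase in which each unit first holds the data (0 = source) and the
    number of TL transmissions, counting one transmission per TL broadcast.
    """
    _, sy = src
    phase: Dict[Unit, int] = {src: 0}
    # Phase 1: a single transmission on the source row's X-direction TL.
    for x in range(cols):
        phase.setdefault((x, sy), 1)
    # Phase 2: each unit in that row rebroadcasts on its column's Y-direction TL.
    for x in range(cols):
        for y in range(rows):
            phase.setdefault((x, y), 2)
    transmissions = 1 + cols
    return phase, transmissions


phase, tx = two_phase_broadcast(cols=8, rows=8, src=(3, 5))
print(len(phase), max(phase.values()), tx, "vs", 8 * 8 - 1, "unicasts")
```

On an 8 x 8 array, every unit is reached within two phases using 9 TL transmissions, compared with 63 point-to-point messages.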

Messages can be aggregated at a processor or network interface in the RF level and then sent as combined messages, reducing congestion and increasing the throughput of large multicasts. When the nodes sharing an X-direction TL 1582 in Figure 15H respond to a many-to-one message, the interface between the X and Y TLs 1581 and 1582 can hold the messages in a buffer. After some time, the responses in the buffer can be combined into a single larger message and sent as one message in the Y direction 1581, so the response causes less congestion. The buffer can reside in the interface, as part of the amplification and routing circuitry 15861 shown in Figure 15K.
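The buffer-and-combine behavior described above resembles a simple time-window aggregator. A minimal sketch, assuming a hypothetical interface that starts a window when the first response arrives, flushes when the window expires or the buffer fills, and emits the buffered responses as one combined message on the Y-direction TL. Class and field names, the window length, and the buffer size are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Response = Tuple[Tuple[int, int], str]  # (sender coordinates, payload)


@dataclass
class AggregatingInterface:
    """Hypothetical X/Y TL interface that merges many-to-one responses."""
    window: int                     # cycles to hold responses before flushing
    max_items: int = 64             # flush early if the buffer fills up
    buffer: List[Response] = field(default_factory=list)
    opened_at: Optional[int] = None
    sent: List[Tuple[Response, ...]] = field(default_factory=list)  # combined Y-TL messages

    def on_response(self, now: int, sender: Tuple[int, int], payload: str) -> None:
        if self.opened_at is None:
            self.opened_at = now    # first response starts the aggregation window
        self.buffer.append((sender, payload))
        if len(self.buffer) >= self.max_items:
            self.flush()

    def tick(self, now: int) -> None:
        if self.opened_at is not None and now - self.opened_at >= self.window:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.sent.append(tuple(self.buffer))  # one message instead of many
        self.buffer.clear()
        self.opened_at = None


iface = AggregatingInterface(window=4)
for x in range(16):                 # 16 nodes on one X-direction TL send acknowledgments
    iface.on_response(now=x % 3, sender=(x, 0), payload="ack")
iface.tick(now=4)
print(len(iface.sent), len(iface.sent[0]))   # one combined message carrying 16 responses
```

The window length sets the trade-off: a longer window merges more responses into fewer Y-direction messages, at the cost of added latency for the earliest responders.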

This can be implemented by a software control mechanism or by a routing protocol built into hardware. The many-to-one level may be a standalone level, such as 1579 of Figure 15G, or may be integrated as part of the RF-M level 1570. Aggregation has been demonstrated at the macro scale in large supercomputer-scale interconnection networks: Chen et al., "Looking Under the Hood of the IBM Blue Gene/Q Network," IEEE SC 2012; and Bui et al., "Scalable parallel I/O on a Blue Gene/Q supercomputer using compression, topology-aware data aggregation, and subfiling," Parallel, Distributed, and Network-Based Processing 2014, all of which are incorporated herein by reference in their entirety.

Traditional on-chip TLs use coplanar waveguides (CPW). More advanced techniques have since been developed and can be engineered in as improvements. In Feng, Zijun, Nan Li, and Xiuping Li, "Sub-terahertz characteristics of CMOS on-chip transmission lines," 2015 IEEE MTT-S International Conference on Numerical Electromagnetic and Multiphysics Modeling and Optimization (NEMO), IEEE, 2015 (incorporated herein by reference in its entirety), a TL known as a grounded coplanar waveguide (GCPW) showed attenuation as low as 0.06 dB/mm, 0.08 dB/mm, and 0.04 dB/mm, supporting sub-terahertz (above 100 GHz) carrier connections. In LaRocca, Tim, Jenny Yi-Chun Liu, and Mau-Chung Frank Chang, "60 GHz CMOS amplifiers using transformer coupling and artificial dielectric differential transmission lines for compact design," IEEE Journal of Solid-State Circuits 44.5 (2009): 1425-1435 (incorporated herein by reference in its entirety), artificial dielectric strips are proposed to provide substrate shielding and to raise the effective dielectric constant to 54, further reducing size. A further hybrid between CPW and GCPW yields the shielded CPW (SCPW). SCPW has no solid ground plane beneath the signal path; instead, a grid of metal segments connecting the two coplanar ground paths acts as a shield, as described in Lourandakis, Errikos et al., "Parametric analysis and design guidelines for millimeter-wave transmission lines in nm CMOS," IEEE Transactions on Microwave Theory and Techniques 66.10 (2018): 4383-4389, which is incorporated herein by reference in its entirety. Another variant, a comb-shaped slow-wave grounded coplanar waveguide (comb-S-GCPW) with an effective dielectric constant of 140 and an 83% size reduction compared with conventional CPW, is proposed in Bjorndal, Oystein, "Millimeter-wave interconnects and slow-wave transmission lines in CMOS," MS thesis, 2013 (incorporated herein by reference in its entirety). Another approach uses ultra-slow-wave on-chip CPW transmission lines with interdigital loading stubs and floating strips, which offer smaller size and lower loss than traditional on-chip CPW transmission lines: Arigong, Bayaner et al., "Ultra-slow-wave transmission lines in CMOS technology," Microwave and Optical Technology Letters 59.3 (2017): 604-606 (incorporated herein by reference in its entirety).
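A uniform line's loss in dB grows linearly with length, so an attenuation figure per millimetre translates directly into a reach for a given loss budget. The worked arithmetic below uses the attenuation figures quoted above purely as inputs; the 50 mm route and the 20 dB budget are hypothetical, and the sketch ignores tap, transition, and amplifier losses.

```python
import math


def tl_loss_db(alpha_db_per_mm: float, length_mm: float) -> float:
    """Loss of a uniform TL: attenuation constant times length."""
    return alpha_db_per_mm * length_mm


def max_reach_mm(alpha_db_per_mm: float, budget_db: float) -> float:
    """Longest uniform TL that stays within a given loss budget."""
    return budget_db / alpha_db_per_mm


def power_fraction(loss_db: float) -> float:
    """Fraction of the launched power that arrives after loss_db of attenuation."""
    return math.pow(10.0, -loss_db / 10.0)


loss = tl_loss_db(0.06, 50.0)          # 0.06 dB/mm over a hypothetical 50 mm route
print(round(loss, 3), round(power_fraction(loss), 3))   # about 3 dB, about half the power
print(round(max_reach_mm(0.04, 20.0), 1))               # hypothetical 20 dB budget
```

At 0.06 dB/mm, a 50 mm route loses about 3 dB (roughly half the power), and at 0.04 dB/mm a 20 dB budget would allow about 500 mm of line before amplification.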

Another technology that can be integrated into such 3D systems is "one-to-many" interconnect using surface-wave interconnects (SWI), as detailed in: Karkar, Ammar, and Alex Yakovlev, "Utilizing a wired surface-wave interconnect architecture to enable one-to-many services in networks-on-chip"; and Karkar, Ammar et al., "A multicast architecture for networks-on-chip using hybrid wired and surface-wave interconnects," IEEE Transactions on Emerging Topics in Computing 6.3 (2016): 357-369, all of which are incorporated herein by reference in their entirety. As mentioned above, surface waves (SW), or Zenneck surface waves, are non-uniform electromagnetic (EM) waves supported by a metal-dielectric surface. The engineered surface is a waveguide that confines the EM signal to a two-dimensional medium rather than three-dimensional free space. Therefore, the E-field in SWI decays from the source along the boundary at a rate of approximately 1/√d, which makes SWI attractive as a complement to the TL interconnect level. The one-to-many level can be a standalone level, such as 1579 in Figure 15G, or it can be integrated as part of the RF-M level 1570. Another option for adding broadcast capability to such 3D systems is to use wireless technologies, as proposed in Abadal, Sergi et al., "Broadcast-enabled massive multicore architectures: a wireless RF approach," IEEE Micro 35.5 (2015), which is incorporated herein by reference in its entirety.

The X-Y interconnect of a 3D system can include multi-level electromagnetic interconnects, such as RF-I using TLs of various lengths, sizes, and orientations, SWI levels for the broadcast portions of the X-Y connections, and optical waveguides for the longest portions. Such heterogeneous X-Y connections can support large chip/device areas, for example wafer-level integration of large arrays of 3D-system units connecting hundreds of thousands or even millions of computing elements. System management may use units designated as system managers, or a distributed computing network. Techniques developed for server farms can help organize and operate such 3D systems.

Multiple interconnect technologies can be used cooperatively to improve overall system connectivity. This can be exploited effectively in the 3D systems described herein, to which stacked levels optimized for different interconnect technologies, such as RF, plain wires, SWI, and optical interconnects, can be added. Such hybrid approaches are proposed in: Krishna, Tushar et al., "NoC with near-ideal express virtual channels using global-line communication," 2008 16th IEEE Symposium on High Performance Interconnects, IEEE, 2008; and Oh, Jungju, Alenka Zajic, and Milos Prvulovic, "Flow control between a low-latency switchless TL ring and a high-throughput switched on-chip interconnect," Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, IEEE, 2013, all of which are incorporated herein by reference in their entirety.

In recent years, the surface-wave concept of RF-I has been developed further, leading to the "transmission and detection of surface plasmon polaritons (SPP)", a special TM-polarized surface wave localized at a metal/dielectric interface. By artificially engineering periodic subwavelength metal strips on metal lines, known as spoof SPP, TM-mode surface-wave signals can be established and propagated between metal and dielectric in the sub-terahertz region, as proposed in: Liang, Yuan et al., "A D-band surface-wave modulator and signal source with 40 dB extinction ratio and 3.7 mW output power in 65 nm CMOS," ESSCIRC 2018 - IEEE 44th European Solid-State Circuits Conference (ESSCIRC), IEEE, 2018; Liang, Yuan et al., "On-chip sub-terahertz surface plasmon polariton transmission lines with mode converter in CMOS," Scientific Reports 6 (2016): 30063; Liang, Yuan et al., "Energy-efficient and low-crosstalk sub-terahertz I/O by surface plasmon polariton interconnects in CMOS," IEEE Transactions on Microwave Theory and Techniques 65.8 (2017): 2762-2774; Joy, Soumitra Roy et al., "Spoof plasmon interconnects: communication beyond the RC limit," IEEE Transactions on Communications 67.1 (2018): 599-610; Qi, Zihang, Xiuping Li, and Hua Zhu, "Low-loss BiCMOS spoof surface plasmon polariton transmission line in the sub-terahertz region," IET Microwaves, Antennas & Propagation 12.2 (2017): 254-258; Shi, Zihao, Yizhu Shen, and Sanming Hu, "Spoof surface plasmon polariton transmission line with reduced line width and enhanced field confinement," International Journal of RF and Microwave Computer-Aided Engineering (2020): e22276; Chen, Qian et al., "Multi-channel FSK inter/intra-chip communication using field-confined slow-wave transmission lines," 2020 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2020; and Singh, Surya Prakash, Nilesh Kumar Tiwari, and M. Jaleel Akhtar, "Spoof surface plasmon transmission line with high isolation and low propagation loss," Applied Optics 59.5 (2020): 1371-1375, all of which are incorporated herein by reference in their entirety. These TLs, intermediate between metal-conducting TLs and dielectric waveguides, offer an attractive option that combines the relatively easy on-chip integration of RF-I with the low crosstalk of optical interconnects. Spoof SPP lines can use conventional coplanar strip (CPS) guides to feed signals in and out, as detailed in the art incorporated by reference above.

Spoof SPP lines can be designed to support sub-terahertz or lower carrier frequencies, as described in: Kianinejad, Amin, Zhi Ning Chen, and Cheng-Wei Qiu, "Low-loss spoof surface plasmon slow-wave transmission lines with compact transition and high isolation," IEEE Transactions on Microwave Theory and Techniques 64.10 (2016): 3078-3086; Shen, Sensong et al., "A novel three-dimensional integrated spoof surface plasmon polariton transmission line," IEEE Access 7 (2019): 26900-26908; and Ye, Longfang et al., "High-efficiency and low-coupling spoof surface plasmon polaritons enabled by V-shaped microstrips," Optics Express 27.16 (2019): 22088-22099, all of which are incorporated herein by reference in their entirety.

The X-Y interconnects of a 3D system and the connections on the system network can leverage knowledge and tools developed for large integrated computing systems such as server farms. Other relevant work includes known network-on-chip (NoC) research, such as: Manevich, Ran et al., "Designing single-cycle long links in hierarchical NoCs," Microprocessors and Microsystems 38.8 (2014): 814-825; Lahdhiri, Habiba, Jordane Lorandel, and Emmanuelle Bourdel, "Threshold-based routing algorithm for RF NoC OFDMA architecture," 2019 14th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), IEEE, 2019; Spyropoulou, Maria et al., "Towards 1.6T datacentre interconnect technologies: the TWILIGHT perspective," Journal of Physics: Photonics 2.4 (2020): 041002; and Lahdhiri, Habiba et al., "A design exploration and performance analysis framework for RF NoC manycore architectures," Journal of Low Power Electronics and Applications 10.4 (2020): 37, all of which are incorporated herein by reference in their entirety.

The industry continues to advance communication technologies that make better use of communication media such as TLs, achieving higher data rates at lower energy per bit. Advanced data modulation techniques have been developed for this purpose, for example: Du, Jason Y., "RF-modulated signal interconnects for memory-to-processor and processor-to-processor interfaces: an overview," arXiv preprint arXiv:1612.0652 (2016); Hamieh, Mohamad, et al., "Sizing of the physical layer of RF intra-chip communications," 2014 21st IEEE International Conference on Electronics, Circuits and Systems (ICECS), IEEE, 2014; and Chang, Mau Chung F., et al., "Multiband radio frequency interconnect (MRFI) technology for next-generation mobile/airborne computing systems," University of California, Los Angeles, 2017, all of which are incorporated herein by reference. Such advanced modulation techniques may be used for the X-Y connections of the 3D system. A TL used for X-Y connections may have multiple drop points: as the TL runs over the individual units, it can have connections that allow data to be transferred to or loaded from the underlying units. Multiple TLs may run in parallel over the same row or column of units and connect to all or some of the underlying units.
The connections to the underlying units may be divided among the TLs running in parallel, so that these TLs share the data-transfer load. Modulation techniques can allow multiple data channels to operate on the same TL without interfering with one another. These modulated channels may be allocated as part of the 3D system setup process, or allocated dynamically during system operation. Other communication techniques may be used to manage this allocation as part of the 3D system control structure. These may include broadcast techniques, or using low-frequency bands to carry control information, to support better overall X-Y connectivity.
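The load sharing and channel allocation described above can be sketched in a few lines. This is a minimal illustration built on assumptions: round-robin assignment of drop connections to parallel TLs, and a proportional-to-demand split of one TL's modulated channels. The functions `assign_drops` and `allocate_channels` are hypothetical names, and the policy is not taken from the source.

```python
from typing import Dict, List


def assign_drops(num_cells: int, num_tls: int) -> Dict[int, List[int]]:
    """Divide the drop connections of one row of cells among parallel TLs, round-robin."""
    drops: Dict[int, List[int]] = {tl: [] for tl in range(num_tls)}
    for cell in range(num_cells):
        drops[cell % num_tls].append(cell)
    return drops


def allocate_channels(demand: Dict[int, float], num_channels: int) -> Dict[int, int]:
    """Share one TL's modulated channels among its attached cells in proportion to demand.

    Uses a largest-remainder split, so the counts always sum to num_channels when any
    demand is nonzero. This is a hypothetical policy for illustration only.
    """
    total = sum(demand.values())
    if total == 0:
        return {cell: 0 for cell in demand}
    exact = {cell: num_channels * d / total for cell, d in demand.items()}
    alloc = {cell: int(x) for cell, x in exact.items()}
    leftover = num_channels - sum(alloc.values())
    # Give the remaining channels to the cells with the largest fractional shares.
    by_remainder = sorted(exact, key=lambda c: exact[c] - alloc[c], reverse=True)
    for cell in by_remainder[:leftover]:
        alloc[cell] += 1
    return alloc


# Example: 8 cells on one row served by 2 parallel TLs; TL 0 carries 16 channels.
drops = assign_drops(8, 2)
print(drops)
# Cell 0 is busy, so for this period it takes most of TL 0's channels.
print(allocate_channels({0: 6, 2: 1, 4: 1, 6: 0}, 16))
```

Calling `allocate_channels` again with new demand figures models the dynamic reallocation during operation. The static case during system setup is the same call with fixed demand weights.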

Multi-band communication in a TL can also be used for congestion management. Routers may be designed to include traffic monitoring and adaptive routing algorithms. If traffic congestion on a link is high, one or more bands may be reserved for these routers to ensure forward progress of communication. Ebrahimi et al., "HARAQ: Congestion-aware learning model for highly adaptive routing algorithm in on-chip networks," International Symposium on Networks-on-Chip (NOCS), 2012 (incorporated herein by reference), includes an algorithm that learns from traffic patterns and adjusts routing accordingly. Figure 15W illustrates such congestion-aware routing, in which messages avoid high-congestion regions. In a TL network, these X and Y directions would resemble those of Figure 15K.
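The sketch below shows one way a router might steer messages around congested links. It uses a simpler, swapped-in technique: minimal adaptive routing on an X-Y mesh, where each hop chooses the less-loaded productive direction. It is not the HARAQ learning model. The link-load map, the 0.8 band-reservation threshold, and all function names are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
Link = Tuple[Coord, Coord]


def next_hop(cur: Coord, dst: Coord, load: Dict[Link, float]) -> Coord:
    """Among the productive X and Y moves toward dst, pick the one whose outgoing link
    is least loaded (minimal adaptive routing). Unlisted links count as idle."""
    x, y = cur
    dx, dy = dst[0] - x, dst[1] - y
    options: List[Coord] = []
    if dx:
        options.append((x + (1 if dx > 0 else -1), y))
    if dy:
        options.append((x, y + (1 if dy > 0 else -1)))
    if not options:
        return cur
    return min(options, key=lambda nxt: load.get((cur, nxt), 0.0))


def route(src: Coord, dst: Coord, load: Dict[Link, float]) -> List[Coord]:
    """Each hop reduces the Manhattan distance by one, so the loop always terminates."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, load))
    return path


def needs_reserved_band(link_load: float, threshold: float = 0.8) -> bool:
    """Hypothetical trigger for reserving a TL band on a heavily congested link."""
    return link_load > threshold


load = {((0, 0), (1, 0)): 0.9}  # the X-direction link out of (0, 0) is congested
print(route((0, 0), (2, 2), load))
print(needs_reserved_band(load[((0, 0), (1, 0))]))
```

The route leaves (0, 0) in the Y direction to avoid the congested X link, and it still reaches the destination in the minimal number of hops.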

An alternative for the 3D system may include adaptive allocation of the channels within a TL. A unit may then use more of the TL's channels to transmit at a higher data rate, or leave some channels to other units and reduce its own data rate during selected time periods. To illustrate some of these ideas, refer to Figure 15Q, which is Figure 1 of Du, Jieqiong, et al., "A 28mW 32Gb/s/pin 16-QAM single-ended transceiver for high-speed memory interface," 2020 IEEE Symposium on VLSI Circuits, IEEE, 2020, incorporated herein by reference. Figure 15Q shows a 16-QAM modulation circuit that combines two QPSK modulation sub-circuits. In the 3D system, the data routing processor 15861 may use the combined signal 15926 to drive one TL. Alternatively, it may have a programmable option to use one part 15922 of the modulated signal for one TL and another part 15924 for another TL. These two TLs may run in parallel, or one may be designed in the X direction and the other in the Y direction. A 3D system with multiple TLs covering the same unit can take advantage of advanced signal modulation techniques in which relatively complex modulation circuits share resources and drive multiple TLs. For example, such advanced modulation techniques may use a clock circuit such as the one labeled CCK 15928 in Figure 15Q.
The CCK clock may be shared by multiple units, or even across the entire X-Y connectivity of the 3D system, to support advanced and efficient data modulation for better X-Y connections. The clock signal may be distributed with a conventional clock tree, with an advanced type such as a harmonic resonant clock, or with some of the TL structures.
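The combined versus split option described above can be illustrated by building a 16-QAM symbol as the weighted sum of two QPSK symbols. This is a minimal sketch under stated assumptions: a 2:1 amplitude weighting between the two QPSK components and a Gray-coded QPSK map. Both are common textbook choices and are not taken from the Du et al. circuit. The function names are hypothetical.

```python
import itertools

# Gray-coded QPSK: two bits map to one of four unit-amplitude constellation points.
QPSK = {(0, 0): complex(1, 1), (0, 1): complex(-1, 1),
        (1, 1): complex(-1, -1), (1, 0): complex(1, -1)}


def qam16(bits):
    """Combined mode: one TL carries 2 * QPSK(first bit pair) + QPSK(second bit pair)."""
    b0, b1, b2, b3 = bits
    return 2 * QPSK[(b0, b1)] + QPSK[(b2, b3)]


def split_qpsk(bits):
    """Split mode: each QPSK component drives a separate TL, 2 bits per symbol each."""
    b0, b1, b2, b3 = bits
    return QPSK[(b0, b1)], QPSK[(b2, b3)]


# The 16 combined symbols form the familiar {-3, -1, 1, 3} x {-3, -1, 1, 3} grid.
points = {qam16(b) for b in itertools.product((0, 1), repeat=4)}
print(len(points), sorted({p.real for p in points}))
print(split_qpsk((1, 0, 0, 1)))
```

In combined mode one TL carries 4 bits per symbol. In split mode two TLs each carry 2 bits per symbol using the same two sub-circuits. That resource sharing is what the programmable option exploits.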

An additional alternative is to structure the X-Y interconnect as a hierarchy with at least three steps: first a decomposition into the X and Y directions, then into global lines, and finally into local lines. For example, a 3D system with a rectangular shape of 259.2 mm x 259.2 mm and a cell size of 0.2 mm x 0.2 mm has an array of 1296 by 1296 cells. The advantage of the proposed X-Y interconnect hierarchy is a reduced number of drop locations per line, and therefore less associated attenuation. The global TL may have 36 drop locations, and from each of these the data may be transferred to a local TL that has 36 drop locations to the destination units, since 36 x 36 = 1296. The global TL may be constructed with thicker and/or wider metal for minimal attenuation, so that it can effectively span the entire 259.2 mm length of the 3D system. The structure shown in Figure 15J may be used from drop to drop. Signal re-buffering may also be provided if engineering trade-offs and considerations require it. This hierarchical X-Y connection may complement the other X-Y connection structures presented herein, which would be designed for specific 3D system requirements.
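The arithmetic behind the two-level hierarchy above can be checked in a short sketch. The mapping of a cell index to a (global drop, local drop) pair with `divmod` is an illustrative assumption about how the destinations might be numbered. The helper names are hypothetical.

```python
import math


def cells_per_side(system_mm: float, cell_mm: float) -> int:
    """Number of cells along one side of the 3D system."""
    return round(system_mm / cell_mm)


def hierarchical_drop(index: int, local_drops: int) -> tuple:
    """Map a cell's index along one axis to (global-TL drop, local-TL drop)."""
    return divmod(index, local_drops)


n = cells_per_side(259.2, 0.2)   # 1296 cells per side
fanout = math.isqrt(n)           # 36 drops on the global TL and on each local TL
assert fanout * fanout == n
# A flat TL could pass up to n - 1 drop points before reaching a destination,
# while the two-level path passes at most about 2 * fanout of them.
print(n, fanout, hierarchical_drop(1000, fanout))
```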

Another consideration for such a 3D system is the removal of heat from the upper levels, for example from the heterogeneous integrated stack of levels and M-levels 1404 in Figure 14A. U.S. Patent 8,674,470, incorporated herein by reference in its entirety, teaches using power lines to provide a heat-removal path from a level in a 3D structure to the bottommost or topmost surface, where the heat may be removed by air or fluid conduction. This can be an additional function of the per-unit vertical pillars, such as the pillars used for the vertical bus. For example, these pillars, such as vertical pillar 1414 in Figure 14B, may be designed to provide good power conduction to specific levels and to remove heat from the levels that need it. These heat-removal pillars may be regarded as "thermal vias." They may be designed to have a good thermal path to the cooling substrate 1401 of Figure 14A while remaining electrically insulated. Methods of forming and using contacts that conduct heat but not electricity are shown, for example, in at least Figure 6 of U.S. Patent 8,674,470. In a similar manner, the pillars may be thermally connected, while electrically isolated, up to and including the top level. The top level may include a heat-sink structure for removing heat by air or fluid conduction. In one embodiment, these vias may be designed to mitigate or even shield electromagnetic interference.
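A first-order estimate shows how effective such thermal vias might be. This is a sketch assuming one-dimensional conduction (R = t / (k * A)). Each pillar is modeled as a metal column in series with a thin, electrically insulating but thermally conducting layer, and N pillars act in parallel. The pillar count, dimensions, and conductivities are illustrative assumptions, not values from the source.

```python
import math


def slab_resistance(thickness_m: float, k_w_per_mk: float, area_m2: float) -> float:
    """One-dimensional conduction resistance R = t / (k * A), in K/W."""
    return thickness_m / (k_w_per_mk * area_m2)


def pillar_array_resistance(n_pillars, length_m, diameter_m, k_metal, t_iso_m, k_iso):
    """N identical pillars in parallel. Each pillar is a metal column in series with a thin
    isolation layer of the same cross-section."""
    area = math.pi * (diameter_m / 2) ** 2
    r_single = slab_resistance(length_m, k_metal, area) + slab_resistance(t_iso_m, k_iso, area)
    return r_single / n_pillars


# Assumed values: 10,000 copper pillars (k ~ 400 W/m-K), 20 um long, 1 um in diameter,
# each with a 20 nm isolation layer of assumed k = 1.4 W/m-K (SiO2-like).
r_total = pillar_array_resistance(10_000, 20e-6, 1e-6, 400.0, 20e-9, 1.4)
power_w = 5.0  # assumed heat to extract from one level through the pillars
print(f"R = {r_total:.3f} K/W, temperature rise = {power_w * r_total:.2f} K")
```

The estimate shows that the isolation layer adds series resistance even when it is very thin, and that the pillar count is the main lever on the total.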

Additionally, thermal isolation techniques, methods, materials, and structures, such as those disclosed throughout U.S. Patent 9,023,688, may be used with the 3D systems and devices disclosed herein. The above U.S. patent is incorporated herein by reference in its entirety.

Figure 16A shows an X-Z 1602 side cross-sectional view of a 3D system similar to that disclosed in Figure 14E herein, including an upper level 1604 of computing logic. A thermal isolation layer 1605 may be used to keep most of the heat of the computing logic 1604 from reaching the memory stack 1603 disposed below it. A heat sink 1606 may be used to remove heat from the device/system. Power lines, which are normally electrically conductive (not shown), may be partially thermally connected to, and electrically isolated from, the heat sink 1606. These lines help remove the heat generated during formation and operation of the internal stack 1603 through the top. They also carry heat to the bottom substrate 1601, which removes it through its liquid microchannel cooling 1610.
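The role of the thermal isolation layer 1605 can be illustrated with a two-path thermal network. This is a minimal sketch assuming that the logic level's heat flows either up to the heat sink (resistance R_up) or down through the isolation layer and memory stack (resistance R_down), and that both paths end at the same sink temperature. All numeric values are illustrative assumptions.

```python
def heat_split(power_w: float, r_up: float, r_down: float):
    """Split a logic level's heat between the upward path and the downward path.

    Returns (q_up, q_down, logic_temp_rise). The downward share is R_up / (R_up + R_down),
    and the logic temperature rise uses the parallel combination of the two paths.
    """
    q_down = power_w * r_up / (r_up + r_down)
    q_up = power_w - q_down
    r_parallel = r_up * r_down / (r_up + r_down)
    return q_up, q_down, power_w * r_parallel


# Assumed numbers: 100 W of logic power and 0.2 K/W to the top heat sink.
for r_down in (0.2, 2.0, 20.0):  # increasing resistance of the isolation layer
    q_up, q_down, dt = heat_split(100.0, 0.2, r_down)
    print(f"R_down={r_down:5.1f} K/W -> {q_down:5.1f} W into memory, logic rise {dt:5.1f} K")
```

Raising R_down sends less heat into the memory stack, but the logic level then relies almost entirely on the upper heat sink.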

Figure 16B shows a similar 3D system in which the upper level of computing logic has its own liquid-cooled substrate 1614. This substrate may include power delivery lines and trench capacitors, in a manner similar to the bottom substrate 1601. The liquid-cooled substrate 1614 may be part of a silicon interposer, or it may be fabricated separately and bonded into the 3D system. It may even be monolithically integrated with the silicon substrate of the 3D system.

The drive toward ultra-scale integration may suggest adding more computing levels to the 3D system. However, such computing levels may generate more heat than the power-line network can remove. It may then be necessary to embed levels with liquid microchannel cooling inside the 3D stack, not only at the bottom and top as in Figure 16B. The microchannel cooling may use fluid channels for coolant or heat pipes. These microchannels may further be coupled with conventional passive cooling, such as finned heat sinks and ventilation slots. In one embodiment of the invention, the microchannels may include forced-convection devices such as fans and nozzles. The coolant may be pumped in a loop through heat exchangers and cold plates outside the 3D system.
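To size the pumped coolant loop described above, the basic energy balance P = m_dot * cp * dT gives the flow needed for a given heat load. This sketch assumes water as the coolant (cp of about 4186 J/(kg K), density of about 1000 kg/m^3). The 1 kW load and 10 K temperature rise are illustrative assumptions.

```python
WATER_CP_J_PER_KG_K = 4186.0   # approximate specific heat of liquid water
WATER_DENSITY_KG_PER_M3 = 1000.0


def coolant_flow_l_per_min(power_w: float, delta_t_k: float,
                           cp: float = WATER_CP_J_PER_KG_K,
                           rho: float = WATER_DENSITY_KG_PER_M3) -> float:
    """Volumetric flow needed to carry power_w with a coolant temperature rise of
    delta_t_k, from the energy balance P = m_dot * cp * dT."""
    mass_flow_kg_s = power_w / (cp * delta_t_k)
    return mass_flow_kg_s / rho * 1000.0 * 60.0  # m^3/s -> L/s -> L/min


# Example: one embedded cooling level removing an assumed 1 kW with a 10 K coolant rise.
print(f"{coolant_flow_l_per_min(1000.0, 10.0):.2f} L/min")  # about 1.43 L/min
```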

The challenge is to manage the system's vertical (Z-direction) connections through a thick substrate that can support microchannel cooling, such as that presented in Colgan, Evan G., et al., "A practical implementation of silicon microchannel coolers for high power chips," IEEE Transactions on Components and Packaging Technologies 30.2 (2007): 218-225 (incorporated herein by reference). Such a substrate may be at least 50 microns thick and may require TSVs of about 5 microns in diameter to pass through it. By contrast, the pillars for the vertical bus may use through-layer vias less than 1 micron in diameter, also called nano-TSVs. One way to manage this vertical connectivity challenge may be to carry modulated signals through the TSVs, for example with RF or optical interconnects similar to those proposed herein for the X-Y connections.

Figure 16C shows a side X-Z 1602 cross-sectional view of a 3D system with an embedded microchannel cooling substrate 1624. The substrate may include TSVs 1622, which may be used both for power-line connections through the substrate and for carrying electromagnetic waves that convey modulated data. The levels below (1623) and above (1626) the substrate may include circuits for controlling, generating, and detecting the electromagnetically modulated data propagating through the TSVs 1622. The top level may include additional X-Y electromagnetic connections 1628, or connections to external devices that may support wireless connectivity.

Data input and output (I/O) of the 3D system may use many techniques, such as wiring and bonding techniques. Wireless techniques may be an attractive option that leverages the layer-transfer concept presented herein. Accordingly, the upper levels 1626, 1628 may have M-levels designated for system-level I/O. Advances in mobile technologies such as 5G have driven the rapid development of on-chip RF circuits. Multiple papers propose wireless techniques for network-on-chip ("NoC") applications, for example: Deb, Sujay, et al., "Wireless NoC as interconnection backbone for multicore chips: Promises and challenges," IEEE Journal on Emerging and Selected Topics in Circuits and Systems 2.2 (2012): 228-239; Yu, Xinmin, et al., "Architecture and design of multichannel millimeter-wave wireless NoC," IEEE Design & Test 31.6 (2014): 19-28; Mineo, Andrea, et al., "Exploiting antenna directivity in wireless NoC architectures," Microprocessors and Microsystems 43 (2016): 59-66; and Kim, Ryan Gary, et al., "Wireless NoC for VFI-enabled multicore chip design: Performance evaluation and design trade-offs," IEEE Transactions on Computers 65.4 (2015): 1323-1336, all of which are incorporated herein by reference. Some have proposed hybrid system connections that combine TLs and/or waveguides with wireless, such as Agyeman, Michael Opoku, et al., "A resilient 2-D waveguide communication fabric for hybrid wired-wireless NoC design," IEEE Transactions on Parallel and Distributed Systems 28.2 (2016): 359-373, incorporated herein by reference. These wireless connectivity techniques may also be used to connect the 3D system to external devices that provide system-level I/O. This may also include the plasmonic techniques presented in Zhang, Hao Chi, et al., "A plasmonic route for the integrated wireless communication of subdiffraction-limited signals," Light: Science & Applications 9.1 (2020): 1-9. All of the above are incorporated herein by reference in their entirety. Extending wireless links, such as those provided for the NoC, to connect the 3D system to external devices may be attractive, because the NoC and the external connections can then share resources. Figure 16F illustrates such a concept. The 3D system 1662 may be placed suitably close to a connectivity structure 1664, as determined by engineering considerations and trade-offs. Dozens or hundreds of wireless channels may connect the 3D system to the external connectivity device. Such parallel connections can enhance the overall system by giving the 3D system broad, parallel, distributed connectivity. The connectivity structure 1664 may use optical fibers 1666 to connect to upstream data sources. Depending on engineering and business trade-offs, the connectivity structure 1664 may be built with techniques similar to those of the 3D system, or with conventional PCB-type integration techniques. The 3D system I/O structure of Figure 16F may be applied to other types of wireless connections, such as optical connections using diode lasers. It may include advanced signaling such as orbital angular momentum (OAM), which may be used for either RF-type or optical-type wireless connections. The recent paper by Bahari, Babak, et al., "Photonic quantum Hall effect and multiplexed light sources of large orbital angular momenta," Nature Physics (2021): 1-4 (incorporated herein by reference), proposes "unlimited" data capacity using OAM. Other techniques may also exploit the proximity between the structures 1662, 1664 that form such wireless connections to increase channel capacity.

For optical-type electromagnetic modulation, the vias may be made optically transparent, with or without a suitable oxide fill. Similar optical via connections have been proposed in U.S. Patents 7,203,387 and 8,916,910, which are incorporated herein by reference.

For RF-type electromagnetic modulation, the vias may be copper-filled. Alternatively, they may be coaxial-like TSV transmission lines, filled with a conformal sidewall metal shell, then an inner oxide, then metal. Such a structure may be formed with ALD or other types of conformal deposition. RF-type TSVs are known in the art and are presented, for example, in U.S. Patents 8,618,629, 8,759,950, and 8,916,471. They are also presented in Bleiker, Simon J., et al., "High-aspect-ratio through silicon vias for high-frequency application fabricated by magnetic assembly of gold-coated nickel wires," IEEE Transactions on Components, Packaging and Manufacturing Technology 5.1 (2014): 21-27; Vitale, Wolfgang A., et al., "Fine pitch 3D-TSV based high frequency components for RF MEMS applications," 2015 IEEE 65th Electronic Components and Technology Conference (ECTC), IEEE, 2015; and Ebefors, Thorbjörn, et al., "Development and evaluation of RF TSV for 3D IPD applications," 2013 IEEE International 3D Systems Integration Conference (3DIC), IEEE, 2013. The contents of all of the above patents and papers are incorporated herein by reference in their entirety.

Another option is to build special M-levels for cooling substrates inserted inside the 3D stack. Such substrate M-levels may use conventional TSVs with redistribution layers. These layers connect the large TSVs to the relatively smaller TSVs between cells that serve the per-cell vertical bus. For a cell of about 200 μm x 200 μm, 100 large TSVs of 5 μm x 5 μm would occupy about 100 x (5 x 5)/(200 x 200) = 1/16 of the cell area, leaving room for microchannels and trench capacitors.
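The area budget above can be checked with exact rational arithmetic. This short sketch assumes square TSVs and a square cell, as in the example dimensions. The function name is hypothetical.

```python
from fractions import Fraction


def tsv_area_fraction(n_tsv: int, tsv_side_um: int, cell_side_um: int) -> Fraction:
    """Fraction of a square cell occupied by n square TSVs."""
    return Fraction(n_tsv * tsv_side_um ** 2, cell_side_um ** 2)


share = tsv_area_fraction(100, 5, 200)
print(share, float(share))  # 1/16 0.0625
# The remaining 15/16 of the cell area is left for microchannels and trench capacitors.
```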

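Figure 16D shows an X-Z 1602 side cross-sectional view of a cooling substrate 1644 with TSVs 1646, and of a logic level 1634. The logic level has a redistribution layer and pads 1636 for the TSVs, as well as inter-cell pins for the vertical bus 1632 (two are shown).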

Figure 16E shows a side X-Z 1698 cross-sectional view of a substrate M-level 1650, formed by adding a top redistribution layer 1654 to the hybrid-bonding structure of Figure 16D. The per-cell vertical-bus pins/pads 1632, 1652 use the TSVs 1646 to connect the vertical bus through the cooling substrate. A cut layer 1656 may be used to separate the substrate M-level from the carrier substrate 1658.

Using such substrate M-levels, the 3D system can include multiple compute levels and memory levels with X-Y connectivity levels between them, while the system heat is managed with liquid cooling.

Figure 16G shows an X-Z 1698 side cross-sectional view of the 3D system with an additional coolant distribution structure, and Figure 16H shows an X-Y 1699 cross-sectional view. The arrows in Figures 16G and 16H indicate the flow direction of the coolant. Similar techniques, such as oxide-to-oxide bonding, may be used to bond a thicker substrate 1674, which may be made of a glass-type material or a silicon wafer. The thicker substrate 1674 may be formed with wide Y-direction tunnels 1672, 1676. Their widths and depths may be, for example, sub-millimeter, several millimeters, or even tens of millimeters. The X-Y dimensions of the thicker substrate 1674 may be larger than those of the rest of the 3D system. It can then accommodate the 3D system with sufficient thermal-area margin/overlap at its edge regions.
The wide Y-direction tunnels 1672 may form an odd array, while the other wide Y-direction tunnels 1676 form an even array. Each group of wide Y-direction tunnels, 1672 and 1676, may be joined into a single tunnel near the edge of the 3D system, as shown in Figure 16H. Each Y-direction tunnel group then shares its respective inlet or outlet of the external coolant circulation system. These tunnels 1676, 1672 serve as main pipes that circulate cooling fluid to the X-direction tunnels within the silicon (e.g., 1670, 1610, and 1624), and they are designed with several times the fluid-transfer capacity of those tunnels. The silicon tunnels are embedded near the active devices of the 3D system and are designed to have low thermal resistance from the active transistors. The thicker substrate 1674, in turn, is designed to support a higher overall cooling circulation that carries heat away from intra-silicon tunnels such as 1670, 1610, 1624. Some of the tunnels 1676 are used to introduce 1678 a cooling liquid, such as water, through dedicated pipes 1680 into the X-direction tunnels 1670 at the base of the 3D system. Some of the tunnels 1672 are used to lead out the heated liquid. Such a multi-level liquid distribution network may be used in large-area 3D systems for effective cooling and heat removal. The network may include the larger pipes 1676, 1672 as a first level that supplies the finer pipes 1610, 1624, which are located close to the heat-generating device levels. Fabrication may use techniques known in the art, such as etching, deposition, and level bonding. Specific designs may account for particular heat-removal needs and fluid-dynamics considerations, which those skilled in the art can work out. Detailed design may include layout optimization to further equalize the overall fluid flow in the 3D system.
This may include making the intra-silicon tunnels 1670 narrower closer to the device edge (near the cold-coolant inlet), where the main pipes 1676, 1672 may have higher fluid pressure. It may also include progressively narrowing the inlet main pipe 1676 and widening the outlet main pipe 1672 with distance from the "cold coolant inlet" toward the "hot coolant outlet." Running the wide Y-direction tunnels and the narrow X-direction tunnels in parallel can minimize the temperature gradient across the area, compared with a spiral single-channel solution or a one-way solution. Cooling semiconductor circuits with coolant flowing in tubes etched into the substrate is presented in Wang, Shaoxi, et al., "3D integrated circuit cooling with microfluidics," Micromachines 9.6 (2018): 287; and Wu, C.J., S.T. Hsiao, et al., "Ultra high power cooling solution for 3D-ICs," VLSIT JFS1-4 (2021), both of which are incorporated herein by reference.
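The width-tuning idea above (narrower X-direction tunnels where the main-pipe pressure is higher) can be sketched with a simple laminar-flow model. The sketch assumes fully developed laminar flow in narrow, deep slots (width much less than height), where the parallel-plate result gives Q = dP * h * w^3 / (12 * mu * L). It ignores manifold losses, entrance effects, and heating of the coolant. The pressures, dimensions, and target flow are illustrative assumptions.

```python
def slot_flow(dp_pa, width_m, height_m, length_m, mu_pa_s=1.0e-3):
    """Laminar volumetric flow (m^3/s) through a narrow, deep slot (width << height),
    using the parallel-plate result Q = dP * h * w**3 / (12 * mu * L)."""
    return dp_pa * height_m * width_m ** 3 / (12.0 * mu_pa_s * length_m)


def equalizing_widths(dps, target_q, height_m, length_m, mu_pa_s=1.0e-3):
    """Width of each X-direction tunnel so that every tunnel carries target_q, given the
    pressure drop available at its position along the main pipes."""
    return [(12.0 * mu_pa_s * length_m * target_q / (dp * height_m)) ** (1.0 / 3.0)
            for dp in dps]


# Assumed: the available pressure drop falls from 5 kPa near the inlet to 3 kPa farther
# along the main pipe; tunnels are 500 um deep and 10 mm long; water viscosity ~1 mPa*s.
dps = [5000.0, 4500.0, 4000.0, 3500.0, 3000.0]
widths = equalizing_widths(dps, 1e-9, 500e-6, 10e-3)
for dp, w in zip(dps, widths):
    q = slot_flow(dp, w, 500e-6, 10e-3)
    print(f"dP={dp:6.0f} Pa -> width {w * 1e6:5.1f} um, flow {q:.2e} m^3/s")
```

Because flow scales with the cube of the width, only a modest width change (here roughly 36 um to 43 um) is needed to offset a 40% pressure difference.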

For a multi-level 3D system, it may be desirable to add a logic level optimized for data movement rather than data processing. An earlier example is the Intel 8237 direct memory access (DMA) controller, part of the MCS-85 microprocessor system. As shown in Figure 16C, such a 3D system may include the following, from bottom to top: a base water-cooled processor level; a high-speed memory M-level; a high-density memory M-level; a dedicated data-movement M-level; an X-Y connectivity M-level; another high-density memory M-level; another high-speed memory M-level; an additional water-cooled processor M-level; and a device-to-external-system connectivity M-level. Heat-spreader layers may be used to even out heat among the units and reduce local hot spots. Layers of phase-change material may be used to average heat over time and reduce transient heat spikes. Active thermal management may be used by integrating, in each zone, for example, a per-unit temperature sensor with temperature-control circuitry. Such temperature-control circuitry may also control the operation of the unit's processor to prevent overheating. It may do so by slowing the processor clock, reducing the processor supply voltage, changing the periodic idle time, or activating a shutdown. These active techniques manage the operating speed to avoid overheating.
The 3D system integration outlined above reduces the overall interconnect of the system and therefore allows more energy-efficient and more effective computing systems. However, the power budget and the thermal budget limit the operation of 3D systems. These thermal management techniques allow operation to be optimized within such an overall thermal budget.
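The per-unit active thermal management described above can be sketched as a small governor. It maps a temperature reading to a clock scale, a supply-voltage scale, a forced-idle fraction, or a shutdown. The thresholds and scale factors are hypothetical placeholders, not values from the source. The power estimate uses the standard approximation that dynamic CMOS power scales as f * V^2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThermalAction:
    clock_scale: float    # fraction of the nominal processor clock
    vdd_scale: float      # fraction of the nominal supply voltage
    idle_fraction: float  # fraction of time spent in forced quiet periods
    shutdown: bool


def thermal_action(temp_c: float, warn_c: float = 85.0,
                   throttle_c: float = 95.0, critical_c: float = 105.0) -> ThermalAction:
    """Map one unit's sensor reading to a control action (hypothetical thresholds)."""
    if temp_c >= critical_c:
        return ThermalAction(0.0, 0.0, 1.0, True)
    if temp_c >= throttle_c:
        return ThermalAction(0.5, 0.8, 0.25, False)
    if temp_c >= warn_c:
        return ThermalAction(0.75, 0.9, 0.0, False)
    return ThermalAction(1.0, 1.0, 0.0, False)


def relative_dynamic_power(a: ThermalAction) -> float:
    """Dynamic power scales roughly as f * V^2, and forced idle removes a further share."""
    return a.clock_scale * a.vdd_scale ** 2 * (1.0 - a.idle_fraction)


for t in (70.0, 90.0, 100.0, 110.0):
    act = thermal_action(t)
    print(t, act, round(relative_dynamic_power(act), 3))
```

Lowering the voltage together with the clock saves more power than clock slowing alone, because of the V^2 term. A real controller would also add hysteresis so that a unit does not oscillate between states.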

Another option is a flow of multiple simple bonding and thinning steps. TSV processing through the level stack then forms the vertical-bus pillars, and pins/pads are then formed for the whole M-level for the subsequent hybrid-bonding integration step. Figures 17A-17D illustrate this flow. The advantage of this flow is that it saves forming pins/pads for the internal levels of such a level stack.

Figure 17A shows a side X-Z 1702 cross-sectional view of a bottom level 1706 and an internal level 1704. Each of these levels is constructed with spaced-apart cells 1724 and with connections 1722 between them, which may later be connected to the vertical-bus pillars 1726. Figure 17A also shows the two levels bonded face-to-face (F2F) to form structure 1708, in which the internal level 1704 is flipped and bonded to the base level 1706.

Figure 17B shows the structure after the carrier substrate 1705 of the internal level 1704 is removed, i.e., after the "cut."

Figure 17C shows the structure after the process is repeated five times, forming a level stack of the base level with six internal levels bonded on top.

Figure 17D shows the structure after forming through-stack vias (TSVs) 1726 and bonding pins/pads 1724. The internal level thickness may be about 100 nm or greater, such as about 0.5 μm, about 1 μm, about 2 μm, about 4 μm, or even greater than about 6 μm. The through-stack vias (TSVs) 1726 pass through the level stack and may extend through a few tens of microns, which is common for TSVs in the industry. The metal fill of the vias may simultaneously form the connections to each level through the connection lines 1722 between cells. This is not common and may require appropriate process adjustments by those skilled in semiconductor processing. Such through-stack vias can reasonably be expected to need more space between cells 1730 than vias formed independently for each level would, which increases the structure size. However, the simplicity of the process may make it attractive for some applications. The industry is improving etching techniques for such vias, and an aspect ratio of 20:1 has been demonstrated. Thus, for a level stack 20 μm thick, vias about 1 μm in diameter may be made.
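The via-sizing rule above (diameter = stack height / aspect ratio) can be written as a small helper. This sketch assumes the via passes through the six internal levels of Figure 17C, each at an assumed 2 μm, one of the thicknesses listed above. It uses the demonstrated 20:1 aspect ratio. The helper names are hypothetical.

```python
def stack_height_um(n_levels: int, level_thickness_um: float) -> float:
    """Total thickness that a through-stack via must pass."""
    return n_levels * level_thickness_um


def min_via_diameter_um(height_um: float, aspect_ratio: float = 20.0) -> float:
    """Smallest via diameter achievable for a given depth at a depth:width aspect ratio."""
    return height_um / aspect_ratio


print(min_via_diameter_um(20.0))  # the 20 um example in the text -> 1.0 um
# Assumed: six internal levels (as in Figure 17C), each about 2 um thick.
h = stack_height_um(6, 2.0)
print(h, min_via_diameter_um(h))
```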

An important aspect of such 3D systems is the memory technology used for the memory levels. A specific memory technology named 3D NOR-P has been proposed in at least US Patent 10,892,016 and US Application 16/483,431 (now US 2020/0013791), which are incorporated herein by reference. Some enhancements of this 3D NOR-P technology are presented below.

Another embodiment of the present invention relates to process steps that can be used to form Schottky-barrier S/D junctions, and more specifically to a method of precisely forming the silicide layer. Figures 18A-18F are XY and XZ cross-sectional views of an exemplary 3D NOR-P structure that show exemplary process steps.

Figure 18B shows the deposition of a metal for the silicidation process, such as Ni, Co, Ti, Ta, W, Cu, Pt, Al, or an alloy thereof. In this step, a very thin metal liner is deposited by a method or tool capable of atomic-scale thickness control, such as atomic layer deposition (ALD) or molecular beam epitaxy (MBE). The liner thickness ranges from 3 nm to 20 nm. In the nominal process, the amount of metal deposited is chosen so that all of it is consumed in forming the silicide, leaving essentially no unreacted metal.
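The nominal process consumes all of the deposited metal, so the liner thickness sets how much silicon is converted. The sketch below estimates the silicon thickness consumed for a planar film, using bulk molar volumes. It makes three assumptions:

- The final phases are NiSi, CoSi2, and TiSi2.
- Standard handbook bulk densities apply.
- Thin-film densities, the curved pillar geometry, and intermediate phases can be ignored.

Treat the results as rough estimates rather than process targets.

```python
# Handbook molar masses (g/mol) and bulk densities (g/cm^3).
MOLAR_MASS = {"Si": 28.0855, "Ni": 58.6934, "Co": 58.9332, "Ti": 47.867}
DENSITY = {"Si": 2.329, "Ni": 8.908, "Co": 8.90, "Ti": 4.506}

# Assumed final silicide phase, expressed as Si atoms per metal atom.
SI_PER_METAL = {"Ni": 1, "Co": 2, "Ti": 2}  # NiSi, CoSi2, TiSi2


def molar_volume(element: str) -> float:
    """Molar volume in cm^3/mol."""
    return MOLAR_MASS[element] / DENSITY[element]


def si_consumed_nm(metal: str, metal_nm: float) -> float:
    """Si thickness consumed when a planar film of metal_nm fully reacts.

    Moles of metal per unit area scale as metal_nm / V_metal; each metal atom
    takes SI_PER_METAL Si atoms, and each Si atom occupies V_Si.
    """
    ratio = SI_PER_METAL[metal] * molar_volume("Si") / molar_volume(metal)
    return ratio * metal_nm


for metal in ("Ni", "Co", "Ti"):
    for t in (3.0, 20.0):  # ends of the 3-20 nm liner range quoted in the text
        print(f"{metal} {t:4.1f} nm -> {si_consumed_nm(metal, t):5.1f} nm Si consumed")
```

The per-nanometer ratios come out near 1.83 (Ni), 3.64 (Co), and 2.27 (Ti), which matches commonly quoted values for these phases.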

Figure 18C shows the structure after silicidation by annealing. The anneal can use any process that promotes the silicide reaction, such as rapid thermal annealing, laser spike annealing, microwave annealing, or combinations thereof. The metal supply is limited and fully consumed, so the silicide depth and interface will be uniform. Using a thin metal makes the reaction self-limiting. After the self-limiting silicidation, the remaining holes can be filled with other metals to form the bulk of the S/D, as shown in Figure 18D. One skilled in the art can adapt this method of forming Schottky-barrier S/D junctions to other memory structures that use the Schottky-barrier S/D junctions presented herein or incorporated herein by reference.
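A toy model, not taken from the source, illustrates the self-limiting behavior. Silicide growth is assumed to be parabolic (x = sqrt(B·t)) until the finite metal supply runs out, after which the depth stops at a cap set by the metal thickness. Differences in the rate constant B stand in for temperature non-uniformity across junction locations. All parameter values are hypothetical.

```python
import math


def silicide_depth_nm(t_s: float, b_nm2_per_s: float, cap_nm: float) -> float:
    """Parabolic (diffusion-limited) growth, stopped once the metal is used up."""
    return min(math.sqrt(b_nm2_per_s * t_s), cap_nm)


CAP_NM = 18.3            # hypothetical cap, roughly the Si consumed by ~10 nm Ni (planar NiSi)
RATES = (0.5, 1.0, 2.0)  # hypothetical B values (nm^2/s) at three junction locations

for t in (100.0, 1000.0):
    depths = [silicide_depth_nm(t, b, CAP_NM) for b in RATES]
    print(t, [round(d, 1) for d in depths])
```

With the short anneal, the three locations end at different depths. With the long anneal, all three saturate at the same metal-limited depth, which is the uniformity argument made in the text.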

One advantage of 3D NOR-P over many other 3D memory structures is that its architecture tolerates variability. High-aspect-ratio etching inevitably causes variations such as bowing and twisting between the top and bottom of the etched features, as shown in Figure 18G. In the 3D NAND architecture, these variations directly affect program and read operations, because NAND uses a very long shared channel called a "string". Sometimes the variability is so high that the device is considered defective. In many cases, cell-to-cell variability requires different operating voltages for different WLs. Additionally, 3D NAND programming requires incremental step pulse programming (ISPP), which applies many gentle program pulses, each followed by a verify step. This approach inevitably slows down programming. In contrast, the 3D NOR-P architecture uses one channel per bit cell, so its variability problem is much smaller than that of 3D NAND. In one embodiment of the invention, the program and erase voltages can be constant across different WL levels. Additionally, 3D NAND uses ISPP and incremental step pulse erase (ISPE), whereas 3D NOR-P can program and erase with a single voltage pulse, giving faster program and erase operations than 3D NAND.
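The write-time cost of incremental step pulse programming (ISPP) compared with a single pulse can be sketched with an idealized model. The model assumes each pulse raises the threshold voltage by about the step size and each pulse is followed by a verify. The pulse and verify durations below are hypothetical placeholders, not values from the source.

```python
def ispp_pulses(vth_start: float, vth_target: float, v_step: float) -> int:
    """Idealized ISPP: verify; if below target, apply a pulse v_step higher than the last.

    Each pulse is assumed to shift Vth by about v_step (a common idealization).
    """
    vth, pulses = vth_start, 0
    while vth < vth_target:  # verify step
        vth += v_step        # one program pulse
        pulses += 1
    return pulses


def write_time_us(pulses: int, t_pulse_us: float, t_verify_us: float) -> float:
    """Total write time for a given number of pulse (+ verify) cycles."""
    return pulses * (t_pulse_us + t_verify_us)


n = ispp_pulses(0.0, 3.0, 0.5)
print(n, write_time_us(n, 10.0, 5.0))  # 6 90.0
print(write_time_us(1, 10.0, 0.0))     # single pulse, no verify loop: 10.0
```

Under these assumptions, the single-pulse write is several times faster than ISPP.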

Another advantage of 3D NOR-P technology with Schottky-barrier S/D is its immunity to variations in the source-drain separation. The current-voltage characteristics of standard transistors with degenerately doped (n+) S/D depend on the source-drain separation, i.e., the gate length. Programming efficiency can also depend strongly on gate length, because electrons travel different channel lengths before being used for programming as hot electrons. In other words, drain-side injection used for programming is inherently sensitive to any change in channel length. NOR flash memories based on standard transistors therefore tend to suffer from variability. With Schottky-barrier S/D, in contrast, the current flowing into the channel is limited by the source barrier, so a Schottky-barrier S/D device is inherently insensitive to gate length. Sporea, Radu A., et al., "Effects of process variations on the current in Schottky barrier source-gated transistors," 2009 International Semiconductor Conference, Vol. 2, IEEE, 2009 (incorporated herein by reference), demonstrated this insensitivity to gate length. This advantage matters even more in high-speed applications, where techniques such as incremental step pulse programming (ISPP) slow the write cycle and undermine the speed target.
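A crude series-resistance model, which is an assumption for illustration and not the physics of the cited paper, shows why a barrier-limited current barely depends on gate length. The source barrier is treated as a fixed resistance in series with a channel resistance proportional to gate length. When the barrier term dominates, a change in gate length hardly moves the current. All resistance values are hypothetical.

```python
def cell_current_a(l_um, r_ch_ohm_per_um, r_barrier_ohm, v_ds=0.1):
    """Crude low-bias model: source-barrier resistance in series with a
    channel resistance proportional to gate length."""
    return v_ds / (r_barrier_ohm + r_ch_ohm_per_um * l_um)


def relative_change(l_um, dl_um, r_ch_ohm_per_um, r_barrier_ohm):
    """Fractional current change when the gate length grows by dl_um."""
    i_nom = cell_current_a(l_um, r_ch_ohm_per_um, r_barrier_ohm)
    return cell_current_a(l_um + dl_um, r_ch_ohm_per_um, r_barrier_ohm) / i_nom - 1.0


# Hypothetical values: 100 nm gate, +10 nm variation, 1e5 ohm/um channel resistance.
print(relative_change(0.1, 0.01, 1e5, 0.0))  # no barrier: about -9 %
print(relative_change(0.1, 0.01, 1e5, 1e7))  # barrier-limited: about -0.01 %
```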

Thermomechanical stress may arise during the processing of NOR-P structures and can cause reliability problems such as wafer warpage. The stress comes mainly from mismatches in the coefficient of thermal expansion (CTE) between the materials in the structure. The largest CTE mismatch may occur between metals, such as the metal S/D, and non-metals, such as oxides, nitrides, and polysilicon. In one embodiment of the invention, empty spaces or voids are intentionally introduced to relieve thermomechanical stress; an example is shown in Figure 18E. The void is located inside the S/D metal pillar and may be formed during metal deposition. When a thermal expansion mismatch occurs, the void can act as a stress damper by absorbing the stress.
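The stress the voids are meant to relieve can be estimated to an order of magnitude with the standard biaxial thermal-mismatch formula, σ = E/(1−ν)·(α_sub − α_film)·ΔT. This sketch treats the S/D metal as a thin film on a thick substrate, which a pillar is not, so only the order of magnitude is meaningful. The material constants are approximate handbook values chosen for illustration.

```python
def thermal_mismatch_stress_pa(e_pa, nu, alpha_film, alpha_sub, delta_t_k):
    """Biaxial stress in a thin film bonded to a much thicker substrate after a
    temperature change delta_t_k (final minus initial). Positive = tensile."""
    return e_pa / (1.0 - nu) * (alpha_sub - alpha_film) * delta_t_k


# Approximate values, for an order-of-magnitude check only:
# W: E ~ 411 GPa, nu ~ 0.28, CTE ~ 4.5 ppm/K; SiO2: CTE ~ 0.5 ppm/K.
sigma = thermal_mismatch_stress_pa(411e9, 0.28, 4.5e-6, 0.5e-6, 25.0 - 400.0)
print(f"{sigma / 1e6:.0f} MPa")  # tensile after cooling from 400 C
```

The result is in the hundreds of MPa, which is consistent with the text's concern about warpage.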

A modification of the above flow follows the steps of Figure 18B and then deposits an additional metal 1854, as shown in Figure 18F. The first, thin metal 1852 may be chosen for high silicidation reactivity, such as, but not limited to, nickel (Ni), titanium (Ti), or cobalt (Co). The second, thicker metal 1854 may be chosen for slower silicidation, such as tungsten (W). Alternatively, the first thin metal 1852 may be chosen from metals that react with silicon at relatively low temperatures, e.g., below 400°C. The second, thicker metal 1854 may then be chosen from metals that react with silicon only at relatively high temperatures, e.g., above 600°C. The post-deposition anneal is then performed at a relatively low temperature, so that silicidation proceeds mainly with the first thin metal 1852 and only minimally with the second, thicker metal 1854. The result is a structure similar to that shown in Figure 18E. In this flow, the second metal 1854 can improve heat transfer during the anneal. This reduces the variation in thermal exposure among the Schottky-barrier S/D junction locations. The result is a more uniform Schottky-barrier S/D junction depth and shape, and therefore more uniform device characteristics from the top to the bottom of the memory array.
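The temperature selection in the two-metal flow can be written as a small window check. This is a minimal sketch. Only the 400°C and 600°C bounds come from the text. The function name and the safety margin are hypothetical, and real silicidation onset temperatures depend on the phase, film thickness, and anneal method.

```python
def anneal_window_c(first_onset_c, second_onset_c, margin_c=0.0):
    """Anneal temperatures (degC) at which the thin first metal should silicide
    while the thick second metal should not; None if no window remains."""
    low = first_onset_c + margin_c
    high = second_onset_c - margin_c
    return (low, high) if low < high else None


# Bounds quoted in the text: first metal reacts below 400 C, second only above 600 C.
print(anneal_window_c(400, 600, margin_c=25))  # (425, 575)
print(anneal_window_c(550, 600, margin_c=50))  # None
```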

Another modified flow (not shown) adds a thin amorphous silicon layer between the polysilicon channel and the metal used for silicidation to form the Schottky-barrier S/D. As the silicidation anneal proceeds, metal diffuses faster along the grain boundaries of the polysilicon channel. This fast grain-boundary diffusion can cause silicide spikes, which can lead to device failure. To keep the metal from contacting the grain boundaries directly, a thin amorphous silicon film is inserted between the polysilicon channel and the silicidation metal. This interfacial amorphous silicon layer may have a thickness in the range of, for example, 5 nm to 10 nm. The interfacial amorphous silicon may optionally contain n-type dopants, such as phosphorus or arsenic, to help form a dopant-segregated Schottky-barrier S/D.

Another alternative is to use a metal-insulator-semiconductor (MIS) contact. As devices scale down, random dopant fluctuation may increase contact-resistance variation. A very thin interlayer dielectric between the semiconductor and the metal can reduce this atomic-scale variation. In conventional MIS contacts, the interlayer oxide must be thin enough and must also pin the Fermi level near the conduction band edge, so that the insulator itself does not degrade the contact resistance. Such contacts are described in at least US 9,240,480 B2, US 9,735,111 B2, and US 9,613,855 B1, which are incorporated herein by reference in their entirety. The present invention differs in that the Fermi level at the interlayer dielectric may be set 0.1 to 1.0 eV below the conduction band edge, to preserve hot-carrier generation near the Schottky junction. Referring to Figure 1W, a thin oxide layer 152 may therefore serve as a barrier between the metallized S/D pillar 154 and the channel. Oxide layer 152 may be about 0.1 nm to about 0.3 nm, about 0.3 nm to about 0.6 nm, about 0.6 nm to about 1 nm, or even thicker, depending on product, engineering, circuit, and design considerations and trade-offs. A thin barrier can help achieve a consistent S/D-to-channel barrier and a stable Schottky barrier for all cells in the memory array. This approach reduces variability caused by dopant fluctuation and also reduces variability caused by grain boundaries in the metal and the polysilicon channel.
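A textbook WKB-style estimate, exp(−2κd) with κ = sqrt(2·m*·φ)/ħ for a rectangular barrier, shows why sub-nanometer control of the interlayer thickness matters. The barrier height (3.1 eV) and effective mass (0.4 m0) below are illustrative assumptions, not values from the source. The model ignores image-force lowering and the Fermi-level offset discussed above.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s
M0 = 9.1093837015e-31   # electron rest mass, kg
EV = 1.602176634e-19    # joules per eV


def tunnel_attenuation(d_nm, barrier_ev, m_eff=0.4):
    """exp(-2*kappa*d) for a rectangular barrier, kappa = sqrt(2 m* phi) / hbar."""
    kappa = math.sqrt(2.0 * m_eff * M0 * barrier_ev * EV) / HBAR  # 1/m
    return math.exp(-2.0 * kappa * d_nm * 1e-9)


# Hypothetical SiO2-like barrier: phi ~ 3.1 eV, m* ~ 0.4 m0.
for d in (0.1, 0.3, 0.6, 1.0):
    print(f"{d:.1f} nm -> {tunnel_attenuation(d, 3.1):.2e}")
```

Under these assumptions, each extra 0.1 nm of oxide cuts transmission roughly threefold, so atomic-scale thickness control is needed to keep the barrier consistent from cell to cell.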

The memory capacity required per chip may exceed what a single wafer's fabrication technology can provide. To increase per-chip capacity, multiple memory wafers can be stacked on top of each other. In one embodiment, shown in Figures 19A-19E, a 3D NOR-P device may include multiple stacked 3D NOR-P blocks, where each of the 3D NOR-P blocks 1950A, 1950B, and 1950C is fabricated on a different wafer. The memory wafers can be stacked, for example, by wafer bonding. For this purpose, each 3D NOR-P block includes bonding pads 1960. Figure 19A shows an example of stacked 3D NOR-P blocks. The S/D lines are vertically aligned so that vertically stacked S/D lines share the same BL and SL addresses. However, a WL in one wafer block should not share the corresponding WL in another wafer block. A WL contact staircase is therefore needed for every individual WL layer. To give every WL layer its own unique WL address, an example of forming a WL staircase across the stacked wafer blocks is given below. In this example, WL staircases are also formed within each wafer block, as shown in Figures 19B-19E. Hereafter, the WL staircase within a wafer is called the local WL staircase. The staircase that connects the WL staircase of one wafer to that of another wafer is called the global WL staircase.

In one embodiment of the invention, each local WL staircase contact is connected to its own vertical jump metal plug 1970 through a local staircase jumper 1980, as shown in Figure 19B. The jump metal plugs 1970 penetrate substantially the entire height of the NOR-P levels. Multiple stacked NOR-P blocks require a global WL staircase. One set of jump metal plugs is added for each NOR-P wafer to be stacked, as in the example of Figure 19C. The additional jump metal plugs are connected through global staircase jumpers 1990. Figure 19D is a side view of Figure 19C that shows the local staircase jumpers 1980, the global staircase jumpers 1990, and the jump metal plugs more clearly. The left side of global staircase 1990n and the right side of local staircase 1990n are open (not electrically connected). The right side of global staircase 1990n and the left side of local staircase 1990n+1 are shorted (electrically connected). Figure 19E shows the global WL staircase connections for a stack of three NOR-P blocks. The first jump plug 1970i connects to the local WL staircase of the first NOR-P block 1950A. The second jump plug 1970j connects to the local WL staircase of the second NOR-P block 1950B. The third jump plug 1970-j connects to the local WL staircase of the third NOR-P block 1950C. In this way, each individual WL in the stacked WLs can receive a dedicated WL control signal from the peripheral logic circuitry. This method of connecting local and global staircases can also be applied to other control lines, such as BL and SL. The same approach can also be used for other 3D memories, such as 3D stacked NAND, RRAM, and PCRAM. This stacking concept is similar to the one presented in Figures 22A-22B and 27A-27D of US Patent Application 16/558,304 (now US Patent Publication 2020/0176420), the entire contents of which are incorporated herein by reference.
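The result of the local/global staircase wiring is that every (block, WL level) pair gets its own address, while vertically aligned BL and SL lines share addresses across blocks. A minimal Python sketch of that addressing follows. The function names and the four-WL-levels-per-block count are hypothetical, and the block labels reuse the reference numerals in the text.

```python
def global_wl_address(block: int, local_wl: int, wls_per_block: int) -> int:
    """Unique global WL address for WL level `local_wl` of stacked block `block`.

    BL/SL addresses are shared by vertically aligned S/D lines, so only the WL
    address needs to encode the block index.
    """
    if not 0 <= local_wl < wls_per_block:
        raise ValueError("local WL index out of range")
    return block * wls_per_block + local_wl


def decode_wl_address(address: int, wls_per_block: int) -> tuple[int, int]:
    """Inverse mapping: (block index, local WL level)."""
    return divmod(address, wls_per_block)


BLOCKS = ("1950A", "1950B", "1950C")
WLS_PER_BLOCK = 4  # hypothetical number of WL levels per block

for b, name in enumerate(BLOCKS):
    print(name, [global_wl_address(b, w, WLS_PER_BLOCK) for w in range(WLS_PER_BLOCK)])
```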

One embodiment of a 3D NOR-P memory device is a set of process steps for forming a metal-induced crystallized silicon channel in 3D NOR-P. These steps can turn a small-grain polysilicon or amorphous silicon channel into a metal-induced crystallization (MIC) silicon channel with substantially larger grains. Small-grain polysilicon or amorphous silicon refers to the semiconductor phase as deposited by a chemical vapor deposition process. For simplicity, "polysilicon" in this invention refers to the deposited silicon before crystallization, and "MIC-Si" refers to the recrystallized polysilicon after metal-induced crystallization. Seed metals and crystallization temperature requirements can be found in Yoon, Soo-Young, et al., "Metal-induced crystallization of amorphous silicon," Thin Solid Films 383.1-2 (2001): 34-38 (incorporated herein by reference). The channel material of the proposed 3D NOR-P may also be, but is not limited to, polycrystalline germanium, amorphous germanium, polycrystalline silicon-germanium, amorphous silicon-germanium, or a metal-induced crystallized form of any of these. Seed metals and the temperatures required for crystallizing them can be found in two further references, which are incorporated herein by reference:

- Kang, Dong-Ho, and Jin-Hong Park, "Indium (In)- and tin (Sn)-based metal induced crystallization (MIC) on amorphous germanium (α-Ge)," Materials Research Bulletin 60 (2014): 814-818.
- Peng, Shanglong, et al., "Low-temperature Al-induced crystallization of hydrogenated amorphous Si1-xGex (0.2≤x≤1) thin films," Thin Solid Films 516.8 (2008): 2276-2279.

The device structure of 3D NOR-P is presented in US 10,892,016, which is incorporated herein by reference.

Figure 19F shows a prior-art method of forming MIC silicon in 3D NOR-P. Figures 19G and 19H show alternative methods of forming MIC silicon in 3D NOR-P according to the present invention. For simplicity, only a single level of unit memory cells is drawn.

The prior art of Figure 19F describes a metal-induced crystallization process that proceeds laterally. Step s1 shows the formation of multiple stacked horizontal layers, alternating silicon oxide as the inter-WL oxide and degenerately doped polysilicon as the word lines (WL). For clarity, only a single WL layer is shown and the inter-WL oxide is not drawn. The unit memory cell consists of three holes lined with hollow cylindrical polysilicon that extends vertically relative to the wafer substrate. The polysilicon near the center hole can become the channel region, and the polysilicon near the left and right holes can become the source and drain regions. A memory layer is formed between the outer surface of the hollow cylindrical polysilicon and the WL. The memory layer may be a stack of a tunnel oxide, a charge-trapping layer such as silicon nitride or a floating gate, and a blocking oxide. Step s2 shows a thin layer of metal, such as Ni, Al, or Cu, deposited on the inner surface of the polysilicon hole that serves as the source or drain. Step s3 shows an intermediate stage of the MIC process, in which the metal migrates laterally toward the opposite side and leaves MIC-Si behind. Step s4 shows the final stage of the MIC process: the metal has reached the drain or source region and the polysilicon in the channel region has become MIC-Si. Step s5 shows the device structure after the subsequent source and drain processes.

Compared with the process of Figure 19F, the processes proposed in the present invention can shorten the process time. Figure 19G depicts the proposed MIC process. Step s1 shows the formation of multiple stacked horizontal layers, similar to Figure 19F, alternating silicon oxide as the inter-WL oxide and degenerately doped polysilicon as the WLs. The memory layer is formed between the outer surface of the hollow cylindrical polysilicon and the WL. It may be a stack of a tunnel oxide, a charge-trapping layer such as silicon nitride or a floating gate, and a blocking oxide. Step s2 shows a thin layer of metal, such as Ni, Al, or Cu, deposited on the inner surface of the polysilicon hole of the channel region. Step s3 shows an intermediate stage of the MIC process. The metal first migrates radially from the inner surface to the outer surface of the hollow polysilicon in the channel. It then migrates laterally toward the source and drain regions, leaving MIC-Si behind. Step s4 shows the final stage of the MIC process: the metal has reached the source and drain regions and the polysilicon in the channel region has become MIC-Si. Step s5 shows the device structure after the subsequent source and drain processes.

Figure 19H depicts another embodiment of the proposed MIC process. Step s1 shows multiple stacked horizontal layers, alternating silicon oxide as the inter-WL oxide and silicon nitride as a sacrificial layer. The silicon nitride will later be replaced with metal or another conductive WL material. Step s2 shows the device structure after the sacrificial silicon nitride is selectively removed and a thin layer of metal, such as Ni, Al, or Cu, is deposited on the outer surface of the polysilicon. Step s3 shows the final stage of the MIC process, in which the metal migrates radially from the outer surface to the inner surface of the hollow polysilicon and leaves MIC-Si behind. Step s4 shows the device structure after the metal that reached the inner surface of the MIC-Si has been removed, followed by formation of the memory layer and the WL. The memory layer may be a stack of a tunnel oxide, a charge-trapping layer such as silicon nitride or a floating gate, and a blocking oxide. Step s5 shows the device structure after the subsequent source and drain processes.
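The process-time comparison between Figure 19F and Figure 19G can be estimated roughly. The estimate assumes a constant MIC front velocity and a simplified geometry. The Figure 19F front is assumed to start at one S/D hole and cross the full S/D spacing. The Figure 19G front is assumed to cross the polysilicon wall radially and then spread about half the spacing toward each of S and D. The geometry and velocity values are hypothetical.

```python
def mic_time_h_from_sd(sd_spacing_um, v_um_per_h):
    """Fig. 19F-style (assumed): the front starts at one S/D hole and crosses
    the full spacing to the other S/D hole."""
    return sd_spacing_um / v_um_per_h


def mic_time_h_from_channel(sd_spacing_um, wall_um, v_um_per_h):
    """Fig. 19G-style (assumed): the front first crosses the polysilicon wall
    radially at the channel hole, then spreads about half the spacing each way."""
    return (wall_um + sd_spacing_um / 2.0) / v_um_per_h


# Hypothetical geometry and MIC front velocity.
SPACING_UM, WALL_UM, V_UM_PER_H = 0.2, 0.01, 1.0
print(mic_time_h_from_sd(SPACING_UM, V_UM_PER_H))
print(mic_time_h_from_channel(SPACING_UM, WALL_UM, V_UM_PER_H))
```

Under these assumptions, starting at the channel hole nearly halves the lateral distance the front must travel. This matches the text's claim of a shorter process time.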

An important advantage of 3D NOR architectures, especially 3D NOR-P with metallized S/D pillars, is the ability to stack hundreds of levels with little impact on device performance, in particular on the sensed current. Common 3D NAND architectures are very sensitive to the number of levels. Each channel is connected in series with those of adjacent transistors, forming what, as mentioned earlier, is called a string. When a random cell is accessed, current must therefore flow through every transistor in the string. The string current decreases as the number of levels increases, which limits the number of levels that can be stacked. In 3D NOR-P, by contrast, when a random cell is accessed, current flows through only the one selected transistor. The holes punched through the level stack are used for the S/D. With metallized S/D, their conductivity is orders of magnitude higher than that of a 3D NAND channel, allowing correspondingly orders of magnitude more 3D levels. As a result, manufacturable memory density could grow quickly and easily by 100X, 1000X, or more with little loss in performance.
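A crude resistance model contrasts the two architectures. In NAND, every cell of the string is in series. In 3D NOR-P, only the selected cell conducts, and the worst case adds the full height of the metallized S and D pillars. Equal cell resistances and the per-level pillar resistance are hypothetical simplifications.

```python
def nand_string_current(n_levels, r_cell_ohm, v_read=0.5):
    """Every cell in the string conducts in series (crude: equal cell resistance)."""
    return v_read / (n_levels * r_cell_ohm)


def nor_p_cell_current(n_levels, r_cell_ohm, r_pillar_per_level_ohm, v_read=0.5):
    """Only the selected cell conducts; in the worst case the current also runs
    the full height of the metallized S and D pillars (two pillars)."""
    return v_read / (r_cell_ohm + 2 * n_levels * r_pillar_per_level_ohm)


R_CELL = 1e5    # hypothetical on-resistance of one cell, ohm
R_PILLAR = 1.0  # hypothetical metal-pillar resistance per level, ohm

for n in (1, 100, 300):
    nand_ratio = nand_string_current(n, R_CELL) / nand_string_current(1, R_CELL)
    nor_ratio = nor_p_cell_current(n, R_CELL, R_PILLAR) / nor_p_cell_current(1, R_CELL, R_PILLAR)
    print(n, f"NAND {nand_ratio:.4f}", f"NOR-P {nor_ratio:.4f}")
```

In this model, the NAND string current falls as 1/N, while the 3D NOR-P cell current stays within a fraction of a percent of its single-level value, even at hundreds of levels.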

Figures 20A and 20B tabulate the main properties of three classes of memory: high-speed memories such as SRAM and DRAM; medium-speed memories such as NOR and storage-class memory (SCM); and high-density memories such as NAND. As the tables show, these technologies trade off density, access time, and power. Power trade-offs can include write power and standby power. Access-time trade-offs can include read access and write access. Density trade-offs may include cell size, the number of levels, and the number of bits programmable per cell. These trade-offs can be tuned to better support the intended application. For example, artificial intelligence (AI) applications may build in physical and electrical differences between the memory portion used for data and the portion used for weights, which typically need many more reads than writes. In this example, data can be stored in cells with no tunnel oxide or a minimal tunnel oxide. Weights can be stored in cells with a tunnel oxide thicker than about 1 nm, which reduces refresh requirements and power at the cost of longer write times.

3D NOR-P memory cells can be designed and processed to better match these trade-offs by varying the cell structure, for example by using a thin or a thick tunnel oxide. This can be done along the X-Y direction with appropriate masks, or along the Z direction by stacking, as in Figure 19A. The match to the application can also be improved through memory control, for example by using mirror-bit or multi-level cell storage techniques.

As used herein, the term "electron charging" refers to increasing the net electron density by trapping electrons or detrapping holes in a charge-trapping layer. Similarly, the term "electron discharging" refers to reducing the net electron density by detrapping electrons or trapping holes in the charge-trapping layer. In binary (one-bit-per-cell) operation, the terms "program" and "erase" are interchangeable with "write 0" and "write 1", respectively.

A method of operating 3D NOR-P is proposed that compensates for cell-to-cell variation. In conventional memory operation, a storage state, such as a threshold voltage, is compared with a reference level. The reference level may be a fixed value generated by an external voltage generator or by a control line, such as the bit line of an unselected adjacent block. However, process-induced variability and program/erase variability can lead to wide threshold-voltage distributions. With a wide threshold-voltage distribution, a fixed reference voltage for read operations may cause read failures, or extra read time may be needed to resolve the read. A self-referenced (differential) operating mode and control circuitry can overcome this cell-to-cell variation.

The differential operating mode and control-circuit scheme uses the memory cell itself to generate the reference signal for reading that cell. The concept is a modification of the mirror-bit cell technique, applied here to a differential operating mode. It exploits the localized charge trapping that a non-conductive trap layer allows, creating an asymmetry in the channel resistance distribution from the source side to the drain side.

Figure 20C illustrates a 2x2 array structure of 3D NOR-P. Figure 20D shows example write voltages for such a memory cell, for four cases on a single cell. Arrows indicate the direction of electron charging. Filled circles in the charge-trap layer represent filled local electron-charged regions. Outline-only (empty) circles represent empty local electron-charge regions (holes). The absolute voltage values shown are only examples and would be set for each specific memory structure.

This charge-trap memory cell can be used for self-referenced (differential) read operations. In state "1", the local threshold voltage on the drain side is lower than that on the source side. In state "0", the local threshold voltage on the drain side is higher than that on the source side. The read operation therefore senses the imbalance, or skew, between the two local threshold voltages, so self-referenced reads can tolerate large cell-to-cell variation.
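A small Monte Carlo sketch shows why a self-referenced read tolerates cell-to-cell variation. Each hypothetical cell gets a random common threshold offset plus a programmed drain/source skew whose sign follows the state convention in the text. The fixed-reference read is modeled as comparing only the drain-side threshold against one global level, which is a simplifying assumption. All voltages are hypothetical.

```python
import random


def read_errors(n_cells=10_000, sigma_offset=0.5, skew=0.3, v_ref=0.0, seed=1):
    """Count read errors for a fixed reference vs. a self-reference.

    Each cell gets a random common Vth offset (cell-to-cell variation) plus a
    programmed skew between its drain-side and source-side local Vth.
    """
    rng = random.Random(seed)
    fixed_err = self_err = 0
    for _ in range(n_cells):
        state = rng.choice((0, 1))
        offset = rng.gauss(0.0, sigma_offset)
        # Convention from the text: state "1" -> drain-side local Vth below source side.
        sign = -1.0 if state == 1 else 1.0
        vth_drain = offset + sign * skew / 2
        vth_source = offset - sign * skew / 2
        fixed_read = 1 if vth_drain < v_ref else 0      # one global reference level
        self_read = 1 if vth_drain < vth_source else 0  # the cell is its own reference
        fixed_err += fixed_read != state
        self_err += self_read != state
    return fixed_err, self_err


print(read_errors(sigma_offset=0.5))   # wide distribution: fixed reference fails often
print(read_errors(sigma_offset=0.01))  # tight distribution: both reads succeed
```

The common offset cancels in the drain-versus-source comparison. This is why, in this model, the self-referenced read never errs, however wide the distribution.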

Figure 20E shows a 2x2 array connected to a sense amplifier (S/A). In conventional memory, the differential inputs of the S/A are connected to a BL in one column and a BL in another column of the memory array. For self-referenced reads, the differential inputs are connected to the BL and the SL of the same column.

Figure 20F illustrates how the S/A BL voltage develops over the read time. Figure 20G shows example voltage conditions and the associated band diagrams for read operations. First, the SL and BL of the selected column are precharged to the same voltage, e.g., 0.5 V. This operation may be called "SL/BL precharge". Next, a WL voltage greater than the local threshold voltage for reading, e.g., 1 V, is applied to the selected row. The SL and BL are at equal potential, but the cell's local threshold voltage is distributed asymmetrically. As a result, a small number of electrons may flow into the channel from the source side in state "1", or from the drain side in state "0". When this small current flows from the source line or the bit line into the channel, the precharged potential of that line drops slightly. The S/A is then enabled to amplify the small difference between the BL and SL of the selected column. The storage state is thus sensed by self-reference.
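The read sequence above can be sketched as a behavioral model: precharge, WL raise, a small droop on one line, and S/A comparison. The droop is modeled as ΔV = I·t/C on the line that supplies the current. The state-to-line mapping follows the text: state "1" draws from the source side, state "0" from the drain side. The current, capacitance, and development time are hypothetical.

```python
def self_referenced_read(state, v_pre=0.5, i_cell_a=1e-8, c_line_f=1e-13,
                         t_develop_s=1e-7):
    """Return the sensed bit and the developed BL-SL difference (V)."""
    v_sl = v_bl = v_pre                        # 1) SL/BL precharge to the same level
    droop = i_cell_a * t_develop_s / c_line_f  # 2-3) WL raised; dV = I*t/C on one line
    # Mapping from the text: state "1" -> current from the source side (SL droops),
    # state "0" -> current from the drain side (BL droops).
    if state == 1:
        v_sl -= droop
    else:
        v_bl -= droop
    diff = v_bl - v_sl
    return (1 if diff > 0 else 0), diff        # 4) S/A resolves the sign of the difference


for s in (0, 1):
    bit, diff = self_referenced_read(s)
    print(s, bit, f"{diff * 1e3:+.1f} mV")
```

Only the sign of the roughly 10 mV difference matters to the S/A, so no external reference level is needed.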

Figure 20G also shows the voltage conditions and associated band diagrams for read inhibit of unselected cells. A WL voltage less than the local threshold voltage for reading (e.g., 0 V) is applied to the unselected rows sharing the SL/BL. Since the read-inhibit WL voltage is less than the local threshold voltage, no channel current flows. For unselected columns sharing the WL, the BL/SL can be left floating; when the BL/SL is floating, no channel current flows. For 3D NOR-P memory structures with a body in contact with the channel, as shown in Figures 27A-27C of US application 16/483431, the body can be biased to 0 volts to greatly accelerate reads with this differential cell structure and method.
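The selected and read-inhibit bias conditions above can be summarized as a small bias table for the 2x2 array. This is a sketch under stated assumptions: the 0.7 V local threshold is hypothetical, the 1 V / 0 V / 0.5 V values are the examples from the text, and `None` stands for a floating BL/SL pair.

```python
from typing import List, Optional

VT_LOCAL = 0.7      # hypothetical local threshold voltage for reading (V)
V_WL_READ = 1.0     # selected-row WL voltage (example value from the text)
V_WL_INHIBIT = 0.0  # unselected-row WL voltage (example value from the text)
V_PRECHARGE = 0.5   # selected-column SL/BL precharge (example value)


def bias_array(rows: int, cols: int, sel_row: int, sel_col: int):
    wl = [V_WL_READ if r == sel_row else V_WL_INHIBIT for r in range(rows)]
    # None marks a floating BL/SL pair on an unselected column.
    col: List[Optional[float]] = [V_PRECHARGE if c == sel_col else None for c in range(cols)]
    return wl, col


def conducting_cells(rows: int, cols: int, sel_row: int, sel_col: int):
    wl, col = bias_array(rows, cols, sel_row, sel_col)
    # Channel current needs the WL above the local Vt AND a driven (not floating) column.
    return [[wl[r] > VT_LOCAL and col[c] is not None for c in range(cols)]
            for r in range(rows)]


print(conducting_cells(2, 2, sel_row=0, sel_col=1))
# [[False, True], [False, False]]
```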

Another example of a read method is accelerating the sensing timing. Referring to the read voltage conditions shown in Figure 5E of US Patent 11018156 and US Patent 10892016, which are incorporated herein by reference, a WL voltage of 1 V is illustrated for reading. In the present embodiments, the gate overdrive voltage used for reading may be greater than 1 V, such as 1.5 V or 2 V, in order to increase the channel current or reduce the minimum time required for sensing. However, the gate overdrive voltage may be limited to a value that causes little soft-write or read disturb to unselected cells sharing the same BL, and little disturbance to the storage state of the selected cell, during read operations. For example, for a given write voltage difference ΔV_WL_SL/BL between WL and SL/BL, the read gate voltage can be between 50% and 75% of ΔV_WL_SL/BL.
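The 50% to 75% rule is simple arithmetic. The sketch below computes the allowed read gate window for a given write difference ΔV_WL_SL/BL and clamps a desired overdrive into it; the write differences used in the examples (4 V, 3 V, 2 V) are hypothetical.

```python
def read_gate_window(dv_write: float, lo_frac: float = 0.50, hi_frac: float = 0.75):
    """Range of read gate voltages as fractions of the write WL-to-SL/BL difference."""
    if not 0.0 < lo_frac <= hi_frac < 1.0:
        raise ValueError("fractions must satisfy 0 < lo_frac <= hi_frac < 1")
    return lo_frac * dv_write, hi_frac * dv_write


def pick_read_gate(dv_write: float, wanted: float) -> float:
    """Clamp a desired overdrive (e.g. 1.5 V or 2 V) into the 50%-75% window."""
    lo, hi = read_gate_window(dv_write)
    return min(max(wanted, lo), hi)


print(read_gate_window(4.0))     # (2.0, 3.0)
print(pick_read_gate(3.0, 2.0))  # 2.0  (window is 1.5 V to 2.25 V)
print(pick_read_gate(2.0, 2.0))  # 1.5  (2 V would exceed 75% of a 2 V write difference)
```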

An additional alternative self-referenced read scheme could use overdrive to read the saturation current of the memory cell and use it as the read reference signal. This technique can use two read cycles, the first for the reference signal and the second for the cell storage-state signal. Figure 20H shows bit line current versus word line voltage characteristics for different storage states and read conditions. A first set of read bias conditions {Vread1} is applied to the selected cell, and the bit line drain current IBL1 is obtained as the {Vread1} cell storage-state signal. A second set of read bias conditions {Vread2}, greater than {Vread1}, is applied to the same selected cell, and the bit line drain current IBL2 (the saturation current) is obtained as the {Vread2} reference signal. The difference in bit line current (IBL2-IBL1) caused by the change in applied bias conditions ({Vread2}-{Vread1}) depends on the state of the selected memory cell. For example, the word line voltage of {Vread2} may be much greater than the threshold voltage of state "0", while the word line voltage of {Vread1} may be much greater than the threshold voltage of state "1" but fall into the subthreshold region of state "0".
The current difference (IBL2-IBL1) for state "1" may be much larger than that for state "0". For example, for state "0" a very small bit line current difference may be observed, such as less than 100 nA, while for state "1" a large bit line current difference may be observed, such as greater than 5 μA. The sense amplifier circuit can use ratiometric rather than differential techniques.
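The two-cycle decision can be sketched as follows, taking the text's statement at face value: a large IBL2-IBL1 means state "1" and a very small one means state "0". The 1 μA differential threshold, the 0.5 ratiometric threshold and the example currents are hypothetical; the ratiometric variant normalizes by the cell's own saturation current instead of using an absolute difference.

```python
DELTA_THRESHOLD_A = 1e-6  # hypothetical: 1 uA, between the ~100 nA and ~5 uA bands
RATIO_THRESHOLD = 0.5     # hypothetical ratiometric decision level


def classify_differential(i_bl1: float, i_bl2: float) -> int:
    """Decide the state from IBL2 - IBL1 of the same selected cell.

    i_bl1: bit line current under {Vread1} (cell storage-state signal), amperes
    i_bl2: bit line current under {Vread2} (saturation/reference signal), amperes
    Per the text, state "1" gives a large difference and state "0" a very small one.
    """
    return 1 if (i_bl2 - i_bl1) > DELTA_THRESHOLD_A else 0


def classify_ratiometric(i_bl1: float, i_bl2: float) -> int:
    """Same decision, normalized by the cell's own reference (saturation) current."""
    if i_bl2 <= 0:
        raise ValueError("reference current must be positive")
    return 1 if (i_bl2 - i_bl1) / i_bl2 > RATIO_THRESHOLD else 0


# Hypothetical readings (amperes) for one cell in each state.
state0 = (9.95e-6, 1.0e-5)  # difference of about 50 nA
state1 = (4.0e-6, 1.0e-5)   # difference of 6 uA
print(classify_differential(*state0), classify_differential(*state1))  # 0 1
print(classify_ratiometric(*state0), classify_ratiometric(*state1))    # 0 1
```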

Such sense amplifier techniques are known in the memory field, and such techniques and circuits have been proposed in at least U.S. Patent 7,590,003 and the following papers: Jeong, Gitae, et al., "A 0.24-μm 2.0-V 1T1MTJ 16-kb nonvolatile magnetoresistance RAM with self-reference sensing scheme," IEEE Journal of Solid-State Circuits 38.11 (2003): 1906-1910; Tanizaki, Hiroaki, et al., "A high-density and high-speed 1T-4MTJ MRAM with voltage offset self-reference sensing scheme," 2006 IEEE Asian Solid-State Circuits Conference, IEEE, 2006; Choi, Jun-Tae, et al., "A novel self-reference sense amplifier for spin-transfer-torque magneto-resistive random access memory," JSTS: Journal of Semiconductor Technology and Science 16.1 (2016): 31-38; and Na, Taehui, et al., "Data-cell-variation-tolerant dual-mode sensing scheme for deep submicrometer STT-RAM," IEEE Transactions on Circuits and Systems I: Regular Papers 65.1 (2017): 163-174, all of which are incorporated herein by reference in their entirety.

Another alternative for self-referenced reading is to use 3D NOR memory without the two-bits-per-cell mirror-bit concept: instead of serving as a bit storage location, the drain-side region serves as the reference location where the self-referenced signal is formed. In this case, only the source side of the cell is used for programming and erasing, while the drain side is used only for reading the reference signal. Two read cycles can be used in a manner similar to that shown in Figure 20H: one read cycle reads the unprogrammed drain side and uses it as a reference for comparison with the read of the source-side memory site. The sense amplifier may use a circuit similar to that of the self-referenced method and structure shown with reference to Figure 20H.
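A minimal sketch of this two-cycle comparison, assuming (this is not stated in the text) that programming raises the source-side local threshold voltage, so a programmed source side conducts well below the unprogrammed drain-side reference. The trip ratio and currents are hypothetical.

```python
def read_with_drain_reference(i_source: float, i_drain_ref: float, ratio: float = 0.5) -> str:
    """Compare a source-side read against the same cell's unprogrammed drain side.

    Assumption (not stated in the text): programming raises the source-side
    local Vt, so a programmed source side conducts well below the reference.
    `ratio` is a hypothetical sense-amplifier trip point.
    """
    if i_drain_ref <= 0:
        raise ValueError("reference current must be positive")
    return "programmed" if i_source < ratio * i_drain_ref else "erased"


# Hypothetical currents (amperes): one cycle reads the drain-side reference,
# the other reads the source-side memory site.
ref = 8e-6
print(read_with_drain_reference(1e-6, ref))  # programmed
print(read_with_drain_reference(7e-6, ref))  # erased
```

Because the reference comes from the same cell, a global shift in that cell's currents moves both readings together, which is the point of the self-referenced approach.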

Systems with such multi-level 3D NOR-P structures may include one or several levels of memory control circuitry that can support efficient in-system transfer of memory data from one device region to another, for example, from a high-density region to a high-speed region, as shown in Figures 15A-17 and 34A-35D of US Patent 10,515,981, the entire contents of which are incorporated herein by reference. In some cases, such a transfer can be controlled by connecting the data lines of one memory region directly to another region, or by first transferring the data from one region to a buffer memory and then from the buffer memory to the other region.
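The two transfer options can be sketched at a purely behavioral level. Here a memory region is modeled as a hypothetical address-to-word dictionary, and `transfer_via_buffer` stages data through a buffer of limited size; none of these names come from the referenced patents.

```python
from typing import Dict, List

Region = Dict[int, int]  # hypothetical model: address -> data word


def transfer_direct(src: Region, dst: Region, addrs: List[int]) -> None:
    # Data lines of the source region connected directly to the destination region.
    for a in addrs:
        dst[a] = src[a]


def transfer_via_buffer(src: Region, dst: Region, addrs: List[int], buffer_size: int) -> int:
    # Stage through a buffer memory in chunks; returns the number of buffer fills.
    fills = 0
    for start in range(0, len(addrs), buffer_size):
        buffer = [(a, src[a]) for a in addrs[start:start + buffer_size]]  # region -> buffer
        fills += 1
        for a, word in buffer:  # buffer -> other region
            dst[a] = word
    return fills


high_density = {a: a * a for a in range(10)}
high_speed: Region = {}
print(transfer_via_buffer(high_density, high_speed, list(range(10)), buffer_size=4))  # 3
```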

Another advantage of 3D memory with memory control logic is the option of constructing the memory with extremely wide data buses. General-purpose memories in the industry have relatively narrow buses, limited by the number of I/Os and pins available in low-cost device packages and by the use of general-purpose printed circuit boards to route the buses. A 3D architecture, in which device control and the connection to the processor can be made vertically, can support very wide data buses, for example 32, 64, 128, 256, 512, or even more than 1024 lines. Such extremely wide buses increase data rates and memory-to-processor bandwidth, especially for multi-core architectures. At least one 3D memory chip could be integrated on the processor chip via 3D stacking, which may require close collaboration between processor and memory designers. Alternatively, at least one 3D memory chip could be integrated with the processor chip through a 2.5D interposer, so that the processor and memory designs can be decoupled. Additionally, multiple 3D memory chips and multiple processors can be connected through a network-on-chip topology. The network-on-chip can be a traditional metal-wire-based network, or it can be based on an optical or RF interposer. Processors can be not only multi-core but also heterogeneous, and heterogeneous integration can include any other exotic devices.
Different types of memory such as MRAM, PCRAM, and RRAM can be integrated into the 3D system. Sensors such as lidar, 3D cameras, and metal-induced crystallization can be integrated directly on the 3D system. Besides wide data buses, the advantages of such 3D integration over traditional PCB integration include shorter wire lengths and the consequent reduction in power.
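The benefit of a wide bus follows from peak bandwidth = bus width × transfer rate. The sketch below uses hypothetical transfer rates to show that a 1024-line bus at a modest 400 MT/s can exceed a conventional 64-bit bus at 3200 MT/s.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, transfer_rate_mt_per_s: float) -> float:
    """Peak bandwidth in GB/s = width (bits) x rate (MT/s) / 8 bits per byte / 1000."""
    return bus_width_bits * transfer_rate_mt_per_s / 8 / 1000


# Hypothetical rates: a narrow fast bus versus very wide slow vertical buses.
print(peak_bandwidth_gb_per_s(64, 3200))  # 25.6
for width in (32, 64, 128, 256, 512, 1024):
    print(width, peak_bandwidth_gb_per_s(width, 400))
```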

Another advantage of 3D memory with memory control logic is that some simple arithmetic/logic units can be included in the memory control logic, which is placed above and/or below the memory array. Simple arithmetic and logic operations can therefore be completed in the memory chip, significantly improving system performance and reducing power consumption. ALU functions can also be performed and optimized in the memory itself.
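To illustrate why an ALU in the memory control level can save bus traffic, the sketch below counts memory-to-processor transfers for a sum reduction done on the host versus one done next to the memory. The functions are hypothetical behavioral models, not a description of a specific circuit.

```python
from typing import List, Tuple


def host_side_sum(memory: List[int]) -> Tuple[int, int]:
    # Every word crosses the memory-to-processor bus before being added.
    total, transfers = 0, 0
    for word in memory:
        total += word
        transfers += 1
    return total, transfers


def in_memory_sum(memory: List[int]) -> Tuple[int, int]:
    # Hypothetical ALU in the memory control level adds the words locally;
    # only the single result crosses the bus.
    return sum(memory), 1


mem = list(range(1000))
print(host_side_sum(mem))  # (499500, 1000)
print(in_memory_sum(mem))  # (499500, 1)
```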

Because the buses are very wide, slow memories can sometimes be used in high-speed computing applications. One example of high-speed computing is AI accelerators for the cloud and the edge. In this case, relatively slow memory architectures such as 3D NAND can be used to form medium-speed systems. It may therefore be desirable in some systems to use 3D NAND architectures with thin tunnel oxides to support moderately good memory speed together with good memory density and low off-state current.

In AI computing, several different AI tasks may need to be performed using different AI algorithms. Because of differing requirements for performance, efficiency, and parameter size, the configuration may need to be flexible. To provide configurability, 3D systems with 3D memory can include FPGA elements. Alternatively, an array of many processing cores and many 3D memory blocks can be composed through the configuration of the network-on-chip, and software can reconfigure the memory bandwidth and memory bit width. In some 3D systems, for example in mobile systems, alternative thermal management techniques (not liquid cooling unless recycled) may be used.

The 3D systems described herein can be full-wafer or diced into sub-wafer sizes. Such dicing can be performed in regular patterns, which can be designed to match yield so as to maximize the number of good-yielding structures in the multilayer wafer structure. The dicing can be accomplished with many of the dicing techniques used in industry. More advanced dicing techniques, such as plasma etching, can be effective, allowing flexible dicing patterns and reducing both the width of the dicing lanes (often called streets) and the associated waste of active device wafer area. Dicing or singulation patterns may be defined with masks or be maskless for greater flexibility, particularly when directional etching/material removal techniques such as plasma-based etching are employed. Laser cutting is another alternative for dicing or singulation.
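One way to picture "designing the dicing pattern to match yield" is to count defect-free systems for several regular tile sizes on a known defect map. The square unit grid and the defect positions below are hypothetical test data; a real flow would use measured wafer maps.

```python
from typing import List, Set, Tuple


def good_tiles(grid: int, tile: int, defects: Set[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return origins of defect-free tile x tile blocks in a regular dicing pattern.

    grid: the wafer is modeled as grid x grid units (hypothetical square layout)
    defects: (row, col) units known to be bad, e.g. from test data
    """
    good = []
    for r0 in range(0, grid - tile + 1, tile):
        for c0 in range(0, grid - tile + 1, tile):
            block = {(r, c) for r in range(r0, r0 + tile) for c in range(c0, c0 + tile)}
            if not block & defects:
                good.append((r0, c0))
    return good


defects = {(0, 0), (3, 3)}
for tile in (1, 2, 4):
    n = len(good_tiles(4, tile, defects))
    print(tile, n, n * tile * tile)  # tile size, good systems, good area in units
```

With two defects, the smallest tiles keep 14 of 16 units, 2x2 tiles keep 8 units, and the single full-size tile keeps none, which is the trade-off a yield-aware dicing pattern has to weigh.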

Typically, the construction of the 3D systems described herein involves multiple layer-transfer steps. Such a layer transfer may include flipping the donor wafer onto the target wafer and performing hybrid bonding. The donor wafer substrate is then ground and etched back using a built-in cut layer (e.g., SiGe), and, if needed, pins/pads are formed for the next step. These steps may include swapping the roles of the donor and target wafers and substantially removing the substrate from one or both of them, as shown with reference to at least Figure 13A herein and in many of the incorporated references. These layer-transfer steps can include the use of a carrier wafer, as proposed several times in the technologies incorporated by reference, or as proposed in Jourdain, Anne, et al., "Extreme wafer thinning and nano-TSV processing for 3D heterogeneous integration," 2020 IEEE 70th Electronic Components and Technology Conference (ECTC), IEEE, 2020, which is incorporated herein by reference in its entirety. The use of a carrier wafer helps perform the backside addition of pins/pads on the side wafer rather than on the target 3D structure. In addition, it effectively flips the transferred layer back, so that it is aligned with the target wafer in non-flipped form. Thus, for example, referring to Figure 13A herein, structure 1318 would be the carrier wafer, and the flow to form structure 1330 may represent the use of the carrier wafer before the final step of removing it.
The carrier wafer removal process may be similar to substrate removal by grinding and etching back to the built-in etch stop layer.
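The orientation argument for the carrier wafer can be checked with a toy stack model: each flip reverses the top-to-bottom layer order, so a direct donor-to-target transfer lands upside down while the carrier route flips twice and restores the original order. Layer names are illustrative; bonding and substrate removal are not modeled.

```python
from typing import List

Stack = List[str]  # layers listed from top to bottom


def flip(stack: Stack) -> Stack:
    # Flipping a wafer face-down reverses the top-to-bottom order.
    return list(reversed(stack))


def direct_transfer(donor_layer: Stack) -> Stack:
    # Donor flipped onto the target: the transferred layer ends up upside down.
    return flip(donor_layer)


def carrier_transfer(donor_layer: Stack) -> Stack:
    # First flip onto the carrier wafer (donor substrate then removed), then
    # flip again onto the target; the two flips cancel.
    on_carrier = flip(donor_layer)
    return flip(on_carrier)


layer = ["interconnect", "transistors", "etch-stop"]
print(direct_transfer(layer))   # ['etch-stop', 'transistors', 'interconnect']
print(carrier_transfer(layer))  # ['interconnect', 'transistors', 'etch-stop']
```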

The 3D system proposed herein can therefore provide a flexible framework for forming the final system. It can support mixing and matching device-level processing done at various fabs using multiple processes. 3D systems can be built like LEGO bricks, where the engineering can include using "off-the-shelf" generic levels. Industry standards for the vertical bus and unit dimensions may help make such generic levels available for LEGO-like system construction. The levels may also include at least one programmable core array. Other forms of flexibility can take advantage of technologies such as (embedded) field-programmable logic and semi-custom logic presented herein or incorporated by reference. The flexible framework can include building the system by selecting the various levels stacked in the Z direction, as well as choosing between full-wafer level and dicing, including the dicing options in the X/Y directions. Engineers in the field can design specific systems and devices based on the flexible 3D system framework proposed herein.
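A standard for the vertical bus and unit dimensions can be thought of as an interface contract that every generic level must satisfy before it is stacked. The sketch below checks such a contract; the `Level` fields, the 128-strut bus and the 500 μm unit pitch are hypothetical values, not an existing industry standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Level:
    name: str
    bus_struts: int       # hypothetical vertical-bus strut count per unit
    unit_pitch_um: float  # hypothetical unit (ECU) pitch in microns


def stack_is_compatible(levels: List[Level]) -> bool:
    """True if every level matches the bottom level's vertical bus and unit pitch."""
    if not levels:
        return False
    base = levels[0]
    return all(lvl.bus_struts == base.bus_struts and lvl.unit_pitch_um == base.unit_pitch_um
               for lvl in levels[1:])


logic = Level("logic", 128, 500.0)
control = Level("memory control", 128, 500.0)
memory = Level("3D NOR memory", 128, 500.0)
legacy = Level("non-standard level", 64, 500.0)
print(stack_is_compatible([logic, control, memory]))  # True
print(stack_is_compatible([logic, legacy, memory]))   # False
```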

Figure 21A shows the prior-art wafer-scale engine of Cerebras Systems. The wafer-scale engine is square after four pieces of silicon are cut from the round wafer. Its input/output connection pins or pads are located on two edges of the square wafer-scale engine.

As an alternative, a wafer-scale 3D system can include IO pads along the circumference of the wafer edge. The IO pads can be in at least one row, as shown in the enlarged view of Figure 21B. Although IO pads are usually arranged in straight lines in traditional semiconductor systems, these IO pads can be arranged axially. The I/O pads may be placed within approximately 110 microns, approximately 220 microns, approximately 300 microns, or 500 microns of the wafer edge. The IO pads can all be oriented in one direction, preferably directly "up" or "north" relative to the wafer stepper layout; or they can all point toward the center of the circular wafer, as if on radial rays from the center; or their orientation can vary with position to allow the optimal angle of attack for the pad bonding/connection process. Configuring a wafer-scale 3D system over the full wafer size gives unmatched utilization of the wafer real estate.
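The pad placement and two of the orientation options can be sketched geometrically. This assumes a 300 mm wafer, a single row of evenly spaced pads at a 110 μm inset, and orientation expressed as an angle in degrees in the layout frame; all of these are illustrative choices.

```python
import math
from typing import List, Tuple

Pad = Tuple[float, float, float]  # (x_mm, y_mm, orientation_deg)


def edge_pads(n: int, wafer_diameter_mm: float = 300.0, inset_um: float = 110.0,
              mode: str = "radial") -> List[Pad]:
    """Place n pads evenly around the wafer edge, inset_um inside the edge.

    mode "north": every pad faces +y (up on the stepper layout)
    mode "radial": every pad faces the wafer center
    """
    r = wafer_diameter_mm / 2 - inset_um / 1000.0
    pads = []
    for k in range(n):
        theta = 2 * math.pi * k / n
        x, y = r * math.cos(theta), r * math.sin(theta)
        if mode == "north":
            angle = 90.0
        elif mode == "radial":
            angle = math.degrees(theta + math.pi) % 360.0  # points back toward the center
        else:
            raise ValueError(f"unknown mode: {mode}")
        pads.append((x, y, angle))
    return pads


for pad in edge_pads(4):
    print(pad)
```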

Another embodiment relates to a fixture for a 3D wafer-scale system. Traditionally, semiconductor chips have been packaged to protect them from the external environment, such as shock, light, humidity, and dust. Packaging materials often include inorganic fillers, such as silica, and epoxy resins. Chips for consumer electronics, especially for mobile applications, need protection from these environmental threats. However, enterprise applications such as supercomputers, data centers, and servers may not need packaging, because they typically operate in environments with well-controlled light, humidity, and dust, and the chips are rarely relocated after installation. A major disadvantage of packaging is that its thermal resistance impedes heat dissipation.

Therefore, to better dissipate heat from a wafer-scale 3D system, a fixture that allows the system to be "bare" can be used, as shown in Figures 21C to 21E. The fixture can be made to hold and clamp the edge region of the wafer-scale 3D system while leaving the central region exposed. The fixture can be configured to connect to the IO pads of the wafer-scale 3D system through the gripping region. The fixture may include an indentation matching the shape of the wafer-scale 3D system, so that the wafer/3D system seats in it and is held securely. In Figures 21C to 21E the wafer-scale 3D system is drawn as a circle; however, if the wafer-scale 3D system is sawed, the system shape can be a square, a rectangle, a truncated square, or a truncated rectangle. The wafer-scale 3D system may include IO pads along the wafer edge region, as shown in Figure 21B. Materials for the bottom of the chassis can include aluminum, stainless steel, silicon, silicon carbide, glass, or these materials plated with a metal such as nickel or gold. The fixture can hold the wafer in a variety of ways.

Figure 21C shows a screw-type fixture. The top and bottom of the fixture are separate pieces, which hold the wafer in place using a screw-and-spring mechanism. The number and location of the screws and springs depend on engineering considerations and may differ from the four screws/springs shown. Figure 21D shows a clip-type fixture, whose upper and lower parts are connected by a vertical hinge. Figure 21E shows a clamshell fixture, whose top and bottom are connected by a horizontal hinge. When the wafer is mounted, the hinge mechanism clamps and holds it. Figure 21F shows an interior view of the bottom of such a fixture. The bottom of the fixture can include a resilient material, such as a rubber O-ring or polyimide, which applies gentle pressure to hold the wafer when the fixture is closed. Figure 21G shows an interior view of the top of the fixture. The top of the fixture may include a printed circuit board (PCB) system for signal integrity, power integrity, and other control functions. The PCB may also include pins for probing the IO pads of the wafer-scale 3D system.

Another embodiment may use a liquid immersion cooling bath that can contain multiple wafer-scale 3D systems mounted in bare fixtures such as those shown in Figures 21C to 21E. Multiple fixtures holding bare wafer-scale 3D systems can be immersed in a bath of dielectric heat-transfer liquid. If the fluid requires circulation, as in single-phase cooling, the bath may optionally include inlet and outlet holes.

Alternatively, the bath may use a two-phase cooling fluid. In two-phase cooling, the fluid evaporates at the surfaces of the hot portions of the wafer-scale 3D system, removing heat from the hot regions of the wafer. The fluid circulates passively through the evaporation and condensation of the coolant. Figure 21H shows Gigabyte Technology's two-phase cooling tank (https://www.gigabyte.com/Solutions/Cooling/immersion-cooling), in which PCBs carrying multiple packaged chips are immersed in the bath. As an alternative embodiment, the liquid cooling bath may contain multiple bare wafer-scale 3D systems, as shown in Figure 21I.

Another option is a fixture for a bare wafer-scale 3D system that further includes an Ethernet port and a power port, as shown in Figure 21J. The Ethernet and power ports are connected to the wafer-scale 3D system via a PCB similar to that shown in Figure 21G.

In another alternative, the wafer-scale 3D system can be designed to be cut in half, as shown in Figure 21K. Half-wafer-scale 3D systems can be arrayed on a printed circuit board, as shown in Figure 21L. Input/output (I/O) pads can be formed along the centerline of the wafer, as shown in Figure 21K. As shown in Figure 21L, when multiple wafers are arrayed, they can share data channels and addresses. A half-wafer-scale 3D system can be mounted directly in a socket, and it can be bare or encapsulated with epoxy molding compound or polyimide. To determine which wafer information should be sent to and received from, the I/O pads may include an appropriate wafer identification number (ID). In such an alternative, the arrayed wafer-scale 3D system can be an interleaved 3D system: by distributing addresses evenly across the wafers but distinguishing them by wafer ID, memory accesses, instructions, and other programming can be interleaved by the master controller.
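A common way to distribute addresses evenly across the arrayed wafers is low-order interleaving, where the wafer ID is the address modulo the number of wafers. This is a standard technique swapped in for illustration; the function names and the four-wafer example are hypothetical.

```python
from typing import Tuple


def route(addr: int, n_wafers: int) -> Tuple[int, int]:
    """Map a global address to (wafer ID, local address) with low-order interleaving.

    Consecutive addresses land on consecutive wafers, so sequential accesses
    from the master controller are spread evenly over all wafers.
    """
    return addr % n_wafers, addr // n_wafers


def unroute(wafer_id: int, local: int, n_wafers: int) -> int:
    # Inverse mapping, used when a wafer returns data tagged with its ID.
    return local * n_wafers + wafer_id


print([route(a, 4) for a in range(6)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```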

The 3D system proposed herein can be treated as a semiconductor device and integrated into larger systems using other integration technologies used in the industry, such as printed circuit boards (PCBs), interposers, substrates, and integration technology also known as 2.5D, among others.

It will also be understood by those of ordinary skill in the art that the invention is not limited to what has been particularly shown and described above. For example, SiGe used as the designated sacrificial layer or etch stop layer may be replaced by compatible materials or combinations of other materials, including SiGe with added materials such as carbon, or with various dopants such as boron, or other variants. As another example, the drawings or illustrations may omit n or p wells for clarity. Furthermore, any transfer layer, donor substrate, or wafer preparation shown or discussed herein may include one or more undoped regions or layers of semiconductor material. Additionally, when transferred, one or more transferred layers may have regions of STI or other transistor elements within or on them. For example, the order of the levels and their functions may differ from that shown here, and the use of hybrid bonding or other types of bonding, the associated alignment techniques, and the vertical connections may be mixed and matched using techniques presented herein, incorporated by reference, or others.
Furthermore, the modular approach of a typical unit-based architecture can support the desired flexible system construction, such as dicing a 3D heterogeneously integrated wafer into 40x40 mm² systems, or into larger sizes such as 100x100 mm² systems, or even using the 3D wafer as the final system. In addition, the system may be designed as a mix of units with different sizes and/or functions, including units supporting AI computation and units supporting data management and system management. Additionally, 3D systems can be scaled beyond the wafer size by using panels with built-in waveguides or transmission lines, as shown in Figures 43A to 43E of U.S. Patent Application 16/558,304, Publication 2020/0176420, which is incorporated herein by reference.

There are many options and engineering considerations in building a specific system with the techniques presented herein, as those skilled in the art will appreciate. The scope of the invention includes combinations and subcombinations of the various features described above, as well as modifications and variations that would occur to those skilled in the art upon reading the foregoing description. Accordingly, the invention is limited only by the appended claims.

Claims (23)

1. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus,
wherein the vertical bus comprises more than eight struts and less than three hundred struts,
wherein the vertical bus provides an electrical connection between the first circuit and the second circuit,
wherein each of the ECUs includes a vertical control line,
wherein the vertical control line comprises greater than eight hundred struts, and
wherein the vertical control line provides an electrical connection between the second circuit and the third circuit.
2. The device according to claim 1,
wherein the vertical bus is compatible with at least one industry accepted standard computer bus.
3. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus,
wherein the vertical bus comprises more than eight struts and less than three hundred struts,
wherein the vertical bus provides an electrical connection between the first circuit and the third circuit,
wherein each of the ECUs includes a vertical control line,
wherein the vertical control line comprises greater than eight hundred struts, and
wherein the vertical control line provides an electrical connection between the second circuit and the third circuit.
4. A device according to claim 2,
wherein the second level is joined to the first level, and
wherein the bond includes an oxide-to-oxide bond region and a metal-to-metal bond region.
5. A device according to claim 2,
wherein the vertical bus includes less than one hundred twenty struts.
6. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus,
wherein the vertical bus comprises more than eight pillars and fewer than three hundred pillars,
wherein the vertical bus provides an electrical connection between the first circuit and the second circuit,
wherein the third level comprises an array of memory cells, and wherein the second circuit comprises a memory control circuit.
7. The device according to claim 6,
wherein the second level is bonded to the first level, and
wherein the bond includes an oxide-to-oxide bond region and a metal-to-metal bond region.
8. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a first vertical bus,
wherein the first vertical bus includes more than eight first pillars and fewer than three hundred first pillars,
wherein the first vertical bus provides an electrical connection between the first circuit and the second circuit; and
a second vertical bus,
wherein the second vertical bus includes more than eight second pillars and fewer than three hundred second pillars,
wherein the second vertical bus provides an electrical connection between the second circuit and the third circuit, and
wherein the first pillars are not in direct contact with the second pillars.
9. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus,
wherein the vertical bus comprises more than eight pillars and fewer than three hundred pillars,
wherein the vertical bus provides an electrical connection between the first circuit and the second circuit, and
wherein the vertical bus includes a plurality of redundant pillars.
10. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus,
wherein the vertical bus comprises more than eight pillars and fewer than three hundred pillars,
wherein the vertical bus provides an electrical connection between the first circuit and the second circuit, and
wherein the vertical bus includes a plurality of power delivery pillars.
11. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the plurality of ECUs comprises a vertical bus,
wherein the vertical bus comprises more than eight pillars and fewer than three hundred pillars,
wherein the vertical bus provides an electrical connection between the first circuit and the second circuit, and
wherein the vertical bus meets industry standards regarding the positions of the pillars relative to the ECU edges.
12. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus,
wherein the vertical bus comprises more than eight pillars and fewer than three hundred pillars,
wherein the vertical bus provides an electrical connection between the first circuit and the second circuit, and
wherein the vertical bus includes a plurality of data pillars and a plurality of address pillars.
13. The device according to claim 12,
wherein the vertical bus comprises at least eight data pillars.
14. The device according to claim 12,
wherein each ECU has an area greater than 2,500 square microns and less than 4 square millimeters.
15. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus, and
wherein each of the ECUs includes a plurality of trench capacitors.
16. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus,
wherein the vertical bus comprises more than eight pillars and fewer than three hundred pillars,
wherein the vertical bus provides an electrical connection between the first circuit and the second circuit, and
wherein each of a portion of the pillars comprises an electrostatic discharge ("ESD") structure.
17. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus, and
wherein each of the ECUs includes a plurality of power conditioners.
18. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus, and
wherein each of the ECUs includes a plurality of charge pump circuits.
19. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus, and
wherein each of the ECUs includes at least one high-resistivity trap-rich layer.
20. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus, and
wherein each of the ECUs includes at least one monitor circuit.
21. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus, and
wherein each of the ECUs includes at least one temperature sensor.
22. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus, and
wherein each of the ECUs includes a first liquid cooling structure and a second liquid cooling structure, and
wherein the second liquid cooling structure is disposed above the first liquid cooling structure.
23. A 3D device, the device comprising:
a first level including a first transistor, the first level including a first interconnect;
a second level including a second transistor, the second level overlying the first level;
a third level including a third transistor, the third level overlying the second level; and
a plurality of Electronic Circuit Units (ECU),
wherein each of the plurality of ECUs comprises a first circuit comprising a portion of the first transistor, wherein each of the plurality of ECUs comprises a second circuit comprising a portion of the second transistor, wherein each of the plurality of ECUs comprises a third circuit comprising a portion of the third transistor, wherein each of the ECUs comprises a vertical bus, and
wherein each of the ECUs includes a first electromagnetic interconnect structure and a second electromagnetic interconnect structure, and
wherein the second electromagnetic interconnect structure is disposed over the first electromagnetic interconnect structure.
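
The numeric limits recited in claims 3, 5, 8, 13 and 14 can be collected into a small checker, for example when screening a candidate ECU floorplan against the claimed ranges. The following is a minimal, hypothetical Python sketch, not part of the patent: every function name is invented for illustration. "More than", "fewer than", "greater than" and "less than" are treated as strict bounds, while "at least" is inclusive. Claim 8's requirement that the first pillars not directly contact the second pillars is approximated, as an assumption, by requiring that the two buses share no pillar identifiers.

```python
from typing import Iterable

# Hypothetical helpers encoding numeric limits recited in the claims.
# "More than"/"fewer than" bounds are strict; "at least" is inclusive.

UM2_PER_MM2 = 1_000_000  # 1 mm = 1,000 um, so 1 mm^2 = 10^6 um^2


def bus_pillar_count_ok(n: int) -> bool:
    """Claim 3: a vertical bus has more than 8 and fewer than 300 pillars."""
    return 8 < n < 300


def narrow_bus_pillar_count_ok(n: int) -> bool:
    """Claim 5: fewer than 120 pillars, combined here (an assumption)
    with the 8 < n < 300 range recited in the independent claims."""
    return bus_pillar_count_ok(n) and n < 120


def control_line_pillar_count_ok(n: int) -> bool:
    """Claim 3: the vertical control line has more than 800 pillars."""
    return n > 800


def buses_disjoint(first_pillars: Iterable[int], second_pillars: Iterable[int]) -> bool:
    """Claim 8, approximated: no pillar ID belongs to both buses."""
    return set(first_pillars).isdisjoint(second_pillars)


def data_pillar_count_ok(n: int) -> bool:
    """Claim 13: the vertical bus has at least 8 data pillars."""
    return n >= 8


def ecu_area_ok(width_um: float, height_um: float) -> bool:
    """Claim 14: ECU area is > 2,500 um^2 and < 4 mm^2 (4,000,000 um^2)."""
    area_um2 = width_um * height_um
    return 2_500 < area_um2 < 4 * UM2_PER_MM2


if __name__ == "__main__":
    # A hypothetical ECU: 64-pillar bus, 1,024-pillar control line, 500 um x 500 um.
    print(bus_pillar_count_ok(64), control_line_pillar_count_ok(1_024), ecu_area_ok(500, 500))
    # prints: True True True
```

Because the area bounds in claim 14 are strict, a square ECU would need a side length greater than 50 µm (since 50 µm × 50 µm = 2,500 µm²) and less than 2 mm (since 2 mm × 2 mm = 4 mm²).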
CN202180096166.6A 2021-01-22 2021-08-01 3D semiconductor devices and structures Pending CN117581366A (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US63/140,661 2021-01-22
US63/144,970 2021-02-02
US63/151,664 2021-02-20
US63/180,083 2021-04-27
US63/196,682 2021-06-04
US202163220443P 2021-07-09 2021-07-09
US63/220,443 2021-07-09
PCT/US2021/044110 WO2022159141A1 (en) 2021-01-22 2021-08-01 3d semiconductor device and structure

Publications (1)

Publication Number Publication Date
CN117581366A true CN117581366A (en) 2024-02-20

Family

ID=89862994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180096166.6A Pending CN117581366A (en) 2021-01-22 2021-08-01 3D semiconductor devices and structures

Country Status (1)

Country Link
CN (1) CN117581366A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230230901A1 (en) * 2022-01-10 2023-07-20 International Business Machines Corporation TSV and Backside Power Distribution Structure
US12431408B2 (en) * 2022-01-10 2025-09-30 International Business Machines Corporation TSV and backside power distribution structure
CN118321755A (en) * 2024-06-13 2024-07-12 通威微电子有限公司 Silicon carbide laser cold cracking method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination