US7518225B2 - Chip system architecture for performance enhancement, power reduction and cost reduction - Google Patents

Chip system architecture for performance enhancement, power reduction and cost reduction

Info

Publication number
US7518225B2
Authority
US
United States
Prior art keywords
chip
layered chip
layered
carrier package
stack
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US11/538,567
Other versions
US20070290315A1 (en
Inventor
Philip G. Emma
John U. Knickerbocker
Chirag S. Patel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GlobalFoundries US Inc
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US11/538,567
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION reassignment INTERNATIONAL BUSINESS MACHINES CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: EMMA, PHILIP G., KNICKERBOCKER, JOHN U., PATEL, CHIRAG S.
Priority to PCT/US2007/071370 (WO2008100324A2)
Priority to CN2007800188856A (CN101473436B)
Priority to EP07863368A (EP2036126A2)
Publication of US20070290315A1
Application granted
Publication of US7518225B2
Assigned to GLOBALFOUNDRIES U.S. 2 LLC reassignment GLOBALFOUNDRIES U.S. 2 LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: INTERNATIONAL BUSINESS MACHINES CORPORATION
Assigned to GLOBALFOUNDRIES INC. reassignment GLOBALFOUNDRIES INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GLOBALFOUNDRIES U.S. 2 LLC, GLOBALFOUNDRIES U.S. INC.
Assigned to WILMINGTON TRUST, NATIONAL ASSOCIATION reassignment WILMINGTON TRUST, NATIONAL ASSOCIATION SECURITY AGREEMENT Assignors: GLOBALFOUNDRIES INC.
Assigned to GLOBALFOUNDRIES U.S. INC. reassignment GLOBALFOUNDRIES U.S. INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GLOBALFOUNDRIES INC.
Assigned to GLOBALFOUNDRIES INC. reassignment GLOBALFOUNDRIES INC. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: WILMINGTON TRUST, NATIONAL ASSOCIATION
Assigned to GLOBALFOUNDRIES U.S. INC. reassignment GLOBALFOUNDRIES U.S. INC. RELEASE BY SECURED PARTY (SEE DOCUMENT FOR DETAILS). Assignors: WILMINGTON TRUST, NATIONAL ASSOCIATION

Classifications

    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01LSEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L25/00Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof
    • H01L25/03Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof all the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes
    • H01L25/04Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof all the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes the devices not having separate containers
    • H01L25/065Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof all the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group H01L27/00
    • H01L25/0657Stacked arrangements of devices
    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01LSEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L25/00Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof
    • H01L25/03Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof all the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes
    • H01L25/04Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof all the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes the devices not having separate containers
    • H01L25/065Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof all the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group H01L27/00
    • H01L25/0652Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof all the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/00, or in a single subclass of H10K, H10N, e.g. assemblies of rectifier diodes the devices not having separate containers the devices being of a type provided for in group H01L27/00 the devices being arranged next and on each other, i.e. mixed assemblies
    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01LSEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L25/00Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof
    • H01L25/18Assemblies consisting of a plurality of individual semiconductor or other solid state devices ; Multistep manufacturing processes thereof the devices being of types provided for in two or more different subgroups of the same main group of groups H01L27/00 - H01L33/00, or in a single subclass of H10K, H10N
    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01LSEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L2225/00Details relating to assemblies covered by the group H01L25/00 but not provided for in its subgroups
    • H01L2225/03All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00
    • H01L2225/04All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers
    • H01L2225/065All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers the devices being of a type provided for in group H01L27/00
    • H01L2225/06503Stacked arrangements of devices
    • H01L2225/06513Bump or bump-like direct electrical connections between devices, e.g. flip-chip connection, solder bumps
    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01LSEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L2225/00Details relating to assemblies covered by the group H01L25/00 but not provided for in its subgroups
    • H01L2225/03All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00
    • H01L2225/04All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers
    • H01L2225/065All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers the devices being of a type provided for in group H01L27/00
    • H01L2225/06503Stacked arrangements of devices
    • H01L2225/06517Bump or bump-like direct electrical connections from device to substrate
    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01LSEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L2225/00Details relating to assemblies covered by the group H01L25/00 but not provided for in its subgroups
    • H01L2225/03All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00
    • H01L2225/04All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers
    • H01L2225/065All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers the devices being of a type provided for in group H01L27/00
    • H01L2225/06503Stacked arrangements of devices
    • H01L2225/06541Conductive via connections through the device, e.g. vertical interconnects, through silicon via [TSV]
    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01LSEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L2225/00Details relating to assemblies covered by the group H01L25/00 but not provided for in its subgroups
    • H01L2225/03All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00
    • H01L2225/04All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers
    • H01L2225/065All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers the devices being of a type provided for in group H01L27/00
    • H01L2225/06503Stacked arrangements of devices
    • H01L2225/06572Auxiliary carrier between devices, the carrier having an electrical connection structure
    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01LSEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L2225/00Details relating to assemblies covered by the group H01L25/00 but not provided for in its subgroups
    • H01L2225/03All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00
    • H01L2225/04All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers
    • H01L2225/065All the devices being of a type provided for in the same subgroup of groups H01L27/00 - H01L33/648 and H10K99/00 the devices not having separate containers the devices being of a type provided for in group H01L27/00
    • H01L2225/06503Stacked arrangements of devices
    • H01L2225/06589Thermal management, e.g. cooling
    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01LSEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L23/00Details of semiconductor or other solid state devices
    • H01L23/58Structural electrical arrangements for semiconductor devices not otherwise provided for, e.g. in combination with batteries
    • H01L23/64Impedance arrangements
    • H01L23/642Capacitive arrangements
    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01LSEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L2924/00Indexing scheme for arrangements or methods for connecting or disconnecting semiconductor or solid-state bodies as covered by H01L24/00
    • H01L2924/0001Technical content checked by a classifier
    • H01L2924/0002Not covered by any one of groups H01L24/00, H01L24/00 and H01L2224/00
    • HELECTRICITY
    • H01ELECTRIC ELEMENTS
    • H01LSEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L2924/00Indexing scheme for arrangements or methods for connecting or disconnecting semiconductor or solid-state bodies as covered by H01L24/00
    • H01L2924/15Details of package parts other than the semiconductor or other solid state devices to be connected
    • H01L2924/151Die mounting substrate
    • H01L2924/153Connection portion
    • H01L2924/1531Connection portion the connection portion being formed only on the surface of the substrate opposite to the die mounting surface
    • H01L2924/15311Connection portion the connection portion being formed only on the surface of the substrate opposite to the die mounting surface being a ball array, e.g. BGA

Definitions

  • This invention generally relates to a method for designing multi-core processing systems, and more specifically, to a method for combining two emerging technology areas: Systems On Package (SOP) and 3D technology.
  • SOP Systems On Package
  • FIG. 1 illustrates the above situation.
  • a “core” 100 which comprises the processor 101 itself, and two additional cache levels 103 , 104 .
  • the processor 101 is the logic-intensive circuitry.
  • the L1 cache 102 is considered to be part of the processor 101, since its access is part of the processor pipeline. As frequencies have gone up to multiple GHz, the L1 102 has become too small to provide high hit rates. Therefore, an L1.5 103 cache level has been added to supplement the L1 102.
  • the L1.5 103 is also limited in size because it must be reasonably fast to accommodate the high miss rate from the L1 102. Therefore, an L2 104 is required. Note that most of what is called a “core” 100 is actually memory. When multiple cores 100 are aggregated into a single “multi-core chip” 105, the chip is large, but most of it is memory.
  • FIG. 2 illustrates today's prior art technology for multi-core chips.
  • the multi-core chip 105 is mounted on a Dual-Chip Module (DCM) 200 , which can either be ceramic or organic, where it is connected to another chip shown as L3 201 .
  • DCM Dual-Chip Module
  • L3 201 usually contains not only another layer of cache, but also other functions that facilitate the operation of the nest (not shown) (i.e. all system functions excluding the processor, cache, and memory; such as external memory, I/O controllers, wiring, PC boards, etc.), and functions that connect the DCM 200 to other devices in the system.
  • DCMs 200 are mounted with another layer of packaging 202 (and/or other interconnection means), as shown.
  • the four 4-core chips could be put together on a 4-chip module (not shown), and connected to other layers of cache placed off the module.
  • a 64-way system can be made out of four 16-way systems 203 (e.g. probably four boards in a chassis); a 256-way system by placing four chassis into a frame; multiple frames in a room, etc.
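The aggregation described above (four 4-core chips per 16-way system, four of those per chassis, four chassis per frame) can be sketched as a simple fan-in calculation. The function name and the uniform fan-in of four are illustrative assumptions drawn from the examples in the text, not a structure the patent prescribes.

```python
# Hypothetical helper: the name and the uniform fan-in are assumptions
# made for illustration, based on the 4 -> 16 -> 64 -> 256 example.
def ways_per_level(cores_per_chip=4, fan_in=4, levels=4):
    """Return the processor count at each packaging level.

    Level 0 is one multi-core chip; every further level aggregates
    `fan_in` units of the level below (system, chassis, frame, ...).
    """
    counts = [cores_per_chip]
    for _ in range(levels - 1):
        counts.append(counts[-1] * fan_in)
    return counts


if __name__ == "__main__":
    print(ways_per_level())
```

Each extra level multiplies the processor count but also adds another layer of packaging, which is the cost the denser approaches discussed later try to avoid.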
  • SC Silicon Carrier
  • the wiring on the silicon wafer henceforth called a Silicon Carrier (SC)
  • a “chip-to-chip” electrical signal that propagates across a SC appears to be very much the same as any other electrical signal propagating across a single, larger chip at a similar distance.
  • Because the SC is purely a wiring vehicle and not subject to the blockages encountered on a normal chip, wiring can be shielded much more easily, resulting in better electrical characteristics.
  • SC technology appears to offer many advantages. It allows several chips to be connected together to operate as a single, larger chip.
  • the resulting chip system is larger than the theoretical size of any real chip, but because the chip is partitioned into smaller pieces, the effective yield is higher.
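The yield claim can be illustrated with a simple Poisson defect model. The patent does not name a yield model, so the model, the function names, and the defect density used below are all assumptions chosen only to show the trend: if each piece is tested before assembly, the fraction of fabricated silicon that is usable depends on the area of one piece rather than on the total area.

```python
import math


def poisson_yield(area_cm2, defects_per_cm2):
    """Probability that a die of the given area has zero defects
    (Poisson model; an assumption, not taken from the patent)."""
    return math.exp(-area_cm2 * defects_per_cm2)


def good_silicon_fraction(total_area_cm2, pieces, defects_per_cm2):
    """Fraction of fabricated silicon that lands in good dies when the
    design is split into `pieces` equal chips tested before assembly."""
    return poisson_yield(total_area_cm2 / pieces, defects_per_cm2)


if __name__ == "__main__":
    # Illustrative numbers only: 4 cm^2 of logic, 0.5 defects per cm^2.
    monolithic = good_silicon_fraction(4.0, 1, 0.5)
    partitioned = good_silicon_fraction(4.0, 4, 0.5)
    print(f"monolithic: {monolithic:.3f}, four pieces: {partitioned:.3f}")
```

The sketch ignores assembly yield and carrier cost, both of which would reduce the advantage in practice.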
  • the system of FIG. 2 can be implemented more efficiently and much denser.
  • the advantage of making this denser is that more processors and cache can be aggregated into a subsystem, which allows a larger system (in number of processors) to be implemented in a given volume. Further, operation at much reduced power is possible using this densely integrated solution compared to no integration with chip stacks and silicon carrier due to power savings such as voltage reduction, reduced RC (because of reduced line length, size in chip stacks, and between chips on a silicon carrier), and lower noise.
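The reduced-RC point can be made concrete with a first-order wire model. For a uniform wire, both resistance and capacitance grow linearly with length, so the RC product grows with the square of the length. The function below is a sketch under that assumption (uniform, unrepeated wire); it is not a model given in the patent.

```python
def relative_rc_delay(new_length, old_length):
    """Ratio of RC products for two lengths of the same uniform wire.

    Assumes resistance and capacitance are each proportional to length,
    so the RC product scales with length squared.
    """
    if old_length <= 0 or new_length < 0:
        raise ValueError("lengths must be positive")
    return (new_length / old_length) ** 2


if __name__ == "__main__":
    # Halving a line's length cuts its RC product to one quarter.
    print(relative_rc_delay(0.5, 1.0))
```

Real interconnect with repeaters or shielding behaves differently, so this only indicates why shorter lines in chip stacks and on a silicon carrier help.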
  • FIG. 3 depicts two ways in which we could make a 16-way system 301 , 302 using the 4-core chips 105 of FIG. 1 and some L3 chips 201 . Shown in the top left corner, four of these 4-way chips 105 are assembled together on a large SC to make a 16-way chip 300 . To make a system, the 16-way SC “chip” 300 is combined with some L3 chips 201 using standard packaging as shown on the right 301 . Alternately, four of the 4-way chips 105 are combined with L3 chips 201 on a huge SC 302 to implement the entire system in one step.
  • This latter implementation 302 requires fewer levels of packaging, has shorter wires (hence lower power, and perhaps wider paths) between the elements, and results in a smaller footprint.
  • the SC is much larger in 302 , and the power density is higher. Which of these implementations ( 301 or 302 ) is “better” depends on the characteristic being optimized. At this level of abstraction, it is impossible to say which of these is cheaper by the time that the entire system is put together.
  • 3D and SC seem to offer many of the same advantages.
  • Each technology enables multiple chips to be integrated into a larger, monolithic entity, appearing as if it were a much larger chip.
  • If circuits on the component chips can be arranged such that most of the required interconnections between them are directly over (or under) each other, then the interconnection wiring is much shorter than for SC.
  • Most logic circuitry cannot generally be arranged in this way, however this arrangement can work well for memory circuits.
  • 11/453,885 entitled “METHOD FOR ACHIEVING VERY HIGH BANDWIDTH BETWEEN THE LEVELS OF A CACHE HIERARCHY IN 3-DIMENSIONAL STRUCTURES, AND A 3-DIMENSIONAL STRUCTURE RESULTING THEREFROM,” filed on Jun. 16, 2006, which is hereby incorporated by reference in its entirety, teaches one method of constructing a cache hierarchy by leveraging 3D technology that offers unprecedented amounts of bandwidth between the layers at very low power.
  • FIG. 4 illustrates a system built using 3D stack technology only, as described in U.S. patent application Ser. No. 11/453,885, which is hereby incorporated by reference in its entirety. That prior art capitalizes on the enormous bandwidth that 3D offers in going between the levels of a cache hierarchy if the bits within the respective caches were placed according to a certain method. Therefore, to take advantage of that bandwidth opportunity, that prior art split out the L1.5s 103 and L2s 104 as separate chips that stacked on top of each other.
  • the L2s 104 are removed, and a new chip 401 is assembled comprising only processors 101 and L1.5s 103 covering the same area as the prior 4-core component 105 .
  • the new chip 401 contains twelve cores 101 with their corresponding L1.5s 103 .
  • Assembling layers of L2s 104 and L3s 201 into new chips 402 having the same footprint as the new twelve-core chip 401 creates a cache hierarchy. These chips 401 and 402 are then stacked up to make a twelve-core system 403.
  • a computer chip structure and computer system have at least one single-layered chip, at least one multi-layered chip stack, and a carrier package, characterized by electrical interconnections of less than 100 microns diameter, where the single-layered chip and the multi-layered chip stack are each electrically coupled to the electrical interconnections of the carrier package, and the single-layered chip is communicatively coupled to the multi-layered chip stack through the carrier package so that an electrical signal propagates over a given distance between the single-layered chip and the multi-layered chip stack at substantially a speed of propagation for a single layer chip over the given distance.
  • the single-layered chip is a processor, having single or multiple cores
  • the multi-layered chip is a memory cache stack.
  • At least one of the single-layered chip and the multi-layered chip stack is electrically coupled to the carrier package by interconnect vias.
  • the interconnect vias have a density that is at least as great as two thousand five hundred interconnections per square centimeter (2500 interconnections/cm²).
  • the carrier package also contains a plurality of integrated decoupling capacitors, each decoupling capacitor physically positioned directly under an interconnect via on the side of the carrier package opposite the side the single-layered chip and the multi-layered chip stack are electrically coupled thereto.
  • Power is delivered to the single-layered chip by means of integrated decoupling capacitors located directly under an interconnect via of the single-layered chip, and delivered to the multi-layered chip stack by means of integrated decoupling capacitors located directly under an interconnect via of the multi-layered chip stack.
  • the interconnect vias are created by means of silicon through vias, electrical interconnections, thermal vias, thermal interconnections, or thermal interface materials.
  • the carrier package is made of silicon, organic materials with build up layers, organic materials without build up layers, organic materials with fiber reinforcement, or organic materials without fiber reinforcement.
  • the carrier package has a first surface plane and a second surface plane opposing the first surface plane, and the single-layered chip and the multi-layered chip stack are physically positioned on the same surface plane of the carrier package.
  • the computer chip structure has a first cooling structure thermally coupled to a top-side of the single-layered chip, and a second cooling structure thermally coupled to a bottom-side of the multi-layered chip stack, providing at least two independent thermal paths. Each thermal path is created by means of at least one of electrical vias, electrical interconnections, thermal vias, thermal interconnections, and thermal interface materials.
  • the carrier package has a first surface plane and a second surface plane opposing the first surface plane, and the single-layered chip and the multi-layered chip stack are physically positioned on opposing surface planes of the carrier package such that horizontal wiring distances in intra-level busses are minimized.
  • the computer chip structure contains a second carrier package electrically coupled to at least one multi-layered chip stack by interconnect vias.
  • the second carrier package is thermally coupled to the multi-layered chip stack for providing a second, independent path for heat dissipation.
  • the computer chip structure contains a plurality of multi-layered chip stacks and a carrier package characterized by electrical interconnections of less than 100 microns diameter, wherein the multi-layered chip stacks are each electrically coupled to the electrical interconnections of the carrier package, and communicatively coupled to at least one other multi-layered chip stack through the carrier package so that an electrical signal propagates over a given distance between the multi-layered chip stacks at substantially a speed of propagation for a single layer chip over the given distance.
  • FIG. 1 is a block diagram illustrating a prior art single core with its cache hierarchy, and a prior art multi-core chip.
  • FIG. 2 is a block diagram illustrating a prior art dual-chip module using the multi-core chip of FIG. 1 , and a prior art 16-processor system made from dual-chip modules.
  • FIG. 3 is a block diagram illustrating a prior art 16-way multi-chip on a Silicon Carrier (SC) and two possible prior-art systems that can be made either with the SC and a second-level package, or with a huge SC alone.
  • FIG. 4 is a block diagram illustrating a prior art 12-way processor system with a cache hierarchy made on the same footprint of a multi-core chip using 3D technology.
  • FIG. 5 is a block diagram illustrating an exemplary 24-way single-socket processor system in accordance with one embodiment of the present invention using a combination of both SC and 3D technologies.
  • FIG. 6 is a block diagram illustrating an exemplary 24-way processor system having enormous caches contained in multiple, vertical towers, by partitioning the system in accordance with one embodiment of the present invention.
  • FIG. 7 is a block diagram highlighting the simplicity of an exemplary system board when the multiprocessor is socketed in a manner set forth by an embodiment of the present invention.
  • FIG. 8 is a block diagram illustrating an exemplary 24-way single-socket processor system in accordance with another embodiment of the present invention using a combination of both SC and 3D technologies.
  • FIG. 9 is a more detailed block diagram of the 24-way single-socket processor system of FIG. 5 , illustrating exemplary cooling techniques in accordance with an embodiment of the present invention.
  • the terms “a” or “an,” as used herein, are defined as “one or more than one.”
  • the term “plurality,” as used herein, is defined as “two or more than two.”
  • the term “another,” as used herein, is defined as “at least a second or more.”
  • the terms “including” and/or “having,” as used herein, are defined as “comprising” (i.e., open language).
  • the term “coupled,” as used herein, is defined as “connected, although not necessarily directly, and not necessarily mechanically.”
  • the terms “controls,” “button,” “knob,” and “switch” are used interchangeably to indicate a mechanical means for a user to control functions of a piece of equipment, instrument, or device.
  • program is defined as “a sequence of instructions designed for execution on a computer system.”
  • a program, computer program, or software application typically includes a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
  • the present invention overcomes problems with the prior art by positioning a new system structure that combines 3D and SC technologies as mutually synergistic technologies instead of viewing these technologies as mutually competing technologies.
  • the strengths of each technology are used to compensate for the other's weakness. In this way, a system is constructed that is far beyond the scale of what could be done with either technology by itself.
  • An embodiment of the present invention offers two major improvements over the prior art. First, all cache levels that are typically located outside the processor (i.e. all cache levels greater than L1) are removed from the processor chip. This allows the logic chip containing the cores to be more logically dense than ever before realized.
  • processors logic circuits
  • caches lack the simple redundancy possible with memories, resulting in low-yield problems.
  • the combination of processors and memory increases the risk associated with production yield and therefore makes the system costly. By removing all of the cache, the entire area of the chip is consumed exclusively by cores. This lowers yield loss because redundancy is accomplished by the simple addition of a few extra cores without adding all of the area that their caches would have required.
  • a second improvement over prior art is that neither the power density nor the thermal resistance is increased to the hottest component—the processor chip. This allows the cores to operate in a higher power regime, if desired, achieving a higher performance by providing area array power feeds to the processor and area array cooling to the opposite side of the processor chip without the need to remove heat from the processor chip, through other chips in a stack, to a heat sink.
  • the logic density is increased significantly by segregating the core processors 101 and level L1.5 memory cache 103 from higher cache levels in the hierarchy.
  • the resulting 12-core chip 401 is designed to share an identical footprint with each layer of cache memory 104, 201 to enable vertical stacking of layers.
  • An exemplary embodiment of the present invention, shown in FIG. 5, builds on this concept by further removing the L1.5 memories 103 from the 12-core chip 401, and filling in the area vacated by the L1.5s 103 with additional cores 101. This results in a 24-core all-logic chip 501, which has the same area as the original 12-core chip 401. Next, a new chip layer 502 is created containing only L1.5 memory 103.
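The core counts quoted for a fixed footprint (4 cores with full caches, 12 cores with only L1.5, 24 cores with logic only) imply rough area shares for each part of an original core 100. The sketch below derives those shares under the assumption that the counts scale exactly with area and that wiring overhead is ignored; the patent states the counts, not these fractions.

```python
from fractions import Fraction


def implied_area_fractions(cores_full=4, cores_no_l2=12, cores_logic_only=24):
    """Area shares of one original core implied by how many cores fit on
    the same footprint as cache levels are removed (illustrative only)."""
    chip = Fraction(1)
    full_core = chip / cores_full          # processor + L1.5 + L2
    proc_plus_l15 = chip / cores_no_l2     # processor + L1.5
    proc = chip / cores_logic_only         # processor (with L1) only
    return {
        "processor": proc / full_core,
        "L1.5": (proc_plus_l15 - proc) / full_core,
        "L2": (full_core - proc_plus_l15) / full_core,
    }


if __name__ == "__main__":
    for part, share in implied_area_fractions().items():
        print(part, share)
```

Under these assumptions the memory levels account for five sixths of the original core area, which is consistent with the earlier observation that most of a "core" is actually memory.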
  • Additional L2 104 and L3 201 chips are assembled using complementary 3D & SC technologies.
  • a 3D stack 503 of all of the memory caches (L1.5, L2, and L3) is produced and mounted adjacent to the 24-core logic chip 501 on a carrier package 504 .
  • Interconnections connecting the component chips 501, 503 to the package 504 have a pitch less than 100 microns, meaning an interconnect density of greater than two thousand five hundred interconnects per square centimeter (2500/cm²).
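The relationship between pitch and interconnect density can be checked with a short calculation. The sketch assumes a fully populated square-grid area array, which the patent does not specify; the function name is hypothetical.

```python
def area_array_density_per_cm2(pitch_um):
    """Interconnects per square centimeter for a fully populated square
    grid at the given pitch in microns (square grid is an assumption)."""
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    per_cm = 10_000 / pitch_um  # 1 cm = 10,000 microns
    return per_cm ** 2


if __name__ == "__main__":
    for pitch in (200, 100, 50):
        print(pitch, area_array_density_per_cm2(pitch))
```

Under this assumption a 200 micron pitch corresponds to 2500 interconnects/cm² and a 100 micron pitch to 10,000/cm², so any pitch below 100 microns comfortably exceeds the 2500/cm² figure quoted in the text.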
  • the data bus between the 24-core chip 501 and the cache stack 503 is routed through the carrier package 504. This results in a single socket 24-way complete system 505.
  • FIG. 6 illustrates one manner in which multiple cache stacks 601 are positioned.
  • the 24-core logic chip 501 is located in the center of the carrier package 602 , with four cache stacks 601 positioned around it.
  • each cache stack 601 serves just six of the twenty-four processors 101 . If each of these cache stacks 601 has a footprint comparable to the original chip stack 503 , then the L1.5s 502 can be much bigger, since the footprint of the chip need only accommodate six of them.
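The partitioning arithmetic above can be sketched as follows. The calculation assumes every cache stack has the same footprint as the original single stack and that the L1.5 layer is divided evenly among the cores a stack serves; both are simplifications of the text, and the function name is hypothetical.

```python
def l15_growth(total_cores=24, num_stacks=4):
    """Return (cores served per stack, L1.5 area growth per core).

    Growth is relative to one stack of the same footprint serving all
    cores, assuming the layer is shared evenly among the cores served.
    """
    if total_cores % num_stacks:
        raise ValueError("cores must divide evenly among stacks")
    cores_per_stack = total_cores // num_stacks
    return cores_per_stack, total_cores / cores_per_stack


if __name__ == "__main__":
    print(l15_growth())
```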
  • the electrical signal actually propagates over a given distance between the multi-core processor and the multi-layered memory cache stack at substantially a speed of propagation for a single layer chip over the given distance. (At this point, it should be noted that the processor chip 501 and cache stacks 601, as depicted, are not drawn to scale, but merely enhanced to illustrate a concept of the present invention.)
  • the layers of each cache stack 601 are actually thinned down to a point where the height of the resulting cache stacks 601 is equivalent to the height of the processor chip 501 for easy manufacturability.
  • While the cache stack is the entire hierarchy beyond the L1, with some horizontal wiring required across the SC to reach the L1.5, it should be understood that this need not be the exclusive manner of partitioning. For example, if the power density of the processor chip is not unwieldy, the L1.5 layer 502 is stacked upon the processor chip 501, and the cache towers 503 begin with the L2 level.
  • A highly desirable feature of FIG. 6 is that additional cache capacity is accomplished in three ways. First, since there are four cache towers 601 instead of the single tower 503 shown in FIG. 5, the cache capacity is at least quadrupled. Second, since the hottest components, the processors 101, have been removed from the towers 601, the towers can be stacked higher because there is less power density. Thus, the capacity is further increased by a factor of two or more from the increased stack height. Third, it should be noticed that in FIG. 6, the four corners of the SC 602 are empty. Additional towers 601 can be placed in the corners of the SC 602 (e.g., doubling the amount of cache), or other functions, such as I/O channels, optical transceivers, or any other desired functions, can be located in these corners.
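The three capacity gains compound multiplicatively. The sketch below uses the lower-bound factors quoted in the text (four towers, a height factor of two, and a doubling when the corners are filled with towers); the function and parameter names are hypothetical, and the result is a rough lower bound rather than a figure stated in the patent.

```python
def cache_capacity_multiplier(towers=4, height_factor=2, fill_corners=True):
    """Capacity relative to the single tower of FIG. 5, using the
    lower-bound factors named in the text (illustrative only)."""
    corner_factor = 2 if fill_corners else 1
    return towers * height_factor * corner_factor


if __name__ == "__main__":
    print(cache_capacity_multiplier())
    print(cache_capacity_multiplier(fill_corners=False))
```

If the corners are used for I/O channels or optical transceivers instead, the corner factor drops out and only the first two gains apply.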
  • a single socket system built using SC and 3D in conjunction, has many more processors and much more cache memory than could be put together otherwise. More surprisingly, while the total power consumption of all of the added processors and cache is greater because of the increased number of integrated components, the power density is less than that of the smaller capacity prior-art system when partitioned in the manner set forth above. This is accomplished by physically isolating the high-powered processors from lower-powered memory components. The low-powered memory stacks (SRAM and DRAM) are then multiplied and built even higher than before possible. The connections between the processors and the cache towers are very broad, enabling an extraordinarily powerful socket with a reasonably small footprint.
  • a 24-processor 6 is placed onto a single board 700 .
  • the 24-way system 600 of FIG. 6 appears as a single component 600 on the board 700 .
  • the only other component on this board 700 is the memory system 701, comprising only a couple of columns of DIMMs 701. Since there is only a single component 600 (besides the DIMMs) on the board 700, the board wiring should be extremely simple, and the board itself cheap.
  • FIG. 8 illustrates another embodiment of the present invention using Package-on-Package techniques for vertical 3D integration.
  • a 24-core logic chip 501 containing only the processor cores 101 and L1 memory 103 is assembled in the same manner as shown in FIG. 5 .
  • the 24-core logic chip 501 is then mounted on its own silicon carrier 801 .
  • the memory cache 503 is combined with the L1.5 memory layer 502 using 3D technology and is sandwiched between the silicon carrier 801 carrying the 24-core logic chip 501 , and an additional carrier 802 for routing purposes.
  • integrated decoupling capacitors 803 are located directly underneath each chip or chip stack and are connected using the interconnects.
  • integrated decoupling capacitors 803 provide decoupling by means of a low inductance path, which provides advantages such as, but not limited to, reduced noise, lower voltage circuit operation, improved voltage distribution (i.e. lower voltage droop), higher frequency operation, and any combination thereof.
  • the lower carrier package 802 can be made from a variety of materials, including ceramics, silicon, and organic materials such as organic packages with build up layers, without build up layers, with fiber reinforcement, without fiber reinforcement.
  • power delivery to the chip 501 and chip stacks 502 , 503 is provided through the package 802 with the added benefit of integrated decoupling capacitors 803 in some applications, as described above.
  • the present invention provides a significant advantage over the prior art, in terms of power delivery, by keeping the processor 501 separate from the cache chip stack 502 , 503 , thereby reducing the number of power delivery connections needed through the silicon processor 501 and cache chip stacks 502 , 503 . This additional area would have previously detracted from the area available in both the processor 501 and cache chip stacks 502 , 503 for active circuits and for signal and power wiring.
  • the present invention offers an additional advantage over the prior 3D technology because there are now separate, multiple independent paths for cooling both the processor chip 501 and the cache chip stack 502 , 503 . Because the processor chip 501 is no longer vertically mounted to the cache stack 502 , 503 , cooling is provided on both the topsides and the backsides of the processor chip 501 and of the cache stack 502 , 503 . In addition, the cooling of the chip stack 502 , 503 is engineered to utilize vertical thermal interconnections 805 between the cache chips in the cache chip stack 502 , 503 .
  • the thermal path is comprised of electrical interconnections, thermal vias, thermal spreaders and/or thermal interconnections.
  • the thermal connections 902 benefit from added Metal vias such as copper vias and metal interconnections between chips in a chip stack.
  • Thermal interconnections include the use of additional mechanical and thermal connections such as added pads, solder connections, thermal interface materials such as filled adhesives or alternate means. Further cooling is enhanced using a wide range of assembled module options such as heat spreaders, heat sinks, heat pipes, heat fins, air cooling, liquid cooling such as but not limited to microchannel cooling or combinations of cooling methods and structures used in the industry at a module level.
  • cooling hat 901 at the topside of the processor chip 501 and another at the bottom of the chip stack 503 by means of a heat sink and thermal connections at the bottom side of the cache stack 503 to the package 504 .
  • the spreader 802 would also serve as a package 802 for power, electrical and mechanical integrity.
  • Another example shows one traditional cooling surface 901 placed at the top of the processor layer 501, and the second carrier package 801 placed at the bottom of the cache stack 503, as shown in FIG. 9.
  • the multiple cooling paths allow for more efficient cooling than can be realized using only 3D technology.
  • the prior-art subsystems of FIGS. 2 and 3 would have taken a very large, complicated board, or even multiple boards, to implement a 24-way system. In those cases, the complexity level may have limited the system to a maximum 16-way system.
  • the present invention enables a simpler physical system. Not only is the new system a higher capacity system (in number of processors), but it is more power efficient because it has much shorter busses; more performance efficient because the busses can be wider; more cost efficient because redundancy in the cores enables lower yield loss; and capable of having a lower power density.
  • the circuit as described above is part of the design for an integrated circuit chip.
  • the chip design is created in a graphical computer programming language, and stored in a computer storage medium (such as a disk, tape, physical hard drive, or virtual hard drive such as in a storage access network). If the designer does not fabricate chips or the photolithographic masks used to fabricate chips, the designer transmits the resulting design by physical means (e.g., by providing a copy of the storage medium storing the design) or electronically (e.g., through the Internet) to such entities, directly or indirectly.
  • the stored design is then converted into the appropriate format (e.g., GDSII) for the fabrication of photolithographic masks, which typically include multiple copies of the chip design in question that are to be formed on a wafer.
  • the photolithographic masks are utilized to define areas of the wafer (and/or the layers thereon) to be etched or otherwise processed.
  • the method as described above is used in the fabrication of integrated circuit chips.
  • the resulting integrated circuit chips can be distributed by the fabricator in raw wafer form (that is, as a single wafer that has multiple unpackaged chips), as a bare chip, or in a packaged form.
  • the chip is mounted in a single chip package (such as a plastic carrier, with leads that are affixed to a motherboard or other higher level carrier) or in a multi-chip package (such as a ceramic carrier that has either or both surface interconnections or buried interconnections).
  • the chip is then integrated with other chips, discrete circuit elements, and/or other signal processing devices as part of either (a) an intermediate product, such as a motherboard, or (b) an end product.
  • the end product can be any product that includes integrated circuit chips, ranging from toys and other low-end applications to advanced computer products having a display, a keyboard, or other input device, and a central processor.
  • a computer system may include, inter alia, one or more computers and at least a computer readable medium, allowing a computer system, to read data, instructions, messages or message packets, and other computer readable information from the computer readable medium.
  • the computer readable medium may include non-volatile memory, such as ROM, Flash memory, Disk drive memory, CD-ROM, and other permanent storage. Additionally, a computer readable medium may include, for example, volatile storage such as RAM, buffers, cache memory, and network circuits. Furthermore, the computer readable medium may comprise computer readable information in a transitory state medium such as a network link and/or a network interface, including a wired network or a wireless network that allow a computer system to read such computer readable information.

Abstract

A computer chip is structured to have at least one single-layered chip, at least one multi-layered chip stack, and a carrier package characterized by electrical interconnections of less than 100 microns diameter, wherein the single-layered chip and the multi-layered chip stack are each electrically coupled to the electrical interconnections of the carrier package, and the single-layered chip is communicatively coupled to the multi-layered chip stack through the carrier package so that an electrical signal propagates over a given distance between the single-layered chip and the multi-layered chip stack at substantially a speed of propagation for a single layer chip over the given distance. The single-layered chip can be a processor having multi-cores and the multi-layered chip stack can be a memory cache stack. Interconnect vias, having a density at least as great as 2500 interconnects/cm², electrically couple the single-layered chip and the multi-layered chip stack to the carrier package.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of prior U.S. patent application Ser. No. 11/453,885, entitled “METHOD FOR ACHIEVING VERY HIGH BANDWIDTH BETWEEN THE LEVELS OF A CACHE HIERARCHY IN 3-DIMENSIONAL STRUCTURES, AND A 3-DIMENSIONAL STRUCTURE RESULTING THEREFROM,” filed on Jun. 16, 2006, the entire disclosure of which is herein incorporated by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention generally relates to a method for designing multi-core processing systems, and more specifically, to a method for combining two emerging technology areas: Systems On Package (SOP) and 3D technology.
2. Description of Related Art
As circuit density has scaled in accordance with Moore's Law, single chips have grown to contain more and more of a computer system. Two decades ago, it was a revelation that an entire processor could fit on a single chip. When the CMOS node reached the 180 nanometer level, technology had evolved to the point that not only was the processor's Level-1 cache (L1) contained on the same chip with the processor, but for the first time, it was also feasible to include the next level of cache, L2. About a decade ago, the first single-chip multiprocessors were produced.
In 2001, IBM started shipping the first dual core chips. Presently, as chip processing capabilities head toward the 65 nanometer, and then 45 nanometer nodes, most companies intend to ship 4 or 8 cores (and perhaps more) on a single chip. As technology dimensions continue to shrink, and as more cores are being put on a chip in the multi-GHz regime, three significant things are happening.
First, the cores themselves are getting physically much smaller and are taking up less of the area on the chip. Second, as more compute power is put into a chip, much more cache capacity is required to hold the data (being computed upon) for that compute power to be useful. Thus the on-chip caches are getting larger. Finally, at higher frequencies, in order for the chip to run fast, the first-level (L1) caches are becoming too small, so more levels are being inserted into the cache hierarchy. In other words, each core now has a private cache hierarchy before these hierarchies feed into a shared level of cache.
FIG. 1 illustrates the above situation. On the left is a “core” 100, which comprises the processor 101 itself and two additional cache levels 103, 104. The processor 101 is the logic-intensive circuitry. The L1 cache 102 is considered to be part of the processor 101, since its access is part of the processor pipeline. As frequencies have gone up to multiple GHz, the L1 102 has become too small to provide high hit rates. Therefore, an L1.5 103 cache level has been added to supplement the L1 102. The L1.5 103 is also limited in size because it must be reasonably fast to accommodate the high miss rate from the L1 102. Therefore, an L2 104 is also required. Note that most of what is called a “core” 100 is actually memory. When multiple cores 100 are aggregated into a single “multi-core chip” 105, the chip is large, but most of it is memory.
FIG. 2 illustrates today's prior art technology for multi-core chips. The multi-core chip 105 is mounted on a Dual-Chip Module (DCM) 200, which can either be ceramic or organic, where it is connected to another chip shown as L3 201. In fact, what is shown as L3 201 usually contains not only another layer of cache, but also other functions that facilitate the operation of the nest (not shown) (i.e. all system functions excluding the processor, cache, and memory; such as external memory, I/O controllers, wiring, PC boards, etc.), and functions that connect the DCM 200 to other devices in the system.
To make a 16-way processor system, four of these DCMs 200 are mounted with another layer of packaging 202 (and/or other interconnection means), as shown. Alternatively, the four 4-core chips could be put together on a 4-chip module (not shown), and connected to other layers of cache placed off-module.
There are many ways of making this 16-way processor system 203 with standard packaging. All require several layers of packaging, and wind up having a fairly large footprint. To make larger systems, this progression continues. For example, a 64-way system can be made out of four 16-way systems 203 (e.g. probably four boards in a chassis); a 256-way system by placing four chassis into a frame; multiple frames in a room, etc.
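The packaging progression described above is simple multiplication at each level. The following minimal sketch tallies it; the function name and the uniform fan-in of four per level are illustrative assumptions that mirror the example in the text.

```python
def processors_per_level(cores_per_chip, fan_in_per_level):
    """Return processor counts at each packaging level, starting with one chip."""
    counts = [cores_per_chip]
    for fan_in in fan_in_per_level:
        # Each new packaging level aggregates `fan_in` units of the level below.
        counts.append(counts[-1] * fan_in)
    return counts

# 4-core chip -> four DCMs per board -> four boards per chassis -> four chassis per frame
levels = processors_per_level(4, [4, 4, 4])
print(levels)  # [4, 16, 64, 256]
```

Every entry after the first corresponds to an additional layer of packaging, which is the cost that the denser integration approaches discussed here aim to reduce.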
Within the last decade, Systems On Package (SOP) has emerged as a promising technology for integrating multiple chips into a system “package” in a manner that made these multiple chips behave as a single, monolithic chip. In particular, when the package is actually a silicon wafer and the solder balls connecting the component chips to the package are sufficiently tiny, with a sufficiently small pitch (e.g., less than 20 microns), then for all practical purposes, the aggregation appears, at least electrically, to be a single chip.
This effect occurs because the wiring on the silicon wafer, henceforth called a Silicon Carrier (SC), can be identical (in materials, pitch, and electrical characteristics) to the wiring on the component chips. With a dense enough interconnect pitch and sufficiently small solder balls, a “chip-to-chip” electrical signal that propagates across a SC appears to be very much the same as any other electrical signal propagating across a single, larger chip at a similar distance. In fact, because the SC is purely a wiring vehicle and not subject to the blockages encountered on a normal chip, wiring can be shielded much more easily, resulting in better electrical characteristics.
When the interconnect pitch is less than 20 microns, millions of signals can be “escaped” to the package from a typical chip. Generally, when signals are “escaped,” special off-chip drivers, receivers, and several metal layers are required to perform a spatial transformation of the signal. However, with a SC, no additional devices are necessary because the wiring on the SC is the same as the wiring on the chips that it contains. In other words, a signal is not affected by transferring from one chip to another chip.
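As a rough check on the scale of signal escape at fine pitch, the sketch below counts the pads in a full square-grid area array. The 20 mm by 20 mm die size and the function name are hypothetical values chosen for illustration; they are not taken from this disclosure.

```python
def area_array_pads(die_width_um, die_height_um, pitch_um):
    """Number of pads in a full square-grid area array at the given pitch."""
    cols = int(die_width_um // pitch_um)
    rows = int(die_height_um // pitch_um)
    return cols * rows

# Assumed 20 mm x 20 mm die (hypothetical); pitch values in microns.
pads_at_20um = area_array_pads(20_000, 20_000, 20)
pads_at_10um = area_array_pads(20_000, 20_000, 10)
print(pads_at_20um, pads_at_10um)
```

Under these assumptions, a 20 micron pitch gives on the order of one million pads, and halving the pitch quadruples the count, which is consistent with the claim that pitches below 20 microns allow millions of signals to be escaped.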
Thus, SC technology appears to offer many advantages. It allows several chips to be connected together to operate as a single, larger chip. The resulting chip system is larger than the theoretical size of any real chip, but because the chip is partitioned into smaller pieces, the effective yield is higher.
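The yield benefit of partitioning can be illustrated with a first-order Poisson yield model, Y = exp(-A * D0). This model, the defect density, and the areas below are assumptions introduced for illustration only; the sketch also assumes that component chips are tested before assembly and ignores assembly losses.

```python
import math

def poisson_yield(area_cm2, defects_per_cm2):
    """First-order Poisson yield model: probability that a die has zero defects."""
    return math.exp(-area_cm2 * defects_per_cm2)

def silicon_per_good_system(total_area_cm2, pieces, defects_per_cm2):
    """Expected silicon area consumed to obtain one fully working system.

    The system is split into `pieces` equal dies; each die is assumed to be
    tested before assembly, so only defective dies are discarded.
    """
    piece_area = total_area_cm2 / pieces
    return pieces * piece_area / poisson_yield(piece_area, defects_per_cm2)

# Hypothetical numbers: 4 cm^2 of logic, 0.5 defects per cm^2.
monolithic = silicon_per_good_system(4.0, 1, 0.5)
partitioned = silicon_per_good_system(4.0, 4, 0.5)
print(round(monolithic, 1), round(partitioned, 1))
```

With these assumed values the monolithic die consumes several times more silicon per working system than four smaller dies, which is the sense in which the effective yield of the partitioned system is higher.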
Using SC technology, the system of FIG. 2 can be implemented more efficiently and much more densely. The advantage of making this denser is that more processors and cache can be aggregated into a subsystem, which allows a larger system (in number of processors) to be implemented in a given volume. Further, operation at much reduced power is possible using this densely integrated solution, compared to a solution without chip stacks and a silicon carrier, due to power savings such as voltage reduction, reduced RC (because of reduced line length, size in chip stacks, and between chips on a silicon carrier), and lower noise.
For example, FIG. 3 depicts two ways in which we could make a 16-way system 301, 302 using the 4-core chips 105 of FIG. 1 and some L3 chips 201. Shown in the top left corner, four of these 4-way chips 105 are assembled together on a large SC to make a 16-way chip 300. To make a system, the 16-way SC “chip” 300 is combined with some L3 chips 201 using standard packaging as shown on the right 301. Alternatively, four of the 4-way chips 105 are combined with L3 chips 201 on a huge SC 302 to implement the entire system in one step.
This latter implementation 302 requires fewer levels of packaging, has shorter wires (hence lower power, and perhaps wider paths) between the elements, and results in a smaller footprint. However, the SC is much larger in 302, and the power density is higher. Which of these implementations (301 or 302) is “better” depends on the characteristic being optimized. At this level of abstraction, it is impossible to say which of these is cheaper by the time that the entire system is put together.
Even more recently, new developments have demonstrated the viability of interconnecting two or more planes of circuits by thinning those planes to a few hundred microns or less, etching dense via patterns in them, and then interconnecting them with metallization processes. The resulting structure is a monolithic “chip” comprised of multiple planes of circuits. This recent advance provides a new dimension in the scaling of circuit density, and is hereafter referred to as “3D.”
Thus, 3D and SC seem to offer many of the same advantages. Each technology enables multiple chips to be integrated into a larger, monolithic entity, appearing as if it were a much larger chip. However, it should be noted that in 3D, if circuits on the component chips can be arranged such that most of the required interconnections between them are directly over (or under) each other, then the interconnection wiring is much shorter than for SC. Most logic circuitry cannot generally be arranged in this way; however, this arrangement can work well for memory circuits. For example, U.S. patent application Ser. No. 11/453,885, entitled “METHOD FOR ACHIEVING VERY HIGH BANDWIDTH BETWEEN THE LEVELS OF A CACHE HIERARCHY IN 3-DIMENSIONAL STRUCTURES, AND A 3-DIMENSIONAL STRUCTURE RESULTING THEREFROM,” filed on Jun. 16, 2006, teaches one method of constructing a cache hierarchy by leveraging 3D technology that offers unprecedented amounts of bandwidth between the layers at very low power, and is hereby incorporated by reference in its entirety.
FIG. 4 illustrates a system built using 3D stack technology only, as described in U.S. patent application Ser. No. 11/453,885, which is hereby incorporated by reference in its entirety. That prior art capitalizes on the enormous bandwidth that 3D offers in going between the levels of a cache hierarchy if the bits within the respective caches are placed according to a certain method. Therefore, to take advantage of that bandwidth opportunity, that prior art split out the L1.5s 103 and L2s 104 as separate chips that are stacked on top of each other.
Instead of the four-core chips 105, the L2s 104 are removed, and a new chip 401 is assembled comprising only processors 101 and L1.5s 103 covering the same area as the prior 4-core component 105. In FIG. 4, the new chip 401 contains twelve cores 101 with their corresponding L1.5s 103. Assembling layers of L2s 104 and L3s 201 into new chips 402, having the same footprint as the new twelve-core chip 401, creates a cache hierarchy. Then these chips 401 and 402 are stacked up to make a twelve-core system 403.
The advantages of this new system 403 are:
    • 1. A complete 12-way system now has the same footprint as the old 4-way chip, which is just a subsystem. That is, the single-socket 12-way system does not require any more intermediate packaging layers.
    • 2. Depending on the number of chips stacked, there is a potential for including considerably more cache than there would have been area for in standard packaging.
    • 3. The busses between the levels of the cache hierarchy can be extremely wide, and extremely short, offering unprecedented amounts of bandwidth at low power.
However, certain limitations of this system may limit the range of product solutions:
    • 1. The total capacity of the cache is limited by the number of layers that can reasonably be stacked in the chips.
    • 2. There is a potential for a greater power density than can be dissipated using conventional cooling.
    • 3. The thermal resistance is increased when the stacks are high, further complicating the cooling.
    • 4. Every layer must be designed and built for the entire chip stack to function, leading to longer development times and costs for mask levels and integration.
Neither SC nor 3D solutions completely resolve the problems encountered when attempting to manufacture multi-core processors in ever-greater densities. Therefore a need exists to overcome the problems with the prior art as discussed above.
SUMMARY OF THE INVENTION
According to a preferred embodiment of the present invention, a computer chip structure and computer system have at least one single-layered chip, at least one multi-layered chip stack, and a carrier package, characterized by electrical interconnections of less than 100 microns diameter, where the single-layered chip and the multi-layered chip stack are each electrically coupled to the electrical interconnections of the carrier package, and the single-layered chip is communicatively coupled to the multi-layered chip stack through the carrier package so that an electrical signal propagates over a given distance between the single-layered chip and the multi-layered chip stack at substantially a speed of propagation for a single layer chip over the given distance.
In one embodiment, the single-layered chip is a processor, having single or multiple cores, and the multi-layered chip is a memory cache stack.
At least one of the single-layered chip and the multi-layered chip stack is electrically coupled to the carrier package by interconnect vias. The interconnect vias have a density that is at least as great as two thousand five hundred interconnections per square centimeter (2500 interconnections/cm²).
In one embodiment, the carrier package also contains a plurality of integrated decoupling capacitors, each decoupling capacitor physically positioned directly under an interconnect via on the side of the carrier package opposite the side to which the single-layered chip and the multi-layered chip stack are electrically coupled.
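To relate the stated interconnect density to a physical spacing, the sketch below converts between areal density and pitch. It assumes a uniform square grid, which is an illustrative simplification and not a requirement stated in this disclosure; the function names are hypothetical.

```python
import math

UM_PER_CM = 10_000  # microns per centimeter

def pitch_um_from_density(interconnects_per_cm2):
    """Center-to-center pitch (microns) of a uniform square grid at a given density."""
    return UM_PER_CM / math.sqrt(interconnects_per_cm2)

def density_per_cm2_from_pitch(pitch_um):
    """Areal density (interconnects per cm^2) of a uniform square grid at a given pitch."""
    return (UM_PER_CM / pitch_um) ** 2

print(pitch_um_from_density(2500))     # 200.0
print(density_per_cm2_from_pitch(20))  # 250000.0
```

Under the square-grid assumption, the 2500 interconnections/cm² floor corresponds to a pitch of 200 microns or finer, while a 20 micron pitch corresponds to a density one hundred times higher.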
Power is delivered to the single-layered chip by means of integrated decoupling capacitors located directly under an interconnect via of the single-layered chip, and delivered to the multi-layered chip stack by means of integrated decoupling capacitors located directly under an interconnect via of the multi-layered chip stack.
The interconnect vias are created by means of silicon through vias, electrical interconnections, thermal vias, thermal interconnections, or thermal interface materials. The carrier package is made of silicon, organic materials with build up layers, organic materials without build up layers, organic materials with fiber reinforcement, or organic materials without fiber reinforcement.
In one embodiment, the carrier package has a first surface plane and a second surface plane opposing the first surface plane, and the single-layered chip and the multi-layered chip stack are physically positioned on the same surface plane of the carrier package. Also, the computer chip structure has a first cooling structure thermally coupled to a top-side of the single-layered chip, and a second cooling structure thermally coupled to a bottom-side of the multi-layered chip stack, providing at least two independent thermal paths. Each thermal path is created by means of at least one of electrical vias, electrical interconnections, thermal vias, thermal interconnections, and thermal interface materials.
In another embodiment, the carrier package has a first surface plane and a second surface plane opposing the first surface plane, and the single-layered chip and the multi-layered chip stack are physically positioned on opposing surface planes of the carrier package such that horizontal wiring distances in intra-level busses are minimized.
In another embodiment, the computer chip structure contains a second carrier package electrically coupled to at least one multi-layered chip stack by interconnect vias. The second carrier package is thermally coupled to the multi-layered chip stack for providing a second, independent path for heat dissipation.
In another embodiment, the computer chip structure contains a plurality of multi-layered chip stacks and a carrier package characterized by electrical interconnections of less than 100 microns diameter, wherein the multi-layered chip stacks are each electrically coupled to the electrical interconnections of the carrier package, and communicatively coupled to at least one other multi-layered chip stack through the carrier package so that an electrical signal propagates over a given distance between the multi-layered chip stacks at substantially a speed of propagation for a single layer chip over the given distance.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying figures, where like reference numerals refer to identical or functionally similar elements throughout the separate views and which together with the detailed description below are incorporated in and form part of the specification, serve to further illustrate various embodiments and to explain various principles and advantages all in accordance with the present invention.
FIG. 1 is a block diagram illustrating a prior art single core with its cache hierarchy, and a prior art multi-core chip.
FIG. 2 is a block diagram illustrating a prior art dual-chip module using the multi-core chip of FIG. 1, and a prior art 16-processor system made from dual-chip modules.
FIG. 3 is a block diagram illustrating a prior art 16-way multi-chip on a Silicon Carrier (SC) and two possible prior-art systems that can be made either with the SC and a second-level package, or with a huge SC alone.
FIG. 4 is a block diagram illustrating a prior art 12-way processor system with a cache hierarchy made on the same footprint of a multi-core chip using 3D technology.
FIG. 5 is a block diagram illustrating an exemplary 24-way single-socket processor system in accordance with one embodiment of the present invention using a combination of both SC and 3D technologies.
FIG. 6 is a block diagram illustrating an exemplary 24-way processor system having enormous caches contained in multiple, vertical towers, by partitioning the system in accordance with one embodiment of the present invention.
FIG. 7 is a block diagram highlighting the simplicity of an exemplary system board when the multiprocessor is socketed in a manner set forth by an embodiment of the present invention.
FIG. 8 is a block diagram illustrating an exemplary 24-way single-socket processor system in accordance with another embodiment of the present invention using a combination of both SC and 3D technologies.
FIG. 9 is a more detailed block diagram of the 24-way single-socket processor system of FIG. 5, illustrating exemplary cooling techniques in accordance with an embodiment of the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Terminology Overview
As required, detailed embodiments of the present invention are disclosed herein; however, it is to be understood that the disclosed embodiments are merely exemplary of the invention, which can be embodied in various forms. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the present invention in virtually any appropriately detailed structure. Further, the terms and phrases used herein are not intended to be limiting; but rather, to provide an understandable description of the invention.
The terms “a” or “an,” as used herein, are defined as “one or more than one.” The term “plurality,” as used herein, is defined as “two or more than two.” The term “another,” as used herein, is defined as “at least a second or more.” The terms “including” and/or “having,” as used herein, are defined as “comprising” (i.e., open language). The term “coupled,” as used herein, is defined as “connected, although not necessarily directly, and not necessarily mechanically.” The terms “controls,” “button,” “knob,” and “switch” are used interchangeably to indicate a mechanical means for a user to control functions of a piece of equipment instrument or device. The terms “program,” “software application,” and the like as used herein, are defined as “a sequence of instructions designed for execution on a computer system.” A program, computer program, or software application typically includes a subroutine, a function, a procedure, an object method, an object implementation, an executable application, an applet, a servlet, a source code, an object code, a shared library/dynamic load library and/or other sequence of instructions designed for execution on a computer system.
Overview
The present invention, according to one embodiment, overcomes problems with the prior art by positioning a new system structure that combines 3D and SC technologies as mutually synergistic technologies instead of viewing these technologies as mutually competing technologies. The strengths of each technology are used to compensate for the other's weakness. In this way, a system is constructed that is far beyond the scale of what could be done with either technology by itself.
Although 3D and SC appear to be “competing” technologies in that they take different approaches toward achieving a similar goal (i.e. aggregating multiple chips into one), there are a few, subtle differences. The three greatest differences are:
    • 1. In the case that circuitry is arranged such that the required interconnections are directly “over and under” each other, the resulting busses in 3D will be on the scale of a millimeter (i.e. principally be in the vertical stack), hence they can be very fast and relatively low power. In SC, the same busses will be on the scale of a centimeter, and will not be as fast or as efficient.
    • 2. With SC, all of the chips that are to be interconnected must be arranged in a plane. Therefore, if there are many chips (such as with memory systems), the footprint of the aggregations will be very large. This can be a very inefficient use of space on the next-level package. Further, there are limits to how large the SC can be made in a cost-effective way. Thus, only the number of component chips that fit within that limit can reasonably be integrated.
    • 3. In 3D, when more than a few planes are piled up into a stack, the total power density can become too high and lead to challenges both with power delivery into the stack and cooling of all chips in the stack, especially if using low cost standard cooling methods. Concurrently, the thermal resistance grows with the stack height, compounding this problem. This is particularly problematic when the entire system—including processors—is in the stack where often a processor may be a high power density chip and require significant cooling for many applications.
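The thermal point in item 3 can be illustrated with a one-dimensional series-resistance sketch in which all heat exits through a heat sink on one end of the stack. The model, the per-layer thermal resistance, and the power values are hypothetical assumptions introduced for illustration only.

```python
def layer_temperatures(sink_temp_c, layer_powers_w, r_per_layer_c_per_w):
    """Temperature of each layer in a stack cooled from one end.

    layer_powers_w[0] is the layer nearest the heat sink. The heat generated by
    a layer and by every layer farther from the sink must cross each interface
    between that layer and the sink, so temperature rises accumulate.
    """
    temps = []
    temp = sink_temp_c
    heat_crossing = sum(layer_powers_w)
    for power in layer_powers_w:
        temp += r_per_layer_c_per_w * heat_crossing
        temps.append(temp)
        heat_crossing -= power  # this layer's heat does not cross deeper interfaces
    return temps

# Hypothetical values: 40 C sink, 0.5 C/W per layer, 5 W per memory layer.
four_layers = layer_temperatures(40.0, [5, 5, 5, 5], 0.5)
eight_layers = layer_temperatures(40.0, [5] * 8, 0.5)
# Same four-layer stack, but with a 60 W processor farthest from the sink.
with_processor = layer_temperatures(40.0, [5, 5, 5, 60], 0.5)
print(four_layers, max(eight_layers), max(with_processor))
```

In this simplified model, the hottest layer gets hotter as the stack grows, and a high-power processor placed deep in the stack raises the temperature of every layer between it and the heat sink.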
An embodiment of the present invention offers two major improvements over the prior art. First, all cache levels that are typically located outside the processor (i.e. all cache levels greater than L1) are removed from the processor chip. This allows the logic chip containing the cores to be more logically dense than ever before realized.
One constant source of problems with manufacturing multi-chip processor systems that integrate logic circuits (processors) with large caches is that low-yield components are combined with memory to make a large chip. The processor cores lack the simple redundancy possible with memories, resulting in low-yield problems. The combination of processors and memory increases the risk associated with production yield and therefore makes the system costly. By removing all of the cache, the entire area of the chip is consumed exclusively by cores. This lowers yield loss because redundancy is accomplished by the simple addition of a few extra cores without adding all of the area that their caches would have required.
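The yield argument can be sketched with a simple binomial model. The specification gives no yield model, so everything here is an assumption for illustration: each core is taken to be good independently with the same probability, and a chip is usable if at least the required number of cores work. The function name and the numbers are hypothetical.

```python
import math


def chip_yield(n_total_cores, n_required_cores, core_yield):
    """Probability that at least n_required_cores of n_total_cores are defect-free.

    Assumes independent, identically distributed core defects (binomial model).
    """
    if not 0.0 <= core_yield <= 1.0:
        raise ValueError("core_yield must be between 0 and 1")
    total = 0.0
    for good in range(n_required_cores, n_total_cores + 1):
        ways = math.comb(n_total_cores, good)
        total += ways * core_yield**good * (1.0 - core_yield) ** (n_total_cores - good)
    return total


if __name__ == "__main__":
    # Hypothetical per-core yield of 95%; 24 working cores are required.
    print("no spares :", chip_yield(24, 24, 0.95))
    print("two spares:", chip_yield(26, 24, 0.95))
```

Under these assumptions, adding a few spare cores raises the probability of obtaining 24 working cores, and because the spares carry no cache area, the added silicon per spare is small.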
A second improvement over prior art is that neither the power density nor the thermal resistance is increased to the hottest component—the processor chip. This allows the cores to operate in a higher power regime, if desired, achieving a higher performance by providing area array power feeds to the processor and area array cooling to the opposite side of the processor chip without the need to remove heat from the processor chip, through other chips in a stack, to a heat sink.
In the prior art 12-core system 403, created using 3D technology, the logic density is increased significantly by segregating the core processors 101 and level L1.5 memory cache 103 from higher cache levels in the hierarchy. The resulting 12-core chip 401 is designed to share an identical footprint as each layer of cache memory 102, 201 to enable vertical stacking of layers.
An exemplary embodiment of the present invention, shown in FIG. 5, builds on this concept by further removing the L1.5 memories 103 from the 12-core chip 401, and filling in the area vacated by the L1.5s 103 with additional cores 101. This results in a 24-core all-logic chip 501, which has the same area as the original 12-core chip 401. Next, a new chip layer 502 is created containing only L1.5 memory 103.
Additional L2 104 and L3 201 chips are assembled using complementary 3D & SC technologies. For example, a 3D stack 503 of all of the memory caches (L1.5, L2, and L3) is produced and mounted adjacent to the 24-core logic chip 501 on a carrier package 504. Interconnections connecting the component chips 501, 503 to the package 504 have a pitch less than 100 microns, meaning an interconnect density of greater than two thousand five hundred interconnects per square centimeter (2500/cm2). The data bus between the 24-core logic chip 501 and the cache stack 503 is routed through the carrier package 504. This results in a single socket 24-way complete system 505.
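The relationship between interconnect pitch and areal density can be checked with a short calculation. The sketch below assumes a uniform square-grid area array, which the specification does not state; the function name is hypothetical.

```python
MICRONS_PER_CM = 10_000.0


def area_array_density_per_cm2(pitch_um):
    """Interconnects per square centimeter for a square grid with the given pitch.

    Assumption: interconnects sit on a uniform square grid, one per pitch in each direction.
    """
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    per_cm = MICRONS_PER_CM / pitch_um  # interconnects along one centimeter
    return per_cm**2


if __name__ == "__main__":
    for pitch in (200.0, 100.0, 50.0):
        print(pitch, "um ->", area_array_density_per_cm2(pitch), "per cm^2")
```

On a square grid, 2500/cm2 corresponds to a 200 micron pitch, and a 100 micron pitch gives 10,000/cm2, so any pitch below 100 microns satisfies the stated density of greater than 2500/cm2.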
Note that in canonical thinking, it seems disadvantageous to put an all-L1.5 chip 502 adjacent to the all-core chip 501 because some of the cores 101 will be located a great distance from their respective L1.5s 103, requiring long routes for data signal propagation. However, there are two mitigating factors. First, depending on the areas involved and the number of L1.5 layers 502 implemented, the performance lost due to these potential added cycles of delay can be mitigated by increasing the capacity of the L1.5s 502, which reduces their miss rates. Secondly, the cache stacks 503 can be broken into multiple stacks, and positioned along all of the edges of the processor chip to reduce this distance. FIG. 6 illustrates one manner in which multiple cache stacks 601 are positioned. Here, the 24-core logic chip 501 is located in the center of the carrier package 602, with four cache stacks 601 positioned around it. Note that in this example, each cache stack 601 serves just six of the twenty-four processors 101. If each of these cache stacks 601 has a footprint comparable to the original chip stack 503, then the L1.5s 502 can be much bigger, since the footprint of the chip need only accommodate six of them. Thus, the electrical signal actually propagates over a given distance between the multi-core processor and the multi-layered memory cache stack at substantially a speed of propagation for a single layer chip over the given distance. (At this point, it should be noted that the processor chip 501 and cache stacks 601, as depicted in FIG. 6, are not drawn to scale, but merely enhanced to illustrate a concept of the present invention. The layers of each cache stack 601 are actually thinned down to a point where the height of the resulting cache stacks 601 is equivalent to the height of the processor chip 501 for easy manufacturability.)
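The first mitigating factor, trading added cycles of delay for a lower miss rate, can be illustrated with the standard average memory access time relation (hit time plus miss rate times miss penalty). The cycle counts and miss rates below are hypothetical values chosen only for illustration; none of them appear in the specification.

```python
def average_access_time(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average access time in cycles: hit time plus the expected miss cost."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_cycles + miss_rate * miss_penalty_cycles


if __name__ == "__main__":
    # Hypothetical: a small L1.5 next to its core versus a larger, more distant L1.5.
    near_small = average_access_time(hit_cycles=4, miss_rate=0.10, miss_penalty_cycles=40)
    far_large = average_access_time(hit_cycles=6, miss_rate=0.04, miss_penalty_cycles=40)
    print("near, small L1.5:", near_small)
    print("far, large L1.5 :", far_large)
```

With these assumed numbers, the two extra cycles of hit latency are more than offset by the lower miss rate of the larger cache. Whether that holds in a real design depends on the actual capacities and workloads.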
Thus, to those skilled in the art, it will initially appear counterintuitive to move the L1.5s 103 away from their corresponding processors 101, but the use of SC allows a very wide bus with high-speed shielded signals, and the 3D technology dramatically increases the L1.5 capacity. Further, this procedure allows more processors to fit on the same logic chip. Therefore, while breaking these two components apart may seem unnatural to those skilled in the art, the act of doing so more than compensates in other ways.
Although, in the above embodiment, the cache stack is the entire hierarchy beyond the L1, with some horizontal wiring required across the SC to reach the L1.5, it should be understood that this need not be the exclusive manner of partitioning. For example, if the power density of the processor chip is not unwieldy, the L1.5 layer 502 is stacked upon the processor chip 501, and the cache towers 503 begin with the L2 level.
A highly desirable feature of FIG. 6 is that additional cache capacity is accomplished in three ways. First, since there are four cache towers 601, instead of the single tower 503 shown in FIG. 5, the cache capacity is at least quadrupled. Second, since the hottest components, the processors 101, have been removed from the towers 601, the stacks are able to be stacked higher because there is less power density. Thus, the capacity is further increased by a factor of two or more from the increased stack height. Third, it should be noticed that in FIG. 6, the four corners of the SC 602 are empty. Additional towers 601 can be placed in the corners of the SC 602 (e.g., doubling the amount of cache), or other functions, such as I/O channels, optical transceivers, or any other desired functions can be located in these corners.
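The three capacity gains multiply, which a short calculation makes explicit. This sketch assumes that every tower has the same footprint and per-layer capacity as the single tower 503 and takes the lower bounds named in the text (four towers, a height factor of two, four optional corner towers). The function name is hypothetical.

```python
def relative_cache_capacity(num_towers, height_factor):
    """Cache capacity relative to a single tower of the original height.

    Assumes identical towers, so capacity scales with tower count and stack height.
    """
    if num_towers < 1 or height_factor <= 0:
        raise ValueError("need at least one tower and a positive height factor")
    return num_towers * height_factor


if __name__ == "__main__":
    print("four edge towers, original height:", relative_cache_capacity(4, 1))
    print("four edge towers, doubled height :", relative_cache_capacity(4, 2))
    print("plus four corner towers          :", relative_cache_capacity(8, 2))
```

Under these assumptions the edge towers alone give at least eight times the original capacity, and filling the corners with towers doubles that again.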
A single socket system, built using SC and 3D in conjunction, has many more processors and much more cache memory than could be put together otherwise. More surprisingly, while the total power consumption of all of the added processors and cache is greater because of the increased number of integrated components, the power density is less than that of the smaller capacity prior-art system when partitioned in the manner set forth above. This is accomplished by physically isolating the high-powered processors from lower-powered memory components. The low-powered memory stacks (SRAM and DRAM) are then multiplied and built even higher than before possible. The connections between the processors and the cache towers are very broad, enabling an extraordinarily powerful socket with a reasonably small footprint.
Additionally, the larger, more powerful system processor further simplifies the rest of the computer system and changes the value proposition on a motherboard. For example, as shown in FIG. 7, a 24-processor system 600 is placed onto a single board 700. Note that the 24-way system 600 of FIG. 6 appears as a single component 600 on the board 700. The only other component on this board 700 is the memory system 701, comprising only a couple of columns of DIMMs 701. Since there is only a single component 600 (besides the DIMMs) on the board 700, the board wiring should be extremely simple, and the board itself cheap.
FIG. 8 illustrates another embodiment of the present invention using Package-on-Package techniques for vertical 3D integration. In this embodiment, a 24-core logic chip 501, containing only the processor cores 101 and L1 memory 103 is assembled in the same manner as shown in FIG. 5. However, the 24-core logic chip 501 is then mounted on its own silicon carrier 801. The memory cache 503 is combined with the L1.5 memory layer 502 using 3D technology and is sandwiched between the silicon carrier 801 carrying the 24-core logic chip 501, and an additional carrier 802 for routing purposes. In this package-on-package system 800, integrated decoupling capacitors 803 are located directly underneath each chip or chip stack and are connected using the interconnects. Because of their close proximity to the chip circuits, integrated decoupling capacitors 803 provide decoupling by means of a low inductance path, which provides advantages such as, but not limited to, reduced noise, lower voltage circuit operation, improved voltage distribution (i.e. lower voltage droop), higher frequency operation, and any combination thereof. The lower carrier package 802 can be made from a variety of materials, including ceramics, silicon, and organic materials such as organic packages with build up layers, without build up layers, with fiber reinforcement, without fiber reinforcement.
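The benefit of a low inductance decoupling path can be illustrated with the basic inductor relation V = L * (di/dt). The sketch below is not taken from the specification: the inductance values, current step, and rise time are hypothetical and only show how the transient voltage droop scales with path inductance.

```python
def inductive_droop_volts(path_inductance_h, current_step_a, rise_time_s):
    """Transient voltage drop across a supply path inductance, V = L * di/dt.

    Assumes a linear current ramp of current_step_a over rise_time_s.
    """
    if rise_time_s <= 0:
        raise ValueError("rise_time_s must be positive")
    di_dt = current_step_a / rise_time_s
    return path_inductance_h * di_dt


if __name__ == "__main__":
    # Hypothetical: a 1 A current step in 1 ns through two different path inductances.
    print("distant capacitor, 1 nH :", inductive_droop_volts(1e-9, 1.0, 1e-9), "V")
    print("capacitor under chip, 10 pH:", inductive_droop_volts(1e-11, 1.0, 1e-9), "V")
```

Because the droop is proportional to the path inductance, placing the capacitors 803 directly beneath the chip or chip stack, where the current loop is short, reduces the droop for the same current transient.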
In one embodiment, power delivery to the chip 501 and chip stacks 502, 503 is provided through the package 802 with the added benefit of integrated decoupling capacitors 803 in some applications, as described above. The present invention provides a significant advantage over the prior art, in terms of power delivery, by keeping the processor 501 separate from the cache chip stack 502, 503, thereby reducing the number of power delivery connections needed through the silicon processor 501 and cache chip stacks 502, 503. This additional area would have previously detracted from the area available in both the processor 501 and cache chip stacks 502, 503 for active circuits and for signal and power wiring.
In terms of cooling, the present invention offers an additional advantage over the prior 3D technology because there are now separate, multiple independent paths for cooling both the processor chip 501 and the cache chip stack 502, 503. Because the processor chip 501 is no longer vertically mounted to the cache stack 502, 503, cooling is provided on both the topsides and the backsides of the processor chip 501 and of the cache stack 502, 503. In addition, the cooling of the chip stack 502, 503 is engineered to utilize vertical thermal interconnections 805 between the cache chips in the cache chip stack 502, 503. The thermal path is comprised of electrical interconnections, thermal vias, thermal spreaders and/or thermal interconnections. In one embodiment, the thermal connections 902 benefit from added metal vias such as copper vias and metal interconnections between chips in a chip stack. Thermal interconnections include the use of additional mechanical and thermal connections such as added pads, solder connections, thermal interface materials such as filled adhesives or alternate means. Further cooling is enhanced using a wide range of assembled module options such as heat spreaders, heat sinks, heat pipes, heat fins, air cooling, liquid cooling such as but not limited to microchannel cooling or combinations of cooling methods and structures used in the industry at a module level.
As depicted in one example shown in FIG. 9, there are two independent paths for cooling such as a cooling hat 901 at the topside of the processor chip 501 and another at the bottom of the chip stack 503 by means of a heat sink and thermal connections at the bottom side of the cache stack 503 to the package 504. Note in the case of the heat spreader 802, the spreader would also serve as a package 802 for power, electrical and mechanical integrity.
Another example shows one traditional cooling surface 901 placed at the top of the processor layer 501, and the second carrier package 801 placed at the bottom of the cache stack 503, as shown in FIG. 9. The multiple cooling paths allow for more efficient cooling than can be realized using only 3D technology.
The prior-art subsystems of FIGS. 2 and 3 would have taken a very large, complicated board, or even multiple boards, to implement a 24-way system. In those cases, the complexity level may have limited the system to a maximum 16-way system. By increasing the degree of integration, the present invention enables a simpler physical system. Not only is the new system a higher capacity system (in number of processors), but it is more power efficient because it has much shorter busses; more performance efficient because the busses can be wider; more cost efficient because redundancy in the cores enables lower yield loss; and capable of having a lower power density.
Although the preceding discussion of the features of the present invention highlights applying a combination of SC and 3D techniques to create a larger, more efficient microprocessor, it should be noted that the elements of the present invention are not limited to microprocessor design. It should be obvious to those skilled in the art that any combination of chip types (e.g. single-layered chips with multi-layered chips; or exclusively multi-layered chips), using both SC and 3D routing techniques, is within the scope of the present invention.
Non-Limiting Hardware and Software Examples
The circuit as described above is part of the design for an integrated circuit chip. The chip design is created in a graphical computer programming language, and stored in a computer storage medium (such as a disk, tape, physical hard drive, or virtual hard drive such as in a storage access network). If the designer does not fabricate chips or the photolithographic masks used to fabricate chips, the designer transmits the resulting design by physical means (e.g., by providing a copy of the storage medium storing the design) or electronically (e.g., through the Internet) to such entities, directly or indirectly. The stored design is then converted into the appropriate format (e.g., GDSII) for the fabrication of photolithographic masks, which typically include multiple copies of the chip design in question that are to be formed on a wafer. The photolithographic masks are utilized to define areas of the wafer (and/or the layers thereon) to be etched or otherwise processed.
The method as described above is used in the fabrication of integrated circuit chips. The resulting integrated circuit chips can be distributed by the fabricator in raw wafer form (that is, as a single wafer that has multiple unpackaged chips), as a bare chip, or in a packaged form. In the latter case, the chip is mounted in a single chip package (such as a plastic carrier, with leads that are affixed to a motherboard or other higher level carrier) or in a multi-chip package (such as a ceramic carrier that has either or both surface interconnections or buried interconnections). In any case, the chip is then integrated with other chips, discrete circuit elements, and/or other signal processing devices as part of either (a) an intermediate product, such as a motherboard, or (b) an end product. The end product can be any product that includes integrated circuit chips, ranging from toys and other low-end applications to advanced computer products having a display, a keyboard, or other input device, and a central processor.
A computer system may include, inter alia, one or more computers and at least a computer readable medium, allowing a computer system to read data, instructions, messages or message packets, and other computer readable information from the computer readable medium. The computer readable medium may include non-volatile memory, such as ROM, Flash memory, Disk drive memory, CD-ROM, and other permanent storage. Additionally, a computer readable medium may include, for example, volatile storage such as RAM, buffers, cache memory, and network circuits. Furthermore, the computer readable medium may comprise computer readable information in a transitory state medium such as a network link and/or a network interface, including a wired network or a wireless network that allow a computer system to read such computer readable information.
Although specific embodiments of the invention have been disclosed, those having ordinary skill in the art will understand that changes can be made to the specific embodiments without departing from the spirit and scope of the invention. The scope of the invention is not to be restricted, therefore, to the specific embodiments, and it is intended that the appended claims cover any and all such applications, modifications, and embodiments within the scope of the present invention.

Claims (18)

1. A computer chip structure, comprising:
at least one single-layered chip, wherein the at least one single-layered chip is a processor that comprises multiple cores;
at least one multi-layered chip stack; and
a carrier package characterized by electrical interconnections of less than one hundred (100) microns diameter, wherein the single-layered chip and the multi-layered chip stack are each electrically coupled to the electrical interconnections of the carrier package, and the single-layered chip communicatively coupled to the multi-layered chip stack through the carrier package so that an electrical signal propagates over a given distance between the single-layered chip and the multi-layered chip stack at substantially a speed of propagation for a single layer chip over the given distance, wherein the processor comprises multiple cores.
2. The computer chip structure of claim 1, wherein the at least one multi-layered chip comprises a memory cache stack.
3. The computer chip structure of claim 1, wherein at least one of the single-layered chip and the multi-layered chip stack is electrically coupled to the carrier package by interconnect vias.
4. The computer chip structure of claim 3, wherein the interconnect vias have a density that is at least as great as two thousand five hundred interconnections per square centimeter (2500 interconnections/cm2).
5. The computer chip structure of claim 3, wherein the carrier package further comprises a plurality of integrated decoupling capacitors, each coupling capacitor physically positioned directly under an interconnect via on a side of the carrier package opposite the side the single-layered chip and the multi-layered chip stack are electrically coupled thereto.
6. The computer chip structure of claim 5, wherein
power is delivered to the single-layered chip by means of integrated decoupling capacitors located directly under an interconnect via of the single-layered chip; and
power is delivered to the multi-layered chip stack by means of integrated decoupling capacitors located directly under an interconnect via of the multi-layered chip stack.
7. The computer chip architecture of claim 6, wherein the interconnect vias are created by means of at least one of silicon through vias, electrical interconnections, thermal vias, thermal interconnections, and thermal interface materials.
8. The computer chip structure of claim 1, wherein the carrier package comprises at least one of silicon, organic materials with build up layers, organic materials without build up layers, organic materials with fiber reinforcement, and organic materials without fiber reinforcement.
9. The computer chip structure of claim 1, wherein:
the carrier package comprises a first surface plane and a second surface plane opposing the first surface plane; and
the single-layered chip and the multi-layered chip stack are physically positioned on the same surface plane of the carrier package.
10. The computer chip structure of claim 9, further comprising:
a first cooling structure thermally coupled to a top-side of the single-layered chip; and
a second cooling structure thermally coupled to a bottom-side of the multi-layered chip stack.
11. The computer chip structure of claim 10, further comprising at least two independent thermal paths, wherein each thermal path is created by means of at least one of electrical vias, electrical interconnections, thermal vias, thermal interconnections, and thermal interface materials.
12. The computer chip structure of claim 1, wherein:
the carrier package comprises a first surface plane and a second surface plane opposing the first surface plane; and
the single-layered chip and the multi-layered chip stack are physically positioned on opposing surface planes of the carrier package such that horizontal wiring distances in intra-level busses are minimized.
13. The computer chip structure of claim 12, further comprising:
a second carrier package, electrically coupled to at least one multi-layered chip stack by interconnect vias.
14. The computer chip structure of claim 13, wherein the second carrier package further comprises a plurality of integrated coupling capacitors, each coupling capacitor physically positioned directly under an interconnect via on a side of the carrier package opposite the side the single-layered chip and the multi-layered chip stack are electrically coupled thereto.
15. The computer chip structure of claim 14, wherein the second carrier package is thermally coupled to the multi-layered chip stack for providing a second, independent path for heat dissipation.
16. A computer chip structure comprising:
a plurality of multi-layered chip stacks; and
a carrier package characterized by electrical interconnections of less than 100 microns diameter, wherein the multi-layered chip stacks are each electrically coupled to the electrical interconnections of the carrier package, and communicatively coupled to at least one other multi-layered chip stack through the carrier package so that an electrical signal propagates over a given distance between the multi-layered chip stacks at substantially a speed of propagation for a single layer chip over the given distance.
17. The computer chip structure of claim 16, wherein the plurality of multi-layered chip stacks is electrically coupled to the carrier package by interconnect vias.
18. A computer system comprising:
a computer chip structure comprising at least one single-layered chip, at least one multi-layered chip stack, and a carrier package characterized by electrical interconnections of less than 100 microns diameter, wherein the single-layered chip and the multi-layered chip stack are each electrically coupled to the electrical interconnections of the carrier package, and the single-layered chip communicatively coupled to the multi-layered chip stack through the carrier package so that an electrical signal propagates over a given distance between the single-layered chip and the multi-layered chip stack at substantially a speed of propagation for a single layer chip over the given distance; and
a nest, electrically coupled to the computer chip structure, for executing functions of the computer system.
US11/538,567 2006-06-16 2006-10-04 Chip system architecture for performance enhancement, power reduction and cost reduction Active 2027-02-03 US7518225B2 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US11/538,567 US7518225B2 (en) 2006-06-16 2006-10-04 Chip system architecture for performance enhancement, power reduction and cost reduction
PCT/US2007/071370 WO2008100324A2 (en) 2006-06-16 2007-06-15 Method for achieving very high bandwidth between the levels of a cache hierarchy in 3-dimensional structures, and a 3- dimensional structure resulting therefrom
CN2007800188856A CN101473436B (en) 2006-06-16 2007-06-15 Method for achieving very high bandwidth between the levels of a cache hierarchy in 3-dimensional structures, and a 3-dimensional structure resulting therefrom
EP07863368A EP2036126A2 (en) 2006-06-16 2007-06-15 Method for achieving very high bandwidth between the levels of a cache hierarchy in 3-dimensional structures, and a 3-dimensional structure resulting therefrom

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/453,885 US7616470B2 (en) 2006-06-16 2006-06-16 Method for achieving very high bandwidth between the levels of a cache hierarchy in 3-dimensional structures, and a 3-dimensional structure resulting therefrom
US11/538,567 US7518225B2 (en) 2006-06-16 2006-10-04 Chip system architecture for performance enhancement, power reduction and cost reduction

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US11/453,885 Continuation-In-Part US7616470B2 (en) 2006-06-16 2006-06-16 Method for achieving very high bandwidth between the levels of a cache hierarchy in 3-dimensional structures, and a 3-dimensional structure resulting therefrom

Publications (2)

Publication Number Publication Date
US20070290315A1 US20070290315A1 (en) 2007-12-20
US7518225B2 true US7518225B2 (en) 2009-04-14

Family

ID=38860723

Family Applications (3)

Application Number Title Priority Date Filing Date
US11/453,885 Expired - Fee Related US7616470B2 (en) 2006-06-16 2006-06-16 Method for achieving very high bandwidth between the levels of a cache hierarchy in 3-dimensional structures, and a 3-dimensional structure resulting therefrom
US11/538,567 Active 2027-02-03 US7518225B2 (en) 2006-06-16 2006-10-04 Chip system architecture for performance enhancement, power reduction and cost reduction
US12/116,771 Expired - Fee Related US7986543B2 (en) 2006-06-16 2008-05-07 Method for achieving very high bandwidth between the levels of a cache hierarchy in 3-dimensional structures, and a 3-dimensional structure resulting therefrom

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US11/453,885 Expired - Fee Related US7616470B2 (en) 2006-06-16 2006-06-16 Method for achieving very high bandwidth between the levels of a cache hierarchy in 3-dimensional structures, and a 3-dimensional structure resulting therefrom

Family Applications After (1)

Application Number Title Priority Date Filing Date
US12/116,771 Expired - Fee Related US7986543B2 (en) 2006-06-16 2008-05-07 Method for achieving very high bandwidth between the levels of a cache hierarchy in 3-dimensional structures, and a 3-dimensional structure resulting therefrom

Country Status (4)

Country Link
US (3) US7616470B2 (en)
EP (1) EP2036126A2 (en)
CN (1) CN101473436B (en)
WO (1) WO2008100324A2 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080150088A1 (en) * 2006-12-20 2008-06-26 Reed Paul A Method for incorporating existing silicon die into 3d integrated stack
US20100288544A1 (en) * 2005-12-27 2010-11-18 Ibiden Co., Ltd Multilayer printed wiring board
US20110119508A1 (en) * 2009-11-16 2011-05-19 International Business Machines Corporation Power Efficient Stack of Multicore Microprocessors
US8163642B1 (en) * 2005-08-10 2012-04-24 Altera Corporation Package substrate with dual material build-up layers
US20120170345A1 (en) * 2011-01-04 2012-07-05 Choi Jang Seok Stacked semiconductor device and method of manufacturing the same
US8299608B2 (en) 2010-07-08 2012-10-30 International Business Machines Corporation Enhanced thermal management of 3-D stacked die packaging
US8466543B2 (en) 2010-05-27 2013-06-18 International Business Machines Corporation Three dimensional stacked package structure
US8569874B2 (en) 2011-03-09 2013-10-29 International Business Machines Corporation High memory density, high input/output bandwidth logic-memory structure and architecture
CN107564881A (en) * 2017-08-29 2018-01-09 睿力集成电路有限公司 A kind of chip stack stereo encapsulation structure and its manufacture method

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8032711B2 (en) * 2006-12-22 2011-10-04 Intel Corporation Prefetching from dynamic random access memory to a static random access memory
US20080237738A1 (en) * 2007-03-27 2008-10-02 Christoph Andreas Kleint Integrated circuit, cell, cell arrangement, method for manufacturing an integrated circuit, method for manufacturing a cell arrangement; memory module
US9229887B2 (en) * 2008-02-19 2016-01-05 Micron Technology, Inc. Memory device with network on chip methods, apparatus, and systems
US7978721B2 (en) 2008-07-02 2011-07-12 Micron Technology Inc. Multi-serial interface stacked-die memory architecture
US8086913B2 (en) 2008-09-11 2011-12-27 Micron Technology, Inc. Methods, apparatus, and systems to repair memory
US20100078788A1 (en) 2008-09-26 2010-04-01 Amir Wagiman Package-on-package assembly and method
JP2010108204A (en) * 2008-10-30 2010-05-13 Hitachi Ltd Multichip processor
US9123552B2 (en) 2010-03-30 2015-09-01 Micron Technology, Inc. Apparatuses enabling concurrent communication between an interface die and a plurality of dice stacks, interleaved conductive paths in stacked devices, and methods for forming and operating the same
CN104094402A (en) 2011-12-01 2014-10-08 考文森智财管理公司 CPU with stacked memory
CN102662909B (en) * 2012-03-22 2013-12-25 东华理工大学 Three-dimensional many-core system on chip
US8891279B2 (en) 2012-09-17 2014-11-18 International Business Machines Corporation Enhanced wiring structure for a cache supporting auxiliary data output
US9378793B2 (en) * 2012-12-20 2016-06-28 Qualcomm Incorporated Integrated MRAM module
US9037791B2 (en) 2013-01-22 2015-05-19 International Business Machines Corporation Tiered caching and migration in differing granularities
US9336144B2 (en) * 2013-07-25 2016-05-10 Globalfoundries Inc. Three-dimensional processing system having multiple caches that can be partitioned, conjoined, and managed according to more than one set of rules and/or configurations
CN107564825B (en) * 2017-08-29 2018-09-21 睿力集成电路有限公司 A kind of chip double-side encapsulating structure and its manufacturing method
FR3082656B1 (en) 2018-06-18 2022-02-04 Commissariat Energie Atomique INTEGRATED CIRCUIT COMPRISING MACROS AND ITS MANUFACTURING METHOD
CN110540164A (en) * 2019-10-09 2019-12-06 太仓全众智能装备有限公司 Bottle type buffer memory machine
EP4071593A4 (en) * 2021-02-26 2023-08-23 Beijing Vcore Technology Co.,Ltd. Stacked cache system based on sedram, and control method and cache device
CN113097383B (en) * 2021-03-09 2023-07-18 长江先进存储产业创新中心有限责任公司 CPU and manufacturing method thereof
CN113096706B (en) * 2021-03-09 2023-06-16 长江先进存储产业创新中心有限责任公司 CPU and manufacturing method thereof
US11887908B2 (en) 2021-12-21 2024-01-30 International Business Machines Corporation Electronic package structure with offset stacked chips and top and bottom side cooling lid
CN114244920B (en) * 2021-12-29 2024-02-09 苏州盛科通信股份有限公司 New and old chip stacking head compatible method and system and chip
WO2023203435A1 (en) * 2022-04-22 2023-10-26 株式会社半導体エネルギー研究所 Semiconductor device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5702984A (en) * 1993-09-13 1997-12-30 International Business Machines Corporation Integrated mulitchip memory module, structure and fabrication
US7217994B2 (en) * 2004-12-01 2007-05-15 Kyocera Wireless Corp. Stack package for high density integrated circuits

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5133061A (en) * 1987-10-29 1992-07-21 International Business Machines Corporation Mechanism for improving the randomization of cache accesses utilizing abit-matrix multiplication permutation of cache addresses
US6059835A (en) * 1997-06-13 2000-05-09 International Business Machines Corporation Performance evaluation of processor operation using trace pre-processing
US6175160B1 (en) * 1999-01-08 2001-01-16 Intel Corporation Flip-chip having an on-chip cache memory
US6725334B2 (en) * 2000-06-09 2004-04-20 Hewlett-Packard Development Company, L.P. Method and system for exclusive two-level caching in a chip-multiprocessor
US6678814B2 (en) * 2001-06-29 2004-01-13 International Business Machines Corporation Method and apparatus for allocating data usages within an embedded dynamic random access memory device
JP4047788B2 (en) * 2003-10-16 2008-02-13 松下電器産業株式会社 Compiler device and linker device
US7130967B2 (en) * 2003-12-10 2006-10-31 International Business Machines Corporation Method and system for supplier-based memory speculation in a memory subsystem of a data processing system
US7305523B2 (en) * 2005-02-12 2007-12-04 International Business Machines Corporation Cache memory direct intervention
US7533321B2 (en) * 2005-09-13 2009-05-12 International Business Machines Corporation Fault tolerant encoding of directory states for stuck bits
US7404041B2 (en) * 2006-02-10 2008-07-22 International Business Machines Corporation Low complexity speculative multithreading system based on unmodified microprocessor core
JP4208895B2 (en) * 2006-05-30 2009-01-14 株式会社東芝 Cache memory device and processing method

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8163642B1 (en) * 2005-08-10 2012-04-24 Altera Corporation Package substrate with dual material build-up layers
US8334466B2 (en) * 2005-12-27 2012-12-18 Ibiden Co., Ltd. Multilayer printed wiring board
US20100288544A1 (en) * 2005-12-27 2010-11-18 Ibiden Co., Ltd Multilayer printed wiring board
US8110899B2 (en) * 2006-12-20 2012-02-07 Intel Corporation Method for incorporating existing silicon die into 3D integrated stack
US20080150088A1 (en) * 2006-12-20 2008-06-26 Reed Paul A Method for incorporating existing silicon die into 3d integrated stack
US20110119508A1 (en) * 2009-11-16 2011-05-19 International Business Machines Corporation Power Efficient Stack of Multicore Microprocessors
US8417974B2 (en) 2009-11-16 2013-04-09 International Business Machines Corporation Power efficient stack of multicore microprocessors
US8466543B2 (en) 2010-05-27 2013-06-18 International Business Machines Corporation Three dimensional stacked package structure
US8299608B2 (en) 2010-07-08 2012-10-30 International Business Machines Corporation Enhanced thermal management of 3-D stacked die packaging
US20120170345A1 (en) * 2011-01-04 2012-07-05 Choi Jang Seok Stacked semiconductor device and method of manufacturing the same
US9183890B2 (en) * 2011-01-04 2015-11-10 Samsung Electronics Co., Ltd. Stacked semiconductor device and method of manufacturing the same
US8569874B2 (en) 2011-03-09 2013-10-29 International Business Machines Corporation High memory density, high input/output bandwidth logic-memory structure and architecture
CN107564881A (en) * 2017-08-29 2018-01-09 睿力集成电路有限公司 Three-dimensional chip-stack packaging structure and manufacturing method thereof
CN107564881B (en) * 2017-08-29 2018-09-21 睿力集成电路有限公司 Three-dimensional chip-stack packaging structure and manufacturing method thereof

Also Published As

Publication number Publication date
WO2008100324A9 (en) 2009-05-22
US7986543B2 (en) 2011-07-26
EP2036126A2 (en) 2009-03-18
WO2008100324A2 (en) 2008-08-21
US20070290315A1 (en) 2007-12-20
US20070294479A1 (en) 2007-12-20
US20080209126A1 (en) 2008-08-28
US7616470B2 (en) 2009-11-10
WO2008100324A3 (en) 2011-01-13
CN101473436B (en) 2011-04-13
CN101473436A (en) 2009-07-01

Similar Documents

Publication Publication Date Title
US7518225B2 (en) Chip system architecture for performance enhancement, power reduction and cost reduction
US11693801B2 (en) Stacked semiconductor device assembly in computer system
US8445918B2 (en) Thermal enhancement for multi-layer semiconductor stacks
US7611924B2 (en) Integrated circuit package with chip-side signal connections
US9495498B2 (en) Universal inter-layer interconnect for multi-layer semiconductor stacks
US7279795B2 (en) Stacked die semiconductor package
US8710676B2 (en) Stacked structure and stacked method for three-dimensional chip
RU2419179C2 (en) Integrated circuit device and method of making said device
KR20140109914A (en) Cpu with stacked memory
US10090236B2 (en) Interposer having a pattern of sites for mounting chiplets
KR20130015461A (en) Semiconductor stack package and the method for manufacturing the same
Clermidy et al. 3D embedded multi-core: Some perspectives
US20120049361A1 (en) Semiconductor integrated circuit
US11824046B2 (en) Symbiotic network on layers
CN113574662A (en) Interposer, microelectronic device assembly including the same, and method of manufacture
JP2009135204A (en) System-in-package
US6949826B2 (en) High density semiconductor package
KR101153222B1 (en) Stacked chip package, semiconductor module including the same and method of manufacturing stacked chip package
England et al. Advanced Packaging Drivers/Opportunities to Support Emerging Artificial Intelligence Applications
US20220214972A1 (en) Integrated Circuit with 3D Partitioning
US20120175763A1 (en) Integrated circuit packaging including auxiliary circuitry
US20230197705A1 (en) Interconnection structures for high-bandwidth data transfer
EP0698294A1 (en) Logical three-dimensional interconnections between integrated circuit chips using a two-dimensional multi-chip module package
WO2022068467A1 (en) Encapsulation structure, apparatus, board card, and method for laying out integrated circuit
KR101104380B1 (en) Semiconductor chip package, semiconductor module including semiconductor chip package, electronic system including semiconductor module and method of manufacturing semiconductor chip package

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:EMMA, PHILIP G.;KNICKERBOCKER, JOHN U.;PATEL, CHIRAG S.;REEL/FRAME:018347/0155;SIGNING DATES FROM 20060926 TO 20061003

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

REMI Maintenance fee reminder mailed

FPAY Fee payment

Year of fee payment: 4

SULP Surcharge for late payment

AS Assignment

Owner name: GLOBALFOUNDRIES U.S. 2 LLC, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:036550/0001

Effective date: 20150629

AS Assignment

Owner name: GLOBALFOUNDRIES INC., CAYMAN ISLANDS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GLOBALFOUNDRIES U.S. 2 LLC;GLOBALFOUNDRIES U.S. INC.;REEL/FRAME:036779/0001

Effective date: 20150910

FPAY Fee payment

Year of fee payment: 8

AS Assignment

Owner name: WILMINGTON TRUST, NATIONAL ASSOCIATION, DELAWARE

Free format text: SECURITY AGREEMENT;ASSIGNOR:GLOBALFOUNDRIES INC.;REEL/FRAME:049490/0001

Effective date: 20181127

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12

AS Assignment

Owner name: GLOBALFOUNDRIES U.S. INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GLOBALFOUNDRIES INC.;REEL/FRAME:054633/0001

Effective date: 20201022

AS Assignment

Owner name: GLOBALFOUNDRIES INC., CAYMAN ISLANDS

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WILMINGTON TRUST, NATIONAL ASSOCIATION;REEL/FRAME:054636/0001

Effective date: 20201117

AS Assignment

Owner name: GLOBALFOUNDRIES U.S. INC., NEW YORK

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:WILMINGTON TRUST, NATIONAL ASSOCIATION;REEL/FRAME:056987/0001

Effective date: 20201117