WO2023056876A1 - Longitudinal stacked chip, integrated circuit device, board, and manufacturing method therefor - Google Patents


Info

Publication number: WO2023056876A1
Authority: WIPO (PCT)
Prior art keywords: die, memory, die group, group, area
Prior art date
Application number: PCT/CN2022/122373
Other languages: French (fr), Chinese (zh)
Inventors: 邱志威 (Qiu Zhiwei), 陈帅 (Chen Shuai), 高崧 (Gao Song)
Original Assignee: 寒武纪(西安)集成电路有限公司 (Cambricon (Xi'an) Integrated Circuit Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 寒武纪(西安)集成电路有限公司
Publication of WO2023056876A1

Classifications

    • H: ELECTRICITY
    • H01: ELECTRIC ELEMENTS
    • H01L: SEMICONDUCTOR DEVICES NOT COVERED BY CLASS H10
    • H01L21/00: Processes or apparatus adapted for the manufacture or treatment of semiconductor or solid state devices or of parts thereof
    • H01L21/70: Manufacture or treatment of devices consisting of a plurality of solid state components formed in or on a common substrate or of parts thereof; manufacture of integrated circuit devices or of parts thereof
    • H01L21/71: Manufacture of specific parts of devices defined in group H01L21/70
    • H01L21/768: Applying interconnections to be used for carrying current between separate components within a device comprising conductors and dielectrics
    • H01L23/00: Details of semiconductor or other solid state devices
    • H01L23/28: Encapsulations, e.g. encapsulating layers, coatings, e.g. for protection
    • H01L23/31: Encapsulations characterised by the arrangement or shape
    • H01L23/48: Arrangements for conducting electric current to or from the solid state body in operation, e.g. leads, terminal arrangements; selection of materials therefor
    • H01L25/00: Assemblies consisting of a plurality of individual semiconductor or other solid state devices; multistep manufacturing processes thereof
    • H01L25/03: Assemblies in which all the devices are of a type provided for in the same subgroup of groups H01L27/00 - H01L33/00, or in a single subclass of H10K or H10N, e.g. assemblies of rectifier diodes
    • H01L25/04: Such assemblies in which the devices do not have separate containers
    • H01L25/065: Such assemblies in which the devices are of a type provided for in group H01L27/00
    • H01L25/07: Such assemblies in which the devices are of a type provided for in group H01L29/00
    • H01L25/18: Assemblies in which the devices are of types provided for in two or more different subgroups of the same main group of groups H01L27/00 - H01L33/00, or in a single subclass of H10K or H10N
    • H01L25/50: Multistep manufacturing processes of assemblies consisting of devices, each device being of a type provided for in group H01L27/00 or H01L29/00

Definitions

  • the present invention generally relates to the field of semiconductors. More specifically, the present invention relates to vertically stacked chips, integrated circuit devices, boards and manufacturing methods thereof.
  • D2D: die-to-die
  • a die-to-die interface is a functional block that occupies a small area of a die and provides a data interface between two modules or two dies assembled in the same package.
  • Die-to-die interfaces utilize very short channels to connect modules or dies within a package, with transfer rates and bandwidths that exceed traditional chip-to-chip interfaces.
  • two modules or dies connected by a die-to-die interface are usually placed side by side, with their die-to-die interfaces adjacent, and the two die-to-die interfaces are electrically connected through the interposer below.
  • although the transfer rate and bandwidth of the die-to-die interface are excellent, when data is transferred through the underlying interposer the transmission path can be several millimeters long. An overly long path attenuates the signal and reduces the transfer speed, which still cannot meet the demands of high-intensity computing.
  • the solution of the present invention provides a vertically stacked chip, an integrated circuit device, a board and a manufacturing method thereof.
  • the present invention discloses a vertically stacked chip, including a first die group and a second die group.
  • the first die group includes a first die and a second die bonded using a face-to-face process.
  • the second die group includes a first die and a second die bonded using a face-to-face process.
  • the first die group and the second die group are bonded using a back-to-back process.
  • the present invention discloses an integrated circuit device including the aforementioned vertically stacked chips; and also discloses a board including the aforementioned integrated circuit device.
  • the present invention discloses a method for vertically stacking chips.
  • the vertically stacked chips include a first die group and a second die group.
  • the method includes: bonding the first die and the second die in the first die group face-to-face; bonding the first die and the second die in the second die group face-to-face; and bonding the first die group and the second die group back-to-back.
  • the present invention stacks the dies of the same die group face-to-face and stacks adjacent die groups back-to-back, so that the transmission path between the dies of the same die group is greatly shortened, which helps to improve the transmission efficiency within the die group.
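The stacking rule above (logic faces bonded together within a die group, back sides bonded between adjacent die groups) can be sketched as a small illustrative model. This is not part of the patent disclosure; the `Die` class, the `logic_side` field, and the layer orientations are assumptions made only to make the orientation rule explicit.

```python
# Illustrative sketch (assumptions, not from the patent): a four-layer
# vertical stack where each die's logic layer faces either up or down.
from dataclasses import dataclass

@dataclass
class Die:
    name: str
    logic_side: str  # "up" or "down": which face carries the logic layer

def relation(upper: Die, lower: Die) -> str:
    """Classify how the bottom face of `upper` meets the top face of `lower`."""
    upper_face = "face" if upper.logic_side == "down" else "back"
    lower_face = "face" if lower.logic_side == "up" else "back"
    return f"{upper_face}-to-{lower_face}"

# First die group: core layer logic-down onto memory layer logic-up;
# second die group the same, stacked underneath.
stack = [
    Die("first core layer", logic_side="down"),
    Die("first memory layer", logic_side="up"),
    Die("second core layer", logic_side="down"),
    Die("second memory layer", logic_side="up"),
]

relations = [relation(stack[i], stack[i + 1]) for i in range(3)]
print(relations)  # face-to-face inside each group, back-to-back between groups
```

Under these assumed orientations, the model reproduces the claimed arrangement: the two interfaces inside each die group are face-to-face, and the single interface between the groups is back-to-back.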
  • Fig. 1 shows a top view of the layout of a package structure including a die-to-die interface;
  • Fig. 2 shows a cross-sectional view of the package structure in Fig. 1 along the dotted line;
  • Fig. 3 is a structural diagram of a board according to an embodiment of the present invention;
  • Fig. 4 is a structural diagram of an integrated circuit device according to an embodiment of the present invention;
  • Fig. 5 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • Fig. 6 is a cross-sectional view of the structure of Fig. 5;
  • Fig. 7 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • Fig. 8 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • Fig. 9 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • Fig. 10 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
  • Fig. 11 is a flow chart of manufacturing the vertically stacked chip of Fig. 5 according to another embodiment of the present invention;
  • Fig. 12 is a flow chart of manufacturing the vertically stacked chip of Fig. 7 according to another embodiment of the present invention;
  • Fig. 13 is a flow chart of manufacturing the vertically stacked chip of Fig. 8 according to another embodiment of the present invention;
  • Fig. 14 is a flow chart of manufacturing the vertically stacked chip of Fig. 9 according to another embodiment of the present invention;
  • Fig. 15 is a flow chart of realizing back-to-back stacking according to another embodiment of the present invention;
  • Fig. 16 is a cross-sectional view illustrating step 1501;
  • Fig. 17 is a cross-sectional view illustrating step 1504;
  • Fig. 18 is a cross-sectional view illustrating step 1505;
  • Fig. 19 is a cross-sectional view illustrating step 1505;
  • Fig. 20 is a cross-sectional view illustrating step 1505;
  • Fig. 21 is a cross-sectional view illustrating step 1505;
  • Fig. 22 is a cross-sectional view illustrating step 1506;
  • Fig. 23 is a cross-sectional view illustrating step 1507;
  • Fig. 24 is a cross-sectional view illustrating step 1508;
  • Fig. 25 is a cross-sectional view illustrating step 1509;
  • Fig. 26 is a cross-sectional view illustrating step 1511.
  • the term “if” may be interpreted as “when” or “once” or “in response to determining” or “in response to detecting” depending on the context.
  • a die-to-die interface is, like any other chip-to-chip interface, a data link channel established between two dies.
  • the die-to-die interface is logically divided into physical layer, link layer, and transaction layer, and provides a standardized parallel interface to the internal interconnect structure.
  • the layout of the package structure is located in a molding compound area 10 of a chip.
  • the molding compound area 10 includes a system area and a storage area.
  • An exemplary system area is located in the center of the molding compound area 10 for placing two SoCs 101 , and storage areas are respectively located on both sides of the system area for placing eight off-chip memories 102 .
  • the system area also has a die-to-die area 103 , a physical area 104 and an input-output area 105 .
  • the die-to-die area 103 is formed with a transceiver circuit for data sharing between the two SoCs 101;
  • the physical area 104 is formed with a physical access circuit for accessing the off-chip memory 102;
  • the input-output area 105 is formed with an input-output circuit, which serves as the interface for external communication of the SoC 101.
  • the memory 106 is also placed in the system area as temporary storage for the SoC 101; its capacity is smaller than that of the off-chip memory 102, but its data transfer rate is higher.
  • FIG. 2 shows a cross-sectional view of the package structure in FIG. 1 along the dotted line direction.
  • the system area is divided into upper and lower layers.
  • the upper layer is the SoC 101
  • the lower layer is the transceiver circuit of the die-to-die area 103 , the memory 106 and the I/O circuit of the I/O area 105 .
  • the packaging structure further includes an interposer 201 and a substrate 202 , and the interposer 201 is disposed on the substrate 202 .
  • the path is: the SoC 101 at the sending end → the transceiver circuit of the die-to-die area 103 at the sending end → the interposer 201 → the transceiver circuit of the die-to-die area 103 at the receiving end → the SoC 101 at the receiving end, which realizes the technical effect of low delay and low power consumption for the die-to-die interface.
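A back-of-the-envelope comparison of the two routing styles discussed in this document may help fix the orders of magnitude. All lengths below are illustrative assumptions, not figures from the patent: a side-by-side die-to-die link crosses millimeters of interposer routing, while a vertically stacked link crosses only TSVs tens of micrometers tall.

```python
# Assumed, order-of-magnitude path lengths (in micrometers) for the two
# die-to-die routing styles; none of these numbers appear in the patent.
interposer_path_um = {
    "sender D2D area -> interposer": 100,
    "interposer trace between dies": 2000,   # assumed: ~2 mm side-by-side
    "interposer -> receiver D2D area": 100,
}
tsv_path_um = {
    "first TSV crossing": 50,                # assumed: ~50 um per thinned die
    "second TSV crossing": 50,
}

interposer_total = sum(interposer_path_um.values())
tsv_total = sum(tsv_path_um.values())
print(f"interposer route: {interposer_total} um, vertical route: {tsv_total} um")
```

Even with generous assumptions for the vertical route, the millimeter-scale interposer channel is over an order of magnitude longer, which is the signal-attenuation problem the vertical stacking of this invention is meant to avoid.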
  • FIG. 3 shows a schematic structural diagram of a board 30 according to an embodiment of the present invention.
  • the board 30 includes a chip 301, which is a system-on-chip integrating one or more combined processing devices.
  • the combined processing device is an artificial intelligence computing unit that supports various deep learning and machine learning algorithms to meet the intelligent processing requirements of complex scenarios in fields such as computer vision, speech, natural language processing, and data mining.
  • deep learning technology is widely used in the field of cloud intelligence.
  • a notable feature of cloud intelligence applications is the large amount of input data, which has high requirements for the storage capacity and computing power of the platform.
  • the board 30 of this embodiment is suitable for cloud intelligence applications, featuring huge off-chip storage, large on-chip storage, and powerful computing capability.
  • the chip 301 is connected to an external device 303 through an external interface device 302 .
  • the external device 303 is, for example, a server, a computer, a camera, a display, a mouse, a keyboard, a network card, a Wi-Fi interface, or the like.
  • the data to be processed can be transmitted to the chip 301 by the external device 303 through the external interface device 302 .
  • the calculation result of the chip 301 can be sent back to the external device 303 via the external interface device 302 .
  • the external interface device 302 may have different interface forms, such as a PCIe interface and the like.
  • the chip 301 includes computing means and processing means.
  • the computing device is configured to perform operations specified by the user, and is mainly implemented as a single-core intelligent processor or a multi-core intelligent processor, which is used to perform deep learning or machine learning calculations.
  • the processing device performs basic control including but not limited to data transfer, starting and/or stopping the computing device, and the like.
  • the processing means may be one or more types of processors among a central processing unit (CPU), a graphics processing unit (GPU), or other general-purpose and/or special-purpose processors.
  • such processors include but are not limited to digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., and their number can be determined according to actual needs.
  • the board 30 also includes a storage device 304 for storing data, which includes one or more storage units 305 .
  • the storage device 304 is connected to, and exchanges data with, the control device 306 and the chip 301 through a bus.
  • the control device 306 in the board 30 is configured to regulate the state of the chip 301 .
  • the control device 306 may include a microcontroller (Micro Controller Unit, MCU).
  • FIG. 4 shows the structure of the combined processing device in the board 30.
  • the combined processing device 40 includes a computing device 401 , an interface device 402 , a processing device 403 and an off-chip memory 404 .
  • the computing device 401 is configured to perform user-specified operations and is mainly implemented as a single-core or multi-core intelligent processor for deep learning or machine learning calculations; it can interact with the processing device 403 through the interface device 402 to jointly complete the user-specified operations.
  • the interface device 402 is connected to the bus for connecting with other devices, such as the control device 306 and the external interface device 302 in FIG. 3 .
  • the processing device 403 performs basic control including but not limited to data transfer, starting and/or stopping of the computing device 401 .
  • the processing device 403 may be one or more types of processors among a central processing unit, a graphics processing unit, or other general-purpose and/or special-purpose processors; these include but are not limited to digital signal processors, application-specific integrated circuits, field-programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., and their number can be determined according to actual needs.
  • the computing device 401 of this embodiment, considered alone, can be regarded as having a single-core or homogeneous multi-core structure; however, when the computing device 401 and the processing device 403 are considered together, they form a heterogeneous multi-core structure.
  • the off-chip memory 404 is used to store data to be processed; it is typically a DDR memory with a capacity of 16 GB or more, and stores data of the computing device 401 and/or the processing device 403.
  • Fig. 5 shows a schematic diagram of vertical stacking according to an embodiment of the present invention.
  • this embodiment is a multi-core chip including a first die group and a second die group; the first die group includes a first core layer 51 and a first memory layer 52, and the second die group includes a second core layer 53 and a second memory layer 54. In practice, the first core layer 51, the first memory layer 52, the second core layer 53, and the second memory layer 54 are vertically stacked together in sequence; the layers in Fig. 5 are drawn separated vertically for convenience of illustration only.
  • the first core layer 51 implements the processor-core function and includes a first computing area 511, which covers the logic side of the first core layer 51, i.e. the top side of the first core layer 51 in the figure. The first core layer 51 also includes, in dedicated areas, a first die-to-die area 512 and first through-silicon vias (TSVs) 513. A first computing circuit is formed in the first computing area 511 to realize the function of the computing device 401; a first transceiver circuit is formed in the first die-to-die area 512 to serve as the die-to-die interface of the first computing circuit; and the first TSVs 513 are used to realize the electrical interconnection of the stacked dies in the three-dimensional integrated circuit.
  • the first memory layer 52 implements the function of on-chip memory, including a first memory area 521 , a first input/output area 522 , a first physical area 523 and a second TSV 524 .
  • the first memory area 521 is formed with a storage unit for temporarily storing the operation result of the first operation circuit.
  • the first input-output area 522 is formed with a first input-output circuit, which is used as an interface for the first core layer 51 to communicate with the first memory layer 52 , that is, to realize the function of the interface device 402 .
  • the first physical area 523 is formed with a first physical access circuit for accessing the off-chip memory 404.
  • the second TSVs 524 are spread over the entire first memory layer 52 (only one side is shown as an example) and are used to electrically connect specific elements.
  • the second core layer 53 implements the processor-core function and includes a second computing area 531, which covers the logic side of the second core layer 53, i.e. the top side of the second core layer 53 in the figure. The second core layer 53 also includes, in dedicated areas, a second die-to-die area 532 and third TSVs 533. A second computing circuit is formed in the second computing area 531 to realize the function of the processing device 403; a second transceiver circuit is formed in the second die-to-die area 532 to serve as the die-to-die interface of the second computing circuit; and the third TSVs 533 are likewise used to realize the electrical interconnection of the stacked dies in the three-dimensional integrated circuit.
  • the second memory layer 54 implements the function of on-chip memory, including a second memory area 541 , a second input/output area 542 , a second physical area 543 and a fourth TSV 544 .
  • the second memory area 541 is formed with a storage unit for temporarily storing the operation result of the second operation circuit.
  • the second input-output area 542 is formed with a second input-output circuit, which is used as an interface for the second core layer 53 to communicate with the second memory layer 54 , that is, to realize the function of the interface device 402 .
  • the second physical area 543 has a second physical access circuit for accessing the off-chip memory 404 .
  • the fourth TSV 544 spreads over the entire second memory layer 54 , and is only shown on one side as an example, for electrically connecting specific components.
  • in this embodiment, the TSVs of each layer include transceiver TSVs, input-output TSVs and physical TSVs.
  • the transceiver TSVs are used to electrically connect the first transceiver circuit and the second transceiver circuit;
  • the input-output TSVs are used to electrically conduct the data of the input-output circuits;
  • the physical TSVs are used to electrically conduct the operation results of the computing circuits to the off-chip memory 404.
  • when the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit of the first computing area 511 → first transceiver circuit of the first die-to-die area 512 → transceiver TSV of the first TSVs 513 → transceiver TSV of the second TSVs 524 → second transceiver circuit of the second die-to-die area 532 → second computing circuit of the second computing area 531; when the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of this path.
  • when the data in the first memory area 521 is to be transmitted to other off-chip devices, the data reaches them through the following path: first input-output circuit of the first input-output area 522 → input-output TSV of the second TSVs 524 → input-output TSV of the third TSVs 533 → input-output TSV of the fourth TSVs 544; when other off-chip devices want to transmit data to the first memory area 521, the data arrives at the first memory area 521 through the reverse of this path.
  • when the data in the second memory area 541 is to be transmitted to other off-chip devices, the data reaches them through the following path: second input-output circuit of the second input-output area 542 → input-output TSV of the fourth TSVs 544; when other off-chip devices want to transmit data to the second memory area 541, the data arrives at the second memory area 541 through the reverse of this path.
  • when the data in the first memory area 521 is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: first physical access circuit of the first physical area 523 → physical TSV of the second TSVs 524 → physical TSV of the third TSVs 533 → physical TSV of the fourth TSVs 544; when the off-chip memory 404 intends to transmit input data to the first memory area 521 for processing by the computing device 401, the data reaches the first memory area 521 through the reverse of this path.
  • when the data in the second memory area 541 is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: second physical access circuit of the second physical area 543 → physical TSV of the fourth TSVs 544; when the off-chip memory 404 intends to transmit input data to the second memory area 541 for processing by the processing device 403, the data reaches the second memory area 541 through the reverse of this path.
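The transmission paths above can be written down as hop lists. The following sketch is hypothetical (the dictionary keys and node labels are shorthand for the reference numerals in the text, not structures defined by the patent); it shows two of the paths and makes explicit that each "reverse path" is simply the forward path traversed in the opposite order.

```python
# Illustrative hop-list model of two of the transmission paths described
# above; node labels are hypothetical shorthand for the reference numerals.
PATHS = {
    "computing device 401 -> processing device 403": [
        "first computing circuit (511)", "first transceiver circuit (512)",
        "transceiver TSV (513)", "transceiver TSV (524)",
        "second transceiver circuit (532)", "second computing circuit (531)",
    ],
    "first memory area 521 -> off-chip memory 404": [
        "first physical access circuit (523)", "physical TSV (524)",
        "physical TSV (533)", "physical TSV (544)", "off-chip memory 404",
    ],
}

def reverse_path(path):
    """The text's 'reverse path': same hops, opposite order."""
    return list(reversed(path))

# Every return path starts at the forward path's destination and ends at
# its source.
for hops in PATHS.values():
    back = reverse_path(hops)
    assert back[0] == hops[-1] and back[-1] == hops[0]
```

Note that every hop is either a circuit or a TSV; no hop ever touches an interposer, in contrast to the path through interposer 201 in Fig. 2.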
  • FIG. 6 shows a cross-sectional view of the structure of FIG. 5 .
  • the first core layer 51 is used in conjunction with the first memory layer 52
  • the second core layer 53 is used in conjunction with the second memory layer 54.
  • the first core layer 51 and the first memory layer 52 are bonded with a face-to-face process, that is, the logic side of the first core layer 51 bearing the first computing area 511 is bonded to the logic side of the first memory layer 52 bearing the first memory area 521, so that the transmission path between the first computing circuit and the first memory area 521 is the shortest.
  • likewise, the logic side of the second core layer 53 bearing the second computing area 531 is bonded to the logic side of the second memory layer 54 bearing the second memory area 541, so that the transmission path between the second computing circuit and the second memory area 541 is the shortest.
  • the first die group and the second die group are bonded with a back-to-back process, that is, the back side of the first memory layer 52 (the side opposite its logic side) is bonded to the back side of the second core layer 53.
  • the first die-to-die area 512 and the second die-to-die area 532 are vertically stacked, so that the die-to-die interface of the first core layer 51 is directly electrically connected, through the first TSVs 513 and the second TSVs 524, to the die-to-die interface of the second core layer 53, without transmitting through an interposer such as the interposer 201 shown in Fig. 2.
  • in other words, the first die group in this embodiment includes a first die and a second die bonded face-to-face, and the second die group likewise includes a first die and a second die bonded face-to-face, while the first die group and the second die group are bonded back-to-back. The first die may be either the processor core or the memory, and the second die is the other of the two; they are used in conjunction with each other.
  • the positions of the first core layer 51 and the first memory layer 52 of the first die group can be reversed, and the positions of the second core layer 53 and the second memory layer 54 of the second die group can be reversed.
  • the second memory layer 54 of the second die group of this structure is located between the first core layer 51 of the first die group and the second core layer 53 of the second die group, and the second memory layer 54 is formed with transceiver TSVs for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • when the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit of the first computing area 511 → first transceiver circuit of the first die-to-die area 512 → transceiver TSV of the first TSVs 513 → transceiver TSV of the fourth TSVs 544 → second transceiver circuit of the second die-to-die area 532 → second computing circuit of the second computing area 531; when the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of this path.
  • the computing device 401 or the processing device 403 performs data exchange with other off-chip devices through the interface device 402, and the first memory area 521 or the second memory area 541 communicates with the off-chip memory 404.
  • the path of data transmission is similar to the embodiment in FIG. 5 , and those skilled in the art can easily deduce it, so it is not described in detail.
  • FIG. 8 shows a schematic diagram of vertical stacking in this embodiment.
  • the vertically stacked chip in this embodiment is divided into a first die group and a second die group, with the first die group stacked on the second die group. From top to bottom, the first die group consists of the third memory layer 85 (third die), the first core layer 81 (first die) and the first memory layer 82 (second die), and the second die group consists of the fourth memory layer 86 (second die), the second core layer 83 (first die) and the second memory layer 84 (third die); that is, the fourth memory layer 86 is located between the first memory layer 82 and the second core layer 83.
  • the layers in FIG. 8 are visually separated up and down and shown in this way for convenience of illustration only.
  • the first core layer 81, the first memory layer 82, the second core layer 83 and the second memory layer 84 are the same as the first core layer 51, the first memory layer 52, the second core layer 53 and the second memory layer 54 in the foregoing embodiments, so details are not repeated here.
  • the third memory layer 85 includes a third memory area 851 and a fifth TSV 852 , the third memory area 851 covers the logic layer of the third memory layer 85 , that is, the top side of the third memory layer 85 in the figure.
  • the third memory area 851 is formed with storage units for temporarily storing the calculation results of the first calculation circuit.
  • the fifth TSVs 852 are spread over the entire third memory layer 85 (only one side is shown as an example) and are used to electrically connect specific components.
  • the third memory layer 85 is only responsible for temporarily storing the calculation results of the first computing circuit; it does not handle communication outside the first die group.
  • the first computing circuit can use the temporary storage space of both the first memory area 821 and the third memory area 851: when the computing device 401 wants to temporarily store intermediate data, it can store it in the third memory area 851 through the fifth TSVs 852, or in the first memory area 821 through the first TSVs 813.
  • the fourth memory layer 86 includes a fourth memory area 861 and sixth TSVs 862 .
  • the fourth memory area 861 covers the logic side of the fourth memory layer 86, that is, the top side of the fourth memory layer 86 in the figure.
  • the fourth memory area 861 has storage units for temporarily storing the operation results of the second operation circuit.
  • the sixth TSVs 862 are spread over the entire fourth memory layer 86 and are shown on one side only by way of example; they are used to electrically connect components.
  • the fourth memory layer 86 is only responsible for temporarily storing the calculation results of the second computing circuit and is not responsible for the second die group's off-chip communication.
  • the second computing circuit can use the temporary storage space of both the second memory area 841 and the fourth memory area 861: when the processing device 403 needs to temporarily store intermediate data, it can store it in the fourth memory area 861 through the sixth TSV 862, or in the second memory area 841 through the second TSV 833.
  • where needed, the TSVs of each layer include transceiver TSVs, input-output TSVs, and physical TSVs.
  • the transceiver TSVs are used to electrically connect the first transceiver circuit and the second transceiver circuit;
  • the input-output TSVs are used to electrically conduct the data of the input-output circuits;
  • the physical TSVs are used to electrically conduct the operation results of the operation circuits to the off-chip memory 404.
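Purely as an illustrative software model (not part of the claimed hardware), the three TSV roles above can be captured as an enumeration plus a lookup that picks the role a given kind of traffic would use; all identifiers here are hypothetical.

```python
from enum import Enum

class TsvRole(Enum):
    """Hypothetical model of the three TSV roles described above."""
    TRANSCEIVER = "electrically connects the first and second transceiver circuits"
    INPUT_OUTPUT = "electrically conducts the data of the input-output circuits"
    PHYSICAL = "conducts operation results to the off-chip memory 404"

def role_of(traffic: str) -> TsvRole:
    """Pick the TSV role used by a given (hypothetical) kind of traffic."""
    return {
        "die-to-die": TsvRole.TRANSCEIVER,  # inter-layer transceiver traffic
        "io": TsvRole.INPUT_OUTPUT,         # off-chip device traffic
        "dram": TsvRole.PHYSICAL,           # off-chip memory 404 traffic
    }[traffic]
```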
  • when the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit of the first computing area 811 → first transceiver circuit of the first die-to-die area 812 → transceiver TSV of the first TSV 813 → transceiver TSV of the third TSV 824 → transceiver TSV of the sixth TSV 862 → second transceiver circuit of the second die-to-die area 832 → second computing circuit of the second computing area 831; when the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of the aforementioned path.
  • when the first die group is to transmit data off-chip, the data reaches other off-chip devices through the following path: first input-output circuit of the first input-output area 822 → input-output TSV of the third TSV 824 → input-output TSV of the sixth TSV 862 → input-output TSV of the second TSV 833 → input-output TSV of the fourth TSV 844; when other off-chip devices intend to transmit data to the first die group, the data arrives at the first memory area 821 through the reverse of the aforementioned path.
  • when the second die group is to transmit data off-chip, the data reaches other off-chip devices through the following path: second input-output circuit of the second input-output area 842 → input-output TSV of the fourth TSV 844; when other off-chip devices intend to transmit data to the second die group, the data arrives at the second memory area 841 through the reverse of the aforementioned path.
  • when the data of the first die group is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: first physical access circuit of the first physical area 823 → physical TSV of the third TSV 824 → physical TSV of the sixth TSV 862 → physical TSV of the second TSV 833 → physical TSV of the fourth TSV 844; when the off-chip memory 404 intends to transmit input data to the first die group, the data arrives at the first memory area 821 through the reverse of the aforementioned path.
  • when the data of the second die group is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: second physical access circuit of the second physical area 843 → physical TSV of the fourth TSV 844; when the off-chip memory 404 intends to transmit input data to the second die group for processing by the processing device 403, the data arrives at the second memory area 841 through the reverse of the aforementioned path.
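The transmission paths above are ordered hop lists, and the "reverse path" rule is simply the same list traversed backwards. The following Python sketch is purely illustrative (the hop names echo the reference numerals in the text; nothing here is part of the claimed hardware).

```python
# Hypothetical model of two of the FIG. 8 transmission paths described above.
DIE_TO_DIE_PATH = [
    "first computing circuit (area 811)",
    "first transceiver circuit (area 812)",
    "transceiver TSV of first TSV 813",
    "transceiver TSV of third TSV 824",
    "transceiver TSV of sixth TSV 862",
    "second transceiver circuit (area 832)",
    "second computing circuit (area 831)",
]

PHYSICAL_PATH_GROUP1 = [
    "first physical access circuit (area 823)",
    "physical TSV of third TSV 824",
    "physical TSV of sixth TSV 862",
    "physical TSV of second TSV 833",
    "physical TSV of fourth TSV 844",
    "off-chip memory 404",
]

def reverse_path(path):
    """Return the reverse path used when data flows the other way."""
    return list(reversed(path))
```

Reversing a path twice yields the original path, matching the symmetric "aforementioned reverse path" wording used throughout this embodiment.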
  • the first core layer 81 is used in conjunction with the first memory layer 82 and the third memory layer 85
  • the second core layer 83 is used in conjunction with the second memory layer 84 and the fourth memory layer 86.
  • the first core layer 81 and the first memory layer 82 adopt a face-to-face bonding process, so that the transmission path between the first computing circuit and the first memory area 821 is the shortest, while the first core layer 81 and the third memory layer 85 adopt a face-to-back bonding process.
  • the second core layer 83 and the fourth memory layer 86 adopt a face-to-face bonding process, which likewise makes the transmission path between the second computing circuit and the fourth memory area 861 the shortest, while the second core layer 83 and the second memory layer 84 adopt a face-to-back bonding process.
  • the first die group and the second die group adopt a back-to-back bonding process, that is, the first memory layer 82 and the fourth memory layer 86 adopt a back-to-back bonding process.
  • the first die-to-die area 812 and the second die-to-die area 832 are vertically stacked, so that the die-to-die interface of the first core layer 81 is directly electrically connected to the die-to-die interface of the second core layer 83 through the first TSV 813, the third TSV 824, and the sixth TSV 862, without using the interposer 201 shown in FIG. 2 for transmission.
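The six-layer stack of FIG. 8 and the bond between each adjacent pair of layers can be written down as a small data model. This is an illustrative sketch only; the `Bond` type and the helper are hypothetical names introduced here, not part of the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bond:
    upper: str   # the layer above the bond
    lower: str   # the layer below the bond
    kind: str    # "face-to-face", "face-to-back", or "back-to-back"

# Top-to-bottom stack of FIG. 8 with the bond between each adjacent pair,
# as described in the text above.
FIG8_BONDS = [
    Bond("third memory layer 85", "first core layer 81", "face-to-back"),
    Bond("first core layer 81", "first memory layer 82", "face-to-face"),
    Bond("first memory layer 82", "fourth memory layer 86", "back-to-back"),
    Bond("fourth memory layer 86", "second core layer 83", "face-to-face"),
    Bond("second core layer 83", "second memory layer 84", "face-to-back"),
]

def stack_order(bonds):
    """Recover the top-to-bottom layer order from an adjacent bond list."""
    order = [bonds[0].upper]
    for b in bonds:
        order.append(b.lower)
    return order
```

Note that the single back-to-back bond sits exactly at the boundary between the two die groups, which is the structural point the embodiment emphasizes.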
  • FIG. 9 shows a schematic diagram of vertical stacking in this embodiment.
  • the vertically stacked chips are stacked from top to bottom into a first die group, a second die group and a third die group.
  • the first die group is respectively the first core layer 91 (first die) and the first memory layer 92 (second die) from top to bottom
  • the second die group is respectively the second core layer 93 (first die) and the second memory layer 94 (second die) from top to bottom.
  • the third die group only includes the third memory layer 95, so the third memory layer 95 is located under the second memory layer 94.
  • the layers in FIG. 9 are visually separated up and down and shown in this way for convenience of illustration only.
  • the first core layer 91 includes a first operation area 911 that covers the logic side of the first core layer 91, that is, the top side of the first core layer 91 in the figure; the first core layer 91 also includes, in a specific area, a first die-to-die area 912.
  • the first memory area 921 has storage units for temporarily storing the calculation results of the first calculation circuit.
  • the second core layer 93 includes a second operation area 931 that covers the logic side of the second core layer 93, that is, the top side of the second core layer 93 in the figure; the second core layer 93 also includes, in a specific area, a second die-to-die area 932.
  • the third memory layer 95 includes a third memory area 951, a first input-output area 952, a second input-output area 953, a first physical access area 954, a second physical access area 955, and a fifth TSV 956.
  • the third memory area 951 is formed with storage units for temporarily storing the calculation results of the first operation circuit or the second operation circuit;
  • the first input-output area 952 is formed with a first input-output circuit serving as an interface for the first die group to communicate off-chip, that is, realizing the function of the interface device 402;
  • the second input-output area 953 is formed with a second input-output circuit serving as an interface for the second die group to communicate off-chip, that is, realizing the function of the interface device 402;
  • the first physical area 954 is formed with a first physical access circuit for connecting the first die group and the off-chip memory 404;
  • the second physical area 955 is formed with a second physical access circuit for connecting the second die group and the off-chip memory 404.
  • the TSVs are present throughout each layer and are shown on one side only by way of example; where needed, the TSVs of each layer include transceiver TSVs, input-output TSVs, and physical TSVs.
  • the transceiver TSVs are used to electrically connect the first transceiver circuit and the second transceiver circuit;
  • the input-output TSVs are used to electrically conduct the data of the input-output circuits;
  • the physical TSVs are used to electrically conduct the operation results of the operation circuits to the off-chip memory 404.
  • when the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit of the first computing area 911 → first transceiver circuit of the first die-to-die area 912 → transceiver TSV of the first TSV 913 → transceiver TSV of the second TSV 922 → second transceiver circuit of the second die-to-die area 932 → second computing circuit of the second computing area 931; when the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of the aforementioned path.
  • the first die group and the second die group are not directly connected off-chip; when they need to communicate off-chip, this embodiment routes the traffic through the third memory layer 95 of the third die group.
  • when the data in the first memory area 921 is to be transmitted off-chip, it is first transmitted to the third memory area 951 for temporary storage through the input-output TSVs of each layer, and then leaves the third memory area 951 for other off-chip devices through the following path: first input-output circuit of the first input-output area 952 → first input-output TSV of the fifth TSV 956; when other off-chip devices intend to transmit data to the first die group, the data is temporarily stored in the third memory area 951 through the reverse of the aforementioned path and then transmitted from the third memory area 951 to the first memory area 921.
  • when the data in the second memory area 941 is to be transmitted off-chip, it is first transmitted to the third memory area 951 for temporary storage through the input-output TSVs of each layer, and then leaves the third memory area 951 for other off-chip devices through the following path: second input-output circuit of the second input-output area 953 → second input-output TSV of the fifth TSV 956; when other off-chip devices intend to transmit data to the second die group, the data is temporarily stored in the third memory area 951 through the reverse of the aforementioned path and then transmitted from the third memory area 951 to the second memory area 941.
  • when the data in the first memory area 921 is to be transmitted to the off-chip memory 404, it is first transmitted to the third memory area 951 for temporary storage through the physical TSVs of each layer, and then leaves the third memory area 951 for the off-chip memory 404 through the following path: first physical access circuit of the first physical area 954 → first physical TSV of the fifth TSV 956; when the off-chip memory 404 intends to transmit input data to the first die group, the input data is temporarily stored in the third memory area 951 through the reverse of the aforementioned path and then transmitted from the third memory area 951 to the first memory area 921.
  • when the data in the second memory area 941 is to be transmitted to the off-chip memory 404, it is first transmitted to the third memory area 951 for temporary storage through the physical TSV of the fourth TSV, and then leaves the third memory area 951 for the off-chip memory 404 through the following path: second physical access circuit of the second physical area 955 → second physical TSV of the fifth TSV 956; when the off-chip memory 404 intends to transmit input data to the second die group, the input data is temporarily stored in the third memory area 951 through the reverse of the aforementioned path and then transmitted from the third memory area 951 to the second memory area 941 through the physical TSV of the fourth TSV.
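The common pattern in the FIG. 9 embodiment is that every off-chip transfer is staged in the third memory area 951 before leaving or entering a die group. A minimal, purely illustrative sketch of that routing rule (all names hypothetical, mirroring the reference numerals in the text):

```python
# Sketch of the FIG. 9 routing rule described above: the first and second
# die groups have no direct off-chip connection, so every transfer is
# staged in the third memory area 951 first.

STAGING = "third memory area 951 (staging)"

def route_off_chip(source_area: str, destination: str) -> list:
    """Hop sequence for an outbound transfer from a die group's memory area."""
    assert source_area in ("first memory area 921", "second memory area 941")
    return [source_area, STAGING, destination]

def route_inbound(source: str, target_area: str) -> list:
    """Inbound data takes the reverse path, again staged in area 951."""
    return list(reversed(route_off_chip(target_area, source)))
```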
  • the first core layer 91 is used in conjunction with the first memory layer 92
  • the second core layer 93 is used in conjunction with the second memory layer 94.
  • the first core layer 91 and the first memory layer 92 adopt a face-to-face bonding process, making the transmission path between the first computing circuit and the first memory area 921 the shortest; the second core layer 93 and the second memory layer 94 likewise adopt a face-to-face bonding process, making the transmission path between the second computing circuit and the second memory area 941 the shortest.
  • the first die group and the second die group adopt a back-to-back bonding process, that is, the first memory layer 92 and the second core layer 93 are bonded back to back, while the second die group and the third die group adopt a face-to-back bonding process, that is, the second memory layer 94 and the third memory layer 95 are bonded face to back.
  • the first die-to-die area 912 and the second die-to-die area 932 are vertically stacked, so that the die-to-die interface of the first core layer 91 is directly electrically connected to the second TSV 922 through the first TSV 913, without using the interposer 201 shown in FIG. 2 for transmission.
  • FIG. 10 shows a schematic diagram of vertical stacking of this embodiment.
  • the vertically stacked chips are stacked from top to bottom into a first die group, a second die group and a third die group.
  • the first die group is respectively the third memory layer B and the first core layer A from top to bottom
  • the second die group is respectively the first memory layer D and the second core layer C from top to bottom
  • the third die group includes only the second memory layer E.
  • the only difference between the vertical stacking structure of this embodiment and the embodiment in FIG. 9 is that the positions of the core layer and the memory layer of the first die group and the second die group are swapped.
  • FCBGA (flip chip ball grid array) uses small solder balls instead of pins to connect circuits, providing the shortest external connection distance.
  • CoWoS (chip on wafer on substrate) is an integrated production technology: the dies are first attached to a silicon wafer through the CoW packaging process, and the CoW assembly is then attached to the substrate to form CoWoS. This technology allows multiple dies to be packaged together, achieving a small package size, low power consumption, and fewer pins.
  • another embodiment of the present invention is a method of manufacturing a vertically stacked chip. The vertically stacked chip includes a first die group and a second die group, wherein the first die group includes the first core layer 51 (first die) and the first memory layer 52 (second die), and the second die group includes the second core layer 53 (first die) and the second memory layer 54 (second die); in another case, the first die may be a memory and the second die may be a processor core. Its flowchart is shown in FIG. 11.
  • a first transceiver circuit is formed in a first die-to-die region 512 in the first core layer 51 .
  • the second transceiver circuit is formed in the second die-to-die region 532 in the second core layer 53 .
  • in step 1103, transceiver TSVs are formed in the first memory layer 52.
  • in step 1104, input-output TSVs are formed in the second core layer 53 and the second memory layer 54.
  • in step 1105, physical TSVs are formed in the second core layer 53 and the second memory layer 54.
  • the first memory layer 52 is arranged between the first core layer 51 and the second core layer 53, that is, the first core layer 51, the first memory layer 52, the second core layer 53, and the second memory layer 54 are stacked in order from top to bottom.
  • the first core layer 51 and the first memory layer 52 are bonded face to face.
  • the second core layer 53 and the second memory layer 54 are bonded face to face.
  • the first die group and the second die group are bonded back to back.
  • the first computing area 511 and the second computing area 531 perform interlayer data transmission through the first transceiver circuit and the second transceiver circuit, wherein the transceiver TSVs of the first memory layer 52 electrically connect the first transceiver circuit and the second transceiver circuit.
  • the data in the first memory area 521 is transmitted to the outside of the vertically stacked chip through the first input-output area 522 and the input-output TSVs, and the data in the second memory area 541 is transmitted to the outside of the vertically stacked chip through the second input-output area 542 and the input-output TSVs;
  • the calculation results of the first computing area 511 are transmitted to the off-chip memory 404 through the first physical area 523 and the physical TSVs, and the calculation results of the second computing area 531 are transmitted to the off-chip memory 404 through the second physical area 543 and the physical TSVs.
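The FIG. 11 method can be summarized as an ordered checklist. The sketch below is illustrative only: the text explicitly labels steps 1103-1105, and steps 1101 and 1102 are inferred here from the sequence (an assumption, not stated in the source).

```python
# Hypothetical condensation of the FIG. 11 flow described above.
FIG11_STEPS = {
    1101: "form first transceiver circuit in first die-to-die area 512",   # inferred number
    1102: "form second transceiver circuit in second die-to-die area 532", # inferred number
    1103: "form transceiver TSVs in first memory layer 52",
    1104: "form input-output TSVs in second core layer 53 and second memory layer 54",
    1105: "form physical TSVs in second core layer 53 and second memory layer 54",
}

def ordered_steps(steps: dict) -> list:
    """Return the step descriptions in ascending step-number order."""
    return [steps[k] for k in sorted(steps)]
```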
  • another embodiment of the present invention is a method of manufacturing a vertically stacked chip. The vertically stacked chip includes a first die group and a second die group, wherein the first die group includes the first core layer 51 (first die) and the first memory layer 52 (second die), and the second die group includes the second core layer 53 (first die) and the second memory layer 54 (second die); in another case, the first die may be a memory and the second die may be a processor core.
  • a first transceiver circuit is formed in the first die-to-die region 512 in the first core layer 51 .
  • the second transceiver circuit is formed in the second die-to-die region 532 in the second core layer 53 .
  • in step 1203, transceiver TSVs are formed in the second memory layer 54.
  • in step 1204, input-output TSVs are formed in the second core layer 53 and the second memory layer 54.
  • in step 1205, physical TSVs are formed in the second core layer 53 and the second memory layer 54.
  • the second memory layer 54 is arranged between the first core layer 51 and the second core layer 53, that is, the first memory layer 52, the first core layer 51, the second memory layer 54, and the second core layer 53 are stacked in order from top to bottom.
  • the first core layer 51 and the first memory layer 52 are bonded face to face.
  • the second core layer 53 and the second memory layer 54 are bonded face to face.
  • the first die group and the second die group are bonded back to back.
  • the first computing area 511 and the second computing area 531 perform interlayer data transmission through the first transceiver circuit and the second transceiver circuit, wherein the transceiver TSVs of the second memory layer 54 electrically connect the first transceiver circuit and the second transceiver circuit.
  • the data in the first memory area 521 is transmitted to the outside of the vertically stacked chip through the first input-output area 522 and the input-output TSVs, and the data in the second memory area 541 is transmitted to the outside of the vertically stacked chip through the second input-output area 542 and the input-output TSVs;
  • the calculation results of the first computing area 511 are transmitted to the off-chip memory 404 through the first physical area 523 and the physical TSVs, and the calculation results of the second computing area 531 are transmitted to the off-chip memory 404 through the second physical area 543 and the physical TSVs.
  • another embodiment of the present invention is a method of manufacturing the vertically stacked chip shown in FIG. 8. The vertically stacked chip of this embodiment is divided into a first die group and a second die group, the first die group being stacked on the second die group; the first die group includes the first core layer 81 (first die), the first memory layer 82 (second die), and the third memory layer 85 (third die), and the second die group includes the second core layer 83 (first die), the second memory layer 84 (third die), and the fourth memory layer 86 (second die).
  • its flowchart is shown in FIG. 13.
  • a first transceiver circuit is formed in a first die-to-die region 812 in the first core layer 81 .
  • the second transceiver circuit is formed in the second die-to-die region 832 in the second core layer 83 .
  • TSVs for transmitting and receiving are formed in the first memory layer 82 and the fourth memory layer 86 .
  • the I/O TSVs are generated in the second core layer 83 , the second memory layer 84 and the fourth memory layer 86 .
  • physical TSVs are formed in the second core layer 83 , the second memory layer 84 and the fourth memory layer 86 .
  • step 1306 the first core layer 81 and the first memory layer 82 are bonded face to face.
  • step 1307 the third memory layer 85 and the first core layer 81 are bonded face to back.
  • step 1308 the second core layer 83 and the fourth memory layer 86 are bonded face to face.
  • step 1309 the second memory layer 84 and the second core layer 83 are bonded face to back.
  • in step 1310, the third memory layer 85, the first core layer 81, and the first memory layer 82 are stacked in order from top to bottom.
  • in step 1311, the fourth memory layer 86, the second core layer 83, and the second memory layer 84 are stacked in order from top to bottom.
  • step 1312 the first die group and the second die group are bonded back to back.
  • the first operation area 811 and the second operation area 831 perform interlayer data transmission through the first transceiver circuit and the second transceiver circuit, wherein the transceiver TSVs of the first memory layer 82 and the fourth memory layer 86 electrically connect the first transceiver circuit and the second transceiver circuit; the data in the first memory area 821 is transmitted to the outside of the vertically stacked chip through the first input-output area 822 and the input-output TSVs, and the data in the second memory area 841 is transmitted to the outside of the vertically stacked chip through the second input-output area 842 and the input-output TSVs; the calculation results of the first operation area 811 are transmitted to the off-chip memory 404 through the first physical area 823 and the physical TSVs, and the calculation results of the second operation area 831 are transmitted to the off-chip memory 404 through the second physical area 843 and the physical TSVs.
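The bonding steps of the FIG. 13 flow (steps 1306-1312) pair each core layer face-to-face with its companion memory layer and join the two die groups back to back. A purely illustrative sketch of that sequence (names echo the text; nothing here defines the hardware):

```python
# Hypothetical condensation of the FIG. 13 bonding steps described above.
FIG13_BOND_STEPS = [
    (1306, "first core layer 81", "first memory layer 82", "face-to-face"),
    (1307, "third memory layer 85", "first core layer 81", "face-to-back"),
    (1308, "second core layer 83", "fourth memory layer 86", "face-to-face"),
    (1309, "second memory layer 84", "second core layer 83", "face-to-back"),
    (1312, "first die group", "second die group", "back-to-back"),
]

def bonds_of_kind(steps, kind):
    """Return the (a, b) pairs bonded with the given bonding process."""
    return [(a, b) for _, a, b, k in steps if k == kind]
```

This makes the symmetry visible: two face-to-face bonds (one per die group), two face-to-back bonds, and a single back-to-back bond joining the groups.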
  • another embodiment of the present invention is a method of manufacturing the vertically stacked chip shown in FIG. 9. The vertically stacked chip of this embodiment is divided into a first die group, a second die group, and a third die group.
  • from top to bottom, the first die group consists of the first core layer 91 (first die) and the first memory layer 92 (second die), and the second die group consists of the second core layer 93 (first die) and the second memory layer 94 (second die); the third die group includes only the third memory layer 95.
  • Its flow chart is shown in Figure 14.
  • a first transceiver circuit is formed in a first die-to-die region 912 in the first core layer 91 .
  • the second transceiver circuit is formed in the second die-to-die region 932 in the second core layer 93 .
  • in step 1403, transceiver TSVs are formed in the first memory layer 92.
  • in step 1404, input-output TSVs are formed in the third memory layer 95.
  • in step 1405, physical TSVs are formed in the third memory layer 95.
  • the first core layer 91 and the first memory layer 92 are bonded face to face.
  • the second core layer 93 and the second memory layer 94 are bonded face to face.
  • step 1408 based on the order of the first core layer 91 and the first memory layer 92 , stacking is performed from top to bottom.
  • step 1409 based on the order of the second core layer 93 and the second memory layer 94, stacking is performed from top to bottom.
  • step 1410 the first die group and the second die group are bonded back to back.
  • step 1411 the third die group and the second die group are bonded face to back.
  • the third memory layer 95 includes a third memory area 951, a first input-output area 952, a second input-output area 953, a first physical access area 954, a second physical access area 955, and a fifth TSV 956;
  • the third memory area 951 is formed with storage units for temporarily storing the calculation results of the first operation circuit or the second operation circuit;
  • the first input-output area 952 is formed with a first input-output circuit serving as an interface for the first die group to communicate off-chip, that is, realizing the function of the interface device 402;
  • the second input-output area 953 is formed with a second input-output circuit serving as an interface for the second die group to communicate off-chip, that is, realizing the function of the interface device 402;
  • the first physical area 954 is formed with a first physical access circuit for connecting the first die group and the off-chip memory 404;
  • the second physical area 955 is formed with a second physical access circuit for connecting the second die group and the off-chip memory 404.
  • the first die-to-die area 912 and the second die-to-die area 932 are vertically stacked, so that the die-to-die interface of the first core layer 91 is directly electrically connected to the second TSV 922 through the first TSV 913, without using the interposer 201 shown in FIG. 2 for transmission.
  • FIG. 15 shows the manufacturing method of back-to-back stacking in the foregoing embodiments.
  • step 1501 circuits are formed on the logic side of a first wafer.
  • Each wafer can be divided into a logic side and an opposite side.
  • the logic side refers to the side on which logic circuits are formed to achieve specific electrical functions, while the opposite side is the side of the wafer on which no logic circuits are laid out. Since forming the logic circuits involves processes such as deposition and etching on top of the wafer, in this step, as shown in FIG. 16, the logic side 1602 is on top of the first wafer 1601 and the opposite side 1603 is under the first wafer 1601.
  • specifically, a front end of line (FEOL) layer 1604 is formed on the logic side 1602, a first TSV 1605 is formed on the logic side 1602, and a back end of line (BEOL) layer 1606 is formed on the logic side 1602, so that the first TSV 1605 is electrically connected to the BEOL layer 1606.
  • the front end of line divides the regions for preparing transistors on the silicon substrate and then uses ion implantation to form N-type and P-type regions, realizing N-type and/or P-type field-effect transistors.
  • the back end of line consists of multiple layers of conductive metal wiring, which connects the transistors on the substrate according to the design requirements to achieve specific functions.
  • in this way, the FEOL layer 1604 and the BEOL layer 1606 are respectively formed.
  • the circuits on the logic side are mainly realized by the FEOL layer 1604, and the electrical connection of the elements in those circuits is realized by the BEOL layer 1606.
  • step 1502 the first wafer 1601 is tested to eliminate defective products.
  • wafer testing, also known as mid-test, aims to ensure that each chip basically meets the circuit characteristics or design specifications; it usually includes verification of voltage, current, timing, and electrical functions.
  • in step 1503, the first wafer 1601 is flipped over: each first wafer that was not eliminated is flipped 180 degrees. After the flip, as shown in FIG. 17, the logic side 1602 of the first wafer 1601 is at the bottom and the opposite side 1603 is at the top.
  • step 1504 a second wafer 1701 is bonded on the logic side 1602 to form the structure shown in FIG. 17 .
  • then the first TSV 1605 is exposed on the opposite side 1603.
  • specifically, the opposite side 1603 is ground, and the ground opposite side 1603 is chemical mechanical polished (CMP) to form the structure shown in FIG. 18.
  • the opposite side 1603 is then plasma etched after chemical mechanical polishing, so that the first TSV 1605 protrudes from the surface of the opposite side 1603, forming the structure shown in FIG. 19.
  • a silicon dioxide layer 2001 is deposited by low-temperature chemical vapor deposition, and the deposited surface is chemically mechanically polished to flatten the silicon dioxide layer 2001 and expose the first TSV 1605, forming the structure shown in FIG. 21.
  • next, the first wafer 1601 is diced into a plurality of first dies.
  • specifically, the first wafer 1601 and the second wafer 1701 are mounted on a frame 2201, the second wafer 1701 is supported by thimbles 2202, and the first wafer 1601 and the second wafer 1701 are then cut according to the size and position of the circuits, that is, along the dotted lines in the figure, finally producing a plurality of first dies 2203.
  • step 1507 the first die 2203 is flipped 180 degrees to form the structure shown in FIG. 23 .
  • step 1508 attach the opposite side of the first die to the opposite side of the second die so that the first TSV is in electrical communication with the second TSV of the second die.
  • the second die can be produced by prior-art manufacturing processes, and this embodiment does not limit how the second die is manufactured. As shown in FIG. 24, the opposite side 1603 of the first die 2203 and the opposite side 2402 of the second die 2401 are bonded together, so that the first TSV 1605 is electrically connected to the second TSV 2403 of the second die 2401.
  • at this point a back-to-back structure has been formed, that is, the opposite side 1603 of the first die 2203 and the opposite side 2402 of the second die 2401 are bonded together, and the circuits on the two logic sides are electrically connected through the first TSV 1605 and the second TSV 2403.
  • in step 1509, the first die 2203 is encapsulated with molding compound to form the structure shown in FIG. 25.
  • a direct bonding package may be used, in which the first die 2203 and the second die 2401 are directly bonded on a printed circuit board or covered with metal leads;
  • alternatively, organic resin is dripped around the first die 2203 to form a package body 2501 that covers it.
  • in step 1510, the plastic-encapsulated first die is ground.
  • in step 1511, the ground first die is chemically mechanically polished to form the structure shown in FIG. 26. At this point, the entire back-to-back stacking process is complete.
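The whole FIG. 15 back-to-back flow (steps 1501-1511) can be condensed into an ordered pipeline. The sketch below is illustrative only; the one-line step summaries are paraphrases of the text, not process specifications.

```python
# Hypothetical condensation of the FIG. 15 back-to-back stacking flow.
BACK_TO_BACK_FLOW = [
    (1501, "form circuits on logic side of first wafer (FEOL, first TSV, BEOL)"),
    (1502, "wafer test; eliminate defective products"),
    (1503, "flip first wafer 180 degrees"),
    (1504, "bond second wafer 1701 on logic side"),
    (1505, "expose first TSV: grind, CMP, plasma etch, oxide deposition, CMP"),
    (1506, "dice first wafer into first dies"),
    (1507, "flip first die 180 degrees"),
    (1508, "bond opposite sides of first and second die (TSV to TSV)"),
    (1509, "mold first die with molding compound"),
    (1510, "grind molded first die"),
    (1511, "CMP ground first die"),
]

def is_monotonic(flow):
    """Check that the step numbers strictly increase, i.e. the flow is ordered."""
    nums = [n for n, _ in flow]
    return all(a < b for a, b in zip(nums, nums[1:]))
```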
  • the solution of the present invention stacks the core layers vertically, bonds the processor core and the memory of the same die group face to face, and bonds adjacent die groups back to back, so that the transmission path from the die-to-die interface of the processor core to the memory is greatly shortened.
  • since the thickness of the logic side is only 0.3 microns and the thickness of the bonding layer is about 1 micron, the transmission path between the processor core and the memory can be shortened to about 1.6 microns, which helps to improve inter-core transmission efficiency.
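The 1.6-micron figure above can be checked arithmetically: a face-to-face bond crosses two logic-side thicknesses (about 0.3 µm each) plus the bonding layer (about 1 µm). A trivial sketch using the values quoted in the text:

```python
# Arithmetic check of the shortened face-to-face transmission path quoted
# above (values from the text: 0.3 um per logic side, ~1 um bonding layer).
LOGIC_SIDE_UM = 0.3
BONDING_LAYER_UM = 1.0

def face_to_face_path_um(logic_side=LOGIC_SIDE_UM, bond=BONDING_LAYER_UM):
    """Two logic-side thicknesses plus one bonding layer."""
    return 2 * logic_side + bond
```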
  • the electronic equipment or devices of the present invention may include servers, cloud servers, server clusters, data processing devices, robots, computers, printers, scanners, tablet computers, smart terminals, PC equipment, Internet of Things terminals, mobile terminals, mobile phones, driving recorders, navigators, sensors, cameras, video cameras, projectors, watches, earphones, mobile storage, wearable devices, visual terminals, autonomous driving terminals, vehicles, household appliances, and/or medical equipment.
  • Said vehicles include airplanes, ships and/or vehicles;
  • said household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, range hoods;
  • said medical equipment includes nuclear magnetic resonance instruments, Ultrasound and/or electrocardiograph.
  • the electronic equipment or devices of the present invention can also be applied to fields such as the Internet, the Internet of Things, data centers, energy, transportation, public administration, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, and medical care. Further, the electronic equipment or devices of the present invention can also be used in cloud, edge, and terminal application scenarios related to artificial intelligence, big data, and/or cloud computing.
  • according to the solution of the present invention, electronic equipment or devices with high computing power can be applied to cloud devices (such as cloud servers), while electronic equipment or devices with low power consumption can be applied to terminal devices and/or edge devices (such as smartphones or cameras).
  • the hardware information of the cloud device is compatible with that of the terminal device and/or the edge device, so that, according to the hardware information of the terminal device and/or the edge device, appropriate hardware resources can be matched from the hardware resources of the cloud device to simulate the hardware resources of the terminal device and/or the edge device, thereby completing unified management, scheduling, and collaborative work of device-cloud integration or cloud-edge-terminal integration.
  • the present invention describes some methods and their embodiments as a series of actions and combinations thereof, but those skilled in the art will understand that the solution of the present invention is not limited by the order of the described actions. Therefore, according to the disclosure or teaching of the present invention, those skilled in the art will understand that some of the steps may be performed in another order or simultaneously. Further, those skilled in the art will understand that the embodiments described in the present invention may be regarded as optional embodiments, that is, the actions or modules involved are not necessarily required to realize one or more solutions of the present invention. In addition, depending on the scheme, the descriptions of some embodiments of the present invention emphasize different aspects; for parts not described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
  • a unit described as a separate component may or may not be physically separated, and a component shown as a unit may or may not be a physical unit.
  • the aforementioned components or units may be located at the same location or distributed over multiple network units.
  • some or all of the units may be selected to achieve the purpose of the solutions described in the embodiments of the present invention.
  • multiple units in this embodiment of the present invention may be integrated into one unit, or each unit exists physically independently.
  • the above-mentioned integrated units may also be implemented in the form of hardware, that is, specific hardware circuits, which may include digital circuits and/or analog circuits.
  • the physical realization of the hardware structure of the circuit may include, but is not limited to, physical devices, and the physical devices may include, but are not limited to, devices such as transistors or memristors.
  • various devices such as computing devices or other processing devices described herein may be implemented by appropriate hardware processors, such as central processing units, GPUs, FPGAs, DSPs, and ASICs.
  • the aforementioned storage unit or storage device can be any suitable storage medium (including magnetic or magneto-optical storage media, etc.), and may be, for example, resistive random-access memory (RRAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), enhanced dynamic random-access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), ROM, RAM, and the like.
  • RRAM: resistive random-access memory
  • DRAM: dynamic random-access memory
  • SRAM: static random-access memory
  • EDRAM: enhanced dynamic random-access memory
  • HBM: high-bandwidth memory
  • HMC: hybrid memory cube
  • ROM: read-only memory
  • RAM: random-access memory
  • Clause A1. A vertically stacked chip, comprising: a first die group including a first die and a second die bonded in a face-to-face process; and a second die group including a first die and a second die bonded in a face-to-face process; wherein the first die group and the second die group are bonded in a back-to-back process.
  • Clause A2. The vertically stacked chip of Clause A1, wherein the first die is one of a processor core and a memory, and the second die is the other of the processor core and the memory.
  • Clause A3. The vertically stacked chip of Clause A2, wherein the processor core of the first die group includes a first die-to-die area in which a first transceiver circuit is formed, and the processor core of the second die group includes a second die-to-die area in which a second transceiver circuit is formed; wherein the processor cores of the first die group and the second die group perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit.
  • Clause A4. The vertically stacked chip of Clause A3, wherein the memory of the first die group is located between the processor core of the first die group and the processor core of the second die group, and the memory of the first die group has transceiver through-silicon vias for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • Clause A5. The vertically stacked chip of Clause A4, wherein the memory of the first die group includes a first input-output area, and the processor core of the second die group and the memory of the second die group are provided with input-output through-silicon vias; data in the memory of the first die group is transmitted outside the vertically stacked chip through the first input-output area and the input-output through-silicon vias.
  • Clause A6. The vertically stacked chip of Clause A4, wherein the memory of the second die group includes a second input-output area, and data in the memory of the second die group is transmitted outside the vertically stacked chip through the second input-output area and the through-silicon vias.
  • Clause A7. The vertically stacked chip of Clause A4, connected to an off-chip memory, wherein the memory of the first die group further includes a first physical area, and the processor core of the second die group and the memory of the second die group have physical through-silicon vias; an operation result of the processor core of the first die group is transmitted to the off-chip memory through the first physical area and the physical through-silicon vias.
  • Clause A8. The vertically stacked chip of Clause A3, wherein the memory of the second die group is located between the processor core of the first die group and the processor core of the second die group, and the memory of the second die group has transceiver through-silicon vias for electrically connecting the first transceiver circuit and the second transceiver circuit.
  • Clause A9. The vertically stacked chip of Clause A1, wherein the first die group further includes a third die bonded face-to-back with the first die of the first die group.
  • Clause A10. The vertically stacked chip of Clause A9, wherein the first die is a processor core, and the second die and the third die are memories.
  • Clause A11. The vertically stacked chip of Clause A1, further comprising a third die group bonded face-to-back with the second die group.
  • Clause A13. The vertically stacked chip of any one of Clauses A1 to A11, wherein the layers are packaged using CoWoS.
  • Clause A14. An integrated circuit device, comprising the vertically stacked chip of any one of Clauses A1 to A11.
  • Clause A16. A method of manufacturing a vertically stacked chip comprising a first die group and a second die group, the method comprising: bonding a first die and a second die in the first die group face-to-face; bonding a first die and a second die in the second die group face-to-face; and bonding the first die group and the second die group back-to-back.
  • Clause A17. The method of Clause A16, wherein the first die is one of a processor core and a memory and the second die is the other of the processor core and the memory, the method further comprising: generating a first transceiver circuit in a first die-to-die area of the processor core of the first die group; and generating a second transceiver circuit in a second die-to-die area of the processor core of the second die group; wherein the processor cores of the first die group and the second die group perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit.
  • Clause A18. The method of Clause A17, further comprising: generating transceiver through-silicon vias in the memory of the first die group; and placing the memory of the first die group between the processor core of the first die group and the processor core of the second die group; wherein the memory of the first die group electrically connects the first transceiver circuit and the second transceiver circuit through the transceiver through-silicon vias.
  • Clause A19. The method of Clause A18, wherein the memory of the first die group includes a first input-output area and the memory of the second die group includes a second input-output area, the method further comprising: generating input-output through-silicon vias in the processor core of the second die group and the memory of the second die group; wherein data in the memory of the first die group is transmitted outside the vertically stacked chip through the first input-output area and the input-output through-silicon vias, and data in the memory of the second die group is transmitted outside the vertically stacked chip through the second input-output area and the input-output through-silicon vias.
  • Clause A20. The method of Clause A17, the vertically stacked chip being connected to an off-chip memory, wherein the memory of the first die group further includes a first physical area, the method further comprising: generating physical through-silicon vias in the processor core of the second die group and the memory of the second die group; wherein an operation result of the processor core of the first die group is transmitted to the off-chip memory through the first physical area and the physical through-silicon vias.
  • Clause A21. The method of Clause A16, further comprising: generating transceiver through-silicon vias in the memory of the second die group; and placing the memory of the second die group between the processor core of the first die group and the processor core of the second die group; wherein the transceiver through-silicon vias electrically connect the first transceiver circuit and the second transceiver circuit.
  • Clause A22. The method of Clause A16, wherein the first die group further includes a third die, the method further comprising: bonding the third die face-to-back to the first die of the first die group.
  • Clause A23. The method of Clause A22, wherein the first die is a processor core, and the second die and the third die are memories.
  • Clause A24. The method of Clause A16, the vertically stacked chip further comprising a third die group, the method further comprising: bonding the third die group face-to-back to the second die group.
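The stacking order recited above — face-to-face bonds inside each die group, back-to-back bonds between adjacent groups — can be sketched as a toy model. The `Die`/`DieGroup` classes and the `stack` helper below are illustrative inventions of this sketch, not part of the claimed process:

```python
# Toy model of the claimed stacking order (all names are illustrative only).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Die:
    name: str  # e.g. "processor core" or "memory"

@dataclass
class DieGroup:
    first: Die   # within a group, first and second are bonded face-to-face
    second: Die

def stack(groups: List[DieGroup]) -> List[Tuple[str, str]]:
    """Return the ordered list of bonds in the vertical stack:
    face-to-face inside each group, back-to-back between adjacent groups."""
    bonds = []
    for i, g in enumerate(groups):
        bonds.append((f"group{i}.{g.first.name}<->group{i}.{g.second.name}",
                      "face-to-face"))
        if i + 1 < len(groups):
            bonds.append((f"group{i}<->group{i+1}", "back-to-back"))
    return bonds

chip = [DieGroup(Die("processor core"), Die("memory")),
        DieGroup(Die("processor core"), Die("memory"))]
for pair, bond in stack(chip):
    print(pair, bond)
```

For the two-group chip above this yields two face-to-face bonds separated by one back-to-back bond, matching Clauses A1 and A16.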

Landscapes

  • Engineering & Computer Science (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Power Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Condensed Matter Physics & Semiconductors (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Manufacturing & Machinery (AREA)
  • Semiconductor Integrated Circuits (AREA)
  • Credit Cards Or The Like (AREA)

Abstract

A longitudinal (vertically) stacked chip, an integrated circuit device, a board, and a manufacturing method therefor. The integrated circuit device comprises a computing device (401), an interface device (402), and a processing device (403). The computing device (401) interacts with the processing device (403) to jointly complete a computing operation specified by a user. The integrated circuit device can further comprise an off-chip memory (404), which is connected to the computing device (401) and the processing device (403) respectively and stores data of the computing device (401) and the processing device (403).

Description

Vertically Stacked Chip, Integrated Circuit Device, Board, and Manufacturing Method Therefor

Cross-Reference to Related Applications

This application claims priority to Chinese patent application No. 202111172917.6, filed on October 8, 2021 and titled "Vertically Stacked Chip, Integrated Circuit Device, Board, and Manufacturing Method Therefor".

Technical Field

The present invention generally relates to the field of semiconductors. More specifically, the present invention relates to a vertically stacked chip, an integrated circuit device, a board, and a manufacturing method therefor.

Background

Since the advent of the big-data era, system-on-chips combined with artificial-intelligence technology have had to cope with increasingly complex environments and provide ever more functions, and current chip designs have approached the maximum reticle size. Developers therefore try to partition a system-on-chip into multi-chip modules, and the modules need to be connected over ultra-short and extra-short distances to achieve high-speed data transfer between dies. Besides extending bandwidth as much as possible, a die-to-die (D2D) connection is also an extremely low-latency, low-power solution.

A die-to-die interface is a functional block that occupies a small area of a die and provides a data interface between two modules, or two dies, assembled in the same package. A die-to-die interface uses very short channels to connect the modules or dies within a package, and its transfer rate and bandwidth exceed those of a traditional chip-to-chip interface.

In the prior art, two modules or dies connected by a die-to-die interface are usually placed side by side with their die-to-die interfaces adjacent, and the two die-to-die interfaces are electrically connected through an interposer layer underneath. Although the transfer rate and bandwidth of a die-to-die interface are excellent, when data is transferred through the underlying interposer the transmission path can reach the millimeter scale. Such a long transmission path attenuates the signal and lowers the data rate, and still cannot meet the requirements of compute-intensive workloads.

Therefore, a technical solution that shortens the transmission distance between dies is urgently needed.
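To put the two length scales side by side, here is a back-of-the-envelope comparison. The ~1 mm lateral interposer route is the millimeter-scale figure described above, and the ~1.6-micron vertical path is the face-to-face figure cited later in this application; both are order-of-magnitude values from the text, not measurements:

```python
# Order-of-magnitude comparison (illustrative values only):
# a ~1 mm lateral route through the interposer versus the ~1.6 um
# vertical path achieved by face-to-face stacking.
interposer_path_um = 1000.0  # ~1 mm expressed in microns
vertical_path_um = 1.6       # face-to-face path cited in this application

ratio = interposer_path_um / vertical_path_um
print(round(ratio))  # 625
```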
Summary of the Invention

To at least partially solve the technical problems mentioned in the background, the present invention provides a vertically stacked chip, an integrated circuit device, a board, and a manufacturing method therefor.

In one aspect, the present invention discloses a vertically stacked chip including a first die group and a second die group. The first die group includes a first die and a second die bonded in a face-to-face process, the second die group includes a first die and a second die bonded in a face-to-face process, and the first die group and the second die group are bonded in a back-to-back process.

In another aspect, the present invention discloses an integrated circuit device including the aforementioned vertically stacked chip, and also discloses a board including the aforementioned integrated circuit device.

In another aspect, the present invention discloses a method of manufacturing a vertically stacked chip including a first die group and a second die group. The method includes: bonding a first die and a second die in the first die group face-to-face; bonding a first die and a second die in the second die group face-to-face; and bonding the first die group and the second die group back-to-back.

The present invention bonds the dies of the same die group face-to-face and bonds adjacent die groups back-to-back, so that the transmission path between the dies of the same die group is greatly shortened, which helps to improve the transmission efficiency within a die group.
Brief Description of the Drawings

The above and other objects, features, and advantages of exemplary embodiments of the present invention will become readily understood by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present invention are shown by way of illustration and not limitation, and the same or corresponding reference numerals indicate the same or corresponding parts. In the drawings:

FIG. 1 shows a top view of the layout of a package structure including a die-to-die interface;
FIG. 2 shows a cross-sectional view of the package structure of FIG. 1 along the dashed line;
FIG. 3 is a structural diagram of a board according to an embodiment of the present invention;
FIG. 4 is a structural diagram of an integrated circuit device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
FIG. 6 is a cross-sectional view of the structure of FIG. 5;
FIG. 7 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
FIG. 8 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
FIG. 9 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
FIG. 10 is a schematic diagram of vertical stacking according to another embodiment of the present invention;
FIG. 11 is a flowchart of making the vertically stacked chip of FIG. 5 according to another embodiment of the present invention;
FIG. 12 is a flowchart of making the vertically stacked chip of FIG. 7 according to another embodiment of the present invention;
FIG. 13 is a flowchart of making the vertically stacked chip of FIG. 8 according to another embodiment of the present invention;
FIG. 14 is a flowchart of making the vertically stacked chip of FIG. 9 according to another embodiment of the present invention;
FIG. 15 is a flowchart of realizing back-to-back stacking according to another embodiment of the present invention;
FIG. 16 is a cross-sectional view illustrating step 1501;
FIG. 17 is a cross-sectional view illustrating step 1504;
FIG. 18 is a cross-sectional view illustrating step 1505;
FIG. 19 is a cross-sectional view illustrating step 1505;
FIG. 20 is a cross-sectional view illustrating step 1505;
FIG. 21 is a cross-sectional view illustrating step 1505;
FIG. 22 is a cross-sectional view illustrating step 1506;
FIG. 23 is a cross-sectional view illustrating step 1507;
FIG. 24 is a cross-sectional view illustrating step 1508;
FIG. 25 is a cross-sectional view illustrating step 1509; and
FIG. 26 is a cross-sectional view illustrating step 1511.
Detailed Description

The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

It should be understood that the terms "first", "second", "third", and "fourth" in the claims, description, and drawings of the present invention are used to distinguish different objects, not to describe a specific order. The terms "comprising" and "including" used in the description and claims indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the terms used in this description are for the purpose of describing specific embodiments only and are not intended to limit the present invention. As used in the description and claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly dictates otherwise. It should further be understood that the term "and/or" used in the description and claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.

As used in this description and the claims, the term "if" may be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context.

Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
A die-to-die interface, like any other chip-to-chip interface, establishes a data link between two dies. A die-to-die interface is logically divided into a physical layer, a link layer, and a transaction layer, and provides a standardized parallel interface that connects to the internal interconnect fabric.

FIG. 1 shows a top view of the layout of a package structure including a die-to-die interface. The layout is located in a molding compound area 10 of a wafer. The molding compound area 10 includes a system area and storage areas. In this example, the system area is located in the center of the molding compound area 10 and holds two system-on-chips 101, and the storage areas are located on either side of the system area and hold eight off-chip memories 102.

The system area also has a die-to-die area 103, a physical area 104, and an input-output area 105. A transceiver circuit is formed in the die-to-die area 103 for data sharing between the two system-on-chips 101; a physical access circuit is formed in the physical area 104 for accessing the off-chip memories 102; and input-output circuits are formed in the input-output area 105 as the external interface of the system-on-chips 101.

A memory 106 is also placed in the system area as scratch space for the system-on-chips 101; its capacity is smaller than that of the off-chip memory 102, but its data transfer rate is higher.

FIG. 2 shows a cross-sectional view of the package structure of FIG. 1 along the dashed line. As shown, the system area is divided into two layers: the upper layer is the system-on-chip 101, and the lower layer holds the transceiver circuit of the die-to-die area 103, the memory 106, and the input-output circuit of the input-output area 105. The package structure further includes an interposer 201 and a substrate 202, with the interposer 201 disposed on the substrate 202. When the two system-on-chips 101 transfer data, the path is: sending system-on-chip 101 → transceiver circuit of the sending die-to-die area 103 → interposer 201 → transceiver circuit of the receiving die-to-die area 103 → receiving system-on-chip 101, thereby achieving the low-latency and low-power technical effect of the die-to-die port.
FIG. 3 shows a schematic structural diagram of a board 30 according to an embodiment of the present invention. As shown in FIG. 3, the board 30 includes a chip 301, which is a system-on-chip integrating one or more combined processing devices. A combined processing device is an artificial-intelligence computing unit that supports various deep learning and machine learning algorithms and meets the intelligent processing requirements of complex scenarios in fields such as computer vision, speech, natural language processing, and data mining. Deep learning technology in particular is widely applied in the field of cloud intelligence; a notable feature of cloud intelligence applications is the large amount of input data, which places high demands on the storage capacity and computing power of the platform. The board 30 of this embodiment is suitable for cloud intelligence applications and has huge off-chip storage, on-chip storage, and powerful computing capabilities.

The chip 301 is connected to an external device 303 through an external interface device 302. The external device 303 is, for example, a server, a computer, a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface. Data to be processed can be transferred from the external device 303 to the chip 301 through the external interface device 302, and the computation results of the chip 301 can be transferred back to the external device 303 via the external interface device 302. Depending on the application scenario, the external interface device 302 can have different interface forms, such as a PCIe interface.

In more detail, the chip 301 includes a computing device and a processing device. The computing device is configured to perform user-specified operations and is mainly implemented as a single-core or multi-core intelligent processor for deep learning or machine learning computations. The processing device, as a general-purpose processing device, performs basic control including, but not limited to, data transfer and starting and/or stopping the computing device. Depending on the implementation, the processing device may be one or more types of central processing unit (CPU), graphics processing unit (GPU), or other general-purpose and/or special-purpose processors, including but not limited to digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components, and their number can be determined according to actual needs. As mentioned above, the computing device of this embodiment, considered alone, can be regarded as having a single-core structure or a homogeneous multi-core structure; however, when the computing device and the processing device are considered together, they are regarded as forming a heterogeneous multi-core structure.

The board 30 also includes a storage device 304 for storing data, which includes one or more storage units 305. The storage device 304 is connected to and transfers data with the control device 306 and the chip 301 through a bus. The control device 306 on the board 30 is configured to regulate the state of the chip 301. To this end, in one application scenario, the control device 306 may include a microcontroller unit (MCU).
FIG. 4 shows the structure of the combined processing device on the board 30. The combined processing device 40 includes a computing device 401, an interface device 402, a processing device 403, and an off-chip memory 404.
The computing device 401 is configured to perform user-specified operations and is mainly implemented as a single-core or multi-core intelligent processor for deep learning or machine learning computations. It can interact with the processing device 403 through the interface device 402 to jointly complete the user-specified operations.
The interface device 402 is connected to the bus for connecting with other devices, such as the control device 306 and the external interface device 302 in FIG. 3.
The processing device 403, as a general-purpose processing device, performs basic control including but not limited to data transfer and starting and/or stopping the computing device 401. Depending on the implementation, the processing device 403 may be one or more of a central processing unit, a graphics processing unit, or other general-purpose and/or special-purpose processors, including but not limited to digital signal processors, application-specific integrated circuits, field-programmable gate arrays, other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components; their number can be determined according to actual needs. As mentioned above, the computing device 401 of this embodiment, considered on its own, can be regarded as having a single-core structure or a homogeneous multi-core structure. However, when the computing device 401 and the processing device 403 are considered together, they are regarded as forming a heterogeneous multi-core structure.
The off-chip memory 404 is used to store data to be processed. It is a DDR memory, typically 16 GB or larger, and stores data of the computing device 401 and/or the processing device 403.
FIG. 5 shows a schematic diagram of the vertical stacking of an embodiment of the present invention. This embodiment is a multi-core chip including a first die group and a second die group. The first die group includes a first core layer 51 and a first memory layer 52, and the second die group includes a second core layer 53 and a second memory layer 54. In practice, the first core layer 51, the first memory layer 52, the second core layer 53, and the second memory layer 54 are vertically stacked together in this order; the layers in FIG. 5 are drawn visually separated only for convenience of illustration.
The first core layer 51 implements the function of a processor core and includes a first computing area 511, which covers the logic layer of the first core layer 51, i.e., the top side of the first core layer 51 in the figure. In specific regions, the first core layer 51 further includes a first die-to-die area 512 and first through-silicon vias (TSVs) 513. The first computing area 511 is formed with a first computing circuit to implement the function of the computing device 401; the first die-to-die area 512 is formed with a first transceiver circuit serving as the die-to-die interface of the first computing circuit; the first TSVs 513 are used to realize the electrical interconnection of the stacked dies in the three-dimensional integrated circuit.
The first memory layer 52 implements the function of on-chip memory and includes a first memory area 521, a first input/output area 522, a first physical area 523, and second TSVs 524. The first memory area 521 is formed with storage units for temporarily storing the computation results of the first computing circuit. The first input/output area 522 is formed with a first input/output circuit serving as the external interface of the first core layer 51 and the first memory layer 52, i.e., implementing the function of the interface device 402. The first physical area 523 is formed with a first physical access circuit for accessing the off-chip memory 404. The second TSVs 524 are distributed over the entire first memory layer 52 but are shown on one side only for illustration; they electrically connect specific components.
The second core layer 53 implements the function of a processor core and includes a second computing area 531, which covers the logic layer of the second core layer 53, i.e., the top side of the second core layer 53 in the figure. In specific regions, the second core layer 53 further includes a second die-to-die area 532 and third TSVs 533. The second computing area 531 is formed with a second computing circuit to implement the function of the processing device 403; the second die-to-die area 532 is formed with a second transceiver circuit serving as the die-to-die interface of the second computing circuit; the third TSVs 533 are likewise used to realize the electrical interconnection of the stacked dies in the three-dimensional integrated circuit.
The second memory layer 54 implements the function of on-chip memory and includes a second memory area 541, a second input/output area 542, a second physical area 543, and fourth TSVs 544. The second memory area 541 is formed with storage units for temporarily storing the computation results of the second computing circuit. The second input/output area 542 is formed with a second input/output circuit serving as the external interface of the second core layer 53 and the second memory layer 54, i.e., implementing the function of the interface device 402. The second physical area 543 is formed with a second physical access circuit for accessing the off-chip memory 404. The fourth TSVs 544 are distributed over the entire second memory layer 54 but are shown on one side only for illustration; they electrically connect specific components.
Where necessary, the TSVs of each layer include transceiver TSVs, input/output TSVs, and physical TSVs. The transceiver TSVs electrically connect the first transceiver circuit and the second transceiver circuit; the input/output TSVs electrically conduct the data of the input/output circuits; the physical TSVs electrically conduct the computation results of the computing circuits to the off-chip memory 404.
When the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit in the first computing area 511 → first transceiver circuit in the first die-to-die area 512 → transceiver TSVs of the first TSVs 513 → transceiver TSVs of the second TSVs 524 → second transceiver circuit in the second die-to-die area 532 → second computing circuit in the second computing area 531. When the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of this path.
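The forward and reverse transfers described above can be summarized with a small software model. This is an illustrative sketch only: the element labels are hypothetical strings naming the patent's numbered components, not an actual hardware API.

```python
# Illustrative model of the die-to-die path of the FIG. 5 embodiment.
# Element names are hypothetical labels for the patent's numbered parts.

DIE_TO_DIE_PATH = [
    "first computing circuit (computing area 511)",
    "first transceiver circuit (die-to-die area 512)",
    "transceiver TSV (first TSVs 513)",
    "transceiver TSV (second TSVs 524)",
    "second transceiver circuit (die-to-die area 532)",
    "second computing circuit (computing area 531)",
]

def route(path, forward=True):
    """Return the sequence of elements a transfer traverses.

    forward=True models computing device 401 -> processing device 403;
    forward=False models the reverse path described in the text.
    """
    return list(path) if forward else list(reversed(path))

if __name__ == "__main__":
    print(" -> ".join(route(DIE_TO_DIE_PATH)))
    print(" -> ".join(route(DIE_TO_DIE_PATH, forward=False)))
```

The same reversal rule applies to the input/output and physical-access paths in the paragraphs that follow: each downstream path is traversed element-by-element in the opposite order for upstream transfers.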
When the computation results of the computing device 401 need to be exchanged with other off-chip devices through the interface device 402, the data reaches the other off-chip devices through the following path: first input/output circuit in the first input/output area 522 → input/output TSVs of the second TSVs 524 → input/output TSVs of the third TSVs 533 → input/output TSVs of the fourth TSVs 544. When another off-chip device intends to transmit data to the first memory area 521, the data reaches the first memory area 521 through the reverse of this path. When the computation results of the processing device 403 need to be exchanged with other off-chip devices through the interface device 402, the data reaches the other off-chip devices through the following path: second input/output circuit in the second input/output area 542 → input/output TSVs of the fourth TSVs 544. When another off-chip device intends to transmit data to the second memory area 541, the data reaches the second memory area 541 through the reverse of this path.
When data in the first memory area 521 is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: first physical access circuit in the first physical area 523 → physical TSVs of the second TSVs 524 → physical TSVs of the third TSVs 533 → physical TSVs of the fourth TSVs 544. When the off-chip memory 404 intends to transmit input data to the first memory area 521 for processing by the computing device 401, the data reaches the first memory area 521 through the reverse of this path. When data in the second memory area 541 is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: second physical access circuit in the second physical area 543 → physical TSVs of the fourth TSVs 544. When the off-chip memory 404 intends to transmit input data to the second memory area 541 for processing by the processing device 403, the data reaches the second memory area 541 through the reverse of this path.
Each layer can be divided into a logic side and an opposite side. The logic side carries the logic circuits that implement the layer's functions; the opposite side is the side of the layer on which no logic circuits are laid out. FIG. 6 shows a cross-sectional view of the structure of FIG. 5. In this embodiment, the first core layer 51 is used together with the first memory layer 52, and the second core layer 53 is used together with the second memory layer 54. For transmission efficiency, the first core layer 51 and the first memory layer 52 adopt a face-to-face bonding process: the logic side of the first core layer 51 on which the first computing area 511 is formed is bonded to the logic side of the first memory layer 52 on which the first memory area 521 is formed, so that the transmission path between the first computing circuit and the first memory area 521 is the shortest. Likewise, the logic side of the second core layer 53 on which the second computing area 531 is formed is bonded to the logic side of the second memory layer 54 on which the second memory area 541 is formed, so that the transmission path between the second computing circuit and the second memory area 541 is also the shortest. To achieve these shortest transmission paths, the first die group and the second die group adopt a back-to-back bonding process, i.e., the opposite side of the first memory layer 52 is bonded to the opposite side of the second core layer 53.
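The bonding rule above can be checked mechanically: each layer exposes a face (logic side) and a back (opposite side), and the bond between two adjacent layers is classified by which sides meet. The following sketch encodes the FIG. 6 stack; the tuple convention (layer, side facing up, side facing down) is an assumption made for this illustration.

```python
# Minimal sketch of the FIG. 6 bonding arrangement. The (name, top_side,
# bottom_side) encoding is a hypothetical convention for this sketch.

# Stack from top to bottom; "face" = logic side, "back" = opposite side.
stack = [
    ("first core layer 51",    "back", "face"),  # logic side faces down
    ("first memory layer 52",  "face", "back"),  # logic side faces up
    ("second core layer 53",   "back", "face"),
    ("second memory layer 54", "face", "back"),
]

def bond_type(upper, lower):
    """Classify the bond between two stacked layers by the sides that meet."""
    meeting = (upper[2], lower[1])  # bottom of upper layer, top of lower layer
    return {
        ("face", "face"): "face-to-face",
        ("back", "back"): "back-to-back",
    }.get(meeting, "face-to-back")

for upper, lower in zip(stack, stack[1:]):
    print(f"{upper[0]} / {lower[0]}: {bond_type(upper, lower)}")
```

Running the loop reproduces the text's arrangement: face-to-face within each die group, back-to-back between the first memory layer 52 and the second core layer 53.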
With the arrangement shown in FIG. 6, the first die-to-die area 512 and the second die-to-die area 532 are vertically stacked, so that the die-to-die interface of the first core layer 51 and the die-to-die interface of the second core layer 53 are directly electrically connected through the first TSVs 513 and the second TSVs 524, without requiring transmission through an interposer such as the interposer 201 shown in FIG. 2.
In summary, the first die group of this embodiment includes a first die and a second die bonded with a face-to-face process, and the second die group of this embodiment likewise includes a first die and a second die bonded with a face-to-face process, while the first die group and the second die group are bonded with a back-to-back process. The first die may be either a processor core or a memory, and the second die is the other of the processor core and the memory; the two are used together.
In another case, the positions of the first core layer 51 and the first memory layer 52 of the first die group can be swapped, and the positions of the second core layer 53 and the second memory layer 54 of the second die group can be swapped, as shown in FIG. 7. In this structure, the second memory layer 54 of the second die group is located between the first core layer 51 of the first die group and the second core layer 53 of the second die group, and the second memory layer 54 is formed with transceiver TSVs for electrically connecting the first transceiver circuit and the second transceiver circuit.
When the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit in the first computing area 511 → first transceiver circuit in the first die-to-die area 512 → transceiver TSVs of the first TSVs 513 → transceiver TSVs of the fourth TSVs 544 → second transceiver circuit in the second die-to-die area 532 → second computing circuit in the second computing area 531. When the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of this path.
In the structure shown in FIG. 7, the paths by which the computing device 401 or the processing device 403 exchanges data with other off-chip devices through the interface device 402, and the paths by which the first memory area 521 or the second memory area 541 transmits data to and from the off-chip memory 404, are similar to those of the embodiment of FIG. 5 and can be readily deduced by those skilled in the art, so they are not described again.
Another embodiment of the present invention likewise implements the structure shown in FIG. 4. FIG. 8 shows a schematic diagram of the vertical stacking of this embodiment. The vertically stacked chip of this embodiment is divided into a first die group and a second die group, with the first die group stacked on the second die group. From top to bottom, the first die group consists of a third memory layer 85 (third die), a first core layer 81 (first die), and a first memory layer 82 (second die); from top to bottom, the second die group consists of a fourth memory layer 86 (second die), a second core layer 83 (first die), and a second memory layer 84 (third die), i.e., the fourth memory layer 86 is located between the first memory layer 82 and the second core layer 83. The layers in FIG. 8 are drawn visually separated only for convenience of illustration.
The functions and roles of the first core layer 81, the first memory layer 82, the second core layer 83, and the second memory layer 84 are the same as those of the first core layer 51, the first memory layer 52, the second core layer 53, and the second memory layer 54 in the foregoing embodiment, so they are not described again.
The third memory layer 85 includes a third memory area 851 and fifth TSVs 852. The third memory area 851 covers the logic layer of the third memory layer 85, i.e., the top side of the third memory layer 85 in the figure, and is formed with storage units for temporarily storing the computation results of the first computing circuit; the fifth TSVs 852 are distributed over the entire third memory layer 85 but are shown on one side only for illustration, and electrically connect specific components. The third memory layer 85 is only responsible for temporarily storing the computation results of the first computing circuit and is not responsible for the external communication tasks of the first die group. The first computing circuit can use the temporary storage space of both the first memory area 821 and the third memory area 851: when the computing device 401 intends to temporarily store intermediate data, it can store the data in the third memory area 851 through the fifth TSVs 852, or in the first memory area 821 through the first TSVs 813.
The fourth memory layer 86 includes a fourth memory area 861 and sixth TSVs 862. The fourth memory area 861 covers the logic layer of the fourth memory layer 86, i.e., the top side of the fourth memory layer 86 in the figure, and is formed with storage units for temporarily storing the computation results of the second computing circuit; the sixth TSVs 862 are distributed over the entire fourth memory layer 86 but are shown on one side only for illustration, and electrically connect specific components. The fourth memory layer 86 is only responsible for temporarily storing the computation results of the second computing circuit and is not responsible for the external communication tasks of the second die group. The second computing circuit can use the temporary storage space of both the second memory area 841 and the fourth memory area 861: when the processing device 403 intends to temporarily store intermediate data, it can store the data in the fourth memory area 861 through the sixth TSVs 862, or in the second memory area 841 through the second TSVs 833.
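In the FIG. 8 embodiment each core layer thus has two scratch destinations reachable over different TSVs. A minimal sketch of that choice is given below; the capacity figures and the first-fit policy are hypothetical assumptions for illustration, since the patent only specifies that both memory areas are available, not how one is selected.

```python
# Illustrative sketch: a core layer spilling intermediate data to one of
# the two scratch areas described for the FIG. 8 embodiment. Capacities
# and the first-fit policy are hypothetical, not specified by the patent.

SCRATCH_AREAS = {
    # area name: (TSVs traversed, hypothetical free capacity in KiB)
    "first memory area 821": (["first TSVs 813"], 512),
    "third memory area 851": (["fifth TSVs 852"], 512),
}

def choose(areas, size_kib):
    """Pick the first scratch area with enough free space (first fit)."""
    for name, (tsvs, free_kib) in areas.items():
        if free_kib >= size_kib:
            return name, tsvs
    raise MemoryError("no scratch area can hold the intermediate data")

if __name__ == "__main__":
    area, tsvs = choose(SCRATCH_AREAS, 256)
    print(f"store intermediate data in {area} via {tsvs[0]}")
```

The same sketch applies symmetrically to the second die group, with the second memory area 841 (via the second TSVs 833) and the fourth memory area 861 (via the sixth TSVs 862) as the two destinations.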
Where necessary, the TSVs of each layer include transceiver TSVs, input/output TSVs, and physical TSVs. The transceiver TSVs electrically connect the first transceiver circuit and the second transceiver circuit; the input/output TSVs electrically conduct the data of the input/output circuits; the physical TSVs electrically conduct the computation results of the computing circuits to the off-chip memory 404.
When the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: first computing circuit in the first computing area 811 → first transceiver circuit in the first die-to-die area 812 → transceiver TSVs of the first TSVs 813 → transceiver TSVs of the third TSVs 824 → transceiver TSVs of the sixth TSVs 862 → second transceiver circuit in the second die-to-die area 832 → second computing circuit in the second computing area 831. When the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of this path.
When the computation results of the first die group need to be exchanged with other off-chip devices through the interface device 402, the data reaches the other off-chip devices through the following path: first input/output circuit in the first input/output area 822 → input/output TSVs of the third TSVs 824 → input/output TSVs of the sixth TSVs 862 → input/output TSVs of the second TSVs 833 → input/output TSVs of the fourth TSVs 844. When another off-chip device intends to transmit data to the first die group, the data reaches the first memory area 821 through the reverse of this path. When the computation results of the second die group need to be exchanged with other off-chip devices through the interface device 402, the data reaches the other off-chip devices through the following path: second input/output circuit in the second input/output area 842 → input/output TSVs of the fourth TSVs 844. When another off-chip device intends to transmit data to the second die group, the data reaches the second memory area 841 through the reverse of this path.
When data of the first die group is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: first physical access circuit in the first physical area 823 → physical TSVs of the third TSVs 824 → physical TSVs of the sixth TSVs 862 → physical TSVs of the second TSVs 833 → physical TSVs of the fourth TSVs 844. When the off-chip memory 404 intends to transmit input data to the first die group for processing by the computing device 401, the data reaches the first memory area 821 through the reverse of this path. When data of the second die group is to be transmitted to the off-chip memory 404, the data reaches the off-chip memory 404 through the following path: second physical access circuit in the second physical area 843 → physical TSVs of the fourth TSVs 844. When the off-chip memory 404 intends to transmit input data to the second die group for processing by the processing device 403, the data reaches the second memory area 841 through the reverse of this path.
In this embodiment, the first core layer 81 is used together with the first memory layer 82 and the third memory layer 85, and the second core layer 83 is used together with the second memory layer 84 and the fourth memory layer 86. For transmission efficiency, the first core layer 81 and the first memory layer 82 adopt a face-to-face bonding process, so that the transmission path between the first computing circuit and the first memory area 821 is the shortest; the first core layer 81 and the third memory layer 85 adopt a face-to-back bonding process; the second core layer 83 and the fourth memory layer 86 adopt a face-to-face bonding process, likewise making the transmission path between the second computing circuit and the fourth memory area 861 the shortest; the second core layer 83 and the second memory layer 84 adopt a face-to-back bonding process; and the first die group and the second die group adopt a back-to-back bonding process, i.e., the first memory layer 82 and the fourth memory layer 86 are bonded back-to-back.
As shown in FIG. 8, the first die-to-die area 812 and the second die-to-die area 832 are vertically stacked, so that the die-to-die interface of the first core layer 81 and the die-to-die interface of the second core layer 83 are directly electrically connected through the first TSVs 813, the third TSVs 824, and the sixth TSVs 862, without requiring transmission through an interposer such as the interposer 201 shown in FIG. 2.
Another embodiment of the present invention likewise implements the structure shown in FIG. 4. FIG. 9 shows a schematic diagram of the vertical stacking of this embodiment. The vertically stacked chip of this embodiment is divided, from top to bottom, into a first die group, a second die group, and a third die group. From top to bottom, the first die group consists of a first core layer 91 (first die) and a first memory layer 92 (second die); from top to bottom, the second die group consists of a second core layer 93 (first die) and a second memory layer 94 (second die); the third die group includes only a third memory layer 95, so the third memory layer 95 is located under the second memory layer 94. The layers in FIG. 9 are drawn visually separated only for convenience of illustration.
The first core layer 91 includes a first computing area 911, which covers the logic layer of the first core layer 91, i.e., the top side of the first core layer 91 in the figure; in specific regions, the first core layer 91 further includes a first die-to-die area 912 and first TSVs 913. The first memory layer 92 includes a first memory area 921 and second TSVs 922; the first memory area 921 covers the logic layer of the first memory layer 92, i.e., the top side of the first memory layer 92 in the figure, and is formed with storage units for temporarily storing the computation results of the first computing circuit. The second core layer 93 includes a second computing area 931, which covers the logic layer of the second core layer 93, i.e., the top side of the second core layer 93 in the figure; in specific regions, the second core layer 93 further includes a second die-to-die area 932 and third TSVs 933. The second memory layer 94 includes a second memory area 941 and fourth TSVs 942; the second memory area 941 covers the logic layer of the second memory layer 94, i.e., the top side of the second memory layer 94 in the figure, and is formed with storage units for temporarily storing the computation results of the second computing circuit.
The third memory layer 95 includes a third memory area 951, a first input/output area 952, a second input/output area 953, a first physical area 954, a second physical area 955, and fifth TSVs 956. The third memory area 951 is formed with storage units for temporarily storing the computation results of the first computing circuit or the second computing circuit. The first input/output area 952 is formed with a first input/output circuit serving as the external interface of the first die group, i.e., implementing the function of the interface device 402; the second input/output area 953 is formed with a second input/output circuit serving as the external interface of the second die group, likewise implementing the function of the interface device 402. The first physical area 954 is formed with a first physical access circuit connecting the first die group with the off-chip memory 404; the second physical area 955 is formed with a second physical access circuit connecting the second die group with the off-chip memory 404.
Each set of TSVs is distributed over its entire layer but is shown on one side only for illustration. Where necessary, the TSVs of each layer include transceiver TSVs, input/output TSVs, and physical TSVs. The transceiver TSVs electrically connect the first transceiver circuit and the second transceiver circuit; the input/output TSVs electrically conduct the data of the input/output circuits; the physical TSVs electrically conduct the computation results of the computing circuits to the off-chip memory 404.
When the computing device 401 intends to transmit data to the processing device 403, the data reaches the processing device 403 through the following path: the first operation circuit in the first operation area 911 → the first transceiver circuit in the first die-to-die area 912 → the transceiver TSVs of the first TSVs 913 → the transceiver TSVs of the second TSVs 922 → the second transceiver circuit in the second die-to-die area 932 → the second operation circuit in the second operation area 931. When the processing device 403 intends to transmit data to the computing device 401, the data reaches the computing device 401 through the reverse of this path.
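The hop sequence above can be sketched as a small routing model; the hop labels below are our shorthand for the areas named in this paragraph, not terms used in the patent, and the model only records hop order.

```python
# Minimal sketch of the die-to-die path (hop labels are illustrative shorthand).
FORWARD_PATH = [
    "first operation circuit (operation area 911)",
    "first transceiver circuit (die-to-die area 912)",
    "transceiver TSV of first TSVs 913",
    "transceiver TSV of second TSVs 922",
    "second transceiver circuit (die-to-die area 932)",
    "second operation circuit (operation area 931)",
]

def route(data, path):
    """Forward `data` through each hop in order (hops are no-ops in this model)."""
    for hop in path:
        pass  # real hardware would latch and forward the signal at each hop
    return data

# Computing device 401 -> processing device 403:
assert route("payload", FORWARD_PATH) == "payload"
# Processing device 403 -> computing device 401 uses the reversed hop order:
assert list(reversed(FORWARD_PATH))[0] == "second operation circuit (operation area 931)"
```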
The first die group and the second die group do not communicate off-chip directly; when off-chip communication is required, this embodiment performs it through the third memory layer 95 of the third die group.
When the computation results of the computing device 401 need to be exchanged with other off-chip devices through the interface device 402, the data is transmitted through the input/output TSVs of each layer to the third memory area 951 for temporary storage, and then travels from the third memory area 951 to the other off-chip devices along the following path: the first input/output circuit of the first input/output area 952 → the first input/output TSVs of the fifth TSVs 956. When another off-chip device intends to transmit data to the first die group, the data travels the reverse of this path, is first buffered in the third memory area 951, and is then transferred from the third memory area 951 to the first memory area 921.
When the computation results of the processing device 403 need to be exchanged with other off-chip devices through the interface device 402, the data is transmitted through the input/output TSVs of each layer to the third memory area 951 for temporary storage, and then travels from the third memory area 951 to the other off-chip devices along the following path: the second input/output circuit of the second input/output area 953 → the second input/output TSVs of the fifth TSVs 956. When another off-chip device intends to transmit data to the second die group, the data travels the reverse of this path, is first buffered in the third memory area 951, and is then transferred from the third memory area 951 to the second memory area 941.
When data in the first memory area 921 is to be transmitted to the off-chip memory 404, the data is transmitted through the physical TSVs of each layer to the third memory area 951 for temporary storage, and then travels from the third memory area 951 off-chip along the following path: the first physical access circuit of the first physical access area 954 → the first physical TSVs of the fifth TSVs 956. When the off-chip memory 404 intends to transmit input data to the first die group, the input data travels the reverse of this path, is first buffered in the third memory area 951, and is then transferred from the third memory area 951 to the first memory area 921.
When data in the second memory area 941 is to be transmitted to the off-chip memory 404, the data is transmitted through the physical TSVs of the fourth TSVs to the third memory area 951 for temporary storage, and then travels from the third memory area 951 off-chip along the following path: the second physical access circuit of the second physical access area 955 → the second physical TSVs of the fifth TSVs 956. When the off-chip memory 404 intends to transmit input data to the second die group, the input data travels the reverse of this path, is first buffered in the third memory area 951, and is then transferred from the third memory area 951 through the physical TSVs of the fourth TSVs to the second memory area 941.
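The four off-chip flows above share one pattern: traffic is always staged in the third memory area 951 before crossing the chip boundary in either direction. A hedged sketch of that pattern follows; the class and function names are invented for illustration and are not taken from the patent.

```python
# Hedged sketch: off-chip traffic is buffered in the third memory area 951
# before leaving through (or after entering via) the fifth TSVs 956.
class ThirdMemoryArea:
    """Stand-in for the staging buffer in memory area 951 (FIFO order)."""
    def __init__(self):
        self._buffer = []

    def stage(self, data):
        self._buffer.append(data)

    def drain(self):
        return self._buffer.pop(0)

def send_off_chip(data, staging, off_chip):
    staging.stage(data)               # buffer in area 951 first
    off_chip.append(staging.drain())  # then out through the I/O or physical TSVs

def receive_from_off_chip(data, staging, memory_area):
    staging.stage(data)                  # reverse path: buffer in area 951
    memory_area.append(staging.drain())  # then forward to area 921 or 941

staging = ThirdMemoryArea()
off_chip_memory_404 = []
send_off_chip("result", staging, off_chip_memory_404)
assert off_chip_memory_404 == ["result"]

first_memory_area_921 = []
receive_from_off_chip("input", staging, first_memory_area_921)
assert first_memory_area_921 == ["input"]
```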
In this embodiment, the first core layer 91 works in conjunction with the first memory layer 92, and the second core layer 93 works in conjunction with the second memory layer 94. For transmission efficiency, the first core layer 91 and the first memory layer 92 are bonded face-to-face, which minimizes the transmission path between the first operation circuit and the first memory area 921; the second core layer 93 and the second memory layer 94 are likewise bonded face-to-face, minimizing the transmission path between the second operation circuit and the second memory area 941. To preserve these shortest transmission paths, the first die group and the second die group are bonded back-to-back, i.e., the first memory layer 92 and the second core layer 93 are bonded back-to-back, while the second die group and the third die group are bonded face-to-back, i.e., the second memory layer 94 and the third memory layer 95 are bonded face-to-back.
As shown in FIG. 9, the first die-to-die area 912 and the second die-to-die area 932 are stacked vertically, so that the die-to-die interface of the first core layer 91 and the die-to-die interface of the second core layer 93 are electrically connected directly through the first TSVs 913 and the second TSVs 922, without routing through an interposer such as the interposer 201 shown in FIG. 2.
Another embodiment of the present invention likewise realizes the structure shown in FIG. 4. FIG. 10 is a schematic diagram of the vertical stacking of this embodiment. The vertically stacked chip of this embodiment is divided, from top to bottom, into a first die group, a second die group, and a third die group. The first die group consists, from top to bottom, of a third memory layer B and a first core layer A; the second die group consists, from top to bottom, of a first memory layer D and a second core layer C; the third die group includes only a second memory layer E. Evidently, the vertical stacking structure of this embodiment differs from the embodiment of FIG. 9 only in that the positions of the core layer and the memory layer within each of the first and second die groups are swapped. Based on the description of the foregoing embodiments, those skilled in the art can understand how the layers of this embodiment cooperate without creative effort, so the details are not repeated here.
The embodiments above are all vertically stacked systems-on-chip, which can be realized with FCBGA (flip chip ball grid array) or CoWoS (chip on wafer on substrate) packaging processes. FCBGA is a flip-chip ball grid array packaging format that uses solder balls instead of pins to connect circuits, providing the shortest external connection distance. This package not only offers excellent electrical performance but also reduces interconnect losses and inductance between components, mitigates electromagnetic interference, and withstands higher operating frequencies. CoWoS is an integrated production technology: the dies are first attached to a silicon wafer through the CoW packaging process, and the CoW assembly is then attached to a substrate to form CoWoS. This technology allows multiple dies to be packaged together, achieving a small package volume, low power consumption, and a low pin count.
Another embodiment of the present invention is a method of manufacturing the vertically stacked chip shown in FIG. 5. The vertically stacked chip includes a first die group and a second die group, wherein the first die group includes a first core layer 51 (first die) and a first memory layer 52 (second die), and the second die group includes a second core layer 53 (first die) and a second memory layer 54 (second die). Alternatively, the first die may be a memory and the second die may be a processor core. The flowchart is shown in FIG. 11.
In step 1101, a first transceiver circuit is formed in the first die-to-die area 512 of the first core layer 51. In step 1102, a second transceiver circuit is formed in the second die-to-die area 532 of the second core layer 53. In step 1103, transceiver TSVs are formed in the first memory layer 52. In step 1104, input/output TSVs are formed in the second core layer 53 and the second memory layer 54. In step 1105, physical TSVs are formed in the second core layer 53 and the second memory layer 54. In step 1106, the first memory layer 52 is disposed between the first core layer 51 and the second core layer 53, i.e., the first core layer 51, the first memory layer 52, the second core layer 53, and the second memory layer 54 are stacked in that order from top to bottom. In step 1107, the first core layer 51 and the first memory layer 52 are bonded face-to-face. In step 1108, the second core layer 53 and the second memory layer 54 are bonded face-to-face. In step 1109, the first die group and the second die group are bonded back-to-back.
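The bonding requirements of steps 1107 through 1109 can be checked with a small orientation model. The layer names follow FIG. 5, but the encoding of each layer's logic "face" as pointing up or down is our own illustration, not a convention from the patent.

```python
# Hedged sketch: the bond type at each interface follows from which surfaces
# meet there (the upper layer's bottom surface against the lower layer's top).
STACK = [                       # top to bottom: (layer, face direction)
    ("core layer 51",   "down"),
    ("memory layer 52", "up"),
    ("core layer 53",   "down"),
    ("memory layer 54", "up"),
]

def bond_type(upper, lower):
    """Classify the interface between two vertically adjacent layers."""
    upper_surface = "face" if upper[1] == "down" else "back"
    lower_surface = "face" if lower[1] == "up" else "back"
    return f"{upper_surface}-to-{lower_surface}"

bonds = [bond_type(STACK[i], STACK[i + 1]) for i in range(len(STACK) - 1)]
# Face-to-face within each die group, back-to-back between the two groups:
assert bonds == ["face-to-face", "back-to-back", "face-to-face"]
```

This makes explicit why flipping one die group is what yields the back-to-back interface in the middle of the stack.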
With this structure, the first operation area 511 and the second operation area 531 perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit, with the first memory layer 52 electrically connecting the first transceiver circuit and the second transceiver circuit through the transceiver TSVs; data in the first memory area 521 is transmitted out of the vertically stacked chip through the first input/output area 522 and the input/output TSVs, and data in the second memory area 541 is transmitted out of the vertically stacked chip through the second input/output area 542 and the input/output TSVs; the operation results of the first operation area 511 are transmitted to the off-chip memory 404 through the first physical access area 523 and the physical TSVs, and the operation results of the second operation area 531 are transmitted to the off-chip memory 404 through the second physical access area 543 and the physical TSVs.
Another embodiment of the present invention is a method of manufacturing the vertically stacked chip shown in FIG. 7. The vertically stacked chip includes a first die group and a second die group, wherein the first die group includes a first core layer 51 (first die) and a first memory layer 52 (second die), and the second die group includes a second core layer 53 (first die) and a second memory layer 54 (second die). Alternatively, the first die may be a memory and the second die may be a processor core. The flowchart is shown in FIG. 12.
In step 1201, a first transceiver circuit is formed in the first die-to-die area 512 of the first core layer 51. In step 1202, a second transceiver circuit is formed in the second die-to-die area 532 of the second core layer 53. In step 1203, transceiver TSVs are formed in the second memory layer 54. In step 1204, input/output TSVs are formed in the second core layer 53 and the second memory layer 54. In step 1205, physical TSVs are formed in the second core layer 53 and the second memory layer 54. In step 1206, the second memory layer 54 is disposed between the first core layer 51 and the second core layer 53, i.e., the first memory layer 52, the first core layer 51, the second memory layer 54, and the second core layer 53 are stacked in that order from top to bottom. In step 1207, the first core layer 51 and the first memory layer 52 are bonded face-to-face. In step 1208, the second core layer 53 and the second memory layer 54 are bonded face-to-face. In step 1209, the first die group and the second die group are bonded back-to-back.
With this structure, the first operation area 511 and the second operation area 531 perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit, with the second memory layer 54 electrically connecting the first transceiver circuit and the second transceiver circuit through the transceiver TSVs; data in the first memory area 521 is transmitted out of the vertically stacked chip through the first input/output area 522 and the input/output TSVs, and data in the second memory area 541 is transmitted out of the vertically stacked chip through the second input/output area 542 and the input/output TSVs; the operation results of the first operation area 511 are transmitted to the off-chip memory 404 through the first physical access area 523 and the physical TSVs, and the operation results of the second operation area 531 are transmitted to the off-chip memory 404 through the second physical access area 543 and the physical TSVs.
Another embodiment of the present invention is a method of manufacturing the vertically stacked chip shown in FIG. 8. The vertically stacked chip of this embodiment is divided into a first die group and a second die group, with the first die group stacked on the second die group. The first die group includes a first core layer 81 (first die), a first memory layer 82 (second die), and a third memory layer 85 (third die); the second die group includes a second core layer 83 (first die), a second memory layer 84 (third die), and a fourth memory layer 86 (second die). The flowchart is shown in FIG. 13.
In step 1301, a first transceiver circuit is formed in the first die-to-die area 812 of the first core layer 81. In step 1302, a second transceiver circuit is formed in the second die-to-die area 832 of the second core layer 83. In step 1303, transceiver TSVs are formed in the first memory layer 82 and the fourth memory layer 86. In step 1304, input/output TSVs are formed in the second core layer 83, the second memory layer 84, and the fourth memory layer 86. In step 1305, physical TSVs are formed in the second core layer 83, the second memory layer 84, and the fourth memory layer 86. In step 1306, the first core layer 81 and the first memory layer 82 are bonded face-to-face. In step 1307, the third memory layer 85 and the first core layer 81 are bonded face-to-back. In step 1308, the second core layer 83 and the fourth memory layer 86 are bonded face-to-face. In step 1309, the second memory layer 84 and the second core layer 83 are bonded face-to-back. In step 1310, the third memory layer 85, the first core layer 81, and the first memory layer 82 are stacked in that order from top to bottom. In step 1311, the fourth memory layer 86, the second core layer 83, and the second memory layer 84 are stacked in that order from top to bottom. In step 1312, the first die group and the second die group are bonded back-to-back.
With this structure, the first operation area 811 and the second operation area 831 perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit, with the first memory layer 82 and the fourth memory layer 86 electrically connecting the first transceiver circuit and the second transceiver circuit through the transceiver TSVs; data in the first memory area 821 is transmitted out of the vertically stacked chip through the first input/output area 822 and the input/output TSVs, and data in the second memory area 841 is transmitted out of the vertically stacked chip through the second input/output area 842 and the input/output TSVs; the operation results of the first operation area 811 are transmitted to the off-chip memory 404 through the first physical access area 823 and the physical TSVs, and the operation results of the second operation area 831 are transmitted to the off-chip memory 404 through the second physical access area 843 and the physical TSVs.
Another embodiment of the present invention is a method of manufacturing the vertically stacked chip shown in FIG. 9. The vertically stacked chip of this embodiment is divided, from top to bottom, into a first die group, a second die group, and a third die group. The first die group consists, from top to bottom, of a first core layer 91 (first die) and a first memory layer 92 (second die); the second die group consists, from top to bottom, of a second core layer 93 (first die) and a second memory layer 94 (second die); the third die group includes only the third memory layer 95. The flowchart is shown in FIG. 14.
In step 1401, a first transceiver circuit is formed in the first die-to-die area 912 of the first core layer 91. In step 1402, a second transceiver circuit is formed in the second die-to-die area 932 of the second core layer 93. In step 1403, transceiver TSVs are formed in the first memory layer 92. In step 1404, input/output TSVs are formed in the third memory layer 95. In step 1405, physical TSVs are formed in the third memory layer 95. In step 1406, the first core layer 91 and the first memory layer 92 are bonded face-to-face. In step 1407, the second core layer 93 and the second memory layer 94 are bonded face-to-face. In step 1408, the first core layer 91 and the first memory layer 92 are stacked in that order from top to bottom. In step 1409, the second core layer 93 and the second memory layer 94 are stacked in that order from top to bottom. In step 1410, the first die group and the second die group are bonded back-to-back. In step 1411, the third die group and the second die group are bonded face-to-back.
In this embodiment, the third memory layer 95 includes a third memory area 951, a first input/output area 952, a second input/output area 953, a first physical access area 954, a second physical access area 955, and fifth TSVs 956. The third memory area 951 is provided with storage cells for temporarily storing the operation results of the first operation circuit or the second operation circuit. The first input/output area 952 is provided with a first input/output circuit serving as the interface through which the first die group communicates with the outside, i.e., realizing the function of the interface device 402. The second input/output area 953 is provided with a second input/output circuit serving as the interface through which the second die group communicates with the outside, likewise realizing the function of the interface device 402. The first physical access area 954 is provided with a first physical access circuit for connecting the first die group to the off-chip memory 404, and the second physical access area 955 is provided with a second physical access circuit for connecting the second die group to the off-chip memory 404.
The first die-to-die area 912 and the second die-to-die area 932 are stacked vertically, so that the die-to-die interface of the first core layer 91 and the die-to-die interface of the second core layer 93 are electrically connected directly through the first TSVs 913 and the second TSVs 922, without routing through an interposer such as the interposer 201 shown in FIG. 2.
FIG. 15 shows the back-to-back stacking process used in the foregoing embodiments.
In step 1501, circuits are formed on the logic side of a first wafer. Each wafer has a logic side and an opposite side: the logic side is the side on which logic circuits are formed to realize particular electrical functions, while the opposite side is the side of the wafer on which no logic circuits are laid out. Because the logic circuits are created by deposition, etching, and similar processes performed on top of the wafer, in this step, as shown in FIG. 16, the logic side 1602 of the first wafer 1601 faces upward and the opposite side 1603 faces downward.
In this step, a front-end-of-line (FEOL) layer 1604 is first formed on the logic side 1602, first TSVs 1605 are then formed on the logic side 1602, and finally a back-end-of-line (BEOL) layer 1606 is formed on the logic side 1602, so that the first TSVs 1605 are electrically connected to the BEOL layer 1606. The front-end-of-line process defines the transistor regions on the silicon substrate and then creates the N-type and P-type regions by ion implantation, realizing N-type and/or P-type field-effect transistors. The back-end-of-line process builds multiple layers of conductive metal lines that connect the transistors on the substrate as designed to realize particular functions. After these two processes, the FEOL layer 1604 and the BEOL layer 1606 are formed, respectively. The circuits on the logic side are mainly realized by the FEOL layer 1604, and the electrical connections among the circuit elements are realized by the BEOL layer 1606.
In step 1502, the first wafer 1601 is tested to discard defective parts. Wafer testing, also known as chip probing, aims to ensure that each die basically meets the circuit's characteristics or design specification, and typically includes verification of voltage, current, timing, and electrical function.
In step 1503, the first wafer 1601 is flipped. Each first wafer that was not discarded is rotated 180 degrees; after flipping, as shown in FIG. 17, the logic side 1602 of the first wafer 1601 faces downward and the opposite side 1603 faces upward.
In step 1504, a second wafer 1701 is bonded to the logic side 1602 to form the structure shown in FIG. 17.
In step 1505, the first TSVs 1605 are exposed on the opposite side 1603. First, the opposite side 1603 is ground, and the ground surface is smoothed by chemical mechanical polishing (CMP) to form the structure shown in FIG. 18. Next, the polished opposite side 1603 is plasma etched so that the first TSVs 1605 protrude from the surface of the opposite side 1603, forming the structure shown in FIG. 19. Silicon dioxide is then deposited on the plasma-etched surface by low-temperature chemical vapor deposition (LTCVD) to form the silicon dioxide layer 2001 shown in FIG. 20. Finally, the LTCVD surface is polished by CMP so that the silicon dioxide layer 2001 becomes planar and the first TSVs 1605 are exposed, yielding the structure shown in FIG. 21.
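The five surface treatments of step 1505 form a strictly ordered pipeline; the sketch below lists them in sequence, with surface-state names that are our own shorthand rather than terms from the patent.

```python
# Illustrative sketch: the TSV-reveal sequence of step 1505 as an ordered
# pipeline of (process, resulting surface state) pairs.
REVEAL_PIPELINE = [
    ("grind",        "thinned"),                # FIG. 18 precursor
    ("CMP",          "planar"),                 # FIG. 18
    ("plasma etch",  "TSVs protruding"),        # FIG. 19
    ("LTCVD SiO2",   "oxide-covered"),          # FIG. 20
    ("CMP",          "planar, TSVs exposed"),   # FIG. 21
]

state = "as-bonded"
for process, result in REVEAL_PIPELINE:
    state = result  # each process transforms the wafer's back surface
assert state == "planar, TSVs exposed"
```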
In step 1506, the first wafer 1601 is diced into a plurality of first dies. First, as shown in FIG. 22, the first wafer 1601 together with the second wafer 1701 is placed on a mount-on-frame 2201; the ejector pin 2202 is then pressed against the second wafer 1701, and the first wafer 1601 and the second wafer 1701 are cut according to the size and position of the circuits, i.e., along the dashed lines in the figure, finally producing a plurality of first dies 2203.
In step 1507, the first die 2203 is flipped 180 degrees to form the structure shown in FIG. 23.
In step 1508, the opposite side of the first die is bonded to the opposite side of a second die so that the first TSVs are in electrical communication with the second TSVs of the second die. The second die can be manufactured with existing processes; this embodiment does not restrict how the second die is made. As shown in FIG. 24, the opposite side 1603 of the first die 2203 is bonded to the opposite side 2402 of the second die 2401 so that the first TSVs 1605 are electrically connected to the second TSVs 2403 of the second die 2401.
A back-to-back structure has now been formed: the opposite side 1603 of the first die 2203 and the opposite side 2402 of the second die 2401 are bonded together, and the logic-side circuits on the two sides are electrically connected to each other through the first TSVs 1605 and the second TSVs 2403.
In step 1509, the first die 2203 is encapsulated using a molding compound formation process to form the structure shown in FIG. 25. Many molding processes exist in the prior art; for example, direct-bond encapsulation may be used, in which the first die 2203 and the second die 2401 are bonded directly to a printed circuit board or to a strip of plastic film covered with metal leads, and organic resin is dispensed around the first die 2203 to form the covering package body 2501.
In step 1510, the encapsulated first die is ground flat.
In step 1511, the ground first die is polished by CMP to form the structure shown in FIG. 26. This completes the entire back-to-back stacking process.
The solution of the present invention stacks the core layers vertically, bonds the processor core and the memory of the same die group face-to-face, and bonds adjacent die groups back-to-back, greatly shortening the transmission path of the die-to-die interface between the processor core and the memory within a die group. With current processes, the logic side is only about 0.3 micrometers thick and the bonding layer about 1 micrometer thick, so the transmission path between the processor core and the memory can be shortened to about 1.6 micrometers, which helps improve inter-core transmission efficiency.
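The 1.6-micrometer figure can be checked arithmetically. Reading the face-to-face interface as two logic sides plus one bonding layer is our interpretation of the numbers quoted above, not a breakdown the patent states explicitly:

```python
# Face-to-face bonding: logic side of the core layer + bonding layer + logic
# side of the memory layer (thicknesses from the current-process figures above).
logic_side_um = 0.3   # thickness of each die's logic side, in micrometers
bond_layer_um = 1.0   # thickness of the bonding layer between the two faces

path_um = 2 * logic_side_um + bond_layer_um
assert abs(path_um - 1.6) < 1e-9  # matches the 1.6 um quoted in the text
```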
根据不同的应用场景，本发明的电子设备或装置可以包括服务器、云端服务器、服务器集群、数据处理装置、机器人、电脑、打印机、扫描仪、平板电脑、智能终端、PC设备、物联网终端、移动终端、手机、行车记录仪、导航仪、传感器、摄像头、相机、摄像机、投影仪、手表、耳机、移动存储、可穿戴设备、视觉终端、自动驾驶终端、交通工具、家用电器、和/或医疗设备。所述交通工具包括飞机、轮船和/或车辆；所述家用电器包括电视、空调、微波炉、冰箱、电饭煲、加湿器、洗衣机、电灯、燃气灶、油烟机；所述医疗设备包括核磁共振仪、B超仪和/或心电图仪。本发明的电子设备或装置还可以被应用于互联网、物联网、数据中心、能源、交通、公共管理、制造、教育、电网、电信、金融、零售、工地、医疗等领域。进一步，本发明的电子设备或装置还可以用于云端、边缘端、终端等与人工智能、大数据和/或云计算相关的应用场景中。在一个或多个实施例中，根据本发明方案的算力高的电子设备或装置可以应用于云端设备（例如云端服务器），而功耗小的电子设备或装置可以应用于终端设备和/或边缘端设备（例如智能手机或摄像头）。在一个或多个实施例中，云端设备的硬件信息和终端设备和/或边缘端设备的硬件信息相互兼容，从而可以根据终端设备和/或边缘端设备的硬件信息，从云端设备的硬件资源中匹配出合适的硬件资源来模拟终端设备和/或边缘端设备的硬件资源，以便完成端云一体或云边端一体的统一管理、调度和协同工作。According to different application scenarios, the electronic equipment or device of the present invention may include servers, cloud servers, server clusters, data processing devices, robots, computers, printers, scanners, tablet computers, smart terminals, PC equipment, Internet of Things terminals, mobile terminals, mobile phones, driving recorders, navigators, sensors, webcams, cameras, video cameras, projectors, watches, earphones, mobile storage, wearable devices, visual terminals, autonomous driving terminals, vehicles, household appliances, and/or medical equipment. The vehicles include airplanes, ships, and/or automobiles; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; the medical equipment includes nuclear magnetic resonance instruments, B-mode ultrasound scanners, and/or electrocardiographs. The electronic equipment or device of the present invention may also be applied to fields such as the Internet, the Internet of Things, data centers, energy, transportation, public management, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, and medical care. Further, the electronic equipment or device of the present invention may also be used in application scenarios related to artificial intelligence, big data, and/or cloud computing, such as the cloud, the edge, and terminals. In one or more embodiments, electronic equipment or devices with high computing power according to the solution of the present invention may be applied to cloud devices (such as cloud servers), while electronic equipment or devices with low power consumption may be applied to terminal devices and/or edge devices (such as smartphones or cameras). In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or the edge device are compatible with each other, so that, according to the hardware information of the terminal device and/or the edge device, appropriate hardware resources can be matched from the hardware resources of the cloud device to simulate the hardware resources of the terminal device and/or the edge device, thereby achieving unified management, scheduling, and collaborative work in device-cloud integration or cloud-edge-device integration.
需要说明的是，为了简明的目的，本发明将一些方法及其实施例表述为一系列的动作及其组合，但是本领域技术人员可以理解本发明的方案并不受所描述的动作的顺序限制。因此，依据本发明的公开或教导，本领域技术人员可以理解其中的某些步骤可以采用其他顺序来执行或者同时执行。进一步，本领域技术人员可以理解本发明所描述的实施例可以视为可选实施例，即其中所涉及的动作或模块对于本发明某个或某些方案的实现并不一定是必需的。另外，根据方案的不同，本发明对一些实施例的描述也各有侧重。鉴于此，本领域技术人员可以理解本发明某个实施例中没有详述的部分，也可以参见其他实施例的相关描述。It should be noted that, for the sake of brevity, the present invention describes some methods and embodiments thereof as a series of actions and combinations thereof, but those skilled in the art will understand that the solution of the present invention is not limited by the order of the described actions. Therefore, based on the disclosure or teaching of the present invention, those skilled in the art will understand that some of the steps may be performed in other orders or concurrently. Further, those skilled in the art will understand that the embodiments described in the present invention may be regarded as optional embodiments; that is, the actions or modules involved therein are not necessarily required for the realization of one or more solutions of the present invention. In addition, depending on the solution, the descriptions of some embodiments of the present invention have different emphases. In view of this, for parts not described in detail in a certain embodiment of the present invention, those skilled in the art may refer to the relevant descriptions of other embodiments.
在具体实现方面，基于本发明的公开和教导，本领域技术人员可以理解本发明所公开的若干实施例也可以通过本文未公开的其他方式来实现。例如，就前文所述的电子设备或装置实施例中的各个单元来说，本文在考虑了逻辑功能的基础上对其进行拆分，而实际实现时也可以有另外的拆分方式。又例如，可以将多个单元或组件结合或者集成到另一个系统，或者对单元或组件中的一些特征或功能进行选择性地禁用。就不同单元或组件之间的连接关系而言，前文结合附图所讨论的连接可以是单元或组件之间的直接或间接耦合。在一些场景中，前述的直接或间接耦合涉及利用接口的通信连接，其中通信接口可以支持电性、光学、声学、磁性或其它形式的信号传输。In terms of specific implementation, based on the disclosure and teaching of the present invention, those skilled in the art will understand that several embodiments disclosed herein may also be implemented in other ways not disclosed herein. For example, the units in the foregoing electronic equipment or device embodiments are divided herein on the basis of logical functions, but other division methods may be used in actual implementation. As another example, multiple units or components may be combined or integrated into another system, or some features or functions of the units or components may be selectively disabled. As for the connection relationships between different units or components, the connections discussed above in conjunction with the drawings may be direct or indirect couplings between the units or components. In some scenarios, the aforementioned direct or indirect coupling involves a communication connection using an interface, where the communication interface may support electrical, optical, acoustic, magnetic, or other forms of signal transmission.
在本发明中，作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元示出的部件可以是或者也可以不是物理单元。前述部件或单元可以位于同一位置或者分布到多个网络单元上。另外，根据实际的需要，可以选择其中的部分或者全部单元来实现本发明实施例所述方案的目的。另外，在一些场景中，本发明实施例中的多个单元可以集成于一个单元中或者各个单元物理上单独存在。In the present invention, units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units. The aforementioned components or units may be located at the same location or distributed over multiple network units. In addition, according to actual needs, some or all of the units may be selected to achieve the purposes of the solutions described in the embodiments of the present invention. Furthermore, in some scenarios, multiple units in the embodiments of the present invention may be integrated into one unit, or each unit may exist physically separately.
在另外一些实现场景中，上述集成的单元也可以采用硬件的形式实现，即为具体的硬件电路，其可以包括数字电路和/或模拟电路等。电路的硬件结构的物理实现可以包括但不限于物理器件，而物理器件可以包括但不限于晶体管或忆阻器等器件。鉴于此，本文所述的各类装置（例如计算装置或其他处理装置）可以通过适当的硬件处理器来实现，例如中央处理器、GPU、FPGA、DSP和ASIC等。进一步，前述的所述存储单元或存储装置可以是任意适当的存储介质（包括磁存储介质或磁光存储介质等），其例如可以是可变电阻式存储器(Resistive Random Access Memory, RRAM)、动态随机存取存储器(Dynamic Random Access Memory, DRAM)、静态随机存取存储器(Static Random Access Memory, SRAM)、增强动态随机存取存储器(Enhanced Dynamic Random Access Memory, EDRAM)、高带宽存储器(High Bandwidth Memory, HBM)、混合存储器立方体(Hybrid Memory Cube, HMC)、ROM和RAM等。In some other implementation scenarios, the above integrated units may also be implemented in the form of hardware, that is, as specific hardware circuits, which may include digital circuits and/or analog circuits. The physical realization of the hardware structure of a circuit may include, but is not limited to, physical devices, and the physical devices may include, but are not limited to, devices such as transistors or memristors. In view of this, the various devices described herein (such as computing devices or other processing devices) may be implemented by appropriate hardware processors, such as central processing units, GPUs, FPGAs, DSPs, and ASICs. Further, the aforementioned storage unit or storage device may be any suitable storage medium (including magnetic storage media, magneto-optical storage media, etc.), and may be, for example, a resistive random access memory (RRAM), a dynamic random access memory (DRAM), a static random access memory (SRAM), an enhanced dynamic random access memory (EDRAM), a high bandwidth memory (HBM), a hybrid memory cube (HMC), a ROM, a RAM, or the like.
依据以下条款可更好地理解前述内容：The foregoing may be better understood in view of the following clauses:
条款A1.一种纵向堆叠芯片，包括：第一晶粒组，包括采用面对面制程的第一晶粒和第二晶粒；以及第二晶粒组，包括采用面对面制程的第一晶粒和第二晶粒；其中，所述第一晶粒组和所述第二晶粒组采用背对背制程。Clause A1. A vertically stacked chip, comprising: a first die group including a first die and a second die using a face-to-face process; and a second die group including a first die and a second die using a face-to-face process; wherein the first die group and the second die group use a back-to-back process.
条款A2.根据条款A1所述的纵向堆叠芯片,其中所述第一晶粒为处理器核及内存其中之一,所述第二晶粒为处理器核及内存的另一个。Clause A2. The vertically stacked chips of Clause A1, wherein the first die is one of a processor core and a memory, and the second die is the other of a processor core and a memory.
条款A3.根据条款A2所述的纵向堆叠芯片，其中所述第一晶粒组的处理器核包括第一晶粒对晶粒区，生成有第一收发电路，所述第二晶粒组的处理器核包括第二晶粒对晶粒区，生成有第二收发电路；其中，所述第一晶粒及所述第二晶粒组的处理器核通过所述第一收发电路及所述第二收发电路进行层间数据传输。Clause A3. The vertically stacked chip of Clause A2, wherein the processor core of the first die group includes a first die-to-die area in which a first transceiver circuit is formed, and the processor core of the second die group includes a second die-to-die area in which a second transceiver circuit is formed; wherein the processor cores of the first and second die groups perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit.
条款A4.根据条款A3所述的纵向堆叠芯片，其中所述第一晶粒组的内存位于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间，所述第一晶粒组的内存生成有收发硅通孔，用以电性连接所述第一收发电路及所述第二收发电路。Clause A4. The vertically stacked chip of Clause A3, wherein the memory of the first die group is located between the processor core of the first die group and the processor core of the second die group, and transceiver through-silicon vias are formed in the memory of the first die group to electrically connect the first transceiver circuit and the second transceiver circuit.
条款A5.根据条款A4所述的纵向堆叠芯片，其中所述第一晶粒组的内存包括第一输入输出区，所述第二晶粒组的处理器核及所述第二晶粒组的内存生成有输入输出硅通孔，所述第一晶粒组的内存中的数据通过所述第一输入输出区及所述输入输出硅通孔传送至所述纵向堆叠芯片外。Clause A5. The vertically stacked chip of Clause A4, wherein the memory of the first die group includes a first input/output area, input/output through-silicon vias are formed in the processor core of the second die group and the memory of the second die group, and data in the memory of the first die group is transmitted out of the vertically stacked chip through the first input/output area and the input/output through-silicon vias.
条款A6.根据条款A4所述的纵向堆叠芯片，其中所述第二晶粒组的内存包括第二输入输出区，所述第二晶粒组的内存中的数据通过所述输入输出硅通孔传送至所述纵向堆叠芯片外。Clause A6. The vertically stacked chip of Clause A4, wherein the memory of the second die group includes a second input/output area, and data in the memory of the second die group is transmitted out of the vertically stacked chip through the input/output through-silicon vias.
条款A7.根据条款A4所述的纵向堆叠芯片，连接至片外内存，其中所述第一晶粒组的内存还包括第一物理区，所述第二晶粒组的处理器核及所述第二晶粒组的内存生成有物理硅通孔，所述第一晶粒组的处理器核的运算结果通过所述第一物理区及所述物理硅通孔传送至所述片外内存。Clause A7. The vertically stacked chip of Clause A4, connected to an off-chip memory, wherein the memory of the first die group further includes a first physical area, physical through-silicon vias are formed in the processor core of the second die group and the memory of the second die group, and an operation result of the processor core of the first die group is transmitted to the off-chip memory through the first physical area and the physical through-silicon vias.
条款A8.根据条款A3所述的纵向堆叠芯片，其中所述第二晶粒组的内存位于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间，所述第二晶粒组的内存生成有收发硅通孔，用以电性连接所述第一收发电路及所述第二收发电路。Clause A8. The vertically stacked chip of Clause A3, wherein the memory of the second die group is located between the processor core of the first die group and the processor core of the second die group, and transceiver through-silicon vias are formed in the memory of the second die group to electrically connect the first transceiver circuit and the second transceiver circuit.
条款A9.根据条款A1所述的纵向堆叠芯片，其中所述第一晶粒组还包括第三晶粒，与所述第一晶粒组的所述第一晶粒采用面对背制程。Clause A9. The vertically stacked chip of Clause A1, wherein the first die group further includes a third die, which uses a face-to-back process with the first die of the first die group.
条款A10.根据条款A9所述的纵向堆叠芯片,其中所述第一晶粒为处理器核,所述第二晶粒为内存及所述第三晶粒为内存。Clause A10. The vertically stacked chip of Clause A9, wherein the first die is a processor core, the second die is a memory and the third die is a memory.
条款A11.根据条款A1所述的纵向堆叠芯片，还包括第三晶粒组，与所述第二晶粒组采用面对背制程。Clause A11. The vertically stacked chip of Clause A1, further comprising a third die group, which uses a face-to-back process with the second die group.
条款A12.根据条款A1至11所述任一项的纵向堆叠芯片，其中各层以倒装芯片球栅格阵列方式封装。Clause A12. The vertically stacked chip of any one of Clauses A1 to A11, wherein each layer is packaged in a flip-chip ball grid array manner.
条款A13.根据条款A1至11所述任一项的纵向堆叠芯片，其中各层以CoWoS方式封装。Clause A13. The vertically stacked chip of any one of Clauses A1 to A11, wherein each layer is packaged in a CoWoS manner.
条款A14.一种集成电路装置，包括根据条款A1至11任一项所述的纵向堆叠芯片。Clause A14. An integrated circuit device, comprising the vertically stacked chip according to any one of Clauses A1 to A11.
条款A15.一种板卡，包括根据条款A14所述的集成电路装置。Clause A15. A board card, comprising the integrated circuit device according to Clause A14.
条款A16.一种纵向堆叠芯片的方法，所述纵向堆叠芯片包括第一晶粒组及第二晶粒组，所述方法包括：面对面贴合所述第一晶粒组中的第一晶粒和第二晶粒；面对面贴合所述第二晶粒组中的第一晶粒和第二晶粒；以及背对背贴合所述第一晶粒组和所述第二晶粒组。Clause A16. A method for vertically stacking a chip, the vertically stacked chip including a first die group and a second die group, the method comprising: bonding a first die and a second die in the first die group face to face; bonding a first die and a second die in the second die group face to face; and bonding the first die group and the second die group back to back.
条款A17.根据条款A16所述的方法，其中所述第一晶粒为处理器核及内存其中之一，所述第二晶粒为处理器核及内存的另一个，所述方法还包括：生成第一收发电路于所述第一晶粒组的处理器核中的第一晶粒对晶粒区；以及生成第二收发电路于所述第二晶粒组的处理器核中的第二晶粒对晶粒区；其中，所述第一晶粒及所述第二晶粒组的处理器核通过所述第一收发电路及所述第二收发电路进行层间数据传输。Clause A17. The method of Clause A16, wherein the first die is one of a processor core and a memory, and the second die is the other of the processor core and the memory, the method further comprising: forming a first transceiver circuit in a first die-to-die area of the processor core of the first die group; and forming a second transceiver circuit in a second die-to-die area of the processor core of the second die group; wherein the processor cores of the first and second die groups perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit.
条款A18.根据条款A17所述的方法，还包括：生成收发硅通孔于所述第一晶粒组的内存；设置所述第一晶粒组的内存于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间；其中，所述第一晶粒组的内存通过所述收发硅通孔电性连接所述第一收发电路及所述第二收发电路。Clause A18. The method of Clause A17, further comprising: forming transceiver through-silicon vias in the memory of the first die group; and disposing the memory of the first die group between the processor core of the first die group and the processor core of the second die group; wherein the memory of the first die group is electrically connected to the first transceiver circuit and the second transceiver circuit through the transceiver through-silicon vias.
条款A19.根据条款A18所述的方法，其中所述第一晶粒组的内存包括第一输入输出区，所述第二晶粒组的内存包括第二输入输出区，所述方法还包括：生成输入输出硅通孔于所述第二晶粒组的处理器核及所述第二晶粒组的内存；其中，所述第一晶粒组的内存中的数据通过所述第一输入输出区及所述输入输出硅通孔传送至所述纵向堆叠芯片外，且所述第二晶粒组的内存中的数据通过所述第二输入输出区及所述输入输出硅通孔传送至所述纵向堆叠芯片外。Clause A19. The method of Clause A18, wherein the memory of the first die group includes a first input/output area and the memory of the second die group includes a second input/output area, the method further comprising: forming input/output through-silicon vias in the processor core of the second die group and the memory of the second die group; wherein data in the memory of the first die group is transmitted out of the vertically stacked chip through the first input/output area and the input/output through-silicon vias, and data in the memory of the second die group is transmitted out of the vertically stacked chip through the second input/output area and the input/output through-silicon vias.
条款A20.根据条款A17所述的方法，所述纵向堆叠芯片连接至片外内存，其中所述第一晶粒组的内存还包括第一物理区，所述方法还包括：生成物理硅通孔于所述第二晶粒组的处理器核及所述第二晶粒组的内存；其中，所述第一晶粒组的处理器核的运算结果通过所述第一物理区及所述物理硅通孔传送至所述片外内存。Clause A20. The method of Clause A17, wherein the vertically stacked chip is connected to an off-chip memory and the memory of the first die group further includes a first physical area, the method further comprising: forming physical through-silicon vias in the processor core of the second die group and the memory of the second die group; wherein an operation result of the processor core of the first die group is transmitted to the off-chip memory through the first physical area and the physical through-silicon vias.
条款A21.根据条款A16所述的方法，还包括：生成收发硅通孔于所述第二晶粒组的内存；以及设置所述第二晶粒组的内存于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间；其中，所述收发硅通孔电性连接所述第一收发电路及所述第二收发电路。Clause A21. The method of Clause A16, further comprising: forming transceiver through-silicon vias in the memory of the second die group; and disposing the memory of the second die group between the processor core of the first die group and the processor core of the second die group; wherein the transceiver through-silicon vias electrically connect the first transceiver circuit and the second transceiver circuit.
条款A22.根据条款A16所述的方法，其中所述第一晶粒组还包括第三晶粒，所述方法包括：面对背贴合所述第三晶粒与所述第一晶粒组的所述第一晶粒。Clause A22. The method of Clause A16, wherein the first die group further includes a third die, the method comprising: bonding the third die and the first die of the first die group face to back.
条款A23.根据条款A22所述的方法,其中所述第一晶粒为处理器核,所述第二晶粒为内存及所述第三晶粒为内存。Clause A23. The method of Clause A22, wherein the first die is a processor core, the second die is a memory and the third die is a memory.
条款A24.根据条款A16所述的方法，所述纵向堆叠芯片还包括第三晶粒组，所述方法还包括：面对背贴合所述第三晶粒组与所述第二晶粒组。Clause A24. The method of Clause A16, wherein the vertically stacked chip further includes a third die group, the method further comprising: bonding the third die group and the second die group face to back.
以上对本发明实施例进行了详细介绍，本文中应用了具体个例对本发明的原理及实施方式进行了阐述，以上实施例的说明只是用于帮助理解本发明的方法及其核心思想；同时，对于本领域的一般技术人员，依据本发明的思想，在具体实施方式及应用范围上均会有改变之处，综上所述，本说明书内容不应理解为对本发明的限制。The embodiments of the present invention have been described in detail above, and specific examples are used herein to illustrate the principles and implementations of the present invention. The descriptions of the above embodiments are only intended to help understand the methods and core ideas of the present invention. Meanwhile, those of ordinary skill in the art may, based on the ideas of the present invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (24)

  1. 一种纵向堆叠芯片,包括:A vertically stacked chip comprising:
    第一晶粒组,包括采用面对面制程的第一晶粒和第二晶粒;以及a first die group comprising a first die and a second die using a face-to-face process; and
    第二晶粒组，包括采用面对面制程的第一晶粒和第二晶粒；a second die group comprising a first die and a second die using a face-to-face process;
    其中,所述第一晶粒组和所述第二晶粒组采用背对背制程。Wherein, the first die group and the second die group adopt a back-to-back process.
  2. 根据权利要求1所述的纵向堆叠芯片,其中所述第一晶粒为处理器核及内存其中之一,所述第二晶粒为处理器核及内存的另一个。The vertically stacked chip according to claim 1, wherein the first die is one of a processor core and a memory, and the second die is the other one of a processor core and a memory.
  3. 根据权利要求2所述的纵向堆叠芯片，其中所述第一晶粒组的处理器核包括第一晶粒对晶粒区，生成有第一收发电路，所述第二晶粒组的处理器核包括第二晶粒对晶粒区，生成有第二收发电路；The vertically stacked chip according to claim 2, wherein the processor core of the first die group includes a first die-to-die area in which a first transceiver circuit is formed, and the processor core of the second die group includes a second die-to-die area in which a second transceiver circuit is formed;
    其中，所述第一晶粒及所述第二晶粒组的处理器核通过所述第一收发电路及所述第二收发电路进行层间数据传输。wherein the processor cores of the first and second die groups perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit.
  4. 根据权利要求3所述的纵向堆叠芯片，其中所述第一晶粒组的内存位于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间，所述第一晶粒组的内存生成有收发硅通孔，用以电性连接所述第一收发电路及所述第二收发电路。The vertically stacked chip according to claim 3, wherein the memory of the first die group is located between the processor core of the first die group and the processor core of the second die group, and transceiver through-silicon vias are formed in the memory of the first die group to electrically connect the first transceiver circuit and the second transceiver circuit.
  5. 根据权利要求4所述的纵向堆叠芯片，其中所述第一晶粒组的内存包括第一输入输出区，所述第二晶粒组的处理器核及所述第二晶粒组的内存生成有输入输出硅通孔，所述第一晶粒组的内存中的数据通过所述第一输入输出区及所述输入输出硅通孔传送至所述纵向堆叠芯片外。The vertically stacked chip according to claim 4, wherein the memory of the first die group includes a first input/output area, input/output through-silicon vias are formed in the processor core of the second die group and the memory of the second die group, and data in the memory of the first die group is transmitted out of the vertically stacked chip through the first input/output area and the input/output through-silicon vias.
  6. 根据权利要求4所述的纵向堆叠芯片，其中所述第二晶粒组的内存包括第二输入输出区，所述第二晶粒组的内存中的数据通过所述输入输出硅通孔传送至所述纵向堆叠芯片外。The vertically stacked chip according to claim 4, wherein the memory of the second die group includes a second input/output area, and data in the memory of the second die group is transmitted out of the vertically stacked chip through the input/output through-silicon vias.
  7. 根据权利要求4所述的纵向堆叠芯片，连接至片外内存，其中所述第一晶粒组的内存还包括第一物理区，所述第二晶粒组的处理器核及所述第二晶粒组的内存生成有物理硅通孔，所述第一晶粒组的处理器核的运算结果通过所述第一物理区及所述物理硅通孔传送至所述片外内存。The vertically stacked chip according to claim 4, connected to an off-chip memory, wherein the memory of the first die group further includes a first physical area, physical through-silicon vias are formed in the processor core of the second die group and the memory of the second die group, and an operation result of the processor core of the first die group is transmitted to the off-chip memory through the first physical area and the physical through-silicon vias.
  8. 根据权利要求3所述的纵向堆叠芯片，其中所述第二晶粒组的内存位于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间，所述第二晶粒组的内存生成有收发硅通孔，用以电性连接所述第一收发电路及所述第二收发电路。The vertically stacked chip according to claim 3, wherein the memory of the second die group is located between the processor core of the first die group and the processor core of the second die group, and transceiver through-silicon vias are formed in the memory of the second die group to electrically connect the first transceiver circuit and the second transceiver circuit.
  9. 根据权利要求1所述的纵向堆叠芯片，其中所述第一晶粒组还包括第三晶粒，与所述第一晶粒组的所述第一晶粒采用面对背制程。The vertically stacked chip according to claim 1, wherein the first die group further includes a third die, which uses a face-to-back process with the first die of the first die group.
  10. 根据权利要求9所述的纵向堆叠芯片,其中所述第一晶粒为处理器核,所述第二晶粒为内存及所述第三晶粒为内存。The vertically stacked chips of claim 9, wherein the first die is a processor core, the second die is a memory, and the third die is a memory.
  11. 根据权利要求1所述的纵向堆叠芯片,还包括第三晶粒组,与所述第二晶粒组采用面对背制程。The vertically stacked chips according to claim 1 , further comprising a third die group, which adopts a face-to-back process with the second die group.
  12. 根据权利要求1至11所述任一项的纵向堆叠芯片,其中各层以倒装芯片球栅格阵列(flip chip ball grid array,FCBGA)方式封装。The vertically stacked chip according to any one of claims 1 to 11, wherein each layer is packaged in a flip chip ball grid array (FCBGA) manner.
  13. 根据权利要求1至11所述任一项的纵向堆叠芯片,其中各层以CoWoS(chip on wafer on substrate)方式封装。The vertically stacked chip according to any one of claims 1 to 11, wherein each layer is packaged in a CoWoS (chip on wafer on substrate) manner.
  14. 一种集成电路装置,包括根据权利要求1至11任一项所述的纵向堆叠芯片。An integrated circuit device comprising vertically stacked chips according to any one of claims 1 to 11.
  15. 一种板卡,包括根据权利要求14所述的集成电路装置。A board comprising the integrated circuit device according to claim 14.
  16. 一种纵向堆叠芯片的方法,所述纵向堆叠芯片包括第一晶粒组及第二晶粒组,所述方法包括:A method for vertically stacking chips, the vertically stacking chips comprising a first die group and a second die group, the method comprising:
    面对面贴合所述第一晶粒组中的第一晶粒和第二晶粒;bonding the first die and the second die in the first die group face to face;
    面对面贴合所述第二晶粒组中的第一晶粒和第二晶粒;以及bonding the first die and the second die in the second die group face to face; and
    背对背贴合所述第一晶粒组和所述第二晶粒组。bonding the first die group and the second die group back to back.
  17. 根据权利要求16所述的方法,其中所述第一晶粒为处理器核及内存其中之一,所述第二晶粒为处理器核及内存的另一个,所述方法还包括:The method according to claim 16, wherein the first die is one of the processor core and the memory, and the second die is the other of the processor core and the memory, and the method further comprises:
    生成第一收发电路于所述第一晶粒组的处理器核中的第一晶粒对晶粒区;以及generating a first transceiver circuit in a first die-to-die region of a processor core of the first die group; and
    生成第二收发电路于所述第二晶粒组的处理器核中的第二晶粒对晶粒区;generating a second transceiver circuit in a second die-to-die region of the processor core of the second die group;
    其中，所述第一晶粒及所述第二晶粒组的处理器核通过所述第一收发电路及所述第二收发电路进行层间数据传输。wherein the processor cores of the first and second die groups perform inter-layer data transmission through the first transceiver circuit and the second transceiver circuit.
  18. 根据权利要求17所述的方法,还包括:The method of claim 17, further comprising:
    生成收发硅通孔于所述第一晶粒组的内存;generating transceiver TSVs in the memory of the first die group;
    设置所述第一晶粒组的内存于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间;disposing the memory of the first die group between the processor cores of the first die group and the processor cores of the second die group;
    其中,所述第一晶粒组的内存通过所述收发硅通孔电性连接所述第一收发电路及所述第二收发电路。Wherein, the memory of the first die group is electrically connected to the first transceiver circuit and the second transceiver circuit through the transceiver TSV.
  19. 根据权利要求18所述的方法,其中所述第一晶粒组的内存包括第一输入输出区,所述第二晶粒组的内存包括第二输入输出区,所述方法还包括:The method of claim 18, wherein the memory of the first die group includes a first input-output area and the memory of the second die group includes a second input-output area, the method further comprising:
    生成输入输出硅通孔于所述第二晶粒组的处理器核及所述第二晶粒组的内存；generating input/output through-silicon vias in the processor core of the second die group and the memory of the second die group;
    其中，所述第一晶粒组的内存中的数据通过所述第一输入输出区及所述输入输出硅通孔传送至所述纵向堆叠芯片外，且所述第二晶粒组的内存中的数据通过所述第二输入输出区及所述输入输出硅通孔传送至所述纵向堆叠芯片外。wherein data in the memory of the first die group is transmitted out of the vertically stacked chip through the first input/output area and the input/output through-silicon vias, and data in the memory of the second die group is transmitted out of the vertically stacked chip through the second input/output area and the input/output through-silicon vias.
  20. 根据权利要求17所述的方法,所述纵向堆叠芯片连接至片外内存,其中所述第一晶粒组的内存还包括第一物理区,所述方法还包括:The method of claim 17, wherein the vertically stacked chips are connected to an off-chip memory, wherein the memory of the first die group further includes a first physical area, the method further comprising:
    生成物理硅通孔于所述第二晶粒组的处理器核及所述第二晶粒组的内存；generating physical through-silicon vias in the processor core of the second die group and the memory of the second die group;
    其中,所述第一晶粒组的处理器核的运算结果通过所述第一物理区及所述物理硅通孔传送至所述片外内存。Wherein, the operation result of the processor core of the first die group is transmitted to the off-chip memory through the first physical area and the physical TSV.
  21. 根据权利要求16所述的方法,还包括:The method of claim 16, further comprising:
    生成收发硅通孔于所述第二晶粒组的内存；以及generating transceiver through-silicon vias in the memory of the second die group; and
    设置所述第二晶粒组的内存于所述第一晶粒组的处理器核及所述第二晶粒组的处理器核间;disposing the memory of the second die group between the processor cores of the first die group and the processor cores of the second die group;
    其中,所述收发硅通孔电性连接所述第一收发电路及所述第二收发电路。Wherein, the transceiver TSV is electrically connected to the first transceiver circuit and the second transceiver circuit.
  22. 根据权利要求16所述的方法,其中所述第一晶粒组还包括第三晶粒,所述方法包括:The method of claim 16, wherein the first group of dies further comprises a third die, the method comprising:
    面对背贴合所述第三晶粒与所述第一晶粒组的所述第一晶粒。The third die and the first die of the first die group are bonded face-to-back.
  23. 根据权利要求22所述的方法,其中所述第一晶粒为处理器核,所述第二晶粒为内存及所述第三晶粒为内存。The method of claim 22, wherein the first die is a processor core, the second die is a memory and the third die is a memory.
  24. 根据权利要求16所述的方法，所述纵向堆叠芯片还包括第三晶粒组，所述方法还包括：The method according to claim 16, wherein the vertically stacked chip further includes a third die group, the method further comprising:
    面对背贴合所述第三晶粒组与所述第二晶粒组。The third crystal grain group and the second crystal grain group are bonded face to back.
PCT/CN2022/122373 2021-10-08 2022-09-29 Longitudinal stacked chip, integrated circuit device, board, and manufacturing method therefor WO2023056876A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111172917.6 2021-10-08
CN202111172917.6A CN115966535A (en) 2021-10-08 2021-10-08 Longitudinally stacked chip, integrated circuit device, board card and manufacturing method thereof

Publications (1)

Publication Number Publication Date
WO2023056876A1 true WO2023056876A1 (en) 2023-04-13

Family

ID=85803154

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/122373 WO2023056876A1 (en) 2021-10-08 2022-09-29 Longitudinal stacked chip, integrated circuit device, board, and manufacturing method therefor

Country Status (3)

Country Link
CN (1) CN115966535A (en)
TW (1) TW202316621A (en)
WO (1) WO2023056876A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117222234A (en) * 2023-11-07 2023-12-12 北京奎芯集成电路设计有限公司 Semiconductor device based on UCie interface

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101090107A (en) * 2006-06-13 2007-12-19 日月光半导体制造股份有限公司 Package structure of crystal particle and manufacturing method thereof
CN107527877A (en) * 2016-06-15 2017-12-29 联发科技股份有限公司 Semiconductor packages
CN113299632A (en) * 2020-02-21 2021-08-24 赛灵思公司 Integrated circuit device with stacked dies of mirror circuit


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117222234A (en) * 2023-11-07 2023-12-12 北京奎芯集成电路设计有限公司 Semiconductor device based on UCie interface
CN117222234B (en) * 2023-11-07 2024-02-23 北京奎芯集成电路设计有限公司 Semiconductor device based on UCie interface

Also Published As

Publication number Publication date
CN115966535A (en) 2023-04-14
TW202316621A (en) 2023-04-16

Similar Documents

Publication Publication Date Title
TWI748291B (en) Integrated circuit device, interconnection device die and fabrication method for system on integrated chip
US10445269B2 (en) Stacked semiconductor device assembly in computer system
US20160300823A1 (en) Package-on-package options with multiple layer 3-d stacking
TWI591773B (en) Die stacking techniques in bga memory package for small footprint cpu and memory motherboard design
WO2023078006A1 (en) Accelerator structure, method for generating accelerator structure, and device thereof
TW201515176A (en) Flexible memory system with a controller and a stack of memory
KR20130054382A (en) Wide input output memory with low density, low latency and high density, high latency blocks
US20200066640A1 (en) Hybrid technology 3-d die stacking
US20230352412A1 (en) Multiple die package using an embedded bridge connecting dies
KR20110006482A (en) Multi chip package for use in multi processor system having memory link architecture
WO2023056876A1 (en) Longitudinal stacked chip, integrated circuit device, board, and manufacturing method therefor
KR102629195B1 (en) How to layout package structures, devices, board cards, and integrated circuits
WO2023056875A1 (en) Multi-core chip, integrated circuit apparatus, and board card and manufacturing procedure method therefor
WO2022242333A1 (en) Wafer chip having cowos package structure, wafer, device, and generation method therefor
CN115966517A (en) Back-to-back stacking process, medium and computer equipment
TWI834089B (en) A system-on-integrated-chip, a method for producing the same and a readable storage medium
CN110544673A (en) Multilayer fused three-dimensional system integrated structure
US20230343718A1 (en) Homogeneous chiplets configurable as a two-dimensional system or a three-dimensional system
CN116976411A (en) Device, chip, equipment, memory calculation scheduling and multi-layer neural network training method
CN117690808A (en) Method for producing chip
CN117690893A (en) Chip and product comprising same
CN113745197A (en) Three-dimensional heterogeneous integrated programmable array chip structure and electronic device
CN117525005A (en) Chip assembly with vacuum cavity vapor chamber, packaging structure and preparation method
CN116053229A (en) Chip package substrate and packaged chip

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22877895

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE